9.520: Statistical Learning Theory and Applications, Fall 2013

Class Times: Monday and Wednesday: 1:00pm - 2:30pm (Updated)
Units: 3-0-9 H,G
Location: 46-3189 (Updated)

Instructors: Tomaso Poggio (TP), Lorenzo Rosasco (LR), Carlo Ciliberto (CC), Charlie Frogner (CF), Georgios Evangelopoulos (GE).

Office Hours: Friday 2-3 pm in 46-5156, CBCL lounge (by appointment)
Email Contact: 9.520@mit.edu
Previous Class: Spring 2012
Further Info: 9.520 is currently NOT using the Stellar system

Course description

The class covers foundations and recent advances of Machine Learning in the framework of Statistical Learning Theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and its computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that can solve complex tasks previously considered out of reach. Modern cameras recognize faces, smartphones recognize voice commands, cars can see and detect pedestrians, and ATMs automatically read checks. The machine learning algorithms at the root of these success stories are trained with labeled examples rather than programmed to solve a task. Among the approaches in modern machine learning, the course focuses on regularization techniques, which are key to high-dimensional supervised learning. Regularization methods make it possible to treat a large number of diverse approaches in a unified way, while also providing tools to design new ones.

Starting from classical methods such as Regularization Networks and Support Vector Machines, the course covers state-of-the-art techniques based on the concepts of geometry (also known as manifold learning) and sparsity, as well as a variety of algorithms for supervised learning (batch and online), feature selection, structured prediction, and multitask learning.

The final part of the course is new and will focus on (unsupervised) learning of data representations, with an emphasis on hierarchical (deep) architectures. In particular, we will present a new theory (M-theory) of hierarchical architectures, motivated by the visual cortex, which might suggest how to learn, in an unsupervised way, data representations that can lower the sample complexity of later supervised learning stages.

The goal of this class is to provide students with the knowledge needed to use and develop effective machine learning solutions to challenging problems.


We will make extensive use of linear algebra, basic functional analysis (we cover the essentials in class and during the math camp), and basic concepts in probability theory and concentration of measure (also covered in class and during the math camp). Students are expected to be familiar with MATLAB.


Requirements for grading (other than attending lectures) are: 2 problem sets and a final project.

Problem Sets

Note: for problems that require writing code and running experiments, email your MATLAB code to 9.520@mit.edu by the due date.


The final project can be either a Wikipedia entry or a research project (we recommend a Wikipedia entry).
We envision 2 kinds of project: for the Wikipedia article, we suggest a short one using the standard Wikipedia article format; for the research project, you should use this template. Reports should be 8 pages maximum, including references. Additional material can be included in an appendix.


Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

Class Date Title Instructor(s)

Reading List

There is no textbook for this course. All the required information will be presented in the slides associated with each class. The books and papers listed below are useful general references, especially from the theoretical viewpoint. A list of suggested readings will also be provided separately for each class.

Primary References

Background Mathematics References

Neuroscience Related References