9.520/6.860: Statistical Learning Theory and Applications, Fall 2017

Units: 3-0-9 H,G
Class Times: Monday and Wednesday: 1:00 pm - 2:30 pm
Location: 46-3002 (Singleton Auditorium)

Tomaso Poggio (TP), Lorenzo Rosasco (LR)


Georgios Evangelopoulos (GE), Amauche Emenari, Andres Campero-Nunez, Michael Lee

Office Hours: Friday 1:00 pm - 2:00 pm, 46-5156 (Poggio lab lounge) and/or 46-5165 (MIBR Reading Room)
Email Contact: 9.520@mit.edu
Previous Class: FALL 2016, 2015 lecture videos
Registration: Please register for 9.520/6.860 by filling out this registration form
Mailing list: Registered students will be added to the course mailing list (9520students)
Stellar page: http://stellar.mit.edu/S/course/9/fa17/9.520/

Course description

The course covers foundations and recent advances of Machine Learning from the point of view of Statistical Learning and Regularization Theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that can solve complex tasks, such as computer vision, speech recognition or natural language understanding, that were until recently the exclusive domain of biological organisms: cameras recognize faces, smart phones understand voice commands, smart speakers/assistants answer questions and cars can see and avoid obstacles.

The machine learning algorithms at the root of these success stories are trained with labeled examples rather than programmed to solve a task. Among the approaches in modern machine learning, the course focuses on regularization techniques, which provide a theoretical foundation for high-dimensional supervised learning. Besides classic approaches such as Support Vector Machines, the course covers state-of-the-art techniques using sparsity or data geometry (aka manifold learning), a variety of algorithms for supervised learning (batch and online), feature selection, structured prediction, multitask learning, and principles for designing or learning data representations. Concepts from optimization theory useful for machine learning are covered in some detail (first-order methods, proximal/splitting techniques, etc.).
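As a rough illustration of the kind of method meant by "regularization with sparsity" and "proximal techniques", here is a minimal sketch of proximal gradient descent (ISTA) for L1-regularized least squares. It is not taken from the course material: the function names `soft_threshold` and `ista`, the objective scaling, the step size, and the toy data are all assumptions made for this example.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(X, y, lam, step, n_iter=500):
    """Proximal gradient (ISTA) for (1/2n) * ||Xw - y||^2 + lam * ||w||_1.

    X is a list of n rows with d entries each, y a list of n targets.
    The step size is assumed to be at most 1/L, where L is the largest
    eigenvalue of X^T X / n.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        # Residuals of the current linear model.
        r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # Gradient of the smooth least-squares term.
        g = [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]
        # Gradient step followed by the proximal (shrinkage) step.
        w = [soft_threshold(w[j] - step * g[j], step * lam) for j in range(d)]
    return w


# Toy data (hypothetical): the second feature is only weakly related to y.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
w = ista(X, y, lam=0.1, step=1.0)
print(w)
```

On this toy problem the L1 penalty sets the weight of the weak second feature exactly to zero, which is the sense in which sparsity-based regularization performs feature selection.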

The final part of the course will focus on deep learning networks. It will introduce an emerging theory formalizing three key areas for the rigorous characterization of deep learning: approximation theory -- which functions can be represented efficiently?; optimization theory -- how easy is it to minimize the training error?; and generalization properties -- is classical learning theory sufficient for deep learning? It will also outline a theory of hierarchical architectures that aims to explain how to build machines that learn using principles of the cortex, in a way similar to how children learn: from few labeled and many more unlabeled examples.

The goal of the course is to provide students with the theoretical knowledge and the basic intuitions needed to use and develop effective machine learning solutions to challenging problems.


We will make extensive use of basic notions of calculus, linear algebra and probability. The essentials are covered in class and in the math camp material. We will introduce a few concepts in functional/convex analysis and optimization. Note that this is an advanced graduate course and some exposure to introductory Machine Learning concepts or courses is expected. Students are also expected to have basic familiarity with MATLAB/Octave.


Requirements for grading are lecture attendance/participation (10%), four problem sets (60%) and a final project (30%).

Grading policies, pset and project tentative dates: (slides)

Problem Sets

Problem Set 1, out: Sep. 20, due: Sun., Oct. 01 (Class 08).

Submission instructions: Follow the instructions included with the problem set. Use the LaTeX template for the report (there is a page limit). Submit your report online through stellar.mit by the due date/time and hand in a printout in the first class after the due date.


Course projects should be individual research projects, typically focusing on one or more of the following: theory, comparisons/critical evaluations, applications, review or implementation. Project deliverables include a report and a poster presentation.

Reports will follow NIPS format and style files: template files

Projects archive

List of Wikipedia entries created or edited as part of projects during previous course offerings.


Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

Class Date Title Instructor(s)

Reading List

Notes covering the classes will be provided in the form of independent chapters of a book currently in draft format. Additional information will be given through the slides associated with classes (where applicable). The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of additional suggested readings will also be provided separately for each class.

Book (draft)

Primary References

Resources and links