Class Times: Monday and Wednesday, 1:00pm - 2:30pm

Units: 3-0-9 H,G

Location: 46-3310

Instructors:

TAs: Carlo Ciliberto, Georgios Evangelopoulos, Maximilian Nickel, Ben Deen, Hongyi Zhang, Stephen Voinea, Owen Lewis.

Office Hours: Friday 2-3 pm in 46-5156 (Poggio Lab lounge)

Email Contact: 9.520@mit.edu

Previous Class: FALL 2014

Further Info: 9.520 is currently NOT using the Stellar system.

New: Videos available!

## Course description

The class covers foundations and recent advances of Machine Learning from the point of view of Statistical Learning Theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that can solve complex tasks previously considered out of reach. ATMs read checks, cameras recognize faces, smartphones understand your voice, and cars can see and avoid obstacles.

The machine learning algorithms at the root of these success stories are trained with labeled examples rather than programmed to solve a task. Among the approaches in modern machine learning, the course focuses on regularization techniques, which provide a theoretical foundation for high-dimensional supervised learning. Besides classic approaches such as Support Vector Machines, the course covers state-of-the-art techniques exploiting data geometry (also known as manifold learning) and sparsity, and a variety of algorithms for supervised learning (batch and online), feature selection, structured prediction, and multitask learning. Concepts from optimization theory useful for machine learning are covered in some detail (first-order methods, proximal/splitting techniques...).
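As a rough illustration of two ideas named above, regularization and proximal techniques, here is a minimal Python sketch. It is not taken from the course material: it assumes NumPy, a synthetic data set, and the hypothetical helper names `ridge_fit` and `soft_threshold`. It shows regularized least squares (Tikhonov regularization) in closed form, and the soft-thresholding operator that serves as the basic step of proximal methods for sparsity.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Regularized least squares (hypothetical helper).

    Minimizes (1/n) * ||X w - y||^2 + lam * ||w||^2 over w.
    Setting the gradient to zero gives (X^T X + n * lam * I) w = X^T y.
    """
    n, d = X.shape
    return np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)


def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1 (hypothetical helper).

    Shrinks each coordinate toward zero by t and sets small ones to zero.
    """
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
w_true = np.array([1.0, 0.0, -2.0, 0.0, 0.5])
y = X @ w_true + 0.1 * rng.standard_normal(50)

# A small penalty stays close to the data fit; a large one shrinks the solution.
w_small = ridge_fit(X, y, lam=1e-3)
w_large = ridge_fit(X, y, lam=10.0)

print("norm with small lam:", np.linalg.norm(w_small))
print("norm with large lam:", np.linalg.norm(w_large))
print("soft-thresholded:", soft_threshold(np.array([3.0, -0.5, 1.0]), 1.0))
```

The regularization parameter `lam` trades data fit against the size of the solution: increasing it shrinks the norm of the estimated weights.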

The final part of the course will focus on deep learning networks. It will introduce a theoretical framework connecting the computations within the layers of deep learning networks to kernel machines. It will study an extension of the convolutional layers in order to deal with more general invariance properties and to learn them from implicitly supervised data. This theory of hierarchical architectures may explain how the visual cortex learns, in an implicitly supervised way, data representations that can lower the sample complexity of a final supervised learning stage.

The goal of this class is to provide students with the theoretical knowledge and the basic intuitions needed to use and develop effective machine learning solutions to challenging problems.

## Prerequisites

We will make extensive use of linear algebra, basic functional analysis (we cover the essentials in class and during the math camp), and basic concepts in probability theory and concentration of measure (also covered in class and during the math camp). Students are expected to be familiar with MATLAB.

## Grading

Requirements for grading (other than attending lectures) are: 2 problem sets and a final project.

## Problem Sets

Problem Set 1: Posted: Oct. 14. Due date extended to Mon., Nov. 02.

Problem Set 2: Posted: Nov. 12. Due date: Nov. 30.

Problem Set 2 submission process changed: please refer to the new instructions sent to the mailing list.

## Projects

Project request: Fill in the online form (by Nov. 25).

Tentative list of project topic examples (updated regularly): g-docs

The course project can be any of the following:

- Wikipedia: Editing or creating new Wikipedia entries on a topic from the course syllabus. A list of previously created or edited Wikipedia entries that resulted from course projects is available.
- Coding: An implementation of one of the algorithms presented in class, integrated into the GURLS open-source library.
- Exercises: Designing (and solving) problems for the various course topics.

## Wikipedia articles (instructions)

You should use the standard Wikipedia article format, follow Wikipedia layout, style, and content rules, and create Sandbox pages for drafting and previewing your articles.

## Dates

- Nov. 18 (class 20): projects open/abstract submission (online form) (Nov. 18 - Nov. 25)
- Nov. 25 (class 22): hard deadline
- Dec. 16: final project submission

## Syllabus

Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.


## Reading List

Notes covering the classes will be provided in the form of independent chapters of a book currently in draft format. Additional information will be given through the slides associated with classes (where applicable). The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of suggested readings will also be provided separately for each class.

## Book (draft)

- L. Rosasco and T. Poggio. Machine Learning: a Regularization Approach. MIT-9.520 Lecture Notes, Manuscript, Dec. 2015 (will be provided).

## Primary References

- O. Bousquet, S. Boucheron, and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence 3176, 169-207. Eds. O. Bousquet, U. von Luxburg, and G. Ratsch. Springer, Heidelberg, Germany, 2004.
- N. Cristianini and J. Shawe-Taylor. Introduction to Support Vector Machines. Cambridge, 2000.
- F. Cucker and S. Smale. On the Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.
- F. Cucker and D-X. Zhou. Learning Theory: An Approximation Theory Viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2007.
- L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.
- T. Evgeniou, M. Pontil, and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.
- T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.
- I. Steinwart and A. Christmann. Support Vector Machines. Springer, New York, 2008.
- V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
- V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
- S. Villa, L. Rosasco, and T. Poggio. On Learning, Complexity and Stability. "Empirical Inference, Festschrift in Honor of Vladimir N. Vapnik", Chapter 7, pages 59-70. Eds. B. Scholkopf, Z. Luo, and V. Vovk. Springer-Verlag Berlin and Heidelberg GmbH, 2013.

## Background Mathematics References

- A. N. Kolmogorov and S. V. Fomin. Introductory Real Analysis. Dover Publications, 1975.
- A. N. Kolmogorov and S. V. Fomin. Elements of the Theory of Functions and Functional Analysis. Dover Publications, 1999.
- Luenberger. Optimization by Vector Space Methods. Wiley, 1969.

## Neuroscience Related References

- T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio. "Object Recognition with Cortex-like Mechanisms", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 3, 411-426, 2007.
- T. Serre, A. Oliva, and T. Poggio. "A Feedforward Architecture Accounts for Rapid Categorization", Proceedings of the National Academy of Sciences (PNAS), Vol. 104, No. 15, 6424-6429, 2007.
- S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio. "Mathematics of the Neural Response", Foundations of Computational Mathematics, Vol. 10, 1, 67-91, June 2009.
- F. Anselmi, J. Z. Leibo, L. Rosasco, J. Mutch, A. Tacchetti, and T. Poggio. "Unsupervised Learning of Invariant Representations in Hierarchical Architectures", Theoretical Computer Science, 2014.
- F. Anselmi, L. Rosasco, and T. Poggio. "On Invariance and Selectivity in Representation Learning", arXiv:1503.05938, 2015.