Units: 3-0-9 H,G

Class Times: Monday and Wednesday, 1:00 pm - 2:30 pm

Location: 46-3310 (PILM Seminar Room)

Instructors: Tomaso Poggio (TP), Lorenzo Rosasco (LR), Alexander Rakhlin (AR), Andrzej Banburski (AB)

TAs: Michael Lee, Nhat Le, David Zhou

Office Hours: Friday 1:00 pm - 2:00 pm, 46-5156 (Poggio lab lounge) and/or 46-5165 (MIBR Reading Room)

Email Contact: 9.520@mit.edu

Previous Class: FALL 2017, 2017 lecture videos

Registration: Please register for 9.520/6.860 by filing this registration form

Mailing list: Registered students will be added to the course mailing list (9520students)

Stellar page: http://stellar.mit.edu/S/course/9/fa18/9.520/

Course description

The course covers foundations and recent advances of machine learning from the point of view of statistical learning and regularization theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that can solve complex tasks, until recently the exclusive domain of biological organisms, such as computer vision, speech recognition or natural language understanding: cameras recognize faces, smart phones understand voice commands, smart speakers/assistants answer questions and cars can see and avoid obstacles. The machine learning algorithms that are at the roots of these success stories are trained with examples rather than programmed to solve a task.

Among different approaches in modern machine learning, the course focuses on a regularization perspective and includes both shallow and deep networks. The content is roughly divided into three parts. In the first part, key algorithmic ideas are introduced, with an emphasis on the interplay between modeling and optimization aspects. Algorithms that will be discussed include classical regularization networks (regularized least squares, SVM, logistic regression), stochastic gradient methods, implicit regularization, sketching, sparsity-based methods and deep neural networks.

In the second part, key ideas in statistical learning theory will be developed to analyze the properties of the algorithms previously introduced. Classical concepts like generalization, uniform convergence and Rademacher complexities will be developed, together with topics such as bounds based on margin, stability, and privacy.

The final part of the course focuses on deep learning networks. It will introduce an emerging theoretical framework addressing three key puzzles in deep learning: approximation theory (which functions can be represented more efficiently by deep networks than by shallow networks), optimization theory (why stochastic gradient descent can easily find global minima), and machine learning (whether classical learning theory can explain generalization in deep networks). It will also discuss connections with the architecture of visual cortex, which was the original inspiration for the layered local connectivity of modern networks and may provide ideas for future developments of deep learning.
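As a rough illustration of the first algorithm named above, regularized least squares for a linear model can be sketched in a few lines. This is a minimal sketch and not course material: the function names, the toy data, and the convention of scaling the regularization parameter `lam` by the number of examples are all assumptions made for this example. It uses only the Python standard library.

```python
# Regularized least squares (Tikhonov / ridge) for a linear model.
# Assumed convention: solve (X^T X + lam * n * I) w = X^T y.


def solve_linear_system(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], so row operations act on both sides.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Use the largest available pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def regularized_least_squares(X, y, lam):
    """Return the weights minimizing squared error plus lam * n * ||w||^2."""
    n, d = len(X), len(X[0])
    # Gram matrix X^T X with the regularization term added on the diagonal.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) + (lam * n if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    # Right-hand side X^T y.
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(d)]
    return solve_linear_system(A, b)


# Toy data (hypothetical): first column is a constant bias feature, y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w_unregularized = regularized_least_squares(X, y, lam=0.0)
w_regularized = regularized_least_squares(X, y, lam=1.0)
```

With `lam=0.0` the sketch reduces to ordinary least squares and recovers the generating weights on this noiseless toy data; a positive `lam` shrinks the weight vector toward zero, trading some fit for stability.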

The goal of the course is to provide students with the theoretical knowledge and the basic intuitions needed to use and develop effective machine learning solutions to challenging problems.

Prerequisites

We will make extensive use of basic notions of calculus, linear algebra and probability. The essentials are covered in class and in the math camp material. We will introduce a few concepts in functional/convex analysis and optimization. Note that this is an advanced graduate course and some exposure to introductory machine learning concepts or courses is expected. Students are also expected to have basic familiarity with MATLAB/Octave.

Grading

Requirements for grading are lecture attendance/participation (10%), four problem sets (60%) and a final project (30%).

Grading policies, pset and project tentative dates: (slides)

Problem Sets

Problem Set 1, out: Sep. 19, due: Tue., Sep. 25 (Class 07).

Problem Set 2, out: Oct. 03, due: Tue., Oct. 09 (Class 10).

Problem Set 3, out: Oct. 31, due: Sat., Nov. 10 (Class 18).

Problem Set 4, out: Nov. 14, due: Tue., Nov. 20 (Class 21).

Submission instructions: Follow the instructions included with the problem set. Use the LaTeX template for the report (there is a maximum page limit). Submit your report online through Stellar by the due date/time, and hand in a printout in the first class after the due date.

Projects

Guidelines and key dates. Online form for project proposal (complete by Nov. 01).

Reports are expected to be extended abstracts of at most 5 pages, written using the NIPS style files.

Projects archive

List of Wikipedia entries created or edited as part of projects during previous course offerings.

Syllabus

Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

Class Date Title Instructor(s)

Reading List

Notes covering the classes will be provided in the form of independent chapters of a book currently in draft format. Additional information will be given through the slides associated with classes (where applicable). The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of additional suggested readings will also be provided separately for each class.

## Book (draft)

- L. Rosasco and T. Poggio. Machine Learning: a Regularization Approach, MIT-9.520 Lectures Notes, Manuscript, Dec. 2017 (provided).

## Primary References

- S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. 2nd Ed., Springer, 2009.
- I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
- O. Bousquet, S. Boucheron and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, LNCS 3176, pp. 169-207. (Eds.) Bousquet, O., U. von Luxburg and G. Ratsch, Springer, 2004.
- N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.
- F. Cucker and S. Smale. On The Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.
- F. Cucker and D-X. Zhou. Learning theory: an approximation theory viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2007.
- L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.
- T. Evgeniou, M. Pontil and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.
- T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.
- V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
- V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.
- S. Villa, L. Rosasco, T. Poggio. On Learnability, Complexity and Stability. Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik, Chapter 7, pp. 59-70, Springer-Verlag, 2013.
- T. Poggio and F. Anselmi. Visual Cortex and Deep Networks: Learning Invariant Representations. Computational Neuroscience Series, MIT Press, 2016.
- T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda, and Q. Liao. Why and When can Deep-but not Shallow-Networks Avoid the Curse of Dimensionality: A Review. International Journal of Automation and Computing, 1-17, 2017.
- T. Poggio and Q. Liao. Theory II: Landscape of the Empirical Risk in Deep Learning. CBMM Memo 66, 2017.

## Resources and links

- Machine Learning 2017-2018. University of Genoa, graduate ML course.
- L. Rosasco. Introductory Machine Learning Notes, University of Genoa, ML 2016/2017 lectures notes, Oct. 2016.

Announcements

- [10/19] There will be one fewer problem set; see the updated dates above.
- [09/05] Lectures 13 and 14 (Oct. 22 and 24) will take place in Building 34 Room 101.