Units: 3-0-9 H,G
Class Times: Monday and Wednesday, 1:00 pm - 2:30 pm
Location: 46-3310
Instructors:
TAs: Hongyi Zhang, Max Kleiman-Weiner, Jon Malmaud, Brando Miranda, Xavier Boix, Georgios Evangelopoulos

Office Hours: Thursday 3-4 pm, 46-5156 (Poggio Lab lounge)
Email Contact: 9.520@mit.edu
Previous Class: FALL 2015, lecture videos
Further Info: 9.520/6.860 is currently NOT using the Stellar system
Registration: Please register for 9.520/6.860 by filing this registration form
Mailing list: Registered students will be added to the course mailing list (9520students)

Course description

The course covers foundations and recent advances of Machine Learning from the point of view of Statistical Learning and Regularization Theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that can solve complex tasks previously considered out of reach: cameras recognize faces, smartphones understand your voice, and cars can see and avoid obstacles.

The machine learning algorithms at the root of these success stories are trained with labeled examples rather than programmed to solve a task. Among the approaches in modern machine learning, the course focuses on regularization techniques, which provide a theoretical foundation for high-dimensional supervised learning. Besides classic approaches such as Support Vector Machines, the course covers state-of-the-art techniques exploiting data geometry (aka manifold learning), sparsity and a variety of algorithms for supervised learning (batch and online), feature selection, structured prediction and multitask learning. Concepts from optimization theory useful for machine learning are covered in some detail (first-order methods, proximal/splitting techniques, ...).

The final part of the course will focus on deep learning networks. It will introduce a theoretical framework connecting the computations within the layers of deep learning networks to kernel machines. It will study an extension of convolutional layers in order to deal with more general invariance properties and to learn them from implicitly supervised data. It will describe new theorems characterizing the class of learning problems for which deep networks -- but not shallow networks -- can avoid the curse of dimensionality. This emerging theory of hierarchical architectures may explain how the visual cortex learns, in an implicitly supervised way, a data representation that can lower the sample complexity of a final supervised learning stage.

The goal of this course is to provide students with the theoretical knowledge and the basic intuitions needed to use and develop effective machine learning solutions to challenging problems.

Prerequisites

We will make extensive use of basic notions of calculus, linear algebra and probability. The essentials are covered in class and during the Math Camp. We will introduce a few concepts in functional/convex analysis and optimization. Note that this is an advanced graduate course, and some exposure to introductory Machine Learning concepts or courses is expected. Students are also expected to be familiar with MATLAB/Octave.

Grading

Requirements for grading are lecture attendance/participation (10%), four problem sets (60%) and a final project (30%).

Slides with grading policies and anticipated dates: (pdf).

Problem Sets

Problem Set 1, out: Sep. 21, due: Sun., Oct. 02 (Class 08).

Problem Set 2, out: Oct. 13, due: Mon., Oct. 24 (Class 13).

Problem Set 3, out: Oct. 27, due: Mon., Nov. 07 (Class 17).

Problem Set 4, out: Nov. 17, due: Tue., Nov. 29 (Class 24).

Submission instructions: Follow the instructions included with each problem set. Use the provided LaTeX template for the writeup (there is a page limit). Submit your writeup online (including code if applicable) by the due date/time, and hand in a printout in the first class after the due date.

Projects

Course projects should be individual research projects, typically focusing on one or more of the following: theory, comparisons/critical evaluations, applications, review or implementation.

Guidelines and key dates. Online form for project proposal (complete by Oct. 31).

- Poster, due: Sun., Dec. 11
- Poster presentations: Mon., Dec. 12
- Final report, due: Thu., Dec. 15

Reports should follow NIPS format and style files: template files

Projects archive

List of Wikipedia entries, created or edited as part of projects during previous course offerings.

Syllabus

Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

Class Date Title Instructor(s)

Reading List

Notes covering the classes will be provided in the form of independent chapters of a book currently in draft format. Additional information will be given through the slides associated with classes (where applicable). The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of additional suggested readings will also be provided separately for each class.

## Book (draft)

- L. Rosasco and T. Poggio. Machine Learning: a Regularization Approach, MIT-9.520 Lectures Notes, Manuscript, Dec. 2015 (provided).

## Primary References

- S. Shalev-Shwartz and S. Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
- T. Hastie, R. Tibshirani and J. Friedman. The Elements of Statistical Learning. 2nd Ed., Springer, 2009.
- I. Steinwart and A. Christmann. Support Vector Machines. Springer, 2008.
- O. Bousquet, S. Boucheron and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, LNCS 3176, pp. 169-207. (Eds.) Bousquet, O., U. von Luxburg and G. Ratsch, Springer, 2004.
- N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, 2000.
- F. Cucker and S. Smale. On The Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.
- F. Cucker and D-X. Zhou. Learning theory: an approximation theory viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2007.
- L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.
- T. Evgeniou, M. Pontil and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.
- T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.
- V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
- V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 2000.
- S. Villa, L. Rosasco, T. Poggio. On Learnability, Complexity and Stability. Empirical Inference: Festschrift in Honor of Vladimir N. Vapnik, Chapter 7, pp. 59-70, Springer-Verlag, 2013.
- T. Poggio and F. Anselmi. Visual Cortex and Deep Networks: Learning Invariant Representations, Computational Neuroscience Series, MIT Press, 2016.

## Resources and links

- Machine Learning 2016-2017. University of Genoa, graduate ML course.
- L. Rosasco. Introductory Machine Learning Notes, University of Genoa, ML 2016/2017 lectures notes, Oct. 2016.

Announcements

- Project report style files and instructions are available.
- Notes on approximation theory and deep networks (Classes 23 and 24) have been released.
- Chapter 10 (Learning Data Representation) has been released. Updates pending.
- Problem Set 4 is out. Due date is Tue, Nov. 29, 11:59pm.
- Chapter 9 (Multi-output Learning) has been released. Updates pending.
- Project feedback has been sent out. You can optionally revise by Nov. 10 -- there is not going to be a new iteration for feedback.
- Chapter 7 (Online Learning) and Chapter 8 (Manifold Regularization) have been released. Updates will follow.
- Project proposal is due on Oct. 31 -- complete the online form (with title and abstract). Guidelines have been posted.
- Problem Set 3 is out. Due date is Mon, Nov. 07, 11:59pm. Check mailing list announcement.
- Chapter 6 (Sparsity), covering Classes 11-14, has been released. Minor updates will follow.
- LR will be having office hours, Fridays 3:00 - 4:00 pm (at 46-5156).
- Problem Set 2 is out. Due date is Mon, Oct. 24, 11:59pm. Check mailing list announcement.
- Chapter 5 (Beyond Penalization) has been released. "Further Reading" suggestions, for classes 06-09, are updated.
- Appendix 2 (Convex Optimization) of the book has been released.
- Chapter 2 (Consistency, Learnability and Regularization) and Chapter 4 (Regularization Networks) of the notes are on the course shared folder. Updated versions are likely to follow.
- Problem Set 1 is out. Due date is Sun, Oct. 02, 11:59pm. Check mailing list announcement.
- Office hours changed to Thursdays 3:00 - 4:00 pm (at 46-5156).
- Link to shared class dropbox has been reset. Check mailing list announcement.
- Chapter 3 (Hypothesis Spaces) of the course notes, covering Classes 04-05, is now on the course shared folder.
- Chapter 1 (Statistical Learning Theory) of the course notes, covering Class 03, has been sent out through the mailing list.
- Slides for the grading requirements and due dates (as discussed in Class 03) have been posted (pdf).