9.520: Statistical Learning Theory and Applications, Fall 2014

Class Times: Monday and Wednesday: 1:00pm - 2:30pm

Units: 3-0-9 H,G

Location: 46-3189

Instructors:
Tomaso Poggio (TP), Lorenzo Rosasco (LR), Carlo Ciliberto (CC), Charlie Frogner (CF), Georgios Evangelopoulos (GE), Ben Deen (BD).

Office Hours: Friday 2-3 pm in 46-5156, CBCL lounge (by appointment)

Email Contact: 9.520@mit.edu

Previous Class: FALL 2013

Further Info: 9.520 is currently NOT using the Stellar system

Course Description

Prerequisites

Grading

Problem Sets

Projects

Syllabus

Reading List

Course description

The class covers foundations and recent advances of Machine Learning in the framework of Statistical Learning Theory.

Understanding intelligence and how to replicate it in machines is arguably one of the greatest problems in science. Learning, its principles and computational implementations, is at the very core of intelligence. During the last decade, for the first time, we have been able to develop artificial intelligence systems that can solve complex tasks previously considered out of reach. Modern cameras recognize faces, smartphones recognize voice commands, cars can see and detect pedestrians, and ATMs automatically read checks. The machine learning algorithms at the root of these success stories are trained with labeled examples rather than programmed to solve a task. Among the approaches in modern machine learning, the course focuses on regularization techniques, which are key to high-dimensional supervised learning. Regularization methods make it possible to treat a large number of diverse approaches in a unified way, while providing tools to design new ones.
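As a rough illustration of what regularization means in this setting, the sketch below fits a one-dimensional linear model by regularized least squares. It is not taken from the course materials: the function name `ridge_1d`, the toy data, and the choice of a squared loss with a squared-norm penalty are assumptions made for the example.

```python
def ridge_1d(xs, ys, lam):
    """Return the scalar w minimizing (1/n) * sum((y - w*x)**2) + lam * w**2.

    Setting the derivative to zero gives the closed form
    w = sum(x*y) / (sum(x*x) + n*lam).
    """
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + n * lam)


# Toy data (hypothetical): y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_unreg = ridge_1d(xs, ys, 0.0)  # plain least squares
w_reg = ridge_1d(xs, ys, 1.0)    # penalized fit, shrunk toward zero
print(w_unreg, w_reg)
```

With `lam = 0` the fit recovers the slope of the data; a positive `lam` shrinks the estimate toward zero, trading some fit on the training examples for stability.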

Starting from classical methods such as Regularization Networks and Support Vector Machines, the course covers state-of-the-art techniques based on the concepts of geometry (aka manifold learning) and sparsity, as well as a variety of algorithms for supervised learning (batch and online), feature selection, structured prediction and multitask learning.

The final part of the course is new and will focus on connections between Radial Basis Functions and deep learning networks. We will also introduce new techniques for the (unsupervised) learning of data representations. In particular, we will present a new theory (M-theory) of hierarchical architectures, motivated by the visual cortex, that might suggest how to learn, in an unsupervised way, a data representation that can lower the sample complexity of a final supervised learning stage.

The goal of this class is to provide students with the knowledge needed to use and develop effective machine learning solutions to challenging problems.

Prerequisites
We will make extensive use of linear algebra, basic functional analysis (we cover the essentials in class and during the math camp), and basic concepts in probability theory and concentration of measure (also covered in class and during the math camp). Students are expected to be familiar with MATLAB.
Grading
Requirements for grading (other than attending lectures) are two problem sets and a final project.

Problem Sets
Instructions: Use the LaTeX template provided to report your work. You are allowed a maximum of ten pages. Do not change the font or margin of the template. Submit the writeup and code scripts in a (zip/tar) file named <LastName_FirstName>_9520_fall2014_pset1 via email to 9.520@mit.edu using as a subject line "[9.520 fall2014 pset1] LastName FirstName". You also have to submit a printed copy of the solutions (not the code!) in class by the due date.
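The naming convention above can be sketched as a small helper. This is only an illustration: the function name `pset_submission_names` is hypothetical, and generalizing from `pset1` to other problem set numbers is an assumption, since the instructions spell out the pattern only for the first problem set. The archive extension (zip or tar) is left to the submitter.

```python
def pset_submission_names(last_name, first_name, pset_number):
    """Build the archive base name and email subject line for a problem set.

    Follows the pattern given in the instructions for pset1; using the same
    pattern for other problem set numbers is an assumption.
    """
    tag = f"pset{pset_number}"
    archive_base = f"{last_name}_{first_name}_9520_fall2014_{tag}"
    subject = f"[9.520 fall2014 {tag}] {last_name} {first_name}"
    return archive_base, subject


# Example with a placeholder student name.
archive_base, subject = pset_submission_names("Doe", "Jane", 1)
print(archive_base)
print(subject)
```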
Projects
Project abstract submission (Due: Nov. 12)
Project topics suggestions: pdf, g-docs

Project draft submission (Due: Dec. 08)

Wikipedia articles
The course project can be a Wikipedia entry (we encourage and recommend this) or an implementation/extension of GURLS. For a Wikipedia entry, you should use the standard Wikipedia article format, follow Wikipedia layout, style and content rules, and create Sandbox pages for experimenting and drafting your articles. Here is a list of previously created or edited Wikipedia entries.

Syllabus

Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

Class Date Title Instructor(s)

Reading List
There is no textbook for this course. All the required information will be presented in the slides associated with each class. The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of suggested readings will also be provided separately for each class.
Primary References

Bousquet, O., S. Boucheron and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence 3176, 169-207. (Eds.) Bousquet, O., U. von Luxburg and G. Rätsch, Springer, Heidelberg, Germany (2004).

F. Cucker and S. Smale. On The Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.

F. Cucker and D-X. Zhou. Learning theory: an approximation theory viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2007.

L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.

T. Evgeniou, M. Pontil and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.

T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.

I. Steinwart and A. Christmann. Support vector machines. Springer, New York, 2008.

V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

N. Cristianini and J. Shawe-Taylor. Introduction To Support Vector Machines. Cambridge, 2000.

Background Mathematics References

A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, Dover Publications, 1975.

A. N. Kolmogorov and S. V. Fomin, Elements of the Theory of Functions and Functional Analysis, Dover Publications, 1999.

D. G. Luenberger, Optimization by Vector Space Methods, Wiley, 1969.

Neuroscience Related References

Serre, T., L. Wolf, S. Bileschi, M. Riesenhuber and T. Poggio. "Object Recognition with Cortex-like Mechanisms", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 3, 411-426, 2007.

Serre, T., A. Oliva and T. Poggio. "A Feedforward Architecture Accounts for Rapid Categorization", Proceedings of the National Academy of Sciences (PNAS), Vol. 104, No. 15, 6424-6429, 2007.

S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio. "Mathematics of the Neural Response", Foundations of Computational Mathematics, Vol. 10, 1, 67-91, June 2009.
