9.520: Statistical Learning Theory and Applications, Spring 2009

Class Times: Monday and Wednesday 10:30-12:00

Units: 3-0-9 H,G

Location: 46-5193

Instructors: Tomaso Poggio (TP), Ryan Rifkin (RR), Jake Bouvrie (JB), Lorenzo Rosasco (LR), Charlie Frogner (CF)

Office Hours: By appointment

Email Contact: 9.520@mit.edu

Previous Class: SPRING 08

Course description
Focuses on the problem of supervised and unsupervised learning from the perspective of modern statistical learning theory, starting with the theory of multivariate function approximation from sparse data. Develops basic tools such as regularization, including support vector machines for regression and classification. Derives generalization bounds using stability. Discusses current research topics such as manifold regularization, sparsity, feature selection, Bayesian connections and techniques, and online learning. Emphasizes applications in several areas: computer vision, speech recognition, and bioinformatics. Discusses advances in the neuroscience of the cortex and their impact on learning theory and applications. The course is graded on the basis of final projects and hands-on applications and exercises.
Prerequisites
6.867 or permission of instructor. In practice, a substantial level of mathematical maturity is necessary. Familiarity with probability and functional analysis will be very helpful. We try to keep the mathematical prerequisites to a minimum, but we will introduce complicated material at a fast pace.
Grading
There will be two problem sets and a final project. To receive credit, you must attend regularly and put in effort on all problem sets and the project.

Problem sets
Problem set #1: PDF
Problem set #2: PDF -- Due Mon. April 13th (in class)

Projects
Project ideas: PDF

Syllabus
Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

Class	Date	Title	Instructor(s)
Class 01	Wed 04 Feb	The Course at a Glance	TP
Class 02	Mon 09 Feb	The Learning Problem and Regularization	TP
Class 03	Wed 11 Feb	Reproducing Kernel Hilbert Spaces	LR
Mon 16 Feb - Presidents' Day
Class 04	Tue 17 Feb	Regularized Least Squares	RR
Class 05	Wed 18 Feb	Several Views Of Support Vector Machines	RR
Class 06	Mon 23 Feb	Multiclass Classification	RR
Class 07	Wed 25 Feb	Spectral Regularization	LR
Class 08	Mon 02 Mar	Manifold Regularization	LR
Class 09	Wed 04 Mar	Generalization Bounds, Intro to Stability	LR/TP
Class 10	Mon 09 Mar	Stability of Tikhonov Regularization	LR/TP
Class 11	Wed 11 Mar	Sparsity Based Regularization I	LR
Class 12	Mon 16 Mar	Regularization for Multi-Output Learning	LR
Class 13	Wed 18 Mar	Loose ends, Project discussions
SPRING BREAK March 23-27
Class 14	Mon 30 Mar	Sparsity, rank, and all that	Ben Recht
Class 15	Wed 01 Apr	Bayesian Interpretations of Regularization	CF
Class 16	Mon 06 Apr	A Bayesian Perspective on Statistical Learning Theory	Dan Roy
Class 17	Wed 08 Apr	Nonparametric Bayesian Regression and Density Estimation	Vikash
Class 18	Mon 13 Apr	Hierarchical Bayesian Modeling for Unsupervised Learning	Vikash
Class 19	Wed 15 Apr	Geometry and Learning	Partha Niyogi
Mon 20 Apr - Patriots' Day
Class 20	Wed 22 Apr	Demographic forecasting and the role of priors	Federico Girosi
Class 21	Mon 27 Apr	Vision and Visual Neuroscience	TP
Class 22	Wed 29 Apr	Vision and Visual Neuroscience	Thomas Serre
Class 23	Mon 04 May	Derived Kernels	JB
Class 24	Wed 06 May	Application of Belief Nets to Modelling Attention	Sharat/Thomas
Class 25	Mon 11 May	Project Presentations
Class 26	Wed 13 May	Project Presentations

Math Camp	Tue 09 Feb 5:00pm-7:00pm	Probability theory notes
Old Math Camp Slides	XX	Functional analysis
Old Math Camp Slides	XX	Probability theory

Reading List
There is no textbook for this course. All the required information will be presented in the slides associated with each class. The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of suggested readings will also be provided separately for each class.
Primary References

Bousquet, O., S. Boucheron and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence 3176, 169-207. (Eds.) Bousquet, O., U. von Luxburg and G. Rätsch, Springer, Heidelberg, Germany (2004).

F. Cucker and S. Smale. On The Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.

L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.

T. Evgeniou and M. Pontil and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.

T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.

V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

Secondary References

O. Bousquet and A. Elisseeff, Stability and Generalization, Journal of Machine Learning Research, Vol. 2, pp. 499-526, 2002.

N. Cristianini and J. Shawe-Taylor. Introduction To Support Vector Machines. Cambridge, 2000.

Lo Gerfo L., Rosasco L., Odone F., De Vito E. and Verri, A. Spectral Algorithms for Supervised Learning, to appear in Neural Computation.

Poggio, T., R. Rifkin, S. Mukherjee and P. Niyogi. General Conditions for Predictivity in Learning Theory, Nature, Vol. 428, 419-422, 2004 (see also Past Performance and Future Results).

Rifkin, R. and R.A. Lippert. Notes on Regularized Least-Squares, CBCL Paper #268/AI Technical Report #2007-019, Massachusetts Institute of Technology, Cambridge, MA, May, 2007.

Rifkin, R. and A. Klautau. In Defense of One-vs-All Classification, Journal of Machine Learning Research, Vol. 5, 101-141, 2004.

Background Mathematics References

A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, Dover Publications, 1975.

A. N. Kolmogorov and S. V. Fomin, Elements of the Theory of Functions and Functional Analysis, Dover Publications, 1999.

Luenberger, Optimization by Vector Space Methods, Wiley, 1969.

Neuroscience Related References

Serre, T., L. Wolf, S. Bileschi, M. Riesenhuber and T. Poggio. "Object Recognition with Cortex-like Mechanisms", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 3, 411-426, 2007.

Serre, T., A. Oliva and T. Poggio. "A Feedforward Architecture Accounts for Rapid Categorization", Proceedings of the National Academy of Sciences (PNAS), Vol. 104, No. 15, 6424-6429, 2007.

S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio. "Mathematics of the Neural Response", CBCL Paper #276/MIT CSAIL Technical Report #TR2008-070, Massachusetts Institute of Technology, Cambridge, MA, November, 2008.
