9.520: Statistical Learning Theory and Applications, Spring 2011

Class Times: Monday and Wednesday 10:30-12:00

Units: 3-0-9 H,G

Location: 46-5193

Instructors:
Tomaso Poggio (TP), Lorenzo Rosasco (LR), Charlie Frogner (CF), Pavan Mallapragada (PM)

Office Hours: Friday 1-2 pm in 46-5156, CBCL lounge (changed location again!)

Email Contact: 9.520@mit.edu

Previous Class: SPRING 10

Course description

Focuses on the problem of supervised and unsupervised learning from the perspective of modern statistical learning theory, starting with the theory of multivariate function approximation from sparse data. Develops basic tools such as regularization, including support vector machines for regression and classification. Derives generalization bounds using stability. Discusses current research topics such as manifold regularization, sparsity, feature selection, Bayesian connections and techniques, and online learning. Emphasizes applications in several areas: computer vision, speech recognition, and bioinformatics. Discusses advances in the neuroscience of the cortex and their impact on learning theory and applications. Requirements for grading (other than attending lectures) are: scribing one lecture, two problem sets, and a final project.

Prerequisites
6.867 or permission of instructor. In practice, a substantial level of mathematical maturity is necessary. Familiarity with probability and functional analysis will be very helpful. We try to keep the mathematical prerequisites to a minimum, but we will introduce complicated material at a fast pace.

Grading
Requirements for grading (other than attending lectures) are: scribing one lecture, two problem sets, and a final project.

Scribe Notes

New scribe template and usage instructions: here

In this class we will scribe 16 lectures: lectures #2 - #12, #14 - #15, and lectures #17-#19. Each student who is taking this class for credit will be required to scribe one lecture. Scribe notes should be a natural integration of the presentation of the lectures with the material in the slides. The lecture slides are available on this website for your reference. Good scribe notes are important, both for your grades and for other students to read. In particular, please make an effort to present the material in a clear, concise, and comprehensive manner.

Scribe notes must be prepared with LaTeX, using this template. Scribe notes (.tex file and all additional files) should be submitted to 9.520@mit.edu no later than one week after the class. Please make sure to proofread the notes carefully before submitting. We will review the scribe notes to check the technical content and quality of writing. We will also give feedback and ask for a revised version if necessary. Completed scribe notes will be posted on this website as soon as possible.

You can sign up here for scribe notes. If you have problems opening or editing the page, please send us an email at 9.520@mit.edu. In addition, if you have any questions or concerns about the scribing requirement, please feel free to send us an email.

Problem Sets
Problem set #1: PDF PDF | dataset -- due Wednesday, March 16th
Problem set #2: PDF PDF | dataset (prob. 5) | two moons dataset (prob. 3) -- due Monday, April 25th

Projects
Final writeup due Sunday, May 15th, by midnight

Project Ideas Contact: Instructors

Part-based Human Recognition in Videos Contact: Hueihan Jhuang

Solving Large Scale Kernel Machines using Random Features Contact: Nicholas Edelman

Evaluating which Classifiers Work Best for Decoding Neural Data Contact: Ethan Meyers

Does learning from segmented images aid categorization? Contact: Cheston Tan

What can humans see with a single glance? Contact: Cheston Tan

Demo of the motion silencing effect Contact: Cheston Tan

When invariance learning goes wrong Contact: Joel Leibo

More TBA

Syllabus

Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

Class	Date	Title	Instructor(s)	Scribe notes
Class 01	Wed 02 Feb	The Course at a Glance	TP	---
Class 02	Mon 07 Feb	The Learning Problem and Regularization	TP	PDF (2010)
Class 03	Wed 09 Feb	Reproducing Kernel Hilbert Spaces	LR	PDF (2010)
Class 04	Mon 14 Feb	Mercer Theorem, Feature Maps and Representer Theorem	LR	---
Class 05	Wed 16 Feb	Regularized Least Squares	CF	---
Mon 21 Feb - Presidents' Day
Class 06	Tue 22 Feb	Two Views of Support Vector Machines	CF	---
Class 07	Wed 23 Feb	Generalization Bounds, Intro to Stability	LR/TP	---
Class 08	Mon 28 Feb	Stability of Tikhonov Regularization	LR/TP	---
Class 09	Wed 02 Mar	Spectral Regularization	LR	---
Class 10	Mon 07 Mar	Manifold Regularization	LR	---
Class 11	Wed 09 Mar	Regularization for Multi-Output Learning	LR	---
Class 12	Mon 14 Mar	Sparsity Based Regularization	LR	---
Class 13	Wed 16 Mar	Loose ends, Project discussions		---
SPRING BREAK March 21-25
Class 14	Mon 28 Mar	Regularization with Multiple Kernels	LR	---
Class 15	Wed 30 Mar	On-line Learning	LR/TP	---
Class 16	Mon 04 Apr	Reinforcement Learning	David Wingate	---
Class 17	Wed 06 Apr	Bayesian Interpretations of Regularization	CF/TP	---
Class 18	Mon 11 Apr	Nonparametric Bayesian Methods	LR	---
Class 19	Wed 13 Apr	Approximate Inference	Ruslan Salakhutdinov	---
Mon 18 Apr - Patriots' Day
Class 20	Wed 20 Apr	Manifold learning, the heat equation and spectral clustering	Misha Belkin	---
Class 21	Mon 25 Apr	Hierarchical Representation for Learning: Visual Cortex	TP	---
Class 22	Wed 27 Apr	Hierarchical Representation for Learning: Mathematics	LR	---
Class 23	Mon 02 May	Hierarchical Representation for Learning: Model	Jim Mutch/LR/TP	---
Class 24	Wed 04 May	MIT 150		---
Class 25	Mon 09 May	Project Presentations	---	---
Class 26	Wed 11 May	Project Presentations	---	---

Math Camp	Mon 07 Feb 5:30pm - 7:30pm	Functional analysis: slides, notes. Probability theory: notes	---	---
Old Math Camp Slides	XX	Functional analysis
Old Math Camp Slides	XX	Probability theory

Reading List
There is no textbook for this course. All the required information will be presented in the slides associated with each class. The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of suggested readings will also be provided separately for each class.

Primary References

Bousquet, O., S. Boucheron and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning Lecture Notes in Artificial Intelligence 3176, 169-207. (Eds.) Bousquet, O., U. von Luxburg and G. Ratsch, Springer, Heidelberg, Germany (2004)

F. Cucker and S. Smale. On The Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.

F. Cucker and D-X. Zhou. Learning theory: an approximation theory viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2007.

L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.

T. Evgeniou and M. Pontil and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.

T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.

I. Steinwart and A. Christmann. Support vector machines. Springer, New York, 2008.

V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.

V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

N. Cristianini and J. Shawe-Taylor. Introduction To Support Vector Machines. Cambridge, 2000.

Background Mathematics References

A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, Dover Publications, 1975.

A. N. Kolmogorov and S. V. Fomin, Elements of the Theory of Functions and Functional Analysis, Dover Publications, 1999.

Luenberger, Optimization by Vector Space Methods, Wiley, 1969.

Neuroscience Related References

Serre, T., L. Wolf, S. Bileschi, M. Riesenhuber and T. Poggio. "Object Recognition with Cortex-like Mechanisms", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 3, 411-426, 2007.

Serre, T., A. Oliva and T. Poggio. "A Feedforward Architecture Accounts for Rapid Categorization", Proceedings of the National Academy of Sciences (PNAS), Vol. 104, No. 15, 6424-6429, 2007.

S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio. "Mathematics of the Neural Response", Foundations of Computational Mathematics, Vol. 10, 1, 67-91, June 2009.
