- Class Times: Monday and Wednesday 10:30-12:00
- Units: 3-0-9 H,G
- Location: 46-5193
- Instructors: Tomaso Poggio (TP), Ryan Rifkin (RR), Jake Bouvrie (JB), Lorenzo Rosasco (LR), Charlie Frogner (CF)
- Office Hours: By appointment
- Email Contact: 9.520@mit.edu
- Previous Class: SPRING 08

## Course description

Focuses on the problem of supervised and unsupervised learning from the perspective of modern statistical learning theory, starting with the theory of multivariate function approximation from sparse data. Develops basic tools such as regularization, including support vector machines for regression and classification. Derives generalization bounds using stability. Discusses current research topics such as manifold regularization, sparsity, feature selection, Bayesian connections and techniques, and online learning. Emphasizes applications in several areas: computer vision, speech recognition, and bioinformatics. Discusses advances in the neuroscience of the cortex and their impact on learning theory and applications. The course is graded on the basis of final projects and hands-on applications and exercises.

## Prerequisites

6.867 or permission of instructor. In practice, a substantial level of mathematical maturity is necessary. Familiarity with probability and functional analysis will be very helpful. We try to keep the mathematical prerequisites to a minimum, but we will introduce complicated material at a fast pace.

## Grading

There will be two problem sets and a final project. To receive credit, you must attend regularly, and put in effort on all problem sets and the project.

## Problem sets

Problem set #1:

Problem set #2: Due Mon. April 13th (in class)

## Projects

Project ideas:

## Syllabus

Follow the link for each class to find a detailed description, suggested readings, and class slides. Some of the later classes may be subject to reordering or rescheduling.

| Class | Date | Title | Instructor(s) |
|---|---|---|---|
| Class 01 | Wed 04 Feb | The Course at a Glance | TP |
| Class 02 | Mon 09 Feb | The Learning Problem and Regularization | TP |
| Class 03 | Wed 11 Feb | Reproducing Kernel Hilbert Spaces | LR |
| | Mon 16 Feb | President's Day | |
| Class 04 | Tue 17 Feb | Regularized Least Squares | RR |
| Class 05 | Wed 18 Feb | Several Views Of Support Vector Machines | RR |
| Class 06 | Mon 23 Feb | Multiclass Classification | RR |
| Class 07 | Wed 25 Feb | Spectral Regularization | LR |
| Class 08 | Mon 02 Mar | Manifold Regularization | LR |
| Class 09 | Wed 04 Mar | Generalization Bounds, Intro to Stability | LR/TP |
| Class 10 | Mon 09 Mar | Stability of Tikhonov Regularization | LR/TP |
| Class 11 | Wed 11 Mar | Sparsity Based Regularization I | LR |
| Class 12 | Mon 16 Mar | Regularization for Multi-Output Learning | LR |
| Class 13 | Wed 18 Mar | Loose ends, Project discussions | |
| | March 23-27 | SPRING BREAK | |
| Class 14 | Mon 30 Mar | Sparsity, rank, and all that | Ben Recht |
| Class 15 | Wed 01 Apr | Bayesian Interpretations of Regularization | CF |
| Class 16 | Mon 06 Apr | A Bayesian Perspective on Statistical Learning Theory | Dan Roy |
| Class 17 | Wed 08 Apr | Nonparametric Bayesian Regression and Density Estimation | Vikash |
| Class 18 | Mon 13 Apr | Hierarchical Bayesian Modeling for Unsupervised Learning | Vikash |
| Class 19 | Wed 15 Apr | Geometry and Learning | Partha Niyogi |
| | Mon 20 Apr | Patriot's Day | |
| Class 20 | Wed 22 Apr | Demographic forecasting and the role of priors | Federico Girosi |
| Class 21 | Mon 27 Apr | Vision and Visual Neuroscience | TP |
| Class 22 | Wed 29 Apr | Vision and Visual Neuroscience | Thomas Serre |
| Class 23 | Mon 04 May | Derived Kernels | JB |
| Class 24 | Wed 06 May | Application of Belief Nets to Modelling Attention | Sharat/Thomas |
| Class 25 | Mon 11 May | Project Presentations | |
| Class 26 | Wed 13 May | Project Presentations | |

Math Camp: Tue 09 Feb, 5:00pm-7:00pm

- Probability theory notes
- Old Math Camp Slides XX Functional analysis
- Old Math Camp Slides XX Probability theory

## Reading List

There is no textbook for this course. All the required information will be presented in the slides associated with each class. The books/papers listed below are useful general reference reading, especially from the theoretical viewpoint. A list of suggested readings will also be provided separately for each class.

## Primary References

- Bousquet, O., S. Boucheron and G. Lugosi. Introduction to Statistical Learning Theory. Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence 3176, 169-207. (Eds.) Bousquet, O., U. von Luxburg and G. Ratsch, Springer, Heidelberg, Germany (2004).
- F. Cucker and S. Smale. On The Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 2002.
- L. Devroye, L. Gyorfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer, 1997.
- T. Evgeniou, M. Pontil and T. Poggio. Regularization Networks and Support Vector Machines. Advances in Computational Mathematics, 2000.
- T. Poggio and S. Smale. The Mathematics of Learning: Dealing with Data. Notices of the AMS, 2003.
- V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
- V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.

## Secondary References

- O. Bousquet and A. Elisseeff, Stability and Generalization, Journal of Machine Learning Research, Vol. 2, pp. 499-526, 2002.
- N. Cristianini and J. Shawe-Taylor. Introduction To Support Vector Machines. Cambridge, 2000.
- Lo Gerfo L., Rosasco L., Odone F., De Vito E. and Verri, A. Spectral Algorithms for Supervised Learning, to appear in Neural Computation.
- Poggio, T., R. Rifkin, S. Mukherjee and P. Niyogi. General Conditions for Predictivity in Learning Theory, Nature, Vol. 428, 419-422, 2004 (see also Past Performance and Future Results).
- Rifkin, R. and R.A. Lippert. Notes on Regularized Least-Squares, CBCL Paper #268/AI Technical Report #2007-019, Massachusetts Institute of Technology, Cambridge, MA, May, 2007.
- Rifkin, R. and A. Klautau. In Defense of One-vs-All Classification, Journal of Machine Learning Research, Vol. 5, 101-141, 2004.
## Background Mathematics References

- A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, Dover Publications, 1975.
- A. N. Kolmogorov and S. V. Fomin, Elements of the Theory of Functions and Functional Analysis, Dover Publications, 1999.
- Luenberger, Optimization by Vector Space Methods, Wiley, 1969.
## Neuroscience Related References

- Serre, T., L. Wolf, S. Bileschi, M. Riesenhuber and T. Poggio. "Object Recognition with Cortex-like Mechanisms", IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 3, 411-426, 2007.
- Serre, T., A. Oliva and T. Poggio. "A Feedforward Architecture Accounts for Rapid Categorization", Proceedings of the National Academy of Sciences (PNAS), Vol. 104, No. 15, 6424-6429, 2007.
- S. Smale, L. Rosasco, J. Bouvrie, A. Caponnetto, and T. Poggio. "Mathematics of the Neural Response", CBCL Paper #276/MIT CSAIL Technical Report #TR2008-070, Massachusetts Institute of Technology, Cambridge, MA, November, 2008.