6.883: Online Methods in Machine Learning

Theory and Applications

MW 2:30-4, Room 32-124

Instructor: Alexander Rakhlin

TA: Arthur Flajolet

Course description:

In a growing number of machine learning applications—such as problems of advertisement placement, movie recommendation, and node or link prediction in evolving networks—one must make online, real-time decisions and continuously improve performance with the sequential arrival of data. The course aims to provide a foundation for the development of such online methods and for their analysis.

The course will have three interleaved components:
  • fundamental theoretical tools for analyzing online methods,
  • algorithmic techniques for developing computationally efficient methods, and
  • applications to real-world problems.
We will outline important connections to statistics, probability, optimization, and game theory. Since the topic of the course is fairly new, we will also point out many open questions and directions for further research.

Prerequisites: Probability and Linear Algebra.
Target audience: graduate students or advanced undergraduates.


Tentative list of topics:

  • Introduction
      Mind-reading machine. Bit prediction. Cover's result.
  • Online methods for i.i.d. data
    • Stochastic approximation and sample average approximation.
    • Stochastic gradient descent. Sparse updates.
    • Minibatching. Acceleration.
    • Pegasos. SGD for Kernel SVM. Perceptron.
    • Logistic regression. Multiclass learning.
    • Neural nets. Backprop via SGD.
    • Online-to-batch conversion. Bias-variance tradeoff and early stopping.
  • Online methods for dynamic environments
    • Online prediction as an optimization problem.
    • Bit prediction. Dynamic programming. Prediction with side information.
    • Learning in static networks. Laplacian regularization.
  • Probabilistic toolkit
    • Stochastic processes, martingales, maximal inequalities, symmetrization.
    • Sequential complexity measures: random averages, covering numbers, combinatorial parameters.
    • Comparison to the Vapnik-Chervonenkis theory.
  • Optimization toolkit
    • Online convex optimization.
    • Gradient descent, mirror descent, dual averaging, exponential weights.
  • Minimax analysis
    • Learning with absolute loss. Multiclass classification. Online linear optimization.
  • Algorithmic toolkit
    • The general relaxation framework. Approximate dynamic programming.
    • Relaxations for i.i.d. covariates. Transductive learning.
    • Algorithms for online classification and regression.
    • The magic of random playout: general techniques, follow-the-perturbed-leader.
    • Examples: linear classes, kernel methods, static experts, online shortest path.
    • Relaxations for multiclass learning.
    • Approximation algorithms for online relaxations.
  • Learning in evolving networks
    • Online prediction of user attributes; node and link classification.
    • Random playout for evolving graphs.
    • Experiments with social network data.
  • Online collaborative filtering and movie rating prediction.
  • Partial information problems
    • Multi-armed bandits, contextual bandits. Ad placement.
    • Dynamic pricing, partial monitoring.
  • Online regression
    • Minimax analysis, algorithms.
  • Adaptive online learning methods, data-dependent bounds
  • Scoring online competitions: tracking performance of adaptively chosen hypotheses
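
To give a taste of the algorithms listed above, here is a minimal sketch (not taken from the course notes) of the exponential weights method for prediction with expert advice; the function name, the choice of absolute loss, and the learning rate are illustrative assumptions.

```python
import math

def exponential_weights(expert_predictions, outcomes, eta=0.5):
    """Exponential weights (Hedge) for prediction with expert advice.

    expert_predictions: T rounds, each a list of K expert predictions in [0, 1].
    outcomes: T true outcomes in [0, 1].
    eta: learning rate (illustrative default).
    Returns the learner's cumulative absolute loss.
    """
    k = len(expert_predictions[0])
    weights = [1.0] * k
    total_loss = 0.0
    for preds, y in zip(expert_predictions, outcomes):
        w_sum = sum(weights)
        # Predict the weighted average of the experts' predictions.
        p = sum(w * x for w, x in zip(weights, preds)) / w_sum
        total_loss += abs(p - y)
        # Multiplicatively down-weight each expert by its loss this round.
        weights = [w * math.exp(-eta * abs(x - y))
                   for w, x in zip(weights, preds)]
    return total_loss
```

On a sequence where one expert is always correct, the learner's cumulative loss stays bounded by a constant (roughly log K / eta plus lower-order terms), independent of the horizon.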

Lecture notes

Recitation notes (by Arthur)



  • Homework 1 posted on Stellar, due April 4
  • Homework 2 posted on Stellar, due April 25
  • Homework 3 posted on Stellar, due May 6
  • Final project guidelines


Tutorial on online supervised learning with applications to node classification in social networks.

Books (not required)


Office hours: 32-D630, Tue 1:30-2:30pm, or by appointment