Statistical Reinforcement Learning and Decision Making

9.S915 Fall 2022

Course Description: The course will focus on the statistical and algorithmic foundations of decision making and reinforcement learning. Special attention will be paid to function approximation and flexible model classes such as neural networks. Topics covered include multi-armed and contextual bandits, structured bandits, and reinforcement learning. The course will present a unifying framework for addressing the exploration-exploitation dilemma using both frequentist and Bayesian approaches, with connections and parallels between supervised learning/estimation and decision making as an overarching theme.

Instructors: Dylan J. Foster and Alexander Rakhlin.

Target Audience: Graduate or advanced undergraduate students.

Prerequisites: Probability at the level of 6.041, or permission of the instructor.

Grading: Based on three problem sets.

Details: Canvas website.

Course notes

Additional Resources:

Lectures:

Introduction
Lecture 01: Introduction
Lecture 02: Statistical Learning. Online Supervised Learning
Lecture 03: Online Supervised Learning

Multi-Armed Bandits
Lecture 04: Multi-Armed Bandits
Lecture 05: Optimism in the Face of Uncertainty: Upper Confidence Bound (UCB) algorithm
Lecture 06: Posterior Sampling Methods

Contextual Bandits
Lecture 07: Optimism with a Finite Class
Lecture 08: Linear Models and LinUCB. Failure of Optimism.
Lecture 09: eps-Greedy. Inverse Gap Weighting

Structured Bandits
Lecture 10: Optimism for Structured Bandits. Eluder Dimension.
Lecture 11: Decision-Estimation Coefficient. The E2D Meta-Algorithm
Lecture 12: Examples. Inverse Gap Weighting. Optimal G-design.
Lecture 13: Examples. Connections to Optimism and Posterior Sampling.

Intro to RL
Lecture 14: Finite-Horizon Episodic MDPs
Lecture 15: Bellman Optimality. Performance-Difference Lemma. Optimism.
Lecture 16: Optimism and UCB-VI

General Decision Making
Lecture 17: Decision-Estimation Coefficient.
Lecture 18: E2D Algorithm. Online Oracles for Hellinger and KL.
Lecture 19: Lower Bound and Examples
Lecture 20: Proof of the Lower Bound

Reinforcement Learning
Lecture 21: Tabular RL: DEC and the PC-IGW Algorithm
Lecture 22: PC-IGW Analysis
Lecture 23: Function Approximation. Realizability. Linear Q* and Linear MDPs
Lecture 24: LSVI-UCB Algorithm and Analysis
Lecture 25: Bellman Rank. Examples. BiLinUCB Algorithm
Lecture 26: BiLinUCB Analysis
Lecture 27: Conclusions / Open Directions
