Statistical Reinforcement Learning and Decision Making

9.S915 Fall 2022

Lectures:

Introduction

  - Lecture 01: Introduction
  - Lecture 02: Statistical Learning. Online Supervised Learning
  - Lecture 03: Online Supervised Learning

Multi-Armed Bandits

  - Lecture 04: Multi-Armed Bandits
  - Lecture 05: Optimism in the Face of Uncertainty: the Upper Confidence Bound (UCB) Algorithm
  - Lecture 06: Posterior Sampling Methods

Contextual Bandits

  - Lecture 07: Optimism with a Finite Class
  - Lecture 08: Linear Models and LinUCB. Failure of Optimism
  - Lecture 09: ε-Greedy. Inverse Gap Weighting

Structured Bandits

  - Lecture 10: Optimism for Structured Bandits. Eluder Dimension
  - Lecture 11: Decision-Estimation Coefficient. The E2D Meta-Algorithm
  - Lecture 12: Examples. Inverse Gap Weighting. G-Optimal Design
  - Lecture 13: Examples. Connections to Optimism and Posterior Sampling

Intro to RL

  - Lecture 14: Finite-Horizon Episodic MDPs
  - Lecture 15: Bellman Optimality. Performance-Difference Lemma. Optimism
  - Lecture 16: Optimism and UCB-VI

General Decision Making

  - Lecture 17: Decision-Estimation Coefficient
  - Lecture 18: E2D Algorithm. Online Oracles for Hellinger and KL
  - Lecture 19: Lower Bound and Examples
  - Lecture 20: Proof of the Lower Bound

Reinforcement Learning

  - Lecture 21: Tabular RL: DEC and the PC-IGW Algorithm
  - Lecture 22: PC-IGW Analysis
  - Lecture 23: Function Approximation. Realizability. Linear Q* and Linear MDPs
  - Lecture 24: LSVI-UCB Algorithm and Analysis
  - Lecture 25: Bellman Rank. Examples. BiLinUCB Algorithm
  - Lecture 26: BiLinUCB Analysis
  - Lecture 27: Conclusions and Open Directions