# Statistical Reinforcement Learning and Decision Making

### 9.S915 Fall 2022

*Notes will be available here soon.*

### Lectures:

#### Introduction

- Lecture 01: Introduction
- Lecture 02: Statistical Learning. Online Supervised Learning
- Lecture 03: Online Supervised Learning
#### Multi-Armed Bandits

- Lecture 04: Multi-Armed Bandits
- Lecture 05: Optimism in the Face of Uncertainty: Upper Confidence Bound (UCB) algorithm
- Lecture 06: Posterior Sampling Methods
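
As a concrete companion to Lectures 04–05, here is a minimal sketch of the UCB index rule on Bernoulli arms. The arm means, horizon, and confidence parameter below are illustrative choices, not values from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

def ucb(means, horizon=2000, delta=0.01):
    """Minimal UCB sketch for Bernoulli arms; returns average reward."""
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    total = 0.0
    for t in range(horizon):
        if t < K:
            arm = t  # pull each arm once to initialize
        else:
            # optimism in the face of uncertainty: empirical mean + bonus
            bonus = np.sqrt(2 * np.log(1 / delta) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        reward = float(rng.random() < means[arm])  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total / horizon

avg = ucb([0.3, 0.5, 0.7])
```

With a logarithmic number of pulls wasted on suboptimal arms, the average reward concentrates near the best mean (0.7 here).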
#### Contextual Bandits

- Lecture 07: Optimism with a Finite Class
- Lecture 08: Linear Models and LinUCB. Failure of Optimism
- Lecture 09: eps-Greedy. Inverse Gap Weighting
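
The inverse gap weighting rule from Lecture 09 admits a very short implementation: play each suboptimal action with probability inversely proportional to its estimated reward gap, and put the remaining mass on the empirical best action. The estimated rewards and the parameter `gamma` below are illustrative assumptions:

```python
import numpy as np

def igw_distribution(rewards, gamma=10.0):
    """Inverse Gap Weighting over K actions given estimated rewards."""
    rewards = np.asarray(rewards, dtype=float)
    K = len(rewards)
    best = int(np.argmax(rewards))
    p = np.zeros(K)
    for a in range(K):
        if a != best:
            # probability shrinks as the estimated gap to the best action grows
            p[a] = 1.0 / (K + gamma * (rewards[best] - rewards[a]))
    p[best] = 1.0 - p.sum()  # remaining mass on the greedy action
    return p

p = igw_distribution([0.2, 0.5, 0.9], gamma=20.0)
```

Larger `gamma` means more exploitation; `gamma -> 0` approaches uniform exploration.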
#### Structured Bandits

- Lecture 10: Optimism for Structured Bandits. Eluder Dimension
- Lecture 11: Decision-Estimation Coefficient. The E2D Meta-Algorithm
- Lecture 12: Examples. Inverse Gap Weighting. Optimal G-design
- Lecture 13: Examples. Connections to Optimism and Posterior Sampling
#### Intro to RL

- Lecture 14: Finite-Horizon Episodic MDPs
- Lecture 15: Bellman Optimality. Performance-Difference Lemma. Optimism
- Lecture 16: Optimism and UCB-VI
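
The backward-induction computation behind Lectures 14–15 (Bellman optimality in a tabular finite-horizon MDP) can be sketched as follows; the toy transition matrix and rewards are made up purely for illustration:

```python
import numpy as np

def finite_horizon_value_iteration(P, R, H):
    """Backward induction for a finite-horizon tabular MDP.

    P[s, a, s']: transition probabilities; R[s, a]: expected reward; H: horizon.
    Returns optimal values V[h, s] and a greedy policy pi[h, s].
    """
    S, A, _ = P.shape
    V = np.zeros((H + 1, S))           # V[H] = 0 at the terminal stage
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):     # Bellman optimality, backward in h
        Q = R + P @ V[h + 1]           # Q[s, a] = R[s, a] + E_{s'}[V_{h+1}(s')]
        V[h] = Q.max(axis=1)
        pi[h] = Q.argmax(axis=1)
    return V, pi

# Toy 2-state MDP: action 0 stays put, action 1 switches state;
# reward 1 for taking action 0 in state 1, else 0.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0], [1.0, 0.0]])
V, pi = finite_horizon_value_iteration(P, R, H=3)
```

From state 0 with horizon 3, the optimal plan is to switch once and then stay, collecting reward 2; from state 1 it is to stay all three steps, collecting 3.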
#### General Decision Making

- Lecture 17: Decision-Estimation Coefficient
- Lecture 18: E2D Algorithm. Online Oracles for Hellinger and KL.
- Lecture 19: Lower Bound and Examples
- Lecture 20: Proof of the Lower Bound
#### Reinforcement Learning

- Lecture 21: Tabular RL: DEC and the PC-IGW Algorithm
- Lecture 22: PC-IGW Analysis
- Lecture 23: Function Approximation. Realizability. Linear Q* and Linear MDPs
- Lecture 24: LSVI-UCB Algorithm and Analysis
- Lecture 25: Bellman Rank. Examples. BiLinUCB Algorithm
- Lecture 26: BiLinUCB Analysis
- Lecture 27: Conclusions / Open Directions