Course Description:
The course will focus on the statistical and algorithmic foundations
of decision making and reinforcement learning. Special attention will
be paid to function approximation and flexible model classes such as
neural networks. Topics covered include multi-armed and contextual
bandits, structured bandits, and reinforcement learning. The course
will present a unifying framework for addressing the
exploration-exploitation dilemma using both frequentist and Bayesian
approaches. An overarching theme is the connection between supervised
learning/estimation and decision making.
This year, we will give particular focus to Large Language Models (LLMs) and their training, deriving and analyzing popular reinforcement learning (RL) methods for LLM fine-tuning. Students should expect hands-on homework in addition to theoretical analysis.
Instructors: Dylan J. Foster and Alexander Rakhlin.
Target Audience: Graduate or advanced undergraduate students.
Prerequisites: probability at the level of 6.041, or permission of the instructor.
Grading: based on 3-4 problem sets.
Details: Canvas website.