#
Efficient Online Learning on Polytopes with Linear Minimization Oracles

Darshan Chakrabarti, Gabriele Farina, Christian Kroer

#### Abstract

We study online learning and equilibrium computation in games with polyhedral decision sets, a property shared by both normal-form games (NFGs) and extensive-form games (EFGs), when the learning agent is restricted to utilizing a best-response oracle. We show how to achieve constant regret in zero-sum games and $O(T^{1/4})$ in general-sum games, while using only $O(\log t)$ best-response queries at a given iteration $t$. This is in stark contrast to the prior best result, which requires $O(T)$ queries per iteration. Moreover, our framework yields the first last-iterate convergence guarantees for self-play with best-response oracles in zero-sum games. This convergence occurs at a linear rate, though with a condition-number dependence. We go on to show a $O(1/\sqrt{T})$ best-iterate convergence rate without such a dependence. Our results build on linear-rate convergence results for variants of the Frank-Wolfe (FW) algorithm for strongly convex and Lipschitz minimization problems over polyhedral domains. These FW results depend on a condition number of the polytope, known as facial distance. In order to enable application to settings such as EFGs, we show two broad new results: 1) the facial distance for polytopes of the form $\{\mathbf{x} \in \mathbb{R}^n_+ \mid \mathbf{A} \mathbf{x} = \mathbf{b}\}$ is at least $\gamma / \sqrt{k}$ where $\gamma$ is the minimum value of a nonzero coordinate of a vertex in the feasible set and $k\leq n$ is the number of tight inequality constraints, and 2) the facial distance for polytopes of the form $\mathbf{A}\mathbf{x} = \mathbf{b}, \mathbf{C}\mathbf{x} \leq \mathbf{d}, \mathbf{x} \geq \mathbf{0}$ where $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{C} \geq \mathbf{0}$, $\mathbf{C}$ is integral, and $\mathbf{d} \geq 0$, is at least $\min(1/\sqrt{n}, 1/\|\mathbf{C}\|_\infty \sqrt{n})$. This yields the first such results for several problems such as sequence-form polytopes, flow polytopes, and matching polytopes.

#### Bibtex entry

@inproceedings{Chakrabarti24:Efficient,
title={Efficient Online Learning on Polytopes with Linear Minimization Oracles},
author={Darshan Chakrabarti and Gabriele Farina and Christian Kroer},
booktitle={AAAI Conference on Artificial Intelligence},
year={2024}
}