In the following, we will be considering a repeated two-player game, called
Blackwell’s Approachability Game. The game is formally represented
by a 4-tuple $(\mathcal{X}, \mathcal{Y}, f, S)$, where:
$\mathcal{X}$ denotes a convex compact action space of the first player, who is
identified with $\mathcal{X}$ itself with a slight abuse of notation;
$\mathcal{Y}$ similarly denotes a convex compact action space of the second player,
likewise identified with $\mathcal{Y}$;
$f : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}^d$ is the biaffine payoff
function of the game. We assume $f$ is bounded in norm by $\gamma > 0$,
that is, $\left\|f(x,y)\right\| \le \gamma$ for every $x\in\mathcal{X}$ and $y\in\mathcal{Y}$;
$S \subseteq \mathbb{R}^d$ is a closed convex target set.
A strategy $\sigma_\mathcal{X}$ for player $\mathcal{X}$ is a function
mapping at each instant of time $t$ the game history
$\mathcal{H}_{t-1} \doteq ((x_i, y_i))_{i = 1}^{t - 1}$ into the next action
$x_t \in \mathcal{X}$. Of course, a similar yet symmetric definition holds for player
$\mathcal{Y}$.
The objective of player $\mathcal{X}$ is to keep the average payoff
\[
\phi_t \doteq \frac{1}{t}\sum_{i=1}^{t} f(x_i, y_i)
\]
as close as possible to $S$, while $\mathcal{Y}$ tries to deny this.
2. Approachable and forceable sets
A set $H$ is said to be forceable if there exists an action
$\bar{x}$ for player $\mathcal{X}$ such that, whatever the action
$y$ used by the opponent, $f(\bar{x}, y) \in H$.
The set $S$ is said to be approachable if there exists a strategy
$\sigma_\mathcal{X}$ for player $\mathcal{X}$ such that, whatever the strategy
$\sigma_\mathcal{Y}$ used by the opponent, the distance between $\phi_t$ and
$S$ converges uniformly to 0.
In symbols:
\[
\exists\, \sigma_\mathcal{X}:\ \forall\, \epsilon > 0\ \exists\, T
\ \forall\, \sigma_\mathcal{Y}\ \forall t \ge T\ \rho(\phi_t, S) < \epsilon
\]
where $\rho(\phi_t, S) = \min_{s\in S} \left\|\phi_t - s\right\|$ as usual (the
minimizer, i.e. the projection of $\phi_t$ onto $S$, exists and is unique since
$S$ is closed and convex and we are working with the $\ell^2$ norm).
2.1. The scalar case
We start by recalling the classic theorem by Sion, an extension of Von Neumann’s minimax theorem.
Theorem 1 (Sion’s theorem).
Given compact convex sets $\mathcal{X}, \mathcal{Y}$, and a biaffine function
$f: \mathcal{X}\times\mathcal{Y}\to\mathbb{R}$, we have
\[
\nu = \max_{x\in\mathcal{X}}\min_{y\in\mathcal{Y}} f(x,y) =
\min_{y\in\mathcal{Y}}\max_{x\in\mathcal{X}} f(x,y).
\]
An immediate consequence of Sion’s theorem is that there exists an action
$\bar{x}$ of player $\mathcal{X}$ such that
\[
f(\bar{x}, y) \ge \nu \qquad \text{for all } y\in\mathcal{Y}.
\]
Indeed, assume the opposite: for all $x$ there exists a $\hat{y}(x)\in\mathcal{Y}$
such that $f(x, \hat{y}(x)) < \nu$. This would mean that for all $x\in\mathcal{X}$,
$\min_{y\in\mathcal{Y}} f(x, y) < \nu$, so that $\nu = \max_x \min_y f(x,y) < \nu$,
a contradiction!
Now, suppose player $\mathcal{X}$ plays $\bar{x}$ repeatedly. Then every payoff
satisfies $f(\bar{x}, y_t) \ge \nu$, and so does their average $\phi_t$. This proves that
the interval $[\lambda, +\infty)$ is forceable and approachable for every
$\lambda \le \nu$.
Symmetrically, there exists an action $\bar{y}$ for player $\mathcal{Y}$
such that
\[
f(x, \bar{y}) \le \nu \qquad \text{for all } x\in\mathcal{X}.
\]
Again, by contradiction: if the condition were false, then for all $y\in\mathcal{Y}$
there would exist an action $\hat{x}(y)\in\mathcal{X}$ such that $f(\hat{x}(y), y) > \nu$.
This in turn implies that $\max_x f(x, y) > \nu$ for all actions $y\in\mathcal{Y}$, so that $\nu = \min_y \max_x f(x, y) > \nu$,
a contradiction.
As a consequence, $[\lambda, +\infty)$ is neither approachable nor forceable
for any value of $\lambda > \nu$: if $\mathcal{Y}$ plays $\bar{y}$ repeatedly, the
average payoff never exceeds $\nu$, and thus stays at distance at least
$\lambda - \nu > 0$ from $[\lambda, +\infty)$. We can then state the following lemma.
Lemma 1 (Scalar approachability condition).
In a scalar game, the following conditions are equivalent:
$[\lambda, +\infty)$ is approachable;
$[\lambda, +\infty)$ is forceable;
$\lambda \le \nu$.
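Lemma 1 is easy to illustrate numerically. The sketch below uses a made-up biaffine game on $\mathcal{X} = \mathcal{Y} = [0,1]$ (an assumption for illustration, not from the text), computes $\nu$ on a grid, and checks both the minimax equality and that the maximin action forces $[\nu, +\infty)$:

```python
import numpy as np

# A toy biaffine scalar game on X = Y = [0, 1] (a hypothetical example,
# not from the text): f(x, y) = 1 + x - 2*y + x*y.
def f(x, y):
    return 1.0 + x - 2.0 * y + x * y

grid = np.linspace(0.0, 1.0, 1001)

# nu = max_x min_y f(x, y): for fixed x, f is affine in y, so the inner
# minimum is attained at an endpoint y in {0, 1}.
inner_min = np.minimum(f(grid, 0.0), f(grid, 1.0))
nu_maximin = inner_min.max()
x_bar = grid[inner_min.argmax()]

# min_y max_x f(x, y), with the inner maximum at an endpoint x in {0, 1}.
inner_max = np.maximum(f(0.0, grid), f(1.0, grid))
nu_minimax = inner_max.min()

# Sion: the two values coincide (up to grid resolution).
assert abs(nu_maximin - nu_minimax) < 1e-2

# Forcing: playing x_bar keeps every payoff in [nu, +inf) whatever y is,
# hence the average payoff as well.
assert f(x_bar, grid).min() >= nu_maximin - 1e-9
```

Any $\lambda \le \nu$ is then forced by the same action $\bar{x}$, matching the lemma.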
2.2. Approachable halfspaces
Before turning our attention to general closed convex sets (Section 3),
we focus on an important class of target sets: halfspaces. The following lemma
is central to our discussion.
Lemma 2.
A halfspace $H(\lambda, q) = \{h \in\, \mathbb{R}^d\, :\, \lambda^\top h \ge q\}$
with $\left\|\lambda\right\| = 1$ is approachable if and only if it is forceable.
Proof. Consider a new two-player game $\hat{\Gamma}$ whose
payoff function $\hat{f}: \mathcal{X}\times\mathcal{Y} \to \mathbb{R}$
is defined as
\[
\hat{f}(x, y) \doteq \lambda^\top f(x,y).
\]
This is a scalar game ($\hat{f}$ maps to $\mathbb{R}$) with a biaffine payoff
function.
The key observation is that $H(\lambda, q)$ is approachable if and only
if $\hat{H}(q) = [q, +\infty)$ is approachable. Indeed, notice that for any
$u\in\mathbb{R}^d$:
\[\rho(u, H(\lambda, q)) = \rho(\lambda^\top u, \hat{H}(q)).\]
Therefore, approachability with respect to $H(\lambda, q)$ is equivalent
to approachability with respect to $\hat{H}(q)$. At the same time we have that
\[
f(x, y) \in H(\lambda, q) \iff \lambda^\top f(x, y) \ge q \iff \hat{f}(x, y) \in \hat{H}(q),
\]
so that we conclude that forceability with respect to $H(\lambda, q)$ is
equivalent to forceability with respect to $\hat{H}(q)$. Invoking Lemma
1 completes the proof. $\square$
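The distance identity used in this proof can be verified numerically. The following sketch (random instances; all names are illustrative) projects a point onto $H(\lambda, q)$ by moving along $\lambda$ and checks the equality with the scalar distance:

```python
import numpy as np

rng = np.random.default_rng(0)

for _ in range(100):
    d = 5
    lam = rng.normal(size=d)
    lam /= np.linalg.norm(lam)              # unit normal, as in Lemma 2
    q = rng.normal()
    u = rng.normal(size=d)

    # Euclidean projection of u onto H(lam, q) = {h : lam^T h >= q}:
    # move along lam just far enough to satisfy the constraint.
    shortfall = max(q - lam @ u, 0.0)
    proj = u + shortfall * lam
    dist_vector = np.linalg.norm(u - proj)

    # Scalar distance from lam^T u to the interval [q, +inf).
    dist_scalar = max(q - lam @ u, 0.0)
    assert abs(dist_vector - dist_scalar) < 1e-9

    # proj is feasible, and no sampled feasible point is closer to u.
    assert lam @ proj >= q - 1e-9
    samples = rng.normal(size=(200, d))
    push = np.maximum(q - samples @ lam, 0.0)
    feasible = samples + push[:, None] * lam
    assert (np.linalg.norm(feasible - u, axis=1) >= dist_vector - 1e-9).all()
```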
3. Blackwell’s Theorem
Theorem 2.
A closed convex set $S$ is approachable if and only if every halfspace
$H \supseteq S$ is forceable.
Proof. ($\Longrightarrow$) This direction is trivial: if $S$ is
approachable then any set $Q \supseteq S$ is approachable, including any
halfspace $H \supseteq S$. On the other hand, we know from Lemma
2 that approachability and
forceability are equivalent when dealing with halfspaces.
($\Longleftarrow$) Suppose we have played a strategy $\sigma_\mathcal{X}$ up until time $t$, and
it is now our turn to play. We have collected an average payoff of $\phi_{t}$,
and would like to “move closer” to $S$. See Figure 1.
Figure 1.
Let $\psi_t$ be the projection of $\phi_t$ onto $S$. Remember that such projection
exists and is unique since $S$ is a closed convex set.
If $\phi_t\in S$, we play an arbitrary move. Otherwise, consider the halfspace
$H(\hat{\lambda}, \hat{q})$ perpendicular to
$\psi_t - \phi_t$, containing $S$ and such that $\hat{\lambda}^\top \psi_t = \hat{q}$.
Since $H(\hat{\lambda}, \hat{q}) \supseteq S$, it is forceable by hypothesis: there
exists an action $\bar{x}$ which, when played, clearly guarantees
\[
\hat{\lambda}^\top f_{t+1} \ge \hat{q},
\]
where we let $f_{t+1}\doteq f(\bar{x}, y_{t+1})$ be the payoff received by player
$\mathcal{X}$ at time $t+1$.
Notice that since $\psi_t$ is also the projection of $\phi_t$ onto $H(\hat{\lambda}, \hat{q})$,
the cross term $(f_{t+1} - \psi_t)^\top(\phi_t - \psi_t) = -\rho(\phi_t, S)\,\hat{\lambda}^\top(f_{t+1} - \psi_t)$
is non-positive, because $\hat{\lambda}^\top f_{t+1} \ge \hat{q} = \hat{\lambda}^\top \psi_t$.
Expanding $\phi_{t+1} = \frac{t}{t+1}\phi_t + \frac{1}{t+1}f_{t+1}$ and using
$\left\|f_{t+1} - \psi_t\right\| \le 2\gamma$ (both points lie in the ball of radius
$\gamma$, assuming without loss of generality that $S$ does too), we obtain
\[
\rho(\phi_{t+1}, S)^2 \le \left\|\phi_{t+1} - \psi_t\right\|^2
\le \left(\frac{t}{t+1}\right)^2 \rho(\phi_t, S)^2 + \frac{4\gamma^2}{(t+1)^2},
\]
which unrolls by induction into $\rho(\phi_t, S)^2 \le 4\gamma^2/t$ (Equation 4).
Taking the square root of Equation 4 shows
that $\rho(\phi_t, S)$ converges uniformly to zero as $t \to \infty$, at
rate $O(1/\sqrt{t})$. Therefore $S$ is approachable and we conclude the proof. $\square$
4. Approximate Nash equilibria
Here we make the further assumption that $\mathcal{X}$ and $\mathcal{Y}$ are
(finite-dimensional) polytopes, and that the game is zero-sum. We let
$u : \mathcal{X}\times\mathcal{Y} \to \mathbb{R}$ represent the bilinear
payoff function of the game. We start by recalling two classical definitions in game theory.
Definition 1 (Regret).
The (external) regret of player $\mathcal{X}$ against action $\hat{x}\in\mathcal{X}$
after history $\mathcal{H}_{t} = ((x_i, y_i))_{i=1}^{t}$ is defined as
\[
R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \doteq \frac{1}{t}\sum_{i=1}^{t}\left(u(\hat{x}, y_i) - u(x_i, y_i)\right),
\]
that is, the average gain player $\mathcal{X}$ would have obtained by playing $\hat{x}$
at every step instead of following the history. The regret of player $\mathcal{Y}$
is defined symmetrically, with $-u$ in place of $u$. We say that a strategy
$\sigma_\mathcal{X}$ is regret-minimizing if
$\limsup_{t\to\infty} R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \le 0$ for every
$\hat{x}\in\mathcal{X}$, whatever the strategy of the opponent.
Definition 2 (Approximate Nash equilibrium).
A profile $(x, y)$ is an $(\epsilon_\mathcal{X}, \epsilon_\mathcal{Y})$-Nash
equilibrium if no player can improve its payoff by more than
$\epsilon_\mathcal{X} + \epsilon_\mathcal{Y}$ through a unilateral deviation.
Theorem 3.
Suppose that after history $\mathcal{H}_{t}$ the regrets of the two players are
bounded by $\epsilon_\mathcal{X}$ and $\epsilon_\mathcal{Y}$, respectively. Then the
average strategies $\bar{x}_t \doteq \frac{1}{t}\sum_{i=1}^{t} x_i$ and
$\bar{y}_t \doteq \frac{1}{t}\sum_{i=1}^{t} y_i$ form an
$(\epsilon_\mathcal{X}, \epsilon_\mathcal{Y})$-Nash equilibrium.
Proof. Let $v_t \doteq \frac{1}{t}\sum_{i=1}^{t} u(x_i, y_i)$ be the realized
average payoff. By bilinearity, $\frac{1}{t}\sum_{i=1}^{t} u(\hat{x}, y_i) = u(\hat{x}, \bar{y}_t)$,
so the regret bound of player $\mathcal{X}$ reads $u(\hat{x}, \bar{y}_t) \le v_t + \epsilon_\mathcal{X}$
for all $\hat{x}\in\mathcal{X}$; symmetrically, $u(\bar{x}_t, \hat{y}) \ge v_t - \epsilon_\mathcal{Y}$
for all $\hat{y}\in\mathcal{Y}$. In particular $u(\bar{x}_t, \bar{y}_t) \ge v_t - \epsilon_\mathcal{Y}$,
so no deviation gains player $\mathcal{X}$ more than $\epsilon_\mathcal{X} + \epsilon_\mathcal{Y}$;
the argument for player $\mathcal{Y}$ is identical.
Therefore, we conclude that $(\bar{x}_t,\bar{y}_t)$ is a $(\epsilon_\mathcal{X}, \epsilon_\mathcal{Y})$-Nash
equilibrium. $\square$
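The argument above can be sanity-checked numerically. Below is a sketch on a hypothetical random 3×3 zero-sum matrix game (not from the text): we compute the two players' average regrets over an arbitrary history and verify that neither player can gain more than their sum by deviating from the average strategies:

```python
import numpy as np

rng = np.random.default_rng(3)

# A hypothetical 3x3 zero-sum matrix game u(x, y) = x^T A y.
A = rng.normal(size=(3, 3))

t = 200
xs = rng.dirichlet(np.ones(3), size=t)     # an arbitrary history
ys = rng.dirichlet(np.ones(3), size=t)
v = np.mean([x @ A @ y for x, y in zip(xs, ys)])   # realized average payoff

x_bar, y_bar = xs.mean(axis=0), ys.mean(axis=0)

# Average regrets of the two players, maximized over pure deviations
# (which suffices, by bilinearity of u).
eps_x = (A @ y_bar - v).max()          # max_j u(b_j, y_bar) - v
eps_y = (-(A.T @ x_bar) + v).max()     # max_j (-u(x_bar, b_j)) - (-v)

# No deviation gains a player more than eps_x + eps_y.
gain_x = (A @ y_bar).max() - x_bar @ A @ y_bar
gain_y = (-(A.T @ x_bar)).max() - (-(x_bar @ A @ y_bar))
assert gain_x <= eps_x + eps_y + 1e-9
assert gain_y <= eps_x + eps_y + 1e-9
```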
4.1. From regret-minimization to approachability
For our purposes, it is convenient to introduce the concept of instantaneous
regret, $r_{\hat{x}}(x_t,y_t)$, representing the regret accumulated by
player $\mathcal{X}$ against action $\hat{x}$ when the two players play
$x_t\in\mathcal{X}$ and $y_t\in\mathcal{Y}$, respectively:
\[
r_{\hat{x}}(x_t, y_t) \doteq u(\hat{x}, y_t) - u(x_t, y_t).
\]
With this notation,
\[
\bar{R}_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) = \frac{1}{t}\sum_{i=1}^{t} r_{\hat{x}}(x_i, y_i)
\]
is the average of the instantaneous regrets of player $\mathcal{X}$. Note that
under this characterization, $\bar{R}_\mathcal{X}(\mathcal{H}_{t}, \hat{x})$
looks similar to the definition of average payoff $\phi_t$.
It is tempting to introduce the biaffine multi-dimensional payoff function
\[
f(x, y) \doteq \bigl(r_{\hat{x}}(x, y)\bigr)_{\hat{x}\in\mathcal{X}},
\]
with one component per action $\hat{x}\in\mathcal{X}$,
so that any regret-minimizing strategy is equivalent to an approach strategy
for the non-positive orthant. However, this approach is flawed, in that
the resulting payoff space would have an infinite number of dimensions.
Luckily, we can prove that the condition of Equation 5,
which has to hold for all $\hat{x}\in\mathcal{X}$, is equivalent to the same
condition over a restricted, finite number of actions when $\mathcal{X}$ is
a finitely generated polytope. Indeed, we claim the following:
Lemma 3.
A strategy $\sigma_\mathcal{X}$ for player $\mathcal{X}$ is regret-minimizing
if and only if
\[
\limsup_{t\to\infty} R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \le 0
\]
for all histories $\mathcal{H}_{t}$ and all $\hat{x} \in \{b_1, \ldots, b_n\}$,
where $\{b_1,\ldots,b_n\}$ is a convex basis for $\mathcal{X}$.
Proof. ($\Longrightarrow$) This direction is trivial,
as clearly $b_1, \ldots, b_n \in \mathcal{X}$.
($\Longleftarrow$) By definition of basis, given any $x\in\mathcal{X}$ there
exist $n$ reals $\lambda_1(x), \ldots, \lambda_n(x) \ge 0$ with
$\lambda_1(x) + \cdots + \lambda_n(x) = 1$, such that $x = \lambda_1(x) b_1 + \cdots + \lambda_n(x) b_n$.
Therefore, using the bilinearity of $u$:
\[
R_\mathcal{X}(\mathcal{H}_{t}, x)
= \frac{1}{t}\sum_{i=1}^{t}\bigl(u(x, y_i) - u(x_i, y_i)\bigr)
= \sum_{j=1}^{n} \lambda_j(x)\, R_\mathcal{X}(\mathcal{H}_{t}, b_j).
\]
We know by hypothesis that $\limsup_t R_\mathcal{X}(\mathcal{H}_{t}, b_j) \le 0$
for all $j = 1,\ldots, n$. Using the subadditivity of $\limsup$, we find
\[
\limsup_{t\to\infty} R_\mathcal{X}(\mathcal{H}_{t}, x)
\le \sum_{j=1}^{n} \lambda_j(x) \limsup_{t\to\infty} R_\mathcal{X}(\mathcal{H}_{t}, b_j) \le 0.
\]
This completes the proof. $\square$
We conclude this section by noting that, when considering a standard normal-form game,
the set of pure strategies forms a convex basis for the strategy
simplex $\mathcal{X}$. However, notice that our presentation is more general, and accounts
for non-standard action spaces.
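The convex-combination identity at the heart of Lemma 3 is easy to verify numerically. The sketch below instantiates it on a hypothetical random 3×4 matrix game, with the pure strategies as the convex basis:

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical 3x4 zero-sum matrix game: X is the simplex spanned by
# the pure strategies b_1, b_2, b_3 (unit vectors), u(x, y) = x^T A y.
A = rng.normal(size=(3, 4))

def u(x, y):
    return x @ A @ y

# A random history of t mixed-strategy profiles.
t = 50
xs = rng.dirichlet(np.ones(3), size=t)
ys = rng.dirichlet(np.ones(4), size=t)

def avg_regret(x_hat):
    return np.mean([u(x_hat, ys[i]) - u(xs[i], ys[i]) for i in range(t)])

basis = np.eye(3)
basis_regrets = np.array([avg_regret(b) for b in basis])

# Any mixed x = sum_j lambda_j b_j has regret sum_j lambda_j * R(b_j),
# by bilinearity of u -- the identity used in the proof of Lemma 3.
lam = rng.dirichlet(np.ones(3))
x = lam @ basis
assert abs(avg_regret(x) - lam @ basis_regrets) < 1e-10
```

Driving the basis regrets to zero therefore controls the regret against every mixed action at once.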
4.2. Building an explicit strategy
Lemma 3 fixes the problem we had with Equation
6. Specifically, we can now consider an
approachability game over $\mathcal{X}$ and $\mathcal{Y}$, with a biaffine payoff
function
\[
f(x, y) \doteq \bigl(r_{b_j}(x, y)\bigr)_{j=1}^{n}
= \bigl(u(b_j, y) - u(x, y)\bigr)_{j=1}^{n} \in \mathbb{R}^n,
\]
and having the non-positive orthant $S = \{h\in\mathbb{R}^n : h \le 0\}$ as target set.
From the construction in the proof of Theorem 2 we
know that in order to find the action $x_t\in\mathcal{X}$ given
$\mathcal{H}_{t-1}$, we need to first project $\phi_t$ onto $S$, finding
$\psi_t$, and then force the halfspace containing $S$ perpendicular to $\phi_t-\psi_t$ at
$\psi_t$.
We start by projecting $\phi_t$ onto $S$. The particular structure
of $S$ (non-positive orthant) makes it very easy:
\[\psi_t = [\phi_t]^-,\]
where $[x]^-$ denotes the vector whose $i$-th component is equal to
$\min\{x_i, 0\}$.
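As a quick sanity check of this projection step, the sketch below compares the componentwise-clipped point against randomly sampled feasible points:

```python
import numpy as np

# Componentwise clipping is the Euclidean projection onto the
# non-positive orthant: the squared distance decomposes per coordinate,
# and the nearest non-positive value of phi_i is min(phi_i, 0).
phi = np.array([0.7, -0.2, 1.3, 0.0])
psi = np.minimum(phi, 0.0)          # psi_t = [phi_t]^-

# No randomly sampled point of the orthant is closer to phi.
rng = np.random.default_rng(2)
others = -np.abs(rng.normal(size=(1000, 4)))   # points with h <= 0
d_star = np.linalg.norm(phi - psi)
assert (np.linalg.norm(others - phi, axis=1) >= d_star - 1e-12).all()
```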
All halfspaces tangent to $S$ and containing $S$ are of the form
\[
H(\lambda) \doteq \{h\in\mathbb{R}^n : \lambda^\top h \ge 0\},
\qquad \lambda \le 0,\ \lambda \ne 0.
\]
By definition, $H(\lambda)$ is forceable if there exists an action
$\bar{x}\in\mathcal{X}$ such that $\lambda^\top f(\bar{x}, y) \ge 0$ for all $y\in\mathcal{Y}$.
Expanding the definition of $f$, forceability requires
\[
\lambda^\top f(\bar{x}, y)
= \sum_{j=1}^{n} \lambda_j\, u(b_j, y) - \Lambda\, u(\bar{x}, y) \ge 0
\qquad \text{for all } y\in\mathcal{Y},
\]
a condition satisfied (with equality) by the action
\[
\bar{x} \doteq \sum_{j=1}^{n} \frac{\lambda_j}{\Lambda}\, b_j,
\]
where $\Lambda\doteq\sum_{j=1}^n \lambda_j < 0$ conveniently acts as a normalization
constant, so that Equation 7 not only
gives us a candidate $\bar{x}$, it also guarantees that such $\bar{x}$
belongs to the polytope $\mathcal{X}$ by expressing it as a convex combination of the basis vectors
$b_1,\ldots,b_n$.
We now have all the pieces needed to write down the complete algorithm:
We start by playing a random $x_1\in\mathcal{X}$, and observe
the opponent’s move $y_1$.
At time $t$, if $\phi_t \in S$, we play at random. Otherwise, we
play the action $x_t$ defined by Equation 7.
In both cases, we play the chosen action and wait for the opponent’s move $y_{t}$.
Notice that since $\lambda$ is parallel to $\psi_t - \phi_t = -[\phi_t]^+$,
we can rewrite $x_t$ as
\[
x_t = \frac{\sum_{j=1}^{n} [\phi_{t,j}]^+\, b_j}{\sum_{k=1}^{n} [\phi_{t,k}]^+},
\]
where $\phi_{t,j}$ denotes the $j$-th component of $\phi_t$ and
$[z]^+ \doteq \max\{z, 0\}$: we play each basis action proportionally to the
positive part of the corresponding average regret.
Blackwell’s theorem guarantees that if both $\mathcal{X}$ and $\mathcal{Y}$
play according to the strategy above, then for all $\hat{x}\in\mathcal{X},\hat{y}\in\mathcal{Y}$,
\[
R_\mathcal{X}(\mathcal{H}_{t}, \hat{x}) \le \frac{2\gamma}{\sqrt{t}}, \qquad
R_\mathcal{Y}(\mathcal{H}_{t}, \hat{y}) \le \frac{2\gamma}{\sqrt{t}},
\]
where $\gamma$ bounds the norm of the instantaneous regret vectors.
Using Theorem 3 we conclude that the strategy above
guarantees that at time $t$ an $O(\sqrt{\max\{|\mathcal{X}|,|\mathcal{Y}|\}}/\sqrt{t})$-Nash equilibrium is found.
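The complete algorithm is short to implement. Below is an illustrative sketch in self-play on rock-paper-scissors (a standard example, not discussed in the text), with the simplex basis as pure strategies; the final assertions check the $O(1/\sqrt{t})$ decay of the positive part of the average regret vector, instantiated as $2\gamma/\sqrt{t}$ with $\gamma = 2\sqrt{3}$ for payoffs in $[-1, 1]$:

```python
import numpy as np

# Rock-paper-scissors payoff matrix for player X: u(x, y) = x^T A y.
A = np.array([[0, -1, 1],
              [1, 0, -1],
              [-1, 1, 0]], dtype=float)

def step(phi, n):
    """Play proportionally to the positive part of the average regret
    vector phi (the forcing action derived in the text); if phi lies in
    the target orthant, any action is valid."""
    pos = np.maximum(phi, 0.0)
    if pos.sum() == 0.0:
        return np.eye(n)[0]          # arbitrary valid choice
    return pos / pos.sum()

T = 10_000
phi_x = np.zeros(3)                  # average regret vector of player X
phi_y = np.zeros(3)                  # ... and of player Y (payoff -u)
for t in range(1, T + 1):
    x = step(phi_x, 3)
    y = step(phi_y, 3)
    rx = A @ y - x @ A @ y           # instantaneous regrets vs pure actions
    ry = -(A.T @ x) + x @ A @ y      # Y's payoff is -u
    phi_x += (rx - phi_x) / t        # running averages
    phi_y += (ry - phi_y) / t

# Blackwell's guarantee: the positive part of the average regret decays
# as O(1/sqrt(t)); gamma = 2*sqrt(3) bounds the regret vectors here.
gamma = 2 * np.sqrt(3)
assert np.maximum(phi_x, 0.0).max() <= 2 * gamma / np.sqrt(T)
assert np.maximum(phi_y, 0.0).max() <= 2 * gamma / np.sqrt(T)
```

The bound holds deterministically against any opponent, not just in self-play; self-play merely makes both players' regrets vanish simultaneously, which is what yields the approximate equilibrium.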
5. References
This post is based on the work of:
Matus Telgarsky, Blackwell Approachability and Minimax Theory.
Sergiu Hart and Andreu Mas-Colell, A Simple Adaptive Procedure Leading to Correlated Equilibrium.