Unconstrained Minima and Small Gradients
Contents
1. Introduction
Question 1 (Existence of small gradients).
If $f: E \to\, \mathbb{R}$ is (Gâteaux) differentiable, lower semicontinuous and lower bounded, can we conclude that it has points at which it is “arbitrarily flat”?
More formally, is it true that $\forall\ \epsilon > 0$ there exists $x_\epsilon\in E$ such that $\|\nabla f(x_\epsilon) \| \le \epsilon$?
2. Attainment
This section contains a quick review of a few essential results about the existence of unconstrained optima.
Theorem 1 (Weierstrass).
Let $f: D \to\,\mathbb{R}$ be a lower semicontinuous function defined over the nonempty closed and bounded domain
$D \subseteq E$, where $E$ is a Euclidean space. Then $f$ admits a global minimizer $x^* \in D$, that is
\[
f(x^*) \le f(x) \quad \forall\,x\in D.
\]
Corollary 1.
Let $f: E \to\,\mathbb{R}$ be a lower semicontinuous function with a bounded sublevel set $S_\alpha := \{x \in E: f(x) \le \alpha\}$.
Then $f$ has a global minimizer $x^* \in E$.
2.1. Coercive functions
Definition 1 (Coercive function).
A function $f: D \to\, \mathbb{R}$ defined on a subset $D$ of the Euclidean space $E$ is called
coercive if $f(x) \to \infty$ as $\|x\|\to\infty$.
Corollary 2.
Let $f: E \to\,\mathbb{R}$ be a lower semicontinuous coercive function.
Then $f$ has a global minimizer $x^* \in E$.
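As a sanity check on Corollary 2, the following sketch contrasts a coercive function with a non-coercive one on $\mathbb{R}$ (both functions are hypothetical examples chosen for illustration, not taken from the text):

```python
import numpy as np

# f1(x) = x^2 - 4x is coercive: f1(x) -> +inf as |x| -> inf, so by
# Corollary 2 it attains a global minimum (at x* = 2, where f1'(x*) = 0).
f1 = lambda x: x**2 - 4*x

# f2(x) = exp(x) is lower bounded but NOT coercive (f2(x) -> 0 as x -> -inf),
# and indeed its infimum 0 is never attained.
f2 = np.exp

xs = np.linspace(-100.0, 100.0, 200001)   # grid with step 0.001
x_star = xs[np.argmin(f1(xs))]
print(x_star)                              # close to the minimizer x* = 2
print(f2(xs).min())                        # positive: inf f2 = 0 is not attained
```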
3. Existence of small gradients
When $f$ is Gâteaux differentiable, the following well-known necessary condition can be proven:
Theorem 2 (Fermat).
Let $f: D \to\,\mathbb{R}$ be Gâteaux differentiable on the open domain $D \subseteq E$, where
$E$ is a Euclidean space, and let $x^* \in D$ be a local minimizer of $f$. Then $\nabla f(x^*) = 0$.
Theorem 3 (Approximate Fermat’s theorem).
Let $f: E \to\,\mathbb{R}$ be a Gâteaux differentiable, lower semicontinuous and lower bounded
function. Then, for all
$\epsilon > 0$ there exists $x_\epsilon \in E$ such that $\|\nabla f(x_\epsilon)\| \le \epsilon$.
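Theorem 3 can be made concrete with $f(x) = e^x$, which is smooth and bounded below by $0$ yet has no minimizer: the point $x_\epsilon = \log\epsilon$ satisfies $|f'(x_\epsilon)| = \epsilon$ exactly. A minimal numeric sketch (the choice of $f$ is a hypothetical example):

```python
import numpy as np

# f(x) = exp(x): differentiable, lower bounded by 0, but inf f is not attained.
f = np.exp
f_prime = np.exp          # f'(x) = exp(x)

for eps in (1.0, 1e-3, 1e-9):
    x_eps = np.log(eps)   # candidate "arbitrarily flat" point
    assert abs(f_prime(x_eps)) <= eps * (1 + 1e-12)
    print(x_eps, f_prime(x_eps))
```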
Theorem 4.
Let $f: E \to\,\mathbb{R}$ be a Gâteaux differentiable, lower semicontinuous and lower bounded
function. Then, there exists a sequence $\{x_i\}$ such that
\[
f(x_i) \to \inf f, \quad \nabla f(x_i) \to 0.
\]
4. A cool application: Gordan’s lemma
Lemma 1 (Gordan).
Let $a_1, \ldots, a_m$ be points in $\mathbb{R}^n$. Then exactly one of the following is true:
- The origin is in the convex hull of $\{a_1, \ldots, a_m\}$, that is, there exist $\lambda_1, \ldots, \lambda_m \ge 0$, $\lambda_1 + \cdots + \lambda_m = 1$ such that $\lambda_1 a_1 + \cdots + \lambda_m a_m = 0$.
- All points lie strictly on the same side of a hyperplane passing through the origin, that is, there exists $\mu \in \mathbb{R}^n$ such that $\langle \mu, a_i \rangle < 0$ for all $i = 1, \ldots, m$.
Proof. Consider the function
\[h:\,\mathbb{R}^n \to \mathbb{R},\quad \mu \mapsto \log\left(\sum_{i=1}^m \exp \langle \mu, a_i \rangle\right),\]
whose gradient is a convex combination of the points $a_i$:
\[\nabla h(\mu) = \sum_{i=1}^m \underbrace{\left( \frac{\exp \langle \mu, a_i\rangle}{\sum_{j=1}^m \exp \langle \mu, a_j \rangle}\right)}_{=:\,\beta_i(\mu)} a_i. \tag{1}\]
We prove the two directions separately.
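Equation 1 says that $\nabla h(\mu)$ is a softmax-weighted convex combination of the $a_i$. A quick finite-difference check of this formula, with hypothetical points $a_i$ chosen for illustration:

```python
import numpy as np

# Hypothetical points a_1, ..., a_m as the rows of A.
A = np.array([[1.0, 2.0],
              [-1.0, 0.5],
              [0.0, -3.0]])

def h(mu):
    return np.log(np.sum(np.exp(A @ mu)))

def grad_h(mu):
    w = np.exp(A @ mu)
    beta = w / w.sum()     # softmax weights beta_i(mu): nonnegative, summing to 1
    return beta @ A        # Equation 1: sum_i beta_i(mu) * a_i

# Compare the closed-form gradient against central finite differences.
mu = np.array([0.3, -0.7])
eps = 1e-6
fd = np.array([(h(mu + eps * e) - h(mu - eps * e)) / (2 * eps) for e in np.eye(2)])
assert np.allclose(grad_h(mu), fd, atol=1e-5)
print(grad_h(mu))
```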
- Condition 1 true $\Rightarrow$ condition 2 false.
For contradiction, assume that both conditions hold, and take a $\mu \in \mathbb{R}^n$ that satisfies condition 2. Then, \[ \begin{aligned} 0 &= \langle \mu, 0 \rangle\\ &= \langle \mu, \lambda_1 a_1 + \cdots + \lambda_m a_m \rangle\\ &= \lambda_1 \langle \mu, a_1 \rangle + \cdots + \lambda_m \langle \mu, a_m \rangle\\ &< 0, \end{aligned} \] where the last inequality holds since at least one of the $\lambda$’s must be strictly positive.
- Condition 2 false $\Rightarrow$ condition 1 true.
If condition 2 is false, then for each $\mu \in\mathbb{R}^n$ there exists an $i$ such that $\langle \mu, a_i \rangle \ge 0$. Hence, $h(\mu) \ge 0$ and $h$ is lower bounded. We can then use Theorem 3 and extract a sequence $\{\mu_k\}$ such that $\nabla h(\mu_k) \to 0$. Using Equation 1, we can rewrite this as \[ \sum_{i=1}^m \beta_i(\mu_k) a_i \to 0. \] Since each convex combination vector $(\beta_1(\mu_k), \ldots, \beta_m(\mu_k))$ lies in the unit simplex, which is closed and bounded, we can assume without loss of generality (passing to a convergent subsequence via the Bolzano–Weierstrass theorem) that $\beta_i(\mu_k) \to {\bar \beta}_i \ge 0$, with ${\bar\beta}_1+\cdots +{\bar\beta}_m=1$, and therefore \[ \sum_{i=1}^m {\bar\beta}_i a_i = 0. \] $\square$
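The proof is effectively constructive: running gradient descent on $h$ either drives the convex combinations $\nabla h(\mu_k)$ toward zero (condition 1) or pushes $\mu$ toward a separating direction (condition 2). A rough numerical sketch, with hypothetical points chosen for illustration:

```python
import numpy as np

def gordan_descent(A, steps=5000, lr=0.1):
    """Gradient descent on h(mu) = log(sum_i exp<mu, a_i>); rows of A are the a_i."""
    def grad(mu):
        z = A @ mu
        w = np.exp(z - z.max())   # numerically stabilized softmax weights
        return (w / w.sum()) @ A  # Equation 1: a convex combination of the a_i
    mu = np.ones(A.shape[1])
    for _ in range(steps):
        mu = mu - lr * grad(mu)
    return mu, grad(mu)

# Case 1: the origin IS in conv{a_i} (take beta = (1/3, 1/3, 1/3)),
# so the gradient norm is driven toward 0.
A1 = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
_, g1 = gordan_descent(A1)
print(np.linalg.norm(g1))      # small

# Case 2: all a_i lie in the open upper half-plane, so instead a separating
# direction mu with <mu, a_i> < 0 for every i emerges.
A2 = np.array([[0.0, 1.0], [1.0, 2.0], [-1.0, 1.0]])
mu2, _ = gordan_descent(A2)
print(np.all(A2 @ mu2 < 0))    # expected: True
```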