MIT 6.7220/15.084 — Nonlinear Optimization (Spring ‘25) Thu, Mar 13th 2025
Lecture 11
Polarity and oracle equivalence
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)★
We conclude this first part of the course by tying one last loose end and giving one last
fundamental result regarding the computational aspects of convex optimization problems.
Let Ω ⊆ ℝ𝑛 be closed, convex, and bounded. In Lecture 5, we saw how separation—which
we argued captures the essence of duality—can be turned into an algorithmic tool through
the notion of separation oracle. Specifically, if a separation oracle for Ω can be constructed,
then it can be converted into an algorithm for convex optimization on Ω via the ellipsoid
algorithm provided that there exist radii 𝑟, 𝑅 > 0 and a point 𝑥0 ∈ Ω such that 𝔹𝑟(𝑥0) ⊆
Ω ⊆ 𝔹𝑅(0).
On the other hand, a separation oracle for Ω can always be constructed efficiently from an
algorithm for convex optimization on Ω, since the direction connecting 𝑦 to its projection
onto Ω is a separating direction (see Lecture 5).
With a separation oracle, or an algorithm for computing Euclidean projections onto Ω, we
can always decide whether 𝑦 ∈ Ω or not—that is, construct a membership oracle. Finally,
given an algorithm for convex optimization over Ω, we obtain as a special case an algorithm
for linear optimization on Ω. We can summarize all these observations pictorially via the
black solid arrows in the diagram below, each of which represents an efficient reduction.
[Diagram: five oracle boxes — Euclidean proj. onto Ω, Membership in Ω, Separation oracle for Ω, Convex opt. on Ω, Linear opt. on Ω — connected by solid black arrows labeled "Special case", "Lecture 5", and "Lecture 5 (Ellipsoid method)", and by dashed blue arrows labeled "Section L11.2" and "Section L11.3".]
The goal of this lecture is to show that, perhaps surprisingly, the reductions indicated by
dashed blue arrows in the diagram above hold. Specifically, we will show that:
• If one has an oracle for linear optimization on Ω, then one can efficiently “upgrade
it” into an oracle for general convex optimization on Ω. This holds even if by “linear
optimization oracle” we mean an algorithm that given any 𝑐 ∈ ℝ𝑛 can return the
minimum value of 𝑐⊤𝑥 on Ω, but does not return a minimizer 𝑥∗ at which this value
is achieved.
• A mere membership oracle—that is, an algorithm that given any 𝑦 ∈ ℝ𝑛 can decide if
𝑦 ∈ Ω or not—can be efficiently turned into a separation oracle for Ω, and thus into
an algorithm for generic convex optimization.
L11.1 Oracles
For this lecture we consider a set Ω ⊆ ℝ𝑛 compact and convex with
𝔹𝑟(𝑥0) ⊆ Ω ⊆ 𝔹𝑅(0) for some radii 0 < 𝑟 ≤ 𝑅 and point 𝑥0 ∈ Ω.
Given Ω, several types of algorithms can be constructed. In particular, we especially focus
on these:
• A membership oracle takes as input a point 𝑦 ∈ ℝ𝑛 and returns whether 𝑦 ∈ Ω or not.
• A separation oracle takes as input a point 𝑦 ∈ ℝ𝑛 and returns a separating direction
if 𝑦 ∉ Ω, or the statement “𝑦 ∈ Ω” otherwise.
• A Euclidean projection oracle takes as input a point 𝑦 ∈ ℝ𝑛 and returns the projection
of 𝑦 onto Ω.
• A linear optimization oracle takes as input a vector 𝑐 ∈ ℝ𝑛 and returns the minimum
value of 𝑐⊤𝑥 over 𝑥 ∈ Ω, and optionally a minimizer of this function.
For the purposes of this lecture, by “efficient reduction” between oracles we mean that
the reduction can be implemented with a number of calls to the original oracle that is at
most polynomial in the dimension 𝑛 of the space. Furthermore, for simplicity we will not
be concerned with approximation errors, nor with representation issues for the numbers
involved.
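To fix ideas, here is a minimal sketch of the four oracle types for the concrete choice Ω = 𝔹𝑅(0) ⊆ ℝ𝑛, where each oracle happens to admit a closed form. All function names are illustrative.

```python
import numpy as np

R = 2.0  # radius of the example set Omega = B_R(0)

def membership_oracle(y):
    # Is y in Omega?
    return np.linalg.norm(y) <= R

def separation_oracle(y):
    # None if y is in Omega; otherwise a vector d with <d, x> < <d, y> for all x in Omega.
    # For the ball, d = y works: <y, x> <= R*||y|| < ||y||^2 = <y, y> whenever ||y|| > R.
    return None if np.linalg.norm(y) <= R else y

def projection_oracle(y):
    # Euclidean projection of y onto Omega: rescale y onto the sphere if it lies outside.
    norm = np.linalg.norm(y)
    return y if norm <= R else (R / norm) * y

def linear_optimization_oracle(c):
    # Minimum value of <c, x> over Omega (equal to -R*||c||), together with a minimizer.
    c = np.asarray(c, dtype=float)
    norm = np.linalg.norm(c)
    minimizer = np.zeros_like(c) if norm == 0 else -(R / norm) * c
    return float(c @ minimizer), minimizer
```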
L11.2 From membership to separation
The ability to convert a membership oracle into a separation oracle was first shown
by Yudin, D. B., & Nemirovskii, A. S. [YN76] and then crystallized by Grötschel, M.,
Lovász, L., & Schrijver, A. [GLS93] in their book on the ellipsoid method. A more modern
construction was proposed recently by Lee, Y. T., Sidford, A., & Vempala, S. S. [LSV18].
We will follow this latter construction.
L11.2.1 Intuition
Let’s start with the easy case: if ‖𝑦‖2 > 𝑅, since Ω ⊆ 𝔹𝑅(0) we can simply separate 𝑦 from
the ball 𝔹𝑅(0) by returning the halfspace {𝑥 ∈ ℝ𝑛 : 𝑦⊤𝑥 ≤ 𝑅‖𝑦‖2}. So, we focus on the case
where ‖𝑦‖2 ≤ 𝑅.
The main idea hinges on a different construction for a separating hyperplane, which does
not rely on the Euclidean projection of the point 𝑦 onto Ω, unlike what we saw in Lecture 5.
Instead, we will construct a separation oracle as follows:
1. We start from 𝑥0, and move on a straight line towards
𝑦 until we hit the boundary of the set Ω. Call the
boundary point 𝑧.
2. We will compute the equation for a plane tangent
at the boundary of Ω at 𝑧. Such a plane is called a
supporting hyperplane of Ω at 𝑧.¹
[Figure: the segment from 𝑥0 to 𝑦 exits Ω at the boundary point 𝑧; the ball 𝔹𝑟(𝑥0) is contained in Ω.]
Crucially, figuring out how much we can move from 𝑥0 in the direction of 𝑦 can be done
via binary search on the membership of the points 𝑥0 + 𝛼(𝑦 − 𝑥0) for 𝛼 ∈ [0, 1].
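As an illustration, this binary search could be implemented along the following lines, where `membership(p)` is a hypothetical callable returning whether 𝑝 ∈ Ω, and we stop at a small tolerance rather than at the exact boundary point.

```python
import numpy as np

def boundary_point(membership, x0, y, eps=1e-9, max_iter=200):
    """Approximate the point z where the segment [x0, y] leaves Omega.

    `membership(p)` is assumed to return True iff p is in Omega; we assume
    x0 is in Omega and y is not.
    """
    x0, y = np.asarray(x0, dtype=float), np.asarray(y, dtype=float)
    lo, hi = 0.0, 1.0  # alpha = 0 gives x0 (inside), alpha = 1 gives y (outside)
    for _ in range(max_iter):
        if hi - lo <= eps:
            break
        mid = 0.5 * (lo + hi)
        if membership(x0 + mid * (y - x0)):
            lo = mid  # still inside Omega: the boundary lies further along the segment
        else:
            hi = mid  # outside Omega: the boundary lies before mid
    return x0 + lo * (y - x0)
```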
The second step is more delicate, in that we need to perform some kind of “numeric
differentiation” along the boundary of Ω to find a supporting hyperplane.
L11.2.2 The depth function
In order to figure out the equation of the supporting hyperplane, we will need a way of
constructing points that are close to 𝑧 and on the boundary of Ω. A natural idea is to
construct such points by moving in the direction of 𝑦 − 𝑥0 starting from a point close to
𝑥0, until the boundary is reached, that is, until the membership oracle starts signaling that
we have gone too far and left Ω.
In order to make the argument formal, let's introduce the function 𝑑𝑦 : Ω → ℝ that informs us of how far we can move in the direction 𝑦 − 𝑥0 starting from any 𝑥 ∈ Ω until we reach the boundary of Ω:
𝑑𝑦 : Ω → ℝ,   𝑑𝑦(𝑥) ≔ max{𝛼 ∈ ℝ≥0 : 𝑥 + 𝛼 (𝑦 − 𝑥0)/‖𝑦 − 𝑥0‖2 ∈ Ω}.
The maximum is guaranteed to exist because the set on the right is
• nonempty, since the value 𝛼 = 0 always guarantees the condition on the right;
• closed, since Ω is closed; and
• bounded, since Ω ⊆ 𝔹𝑅(0). [▷ You should be able to produce a formal proof!]
Furthermore, by convexity of Ω, the set on the right-hand side is guaranteed to be an
interval.
We now study the properties of the depth function 𝑑𝑦(𝑥). The proofs of these results are
easy exercises, and are left to the appendix at the end of this lecture note [▷ Try to prove
these on your own before reading the proofs].
Theorem L11.1. For any given 𝑥 ∈ Ω, we have 𝑑𝑦(𝑥) ∈ [0, 2𝑅]. Furthermore, we can
compute 𝑑𝑦(𝑥) up to 𝜖 > 0 error using 𝑂(log(𝑅/𝜖)) calls to a membership oracle for Ω.
Theorem L11.2. The function 𝑑𝑦 is concave (i.e., −𝑑𝑦 is convex).
Theorem L11.3. The function 𝑑𝑦 restricted to the domain 𝔹𝑟/3(𝑥0) is (3𝑅/𝑟)-Lipschitz
continuous.
L11.2.3 Constructing the supporting hyperplane
Given the depth function 𝑑𝑦, we can now construct the supporting hyperplane at 𝑧 by
performing numeric differentiation at 𝑥0. A key issue that we will need to overcome is that—
while Lipschitz continuous (see Theorem L11.3)—the depth function might not be differentiable
at 𝑥0. We will discuss how to address this issue in a second. Before we do that though, let’s
first establish the connection between the depth function and the supporting hyperplane
under the idealized hypothesis that 𝑑𝑦 is differentiable at 𝑥0, just to make sure that the
overall direction is aligned with our objectives.
Theorem L11.4. If 𝑑𝑦 is differentiable at 𝑥0, the gradient ∇𝑑𝑦(𝑥0) provides a separating
direction, that is,
⟨∇𝑑𝑦(𝑥0), 𝑦 − 𝑥⟩ < 0 ∀𝑥 ∈ Ω.
Proof. By concavity, we know that the first-order approximation of the function is a global
upper bound,² and so
𝑑𝑦(𝑥) ≤ 𝑑𝑦(𝑥0) + ⟨∇𝑑𝑦(𝑥0), 𝑥 − 𝑥0⟩ ∀𝑥 ∈ Ω. (1)
In particular, consider the point
𝑥′ ≔ 𝑥0 − (𝑟/2𝑅)(𝑦 − 𝑥0).
This point satisfies ‖𝑥′ − 𝑥0‖2 ≤ 𝑟 since ‖𝑦 − 𝑥0‖2 ≤ 2𝑅. Hence, 𝑥′ ∈ 𝔹𝑟(𝑥0) ⊆ Ω.
Furthermore,
𝑑𝑦(𝑥′) = 𝑑𝑦(𝑥0) + (𝑟/2𝑅)‖𝑦 − 𝑥0‖2.
Plugging into (1), we then find
𝑑𝑦(𝑥0) + (𝑟/2𝑅)‖𝑦 − 𝑥0‖2 ≤ 𝑑𝑦(𝑥0) − (𝑟/2𝑅)⟨∇𝑑𝑦(𝑥0), 𝑦 − 𝑥0⟩   ⟹   ‖𝑦 − 𝑥0‖2 ≤ ⟨∇𝑑𝑦(𝑥0), 𝑥0 − 𝑦⟩.
Furthermore, since 𝑦 ∉ Ω, we have 𝑑𝑦(𝑥0) < ‖𝑦 − 𝑥0‖2, and so we can write
𝑑𝑦(𝑥0) < ⟨∇𝑑𝑦(𝑥0), 𝑥0 − 𝑦⟩.
Plugging the previous inequality into (1), we find
𝑑𝑦(𝑥) < ⟨∇𝑑𝑦(𝑥0), 𝑥 − 𝑦⟩ ∀𝑥 ∈ Ω.
Finally, using the fact that 𝑑𝑦(𝑥) ≥ 0 for all 𝑥 ∈ Ω by Theorem L11.1, the statement
follows. □
Theorem L11.4 shows that, as we suspected, the gradient of the depth function at 𝑥0
provides a separating direction. Unfortunately, the depth function might not be differen-
tiable at 𝑥0.
The key to overcoming this issue is that convex functions have very nice properties, and in
particular, they are twice differentiable almost everywhere (see Alexandrov’s theorem and
Rademacher’s theorem). So, we can first perturb 𝑥0 slightly to a nearby point 𝑥0′ where the
depth function is differentiable, and then use a numerical gradient of the depth function
at 𝑥0′ as a separating direction. You can take a look at the original paper by Lee, Y. T.,
Sidford, A., & Vempala, S. S. [LSV18] for the technical details.
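To make the overall pipeline concrete, here is a rough sketch of the resulting membership-to-separation reduction: it evaluates the depth function by bisection and takes a finite-difference gradient at a randomly perturbed copy of 𝑥0. This is only a cartoon of the construction—the perturbation size, step size, and error analysis that make the reduction provably correct are exactly the technical content of [LSV18] that we are glossing over—and all names and constants below are illustrative.

```python
import numpy as np

def depth(membership, x0, y, x, alpha_max, eps=1e-7):
    """Bisection estimate of d_y(x): how far one can move from x along the
    direction (y - x0)/||y - x0||_2 while staying inside Omega (Theorem L11.1)."""
    u = (y - x0) / np.linalg.norm(y - x0)
    lo, hi = 0.0, alpha_max
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if membership(x + mid * u):
            lo = mid
        else:
            hi = mid
    return lo

def separation_from_membership(membership, x0, y, R, r, h=1e-5, seed=0):
    """Return an approximate separating direction g, i.e., <g, y - x> < 0 for all
    x in Omega up to numerical error. Assumes ||y||_2 <= R and y not in Omega."""
    rng = np.random.default_rng(seed)
    x0, y = np.asarray(x0, dtype=float), np.asarray(y, dtype=float)
    n = x0.shape[0]
    # Perturb x0 slightly so that, with probability 1, d_y is differentiable at the
    # perturbed point (Alexandrov/Rademacher); the magnitude here is heuristic.
    x0p = x0 + (r / 100.0) * rng.standard_normal(n) / np.sqrt(n)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (depth(membership, x0, y, x0p + e, 2.0 * R)
                - depth(membership, x0, y, x0p - e, 2.0 * R)) / (2.0 * h)
    return g
```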
L11.3 From linear optimization to separation: Polarity
We now turn our attention to the other connection promised in the introduction: the fact
that a linear optimization oracle can be converted into a separation oracle. In particular,
this implies that a linear optimization oracle can always be promoted into an oracle for
convex optimization. As a special case, this implies that if we know how to optimize linear
functions on a closed convex set Ω, we can in particular project onto Ω.
A key notion that will help us establish this connection is that of the polar set. The polar set Ω⚬ of a compact convex set Ω ⊆ ℝ𝑛 that contains the origin in its interior is another convex compact set. Crucially, it exhibits some form of “duality” with respect to Ω, in the sense that linear optimization on Ω can be converted into membership in Ω⚬, and vice versa. We thus reach the following diagram, which completes the picture started in the introduction.
[Figure 1 is a diagram with two columns of oracle boxes — Euclidean proj., membership, separation oracle, convex opt., and linear opt., once for Ω and once for its polar Ω⚬ — connected by reduction arrows; the arrows crossing between the Ω and Ω⚬ columns are labeled “L11.6” and “Theorem L11.7 + Theorem L11.5”, and the arrows labeled “E” use the ellipsoid method.]
Figure 1: Connections among oracles between a set Ω and its polar Ω⚬. “E”: Ellipsoid
method.
L11.3.1 The polar set Ω⚬
We start by defining the notion of polar set.
Definition L11.1 (Polar set Ω⚬). Let Ω be compact and convex, and such that 𝔹𝑟(0) ⊆
Ω ⊆ 𝔹𝑅(0) for some radii 0 < 𝑟 ≤ 𝑅. The polar set Ω⚬ to Ω is defined as
Ω⚬ ≔ {𝑦 ∈ ℝ𝑛 : ⟨𝑥, 𝑦⟩ ≤ 1 ∀𝑥 ∈ Ω}.
As an example, for each of the two plots below, each set is polar to the other.
[Two plots, each showing a planar set Ω together with its polar Ω⚬.]
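As two further worked examples (not pictured above): the polar of the Euclidean ball 𝔹𝜌(0) is 𝔹1/𝜌(0), since the maximum of ⟨𝑥, 𝑦⟩ over 𝑥 ∈ 𝔹𝜌(0) equals 𝜌‖𝑦‖2, which is at most 1 exactly when ‖𝑦‖2 ≤ 1/𝜌. Similarly, the polar of the square [−1, 1]² is the diamond {𝑦 ∈ ℝ² : |𝑦1| + |𝑦2| ≤ 1}, since the maximum of ⟨𝑥, 𝑦⟩ over 𝑥 ∈ [−1, 1]² equals |𝑦1| + |𝑦2|.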
The polar set satisfies similar properties to Ω, as we now show. Furthermore, a very
important property of polarity is that it is an involution, that is, the polar of the polar is
the original set.
Theorem L11.5. Let the set Ω ⊆ ℝ𝑛 be convex and compact, and such that 𝔹𝑟(0) ⊆
Ω ⊆ 𝔹𝑅(0) for some 0 < 𝑟 ≤ 𝑅. Then,
1. the polar set Ω⚬ is convex and compact, and satisfies 𝔹1/𝑅(0) ⊆ Ω⚬ ⊆ 𝔹1/𝑟(0); and
2. the bipolar (Ω⚬)⚬ is equal to Ω.
Proof. We prove the two points separately.
1. The set Ω⚬ is convex and closed since by definition it is an intersection of halfspaces.
Furthermore, since Ω ⊆ 𝔹𝑅(0), for all 𝑦 ∈ 𝔹1/𝑅(0) and 𝑥 ∈ Ω we have
⟨𝑦, 𝑥⟩ ≤ ‖𝑦‖2‖𝑥‖2 ≤ (1/𝑅) ⋅ 𝑅 ≤ 1 ⟹ 𝔹1/𝑅(0) ⊆ Ω⚬.
We now show that Ω⚬ ⊆ 𝔹1/𝑟(0). Pick any nonzero 𝑦 ∈ Ω⚬ (the point 𝑦 = 0 trivially has norm at most 1/𝑟), and consider the vector 𝑧 ≔ 𝑟𝑦/‖𝑦‖2 ∈ 𝔹𝑟(0) ⊆ Ω. Being in Ω⚬, 𝑦 must then satisfy
1 ≥ ⟨𝑦, 𝑧⟩ = ⟨𝑦, 𝑟𝑦/‖𝑦‖2⟩ = 𝑟‖𝑦‖2.
Hence, ‖𝑦‖2 ≤ 1/𝑟. This shows that Ω⚬ is bounded, and since we know it is closed, we conclude that Ω⚬ is compact.
2. We prove the two directions separately.
• (⊇) We start from the easy direction, and show that Ω ⊆ (Ω⚬)⚬. Pick any 𝑥 ∈
Ω. Showing that 𝑥 ∈ (Ω⚬)⚬ means showing that ⟨𝑥, 𝑦⟩ ≤ 1 for all 𝑦 ∈ Ω⚬. But
this is direct, since 𝑦 ∈ Ω⚬ implies ⟨𝑥, 𝑦⟩ ≤ 1 by definition of the polar.
• (⊆) We show that (Ω⚬)⚬ ⊆ Ω. Let 𝑥 ∈ (Ω⚬)⚬. Assume for contradiction that
𝑥 ∉ Ω. Since Ω is closed and convex, we can then separate 𝑥 from Ω (Lecture
5). Hence, there exist 𝑢 ∈ ℝ𝑛, 𝑣 ∈ ℝ such that
⟨𝑢, 𝑥⟩ > 𝑣, and ⟨𝑢, 𝑦⟩ ≤ 𝑣 for all 𝑦 ∈ Ω.
(Note that necessarily 𝑢 ≠ 0, or the above inequalities would be inconsistent.)
Since the point 𝑦 = 𝑟𝑢/‖𝑢‖2 belongs to 𝔹𝑟(0), which is contained in Ω, the inequality on the right gives 𝑣 ≥ ⟨𝑢, 𝑟𝑢/‖𝑢‖2⟩ = 𝑟‖𝑢‖2 > 0. But then, we can divide the inequality on the right by 𝑣 and find that
⟨𝑢/𝑣, 𝑦⟩ ≤ 1 ∀𝑦 ∈ Ω   ⟹   𝑢/𝑣 ∈ Ω⚬.
However, remember that by hypothesis, we assumed that 𝑥 ∈ (Ω⚬)⚬, and so ⟨𝑢/𝑣, 𝑥⟩ ≤ 1. This contradicts the fact that ⟨𝑢, 𝑥⟩ > 𝑣. Hence, we conclude that 𝑥 ∈ Ω.
□
L11.3.2 Separation over the polar
Assume for now that a point 𝑥0 ∈ Ω such that 𝔹𝑟(𝑥0) ⊆ Ω is known. We will show that a
linear optimization oracle for Ω can be converted into a membership oracle for the polar
set Ω⚬.
Theorem L11.6. A membership oracle for Ω⚬ can be constructed efficiently starting from
a linear optimization oracle for Ω, even if the linear optimization oracle only returns
the optimal objective value and not the minimizer.
The construction only requires a single call to the linear optimization oracle.
Proof. Without loss of generality, we assume that 𝑥0 = 0. If not, it suffices to shift both
Ω and 𝑦 by the quantity −𝑥0 so that 𝔹𝑟(0) ⊆ Ω, and continue the argument with the
understanding that both Ω and 𝑦 have been shifted.
Let 𝑦 ∈ ℝ𝑛. Determining if 𝑦 ∈ Ω⚬ means, by definition of the polar, checking whether ⟨𝑦, 𝑥⟩ ≤ 1 for all 𝑥 ∈ Ω, that is, whether max𝑥∈Ω ⟨𝑦, 𝑥⟩ ≤ 1. A single call to the linear optimization oracle with the vector 𝑐 = −𝑦 returns min𝑥∈Ω ⟨−𝑦, 𝑥⟩ = −max𝑥∈Ω ⟨𝑦, 𝑥⟩, so it suffices to negate the returned value. If the maximum is less than or equal to 1, we return that 𝑦 ∈ Ω⚬. Else, we return that 𝑦 ∉ Ω⚬. □
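The proof translates directly into a one-call procedure. The sketch below uses a hypothetical callable `linear_opt_value(c)` returning min𝑥∈Ω ⟨𝑐, 𝑥⟩, and assumes Ω has already been shifted so that 0 lies in its interior.

```python
def polar_membership(linear_opt_value, y):
    """Membership oracle for the polar of Omega (sketch of Theorem L11.6).

    `linear_opt_value(c)` is assumed to return min_{x in Omega} <c, x>;
    since max_{x in Omega} <y, x> = -min_{x in Omega} <-y, x>, one call suffices.
    """
    max_value = -linear_opt_value(-y)  # y is assumed to support negation (e.g., a numpy array)
    return max_value <= 1.0
```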
Theorem L11.6 provides the top, rightward thick red arrow in Figure 1. By invoking the
bipolarity theorem of Theorem L11.5, it follows that Theorem L11.6 also establishes the
bottom, leftward thick red arrow in the diagram.
In order to upgrade the linear optimization oracle for Ω into a separation oracle for Ω⚬,
we can use the results discussed in Section L11.2. However, with very little extra work we
can also show that the separation oracle for Ω⚬ can be constructed directly starting from
a linear optimization oracle for Ω without resorting to the randomized algorithm discussed
in Section L11.2, as long as the linear optimization oracle also returns a minimizer.
Theorem L11.7. A separation oracle for Ω⚬ can be constructed efficiently, without use
of Section L11.2, starting from a linear optimization oracle for Ω that returns the
optimal point.
The construction only requires a single call to the linear optimization oracle.
Proof. Let 𝑦 ∈ ℝ𝑛. As discussed above, if max𝑥∈Ω⟨𝑦, 𝑥⟩ ≤ 1, then 𝑦 ∈ Ω⚬, and our job is
done. Conversely, suppose that max𝑥∈Ω⟨𝑦, 𝑥⟩ > 1, and let 𝑥⋆ ∈ Ω be a maximizer. Then,
the direction 𝑑 ≔ 𝑥⋆ is a separating direction, because ⟨𝑑, 𝑦⟩ > 1, but ⟨𝑑, 𝑥⟩ ≤ 1 for all
𝑥 ∈ Ω⚬ by definition of polar set. □
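Analogously, here is a sketch of the separation oracle of Theorem L11.7, assuming a hypothetical callable `linear_opt(c)` that returns both the minimum value and a minimizer of ⟨𝑐, 𝑥⟩ over Ω (again after shifting so that 0 lies in the interior of Ω).

```python
def polar_separation(linear_opt, y):
    """Separation oracle for the polar of Omega (sketch of Theorem L11.7).

    `linear_opt(c)` is assumed to return (min value, minimizer) of <c, x> over Omega.
    Returns None if y is in the polar, otherwise the separating direction d = x_star.
    """
    value, x_star = linear_opt(-y)      # maximizing <y, x> is minimizing <-y, x>
    if -value <= 1.0:                   # max_{x in Omega} <y, x> <= 1  =>  y in the polar
        return None
    return x_star                       # <x_star, y> > 1 >= <x_star, x> for all x in the polar
```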
Theorem L11.7 provides the right, downward thick red arrow in the diagram of Figure 1.
By invoking the bipolarity theorem of Theorem L11.5, it follows that Theorem L11.7 also
establishes the left, upward thick red arrow in the diagram.
L11.3.3 Getting rid of the assumption of knowing 𝑥0
Finally, we show that it is possible to convert a linear optimization oracle into a separation
oracle, even when the location of the interior point 𝑥0 is unknown. This is in contrast to
the conversion from membership to separation, which requires knowledge of 𝑥0 upfront.
▶ General idea. The idea for the reduction is to consider an augmented set Ω′ ⊆ ℝ𝑛+1 constructed from Ω, in which we know by construction where an interior point 𝑥′0 lies. Specifically, consider the convex hull
Ω′ ≔ {𝜆(𝑥, 0) + (1 − 𝜆)𝑤 : 𝑥 ∈ Ω, 𝑤 ∈ 𝔹1(𝑒𝑛+1), 𝜆 ∈ [0, 1]},
where 𝑒𝑛+1 ≔ (0, ..., 0, 1) ∈ ℝ𝑛+1 is the (𝑛 + 1)-st indicator vector, and (𝑥, 0) ∈ ℝ𝑛+1 denotes the vector 𝑥 ∈ ℝ𝑛 with a 0 appended as its last coordinate.
It is straightforward to check [▷ You should check!] that Ω′ is convex, and it satisfies the
inclusions
𝔹1(𝑒𝑛+1) ⊆ Ω′ ⊆ 𝔹𝑅+1(0).
Hence, the point 𝑥′0 ≔ 𝑒𝑛+1 is a known interior point, and the ball 𝔹1(𝑥′0) is fully contained in Ω′.
▶ Optimization oracle for Ω′. Before we can invoke the construction with the polar
discussed above, we need to ensure that in the process of augmenting Ω into Ω′, we have
not lost the ability to optimize linear objectives on Ω′.
It is very easy to check [▷ You should check!] that optimization over Ω′ can be decomposed
greedily according to
max𝑥′∈Ω′ ⟨𝑐′, 𝑥′⟩ = max{ max𝑥∈Ω ⟨𝑐′, (𝑥, 0)⟩, max𝑤∈𝔹1(𝑥′0) ⟨𝑐′, 𝑤⟩ }
                = max{ max𝑥∈Ω ⟨𝑐′, (𝑥, 0)⟩, ⟨𝑐′, 𝑥′0⟩ + ‖𝑐′‖2 }.
The first element in the maximum on the right-hand side can be computed using the original linear optimization oracle for Ω (note that ⟨𝑐′, (𝑥, 0)⟩ depends on 𝑥 only through the first 𝑛 coordinates of 𝑐′), and so a linear optimization oracle for Ω′ can be constructed from one for Ω efficiently (i.e., with only a polynomial-time overhead). A sketch of this lifted oracle is given below.
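In the sketch below (all names illustrative), `linear_opt(c)` is assumed to return the minimum value and a minimizer of ⟨𝑐, 𝑥⟩ over Ω; the lifted oracle returns the maximum over Ω′ as in the displayed decomposition, and negating 𝑐′ recovers a minimization oracle.

```python
import numpy as np

def linear_opt_lifted(linear_opt, c_prime):
    """Maximize <c', x'> over the augmented set Omega' in R^{n+1} (sketch).

    `linear_opt(c)` is assumed to return (min value, minimizer) of <c, x> over
    Omega in R^n. Returns (max value, maximizer) over Omega'.
    """
    c_prime = np.asarray(c_prime, dtype=float)
    c = c_prime[:-1]
    # Candidate 1: the best point of the form (x, 0) with x in Omega.
    value, x_star = linear_opt(-c)                  # max <c, x> = -min <-c, x>
    val_omega, cand_omega = -value, np.append(x_star, 0.0)
    # Candidate 2: the best point of the unit ball centered at e_{n+1}.
    e_last = np.zeros_like(c_prime)
    e_last[-1] = 1.0
    norm = np.linalg.norm(c_prime)
    val_ball = float(c_prime @ e_last) + norm
    cand_ball = e_last + (c_prime / norm if norm > 0 else 0.0)
    # The maximum over the convex hull is the larger of the two candidates.
    return (val_omega, cand_omega) if val_omega >= val_ball else (val_ball, cand_ball)
```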
▶ Separation oracle for Ω. Since we have a linear optimization oracle for Ω′, and we know a point 𝑥′0 in the interior of Ω′, we are now in a position to leverage the results in Sections L11.3.1 and L11.3.2 and obtain a separation oracle for Ω′. To complete the construction, we still need to show that such a separation oracle can be efficiently modified into a separation oracle for Ω.
Suppose we want to separate the point 𝑦 ∈ ℝ𝑛 from Ω. We can construct the point 𝑦′ ≔
(𝑦, 0) ∈ ℝ𝑛+1, and query our separation oracle for Ω′.
• Suppose that the separation oracle for Ω′ returns that (𝑦, 0) ∈ Ω′. By inspecting the
definition of Ω′, it must then be the case that 𝑦 ∈ Ω.
• Conversely, suppose that the separation oracle for Ω′ returns a separating direction
𝑑′ = (𝑑, 𝛾) ∈ ℝ𝑛 × ℝ such that
⟨(𝑑, 𝛾), (𝑦, 0) − 𝜆(𝑥, 0) − (1 − 𝜆)𝑤⟩ < 0   ∀𝑥 ∈ Ω, 𝑤 ∈ 𝔹1(𝑥′0), 𝜆 ∈ [0, 1].
Then, the condition holds in particular for 𝜆 = 1, and we find
⟨𝑑, 𝑦 − 𝑥⟩ < 0 ∀𝑥 ∈ Ω,
implying that 𝑑 is a direction separating 𝑦 from Ω.
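Putting the pieces together, the final wrapper is a sketch like the following, where `separation_omega_prime(y')` is a hypothetical separation oracle for Ω′ (obtained as above) returning None if 𝑦′ ∈ Ω′ and a separating direction 𝑑′ = (𝑑, 𝛾) otherwise.

```python
import numpy as np

def separation_for_omega(separation_omega_prime, y):
    """Separation oracle for Omega obtained from one for Omega' (sketch)."""
    y_prime = np.append(np.asarray(y, dtype=float), 0.0)  # lift y to (y, 0) in R^{n+1}
    d_prime = separation_omega_prime(y_prime)
    if d_prime is None:
        return None          # (y, 0) in Omega' implies y in Omega
    return d_prime[:-1]      # drop the last coordinate: d separates y from Omega
```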
Bibliography for this lecture
[YN76] Yudin, D. B., & Nemirovskii, A. S. (1976). Informational complexity and efficient
methods for the solution of convex extremal problems. Matekon, 13(2), 22–45.
[GLS93] Grötschel, M., Lovász, L., & Schrijver, A. (1993). Geometric Algorithms and
Combinatorial Optimization. Springer. https://link.springer.com/book/10.1007/
978-3-642-78240-4
[LSV18] Lee, Y. T., Sidford, A., & Vempala, S. S. (2018). Efficient convex optimization
with membership oracles. Conference on Learning Theory, 1292–1294.
L11.A Appendix
L11.A.1 Proof of Theorem L11.1
Theorem L11.1 (Restated). For any given 𝑥 ∈ Ω, we have 𝑑𝑦(𝑥) ∈ [0, 2𝑅]. Furthermore,
we can compute 𝑑𝑦(𝑥) up to 𝜖 > 0 error using 𝑂(log(𝑅/𝜖)) calls to a membership oracle
for Ω.
Proof. Intuitively, since Ω is contained in a ball of radius 𝑅 centered in 0, when 𝛼 becomes
too large we must leave the ball and therefore Ω. In particular, letting 𝑢 ≔ (𝑦 − 𝑥0)/‖𝑦 − 𝑥0‖2 denote the unit vector along which we move, for 𝛼 > 2𝑅 we have
‖𝑥 + 𝛼𝑢‖2 ≥ 𝛼‖𝑢‖2 − ‖𝑥‖2 ≥ 𝛼 − 𝑅 > 𝑅.
Hence, 𝑑𝑦(𝑥) ∈ [0, 2𝑅]. Since the interval has length 2𝑅, in order to get the desired 𝜖 > 0 precision it then suffices to run a binary search over 𝛼 ∈ [0, 2𝑅], querying the membership oracle at 𝑥 + 𝛼𝑢 at each step, for 𝑂(log(𝑅/𝜖)) iterations. □
L11.A.2 Proof of Theorem L11.2
Theorem L11.2 (Restated). The function 𝑑𝑦 is concave (i.e., −𝑑𝑦 is convex).
Proof. Pick any 𝑥, 𝑥′ ∈ Ω, and 𝑡 ∈ [0, 1]. We need to show that 𝑑𝑦(𝑡𝑥 + (1 − 𝑡)𝑥′) ≥
𝑡𝑑𝑦(𝑥) + (1 − 𝑡)𝑑𝑦(𝑥′).
By definition of 𝑑𝑦, the points
𝑤 ≔ 𝑥 + 𝑑𝑦(𝑥) (𝑦 − 𝑥0)/‖𝑦 − 𝑥0‖2,   𝑤′ ≔ 𝑥′ + 𝑑𝑦(𝑥′) (𝑦 − 𝑥0)/‖𝑦 − 𝑥0‖2
belong to Ω. Since Ω is convex, the point 𝑡𝑤 + (1 − 𝑡)𝑤′ also belongs to Ω. Hence, we have
𝑡𝑤 + (1 − 𝑡)𝑤′ = (𝑡𝑥 + (1 − 𝑡)𝑥′) + (𝑡𝑑𝑦(𝑥) + (1 − 𝑡)𝑑𝑦(𝑥′)) (𝑦 − 𝑥0)/‖𝑦 − 𝑥0‖2 ∈ Ω.
This implies that 𝑡𝑑𝑦(𝑥) + (1 − 𝑡)𝑑𝑦(𝑥′) ∈ ℝ≥0 belongs to the set
{𝛼 ∈ ℝ≥0 : (𝑡𝑥 + (1 − 𝑡)𝑥′) + 𝛼 (𝑦 − 𝑥0)/‖𝑦 − 𝑥0‖2 ∈ Ω},
and therefore
𝑑𝑦(𝑡𝑥 + (1 − 𝑡)𝑥′) ≥ 𝑡𝑑𝑦(𝑥) + (1 − 𝑡)𝑑𝑦(𝑥′)
by the definition of 𝑑𝑦. □
L11.A.3 Proof of Theorem L11.3
Theorem L11.3 (Restated). The function 𝑑𝑦 restricted to the domain 𝔹𝑟/3(𝑥0) is (3𝑅/𝑟)-Lipschitz continuous.
Proof. Let 𝑥, 𝑥′ ∈ 𝔹𝑟/3(𝑥0). We want to show that |𝑑𝑦(𝑥) − 𝑑𝑦(𝑥′)| ≤ (3𝑅/𝑟)‖𝑥 − 𝑥′‖2. If
𝑥 = 𝑥′, the result is trivial, so we assume 𝑥 ≠ 𝑥′. The key insight is that the function 𝑑𝑦
is concave, and furthermore 𝑑𝑦(𝑥′′) ≥ 0 for all 𝑥′′ ∈ 𝔹𝑟(𝑥0) as shown in Theorem L11.1.
These two facts combined imply that 𝑑𝑦 cannot decrease too rapidly [▷ Make a drawing
to convince yourself before jumping into the algebra].
In particular, consider the point
𝑥′′ ≔ 𝑥′ + 𝑡(𝑥 − 𝑥′),   where 𝑡 ≔ (2𝑟/3)/‖𝑥 − 𝑥′‖2.
Using the triangle inequality, this point satisfies
‖𝑥′′ − 𝑥0‖2 ≤ ‖𝑥′ − 𝑥0‖2 + 𝑡‖𝑥 − 𝑥′‖2 ≤ 𝑟/3 + 2𝑟/3 = 𝑟,
that is, 𝑥′′ ∈ 𝔹𝑟(𝑥0) ⊆ Ω.
Hence, 𝑑𝑦(𝑥′′) ≥ 0. Furthermore, 𝑥 is a convex combination of 𝑥′ and 𝑥′′, as
𝑥 = (1 − 1/𝑡)𝑥′ + (1/𝑡)𝑥′′,
and 𝑡 ≥ 1 since ‖𝑥 − 𝑥′‖2 ≤ ‖𝑥 − 𝑥0‖2 + ‖𝑥′ − 𝑥0‖2 ≤ (2/3)𝑟. By the concavity of 𝑑𝑦, we have
𝑑𝑦(𝑥) ≥ (1 − 1/𝑡) 𝑑𝑦(𝑥′) + (1/𝑡) 𝑑𝑦(𝑥′′)
       ≥ (1 − 1/𝑡) 𝑑𝑦(𝑥′)   (since 𝑑𝑦(𝑥′′) ≥ 0)
       = (1 − (3/2) ‖𝑥 − 𝑥′‖2/𝑟) 𝑑𝑦(𝑥′).
Rearranging, we finally find
𝑑𝑦(𝑥′) − 𝑑𝑦(𝑥) ≤ (3/2) (𝑑𝑦(𝑥′)/𝑟) ‖𝑥 − 𝑥′‖2 ≤ (3𝑅/𝑟) ‖𝑥 − 𝑥′‖2   (since 𝑑𝑦(𝑥′) ≤ 2𝑅; see Theorem L11.1).
Finally, repeating the analysis swapping the roles of 𝑥 and 𝑥′, we obtain
𝑑𝑦(𝑥) − 𝑑𝑦(𝑥′) ≤ (3𝑅/𝑟) ‖𝑥 − 𝑥′‖2.
Together, the inequalities imply
|𝑑𝑦(𝑥) − 𝑑𝑦(𝑥′)| ≤ (3𝑅/𝑟) ‖𝑥 − 𝑥′‖2,
as we wanted to show. □
Changelog
• Mar 13, 2025: Fixed denominator from R to 2R (spotted in class)
• Mar 17, 2025: Fixed typo (thanks Jonathan Huang!)
★These notes are class material that has not undergone formal peer review. The TAs and I are grateful
for any reports of typos.
¹Such a plane need not be unique. [▷ Think of an example.]
²While Theorem L4.1 was stated for a function differentiable at all points in its domain, it is immediate
to see that the proof only required differentiability at the particular point used in the linearization. [▷ You
should prove this!]