􀈖


MIT 6.7220/15.084 — Nonlinear Optimization Thu, Feb 8th 2024
Lecture 2
First-order optimality conditions
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)
These notes are class material that has not undergone formal peer review. The TAs and I are grateful for any
reports of typos.
First-order optimality conditions are conditions that optimal points must satisfy. For this lecture, we will make the blanket assumption that we work with differentiable functions.
1 Unconstrained optimization
I’m pretty sure you have already encountered first-order optimality conditions for unconstrained
optimization problems before. For example, consider the following optimization problem.
Example 1.1. Find a solution to the problem
min_𝑥   𝑓(𝑥)
s.t.    𝑥 ∈ ℝ,

where the differentiable function 𝑓 : ℝ → ℝ, plotted below, is defined as

𝑓(𝑥) ≔ −2𝑥 + 𝑒^𝑥 − 5.

[Figure: plot of 𝑓(𝑥) for 𝑥 ∈ [−3, 3].]
Solution. I expect that most students would have the same thought: take the gradient of the function, set it to 0, and solve for 𝑥! In this case, this leads to −2 + 𝑒^𝑥 = 0, which implies that the optimal point is 𝑥 = log 2 ≈ 0.693.
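As a quick sanity check, here is a minimal Python sketch (our own addition, not part of the original notes) verifying the computation numerically:

```python
import numpy as np

# f(x) = -2x + e^x - 5 from Example 1.1, and its derivative.
f = lambda x: -2 * x + np.exp(x) - 5
df = lambda x: -2 + np.exp(x)

x_star = np.log(2)          # candidate obtained by solving f'(x) = 0
print(df(x_star))           # 0.0: the first-order condition holds

# f is indeed larger at nearby points (and, here, everywhere else:
# f is convex, since f''(x) = e^x > 0).
for x in (x_star - 0.1, x_star + 0.1, -3.0, 3.0):
    assert f(x) > f(x_star)
```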
Now, in the above process we have been pretty informal. It is good to remember that when facing an optimization problem of the form min_{𝑥∈ℝ𝑛} 𝑓(𝑥), with 𝑓(𝑥) differentiable, solving ∇𝑓(𝑥) = 0 has some limitations:
• It is only a necessary condition that all optimal points need to satisfy; but not all points that satisfy it are automatically optimal. [▷ For example, think about what happens with 𝑓(𝑥) = −𝑥²? With 𝑓(𝑥) = 𝑥³? With 𝑓(𝑥) = 𝑥³ + 3𝑥² − 6𝑥 − 8? See the sketch right after this list.]
• In other words, the solutions to ∇𝑓(𝑥) = 0 form a list of possible minimizing points: solving ∇𝑓(𝑥) = 0 allows us to focus our attention on a few promising candidate points (some people call these “critical points”). It might give false positives but never false negatives: if a point fails the ∇𝑓(𝑥) = 0 test, it cannot be optimal.
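To make the bracketed examples concrete, here is a small numerical sketch (our own addition) classifying the critical points of the cubic above. It has two critical points, yet neither is a global minimizer:

```python
import numpy as np

# Critical points of f(x) = x^3 + 3x^2 - 6x - 8:
# f'(x) = 3x^2 + 6x - 6, whose roots are the candidate points.
critical_points = np.roots([3, 6, -6])     # x = -1 ± sqrt(3)

# The second derivative f''(x) = 6x + 6 distinguishes the two:
for x in sorted(critical_points):
    kind = "local min" if 6 * x + 6 > 0 else "local max"
    print(f"x = {x:+.4f}: {kind}")

# Neither point is globally optimal: f(x) -> -inf as x -> -inf, so the
# problem min_x f(x) has no minimizer at all -- both roots of f'(x) = 0
# are "false positives".
```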
In practice, as you know from experience, solving ∇𝑓(𝑥) = 0 is an effective way of analytically solving unconstrained problems. Today and next time, we will focus on the following two big questions:
• What is the correct generalization of the necessary condition ∇𝑓(𝑥) = 0, when we are faced
with a constrained optimization problem?
• Under what circumstances does ∇𝑓(𝑥) = 0 also become sufficient for optimality?
2 Constrained optimization
In order to generalize the “∇𝑓(𝑥) = 0” condition to constrained optimization problems, it is impor-
tant to make sure we are all on the same page as to why such a condition arises in the first place in
unconstrained problems. From there, generalizing will be straightforward.
2.1 Where the zero gradient condition arises from in unconstrained optimization
The idea is very simple: if 𝑥 is a minimizer of the function 𝑓 : ℝ𝑛 → ℝ, look at the values of 𝑓 along any direction 𝑑 ∈ ℝ𝑛. Clearly, 𝑓(𝑥 + 𝑡 ⋅ 𝑑) ≥ 𝑓(𝑥) for all 𝑡 ≥ 0 (or 𝑥 would not be a minimizer). Hence, the directional derivative 𝑓′(𝑥; 𝑑) of 𝑓 at 𝑥 along direction 𝑑 satisfies

𝑓′(𝑥; 𝑑) = lim_{𝑡↓0} [𝑓(𝑥 + 𝑡 ⋅ 𝑑) − 𝑓(𝑥)] / 𝑡 ≥ 0,

since the limit of nonnegative quantities must be nonnegative.
By definition of gradient, we have 𝑓′(𝑥; 𝑑) = ⟨∇𝑓(𝑥), 𝑑⟩, and so the previous inequality can be rewritten as
⟨∇𝑓(𝑥), 𝑑⟩ ≥ 0 ∀𝑑 ∈ ℝ𝑛.
Because the above inequality must hold for all directions 𝑑 ∈ ℝ𝑛, in particular it must hold for 𝑑 =
−∇𝑓(𝑥), leading to
−‖∇𝑓(𝑥)‖² ≥ 0   ⟹   ∇𝑓(𝑥) = 0.
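The identity 𝑓′(𝑥; 𝑑) = ⟨∇𝑓(𝑥), 𝑑⟩ used above is easy to check numerically with a finite-difference quotient. A minimal sketch (the test function is an arbitrary choice of ours, for illustration only):

```python
import numpy as np

# A smooth test function and its gradient.
def f(x):
    return np.sum(x**4) + x @ x

def grad_f(x):
    return 4 * x**3 + 2 * x

rng = np.random.default_rng(0)
x = rng.normal(size=5)
d = rng.normal(size=5)

# One-sided difference quotient (f(x + t d) - f(x)) / t for small t > 0
# approximates the directional derivative f'(x; d).
t = 1e-6
quotient = (f(x + t * d) - f(x)) / t
print(quotient, grad_f(x) @ d)   # the two values agree to ~6 digits
```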
2.2 The constrained case
Now that we have a clearer picture of why the “∇𝑓(𝑥) = 0” condition arises in unconstrained prob-
lems, the extension to the constrained case is rather natural.
The main difference with the unconstrained case is that, in a constrained set, we might be limited
in the choices of available directions 𝑑 along which we can approach 𝑥 while remaining in the set.
Nonetheless, for any direction 𝑑 such that 𝑥 + 𝑡 ⋅ 𝑑 ∈ Ω for all 𝑡 ≥ 0 sufficiently small, the above
argument applies without changes, and we can still conclude that necessarily ⟨∇𝑓(𝑥), 𝑑⟩ ≥ 0.
So, the natural generalization of the “∇𝑓(𝑥) = 0” condition to constrained problems can be infor-
mally stated as follows: for the optimality of 𝑥 it is necessary that
⟨∇𝑓(𝑥), 𝑑⟩ ≥ 0 for all 𝑑 ∈ ℝ𝑛 that remain in Ω from 𝑥. (1)
In order to instantiate the above condition, two steps are required:
1. first, we need to determine what the set of “directions 𝑑 that remain in Ω from 𝑥” is.
2. then, based on the directions above, see in what way they constrain ∇𝑓(𝑥). For example, we have seen before that when the set of all directions spans the entire space ℝ𝑛, then ∇𝑓(𝑥) = 0.
Out of the two, the first point is usually the easier. In all the cases that will be of interest to us, we can determine the set of directions that remain in Ω from 𝑥 by simply taking any other point 𝑦 ∈ Ω and considering the direction from 𝑥 to 𝑦. This holds trivially if all line segments between 𝑥 and any point in Ω are entirely contained in Ω, a condition known as star-convexity at 𝑥.
Definition 2.1 (Star-convexity at 𝑥). A set Ω ⊆ ℝ𝑛 is said to be star-convex at a point 𝑥 ∈ Ω if,
for all 𝑦 ∈ Ω, the entire segment from 𝑥 to 𝑦 is contained in Ω. In symbols, if
𝑥 + 𝑡 ⋅ (𝑦 − 𝑥) ∈ Ω ∀𝑡 ∈ [0, 1].
(Note that the condition is equivalent to “𝑡 ⋅ 𝑦 + (1 − 𝑡) ⋅ 𝑥 ∈ Ω for all 𝑦 ∈ Ω and 𝑡 ∈ [0, 1]”, or
also “𝑡 ⋅ 𝑥 + (1 − 𝑡) ⋅ 𝑦 ∈ Ω for all 𝑦 ∈ Ω and 𝑡 ∈ [0, 1]”.)
In fact, for all our purposes today, we will only consider sets that are star-convex at all of their
points. Such sets are simply called convex.
Definition 2.2 (Convex set). A set Ω is convex if it is star-convex at all of its points 𝑥 ∈ Ω. In
other words, Ω is convex if all segments formed between any two points 𝑥, 𝑦 ∈ Ω are entirely
contained in Ω. In symbols, if
𝑡 ⋅ 𝑥 + (1 − 𝑡) ⋅ 𝑦 ∈ Ω ∀𝑥, 𝑦 ∈ Ω and 𝑡 ∈ [0, 1].
Under the assumption of convexity, condition (1) can be equivalently rewritten as follows.
Theorem 2.1 (First-order necessary optimality condition for a convex feasible set). Let Ω ⊆ ℝ𝑛 be convex and 𝑓 : ℝ𝑛 → ℝ a differentiable function. For a point 𝑥 ∈ Ω to be a minimizer of
𝑓 over Ω it is necessary that
⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ ≥ 0 ∀𝑦 ∈ Ω.
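A numerical illustration of Theorem 2.1 (a sketch; the feasible set and objective are our own choices, not from the notes): minimize 𝑓(𝑦) = ‖𝑦 − 𝑐‖² over the unit ball. When 𝑐 lies outside the ball, the minimizer is 𝑥 = 𝑐/‖𝑐‖, and the inequality can be tested on random feasible points:

```python
import numpy as np

rng = np.random.default_rng(0)

# Omega = unit ball, f(y) = ||y - c||^2 with c outside the ball.
c = np.array([2.0, 1.0])
x_star = c / np.linalg.norm(c)     # minimizer: the projection of c onto Omega
grad = 2 * (x_star - c)            # gradient of f at x_star

# Theorem 2.1: <grad f(x*), y - x*> >= 0 for every feasible y.
for _ in range(10_000):
    y = rng.normal(size=2)
    y /= max(1.0, np.linalg.norm(y))   # push y inside the unit ball
    assert grad @ (y - x_star) >= -1e-12

print("first-order optimality condition verified on 10,000 feasible points")
```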
2.3 Geometric intuition: normal cones
The condition established in Theorem 2.1 has the following geometric interpretation: the gradient of 𝑓 at a solution 𝑥 ∈ Ω must form a non-obtuse angle (at most 90°) with all directions 𝑦 − 𝑥, 𝑦 ∈ Ω. While this makes perfect sense, it is actually more customary, for mental visualization purposes, to flip signs and instead have the following useful mental picture: at any solution 𝑥 ∈ Ω, the opposite of the gradient −∇𝑓(𝑥) must form a non-acute angle (at least 90°) with all directions 𝑦 − 𝑥, 𝑦 ∈ Ω. In other words, −∇𝑓(𝑥) can only “look” in directions such that no point of the set falls within its 90° cone of vision.
Of course, depending on the shape of the set Ω and the particular point 𝑥 ∈ Ω, the set of directions
that point away from the set might be extremely limited—for example we have seen earlier that
when Ω = ℝ𝑛, then no directions “point away” from Ω, and the only possible value for −∇𝑓(𝑥) is
therefore 0. This mental picture of “directions pointing away” from Ω is generally pretty useful, and
we give it a name.
Definition 2.3 (Normal cone). Let Ω ⊆ ℝ𝑛 be convex, and let 𝑥 ∈ Ω. The normal cone to Ω at
𝑥, denoted 𝒩Ω(𝑥), is defined as the set
𝒩Ω(𝑥) ≔ {𝑑 ∈ ℝ𝑛 : ⟨𝑑, 𝑦 − 𝑥⟩ ≤ 0 ∀𝑦 ∈ Ω}.
With this definition, the first-order necessary optimality condition for 𝑥, given in Theorem 2.1,
can be equivalently written as
−∇𝑓(𝑥) ∈ 𝒩Ω(𝑥).
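Here is a quick numerical check of the condition −∇𝑓(𝑥) ∈ 𝒩Ω(𝑥) (a sketch; the box example is our own, chosen because its normal cone has an explicit componentwise description):

```python
import numpy as np

# Omega = [0,1]^n; its normal cone at x is componentwise:
#   d_i <= 0 where x_i = 0,  d_i >= 0 where x_i = 1,  d_i = 0 otherwise.
# For f(x) = ||x - c||^2 / 2 the minimizer over Omega is x* = clip(c, 0, 1).
c = np.array([-0.5, 0.3, 1.7])
x_star = np.clip(c, 0.0, 1.0)
d = c - x_star               # this is -grad f(x*) = -(x* - c)

for d_i, x_i in zip(d, x_star):
    if x_i == 0.0:
        assert d_i <= 0      # points out of the box through the face x_i = 0
    elif x_i == 1.0:
        assert d_i >= 0      # points out through the face x_i = 1
    else:
        assert abs(d_i) < 1e-12   # interior coordinate: no normal component

print("-grad f(x*) =", d, "lies in the normal cone at x* =", x_star)
```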
Example 2.1. As an example, here are a few normal cones computed for a convex set.
[Figure: a convex set Ω with the normal cones 𝒩Ω(𝑥₁) and 𝒩Ω(𝑥₂) drawn at two boundary points 𝑥₁, 𝑥₂.]
3 Normal cones to some notable sets, and applications
Let’s build our intuition regarding normal cones by considering examples that are progressively
harder. Along the way, we will see that first-order optimality conditions, in all their simplicity, imply
some of the deepest results in optimization theory.
3.1 Point in the interior
Let’s start from an easy example: the normal cone of a point in the interior of the feasible set.¹
¹Remember that a point is in the interior of a set if you can draw a ball centered at the point, such that the ball
is fully contained in the set.
Example 3.1 (Normal cone at an interior point). What is the normal cone 𝒩Ω(𝑥) of a point 𝑥
in the interior of the feasible set Ω?
[Figure: a point 𝑥 in the interior of a set Ω.]
Solution. In this case, the normal cone contains only the zero vector, that is,

𝒩Ω(𝑥) = {0}.

This is easy to prove: if any 𝑑 ≠ 0 were to belong to 𝒩Ω(𝑥), then the point 𝑥 + 𝛿𝑑 would belong to Ω for sufficiently small 𝛿 > 0 (since 𝑥 is in the interior), and yet

⟨𝑑, 𝑥 + 𝛿𝑑 − 𝑥⟩ = 𝛿‖𝑑‖² > 0,

contradicting the definition of the normal cone.
Hence, for a point 𝑥 in the interior of Ω to be optimal, it is necessary that ∇𝑓(𝑥) = 0.
This fully recovers what we know for unconstrained domains, since there every point is in the interior
of the feasible set.
3.2 Point on a hyperplane / subspace
Next up, we consider the normal cone to a point on a hyperplane.
Example 3.2 (Normal cone to a hyperplane). Consider a hyperplane
Ω ≔ {𝑦 ∈ ℝ𝑛 : ⟨𝑎, 𝑦⟩ = 0}, where 𝑎 ∈ ℝ𝑛, 𝑎 ≠ 0
and a point 𝑥 ∈ Ω.
It is pretty intuitive from the picture that
𝒩Ω(𝑥) = span{𝑎} = {𝜆 ⋅ 𝑎 : 𝜆 ∈ ℝ}.
[Figure: the hyperplane Ω, the normal direction 𝑎 at 𝑥, and the normal cone 𝒩Ω(𝑥) = span{𝑎}.]
Solution. In order to convert this intuition into a formal proof, [▷ before continuing, try to think how you would go about proving this yourself!] we would need to show two things:
• all points in span{𝑎} do indeed belong to 𝒩Ω(𝑥); by the definition of normal cone, this means that we need to show that all points 𝑧 ∈ span{𝑎} satisfy
⟨𝑧, 𝑦 − 𝑥⟩ ≤ 0 ∀𝑦 ∈ Ω;
• none of the points outside of span{𝑎} belongs to 𝒩Ω(𝑥); that is, for any point 𝑧 ∉ span{𝑎}, there exists 𝑦 ∈ Ω such that ⟨𝑧, 𝑦 − 𝑥⟩ > 0.
The first point is straightforward: by definition of span, all points 𝑧 ∈ span{𝑎} are of the form 𝑧 = 𝜆 ⋅ 𝑎 for some 𝜆 ∈ ℝ. But then, for all 𝑦 ∈ Ω,

⟨𝑧, 𝑦 − 𝑥⟩ = ⟨𝜆 ⋅ 𝑎, 𝑦 − 𝑥⟩ = 𝜆 ⋅ ⟨𝑎, 𝑦⟩ − 𝜆 ⋅ ⟨𝑎, 𝑥⟩ = 0 − 0 ≤ 0,
where the last equality follows from the definition of Ω and the fact that both 𝑥 and 𝑦 belong
to it. To prove the second point, we can let the geometric intuition guide us. Draw a vector 𝑧 ∉
span{𝑎} applied to 𝑥, and look at the picture:
[Figure: a vector 𝑧 ∉ span{𝑎} applied at 𝑥, together with its decomposition into a component along Ω (toward 𝑦) and a component along span{𝑎}.]
We can project the point 𝑥 + 𝑧 onto Ω, finding some 𝑦 ∈ Ω, and onto 𝑥 + span{𝑎}, finding some
point 𝑥 + 𝑘 ⋅ 𝑎:
𝑧 = (𝑦 − 𝑥) + 𝑘 ⋅ 𝑎.
We now show that 𝑧 cannot be in 𝒩Ω(𝑥), because it would have a positive inner product with
𝑦 − 𝑥:
⟨𝑧, 𝑦 − 𝑥⟩ = ⟨(𝑦 − 𝑥) + 𝑘 ⋅ 𝑎, 𝑦 − 𝑥⟩ = ‖𝑦 − 𝑥‖² + 𝑘 ⋅ ⟨𝑎, 𝑦 − 𝑥⟩ = ‖𝑦 − 𝑥‖²,

where the last equality uses ⟨𝑎, 𝑦 − 𝑥⟩ = ⟨𝑎, 𝑦⟩ − ⟨𝑎, 𝑥⟩ = 0, since both 𝑥 and 𝑦 belong to Ω.
Since 𝑧 was not in span{𝑎} by hypothesis, we have 𝑦 ≠ 𝑥, and therefore ⟨𝑧, 𝑦 − 𝑥⟩ > 0, as we wanted to show.
Remark 3.1. Because normal cones are insensitive to shifts in the set, the result above applies without changes to any affine plane Ω ≔ {𝑦 ∈ ℝ𝑛 : ⟨𝑎, 𝑦⟩ = 𝑏}, with 𝑎 ∈ ℝ𝑛, 𝑏 ∈ ℝ. Again, 𝒩Ω(𝑥) = span{𝑎} = {𝜆 ⋅ 𝑎 : 𝜆 ∈ ℝ} at any 𝑥 ∈ Ω.
Remark 3.2. The same argument above, based on decomposing 𝑥 + 𝑧 onto Ω and its orthogonal complement, applies to lower-dimensional affine subspaces
Ω ≔ {𝑦 ∈ ℝ𝑛 : 𝐴𝑦 = 𝑏}.
In this case, we obtain that
𝒩Ω(𝑥) = colspan(𝐴⊤).
(This immediately recovers Example 3.2 by considering 𝐴 = 𝑎⊤.)
In the case of Remark 3.2, the argument with the projection goes through verbatim; one would need to project 𝑥 + 𝑧 onto colspan(𝐴⊤) and onto Ω.²
²The orthogonality of colspan(𝐴⊤) and Ω is a reflection of the well-known linear algebra result that the orthogonal complement of the nullspace of a matrix is the span of the columns of the transpose matrix.
Remark 3.3 (Lagrange multipliers). The discussion we just had shows that whenever we have a problem of the form

min_𝑥   𝑓(𝑥)
s.t.    𝐴𝑥 = 𝑏
        𝑥 ∈ ℝ𝑛,

at optimality it needs to hold that

−∇𝑓(𝑥) = 𝐴⊤𝜆, for some 𝜆 ∈ ℝ𝑑,

where 𝑑 is the number of rows of 𝐴. This necessity of being able to express, at optimality, the gradient of the objective as a linear combination of the gradients of the constraints is very general. The entries of 𝜆 are an example of Lagrange multipliers.
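As an illustration of Remark 3.3 (a sketch on a toy instance of our own choosing), consider min ½‖𝑥‖² subject to 𝐴𝑥 = 𝑏: stationarity −∇𝑓(𝑥) = 𝐴⊤𝜆 together with feasibility forms a linear system that we can solve and verify directly:

```python
import numpy as np

# min (1/2)||x||^2  s.t.  A x = b.
# Stationarity (-grad f(x) = A^T lam, i.e. x + A^T lam = 0) plus feasibility
# gives the linear (KKT) system   [I  A^T; A  0] [x; lam] = [0; b].
rng = np.random.default_rng(0)
d, n = 2, 4
A = rng.normal(size=(d, n))
b = rng.normal(size=d)

K = np.block([[np.eye(n), A.T],
              [A, np.zeros((d, d))]])
rhs = np.concatenate([np.zeros(n), b])
x_star, lam = np.split(np.linalg.solve(K, rhs), [n])

print(np.allclose(A @ x_star, b))        # feasibility:  A x* = b
print(np.allclose(-x_star, A.T @ lam))   # -grad f(x*) = A^T lam
```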
In the next two subsections, we will see how the characterization of the normal cone to affine sub-
spaces enables us to solve a couple of problems that arise in practice.
