
􀀲􀀲􀄚􀀔􀊘􀅄􀋀􀊚􀊘􀋀􀊚

MIT 6.7220/15.084 — Nonlinear Optimization Tue, Feb 13th 2024
Lecture 3
The special case of convex functions
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)
These notes are class material that has not undergone formal peer review. The TAs and I are grateful for any
reports of typos.
In the previous lecture, we saw that any solution 𝑥 to a nonlinear optimization problem defined
on a convex feasible set Ω ⊆ ℝ𝑛 must necessarily satisfy the first-order optimality condition
⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ ≥ 0 ∀𝑦 ∈ Ω.
In general, this optimality condition is only necessary but not sufficient. However, there exists a
notable class of functions for which such a condition is sufficient. These are called convex functions,
and are the topic of today’s lecture.
1 Convex functions
Intuitively, a good mental picture for convex functions is as functions that “curve upward” (think
of a bowl for example). All the following functions are convex:
[Figure: plots of three convex functions, including 𝑓(𝑥) = 𝑥 log 𝑥 and 𝑓(𝑥) = log(1 + 𝑒𝑥).]
In particular, due to their curvature, local optima of these functions are also global optima, and the
first-order optimality condition completely characterizes optimal points.
To capture the condition on the curvature in the most general terms (that is, without even assuming
differentiability of the function), the following definition is used.
Definition 1.1 (Convex function). Let Ω ⊆ ℝ𝑛 be convex.
A function 𝑓 : Ω → ℝ is convex if, for any two points 𝑥, 𝑦 ∈
Ω and 𝑡 ∈ [0, 1],
𝑓((1 − 𝑡) ⋅ 𝑥 + 𝑡 ⋅ 𝑦) ≤ (1 − 𝑡) ⋅ 𝑓(𝑥) + 𝑡 ⋅ 𝑓(𝑦).
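The defining inequality can be checked numerically. The following sketch (the choice of 𝑓(𝑥) = 𝑥 log 𝑥 and the sampling ranges are arbitrary illustrative choices) verifies the chord inequality of Definition 1.1 at random points:

```python
import math
import random

def f(x):
    # f(x) = x * log(x), a standard convex function on (0, infinity)
    return x * math.log(x)

random.seed(0)
holds = True
for _ in range(1000):
    # sample two points in (0, 1] and a mixing weight t in [0, 1]
    x, y = random.uniform(0.01, 1.0), random.uniform(0.01, 1.0)
    t = random.random()
    lhs = f((1 - t) * x + t * y)            # value at the convex combination
    rhs = (1 - t) * f(x) + t * f(y)         # chord: convex combination of values
    holds = holds and lhs <= rhs + 1e-12    # small tolerance for float error
print(holds)  # every sampled chord lies on or above the function
```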
1.1 Convexity implies bounding by linearization
Assuming that 𝑓 is not only convex but also differentiable, a very important property of convex
functions is that they lie above their linearization at any point.
[Figure: a convex differentiable function 𝑓 lies above its linearization 𝑓(𝑥0) + ⟨∇𝑓(𝑥0), 𝑥 − 𝑥0⟩ around any point 𝑥0.]
This follows directly from the definition, as we show next.
Theorem 1.1. Let 𝑓 : Ω → ℝ be a convex and differentiable function defined on a convex domain
Ω. Then, at all 𝑥 ∈ Ω,
𝑓(𝑦) ≥ 𝑓(𝑥) + ⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ ∀𝑦 ∈ Ω,
where the right-hand side is the linearization of 𝑓 around 𝑥.
Proof . Pick any 𝑥, 𝑦 ∈ Ω. By definition of convexity, we have
𝑓(𝑥 + 𝑡 ⋅ (𝑦 − 𝑥)) ≤ 𝑓(𝑥) + 𝑡 ⋅ (𝑓(𝑦) − 𝑓(𝑥)) ∀𝑡 ∈ [0, 1].
Moving the 𝑓(𝑥) from the right-hand side to the left-hand side, and dividing by 𝑡, we therefore get
(𝑓(𝑥 + 𝑡 ⋅ (𝑦 − 𝑥)) − 𝑓(𝑥)) / 𝑡 ≤ 𝑓(𝑦) − 𝑓(𝑥) ∀𝑡 ∈ (0, 1].
Taking a limit as 𝑡 ↓ 0 and recognizing a directional derivative at 𝑥 along direction 𝑦 − 𝑥 on the
left-hand side, we conclude that
⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ ≤ 𝑓(𝑦) − 𝑓(𝑥).
Rearranging yields the result.
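For a concrete instance of Theorem 1.1, the sketch below checks the linearization bound for a quadratic 𝑓(𝑥) = 𝑥⊤𝐴𝑥 with a randomly generated positive definite 𝐴 (an arbitrary choice made for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + np.eye(3)      # positive definite by construction

f = lambda x: x @ A @ x      # f(x) = x^T A x, convex since A is PSD
grad = lambda x: 2 * A @ x   # gradient of x^T A x for symmetric A

ok = True
for _ in range(500):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    # f(y) must dominate the linearization of f around x, evaluated at y
    ok = ok and f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print(ok)
```

Here the gap 𝑓(𝑦) − 𝑓(𝑥) − ⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ equals (𝑦 − 𝑥)⊤𝐴(𝑦 − 𝑥) ≥ 0 exactly, so the check passes up to floating-point error.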
1.2 Sufficiency of first-order optimality conditions
The above result also immediately shows the sufficiency of first-order optimality conditions.
Theorem 1.2. Let Ω ⊆ ℝ𝑛 be convex and 𝑓 : Ω → ℝ be a convex differentiable function. Then,
−∇𝑓(𝑥) ∈ 𝒩Ω(𝑥) ⟺ 𝑥 is a minimizer of 𝑓 on Ω.
Proof . We already know from Lecture 2 that −∇𝑓(𝑥) ∈ 𝒩Ω(𝑥) is necessary for optimality. So,
we just need to show sufficiency. Specifically, we need to show that if ⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ ≥ 0 for all
𝑦 ∈ Ω, then surely 𝑓(𝑦) ≥ 𝑓(𝑥) for all 𝑦 ∈ Ω. This follows immediately from Theorem 1.1.
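As a worked one-dimensional instance (the function and interval are illustrative choices), take 𝑓(𝑥) = (𝑥 + 1)² on Ω = [0, 1]: at 𝑥 = 0 we have 𝑓′(0) = 2, so ⟨𝑓′(0), 𝑦 − 0⟩ = 2𝑦 ≥ 0 for all 𝑦 ∈ [0, 1], and convexity upgrades the first-order condition to global optimality:

```python
def f(x):
    # f(x) = (x + 1)^2, convex and differentiable; feasible set is [0, 1]
    return (x + 1) ** 2

df = lambda x: 2 * (x + 1)   # derivative of f

x_star = 0.0
# First-order condition: <f'(x*), y - x*> >= 0 for all feasible y in [0, 1]
cond = all(df(x_star) * (y / 100 - x_star) >= 0 for y in range(101))
# Sufficiency (Theorem 1.2): x* is then a global minimizer on [0, 1]
optimal = all(f(y / 100) >= f(x_star) for y in range(101))
print(cond, optimal)
```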
2 Equivalent definitions of convexity
Theorem 2.1. Let Ω ⊆ ℝ𝑛 be a convex set, and 𝑓 : Ω → ℝ be a function. The following are
equivalent definitions of convexity:
1. 𝑓((1 − 𝑡) ⋅ 𝑥 + 𝑡 ⋅ 𝑦) ≤ (1 − 𝑡) ⋅ 𝑓(𝑥) + 𝑡 ⋅ 𝑓(𝑦) for all 𝑥, 𝑦 ∈ Ω, 𝑡 ∈ [0, 1].
2. (If 𝑓 is differentiable) 𝑓(𝑦) ≥ 𝑓(𝑥) + ⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ for all 𝑥, 𝑦 ∈ Ω.
3. (If 𝑓 is twice differentiable and Ω is open) ∇2𝑓(𝑥) ≽ 0 for all 𝑥 ∈ Ω.
Condition (1) is the most general, condition (2) is the most often used, and condition (3) is often the easiest to check.
Proof . We have already seen how (1) ⟹ (2) in Theorem 1.1. To conclude the proof, it suffices to
show that under differentiability (2) ⟹ (1), and that under twice differentiability and openness
of Ω (2) ⟺ (3). We break the proof into separate steps.
Proof that (2) ⟹ (1). Pick any 𝑎, 𝑏 ∈ Ω and 𝑡 ∈ (0, 1), and consider the point
Ω ∋ 𝑧 ≔ 𝑡 ⋅ 𝑎 + (1 − 𝑡) ⋅ 𝑏.
From the linearization bound (2) for the choices (𝑥, 𝑦) = (𝑧, 𝑎), (𝑧, 𝑏), we know that
𝑓(𝑎) ≥ 𝑓(𝑧) + ⟨∇𝑓(𝑧), 𝑎 − 𝑧⟩,
𝑓(𝑏) ≥ 𝑓(𝑧) + ⟨∇𝑓(𝑧), 𝑏 − 𝑧⟩.
Multiplying the first inequality by 𝑡 and the second by 1 − 𝑡, and summing, we obtain
𝑡 ⋅ 𝑓(𝑎) + (1 − 𝑡) ⋅ 𝑓(𝑏) ≥ 𝑓(𝑧) + ⟨∇𝑓(𝑧), 𝑡 ⋅ 𝑎 + (1 − 𝑡) ⋅ 𝑏 − 𝑧⟩
= 𝑓(𝑧),
where the equality follows since by definition 𝑧 = 𝑡 ⋅ 𝑎 + (1 − 𝑡) ⋅ 𝑏. Rearranging, we have (1).
Proof that (2) ⟹ (3). Pick any two 𝑥, 𝑦 ∈ Ω, and 𝑡 ∈ (0, 1]. Define 𝑥𝑡 ≔ 𝑥 + 𝑡 ⋅ (𝑦 − 𝑥); note
that 𝑥𝑡 ∈ Ω by convexity of Ω. From (2), we can write
𝑓(𝑥𝑡) ≥ 𝑓 (𝑥) + ⟨∇𝑓 (𝑥), 𝑥𝑡 − 𝑥⟩
𝑓(𝑥) ≥ 𝑓(𝑥𝑡) + ⟨∇𝑓 (𝑥𝑡), 𝑥 − 𝑥𝑡⟩.
Summing the inequalities, we therefore conclude that
0 ≥ ⟨∇𝑓(𝑥) − ∇𝑓(𝑥𝑡), 𝑥𝑡 − 𝑥⟩
= ⟨∇𝑓(𝑥) − ∇𝑓(𝑥𝑡), 𝑡 ⋅ (𝑦 − 𝑥)⟩
= 𝑡 ⋅ ⟨∇𝑓(𝑥) − ∇𝑓(𝑥𝑡), 𝑦 − 𝑥⟩.
Rearranging and dividing by 𝑡2, we have
⟨∇𝑓(𝑥 + 𝑡 ⋅ (𝑦 − 𝑥)) − ∇𝑓(𝑥), 𝑦 − 𝑥⟩ / 𝑡 ≥ 0.
Taking the limit as 𝑡 ↓ 0, we therefore have
⟨𝑦 − 𝑥, ∇2𝑓(𝑥)(𝑦 − 𝑥)⟩ ≥ 0.
Since Ω is open by hypothesis, the direction 𝑦 − 𝑥 is arbitrary, and therefore we must
have ∇2𝑓(𝑥) ≽ 0, as we wanted to show.
Proof that (3) ⟹ (2). By hypothesis, for any 𝑥, 𝑦 ∈ Ω and 𝜏 ∈ [0, 1],
0 ≤ ⟨𝑦 − 𝑥, ∇2𝑓 (𝑥 + 𝜏 ⋅ (𝑦 − 𝑥)) ⋅ (𝑦 − 𝑥)⟩.
Hence, taking a double integral,
0 ≤ ∫₀¹ ∫₀ᵗ ⟨𝑦 − 𝑥, ∇2𝑓(𝑥 + 𝜏 ⋅ (𝑦 − 𝑥)) ⋅ (𝑦 − 𝑥)⟩ d𝜏 d𝑡
= ∫₀¹ ⟨𝑦 − 𝑥, ∫₀ᵗ ∇2𝑓(𝑥 + 𝜏 ⋅ (𝑦 − 𝑥)) ⋅ (𝑦 − 𝑥) d𝜏⟩ d𝑡  (the inner integrand is d/d𝜏 ∇𝑓(𝑥 + 𝜏 ⋅ (𝑦 − 𝑥)))
= ∫₀¹ ⟨𝑦 − 𝑥, ∇𝑓(𝑥 + 𝑡 ⋅ (𝑦 − 𝑥)) − ∇𝑓(𝑥)⟩ d𝑡
= −⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ + ∫₀¹ ⟨∇𝑓(𝑥 + 𝑡 ⋅ (𝑦 − 𝑥)), 𝑦 − 𝑥⟩ d𝑡  (the integrand is d/d𝑡 𝑓(𝑥 + 𝑡 ⋅ (𝑦 − 𝑥)))
= 𝑓(𝑦) − 𝑓(𝑥) − ⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩.
Rearranging yields (2).
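Criterion (3) can be illustrated numerically. The sketch below uses the arbitrary example 𝑓(𝑥1, 𝑥2) = 𝑒𝑥1+𝑥2, whose Hessian at any point is 𝑒𝑥1+𝑥2 times the all-ones matrix, and checks that the smallest Hessian eigenvalue is nonnegative at random points:

```python
import numpy as np

rng = np.random.default_rng(1)
psd_everywhere = True
for _ in range(200):
    x = rng.uniform(-2, 2, size=2)
    # Hessian of f(x1, x2) = exp(x1 + x2): every second partial equals exp(x1 + x2)
    H = np.exp(x.sum()) * np.array([[1.0, 1.0], [1.0, 1.0]])
    eigvals = np.linalg.eigvalsh(H)   # eigenvalues of a symmetric matrix, ascending
    # eigenvalues are 0 and 2*exp(x1 + x2); allow tiny negative float error
    psd_everywhere = psd_everywhere and eigvals[0] >= -1e-9
print(psd_everywhere)
```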
3 Examples of convex functions
The third criterion of Theorem 2.1 is usually the easiest to check in practice. For example, from that
criterion it follows immediately that these functions are convex:
• 𝑓(𝑥) = 𝑎⊤𝑥 + 𝑏 for any 𝑎 ∈ ℝ𝑛 and 𝑏 ∈ ℝ;
• 𝑓(𝑥) = 𝑥⊤𝐴𝑥 for any 𝐴 ≽ 0, including 𝑓(𝑥) = ‖𝑥‖₂²;
• the negative entropy function 𝑓(𝑥) = ∑ᵢ₌₁ⁿ 𝑥𝑖 log 𝑥𝑖, defined for 𝑥𝑖 > 0;
• the function 𝑓(𝑥) = − ∑ᵢ₌₁ⁿ log 𝑥𝑖, defined for 𝑥𝑖 > 0;
• the function 𝑓(𝑥) = log(1 + 𝑒𝑥).
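For the last function in the list, criterion (3) applies because the second derivative of log(1 + 𝑒𝑥) is 𝜎(𝑥)(1 − 𝜎(𝑥)), where 𝜎 is the logistic sigmoid, which is nonnegative everywhere. A quick numeric confirmation on a grid:

```python
import math

def second_derivative(x):
    # f(x) = log(1 + e^x) has f''(x) = sigma(x) * (1 - sigma(x)),
    # where sigma(x) = 1 / (1 + e^(-x)) is the logistic sigmoid
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

# sigma takes values in (0, 1), so s * (1 - s) is nonnegative everywhere
nonneg = all(second_derivative(x / 10) >= 0 for x in range(-100, 101))
print(nonneg)
```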
Convexity-preserving operations In addition to the criteria above, one can also recognize convex
functions when they are obtained from simpler convex functions combined via convexity-preserving
operations, such as the following.
Theorem 3.1. The following operations preserve convexity:
• Multiplication of a convex function 𝑓(𝑥) by a nonnegative scalar 𝑐 ≥ 0;
• Addition of two convex functions 𝑓(𝑥), 𝑔(𝑥);
• Pointwise supremum of a collection {𝑓𝑗(𝑥) : 𝑗 ∈ 𝐽} of convex functions:
𝑓max(𝑥) ≔ max𝑗∈𝐽 𝑓𝑗(𝑥);
• Pre-composition 𝑓(𝐴𝑥 + 𝑏) of a convex function 𝑓 with an affine function 𝑥 ↦ 𝐴𝑥 + 𝑏;
• Post-composition 𝑔(𝑓(𝑥)) of a convex function 𝑓 with an increasing convex function 𝑔;
• Infimal convolution 𝑓 □ 𝑔 of two convex functions 𝑓, 𝑔 : ℝ𝑛 → ℝ, defined as
(𝑓 □ 𝑔)(𝑥) ≔ inf{𝑓(𝑦) + 𝑔(𝑥 − 𝑦) : 𝑦 ∈ ℝ𝑛}.
In all the cases above, it is straightforward to verify the preservation of convexity starting from the
definition of convexity given in Definition 1.1.
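For instance, the pointwise-maximum case can be verified directly against Definition 1.1; the sketch below uses three arbitrary affine pieces (affine functions are convex):

```python
import random

# (slope, intercept) pairs for three affine functions, chosen arbitrarily
pieces = [(2.0, -1.0), (-1.0, 0.5), (0.3, 0.0)]

def f_max(x):
    # pointwise maximum of the affine pieces
    return max(a * x + b for a, b in pieces)

random.seed(0)
holds = True
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    t = random.random()
    lhs = f_max((1 - t) * x + t * y)            # value at the convex combination
    rhs = (1 - t) * f_max(x) + t * f_max(y)     # chord between the two values
    holds = holds and lhs <= rhs + 1e-9
print(holds)
```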
Further readings
If you want to read more about convex functions, the following resources all contain an excellent
treatment.
[BV04] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[Online]. Available: https://web.stanford.edu/~boyd/cvxbook/
[HL01] J.-B. Hiriart-Urruty and C. Lemaréchal, Fundamentals of Convex Analysis. Springer, 2001.
[Online]. Available: https://link.springer.com/book/10.1007/978-3-642-56468-0
[Nes18] Y. Nesterov, Lectures on Convex Optimization. Springer International Publishing, 2018.
[Online]. Available: https://link.springer.com/book/10.1007/978-3-319-91578-4
