􀉂􀈘􀈚􀈘

􀈪
􀈘􀈠
􀈚
􀈘􀈙
MIT 6.7220/15.084 — Nonlinear Optimization Apr 16-18th 2024
Lecture 14A-B
Self-concordant functions
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)
These notes are class material that has not undergone formal peer review. The TAs and I are grateful for any
reports of typos.
In this lecture, we return to Hessian preconditioning and Newton’s method, and extend the strong
guarantees—including the quadratic convergence rate to optimality—to a class of functions, called
self-concordant functions. Crucially, the set of self-concordant functions includes certain functions
that “shoot to infinity” at the boundary of their domain, such as the function − log(𝑥) on the domain
(0, ∞). The material we will cover today serves as a foundation for the analysis of interior-point
methods, which we will discuss in the next lecture.
1 Two shortcomings of the standard analysis
In Lecture 12, we have taken a look at the standard analysis of Newton’s method. The key result
we established was that the method converges double exponentially fast (in distance) to a local
minimum of a function 𝑓 if the Hessian of 𝑓 is 𝑀 -Lipschitz continuous and the curvature of the
function around the minimum is sufficiently large (say, ∇²𝑓(𝑥) ≽ 𝜇𝐼). However, this analysis has
two shortcomings.
1.1 Inability to handle log functions
The first shortcoming is practical. The analysis we gave in Lecture 12 breaks down for functions that
“shoot to infinity” at the boundary of their domain. This is the case—for example—of functions
such as 𝑓(𝑥) = 𝑥 − log(𝑥), 𝑓(𝑥) = − log(𝑥) − log(1 − 𝑥), et cetera. These functions have enormous
applications in optimization, but their Hessian is not Lipschitz continuous. Yet, as we show next,
Newton’s method converges to the minimum at a double exponential rate.
Example 1.1. Consider the function 𝑓 : (0, ∞) → ℝ given by
𝑓(𝑥) = 𝑥 − log(𝑥),
whose minimum is at 𝑥 = 1. [Figure: plot of 𝑓(𝑥) = 𝑥 − log(𝑥) for 𝑥 ∈ (0, 6).] The update of Newton’s method is
𝑥ₜ₊₁ = 𝑥ₜ − [𝑓″(𝑥ₜ)]⁻¹𝑓′(𝑥ₜ) = 𝑥ₜ − 𝑥ₜ² ⋅ (1 − 1/𝑥ₜ) = 2𝑥ₜ − 𝑥ₜ².
So, we have 1 − 𝑥ₜ₊₁ = 1 − 2𝑥ₜ + 𝑥ₜ² = (1 − 𝑥ₜ)².
This shows that not only does Newton’s method converge to the minimum at a quadratic rate,
but it does so with equality. One can check very easily that changing the parameterization from
𝑥 to 𝑘𝑥 also rescales the radius of convergence by 𝑘.
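To see the claimed behavior numerically, here is a short Python sketch (not part of the original notes; the starting point 0.5 is an arbitrary choice in the region of convergence) that iterates the closed-form update 𝑥ₜ₊₁ = 2𝑥ₜ − 𝑥ₜ² and checks the exact error recursion:

```python
def newton_step(x):
    # For f(x) = x - log(x): f'(x) = 1 - 1/x and f''(x) = 1/x**2, so the
    # Newton update is x - [f''(x)]**-1 * f'(x) = x - x**2*(1 - 1/x) = 2x - x**2.
    return 2 * x - x * x

x = 0.5  # arbitrary starting point
for _ in range(5):
    x_next = newton_step(x)
    # the error contracts with equality: 1 - x_{t+1} = (1 - x_t)**2
    assert abs((1 - x_next) - (1 - x) ** 2) < 1e-12
    x = x_next
```

After five steps the error is (1/2)^32 ≈ 2.3 × 10⁻¹⁰, illustrating the double exponential decrease of the distance to the minimizer.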
The above hints at the fact that requiring Lipschitz continuity of the Hessian is not the most natural
condition for studying the quadratic convergence of Newton’s method. Of course, if Lipschitz conti-
nuity has to go, one has to wonder: “what other condition ensuring smoothness can we impose?” We
will propose one later today, called self-concordance.
1.2 Lack of affine invariance in the radius of fast convergence
The second shortcoming is conceptual. For 𝜂 = 1, Newton’s method repeatedly moves to the mini-
mum of the quadratic approximation of 𝑓 around the current iterate. This point is independent of
the choice of coordinates used to parameterize the space, and only depends on the function itself.
Hence, the radius of fast convergence is also independent of the choice of coordinates.
However, the radius of fast convergence we gave in Lecture 12 was not affine invariant. For example,
consider a function 𝑓(𝑥). If we now make a transformation of coordinates and consider the function
𝑓(𝐴𝑥), the constants 𝑀 and 𝜇 change, and in general we arrive at a different radius of fast con-
vergence.
Part of the issue is that even the requirement ‖∇²𝑓(𝑥) − ∇²𝑓(𝑦)‖₂ ≤ 𝑀‖𝑥 − 𝑦‖₂ is not affine invari-
ant, in that reparameterizations of 𝑥 and 𝑦 lead to a different constant. Ideally, we would like to
impose a condition that is fully affine invariant. Self-concordance satisfies this desideratum.
2 Self-concordant functions
With these two shortcomings in mind, we introduce the concept of self-concordant functions, which
are a class of strictly convex functions (that is, with positive definite Hessians) that are smooth
and have a well-behaved Hessian. The definition provides an affine-invariant condition focused on
bounding the approximation error of the second-order expansion of the function around each point.
In other words, instead of focusing on the Lipschitz continuity of the Hessian, we focus on the quality
of the approximation of the function by the second-order Taylor expansion around each point.
2.1 Intrinsic norms
Before we can give a meaningful quantification of the approximation error of the second-order expan-
sion in our setting, special care must be taken to describe the radius around which the approximation
is good in an affine-invariant way. Furthermore, one needs to use care close to the boundary of the
domain. In the theory of self-concordant functions, both of these issues are elegantly resolved by
moving away from the Euclidean norm and using instead the notion of intrinsic norm at a point 𝑥
in the domain of 𝑓, defined as
‖𝑣‖𝑥 ≔ √⟨𝑣, ∇²𝑓(𝑥)𝑣⟩.
The name intrinsic norm comes from the following two important facts:
• the intrinsic norm is affine-invariant: if every point 𝑥 is rewritten as 𝑥 = 𝐴𝑥′, every vector 𝑣 as 𝑣 = 𝐴𝑣′, and the function 𝑔(𝑥′) ≔ 𝑓(𝐴𝑥′) is introduced, then
√⟨𝑣′, ∇²𝑔(𝑥′)𝑣′⟩ = √⟨𝑣′, 𝐴⊤∇²𝑓(𝐴𝑥′)𝐴𝑣′⟩ = √⟨𝐴𝑣′, ∇²𝑓(𝑥)𝐴𝑣′⟩ = ‖𝑣‖𝑥,
that is, the intrinsic norm of 𝑣′ at 𝑥′ measured through 𝑔 coincides with the intrinsic norm of 𝑣 at 𝑥 measured through 𝑓.
• the intrinsic norm is insensitive to the choice of reference inner product ⟨⋅, ⋅⟩ for the space. This
is a consequence of the previous point, since all inner products are equivalent up to change of
coordinates.
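Both facts are easy to confirm numerically. The sketch below (illustrative only; the matrix 𝐴 and the points are arbitrary choices) checks affine invariance for the concrete function 𝑓(𝑥) = −∑ᵢ log(𝑥ᵢ), whose Hessian is diag(1/𝑥ᵢ²):

```python
import numpy as np

def hess_f(x):
    # Hessian of f(x) = -sum(log(x_i))
    return np.diag(1.0 / x**2)

def intrinsic_norm(H, v):
    # ||v||_x = sqrt(<v, H v>), where H is the Hessian at x
    return float(np.sqrt(v @ H @ v))

# An arbitrary invertible A with nonnegative entries, so that A maps
# positive vectors to positive vectors (keeping A x' in the domain of f).
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0]])
x_prime = np.array([1.0, 2.0, 0.5])
v_prime = np.array([0.3, -0.2, 0.7])

x = A @ x_prime                # change of coordinates x = A x'
H_g = A.T @ hess_f(x) @ A      # Hessian of g(x') := f(A x'), by the chain rule

lhs = intrinsic_norm(H_g, v_prime)             # norm of v' at x', through g
rhs = intrinsic_norm(hess_f(x), A @ v_prime)   # norm of v = A v' at x, through f
assert abs(lhs - rhs) < 1e-10
```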
2.2 Definition of self-concordance
We are now ready to define self-concordance. For this lecture, we will in particular focus on the
notion of strong nondegenerate self-concordance, which is a stronger version of the definition, and is
the definition that plays the central role in the theory of interior point methods.
Definition 2.1. Let Ω ⊆ ℝⁿ be open and convex. A twice-differentiable function 𝑓 : Ω → ℝ is
said to be strongly nondegenerate self-concordant if:
1. The Hessian of 𝑓 is positive definite everywhere on Ω;
2. The ellipsoid 𝑊(𝑥) ≔ {𝑦 ∈ ℝⁿ : ‖𝑦 − 𝑥‖𝑥² < 1} is contained in Ω for all 𝑥 ∈ Ω; and
3. Inside of the ellipsoid 𝑊(𝑥), the function 𝑓 is almost quadratic:
(1 − ‖𝑦 − 𝑥‖𝑥)² ∇²𝑓(𝑥) ≼ ∇²𝑓(𝑦) ≼ (1 − ‖𝑦 − 𝑥‖𝑥)⁻² ∇²𝑓(𝑥)  ∀𝑥 ∈ Ω, 𝑦 ∈ 𝑊(𝑥),
which is equivalent to the statement
(1 − ‖𝑦 − 𝑥‖𝑥)‖𝑣‖𝑥 ≤ ‖𝑣‖𝑦 ≤ ‖𝑣‖𝑥 / (1 − ‖𝑦 − 𝑥‖𝑥)  ∀𝑥 ∈ Ω, 𝑦 ∈ 𝑊(𝑥), 𝑣 ∈ ℝⁿ.
The following equivalent characterization is often used too.
Theorem 2.1. If 𝑓 : Ω → ℝ is three-times differentiable with positive definite Hessian everywhere
on Ω, then strong nondegenerate self-concordance is equivalent to 𝑓 satisfying the following two
properties:
1. for any 𝑥₀ ∈ Ω and 𝑑 ∈ ℝⁿ, the restriction 𝜑(𝛾) ≔ 𝑓(𝑥₀ + 𝛾𝑑) of 𝑓 to the segment {𝛾 : 𝑥₀ + 𝛾𝑑 ∈ Ω} satisfies
|𝜑‴(𝛾)| ≤ 2𝜑″(𝛾)^(3/2); and
2. any sequence {𝑥ₖ} converging to a point on the boundary of Ω is such that 𝑓(𝑥ₖ) → +∞.
Remark 2.1. In this lecture, we will use the term self-concordant to mean strongly nondegener-
ate self-concordant. In different contexts, self-concordance without qualifications refers to only
condition 1 in Theorem 2.1.
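For intuition, the differential inequality in condition 1 of Theorem 2.1 holds with equality for the logarithm. The quick check below (a sketch; the sample points are arbitrary, and we restrict along the direction 𝑑 = 1 for simplicity) verifies this for 𝜑(𝛾) = −log(𝛾) on (0, ∞):

```python
# For phi(gamma) = -log(gamma): phi''(gamma) = 1/gamma**2 and
# phi'''(gamma) = -2/gamma**3, so |phi'''| = 2 * (phi'')**1.5 exactly.
for gamma in [0.1, 0.5, 1.0, 3.0, 10.0]:
    phi2 = 1.0 / gamma**2
    phi3 = -2.0 / gamma**3
    # relative comparison to absorb floating-point rounding
    assert abs(abs(phi3) - 2.0 * phi2**1.5) <= 1e-9 * abs(phi3)
```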
2.3 Notable example: the log function
As a sanity check, let’s show that the function 𝑓(𝑥) = − log(𝑥) on the domain Ω ≔ (0, ∞) is self-
concordant. We will show the multidimensional version of this result.
Example 2.1. The function 𝑓(𝑥) = −∑ᵢ₌₁ⁿ log(𝑥ᵢ) on the domain Ω ≔ ℝ^𝑛_{>0} is self-concordant.
Solution. The function 𝑓 is twice differentiable with positive definite Hessian
∇²𝑓(𝑥) = diag(1/𝑥₁², ..., 1/𝑥ₙ²) ≻ 0  ∀𝑥 ∈ Ω = ℝ^𝑛_{>0}.
The ellipsoid 𝑊(𝑥) is therefore equivalent to
𝑊(𝑥) = {𝑦 ∈ ℝⁿ : ∑ᵢ₌₁ⁿ ((𝑦ᵢ − 𝑥ᵢ)/𝑥ᵢ)² < 1} = {𝑦 ∈ ℝⁿ : ∑ᵢ₌₁ⁿ (𝑦ᵢ/𝑥ᵢ − 1)² < 1}.
It is immediate to check that the condition implies that 𝑦𝑖 > 0 for all 𝑖 = 1, ..., 𝑛. So, 𝑊 (𝑥) ⊆ Ω.
We now check the third condition. If 𝑦 ∈ 𝑊(𝑥), then for any 𝑣 ∈ ℝⁿ we have
‖𝑣‖𝑦² = ∑ᵢ₌₁ⁿ (𝑣ᵢ/𝑦ᵢ)² = ∑ᵢ₌₁ⁿ (𝑣ᵢ/𝑥ᵢ)² (𝑥ᵢ/𝑦ᵢ)² ≤ ‖𝑣‖𝑥² ⋅ maxᵢ (𝑥ᵢ/𝑦ᵢ)².
Using the fact that
𝑦ᵢ/𝑥ᵢ ≥ 1 − |𝑦ᵢ/𝑥ᵢ − 1| ≥ 1 − √(∑ⱼ₌₁ⁿ (𝑦ⱼ/𝑥ⱼ − 1)²) = 1 − ‖𝑦 − 𝑥‖𝑥  for all 𝑖 = 1, ..., 𝑛,
we find that
max
𝑖 ( 𝑥𝑖
𝑦𝑖
)
2
= (max
𝑖 (𝑥𝑖
𝑦𝑖
))
2
1
(1 − ‖𝑦 − 𝑥‖𝑥)
2 ‖𝑣‖2
𝑦 ≤
‖𝑣‖2
𝑥
(1 − ‖𝑦 − 𝑥‖𝑥)
2 .
On the other hand, we have
‖𝑣‖𝑦² = ∑ᵢ₌₁ⁿ (𝑣ᵢ/𝑦ᵢ)² = ∑ᵢ₌₁ⁿ (𝑣ᵢ/𝑥ᵢ)² (𝑥ᵢ/𝑦ᵢ)² ≥ ‖𝑣‖𝑥² ⋅ minᵢ (𝑥ᵢ/𝑦ᵢ)².
Using the fact that
𝑦ᵢ/𝑥ᵢ ≤ 1 + |𝑦ᵢ/𝑥ᵢ − 1| ≤ 1 + √(∑ⱼ₌₁ⁿ (𝑦ⱼ/𝑥ⱼ − 1)²) = 1 + ‖𝑦 − 𝑥‖𝑥  for all 𝑖 = 1, ..., 𝑛,
we find that
min
𝑖 ( 𝑥𝑖
𝑦𝑖
)
2
= (min
𝑖 (𝑥𝑖
𝑦𝑖
))
2
1
(1 + ‖𝑦 − 𝑥‖𝑥)
2 ‖𝑣‖2
𝑦 ≥
‖𝑣‖2
𝑥
(1 + ‖𝑦 − 𝑥‖𝑥)
2 .
Finally, using the fact that 1
1+𝑧 1 − 𝑧 (valid for all 𝑧 > −1), the statement follows.
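The sandwich just derived is easy to probe numerically. The sketch below (illustrative; the base point and the radius 0.9 are arbitrary choices) samples points 𝑦 with ‖𝑦 − 𝑥‖𝑥 = 0.9 inside 𝑊(𝑥) and checks both bounds:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 0.5])  # arbitrary base point in the positive orthant

def norm_at(z, v):
    # intrinsic norm ||v||_z for f(x) = -sum(log(x_i)): Hess f(z) = diag(1/z_i^2)
    return float(np.sqrt(np.sum((v / z) ** 2)))

for _ in range(100):
    delta = rng.uniform(-1.0, 1.0, size=3)
    delta *= 0.9 / np.linalg.norm(delta)   # y_i/x_i - 1 = delta_i, ||delta||_2 = 0.9
    y = x * (1.0 + delta)                  # so r := ||y - x||_x = 0.9 < 1, y in W(x)
    r = norm_at(x, y - x)
    v = rng.standard_normal(3)
    # (1 - r) * ||v||_x <= ||v||_y <= ||v||_x / (1 - r)
    assert (1 - r) * norm_at(x, v) <= norm_at(y, v) <= norm_at(x, v) / (1 - r)
```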
2.4 Composition properties of self-concordant functions
In general, checking whether a function is self-concordant is a nontrivial task. Usually, one does not
use the definition; rather, the following composition rules are used.
Sum of self-concordant functions. The set of self-concordant functions is closed under addition.
Theorem 2.2. Let 𝑓₁ : Ω₁ → ℝ and 𝑓₂ : Ω₂ → ℝ be self-concordant functions whose domains
satisfy Ω₁ ∩ Ω₂ ≠ ∅. Then, the function 𝑓₁ + 𝑓₂ : Ω₁ ∩ Ω₂ → ℝ is self-concordant.
[▷ You should try to prove this!]
Addition of an affine function. Addition of an affine function to a self-concordant function does
not affect the self-concordance property, since self-concordance depends only on the Hessian of the
function, and the addition of affine functions does not affect the Hessian.
Theorem 2.3. Let 𝑓 : Ω → ℝ be a self-concordant function. Then, the function 𝑔(𝑥) ≔ 𝑓(𝑥) +
⟨𝑎, 𝑥⟩ + 𝑏 is self-concordant on Ω.
Affine transformation. The composition of a self-concordant function with an injective affine
transformation preserves self-concordance.
Theorem 2.4. Let 𝑓 : Ω → ℝ be self-concordant and 𝐴 ∈ ℝ^{𝑚×𝑛}, 𝑏 ∈ ℝ^𝑚 represent an injec-
tive transformation.¹ Then, assuming Ω′ ≔ {𝑥 ∈ ℝⁿ : 𝐴𝑥 + 𝑏 ∈ Ω} ≠ ∅, the affinely-transformed
function 𝑔(𝑥) ≔ 𝑓(𝐴𝑥 + 𝑏) is self-concordant on the domain Ω′.
¹Injectivity is necessary to preserve the positive definiteness of the Hessian.
Consequences. Putting the previous results together with Example 2.1, we obtain the fol-
lowing important corollary.
Corollary 2.1. Let Ω ≔ {𝑥 ∈ ℝⁿ : 𝑎ᵢ⊤𝑥 > 𝑏ᵢ for all 𝑖 = 1, ..., 𝑚} be a nonempty open polyhedral set con-
taining no lines, where 𝑎ᵢ ∈ ℝⁿ and 𝑏ᵢ ∈ ℝ. A function of the form
𝑓(𝑥) = 𝑐⊤𝑥 − ∑ᵢ₌₁ᵐ log(𝑎ᵢ⊤𝑥 − 𝑏ᵢ),
where 𝑐 ∈ ℝⁿ, is self-concordant on Ω.
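As a concrete instance of Corollary 2.1, the sketch below (the polyhedron and the vector 𝑐 are arbitrary choices) builds the barrier for the triangle {𝑥₁ > 0, 𝑥₂ > 0, 𝑥₁ + 𝑥₂ < 4} and confirms that its Hessian ∑ᵢ 𝑎ᵢ𝑎ᵢ⊤/(𝑎ᵢ⊤𝑥 − 𝑏ᵢ)² is positive definite at an interior point:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])    # rows a_i^T: x1 > 0, x2 > 0, x1 + x2 < 4
b = np.array([0.0, 0.0, -4.0])  # constraints a_i^T x > b_i
c = np.array([1.0, 2.0])

def grad_hess(x):
    s = A @ x - b                        # slacks a_i^T x - b_i, positive on Omega
    g = c - A.T @ (1.0 / s)              # gradient of c^T x - sum(log(s_i))
    H = A.T @ np.diag(1.0 / s**2) @ A    # Hessian: sum_i a_i a_i^T / s_i^2
    return g, H

g, H = grad_hess(np.array([1.0, 1.0]))
# Omega contains no lines (the a_i span R^2), so the Hessian is positive definite.
assert np.all(np.linalg.eigvalsh(H) > 0)
```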
2.5 Existence and uniqueness of the minimum
In addition, we mention the following property.
Theorem 2.5. Let 𝑓 : Ω → ℝ be self-concordant and lower bounded. Then, 𝑓 attains a unique
minimum.
The proof of Theorem 2.5 is typically derived from the convergence of Newton’s method, which we
will discuss in Section 3 (in particular, Theorem 3.1 and its corollary play an important role).
3 Newton’s method applied to self-concordant functions
In this section, we discuss how the guarantees of Newton’s method can be extended to self-concor-
dant functions, while gaining affine invariance at the same time.
For notational convenience, in the rest of the section we use 𝑛(𝑥) to denote the Newton direction
(i.e., second-order descent direction) at a generic point 𝑥 ∈ Ω, which is defined as usual as
𝑛(𝑥) ≔ −[∇²𝑓(𝑥)]⁻¹∇𝑓(𝑥).
Remark 3.1. The intrinsic norm of the descent direction ‖𝑛(𝑥)‖𝑥 at any point 𝑥 is a crucial
quantity to study Newton’s method for self-concordant functions. It is sometimes called the
Newton decrement, and indicated as 𝜆(𝑥) ≔ ‖𝑛(𝑥)‖𝑥. We will avoid the notation 𝜆(𝑥) to keep
the notation to a minimum.
3.1 Proximity to the minimum
A neat property of self-concordant functions is that it is possible to bound the distance from the
minimum of the function simply by looking at the (intrinsic) norm of the Newton direction at any
point 𝑥. In particular, if the norm is sufficiently small, then the minimum must be near. Formally,
we have the following result.
Theorem 3.1. Let 𝑓 : Ω → ℝ be self-concordant. If a point 𝑥 ∈ Ω is such that ‖𝑛(𝑥)‖𝑥 ≤ 1/9,
then there exists a minimum 𝑧 of 𝑓 within distance
‖𝑧 − 𝑥‖𝑥 ≤ 3 ⋅ ‖𝑛(𝑥)‖𝑥.
In fact, the result above can be further strengthened to a larger intrinsic radius of 1/4 (instead of
1/9), as we remark next.
Remark 3.2. With a bit of additional work, the result in Theorem 3.1 can be strengthened as
follows: If a point 𝑥 ∈ Ω is such that ‖𝑛(𝑥)‖𝑥 ≤ 1/4, then there exists a minimum 𝑧 of 𝑓 within
distance
‖𝑧 − 𝑥‖𝑥 ≤ ‖𝑛(𝑥)‖𝑥 + 3‖𝑛(𝑥)‖𝑥² / (1 − ‖𝑛(𝑥)‖𝑥)³.
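On the one-dimensional example 𝑓(𝑥) = 𝑥 − log(𝑥) from Example 1.1, every quantity in Theorem 3.1 is available in closed form, which makes the bound easy to check (a sketch; the sample points are arbitrary):

```python
# For f(x) = x - log(x): n(x) = -x**2 * (1 - 1/x) = x - x**2, so
# ||n(x)||_x = |n(x)| * sqrt(f''(x)) = |1 - x|, and with minimizer z = 1
# we have ||z - x||_x = |1 - x| / x.
checked = 0
for x in [0.9, 0.95, 1.05, 1.1]:
    dec = abs(1 - x)        # ||n(x)||_x
    dist = abs(1 - x) / x   # ||z - x||_x
    if dec <= 1 / 9:
        assert dist <= 3 * dec   # the bound of Theorem 3.1
        checked += 1
```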
3.2 Recovering the quadratic convergence rate
For self-concordant functions, the following affine-invariant guarantee can be established.
Theorem 3.2. Let 𝑓 : Ω → ℝ be self-concordant. If a point 𝑥ₜ ∈ Ω is such that ‖𝑛(𝑥ₜ)‖𝑥ₜ < 1,
then the Newton iterate 𝑥ₜ₊₁ = 𝑥ₜ + 𝑛(𝑥ₜ) satisfies
‖𝑛(𝑥ₜ₊₁)‖𝑥ₜ₊₁ ≤ (‖𝑛(𝑥ₜ)‖𝑥ₜ / (1 − ‖𝑛(𝑥ₜ)‖𝑥ₜ))².
The proof is somewhat elaborate (it involves a few steps) but it is not particularly difficult. The key
idea is to use the self-concordance property to bound the Hessian at the next iterate in terms of the
Hessian at the current iterate. You can find a detailed proof in the references cited below.
Combined with the result in the previous subsection, this gives a quadratic convergence rate.
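To see Theorem 3.2 in action, the sketch below (the vector 𝑐 and the starting point are arbitrary choices) runs the Newton update 𝑥ₜ₊₁ = 𝑥ₜ + 𝑛(𝑥ₜ) on the self-concordant function 𝑓(𝑥) = 𝑐⊤𝑥 − ∑ᵢ log(𝑥ᵢ) and checks the claimed contraction of the Newton decrement:

```python
import numpy as np

c = np.array([2.0, 3.0, 0.5])  # arbitrary; the minimizer of f has x_i = 1/c_i

def newton_dir_and_decrement(x):
    g = c - 1.0 / x                    # gradient of f(x) = c^T x - sum(log(x_i))
    H = np.diag(1.0 / x**2)            # Hessian of f
    n = -np.linalg.solve(H, g)         # Newton direction n(x)
    lam = float(np.sqrt(n @ H @ n))    # Newton decrement ||n(x)||_x
    return n, lam

x = np.array([0.4, 0.3, 1.5])          # starting point with ||n(x)||_x < 1
decs = []
for _ in range(6):
    n, lam = newton_dir_and_decrement(x)
    decs.append(lam)
    x = x + n                          # full Newton step (eta = 1)

# Theorem 3.2: each decrement is at most (previous / (1 - previous))**2.
for prev, nxt in zip(decs, decs[1:]):
    assert nxt <= (prev / (1 - prev)) ** 2 + 1e-12
```

The decrements shrink at a quadratic rate, and by Theorem 3.1 the iterates home in on the minimizer (1/2, 1/3, 2).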
4 Further readings
The short book by Renegar [Ren01] and the monograph by Nesterov [Nes18] (Chapter 5)
provide a comprehensive introduction to self-concordant functions and their applications in opti-
mization.
I especially recommend the book by Renegar [Ren01] for a concise yet rigorous account.
[Ren01] J. Renegar, A Mathematical View of Interior-point Methods in Convex Optimization.
Philadelphia, PA, USA: SIAM, 2001. doi: 10.1137/1.9780898718812.
[Nes18] Y. Nesterov, Lectures on Convex Optimization. Springer International Publishing, 2018.
[Online]. Available: https://link.springer.com/book/10.1007/978-3-319-91578-4
Course: MIT 6.7220 / 15.084 · Term: Spring 2024 · Date: 2024-04-18