MIT 6.7220/15.084 — Nonlinear Optimization (Spring ’25) Apr 29–May 1, 2025
Lecture 19
Self-concordant functions
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)★
In Lecture 17, we took a look at the standard analysis of Newton’s method. The key
result we established was that the method converges doubly exponentially fast (in distance)
to a local minimum of a function 𝑓 if the Hessian of 𝑓 is 𝑀 -Lipschitz continuous and the
curvature of the function around the minimum is sufficiently large (say, ∇2𝑓(𝑥⋆) ⪰ 𝜇𝐼).
However, this analysis has two shortcomings.
■ Inability to handle log functions. The first shortcoming is practical. The analysis we
gave in Lecture 17 breaks down for functions that “shoot to infinity” at the boundary
of their domain. This is the case—for example—of functions such as 𝑓(𝑥) = 𝑥 − log(𝑥),
𝑓(𝑥) = − log(𝑥) − log(1 − 𝑥), et cetera. These functions have enormous applications in
optimization, but their Hessian is not Lipschitz continuous. Yet, as we show next, Newton’s
method converges to the minimum at a double exponential rate.
Example L19.1. [Figure: graph of 𝑓(𝑥) = 𝑥 − log(𝑥) for 𝑥 between 0 and 6, with its minimum marked at 𝑥 = 1.]
Consider the function 𝑓 : (0, ∞) → ℝ
𝑓(𝑥) = 𝑥 − log(𝑥),
whose minimum is at 𝑥 = 1. The update of
Newton’s method is
𝑥𝑡+1 = 𝑥𝑡 − [𝑓′′(𝑥𝑡)]⁻¹𝑓′(𝑥𝑡) = 𝑥𝑡 − 𝑥𝑡² ⋅ (1 − 1/𝑥𝑡) = 2𝑥𝑡 − 𝑥𝑡².
So, we have 1 − 𝑥𝑡+1 = 1 + 𝑥𝑡² − 2𝑥𝑡 = (1 − 𝑥𝑡)².
This shows that not only does Newton’s method converge to the minimum at a
quadratic rate, but it does so with equality. One can check very easily that changing
the parameterization from 𝑥 to 𝑘𝑥 also rescales the radius of convergence by 𝑘.
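As a quick numerical companion to this example (a sketch, not part of the original notes), one can run the update and verify the exact quadratic contraction of the error:

```python
def newton_step(x: float) -> float:
    """One Newton step for f(x) = x - log(x): f'(x) = 1 - 1/x, f''(x) = 1/x^2,
    so x - f'(x)/f''(x) = x - x^2 * (1 - 1/x) = 2x - x^2."""
    return x - (1.0 - 1.0 / x) / (1.0 / x**2)

x = 0.5
for _ in range(5):
    x_next = newton_step(x)
    # the error contracts with equality: 1 - x_{t+1} = (1 - x_t)^2
    assert abs((1.0 - x_next) - (1.0 - x) ** 2) < 1e-12
    x = x_next
# after 5 steps, x is within roughly (1/2)^(2^5) of the minimizer x* = 1
```

Starting the iteration at 𝑥 = 0.5 keeps every iterate inside the domain (0, ∞); starting too far right (e.g. 𝑥 > 2) would make the method diverge, matching the rescaling-by-𝑘 observation above.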
The above hints at the fact that requiring Lipschitz continuity of the Hessian is not the
most natural condition for studying the quadratic convergence of Newton’s method. Of
course, if Lipschitz continuity has to go, one has to wonder: “what other condition ensuring
smoothness can we impose?” We will propose one later today, called self-concordance.
■ Lack of affine invariance in the radius of fast convergence. The second shortcoming is
conceptual. For 𝜂 = 1, Newton’s method repeatedly moves to the minimum of the quadratic
approximation of 𝑓 around the current iterate. This point is independent of the choice of
coordinates used to parameterize the space, and only depends on the function itself. Hence,
the radius of fast convergence is also independent of the choice of coordinates.
However, the radius of fast convergence we gave in Lecture 17 was not affine invariant. For
example, consider a function 𝑓(𝑥). If we now make a transformation of coordinates and
consider the function 𝑓(𝐴𝑥), the constants 𝑀 and 𝜇 change, and in general we arrive at a
different radius of fast convergence.
Part of the issue is that even the requirement ‖∇2𝑓(𝑥) − ∇2𝑓(𝑦)‖𝑠 ≤ 𝑀‖𝑥 − 𝑦‖2 is not affine invariant, in that reparameterizations of 𝑥 and 𝑦 lead to a different constant. Ideally,
we would like to impose a condition that is fully affine invariant. Self-concordance satisfies
this desideratum.
L19.1 Self-concordant functions
With these two shortcomings in mind, we introduce the concept of self-concordant functions,
which are a class of strictly convex functions (in particular, with positive definite Hessians)
that are smooth and have a well-behaved Hessian. The definition provides an affine-invariant
condition focused on bounding the approximation error of the second-order expansion of the
function around each point.
In other words, instead of focusing on the Lipschitz continuity of the Hessian, we focus
on the quality of the approximation of the function by the second-order Taylor expansion
around each point.
L19.1.1 Intrinsic norms
Before we can give a meaningful quantification of the approximation error of the second-order expansion in our setting, special care must be taken to describe, in an affine-invariant way, the radius within which the approximation is good. Furthermore, one needs to use care close to the boundary of the domain. In the theory of self-concordant functions, both of these issues are elegantly resolved by moving away from the Euclidean norm and using instead the notion of intrinsic norm at a point 𝑥 in the domain of 𝑓, defined as
‖𝑣‖𝑥 ≔ √⟨𝑣, ∇2𝑓(𝑥)𝑣⟩ .
The name intrinsic norm comes from the following two important facts:
• the intrinsic norm is affine-invariant, in that if all points 𝑥 are now replaced with 𝑥 =
𝐴𝑥′, the vectors 𝑣 with 𝐴𝑣′, and the function 𝑔(𝑥′) ≔ 𝑓(𝐴𝑥′) is introduced, then
‖𝑣′‖𝑥′ = √⟨𝑣′, ∇2𝑔(𝑥′)𝑣′⟩ = √⟨𝑣′, 𝐴⊤∇2𝑓(𝑥)𝐴𝑣′⟩ = √⟨𝐴𝑣′, ∇2𝑓(𝑥)𝐴𝑣′⟩ = ‖𝑣‖𝑥.
• the intrinsic norm is insensitive to the choice of reference inner product ⟨⋅, ⋅⟩ for
the space. This is a consequence of the previous point, since all inner products are
equivalent up to change of coordinates.
L19.1.2 Definition of self-concordance
We are now ready to define self-concordance. For this lecture, we will in particular focus
on the notion of strong nondegenerate self-concordance, which is a stronger version of the
definition, and is the definition that plays the central role in the theory of interior point
methods.
Definition L19.1. Let Ω ⊆ ℝ𝑛 be open and convex. A twice-differentiable function 𝑓 :
Ω → ℝ is said to be strongly nondegenerate self-concordant if:
1. The Hessian of 𝑓 is positive definite everywhere on Ω;
2. The ellipsoid 𝑊(𝑥) ≔ {𝑦 ∈ ℝ𝑛 : ‖𝑦 − 𝑥‖𝑥² < 1} is contained in Ω for all 𝑥 ∈ Ω; and
3. Inside of the ellipsoid 𝑊 (𝑥), the function 𝑓 is almost quadratic:
(1 − ‖𝑦 − 𝑥‖𝑥)² ∇2𝑓(𝑥) ⪯ ∇2𝑓(𝑦) ⪯ 1/(1 − ‖𝑦 − 𝑥‖𝑥)² ⋅ ∇2𝑓(𝑥) ∀𝑥 ∈ Ω, 𝑦 ∈ 𝑊(𝑥),
which is equivalent to the statement
(1 − ‖𝑦 − 𝑥‖𝑥)‖𝑣‖𝑥 ≤ ‖𝑣‖𝑦 ≤ ‖𝑣‖𝑥/(1 − ‖𝑦 − 𝑥‖𝑥) ∀𝑥 ∈ Ω, 𝑦 ∈ 𝑊(𝑥), 𝑣 ∈ ℝ𝑛.
The following equivalent characterization is often used too.
Theorem L19.1. If 𝑓 : Ω → ℝ is three-times differentiable with positive definite Hessian everywhere on Ω, then strong nondegenerate self-concordance is equivalent to 𝑓 satisfying the following two properties:
1. for any 𝑥0 ∈ Ω and 𝑑 ∈ ℝ𝑛, the restriction 𝜑(𝛾) ≔ 𝑓(𝑥0 + 𝛾𝑑) of 𝑓 to the segment
{𝛾 : 𝑥0 + 𝛾𝑑 ∈ Ω} satisfies
𝜑′′′(𝛾) ≤ 2 𝜑′′(𝛾)^{3/2}; and
2. any sequence {𝑥𝑘} converging to a point on the boundary of Ω is such that
𝑓(𝑥𝑘) → +∞.
Remark L19.1. In this lecture, we will use the term self-concordant to mean strongly
nondegenerate self-concordant. In different contexts, self-concordance without qualifica-
tions refers to only condition 1 in Theorem L19.1.
While not a proof, it is easy to see how the definition of self-concordance in Definition
L19.1, for univariate functions, implies the first item of Theorem L19.1. In particular, for
a univariate function 𝜙, Definition L19.1 implies that for any 𝑡 > 0 sufficiently small
𝜙′′(𝑥 + 𝑡) ≤ 𝜙′′(𝑥) / (1 − √𝜙′′(𝑥) 𝑡)².
Subtracting 𝜙′′(𝑥) from both sides and dividing by 𝑡 yields
(𝜙′′(𝑥 + 𝑡) − 𝜙′′(𝑥)) / 𝑡 ≤ (2(𝜙′′(𝑥))^{3/2} − (𝜙′′(𝑥))² 𝑡) / (1 − √𝜙′′(𝑥) 𝑡)².
Taking the limit as 𝑡 ↓ 0, the left-hand side tends to 𝜙′′′(𝑥) and the right-hand side tends to 2(𝜙′′(𝑥))^{3/2}, yielding 𝜙′′′(𝑥) ≤ 2(𝜙′′(𝑥))^{3/2} as claimed. The other direction is slightly more involved but still elementary.
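For the log functions from the introduction, the differential inequality can be checked by hand, since the derivatives are explicit. A small numerical sketch (not part of the notes) confirming 𝜙′′′ ≤ 2(𝜙′′)^{3/2} for 𝜙(𝑥) = −log(𝑥) − log(1 − 𝑥):

```python
# phi(x) = -log(x) - log(1 - x) on (0, 1):
#   phi''(x)  = 1/x^2 + 1/(1-x)^2
#   phi'''(x) = -2/x^3 + 2/(1-x)^3
# Check the self-concordance inequality phi''' <= 2 (phi'')^{3/2} on a grid.
for x in [0.1, 0.3, 0.5, 0.7, 0.9]:
    phi2 = 1.0 / x**2 + 1.0 / (1.0 - x) ** 2
    phi3 = -2.0 / x**3 + 2.0 / (1.0 - x) ** 3
    assert phi3 <= 2.0 * phi2**1.5
```

The same check for 𝜙(𝑥) = −log(𝑥) is even simpler: 𝜙′′ = 1/𝑥² and 𝜙′′′ = −2/𝑥³, so the inequality reads −2/𝑥³ ≤ 2/𝑥³, which holds trivially on (0, ∞).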
L19.1.3 Notable example: the log function
As a sanity check, let’s show that the function 𝑓(𝑥) = − log(𝑥) on the domain Ω ≔ (0, ∞)
is self-concordant. We will show the multidimensional version of this result.
Example L19.2. The function 𝑓(𝑥) = −∑_{𝑖=1}^{𝑛} log(𝑥𝑖) on the domain Ω ≔ ℝ^𝑛_{>0} is self-concordant.
Solution. The function 𝑓 is twice differentiable with positive definite Hessian
∇2𝑓(𝑥) = diag(1/𝑥1², ..., 1/𝑥𝑛²) ≻ 0 ∀𝑥 ∈ Ω = ℝ^𝑛_{>0}.
The ellipsoid 𝑊 (𝑥) is therefore equivalent to
𝑊(𝑥) = {𝑦 ∈ ℝ𝑛 : ∑_{𝑖=1}^{𝑛} ((𝑦𝑖 − 𝑥𝑖)/𝑥𝑖)² < 1} = {𝑦 ∈ ℝ𝑛 : ∑_{𝑖=1}^{𝑛} (𝑦𝑖/𝑥𝑖 − 1)² < 1}.
It is immediate to check that the condition implies that 𝑦𝑖 > 0 for all 𝑖 = 1, ..., 𝑛. So,
𝑊(𝑥) ⊆ Ω. We now check the third condition. We will relate ‖𝑣‖𝑦² to ‖𝑣‖𝑥² by observing that
‖𝑣‖𝑦² = ∑_{𝑖=1}^{𝑛} (𝑣𝑖/𝑦𝑖)² = ∑_{𝑖=1}^{𝑛} (𝑣𝑖/𝑥𝑖)² ⋅ (𝑥𝑖/𝑦𝑖)².
In particular, the previous equality implies immediately that
‖𝑣‖𝑥² ⋅ min𝑖 (𝑥𝑖/𝑦𝑖)² ≤ ‖𝑣‖𝑦² ≤ ‖𝑣‖𝑥² ⋅ max𝑖 (𝑥𝑖/𝑦𝑖)². (1)
We will now use the triangle inequality to bound the minimum and maximum of 𝑥𝑖/𝑦𝑖. Specifically, we have that, for all 𝑖 ∈ {1, ..., 𝑛},
𝑦𝑖/𝑥𝑖 ≤ 1 + |𝑦𝑖/𝑥𝑖 − 1| ≤ 1 + √(∑_{𝑗=1}^{𝑛} (𝑦𝑗/𝑥𝑗 − 1)²) = 1 + ‖𝑦 − 𝑥‖𝑥,
𝑦𝑖/𝑥𝑖 ≥ 1 − |𝑦𝑖/𝑥𝑖 − 1| ≥ 1 − √(∑_{𝑗=1}^{𝑛} (𝑦𝑗/𝑥𝑗 − 1)²) = 1 − ‖𝑦 − 𝑥‖𝑥.
Taking the minimum and maximum over 𝑖 = 1, ..., 𝑛, we then get
min𝑖 (𝑥𝑖/𝑦𝑖)² ≥ 1/(1 + ‖𝑦 − 𝑥‖𝑥)², max𝑖 (𝑥𝑖/𝑦𝑖)² ≤ 1/(1 − ‖𝑦 − 𝑥‖𝑥)².
Plugging the previous bounds into (1), we find
‖𝑣‖𝑥²/(1 + ‖𝑦 − 𝑥‖𝑥)² ≤ ‖𝑣‖𝑦² ≤ ‖𝑣‖𝑥²/(1 − ‖𝑦 − 𝑥‖𝑥)².
Finally, using the fact that 1/(1 + 𝑧) ≥ 1 − 𝑧 (valid for all 𝑧 > −1), and taking square roots,
(1 − ‖𝑦 − 𝑥‖𝑥)‖𝑣‖𝑥 ≤ ‖𝑣‖𝑦 ≤ ‖𝑣‖𝑥/(1 − ‖𝑦 − 𝑥‖𝑥). □
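The sandwich inequality just derived can also be verified numerically. The following sketch (assuming NumPy, and not part of the original notes) samples points 𝑦 in the ellipsoid 𝑊(𝑥) and checks both bounds for 𝑓(𝑥) = −∑ log(𝑥𝑖):

```python
import numpy as np

def norm_at(x, v):
    """Intrinsic norm ||v||_x for f(x) = -sum_i log(x_i),
    whose Hessian is diag(1/x_i^2)."""
    return float(np.sqrt(np.sum((v / x) ** 2)))

rng = np.random.default_rng(0)
x = rng.uniform(0.5, 2.0, size=5)

for _ in range(200):
    d = rng.normal(size=5)
    r = 0.95 * rng.uniform()         # target radius r < 1
    d *= r / norm_at(x, d)           # rescale so that ||d||_x = r
    y = x + d                        # y lies in W(x)
    assert np.all(y > 0)             # W(x) is contained in the domain
    v = rng.normal(size=5)
    lo = (1.0 - r) * norm_at(x, v)
    hi = norm_at(x, v) / (1.0 - r)
    assert lo - 1e-9 <= norm_at(y, v) <= hi + 1e-9
```

The first assertion also confirms the earlier observation that ‖𝑦 − 𝑥‖𝑥 < 1 forces 𝑦𝑖 > 0 for every coordinate.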
L19.1.4 Other examples
Other examples of self-concordant functions include the following:
• − log det 𝑋 on the set of positive definite matrices 𝑋
• − log(1 − ‖𝐴𝑥‖₂²) on the open ellipsoid {𝑥 : ‖𝐴𝑥‖₂ < 1} defined by 𝐴 ≻ 0
• Quadratic functions 𝑥⊤𝐴𝑥, with 𝐴 ≻ 0, on ℝ𝑛
L19.1.5 Composition properties of self-concordant functions
In general, checking whether a function is self-concordant is a nontrivial task. Usually, one
does not use the definition; rather, the following composition rules are used.
i) (Sum of self-concordant functions) The set of self-concordant functions is closed under addition.
Theorem L19.2. Let 𝑓1 : Ω1 → ℝ and 𝑓2 : Ω2 → ℝ be self-concordant functions whose domains satisfy Ω1 ∩ Ω2 ≠ ∅. Then, the function 𝑓1 + 𝑓2 : Ω1 ∩ Ω2 → ℝ is self-concordant.
[▷ You should try to prove this!]
ii) (Addition of an affine function) Adding an affine function to a self-concordant function does not affect the self-concordance property, since self-concordance depends only on the Hessian of the function, and the addition of affine functions does not affect the Hessian.
Theorem L19.3. Let 𝑓 : Ω → ℝ be a self-concordant function. Then, the function 𝑔(𝑥) ≔ 𝑓(𝑥) + ⟨𝑎, 𝑥⟩ + 𝑏 is self-concordant on Ω.
iii) (Affine transformation) The composition of a self-concordant function with an injec-
tive affine transformation preserves the self-concordance.
Theorem L19.4. Let 𝑓 : Ω → ℝ be self-concordant, where Ω ⊆ ℝ𝑚, and let 𝐴 ∈ ℝ𝑚×𝑛, 𝑏 ∈ ℝ𝑚 represent an injective transformation.¹ Then, assuming Ω′ ≔ {𝑥 ∈ ℝ𝑛 : 𝐴𝑥 + 𝑏 ∈ Ω} ≠ ∅, the affinely-transformed function 𝑔(𝑥) ≔ 𝑓(𝐴𝑥 + 𝑏) is self-concordant on the domain Ω′.
iv) (Consequence) Putting the previous results together with Example L19.2, we obtain the following important corollary.
Corollary L19.1. Let Ω ≔ {𝑥 ∈ ℝ𝑛 : 𝑎𝑖⊤𝑥 > 𝑏𝑖 for all 𝑖 = 1, ..., 𝑚} be a nonempty open polyhedral set containing no lines, where 𝑎𝑖 ∈ ℝ𝑛, 𝑏𝑖 ∈ ℝ. A function of the form
𝑓(𝑥) = 𝑐⊤𝑥 − ∑_{𝑖=1}^{𝑚} log(𝑎𝑖⊤𝑥 − 𝑏𝑖),
where 𝑐 ∈ ℝ𝑛, is self-concordant on Ω.
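The gradient and Hessian of such a log barrier have a simple closed form: with slacks 𝑠𝑖 = 𝑎𝑖⊤𝑥 − 𝑏𝑖, the Hessian is 𝐴⊤ diag(1/𝑠𝑖²) 𝐴. A minimal sketch (assuming NumPy; the box constraints and the vector 𝑐 below are made-up data for illustration):

```python
import numpy as np

# Made-up instance: Omega = {x in R^2 : 0 < x_1 < 1, 0 < x_2 < 1},
# written as a_i^T x > b_i with rows a_i of A and entries b_i of b.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([0.0, 0.0, -1.0, -1.0])
c = np.array([0.3, -0.2])

def f(x):
    s = A @ x - b                         # slacks a_i^T x - b_i
    return float(c @ x - np.sum(np.log(s)))

def hess(x):
    s = A @ x - b
    return A.T @ np.diag(1.0 / s**2) @ A  # A^T diag(1/s_i^2) A

x = np.array([0.5, 0.5])
H = hess(x)
# A has full column rank (the set contains no lines), so H is positive definite
assert np.all(np.linalg.eigvalsh(H) > 0)
```

Injectivity of the constraint matrix is exactly what the "containing no lines" hypothesis buys: without it, 𝐻 would be merely positive semidefinite.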
L19.2 Properties of self-concordant functions
Self-concordant functions enjoy a number of nice properties. We now scratch the surface.
L19.2.1 The “near-quadratic” nature of self-concordant functions
In point 3. of Definition L19.1, we wrote that self-concordance is effectively asking that
“inside of the ellipsoid 𝑊 (𝑥), the function 𝑓 is almost quadratic”. We will need to make this
statement more precise.
Theorem L19.5 (Quadratic approximation bound). Let 𝑓 : Ω → ℝ be self-concordant,
and let 𝑞𝑥 be the second-order Taylor expansion of 𝑓 around 𝑥, that is,
𝑞𝑥(𝑦) ≔ 𝑓(𝑥) + ⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ + (1/2)⟨𝑦 − 𝑥, ∇2𝑓(𝑥)(𝑦 − 𝑥)⟩.
Then, for all 𝑥 ∈ Ω and 𝑦 ∈ 𝑊(𝑥), we have
|𝑓(𝑦) − 𝑞𝑥(𝑦)| ≤ ‖𝑦 − 𝑥‖𝑥³ / (3(1 − ‖𝑦 − 𝑥‖𝑥)).
Proof. We use the fundamental theorem of calculus² to bound the remainder of the
second-order Taylor expansion as
𝑓(𝑦) − 𝑞𝑥(𝑦) = ∫_0^1 ∫_0^𝑎 ⟨𝑦 − 𝑥, (∇2𝑓(𝑥 + 𝑏(𝑦 − 𝑥)) − ∇2𝑓(𝑥))(𝑦 − 𝑥)⟩ d𝑏 d𝑎.
The key point of Item 3 of Definition L19.1 is that the inner difference of Hessians is
bounded quite nicely for self-concordant functions. In particular, since by hypothesis 𝑦 ∈
𝑊 (𝑥), we can write
∇2𝑓(𝑥 + 𝑏(𝑦 − 𝑥)) − ∇2𝑓(𝑥) ⪯ 1/(1 − 𝑏‖𝑦 − 𝑥‖𝑥)² ⋅ ∇2𝑓(𝑥) − ∇2𝑓(𝑥) = (1/(1 − 𝑏‖𝑦 − 𝑥‖𝑥)² − 1) ∇2𝑓(𝑥).
Hence, taking absolute values, we find
|𝑓(𝑦) − 𝑞𝑥(𝑦)| ≤ ∫_0^1 ∫_0^𝑎 (1/(1 − 𝑏‖𝑦 − 𝑥‖𝑥)² − 1) |⟨𝑦 − 𝑥, ∇2𝑓(𝑥)(𝑦 − 𝑥)⟩| d𝑏 d𝑎
= ‖𝑦 − 𝑥‖𝑥² ∫_0^1 ∫_0^𝑎 (1/(1 − 𝑏‖𝑦 − 𝑥‖𝑥)² − 1) d𝑏 d𝑎
= ‖𝑦 − 𝑥‖𝑥² ∫_0^1 ((1/‖𝑦 − 𝑥‖𝑥) [1/(1 − 𝑏‖𝑦 − 𝑥‖𝑥)]_{𝑏=0}^{𝑏=𝑎} − 𝑎) d𝑎
= ‖𝑦 − 𝑥‖𝑥² ∫_0^1 (𝑎/(1 − 𝑎‖𝑦 − 𝑥‖𝑥) − 𝑎) d𝑎
= ‖𝑦 − 𝑥‖𝑥³ ∫_0^1 𝑎²/(1 − 𝑎‖𝑦 − 𝑥‖𝑥) d𝑎.
At this point, we can just bound the integral according to
∫_0^1 𝑎²/(1 − 𝑎‖𝑦 − 𝑥‖𝑥) d𝑎 ≤ ∫_0^1 𝑎²/(1 − ‖𝑦 − 𝑥‖𝑥) d𝑎 = 1/(3(1 − ‖𝑦 − 𝑥‖𝑥)),
yielding the final result. □
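The bound of Theorem L19.5 can be sanity-checked numerically on 𝑓(𝑥) = −∑ log(𝑥𝑖). The following sketch (assuming NumPy, not part of the original notes) compares 𝑓 with its quadratic model at random points inside 𝑊(𝑥):

```python
import numpy as np

f = lambda x: float(-np.sum(np.log(x)))
grad = lambda x: -1.0 / x
hess = lambda x: np.diag(1.0 / x**2)

rng = np.random.default_rng(1)
x = rng.uniform(0.5, 2.0, size=4)

for _ in range(100):
    d = rng.normal(size=4)
    r = 0.9 * rng.uniform()
    d *= r / np.sqrt(d @ hess(x) @ d)   # rescale: ||d||_x = r < 1, so y is in W(x)
    y = x + d
    # quadratic model q_x(y) from Theorem L19.5
    q = f(x) + grad(x) @ d + 0.5 * d @ hess(x) @ d
    assert abs(f(y) - q) <= r**3 / (3.0 * (1.0 - r)) + 1e-9
```

Note that the bound degrades as 𝑟 ↑ 1, matching the intuition that the quadratic model is only trustworthy well inside the ellipsoid.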
L19.2.2 Proximity to a minimum
A neat property of self-concordant functions is that it is possible to bound the distance
from the minimum of the function simply by looking at the (intrinsic) norm of the Newton
direction at any point 𝑥, which for notational convenience we denote as
𝑛(𝑥) ≔ −[∇2𝑓(𝑥)]−1∇𝑓(𝑥).
Remark L19.2. The intrinsic norm ‖𝑛(𝑥)‖𝑥 of the Newton direction at a point 𝑥 is a crucial quantity in the study of Newton’s method for self-concordant functions. It is sometimes called the Newton decrement, and denoted 𝜆(𝑥) ≔ ‖𝑛(𝑥)‖𝑥. We will avoid the notation 𝜆(𝑥) to keep the amount of notation to a minimum.
In particular, if the norm is sufficiently small, then the minimum must be near. Formally,
we have the following result.
Theorem L19.6. Let 𝑓 : Ω → ℝ be self-concordant. If a point 𝑥 ∈ Ω is such that
‖𝑛(𝑥)‖𝑥 ≤ 1/9, then there exists a minimizer 𝑧 of 𝑓 within distance
‖𝑧 − 𝑥‖𝑥 ≤ 3 ⋅ ‖𝑛(𝑥)‖𝑥.
Proof. For positive definite quadratics, the theorem is immediate to prove. The idea is
to now use the near-quadratic nature of self-concordant functions to establish a compact
region around 𝑥 where a minimum must exist, and then invoke the Weierstrass theorem.
In particular, for any 𝑦 in 𝑊 (𝑥), we know from Theorem L19.5 that
𝑓(𝑦) ≥ 𝑞𝑥(𝑦) − ‖𝑦 − 𝑥‖𝑥³ / (3(1 − ‖𝑦 − 𝑥‖𝑥))
= 𝑓(𝑥) + ⟨∇𝑓(𝑥), 𝑦 − 𝑥⟩ + (1/2)⟨𝑦 − 𝑥, ∇2𝑓(𝑥)(𝑦 − 𝑥)⟩ − ‖𝑦 − 𝑥‖𝑥³ / (3(1 − ‖𝑦 − 𝑥‖𝑥))
= 𝑓(𝑥) + ⟨[∇2𝑓(𝑥)]^{−1/2}∇𝑓(𝑥), [∇2𝑓(𝑥)]^{1/2}(𝑦 − 𝑥)⟩ + (1/2)‖𝑦 − 𝑥‖𝑥² − ‖𝑦 − 𝑥‖𝑥³ / (3(1 − ‖𝑦 − 𝑥‖𝑥)).
Hence, using the Cauchy-Schwarz inequality,
𝑓(𝑦) ≥ 𝑓(𝑥) − ‖𝑛(𝑥)‖𝑥‖𝑦 − 𝑥‖𝑥 + (1/2)‖𝑦 − 𝑥‖𝑥² − ‖𝑦 − 𝑥‖𝑥³ / (3(1 − ‖𝑦 − 𝑥‖𝑥)).
It is easy to see by direct substitution that any 𝑦 with 3‖𝑛(𝑥)‖𝑥 < ‖𝑦 − 𝑥‖𝑥 ≤ 1/3 satisfies 𝑓(𝑦) > 𝑓(𝑥). Hence, we can restrict the function 𝑓 to the closed ball of radius ‖𝑦 − 𝑥‖𝑥 ≤ 1/3 (which is a subset of 𝑊(𝑥)) and find a minimum 𝑧 there by the Weierstrass theorem; such a minimum must satisfy ‖𝑧 − 𝑥‖𝑥 ≤ 3‖𝑛(𝑥)‖𝑥 < 1/3. Since 𝑓 is convex and 𝑧 is in the interior of the ball, then ∇𝑓(𝑧) = 0, and therefore 𝑧 is a minimizer of 𝑓 on Ω as well. □
The proof of Theorem L19.6 was a bit loose in its bounds. In fact, the result can be further strengthened to a larger intrinsic radius of 1/4 (instead of 1/9).
Remark L19.3. With a bit of additional work, the result in Theorem L19.6 can be strengthened as follows: If a point 𝑥 ∈ Ω is such that ‖𝑛(𝑥)‖𝑥 ≤ 1/4, then there exists a minimizer 𝑧 of 𝑓 within distance
‖𝑧 − 𝑥‖𝑥 ≤ ‖𝑛(𝑥)‖𝑥 + 3‖𝑛(𝑥)‖𝑥² / (1 − ‖𝑛(𝑥)‖𝑥)³.
L19.2.3 Existence and uniqueness of the minimum
In addition, we mention the following property.
Theorem L19.7. Let 𝑓 : Ω → ℝ be self-concordant and lower bounded. Then, 𝑓 attains
a unique minimum.
The “uniqueness” part of the theorem is implied directly by the strict convexity of the self-
concordant functions, which is guaranteed by Item 1 of Definition L19.1. Existence on the
other hand is not immediate. One pretty direct proof is to show that there must exist at
least one point with ‖𝑛(𝑥)‖𝑥 ≤ 1/9, and then use Theorem L19.6.
L19.3 Newton’s method applied to self-concordant functions
In this section, we discuss how the guarantees of Newton’s method can be extended to self-
concordant functions, while gaining affine invariance at the same time.
L19.3.1 Recovering the quadratic convergence rate
For self-concordant functions, the following affine-invariant guarantee can be established.
Theorem L19.8. Let 𝑓 : Ω → ℝ be self-concordant. If a point 𝑥𝑡 ∈ Ω is such that ‖𝑛(𝑥𝑡)‖𝑥𝑡 < 1, then the Newton iterate 𝑥𝑡+1 = 𝑥𝑡 + 𝑛(𝑥𝑡) satisfies
‖𝑛(𝑥𝑡+1)‖𝑥𝑡+1 ≤ (‖𝑛(𝑥𝑡)‖𝑥𝑡 / (1 − ‖𝑛(𝑥𝑡)‖𝑥𝑡))².
The proof involves a couple of steps but it is not particularly difficult. The key idea is to
use the self-concordance property to bound the Hessian at the next iterate in terms of the
Hessian at the current iterate. You can find a detailed proof in the references cited below.
Combined with the result in the previous subsection, this gives a quadratic convergence rate.
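To see Theorem L19.8 in action, one can track the decrement along pure Newton steps on a simple self-concordant function. A numerical sketch (assuming NumPy; the instance is made up), using 𝑓(𝑥) = 𝑐⊤𝑥 − ∑ log(𝑥𝑖), whose minimizer is 𝑥𝑖 = 1/𝑐𝑖:

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0])   # made-up cost vector; minimizer is x_i = 1/c_i

def decrement_and_step(x):
    """Newton decrement ||n(x)||_x and pure Newton iterate x + n(x) for
    f(x) = c^T x - sum_i log(x_i), whose Hessian is diag(1/x_i^2)."""
    g = c - 1.0 / x                         # gradient
    n = -(x**2) * g                         # n(x) = -H(x)^{-1} grad f(x)
    lam = float(np.sqrt(np.sum((n / x) ** 2)))
    return lam, x + n

x = np.array([0.8, 0.6, 0.5])
lams = []
for _ in range(6):
    lam, x = decrement_and_step(x)
    lams.append(lam)

# once lam_t < 1, Theorem L19.8 guarantees lam_{t+1} <= (lam_t / (1 - lam_t))^2
for a, b in zip(lams, lams[1:]):
    if a < 1.0:
        assert b <= (a / (1.0 - a)) ** 2 + 1e-12
assert np.allclose(x, 1.0 / c)              # converged to the minimizer
```

The decrements shrink doubly exponentially once they enter the unit region, mirroring the scalar Example L19.1 coordinate by coordinate.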
L19.4 Further readings
The short book by Renegar [Ren01] and the monograph by Nesterov [Nes18] (Chapter 5) provide a comprehensive introduction to self-concordant functions and their applications in optimization. I especially recommend the book by Renegar [Ren01] for a concise yet rigorous account.
[Ren01] Renegar, J. (2001). A Mathematical View of Interior-point Methods in Convex
Optimization. SIAM. https://doi.org/10.1137/1.9780898718812
[Nes18] Nesterov, Y. (2018). Lectures on Convex Optimization. Springer International
Publishing. https://link.springer.com/book/10.1007/978-3-319-91578-4
Changelog
• Apr 29, 2025: Fixed a few typos (thanks Jonathan Huang!)
• May 3, 2025: Fixed two typos
• May 11, 2025: Fixed a few typos (thanks Khizer!)
★These notes are class material that has not undergone formal peer review. The TAs and I are grateful
for any reports of typos.
¹Injectivity is necessary to preserve the positive definiteness of the Hessian.
²In particular, for a univariate function 𝜙(𝑡) we have
𝜙(1) − 𝜙(0) − 𝜙′(0) − (1/2)𝜙′′(0) = ∫_0^1 ∫_0^𝑎 (𝜙′′(𝑏) − 𝜙′′(0)) d𝑏 d𝑎.
It is easy to obtain the multivariate version of the theorem by simply considering the restriction of a generic
function 𝑓 onto the line spanned by 𝑦 − 𝑥, that is, 𝜙(𝑡) ≔ 𝑓(𝑥 + 𝑡(𝑦 − 𝑥)).