􀊚􀊚􀊚􀊚􀊚􀊚􀊚􀊚􀊚􀊘􀊚
􀊘􀊚􀋂􀊘􀋂􀋂􀊚􀋂􀋂􀋂􀊚􀊚􀋂􀋂􀊚􀋂􀋂􀋂􀊚􀋂􀋂􀋂􀋂􀊘􀋂􀋂􀊚􀋂
􀋂􀋂􀋂􀋂􀋂􀋂􀋂􀊚􀊚􀋂􀋂􀊚􀋂􀋂􀊚􀋂􀊚􀋂􀊚􀋂􀋂􀋂􀋂􀋂􀋂􀋂􀋂􀋂􀋂􀄾􀊚􀄾􀋂􀋂􀋂􀋂􀋂􀊘􀋂􀋂􀊚􀋂􀋂􀄾􀄾􀊘􀋂􀋂
􀊘􀊚􀊘􀊘􀊘
􀊠􀊠
MIT 6.7220/15.084 — Nonlinear Optimization Tue, Apr 9th 2024
Lecture 12
Hessians, preconditioning, and Newton's method
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)
These notes are class material that has not undergone formal peer review. The TAs and I are grateful for any
reports of typos.
So far, we have been concerned with first-order methods, that is, those optimization methods that
use gradient information. With today’s lecture, we will start discussing second-order methods, which
use not only the gradient but also the Hessian (second-order derivative) of the function to be opti-
mized. For that, we will now restrict our attention to optimization problems of the form
min_𝑥  𝑓(𝑥)   s.t.   𝑥 ∈ ℝ𝑛
where 𝑓(𝑥) is a twice-differentiable function.
1 From first-order to second-order Taylor approximations
As we mentioned in Lecture 7 when introducing the gradient descent algorithm, a fundamental
idea for constructing optimization algorithms is to approximate the function to be optimized by a
simpler function that is easier to minimize. In the case of gradient descent, we picked a direction
of movement based on the minimum of the first-order Taylor expansion of the objective function
around the current point.
In the case of twice-differentiable functions, it seems natural to wonder what happens if we were to
pick the direction of movement by looking instead at the second-order Taylor expansion of the
objective, which is a more faithful approximation of the function.
The second-order Taylor expansion of a function 𝑓(𝑥) around a point 𝑥𝑡 is given by
𝑓(𝑥) ≈ 𝑓(𝑥𝑡) + ⟨∇𝑓(𝑥𝑡), 𝑥 − 𝑥𝑡⟩ + (1/2)⟨𝑥 − 𝑥𝑡, ∇2𝑓(𝑥𝑡)(𝑥 − 𝑥𝑡)⟩,
where ∇2𝑓(𝑥𝑡) is the Hessian matrix of 𝑓 at 𝑥𝑡. The minimum of this quadratic approximation can be found
in closed form by setting its gradient (with respect to 𝑥) to zero, which gives
∇𝑓(𝑥𝑡) + ∇2𝑓(𝑥𝑡)(𝑥 − 𝑥𝑡) = 0   ⟺   𝑥 = 𝑥𝑡 − [∇2𝑓(𝑥𝑡)]−1∇𝑓(𝑥𝑡).
So, we find that by moving from the first-order to the second-order Taylor approximation, and assuming
that ∇2𝑓(𝑥𝑡) is invertible, the natural direction of descent changes from
𝑑 = −∇𝑓(𝑥𝑡)   (using the first-order Taylor approximation)
to
𝑑 = −[∇2𝑓(𝑥𝑡)]−1∇𝑓(𝑥𝑡)   (using the second-order Taylor approximation).
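To make the comparison concrete, here is a minimal Python sketch (not part of the original notes) that computes both directions for an illustrative quadratic objective; the matrix 𝑄 and vector 𝑏 below are arbitrary choices for demonstration.

```python
import numpy as np

def newton_direction(grad, hess):
    """Second-order direction: solve ∇²f(x) d = -∇f(x) rather than forming the inverse."""
    return np.linalg.solve(hess, -grad)

# Illustrative objective (an assumption, not from the notes): f(x) = 0.5 x'Qx - b'x
Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])      # positive definite, so the Hessian is invertible
b = np.array([1.0, -1.0])
x = np.zeros(2)

grad = Q @ x - b                 # ∇f(x)
hess = Q                         # ∇²f(x)

d_first  = -grad                             # first-order (gradient) direction
d_second = newton_direction(grad, hess)      # second-order (Newton) direction
print(d_first, d_second)
```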
1.1 Second-order direction as an example of preconditioning
The second-order direction of descent is obtained by multiplying the negative gradient (that is, the
first-order direction of descent) by the inverse of the Hessian matrix. This operation is known as
preconditioning the gradient by the Hessian.
The Hessian preconditioning of the gradient direction has the effect of making the optimization prob-
lem affinely invariant. In other words, if we apply an affine transformation to the objective function
and the initial point, the second-order direction of descent will also be automatically transformed.
To demonstrate this property, consider the optimization problem
min_𝑥  𝑓(𝑥)   s.t.   𝑥 ∈ ℝ𝑛
and suppose that the coordinates of 𝑥 have been reparametrized via a new coordinate system 𝑦
where the correspondence to 𝑥 is given by 𝑥 ↔ 𝐴𝑦 + 𝑏.
In the coordinate system of 𝑦, the function being minimized is 𝑔(𝑦) ≔ 𝑓(𝐴𝑦 + 𝑏); so, the gradient
and Hessian of 𝑔 can be computed as:
∇𝑔(𝑦) = 𝐴⊤∇𝑓(𝑥),   ∇2𝑔(𝑦) = 𝐴⊤∇2𝑓(𝑥)𝐴,   where 𝑥 = 𝐴𝑦 + 𝑏.
So,
𝑑𝑦 = −[∇2𝑔(𝑦)]−1∇𝑔(𝑦) = −[𝐴⊤∇2𝑓(𝑥)𝐴]−1(𝐴⊤∇𝑓(𝑥))
   = −𝐴−1[∇2𝑓(𝑥)]−1𝐴−⊤𝐴⊤∇𝑓(𝑥)
   = 𝐴−1 ⋅ (−[∇2𝑓(𝑥)]−1∇𝑓(𝑥)) = 𝐴−1𝑑𝑥.
This implies that the second-order directions of descent measured with respect to 𝑦 and 𝑥 satisfy
𝑑𝑥 ↔ 𝐴𝑑𝑦,
which mimics the correspondence 𝑥 ↔ 𝐴𝑦 + 𝑏. So, for example, the update 𝑥′ = 𝑥 − 𝜂𝑑𝑥 in the 𝑥
coordinate system corresponds to the update 𝑦′ = 𝑦 − 𝜂𝑑𝑦 in the 𝑦 coordinate system, since
𝑥′ = (𝐴𝑦 + 𝑏) − 𝜂𝐴𝑑𝑦 = 𝐴(𝑦 − 𝜂𝑑𝑦) + 𝑏 = 𝐴𝑦′ + 𝑏,
preserving the correspondence at all steps. The same property does not hold for gradient descent. We
will take a more in-depth look at preconditioning in the next lecture.
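The affine invariance above can also be checked numerically. The following sketch (my own illustration, with an arbitrarily chosen smooth function, matrix 𝐴, and vector 𝑏) verifies that the second-order directions computed in the 𝑥 and 𝑦 coordinate systems satisfy 𝑑𝑥 = 𝐴𝑑𝑦.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative smooth test function (an assumption, not from the notes):
# f(x) = log(sum_i exp(x_i)) + 0.5 ||x||^2, which has a well-defined Hessian everywhere.
def grad_f(x):
    p = np.exp(x - x.max()); p /= p.sum()
    return p + x

def hess_f(x):
    p = np.exp(x - x.max()); p /= p.sum()
    return np.diag(p) - np.outer(p, p) + np.eye(len(x))

A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # reparametrization x = A y + b
b = rng.standard_normal(3)
y = rng.standard_normal(3)
x = A @ y + b

grad_g = A.T @ grad_f(x)        # chain rule: ∇g(y) = A'∇f(x)
hess_g = A.T @ hess_f(x) @ A    # chain rule: ∇²g(y) = A'∇²f(x)A

d_x = -np.linalg.solve(hess_f(x), grad_f(x))   # second-order direction in x coordinates
d_y = -np.linalg.solve(hess_g, grad_g)         # second-order direction in y coordinates

print(np.allclose(d_x, A @ d_y))   # True: the directions respect x ↔ A y + b
```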
2 From gradient descent to Newton’s method
Having established the second-order direction of descent, we can define the natural generalization
of the gradient descent algorithm to the second-order setting. This algorithm, which takes the name
damped Newton’s method, is given by the update rule
𝑥𝑡+1 = 𝑥𝑡 − 𝜂[∇2𝑓 (𝑥𝑡)]−1∇𝑓 (𝑥𝑡) . (1)
For the choice 𝜂 = 1, which corresponds to setting 𝑥𝑡+1 to be the minimum of the second-order
approximation centered at 𝑥𝑡 at each iteration, this algorithm is known simply as Newton’s method.
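As a reference point, a bare-bones implementation of update rule (1) might look as follows; this is only a sketch (dense linear algebra, no safeguards or line search), where grad_f and hess_f are callables supplied by the user.

```python
import numpy as np

def damped_newton(grad_f, hess_f, x0, eta=1.0, num_iters=20):
    """Damped Newton's method: x_{t+1} = x_t - eta * [∇²f(x_t)]^{-1} ∇f(x_t).

    With eta = 1 this is plain Newton's method. The Newton system is solved
    directly rather than by forming the inverse Hessian explicitly.
    """
    x = np.asarray(x0, dtype=float)
    iterates = [x.copy()]
    for _ in range(num_iters):
        step = np.linalg.solve(hess_f(x), grad_f(x))
        x = x - eta * step
        iterates.append(x.copy())
    return iterates
```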
2.1 Failure mode: lack of curvature
Given that the update rule of Newton’s method involves the inverse of the Hessian, a natural concern
is what happens when the Hessian is singular or near-singular (for example, when one eigenvalue
is extremely close to zero). Such a situation corresponds to the case where the function has very
little curvature, and the second-order approximation is not a good approximation of the function.
We now show that these concerns are well-founded by considering a function whose curvature away
from the minimum decays extremely fast.
Example 2.1. Consider the function
𝑓(𝑥) = log(𝑒2𝑥 + 𝑒−2𝑥),
[Figure: plot of 𝑓(𝑥) for 𝑥 ∈ [−2, 2]]
whose gradient and Hessian are respectively computed as
∇𝑓(𝑥) = 2 ⋅ (𝑒4𝑥 − 1)/(𝑒4𝑥 + 1),   ∇2𝑓(𝑥) = 16 ⋅ 𝑒4𝑥/(𝑒4𝑥 + 1)².
The two tables below show the first 10 iterates of Newton’s method and gradient descent when
applied to 𝑓(𝑥) starting at two close initial points: 𝑥0 = 0.5 on the left, and 𝑥0 = 0.7 on the right.
As you can see, the behavior of Newton’s method is very different: while it converges extremely
quickly to the minimum when starting at 𝑥0 = 0.5, it diverges when starting at 𝑥0 = 0.7.
Starting point 𝑥0 = 0.5:
        Newton's method    GD (𝜂 = 0.1)
𝑡 = 0   0.5000             0.5000
𝑡 = 1   −0.4067            0.3477
𝑡 = 2   0.2047             0.2274
𝑡 = 3   −0.0237            0.1422
𝑡 = 4   3.53 × 10⁻⁵        0.0868
𝑡 = 5   −1.17 × 10⁻¹³      0.0524
𝑡 = 6   −1.14 × 10⁻¹⁷      0.0315
𝑡 = 7   0.0000             0.0189
𝑡 = 8   0.0000             0.0114
𝑡 = 9   0.0000             0.0068

Starting point 𝑥0 = 0.7:
        Newton's method    GD (𝜂 = 0.1)
𝑡 = 0   0.7000             0.7000
𝑡 = 1   −1.3480            0.5229
𝑡 = 2   26.1045            0.3669
𝑡 = 3   −2.79 × 10⁴⁴       0.2418
𝑡 = 4   diverged           0.1520
𝑡 = 5   diverged           0.0930
𝑡 = 6   diverged           0.0562
𝑡 = 7   diverged           0.0338
𝑡 = 8   diverged           0.0203
𝑡 = 9   diverged           0.0122
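The two tables can be reproduced with a short script like the one below (my own reconstruction, using the closed forms ∇𝑓(𝑥) = 2 tanh(2𝑥) and ∇2𝑓(𝑥) = 4/cosh²(2𝑥), which are equivalent to the expressions above); iterates whose magnitude explodes are reported as diverged to avoid floating-point overflow.

```python
import numpy as np

grad_f = lambda x: 2 * np.tanh(2 * x)        # = 2 (e^{4x} - 1) / (e^{4x} + 1)
hess_f = lambda x: 4 / np.cosh(2 * x) ** 2   # = 16 e^{4x} / (e^{4x} + 1)^2

def run(x0, T=10, eta_gd=0.1):
    xn, xg = x0, x0                          # Newton iterate, gradient-descent iterate
    for t in range(T):
        newton = f"{xn: .4e}" if abs(xn) < 1e3 else "diverged"
        print(f"t={t}:  Newton {newton:>12}   GD {xg: .4f}")
        # Newton step (skipped once the iterate has blown up, to avoid overflow)
        xn = xn - grad_f(xn) / hess_f(xn) if abs(xn) < 50 else float("inf")
        xg = xg - eta_gd * grad_f(xg)        # gradient descent step

run(0.5)   # Newton converges extremely fast
run(0.7)   # Newton diverges, gradient descent still converges
```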
In the next section, we will analyze the convergence properties of Newton’s method formally.
3 Analysis of Newton’s method
Example 2.1 shows that Newton's method breaks down in the absence of sufficient curvature. How-
ever, it also shows that the method can be extremely efficient when the starting point is close to
the minimum and the minimum has enough curvature. In this section, we formalize these positive
observations quantitatively.
3.1 A key lemma
The analysis of Newton’s method and damped Newton’s method hinges on the following lemma,
which relates the geometric shrinking in the distance to optimality of the iterates to the spectral
norm of a specific matrix that depends on the Hessian of the function.
Theorem 3.1. Let 𝑓 : ℝ𝑛 → ℝ be twice differentiable with invertible Hessian, and let 𝑥⋆ be a
local minimum of 𝑓. The distances to optimality of the iterates 𝑥𝑡 generated by damped
Newton's method with stepsize 𝜂 > 0 satisfy
𝑥𝑡+1 − 𝑥⋆ = (𝐼 − 𝜂𝐻𝑡)(𝑥𝑡 − 𝑥⋆),
where
𝐻𝑡 ≔ [∇2𝑓(𝑥𝑡)]−1 ∫₀¹ ∇2𝑓(𝑥⋆ + 𝜆(𝑥𝑡 − 𝑥⋆)) d𝜆.
Proof . The fundamental idea of the proof is to write the second-order direction as a function of
the Hessian matrices on the segment connecting 𝑥⋆ to 𝑥𝑡.
In particular, we have
[∇2𝑓(𝑥𝑡)]−1∇𝑓(𝑥𝑡) = [∇2𝑓(𝑥𝑡)]−1(∇𝑓(𝑥𝑡) − ∇𝑓(𝑥⋆))   (since ∇𝑓(𝑥⋆) = 0)
= [∇2𝑓(𝑥𝑡)]−1 ∫₀¹ ∇2𝑓(𝑥⋆ + 𝜆(𝑥𝑡 − 𝑥⋆))(𝑥𝑡 − 𝑥⋆) d𝜆   (fundamental theorem of calculus applied to 𝜆 ↦ ∇𝑓(𝑥⋆ + 𝜆(𝑥𝑡 − 𝑥⋆)))
= ([∇2𝑓(𝑥𝑡)]−1 ∫₀¹ ∇2𝑓(𝑥⋆ + 𝜆(𝑥𝑡 − 𝑥⋆)) d𝜆)(𝑥𝑡 − 𝑥⋆)
= 𝐻𝑡(𝑥𝑡 − 𝑥⋆).
Hence, substituting this expression in the update rule of damped Newton’s method, we find
𝑥𝑡+1 − 𝑥⋆ = (𝑥𝑡 − 𝑥⋆) − 𝜂[∇2𝑓 (𝑥𝑡)]−1∇𝑓 (𝑥𝑡) = (𝐼 − 𝜂𝐻𝑡)(𝑥𝑡 − 𝑥⋆),
as we wanted to show.
The previous lemma shows that as long as the spectral norm of 𝐼 − 𝜂𝐻𝑡 is less than 1 (that is, the
maximum absolute value of any eigenvalue of 𝐼 − 𝜂𝐻𝑡 is less than 1), the distance to optimality of
the iterates 𝑥𝑡 generated by Newton’s method will decay exponentially fast. We will leverage this
result in the next two subsections to give local and global convergence guarantees under different
hypotheses.
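Before using the lemma, it can be reassuring to check the identity numerically. The sketch below (an illustration of mine, using an arbitrary strongly convex test function with minimizer 𝑥⋆ = 0) approximates the integral defining 𝐻𝑡 with a trapezoidal rule and verifies that 𝑥𝑡+1 − 𝑥⋆ = (𝐼 − 𝜂𝐻𝑡)(𝑥𝑡 − 𝑥⋆) up to quadrature error.

```python
import numpy as np

# Illustrative test function (an assumption, not from the notes):
# f(x) = 0.25 ||x||^4 + 0.5 ||x||^2, minimized at x* = 0.
def grad_f(x):
    return (x @ x + 1.0) * x

def hess_f(x):
    return (x @ x + 1.0) * np.eye(len(x)) + 2.0 * np.outer(x, x)

x_star = np.zeros(3)
x_t = np.array([0.4, -0.2, 0.1])
eta = 0.5

# H_t = [∇²f(x_t)]^{-1} ∫_0^1 ∇²f(x* + λ(x_t - x*)) dλ, via a composite trapezoid rule
lambdas = np.linspace(0.0, 1.0, 1001)
hessians = np.array([hess_f(x_star + lam * (x_t - x_star)) for lam in lambdas])
weights = np.ones(len(lambdas)); weights[0] = weights[-1] = 0.5
avg_hess = (weights[:, None, None] * hessians).sum(axis=0) / (len(lambdas) - 1)
H_t = np.linalg.solve(hess_f(x_t), avg_hess)

# One damped Newton step from x_t
x_next = x_t - eta * np.linalg.solve(hess_f(x_t), grad_f(x_t))

lhs = x_next - x_star
rhs = (np.eye(3) - eta * H_t) @ (x_t - x_star)
print(np.allclose(lhs, rhs, atol=1e-6))   # True, up to quadrature error
```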
3.2 First corollary: Local convergence guarantees
As a first corollary of the previous lemma, we can derive a convergence guarantee for Newton’s
method when starting from a point that is “close enough” to a minimum with sufficient curvature.
This type of guarantee is known as a local convergence guarantee, as it only applies to points that
are within a certain distance from the solution. An empirical illustration of this guarantee is shown
in Example 2.1 when starting from the initial point 𝑥0 = 0.5 (left table of the example).
In particular, let 𝑥⋆ be a local minimum of 𝑓 with strong curvature, that is, a point such that
∇𝑓(𝑥⋆) = 0,  and  ∇2𝑓(𝑥⋆) ≽ 𝜇𝐼   (2)
for some 𝜇 > 0. Furthermore, assume that 𝑓 is smooth, in the sense that its Hessian is 𝑀-Lipschitz
continuous, that is,
‖∇2𝑓(𝑥) − ∇2𝑓(𝑦)‖𝑠 ≤ 𝑀 ⋅ ‖𝑥 − 𝑦‖2   for all 𝑥, 𝑦 ∈ ℝ𝑛. (3)
Our analysis will rest on the following high-level idea.
Since the Hessian at 𝑥⋆ is ≽ 𝜇𝐼 and the Hessian is 𝑀-Lipschitz continuous (and therefore
cannot change too fast), we can determine a neighborhood of points "sufficiently close" to
𝑥⋆ on which the curvature remains strong. This will allow us to upper bound the spectral norm
of 𝐼 − 𝐻𝑡 and invoke the general result of Theorem 3.1.
We make the above idea formal in the following.
Theorem 3.2. Under the assumptions (2) and (3) above, the spectral norm of the matrix 𝐼 −
𝐻𝑡 induced at time 𝑡 by the iterate 𝑥𝑡 produced by Newton's method satisfies
‖𝐼 − 𝐻𝑡‖𝑠 ≤ (𝑀/𝜇) ⋅ ‖𝑥𝑡 − 𝑥⋆‖2
whenever ‖𝑥𝑡 − 𝑥⋆‖2 ≤ 𝜇/(2𝑀).
Proof . The key technique is to leverage the Lipschitz continuity of the Hessian to bound the
spectral norm of 𝐼 − 𝐻𝑡. To do that, we can make a difference of Hessians appear in the expression
of 𝐼 − 𝐻𝑡 by rewriting the identity matrix as follows:
𝐼 − 𝐻𝑡 = ([∇2𝑓(𝑥𝑡)]−1 ∫₀¹ ∇2𝑓(𝑥𝑡) d𝜆) − 𝐻𝑡
= [∇2𝑓(𝑥𝑡)]−1 ∫₀¹ (∇2𝑓(𝑥𝑡) − ∇2𝑓(𝑥⋆ + 𝜆(𝑥𝑡 − 𝑥⋆))) d𝜆.
So, taking spectral norms on both sides, we find
‖𝐼 − 𝐻𝑡‖𝑠 ≤ ‖[∇2𝑓(𝑥𝑡)]−1‖𝑠 ⋅ ∫₀¹ ‖∇2𝑓(𝑥𝑡) − ∇2𝑓(𝑥⋆ + 𝜆(𝑥𝑡 − 𝑥⋆))‖𝑠 d𝜆
≤ ‖[∇2𝑓(𝑥𝑡)]−1‖𝑠 ⋅ ∫₀¹ 𝑀(1 − 𝜆)‖𝑥𝑡 − 𝑥⋆‖2 d𝜆
= (𝑀/2) ⋅ ‖[∇2𝑓(𝑥𝑡)]−1‖𝑠 ⋅ ‖𝑥𝑡 − 𝑥⋆‖2. (4)
To complete the proof, we only need to bound the spectral norm of [∇2𝑓(𝑥𝑡)]−1. As mentioned
above, we will do so by using the fact that ∇2𝑓(𝑥⋆) ≽ 𝜇𝐼 and the Hessian changes slowly. In
particular, we have
‖∇2𝑓(𝑥𝑡) − ∇2𝑓(𝑥⋆)‖𝑠 ≤ 𝑀‖𝑥𝑡 − 𝑥⋆‖2,
which implies that
−𝑀‖𝑥𝑡 − 𝑥⋆‖2 ⋅ 𝐼 ≼ ∇2𝑓(𝑥𝑡) − ∇2𝑓(𝑥⋆) ≼ 𝑀‖𝑥𝑡 − 𝑥⋆‖2 ⋅ 𝐼.
In particular,
∇2𝑓(𝑥𝑡) ≽ ∇2𝑓(𝑥⋆) − 𝑀‖𝑥𝑡 − 𝑥⋆‖2 ⋅ 𝐼 ≽ (𝜇 − 𝑀‖𝑥𝑡 − 𝑥⋆‖2)𝐼 ≽ (𝜇/2) 𝐼,
where the last inequality used the hypothesis that ‖𝑥𝑡 − 𝑥⋆‖2 ≤ 𝜇/(2𝑀). Hence, ‖[∇2𝑓(𝑥𝑡)]−1‖𝑠 ≤ 2/𝜇,
and plugging this bound into (4) yields the statement.
In particular, whenever ‖𝑥𝑡 − 𝑥⋆‖2 ≤ 𝜇/(2𝑀) we have ‖𝐼 − 𝐻𝑡‖𝑠 ≤ 1/2, so Theorem 3.1 guarantees that
the distance to optimality decreases exponentially. In fact, since Theorem 3.2 bounds the spectral norm
of 𝐼 − 𝐻𝑡 by a linear function of ‖𝑥𝑡 − 𝑥⋆‖2, the decrease is doubly exponential. We have just shown the following.
Theorem 3.3. Let 𝑓 : ℝ𝑛 → ℝ be twice differentiable with 𝑀-Lipschitz continuous Hessian, and
let 𝑥⋆ be a local minimum of 𝑓 with strong curvature, that is, a point such that
∇𝑓(𝑥⋆) = 0,  and  ∇2𝑓(𝑥⋆) ≽ 𝜇𝐼
for some 𝜇 > 0. Then, as long as we start Newton's method from a point 𝑥0 with distance
‖𝑥0 − 𝑥⋆‖2 ≤ 𝜇/(2𝑀)
from the local minimum, the distance to optimality of the iterates 𝑥𝑡 generated by Newton's
method decays as
‖𝑥𝑡+1 − 𝑥⋆‖2 / (𝜇/𝑀) ≤ (‖𝑥𝑡 − 𝑥⋆‖2 / (𝜇/𝑀))².
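The doubly exponential (quadratic) decay can be observed empirically on the function from Example 2.1 (whose minimizer is 𝑥⋆ = 0) when starting from 𝑥0 = 0.5: the ratio between each error and the square of the previous error remains bounded. This is only an empirical illustration; it does not check the hypothesis on the initial distance.

```python
import numpy as np

# Function from Example 2.1: f(x) = log(e^{2x} + e^{-2x}), minimized at x* = 0
grad_f = lambda x: 2 * np.tanh(2 * x)
hess_f = lambda x: 4 / np.cosh(2 * x) ** 2

x = 0.5
for t in range(5):
    x_next = x - grad_f(x) / hess_f(x)        # Newton step (eta = 1)
    # Quadratic convergence: |x_{t+1} - x*| / |x_t - x*|^2 stays bounded
    print(f"t={t}:  |x_t| = {abs(x):.3e}   ratio = {abs(x_next) / x**2:.3f}")
    x = x_next
```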
3.3 Second corollary: Global convergence for functions with strong curvature
In general, obtaining global convergence guarantees for Newton’s method is a much harder task than
obtaining local convergence guarantees. However, we can still obtain global convergence guarantees
for Newton’s method when the function is both 𝜇-strongly convex and 𝐿-smooth, that is,
𝜇𝐼 ≼ ∇2𝑓(𝑥) ≼ 𝐿𝐼 ∀𝑥 ∈ ℝ𝑛.
Under these assumptions, for every iterate 𝑥𝑡 we have
(𝜇/𝐿) 𝐼 ≼ 𝐻𝑡 = [∇2𝑓(𝑥𝑡)]−1 ∫₀¹ ∇2𝑓(𝑥⋆ + 𝜆(𝑥𝑡 − 𝑥⋆)) d𝜆 ≼ (𝐿/𝜇) 𝐼,
so that
(1 − 𝜂𝐿/𝜇) 𝐼 ≼ 𝐼 − 𝜂𝐻𝑡 ≼ (1 − 𝜂𝜇/𝐿) 𝐼.
Hence, if 𝜂 ≤ 𝜇/𝐿, we have
0 ≼ 𝐼 − 𝜂𝐻𝑡 ≼ (1 − 𝜂𝜇/𝐿) 𝐼,   and therefore   ‖𝐼 − 𝜂𝐻𝑡‖𝑠 ≤ 1 − 𝜂𝜇/𝐿,
and invoking Theorem 3.1 we obtain the following corollary.
Corollary 3.1. Let 𝑓 : ℝ𝑛 → ℝ be twice differentiable, 𝜇-strongly convex, and 𝐿-smooth. Then,
the distance to optimality¹ of the iterates 𝑥𝑡 generated by damped Newton's method with step-
size 𝜂 ≤ 𝜇/𝐿 decays exponentially fast at the rate
‖𝑥𝑡+1 − 𝑥⋆‖2 ≤ (1 − 𝜂𝜇/𝐿) ⋅ ‖𝑥𝑡 − 𝑥⋆‖2.
¹By strong convexity, the function attains a minimum, which is furthermore unique.
Remark 3.1. The previous result is made somewhat less appealing by the fact that we already
know that gradient descent can reach a similar convergence rate for the same class of smooth
and strongly convex functions, without even needing to invert the Hessian (see Lecture 7, section
on convergence rate for smooth PŁ functions).
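To see Corollary 3.1 in action, here is a small sketch on an assumed strongly convex quadratic (the matrix 𝑄 and the constants 𝜇, 𝐿 below are arbitrary illustrative choices) that checks the per-step contraction factor against the guaranteed rate 1 − 𝜂𝜇/𝐿.

```python
import numpy as np

# Illustrative strongly convex and smooth objective (an assumption, not from the notes):
# f(x) = 0.5 x'Qx with eigenvalues of Q in [mu, L], so that mu I ≼ ∇²f(x) ≼ L I.
Q = np.diag([1.0, 4.0, 10.0])
mu, L = 1.0, 10.0
grad_f = lambda x: Q @ x
hess_f = lambda x: Q
x_star = np.zeros(3)               # unique minimizer

eta = mu / L                       # stepsize allowed by Corollary 3.1
x = np.array([5.0, -3.0, 2.0])
for t in range(5):
    err = np.linalg.norm(x - x_star)
    x = x - eta * np.linalg.solve(hess_f(x), grad_f(x))   # damped Newton step
    new_err = np.linalg.norm(x - x_star)
    # The corollary guarantees new_err <= (1 - eta * mu / L) * err
    print(f"t={t}:  contraction {new_err / err:.4f}   bound {1 - eta * mu / L:.4f}")
```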
4 Further readings
Further material on (damped) Newton's method can be found in Chapter 1.2.4 of the book by Nesterov
[Nes18]. The use of preconditioning in optimization is discussed in Chapter 1.3 of the same book,
under the name "variable metric method".
Appendix: Cubic-regularized Newton method. As we have seen above, when the Hessian matrix is
ill conditioned, Newton’s method might take huge steps and diverge. For that reason, we were able
to show global convergence only in the limited setting of strong curvature (albeit at an extremely
fast rate). We will see in a couple of lectures another extremely important function class for which
Newton’s method can be shown to converge globally: self-concordant functions.
Beyond these isolated classes of benign functions, one might wonder whether second-order methods can
be made more robust to ill-conditioned Hessian matrices. To resolve this question in the positive,
Nesterov and Polyak [NP06] proposed a modification of Newton's method that combines
the idea of a proximal step to control the step size with the second-order information of the Hessian
matrix. In particular, in our pursuit of a good second-order method we would like to be able to trade
off the following two objectives:
• moving as much as possible in the direction of minimizing the second-order approximation
𝑓(𝑥) ≈ 𝑓(𝑥𝑡) + ⟨∇𝑓(𝑥𝑡), 𝑥 − 𝑥𝑡⟩ + (1/2)⟨𝑥 − 𝑥𝑡, ∇2𝑓(𝑥𝑡)(𝑥 − 𝑥𝑡)⟩;
• staying in a sufficiently small neighborhood of 𝑥𝑡, where the approximation is accurate. The
update rule 𝑥𝑡+1 = 𝑥𝑡 − 𝜂[∇2𝑓(𝑥𝑡)]−1∇𝑓(𝑥𝑡) seen in (1) does not guarantee this, as it might
take large steps when the Hessian is ill conditioned.
As we saw already in Lecture 9, one way to capture this tension quantitatively is via the notion
of proximal step. For first-order methods, when using Euclidean distances to measure the step size,
this resulted in the choice of next iterate given by
𝑥𝑡+1 ≔ arg min𝑥 { 𝜂⟨∇𝑓(𝑥𝑡), 𝑥 − 𝑥𝑡⟩ + (1/2)‖𝑥 − 𝑥𝑡‖2² },
which we saw was exactly equivalent to the gradient descent update rule 𝑥𝑡+1 = 𝑥𝑡 − 𝜂∇𝑓(𝑥𝑡). For
second-order methods, a natural generalization of the approach results in the update
𝑥𝑡+1 ∈ arg min𝑥 { 𝜂(⟨∇𝑓(𝑥𝑡), 𝑥 − 𝑥𝑡⟩ + (1/2)⟨𝑥 − 𝑥𝑡, ∇2𝑓(𝑥𝑡)(𝑥 − 𝑥𝑡)⟩) + (1/6)‖𝑥 − 𝑥𝑡‖2³ }. (5)
The previous update rule is precisely the one proposed by Nesterov and Polyak [NP06].
We will call the resulting method cubic-regularized Newton's method. As shown by Nesterov and
Polyak [NP06], this method guarantees global convergence for a much broader class of
functions than the ones we have seen so far, while still enjoying quadratic convergence rates once
the iterates are close enough to the solution. In addition to the original paper, you can find a
treatment of cubic-regularized Newton's method in Chapter 4.1 of the book by Nesterov [Nes18].
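As a rough illustration of update (5), the following one-dimensional sketch applies cubic-regularized Newton steps to the function from Example 2.1, starting from the point 𝑥0 = 0.7 where plain Newton's method diverged. The inner subproblem is solved here by a crude grid search purely for demonstration; Nesterov and Polyak [NP06] discuss how to solve it properly.

```python
import numpy as np

# Function from Example 2.1
grad_f = lambda x: 2 * np.tanh(2 * x)
hess_f = lambda x: 4 / np.cosh(2 * x) ** 2

def cubic_newton_step(x_t, eta=1.0):
    """One step of update (5), with the 1D subproblem solved by grid search."""
    g, h = grad_f(x_t), hess_f(x_t)
    d = np.linspace(-5.0, 5.0, 200001)                   # candidate displacements x - x_t
    model = eta * (g * d + 0.5 * h * d ** 2) + np.abs(d) ** 3 / 6
    return x_t + d[np.argmin(model)]

x = 0.7                        # plain Newton's method diverges from this point
for t in range(8):
    print(f"t={t}:  x = {x: .4f}")
    x = cubic_newton_step(x)   # ...while the cubic-regularized steps approach 0
```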
Bibliography
[Nes18] Y. Nesterov, Lectures on Convex Optimization. Springer International Publishing, 2018.
[Online]. Available: https://link.springer.com/book/10.1007/978-3-319-91578-4
[NP06] Y. Nesterov and B. T. Polyak, “Cubic regularization of Newton method and its global
performance,” Math. Program., vol. 108, no. 1, pp. 177–205, Aug. 2006, doi: 10.1007/s10107-006-0706-8.
5 Appendix: Spectral norms
The spectral norm of a matrix 𝐴 measures the maximum increase in Euclidean norm that 𝐴 can
produce, that is,
‖𝐴‖𝑠 ≔ max𝑥≠0 ‖𝐴𝑥‖2 / ‖𝑥‖2.
(This type of norm, defined as the maximum effect that a function can have on the norm of an input,
is known as an operator norm.) In other words, if ‖𝐴‖𝑠 ≤ 𝑘, then we know that
‖𝐴𝑥‖2 ≤ 𝑘‖𝑥‖2
for all vectors 𝑥.
The spectral norm of a matrix is closely related to its eigenvalues. In particular, we have the following
characterization.
Theorem 5.1. For a symmetric matrix 𝐴, the spectral norm is equal to the maximum absolute
value of any eigenvalue of 𝐴.
Proof . The theorem is straightforward when 𝐴 is a diagonal matrix, where the maximum is
attained by any vector supported only on the coordinate with the maximum absolute eigenvalue.
To show the result in general, we can reduce to the diagonal case by considering the eigendecom-
position
𝐴 = 𝑄⊤Λ𝑄,
where 𝑄 is an orthogonal matrix (that is, ‖𝑄𝑣‖2 = ‖𝑄⊤𝑣‖2 = ‖𝑣‖2 for all 𝑣) and Λ is a diagonal
matrix with the eigenvalues of 𝐴 on the diagonal (this decomposition exists from the spectral
theorem for symmetric real matrices). Then,
max𝑥≠0 ‖𝐴𝑥‖2 / ‖𝑥‖2 = max𝑥≠0 ‖𝑄⊤Λ𝑄𝑥‖2 / ‖𝑥‖2 = max𝑥≠0 ‖Λ𝑄𝑥‖2 / ‖𝑄𝑥‖2,
where the last equality used the orthogonality of 𝑄 twice (once in the numerator to remove the
leading 𝑄⊤, and once in the denominator to introduce a 𝑄). Since 𝑄 is invertible, we can therefore
perform the change of variables 𝑦 = 𝑄𝑥 and write
max𝑥≠0 ‖Λ𝑄𝑥‖2 / ‖𝑄𝑥‖2 = max𝑦≠0 ‖Λ𝑦‖2 / ‖𝑦‖2.
The result then follows from the diagonal case.
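A quick numerical sanity check of the definition and of Theorem 5.1 (my own illustration, with a randomly generated symmetric matrix): estimating the operator norm by sampling many directions gives a value that approaches, from below, the maximum absolute eigenvalue reported by NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
A = (A + A.T) / 2                          # make A symmetric

# Crude estimate of max_{x != 0} ||Ax||_2 / ||x||_2 over random directions
xs = rng.standard_normal((100_000, 4))
ratios = np.linalg.norm(xs @ A.T, axis=1) / np.linalg.norm(xs, axis=1)

print(ratios.max())                          # approaches ||A||_s from below
print(np.abs(np.linalg.eigvalsh(A)).max())   # max |eigenvalue| (Theorem 5.1)
print(np.linalg.norm(A, 2))                  # NumPy's spectral norm, for reference
```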
An immediate consequence of the previous result is the following:
Corollary 5.1. Let 𝐴 ∈ ℝ𝑛×𝑛 be symmetric. Then, ‖𝐴‖𝑠 ≤ 𝑘 if and only if
−𝑘𝐼 ≼ 𝐴 ≼ 𝑘𝐼.
Proof . The condition −𝑘𝐼 ≼ 𝐴 ≼ 𝑘𝐼 is equivalent to asking that every eigenvalue of 𝐴 be in the
range [−𝑘, 𝑘]. The result then follows from the previous theorem.
Finally, we show that the spectral norm of a product of matrices is submultiplicative. This follows
pretty much directly from the definition of the spectral norm.
Theorem 5.2. For any matrices 𝐴, 𝐵 ∈ ℝ𝑛×𝑛, we have ‖𝐴𝐵‖𝑠 ≤ ‖𝐴‖𝑠‖𝐵‖𝑠.
Proof . Let 𝑥 ≠ 0 be a vector. Then,
‖𝐴𝐵𝑥‖2 ≤ ‖𝐴‖𝑠‖𝐵𝑥‖2 ≤ ‖𝐴‖𝑠‖𝐵‖𝑠‖𝑥‖2.
Dividing both sides by ‖𝑥‖2 and taking the maximum over all 𝑥 ≠ 0 yields the result.