Nonlinear Optimization
MIT 6.7220 / 15.084 — Spring 2025
Course Info
- Instructor: Gabriele Farina (✉️ gfarina@)
- Time: Tuesdays & Thursdays, 11:00am - 12:30pm
- Location: 34-101
- TAs:
  - Kaveh Alimohammadi (✉️ mrz@)
  - Zhiyuan Fan (✉️ fanzy@)
  - Sobhan Mohammadpour (✉️ somo@)
  - Başak Özaydın (✉️ bozaydin@)
- Office hours: see Syllabus
- This course will use Canvas
Schedule (subject to change)
| # | Date | Topic | Lecture notes |
|---|------|-------|---------------|
| 1 | Feb 4 | Introduction. Overview of the course material; goals and applications; general form of an optimization problem; Weierstrass's theorem; computational considerations. | pdf \| html |
| | | Part I: Nonlinear optimization theory | |
| 2 | Feb 6 (HW1) | First-order optimality conditions. First-order necessary optimality conditions; the unconstrained case; halfspace constraints. | pdf \| html |
| 3 | Feb 11 | More on normal cones, and a first taste of duality. The connection between normal cones and Lagrange multipliers; linear programming duality as a straightforward consequence of normal cones. | pdf \| html |
| 4 | Feb 13 | Convex functions and sufficiency of normal cones. Definition of convexity for functions; local optimality implies global optimality; sufficiency of first-order optimality conditions; how to recognize a convex function. | pdf \| html |
| 5 | Feb 18 | No class | |
| 6 | Feb 20 | The driver of duality: Separation. Feasibility as minimization of distance from the feasible set; how separation implies the normal cone to the intersection of halfspaces. | pdf \| html |
| 7 | Feb 25 (HW2) | Separation as a proof system. Primer on computational complexity; the class NP ∩ coNP and its significance; certificates of infeasibility and Farkas' lemma. | pdf \| html |
| 8 | Feb 27 | Functional constraints, Lagrange multipliers, and KKT conditions. Optimization problems with functional constraints; Lagrangian function and Lagrange multipliers; constraint qualifications (linear independence of constraint gradients, Slater's condition). | pdf \| html |
| 9 | Mar 4 | Lagrangian duality. Lagrangian function; connections with KKT conditions; dual problem; examples. | pdf \| html |
| 10 | Mar 6 | Conic constraints. Conic programs and notable cases: linear programs, second-order cone programs, semidefinite programs; selected applications. | pdf \| html |
| 11 | Mar 11 (HW3) | Polynomial optimization. Polynomial optimization problems; connection between polynomial optimization and semidefinite programming; the sum-of-squares decomposition. | pdf \| html |
| 12 | Mar 13 | Polarity and oracle equivalence. Duality between optimization and separation; the polar and the bipolar. | pdf \| html |
| | | Midterm & Spring break | |
| 13 | Mar 18 | Midterm review | |
| 14 | Mar 20 | In-class midterm | |
| 15 | Mar 25 | Spring break | |
| 16 | Mar 27 | Spring break | |
| | | Part II: Algorithms based on first-order information | |
| 17 | Apr 1 | Gradient descent and descent lemmas. Gradient descent algorithm; smoothness and the gradient descent lemma; computation of a point with small gradient in general (non-convex) functions; descent in function value for convex functions: the Euclidean mirror descent lemma. | pdf \| html |
| 18 | Apr 3 (HW4) | Acceleration and momentum. The idea of momentum; Nesterov's accelerated gradient descent; Allen-Zhu and Orecchia's accelerated gradient descent; practical considerations. | pdf \| html |
| 19 | Apr 8 | Projected gradient descent and mirror descent (Part I). The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent. | pdf \| html |
| 20 | Apr 10 | Projected gradient descent and mirror descent (Part II). The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent. | pdf \| html |
| 21 | Apr 15 (HW5) | Stochastic gradient descent and empirical risk minimization. Practical importance of stochastic gradient descent with momentum in training deep learning models; empirical risk minimization problems; minibatches; stochastic gradient descent lemmas; the effect of variance. | pdf \| html |
| 22 | Apr 17 | Distributed optimization and ADMM [Optional]. The setting of distributed optimization; the alternating direction method of multipliers (ADMM) algorithm; convergence analysis of ADMM in the case of convex functions. | pdf \| html |
| | | Part III: Preconditioning and higher-order information | |
| 23 | Apr 22 | Hessians, preconditioning, and Newton's method. Local quadratic approximation of objectives; Newton's method; local quadratic convergence rate; global convergence for certain classes of functions; practical considerations. | |
| 24 | Apr 24 | Adaptive preconditioning: AdaGrad and ADAM. Adaptive construction of diagonal preconditioning matrices; the AdaGrad algorithm; the convergence rate of AdaGrad; proof sketch; AdaGrad with momentum: the popular ADAM algorithm. | |
| 25 | Apr 29 (HW6) | Self-concordant functions (Part I). Limitations of the classic analysis of Newton's method; desiderata and self-concordant functions; properties and examples. | |
| 26 | May 1 | Self-concordant functions (Part II). Limitations of the classic analysis of Newton's method; desiderata and self-concordant functions; properties and examples. | |
| 27 | May 6 | Central path and interior-point methods (Part I). Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch. | |
| 28 | May 8 | Central path and interior-point methods (Part II). Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch. | |
| | | Final exam | |
| 29 | May 13 | Final exam review | |
| 30 | May 16-21 | In-class final | |