Nonlinear Optimization

MIT 6.7220 / 15.084 — Spring 2025

Course Info

  • Instructors: Gabriele Farina (✉️ gfarina@)
  • Time: Tuesdays & Thursdays, 11:00am - 12:30pm
  • Location: 34-101
  • TAs:
    • Kaveh Alimohammadi (✉️ mrz@)
    • Zhiyuan Fan (✉️ fanzy@)
    • Sobhan Mohammadpour (✉️ somo@)
    • Başak Özaydın (✉️ bozaydin@)
  • Office hours: see Syllabus
  • This course will use Canvas

Schedule (subject to change)

# | Date | Topic | Lecture notes
1 Feb 4
Introduction
Overview of the course material; goals and applications; general form of an optimization problem; Weierstrass's theorem; computational considerations.
pdf | html
Part I: Nonlinear optimization theory
2 Feb 6
HW1
First-order optimality conditions
First-order necessary optimality conditions; the unconstrained case; halfspace constraints.
pdf | html
3 Feb 11
More on normal cones, and a first taste of duality
The connection between normal cones and Lagrange multipliers; linear programming duality as a straightforward consequence of normal cones.
pdf | html
4 Feb 13
Convex functions and sufficiency of normal cones
Definition of convexity for functions; local optimality implies global optimality; sufficiency of first-order optimality conditions; how to recognize a convex function.
pdf | html
5 Feb 18
No class
6 Feb 20
The driver of duality: Separation
Feasibility as minimization of the distance from the feasible set; how separation yields the normal cone of an intersection of halfspaces.
pdf | html
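As an illustrative companion to this lecture (not part of the posted notes), the separation idea can be checked numerically: projecting a point outside the unit Euclidean ball onto it yields a separating hyperplane. The set, point, and sampling scheme below are assumptions chosen for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([2.0, 1.0])                 # a point outside the unit ball C
p = y / np.linalg.norm(y)                # Euclidean projection of y onto C
a = y - p                                # normal of a separating hyperplane through p

assert a @ (y - p) > 0                   # y lies strictly on the positive side
for _ in range(1000):                    # every sampled point of C lies on the other side
    x = rng.normal(size=2)
    x /= max(1.0, np.linalg.norm(x))     # pull the sample into the unit ball
    assert a @ (x - p) <= 1e-12
```

The final assertion is exactly the variational characterization of projection: ⟨y − p, x − p⟩ ≤ 0 for all x in C.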
7 Feb 25
HW2
Separation as a proof system
Primer on computational complexity; the class NP ∩ coNP and its significance; certificates of infeasibility and Farkas' lemma.
pdf | html
8 Feb 27
Functional constraints, Lagrange multipliers, and KKT conditions
Optimization problems with functional constraints; Lagrangian function and Lagrange multipliers; constraint qualifications (linear independence of constraint gradients, Slater's condition).
pdf | html
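A tiny worked example (illustrative; the problem is an assumption, not taken from the notes) of checking the KKT conditions: minimize x₁² + x₂² subject to x₁ + x₂ ≥ 1, whose optimum is x* = (1/2, 1/2) with multiplier λ = 1:

```python
x_star = (0.5, 0.5)                      # candidate optimum
lam = 1.0                                # candidate Lagrange multiplier

grad_f = (2 * x_star[0], 2 * x_star[1])  # gradient of f(x) = x1^2 + x2^2
grad_g = (1.0, 1.0)                      # gradient of g(x) = x1 + x2 - 1 >= 0

# stationarity: grad f(x*) = lam * grad g(x*)
assert grad_f == (lam * grad_g[0], lam * grad_g[1])
# primal feasibility, dual feasibility, complementary slackness
assert x_star[0] + x_star[1] - 1 == 0 and lam >= 0
```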
9 Mar 4
Lagrangian duality
Lagrangian function; connections with KKT conditions; dual problem; examples.
pdf | html
10 Mar 6
Conic constraints
Conic programs and notable cases: linear programs, second-order cone programs, semidefinite programs; selected applications.
pdf | html
11 Mar 11
HW3
Polynomial optimization
Polynomial optimization problems; connection between polynomial optimization and semidefinite programming; the sum of squares decomposition.
pdf | html
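A minimal sketch of the sum-of-squares certificate mentioned above, on an assumed toy polynomial: p(x) = x⁴ + 4x³ + 6x² + 4x + 1 = (x² + 2x + 1)², so p = zᵀQz with z = (1, x, x²) and a positive semidefinite Gram matrix Q:

```python
import numpy as np

v = np.array([1.0, 2.0, 1.0])            # coefficients of x^2 + 2x + 1
Q = np.outer(v, v)                       # rank-1 Gram matrix, hence PSD

assert np.linalg.eigvalsh(Q).min() >= -1e-12   # PSD check
# expand z^T Q z and compare with the coefficients of p (constant term first)
coeffs = [Q[0, 0], 2 * Q[0, 1], 2 * Q[0, 2] + Q[1, 1], 2 * Q[1, 2], Q[2, 2]]
assert coeffs == [1.0, 4.0, 6.0, 4.0, 1.0]
```

In general the Gram matrix is not known in closed form and is found by solving a semidefinite program, which is the connection to SDPs named in the description.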
12 Mar 13
Polarity and oracle equivalence
Duality between optimization and separation; polar and bipolar.
pdf | html
Midterm & Spring break
13 Mar 18
Midterm review
14 Mar 20
In-class midterm
15 Mar 25
Spring break
16 Mar 27
Spring break
Part II: Algorithms based on first-order information
17 Apr 1
Gradient descent and descent lemmas
Gradient descent algorithm; smoothness and the gradient descent lemma; computation of a point with small gradient in general (non-convex) functions; descent in function value for convex functions: the Euclidean mirror descent lemma.
pdf | html
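A minimal sketch (the quadratic objective is an illustrative assumption, not course material) of gradient descent with step size 1/L, verifying the gradient descent lemma at every iterate:

```python
import numpy as np

# f(x) = 0.5 x^T A x - b^T x, a smooth convex quadratic
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L = np.linalg.eigvalsh(A).max()          # smoothness constant of f

def f(x): return 0.5 * x @ A @ x - b @ x
def grad(x): return A @ x - b

x = np.zeros(2)
for _ in range(200):
    g = grad(x)
    x_new = x - g / L                    # gradient step with step size 1/L
    # gradient descent lemma: f(x+) <= f(x) - ||grad f(x)||^2 / (2L)
    assert f(x_new) <= f(x) - (g @ g) / (2 * L) + 1e-9
    x = x_new

x_star = np.linalg.solve(A, b)           # the unconstrained minimizer, for comparison
```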
18 Apr 3
HW4
Acceleration and momentum
The idea of momentum; Nesterov's accelerated gradient descent; Allen-Zhu and Orecchia's accelerated gradient descent; practical considerations.
pdf | html
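A sketch of Nesterov's accelerated gradient descent (the ill-conditioned quadratic below is an assumed example, not from the notes), using the standard extrapolation-sequence form of the momentum schedule:

```python
import math
import numpy as np

A = np.array([[10.0, 0.0], [0.0, 1.0]])  # condition number 10
b = np.array([1.0, 1.0])
L = np.linalg.eigvalsh(A).max()

def grad(y): return A @ y - b

x_prev = np.zeros(2)
y = np.zeros(2)
t = 1.0
for _ in range(500):
    x = y - grad(y) / L                          # gradient step at the extrapolated point
    t_next = (1 + math.sqrt(1 + 4 * t * t)) / 2  # momentum schedule
    y = x + ((t - 1) / t_next) * (x - x_prev)    # extrapolation (the "momentum" step)
    x_prev, t = x, t_next

x_star = np.linalg.solve(A, b)
```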
19 Apr 8
Projected gradient descent and mirror descent (Part I)
The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent.
pdf | html
20 Apr 10
Projected gradient descent and mirror descent (Part II)
The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent.
pdf | html
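A minimal sketch of mirror descent with the negative-entropy distance-generating function on the probability simplex (i.e., exponentiated gradient); the objective and step size below are illustrative assumptions:

```python
import numpy as np

# f(x) = 0.5 * ||x - c||^2; since c lies in the simplex, the constrained minimizer is c
c = np.array([0.2, 0.5, 0.3])
x = np.ones(3) / 3                       # start at the uniform distribution
eta = 1.0                                # f is 1-smooth relative to entropy on the simplex

for _ in range(2000):
    g = x - c                            # gradient of f
    x = x * np.exp(-eta * g)             # multiplicative (mirror) step
    x = x / x.sum()                      # Bregman projection back onto the simplex
```

The multiplicative update plus normalization is exactly the proximal step for the entropy Bregman divergence, in contrast to the Euclidean projection used by PGD.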
21 Apr 15
HW5
Stochastic gradient descent and empirical risk minimization
Practical importance of stochastic gradient descent with momentum in training deep learning models; empirical risk minimization problems; minibatches; stochastic gradient descent lemmas; the effect of variance.
pdf | html
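A minimal sketch of minibatch SGD on an assumed least-squares empirical risk minimization problem (noiseless data, so the interpolating minimizer is exactly the generating weights):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, batch = 200, 5, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true                           # noiseless labels

w = np.zeros(d)
lr = 0.1
for epoch in range(300):
    idx = rng.permutation(n)             # reshuffle the data each epoch
    for start in range(0, n, batch):
        B = idx[start:start + batch]
        g = X[B].T @ (X[B] @ w - y[B]) / batch   # minibatch gradient estimate
        w -= lr * g
```

With noise, the variance of the minibatch estimate would prevent convergence at a fixed step size; here the gradient vanishes at w_true on every batch, so a constant step size suffices.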
22 Apr 17
Distributed optimization and ADMM [Optional]
The setting of distributed optimization; the alternating direction method of multipliers (ADMM) algorithm; convergence analysis of ADMM in the case of convex functions.
pdf | html
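A sketch of ADMM on an assumed lasso instance, splitting the smooth and nonsmooth terms via the constraint x = z (problem data and penalty parameter are illustrative choices):

```python
import numpy as np

# min_x 0.5*||A x - b||^2 + lam*||x||_1, split as
# min 0.5*||A x - b||^2 + lam*||z||_1  s.t.  x = z
rng = np.random.default_rng(1)
m, n = 30, 10
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
lam, rho = 0.1, 1.0

AtA, Atb = A.T @ A, A.T @ b
M = np.linalg.inv(AtA + rho * np.eye(n))  # cached inverse for the x-update

x = np.zeros(n)
z = np.zeros(n)
u = np.zeros(n)                           # scaled dual variable
for _ in range(1000):
    x = M @ (Atb + rho * (z - u))                                  # x-update (ridge solve)
    z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0)  # z-update (soft-threshold)
    u = u + x - z                                                  # dual update
```

At convergence the primal residual x − z vanishes, so the two blocks agree on a single solution.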
Part III: Preconditioning and higher-order information
23 Apr 22
Hessians, preconditioning, and Newton's method
Local quadratic approximation of objectives; Newton's method; local quadratic convergence rate; global convergence for certain classes of functions; practical considerations.
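A minimal sketch of Newton's method on an assumed toy objective f(x) = eˣ + e⁻ˣ, whose unique minimizer is x* = 0; the update is x ← x − f′(x)/f″(x):

```python
import math

def grad(x): return math.exp(x) - math.exp(-x)   # f'(x)
def hess(x): return math.exp(x) + math.exp(-x)   # f''(x) > 0 everywhere

x = 1.0
for _ in range(6):
    x -= grad(x) / hess(x)                        # Newton step; here equals x - tanh(x)
```

The error roughly squares at each step once the iterate is close (here it in fact cubes, by the symmetry of f), which is the local quadratic convergence named in the description.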
24 Apr 24
Adaptive preconditioning: AdaGrad and ADAM
Adaptive construction of diagonal preconditioning matrices; the AdaGrad algorithm; the convergence rate of AdaGrad; proof sketch; AdaGrad with momentum: the popular ADAM algorithm.
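A minimal sketch of diagonal AdaGrad on an assumed separable quadratic with very different per-coordinate curvatures (the objective, step size, and iteration count are illustrative choices):

```python
import numpy as np

# f(x) = 0.5 * sum(a_i * x_i^2): one stiff coordinate, one flat one
a = np.array([10.0, 0.1])
x = np.array([5.0, -3.0])
G = np.zeros(2)                          # running sum of squared gradients
eta, eps = 1.0, 1e-8

for _ in range(5000):
    g = a * x                            # gradient of f
    G += g * g
    x -= eta * g / (np.sqrt(G) + eps)    # per-coordinate adaptive step size
```

A single global step size would have to respect the stiff coordinate and crawl on the flat one; AdaGrad's diagonal preconditioner scales each coordinate separately. Adding momentum to this scheme yields the Adam-style updates named in the description.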
25 Apr 29
HW6
Self-concordant functions (Part I)
Limitations of the classic analysis of Newton's method; desiderata and self-concordant functions; properties and examples.
26 May 1
Self-concordant functions (Part II)
Limitations of the classic analysis of Newton's method; desiderata and self-concordant functions; properties and examples.
27 May 6
Central path and interior-point methods (Part I)
Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch.
28 May 8
Central path and interior-point methods (Part II)
Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch.
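An illustrative sketch of following the central path with a log barrier on the assumed one-dimensional problem min x subject to 0 ≤ x ≤ 1 (the geometric growth factor for t is a practical choice, larger than the short-step theory's schedule):

```python
# barrier: phi(x) = -log(x) - log(1 - x); central path point x(t) minimizes t*x + phi(x)
def newton_min(t, x):
    """Minimize t*x + phi(x) by damped Newton, staying strictly inside (0, 1)."""
    for _ in range(50):
        g = t - 1.0 / x + 1.0 / (1.0 - x)        # first derivative
        h = 1.0 / x**2 + 1.0 / (1.0 - x)**2      # second derivative (barrier Hessian)
        step = g / h
        while not (0.0 < x - step < 1.0):        # damp the step to preserve feasibility
            step *= 0.5
        x -= step
    return x

x, t = 0.5, 1.0
for _ in range(40):                              # grow t geometrically along the path
    t *= 1.5
    x = newton_min(t, x)                         # warm-start Newton at the previous point
# x(t) ~ 1/t, so the iterates approach the optimum x* = 0 from the interior
```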
Final exam
29 May 13
Final exam review
30 May 16-21
In-class final