Nonlinear Optimization
MIT 6.7220 / 15.084 — Spring 2025
Course Info
- Instructor: Gabriele Farina (✉️ gfarina@)
- Time: Tuesdays & Thursdays, 11:00am - 12:30pm
- Location: 34-101
- TAs:
  - Kaveh Alimohammadi (✉️ mrz@)
  - Zhiyuan Fan (✉️ fanzy@)
  - Sobhan Mohammadpour (✉️ somo@)
  - Başak Özaydın (✉️ bozaydin@)
- Office hours: see Syllabus
- This course will use Canvas
Schedule (subject to change)
| # | Date | Topic | Lecture notes |
|---|------|-------|---------------|
| 1 | Feb 4 | Introduction. Overview of the course material; goals and applications; general form of an optimization problem; Weierstrass's theorem; computational considerations. | pdf \| html |
| | | Part I: Nonlinear optimization theory | |
| 2 | Feb 6 (HW1) | First-order optimality conditions. First-order necessary optimality conditions; the unconstrained case; halfspace constraints. | pdf \| html |
| 3 | Feb 11 | More on normal cones, and a first taste of duality. The connection between normal cones and Lagrange multipliers; linear programming duality as a straightforward consequence of normal cones. | pdf \| html |
| 4 | Feb 13 | Convex functions and sufficiency of normal cones. Definition of convexity for functions; local optimality implies global optimality; sufficiency of first-order optimality conditions; how to recognize a convex function. | pdf \| html |
| 5 | Feb 18 | No class | |
| 6 | Feb 20 | The driver of duality: Separation. Feasibility as minimization of distance from the feasible set; how separation implies the normal cone to the intersection of halfspaces. | pdf \| html |
| 7 | Feb 25 (HW2) | Separation as a proof system. Primer on computational complexity; the class NP ∩ coNP and its significance; certificates of infeasibility and Farkas' lemma. | pdf \| html |
| 8 | Feb 27 | Functional constraints, Lagrange multipliers, and KKT conditions. Optimization problems with functional constraints; Lagrangian function and Lagrange multipliers; constraint qualifications (linear independence of constraint gradients, Slater's condition). | pdf \| html |
| 9 | Mar 4 | Lagrangian duality. Lagrangian function; connections with KKT conditions; dual problem; examples. | pdf \| html |
| 10 | Mar 6 | Conic constraints. Conic programs and notable cases: linear programs, second-order cone programs, semidefinite programs; selected applications. | pdf \| html |
| 11 | Mar 11 (HW3) | Polynomial optimization. Polynomial optimization problems; connection between polynomial optimization and semidefinite programming; the sum-of-squares decomposition. | pdf \| html |
| 12 | Mar 13 | Polarity and oracle equivalence. Duality between optimization and separation; the polar and the bipolar. | pdf \| html |
| | | Midterm & Spring break | |
| 13 | Mar 18 | Midterm review | |
| 14 | Mar 20 | In-class midterm | |
| 15 | Mar 25 | Spring break | |
| 16 | Mar 27 | Spring break | |
| | | Part II: Algorithms based on first-order information | |
| 17 | Apr 1 | Gradient descent and descent lemmas. Gradient descent algorithm; smoothness and the gradient descent lemma; computation of a point with small gradient in general (non-convex) functions; descent in function value for convex functions: the Euclidean mirror descent lemma. | pdf \| html |
| 18 | Apr 3 (HW4) | Acceleration and momentum. The idea of momentum; Nesterov's accelerated gradient descent; Allen-Zhu and Orecchia's accelerated gradient descent; practical considerations. | pdf \| html |
| 19 | Apr 8 | Projected gradient descent and mirror descent (Part I). The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent. | pdf \| html |
| 20 | Apr 10 | Projected gradient descent and mirror descent (Part II). The projected gradient descent (PGD) algorithm; distance-generating functions and Bregman divergences; proximal steps and their properties; the mirror descent algorithm; descent lemmas for mirror descent. | pdf \| html |
| 21 | Apr 15 (HW5) | Stochastic gradient descent and empirical risk minimization. Practical importance of stochastic gradient descent with momentum in training deep learning models; empirical risk minimization problems; minibatches; stochastic gradient descent lemmas; the effect of variance. | pdf \| html |
| 22 | Apr 17 | Distributed optimization and ADMM [Optional]. The setting of distributed optimization; the alternating direction method of multipliers (ADMM) algorithm; convergence analysis of ADMM in the case of convex functions. | pdf \| html |
| | | Part III: Preconditioning and higher-order information | |
| 23 | Apr 22 | Hessians, preconditioning, and Newton's method. Local quadratic approximation of objectives; Newton's method; local quadratic convergence rate; global convergence for certain classes of functions; practical considerations. | |
| 24 | Apr 24 | Adaptive preconditioning: AdaGrad and ADAM. Adaptive construction of diagonal preconditioning matrices; the AdaGrad algorithm; the convergence rate of AdaGrad; proof sketch; AdaGrad with momentum: the popular ADAM algorithm. | |
| 25 | Apr 29 (HW6) | Self-concordant functions (Part I). Limitations of the classic analysis of Newton's method; desiderata and self-concordant functions; properties and examples. | |
| 26 | May 1 | Self-concordant functions (Part II). Limitations of the classic analysis of Newton's method; desiderata and self-concordant functions; properties and examples. | |
| 27 | May 6 | Central path and interior-point methods (Part I). Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch. | |
| 28 | May 8 | Central path and interior-point methods (Part II). Central path; barriers and their complexity parameter; the short-step barrier method; proof sketch. | |
| | | Final exam | |
| 29 | May 13 | Final exam review | |
| 30 | May 16-21 | In-class final | |