Papers, Reports, Slides, and Videolectures by Dimitri Bertsekas

Acceptance speech for the Kachiyan Prize, 2014. Acceptance speech for the Bellman Heritage award , 2022

Reviews of the two books honored with the 2018 INFORMS J. von Neumann Theory Prize: Parallel and Distributed Computation, 1989, review by Anna Nagurney, and Neuro-Dynamic Programming, 1996, review by George Cybenko

LECTURE SLIDES AND VIDEO LECTURES

REINFORCEMENT LEARNING COURSE AT ASU, 2019-2023

Textbook, videolectures, slides, and other related material.

Video on Model Predictive Control, and Reinforcement Learning: A Unified Framework Based on Dynamic Programming, Plenary talk at IFAC NMPC, August 2024; Lecture slides.

Video on Computer Chess with Model Predictive Control and Reinforcement Learning, at Reinforcement Learning Workshop, Indian Inst. of Science, Jan. 2025. Slides

Video of an Overview Lecture on Distributed RL from IPAM workshop at UCLA, Feb. 2020 (Slides).

Video of an Overview Lecture on Multiagent RL at ASU, Oct. 2020 (Slides).

Video lectures on Exact and Approximate Finite Horizon DP: Videos from a 4-lecture, 4-hour short course at the University of Cyprus on finite horizon DP, Nicosia, 2017. Videos from Youtube. Based on Chapters 1 and 6 of the book Dynamic Programming and Optimal Control, Vol. I, 4th Edition, Athena Scientific.

Video lectures on Exact and Approximate Infinite Horizon DP: Videos from a 6-lecture, 12-hour short course at Tsinghua Univ. on approximate DP, Beijing, China, 2014. From the Tsinghua course site, and from Youtube. (Complete Set of Lecture Slides.) Based on the book Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming, Athena Scientific.

Abstract Dynamic Programming, a lecture slide overview of the book Abstract Dynamic Programming, Athena Scientific, 2013; (Additional related lecture slides on regular policies in Abstract DP); (Related Video Lectures on semicontractive DP, the solutions of Bellman's equation, and applications to stable optimal control); click here for a copy of the book.

Dynamic Programming and Stochastic Control, 2015, Lecture Slides for MIT course 6.231, Fall 2015. Based on the 2-Vol. book Dynamic Programming and Optimal Control, Athena Scientific. MIT OpenCourseware site.

Video from a January 2017 slide presentation on the relation of Proximal Algorithms and Temporal Difference Methods, for solving large linear systems of equations arising in Dynamic Programming among others. The slides are hard to read at times in the video, so you may wish to download the PDF version of the slides. See also Related slides with a more numerical deterministic nonDP point of view from NIPS 2017 Click here for a Related Report. A version appears in Computational Optimization and Applications J, Vol. 70, 2018, pp. 709-736. Based on the books Convex Optimization Algorithms, Nonlinear Programming, 3rd Edition. MIT OpenCourseware site. and Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming, Athena Scientific.

Video from A 60-Year Journey in Convex Optimization, a lecture on the history and the evolution of the subject at MIT. Includes a review of the MCMC abstract duality framework. Click here for the slides from the lecture. Based in part on the paper Min Common-Max Crossing Duality: A Geometric View of Conjugacy in Convex Optimization

Nonlinear Programming, Lecture Slides for MIT course 6.252, 2005.

Convex Analysis and Optimization, 2014 Lecture Slides for MIT course 6.253, Spring 2014. Based on the book "Convex Optimization Theory," Athena Scientific, 2009, and the book "Convex Optimization Algorithms," Athena Scientific, 2014.

Convex Analysis and Optimization, 2003, Lecture Slides for MIT course 6.253, Fall 2003. Related VideoLecture, Feb. 2003.

Enhanced Fritz John Conditions and Pseudonormality, a lecture slide overview of a major part of the book "Convex Analysis and Optimization," Athena Scientific, 2003.

Video of A 60-Year Journey in Convex Optimization, a lecture on the history and the evolution of the subject. Slides

Video of opening remarks at the event honoring John Tsitsiklis; career at MIT, Oct. 7, 2023.

Ten Simple Rules for Mathematical Writing, Tutorial lecture on writing engineering/mathematics papers and books.

SURVEY AND EXPOSITORY ARTICLES

Dynamic and Neuro-Dynamic Programming - Reinforcement Learning

Bertsekas, D., "Multiagent Reinforcement Learning: Rollout and Policy Iteration," IEEE/CAA Journal of Automatica Sinica, Vol. 8, 2021, pp. 249-271. Video of an overview lecture.

D. P. Bertsekas, "Approximate Policy Iteration: A Survey and Some New Methods," Journal of Control Theory and Applications, Vol. 9, 2011, pp. 310-335. A survey of policy iteration methods for approximate Dynamic Programming, with a particular emphasis on two approximate policy evaluation methods, projection/temporal difference and aggregation, as well as the pathologies of policy improvement.

D. P. Bertsekas, "Dynamic Programming and Suboptimal Control: A Survey from ADP to MPC," in Fundamental Issues in Control, European J. of Control, Vol. 11, Nos. 4-5, 2005; From 2005 CDC, Seville, Spain. A selective survey of approximate dynamic programming (ADP), with a particular emphasis on two directions of research: rollout algorithms and model predictive control (MPC), and their connection.

D. P. Bertsekas, "Rollout Algorithms for Discrete Optimization: A Survey," Handbook of Combinatorial Optimization, Springer, 2013. A 19-page expository article providing a summary overview of the subject.

D. P. Bertsekas, "Lambda-Policy Iteration: A Review and a New Implementation," Lab. for Information and Decision Systems Report LIDS-P-2874, MIT, October 2011. Appears in "Reinforcement Learning and Approximate Dynamic Programming for Feedback Control," by F. Lewis and D. Liu (eds.), IEEE Press Computational Intelligence Series, 2012. A review of lambda-policy iteration, a method for exact and approximate dynamic programming, and the theoretical foundation of the LSPE(lambda) method. We discuss various implementations of the method, including one that is new and introduces a natural form of exploration enhancement in LSTD(lambda), LSPE(lambda), and TD(lambda). (Video of a related lecture from ADPRL 2014.) (Lecture Slides from ADPRL 2014.)

D. P. Bertsekas, and S. E. Shreve, "Mathematical Issues in Dynamic Programming," an unpublished expository paper that provides orientation on the central mathematical issues for a comprehensive and rigorous theory of dynamic programming and stochastic control, as given in the authors' book "Stochastic Optimal Control: The Discrete-Time Case," Bertsekas and Shreve, Academic Press, 1978 (republished by Athena Scientific, 1996). For an extended version see the Appendix of the book "Dynamic Programming and Optimal Control, Vol. II, 4th Edition," (by D. Bertsekas, Athena Scientific, 2012).

D. P. Bertsekas, "Neuro-Dynamic Programming," Encyclopedia of Optimization, Kluwer, 2001. A 9-page expository article providing orientation, references, and a summary overview of the subject. You may also find helpful the following introductory slide presentation: "Neuro-Dynamic Programming: An Overview."

Optimization and Distributed Computation

D. P. Bertsekas, "Auction Algorithms for Network Flow Problems: A Tutorial Introduction," Computational Optimization and Applications, Vol. 1, pp. 7-66, 1992. An extensive tutorial paper that surveys auction algorithms, a comprehensive class of algorithms for solving the classical linear network flow problem and its various special cases such as shortest path, max-flow, assignment, transportation, and transhipment problems. An account of this material may also be found in the internet-accessible book "Linear Network Optimization," (D. Bertsekas, 1991).

D. P. Bertsekas, "Auction Algorithms," Encyclopedia of Optimization, Kluwer, 2001. An 8-page expository article providing orientation, references, and a summary overview of the subject, as given in the author's book "Network Optimization: Continuous and Discrete Models," Athena Scientific, 1998; the book is also internet-accessible.

D. P. Bertsekas, and J. N. Tsitsiklis, "Some Aspects of Parallel and Distributed Iterative Algorithms - A Survey," Automatica, Vol. 27, 1991, pp. 3-21. A survey of some topics from the 1989 "Parallel and Distributed Computation" book by the authors. It includes some new results on asynchronous iterative algorithms. Also click here for a followup paper on termination of asynchronous iterative algorithms.

Convex Optimization

D. P. Bertsekas, "Min Common/Max Crossing Duality: A Geometric View of Conjugacy in Convex Optimization," Lab. for Information and Decision Systems Report LIDS-P-2796, MIT, August 2008; revised Jan. 2009. A long expository paper on the geometric foundations of duality and convex optimization, as more extensively discussed in the book "Convex Optimization Theory," (Athena Scientific, 2009); the book is also internet-accessible. Related video from A 60-Year Journey in Convex Optimization, a lecture on the history and the evolution of the subject.Slides .

D. P. Bertsekas, "Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey," Lab. for Information and Decision Systems Report LIDS-P-2848, MIT, August 2010; this is an extended version of a paper in the edited volume Optimization for Machine Learning, by S. Sra, S. Nowozin, and S. J. Wright, MIT Press, Cambridge, MA, 2012, pp. 85-119. A survey of incremental methods for minimizing a sum $\sum_{i=1}^mf_i(x)$, and their applications in inference/machine learning, signal processing, and large-scale and distributed optimization. (Related Video Lecture).

OTHER ARTICLES AND ESSAYS

"The Language of Mathematics," Essay based on an interview by Genevieve Wanucha

D. P. Bertsekas, "A BASIC Compiler in BASIC," 80 MicroComputing Magazine, Oct. 1982. An assembly language-based compiler to compile BASIC programs for the TRS-80 microcomputer; check the full article .

D. P. Bertsekas, Ten Simple Rules for Mathematical Writing, Tutorial lecture on writing engineering/mathematics papers and books; podcast.

D. P. Bertsekas, "Academia, Art, and Life," 2025.

RESEARCH PAPERS

Most Recent Papers (2009 - )

Selected papers on Dynamic and Neuro-Dynamic Programming - Reinforcement Learning

Selected papers on Nonlinear Programming and Optimization Applications

Selected papers on Network Optimization

Selected papers on Parallel and Distributed Algorithms

Selected papers on Set-Membership Estimation and Control

Top Menu

Most Recent Papers (2009 - )

Y. Li, F. Chen, Y. Li, C. Fan, D. Bertsekas, "On-Line Policy Iteration with Trajectory-Driven Policy Generation," arXiv:2604.15004, April, 2026

T. McAuliffe, S. Liew, Y. Li, A. Ushenin, C. Wang, A. Tasos, J. Pearce, D. Tasoulis, D. P. Bertsekas, T. Tsagaris, "Model Predictive Control for Trade Execution," arXiv:2603.28898, March, 2026.

Hammar, K., Li, Y., Alpcan, T., Lupu, E. C., and Bertsekas, D., "Adaptive Network Security Policies via Belief Aggregation and Rollout," arXiv:2507.15163, July, 2025.

Li, Y., Hammar, K., and Bertsekas, D., "Feature-Based Belief Aggregation for Partially Observable Markov Decision Problems," arXiv:2507.04646, July, 2025.

Li, Y., and Bertsekas, D., "An Error Bound for Aggregation in Approximate Dynamic Programming," arXiv:2507.01324, July, 2025.

Li, Y., and Bertsekas, D., "Semilinear Dynamic Programming: Analysis, Algorithms, and Certainty Equivalence Properties," arXiv:2501.04668, January, 2025.

Gundawar, A., Li, Y., and Bertsekas, D., "Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout," arXiv:2409.06477, Sept. 2024. Videolecture at Reinforcement Learning Workshop, Indian Inst. of Science, Jan. 2025. Slides

Musunuru, P., Li, Y., Weber, J., and Bertsekas, D., "An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking," arXiv:2405.15137, May 2024.

Li, Y., and Bertsekas, D., "Most Likely Sequence Generation for n-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms," arXiv:2403.15465, March, 2024.

Bertsekas, D., "New Auction Algorithms for the Assignment Problem and Extensions," Results in Control and Optimization, Vol. 14, 2024; a version appears as ArXiv Preprint arXiv:2310.03159.

Garces, D., Bhattacharya, S., Bertsekas, D., and Gil, G., "Approximate Multiagent Reinforcement Learning for On-Demand Urban Mobility Problem on a Large Map", ICRA, 2024.

Agrawal, G., Bertsekas, D., Liu, H., "Auction-Based Learning for Question Answering over Knowledge Graphs," Information 2023, 14, 336. https://doi.org/10.3390/ info14060336.

D. P. Bertsekas, "Auction Algorithms for Path Planning, Network Transport, and Reinforcement Learning," Arizona State University/SCAI Report; arXiv:22207.09588, 2022.

Weber, J., Giriyan, D., Parkar, D., Richa, A., Bertsekas, D., "Distributed Online Rollout for Multivehicle Routing in Unmapped Environments," May 2023, arXiv:2305.11596v1.

Bertsekas, D., "Rollout Algorithms and Approximate Dynamic Programming for Bayesian Optimization and Sequential Estimation," Dec. 2022, arXiv:2212.07998.

Bhambri, S., Bhattacharjee, A., Bertsekas, D., "Playing Wordle Using an Online Rollout Algorithm for Deterministic POMDP," In Proc. of 2023 IEEE Conference on Games, Boston, MA.

Bhambri, S., Bhattacharjee, A., Bertsekas, D., "Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach," Arizona State University/SCAI Report, Nov. 2022; arXiv:2211.10298.

D. P. Bertsekas, "Newton's Method for Reinforcement Learning and Model Predictive Control," Results in Control and Optimization, Vol. 7, 2022, pp. 100-121.

Bhattacharya, S., Kailas, S., Badyal, S., Gil, S., Bertsekas, D.,"Multiagent Reinforcement Learning: Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems," June 2022; IEEE Transactions on Robotics.

D. P. Bertsekas, "On-Line Policy Iteration for Infinite Horizon Dynamic Programming," arXiv preprint arXiv:2106.00746, May 2021 (incorporated into Chapter 3 of the book Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control, Athena Scientific, 2022).

D. P. Bertsekas, "Multiagent Reinforcement Learning: Rollout and Policy Iteration," IEEE/CAA Journal of Automatica Sinica, Vol. 8, 2021, pp. 249-271; Video of an overview lecture.

D. P. Bertsekas, "Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm," arXiv preprint, arXiv:2002.07407 February 2020 (incorporated into Chapter 3 of the book Rollout, Policy Iteration, and Distributed Reinforcement Learning, Athena Scientific, 2020).

Li, Y., Johansson, K. H, Martensson, J., and D. P. Bertsekas, "Data-Driven Rollout for Deterministic Optimal Control ," Proc.\ of 2021 CDC; also arXiv preprint arXiv:2105.03116, Sept. 2021.

D. P. Bertsekas, "Stable Optimal Control and Semicontractive Dynamic Programming," SIAM J. on Control and Optimization, Vol. 56, 2018, pp. 231-252, (Related Lecture Slides), (Related Video Lecture from MIT, May 2017). (Related Lecture Slides from UConn, Oct. 2017). (Related Video Lecture from UConn, Oct. 2017).

D. P. Bertsekas, "Affine Monotonic and Risk-Sensitive Models in Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-3204, MIT, June 2016 (revised Nov. 2017); arXiv preprint arXiv:1608.01393; IEEE Transactions on Aut. Control, Vol. 64, 2019, pp. 3117-3128.

D. P. Bertsekas, "Incremental Aggregated Proximal and Augmented Lagrangian Algorithms," Lab. for Information and Decision Systems Report LIDS-P-3176, MIT, July 2015; revised September 2015; arXiv preprint arXiv:1507.1365936. Incorporated into the author's book "Nonlinear Programming," 3rd edition, Athena Scientific, 2016. (Related Lecture Slides). (Related Video Lecture) .

D. P. Bertsekas and H. Yu, "Stochastic Shortest Path Problems Under Weak Conditions," Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, Jan. 2016 (generalized and incorporated into the book Abstract Dynamic Programming, 3rd edition, Athena Scientific, 2022).

M. Wang and D. P. Bertsekas, "Incremental Constraint Projection-Proximal Methods for Nonsmooth Convex Optimization ," Lab. for Information and Decision Systems Report LIDS-P-2907, MIT, July 2013. (Related Lecture Slides) (Related Video Lecture) .

M. Wang and D. P. Bertsekas, "Stochastic First-Order Methods with Random Constraint Projection," SIAM Journal on Optimization, Vol. 26, 2016, pp. 681-717. (Related Lecture Slides) (Related Video Lecture) .

D. P. Bertsekas, "Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications," Lab. for Information and Decision Systems Report LIDS-P-2884, MIT, May 2012.

M. Wang and D. P. Bertsekas, "Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems," Lab. for Information and Decision Systems Report LIDS-P-2878, MIT, December 2011 (revised March 2012); Mathematics of Operations Research, Vol. 39, pp. 1-30, 2013 related slide presentation, related poster presentation.

M. Wang and D. P. Bertsekas, "Convergence of Iterative Simulation-Based Methods for Singular Linear Systems," Lab. for Information and Decision Systems Report LIDS-P-2879, MIT, December 2011 (revised April 2012); Stochastic Systems, Vol. 3, pp 39-96, 2013. related slide presentation, related poster presentation.

D. P. Bertsekas, "Lambda-Policy Iteration: A Review and a New Implementation," Lab. for Information and Decision Systems Report LIDS-P-2874, MIT, October 2011. In "Reinforcement Learning and Approximate Dynamic Programming for Feedback Control," by F. Lewis and D. Liu (eds.), IEEE Press Computational Intelligence Series.(Video of a related lecture from ADPRL 2014.)(Lecture Slides from ADPRL 2014.)

H. Yu and D. P. Bertsekas, "Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems," Lab. for Information and Decision Systems Report LIDS-P-2871, MIT, September 2011; revised March 2012; Annals of Operations Research Vol. 208, 2013, pp. 95-132.

H. Yu and D. P. Bertsekas, "On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems," Lab. for Information and Decision Systems Report LIDS-P-2859, MIT, March 2011; revised Sept. 2011; Mathematics of Operations Research 38(2), pp. 209-227, 2013.

D. P. Bertsekas, "Temporal Difference Methods for General Projected Equations," IEEE Trans. on Automatic Control, Vol. 56, pp. 2128 - 2139, 2011. (Related Lecture Slides).

D. P. Bertsekas, "Centralized and Distributed Newton Methods for Network Optimization and Extensions," Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011.

D. P. Bertsekas and H. Yu, "Distributed Asynchronous Policy Iteration in Dynamic Programming," Proc. of 2010 Allerton Conference on Communication, Control, and Computing, Allerton Park, ILL, Sept. 2010. (Related Lecture Slides) (An extended version with additional algorithmic analysis) (A counterexample by Williams and Baird that motivates in part this paper).

D. P. Bertsekas, "Incremental Proximal Methods for Large Scale Convex Optimization," Mathematical Programming, Vol. 129, 2011, pp.163-195. An extended survey version: "Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey," Lab. for Information and Decision Systems Report LIDS-P-2848, MIT, August 2010. (Related Lecture Slides). (Related Video Lecture).

D. P. Bertsekas, "Pathologies of Temporal Difference Methods in Approximate Dynamic Programming," Proc. 2010 IEEE Conference on Decision and Control, Atlanta, GA, Dec. 2010. (Related Lecture Slides).

D. P. Bertsekas and H. Yu, "Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-2831, MIT, April, 2010 (revised November 2011); Math. of Operations Research, Vol. 37, 2012, pp. 66-94; a shorter version appears in Proc. of 2010 IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010. (Related Lecture Slides) (A counterexample by Williams and Baird that motivates in part this paper)

D. P. Bertsekas, "Approximate Policy Iteration: A Survey and Some New Methods," Journal of Control Theory and Applications, Vol. 9, 2011, pp. 310-335.

D. P. Bertsekas and H. Yu, "A Unifying Polyhedral Approximation Framework for Convex Optimization," Lab. for Information and Decision Systems Report LIDS-P-2820, MIT, September 2009 (revised December 2010), published in SIAM J. on Optimization, Vol. 21, 2011, pp. 333-360. (Related VideoLecture, Dec. 2008) (Related Lecture Slides).

H. Yu and D. P. Bertsekas, "Error Bounds for Approximations from Projected Linear Equations," Mathematics of Operations Research, Vol. 35, 2010, pp. 306-329. -- A shorter/abridged version appeared at European Workshop on Reinforcement Learning (EWRL'08), 2008, Lille, France. (Related Lecture Slides).

A. Nedich and D. P. Bertsekas, "The Effect of Deterministic Noise in Subgradient Methods," Math. Programming, Ser. A, Vol. 125, pp. 75-99, 2010.

D. P. Bertsekas and H. Yu, "Projected Equation Methods for Approximate Solution of Large Linear Systems," Journal of Computational and Applied Mathematics, Vol. 227, 2009, pp. 27-50.

Lecture Slides, Survey Papers, Research papers, Top Menu

RESEARCH PAPERS AND DETAILED ABSTRACTS (1970 - )

Dynamic and Neuro-Dynamic Programming - Reinforcement Learning

Hammar, K., Li, Y., Alpcan, T., Lupu, E. C., and Bertsekas, D., "Adaptive Network Security Policies via Belief Aggregation and Rollout," arXiv:2507.15163, July, 2025. Evolving security vulnerabilities and shifting operational conditions require frequent updates to network security policies. These updates include adjustments to incident response procedures and modifications to access controls, among others. Reinforcement learning methods have been proposed for automating such policy adaptations, but most of the methods in the research literature lack performance guarantees and adapt slowly to changes. In this paper, we address these limitations and present a method for computing security policies that is scalable, offers theoretical guarantees, and adapts quickly to changes. It assumes a model or simulator of the system and comprises three components: belief estimation through particle filtering, offline policy computation through aggregation, and online policy adaptation through rollout. Central to our method is a new feature-based aggregation technique, which improves scalability and flexibility. We analyze the approximation error of aggregation and show that rollout efficiently adapts policies to changes under certain conditions. Simulations and testbed results demonstrate that our method outperforms state-of-the-art methods on several benchmarks, including CAGE-2.

Li, Y., Hammar, K., and Bertsekas, D., "Feature-Based Belief Aggregation for Partially Observable Markov Decision Problems," arXiv:2507.04646, July, 2025. We consider a finite-state partially observable Markov decision problem (POMDP) with an infinite horizon and a discounted cost, and we propose a new method for computing a cost function approximation that is based on features and aggregation. In particular, using the classical belief-space formulation, we construct a related Markov decision problem (MDP) by first aggregating the unobservable states into feature states, and then introducing representative beliefs over these feature states. This two-stage aggregation approach facilitates the use of dynamic programming methods for solving the aggregate problem and provides additional design flexibility. The optimal cost function of the aggregate problem can in turn be used within an on-line approximation in value space scheme for the original POMDP. We derive a new bound on the approximation error of our scheme. In addition, we establish conditions under which the cost function approximation provides a lower bound for the optimal cost. Finally, we present a biased aggregation approach, which leverages an optimal cost function estimate to improve the quality of the approximation error of the aggregate problem.

Li, Y., and Bertsekas, D., "An Error Bound for Aggregation in Approximate Dynamic Programming," arXiv:2507.01324, July, 2025. We consider a general aggregation framework for discounted finite-state infinite horizon dynamic programming (DP) problems. It defines an aggregate problem whose optimal cost function can be obtained offline by exact DP and then used as a terminal cost approximation for an on-line reinforcement learning (RL) scheme. We derive a bound on the error between the optimal cost functions of the aggregate problem and the original problem. This bound was first derived by Tsitsiklis and van Roy [TvR96] for the special case of hard aggregation. Our bound is similar but applies far more broadly, including to soft aggregation and feature-based aggregation schemes.

Li, Y., and Bertsekas, D., "Semilinear Dynamic Programming: Analysis, Algorithms, and Certainty Equivalence Properties," arXiv:2501.04668, January, 2025. We consider a broad class of dynamic programming (DP) problems that involve a partially linear structure and some positivity properties in their system equation and cost function. We address deterministic and stochastic problems, possibly with Markov jump parameters. We focus primarily on infinite horizon problems and prove that under our assumptions, the optimal cost function is linear, and that an optimal policy can be computed efficiently with standard DP algorithms. Moreover, we show that forms of certainty equivalence hold for our stochastic problems, in analogy with the classical linear quadratic optimal control problems.

Gundawar, A., Li, Y., and Bertsekas, D., "Superior Computer Chess with Model Predictive Control, Reinforcement Learning, and Rollout," arXiv:2409.06477, Sept. 2024. In this paper we apply model predictive control (MPC), rollout, and reinforcement learning (RL) methodologies to computer chess. We intro- duce a new architecture for move selection, within which available chess engines are used as components. One engine is used to provide position evaluations in an approximation in value space MPC/RL scheme, while a second engine is used as nominal opponent, to emulate or approximate the moves of the true opponent player. We show that our architecture improves substantially the performance of the position evaluation engine. In other words our architecture provides an additional layer of intelligence, on top of the intelligence of the engines on which it is based. This is true for any engine, regardless of its strength: top engines such as Stockfish and Komodo Dragon (of varying strengths), as well as weaker engines. Structurally, our basic architecture selects moves by a one-move lookahead search, with an intermediate move generated by a nominal opponent engine, and followed by a position evaluation by another chess engine. Simpler schemes that forego the use of the nominal opponent, also per- form better than the position evaluator, but not quite by as much. More complex schemes, involving multistep lookahead, may also be used and generally tend to perform better as the length of the lookahead increases. Theoretically, our methodology relies on generic cost improvement properties and the superlinear convergence framework of Newton?s method, which fundamentally underlies approximation in value space, and related MPC/RL and rollout/policy iteration schemes. A critical requirement of this framework is that the first lookahead step should be executed exactly. This fact has guided our architectural choices, and is apparently an important factor in improving the performance of even the best available chess engines.

Bertsekas, D., "Model Predictive Control, and Reinforcement Learning: A Unified Framework Based on Dynamic Programming," Published as an IFAC NMPC Preprint, August 2024; Slide presentation. Video of the lecture at IFAC NMPC . In this paper we describe a new conceptual framework that connects approximate Dynamic Programming (DP), Model Predictive Control (MPC), and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call them the {\it off-line training} and the {\it on-line play} algorithms. The names are borrowed from some of the major successes of RL involving games; primary examples are the recent (2017) AlphaZero program (which plays chess, [SHS17], [SSS17]), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon, [Tes94], [Tes95], [TeG96]). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Significantly, the synergy between off-line training and on-line play also underlies MPC (as well as other major classes of sequential decision problems), and indeed the MPC design architecture is very similar to the one of AlphaZero and TD-Gammon. This conceptual insight provides a vehicle for bridging the cultural gap between RL and MPC, and sheds new light on some fundamental issues in MPC. These include the enhancement of stability properties through rollout, the treatment of uncertainty through the use of certainty equivalence, the resilience of MPC in adaptive control settings that involve changing system parameters, and the insights provided by the superlinear performance bounds implied by Newton's method.

Musunuru, P., Li, Y., Weber, J., and Bertsekas, D., "An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking," arXiv:2405.15137, May 2024. In this work, we consider data association problems involving multi-object tracking (MOT). In particular, we address the challenges arising from object occlusions. We propose a framework called approximate dynamic programming track (ADPTrack), which applies dynamic programming principles to improve an existing method called the base heuristic. Given a set of tracks and the next target frame, the base heuristic extends the tracks by matching them to the objects of this target frame directly. In contrast, ADPTrack first processes a few subsequent frames and applies the base heuristic starting from the next target frame to obtain tentative tracks. It then leverages the tentative tracks to match the objects of the target frame. This tends to reduce the occlusion-based errors and leads to an improvement over the base heuristic. When tested on the MOT17 video dataset, the proposed method demonstrates a 0.7\% improvement in the association accuracy (IDF1 metric) over a state-of-the-art method that is used as the base heuristic. It also obtains improvements with respect to all the other standard metrics. Empirically, we found that the improvements are particularly pronounced in scenarios where the video data is obtained by fixed-position cameras.

Li, Y., and Bertsekas, D., "Most Likely Sequence Generation for n-Grams, Transformers, HMMs, and Markov Chains, by Using Rollout Algorithms," arXiv:2403.15465, March, 2024. In this paper we consider a transformer with an $n$-gram structure, such as the one underlying ChatGPT. The transformer provides next word probabilities, which can be used to generate word sequences. We consider methods for computing word sequences that are highly likely, based on these probabilities. Computing the optimal (i.e., most likely) word sequence starting with a given initial state is an intractable problem, so we propose methods to compute highly likely $N$-word sequences in time that is a low order polynomial in $N$ and in the vocabulary size of the $n$-gram. These methods are based on the rollout approach from approximate dynamic programming, a form of single policy iteration, which can improve the performance of any given heuristic policy. In our case we use a greedy heuristic that generates as next word one that has the highest probability. We show with analysis, examples, and computational experimentation that our methods are capable of generating highly likely sequences with a modest increase in computation over the greedy heuristic (roughly $N$ times larger). While our analysis and experiments are focused on transformer and ChatGPT-like models, our methods apply to general finite-state Markov chains, and related inference applications of Hidden Markov Models (HMM).

Bertsekas, D., "Rollout Algorithms and Approximate Dynamic Programming for Bayesian Optimization and Sequential Estimation," Dec. 2022, arXiv:2212.07998. We provide a unifying approximate dynamic programming framework that applies to a broad variety of problems involving sequential estimation. We consider first the construction of surrogate cost functions for the purposes of optimization, and we focus on the special case of Bayesian optimization, using the rollout algorithm and some of its variations. We then discuss the more general case of sequential estimation of a random vector using optimal measurement selection, and its application to problems of stochastic and adaptive control. We finally consider related search and sequential decoding problems, and a rollout algorithm for the approximate solution of the Wordle and Mastermind puzzles, recently developed in the paper [BBB22].

Bhambri, S., Bhattacharjee, A., Bertsekas, D., "Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach," Arizona State University/SCAI Report, Nov. 2022; arXiv:2211.10298. In this paper we address the solution of the popular Wordle puzzle, using new reinforcement learning methods, which apply more generally to adaptive control of dynamic systems and to classes of Partially Observable Markov Decision Process (POMDP) problems. These methods are based on approximation in value space and the rollout approach, admit a straightforward implementation, and provide improved performance over various heuristic approaches. For the Wordle puzzle, they yield on-line solution strategies that are very close to optimal at relatively modest computational cost. Our methods are viable for more complex versions of Wordle and related search problems, for which an optimal strategy would be impossible to compute. They are also applicable to a wide range of adaptive sequential decision problems that involve an unknown or frequently changing environment whose parameters are estimated on-line.

Garces, D., Bhattacharya, S., Gil, G., and Bertsekas, D., "Multiagent Reinforcement Learning for Autonomous Routing and Pickup Problem with Adaptation to Variable Demand", Nov. 2022; arXiv:2211.14983. We derive a learning framework to generate routing/pickup policies for a fleet of vehicles tasked with servicing stochastically appearing requests on a city map. We focus on policies that 1) give rise to coordination amongst the vehicles, thereby reducing wait times for servicing requests, 2) are non-myopic, considering a-priori unknown potential future requests, and 3) can adapt to changes in the underlying demand distribution. Specifically, we are interested in adapting to fluctuations of actual demand conditions in urban environments, such as on-peak vs. off-peak hours. We achieve this through a combination of (i) online play, a lookahead optimization method that improves the performance of rollout methods via an approximate policy iteration step, and (ii) an offline approximation scheme that allows for adapting to changes in the underlying demand model. In particular, we achieve adaptivity of our learned policy to different demand distributions by quantifying a region of validity using the q-valid radius of a Wasserstein Ambiguity Set. We propose a mechanism for switching the originally trained offline approximation when the current demand is outside the original validity region. In this case, we propose to use an offline architecture, trained on a historical demand model that is closer to the current demand in terms of Wasserstein distance. We learn routing and pickup policies over real taxicab requests in downtown San Francisco with high variability between on-peak and off-peak hours, demonstrating the ability of our method to adapt to real fluctuation in demand distributions. Our numerical results demonstrate that our method outperforms rollout-based reinforcement learning, as well as several benchmarks based on classical methods from the field of operations research.

D. P. Bertsekas, "Auction Algorithms for Path Planning, Network Transport, and Reinforcement Learning," Arizona State University/SCAI Report, July 2022; this is an updated version of a paper posted at arXiv:22207.09588. We consider some classical optimization problems in path planning and network transport, and we introduce new auction-based algorithms for their optimal and suboptimal solution. The algorithms are based on mathematical ideas that are related to competitive bidding for attaining market equilibrium, which underlie auction processes. However, their starting point is different, namely weighted and unweighted path construction in directed graphs, rather than assignment of persons to objects. The new algorithms have several potential advantages over existing methods: they are empirically faster in some important contexts, such as max-flow, they are well-suited for on-line replanning, and they can be adapted to distributed operation. Moreover, they can take advantage of reinforcement learning methods that use off-line training with data, as well as on-line training during real-time operation.

D. P. Bertsekas, "Newton's Method for Reinforcement Learning and Model Predictive Control," Results in Control and Optimization, Vol. 7, 2022, pp. 100-121. The purpose of this paper is to propose and develop a new conceptual framework for approximate Dynamic Programming (DP) and Reinforcement Learning (RL). This framework centers around two algorithms, which are designed largely independently of each other and operate in synergy through the powerful mechanism of Newton's method. We call these the off-line training and the on-line play algorithms; the names are borrowed from some of the major successes of RL involving games. Primary examples are the recent (2017) AlphaZero program (which plays chess), and the similarly structured and earlier (1990s) TD-Gammon program (which plays backgammon). In these game contexts, the off-line training algorithm is the method used to teach the program how to evaluate positions and to generate good moves at any given position, while the on-line play algorithm is the method used to play in real time against human or computer opponents. Both AlphaZero and TD-Gammon were trained off-line extensively using neural networks and an approximate version of the fundamental DP algorithm of policy iteration. Yet the AlphaZero player that was obtained off-line is not used directly during on-line play (it is too inaccurate due to approximation errors that are inherent in off-line neural network training). Instead a separate on-line player is used to select moves, based on multistep lookahead minimization and a terminal position evaluator that was trained using experience with the off-line player. The on-line player performs a form of policy improvement, which is not degraded by neural network approximations. As a result, it greatly improves the performance of the off-line player. Similarly, TD-Gammon performs on-line a policy improvement step using one-step or two-step lookahead minimization, which is not degraded by neural network approximations. To this end it uses an off-line neural network-trained terminal position evaluator, and importantly it also extends its on-line lookahead by rollout (simulation with the one-step lookahead player that is based on the position evaluator). An important lesson from AlphaZero and TD-Gammon is that the performance of an off-line trained policy can be greatly improved by on-line approximation in value space, with long lookahead (involving minimization or rollout with the off-line policy, or both), and terminal cost approximation that is obtained off-line. This performance enhancement is often dramatic and is due to a simple fact, which is couched on algorithmic mathematics and is the focal point of this work: (a) Approximation in value space with one-step lookahead minimization amounts to a step of Newton's method for solving Bellman's equation. (a) The starting point for the Newton step is based on the results of off-line training, and may be enhanced by longer lookahead minimization and on-line rollout. Indeed the major determinant of the quality of the on-line policy is the Newton step that is performed on-line, while off-line training plays a secondary role by comparison. Significantly, the synergy between off-line training and on-line play also underlies Model Predictive Control (MPC), a major control system design methodology that has been extensively developed since the 1980s. This synergy can be understood in terms of abstract models of infinite horizon DP and simple geometrical constructions, and helps to explain the all-important stability issues within the MPC context. In this work we aim to provide insights (often based on visualization), which explain the beneficial effects of on-line decision making on top of off-line training. In the process, we will bring out the strong connections between the artificial intelligence view of RL, and the control theory views of MPC and adaptive control. While we will deemphasize mathematical proofs, there is considerable related analysis, which supports our conclusions and can be found in the author's recent RL books [Ber19a], [Ber20a], and the abstract DP monograph [Ber22b]. One of our principal aims is to show, through the algorithmic ideas of Newton's method and the unifying principles of abstract DP, that the AlphaZero/TD-Gammon methodology of approximation in value space and rollout applies very broadly to deterministic and stochastic optimal control problems, involving both discrete and continuous search spaces, as well as finite and infinite horizon.

D. P. Bertsekas, "Distributed Asynchronous Policy Iteration for Sequential Zero-Sum Games and Minimax Control," arXiv preprint arXiv:2107.10406, July 2021; revised October 2021 (incorporated as Chapter 5 into the book Abstract Dynamic Programming, 3rd edition, Athena Scientific, 2022). We introduce a contractive abstract dynamic programming framework and related policy iteration algorithms, specifically designed for sequential zero-sum games and minimax problems with a general structure. Aside from greater generality, the advantage of our algorithms over alternatives is that they resolve some long-standing convergence difficulties of the ``natural" policy iteration algorithm, which have been known since the Pollatschek and Avi-Itzhak method [PoA69] for finite-state Markov games. Mathematically, this ``natural" algorithm is a form of Newton's method for solving Bellman's equation, but Newton's method, contrary to the case of single-player DP problems, is not globally convergent in the case of a minimax problem, because the Bellman operator may have components that are neither convex nor concave. Our algorithms address this difficulty by introducing alternating player choices, and by using a policy-dependent mapping with a uniform sup-norm contraction property, similar to earlier works by Bertsekas and Yu [BeY10], [BeY12], [YuB13]. Moreover, our algorithms allow a convergent and highly parallelizable implementation, which is based on state space partitioning, and distributed asynchronous policy evaluation and policy improvement operations within each set of the partition. Our framework is also suitable for the use of reinforcement learning methods based on aggregation, which may be useful for large-scale problem instances.

D. P. Bertsekas, "On-Line Policy Iteration for Infinite Horizon Dynamic Programming," arXiv preprint arXiv:2106.00746, May 2021 (incorporated into the books Rollout, Policy Iteration, and Distributed Reinforcement Learning, and Lessons from AlphaZero for Optimal, Model Predictive, and Adaptive Control, Athena Scientific, 2022). In this paper we propose an on-line policy iteration (PI) algorithm for finite-state infinite horizon discounted dynamic programming, whereby the policy improvement operation is done on-line, only for the states that are encountered during operation of the system. This allows the continuous updating/improvement of the current policy, thus resulting in a form of on-line PI that incorporates the improved controls into the current policy as new states and controls are generated. The algorithm converges in a finite number of stages to a type of locally optimal policy, and suggests the possibility of variants of PI and multiagent PI where the policy improvement is simplified. Moreover, the algorithm can be used with on-line replanning, and is also well-suited for on-line PI algorithms with value and policy approximations.

D. P. Bertsekas, "Multiagent Reinforcement Learning: Rollout and Policy Iteration," ASU Report Oct. 2020; IEEE/CAA Journal of Automatica Sinica, Vol. 8, 2021, pp. 249-271; Video of an overview lecture. We discuss the solution of complex multistage decision problems using methods that are based on the idea of policy iteration (PI for short), i.e., start from some base policy and generate an improved policy. Rollout is the simplest method of this type, where just one improved policy is generated. We can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next iteration. In contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line replanning. Moreover, rollout can use as base policy one of the policies produced by PI, thereby improving on that policy. This is the type of scheme underlying the prominently successful AlphaZero chess program. In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components each selected (conceptually) by a separate agent. This is the class of multiagent problems where the agents have a shared objective function, and a shared and perfect state information. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach, whereby at every stage, the agents sequentially (one-at-a-time) execute a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of total computation required at every stage grows linearly with the number of agents. By contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of agents. Despite the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees an improved performance relative to the base policy. We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property, without any on-line coordination of control selection between the agents. For discounted and other infinite horizon problems, we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms are also given, based on the use of policy and value neural networks. These PI algorithms, in both their exact and their approximate form are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.

D. P. Bertsekas, "Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning," ASU Report, April 2020; arXiv preprint, arXiv:2005.01627, May 2020; Results in Control and Optimization J., Vol. 1, 2020. We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order, with knowledge of the choices of the preceding agents in the order. As a result, the amount of computation for each policy improvement grows linearly with the number of agents, as opposed to exponentially for the standard all-agents-at-once method. For the case of a finite-state discounted problem, we showed convergence to an agent-by-agent optimal policy. In this paper, this result is extended to value iteration and optimistic versions of policy iteration, as well as to more general DP problems where the Bellman operator is a contraction mapping, such as stochastic shortest path problems with all policies being proper.

Li, Y., Johansson, K. H, Martensson, J., and D. P. Bertsekas, "Data-Driven Rollout for Deterministic Optimal Control ," Proc.\ of 2021 CDC; also arXiv preprint arXiv:2105.03116, Sept. 2021. We consider deterministic infinite horizon optimal control problems with nonnegative stage costs. We draw inspiration from learning model predictive control scheme designed for continuous dynamics and iterative tasks, and propose a rollout algorithm that relies on sampled data generated by some base policy. The proposed algorithm is based on value and policy iteration ideas, and applies to deterministic problems with arbitrary state and control spaces, and arbitrary dynamics. It admits extensions to problems with trajectory constraints, and a multiagent structure.

Liu, M., Pedrielli, G., Sulc, P., Poppleton, E., and D. P. Bertsekas, "ExpertRNA: A New Framework for RNA Structure Prediction," bioRxiv 2021.01.18.427087, January 2021; INFORMS J. on Computing, 34(5), pp.2464-2484. Ribonucleic acid (RNA) is a fundamental biological molecule that is essential to all living organisms, performing a versatile array of cellular tasks. The function of many RNA molecules is strongly related to the structure it adopts. As a result, great effort is being dedicated to the design of efficient algorithms that solve the folding problem: given a sequence of nucleotides, return a probable list of base pairs, referred to as the secondary structure prediction. Early algorithms have largely relied on finding the structure with minimum free energy. However, the predictions rely on effective simplified free energy models that may not correctly identify the correct structure as the one with the lowest free energy. In light of this, new, data-driven approaches that not only consider free energy, but also use machine learning techniques to learn motifs have also been investigated, and have recently been shown to outperform free energy based algorithms on several experimental data sets. In this work, we introduce the new ExpertRNA algorithm that provides a modular framework which can easily incorporate an arbitrary number of rewards (free energy or non-parametric/data driven) and secondary structure prediction algorithms. We argue that this capability of ExpertRNA has the potential to balance out different strengths and weaknesses of state-of-the-art folding tools. We test the ExpertRNA on several RNA sequence-structure data sets, and we compare the performance of ExpertRNA against a state-of-the-art folding algorithm. We find that ExpertRNA produces, on average, more accurate predictions than the structure prediction algorithm used, thus validating the promise of the approach.

D. P. Bertsekas, "Constrained Multiagent Rollout and Multidimensional Assignment with the Auction Algorithm," arXiv preprint, arXiv:2002.07407, April 2020. We consider an extension of the rollout algorithm that applies to constrained deterministic dynamic programming, including challenging combinatorial optimization problems. The algorithm relies on a suboptimal policy, called base heuristic. Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm has a cost improvement property: it produces a feasible solution, whose cost is no worse than the base heuristic's cost. We then focus on multiagent problems, where the control at each stage consists of multiple components (one per agent), which are coupled either through the cost function or the constraints or both. We show that the cost improvement property is maintained with an alternative implementation that has greatly reduced computational requirements, and makes possible the use of rollout in problems with many agents. We demonstrate this alternative algorithm by applying it to layered graph problems that involve both a spatial and a temporal structure. We consider in some detail a prominent example of such problems: multidimensional assignment, where we use the auction algorithm for 2-dimensional assignment as a base heuristic. This auction algorithm is particularly well-suited for our context, because through the use of prices, it can advantageously use the solution of an assignment problem as a starting point for solving other related assignment problems, and this can greatly speed up the execution of the rollout algorithm.

Bhattacharya, S., Kailas, S., Badyal, S., Gil, S., Bertsekas, D.,"Multiagent Rollout and Policy Iteration for POMDP with Application to Multi-Robot Repair Problems," Proc. CORL, 2020; arXiv preprint, arXiv:2011.04222, November 2020. In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, partial state observations, and a multiagent structure. We discuss and compare algorithms that simultaneously or sequentially optimize the agents' controls by using multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. Our methods specifically address the computational challenges of partially observable multiagent problems. In particular: 1) We consider rollout algorithms that dramatically reduce required computation while preserving the key cost improvement property of the standard rollout method. The per-step computational requirements for our methods are on the order of O(Cm) as compared with O(Cm) for standard rollout, where C is the maximum cardinality of the constraint set for the control component of each agent, and m is the number of agents. 2) We show that our methods can be applied to challenging problems with a graph structure, including a class of robot repair problems whereby multiple robots collaboratively inspect and repair a system under partial information. 3) We provide a simulation study that compares our methods with existing methods, and demonstrate that our methods can handle larger and more complex partially observable multiagent problems (state space size 10^37 and control space size 10^7, respectively). In particular, we verify experimentally that our multiagent rollout methods perform nearly as well as standard rollout for problems with few agents, and produce satisfactory policies for problems with a larger number of agents that are intractable by standard rollout and other state of the art methods. Finally, we incorporate our multiagent rollout algorithms as building blocks in an approximate policy iteration scheme, where successive rollout policies are approximated by using neural network classifiers. While this scheme requires a strictly off-line implementation, it works well in our computational experiments and produces additional significant performance improvement over the single online rollout iteration method.

S. Bhattacharya, S. Badyal, T. Wheeler, S. Gil, D. P. Bertsekas, "Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems," IEEE Robotics and Automation Letters, Vol. 5, pp. 3967-3974, 2020; arXiv preprint arXiv:2002.04175. In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.

D. P. Bertsekas, "Multiagent Rollout Algorithms and Reinforcement Learning," arXiv preprint arXiv:1910.00120, April 2020. We consider finite and infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. We introduce an algorithm, whereby at every stage, each agent's decision is made by executing a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of local computation required at every stage by each agent is independent of the number of agents, while the amount of global computation (over all agents) grows linearly with the number of agents. By contrast, with the standard rollout algorithm, the amount of global computation grows exponentially with the number of agents. Despite the drastic reduction in required computation, we show that our algorithm has the fundamental cost improvement property of rollout: an improved performance relative to the base policy. We also explore related reinforcement learning and approximate policy iteration algorithms, and we discuss how this cost improvement property is affected when we attempt to improve further the method's computational efficiency through parallelization of the agents' computations.

D. P. Bertsekas, "Biased Aggregation, Rollout, and Enhanced Policy Improvement for Reinforcement Learning," Lab. for Information and Decision Systems Report, MIT, October 2018; a shorter version appears as arXiv preprint arXiv:1910.02426, Oct. 2019. We propose a new aggregation framework for approximate dynamic programming, which provides a connection with rollout algorithms, approximate policy iteration, and other single and multistep lookahead methods. The central novel characteristic is the use of a scoring function $V$ of the state, which biases the values of the aggregate cost function towards their correct levels. The classical aggregation framework is obtained when $V\equiv0$, but our scheme works best when $V$ is a known reasonably good approximation to the optimal cost function $\jstar$. When $V$ is equal to the cost function $J_\m$ of some known policy $\m$, our scheme is equivalent to the rollout algorithm based on $\m$, in the extreme case where there is a single aggregate state. In the case of hard aggregation with multiple aggregate states, our scheme is equivalent to approximation in value space with lookahead function $\tl J$ equal to $J_\m$ plus local corrections that are constant within each aggregate state. The local correction levels are obtained by solving a low-dimensional aggregate DP problem, yielding an arbitrarily close approximation $\tl J$ to $\jstar$, when the number of aggregate states is sufficiently large. As a result, for $V=J_\m$, our score-based aggregation approach can be used as an enhanced form of improvement of the policy $\m$. When combined with an approximate policy evaluation scheme, it can form the basis for a new and enhanced form of approximate policy iteration. When $V$ is a generic scoring function, the aggregation scheme is equivalent to approximation in value space based on $V$, in the extreme case of a single aggregate state. It can yield an arbitrarily close approximation to $\jstar$ when a sufficiently large number of aggregate states are used, through local corrections to $V$, obtained by solving an aggregate problem. Except for the scoring function, the aggregate problem is similar to the one of the classical aggregation framework, and its algorithmic solution by simulation or other methods is nearly identical to one for classical aggregation, assuming values of $V$ are available when needed.

D. P. Bertsekas, "Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations," Lab. for Information and Decision Systems Report, MIT, April 2018 (revised August 2018); arXiv preprint arXiv:1804.04577; a version published in IEEE/CAA Journal of Automatica Sinica. (Lecture Slides). (Related Video Lecture). In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller ``aggregate" Markov decision problem, whose states relate to the features. The optimal cost function of the aggregate problem, a nonlinear function of the features, serves as an architecture for approximation in value space of the optimal cost function or the cost functions of policies of the original problem. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with reinforcement learning based on deep neural networks, which is used to obtain the needed features. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by deep reinforcement learning, thereby potentially leading to more effective policy improvement.

D. P. Bertsekas, "Proximal Algorithms and Temporal Differences for Large Linear Systems: Extrapolation, Approximation, and Simulation," Lab. for Information and Decision Systems Report LIDS-P-3205, MIT, October 2016; arXiv preprint arXiv:1610.1610.05427; a version appears in Computational Optimization and Applications J, Vol. 70, 2018, pp. 709-736. (Related Video Lecture from INFORMS ICS Conference, Jan 2017). (Slides from INFORMS ICS Conference). (Related Slides from NIPS 2017). We consider large linear and nonlinear fixed point problems, and solution with proximal algorithms. We show that there is a close connection between two seemingly different types of methods from distinct fields: 1) Proximal iterations for linear systems of equations, which are prominent in numerical analysis and convex optimization, and 2) Temporal difference (TD) type methods, such as TD(lambda), LSTD(lambda), and LSPE(lambda), which are central in simulation-based approximate dynamic programming/reinforcement learning (DP/RL), and its recent prominent successes in large-scale game contexts, among others. One benefit of this connection is a new and simple way to accelerate the standard proximal algorithm by extrapolation towards the TD iteration, which generically has a faster convergence rate. Another benefit is the potential integration into the proximal algorithmic context of several new ideas that have emerged in the DP/RL context. We discuss some of the possibilities, and in particular, algorithms that project each proximal iterate onto the subspace spanned by a small number of basis functions, using low-dimensional calculations and simulation. A third benefit is that insights and analysis from proximal algorithms can be brought to bear on the enhancement of TD methods. The linear fixed point methodology can be extended to nonlinear fixed point problems involving a contraction, thus providing guaranteed and potentially substantial acceleration of the proximal and forward backward splitting algorithms at no extra cost. Moreover, the connection of proximal and TD methods can be extended to nonlinear (nondifferentiable) fixed point problems through new proximal-like algorithms that involve successive linearization, similar to policy iteration in DP.

D. P. Bertsekas, "Stable Optimal Control and Semicontractive Dynamic Programming," SIAM J. on Control and Optimization, Vol. 56, 2018, pp. 231-252, (Related Lecture Slides). We consider discrete-time infinite horizon deterministic optimal control problems with nonnegative cost, and a destination that is cost-free and absorbing. The classical linear-quadratic regulator problem is a special case. Our assumptions are very general, and allow the possibility that the optimal policy may not be stabilizing the system, e.g., may not reach the destination either asymptotically or in a finite number of steps. We introduce a new unifying notion of stable feedback policy, based on perturbation of the cost per stage, which in addition to implying convergence of the generated states to the destination, quantifies the speed of convergence. We consider the properties of two distinct cost functions: $J^*$, the overall optimal, and $\hat J$, the restricted optimal over just the stable policies. Different classes of stable policies (with different speeds of convergence) may yield different values of $\hat J$. We show that for any class of stable policies, $\hat J$ is a solution of Bellman's equation, and we characterize the smallest and the largest solutions: they are $J^*$, and $J^+$, the restricted optimal cost function over the class of (finitely) terminating policies. We also characterize the regions of convergence of various modified versions of value and policy iteration algorithms, as substitutes for the standard algorithms, which may not work in general.

D. P. Bertsekas, "Proper Policies in Infinite-State Stochastic Shortest Path Problems," arXiv preprint arXiv:1711.10129; appeared as Section 4.6 of the 3rd edition of the author's book "Abstract Dynamic Programming", Athena Scientific, 2022. (Related Lecture Slides). We consider stochastic shortest path problems with infinite state and control spaces, and a nonnegative cost per stage. We extend the notion of a proper policy from the context of finite state space to the context of infinite state space. We consider the optimal cost function $J^*$, and the optimal cost function $\hat J$ over just the proper policies. Assuming that there exists at least one proper policy, we show that $J^*$ and $\hat J$ are the smallest and largest solutions of Bellman's equation, respectively, within a class of functions with a boundedness property. The standard value iteration algorithm may be attracted to either $J^*$ or $\hat J$, depending on the initial condition.

D. P. Bertsekas, "Affine Monotonic and Risk-Sensitive Models in Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-3204, MIT, June 2016; arXiv preprint arXiv:1608.01393; IEEE Transactions on Aut. Control, Vol. 64, 2019, pp. 3117-3128. In this paper we consider a broad class of infinite horizon discrete-time optimal control models that involve a nonnegative cost function and an affine mapping in their dynamic programming equation. They include as special cases classical models such as stochastic undiscounted nonnegative cost problems, stochastic multiplicative cost problems, and risk-sensitive problems with exponential cost. We focus on the case where the state space is finite and the control space has some compactness properties. We assume that the affine mapping has a semicontractive character, whereby for some policies it is a contraction, while for others it is not. In one line of analysis, we impose assumptions that guarantee that the latter policies cannot be optimal. Under these assumptions, we prove strong results that resemble those for discounted Markovian decision problems, such as the uniqueness of solution of Bellman's equation, and the validity of forms of value and policy iteration. In the absence of these assumptions, the results are weaker and unusual in character: the optimal cost function need not be a solution of Bellman's equation, and an optimal policy may not be found by value or policy iteration. Instead the optimal cost function over just the contractive policies solves Bellman's equation, and can be computed by a variety of algorithms.

D. P. Bertsekas, "Regular Policies in Abstract Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-3173, MIT, May 2015; SIAM J. on Optimization, Vol. 27, No. 3, pp. 1694-1727. (Related Lecture Slides); (Related Video Lectures). We consider an abstract dynamic programming model, and analysis based on regular policies that are well-behaved with respect to value iteration. We show that the optimal cost function over regular policies may have favorable fixed point and value iteration properties, which the optimal cost function over all policies need not have. We accordingly develop a methodology that can deal with long standing analytical and algorithmic issues in undiscounted dynamic programming models, such as stochastic shortest path, positive cost, negative cost, mixed positive-negative cost, risk-sensitive, and multiplicative cost problems. Among others, we use our approach to obtain new results for convergence of value and policy iteration in deterministic discrete-time optimal control with nonnegative cost per stage.

D. P. Bertsekas, "Value and Policy Iteration in Deterministic Optimal Control and Adaptive Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-3174, MIT, May 2015 (revised Sept. 2015); arXiv preprint arXiv:1507.01026; IEEE Transactions on Neural Networks and Learning Systems, Vol. 28, 2017, pp. 500-509. In this paper, we consider discrete-time infinite horizon problems of optimal control to a terminal set of states. These are the problems that are often taken as the starting point for adaptive dynamic programming. Under very general assumptions, we establish the uniqueness of solution of Bellman's equation and we provide convergence results for value and policy iteration.

D. P. Bertsekas, "Robust Shortest Path Planning and Semicontractive Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-2915, MIT, Feb. 2014 (revised Jan. 2015 and June 2016); Naval Research Logistics (NRL), 66(1), pp.15-37. In this paper we consider shortest path problems in a directed graph where the transitions between nodes are subject to uncertainty. We use a minimax formulation, where the objective is to guarantee that a special destination state is reached with a minimum cost path even under the worst possible instance of the uncertainty. Problems of this type arise, among others, in planning and pursuit-evasion contexts, and in model predictive control. Our analysis makes use of the recently developed theory of abstract semicontractive dynamic programming models. We investigate questions of existence and uniqueness of solution of the optimality equation, existence of optimal paths, and the validity of various algorithms patterned after the classical methods of value and policy iteration, as well as a new Dijkstra-like algorithm for problems with nonnegative arc lengths.

D. P. Bertsekas and H. Yu, "Stochastic Shortest Path Problems Under Weak Conditions," Lab. for Information and Decision Systems Report LIDS-P-2909, MIT, Jan. 2016; to appear in Math. of OR. In this paper we weaken the conditions under which some of the basic analytical and algorithmic results for finite-state stochastic shortest path problems hold. We provide an analysis under three types of assumptions, under all of which the standard form of policy iteration may fail, and other anomalies may occur. In the first type of assumptions, we require a standard compactness and continuity condition, as well as the existence of an optimal proper policy, thereby allowing positive and negative costs per stage, and improper policies with finite cost at all states. The analysis is based on introducing an additive perturbation $\d>0$ to the cost per stage, which drives the cost of improper policies to infinity. By considering the $\d$-perturbed problem and taking the limit as $\d\downarrow0$, we show the validity of Bellman's equation and value iteration, and we construct a convergent policy iteration algorithm that uses a diminishing sequence of perturbations. In the second type of assumptions we require nonpositive one-stage costs and we give policy iteration algorithms that are optimistic and do not require the use of perturbations. In the third type of assumptions we require nonnegative one-stage costs, as well as the compactness and continuity condition, and we convert the problem to an equivalent stochastic shortest path problem for which the existing theory applies. Using this transformation, we address the uniqueness of solution of Bellman's equation, the convergence of value iteration, and the convergence of some variants of policy iteration. Our analysis and algorithms under the second and third type of assumptions fully apply to finite-state positive (reward) and negative (reward) dynamic programming models.

H. Yu and D. P. Bertsekas, "A Mixed Value and Policy Iteration Method for Stochastic Control with Universally Measurable Policies," Lab. for Information and Decision Systems Report LIDS-P-2905, MIT, July 2013; Mathematics of Operations Research, Vol. 40, 2015, pp. 926-968. We consider the stochastic control model with Borel spaces and universally measurable policies. For this model the standard policy iteration is known to have difficult measurability issues and cannot be carried out in general. We present a mixed value and policy iteration method that circumvents this difficulty. The method allows the use of stationary policies in computing the optimal cost function, in a manner that resembles policy iteration. It can also be used to address similar difficulties of policy iteration in the context of upper and lower semicontinuous models. We analyze the convergence of the method in infinite horizon total cost problems, for the discounted case where the one-stage costs are bounded, and for the undiscounted case where the one-stage costs are nonpositive or nonnegative. For the undiscounted total cost problems with nonnegative one-stage costs, we also give a new convergence theorem for value iteration, which shows that value iteration converges whenever it is initialized with a function that is above the optimal cost function and yet bounded by a multiple of the optimal cost function. This condition resembles WhittleÕs bridging condition and is partly motivated by it. The theorem is also partly motivated by a result of Maitra and Sudderth, which showed that value iteration, when initialized with the constant function zero, could require a transfinite number of iterations to converge. We use the new convergence theorem for value iteration to establish the convergence of our mixed value and policy iteration method for the nonnegative cost models.

H. Yu and D. P. Bertsekas, "Weighted Bellman Equations and their Applications in Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-2876, MIT, October 2012; related slide presentation (Related Lecture Slides from ADPRL 2014.) (Video of the lecture from ADPRL 2014.) We consider approximation methods for Markov decision processes in the learning and simulation context. For policy evaluation based on solving approximate versions of a Bellman equation, we propose the use of weighted Bellman mappings. Such mappings comprise weighted sums of one-step and multistep Bellman mappings, where the weights depend on both the step and the state. For projected versions of the associated Bellman equations, we show that their solutions have the same nature and essential approximation properties as the commonly used approximate solutions from TD($\lambda$). The most important feature of our framework is that each state can be associated with a different type of mapping. Compared with the standard TD($\lambda$) framework, this gives a more flexible way to combine multistage costs and state transition probabilities in approximate policy evaluation, and provides alternative means for bias-variance control. With weighted Bellman mappings, there is also greater flexibility to design learning and simulation-based algorithms. We demonstrate this with examples, including new TD-type algorithms with state-dependent $\lambda$ parameters, as well as block versions of the algorithms. Weighted Bellman mappings can also be applied in approximate policy iteration: we provide several examples, including some new optimistic policy iteration schemes. Another major feature of our framework is that the projection need not be based on a norm, but rather can use a semi-norm. This allows us to establish a close connection between projected equation and aggregation methods, and to develop for the first time multistep aggregation methods, including some of the TD($\lambda$)-type.

D. P. Bertsekas, "Weighted Sup-Norm Contractions in Dynamic Programming: A Review and Some New Applications," Lab. for Information and Decision Systems Report LIDS-P-2884, MIT, May 2012. We consider a class of generalized dynamic programming models based on weighted sup-norm contractions. We provide an analysis that parallels the one available for discounted MDP and for generalized models based on unweighted sup-norm contractions. In particular, we discuss the main properties and associated algorithms of these models, including value iteration, policy iteration, and their optimistic and approximate variants. The analysis relies on several earlier works that use more specialized assumptions. In particular, we review and extend the classical results of Denardo [Den67] for unweighted sup-norm contraction models, as well as more recent results relating to approximation methods for discounted MDP. We also apply the analysis to stochastic shortest path problems where all policies are assumed proper. For these problems we extend three results that are known for discounted MDP. The first relates to the convergence of optimistic policy iteration and extends a result of Rothblum [Rot79], the second relates to error bounds for approximate policy iteration and extends a result of Bertsekas and Tsitsiklis [BeT96], and the third relates to error bounds for approximate optimistic policy iteration and extends a result of Thiery and Scherrer [ThS10b].

M. Wang and D. P. Bertsekas, "Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems," Lab. for Information and Decision Systems Report LIDS-P-2878, MIT, December 2011 (revised March 2012); Mathematics of Operations Research, Vol. 39, pp. 1-30, 2013; related slide presentation, related poster presentation.

Abstract: We consider linear systems of equations, $Ax=b$, of various types frequently arising in large-scale applications, with an emphasis on the case where $A$ is singular. Under certain conditions, necessary as well as sufficient, linear deterministic iterative methods generate sequences $\{x_k\}$ that converge to a solution, as long as there exists at least one solution. We show that this convergence property is frequently lost when these methods are implemented with simulation, as is often done in important classes of large-scale problems. We introduce additional conditions and novel algorithmic stabilization schemes under which $\{x_k\}$ converges to a solution when $A$ is singular, and may also be used with substantial benefit when $A$ is nearly singular. Moreover, we establish the mathematical foundation for related work that deals with special cases of singular systems, including some arising in approximate dynamic programming, where convergence may be obtained without a stabilization mechanism.

M. Wang and D. P. Bertsekas, "Convergence of Iterative Simulation-Based Methods for Singular Linear Systems," Lab. for Information and Decision Systems Report LIDS-P-2879, MIT, December 2011 (revised April 2012); Stochastic Systems, 2013; related slide presentation, related poster presentation.

Abstract: We consider simulation-based algorithms for linear systems of equations, $Ax=b$, where $A$ is singular. The convergence properties of iterative solution methods can be impaired when the methods are implemented with simulation, as is often done in important classes of large-scale problems. We focus on special cases of singular systems, including some arising in approximate dynamic programming, where convergence of the residual sequence may be obtained without a stabilization mechanism, while the sequence of iterates may diverge. For some of these special cases, under additional assumptions, we show that the sequence is guaranteed to converge. For situations where the sequence of iterates diverges, we propose schemes for extracting from the divergent sequence another sequence that converges to a solution of $Ax=b$.

Abstract: In this paper we discuss lambda-policy iteration, a method for exact and approximate dynamic programming. It is intermediate between the classical value iteration (VI) and policy iteration (PI) methods, and it is closely related to optimistic (also known as modified) PI, whereby each policy evaluation is done approximately, using a finite number of VI. We review the theory of the method and associated questions of bias and exploration arising in simulation-based cost function approximation. We then discuss various implementations, which offer advantages over well-established PI methods that use LSPE(lambda), LSTD(lambda), or TD(lambda) for policy evaluation with cost function approximation. One of these implementations is based on a new simulation scheme, called geometric sampling, which uses multiple short trajectories rather than a single infinitely long trajectory.

Abstract: We consider the stochastic shortest path problem, a classical finite-state Markovian decision problem with a termination state, and we propose new convergent Q-learning algorithms that combine elements of policy iteration and classical Q-learning/value iteration. These algorithms are related to the ones introduced by the authors for discounted problems in [BeY10]. The main difference from the standard policy iteration approach is in the policy evaluation phase: instead of solving a linear system of equations, our algorithm solves an optimal stopping problem inexactly with a finite number of value iterations. The main advantage over the standard Q-learning approach is lower overhead: most iterations do not require a minimization over all controls, in the spirit of modified policy iteration. We prove the convergence of asynchronous stochastic lookup table implementations of our method for undiscounted, total cost stochastic shortest path problems, thereby overcoming some of the traditional convergence difficulties of asynchronous modified policy iteration, and providing policy iteration-like alternative Q-learning schemes with as reliable convergence as classical Q-learning. We also discuss methods that use basis function approximations of Q-factors and we give associated error bounds.

Abstract: We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsi94] and establishing completely the convergence of Q-learning for these SSP models.

D. P. Bertsekas, "Temporal Difference Methods for General Projected Equations," IEEE Trans. on Automatic Control, Vol. 56, pp. 2128 - 2139, 2011.

Abstract: We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce an analytical framework based on an equivalence with variational inequalities, and a class of iterative algorithms that may be implemented with low-dimensional simulation. These algorithms originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD methods, which offer special implementation advantages and reduced overhead over the standard LSTD and LSPE methods, and can deal with rank deficiency in the associated matrix inversion. There is a sharp qualitative distinction between the deterministic and the simulation-based versions: the performance of the former is greatly affected by direction and feature scaling, yet the latter have the same asymptotic convergence rate regardless of scaling, because of their common simulation-induced performance bottleneck. (Related Lecture Slides)

N. Polydorides, M. Wang, and D. P. Bertsekas, "A Quasi Monte Carlo Method for Large Scale Inverse Problems," Proc. of The 9th International Conference on Monte Carlo and Quasi-Monte Carlo Methods in Scientific Computing (MCQMC 2010); in "Monte Carlo and Quasi-Monte Carlo Methods 2010," by H. Wozniakowski and L. Plaskota (eds), Springer-Verlag.

Abstract: We consider large-scale linear inverse problems with a simulation-based algorithm that approximates the solution within a low-dimensional subspace. The algorithm uses Tikhonov regularization, regression, and low-dimensional linear algebra calculations and storage. For sampling efficiency, we implement importance sampling schemes, specially tailored to the structure of inverse problems. We emphasize various alternative methods for approximating the optimal sampling distribution and we demonstrate their impact on the reduction of simulation noise. The performance of our algorithm is tested on a practical inverse problem arising from Fredholm integral equations of the first kind.

Abstract: We consider the distributed solution of dynamic programming (DP) problems by policy iteration. We envision a network of processors, each updating asynchronously a local policy and a local cost function, defined on a portion of the state space. The computed values are communicated asynchronously between processors and are used to perform the local policy and cost updates. The natural algorithm of this type can fail even under favorable circumstances, as shown by Williams and Baird [WiB93]. We propose an alternative and almost as simple algorithm, which converges to the optimum under the most general conditions, including asynchronous updating by multiple processors using outdated local cost functions of other processors.

D. P. Bertsekas, "Pathologies of Temporal Difference Methods in Approximate Dynamic Programming," Proc. 2010 IEEE Conference on Decision and Control, Atlanta, GA, Dec. 2010. (Related Lecture Slides)

Abstract: Approximate policy iteration methods based on temporal differences are popular in practice, and have been tested extensively, dating to the early nineties, but the associated convergence behavior is complex, and not well understood at present. An important question is whether the policy iteration process is seriously hampered by oscillations between poor policies, roughly similar to the attraction of gradient methods to poor local minima. There has been little apparent concern in the approximate DP/reinforcement learning literature about this possibility, even though it has been documented with several simple examples. Recent computational experimentation with the game of tetris, a popular testbed for approximate DP algorithms over a 15-year period, has brought the issue to sharp focus. In particular, using a standard set of 22 features and temporal difference methods, an average score of a few thousands was achieved. Using the same features and a random search method, an overwhelmingly better average score was achieved (600,000-900,000). The paper explains the likely mechanism of this phenomenon, and derives conditions under which it will not occur.

D. P. Bertsekas and H. Yu, "Q-Learning and Enhanced Policy Iteration in Discounted Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-2831, MIT, April, 2010 (revised November 2011); Math. of Operations Research, Vol. 37, 2012, pp. 66-94; a shorter version appears in Proc. of 2010 IEEE Conf. on Decision and Control, Atlanta, GA, Dec. 2010.(Related Lecture Slides) (A counterexample by Williams and Baird that motivates in part this paper)

Abstract: We consider the classical finite-state discounted Markovian decision problem, and we introduce a new policy iteration-like algorithm for finding the optimal Q-factors. Instead of policy evaluation by solving a linear system of equations, our algorithm requires (possibly inexact) solution of a nonlinear system of equations, involving estimates of state costs as well as Q-factors. This is Bellman's equation for an optimal stopping problem that can be solved with simple Q-learning iterations, in the case where a lookup table representation is used; it can also be solved with the Q-learning algorithm of Tsitsiklis and Van Roy [TsV99], in the case where feature-based Q-factor approximations are used. In exact/lookup table representation form, our algorithm admits asynchronous and stochastic iterative implementations, in the spirit of asynchronous/modified policy iteration, with lower overhead and more reliable convergence advantages over existing Q-learning schemes. Furthermore, for large-scale problems, where linear basis function approximations and simulation-based temporal difference implementations are used, our algorithm resolves effectively the inherent difficulties of existing schemes due to inadequate exploration.

D. P. Bertsekas, "Approximate Policy Iteration: A Survey and Some New Methods," Journal of Control Theory and Applications, Vol. 9, 2011, pp. 310-335.

Abstract: We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic policy iteration.

Our discussion of policy evaluation is couched in general terms, and aims to unify the many approaches in the field in the light of recent research developments, and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods such as LSTD, and iterative methods such as LSPE and TD(lambda), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods, and provides the connecting link between LSTD and LSPE.

Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.

N. Polydorides, M. Wang, and D. P. Bertsekas, "Approximate Solution of Large-Scale Linear Inverse Problems with Monte Carlo Simulation," Lab. for Information and Decision Systems Report LIDS-P-2822, MIT, November 2009.

Abstract: We consider the approximate solution of linear ill-posed inverse problems of high dimension with a simulation-based algorithm that approximates the solution within a low-dimensional subspace. The algorithm uses Tikhonov regularization, regression, and low-dimensional linear algebra calculations and storage. For sampling efficiency, we use variance reduction/importance sampling schemes, specially tailored to the structure of inverse problems. We demonstrate the implementation of our algorithm in a series of practical large-scale examples arising from Fredholm integral equations of the first kind.

M. Wang, N. Polydorides, and D. P. Bertsekas, "Approximate Simulation-Based Solution of Large-Scale Least Squares Problems," Lab. for Information and Decision Systems Report LIDS-P-2819, MIT, September 2009.

Abstract: We consider linear least squares problems of very large dimension, such as those arising for example in inverse problems. We introduce an associated approximate problem, within a subspace spanned by a relatively small number of basis functions, and solution methods that use simulation, importance sampling, and low-dimensional calculations. The main components of this methodology are a regression/regularization approach that can deal with nearly singular problems, and an importance sampling design approach that exploits existing continuity structures in the underlying models, and allows the solution of very large problems.

D. P. Bertsekas, "Projected Equations, Variational Inequalities, and Temporal Difference Methods," Lab. for Information and Decision Systems Report LIDS-P-2808, MIT, March 2009 -- A shorter/abridged version appeared in Proc. of IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning 2009, Nashville, TN.

Abstract: We consider projected equations for approximate solution of high-dimensional fixed point problems within low-dimensional subspaces. We introduce a unified framework based on an equivalence with variational inequalities (VIs), and a class of iterative feasible direction methods that may be implemented with low-dimensional simulation. These methods originated in approximate dynamic programming (DP), where they are collectively known as temporal difference (TD) methods. Even when specialized to DP, our methods include extensions/new versions of TD algorithms, which offer special implementation advantages, reduced overhead, and improved computational complexity over the standard LSTD and LSPE methods. We demonstrate a sharp qualitative distinction between the deterministic and the simulation-based versions: the performance of the former is greatly affected by direction and feature scaling, yet the latter asymptotically perform identically, regardless of scaling. (Related Lecture Slides)

H. Yu and D. P. Bertsekas, "Basis Function Adaptation Methods for Cost Approximation in MDP," Proc. of IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning 2009, Nashville, TN. Click here for an extended version.

Abstract: We generalize a basis adaptation method for cost approximation in Markov decision processes (MDP), extending earlier work of Menache, Mannor, and Shimkin. In our context, basis functions are parametrized and their parameters are tuned by minimizing an objective function involving the cost function approximation obtained when a temporal differences (TD) or other method is used. The adaptation scheme involves only low order calculations and can be implemented in a way analogous to policy gradient methods. In the generalized basis adaptation framework we provide extensions to TD methods for nonlinear optimal stopping problems and to alternative cost approximations beyond those based on TD. (Related Lecture Slides)

Abstract: We consider linear fixed point equations and their approximations by projection on a low dimensional subspace. We derive new bounds on the approximation error of the solution, which are expressed in terms of low dimensional matrices and can be computed by simulation. When the fixed point mapping is a contraction, as is typically the case in Markov decision processes (MDP), one of our bounds is always sharper than the standard contraction-based bounds, and another one is often sharper. The former bound is also tight in a worst-case sense. Our bounds also apply to the non-contraction case, including policy evaluation in MDP with nonstandard projections that enhance exploration. There are no error bounds currently available for this case to our knowledge.(Related Lecture Slides)

D. P. Bertsekas and H. Yu, "Projected Equation Methods for Approximate Solution of Large Linear Systems," Journal of Computational and Applied Mathematics, Vol. 227, 2009, pp. 27-50.

Abstract: We consider linear systems of equations and solution approximations derived by projection on a low-dimensional subspace. We propose stochastic iterative algorithms, based on simulation, which converge to the approximate solution and are suitable for very large-dimensional problems. The algorithms are extensions of recent approximate dynamic programming methods, known as temporal difference methods, which solve a projected form of Bellman's equation by using simulation-based approximations to this equation, or by using a projected value iteration method. (Related Lecture Slides) An extended report: "Solution of Large Systems of Equations Using Approximate Dynamic Programming Methods," Lab. for Information and Decision Systems Report 2754, MIT, June 2007.

H. Yu and D. P. Bertsekas, "Q-Learning Algorithms for Optimal Stopping Based on Least Squares," Proc. European Control Conference 2007, Kos, Greece, July 2007. (Related Lecture Slides) An extended report: "A Least Squares Q-Learning Algorithm for Optimal Stopping Problems," Lab. for Information and Decision Systems Report 2731, MIT, February 2007 (revised June 2007).

Abstract: We consider the solution of discounted optimal stopping problems using linear function approximation methods. A $Q$-learning algorithm for such problems, proposed by Tsitsiklis and Van Roy, is based on the method of temporal differences and stochastic approximation. We propose alternative algorithms, which are based on projected value iteration ideas and least squares. We prove the convergence of some of these algorithms and discuss their properties. (Lecture Slides)

H. Yu and D. P. Bertsekas, "On Near-Optimality of the Set of Finite-State Controllers for Average Cost POMDP," Lab. for Information and Decision Systems Report 2689, MIT, April 2006; Mathematics of Operations Research, 33(1), pp. 1-11, February 2008.

Abstract: We consider the average cost problem for partially observable Markov decision processes (POMDP) with finite state, observation, and control spaces. We prove that there exists an $\epsilon$-optimal finite-state controller functionally independent of initial distributions for any $\epsilon > 0$, under the assumption that the optimal liminf average cost function of the POMDP is constant. As part of our proof, we establish that if the optimal liminf average cost function is constant, then the optimal limsup average cost function is also constant, and the two are equal. We also discuss the connection between the existence of nearly optimal finite-history controllers and two other important issues for average cost POMDP: the existence of an average cost that is independent of the initial state distribution, and the existence of a bounded solution to the constant average cost optimality equation. (Related slide presentation)

H. Yu and D. P. Bertsekas, "Convergence Results for Some Temporal Difference Methods Based on Least Squares," Lab. for Information and Decision Systems Report 2697, MIT, June 2006; revised August, 2008; IEEE Trans. on Aut. Control, Vol. 54, 2009, pp. 1515-1531.

Abstract: We consider finite-state Markovian Decision Problems and prove convergence and rate of convergence results for certain least squares policy evaluation algorithms. These are temporal difference methods for constructing a linear function approximation of the cost function of a stationary policy, within the context of infinite-horizon discounted and average cost dynamic programming. We introduce an average cost method, patterned after the discounted cost method, which uses a constant stepsize, and we prove its convergence. We also show that the convergence rate of both the discounted and the average cost methods is optimal within the class of temporal difference methods. Analysis and experiment indicate that our methods are substantially and often dramatically faster than TD(Lambda), as well as more reliable. (Related slide presentation)

D. P. Bertsekas, "Separable Dynamic Programming and Approximate Decomposition Methods," Lab. for Information and Decision Systems Report 2684, MIT, Feb. 2006; IEEE Trans. on Aut. Control, Vol. 52, 2007, pp. 911-916.

Abstract: We consider control, planning, and resource allocation problems involving several independent subsystems that are coupled through a control/decision constraint. We discuss one-step lookahead methods that use an approximate cost-to-go function derived from the solution of single subsystem problems. We propose a new method for constructing such approximations, and derive bounds on the performance of the associated suboptimal policies. We then specialize this method to problems of reachability of target tubes that have the form of a box (a Cartesian product of subsystem tubes). This yields inner approximating tubes, which have the form of a union of a finite number of boxes, each involving single subsystem calculations.

Abstract: We survey selectively the field of approximate dynamic programming (ADP), with a particular emphasis on two recent directions of research: rollout algorithms and model predictive control (MPC). We argue that while motivated by different concerns, these two methodologies are closely connected, and the mathematical essence of their desirable properties (cost improvement and stability, respectively) is couched on the central dynamic programming idea of policy iteration. In particular, among other things, we show that the most common MPC schemes can be viewed as rollout algorithms or one-step policy iteration methods. Furthermore, we embed rollout and MPC within a new unifying suboptimal control framework, based on a concept of restricted or constrained structure policies, which contains these schemes as special cases.

D. P. Bertsekas, "Rollout Algorithms for Constrained Dynamic Programming," Lab. for Information and Decision Systems Report 2646, MIT, April 2005.

Abstract: The rollout algorithm is a suboptimal control method for deterministic and stochastic problems that can be solved by dynamic programming. In this short note, we derive an extension of the rollout algorithm that applies to constrained deterministic dynamic programming problems, and relies on a suboptimal policy, called base heuristic. Under suitable assumptions, we show that if the base heuristic produces a feasible solution, the rollout algorithm also produces a feasible solution, whose cost is no worse than the cost corresponding to the base heuristic.

H. Yu, and D. P. Bertsekas, "Discretized Approximations for POMDP with Average Cost," The 20th Conference on Uncertainty in Artificial Intelligence, 2004, Banff, Canada.

Abstract: In this paper, we propose a new lower approximation scheme for POMDP with discounted and average cost criterion. The approximating functions are determined by their values at a finite number of belief points, and can be computed efficiently using value iteration algorithms for finite-state MDP. While for discounted problems several lower approximation schemes have been proposed earlier, ours seems the first of its kind for average cost problems. We focus primarily on the average cost case, and we show that the corresponding approximation can be computed efficiently using multi-chain algorithms for finite-state MDP. We give a preliminary analysis showing that regardless of the existence of the optimal average cost in the POMDP, the approximation obtained is a lower bound of the liminf optimal average cost function, and can also be used to calculate an upper bound on the limsup optimal average cost function, as well as bounds on the cost of executing the stationary policy associated with the approximation. We show the convergence of the cost approximation, when the optimal average cost is constant and the optimal differential cost is continuous.

D. P. Bertsekas, V. Borkar, and A. Nedic, "Improved Temporal Difference Methods with Linear Function Approximation," in "Learning and Approximate Dynamic Programming", by A. Barto, W. Powell, J. Si, (Eds.), IEEE Press, 2004, pp. 231-255.

Abstract: We consider temporal difference algorithms within the context of infinite-horizon finite-state dynamic programming problems with discounted cost, and linear cost function approximation. We show, under standard assumptions, that a least squares-based temporal difference method, proposed by Nedic and Bertsekas [NeB03], converges with a stepsize equal to 1. To our knowledge, this is the first iterative temporal difference method that converges without requiring a diminishing stepsize. We discuss the connections of the method with Sutton's TD(Lambda) and with various versions of least-squares based value iteration, and we show via analysis and experiment that the method is substantially and often dramatically faster than TD(Lambda), as well as simpler and more reliable. We also discuss the relation of our method with the LSTD method of Boyan [Boy02], and Bradtke and Barto [BrB96].

A. Nedic and D. P. Bertsekas, "Least-Squares Policy Evaluation Algorithms with Linear Function Approximation," LIDS Report LIDS-P-2537, Dec. 2001; J. of Discrete Event Systems, Vol. 13, 2003, pp. 79-110.

Abstract: We consider two policy evaluation methods for discounted dynamic programming, which use simulation, temporal differences, and linear cost function approximation. The first method is a new gradient-like algorithm involving least-squares subproblems and a diminishing stepsize, which is based on the lambda-policy iteration method of Bertsekas and Ioffe. The second method is the LSTD(lambda) algorithm recently proposed by Boyan, which for lambda=0 coincides with the linear least-squares temporal-difference algorithm of Bradtke and Barto. At present, there is only a convergence result by Bradtke and Barto for the LSTD(0) algorithm. Here, we strengthen this result by showing the convergence of LSTD(lambda), with probability 1, for every lambda in [0,1].

C. C. Wu and D. P. Bertsekas, "Admission Control for Wireless Networks,"accepted inIEEE Trans. on Vehicular Technology.

Abstract: With the population of wireless subscribers increasing at a rapid rate, overloaded situations are likely to become an increasing problem. Admission control can be used to balance the goals of maximizing bandwidth utilization and ensuring sufficient resources for high priority events. In this paper, we formulate the admission control problem as a Markov decision problem. While dynamic programming can be used to solve such problems, the large size of the state space makes this impractical. We propose an approximate dynamic programming technique, which involves creating an approximation of the original model with a state space sufficiently small so that dynamic programming can be applied. Our results show that the method improves significantly on policies that are generally in use, in particular, the greedy policy and the reservation policy. Much of the computation required for our method can be done off-line, and the real-time computation required is easily distributed between the cells.

D. P. Bertsekas and D. A. Castanon, "Rollout Algorithms for Stochastic Scheduling Problems," J. of Heuristics, Vol. 5, 1999, pp. 89-108.

Abstract: Stochastic scheduling problems are difficult stochastic control problems with combinatorial decision spaces. In this paper we focus on a class of stochastic scheduling problems, the quiz problem and its variations. We discuss the use of heuristics for their solution, and we propose rollout algorithms based on these heuristics which approximate the stochastic dynamic programming algorithm. We show how the rollout algorithms can be implemented efficiently, and we delineate circumstances under which they are guaranteed to perform better than the heuristics on which they are based. We also show computational results which suggest that the performance of the rollout policies is near-optimal, and is substantially better than the performance of their underlying heuristics.

D. P. Bertsekas, M. L. Homer, D. A. Logan, S. D. Patek, and N. R. Sandell, "Missile Defense and Interceptor Allocation by Neuro-Dynamic Programming," IEEE Trans. on Systems, Man, and Cybernetics, 1999.

Abstract: The purpose of this paper is to propose a solution methodology for a missile defense problem involving the sequential allocation of defensive resources over a series of engagements. The problem is cast as a dynamic programming/Markovian decision problem, which is computationally intractable by exact methods because of its large number of states and its complex modeling issues. We have employed a Neuro-Dynamic Programming (NDP) framework, whereby the cost-to-go function is approximated using neural network architectures that are trained on simulated data. We report on the performance obtained using several different training methods, and we compare this performance with the optimal.

J. Abounadi, D. Bertsekas, and V. Borkar, "Learning Algorithms for Markov Decision Processes," Report LIDS-P-2434, Lab. for Info. and Decision Systems, 1998; SIAM J. on Control and Optimization, Vol. 40, 2001, pp. 681-698.

Abstract: This paper gives the first rigorous convergence analysis of analogs of Watkins' Q-learning algorithm, applied to average cost control of finite-state Markov chains. We discuss two algorithms which may be viewed as stochastic approximation counterparts of two existing algorithms for recursively computing the value function of average cost problem - the traditional relative value iteration algorithm and a recent algorithm of Bertsekas based on the stochastic shortest path (SSP) formulation of the problem. Both synchronous and asynchronous implementations are considered and are analyzed using the ``ODE" method. This involves establishing asymptotic stability of associated ODE limits. The SSP algorithm also uses ideas from two time scale stochastic approximation.

J. Abounadi, D. Bertsekas, and V. Borkar, "Stochastic Approximation for Non-Expansive Maps: Q-Learning Algorithms," Report LIDS-P-2433, Lab. for Info. and Decision Systems, 1998; SIAM J. on Control and Optimization, Vol. 41, 2002, pp. 1-22.

Abstract: We discuss synchronous and asynchronous variants of fixed point iterations of the form x^{k+1} = x^k + \gamma(k) \bl(F(x^k,w^k)-x^k\br), where F is a non-expansive mapping under a suitable norm, and {w^k} is a stochastic sequence. These are stochastic approximation iterations that can be analyzed using the ODE approach based either on Kushner and Clark's Lemma for the synchronous case or Borkar's Theorem for the asynchronous case. However, the analysis requires that the iterates {x^k} are bounded, a fact which is usually hard to prove. We develop a novel framework for proving boundedness, which is based on scaling ideas and properties of Lyapunov functions. We then combine the boundedness property with Borkar's stability analysis of ODE's involving non-expansive mappings to prove convergence with probability 1. We also apply our convergence analysis to Q-learning algorithms for stochastic shortest path problems and we are able to relax some of the assumptions of the currently available results.

S. D. Patek and D. P. Bertsekas, "Stochastic Shortest Path Games," SIAM J. on Control and Optimization, Vol. 36, 1999, pp. 804-824.

Abstract: We consider dynamic, two-player, zero-sum games where the "minimizing" player seeks to drive an underlying finite-state dynamic system to a special terminal state along a least expected cost path. The "maximizer" seeks to interfere with the minimizer's progress so as to maximize the expected total cost. We consider, for the frst time, undiscounted finite-state problems, with compact action spaces, and transition costs that are not strictly positive. We admit that there are policies for the minimizer which permit the maximizer to prolong the game indefinitely. Under assumptions which generalize deterministic shortest path problems, we establish (i) the existence of a real-valued equilibrium cost vector achievable with stationary policies for the opposing players and (ii) the convergence of value iteration and policy iteration to the unique solution of Bellman's equation.

S. D. Patek and D. P. Bertsekas, "Finite-state Average Cost Stochastic Games with Compact Constraint Sets and a Recurrence Condition," SIAM J. on Control and Optimization, Vol. 36, 1998, pp. 804-824.

Abstract: We characterize and establish the existence of stationary equilibrium solutions for a class of finite-state average cost games. We assume that two players choose actions at each stage from compact constraint sets enforcing some relatively mild assumptions on the transition probability and cost functions. We also assume that there is a distinguished state which is recurrent under each pair of stationary policies and that the corresponding Markov chains have a single recurrent class. In the second half of the paper we establish the convergence of several dynamic programming algorithms.

D. P. Bertsekas, "A New Value Iteration Method for the Average Cost Dynamic Programming Problem," SIAM J. on Control and Optimization, Vol. 36, 1998, pp. 742-759.

Abstract: We propose a new value iteration method for the classical average cost Markovian Decision problem, under the assumption that all stationary policies are unichain and furthermore there exists a state that is recurrent under all stationary policies. This method is motivated by a relation between the average cost problem and an associated stochastic shortest path problem. Contrary to the standard relative value iteration, our method involves a weighted sup norm contraction and for this reason it admits a Gauss-Seidel implementation. Computational tests indicate that the Gauss-Seidel version of the new method substantially outperforms the standard method for difficult problems.

D. P. Bertsekas, "Differential Training of Rollout Policies," Proc. of the 35th Allerton Conference on Communication, Control, and Computing, Allerton Park, Ill., October 1997.

Abstract: We consider the approximate solution of stochastic optimal control problems using a neuro-dynamic programming/reinforcement learning methodology. We focus on the computation of a rollout policy, which is obtained by a single policy iteration starting from some known base policy and using some form of exact or approximate policy improvement. We indicate that, in a stochastic environment, the popular methods of computing rollout policies are particularly sensitive to simulation and approximation error, and we present more robust alternatives, which aim to estimate relative rather than absolute Q-factor and cost-to-go values. In particular, we propose a method, called differential training, that can be used to obtain an approximation to cost-to-go differences rather than cost-to-go values by using standard methods such as TD(lambda) and lambda-policy iteration. This method is suitable for recursively generating rollout policies in the context of simulation-based policy iteration methods.

D. P. Bertsekas, J. N. Tsitsiklis, and C. Wu, "Rollout Algorithms for Combinatorial Optimization," J. of Heuristics, Vol. 3, 1997, pp. 245-262.

Abstract: We consider the approximate solution of discrete optimization problems using procedures that are capable of magnifying the effectiveness of any given heuristic algorithm through sequential application. In particular, we embed the problem within a dynamic programming framework, and we introduce several types of rollout algorithms, which are related to notions of policy iteration. We provide conditions guaranteeing that the rollout algorithm improves the performance of the original heuristic algorithm. The method is illustrated in the context of a machine maintenance and repair problem.

D. P. Bertsekas and S. Ioffe, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," Report LIDS-P-2349, Lab. for Information and Decision Systems, MIT, 1996.

Abstract: We consider a new policy iteration method for dynamic programming problems with discounted and undiscounted cost. The method is based on the notion of temporal differences, and is primarily geared to the case of large and complex problems where the use of approximations is essential. We develop the theory of these methods without approximation, we describe how to embed them within a neuro-dynamic programming/reinforcement learning context where feature-based approximation architectures are used, we relate them to TD(Lambda) methods, and we illustrate their use in the training of a tetris playing program.

L. C. Polymenakos, D. P. Bertsekas, and J. N. Tsitsiklis, "Efficient Algorithms for Continuous-Space Shortest Path Problems," IEEE Transactions on Automatic Control, Vol. 43, 1998, pp. 278-283.

Abstract: We consider a continuous-space shortest path problem in a two-dimensional plane. This is the problem of finding a trajectory that starts at a given point, ends at the boundary of a compact set, and minimizes a cost function of the form $\int_0^Tr(x(t))dt+q(x(T)).$ For a discretized version of this problem, a Dijkstra-like method that requires one iteration per discretization point has been developed by Tsitsiklis. Here we develop some new label correcting-like methods based on the Small Label First methods of Bertsekas. We prove the finite termination of these methods, and we present computational results showing that they are competitive and often superior to the Dijkstra-like method, and are also much faster than the traditional Jacobi and Gauss-Seidel methods.

S. D. Patek and D. P. Bertsekas,"Play Selection in American Football: a Case Study in Neuro-Dynamic Programming", Chapter 7 in Advances in Computational and Stochastic Optimization, Logic Programming, and Heuristic Search: Interfaces in Computer Science and Operations Research, David L. Woodruff, editor. Kluwer Academic Publishers, Boston, 1997.

Abstract: We present a computational case study of neuro-dynamic program- ming, a recent class of reinforcement learning methods. We cast the problem of play selection in American football as a stochastic shortest path Markov Decision Problem (MDP). In particular, we consider the problem faced by a quarterback in attempting to maximize the net score of an offensive drive. The resulting optimization problem serves as a medium-scale testbed for numerical algorithms based on policy iteration. The algorithms we consider evolve as a sequence of approximate policy eval- uations and policy updates. An (exact) evaluation amounts to the computation of the reward-to-go function associated with the policy in question. Approxi- mations of reward-to-go are obtained either as the solution or as a step toward the solution of a training problem involving simulated state/reward data pairs. Within this methodological framework there is a great deal of flexibility. In specifying a particular algorithm, one must select a parametric form for esti- mating the reward-to-go function as well as a training algorithm for tuning the approximation. One example we consider, among many others, is the use of a multilayer perceptron (i.e. neural network) which is trained by backpropaga- tion. The objective of this paper is to illustrate the application of neuro-dynamic programming methods in solving a well-defined optimization problem. We will contrast and compare various algorithms mainly in terms of performance, al- though we will also consider complexity of implementation. Because our version of football leads to a medium-scale Markov decision problem, it is possible to compute the optimal solution numerically, providing a yardstick for meaningful comparison of the approximate methods.

B. Van Roy, D. P. Bertsekas, Y. Lee, and J. N. Tsitsiklis, "A Neuro-Dynamic Programming Approach to Retailer Inventory Management," Proceedings of the IEEE Conference on Decision and Control, 1997; this is an extended version which appeared as a Lab. for Information and Decision Systems Report, MIT, Nov. 1996.

Abstract: We present a model of two-echelon retailer inventory systems, and we cast the problem of generating optimal control strategies into the framework of dynamic programming. We formulate two specific case studies for which the underlying dynamic programming problems involve thirty three and forty six state variables, respectively. Because of the enormity of these state spaces, classical algorithms of dynamic programming are inapplicable. To address these complex problems we develop approximate dynamic programming algorithms. The algorithms are motivated by recent research in artificial intelligence involving simulation-based methods and neural network approximations, and they are representative of algorithms studied in the emerging field of neuro-dynamic programming. We assess performance of resulting solutions relative to optimized s-type ("order-up-to" policies), which are generally accepted as reasonable heuristics for the types of problems we consider. In both case studies, we are able to generate control strategies substantially superior to the heuristics, reducing inventory costs by approximately ten percent.

D. P. Bertsekas, F. Guerriero, and R. Musmanno, "Parallel Asynchronous Label Correcting Methods for Shortest Paths," J. of Optimization Theory and Applications, Vol. 88, 1996, pp. 297-320.

Abstract: In this paper we develop parallel asynchronous implementations of some known and some new label correcting methods for finding a shortest path from a single origin to all the other nodes of a directed graph. We compare these implementations on a shared memory multiprocessor, the Alliant FX/80, using several types of randomly generated problems. Excellent (sometimes superlinear) speedup is achieved with some of the methods, and it is found that the asynchronous versions of these methods are substantially faster than their synchronous counterparts.

D. P. Bertsekas, "Generic Rank One Corrections for Value Iteration in Markovian Decision Problems," Operations Research Letters, Vol. 17, 1995, pp. 111-119.

Abstract: Given a linear iteration of the form $x:=F(x)$, we consider modified versions of the form $x:=F(x+\g d)$, where $d$ is a fixed direction, and $\g$ is chosen to minimize the norm of the residual $\|x+\g d-F(x+\g d)\|$. We propose ways to choose $d$ so that the convergence rate of the modified iteration is governed by the subdominant eigenvalue of the original. In the special case where $F$ relates to a Markovian decision problem, we obtain a new extrapolation method for value iteration. In particular, our method accelerates the Gauss-Seidel version of the value iteration method for discounted problems in the same way that MacQueen's error bounds accelerate the standard version. Furthermore, our method applies equally well to Markov Renewal and undiscounted problems.

D. P. Bertsekas, "A Counterexample to Temporal Difference Learning," Neural Computation, Vol. 7, 1995, pp. 270-279.

Abstract: Sutton's TD(lambda) method aims to provide a representation of the cost function in an absorbing Markov chain with transition costs. A simple example is given where the representation obtained depends on $\l$. For $\l=1$ the representation is optimal with respect to a least squares error criterion, but as $\l$ decreases towards 0 the representation becomes progressively worse and, in some cases, very poor. The example suggests a need to understand better the circumstances under which TD(0) and Q-learning obtain satisfactory neural network-based compact representations of the cost function. A variation of TD(0) is also proposed, which performs better on the example.

D. P. Bertsekas and J. N. Tsitsiklis, "An Analysis of Stochastic Shortest Path Problems," Mathematics of Operations Research, Vol. 16, 1991, pp. 580-595.

Abstract: We consider a stochastic version of the classical shortest path problem whereby for each node of a graph, we must choose a probability distribution over the set of successor nodes so as to reach a certain destination node with minimum expected cost. The costs of transition between successive nodes can be positive as well as negative. We prove natural generalizations of the standard results for the deterministic shortest path problem, and we extend the corresponding theory for undiscounted finite state Markovian decision problems by removing the usual restriction that costs are either all nonnegative or all nonpositive.

D. P. Bertsekas and D. A. Castanon, "Adaptive Aggregation Methods for Infinite Horizon Dynamic Programming," IEEE Transactions on Aut. Control, Vol. 34, 1989, pp. 589-598.

Abstract: We propose a class of iterative aggregation algorithms for solving infinite horizon dynamic programming problems. The idea is to interject aggregation iterations in the course of the usual successive approximation method. An important new feature that sets our method apart from earlier proposals is that the aggregate groups of states change adaptively from one aggregation iteration to the next, depending on the progress of the computation. This allows acceleration of convergence in difficult problems involving multiple ergodic classes for which methods using fixed groups of aggregate states are ineffective. No knowledge of special problem structure is utilized by the algorithms.

D. P. Bertsekas, "Distributed Dynamic Programming," IEEE Transactions on Aut. Control, Vol. AC-27, 1982, pp. 610-616.

Abstract: We consider distributed algorithms for solving dynamic programming problems whereby several processors participate simultaneously in th computation while maintaining coordination by information exchange via communication links. A model of asynchronous distributed computation is developed which requires very weak assumptions on the ordering of computations,the timing of information exchange,the amount of local information needed at each computation node, and the initial condition for the algorithm. the class of problems considered is very broad and includes shortest path problems, and finite and infinite horizon stochastic optimal control problems. When specialized to the shortest path problem, the algorithm reduces to the algorithm originally implemented for routing messages in the internet.

D. P. Bertsekas, and S. E. Shreve "Existence of Optimal Stationary Policies in Deterministic Optimal Control," J. of Math Analysis and Applications, Vol. 69, 1979, pp. 607-620.

Abstract: This paper considers deterministic discrete-time optimal control problems over an infinite horizon involving a stationary system and a nonpositive cost per stage. Various results are provided relating to existence of an epsilon-optimal stationary policy, and existence of an optimal stationary policy assuming an optimal policy exists.

S. E. Shreve, and D. P. Bertsekas, "Alternative Theoretical Frameworks for Finite Horizon Discrete-Time Stochastic Optimal Control," SIAM J. on Control and Optimization, Vol. 16, 1978, pp. 953-978.

Abstract: Stochastic optimal control problems are usually analyzed under one of three types of assumptions: a) Countability assumptions on the underlying probability space - this eliminates all difficulties of measure theoretic nature; b) Semicontinuity assumptions under which the existence of optimal Borel measurable policies can be guaranteed; and c) Borel measurability assumptions under which the existence of p-optimal or p-epsilon-optimal Borel measurable policies can be guaranteed (Blackwell, Strauch). In this paper, we introduce a general theoretical framework based on outer integration which contains these three models as special cases. Within this framework all known results for finite horizon problems together with some new ones are proved and subsequently specialized. An important new feature of our specialization to the Borel measurable model is the introduction of universally measurable policies. We show that everywhere optimal or nearly optimal policies exist within this class and this enables us to dispense with the notion of p-optimality.

D. P. Bertsekas, "Monotone Mappings with Application in Dynamic Programming," SIAM J. on Control and Optimization, Vol. 15, 1977, pp. 438-464; a version from the Proceedings of the 1975 IEEE Conference on Decision and Control, Houston, TX: "Monotone Mappings in Dynamic Programming."

Abstract: The structure of many sequential optimization problems over a finite or infinite horizon can be summarized in the mapping that defines the related dynamic programming algorithm. In this paper, we take as a starting point this mapping and obtain results that are applicable to a broad class of problems. This approach has also been taken earlier by Denardo under contraction assumptions. The analysis here is carried out without contraction assumptions and thus the results obtained can be applied, for example, to the positive and negative dynamic programming models of Blackwell and Strauch. Most of the existing results for these models are generalized and several new results are obtained relating mostly to the convergence of the dynamic programming algorithm and the existence of optimal stationary policies.

D. P. Bertsekas, "On Error Bounds for Successive Approximation Methods," IEEE Transactions on Aut. Control, June 1976, pp. 394-396.

Abstract: This note considers a class of contraction mappings and the successive approximation method for obtaining the associated fixed points. Some error bounds are provided which genedie and strengthen those given by McQueen [ l ] and Denardo [2] for dynamic programming algorithms.

D. P. Bertsekas, "Convergence of Discretization Procedures in Dynamic Programming," IEEE Transactions on Aut. Control, June 1975, pp. 415-419.

Abstract: The computational solution of discrete-time stochastic optimal control problems by dynamic programming requires, in most cases, discretization of the state and control spaces whenever these spaces are infinite. In this short paper we consider a discretization procedure often employed in practice. Under certain compactness and Lipschitz continuity assumptions we show that the solution of the discretized algorithm converges to the solution of the continuous algorithm, as the discretization grid becomes finer and finer. Furthermore, any control law obtained from the discretized algorithm results in a value of the cost functional which converges to the optimal value of the problem.

D. P. Bertsekas, "Linear Convex Stochastic Control Problems Over an Infinite Horizon," IEEE Transactions on Aut. Control, Vol. AC-18, 1973, pp. 314-315.

Abstract: A stochastic control problem over an infinite horizon which involves a linear system and a convex cost functional is analyzed. We prove the convergence of the dynamic programming algorithm associated with the problem, and we show the existence of a stationary Borel measurable optimal control law. The approach used illustrates how results on infinite time reachability [1] can be used for the analysis of dynamic programming algorithms over an infinite horizon subject to state constraints.

D. P. Bertsekas, "Control of Uncertain Systems with a Set-Membership Description of the Uncertainty," Ph.D. Thesis, Dept. of Electrical Engineering, M.I.T., 1971.

Abstract: The problem of optimal feedback control of uncertain discrete-time dynamic systems is considered, where the uncertain quantities do not have a stochastic description but instead they are known to belong to given sets. The problem is converted to a sequential minimax problem and dynamic programming is suggested as a general method for its solution. The notion of a sufficiently informative function, which parallels the notion of a sufficient statistic of optimal control is introduced, and the possible decomposition of the optimal controller into an estimator and an actuator is demonstrated. Some special cases involving a linear system are further examined. A problem involving a convex cost functional and perfect state information for the controller is considered in detail. Particular attention is given to a special case, the problem of reachability of a target tube, and an ellipsoidal approximation algorithm is obtained which leads to linear control laws. State estimation problems are also examined, and some algorithms are derived which offer distinct advantages over existing estimation schemes. These algorithms are subsequently used in the solution of some reachability problems with imperfect state information for the controller.

Lecture Slides, Survey Papers, Research papers, Top Menu

Nonlinear Programming and Optimization Applications

D. P. Bertsekas, "Incremental Aggregated Proximal and Augmented Lagrangian Algorithms," Lab. for Information and Decision Systems Report LIDS-P-3176, MIT, June 2015. Incorporated into the author's book "Nonlinear Programming," 3rd edition, Athena Scientific, 2016. (Related Video Lecture) . We consider minimization of the sum of a large number of convex functions, and we propose an incremental aggregated version of the proximal algorithm, which bears similarity to the incremental aggregated gradient and subgradient methods that have received a lot of recent attention. Under cost function differentiability and strong convexity assumptions, we show linear convergence for a sufficiently small constant stepsize. We then consider dual versions of incremental proximal algorithms, which are incremental augmented Lagrangian methods for separable constrained optimization problems. Contrary to the standard augmented Lagrangian method, these methods admit decomposition in the minimization of the augmented Lagrangian. The incremental aggregated augmented Lagrangian method also bears similarity to several known decomposition algorithms, which, however, are not incremental in nature: the alternating direction method of multipliers (ADMM), the Stephanopoulos and Westerberg algorithm [StW75], and the related methods of Tadjewski [Tad89] and Ruszczynski [Rus95].

M. Wang and D. P. Bertsekas, "Stochastic First-Order Methods with Random Constraint Projection," SIAM Journal on Optimization, 2016, Vol. 26, pp. 681-717. (Related Lecture Slides) (Related Video Lecture) .

We consider convex optimization problems with structures that are suitable for sequential treatment or online sampling. In particular, we focus on problems where the objective function is an expected value, and the constraint set is the intersection of a large number of simpler sets. We propose an algorithmic framework for stochastic first-order methods using random projection/proximal updates and random constraint updates, which contain as special cases several known algorithms as well as many new algorithms. To analyze the convergence of these algorithms in a unified manner, we prove a general coupled convergence theorem. It states that the convergence is obtained from an interplay between two coupled processes: progress toward feasibility and progress toward optimality. Under suitable stepsize assumptions, we show that the optimality error decreases at a rate of O(1/ \sqrt{k}) and the feasibility error decreases at a rate of O(log k/k). We also consider a number of typical sampling processes for generating stochastic first-order information and random constraints, which are common in data-intensive applications, online learning, and simulation optimization. By using the coupled convergence theorem as a modular architecture, we are able to analyze the convergence of stochastic algorithms that use arbitrary combinations of these sampling processes.

M. Wang and D. P. Bertsekas, "Incremental Constraint Projection Methods for Variational Inequalities," Lab. for Information and Decision Systems Report LIDS-P-2898, MIT, December 2012; Mathematical Programming, Vol. 150, 2015, pp. 321-363. We consider the solution of strongly monotone variational inequalities of the form $F(x^*)'(x-x^*)\geq 0$, for all $x\in X$. We focus on special structures that lend themselves to sampling, such as when $X$ is the intersection of a large number of sets, and/or $F$ is an expected value or is the sum of a large number of component functions. We propose new methods that combine elements of incremental constraint projection and stochastic gradient. We analyze the convergence and the rate of convergence of these methods with various types of sampling schemes, and we establish a substantial rate of convergence advantage for random sampling over cyclic sampling.

D. P. Bertsekas, "Incremental Proximal Methods for Large Scale Convex Optimization," Mathematical Programming, Vol. 129, 2011, pp.163-195. (Related Lecture Slides). (Related Video Lecture) .

Abstract: We consider the minimization of a sum $\sum_{i=1}^mf_i(x)$ consisting of a large number of convex component functions $f_i$. For this problem, incremental methods consisting of gradient or subgradient iterations applied to single components have proved very effective. We propose new proximal versions of incremental methods, consisting of proximal iterations applied to single components, as well as combinations of gradient, subgradient, and proximal iterations. We provide a convergence and rate of convergence analysis of a variety of such methods, including some that involve randomization in the selection of components. We also discuss applications in various contexts, including signal processing and inference/machine learning.

D. P. Bertsekas, "Centralized and Distributed Newton Methods for Network Optimization and Extensions," Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011.

Abstract: We consider Newton methods for common types of single commodity and multi-commodity network flow problems. Despite the potentially very large dimension of the problem, they can be implemented using the conjugate gradient method and low-dimensional network operations, as shown nearly thirty years ago. We revisit these methods, compare them to more recent proposals, and describe how they can be implemented in a distributed computing system. We also discuss generalizations, including the treatment of arc gains, linear side constraints, and related special structures.

Abstract: We propose a unifying framework for polyhedral approximation in convex optimization. It subsumes classical methods, such as cutting plane and simplicial decomposition, but also includes new methods, and new versions/extensions of old methods, such as a simplicial decomposition method for nondifferentiable optimization, and a new piecewise linear approximation method for convex single commodity network flow problems. Our framework is based on an extended form of monotropic programming, a broadly applicable model, which includes as special cases Fenchel duality and Rockafellar's monotropic programming, and is characterized by an elegant and symmetric duality theory. Our algorithm combines flexibly outer and inner linearization of the cost function. The linearization is progressively refined by using primal and dual differentiation, and the roles of outer and inner linearization are reversed in a mathematically equivalent dual algorithm. We provide convergence results and error bounds for the general case where outer and inner linearization are combined in the same algorithm.

A. Nedich and D. P. Bertsekas, "The Effect of Deterministic Noise in Subgradient Methods," Math. Programming, Ser. A, Vol. 125, pp. 75-99, 2010.

Abstract: In this paper, we study the influence of noise on subgradient methods for convex constrained optimization. The noise may be due to various sources, and is manifested in inexact computation of the subgradients. Assuming that the noise is deterministic and bounded, we discuss the convergence properties for two cases: the case where the constraint set is compact, and the case where this set need not be compact but the objective function has a sharp set of minima (for example the function is polyhedral). In both cases, using several different stepsize rules, we prove convergence to the optimal value within some tolerance that is given explicitly in terms of the subgradient errors. In the first case, the tolerance is nonzero, but in the second case, somewhat surprisingly, the optimal value can be obtained exactly, provided the size of the error in the subgradient computation is below some threshold. We then extend these results to objective functions that are the sum of a large number of convex functions, in which case an incremental subgradient method can be used.

Yu, H., Bertsekas, D. P., and Rousu, J., An Efficient Discriminative Training Method for Generative Models," Extended Abstract, the 6th International Workshop on Mining and Learning with Graphs (MLG), 2008.

Abstract: We propose an efficient discriminative training method for generative models under supervised learning. In our setting, fully observed instances are given as training examples, together with a specification of variables of interest for prediction. We formulate the training as a convex programming problem, incorporating the SVM-type large margin constraints to favor parameters under which the maximum a posteriori (MAP) estimates of the prediction variables, conditioned on the rest, are close to their true values given in the training instances. The resulting optimization problem is, however, more complex than its quadratic programming (QP) counterpart resulting from the SVM-type training of conditional models, because of the presence of non-linear constraints on the parameters. We present an efficient optimization method, which combines several techniques, namely, a data-dependent reparametrization of dual variables, restricted simplicial decomposition, and the proximal point algorithm. Our method extends the one for solving the aforementioned QP counterpart, proposed earlier by some of the authors.

D. P. Bertsekas, "Extended Monotropic Programming and Duality," Lab. for Information and Decision Systems Report 2692, MIT, March 2006, corrected in Feb. 2010; a version appeared in JOTA, 2008, Vol. 139, pp. 209-225.

Abstract: We consider the separable problem of minimizing $\sum_{i=1}^mf_{i}(x_{i})$ subject to $x\in S$, where $x_i$ are multidimensional subvectors of $x$, $f_i$ are convex functions, and $S$ is a subspace. Monotropic programming, extensively studied by Rockafellar, is the special case where the subvectors $x_i$ are the scalar components of $x$. We show a strong duality result that parallels Rockafellar's result for monotropic programming, and contains other known and new results as special cases. The proof is based on the use of $\e$-subdifferentials and the $\e$-descent method, which is used here as an analytical vehicle.

D. P. Bertsekas, and P. Tseng, "Set Intersection Theorems and Existence of Optimal Solutions," Lab. for Information and Decision Systems Report 2628, MIT, Nov. 2004; revised August 2005; Math. Programming J., Vol. 110, 2007 pp. 287-314.

Abstract: The question of nonemptiness of the intersection of a nested sequence of closed sets is fundamental in a number of important optimization topics, including the existence of optimal solutions, the validity of the minimax inequality in zero sum games, and the absence of a duality gap in constrained optimization. We introduce the new notion of an asymptotic direction of a sequence of closed sets, and the associated notions of retractive, horizon, and critical directions, and we provide several conditions that guarantee the nonemptiness of the corresponding intersection. We show how these conditions can be used to obtain simple proofs of some known results on existence of optimal solutions, and to derive some new results, including an extension of the Frank-Wolfe Theorem for (nonconvex) quadratic programming. (Related Slide Presentation)

D. P. Bertsekas, "Lagrange Multipliers with Optimal Sensitivity Properties in Constrained Optimization," Lab. for Information and Decision Systems Report 2632, MIT, October 2004; in Proc. of the 2004 Erice Workshop on Large Scale Nonlinear Optimization, Erice, Italy, Kluwer, 2005.

We consider optimization problems with inequality and abstract set constraints, and we derive sensitivity properties of Lagrange multipliers under very weak conditions. In particular, we do not assume uniqueness of a Lagrange multiplier or continuity of the perturbation function. We show that the Lagrange multiplier of minimum norm defines the optimal rate of improvement of the cost per unit constraint violation. (Related Slide Presentation), (Related Slide Presentation)

D. P. Bertsekas, A. E. Ozdaglar, and P. Tseng, "Enhanced Fritz John Conditions for Convex Programming," Lab. for Information and Decision Systems Report 2631, MIT, July 2004; SIAM J. on Optimization, Vol. 16, 2006, p. 766.

Abstract: We consider convex constrained optimization problems, and we enhance the classical Fritz John optimality conditions to assert the existence of multipliers with special sensitivity properties. In particular, we prove the existence of Fritz John multipliers that are informative in the sense that they identify constraints whose relaxation, at rates proportional to the multipliers, strictly improves the primal optimal value. Moreover, we show that if the set of geometric multipliers is nonempty, then the minimum-norm vector of this set is informative, and defines the optimal rate of cost improvement per unit constraint violation. Our assumptions are very general, and do not include the absence of duality gap or the existence of optimal solutions. In particular, for the case where there is a duality gap, we establish enhanced Fritz John conditions involving the dual optimal value and dual optimal solutions. (Related Slide Presentation)

A. E. Ozdaglar and D. P. Bertsekas, "The Relation Between Pseudonormality and Quasiregularity in Constrained Optimization," Optimization Methods and Software, Vol. 19 (2004), pp. 493--506.

Abstract: We consider optimization problems with equality, inequality, and abstract set constraints. We investigate the relations between various characteristics of the constraint set that imply the existence of Lagrange multipliers. For problems with no abstract set constraint, the classical condition of quasiregularity provides the connecting link between the most common constraint qualifications and existence of Lagrange multipliers. In earlier work, we introduced a new and general condition, pseudonormality, that is central within the theory of constraint qualifications, exact penalty functions, and existence of Lagrange multipliers. In this paper, we explore the relations between pseudonormality, quasiregularity, and existence of Lagrange multipliers in the presence of an abstract set constraint. In particular, under a regularity assumption on the abstract constraint set, we show that pseudonormality implies quasiregularity. However, contrary to pseudonormality, quasiregularity does not imply the existence of Lagrange multipliers, except under additional assumptions.

Abstract: In this paper, we propose methods for solving broadly applicable integer multicommodity flow problems. We focus in particular on the problem of routing and wavelength assignment (RWA), which is critically important for increasing the efficiency of wavelength-routed all-optical networks. Our methodology can be applied as a special case to the problem of routing in a circuit-switched network. We discuss an integer-linear programming formulation, which can be addressed with highly efficient linear (not integer) programming methods, to obtain optimal or nearly optimal solutions. Note: A comparative computational evaluation of the methodology of this paper is given in the thesis by Ali Meli.

A. E. Ozdaglar and D. P. Bertsekas, "Routing and Wavelength Assignment in Optical Networks," Report LIDS-P-2535, Dec. 2001; IEEE Trans. on Networking, no. 2, Apr. 2003, pp. 259-272.

Abstract: The problem of routing and wavelength assignment (RWA) is critically important for increasing the efficiency of wavelength-routed all-optical networks. Given the physical network structure and the required connections, the RWA problem is to select a suitable path and wavelength among the many possible choices for each connection so that no two paths sharing a link are assigned the same wavelength. In work to date, this problem has been formulated as a difficult integer programming problem that does not lend itself to efficient solution or insightful analysis. In this work, we propose several novel optimization problem formulations that offer the promise of radical improvements over the existing methods. We adopt a (quasi-)static view of the problem and propose new integer-linear programming formulations, which can be addressed with highly efficient linear (not integer) programming methods and yield optimal or near-optimal RWA policies. The fact that this is possible is surprising, and is the starting point for new and greatly improved methods for RWA. Aside from its intrinsic value, the quasi-static solution method can form the basis for suboptimal solution methods for the stochastic/dynamic settings. Note: A comparative computational evaluation of the methodology of this paper is given in the thesis by Ali Meli.

D. P. Bertsekas and A. E. Ozdaglar, "Pseudonormality and a Lagrange Multiplier Theory for Constrained Optimization," Report LIDS-P-2489, Nov. 2000; JOTA, Vol. 114, (2002), pp. 287--343.

Abstract: We consider optimization problems with equality, inequality, and abstract set constraints, and we explore various characteristics of the constraint set that imply the existence of Lagrange multipliers. We prove a generalized version of the Fritz-John theorem, and we introduce new and general conditions that extend and unify the major constraint qualifications. Among these conditions, two new properties, pseudonormality and quasinormality, emerge as central within the taxonomy of interesting constraint characteristics. In the case where there is no abstract set constraint, these properties provide the connecting link between the classical constraint qualifications and two distinct pathways to the existence of Lagrange multipliers: one involving the notion of quasiregularity and Farkas' Lemma, and the other involving the use of exact penalty functions. The second pathway also applies in the general case where there is an abstract set constraint.(Related Slide Presentation)

P. Tseng and D. P. Bertsekas, "An Epsilon-Relaxation Method for Separable Convex Cost Generalized Network Flow Problems," Math. Programming, Vol. 88, (2000), pp. 85--104.

Abstract: We generalize the epsilon-relaxation method for the single commodity, linear or separable convex cost network flow problem to network flow problems with positive gains. The method maintains epsilon-complementary slackness at all iterations and adjusts the arc flows and the node prices so as to satisfy flow conservation upon termination. Each iteration of the method involves either a price change on a node or a flow change along an arc or a flow change along a simple cycle. Complexity bounds for the method are derived. For one implementation employing epsilon-scaling, the bound is polynomial in the number of nodes N, the number of arcs A, a certain constant Gamma depending on the arc gains, and ln(epsilon^0/\bar epsilon), where epsilon^0 and \bar epsilon denote, respectively, the initial and the final tolerance epsilon.

D. P. Bertsekas and A. E. Koksal-Ozdaglar, "Enhanced Optimality Conditions and Exact Penalty Functions," Proc. of the 38th Allerton Conference on Communication, Control, and Computing, Allerton Park, Ill., September 2000.

A. Nedic and D. P. Bertsekas, "Incremental Subgradient Methods for Nondifferentiable Optimization," Report LIDS-P-2460, Dec. 2000, SIAM J. on Optimization, Vol. 12, 2001, pp. 109-138.

A. Nedic, D. P. Bertsekas, and V. Borkar, Distributed Asynchronous Incremental Subgradient Methods, Proceedings of the March 2000 Haifa Workshop ``Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications", D. Butnariu, Y. Censor, and S. Reich, Eds., Elsevier, Amsterdam, 2001.

Abstract: We propose and analyze a distributed asynchronous subgradient method for minimizing a convex function that consists of the sum of a large number of component functions. The idea is to distribute the computation of the component subgradients among a set of processors, which communicate only with a coordinator. The coordinator performs the subgradient iteration incrementally and asynchronously, by taking steps along the subgradients of the component functions that are available at the update time. The incremental approach has performed very well in centralized computation, and the parallel implementation should improve its performance substantially, particularly for typical problems where computation of the component subgradients is relatively costly.

A. Nedic and D. P. Bertsekas, "Convergence Rate of Incremental Subgradient Algorithms," Stochastic Optimization: Algorithms and Applications (S. Uryasev and P. M. Pardalos, Editors), pp. 263-304, Kluwer Academic Publishers, 2000.

Abstract: We consider a class of subgradient methods for minimizing a convex function that consists of the sum of a large number of component functions. This type of minimization arises in a dual context from Lagrangian relaxation of the coupling constraints of large scale separable problems. The idea is to perform the subgradient iteration incrementally, by sequentially taking steps along the subgradients of the component functions, with intermediate adjustment of the variables after processing each component function. This incremental approach has been very successful in solving large differentiable least squares problems, such as those arising in the training of neural networks, and it has resulted in a much better practical rate of convergence than the steepest descent method. In this paper, we present convergence results and estimates of the convergence rate of a number of variants of incremental subgradient methods, including some that use randomization. The convergence rate estimates are consistent with our computational results, and suggests that the randomized variants perform substantially better than their deterministic counterparts.

D. P. Bertsekas, and J. N. Tsitsiklis, "Gradient Convergence in Gradient Methods with Errors," Report LIDS-P-2404, Lab. for Info. and Decision Systems, November 1997, SIAM J. on Optimization, Vol. 10, 2000, pp. 627-642.

Abstract: For the classical gradient method and several deterministic and stochastic variants for unconstrained minimization, we discuss the issue of convergence of the gradient sequence and the attendant issue of stationarity of limit points of the iterate sequence. We assume that the cost function gradient is Lipschitz continuous, and that the stepsize diminishes to 0 and satisfies standard stochastic approximation conditions. We show that either the cost sequence diverges to - infinity or else the cost sequence converges to a finite value and the gradient sequence converges to 0 (with probability 1 in the stochastic case). Existing results assume various boundedness conditions, such as boundedness of the cost sequence or the gradient sequence or the iterate sequence, which we do not assume.

D. P. Bertsekas, "A Note on Error Bounds for Convex and Nonconvex Programs, " COAP (Comp. Optimization and Applications), Vol. 12, 1999, pp. 41-51.

Abstract: Given a single feasible solution $x_F$ and a single infeasible solution $x_I$ of a mathematical program, we provide an upper bound to the optimal dual value. We assume that $x_F$ satisfies a weakened form of the Slater condition. We apply the bound to convex programs and we discuss its relation to Hoffman-like bounds. As a special case, we recover a bound due to Mangasarian [Man97] on the distance of a point to a convex set specified by inequalities.

C. C. Wu and D. P. Bertsekas, "Distributed Power Control Algorithms for Wireless Networks," IEEE Trans. on Vehicular Technology, Vol. 50, pp. 504-514, 2001.

Abstract: Power control has been shown to be an effective way to increase capacity in wireless systems. In previous work on power control, it has been assumed that power levels can be assigned from a continuous range. In practice, however, power levels are assigned from a discrete set. In this work, we consider the minimization of the total power transmitted over given discrete sets of available power levels subject to maintaining an acceptable signal quality for each mobile. We have developed distributed iterative algorithms for solving a more general version of this integer programming problem, which is of independent interest, and have shown that they find the optimal solution in a finite number of iterations which is polynomial in the number of power levels and the number of mobiles.

D. P. Bertsekas, "A New class of Incremental Gradient Methods for Least Squares," SIAM J. on Optimization, Vol. 7, 1997, pp. 913-926.

Abstract: The LMS method for linear least squares problems differs from the steepest descent method in that it processes data blocks one-by-one, with intermediate adjustment of the parameter vector under optimization. This mode of operation often leads to faster convergence when far from the eventual limit, and to slower (sublinear) convergence when close to the optimal solution. We embed both LMS and steepest descent, as well as other intermediate methods, within a one-parameter class of algorithms, and we propose a hybrid class of methods that combine the faster early convergence rate of LMS with the faster ultimate linear convergence rate of steepest descent. These methods are well-suited for neural network training problems with large data sets.

D. P. Bertsekas, "Incremental Least Squares Methods and the Extended Kalman Filter," SIAM J. on Optimization, Vol. 6, 1996, pp. 807-822.

Abstract: In this paper we propose and analyze nonlinear least squares methods, which process the data incrementally, one data block at a time. Such methods are well suited for large data sets and real time operation, and have received much attention in the context of neural network training problems. We focus on the Extended Kalman Filter, which may be viewed as an incremental version of the Gauss-Newton method. We provide a nonstochastic analysis of its convergence properties, and we discuss variants aimed at accelerating its convergence.

D. P. Bertsekas, "Thevenin Decomposition and Network Optimization," J. of Optimization Theory and Applications, Vol. 89, 1996, pp. 1-15.

Abstract: Thevenin's theorem, one of the most celebrated results of electric circuit theory, provides a two-parameter characterization of the behavior of an arbitrarily large circuit, as seen from two of its terminals. We interpret the theorem as a sensitivity result in an associated minimum energy/network flow problem, and we abstract its main idea to develop a decomposition method for convex quadratic programming problems with linear equality constraints, such as those arising in a variety of contexts such as Newton's method, interior point methods, and least squares estimation. Like Thevenin's theorem, our method is particularly useful in problems involving a system consisting of several subsystems, connected to each other with a small number of coupling variables.

D. P. Bertsekas and P. Tseng, "Partial Proximal Minimization Algorithms for Convex Programming," SIAM J. on Optimization, Vol. 4, 1994, pp. 551-572.

Abstract: We consider an extension of the proximal minimization algorithm where only some of the minimization variables appear in the quadratic proximal term. We interpret the resulting iterates in terms of the iterates of the standard algorithm and we show a uniform descent property, which holds independently of the proximal terms used. This property is used to give simple convergence proofs of parallel algorithms where multiple processors simultaneously execute proximal iterations using different partial proximal terms. We also show that partial proximal minimization algorithms are dual to multiplier methods with partial elimination of constraints, and we establish a relation between parallel proximal minimization algorithms and parallel constraint distribution algorithms.

D. P. Bertsekas and P. Tseng, "On the Convergence of the Exponential Multiplier Method for Convex Programming," Math Programming, Vol. 60, pp. 1-19, 1993.

Abstract: In this paper, we analyze the exponential method of multipliers for convex constrained minimization problems, which operates like the usual Augmented Lagrangian method, except that it uses an exponential penalty function in place of the usual quadratic. We also analyze a dual counterpart, the entropy minimization algorithm, which operates like the proximal minimization algorithm, except that it uses a logarithmic/entropy ``proximal'' term in place of a quadratic. We strengthen substantially the available convergence results for these methods, and we derive their convergence rate when applied to linear programs.

J. Eckstein and D. P. Bertsekas, "On the Douglas-Ratchford Splitting Method and the Proximal Point Algorithm for Maximal Monotone Operators," Mathematical Programming, Vol. 55, 1992, pp. 293-318.

Abstract: This paper shows, by means of an operator called a splitting operator, that the Douglas-Ratchford splitting method for finding a zero of the sum of two monotone operators is a special case of the proximal point algorithm. Therefor, applications of Douglas-Ratchford splitting, such as the alternating direction method of multipliers for convex programming decomposition, are also special cases of the proximal point algorithm. This observation allows the unification and generalization of a variety of convex programming algorithms. By introducing a modified version of the proximal point algorithm, we derive a new, generalized alternating direction method of multipliers for convex programming. Advances of this sort illustrate the power and generality gained by adopting monotone operator theory as a conceptual framework.

P. Tseng and D. P. Bertsekas, "Relaxation Methods for Strictly Convex Costs and Linear Constraints," Mathematics of Operations Research, Vol. 16, 1991 pp. 462-481.

Abstract: Consider the problem of minimizing a stritcly convex (possibly nondifferentiable) cost subject to linear constraints. We propose a dual coordinate ascent method for this problem that uses inexact line search and either essentially cyclic or Gauss-Southwell order of coordinate relaxation. We show, under very weak conditions, that this method generates a sequence of primal vectors converging to the optimal primal solution. Under an additional regularity assumption, and assuming that the effective domain of the cost function is polyhedral, we show that a related sequence of dual vectors converges in cost to the optimal cost. If the constraint set has an interior point in the effective domain of the cost function, then this sequence of dual vectors is bounded and each of its limit points is an optimal dual solution. When the cost function is strongly convex, we show that the order of coordinate relaxation can become progressively more chaotic. These results significantly improve upon those obtained previously.

P. Tseng and D. P. Bertsekas, "Relaxation Methods for Monotropic Programs," Math. Programming, Vol. 46, (1990), pp. 127--151.

Abstract: We propose a dual ascent method for the problem of minimizing a convex, possibly nondifferentiable, separable cost subject to linear constraints. The method has properties reminiscent of the Gauss-Seidel method in numerical analysis and uses the epsilon-complemetary slackness mechanism introduced in Bertsekas, Hosein and Tseng (1987) to ensure finite convergence to near optimality. As special cases we obtain the methods in Bertsekas, Hosein and Tseng (1987) for network flow programs and the methods in Tseng and Bertsekas (1987) for linear programs.

J. Dunn and D. P. Bertsekas, "Efficient Dynamic Programming Implementations of Newton's Method for Unconstrained Optimal Control Problems," J. Opt. Theory and Appl., Vol. 63, 1989, pp. 23-38.

Abstract: Naive implementations of Newton's method for unconstrained N-stage discrete-time optimal control problems with Bolza objective functions tend to increase in cost like N^3 as N increases. However, if the inherent recursive structure of the Bolza problem is properly exploited, the cost of computing a Newton step will increase only linearly with N. The efficient Newton implementation scheme proposed here is similar to Mayne's DDP (differential dynamic programming) method but produces the Newton step exactly, even when the dynamical equations are nonlinear. The proposed scheme is also related to a Riccati treatment of the linear, two-point boundary-value problems that characterize optimal solutions. For discrete-time problems, the dynamic programming approach and the Riccati substitution differ in an interesting way; however, these differences essentially vanish in the continuous-time limit.

P. Tseng and D. P. Bertsekas, "Relaxation Methods for Linear Programs," Math. of Operations Research, Vol. 12, (1987), pp. 569--596.

Abstract: In this paper we propose a new method for solving linear programs. This method may be viewed as a generalized coordinate descent method whereby the descent directions are chosen from a finite set. The generation of the descent directions are based on results from monotropic programming theory. The method may be alternatively viewed as an extension of the relaxation method for network flow problems [1], [2]. Node labeling, cuts, and flow augmentation paths in the network case correspond to, respectively, tableau pivoting, rows of tableaus, and columns of tableaus possessing special sign patterns in the linear programming case.

E. M. Gafni, and D. P. Bertsekas, "Two-Metric Projection Methods for Constrained Optimization," SIAM J. on Control and Optimization, Vol. 22, 1984, pp. 936-964.

Abstract: This paper is concerned with the problem min{f(x)|x\in X}, where X is a convex subset of a linear space H, and f is a smooth real-valued function on H. We propose the class of methods x_{k+1}=P(x_k-\alpha_k g_k), where P denotes projection on X with respect to the Hilbert space norm ||.||, g_k denotes the Frechet derivative of f at x_k with respect to another Hilbert space norm ||.||_k on H, and \alpha_k is a positive scalar stepsize. We thus remove an important restriction in the original proposal of Goldstein, and Levitin and Poljak, where the norms ||.|| and ||.||_k must be the same. It is therefore possible to match the norm ||.|| with the structure of X so that the projection operation is simplified while at the same time reserving the option to choose ||.||_k on the basis of approximations to the Hessian of f so as to attain a typically superlinear rate of convergence. The resulting methods are particularly attractive for large-scale problems with specially structured constraint sets such as optimal control and nonlinear multi-commodity network flow problems. The latter class of problems is discussed in some detail.

D. P. Bertsekas, G. S, Lauer, N. R. Sandell, and T. A. Posbergh, "Optimal Short-Term Scheduling of Large-Scale Power Systems," IEEE Trans. on Aut. Control, Vol. AC-28, 1983, pp. 1-11.

Abstract: This paper is concerned with the long-standing problem of optimal unit commitment in an electric power system. We follow the traditional formulation of this problem which gives rise to a large-scale, dynamic, mixed-integer programming problem. We describe a solution methodology based on duality, Lagrangian relaxation and nondifferentiable optimization that has two unique features. First, computational requirements typically grow only linearly with the number of generating units. Second, the duality gap decreases in relative terms as the number of units increases, and as a result our algorithm tends to actually perform better for problems of large size. This allows for the first time consistently reliable solution of large practical problems involving several hundreds of units within realistic time constraints. Aside from the unit commitment problem, this methodology is applicable to a broad class of large-scale dynamic scheduling and resource allocation problems involving integer variables.

D. P. Bertsekas, and N. R. Sandell, "Estimates of the Duality Gap for large-Scale Separable Nonconvex Optimization Problems," Proc. of 21st IEEE Conference on Decision and Control, Volume 21, Part 1, Dec. 1982, pp. 782 - 785.

Abstract: We derive some estimates of the duality gap for separable constrained optimization problems involving nonconvex, possibly discontinuous, objective functions, and nonconvex, possibly discrete, constraint sets. The main result is that as the number of separable terms increases to infinity the duality gap as a fraction of the optimal cost decreases to zero. The analysis is related to the one of Aubin and Ekeland, and is based on the Shapley-Folkman theorem. Our assumptions are different and our estimates are sharper and more convenient for integer programming problems.

D. P. Bertsekas and E. Gafni, "Projection Methods for Variational Inequalities with Applications to the Traffic Assignment Problem," Math. Progr. Studies, Vol. 17, 1982, pp. 139-159.

Abstract: It is well known [2, 3, 16] that if $\bar T:\Rn\mapsto\Rn$ is a Lipschitz continuous, strongly monotone operator and $X$ is a closed convex set, then a solution $x^*\in X$ of the variational inequality$(x-x^*)'\bar T(x^*)\geq 0$, for all $x\in X$, can be found iteratively by means of the projection method $x_{k+1}=P_X[x_k-\alpha \bar T(x_k)]$, $x_0\in X$, provided the stepsize $\alpha$ is sufficiently small. We show that the same is true if $\bar T$ is of the form $\bar T=A'TA$ where $A:\Rn\mapsto\Rm$ is a linear mapping, provided $T:\Rm\mapsto\Rm$ is Lipschitz continuous and strongly monotone, and the set $X$ is polyhedral. This fact is used to construct an effective algorithm for finding a network flow which satisfies given demand constraints, and is positive only on paths of minimum delay or travel time.

D. P. Bertsekas, "Projected Newton Methods for Optimization Problems with Simple Constraints," SIAM J. Control and Optimization, Vol. 20, 1982, pp. 221-246.

Abstract: We consider the problem min{f(x)|x>=0} and propose algorithms of the form x_{k+1}=P(x_k-a_kD_k grad f(x_k)) where P denotes projection on the positive orthant, a_k is a stepsize chosen by an Armijo-like rule, and D_k is a positive definite symmetric matrix, which is partly diagonal. We show that D_k can be calculated simply on the basis of second derivatives of f so that the resulting Newton-like algorithm has a typically superlinear rate of convergence. With other choices of D_k convergence at a typically linear rate is obtained. The algorithms are almost as simple as their unconstrained counterparts. They are well suited for problems of large dimension such as those arising in optimal control while being competitive with existing methods for low-dimensional problems. The effectiveness of the Newton-like algorithm is demonstrated via computational examples involving as many as 10,000 variables. Extensions to general linearly constrained problems are also provided. These extensions utilize a notion of an active generalized rectangle patterned after the notion of an active manifold used in maniforld suboptimization methods, By contrast with these methods, many constraints can be added or subtracted from the binding set at each iteration without the need to solve a quadratic programming problem.

D. P. Bertsekas, "Enlarging the Region of Convergence of Newton's Method for Constrained Optimization," J. Optimization Th. and Applications, Vol. 36, 1982, pp. 221-252.

Abstract: In this paper, we consider Newton's method for solving the system of necessary optimality conditions of optimization problems with equality and inequality constraints. The principal drawbacks of the method are the need for a good starting point, the inability to distinguish between local maxima and local minima, and when inequality constraints are present, the necessity to solve a quadratic programming problem at each iteration. We show that all these drawbacks can be overcome to a great extent without sacrificing the superlinear convergence rate by making use of exact differentiable penalty functions introduced by Di Pillo and Grippo. We also show that there is a close relationship between the class of penalty functions of Di Pillo and Grippo and the class of Fletcher, and that the region of convergence of a variation of Newton's method can be enlarged by making use of one of Fletcher's penalty functions.

D. P. Bertsekas, "Convexification Procedures and Decomposition Methods for Nonconvex Optimization Problems," J. of Optimization Theory and Applications, Vol. 29, 1979, pp. 169-197.

Abstract: In order for primal-dual methods to be applicable to a constrained minimization problem, it is necessary that restrictive convexity conditions are satisfied. In this paper, we consider a procedure by means of which a nonconvex problem is convexified and transformed into one which can be solved with the aid of primal-dual methods. Under this transformation, separability of the type necessary for application of decomposition algorithms is preserved. This feature extends the range of applicability of such algorithms to nonconvex problems. Relation with multiplier methods are explored with the aid of a local version of a conjugate convex function.

D. P. Bertsekas, "Local Convex Conjugacy and Fenchel Duality," Preprints of Triennial World Congress of IFAC, Helsinki, June 1978, Vol. 2, 1978, pp. 1079-1084.

Abstract: In this paper we introduce a notion of a convex conjugate function of a nonlinear function defined on a manifold specified by nonlinear equality constraints. Under certain assumptions the conjugate is defined locally around a point and upon conjugation yields the original function. Local versions of the Fenchel duality theorem are also proved.

D. P. Bertsekas, "On the Convergence Properties of Second-Order Multiplier Methods," J. of Optimization Theory and Applications, Vol. 25, 1978, pp. 443-449.

Abstract: The purpose of this note is to provide some estimates relating to Newton-type methods of multipliers. These estimates can be used to infer that convergence in such methods can be achieved for an arbitrary choice of the initial multiplier vector by selecting the penalty parameter sufficiently large.

D. P. Bertsekas, "Approximation Procedures Based on the Method of Multipliers," J. of Optimization Theory and Applications, Vol. 23, 1977.

Abstract: In this paper, we consider a method for solving certain optimization problems with constraints, nondifferentiabilities, and other ill-conditioning terms in the cost functional by approximating them by well-behaved optimization problems. The approach is based on the method of multipliers. The convergence properties of the methods proposed can be inferred from corresponding properties of multiplier methods with partial elimination of constraints. A related analysis is provided in this paper.

D. P. Bertsekas, "Multiplier Methods: A Survey," Automatica, Vol. 12, 1976, pp. 133-145.

Abstract: The purpose of this paper is to provide a survey of convergence and rate of convergence aspects of a class of recently proposed methods for constrained minimization - the, so called, multiplier methods. The results discussed highlight the operational aspects of multiplier methods and demonstrate their significant advantages over ordinary penalty methods.

D. P. Bertsekas, "On Penalty and Multiplier Methods for Constrained Minimization," SIAM J. on Control and Optimization, Vol. 14, 1976, pp. 216-235.

Abstract: In this paper we consider a generalized class of quadratic penalty function methods for the solution of nonconvex nonlinear programming problems. This class contains as special cases both the usual quadratic penalty function method and the recently proposed multiplier method. We obtain convergence and rate of convergence results for the sequences of primal and dual variables generated. The convergence results for the multiplier method are global in nature and constitute a substantial improvement over existing local convergence results. The rate of convergence results show that the multiplier method should be expected to converge considerably faster than the pure penalty method. At the same time, we construct a global duality framework for nonconvex optimization problems. The dual functional is concave, everywhere finite, and has strong differentiability properties. Furthermore, its value, gradient and Hessian matrix within an arbitrary bounded set can be obtained by uncon- strained minimization of a certain augmented Lagrangian.

D. P. Bertsekas, "On the Goldstein-Levitin-Polyak Gradient Projection Method," IEEE Trans. on Aut. Control, Vol. AC-21, 1976.

Abstract: This paper considers some aspects of the gradient projection method proposed by Goldstein, Levitin and Polyak, and more recently, in a less general context by McCormick. We propose and analyze some convergent step-size rules to be used in conjunction with the method. These rules are similar in spirit to the efficient Armijo rule for the method of steepest descent and under mild assumptions, they have the desirable property that they identify the set of active inequality constraints in a finite number of iterations.

D. P. Bertsekas, "A New Algorithm for Solution of Resistive Networks Involving Diodes," IEEE Trans. on Circuits and Systems, Vol. CAS-23, 1976, pp. 599-608.

Abstract: The solution of electric network problems by various algorithms such as for example Newton's method is often hampered by the presence of physical diodes with steeply rising exponential characteristics which cause overflow and slow convergence during numerical computation. In this paper we propose and analyze an algorithm which bypasses these difficulties by successively approximating the steep diode characteristics by smoother exponential functions. The algorithm may be modified to be used in the presence of ideal diodes and is related to penalty and multiplier methods for constrained minimization and Davidenko's method for solving certain ill-conditioned systems of nonlinear equations.

D. P. Bertsekas, "Combined Primal-Dual and Penalty Methods for Constrained Minimization," SIAM J. on Control, Vol. 13, pp. 521-544, 1975.

Abstract: In this paper we consider a class of combined primal-dual and penalty methods often called methods of multipliers. The analysis focuses mainly on the rate of convergence of these methods. It is shown that this rate is considerably more favorable than the corresponding rate for penalty function methods. Some efficient versions of multiplier methods are also considered whereby the intermediate unconstrained minimizations involved are approximate and only asymptotically exact. It is shown that such approximation schemes may lead to a substantial deterioration of the convergence rate, and a special approximation scheme is proposed which exhibits the same rate as the method with exact minimization. Finally, we analyze the properties of the step size rule of the multiplier method in relation to other possible step sizes, and we consider a modified step size rule for the case of the convex programming problem.

D. P. Bertsekas, "Necessary and Sufficient Conditions for a Penalty Method to be Exact," Mathematical Programming, Vol. 9, pp. 87-99, 1975.

Abstract: This paper identifies necessary and sufficient conditions for a penalty method to yield an optimal solution or a Lagrange multiplier of a convex programming problem by means of a single unconstrained minimization. The conditions are given in terms of properties of the objective and constraint functions of the problem as well as the penalty function adopted. It is shown among other things that all linear programs with finite optimal value satisfy such conditions when the penalty function is quadratic.

D. P. Bertsekas, "Nondifferentiable Optimization via Approximation," in Mathematical Programming Study 3, Nondifferentiable Optimization, M. L. Balinski, P. Wolfe, (eds.), North-Holland Publ. Co., pp. 1-15, 1975.

Abstract: This paper presents a systematic approach for minimization of a wide class of nondifferentiable functions. The technique is based on approximation of the nondifferentiable function by a smooth function and is related to penalty and multiplier methods for constrained minimization. Some convergence results are given and the method is ilustrated by means of examples from nonlinear programming.

D. P. Bertsekas, "On the Method of Multipliers for Convex Programming," IEEE Transactions on Aut. Control, June 1975, pp. 385-388.

Abstract: It is known that the method of multipliers for constrained minimization can be viewed as a fixed stepsize gradient method for solving a certain dual problem. In this short paper it is shown that for convex programming problems the method converges globally for a wide range of possible stepsizes. This fact is proved for both cases where unconstrained minimization is exact and approximate. The results provide the basis for considering modifications of the basic stepsize of the method of multipliers which are aimed at acceleration of its speed of convergence. A few such modifications are discussed and some computational results are presented relating to a problem in optimal control.

D. P. Bertsekas, "Necessary and Sufficient Conditions for Existence of an Optimal Portfolio," Journal of Economic Theory, Vol. 8, No. 2, pp. 235-247, 1974.

Abstract: This paper identifies necessary and sufficient conditions for existence of a solution to a class of optimization problems under uncertainty. This class includes certain problems of optimal portfolio selection when rates of return of risky assets are uncertain, as well as problems of optimal choice of inputs and outputs by a perfectly competitive firm facing uncertain prices.

D. P. Bertsekas, "Partial Conjugate Gradient Methods for a Class of Optimal Control Problems," IEEE Trans. on Aut. Control, Vol. AC-19, 1974, pp. 209-217.

Abstract: In this paper we examine the computational aspects of a certain class of discrete-time optimal control problems. We propose and analyze two partial conjugate gradient algorithms which operate in cycles of s+1 conjugate gradient steps (s\le n = space dimension). The algorithms are motivated by the special form of the Hessian matrix of the cost functional. The first algorithm exhibits a linear convergence rate and offers some advantages over steepest descent in certain cases such as when the system is unstable. The second algorithm requires second-order information with respect to the control variables at the beginning of each cycle and exhibits (s+1)-step superlinear convergence rate. Furthermore, it solves a linear-quadratic problem in s+1 steps as compared with the mN steps (m = control space dimension, N = number of stages) required by the ordinary conjugate gradient method.

D. P. Bertsekas, "Stochastic Optimization Problems with Nondifferentiable Cost Functionals," Journal of Optimization Theory and Applications, Vol. 12, 1973. pp. 218-231.

Abstract: In this paper, we examine a class of stochastic optimization problems characterized by nondifferentiability of the objective function. It is shown that, in many cases, the expected value of the objective function is differentiable and, thus, the resulting optimization problem can be solved by using classical analytical or numerical methods. Te results are subsequently applied to the solution of a problem of economic resource allocation.

D. P. Bertsekas, and S. K. Mitter, "A Descent Numerical Method for Optimization Problems with Nondifferentiable Cost Functionals," SIAM Journal on Control, Vol. 11, 1973, pp. 637-652.

Abstract: In this paper we consider the numerical solution of convex optimization problems with nondifferentiable cost functionals. We propose a new algorithm, the epsilon-subgradient method, a large step, double iterative algorithm which converges rapidly under very general assumptions. We discuss the application of the algorithm in some problems of nonlinear programming and optimal control and we show that the epsilon-subgradient method contains as a special case a minimax algorithm due to Pschenichnyi.

B. W. Kort and D. P. Bertsekas, "A New Penalty Function Method for Constrained Minimization," Proc. of 1972 IEEE Conference on Decision and Control, New Orleans, La., 1972, pp. 162-166.

Abstract: During recent years it has been shown that the performance of penalty function methods for constrained minimization can be improved significantly by introducing gradient type iterations for solving the dual problem. In this paper we present a new penalty function algorithm of this type which offers significant advantages over existing schemes for the case of the convex programming problem. The algorithm treats inequality constraints explicitly and can also be used for the solution of general mathematical programming problems.

D. P. Bertsekas, and S. K. Mitter, "Steepest Descent for Optimization Problems with Nondifferentiable Cost Functionals," Proc. of Princeton Conference on Information Sciences and Systems, 1971, pp. 347-351.

D. P. Bertsekas, "Note on the Design of Linear Systems with Piecewise Constant Gains," IEEE Transactions on Automatic Control, 1970, pp. 262-263.

Lecture Slides, Survey Papers, Research papers, Top Menu

Network Optimization

Bertsekas, D., "New Auction Algorithms for the Assignment Problem and Extensions," ASU Research Report, Oct. 2023; arXiv:2310.03159.

Abstract: We consider the classical linear assignment problem, and we introduce new auction algorithms for its optimal and suboptimal solution. The algorithms are founded on duality theory, and are related to ideas of competitive bidding by persons for objects and the attendant market equilibrium, which underlie real-life auction processes. We distinguish between two fundamentally different types of bidding mechanisms: {\it aggressive} and {\it cooperative\/}. Mathematically, aggressive bidding relies on a notion of approximate coordinate descent in dual space, an $\e$-complementary slackness condition to regulate the amount of descent approximation, and the idea of $\e$-scaling to resolve efficiently the price wars that occur naturally as multiple bidders compete for few valuable objects. Cooperative bidding avoids price wars through detection and cooperative resolution of any competitive impasse that involves a group of persons. We discuss the relations between the aggressive and the cooperative bidding approaches, we derive new algorithms and variations that combine ideas from both of them, and we also make connections with other primal-dual methods, including the Hungarian method. Furthermore, our discussion points the way to algorithmic extensions that apply more broadly to network optimization, including shortest path, max-flow, transportation, and minimum cost flow problems with both linear and convex cost functions.

Agrawal, G., Bertsekas, D., Liu, H., "Auction-Based Learning for Question Answering over Knowledge Graphs," Information 2023, 14, 336. https://doi.org/10.3390/ info14060336.

Abstract: Knowledge graphs are graph-based data models which can represent real-time data that is constantly growing with the addition of new information. The question-answering systems over knowledge graphs (KGQA) retrieve answers to a natural language question from the knowledge graph. Most existing KGQA systems use static knowledge bases for offline training. After deployment, they fail to learn from unseen new entities added to the graph. There is a need for dynamic algorithms which can adapt to the evolving graphs and give interpretable results. In this research work, we propose using new auction algorithms for question answering over knowledge graphs. These algorithms can adapt to changing environments in real-time, making them suitable for offline and online training. An auction algorithm computes paths connecting an origin node to one or more destination nodes in a directed graph and uses node prices to guide the search for the path. The prices are initially assigned arbitrarily and updated dynamically based on defined rules. The algorithm navigates the graph from the high-price to the low-price nodes. When new nodes and edges are dynamically added or removed in an evolving knowledge graph, the algorithm can adapt by reusing the prices of existing nodes and assigning arbitrary prices to the new nodes. For subsequent related searches, the "learned" prices provide the means to "transfer knowledge" and act as a "guide" to steer it toward the lower-priced nodes. Our approach reduces the search computational effort by 60 percent in our experiments, thus making the algorithm computationally efficient. The resulting path given by the algorithm can be mapped to the attributes of entities and relations in knowledge graphs to provide an explainable answer to the query. We discuss some applications for which our method can be used.

Abstract: We consider some classical optimization problems in path planning and network transport, and we introduce new auction-based algorithms for their optimal and suboptimal solution. The algorithms are based on mathematical ideas that are related to competitive bidding for attaining market equilibrium, which underlie auction processes. However, their starting point is different, namely weighted and unweighted path construction in directed graphs, rather than assignment of persons to objects. The new algorithms have several potential advantages over existing methods: they are empirically faster in some important contexts, such as max-flow, they are well-suited for on-line replanning, and they can be adapted to distributed operation. Moreover, they can take advantage of reinforcement learning methods that use off-line training with data, as well as on-line training during real-time operation.

D. P. Bertsekas, "Centralized and Distributed Newton Methods for Network Optimization and Extensions," Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011.

A. E. Ozdaglar and D. P. Bertsekas, "Optimal Solution of Integer Multicommodity Flow Problems with Application in Optical Networks," Proc. of Symposium on Global Optimization, Santorini, Greece, June 2003; Frontiers in global optimization, pp. 411--435, Nonconvex Optim. Appl., 74, Kluwer Acad. Publ., Boston, MA, 2004.

A. E. Ozdaglar and D. P. Bertsekas, "Routing and Wavelength Assignment in Optical Networks," Report LIDS-P-2535, Dec. 2001; IEEE Trans. on Networking, no. 2, Apr. 2003, pp. 259-272.

P. Tseng, and D. P. Bertsekas, "An Epsilon-Relaxation Method for Separable Convex Cost Generalized Network Flow Problems," Math. Programming, Vol. 88, 2000, pp. 85-104.

Abstract: We generalize the epsilon-relaxation method of for the single commodity, separable convex cost network flow problem to network flow problems with positive gains. We show that the method terminates with a near optimal solution, and we provide an associated complexity analysis.

D. P. Bertsekas, L. C. Polymenakos, and P. Tseng, "Epsilon-Relaxation and Auction Methods for Separable Convex Cost Network Flow Problems," in Network Optimization, by P. M. Pardalos, D. W. Hearn, and W. W. Hager (eds.), Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, N.Y., 1998, pp. 103-126; also given in the book by Bertsekas "Network Optimization: Continuous and Discrete Models," Athena Scientific, 1998.

Abstract: We consider a generic auction method for the solution of the single commodity, separable convex cost network flow problem. This method provides a unifying framework for the $\e$-relaxation method and the auction/sequential shortest path algorithm and, as a consequence, we develop a unified complexity analysis for the two methods. We also present computational results showing that these methods are much faster than earlier relaxation methods, particularly for ill-conditioned problems.

D. P. Bertsekas, L. C. Polymenakos, and P. Tseng, "An Epsilon-Relaxation Method for Convex Network Optimization Problems," SIAM J. on Optimization, Vol. 7, 1997, pp. 853-870.

Abstract: We propose a new method for the solution of the single commodity, separable convex cost network flow problem. The method generalizes the $\e$-relaxation method developed for linear cost problems, and reduces to that method when applied to linear cost problems. We show that the method terminates with a near optimal solution, and we provide an associated complexity analysis. We also present computational results showing that the method is much faster than earlier relaxation methods, particularly for ill-conditioned problems.

D. P. Bertsekas, F. Guerriero, and R. Musmanno, "Parallel Asynchronous Label Correcting Methods for Shortest Paths," J. of Optimization Theory and Applications, Vol. 88, 1996, pp. 297-320.

D. P. Bertsekas, "An Auction Algorithm for the Max-Flow Problem," J. of Optimization Theory and Applications, Vol. 87, 1995, pp. 69-101.

Abstract: We propose a new algorithm for the max-flow problem. It consists of a sequence of augmentations along paths constructed by an auction-like algorithm. These paths are not necessarily shortest, that is, they need not contain a minimum number of arcs. However, they typically can be found with much less computation than the shortest augmenting paths used by competing methods. Our algorithm outperforms these latter methods as well as state-of-the-art preflow-push algorithms by a very large margin in tests with standard randomly generated problems.

D. P. Bertsekas, S. Pallottino, and M. G. Scutella, "Polynomial Auction Algorithms for Shortest Paths," Computational Optimization and Applications , Vol. 4, 1995, pp. 99-125.

Abstract: In this paper we consider strongly polynomial variations of the auction algorithm for the single origin/many destinations shortest path problem. These variations are based on the idea of graph reduction, that is, deleting unnecessary arcs of the graph by using certain bounds naturally obtained in the course of the algorithm. We study the structure of the reduced graph and we exploit this structure to obtain algorithms with $O\bl(n\min\{m,n\log n\}\br)$ and $O(n^2)$ running time. Our computational experiments show that these algorithms outperform their closest competitors on randomly generated dense all destinations problems, and on a broad variety of few destination problems.

D. P. Bertsekas, "An Auction/Sequential Shortest Path Algorithm for the Minimum Cost Flow Problem," Report LIDS-P-2146, Lab. for Info. and Decision Systems, Revision of Feb. 1995.

Abstract: We propose a new algorithm for the solution of the linear minimum cost network flow problem, based on a sequential shortest path augmentation approach. Each shortest path is constructed by means of the recently proposed auction/shortest path algorithm. This approach allows useful information to be passed from one shortest path construction to the next. However, the naive implementation of this approach where the length of each arc is equal to its reduced cost fails because of the presence of zero cost cycles. We overcome this difficulty by using as arc lengths $\e$-perturbations of reduced costs and by using $\e$-complementary slackness conditions in place of the usual complementary slackness conditions. We present several variants of the main algorithm, including one that has proved very efficient for the max-flow problem. We also discuss the possibility of combining the algorithm with the relaxation method and we provide encouraging computational results.

L. C. Polymenakos, and D. P. Bertsekas, "Parallel Shortest Path Auction Algorithms," Parallel Computing, Vol. 20, pp. 1221-1247, 1994.

Abstract: In this paper we discuss the parallel implementation of the auction algorithm for shortest path problems. We show that both the one-sided and the two-sided versions of the algorithm admit asynchronous implementations. We implemented the parallel schemes for the algorithm on a shared memory machine and tested its efficiency under various degrees of synchronization and for different types of problems. We discuss the efficiency of the parallel implementation of the many origins-one destination problem, the all origins-one destination problem, and the many origins-many destinations problem.

D. P. Bertsekas and P. Tseng, "RELAX-IV: A Faster Version of the RELAX Code for Solving Minimum Cost Flow Problems," Report LIDS-P-2276, 1994.

Abstract: The structure of dual ascent methods is particularly well-suited for taking advantage of good initial dual solutions of minimum cost flow problems. For this reason, these methods are extremely efficient for reoptimization and sensitivity analysis. In the absence of prior knowledge of a good initial dual solution, one may attempt to find such a solution by means of a heuristic initialization. RELAX-IV is a minimum cost flow code that combines the RELAX code of [BeT88a], [BeT88b] with an initialization based on a recently proposed auction/sequential shortest path algorithm. This initialization is shown to be extremely helpful in speeding up the solution of difficult problems, involving for example long augmenting paths, for which the relaxation method has been known to be slow. On the other hand, this initialization procedure does not significantly deteriorate the performance of the relaxation method for the types of problems where it has been known to be very fast.

D. P. Bertsekas, Mathematical Equivalence of the Auction Algorithm for Assignment and the Epsilon-Relaxation (Preflow-Push) Method for Min Cost Flow," In: Large Scale Optimization, Hager W.W., Hearn D.W., Pardalos P.M. (eds), Springer, Boston, MA.

Abstract: It is well known that the linear minimum cost flow network problem can be converted to an equivalent assignment problem. We show here that when the auction algorithm is applied to this equivalent problem with some special rules for choosing the initial object prices and the person submitting a bid at each iteration, one obtains the generic form of the e-relaxation method. The reverse equivalence is already known, that is, if we view the assignment problem as a special case of a minimum cost flow problem and we apply the e-relaxation method with some special rules for choosing the node to iterate on, we obtain the auction algorithm. Thus, the two methods are mathematically equivalent.

D. P. Bertsekas, and D. A. Castanon, "A Generic Auction Algorithm for the Minimum Cost Network Flow Problem," Computational Optimization and Applications, Vol. 2, 1993, pp. 229-260.

Abstract: In this paper we broadly generalize the assignment auction algorithm to solve linear minimum cost network flow problems. We introduce a generic algorithm, which contains as special cases a number of known algorithms, including the e-relaxation method, and the auction algorithm for assignment and for transportation problems. The generic algorithm can serve as a broadly useful framework for the development and the complexity analysis of specialized auction algorithms that exploit the structure of particular network problems. Using this framework, we develop and analyze two new algorithms, an algorithm for general minimum cost flow problems, called network auction, and an algorithm for the k node-disjoint shortest path problem.

D. P. Bertsekas, D. A. Castanon, and H. Tsaknakis, "Reverse Auction and the Solution of Asymmetric Assignment Problems," SIAM J. on Optimization, Vol. 3, 1993, pp. 268-299.

Abstract: In this paper we propose auction algorithms for solving several types of assignment problems with inequality constraints. Included are asymmetric problems with different numbers of persons and objects, and multiassignment problems, where persons may be assigned to several objects and reversely. A central new idea in all these algorithms is to combine regular auction, where persons bid for objects by raising their prices, with reverse auction, where objects compete for persons by essentially offering discounts. Reverse auction can also be used to accelerate substantially (and sometimes dramatically) the convergence of regular auction for symmetric assignment problems.

D. P. Bertsekas, "A Simple and Fast Label Correcting Algorithm for Shortest Paths," Networks, Vol. 23, pp. 703-709, 1993.

Abstract: We propose a new method for ordering the candidate nodes in label correcting methods for shortest path problems. The method is equally simple but much faster than the D' Esopo-Pape algorithm. It is similar to the threshold algorithm in that it tries to scan nodes with small labels as early as possible, and performs comparably with that algorithm. Our algorithm can also be combined with the threshold algorithm thereby considerably improving the practical performance of both algorithms.

D. P. Bertsekas and D. A. Castanon, "Parallel Primal-Dual Methods for the Minimum Cost Flow Problem," Computational Optimization and Applications, Vol. 2, pp. 317-336, 1993.

Abstract: In this paper we discuss the parallel asynchronous implementation of the classical primal-dual method for solving the linear minimum cost network flow problem. Multiple augmentations and price rises are simultaneously attempted starting from several nodes with possibly outdated price and flow information. The results are then merged asynchronously subject to rather weak compatibility conditions. We show that this algorithm is valid, terminating finitely to an optimal solution. We also present computational results using an Encore Multimax that illustrate the speedup that can be obtained by parallel implementation.

D. P. Bertsekas and D. A. Castanon, "A Forward Reverse Auction Algorithm for Asymmetric Assignment Problems," Report LIDS-P-2159, Lab. for Information and Decision Systems, also Computational Optimization and Applications, Vol. 1, pp. 277-297, 1992.

Abstract: In this paper we consider the asymmetric assignment problem and we propose a new auction algorithm for its solution. The algorithm uses in a novel way the recently proposed idea of reverse auction, where in addition to persons bidding for objects by raising their prices, we also have objects competing for persons by essentially offering discounts. In practice, the new algorithm apparently deals better with price wars than the currently existing auction algorithms. As a result it frequently does not require $\e$-scaling for good practical performance, and tends to terminate substantially (and often dramatically) faster than its competitors.

D. P. Bertsekas and D. A. Castanon, "Parallel Synchronous and Asynchronous Implementations of the Auction Algorithm," Parallel Computing, Vol. 17, pp. 707-732, 1991.

Abstract: In this paper we discuss the parallel implementation of the auction algorithm for the classical assignment problem. We show that the algorithm admits a totally asynchronous implementation and we consider several implementations on a shared memory machine, with varying degrees of synchronization. We also discuss and explore computationally the tradeoffs involved in using asynchronism to reduce the synchronization penalty.

D. P. Bertsekas, "An Auction Algorithm for Shortest Paths," SIAM J. on Optimization, Vol. 1, 1991, pp. 425-447.

Abstract: We propose a new and simple algorithm for finding shortest paths in a directed graph. In the single origin/single destination case, the algorithm maintains a single path starting at the origin, which is extended or contracted by a single node at each iteration. Simultaneously, at most one dual variable is adjusted at each iteration so as to either improve or maintain the value of a dual function. For the case of multiple origins, the algorithm is well suited for parallel computation. It maintains multiple paths that can be extended or contracted in parallel by several processors that share the results of their computations. Based on experiments with randomly generated problems on a serial machine, the algorithm outperforms substantially its closest competitors for problems with few origins and a single destination. It also seems better suited for parallel computation than other shortest path algorithms.

D. P. Bertsekas, "The auction algorithm for assignment and other network flow problems: A tutorial," ," Interfaces, Vol. 20, 1990, pp.133-149.

Abstract: The auction algorithm is an intuitive method for solving the classical assignment problem. It outperforms substantially its main competitors for important types of problems, both in the ory and in practice and is also naturally well suited for parallel computation. I derive the algorithm from first principles, ex plain its computational properties, and discuss its extensions to transportation and transshipment problems.

P. Tseng, D. P. Bertsekas and J. N. Tsitsiklis, "Partially Asynchronous Algorithms for Network Flow and Other Problems," SIAM J. on Control and Optimization, Vol. 28, 1990, pp. 678-710.

Abstract: The problem of computing a fixed point of a nonexpansive function f is considered. Sufficient conditions are provided under which a parallel, partially asynchronous implementation of the iteration x:=f(x) converges. These results are then applied to (i) quadratic programming subject to box constraints, (ii) strictly convex cost network flow optimization, (iii) an agreement and a Markov chain problem, (iv) neural network optimization, and (v) finding the least element of a polyhedral set determined by a weakly diagonally dominant, Leontief system. Finally, simulation results illustrating the attainable speedup and the effects of asynchronism are presented.

D. P. Bertsekas and D. A. Castanon, "The Auction Algorithm for the Transportation Problem," Annals of Operations Research, Vol. 20, pp. 67-96, 1989.

Abstract: This paper generalizes the auction algorithm to solve linear transportation problems. The idea is to convert the transportation problem into an assignment problem, and then to modify the auction algorithm to exploit the special structure of this problem. Computational results show that this modified version of the auction algorithm is very efficient for certain types of transportation problems.

D. P. Bertsekas and J. Eckstein, "Dual Coordinate Step Methods for Linear Network Flow Problems," Mathematical Programming, Series B, Vol. 42, 1988, pp. 203-243.

Abstract: We review a class of recently-proposed linear-cost network flow methods which are amenable to distributed implementation. All the methods in the class use the notion of epsilon-complementary slackness, and most do not explicitly manipulate any "global" objects such as paths, trees, or cuts. Interestingly, these methods have stimulated a large class of serial computational methods complexity results. We develop the basic theory of these methods and present two specific methods, the epsilon-relaxation algorithm for the minimum-cost flow problem, and the auction algorithm for the assignment problem. We show how to implement these methods with serial complexities of O(N^3 log NC) and O(NA log NC), respectively. We also discuss practical implementation issues and computational experience to date. Finally, we show how to implement epsilon-relaxation in a completely asynchronous, "chaotic" environment in which some processors compute faster than others, and there can be arbitrarily large communication delays.

D. P. Bertsekas, "The Auction Algorithm: A Distributed Relaxation Method for the Assignment Problem," Annals of Operations Research, Vol. 14, 1988, pp. 105-123.

Abstract: We propose a massively parallelizable algorithm for the classical assignment problem. The algorithm operates like an auction whereby unassigned persons bid simultaneously for objects thereby raising their prices. Once all bids are in, objects are awarded to the highest bidder. The algorithm can also be interpreted as a Jacobi - like relaxation method for solving a dual problem. Its (sequential) worst - case complexity, for a particular implementation that uses scaling, is O(NAlog(NC)), where N is the number of persons, A is the number of pairs of persons and objects that can be assigned to each other, and C is the maximum absolute object value. Computational results show that, for large problems, the algorithm is competitive with existing methods even without the benefit of parallelism. When executed on a parallel machine, the algorithm exhibits substantial speedup.

D. P. Bertsekas and P. Tseng, "The relax codes for linear minimum cost network flow problems," Annals of Operations Research, Vol. 13, 1988, pp. 125-190.

Abstract: We describe a relaxation algorithm for solving the classical minimum cost net- work flow problem. Our implementation is compared with mature state-of-the-art primal simplex and primal-dual codes and is found to be several times faster on all types of randomly generated network flow problems. Furthermore, the speed-up factor increases with problem dimension. The codes, called RELAX-II and RELAXT-II, have a facility for efficient reoptimization and sensitivity analysis, and are in the public domain.

D. P. Bertsekas and P. Tseng, "Relaxation Methods for Minimum Cost Ordinary and Generalized Network Flow Problems," Operations Research Journal, Vol. 36, 1988, pp. 93-114.

Abstract: We propose a new class of algorithms for linear network flow problems with and without gains. These algorithms are based on iterative improvement of a dual cost and operate in a manner that is reminiscent of coordinate ascent and Gauss-Seidel relaxation methods. we compare our coded implementations of these methods with mature state-of-the-art primal simplex and primal-dual codes,and find them to be several times faster on standard benchmark problems, and faster by an order of magnitude on large, randomly generated problems. Our experiments indicate that the speedup factor increases with problem dimension.

D. P. Bertsekas, P. A. Hosein, and P. Tseng, "Relaxation Methods for Network Flow Problems with Convex Arc Costs," SIAM J. on Optimization, Vol. 25, 1987.

Abstract: We consider the standard single commodity network flow problem with both linear and strictly convex possibly nondifferentiable arc costs. For the case where all arc costs are strictly convex we study the convergence of a dual Gauss-Seidel type relaxation method that is well suited for parallel computation. We then extend this method to the case where some of the arc costs are linear. As a special case we recover a relaxation method for the linear minimum cost network flow problem proposed in Bertsekas [1] and Bertsekas and Tseng [2].

D. P. Bertsekas and D. ElBaz, "Distributed Asynchronous Relaxation Algorithms for Convex Network Flow Problems," SIAM J. Control and Optimization, Vol. 25, 1987, pp. 74-85.

Abstract: We consider the solution of the single commodity strictly convex network flow prolem in a distributed asynchronous computation environment. The dual of this problem is unconstrained, differentiable, and well suited for solution via Gauss-Seidel relaxation. We show that the structure of the dual allows the successful application of a distributed asynchronous method whereby relaxation iterations are carried out in parallel by several processors in arbitrary order and with arbitrarily large interprocessor communication delays.

D. P. Bertsekas, "Distributed Relaxation Methods for Linear Network Flow Problems," Proc. of 25th CDC, Athens, Greece, 1986, pp. 2101-2106.

Abstract: We consider distributed solution of the classical linear minimum cost network flow problem. We formulate a dual problem which is unconstrained, piecewise linear, and involves a dual variable for each node. We propose a dual algorithm that resembles a Gauss-Seidel relaxation method. At each iteration the dual variable of a single node is changed based on local information from adjacent nodes. In a distributed setting each node can change its variable independently of the variables of other nodes. The algorithm is efficient for some classes of problems, notably for the max-flow problem for which it resembles a recent algorithm by Goldberg [11].

J. N. Tsitsiklis, and D. P. Bertsekas, "Distributed Asynchronous Optimal Routing in Data Networks," IEEE Trans. on Aut. Control, Vol. AC-31, 1986, pp. 325-332.

Abstract: In this paper we study the performance of a class of distributed optimal routing algorithms of the gradient projection type under weaker and more realistic assumptions than those considered thus far. In particular, we show convergence to an optimal routing without assuming synchronization of computation at all nodes and measurement of link lengths at all links, while taking into account the probability of link flow transients caused by routing updates. This demonstrates the robustness of these algorithms in a realistic distributed operating environment.

D. P. Bertsekas, "A Unified Framework for Primal-Dual Methods in Minimum Cost Network Flow Problems," Math. Programming, Vol. 32, pp. 125-145, 1985.

Abstract: We introduce a broad class of algorithms for finding a minimum cost flow in a capacitated network. the algorithms are of the primal-dual type. They maintain primal feasibility with respect to capacity constraints, while trying to satisfy the conservation of flow equation at each node by means of a wide variety of procedures based on flow augmentation, price adjustment, and ascent of a dual functional. The manner in which these procedures are combined is flexible thereby allowing the construction of algorithms that can be tailored to the problem at hand for maximum effectiveness. Particular attention is given to methods that incorporate features from classical relaxation procedures. Experimental codes based on these methods outperform by a substantial margin the fastest available primal-dual and primal simplex codes on standard benchmark problems.

E. Gafni, and D. P. Bertsekas, "Dynamic Control of Session Input Rates in Communication Networks," IEEE Trans. on Aut. Control, Vol. AC-29, 1984, pp. 1009-1016.

Abstract: We consider a distributed iterative algorithm for dynamically adjusting the input rate of each session of a voice or data network using virtual circuits so as to exercise flow control. Each session origin periodically receives information regarding the level of congestion along the session path and iteratively corrects its input rate. In this paper, we place emphasis on voice networks, but the ideas involved are also relevant for dynamic flow control in data networks. The algorithm provides for the addition of new and termination of old sessions and maintains at all times feasibility of link flows with respect to capacity constraints. Fairness with respect to all sessions is built into the algorithm and a mechanism is provided to control link utilization and average delay per packet at any desired level.

D. P. Bertsekas, E. Gafni, and R. G. Gallager, "Second Derivative Algorithms for Minimum Delay Distributed Routing in Networks," IEEE Trans. on Communications, Vol. COM-32, 1984 p. 911.

Abstract: We propose a class of algorithms for finding an optimal quasi-static routing in a communication network. The algorithms are based on Gallager's method [1] and provide methods for iteratively updating the routing table entries of each node in a manner that guarantees convergence to a minimum delay routing. Their main feature is that they utilize second derivatives of the objective function and may be viewed as approximations to a constrained version of Newton's method. The use of second derivatives results in improved speed of convergence and automatic stepsize scaling with respect to level of traffic input. These advantages are of crucial importance for the practical implementation of the algorithm using distributed computation in an environment where input traffic statistics gradually change.

D. P. Bertsekas and E. Gafni, "Projected Newton Methods and Optimization of Multicommodity Flows," IEEE Trans. on Automatic Control, Vol. AC-28, 1983, pp. 1090-1096.

Abstract: A superlinearly convergent Newton like method for linearly constrained optimization problems is adapted for solution of multicommodity network flow problems of the type arising in communication and transportation networks. We show that the method can be implemented approximately by making use of conjugate gradient iterations without the need to compute explicitly the Hessian matrix. Preliminary computational results suggest that this type of method is capable of yielding highly accurate solutions of nonlinear multicommodity flow problems far more efficiently than any of the methods available at present.

D. P. Bertsekas and E. Gafni, "Projection Methods for Variational Inequalities with Applications to the Traffic Assignment Problem," Math. Progr. Studies, Vol. 17, 1982, pp. 139-159.

D. P. Bertsekas, "A New Algorithm for the Assignment Problem," Mathematical Programming, Vol. 21, pp. 152-171, 1981.

Abstract: We propose a new algorithm for the classical assignment problem. The algorithm resembles in some ways the Hungarian method but differs substantially in other respects. The average computational complexity of an efficient implementation of the algorithm seems to be considerably better than the one of the Hungarian method. In a large number of randomly generated problems the algorithm has consistently outperformed an efficiently coded version of the Hungarian method by a broad margin. The factor of improvement increases with the problem dimension N and reaches an order of magnitude for N equal to several hundreds.

D. P. Bertsekas, "A class of optimal routing algorithms for communication networks," ICCC 80, Atlanta, GA, Oct. 1980.

Abstract: We describe an algorithm for minimum delay routing in a communication network. During the algorithm each node maintains a list of paths along which it sends traffic to each destination together with a list of the fractions of total traffic that are sent along these paths. At each iteration a minimum marginal delay path to each destination is computed and added to the current list if not already there. Simultaneously the corresponding fractions are updated in a way that reduces average delay per message. The algorithm is capable of employing second derivatives of link delay functions thereby providing automatic scaling with respect to traffic input level. It can be implemented in both a distributed and a centralized manner, and it can be shown to converge to an optimal routing at a linear rate.

D. P. Bertsekas, "A Distributed Algorithm for the Assignment Problem," Lab. for Information and Decision Systems Report, MIT, May 1979; a typeset version of the typewritten original.

Abstract: This paper describes a new algorithm for solving the classical assignment problem. The algorithm is of a primal-dual nature and in some ways resembles the Hungarian and subgradient methods, but is substantially different in other respects. Its main feature is that it is well suited for distributed operation whereby each node participates in the computation on the basis of limited local information about the topology of the network and the data of the problem. The algorithmic process resembles an auction where economic agents compete for resources by making successively higher bids. The algorithm terminates in a finite number of iterations after resource prices reach levels where no further bidding is profitable. (This is the original paper on the auction algorithm.)

Lecture Slides, Survey Papers, Research papers, Top Menu

Parallel and Distributed Algorithms

D. P. Bertsekas, "Centralized and Distributed Newton Methods for Network Optimization and Extensions," Lab. for Information and Decision Systems Report LIDS-P-2866, MIT, April 2011.

D. P. Bertsekas and J. N. Tsitsiklis, "Comment on Coordination of Groups of Mobile Autonomous Agents Using Nearest Neighbor Rules," Lab. for Information and Decision Systems Report, MIT, June 2006; to appear in IEEE Trans. on Aut. Control.

Abstract: We clarify the relation of the model and the convergence results of Jadbabaie et al. [3] to those studied by Bertsekas et al. [6, 5, 1]. We show that the update equations in [3] are a very special case of those in [5]. Furthermore, the main convergence results in [3] are special cases of those in [5], except for a small difference in the connectivity assumptions which, however, does not affect the proof.

A. Nedic, D. P. Bertsekas, and V. Borkar, Distributed Asynchronous Incremental Subgradient Methods, Proceedings of the March 2000 Haifa Workshop "Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications", D. Butnariu, Y. Censor, and S. Reich, Eds., Elsevier, Amsterdam, 2001.

S. A. Savari and D. P. Bertsekas, "Finite Termination of Asynchronous Iterative Algorithms," Parallel Computing, Vol. 22, 1996, pp. 39-56.

Abstract: We consider $n$-processor distributed systems where the $i$th processor executes asynchronously the iteration $x_i=f_i(x)$. It is natural to terminate the iteration of the $i$th processor when some local condition, such as $x_i-f_i(x)$: ``small", holds. However, local termination conditions of this type may not lead to global termination because of the asynchronous character of the algorithm. In this paper, we propose several approaches to modify the original algorithm and/or supplement it with an interprocessor communication protocol so that this difficulty does not arise.

E. A. Varvarigos and D. P. Bertsekas, "A Conflict Sense Routing Protocol and Its Performance for Hypercubes", IEEE Trans. Computers, Vol. 45, 1996, pp. 693-703 (copy of this paper available from the first author).

Abstract: We propose a new switching format for multiprocessor networks, which we call Conflict Sense Routing Protocol. This switching format is a hybrid of packet and circuit switching, and combines advantages of both. We initially present the protocol in a way applicable to a general topology. We then present an implementation of this protocol for a hypercube computer and a particular routing algorithm. We also analyze the steady-state throughput of the hypercube implementation for random node-to-node communications.

D. P. Bertsekas, D. A. Castanon, J. Eckstein, and S. Zenios, "Parallel Computing in Network Optimization", Handbooks in OR & MS, (M. O. Ball, et. al, Eds.), Vol. 7, 1995, pp. 331-399.

D. P. Bertsekas, F. Guerriero and R. Musmanno, "Parallel Shortest Path Methods for Globally Optimal Trajectories," High Performance Computing: Technology, Methods, and Applications, (J. Dongarra et.al., Eds.), Elsevier, 1995.

Abstract: In this paper we consider a special type of trajectory optimization problem that can be viewed as a continuous-space analog of the classical shortest path problem. This problem is approached by space discretization and solution of a discretized version of the associated Hamilton-Jacobi equation. It was recently shown by Tsitsiklis that some of the ideas of classical shortest path methods, such as those underlying Dijkstra's algorithm, can be applied to solve the discretized Hamilton-Jacobi equation. In more recent work, Polymenakos, Bertsekas, and Tsitsiklis have carried this analogy further to show that some efficient label correcting methods for shortest path problems, the SLF and SLF/LLL methods of Bertsekas, can be fruitfully adapted to solve the discretized Hamilton-Jacobi equation. In this paper we discuss parallel asynchronous implementations of these methods on a shared memory multiprocessor, the Alliant FX/80. Our results show that these methods are well suited for parallelization and achieve excellent speedup.

L. C. Polymenakos, and D. P. Bertsekas, "Parallel Shortest Path Auction Algorithms," Parallel Computing, Vol. 20, pp. 1221-1247, 1994.

E. A. Varvarigos and D. P. Bertsekas, "Multinode Broadcast in Hypercubes and Rings with Randomly Distributed Length of Packets," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, pp. 144-154, 1993.

Abstract: We consider a multinode broadcast (MNB) in a hypercube and in a ring network of processors. This is the communication task where we want each node of the network to broadcast a packet to all the other nodes. The communication model that we use is different than those considered in the literature so far. In particular, we assume that the lengths of the packets that are broadcast are not fixed, but are distributed according to some probabilistic rule, and we compare the optimal times required to execute the MNB for variable and for fixed packet lengths.

D. P. Bertsekas and D. A. Castanon, "Parallel Asynchronous Hungarian Methods for the Assignment Problem," ORSA J. on Computing, Vol. 5, pp. 261-274, 1993.

Abstract: In this paper we discuss the parallel asynchronous implementation of the Hungarian method for solving the classical assignment problem. Multiple augmentations and price rises are simultaneously attempted starting from several unassigned sources and using possibly outdated price and assignment information. The results are then merged asynchronously subject to rather weak compatibility conditions. We show the validity of this algorithm and we demonstrate computationally that an asynchronous implementation is often faster than its synchronous counterpart.

D. P. Bertsekas and D. A. Castanon, "Parallel Primal-Dual Methods for the Minimum Cost Flow Problem," Computational Optimization and Applications, Vol. 2, pp. 317-336, 1993.

D. P. Bertsekas and D. A. Castanon, "Parallel Synchronous and Asynchronous Implementations of the Auction Algorithm," Parallel Computing, Vol. 17, 1991, pp. 707-732.

D. P. Bertsekas, and J. N. Tsitsiklis, "Some Aspects of Parallel and Distributed Iterative Algorithms - A Survey," Automatica, Vol. 27, 1991, pp. 3-21.

Abstract: We consider iterative algorithms of the form x:=f(x), executed by a parallel or distributed computing system. We first consider synchronous executions of such iterations and study their communication requirements, as well as issues related to processor synchronization. We also discuss the parallelization of iterations of the Gauss-Seidel type. We then consider asynchronous implementations whereby each processor iterates on a different component of x, at its own pace, using the most recently received (but possibly outdated) information on the remaining components of x. While certain algorithms may fail to converge when implemented asynchronously, a large number of positive convergence results is available. We classify asynchronous algorithms into three main categories, depending on the amount of asynchronism they can tolerate, and survey the corresponding convergence results. We also discuss issues related to their termination.

D. P. Bertsekas, C. Ozveren, G. D. Stamoulis, P. Tseng, and J. N. Tsitsiklis, "Optimal Communication Algorithms for Hypercubes," J. of Parallel and Distributed Computing, Vol. 11, 1991, pp. 263-275.

Abstract: We consider the following basic communication problems in a hypercube network of processors: the problem of a single processor sending a different packet to each of the other processors, the problem of simultaneous broadcast of the same packet from every processor to all other processors, and the problem of simultaneous exchange of different packets between every pair of processors. The algorithms proposed for these problems are optimal in terms of execution time and communication resource requirements; that is, they require the minimum possible number of time steps and packet transmissions. In contrast, algorithms in the literature are optimal only within an additive or multiplicative factor.

P. Tseng, D. P. Bertsekas and J. N. Tsitsiklis, "Partially Asynchronous Algorithms for Network Flow and Other Problems," SIAM J. on Control and Optimization, Vol. 28, 1990, pp. 678-710.

D. P. Bertsekas and J. N. Tsitsiklis, "Convergence Rate and Termination of Asynchronous Iterative Algorithms", Proceedings of the 1989 International Conference on Supercomputing, Crete, Greece, pp. 461-470, June 1989.

Abstract: We consider iterative algorithms of the form z := f(z), executed by a parallel or distributed computing system. We focus on asynchronous implementations whereby each processor iterates on a different component of z, at its own pace, using the most recently received (but possibly outdated) information on the remaining components of 2. We provide results on the convergence rate of such algorithms and make a comparison with the convergence rate of the corresponding synchronous methods in which the computation proceeds in phases. We also present results on how to terminate asynchronous iterations in finite time with an approximate solution of the computational problem under consideration.

D. P. Bertsekas and J. Eckstein, "Dual Coordinate Step Methods for Linear Network Flow Problems," Mathematical Programming, Series B, Vol. 42, 1988, pp. 203-243.

D. P. Bertsekas, P. A. Hosein, and P. Tseng, "Relaxation Methods for Network Flow Problems with Convex Arc Costs," SIAM J. on Optimization, Vol. 25, 1987.

D. P. Bertsekas and D. ElBaz, "Distributed Asynchronous Relaxation Algorithms for Convex Network Flow Problems," SIAM J. Control and Optimization, Vol. 25, 1987, pp. 74-85.

E. Gafni, and D. P. Bertsekas, "Asymptotic Optimality of Shortest Path Routing Algorithms," IEEE Trans. on Information Theory, Vol. IT-33, 1987, pp. 83-90.

Abstract: Many communication networks use adaptive shortest path routing. By this we mean that each network link is periodically assigned a length that depends on its congestion level during the preceding period, and all traffic generated between length updates is routed along a shortest path corresponding to the latest link lengths. We show that in certain situations, typical of networks involving a large number of small users and utilizing virtual circuits, this routing method performs optimally in an asymptotic sense. In other cases, shortest path routing can be far from optimal.

D. P. Bertsekas, "Distributed Relaxation Methods for Linear Network Flow Problems," Proc. of 25th CDC, Athens, Greece, 1986, pp. 2101-2106.

J. N. Tsitsiklis, and D. P. Bertsekas, "Distributed Asynchronous Optimal Routing in Data Networks," IEEE Trans. on Aut. Control, Vol. AC-31, 1986, pp. 325-332.

J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, "Distributed Asynchronous Deterministic and Stochastic Gradient Optimization Algorithms," IEEE Trans. on Aut. Control, Vol. AC-31, 1986, pp. 803-812.

Abstract: We present a model for asynchronous distributed computation and then proceed to analyze the convergence of natural asynchronous distributed versions of a large class of deterministic and stochastic gradient-like algorithms. We show that such algorithms retain the desirable convergence properties of their centralized counterparts, provided that the time between consecutive interprocessor communications and the communication delays are not too large.

D. P. Bertsekas, "Distributed Asynchronous Computation of Fixed Points," Mathematical Programming, Vol. 27, 1983, pp. 107-120.

Abstract: We present an algorithmic model for distributed computation computation of fixed points whereby several processors participate simultaneously in the calculations while exchanging information via communication links. We place essentially no restrictions on the ordering of computation and communication between processors thereby allowing for completely uncoordinated execution. We provide a general convergence theorem for algorithms of this type, and demonstrate its applicability to several classes of problems, including the calculation of fixed points of contraction and monotone mappings arising in linear and nonlinear systems of equations, optimization problems, shortest path problems, and dynamic programming.

D. P. Bertsekas, "Distributed Dynamic Programming," IEEE Transactions on Aut. Control, Vol AC-27, 1982, pp. 610-616.

Abstract: We consider distributed algorithms for solving dynamic programming problems whereby several processors participate simultaneously in the computation while maintaining coordination by information exchange via communication links. A model of asynchronous distributed computation is developed which requires very weak assumptions on the ordering of computations,the timing of information exchange,the amount of local information needed at each computation node, and the initial condition for the algorithm. the class of problems considered is very broad and includes shortest path problems, and finite and infinite horizon stochastic optimal control problems. When specialized to the shortest path problem, the algorithm reduces t the algorithm originally implemented for routing messages in the internet.

Kimemia, J., Gershwin, S., and D. P. Bertsekas, "Computation of Production Control Policies by a Dynamic Programming Technique," LIDS Report LIDS-P-1236, MIT; also in Analysis and Optimization of Systems, A. Bensoussan and J. L. Lions (eds.), Springer, N. Y., pp. 243-269, 1982.

Abstract: The problem of production management for an automated manufacturing system is described. The system consists of machines that can perform a variety of tasks on a family of parts. The machines are unreliable, and the main difficulty the control system faces is to meet production requirements while machines fail and are repaired at random times. A multi-level hierarchical control algorithm is proposed which involves a stochastic optimal control problem at the first level. Optimal production policies are characterized and a computational scheme is described.

E. Gafni, and D. P. Bertsekas, "Distributed Algorithms for Generating Loop-Free Routes in Networks with Frequently Changing Topology," IEEE Trans. on Communications, Vol. COM-29, 1981, pp. 11-18.

Abstract: We consider the problem of maintaining communication between the nodes of a data network and a central station in the presence of frequent topological changes as, for example, in mobile packet radio networks. We argue that flooding schemes have significant drawbacks for such networks, and propose a general class of distributed algorithms for establishing new loop-free routes to the station for any node left without a route due to changes in the network topology. By virtue of built-in redundancy, the algorithms are typically activated very infrequently and, even when they are, they do not involve any communication within the portion of the network that has not been materially affected by a topological change.

D. P. Bertsekas, "A Distributed Algorithm for the Assignment Problem," Lab. for Information and Decision Systems Report, MIT, May 1979; a typeset version of the typewritten original.

Lecture Slides, Survey Papers, Research papers, Top Menu

Set-Membership Estimation and Control

D. P. Bertsekas, "Control of Uncertain Systems with a Set-Membership Description of the Uncertainty," Ph.D. Thesis, Dept. of Electrical Engineering, M.I.T., 1971.

D. P. Bertsekas and I. B. Rhodes, "On the Minimax Reachability of Target Sets and Target Tubes," Automatica, Vol. 7, pp. 233-241, March 1971.

Abstract: This paper is concerned with the closed-loop control of discrete-time systems in the presence of uncertainty. The uncertainty may arise as disturbances in the system dynamics, disturbances corrupting the output measurements or incomplete knowledge of the initial state of the system. In all cases, the uncertain quantities are assumed unknown except that they lie in given sets. Attention is first given to the problem of driving the system state at the final time into a prescribed target set under the worst possible combination of disturbances. This is then extended to the problem of keeping the entire state trajectory in a given target "tube." Necessary and sufficient conditions for reachability of a target set and a target tube are given in the case where the system state can be measured exactly, while sufficient conditions for reachability are given for the case where only disturbance corrupted output measurements are available. An algorithm is given for the efficient construction of ellipsoidal approximations to the sets involved and it is shown that this algorithm leads to linear control laws. The application of the results in this paper to pursuit-evasion games is also discussed.

D. P. Bertsekas and I. B. Rhodes, "Recursive State Estimation with a Set-Membership Description of the Uncertainty," IEEE Trans. on Automatic Control, Vol. AC-16, pp. 117-128, April 1971.

Abstract: This paper is concerned with the problem of estimating the state of a linear dynamic system using noise-corrupted observations, when input disturbances and observation errors are unknown except for the fact that they belong to given bounded sets. The cases of both energy constraints and individual instantaneous constraints for the uncertain quantities are considered. In the former case, the set of possible system states compatible with the observations received is shown to be an ellipsoid, and equations for its center and weighting matrix are given, while in the latter case, equations describing a bounding ellipsoid to the set of possible states are derived. All three problems of filtering, prediction, and smoothing are examined by relating them to standard tracking problems of optimal control theory. The resulting estimators are similar in structure and comparable in simplicity to the corresponding stochastic linear minimum-variance estimators, and it is shown that they provide distinct advantages over existing schemes for recursive estimation with a set-membership description of uncertainty.

D. P. Bertsekas, "Infinite Time Reachability of State Space Regions by Using Feedback Control," IEEE Trans. on Automatic Control, Vol. AC-17, pp. 604-613, October 1972.

Abstract: In this paper we consider some aspects of the problem of feedback control of a time-invariant uncertain system subject to state constraints over an infinite-time interval. The central question that we investigate is under what conditions can the state of the uncertain system be forced to stay in a specified region of the state space for all times by using feedback control. At the same time we study the behavior of the region of n-step reachability as n tends to infinity. It is shown that in general this region may exhibit instability as we pass to the limit, and that under a compactness assumption this region converges to a steady state. A special case involving a linear finite-dimensional system is examined in more detail. It is shown that there exist ellipsoidal regions in state space where the state can be confined by making use of a linear time-invariant control law, provided that the system is stabilizable. Such control laws can be calculated efficiently through the solution of a recursive matrix equation of the Riccati type.

D. P. Bertsekas, "On the Solution of Some Minimax Problems," Proceedings of the 1972 IEEE Conference on Decision and Control, pp. 328-332.

Abstract: In dynamic minimax and stochastic optimization problems frequently one is forced to use a suboptimal controller since the computation and implementation of the optimal controller based on dynamic programming is impractical in many cases. In this paper we study the performance of some suboptimal controllers in relation to the performance of the optimal feedback controller and the optimal open-loop controller. Attention is focused on some classes of, so called, open-loop-feedback controllers. It is shown under quite general assumptions that these open-loop-feedback controllers perform at least as well as the optimal open-loop controller. The results are developed for general minimax problems with perfect and imperfect state information. In the latter case the open-loop-feedback controller makes use of an estimator which is required to perform at least as well as a pure predictor in order for the results to hold. Some of the results presented have stochastic counterparts.

D. P. Bertsekas and I. B. Rhodes, "Sufficiently Informative Functions and the Minimax Feedback Control of Uncertain Dynamic Systems," IEEE Trans. on Automatic Control, Vol. AC-18, pp. 117-124, April 1973.

Abstract: The problem of optimal feedback control of uncertain discrete-time dynamic systems is considered where the uncertain quantities do not have a stochastic description but instead are known to belong to given sets. The problem is converted to a sequential minimax problem and dynamic programming is suggested as a general method for its solution. The notion of a sufficiently informative function, which parallels the notion of a sufficient statistic of stochastic optimal control, is introduced, and conditions under which the optimal controller decomposes into an estimator and an actuator are identified. A limited class of problems for which this decomposition simplifies the computation and implementation of the optimal controller is delineated.

D. P. Bertsekas, "Linear Convex Stochastic Control Problems Over an Infinite Horizon," IEEE Transactions on Aut. Control, Vol. AC-18, 1973, pp. 314-315.

Abstract: A stochastic control problem over an infinite horizon which involves a linear system and a convex cost functional is analyzed. We prove the convergence of the dynamic programming algorithm associated with the problem, and we show the existence of a stationary Borel measurable optimal control law. The approach used illustrates how results on infinite time reachability can be used for the analysis of dynamic programming algorithms over an infinite horizon subject to state constraints.

D. P. Bertsekas, "Robust Shortest Path Planning and Semicontractive Dynamic Programming," Lab. for Information and Decision Systems Report LIDS-P-2915, MIT, Feb. 2014 (revised Jan. 2015 and June 2016); Naval Research Logistics (NRL), Vol. 66, 2019 pp.15-37. In this paper we consider shortest path problems in a directed graph where the transitions between nodes are subject to uncertainty. We use a minimax formulation, where the objective is to guarantee that a special destination state is reached with a minimum cost path even under the worst possible instance of the uncertainty. Problems of this type arise, among others, in planning and pursuit-evasion contexts, and in model predictive control. Our analysis makes use of the recently developed theory of abstract semicontractive dynamic programming models. We investigate questions of existence and uniqueness of solution of the optimality equation, existence of optimal paths, and the validity of various algorithms patterned after the classical methods of value and policy iteration, as well as a new Dijkstra-like algorithm for problems with nonnegative arc lengths.

Lecture Slides, Survey Papers, Research papers, Top Menu