Dynamic Programming and Optimal Control, Vols. I and II
Dimitri P. Bertsekas
Athena Scientific, 1995

Preface

This two-volume book is based on a first-year graduate course on dynamic programming and optimal control that I have taught for over twenty years at Stanford University, the University of Illinois, and the Massachusetts Institute of Technology. The course has been typically attended by students from engineering, operations research, economics, and applied mathematics. Accordingly, a principal objective of the book has been to provide a unified treatment of the subject, suitable for a broad audience. In particular, problems with a continuous character, such as stochastic control problems, popular in modern control theory, are treated alongside problems with a discrete character, such as Markovian decision problems, popular in operations research. Furthermore, many applications and examples, drawn from a broad variety of fields, are discussed.

The book may be viewed as a greatly expanded and pedagogically improved version of my 1987 book ``Dynamic Programming: Deterministic and Stochastic Models,'' published by Prentice-Hall. I have included much new material on deterministic and stochastic shortest path problems, as well as a new chapter on continuous-time optimal control problems and the Pontryagin Maximum Principle, developed from a dynamic programming viewpoint. I have also added a fairly extensive exposition of simulation-based approximation techniques for dynamic programming. These techniques, which are often referred to as ``neuro-dynamic programming'' or ``reinforcement learning,'' represent a breakthrough in the practical application of dynamic programming to complex problems that involve the dual curse of large dimension and lack of an accurate mathematical model. Other material was also augmented, substantially modified, and updated.
With the new material, however, the book grew so much in size that it became necessary to divide it into two volumes: one on finite horizon problems, and the other on infinite horizon problems. This division was natural not only in terms of size, but also in terms of style and orientation. The first volume is more oriented towards modeling, and the second is more oriented towards mathematical analysis and computation. To make the first volume self-contained for instructors who wish to cover a modest amount of infinite horizon material in a course that is primarily oriented towards modeling, conceptualization, and finite horizon problems, I have added a final chapter that provides an introductory treatment of infinite horizon problems.

Many topics in the book are relatively independent of the others. For example, Chapter 2 of Vol.\ I on shortest path problems can be skipped without loss of continuity, and the same is true for Chapter 3 of Vol.\ I, which deals with continuous-time optimal control. As a result, the book can be used to teach several different types of courses:

\nitem{(a)} A two-semester course that covers both volumes.

\nitem{(b)} A one-semester course primarily focused on finite horizon problems that covers most of the first volume.

\nitem{(c)} A one-semester course focused on stochastic optimal control that covers Chapters 1, 4, 5, and 6 of Vol.\ I, and Chapters 1, 2, and 4 of Vol.\ II.

\nitem{(d)} A one-semester course that covers Chapter 1, about 50\% of Chapters 2 through 6 of Vol.\ I, and about 70\% of Chapters 1, 2, and 4 of Vol.\ II. This is the course I usually teach at MIT.

\nitem{(e)} A one-quarter engineering course that covers the first three chapters and parts of Chapters 4 through 6 of Vol.\ I.

\nitem{(f)} A one-quarter mathematically oriented course focused on infinite horizon problems that covers Vol.\ II.
The mathematical prerequisite for the text is knowledge of advanced calculus, introductory probability theory, and matrix-vector algebra. A summary of this material is provided in the appendixes. Naturally, prior exposure to dynamic system theory, control, optimization, or operations research will be helpful to the reader, but based on my experience, the material given here is reasonably self-contained.

The book contains a large number of exercises, and the serious reader will benefit greatly by going through them. Solutions to all exercises are compiled in a manual that is available to instructors from Athena Scientific or from the author. Many thanks are due to the several people who spent long hours contributing to this manual, particularly Steven Shreve, Eric Loiederman, Lakis Polymenakos, and Cynara Wu.

Dynamic programming is a conceptually simple technique that can be adequately explained using elementary analysis. Yet a mathematically rigorous treatment of general dynamic programming requires the complicated machinery of measure-theoretic probability. My choice has been to bypass the complicated mathematics by developing the subject in generality, while claiming rigor only when the underlying probability spaces are countable. A mathematically rigorous treatment of the subject is carried out in my monograph ``Stochastic Optimal Control: The Discrete Time Case,'' Academic Press, 1978, coauthored by Steven Shreve. This monograph complements the present text and provides a solid foundation for the subjects developed somewhat informally here.

Finally, I am thankful to a number of individuals and institutions for their contributions to the book. My understanding of the subject was sharpened while I worked with Steven Shreve on our 1978 monograph. My interaction and collaboration with John Tsitsiklis on stochastic shortest paths and approximate dynamic programming have been most valuable.
Michael Caramanis, Emmanuel Fernandez-Gaucherand, Pierre Humblet, Lennart Ljung, and John Tsitsiklis taught from versions of the book, and contributed several substantive comments and homework problems. A number of colleagues offered valuable insights and information, particularly David Castanon, Eugene Feinberg, and Krishna Pattipati. NSF provided research support. Prentice-Hall graciously allowed the use of material from my 1987 book. Teaching and interacting with the students at MIT have kept up my interest and excitement for the subject.