# Dynamic Programming Volume I: Finite States Authors: Thomas J. Sargent, John Stachurski Source: https://github.com/QuantEcon/book-dp1 Site: https://book-dp1.quantecon.org Prose license: CC-BY-SA-4.0 Code license: MIT This file concatenates the full Markdown source of the book for LLM ingestion. Equations are LaTeX inside Markdown. Cross-references and figure paths refer to the published HTML site. ======================================================================== ## index This book covers the theory of dynamic programming with finite state spaces. It is the first of a two-volume sequence by [Sargent](https://www.tomsargent.com/) and [Stachurski](https://johnstachurski.net/). ## Contents **Front Matter** - [](preface.md) - [](common_symbols.md) **Chapters** - [](ch_intro.md) — Introduction - [](ch_fps.md) — Operators and Fixed Points - [](ch_mcs.md) — Markov Chains - [](ch_opt_stop.md) — Optimal Stopping - [](ch_mdps.md) — Markov Decision Processes - [](ch_state_dep.md) — State-Dependent Dynamics - [](ch_val.md) — Valuation - [](ch_rdps.md) — Recursive Decision Processes - [](ch_adps.md) — Abstract Dynamic Programs - [](ch_ctime.md) — Continuous Time ## Companion code Source code accompanying the book is available in [`source_code_jl/`](https://github.com/QuantEcon/book-dp1/tree/main/source_code_jl) (Julia) and [`source_code_py/`](https://github.com/QuantEcon/book-dp1/tree/main/source_code_py) (Python). ## License & reuse The book is published under [CC-BY-SA-4.0](licensing.md); companion code is under [MIT](licensing.md). The authors and QuantEcon explicitly permit the use of this material for indexing, text and data mining, and AI training with attribution. See [License & AI Training Permission](licensing.md) for the full statement. ======================================================================== ## Preface ## About this book This book is about dynamic programming and its applications in economics, finance, and adjacent fields. 
It brings together recent innovations in the theory of dynamic programming and provides applications and code that can help readers approach the research frontier. The book is aimed at graduate students and researchers, although most chapters are accessible to undergraduate students with solid quantitative backgrounds. The book contains classical results on dynamic programming that are found in texts such as {cite:t}`bellman1957dynamic`, {cite:t}`denardo1981dynamic`, {cite:t}`puterman2005markov`, and {cite:t}`stokey1989recursive`, as well as extensions created by researchers and practitioners over the last few decades as they wrestled with how to formulate and solve dynamic models that can explain patterns observed in data. These extensions include recursive preferences, robust control, continuous-time models, and time-varying discount rates. Such settings often fail to satisfy contraction-mapping restrictions on which traditional methods are based. To accommodate these applications, the key theoretical chapters of this book ({prf:ref}`c-rdps` and {prf:ref}`c-adps`) adopt and extend the abstract framework of {cite:t}`bertsekas2022abstract`. This approach provides great generality while also offering transparent proofs. {prf:ref}`c-introii`--{prf:ref}`c-mcs` provide motivation and background material on solving fixed point problems and computing lifetime valuations. {prf:ref}`c-opt_stop` and {prf:ref}`c-mdps` cover optimal stopping and Markov decision processes, respectively. {prf:ref}`c-state_dep` extends the Markov decision framework to settings where discount rates vary over time. {prf:ref}`c-val` treats recursive preferences. The main theoretical results on dynamic programming from {prf:ref}`c-opt_stop` to {prf:ref}`c-state_dep` are special cases of more general results in {prf:ref}`c-rdps` and {prf:ref}`c-adps`. A brief discussion of continuous-time models can be found in {prf:ref}`c-ctime`.
Mathematically inclined readers with background in dynamic programming might prefer to start with the general results in {prf:ref}`c-rdps` and {prf:ref}`c-adps`. Indeed, it is possible to read the text in reverse, jumping to {prf:ref}`c-rdps` and {prf:ref}`c-adps` and then moving back to cover special cases according to interests. However, our teaching experience tells us that most students find the general results challenging on first pass but considerably easier after they have practiced dynamic programming through the earlier chapters. This is why we have started the presentation with special cases and ended it with general results. Instructors wishing to use this book as a text for undergraduate students can start with {prf:ref}`c-introii`, skim through {prf:ref}`c-fpt`, cover {prf:ref}`c-mcs`--{prf:ref}`c-mdps` in depth, optionally include {prf:ref}`c-state_dep`, and skip {prf:ref}`c-val`--{prf:ref}`c-ctime` entirely.

## Volume I scope

This book focuses on dynamic programs with finite-state spaces, leaving more general settings to Volume II. Restricting attention to finite states involves some costs, since there are specific settings where continuous-state models are simpler (one example being Gaussian linear-quadratic models). Moreover, many continuous-state models allow us to unleash calculus, one of humanity's most useful inventions. Nevertheless, finite-state models are extremely useful. Computational representations are always implemented using finitely many floating point numbers, and many workhorse models in economics and finance are already discrete. In addition, focusing on problems with finite state spaces allows us to avoid using function-analytic and measure-theoretic machinery and imposing associated auxiliary conditions required to ensure measurability and the existence of extrema. Without these distractions, the core theory of dynamic programming is especially simple.
For these reasons, we believe that even for sophisticated readers, a good approach to dynamic programming begins with a thorough analysis of the finite-state case. This is the task that we have tackled in Volume I.

## Companion code

Computer code is a first-class citizen in this book. Code is written in Julia and can be found at https://github.com/QuantEcon/book-dp1. We chose Julia because it is open source and because Julia allows us to write computer code that is as close as possible to the relevant mathematical equations. Julia code in the text is written to maximize clarity rather than speed. We have also written matching Python code that can be found in the same repository. When combined with appropriate scientific libraries, Python is very practical and efficient for dynamic programming, but implementations tend to be library specific and are sometimes not as clean as those in Julia. That is why we chose Julia for programs embedded in the text. We have tried to mix rigorous theory with exciting applications. Despite the various layers of abstraction used to unify the theory, the results are practical, being motivated by important optimization problems from economics and finance. This book is one of several being written in partnership with the QuantEcon organization, with funding generously provided by Schmidt Futures (see acknowledgments). There is some overlap with the first book in the series, {cite:t}`sargent2022economic`, particularly on the topic of Markov chains. Although repetition is sometimes undesirable, we decided that some overlap would be useful, since it saves readers from having to jump between two documents.

## Acknowledgments

We are greatly indebted to Jim Savage and Schmidt Futures for generous financial support, as well as to Shu Hu, Smit Lunagariya, Maanasee Sharma, and Chien Yeh for outstanding research assistance. We are grateful to Makoto Nirei for hosting John Stachurski at the University of Tokyo in June and July 2022, where significant progress was made.
We also thank Quentin Batista, Fernando Cirelli, Chase Coleman, Yihong Du, Ippei Fujiwara, Saya Ikegawa, Fazeleh Kazemian, Yuchao Li, Dawie van Lill, Qingyin Ma, Simon Mishricky, Pietro Monticone, Shinichi Nishiyama, Flint O'Neil, Zejin Shi, Akshay Shanker, Arnav Sood, Alexis Akira Toda, Natasha Watkins, Jingni Yang, and Ziyue (Humphrey) Yang for many important fixes, comments, and suggestions. Yuchao Li read the entire manuscript, from cover to cover, and his input and deep knowledge of dynamic programming helped us immensely. Jesse Perla provided insightful comments on our code.

========================================================================

## Common Symbols and Terminology

The following symbols and conventions are used throughout the book.

| Symbol | Meaning |
|--------|---------|
| $\1\{P\}$ | indicator, equal to 1 if statement $P$ is true and 0 otherwise |
| $\alpha \coloneqq 1$ | $\alpha$ is defined to be equal to $1$ |
| $f \equiv 1$ | function $f$ is everywhere equal to $1$ |
| $\wp(A)$ | the power set of $A$, that is, the collection of all subsets of $A$ |
| $\natset{n}$ | $\{1, \ldots, n\}$ |
| $\NN, \ZZ, \RR, \CC$ | the natural, integer, real, and complex numbers |
| $\ZZ_+, \RR_+, \ldots$ | the nonnegative elements of $\ZZ, \RR, \ldots$ |
| $\lvert x\rvert$ for $x \in \RR$ | the absolute value of $x$ |
| $\lvert \lambda\rvert$ for $\lambda \in \CC$ | the modulus of $\lambda$ (i.e., $\sqrt{a^2+b^2}$ if $\lambda=a+ib$) |
| $\lvert B\rvert$ for set $B$ | the cardinality of $B$ |
| $\RR^n$ | all $n$-tuples of real numbers |
| $x \leq y$ for $x,y \in \RR^n$ | $x_i \leq y_i$ for $i=1,\ldots,n$ (pointwise partial order) |
| $x \ll y$ for $x,y \in \RR^n$ | $x_i < y_i$ for $i=1,\ldots,n$ |
| $\dD(F)$ | the set of distributions on $F$ |
| $\RR^{\Msf}$ | the set of all functions from $\Msf$ to $\RR$ |
| $i\RR^{\Msf}$ | the set of increasing functions in $\RR^{\Msf}$ |
| $\lL(\Xsf)$ | the set of linear operators on $\RR^{\Xsf}$ |
| $\mM(\Xsf)$ | the set of Markov operators in $\lL(\Xsf)$ |
| $\la a, b \ra$ | inner product of the vectors $a$ and $b$ |
| $\bigvee_{\alpha \in A} u_\alpha$ | the supremum of $\{u_\alpha\}_{\alpha \in A}$ |
| $\bigwedge_{\alpha \in A} u_\alpha$ | the infimum of $\{u_\alpha\}_{\alpha \in A}$ |
| iid | independent and identically distributed |
| $X \eqdist Y$ | $X$ and $Y$ have the same distribution |
| $X \sim F$ | $X$ has distribution $F$ |
| $F \lefsd G$ | $G$ first-order stochastically dominates $F$ |

========================================================================

## Introduction

(c-introii)=
# Introduction

The temporal structure of a typical dynamic program is

```{prf:algorithm}
:label: algo-intro-auto-1

- an initial state $X_0$ is given
- $t \leftarrow 0$
- while $t < T$:
  - the controller of the system observes the current **state** $X_t$
  - the controller chooses an **action** $A_t$
  - the controller receives a **reward** $R_t$ that depends on the current state and action
  - the state updates to $X_{t+1}$
  - $t \leftarrow t + 1$
```

The state $X_t$ is a vector listing current values of variables deemed relevant to choosing the current action. The action $A_t$ is a vector describing choices of a set of decision variables. If $T < \infty$, then the problem has a **finite horizon**. Otherwise it is an **infinite horizon** problem. Figure {numref}`f-state_action_reward` illustrates the first two rounds of a dynamic program. As shown in the figure, a rule for updating the state depends on the current state and action.

```{figure} figures/state_action_reward.svg
:name: f-state_action_reward

A dynamic program
```

Dynamic programming provides a way to maximize the expected *lifetime* reward of a decision-maker who receives a prospective reward sequence $(R_t)_{t \geq 0}$ and who confronts a system that maps today's state and control into the next period's state.
A **lifetime reward** is an aggregation of the individual period rewards $(R_t)_{t \geq 0}$ into a single value. An example of lifetime reward is an expected discounted sum $\EE \sum_{t \geq 0} \beta^t R_t$ for some $\beta \in (0,1)$. ```{prf:example} :label: eg-retail A manager wants to set prices and inventories to maximize a firm's **expected present value** (EPV), which, given the interest rate $r$, is defined as $$ \EE \left[ \pi_0 + \frac{1}{1+r} \pi_1 + \left(\frac{1}{1+r}\right)^2 \pi_2 + \cdots \right]. $$ (eq-dpnpv) Here $X_t$ will be a vector that quantifies the size of the inventories, prices set by competitors, and other factors relevant to profit maximization. The action $A_t$ sets current prices and orders of new stock. The current reward $R_t$ is the current profit $\pi_t$, and the profit stream $(\pi_t)_{t \geq 0}$ is aggregated into a lifetime reward via {eq}`eq-dpnpv`. ``` Dynamic programming has a *vast* array of applications, from robotics and artificial intelligence to the sequencing of DNA. Dynamic programming is used every day to control aircraft, route shipping, test products, recommend information on media platforms, and solve research problems. Some companies produce specialized computer chips that are designed for specific dynamic programs. Within economics and finance, dynamic programming is applied to topics including unemployment, monetary policy, fiscal policy, asset pricing, firm investment, wealth dynamics, inventory control, commodity pricing, sovereign default, the division of labor, natural resource extraction, human capital accumulation, retirement decisions, portfolio choice, and dynamic pricing. We discuss some of these applications in the rest of the book. The core theory of dynamic programming is relatively simple and concise. But implementation can be computationally demanding. That situation provides one of the major challenges facing the field of dynamic programming. 
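Returning to the expected present value in {eq}`eq-dpnpv`, the following minimal sketch may help fix ideas. It assumes a deterministic profit stream, so the expectation is trivial, and the function name `present_value`, the profit values, and the interest rate are all hypothetical choices made for illustration.

```julia
# Present value of a finite, deterministic profit stream π_0, π_1, ...,
# discounted at interest rate r. All numbers below are illustrative.
function present_value(profits, r)
    discount = 1 / (1 + r)
    # Period t profit is weighted by (1 / (1 + r))^t, starting from t = 0
    return sum(discount^t * profit
               for (t, profit) in zip(0:length(profits)-1, profits))
end

r = 0.05
pv_three_periods = present_value([100.0, 100.0, 100.0], r)

# A long constant stream approaches the geometric-series limit π (1 + r) / r
pv_long = present_value(fill(100.0, 2000), r)
pv_limit = 100.0 * (1 + r) / r
```

With a constant profit the sum is a geometric series, which is why `pv_long` is close to `pv_limit`. The same geometric-series logic reappears later when a permanent wage is valued.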
```{prf:example} :label: eg-retail2 To illustrate how computationally demanding problems can be, consider again {prf:ref}`eg-retail`. Suppose that, for each book, a book retailer chooses to hold between 0 and 10 copies. If there are 100 books to choose from, then the number of possible combinations for her inventories is $11^{100}$, about 20 orders of magnitude larger than the number of atoms in the known universe. In reality, there are probably many more books to choose from, as well as other factors in the business environment that affect choices of a retailer. ``` In this book, we discuss fundamental theory, traditional economic applications, and recent applications with computationally demanding environments. We also cover recent trends towards more sophisticated specifications of lifetime rewards, often called recursive preferences. Throughout the book, theory and computation are combined, since, for interesting problems, brute-force computation is futile, while theory alone provides limited insights. The interplay between interesting applications, fundamental theory, computational methods, and evolving hardware capability makes dynamic programming exciting. (s-pfin)= ## Bellman Equations In this section, we introduce the recursive structure of dynamic programming in a simple setting. After solving a finite-horizon model, we consider an infinite-horizon version and explain how it produces a system of nonlinear equations. Then we turn to methods for solving such systems. ### Finite-Horizon Job Search We begin with a celebrated model of job search created by {cite:t}`mccall1970`. McCall analyzed the decision problem of an unemployed worker in terms of current and prospective wage offers, impatience, and the availability of unemployment compensation. Here we study a simple version of the model in which essential ideas of dynamic programming are particularly clear. 
Readers who are familiar with Bellman equations can skim this section quickly and proceed directly to Section {ref}`s-fps`. (sss-atpp)= #### A Two-Period Problem Imagine someone who begins her working life at time $t=1$ without employment. While unemployed, she receives a new job offer paying wage $W_t$ at each date $t$. She can accept the offer and work *permanently* at that wage level or reject the offer, receive unemployment compensation $c$, and draw a new offer next period. We assume that the wage offer sequence is iid and nonnegative, with distribution $\phi$. In particular, - $\Wsf \subset \RR_+$ is a finite set of possible wage outcomes and - $\phi \colon \Wsf \to [0, 1]$ is a probability distribution on $\Wsf$, assigning a probability $\phi(w)$ to each possible wage outcome $w$. The worker is impatient. Impatience is parameterized by a time discount factor $\beta \in (0, 1)$, so that the present value of a next-period payoff of $y$ dollars is $\beta y$. Since $\beta < 1$, the worker will be tempted to accept reasonable offers, rather than to wait for better ones. A key question is how long to wait. Suppose as a first step that working life is just two periods. To solve our problem, we work backwards, starting at the final date $t=2$, after $W_2$ has been observed.[^1] If she is already employed, the worker has no decision to make: She continues working at her current wage. If she is unemployed, then she should take the largest of $c$ and $W_2$. Now we step back to $t=1$. At this time, having received offer $W_1$, the unemployed worker's options are (a) accept $W_1$ and receive it in both periods or (b) reject it, receive unemployment compensation $c$, and then, in the second period, choose the maximum of $W_2$ and $c$. Let's assume that the worker seeks to maximize EPV. The EPV of option (a) is $W_1 + \beta W_1$, which is also called the **stopping value**. 
The EPV of option (b), also called the **continuation value**, is $h_1 \coloneq c + \beta \, \EE \max\{c, W_2\}$. More explicitly, $$ h_1 = c + \beta \sum_{w' \in \Wsf} v_2(w') \phi(w'), \quad \text{where} \quad v_2(w) \coloneq \max\{c, w\}. $$ (eq-jstpc) The optimal choice at $t=1$ is now clear: Accept the offer if $W_1 + \beta W_1 \geq h_1$ and reject otherwise. A decision tree is shown in Figure {numref}`f-js_two_period`. ```{figure} figures/js_two_period.svg :name: f-js_two_period Decision tree for a two-period problem ``` #### Comments on Information In determining the optimal choice, we assumed that the worker (a) cares about expected values and (b) knows how to compute them. In {prf:ref}`c-val` and {prf:ref}`c-rdps` we discuss how to extend or weaken these assumptions. Some of these extensions allow decision-makers to focus on measurements that differ from expected values. Other extensions assume that the decision-maker does not know underlying probability distributions. For now we put these issues aside and return to the setup discussed in {ref}`sss-atpp`. #### Value Functions A key idea in dynamic programming is to use "value functions" to track maximal lifetime rewards from a given state at a given time. The **time 2 value function** $v_2$ defined in {eq}`eq-jstpc` returns the maximum value obtained in the final stage for each possible realization of the time 2 wage offer. The **time 1 value function** $v_1$ evaluated at $w \in \Wsf$ is $$ v_1(w) \coloneq \max \left\{ w + \beta w ,\, c + \beta \, \sum_{w' \in \Wsf} v_2(w') \phi(w') \right\}. $$ (eq-jsi_v1) It represents the present value of expected lifetime income after receiving the first offer $w$, conditional on choosing optimally in both periods. ```{figure} ../figures/iid_job_search_0.pdf :name: f-iid_job_search_0 The value function $v_1$ and the reservation wage ``` The value function is shown in Figure {numref}`f-iid_job_search_0`. 
This figure also shows the **reservation wage**

$$
w_1^* \coloneq \frac{h_1}{1+\beta}.
$$ (eq-rw0)

It is the $w$ that solves the indifference condition

$$
w + \beta w = c + \beta \, \sum_{w' \in \Wsf} v_2(w') \phi(w'),
$$

and equates the value of stopping to the value of continuing. For an offer $W_1$ above $w_1^*$, the stopping value exceeds the continuation value. For an offer below the reservation wage, the reverse is true. Hence, the optimal choice for the worker at $t=1$ is completely described by the reservation wage. Parameters and functions underlying the figure are shown in {numref}`list-two_period_job_search`.

```{code-block} julia
:name: list-two_period_job_search
:caption: Computing $v_1$ and $w^*_1$ (`two_period_job_search.jl`)
:linenos:

using Distributions

"Creates an instance of the job search model, stored as a NamedTuple."
function create_job_search_model(;
        n=50,        # wage grid size
        w_min=10.0,  # lowest wage
        w_max=60.0,  # highest wage
        a=200,       # wage distribution parameter
        b=100,       # wage distribution parameter
        β=0.96,      # discount factor
        c=10.0       # unemployment compensation
    )
    w_vals = collect(LinRange(w_min, w_max, n+1))
    ϕ = pdf(BetaBinomial(n, a, b))
    return (; n, w_vals, ϕ, β, c)
end

" Computes lifetime value at t=1 given current wage w_1 = w. "
function v_1(w, model)
    (; n, w_vals, ϕ, β, c) = model
    h_1 = c + β * max.(c, w_vals)'ϕ
    return max(w + β * w, h_1)
end

" Computes reservation wage at t=1. "
function res_wage(model)
    (; n, w_vals, ϕ, β, c) = model
    h_1 = c + β * max.(c, w_vals)'ϕ
    return h_1 / (1 + β)
end
```

Equation {eq}`eq-rw0` is instructive. We can see that higher unemployment compensation $c$ shifts up the continuation value $h_1$ and increases the reservation wage. As a result, the worker will, on average, spend more time unemployed when unemployment compensation is higher.
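The comparative static just described can be checked numerically. The sketch below is only an illustration: it assumes a uniform offer distribution on an evenly spaced wage grid (not the BetaBinomial specification used in the listing), and the function name `reservation_wage` and the grid of $c$ values are hypothetical.

```julia
# Reservation wage w_1^* = h_1 / (1 + β), with h_1 = c + β Σ max{c, w'} ϕ(w').
# Assumption: offers are uniform on w_vals, chosen purely for illustration.
function reservation_wage(c; β=0.96, w_vals=10.0:1.0:60.0)
    ϕ = fill(1 / length(w_vals), length(w_vals))   # uniform probabilities
    h_1 = c + β * sum(max.(c, w_vals) .* ϕ)         # continuation value
    return h_1 / (1 + β)
end

c_vals = [5.0, 10.0, 15.0, 20.0]
res_wages = [reservation_wage(c) for c in c_vals]

# Higher compensation raises the reservation wage at every step
is_increasing = all(diff(res_wages) .> 0)
```

Under these assumptions `res_wages` is strictly increasing in `c_vals`, in line with the discussion of {eq}`eq-rw0`.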
```{exercise} :label: ex-intro-auto-1 If unemployment compensation increases unemployment duration, should we conclude that increasing such compensation is detrimental to society? Provide some thoughts on this question in the context of the McCall model. ``` ```{solution} ex-intro-auto-1 Here is one possible answer: On the one hand, providing additional unemployment compensation is costly for taxpayers and tends to increase the unemployment rate. On the other hand, unemployment compensation encourages the worker to reject low initial offers, leading to a better lifetime wage. This can enhance worker welfare and expand the tax base. A larger model is needed to disentangle these effects. ``` #### Three Periods Now let's suppose that the worker works in period $t=0$ as well as $t=1,2$. Figure {numref}`f-js_decisions` shows the decision tree for the three periods. Notice that the subtree containing nodes 1 and 2 is just the decision tree for the two-period problem in Figure {numref}`f-js_two_period`. We will use this to find optimal actions. ```{figure} figures/js_decisions.svg :name: f-js_decisions Decision tree for the job seeker ``` At $t=0$, the value of accepting the current offer $W_0$ is $W_0 + \beta W_0 + \beta^2 W_0$, while the maximal value of rejecting and waiting is $c$ plus, after discounting by $\beta$, the maximum value that can be obtained by behaving optimally from $t=1$. We have already calculated this value: It is just $v_1(W_1)$, as given in {eq}`eq-jsi_v1`! The maximal time zero value $v_0(w)$ is the maximum of the value of these two options, given $W_0 = w$, so we can write $$ v_0(w) = \max \left\{ w + \beta \, w + \beta^2 \, w ,\, c + \beta \, \sum_{w' \in \Wsf} v_1(w') \phi(w') \right\}. $$ (eq-js_ir) By plugging $v_1$ from {eq}`eq-jsi_v1` into this expression, we can determine $v_0$, as well as the optimal action, the one that achieves the largest value in the max term in {eq}`eq-js_ir`. 
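The recursion in {eq}`eq-jsi_v1` and {eq}`eq-js_ir` can be sketched in a few lines. This is a minimal illustration rather than the book's companion code: the wage grid, the uniform offer distribution, and the parameter values are assumptions made for the example.

```julia
# Backward induction for the three-period job search problem (t = 0, 1, 2).
# Grid, distribution, and parameter values are illustrative assumptions.
w_vals = collect(10.0:1.0:60.0)
ϕ = fill(1 / length(w_vals), length(w_vals))    # uniform offer distribution
β, c = 0.96, 10.0

v_2 = max.(c, w_vals)                           # final period: larger of c and w
h_1 = c + β * sum(v_2 .* ϕ)                     # continuation value at t = 1
v_1 = max.(w_vals .+ β .* w_vals, h_1)          # time 1 value function
h_0 = c + β * sum(v_1 .* ϕ)                     # continuation value at t = 0
stop_0 = (1 + β + β^2) .* w_vals                # value of accepting at t = 0
v_0 = max.(stop_0, h_0)                         # time 0 value function

# Optimal action at t = 0 for each possible offer: accept when stopping is at least as good
accept_0 = stop_0 .>= h_0
```

Each value function is built from the one that follows it in time, so the vectors are computed in the order `v_2`, `v_1`, `v_0`. Under these assumptions the lowest offer on the grid is rejected at $t=0$ and the highest is accepted.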
Figure {numref}`f-js_decisions` illustrates how the backward induction process works. The last-period value function $v_2$ is trivial to obtain. With $v_2$ in hand, we can compute $v_1$. With $v_1$ in hand, we can compute $v_0$. Once all the value functions are available, we can calculate whether to accept or reject the current offer at each point in time.

```{exercise}
:label: ex-intro-auto-2

The optimal action at time $t=0$ is determined by a time zero reservation wage $w_0^*$. The worker should accept the time zero wage offer if and only if $W_0$ exceeds $w^*_0$. Calculate $w_0^*$ for this problem, by analogy with $w_1^*$ in {eq}`eq-rw0`.
```

Notice how we broke the three-period problem down into a pair of two-period problems, given by {eq}`eq-jsi_v1` and {eq}`eq-js_ir`. Breaking many-period problems down into a sequence of two-period problems is the essence of dynamic programming. The recursive relationships between $v_0$ and $v_1$ in {eq}`eq-js_ir`, as well as between $v_1$ and $v_2$ in {eq}`eq-jsi_v1`, are examples of what are called **Bellman equations**. We will see many other examples.

```{exercise}
:label: ex-intro-auto-3

Extend the preceding arguments to $T$ time periods, where $T$ can be any finite number. Using Julia or another programming language, write a function that takes $T$ as an argument and returns $(w_0^*, \ldots, w_T^*)$, the sequence of reservation wages for each period.
```

(ss-ihfl)=
### Infinite Horizon

(sss-jsin)=

Next, we consider an infinite horizon problem that in some ways is more challenging but in other ways simpler. On the one hand, the lack of a terminal period means that backward induction requires a subtler justification. On the other hand, the infinite horizon means that the worker always faces an infinite future, so that we only have to study a single value function and need not keep track of the number of remaining periods in the problem.
This will become clearer as the section unfolds.[^2] With this discussion in mind, let us consider a worker who aims to maximize

$$
\EE \sum_{t=0}^{\infty} \beta^t R_t,
$$ (eq-earnings)

where $R_t \in \{c, W_t\}$ is earnings at time $t$. As before, jobs are permanent, so accepting a job at a given wage means earning that wage in every subsequent period. Let's clarify our assumptions:

```{prf:assumption}
:label: a-jsbds

The wage process satisfies $(W_t)_{t \geq 0} \iidsim \phi$ where $\phi \in \dD(\Wsf)$ and $\Wsf \subset \RR_+$ is finite. The parameters $c$ and $\beta$ are positive and $\beta < 1$.
```

Here and in what follows, for any finite or countable set $F$, the symbol $\dD(F)$ indicates the set of distributions on $F$. As with the finite-horizon case, infinite-horizon dynamic programming involves a two-step procedure that first assigns values to states and then deduces optimal actions given those values. We begin with an informal discussion and then formalize the main ideas. To trade off current and future rewards optimally, we need to compare the current payoffs we get from our two choices with the states that those choices lead to and the maximum value that can be extracted from those states. But how do we calculate the maximum value that can be extracted from each state when lifetime is infinite? Consider first the present expected lifetime value of being employed with wage $w \in \Wsf$. This case is easy because, under the current assumptions, workers who accept a job are employed forever. Lifetime payoff is

$$
w + \beta w + \beta^2 w + \cdots = \frac{w}{1 - \beta}.
$$ (eq-jssv)

How about the maximum present expected lifetime value attainable when entering the current period unemployed with wage offer $w$ in hand? Denote this (as yet unknown) value by $v^*(w)$. We call $v^*$ the **value function**. While $v^*$ is not trivial to pin down, the task is not impossible.
Our first step in the right direction is to observe that it satisfies the **Bellman equation** $$ v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*(w') \phi(w') \right\}, $$ (eq-jsbell00) at every $w \in \Wsf$. (Here $w'$ is the offer next period.) Our reasoning is as follows: The first term inside the max operation is the **stopping value**, or lifetime payoff from accepting current offer $w$. The second term inside the max operation is the **continuation value**, or current expected value of rejecting and behaving optimally thereafter. The maximal value is obtained by selecting the largest of these two alternatives. Note the similarity between {eq}`eq-jsbell00` and our finite horizon Bellman equations {eq}`eq-jsi_v1` and {eq}`eq-js_ir`. The only real difference is that the value function is no longer time-dependent. This is because the worker always looks forward toward an infinite horizon, regardless of the current date. Equation {eq}`eq-jsbell00` is to be solved for a function $v^* \in \RR^\Wsf$, the set of all functions from $\Wsf$ to $\RR$. Once we have solved for $v^*$ (assuming this is possible), optimal choices can be made by observing current $w$ and then choosing the largest of the two alternatives on the right-hand side of {eq}`eq-jsbell00`, just as we did in the finite horizon case. This idea -- that optimal choices can be made by computing the value function and maximizing the right-hand side of the Bellman equation -- is called **Bellman's principle of optimality**, and will be a cornerstone of what follows. Later we prove it in a general setting. To solve for $v^*$, we use fixed-point theory, our topic in the next section. Later, in Section {ref}`ss-js`, we return to the job search problem and apply fixed-point theory to solve for $v^*$. (s-fps)= ## Stability and Contractions In this section, we cover enough fixed-point theory to solve an infinite horizon job search problem. 
(In {prf:ref}`c-fpt` we consider more general results.) Readers who are familiar with the Neumann series lemma and Banach's fixed-point theorem can skim this section and proceed to {ref}`ss-js`. ### Vector Space To begin, we recall some fundamental properties of real numbers, finite-dimensional vector space, basic topology, and equivalence of norms. (sss-rirl)= #### Real and Complex Vectors For the most part, we are interested in vectors whose elements are real numbers (as distinguished from complex numbers). Before investigating such vectors, let's provide some useful language about the real line $\RR$. (You might want to review some elementary concepts from real analysis in Appendix §{prf:ref}`c-areal`, such as suprema, infima, minima, maxima, and convergence.) Given $a, b \in \RR$, let $a \vee b \coloneq \max\{a, b\}$ and $a \wedge b \coloneq \min\{a, b\}$. The **absolute value** of $a \in \RR$ is defined as $|a| \coloneq a \vee (-a)$. A **real-valued vector** $u = (u_1, \ldots, u_n)$ is a finite real sequence with $u_i \in \RR$ as the $i$-th element. The set of all real vectors of length $n$ is denoted by $\RR^n$. The **inner product** of $n$-vectors $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$ is $\inner{u,v} \coloneq \sum_{i=1}^n u_i v_i$. The set $\CC$ of complex numbers is defined in the appendix to {cite:t}`sargent2022economic` and many other places; as is the set $\CC^n$ of all complex-valued $n$-vectors. We assume readers know what complex numbers are and how to compute the modulus of a complex number. ```{exercise} :label: ex-ast Let $\alpha$, $s$, and $t$ be real numbers. Show that $\alpha \vee (s+t) \leq s + \alpha \vee t$ whenever $s \geq 0$. ``` ```{solution} ex-ast Fix $\alpha$, $s$, and $t$ with $s \geq 0$. Suppose first that $\alpha \geq s + t$. Then $\alpha \vee (s+t) = \alpha \leq \alpha \vee t \leq s + \alpha \vee t$, as claimed. Suppose next that $\alpha \leq s+t$. Then $\alpha \vee (s+t) = s+t \leq s + \alpha \vee t$, as required. 
```

(sss-nfvs)=
#### Norms

The **Euclidean norm** on a real vector space is defined as

$$
\| u \| \coloneq \sqrt{ \inner{u, u} } \qquad (u \in \RR^n).
$$

Because they provide more flexibility when checking conditions that underlie various results, some alternative norms on $\RR^n$ are important for applications of fixed-point theory. As a first step, recall that a function $\| \cdot \| \colon \RR^n \to \RR$ is called a **norm** on $\RR^n$ if, for any $\alpha \in \RR$ and $u, v \in \RR^n$,

1. $\| u \| \geq 0$ (nonnegativity),
2. $\| u \| =0 \iff u=0$ (positive definiteness),
3. $\| \alpha u \| = |\alpha| \| u\|$ (absolute homogeneity), and
4. $\| u + v \| \leq \| u \| + \| v \|$ (triangle inequality).

The Euclidean norm on $\RR^n$ satisfies the **Cauchy--Schwarz inequality**

$$
| \inner{u, v} | \leq \| u \| \cdot \| v \| \quad \text{for all } u, v \in \RR^n .
$$

This inequality can be used to prove that the triangle inequality holds for the Euclidean norm (see, e.g., {cite}`kreyszig1978introductory`).

```{prf:example}
:label: eg-ell1norm

The **$\ell_1$ norm** of a vector $u = (u_1, \ldots, u_n) \in \RR^n$ is defined by

$$
\| u \|_1 \coloneq \sum_{i=1}^n |u_i|.
$$ (eq-l1normfd)

In machine learning applications, $\| \cdot \|_1$ is sometimes called the "Manhattan norm," and $d_1 (u, v) \coloneq \| u - v \|_1$ is called the "Manhattan distance" or "taxicab distance" between vectors $u$ and $v$. We will refer to it as the **$\ell_1$ distance** or **$\ell_1$ deviation**.
```

```{exercise}
:label: ex-intro-auto-4

Verify that the $\ell_1$ norm is in fact a norm on $\RR^n$.
```

```{exercise}
:label: ex-expnorm

Fix $p \in \RR^n$ with $p_i > 0$ for all $i \in [n]$ and $\sum_i p_i = 1$. Show that $\| u \|_{1,p} \coloneq \sum_{i=1}^n |u_i| p_i$ is a norm on $\RR^n$.
```

The $\ell_1$ norm and the Euclidean norm are special cases of the so-called **$\ell_p$ norm**, which is defined for $p \geq 1$ by

$$
\| u \|_p \coloneq \left( \sum_{i=1}^n |u_i|^p \right)^{1/p}.
$$ (eq-lpnormfd)

It can be shown that $u \mapsto \| u \|_p$ is a norm for all $p \geq 1$, as suggested by the name (see, e.g., {cite}`kreyszig1978introductory`). For this norm, the triangle inequality is called **Minkowski's inequality**.

Since the Euclidean case is obtained by setting $p=2$, the Euclidean norm is also called the $\ell_2$ norm, and we write $\| \cdot \|_2$ rather than $\| \cdot \|$ when extra clarity is required.

```{exercise}
:label: ex-ellinftyfd

Prove that the **supremum norm** (or **$\ell_\infty$ norm**), defined by $\| u \|_\infty \coloneq \max_{i=1}^n |u_i|$, is also a norm on $\RR^n$.
```

(The symbol $\| u \|_\infty$ is used because, for all $u \in \RR^n$, we have $\| u \|_p \to \| u \|_\infty$ as $p \to \infty$.)

For the next exercise, we recall that the **indicator function** of logical statement $P$, denoted here by $\1\{P\}$, takes value 1 (resp., 0) if $P$ is true (resp., false). For example, if $x, y \in \RR$, then

$$
    \1\{x \leq y\} =
    \begin{cases}
        1 & \text{ if } x \leq y \\
        0 & \text{ otherwise} .
    \end{cases}
$$

If $A \subset S$, where $S$ is any set, then $\1_A(x) \coloneq \1\{x \in A\}$ for all $x \in S$.

```{exercise}
:label: ex-intro-auto-5

The so-called $\ell_0$ "norm" $\| u \|_0 \coloneq \sum_{i=1}^n \1 \{u_i \not= 0\}$ used in some data science applications is *not* a norm on $\RR^n$. Prove this.
```

```{solution} ex-intro-auto-5
Fix any nonzero $u \in \RR^n$ and set $\alpha = 2$. Since $\alpha u$ and $u$ have the same nonzero entries, we have $\| \alpha u \|_0 = \| u \|_0 > 0$, whereas absolute homogeneity would require $\| \alpha u \|_0 = 2 \| u \|_0$. Hence absolute homogeneity fails.
```

(sss-eqvecnorms)=
#### Equivalence of Vector Norms

An important property of a finite-dimensional normed vector space is that all norms are "equivalent." Let's review this result and discuss why it matters.

To begin, recall that when $u$ and $(u_m) \coloneq (u_m)_{m \in \NN}$ are all elements of $\RR^n$, we say that $(u_m)$ **converges** to $u$ and write $u_m \to u$ if

$$
    \| u_m - u \| \to 0 \text{ as } m \to \infty
    \text{ for some norm } \| \cdot \| \text{ on } \RR^n.
$$ It might seem that this definition is imprecise. Don't we need to clarify that the convergence is with respect to a particular norm? No we don't. This is because any two norms $\| \cdot \|_a$ and $\| \cdot \|_b$ on $\RR^n$ are **equivalent** in the sense that there exist finite positive constants $M, N$ such that $$ M \|u\|_a \leq \| u\|_b \leq N \| u \|_a \quad \text{for all } u \in \RR^n. $$ (eq-eqvecnorms) (See, e.g., {cite}`kreyszig1978introductory`.) ```{exercise} :label: ex-intro-auto-6 Let us write $\| \cdot \|_a \sim \| \cdot \|_b$ if there exist finite $M, N$ such that {eq}`eq-eqvecnorms` holds. Prove that $\sim$ is an equivalence relation (see {ref}`ss-setsfuns`) on the set of all norms on $\RR^n$. ``` ```{exercise} :label: ex-eqncon Let $\| \cdot \|_a$ and $\| \cdot \|_b$ be any two norms on $\RR^n$. Given a point $u$ in $\RR^n$ and a sequence $(u_m)$ in $\RR^n$, use {eq}`eq-eqvecnorms` to confirm that $\| u_m - u \|_a \to 0$ implies $\| u_m - u \|_b \to 0$ as $m \to \infty$. ``` The next exercise tells us that pointwise convergence and norm convergence are the same thing in finite dimensions. ```{exercise} :label: ex-pointnorm Let $\| \cdot \|$ be any norm on $\RR^n$. Fixing a point $u$ in $\RR^n$ and a sequence $(u_m)$ in $\RR^n$, let $u^i$ and $u_m^i$ be the $i$-th component of $u$ and $u_m$ respectively. Show that $u_m^i \to u^i$ for all $i \in \{1, \ldots, n\}$ if and only if $\|u_m - u\| \to 0$. ``` Recall that a set $C \subset \RR^n$ is called **bounded** if there exists an $M \in \NN$ with $\|x\| \leq M$ for all $x \in C$; and **closed** in $\RR^n$ if, for all $u \in \RR^n$ and sequences $(u_m) \subset C$ such that $u_m \to u$ as $m \to \infty$, we also have $u \in C$. A set $G \subset \RR^n$ is called **open** in $\RR^n$ if $G^c$ is closed in $\RR^n$. A set $N$ is called a **neighborhood** of $u \in \RR^n$ if there exists an open set $G \subset \RR^n$ with $u \in G \subset N$. 
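As a numerical illustration of the equivalence bounds in {eq}`eq-eqvecnorms`, the following sketch checks the standard chain $\| u \|_\infty \leq \| u \|_2 \leq \| u \|_1 \leq n \| u \|_\infty$ on a few vectors. These particular constants are well-known facts rather than results stated above, and `check_norm_bounds`, `test_vectors`, and `u_m` are hypothetical names introduced only for this example. The sketch also shows a sequence whose distance to its limit shrinks in all three norms at once.

```julia
using LinearAlgebra

# Check the chain sup norm <= Euclidean norm <= l1 norm <= n * sup norm.
# (check_norm_bounds is a hypothetical helper; slack absorbs rounding error.)
function check_norm_bounds(u; slack=1e-12)
    n = length(u)
    linf, l2, l1 = norm(u, Inf), norm(u, 2), norm(u, 1)
    return linf <= l2 + slack && l2 <= l1 + slack && l1 <= n * linf + slack
end

test_vectors = [[1.0, -2.0, 3.0], [0.0, 0.0, 0.0], [0.5, 0.5], [-4.0]]
println(all(check_norm_bounds, test_vectors))

# A sequence u_m -> u: the deviation shrinks in every norm simultaneously
u = [1.0, 2.0, 3.0]
u_m(m) = u .+ 1 / m
for m in (10, 100, 1000)
    d = u_m(m) - u
    println((norm(d, 1), norm(d, 2), norm(d, Inf)))
end
```
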
A map $T$ from $U \subset \RR^n$ to $\RR^k$ is called **continuous at $u \in U$** if $Tu_m \to Tu$ for any $(u_m) \subset U$ with $u_m \to u$; and **continuous** if $T$ is continuous at every $u \in U$. These notions apply to any norm, since convergence does not depend on our choice of norm.

(ss-lineq)=
#### Matrices and Neumann Series

Next, we discuss geometric series in matrix space, along with the Neumann series lemma, one of many useful results in applied and numerical analysis.

Before starting we recall that if $A = (a_{ij})$ is an $n \times n$ matrix with $i,j$-th element $a_{ij}$, then the definition of matrix multiplication tells us that for $u \in \RR^n$, the $i$-th element of $Au$ is $\sum_{j=1}^n a_{ij}u_j$, while the $j$-th element of $u^\top A$ is $\sum_{i=1}^n a_{ij}u_i$. Think of $u \mapsto Au$ and $u \mapsto u^\top A$ as two different mappings, each of which takes an $n$-vector and produces a new $n$-vector.

```{prf:remark}
In this book, we adopt a convention that a vector in $\RR^n$ is just an $n$-tuple of real values. This coincides with the viewpoint of languages like Julia and Python: Vectors are just "flat" arrays. But when we use vectors in matrix algebra, they should be understood as column vectors unless we state otherwise.
```

Just as we considered norms of vectors in {ref}`sss-nfvs`, we will find it helpful to have a notion of norms of matrices. A real-valued map defined on $\RR^{n \times n}$, the set of real $n \times n$ matrices, is called a **matrix norm** if it has the following properties: for any $\alpha \in \RR$ and any $n\times n$ matrices $A, B$,

1. $\| A \| \geq 0$,
2. $\| A \| =0 \iff A=0$,
3. $\| \alpha A \| = |\alpha| \| A\|$, and
4. $\| A + B \| \leq \| A \| + \| B \|$.

These properties are called nonnegativity, positive definiteness, absolute homogeneity, and the triangle inequality, respectively, analogous to the norms on $\RR^n$ discussed in {ref}`sss-nfvs`.
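Returning briefly to the two mappings $u \mapsto Au$ and $u \mapsto u^\top A$ introduced at the start of this section, the sketch below shows one way they might be computed in Julia. The matrix and vector are arbitrary illustrative values, not taken from the text. Note that `A' * u` returns a flat vector, in line with the convention in the remark above, while `u' * A` holds the same numbers as a $1 \times 2$ row.

```julia
A = [1.0 2.0;
     3.0 4.0]
u = [1.0, 1.0]      # a "flat" vector in R^2

Au  = A * u         # i-th element is sum_j a_ij * u_j
uA  = A' * u        # j-th element is sum_i a_ij * u_i, as a flat vector
row = u' * A        # same numbers as uA, stored as a 1 x 2 row

println(Au)
println(uA)
println(size(row))
```
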
An example of a matrix norm is the so-called **operator norm**

$$
    \| B \|_o \coloneq \max_{\|u\| = 1} \| B u \|.
$$ (eq-matnorm)

Here $B$ is $n \times n$, $u$ is in $\RR^n$ and the norm on the right-hand side is the Euclidean norm over the $n$-vector $B u$. Another example of a matrix norm is the supremum norm defined as

$$
    \| B \|_\infty \coloneq \max_{1 \leq i, j \leq n} |b_{ij}|,
    \quad \text{ where } b_{ij} \text{ is the } i,j \text{-th element of } B.
$$ (eq-matsupnorm)

Some matrix norms have the **submultiplicative** property, which means that, for all $A, B \in \RR^{n \times n}$, we have $\| A B \| \leq \|A \| \|B\|$.

```{exercise}
:label: ex-intro-auto-7

Show that the operator norm $\| \cdot \|_o$ is submultiplicative on $\RR^{n \times n}$. Provide a counterexample to the claim that $\| \cdot \|_\infty$ is submultiplicative.
```

In what follows we often use the operator norm as our choice of matrix norm (partly because of its attractive submultiplicative property). Hence, by convention, an expression such as $\| A\|$ refers to the operator norm $\|A\|_o$ of $A$.

Analogous to the vector case, we say that a sequence $(A_k)$ of $n \times n$ matrices converges to an $n \times n$ matrix $A$ and write $A_k \to A$ if $\| A_k - A \| \to 0$ as $k \to \infty$. Just as with vectors, this form of norm convergence holds if and only if each element of $A_k$ converges to the corresponding element of $A$. The proof is similar to the solution to {prf:ref}`ex-pointnorm`.

If $A$ is an $n \times n$ matrix, then $\lambda \in \CC$ is called an **eigenvalue** of $A$ if there exists a nonzero $e \in \CC^n$ such that $Ae = \lambda e$. (Here $\CC$ is the set of complex numbers and $\CC^n$ is the set of complex $n$-vectors.) A vector $e$ satisfying this equality is called an **eigenvector** of $A$ and $(\lambda, e)$ is called an **eigenpair**.

In Julia, we can compute the eigenvalues of a square matrix $A$ via `eigvals(A)`.
The code ``` julia using LinearAlgebra A = [0 -1; 1 0] println(eigvals(A)) ``` produces ``` julia 2-element Vector{ComplexF64}: 0.0 - 1.0im 0.0 + 1.0im ``` Here `im` stands for $i$, the imaginary unit, so the eigenvalues of $A$ are $-i$ and $i$. Turning to geometric series, let us begin in one dimension. Consider the one-dimensional linear equation $u = au + b$, where $a, b$ are given and $u$ is unknown. Its solution $u^*$ satisfies $$ |a| < 1 \quad \implies \quad u^* = \frac{b}{1-a} = \sum_{k \geq 0} a^k b. $$ (eq-geoms) This scalar result extends naturally to vectors. To show this we suppose that $u$ and $b$ are column vectors in $\RR^n$, and that $A$ is an $n \times n$ matrix. We consider the vector equation $u = A u + b$. For the next result, we recall that the **spectral radius** of $A$ is defined as $$ \rho(A) \coloneq \max\setntn{|\lambda|}{\lambda \text{ is an eigenvalue of } A} $$ (eq-srad) Here $|\lambda|$ indicates the modulus of complex number $\lambda$. With $I$ as the $n \times n$ identity matrix, we can state the following result. ```{prf:theorem} :label: t-nsl If $\rho(A) < 1$, then $I - A$ is nonsingular and $$ (I - A)^{-1} = \sum_{k \geq 0} A^k. $$ ``` It follows directly that the vector system $u = A u + b$ has a unique solution $u^* = (I - A)^{-1} b = \sum_{k \geq 0} A^k b$ whenever $\rho(A) < 1$. This is the multivariate extension of {eq}`eq-geoms`. The code in {numref}`list-compute_spec_rad` shows how to compute the spectral radius of an arbitrary matrix $A$ in Julia. The print statement produces `0.5828`, so, for this matrix, $\rho(A)<1$. ```{code-block} julia :name: list-compute_spec_rad :caption: Computing a spectral radius (`compute_spec_rad.jl`) :linenos: using LinearAlgebra ρ(A) = maximum(abs(λ) for λ in eigvals(A)) # Spectral radius A = [0.4 0.1; # Test with arbitrary A 0.7 0.2] print(ρ(A)) ``` ```{exercise} :label: ex-intro-auto-8 Prove that $\rho(\alpha B) = |\alpha| \, \rho(B)$ for all $\alpha \in \RR$. 
``` The rest of this section works through the proof of the Neumann series lemma, with several parts left as exercises. An informal proof of the lemma runs as follows. If $S \coloneq \sum_{k \geq 0} A^k$, then $$ I + AS = I + A \sum_{k \geq 0} A^k = I + A + A^2 + \cdots = S. $$ Rearranging $I + AS = S$ gives $S = (I - A)^{-1}$, which matches the claim in the Neumann series lemma. This informal argument lacks rigor. To make it rigorous, we must prove (a) that the sum $\sum_{k \geq 0} A^k$ converges and (b) that the matrix $I-A$ is invertible. ```{prf:lemma} :label: l-rsnb If $B$ is any square matrix and $\| \cdot \|$ is any matrix norm, then $$ \rho(B)^k \leq \| B^k \| \text{ for all } k \in \NN \qquad \text{ and } \qquad \| B^k \|^{1/k} \to \rho(B) \text{ as } k \to \infty. $$ ``` A proof of {prf:ref}`l-rsnb` can be found in Chapter 12 of {cite:t}`bollobas1999linear`. The second result is sometimes called **Gelfand's formula**. ```{exercise} :label: ex-rcondi Using {prf:ref}`l-rsnb`, show that 1. $\| B^k \| \to 0$ as $k \to \infty$ if and only if $\rho(B) < 1$. 2. $\rho(B) > 1$ implies $\| B^k \| \to \infty$ as $k \to \infty$. ``` ```{exercise} :label: ex-intro-auto-9 Prove: If $A$ and $B$ are square matrices that commute (i.e., $AB=BA$), then $\rho(AB) \leq \rho(A) \rho(B)$. (Hint: Show $(AB)^k = A^k B^k$ and use Gelfand's formula.) ``` ```{exercise} :label: ex-intro-auto-10 Prove: $\rho(A) < 1$ implies that the series $\sum_{k \geq 0} A^k$ converges, in the sense that every element of the matrix $S_K \coloneq \sum_{k=0}^K A^k$ converges as $K \to \infty$. 
```

From this last result, one can show that $(I-A)^{-1}$ exists by computing it:

```{exercise}
:label: ex-ivpower

Prove this claim by showing that, when $\sum_{k \geq 0} A^k$ exists, the inverse of $I-A$ exists, and indeed $(I-A)^{-1} = \sum_{k \geq 0} A^k$.[^3]
```

{numref}`list-power_series` helps illustrate the result in {prf:ref}`ex-ivpower`, although we truncate the infinite sum $\sum_{k \geq 0} A^k$ at 50.

```{code-block} julia
:name: list-power_series
:caption: Matrix inversion versus power series (`power_series.jl`)
:linenos:

using LinearAlgebra   # provides the identity I

# Primitives
A = [0.4 0.1;
     0.7 0.2]

# Method one: direct inverse
B_inverse = inv(I - A)

# Method two: power series
function power_series(A)
    B_sum = zeros((2, 2))
    A_power = I
    for k in 1:50
        B_sum += A_power
        A_power = A_power * A
    end
    return B_sum
end

# Print maximal error
print(maximum(abs.(B_inverse - power_series(A))))
```

The output `5.621e-12` is close enough to zero for many practical purposes.

(ss-nonsys)=
### Nonlinear Systems

While the Neumann series lemma is a powerful tool for solving linear systems, it doesn't help us with *non*linear problems. In this section, we present Banach's fixed-point theorem, one of a variety of techniques for handling nonlinear systems. ({prf:ref}`c-fpt` introduces other methods.)

(sss-fixedps)=
#### Fixed Points

A standard approach to solving an equation is to formulate it as a fixed-point problem. This section provides the basic definitions and some simple results from fixed-point theory.

Let $U$ be any nonempty set. We call $T$ a **self-map** on $U$ if $T$ is a function from $U$ into itself. For a self-map $T$ on $U$, a point $u^* \in U$ is called a **fixed point** of $T$ in $U$ if $T u^* = u^*$. (In fixed-point theory, it is common to write $T u$ for the image of $u$ under $T$, rather than $T(u)$.)

```{prf:example}
:label: eg-tat

Let $U = \RR^n$ and let $T$ be defined by $T u = Au + b$, where $A$ and $b$ are as in {ref}`ss-lineq`.
Since $u$ is a fixed point of $T$ if and only if $u = Au + b$, solving the equation $u = A u + b$ is the same as searching for the fixed point of $T$. By the Neumann series lemma, $T$ has the unique fixed point $u^* \coloneq (I - A)^{-1} b$ in $U$ whenever $\rho(A)<1$. ``` ```{prf:example} Every $u$ in set $U$ is fixed under the identity map $I \colon u \mapsto u$. ``` ```{prf:example} If $U = \NN$ and $Tu = u+1$, then $T$ has no fixed point. ``` Figure {numref}`f-three_fixed_points` shows another example, for a self-map $T$ on $U \coloneq [0, 2]$. Fixed points are numbers $u \in [0, 2]$ where $T$ meets the 45-degree line. In this case there are three. ```{figure} ../figures/three_fixed_points.pdf :name: f-three_fixed_points Graph and fixed points of $T \colon u \mapsto 2.125/(1 + u^{-4})$ ``` ```{exercise} :label: ex-ufnflow Let $U$ be any set and let $T$ be a self-map on $U$. Suppose there exists an $\bar u \in U$ and an $m \in \NN$ such that $T^k u = \bar u$ for all $u \in U$ and $k \geq m$. Prove that, under this condition, $\bar u$ is the unique fixed point of $T$ in $U$. ``` ```{solution} ex-ufnflow Let $T$ and $U$ be as stated in the exercise. Regarding uniqueness, suppose that $T$ has two distinct fixed points $u$ and $y$ in $U$. Since $T^m u = \bar u$ and $T^m y = \bar u$, we have $T^m u = T^m y$. But $u$ and $y$ are distinct fixed points, so $u = T^m u$ must be distinct from $y = T^m y$. Contradiction. Regarding the claim that $\bar u$ is a fixed point, we recall that $T^k u = \bar u$ for $k \geq m$. Hence $T^m \bar u = \bar u$ and $T^{m+1} \bar u = \bar u$. But then $$ T \bar u = T T^m \bar u = T^{m+1} \bar u = \bar u, $$ so $\bar u$ is a fixed point of $T$. ``` ```{exercise} :label: ex-clifp Let $T$ be a self-map on $U \subset \RR^d$. Prove the following: If $T^m u \to u^*$ as $m \to \infty$ for some pair $u, u^* \in U$ and, in addition, $T$ is continuous at $u^*$, then $u^*$ is a fixed point of $T$. 
``` ```{solution} ex-clifp Assume the hypotheses of the exercise and let $u_m \coloneq T^m u$ for all $m \in \NN$. By continuity and $u_m \to u^*$ we have $T u_m \to T u^*$. But the sequence $(T u_m)$ is just $(u_m)$ with the first element omitted, so, given that $u_m \to u^*$, we must have $T u_m \to u^*$. Since limits are unique, it follows that $u^* = T u^*$. ``` When considering fixed points, given a self-map $T$ on $U$, we typically seek conditions on $T$ and $U$ under which the following properties hold: - $T$ has at least one fixed point on $U$ (existence), - $T$ has at most one fixed point on $U$ (uniqueness), and - the fixed point of $T$ on $U$ can be computed numerically. (sss-glostab)= #### Global Stability A self-map $T$ on $U$ is called **globally stable** on $U$ if $T$ has a unique fixed point $u^*$ in $U$ and $T^k u \to u^*$ as $k \to \infty$ for all $u \in U$. Here $T^k$ indicates $k$ compositions of $T$ with itself. Global stability is a desirable property in the setting of dynamic programming. A number of our results rely on it. ```{exercise} :label: ex-tat As in {prf:ref}`eg-tat`, let $U = \RR^n$ and let $T$ be defined by $T u = Au + b$. Using induction, prove that $$ T^k u = A^k u + A^{k-1} b + A^{k-2} b + \cdots + Ab + b $$ (eq-titer) for all $u \in U$ and $k \in \NN$. Next, show that $T$ is globally stable on $U$ whenever $\rho(A) < 1$. ``` Let $T$ be a self-map on $U \subset \RR^n$. We call $T$ **invariant** on $C \subset U$ and call $C$ an **invariant set** if $T$ is also a self-map on $C$; that is, if $u \in C$ implies $Tu \in C$. ```{exercise} :label: ex-cinvfp Let $T$ be a globally stable self-map on $U \subset \RR^n$, with fixed point $u^*$. Prove the following: If $C$ is closed and $T$ is invariant on $C$, then $u^* \in C$. ``` ```{solution} ex-cinvfp Let the stated hypotheses hold and fix $u \in C$. By global stability we have $T^k u \to u^*$. Since $T$ is invariant on $C$ we have $(T^k u)_{k \in \NN} \subset C$. 
Since $C$ is closed, this implies that the limit is in $C$. In other words, $u^* \in C$, as claimed.
```

(ss-bcmt)=
#### Banach's Fixed-Point Theorem

Next, we present the Banach fixed-point theorem, a workhorse for analyzing nonlinear operators.

Let $U$ be a nonempty subset of $\RR^n$ and let $\| \cdot \|$ be a norm on $\RR^n$. A self-map $T$ on $U$ is called a **contraction** on $U$ with respect to $\| \cdot \|$ if there exists a $\lambda \in [0, 1)$ such that

$$
    \| Tu - Tv \| \leq \lambda \| u - v \|
    \quad \text{for all} \quad u, v \in U.
$$ (eq-uc)

The constant $\lambda$ is called the **modulus of contraction**.

```{exercise}
:label: ex-sciufp

Let $T$ be a contraction on $U$ with respect to a norm $\| \cdot \|$. Show that $T$ is continuous on $U$ and has at most one fixed point in $U$.
```

```{exercise}
:label: ex-bcam

Let $U = \RR^n$ and let $Tx = Ax + b$, where $A$ is $n \times n$ and $b$ is $n \times 1$. Prove that $T$ is a contraction of modulus $\| A \|$ on $U$ (see {eq}`eq-matnorm` for the definition) whenever $\| A \| < 1$.
```

```{solution} ex-bcam
By the definition of the operator norm, we have $\| A u \| \leq \| A \| \| u \|$ for all $u \in \RR^n$. If $\| A \| < 1$, then $T$ is a contraction of modulus $\| A \|$, since, for any $x, y \in U$,

$$
    \| Ax + b - Ay - b \| = \| A(x - y) \| \leq \|A \| \| x - y \|.
$$
```

The following theorem shows that a contraction on a closed set is globally stable.

```{prf:theorem}
:label: t-bfpt

If $U$ is closed in $\RR^n$ and $T$ is a contraction of modulus $\lambda$ on $U$ with respect to some norm $\| \cdot \|$ on $\RR^n$, then $T$ has a unique fixed point $u^*$ in $U$ and

$$
    \| T^k u - u^* \| \leq \lambda^k \| u - u^* \|
    \quad \text{for all } k \in \NN \text{ and } u \in U.
$$ (eq-banachrate)

In particular, $T$ is globally stable on $U$.
```

We prove {prf:ref}`t-bfpt` in stages that build on the following exercises.

```{exercise}
:label: ex-bctqb

Let $U$ and $T$ have the properties stated in {prf:ref}`t-bfpt`. Fix $u_0 \in U$ and let $u_m \coloneq T^m u_0$.
Show that

$$
    \| u_m - u_k \| \leq \sum_{i=m}^{k-1} \lambda^i \| u_0 - u_1 \|
$$

holds for all $m, k \in \NN$ with $m < k$.
```

```{exercise}
:label: ex-bctic

Using the results in {prf:ref}`ex-bctqb`, prove that $(u_m)$ is a Cauchy sequence in $\RR^n$. (A sequence $(v_m) \subset \RR^n$ is called a **Cauchy sequence** if, for any $\epsilon > 0$, there exists an $N \in \NN$ such that $j, k \geq N$ implies $\| v_j - v_k \| < \epsilon$.)
```

```{solution} ex-bctic
From the bound in {prf:ref}`ex-bctqb`, we obtain

$$
    \| u_m - u_k \| \leq \frac{\lambda^m - \lambda^k}{1 - \lambda} \| u_0 - u_1 \|
    \qquad (m,k \in \NN \text{ with } m < k).
$$

The right-hand side is bounded above by $\lambda^m \| u_0 - u_1 \| / (1 - \lambda)$, which converges to zero as $m \to \infty$. Hence $(u_m)$ is Cauchy, as claimed.
```

A fundamental property of $\RR^n$ is that if $(v_m)$ is a Cauchy sequence in $\RR^n$, then there exists a $\bar v \in \RR^n$ such that $(v_m)$ converges to $\bar v$. (This property is called **completeness** of the vector space $\RR^n$. See, for example, {cite}`cinlar2013real`.) Hence it follows from {prf:ref}`ex-bctic` that $(u_m)$ has a limit $u^* \in \RR^n$.

```{exercise}
:label: ex-intro-auto-11

Prove that $u^* \in U$.
```

```{prf:proof}
*Proof of {prf:ref}`t-bfpt`.* The preceding exercises established existence of a point $u^* \in U$ such that $T^m u \to u^*$. The fact that $u^*$ is a fixed point of $T$ now follows from {prf:ref}`ex-clifp` and {prf:ref}`ex-sciufp`. Uniqueness is implied by {prf:ref}`ex-sciufp`. The bound {eq}`eq-banachrate` follows from iteration on the contraction inequality {eq}`eq-uc` while setting $v=u^*$. ◻
```

```{exercise}
:label: ex-intro-auto-12

Let $T$ be a contraction of modulus $\beta$ on $\RR^n$ with fixed point $\bar u$. Consider the **damped** or **relaxed** iteration scheme $u_{k+1} = (1-\alpha) u_k + \alpha Tu_k$. Show that, for any choice of $u_0$, these iterates converge to $\bar u$ whenever $0 < \alpha \leq 1$.
```

```{solution} ex-intro-auto-12
Fix $\alpha \in (0,1]$ and let $F$ be defined by $Fu = (1-\alpha)u + \alpha Tu$.
Readers will be able to verify that $F$ is also a contraction with identical fixed point $\bar u$, and that damped iteration is just iteration with $F$. The claim follows. ``` (sss-sus)= ### Successive Approximation Consider a self-map $T$ on $U \subset \RR^n$. We seek algorithms that compute fixed points of $T$ whenever they exist. #### Iteration If $T$ is globally stable on $U$, then a natural algorithm for approximating the unique fixed point $u^*$ of $T$ in $U$ is to pick any $u \in U$ and iterate with $T$ for some finite number of steps: ```{prf:algorithm} :label: algo-intro-auto-2 - fix $u_0 \in U$ and $\tau > 0$ - $k \leftarrow 0$ - $\epsilon \leftarrow \tau + 1$ - while $\epsilon > \tau$: - $u_{k+1} \leftarrow T u_k$ - $\epsilon \leftarrow \|u_{k+1} - u_k\|$ - $k \leftarrow k+1$ - return $u_k$ ``` By the definition of global stability, $(u_k)_{k \geq 0}$ converges to $u^*$. The algorithm just described is called either **successive approximation** or **fixed-point iteration**. {numref}`list-s_approx` provides a function that implements this procedure. Distances between points are measured with the $\ell_\infty$ norm. ```{code-block} julia :name: list-s_approx :caption: Successive approximation (`s_approx.jl`) :linenos: """ Computes an approximate fixed point of a given operator T via successive approximation. 
""" function successive_approx(T, # operator (callable) u_0; # initial condition tolerance=1e-6, # error tolerance max_iter=10_000, # max iteration bound print_step=25) # print at multiples u = u_0 error = Inf k = 1 while (error > tolerance) & (k <= max_iter) u_new = T(u) error = maximum(abs.(u_new - u)) if k % print_step == 0 println("Completed iteration $k with error $error.") end u = u_new k += 1 end if error <= tolerance println("Terminated successfully in $k iterations.") else println("Warning: hit iteration bound.") end return u end ``` ```{code-block} julia :name: list-linear_iter :caption: Using successive approximations to compute $u^*$ (`linear_iter.jl`) :linenos: include("s_approx.jl") using LinearAlgebra # Compute the fixed point of Tu = Au + b via linear algebra A, b = [0.4 0.1; 0.7 0.2], [1.0; 2.0] u_star = (I - A) \ b # compute (I - A)^{-1} * b # Compute the fixed point via successive approximation T(u) = A * u + b u_0 = [1.0; 1.0] u_star_approx = successive_approx(T, u_0) # Test for approximate equality (prints "true") print(isapprox(u_star, u_star_approx, rtol=1e-5)) ``` {numref}`list-linear_iter` applies successive approximation to the map $Tu = Au + b$ using the function defined in `s_approx.jl`. Figure {numref}`f-linear_iter_fig_1` shows the sequence of iterates generated by four runs of the successive approximation algorithm, each with a different starting condition $u_0$. The map and parameters are the same as in {numref}`list-linear_iter`. It is clear from the figure that a good choice of initial condition (i.e., one that is close to the fixed point) accelerates convergence. ```{figure} ../figures/linear_iter_fig_1.pdf :name: f-linear_iter_fig_1 Successive approximations from different initial conditions ``` Of course for $Tu = Au + b$ with $\rho(A)<1$, there is a more direct method to compute the fixed point: The Neumann series lemma tells us that $u^* = (I-A)^{-1} b$ so we can apply a numerical linear equation solver. 
However, even for this case, sometimes successive approximation is used instead. One reason is that $(I-A)^{-1}$ can be very large, making application of a linear solver problematic. Another is that we might be satisfied with a quick approximation of the fixed point, computed with a few iterations of $T$. Both of these situations can arise in dynamic programming. (sss-nonmaps)= #### A One-Dimensional Example To illustrate successive approximations in a nonlinear setting, we use the Solow--Swan growth model, which is a good place to begin presenting a theory of economic growth. A fixed point for the Solow--Swan model can be computed with pencil and paper. The model also provides a good laboratory for studying how successive approximations might converge to a fixed point. One version of the Solow--Swan growth dynamics is $$ k_{t+1} = s f(k_t) + (1 - \delta) k_t, \qquad t = 0, 1, \ldots, $$ (eq-solow) where $k_t$ is capital stock per worker, $f \colon (0, \infty) \to (0, \infty)$ is a production function, $s > 0$ is a saving rate and $\delta \in (0,1)$ is a rate of depreciation. If we set $g(k) \coloneq sf(k) + (1-\delta)k$, then iterating with $g$ from a starting point $k_0$ (i.e., setting $k_{t+1} = g(k_t)$ for all $t \geq 0$) generates the sequence in {eq}`eq-solow`. We can also understand this process as using successive approximation to compute the fixed point of $g$. ```{exercise} :label: ex-ssnc Let $f(k)=A k^{\alpha}$ with $A > 0$ and $0 < \alpha < 1$. Show that, while the Solow--Swan map $g(k) = s A k^\alpha + (1 - \delta) k$ sends $U \coloneq (0, \infty)$ into itself, $g$ is *not* a contraction on $U$. (Hint: use the definition of the derivative of $g$ as a limit and consider the derivative $g'(k)$ for $k$ close to zero.) ``` ```{solution} ex-ssnc By the definition of the derivative, for any $x \in U \coloneq (0, \infty)$, we have $$ \lim_{y \to x} \left| \frac{g(y)-g(x)}{y - x} - g'(x) \right| = 0. 
$$

Hence, by the reverse triangle inequality, for fixed $\epsilon > 0$, we can take an $\eta > 0$ such that

$$
    \left| \frac{g(y)-g(x)}{y - x} \right| > |g'(x)| - \epsilon = g'(x) - \epsilon
$$

for all $y \in U$ with $0 < |y - x| < \eta$. Rearranging gives

$$
    |g(x)-g(y)| > [g'(x) - \epsilon] |x-y|
$$

for all such $y$. But $g'(x) = s A \alpha x^{\alpha-1} + 1 - \delta$, which converges to $+\infty$ as $x \to 0$. It follows that, for any $\lambda \in [0,1)$, we can choose $x$ sufficiently close to zero and then find a $y$ such that $|g(x)-g(y)| > \lambda |x-y|$. Hence $g$ is not a contraction map under $|\cdot|$.
```

Although the model specified in {prf:ref}`ex-ssnc` does not generate a contraction, it is globally stable. The next exercise asks you to prove this.

```{exercise}
:label: ex-kssol

Show that, in the setting of {prf:ref}`ex-ssnc`, the unique fixed point of $g$ in $U$ is

$$
    k^* \coloneq \left( \frac{s A}{\delta} \right)^{1/(1 - \alpha)}.
$$

Prove that, for $k \in U$,

1. $k \leq k^*$ implies $k \leq g(k) \leq k^*$ and
2. $k^* \leq k$ implies $k^* \leq g(k) \leq k$.

Conclude that $g$ is globally stable on $U$. (Why?)
```

Figure {numref}`f-solow_fp` illustrates the dynamics in a 45-degree diagram when $f(k) = A k^\alpha$. In the top subfigure, $A=2.0$, $\alpha=0.3$, $s=0.3$ and $\delta=0.4$. The function $g$ is plotted alongside the 45-degree line. When $g(k_t)$ lies strictly above the 45-degree line, then $k_{t+1} = g(k_t) > k_t$ and so capital per worker rises. If $g(k_t) < k_t$ then it falls. A trajectory $(k_t)_{t \geq 0}$ that is produced by starting from a particular choice of $k_0$ is traced out in the figure.

```{figure} ../figures/solow_fp.pdf
:name: f-solow_fp

Successive approximation for the Solow--Swan model
```

The figure illustrates that $k^*$ is the unique fixed point of $g$ in $U$ and all sequences converge to it.
The second statement can be rephrased as: successive approximation successfully computes the fixed point of $g$ by stepping through the time path of capital. (ss-fvv)= ### Finite-Dimensional Function Space In {ref}`sss-jsin` we introduced a Bellman equation for the infinite horizon job search problem. The unknown object in the Bellman equation is a function $v^*$ defined on the set $\Wsf$ of possible wage offers. Below we discuss how to solve for this unknown function. Since the set of wage offers is finite we can write $\Wsf$ as $\{w_1, \ldots, w_n\}$ for some $n \in\NN$. If we adopt this convention and also write $v^*(w_i)$ as $v^*_i$, then we can view $v^*$ as a vector $(v^*_1, \ldots, v^*_n)$ in $\RR^n$. The vector interpretation is useful when coding, since vectors (numerical arrays) are an efficient data type. Nevertheless, for mathematical exposition, we usually find it more convenient to express function-like objects (e.g., value functions) as functions rather than vectors. Thus, we typically write $v^*(w)$ instead of $v^*_i$. ```{prf:remark} :label: r-deep There is a deeper reason that we usually work with functions rather than vectors: When we shift to general state and action spaces in Volume 2, objects such as value functions can no longer be represented by finite-dimensional vectors. Instead, we must use the language of functional analysis. By adopting this language now, the leap to general spaces will be smoother, since terminology and notation will mostly be unchanged. ``` Section {ref}`sss-poov` clarifies our notation with respect to functions and vectors. (sss-poov)= #### Pointwise Operations on Functions If $\Xsf$ is any set and $u$ maps $\Xsf$ to $\RR$, then we call $u$ a **real-valued function** on $\Xsf$ and write $u \colon \Xsf \to \RR$. Throughout, the symbol $\RR^\Xsf$ denotes the set of all real-valued functions on $\Xsf$. 
This is a special case of the symbol $B^A$ that represents the set of all functions from $A$ to $B$, where $A$ and $B$ are sets. If $u, v \in \RR^\Xsf$ and $\alpha, \beta \in \RR$, then the expressions $\alpha u + \beta v$ and $uv$ also represent elements of $\RR^\Xsf$, defined at $x \in \Xsf$ by $$ (\alpha u + \beta v)(x) = \alpha u(x) + \beta v(x) \quad \text{and} \quad (uv)(x) = u(x)v(x). $$ (eq-arpo) Similarly, $|u|$, $u \vee v$, and $u \wedge v$ are real-valued functions on $\Xsf$ defined by $$ |u|(x) = |u(x)|, \quad (u \vee v)(x) = u(x) \vee v(x) \;\; \text{ and } \;\; (u \wedge v)(x) = u(x) \wedge v(x). $$ (eq-fvg) Figure {numref}`f-infsup` illustrates functions $u \vee v$ and $u \wedge v$ when $\Xsf$ is a subset of $\RR$. ```{figure} figures/infsup.svg :name: f-infsup Functions $u \vee v$ and $u \wedge v$ ``` Similarly, if $u = (u_i)_{i=1}^n$ and $v = (v_i)_{i=1}^n$ are vectors in $\RR^n$, then $$ |u| \coloneq (|u_i|)_{i=1}^n, \quad u \wedge v \coloneq (u_i \wedge v_i)_{i=1}^n \quad \text{and} \quad u \vee v \coloneq (u_i \vee v_i)_{i=1}^n. $$ (eq-pointov) Figure {numref}`f-vec_sup_inf` illustrates in $\RR^2$. ```{figure} figures/vec_sup_inf.pdf :name: f-vec_sup_inf The vectors $u \vee v$ and $u \wedge v$ in $\RR^2$ ``` (sss-fvv)= #### Functions versus Vectors Let $\Xsf$ be finite, so that $\Xsf = \{x_1, \ldots, x_n\}$ for some $n \in \NN$. The set $\RR^\Xsf$ is the vector space $\RR^n$ expressed in different notation. The next lemma clarifies. ```{prf:lemma} :label: l-rxrn If $\Xsf = \{x_1, \ldots, x_n\}$, then $$ \RR^\Xsf \; \ni \; u \quad \longleftrightarrow \quad (u(x_1), \ldots, u(x_n)) \in \RR^n $$ (eq-isomph) is a one-to-one correspondence between the function space $\RR^\Xsf$ and the vector space $\RR^n$. ``` The claim in {prf:ref}`l-rxrn` is obvious: a real-valued function $u$ on $\Xsf$ is uniquely identified by the set of values that it takes on $\Xsf$, which is an $n$-tuple of real numbers. 
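The correspondence in {eq}`eq-isomph` and the pointwise operations in {eq}`eq-fvg` and {eq}`eq-pointov` can be sketched in Julia as follows. This is a minimal illustration under assumptions of our own: the set `X`, the functions `u` and `v` (stored as dictionaries), and the helpers `to_vector` and `to_function` are hypothetical names chosen for the example.

```julia
# A finite set X = {x_1, x_2, x_3}, listed in a fixed order
X = [:a, :b, :c]

# Two real-valued functions on X, stored as dictionaries
u = Dict(:a => 1.0, :b => -2.0, :c => 0.5)
v = Dict(:a => 0.0, :b => 3.0, :c => 0.5)

# Function -> vector (u(x_1), ..., u(x_n)) and back again
to_vector(f, X) = [f[x] for x in X]
to_function(w, X) = Dict(x => w[i] for (i, x) in enumerate(X))

u_vec, v_vec = to_vector(u, X), to_vector(v, X)

# Pointwise operations act element by element on the vectors
abs_u = abs.(u_vec)          # |u|
sup_uv = max.(u_vec, v_vec)  # u ∨ v
inf_uv = min.(u_vec, v_vec)  # u ∧ v

println(abs_u)
println(sup_uv)
println(inf_uv)
```

Mapping a function to a vector and back recovers the original function, which is the sense in which the two representations carry the same information.
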
Throughout the text, whenever the supporting set $\Xsf$ is finite, we freely use the identification in {eq}`eq-isomph`. For example, if $\| \cdot \|$ is any norm on $\RR^n$, then $\| \cdot \|$ extends to $\RR^\Xsf$ via the identification in {eq}`eq-isomph`. That is, for $u \in \RR^\Xsf$, the value $\| u \|$ is given by the norm of the vector $(u(x_1), \ldots, u(x_n)) \in \RR^n$.

We say that a subset of $\RR^\Xsf$ is **closed** (resp., **open**, **compact**, etc.) if the corresponding subset of $\RR^n$ is closed (resp., open, compact, etc.).

With these conventions, the Neumann series lemma and Banach's contraction mapping theorem extend directly from $\RR^n$ to $\RR^\Xsf$. For example, if $|\Xsf|=n$, $C$ is closed in $\RR^\Xsf$ and $T$ is a contraction on $C \subset \RR^\Xsf$, in the sense that $T \colon C \to C$ and

$$
    \text{ there exists a } \lambda \in [0, 1) \ \st \quad
    \| Tf - Tg \| \leq \lambda \| f - g \|
    \quad \text{for all} \quad f, g \in C,
$$

then $T$ has a unique fixed point $f^*$ in $C$ and

$$
    \| T^k f - f^* \| \leq \lambda^k \| f - f^* \|
    \quad \text{for all } k \in \NN \text{ and } f \in C.
$$

Incidentally, in the preceding paragraph $T$ is a function that sends functions into functions (e.g., sends $f$ into $Tf$). To help distinguish $T$ from the functions that it acts on, $T$ in this setting is often called an **operator** rather than a function. This is a convention rather than a formal distinction: from a mathematical perspective, an operator is just a function.

A foundational class of operators acting on $\RR^\Xsf$ is the set of linear operators. There is a strong sense in which linear operators are just matrices. We investigate these ideas in {ref}`ss-linopers`.

At the same time, when studying dynamic programming we also use many operators that are not linear. One example is the "Bellman operator," which we start to investigate in {ref}`sss-bellop_fm`.
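The identification between functions on a finite set and vectors, together with the error bound for iterates of a contraction, can be checked numerically. The following is a minimal sketch rather than code from the book: the set `X`, the functions `u` and `v`, the toy operator `T` and the helper `check_bound` are all hypothetical choices made for illustration.

```julia
# Identify a function on a finite set with the vector of its values.
X = [0.0, 0.5, 1.0, 1.5]          # hypothetical finite set, listed as x_1, ..., x_n
u = sin.(X)                       # u(x_i) is stored as u[i]
v = cos.(X)

u_join = max.(u, v)               # u ∨ v, computed pointwise
u_meet = min.(u, v)               # u ∧ v, computed pointwise
sup_norm(f) = maximum(abs, f)     # supremum norm under the identification

# A toy contraction of modulus λ on ℝ^X: (Tf)(x) = λ f(x) + b(x).
λ = 0.5
b = X .^ 2
T(f) = λ .* f .+ b
f_star = b ./ (1 - λ)             # the unique fixed point, since T f_star = f_star

# Check ‖T^k f − f*‖ ≤ λ^k ‖f − f*‖ along one trajectory (small float slack).
function check_bound(f, k_max)
    e0 = sup_norm(f .- f_star)
    g = copy(f)
    for k in 1:k_max
        g = T(g)
        sup_norm(g .- f_star) <= λ^k * e0 + 1e-12 || return false
    end
    return true
end
```

Calling `check_bound(u, 20)` should return `true` for this affine example, where the bound in fact holds with equality up to rounding.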
(sss-dists)= #### Distributions Given a set $\Xsf$ with $n$ elements, the set of probability **distributions** on $\Xsf$ is written as $\dD(\Xsf)$ and contains all $\phi \in \RR_+^\Xsf$ with $\sum_{x \in \Xsf} \phi(x) =1$. Since we can identify any $f \in \RR^\Xsf$ with a corresponding vector in $\RR^n$, the set $\dD(\Xsf)$ can also be thought of as a subset of $\RR^n$. This collection of vectors (i.e., the nonnegative vectors that sum to unity) is also called the **unit simplex**. Given $\Xsf_0 \subset \Xsf$ and $\phi \in \dD(\Xsf)$, we say that $\phi$ is **supported** on $\Xsf_0$ if $\phi(x) > 0$ implies $x \in \Xsf_0$. Fix $h \in \RR^\Xsf$ and $\phi \in \dD(\Xsf)$. Let $X$ be a random variable with distribution $\phi$, so that $\PP\{X = x\} = \phi(x)$ for all $x \in \Xsf$. The **expectation** of $h(X)$ is $$ \EE h(X) \coloneq \sum_{x \in \Xsf} h(x) \phi(x) = \inner{h, \phi}. $$ ```{exercise} :label: ex-msmax Fix $h \in \RR^\Xsf$. Show that $\phi^* \in \argmax_{\phi \in \dD(\Xsf)} \inner{h, \phi}$ if and only if $\phi^*$ is supported on $\argmax_{x \in \Xsf} h(x)$. ``` ```{solution} ex-msmax Let $m = \max_{x \in \Xsf} h(x)$ and $M = \argmax_{x \in \Xsf} h(x)$. Suppose first that $\phi^*$ is supported on $M$ and let $\phi$ be any distribution on $\Xsf$. Then $\inner{h, \phi^*} = m \geq \inner{h, \phi}$. Conversely, if $\phi^*$ is not supported on $M$, then $\inner{h, \phi^*} < m$. In other words, $\phi^* \in \argmax_{\phi \in \dD(\Xsf)} \inner{h, \phi}$ if and only if $\phi^*$ is supported on $M$. ``` If $\Xsf \subset \RR$, then the **cumulative distribution function** (CDF) corresponding to $\phi$ is the map $\Phi$ from $\Xsf$ to $\RR$ given by $$ \Phi(x) \coloneq \PP\{X \leq x\} = \sum_{x' \in \Xsf} \1\{x' \leq x\} \phi(x'). $$ If $\tau \in [0,1]$, then the $\tau$-th **quantile** of $X$ is $$ Q_\tau \,X \coloneq \min \setntn{x \in \Xsf}{\Phi(x) \geq \tau}. $$ (eq-quantile) If $\tau = 1/2$, then $Q_\tau \,X$ is called the **median** of $X$. 
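The definitions of expectation, CDF and quantile translate directly into array operations. The sketch below is an illustration under stated assumptions, not code from the book: it assumes the support is stored as a vector sorted in increasing order, and the names `expectation` and `quantile_discrete` are hypothetical.

```julia
# Expectation of h(X) when X ~ ϕ, computed as the inner product ⟨h, ϕ⟩.
expectation(h, ϕ) = sum(h .* ϕ)

"Return the τ-th quantile of X ~ ϕ, where x_vals is the sorted support."
function quantile_discrete(x_vals, ϕ, τ)
    Φ = cumsum(ϕ)                        # CDF evaluated at each support point
    i = findfirst(p -> p >= τ, Φ)        # smallest index i with Φ(x_i) ≥ τ
    # Guard against rounding error that leaves Φ[end] slightly below τ.
    return i === nothing ? x_vals[end] : x_vals[i]
end

x_vals = [1.0, 2.0, 3.0]                 # hypothetical support
ϕ = [0.5, 0.0, 0.5]                      # hypothetical distribution
median_x = quantile_discrete(x_vals, ϕ, 0.5)
```

With these values the CDF is `[0.5, 0.5, 1.0]`, so `median_x` equals `x_vals[1]`: the `min` in the quantile definition selects the smallest point at which the CDF reaches $\tau$.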
```{prf:example}
Suppose $\Xsf = \{x_1, x_2, x_3\}$. If $\phi = (0.5, 0.0, 0.5)$ and $X \sim \phi$, then $\Phi = (0.5, 0.5, 1)$ and $Q_{1/2}(X) = x_1$. The min in {eq}`eq-quantile` allows us to select a unique median (even though $x_2$ is also a reasonable choice).
```

Evidently, if the median of $X$ is $x$, then the median of $X + \alpha$ will be $x + \alpha$. This same logic carries over to arbitrary quantiles, as the next exercise asks you to show.

```{exercise}
:label: ex-qftr

Prove that the quantile function is additive over constants. That is, for any $\tau \in [0,1]$, random variable $X$ on $\Xsf$ and $\alpha \in \RR$, we have $Q_\tau(X + \alpha) = Q_\tau(X) + \alpha$.
```

```{solution} ex-qftr
Fix $\tau \in [0,1]$, $X \sim \phi \in \dD(\Xsf)$ and $\alpha \in \RR$. Let $\Phi_X$ be the CDF of $X$. Let $Y \coloneq X + \alpha$, let $\Ysf \coloneq \setntn{x + \alpha}{x \in \Xsf}$ and let $\Phi_Y$ be the CDF of $Y$. Note that $\Phi_Y(y) = \PP\{Y \leq y\} = \PP\{X \leq y - \alpha\} = \Phi_X(y-\alpha)$ for all $y \in \Ysf$.

Let $x^* \coloneq Q_\tau \,X$ and let $y^* = Q_\tau(X + \alpha) = \min\setntn{y \in \Ysf}{\Phi_Y(y) \geq \tau}.$ We need to show that $y^* = x^* + \alpha$. We do this by proving $y^* \geq x^* + \alpha$ and $y^* \leq x^* + \alpha$.

For the first inequality, fix $y \in \Ysf$ such that $\Phi_Y(y) \geq \tau$. Let $x = y - \alpha$. We then have $\Phi_Y(x + \alpha) \geq \tau$ and hence $\Phi_X(x) \geq \tau$. Hence $x \geq x^*$, or $y \geq x^* + \alpha$. Since this last inequality holds for any $y \in \Ysf$ with $\Phi_Y(y) \geq \tau$, we have $y^* \geq x^* + \alpha$.

For the reverse inequality, fix $x \in \Xsf$ with $\Phi_X(x) \geq \tau$ and set $y = x + \alpha$. We have $\Phi_Y(y) = \Phi_X(y - \alpha) = \Phi_X(x) \geq \tau$, so $y \geq y^*$, or $x \geq y^* - \alpha$. Since the last inequality holds for all $x \in \Xsf$ with $\Phi_X(x) \geq \tau$, we have $x^* \geq y^* - \alpha$. Rearranging gives $y^* \leq x^* + \alpha$, as was to be shown.
``` (ss-js)= ## Infinite-Horizon Job Search Armed with fixed-point methods, we return to the job search problem discussed in {ref}`ss-ihfl`. (ss-jsvp)= ### Values and Policies In this section, we solve for the value function of an infinite horizon job search problem and associated optimal choices. (sss-jsoc)= #### Optimal Choices Let's recall the strategy for solving the infinite-horizon job search problem we proposed in {ref}`sss-jsin`. The first step is to compute the optimal value function $v^*$ that solves the Bellman equation $$ v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*(w') \phi(w') \right\} \qquad (w \in \Wsf). $$ (eq-jsbell) Suppose for a moment that we can compute $v^*$, and let $$ h^* \coloneq c + \beta \sum_{w'} v^*(w') \phi(w') $$ (eq-jsdhd) be the infinite-horizon **continuation value** that equals the maximal lifetime value that the worker can receive, contingent on deciding to continue being unemployed today. With $h^*$ in hand, the optimal decision at any given time, facing the current wage draw $w \in \Wsf$, is as follows: 1. If $w / (1-\beta) \geq h^*$, then accept the job offer. 2. If not, then reject and wait for the next offer. This decision maximizes lifetime value given the current offer. (Later we will prove that this decision process is optimal as claimed. For now, however, we focus on computing $v^*$ and $h^*$.) (sss-bellop_fm)= #### The Bellman Operator The method proposed in {ref}`sss-jsoc` requires that we solve for $v^*$. To do so, we introduce a **Bellman operator** $T$ defined at $v \in \RR^\Wsf$ that is constructed to assure that any fixed point of $T$ solves the Bellman equation and vice versa: $$ (Tv)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \sum_{w' \in \Wsf} v(w') \phi(w') \right\} \qquad (w \in \Wsf). $$ (eq-jsbod) Let $\vV \coloneq \RR^\Wsf_+$ and let $\| \cdot \|_\infty$ be the supremum norm on $\vV$. 
We measure the distance between two elements $f, g$ of $\vV$ by $\| f - g\| = \max_{w \in \Wsf}|f(w) - g(w)|$. Under this distance, we have the following result. ```{prf:proposition} :label: p-js0be $T$ is a contraction of modulus $\beta$ on $\vV$. ``` Now we turn to the proof of {prf:ref}`p-js0be`. An implication of the proposition is that $T^k v \to v^*$ as $k \to \infty$ for any $v \in \vV$, so we can compute $v^*$ to any required degree of accuracy by successive approximation. Our proof of {prf:ref}`p-js0be` uses the elementary bound $$ |\alpha \vee x - \alpha \vee y| \leq |x - y| \qquad (\alpha, x, y \in \RR) $$ (eq-elmb) ```{exercise} :label: ex-elmb Verify that {eq}`eq-elmb` always holds. ({prf:ref}`ex-ast` might be helpful.) ``` ```{solution} ex-elmb Fix $\alpha, x, y \in \RR$. We have $x = x - y + y \leq |x-y| + y$. Applying the result in {prf:ref}`ex-ast` yields $$ \alpha \vee x \leq |x-y| + \alpha \vee y \quad \iff \quad \alpha \vee x - \alpha \vee y\leq |x-y|. $$ Reversing the roles of $x$ and $y$ completes the proof. ``` ```{prf:proof} *Proof of {prf:ref}`p-js0be`.* Take any $f, g$ in $\vV$ and fix any $w \in \Wsf$. Apply the bound in {eq}`eq-elmb` to get $$ \begin{aligned} |(Tf)(w) - (Tg)(w)| & \leq \left| c + \beta \sum_{w'} f(w') \phi(w') - \left( c + \beta \sum_{w'} g(w') \phi(w') \right) \right| \\ & = \beta \left| \sum_{w'} [f(w') - g(w')] \phi(w') \right|. \end{aligned} $$ Apply the triangle inequality to obtain $$ |(Tf)(w) - (Tg)(w)| \leq \beta \sum_{w'} |f(w') - g(w')| \phi(w') \leq \beta \| f - g \|_\infty . $$ Taking the supremum over all $w$ on the left-hand side of this expression leads to $$ \|Tf - Tg \|_\infty \leq \beta \| f - g \|_\infty . $$ Since $f, g$ were arbitrary elements of $\vV$, the contraction claim is verified. ◻ ``` (sss-policies)= #### Optimal Policies A dynamic program seeks optimal policies. We briefly introduce the notion of a policy and relate it to the job search application. 
In general, for a dynamic program, choices by the controller aim to maximize lifetime rewards and consist of a state-contingent sequence $(A_t)_{t \geq 0}$ specifying how the agent acts at each point in time. The agent does not know what the future will bring, so it is natural to assume that $A_t$ can depend on present and past events but not future ones. Hence $A_t$ is a function of the current state $X_t$ and past state-action pairs $(A_{t-i}, X_{t-i})$ for $i \geq 1$. That is,

$$
    A_t = \sigma_t( X_t, A_{t-1}, X_{t-1}, A_{t-2}, X_{t-2}, \ldots, A_0, X_0)
$$

for some function $\sigma_t$; $\sigma_t$ is called a time $t$ **policy function**.

A key insight of dynamic programming is that some problems can be set up so that *the optimal current action can be expressed as a function of the current state $X_t$*.

```{prf:example}
In {prf:ref}`eg-retail`, the retailer chooses stock orders and prices in each period. Every quantity relevant to this decision belongs in the current state. It might include not just the level of current inventories and various measures of business conditions, but also information about rates at which inventories have changed over each of the past six months.
```

If the current state $X_t$ is enough to determine a current optimal action, then policies are just maps from states to actions. So we can write $A_t = \sigma(X_t)$ for some function $\sigma$. A policy function that depends only on the current state is often called a **Markov policy**. Since all policies we consider will be Markov policies, we refer to them more concisely as "policies."

```{prf:remark}
In the last paragraph, we dropped the time subscript on $\sigma$ with no loss of generality because we can always include the date $t$ in the current state; i.e., if $Y_t$ is the state without time, then we can set $X_t = (t, Y_t)$. Whether this is necessary depends on the problem at hand.
For the job search model with finite horizon, the date matters because opportunities for future earnings decrease with the passage of time. For the infinite horizon version of the problem, in which an agent always looks forward toward an infinite horizon, the only current information that matters to the agent at time $t$ is the wage offer $W_t$. As a result, the calendar date $t$ does not affect the agent's decision at time $t$, so there is no need to include time in the state. (In {ref}`sss-nonstat`, we will formalize this argument.) ``` In the job search model, the state is the current wage offer and possible actions are to accept or to reject the current offer. With $0$ interpreted as reject and $1$ understood as accept, the action space is $\{0,1\}$, so a policy is a map $\sigma$ from $\Wsf$ to $\{0,1\}$. Let $\Sigma$ be the set of all such maps. A policy is an "instruction manual": for an agent following $\sigma \in \Sigma$, if current wage offer is $w$, the agent always responds with $\sigma(w) \in \{0, 1\}$. The policy dictates whether the agent accepts or rejects at any given wage. For each $v \in \vV$, a **$v$-greedy policy** is a $\sigma \in \Sigma$ satisfying $$ \sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \, \sum_{w' \in \Wsf} v(w') \phi(w') \right\} \quad \text{for all } w \in \Wsf. $$ (eq-jsvg) Equation {eq}`eq-jsvg` says that an agent accepts if $w/(1-\beta)$ exceeds the continuation value computed using $v$ and rejects otherwise. Our discussion of optimal choices in {ref}`sss-jsoc` can now be summarized as the recommendation $$ \text{Adopt a } v^* \text{-greedy policy.} $$ This statement is sometimes called Bellman's principle of optimality. Inserting $v^*$ into {eq}`eq-jsvg` and rearranging, we can express a $v^*$-greedy policy via $$ \sigma^*(w) = \1 \left\{ w \geq w^* \right\} \quad \text{where } \; w^* \coloneq (1 - \beta) h^* . 
$$ (eq-opjs3d)

The quantity $w^*$ in {eq}`eq-opjs3d` is called the **reservation wage**, and parallels the reservation wage that we introduced for the finite-horizon problem. Equation {eq}`eq-opjs3d` states that value maximization requires accepting an offer if and only if it exceeds the reservation wage. Thus, $w^*$ provides a scalar description of an optimal policy.

### Computation

Let's turn to computation. In {ref}`sss-jscomp`, we apply a standard dynamic programming method, called value function iteration. In {ref}`ss-crwd`, we apply a more specialized method that uses the structure of the job search problem to accelerate computation.

(sss-jscomp)=
#### Value Function Iteration

Recall that, by {prf:ref}`p-js0be`, we can compute an approximate optimal policy by applying successive approximation via the Bellman operator. In the language of dynamic programming, this is called **value function iteration**. {prf:ref}`algo-js_vfi` provides a full description.

```{prf:algorithm} Value function iteration for job search
:label: algo-js_vfi

- input $v_0 \in \vV$, an initial guess of $v^*$
- input $\tau$, a tolerance level for error
- $\epsilon \leftarrow \tau + 1$
- $k \leftarrow 0$
- while $\epsilon > \tau$:
    - for $w \in \Wsf$:
        - $v_{k+1}(w) \leftarrow (Tv_k) (w)$
    - $\epsilon \leftarrow \| v_k - v_{k+1} \|_\infty$
    - $k \leftarrow k + 1$
- Compute a $v_k$-greedy policy $\sigma$
- return $\sigma$
```

While $T^k v$ rarely attains $v^*$ for $k < \infty$, we can obtain a close approximation by monitoring distances between successive iterates, waiting until they become small enough. Later we will study how these distances depend on $k$, the number of iterations, as well as on parameters defining rewards and opportunities.

{numref}`list-iid_job_search` implements value function iteration for the infinite-horizon job search model, using the function for successive approximation from {numref}`list-s_approx`.
```{code-block} julia
:name: list-iid_job_search
:caption: Value function iteration (`iid_job_search.jl`)
:linenos:

include("two_period_job_search.jl")
include("s_approx.jl")

" The Bellman operator. "
function T(v, model)
    (; n, w_vals, ϕ, β, c) = model
    return [max(w / (1 - β), c + β * v'ϕ) for w in w_vals]
end

" Get a v-greedy policy. "
function get_greedy(v, model)
    (; n, w_vals, ϕ, β, c) = model
    σ = w_vals ./ (1 - β) .>= c .+ β * v'ϕ  # Boolean policy vector
    return σ
end

" Solve the infinite-horizon IID job search model by VFI. "
function vfi(model=default_model)
    (; n, w_vals, ϕ, β, c) = model
    v_init = zero(model.w_vals)
    v_star = successive_approx(v -> T(v, model), v_init)
    σ_star = get_greedy(v_star, model)
    return v_star, σ_star
end
```

Figure {numref}`f-iid_job_search_1` shows a sequence of iterates $(T^k v)_k$ when $v \equiv 0$ and parameters are as given in {numref}`list-two_period_job_search`. Iterates $0, 1$, and $2$ are shown, in addition to iterate $1000$, which we take as a good approximation to the limiting function. If you experiment with different initial conditions, you will see that they all converge to the same limit.

```{figure} ../figures/iid_job_search_1.pdf
:name: f-iid_job_search_1

A sequence of iterates of the Bellman operator
```

Figure {numref}`f-iid_job_search_3` shows an approximation of $v^*$ computed using the code in {numref}`list-iid_job_search`, along with the stopping reward $w/(1-\beta)$ and the corresponding continuation value {eq}`eq-jsdhd`. As anticipated, the value function is the pointwise supremum of the stopping reward and the continuation value. The worker chooses to accept an offer only when that offer exceeds some value close to 43.5.
```{figure} ../figures/iid_job_search_3.pdf :name: f-iid_job_search_3 The approximate value function for job search ``` (ss-crwd)= #### Computing the Continuation Value Directly The technique we employed to solve the job search model in {ref}`ss-jsvp` follows a standard approach to dynamic programming. But for this particular problem, there is an easier way to compute the optimal policy that sidesteps calculating the value function. This section explains how. Recall that the value function satisfies the Bellman equation $$ v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \sum_{w'} v^*(w') \phi(w') \right\} \qquad (w \in \Wsf), $$ (eq-jsbell0) and that the continuation value is given by {eq}`eq-jsdhd`. We can use $h^*$ to eliminate $v^*$ from {eq}`eq-jsbell0`. First we insert $h^*$ on the right-hand side of {eq}`eq-jsbell0` and then we replace $w$ with $w'$, which gives $v^*(w') = \max \left\{ w'/(1-\beta) ,\, h^* \right\}$. Then we take mathematical expectations of both sides, multiply by $\beta$ and add $c$ to obtain $$ h^* = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1-\beta} ,\, h^* \right\} \phi(w'). $$ (eq-jsodbh) To obtain the unknown value $h^*$, we introduce the mapping $g \colon \RR_+ \to \RR_+$ defined by $$ g(h) = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(w'). $$ (eq-jsdg) By construction, $h^*$ solves {eq}`eq-jsodbh` if and only if $h^*$ is a fixed point of $g$. ```{exercise} :label: ex-jscuf0 Show that $g$ is a contraction map on $\RR_+$. Conclude that $h^*$ is the unique fixed point of $g$ in $\RR_+$. ``` Figure {numref}`f-iid_job_search_4` shows the function $g$ using the discrete wage offer distribution and parameters as adopted previously. The unique fixed point is $h^*$. 
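Because $g$ is a contraction on $\RR_+$, successive approximation applies to the scalar problem just as it does to the Bellman operator. The following is a minimal sketch, not the book's listing: the wage grid, the uniform offer distribution, the values of `β` and `c`, and the helper name `compute_h_star` are hypothetical choices made for illustration, so the numbers it produces need not match those reported in the text.

```julia
# Hypothetical parameters for illustration only (not the book's defaults).
w_vals = collect(range(10.0, 60.0, length=51))      # wage offers
ϕ = fill(1 / length(w_vals), length(w_vals))        # uniform offer distribution
β, c = 0.96, 25.0                                   # discount factor, unemployment income

# The map g(h) = c + β Σ_w' max{w'/(1-β), h} ϕ(w').
g(h) = c + β * sum(max.(w_vals ./ (1 - β), h) .* ϕ)

"Iterate a scalar map g from h_init until successive iterates are within tol."
function compute_h_star(g; h_init=0.0, tol=1e-8, max_iter=10_000)
    h = h_init
    for _ in 1:max_iter
        h_new = g(h)
        abs(h_new - h) < tol && return h_new
        h = h_new
    end
    return h      # tolerance not reached within max_iter iterations
end

h_star = compute_h_star(g)                    # approximate continuation value
w_star = (1 - β) * h_star                     # implied reservation wage
v_star = max.(w_vals ./ (1 - β), h_star)      # value function recovered from h_star
```

Each iteration here costs one pass over the wage grid and updates a single number, whereas value function iteration updates a vector of length $n$ on every step.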
```{figure} ../figures/iid_job_search_g.pdf :name: f-iid_job_search_4 Computing the continuation value as the fixed point of $g$ ``` {prf:ref}`ex-jscuf0` implies that we can compute $h^*$ by choosing arbitrary $h \in \RR_+$ and iterating with $g$. Doing so produces a value of approximately 1086. (The associated reservation wage is $w^* = (1-\beta) h^* \approx 43.4$.) Computation of $h^*$ using this method is much faster than value function iteration because the fixed-point problem is in $\RR_+$ rather than $\RR^n_+$. With $h^*$ in hand, we have solved the dynamic programming problem, since a policy $\sigma^*$ is $v^*$-greedy if and only if it satisfies $$ \sigma^*(w) = \1 \left\{ \frac{w}{1-\beta} \geq h^* \right\} \qquad (w \in \RR_+). $$ (eq-opjs2d) ```{exercise} :label: ex-intro-auto-13 As a computational exercise, compare the value function $v^*$ computed via $$ v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, h^* \right\}, $$ with our previous result, shown in Figure {numref}`f-iid_job_search_3`. You should find them essentially identical. ``` (s-cn_intro)= ## Chapter Notes Dynamic programming is often attributed to Richard Bellman (1920--1984). Both the term "dynamic programming" and the technique were popularized by {cite:t}`bellman1957dynamic`. According to his autobiography, Bellman chose the name dynamic programming to avoid giving the impression that he was conducting mathematical research within RAND Corporation. His ultimate boss, Secretary of Defense Charles Wilson, apparently disliked such research {cite:p}`bellman1984eye`. For treatments of dynamic programming from the perspective of economics and finance, see, for example, {cite:t}`sargent1987dynamic`, {cite:t}`stokey1989recursive`, {cite:t}`van2003dynamic`, {cite:t}`bauerle2011markov`, or {cite:t}`stachurski2022economic`. The job search model was introduced by {cite:t}`mccall1970`. 
The McCall model and its extensions transformed economists' way of thinking about labor markets (see, e.g., {cite}`lucas1978unemployment`). Influential extensions to the job search model include {cite:t}`burdett1978theory`, {cite:t}`jovanovic1979firm`, {cite:t}`pissarides1979job`, {cite:t}`jovanovic1984matching`, {cite:t}`mortensen1986job`, {cite:t}`ljungqvist2002lay` and {cite:t}`chetty2008moral`. {cite:t}`rogerson2005search` provides a useful survey. For elementary real analysis, the book by {cite:t}`bartle2011introduction` is excellent. {cite:t}`ok2007real` is a superb treatment of real analysis and how it is used throughout economic theory. Discussions of Banach's theorem and the Neumann series lemma can be found in {cite:t}`cheney2013analysis` and {cite:t}`atkinson2005theoretical`. {cite:t}`martins2010existence` provides an extension to Banach's theorem that requires only local contractivity. [^1]: The procedure of solving the last period first and then working back in time is called **backward induction**. Starting with the last period makes sense because there is no future to consider. [^2]: Incidentally, imposing an infinite horizon is not the same as assuming humans live forever. Rather, it corresponds to the idea that humans have no specific "termination" date. More generally, we can understand an infinite horizon as an approximation to a finite horizon in which observations are recorded at relatively high frequency and no clear termination date exists. [^3]: Hint: To prove that $A$ is invertible and $B = A^{-1}$, it suffices to show that $AB = I$. ======================================================================== ## Operators and Fixed Points (c-fpt)= # Operators and Fixed Points This chapter discusses techniques that underlie the optimization and fixed-point methods used throughout the book. Many of these techniques relate to order. 
Order-theoretic concepts will prove valuable not only for fixed-point methods but also for understanding the main concepts in dynamic programming. {prf:ref}`c-rdps` will show that core components of dynamic programming can be expressed in terms of simple order-theoretic constructs.

## Stability

In this section, we discuss algorithms for computing fixed points and analyze their convergence.

(ss-cops)=
### Conjugate Maps

First we treat a technique for simplifying analysis of stability and fixed points that we'll use in applications.

To illustrate the idea, suppose that we want to study dynamics induced by a self-map $T$ on $U \subset \RR^n$. We might want to know if a unique fixed point of $T$ exists and if iterates of $T$ converge to a fixed point. One approach is to apply fixed-point theory to $T$. However, sometimes there is an easier approach: Transform $T$ into a "simpler" map $\hat T$ and study its fixed-point properties. For this to work, we need to be sure that useful properties we discover about $\hat T$ will transmit themselves back to properties of $T$, the map that actually interests us.

This section explains a notion of conjugacy that formalizes these ideas. The study of conjugate relationships originated in the field of dynamical systems theory. Later we will apply this approach to operators that arise in contexts of dynamic programming and recursive preferences.

(sss-conjugacy)=
#### Conjugacy

A **dynamical system** is a pair $(U, T)$, where $U$ is any set and $T$ is a self-map on $U$. Two dynamical systems $(U, T)$ and $(\hat U, \hat T)$ are said to be **conjugate** under $\Phi$ if $\Phi$ is a bijection from $U$ to $\hat U$ such that $T = \Phi^{-1} \circ \hat T \circ \Phi$ on $U$.
Conjugacy of $(U, T)$ and $(\hat U, \hat T)$ under $\Phi$ can be understood as follows: Shifting a point $u \in U$ to $T u$ via $T$ is equivalent to moving $u$ into $\hat U$ via $\hat u = \Phi u$, applying $\hat T$, and then moving the result back using $\Phi^{-1}$:

```{figure} figures/conjugacy.svg
:name: fig-conjugacy
:align: center
:width: 50%

Conjugacy of dynamical systems $(U, T)$ and $(\hat U, \hat T)$ under $\Phi$.
```

```{prf:example}
:label: eg-hommat2

Let $A$ be $n \times n$ **diagonalizable**, meaning that there exists a diagonal matrix $D$ and a nonsingular matrix $P$ such that $A = P^{-1}DP$. We regard $A$ as a self-map on $\RR^n$, $D$ as a self-map on $\CC^n$, and $P$ as a map from $\RR^n$ to $\CC^n$. The identity $A = P^{-1}DP$ implies that the dynamical systems $(\RR^n, A)$ and $(\CC^n, D)$ are conjugate.
```

The next two exercises illustrate benefits of establishing a conjugate relationship between two dynamical systems.

```{exercise}
:label: ex-ctrn

Show that if $(U,T)$ and $(\hat U, \hat T)$ are conjugate under $\Phi$, then $u \in U$ is a fixed point of $T$ on $U$ if and only if $\Phi u \in \hat U$ is a fixed point of $\hat T$ on $\hat U$.
```

```{solution} ex-ctrn
Let $(U,T)$ and $(\hat U, \hat T)$ be conjugate under $\Phi$, with $\hat T \circ \Phi = \Phi \circ T$. The stated equivalence holds because

$$
    T u=u \; \iff \; \Phi T u = \Phi u \; \iff \; \hat T\Phi u = \Phi u.
$$
```

```{exercise}
:label: ex-ctrn_ex

Extending {prf:ref}`ex-ctrn`, let $(U,T)$ and $(\hat U, \hat T)$ be dynamical systems that are conjugate under $\Phi$, and let $\Fix(T)$ and $\Fix(\hat T)$ be the sets of fixed points of $T$ and $\hat T$, respectively. Show that $\Phi$ is a bijection from $\Fix(T)$ to $\Fix(\hat T)$.
```

The next result summarizes the most important consequences of our findings.

```{prf:proposition}
:label: p-iccm

If $(U,T)$ and $(\hat U, \hat T)$ are dynamical systems that are conjugate under $\Phi$, then

1. $u$ is a fixed point of $T$ if and only if $\Phi u$ is a fixed point of $\hat T$,
2.
$\hat u$ is a fixed point of $\hat T$ if and only if $\Phi^{-1} \hat u$ is a fixed point of $T$, and
3. the set of fixed points of $T$ and the set of fixed points of $\hat T$ have the same cardinality.
```

In particular, $T$ has a unique fixed point on $U$ if and only if $\hat T$ has a unique fixed point on $\hat U$.

(sss-topconds)=
#### Topological Conjugacy

Let $U$ and $\hat U$ be two subsets of $\RR^n$. A function $\Phi$ from $U$ to $\hat U$ is called a **homeomorphism** if it is continuous, bijective, and its inverse $\Phi^{-1}$ is also continuous.

```{prf:example}
The map $\Phi u = \ln u$ from $(0,\infty)$ to $\RR$ is a homeomorphism, with continuous inverse $\Phi^{-1} y = \exp(y)$.
```

```{prf:example}
:label: eg-hommat

Let $\Phi$ be an $n \times n$ matrix. We can regard $\Phi$ as a map sending column vector $u$ into column vector $\Phi u$. This map is a homeomorphism from $\RR^n$ to itself if and only if $\Phi$ is nonsingular.
```

Assume again that $U$ and $\hat U$ are subsets of $\RR^n$. In this setting, we say that dynamical systems $(U, T)$ and $(\hat U, \hat T)$ are **topologically conjugate** under $\Phi$ if $(U, T)$ and $(\hat U, \hat T)$ are conjugate under $\Phi$ and, in addition, $\Phi$ is a homeomorphism.

```{exercise}
:label: ex-fps-auto-1

Let $U \coloneq (0,\infty)$ and $\hat U \coloneq \RR$. Let $T u = A u^{\alpha}$, where $A > 0$ and $\alpha \in \RR$, and let $\hat T \hat u = \ln A + \alpha \hat u$. Show that $T$ and $\hat T$ are topologically conjugate under $\Phi \coloneq \ln$.
```

```{solution} ex-fps-auto-1
The map $\Phi = \ln$ is a homeomorphism from $U$ to $\hat U$, so it remains to verify conjugacy. To show that $T = \Phi^{-1} \circ \hat T \circ \Phi$ holds, we can equivalently prove that $\Phi \circ T = \hat T \circ \Phi$. For $u \in U$, we have $\Phi T u = \ln (A u^\alpha) = \ln A + \alpha \ln u$ and $\hat T \Phi u = \ln A + \alpha \ln u$. Hence $\Phi \circ T = \hat T \circ \Phi$, as was to be shown.
```

```{exercise}
:label: ex-ctrn2

Consider again the setting of {prf:ref}`ex-ctrn`, but now suppose that $(U,T)$ and $(\hat U, \hat T)$ are topologically conjugate under $\Phi$. Fixing $u, u^* \in U$, show that $\lim_{k \to \infty} T^k u = u^*$ if and only if $\lim_{k \to \infty} \hat T^k \Phi u = \Phi u^*$.
```

```{solution} ex-ctrn2
From $\hat T = \Phi \circ T \circ \Phi^{-1}$ we have $\hat T^2 = \Phi \circ T \circ \Phi^{-1} \circ \Phi \circ T \circ \Phi^{-1} = \Phi \circ T^2 \circ \Phi^{-1}$ and, continuing in the same way (or using induction), $\hat T^k = \Phi \circ T^k \circ \Phi^{-1}$ for all $k \in \NN$. Equivalently, $\hat T^k \circ \Phi = \Phi \circ T^k$ for all $k \in \NN$. Hence, using continuity of $\Phi$ and $\Phi^{-1}$,

$$
    T^k u \to u^* \; \iff \; \Phi T^ku \to \Phi u^* \; \iff \; \hat T^k\Phi u \to \Phi u^*.
$$
```

The next exercise asks you to show that topological conjugacy is an equivalence relation, as defined in {ref}`ss-setsfuns`.

```{exercise}
:label: ex-dseqref

Let $\mathbf U$ be the set of all dynamical systems $(U, T)$ with $U \subset \RR^n$. Show that topological conjugacy is an equivalence relation on $\mathbf U$.
```

```{solution} ex-dseqref
Let $\mathbf U$ be the set of all dynamical systems $(U, T)$ with $U \subset \RR^n$ and write $(U, T) \sim (\hat U, \hat T)$ if these systems are topologically conjugate. It is easy to see that $\sim$ is reflexive and symmetric. Regarding transitivity, suppose that $(U, T) \sim (U', T')$ and $(U', T') \sim (U'', T'')$. Let $F$ be the homeomorphism from $U$ to $U'$ and $G$ be the homeomorphism from $U'$ to $U''$. Then $H \coloneq G \circ F$ is a homeomorphism from $U$ to $U''$ with inverse $F^{-1} \circ G^{-1}$. Moreover, on $U$, we have

$$
    T = F^{-1} \circ T' \circ F
    = F^{-1} \circ G^{-1} \circ T'' \circ G \circ F
    = (G \circ F)^{-1} \circ T'' \circ (G \circ F).
$$

Hence $(U, T) \sim (U'', T'')$ and $\sim$ is transitive, as required.
``` From the preceding exercises we can state the following useful result: ```{prf:proposition} :label: p-kue If $(U,T)$ and $(\hat U, \hat T)$ are topologically conjugate, then 1. $T$ is globally stable on $U$ if and only if $\hat T$ is globally stable on $\hat U$, and 2. the unique fixed points $u^* \in U$ and $\hat u^* \in \hat U$ satisfy $\hat u^* = \Phi u^*$. ``` (ss-locstab)= ### Local Stability In {ref}`sss-glostab` we investigated global stability. Here we introduce local stability and provide a sufficient condition for situations in which the map is smooth. Let $U$ be a subset of $\RR^n$ and let $T$ be a self-map on $U$. A fixed point $u^*$ of $T$ in $U$ is called **locally stable** for the dynamical system $(U, T)$ if there exists an open set $O \subset U$ such that $u^* \in O$ and $T^k u \to u^*$ as $k \to \infty$ for every $u \in O$. In other words, the domain of attraction for $u^*$ contains an open neighborhood of $u^*$. ```{prf:example} :label: eg-soloc Consider the self-map $g$ on $\RR$ defined by $g(x) = x^2$. The fixed point $1$ is not stable (e.g., $g^t(x) \to \infty$ for any $x > 1$). However, $0$ is locally stable, because $-1 < x < 1$ implies that $g^t(x) \to 0$ as $t \to \infty$. ``` ```{exercise} :label: ex-lctrn Returning to the setting of {prf:ref}`ex-ctrn2`, let $(U,T)$ and $(\hat U, \hat T)$ be topologically conjugate and let $u^*$ be a fixed point of $T$ in $U$. Show that $u^*$ is locally stable for $(U, T)$ if and only if $\Phi u^*$ is locally stable for $(\hat U, \hat T)$. ``` For an interior fixed point $x^*$ of a smooth self-map $g$ on an interval of $\RR$, local stability holds whenever $|g'(x^*)| < 1$. The proof strategy proceeds as follows: When $|g'(x^*)| < 1$, the first-order linear approximation $$ \hat g(x) \coloneq g(x^*) + g'(x^*)(x - x^*) = x^* + g'(x^*)(x - x^*) $$ is a contraction of modulus $|g'(x^*)|$ with unique fixed point $x^*$. Hence all trajectories of $\hat g$ converge to $x^*$. 
Moreover, since $g$ and $\hat g$ are similar in a neighborhood of $x^*$, the same is true for trajectories of $g$ starting close to $x^*$. The next theorem formalizes this line of argument and extends it to multiple dimensions. In stating the theorem, we take $T$ to be a self-map on $U$ with fixed point $u^*$ in $U$ and assume that $T$ is continuously differentiable on $U$. Recall that the **Jacobian** of $T$ at $u \in U$ is $$ J_T(u) \coloneq \begin{pmatrix} \frac{\partial T_1}{\partial u_1}(u) & \cdots & \frac{\partial T_1}{\partial u_n}(u) \\ & \cdots & \\ \frac{\partial T_n}{\partial u_1}(u) & \cdots & \frac{\partial T_n}{\partial u_n}(u) \end{pmatrix} \quad \text{where} \quad Tu = \begin{pmatrix} T_1 u \\ \vdots \\ T_n u \end{pmatrix} , $$ and let $\hat T$ be the first-order approximation to $T$ at $u^*$: $$ \hat Tu = u^* + J_T(u^*) (u - u^*) \qquad (u \in U). $$ ```{prf:theorem} :label: t-hartgrob If $J_T(u^*)$ is nonsingular and contains no eigenvalues on the unit circle in $\CC$, then there exists an open neighborhood $O$ of $u^*$ such that $(O, T)$ and $(O, \hat T)$ are topologically conjugate. ``` Combining this theorem with the result of {prf:ref}`ex-lctrn`, we see that, under the conditions of the theorem, $u^*$ is globally stable for $(O, T)$, and hence locally stable for $(U, T)$, whenever $(O, \hat T)$ is globally stable. By the Neumann series lemma, the first-order approximation will be globally stable whenever $J_T(u^*)$ has spectral radius less than one. Thus, we have ```{prf:corollary} Under the conditions of {prf:ref}`t-hartgrob`, the fixed point $u^*$ is locally stable whenever $\rho(J_T(u^*)) < 1$. ``` (ss-crates)= ### Convergence Rates To discuss relative rates of convergence we fix a norm $\| \cdot \|$ on $\RR^n$ and take a sequence $(u_k)_{k \geq 0} \subset \RR^n$ converging to $u^* \in \RR^n$. Set $e_k \coloneq \| u_k - u^* \|$ for all $k$. 
We say that $(u_k)$ converges to $u^*$ **at rate at least $q$** if $q \geq 1$ and, for some $\beta \in (0, \infty)$ and $N \in \NN$, we have $$ e_{k+1} \leq \beta e_k^q \quad \text{ for all } k \geq N. $$ We say that convergence occurs **at rate $q$** if, in addition, $$ \limsup_{k \to \infty} \frac{e_{k+1}}{e_k^q} = \beta . $$ In addition, - If $q=2$, then we say that convergence is (at least) **quadratic**. - If $q=1$ and $\beta < 1$, then we say that convergence is (at least) **linear**. ```{prf:example} Let $T$ be a contraction of modulus $\lambda$ on a closed set $U \subset \RR^n$. If $u^*$ is the unique fixed point of $T$ in $U$ and $u_k \coloneq T^k u_0$, then $(u_k)$ converges at least linearly to $u^*$, since $$ e_{k+1} = \| u_{k+1} - u^* \| = \| T u_k - T u^* \| \leq \lambda e_k. $$ ``` Orders of convergence are studied in the neighborhood of zero, implying that higher orders are faster. For example, suppose $\epsilon_k \coloneq \| u_k - u^* \|$ is the size of the error and that $u_k$ converges to $u^*$ quadratically. If, say, $\epsilon_k = 10^{-5}$, then $\epsilon_{k+1} \approx \beta 10^{-10}$. Provided that $\beta$ is not large, the number of accurate digits roughly doubles at each step. Successive approximations typically converge at a linear rate. To see this in one dimension, try the following exercise. ```{exercise} :label: ex-odrc Let $T \colon U \to U$ where $T$ is twice continuously differentiable and $U$ is an open interval in $\RR$. Suppose that $T$ has a fixed point $u^* \in U$ and that $u_k \coloneq T^k u_0$ converges to $u^*$ as $k \to \infty$. Prove that the rate of convergence is linear whenever $0 < |T' u^*| < 1$. In completing the proof, you might find it helpful to use the fact that, by a second-order Taylor expansion, there is a $v_k \in (u_k, u^*)$ such that $$ Tu_k = u^* + T' u^* (u_k - u^*) + \frac{T''v_k}{2} (u_k - u^*)^2. 
$$ (eq-sotay) ``` ```{solution} ex-odrc Since $u_{k+1} = T u_k$, we have $$ \frac{u_{k+1} - u^*}{u_k - u^*} = T' u^* + \frac{T''v_k}{2} (u_k - u^*). $$ Since $T$ is twice continuously differentiable, $T''v_k$ is bounded on bounded sets. As a result, taking absolute values and using $u_k \to u^*$ confirms the linear rate claimed in the exercise. ``` (The restriction that $0 < |T' u^*| < 1$ in {prf:ref}`ex-odrc` is mild. For example, given convergence of successive approximation to the fixed point, we expect $|T' u^*| < 1$, since this inequality implies that $u^*$ is locally stable.) (ss-grad)= ### Gradient-Based Methods While successive approximation always converges when global stability holds, faster fixed-point algorithms can often be obtained by leveraging extra information, such as gradients. Newton's method is an important gradient-based technique. (As we discuss in {ref}`sss-hpi`, Newton's method is a key component of algorithms for solving dynamic programs.) While Newton's method is often used to solve for roots of a given function, here we use it to find fixed points. #### Newton Fixed-Point Iteration Suppose first that $T$ is a differentiable self-map on an open set $U \subset \RR^n$ and that we want to find a fixed point of $T$. Our plan is to start with a guess $u_0$ of the fixed point and then update it to $u_1$. To do this we use the first-order approximation $\hat T$ of $T$ around $u_0$ and solve for the fixed point of $\hat T$ -- which we can do exactly since $\hat T$ is linear. We take this new point as $u_1$ and then continue. If $T$ is one-dimensional then $\hat T u \coloneq T u_0 + T' u_0 (u - u_0)$. For $n > 1$ we replace $T' u_0$ with the Jacobian of $T$ at $u_0$, which we write as $J_T(u_0)$. We then solve $\hat T u_1 = u_1$ for $u_1$, which gives $$ u_1 = (I - J_T(u_0))^{-1} (Tu_0 - J_T(u_0) u_0) \qquad \text{(} I \text{ is the } n \times n \text{ identity)}. 
$$ Figure {numref}`f-newton_1` shows $u_0$ and $u_1$ when $n=1$, $Tu = 1 + u/(u + 1)$, and $u_0 = 0.5$. The value $u_1$ is the fixed point of the first-order approximation $\hat T$. It is closer to the fixed point of $T$ than $u_0$, as desired. ```{figure} ../figures/newton_1.pdf :name: f-newton_1 First step of Newton's method applied to $T$ ``` **Newton's (fixed-point) method** continues in the same way, from $u_1$ to $u_2$ and so on, leading to the sequence of points $$ u_{k+1} = Qu_k \quad \text{where} \quad Qu \coloneq (I - J_T(u))^{-1} (Tu - J_T(u) u) \qquad k = 0, 1, \ldots $$ (eq-newq) We need not write a new solver, since the successive approximation function in {numref}`list-s_approx` can be applied to $Q$ defined in {eq}`eq-newq`. (sss-rofcon)= #### Rates of Convergence Figure {numref}`f-newton_solow_traj` shows both the Newton approximation sequence and the successive approximation sequence applied to computing the fixed point of the Solow--Swan model from {ref}`sss-nonmaps`. We use two different initial conditions (top and bottom subfigures). Both sequences converge, but the Newton sequences converge faster. ```{figure} ../figures/newton_solow_traj.pdf :name: f-newton_solow_traj Newton's method applied to the Solow--Swan update rule ``` A fast rate of convergence for the Newton scheme can be confirmed theoretically: Under mild conditions, there exists a neighborhood of the fixed point within which the Newton iterates converge quadratically. See, for example, Theorem 5.4.1 of {cite:t}`atkinson2005theoretical`. Some dynamic programming algorithms take advantage of this fast rate of convergence (see {ref}`sss-hpin`). #### Speed versus Robustness Sometimes we can accelerate computations by exploiting a problem's special structure (e.g., differentiability, convexity, monotonicity). But we often face a trade-off between speed and robustness to details of problem specification. More robust methods impose less structure.
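
To make the comparison between Newton's method and successive approximation concrete, the following Python sketch applies both to the scalar map $Tu = 1 + u/(u + 1)$ from the initial condition $u_0 = 0.5$. The names `T`, `T_prime`, `Q`, and `iterate`, along with the tolerance and iteration cap, are illustrative assumptions and are not taken from the companion code; in particular, `iterate` is only a stand-in for a generic successive approximation routine. In one dimension, the operator $Q$ in {eq}`eq-newq` reduces to $Qu = (Tu - T'u \cdot u) / (1 - T'u)$.

```python
import math


def T(u):
    "The scalar map T u = 1 + u / (u + 1)."
    return 1 + u / (u + 1)


def T_prime(u):
    "Derivative of T, computed by hand: T'(u) = 1 / (u + 1)^2."
    return 1 / (u + 1) ** 2


def Q(u):
    """
    One-dimensional Newton fixed-point operator:
    Q u = (T u - T'(u) u) / (1 - T'(u)).
    Assumes T'(u) != 1, which holds along the iterates used here.
    """
    return (T(u) - T_prime(u) * u) / (1 - T_prime(u))


def iterate(f, u_0, tol=1e-12, max_iter=1_000):
    """
    Iterate u <- f(u) until successive iterates differ by less than tol.
    Returns the final iterate and the number of steps taken.
    (A hypothetical helper, not the book's solver.)
    """
    u = u_0
    for k in range(1, max_iter + 1):
        u_new = f(u)
        if abs(u_new - u) < tol:
            return u_new, k
        u = u_new
    return u, max_iter


# The fixed point solves u^2 - u - 1 = 0 on the positive half-line
u_star = (1 + math.sqrt(5)) / 2

u_sa, k_sa = iterate(T, 0.5)          # successive approximation
u_newton, k_newton = iterate(Q, 0.5)  # Newton fixed-point iteration

print(f"successive approximation: {k_sa} steps, error {abs(u_sa - u_star):.2e}")
print(f"Newton iteration:         {k_newton} steps, error {abs(u_newton - u_star):.2e}")
```

Both routines use the same loop and differ only in the map being iterated. For this particular map and starting point, Newton's method reaches the tolerance in fewer steps, although each step also requires a derivative evaluation.
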
Relative to other algorithms, successive approximation tends to be robust but slow. We saw one illustration of the relatively slow rate of convergence in Figure {numref}`f-newton_solow_traj`. But we can also see its relatively strong robustness properties via the same example, by inspecting Figure {numref}`f-newton_solow_45`, which compares the update rule of successive approximation (the function $g$) with the update rule for Newton's method (the function $Q$ in {eq}`eq-newq`). Also plotted is the dashed 45-degree line. The parameterization is the same as for the top subfigure in Figure {numref}`f-solow_fp`. As previously discussed, the shape of $g$ implies global convergence of successive approximation. By contrast, $Q$ is well-behaved near the fixed point (i.e., very flat and hence strongly contractive) but poorly behaved away from the fixed point. This illustrates that Newton's method is fast but generally less robust. ```{figure} ../figures/newton_solow_45.pdf :name: f-newton_solow_45 Robustness of successive approximation versus Newton's method ``` (sss-para)= #### Parallelization We have discussed rates of convergence for fixed-point methods. Mathematicians and computer scientists also analyze algorithms via **worst-case complexity**, which measures the number of fundamental operations (e.g., addition and multiplication of floating-point numbers) required when an algorithm acts on data that is least favorable for good performance. These measures are attractive because they are independent of the software and hardware platforms on which algorithms are implemented. Software and hardware matter not just for absolute performance of algorithms but also for *relative* performance. For example, although a single update step in successive approximation can often be partially parallelized, the algorithm is inherently serial, in the sense that the $(k+1)$-th iterate cannot be computed until iterate $k$ is available.
Moreover, because the rate of convergence is typically slow (i.e., linear), there can be many small serial steps. This limits parallelization. Newton's method is also serial to some degree, since we are just iterating with a different map (the operator $Q$ in {eq}`eq-newq`). However, because it involves inverting matrices of possibly high dimension, each step is computationally intensive. At the same time, since the rate of convergence is faster, we have to take fewer steps. In this sense, the algorithm is less serial -- it involves a smaller number of more expensive steps. Because it is less serial, Newton's method offers far more potential for parallelization. Thus, the speed gain associated with Newton's method can become very large when using effective parallelization. (ss-ord)= ## Order This section reviews key concepts from order theory. (ss-posets)= ### Partial Orders We define partial orders and examine some of their basic properties. #### Partially Ordered Sets A **partial order** on a nonempty set $P$ is a relation $\preceq$ on $P \times P$ that, for any $p, q, r$ in $P$, satisfies 1. $p \preceq p$ (reflexivity), 2. $p \preceq q$ and $q \preceq p$ implies $p = q$ (antisymmetry), and 3. $p \preceq q$ and $q \preceq r$ implies $p \preceq r$ (transitivity). The pair $(P, \preceq)$ is called a **partially ordered set**. For convenience, we sometimes write $P$ for $(P, \preceq)$ and $q \succeq p$ for $p \preceq q$. The statement $p \preceq q \preceq r$ means $p \preceq q$ and $q \preceq r$. ```{prf:example} The usual order $\leq$ on $\RR$ is a partial order on $\RR$. For example, $a \leq b$ and $b \leq a$ implies $a=b$. ``` ```{exercise} :label: ex-fps-auto-2 Let $P$ be any set and consider the relation induced by equality, so that $p \preceq q$ if and only if $p = q$. Show that this relation is a partial order on $P$. ``` ```{exercise} :label: ex-sspo Let $M$ be any set.
Show that set inclusion $\subset$ induces a partial order on $\wp(M)$, the set of all subsets of $M$. ``` ```{prf:example} :label: eg-ppor Fix an arbitrary nonempty set $\Xsf$. The **pointwise order** $\leq$ on the set $\RR^\Xsf$ of all functions from $\Xsf$ to $\RR$ is defined as follows: ::: center given $u, v$ in $\RR^\Xsf$, set $u \leq v$ if $u(x) \leq v(x)$ for all $x \in \Xsf$. ::: ``` ```{exercise} :label: ex-fps-auto-3 Show that the pointwise order $\leq$ is a partial order on $\RR^\Xsf$. ``` In what follows, for $u, v \in \RR^\Xsf$, we write $u \ll v$ if $u(x) < v(x)$ for all $x \in \Xsf$. ```{exercise} :label: ex-fps-auto-4 Show that the relation $\ll$ is *not* a partial order on $\RR^\Xsf$. ``` ```{solution} ex-fps-auto-4 It is easy to confirm that $\ll$ violates reflexivity. ``` The preceding pointwise concepts extend immediately to vectors, since vectors are just real-valued functions under the identification asserted in {prf:ref}`l-rxrn`. In particular, for vectors $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$ in $\RR^n$, we write - $u \leq v$ if $u_i \leq v_i$ for all $i \in \natset{n}$ and - $u \ll v$ if $u_i < v_i$ for all $i \in \natset{n}$. Statements $u \geq v$ and $u \gg v$ are defined analogously. Figure {numref}`f-pointwise_order` illustrates. Naturally, $\leq$ is called the **pointwise order** on $\RR^n$. ```{figure} figures/pointwise_order.pdf :name: f-pointwise_order Pointwise we have $u \leq v$ and $u \ll v$ but not $w \leq v$ ``` ```{exercise} :label: ex-clic Limits in $\RR$ preserve weak inequalities. Use this property to prove that the same is true in $\RR^n$. In particular, show that, for vectors $a, b \in \RR^n$ and sequence $(u_k)$ in $\RR^n$ with $a \leq u_k \leq b$ for all $k \in \NN$ and $u_k \to u \in\RR^n$, we have $a \leq u \leq b$. ``` ```{prf:example} :label: eg-poom Analogous to vectors, for $n \times k$ matrices $A = (a_{ij})$ and $B=(b_{ij})$, we write - $A \leq B$ if $a_{ij} \leq b_{ij}$ for all $i, j$. 
- $A \ll B$ if $a_{ij} < b_{ij}$ for all $i, j$. We call $\leq$ the **pointwise order** over matrices. ``` ```{exercise} :label: ex-fps-auto-5 Explain why the pointwise order introduced in {prf:ref}`eg-poom` is also a special case of the pointwise order over functions. ``` ```{solution} ex-fps-auto-5 Just set $\Xsf = [n] \times [k]$. ``` ```{exercise} :label: ex-bmk Prove the next two facts: 1. If $B$ is $m \times k$ and $B \geq 0$, then $|B u| \leq B |u|$ for all $k \times 1$ column vectors $u$. 2. If $A$ is $n \times n$ with $A \geq 0$ and $(u_k)$ is a sequence in $\RR^n$ satisfying $u_{k+1} \leq A u_k$ for all $k \geq 0$, then $u_k \leq A^k u_0$. ``` ```{solution} ex-bmk Regarding the first claim, fix $B \in \matset{m}{k}$ with $b_{ij} \geq 0$ for all $i, j$. Pick any $i \in \natset{m}$ and $u \in \RR^k$. By the triangle inequality, we have $|\sum_j b_{ij} u_j| \leq \sum_j b_{ij} |u_j|$. Stacking these inequalities yields $|B u| \leq B |u|$, as was to be shown. Regarding the second, let $A$ and $(u_k)$ be as stated, with $u_{k+1} \leq A u_k$ for all $k$. We aim to prove $u_k \leq A^k u_0$ for all $k$ using induction. In doing so, we observe that $u_1 \leq A u_0$, so the claim is true at $k=1$. Suppose now that it holds at $k-1$. Then $u_k \leq A u_{k-1} \leq A A^{k-1} u_0 = A^k u_0$, where the last step used nonnegativity of $A$ and the induction hypothesis. The claim is now proved. ``` ```{exercise} :label: ex-smup Let $A$ be $n \times k$ and let $u$ and $v$ be $k$-vectors. Prove that $A \gg 0$, $u \leq v$ and $u \not=v$ implies $Au \ll Av$. ``` ```{solution} ex-smup Assume the stated conditions. Let $h \coloneq v - u$ and let $a_{ij}$ be the $i,j$-th element of $A$. We have $h \geq 0$ and $h_j > 0$ at some $j$. Hence $\sum_j a_{ij} h_j > 0$. This says that every row of $Ah$ is strictly positive. In other words $Ah = A(v - u) \gg 0$. The claim follows. 
``` A partial order $\preceq$ on $P$ is called **total** if, for all $p, q \in P$, either $p \preceq q$ or $q \preceq p$. ```{prf:example} The usual order $\leq$ on $\RR$ is a total order, as is the same order on $\NN$. ``` ```{prf:example} Figure {numref}`f-pointwise_order` shows that the pointwise order $\leq$ is not a total order on $\RR^n$. For example, neither $v \leq w$ nor $w \leq v$, since $w_1 > v_1$ but $w_2 < v_2$. ``` ```{exercise} :label: ex-fps-auto-6 Is the partial order defined in {prf:ref}`ex-sspo` a total order? Either prove that it is or provide a counterexample. ``` ```{solution} ex-fps-auto-6 Let $M = \{1, 2\}$, let $A = \{1\}$ and let $B = \{2\}$. Then $A \subset B$ and $B \subset A$ both fail. Hence $\subset$ is not a total order on $\wp(M)$. ``` (sss-lge)= #### Least and Greatest Elements Given a partially ordered set $(P, \preceq)$ and $A \subset P$, we say that $g \in P$ is a **greatest element** of $A$ if $g \in A$ and, in addition, $a \in A \implies a \preceq g$. We call $\ell \in P$ a **least element** of $A$ if $\ell \in A$ and, in addition, $a \in A \implies \ell \preceq a$. If $A$ is totally ordered, then a greatest element $g$ of $A$ is also called a **maximum** of $A$, whereas a least element $\ell$ of $A$ is also called a **minimum**. See Appendix {prf:ref}`c-areal` for more about maxima and minima. ```{prf:remark} Elementary optimization problems have real-valued objectives, which means that we seek maxima and minima. In contrast, the objective in dynamic programming is to maximize a lifetime value function (or minimize a lifetime cost function), a *function* over a state space. Thus, the objective takes values in a partially ordered set and we seek greatest (or least) elements. ``` ```{exercise} :label: ex-fps-auto-7 Let $P$ be any partially ordered set and fix $A \subset P$. Prove that $A$ has at most one greatest element and at most one least element. 
``` ```{solution} ex-fps-auto-7 We prove the claim concerning greatest elements: Suppose that $g$ and $g'$ are greatest elements of $A$. Then, since both are in $A$, we have $g \preceq g'$ and $g' \preceq g$. Hence, by antisymmetry, $g = g'$. ``` ```{exercise} :label: ex-subsupinf0 Let $M$ be a nonempty set and let $\wp(M)$ be the set of all subsets of $M$, partially ordered by $\subset$. Let $\{A_i\} = \{A_i\}_{i \in I}$ be a subset of $\wp(M)$, where $I$ is an arbitrary nonempty index set. Show that $S \coloneq \bigcup_i A_i$ is the greatest element of $\{A_i\}$ if and only if $S \in \{A_i\}$. ``` ```{solution} ex-subsupinf0 To see this, suppose first that $S \in \{A_i\}$. Since $A_j \subset \cup_i A_i =: S$ for all $j \in I$, the set $S$ is a greatest element of $\{A_i\}$. Conversely, if $S$ is not in $\{A_i\}$, then $S$ is not a greatest element (since the definition directly requires that $S \in \{A_i\}$). ``` ```{exercise} :label: ex-fps-auto-8 Adopt the setting of {prf:ref}`ex-subsupinf0` and suppose that $\{A_i\}$ is the set of bounded subsets of $\RR^n$. Prove that $\{A_i\}$ has no greatest element. ``` ```{solution} ex-fps-auto-8 Since the union of all bounded subsets of $\RR^n$ is $\RR^n$ (which is not bounded), $\{A_i\}$ has no greatest element. Indeed, if $G$ is the greatest element of $\{A_i\}$, then $G$ contains every bounded subset of $\RR^n$. But then $G$ is not bounded. Contradiction. ``` (sss-supinf)= #### Sup and Inf Concepts of suprema and infima on the real line (Appendix {prf:ref}`c-areal`) extend naturally to partially ordered sets. Given a partially ordered set $(P, \preceq)$ and a nonempty subset $A$ of $P$, we call $u \in P$ an **upper bound** of $A$ if $a \preceq u$ for all $a$ in $A$. Letting $U_P(A)$ be the set of all upper bounds of $A$ in $P$, we call $\bar u \in P$ a **supremum** of $A$ if $$ \bar u \in U_P(A) \; \text{ and } \; \bar u \preceq u \; \text{ for all } \; u \in U_P(A). 
$$ Thus, $\bar u$ is the least element (see {ref}`sss-lge`) of the set of upper bounds $U_P(A)$, whenever it exists. ```{exercise} :label: ex-fps-auto-9 Prove that $A$ has at most one supremum in $P$. ``` ```{solution} ex-fps-auto-9 Suppose that $s$ and $s'$ are both suprema of $A$ in $P$. Then both $s$ and $s'$ are upper bounds, so $s \preceq s'$ and $s' \preceq s$. Hence, by antisymmetry, $s = s'$. ``` If $P \subset \RR$ and $\preceq$ is $\leq$, then the notion of supremum on a partially ordered set reduces to the elementary definition of the supremum for subsets of the real line discussed in Appendix {prf:ref}`c-areal`. Let $A$ be a subset of a partially ordered set $P$. - The supremum of $A$, when it exists, is typically denoted $\bigvee A$. - If $A = \{a_i\}_{i \in I}$ for some index set $I$, we also write $\bigvee A$ as $\bigvee_i \, a_i$. - If $A = \{a, b\}$, then $\bigvee A$ is also written as $a \vee b$. Suprema and greatest elements are clearly related. The next exercise clarifies this. ```{exercise} :label: ex-gesup Prove the following statements: 1. If $\bar a = \bigvee A$ and $\bar a \in A$, then $\bar a$ is a greatest element of $A$. 2. If $A$ has a greatest element $\bar a$, then $\bar a = \bigvee A$. ``` ```{prf:remark} In view of {prf:ref}`ex-gesup`, when $A$ has a greatest element, we can refer to it by $\bigvee A$. This notation is used frequently throughout the book. ``` We call $\ell \in P$ a **lower bound** of $A$ if $a \succeq \ell$ for all $a$ in $A$. An element $\bar \ell$ of $P$ is called an **infimum** of $A$ if $\bar \ell$ is a lower bound of $A$ and $\bar \ell \succeq \ell$ for every lower bound $\ell$ of $A$. We use analogous notation to denote the infimum. For example, if $A = \{a, b\}$, then $\bigwedge A$ is also written as $a \wedge b$. ```{exercise} :label: ex-fps-auto-10 Let $(P, \preceq)$ be a partially ordered set and let $A$ be a subset of $P$. Prove that if $\ell$ is a least element of $A$, then $\ell = \bigwedge A$. 
``` ```{exercise} :label: eg-subsupinf Let $M$ be a nonempty set and let $\wp(M)$ be the set of all subsets of $M$, partially ordered by $\subset$. Let $\{A_i\}_{i \in I}$ be a subset of $\wp(M)$. Prove that $\bigvee_i A_i = \cup_i A_i$ and $\bigwedge_i A_i = \cap_i A_i$. ``` ```{solution} eg-subsupinf To see the former, observe that $A_j \subset \cup_i A_i$ for all $j \in I$. Hence $\cup_i A_i$ is an upper bound of $\{A_i\}$. Moreover, if $B \subset M$ and $A_j \subset B$ for all $j \in I$, then $\cup_i A_i \subset B$. This proves that $\cup_i A_i$ is the supremum. The proof of the infimum case is similar. ``` ```{exercise} :label: ex-fps-auto-11 Even when $P$ is totally ordered, existence of suprema and infima for an abstract partially ordered set $(P, \preceq)$ can fail. Provide an example of a totally ordered set $P$ and a subset $A$ of $P$ that has no supremum in $P$. ``` ```{solution} ex-fps-auto-11 Here is one possible answer. Let $P = (0,1)$, partially ordered by $\leq$. The set $A = [1/2, 1)$ is bounded above in $\RR$ (and hence has a supremum in $\RR$) but has no supremum in $P$. Indeed, if $s = \bigvee A$, then $s \in P$ and $a \leq s$ for all $a \in A$. No such element exists, since any $s \in P$ satisfies $s < (s+1)/2$ and $(s+1)/2 \in A$. ``` (sss-sifx)= ### The Case of Pointwise Order For us, the pointwise partial order $\leq$ introduced in {prf:ref}`eg-ppor` is especially useful. In this section, we review some properties of this order. Throughout, $\Xsf$ is an arbitrary finite set. (sss-latprop)= #### Suprema and Infima under a Pointwise Order Given $u, v \in \RR^\Xsf$, the symbol $u \wedge v$ is possibly ambiguous because we used the symbol both for a pointwise minimum in {ref}`sss-poov` and an infimum of $\{u, v\}$ in {ref}`sss-supinf`. Fortunately, for elements of the partially ordered set $(\RR^\Xsf, \leq)$, these two definitions coincide. Indeed, if $f(x) \coloneq \min\{u(x), v(x)\}$ for all $x \in \Xsf$, then 1. $f$ is a lower bound for $\{u, v\}$ in $(\RR^\Xsf, \leq)$, and 2. 
$g \leq u$ and $g \leq v$ implies $g \leq f$. Hence $f$ is the infimum of $\{u, v\}$ in $(\RR^\Xsf, \leq)$. ```{exercise} :label: ex-fps-auto-12 Prove that the supremum $u \vee v$ of $\{u, v\}$ in $(\RR^\Xsf, \leq)$ is the pointwise maximum $f(x) \coloneq \max\{u(x), v(x)\}$. ``` A subset $V$ of $\RR^\Xsf$ is called a **sublattice** of $\RR^\Xsf$ if $u, v \in V$ implies $u \vee v \in V$ and $u \wedge v \in V$. ```{prf:example} The sets $$ V_1 \coloneq \setntn{f \in \RR^\Xsf}{f \geq 0}, \quad V_2 \coloneq \setntn{f \in \RR^\Xsf}{f \gg 0} \; \text{ and } \; V_3 \coloneq \setntn{f \in \RR^\Xsf}{|f| \leq 1} $$ are all sublattices of $\RR^\Xsf$. ``` Above we discussed the fact that, for a pair of functions $\{u, v\}$, the supremum in $(\RR^\Xsf, \leq)$ is the pointwise maximum, whereas the infimum in $(\RR^\Xsf, \leq)$ is the pointwise minimum. The same principle holds for finite collections of functions. Thus, if $\{ v_i \} \coloneq \{ v_i \}_{i \in I}$ is a finite subset of $\RR^\Xsf$, then, for all $x \in \Xsf$, $$ \left( \bigvee_i v_i \right)(x) \coloneq \max_{i \in I} v_i(x) \quad \text{and} \quad \left( \bigwedge_i v_i \right)(x) \coloneq \min_{i \in I} v_i(x). $$ ```{exercise} :label: ex-fps-auto-13 Verify these claims. ``` ```{exercise} :label: ex-fps-auto-14 Show that if $V$ is a sublattice and $\{v_i\}$ is a finite collection of functions in $V$, then $\bigvee_i v_i$ and $\bigwedge_i v_i$ are also in $V$. ``` The next example discusses greatest elements in the setting of pointwise order. ```{prf:example} :label: eg-pmige Let $\Xsf$ be nonempty and fix $V \subset \RR^\Xsf$. Let $V$ be partially ordered by the pointwise order $\leq$. Let $\{v_\sigma\} \coloneq \{v_\sigma\}_{\sigma \in \Sigma}$ be a finite subset of $V$ and let $v^* \coloneq \vee_\sigma \, v_\sigma \in \RR^\Xsf$ be the pointwise maximum. If $v^* \in \{v_\sigma\}$, then $v^*$ is the greatest element of $\{v_\sigma\}$. If not, then $\{v_\sigma\}$ has no greatest element. 
``` Figure {numref}`f-v_star_illus` helps illustrate {prf:ref}`eg-pmige`. In this case, $v^*$ is not in $\{v_\sigma\}$ and $\{v_\sigma\}$ has no greatest element (since neither $v_{\sigma'} \leq v_{\sigma''}$ nor $v_{\sigma''} \leq v_{\sigma'}$). ```{figure} ../figures/v_star_illus.pdf :name: f-v_star_illus $v^*$ is the upper envelope of $\{v_\sigma\}_{\sigma \in \Sigma}$ ``` ```{exercise} :label: ex-fps-auto-15 Prove the two claims at the end of {prf:ref}`eg-pmige`. ``` ```{solution} ex-fps-auto-15 Suppose first that $v^* \in \{v_\sigma\}$. Since $v_\sigma \leq v^*$ for all $\sigma$, the function $v^*$ is the greatest element. Regarding the second claim, suppose (seeking a contradiction), that $v^* \notin \{v_\sigma\}$ and $\bar v$ is a greatest element of $\{v_\sigma\}$. By definition, $v_\sigma \leq \bar v$ for all $\sigma$, so taking the pointwise maximum gives $v^* \leq \bar v$. At the same time, since $\bar v$ is a greatest element, we have $\bar v \in \{v_\sigma\}$, and therefore $\bar v \leq \max_\sigma v_\sigma = v^*$. Putting the two inequalities together gives $\bar v = v^*$, which in turn implies that $v^* \in \{v_\sigma\}$. Contradiction. ``` Given a partially ordered set $(P, \preceq)$ and $a, b \in P$, the **order interval** $[a, b]$ is defined as all $p\in P$ such that $a \preceq p \preceq b$. (If $a \preceq b$ fails, the order interval is empty.) ```{exercise} :label: ex-fps-auto-16 Let $V$ be a sublattice of $\RR^\Xsf$. Show that the intersection of any two order intervals in $V$ is an order interval in $V$. ``` ```{solution} ex-fps-auto-16 Let $I_a \coloneq [a_1, a_2]$ and $I_b \coloneq [b_1, b_2]$ be two order intervals in $V$. Consider the order interval $I \coloneq [a_1 \vee b_1, a_2 \wedge b_2]$. If $h \in I$, then $h \geq a_1 \vee b_1$, so $h \geq a_1$ and $h \geq b_1$. A similar argument gives $h \leq a_2$ and $h \leq b_2$. Hence $h \in I_a \cap I_b$. 
Working in the other direction, it is not difficult to show that $h \in I_a \cap I_b$ implies $h \in I$. Hence $I = I_a \cap I_b$. In particular, $I_a \cap I_b$ is an order interval in $V$. ``` #### Inequalities and Identities In this section, we note some useful inequalities and identities related to the pointwise partial order on $\RR^\Xsf$. As before, $\Xsf$ is any finite set. ```{prf:lemma} :label: l-efine For $f, g, h \in \RR^\Xsf$, the following statements are true: 1. $|f + g| \leq |f| + |g|$. 2. $(f \wedge g) + h = (f + h) \wedge (g + h)$ and $(f \vee g) + h = (f + h) \vee (g + h)$. 3. $(f \vee g) \wedge h = (f \wedge h) \vee (g \wedge h)$ and $(f \wedge g) \vee h = (f \vee h) \wedge (g \vee h)$. 4. $|f \wedge h - g \wedge h | \leq |f - g|$. 5. $|f \vee h - g \vee h | \leq |f - g|$. ``` These results follow immediately from proofs of corresponding claims when $f,g$, and $h$ are in $\RR$. For example, by the usual triangle inequality for scalars, we have $|f(x) + g(x)| \leq |f(x)| + |g(x)|$ for all $x \in \Xsf$. This is equivalent to the statement $|f+g| \leq |f|+|g|$ in (i). Similarly, inequality (v) follows directly from a corresponding scalar inequality that was already proved in {prf:ref}`ex-elmb`. A complete proof of lemma {prf:ref}`l-efine` can be found with Theorem 30.1 of {cite:t}`aliprantis1998principles`. It is also true that, if $f, g, h \in \RR_+^\Xsf$, then $$ (f + g) \wedge h \leq (f \wedge h) + (g \wedge h). $$ (eq-abc) ```{exercise} :label: ex-awedg Prove: If $a, b, c \in \RR_+$, then $|a \wedge c - b \wedge c| \leq |a-b| \wedge c$. ``` ```{solution} ex-awedg Fix $a, b, c \in \RR_+$. By {eq}`eq-abc`, we have $$ a \wedge c = (a - b + b) \wedge c \leq ( |a - b| + b) \wedge c \leq |a - b| \wedge c + b \wedge c. $$ Thus, $a \wedge c - b \wedge c \leq |a-b| \wedge c$. Reversing roles of $a$ and $b$ gives $b \wedge c - a \wedge c \leq |a-b| \wedge c$. This proves the claim in {prf:ref}`ex-awedg`. 
``` We note the following useful inequality. ```{prf:lemma} :label: l-maxineq Let $D$ be a finite set. If $f$ and $g$ are elements of $\RR^D$, then $$ |\max_{z \in D} f(z) - \max_{z \in D} g(z) | \leq \max_{z \in D} | f(z) - g(z) |. $$ (eq-maxineq) ``` ```{prf:proof} Fixing $f, g \in \RR^D$, we have $$ f = f - g + g \leq |f - g| + g. $$ $$ \fore \max f \leq \max( |f - g| + g ) \leq \max |f - g| + \max g. $$ $$ \fore \max f - \max g \leq \max |f - g|. $$ Reversing the roles of $f$ and $g$ proves the claim. ◻ ``` The inequality in {prf:ref}`l-maxineq` helps with dynamic programming problems that involve maximization. The next exercise concerns minimization. ```{exercise} :label: ex-fps-auto-17 Prove that, in the setting of {prf:ref}`l-maxineq`, we have $$ |\min_{z \in D} f(z) - \min_{z \in D} g(z) | \leq \max_{z \in D} | f(z) - g(z) |. $$ (eq-minineq) ``` ```{solution} ex-fps-auto-17 Since $\min f = - \max (-f)$ and similarly for $g$, we can apply {prf:ref}`l-maxineq` to obtain $$ |\min f - \min g| = |\max (-g) - \max(- f)| \leq \max |(-g) - (-f)| = \max |f - g|. $$ ``` We end this section with a discussion of upper envelopes. To frame the discussion, we take $\{ T_\sigma \} \coloneq \{ T_\sigma \}_{\sigma \in \Sigma}$ to be a finite family of self-maps on a sublattice $V$ of $\RR^\Xsf$. Consider some properties of the operator $T$ on $V$ defined by $$ Tv = \bigvee_{\sigma \in \Sigma} \, T_\sigma \, v \qquad (v \in V). $$ It follows from the sublattice property that $T$ is a self-map on $V$. In some sources, $T$ is called the **upper envelope** of the functions $\{ T_\sigma \}$. The following lemma will be useful for dynamic programming. ```{prf:lemma} :label: l-supbc If, for each $\sigma \in \Sigma$, the operator $T_\sigma$ is a contraction of modulus $\lambda_\sigma$ under the supremum norm, then $T$ is a contraction of modulus $\max_\sigma \lambda_\sigma$ under the same norm. ``` ```{prf:proof} Let the stated conditions hold and fix $u, v \in V$. 
Applying {prf:ref}`l-maxineq`, we get $$ \begin{aligned} \|Tu - Tv \|_\infty & = \max_x | \max_\sigma (T_\sigma \, u)(x) - \max_\sigma \, (T_\sigma \, v)(x) | \\ & \leq \max_x \max_\sigma \, | (T_\sigma \, u)(x) - (T_\sigma \, v)(x) | \\ & = \max_\sigma \, \max_x \, | (T_\sigma \, u)(x) - (T_\sigma \, v)(x) |. \end{aligned} $$ $$ \fore \|Tu - Tv \|_\infty \leq \max_\sigma \, \| T_\sigma \, u - T_\sigma \, v \|_\infty \leq \max_\sigma \, \lambda_\sigma \, \| u - v \|_\infty . $$ Hence $T$ is a contraction of modulus $\max_\sigma \, \lambda_\sigma$ on $V$, as claimed. ◻ ``` (sss-mono)= ### Order-Preserving Maps Order-preserving maps appear throughout the theory of dynamic programming. Here we define them and state a condition for contractivity that requires the order preserving property. #### Definition Given two partially ordered sets $(P, \preceq)$ and $(Q, \trianglelefteq)$, a map $T$ from $P$ to $Q$ is called **order-preserving** if, given $p, p' \in P$, we have $$ p \preceq p' \quad \implies \quad Tp \trianglelefteq Tp'. $$ (eq-orpres) $T$ is called **order-reversing** if, instead, $$ p \preceq p' \quad \implies \quad Tp' \trianglelefteq Tp. $$ (eq-orpresr) ```{prf:example} :label: eg-nmopm Let $\leq$ be the pointwise order on $\RR^n$. If $A$ is $n \times n$ with $A \geq 0$, then $T \colon \RR^n \to \RR^n$ defined by $Tu = Au + b$ is order preserving on $\RR^n$, since $u \leq v$ implies $v-u \geq 0$, and hence $A(v-u) \geq 0$. But then $Au \leq Av$ and hence $Tu \leq Tv$. ``` ```{prf:example} Given $a \leq b$ in $\RR$, let $C[a,b]$ be all continuous functions from $[a, b]$ to $\RR$ and let $\leq$ be the pointwise order on $C[a,b]$. Let $$ I(f) \coloneq \int_a^b f(x) dx \qquad (f \in C[a,b]). $$ Since $f \leq g$ implies $\int_a^b f(x) dx \leq \int_a^b g(x) dx$, the map $I$ is order-preserving on $C[a,b]$. ``` ```{exercise} :label: ex-passlg Let $P, Q$ be partially ordered sets and let $F$ be an order-preserving map from $P$ to $Q$. 
Suppose that $\{u_i \} \subset P$ has a greatest and a least element. Prove that, in this setting, both $\bigvee_i F u_i$ and $\bigwedge_i F u_i$ exist in $Q$, and, moreover, $$ F \bigvee_i u_i = \bigvee_i F u_i \quad \text{and} \quad F \bigwedge_i u_i = \bigwedge_i F u_i. $$ ``` ```{solution} ex-passlg We prove the claim regarding greatest elements. To this end, let $\bar u$ be the greatest element of $\{u_i\}$. Then $F u_i \preceq F \bar u$ for all $i$, so $F \bar u$ is the greatest element, and hence the supremum, of $\{F u_i\}$. That is, $\bigvee_i F u_i = F \bar u = F \bigvee_i u_i$. ``` ```{exercise} :label: ex-akop Let $(P, \preceq)$ be a partially ordered set and let $A$ be an order-preserving self-map on $P$. Prove that $A^k$ is order-preserving on $P$ for any $k \in \NN$. ``` ```{solution} ex-akop Let $A$ and $P$ be as stated. The claim that $A^k$ is order-preserving on $P$ holds at $k=1$. Suppose now that it holds at $k$ and fix $p, q \in P$ with $p \preceq q$. By the induction hypothesis and the fact that $A$ is order-preserving, we have $A A^k p \preceq A A^k q$. Hence $A^{k+1} p \preceq A^{k+1} q$. We conclude that $A^{k+1}$ is also order-preserving, as was to be shown. ``` ```{exercise} :label: ex-nnmatop Let $A$ be $n \times k$ with $A \geq 0$. Show that the map $u \mapsto Au$ is order-preserving on $\RR^k$ under the pointwise order. ``` ```{solution} ex-nnmatop Fix an $n\times k$ matrix $A$ with $A \geq 0$, along with $u, v \in \RR^k$. We need to show that $u \leq v$ implies $Au \leq Av$ for any conformable vectors $u, v$. This holds because if $u \leq v$ we have $v - u \geq 0$, so $A (v-u) \geq 0$. But then $Av - Au \geq 0$, or $Au \leq Av$. ``` ```{exercise} :label: ex-nnmatop2 Let $A$ and $B$ be $n \times n$ with $0 \leq A \leq B$. Prove that $A^k \leq B^k$ for all $k \in \NN$ and, in addition, that $\rho(A) \leq \rho(B)$. ``` ```{solution} ex-nnmatop2 Fix square $A,B$ with $0 \leq A \leq B$. 
It follows from the rules of matrix multiplication that, for arbitrary nonnegative square matrices $E, F, G$ with $F \leq G$, we have $EF \leq EG$ and $FE \leq GE$. Hence, if $A^k \leq B^k$ for some $k \in \NN$, then $A^{k+1} = A A^k \leq B A^k \leq B B^k =B^{k+1}$. Thus, by induction, $A^k \leq B^k$ for all $k \in \NN$, which verifies the first claim. Regarding the second, it is clear that for nonnegative matrices $E, F$ with $E \leq F$ we have $\| E\|_\infty \leq \| F\|_\infty$. Hence $\| A^k \|_\infty \leq \| B^k \|_\infty$ for all $k \in \NN$. Raising both sides to the power $1/k$ and applying Gelfand's lemma verifies $\rho(A) \leq \rho(B)$. ``` #### Increasing and Decreasing Functions Regarding the definitions in {eq}`eq-orpres` and {eq}`eq-orpresr`, when $(Q, \trianglelefteq) = (\RR, \leq)$, it is common to say "increasing" instead of order-preserving, and "decreasing" instead of order-reversing. We adopt this terminology. In particular, given partially ordered set $(P, \preceq)$, we call $h \in \RR^P$ - **increasing** if $p \preceq p'$ implies $h(p) \leq h(p')$ and - **decreasing** if $p \preceq p'$ implies $h(p) \geq h(p')$. We use the symbol $i\RR^P$ for the set of increasing functions in $\RR^P$. ```{prf:example} If $P = \{1, \ldots, n\}$ and $\preceq$ is the usual order $\leq$ on $\RR$, then $x \mapsto 2x$ and $x \mapsto \1\{2 \leq x\}$ are in $i\RR^P$ but $x \mapsto -x$ and $x \mapsto \1\{x \leq 2\}$ are not. ``` ```{prf:remark} Instead of adopting the fancy terms "order-preserving" and "order-reversing," why not just use "increasing" and "decreasing"? A short answer is that, for a general partial order, the concepts of order-preserving and order-reversing can be very different from usual notions of increasing and decreasing functions. ``` ```{exercise} :label: ex-fps-auto-18 Prove: If $P$ is any partially ordered set and $f, g \in i\RR^P$, then 1. $\alpha f + \beta g \in i\RR^P$ whenever $\alpha, \beta \geq 0$. 2. 
$f \vee g \in i\RR^P$ and $f \wedge g \in i\RR^P$. ``` ```{exercise} :label: ex-fps-auto-19 Given finite $P$, show that $i\RR^P$ is closed in $\RR^P$. ``` ```{solution} ex-fps-auto-19 Take $(f_k)_{k \geq 1}$ in $i\RR^P$ and $f \in \RR^P$ with $f_k \to f$ as $k \to \infty$. Since $f_k \to f$ we have $f_k(z) \to f(z)$ for all $z \in P$. (Norm convergence implies pointwise convergence.) Fix $x, y \in P$ with $x \preceq y$. From $(f_k) \subset i\RR^P$ we have $f_k(x) \leq f_k(y)$ for all $k$. Since weak inequalities are preserved under limits, $f(x) \leq f(y)$. Hence $f \in i\RR^P$. ``` ```{exercise} :label: ex-fps-auto-20 Let $X$ be a random variable taking values in finite $\Xsf$. Define $\ell \colon \RR^\Xsf \to \RR$ by $\ell h = \EE h(X)$. Show that $\ell$ is increasing when $\RR^\Xsf$ has the pointwise order. ``` The next exercise shows that, in a totally ordered setting, an increasing function can be represented as the sum of increasing binary functions. ```{exercise} :label: ex-iu Let $\Xsf = \{x_1, \ldots, x_n\}$ where $x_k \preceq x_{k+1}$ for all $k$. Show that, for any $u \in i\RR^\Xsf_+$, there exist $s_1, \ldots, s_n$ in $\RR_+$ such that $u(x) = \sum_{k=1}^n s_k \1\{x \succeq x_k\}$ for all $x \in \Xsf$. ``` ```{solution} ex-iu Set $\alpha_k \coloneq u(x_k)$ for all $k$ and $s_k \coloneq \alpha_k - \alpha_{k-1}$ with $\alpha_0 \coloneq 0$. Each $s_k$ is nonnegative because $u$ is increasing and nonnegative. Fix $x_j \in \Xsf$. Then $$ \sum_{k=1}^n s_k \1\{x_j \succeq x_k\} = \sum_{k=1}^j s_k = (\alpha_1 - \alpha_0) + (\alpha_2 - \alpha_1) + \ldots + (\alpha_j - \alpha_{j-1}) = \alpha_j. $$ In other words, $\sum_{k=1}^n s_k \1\{x_j \succeq x_k\} = u(x_j)$. This completes the proof. ``` As usual, if $h \colon P \to Q$ and $P, Q \subset \RR$, then we will call $h$ - **strictly increasing** if $x < y$ implies $h(x) < h(y)$, and - **strictly decreasing** if $x < y$ implies $h(x) > h(y)$. (sss-blackwell)= #### Blackwell's Condition Our discussion of Banach's Theorem in {ref}`ss-bcmt` showed the usefulness of contractivity.
For an order-preserving operator on a subset of $\RR^\Xsf$, the following condition often simplifies establishing this property. In the statement of the lemma, $U$ is a subset of $\RR^\Xsf$, partially ordered by $\leq$, and $\Xsf$ is finite. Also, $U$ has the property that $u \in U$ and $c \in \RR_+$ implies $u + c \in U$. ```{prf:lemma} :label: l-blackwell If $T$ is an order preserving self-map on $U$ and there exists a constant $\beta \in (0,1)$ such that $$ T(u + c) \leq Tu + \beta c \quad \text{for all } u \in U \text{ and } c \in \RR_+, $$ (eq-blackwell) then $T$ is a contraction of modulus $\beta$ on $U$ with respect to the supremum norm. ``` ```{prf:proof} Let $U, T$ have the stated properties and fix $u, v \in U$. We have $$ Tu = T(v + u - v) \leq T(v + \| u - v \|_\infty) \leq Tv + \beta \| u - v \|_\infty. $$ Rearranging gives $Tu - Tv \leq \beta \| u - v \|_\infty$. Reversing roles of $u$ and $v$ proves the claim. ◻ ``` (sss-sdfin)= ### Stochastic Dominance So far we have discussed partial orders over vectors, functions and sets. It is also useful to have a partial order over distributions that tells us when one distribution is in some sense "larger" than another. In this section, we introduce a partial order over some distributions commonly used in economics and finance. Let's start with an example. Recall that a random variable $X$ is binomial $B(n,0.5)$ if it counts the number of heads in $n$ flips of a fair coin. Figure {numref}`f-binom_stoch_dom` shows two distributions, $\phi \eqdist X \sim B(10, 0.5)$ and $\psi \eqdist Y \sim B(18, 0.5)$. Since $Y$ counts over more flips, we expect it to take larger values in some sense, and we also expect its distribution $\psi$ to reflect this. How can we make these thoughts precise? 
```{figure} ../figures/binom_stoch_dom.pdf :name: f-binom_stoch_dom Two binomial distributions ``` A standard order over distributions that captures this idea is defined as follows: Given finite set $\Xsf$ partially ordered by $\preceq$ and $\phi, \psi \in \dD(\Xsf)$, we say that $\psi$ **stochastically dominates** $\phi$ and write **$\phi \lefsd \psi$** if $$ \sum_x u(x) \phi(x) \leq \sum_x u(x) \psi(x) \; \text{ for every } u \text{ in } i\RR^\Xsf. $$ (eq-fosdd) The relation $\lefsd$ is also called **first-order stochastic dominance** to differentiate it from other forms of stochastic order. ```{prf:example} If $\phi$ and $\psi$ are the binomial distributions defined in the preceding paragraphs and $\Xsf = \{0, \ldots, 18\}$, then $\phi \lefsd \psi$ holds. Indeed, if $W_1, \ldots, W_{18}$ are iid binary random variables with $\PP\{W_i = 1\}=0.5$ for all $i$, then $X \coloneq \sum_{i=1}^{10} W_i$ has distribution $\phi$ and $Y \coloneq \sum_{i=1}^{18} W_i$ has distribution $\psi$. In addition, $X \leq Y$ with probability one (i.e., for any outcome of the draws $W_1, \ldots, W_{18}$). It follows that, for any given $u \in i\RR^\Xsf$, we have $u(X) \leq u(Y)$ with probability one. Hence $\EE u(X) \leq \EE u(Y)$ holds, which is the same statement as {eq}`eq-fosdd`. ``` A good way to interpret first-order stochastic dominance is to suppose that an agent has preferences over outcomes in $\Xsf$ described by a utility function $u \in \RR^\Xsf$. Suppose in addition that the agent prefers more to less, in the sense that $u \in i\RR^\Xsf$, and that the agent ranks lotteries over $\Xsf$ according to expected utility, so that the agent evaluates $\phi \in \dD(\Xsf)$ according to $\sum_x u(x) \phi(x)$. Then the agent (weakly) prefers $\psi$ to $\phi$ whenever $\phi \lefsd \psi$. We can say more. Consider the class $\aA$ of *all* agents who (a) have preferences over outcomes in $\Xsf$, (b) prefer more to less, and (c) rank lotteries over $\Xsf$ according to expected utility. 
Then $\phi \lefsd \psi$ if and only if every agent in $\aA$ prefers $\psi$ to $\phi$. ```{prf:remark} The last paragraph helps explain the pervasiveness of stochastic dominance in economics. It is standard to assume that economic agents have increasing utility functions and use expected utility to rank lotteries. In such environments, an upward shift in a lottery, as measured by stochastic dominance, makes all agents better off. ``` ```{exercise} :label: ex-fsss A simple setting in which we can study stochastic dominance is where $\Xsf = \{1, 2\}$ and $\Xsf$ is partially ordered by $\leq$. In this case, $\phi \lefsd \psi$ if and only if $\phi$ puts at least as much mass on 1 as $\psi$ and, equivalently, no more mass on 2. That is, $$ \phi \lefsd \psi \iff \psi(1) \leq \phi(1) \iff \phi(2) \leq \psi(2) . $$ Verify the equivalence of these statements. ``` ```{solution} ex-fsss Fix $\phi, \psi \in \dD(\Xsf)$ and suppose that $\phi \lefsd \psi$. Let $u \in \RR^\Xsf$ be defined by $u(1)=0$ and $u(2)=1$. Since $u$ is increasing, the definition of stochastic dominance gives $\phi(2) \leq \psi(2)$. Since $\phi(1)=1-\phi(2)$ and $\psi(1)=1-\psi(2)$, this inequality is equivalent to $\psi(1) \leq \phi(1)$. Finally, suppose that $\psi(1) \leq \phi(1)$ and fix $u \in i\RR^\Xsf$. Let $h = u(2) - u(1) \geq 0$. Then $$ \sum_x u(x) \phi(x) = u(1) \phi(1) + (u(1) + h) (1 - \phi(1)) = u(1) + h (1 - \phi(1)). $$ Similarly, $\sum_x u(x) \psi(x) = u(1) + h (1 - \psi(1))$. Since $h \geq 0$ and $\psi(1) \leq \phi(1)$, we have $\sum_x u(x) \phi(x) \leq \sum_x u(x) \psi(x)$. Thus, $\phi \lefsd \psi$. This chain of implications proves the equivalences in the exercise. ``` To state another useful perspective on stochastic dominance, we introduce the notation $$ G^\phi (y) \coloneq \sum_{x \succeq y} \phi(x) \qquad (\phi \in \dD(\Xsf), \; y \in \Xsf). $$ For a given distribution $\phi$, the function $G^\phi$ is sometimes called the **counter CDF** (counter cumulative distribution function) of $\phi$.
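The counter CDF is easy to tabulate when $\Xsf$ is a finite subset of $\RR$ listed in increasing order. The following is a minimal sketch, not taken from the companion code, that computes $G^\phi$ and $G^\psi$ for the binomial distributions $B(10, 0.5)$ and $B(18, 0.5)$ on $\Xsf = \{0, \ldots, 18\}$, compares them pointwise, and spot-checks the defining inequality {eq}`eq-fosdd` for a handful of increasing functions. The helper names `binomial_pmf` and `counter_cdf`, the tolerance, and the particular test functions are our own illustrative choices.

```python
from math import comb

import numpy as np


def binomial_pmf(n, size):
    """PMF of B(n, 0.5) on {0, ..., size - 1}, with zero mass above n."""
    pmf = np.zeros(size)
    for k in range(n + 1):
        pmf[k] = comb(n, k) * 0.5**n
    return pmf


def counter_cdf(phi):
    """G(y) = sum of phi(x) over x >= y, assuming the state space is
    totally ordered and its elements are listed in increasing order."""
    return np.cumsum(phi[::-1])[::-1]


phi = binomial_pmf(10, 19)   # distribution of X ~ B(10, 0.5) on {0, ..., 18}
psi = binomial_pmf(18, 19)   # distribution of Y ~ B(18, 0.5) on {0, ..., 18}

G_phi, G_psi = counter_cdf(phi), counter_cdf(psi)
tol = 1e-12  # allowance for floating point error (an arbitrary choice)
counter_cdfs_ordered = bool(np.all(G_phi <= G_psi + tol))

# Spot check of the defining inequality for a few increasing functions u.
# This is only a finite sample of the set of increasing functions.
x = np.arange(19)
test_functions = [x, x**2, (x >= 7).astype(float), np.sqrt(x)]
expectations_ordered = all(u @ phi <= u @ psi + tol for u in test_functions)

print(counter_cdfs_ordered, expectations_ordered)
```

Checking finitely many functions $u$ cannot by itself establish stochastic dominance, so the second check should be read as a sanity test rather than a proof.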
```{prf:lemma} :label: l-eqfst For each $\phi, \psi \in \dD(\Xsf)$, the following statements hold: 1. $\phi \lefsd \psi \implies G^\phi \leq G^\psi$. 2. If $\Xsf$ is totally ordered by $\preceq$, then $G^\phi \leq G^\psi \implies \phi \lefsd \psi$. ``` The proof is given. Figure {numref}`f-fosd_tauchen_1` helps to illustrate. Here $\Xsf \subset \RR$ and $\phi$ and $\psi$ are distributions on $\Xsf$. We can see that $\phi \lefsd \psi$ because the counter CDFs are ordered in the sense that $G^\phi \leq G^\psi$ pointwise on $\Xsf$. ```{figure} ../figures/fosd_tauchen_1.pdf :name: f-fosd_tauchen_1 Visualization of $\phi \lefsd \psi$ ``` ```{prf:lemma} :label: l-pdpo Stochastic dominance is a partial order on $\dD(\Xsf)$. ``` ```{exercise} :label: ex-fps-auto-21 Prove the transitivity component of {prf:ref}`l-pdpo`, that is, prove that $\lefsd$ is transitive on $\dD(\Xsf)$. ``` ```{solution} ex-fps-auto-21 Suppose $f, g, h \in \dD(\Xsf)$ with $f \lefsd g$ and $g \lefsd h$. Fixing $u \in i\RR^\Xsf$, we have $$ \sum_x u(x)f(x) \leq \sum_x u(x)g(x) \quad \text{and} \quad \sum_x u(x)g(x) \leq \sum_x u(x)h(x) $$ Hence $\sum_x u(x)f(x) \leq \sum_x u(x)h(x)$. Since $u$ was arbitrary in $i\RR^\Xsf$, we are done. ``` ```{exercise} :label: ex-qmsf Fix $\tau \in (0,1]$ and let $Q_\tau$ be the quantile function defined. Choose $\phi, \psi \in \dD(\Xsf)$ and let $X,Y$ be $\Xsf$-valued random variables with distributions $\phi$ and $\psi$ respectively. Prove that $\phi \lefsd \psi$ implies $Q_\tau (X) \leq Q_\tau(Y)$. ``` ```{solution} ex-qmsf Let $F^\phi \coloneq 1 - G^\phi$ be the CDF of $\phi$ and let $F^\psi$ be the CDF of $\psi$. In view of {prf:ref}`l-eqfst`, we have $F^\psi \leq F^\phi$. As a consequence, $$ \setntn{x \in \Xsf}{F^\psi(x) \geq \tau} \subset \setntn{x \in \Xsf}{F^\phi(x) \geq \tau}. $$ It follows directly that $$ \min \setntn{x \in \Xsf}{F^\phi(x) \geq \tau} \leq \min \setntn{x \in \Xsf}{F^\psi(x) \geq \tau}. $$ That is, $Q_\tau(X) \leq Q_\tau(Y)$. 
``` (ss-paramonf)= ### Parametric Monotonicity We are often interested in whether a change in a parameter shifts an outcome up or down. For example, a parameter might appear in a central bank decision rule for pegging an interest rate, and we want to know whether increasing that parameter will increase steady state inflation. By providing sufficient conditions for monotone shifts in fixed points, results in this section can help answer such questions. Let $(P, \preceq)$ be a partially ordered set. Given two self-maps $S$ and $T$ on a set $P$, we write $S \preceq T$ if $S u \preceq T u$ for every $u \in P$ and say that $T$ **dominates** $S$ on $P$. ```{prf:example} Let $P=(\RR^n_+, \leq)$, let $Su = Au + b$ and $Tu=Bu + b$, where $b \in P$ and $A$ and $B$ are $n \times n$ with $0 \leq A \leq B$. For any $u \in P$, we have $A u \leq B u$. Hence $Su \leq Tu$ and $T$ dominates $S$ on $P$. ``` ```{exercise} :label: ex-dompower Let $(P, \preceq)$ be a partially ordered set, and let $S$ and $T$ be order-preserving self-maps such that $S \preceq T$. Show that $S^k \preceq T^k$ holds for all $k \in \NN$. ``` ```{solution} ex-dompower Let $P, S, T$ be as described in the exercise. We aim to show that $S^k \preceq T^k$ holds for all $k \in \NN$. Clearly, it holds for $k=1$. If it also holds at $k-1$, then, for any $u \in P$, we have $S^k u = S S^{k-1} u \preceq S T^{k-1} u \preceq T T^{k-1} u = T^k u$, where we used the induction hypothesis, the order-preserving property of $S$ and the assumption that $S \preceq T$. ``` ```{exercise} :label: ex-fps-auto-22 Let $(P, \preceq)$ be a partially ordered set, let $\sS$ be the set of all self-maps on $P$ and write $S \preceq T$ if $T$ dominates $S$ on $P$. Show that $\preceq$ is a partial order on $\sS$. ``` One might assume that, in a setting where $T$ dominates $S$, the fixed points of $T$ will be larger.
This can hold, as in Figure {numref}`f-fixed_point_monotonicity_1`, but it can also fail, as in Figure {numref}`f-fixed_point_monotonicity_2`. A difference between these two situations is that in Figure {numref}`f-fixed_point_monotonicity_1` the map $T$ is globally stable. This leads us to our next result. ```{figure} figures/fixed_point_monotonicity_1.svg :name: f-fixed_point_monotonicity_1 Ordered fixed points when global stability holds ``` ```{figure} figures/fixed_point_monotonicity_2.svg :name: f-fixed_point_monotonicity_2 Reverse-ordered fixed points when global stability fails ``` ```{prf:proposition} :label: p-ofpds Let $S$ and $T$ be self-maps on $M \subset \RR^n$ and let $\leq$ be the pointwise order. If $T$ dominates $S$ on $M$ and, in addition, $T$ is order-preserving and globally stable on $M$, then its unique fixed point dominates any fixed point of $S$. ``` ```{prf:proof} *Proof of {prf:ref}`p-ofpds`.* Assume the conditions of the proposition and let $u_T$ be the unique fixed point of $T$. Let $u_S$ be any fixed point of $S$. Since $S \leq T$, we have $u_S = S u_S \leq T u_S$. Applying $T$ to both sides of this inequality and using the order-preserving property of $T$ and transitivity of $\leq$ gives $u_S \leq T^2 u_S$. Continuing in this fashion yields $u_S \leq T^k u_S$ for all $k \in \NN$. Taking the limit in $k$ and using the fact that $\leq$ is closed under limits gives $u_S \leq u_T$. ◻ ``` As an application of {prf:ref}`p-ofpds`, consider again the Solow--Swan growth model $k_{t+1} = g(k_t) \coloneq s f(k_t) + (1 - \delta) k_t$. We saw in {ref}`sss-nonmaps` that if $f(k) = Ak^\alpha$ where $A > 0$ and $\alpha \in (0, 1)$, then $g$ is globally stable on $M \coloneq (0,\infty)$. Clearly $k \mapsto g(k)$ is order-preserving on $M$. If we now increase, say, the savings rate $s$, then $g$ will be shifted up everywhere, implying, via {prf:ref}`p-ofpds`, that the fixed point also rises. {prf:ref}`ex-monss` asks you to step through the details. 
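The monotone shift in the Solow--Swan fixed point can also be seen numerically. The sketch below is an illustration under assumed parameter values ($A = 2.0$, $\alpha = 0.3$, $\delta = 0.4$ and three savings rates chosen for the example), not the code used for the book's figures. It approximates the fixed point of $g$ by successive approximation for each savings rate; the function names `solow_map` and `fixed_point` are hypothetical.

```python
def solow_map(k, s, A=2.0, alpha=0.3, delta=0.4):
    """The Solow--Swan update g(k) = s A k^alpha + (1 - delta) k."""
    return s * A * k**alpha + (1 - delta) * k


def fixed_point(s, k0=1.0, tol=1e-10, max_iter=10_000):
    """Approximate the fixed point of g by iterating from k0 until
    successive iterates differ by less than tol."""
    k = k0
    for _ in range(max_iter):
        k_new = solow_map(k, s)
        if abs(k_new - k) < tol:
            return k_new
        k = k_new
    raise RuntimeError("iteration did not converge")


savings_rates = [0.2, 0.3, 0.4]   # illustrative values
steady_states = [fixed_point(s) for s in savings_rates]

for s, k_star in zip(savings_rates, steady_states):
    print(f"s = {s:.1f}, approximate fixed point = {k_star:.4f}")
```

Because raising $s$ shifts $g$ up at every $k > 0$, the approximate fixed points should be ordered in the same way as the savings rates, which is what {prf:ref}`p-ofpds` predicts.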
```{exercise} :label: ex-monss Let $g(k) = s A k^\alpha + (1 - \delta) k$ where all parameters are strictly positive, $\alpha \in (0, 1)$ and $\delta \leq 1$. Let $k^*(s, A, \alpha, \delta)$ be the unique fixed point of $g$ in $M$. Without using the expression we derived for $k^*$ previously ({prf:ref}`ex-kssol`), show that 1. $k^*(s, A, \alpha, \delta)$ is increasing in $s$ and $A$. 2. $k^*(s, A, \alpha, \delta)$ is decreasing in $\delta$. ``` Figure {numref}`f-solow_fp_adjust` helps illustrate the results of {prf:ref}`ex-monss`. The top left sub-figure shows a baseline parameterization, with $A=2.0$, $s = \alpha = 0.3$ and $\delta=0.4$. The other sub-figures show how the steady state changes as parameters deviate from that baseline. ```{figure} ../figures/solow_fp_adjust.pdf :name: f-solow_fp_adjust Parametric monotonicity for the Solow--Swan model ``` ```{exercise} :label: ex-iid_js_tg In {eq}`eq-jsdg`, we defined a map $g$ such that the optimal continuation value $h^*$ is a fixed point. Using this construction, prove that $h^*$ is increasing in $\beta$. ``` ```{solution} ex-iid_js_tg Fix $\beta_1 \leq \beta_2$. Let $g_1$ and $g_2$ be the corresponding fixed-point maps, as defined in {eq}`eq-jsdg`. Since $\beta_1 \leq \beta_2$, we have $g_1(h) \leq g_2(h)$ for all $h \in \RR_+$. Since, in addition, $g_2$ is a contraction map (and hence globally stable), {prf:ref}`p-ofpds` applies. In particular, the fixed point $h_1^*$ corresponding to $\beta_1$ is less than or equal to $h_2^*$, the fixed point corresponding to $\beta_2$. ``` Figure {numref}`f-iid_job_search_tg` gives an illustration of the result in {prf:ref}`ex-iid_js_tg`. Here an increase in $\beta$ leads to a larger continuation value. This seems reasonable, since larger $\beta$ indicates more concern about outcomes in future periods.
```{figure} ../figures/iid_job_search_tg.pdf :name: f-iid_job_search_tg Parametric monotonicity in $\beta$ for the continuation value ``` While the preceding examples of parametric monotonicity are all one-dimensional, we will soon see that {prf:ref}`p-ofpds` can also be applied in high-dimensional settings. ## Matrices and Operators Many aspects of dynamic programming are most clearly framed using operator theory. In this section, we discuss linear operators and their connections to matrices. We emphasize nonnegative matrices and so-called positive linear operators that arise naturally in dynamic programming. ### Nonnegative Matrices We begin by reviewing basic properties of nonnegative matrices. (sss-nnmpow)= #### Nonnegative Matrices and Their Powers We call a matrix $A$ **nonnegative** and write $A \geq 0$ if all elements of $A$ are nonnegative. We call $A$ **everywhere positive** and write $A \gg 0$ if all elements of $A$ are strictly positive. A square matrix $A$ is called **irreducible** if $A \geq 0$ and $\sum_{k=1}^\infty A^k \gg 0$. An interpretation in terms of connected networks is given in Chapter 1 of {cite:t}`sargent2022economic`. Let $A$ be $n \times n$. It is not always true that the spectral radius $\rho(A)$ is an eigenvalue of $A$.[^1] However, when $A \geq 0$, the spectral radius is always an eigenvalue. The following theorem states this result and several extensions. ```{prf:theorem} :label: t-pf If $A \geq 0$, then $\rho(A)$ is an eigenvalue of $A$ with nonnegative, real-valued right and left eigenvectors. In particular, we can find a nonnegative, nonzero column vector $e$ and a nonnegative, nonzero row vector $\epsilon$ such that $$ A e = \rho(A) e \quad \text{ and } \quad \epsilon A = \rho(A) \epsilon. $$ (eq-pfrl) If $A$ is irreducible, then the right and left eigenvectors are everywhere positive and unique. 
Moreover, if $A$ is everywhere positive, then with $e$ and $\epsilon$ normalized so that $\inner{\epsilon, e}=1$, we have $$ \rho(A)^{-t} A^t \to e \, \epsilon \qquad (t \to \infty). $$ (eq-patocon) ``` The convergence in {eq}`eq-patocon` provides a sharp characterization of large powers of $A$ that will prove useful in what follows. The assumption that $A$ is everywhere positive can be weakened without affecting this convergence. A complete statement and full proof of the Perron--Frobenius theorem can be found in {cite:t}`meyer2000matrix`. ```{prf:remark} Note that, in general, if $v$ is an everywhere positive real-valued eigenvector for $A$, then so is $\alpha v$ for all $\alpha > 0$. Hence the uniqueness asserted in the Perron--Frobenius theorem is up to positive multiples. It tells us that if $e$ is the right eigenvector corresponding to $\rho(A)$ and $\hat e$ is another positive vector satisfying $A \hat e = \rho(A) \hat e$, then $\hat e = \alpha e$ for some $\alpha > 0$. A similar statement holds for the left eigenvector $\epsilon$. ``` We can use the Perron--Frobenius theorem to provide bounds on the spectral radius of a nonnegative matrix. Fix $n \times n$ matrix $A = (a_{ij})$ and set - $\rsum_i(A) \coloneq \sum_j a_{ij} =$ the $i$-th row sum of $A$ and - $\csum_j(A) \coloneq \sum_i a_{ij} =$ the $j$-th column sum of $A$. ```{prf:lemma} :label: l-rscsbounds If $A \geq 0$, then 1. $\min_i \rsum_i(A) \leq \rho(A) \leq \max_i \rsum_i(A)$ and 2. $\min_j \csum_j(A) \leq \rho(A) \leq \max_j \csum_j(A)$. ``` ```{exercise} :label: ex-fps-auto-23 Prove {prf:ref}`l-rscsbounds`. (Hint: Since $e$ and $\epsilon$ are nonnegative and nonzero, and since eigenvectors are defined only up to nonzero multiples, you can assume that both of these vectors sum to one.) ``` ```{solution} ex-fps-auto-23 Let $A$ be as stated and let $e$ be the right eigenvector in {eq}`eq-pfrl`.
Since $e$ is nonnegative and nonzero, and since eigenvectors are defined only up to constant multiples, we can and do assume that $\sum_j e_j = 1$. From $A e = \rho(A) e$ we have $\sum_j a_{ij} e_j = \rho(A) e_i$ for all $i$. Summing with respect to $i$ gives $\sum_j \csum_j(A) e_j = \rho(A)$. Since the elements of $e$ are nonnegative and sum to one, $\rho(A)$ is a weighted average of the column sums. Hence the second pair of bounds in {prf:ref}`l-rscsbounds` holds. The remaining proof is similar (use the left eigenvector). ``` #### A Local Spectral Radius Result Let $A$ be an $n \times n$ matrix. We know from Gelfand's formula that if $\| \cdot \|$ is any matrix norm, then $\|A^k\|^{1/k} \to \rho(A)$ as $k \to \infty$. While useful, this lemma can be difficult to apply because it involves matrix norms. Fortunately, when $A$ is nonnegative, we have the following variation, which only involves vector norms. ```{prf:lemma} :label: l-lsr Let $\| \cdot \|$ be any norm on $\RR^n$. If $A$ is nonnegative and $h \in \RR^n$ obeys $h \gg 0$, then $$ \| A^k h \|^{1/k} \to \rho(A) \quad \text{ as } k \to \infty. $$ (eq-lsr) ``` The expression on the left of {eq}`eq-lsr` is sometimes called the **local spectral radius** of $A$ at $h$. {prf:ref}`l-lsr` gives one set of conditions under which a local spectral radius equals the spectral radius. This result will be useful when we examine state-dependent discounting in {prf:ref}`c-state_dep`. For a proof of {prf:ref}`l-lsr` see Theorem 9.1 of {cite:t}`krasnoselskii1972approximate`. (sss-smat)= #### Markov Matrices An $n \times n$ matrix $P$ is called a **stochastic matrix** or **Markov matrix** if $$ P \geq 0 \quad \text{and} \quad P \1 = \1 $$ where $\1$ is a column vector of ones, so that $P$ is nonnegative and has unit row sums. The Perron--Frobenius theorem will be useful for the following exercise. ```{exercise} :label: ex-sm_sr1 Let $P, Q$ be $n \times n$ Markov matrices. Prove the following facts. 1. 
$P Q$ is also a Markov matrix. 2. $\rho(P)=1$. 3. There exists a row vector $\psi \in \RR^n_+$ such that $\psi \1 = 1$ and $\psi P = \psi$. 4. If $P$ is irreducible, then the vector $\psi$ in (iii) is everywhere positive and unique, in the sense that no other vector $\psi \in \RR^n_+$ satisfies $\psi \1 = 1$ and $\psi P = \psi$. ``` ```{solution} ex-sm_sr1 Let $P$ and $Q$ be as stated. Evidently $PQ \geq 0$. Moreover, $PQ \1 = P \1 = \1$, so $PQ$ is Markov. That $\rho(P)=1$ follows directly from {prf:ref}`l-rscsbounds`. By the Perron--Frobenius theorem, there exists a nonzero, nonnegative row vector $\phi$ satisfying $\phi P = \phi$. Rescaling $\phi$ to $\phi / (\phi \1)$ gives the desired vector $\psi$. The final positivity and uniqueness claim is also by the Perron--Frobenius theorem, and its consequences for irreducible matrices. Indeed, if $\phi$ is another nonnegative vector satisfying $\phi \1 = 1$ and $\phi P = \phi$, then, by the Perron--Frobenius theorem, $\phi = \alpha \psi$ for some $\alpha > 0$. But then $\alpha \psi \1 = 1$ and $\psi \1 = 1$, which gives $\alpha=1$. Hence $\phi = \psi$. ``` The vector $\psi$ in part (iii) of {prf:ref}`ex-sm_sr1` is called a **stationary distribution** for $P$. Such distributions play an important role in the theory of Markov chains. We discuss their interpretation and significance in {ref}`sss-serg`. ```{exercise} :label: ex-cgfe Given Markov matrix $P$ and constant $\epsilon > 0$, prove the following result: There exists no $h \in \RR^\Xsf$ with $Ph \geq h + \epsilon$. ``` ```{solution} ex-cgfe Let $P$ and $\epsilon$ have the stated properties. Fix $h \in \RR^\Xsf$. It suffices to show that for this arbitrary $h$ we can find an $x \in \Xsf$ such that $(Ph)(x) < h(x) + \epsilon$. This is easy to verify, since, for $\bar x \in \argmax_{x \in \Xsf} h(x)$ we have $(Ph)(\bar x) = \sum_{x'} h(x') P(\bar x, x') \leq h(\bar x)$. 
``` (ss-lake)= ### A Lake Model We illustrate the power of the Perron--Frobenius theorem by showing how it helps us analyze a model of employment and unemployment flows in a large population. The model is sometimes called a "lake model" because there are two pools of workers: those who are currently employed and those who are currently unemployed but still seeking work. The flows between states are as follows: - Workers exit the labor market at rate $d$. - New workers enter the labor market at rate $b$. - Employed workers separate from their jobs and become unemployed at rate $\alpha$. - Unemployed workers find jobs at rate $\lambda$. We assume that all parameters lie in $(0, 1)$. New workers are initially unemployed. Transition rates between the two pools appear in Figure {numref}`f-worker_switching`. For example, the rate of flow from employment to unemployment is $\alpha (1-d)$, which equals the fraction of employed workers who remained in the labor market and separated from their jobs. ```{figure} figures/worker_switching.svg :name: f-worker_switching Lake model transition dynamics ``` Let $e_t$ and $u_t$ be the number of employed and unemployed workers at time $t$ respectively. The total population (of workers) is $n_t \coloneq e_t + u_t$. In view of the rates just stated, the number of unemployed workers evolves according to $$ u_{t+1} = (1-d) \alpha e_t + (1-d)(1-\lambda) u_t + b n_t. $$ The three terms on the right correspond to the newly unemployed (due to separation), the unemployed who failed to find jobs last period, and new entrants into the labor force. The number of employed workers evolves according to $$ e_{t+1} = (1-d) (1- \alpha) e_t + (1-d)\lambda u_t . $$ Evolution of the time series for $u_t$, $e_t$ and $n_t$ is illustrated in Figure {numref}`f-lake_2`. We set parameters to $\alpha = 0.01$, $\lambda = 0.1$, $d = 0.02$, and $b = 0.025$. The initial populations of unemployed and employed workers are $u_0 = 0.6$ and $e_0 =1.2$, respectively.
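The two laws of motion can be iterated directly. The following is a minimal sketch in Python, separate from the `lake_2.jl` script referenced in the figure caption, that simulates $u_t$ and $e_t$ from the parameter values and initial conditions just stated. It records the unemployment rate $u_t / n_t$ and the ratio $n_{t+1} / n_t$ in each period. The horizon `T` and all variable names are our own choices.

```python
alpha, lam, d, b = 0.01, 0.1, 0.02, 0.025   # parameter values from the text
u, e = 0.6, 1.2                              # initial conditions u_0 and e_0

T = 200  # number of periods to simulate (an arbitrary choice)
unemployment_rates = []
growth_factors = []

for t in range(T):
    n = u + e
    unemployment_rates.append(u / n)
    # Laws of motion for the stocks of unemployed and employed workers
    u_next = (1 - d) * alpha * e + (1 - d) * (1 - lam) * u + b * n
    e_next = (1 - d) * (1 - alpha) * e + (1 - d) * lam * u
    # Ratio of next period's labor force to the current labor force
    growth_factors.append((u_next + e_next) / n)
    u, e = u_next, e_next

print(f"initial unemployment rate: {unemployment_rates[0]:.4f}")
print(f"final simulated unemployment rate: {unemployment_rates[-1]:.4f}")
print(f"final labor force growth factor: {growth_factors[-1]:.4f}")
```

Rerunning the loop from a different positive pair $(u_0, e_0)$ is a quick way to explore how much the simulated unemployment rate depends on the initial state.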
The series grow over the long run due to net population growth. ```{figure} ../figures/lake_2.pdf :name: f-lake_2 Time series for $e_t$, $u_t$ and $n_t$, (`lake_2.jl`) ``` Can we say more about the dynamics of this system? For example, what long-run unemployment rate should we expect? Also, do long-run outcomes depend heavily on the initial conditions $u_0$ and $e_0$? Can we make some general statements that hold regardless of the initial state? To begin to address these questions, we first organize the linear system for $(e_t)$ and $(u_t)$ by setting $$ x_t \coloneq \begin{pmatrix} u_t \\ e_t \end{pmatrix} \quad \text{and} \quad A \coloneq \begin{pmatrix} (1-d)(1-\lambda) + b & (1-d) \alpha + b \\ (1-d)\lambda & (1-d) (1- \alpha) \end{pmatrix}. $$ (eq-axw) With these definitions, we can write the dynamics as $x_{t+1} = A x_t$. As a result, $x_t = A^t x_0$, where $x_0 = (u_0 \; e_0)^\top$. The overall growth rate of the total labor force is $g = b - d$, in the sense that $n_{t+1} = (1+g) n_t$ for all $t$. ```{exercise} :label: ex-fps-auto-24 Confirm this claim by using the equation $x_{t+1} = A x_t$. ``` ```{solution} ex-fps-auto-24 It is straightforward to confirm that both columns of $A$ sum to $1+g$. As a result, with $\1^\top$ as a row vector of ones, we have $$ n_{t+1} = \1^\top x_{t+1} = \1^\top A x_t = (1+g) \1^\top x_t = (1+g) n_t , $$ as was to be shown. ``` ```{exercise} :label: ex-fps-auto-25 Prove that $\rho(A) = 1 + g$. (Hint: Use one of the results in {ref}`sss-nnmpow`.) ``` ```{exercise} :label: ex-fps-auto-26 By the Perron--Frobenius theorem, $1+g$ is an eigenvalue (in fact the dominant eigenvalue) of $A$. Show that $\1^\top \coloneq (1 \; 1)$ is a left eigenvector corresponding to this eigenvalue. 
``` ```{exercise} :label: ex-fps-auto-27 Prove that the unique right eigenvector $\bar x$ satisfying $A \bar x = \rho(A) \bar x$ and $\1^\top \bar x = 1$ is given by $$ \bar x \coloneq \begin{pmatrix} \bar u \\ \bar e \end{pmatrix} \quad \text{with} \quad \bar u \coloneq \frac{1 + g - (1-d)(1-\alpha)} {1 + g - (1-d)(1-\alpha) + (1-d) \lambda}, $$ (eq-lakex) and $\bar e \coloneq 1 - \bar u$. ``` In the language of Perron--Frobenius theory, the right eigenvector $\bar x$ is called the **dominant eigenvector**, since it corresponds to the dominant (i.e., largest) eigenvalue $\rho(A)$. This eigenvector plays an important role in determining long-run outcomes. In the remainder of this section we illustrate this fact. To begin, recall that $\alpha \bar x$ is also a right eigenvector corresponding to the eigenvalue $\rho(A)$ when $\alpha > 0$. The set $D \coloneq \setntn{x \in \RR^2}{x = \alpha \bar x \text{ for some } \alpha > 0}$ is shown as a dashed black line in Figure {numref}`f-lake_1`. The figure also shows two time paths, each of the form $(x_t)_{t \geq 0} = (A^t x_0)_{t \geq 0}$, generated from two different initial conditions. We see that both paths converge to $D$ over time. The figure suggests that paths share strong similarities in the long run that are determined by the dominant eigenvector $\bar x$. ```{figure} ../figures/lake_1.pdf :name: f-lake_1 Time paths $x_t = A^t x_0$ for two choices of $x_0$ (`lake_1.jl`) ``` To see why this is so, we return to {eq}`eq-patocon` from the Perron--Frobenius theorem, which tells us that, since $A \gg 0$, we have $$ A^t \approx \rho(A)^t \cdot \bar x \1^\top = (1 + g)^t \begin{pmatrix} \bar u & \bar u \\ \bar e & \bar e \end{pmatrix} \quad \text{for large } t.
$$ As a result, for any initial condition $x_0 = (u_0 \; e_0)^\top$, we have $$ A^t x_0 \approx (1 + g)^t \begin{pmatrix} \bar u & \bar u \\ \bar e & \bar e \end{pmatrix} \begin{pmatrix} u_0 \\ e_0 \end{pmatrix} = (1 + g)^t (u_0 + e_0) \begin{pmatrix} \bar u \\ \bar e \end{pmatrix} = n_t \bar x, $$ where $n_t =(1 + g)^t n_0$ and $n_0 = u_0 + e_0$. This says that, regardless of the initial condition, the state $x_t$ scales along $\bar x$ at the rate of population growth. This is precisely what we saw in Figure {numref}`f-lake_1`. We can provide additional interpretations to the components $\bar u$ and $\bar e$ of $\bar x$. Since $n_t$ is the size of the workforce at time $t$, the rate of unemployment is $u_t / n_t$. As just shown, for large $t$ this is close to $(n_t \bar u) / n_t = \bar u$. Hence $\bar u$ is the long-term rate of unemployment along the stable growth path. Similarly, the other component $\bar e$ of the dominant eigenvector is the long-run employment rate. In summary, the dominant eigenvector provides us with both the long-run rate of unemployment and the stable growth path, to which all trajectories with positive initial conditions converge over time. ```{prf:remark} A more thorough analysis would require us to think carefully about how the underlying rates $\alpha$, $\lambda$, $b$, and $d$ are determined. For the hiring rate $\lambda$, we could use the job search model to fix the rate at which workers are matched to jobs. In particular, with $w^*$ as the reservation wage, we could set $$ \lambda = \PP\{w_t \geq w^*\} = \sum_{w \geq w^*} \phi(w) . $$ Doing so would allow us to study determinants of $\lambda$ that could include unemployment compensation and workers' impatience. ``` (ss-linopers)= ### Linear Operators There are two ways to think about a matrix. In one definition, an $n \times k$ matrix $A$ is an $n \times k$ array of (real) numbers.
In the second, $A$ is a linear operator from $\RR^k$ to $\RR^n$ that takes a vector $u \in \RR^k$ and sends it to $Au$ in $\RR^n$. Let's clarify these ideas in a setting where $n=k$. While the matrix representation is important, the linear operator representation is more fundamental and more general. (sss-matop)= #### Matrices versus Linear Operators A **linear operator** on $\RR^n$ is a map $L$ from $\RR^n$ to $\RR^n$ such that $$ L(\alpha u + \beta v) = \alpha Lu + \beta Lv \quad \text{for all } u, v \in \RR^n \text{ and } \alpha, \beta \in \RR. $$ (eq-linop) (We write $Lu$ instead of $L(u)$, etc.) For example, if $A$ is an $n \times n$ matrix, then the map from $u$ to $Au$ defines a linear operator, since the rules of matrix algebra yield $A(\alpha u + \beta v) = \alpha Au + \beta Av$. We just showed that each matrix can be regarded as a linear operator. In fact the converse is also true: ```{prf:theorem} :label: t-clinop If $L$ is a linear operator on $\RR^n$, then there exists an $n \times n$ matrix $A = (a_{ij})$ such that $Lu = Au$ for all $u \in \RR^n$. ``` A proof of {prf:ref}`t-clinop` can be found in {cite:t}`kreyszig1978introductory` and many other sources. Why introduce linear operators if they are essentially the same as matrices? One reason is that, while a one-to-one correspondence between linear operators and matrices holds in $\RR^n$, the concept of linear operators is far more general. Linear operators can be defined over many different kinds of sets whose elements have vector-like properties. This is related to the point that we made about function spaces in {prf:ref}`r-deep`. Another reason is computational: The matrix representation of a linear operator can be tedious to construct and difficult to instantiate in memory in large problems. We illustrate this point in {ref}`sss-lici`. 
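The correspondence in {prf:ref}`t-clinop` can be made concrete with a small sketch. The cumulative-sum map below is an illustrative choice of linear operator, and `matrix_representation` is a hypothetical helper name rather than anything from the companion code. The helper recovers the matrix of a linear map by applying the map to each canonical basis vector.

```julia
using LinearAlgebra

# An illustrative linear operator on R^n, written as a function rather than
# a matrix: it sends u to its vector of cumulative sums.
L(u) = cumsum(u)

# Recover the matrix representation column by column. Column j is the image
# under L of the j-th canonical basis vector.
function matrix_representation(L, n)
    A = zeros(n, n)
    for j in 1:n
        e_j = zeros(n)
        e_j[j] = 1.0
        A[:, j] = L(e_j)
    end
    return A
end

A = matrix_representation(L, 3)
u = [1.0, -2.0, 0.5]

println(A)             # lower-triangular matrix of ones
println(A * u ≈ L(u))  # the matrix and the operator agree at u
```

Note that evaluating `L(u)` never requires the $n \times n$ array `A` to be built, which hints at the computational point made above.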
(sss-linop)= #### Linear Operators on Function Space The definition of linear operators on $\RR^n$ extends naturally to linear operators on $\RR^\Xsf$ when $\Xsf = \{x_1, \ldots, x_n\}$: A **linear operator** on $\RR^\Xsf$ is a map $L$ from $\RR^\Xsf$ to itself such that, for all $u, v \in \RR^\Xsf$ and $\alpha, \beta \in \RR$, we have $L(\alpha u + \beta v) = \alpha Lu + \beta Lv$. In what follows, $$ \lopx \coloneq \text{ the set of all linear operators on } \RR^\Xsf. $$ Let $L$ be a function from $\Xsf \times \Xsf$ to $\RR$. This function induces an operator $L$ from $\RR^\Xsf$ to itself via $$ (Lu)(x) = \sum_{x' \in \Xsf} L(x,x') u(x') \qquad (x \in \Xsf, \; u \in \RR^\Xsf). $$ (eq-ellmat) We use the same symbol $L$ on both sides of the equals sign because both represent essentially the same object (in the sense that a matrix $A$ can be viewed as a collection of numbers $(A_{ij})$ or as a linear map $u \mapsto Au$). The function $L$ on the right-hand side of {eq}`eq-ellmat` is sometimes called the "kernel" of the operator $L$. However, we will call it a matrix in what follows, since $L(x, x') = L(x_i, x_j)$ is just an $n \times n$ array of real numbers. When more precision is required, we will call it the **matrix representation** of $L$. In essence, the operation in {eq}`eq-ellmat` is just matrix multiplication: $(Lu)(x)$ is row $x$ of the matrix product $L u$. ```{exercise} :label: ex-fps-auto-28 Confirm that $L$ on the left-hand side of {eq}`eq-ellmat` is in fact a linear operator (i.e., an element of $\lopx$). ``` The eigenvalues and eigenvectors of the linear operator $L$ are defined as the eigenvalues and eigenvectors of its matrix representation. The spectral radius $\rho(L)$ of $L$ is defined analogously. We used the same symbol for the operator $L$ on the left-hand side of {eq}`eq-ellmat` and its matrix representation on the right because these two objects are in one-to-one correspondence. 
In particular, every $L \in \lopx$ can be expressed in the form of {eq}`eq-ellmat` for a suitable choice of matrix $(L(x,x'))$. Readers who are comfortable with these claims can skip ahead to {ref}`sss-lici`. The next lemma provides more details. ```{prf:lemma} :label: l-linop When $\Xsf = \{x_1, \ldots, x_n\}$, the following sets are in one-to-one correspondence: 1. The set of all $n \times n$ real matrices. 2. The set of all linear operators on $\RR^n$. 3. The set $\lopx$ of linear operators on $\RR^\Xsf$. 4. The set of all functions from $\Xsf \times \Xsf$ to $\RR$. ``` {prf:ref}`l-linop` needs no formal proof. {prf:ref}`t-clinop` already tells us that sets 1 and 2 are in one-to-one correspondence. Also, sets 2 and 3 are in one-to-one correspondence because each $L \in \lopx$ can be identified with a linear operator $u \mapsto Lu$ on $\RR^n$ by pairing $u, Lu \in \RR^\Xsf$ with their vector representations in $\RR^n$ (see {ref}`sss-fvv`). Finally, sets 4 and 1 are in one-to-one correspondence under the identification $L(x_i, x_j) \leftrightarrow L_{ij}$. (sss-lici)= #### Computational Issues At the end of {ref}`sss-matop` we claimed that working with linear operators brings some computational advantages vis-à-vis working with matrices. This section fills in some details. (Readers who prefer not to think about computational issues at this point can skip ahead to {ref}`sss-pomo`.) To illustrate the main idea, consider a setting where the state space $\Xsf$ takes the form $\Xsf = \Ysf \times \Zsf$ with $|\Ysf| = j$ and $|\Zsf| = k$. A typical element of $\Xsf$ is $x = (y, z)$. As we shall see, this kind of setting arises naturally in dynamic programming. Let $Q$ be a map from $\Zsf \times \Zsf$ to $\RR$ (i.e., a $k \times k$ matrix) and consider the operator sending $u \in \RR^\Xsf$ to $Lu \in \RR^\Xsf$ according to the rule $$ (Lu)(x) = (Lu)(y, z) = \sum_{z' \in \Zsf} u(y, z') Q(z, z'). $$ (eq-libm) ```{exercise} :label: ex-fps-auto-29 Prove that $L \in \lopx$.
``` Since $L$ is a linear operator on $\RR^\Xsf$, {prf:ref}`l-linop` tells us that $L$ can be represented as an $n \times n$ matrix $(L(x_i,x_j)) = (L_{ij})$, where $n = |\Xsf| = j \times k$. To construct this matrix, we first need to "flatten" $\Ysf \times \Zsf$ into a set $\Xsf = \{x_1, \ldots, x_n\}$ with a single index. There are two natural ways to do this. Considering $\Ysf \times \Zsf$ as a two-dimensional array with typical element $(y_i, z_j)$, we can (a) stack all $k$ columns vertically into one long column, or (b) concatenate all $j$ rows into one long row. The first arrangement is called **column-major ordering** and is the default for languages such as Julia and Fortran. The second is called **row-major ordering** and is the default for languages such as Python and C. Either way we obtain a set of elements indexed by $1, \ldots, n$. After adopting one of these conventions, {prf:ref}`l-linop` assures us we can construct a uniquely defined $n \times n$ matrix that represents $L$. Once we decide how to construct this matrix, we can instantiate it in computer memory and compute the operation $u \mapsto Lu$ by matrix multiplication. There are, however, several disadvantages to implementing $L$ using this matrix-based approach. One is that constructing the matrix representation is tedious. Another is that confusion can arise when swapping between column- and row-major orderings in order to shift between languages or to communicate with colleagues. A third is that differences are introduced between computer code and the natural representation {eq}`eq-libm`, which can be a source of bugs. A fourth issue is that an $n \times n$ matrix has to be instantiated in memory, even though the linear operation in {eq}`eq-libm` is only an inner product in $\RR^k$. The last issue can be alleviated in most languages by employing sparse matrices, but doing so adds boilerplate and can be a source of inefficiency. 
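The trade-off just described can be sketched in a few lines. The sizes $j = 3$, $k = 2$, the matrix `Q`, and the helper name `apply_L` are all illustrative assumptions. The operator form applies {eq}`eq-libm` directly to a $j \times k$ array, while the matrix form builds the $n \times n$ representation under column-major flattening, which for this particular operator is the Kronecker product $Q \otimes I_j$.

```julia
using LinearAlgebra

j, k = 3, 2                                # |Y| = j and |Z| = k (illustrative)
Q = [0.9 0.1;
     0.4 0.6]                              # k × k matrix Q(z, z')
U = reshape(collect(1.0:(j * k)), j, k)    # U[y, z] stores u(y, z)

# Operator form: (Lu)(y, z) = Σ_{z'} u(y, z') Q(z, z'), which is U * Q'.
apply_L(U, Q) = U * Q'

# Matrix form under column-major flattening: an n × n matrix with n = j * k.
L_mat = kron(Q, Matrix{Float64}(I, j, j))

# Both routes produce the same element of R^X once the result is flattened.
println(vec(apply_L(U, Q)) ≈ L_mat * vec(U))
println(size(L_mat))
```

The operator form stores only the $k \times k$ array `Q`, whereas `L_mat` has $(jk)^2$ entries, most of them zero.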
Because of these issues, most modern scientific computing environments support linear operators directly, as well as actions on linear operators such as inverting linear maps. These considerations encourage us to take an operator-based approach. (sss-pomo)= #### Positive Operators and Markov Operators Having agreed on the benefits of an operator-theoretic exposition, let us now describe some kinds of linear operators. We continue to assume that $\Xsf$ is a finite set with $n$ elements. The set $\RR^\Xsf_+$ of all $u \in \RR^\Xsf$ with $u \geq 0$ is called the **positive cone** of $\RR^\Xsf$. An operator $L \in \lopx$ is called **positive** if $L$ is invariant on the positive cone; that is, if $$ u \geq 0 \; \implies \; Lu \geq 0. $$ (eq-polinop) ```{prf:example} The operator $L \in \lopx$ defined in {eq}`eq-libm` is positive whenever $Q \geq 0$. This is because $$ u \geq 0 \; \implies \; \sum_{z'} u(y, z') Q(z, z') \geq 0 \text{ for all } x = (y, z) \text{ in } \Xsf. $$ ``` ```{prf:lemma} :label: l-posifnon An operator $L \in \lopx$ is positive if and only if its matrix representation is a nonnegative matrix. ``` ```{exercise} :label: ex-fps-auto-30 Prove {prf:ref}`l-posifnon`. ``` ```{solution} ex-fps-auto-30 Fix $L \in \lopx$ with $(Lu)(x) = \sum_{x' \in \Xsf} L(x,x') u(x')$ for all $x \in \Xsf$ and $u \in \RR^\Xsf$. Positivity of $L$ requires that $$ u \geq 0 \; \implies \; \sum_{x' \in \Xsf} L(x,x') u(x') \geq 0 \text{ for all } x \in \Xsf. $$ Clearly, this holds whenever $L(x, x') \geq 0$ for all $x, x' \in \Xsf$. Regarding the converse, suppose that $L$ is positive. Seeking a contradiction, suppose in addition that we can find a pair $(x_a, x_b) \in \Xsf \times \Xsf$ such that $L(x_a,x_b) < 0$. With $u(x) \coloneq \1\{x = x_b\}$, we have $(Lu)(x_a) = \sum_{x' \in \Xsf} L(x_a,x') u(x') = L(x_a, x_b) < 0$. This contradicts positivity of $L$. 
``` ```{prf:remark} The {prf:ref}`l-posifnon` characterization suggests that we should really call a linear operator satisfying {eq}`eq-polinop` "nonnegative" rather than positive. Nevertheless, the "positive" terminology is standard (see, e.g., {cite}`zaanen2012introduction`). ``` ```{exercise} :label: ex-plop Given $L \in \lopx$, prove the following statement: $L$ is positive if and only if $L$ is order-preserving on $\RR^\Xsf$ under the pointwise order. ``` ```{solution} ex-plop Suppose first that $L$ is positive. Fix $u \leq v$ in $\RR^\Xsf$ and observe that, by positivity, $L(v-u) \geq 0$. But then $Lv - Lu \geq 0$ and hence $Lu \leq Lv$. This shows that $L$ is order-preserving. Regarding the converse, if $L$ is order-preserving, then $u \geq 0$ implies $Lu \geq L0$. But for every linear operator, we have $L0 = 0$, and so $Lu \geq 0$. Hence $L$ is a positive operator. ``` An operator $P \in \lopx$ is called a **Markov operator** on $\RR^\Xsf$ if $P$ is positive and $P \1 = \1$. We let $$ \mopx \coloneq \text{ the set of all Markov operators on } \RR^\Xsf. $$ Viewed as matrices, elements of $\mopx$ are nonnegative matrices whose rows sum to one. The next exercise asks you to confirm this. ```{exercise} :label: ex-cmoper Fix $P \in \lopx$ and let $P(x,x')$ be the matrix representation. Prove that $P \in \mopx$ if and only if $P(x,x') \geq 0$ for all $x,x' \in \Xsf$ and $\sum_{x' \in \Xsf} P(x, x') =1$ for all $x \in \Xsf$. ``` ```{solution} ex-cmoper Fix $P \in \lopx$ and let $P(x,x')$ be the matrix representation, so that $$ (Pu)(x) = \sum_{x'} P(x,x') u(x') \qquad (x \in \Xsf), $$ for any $u \in \RR^\Xsf$. Suppose first that $P \in \mopx$. The statement that $P$ is a positive linear operator is equivalent to $P(x,x') \geq 0$ for all $x,x'$ by {prf:ref}`l-posifnon`. Moreover, $P\1 = \1$ is equivalent to $\sum_{x' \in \Xsf} P(x, x') =1$ for all $x \in \Xsf$. 
``` ```{exercise} :label: ex-pmptp Prove: If $P \in \mopx$ and $v \in \RR^\Xsf$ with $v \gg 0$, then $Pv \gg 0$. ``` In the next exercise, you can think of $\phi$ as a row vector and $\phi P$ as premultiplying the matrix $P$ by this row vector. {prf:ref}`c-mcs` uses the map $\phi \mapsto \phi P$ to update marginal distributions generated by Markov chains. ```{exercise} :label: ex-fps-auto-31 Fix $P \in \lopx$. Prove that $P \in \mopx$ if and only if the function $\phi P$ defined by $$ (\phi P)(x') = \sum_{x \in \Xsf} P(x, x') \phi(x) \qquad (x' \in \Xsf) $$ (eq-updatedist) is in $\dD(\Xsf)$ whenever $\phi \in \dD(\Xsf)$. ``` ```{solution} ex-fps-auto-31 In the solution, we use the characterization in {prf:ref}`ex-cmoper`: $P \in \mopx$ if and only if $P(x,x') \geq 0$ for all $x,x' \in \Xsf$ and $\sum_{x' \in \Xsf} P(x, x') =1$ for all $x \in \Xsf$. Fix $P \in \lopx$ and suppose first that $P \in \mopx$. Then $$ (\phi P)(x') = \sum_x P(x, x') \phi(x) \qquad (x' \in \Xsf) $$ (eq-updatedista) is in $\dD(\Xsf)$ whenever $\phi \in \dD(\Xsf)$, since, for any such $\phi$, the vector $\phi P$ is clearly nonnegative and $$ \sum_{x'} (\phi P)(x') = \sum_x \sum_{x'} P(x, x') \phi(x) = \sum_x \phi(x) = 1. $$ Now suppose instead that $P \in \lopx$ and $\phi P \in \dD(\Xsf)$ whenever $\phi \in \dD(\Xsf)$. It follows that $P(x, x')$ is nonnegative at arbitrary $(x,x')$, since $(\phi P)(x') = P(x,x')$ when $\phi$ is the distribution that puts all mass on $x$. Moreover, $P(x, \cdot)$ must sum to one at arbitrary $x$ because if $\phi$ is the distribution that puts all mass on $x$, then $$ 1 = \sum_{x'} (\phi P)(x') = \sum_{x'} P(x,x'). $$ ``` Markov operators are important for us because they generate Markov dynamics, a foundation of dynamic programming. Thus, {eq}`eq-updatedist` is a rule for updating distributions by one period under the Markov dynamics specified by $P$. We'll use it often in the next chapter. 
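As a small numerical illustration of {eq}`eq-updatedist`, the sketch below checks the two defining properties of a Markov operator and then updates a distribution by one period. The matrix `P` and the helper names `is_markov` and `update` are hypothetical and chosen only for this example.

```julia
using LinearAlgebra

# A hypothetical 2 × 2 matrix representation P(x, x').
P = [0.7 0.3;
     0.2 0.8]

# P is a Markov operator iff its entries are nonnegative and rows sum to one.
is_markov(P) = all(P .>= 0) && all(sum(P, dims=2) .≈ 1)

# One-period update ϕ ↦ ϕP, with ϕ stored as a 1 × n row vector.
update(ϕ, P) = ϕ * P

ϕ = [1.0 0.0]            # all mass on the first state
ϕ_next = update(ϕ, P)    # equals the first row of P

println(is_markov(P))
println(ϕ_next)
println(sum(ϕ_next) ≈ 1) # the updated vector is again a distribution
```

Starting from a point mass on state $x$ returns the row $P(x, \cdot)$, which is the observation used in the solution above.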
(s-cn_fps)= ## Chapter Notes {cite:t}`davey2002introduction` provide a good introduction to partial orders and order-theoretic concepts. Our favorite books on fixed points and analysis include {cite:t}`ok2007real`, {cite:t}`zhang2012variational`, {cite:t}`cheney2013analysis`, and {cite:t}`atkinson2005theoretical`. Good background material on order-theoretic fixed-point methods can be found in {cite:t}`guo2004partial` and {cite:t}`zhang2012variational`. [^1]: For example, eigenvalues of $A = \diag(-1, 0)$ are $\{-1, 0\}$. Hence $\rho(A) = |-1| = 1$, which is not an eigenvalue of $A$. ======================================================================== ## Markov Chains (c-mcs)= # Markov Dynamics To prepare to analyze dynamic programs, we now study stochastic processes generated by Markov chains. These processes are widely used to construct economic and financial models. At the end of this chapter we return to the job search problem from {prf:ref}`c-introii` and allow wage draws to be correlated over time (rather than iid). We use a Markov chain to generate serially correlated wage draws. Throughout this chapter, the symbol $\Xsf$ represents a finite set. (s-fmcs)= ## Foundations This section describes elementary properties of Markov models. (ss-markchain)= ### Markov Chains Let's start with a definition and some simple examples. (sss-dmcs)= #### Defining Markov Chains Fix $\Xsf = \{x_1, \ldots, x_n\}$ and $P \in \mopx$. We interpret $P(x, x')$ as the probability that a random process moves from $x$ to $x'$ over one unit of time. For this interpretation to make sense we need $P(x, x')$ to be nonnegative and $\sum_{x' \in \Xsf} P(x, x')$ to equal one for every $x \in \Xsf$, since we want the chain to stay somewhere in the state space after each update. These are exactly the properties guaranteed by the assumption $P \in \mopx$ (see {prf:ref}`ex-cmoper`).
To formalize ideas, let $(X_t) \coloneq (X_t)_{t \geq 0}$ be a sequence of random variables taking values in $\Xsf$ and call $(X_t)$ a **Markov chain** on **state space** $\Xsf$ if there exists a $P \in \mopx$ such that $$ \PP \{ X_{t+1} = x' \mid X_0, X_1, \ldots, X_t \} = P(X_t, x') \quad \text{for all} \quad t \geq 0, \; x' \in \Xsf. $$ (eq-mcdef) To simplify terminology, we also call $(X_t)$ **$P$-Markov** when {eq}`eq-mcdef` holds. We call either $X_0$ or its distribution $\psi_0$ the **initial condition** of $(X_t)$, depending on context. $P$ is also called the **transition matrix** of the Markov chain. The definition of a Markov chain says two things: 1. When updating to $X_{t+1}$ from $X_t$, earlier states are not required. 2. $P$ encodes all of the information required to perform the update, given the current state $X_t$. One way to think about Markov chains is algorithmically: Fix $P \in \mopx$ and let $\psi_0$ be an element of $\dD(\Xsf)$. Now generate $(X_t)$ via {prf:ref}`algo-mc`. The resulting sequence is $P$-Markov with initial condition $\psi_0$. ```{prf:algorithm} Generation of $P$-Markov $(X_t)$ with initial condition $\psi_0$ :label: algo-mc - $t \leftarrow 0$ - $X_t \leftarrow$ a draw from $\psi_0$ - while $t < \infty$: - $X_{t+1} \leftarrow $ a draw from the distribution $P(X_t, \cdot)$ - $t \leftarrow t + 1$ ``` (sss-mcss)= #### Application: S--s Dynamics As an example, consider a firm whose inventory of some product follows S--s dynamics, meaning that the firm waits until its inventory falls below some level $s > 0$ and then immediately replenishes by ordering $S$ units. This pattern of decisions can be rationalized if ordering requires paying a fixed cost. Thus, in {ref}`ss-ip`, we will show that S--s behavior is optimal in a setting where fixed costs exist and the firm's aim is to maximize its present value. 
To represent S--s dynamics, we suppose that a firm's inventory $(X_t)_{t \geq 0}$ of a given product obeys $$ X_{t+1} = \max\{ X_t - D_{t+1}, 0\} + S \1\{X_t \leq s\}, $$ where - $(D_t)_{t \geq 1}$ is an exogenous iid demand process with $D_t \eqdist \phi \in \dD(\ZZ_+)$ for all $t$ and - $S$ is the quantity ordered when $X_t \leq s$. For the distribution $\phi$ of demand we take the geometric distribution, so that $\phi(d) = \PP\{D_t = d\} = p (1 - p)^d$ for $d \in \ZZ_+$. ```{exercise} :label: ex-mcs-auto-1 Confirm the following claim: An appropriate state space for this model is $\Xsf \coloneq \{0, \ldots, S+s\}$, since $$ X_t \in \Xsf \implies \PP\{X_{t+1} \in \Xsf\} = 1. $$ ``` ```{solution} ex-mcs-auto-1 Let $X_t = x \in \Xsf$, so that $X_{t+1} = \max\{ x - D_{t+1}, 0\} + S \1\{x \leq s\}$. Evidently, $X_{t+1}$ is integer-valued and nonnegative. If $x \leq s$, then $X_{t+1} \leq \max\{s - D_{t+1}, 0\} + S \leq s + S$. Similarly, if $s < x \leq S + s$, then $X_{t+1} \leq \max\{x - D_{t+1}, 0\} \leq S + s$. The claim is verified. ``` If we define $h(x, d) \coloneq \max\{ x - d, 0\} + S \1\{x \leq s\}$, so that $X_{t+1} = h(X_t, D_{t+1})$ for all $t$, then the transition matrix can be expressed as $$ P(x, x') = \PP\{h(x, D_{t+1}) = x'\} = \sum_{d \geq 0} \1\{h(x, d) = x'\} \phi(d) \qquad ((x, x') \in \Xsf \times \Xsf). $$ {numref}`list-inventory_sim` provides code that simulates inventory paths and computes other objects of interest. Since the state space $\Xsf = \{x_1, \ldots, x_n\}$ corresponds to $\{0, \ldots, S+s\}$ and Julia indexing starts at $1$, we set $x_i = i-1$. This convention is used when computing `P[i, j]`{.julia}, which corresponds to $P(x_i, x_j)$. The code in the listing is used to produce the simulation of inventories in Figure {numref}`f-inventory_sim_1`. The function `compute_mc`{.julia} returns an instance of a `MarkovChain`{.Julia} object that can store both the state $\Xsf$ and the transition probabilities. 
The `QuantEcon.jl`{.julia} library defines this data type and provides functions that simulate Markov chains, compute stationary distributions, and perform related tasks.

```{code-block} julia
:name: list-inventory_sim
:caption: An implementation of S--s inventory dynamics (`inventory_sim.jl`)
:linenos:

using Distributions, QuantEcon, IterTools

function create_inventory_model(; S=100,   # Order size
                                  s=10,    # Order threshold
                                  p=0.4)   # Demand parameter
    ϕ = Geometric(p)
    h(x, d) = max(x - d, 0) + S * (x <= s)
    return (; S, s, ϕ, h)
end

"Simulate the inventory process."
function sim_inventories(model; ts_length=200)
    (; S, s, ϕ, h) = model
    X = Vector{Int32}(undef, ts_length)
    X[1] = S  # Initial condition
    for t in 1:(ts_length-1)
        X[t+1] = h(X[t], rand(ϕ))
    end
    return X
end

"Compute the transition probabilities and state."
function compute_mc(model; d_max=100)
    (; S, s, ϕ, h) = model
    n = S + s + 1     # Size of state space
    state_vals = collect(0:(S + s))
    P = Matrix{Float64}(undef, n, n)
    for (i, j) in product(1:n, 1:n)
        P[i, j] = sum((h(i-1, d) == j-1) * pdf(ϕ, d) for d in 0:d_max)
    end
    return MarkovChain(P, state_vals)
end

"Compute the stationary distribution of the model."
function compute_stationary_dist(model)
    mc = compute_mc(model)
    return mc.state_values, stationary_distributions(mc)[1]
end
```

```{figure} ../figures/inventory_sim_1.pdf
:name: f-inventory_sim_1

Inventory simulation (`inventory_sim.jl`)
```

#### Higher Order Transition Matrices

Given a finite state space $\Xsf$, $k \geq 0$ and $P \in \mopx$, let $P^k$ be the $k$-th power of $P$. (If $k = 0$, then $P^k$ is the identity matrix.) Since $\mopx$ is closed under multiplication ({prf:ref}`ex-sm_sr1`), $P^k$ is in $\mopx$ for all $k \geq 0$. In this context, $P^k$ is sometimes called the **$k$-step transition matrix** corresponding to $P$. In what follows, $P^k(x,x')$ denotes the $(x,x')$-th element of the matrix representation of $P^k$.
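A quick numerical check of these claims might look as follows. The $2 \times 2$ matrix `P` and the helper name `k_step` are hypothetical choices for this example and are not part of the companion code.

```julia
using LinearAlgebra

# A hypothetical Markov matrix on a two-point state space.
P = [0.7 0.3;
     0.2 0.8]

# The k-step transition matrix is the k-th matrix power of P.
k_step(P, k) = P^k

P0 = k_step(P, 0)
P2 = k_step(P, 2)
P5 = k_step(P, 5)

println(P0 == Matrix{Float64}(I, 2, 2))   # P^0 is the identity
println(all(P5 .>= 0))                    # entries of P^5 are nonnegative
println(all(sum(P5, dims=2) .≈ 1))        # rows of P^5 sum to one
println(P5 ≈ P2 * k_step(P, 3))           # P^5 = P^2 P^3
```

The first three checks are consistent with $P^k \in \mopx$. The last one reflects the fact that a five-step transition can be split into a two-step transition followed by a three-step transition.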
The $k$-step transition matrix has the following interpretation: If $(X_t)$ is $P$-Markov, then for any $t, k \in \ZZ_+$ and $x, x' \in \Xsf$, $$ P^k(x, x') = \PP\{X_{t + k} = x' \given X_t = x\}. $$ (eq-pik) Thus, $P^k$ provides the $k$-step transition probabilities for the $P$-Markov chain $(X_t)$. ```{exercise} :label: ex-mcs-auto-2 Prove the claim in the last sentence via induction. ``` ```{solution} ex-mcs-auto-2 Fixing $t \geq 0$ and $P \in \mopx$, this claim can be verified by induction over $k$. The claim is obviously true when $k=0, 1$. Suppose the claim is also true at $k$ and now consider the case $k+1$. By the law of total probability, for given $x, x' \in \Xsf$, we have $$ \PP\{X_{t+k+1} = x' \given X_t = x \} = \sum_z \PP\{X_{t+k+1} = x' \given X_{t+k} = z\} \PP\{X_{t+k} = z \given X_t = x\}. $$ The induction hypothesis allows us to use {eq}`eq-pik` at $k$, so the last equation becomes $$ \PP\{X_{t+k+1} = x' \given X_t = x \} = \sum_z P^k(x, z) P(z, x') = P^{k+1} (x, x'). $$ The law {eq}`eq-pik` is now verified at $k+1$, completing our proof by induction. ``` We can now give the following useful characterization of irreducibility: ```{prf:lemma} :label: l-ecoi Given $P \in \mopx$, the following statements are equivalent: 1. $P$ is irreducible. 2. If $(X_t)$ is $P$-Markov and $x, x' \in \Xsf$, then there exists a $k \geq 0$ such that $$ \PP\{X_k = x' \given X_0=x\} > 0. $$ ``` Thus, irreducibility of $P$ means that the $P$-Markov chain eventually visits any state from any other state with positive probability. ```{prf:proof} *Proof of {prf:ref}`l-ecoi`.* Fix $P \in \mopx$. $P$ is irreducible if and only if $\sum_{k \geq 0} P^k \gg 0$. This is equivalent to the statement that for each $(x,x') \in \Xsf \times \Xsf$, there exists a $k \geq 0$ such that $P^k(x,x') > 0$, which is, in turn, equivalent to part (ii) of {prf:ref}`l-ecoi`. 
◻ ``` ```{exercise} :label: ex-mcs-auto-3 Using {prf:ref}`l-ecoi`, prove that the stochastic matrix associated with the S--s inventory dynamics in {ref}`sss-mcss` is irreducible. ``` ```{solution} ex-mcs-auto-3 Let $x \in \Xsf$ be the current state at time $t$ and suppose first that $s < x$. The next period state $X_{t+1}$ hits $s$ with positive probability, since $\phi(d) > 0$ for all $d \in \ZZ_+$. The state $X_{t+2}$ hits $S+s$ with positive probability, since $\phi(0)>0$. From $S+s$, the inventory level reaches any point in $\Xsf = \{0, \ldots, S+s\}$ in one step with positive probability. Hence, from current state $x$, inventory reaches any other state $y$ with positive probability in three steps. The logic for the case $x \leq s$ is similar and left to the reader. ``` Several libraries have code for testing irreducibility, including `QuantEcon.jl`. See {numref}`list-is_irreducible` for an example of a call to this functionality. In this case, irreducibility fails because state 2 is an **absorbing state**. Once entered, the probability of ever leaving that state is zero. (A subset $\Ysf$ of $\Xsf$ with this property is called an **absorbing set**.)

```{code-block} julia
:name: list-is_irreducible
:caption: Testing irreducibility (`is_irreducible.jl`)
:linenos:

using QuantEcon

P = [0.1 0.9;
     0.0 1.0]
mc = MarkovChain(P)
print(is_irreducible(mc))
```

(sss-serg)= ### Stationarity and Ergodicity Next we review aspects of Markov dynamics, including stationarity and ergodicity. Fix $P \in \mopx$ and let $(X_t)$ be $P$-Markov. Let $\psi_t$ be the distribution of $X_t$. Marginal distributions $\psi_t$ evolve according to $$ \psi_{t+1}(x') = \sum_x P(x, x') \psi_t(x) \quad \text{for all } x' \in \Xsf \text{ and } t \geq 0. $$ (eq-mdd) To verify {eq}`eq-mdd`, rewrite it as $\PP \{X_{t+1} = x'\} = \sum_x \PP\{X_{t+1}=x' \,|\, X_t=x\} \PP\{X_t=x\}$, which is true by the law of total probability.
With each $\psi_t$ regarded as a row vector, {eq}`eq-mdd` can also be written as $$ \psi_{t+1} = \psi_t P. $$ (eq-fdemc) Equation {eq}`eq-fdemc` tells us that dynamics of marginal distributions for Markov chains are generated by deterministic linear difference equations in distribution space. This is remarkable because the dynamics that drive $(X_t)$ are stochastic and can be arbitrarily nonlinear. Iterating on {eq}`eq-fdemc`, we get $\psi_t = \psi_0 P^t$ for all $t$. In summary, $$ (X_t)_{t \geq 0} \text{ is } P \text{-Markov with } X_0 \eqdist \psi_0 \; \implies \; X_t \eqdist \psi_0 P^t \text{ for all } t \geq 0. $$ (eq-mcmdists) For {eq}`eq-mcmdists` and $\psi_{t+1} = \psi_t P$ to hold, each $\psi_t$ must be a row vector. In what follows, we always treat the distributions $(\psi_t)_{t \geq 0}$ of $(X_t)_{t \geq 0}$ as row vectors. ```{exercise} :label: ex-mcs-auto-4 Let $(X_t)$ be $P$-Markov on $\Xsf$ with $X_0 \eqdist \psi_0$. Show that $$ \EE h(X_t) = \psi_0 P^t h = \inner{\psi_0 P^t, h} \quad \text{for all } t \in \NN \text{ and } h \in \RR^\Xsf. $$ ``` ```{solution} ex-mcs-auto-4 Fix $t \in \NN$. Under the stated hypotheses, we have $X_t \eqdist \psi_0 P^t$ (see {eq}`eq-mcmdists`). Hence $$ \EE h(X_t) = \sum_{x'} h(x') \PP\{X_t = x'\} = \sum_{x'} h(x') (\psi_0 P^t)(x') = \inner{\psi_0 P^t, h}. $$ ``` Consistent with our definition of stationary distributions in {ref}`sss-smat`, a marginal distribution $\psi^* \in \dD(\Xsf)$ is called **stationary** for $P$ if $$ \sum_x P(x, x') \psi^*(x) = \psi^*(x') \quad \text{for all } x' \in \Xsf. $$ In vector form, this is $\psi^* P = \psi^*$. By this definition and {eq}`eq-mdd`, if $\psi^*$ is stationary and $X_t$ has distribution $\psi^*$, then so does $X_{t+k}$ for all $k \geq 1$. We saw in {prf:ref}`ex-sm_sr1` that every irreducible $P \in \mopx$ has exactly one stationary distribution in $\dD(\Xsf)$. The following **ergodic property** holds under the same assumptions.
```{prf:theorem} :label: t-mcat If $P$ is irreducible with stationary distribution $\psi^*$, then, for any $P$-Markov chain $(X_t)$ and any $x \in \Xsf$, we have $$ \PP \left\{ \lim_{k \to \infty} \frac{1}{k} \sum_{t=0}^{k-1} \1\{X_t = x\} = \psi^*(x) \right\} = 1. $$ (eq-erglln) ``` A proof of {eq}`eq-erglln` can be found in {cite:t}`bremaud2020markov`. Property {eq}`eq-erglln` tells us that, with probability one (i.e., for almost every $P$-Markov chain that we generate), the fraction of time that the chain spends in any given state is, in the limit, equal to the probability assigned to that state by the stationary distribution. Markov chains with this property are sometimes said to be **ergodic**. Since the S--s inventory model from {ref}`sss-mcss` is irreducible, the ergodicity result from {prf:ref}`t-mcat` applies. In particular, the process has only one stationary distribution $\psi^*$ in $\dD(\Xsf)$, where $\Xsf = \{0, \ldots, S+s\}$, and {eq}`eq-erglln` is valid. Figure {numref}`f-inventory_sim_2` illustrates this by plotting both the stationary distribution $\psi^*$ (which is computed using the code in {numref}`list-inventory_sim`), and the value $m(y) \coloneq \frac{1}{k} \sum_{t=0}^{k-1} \1\{X_t = y\}$ at each $y \in \Xsf$ for $k$ set to $1,000,000$. As predicted by the theorem, the fraction of time spent by the chain in each state is close to the probability assigned by $\psi^*$. ```{figure} ../figures/inventory_sim_2.pdf :name: f-inventory_sim_2 Ergodicity (`inventory_sim.jl`) ``` (sss-eue)= #### Application: Day Laborer Suppose that a day laborer is either unemployed ($X_t = 1$) or employed ($X_t = 2$) in each period. In state $1$ he is hired with probability $\alpha \in (0, 1)$. In state $2$ he is fired with probability $\beta \in (0, 1)$. The corresponding state space and transition matrix are $$ \Xsf = \{1, 2\} \quad \text{and} \quad P = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}. 
$$ (eq-pw) {numref}`list-laborer_sim` provides a function to update from $X_t$ to $X_{t+1}$, using the fact that $\texttt{rand()}$ generates a draw from the uniform distribution on $[0,1)$. ```{exercise} :label: ex-mcs-auto-5 Explain why {numref}`list-laborer_sim` updates the current state according to the probabilities in $P$. ```

```{code-block} julia
:name: list-laborer_sim
:caption: Updating the state of the day laborer (`laborer_sim.jl`)
:linenos:

function create_laborer_model(; α=0.3, β=0.2)
    return (; α, β)
end

function laborer_update(x, model)  # update X from t to t+1
    (; α, β) = model
    if x == 1
        x′ = rand() < α ? 2 : 1
    else
        x′ = rand() < β ? 1 : 2
    end
    return x′
end
```

```{exercise} :label: ex-mcs-auto-6 Because $P$ is everywhere positive, it must be irreducible, so $P$ has a unique stationary distribution $\psi^* \in \dD(\Xsf)$. Show that $\psi^*$ is given by $$ \psi^* = \frac{1}{\alpha + \beta} \begin{pmatrix} \beta & \alpha \end{pmatrix}. $$ ``` It is also true that $\psi P^t \to \psi^*$ as $t \to \infty$ for any $\psi \in \dD(\Xsf)$. Thus, the operator $P$, when understood as the mapping $\psi \mapsto \psi P$, is globally stable on $\dD(\Xsf)$. ```{exercise} :label: ex-mcs-auto-7 Prove this using the Perron--Frobenius theorem. More generally, show that this global stability result holds for any $P \in \mopx$ with $P \gg 0$. ``` ```{solution} ex-mcs-auto-7 Assume $P$ is everywhere positive with unique stationary distribution $\psi^*$. Since $\rho(P)=1$, the last part of the Perron--Frobenius theorem tells us that $P^t \to e \, \epsilon$ as $t \to \infty$, where $e$ and $\epsilon$ are the dominant right and left eigenvectors, normalized such that $\inner{e, \epsilon} = 1$. In this case, we know $\psi^*$ is the dominant left eigenvector and $\1$ is the dominant right eigenvector. Moreover, $\psi^* \in \dD(\Xsf)$ yields $\inner{\psi^*, \1}=1$.
Hence, for any $\psi \in \dD(\Xsf)$, we have $$ \psi P^t \to \psi \1 \, \psi^* = \psi^* \quad \text{as} \quad t \to \infty. $$ Hence global stability holds, as claimed. ``` ```{exercise} :label: ex-mcs-auto-8 Fix $\alpha=0.3$ and $\beta=0.2$. Compute the sequence $(\psi P^t)$ for different choices of $\psi$ and confirm that your results are consistent with the claim that $\psi P^t \to \psi^*$ as $t \to \infty$ for any $\psi \in \dD(\Xsf)$. ``` ```{exercise} :label: ex-mcs-auto-9 Since $P$ is irreducible, ergodicity property {eq}`eq-erglln` holds. Simulate a long realization of a $P$-Markov chain from an arbitrary initial condition and confirm that your results are consistent with {eq}`eq-erglln`. ``` (ss-apta)= ### Approximation To simplify numerical calculations, we sometimes approximate a continuous-state Markov process with a Markov chain. For example, consider a **linear Gaussian AR(1)** model, where $(X_t)_{t \geq 0}$ evolves in $\RR$ according to $$ X_{t+1} = \rho X_t + b + \nu \epsilon_{t+1}, \quad |\rho|<1, \quad ( \epsilon_t ) \iidsim N(0, 1). $$ (eq-arrepeat) The model {eq}`eq-arrepeat` has a unique **stationary distribution** $\psi^*$ given by $$ \psi^* = N(\mu_x, \sigma_x^2) \quad \text{with} \quad \mu_x \coloneq \frac{b}{1-\rho} \quad \text{and} \quad \sigma_x^2 \coloneq \frac{\nu^2}{1-\rho^2}. $$ This means that $$ \text{ } X_t \eqdist \psi^* \text{ and } X_{t+1} = \rho X_t + b + \nu \epsilon_{t+1} \text{ implies } X_{t+1} \eqdist \psi^* \text{. } $$ ```{exercise} :label: ex-mcs-auto-10 Suppose that $X_t \eqdist \psi^*$, $\epsilon_{t+1} \eqdist N(0, 1)$ and $X_t$ and $\epsilon_{t+1}$ are independent. Prove that $\rho X_t + b + \nu \epsilon_{t+1}$ has distribution $\psi^*$. Is this still true if we drop the independence assumption? 
``` Process {eq}`eq-arrepeat` is also ergodic in a similar sense to {eq}`eq-erglln`: On average, realizations of the process spend most of their time in regions of the state where the stationary distribution puts high probability mass. (You can check this via simulations if you wish.) Hence, in the discretization that follows, we shall put the discrete state space in this area. ```{exercise} :label: ex-pfnr Set $b=0$ in {eq}`eq-arrepeat` and let $F$ be the CDF of $N(0, \nu^2)$. Show that $$ \PP\{t - \delta < X_{t+1} \leq t + \delta \given X_t = x\} = F(t - \rho x + \delta) - F(t - \rho x - \delta), $$ for all $\delta, t \in \RR$. ``` ```{solution} ex-pfnr Since we are conditioning on $X_t =x$, we can replace $X_{t+1}$ with $\rho x + \nu \epsilon_{t+1}$. The result then follows from $\PP\{\alpha < \nu \epsilon_{t+1} \leq \beta\} = F(\beta)-F(\alpha)$. ``` To discretize {eq}`eq-arrepeat` we use **Tauchen's method**, starting with the case $b=0$.[^1] As a first step, we choose $n$ as the number of states for the discrete approximation and $m$ as an integer that sets the width of the state space. Then we create a state space $\Xsf \coloneq \{x_1, \ldots, x_n\} \subset \mathbb R$ as an equispaced grid that brackets the stationary mean on both sides by $m$ standard deviations: - set $x_1 = -m \, \sigma_x$, - set $x_n = m \, \sigma_x$ and - set $x_{i+1} = x_i + s$ where $s = (x_n - x_1) / (n - 1)$ and $i$ in $\natset{n-1}$. The next step is to create an $n \times n$ matrix $P$ that approximates the dynamics in {eq}`eq-arrepeat`. For $i, j \in \natset{n}$, 1. if $j = 1$, then set $P(x_i, x_j) = F(x_1-\rho x_i + s/2)$. 2. If $j = n$, then set $P(x_i, x_j) = 1 - F(x_n - \rho x_i - s/2)$. 3. Otherwise, set $P(x_i, x_j) = F(x_j - \rho x_i + s/2) - F(x_j - \rho x_i - s/2)$. The first two are boundary rules and the third applies {prf:ref}`ex-pfnr`. ```{exercise} :label: ex-mcs-auto-11 Prove that $\sum_{j=1}^n P(x_i, x_j)=1$ for all $i \in \natset{n}$. 
``` Finally, if $b \not= 0$, then we shift the state space to center it on the mean $\mu_x$ of the stationary distribution $N(\mu_x, \sigma_x^2)$. This is done by replacing $x_i$ with $x_i + \mu_x$ for each $i$. Julia routines that compute $\Xsf$ and $P$ can be found in the library [QuantEcon.jl](https://github.com/QuantEcon/QuantEcon.jl). Figure {numref}`f-tauchen_1` compares the continuous stationary distribution $\psi^*$ and the unique stationary distribution of the discrete approximation when $\Xsf$ and $P$ are constructed using Tauchen's method with $\rho=0.9$, $b=0.0$, $\nu=1.0$ and discretization parameters $n=15$ and $m=3$. ```{figure} ../figures/tauchen_1.pdf :name: f-tauchen_1 Comparison of $\psi^* = N(\mu_x, \sigma_x^2)$ and its discrete approximant ``` (s-condexp)= ## Conditional Expectations In this section, we discuss how to compute conditional expectations for Markov chains. The theory will be essential for the study of finite Markov decision processes, since, in these models, lifetime rewards are mathematical expectations of flow reward functions of Markov states. ### Mathematical Expectations We begin with mathematical expectations of functions of Markov states. (sss-ceo)= #### Conditional Expectations Fix $P \in \mopx$. For each $h \in \RR^\Xsf$, we define $$ (P h)(x) = \sum_{x' \in \Xsf} h(x') P(x,x') \qquad (x \in \Xsf). $$ (eq-piactrc) Noting that $P(x, \cdot)$ is the distribution of $X_{t+1}$ given $X_t = x$, we can write $$ (P h)(x) = \EE [h(X_{t+1}) \given X_t = x], $$ (eq-expin) where $(X_t)$ is any $P$-Markov chain on $\Xsf$. In terms of matrix algebra, viewing $h$ as an $n \times 1$ column vector, the expression $(Ph)(x)$ is one element of the vector $Ph$ obtained by premultiplying $h$ by $P$. The interpretation in {eq}`eq-expin` extends to powers of $P$. In particular, we have $$ (P^k h)(x) = \sum_{x'} h(x') P^k(x,x') = \EE [h(X_{t+k}) \given X_t = x]. $$ (eq-exp_k) ```{exercise} :label: ex-bpropm Show that 1.
Every constant function $h \in \RR^\Xsf$ is a fixed point of $P$ (i.e., $P h = h$). 2. $\max_x |Ph(x)| \leq \max_x |h(x)|$ for all $h \in \RR^\Xsf$. ``` (sss-lawie)= #### The Law of Iterated Expectations The **law of iterated expectations** is a workhorse in economics and finance. One version of the law states that if $X$ and $Y$ are two random variables, then $\EE[ \EE[Y \given X] ] = \EE[Y]$. Let's show how this law operates for Markov chains. Let $(X_t)$ be $P$-Markov with $X_0 \eqdist \psi_0$. Fix $t, k \in \NN$. Set $\EE_t \coloneq \EE [ \cdot \given X_t]$. We claim that $$ \EE [ \EE_t [h(X_{t+k})] ] = \EE [ h(X_{t+k}) ] \quad \text{for any } h \in \RR^\Xsf. $$ (eq-lie) To see that this holds, recall that $\EE [h(X_{t+k}) \given X_t = x] = (P^k h)(x)$. Hence $\EE [h(X_{t+k}) \given X_t] = (P^k h)(X_t)$. Therefore, $$ \EE [ \EE_t [h(X_{t+k})] ] = \EE [ (P^k h)(X_t) ] = \sum_{x'} (P^k h)(x') \psi_t (x') = \sum_{x'} (P^k h)(x') (\psi_0 P^t) (x'). $$ Since $\psi_0 P^t$ is a row vector, we can write the last expression as $$ \psi_0 P^t P^k h = \psi_0 P^{t+k} h = \psi_{t+k} h = \EE h(X_{t+k}). $$ Hence {eq}`eq-lie` holds. (sss-mmcs)= #### Monotone Markov Chains (ss-mmc)= Next, we connect Markov chains to order theory via stochastic dominance. These connections will have applications later in the book. Let $\Xsf$ be a finite set partially ordered by $\preceq$. A Markov operator $P \in \mopx$ is called **monotone increasing** if $$ x, y \in \Xsf \text{ and } x \preceq y \quad \implies \quad P(x, \cdot) \lefsd P(y, \cdot). $$ Thus, $P$ is monotone increasing if shifting up the current state shifts up the next-period state, in the sense that its distribution increases in the stochastic dominance ordering (see {ref}`sss-sdfin`) on $\dD(\Xsf)$. Below, we will see that monotonicity of Markov operators is closely related to monotonicity of value functions in dynamic programming. Monotonicity of Markov operators is related to positive autocorrelation. 
To illustrate the idea, consider the AR(1) model $X_{t+1} = \rho X_t + \sigma \epsilon_{t+1}$ from {ref}`ss-apta` and suppose we apply Tauchen discretization, mapping the parameters $\rho , \sigma$ and a discretization size $n$ into a Markov operator $P$ on state space $\Xsf = \{x_1, \ldots, x_n\} \subset \RR$, totally ordered by $\leq$. If $\rho \geq 0$, so that positive autocorrelation holds, then $P$ is monotone increasing. ```{exercise} :label: ex-tami Verify this claim. ``` ```{solution} ex-tami Using {prf:ref}`ex-pfnr` and the definition of $P$, it can be shown that $$ G(x, x_k) \coloneq \sum_{j=k}^n P(x, x_j) = \PP\{x_k - s/2 < X_{t+1} \given X_t = x\}. $$ Rewriting the probability in terms of $\epsilon_{t+1}$, we get $$ G(x, x_k) = \PP\{\epsilon_{t+1} > (x_k - s/2 - \rho x) / \sigma \}. $$ Since $\rho \geq 0$, we can now see that $x \leq y$ implies $G(x, x_k) \leq G(y, x_k)$ for all $k$, or, equivalently, $G(x, \cdot) \leq G(y, \cdot)$ pointwise on $\Xsf$. By {prf:ref}`l-eqfst`, this is equivalent to the statement that $P(x, \cdot) \lefsd P(y, \cdot)$, which confirms that $P$ is monotone increasing. ``` ```{exercise} :label: ex-mcs-auto-12 In {ref}`sss-eue` we discussed a Markov chain $$ \Xsf = \{1, 2\} \quad \text{and} \quad P_w = \begin{pmatrix} 1-\alpha & \alpha \\ \beta & 1-\beta \end{pmatrix}, $$ for some $\alpha,\beta \in [0,1]$. Show that $P_w$ is monotone increasing if and only if $\alpha + \beta \leq 1$. ``` ```{solution} ex-mcs-auto-12 This matrix $P_w$ is monotone increasing if and only if $(1-\alpha, \alpha) \lefsd (\beta, 1-\beta)$. From {prf:ref}`ex-fsss`, we know that this is equivalent to $\beta \leq 1-\alpha$, or $\beta + \alpha \leq 1$. ``` ```{exercise} :label: ex-miif Prove that $P$ is monotone increasing if and only if $P$ is invariant on $i\RR^\Xsf$; that is, if $h \in i\RR^\Xsf$ implies $Ph \in i\RR^\Xsf$. ``` ```{solution} ex-miif Suppose that $P$ is monotone increasing and fix $h \in i\RR^\Xsf$. We claim that $Ph \in i\RR^\Xsf$. 
To see this, pick any $x, y \in \Xsf$ with $x \preceq y$. Since $x \preceq y$ we have $P(x, \cdot) \lefsd P(y, \cdot)$. Hence $\sum_{x'} h(x') P(x, x') \leq \sum_{x'} h(x')P(y, x')$. This shows that $Ph \in i\RR^\Xsf$. To see the converse, suppose that $P$ is invariant on $i\RR^\Xsf$. Fix $x, y \in \Xsf$ with $x \preceq y$. We claim that $P(x, \cdot) \lefsd P(y, \cdot)$. To see this, fix $u \in i\RR^\Xsf$. By invariance, $Pu \in i\RR^\Xsf$, so $(Pu)(x) \leq (Pu)(y)$ and hence $\sum_{x'} u(x') P(x, x') \leq \sum_{x'} u(x')P(y, x')$. Since $u$ was chosen arbitrarily from $i\RR^\Xsf$, we have $P(x, \cdot) \lefsd P(y, \cdot)$. Hence $P$ is monotone increasing, as was to be shown. ``` ```{exercise} :label: ex-pmptm Prove: If $P$ is monotone increasing then so is $P^t$ for all $t \in \NN$. ``` ```{solution} ex-pmptm Clearly, this is true for $t=1$. Suppose it is also true for some $t \in \NN$. Then, for any $h \in i\RR^\Xsf$, the function $P^t h$ is again in $i\RR^\Xsf$. From this it follows that $P^{t+1} h = P P^t h$ is also in $i\RR^\Xsf$, since $P$ is monotone increasing. This proves that $P^{t+1}$ is invariant on $i\RR^\Xsf$, and therefore monotone increasing. ``` (ss-fdrs)= ### Geometric Sums Dynamic programs often form a lifetime value $V_0$ as a geometric sum of a reward sequence $(R_t)_{t \geq 0}$ with constant discount factor, so that $V_0 = \EE \sum_{t = 0}^\infty \beta^t R_t$ for some $\beta > 0$. We saw this in {eq}`eq-dpnpv`, where we aggregated a profit stream $(\pi_t)_{t \geq 0}$ into an expected present value of the firm, and again in {eq}`eq-earnings`, where a worker evaluates lifetime earnings. In this section, we study expectations of geometric sums.
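As a brief numerical aside on the monotonicity results in {ref}`sss-mmcs`, the following sketch tests whether a Markov matrix is monotone increasing when the state space is totally ordered. It relies on the tail-sum characterization of stochastic dominance used in the solution to {prf:ref}`ex-tami`. The function names `is_monotone_increasing` and `laborer_matrix` and the parameter values are our own illustrative choices and are not part of the companion code.

```julia
# Sketch: check whether a Markov matrix P is monotone increasing when the
# states 1, ..., n are totally ordered by ≤. Helper names are hypothetical.

"Return true if every upper-tail sum Σ_{j ≥ k} P[i, j] is nondecreasing in i."
function is_monotone_increasing(P; tol=1e-12)
    n = size(P, 1)
    # G[i, k] = Σ_{j ≥ k} P[i, j], the tail probabilities of row i
    G = reverse(cumsum(reverse(P, dims=2), dims=2), dims=2)
    return all(G[i, k] <= G[i+1, k] + tol for i in 1:n-1, k in 1:n)
end

"Transition matrix of the day laborer model."
laborer_matrix(α, β) = [1-α α; β 1-β]

P_low  = laborer_matrix(0.3, 0.2)   # α + β ≤ 1
P_high = laborer_matrix(0.8, 0.7)   # α + β > 1

println(is_monotone_increasing(P_low))     # monotone increasing
println(is_monotone_increasing(P_high))    # not monotone increasing
println(is_monotone_increasing(P_low^5))   # powers of P_low stay monotone
```

The first two checks are consistent with the condition $\alpha + \beta \leq 1$ for the day laborer matrix, and the third with {prf:ref}`ex-pmptm`.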
#### Theory Consider a conditional mathematical expectation of a discounted sum of future measurements: $$ v(x) \coloneq \EE_x \, \sum_{t=0}^\infty \beta^t h(X_t) \coloneq \EE \left[ \, \sum_{t=0}^\infty \beta^t h(X_t) \given X_0 = x \right], $$ (eq-exp_gs) for some constant $\beta \in \RR_+$ and $h \in \RR^\Xsf$. Here - $(X_t)$ is $P$-Markov on some finite set $\Xsf$, - $v(x)$ is a **lifetime reward** starting from state $x$, and - $\EE_x$ indicates that we are conditioning on $X_0 =x$. With $I$ as the identity matrix, the next result describes $v$ as a function of $\beta$, $P$ and $h$. ```{prf:lemma} :label: l-fgsd If $\beta < 1$, then $I-\beta P$ is invertible and $$ v = \sum_{t=0}^\infty (\beta P)^t h = (I - \beta P)^{-1} h. $$ (eq-exp_gs2) ``` ```{prf:proof} Under the stated conditions $$ \EE_x \, \sum_{t=0}^\infty \beta^t h(X_t) = \sum_{t=0}^\infty \beta^t \EE_x h(X_t) = \sum_{t=0}^\infty \beta^t (P^t h)(x), $$ (eq-pfgsd) where the first equality in {eq}`eq-pfgsd` uses linearity of expectations and the second follows from {eq}`eq-exp_k` and the assumption that $(X_t)$ is $P$-Markov starting at $x$.[^2] Applying the Neumann series lemma to the matrix $\beta P$, we see that $\sum_{t=0}^\infty (\beta P)^t = (I - \beta P)^{-1}$. The lemma applies because $\rho(\beta P) = \beta \rho(P) = \beta < 1$, as follows from {prf:ref}`ex-sm_sr1`. ◻ ``` (sss-fvfi)= #### Application: Valuation of Firms Consider a firm that receives random profit stream $(\pi_t)_{t \geq 0}$. Suppose that the value of the firm equals the expected present value of its profit stream. Suppose for now that the interest rate is constant at $r > 0$. With $\beta \coloneq 1/(1+r)$, total valuation is $$ V_0 = \EE \sum_{t=0}^\infty \beta^t \pi_t. $$ (eq-vfcr) To compute this value, we need to know how profits evolve. A common strategy is to set $\pi_t = \pi(X_t)$ for some fixed $\pi \in \RR^\Xsf$, where $(X_t)_{t \geq 0}$ is a state process.
For known dynamics of $(X_t)$ and function $\pi$, the value $V_0$ in {eq}`eq-vfcr` can be computed. Here we assume that $(X_t)$ is $P$-Markov for $P \in \mopx$ with finite $\Xsf$. Then conditioning on $X_0 = x$, we can write the value as $$ v(x) \coloneq \EE_x \sum_{t=0}^\infty \beta^t \pi_t \coloneq \EE \left[ \sum_{t=0}^\infty \beta^t \pi_t \given X_0 = x \right]. $$ By {prf:ref}`l-fgsd`, the value $v(x)$ is finite and the function $v \in \RR^\Xsf$ can be obtained by $$ v = \sum_{t = 0}^\infty \beta^t P^t \pi = (I - \beta P)^{-1} \pi. $$ It is plausible that the value of the firm will be higher for a return process in which higher states generate higher profits and predict higher future states. The next exercise confirms this. ```{exercise} :label: ex-mcs-auto-13 Let $\Xsf$ be partially ordered and suppose that $\pi \in i\RR^\Xsf$ and that $P$ is monotone increasing. (See {ref}`ss-mmc` for terminology and notation.) Prove that, under these conditions, $v$ is increasing on $\Xsf$. ``` ```{solution} ex-mcs-auto-13 Let $\pi$ and $P$ satisfy the stated conditions. By {prf:ref}`ex-pmptm`, $P^t$ is monotone increasing for all $t$. By this fact and the assumption $\pi \in i\RR^\Xsf$, we see that $P^t \pi \in i\RR^\Xsf$ for all $t$. Hence $v = \sum_{t \geq 0} \beta^t P^t \pi$ is also increasing. ``` (sss-nccas)= #### Application: Valuing Consumption Streams To model consumption-saving choices we want to evaluate different consumption paths, where a **consumption path** is a nonnegative random sequence $(C_t)_{t \geq 0}$. In what follows we consider consumption paths such that $C_t = c(X_t)$ for all $t \geq 0$, where $c \in \RR_+^\Xsf$ and $(X_t)_{t \geq 0}$ is $P$-Markov on finite set $\Xsf$. Thus, consumption streams are time-invariant functions of a finite-state Markov chain. 
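The identity $v = \sum_{t \geq 0} (\beta P)^t h = (I - \beta P)^{-1} h$ from {prf:ref}`l-fgsd` is easy to check numerically. The sketch below solves the linear system $(I - \beta P) v = h$ directly and compares the result with a truncated version of the series. The function names, the two-state matrix, the reward vector and the truncation point are illustrative assumptions rather than part of the companion code.

```julia
using LinearAlgebra

"Compute v = (I - βP)⁻¹ h by solving the linear system (I - βP) v = h."
function lifetime_value(P, h, β)
    return (I - β * P) \ h
end

"Approximate v by the truncated series Σ_{t=0}^{T} (βP)^t h."
function truncated_value(P, h, β; T=2_000)
    v = zero(h)
    term = copy(h)              # holds (βP)^t h at step t
    for t in 0:T
        v += term
        term = β * (P * term)
    end
    return v
end

# Illustrative two-state example (values chosen for this sketch only)
P = [0.7 0.3;
     0.2 0.8]
h = [0.0, 1.0]
β = 0.95

v_exact  = lifetime_value(P, h, β)
v_series = truncated_value(P, h, β)

println(v_exact)
println(maximum(abs.(v_exact - v_series)))   # small when T is large
```

Solving the linear system with `\` avoids forming the inverse explicitly, which is usually preferable numerically. Since this $P$ is monotone increasing and $h$ is increasing, the computed $v$ should also be increasing, in line with {prf:ref}`ex-mcs-auto-13`.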
In a standard "time additive" model of consumer preferences with constant geometric discounting, the time zero value of a consumption stream $(C_t)_{t \geq 0}$, given current state $X_0 = x \in \Xsf$, is $$ v(x) = \EE_x \sum_{t=0}^\infty \beta^t u(C_t), $$ (eq-cvalas) where $\beta \in (0,1)$ is a discount factor and $u \colon \RR_+ \to \RR$ is called the **flow utility function**. Dependence of $v(x)$ on $x$ comes from the initial condition $X_0 = x$ influencing the Markov state process and, therefore, the consumption path. Using $C_t = c(X_t)$ and defining $r \coloneq u \circ c$ we can write $v(x) = \EE_x \, \sum_{t \geq 0} \beta^t r(X_t)$. By {prf:ref}`l-fgsd`, this sum is finite and $v$ can be expressed as $$ v = (I - \beta P)^{-1} r. $$ (eq-vfucq) Figure {numref}`f-val_consumption_1` shows an example when $u$ has the constant relative risk aversion (**CRRA**) specification $$ u(c)=\frac{c^{1-\gamma}}{1-\gamma} \qquad (c \geq 0, \; \gamma > 0), $$ (eq-crra) while $c(x) = \exp(x)$, so that consumption takes the form $C_t = \exp(X_t)$, and $(X_t)_{t \geq 0}$ is a Tauchen discretization (see {ref}`ss-apta`) of $X_{t+1} = \rho X_t + \nu W_{t+1}$ where $(W_t)_{t \geq 1}$ is iid and standard normal. Parameters are $n=25$, $\beta=0.98$, $\rho=0.96$, $\nu=0.05$ and $\gamma = 2$. We set $r = u \circ c$ and solved for $v$ via {eq}`eq-vfucq`. ```{figure} ../figures/val_consumption_1.pdf :name: f-val_consumption_1 The value of $(C_t)_{t\geq 0}$ given $X_t = x$ ``` ```{exercise} :label: ex-mcs-auto-14 Replicate Figure {numref}`f-val_consumption_1`. ``` ```{exercise} :label: ex-mcs-auto-15 The value function in Figure {numref}`f-val_consumption_1` appears to be increasing in the state $x$. Prove this for the CRRA model when $\rho \geq 0$. ``` ```{solution} ex-mcs-auto-15 Both $u$ and $\exp$ are increasing on $\Xsf$, so $r$ is in $i\RR^\Xsf$. Since $\rho \geq 0$, $P$ is monotone increasing (see {ref}`sss-mmcs`). Clearly, $\beta P$ shares this property. 
It follows that $\beta Pr \in i\RR^\Xsf$. Applying $\beta P$ again, we have $(\beta P)^2 r \in i\RR^\Xsf$. Continuing in this way, we see that $(\beta P)^k r$ is increasing for all $k$. Hence $\sum_{k \geq 0} (\beta P)^k r$ is increasing. By the Neumann series lemma, this sum is equal to $v$, so $v \in i\RR^\Xsf$. ``` (s-jsr)= ## Job Search Revisited In this section, we extend the job search problem studied in {ref}`ss-js` to a setting with Markov wage offers. We discuss additional structure when the Markov operator for wage offers is monotone increasing. We will also allow job separations to occur. (ss-jsms)= ### Job Search with Markov State We adopt the job search setting of {ref}`ss-js` but assume now that the wage process $(W_t)$ is $P$-Markov on $\Wsf \subset \RR_+$, where $P \in \mopw$ and $\Wsf$ is finite. #### Value Function Iteration The **value function** $v^*$ for the Markov job search model is now defined as follows: $v^*(w)$ is the maximum lifetime value that can be obtained when the worker is unemployed with current wage offer $w$ in hand. The value function $v^*$ satisfies the Bellman equation $$ v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*(w') P(w, w') \right\} \qquad (w \in \Wsf). $$ (eq-jsbellc) We continue to assume that $c > 0$ and $\beta \in (0,1)$. Bellman equation {eq}`eq-jsbellc` extends a corresponding Bellman equation for the iid case (cf. {eq}`eq-jsbell`). (A full proof is given in {prf:ref}`c-opt_stop`.) The Bellman operator corresponding to {eq}`eq-jsbellc` is $$ (Tv)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w'} \, v(w') P(w, w') \right\} \qquad (w \in \Wsf). $$ As before, $T$ is constructed so that $v^*$ is a fixed point (since {eq}`eq-jsbellc` holds). {prf:ref}`ex-jsmtc` will show that $v^*$ is the only fixed point of $T$ in $\RR^\Wsf_+$. Extending the iid definition (cf.
{eq}`eq-jsvg`), a policy $\sigma \colon \Wsf \to \{0,1\}$ is called **$v$-greedy** if $$ \sigma(w) = \1\left\{ \frac{w}{1-\beta} \geq c + \beta \, \sum_{w'} \, v(w') P(w, w') \right\}, $$ for all $w \in \Wsf$. Let $\vV \coloneq \RR^\Wsf_+$ and endow $\vV$ with the pointwise partial order $\leq$ and the supremum norm, so that $\| f - g\|_\infty = \max_{w \in \Wsf}|f(w) - g(w)|$. ```{exercise} :label: ex-jsmtc Prove that 1. $T$ is an order-preserving self-map on $\vV$. 2. $T$ is a contraction of modulus $\beta$ on $\vV$. ``` ```{solution} ex-jsmtc We start with part 1. To show that $T$ is a self-map on $\vV \coloneq \RR^\Wsf_+$, we just need to verify that $v \in \vV$ implies $Tv \in \vV$, which only requires us to verify that $T$ maps nonnegative functions into nonnegative functions. This is clear from the definition. Regarding the order-preserving property, fix $f, g \in \vV$ with $f \leq g$. We claim that $Tf \leq Tg$. Indeed, if $w \in \Wsf$, then $\sum_{w' \in \Wsf} \, f(w') P(w, w') \leq \sum_{w' \in \Wsf} \, g(w') P(w, w')$, which in turn implies that $(Tf)(w) \leq (Tg)(w)$. Since $w$ was an arbitrary wage value, we have $Tf \leq Tg$, so $T$ is order-preserving. Regarding part 2, let $e(w) \coloneq w/(1-\beta)$ and fix $f, g$ in $\vV$. Writing the operators pointwise and applying the last result in {prf:ref}`l-efine` gives $$ \begin{aligned} |Tf - Tg| & = | e \vee (c + \beta Pf) - e \vee (c + \beta Pg)| \\ & \leq \left| \beta Pf - \beta Pg \right| \\ & = \beta \left| P(f-g) \right| \\ & \leq \beta P \left| f-g \right|. \end{aligned} $$ (Here the last inequality uses the result in {prf:ref}`ex-bmk`.) Since $P \geq 0$ we have $P | f-g | \leq P \|f-g\|_\infty \1 = \|f-g\|_\infty \1$, so $$ |Tf - Tg | \leq \beta \| f - g \|_\infty \1. $$ Taking the maximum on both sides gives $\|Tf-Tg\|_\infty \leq \beta \|f-g\|_\infty$. Since $f, g$ were arbitrary elements of $\vV$, the contraction claim is verified.
``` We recommend that you study the proof of the next lemma, since the same style of argument occurs often in the book. ```{prf:lemma} :label: l-vfmijs $v^*$ is increasing on $(\Wsf, \leq)$ whenever $P$ is monotone increasing. ``` ```{prf:proof} Let $i\vV$ be the increasing functions in $\vV$ and suppose that $P$ is monotone increasing. $T$ is a self-map on $i\vV$ in this setting, since $v \in i\vV$ implies $h(w) \coloneq c + \beta \sum_{w'} \, v(w') P(w, w')$ is in $i\vV$. Hence, for such a $v$, both $h$ and the stopping value function $e(w) \coloneq w/(1-\beta)$ are in $i\vV$. It follows that $Tv = h \vee e$ is in $i\vV$. Since $i\vV$ is a closed subset of $\vV$ and $T$ is a self-map on $i\vV$, the fixed point $v^*$ is in $i\vV$ (see {prf:ref}`ex-cinvfp`). ◻ ``` In view of the contraction property established in {prf:ref}`ex-jsmtc`, we can use value function iteration (i) to compute an approximation $v$ to the value function and (ii) to calculate the $v$-greedy policy that approximates the optimal policy. Code for implementing this procedure is in {numref}`list-markov_js`. The definition of a $v$-greedy policy resembles that of the iid case (see {eq}`eq-jsvg`).

```{code-block} julia
:name: list-markov_js
:caption: Job search with Markov state (`markov_js.jl`)
:linenos:

using QuantEcon, LinearAlgebra
include("s_approx.jl")

"Creates an instance of the job search model with Markov wages."
function create_markov_js_model(;
        n=200,       # wage grid size
        ρ=0.9,       # wage persistence
        ν=0.2,       # wage volatility
        β=0.98,      # discount factor
        c=1.0        # unemployment compensation
    )
    mc = tauchen(n, ρ, ν)
    w_vals, P = exp.(mc.state_values), mc.p
    return (; n, w_vals, P, β, c)
end

"The Bellman operator Tv = max{e, c + β P v} with e(w) = w / (1-β)."
function T(v, model)
    (; n, w_vals, P, β, c) = model
    h = c .+ β * P * v          # continuation value
    e = w_vals ./ (1 - β)       # stopping value
    return max.(e, h)
end

"Get a v-greedy policy."
function get_greedy(v, model)
    (; n, w_vals, P, β, c) = model
    σ = w_vals / (1 - β) .>= c .+ β * P * v
    return σ
end

"Solve the infinite-horizon Markov job search model by VFI."
function vfi(model)
    v_init = zero(model.w_vals)
    v_star = successive_approx(v -> T(v, model), v_init)
    σ_star = get_greedy(v_star, model)
    return v_star, σ_star
end
```

(sss-jscvcm)= #### Continuation Values The continuation value $h^*$ from the iid case is now replaced by a **continuation value function** $$ h^*(w) \coloneq c + \beta \, \sum_{w'} \, v^*(w') P(w, w') \qquad (w \in \Wsf). $$ The continuation value depends on $w$ because the current offer helps predict the offer next period, which in turn affects the value of continuing. The functions $w \mapsto w / (1-\beta)$, $h^*$ and $v^*$ corresponding to the default model in {numref}`list-markov_js` are shown in Figure {numref}`f-markov_js_1`. ```{figure} ../figures/markov_js_1.pdf :name: f-markov_js_1 Value, stopping, and continuation for Markov job search ``` ```{exercise} :label: ex-mcs-auto-16 Explain why the continuation value function is increasing in Figure {numref}`f-markov_js_1`. If possible, provide a mathematical and economic explanation. ``` ```{solution} ex-mcs-auto-16 The code in {numref}`list-markov_js` creates a Markov chain via Tauchen approximation of an AR(1) process with positive autocorrelation parameter. By {prf:ref}`ex-tami`, $P$ is monotone increasing. Hence, by {prf:ref}`l-vfmijs`, the value function is increasing. Since $h^* = c + \beta P v^*$, it follows that $h^*$ is increasing. Regarding intuition, positive autocorrelation in wages means that high current wages predict high future wages. It follows that the value of waiting for future wages rises with current wages. ``` ```{exercise} :label: ex-mcs-auto-17 Using the Bellman equation {eq}`eq-jsbellc`, show that $h^*$ obeys $$ h^*(w) = c + \beta \, \sum_{w'} \, \max \left\{ \frac{w'}{1-\beta}, \, h^*(w') \right\} P(w, w') \qquad (w \in \Wsf).
$$ ``` ```{exercise} :label: ex-jscvq Let $Q$ be the operator on $\vV$ defined at $h \in \vV$ by $$ (Qh)(w) \coloneq c + \beta \, \sum_{w'} \, \max \left\{ \frac{w'}{1-\beta}, \, h(w') \right\} P(w, w') \qquad (w \in \Wsf). $$ Prove that $Q$ is (a) an order-preserving self-map on $\vV$ and (b) a contraction of modulus $\beta$ on $\vV$ under the supremum norm. ``` {prf:ref}`ex-jscvq` suggests an alternative way to solve the job search problem: iterate with $Q$ to obtain the continuation value function $h^*$ and then use the policy $$ \sigma^*(w) = \1\left\{ \frac{w}{1-\beta} \geq h^*(w) \right\} \qquad (w \in \Wsf) $$ that tells the worker to accept when the current stopping value exceeds the current continuation value. We saw that, in the iid case, a computational strategy based on continuation values is far more efficient than value function iteration (see {ref}`ss-crwd`). Since continuation values are functions rather than scalars, here the two approaches (iterating with $T$ vs iterating with $Q$) are more similar. In {prf:ref}`c-opt_stop` we discuss alternative computational strategies in more detail, seeking conditions under which one approach will be more efficient than the other. (ss-jsws)= ### Job Search with Separation We now modify the job search problem discussed in {ref}`ss-jsms` by adding separations. In particular, an existing match between worker and firm terminates with probability $\alpha$ every period. (This is an extension because setting $\alpha=0$ recovers the permanent job scenario from {ref}`ss-jsms`.) The worker now views the loss of a job as a capital loss and a spell of unemployment as an investment. In what follows, the wage process and discount factor are unchanged from {ref}`ss-jsms`. As before, $\vV \coloneq \RR^\Wsf_+$ is endowed with the supremum norm. 
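The continuation-value approach suggested by {prf:ref}`ex-jscvq` for the model without separation can be sketched as follows. We iterate with $Q$ from $h \equiv 0$ until successive iterates are close in the supremum norm, and then read off the policy by comparing stopping and continuation values. To keep the sketch self-contained, we use a small hand-built wage grid and transition matrix rather than a Tauchen discretization. The function names, the grid, the matrix, the tolerance and the parameter values are all illustrative assumptions.

```julia
"Apply Q: (Qh)(w) = c + β Σ_{w'} max{w'/(1-β), h(w')} P(w, w')."
function Q(h, model)
    (; w_vals, P, β, c) = model
    e = w_vals ./ (1 - β)                 # stopping values
    return c .+ β * (P * max.(e, h))
end

"Iterate with Q until successive iterates are within tol in the sup norm."
function solve_continuation_values(model; tol=1e-8, max_iter=10_000)
    h = zero(model.w_vals)
    for _ in 1:max_iter
        h_new = Q(h, model)
        err = maximum(abs.(h_new - h))
        h = h_new
        err < tol && break
    end
    return h
end

# A small illustrative model (all numbers are assumptions for this sketch)
w_vals = [0.5, 1.0, 1.5, 2.0]
P = [0.6 0.4 0.0 0.0;
     0.2 0.6 0.2 0.0;
     0.0 0.2 0.6 0.2;
     0.0 0.0 0.4 0.6]
model = (; w_vals, P, β=0.9, c=1.0)

h_star = solve_continuation_values(model)
σ_star = w_vals ./ (1 - model.β) .>= h_star   # accept when stopping value ≥ h*

println(h_star)
println(σ_star)
```

Because $Q$ is a contraction of modulus $\beta$, the loop converges geometrically from any starting point in $\vV$. The matrix $P$ used here is monotone increasing, so the computed continuation values should be increasing in the wage.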
The value function $v^*_u$ for an unemployed worker satisfies the recursion $$ v^*_u(w) = \max \left\{ v^*_e(w) ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*_u(w') P(w, w') \right\} \qquad (w \in \Wsf), $$ (eq-jsvu) where $v^*_e$ is the value function for an employed worker, that is, the lifetime value of a worker who starts the period employed at wage $w$. Value function $v^*_e$ satisfies $$ v^*_e(w) = w + \beta \left[ \alpha \sum_{w'} v^*_u(w') P(w, w') + (1-\alpha) v^*_e(w) \right] \qquad (w \in \Wsf). $$ (eq-jsve) This equation states that value accruing to an employed worker is current wage plus the discounted expected value of being either employed or unemployed next period. We claim that, when $0 < \alpha, \beta < 1$, the system {eq}`eq-jsvu`--{eq}`eq-jsve` has a unique solution $(v_e^*, v_u^*)$ in $\vV \times \vV$. To show this we first solve {eq}`eq-jsve` in terms of $v^*_e(w)$ to obtain $$ v^*_e(w) = \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P v^*_u)(w) \right). $$ (eq-veee) (Recall $(Ph)(w) \coloneq \sum_{w'} h(w') P(w, w')$ for $h \in \RR^\Wsf$.) Substituting into {eq}`eq-jsvu` yields $$ v^*_u(w) = \max \left\{ \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P v^*_u)(w) \right) ,\, c + \beta \, (P v^*_u)(w) \right\}. $$ (eq-jsvut) ```{exercise} :label: ex-mcs-auto-18 Prove that there exists a unique $v^*_u \in \vV$ that solves {eq}`eq-jsvut`. Propose a convergent method for computing both $v^*_u$ and $v^*_e$. (Hint: See {prf:ref}`l-supbc`.) ``` ```{solution} ex-mcs-auto-18 Let $T$ be the operator on $\vV$ such that $(Tv_u)(w)$ is the right-hand side of {eq}`eq-jsvut`. To solve the exercise, it suffices to prove that $T$ is a contraction map on $\vV$. (Then $v_u$ can be obtained, in the limit, by applying successive approximation to $T$ and, once the approximate fixed point is computed, $v_e$ can be obtained via {eq}`eq-veee`.) 
To show that $T$ is a contraction, we let $T_1$ and $T_2$ be the operators on $\vV$ defined by $$ (T_1 v)(w) = \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P v)(w) \right) \quad \text{and} \quad (T_2 v)(w) = c + \beta \, (P v)(w) . $$ Since $Tv = (T_1 v) \vee (T_2 v)$, {prf:ref}`l-supbc` tells us that $T$ will be a contraction provided that $T_1$ and $T_2$ are both contraction maps. For the case of $T_2$, we have $$ \| T_2 f - T_2 g\|_\infty = \max_w |c + \beta \, (P f)(w) - c - \beta \, (P g)(w)| \leq \max_w \beta \sum_{w'} | f(w') - g(w')| P(w, w'). $$ The last term is dominated by $\beta \| f - g \|_\infty$, so $T_2$ is a contraction. The proof for $T_1$ is similar in spirit and hence left to the reader. ``` Figure {numref}`f-markov_js_with_sep_1` shows the value function $v_u^*$ for an unemployed worker, which is the fixed point of {eq}`eq-jsvut`, as well as the stopping and continuation values, which are given by $$ s^*(w) \coloneq \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P v^*_u)(w) \right) \quad \text{and} \quad h^*(w) \coloneq c + \beta \, (P v^*_u)(w) $$ respectively, for each $w \in \Wsf$. Parameters are as in {numref}`list-markov_js_with_sep`. The value function $v^*_u$ is the pointwise maximum (i.e., $v^*_u = s^* \vee h^*$). The worker's optimal policy while unemployed is $$ \sigma^*(w) \coloneq \1\{s^*(w) \geq h^*(w)\}. $$ As before, the smallest $w$ such that $\sigma^*(w) = 1$ is called the **reservation wage**.

```{code-block} julia
:name: list-markov_js_with_sep
:caption: Job search with separation model (`markov_js_with_sep.jl`)
:linenos:

using QuantEcon, LinearAlgebra

"Creates an instance of the job search model with separation."
function create_js_with_sep_model(;
        n=200,            # wage grid size
        ρ=0.9, ν=0.2,     # wage persistence and volatility
        β=0.98, α=0.1,    # discount factor and separation rate
        c=1.0)            # unemployment compensation
    mc = tauchen(n, ρ, ν)
    w_vals, P = exp.(mc.state_values), mc.p
    return (; n, w_vals, P, β, c, α)
end
```

```{figure} ../figures/markov_js_with_sep_1.pdf :name: f-markov_js_with_sep_1 Value function with job separation ``` Figure {numref}`f-markov_js_with_sep_2` shows how the reservation wage changes with $\alpha$. To produce this figure we solved the model for the reservation wage at 10 values of $\alpha$ in an evenly spaced grid ranging from $0$ to $1$. The reservation wage falls with $\alpha$, since time spent unemployed is a capital investment in better wages, and the value of this investment declines as the separation rate rises. ```{figure} ../figures/markov_js_with_sep_2.pdf :name: f-markov_js_with_sep_2 Reservation wage versus separation rate ``` ```{exercise} :label: ex-mcs-auto-19 Replicate Figure {numref}`f-markov_js_with_sep_2`. ``` (s-cn_mcs)= ## Chapter Notes Many good textbooks on Markov chains exist, including {cite:t}`norris1998markov`, {cite:t}`haggstrom2002finite`, and {cite:t}`privault2013understanding`. {cite:t}`sargent2022economic` provides a relatively comprehensive treatment from a network perspective that is a natural one for Markov chains. Other economic applications are discussed in {cite:t}`stokey1989recursive` and {cite:t}`ljungqvist2012recursive`. {cite:t}`meyer2000matrix` gives a detailed account of the theory of nonnegative matrices. Another useful reference is {cite:t}`horn2012matrix`. A systematic study of monotone Markov chains was initiated by {cite:t}`daley1968stochastically`. Monotone Markov methods have many important applications in economics.
See, for example, {cite:t}`hopenhayn1992stochastic`, {cite:t}`kamihigashi2014stochastic`, {cite:t}`jaskiewicz2014stationary`, {cite:t}`balbus2014constructive`, {cite:t}`foss2018stochastic` and {cite:t}`hu2019unique`. [^1]: Tauchen's method {cite:p}`tauchen1986finite` is simple but sub-optimal in some cases. For a more general discretization method and a survey of the literature, see {cite:t}`farmer2017discretizing`. [^2]: To justify the first equality, care must be taken when pushing expectations through infinite sums. In the present setting, justification can be provided via the dominated convergence theorem (see, e.g., {cite}`dudley2002real`, Theorem 4.3.5). A proof of a more general result can be found in {ref}`s-state_dep_append`. ======================================================================== ## Optimal Stopping (c-opt_stop)= # Optimal Stopping We study problems of maximizing lifetime rewards in settings in which decision-makers face risks. The job search model studied in {prf:ref}`c-introii` and {prf:ref}`c-mcs` is one example. Others include an entrepreneur who decides whether to exit or enter a market, a borrower who considers defaulting on a loan, a firm that contemplates introducing a new technology, or a portfolio manager deciding whether to exercise a real or financial option. These can all be formulated as dynamic programming and have common features that facilitate sharp characterizations of optimality. They are all two-action (or binary choice) problems that provide good laboratories for studying some special dynamic programs in which recursive representations are particularly enlightening. ## Introduction to Optimal Stopping We begin with a standard theory of optimal stopping and then consider alternative approaches that feature continuation values and threshold policies. We aim to provide a rigorous discussion of optimality that refines our less formal analysis of job search in {ref}`ss-js` and {ref}`ss-jsms`. 
(ss-prostat)= ### Theory Our first step is to set out the fundamental theory of discrete time infinite-horizon optimal stopping problems. (sss-tsp)= #### The Stopping Problem Let $\Xsf$ be a finite set. Given $\Xsf$, an **optimal stopping problem** is a tuple $\sS = (\beta, P, c, e)$ that consists of 1. a discount factor $\beta \in (0,1)$, 2. a Markov operator $P \in \mopx$, 3. a **continuation reward function** $c \in \RR^\Xsf$, and 4. an **exit reward function** $e \in \RR^\Xsf$. Given a $P$-Markov chain $(X_t)_{t \geq 0}$, a decision-maker observes the state $X_t$ in each period and decides whether to continue or stop. If she chooses to stop, she receives final reward $e(X_t)$ and the process terminates. If she decides to continue, then she receives $c(X_t)$ and the process repeats next period. Lifetime rewards are $$ \EE \sum_{t \geq 0} \beta^t R_t, $$ where $R_t$ equals $c(X_t)$ while the agent continues, $e(X_t)$ when the agent stops, and zero thereafter. ```{prf:example} Consider the infinite-horizon job search problem from {prf:ref}`c-introii`, where the wage offer process $(W_t)$ is iid with common distribution $\phi$ on finite set $\Wsf$. This is an optimal stopping problem with state space $\Xsf = \Wsf$ and $P \in \mopx$ having all rows equal to $\phi$, so that all draws are iid from $\phi$. The exit reward function is $e(x) = x/(1-\beta)$ and the continuation reward function is constant and equal to unemployment compensation. ``` ```{prf:example} :label: eg-amop Consider an infinite-horizon American call option that provides the right to buy a given asset at strike price $K$ at each future date. The market price of the asset is $S_t = s(X_t)$, where $(X_t)$ is $P$-Markov on finite set $\Xsf$ and $s \in \RR^\Xsf$. The interest rate is $r > 0$. Deciding when to exercise is an optimal stopping problem, with exit corresponding to exercising the option. 
The discount factor is $1/(1+r)$, the exit reward function is $e(x) \coloneq s(x) - K$ and the continuation reward is zero.[^1]
```

Optimal decisions are described by a **policy function**, which is a map $\sigma$ from $\Xsf$ to $\{0,1\}$. After observing state $x$ at any given time, the decision-maker takes action $\sigma(x)$, where $0$ means "continue" and $1$ means "stop." Implicit in this formulation is the assumption that the current state contains enough information for the agent to decide whether or not to stop.

Let $\Sigma$ be the set of functions from $\Xsf$ to $\{0,1\}$. Let $v_\sigma(x)$ denote the expected lifetime value of following policy $\sigma$ now and in every future period, given optimal stopping problem $\sS = (\beta, P, c, e)$ and current state $x \in \Xsf$. We call $v_\sigma$ the **$\sigma$-value function**. We also call $v_\sigma(x)$ the **lifetime value** of policy $\sigma$ conditional on initial state $x$. Section {ref}`sss-ospval` shows that $v_\sigma$ is well defined and describes how to calculate it.

A policy $\sigma^* \in \Sigma$ is called **optimal** for $\sS$ if

$$
    v_{\sigma^*}(x) = \max_{\sigma \in \Sigma} v_\sigma(x)
    \quad \text{for all } x \in \Xsf.
$$

(sss-ospval)=
#### Lifetime Values

Fixing $\sigma \in \Sigma$, let us consider how to compute the lifetime value $v_\sigma(x)$ of following $\sigma$ conditional on $X_0 = x$. Evidently, $v_\sigma$ satisfies

$$
    v_\sigma(x) = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x' \in \Xsf} v_\sigma(x') P(x, x') \right]
    \quad \text{for all } x \in \Xsf .
$$ (eq-vsigos)

Indeed, if $\sigma(x)=1$, then {eq}`eq-vsigos` states that $v_\sigma(x) = e(x)$, which is what we expect: if we choose to stop at a given state, then lifetime value from that state equals the exit reward.
If, instead, $\sigma(x)=0$, then {eq}`eq-vsigos` becomes $$ v_\sigma(x) = c(x) + \beta \sum_{x'} v_\sigma(x') P(x, x') , $$ (eq-vsss) which is also what we expect: the value of continuing is the current reward plus the discounted expected reward obtained by continuing with policy $\sigma$ next period. We want to solve {eq}`eq-vsigos` for $v_\sigma$. To this end, we define $r_\sigma \in \RR^\Xsf$ and $L_\sigma \in \lopx$ via $$ r_\sigma(x) \coloneq \sigma(x) e(x) + (1-\sigma(x)) c(x) \quad \text{and} \quad L_\sigma(x, x') \coloneq \beta (1-\sigma(x)) P(x, x'). $$ With this notation, we can write {eq}`eq-vsigos` pointwise as $v_\sigma = r_\sigma + L_\sigma \, v_\sigma$. If $\rho(L_\sigma) < 1$, then $$ v_\sigma = (I - L_\sigma)^{-1} \, r_\sigma. $$ (eq-osvspw) ```{exercise} :label: ex-rbsp Confirm that $\rho(L_\sigma) < 1$ holds for any optimal stopping problem. ``` ```{solution} ex-rbsp Pointwise on $\Xsf$ we have $1 - \sigma \leq 1$, so $L_\sigma \leq \beta P$. By {prf:ref}`ex-nnmatop2`, we then have $\rho(L_\sigma) \leq \rho(\beta P) = \beta < 1$. ``` By {prf:ref}`ex-rbsp` and the Neumann series lemma, $v_\sigma$ is uniquely defined by {eq}`eq-osvspw`. (sss-os_po)= #### Policy Operators For the proofs, it will be helpful to view $v_\sigma$ as the fixed point of an operator. We associate each $\sigma \in \Sigma$ with a **policy operator** $T_\sigma$ defined at $v \in \RR^{\Xsf}$ by $$ (T_\sigma \, v)(x) = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x'} v(x') P(x, x') \right], $$ (eq-ospol) for each $x \in \Xsf$. With this notation, {eq}`eq-vsigos` can be written as $v_\sigma = T_\sigma \, v_\sigma$. ```{exercise} :label: ex-opt_stop-auto-1 Prove that, for any $\sigma \in \Sigma$, the operator $T_\sigma$ is order-preserving with respect to the pointwise partial order $\leq$ on $\RR^\Xsf$. ``` ```{solution} ex-opt_stop-auto-1 Fix $\sigma \in \Sigma$. 
If $f, g \in \RR^\Xsf$, $f \leq g$ and $x \in \Xsf$, then

$$
\begin{aligned}
    (T_\sigma g)(x) - (T_\sigma f)(x)
    & = (1-\sigma(x)) \left[ \beta \sum_{x'} g(x') P(x, x') - \beta \sum_{x'} f(x') P(x, x') \right] \\
    & = (1-\sigma(x))\beta \sum_{x'} (g(x') - f(x')) P(x, x').
\end{aligned}
$$

Since $g(x') \geq f(x')$ for all $x'$, we have $(T_\sigma g)(x) \geq (T_\sigma f)(x)$ for all $x$.
```

Using the notation in {ref}`sss-ospval`, we can also define $T_\sigma$ via

$$
    T_\sigma \, v = r_\sigma + L_\sigma \, v .
$$

```{prf:proposition}
:label: p-ospolop

For any $\sigma \in \Sigma$, the policy operator $T_\sigma$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ under the supremum norm.
```

The significance of {prf:ref}`p-ospolop` is that by construction $v_\sigma$ is a fixed point of $T_\sigma$. By the contraction property in {prf:ref}`p-ospolop`, $v_\sigma$ is the only fixed point of $T_\sigma$ in $\RR^\Xsf$ and, moreover, iterates of $T_\sigma$ always converge to $v_\sigma$.

```{exercise}
:label: ex-opt_stop-auto-2

Prove {prf:ref}`p-ospolop`.
```

```{solution} ex-opt_stop-auto-2
Fix $\sigma \in \Sigma$. Given $f, g \in \RR^\Xsf$ and $x \in \Xsf$, we have

$$
\begin{aligned}
    |(T_\sigma f)(x) - (T_\sigma \, g)(x)|
    & = \left| \, (1-\sigma(x))\beta \sum_{x'} (f(x') - g(x')) P(x, x') \, \right| \\
    & \leq \beta \left| \sum_{x'} [f(x') - g(x')] P(x, x') \right|.
\end{aligned}
$$

Applying the triangle inequality and $\sum_{x'} P(x, x')=1$, we obtain

$$
    |(T_\sigma f)(x) - (T_\sigma \, g)(x)|
    \leq \beta \sum_{x'} |f(x') - g(x')| P(x, x')
    \leq \beta \| f - g \|_\infty .
$$

Taking the supremum over all $x$ on the left-hand side of this expression leads to

$$
    \|T_\sigma f - T_\sigma \, g \|_\infty \leq \beta \| f - g \|_\infty .
$$

Since $f, g$ were arbitrary elements of $\RR^\Xsf$, the contraction claim is proved.
```

#### The Value Function

In the job search problem in {ref}`ss-jsms`, we argued that the value function equals the fixed point of the Bellman operator.
Here we make the same argument more formally in the more general setting of optimal stopping. First, given an optimal stopping problem $\sS = (\beta, P, c, e)$ with $\sigma$-value functions $\{v_\sigma\}_{\sigma \in \Sigma}$, we define the **value function** $v^*$ of $\sS$ via $$ v^*(x) \coloneq \max_{\sigma \in \Sigma} v_\sigma(x) \qquad (x \in \Xsf), $$ (eq-osvstar) so that $v^*(x)$ is the maximal lifetime value available to an agent facing current state $x$. Following notation in {ref}`sss-latprop`, we can also write $v^* = \vee_\sigma \, v_\sigma$. Given that solving the maximization in {eq}`eq-osvstar` is, in general, a difficult problem, how can we obtain the value function? The following steps can do the job: 1. formulate a Bellman equation for the value function of the optimal stopping problem, namely, $$ v(x) = \max \left\{ e(x), c(x) + \beta \sum_{x'} v(x') P(x, x') \right\} \qquad (x \in \Xsf), $$ (eq-osbe) 2. prove that this Bellman equation has a unique solution in $\RR^\Xsf$, and then 3. show that this solution equals the value function, as defined in {eq}`eq-osvstar`. We shall complete these steps in {ref}`sss-thebos`. (sss-thebos)= #### The Bellman Operator Define the **Bellman operator** for the optimal stopping problem $\sS = (\beta, P, c, e)$ as $$ (Tv)(x) = \max \left\{ e(x) ,\, c(x) + \beta \sum_{x'} v(x') P(x, x') \right\}, $$ (eq-osbod) where $x \in \Xsf$ and $v \in \RR^{\Xsf}$. By construction, any fixed point of $T$ solves the Bellman equation and vice versa. Pointwise, we can express $T$ via $Tv = e \vee (c + \beta Pv)$. ```{exercise} :label: ex-opt_stop-auto-3 Prove that $T$ is an order-preserving self-map on $\RR^\Xsf$. ``` ```{solution} ex-opt_stop-auto-3 Fix $f, g \in \RR^\Xsf$ with $f \leq g$. Since $P \geq 0$, we have $Pf \leq Pg$. Hence $c + \beta Pf \leq c + \beta Pg$. 
As a result, $$Tf = e \vee (c + \beta Pf) \leq e \vee (c + \beta Pg) = Tg.$$ ``` Our main result for this section is: ```{prf:proposition} :label: p-ospccd If $\sS$ is an optimal stopping problem with Bellman operator $T$ and value function $v^*$, then 1. $T$ is a contraction map of modulus $\beta$ on $\RR^\Xsf$ under the supremum norm $\| \cdot \|_\infty$ and 2. the unique fixed point of $T$ on $\RR^\Xsf$ is the value function $v^*$. ``` ```{exercise} :label: ex-ospccd Prove the claim in (i) of {prf:ref}`p-ospccd`. ``` ```{solution} ex-ospccd This result follows from {prf:ref}`l-supbc`. For the sake of the exercise, we also provide a direct proof: Take any $f, g$ in $\RR^\Xsf$. Writing the operators pointwise and applying the last result in {prf:ref}`l-efine` gives $$ \begin{aligned} |Tf - Tg| & = | e \vee (c + \beta Pf) - e \vee (c + \beta Pg)| \\ & \leq \left| \beta Pf - \beta Pg \right| \\ & = \beta \left| P(f-g) \right| \\ & \leq \beta P \left| f-g \right|. \end{aligned} $$ (Here the last inequality uses the result in {prf:ref}`ex-bmk`.) Since $P \geq 0$ we have $P | f-g | \leq P \|f-g\|_\infty \1 = \|f-g\|_\infty \1$, so $$ |Tf - Tg | \leq \beta \| f - g \|_\infty \1. $$ Taking the maximum on both sides gives $\|Tf-Tg\|_\infty \leq \beta \|f-g\|_\infty$. Since $f, g$ were arbitrary elements of $\RR^\Xsf$, the contraction claim is verified. ``` ```{prf:proof} *Proof of {prf:ref}`p-ospccd`.* With the result of {prf:ref}`ex-ospccd` in hand, we need only show that the unique fixed point $\bar v$ of $T$ in $\RR^\Xsf$ is equal to $v^* = \vee_\sigma \, v_\sigma$. We show $\bar v \leq v^*$ and then $\bar v \geq v^*$. For the first inequality, let $\sigma \in \Sigma$ be defined by $$ \sigma(x) = \1 \left\{ e(x) \geq c(x) + \beta \, \sum_{x'} \bar v(x') P(x, x') \right\} \quad \text{for all } x \in \Xsf. 
$$ Observe that for this choice of $\sigma$ we have, for any $x \in \Xsf$, $$ \begin{aligned} (T_\sigma \, \bar v)(x) & = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x'} \bar v(x') P(x, x') \right] \\ & = \max \left\{ e(x) ,\, c(x) + \beta \sum_{x'} \bar v(x') P(x, x') \right\} = (T\bar v)(x) = \bar v(x). \end{aligned} $$ In particular, $T_\sigma \, \bar v = \bar v$. But the only fixed point of $T_\sigma$ in $\RR^\Xsf$ is $v_\sigma$, so $\bar v = v_\sigma$. But then $\bar v \leq v^*$, by the definition of $v^*$. This is our first inequality. Regarding the second, fix $\sigma \in \Sigma$ and observe that $T v \geq T_\sigma v$ for all $v \in \RR^\Xsf$. Since $T$ is order-preserving and globally stable, {prf:ref}`p-ofpds` implies that $v_\sigma \leq \bar v$. Taking the maximum over $\sigma \in \Sigma$ yields $v^* \leq \bar v$. ◻ ``` #### Optimal Policies Paralleling the definition provided in the discussion of job search ({ref}`ss-js`), for each $v \in \RR^\Xsf$, we call $\sigma \in \Sigma$ **$v$-greedy** if, for all $x \in \Xsf$, $$ \sigma(x) \in \argmax_{a \in \{0,1\}} \left\{ a e(x) + (1-a) \left[ c(x) + \beta \, \sum_{x'} v(x') P(x, x') \right] \right\}. $$ (eq-oposg) A $v$-greedy policy uses $v$ to assign values to states and then chooses to stop or continue based on the action that generates a higher payoff. With this language in place, the next proposition makes precise our informal {ref}`sss-jsin` argument that optimal choices can be made using the value function. ```{prf:proposition} :label: p-osbpo Policy $\sigma \in \Sigma$ is optimal if and only if it is $v^*$-greedy. ``` {prf:ref}`p-osbpo` is a version of **Bellman's principle of optimality**. We shall prove this principle in a more general setting in {prf:ref}`c-mdps`. #### Value Function Iteration The theory just presented tells us that successive approximation using the Bellman operator converges to $v^*$ and $v^*$-greedy policies are optimal. 
These facts make value function iteration (VFI) a natural algorithm for solving optimal stopping problems. (VFI for optimal stopping problems corresponds to the VFI routine presented earlier for job search.) Later, in {prf:ref}`t-fbk_rpd`, we will show that when the number of iterates is sufficiently large, VFI produces an optimal policy.

(ss-entex)=
### Firm Valuation with Exit

In {ref}`sss-fvfi` we discussed firm valuation using the expected present value of the cash flow generated by profits. This is a standard approach. However, it ignores the fact that firms have the option to cease operations and sell all remaining assets. In this section, we consider firm valuation in the presence of an exit option.

#### Optional Exit

Consider a firm whose productivity is exogenous and evolves according to a $Q$-Markov chain $(Z_t)$ on finite set $\Zsf \subset \RR$. Profits are given by $\pi_t = \pi(Z_t)$ for some fixed $\pi \in \RR^\Zsf$. At the start of each period, the firm decides whether to remain in operation and receive current profit $\pi_t$, or to exit and receive scrap value $s > 0$ for sale of physical assets. Discounting is at fixed rate $r$ and $\beta \coloneq 1/(1+r)$. We assume that $r > 0$.

Let $\Sigma$ be the set of all $\sigma \colon \Zsf \to \{0,1\}$. For given $\sigma \in \Sigma$ and $v \in \RR^\Zsf$, the corresponding policy operator is

$$
    (T_\sigma v)(z) = \sigma(z) s + (1-\sigma(z)) \left[ \pi(z) + \beta \sum_{z'} v(z') Q(z, z') \right]
    \qquad (z \in \Zsf).
$$

We saw in {ref}`sss-ospval`--{ref}`sss-os_po` that $T_\sigma$ has a unique fixed point $v_\sigma$ and that $v_\sigma(z)$ represents the value of following policy $\sigma$ forever, conditional on $Z_0 = z$. The Bellman operator for the firm's problem is the order-preserving self-map $T$ on $\RR^\Zsf$ defined by

$$
    (Tv)(z) = \max \left\{ s, \pi(z) + \beta \sum_{z'} v(z') Q(z, z') \right\}
    \quad (z \in \Zsf).
$$

Pointwise, $T$ can be written as $Tv = s \vee (\pi + \beta Q v)$. Let $v^*$ be the value function for this problem.
By {prf:ref}`p-ospccd`, $v^*$ is the unique fixed point of $T$ in $\RR^\Zsf$ and the unique solution to the Bellman equation. Moreover, successive approximation from any $v \in \RR^\Zsf$ converges to $v^*$. Finally, by {prf:ref}`p-osbpo`, a policy is optimal if and only if it is $v^*$-greedy.

Figure {numref}`f-firm_exit_1` plots $v^*$, computed via VFI (i.e., successive approximation using $T$), along with the stopping value $s$ and the continuation value function $h^* = \pi + \beta Q v^*$, under the parameterization given in {numref}`list-firm_exit`. As implied by the Bellman equation, $v^*$ is the pointwise maximum of $s$ and $h^*$. The $v^*$-greedy policy instructs the firm to exit when the continuation value of the firm falls below the scrap value.

```{figure} ../figures/firm_exit_1.pdf
:name: f-firm_exit_1

Value function for firms with exit option
```

```{code-block} julia
:name: list-firm_exit
:caption: Firm exit model (`firm_exit.jl`)
:linenos:

using QuantEcon   # provides tauchen

"Creates an instance of the firm exit model."
function create_exit_model(;
        n=200,                  # productivity grid size
        ρ=0.95, μ=0.1, ν=0.1,   # persistence, mean and volatility
        β=0.98, s=100.0         # discount factor and scrap value
    )
    # Discretize the productivity process with Tauchen's method
    mc = tauchen(n, ρ, ν, μ)
    z_vals, Q = mc.state_values, mc.p
    return (; n, z_vals, Q, β, s)
end
```

```{exercise}
:label: ex-opt_stop-auto-4

Let $\pi(Z_t)=Z_t$. Replicate Figure {numref}`f-firm_exit_1` by using the parameters in {numref}`list-firm_exit` and applying VFI. Reviewing the code for job search should be helpful.
```

#### Exit Versus No-Exit

If we define $w$ by $w(z) = \EE_z \sum_{t \geq 0} \beta^t \pi_t$ for all $z \in \Zsf$, then $w(z)$ is the value of the firm given $Z_0 = z$ when the firm never exits, so that $w$ evaluates the firm according to the expected present value of the profit stream. Figure {numref}`f-firm_exit_2` shows the no-exit value $w$ based on the parameterization in {numref}`list-firm_exit`.
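
The comparison between $v^*$ and $w$ can be sketched numerically. The following is a minimal illustration rather than the book's own code: the three-state chain `Q`, the grid `z_vals`, the parameter values, and the helper name `successive_approx` are all assumptions chosen to keep the example small, and profits are taken to be $\pi(z) = z$. It computes $v^*$ by successive approximation with $T$, recovers $h^*$ and a $v^*$-greedy policy, and computes the no-exit value by solving $(I - \beta Q) w = \pi$.

```julia
using LinearAlgebra

# Hypothetical small model (illustrative values, not the book's parameterization)
z_vals = [-1.0, 0.0, 1.0]          # productivity grid
Q = [0.8 0.2 0.0;                  # Markov matrix for productivity
     0.1 0.8 0.1;
     0.0 0.2 0.8]
β, s = 0.9, 1.0                    # discount factor and scrap value
π_vals = z_vals                    # profits, assuming π(z) = z

"Bellman operator Tv = s ∨ (π + βQv)."
T(v) = max.(s, π_vals .+ β .* (Q * v))

"Iterate the operator from v_init until the sup-norm change falls below tol."
function successive_approx(op, v_init; tol=1e-10, max_iter=10_000)
    v = v_init
    for _ in 1:max_iter
        v_new = op(v)
        maximum(abs.(v_new .- v)) < tol && return v_new
        v = v_new
    end
    return v
end

v_star = successive_approx(T, zeros(length(z_vals)))  # approximate value function
h_star = π_vals .+ β .* (Q * v_star)                  # continuation value function
σ_star = s .>= h_star                                 # v*-greedy policy: true means exit
w = (I - β * Q) \ π_vals                              # value of never exiting
```

Solving the linear system with `\` avoids forming the inverse explicitly. Under these assumed values one can check that `w .<= v_star` holds at every state, in line with the argument that follows.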
```{figure} ../figures/firm_exit_2.pdf :name: f-firm_exit_2 Firm value with and without exit ``` In Figure {numref}`f-firm_exit_2`, we see that $w \leq v^*$ on $\Zsf$. Let's now prove that this is always true. To show $w \leq v^*$, first observe that $w = (I - \beta Q)^{-1} \pi$, by $\beta < 1$ and {prf:ref}`l-fgsd`. Rearranging gives $w = \pi + \beta Q w$. Now note that under the policy $\sigma \equiv 0$, where the firm chooses never to exit, we have $T_\sigma v = \pi + \beta Q v$. Hence the unique fixed point of $T_\sigma$ is $w$. As a result, $w = v_\sigma$ for $\sigma \equiv 0$. But $v^* \geq v_\sigma$ for all $\sigma \in \Sigma$. This proves that $w \leq v^*$. Choosing never to exit is a feasible policy. Since $v^*$ involves maximization of firm value over the set of all feasible policies, it must be at least as large as the value of never exiting. ```{exercise} :label: ex-opt_stop-auto-5 Prove the following: If $Q \gg 0$ and $s > w(z)$ for at least one $z \in \Zsf$, then $w \ll v^*$. Provide some intuition for this result. ``` ```{solution} ex-opt_stop-auto-5 First observe that, since $v^* \geq w$ and $T$ is order-preserving, we have $v^* = Tv^* \geq Tw = s \vee (\pi + \beta Qw) = s \vee w$. From this we get $v^* \geq s \vee w$ and applying $T$ to both sides gives $v^* \geq T(s \vee w)$. Next, observe that $$ T(s \vee w) = s \vee (\pi + \beta Q (s \vee w)) \geq \pi + \beta Q (s \vee w) \gg \pi + \beta Q w = w $$ where the strict inequality is by {prf:ref}`ex-smup`. We conclude that $v^* \geq T(s \vee w) \gg w$, as was to be shown. Intuitively, the option to exit adds value to firms everywhere in the state space, since $Q \gg 0$ implies that the state can shift to a region of the state space where exit is optimal in a later period. ``` ```{exercise} :label: ex-opt_stop-auto-6 Consider a version of the model of firm value with exit where productivity is constant but prices are stochastic. 
In particular, the price process $(P_t)$ for the final good is $Q$-Markov. Suppose further that one-period profits for a given price $p$ are $\max_{\ell \geq 0} \pi(\ell, p)$, where $\ell$ is labor input. Suppose that $\pi(\ell, p) = p \ell^{1/2} - w \ell$, where the wage rate $w$ is constant. Formulate the Bellman equation. ``` ```{solution} ex-opt_stop-auto-6 For the model described, the Bellman equation takes the form $$ v(p) = \max \left\{ s, \, \max_{\ell \geq 0} \pi(\ell, p) + \beta \sum_{p'} v(p') Q(p, p') \right\}. $$ Straightforward calculus shows that maximized one-period profits are $\pi(p) = p^2/ (4w)$. Hence the final expression is $$ v(p) = \max \left\{ s, \, \frac{p^2}{4w} + \beta \sum_{p'} v(p') Q(p, p') \right\}. $$ ``` ### Monotonicity We study monotonicity in values and actions in the general optimal stopping problem described in {ref}`ss-prostat`, with $\Xsf$ as the state space, $e$ as the exit reward function, and $c$ as the continuation reward function. #### Monotone Values Let $v^*$ be the value function of an optimal stopping problem defined by $\Xsf$, $P$, $\beta$, $c$ and $e$ and define a **continuation value function** $h^*$ $$ h^*(x) \coloneq c(x) + \beta \sum_{x'} v^*(x') P(x, x') \qquad (x \in \Xsf). $$ (eq-oscv) (The continuation reward function $c$ and the continuation value function $h^*$ are distinct objects.) Let $\Xsf$ be partially ordered and let $i\RR^\Xsf$ be the increasing functions in $\RR^\Xsf$. ```{prf:lemma} :label: l-hsmi If $e, c \in i\RR^\Xsf$ and $P$ is monotone increasing, then $h^*$ and $v^*$ are both increasing. ``` ```{prf:proof} Let the stated conditions hold. The Bellman operator can be written pointwise as $Tv = e \vee (c + \beta Pv)$. Since $P$ is monotone increasing, $P$ is invariant on $i\RR^\Xsf$. It follows from this fact and the conditions on $e$ and $c$ that $T$ is invariant on $i\RR^\Xsf$. Hence, by {prf:ref}`ex-cinvfp`, $v^*$ is in $i\RR^\Xsf$. Since $h^* = c + \beta P v^*$, the same is true for $h^*$. 
◻ ``` ```{prf:example} :label: eg-fem Consider the {ref}`ss-entex` firm problem with exit with Bellman operator $Tv = s \vee (\pi + \beta Qv)$. Since $s$ is constant, it follows directly that $v^*$ and $h^*$ are both increasing functions when $\pi \in i\RR^\Zsf$ and $Q$ is monotone increasing. ``` #### Monotone Actions The optimal policy in the iid job search problem takes the form $\sigma^*(w) = \1\{w \geq w^* \}$ for all $w \in \Wsf$, where $w^* \coloneq (1-\beta) h^*$ is the reservation wage and $h^*$ is the continuation value. This optimal policy is of threshold type: once the wage offer exceeds the threshold, the decision is to stop. Since threshold policies are convenient, let us now try to characterize them. Throughout this section, we take $\Xsf$ to be a subset of $\RR$. Elements of $\Xsf$ are ordered by $\leq$, the usual order on $\RR$. ```{exercise} :label: ex-osopd Prove that the optimal policy $\sigma^*$ is decreasing on $\Xsf$ whenever $e$ is decreasing on $\Xsf$ and $h^*$ is increasing on $\Xsf$. ``` ```{solution} ex-osopd Fix $x, x' \in \Xsf$ with $x \leq x'$. Since $\sigma^*$ is binary, to show $\sigma^*$ is decreasing it suffices to show that $\sigma^*(x)=0$ implies $\sigma^*(x')=0$. Hence we suppose that $\sigma^*(x)=0$. This in turn implies that $e(x) < h^*(x)$. As $x \leq x'$, $e$ is decreasing and $h^*$ is increasing on $\Xsf$, we have $e(x') < h^*(x')$. Hence $\sigma^*(x')=0$. We conclude that $\sigma^*$ is decreasing on $\Xsf$, as claimed. ``` For a binary function on $\Xsf \subset \RR$, the condition that $\sigma^*$ is decreasing means that the decision-maker chooses to exit when $x$ is sufficiently small. ```{prf:example} In the firm problem with exit, as described in {ref}`ss-entex`, $h^*$ is increasing whenever $\pi \in i\RR^\Zsf$ and $Q$ is monotone increasing. Since the scrap value is constant, {prf:ref}`ex-osopd` applies under these conditions. Hence the optimal policy is decreasing. 
This reasoning agrees with Figure {numref}`f-firm_exit_1`, where exit is optimal when the state is small and continuing is optimal when $z$ is large. This makes sense: since $Q$ is monotone increasing, low current values of $z$ predict low future values of $z$, so profits associated with continuing can be anticipated to be low.
```

```{exercise}
:label: ex-opt_stop-auto-7

Show that the conditions of {prf:ref}`ex-osopd` hold when $e$ is constant on $\Xsf$, $c$ is increasing on $\Xsf$, and $P$ is monotone increasing.
```

```{exercise}
:label: ex-osopi

Prove that the optimal policy $\sigma^*$ is increasing on $\Xsf$ whenever $e$ is increasing on $\Xsf$ and $h^*$ is decreasing on $\Xsf$.
```

```{solution} ex-osopi
The solution to {prf:ref}`ex-osopi` is similar to that of {prf:ref}`ex-osopd` and hence omitted.
```

```{prf:example}
In the iid job search problem, $e(w) = w / (1-\beta)$ is increasing and $h^*$ is constant. Hence the result in {prf:ref}`ex-osopi` applies. This is why the optimal policy $\sigma^*(w) = \1\{w \geq (1-\beta) h^*\}$ is increasing. The agent accepts all sufficiently large wage offers.
```

In the settings of {prf:ref}`ex-osopd`--{prf:ref}`ex-osopi`, the optimal policy is either increasing or decreasing. Since $\Xsf$ is totally ordered, monotonicity implies that a threshold policy is optimal. For example, if $\sigma^*$ is increasing, then we take $x^*$ to be the smallest $x \in \Xsf$ such that $\sigma^*(x) = 1$. For such an $x^*$ we have

$$
    x < x^* \; \implies \sigma^*(x)=0
    \quad \text{and} \quad
    x \geq x^* \; \implies \sigma^*(x)=1.
$$

```{prf:remark}
The conditions in {prf:ref}`ex-osopd`--{prf:ref}`ex-osopi` are sufficient but not necessary for monotone policies. Figure {numref}`f-markov_js_1` provides an example of a setting where the policy is increasing (the agent accepts all sufficiently large wage offers) even though both $e(x)=x/(1-\beta)$ and $h^*$ are strictly increasing.
``` (ss-convals)= ### Continuation Values In {ref}`ss-crwd` we solved the job search problem with iid draws by computing the continuation value $h^*$ directly and then choosing the policy $\sigma^*(w) = \1 \left\{ w/(1-\beta) \geq h^* \right\}$. We saw that this approach is more efficient than first computing the value function, since the continuation value is one-dimensional rather than $|\Wsf|$-dimensional. In {ref}`sss-jscvcm`, we tried the same approach for the job search problem with Markov state, where wage draws are correlated. We gathered fewer benefits from using the continuation value approach in that setting, since the continuation value function has the same dimensionality as the value function. These observations motivate us to explore continuation value methods more carefully. In this section, we formulate a continuation value approach for the general optimal stopping problem and verify convergence. We will see that, while all relevant state components must be included in the value function, purely transitory components do not affect continuation values. Hence the continuation value approach is at least as efficient and sometimes substantially more so. Another asymmetry between value functions and continuation value functions is that the latter are typically smoother. For example, in job search problems, the value function is usually kinked at the reservation wage, while the continuation value function is smooth. Greater smoothness comes from taking expectations over stochastic transitions: integration acts as a smoothing operation. Like lower dimensionality, increased smoothness facilitates analysis and computation. #### The Continuation Value Operator Let $h^*$ be the continuation value function for the optimal stopping problem defined in {eq}`eq-oscv`. 
To compute $h^*$ directly we begin with the optimal stopping version of the Bellman equation evaluated at $v^*$ and rewrite it as

$$
    v^*(x') = \max \left\{ e(x'), h^*(x') \right\}
    \qquad (x' \in \Xsf).
$$

Taking expectations of both sides of the equation conditional on current state $x$ produces $\sum_{x'} v^*(x') P(x, x') = \sum_{x'} \max \left\{ e(x'), h^*(x') \right\} P(x, x')$. Multiplying by $\beta$, adding $c(x)$, and using the definition of $h^*$, we get

$$
    h^*(x) = c(x) + \beta \sum_{x'} \max \left\{ e(x'), h^*(x') \right\} P(x, x')
    \qquad (x \in \Xsf).
$$ (eq-oshsr)

This expression motivates us to introduce a **continuation value operator** $C \colon \RR^\Xsf \to \RR^\Xsf$ via

$$
    (Ch)(x) = c(x) + \beta \sum_{x'} \max \left\{ e(x'), h(x') \right\} P(x, x')
    \qquad (x \in \Xsf).
$$ (eq-oscvo)

```{prf:proposition}
:label: p-oscv

The operator $C$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ with unique fixed point $h^*$ in $\RR^\Xsf$.
```

{prf:ref}`p-oscv` provides the following alternative method to compute the optimal policy that does not involve VFI:

1. Compute $h^*$ by successive approximation with $C$ and
2. Calculate $\sigma^*$ via $\sigma^*(x) = \1\{e(x) \geq h^*(x)\}$ for each $x \in \Xsf$.

In {ref}`sss-osdr` we discuss settings where this approach is advantageous.

```{prf:proof}
*Proof of {prf:ref}`p-oscv`.* Fix $f, g \in \RR^\Xsf$ and $x \in \Xsf$. By the triangle inequality and the bound $|\alpha \vee x - \alpha \vee y| \leq |x - y|$ from {prf:ref}`l-efine`, we have

$$
\begin{aligned}
    |(Cf)(x) - (Cg)(x)|
    & \leq \beta \sum_{x'} \left| \max \left\{ e(x'), f(x') \right\} - \max \left\{ e(x'), g(x') \right\} \right| P(x, x') \\
    & \leq \beta \sum_{x'} \left| f(x') - g(x') \right| P(x, x').
\end{aligned}
$$

The right-hand side is dominated by $\beta \| f - g \|_\infty$. Taking the maximum over $x$ on the left-hand side gives

$$
    \| Cf - Cg \|_\infty \leq \beta \| f - g\|_\infty,
$$

which confirms that $C$ is a contraction of modulus $\beta$ on $\RR^\Xsf$.
From the contraction property, we know that $C$ has exactly one fixed point in $\RR^\Xsf$; {eq}`eq-oshsr` implies that $h^*$ is that fixed point. ◻
```

(sss-osdr)=
#### Dimensionality Reduction

The beginning of {ref}`ss-convals` mentioned that switching from value function iteration to continuation value iteration can substantially reduce the dimensionality of the problem in some cases. Here we describe situations where this works.

To begin, let $\Wsf$ and $\Zsf$ be two finite sets and suppose that $\phi \in \dD(\Wsf)$ and $Q \in \mopz$. Let $(W_t)$ be iid with distribution $\phi$ and let $(Z_t)$ be a $Q$-Markov chain on $\Zsf$. If $(W_t)$ and $(Z_t)$ are independent, then $(X_t)$ defined by $X_t = (W_t, Z_t)$ is $P$-Markov on $\Xsf \coloneq \Wsf \times \Zsf$, where

$$
    P(x, x') = P((w, z), (w', z')) = \phi(w') Q(z, z').
$$

Suppose that the continuation reward depends only on $z$ so that we can write the Bellman operator as

$$
    (Tv)(w, z) = \max \left\{ e(w, z) ,\, c(z) + \beta \sum_{w' \in \Wsf}\sum_{z' \in \Zsf} v(w', z') \phi(w') Q(z, z') \right\}.
$$ (eq-osbodd)

Since the right-hand side depends on both $w$ and $z$, the Bellman operator acts on an $n$-dimensional space, where $n \coloneq |\Xsf| = |\Wsf| \times |\Zsf|$.

However, if we inspect the right-hand side of {eq}`eq-osbodd`, we see that the continuation value function depends only on $z$. Dependence on $w$ vanishes because $w$ does not help predict $w'$. Thus, the continuation value function is an object in $|\Zsf|$-dimensional space. The continuation value operator

$$
    (Ch)(z) = c(z) + \beta \sum_{w'} \sum_{z'} \max \left\{ e(w', z'), h(z') \right\} \phi(w') Q(z, z')
    \qquad (z \in \Zsf)
$$ (eq-cvlds)

acts on this lower-dimensional space.

```{prf:example}
We can embed the iid job search problem into this setting by taking $(W_t)$ to be the wage offer process and $(Z_t)$ to be constant. This is why the iid case offers a large dimensionality reduction when we switch to continuation values.
``` More examples of dimensionality reduction are illustrated in the applications. #### Application to Firm Value Consider the firm valuation problem from {ref}`ss-entex` but suppose now that scrap value fluctuates with prices of underlying assets. For simplicity let's assume that scrap value at each time $t$ is given by the iid sequence $(S_t)$, where each $S_t$ has density $\phi$ on $\RR_+$. The corresponding Bellman operator is $$ (Tv)(z, s) = \max \left\{ s, \pi(z) + \beta \sum_{z'} \int v(z', s') \phi(s') \diff s' Q(z, z') \right\}. $$ We can convert this problem to a finite-state space optimal stopping problem by discretizing the density $\phi$ onto a finite grid contained in $\RR_+$. However, since continuation values depend only on $z$, a better approach is to switch to a continuation value operator. ```{exercise} :label: ex-opt_stop-auto-8 Write down the continuation value operator for this function as a mapping from $\RR^\Zsf$ to itself. ``` ```{solution} ex-opt_stop-auto-8 Either by manipulating the Bellman equation or appealing to {eq}`eq-cvlds`, we see that the continuation value operator is defined at $h \in \RR^\Zsf$ by $$ (Ch)(z) = \pi(z) + \beta \sum_{z'} \int \max \{s', h(z')\} \phi(s') \diff s' Q(z, z') \qquad (z \in \Zsf). $$ The next period scrap value $S_{t+1}$ is integrated out and the remaining function depends only on $z \in \Zsf$. ``` ```{exercise} :label: ex-opt_stop-auto-9 In {ref}`sss-sdfin` we defined stochastic dominance for distributions on finite sets. 
For densities $\phi$ and $\psi$ on $\RR_+$, the definition is similar: we say that $\psi$ stochastically dominates $\phi$ and write $\phi \lefsd \psi$ if $\int u(x) \phi(x) \diff x \leq \int u(x) \psi(x) \diff x$ for every increasing function $u \colon \RR_+ \to \RR$.[^2] With this definition, show that if $\phi_a$ and $\phi_b$ are two alternative densities for scrap value and $\phi_a \lefsd \phi_b$, then $\sigma^*_a \geq \sigma^*_b$ pointwise on $\Zsf$, where $\sigma^*_i$ is the optimal policy corresponding to density $\phi_i$ for $i \in \{a, b\}$. Interpret this result. ``` ```{solution} ex-opt_stop-auto-9 Let $\phi_a$ and $\phi_b$ be as stated. For $i \in \{a, b\}$ and $h \in \RR^\Zsf$, let $$ (C_i h)(z) = \pi(z) + \beta \sum_{z'} \int \max \{s', h(z')\} \phi_i(s') \diff s' Q(z, z') . $$ Since, for each $z' \in \Zsf$, the function $s' \mapsto \max \{s', h(z')\}$ is increasing and $\phi_a \lefsd \phi_b$, we have $$ \sum_{z'} \int \max \{s', h(z')\} \phi_a(s') \diff s' Q(z, z') \leq \sum_{z'} \int \max \{s', h(z')\} \phi_b(s') \diff s' Q(z, z') . $$ Hence $C_a h \leq C_b h$ for all $h \in \RR^\Zsf$. As $C_b$ is order-preserving and globally stable, {prf:ref}`p-ofpds` implies that the fixed point of $C_b$ dominates the fixed point of $C_a$. That is, $h^*_a \leq h^*_b$. But then, for any $z \in \Zsf$, we have $h^*_a(z) \leq h^*_b(z)$ and hence $$ \sigma_b^*(z) = \1\{s \geq h^*_b(z)\} \leq \1\{s \geq h^*_a(z)\} = \sigma_a^*(z). $$ The interpretation of $\sigma_b^* \leq \sigma_a^*$ is that the firm exits at fewer states when the scrap value distribution is $\phi_b$. This makes sense, since the current scrap value offer $s$ is already known, while future offers are more promising under $\phi_b$ than $\phi_a$. Hence continuing is more attractive. ``` (s-osapp)= ## Further Applications In this section, we discuss some additional applications of optimal stopping. ### American Options We discussed American options briefly in {prf:ref}`eg-amop`. Here we investigate this class of derivatives more carefully.
We focus on American call options that provide the right to buy a particular stock or bond at a fixed **strike price** $K$ at any time before a set expiration date. The market price of the asset at time $t$ is denoted by $S_t$. We discussed a case in which the expiration date is infinity in {prf:ref}`eg-amop`. However, options without termination dates -- also called perpetual options -- are rare in practice. Hence we focus on the finite-horizon case. We are interested in computing the expected value of holding the option when discounting with a fixed interest rate, a typical assumption when pricing American options. Finite horizon American options can be priced by backward induction in an approach like the one we used for the finite horizon job search problem discussed in {prf:ref}`c-introii`. Alternatively, we can embed finite horizon options into the theory of infinite-horizon optimal stopping. We use the second approach here, since we have just presented a theory for infinite-horizon optimal stopping. To this end, we take $T \in \NN$ to be a fixed integer indicating the date of expiration. The option is purchased at $t=0$ and can be exercised at any $t \in \NN$ with $t \leq T$. To include $t$ in the current state, we set $$ \Tsf \coloneq \{1, \ldots, T+1\} \quad \text{and} \quad m(t) \coloneq \min\{t+1, T+1\} \text{ for all } t \in \Tsf. $$ The idea is that time is updated via $t' = m(t)$, so that time increments at each update until $t=T+1$. After that we hold $t$ constant. Bounding time at $T+1$ keeps the state space finite. We assume that a stock price $S_t$ evolves according to $$ S_t = Z_t + W_t \quad \text{where} \quad (W_t)_{t \geq 0} \iidsim \phi \in \dD(\Wsf). $$ Here $(Z_t)_{t \geq 0}$ is $Q$-Markov on finite set $\Zsf$ for some $Q \in \mopz$ and $\Wsf$ is also finite. This means that the share price has both persistent and transient stochastic components. 
If we set parameters so that $(Z_t)_{t \geq 0}$ resembles a random walk, price changes will be difficult to predict. To form a {ref}`sss-tsp` optimal stopping problem, we must specify the state space and the $P \in \mopx$ that governs the state process. We set the state space to $\Xsf \coloneq \Tsf \times \Wsf \times \Zsf$ and $$ P((t, w, z), (t', w', z')) \coloneq \1\{t' = m(t)\} \phi(w') Q(z, z'). $$ Thus, time updates deterministically via $t' = m(t)$ and $z'$ and $w'$ are drawn independently from $Q(z, \cdot)$ and $\phi$ respectively. As for a perpetual option, the continuation reward is zero and the discount factor is $\beta \coloneq 1/(1+r)$, where $r > 0$ is a fixed risk-free rate. The exit reward can be expressed as $\1\{t \leq T\} (S_t - K)$ so that exercising at time $t$ earns the owner $S_t - K$ up to expiry and zero thereafter. In terms of the state $(t, w, z)$, the exit reward is $$ e(t, w, z) \coloneq \1\{t \leq T\} [z + w - K]. $$ The Bellman equation can be written $$ v(t, w, z) = \max \left\{ e(t, w, z) ,\, \beta \sum_{w'}\sum_{z'} v(t', w', z') \phi(w') Q(z, z') \right\}, $$ where $t' = m(t)$. This value function $v(t, w, z)$ neatly captures the value of the option: It is the maximum of current exercise value and the discounted expected value of carrying the option over to the next period. Since the problem just described is an optimal stopping problem in the sense of {ref}`sss-tsp`, all of the optimality results obtained for that problem apply. In particular, iterates of the Bellman operator converge to the value function $v^*$ and, moreover, a policy is optimal if and only if it is $v^*$-greedy. We can do better than VFI. Since $(W_t)_{t \geq 0}$ is iid and appears only in the exit reward, we can reduce dimensionality by switching to the continuation value operator, which, in this case, can be expressed as $$ (Ch)(t, z) = \beta \sum_{z'} \sum_{w'} \max \left\{ e(t', w', z'), \, h(t', z') \right\} \phi(w') Q(z, z').
$$ Here $t' = m(t)$, as before. As proved in {ref}`ss-convals`, the unique fixed point of $C$ is the continuation value function $h^*$, and $C^k h \to h^*$ as $k \to \infty$ for all $h \in \RR^{\Tsf \times \Zsf}$. With the fixed point in hand, we can compute the optimal policy as $$ \sigma^*(t, w, z) = \1 \left\{ e(t, w, z) \geq h^*(t, z) \right\}. $$ Here $\sigma^*(t, w, z) = 1$ prescribes exercising the option at time $t$. Figure {numref}`f-american_option_1` provides a visual representation of optimal actions under the default parameterization described in {numref}`list-american_option`. Each of the three subfigures shows contour lines of the net exit reward $f(t, w, z) \coloneq e(t, w, z) - h^*(t, z)$, viewed as a function of $(w, z)$, when $t$ is held fixed. The date $t$ for each subfigure is shown in the title. The optimal policy exercises the option when $f(t, w, z) \geq 0$.

```{code-block} julia
:name: list-american_option
:caption: Pricing an American option (`american_option.jl`)
:linenos:

using QuantEcon, LinearAlgebra, IterTools

"Creates an instance of the option model with S_t = Z_t + W_t."
function create_american_option_model(;
        n=100, μ=10.0,  # Markov state grid size and mean value
        ρ=0.98, ν=0.2,  # persistence and volatility for Markov state
        s=0.3,          # volatility parameter for W_t
        r=0.01,         # interest rate
        K=10.0, T=200)  # strike price and expiration date
    t_vals = collect(1:T+1)
    # Discretize the persistent component and shift its grid by μ
    mc = tauchen(n, ρ, ν)
    z_vals, Q = mc.state_values .+ μ, mc.p
    # Two-point transient shock, its distribution, and the discount factor
    w_vals, φ, β = [-s, s], [0.5, 0.5], 1 / (1 + r)
    # Exit reward: S_t - K up to expiry and zero thereafter
    e(t, i_w, i_z) = (t ≤ T) * (z_vals[i_z] + w_vals[i_w] - K)
    return (; t_vals, z_vals, w_vals, Q, φ, T, β, K, e)
end
```

In each subfigure, the **exercise region**, which is the set of $(w, z)$ such that $f(t, w, z) \geq 0$, corresponds to the northeast part of the figure, where $w$ and $z$ are both large. The boundary between exercise and continuing is the zero contour line, which is shown in black. Notice that the size of the exercise region expands with $t$.
This is because the value of waiting decreases when the set of possible exercise dates shrinks. Figure {numref}`f-american_option_2` provides some simulations of the stock price process $(S_t)_{t \geq 0}$ over the lifetime of the option, again using the default parameterization described in {numref}`list-american_option`. The blue region in the top part of each subfigure contains values of the stock price $S_t = Z_t + W_t$ such that $S_t \geq K$. In this configuration, where the price of the underlying exceeds the strike price, the option is said to be "in the money." The figure also shows the optimal exercise date in each simulation, which is the first $t$ such that $e(t, W_t, Z_t) \geq h^*(t, Z_t)$. ```{figure} ../figures/american_option_1.pdf :name: f-american_option_1 Exercise region for the American option ``` ```{figure} ../figures/american_option_2.pdf :name: f-american_option_2 Simulations for the American option process ``` (sss-osent)= ### Research and Development Consider a firm that engages in costly research and development (R&D) in order to develop a new product. The firm decides whether to continue developing the product before starting to market it or to stop developing and start marketing it. For simplicity, we assume that the value of bringing the product to market is a one-time lump sum payment $\pi_t = \pi(X_t)$, where $(X_t)$ is a $P$-Markov chain on finite set $\Xsf$ with $P \in \mopx$. The flow cost of investing in R&D is $C_t$ per period, where $(C_t)$ is a stochastic process. Future payoffs are discounted at rate $r > 0$ and we set $\beta \coloneq 1/(1+r)$. #### Constant R&D Costs As a first take on this problem, suppose that $C_t \equiv c \in \RR_+$ for all $t$ and formulate an optimal stopping problem with exit reward $e=\pi$ and constant continuation reward $-c$. The Bellman equation is $$ v(x) = \max \left\{ \pi(x), -c + \beta \sum_{x'} v(x') P(x, x') \right\} \qquad (x \in \Xsf).
$$ ```{exercise} :label: ex-opt_stop-auto-10 Write down the continuation value operator for this problem. Prove that the continuation value function $h^*$ is increasing in $x$ whenever $\pi \in i\RR^\Xsf$ and $P$ is monotone increasing. ``` ```{solution} ex-opt_stop-auto-10 In view of {eq}`eq-oscvo`, the continuation value operator for this problem is $$ (Ch)(x) = -c + \beta \sum_{x'} \max \left\{ \pi(x'), h(x') \right\} P(x, x') \qquad (x \in \Xsf). $$ The monotonicity result for $h^*$ follows from {prf:ref}`l-hsmi`. ``` ```{exercise} :label: ex-opt_stop-auto-11 Prove that the optimal policy $\sigma^*$ is increasing whenever $\pi$ is increasing and $(X_t)$ is iid (so that all rows of $P$ are identical). Provide economic intuition for this result. ``` ```{solution} ex-opt_stop-auto-11 If $(X_t)$ is iid with common distribution $\phi$, then the continuation value $h^*$ is constant; in particular, it is the unique scalar $h$ that solves $$ h = -c + \beta \sum_{x'} \max \left\{ \pi(x'), h \right\} \phi(x'). $$ Since constant functions are (weakly) decreasing, {prf:ref}`ex-osopi` applies and $\sigma^*$ is increasing. Intuitively, the value of waiting is independent of the current state, while the value of bringing the product to market is increasing in the current state. Hence, if the firm brings the product to market in state $x$, then it should also do so at any $x' \geq x$. ``` (sss-iidrd)= #### IID R&D Costs Let's suppose now that $(C_t)_{t \geq 0}$ is iid with common distribution $\phi \in \dD(\Wsf)$. The Bellman equation is $$ v(c, x) = \max \left\{ \pi(x), -c + \beta \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x') \right\}. $$ (eq-rdvc) Since $(C_t)$ is iid, we would ideally like to integrate it out in the manner of {ref}`sss-osdr`, thereby lowering the dimensionality of the problem. However, note that the continuation value associated with {eq}`eq-rdvc` is $$ h(c, x) \coloneq -c + \beta \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x'), $$ which still depends on $c$.
Fortunately, there is a way to eliminate $c$. Define the expected value $g(x)$ in state $x$ by $$ g(x) \coloneq \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x'). $$ (eq-osevf) Rewrite the Bellman equation using $g$ and replacing $(c, x)$ with $(c', x')$ to get $$ v(c', x') = \max \left\{ \pi(x'), -c' + \beta g(x') \right\}. $$ Averaging over $(c', x')$ and using the definition of $g$ again gives $$ g(x) = \sum_{x'} \sum_{c'} \max \left\{ \pi(x'), -c' + \beta g(x') \right\} \phi(c') P(x, x'). $$ (eq-dgrad) This is a functional equation in $g$ that depends only on $x$. To solve it, we introduce the operator $R$ defined by $$ (Rg)(x) = \sum_{x'} \sum_{c'} \max \left\{ \pi(x'), -c' + \beta g(x') \right\} \phi(c') P(x, x') \quad (x \in \Xsf). $$ ```{exercise} :label: ex-riac Prove that $R$ is a contraction of modulus $\beta$ on $\RR^\Xsf$. ``` From {prf:ref}`ex-riac`, we see that {eq}`eq-dgrad` has a unique solution $g^*$ in $\RR^\Xsf$ that can be computed by successive approximation. With $g^*$ in hand, we can compute the optimal policy via $$ \sigma^*(c, x) = \1 \left\{ \pi(x) \geq -c + \beta g^*(x) \right\}. $$ ```{prf:remark} Here we solved for the expected value function defined in {eq}`eq-osevf`. In {ref}`s-mbels` we shall discuss this method and its convergence properties in a more general setting. ``` (s-cn_opt_stop)= ## Chapter Notes Various textbooks treat optimal stopping in depth, although most use continuous time. {cite:t}`peskir2006` and {cite:t}`shiryaev2007optimal` are good examples. There are many applications of optimal stopping in economics and finance, with influential early research papers including {cite:t}`mccall1970`, {cite:t}`jovanovic1982selection`, {cite:t}`hopenhayn1992entry`, and {cite:t}`ericson1995markov`. {cite:t}`arellano2008default` considers borrowing on international financial markets with the option of sovereign default (see {ref}`sss-aod`). {cite:t}`riedel2009optimal` studies optimal stopping under Knightian uncertainty. 
{cite:t}`fajgelbaum2017uncertainty` include an optimal stopping problem for firms in a model of uncertainty traps. The firm problem with optimal exit has been used to analyze firm dynamics and firm size distributions in equilibrium models with heterogeneous firms. {cite:t}`hopenhayn1992entry` is the classic reference. {cite:t}`perla2014equilibrium` construct a growth model in which firms at the bottom of the productivity distribution imitate more productive firms. {cite:t}`carvalho2019large` analyze business cycles in a setting of firm growth with exit and a Pareto distribution of firms. Infinite duration American options are analyzed in {cite:t}`mordecki2002optimal`. Practical methods for pricing American options are provided by {cite:t}`longstaff2001valuing`, {cite:t}`rogers2002monte`, and {cite:t}`kohler2010pricing`. Replacement problems are an important class of optimal stopping problems not treated in this chapter. An important early paper by {cite:t}`rust1987optimal` uses dynamic programming to find optimal replacement policies for engine parts and goes on to fit the model to data. {ref}`ss-sest` discusses structural estimation in the style of {cite:t}`rust1987optimal` and others. [^1]: We are studying American options in discrete time. Options with discrete exercise times are sometimes called **Bermudan options**. References for the continuous-time case are provided in {ref}`s-cn_opt_stop`. [^2]: Actually, in most definitions, $u$ is also restricted to be bounded and measurable, in order to ensure that the integrals are finite. These technicalities can be ignored in the exercise. ======================================================================== ## Markov Decision Processes (c-mdps)= # Markov Decision Processes In this chapter we study a class of discrete time, infinite horizon dynamic programs called Markov decision processes (MDPs).
This standard class of problems is broad enough to encompass many applications, including the optimal stopping problems in {prf:ref}`c-opt_stop`. MDPs can also be combined with reinforcement learning to tackle settings where important inputs to an MDP are not known. (s-mdps)= ## Definition and Properties In this section, we define MDPs and investigate optimality. (ss-gfsmdp)= ### The MDP Model (sss-fsmdp)= We study a controller who interacts with a state process $(X_t)_{t \geq 0}$ by choosing an action path $(A_t)_{t \geq 0}$ to maximize expected discounted rewards $$ \EE \sum_{t \geq 0} \beta^t r(X_t, A_t), $$ (eq-dmdp_ob) taking an initial state $X_0$ as given. As with all dynamic programs, we insist that the controller is not clairvoyant: He or she cannot choose actions that depend on future states. To formalize the problem, we fix a finite set $\Xsf$, henceforth called the **state space**, and a finite set $\Asf$, henceforth called the **action space**. In what follows, a **correspondence** $\Gamma$ from $\Xsf$ to $\Asf$ is a function from $\Xsf$ into $\wp(\Asf)$, the set of all subsets of $\Asf$. The correspondence is called **nonempty** if $\Gamma(x) \not= \emptyset$ for all $x \in \Xsf$. For example, the map $\Gamma$ defined by $\Gamma(x) = [-x, x]$ is a nonempty correspondence from $\RR$ to $\RR$. Given $\Xsf$ and $\Asf$, we define a **Markov decision process** (**MDP**) to be a tuple $\mM = (\Gamma, \beta, r, P)$ consisting of 1. a nonempty correspondence $\Gamma$ from $\Xsf$ to $\Asf$, referred to as the **feasible correspondence**, which in turn defines the **feasible state action pairs** $$ \Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)}, $$ 2. a constant $\beta$ in $(0, 1)$, referred to as the **discount factor**, 3. a function $r$ from $\Gsf$ to $\RR$, referred to as the **reward function**, and 4. 
a **stochastic kernel** $P$ from $\Gsf$ to $\Xsf$; that is, $P$ is a map from $\Gsf \times \Xsf$ to $\RR_+$ satisfying $$ \sum_{x' \in \Xsf} P(x, a, x') = 1 \quad \text{ for all } (x,a) \text{ in } \Gsf. $$ Here $\Gamma(x) \subset \Asf$ is the set of actions available to the controller in state $x$. Given a feasible state action pair $(x, a)$, reward $r(x, a)$ is received, and the next period state $x'$ is randomly drawn from $P(x,a, \cdot)$, which is an element of $\dD(\Xsf)$. The dynamics and reward flow are summarized in {prf:ref}`algo-d_mdp`. ```{prf:algorithm} MDP dynamics: States, actions, and rewards :label: algo-d_mdp - $t \leftarrow 0$ - input $X_0$ - while $t < \infty$: - observe $X_t$ - choose action $A_t$ - receive reward $r(X_t, A_t)$ - draw $X_{t+1}$ from $P(X_t, A_t, \cdot)$ - $t \leftarrow t + 1$ ``` The **Bellman equation** corresponding to $\mM$ is $$ v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \quad \text{for all } \, x \in \Xsf. $$ (eq-fsmdp_bell) This can be understood as an equation in the unknown function $v \in \RR^\Xsf$. Below we define the value function $v^*$ as maximal lifetime rewards and show that $v^*$ is the unique solution to the Bellman equation in $\RR^\Xsf$. We can understand the Bellman equation as reducing an infinite-horizon problem to a two-period problem involving the present and the future. Current actions influence (i) current rewards and (ii) expected discounted value from future states. In every case we examine, there is a trade-off between maximizing current rewards and shifting probability mass towards states with high future rewards. ### Examples Here we list examples of MDPs. We will see that some models neatly fit the MDP structure, whereas others can be coaxed into the MDP framework by adding states or applying other tricks. 
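Before turning to specific models, it may help to see one way the primitives $(\Gamma, \beta, r, P)$ and the right-hand side of {eq}`eq-fsmdp_bell` could be represented on a computer. The following is a minimal Python sketch, not the book's companion code. The two-state MDP, its numbers, and the names `bellman_operator`, `greedy_policy`, and `value_function_iteration` are all hypothetical and chosen purely for illustration. The sketch iterates on the Bellman operator from $v \equiv 0$, assuming (as is shown for MDPs later in the chapter) that this converges to the unique fixed point.

```python
# Toy MDP with X = {0, 1} and A = {0, 1}; every number here is made up.
beta = 0.9
states = [0, 1]
Gamma = {0: [0, 1], 1: [0]}                  # feasible correspondence
r = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0}  # rewards on feasible pairs
P = {(0, 0): [1.0, 0.0],                     # P(x, a, .) is a distribution
     (0, 1): [0.2, 0.8],
     (1, 0): [0.5, 0.5]}


def action_value(v, x, a):
    "Reward plus discounted expected value of taking a at x."
    return r[x, a] + beta * sum(v[y] * P[x, a][y] for y in states)


def bellman_operator(v):
    "Right-hand side of the MDP Bellman equation, evaluated at v."
    return [max(action_value(v, x, a) for a in Gamma[x]) for x in states]


def greedy_policy(v):
    "A feasible policy that attains the max in the Bellman equation at v."
    return [max(Gamma[x], key=lambda a: action_value(v, x, a))
            for x in states]


def value_function_iteration(tol=1e-10, max_iter=10_000):
    "Successive approximation with the Bellman operator from v = 0."
    v = [0.0 for _ in states]
    for _ in range(max_iter):
        v_new = bellman_operator(v)
        if max(abs(a - b) for a, b in zip(v, v_new)) < tol:
            return v_new
        v = v_new
    return v


v_star = value_function_iteration()
sigma_star = greedy_policy(v_star)
print(v_star, sigma_star)
```

Storing $r$ and $P$ as dictionaries keyed by feasible pairs $(x, a)$ mirrors the fact that both are defined on $\Gsf$ rather than on all of $\Xsf \times \Asf$.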
#### A Renewal Problem {cite:t}`rust1987optimal` ignited the field of dynamic structural estimation by examining an engine replacement problem for a bus workshop. In each period the superintendent decides whether or not to replace the engine of a given bus. Replacement is costly but delaying risks unexpected failure. {cite:t}`rust1987optimal` solved this trade-off using dynamic programming. We consider an abstract version of Rust's problem with binary action $A_t$. When $A_t = 1$, the state resets to some fixed **renewal state** $\bar x$ in a finite set $\Xsf$ (e.g., mileage resets to zero when an engine is replaced). When $A_t = 0$, the state updates according to $Q \in \mopx$ (e.g., mileage increases stochastically when the engine is not replaced). Given current state $x$ and action $a$, current reward $r(x,a)$ is received. The discount factor is $\beta \in (0,1)$. For this problem, the Bellman equation has the form $$ v(x) = \max \left\{ r(x,1) + \beta v(\bar x), \; r(x,0) + \beta \sum_{x'} v(x')Q(x, x') \right\} \qquad (x \in \Xsf), $$ (eq-renewalb) where the first term is the value from action 1 and the second is the value of action 0. To set the problem up as an MDP we set $\Asf = \{0,1\}$ and $\Gamma(x) = \Asf$ for all $x \in \Xsf$. We define $$ P(x, a, x') \coloneq a \1\{x' = \bar x\} + (1-a) Q(x, x') \qquad ((x,a) \in \Gsf, \; x' \in \Xsf). $$ (eq-prenew) ```{exercise} :label: ex-mdps-auto-1 Prove that $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$. ``` The primitives $(\Gamma, \beta, r, P)$ form an MDP. Moreover, the renewal Bellman equation {eq}`eq-renewalb` is a special case of the MDP Bellman equation {eq}`eq-fsmdp_bell`. To verify this we rewrite {eq}`eq-renewalb` as $$ v(x) = \max_{a \in \{0,1\}} \left\{ r(x,a) + \beta \left[ a v(\bar x) + (1-a) \sum_{x'} v(x')Q(x, x') \right] \right\}, $$ Inserting $P$ from {eq}`eq-prenew` into the right-hand side of the last equation recovers the MDP Bellman equation {eq}`eq-fsmdp_bell`. 
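The construction in {eq}`eq-prenew` can be checked numerically on a small example. The Python sketch below uses a hypothetical three-state renewal problem (the matrix `Q`, the reward function `r`, and all numbers are invented for illustration). It builds $P$ from $Q$ and $\bar x$, confirms that each $P(x, a, \cdot)$ sums to one, and compares the right-hand side of the renewal Bellman equation {eq}`eq-renewalb` with that of the MDP Bellman equation {eq}`eq-fsmdp_bell` at an arbitrary $v$. A numerical check of this kind illustrates the claims but does not replace the proofs.

```python
# Hypothetical three-state renewal problem; all numbers are illustrative.
X = [0, 1, 2]          # state space (think of mileage bins)
x_bar = 0              # renewal state
beta = 0.95
Q = [[0.5, 0.5, 0.0],  # transition matrix when the engine is not replaced
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]


def r(x, a):
    "Made-up rewards: replacing costs 5, running costs x."
    return -5.0 if a == 1 else -float(x)


def P(x, a, y):
    "Stochastic kernel: a * 1{y = x_bar} + (1 - a) * Q(x, y)."
    return a * (1.0 if y == x_bar else 0.0) + (1 - a) * Q[x][y]


def renewal_rhs(v, x):
    "Right-hand side of the renewal Bellman equation at state x."
    replace = r(x, 1) + beta * v[x_bar]
    keep = r(x, 0) + beta * sum(v[y] * Q[x][y] for y in X)
    return max(replace, keep)


def mdp_rhs(v, x):
    "Right-hand side of the MDP Bellman equation at state x."
    return max(r(x, a) + beta * sum(v[y] * P(x, a, y) for y in X)
               for a in (0, 1))


# Each P(x, a, .) should be a distribution on X.
row_sums = [sum(P(x, a, y) for y in X) for x in X for a in (0, 1)]

# The two right-hand sides should agree at any v; try an arbitrary one.
v_test = [1.0, -2.0, 3.5]
gaps = [abs(renewal_rhs(v_test, x) - mdp_rhs(v_test, x)) for x in X]
print(row_sums, gaps)
```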
(sss-oim)= #### Optimal Inventory Management We study a firm where a manager maximizes shareholder value. To simplify the problem, we ignore exit options (so that firm value is the expected present value of profits) and assume that the firm only sells one product. Letting $\pi_t$ be profits at time $t$ and $r > 0$ be the interest rate, the value of the firm is $$ V_0 = \EE \sum_{t \geq 0} \beta^t \pi_t \qquad \text{ where } \quad \beta \coloneq \frac{1}{1+r}. $$ (eq-voi) The firm faces exogenous demand process $(D_t)_{t \geq 0} \iidsim \phi \in \dD(\ZZ_+)$. Inventory $(X_t)_{t \geq 0}$ of the product obeys $$ X_{t+1} = f(X_t, A_t, D_{t+1}) \qquad \text{where} \quad f(x,a,d) \coloneq (x - d)\vee 0 + a. $$ (eq-i_lom) The term $A_t$ is units of stock ordered this period, which take one period to arrive. The definition of $f$ imposes the assumption that firms cannot sell more stock than they have on hand. We assume that the firm can store at most $K$ items at one time. With the price of the firm's product set to one, current profits are given by $$ \pi_t \coloneq X_t \wedge D_{t+1} - c A_t - \kappa \1\{A_t > 0\}. $$ Here $c$ is unit product cost and $\kappa$ is a fixed cost of ordering inventory. We take the minimum $X_t \wedge D_{t+1}$ because orders in excess of inventory are assumed to be lost rather than back-filled. We can map our inventory problem into an MDP with state space $\Xsf \coloneq \{0, \ldots, K\}$ and action space $\Asf \coloneq \Xsf$. The feasible correspondence $\Gamma$ is $$ \Gamma(x) \coloneq \{0, \ldots, K - x\}, $$ (eq-i_gamma) which represents the set of feasible orders when the current inventory state is $x$. The reward function is expected current profits, or $$ r(x, a) \coloneq \sum_{d \geq 0} (x \wedge d) \phi(d) - c a - \kappa \1\{a > 0\}. 
$$ (eq-cpip) The stochastic kernel from the set of feasible state action pairs $\Gsf$ induced by $\Gamma$ is, in view of {eq}`eq-i_lom`, $$ P(x, a, x') \coloneq \PP\{ f(x, a, D) = x' \} \qquad \text{when} \quad D \sim \phi. $$ (eq-i_sk) ```{exercise} :label: ex-i_sk Suppose that $\phi$ is the geometric distribution on $\ZZ_+$ with parameter $p$. Write down an expression for the stochastic kernel {eq}`eq-i_sk` using only $x, a, x'$ and the parameters of the model. ``` ```{solution} ex-i_sk The stochastic kernel is $$ P(x, a, x') = \begin{cases} 0 & \text{if } x' < a \\ (1-p)^x & \text{if } x' = a \\ (1-p)^{x + a - x'} p & \text{if } x' > a \end{cases} $$ (eq-i_ske) The middle case is obtained by observing that the next period state hits $x'$ when $x' = a$ if and only if $D_{t+1} \geq x$ and then using the expression for the geometric distribution. ``` The Bellman equation for this optimal inventory problem is $$ v(x) = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \beta \sum_{d \geq 0} v(f(x, a, d)) \phi(d) \right\}, $$ (eq-oibelleq) at each $x \in \Xsf$, where $r(x,a)$ is as given in {eq}`eq-cpip` and the aim is to solve for $v$. We introduce the Bellman operator $$ (Tv)(x) = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \beta \sum_{d \geq 0} v(f(x, a, d)) \phi(d) \right\}. $$ This operator maps $\RR^\Xsf$ to itself and is designed so that its set of fixed points in $\RR^\Xsf$ coincide with solutions to {eq}`eq-oibelleq` in $\RR^\Xsf$. ```{exercise} :label: ex-mdps-auto-2 Prove that $T$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ when paired with the supremum norm $\| v \|_\infty \coloneq \max_{x \in \Xsf} |v(x)|$. 
``` ```{solution} ex-mdps-auto-2 $T$ is a sup norm contraction mapping on $\RR^\Xsf$ because, in view of the max-inequality lemma, for any $v, w$ in $\RR^\Xsf$, $$ \begin{aligned} |(T v)(x) - (T w)(x)| & \leq \beta \, \max_{a \in \Gamma(x)} \left| \sum_{d \geq 0} \left[ v(f(x, a, d)) - w(f(x, a, d)) \right] \phi(d) \right| \\ & \leq \beta \, \max_{a \in \Gamma(x)} \sum_{d \geq 0} \left| v(f(x, a, d)) - w(f(x, a, d)) \right| \phi(d). \end{aligned} $$ Since $\sum_{d \geq 0} \phi(d) = 1$, it follows that, for arbitrary $x \in \Xsf$, $$ |(T v)(x) - (T w)(x)| \leq \beta \| v - w\|_\infty. $$ Taking the supremum over all $x \in \Xsf$ yields the desired result. ``` (sss-cake)= #### Example: Cake Eating Many dynamic programming problems in economics involve a trade-off between current and future consumption. The simplest example in this class is the "cake eating" problem, where initial household wealth is given but no labor income is received. Wealth evolves according to $$ W_{t+1} = R(W_t - C_t) \qquad (t \geq 0), $$ where $C_t$ is current consumption and $R$ is the gross interest rate. The agent seeks to maximize $$ \EE \sum_{t \geq 0} \beta^t u(C_t) \quad \text{given } W_0 = w, $$ subject to $0 \leq C_t \leq W_t$ (implying that the agent cannot borrow). Consumption level $C_t$ generates utility $u(C_t)$. Assuming that wealth takes values in a finite set $\Wsf \subset \RR_+$, the Bellman equation for this problem can be written as $$ v(w) = \max_{0 \leq w' \leq Rw} \left\{ u (w - w'/R) + \beta v(w') \right\}. $$ (eq-cakebe) In {eq}`eq-cakebe` we are using $w' = R(w - c)$ to obtain $c=(w-w'/R)$, so that the constraint $0 \leq c \leq w$ becomes $0 \leq w' \leq Rw$. The household uses {eq}`eq-cakebe` to trade off current utility of consumption against the value of future wealth. ```{exercise} :label: ex-mdps-auto-3 Frame this model as an MDP with $\Wsf$ as the state space. ``` ```{solution} ex-mdps-auto-3 We take the action $A_t$ to be the choice of next period wealth $W_{t+1}$, so that the action space is also $\Wsf$.
The feasible correspondence is $$ \Gamma(w) = \setntn{a \in \Wsf}{a \leq R w} \qquad (w \in \Wsf), $$ implying that $\Gsf = \setntn{(w, a) \in \Wsf \times \Wsf}{a \leq Rw}$. The current reward is utility of consumption, or $$ r(w, a) = u \left( w - \frac{a}{R} \right) \qquad ((w, a) \in \Gsf). $$ The stochastic kernel is $P(w, a, w') = \1\{w' = a\}$. This just states that next period wealth $w'$ is equal to the action $a$ with probability one. ``` (sss-jsmdp)= #### Example: Optimal Stopping The optimal stopping problem we studied in {prf:ref}`c-opt_stop` can be framed as an MDP. On one hand, doing so allows us to apply results obtained for MDPs to optimal stopping. On the other hand, expressing an optimal stopping problem as an MDP requires an additional state variable, which complicates the exposition. The next exercise helps to illustrate the key ideas. ```{prf:remark} While readers interested in the connection between optimal stopping and MDPs will benefit from this section, others can freely skip to {ref}`ss-fmdpo` without losing continuity. Later, in {prf:ref}`c-rdps`, we will show that optimal stopping problems can be embedded in a very general framework (which includes MDPs) without adding extra state variables. ``` Let's focus on the job search problem with Markov state discussed in {ref}`ss-jsms` (although the arguments for the general optimal stopping problem in {ref}`sss-tsp` are very similar). As before, $\Wsf$ is the set of wage outcomes. Since we need the symbol $P$ for other purposes, we let $Q$ be the Markov matrix for wages, so that $(W_t)_{t\geq 0}$ is $Q$-Markov on $\Wsf$. To express the job search problem as an MDP, let $\Xsf = \{0,1\} \times \Wsf$ be a state space whose typical element is $(e, w)$, with $e$ representing either unemployment ($e=0$) or employment ($e=1$) and $w$ being the current wage offer. An action $a \in \Asf \coloneq \{0, 1\}$ indicates rejection or acceptance of the current wage offer. 
```{exercise} :label: ex-mdps-auto-4 Express the job search problem as an MDP, with state space $\Xsf$ and action space $\Asf$ as described in the previous paragraph. ``` ```{solution} ex-mdps-auto-4 To impose that workers never leave the firm, we require $a \geq e$. Thus, the feasible correspondence is $$ \Gamma(x) = \Gamma(e, w) = \setntn{a \in \{0, 1\}}{a \geq e} . $$ The set of feasible state action pairs is $\Gsf = \setntn{ ((e, w), a) \in \Xsf \times \Asf}{a \geq e}$. The reward function is $$ r(x,a) = r((e, w), a) = a w + (1-a) c. $$ Regarding the stochastic kernel, we need to define state transition probabilities for all feasible state action pairs. Letting $P[(e, w), a, (e', w')]$ be the probability of transitioning to state $(e', w')$ given current state $(e,w)$ and current action $a \geq e$, we set $$ P[(0, w), a, (e', w')] = \1\{e'=a\} \cdot [ \, a \1\{w' = w\} + (1-a) Q(w, w') \, ] $$ (eq-powa) and $P[(1, w), 1, (e', w')] = \1\{e'=1\} \1\{w' = w\}$. Equation {eq}`eq-powa` says that if $a=0$ then $e'=0$ and the next wage is drawn from $Q(w, \cdot)$, whereas if $a=1$ then $e'=1$ and the next wage is $w$. You can verify that $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$. To double-check that these definitions work, we can verify that they lead to the same Bellman equations that we saw in {ref}`ss-jsms`. Under the definitions of $\Gamma$, $r$, and $P$ just provided, we have $v(1, w) = w + \beta \EE v(1, w)$. This implies that $v(1, w) = w/(1-\beta)$, which is what we expect for lifetime value of an agent employed with wage $w$. Moreover, the Bellman equation for $v(0, w)$ agrees with the one we obtained for an unemployed agent.
To see this when $e=0$, observe that the Bellman equation is $$ \begin{aligned} v(0, w) & = \max_{a \in \{0, 1\}} \left\{ a w + (1-a) c + \beta \sum_{(e', w')} v(e', w') P[(0, w), a, (e', w')] \right\} \\ & = \max_{a \in \{0, 1\}} \left\{ a w + (1-a) c + \beta \left[ a v(a, w) + (1-a) \sum_{w'} v(a, w') Q(w, w') \right] \right\}, \end{aligned} $$ where the second equation follows from {eq}`eq-powa`. (You can see this by checking the cases $a=0$ and $a=1$.) Rearranging and using $v(1, w) = w/(1-\beta)$ now gives $$ v(0, w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w'} \, v(0, w') Q(w, w') \right\}. $$ This is the Bellman equation for an unemployed agent from the job search problem we saw previously. ``` (ss-fmdpo)= ### Optimality In this section, we return to the general MDP setting of {ref}`sss-fsmdp`, define optimal policies and state our main optimality result. As was the case for job search, actions are governed by policies, which are maps from states to actions (see, in particular, {ref}`sss-policies`, where policies were introduced). (sss-fmdpv)= #### Policies and Lifetime Values Let $\mM = (\Gamma, \beta, r, P)$ be an MDP. The set of **feasible policies** corresponding to $\mM$ is $$ \Sigma \coloneq \setntn{\sigma \in \Asf^\Xsf} {\sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf}. $$ (eq-dmdp_fp) If we select a policy $\sigma$ from $\Sigma$, it is understood that we respond to state $X_t$ with action $A_t \coloneq \sigma(X_t)$ at every date $t$. As a result, the state evolves by drawing $X_{t+1}$ from $P(X_t, \sigma(X_t), \cdot)$ at each $t \geq 0$. In other words, $(X_t)_{t \geq 0}$ is $P_\sigma$-Markov when $$ P_\sigma(x, x') \coloneq P(x, \sigma(x), x') \qquad (x, x' \in \Xsf). $$ Note that $P_\sigma \in \mopx$. Fixing a policy "closes the loop" in the state transition process and defines a Markov chain for the state. Under the policy $\sigma$, rewards at state $x$ are $r(x, \sigma(x))$.
If $$ r_\sigma(x) \coloneq r(x, \sigma(x)) \quad \text{and} \quad \EE_x \coloneq \EE[ \; \cdot \; \given X_0 = x], $$ then the lifetime value of following $\sigma$ starting from state $x$ can be written as $$ v_\sigma (x) = \EE_x \sum_{t \geq 0} \beta^t r_\sigma(X_t) \quad \text{where } (X_t) \text{ is } P_\sigma \text{-Markov with } X_0 = x. $$ (eq-vssdef) Since $\beta < 1$, applying {prf:ref}`l-fgsd` to this expression yields $$ v_\sigma = \sum_{t \geq 0} \beta^t P_\sigma^t \, r_\sigma = (I - \beta P_\sigma)^{-1} \, r_\sigma . $$ (eq-vsigdcn) Analogous to the optimal stopping case, we call $v_\sigma$ the **$\sigma$-value function**. We also call $v_\sigma(x)$ the **lifetime value** of policy $\sigma$ conditional on initial state $x$. ```{exercise} :label: ex-mdps-auto-5 Prove that $v_1 \leq v_\sigma \leq v_2$ when $v_2 \coloneq \| r \|_\infty / (1-\beta)$ and $v_1 \coloneq -v_2$. ``` ```{solution} ex-mdps-auto-5 We need to show that $v_\sigma = (I-\beta P_\sigma)^{-1} r_\sigma$ obeys $v_1 \leq v_\sigma \leq v_2$ where $v_1, v_2$ are as defined in the exercise. Regarding the upper bound, let $\bar r \coloneq \| r \|_\infty$. Since $(I-\beta P_\sigma)^{-1} = \sum_{t \geq 0} (\beta P_\sigma)^t$ is order-preserving and $P_\sigma \1 = \1$, we have $$ (I-\beta P_\sigma)^{-1} \, r_\sigma \leq (I-\beta P_\sigma)^{-1} \, \bar r \, \1 = \bar r \sum_{t \geq 0} (\beta P_\sigma)^t \1 = \frac{\bar r}{1 - \beta} \1 = v_2. $$ A similar argument shows that $v_1 \leq v_\sigma$. ``` Another way to compute $v_\sigma$ is to use the **policy operator** $T_\sigma$ corresponding to $\sigma$, which is defined at $v \in \RR^\Xsf$ by $$ (T_\sigma \, v)(x) = r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x') \qquad (x \in \Xsf). $$ (eq-mdpts) ($T_\sigma$ is analogous to the policy operator defined for the optimal stopping problem in {ref}`sss-os_po`.) In vector notation, $$ T_\sigma \, v = r_\sigma + \beta P_\sigma \, v. $$ The next exercise shows how $T_\sigma$ can be put to work. ```{exercise} :label: ex-vsits Fixing $\sigma$ in $\Sigma$, prove that 1.
$T_\sigma$ is an order-preserving self-map on $\RR^\Xsf$, 2. $T_\sigma$ is a contraction on $\RR^\Xsf$ of modulus $\beta$ under the norm $\| \cdot \|_\infty$, 3. the $\sigma$-value function $v_\sigma$ is the unique fixed point of $T_\sigma$ in $\RR^\Xsf$, and 4. $T_\sigma^k v \to v_\sigma$ as $k \to \infty$ for all $v \in \RR^\Xsf$. ``` ```{solution} ex-vsits Fix $\sigma \in \Sigma$. It is obvious that $T_\sigma$ is a self-map on $\RR^\Xsf$. It is also order-preserving, since $v \leq w$ implies $P_\sigma v \leq P_\sigma w$ and hence $T_\sigma v \leq T_\sigma w$. Also, $T_\sigma$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ under the supremum norm, since, for any $v, w$ in $\RR^\Xsf$ we have $$ \begin{aligned} |(T_\sigma v)(x) -(T_\sigma w)(x)| & = \beta \, \left| \sum_{x'} P(x, \sigma(x), x') v(x') - \sum_{x'} P(x, \sigma(x), x') w(x') \right| \\ & \leq \sum_{x'} P(x, \sigma(x), x') \beta \, \left| v(x') - w(x') \right| \leq \beta \| v - w\|_\infty. \end{aligned} $$ Taking the supremum over all $x \in \Xsf$ yields the desired result. This contraction property combined with Banach's fixed-point theorem implies that $T_\sigma$ has a unique fixed point and that $T_\sigma^k v$ converges to this fixed point for every $v \in \RR^\Xsf$. Now suppose that $v$ is the unique fixed point of $T_\sigma$. Then $v = r_\sigma + \beta P_\sigma v$. Since $I - \beta P_\sigma$ is invertible, this gives $v = (I-\beta P_\sigma)^{-1} r_\sigma$. Hence $v = v_\sigma$. This establishes all claims in the exercise. ``` Computationally, this means that we can pick $v \in \RR^\Xsf$ and iterate with $T_\sigma$ to obtain an approximation to $v_\sigma$. ```{exercise} :label: ex-iczpi Prove that, when the initial condition for iteration is $v \equiv 0 \in \RR^\Xsf$, the $k$-th iterate $T_\sigma^k v$ is equal to the truncated sum $\sum_{t = 0}^{k-1} \beta^t P_\sigma^t r_\sigma$. ``` ```{prf:remark} To compute $v_\sigma$, should we use the expression $(I - \beta P_\sigma)^{-1} \, r_\sigma$ in {eq}`eq-vsigdcn` or iterate with $T_\sigma$? For small state spaces, the first option is typically faster.
However, it is easy to write down dynamic programming problems where $\Xsf$ is very large (see, e.g., {prf:ref}`eg-retail2`). If, say, $\Xsf$ has $10^9$ elements, then $I - \beta \, P_\sigma$ is $10^9 \times 10^9$. Matrices of this size are difficult to invert -- or even store in memory. In such settings, iterating with $T_\sigma$ might be preferred. ``` The next exercise extends {prf:ref}`ex-iczpi` and aids interpretation of policy operators. It tells us that $(T_\sigma^k \, v)(x)$ is the payoff from following policy $\sigma$ and starting in state $x$ when lifetime is truncated to the finite horizon $k$ and $v$ provides a terminal payoff in each state. ```{exercise} :label: ex-approx Fix $\sigma \in \Sigma$ and let $(X_t)$ be $P_\sigma$-Markov with initial condition $x \in \Xsf$. Prove that, for given $v \in \RR^\Xsf$, and $k \in \NN$, we have $$ (T_\sigma^k \, v)(x) = \EE_x \left[ \sum_{t=0}^{k-1} \beta^t r(X_t, \sigma(X_t)) + \beta^k v(X_k) \right]. $$ ``` (sss-mdpdefop)= #### Defining Optimality Given MDP $\mM = (\Gamma, \beta, r, P)$ with $\sigma$-value functions $\{v_\sigma\}_{\sigma \in \Sigma}$, the **value function** corresponding to $\mM$ is defined as $v^* \coloneq \vee_{\sigma \in \Sigma} \, v_\sigma$, where, as usual, the maximum is pointwise. More explicitly, $$ v^*(x) = \max_{\sigma \in \Sigma} v_\sigma(x) \qquad (x \in \Xsf). $$ (eq-vstar0) This is consistent with our definition of the value function in the optimal stopping case. It is the maximal lifetime value we can extract from each state using feasible behavior. The maximum in {eq}`eq-vstar0` exists at each $x$ because $\Sigma$ is finite. A policy $\sigma \in \Sigma$ is called **optimal** for $\mM$ if $v_\sigma = v^*$. In other words, a policy is optimal if its lifetime value is maximal at each state. ```{prf:example} Consider again Figure {numref}`f-v_star_illus`, supposing that $\Sigma = \{\sigma', \sigma''\}$. 
As drawn, there is no optimal policy, since $v^*$ differs from both $v_{\sigma'}$ and $v_{\sigma''}$. Below, in {prf:ref}`p-dmdp_o`, we will show that such an outcome is *not* possible for MDPs. ``` Our optimality results are easier to follow with some additional terminology. To start, given $v \in \RR^\Xsf$, we define a policy $\sigma \in \Sigma$ to be **$v$-greedy** if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \quad \text{for all } x \in \Xsf. $$ (eq-ncdmc) In essence, a $v$-greedy policy treats $v$ as the correct value function and sets all actions accordingly. ```{exercise} :label: ex-mdps-auto-6 Fix $v \in \RR^\Xsf$. Prove that the set $\{T_\sigma \, v\}_{\sigma \in \Sigma}$ has a least and greatest element. ``` ```{solution} ex-mdps-auto-6 Fix $v \in \RR^\Xsf$ and take $\hat \sigma$ to be $v$-greedy, so that $$ \hat \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \quad \text{for all } x \in \Xsf . $$ (eq-ncdmca) If $\sigma$ is any other feasible policy, then $$ r(x, \hat \sigma(x)) + \beta \sum_{x'} v(x') P(x, \hat \sigma(x), x') \geq r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x') $$ at all $x$. In operator form, this is $T_{\hat \sigma} \, v \geq T_\sigma \, v$. Since $\sigma$ is an arbitrary feasible policy, we have shown that $T_{\hat \sigma} \, v$ is the greatest element of $\{T_\sigma \, v\}_{\sigma \in \Sigma}$. A similar argument replacing argmax with argmin in {eq}`eq-ncdmca` shows that a least element also exists. ``` **Bellman's principle of optimality** is said to hold for the MDP $\mM$ if $$ \sigma \in \Sigma \text{ is optimal for } \mM \quad \iff \quad \sigma \text{ is } v^*\text{-greedy}. $$ The **Bellman operator** corresponding to $\mM$ is the self-map $T$ on $\RR^\Xsf$ defined by $$ (Tv)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \qquad (x \in \Xsf).
$$ (eq-fsmdp_bellop) Obviously, $Tv=v$ if and only if $v$ satisfies the Bellman equation {eq}`eq-fsmdp_bell`. ```{div} :name: ex-egmdps ``` ```{exercise} :label: ex-boeq Given $v \in \RR^\Xsf$, prove that 1. at least one $v$-greedy policy exists, 2. $\sigma \in \Sigma$ is $v$-greedy if and only if $T_\sigma \, v = Tv$, and 3. $(Tv)(x) = \max_{\sigma \in \Sigma} (T_\sigma \, v)(x)$ for all $x \in \Xsf$. ``` ```{solution} ex-boeq Fix $v \in \RR^\Xsf$. Part (i) follows from the fact that $\Gamma(x)$ is finite and nonempty at each $x \in \Xsf$. Hence we can select an element $a^*_x$ from the argmax in the definition of a $v$-greedy policy at each $x$ in $\Xsf$. The resulting policy is $v$-greedy. For part (ii) we need to show that $\sigma \in \Sigma$ is $v$-greedy if and only if $$ r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x') = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} $$ for all $x \in \Xsf$. But this is immediate from the definition. Regarding part (iii), it follows from the definitions that $(T_\sigma \, v)(x) \leq (Tv)(x)$ for all $x \in \Xsf$. At the same time, for any $v$-greedy $\sigma \in \Sigma$, we have $(T_\sigma \, v)(x) = (Tv)(x)$ for all $x$. Hence $Tv = \vee_\sigma \, T_\sigma \, v$, as was to be shown. ``` The last part of {prf:ref}`ex-boeq` tells us that $T$ is the pointwise maximum of $\{T_\sigma\}_{\sigma \in \Sigma}$, which can be expressed as $T = \vee_\sigma \, T_\sigma$. Figure {numref}`f-bellman_envelope` illustrates this relationship in one dimension. ```{figure} ../figures/bellman_envelope.pdf :name: f-bellman_envelope $T$ is the pointwise maximum of $\{T_\sigma\}_{\sigma \in \Sigma}$ (one-dimensional setting) ``` ```{exercise} :label: ex-tcmdp Prove: $T$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ under norm $\| \cdot \|_\infty \,$. ``` ```{solution} ex-tcmdp This result follows from {prf:ref}`l-supbc`. 
For the sake of the exercise, we also provide a direct proof: Fix $v, w \in \RR^\Xsf$ and $x \in \Xsf$. By {prf:ref}`ex-boeq` and the max-inequality lemma, we have $$ \begin{aligned} |(T v)(x) -(T w)(x)| & = \left| \max_{\sigma \in \Sigma} (T_\sigma \, v)(x) - \max_{\sigma \in \Sigma} (T_\sigma \, w)(x) \right| \\ & \leq \max_{\sigma \in \Sigma} \left| (T_\sigma \, v)(x) - (T_\sigma \, w)(x) \right| \leq \max_{\sigma \in \Sigma} \| T_\sigma \, v - T_\sigma \, w\|_\infty. \end{aligned} $$ Applying contractivity of each $T_\sigma$ ({prf:ref}`ex-vsits`) and then taking the maximum over $x \in \Xsf$, we get $\| Tv - Tw \|_\infty \leq \beta \| v - w\|_\infty$. ``` #### Optimality Theory We can now state our main optimality result for MDPs. ```{prf:proposition} :label: p-dmdp_o If $\mM = (\Gamma, \beta, r, P)$ is an MDP with value function $v^*$ and Bellman operator $T$, then 1. $v^*$ is the unique solution to the Bellman equation in $\RR^\Xsf$, 2. $\lim_{k \to \infty} T^k v =v^*$ for all $v \in \RR^\Xsf$, 3. Bellman's principle of optimality holds for $\mM$, and 4. at least one optimal policy exists. ``` While {prf:ref}`p-dmdp_o` is a special case of later results (see {ref}`sss-rdp_opres`), a direct proof is not difficult and we now provide one for interested readers. ```{prf:proof} *Proof of {prf:ref}`p-dmdp_o`.* In {prf:ref}`ex-tcmdp` we showed that $T$ is a contraction mapping on the closed set $\RR^\Xsf$. Hence $T$ is globally stable on $\RR^\Xsf$ and therefore has a unique fixed point $\bar v \in \RR^\Xsf$. Our first claim is that $\bar v = v^*$. We show $\bar v \leq v^*$ and then $\bar v \geq v^*$. For the first inequality, let $\sigma \in \Sigma$ be $\bar v$-greedy. Recalling {prf:ref}`ex-boeq`, we have $T_\sigma\, \bar v = T \bar v = \bar v$. Hence $\bar v$ is also a fixed point of $T_\sigma$. But the only fixed point of $T_\sigma$ in $\RR^\Xsf$ is $v_\sigma$, so $\bar v = v_\sigma$. But then $\bar v \leq v^*$, since, by definition, $v^* = \vee_\sigma \, v_\sigma$. This is our first inequality.
As for the second inequality, fix $\sigma \in \Sigma$ and observe that $T_\sigma \, v \leq T v$ for all $v \in \RR^\Xsf$. Since $T$ is order-preserving and globally stable, {prf:ref}`p-ofpds` implies that $v_\sigma \leq \bar v$. Taking the maximum over $\sigma \in \Sigma$ yields $v^* \leq \bar v$. Combining the two inequalities gives $\bar v = v^*$, so $v^*$ is a fixed point of $T$ in $\RR^\Xsf$. Since $T$ is globally stable on $\RR^\Xsf$, the remaining claims in parts (i)--(ii) follow immediately. As for part (iii), it follows from {prf:ref}`ex-boeq` and part (i) of this proposition that $$ \sigma \text{ is } v^* \text{-greedy} \quad \iff \quad T_\sigma \, v^* = T v^* \quad \iff \quad T_\sigma \, v^* = v^*. $$ The right hand side of this expression tells us that $v^*$ is a fixed point of $T_\sigma$. But the only fixed point of $T_\sigma$ is $v_\sigma$, so the right hand side is equivalent to the statement $v_\sigma = v^*$. Hence, by this chain of logic and the definition of optimality, $$ \sigma \text{ is } v^* \text{-greedy} \iff v^* = v_\sigma \iff \text{ } \sigma \text{ is optimal}. $$ (eq-eqwb) Hence (iii) holds. Part (iv) is left to {prf:ref}`ex-iiiiv`. ◻ ``` ```{exercise} :label: ex-iiiiv Prove that, in {prf:ref}`p-dmdp_o`, (iii) implies (iv). ``` ```{solution} ex-iiiiv Part (iii) of {prf:ref}`p-dmdp_o` implies (iv) because every $v \in \RR^\Xsf$ has at least one greedy policy ({prf:ref}`ex-egmdps`). In particular, at least one $v^*$-greedy policy exists. ``` Figure {numref}`f-optimality_illustration_1` illustrates {prf:ref}`p-dmdp_o` in an abstract case, where $\Xsf$ is a singleton $\{x\}$. We write $v$ instead of $v(x)$ for the value of state $x$ and place $v$ on the horizontal axis. In the figure, the set of policies is $\Sigma = \{\sigma', \sigma''\}$. For given $\sigma \in \Sigma$, the map $T_\sigma$ is an affine function $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$ and the fixed point is $v_\sigma$.
The Bellman operator $T$ is the upper envelope of the functions $\{T_\sigma\}$, as shown in (iii) of {prf:ref}`ex-boeq`. By definition, 1. $v^*$ is the largest of these fixed points, which equals $v_{\sigma''}$, and 2. $\sigma''$ is the optimal policy, since $v_{\sigma''} = v^*$. In accordance with {prf:ref}`p-dmdp_o`, $v^*$ is also the fixed point of the Bellman operator. ```{figure} ../figures/optimality_illustration_1.pdf :name: f-optimality_illustration_1 Illustration of optimality for MDPs ``` It is important to understand the significance of (iii) in {prf:ref}`p-dmdp_o`. Greedy policies are relatively easy to compute, in the sense that solving {eq}`eq-ncdmc` at each $x$ is easier than trying to directly solve the problem of maximizing lifetime value, since $\Sigma$ is in general far larger than $\Gamma(x)$. Part (iii) tells us that solving the overall problem reduces to computing a $v$-greedy policy with the right choice of $v$. That choice is the value function $v^*$. Intuitively, $v^*$ assigns a "correct" value to each state, in the sense of maximal lifetime value the controller can extract, so using $v^*$ to calculate greedy policies leads to the optimal outcome. (ss-fmdpsal)= ### Algorithms In previous chapters we solved job search and optimal stopping problems using value function iteration. In this section, we present a generalization suitable for arbitrary MDPs and then discuss two important alternatives. (sss-vfi)= #### Value Function Iteration **Value function iteration** **(VFI)** for MDPs is similar to VFI for the job search model: We use successive approximation on $T$ to compute an approximation $v_k$ to the value function $v^*$ and then take a $v_k$-greedy policy. The general procedure is given by {prf:ref}`algo-fsvfi`.
```{prf:algorithm} VFI for MDPs :label: algo-fsvfi - input $v_0 \in \RR^\Xsf$, an initial guess of $v^*$ - input $\tau$, a tolerance level for error - $\epsilon \leftarrow +\infty$ and $k \leftarrow 0$ - while $\epsilon > \tau $: - $v_{k+1} \leftarrow Tv_k$ - $\epsilon \leftarrow \| v_k - v_{k+1} \|_\infty$ - $k \leftarrow k + 1$ - **return** a $v_k$-greedy policy $\sigma$ ``` The fact that the sequence $(v_k)_{k \geq 0}$ produced by VFI converges to $v^*$ is immediate from {prf:ref}`p-dmdp_o` (as the tolerance $\tau$ is taken toward zero). It is also true that the greedy policy produced in the last step is approximately optimal when $\tau$ is small, and exactly optimal when $k$ is sufficiently large. Proofs are given in {prf:ref}`c-rdps`, where we examine VFI in a more general setting. VFI is robust, easy to understand and easy to implement. These properties explain its enduring popularity. At the same time, in terms of efficiency, VFI is often dominated by alternative algorithms that we now describe. (sss-hpi)= #### Howard Policy Iteration Unlike VFI, **Howard policy iteration (HPI)** computes optimal policies by iterating between computing the value of a given policy and computing the greedy policy associated with that value. The full technique is described in {prf:ref}`algo-fshpi`. ```{prf:algorithm} HPI for MDPs :label: algo-fshpi - input $\sigma \in \Sigma$ - $v_0 \leftarrow v_\sigma$ and $k \leftarrow 0$ - repeat: - $\sigma_k \leftarrow $ a $v_k$-greedy policy - $v_{k+1} \leftarrow (I - \beta P_{\sigma_k} )^{-1} r_{\sigma_k}$ - if $v_{k+1} = v_k$: break - $k \leftarrow k + 1$ - return $\sigma_k$ ``` A visualization of HPI is given in Figure {numref}`f-flint`, where $\sigma$ is the initial choice. Next, we compute the lifetime value $v_\sigma$, and then the $v_\sigma$-greedy policy $\sigma'$, and so on. The computation of lifetime value is called the **policy evaluation** step, whereas the computation of greedy policies is called **policy improvement**. 
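The two algorithms above can be sketched in a few lines of Python. This is a minimal illustration rather than the book's implementation: the two-state MDP, the helper names (`q`, `greedy`, `solve`, `policy_value`), and the assumption that every action is feasible in every state are ours. In `hpi` we stop when the greedy policy repeats, which is a floating-point-friendly stand-in for the test $v_{k+1} = v_k$.

```python
beta = 0.95
X, A = range(2), range(2)            # states and actions (all actions assumed feasible)
P = [[[0.9, 0.1], [0.2, 0.8]],       # P[x][a][y]: transition probabilities (illustrative)
     [[0.5, 0.5], [0.1, 0.9]]]
r = [[1.0, 0.0],                     # r[x][a]: current rewards (illustrative)
     [2.0, 0.5]]


def q(x, a, v):
    """r(x, a) + beta * sum over y of v(y) P(x, a, y)."""
    return r[x][a] + beta * sum(v[y] * P[x][a][y] for y in X)


def T(v):
    """The Bellman operator."""
    return [max(q(x, a, v) for a in A) for x in X]


def greedy(v):
    """A v-greedy policy (ties go to the smallest action index)."""
    return [max(A, key=lambda a: q(x, a, v)) for x in X]


def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for i in range(n):
            if i != col:
                f = aug[i][col] / aug[col][col]
                aug[i] = [s - f * t for s, t in zip(aug[i], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def policy_value(sigma):
    """Policy evaluation: solve (I - beta P_sigma) v = r_sigma."""
    lhs = [[(x == y) - beta * P[x][sigma[x]][y] for y in X] for x in X]
    return solve(lhs, [r[x][sigma[x]] for x in X])


def vfi(v, tol=1e-10):
    """Iterate with T until successive iterates are within tol, then take a greedy policy."""
    while True:
        v_new = T(v)
        if max(abs(s - t) for s, t in zip(v, v_new)) <= tol:
            return greedy(v_new)
        v = v_new


def hpi(sigma):
    """Alternate policy evaluation and policy improvement until the policy repeats."""
    while True:
        v = policy_value(sigma)
        sigma_new = greedy(v)
        if sigma_new == sigma:
            return sigma, v
        sigma = sigma_new


sigma_vfi = vfi([0.0, 0.0])
sigma_hpi, v_hpi = hpi([0, 0])
print(sigma_vfi, sigma_hpi, v_hpi)
```

In this toy problem both routines return the same policy, and the value returned by `hpi` is a fixed point of `T` up to rounding error. For large state spaces one would replace `solve` with a proper linear algebra routine.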
```{figure} figures/flint.svg :name: f-flint HPI algorithm ``` HPI has two very attractive features. One is that, in a finite state setting, the algorithm always converges to an exact optimal policy in a finite number of steps, regardless of the initial condition. We prove this fact in a more general setting in {prf:ref}`c-rdps`. The second is that the rate of convergence is faster than VFI, as will be shown in {ref}`sss-hpin`. Figure {numref}`f-howard_newton_1` gives another illustration, presented in the one-dimensional setting that we used for Figure {numref}`f-optimality_illustration_1`. In this illustration, we imagine that there are many feasible policies, and hence many functions in $\{T_\sigma\}$, so that their upper envelope, which is the Bellman operator, becomes a smoother curve. The figure shows the update from $v_\sigma$ to the next lifetime value $v_{\sigma'}$, via the following two steps: 1. Take $\sigma'$ to be $v_\sigma$-greedy, which means that $T_{\sigma'} v_\sigma = T v_\sigma$ (see {prf:ref}`ex-boeq`). 2. Take $v_{\sigma'}$ to be the fixed point of $T_{\sigma'}$. The next step, from $v_{\sigma'}$ to $v_{\sigma''}$, is analogous. Comparison of this figure with Figure {numref}`f-newton_1` suggests that HPI is an implementation of Newton's method, applied to the Bellman operator. We confirm this in {ref}`sss-hpin`. ```{figure} ../figures/howard_newton_1.pdf :name: f-howard_newton_1 HPI as a version of Newton's method ``` (sss-hpin)= #### HPI as Newton Iteration In discussing the connection between HPI and Newton iteration, one issue is that $T$ is not always differentiable, as seen in Figure {numref}`f-optimality_illustration_1`. But $T$ is convex, and this lets us substitute *subgradients* for derivatives. Once we make this modification, HPI and Newton iteration are identical, as we now show.
First, recall that, given a self-map $T$ from $S \subset \RR^n$ to itself, an $n \times n$ matrix $D$ is called a **subgradient** of $T$ at $v \in S$ if $$ Tu \geq Tv + D (u - v) \quad \text{for all } u \in S. $$ (eq-subgrad) Figure {numref}`f-subgrad` illustrates the definition in one dimension, where $D$ is just a scalar determining the slope of a tangent line at $v$. In the left subfigure, $T_1$ is convex and differentiable at $v$, which means that only one subgradient exists (since any other choice of slope implies that the inequality in {eq}`eq-subgrad` will fail for some $u$). In the right subfigure, $T_2$ is convex but nondifferentiable at $v$, so multiple subgradients exist. ```{figure} figures/subgrad.pdf :name: f-subgrad Subgradients of convex functions ``` In the next result, we take $(\Gamma, \beta, r, P)$ to be a given MDP and let $T$ be the associated Bellman operator. ```{prf:lemma} If $v \in \RR^\Xsf$ and $\sigma \in \Sigma$ is $v$-greedy, then $\beta P_\sigma$ is a subgradient of $T$ at $v$. ``` ```{prf:proof} Fix $v \in \RR^\Xsf$ and let $\sigma$ be $v$-greedy. Using $T \geq T_\sigma$ and $T_\sigma \, v = T v$, we have $$ Tu = Tv + Tu - Tv \geq Tv + T_\sigma \, u - T_\sigma \, v. $$ Applying the definition of $T_\sigma$ now gives $$ Tu \geq Tv + \beta P_\sigma u - \beta P_\sigma v = Tv + \beta P_\sigma (u - v). $$ Hence $\beta P_\sigma$ is a subgradient of $T$ at $v$, as claimed. ◻ ``` Now let's consider Newton's method applied to the problem of finding the fixed point of $T$. Since $T$ is nondifferentiable and convex, we replace the Jacobian in Newton's method (see {eq}`eq-newq`) with the subgradient. This leads us to iterate on $$ v_{k+1} = Qv_k \quad \text{where} \quad Qv \coloneq (I - \beta P_\sigma)^{-1} (Tv - \beta P_\sigma v). $$ In the definition of $Q$, the policy $\sigma$ is $v$-greedy. 
Using $Tv = T_\sigma v = r_\sigma + \beta P_\sigma v$, the map $Q$ reduces to $Qv = (I - \beta P_\sigma)^{-1} r_\sigma$, which is exactly the update step to produce the next $\sigma$-value function in HPI (i.e., the lifetime value of a $v$-greedy policy). The fact that HPI is a version of Newton's method suggests that its iterates $(v_k)_{k \geq 0}$ enjoy quadratic convergence. This is indeed the case: Under mild conditions, one can show there exists a constant $N$ such that, for all $k \geq 0$, $$ \| v_{k+1} - v^* \| \leq N \| v_k - v^* \|^2 $$ (eq-hpirate) (see, e.g., {cite}`puterman2005markov`, Theorem 6.4.8). Hence HPI enjoys both a fast convergence rate and the robustness of global convergence. However, HPI is not always optimal in terms of efficiency, since the size of the constant term in {eq}`eq-hpirate` also matters. This term can be large because, at each step, the update from $v_\sigma$ to $v_{\sigma'}$ requires computing the exact lifetime value $v_{\sigma'}$ of the $v_\sigma$-greedy policy $\sigma'$. Computing this fixed point exactly can be computationally expensive in high dimensions. One way around this issue is to forgo computing the fixed point $v_{\sigma'}$ exactly, replacing it with an approximation. Section {ref}`sss-opi` takes up this idea. (sss-opi)= #### Optimistic Policy Iteration Optimistic policy iteration (OPI) is an algorithm that borrows from both VFI and HPI. In essence, the algorithm is the same as HPI except that, instead of computing the full value $v_\sigma$ of a given policy, the approximation $T_\sigma^m v$ from {prf:ref}`ex-approx` is used instead. {prf:ref}`algo-mdp_opi` clarifies.
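The idea just described, replacing exact policy evaluation with $m$ applications of the policy operator, can be sketched in Python as follows. This is an illustrative sketch under our own assumptions: the two-state MDP and the helper names are hypothetical, every action is assumed feasible in every state, and iteration starts from $v \equiv 0$ rather than from $v_\sigma$ (as the Julia routine `optimistic_policy_iteration` in {numref}`list-finite_opt_saving_2` also does).

```python
beta = 0.95
X, A = range(2), range(2)            # states and actions (all actions assumed feasible)
P = [[[0.9, 0.1], [0.2, 0.8]],       # P[x][a][y]: transition probabilities (illustrative)
     [[0.5, 0.5], [0.1, 0.9]]]
r = [[1.0, 0.0],                     # r[x][a]: current rewards (illustrative)
     [2.0, 0.5]]


def q(x, a, v):
    """r(x, a) + beta * sum over y of v(y) P(x, a, y)."""
    return r[x][a] + beta * sum(v[y] * P[x][a][y] for y in X)


def T(v):
    """The Bellman operator."""
    return [max(q(x, a, v) for a in A) for x in X]


def T_sigma(v, sigma):
    """The policy operator for the policy sigma."""
    return [q(x, sigma[x], v) for x in X]


def greedy(v):
    """A v-greedy policy (ties go to the smallest action index)."""
    return [max(A, key=lambda a: q(x, a, v)) for x in X]


def opi(m, tol=1e-10):
    """Optimistic policy iteration with m applications of T_sigma per step."""
    v = [0.0 for _ in X]
    while True:
        sigma = greedy(v)                 # policy improvement
        v_new = v
        for _ in range(m):                # approximate policy evaluation
            v_new = T_sigma(v_new, sigma)
        if max(abs(s - t) for s, t in zip(v, v_new)) <= tol:
            return sigma, v_new
        v = v_new


sigma_1, v_1 = opi(m=1)      # with m = 1, each step is one application of T
sigma_50, v_50 = opi(m=50)   # with larger m, each step is closer to an HPI update
print(sigma_1, sigma_50)
```

With `m=1` each update equals `T(v)`, because `T_sigma(v, greedy(v))` and `T(v)` coincide. In this toy problem both choices of `m` lead to the same policy.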
```{prf:algorithm} OPI for MDPs :label: algo-mdp_opi - input $m \in \NN$ and tolerance $\tau \geq 0$ - input $\sigma \in \Sigma$ and set $v_0 \leftarrow v_\sigma$ - $k \leftarrow 0$ - repeat: - $\sigma_k \leftarrow $ a $v_k$-greedy policy - $v_{k+1} \leftarrow T_{\sigma_k}^m v_k$ - if $\| v_{k+1} - v_k \| \leq \tau$: break - $k \leftarrow k + 1$ - return $\sigma_k$ ``` In the algorithm, the policy operator $T_{\sigma_k}$ is applied $m$ times to generate an approximation of $v_{\sigma_k}$. The constant step size $m$ can also be replaced with a sequence $(m_k) \subset \NN$. In either case, for MDPs, convergence to an optimal policy is guaranteed. We prove this in a more general setting in {prf:ref}`c-rdps`. Notice that, as $m \to \infty$, the algorithm increasingly approximates HPI, since $T_{\sigma_k}^m v_k$ converges to $v_{\sigma_k}$. At the same time, if $m=1$, the algorithm reduces to VFI. This follows from {prf:ref}`ex-boeq`, which tells us that, when $\sigma_k$ is $v_k$-greedy, $T_{\sigma_k} v_k = T v_k$. Hence, with intermediate $m$, OPI can be seen as a "convex combination" of HPI and VFI. In almost all dynamic programming applications, there exist choices of $m > 1$ such that OPI converges faster than VFI. We investigate these ideas in the applications. In some cases, there exist values of $m$ such that OPI dominates HPI. However, this depends on the structure of the problem and the software and hardware platforms being employed -- see {ref}`sss-para` and the applications for additional discussion. ## Applications This section gives several applications of the MDP model to economic problems. The applications illustrate the ease with which MDPs can be implemented on a computer (provided that the state and action spaces are not too large). (ss-ip)= ### Optimal Inventories In {ref}`sss-mcss` we studied a firm whose inventory behavior was specified to follow S--s dynamics.
In {ref}`sss-oim` we introduced a model where inventory behavior is endogenous, determined by the desire to maximize firm value. In this section, we show that this endogenous inventory behavior can replicate the S--s dynamics from {ref}`sss-mcss`.

We saw in {ref}`sss-oim` that the optimal inventory model is an MDP, so the {prf:ref}`p-dmdp_o` optimality and convergence results apply. In particular, the unique fixed point of the Bellman operator is the value function $v^*$, and a policy $\sigma^*$ is optimal if and only if $\sigma^*$ is $v^*$-greedy.

We solve the model numerically using VFI. As in {prf:ref}`ex-i_sk`, we take $\phi$ to be the geometric distribution on $\ZZ_+$ with parameter $p$. We use the default parameter values shown in {numref}`list-inventory_dp`. The code listing also presents an implementation of the Bellman operator.

```{code-block} julia
:name: list-inventory_dp
:caption: Solving the optimal inventory model (`inventory_dp.jl`)
:linenos:

using Distributions

f(x, a, d) = max(x - d, 0) + a  # Inventory update

function create_inventory_model(; β=0.98,      # discount factor
                                  K=40,        # maximum inventory
                                  c=0.2, κ=2,  # cost parameters
                                  p=0.6)       # demand parameter
    ϕ(d) = (1 - p)^d * p   # demand distribution
    x_vals = collect(0:K)  # set of inventory levels
    return (; β, K, c, κ, p, ϕ, x_vals)
end

"The function B(x, a, v) = r(x, a) + β Σ_x′ v(x′) P(x, a, x′)."
function B(x, a, v, model; d_max=100)
    (; β, K, c, κ, p, ϕ, x_vals) = model
    revenue = sum(min(x, d) * ϕ(d) for d in 0:d_max)
    current_profit = revenue - c * a - κ * (a > 0)
    next_value = sum(v[f(x, a, d) + 1] * ϕ(d) for d in 0:d_max)
    return current_profit + β * next_value
end

"The Bellman operator."
function T(v, model)
    (; β, K, c, κ, p, ϕ, x_vals) = model
    new_v = similar(v)
    for (x_idx, x) in enumerate(x_vals)
        Γx = 0:(K - x)
        new_v[x_idx], _ = findmax(B(x, a, v, model) for a in Γx)
    end
    return new_v
end
```

Figure {numref}`f-inventory_dp_vs` exhibits an approximation of the value function $v^*$, computed by iterating with $T$ starting at $v \equiv 1$. Figure {numref}`f-inventory_dp_vs` also shows the approximate optimal policy, obtained as a $v^*$-greedy policy:

$$
    \sigma^*(x) \in \argmax_{a \in \Gamma(x)}
    \left\{
        r(x, a) + \beta \sum_{d \geq 0} v^*(f(x, a, d)) \phi(d)
    \right\}.
$$

The plot of the optimal policy shows that there is a threshold region below which the firm orders large batches and above which the firm orders nothing. This makes sense, since the firm wishes to economize on the fixed cost of ordering. Figure {numref}`f-inventory_dp_ts` shows a simulation of inventory dynamics under the optimal policy, starting from $X_0 = 0$. The time path closely approximates the S--s dynamics discussed in {ref}`sss-mcss`.

```{figure} ../figures/inventory_dp_vs.pdf
:name: f-inventory_dp_vs

The value function and optimal policy for the inventory problem
```

```{figure} ../figures/inventory_dp_ts.pdf
:name: f-inventory_dp_ts

Optimal inventory dynamics
```

```{exercise}
:label: ex-mdps-auto-7

Compute the optimal policy by extending the code given in {numref}`list-inventory_dp`. Replicate Figure {numref}`f-inventory_dp_ts`, modulo randomness, by sampling from a geometric distribution and implementing the dynamics in {eq}`eq-i_lom`. At each $X_t$, the action $A_t$ should be chosen according to the optimal policy $\sigma^*(X_t)$.
```

(ss-osci)=
### Optimal Savings with Labor Income

(sss-osli)=
As our next example of an MDP, we modify the cake eating problem in {ref}`sss-cake` to add labor income.
Wealth evolves according to $$ W_{t+1} = R (W_t + Y_t - C_t) \qquad (t = 0, 1, \ldots), $$ (eq-cewd) where $(W_t)$ takes values in finite set $\Wsf \subset \RR_+$ and labor income $(Y_t)$ is a Markov chain on finite set $\Ysf \subset \RR_+$ with transition matrix $Q$.[^1] $R$ is a gross rate of interest, so that investing $d$ dollars today returns $Rd$ next period. Other parts of the problem are unchanged. The Bellman operator can be written as $$ (Tv)(w, y) = \max_{w' \in \Gamma(w, y)} \left\{ u \left( w + y - \frac{w'}{R} \right) + \beta \sum_{y'} v(w', y') Q(y, y') \right\}. $$ (eq-dos_bel) #### MDP Representation To frame this problem as an MDP, we set the state to $x \coloneq (w, y)$, representing current wealth and income, taking values in the state space $\Xsf \coloneq \Wsf \times \Ysf$. The action is savings $s$, which takes values in $\Wsf$ and equals $w'$. The feasible correspondence is the set of feasible savings values $$ \Gamma(w, y) = \setntn{s \in \Wsf}{s \leq R (w + y)}. $$ The current reward is utility of consumption $r((w, y), s) = u(w + y - s/R)$. The stochastic kernel is $$ P((w, y), s, (w', y')) = \1\{w' = s\} Q(y, y'). $$ Having framed the problem as an MDP, we can apply the optimality results in {prf:ref}`p-dmdp_o`. #### Implementation To implement the algorithms discussed in {ref}`ss-fmdpsal`, we use the Bellman operator {eq}`eq-dos_bel`, and the corresponding definition of a $v$-greedy policy, which is $$ \sigma(w, y) \in \argmax_{w' \in \Gamma(w, y)} \left\{ u \left( w + y - \frac{w'}{R} \right) + \beta \sum_{y'} v(w', y') Q(y, y') \right\}, $$ for all $(w, y)$. The policy operator for given $\sigma \in \Sigma$ is $$ (T_\sigma \, v)(w, y) = u \left( w + y - \frac{\sigma(w, y)}{R} \right) + \beta \sum_{y'} v(\sigma(w, y), y') Q(y, y'). $$ (eq-dos_pol) Code for implementing the model and these two operators is given in {numref}`list-finite_opt_saving_0`. Income is constructed as a discretized AR(1) process using the method from {ref}`ss-apta`.
Exponentiation is applied to the grid so that income takes positive values. The function `get_value` in {numref}`list-finite_opt_saving_1` uses the expression $v_\sigma = (I - \beta \, P_\sigma)^{-1} r_\sigma$ from {eq}`eq-vsigdcn` to obtain the value of a given policy $\sigma$. The matrix $P_\sigma$ and vector $r_\sigma$ take the form $$ \begin{aligned} P_\sigma((w, y), (w', y')) & = \1\{\sigma(w, y) = w'\} Q(y, y'), \\ r_\sigma(w, y) & = u(w + y - \sigma(w, y) / R). \end{aligned} $$ ```{code-block} julia :name: list-finite_opt_saving_0 :caption: Discrete optimal savings model (`finite_opt_saving_0.jl`) :linenos: using QuantEcon, LinearAlgebra, IterTools function create_savings_model(; R=1.01, β=0.98, γ=2.5, w_min=0.01, w_max=20.0, w_size=200, ρ=0.9, ν=0.1, y_size=5) w_grid = LinRange(w_min, w_max, w_size) mc = tauchen(y_size, ρ, ν) y_grid, Q = exp.(mc.state_values), mc.p return (; β, R, γ, w_grid, y_grid, Q) end "B(w, y, w′, v) = u(R*w + y - w′) + β Σ_y′ v(w′, y′) Q(y, y′)." function B(i, j, k, v, model) (; β, R, γ, w_grid, y_grid, Q) = model w, y, w′ = w_grid[i], y_grid[j], w_grid[k] u(c) = c^(1-γ) / (1-γ) c = w + y - (w′ / R) @views value = c > 0 ? u(c) + β * dot(v[k, :], Q[j, :]) : -Inf return value end "The Bellman operator." function T(v, model) w_idx, y_idx = (eachindex(g) for g in (model.w_grid, model.y_grid)) v_new = similar(v) for (i, j) in product(w_idx, y_idx) v_new[i, j] = maximum(B(i, j, k, v, model) for k in w_idx) end return v_new end "The policy operator." function T_σ(v, σ, model) w_idx, y_idx = (eachindex(g) for g in (model.w_grid, model.y_grid)) v_new = similar(v) for (i, j) in product(w_idx, y_idx) v_new[i, j] = B(i, j, σ[i, j], v, model) end return v_new end ``` ```{code-block} julia :name: list-finite_opt_saving_1 :caption: Discrete optimal savings model (`finite_opt_saving_1.jl`) :linenos: include("finite_opt_saving_0.jl") "Compute a v-greedy policy." 
function get_greedy(v, model)
    w_idx, y_idx = (eachindex(g) for g in (model.w_grid, model.y_grid))
    σ = Matrix{Int32}(undef, length(w_idx), length(y_idx))
    for (i, j) in product(w_idx, y_idx)
        _, σ[i, j] = findmax(B(i, j, k, v, model) for k in w_idx)
    end
    return σ
end

"Get the value v_σ of policy σ."
function get_value(σ, model)
    # Unpack and set up
    (; β, R, γ, w_grid, y_grid, Q) = model
    w_idx, y_idx = (eachindex(g) for g in (w_grid, y_grid))
    wn, yn = length(w_idx), length(y_idx)
    n = wn * yn
    u(c) = c^(1-γ) / (1-γ)
    # Build P_σ and r_σ as multi-index arrays
    P_σ = zeros(wn, yn, wn, yn)
    r_σ = zeros(wn, yn)
    for (i, j) in product(w_idx, y_idx)
        w, y, w′ = w_grid[i], y_grid[j], w_grid[σ[i, j]]
        r_σ[i, j] = u(w + y - w′/R)
        for j′ in y_idx
            P_σ[i, j, σ[i, j], j′] = Q[j, j′]
        end
    end
    # Reshape for matrix algebra
    P_σ = reshape(P_σ, n, n)
    r_σ = reshape(r_σ, n)
    # Apply matrix operations --- solve for the value of σ
    v_σ = (I - β * P_σ) \ r_σ
    # Return as multi-index array
    return reshape(v_σ, wn, yn)
end
```

```{code-block} julia
:name: list-finite_opt_saving_2
:caption: Discrete optimal savings model (`finite_opt_saving_2.jl`)
:linenos:

include("s_approx.jl")
include("finite_opt_saving_1.jl")

"Value function iteration routine."
function value_iteration(model, tol=1e-5)
    vz = zeros(length(model.w_grid), length(model.y_grid))
    v_star = successive_approx(v -> T(v, model), vz, tolerance=tol)
    return get_greedy(v_star, model)
end

"Howard policy iteration routine."
function policy_iteration(model)
    wn, yn = length(model.w_grid), length(model.y_grid)
    σ = ones(Int32, wn, yn)
    i, error = 0, 1.0
    while error > 0
        v_σ = get_value(σ, model)
        σ_new = get_greedy(v_σ, model)
        error = maximum(abs.(σ_new - σ))
        σ = σ_new
        i = i + 1
        println("Concluded loop $i with error $error.")
    end
    return σ
end

"Optimistic policy iteration routine."
function optimistic_policy_iteration(model; tolerance=1e-5, m=100)
    v = zeros(length(model.w_grid), length(model.y_grid))
    error = tolerance + 1
    while error > tolerance
        last_v = v
        σ = get_greedy(v, model)
        for i in 1:m
            v = T_σ(v, σ, model)
        end
        error = maximum(abs.(v - last_v))
    end
    return get_greedy(v, model)
end
```

#### Timing

Since all results for MDPs apply, we know that the value function $v^*$ is the unique fixed point of the Bellman operator in $\RR^\Xsf$, and that VFI, HPI, and OPI all converge. {numref}`list-finite_opt_saving_2` implements these three algorithms. Since the state and action spaces are finite, HPI is guaranteed to return an exact optimal policy.

Figure {numref}`f-finite_opt_saving_2_1` shows the number of seconds taken to solve the finite optimal savings model under the default parameters when executed on a laptop machine with 20 CPUs running at around 4 GHz. The horizontal axis corresponds to the step parameter $m$ in OPI ({prf:ref}`algo-mdp_opi`). The two other algorithms do not depend on $m$ and hence their timings are constant. The figure shows that HPI is an order of magnitude faster than VFI and that OPI is even faster for moderate values of $m$.

One reason VFI is slow is that the discount factor is close to one. This matters because the convergence rate for VFI is linear, with error size decreasing geometrically in $\beta$. In contrast, HPI, being an instance of Newton iteration, converges quadratically (see {ref}`sss-rofcon`). As a result, HPI tends to dominate VFI when the discount factor approaches unity.

Run-times are also dependent on implementation, and relative speed varies significantly with coding style, software, and hardware platforms. In our implementation, the main deficiency is that parallelization is under-utilized. Better exploitation of parallelization tends to favor HPI, as discussed in {ref}`sss-para`.
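To make the effect of the discount factor on VFI concrete, here is a back-of-the-envelope calculation. It assumes only the standard contraction bound $\| T^k v - v^* \|_\infty \leq \beta^k \| v - v^* \|_\infty$ and ignores the particular stopping rule used in the code, so the numbers are indicative rather than exact. Shrinking the initial error by a factor $\epsilon$ requires roughly

$$
    k \geq \frac{\ln \epsilon}{\ln \beta}
    \quad \text{iterations, so with } \epsilon = 10^{-5}: \qquad
    \beta = 0.98 \implies k \approx \frac{-11.51}{-0.0202} \approx 570,
    \qquad
    \beta = 0.96 \implies k \approx \frac{-11.51}{-0.0408} \approx 282.
$$

Under this bound, moving the discount factor from $0.96$ to $0.98$ roughly doubles the number of Bellman operator evaluations, each of which involves a maximization at every state.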
```{figure} ../figures/finite_opt_saving_2_1.pdf :name: f-finite_opt_saving_2_1 Timings for alternative algorithms, savings model ``` (sss-outputs)= #### Outputs Figure {numref}`f-finite_opt_saving_ts` shows a typical time series for the wealth of a single household under the optimal policy. The series is created by computing an optimal policy $\sigma^*$, generating $(Y_t)_{t=0}^{m-1}$ as a $Q$-Markov chain on $\Ysf$ and then computing $(W_t)_{t=0}^m$ via $W_{t+1} = \sigma^*(W_t, Y_t)$ for $t$ running from $0$ to $m-1$. Initial wealth $W_0$ is set to $1.0$ and $m = 2000$. ```{figure} ../figures/finite_opt_saving_ts.pdf :name: f-finite_opt_saving_ts Time series for wealth ``` Figure {numref}`f-finite_opt_saving_hist` shows the result of computing and histogramming a longer time series, with $m$ set to 1,000,000. This histogram approximates the stationary distribution of wealth for a large population, each updating via $\sigma^*$ and each with independently generated labor income series $(Y_t)_{t=0}^{m-1}$. (This is due to ergodicity of the wealth-income process. For a discussion of the connection between stationary distributions and time series under ergodicity see, for example, {cite}`sargent2022economic`.) ```{figure} ../figures/finite_opt_saving_hist.pdf :name: f-finite_opt_saving_hist Histogram of wealth ``` The shape of the wealth distribution in Figure {numref}`f-finite_opt_saving_hist` is unrealistic. In almost all countries, the wealth distribution has a very long right tail. The Gini coefficient of the distribution in Figure {numref}`f-finite_opt_saving_hist` is 0.54, which is too low. For example, World Bank data for 2019 produces a wealth Gini for the US equal to 0.852. For Germany and Japan the figures are 0.816 and 0.627 respectively. In {ref}`ss-osr` we discuss a variation on the optimal savings model that can produce a more realistic wealth distribution. 
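The simulation and Gini calculation described above can be sketched as follows. This is a minimal illustration, not the code used to produce the figures: the helper names `gini`, `draw_next` and `simulate_wealth`, the toy policy `σ` and the toy transition matrix `Q` are all hypothetical. In practice `σ` would be an optimal policy in index form, such as the output of `optimistic_policy_iteration`, and `Q` would come from the model.

```julia
using Random

"Gini coefficient of a nonnegative sample, via the sorted-sample formula."
function gini(x)
    s = sort(collect(float.(x)))
    n = length(s)
    return 2 * sum(i * s[i] for i in 1:n) / (n * sum(s)) - (n + 1) / n
end

"Draw the next state index from row j of the transition matrix Q."
function draw_next(Q, j, rng)
    cdf = cumsum(Q[j, :])
    # min(...) guards against rounding when the row sums to slightly less than 1
    return min(searchsortedfirst(cdf, rand(rng)), length(cdf))
end

"""
Simulate wealth indices under policy σ, where σ[i, j] is the index of next
period wealth given current wealth index i and current income index j.
"""
function simulate_wealth(σ, Q; m=2_000, i_init=1, j_init=1,
                         rng=MersenneTwister(0))
    i_path = Vector{Int}(undef, m + 1)
    i_path[1] = i_init
    j = j_init
    for t in 1:m
        i_path[t + 1] = σ[i_path[t], j]   # W_{t+1} = σ(W_t, Y_t)
        j = draw_next(Q, j, rng)          # Y_{t+1} is drawn from Q(Y_t, ·)
    end
    return i_path
end

# Toy inputs, chosen only so that the sketch runs (hypothetical values)
w_grid = LinRange(0.01, 20.0, 5)
Q = [0.9 0.1; 0.2 0.8]
σ = [1 2; 1 3; 2 4; 3 5; 4 5]            # 5 wealth states × 2 income states

i_path = simulate_wealth(σ, Q; m=10_000)
w_path = w_grid[i_path]
println("Gini of simulated wealth: ", gini(w_path))
```

Storing the path as grid indices rather than wealth levels keeps the update $W_{t+1} = \sigma^*(W_t, Y_t)$ to a single array lookup, which matters when $m$ is as large as 1,000,000.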
(ss-foi)= ### Optimal Investment As our next application, we consider a monopolist facing adjustment costs and stochastically evolving demand. The monopolist balances setting enough capacity to meet demand against costs of adjusting capacity. #### Problem Description We assume that the monopolist produces a single product and faces an inverse demand function of the form $$ P_t = a_0 - a_1 Y_t + Z_t, $$ where $a_0, a_1$ are positive parameters, $Y_t$ is output, $P_t$ is price, and the demand shock $Z_t$ follows $$ Z_{t+1} = \rho Z_t + \sigma \eta_{t+1}, \qquad \{\eta_t \} \iidsim N(0, 1). $$ Current profits are $$ \pi_t \coloneq P_t Y_t - c Y_t - \gamma (Y_{t+1} - Y_t)^2. $$ Here $\gamma (Y_{t+1} - Y_t)^2$ represents costs associated with adjusting production scale, parameterized by $\gamma$, and $c$ is unit cost of current production. Costs are convex, so rapid changes to capacity are expensive. The monopolist chooses $(Y_t)$ to maximize the expected discounted value of its profit flow, which we write as $$ \EE \, \sum_{t=0}^{\infty} \beta^t \pi_t. $$ (eq-lq_object_mp) Here $\beta = 1/(1+r)$, where $r > 0$ is a fixed interest rate. A way to start thinking about the optimal time path of output is to consider what would happen if $\gamma = 0$. Without adjustment costs there is no intertemporal trade-off, so the monopolist should choose output to maximize current profit in each period. The implied level of output at time $t$ is $$ \bar Y_t \coloneq \frac{a_0 - c + Z_t}{2 a_1}. $$ (eq-baryt) ```{exercise} :label: ex-mdps-auto-8 Show that $\bar Y_t$ maximizes current profit when $\gamma=0$. ``` For $\gamma > 0$, we expect the following behavior. - If $\gamma$ is close to zero, then the optimal output path $Y_t$ will track the time path of $\bar Y_t$ relatively closely, whereas - if $\gamma$ is larger, then $Y_t$ will be significantly smoother than $\bar Y_t$, as the monopolist seeks to avoid adjustment costs. 
#### MDP Representation

We can represent this problem as an MDP. To do so we let $\Ysf$ be a grid contained in $\RR_+$ that lists possible output values. To conform to the finite state setting, we discretize the shock process $(Z_t)$ using Tauchen's method, as described in {ref}`ss-apta`. For convenience we again use $(Z_t)$ to represent the discrete process, which is a finite Markov chain on $\Zsf \subset \RR$ with transition matrix $Q$.

The state space for this MDP is $\Xsf = \Ysf \times \Zsf$, while the action space is $\Ysf$. The feasible correspondence is defined by $\Gamma(x) = \Ysf$, meaning that choice of output is not restricted by the state. Thus, the feasible policy set $\Sigma$ is all $\sigma \colon \Ysf \times \Zsf \to \Ysf$.

We write $(y,z)$ for the current state, $q$ for the action (which chooses next period output) and $(y',z')$ for the next period state. The current reward function is current profits, which we can write as

$$
    r((y, z), q) = (a_0 - a_1 y + z - c) y - \gamma (q - y)^2.
$$

The stochastic kernel is

$$
    P((y, z), q, (y', z')) = \1\{y' = q\} Q(z, z').
$$

The term $\1\{y' = q\}$ states that next period output $y'$ is equal to our current choice $q$ for next period output. With these definitions, the problem defines an MDP and all of the optimality theory for MDPs applies.

#### Implementation

Writing $r(y, z, y')$ as shorthand for $r((y, z), y')$, the Bellman operator can be expressed as

$$
    (Tv)(y, z) = \max_{y' \in \Ysf} \left\{ r(y, z, y') + \beta \sum_{z'} v(y', z') Q(z, z') \right\}.
$$

Given $\sigma \in \Sigma$, we can express the policy operator as

$$
    (T_\sigma \, v)(y, z) = r(y, z, \sigma(y, z)) + \beta \sum_{z'} v(\sigma(y, z), z') Q(z, z').
$$

A $v$-greedy policy is a $\sigma \in \Sigma$ that obeys

$$
    \sigma(y, z) \in \argmax_{y' \in \Ysf} \left\{ r(y, z, y') + \beta \sum_{z'} v(y', z') Q(z, z') \right\} \quad \text{for all } (y,z) \in \Xsf.
$$

By combining iteration with the policy operator and computation of greedy policies, we can implement OPI, compute the optimal policy $\sigma^*$, and study output choices generated by this policy. We are particularly interested in how output responds over time to randomly generated demand shocks.

Figure {numref}`f-finite_lq_1` shows the result of a simulation designed to shed light on how output responds to demand. After choosing initial values $(Y_1, Z_1)$ and generating a $Q$-Markov chain $(Z_t)_{t=1}^T$, we simulated optimal output via $Y_{t+1} = \sigma^*(Y_t, Z_t)$. The default parameters are shown in {numref}`list-finite_lq`. In the figure, the adjustment cost parameter $\gamma$ is varied as shown in the title. In addition to the optimal output path, the path of $(\bar Y_t)$ as defined in {eq}`eq-baryt` is also presented.

The figure shows how increasing $\gamma$ promotes smoothing, as predicted in the preceding discussion. For small $\gamma$, adjustment costs have only minor effects on choices, so output closely follows $(\bar Y_t)$, the optimal path when output responds immediately to demand shocks. Conversely, larger values of $\gamma$ make adjustment expensive, so the monopolist responds relatively slowly to changes in demand.
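The operators above can be combined into an OPI routine along the following lines. This is a sketch under stated assumptions rather than the code behind the figures: `toy_investment_model` is a hypothetical stand-in that uses a small hand-written shock grid and transition matrix instead of Tauchen discretization, and the function names simply mirror those used for the savings model.

```julia
using LinearAlgebra

# Hypothetical small model; the shock grid and Q are illustrative only
function toy_investment_model(; r=0.04, a_0=10.0, a_1=1.0, γ=25.0, c=1.0,
                                y_grid=LinRange(0.0, 20.0, 21),
                                z_grid=[-1.0, 0.0, 1.0],
                                Q=[0.8 0.2 0.0; 0.1 0.8 0.1; 0.0 0.2 0.8])
    β = 1 / (1 + r)
    return (; β, a_0, a_1, γ, c, y_grid, z_grid, Q)
end

"B(y, z, y′, v) = r(y, z, y′) + β Σ_z′ v(y′, z′) Q(z, z′), in index form."
function B(i, j, k, v, model)
    (; β, a_0, a_1, γ, c, y_grid, z_grid, Q) = model
    y, z, y′ = y_grid[i], z_grid[j], y_grid[k]
    reward = (a_0 - a_1 * y + z - c) * y - γ * (y′ - y)^2
    @views value = reward + β * dot(v[k, :], Q[j, :])
    return value
end

"The policy operator T_σ, with σ stored as indices into y_grid."
function T_σ(v, σ, model)
    v_new = similar(v)
    for j in eachindex(model.z_grid), i in eachindex(model.y_grid)
        v_new[i, j] = B(i, j, σ[i, j], v, model)
    end
    return v_new
end

"Compute a v-greedy policy."
function get_greedy(v, model)
    y_idx, z_idx = eachindex(model.y_grid), eachindex(model.z_grid)
    σ = Matrix{Int}(undef, length(y_idx), length(z_idx))
    for j in z_idx, i in y_idx
        σ[i, j] = argmax(k -> B(i, j, k, v, model), y_idx)
    end
    return σ
end

"Optimistic policy iteration: m applications of T_σ per greedy update."
function optimistic_policy_iteration(model; tolerance=1e-5, m=50)
    v = zeros(length(model.y_grid), length(model.z_grid))
    error = tolerance + 1
    while error > tolerance
        last_v = v
        σ = get_greedy(v, model)
        for _ in 1:m
            v = T_σ(v, σ, model)
        end
        error = maximum(abs.(v - last_v))
    end
    return get_greedy(v, model)
end

model = toy_investment_model()
σ_star = optimistic_policy_iteration(model)

# Output path for a fixed (not random) shock index path, for illustration
z_idx_path = [2, 3, 3, 3, 2, 1, 1, 2]
y_idx = 1
for j in z_idx_path
    global y_idx = σ_star[y_idx, j]       # Y_{t+1} = σ*(Y_t, Z_t)
    println("z = ", model.z_grid[j], ", next output = ", model.y_grid[y_idx])
end
```

Because $\Gamma(x) = \Ysf$, no feasibility check is needed inside `B`, unlike in the savings model.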
```{code-block} julia
:name: list-finite_lq
:caption: Optimal investment model (`finite_lq.jl`)
:linenos:

using QuantEcon, LinearAlgebra, IterTools
include("s_approx.jl")

function create_investment_model(;
        r=0.04,                              # Interest rate
        a_0=10.0, a_1=1.0,                   # Demand parameters
        γ=25.0, c=1.0,                       # Adjustment and unit cost
        y_min=0.0, y_max=20.0, y_size=100,   # Grid for output
        ρ=0.9, ν=1.0,                        # AR(1) parameters
        z_size=25)                           # Grid size for shock
    β = 1/(1+r)
    y_grid = LinRange(y_min, y_max, y_size)
    mc = tauchen(z_size, ρ, ν)
    z_grid, Q = mc.state_values, mc.p
    return (; β, a_0, a_1, γ, c, y_grid, z_grid, Q)
end
```

```{figure} ../figures/finite_lq_1.pdf
:name: f-finite_lq_1

Simulation of optimal output with different adjustment costs
```

Figure {numref}`f-finite_lq_time` compares timings for VFI, HPI, and OPI. Parameters are as in {numref}`list-finite_lq`. As in Figure {numref}`f-finite_opt_saving_2_1`, which gave timings for the optimal savings model, the horizontal axis shows $m$, which is the step parameter in OPI (see {prf:ref}`algo-mdp_opi`). VFI and HPI do not depend on $m$ and hence their timings are constant. The vertical axis is time in seconds.

HPI is faster than VFI, although the difference is not as dramatic as was the case for optimal savings. One reason is that the discount factor is relatively small for the optimal investment model ($r=0.04$ and $\beta = 1/(1+r)$, so $\beta \approx 0.96$). Since $\beta$ is the modulus of contraction for the Bellman operator, this means that VFI converges relatively quickly.

Another observation is that, for many values of $m$, OPI dominates both VFI and HPI in terms of speed, which is consistent with our findings for the optimal savings model. At $m=70$, OPI is around 20 times faster than VFI.
```{figure} ../figures/finite_lq_time.pdf :name: f-finite_lq_time Timings for alternative algorithms, investment model ``` ```{exercise} :label: ex-fhi Consider a firm that maximizes expected discounted value in a setting where future profits are discounted at rate $\beta = 1/(1+r)$, the only production input is labor and hiring involves fixed costs. Let $\ell_t$ be employment at the firm at time $t$. Current profits are $$ \pi_t = p Z_t \ell_t^\alpha - w \ell_t - \kappa \1\{\ell_{t+1} \not= \ell_t\}, $$ where $p$ is the output price, $w$ is the wage rate, $\alpha$ is a production parameter, the productivity shock is $Q$-Markov on $\Zsf$ and $\kappa$ is a fixed cost of hiring and firing. This fixed cost induces lumpy adjustment, as shown in Figure {numref}`f-firm_hiring_ts`. Show that this model is an MDP. Write the Bellman equation and the procedure for OPI in the context of this model. Replicate Figure {numref}`f-firm_hiring_ts`, modulo randomness, using the parameters shown in {numref}`list-firm_hiring`. ``` ```{code-block} julia :name: list-firm_hiring :caption: Firm hiring model (`firm_hiring.jl`) :linenos: using QuantEcon, LinearAlgebra, IterTools function create_hiring_model(; r=0.04, # Interest rate κ=1.0, # Adjustment cost α=0.4, # Production parameter p=1.0, w=1.0, # Price and wage l_min=0.0, l_max=30.0, l_size=100, # Grid for labor ρ=0.9, ν=0.4, b=1.0, # AR(1) parameters z_size=100) # Grid size for shock β = 1/(1+r) l_grid = LinRange(l_min, l_max, l_size) mc = tauchen(z_size, ρ, ν, b, 6) z_grid, Q = mc.state_values, mc.p return (; β, κ, α, p, w, l_grid, z_grid, Q) end ``` ```{figure} ../figures/firm_hiring_ts.pdf :name: f-firm_hiring_ts Optimal shifts in the stock of labor ``` (s-mbels)= ## Modified Bellman Equations Direct application of MDP theory is sometimes suboptimal. 
For example, we saw in {ref}`ss-crwd` that solving the job search problem with iid wage draws is best accomplished by generating a recursion on the continuation value, which reduces dimensionality for iterative solution methods. Separately, in {ref}`sss-iidrd`, we saw how a different manipulation of the Bellman equation also increased efficiency. Now we aim to study such modifications systematically. We begin by providing other examples of how manipulating a Bellman equation can facilitate computation and analysis. Then we establish a theoretical foundation for this line of analysis, and show how similar ideas can also be applied to policy operators and greedy policies. (We also treat similar topics at a more advanced and abstract level in Volume 2.) (ss-sest)= ### Structural Estimation As a first illustration of the ideas in this section, we discuss a connection between econometric estimation and dynamic programs. Our focus is on some modifications that econometricians often make to Bellman equations and how they affect computation and optimality. #### What Is Structural Estimation? Structural estimation is a branch of quantitative social science in which, in a quest to understand observed quantities and prices, researchers attribute Markov decision problems to economic agents. A key step in this approach is to formulate dynamic programs in terms of functional forms and parameters. The econometric challenge is to infer parameters that bring the model outputs as close as possible to actual data. Structural estimation aims to discover objects that are invariant to hypothetical interventions that the analysis wants to investigate. Examples of such invariant objects are parameters of utility functions, discount factors, and production technologies. Agents inside the model solve their MDPs. A policy intervention that systematically alters the Markov processes that they face will alter agents' optimal policies, that is, their decision rules. 
Examples of such interventions, involving aspects of fiscal and monetary policy, are described in several chapters of {cite:t}`lucas1981rational`, a compendium of early papers that were written in response to the {cite:t}`lucas1976econometric` Critique of then-prevailing dynamic econometric models.[^2]

```{prf:example}
{cite:t}`gillingham2022equilibrium` study the used car market in Denmark by modeling consumers who trade cars in the new and used car markets. By modeling consumers' decision problems, the authors are able to investigate how consumers would react to a hypothetical modification in automobile taxes. The study finds that automobile taxes were too high in the sense that the government could have raised more tax revenue by lowering tax rates.
```

Efficient solution methods are essential in structural estimation because the underlying dynamic program must be solved repeatedly in order to search the parameter space for a good fit to data. Moreover, these dynamic programs are often high-dimensional, due to shocks to preferences and other random variables that the agents inside the model are assumed to see but that the econometrician does not. When these shocks are persistent, the dimension of the state grows.[^3]

In order to maintain focus on dynamic programming, we will not describe the details of the estimation step required for structural estimation (although {ref}`s-cn_mdps` contains references for those who wish to learn about that). Instead, we focus on the kinds of dynamic programs treated in structural estimation and techniques for solving them efficiently.

(sss-evfs)=
#### An Illustration

Let us look at an example of a dynamic program with preference shocks used in structural estimation, which is taken from a study of labor supply by married women {cite:p}`keane2011structural`. The husband of the decision-maker, a married woman, is already working. The couple has young children and the mother is deciding whether to work.
Her utility function is $$ u(c, d, \xi) = c + (\alpha n + \xi) (1 - d), $$ where $c$ is consumption, $\alpha$ is a parameter, $n$ is the number of children, $\xi$ is a preference shock and $d$ is the action variable. The action is binary, with $d=1$ representing the decision to work in the current period and $d=0$ representing the decision not to work.[^4] The budget constraint for the household is $$ c_t = f_t + w_t d_t - \pi n d_t, $$ where $f_t$ is the father's income, $w_t$ is the mother's wage and $\pi$ is the cost of child care. Wages depend on human capital $h_t$, which increases with experience. In particular, $$ w_t = \gamma h_t + \eta_t, \quad \text{with} \quad h_t = h_{t-1} + d_{t-1}. $$ Here $\eta_t$ is random and $\gamma$ is a parameter. We assume that $(f_t)_{t \geq 0}$ is $F$-Markov on some finite set. In the model, $(\xi_t)_{t \geq 0}$ and $(\eta_t)_{t \geq 0}$ are iid. We denote their joint distribution by $\phi$. With constant discount factor $\beta$ and implied utility $$ r(f, h, \xi, \eta, d) \coloneq f + (\gamma h + \eta) d - \pi n d + (\alpha n + \xi) (1 - d), $$ the problem of maximizing expected discounted utility is an MDP with the Bellman equation $$ v(f, h, \xi, \eta) = \max_d \left\{ r(f, h, \xi, \eta, d) + \beta \sum_{f', \xi', \eta'} v(f', h + d, \xi', \eta') F(f, f') \phi(\xi', \eta') \right\}. $$ While we can proceed directly with a technique such as VFI to obtain optimal choices, we can simplify. One way is by reducing the number of states. A hint comes from looking at the expected value function $$ g(f, h, d) \coloneq \sum_{f', \xi', \eta'} v(f', h + d, \xi', \eta') F(f, f') \phi(\xi', \eta') $$ This function depends only on three arguments and, moreover, the choice variable $d$ is binary. Hence we can break $g$ down into two functions $g(f, h, 0)$ and $g(f, h, 1)$, each of which depends only on the pair $(f, h)$. These functions are substantially simpler than $v$ when the domain of $(\xi, \eta)$ is large. 
Hence, it is natural to consider whether we can solve our problem using $g$ rather than $v$. (sss-revfs)= #### Expected Value Functions Rather than address this question within the context of the preceding model, let's shift to a generic version of the dynamic program used in structural estimation and how it can be solved using expected value methods. Our generic version takes the form $$ v(y, \epsilon) = \max_{a \in \Gamma(y)} \left\{ r(y, \epsilon, a) + \beta \sum_{y'} \int v(y', \epsilon') P(y, a, y') \phi(\epsilon') \diff \epsilon' \right\} $$ (eq-ev_bell) for all $y \in \Ysf$ and $\epsilon \in \Esf$. Here $\Ysf$ is a finite set, often determined by discretization of a continuous space, whereas $\Esf$, the outcome space for $\epsilon$, is allowed to be continuous. The state $y$ will be called the endogenous state and $\epsilon$ is the preference shock. In practice, $\epsilon$ will often be a vector of shocks that affect current rewards. The integral can therefore be multivariate and is over all of $\Esf$. The problem represented by {eq}`eq-ev_bell` is a version of a regular MDP, with state $x = (y, \epsilon)$ taking values in $\Xsf \coloneq \Ysf \times \Esf$. If we discretize the space $\Esf$, then all the optimality theory for MDPs applies. Instead of taking this approach, however, we draw on our discussion of labor choice in {ref}`sss-evfs`. In particular, to enhance efficiency, we will work with the **expected value function** $$ g(y, a) \coloneq \sum_{y'} \int v(y', \epsilon') P(y, a, y') \phi(\epsilon') \diff \epsilon'. $$ (eq-evga) There are several potential advantages associated with working with $g$ rather than $v$. One is that the set of actions $\Asf$ can be much smaller than the set of states that would be created by discretization of the preference shock space $\Esf$ (especially if $\epsilon_t$ takes values in a high-dimensional space). Another is that the integral provides smoothing, so that $g$ is typically a smooth function. 
This can accelerate structural estimation procedures.

(sss-ovev)=
#### Optimality via EV Methods

To exploit the relative simplicity of the expected value function, we rewrite the Bellman equation {eq}`eq-ev_bell` as

$$
    v(y, \epsilon) = \max_{a \in \Gamma(y)} \left\{ r(y, \epsilon, a) + \beta g(y, a) \right\}.
$$

Taking expectations of both sides and using {eq}`eq-evga` again gives

$$
    g(y, a) = \sum_{y'} \int \max_{a' \in \Gamma(y')} \left\{ r(y', \epsilon', a') + \beta g(y', a') \right\} \phi(\epsilon') \diff \epsilon' P(y, a, y') .
$$

To solve this functional equation we introduce the **expected value Bellman operator** $R$ defined at $g \in \RR^\Gsf$ by

$$
    (Rg)(y, a) = \sum_{y'} \int \max_{a' \in \Gamma(y')} \left\{ r(y', \epsilon', a') + \beta g(y', a') \right\} \phi(\epsilon') \diff \epsilon' P(y, a, y').
$$ (eq-evbo)

Here $\Gsf$ is the set of feasible state-action pairs $(y, a)$.

```{exercise}
:label: ex-raac

Prove that $R$ is order-preserving and a contraction of modulus $\beta$ on $\RR^\Gsf$ (with respect to the supremum norm).
```

In what follows, we let $g^*$ be the fixed point of $R$ in $\RR^\Gsf$. Since $R$ is a contraction map, $g^*$ can be computed by successive approximation. The next result shows that knowing this fixed point is enough to solve the dynamic program.

```{prf:proposition}
:label: p-opor

A policy $\sigma \in \Sigma$ is optimal if and only if

$$
    \sigma(y, \epsilon) \in \argmax_{a \in \Gamma(y)} \left\{ r(y, \epsilon, a) + \beta g^*(y, a) \right\} \qquad \text{for all } (y, \epsilon) \in \Ysf \times \Esf.
$$
```

We postpone proving {prf:ref}`p-opor` until {ref}`ss-rfbe`, where we prove a more general result.

```{prf:example}
In the labor supply problem in {ref}`sss-evfs`, the expected value Bellman operator becomes

$$
    (Rg)(f, h, d) = \sum_{f', \xi', \eta'} \max_{d'} \left\{ r(f', h+d, \xi', \eta', d') + \beta g(f', h+d, d') \right\} F(f, f') \phi(\xi', \eta').
$$

Iterating from an arbitrary guess of $g$ converges to the unique fixed point $g^*$ of $R$. By {prf:ref}`p-opor`, we can then compute the optimal policy $\sigma^*$ at $(f,h, \xi, \eta)$ by taking

$$
    \sigma^*(f, h, \xi, \eta) \in \argmax_d \left\{ r(f, h, \xi, \eta, d) + \beta g^*(f, h, d) \right\}.
$$
```

(ss-gummax)=
### The Gumbel Max Trick

{ref}`sss-revfs` described how using expected values can reduce dimensionality by smoothing. But there is another feature of an expected value formulation of a Bellman equation that we can take advantage of when we are prepared to impose extra structure on preference shocks. This section provides details.

A real-valued random variable $Z$ is said to have a **Gumbel distribution** (or a "type 1 generalized extreme value distribution") with mode $\mu \in \RR$ if its cumulative distribution function takes the form $F(z) = \exp(-\exp(-(z - \mu)))$. To denote a random variable with a Gumbel distribution, we write $Z \sim G(\mu)$. The expectation of $Z$ is $\mu + \gamma$, where $\gamma \approx 0.577$ is the **Euler--Mascheroni constant**. In particular, a $G(-\gamma)$ random variable has mean zero.

```{exercise}
:label: ex-mdps-auto-9

Prove: if $Z \sim G(\mu)$ and $\lambda \in \RR$, then $Z + \lambda \sim G(\mu + \lambda)$.
```

The Gumbel distribution has the following useful stability property, a proof of which can be found in {cite:t}`huijben2022review`.

```{prf:lemma}
:label: l-gummax

If $Z_1, \ldots, Z_k \iidsim G(-\gamma)$ and $c_1, \ldots, c_k$ are real numbers, then

$$
    \max_{1 \leq i \leq k} (Z_i + c_i) \sim G \left\{ -\gamma + \ln \left[ \sum_{i=1}^k \exp(c_i) \right] \right\}.
$$
```

To exploit {prf:ref}`l-gummax`, we continue the discussion in {ref}`sss-ovev`, but assume now that $\Asf = \{a_1, \ldots, a_k\}$, that $\Gamma(y') = \Asf$ for all $y'$ (so that actions are unrestricted), that $\epsilon'$ in {eq}`eq-evbo` is additive in rewards and indexed by actions, so that $r(y', \epsilon', a') = r(y', a') + \epsilon'(a')$ for all feasible $(y',a')$, and that, conditional on $y'$, the vector $(\epsilon'(a_1), \ldots, \epsilon'(a_k))$ consists of $k$ independent $G(-\gamma)$ shocks. Thus, each feasible choice returns a reward perturbed by an independent, mean-zero Gumbel shock.

From these assumptions and {prf:ref}`l-gummax`, the term inside the integral in {eq}`eq-evbo` satisfies

$$
\begin{aligned}
    \max_{a'} \left\{ r(y', \epsilon', a') + \beta g(y', a') \right\}
    & = \max_{a'} \left\{ r(y', a') + \epsilon'(a') + \beta g(y', a') \right\} \\
    & \sim G \left\{ -\gamma + \ln \left[ \sum_{a'} \exp\left( r(y', a') + \beta g(y', a') \right) \right] \right\}.
\end{aligned}
$$

Recalling that the expectation of a $G(\mu)$ random variable is $\mu + \gamma$, the expected value Bellman operator $R$ in {eq}`eq-evbo` becomes

$$
    (Rg)(y, a) = \sum_{y'} \ln \left[ \sum_{a'} \exp\left( r(y', a') + \beta g(y', a') \right) \right] P(y, a, y').
$$ (eq-evbo2)

This operator is convenient because the absence of a max operator permits fast evaluation. Notice also that $R$ is smooth in $g$, which suggests that we can use gradient information to compute its fixed points.

```{prf:proposition}
The operator $R$ in {eq}`eq-evbo2` is a contraction of modulus $\beta$ on $\RR^\Gsf$.
```

```{prf:proof}
The operator $R$ is order-preserving on $\RR^\Gsf$. Straightforward algebra shows that, for $c \in \RR_+$ and $g \in \RR^\Gsf$, we have $R(g + c) = Rg + \beta c$. The claim now follows from Blackwell's sufficient condition for a contraction.
◻
```

Notice how the Gumbel max trick that exploits {prf:ref}`l-gummax` depends crucially on the expected value formulation of the Bellman equation, rather than the standard formulation {eq}`eq-ev_bell`. This is because the expected value formulation puts the max inside the expectation operator, unlike the standard formulation, where the max is on the outside. Variations of the Gumbel max trick have many uses in structural econometrics (see {ref}`s-cn_mdps`).

(ss-osr)=
### Optimal Savings with Stochastic Returns on Wealth

We modify the {ref}`ss-osci` optimal savings problem by replacing a constant gross rate of interest $R$ by an iid sequence $(\eta_t)_{t \geq 0}$ with common distribution $\phi$ on finite set $\Esf$. So the consumer faces a fluctuating rate of return on financial wealth. In each period $t$, the consumer knows $\eta_t$, the gross rate of interest between $t$ and $t+1$, before deciding how much to consume and how much to save. Other aspects of the problem are unchanged.

We have two motivations. One is computational, namely, to illustrate how framing a decision in terms of expected values can reduce dimensionality, analogous to the results in {ref}`sss-ovev`. The other is to generate a more realistic wealth distribution than that generated by the {ref}`sss-outputs` optimal savings model.

With stochastic returns on wealth, the Bellman equation becomes

$$
    v(w, y, \eta) = \max_{w' \leq \eta(w+y)} \left\{ u \left(w+y - \frac{w'}{\eta} \right) + \beta \sum_{y', \eta'} v(w', y', \eta') Q(y, y') \phi(\eta') \right\} .
$$

Both $w$ and $w'$ are constrained to a finite set $\Wsf \subset \RR_+$. The expected value function can be expressed as

$$
    g(y, w') \coloneq \sum_{y', \; \eta'} v(w', y', \eta') Q(y, y') \phi(\eta').
$$ (eq-gyw)

In the remainder of this section, we will say that a savings policy $\sigma$ is **$g$-greedy** if

$$
    \sigma(w, y, \eta) \in \argmax_{ w' \leq \eta(w+y)} \left\{ u \left(w+y - \frac{w'}{\eta} \right) + \beta g(y, w') \right\} .
$$

Since the problem is an MDP, we can see immediately that if we replace $v$ in {eq}`eq-gyw` with the value function $v^*$, then a $g$-greedy policy will be an optimal one.

Using manipulations analogous to those we used in {ref}`sss-ovev`, we can rewrite the Bellman equation in terms of expected value functions via

$$
    g(y, w') = \sum_{y', \; \eta'} \; \max_{ w'' \leq \eta'(w'+y')} \left\{ u \left(w'+y' - \frac{w''}{\eta'} \right) + \beta g(y', w'') \right\} Q(y, y') \phi(\eta').
$$

From here we could proceed by introducing an expected value Bellman operator analogous to $R$ in {eq}`eq-evbo`, proving it to be a contraction map and then showing that greedy policies taken with respect to the fixed point are optimal. All of this can be accomplished without too much difficulty -- we prove more general results in {ref}`ss-rfbe`.

However, we also know that OPI is, in general, more efficient than VFI. This motivates us to introduce the modified $\sigma$-value operator

$$
    (R_\sigma \, g)(y, w') = \sum_{y', \; \eta'} \left\{ u \left( w' +y' - \frac{\sigma(w', y', \eta')}{\eta'} \right) + \beta g(y', \sigma(w', y', \eta')) \right\} Q(y, y') \phi(\eta').
$$

This is a modification of the regular $\sigma$-value operator $T_\sigma$ that makes it act on expected value functions. A suitably modified OPI routine that is adapted from the regular OPI algorithm in {ref}`sss-opi` can be found in {prf:ref}`algo-mod_opi`. The routine is convergent. We discuss this in greater detail in {ref}`ss-rfbe`.
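One way to code the modified operator $R_\sigma$ and the associated OPI loop is sketched below. It is an illustration under stated assumptions, not a transcription of {prf:ref}`algo-mod_opi`: the model constructor `toy_model`, its parameter values and the function names `get_g_greedy`, `R_σ` and `modified_opi` are hypothetical, and the array layout `g[k, j]` $= g(y_j, w_k)$ is a choice made here for convenience.

```julia
using LinearAlgebra

# Hypothetical small model with two income states and two return states
function toy_model(; β=0.96, γ=2.0,
                     w_grid=LinRange(0.01, 10.0, 30),
                     y_grid=[0.5, 1.5], Q=[0.8 0.2; 0.3 0.7],
                     η_grid=[0.9, 1.1], ϕ=[0.5, 0.5])
    return (; β, γ, w_grid, y_grid, Q, η_grid, ϕ)
end

"u(w + y - w′/η) + β g(y, w′) at indices (i, j, l) with choice k."
function B(i, j, l, k, g, model)
    (; β, γ, w_grid, y_grid, η_grid) = model
    c = w_grid[i] + y_grid[j] - w_grid[k] / η_grid[l]
    return c > 0 ? c^(1 - γ) / (1 - γ) + β * g[k, j] : -Inf
end

"Compute a g-greedy policy σ[i, j, l], stored as indices into w_grid."
function get_g_greedy(g, model)
    (; w_grid, y_grid, η_grid) = model
    w_idx = eachindex(w_grid)
    σ = Array{Int}(undef, length(w_grid), length(y_grid), length(η_grid))
    for l in eachindex(η_grid), j in eachindex(y_grid), i in w_idx
        σ[i, j, l] = argmax(k -> B(i, j, l, k, g, model), w_idx)
    end
    return σ
end

"The modified σ-value operator R_σ."
function R_σ(g, σ, model)
    (; w_grid, y_grid, η_grid, Q, ϕ) = model
    g_new = zeros(size(g))
    for j in eachindex(y_grid), k in eachindex(w_grid)
        # Average over next period income y′ and return η′
        for l′ in eachindex(η_grid), j′ in eachindex(y_grid)
            g_new[k, j] += B(k, j′, l′, σ[k, j′, l′], g, model) *
                           Q[j, j′] * ϕ[l′]
        end
    end
    return g_new
end

"OPI on expected value functions: m applications of R_σ per greedy update."
function modified_opi(model; tolerance=1e-5, m=50)
    g = zeros(length(model.w_grid), length(model.y_grid))
    error = tolerance + 1
    while error > tolerance
        last_g = g
        σ = get_g_greedy(g, model)
        for _ in 1:m
            g = R_σ(g, σ, model)
        end
        error = maximum(abs.(g - last_g))
    end
    return get_g_greedy(g, model), g
end

model = toy_model()
σ_star, g_star = modified_opi(model)
```

The dimensionality gain is visible in the array shapes: the iterate `g` has one entry per $(y, w')$ pair, while the policy `σ` and the original value function carry the extra return index $\eta$.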
```{code-block} julia :name: list-modified_opt_savings :caption: Optimal savings parameters (`modified_opt_savings.jl`) :linenos: using QuantEcon, LinearAlgebra, IterTools function create_savings_model(; β=0.98, γ=2.5, w_min=0.01, w_max=20.0, w_size=100, ρ=0.9, ν=0.1, y_size=20, η_min=0.75, η_max=1.25, η_size=2) η_grid = LinRange(η_min, η_max, η_size) ϕ = ones(η_size) * (1 / η_size) # Uniform distribution w_grid = LinRange(w_min, w_max, w_size) mc = tauchen(y_size, ρ, ν) y_grid, Q = exp.(mc.state_values), mc.p return (; β, γ, η_grid, ϕ, w_grid, y_grid, Q) end ``` Figure {numref}`f-modified_opt_savings_hist` shows a histogram of a long wealth time series that parallels Figure {numref}`f-finite_opt_saving_hist`. The only significant difference is the switch to stochastic returns (as previously described). Parameters are as in {numref}`list-modified_opt_savings`. Now the wealth distribution has a more realistic long right tail (a few observations are in the far right tail, although they are difficult to see). The Gini coefficient is 0.72, which is closer to typical country values recorded in World Bank data (but still lower than the US). In essence, this occurs because return shocks have multiplicative rather than additive effects on wealth, so a sequence of high draws compounds to make wealth grow fast. ```{figure} ../figures/modified_opt_savings_hist.pdf :name: f-modified_opt_savings_hist Histogram of wealth (stochastic returns) ``` ```{exercise} :label: ex-tranper Consider a version of the optimal savings problem from {ref}`ss-osci` where labor income has both persistent and transient components. In particular, assume that $Y_t = Z_t + \epsilon_t$ for all $t$, where $(\epsilon_t)_{t \geq 0}$ is iid with common distribution $\phi$ on $\Esf$, whereas $(Z_t)_{t \geq 0}$ is $Q$-Markov on $\Zsf$. Such a specification of labor income can capture how households should react differently to transient and "permanent" shocks (see {ref}`s-cn_mdps` for more discussion). 
Following the pattern developed for the savings model with stochastic returns on wealth, write down both the Bellman equation and the Bellman equation in terms of expected value functions. ``` ```{solution} ex-tranper The Bellman equation becomes $$ v(w, z, \epsilon) = \max_{w' \leq R(w+z+\epsilon)} \left\{ u \left(w+z + \epsilon - \frac{w'}{R} \right) + \beta \sum_{z', \epsilon'} v(w', z', \epsilon') Q(z, z') \phi(\epsilon') \right\} . $$ Both $w$ and $w'$ are constrained to a finite set $\Wsf \subset \RR_+$. The expected value function can be expressed as $$ g(z, w') \coloneq \sum_{z', \; \epsilon'} v(w', z', \epsilon') Q(z, z') \phi(\epsilon'). $$ (eq-gzw) In the remainder of this section, we will say that a savings policy $\sigma$ is **$g$-greedy** if $$ \sigma(z, w, \epsilon) \in \argmax_{ w' \leq R(w+z+\epsilon)} \left\{ u \left(w+z + \epsilon - \frac{w'}{R} \right) + \beta g(z, w') \right\} . $$ Since it is an MDP, we can see immediately that if we replace $v$ in {eq}`eq-gzw` with the value function $v^*$, then a $g$-greedy policy will be an optimal one. We can rewrite the Bellman equation in terms of expected value functions via $$ g(z, w') = \sum_{z', \; \epsilon'} \; \max_{ w'' \leq R(w'+z'+\epsilon')} \left\{ u \left(w'+z' + \epsilon' - \frac{w''}{R} \right) + \beta g(z', w'') \right\} Q(z, z') \phi(\epsilon'). $$ ``` (ss-qlearn)= ### Q-Factors $Q$-factors assign values to state action pairs. They set the stage for $Q$-learning, an application of reinforcement learning, a recursive algorithm for estimating parameters. $Q$-learning uses stochastic approximation techniques to learn $Q$-factors. Under special conditions $Q$-learning eventually learns optimal $Q$-factors for a finite MDP. $Q$-learning is connected to the topic of this chapter because it relies on a Bellman operator for the $Q$-factor. We discuss that Bellman operator, but we don't discuss $Q$-learning here. 
To begin, we fix an MDP $(\Gamma, \beta, r, P)$ with state space $\Xsf$ and action space $\Asf$. For each $v \in \RR^\Xsf$, the **$Q$-factor** corresponding to $v$ is the function $$ q(x, a) = r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \qquad ((x,a) \in \Gsf). $$ We can convert the Bellman equation into an equation in $Q$-factors by observing that, given such a $q$, the Bellman equation can be written as $v(x) = \max_{a \in \Gamma(x)}q(x, a)$. Taking the mean and discounting on both sides of this equation gives $$ \beta \sum_{x'} v(x') P(x, a, x') = \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x'). $$ Adding $r(x,a)$ and using the definition of $q$ again gives $$ q(x, a) = r(x, a) + \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x'). $$ This functional equation motivates us to introduce the **$Q$-factor Bellman operator** $$ (Sq)(x, a) = r(x, a) + \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x') \qquad ((x,a) \in \Gsf). $$ (eq-pabo) ```{exercise} :label: ex-saac Prove that $S$ is order-preserving and a contraction of modulus $\beta$ on $\RR^\Gsf$ (with respect to the supremum norm). ``` Let $q^*$ be the unique fixed point of $S$ in $\RR^\Gsf$. ```{prf:proposition} :label: p-opos A policy $\sigma \in \Sigma$ is optimal if and only if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} q^*(x, a) \qquad \text{for all } x \in \Xsf. $$ ``` Enthusiastic readers might like to try to prove {prf:ref}`p-opos` directly. We defer the proof until {ref}`ss-rfbe`, where a more general result is obtained. (ss-rfbe)= ### Operator Factorizations Our study of structural estimation in {ref}`ss-sest`, optimal savings in {ref}`ss-osr` and $Q$-factors in {ref}`ss-qlearn` all involved manipulations of the Bellman and policy operators that presented alternative perspectives on the respective optimization problems. 
Rather than offering additional applications that apply such ideas, we now develop a general theoretical framework from which to understand manipulations of the Bellman and policy operators for general MDPs. The framework clarifies when and how these techniques can be applied. (sss-vaps)= #### Refactoring the Bellman Operator Fix an MDP $(\Gamma, \beta, r, P)$ with state space $\Xsf$ and action space $\Asf$. As usual, $\Sigma$ is the set of feasible policies, $\Gsf$ is the set of feasible state, action pairs, $T$ is the Bellman operator and $v^*$ denotes the value function. Our first step is to decompose $T$. To do this we introduce three auxiliary operators: - $E \colon \RR^\Xsf \to \RR^\Gsf$ defined by $(Ev)(x, a) = \sum_{x'} v(x') P(x, a, x')$, - $D \colon \RR^\Gsf \to \RR^\Gsf$ defined by $(Dg)(x, a) = r(x, a) + \beta g(x, a)$ and - $M \colon \RR^\Gsf \to \RR^\Xsf$ defined by $(Mq)(x) = \max_{a \in \Gamma(x)} q(x, a)$. Evidently the action of the Bellman operator $T$ on a given $v \in \RR^\Xsf$ is the composition of these three steps: 1. take conditional expectations given $(x, a) \in \Gsf$ (applying $E$), 2. discount and add current rewards (applying $D$), and 3. maximize with respect to current action (applying $M$). As a result, we can write $T = M D E \coloneq M \circ D \circ E$ (apply $E$ first, $D$ second, and $M$ third). This decomposition is visualized in Figure {numref}`f-triangle2`. The action of $T$ is a round trip from the top node, which is the set of value functions. ```{figure} figures/triangle2.svg :name: f-triangle2 Multiple Bellman operators (EV $=$ expected value) ``` If we study Figure {numref}`f-triangle2`, we can imagine two other round trips. One is a round trip from the set of expected value functions, obtained by the sequence $EMD$. The other is a round trip from the set of $Q$-factors, obtained by the sequence $DEM$. 
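The decomposition $T = MDE$ and the two other round trips can be sketched numerically. The NumPy example below is a minimal illustration under simplifying assumptions that are not part of the text: the MDP is randomly generated, every action is feasible in every state (so that elements of $\RR^\Gsf$ can be stored as `n × m` arrays), and the helper names `T_direct`, `round_trip_ev` and `round_trip_q` are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, beta = 4, 3, 0.9                 # states, actions, discount factor
r = rng.uniform(size=(n, m))           # r[x, a]
P = rng.uniform(size=(n, m, n))        # P[x, a, x']
P /= P.sum(axis=2, keepdims=True)      # each P[x, a, :] is a distribution

def E(v):
    """(Ev)(x, a) = sum_x' v(x') P(x, a, x'); maps shape (n,) to (n, m)."""
    return P @ v

def D(g):
    """(Dg)(x, a) = r(x, a) + beta g(x, a); maps (n, m) to (n, m)."""
    return r + beta * g

def M(q):
    """(Mq)(x) = max_a q(x, a); maps (n, m) to (n,)."""
    return q.max(axis=1)

def T(v):
    """Bellman operator written as the composition M D E."""
    return M(D(E(v)))

def T_direct(v):
    """Bellman operator written out in one expression, for comparison."""
    return np.max(r + beta * np.einsum('xay,y->xa', P, v), axis=1)

def round_trip_ev(g):
    """Round trip from expected value functions: E M D."""
    return E(M(D(g)))

def round_trip_q(q):
    """Round trip from Q-factors: D E M."""
    return D(E(M(q)))

v = rng.normal(size=n)
print(np.allclose(T(v), T_direct(v)))  # compare the two forms of T
```

With infeasible state-action pairs, one option would be to set the corresponding rewards to `-np.inf` so that $M$ never selects them, although that variant is not explored here.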
Let's name these additional round trips $R$ and $S$ respectively, so that, collecting all three, $$ R = EMD, \quad S = DEM, \quad T = MDE. $$ (eq-rst) Both $R$ and $S$ act on functions in $\RR^\Gsf$. The next exercise provides an explicit representation of these operators. ```{exercise} :label: ex-mdps-auto-10 Show that for any $g, q \in \RR^\Gsf$ and $(x, a) \in \Gsf$ we have $$ \begin{aligned} (Rg)(x, a) & = \sum_{x'} \max_{a' \in \Gamma(x')} \left\{ r(x', a') + \beta g(x', a') \right\} P(x, a, x') \;\; \text{ and} \\ (Sq)(x, a) & = r(x, a) + \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x'). \end{aligned} $$ ``` Let's connect our "refactored" Bellman operators $R$ and $S$ to our preceding examples. Inspection of {eq}`eq-pabo` confirms that $S$ is exactly the $Q$-factor Bellman operator. In addition, $R$ is a general version of the expected value Bellman operator defined in {eq}`eq-evbo`. ```{exercise} :label: ex-rtrips Show that, for all $k \in \NN$, the following relationships hold - $R^k = ET^{k-1}MD = EMS^{k-1}D$ - $S^k = DR^{k-1}EM = DET^{k-1}M$ - $T^k = MS^{k-1}DE = MDR^{k-1}E$ (Here, for any operator $A$, we take $A^0$ to be the identity map.) ``` While the equalities in {prf:ref}`ex-rtrips` can be proved by induction via the logic revealed by {eq}`eq-rst`, the intuition is straightforward from Figure {numref}`f-triangle2`. For example, the relationship $R^k = ET^{k-1}MD$ states that round-tripping $k$ times from the space of expected values (EV function space) is the same as shifting to value function space by applying $MD$, round-tripping $k-1$ times using $T$, and then shifting one more step to EV function space via $E$. Although the relationships in {prf:ref}`ex-rtrips` are easy to prove, they are already useful. For example, suppose that, in a computational setting, $R$ is easier to iterate with than $T$. 
Then to iterate with $T$ $k$ times, we can instead use $T^k = MDR^{k-1}E$: We apply $E$ once, $R$ $k-1$ times, and $M$ and $D$ once each. If $k$ is large, this might be more efficient. In the rest of this section, we let $\| \cdot \| \coloneq \| \cdot \|_\infty$, the supremum norm on either $\RR^\Xsf$ or $\RR^\Gsf$. ```{exercise} :label: ex-cononex Prove the following facts: 1. $\| E v - Ev' \| \leq \|v - v'\|$ for all $v, v' \in \RR^\Xsf$, 2. $\| M g - Mg' \| \leq \|g - g'\|$ for all $g, g' \in \RR^\Gsf$, and 3. $\| D q - Dq' \| \leq \beta \|q - q'\|$ for all $q, q' \in \RR^\Gsf$. ``` We can say that $E$ and $M$ are **nonexpansive** on $\RR^\Xsf$ and $\RR^\Gsf$ respectively, whereas $D$ is a contraction on $\RR^\Gsf$. ```{prf:lemma} :label: l-allc The operators $R$, $S$, and $T$ are all contraction maps of modulus $\beta$ under the supremum norm. ``` ```{prf:proof} We proved that $T$ is a contraction of modulus $\beta$ in {prf:ref}`p-dmdp_o`. We can prove this more easily now by applying {prf:ref}`ex-cononex`, which, for arbitrary $v, v' \in \RR^\Xsf$, gives $$ \| Tv - Tv' \| = \| MDEv - MDEv' \| \leq \| DEv - DEv' \| \leq \beta \| Ev - Ev' \| \leq \beta \| v - v' \|. $$ Proofs for $R = EMD$ and $S = DEM$ are similar. ◻ ``` In Section {ref}`sss-refaco`, we clarify relationships between these operators and prove {prf:ref}`p-opor` and {prf:ref}`p-opos`. (sss-refaco)= #### Refactorizations and Optimality From {prf:ref}`l-allc` we see that $R$, $S$ and $T$ all have unique fixed points. We denote them by $g^*$, $q^*$ and $v^*$ respectively, so that $$ Rg^* = g^*, \quad Sq^* = q^*, \quad \text{and} \quad Tv^* = v^*. $$ We already know that $v^*$ is the value function ({prf:ref}`p-dmdp_o`). The following results show that the other two fixed points are, like the value function, sufficient to determine optimality. ```{prf:proposition} :label: p-dpdbk The fixed points of $R$, $S$, and $T$ are connected by the following relationships: 1. $g^* = E v^*$, 2. 
$q^* = D g^*$, and 3. $v^* = M q^*$. ``` ```{prf:proof} To prove (i), first observe that, in the notation of {eq}`eq-rst`, we have $Ev^* = ET v^* = EMDE v^* = R E v^*$. Hence $Ev^*$ is a fixed point of $R$. But $R$ has only one fixed point, which is $g^*$. Therefore, $g^* = Ev^*$. The proofs of (ii) and (iii) are analogous. ◻ ``` The results in {prf:ref}`p-dpdbk` can be written more explicitly as - $g^*(x, a) = \sum_{x'} v^*(x') P(x, a, x')$ for all $(x,a) \in \Gsf$, - $q^*(x, a) = r(x, a) + \beta g^*(x, a)$ for all $(x,a) \in \Gsf$, and - $v^*(x) = \max_{a \in \Gamma(x)}q^*(x, a)$ for all $x \in \Xsf$. In the next result and the discussion that follows, given $g, q \in \RR^\Gsf$, we will call $\sigma \in \Sigma$ - **$g$-greedy** if $\sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta g(x, a) \right\}$ for all $x \in \Xsf$, and - **$q$-greedy** if $\sigma(x) \in \argmax_{a \in \Gamma(x)} q(x, a)$ for all $x \in \Xsf$. These definitions are exact analogs of the $v$-greedy concept, applied to expected value functions and $Q$-factors respectively. ```{prf:corollary} :label: c-gequiv For $\sigma \in \Sigma$, the following statements are equivalent: 1. $\sigma$ is $v$-greedy when $v=v^*$. 2. $\sigma$ is $g$-greedy when $g=g^*$. 3. $\sigma$ is $q$-greedy when $q=q^*$. In particular, $\sigma$ is optimal if and only if any one (and hence all) of (i)--(iii) holds. ``` ```{prf:proof} To see that (i) implies (ii), suppose that $\sigma$ is $v$-greedy when $v = v^*$. Then for arbitrary $x \in \Xsf$ $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v^*(x') P(x, a, x') \right\} = \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta g^*(x, a) \right\}. $$ Hence $\sigma$ is $g$-greedy when $g=g^*$, and (i) implies (ii). The proofs of the remaining equivalences (ii) $\implies$ (iii) $\implies$ (i) are similar. 
The claim that $\sigma$ is optimal if and only if any one of (i)--(iii) holds now follows from {prf:ref}`p-dmdp_o`, which asserts that $\sigma$ is optimal if and only if $\sigma$ is $v^*$-greedy. ◻ ``` Notice that {prf:ref}`p-opos` is a special case of {prf:ref}`c-gequiv`. The results in {prf:ref}`c-gequiv` can be understood as "refactored" versions of Bellman's principle of optimality. A consequence of these results is that we can solve a given MDP by modifying VFI to operate either on expected value functions or on $Q$-factors. For example, if we find it more convenient to iterate in expected value space, then (informally) we can proceed as follows: 1. Fix $g \in \RR^\Gsf$. 2. Iterate with $R$ to obtain $g_k \coloneq R^k g \approx g^*$. 3. Compute a $g_k$-greedy policy. Since $g_k \approx g^*$, the resulting policy will be approximately optimal. (sss-refacopi)= #### Refactored OPI In {prf:ref}`c-mdps` we found that VFI is often outperformed by HPI or OPI. Our next step is to apply these methods to modified versions of the Bellman equation, as discussed in {ref}`sss-refaco`. This allows us to combine advantages of HPI/OPI with the potential efficiency gains obtained by refactoring the Bellman equation. We illustrate these ideas by producing a version of OPI that can compute $Q$-factors and expected value functions. (The same is true for HPI, although we leave details of that construction to interested readers.) To begin, we introduce a new operator, denoted $M_\sigma$, that, for fixed $\sigma \in \Sigma$ and $q \in \RR^\Gsf$, produces $$ (M_\sigma \, q)(x) \coloneq q(x, \sigma(x)) \qquad (x \in \Xsf). $$ This operator is the policy analog of the maximization operator $M$ defined by $(Mq)(x) = \max_{a \in \Gamma(x)} q(x, a)$ in {ref}`sss-vaps`. Analogous to {eq}`eq-rst`, we set $$ R_\sigma \coloneq E \, M_\sigma \, D, \quad S_\sigma \coloneq D \,E \,M_\sigma, \quad T_\sigma \coloneq M_\sigma \, D \,E. 
$$ You can verify that $T_\sigma$ is the ordinary $\sigma$-policy operator (defined in {eq}`eq-mdpts`). The operators $R_\sigma$ and $S_\sigma$ are the expected value and $Q$-factor equivalents. ```{exercise} :label: ex-rtrips2 The relationships in {prf:ref}`ex-rtrips` continue to hold after we swap $R, S, T, M$ with $R_\sigma, S_\sigma, T_\sigma, M_\sigma$. Confirm the first of these relationships, showing in particular that $$ R_\sigma^k = E \, T_\sigma^{k-1} \, M_\sigma \, D \qquad \text{for all } k \in \NN. $$ (eq-rket) ``` Let's now show that OPI can be successfully modified via these alternative operators. We will focus on the expected value viewpoint (value functions are replaced by expected value functions), which is often helpful in the applications we wish to consider. Our modified OPI routine is given in {prf:ref}`algo-mod_opi`. It makes the obvious modifications to regular OPI, switching to working with expected value functions in $\RR^\Gsf$ and from iteration with $T_\sigma$ to iteration with $R_\sigma$. ```{prf:algorithm} Refactored OPI for expected value functions :label: algo-mod_opi - input $g_0 \in \RR^\Gsf$, an initial guess of $g^*$ - input $\tau$, a tolerance level for error - input $m \in \NN$, a step size - $k \leftarrow 0$ - $\epsilon \leftarrow \tau + 1$ - while $\epsilon > \tau $: - $\sigma_k \leftarrow $ a $g_k$-greedy policy - $g_{k+1} \leftarrow R_{\sigma_k}^m g_k$ - $\epsilon \leftarrow \| g_k - g_{k+1} \|_\infty$ - $k \leftarrow k + 1$ - return $\sigma_k$ ``` {prf:ref}`algo-mod_opi` is globally convergent in the same sense as regular OPI ({prf:ref}`algo-mdp_opi`). In fact, if we pick $v_0 \in \RR^\Xsf$ and apply regular OPI with this initial condition, as well as applying {prf:ref}`algo-mod_opi` with initial condition $g_0 \coloneq E v_0$, then the sequences $(v_k)_{k \geq 0}$ and $(g_k)_{k \geq 0}$ generated by the two algorithms are connected via $g_k = E v_k$ for all $k \geq 0$. 
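The refactored OPI routine in {prf:ref}`algo-mod_opi`, together with the relationship $g_k = E v_k$, can be tried out on a small example. The sketch below is only illustrative: it uses a randomly generated MDP in which every action is feasible at every state, it adds an iteration cap (`max_iter`) that is not part of the algorithm, and it reads the final return step as "a policy that is greedy with respect to the last iterate". All function names are assumptions for this example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_a, beta = 5, 3, 0.9               # states, actions, discount factor
r = rng.uniform(size=(n, n_a))         # r[x, a]
P = rng.uniform(size=(n, n_a, n))      # P[x, a, x']
P /= P.sum(axis=2, keepdims=True)
states = np.arange(n)

def E(v): return P @ v
def D(g): return r + beta * g
def M_sigma(q, sigma): return q[states, sigma]   # (M_sigma q)(x) = q(x, sigma(x))

def g_greedy(g):
    # argmax over a of r(x, a) + beta g(x, a); ties go to the lowest index
    return np.argmax(D(g), axis=1)

def R_sigma(g, sigma): return E(M_sigma(D(g), sigma))
def T_sigma(v, sigma): return M_sigma(D(E(v)), sigma)

def power(op, x, sigma, m):
    """Apply op(., sigma) to x a total of m times."""
    for _ in range(m):
        x = op(x, sigma)
    return x

def refactored_opi(g0, m=20, tol=1e-10, max_iter=10_000):
    g, eps, k = g0, tol + 1, 0
    while eps > tol and k < max_iter:
        sigma = g_greedy(g)
        g_new = power(R_sigma, g, sigma, m)
        eps = np.max(np.abs(g - g_new))
        g, k = g_new, k + 1
    return g_greedy(g), g

# Run three steps of regular OPI from v0 = 0 and of the refactored
# version from g0 = E v0, then measure the distance between g_k and E v_k
v, g = np.zeros(n), E(np.zeros(n))
for _ in range(3):
    v = power(T_sigma, v, np.argmax(D(E(v)), axis=1), 20)
    g = power(R_sigma, g, g_greedy(g), 20)
gap = np.max(np.abs(g - E(v)))

sigma_star, g_star = refactored_opi(E(np.zeros(n)))
```

Here `gap` measures how far $g_k$ is from $E v_k$ after three updates, and `g_star` can be compared with $E v^*$ computed by ordinary value function iteration.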
If greedy policies are unique, then it is also true that the policy sequences generated by the two algorithms are identical. Let's prove these claims, assuming for convenience that greedy policies are unique. Consider first the claim that $g_k = E v_k$ for all $k \geq 0$. This is true by assumption when $k=0$. Suppose, as an induction hypothesis, that $g_k = E v_k$ holds at arbitrary $k$. Let $\sigma$ be $g_k$-greedy. Then $$ \sigma(x) = \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta g_k(x, a) \right\} = \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v_k(x') P(x, a, x') \right\}, $$ where the second equality is implied by $g_k = E v_k$. Hence $\sigma$ is both $g_k$-greedy and $v_k$-greedy and so is the next policy selected by both modified and regular OPI. Moreover, updating via {prf:ref}`algo-mod_opi` and applying {eq}`eq-rket`, we have $$ g_{k+1} = R_\sigma^m \, g_k = E T_\sigma^{m-1} \, M_\sigma \, D g_k = E T_\sigma^{m-1} \, M_\sigma \, D E v_k = E T_\sigma^m \, v_k. $$ Since $\sigma$ is $v_k$-greedy, $T_\sigma^m \, v_k$ is the next function selected by regular OPI. Hence $v_{k+1} = T_\sigma^m \, v_k$. Connecting with the last chain of equalities yields $g_{k+1} = Ev_{k+1}$. This completes the proof that $g_k = E v_k$ for all $k$. Policy functions generated by the algorithms are identical as well. The preceding discussion provides a justification for the modified OPI algorithm we adopted in {ref}`ss-osr`. (s-cn_mdps)= ## Chapter Notes Detailed treatment of MDPs can be found in books by {cite:t}`bellman1957dynamic`, {cite:t}`howard1960dynamic`, {cite:t}`denardo1981dynamic`, {cite:t}`puterman2005markov`, {cite:t}`bertsekas2012dynamic`, {cite:t}`hernandez2012discrete`, {cite:t}`hernandez2012further`, and {cite:t}`kochenderfer2022algorithms`. Further discussion of the connection between HPI and Newton iteration can be found in Section 6.4 of {cite:t}`puterman2005markov`. 
HPI is routinely used in artificial intelligence applications, including during the training of AlphaZero by DeepMind. Further discussion of these variants of HPI and their connection to Newton iteration can be found in {cite:t}`bertsekas2021rollout` and {cite:t}`bertsekas2022newton`. There are several methods available for accelerating value function iteration, including asynchronous VFI and Anderson acceleration. Due to space constraints, we omit discussion of these topics. Interested readers can find a treatment of asynchronous VFI in {cite:t}`bertsekas2022abstract`. For discussion of Anderson acceleration see, for example, {cite:t}`walker2011anderson` or {cite:t}`geist2018anderson`. First-order methods for accelerating VFI are presented in {cite:t}`goyal2023first`. Other methods for computing solutions to MDPs include the linear programming (LP) approach and the policy gradient technique, both of which solve a problem of the form $$ \max_{\sigma \in \Sigma} \sum_x w(x) v(x) \quad \st \quad v = r_\sigma + \beta P_\sigma \, v, $$ (eq-lpga) for some chosen weight function $w$. The LP approach views {eq}`eq-lpga` as a linear program and applies various algorithms to the primal and dual problems. See, for example, {cite:t}`puterman2005markov` or {cite:t}`ying2020note`. The policy gradient method involves approximating $\sigma$ and $v$ in {eq}`eq-lpga` using smooth functions with finitely many parameters. These parameters are then adjusted via some version of gradient ascent. A recent trend for high-dimensional MDPs is to approximate the value and policy functions with neural nets. An early exposition can be found in {cite:t}`bertsekas1996neuro`. A more recent monograph is {cite:t}`bertsekas2021rollout`. For research along these lines in the context of economic applications see, for example, {cite:t}`maliar2021deep`, {cite:t}`hill2021solving`, {cite:t}`han2021deepham`, {cite:t}`kahou2021exploiting`, {cite:t}`kase2022estimating`, and {cite:t}`azinovic2022deep`. 
In some versions of these algorithms, as well as in VFI and HPI, the expectations associated with dynamic programs are computed using Monte Carlo sampling methods. See, for example, {cite:t}`rust1997using`, {cite:t}`powell2007approximate`, and {cite:t}`bertsekas2021rollout`. {cite:t}`sidford2023variance` combine LP and sampling approaches. The optimal savings problem is a workhorse in macroeconomics and has been treated extensively in the literature. Early references include {cite:t}`brock1972optimal`, {cite:t}`mirman1975optimal`, {cite:t}`schechtman1976income`, {cite:t}`deaton1992behaviour`, and {cite:t}`carroll1997buffer`. For more recent studies, see, for example, {cite:t}`li2014solving`, {cite:t}`accikgoz2018existence`, {cite:t}`light2018precautionary`, {cite:t}`lehrer2018effect`, or {cite:t}`ma2020income`. Recent applications involving optimal savings in a representative agent framework include {cite:t}`bianchi2011overborrowing`, {cite:t}`paciello2014exogenous`, {cite:t}`rendahl2016fiscal`, {cite:t}`heathcote2018wealth`, {cite:t}`paroussos2019climate`, {cite:t}`erosa2019taxation`, {cite:t}`herrendorf2021structural`, and {cite:t}`michelacci2022extensive`. For more on the long right tail of the wealth distribution (as discussed in {ref}`ss-osr`), see, for example, {cite:t}`benhabib2015wealth_jet`, {cite:t}`krueger2016macroeconomics`, or {cite:t}`stachurski2019impossibility`. Households solving optimal savings problems are often embedded in heterogeneous agent models in order to study income distributions, wealth distributions, business cycles and other macroeconomic phenomena. 
Representative examples include {cite:t}`aiyagari1994uninsured`, {cite:t}`huggett1993risk`, {cite:t}`krusell1998income`, {cite:t}`miao2006competitive`, {cite:t}`algan2014solving`, {cite:t}`toda2014incomplete`, {cite:t}`benhabib2015wealth_jet`, {cite:t}`stachurski2019impossibility`, {cite:t}`toda2019wealth`, {cite:t}`light2020uniqueness`, {cite:t}`hubmer2020sources`, or {cite:t}`cao2020recursive`. Exercise §{prf:ref}`ex-tranper` considered optimal savings and consumption in the presence of transient and persistent shocks to labor income. For research in this vein, see, for example, {cite:t}`quah1990permanent`, {cite:t}`carroll2009precautionary`, {cite:t}`de2010elderly`, or {cite:t}`lettau2014shocks`. For empirical work on labor income dynamics, see, for example, {cite:t}`newhouse2005persistence`, {cite:t}`guvenen2007learning`, {cite:t}`guvenen2009empirical`, or {cite:t}`blundell2015labor`. For analysis of optimal savings in a very general setting, see {cite:t}`ma2020income` or {cite:t}`ma2021theory`. The optimal investment problem dates back to {cite:t}`lucas1971investment`. Textbook treatments can be found in {cite:t}`stokey1989recursive` and {cite:t}`dixit2012investment`. {cite:t}`sargent1980tobin` and {cite:t}`hayashi1982tobin` used optimal investment problems to connect optimal capital accumulation with Tobin's $q$ (which is the ratio between a physical asset's market value and its replacement value). Other influential papers in the field include {cite:t}`lee2000role`, {cite:t}`hassett2002tax`, {cite:t}`bloom2007uncertainty`, {cite:t}`bond2007microeconometric`, {cite:t}`bloom2009impact`, and {cite:t}`wang2012hayashi`. {cite:t}`carruth2000we` contains a survey. Classic papers about S--s inventory models include {cite:t}`arrow1951optimal` and {cite:t}`dvoretzky1952inventory`. Optimality of S--s policies under certain conditions was first established by {cite:t}`scarf1960optimality`. 
{cite:t}`kelle1999effect` study the impact of S--s inventory policies on the supply chain, including connection to the "bullwhip" effect. The connection between S--s inventory policies and macroeconomic fluctuations is studied in {cite:t}`nirei2006threshold`. The model in {prf:ref}`ex-fhi` is loosely adapted from {cite:t}`bagliano2004models`. {cite:t}`rust1994structural` is a classic and highly readable reference in the area of structural estimation of MDPs. {cite:t}`keane1997career` provides an influential study of the career choices of young men. {cite:t}`roberts1997decision` analyze the decision to export in the presence of sunk costs. {cite:t}`keane2011structural` provide an overview of structural estimation applied to labor market problems. {cite:t}`gentry2018structural` review analysis of auctions using structural estimation. {cite:t}`legrand2019empirical` surveys the use of structural models to study the dynamics of commodity prices. {cite:t}`calsamiglia2020structural` use structural estimation to study school choices. {cite:t}`iskhakov2020machine` provide a thoughtful discussion on the differences between structural estimation and machine learning. {cite:t}`luo2022penalized` propose structural estimation via sieves. Theoretical analysis of expected value functions in discrete choice models and other settings can be found in {cite:t}`rust1994structural`, {cite:t}`norets2010continuity`, {cite:t}`mogensen2018solving` and {cite:t}`kristensen2021solving`. The expected value Gumbel max trick is due to {cite:t}`rust1987optimal` and builds on work by {cite:t}`mcfadden1974measurement`. The Gumbel max trick is also used in machine learning methods (see, e.g., {cite}`jang2016categorical`). In {ref}`ss-qlearn` we mentioned $Q$-learning, which was originally proposed by {cite:t}`watkins1989learning`. {cite:t}`tsitsiklis1994asynchronous` and {cite:t}`melo2001convergence` studied convergence of $Q$-learning. 
In related work, {cite:t}`esponda2021equilibrium` study MDPs where dynamics are unknown, and where agents update their understanding of transition laws via Bayesian updating. The theory in {ref}`ss-rfbe` on optimality under modifications of the Bellman equation is loosely based on {cite:t}`ma2021dynamic`. That paper considers arbitrary modifications in a very general setting. [^1]: See {cite:t}`marcet2007incomplete` and {cite:t}`zhu2020existence` for more extensive analysis of how adding a labor supply choice can affect outcomes in a consumption-savings model. [^2]: Rational expectations econometrics was a response to that Critique. While early work on rational expectations originated from the macroeconomics community (e.g., {cite}`hansen1980formulating`, {cite}`hansen2019rational`), many of their examples were actually about industrial organization and other microeconomic models. This work was part of a broad process that erased many boundaries between micro and macro theory. [^3]: {cite:t}`hansen1980formulating` analyze the implications of such "Shiller errors" for efficient estimation procedures in a class of linear structural models. [^4]: Here, the woman is the primary carer of the child; she derives no utility from children in periods in which she works. See {cite:t}`keane2011structural` for further discussion. ======================================================================== ## State-Dependent Dynamics (c-state_dep)= # Stochastic Discounting In this chapter we describe how to extend the MDP model to handle time-varying discount factors, a specification now widely used in macroeconomics and finance. (s-vf)= ## Time-Varying Discount Factors We introduce formulas for infinite-horizon lifetime valuations under stochastic discounting and provide necessary and sufficient conditions for existence of finite solutions. (sss-ggs)= ### Valuation Our first step is to motivate and understand lifetime valuation when discount factors vary over time. 
(sss-tvir)= #### Motivation (ss-mosd)= In {ref}`sss-fvfi` we discussed firm valuation in a setting where the interest rate is constant. But data show that interest rates are time-varying, even for safe assets such as US Treasury bills. Figure {numref}`f-plot_interest_rates_nominal` shows nominal interest rate on one year Treasury bills since the 1950s, whereas Figure {numref}`f-plot_interest_rates_real` shows an estimate of the real interest rate for 10 year T-bills since 2012. Both nominal and real interest rates are evidently time varying. ```{figure} ../figures/plot_interest_rates_nominal.pdf :name: f-plot_interest_rates_nominal Nominal US interest rates (`plot_interest_rates_nominal.jl`) ``` ```{figure} ../figures/plot_interest_rates_real.pdf :name: f-plot_interest_rates_real Real US interest rates (`plot_interest_rates_real.jl`) ``` ```{prf:example} :label: eg-fvstatedep Consider a firm valuation problem where interest rates $(r_t)_{t \geq 0}$ are stochastic. The time zero expected present value of time $t$ profit $\pi_t$ is $$ \EE \left[ \beta_1 \cdots \beta_t \cdot \pi_t \right] \quad \text{where} \quad \beta_t \coloneq \frac{1}{1+r_t}. $$ The lifetime value of the firm is then $$ V_0 = \EE \, \sum_{t=0}^\infty \left[ \prod_{i=0}^{t} \beta_i \right] \pi_t. $$ (eq-v0fir) ``` ```{prf:remark} Time-varying discount factors are found in extensions of the Section {ref}`sss-nccas` household consumption-saving problem that appear in modern models of business cycle dynamics, asset prices, and wealth distributions. For just one important example, see {cite:t}`krusell1998income`. {cite:t}`marimon1984` added random discount factors in his thorough analysis of growth and turnpike properties of general equilibrium models, unfortunately only parts of which were included in {cite:t}`marimon1989`. Exogenous impatience shocks have been used as demand shocks in some dynamic models. For more citations see {ref}`s-cn_val`. 
``` (sss-do_theory)= #### Theory The aim of this section is to understand and evaluate expressions such as {eq}`eq-v0fir`. Throughout, - $\Xsf$ is a finite set, $P \in \mopx$, and $(X_t)_{t \geq 0}$ is $P$-Markov. - $h$ is an element of $\RR^\Xsf$, with $h(X_t)$ typically interpreted as a payoff or reward at time $t$ in state $X_t$. - $b$ is a map from $\Xsf \times \Xsf$ to $(0, \infty)$ and $$ \beta_t \coloneq b(X_{t-1}, X_t) \text{ for } t \in \NN \quad \text{with} \quad \beta_0 \coloneq 1. $$ (eq-defbetat) The sequence $(\beta_t)_{t \geq 0}$ is called a **discount factor process** and $\prod_{i=0}^t \beta_i$ is the discount factor for period $t$ payoffs evaluated at time zero. We are interested in expected discounted sums of the form $$ v(x) \coloneq \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t) \qquad (x \in \Xsf). $$ (eq-vxecg) ```{prf:theorem} :label: t-dpec Let $L \in \lopx$ be the discount operator defined by $$ L(x,x') \coloneq b(x, x') P(x, x') , $$ (eq-lbp) for $(x,x') \in \Xsf \times \Xsf$. If $\rho(L)<1$, then $v$ in {eq}`eq-vxecg` is finite for all $x \in \Xsf$ and, moreover, $$ v = (I - L)^{-1} h = \sum_{t =0}^\infty L^t h. $$ (eq-redpeg) ``` {prf:ref}`t-dpec` generalizes {prf:ref}`l-fgsd`. Indeed, if $b \equiv \beta \in (0, 1)$, then $L = \beta P$ and $\rho(L) = \beta \rho(P) = \beta < 1$, so the result in {prf:ref}`t-dpec` reduces to {prf:ref}`l-fgsd`. ```{prf:proof} :label: p-t-dpec *Proof of {prf:ref}`t-dpec`.* To verify {prf:ref}`t-dpec`, we first prove that $$ \EE_x \, \left[ \prod_{i=0}^t \beta_i \right] \, h(X_t) = (L^t h)(x) \quad \text{for all } t \in \NN \text{, } h \in \RR^\Xsf \text{ and } x \in \Xsf. $$ (eq-ebbh) We establish {eq}`eq-ebbh` using induction on $t$. It is easy to see that {eq}`eq-ebbh` holds at $t=1$. Now suppose it holds at $t$. We claim it also holds at $t+1$. To show this we fix $h \in \RR^\Xsf$ and set $\delta_t \coloneq \prod_{i=0}^t \beta_i$. 
Applying the law of iterated expectations (see {ref}`sss-lawie`) yields $$ \EE_x \, \delta_{t+1} \, h(X_{t+1}) = \EE_x \, \EE_t \, b(X_t, X_{t+1}) \delta_t \, h(X_{t+1}) = \EE_x \, \delta_t \, \EE_t \, b(X_t, X_{t+1}) h(X_{t+1}). $$ Since $$ \EE_t \, b(X_t, X_{t+1}) h(X_{t+1}) = \sum_{x'} b(X_t, x') h( x') P(X_t, x') = \sum_{x'} L(X_t, x') h( x') = (Lh)(X_t), $$ we can now write $$ \EE_x \, \delta_{t+1} h(X_{t+1}) = \EE_x \, \delta_t f(X_t) \quad \text{where} \quad f(x) \coloneq (L h)(x). $$ (eq-ebbhi) Applying the induction hypothesis to {eq}`eq-ebbhi` yields $\EE_x \, \delta_{t+1} h(X_{t+1}) = (L^t f)(x)$. But $L^t f = L^t L h = L^{t+1} h$, so $\EE_x \, \delta_{t+1} h(X_{t+1}) = (L^{t+1} h)(x)$. This completes the induction step and hence the proof of {eq}`eq-ebbh`. Now we can complete the proof of {prf:ref}`t-dpec`. To this end, we fix $x \in \Xsf$ and use {eq}`eq-ebbh` to obtain $$ v(x) = \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t) = \sum_{t=0}^\infty \EE_x \, \left[ \prod_{i=0}^t \beta_i \right] h(X_t) = \sum_{t=0}^\infty (L^t h)(x). $$ (eq-vxgs) Pointwise, this is $v = \sum_{t \geq 0} L^t h$. By the Neumann series lemma and $\rho(L)<1$, this sum converges and equals $(I-L)^{-1} h$. ◻ ``` In {eq}`eq-vxgs` we passed expectations through an infinite sum. This operation is valid under the assumption $\rho(L)<1$. A complete proof can be found in {ref}`s-state_dep_append`. ```{exercise} :label: ex-state_dep-auto-1 Consider {prf:ref}`eg-fvstatedep` again but now assume that $(X_t)$ is $P$-Markov, $\pi_t = \pi(X_t)$, and $r_t = r(X_t)$ for some $r, \pi \in \RR^\Xsf$.[^1] The expected present value of the firm given current state $X_0 = x$ is $$ v(x) = \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^{t} \beta_i \right] \pi_t. $$ (eq-vfecr) Suggest a condition under which $v(x)$ is finite and discuss how to compute it. ``` ```{solution} ex-state_dep-auto-1 Set $L(x, x') \coloneq \beta(x) P(x, x')$ with $\beta(x) \coloneq 1/(1+r(x))$. 
We claim that {eq}`eq-vfecr` is finite for all $x \in \Xsf$ and satisfies $v = (I - L)^{-1} \pi$ whenever $\rho(L)<1$. To see this, we apply {prf:ref}`t-dpec` with $b(x, x') = \beta(x)$ and $h = \pi$. Incidentally, to understand $v = \pi + Lv$, suppose we buy the firm now, hold it for one period and then sell it. The expected present value of the payoff is $\pi + Lv$. If expected benefit equals cost, then the value of (i.e., cost of buying) the firm now should equal $\pi + Lv$. That is, $v = \pi + Lv$. We expand on these ideas in {ref}`s-asset`. ``` ```{exercise} :label: ex-state_dep-auto-2 Let $\Xsf$ be partially ordered and assume $\rho(L)<1$. Prove that $v$ is increasing on $\Xsf$ whenever $P$ is monotone increasing, $\pi$ is increasing on $\Xsf$, and $r$ is decreasing on $\Xsf$. ``` ### Testing the Spectral Radius Condition In {prf:ref}`t-dpec` the condition $\rho(L) < 1$ drives stability. In this section, we develop necessary and sufficient conditions for $\rho(L) < 1$ to hold. #### Spectral Radii via Expectations First we develop an alternative representation of the spectral radius based on expectations. The next result is proved via a local spectral radius argument. In the statement, $\beta_t$ is as defined in {eq}`eq-defbetat` and $L$ is the operator in {eq}`eq-lbp`. ```{prf:lemma} :label: l-alsr Let $(X_t)$ be $P$-Markov starting at $x$. The spectral radius of $L$ obeys $$ \rho(L) = \lim_{t \to \infty} \ell_t^{1/t} \quad \text{when} \quad \ell_t \coloneq \max_{x \in \Xsf} \EE_x \prod_{i=0}^{t} \beta_i. $$ (eq-alsr) Moreover, $\rho(L) < 1$ if and only if there exists a $t \in \NN$ such that $\ell_t < 1$. ``` ```{prf:proof} Let $\1$ be an $n$-vector of ones. In view of {eq}`eq-ebbh`, for fixed $t \in \NN$, we have $$ \ell_t^{1/t} = \left( \max_{x \in \Xsf} (L^t \1)(x) \right)^{1/t} = \| L^t \1 \|_\infty^{1/t}. $$ Since $\1 \gg 0$, an application of {prf:ref}`l-lsr` yields {eq}`eq-alsr`. 
For a proof of the second claim in {prf:ref}`l-alsr`, see Proposition 4.1 of {cite:t}`stachurski2021dynamic`. ◻ ``` The expression in {eq}`eq-alsr` connects the spectral radius with the long-run properties of the discount factor process. The connection becomes even simpler when $P$ is irreducible, as the next exercise asks you to show. ```{exercise} :label: ex-iralsr Let $P$ be irreducible. Show that, when $(X_t)$ is $P$-Markov with $X_0$ drawn from the unique stationary distribution $\psi^*$ of $P$, we also have $$ \rho(L) = \lim_{t \to \infty} \left( \EE \prod_{i=0}^{t} \beta_i \right)^{1/t}. $$ (eq-aslsr) (Hint: Try replacing $\| \cdot \|_\infty$ in the proof of {prf:ref}`l-alsr` with $\|h\|_* \coloneq \sum_{x} |h(x)| \psi^*(x)$. We showed that $\| \cdot \|_*$ is a norm on $\RR^\Xsf$ in {prf:ref}`ex-expnorm`.) ``` {prf:ref}`ex-iralsr` shows that the spectral radius is a long-run (geometric) average of the discount factor process. For the conclusions of {prf:ref}`t-dpec` to hold, we need this long-run average to be less than unity. ```{solution} ex-iralsr Let $(X_t)$ be $P$-Markov with $X_0$ drawn from $\psi^*$, let $\| \cdot \|_*$ be the norm defined in the exercise and let $\1$ be an $n$-vector of ones. In view of {eq}`eq-ebbh`, for fixed $t \in \NN$, we have $$ \EE \prod_{i=0}^{t} \beta_i = \EE \left[ \EE \prod_{i=0}^{t} \beta_i \given X_0 \right] = \EE (L^t \1)(X_0) = \| L^t \1 \|_*. $$ Since $\1 \gg 0$, the local spectral radius result yields {eq}`eq-aslsr`. ``` Figure {numref}`f-discount_spec_rad` illustrates the condition $\rho(L) < 1$ when $\beta_t = X_t$ and $P$ is a Markov matrix produced by discretization of the AR1 process $$ X_{t+1} = \mu (1 - a) + a X_t + s (1 - a^2)^{1/2} \epsilon_{t+1} \qquad (\epsilon_t) \iidsim N(0,1). $$ (eq-car1) The discussion in {ref}`ss-apta` tells us that the stationary distribution $\psi^*$ of {eq}`eq-car1` is normally distributed with mean $\mu$ and standard deviation $s$. The parameter $a$ controls autocorrelation. 
In the figure we set $\mu$ to 0.96, which, since $\beta_t = X_t$, is the stationary mean of the discount factor process. The parameters $a$ and $s$ are varied in the figure, and the contour plot shows the corresponding value of $\rho(L)$. The process {eq}`eq-car1` is discretized via the Tauchen method with the size of the state space set to 6 (which avoids negative values for $\beta(x)$). ```{figure} ../figures/discount_spec_rad.pdf :name: f-discount_spec_rad $\rho(L)$ for different values of $(a,s)$ (`discount_spec_rad.jl`) ``` The figure shows that $\rho(L)$ tends to increase with both the volatility and the autocorrelation of the state process. This seems natural given the expression on the right-hand side of {eq}`eq-aslsr`, since sequences of large values of $\beta_i$ compound in the product $\prod_{i=0}^{t} \beta_i$, pushing up the long-run average value, and such sequences occur more often when autocorrelation and volatility are large. We finish this section with a lemma that simplifies computation of the spectral radius in settings where the process $(\beta_t)$ depends only on a subset of the state variables -- a setting that is common in applications. In the statement of the lemma, the state space $\Xsf$ takes the form $\Xsf = \Ysf \times \Zsf$. We fix $Q \in \mopz$ and $R \in \mopy$. The discount operator $L$ is $$ L(x,x') = b(z, z') Q(z, z') R(y, y') \quad \text{with} \quad b \colon \Zsf \times \Zsf \to \RR_+. $$ Let $(Z_t)$ and $(Y_t)$ be $Q$-Markov and $R$-Markov respectively, so that, with $P$ as the pointwise product of $Q$ and $R$, the process $(X_t) \coloneq ((Z_t, Y_t))$ is $P$-Markov. We set $L_\Zsf(z,z') \coloneq b(z, z') Q(z, z')$. ```{prf:lemma} :label: l-zyind The operators $L$ and $L_\Zsf$ obey $\rho(L) = \rho(L_\Zsf)$, where the first spectral radius is taken in $\lopx$ and the second is taken in $\lopz$. ``` ```{prf:proof} Let $\beta_t \coloneq b(Z_{t-1}, Z_t)$ for $t \in \NN$, with $\beta_0 \coloneq 1$, as in {eq}`eq-defbetat`.
Since $(\beta_t)$ depends only on $(Z_t)$ and, in addition, $(Y_t)$ and $(Z_t)$ are independent, for $x = (y,z) \in \Xsf$ we have $\EE_x \, \prod_{i=0}^t \beta_i = \EE_{(y, z)} \, \prod_{i=0}^t \beta_i = \EE_z \, \prod_{i=0}^t \beta_i$. Hence $$ \left( \max_{x \in \Xsf} \EE_x \, \prod_{i=0}^t \beta_i \right)^{1/t} = \left(\max_{z \in \Zsf} \EE_z \, \prod_{i=0}^t \beta_i \right)^{1/t}. $$ Taking the limit and using {prf:ref}`l-alsr` gives $\rho(L) = \rho(L_\Zsf)$, where the first spectral radius is taken in $\lopx$ and the second is taken in $\lopz$. ◻ ``` (sss-necnsl)= #### Necessary Conditions In {ref}`sss-ggs` we studied settings where lifetime value is a function $v$ on a state space $\Xsf$ that satisfies an equation of the form $v = h + L v$. The unknown is $v \in \RR^\Xsf$ where $\Xsf$ is a finite set, $h \in \RR^\Xsf$ is given and $L$ is a linear operator from $\RR^\Xsf$ to itself. We discussed the fact that $\rho(L) < 1$ is sufficient for $v = h + L v$ to have a unique solution. In some settings the condition $\rho(L) < 1$ is also necessary. For example, let - $V = (0, \infty)^\Xsf$ - $L$ be a positive linear operator on $\RR^\Xsf$. In this setting we have the following result: ```{prf:lemma} :label: l-nsln If $h \in V$, then the next two statements are equivalent. 1. $\rho(L) < 1$. 2. The equation $v = h + L v$ has a unique solution in $V$. ``` ```{prf:proof} Regarding (i) $\implies$ (ii), existence of a unique $v \in \RR^\Xsf$ satisfying $v = h + L v$ follows from the Neumann series lemma. Since $v= \sum_{t \geq 0} L^t h \geq h \gg 0$, we have $v \in V$. For (ii) $\implies$ (i), let $v$ be any solution to $v = h + L v$ in $V$. By the Perron--Frobenius theorem, we can select a left eigenvector $e$ such that $e \geq 0$ and $e^\top L= \rho(L) e^\top$. For this $e$, we have $e^\top v = e^\top Lv + e^\top h = \rho(L) e^\top v + e^\top h$. Since $e \geq 0$, $e \not= 0$ and $v, h \gg 0$, it must be that $e^\top h > 0$ and $e^\top v > 0$. 
Therefore $\rho(L)$ satisfies $(1-\rho(L)) \alpha = \beta$ for $\alpha, \beta > 0$. Hence $\rho(L) < 1$. ◻ ``` In {ref}`sss-powtas` we will extend lemma {prf:ref}`l-nsln` to handle certain nonlinear equations. ### Fixed-Point Results State-dependent discounting breaks the contractivity properties that we exploited in {prf:ref}`c-mdps`, when we studied optimality of MDPs (see, e.g., the proof of {prf:ref}`p-dmdp_o`). Here we introduce a generalization of Banach's fixed-point theorem that can deliver global stability under weaker conditions. For the remainder of this section, $\Xsf$ is any finite set. #### Long-Run Contractions Fix $U \subset \RR^\Xsf$. We call a self-map $T$ on $U$ **eventually contracting** if there exists a $k \in \NN$ and a norm $\| \cdot \|$ on $\RR^\Xsf$ such that $T^k$ is a contraction on $U$ under $\| \cdot \|$. ```{prf:theorem} :label: t-bfpt2 Let $U$ be a closed subset of $\RR^\Xsf$ and let $T$ be a self-map on $U$. If $T$ is eventually contracting on $U$, then $T$ is globally stable on $U$. ``` ```{exercise} :label: ex-state_dep-auto-3 Prove {prf:ref}`t-bfpt2`. (Hint: {prf:ref}`t-bfpt` is self-improving, in the sense that it implies this seemingly stronger result.) ``` ```{solution} ex-state_dep-auto-3 Let $U$ be closed in $\RR^\Xsf$ and let $T$ be a self-map on $U$ such that $T^k$ is a contraction. Let $u^*$ be the unique fixed point of $T^k$. Fix $\epsilon > 0$. We can choose $m$ such that $\| T^{mk} T u^* - u^* \| < \epsilon$. Then $$ \| T T^{mk} u^* - u^*\| = \| T u^* - u^*\| < \epsilon. $$ Since $\epsilon$ was arbitrary we have $\| T u^* - u^*\| = 0$, implying that $u^*$ is a fixed point of $T$. The proof that $T^m u \to u^*$ for any $u$ is left to the reader. ``` The next example illustrates {prf:ref}`t-bfpt2` by proving a result similar to {prf:ref}`ex-tat`. 
```{prf:example} :label: eg-affmap_ec If $T u = A u + b$ for some $b \in \RR^\Xsf$ and $A \in \lopx$ with $\rho(A) < 1$, then, under the Euclidean norm, $$ \| T^k u - T^k v \| = \| A^k u - A^k v \| = \| A^k (u - v) \| \leq \|A^k \| \| u - v \|, $$ where the final inequality follows from the submultiplicative property of the operator norm. Since $\rho(A) < 1$, we can choose a $k \in \NN$ such that $\| A^k \| < 1$ (see {prf:ref}`ex-rcondi`). Hence $T$ is eventually contracting and {prf:ref}`t-bfpt2` yields global stability. The unique fixed point satisfies $u = Au + b$ and, since $\rho(A) < 1$, we can use the Neumann series lemma to write it as $u = (I - A)^{-1} b$. ``` {prf:ref}`eg-affmap_ec` illustrates the connection between {prf:ref}`t-bfpt2` and the Neumann series lemma. {prf:ref}`t-bfpt2` is more general because it can be applied in nonlinear settings. But the Neumann series lemma remains important because, when applicable, it provides inverse and power series representations of the fixed point. On one hand, if $T$ is a contraction map on $U \subset \RR^\Xsf$ with respect to a given norm $\| \cdot \|_a$, we cannot necessarily say that $T$ is a contraction with respect to some other norm $\| \cdot \|_b$ on $\RR^\Xsf$. On the other hand, if $T$ is an eventual contraction on $U$ with respect to some given norm on $\RR^\Xsf$, then $T$ is eventually contracting with respect to every norm on $\RR^\Xsf$. The next exercise asks you to verify this. ```{exercise} :label: ex-state_dep-auto-4 Let $\| \cdot \|_a$ and $\| \cdot \|_b$ be norms on $\RR^\Xsf$ and let $T$ be a self-map on $U \subset \RR^\Xsf$ such that $T^k$ is a contraction on $U$ with respect to $\| \cdot \|_a$ for some $k \in \NN$. Prove that there exists an $\ell \in \NN$ such that $T^\ell$ is a contraction on $U$ with respect to $\| \cdot \|_b$.
``` (sss-asrcon)= #### A Spectral Radius Condition The following sufficient condition for eventual contractivity will be helpful when we study dynamic programs with state-dependent discounting. ```{prf:proposition} :label: p-ecrdps0 Let $T$ be a self-map on $U \subset \RR^\Xsf$. If there exists a positive linear operator $L$ on $\RR^\Xsf$ such that $\rho(L) < 1$ and $$ |Tv-Tw| \leq L |v - w| $$ for all $v, w \in U$, then $T$ is an eventual contraction on $U$. ``` ```{prf:proof} Fix $v, w \in U$. Pick any $k \in \NN$. We have $|T^k v - T^k w| \leq L | T^{k-1} v - T^{k-1} w |$, or $$ e_k \leq L e_{k-1} \quad \text{where} \quad e_k \coloneq |T^k v - T^k w| . $$ (eq-dskh) Since $L$ is positive, $L$ is order-preserving on $U$ by {prf:ref}`ex-plop`. As a result, we can iterate on {eq}`eq-dskh` to obtain $e_k \leq L^k e_0$, or $$ |T^k v - T^k w| \leq L^k | v - w |. $$ Let $\| \cdot \|$ be the Euclidean norm. Since $0 \leq a \leq b$ implies $\| a \| \leq \| b \|$, we get $$ \| T^k v - T^k w \| \leq \| L^k |v - w |\| \leq \| L^k \| \| v - w \|, $$ where $\| L^k \|$ is the operator norm (see {ref}`ss-lineq`). Since $\rho(L) < 1$, we have $\| L^k \| \to 0$ as $k \to \infty$ (see {prf:ref}`ex-rcondi`), so $\| L^k \| < 1$ for some $k \in \NN$. Hence $T$ is eventually contracting on $U$. ◻ ``` #### A Generalized Blackwell Condition In {ref}`sss-blackwell` we studied a sufficient condition for order-preserving self-maps to be contractions. The next proposition provides an analogous result for eventual contractions. In the statement of the proposition, $U$ is a subset of $\RR^\Xsf$ such that $v, c \in U$ and $c \geq 0$ implies $v+c \in U$. ```{prf:proposition} :label: p-blackec Let $T$ be an order-preserving self-map on $U$. If there exists a positive linear operator $L$ on $\RR^\Xsf$ such that $\rho(L) < 1$ and $$ T(v + c) \leq Tv + Lc \quad \text{ for all } c, v \in \RR^\Xsf \text{ with } c \geq 0, $$ then $T$ is eventually contracting on $U$.
``` ```{prf:proof} Fix $v, w \in U$ and let $T$ and $L$ be as in the statement of the proposition. By the assumed properties on $T$, we have $$ Tv = T(v + w - w) \leq T(w + |v - w|) \leq Tw + L |v - w|. $$ Rearranging gives $Tv - Tw \leq L|v-w|$. Reversing the roles of $v$ and $w$ yields $|Tv - Tw| \leq L|v-w|$. The claim in {prf:ref}`p-blackec` now follows from {prf:ref}`p-ecrdps0`. ◻ ``` (s-opt_sdep)= ## Optimality with State-Dependent Discounting We can now turn to dynamic programs in which the objective is to maximize a lifetime value in the presence of state-dependent discounting. First, we present an extension of the MDP model from {prf:ref}`c-mdps` that admits state-dependent discounting. Then we provide weak conditions under which optimal policies exist and Bellman's principle of optimality holds. (ss-mdsdd)= ### MDPs with State-Dependent Discounting We are ready to extend the MDP model to include state-dependent discounting. We construct a framework and then provide weak conditions for optimality based on spectral radius methods. (sss-sdepgp)= #### Setup To provide a framework for dynamic programs with state-dependent discounting, we begin with an MDP $(\Gamma, \beta, r, P)$ with state space $\Xsf$, action space $\Asf$ and feasible state action pairs $\Gsf$. We then replace the constant discount factor $\beta$ with a function $\beta$ from $\Gsf \times \Xsf$ to $\RR_+$. We call the resulting model an **MDP with state-dependent discounting**. The **Bellman equation** takes the form $$ v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \sum_{x'} v(x') \beta(x, a, x') P(x, a, x') \right\}, $$ (eq-gec_bell00) where $x \in \Xsf$ and $v \in \RR^\Xsf$. Notice that the discount factor depends on all relevant information: The current action, the current state and the stochastically determined next period state. 
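To make the right-hand side of {eq}`eq-gec_bell00` concrete, the following Python sketch evaluates it at a candidate $v$ on a small state space. It is only an illustration: the function name `bellman_rhs`, the nested-list layout (`r[x][a]`, `beta[x][a][y]`, `P[x][a][y]`) and all numerical values are assumptions made here, not part of the book's companion code.

```python
def bellman_rhs(v, Gamma, r, beta, P):
    """Evaluate max_a { r(x,a) + sum_x' v(x') beta(x,a,x') P(x,a,x') } at each x.

    Returns the maximized values and, for each x, one maximizing action.
    """
    n = len(v)
    values, maximizers = [], []
    for x in range(n):
        best_val, best_a = -float("inf"), None
        for a in Gamma[x]:
            # Discounted continuation value under action a
            cont = sum(v[y] * beta[x][a][y] * P[x][a][y] for y in range(n))
            val = r[x][a] + cont
            if val > best_val:
                best_val, best_a = val, a
        values.append(best_val)
        maximizers.append(best_a)
    return values, maximizers


# Assumed toy model: two states, two actions, all actions feasible.
Gamma = [[0, 1], [0, 1]]
r = [[1.0, 0.5],
     [0.0, 2.0]]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
# Assumed discount factor: depends only on the next state, exceeds one in state 1.
b_next = [0.9, 1.02]
beta = [[b_next for a in range(2)] for x in range(2)]

Tv, maximizers = bellman_rhs([1.0, 2.0], Gamma, r, beta, P)
print(Tv, maximizers)
```

In this example the discount factor is larger than one whenever the next state is state 1, something a constant $\beta \in (0,1)$ cannot represent.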
For MDPs with state-dependent discounting, we can obtain standard optimality results by assuming that there exists a $b < 1$ such that $\beta(x, a, x') \leq b$ for all $(x, a, x') \in \Gsf \times \Xsf$. In this setting it is easy to show that lifetime values are finite, and to extend the optimality results for regular MDPs found in {prf:ref}`p-dmdp_o`. Unfortunately, the assumption discussed in the previous paragraph is too strict for many applications. (We return to this point in {ref}`sss-csrcon`.) We will state an optimality result under weaker conditions. (sss-sdlife)= #### Finite Lifetime Values Let $\Sigma$ be the set of all feasible policies, defined as for regular MDPs. The **policy operator** $T_\sigma$ corresponding to $\sigma \in \Sigma$ is represented by $$ (T_\sigma \, v)(x) = r(x, \sigma(x)) + \sum_{x'} v(x') \beta(x, \sigma(x), x') P(x, \sigma(x), x'). $$ (eq-gec_polop) Following {prf:ref}`c-mdps`, we set $r_\sigma(x) \coloneq r(x,\sigma(x))$. We define $L_\sigma \in \lopx$ via $$ L_\sigma(x,x') \coloneq \beta(x, \sigma(x), x') P(x, \sigma(x), x'). $$ (eq-lsigsd) Notice that we can now write {eq}`eq-gec_polop` as $T_\sigma \, v = r_\sigma + L_\sigma \, v$. In line with our discussion of MDPs in {prf:ref}`c-mdps`, when $T_\sigma$ has a unique fixed point we denote it by $v_\sigma$ and interpret it as lifetime value. ```{prf:assumption} :label: a-sdlife For all $\sigma \in \Sigma$ we have $\rho(L_\sigma) < 1$. ``` ```{prf:lemma} :label: l-sdlifef If {prf:ref}`a-sdlife` holds, then, for each $\sigma \in \Sigma$, the linear operator $I-L_\sigma$ is invertible and, in $\RR^\Xsf$, the policy operator $T_\sigma$ has a unique fixed point $$ v_\sigma = (I - L_\sigma)^{-1} r_\sigma. $$ (eq-vsig_stat_dep) ``` ```{prf:proof} Fix $\sigma \in \Sigma$. By the Neumann series lemma, $I - L_\sigma$ is invertible. Any fixed point of $T_\sigma$ obeys $v = r_\sigma + L_\sigma \, v$, which, given invertibility of $I - L_\sigma$, is equivalent to {eq}`eq-vsig_stat_dep`.
◻ ``` As discussed, the value $v_\sigma(x)$ has the interpretation of lifetime value of policy $\sigma$ conditional on initial state $x$. We can reinforce this interpretation by connecting {prf:ref}`l-sdlifef` to {prf:ref}`t-dpec`. The next exercise asks you to work through all the steps. ```{exercise} :label: ex-lvstatedep Fix $\sigma \in \Sigma$, set $\beta_t \coloneq \beta(X_{t-1}, \sigma (X_{t-1}), X_{t})$ for $t \geq 1$ and $\beta_0 \coloneq 1$. Let $(X_t)$ be $P_\sigma$-Markov with initial condition $x$. (As before, $P_\sigma(x, x') \coloneq P(x, \sigma(x), x')$.) Prove that, under {prf:ref}`a-sdlife`, the function $v_\sigma$ obeys $$ v_\sigma(x) = \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] r_\sigma(X_t) \qquad (x \in \Xsf). $$ (eq-vyzx0) ``` ```{exercise} :label: ex-state_dep-auto-5 Show that, under {prf:ref}`a-sdlife`, the operator $T_\sigma$ is globally stable on $\RR^\Xsf$. ``` ```{solution} ex-state_dep-auto-5 Fix $\sigma \in \Sigma$ and let {prf:ref}`a-sdlife` hold. We saw in the proof of {prf:ref}`l-sdlifef` that $T_\sigma \, v = r_\sigma + L_\sigma \, v$ and $v_\sigma = (I - L_\sigma)^{-1} r_\sigma$ is the unique fixed point of this operator in $\RR^\Xsf$. Moreover, for fixed $v, w \in \RR^\Xsf$, we have $$ |T_\sigma \, v - T_\sigma \, w| = |L_\sigma \, v - L_\sigma \, w| = |L_\sigma \, (v - w)| \leq L_\sigma \, |v - w|. $$ The inequality holds because $L_\sigma \geq 0$. Hence, by {prf:ref}`p-ecrdps0`, $T_\sigma$ is eventually contracting on $\RR^\Xsf$, and global stability follows from {prf:ref}`t-bfpt2`. ``` ```{exercise} :label: ex-state_dep-auto-6 Show that {prf:ref}`a-sdlife` holds whenever there exists an $L \in \lopx$ such that $\rho(L)<1$ and $$ \beta(x, a, x') P(x, a, x') \leq L(x, x') \quad \text{for all } (x,a) \in \Gsf \text{ and } x' \in \Xsf. $$ (eq-sddmdp0) ``` ```{solution} ex-state_dep-auto-6 Fix $\sigma \in \Sigma$. When {eq}`eq-sddmdp0` holds, we have $0 \leq L_\sigma \leq L$. {prf:ref}`ex-nnmatop2` now implies that $\rho(L_\sigma) \leq \rho(L)$. Hence $\rho(L_\sigma) < 1$.
``` (sss-opsdd)= #### Optimality The **Bellman operator** takes the form $$ (Tv)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \sum_{x'} v(x') \beta(x, a, x') P(x, a, x') \right\}, $$ (eq-gec_bell) where $x \in \Xsf$ and $v \in \RR^\Xsf$. Given $v \in \RR^\Xsf$, a policy $\sigma$ is called **$v$-greedy** if $\sigma(x)$ is a maximizer of the right-hand side of {eq}`eq-gec_bell` for all $x$ in $\Xsf$. Equivalently, $\sigma$ is $v$-greedy whenever $T_\sigma \, v = Tv$. When {prf:ref}`a-sdlife` holds and, as a result, $T_\sigma$ has a unique fixed point $v_\sigma$ for each $\sigma \in \Sigma$, we let $v^*$ denote the **value function**, which is defined as $v^* \coloneq \vee_{\sigma \in \Sigma} v_\sigma$. As for the regular MDP case, a policy $\sigma$ is called **optimal** if $v_\sigma = v^*$. We can now state our main optimality result for MDPs with state-dependent discounting. ```{prf:proposition} :label: p-sddmdpec0 If {prf:ref}`a-sdlife` holds, then 1. the value function $v^*$ is the unique solution to the Bellman equation in $\RR^\Xsf$, 2. a policy $\sigma \in \Sigma$ is optimal if and only if it is $v^*$-greedy, and 3. at least one optimal policy exists. ``` In {ref}`ss-ecrdps` we prove a result that includes {prf:ref}`p-sddmdpec0` as a special case. (sss-sdmdpsal)= #### Algorithms Algorithms for solving an MDP with state-dependent discounting include value function iteration (VFI), Howard policy iteration (HPI), and optimistic policy iteration (OPI). The algorithms for VFI and OPI are identical to those given for regular MDPs (see {ref}`ss-fmdpsal`), provided that the correct operators $T$ and $T_\sigma$ are used, and that the definition of a $v$-greedy policy is as given in {ref}`sss-sdepgp`. The algorithm for HPI is almost identical, with the only change being that computation of lifetime values involves $L_\sigma$. Details are given in {prf:ref}`algo-hpi_sd`. 
```{prf:algorithm} HPI for MDPs with state-dependent discounting :label: algo-hpi_sd - input $\sigma \in \Sigma$ - $v_0 \leftarrow v_\sigma$ and $k \leftarrow 0$ - repeat: - $\sigma_k \leftarrow $ a $v_k$-greedy policy - $v_{k+1} \leftarrow (I - L_{\sigma_k} )^{-1} r_{\sigma_k}$ - if $v_{k+1} = v_k$: break - $k \leftarrow k + 1$ - return $\sigma_k$ ``` We prove in {prf:ref}`c-rdps` that, under the conditions of {prf:ref}`a-sdlife`, VFI, OPI and HPI are all convergent, and that HPI converges to an exact optimal policy in a finite number of steps. (sss-deffsmpd)= #### Exogenous Discounting Some applications use an exogenous state component to drive a discount factor process. In this section we set up such a model and obtain optimality conditions by applying {prf:ref}`p-sddmdpec0`. The first step is to decompose the state $X_t$ into a pair $(Y_t, Z_t)$, where $(Y_t)_{t \geq 0}$ is endogenous (i.e., affected by the actions of the controller) and $(Z_t)_{t \geq 0}$ is purely exogenous. In particular, the primitives consist of 1. a nonempty correspondence $\Gamma$ from $\Ysf \times \Zsf$ to $\Asf$, 2. a function $\beta$ from $\Zsf$ to $\RR_+$, 3. a function $r$ from $\Gsf \coloneq \setntn{(y, a) \in \Ysf \times \Asf}{a \in \Gamma(y)}$ to $\RR$, 4. a stochastic matrix $Q$ on $\Zsf$ and 5. a stochastic kernel $R$ from $\Gsf$ to $\Ysf$. The corresponding Bellman equation is $$ v(y, z) = \max_{a \in \Gamma(y, z)} \left\{ r(y, a) + \beta(z) \sum_{z', \, y'} v(y', z') Q(z, z') R(y, a, y') \right\}, $$ (eq-besdmdp) for all $(y, z) \in \Xsf$. Given $v \in \RR^\Xsf$, a policy $\sigma \in \Sigma$ is called **$v$-greedy** if $$ \sigma(y, z) \in \argmax_{a \in \Gamma(y, z)} \left\{ r(y, a) + \beta(z) \sum_{z', \, y'} v(y', z') Q(z, z') R(y, a, y') \right\}, $$ (eq-ncdmcsd) for all $(y, z) \in \Xsf$. This exogenous discount model is a special case of the general MDP with state-dependent discounting. 
Indeed, we can write {eq}`eq-besdmdp` as {eq}`eq-gec_bell` by setting $x \coloneq (y,z)$ and defining $$ P(x,a,x') \coloneq P((y, z),a, (y', z')) \coloneq Q(z, z') R(y, a, y'). $$ The following proposition provides a relatively simple sufficient condition for the core optimality results in the setting of the exogenous discount model. ```{prf:proposition} :label: p-dmdpsd Let $L_\Zsf$ be the operator in $\lopz$ defined by $L_\Zsf(z, z') \coloneq \beta(z) Q(z, z')$. If $\rho(L_\Zsf) < 1$, then all of the optimality results in {prf:ref}`p-sddmdpec0` hold. ``` ```{exercise} :label: ex-state_dep-auto-7 Prove {prf:ref}`p-dmdpsd`. (Hint: Use {prf:ref}`l-zyind`.) ``` ```{solution} ex-state_dep-auto-7 Fix $\sigma \in \Sigma$. In the present setting, the discount operator $L_\sigma$ from {eq}`eq-lsigsd` becomes $$ L_\sigma(x, x') = L_\sigma((y,z), (y',z')) = \beta(z) Q(z,z') R(y, \sigma(y), y'). $$ In view of {prf:ref}`l-zyind`, the spectral radius of $L_\sigma$ on $\lopx$ is equal to the spectral radius of $L_\Zsf(z, z') = \beta(z) Q(z, z')$ on $\lopz$. It follows that $\rho(L_\Zsf) < 1$ in $\lopz$ implies $\rho(L_\sigma) < 1$ in $\lopx$, so {prf:ref}`a-sdlife` holds. Hence, under this condition, {prf:ref}`p-sddmdpec0` is valid. ``` (sss-csrcon)= #### Comments on the Spectral Radius Condition In {ref}`sss-sdlife` we mentioned that requiring $\sup \beta < 1$ is too strict for some applications. For example, the real interest rate $r_t$ shown in Figure {numref}`f-plot_interest_rates_real` is sometimes negative. Using long historical records, {cite:t}`farmer2023discounting` find that the discount rate is negative around 1/3 of the time. This means that the associated discount factor $\beta_t = 1/(1+r_t)$ is sometimes greater than $1$ and $\sup \beta < 1$ fails. In macroeconomics, empirically motivated time-varying discount factor specifications lead to models where $\beta_t > 1$ occurs with positive probability. 
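Discount factors above one in some states need not rule out $\rho(L) < 1$. The Python sketch below illustrates this under assumed values: the two-state `beta` and `Q` are made up for the example and the helper names are hypothetical. Rather than computing eigenvalues, it uses the criterion from {prf:ref}`l-alsr`, namely that $\rho(L) < 1$ if and only if $\ell_t = \max_z (L^t \1)(z) < 1$ for some $t \in \NN$.

```python
def apply_L(L, u):
    """Return the vector L u for a square matrix L stored as nested lists."""
    return [sum(L_ij * u_j for L_ij, u_j in zip(row, u)) for row in L]


def first_t_below_one(L, max_t=1_000):
    """Return (t, ell_t) for the first t >= 1 with ell_t = max_z (L^t 1)(z) < 1.

    Returns None if no such t is found within max_t iterations.
    """
    u = [1.0] * len(L)              # the vector of ones
    for t in range(1, max_t + 1):
        u = apply_L(L, u)           # u now equals L^t 1
        if max(u) < 1:
            return t, max(u)
    return None


# Assumed example: the discount factor exceeds one in the second state.
beta = [0.95, 1.02]
Q = [[0.9, 0.1],
     [0.3, 0.7]]
L = [[beta[i] * Q[i][j] for j in range(2)] for i in range(2)]

result = first_t_below_one(L)
print(result)
```

Here $\sup \beta = 1.02 > 1$ and $\ell_1 = 1.02$, yet the loop terminates after a few iterations, so $\rho(L) < 1$ by {prf:ref}`l-alsr`. A return value of `None` is inconclusive in general, since it only says that no such $t$ was found up to `max_t`.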
For example, {cite:t}`hills2019effective` study a model that can be embedded in the MDP framework just described. Figure {numref}`f-ar1_spec_rad` shows a simulation of one of the discount factor processes used in their model, prior to discretization. The discount factor process takes the form $\beta_t = b Z_t$, where $(Z_t)$ is an exogenous state process obeying $Z_{t+1} = 1 - \rho + \rho Z_t + \sigma \epsilon_{t+1}$ with $(\epsilon_t)$ iid and standard normal. Clearly $\sup \beta < 1$ fails for this model too. Let's now consider the weaker condition $\rho(L) < 1$ described in {prf:ref}`p-dmdpsd` and check whether it holds. Following {cite:t}`hills2019effective`, we discretize the dynamics of $(Z_t)$ via a Tauchen approximation, producing a stochastic matrix $Q$ on a finite set $\Zsf$.[^2] The set of values for $\beta_t$ ranges between $0.95$ and $1.04$, so that $\beta_t > 1$ remains possible. Nonetheless, with $L(z, z') = \beta(z) Q(z, z')$ we obtain $\rho(L)=0.9996$. Hence {prf:ref}`p-dmdpsd` applies. ```{figure} ../figures/ar1_spec_rad.pdf :name: f-ar1_spec_rad Discount factor process $(\beta_t)_{t \geq 0}$ in {cite:t}`hills2019effective`. ``` ### Inventory Management Revisited In this section, we modify the inventory management model from {ref}`ss-ip` to include time-varying interest rates. Recall that, in the model of {ref}`ss-ip`, the Bellman equation takes the form $$ v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{d \geq 0} v(f(x, a, d)) \phi(d) \right\}, $$ (eq-ipcdis) at each $x \in \Xsf$, where $\Xsf \coloneq \{0, \ldots, K\}$, $x$ is the current inventory level, $a$ is the current inventory order, $r(x, a)$ is current profits (defined in {eq}`eq-cpip`), $f(x,a,d) \coloneq (x - d)\vee 0 + a$ and $d$ is an iid demand shock with distribution $\phi$. Let's now add a time-varying discount rate and investigate its impact on optimal choices.
We add time-varying discounting by replacing the constant $\beta$ in {eq}`eq-ipcdis` with a stochastic process $(\beta_t)$ where $\beta_t = 1/(1+r_t)$. We suppose that the dynamics can be expressed as $\beta_t = \beta(Z_t)$, where the exogenous process $(Z_t)_{t \geq 0}$ is $Q$-Markov on $\Zsf$. After relabeling the endogenous state $X_t$ as $Y_t$ and $x$ as $y$, in line with the notation in {ref}`sss-deffsmpd`, the Bellman equation becomes $v(y, z) = \max_{a \in \Gamma(y, z)} B((y, z), a, v)$ where $$ B((y, z), a, v) = r(y, a) + \beta(z) \sum_{d, \, z'} v(f(y, a, d), z') \phi(d) Q(z, z'). $$ (eq-invsddb0) If we set $$ R(y, a, y') \coloneq \PP\{f(y, a, D) = y'\} \quad \text{when} \quad D \sim \phi, $$ then $R(y, a, y')$ is the probability of realizing next period inventory level $y'$ when the current level is $y$ and the action is $a$. Hence we can rewrite {eq}`eq-invsddb0` as $$ B((y, z), a, v) = r(y, a) + \beta(z) \sum_{y', z'} v(y', z') Q(z, z') R(y, a, y') . $$ (eq-invsddb) We have now created a version of the MDP with exogenous state-dependent discounting described in {ref}`sss-deffsmpd`. Letting $L(z, z') \coloneq \beta(z) Q(z, z')$ and applying {prf:ref}`p-dmdpsd`, we see that all of the standard optimality results hold whenever $\rho(L)<1$. Figure {numref}`f-inventory_sdd_ts` shows how inventory evolves under an optimal program when the parameters of the problem are as given in {numref}`list-inventory_sdd`. (The code preallocates and computes arrays representing $r$, $R$, and $Q$ in {eq}`eq-invsddb` and includes a test for $\rho(L)<1$.) We set $\beta(z) = z$ and take $(Z_t)$ to be a discretization of an AR(1) process. Figure {numref}`f-inventory_sdd_ts` was created by simulating $(Z_t)$ according to $Q$ and inventory $(Y_t)$ according to $Y_{t+1} = (Y_t - D_{t+1} ) \vee 0 + A_t$, where $A_t$ follows the optimal policy. The outcome is similar to Figure {numref}`f-inventory_dp_ts`, in the sense that inventory falls slowly and then jumps up.
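The simulation step just described can be sketched in Python as follows. This is a minimal illustration and not the code used to produce Figure {numref}`f-inventory_sdd_ts`: the ordering rule `sigma` is a hypothetical placeholder rather than the optimal policy, and the function names, the two-state `Q`, the geometric demand sampler and the parameter values are all assumptions.

```python
import math
import random


def f(y, a, d):
    """Inventory update f(y, a, d) = max(y - d, 0) + a."""
    return max(y - d, 0) + a


def draw_geometric(rng, p=0.6):
    """Draw D with P{D = d} = (1 - p)^d * p on {0, 1, ...} by inversion."""
    u = 1.0 - rng.random()                    # uniform on (0, 1]
    return int(math.log(u) / math.log(1.0 - p))


def simulate(sigma, Q, draw_demand, y0, z0, ts_length, seed=0):
    """Simulate (Y_t, Z_t); Z_t is stored as an index into the grid for Z."""
    rng = random.Random(seed)
    y, z = y0, z0
    y_path, z_path = [y], [z]
    for _ in range(ts_length - 1):
        a = sigma(y, z)                       # A_t, chosen before D_{t+1} is seen
        d = draw_demand(rng)                  # D_{t+1}
        y = f(y, a, d)                        # Y_{t+1} = max(Y_t - D_{t+1}, 0) + A_t
        z = rng.choices(range(len(Q)), weights=Q[z])[0]   # Z_{t+1} drawn from Q(Z_t, .)
        y_path.append(y)
        z_path.append(z)
    return y_path, z_path


K = 40                                        # storage capacity (assumed value)


def sigma(y, z):
    """Placeholder rule (not optimal): restock to K once inventory is at most 5."""
    return K - y if y <= 5 else 0


Q = [[0.9, 0.1],
     [0.2, 0.8]]                              # assumed two-state chain for Z
y_path, z_path = simulate(sigma, Q, draw_geometric, y0=K, z0=0, ts_length=200)
```

Because the placeholder rule ignores $z$, the resulting path cannot show any response of inventories to the discount factor; an optimal policy computed from {eq}`eq-invsddb` would be passed in as `sigma` instead.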
As before, fixed costs induce this lumpy behavior. However, a new phenomenon is now present: Inventories trend up when interest rates fall and down when they rise. (The interest rate $r_t$ is calculated via $\beta_t = 1/(1+r_t)$ at each $t$.) High interest rates today foreshadow high interest rates in the future due to positive autocorrelation ($\rho > 0$), which in turn devalue future profits and hence encourage managers to economize on stock.

```{figure} ../figures/inventory_sdd_ts.pdf
:name: f-inventory_sdd_ts

Inventory dynamics with time-varying interest rates
```

```{code-block} julia
:name: list-inventory_sdd
:caption: Inventory model with time-varying discounting (`inventory_sdd.jl`)
:linenos:

using LinearAlgebra, Random, Distributions, QuantEcon

f(y, a, d) = max(y - d, 0) + a  # Inventory update

function create_sdd_inventory_model(;
        ρ=0.98, ν=0.002, n_z=20, b=0.97,  # Z state parameters
        K=40, c=0.2, κ=0.8, p=0.6,        # firm and demand parameters
        d_max=100)                        # truncation of demand shock

    ϕ(d) = (1 - p)^d * p                  # demand distribution
    d_vals = collect(0:d_max)
    ϕ_vals = ϕ.(d_vals)
    y_vals = collect(0:K)                 # inventory levels
    n_y = length(y_vals)
    mc = tauchen(n_z, ρ, ν)
    z_vals, Q = mc.state_values .+ b, mc.p

    # Test the spectral radius condition, with L(z, z′) = β(z) Q(z, z′) and β(z) = z
    ρL = maximum(abs.(eigvals(z_vals .* Q)))
    @assert ρL < 1 "Error: ρ(L) ≥ 1."

    # R[i_y, i_a, i_y′] is the probability of moving from y to y′ under order a
    R = zeros(n_y, n_y, n_y)
    for (i_y, y) in enumerate(y_vals)
        for (i_y′, y′) in enumerate(y_vals)
            for (i_a, a) in enumerate(0:(K - y))
                hits = f.(y, a, d_vals) .== y′
                R[i_y, i_a, i_y′] = dot(hits, ϕ_vals)
            end
        end
    end

    # Current profits r(y, a); infeasible (y, a) pairs keep the value -Inf
    r = fill(-Inf, n_y, n_y)
    for (i_y, y) in enumerate(y_vals)
        for (i_a, a) in enumerate(0:(K - y))
            cost = c * a + κ * (a > 0)
            r[i_y, i_a] = dot(min.(y, d_vals), ϕ_vals) - cost
        end
    end

    return (; K, c, κ, p, r, R, y_vals, z_vals, Q)
end
```

Figure {numref}`f-inventory_sdd_timing` shows execution time for VFI and OPI at different choices of $m$ (see {ref}`sss-opsdd` for the interpretation of $m$).
As for the optimal savings problem we studied in {prf:ref}`c-mdps`, OPI is around one order of magnitude faster than VFI when $m$ is close to 50 (cf. Figure {numref}`f-finite_opt_saving_2_1`). ```{figure} ../figures/inventory_sdd_timing.pdf :name: f-inventory_sdd_timing OPI vs VFI timings for the inventory problem ``` (s-asset)= ## Asset Pricing This section provides a brief introduction to asset pricing in a Markov environment. While the topic of asset pricing is fascinating in its own right, our main aim is to provide additional practice in handling linear valuation problems. (Readers who wish to push ahead with their study of dynamic programming can safely skip to {prf:ref}`c-val`.) (ss-aap)= ### Introduction to Asset Pricing We first discuss risk-neutral pricing and show why this assumption is typically implausible. Next, we introduce stochastic discount factors and stationary asset pricing. (sss-rnc)= #### Risk-Neutral Pricing? Consider the problem of assigning a current price $\Pi_t$ to an asset that confers on its owner the right to payoff $G_{t+1}$. The payoff is stochastic and realized next period. One simple idea is to use **risk-neutral pricing**, which implies that $$ \Pi_t = \EE_t \, \beta \, G_{t+1}, $$ (eq-rnfrapt) for some constant discount factor $\beta \in (0,1)$. If the payoff is in $k$ periods, then we modify the price to $\EE_t \, \beta^k \, G_{t+k}$. In essence, risk-neutral pricing says that cost equals expected reward, discounted to present value by compounding a constant rate of discount. (A *rate* of discount, say $\rho$, is linked to a discount *factor*, say $\beta$, by $\beta = 1/(1+\rho) \approx \exp(-\rho)$.) ```{prf:example} Let $S_t$ be the price of a stock at each point in time $t$. A **European call option** gives its owner the right to purchase the stock at price $K$ at time $t+k$. There is no obligation to exercise the option, so the payoff at $t+k$ is $\max\{S_{t+k}-K, 0\}$.
Under risk-neutral pricing, the time $t$ price of this option is $$ \Pi_t = \EE_t \, \beta^k \, \max\{S_{t+k}-K, 0\}. $$ ``` Although risk neutrality allows for simple pricing, assuming risk neutrality for all investors is *not* plausible. To give one example, suppose that we take the asset that pays $G_{t+1}$ in {eq}`eq-rnfrapt` and replace it with another asset that pays $H_{t+1} =G_{t+1} + \epsilon_{t+1}$, where $\epsilon_{t+1}$ is independent of $G_{t+1}$, $\EE_t \, \epsilon_{t+1}=0$ and $\var \epsilon_{t+1} > 0$. In effect, we are adding risk to the original payoff without changing its mean. Under risk neutrality, the price of this new asset is $$ \Pi_t^H = \EE_t \, \beta \, [G_{t+1} + \epsilon_{t+1}] = \Pi_t + \beta \, \EE_t \, \epsilon_{t+1} = \Pi_t. $$ Thus, $H_{t+1}$ and $G_{t+1}$ are priced identically, even though their means are both $\EE_t G_{t+1}$ and their variances satisfy $$ \var H_{t+1} = \var G_{t+1} + \var \epsilon_{t+1} > \var G_{t+1}. $$ This outcome contradicts the idea that investors typically want compensation for bearing risk. A helpful way to think about the same point is to consider the rate of return $r_{t+1} \coloneq (G_{t+1}-\Pi_t) / \Pi_t$ on holding an asset with payoff $G_{t+1}$. From {eq}`eq-rnfrapt` we have $\EE_t \, \beta (1 + r_{t+1}) = 1$, or $$ \EE_t \, r_{t+1} = \frac{1-\beta}{\beta}. $$ Since the right-hand side does not depend on $G_{t+1}$, risk neutrality implies that all assets have the same expected rate of return. But this contradicts the finding that, on average, riskier assets tend to have higher rates of return that compensate investors for bearing risk. ```{prf:example} The **risk premium** on a given asset is defined as the expected rate of return minus the rate of return on a risk-free asset. If we assume risk neutrality then, by the preceding discussion, the risk premium is zero for all assets. 
However, calculations based on post-war US data show that the average return premium on equities over safe assets is around 8% per annum (see, e.g., {cite}`cochrane2009asset`). ``` (sss-ansdf)= #### A Stochastic Discount Factor To go beyond risk neutral-pricing, let's start with a model containing one asset and one agent. It is straightforward to price the asset and compare it to the risk neutral case. A representative agent takes the price $\Pi_t$ of a risky asset as given and solves $$ \begin{aligned} & \max_{0 \leq \alpha \leq 1} \{ u(C_t) + \beta \EE_t u(C_{t+1}) \} \\ \text{subject to} \quad & C_t = E_t - \Pi_t \alpha \quad \text{and} \quad C_{t+1} = E_{t+1} + \alpha G_{t+1}. \end{aligned} $$ Here - $u$ is a flow utility function, - $G_{t+1}$ is the payoff of the asset and $\Pi_t$ is the time-$t$ price, - $\beta$ is a constant discount factor measuring impatience of the agent, - $E_t$ and $E_{t+1}$ are endowments and - $\alpha$ is the share of the asset purchased by the agent. Rewriting as $\max_\alpha \{ u(E_t - \Pi_t \alpha) + \beta \EE_t u(E_{t+1} + \alpha G_{t+1}) \}$ and differentiating with respect to $\alpha$ leads to the first order condition $$ u'(E_t - \Pi_t \alpha) \Pi_t = \beta \EE_t u'(E_{t+1} + \alpha G_{t+1}) G_{t+1}. $$ Rearranging gives us $$ \Pi_t = \EE_t \left[ \beta \frac{u'(C_{t+1})}{u'(C_t)} G_{t+1} \right]. $$ (eq-lfrapt) Comparing {eq}`eq-lfrapt` with {eq}`eq-rnfrapt`, we see that the payoff is now multiplied by a positive random variable rather than a constant. The random variable $$ M_{t+1} \coloneq \beta \frac{u'(C_{t+1})}{u'(C_t)} $$ (eq-lucassdf) is called the **stochastic discount factor** or **pricing kernel**. We call this particular form of the pricing kernel shown in {eq}`eq-lucassdf` **Lucas stochastic discount factor** (Lucas SDF) in honor of {cite:t}`lucas1978asset`. ```{prf:example} If $u$ is linear, so that $u(c) = a c + b$ for some $a, b \in \RR$, then $u'(c) = a$ for all $c$, so $M_{t+1} = \beta$. 
If the utility function has no curvature, then pricing is risk neutral. ``` ```{prf:example} If utility has the CRRA form $u(c) = c^{1-\gamma}/(1-\gamma)$ for some $\gamma > 0$, then the Lucas SDF takes the form $$ M_{t+1} = \beta \left( \frac{C_{t+1}}{C_t} \right)^{-\gamma}, $$ (eq-lucrra) which we can also write as $M_{t+1} = \beta \exp(-\gamma g_{t+1})$ when $g_{t+1} \coloneq \ln( C_{t+1}/C_t)$ is the growth rate of consumption. ``` In the CRRA case, the Lucas SDF applies heavier discounting to assets that concentrate payoffs in states of the world where the agent is already enjoying strong consumption growth. Conversely, the SDF attaches higher weights to future payoffs that occur when consumption growth is low because such payoffs hedge against the risk of drawing low consumption states. (sss-sdfg)= #### A General Specification The standard neoclassical theory of asset pricing generalizes the Lucas discounting specification by assuming only that there exists a positive random variable $M_{t+1}$ such that the price of an asset with payoff $G_{t+1}$ is $$ \Pi_t = \EE_t \, M_{t+1} \, G_{t+1} \qquad (t \geq 0). $$ (eq-frapt) In line with the preceding discussion, $M_{t+1}$ is called a **stochastic discount factor** (SDF). Equation {eq}`eq-frapt` generalizes {eq}`eq-lfrapt` by refraining from restricting the SDF (apart from assuming positivity). Actually, it can be shown that there exists an SDF $M_{t+1}$ such that {eq}`eq-frapt` is always valid under relatively weak assumptions. In particular, a single SDF $M_{t+1}$ can be used to price *any* asset in the market, so if $H_{t+1}$ is a another stochastic payoff then the current price of an asset with this payoff is $\EE_t \, M_{t+1} \, H_{t+1}$. We do not prove these claims, since our interest is in understanding forward-looking equations in Markov environments. Some relevant references are listed in {ref}`s-cn_state_dep`. 
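To make the pricing rule {eq}`eq-frapt` concrete, the following sketch prices two payoffs with the same mean under the CRRA Lucas SDF {eq}`eq-lucrra`. The two-state setting, all numbers (`β`, `γ`, the growth rates, the payoffs) and the helper names `price` and `rn_price` are assumptions chosen purely for illustration; they do not come from the text.

```julia
using LinearAlgebra

# Hypothetical two-state example (all values are illustrative assumptions)
β, γ = 0.96, 2.0
probs = [0.5, 0.5]            # probabilities of the two states next period
g = [-0.02, 0.04]             # consumption growth ln(C_{t+1}/C_t) in each state

M = β .* exp.(-γ .* g)        # CRRA Lucas SDF evaluated state by state

price(G) = dot(probs, M .* G)       # Π = E[M G]
rn_price(G) = β * dot(probs, G)     # risk-neutral benchmark Π = β E[G]

G_hedge  = [1.5, 0.5]         # pays more when consumption growth is low
G_procyc = [0.5, 1.5]         # pays more when consumption growth is high

for (name, G) in (("hedge", G_hedge), ("procyclical", G_procyc))
    println(name, ": SDF price = ", price(G),
            ", risk-neutral price = ", rn_price(G))
end
```

Both payoffs have the same expected value, so the risk-neutral benchmark assigns them the same price. The SDF, in contrast, assigns a higher price to the payoff that is large in the low-growth state, in line with the hedging interpretation of the CRRA case.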
(sss-marp)=
#### Markov Pricing

A common assumption in quantitative applications is that all underlying randomness is driven by a Markov model. In this spirit, we take $(X_t)$ to be $P$-Markov on the finite set $\Xsf$, where $P \in \mopx$, and suppose further that the SDF and payoff have the forms

$$
    M_{t+1} = m(X_t, X_{t+1})
    \quad \text{and} \quad
    G_{t+1} = g(X_t, X_{t+1}),
$$

for fixed functions $m, g$ mapping $\Xsf \times \Xsf$ to $\RR_+$. Since $m$ is arbitrary at this point, we don't assume a particular specification for the SDF.

In this setting, conditioning on $X_t = x$, the standard asset pricing equation $\Pi_t = \EE_t \, M_{t+1} \, G_{t+1}$ becomes

$$
    \pi(x) = \sum_{x'} m(x, x') g(x, x') P(x, x') \qquad (x \in \Xsf),
$$ (eq-mfrapt)

where $\pi(x)$ is the price of the asset conditional on $X_t = x$ (i.e., $\Pi_t = \pi(X_t)$).

(sss-psdst)=
#### Pricing a Stationary Dividend Stream

Now we are ready to look at pricing a stationary cash flow over an infinite horizon, a basic problem in asset pricing. We will apply the Markov structure assumed in {ref}`sss-marp`. In all that follows, $(X_t)$ is $P$-Markov on $\Xsf$ and $M_{t+1}$ is defined as in {ref}`sss-marp`.

We seek the time $t$ price, denoted by $\Pi_t$, for an **ex dividend contract** on the dividend stream $(D_t)_{t \geq 0}$. The contract provides the owner with the right to the dividend stream. The "ex dividend" component means that, should the dividend stream be traded at time $t$, the dividend paid at time $t$ goes to the seller rather than the buyer. As a result, purchasing at $t$ and selling at $t+1$ pays $\Pi_{t+1} + D_{t+1}$. Hence, applying the asset pricing rule {eq}`eq-frapt`, the time $t$ price $\Pi_t$ of the contract must satisfy

$$
    \Pi_t = \EE_t \, M_{t+1} (\Pi_{t+1} + D_{t+1}).
$$ (eq-fdstreq)

We assume the existence of a $d \in \RR_+^\Xsf$ such that $D_t = d(X_t)$ for all $t$.
Using {eq}`eq-mfrapt`, we can write this as $$ \pi(x) = \sum_{x'} m(x, x') (\pi(x') + d(x')) P(x, x') \qquad (x \in \Xsf), $$ (eq-apxy) or, equivalently, $$ \pi = A \pi + A d \quad \text{when } A(x,x') \coloneq m(x, x') P(x, x'). $$ (eq-mcda) By the Neumann series lemma, $\rho(A) < 1$ implies {eq}`eq-mcda` has unique solution $$ \pi^* \coloneq (I - A )^{-1} A d = \sum_{k=1}^\infty A^k d. $$ The vector $\pi^*$ is called an **equilibrium price function**. ```{exercise} :label: ex-nsasp Show that $\rho(A) < 1$ is both necessary and sufficient for existence of a unique solution to {eq}`eq-apxy` in $(0,\infty)^\Xsf$ whenever $m, d \gg 0$. ``` ```{solution} ex-nsasp Assume $m, d \gg 0$ and write {eq}`eq-apxy` as $\pi = A \pi + h$, where $h \coloneq Ad$. A simple argument shows that $h \gg 0$ (see {prf:ref}`ex-pmptp` for a closely related claim.) The claim in {prf:ref}`ex-nsasp` now follows directly from {prf:ref}`l-nsln`. ``` ```{exercise} :label: ex-state_dep-auto-8 As discussed in {ref}`sss-rnc`, the case $m \equiv \beta$ for some $\beta \in \RR_+$ is called the risk-neutral case. Provide a condition on $\beta$ under which $\rho(A) < 1$. ``` ```{exercise} :label: ex-state_dep-auto-9 Confirm that $(\Pi_t)_{t \geq 0}$ generated by $\Pi_t = \pi^*(X_t)$ solves {eq}`eq-fdstreq`. ``` ```{prf:remark} We can call $A$ an **Arrow--Debreu discount operator**. Its powers apply discounting: The valuation of any random payoff $g$ in $k$ periods is $A^k g$. ``` ```{exercise} :label: ex-cumdiv Derive the price for a **cum dividend contract** on the dividend stream $(D_t)_{t \geq 0}$, with the model otherwise unchanged. Under this contract, should the right to the dividend stream be traded at time $t$, the dividend paid at time $t$ goes to the buyer rather than the seller. ``` ```{solution} ex-cumdiv Under a cum dividend contract, purchasing at $t$ and selling at $t+1$ pays $D_t + \Pi_{t+1}$. 
Hence, applying the fundamental asset pricing equation, the time $t$ price $\Pi_t$ of the contract must satisfy $$ \Pi_t = D_t + \EE_t \, M_{t+1} \Pi_{t+1}. $$ Proceeding as for the ex dividend contract, the price conditional on current state $x$ is $\pi(x) = d(x) + \sum_{x'} m(x, x') \pi(x') P(x, x')$. In vector form, this is $\pi = d + A \pi$. Solving out for prices gives $\pi^* = (I - A )^{-1} d$. ``` #### Forward Sum Representation Asset prices can be expressed as infinite sums. Let's show this for cum dividend contracts (although the case of ex dividend contracts is similar). In {prf:ref}`ex-cumdiv` you found that the state-contingent price vector $\pi$ for a cum dividend contract on the dividend stream $(D_t)_{t \geq 0}$ obeys $$ \pi = d + A \pi \quad \text{when } A(x,x') \coloneq m(x, x') P(x, x') $$ (eq-mcda2) and $\rho(A) < 1$. As before, $D_t = d(X_t)$ and $(X_t)_{t\geq 0}$ is $P$-Markov on $\Xsf$. Applying the uniqueness component of the Neumann series lemma and {prf:ref}`t-dpec`, we see that the function $\pi$ also obeys $$ \pi(x) = \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t M_i \right] D_t \qquad (x \in \Xsf), $$ where $M_{t+1} \coloneq m(X_t, X_{t+1})$ for $t \geq 0$ and $M_0 \coloneq 1$. This expression agrees with our intuition: The price of the contract is the expected present value of the dividend stream, with the time $t$ dividend discounted by the composite factor $M_1 \cdots M_t$. ### Nonstationary Dividends Until now, our discussion of asset pricing has assumed that dividends are stationary. However, dividends typically grow over time, along with other economic measures such as GDP. In this section, we solve for the price of a dividend stream when dividends exhibit random growth. #### Price-Dividend Ratios A standard model of dividend growth is $$ \ln \frac{D_{t+1}}{D_t} = \kappa(X_t, \eta_{t+1}) \qquad t = 0, 1, \ldots, $$ where $\kappa$ is a fixed function, $(X_t)$ is the state process and $(\eta_t)$ is iid. 
We let $\phi$ be the density of each $\eta_t$ and assume that $(X_t)$ is $P$-Markov on a finite set $\Xsf$.

Let's suppose as before that the SDF obeys $M_{t+1} = m(X_t, X_{t+1})$ for some positive function $m$. Since dividends grow over time, so will the price of the asset. As such, we should no longer seek a fixed function $\pi$ such that $\Pi_t = \pi(X_t)$ for all $t$, since the resulting price process $(\Pi_t)$ will fail to grow. Instead, we try to solve for the **price-dividend ratio** $V_t \coloneq \Pi_t / D_t$, which we hope will be stationary.

```{exercise}
:label: ex-state_dep-auto-10

Using $\Pi_t = \EE_t \, [ M_{t+1} (D_{t+1} + \Pi_{t+1})]$, show that

$$
    V_t = \EE_t \, \left[ M_{t+1} \exp(\kappa(X_t, \eta_{t+1})) \left(1 + V_{t+1} \right) \right].
$$ (eq-pdrrv)
```

After conditioning on $X_t = x$, {eq}`eq-pdrrv` leads us to conjecture existence of a function $v$ such that

$$
    v(x) = \sum_{x'} m(x, x') \int \exp(\kappa(x, \eta)) \phi(\diff \eta) \left[ 1 + v(x') \right] P(x, x'),
$$ (eq-pdrf)

for all $x \in \Xsf$. We understand {eq}`eq-pdrf` as an equation to be solved for the unknown object $v \in \RR^\Xsf$. If we can find a solution $v^*$ to {eq}`eq-pdrf`, then setting $V_t = v^*(X_t)$ yields a process $(V_t)$ that obeys {eq}`eq-pdrrv`.

```{exercise}
:label: ex-state_dep-auto-11

Let

$$
    A(x, x') \coloneq m(x, x') \int \exp(\kappa(x, \eta)) \phi(\diff \eta) P(x, x')
    \qquad (x, x' \in \Xsf).
$$ (eq-ansd)

Show that {eq}`eq-pdrf` has a unique solution $v^*$ in $\RR^\Xsf$ when $\rho(A) < 1$, and

$$
    v^* = (I - A)^{-1} A\1 = \sum_{t \geq 1} A^t \1.
$$
```

```{solution} ex-state_dep-auto-11
We seek a $v$ that solves

$$
    v(x) = \sum_{x'} \left[ 1 + v(x') \right] A(x, x') \qquad (x \in \Xsf).
$$

Treating $A$ as a matrix and $v$ as a column vector, this equation becomes $v = A \1 + A v$, where $\1$ is a column vector of ones. By the Neumann series lemma, $\rho(A) < 1$ implies that this equation has the unique solution $v^* = (I - A)^{-1} A\1$. By the same lemma, $v^*$ has the alternative representation $v^* = \sum_{t \geq 0} A^t (A \1) = \sum_{t \geq 1} A^t \1$.
```

The price-dividend process $(V^*_t)$ defined by $V^*_t = v^*(X_t)$ solves {eq}`eq-pdrrv`. The price can be recovered via $\Pi_t = V^*_t D_t$.

#### Application: Markov Growth with a Lucas SDF

As an example, suppose that dividend growth obeys

$$
    \kappa(X_t, \eta_{d, t+1}) = \mu_d + X_t + \sigma_d \, \eta_{d, t+1},
$$

where $(\eta_{d,t})_{t \geq 0}$ is iid and standard normal. Consumption growth is given by

$$
    \ln \frac{C_{t+1}}{C_t} = \mu_c + X_t + \sigma_c \, \eta_{c, t+1},
$$

where $(\eta_{c,t})_{t \geq 0}$ is also iid and standard normal. We use the Lucas SDF in {eq}`eq-lucrra`, implying that

$$
    M_{t+1}
    = \beta \left( \frac{C_{t+1}}{C_t} \right)^{-\gamma}
    = \beta \exp(-\gamma( \mu_c + X_t + \sigma_c \eta_{c, t+1} )).
$$

```{exercise}
:label: ex-state_dep-auto-12

Using {eq}`eq-ansd`, show that

$$
    A(x, x') = \beta \exp \left( -\gamma \mu_c + \mu_d + (1 - \gamma) x + \frac{\gamma^2 \sigma_c^2 + \sigma_d^2}{2} \right) P(x, x').
$$
```

```{code-block} julia
:name: list-pd_ratio
:caption: Asset pricing model with Lucas SDF (`pd_ratio.jl`)
:linenos:

using QuantEcon, LinearAlgebra

"Creates an instance of the asset pricing model with Markov state."
function create_asset_pricing_model(;
        n=200,               # state grid size
        ρ=0.9, ν=0.2,        # state persistence and volatility
        β=0.99, γ=2.5,       # discount and preference parameter
        μ_c=0.01, σ_c=0.02,  # consumption growth mean and volatility
        μ_d=0.02, σ_d=0.1)   # dividend growth mean and volatility
    mc = tauchen(n, ρ, ν)
    x_vals, P = exp.(mc.state_values), mc.p
    return (; x_vals, P, β, γ, μ_c, σ_c, μ_d, σ_d)
end
```

Figure {numref}`f-pd_ratio_1` shows the price-dividend ratio function $v^*$ for the specification given in {numref}`list-pd_ratio`, as well as for an alternative mean dividend growth rate $\mu_d$. The state process is a Tauchen discretization of an AR(1) process with positive autocorrelation.
An increase in the state predicts higher dividends, which tends to increase the price. At the same time, higher $x$ also predicts higher consumption growth, which acts negatively on the price. For values of $\gamma$ greater than 1, the second effect dominates and the price-dividend ratio slopes down.

```{figure} ../figures/pd_ratio_1.pdf
:name: f-pd_ratio_1

Price-dividend ratio as a function of the state
```

```{exercise}
:label: ex-state_dep-auto-13

Complete the code in {numref}`list-pd_ratio` and replicate Figure {numref}`f-pd_ratio_1`. Add a test to your code that checks $\rho(A)<1$ before computing the price-dividend ratio.
```

### Incomplete Markets

In {ref}`sss-psdst` we used the Neumann series lemma to solve for the equilibrium price vector $\pi$. However, some modifications to the basic model introduce nonlinearities that render the Neumann series lemma inapplicable. For example, {cite:t}`harrison1978speculative` analyze a setting with heterogeneous beliefs and incomplete markets, leading to failure of the standard asset pricing equation. This results in a nonlinear equation for prices.

We treat the {cite:t}`harrison1978speculative` model only briefly. There are two types of agents. Type $i$ believes that the state updates according to stochastic matrix $P_i$ for $i=1,2$. Agents are risk-neutral, so $m(x,y) \equiv \beta \in (0, 1)$. {cite:t}`harrison1978speculative` show that, for their model, the equilibrium condition {eq}`eq-apxy` becomes

$$
    \pi(x) = \max_i \beta \sum_{x'} [\pi(x') + d(x')] P_i(x, x')
$$ (eq-ee)

for $x \in \Xsf$ and $i \in \{1, 2\}$. Setting aside the details that lead to this equation, our objective is simply to obtain a vector of prices $\pi$ that solves {eq}`eq-ee`.

As a first step, we introduce an operator $T \colon \RR^\Xsf_+ \to \RR^\Xsf_+$ that maps $\pi$ to $T \pi$ via

$$
    (T \pi)(x) = \max_i \beta \sum_{x'} [\pi(x') + d(x')] P_i(x, x')
    \qquad (x \in \Xsf).
$$ (eq-tee)

We are assuming $d \geq 0$, so $T$ is indeed a self-map on $\RR^\Xsf_+$. By construction, a vector $\pi \in \RR_+^\Xsf$ is a fixed point of $T$ if and only if it is a vector of prices that solves {eq}`eq-ee`. Hence, we have successfully converted our equilibrium problem into a fixed point problem.

We aim to show that $T$ is a contraction. To this end, pick any $p, q \in \RR^\Xsf_+$. Applying the inequality from {prf:ref}`l-maxineq`, we obtain

$$
    | (Tp)(x) - (Tq)(x) |
    \leq \beta \max_i
    \left|
        \sum_{x'} [p(x') + d(x')] P_i(x, x') - \sum_{x'} [q(x') + d(x')] P_i(x, x')
    \right|.
$$

Using the triangle inequality and canceling terms leads to

$$
    | (Tp)(x) - (Tq)(x) |
    \leq \beta \max_{i \in \{1, 2\}} \sum_{x'} |p(x') - q(x')| P_i(x, x')
    \leq \beta \| p - q \|_\infty.
$$

Since this bound holds for all $x$, we can take the maximum with respect to $x$ and obtain

$$
    \| Tp - Tq \|_\infty \leq \beta \| p - q \|_\infty.
$$

Thus, on $\RR^\Xsf_+$, the map $T$ is a contraction of modulus $\beta$ with respect to the sup norm. Since $\RR^\Xsf_+$ is a closed subset of $\RR^\Xsf$, we conclude that $T$ has a unique fixed point in this set. Hence, the system {eq}`eq-ee` has a unique solution $\pi^*$ in $\RR^\Xsf_+$, representing equilibrium prices. This fixed point can be computed by successive approximation.

```{exercise}
:label: ex-state_dep-auto-14

Provide an alternative proof of contractivity of $T$ on $\RR^\Xsf_+$ using Blackwell's condition ({ref}`sss-blackwell`).
```

(s-cn_state_dep)=
## Chapter Notes

Asset pricing is discussed in many sources, including {cite:t}`hansen2010pricing`, {cite:t}`ross2009neoclassical`, {cite:t}`cochrane2009asset`, {cite:t}`duffie2010dynamic` and {cite:t}`campbell2017financial`. Asset pricing is part of many applications and extensions in macroeconomics, public finance, international economics, and other fields. Some of these are described in {cite:t}`ljungqvist2012recursive`.
Dynamic programming with state-dependent discounting is becoming more common in macroeconomics and finance. Representative examples include {cite:t}`krusell1998income`, {cite:t}`woodford2011simple`, {cite:t}`christiano2014risk`, {cite:t}`albuquerque2016valuation`, {cite:t}`saijo2017uncertainty`, {cite:t}`basu2017uncertainty`, {cite:t}`degroot2018`, {cite:t}`schorfheide2018identifying`, {cite:t}`hills2019effective`, {cite:t}`toda2019wealth`, {cite:t}`fagereng2019saving`, {cite:t}`hubmer2020sources` and {cite:t}`cao2020recursive`. For more on the theory of state-dependent discounting, see {cite:t}`jasso2020discrete`, {cite:t}`toda2021perov` or {cite:t}`stachurski2021dynamic`. An analysis of sovereign default with time-varying interest rates is provided by {cite:t}`bloise2022sovereign`. Another challenge to the standard model with constant discount rates comes from empirical and experimental studies that find evidence of "hyperbolic discounting," where valuations across time fall rapidly at first and then more slowly. Provocative reviews of hyperbolic and quasi-hyperbolic discounting can be found in {cite:t}`frederick2002time` and {cite:t}`rubinstein2003economics`. {cite:t}`cao2018saving` provide conditions under which predictions from optimal savings models with quasi-hyperbolic discounting are robust. {cite:t}`balbus2018uniqueness` analyze uniqueness of time-consistent stationary Markov policies for quasi-hyperbolic households under uncertainty. {cite:t}`balbus2022time` study equilibria in dynamic models with recursive payoffs and generalized discounting. {cite:t}`noor2022optimal` addresses the topic of optimal discounting. Additional references include {cite:t}`diamond2003quasi`, {cite:t}`dasgupta2005uncertainty`, {cite:t}`karp2005global`, {cite:t}`amador2006commitment`, {cite:t}`balbus2018uniqueness`, {cite:t}`fedus2019hyperbolic`, {cite:t}`hens2020value`, {cite:t}`jaskiewicz2021markov`, and {cite:t}`drugeon2021markovian`. 
This chapter focused on time additive models with state-dependent discounting. More general preference specifications with this feature include {cite:t}`albuquerque2016valuation`, {cite:t}`schorfheide2018identifying`, {cite:t}`pohl2018higher`, {cite:t}`gomez2020important`, and {cite:t}`de2022valuation`. In {prf:ref}`c-rdps` we consider state-dependent discounting in general settings that accommodate such nonlinearities. [^1]: We are assuming that randomness in interest rates is a function of the same Markov state that influences profits. There is very little loss of generality in making this assumption. In fact, the two processes can still be statistically independent. For example, if we take $X_t$ to have the form $X_t = (Y_t, Z_t)$, where $(Y_t)$ and $(Z_t)$ are independent Markov chains, then we can take $\beta_t$ to be a function of $Y_t$ and $\pi_t$ to be a function of $Z_t$. The resulting interest and profit processes are statistically independent. [^2]: The parameters are $\rho = 0.85$, $\sigma = 0.0062$, and $b = 0.99875$. In line with {cite:t}`hills2019effective`, we discretize the model via `mc = tauchen(n, ρ, σ, 1 - ρ, m)`{.julia} with $m = 4.5$ and $n = 15$. ======================================================================== ## Valuation (c-val)= # Nonlinear Valuation Dynamic programs are optimization problems where the objective to be maximized is lifetime value. As such, one key topic is how to combine a sequence of rewards into a corresponding lifetime value. So far we have considered linear valuation based on summation over expected discounted rewards, using either constant discount rates ({prf:ref}`c-introii`--{prf:ref}`c-mdps`) or state-dependent discounting ({prf:ref}`c-state_dep`). In this chapter, we consider extensions, where lifetime value is computed from a recursion over the reward sequence instead of a discounted sum. 
This "recursive preference" approach permits far more general specifications of lifetime value, and is becoming increasingly popular in economics, finance, and computer science (see, e.g., {ref}`s-cn_state_dep`). This chapter focuses purely on valuation (i.e., combining reward sequences into lifetime values), rather than optimization. Later, in {prf:ref}`c-rdps`, we will show how to maximize lifetime value in settings where recursive preferences are adopted. Throughout this chapter, the symbol $\Xsf$ always represents a finite set. (s-bcmaps)= ## Beyond Contraction Maps The most natural way to express lifetime value in recursive preference environments is as a fixed point of a (typically nonlinear) operator. One challenge is that some recursive preference specifications induce operators that fail to be contractions. For this reason, we now invest in additional fixed point theory. All of this theory concerns order preserving maps, since the operators we consider always inherit monotonicity from underlying preferences. (ss-ktfs)= ### Knaster--Tarski for Function Space If you try to draw an increasing function that maps $[0,1]$ to itself without touching the 45-degree line, you will find it impossible. Below we state a famous fixed-point theorem due to Bronislaw Knaster (1893--1980) and Alfred Tarski (1901--1983) that generalizes this idea. In the statement, $\Xsf$ is a finite set and $V \coloneq [v_1, v_2]$, where $v_1, v_2$ are functions in $\RR^\Xsf$ with $v_1 \leq v_2$. ```{prf:theorem} :label: t-kt_pre If $T$ is an order preserving self-map on $V$, then the set of fixed points of $T$ is nonempty and contains least and greatest elements $a \leq b$. Moreover, $$ T^k v_1 \leq a \leq b \leq T^k v_2 \quad \text{for all } k \geq 0. $$ ``` Unlike, say, the fixed-point theorem of Banach ({ref}`ss-bcmt`), {prf:ref}`t-kt_pre` only yields existence. 
Uniqueness does not hold in general, as you can easily confirm by sketching the one-dimensional case or completing the following exercise. ```{exercise} :label: ex-val-auto-1 Consider the setting of {prf:ref}`t-kt_pre` and suppose in addition that $v_1 \not= v_2$. Show that there exists an order preserving self-map on $V$ with a continuum of fixed points. ``` ```{solution} ex-val-auto-1 Here is one possible answer: Let $v_1, v_2$ be distinct and let $T$ be the identity map on $V = [v_1, v_2]$. Then $T$ is order preserving and every point in $V$ is fixed under $T$. The set $V$ is a continuum because it contains all points $v = \alpha v_1 + (1-\alpha) v_2$ with $0 \leq \alpha \leq 1$. ``` (ss-conconop)= ### Concavity, Convexity, and Stability In this section, we study sufficient conditions for global stability that replace contractivity with shape properties such as concavity and monotonicity. To build intuition, we start with the one-dimensional case and show how these properties can be combined to achieve stability. Readers focused on results can safely skip to {ref}`sss-mulc`. (sss-odiso)= #### The One-Dimensional Case In {ref}`sss-nonmaps`, we showed that concavity and monotonicity can yield global stability for the Solow--Swan model. Here is a more general result. ```{prf:proposition} :label: p-mgss If $g$ is an increasing concave self-map on $U \coloneq (0, \infty)$ and, for all $x \in U$, there exist $a, b \in U$ with $a \leq x \leq b$, $a < g(a)$ and $g(b) \leq b$, then $g$ is globally stable on $U$. ``` ```{prf:proof} Regarding existence, fix $x \in U$ and suppose first that $x \leq g(x)$. Since $g$ is increasing, we have $g(x) \leq g^2(x)$. Continuing in this fashion shows that $(g^k(x))_{k \geq 0}$ is monotone increasing. Moreover, there exists a $b \in U$ such that $x \leq b$ and $g(b)\leq b$. Hence $g(x) \leq g(b) \leq b$. Iterating yields $g^k(x) \leq b$ for all $k$, so $(g^k(x))_{k \geq 0}$ is increasing and bounded above. 
Thus, there exists an $x^* \in U$ such that $x_k \coloneq g^k(x)$ converges to $x^*$ (by {prf:ref}`t-lubp` and {prf:ref}`ex-mssl`). Since $g$ is concave and hence continuous on any open set (see, e.g., {cite}`barbu2012convexity`), the result in {prf:ref}`ex-clifp` implies that $x^* = g(x^*)$. If, instead, $g(x) \leq x$, then a similar argument shows that $(g^k(x))_{k \geq 0}$ is decreasing and bounded. Using analogous reasoning, we obtain a fixed point $x^*$ in $U$ with $g^k(x) \to x^*$. To show the uniqueness of the fixed point, assume $g(x)=x$ and $g(y)=y$ for some $x, y \in U$. We claim that $x=y$. To see this, suppose without loss of generality that $x\leq y$. By assumption, there exists an $a \in U$ such that $a \leq x \leq y$ and $g(a)>a$. Because $a \leq x\leq y$, we can take $\lambda \in [0,1]$ such that $x = \lambda a + (1-\lambda) y$. If $\lambda > 0$, then concavity of $g$ and $g(a)>a$ imply the contradiction $$ g(x) = g \left(\lambda a + (1-\lambda )y \right) \geq \lambda g(a) + (1-\lambda) g(y) > \lambda a + (1-\lambda) y = x = g(x). $$ Hence $\lambda=0$. Since $x = \lambda a + (1-\lambda) y$, this yields $x=y$. ◻ ``` ```{figure} ../figures/concave_map_fp.pdf :name: f-concave_map_fp Global stability induced by increasing concave functions ``` Figure {numref}`f-concave_map_fp` gives one example, where $g(x) = 1 + \sqrt{x}/2$. The conditions of {prf:ref}`p-mgss` hold because, given any $x > 0$, we can find an $a$ in $(0,x)$ that gets mapped strictly up (i.e., $g(a)$ is above the 45-degree line) and a point $b > x$ that gets mapped down (i.e., $g(b)$ is below the 45-degree line). ```{exercise} :label: ex-val-auto-2 Prove that the map $g$ and set $U$ defined in the discussion of the Solow--Swan model before {prf:ref}`p-mgss` satisfy the conditions of the proposition. ``` ```{exercise} :label: ex-val-auto-3 Show that the condition $a < g(a)$ in {prf:ref}`p-mgss` cannot be dropped without weakening the conclusion.
``` ```{solution} ex-val-auto-3 If the condition $a < g(a)$ in {prf:ref}`p-mgss` is dropped then $g$ could be the identity map, which has multiple fixed points and is not globally stable. ``` ```{exercise} :label: ex-val-auto-4 Dropping the Cobb--Douglas specification on production, suppose $g(k) = s f(k) + (1-\delta) k$ where $0 < s, \delta < 1$ and $f$ is a strictly positive increasing concave production function on $U = (0, \infty)$ satisfying the **Inada conditions** $$ f'(k) \to \infty \text{ as } k \to 0 \quad \text{and} \quad f'(k) \to 0 \text{ as } k \to \infty. $$ Use {prf:ref}`p-mgss` to prove that $g$ is globally stable on $U$. ``` ```{exercise} :label: ex-val-auto-5 {cite:t}`fajgelbaum2017uncertainty` study a law of motion for aggregate uncertainty given by $$ s_{t+1} = g(s_t) \quad \text{where} \quad g(s) \coloneq \rho^2 \left[ \frac{1}{s} + a^2 \frac{1}{\eta} \right]^{-1} + \gamma. $$ Let $a$, $\eta$ and $\gamma$ be positive constants and assume $0 < \rho < 1$. Prove that $g$ is globally stable on $M \coloneq (0, \infty)$. ``` (sss-mulc)= #### The Multidimensional Case {prf:ref}`p-mgss` extends to multiple dimensions. In this section, we present a multidimensional version that covers both convex and concave functions. To state our result, we extend the definition of convexity and concavity to vector-valued self-maps. The definitions mirror those for scalar-valued functions: A self-map $T$ on a convex subset $D$ of $\RR^\Xsf$ is called **convex** if $$ T(\lambda u + (1-\lambda) v) \leq \lambda Tu + (1-\lambda) Tv \text{ whenever } u,v \in D \text{ and } \lambda \in [0, 1]; $$ and **concave** if $$ \lambda Tu + (1-\lambda) Tv \leq T(\lambda u + (1-\lambda) v) \text{ whenever } u,v \in D \text{ and } \lambda \in [0, 1]. $$ Here $\leq$ is, as usual, the pointwise order. We are now ready to state our next fixed-point result, which was first proved in an infinite-dimensional setting by {cite:t}`du1990fixed`. 
In the statement, $\Xsf$ is a finite set, $V \coloneq [v_1, v_2]$ is a nonempty order interval in $(\RR^\Xsf, \leq)$, and $T$ is a self-map on $V$. ```{prf:theorem} :label: t-du If $T$ is order preserving on $V$, then $T$ is globally stable on $V$ under any one of the following four conditions: 1. $T$ is concave and $T v_1 \gg v_1$, 2. $T$ is concave and there exists a $\delta > 0$ such that $T v_1 \geq v_1 + \delta (v_2-v_1)$, 3. $T$ is convex and $T v_2 \ll v_2$, or 4. $T$ is convex and there exists a $\delta > 0$ such that $T v_2 \leq v_2 - \delta (v_2-v_1)$. ``` Conditions (i) and (ii) are similar -- in fact (ii) holds whenever (i) holds, so (ii) is the weaker (but slightly more complicated) condition. Conditions (iii) and (iv) are similar in the same sense. Figure {numref}`f-du` illustrates the convex and the concave versions of the result in one dimension. We encourage you to sketch your own variations to understand the roles that different conditions play. ```{figure} figures/du.svg :name: f-du Du’s theorem: convex and concave cases ``` ```{exercise} :label: ex-val-auto-6 Let $F$ and $G$ be self-maps on convex $D \subset \RR^n$. Show that $T \coloneq F \circ G$ is concave on $D$ whenever $F$ and $G$ are order preserving and concave on $D$. ``` A full proof of {prf:ref}`t-du` can be found in {cite:t}`du1990fixed` or Theorem 2.1.2 and Corollary 2.1.1 of {cite:t}`zhang2012variational`. In our setting, existence follows from the Knaster--Tarski theorem. We prove uniqueness. (sss-powtas)= ### A Power-Transformed Affine Equation Du's theorem provides conditions under which concave or convex order preserving self-maps on order intervals attain global stability. In this section we study maps of this type that have additional structure. While this additional structure is restrictive, it allows us to obtain global stability on unbounded subsets rather than order intervals. 
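As a numerical illustration of condition (i) in {prf:ref}`t-du`, the sketch below iterates an order preserving concave map from both ends of an order interval and checks that the two sequences meet at a common fixed point. The matrix `A`, the vector `h`, the interval endpoints `v1` and `v2`, and the helper `iterate_map` are hypothetical choices made for this example only.

```julia
using LinearAlgebra

# Hypothetical primitives (assumptions for illustration only)
A = [0.4 0.3;
     0.2 0.5]                 # nonnegative entries, so v -> A * v is order preserving
h = [1.0, 2.0]

T(v) = h .+ sqrt.(A * v)      # order preserving and concave on the nonnegative orthant

v1 = zeros(2)                 # lower end of the order interval V = [v1, v2]
v2 = fill(10.0, 2)            # upper end of the order interval

@assert all(T(v1) .> v1)      # condition (i): T v1 ≫ v1
@assert all(T(v2) .<= v2)     # with monotonicity, T maps [v1, v2] into itself

"Apply T repeatedly until successive iterates are within tol in the sup norm."
function iterate_map(T, v; tol=1e-12, max_iter=1_000)
    for _ in 1:max_iter
        v_new = T(v)
        maximum(abs.(v_new - v)) < tol && return v_new
        v = v_new
    end
    return v
end

v_from_below = iterate_map(T, v1)   # increasing sequence T^k v1
v_from_above = iterate_map(T, v2)   # decreasing sequence T^k v2

println("limit from below: ", v_from_below)
println("limit from above: ", v_from_above)
println("sup-norm gap:     ", maximum(abs.(v_from_below - v_from_above)))
```

Starting from `v1` produces an increasing sequence and starting from `v2` a decreasing one, as in the Knaster--Tarski bounds; under the conditions of Du's theorem both limits coincide, which is what the printed gap is meant to check.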
To begin, let $\Xsf$ be a finite set and consider the equation $$ v = [h + (A v)^{1/\theta}]^\theta \qquad (v \in V), $$ (eq-powtrans) where $\theta$ is a nonzero parameter, $A \in \lopx$ with $A \geq 0$, $V = (0, \infty)^\Xsf$, and $h \in V$. This system reduces to the affine model studied in {prf:ref}`l-nsln` when $\theta = 1$. To analyze {eq}`eq-powtrans`, we introduce the self-map $$ Gv = [h + (A v)^{1/\theta}]^\theta \qquad (v \in V). $$ (eq-gop) Continuing to assume that $h \gg 0$ and $A$ is a positive linear operator, we can use Du's theorem to establish the next result (which generalizes {prf:ref}`l-nsln`). ```{prf:theorem} :label: t-powaff If $A$ is irreducible, then the following statements are equivalent. 1. $\rho(A)^{1/\theta} < 1$. 2. $G$ is globally stable on $V$. In the case $\rho(A)^{1/\theta} \geq 1$, the map $G$ has no fixed point in $V$. ``` The key to proving (i) implies (ii) is that $G$ is order preserving and either convex or concave, depending on the value of $\theta$. The remaining conditions in Du's theorem are established over order intervals using $\rho(A)^{1/\theta} < 1$. By applying an approximation argument, global stability is extended from order intervals to all of $V$. Some of these details are contained in the following exercises and a full proof can be found in {cite:t}`stachurski2022unique`. Let $$ F_x(t) = \left\{ h(x) + t^{1/\theta} \right\}^\theta \qquad (t > 0). $$ ```{exercise} :label: ex-ficoncon Prove that, for all $x \in \Xsf$, the function $F_x$ is increasing, 1. convex whenever $\theta \in (0,1]$, and 2. concave otherwise (i.e., for other nonzero $\theta$). ``` ```{solution} ex-ficoncon It is straightforward to show that $F_x' > 0$ on $(0,\infty)$, which proves that $F_x$ is increasing. Some additional algebra confirms that $\theta \in (0,1]$ implies $F_x'' > 0$, whereas $\theta < 0$ and $\theta > 1$ both imply $F_x'' < 0$. Details are left to the reader.
``` ```{exercise} :label: ex-val-auto-7 Using {prf:ref}`ex-ficoncon`, prove that $G$ is order preserving on $V$, convex on $V$ whenever $\theta \in (0,1]$, and concave otherwise. ``` ```{solution} ex-val-auto-7 Observe that $(Gu)(x) = F_x[(Au)(x)]$. Since $A$ is order preserving and $F$ is increasing, $u \leq v$ implies $Gu \leq Gv$. In particular, $G$ is order preserving. If $\theta \in (0,1]$, then $F$ is convex. Hence, fixing $u, v \in V$ and $\lambda \in [0, 1]$ (and dropping $x$ from our notation), we have $$ FA(\lambda u + (1-\lambda) v) = F(\lambda Au + (1-\lambda) Av) \leq \lambda FAu + (1-\lambda) FAv. $$ Hence $G$ is convex. The proof that $G$ is concave for other values of $\theta$ is similar and omitted. ``` ```{exercise} :label: ex-val-auto-8 {cite:t}`kleinman2023dynamic` study a dynamic discrete choice model of migration with savings and capital accumulation. They show that optimal consumption for landlords in their model is $c_t = \sigma_t R_t k_t$, where $k_t$ is capital, $R_t$ is the gross rate of return on capital, and $\sigma_t$ is a state-dependent process obeying $$ \sigma_t^{-1} = 1 + \beta^\psi \left[ \EE_t R_{t+1}^{(\psi-1)/\psi} \sigma_{t+1}^{-1/\psi} \right]^\psi. $$ (eq-kls) Here $\beta$ is a discount factor and $\psi$ is a utility parameter. Assume $R_t = f(X_t)$ where $\Xsf$ is finite, $f \in \RR^\Xsf$, and $(X_t)$ is $P$-Markov for some $P \in \mopx$. Let $A \in \lopx$ be defined by $$ (Av)(x) = \beta \sum_{x'} f(x')^{(\psi-1)/\psi} v(x')P(x, x'). $$ Prove that there exists a unique solution to {eq}`eq-kls` of the form $\sigma_t = \sigma(X_t)$ for some $\sigma \in \RR^\Xsf$ with $\sigma \gg 0$ if and only if $\rho(A)^\psi < 1$. ``` ```{solution} ex-val-auto-8 Setting $v_t = \sigma_t^{-1/\psi}$, we can write {eq}`eq-kls` as $$ v_t = \left\{ 1 + \beta^\psi \left[ \EE_t R_{t+1}^{(\psi-1)/\psi} v_{t+1} \right]^\psi \right\}^{1/\psi}. 
$$ (eq-klsv) We conjecture a stationary Markov solution $v_t = v(X_t)$ for some $v \in \RR^\Xsf$ with $v \gg 0$. This $v$ must satisfy $$ v(x) = \left\{ 1 + \beta^\psi \left[ \sum_{x'} f(x')^{(\psi-1)/\psi} v(x') P(x,x') \right]^\psi \right\}^{1/\psi} \qquad (x \in \Xsf). $$ Using the definition of $A$ in the exercise, we can write the equation in vector form as $v = [1 + (Av)^\psi]^{1/\psi}$. It follows from {prf:ref}`t-powaff` that a unique strictly positive solution to this equation exists if and only if $\rho(A)^\psi < 1$. This proves the claim in the exercise. ``` (s-recursive)= ## Recursive Preferences In this section, we compute lifetime values associated with given reward processes in settings that involve nonlinear recursions. These nonlinear recursions are called *recursive preferences*. We will show how some common specifications of recursive preferences can be translated into lifetime valuations via the fixed-point methods introduced in {prf:ref}`c-fpt` and {ref}`s-bcmaps`. (ss-vcpf)= ### Motivation: Optimal Savings We motivate recursive preference models by analyzing consumption decisions. (sss-csrv)= #### A Recursive View of a Standard Model The time additive model of valuation in {ref}`sss-nccas` can be studied from a purely recursive point of view. As a starting point, we state that the value $V_t$ of current and future consumption is defined at each point in time $t$ by the recursion $$ V_t = u(C_t) + \beta \, \EE_t V_{t+1}. $$ (eq-addseprec) The random variables $V_t$ and $V_{t+1}$ are the unknown objects in this expression. The expectation $\EE_t$ conditions on $X_0, \ldots, X_t$ and $C_t = c(X_t)$. The process $(X_t)_{t \geq 0}$ is $P$-Markov. Since consumption is a function of $(X_t)_{t \geq 0}$ and knowledge of the current state $X_t$ is sufficient to forecast future values (by the Markov property), it is natural to guess that $V_t$ will depend on the Markov chain only through $X_t$. 
Hence we guess that a solution of {eq}`eq-addseprec` takes the form $V_t = v(X_t)$ for some $v \in \RR^\Xsf$. (Here $v$ is an *ansatz*, meaning "educated guess." First we guess the form of a solution and then we try to verify that the guess is correct. So long as we carry out the second step, starting with a guess brings no loss of rigor.) Under this conjecture, {eq}`eq-addseprec` can be rewritten as $v(X_t) = u(c(X_t)) + \beta \EE_t v(X_{t+1})$. Conditioning on $X_t = x$ and setting $r \coloneq u \circ c$, this becomes $$ v(x) = r(x) + \beta \, \EE_x \, v(X_{t+1}) = r(x) + \beta (P v) (x) \qquad (x \in \Xsf). $$ (eq-vbec) In vector form, we get $v = r + \beta P v$. From the Neumann series lemma, the solution is $v^* = (I - \beta P)^{-1} r$, which is identical to {eq}`eq-vfucq`. ```{exercise} :label: ex-val-auto-9 Verify our guess: Show $(V^*_t)$ obeys {eq}`eq-addseprec` when $V_t^* \coloneq v^*(X_t)$. ``` ```{solution} ex-val-auto-9 Fixing $t$, rearranging $v^* = (I - \beta P)^{-1} r$ to $v^* = r + \beta P v^*$ and evaluating at $X_t$ gives $$ \begin{aligned} V_t^* = v^*(X_t) & = r(X_t) + \beta \, \sum_{x'} v^*(x') P(X_t, x') \\ & = u(C_t) + \beta \EE [ v^*(X_{t+1}) \given X_t] = u(C_t) + \beta \EE_t V^*_{t+1}. \end{aligned} $$ Hence $(V^*_t)_{t \geq 0}$ obeys {eq}`eq-addseprec`, as claimed. ``` In summary, {eq}`eq-addseprec` and the sequential representation {eq}`eq-cvalas` specify the same lifetime value for consumption paths. While the recursive formulation in {eq}`eq-addseprec` now seems redundant, since it produces the same specification that we obtained from the sequential approach, the recursive setup gives us a formula to build on, and hence a pathway to overcoming limitations of the time additive approach. Most of the rest of this chapter will be focused on this agenda. Pursuing this agenda will produce preferences over consumption paths where the sequential approach has no natural counterpart.
This occurs when current value $V_t$ is nonlinear in current rewards and continuation values (unlike the linear specification {eq}`eq-addseprec`). Such specifications are called **recursive preferences**. When dealing with recursive preference models, the lack of a sequential counterpart means that we are *forced* to proceed recursively. ```{prf:remark} :label: r-rpterm The term "recursive preferences" is confusing, since, as we have just agreed, time additive preferences also admit the recursive specification {eq}`eq-addseprec`. When economists say "recursive preferences," they almost always refer to settings where lifetime utility can *only* be expressed recursively. We follow this convention. ``` (sss-las)= #### Limitations of Time Additive Preferences In the previous section, we discussed how the time additive preference specification $$ v(x) = \EE_x \sum_{t \geq 0} \beta^t u(C_t), $$ (eq-cvalas_rep) also called the **discounted expected utility model**, can be framed recursively, and how this provides a pathway to go beyond the time additive specification. We are motivated to do so because the time additive specification has been rejected by experimental and observational data in many settings. In this section, we highlight some of the limitations of time additive preferences. While our discussion is only brief, more background and a list of references can be found in {ref}`s-cn_val`. One issue with {eq}`eq-cvalas_rep` is the assumption of a constant positive discount rate, which has been refuted by a long list of empirical studies. This issue was discussed in {ref}`s-cn_state_dep`. Another limitation of time additive preferences is that agents are risk-neutral in future utility (see, e.g., {eq}`eq-addseprec`, where current value depends linearly on future value). 
Although risk aversion over consumption can be built in through curvature of $u$, this same curvature also determines the elasticity of intertemporal substitution, meaning that the two aspects of preferences cannot be separated. We elaborate on this point in {ref}`sss-aggre`. A third issue with time additivity is that agents with such preferences are indifferent to any variation in the joint distribution of rewards that leaves marginal distributions unchanged. To get a sense of what this means, suppose you accept a new job and will be employed by this firm for the rest of your life. Your daily consumption will be entirely determined by your daily wage. Your boss offers you two options: - Option A: Your boss will flip a coin at the start of your first day on the job. If the coin is heads, you will receive \$10,000 a day for the rest of your life. If the coin is tails, you will receive \$1 per day for the rest of your life. - Option B: Your boss will flip a coin at the start of every day. If the coin is heads, you will receive \$10,000. If the coin is tails, you will receive \$1. If you have a strict preference between options A and B, then your choice cannot be rationalized with time additive preferences. To see why, let $\phi$ be a probability distribution that represents the lottery just described, putting mass 0.5 on 10,000 and mass 0.5 on 1. Under option A, consumption $(C_t)_{t \geq 1}$ is given by $C_t = C_1$ for all $t$, where $C_1 \sim \phi$. Under option B, consumption $(C_t)_{t \geq 1}$ is an iid sequence drawn from $\phi$. Either way, lifetime utility is $$ \EE \sum_{t \geq 1} \beta^t u(C_t) = \sum_{t \geq 1} \beta^t \EE u(C_t) = \frac{\beta \bar u}{1-\beta}, $$ where $\bar u \coloneq \EE u(C_1) = u(1)/2 + u(10,000)/2$. The critical part of this argument is the passing of expectations through the sum, which uses time additivity.
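The equality of the two lifetime utilities can be illustrated by simulation. The sketch below is a rough check under assumed primitives: $u = \ln$, $\beta = 0.9$, a truncation horizon of 200 periods, and a fixed random seed are all hypothetical choices, as are the function names. It estimates expected discounted utility under each option and compares both estimates with $\beta \bar u / (1 - \beta)$. It also reports the standard deviation of realized discounted utility, which differs sharply across the two options even though the means agree.

```julia
using Random, Statistics

# One draw from the lottery: mass 0.5 on 10,000 and mass 0.5 on 1
draw(rng) = rand(rng) < 0.5 ? 10_000.0 : 1.0

# Option A: a single coin flip fixes consumption in every period
function lifetime_A(rng, β, u, T)
    c = draw(rng)
    return sum(β^t * u(c) for t in 1:T)
end

# Option B: a fresh coin flip in every period
function lifetime_B(rng, β, u, T)
    return sum(β^t * u(draw(rng)) for t in 1:T)
end

# Simulate N lifetimes under each option (all defaults are assumptions)
function compare_options(; β=0.9, u=log, T=200, N=20_000, seed=1234)
    rng = MersenneTwister(seed)
    A = [lifetime_A(rng, β, u, T) for _ in 1:N]
    B = [lifetime_B(rng, β, u, T) for _ in 1:N]
    u_bar = (u(1.0) + u(10_000.0)) / 2
    exact = β * u_bar / (1 - β)      # infinite-horizon value from the display above
    return (; mean_A=mean(A), mean_B=mean(B),
              std_A=std(A), std_B=std(B), exact)
end

res = compare_options()
println(res)
```

Both sample means should sit close to the exact value, up to simulation and truncation error, while the dispersion under option A is several times larger than under option B.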
The implication is that lifetime utility depends only on the marginal distribution of each $C_t$, rather than on the joint distribution of the stochastic process $(C_t)_{t \geq 0}$. (ss-rspref)= ### Risk-Sensitive Preferences Having motivated recursive preferences, let's turn to our first example: **risk-sensitive preferences**. For the consumption problem described in {ref}`sss-csrv`, imposing risk-sensitive preferences means replacing the recursion $v = r + \beta Pv$ for $v$ with $$ v(x) = r(x) + \beta \frac{1}{\theta} \ln \left\{ \sum_{x'} \exp(\theta v(x')) P(x, x') \right\} \qquad (x \in \Xsf). $$ (eq-rsfre) As before, $r(x) = u(c(x))$ represents current utility when the current state is $x$. The parameter $\theta$ is a nonzero constant in $\RR$. In {eq}`eq-rsfre`, the transform $f(v) = \exp(\theta v)$ is applied to $v$ before expectation is taken. After the expectation is computed, the transform is undone via $f^{-1}(v) = (1/\theta) \ln (v)$. We will show that the agent can be either risk-averse or risk-loving with respect to future outcomes, depending on the value of $\theta$. #### Lifetime Utility We understand the functional equation {eq}`eq-rsfre` as "defining" lifetime utility under risk-sensitive preferences. A function $v$ solving {eq}`eq-rsfre` gives a lifetime valuation $v(x)$ to each $x \in \Xsf$, with the interpretation that $v(x)$ is lifetime utility conditional on initial state $x$. This definition of lifetime value is by analogy to the time additive case studied in {ref}`sss-csrv`, where the function $v$ solving $v = r + \beta P v$ measures lifetime utility from each initial state. In the previous paragraph we wrote "defining" in scare quotes because we can't be sure we have a definition at this point. Just because we write down a recursive expression for lifetime utility doesn't mean that corresponding lifetime utility is actually well defined. 
(For example, we can happily write down the recursive vector equation $v = v + \1$ but no vector $v$ solving this equation exists.) One aim of this chapter is to provide conditions under which recursions like {eq}`eq-rsfre` have solutions. Another issue is uniqueness. Suppose that {eq}`eq-rsfre` has many solutions. In this case the predictions of the utility model are ambiguous. Our perspective is that the recursive preference specification {eq}`eq-rsfre` is not correctly formulated unless existence and uniqueness hold. We return to this point in {ref}`sss-rseu`. One final comment: even if we can find a $v$ that solves {eq}`eq-rsfre`, the nonlinearities introduced by risk sensitivity imply that there will be no neat sequential representation analogous to $v(x) = \EE_x \sum_t \beta^t u(C_t)$ from the time additive case. (This connects to {prf:ref}`r-rpterm`, where we discuss recursive preference terminology.) (sss-erae)= #### Risk-Adjusted Expectation We want to understand the "expectation-like" expression on the right hand side of {eq}`eq-rsfre` that replaces the ordinary conditional expectation $\sum_{x'} v(x') P(x, x')$ from the time additive case. To this end, we define, for arbitrary random variable $\xi$ and nonzero $\theta \in \RR$, $$ \eE_\theta [\xi] = \frac{1}{\theta} \ln \left\{ \EE [ \exp(\theta \xi) ] \right\}. $$ The value $\eE_\theta[\xi]$ is called the **entropic risk-adjusted expectation** of $\xi$ given $\theta$. ```{exercise} :label: ex-val-auto-10 Prove that, for any random variable $\xi$, any nonzero $\theta$, and any constant $c$, we have $\eE_\theta [\xi + c] = \eE_\theta [\xi] + c$. ``` The key idea behind the entropic risk-adjusted expectation is that decreasing $\theta$ lowers appetite for risk and increasing $\theta$ does the opposite. ```{exercise} :label: ex-val-auto-11 Prove that, if $\xi$ is normally distributed, then $$ \eE_\theta[ \xi] = \EE [ \xi ] + \theta \frac{\var[\xi]}{2}.
$$ (eq-rsndi) (Hint: Look up the moment generating function of a normal distribution.) ``` Expression {eq}`eq-rsndi` shows that, for the Gaussian case, $\eE_\theta[ \xi]$ equals the mean plus a term that penalizes variance when $\theta < 0$ and rewards it when $\theta > 0$. More generally, we have the following result. ```{prf:lemma} :label: l-tcrs For any random variable $\xi$ taking values in $\Xsf$, we have 1. $\eE_\theta[\xi] \leq \EE[\xi]$ for all $\theta < 0$. 2. $\eE_\theta[\xi] \geq \EE[\xi]$ for all $\theta > 0$. Moreover, both of these inequalities are strict if and only if $\var[\xi] > 0$. ``` ```{prf:proof} Fix nonzero $\theta \in \RR$ and let $f \colon \RR \to (0,\infty)$ be defined by $f(x) = \exp(\theta x)$. Note that $f'(x) = \theta \exp(\theta x)$ and $f''(x) = \theta^2 \exp(\theta x)$. Thus $f$ is convex and either increasing or decreasing depending on whether $\theta$ is positive or negative. Moreover, $\eE_\theta[\xi] = f^{-1}( \EE f(\xi))$. By Jensen's inequality, $$ \EE [ f(\xi) ] \geq f(\EE [\xi] ). $$ If $\theta > 0$, then $f^{-1}$ is increasing, so applying $f^{-1}$ to both sides gives $\eE_\theta[\xi] \geq \EE [ \xi]$. If $\theta < 0$, then $f^{-1}$ is decreasing, so applying $f^{-1}$ to both sides gives $\eE_\theta[\xi] \leq \EE [ \xi]$. This proves the two weak inequalities in {prf:ref}`l-tcrs`. To obtain strict inequalities we can apply the same argument using a strict version of Jensen's inequality (see, e.g., {cite}`liao2018sharpening`), which is valid when $\var[\xi] > 0$. ◻ ``` (sss-rseu)= #### Existence and Uniqueness Let's return to investigating lifetime utility under risk-sensitive preferences. To this end, we introduce the **risk-sensitive Koopmans operator** $K_\theta$ on $\RR^\Xsf$ via $$ (K_\theta \, v)(x) = r(x) + \beta \frac{1}{\theta} \ln \left\{ \sum_{x'} \exp(\theta v(x')) P(x, x') \right\} \qquad (x \in \Xsf).
$$ (eq-koop) Evidently, for given nonzero $\theta$, a function $v \in \RR^\Xsf$ solves the risk-sensitive preference lifetime utility specification {eq}`eq-rsfre` if and only if $v$ is a fixed point of $K_\theta$. This explains the significance of the following result: ```{prf:proposition} :label: p-ktheta If $\beta < 1$, then $K_\theta$ is globally stable on $\RR^\Xsf$. ``` We postpone a proof of {prf:ref}`p-ktheta` because we will prove a more general result in {ref}`sss-proofkt`. For now we note the following implications. 1. For each nonzero $\theta$, lifetime utility is both well-defined and uniquely defined for risk-sensitive preferences (i.e., {eq}`eq-rsfre` has a unique solution). 2. The unique solution, denoted henceforth by $v^*$, can be computed by successive approximation using $K_\theta$. #### The Gaussian Case As a tractable case, let's suppose that $r(x) = x$ and that $X_{t+1} = \rho X_t + \sigma W_{t+1}$ where $(W_t)_{t \geq 1}$ is iid and standard normal. Here $|\rho| < 1$ and $\sigma \geq 0$ controls volatility of the state. Rather than discretizing the state process, we leave it as continuous and proceed by hand. In this setting, the functional equation {eq}`eq-rsfre` for $v$ becomes $$ v(x) = x + \beta \eE_\theta[ v(\rho x + \sigma W)], $$ (eq-rsgf) for each $x \in \Xsf$, where $W$ is standard normal. Since $\rho x + \sigma W$ is Gaussian, the expression {eq}`eq-rsndi` for the risk-adjusted expectation of a normal random variable leads us to conjecture that the solution $v$ will be affine, i.e., $v(x) = a x + b$ for some $a, b \in \RR$. This conjecture turns out to be correct: ```{exercise} :label: ex-val-auto-12 Verify that $v(x) = a x + b$ solves {eq}`eq-rsgf` when $$ a \coloneq \frac{1}{1 - \rho \beta} \quad \text{and} \quad b \coloneq \theta \frac{\beta}{1-\beta} \frac{(a \sigma)^2}{2}. $$ ``` We can see that, under the stated assumptions, lifetime value $v$ is increasing in the state variable $x$. 
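The affine formula can be sanity-checked numerically. The sketch below evaluates the right-hand side of {eq}`eq-rsgf` at the candidate $v(x) = ax + b$, using {eq}`eq-rsndi` for the risk-adjusted expectation of a normal random variable, and compares the result with $ax + b$ on a small grid. The function names are hypothetical, and the parameter values are simply borrowed from the defaults in {numref}`list-rs_utility`. This is a check on the algebra rather than a substitute for the verification requested in the exercise.

```julia
# Entropic risk-adjusted expectation of a N(μ, s²) variable, using eq-rsndi
entropic_normal(μ, s, θ) = μ + θ * s^2 / 2

# Coefficients of the candidate solution v(x) = a x + b
function affine_coefficients(β, ρ, σ, θ)
    a = 1 / (1 - ρ * β)
    b = θ * (β / (1 - β)) * (a * σ)^2 / 2
    return a, b
end

# Right-hand side of the functional equation evaluated at v(x) = a x + b
function rhs(x, β, ρ, σ, θ)
    a, b = affine_coefficients(β, ρ, σ, θ)
    # v(ρx + σW) is normal with mean a ρ x + b and standard deviation a σ
    return x + β * entropic_normal(a * ρ * x + b, a * σ, θ)
end

β, ρ, σ = 0.95, 0.96, 0.1      # illustrative values (assumed)
for θ in (-1.0, 1.0)
    a, b = affine_coefficients(β, ρ, σ, θ)
    residual = maximum(abs(rhs(x, β, ρ, σ, θ) - (a * x + b)) for x in -2.0:0.5:2.0)
    println("θ = $θ: a = $a, b = $b, max residual = $residual")
end
```

The residual should be zero up to floating point error for either sign of $\theta$, and the sign of the intercept $b$ should match the sign of $\theta$.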
However, impacts of the parameters generally depend on $\theta$. For example, if $\theta > 0$, increasing $\sigma$ shifts up lifetime utility. If $\theta < 0$, then lifetime value decreases with $\sigma$. This is as we expect: Lifetime utility is affected positively or negatively by volatility, depending on whether the agent is risk loving or risk averse.

Figure {numref}`f-rs_utility_1` shows the true solution $v(x) = ax + b$ to the risk-sensitive lifetime utility model, as well as an approximate fixed point from a discrete approximation. The discrete approximation is computed by applying successive approximation to $K_\theta$ after discretizing the state process via Tauchen's method. The parameters and discretization are shown in {numref}`list-rs_utility`.

```{figure} ../figures/rs_utility_1.pdf
:name: f-rs_utility_1

Approximate and true solutions in the Gaussian case
```

```{code-block} julia
:name: list-rs_utility
:caption: Risk sensitive utility model parameters (`rs_utility.jl`)
:linenos:

using LinearAlgebra, QuantEcon

function create_rs_utility_model(;
        n=180,      # size of state space
        β=0.95,     # time discount factor
        ρ=0.96,     # correlation coef in AR(1)
        σ=0.1,      # volatility
        θ=-1.0)     # risk aversion
    mc = tauchen(n, ρ, σ, 0, 10)    # n_std = 10
    x_vals, P = mc.state_values, mc.p
    r = x_vals                      # special case u(c(x)) = x
    return (; β, θ, ρ, σ, r, x_vals, P)
end
```

```{exercise}
:label: ex-val-auto-13

Replicate Figure {numref}`f-rs_utility_1`.
```

```{exercise}
:label: ex-val-auto-14

Dropping the Gaussian assumption, suppose now that consumption is iid with $C_t = c(X_t)$ where $(X_t)_{t \geq 0}$ is iid with distribution $\phi$ on finite set $\Xsf$. Now the operator $K_\theta$ becomes

$$
    (K_\theta v)(x) = r(x) + \beta \frac{1}{\theta} \ln \left\{ \sum_{x'} \exp(\theta v(x')) \phi(x') \right\} \qquad (x \in \Xsf).
$$

Although iterating on $K_\theta$ is convergent, there is a more efficient method that reduces to solving a one-dimensional equation.
Propose such a method and confirm that it is convergent. (Hint: Consider reviewing {ref}`sss-iidrd`.) ``` ```{solution} ex-val-auto-14 Letting $g \coloneq \sum_{x'} \exp(\theta v(x'))\phi(x')$, we have $v(x) = r(x) + \beta \frac{1}{\theta} \ln g$. Applying the same transformation to both sides produces $$ g = \sum_{x'} \exp\left[\theta\left(r(x') +\beta \frac{1}{\theta} \ln g\right)\right]\phi(x'). $$ Solving for $g$ yields $g = \left[\sum_{x'}\exp(\theta r(x'))\phi(x')\right]^{\frac{1}{1-\beta}}$. ``` (ss-ezi)= ### Epstein--Zin Preferences One of the most popular specifications of recursive preferences in quantitative research is Epstein--Zin utility.[^1] This class of preferences has been used to study asset pricing, business cycles, monetary policy, fiscal policy, optimal taxation, climate policy, pension plans, and other topics. In this section, we introduce the Epstein--Zin specification and discuss how to solve it. We will see that the specification, while highly nonlinear, is nonetheless well behaved. (sss-ezispec)= #### Specification With **Epstein--Zin preferences**, the relationship $V_t = u(C_t) + \beta \EE_t V_{t+1}$ is replaced by $$ V_t = \left\{ (1-\beta) C_t^\alpha + \beta [\EE_t V_{t+1}^\gamma]^{\alpha / \gamma} \right\}^{1/\alpha}, $$ (eq-ezfp) where $\gamma$, $\alpha$ are nonzero parameters and $\beta \in (0,1)$. As for risk-sensitive preferences, lack of time additivity implies that there is no neat sequential representation for lifetime value. As a result, we must work directly with the recursive expression {eq}`eq-ezfp`. Assume as before that $C_t = c(X_t)$, where $c \in \RR_+^\Xsf$ and $(X_t)_{t \geq 0}$ is $P$-Markov on finite set $\Xsf$. We conjecture a solution of the form $V_t = v(X_t)$ for some $v \in V \coloneq \RR_+^\Xsf$.
Under this conjecture, the **Epstein--Zin Koopmans operator** corresponding to {eq}`eq-ezfp` is $$ (Kv)(x) = \left\{ (1-\beta) c(x)^\alpha + \beta \left[ \sum_{x'} v(x')^\gamma P(x, x') \right]^{\alpha/\gamma} \right\}^{1/\alpha}. $$ (eq-kez) As will be discussed further in {ref}`sss-rae`, the parameter $\gamma$ governs risk aversion with respect to temporal gambles (where outcomes are resolved in the next period), while $\beta$ controls impatience and $\alpha$ parametrizes the intertemporal elasticity of substitution. The fact that all three parameters have distinct effects helps fit data. For example, see {cite:t}`tallarini2000risk` and {cite:t}`barillas2009doubts`. An important question is whether Epstein--Zin preferences are well defined. In particular, what conditions do we need on primitives such that the Koopmans operator $K$ in {eq}`eq-kez` has a unique fixed point? (sss-pkoez)= #### Properties of the Koopmans Operator To address this question we rewrite {eq}`eq-kez` in vector form as $$ Kv = \left\{ h + \beta [P v^\gamma]^{\alpha /\gamma} \right\}^{1/\alpha}, $$ (eq-kezvec) where $h \in \RR^\Xsf$. This is equivalent to {eq}`eq-kez` when $h = (1-\beta) c^\alpha$. To avoid fractional powers of negative numbers, we assume throughout that $h \geq 0$. ```{exercise} :label: ex-val-auto-15 Prove that, under this assumption, $K$ is a self-map on $V \coloneq (0, \infty)^\Xsf$. ``` ```{solution} ex-val-auto-15 Let $K$ be as stated (see {eq}`eq-kezvec`) and fix $v \gg 0$. Clearly $v^\gamma \gg 0$ and hence $Pv^\gamma \gg 0$ (see {prf:ref}`ex-pmptm`). Since $h \geq 0$, it follows easily that $Kv \gg 0$. ``` The set $V$ is called the **interior of the positive cone** of $\RR^\Xsf$. The operator $K$ is difficult to work with for two reasons. First, linear and nonlinear transformations are intertwined. Second, there are several cases for the parameters that we need to handle in order to understand stability. 
Nonetheless, by applying a smooth transformation, we will find it easy to show that the Epstein--Zin Koopmans operator $K$ is globally stable under mild conditions. In particular,

```{prf:proposition}
:label: p-ezbk

If $P$ is irreducible and $h \gg 0$, then $K$ is globally stable on $V$.
```

A proof of {prf:ref}`p-ezbk` is provided in {ref}`sss-psez`.

{prf:ref}`p-ezbk` implies that Epstein--Zin utility is well-defined under the stated conditions and, moreover, that the solution can be computed via successive approximation on $K$. {numref}`list-ez_utility` provides code for performing this operation.

Figure {numref}`f-ez_utility_c` shows convergence of the sequence of iterates to the fixed point $v^*$, under the parameters in {numref}`list-ez_utility`, given an initial condition $v_0$. The figure plots every 10th iterate, repeated 100 times.

```{code-block} julia
:name: list-ez_utility
:caption: Epstein--Zin utility model and Koopmans operator (`ez_utility.jl`)
:linenos:

include("s_approx.jl")
using LinearAlgebra, QuantEcon

function create_ez_utility_model(;
        n=200,      # size of state space
        ρ=0.96,     # correlation coef in AR(1)
        σ=0.1,      # volatility
        β=0.99,     # time discount factor
        α=0.75,     # EIS parameter
        γ=-2.0)     # risk aversion parameter

    mc = tauchen(n, ρ, σ, 0, 5)
    x_vals, P = mc.state_values, mc.p
    c = exp.(x_vals)

    return (; β, ρ, σ, α, γ, c, x_vals, P)
end

# Epstein--Zin Koopmans operator
function K(v, model)
    (; β, ρ, σ, α, γ, c, x_vals, P) = model

    R = (P * (v.^γ)).^(1/γ)     # Kreps--Porteus certainty equivalent
    return ((1 - β) * c.^α + β * R.^α).^(1/α)
end

# Compute the fixed point of K by successive approximation
function compute_ez_utility(model)
    v_init = ones(length(model.x_vals))
    v_star = successive_approx(v -> K(v, model),
                               v_init,
                               tolerance=1e-10)
    return v_star
end
```

```{figure} ../figures/ez_utility_c.pdf
:name: f-ez_utility_c

Convergence of Koopmans iterates for Epstein--Zin utility
```

(sss-psez)=
#### Proof of the Stability Result

We prove {prf:ref}`p-ezbk` by

1. introducing an operator $\hat K$ obtained from $K$ via a smooth transformation,
2. proving that $(V, \hat K)$ and $(V, K)$ are topologically conjugate, and
3. obtaining conditions under which $\hat K$ is globally stable on $V$.

Throughout this section, the assumptions of {prf:ref}`p-ezbk` are in force.

To begin we define $\hat K$ via

$$
    \hat K v = \left\{ h + \beta (P v)^{1/\theta} \right\}^\theta \qquad \text{where} \quad \theta \coloneq \frac{\gamma}{\alpha}.
$$ (eq-kppu)

The operator $\hat K$ is simpler to work with than $K$ because it unifies $\alpha, \gamma$ into a single parameter $\theta$ and decomposes the Epstein--Zin update rule into two parts: a linear map $P$ and a separate nonlinear component.

```{exercise}
:label: ex-ksmov

Prove that

1. $\hat K$ is a self-map on $V$ and
2. $v \in V$ is a fixed point of $K$ if and only if $v^\gamma$ is a fixed point of $\hat K$.
```

```{solution} ex-ksmov
If $v \gg 0$, then, since $h \gg 0$ also holds, we have

$$
    \hat K v = \left\{ h + \beta (P v)^{1/\theta} \right\}^\theta \geq h^\theta \gg 0.
$$

Hence $\hat K$ is a self-map on $V$. In addition, the statement $Kv = v$ is equivalent to $v^\alpha = h + \beta (Pv^\gamma)^{\alpha/\gamma}$. Using $\theta = \gamma/\alpha$, we can rewrite the last equation as $v^\gamma = [h + \beta (Pv^\gamma)^{1/\theta}]^\theta$. In other words, $v = Kv$ if and only if $v^\gamma = \hat K v^\gamma$.
```

```{prf:lemma}
:label: l-kkhtc

Let $\Phi$ be defined by $\Phi v = v^\gamma$. The map $\Phi$ is a homeomorphism from $V$ to itself and $(V, K)$ and $(V, \hat K)$ are topologically conjugate under $\Phi$.
```

```{prf:proof}
Evidently $\Phi$ is a continuous bijection from $V$ to itself, with continuous inverse $\Phi^{-1} v = v^{1/\gamma}$. Hence $\Phi$ is a homeomorphism. In addition, for $v \in V$,

$$
    \hat K\Phi v = \left\{ h + \beta (P\Phi v)^{1/\theta} \right\}^\theta = \left\{ h + \beta (P v^\gamma)^{\alpha /\gamma} \right\}^{\gamma/\alpha} = \Phi Kv .
$$

This shows that $(V, K)$ and $(V, \hat K)$ are topologically conjugate, as claimed.
◻ ``` ```{prf:proof} *Proof of {prf:ref}`p-ezbk`.* Set $A = \beta^\theta P$. With this notation we have $\hat K v = [h + (A v)^{1/\theta}]^\theta$. In view of {prf:ref}`t-powaff`, this operator is globally stable on $V$ whenever $\rho(A)^{1/\theta} < 1$. In our case $\rho(A) = \rho(\beta^\theta P) = \beta^\theta$, so $\rho(A)^{1/\theta} = \beta$. It follows that $\hat K$ is globally stable on $V$ whenever $\beta < 1$. Since $(V, K)$ and $(V, \hat K)$ are topologically conjugate, the proof of {prf:ref}`p-ezbk` is complete. ◻ ``` #### Why Not Use Contractivity? While we can consider studying stability of $\hat K$ using contraction arguments, this approach fails under useful parameterizations. To illustrate, suppose that $\Xsf = \{x_1\}$. Then $h$ is a constant, $P$ is the identity, $v$ is a scalar and $\hat K v = F(v)$ with $F(v) = \left\{ h + \beta v^{1/\theta} \right\}^\theta$, as shown in Figure {numref}`f-ez_noncontraction`. Here $\theta = 5$, $h=0.5$ and $\beta=0.5$. We see that $\hat K$ has infinite slope at zero, so the contraction property fails.[^2] ```{figure} ../figures/ez_noncontraction.pdf :name: f-ez_noncontraction Shape properties of $\hat K$ in one dimension ``` ```{exercise} :label: ex-val-auto-16 Prove that, given the parameter values used for Figure {numref}`f-ez_noncontraction`, the function $F$ satisfies $F'(t) \to \infty$ as $t \downarrow 0$. ``` (s-genrep)= ## General Representations We have discussed two well-known examples of recursive preferences. In this section we build a general representation. While various constructions can be found in the decision theory literature, many are not well suited to quantitative work. Here we give a relatively parsimonious operator-theoretic definition. (ss-koopop)= ### Koopmans Operators In {ref}`sss-rseu` and {ref}`sss-ezispec` we met risk-sensitive and Epstein--Zin Koopmans operators respectively.
In this section, we provide a general definition of a Koopmans operator that will contain these two examples as special cases. We begin by outlining structure that can be combined to generate Koopmans operators in a Markov environment. The two key components are an aggregation function and a certainty equivalent operator. We then build Koopmans operators from these primitives and connect them to applications. In every setting we consider, lifetime value is identified with the unique fixed point of the Koopmans operator (whenever it exists). (sss-rae)= #### Certainty Equivalents The first primitive we consider is a generalization of conditional expectations: Given $V \subset \RR^\Xsf$, we define a **certainty equivalent operator** on $V$ to be a self-map $R$ on $V$ such that 1. $R$ is order preserving on $V$ and 2. all constants are fixed under $R$ (i.e., $R\, ( \lambda \1) = \lambda \1$ for all $\lambda \in \RR$ with $\lambda \1 \in V$). ```{prf:example} :label: eg-oeo The usual conditional expectations operator is a certainty equivalent operator. To see this, set $V = \RR^\Xsf$ and fix $P \in \mopx$. Since $f, g \in \RR^\Xsf$ with $f \leq g$ implies $Pf \leq Pg$ and $P (\lambda \1) = \lambda P \1 = \lambda \1$, we see that $P$ satisfies (i)--(ii). ``` ```{exercise} :label: ex-val-auto-17 In the last example, the certainty equivalent $R = P$ is linear. Prove that this is the only linear case. In particular, prove the following: if $\mathbf R(\Xsf)$ is the set of all certainty equivalent operators on $\RR^\Xsf$, then $\mathbf R(\Xsf) \cap \lopx = \mopx$. ``` ```{solution} ex-val-auto-17 Let $R$ be a linear operator on $\RR^\Xsf$ that is also a certainty equivalent. In particular, $R$ is order preserving, and $R\1 = \1$. Since $R$ is order preserving and linear, $R$ is a positive linear operator (see {prf:ref}`ex-plop`). Hence $R$ is a Markov operator. ``` The next example is nonlinear. 
It treats the risk-adjusted expectation that appears in risk-sensitive preferences. ```{prf:example} :label: eg-ffln Let $V = \RR^\Xsf$ and fix nonzero $\theta$ and $P \in \mopx$. The **entropic certainty equivalent operator** is the operator $R_\theta$ on $V$ defined by $$ (R_\theta \, v)(x) = \frac{1}{\theta} \ln \left\{ \sum_{x'} \exp(\theta v(x')) P(x,x') \right\} \qquad (v \in V, \; x \in \Xsf). $$ ``` ```{exercise} :label: ex-val-auto-18 Show that $R_\theta$ is in fact a certainty equivalent operator. ``` ```{prf:example} :label: eg-ffa As a third example, let $V$ be the interior of the positive cone, as in {ref}`sss-pkoez`, and fix $P \in \mopx$. The operator $$ (R_\gamma \, v)(x) = \left\{ \sum_{x'} v(x')^\gamma P(x, x') \right\}^{1/\gamma} \qquad (v \in V, \; x \in \Xsf, \; \gamma \not=0) $$ (eq-ffa) is a certainty equivalent operator on $V$. The map $R_\gamma$ is sometimes called the **Kreps--Porteus certainty equivalent operator** in honor of {cite:t}`kreps1978temporal`. We met $R_\gamma$ in {ref}`ss-ezi`, when we discussed Epstein--Zin preferences. ``` ```{exercise} :label: ex-val-auto-19 Confirm that $R_\gamma$ is a certainty equivalent operator. ``` ```{exercise} :label: ex-quant_ra Let $V = \RR^\Xsf$ and fix $P \in \mopx$ and $\tau \in [0,1]$. Let $R_\tau$ be the **quantile certainty equivalent**. That is, $(R_\tau \, v)(x) = Q_\tau \,v(X)$ where $X \sim P(x, \cdot)$ and $Q_\tau$ is the quantile functional. More specifically, $$ (R_\tau \, v)(x) = \min \left\{ y \in \RR \;\Big|\; \sum_{x'} \1\{v(x') \leq y\} P(x,x') \geq \tau \right\} \qquad (v \in V, \; x \in \Xsf). $$ Confirm that $R_\tau$ defines a certainty equivalent operator on $V$. ``` ```{solution} ex-quant_ra Since $V$ is all of $\RR^\Xsf$, the condition $R_\tau \colon V \to V$ is trivially satisfied. Regarding monotonicity, fix $v, w \in V$ with $v \leq w$ and $x \in \Xsf$. Let $X$ be a draw from $P(x, \cdot)$. 
Then $(R_\tau \, v)(x) = Q_\tau[v(X)] \leq Q_\tau[w(X)] = (R_\tau \, w)(x)$, where the inequality is by {prf:ref}`ex-qmsf`. Moreover, given $\lambda \in \RR$ and a random variable $Y$ with $\PP\{Y = \lambda\} = 1$, we clearly have $Q_\tau(Y) = \lambda$. It follows that $R_\tau \, \lambda \1 = \lambda \1$. Hence $R_\tau$ is a certainty equivalent operator, as was to be shown. ``` ```{exercise} :label: ex-val-auto-20 Let $R$ be a certainty equivalent operator on $V \subset \RR^\Xsf_+$, where $\lambda \1 \in V$ for all $\lambda \geq 0$. Prove that $R0=0$ and $Rv \geq 0$ whenever $v \geq 0$. ``` The set of certainty equivalent operators on $\RR^\Xsf$ is invariant under convex combinations, as the next exercise asks you to confirm. ```{exercise} :label: ex-val-auto-21 Let $\mathbf R(\Xsf)$ be the set of certainty equivalent operators on $\RR^\Xsf$ and prove the following: $$ R_a, R_b \in \mathbf R(\Xsf) \text{ and } 0 \leq \lambda \leq 1 \; \implies \; \lambda R_a + (1-\lambda) R_b \in \mathbf R(\Xsf) . $$ ``` #### Properties Let $V$ be a convex cone in $\RR^\Xsf$. A certainty equivalent operator $R$ on $V$ is called - **positive homogeneous** on $V$ if $R(\lambda v) = \lambda Rv$ for all $v \in V$ and $\lambda \geq 0$ with $\lambda v \in V$, - **superadditive** on $V$ if $R(v + w) \geq Rv + Rw$ for all $v, w \in V$ with $v + w \in V$, - **subadditive** on $V$ if $R(v + w) \leq Rv + Rw$ for all $v, w \in V$ with $v + w \in V$, - **constant-subadditive** on $V$ if $R (v + \lambda \1) \leq R v + \lambda \1$ for all $v \in V$ and $\lambda \geq 0$ with $v + \lambda \1 \in V$. ```{prf:example} Given $P \in \mopx$, the linear certainty equivalent $R = P$ is positive homogeneous and both superadditive and subadditive on $\RR^\Xsf$. ``` ```{prf:example} :label: eg-minkow Let $V$ be the interior of the positive cone of $\RR^\Xsf$ and fix $P \in \mopx$. 
In this setting, the Kreps--Porteus certainty equivalent operator $R_\gamma$ in {eq}`eq-ffa` is subadditive on $V$ when $\gamma \geq 1$ and superadditive on $V$ when $\gamma \leq 1$ (and, as usual, $\gamma \not= 0$). The subadditive case follows directly from Minkowski's inequality, while the superadditive case follows from the mean inequalities in {cite:t}`bullen2003handbook` (p. 202). ``` ```{exercise} :label: ex-qrasa Prove that the quantile certainty equivalent operator $R_\tau$ from {prf:ref}`ex-quant_ra` is constant-subadditive. ``` ```{solution} ex-qrasa Fix $v \in V = \RR^\Xsf$, $\lambda \in \RR_+$ and $x \in \Xsf$. If $X \sim P(x, \cdot)$, then $$ (R_\tau \, (v+\lambda))(x) = Q_\tau \,(v(X)+\lambda) = Q_\tau \,(v(X)) +\lambda, $$ where the second equality is by {prf:ref}`ex-qftr`. Since $x$ was arbitrary, we have $R_\tau (v + \lambda) = R_\tau \, v + \lambda$. Hence $R_\tau$ is constant-subadditive, as claimed. ``` ```{exercise} :label: ex-ersa Show that the entropic certainty equivalent operator $R_\theta$ from {prf:ref}`eg-ffln` is constant-subadditive. ``` ```{solution} ex-ersa Fix $v \in V$, $P \in \mopx$, $x \in \Xsf$ and $\lambda \in \RR_+$. Let $X$ be a draw from $P(x, \cdot)$. We have $$ \begin{aligned} (R_\theta (v + \lambda))(x) & = \frac{1}{\theta} \ln \left\{ \EE \exp[\theta (v(X) + \lambda)] \right\} \\ & = \frac{1}{\theta} \ln \left\{ \EE \exp[\theta v(X)] \cdot \exp(\theta \lambda) \right\} \\ & = \frac{1}{\theta} \ln \left\{ \EE \exp[\theta v(X)] \right\} + \lambda. \end{aligned} $$ Since $x$ was arbitrary, constant-subadditivity holds. ``` ```{exercise} :label: ex-val-auto-22 Prove: If $R$ is constant-subadditive on $V$, then $R$ is nonexpansive with respect to the supremum norm. That is, $$ \|Rv - Rw\|_\infty \leq \|v - w\|_\infty \quad \text{for all } v, w \in V. $$ ``` ```{solution} ex-val-auto-22 Let the primitives be as stated. Fix $v, w \in V$.
By monotonicity and constant-subadditivity of $R$, we have $$ Rv = R(v - w + w) \leq R(\|v - w\|_\infty \1 + w) \leq R w + \|v - w\|_\infty \1 . $$ Hence $(Rv)(x) - (Rw)(x) \leq \|v-w\|_\infty$ for all $x \in \Xsf$. Reversing the roles of $v$ and $w$ proves the claim. ``` In some instances, a certainty equivalent operator is either convex or concave in the sense of {ref}`sss-mulc`. ```{prf:example} The entropic certainty equivalent operator $R_\theta$ in {prf:ref}`eg-ffln` is concave on $\RR^\Xsf$ whenever $\theta < 0$. To prove this we use the result in {cite:t}`follmer2011entropic` which states that, for $\theta < 0$, $0 \leq \alpha \leq 1$ and finite-valued random variables $Z, Z'$, we have $$ \eE_\theta(\alpha Z + (1-\alpha) Z') \geq \alpha \eE_\theta(Z) + (1-\alpha) \eE_\theta(Z'), $$ (eq-etzz) where $\eE_\theta$ is as defined in {ref}`sss-erae`. ``` ```{exercise} :label: ex-val-auto-23 Using {eq}`eq-etzz`, show that $R_\theta$ is concave on $V = \RR^\Xsf$ when $\theta < 0$. ``` ```{solution} ex-val-auto-23 Fix $v, v' \in V$, $\theta < 0$, $x \in \Xsf$ and $\alpha \in [0,1]$. Write $R$ for $R_\theta$. Letting $X \sim P(x, \cdot)$, $Z = v(X)$ and $Z' = v'(X)$, we have $$ (R(\alpha v + (1-\alpha) v'))(x) = \eE_\theta(\alpha Z + (1-\alpha) Z') \geq \alpha \eE_\theta(Z) + (1-\alpha) \eE_\theta(Z'). $$ The last expression is just $\alpha (Rv)(x) + (1-\alpha) (Rv')(x)$, so $R$ is concave on $V$, as claimed. ``` ```{exercise} :label: ex-supsub Let $V$ be convex and let $R$ be a certainty equivalent operator on $V$. Prove the following: 1. $R$ is convex on $V$ whenever $R$ is subadditive and positive homogeneous on $V$. 2. $R$ is concave on $V$ whenever $R$ is superadditive and positive homogeneous on $V$. ``` ```{solution} ex-supsub Regarding (i), fix $\lambda \in [0,1]$ and $v, w \in V$. Using subadditivity and positive homogeneity, we have $$ R(\lambda v + (1-\lambda)w) \leq R(\lambda v) + R((1-\lambda)w) = \lambda Rv + (1-\lambda) Rw. $$ This proves that $R$ is convex on $V$. The proof of (ii) is similar.
``` Combining {prf:ref}`ex-supsub` and {prf:ref}`eg-minkow`, we have proved ```{prf:lemma} :label: l-kpconcon The Kreps--Porteus certainty equivalent operator $R_\gamma$ in {eq}`eq-ffa` is convex on $V$ when $\gamma \geq 1$ and concave on $V$ when $\gamma \leq 1$. ``` Later we will combine {prf:ref}`l-kpconcon` with the fixed-point results for convex and concave operators in {ref}`sss-mulc` to establish existence and uniqueness of lifetime values for certain kinds of Koopmans operators. #### Monotonicity Let $\Xsf$ be partially ordered and let $i\RR^\Xsf$ be the set of increasing functions in $\RR^\Xsf$. Let $V$ be such that $i\RR^\Xsf \subset V \subset \RR^\Xsf$ and let $R$ be a certainty equivalent on $V$. We call $R$ **monotone increasing** if $R$ is invariant on $i\RR^\Xsf$. This extends the terminology in {ref}`sss-mmcs`, where we applied it to Markov operators (cf., {prf:ref}`ex-miif`). The concept of monotone increasing certainty equivalent operators is connected to outcomes where lifetime preferences are increasing in the state. ```{exercise} :label: ex-val-auto-24 Show that the entropic certainty equivalent operator in {prf:ref}`eg-ffln` is monotone increasing on $V = \RR^\Xsf$ whenever $P$ is monotone increasing, for all nonzero values of $\theta$. ``` ```{exercise} :label: ex-val-auto-25 Show that the Kreps--Porteus certainty equivalent operator in {prf:ref}`eg-ffa` is monotone increasing on $V = (0, \infty)^\Xsf$ whenever $P$ is monotone increasing, for all nonzero values of $\gamma$. ``` (sss-aggre)= #### Aggregation We mentioned that Koopmans operators are typically constructed by combining a certainty equivalent operator and an aggregation function. Let's now discuss the second of these components. Given $V \subset \RR^\Xsf$, an **aggregator** $A$ on $V$ is a map $A$ from $\Xsf \times \RR$ to $\RR$ such that 1. $w(x) = A(x, v(x))$ is in $V$ whenever $v \in V$ and 2. $y \mapsto A(x, y)$ is increasing for all $x \in \Xsf$. 
Intuitively, an aggregator combines current state and continuation values to measure lifetime value. Common types of aggregators include the - **Leontief aggregator** $A_{\textsc{min}}(x,y) = \min\{ r(x), \beta y \}$ with $r \in \RR^\Xsf$ and $\beta \geq 0$, - **Uzawa aggregator** $A_{\textsc{uzawa}}(x,y) = r(x) + b(x) y$ with $r \in \RR^\Xsf$ and $b \in \RR^\Xsf_+$, and - **CES aggregator** $A_{\textsc{ces}}(x, y) = \{r(x)^\alpha + \beta y^\alpha\}^{1/\alpha}$ with $r \in (0,\infty)^\Xsf$, $\beta \geq 0$ and $\alpha \not= 0$. Here CES stands for "constant elasticity of substitution." An important special case of both the CES and Uzawa aggregators is the - **additive aggregator** $A_{\textsc{add}}(x,y) = r(x) + \beta y$ with $r \in \RR^\Xsf$ and $\beta \geq 0$. From these basic types we can also build composite aggregators. For example, we might consider a CES-Uzawa aggregator of the form $A(x, y) = \{r(x)^\alpha + b(x) y^\alpha\}^{1/\alpha}$ with $r, b \in \RR^\Xsf$, $b \geq 0$ and $\alpha \not= 0$. As we will see in {ref}`sss-ezsd`, the CES-Uzawa aggregator can be used to construct models with both Epstein--Zin utility and state-dependent discounting (as in, say, {cite}`albuquerque2016valuation` or {cite}`schorfheide2018identifying`.) (sss-bkp)= #### Building Koopmans Operators We are now ready to build Koopmans operators by combining certainty equivalents and aggregators. Given $V \subset \RR^\Xsf$, we call a self-map $K$ on $V$ a **Koopmans operator** if $$ K = A \circ R, $$ (eq-kmop) for some aggregator $A$ and certainty equivalent operator $R$ on $V$. The expression in {eq}`eq-kmop` means that $(Kv)(x) = A(x, (Rv)(x))$ at $v \in V$ and $x \in \Xsf$. It is generally appropriate to suppose that a uniform increase in continuation values will increase current value. This property holds for $K$ in {eq}`eq-kmop`. In particular, it follows from the definitions of $A$ and $R$ that $K$ is an order preserving self-map on $V$. 
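The composition in {eq}`eq-kmop` can be illustrated numerically. The following is a minimal sketch in Python with NumPy, not taken from the companion code: it pairs the additive aggregator with the entropic certainty equivalent operator and computes a fixed point by successive approximation. The function names (`entropic_ce`, `additive_aggregator`, `koopmans`, `fixed_point`) and the two-state parameter values are illustrative assumptions. Functions on $\Xsf$ are stored as arrays, so an aggregator $A(x, y)$ is represented as a map acting on the whole vector of continuation values at once.

```python
import numpy as np

def entropic_ce(v, P, theta):
    """Entropic certainty equivalent: (1/theta) * ln(sum_x' exp(theta v(x')) P(x, x'))."""
    return np.log(P @ np.exp(theta * v)) / theta

def additive_aggregator(r, beta):
    """Return y -> r + beta * y, acting pointwise over states (hypothetical helper)."""
    return lambda y: r + beta * y

def koopmans(A, R):
    """Compose an aggregator and a certainty equivalent: (Kv)(x) = A(x, (Rv)(x))."""
    return lambda v: A(R(v))

def fixed_point(K, v0, tol=1e-10, max_iter=10_000):
    """Iterate v <- Kv until successive iterates are within tol in the sup norm."""
    v = v0
    for _ in range(max_iter):
        v_new = K(v)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    raise RuntimeError("successive approximation did not converge")

# Illustrative primitives (assumed values, two states)
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 2.0])
beta, theta = 0.9, -0.5

R = lambda v: entropic_ce(v, P, theta)
A = additive_aggregator(r, beta)
K = koopmans(A, R)

v_star = fixed_point(K, np.zeros(2))
print(v_star)
```

Swapping in a different aggregator or certainty equivalent only requires replacing `A` or `R`. Whether the iteration converges depends on the stability properties of the resulting operator.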
```{prf:example} For risk-sensitive preferences, the Koopmans operator can be expressed as $K_\theta = A_{\textsc{add}} \circ R_\theta$, where $R_\theta$ is the entropic certainty equivalent operator. ``` ```{prf:example} The Epstein--Zin Koopmans operator can be expressed as $K = A_{\textsc{ces}} \circ R_\gamma$, where $R_\gamma$ is the Kreps--Porteus expectations operator, as defined in {eq}`eq-ffa`. This is a version of {eq}`eq-kmop` under the CES aggregator. ``` ```{prf:remark} We defined time additive preferences somewhat loosely in {ref}`sss-nccas`. Here is a better definition: The Koopmans operator $K = A \circ R$ is **time additive** if $A = A_{\textsc{add}}$ and $R$ is ordinary conditional expectations (as in {prf:ref}`eg-oeo`). ``` #### Comments on CES Aggregation The CES aggregator is so-named because, in a static utility maximization problem where $c$ and $y$ are two goods and utility is $U(c,y) = ((1-\beta) c^\alpha + \beta y^\alpha)^{1/\alpha}$, the elasticity of substitution is constant and given by $1/(1-\alpha)$. In the present setting, where aggregation is across time, $1/(1-\alpha)$ is usually called the **elasticity of intertemporal substitution** (EIS). The next exercise explains. ```{exercise} :label: ex-val-auto-26 Consider $U(c, y) = ((1-\beta) c^\alpha + \beta y^\alpha)^{1/\alpha}$ as a utility function over current and future goods $c$ and $y$. Then $$ \text{EIS} = \frac{d \ln (y/c)}{ d \ln(U_c/U_y)} \quad \text{where} \quad U_c \coloneq \frac{\partial U(c,y) }{\partial c} \; \text{ and } \; U_y \coloneq \frac{\partial U(c,y)}{\partial y}. $$ Confirm that EIS $= 1/(1-\alpha)$. ``` ```{solution} ex-val-auto-26 It is not difficult to show that $(U_c/U_y) = ((1-\beta)/\beta) (y/c)^{1-\alpha}$. Taking logs and rearranging gives $\ln(y/c) = (1/(1-\alpha)) \ln (U_c/U_y) + k$, where $k$ is a constant. Using the definition in the exercise yields EIS $= 1/(1-\alpha)$. 
``` The fact that EIS $= 1/(1-\alpha)$ under the CES aggregator is significant because the EIS can be measured from data using regression and other techniques. While estimates vary significantly, the detailed meta-analysis by {cite:t}`havranek2015cross` suggests 0.5 as a plausible average value for international studies, with rich countries tending slightly higher. {cite:t}`basu2017uncertainty` use 0.8 when calibrating to US data. Under these estimates, the relationship EIS $= 1/(1-\alpha)$ implies a value for $\alpha$ between -1.0 and -0.25. (ss-ko_stab)= #### Lifetime Value (sss-recop)= In {ref}`sss-bkp` we constructed a generic Koopmans operator using an aggregator and a certainty equivalent operator. In this section, we connect this Koopmans operator to lifetime values and discuss the significance of global stability. To begin, fix set $\Xsf$ and function class $V \subset \RR^\Xsf$. Let $K = A \circ R$ be a Koopmans operator for some aggregator $A$ and certainty equivalent operator $R$ on $V$. The **lifetime value** generated by $K$ is the unique fixed point of $K$ in $V$, whenever it exists. Given such a $v$, the value $v(x)$ is interpreted as lifetime value conditional on initial state $x$. ```{prf:example} :label: eg-asp In the case of time additive preferences, lifetime value was defined in {eq}`eq-vfucq` by $v = (I - \beta P)^{-1} r$. Equivalently, $v$ is the fixed point of the operator $K$ defined by $Kv = r + \beta P v$. Since $K$ is globally stable, the fixed point is unique. In view of {prf:ref}`l-fgsd`, it satisfies $$ v(x) = \EE \sum_{t \geq 0} \beta^t r(X_t) \quad \text{when } (X_t) \text{ is } P \text{-Markov and } X_0 = x. $$ ``` ```{prf:example} By {prf:ref}`p-ktheta`, the risk-sensitive Koopmans operator $K_\theta = A_{\textsc{add}} \circ R_\theta$ is globally stable on $V = \RR^\Xsf$ when $\beta \in (0,1)$. 
In this setting, the unique fixed point of $K_\theta$ in $V$ is interpreted as lifetime value under the risk-sensitive preferences described in {ref}`ss-rspref`. ``` In many applications, our existence and uniqueness proofs for fixed points of $K$ will also establish global stability. For Koopmans operators, global stability has the following interpretation: for $w \in V$, $m \in \NN$ and $x \in \Xsf$, the value $(K^m w)(x)$ gives total finite-horizon utility over periods $0, \ldots, m$ under the preferences embedded in $K$, with initial state $x$ and terminal condition $w$. Hence global stability implies that, for any choice of terminal condition, finite-horizon utility converges to infinite-horizon utility as the time horizon converges to infinity. The next exercise helps to illustrate this point. ```{exercise} :label: ex-asfh Consider again the time additive preferences $V_t = u(C_t) + \beta \EE_t V_{t+1}$ in {prf:ref}`eg-asp`. Suppose that the time horizon is finite, with some exogenous terminal value $V_m = w(X_m)$ at time $m$. Letting $v_m(x)$ represent lifetime value up until time $m$, conditional on initial state $x$, show that 1. $v_m = \sum_{t=0}^{m-1} (\beta P)^t r + (\beta P)^m w$, 2. $v_m = K^m w$, where $K$ is the associated Koopmans operator $Kv = r + \beta P v$ and, 3. $K^m w \to v^* \coloneq (I - \beta P)^{-1} r$ as $m \to \infty$. ``` ```{solution} ex-asfh Iterating forward from $V_0$ gives $$ V_0 = u(C_0) + \beta \EE_0 \, V_1 = u(C_0) + \beta \EE_0 \left[ u(C_1) + \beta \EE_1 \, V_2 \right] = u(C_0) + \beta \EE_0 \, u(C_1) + \beta^2 \EE_0 \, V_2. $$ Continuing forward until time $m$ yields $V_0 = \sum_{t=0}^{m-1} \beta^t \EE_0 \, u(C_t) + \beta^m \EE_0 \, V_m$. Shifting to functional form and using $r = u \circ c$, the last expression becomes $$ v = \sum_{t=0}^{m-1} (\beta P)^t r + (\beta P)^m w. 
$$ By {prf:ref}`ex-tat`, this is just $K^m w$ when $K$ is the associated Koopmans operator $Kv = r + \beta P v$ and, moreover, $K^m w \to v^* \coloneq (I - \beta P)^{-1} r$. ``` {prf:ref}`ex-asfh` confirms that, at least for the time additive case, global stability of $K$ is equivalent to the statement that a finite-horizon valuation with arbitrary terminal condition $w$ converges to the infinite-horizon valuation. #### Monotone Lifetime Values Let $\Xsf = (\Xsf, \preceq)$ be partially ordered, let $i\RR^\Xsf$ be the set of increasing functions in $\RR^\Xsf$, and let $V$ be such that $i\RR^\Xsf \subset V \subset \RR^\Xsf$. Let $K$ be a Koopmans operator on $V$, so that $K = A \circ R$ for some aggregator $A$ and certainty equivalent operator $R$ on $V$. Suppose that $K$ has a unique fixed point $v^* \in V$. A natural question is: when is $v^*$ increasing in the state? ```{prf:lemma} If $K$ is globally stable, then $v^*$ is increasing on $\Xsf$ whenever the following two conditions hold: 1. $A(x, v) \leq A(x', v)$ whenever $v \in V$ and $x \preceq x'$, and 2. $R$ is monotone increasing on $V$. ``` ```{prf:proof} It is not difficult to check that, under the stated conditions, $K$ is invariant on $i\RR^\Xsf$. It follows from {prf:ref}`ex-cinvfp` that $v^*$ is increasing on $\Xsf$. ◻ ``` ```{exercise} :label: ex-val-auto-27 Consider the Epstein--Zin Koopmans operator $K = A_{\textsc{ces}} \circ R_\gamma$ on $V$, where $V \coloneq (0, \infty)^\Xsf$ and the primitives are as in {eq}`eq-kez`. Assume the conditions of {prf:ref}`p-ezbk`, so that $K$ has a unique fixed point $v^*$ in $V$. Given $P \in \mopx$, we can write $R_\gamma$ as $R_\gamma v = (Pv^\gamma)^{1/\gamma}$ at each $v \in V$. Prove that $v^*$ is increasing on $\Xsf$ whenever $P$ is monotone increasing and $c \in i\RR^\Xsf$. ``` (sss-rast)= ### A Blackwell-Type Condition Let $R$ be a certainty equivalent operator on $V = \RR^\Xsf$ and let $A$ be an aggregator on $V$.
Let $K$ be the Koopmans operator on $V$ defined by $(Kv)(x) = A(x, (Rv)(x))$. When $R$ is constant-subadditive, we can often establish global stability of $K$ on $V$ via a contraction mapping argument. This section gives details. #### Blackwell Aggregators We call an aggregator $A$ on $V$ a **Blackwell aggregator** if there exists a $\beta \in (0,1)$ such that $$ A(x, y + \lambda) \leq A(x, y) + \beta \lambda, $$ (eq-black_agg) for all $x \in \Xsf$, $y \in \RR$ and $\lambda \in \RR_+$. ```{exercise} :label: ex-bcaal Fix $\beta \in \RR_+$ and $r \in \RR^\Xsf$. Show that the additive aggregator $A(x,y) = r(x) + \beta y$ and the Leontief aggregator $A(x,y) = \min\{ r(x), \beta y \}$ are Blackwell aggregators when $\beta < 1$. ``` ```{solution} ex-bcaal The additive case is obvious. Regarding the Leontief case, fix $x \in \Xsf$, $y \in \RR$ and $\lambda \in \RR_+$. We have $$ \min\{ r(x), \beta (y + \lambda) \} \leq \min\{ r(x) + \beta \lambda, \beta y + \beta \lambda \} = \min\{ r(x) , \beta y \} + \beta \lambda. $$ That is, $A(x, y + \lambda) \leq A(x, y) + \beta \lambda$. Hence Blackwell's condition holds. ``` The next proposition states conditions for global stability in settings where aggregators have the Blackwell property. ```{prf:proposition} :label: p-kogs If $A$ is a Blackwell aggregator and $R$ is constant-subadditive, then the Koopmans operator $K \coloneq A \circ R$ is a contraction on $V$ with respect to $\| \cdot \|_\infty$. ``` ```{prf:proof} Let the primitives be as stated. In view of {prf:ref}`l-blackwell`, and taking into account the fact that $K$ is order preserving, we need only show that there exists a $\beta \in (0,1)$ with $K(v + \lambda) \leq Kv + \beta \lambda$ for all $v \in V$ and $\lambda \in \RR_+$. To see this, fix $v \in V$ and $\lambda \in \RR_+$. Applying constant-subadditivity of $R$ and monotonicity of $A$, we have $$ K(v + \lambda) = A(\cdot, R(v + \lambda)) \leq A(\cdot, Rv + \lambda). 
$$ Since $A$ is a Blackwell aggregator, the last term is bounded by $A(\cdot, Rv) + \beta \lambda$ with $\beta < 1$. Hence $K(v+\lambda) \leq Kv + \beta \lambda$, and $K$ is a contraction of modulus $\beta$ on $V$. ◻ ``` The stability of time additive preferences is a special case of {prf:ref}`p-kogs`. (sss-proofkt)= #### The Risk-Sensitive Case We can now complete the proof of {prf:ref}`p-ktheta`, which concerned global stability of the Koopmans operator generated by risk-sensitive preferences. ```{prf:proof} *Proof of {prf:ref}`p-ktheta`.* Fix $\theta \not= 0$ and recall that $K_\theta$ in {eq}`eq-koop` can be expressed as $K_\theta = A_{\textsc{add}} \circ R_\theta$ when $R_\theta$ is the entropic certainty equivalent. Since $A_{\textsc{add}}$ is a Blackwell aggregator and $R_\theta$ is constant-subadditive ({prf:ref}`ex-ersa`), {prf:ref}`p-kogs` applies. In particular, $K_\theta$ is globally stable on $\RR^\Xsf$. ◻ ``` ```{exercise} :label: ex-val-auto-28 Let $K = A_{\textsc{min}} \circ R$ on $V = \RR^\Xsf$. Prove that $K$ is globally stable on $V$ whenever $R$ is constant-subadditive and $A_{\textsc{min}}(x, y) = \min\{r(x), \beta y\}$ with $\beta \in (0,1)$. ``` (sss-qp_stab)= #### Quantile Preferences Consider a setting where $V = \RR^\Xsf$ and $K_\tau \coloneq A_{\textsc{add}} \circ R_\tau$. That is, $$ (K_\tau v)(x) = r(x) + \beta (R_\tau v)(x) \qquad (x \in \Xsf), $$ (eq-qpas) for $\beta \in (0,1)$, $\tau \in [0,1]$, $r \in \RR^\Xsf$ and $R_\tau$ as described in {prf:ref}`ex-quant_ra`. Since $R_\tau$ is constant-subadditive ({prf:ref}`ex-qrasa`) and the additive aggregator is Blackwell, $K_\tau$ is globally stable ({prf:ref}`p-kogs`). The operator $K_\tau$ represents quantile preferences, as described in {cite:t}`de2019dynamic` and other studies (see {ref}`s-cn_val`). The value $\tau$ parameterizes attitude to risk, a point we return to in {ref}`sss-quant_dp`. 
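To make the quantile Koopmans operator in {eq}`eq-qpas` concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions rather than the authors' implementation: the helper name `quantile_ce`, the three-state primitives, and the choice $\tau = 0.5$ are all hypothetical. The code takes $\tau \in (0, 1]$, since the minimum in the definition of $R_\tau$ is only attained at a value of $v$ when $\tau > 0$. The small tolerance guards against floating point error in cumulative sums.

```python
import numpy as np

def quantile_ce(v, P, tau):
    """
    Quantile certainty equivalent, assuming 0 < tau <= 1:
    (R_tau v)(x) = min{ y : sum_x' 1{v(x') <= y} P(x, x') >= tau }.
    """
    order = np.argsort(v)                  # sort states by continuation value
    v_sorted = v[order]
    cdf = np.cumsum(P[:, order], axis=1)   # row x: cumulative mass over sorted values
    idx = np.argmax(cdf >= tau - 1e-12, axis=1)  # first position where mass reaches tau
    return v_sorted[idx]

# Illustrative primitives (assumed values, three states)
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
r = np.array([1.0, 2.0, 3.0])
beta, tau = 0.9, 0.5

def K_tau(v):
    """Quantile Koopmans operator: additive aggregator applied to R_tau v."""
    return r + beta * quantile_ce(v, P, tau)

# Successive approximation from v = 0
v = np.zeros(3)
for _ in range(10_000):
    v_new = K_tau(v)
    if np.max(np.abs(v_new - v)) < 1e-10:
        break
    v = v_new
v_star = v_new
print(v_star)
```

In this sketch the iterates settle down from the zero initial condition, which is what one would expect given that $K_\tau$ is globally stable.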
```{exercise} :label: ex-val-auto-29 Consider replacing the operator $K_\tau$ in {eq}`eq-qpas` with $K = A_{\textsc{min}} \circ R_\tau$. Under the same assumptions (apart from the switch to Leontief aggregator), prove that $K$ is globally stable. ``` ```{solution} ex-val-auto-29 We already know from {prf:ref}`ex-bcaal` that the Leontief aggregator satisfies Blackwell's condition when $\beta \in (0,1)$. Since $R_\tau$ is constant-subadditive, global stability follows from {prf:ref}`p-kogs`. ``` (sss-uzawa_stab)= ### Uzawa Aggregation Let's consider the Koopmans operator $K = A_{\textsc{uzawa}} \circ R$, where $V$ is some subset of $\RR^\Xsf$ and $R$ is a certainty equivalent operator on $V$. In particular, $$ (Kv)(x) = r(x) + b(x) (Rv)(x) \qquad (x \in \Xsf, \; v \in V), $$ (eq-kuzawa) with $r, b \in \RR^\Xsf$ and $b \geq 0$. We are interested in conditions that imply $K$ is globally stable on $V$. #### The Case of Conditional Expectation Let $V = \RR^\Xsf$ and suppose $R = P$ for some $P \in \mopx$, so that $R$ is ordinary conditional expectations. Then $K$ becomes $Kv = r + Lv$ where $L \in \lopx$ with $L(x,x') = b(x)P(x,x')$. By {prf:ref}`ex-tat`, $K$ is globally stable on $V$ whenever $\rho(L) < 1$. This kind of structure arises when households derive utility from a consumption path while their discount factor fluctuates according to some state variable (see, e.g., {cite}`krusell1998income`, {cite}`toda2019wealth`, {cite}`cao2020recursive`, and {cite}`hubmer2020sources`). For a given consumption path $(C_t)$, lifetime value takes the form $$ v(x) = \EE_x \, \sum_{t=0}^\infty \left( \prod_{i=1}^t \beta_i \right) u(C_t), $$ (eq-pwdfs) where $u$ is a flow utility function and $\{\beta_t\}$ is a discount factor process. Suppose $C_t = c(X_t)$ and $\beta_t = b(X_t)$ where $b \geq 0$ and $(X_t)$ is $P$-Markov for some $P \in \mopx$. Set $r \coloneq u \circ c$ and $L(x,x') \coloneq b(x) P(x,x')$.
By {prf:ref}`t-dpec`, the condition $\rho(L) < 1$ implies that $v$ in {eq}`eq-pwdfs` is the unique fixed point of $Kv = r + L v = r + b Pv$. In other words, lifetime value under {eq}`eq-pwdfs` is the unique fixed point of the Koopmans operator when the aggregator is of Uzawa type and the certainty equivalent is conditional expectation. How does this relate to optimization? Recall our discussion of state-dependent MDPs in {prf:ref}`c-state_dep`. There, the policy operator $T_\sigma$ in {eq}`eq-gec_polop` is a special case of {eq}`eq-kuzawa` when the discount factor depends only on the current state and action. With some additional requirements, the condition $\rho(L)<1$ is necessary as well as sufficient for existence of a unique fixed point for $Kv = r + Lv$. Indeed, if $b \gg 0$ and $P$ is irreducible, then $L$ is also irreducible and a positive linear operator. Applying {prf:ref}`l-nsln`, we see that $r \gg 0$ and $\rho(L) \geq 1$ implies $Kv = r + Lv$ has no fixed point in $V \coloneq \setntn{v \in \RR^\Xsf}{v \gg 0}$. ```{exercise} :label: ex-lirbir Confirm that $L$ is irreducible when $b \gg 0$ and $P$ is irreducible. ``` #### Stability via Concavity Now consider $Kv = r + b Rv$ from {eq}`eq-kuzawa` when $R$ is not in $\mopx$. Here $b Rv$ is the pointwise product, so that $(bRv)(x) = b(x) (Rv)(x)$ for all $x$. We cannot use {prf:ref}`p-kogs` to prove stability of $K$ unless $b(x) < 1$ for all $x \in \Xsf$. Since this condition is rather strict, we now study weaker conditions that can be valid even when $b$ exceeds $1$ in some states. Specifically, we consider 1. $b Rv \leq c + Lv$ for some $c \in \RR^\Xsf$ and $L \in \lopx$ with $\rho(L) < 1$. 2. $r \gg 0$ and $R$ is concave on $\RR^\Xsf_+$. Let $V = [0, \bar v]$ where $\bar v \coloneq (I-L)^{-1}(r + c)$. ```{prf:proposition} If conditions *(a)--(b)* hold, then $K$ is globally stable on $V$. 
``` ```{prf:proof} Under (a)--(b), $K$ is concave on $\RR^\Xsf_+$, with $$ 0 \ll r = r + b R 0 = K0 \quad \text{and} \quad K \bar v = r + b R \bar v \leq r + c + L \bar v = r + c - (I - L)\bar v + \bar v = \bar v. $$ The claim now follows from Du's theorem. ◻ ``` (sss-ezsd)= #### Epstein--Zin Preferences with State-Dependent Discounting Combining the CES-Uzawa aggregator $A(x, y) = \{r(x)^\alpha + b(x) y^\alpha\}^{1/\alpha}$ with the Kreps--Porteus certainty equivalent operator leads to the Koopmans operator $$ Kv = \left\{ h + b \left[ Pv^\gamma \right]^{\alpha/\gamma} \right\}^{1/\alpha}, \quad \text{with} \quad h, b \in \RR^\Xsf_+. $$ (eq-kezsd) A fixed point of $K$ corresponds to lifetime value for an agent with Epstein--Zin preferences and state-dependent discounting. (Such setups are used in research on macroeconomic dynamics and asset pricing -- see {ref}`s-cn_val` for more details.) In what follows we take $V = (0, \infty)^\Xsf$ and assume that $h, b \in V$ and $P$ is irreducible. ```{exercise} :label: ex-val-auto-30 Show that $K$ is a self-map on $V$. ``` To discuss stability of $K$ we introduce the operator $B \in \lopx$ defined by $$ (Bv)(x) \coloneq b(x)^\theta \sum_{x'} v(x') P(x, x') \quad \text{where} \;\; \theta \coloneq \frac{\gamma}{\alpha}. $$ ```{prf:proposition} :label: p-kcesu $K$ is globally stable on $V$ if and only if $\rho(B)^{\alpha/\gamma} < 1$. ``` To prove {prf:ref}`p-kcesu`, we proceed as in {ref}`sss-psez`, constructing a conjugate operator $\hat K$ and proving stability of the latter. For this purpose, we introduce $$ \hat K v = \left\{ h + (B v)^{1/\theta} \right\}^\theta \qquad (v \in V). $$ (eq-kppusd) Also, let $\Phi$ be defined by $\Phi v = v^\gamma$. ```{exercise} :label: ex-kkhtc2 Prove: $(V, K)$ and $(V, \hat K)$ are topologically conjugate under $\Phi$. ``` ```{solution} ex-kkhtc2 We saw in {prf:ref}`l-kkhtc` that $\Phi$ is a homeomorphism from $V$ to itself.
For $v \in V$, we have $$ \hat K\Phi v = \left\{ h + (b^\theta P\Phi v)^{1/\theta} \right\}^\theta = \left\{ h + b (Pv^\gamma)^{1/\theta} \right\}^\theta = \left\{ h + b (P v^\gamma)^{\alpha /\gamma} \right\}^{\gamma/\alpha} = \Phi Kv . $$ Thus, $\hat K\Phi = \Phi K$ on $V$. Rearranging gives $K = \Phi^{-1} \hat K \Phi$, so $(V, K)$ and $(V, \hat K)$ are topologically conjugate, as claimed. ``` ```{prf:proof} *Proof of {prf:ref}`p-kcesu`.* In view of {prf:ref}`ex-kkhtc2`, it suffices to show that $\hat K$ is globally stable on $V$ if and only if $\rho(B)^{\alpha/\gamma} < 1$. This is implied by {prf:ref}`t-powaff`, since $B$ is irreducible (see {prf:ref}`ex-lirbir`) and $\rho(B)^{1/\theta} = \rho(B)^{\alpha/\gamma}$. ◻ ``` (s-cn_val)= ## Chapter Notes The time additive preference structure in {ref}`ss-vcpf` was popularized by {cite:t}`samuelson1939interactions`, who built on earlier work by {cite:t}`fisher1930theory` and {cite:t}`ramsey1928mathematical`. An axiomatic foundation was supplied by {cite:t}`koopmans1960stationary`. {cite:t}`bastianello2022choquet` study the foundations of discounted expected utility (DEU) from a purely subjective framework. Problems with the time additive DEU model include non-constant discounting, as discussed in {ref}`s-cn_state_dep`, as well as sign effects (gains being discounted more than losses) and magnitude effects (small outcomes being discounted more than large ones). See, for example, {cite:t}`thaler1981some` and {cite:t}`benzion1989discount`. A critical review of the time additive model and a list of many references can be found in {cite:t}`frederick2002time`. In the stochastic setting, the time additive framework is a subset of the expected utility model ({cite}`von1944theory`, {cite}`friedman1956theory`, {cite}`savage1951theory`). There are many well documented departures from expected utility in experimental data.
See the start of {cite:t}`andreoni2012risk` and the article {cite:t}`ericson2019intertemporal` for an introduction to the literature. An interesting historical discussion of time additive expected utility can be found in {cite:t}`becker1989recursive`. (It is ironic that those most responsible for popularizing the time additive DEU framework have also been among the most critical. For example, {cite:t}`samuelson1939interactions` stated that it is "completely arbitrary" to assume that the DEU specification holds. He goes on to claim that, in the analysis of savings and consumption, it is "extremely doubtful whether we can learn much from considering such an economic man." In addition, {cite:t}`stokey1989recursive`, whose work helped to standardize DEU as a methodology for quantitative analysis, argued in a separate study that DEU is attractive only because of its relative simplicity {cite:p}`lucas1984optimal`.) Do the departures from time additive expected utility found in experimental data actually matter for quantitative work? Evidence suggests that the answer is affirmative. In macroeconomics and asset pricing in particular, researchers increasingly use non-additive preferences in order to bring model outputs closer to the data. For example, many quantitative models of asset pricing rely heavily on Epstein--Zin preferences. Representative examples include {cite:t}`epstein1991empirical`, {cite:t}`tallarini2000risk`, {cite:t}`bansal2004risks`, {cite:t}`hansen2008consumption`, {cite:t}`bansal2012empirical`, {cite:t}`schorfheide2018identifying`, and {cite:t}`de2022valuation`. Alternative numerical solution methods are discussed in {cite:t}`pohl2018higher`. An excellent introduction to recursive preference models can be found in {cite:t}`backus2004exotic`. 
Our use of the term "Koopmans operator," which is not entirely standard, honors early contributions by Nobel laureate Tjalling Koopmans on recursive preferences (see {cite}`koopmans1960stationary` and {cite}`koopmans1964stationary`). Theoretical properties of recursive preference models have been studied in many papers, including {cite:t}`epstein1989risk`, {cite:t}`weil1990nonexpected`, {cite:t}`boyd1990recursive`, {cite:t}`hansen2009long`, {cite:t}`marinacci2010unique`, {cite:t}`bommier2017monotone`, {cite:t}`bloise2018convex`, {cite:t}`marinacci2019unique`, {cite:t}`pohl2019relative`, {cite:t}`balbus2020recursive`, {cite:t}`borovivcka2020necessary`, {cite:t}`dejarnette2020time`, {cite:t}`christensen2022existence`, and {cite:t}`becker2023`. The paper by {cite:t}`marinacci2019unique` provides a useful alternative approach to existence of unique fixed points in the setting of order preserving maps. Experimental results on Epstein--Zin preferences can be found in {cite:t}`meissner2022measuring`. There is a strong connection between risk-sensitive preferences and the literature on robust control. See, for example, {cite:t}`cagetti2002robustness`, {cite:t}`hansen2007recursive`, and {cite:t}`barillas2009doubts`. We return to this point in {prf:ref}`c-rdps`. The quantile preferences we considered in {ref}`sss-qp_stab` have been analyzed in static and dynamic settings by {cite:t}`giovannetti2013asset`, {cite:t}`de2019dynamic`, {cite:t}`de2022static` and {cite:t}`de2022dynamic`. Recursive components of the analysis of quantile and Uzawa preference models build on the study of monotone preferences in {cite:t}`bommier2017monotone`. Some recursive preference specifications involve ambiguity aversion. An introduction to this literature and its applications can be found in {cite:t}`klibanoff2009recursive`, {cite:t}`hayashi2011intertemporal`, {cite:t}`hansen2018aversion`, {cite:t}`bommier2019ambiguity` and {cite:t}`hansen2020structured`. 
{cite:t}`marinacci2023recursive` discuss the connection between recursivity and attitudes to uncertainty. We discuss ambiguity again in {prf:ref}`c-rdps`. Recursive preferences are increasingly applied outside the field of asset pricing, where they first came to prominence. See, for example, {cite:t}`bommier2012risk`, {cite:t}`colacito2018bkk`, {cite:t}`jensen2019life`, or {cite:t}`augeraud2019value`. The coin flip application in {ref}`sss-las` is related to correlation aversion, as discussed in {cite:t}`lorenzo2023recursive`, and preference for "consumption spreads" as reviewed in {cite:t}`frederick2002time`. Some applications of {prf:ref}`t-du` to network analysis can be found in {cite:t}`sargent2022economic`. [^1]: Epstein--Zin preferences were popularized in {cite:t}`epstein1989risk`. They are a special case of preferences defined by {cite:t}`kreps1978temporal`. Further discussion can be found in {ref}`s-cn_val`. [^2]: We could try to truncate the interval to a neighborhood of the fixed point and hope that $\hat K$ is a contraction when restricted to this interval. But in higher dimensions we are not sure that a fixed point exists for a broad range of parameters, which makes this idea hard to implement. ======================================================================== ## Recursive Decision Processes (c-rdps)= # Recursive Decision Processes While the MDP model from {prf:ref}`c-mdps` and {prf:ref}`c-state_dep` is elegant and widely used, researchers in economics, finance, and other fields are working to extend it. Reasons include: 1. MDP theory cannot be applied to settings where lifetime values are described by the kinds of nonlinear recursions discussed in {prf:ref}`c-val`. 2. Equilibria in some models of production and economic geography can be computed using dynamic programming but not all such programming problems fit within the MDP framework. 3. 
Dynamic programming problems that include adversarial agents to promote robust decision rules can fail to be MDPs. To handle such departures from the MDP assumptions, we now construct a more general dynamic programming framework, building on an approach to optimization initially developed by {cite:t}`denardo1967contraction` and extended by {cite:t}`bertsekas2022abstract`. Further references are provided in {ref}`s-cn_finite_rdps`. We start this chapter by building a framework that centers on an abstract representation of the Bellman equation ({ref}`s-rdp_theory`). We then state optimality results and show how they can be verified in a range of applications. We defer proofs of core optimality results to {prf:ref}`c-adps`, where we strip dynamic programs down to their essence by adopting a purely operator-theoretic perspective. (s-rdp_theory)= ## Definition and Properties In this section, we introduce and analyze optimality conditions for recursive decision processes that include and extend all dynamic programming frameworks discussed so far. Throughout this chapter, $\Xsf$ denotes a finite set. (sss-defrdps)= ### Defining RDPs Consider a generic Bellman equation of the form $$ v(x) = \max_{a \in \Gamma(x)} B(x, a, v). $$ (eq-belleq0) Here $x$ is the state, $a$ is an action, $\Gamma$ is a feasible correspondence, and $B$ is an "aggregator" function. We understand $\Gamma(x)$ as all actions available to the controller in state $x$. The function $v$ assigns values to states and is a member of some class $V \subset \RR^\Xsf$. This "abstract" Bellman equation generalizes all of the Bellman equations presented in previous chapters. Our plan is to analyze the Bellman equation {eq}`eq-belleq0` and state conditions on $B$ and the other primitives that make strong optimality properties hold. As a first step, we introduce two finite sets, - an **action space** $\Asf$ and - a **state space** $\Xsf$. 
Given $\Xsf$ and $\Asf$, we define a **recursive decision process** (RDP) to be a triple $\rR = (\Gamma, V, B)$ consisting of 1. a **feasible correspondence** $\Gamma$ that is a nonempty correspondence from $\Xsf$ to $\Asf$, which in turn defines - the feasible state-action pairs $$ \Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)} $$ - and the set of feasible policies $$ \Sigma \coloneq \setntn{\sigma \in \Asf^\Xsf} {\sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf}, $$ 2. a subset $V$ of $\RR^\Xsf$ called the **value space**, and 3. a **value aggregator** $B$ that maps $\Gsf \times V$ to $\RR$ and satisfies both the monotonicity condition $$ v, w \in V \text{ and } v \leq w \implies B(x, a, v) \leq B(x, a, w) \; \text{ for all } (x, a) \in \Gsf, $$ (eq-fmon) and the consistency condition $$ w \in V \text{ whenever } w(x) = B(x, \sigma(x), v) \text{ for some } \sigma \in \Sigma \text{ and } v \in V. $$ (eq-fsconsis) Throughout, $\leq$ represents the pointwise order on $\RR^\Xsf$. The definition of the feasible correspondence in (i) is identical to that for the MDP in {prf:ref}`c-mdps`. As for (ii), we understand $V$ to be a class of functions that assign values to states. In (iii), the interpretation of the aggregator $B$ is: > $B(x, a, v) =$ total lifetime rewards, contingent on current action $a$, current state $x$, and using $v$ to evaluate future states. The monotonicity condition {eq}`eq-fmon` is natural: if, relative to $v$, rewards are at least as high for $w$ in every future state, then the total rewards one can extract under $w$ should be at least as high. The consistency condition in {eq}`eq-fsconsis` ensures that as we consider values of different policies we remain within the value space $V$. The MDP framework is a special case of the RDP framework: ```{prf:example} :label: eg-rdp_fmdp Consider MDP $\mM = (\Gamma, \beta, r, P)$ with state space $\Xsf$ and action space $\Asf$ (see, e.g., {ref}`sss-fsmdp`). 
We can frame $\mM$ as an RDP by taking $\Gamma$ as unchanged, $V = \RR^\Xsf$, and $$ B(x, a, v) = r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \qquad ((x,a) \in \Gsf,\; v \in V). $$ (eq-bforfmdp) Now $(\Gamma, V, B)$ forms an RDP. The monotonicity condition {eq}`eq-fmon` clearly holds and the consistency condition {eq}`eq-fsconsis` is trivial, since $V$ is all of $\RR^\Xsf$. Inserting {eq}`eq-bforfmdp` into the abstract Bellman equation {eq}`eq-belleq0` recovers the MDP Bellman equation ({eq}`eq-fsmdp_bell`). ``` ```{prf:example} :label: eg-rdp_cake Consider a basic cake eating problem (see {ref}`sss-cake`), where $\Xsf$ is a finite subset of $\RR_+$ and $x \in \Xsf$ is understood to be the number of remaining slices of cake today. Let $x'$ be the number of remaining slices next period and $u(x-x')$ be the utility from slices enjoyed today. The utility function $u$ maps $\RR_+$ to $\RR$. Let $V = \RR^\Xsf$, let $\Gamma$ be defined by $\Gamma(x) = \setntn{x' \in \Xsf}{x' \leq x}$ and let $$ B(x, x', v) = u(x - x') + \beta v(x'). $$ Then $(\Gamma, V, B)$ is an RDP with Bellman equation identical to that of the original cake eating problem in {ref}`sss-cake`. The monotonicity condition {eq}`eq-fmon` and the consistency condition {eq}`eq-fsconsis` are easy to verify. ``` The last example is a special case of {prf:ref}`eg-rdp_fmdp`, since the cake eating problem is an MDP (see {ref}`sss-cake`). Nonetheless, {prf:ref}`eg-rdp_cake` is instructive because, for cake eating, the MDP construction is tedious (e.g., we need to define a stochastic kernel $P$ even though transitions are deterministic), while the RDP construction is straightforward. The next example makes a related point. ```{prf:example} :label: eg-os_rdp In {ref}`sss-jsmdp` we showed that the job search model is an MDP but the construction was tedious. But we can also represent job search as an RDP and the embedding is straightforward. 
To see this, recall that, for an arbitrary optimal stopping problem with primitives as described in {prf:ref}`c-opt_stop`, the Bellman equation is $$ v(x) = \max \left\{ e(x), c(x) + \beta \sum_{x'} v(x') P(x, x') \right\} \qquad (x \in \Xsf). $$ (eq-opt_stop_be_rdp) Let $V = \RR^\Xsf$ and $\Gamma(x) = \{0, 1\}$ for all $x$. Let $$ B(x, a, v) = a e(x) + (1-a) \left[ c(x) + \beta \sum_{x'} v(x') P(x, x') \right], $$ (eq-opt_stop_agg_rdp) for $x \in \Xsf$ and $a \in \Asf \coloneq \{0, 1\}$. Then $(\Gamma, V, B)$ is an RDP ({prf:ref}`ex-cakemon`) and setting $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ reproduces the Bellman equation {eq}`eq-opt_stop_be_rdp`. ``` ```{exercise} :label: ex-cakemon Verify that conditions {eq}`eq-fmon` and {eq}`eq-fsconsis` hold for this RDP. ``` ```{prf:example} :label: eg-stokeylucas The dynamic programming framework popularized by {cite:t}`stokey1989recursive` is characterized by two features: First, the state is divided into an exogenous process $(Z_t)$ and an endogenous process $(Y_t)$. In addition, the next period endogenous state is directly chosen by the current action. The Bellman equation takes the form $$ v(y, z) = \max_{y' \in \Gamma(y, z)} \left\{ F(y, z, y') + \beta \sum_{z'} v(y', z') Q(z, z') \right\}. $$ (eq-be_sl) We assume that $(Z_t)$ is $Q$-Markov on finite set $\Zsf$ and $(Y_t)$ takes values in finite set $\Ysf$. With state space $\Xsf \coloneq \Ysf \times \Zsf$, action space $\Ysf$, feasible correspondence $x \mapsto \Gamma(x)$, value space $V = \RR^\Xsf$ and aggregator $$ B(x, a, v) = B((y, z), y', v) = F(y, z, y') + \beta \sum_{z'} v(y', z') Q(z, z'), $$ we obtain an RDP with Bellman equation identical to {eq}`eq-be_sl`. ``` ```{exercise} :label: ex-rdps-auto-1 Show that this RDP can also be expressed as an MDP. ``` ```{solution} ex-rdps-auto-1 This problem can be formulated as an MDP by setting the state to $x \coloneq (y, z)$, taking values in $\Xsf \coloneq \Ysf \times \Zsf$. The action space is $\Ysf$. 
The feasible correspondence is $x \mapsto \Gamma(x)$ and current reward is $r(x, a) = r((y, z), y') = F(y, z, y')$. The stochastic kernel is $$ P(x, a, x') = P((y, z), a, (y', z')) = \1\{y' = a\} Q(z, z'). $$ This MDP $(\Gamma, \beta, r, P)$ has a Bellman equation identical to {eq}`eq-be_sl`. ``` {prf:ref}`eg-rdp_fmdp`--{prf:ref}`eg-stokeylucas` treated RDPs that can be embedded into the MDP framework. In the remaining examples, we consider models that cannot be represented as MDPs. ```{prf:example} :label: eg-sdd We can add state-dependent discounting to {prf:ref}`eg-rdp_fmdp` by changing the aggregator to $$ B(x, a, v) = r(x, a) + \sum_{x'} v(x') \beta(x, a, x') P(x, a, x'). $$ (eq-bforfmdpsd) Here $\beta$ is a map from $\Gsf \times \Xsf$ to $\RR_+$. With $\Gamma$ and $V$ unchanged, $(\Gamma, V, B)$ is an RDP with Bellman equation identical to that of the MDP with state-dependent discounting we analyzed in {prf:ref}`c-state_dep`. In {ref}`ss-ecrdps` we will use RDP theory developed in this chapter to verify the optimality results claimed in {prf:ref}`c-state_dep`. ``` ```{exercise} :label: ex-sddisrdp Verify that $(\Gamma, V, B)$ as defined in {prf:ref}`eg-sdd` is an RDP. ``` ```{solution} ex-sddisrdp We must check that $(\Gamma, V, B)$ satisfies conditions {eq}`eq-fmon`--{eq}`eq-fsconsis`. The monotonicity condition holds because $\beta$ is nonnegative, so $w \leq v$ implies $$ \sum_{x'} w(x') \beta(x,a, x') P(x, a, x') \leq \sum_{x'} v(x') \beta(x,a,x') P(x, a, x') \; \text{ for all } (x, a) \in \Gsf. $$ The consistency condition is trivial, since $V$ is all of $\RR^\Xsf$. ``` ```{prf:example} :label: eg-rsrdp We can modify the MDP in {prf:ref}`eg-rdp_fmdp` to use risk-sensitive preferences. We do this by taking $\Gamma, V$ to be the same as the MDP example and setting $$ B(x, a, v) = r(x, a) + \beta \frac{1}{\theta} \ln \left\{ \sum_{x'} \exp(\theta v(x')) P(x, a, x') \right\}, $$ for all $(x, a) \in \Gsf$ and $v \in V$. 
Here $\Gsf$ is generated by the feasible correspondence $\Gamma$. ``` ```{exercise} :label: ex-rdps-auto-2 Confirm that the risk-sensitive model $(\Gamma, V, B)$ in {prf:ref}`eg-rsrdp` is an RDP for all nonzero $\theta$. ``` ```{prf:example} :label: eg-ezrdp We can also modify the MDP in {prf:ref}`eg-rdp_fmdp` to use the Epstein--Zin specification (see {eq}`eq-ezfp`) by setting $$ B(x, a, v) = \left\{ r(x, a) + \beta \left[ \sum_{x'} v(x')^\gamma P(x,a, x') \right]^{\alpha / \gamma} \right\}^{1/\alpha}, $$ (eq-ezfp2) where $\beta \in (0,1)$, $\gamma$ and $\alpha$ are nonzero parameters, and $r \gg 0$. Let $V$ be all strictly positive functions in $\RR^\Xsf$. Then $(\Gamma, V, B)$ is an RDP. ``` ```{exercise} :label: ex-ezisrpd Confirm the last claim in {prf:ref}`eg-ezrdp`. ``` ```{prf:example} :label: eg-spath The **shortest path problem** considers optimal traversal of a directed graph $\gG = (\Xsf, E)$, where $\Xsf$ is the set of vertices of the graph and $E$ is the set of edges. A weight function $c \colon E \to \RR_+$ associates a cost to each edge $(x,x') \in E$. The aim is to find the minimum cost path from $x$ to a specified vertex $d$ for every $x \in \Xsf$. Under some conditions, the problem can be solved by applying a Bellman operator of the form $$ (Tv)(x) = \min_{x' \in \oO(x)} \{ c(x, x') + v(x') \} \qquad (x \in \Xsf), $$ (eq-bespt) where $\oO(x) \coloneq \setntn{x' \in \Xsf}{(x, x') \in E}$ is the set of direct successors of $x$ and $v(x')$ is the minimum cost-to-go from state $x'$. The problem is not an MDP because future values are not discounted. It can be framed as an RDP, however, by setting $\Gamma(x) = \oO(x)$, $B(x, x', v) = c(x, x') + v(x')$ and $V = \RR^\Xsf$. ``` {prf:ref}`eg-spath` is a minimization problem. We treat minimization explicitly in {ref}`ss-minim_rdps`, although the shortest path setting can be converted to maximization by replacing $c(x,x')$ with $-c(x,x')$.
This produces an application similar to the cake eating problem in {prf:ref}`eg-rdp_cake` (although discounting is eliminated and network structure shows up in the constraint). (ss-frdpot0)= ### Lifetime Value We aim to discuss optimality of RDPs. To prepare for this topic, we now clarify lifetime values associated with different policy choices in the RDP setting. (sss-polval)= #### Policies and Value Let $\rR = (\Gamma, V, B)$ be an RDP with state and action spaces $\Xsf$ and $\Asf$, and let $\Sigma$ be the set of all feasible policies. For each $\sigma \in \Sigma$ we introduce the **policy operator** $T_\sigma$ as a self-map on $V$ defined by $$ (T_\sigma \, v)(x) = B(x, \sigma(x), v) \qquad (x \in \Xsf). $$ (eq-rdpts) The RDP policy operator is a direct generalization of the MDP policy operator and the optimal stopping policy operator defined in earlier chapters. ```{exercise} :label: ex-rdptsop Show that $T_\sigma$ is an order-preserving self-map on $V$ for all $\sigma \in \Sigma$. ``` ```{solution} ex-rdptsop Fix $\sigma \in \Sigma$. The claim that $T_\sigma$ is a self-map on $V$ follows immediately from the consistency condition in {eq}`eq-fsconsis`. The order-preserving property follows from the monotonicity condition in {eq}`eq-fmon`. ``` Consider a given RDP $(\Gamma, V, B)$ and fix $\sigma \in \Sigma$. If $T_\sigma$ has a unique fixed point in $V$, we denote this fixed point by $v_\sigma$ and call it the **$\sigma$-value function**. It is natural to interpret $v_\sigma$ as representing the lifetime value of following policy $\sigma$. ```{prf:example} For the optimal stopping problem discussed in {prf:ref}`c-opt_stop`, the function $v_\sigma$ that records the lifetime value of a policy $\sigma$ from any given state is the unique fixed point of the optimal stopping policy operator $T_\sigma$. See {ref}`sss-os_po`.
``` ```{prf:example} For the MDP model discussed in {prf:ref}`c-mdps`, lifetime value of policy $\sigma$ is given by $v_\sigma = (I-\beta P_\sigma)^{-1} r_\sigma$. As discussed in {prf:ref}`ex-vsits`, $v_\sigma$ is the unique fixed point of the MDP policy operator $T_\sigma$ defined by $T_\sigma v= r_\sigma + \beta P_\sigma v$. ``` ```{prf:example} For the MDP model with state-dependent discounting introduced in {prf:ref}`c-state_dep`, {prf:ref}`ex-lvstatedep` shows that the lifetime value of following policy $\sigma$ is the unique fixed point of the policy operator $T_\sigma$ defined in {eq}`eq-gec_polop`. ``` The previous examples are linear but the same idea extends to nonlinear recursive preference models as well. To see this, recall the generic Koopmans operator $(Kv)(x) = A(x, (Rv)(x))$ introduced in {ref}`ss-koopop`. Lifetime value is the unique fixed point of this operator whenever it exists. In all of the RDP examples we have considered, the policy operator can be expressed as $(T_\sigma \, v)(x) = A_\sigma(x, (R_\sigma \, v)(x))$ for some aggregator $A_\sigma$ and certainty equivalent operator $R_\sigma$. Hence $T_\sigma$ is a Koopmans operator and lifetime value associated with policy $\sigma$ is the fixed point of this operator. (sss-uands)= #### Uniqueness and Stability Let $\rR = (\Gamma, V, B)$ be a given RDP with policy operators $\{T_\sigma\}$. Given that our objective is to maximize lifetime value over the set of policies in $\Sigma$, we need to assume at the very least that lifetime value is well defined at each policy. To this end, we say that $\rR$ is **well-posed** whenever $T_\sigma$ has a unique fixed point $v_\sigma$ in $V$ for all $\sigma \in \Sigma$. ```{prf:example} :label: eg-oswp The optimal stopping RDP we introduced in {prf:ref}`eg-os_rdp` is well-posed. Indeed, for each $\sigma \in \Sigma$, the policy operator $T_\sigma$ has a unique fixed point in $\RR^\Xsf$ by {prf:ref}`p-ospolop`. 
``` ```{prf:example} :label: eg-mdpwp The RDP generated by the MDP model in {prf:ref}`eg-rdp_fmdp` is well-posed, since, for each $\sigma \in \Sigma$, the operator $T_\sigma = r_\sigma + \beta P_\sigma$ has the unique fixed point $v_\sigma = (I-\beta P_\sigma)^{-1} r_\sigma$ in $\RR^\Xsf$. ``` ```{prf:example} The shortest path problem discussed in {prf:ref}`eg-spath` is not well-posed without further assumptions. For example, consider a graph that contains two vertices $x$ and $y$, with $x \in \oO(y)$, $y \in \oO(x)$, and $c(x,y) + c(y,x) > 0$. Then, for any policy $\sigma$ that maps $x$ to $y$ and $y$ to $x$, we have $$ (T_\sigma \, v)(x) = c(x,y) + v(y) \quad \text{and} \quad (T_\sigma \, v)(y) = c(y,x) + v(x) . $$ Hence, if $v \in \RR^\Xsf$ is a fixed point of $T_\sigma$, we obtain $v(x) = c(x,y) + v(y)$ and $v(y) = c(y,x) + v(x)$. Substitution yields $v(x) = c(x,y) + c(y, x) + v(x)$, which is a contradiction. ``` Let $\rR$ be an RDP with policy operators $\{T_\sigma\}_{\sigma \in \Sigma}$. In what follows, we call $\rR$ **globally stable** if $T_\sigma$ is globally stable on $V$ for all $\sigma \in \Sigma$. ```{prf:example} :label: eg-oswp2 The optimal stopping RDP we introduced in {prf:ref}`eg-os_rdp` is globally stable, since, for each $\sigma \in \Sigma$, the policy operator $T_\sigma$ is globally stable on $\RR^\Xsf$ by {prf:ref}`p-ospolop`. ``` ```{prf:example} :label: eg-mdpwp2 The RDP generated by the MDP model in {prf:ref}`eg-rdp_fmdp` is globally stable. See {prf:ref}`ex-vsits`. ``` Obviously, every globally stable RDP is well-posed. ```{prf:remark} :label: r-gsnat In line with our discussion of stability of Koopmans operators in {ref}`sss-recop`, for $v \in V$, $\sigma \in \Sigma$, $m \in \NN$ and $x \in \Xsf$, it is natural to interpret $(T_\sigma^m \, v)(x)$ as total (finite horizon) utility over periods $0, \ldots, m$ under policy $\sigma$, with initial state $x$ and terminal condition $v \in V$.
Hence global stability implies that, for any choice of terminal condition, finite horizon valuations always converge to their infinite horizon counterparts. ``` In {ref}`ss-frdpot` we will see that global stability yields strong optimality properties. (sss-contrdp)= #### Continuity Let $\rR = (\Gamma, V, B)$ be an RDP. We call $\rR$ **continuous** if $B(x, a, v)$ is continuous in $v$ for all $(x, a) \in \Gsf$. In other words, $\rR$ is continuous if, for any $v \in V$, any $(x, a) \in \Gsf$ and any sequence $(v_k)_{k \geq 1}$ in $V$, we have $$ \lim_{k \to \infty} B(x, a, v_k) = B(x, a, v) \quad \text{whenever} \quad \lim_{k \to \infty} v_k = v . $$ Continuity is satisfied by all applications considered in this text. For example, for the RDP generated by an MDP ({prf:ref}`eg-rdp_fmdp`), the deviation $| B(x, a, v_k) - B(x, a, v)|$ is dominated by $\beta \| v_k - v \|_\infty$ for all $(x, a) \in \Gsf$. Hence continuity holds. Below we will see that continuity is useful when considering convergence of certain algorithms. (ss-frdpot)= ### Optimality In this section, we present optimality theory for RDPs. #### Greedy Policies Given an RDP $\rR = (\Gamma, V, B)$ and $v \in V$, a policy $\sigma \in \Sigma$ is called **$v$-greedy** if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} B(x, a, v) \quad \text{for all } x \in \Xsf. $$ (eq-fdefgreedy) Since $\Gamma(x)$ is finite and nonempty at each $x \in \Xsf$, at least one such policy exists. As with policy operators, the notion of greedy policies extends existing definitions from earlier chapters. ```{exercise} :label: ex-rdphgl Show that, for each $v \in V$, the set $\{T_\sigma \, v\}_{\sigma \in \Sigma} \subset V$ contains a least and greatest element (see {ref}`sss-lge` for the definitions). Explain the connection between the greatest element and $v$-greedy policies. ``` ```{solution} ex-rdphgl Fix $v \in V$ and consider the set $\{T_\sigma \, v\}_{\sigma \in \Sigma} \subset V$.
We first show that $\{T_\sigma \, v\}_{\sigma \in \Sigma}$ contains a greatest element. Suppose that $\bar \sigma$ is $v$-greedy. If $\sigma$ is any other policy, then $$ (T_\sigma \, v)(x) = B(x, \sigma(x), v) \leq (T_{\bar \sigma} \, v)(x) \quad \text{for all } x \in \Xsf . $$ Hence $T_{\bar \sigma} \, v$ is a greatest element of $\{T_\sigma \, v\}_{\sigma \in \Sigma}$. The proof that $\{T_\sigma \, v\}_{\sigma \in \Sigma}$ contains a least element is analogous, after replacing $\argmax$ with $\argmin$. ``` Given RDP $\rR = (\Gamma, V, B)$, we say that $v \in V$ satisfies the **Bellman equation** if $v(x) = \max_{a \in \Gamma(x)}B(x,a,v)$ for all $x \in \Xsf$. The **Bellman operator** corresponding to $\rR$ is the map $T$ on $V$ defined by $$ (T v)(x) = \max_{a \in \Gamma(x)} B(x, a, v) \qquad (x \in \Xsf). $$ ```{prf:example} For the Epstein--Zin RDP in {eq}`eq-ezfp2`, the Bellman operator is given by $$ (Tv)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \left[ \sum_{x'} v(x')^\gamma P(x,a, x') \right]^{\alpha / \gamma} \right\}^{1/\alpha} \quad (x \in \Xsf). $$ ``` ```{exercise} :label: ex-rdpac Given RDP $\rR = (\Gamma, V, B)$ with policy operators $\{T_\sigma\}$ and Bellman operator $T$, show that, for each $v \in V$, 1. $T v = \bigvee_\sigma \, T_\sigma \, v \coloneq \bigvee_{\sigma \in \Sigma} \, (T_\sigma \, v)$, 2. $\sigma$ is $v$-greedy if and only if $T v = T_\sigma \, v$, and 3. $T$ is an order-preserving self-map on $V$. ``` ```{solution} ex-rdpac Regarding part (i), fix $v \in V$. For any $x \in \Xsf$, we have $$ (T v)(x) = \max_{a \in \Gamma(x)} B(x, a, v) = \max_{\sigma \in \Sigma} B(x, \sigma(x), v) = \max_{\sigma \in \Sigma} (T_\sigma \, v)(x). $$ Since $x$ was chosen arbitrarily, we have confirmed that $T v = \bigvee_{\sigma \in \Sigma} T_\sigma \, v$. Regarding part (ii), $\sigma$ is $v$-greedy if and only if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} B(x, a, v) \quad \text{for all } x \in \Xsf.
$$ This is equivalent to $B(x, \sigma(x), v) = \max_{a \in \Gamma(x)} B(x, a, v)$ for all $x \in \Xsf$. Hence $\sigma$ is $v$-greedy if and only if $T_\sigma \, v = Tv$, as claimed. Regarding (iii), to see that $T$ is a self-map on $V$, fix $v \in V$ and let $\sigma$ be $v$-greedy. Then, by (ii), $Tv = T_\sigma \, v \in V$. Hence $T$ is a self-map, as claimed. The fact that $T$ is order-preserving on $V$ follows immediately from the monotonicity property of $B$ in {eq}`eq-fmon`. ``` ```{exercise} :label: ex-rdps-auto-3 Show that, for a given RDP $(\Gamma, V, B)$ and fixed $v \in V$, the Bellman operator $T$ obeys $$ (T^k v)(x) = \max_{a \in \Gamma(x)} B(x, a, T^{k-1} v), $$ (eq-stkvr) for all $k \in \ZZ_+$ and all $x \in \Xsf$. Show, in addition, that for any policy $\sigma \in \Sigma$, the policy operator $T_\sigma$ obeys $$ (T_\sigma^k v)(x) = B(x, \sigma(x), T_\sigma^{k-1} v), $$ (eq-tkvr) for all $k \in \ZZ_+$ and all $x \in \Xsf$. ``` ```{solution} ex-rdps-auto-3 Here's a proof for $T$ and fixed $k \in \NN$: At arbitrary $x \in \Xsf$, $$ (T^k v)(x) = (T (T^{k-1} v))(x) = \max_{a \in \Gamma(x)} B(x, a, T^{k-1} v). $$ (eq-atkvr) This confirms the claim in {eq}`eq-stkvr`. ``` (sss-rdpalgos)= #### Algorithms To solve RDPs for optimal policies, we use two core algorithms: Howard policy iteration (HPI) and optimistic policy iteration (OPI). As in previous chapters, OPI includes VFI as a special case. To describe HPI we take $\rR = (\Gamma, V, B)$ to be a well-posed RDP with feasible policy set $\Sigma$, policy operators $\{T_\sigma\}$, and Bellman operator $T$. In this setting, the HPI algorithm is essentially identical to the one given for MDPs in {ref}`sss-hpi`, except that $v_\sigma$ is calculated as the fixed point of $T_\sigma$, rather than taking the specific form $(I-\beta P_\sigma)^{-1} r_\sigma$. The details are in {prf:ref}`algo-hpi_rdps`. 
```{prf:algorithm} HPI for RDPs :label: algo-hpi_rdps - input $\sigma \in \Sigma$ - $v_0 \leftarrow v_\sigma$ and $k \leftarrow 0$ - repeat: - $\sigma_k \leftarrow $ a $v_k$-greedy policy - $v_{k+1} \leftarrow $ the fixed point of $T_{\sigma_k}$ - if $v_{k+1} = v_k$: break - $k \leftarrow k + 1$ - return $\sigma_k$ ``` {prf:ref}`algo-hpi_rdps` is somewhat ambiguous, since it is not always clear how to implement the instruction "$v_{k+1} \leftarrow$ the fixed point of $T_{\sigma_k}$". However, if $\rR$ is globally stable, then each $T_{\sigma_k}$ is globally stable, so an approximation of the fixed point can be calculated by iterating with $T_{\sigma_k}$. This line of thought leads us to consider optimistic policy iteration (OPI) as a more practical alternative. {prf:ref}`algo-opi_rdps` states an OPI routine for solving $\rR$ that generalizes the MDP OPI routine in {ref}`ss-fmdpsal`. ```{prf:algorithm} OPI for RDPs :label: algo-opi_rdps - input $m \in \NN$ and tolerance $\tau \geq 0$ - input $\sigma \in \Sigma$ and set $v_0 \leftarrow v_\sigma$ - $k \leftarrow 0$ - repeat: - $\sigma_k \leftarrow $ a $v_k$-greedy policy - $v_{k+1} \leftarrow T_{\sigma_k}^m v_k$ - if $\| v_{k+1} - v_k \| \leq \tau$: break - $k \leftarrow k + 1$ - return $\sigma_k$ ``` In {prf:ref}`algo-opi_rdps` we require that $v_0 = v_\sigma$ for some $\sigma \in \Sigma$. This assumption can be dropped in some settings. For practical purposes, however, it is almost always straightforward to initialize OPI with $v_0 = v_\sigma$ for some simple choice of $\sigma$. ```{exercise} :label: ex-rdps-auto-4 Prove that, for the sequence $(v_k)$ in the OPI algorithm {prf:ref}`algo-opi_rdps`, we have $v_k = T^k v_0$ whenever $m=1$. (In other words, OPI reduces to VFI when $m=1$.) ``` When we turn to proofs, it will help to have an operator-theoretic description of HPI and OPI. To this end, we define two operators.
The first is $\Hmax \colon V \to \{v_\sigma\}$, which is defined via $$ \Hmax v = v_\sigma \text{ where } \sigma \text{ is } v \text{-greedy}. $$ (eq-howop) We call $\Hmax$ the **Howard operator** generated by $\rR$. Iterating with $\Hmax$ implements HPI. In particular, if we fix $\sigma \in \Sigma$ and set $v_k = \Hmax^k v_\sigma$, then $(v_k)_{k \geq 0}$ is the sequence of $\sigma$-value functions generated by HPI.[^1] Next, fixing $m \in \NN$, we define the operator $W_m$ from $V$ to itself via $$ W_m v \coloneq T^m_\sigma v \quad \text{where} \quad \sigma \text{ is } v \text{-greedy}. $$ (eq-wmop) (See the previous footnote on the choice of $v$-greedy policies.) The operator $W_m$ is an approximation of $\Hmax$, since $T^m_\sigma v \to v_\sigma = \Hmax v$ as $m \to \infty$. Iterating with $W_m$ generates the value sequence in OPI. More specifically, we take $v_0 \in \{v_\sigma\}$ and generate $$ (v_k, \sigma_k)_{k \geq 0} \quad \text{where } v_k = W_m^k v_0 \text{ and } \sigma_k \text{ is } v_k \text{-greedy}. $$ (eq-opivp) This produces an infinite sequence of OPI value and policy iterates. (sss-rdp_opres)= #### Optimality Let $\rR$ be a well-posed RDP with policy operators $\{T_\sigma\}$ and $\sigma$-value functions $\{v_\sigma\}$. In this context, we set $v^* \coloneq \bigvee_\sigma \, v_\sigma \in \RR^\Xsf$ and call $v^*$ the **value function** of $\rR$. By definition, $v^*$ satisfies $$ v^*(x) = \max_{\sigma \in \Sigma} v_\sigma(x) \qquad \text{for all } \; x \in \Xsf. $$ (eq-fvstar) A policy $\sigma$ is called **optimal** for $\rR$ if $v_\sigma = v^*$; that is, if $$ v_\sigma(x) \geq v_\tau(x) \quad \text{for all } \tau \in \Sigma \text{ and all } x \in \Xsf. $$ Both of these definitions generalize the definitions we used for MDPs and optimal stopping. In particular, optimality of a policy means that it generates maximum possible lifetime value from every state.
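The HPI and OPI routines, together with the operators $\Hmax$ and $W_m$, can be sketched in a few lines of code. The block below is a minimal illustration rather than the book's companion implementation: it uses the MDP aggregator from {prf:ref}`eg-rdp_fmdp` on a two-state toy problem whose primitives (`r`, `P`, `beta`) are hypothetical, approximates each $v_\sigma$ by iterating $T_\sigma$ (which relies on global stability), and replaces the exact test $v_{k+1} = v_k$ in HPI with a small tolerance.

```python
# Toy primitives below are hypothetical, chosen only for illustration.
X = [0, 1]                       # state space
Gamma = {0: [0, 1], 1: [0, 1]}   # feasible correspondence
beta = 0.9
r = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0}
P = {(0, 0): [1.0, 0.0], (0, 1): [0.2, 0.8],
     (1, 0): [0.7, 0.3], (1, 1): [0.5, 0.5]}


def B(x, a, v):
    """MDP aggregator r(x, a) + beta * sum_x' v(x') P(x, a, x')."""
    return r[x, a] + beta * sum(v[y] * P[x, a][y] for y in X)


def T_sigma(sigma, v):
    """Policy operator: (T_sigma v)(x) = B(x, sigma(x), v)."""
    return [B(x, sigma[x], v) for x in X]


def greedy(v):
    """A v-greedy policy (ties broken by the first maximizer)."""
    return [max(Gamma[x], key=lambda a: B(x, a, v)) for x in X]


def dist(v, w):
    """Supremum distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(v, w))


def iterate(sigma, v, m):
    """Compute T_sigma^m v."""
    for _ in range(m):
        v = T_sigma(sigma, v)
    return v


def sigma_value(sigma, tol=1e-12, max_iter=100_000):
    """Approximate v_sigma by iterating T_sigma from the zero function."""
    v = [0.0 for _ in X]
    for _ in range(max_iter):
        v_new = T_sigma(sigma, v)
        if dist(v_new, v) < tol:
            return v_new
        v = v_new
    return v


def hpi(sigma, tol=1e-9):
    """HPI: each pass applies the Howard operator to the current value."""
    v = sigma_value(sigma)
    while True:
        sigma = greedy(v)
        v_new = sigma_value(sigma)
        if dist(v_new, v) <= tol:
            return sigma, v_new
        v = v_new


def opi(sigma, m, tol=1e-10):
    """OPI: each pass applies W_m to the current value."""
    v = sigma_value(sigma)
    while True:
        sigma = greedy(v)
        v_new = iterate(sigma, v, m)
        if dist(v_new, v) <= tol:
            return sigma, v_new
        v = v_new


sigma_hpi, v_hpi = hpi([0, 0])
sigma_vfi, v_vfi = opi([0, 0], m=1)    # m = 1 corresponds to VFI
sigma_opi, v_opi = opi([0, 0], m=20)
print(sigma_hpi, sigma_vfi, sigma_opi)
```

Only `B` encodes the model, so swapping in another aggregator (for instance, one with state-dependent discounting) leaves `hpi` and `opi` unchanged, provided the resulting RDP is globally stable.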
We say that $\rR$ satisfies **Bellman's principle of optimality** if $$ \sigma \in \Sigma \text{ is optimal for } \rR \quad \iff \quad \sigma \text{ is } v^*\text{-greedy}. $$ We can now state our main optimality result for RDPs. In the statement, $\rR$ is a well-posed RDP with value function $v^*$. ```{prf:theorem} :label: t-fbk_rpd If $\rR$ is globally stable, then 1. $v^*$ is the unique solution to the Bellman equation in $V$, 2. $\rR$ satisfies Bellman's principle of optimality, 3. $\rR$ has at least one optimal policy, 4. HPI returns an optimal policy in finitely many steps, and 5. the OPI sequence in {eq}`eq-opivp` is such that $v_k \to v^*$ as $k \to \infty$ and, moreover, there exists a $K \in \NN$ such that $\sigma_k$ is optimal for all $k \geq K$. ``` As OPI includes VFI as a special case ($m=1$), {prf:ref}`t-fbk_rpd` also implies convergence of VFI under the stated conditions. In terms of applications, {prf:ref}`t-fbk_rpd` is the most important optimality result in this book. It provides the core optimality results from dynamic programming and a broadly convergent algorithm for computing optimal policies. The proof of {prf:ref}`t-fbk_rpd` is deferred to {ref}`s-otpro`. ```{prf:example} :label: eg-ossas The optimality results for optimal stopping problems we presented in {prf:ref}`c-opt_stop` are a special case of {prf:ref}`t-fbk_rpd`, since such optimal stopping problems generate globally stable RDPs (as discussed in {prf:ref}`eg-oswp2`). ``` ```{prf:example} :label: eg-mdpsas The optimality results for MDPs we presented in {prf:ref}`c-mdps` are a special case of {prf:ref}`t-fbk_rpd`, since MDPs generate globally stable RDPs (as discussed in {prf:ref}`eg-mdpwp2`). ``` {prf:ref}`eg-ossas`--{prf:ref}`eg-mdpsas` are relatively elementary. More complex models will be handled in {ref}`s-torpds`.
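Parts (i)--(iii) of {prf:ref}`t-fbk_rpd` can be checked numerically on a small problem by brute force. The sketch below uses the risk-sensitive aggregator from {prf:ref}`eg-rsrdp` with hypothetical primitives (`r`, `P`, `beta`, `theta`), enumerates every feasible policy, computes each $v_\sigma$ by iterating $T_\sigma$, forms $v^*$ as the pointwise maximum, and then inspects the Bellman residual and the value of a $v^*$-greedy policy. It assumes, without verifying it in code, that the risk-sensitive policy operators are contractions of modulus $\beta$ in the supremum norm, so that the RDP is globally stable.

```python
import itertools
import math

# Hypothetical primitives, chosen only for illustration.
X = [0, 1, 2]
Gamma = {x: [0, 1] for x in X}
beta, theta = 0.9, -0.5
r = {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.0,
     (1, 1): 1.5, (2, 0): 2.0, (2, 1): 0.3}
P = {(0, 0): [0.9, 0.1, 0.0], (0, 1): [0.1, 0.6, 0.3],
     (1, 0): [0.0, 0.5, 0.5], (1, 1): [0.6, 0.4, 0.0],
     (2, 0): [0.8, 0.1, 0.1], (2, 1): [0.0, 0.2, 0.8]}


def B(x, a, v):
    """Risk-sensitive aggregator: r + (beta/theta) ln E exp(theta v)."""
    ev = sum(math.exp(theta * v[y]) * P[x, a][y] for y in X)
    return r[x, a] + (beta / theta) * math.log(ev)


def T_sigma(sigma, v):
    """Policy operator: (T_sigma v)(x) = B(x, sigma(x), v)."""
    return [B(x, sigma[x], v) for x in X]


def dist(v, w):
    """Supremum distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(v, w))


def sigma_value(sigma, tol=1e-12, max_iter=100_000):
    """Approximate v_sigma by iterating T_sigma from the zero function."""
    v = [0.0 for _ in X]
    for _ in range(max_iter):
        v_new = T_sigma(sigma, v)
        if dist(v_new, v) < tol:
            return v_new
        v = v_new
    return v


# Enumerate Sigma and compute every sigma-value function.
policies = list(itertools.product(*(Gamma[x] for x in X)))
values = {s: sigma_value(s) for s in policies}

# v* is the pointwise maximum of the sigma-value functions.
v_star = [max(values[s][x] for s in policies) for x in X]

# (i) v* should (approximately) solve the Bellman equation.
Tv_star = [max(B(x, a, v_star) for a in Gamma[x]) for x in X]
bellman_residual = dist(Tv_star, v_star)

# (ii)-(iii) a v*-greedy policy should attain v* at every state.
sigma_star = tuple(max(Gamma[x], key=lambda a: B(x, a, v_star)) for x in X)
greedy_gap = dist(values[sigma_star], v_star)

print(bellman_residual, greedy_gap)
```

Enumeration is only feasible here because $\Sigma$ has eight elements; the point of HPI and OPI is to avoid it.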
(sss-cott)= #### Comments on the Optimality Theorem Many traditional treatments of dynamic programming build optimality theory around contractivity (see, e.g., {cite}`puterman2005markov` or {cite}`stokey1989recursive`, Section 4.2). Assumptions are constructed so that the policy operators and Bellman operator are all contraction mappings. While such assumptions are sufficient for {prf:ref}`t-fbk_rpd` (since contractivity of the policy operators implies stability), they are not necessary. There are a variety of ways to prove uniqueness and stability of fixed points, including the monotonicity-based methods discussed in {ref}`ss-conconop` and the spectral methods in {ref}`sss-asrcon`. These alternatives will prove useful in settings where contractivity fails, as we shall see in {ref}`s-torpds`. Another point worth noting about the conditions in {prf:ref}`t-fbk_rpd` is that no assumptions are placed on the Bellman operator. Rather, one only needs to check properties of the policy operators. This is advantageous because, unlike the Bellman operator, the policy operators do not involve maximization. (sss-nonstat)= #### Nonstationary Policies Up until now, we have focused entirely on stationary policies, in the sense that the same policy is used at every point in time. What if we drop this assumption and admit the option to change policies? Might this lead to higher lifetime values? In this section, we show that for globally stable RDPs the answer is negative. This finding justifies our focus on stationary policies. To begin, let $\rR = (\Gamma, V, B)$ be a globally stable RDP. Recall from {prf:ref}`r-gsnat` that, given $v \in V$, $\sigma \in \Sigma$, $k \in \NN$ and $x \in \Xsf$, the value $(T_\sigma^k \, v)(x)$ gives finite horizon utility over periods $0, \ldots, k$ under policy $\sigma$, with initial state $x$ and terminal condition $v$. 
Extending this idea, it is natural to understand $T_{\sigma_k} T_{\sigma_{k-1}} \cdots T_{\sigma_1} v$ as providing finite horizon utility values for the nonstationary policy sequence $(\sigma_k)_{k \in \NN} \subset \Sigma$, given terminal condition $v \in V$. For the same policy sequence, we define its lifetime value via $$ \bar v \coloneq \limsup_{k \to \infty} v_k \quad \text{with } v_k \coloneq T_{\sigma_k} T_{\sigma_{k-1}} \cdots T_{\sigma_1} v $$ whenever the limsup is finite and independent of the terminal condition $v$. Suppose that this is the case, and hence $\bar v$ is well defined. We claim that $\bar v \leq v^*$. ```{exercise} :label: ex-vklvs Show that, under the stated conditions, $v_k \leq T^k v$ for all $k \in \NN$. ``` Since $\bar v$ is independent of the terminal condition $v$, we can assume without loss of generality that $v \in V_\Sigma$. By {prf:ref}`t-fbk_rpd`, we have $T^k v \to v^*$ as $k \to \infty$. Hence, by {prf:ref}`ex-vklvs`, $$ \bar v = \limsup_{k \to \infty} v_k \leq \limsup_{k \to \infty} T^k v = \lim_{k \to \infty} T^k v = v^*, $$ as was to be shown. #### Bounded RDPs We call an RDP $\rR = (\Gamma, V, B)$ **bounded** if $V$ is convex and, moreover, there exist functions $v_1, v_2 \in V$ such that $v_1 \leq v_2$, $$ v_1(x) \leq B(x, a, v_1) \quad \text{ and } \quad B(x, a, v_2) \leq v_2(x) \;\; \text{ for all } (x, a) \in \Gsf . $$ (eq-fsconsis_bd) We will show that boundedness can be used to obtain optimality results for well-posed RDPs, even without global stability. Another attractive feature of boundedness is that it permits a reduction of value space, as illustrated by the next two exercises. ```{exercise} :label: ex-boured Let $(\Gamma, V, B)$ be bounded and let $v_1, v_2 \in V$ be such that {eq}`eq-fsconsis_bd` holds. Prove that, in this setting, $(\Gamma, \hat V, B)$ is also an RDP when $\hat V \coloneq [v_1, v_2]$. 
``` ```{exercise} :label: ex-boured2 Adopt the setting of {prf:ref}`ex-boured` and suppose, in addition, that the RDP is well-posed. Show that $v_\sigma \in \hat V$ for all $\sigma \in \Sigma$. ``` {prf:ref}`ex-boured2` implies the reduced RDP $(\Gamma, \hat V, B)$ is also well-posed under the stated conditions, and that it contains all the $\sigma$-value functions and the value function from the original RDP $(\Gamma, V, B)$. Hence any optimality results for $(\Gamma, \hat V, B)$ carry over to $(\Gamma, V, B)$. ```{solution} ex-boured2 Let $\{T_\sigma\}$ be the policy operators associated with a bounded and well-posed RDP $(\Gamma, V, B)$. Let $\hat V \coloneq [v_1, v_2]$, where $v_1, v_2$ are as in {eq}`eq-fsconsis_bd`. Fix $\sigma \in \Sigma$ and let $v_\sigma$ be the $\sigma$-value function of policy operator $T_\sigma$. It follows from {eq}`eq-fsconsis_bd` that $T_\sigma$ is a self-map on $\hat V$. By the Knaster--Tarski fixed-point theorem, $T_\sigma$ has at least one fixed point in $\hat V$. By uniqueness, that fixed point is $v_\sigma$. ``` ```{exercise} :label: ex-rgm_bounded Show that the RDP generated by an MDP in {prf:ref}`eg-rdp_fmdp` is bounded. ``` ```{solution} ex-rgm_bounded Let $(\Gamma, V, B)$ be as stated. Let $\bar r = \| r \|_\infty$. We claim that {eq}`eq-fsconsis_bd` holds when $v_2 = \bar r/(1-\beta)$ and $v_1 = -v_2$. (The functions $v_1$ and $v_2$ are constant.) To see this, observe that $$ B(x, a, v_2) = r(x, a) + \beta v_2 \leq \bar r + \beta v_2 = v_2. $$ for all $(x,a) \in \Gsf$. This is the upper bound condition in {eq}`eq-fsconsis_bd`. The proof of the lower bound condition is similar. ``` ```{exercise} :label: ex-rdps-auto-5 Show that the optimal stopping RDP from {prf:ref}`eg-os_rdp` is bounded. ``` ```{solution} ex-rdps-auto-5 Letting $\bar r = \|e\|_\infty + \|c\|_\infty$, it can be shown that {eq}`eq-fsconsis_bd` holds when $v_2 = \bar r/(1-\beta)$ and $v_1 = - v_2$. 
The argument is similar to that provided for the MDP case in the solution to {prf:ref}`ex-rgm_bounded`. ``` ```{exercise} :label: ex-rdps-auto-6 Consider the RDP $(\Gamma, V, B)$ generated by an MDP with stochastic discounting in {prf:ref}`eg-sdd`. Prove that this RDP is bounded whenever the conditions of {prf:ref}`p-sddmdpec0` hold. ``` ```{solution} ex-rdps-auto-6 Let $\bar r \coloneq \| r \|_\infty \1$. We claim that the functions $v_2 = (I-L)^{-1} \bar r$ and $v_1 = - v_2$ satisfy {eq}`eq-fsconsis_bd`. To see this, observe that $$ \bar r + Lv_2 = \bar r + L v_2 - v_2 + v_2 = \bar r - (I - L)v_2 + v_2 = \bar r - \bar r + v_2 = v_2. $$ Since $B(x, a, v_2) \leq \bar r + (L v_2)(x)$, this proves that $v_2$ satisfies the upper bound condition in {eq}`eq-fsconsis_bd`. The proof of the lower bound condition is similar. ``` ```{exercise} :label: ex-shir Consider the shortest path RDP $(\Gamma, V, B)$ in {prf:ref}`eg-spath` and assume in addition that the graph $\gG$ contains only one cycle, which is a self-loop at $d$, that $d$ is accessible from every vertex $x \in \Xsf$, and that $c(d, d) = 0$. (These assumptions imply that every path leads to $d$ in finite time and that travelers reaching $d$ remain there forever at zero cost.) Let $C(x)$ be the maximum cost of traveling to $d$ from $x$, which is finite by the stated assumptions. Show that {eq}`eq-fsconsis_bd` holds when $v_1 \coloneq 0$ and $v_2 \coloneq C$. ``` ```{solution} ex-shir The only nontrivial condition to check is that the bound $B(x, x', v_2) \leq v_2(x)$ holds for all feasible $(x, x')$. In particular, we need to show that $$ c(x,x') + C(x') \leq C(x) \text{ whenever } x' \in \oO(x) \text{ and } x' \not= x. $$ This is true by the definition of $C$, since $C(x)$ is the maximum travel cost to $d$ and $c(x,x') + C(x')$ is the cost of traveling to $d$ via $x'$ and then taking the most expensive path. ``` The next result shows that, when considering optimality, stability can be replaced by boundedness. 
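Before turning to that result, here is a quick numerical check of the bounds in {eq}`eq-fsconsis_bd` for the MDP case treated in {prf:ref}`ex-rgm_bounded`. This is a minimal sketch: the primitives are hypothetical, every action is assumed feasible at every state, and a small tolerance is added to absorb floating point error.

```julia
# Hypothetical MDP primitives (illustrative numbers only).
β = 0.9
r = [1.0 0.0; 0.0 2.0]                 # r[x, a]
P = zeros(2, 2, 2)                     # P[x, a, y]
P[1, 1, :] = [0.9, 0.1]; P[1, 2, :] = [0.2, 0.8]
P[2, 1, :] = [0.5, 0.5]; P[2, 2, :] = [0.1, 0.9]
n, m = size(r)

# MDP aggregator
B(x, a, v) = r[x, a] + β * sum(P[x, a, :] .* v)

r_bar = maximum(abs, r)                # sup norm of r
v2 = fill(r_bar / (1 - β), n)          # constant upper function
v1 = -v2                               # constant lower function

tol = 1e-12
lower_ok = all(v1[x] ≤ B(x, a, v1) + tol for x in 1:n, a in 1:m)
upper_ok = all(B(x, a, v2) ≤ v2[x] + tol for x in 1:n, a in 1:m)
```

Both flags should be true for any reward array and any stochastic kernel, since the argument in the solution to {prf:ref}`ex-rgm_bounded` does not depend on the particular numbers.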
```{prf:theorem} :label: t-fbk_rpd_bounded If $\rR$ is well-posed and bounded, then *(i)--(iv)* of {prf:ref}`t-fbk_rpd` hold. ``` (ss-iso_rpds)= ### Topologically Conjugate RDPs Sometimes RDP models can be simplified by transformations over value space. In this section we investigate such transformations. The underlying ideas are related to topological conjugacy of dynamical systems, which we introduced in {ref}`sss-topconds`. To begin, let $\rR = (\Gamma, V, B)$ and $\hat \rR = (\Gamma, \hat V, \hat B)$ be two RDPs with identical state space $\Xsf$, action space $\Asf$ and feasible correspondence $\Gamma$. We consider settings where $$ V = \MM^\Xsf \quad \text{and} \quad \hat V = \hat{\MM}^\Xsf \quad \text{where } \MM, \hat \MM \subset \RR, $$ and, in addition, there exists a homeomorphism $\phi$ from $\MM$ onto $\hat \MM$ such that $$ B(x, a, v) = \phi^{-1}[ \hat B(x, a, \phi \circ v) ] \quad \text{for all } v \in V \text{ and } (x, a) \in \Gsf. $$ (eq-bbhat) We call $\rR$ and $\hat \rR$ **topologically conjugate** under $\phi$ if $\phi$ is a homeomorphism from $\MM$ onto $\hat \MM$ and {eq}`eq-bbhat` holds. Here is our main result for this section. ```{prf:proposition} :label: p-osas If $\rR$ and $\hat \rR$ are topologically conjugate, then $\rR$ is globally stable if and only if $\hat \rR$ is globally stable. ``` The benefit of {prf:ref}`p-osas` is that one of these models might be easier to analyze than the other. We apply the proposition to the Epstein--Zin specification in {ref}`ss-ezrev` and to a smooth ambiguity model in {ref}`ss-smoothamb`. The next exercise will be useful for the proof. ```{exercise} :label: ex-phiphi Prove the following: If $\phi$ is a homeomorphism from $\MM$ to $\hat \MM$ and $\Phi v \coloneq \phi \circ v$, then $\Phi$ is a homeomorphism from $V$ to $\hat V$. ``` ```{prf:proof} *Proof of {prf:ref}`p-osas`.* By {prf:ref}`ex-phiphi`, $\Phi v \coloneq \phi \circ v$ is a homeomorphism from $V$ to $\hat V$.
Moreover, for any $\sigma \in \Sigma$, the respective policy operators $T_\sigma$ and $\hat T_\sigma$ are linked by $$ (T_\sigma \, v)(x) = B(x, \sigma(x), v) = \phi^{-1}[ \hat B(x, \sigma(x), \phi \circ v) ] = \phi^{-1}[ ( \hat T_\sigma (\phi \circ v))(x) ]. $$ This shows that $T_\sigma = \Phi^{-1} \circ \hat T_\sigma \circ \Phi$ on $V$. Hence $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ are topologically conjugate dynamical systems, from which it follows that $T_\sigma$ is globally stable if and only if $\hat T_\sigma$ is globally stable. This completes the proof of {prf:ref}`p-osas`. ◻ ``` In the next section we will see how these ideas can simplify optimality analysis. (ss-ezrev)= #### Application: Epstein--Zin RDPs In this section, we show how the preceding optimality results and the notion of topological conjugacy can be deployed to analyze the Epstein--Zin RDP from {prf:ref}`eg-ezrdp`. Recall that the aggregator in {prf:ref}`eg-ezrdp` is $$ B(x, a, v) = \left\{ r(x, a) + \beta \left( \sum_{x'} v(x')^\gamma P(x, a, x') \right)^{\alpha/\gamma} \right\}^{1/\alpha}. $$ Let $V = (0, \infty)^\Xsf$. We assume that $r \gg 0$ and take a nonempty feasible correspondence $\Gamma$ as given. {prf:ref}`ex-ezisrpd` confirmed that $\rR \coloneq (\Gamma, V, B)$ is an RDP. We will call the stochastic kernel $P$ **irreducible** if $P(x, \sigma(x), x')$ is irreducible for all $\sigma \in \Sigma$. Below we establish stability of $\rR$ under irreducibility. ```{prf:proposition} :label: p-stabez If $P$ is irreducible, then $\rR$ is globally stable. ``` To prove {prf:ref}`p-stabez`, we set up a simpler and more tractable model. Our first step is to introduce another RDP by setting $$ \hat B(x, a, v) = B\left(x, a, v^{1/\gamma}\right)^{\gamma}. $$ With $\rR = (\Gamma, V, B)$ as above, we set $\hat \rR \coloneq (\Gamma, V, \hat B)$.
Notice that $\hat B$ can also be expressed as $$ \hat B(x,a, v) = \left\{ r(x, a) + \beta \left( \sum_{x'} v(x') P(x, a, x') \right)^{1/\theta} \right\}^{\theta}, $$ (eq-bezt) where $\theta \coloneq \gamma/\alpha$. The value of introducing $\hat \rR$ comes from the fact that $\hat \rR$ is easier to work with than $\rR$ (just as the modified Epstein--Zin Koopmans operator $\hat K$ defined in {ref}`sss-psez` turned out to be easier to work with than the original Epstein--Zin Koopmans operator $K$ introduced in {ref}`sss-pkoez`). ```{exercise} :label: ex-ezai Prove that $\rR$ and $\hat \rR$ are topologically conjugate RDPs (see {ref}`ss-iso_rpds`). ``` ```{solution} ex-ezai The map $\phi(m)=m^\gamma$ is a homeomorphism from $(0, \infty)$ to itself and {eq}`eq-bbhat` holds under $\phi$. This implies the claim in the exercise. ``` Now we investigate the properties of the simpler RDP $\hat \rR$. ```{prf:lemma} :label: l-ezai2 If $P$ is irreducible, then $\hat \rR$ is a globally stable RDP. ``` ```{prf:proof} In view of {eq}`eq-bezt`, each policy operator $\hat T_\sigma$ associated with $\hat \rR$ takes the form $$ (\hat T_\sigma \,v)(x) = \left\{ r(x, \sigma(x)) + \beta \left( \sum_{x'} v(x') P(x, \sigma(x), x') \right)^{1/\theta} \right\}^{\theta}. $$ (eq-ezbhpo) Each such $\hat T_\sigma$ is a special case of $\hat K$ defined by $\hat K v = \left\{ h + \beta (P v)^{1/\theta} \right\}^\theta$ (see {eq}`eq-kppu`). We saw in {ref}`sss-psez` that this operator is globally stable under the stated assumptions. Hence $\hat \rR$ is a globally stable RDP. ◻ ``` ```{prf:proof} *Proof of {prf:ref}`p-stabez`.* {prf:ref}`ex-ezai` and {prf:ref}`p-osas` together imply that $\rR$ is globally stable if and only if $\hat \rR$ is globally stable. The claim that $\rR$ is globally stable now follows from {prf:ref}`l-ezai2`.
◻ ``` (s-torpds)= ## Types of RDPs In {ref}`s-rdp_theory` we showed that well-posed RDPs have strong optimality properties whenever they are globally stable or bounded, and that VFI and OPI converge whenever they are globally stable. But what conditions are sufficient for these properties? We start with a relatively strict condition based on contractivity and then progress to models that fail to be contractive. (sss-crdps)= ### Contracting RDPs In this section, we study RDPs with strong contraction properties. Many traditional dynamic programs fit into this framework. (sss-crdpdef)= #### Definition and Examples Let $\rR = (\Gamma, V, B)$ be an RDP with state space $\Xsf$, action space $\Asf$, and feasible state-action pair set $\Gsf$. We call $\rR$ **contracting** if there exists a $\beta < 1$ such that $$ | B(x, a, v) - B(x, a, w)| \leq \beta \| v - w \|_\infty \quad \text{for all } (x, a) \in \Gsf \text{ and } v, w \in V. $$ (eq-crdp) In line with the terminology for contraction maps, we call $\beta$ the **modulus of contraction** for $\rR$ when {eq}`eq-crdp` holds. ```{prf:example} :label: eg-osrdp_contracting The optimal stopping RDP from {prf:ref}`eg-os_rdp` is contracting with modulus $\beta$, since, for $B$ in {eq}`eq-opt_stop_agg_rdp`, an application of the triangle inequality gives $$ |B(x, a, v) - B(x, a, w)| = (1-a) \beta \left| \sum_{x'} [v(x') - w(x')] P(x, x') \right| \leq \beta \| v - w \|_\infty. $$ ``` ```{exercise} :label: ex-rdps-auto-7 Show that any MDP is a contracting RDP. ``` ```{exercise} :label: ex-ecrvc Show that every contracting RDP is value-continuous. ``` ```{prf:proposition} :label: p-gsdpc If $\rR$ is contracting with modulus $\beta$, then $T$ and $\{T_\sigma\}_{\sigma \in \Sigma}$ are all contractions of modulus $\beta$ on $V$ under the norm $\| \cdot \|_\infty$. ``` ```{prf:proof} Let $\rR = (\Gamma, V, B)$ be contracting with modulus $\beta$. Fix $\sigma \in \Sigma$ and let $v$ and $w$ be elements of $V$. 
By {eq}`eq-crdp` we have $$ |(T_\sigma \, v)(x) - (T_\sigma \, w)(x)| = | B(x, \sigma(x), v) - B(x, \sigma(x), w)| \leq \beta \| v - w \|_\infty, $$ (eq-tscrdp) for every $x \in \Xsf$. Maximizing over $x$ proves that $T_\sigma$ is a contraction of modulus $\beta$ with respect to the supremum norm. Contractivity of $T$ now follows from {prf:ref}`l-supbc`. ◻ ``` The following corollary to {prf:ref}`p-gsdpc` is immediate from Banach's contraction mapping theorem. ```{prf:corollary} :label: c-rdpcis If $\rR$ is contracting and $V$ is closed in $\RR^\Xsf$, then $\rR$ is globally stable and hence all of the optimality results in {prf:ref}`t-fbk_rpd` apply. ``` (sss-ebs)= #### Error Bounds {prf:ref}`c-rdpcis` tells us that contracting RDPs are globally stable and, as a result, the sequence of functions in $V$ generated by VFI ({prf:ref}`algo-opi_rdps` with $m=1$) converges to $v^*$. However this result is asymptotic and conditions on $v_0 = v_\sigma$ for some $\sigma \in \Sigma$. We can improve this result in the current setting by leveraging the contraction property: ```{prf:proposition} :label: p-mate Let $(\Gamma, V, B)$ be a contracting RDP with modulus of contraction $\beta$ and Bellman operator $T$. Fix $v \in V$ and let $v_k = T^k v$. If $\sigma$ is $v_k$-greedy, then $$ \|v^* - v_{\sigma} \|_{\infty} \leq \frac{2 \beta}{1-\beta} \|v_k - v_{k-1} \|_{\infty} \quad \text{for all } k \in \NN. $$ (eq-dpebr) ``` Since the VFI algorithm terminates when $\|v_k - v_{k-1} \|_{\infty}$ falls below a given tolerance, the result in {eq}`eq-dpebr` directly provides a quantitative bound on the performance of the policy returned by VFI. ```{prf:proof} *Proof of {prf:ref}`p-mate`.* Let $(\Gamma, V, B)$ and $v$ be as stated and let $v^*$ be the value function. Note that $$ \|v^* - v_{\sigma} \|_{\infty} \leq \|v^* - v_k \|_{\infty} + \|v_k - v_{\sigma} \|_{\infty}. 
$$ (eq-safv) To bound the first term on the right-hand side of ({eq}`eq-safv`), we use the fact that $v^*$ is a fixed point of $T$, obtaining $$ \|v^* - v_k \|_{\infty} \leq \|v^* - Tv_k \|_{\infty} + \| Tv_k - v_k \|_{\infty} \leq \beta \|v^* - v_k \|_{\infty} + \beta \|v_k - v_{k-1}\|_{\infty}. $$ Hence $$ \|v^* - v_k \|_{\infty} \leq \frac{\beta}{1-\beta} \|v_k - v_{k-1} \|_{\infty}. $$ (eq-bovo) Now consider the second term on the right-hand side of ({eq}`eq-safv`). Since $\sigma$ is $v_k$-greedy, we have $Tv_k = T_{\sigma} v_k$, and $$ \|v_k - v_{\sigma} \|_{\infty} \leq \|v_k - Tv_k \|_{\infty} + \|Tv_k - v_{\sigma}\|_{\infty} = \|Tv_{k-1} - Tv_k\|_{\infty} + \|T_{\sigma} \, v_k - T_{\sigma} \, v_{\sigma}\|_{\infty}. $$ $$ \fore \|v_k - v_{\sigma} \|_{\infty} \leq \beta \|v_{k-1} - v_k\|_{\infty} + \beta \| v_k - v_{\sigma}\|_{\infty}. $$ $$ \fore \|v_k - v_{\sigma} \|_{\infty} \leq \frac{\beta}{1-\beta} \|v_k - v_{k-1} \|_{\infty}. $$ (eq-bovt) Together, ({eq}`eq-safv`), ({eq}`eq-bovo`), and ({eq}`eq-bovt`) give us ({eq}`eq-dpebr`). ◻ ``` (sss-abtc)= #### A Blackwell-Type Condition Next we state a useful condition for contractivity that is related to Blackwell's sufficient condition discussed in {ref}`sss-blackwell`. We say that RDP $(\Gamma, V, B)$ satisfies **Blackwell's condition** if $v \in V$ implies $v + \lambda \coloneq v + \lambda \1$ is in $V$ for every $\lambda \geq 0$ and, in addition, there exists a $\beta \in [0, 1)$ such that $$ B(x, a, v + \lambda) \leq B(x, a, v) + \beta \lambda \qquad \text{for all } (x, a) \in \Gsf \text{, } v \in V \text{ and } \lambda \in \RR_+. $$ ```{exercise} :label: ex-fbico Prove the following: If $\rR$ satisfies Blackwell's condition, then $\rR$ is contracting with modulus $\beta$. ``` ```{solution} ex-fbico Let $\rR = (\Gamma, V, B)$ satisfy Blackwell's condition. Fix $v, w \in V$ and $(x, a) \in \Gsf$. Observe that $v = w + v - w \leq w + \| v - w\|_\infty$. 
By monotonicity of $B$ and Blackwell's condition, we have $$ B(x, a, v) \leq B(x, a, w + \| v - w\|_\infty) \leq B(x, a, w) + \beta \| v - w\|_\infty. $$ As a result, $B(x, a, v) - B(x, a, w) \leq \beta \| v - w\|_\infty$. Reversing the roles of $v$ and $w$ yields $$ |B(x, a, v) - B(x, a, w)| \leq \beta \| v - w\|_\infty. $$ Since $\beta < 1$, the RDP $\rR$ is contracting. ``` ```{exercise} :label: ex-sddc Prove that the RDP for the state-dependent discounting model in {prf:ref}`eg-sdd` is contracting on $V = \RR^\Xsf$ whenever there exists a $b < 1$ with $\beta(x,a, x') \leq b$ for all $(x,a) \in \Gsf$ and $x' \in \Xsf$. ``` ```{exercise} :label: ex-rdps-auto-8 Prove that the discrete optimal savings model from {ref}`sss-osli` satisfies Blackwell's condition. ``` (sss-quant_dp)= #### Application: Job Search with Quantile Preferences Consider the job search problem with correlated wage draws first investigated in {ref}`ss-jsms`. With finite wage offer set $\Wsf$, wage offer process generated by $P \in \mopw$ and $\beta \in (0,1)$, we can frame this as an RDP $(\Gamma, V, B)$ with $V = \RR^\Wsf$, $\Gamma(w) = \{0, 1\}$ for $w \in \Wsf$ and $$ B(w, a, v) \coloneq a \frac{w}{1-\beta} + (1-a) [ c + \beta (Pv)(w) ]. $$ Since the model just described is an optimal stopping problem, {prf:ref}`eg-osrdp_contracting` tells us that $(\Gamma, V, B)$ is contracting. Now consider the following modification, where $\Gamma$ and $V$ are as before but $B$ is replaced by $$ B_\tau(w, a, v) \coloneq a \frac{w}{1-\beta} + (1-a) [ c + \beta (R_\tau v)(w) ], $$ where $\tau \in [0,1]$ and $R_\tau$ is the quantile certainty equivalent operator described in {prf:ref}`ex-quant_ra`. ```{exercise} :label: ex-rdps-auto-9 Prove that $(\Gamma, V, B_\tau)$ is a contracting RDP. ``` Figure {numref}`f-quantile_js` shows the reservation wage for a range of $\tau$ values, computed using OPI (and taking the smallest $w \in \Wsf$ such that $\sigma^*(w) = 1$).
The stationary distribution of $P$ is also shown in the figure, rotated 90 degrees. The parameters and the code for applying $T_\sigma$ and evaluating greedy functions are shown in {numref}`list-quantile_js`. That listing includes the quantile operator $R_\tau$, which is implemented in {numref}`list-quantile_function`. (Quantiles of discrete random variables can also be computed using functionality contained in `Distributions.jl`.) ```{figure} ../figures/quantile_js.pdf :name: f-quantile_js The reservation wage as a function of $\tau$ ``` The main message of Figure {numref}`f-quantile_js` is that the reservation wage rises in $\tau$. In essence, higher $\tau$ focuses the attention of the worker on the right tail of the distribution of continuation values. This encourages the worker to take on more risk, which leads to a higher reservation wage (i.e., reluctance to accept a given current offer).

```{code-block} julia
:name: list-quantile_function
:caption: Conditional quantile operator (`quantile_function.jl`)
:linenos:

"Compute the τ-th quantile of v(X) when X ∼ ϕ."
function quantile(τ, v, ϕ)
    # Sort v and reorder ϕ accordingly
    indices = sortperm(v)
    v_sorted = v[indices]
    ϕ_sorted = ϕ[indices]
    for (i, v_value) in enumerate(v_sorted)
        p = sum(ϕ_sorted[1:i])  # sum all ϕ[j] s.t. v[j] ≤ v_value
        if p ≥ τ                # exit and return v_value if prob ≥ τ
            return v_value
        end
    end
    # Fallback in case rounding leaves the cumulative sum just below τ
    return v_sorted[end]
end

# Assumption: R applies `quantile` to each row of P. It is called in
# `quantile_js.jl`; this definition is a plausible sketch of that helper.
"For each i, compute the τ-th quantile of v(Y) when Y ∼ P[i, :]."
function R(τ, v, P)
    return [quantile(τ, v, P[i, :]) for i in eachindex(v)]
end
```

```{code-block} julia
:name: list-quantile_js
:caption: Job search with quantile operator (`quantile_js.jl`)
:linenos:

using QuantEcon
include("quantile_function.jl")

"Creates an instance of the job search model."
function create_markov_js_model(;
        n=100,       # wage grid size
        ρ=0.9,       # wage persistence
        ν=0.2,       # wage volatility
        β=0.98,      # discount factor
        c=1.0,       # unemployment compensation
        τ=0.5        # quantile parameter
    )
    mc = tauchen(n, ρ, ν)
    w_vals, P = exp.(mc.state_values), mc.p
    return (; n, w_vals, P, β, c, τ)
end

"""
The policy operator

    (T_σ v)(w) = σ(w) (w / (1-β)) + (1 - σ(w))(c + β (R_τ v)(w))

"""
function T_σ(v, σ, model)
    (; n, w_vals, P, β, c, τ) = model
    h = c .+ β * R(τ, v, P)      # continuation values
    e = w_vals ./ (1 - β)        # stopping values
    return σ .* e + (1 .- σ) .* h
end

"Get a v-greedy policy."
function get_greedy(v, model)
    (; n, w_vals, P, β, c, τ) = model
    σ = w_vals / (1 - β) .≥ c .+ β * R(τ, v, P)
    return σ
end
```

(sss-aod)= #### Application: Optimal Default In this section, we consider a small open economy that borrows in international financial markets in order to smooth consumption and has the option to default. We show that the model is a contracting RDP. Income $(Y_t)_{t \geq 0}$ is exogenous and $Q$-Markov on finite set $\Ysf$. A representative household faces budget constraint $$ C_t = Y_t + b_t - q b_{t+1} \qquad (t \geq 0), $$ where $C_t$ is consumption at time $t$, $q$ is the price at time $t$ of a risk-free claim on one unit of time $t+1$ consumption, and $b_t$ measures foreign lending. The price $q$ is determined outside the model, say in international markets. Purchasing a claim on $b_{t+1}$ units of time $t+1$ consumption costs $q b_{t+1}$. Purchasing a bond with *negative* face value $b_{t+1}$ pays $q b_{t+1}$ in current consumption goods and promises to deliver $b_{t+1}$ next period. Bond trading is managed by a benevolent government that wants to maximize household utility. Households discount future utility at rate $\beta \in (0,1)$ and current consumption $C_t$ generates current utility $u(C_t)$. The government faces borrowing constraint $b_t \geq - m$ where $m \geq 0$. The government maximizes expected discounted utility for the households. The government can default on foreign loans.
In this case, output available for consumption drops from $Y_t$ to $h(Y_t)$, where $h$ is a function satisfying $h(y) < y$ for all $y$. After a country defaults, it temporarily loses access to the international credit market. At the end of each period during which the country is in default, it regains access to international credit markets with probability $\theta \in (0,1)$. With probability $1-\theta$ it remains in financial autarky. When a country regains access to foreign borrowing, its debt is reset to zero. We can cast this as an RDP by considering the value of each state and action. We set the state space $\Xsf$ to be the set of all $(y, b, d)$ in $\Ysf \times \Bsf \times \{0, 1\}$, where $\Bsf$ is a finite subset of $[-m, \infty)$ indicating possible choices for bond holdings $b_t$ and $d$ is a binary variable indicating whether the country is in default ($d=0$ means not in default and $d=1$ means in default). The value space $V$ is all of $\RR^\Xsf$. The action space is $(b_a, d_a) \in \Bsf \times \{0,1\}$ indicating choices for bond holdings and default. The feasible correspondence specifies feasible $(b_a,d_a)$ at given state $(y, b, d)$ and is given by $$ \Gamma(y, b, d) = \begin{cases} \Bsf \times \{0, 1\} & \text{ if } d = 0 \text{ and } \\ \{0 \} \times \{1\} & \text{ if } d = 1. \end{cases} $$ In other words, if $d=0$, so the country is not in default, the government can choose any $b_a \in \Bsf$ and also any $d_a \in \{0, 1\}$ (i.e., default or not default). If $d=1$, however, the government has no choices. We represent this situation by $b_a =0$ and $d_a =1$. The value aggregator takes the form $$ B((y, b, d), (b_a, d_a), v) = \text{value in state } (y, b, d) \text{ under action } (b_a, d_a). $$ To specify it we decompose the problem across cases for $d$ and $d_a$. First consider the case where $d=0$ (not currently in default) and $d_a=0$ (the government chooses not to default). 
For this case $y + b - q b_a$ is current consumption, so we set $$ B((y, b, 0), (b_a, 0), v) = u(y + b - q b_a) + \beta \sum_{y'} v(y', b_a, 0) Q(y, y'). $$ (eq-arnid) Now consider the case where $d=0$ and $d_a=1$, so the government chooses to default. Then current consumption is $h(y)$ and we set $$ B((y, b, 0), (b_a, 1), v) = u(h(y)) + \beta \left[ \theta \sum_{y'} v(y', 0, 0) Q(y, y') + (1-\theta) \sum_{y'} v(y', 0, 1) Q(y, y') \right]. $$ (eq-arid) The term $\sum_{y'} v(y', 0, 0) Q(y, y')$ is the expected value next period when the country is readmitted to international financial markets (with $b'=0$ and $d'=0$), whereas the term $\sum_{y'} v(y', 0, 1) Q(y, y')$ is the expected value next period when default continues (with $b'=0$ and $d'=1$). Since $B((y, b, 1), (b_a, 0), v)$ is not feasible (a defaulted country cannot itself directly choose to reenter financial markets), the only other possibility is $B((y, b, 1), (b_a, 1), v)$, which is the expected value when the country remains in default. But this is the same as $B((y, b, 0), (b_a, 1), v)$ specified earlier: The value for a country that stays in default is the same as that for a country that newly enters default. ```{exercise} :label: ex-rdps-auto-10 By working through cases {eq}`eq-arnid`--{eq}`eq-arid` for the value aggregator $B$, show that the model just described is a contracting RDP. ``` (ss-ecrdps)= ### Eventually Contracting RDPs (s-tvdr)= Many RDPs are not contracting. There is no single method for handling all types of non-contractive RDPs, so we introduce alternative techniques over the next few sections. The first such technique, treated in this section, handles RDPs that contract "eventually," even though they may fail to contract in one step. We show that these eventually contracting RDPs are globally stable, so all of the fundamental optimality results apply. One application for these results is the MDP model with state-dependent discounting treated in {prf:ref}`c-state_dep`.
This section contains a proof of the main optimality result in that chapter ({prf:ref}`p-sddmdpec0`). (definition-and-properties)= #### Definition and Properties Let $\rR = (\Gamma, V, B)$ be an RDP with policy set $\Sigma$. We call $\rR$ **eventually contracting** if there is a map $L$ from $\Gsf \times \Xsf$ to $\RR_+$ such that $$ |B(x, a, v)-B(x, a, w)| \leq \sum_{x'} |v(x') - w(x')| L(x, a, x'), $$ (eq-ecrdp) for all $(x, a) \in \Gsf$ and all $v, w \in V$, and moreover, $$ \sigma \in \Sigma \implies \rho(L_\sigma) < 1 \quad \text{where} \quad L_\sigma(x, x') \coloneq L(x, \sigma(x), x'). $$ ```{prf:proposition} :label: p-ecrdps Let $\rR = (\Gamma, V, B)$ be an RDP. If $\rR$ is eventually contracting and $V$ is closed in $\RR^\Xsf$, then $\rR$ is also globally stable and hence all of the optimality and convergence results in {prf:ref}`t-fbk_rpd` apply. ``` ```{prf:proof} Let $\rR$ be as stated and fix $\sigma \in \Sigma$. Let $T_\sigma$ be the associated policy operator and let $L_\sigma$ be the linear operator defined after {eq}`eq-ecrdp`. For fixed $v, w \in V$ we have $$ \begin{aligned} |(T_\sigma \, v)(x) - (T_\sigma \, w)(x)| & = \left| B(x, \sigma(x), v) - B(x, \sigma(x), w) \right| \\ & \leq \sum_{x'} \left| v(x') - w(x') \right| \, L_\sigma(x, x'). \end{aligned} $$ Since $L_\sigma \geq 0$ and $\rho(L_\sigma) < 1$, {prf:ref}`p-ecrdps0` implies that $T_\sigma$ is eventually contracting on $V$. Since $V$ is closed in $\RR^\Xsf$, it follows that $T_\sigma$ is globally stable ({prf:ref}`t-bfpt2`). Hence $\rR$ is globally stable, as claimed. ◻ ``` ```{exercise} :label: ex-rdps-auto-11 In {ref}`ss-entex` we studied an optimal exit problem for a firm. We can modify this problem to handle stochastic interest rates by introducing the RDP $\rR = (\Gamma, V, B)$ on state space $\Xsf$ with $\Gamma(x) = \{0,1\}$, $V = \RR^\Xsf$ and $$ B(x, a, v) = a s + (1-a) \left[ \pi(x) + \beta(x) \sum_{x'} v(x') Q(x, x') \right], $$ for some $\beta \in \RR^\Xsf_+$.
(We suppose that state-dependence of $\beta$ is generated by state-dependent interest rates.) State the Bellman equation for this problem. Prove that $\rR$ is globally stable whenever there exists an $L \in \lopx$ such that $\rho(L) < 1$ and $\beta(x) Q(x, x') \leq L(x, x')$ for all $x, x' \in \Xsf$. ``` (sss-mdpsdd)= #### Optimality for MDPs with State-Dependent Discounting With {prf:ref}`p-ecrdps` in hand, we can complete the proof of {prf:ref}`p-sddmdpec0`, which pertained to optimality properties for MDPs with state-dependent discounting. Let $(\Gamma, \beta, r, P)$ be an MDP with state-dependent discounting, as defined in {ref}`sss-sdepgp`. The state space is $\Xsf$ and the action space is $\Asf$. The function $\beta$ maps $\Gsf \times \Xsf$ to $\RR_+$. Set $$ L(x, a, x') \coloneq \beta(x, a, x') P(x, a, x') \quad \text{and} \quad L_\sigma(x, x') \coloneq L(x, \sigma(x), x'), $$ for all $(x, a, x') \in \Gsf \times \Xsf$ and $\sigma \in \Sigma$. Assume the conditions of {prf:ref}`p-sddmdpec0`, so that $\rho(L_\sigma) < 1$ for all $\sigma \in \Sigma$. If we set $$ B(x, a, v) \coloneq r(x, a) + \sum_{x'} v(x') \beta(x, a, x') P(x, a, x'), $$ (eq-bforfmdpsd_r) and take $V$ to be all of $\RR^\Xsf$, then $\rR \coloneq (\Gamma, V, B)$ forms an RDP, as discussed in {prf:ref}`ex-sddc`. We claim that $\rR$ is an eventually contracting RDP. To see this, fix $v, w \in V$ and $(x, a) \in \Gsf$. Applying the definition {eq}`eq-bforfmdpsd_r` and the triangle inequality, we have $$ \begin{aligned} |B(x, a, v)-B(x, a, w)| & \leq \left| \sum_{x'} [v(x') - w(x')] \beta(x, a, x') P(x, a, x') \right| \\ & \leq \sum_{x'} |v(x') - w(x')| L(x, a, x'), \end{aligned} $$ Under the stated assumptions, for each $\sigma \in \Sigma$, the operator $L_\sigma(x, x') = L(x, \sigma(x), x')$ satisfies $\rho(L_\sigma) < 1$. Hence $\rR$ is eventually contracting, as claimed. Since $V = \RR^\Xsf$ is closed, {prf:ref}`p-ecrdps` implies that $\rR$ is a globally stable RDP. 
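The gap between contracting in one step and contracting eventually can be seen in a small numerical example. The sketch below assumes hypothetical primitives (two states, two actions, and a discount factor that exceeds one whenever the current state is the first one); none of the numbers are taken from the text. For each policy it computes $\rho(L_\sigma)$, the sup-norm operator norm of $L_\sigma$, and the norm of a high power of $L_\sigma$.

```julia
using LinearAlgebra

n, m = 2, 2                            # number of states and actions

# Hypothetical stochastic kernel P[x, a, y] (illustrative numbers only).
P = zeros(n, m, n)
P[1, 1, :] = [0.6, 0.4]; P[1, 2, :] = [0.2, 0.8]
P[2, 1, :] = [0.5, 0.5]; P[2, 2, :] = [0.1, 0.9]

# Hypothetical discount factors β[x, a, y]: above one in state 1.
β = zeros(n, m, n)
β[1, :, :] .= 1.05
β[2, :, :] .= 0.70

L = β .* P                             # L(x, a, y) = β(x, a, y) P(x, a, y)

# L_σ(x, y) = L(x, σ(x), y), where σ[x] is the action at state x
L_σ(σ) = [L[x, σ[x], y] for x in 1:n, y in 1:n]

# Enumerate all policies.
Σ = vec([collect(σ) for σ in Iterators.product(fill(1:m, n)...)])

spectral_radius(A) = maximum(abs, eigvals(A))

radii     = [spectral_radius(L_σ(σ)) for σ in Σ]
one_step  = [opnorm(L_σ(σ), Inf) for σ in Σ]       # max row sum of L_σ
many_step = [opnorm(L_σ(σ)^50, Inf) for σ in Σ]    # norm after 50 steps
```

In this example every entry of `one_step` exceeds one, so no policy operator is a sup-norm contraction in a single step, while every entry of `radii` is below one and the norms in `many_step` are small. This is the situation that {prf:ref}`p-ecrdps` is designed to handle.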
The claims in {prf:ref}`p-sddmdpec0` now follow from {prf:ref}`t-fbk_rpd`. (ss-conconrdps)= ### Convex and Concave RDPs {prf:ref}`t-fbk_rpd` shows that RDPs have excellent optimality properties when all policy operators are globally stable on value space. So far we have looked at conditions for stability based on contractions ({ref}`sss-crdps`) and eventual contractions ({ref}`ss-ecrdps`). But sometimes both of these approaches fail and we need alternative conditions. In this section, we explore alternative conditions based on Du's theorem. Du's theorem is well suited to the task of studying stability of policy operators, since it leverages the fact that all policy operators are order-preserving. #### Definitions and Optimality Let $\rR = (\Gamma, V, B)$ be an RDP with $V = [v_1, v_2]$ for some $v_1 \leq v_2$ in $\RR^\Xsf$. We call $\rR$ **convex** if 1. for all $(x, a) \in \Gsf$, $\lambda \in [0, 1]$ and $v, w$ in $V$, we have $$ B(x, a, \lambda v + (1-\lambda) w) \leq \lambda B(x, a, v ) + (1-\lambda) B(x, a, w) \quad \text{and}, $$ (eq-bconvx) 2. there exists a $\delta > 0$ such that $$ B(x, a, v_2) \leq v_2(x) - \delta[v_2(x) - v_1(x)] \text{ for all } (x, a) \in \Gsf. $$ (eq-bstrmd_d) Analogous to the convex case, we call $\rR$ **concave** if 1. for all $(x, a) \in \Gsf$, $\lambda \in [0, 1]$ and $v, w$ in $V$, we have $$ B(x, a, \lambda v + (1-\lambda) w) \geq \lambda B(x, a, v ) + (1-\lambda) B(x, a, w) \quad \text{and}, $$ (eq-bconvx2) 2. there exists a $\delta > 0$ such that $$ B(x, a, v_1) \geq v_1(x) + \delta [v_2(x) - v_1(x)] \text{ for all } (x, a) \in \Gsf. $$ (eq-bstrmd2_d) In both of these definitions, the second condition is rather complex. The next exercise provides simpler sufficient conditions. ```{exercise} :label: ex-qcsb Prove that {eq}`eq-bstrmd_d` holds whenever $$ B(x, a, v_2) < v_2(x) \text{ for all } (x, a) \in \Gsf.
$$ (eq-bstrmd_0) Similarly, prove that {eq}`eq-bstrmd2_d` holds whenever $$ B(x, a, v_1) > v_1(x) \text{ for all } (x, a) \in \Gsf. $$ (eq-bstrmd2) ``` ```{solution} ex-qcsb We discuss the first case, regarding {eq}`eq-bstrmd_d`. When {eq}`eq-bstrmd_0` holds, by finiteness of $\Gsf$, we can take an $\epsilon > 0$ such that $$ B(x, a, v_2) \leq v_2(x) - \epsilon \text{ for all } (x, a) \in \Gsf. $$ We then have $$ \epsilon \leq v_2(x) - B(x, a, v_2) \leq v_2(x) - B(x, a, v_1) \leq v_2(x) - v_1(x) $$ for all $x$, so $0 < \epsilon \leq \|v_2 - v_1\|_\infty$. Set $\delta \coloneq \epsilon / \| v_2 - v_1 \|_\infty$. From {eq}`eq-bstrmd_0` we get $$ B(x, a, v_2) \leq v_2(x) - \delta \| v_2 - v_1 \|_\infty \leq v_2(x) - \delta [ v_2(x) - v_1(x)] $$ for arbitrary $(x,a) \in \Gsf$. Hence {eq}`eq-bstrmd_d` holds. ``` Both convexity and concavity yield stability, as the next proposition shows. ```{div} :name: p-convmx2 ``` ```{prf:proposition} :label: p-convmx If $\rR$ is either convex or concave, then $\rR$ is globally stable. ``` ```{prf:proof} We begin with the convex case. Fix $\sigma \in \Sigma$. By the monotonicity property of RDPs, $T_\sigma$ is an order-preserving self-map on $V$. Since {eq}`eq-bconvx` holds, $T_\sigma$ is also a convex operator on $V$. Moreover, $T_\sigma \, v_1 \geq v_1$ because $T_\sigma \colon V \to V$ and, by {eq}`eq-bstrmd_d`, $T_\sigma \, v_2 \leq v_2 - \delta (v_2 - v_1)$. Hence Du's theorem applies and $T_\sigma$ is globally stable on $V$. This shows that $\rR$ is a globally stable RDP. The proof of the concave case is analogous (using Du's theorem applied to order-preserving concave operators). ◻ ``` It follows from {prf:ref}`p-convmx2` that, for convex and concave RDPs, all of the optimality and convergence results in {prf:ref}`t-fbk_rpd` apply. #### Application to MDPs {prf:ref}`p-convmx2` can be applied to establish optimality properties of regular MDPs.
This exercise is redundant in the sense that optimality properties of regular MDPs have already been established using other means. At the same time, some of the arguments developed here will be helpful when we face more sophisticated problems. To sketch the argument, let $\rR = (\Gamma, V, B)$ be an RDP generated by an ordinary MDP $(\Gamma, \beta, r, P)$, as discussed in {prf:ref}`eg-rdp_fmdp`. In particular, $V =\RR^\Xsf$, and $B(x,a, v) = r(x, a) + \beta \sum_{x'} v(x') P(x, a, x')$. We set $r_1 \coloneq \min r$ and $r_2 \coloneq \max r$. Then we fix $\epsilon > 0$ and define $\hat V$ via $$ \hat V \coloneq [v_1, v_2] \quad \text{ where } \quad v_1 \coloneq \frac{r_1 - \epsilon}{1-\beta} \; \text{ and } \; v_2 \coloneq \frac{r_2 + \epsilon}{1-\beta}. $$ (eq-mvbs) (The functions $v_1$ and $v_2$ are constant.) We claim that the RDP $\hat \rR \coloneq (\Gamma, \hat V, B)$ is both convex *and* concave. ```{exercise} :label: ex-rdps-auto-12 Prove that {eq}`eq-bstrmd_0` and {eq}`eq-bstrmd2` both hold for $\hat \rR$. ``` ```{solution} ex-rdps-auto-12 We prove {eq}`eq-bstrmd2` and leave {eq}`eq-bstrmd_0` to the reader. For given $(x, a) \in \Gsf$, $$ B(x, a, v_1) = r(x,a) + \beta \sum_{x'} v_1(x')P(x,a,x') \geq r_1 + \beta \frac{r_1 - \epsilon}{1-\beta} = \frac{r_1 - \beta r_1 + \beta r_1 - \beta \epsilon}{1-\beta} = v_1 + \epsilon. $$ Hence {eq}`eq-bstrmd2` is confirmed. ``` ```{exercise} :label: ex-rdps-auto-13 Complete the proof that $\hat \rR$ is both concave and convex. ``` ## Further Applications In this section, we consider some applications of the optimality results in {ref}`s-torpds`. (sss-arsk)= ### Risk-Sensitive RDPs In {ref}`ss-rspref` we introduced risk-sensitive preferences and discussed a recursive utility problem. Now we embed risk-sensitive preferences into a dynamic program and apply the preceding optimality results to compute optimal policies.
#### Optimality Results Consider the risk-sensitive preference RDP in {prf:ref}`eg-rsrdp`, with state space $\Xsf$ and action space $\Asf$. Let $V = \RR^\Xsf$. For $(x,a) \in \Gsf$ and $v \in V$, we can express the aggregator as $$ B(x, a, v) \coloneq r(x, a) + \beta (R_\theta^a \, v)(x), $$ where $\theta$ is a nonzero constant and $$ (R_\theta^a \, v)(x) \coloneq \frac{1}{\theta} \ln \left\{ \sum_{x'} \exp(\theta v(x')) P(x, a,x') \right\}. $$ Notice that, for each fixed $a \in \Gamma(x)$, the operator $R_\theta^a$ is an entropic certainty equivalent operator on $V$ (see {prf:ref}`eg-ffln`). ```{prf:proposition} :label: p-rsmdp If $\beta < 1$, then $(\Gamma, V, B)$ is contracting. ``` ```{prf:proof} Fix $\beta < 1$. We show that $(\Gamma, V, B)$ obeys Blackwell's condition ({ref}`sss-abtc`). To this end, fix $v \in V$, $(x,a) \in \Gsf$, and $\lambda \geq 0$. Since $R_\theta^a$ is constant-subadditive ({prf:ref}`ex-ersa`), we have $$ B(x, a, v + \lambda) = r(x, a) + \beta [R_\theta^a (v + \lambda)](x) \leq r(x, a) + \beta (R_\theta^a v)(x) + \beta \lambda. $$ The right-hand side equals $B(x, a, v) + \beta \lambda$, so Blackwell's condition holds. The claim in {prf:ref}`p-rsmdp` now follows from {prf:ref}`ex-fbico`. ◻ ``` The next exercise pertains to quantile preferences rather than risk-sensitive preferences, but the result can be obtained via a relatively straightforward modification of the proof of {prf:ref}`p-rsmdp`. ```{exercise} :label: ex-rdps-auto-14 Let $\rR \coloneq (\Gamma, V, B)$ be an RDP with $V = \RR^\Xsf$ and fix $\tau \in [0,1]$. Let $B(x, a, v) = r(x, a) + \beta (R_\tau^a \, v)(x)$ where, for each $a \in \Gamma(x)$, the map $R_\tau^a$ is given by $$ (R_\tau^a \, v)(x) = \min \left\{ y \in \RR \;\Big|\; \sum_{x'} \1\{v(x') \leq y\} P(x, a, x') \geq \tau \right\} \qquad (v \in V, \; x \in \Xsf). $$ Prove that $\rR$ is globally stable whenever $\beta < 1$. 
``` ```{solution} ex-rdps-auto-14 For each fixed $a \in \Gamma(x)$, the map $R_\tau^a$ is a version of the quantile certainty equivalent operator defined in {prf:ref}`ex-quant_ra`. With this observation, we can replicate the proof of {prf:ref}`p-rsmdp`, after replacing $R_\theta^a$ with $R_\tau^a$. The latter is also constant-subadditive, by {prf:ref}`ex-qrasa`. ``` #### Risk-Sensitive Job Search Let's consider a job search problem where future wage outcomes are evaluated via risk-sensitive expectations. The associated Bellman operator is $$ (Tv)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \frac{\beta}{\theta} \ln \left[ \sum_{w'} \, \exp(\theta v(w')) P(w, w') \right] \right\} \qquad (w \in \Wsf). $$ Here $\theta$ is a nonzero parameter and other details are as in {ref}`ss-jsms`. We can represent the problem as an RDP with state space $\Wsf$, action space $\Asf = \{0, 1\}$, feasible correspondence $\Gamma(w) = \Asf$, value space $V \coloneq \RR^\Wsf$, and value aggregator $$ B(w, a, v) = a \frac{w}{1-\beta} + (1-a) \left\{ c + \frac{\beta}{\theta} \ln \left[ \sum_{w'} \, \exp(\theta v(w')) P(w, w') \right] \right\}. $$ If $\theta < 0$, then the agent is risk-averse with respect to the gamble associated with continuing and waiting for new wage draws. If $\theta > 0$ then the agent is risk-loving with respect to such gambles. For $\theta \approx 0$, the agent is close to risk-neutral. Figure {numref}`f-risk_sensitive_js` shows how the continuation value, value function and optimal decision vary with $\theta$. Apart from $\theta$, parameters are identical to those in {numref}`list-markov_js`. Indeed, for $\theta$ close to zero, as in the middle sub-figure of Figure {numref}`f-risk_sensitive_js`, we see that the value function and reservation wage are almost identical to those from the risk-neutral model in Figure {numref}`f-markov_js_1`. 
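The Bellman operator above can be iterated directly. The following is a minimal sketch of value function iteration for the risk-sensitive job search model. The wage grid, the transition matrix `P`, and the values of `beta` and `c` are illustrative assumptions and are not the parameters from {numref}`list-markov_js`; the function names are likewise hypothetical. The entropic certainty equivalent is computed with a log-sum-exp shift to avoid overflow.

```python
import numpy as np

def entropic_ce(v, P, theta):
    """(1/theta) * ln( sum_w' exp(theta * v(w')) P(w, w') ), computed stably."""
    z = theta * v
    m = z.max()                                  # shift to avoid overflow in exp
    return (m + np.log(P @ np.exp(z - m))) / theta

def solve_risk_sensitive_js(w_vals, P, beta, c, theta, tol=1e-8, max_iter=10_000):
    """Value function iteration for the risk-sensitive job search operator T."""
    stop = w_vals / (1 - beta)                   # value of accepting the offer
    v = stop.copy()
    for _ in range(max_iter):
        cont = c + beta * entropic_ce(v, P, theta)   # value of continuing
        v_new = np.maximum(stop, cont)
        done = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if done:
            break
    cont = c + beta * entropic_ce(v, P, theta)
    sigma = (stop >= cont).astype(int)           # 1 = accept, 0 = continue
    return v, cont, sigma

# Illustrative (assumed) primitives, not the book's calibration
n = 25
w_vals = np.linspace(10.0, 20.0, n)
idx = np.arange(n)
K = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 3.0)
P = K / K.sum(axis=1, keepdims=True)             # rows sum to one
beta, c = 0.95, 12.0

for theta in (-1.0, 0.01, 1.0):
    v, cont, sigma = solve_risk_sensitive_js(w_vals, P, beta, c, theta)
    accepted = w_vals[sigma == 1]
    print(f"theta = {theta:5.2f}, lowest accepted wage = {accepted.min():.2f}")
```

Because the entropic certainty equivalent is nondecreasing in $\theta$, the computed continuation values should be ordered across the three values of $\theta$ used in the loop.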
```{figure} ../figures/risk_sensitive_js.pdf :name: f-risk_sensitive_js Job search with risk-sensitive preferences ``` As expected, a negative value of $\theta$ tends to reduce the continuation value and hence the reservation wage, since the agent's dislike of risk encourages early acceptance of an offer. For positive values of $\theta$ the reverse is true, as seen in the bottom sub-figure. ```{exercise} :label: ex-rdps-auto-15 Replicate Figure {numref}`f-risk_sensitive_js`. The simplest method is to modify the code in {numref}`list-markov_js` and use value function iteration. ``` (ss-adag)= ### Adversarial Agents Some problems in economics, finance, and artificial intelligence assume that decisions emerge from a dynamic two-person zero-sum game in which the two agents' preferences are perfectly *mis*aligned. This can lead to a dynamic program where the Bellman equation takes the form $$ v(x) = \max_{a \in \Gamma(x)} \inf_{d \in D(x, a)} B(x, a, d, v) \qquad (x \in \Xsf, \; v \in \RR^\Xsf), $$ (eq-adag) where $B(x, a, d, v)$ represents lifetime value for the decision maker conditional on her current action $a$ and her adversary's action $d$. The decision maker chooses action $a \in \Gamma(x)$ with the knowledge that the opponent will then choose $d \in D(x, a)$ to minimize her lifetime value. ```{prf:remark} In some settings we can replace the $\inf$ in {eq}`eq-adag` with $\min$. In other settings, this is not so obvious. For this reason, we use $\inf$ throughout, paired with the assumption that $B$ is bounded below. This means that the infimum is always well-defined and finite. ``` (optimality)= #### Optimality To establish optimality properties in the setting of {eq}`eq-adag`, we introduce the following assumptions: - If $v, w \in \RR^\Xsf$ with $v \leq w$, then $$ B(x, a, d, v) \leq B(x, a, d, w) \quad \text{for all } x \in \Xsf, \; a \in \Gamma(x), \; d \in D(x, a). 
$$ - There exists a $v_1 \in \RR^\Xsf$ and $\epsilon > 0$ such that $$ v_1(x) + \epsilon \leq B(x, a, d, v_1) \quad \text{for all } x \in \Xsf, \; a \in \Gamma(x), \; d \in D(x, a). $$ - There exists a $v_2 \in \RR^\Xsf$ such that $v_1 \leq v_2$ and $$ B(x, a, d, v_2) \leq v_2(x) \quad \text{for all } x \in \Xsf, \; a \in \Gamma(x), \; d \in D(x, a). $$ - If $\lambda \in [0,1]$ and $v, w \in \RR^\Xsf$, then $$ B(x, a, d, \lambda v + (1-\lambda) w) \geq \lambda B(x, a, d, v ) + (1-\lambda) B(x, a, d, w) $$ for all $x \in \Xsf$, $a \in \Gamma(x)$ and $d \in D(x, a)$. Condition (a) is a natural monotonicity condition: A uniform increase in continuation values increases current value at all states and actions. Conditions (b) and (c) provide upper and lower bounds. Condition (d) is a concavity condition. To analyze the decision maker's problem, we set $V \coloneq [v_1, v_2]$ and $$ \hat B(x, a, v) \coloneq \inf_{d \in D(x, a)} B(x, a, d, v) \qquad ((x, a) \in \Gsf, \; v \in V). $$ We consider $\rR = (\Gamma, V, \hat B)$. ```{prf:proposition} :label: p-adagcrdp If conditions *(a)--(d)* hold, then $\rR$ is a concave RDP. ``` An immediate corollary of {prf:ref}`p-adagcrdp` is that, under the stated conditions, the decision maker's problem is a globally stable RDP, and hence the fundamental optimality properties in {prf:ref}`t-fbk_rpd` all hold. In the proof of {prf:ref}`p-adagcrdp`, we use the following exercise. ```{exercise} :label: ex-minconcave Let $f$ and $g$ map nonempty set $D$ into $\RR$. Assume that both $f$ and $g$ are bounded below. Prove that, in this setting, $$ \inf_{d \in D} (f(d)+g(d)) \geq \inf_{d \in D} f(d)+ \inf_{d \in D}g(d). $$ ``` ```{prf:proof} *Proof of {prf:ref}`p-adagcrdp`.* First, we need to check that $\rR$ is an RDP. In view of (a) we have $\hat B(x, a, v) \leq \hat B(x, a, w)$ whenever $(x, a) \in \Gsf$ and $v, w \in V$ and $v \leq w$.
Also, by (b) and (c), $$ v_1(x) < \hat B(x, a, v_1) \; \text{ and } \hat B(x, a, v_2) \leq v_2(x) \quad \text{for all } (x, a) \in \Gsf. $$ (eq-adasb) As a result, $v_1(x) \leq \hat B(x, \sigma(x), v) \leq v_2(x)$ for all $x \in \Xsf$ and $v \in V$. Together, these facts imply the monotonicity and consistency conditions required of an RDP. In view of {eq}`eq-adasb` and {prf:ref}`ex-qcsb`, to establish that $\rR$ is concave, we need only show that, for fixed $\lambda \in [0,1]$ and $v, w \in V$, $$ \hat B(x, a, \lambda v + (1-\lambda) w) \geq \lambda \hat B(x, a, v ) + (1-\lambda) \hat B(x, a, w) , $$ (eq-adabconvx2) for all $(x, a) \in \Gsf$. This holds because, given $(x, a) \in \Gsf$, $\lambda \in [0,1]$ and $v, w \in V$, $$ \begin{aligned} \hat B(x, a, \lambda v + (1-\lambda) w) & = \inf_{d \in D(x, a)} B(x, a, d, \lambda v + (1-\lambda) w) \\ & \geq \inf_{d \in D(x, a)} [\lambda B(x, a, d, v ) + (1-\lambda) B(x, a, d, w)] \\ & \geq \lambda \inf_{d \in D(x, a)} B(x, a, d, v ) + (1-\lambda) \inf_{d \in D(x, a)} B(x, a, d, w), \end{aligned} $$ where the first inequality is by condition (d) and the second is by {prf:ref}`ex-minconcave`. This proves {eq}`eq-adabconvx2`, so $\rR$ is a concave RDP. ◻ ``` (sss-apmp)= #### A Perturbed MDP Problem In this section, we provide a relatively abstract application of {prf:ref}`p-adagcrdp`. Later, in {ref}`ss-amro`, we will see more concrete applications. The setting we consider is a modified MDP where the adversarial agent's actions affect the reward function and transition kernel. This leads to a Bellman equation of the form $$ v(x) = \max_{a \in \Gamma(x)} \inf_{d \in D(x, a)} \left\{ r(x, a, d) + \beta \sum_{x'} v(x')P(x, a, d, x') \right\} \qquad (x \in \Xsf). $$ (eq-twoplayer) The choice perturbation $d \in D(x, a)$ is made by the adversary. The object $P$ is a stochastic kernel, in the sense that $P(x, a, d, \cdot)$ is a distribution over $\Xsf$ for each feasible $(x, a, d)$. 
We assume that $\Gamma$ is a nonempty correspondence from $\Xsf$ to $\Asf$ and $D(x, a)$ is nonempty for all $(x, a) \in \Gsf$. Let $$ \hat B(x, a, v) = \inf_{d \in D(x, a)} \left\{ r(x, a, d) + \beta \sum_{x'} v(x') P(x, a, d, x') \right\} \qquad ((x, a) \in \Gsf). $$ To construct the value space $V$, we let $r_1 = \min r$ and $r_2 = \max r$, and set $$ V = [v_1, v_2] \quad \text{where} \quad v_1 \coloneq \frac{r_1 - \epsilon}{1-\beta} \quad \text{and} \quad v_2 \coloneq \frac{r_2}{1-\beta}. $$ (eq-smvbs) (These constant functions are similar to $v_1, v_2$ in {eq}`eq-mvbs`.) ```{exercise} :label: ex-bch Prove: For $v_1, v_2$ in {eq}`eq-smvbs`, conditions (b)--(c) hold. ``` ```{solution} ex-bch Regarding (b), note that $v_1$ is constant. Hence, at fixed $(x, a) \in \Gsf$ and $d \in D(x, a)$, we have $$ B(x, a, d, v_1) = r(x,a,d) + \beta \frac{r_1 - \epsilon}{1-\beta} \geq r_1 + \beta \frac{r_1 - \epsilon}{1-\beta} = \frac{r_1 - \beta r_1 + \beta r_1 - \beta \epsilon}{1-\beta} = v_1 + \epsilon. $$ Hence (b) is confirmed. Regarding (c), we have $$ B(x, a, d, v_2) = r(x,a,d) + \beta \frac{r_2}{1-\beta} \leq r_2 + \beta \frac{r_2}{1-\beta} = v_2. $$ We have now verified conditions (b)--(c). ``` ```{prf:lemma} :label: l-permdc The perturbed MDP model $\rR \coloneq (\Gamma, V, \hat B)$ is a concave RDP. ``` An immediate corollary of {prf:ref}`l-permdc` is that $\rR$ is globally stable (via {prf:ref}`p-convmx`) and all optimality results in {prf:ref}`t-fbk_rpd` apply. ```{prf:proof} *Proof of {prf:ref}`l-permdc`.* It suffices to show that $\rR$ obeys (a)--(d). Conditions (a) and (d) are elementary in this setting. Conditions (b) and (c) were established in {prf:ref}`ex-bch`. ◻ ``` (ss-amro)= ### Ambiguity and Robustness Until now we have considered agents facing decision problems where outcomes are uncertain but probabilities are known.
For example, while the job seeker introduced in {prf:ref}`c-introii` does not know the next period wage offer when choosing her current action, she does know the distribution of that offer. She uses this distribution to determine an optimal course of action. Similarly, the controllers in our discussion of optimal stopping and MDPs used their knowledge of the Markov transition law to determine an optimal policy. In many cases, the assumption that the decision maker knows all probability distributions that govern outcomes under different actions is debatable. In this section we study lifetime valuations in settings of **Knightian uncertainty** ({cite:author}`knight1921risk`, {cite:year}`knight1921risk`), which means that outcome distributions are themselves unknown. Some authors refer to Knightian uncertainty as **ambiguity**. Below we consider some dynamic problems where decision makers face Knightian uncertainty. #### Robust Control First we study the choices of a decision maker who knows her reward function but distrusts her specification of the stochastic kernel $P$ that describes the evolution of the state. This distrust is expressed by assuming that she knows that $P$ belongs to some class of stochastic kernels from $\Gsf \times \Xsf$ to $\Xsf$. This can lead to aggregators of the form $$ B(x, a, v) = r(x, a) + \beta \inf_{P \in \pP(x, a)} \left\{ \sum_{x'} v(x')P(x, a, x') \right\}, $$ (eq-robc) for $(x, a) \in \Gsf$. As usual, $r$ maps $\Gsf$ to $\RR$ and $\beta \in (0,1)$. The decision maker can construct a policy that is robust to her distrust of the stochastic kernel by using this aggregator $B$. Such aggregators arise in the field of robust control. Positing that the decision maker knows a nontrivial set of stochastic kernels is a way of modeling Knightian uncertainty, as distinguished from risks that are described by known probability distributions. ```{prf:example} Consider the simple job search problem from {prf:ref}`c-introii`. 
Suppose that the worker believes that the wage offer distribution lies in some subset $\pP$ of $\dD(\Wsf)$. She can seek a decision rule that is robust to worst-case beliefs by optimizing with aggregator $$ B(w, a, v) = a \frac{w}{1-\beta} + (1-a) \inf_{\phi \in \pP} \sum_{w'} v(w') \phi(w'). $$ ``` Returning to the robust control model with aggregator $B$ in {eq}`eq-robc`, we take $V$ as defined in {eq}`eq-smvbs` and set $\rR = (\Gamma, V, B)$. The set $\pP$ of stochastic kernels is entirely arbitrary. ```{prf:proposition} $\rR$ is a concave RDP. ``` ```{prf:proof} Writing $B$ as $$ B(x, a, v) = \inf_{P \in \pP(x, a)} \left\{ r(x, a) + \beta \sum_{x'} v(x')P(x, a, x') \right\}, $$ (eq-rebe) we see that $\rR$ is a special case of the perturbed MDP model in {ref}`sss-apmp`. Concavity now follows from {prf:ref}`l-permdc`. ◻ ``` We conclude from this discussion that the robust control RDP is globally stable. Hence all of the fundamental optimality properties hold. #### Robustness and Adversarial Agents A more general way to implement robustness is via the aggregator $$ B(x, a, v) = r(x, a) + \beta \inf_{P \in \pP(x, a)} \left\{ \sum_{x'} v(x')P(x, a, x') + d(P(x, a, \cdot), \bar P(x, a, \cdot)) \right\}. $$ (eq-robc2) In this setup, $\pP(x, a)$ is often large, weakening the constraint on $P$. At the same time, we introduce the penalty term $d(P(x, a, \cdot), \bar P(x, a, \cdot))$, which can be understood as recording the deviation between a given kernel $P$ and some baseline specification $\bar P$. One interpretation of this setting is that the decision maker begins with a baseline specification of dynamics but lacks confidence in its accuracy. In her desire to choose a robust policy, she imagines herself playing against an adversarial agent. Her adversary can choose transition kernels that deviate from the baseline, but the presence of the penalty term means that extreme deviations are curbed.
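To make the two robust aggregators {eq}`eq-robc` and {eq}`eq-robc2` concrete, here is a minimal sketch under strong simplifying assumptions: the candidate set of kernels is finite (so the infimum is a minimum), every action is feasible in every state, and the penalty is a scaled KL divergence. All names (`robust_B`, `robust_vfi`, `kernels`, `P_bar`) and the random primitives are hypothetical and chosen only for illustration. Setting `penalty=None` gives the unpenalized form {eq}`eq-robc`.

```python
import numpy as np

def kl(q, p):
    """KL divergence of q from p on a finite set, assuming q(x) = 0 whenever p(x) = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def robust_B(v, r, kernels, P_bar, beta, penalty=None):
    """
    Robust aggregator over a finite candidate set of kernels.
    Shapes: r is (n, m), kernels is (k, n, m, n), P_bar is (n, m, n).
    Returns an (n, m) array of values B(x, a, v).
    """
    vals = kernels @ v                            # (k, n, m): sum_x' v(x') P(x, a, x')
    if penalty is not None:
        k, n, m, _ = kernels.shape
        pen = np.array([[[penalty(K[x, a], P_bar[x, a]) for a in range(m)]
                         for x in range(n)] for K in kernels])
        vals = vals + pen
    return r + beta * vals.min(axis=0)            # adversary minimizes at each (x, a)

def robust_vfi(r, kernels, P_bar, beta, penalty=None, tol=1e-10, max_iter=10_000):
    """Iterate v -> max_a B(x, a, v) until successive iterates are within tol."""
    v = np.zeros(r.shape[0])
    for _ in range(max_iter):
        v_new = robust_B(v, r, kernels, P_bar, beta, penalty).max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Illustrative (assumed) primitives
rng = np.random.default_rng(0)
n, m, k = 4, 2, 3
kernels = rng.dirichlet(np.ones(n), size=(k, n, m))   # shape (k, n, m, n)
P_bar = kernels[0]                                     # baseline is one of the candidates
r = rng.uniform(0.0, 1.0, size=(n, m))
beta = 0.9

v_nominal = robust_vfi(r, kernels[:1], P_bar, beta)    # decision maker trusts P_bar
v_robust = robust_vfi(r, kernels, P_bar, beta)         # worst case, no penalty
v_penal = robust_vfi(r, kernels, P_bar, beta,
                     penalty=lambda q, p: 0.5 * kl(q, p))
print(v_robust, v_penal, v_nominal)
```

In this sketch the penalized value lies between the unpenalized worst-case value and the nominal value, since the penalty is nonnegative and vanishes at the baseline kernel.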
If we define $$ \hat r(x, a) = r(x, a) + d(P(x, a, \cdot), \bar P(x, a, \cdot)), $$ then {eq}`eq-robc2` can be expressed as $$ B(x, a, v) = \inf_P \left\{ \hat r(x, a) + \beta \sum_{x'} v(x')P(x, a, x') \right\}. $$ This is a special case of {eq}`eq-rebe`, so the same optimality theory applies. #### Connection to Risk-Sensitive Preferences One measure of discrepancy between two probability distributions is the **Kullback--Leibler divergence** (KL divergence) $$ d_{KL}(q \given p) \coloneq \sum_x q(x) \ln \left( \frac{q(x)}{p(x)} \right) \quad \text{for } q, p \in \dD(\Xsf). $$ It is assumed here that $q \prec_{\rm{ac}} p$, which means that $q(x)=0$ whenever $p(x)=0$. We note for future reference that $d_{KL}$ obeys the **duality formula for variational inference**, which states that, given $h \in \RR^\Xsf$, $$ \ln \sum_x \exp(h(x)) p(x) = \sup_{q \prec_{\rm{ac}} p} \left\{ \sum_x h(x) q(x) - d_{KL}(q \given p) \right\}. $$ (eq-varfor) (See, e.g., {cite}`dupuis2011weak`, Proposition 1.4.2.) In robust control, KL divergence can be used to measure deviation between the baseline specification and alternative specifications. It turns out that, under this measure of divergence, there is a tight relationship between robust control and risk-sensitive preferences. To illustrate this relationship, we fix $\theta < 0$ and set $d_\theta \coloneq -(1/\theta) d_{KL}$, so that $d_\theta$ is a simple positive rescaling of the Kullback--Leibler divergence. Using $d_\theta$ in {eq}`eq-robc2` leads to $$ B(x, a, v) = r(x, a) + \beta \inf_{P \in \pP(x, a)} \left\{ \sum_{x'} v(x')P(x, a, x') + d_\theta (P(x, a, \cdot) \given \bar P(x, a, \cdot)) \right\}. $$ The constraint set $\pP(x,a)$ is all $P \in \mopx$ such that $P(x, a, \cdot) \prec_{\rm{ac}} \bar P(x, a, \cdot)$.
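The duality formula {eq}`eq-varfor` can be checked numerically on a small example. The sketch below draws a random $p$ and $h$ (both purely illustrative) and uses the standard fact that the supremum is attained at the exponentially tilted distribution $q^*(x) \propto p(x) \exp(h(x))$. It then confirms that randomly drawn alternatives $q$ do not exceed the left-hand side. The helper names are hypothetical.

```python
import numpy as np

def kl(q, p):
    """KL divergence of q from p on a finite set, assuming q(x) = 0 whenever p(x) = 0."""
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def objective(q, p, h):
    """The bracketed term in the duality formula: sum_x h(x) q(x) - d_KL(q | p)."""
    return float(q @ h) - kl(q, p)

rng = np.random.default_rng(1)
n = 6
p = rng.dirichlet(np.ones(n))        # baseline distribution with full support
h = rng.normal(size=n)               # arbitrary element of R^X

lhs = float(np.log(np.sum(np.exp(h) * p)))

# Exponentially tilted distribution, which attains the supremum
q_star = p * np.exp(h)
q_star = q_star / q_star.sum()

gap_at_optimum = abs(objective(q_star, p, h) - lhs)
largest_random_value = max(objective(rng.dirichlet(np.ones(n)), p, h)
                           for _ in range(1_000))
print(gap_at_optimum, lhs - largest_random_value)
```

The first printed number should be zero up to floating point error, and the second should be nonnegative.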
If we multiply both sides of the variational formula {eq}`eq-varfor` by $(1/\theta)$ and set $h = \theta v$ we get $$ \frac{1}{\theta} \ln \sum_x \exp(\theta v(x)) p(x) = \inf_{q \prec_{\rm{ac}} p} \left\{ \sum_x v(x) q(x) - \frac{1}{\theta} d_{KL}(q \given p) \right\}. $$ This allows us to rewrite $B$ as $$ B(x, a, v) = r(x, a) + \beta \frac{1}{\theta} \ln \left\{ \sum_{x'} \exp(\theta v(x')) \bar P(x, a, x') \right\}. $$ Hence, for this choice of deviation, the robust control aggregator {eq}`eq-robc2` reduces to the risk-sensitive aggregator (see {prf:ref}`eg-rsrdp`) under the baseline transition kernel. (ss-smoothamb)= ### Smooth Ambiguity {cite:t}`ju2012ambiguity` propose and study a recursive smooth ambiguity model in the context of asset pricing. A generic discrete formulation of their optimization problem can be expressed in terms of the aggregator $$ B(x, a, v) = \left\{ r(x, a) + \beta \left\{ \int \left[ \sum_{x'} v(x')^{\gamma} P_\theta (x, a, x') \right]^{\kappa/\gamma} \mu(x, \diff \theta) \right\}^{\alpha/\kappa} \right\}^{1/\alpha}, $$ (eq-smoothagg) where $\alpha, \kappa, \gamma$ are nonzero parameters, $P_\theta$ is a stochastic kernel from $\Gsf$ to $\Xsf$ for each $\theta$ in a finite dimensional parameter space $\Theta$, and $\mu(x, \cdot)$ is a probability distribution over $\Theta$ for each $x \in \Xsf$. The distribution $\mu(x, \cdot)$ represents subjective beliefs over the transition rule for the state. The aggregator $B$ in {eq}`eq-smoothagg` is defined for $x \in \Xsf$, $a \in \Gamma(x)$ and $v \in I$, where $I$ is the interior of the positive cone of $\RR^\Xsf$. To ensure finite real values, we assume $r \gg 0$. As with the Epstein--Zin case, $\alpha$ parameterizes the elasticity of intertemporal substitution and $\gamma$ governs risk aversion. The parameter $\kappa$ captures ambiguity aversion. If $\kappa = \gamma$, the agent is said to be ambiguity neutral.
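The aggregator {eq}`eq-smoothagg` is straightforward to evaluate when beliefs have finite support. The following sketch assumes that $\mu(x, \cdot)$ puts weight on $k$ parameter values, so the integral becomes a weighted sum, and that every action is feasible in every state. The array layout, the function names, and the random primitives are illustrative assumptions. For comparison, it also evaluates an Epstein--Zin style aggregator under the belief-averaged kernel.

```python
import numpy as np

def smooth_ambiguity_B(v, r, P_theta, mu, beta, alpha, gamma, kappa):
    """
    Smooth ambiguity aggregator with finitely many parameter values.
    Shapes: v is (n,) and strictly positive, r is (n, m) and strictly positive,
    P_theta is (k, n, m, n), mu is (n, k) with rows summing to one.
    """
    inner = (P_theta @ v**gamma) ** (kappa / gamma)       # (k, n, m)
    avg = np.einsum('xk,kxa->xa', mu, inner)              # integrate over theta
    return (r + beta * avg ** (alpha / kappa)) ** (1 / alpha)

def epstein_zin_B(v, r, P, beta, alpha, gamma):
    """Epstein--Zin style aggregator for a single kernel P of shape (n, m, n)."""
    return (r + beta * (P @ v**gamma) ** (alpha / gamma)) ** (1 / alpha)

# Illustrative (assumed) primitives with kappa < gamma < 0 < alpha < 1
rng = np.random.default_rng(2)
n, m, k = 3, 2, 4
P_theta = rng.dirichlet(np.ones(n), size=(k, n, m))       # one kernel per theta
mu = rng.dirichlet(np.ones(k), size=n)                    # beliefs at each state
r = rng.uniform(0.5, 1.5, size=(n, m))
v = rng.uniform(1.0, 2.0, size=n)
beta, alpha, gamma, kappa = 0.9, 0.5, -2.0, -4.0

B_averse = smooth_ambiguity_B(v, r, P_theta, mu, beta, alpha, gamma, kappa)

# Ambiguity-neutral case: kappa = gamma, compared with the averaged kernel
P_mix = np.einsum('xk,kxay->xay', mu, P_theta)
B_neutral = smooth_ambiguity_B(v, r, P_theta, mu, beta, alpha, gamma, gamma)
B_ez = epstein_zin_B(v, r, P_mix, beta, alpha, gamma)
print(np.max(np.abs(B_neutral - B_ez)))
```

With $\kappa = \gamma$ the inner exponent equals one, so the two computations should agree up to rounding error.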
```{exercise} :label: ex-rdps-auto-16 Show that the smooth ambiguity aggregator $B$ reduces to the Epstein--Zin aggregator when the agent is ambiguity neutral. ``` ```{solution} ex-rdps-auto-16 In this case, where $\kappa = \gamma$, $B$ reduces to $$ B(x, a, v) = \left\{ r(x, a) + \beta \left\{ \sum_{x'} v(x')^{\gamma} P (x, a, x') \right\}^{\alpha/\gamma} \right\}^{1/\alpha}, $$ where $P(x, a, x') \coloneq \int P_\theta (x, a, x') \mu(x, \diff \theta)$ is a weighted average over beliefs. This is identical to the Epstein--Zin aggregator (see {prf:ref}`eg-ezrdp`). ``` Returning to {eq}`eq-smoothagg`, we focus on the case $\kappa < \gamma < 0 < \alpha < 1$, which includes the calibration used in {cite:t}`ju2012ambiguity`. (Other cases can be handled using similar methods and details are left to the reader.) After constructing a suitable value space, we will show that the resulting RDP is globally stable. As a first step, set $r_1 \coloneq \min r$, $r_2 \coloneq \max r$ and fix $\epsilon > 0$. Consider the constant functions $$ v_1 \coloneq\left( \frac{r_1}{1-\beta} \right)^{1/\alpha} \quad \text{and} \quad v_2 \coloneq \left( \frac{r_2 +\epsilon}{1-\beta} \right)^{1/\alpha} . $$ ```{exercise} :label: ex-rdps-auto-17 Prove that $$ v_1 \leq B(x, a, v_1) \leq B(x, a, v_2) < v_2 \quad \text{for all } (x, a) \in \Gsf. $$ ``` In the remainder of this section on smooth ambiguity, we set $V = [v_1, v_2]$. ```{exercise} :label: ex-rdps-auto-18 Prove that $\rR \coloneq (\Gamma, V, B)$ is an RDP. ``` Here is our main result for this section. It implies that all optimality and convergence results for $\rR$ are valid (see, in particular, {prf:ref}`t-fbk_rpd`). ```{prf:proposition} :label: p-sambs Under the stated assumptions, the RDP $\rR$ is globally stable. ``` To prove {prf:ref}`p-sambs`, we use a transformation, just as we did with the Epstein--Zin case in {ref}`ss-ezrev`.
To this end, we introduce the composite parameters $$ \xi \coloneq \frac{\gamma}{\kappa} \in (0,1) \quad \text{and} \quad \zeta \coloneq \frac{\kappa}{\alpha} < 0. $$ Then we define $$ \hat B(x, a, v) = \left\{ r(x, a) + \beta \left\{ \int \left[ \sum_{x'} v(x')^{\xi} P_\theta (x, a, x') \right]^{1/\xi} \mu(x, \diff \theta) \right\}^{\zeta} \right\}^{1/\zeta}, $$ (eq-smoothagg_t2) and $$ \hat V = [\hat v_1, \hat v_2] \quad \text{where } \; \hat v_1 \coloneq v_2^{1/\kappa} \text{ and } \hat v_2 \coloneq v_1^{1/\kappa}. $$ Note that $\hat V$ is a nonempty order interval of strictly positive real-valued functions, since $0 < v_1 < v_2$ and $\kappa < 0$. We set $\hat \rR = (\Gamma, \hat V, \hat B)$. ```{exercise} :label: ex-sabound Prove that $\hat \rR$ is an RDP satisfying $$ \hat v_1 < \hat B(x, a, \hat v_1) \quad \text{and} \quad \hat B(x, a, \hat v_2) \leq \hat v_2 \quad \text{for all } (x, a) \in \Gsf. $$ ``` The next exercise shows that $\rR$ and $\hat \rR$ are topologically conjugate (see {ref}`ss-iso_rpds`). ```{exercise} :label: ex-smtcr Let $\phi$ be defined on $(0,\infty)$ by $\phi(t) = t^\kappa$. Show that 1. $B(x, a, v) = \phi^{-1}[ \hat B(x, a, \phi \circ v) ]$ for all $v \in V$ and $(x, a) \in \Gsf$, and 2. $\phi$ is a homeomorphism from $[v_1, v_2]$ to $[\hat v_1, \hat v_2]$ (as subsets of $\RR$). ``` ```{prf:lemma} :label: l-saconcave For each $(x, a) \in \Gsf$, the function $\hat v \mapsto \hat B(x, a, \hat v)$ is concave on $\hat V$. ``` ```{prf:proof} Fix $(x, a) \in \Gsf$. We write $\hat B(x, a, \hat v)$ as $$ \hat B(x, a, \hat v) = \psi \left( \int f(\theta, v) \mu(x, \diff \theta) \right), $$ where $$ f(\theta, v) \coloneq \left[ \sum_{x'} v(x')^{\xi} P_\theta (x, a, x') \right]^{1/\xi} \quad \text{and} \quad \psi(t) \coloneq \left\{ r(x, a) + \beta t^{\zeta} \right\}^{1/\zeta}. $$ For fixed $\theta$, the function $v \mapsto f(\theta, v)$ is concave over all $v$ in the interior of the positive cone of $\RR^\Xsf$ by {prf:ref}`l-kpconcon`.
The real-valued function $\psi$ satisfies $\psi' > 0$ and $\psi'' < 0$ over $t \in (0,\infty)$. Since we are composing order-preserving concave functions, it follows that $\hat B(x, a, \hat v)$ is concave on $\hat V$. ◻ ``` ```{prf:proof} *Proof of {prf:ref}`p-sambs`.* To prove that $\rR$ is globally stable it suffices to prove that $\hat \rR$ is globally stable (see {prf:ref}`ex-smtcr` and {prf:ref}`p-osas`). Given the results of {prf:ref}`ex-sabound` and {prf:ref}`l-saconcave`, the RDP $\hat \rR$ is concave. But then $\hat \rR$ is globally stable, by {prf:ref}`p-convmx`. ◻ ``` (ss-minim_rdps)= ### Minimization Until now, all results and applications have concerned maximization of lifetime values. Now is a good time to treat minimization. Throughout this section, $\rR$ is a well-posed RDP. The pointwise minimum $\vmin \coloneq \bigwedge_\sigma v_\sigma$ is called the **min-value function** generated by $\rR$. We call a policy $\sigma \in \Sigma$ **min-optimal** for $\rR$ if $v_\sigma = \vmin$. A policy $\sigma \in \Sigma$ is called **$v$-min-greedy** for $\rR$ if $$ \sigma(x) \in \argmin_{a \in \Gamma(x)} B(x, a, v) \quad \text{for all } x \in \Xsf. $$ We say that $\rR$ obeys **Bellman's principle of min-optimality** if $$ \sigma \in \Sigma \text{ is min-optimal for } \rR \quad \iff \quad \sigma \text{ is } \vmin \text{-min-greedy}. $$ The **Bellman min-operator** $\tmin$ is defined by $$ (\tmin v)(x) = \min_{a \in \Gamma(x)} B(x, a, v) \qquad (x \in \Xsf). $$ We say that $v \in V$ obeys the **min-Bellman equation** if $\tmin v = v$. The algorithm defined by replacing "$v$-greedy" with "$v$-min-greedy" in {prf:ref}`algo-hpi_rdps` (HPI) will be called **min-HPI**. We can now state the following result, which is analogous to {prf:ref}`t-fbk_rpd`. In the statement, $\rR$ is a well-posed RDP with min-value function $\vmin$. ```{prf:theorem} :label: t-fbk_min_0 If $\rR$ is globally stable, then 1. $\vmin$ is the unique solution to the min-Bellman equation in $V$, 2. 
$\rR$ satisfies Bellman's principle of min-optimality, 3. $\rR$ has at least one min-optimal policy, and 4. min-HPI returns an exact optimal policy in finitely many steps. ``` Although we omit the details, a min-OPI convergence result directly analogous to the OPI convergence result in (v) of {prf:ref}`t-fbk_rpd` also holds (after replacing maximization-based $v$-greedy policies with $v$-min-greedy policies). {prf:ref}`t-fbk_min_0` is proved in {ref}`ss-minim`. For now, we consider two applications that involve minimization. (sss-apath)= #### Application: Shortest Paths Recall the shortest path problem introduced in {prf:ref}`eg-spath`, where $\Xsf$ is the vertices of a graph, $E$ is the edges, $c \colon E \to \RR_+$ maps a travel cost to each edge $(x,x') \in E$, and $\oO(x)$ is the set of direct successors of $x$. The aim is to minimize total travel cost to a destination node $d$. We adopt all assumptions from {prf:ref}`ex-shir` and assume in addition that $c(x,x')=0$ implies $x=d$. As in {prf:ref}`ex-shir`, we let $C(x)$ be the maximum cost of traveling to $d$ from $x$ along any directed path. We regard the problem as an RDP $\rR = (\oO, V, B)$ with $V = [0, C]$ and $$ B(x, x', v) = c(x, x') + v(x') \qquad (x \in \Xsf). $$ (eq-spct) In the present setting, the function $v$ in {eq}`eq-spct` is often called the **cost-to-go** function, with $v(x')$ in {eq}`eq-spct` understood as remaining costs after moving to state $x'$. While the value aggregator $B$ in {eq}`eq-spct` is simple, the absence of discounting (which is standard in the shortest path literature) means that $\rR$ is not contracting.
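To see the aggregator {eq}`eq-spct` and the Bellman min-operator $\tmin$ at work, consider the following minimal sketch. The four-node graph, its edge costs, and the function names are hypothetical. The graph is assumed to satisfy the kind of conditions used in this section: the destination `d` is absorbing at zero cost, all other edges have positive cost, and every directed path reaches `d`. The iteration starts from $v = 0$, the bottom of $V = [0, C]$.

```python
def min_bellman(v, graph):
    """Apply the Bellman min-operator: (T v)(x) = min over successors x' of c(x, x') + v(x')."""
    return {x: min(cost + v[y] for y, cost in succ.items())
            for x, succ in graph.items()}

def shortest_path_costs(graph, max_iter=1_000):
    """Iterate the min-operator from v = 0 until a fixed point, then read off a min-greedy policy."""
    v = {x: 0 for x in graph}
    for _ in range(max_iter):
        v_new = min_bellman(v, graph)
        if v_new == v:
            break
        v = v_new
    sigma = {x: min(succ, key=lambda y: succ[y] + v[y])
             for x, succ in graph.items()}
    return v, sigma

# Hypothetical graph: graph[x][x'] is the cost c(x, x') of the edge from x to x'
graph = {
    'A': {'B': 1, 'C': 4},
    'B': {'C': 1, 'd': 5},
    'C': {'d': 1},
    'd': {'d': 0},          # destination is absorbing with zero cost
}

cost_to_go, policy = shortest_path_costs(graph)
print(cost_to_go)   # minimum cost-to-go at each node
print(policy)       # a min-greedy successor at each node
```

Integer costs are used so that the fixed point can be detected by exact equality; with floating point costs a tolerance-based stopping rule would be the safer choice.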
Fortunately, $\rR$ turns out to be concave (in the sense of {ref}`ss-conconrdps`), which allows us to prove ```{prf:proposition} :label: p-fsp Under the stated conditions, the shortest path RDP is globally stable and the min-value function $\vmin$ is the unique solution to $$ \vmin(x) = \min_{x' \in \Gamma(x)} \left\{ c(x, x') + \vmin(x') \right\} \qquad (x \in \Xsf) $$ in $V$. A policy $\sigma \in \Sigma$ is min-optimal if and only if $$ \sigma(x) \in \argmin_{x' \in \Gamma(x)} \left\{ c(x, x') + \vmin(x') \right\} \quad \text{for all } x \in \Xsf. $$ ``` (In the present context, $\vmin$ is also known as the **minimum cost-to-go** function.) ```{prf:proof} We first show that $\rR$ is concave. By the definition of concave RDPs in {ref}`ss-conconrdps`, and given that $B(x, x', v)$ is affine in $v$ (and hence concave), it suffices to prove that there exists a $\delta > 0$ such that $$ c(x, x') \geq \delta C(x) \text{ for all } x \in \Xsf \text{ and } x' \in \oO(x). $$ (eq-bddcx) (This corresponds to {eq}`eq-bstrmd2_d` when $v_1 = 0$ and $v_2 = C$.) To this end, we set $$ \delta = \min_{x \not= d} \min_{x' \in \oO(x)} \frac{c(x, x')}{C(x)}. $$ By the stated cost assumptions, we have $c(x, x') > 0$ when $x \not= d$ and $x' \in \oO(x)$, whereas $C(x) > 0$ when $x \not= d$. Since $\Xsf$ is finite, it follows that $\delta$ is finite and positive. Evidently, with this definition, the bound {eq}`eq-bddcx` holds for all $x \not= d$. In addition, {eq}`eq-bddcx` holds trivially when $x=d$, since $C(d) = 0$. Hence {eq}`eq-bddcx` is valid for all $x \in \Xsf$. Concavity of $\rR$ implies global stability by {prf:ref}`p-convmx`. The remaining claims now follow from {prf:ref}`t-fbk_min_0`. ◻ ``` (sss-negrate)= #### Application: Negative Discount Rate Optimality When discussing MDPs we used $\beta$ to represent the discount factor. Given $\beta$, the **discount rate** or **rate of time preference** is the value $\rho$ that solves $\beta = 1/(1+\rho)$. 
The standard MDP assumption $\beta < 1$ implies this rate is positive. You will recall from {prf:ref}`c-mdps` that the condition $\beta < 1$ is central to the general theory of MDPs, since it yields global stability of the Bellman and policy operators on $\RR^\Xsf$ (via the Neumann series lemma or Banach's fixed-point theorem). In the previous section, on shortest paths, we studied an RDP with a zero discount rate. Now we go one step further and consider problems with negative rates of time preference. Such preferences are commonly inferred when people face unpleasant tasks. Subjects of studies often prefer getting such tasks "over and done with" rather than postponing them. (Negative discount rates are inferred in other settings as well. {ref}`s-cn_abstract` provides background and references.) In this section, we model optimal choice under a negative discount rate. Taking our cue from the preceding discussion, we consider a scenario where a task generates disutility but has to be completed. In particular, we assume that $$ B(x, x', v) = c(x, x') + \beta v(x') \qquad (x, x' \in \Xsf), $$ (eq-bnegd) where $\Xsf$ is a finite set and $\beta > 1$ is a constant. The function $c$ gives the cost of transitioning from $x$ to the new state $x'$. The value aggregator $B$ in {eq}`eq-bnegd` is the same as the shortest path aggregator {eq}`eq-spct`, except for the constant $\beta$. To keep the discussion simple, we adopt all other assumptions from the shortest path discussion in {ref}`sss-apath`. ```{exercise} :label: ex-rdps-auto-19 Let $C(x)$ be the maximum cost of traveling from $x \in \Xsf$ to the destination node $d$ under any feasible policy. Prove that $C(x) < \infty$ for all $x$. ``` ```{solution} ex-rdps-auto-19 A feasible policy $\sigma$ is a map from $\Xsf$ to itself satisfying $\sigma(x) \in \oO(x)$ for all $x$.
Recalling that $\Xsf$ is finite and setting $n = |\Xsf|$, the stated assumptions imply that $\sigma^k(x) = d$ for all $k \geq n$ (since all paths lead to $d$ in at most $n$ steps). Given that $c(d,d)=0$, it follows that the lifetime cost of following $\sigma$ from initial condition $x$ is no more than $$ c(x, \sigma(x)) + \beta c(\sigma(x), \sigma^2(x)) + \beta^2 c(\sigma^2(x), \sigma^3(x)) + \cdots + \beta^{n-1} c(\sigma^{n-1}(x), \sigma^n(x)). $$ With $c_\top \coloneq \max c$, we then have $$ C(x) \leq c_\top \frac{1 - \beta^n}{1 - \beta}. $$ ``` We now have an RDP $\rR = (\Gamma, V, B)$ with $\Gamma = \oO$, $B$ as in {eq}`eq-bnegd` and $V = [0, C]$. The policy operators map $V$ into itself because, for $v \in V$, we clearly have $0 \leq T_\sigma \, v$ and, in addition, $$ (T_\sigma \, v)(x) = c(x, \sigma(x)) + \beta v(\sigma(x)) \leq c(x, \sigma(x)) + \beta C(\sigma(x)) \leq C(x). $$ The last bound holds because $C(x)$ is, by definition, at least as large as the cost of traveling from $x$ to $\sigma(x)$ and then following the most expensive path. ```{prf:proposition} :label: p-negdisc Under the stated conditions, the negative discount rate RDP is globally stable, the min-value function $\vmin$ is the unique solution to $$ \vmin(x) = \min_{x' \in \Gamma(x)} \left\{ c(x, x') + \beta \vmin(x') \right\} \qquad (x \in \Xsf) $$ in $V$ and a policy $\sigma \in \Sigma$ is min-optimal if and only if $$ \sigma(x) \in \argmin_{x' \in \Gamma(x)} \left\{ c(x, x') + \beta \vmin(x') \right\} \quad \text{for all } x \in \Xsf. $$ ``` ```{prf:proof} The proof of {prf:ref}`p-negdisc` is essentially identical to the proof of {prf:ref}`p-fsp`. Readers are invited to confirm this. ◻ ``` (s-cn_finite_rdps)= ## Chapter Notes The RDP framework adopted in this chapter is inspired by {cite:t}`bertsekas2022abstract`, who in turn credits {cite:t}`mitten1964composition` as the first research paper to frame Richard Bellman's dynamic programming problems in an abstract setting.
{cite:t}`denardo1967contraction` describes key ideas including what we call contracting RDPs (see {ref}`sss-crdps`). {cite:t}`denardo1967contraction` credits {cite:t}`shapley1953stochastic` for inspiring his contraction-based arguments. The key optimality results from this chapter are new, although closely related results appear in {cite:t}`bertsekas2022abstract`. See, in addition, {cite:t}`bloise2024not`, which builds on {cite:t}`bertsekas2022abstract` and {cite:t}`ren2021dynamic`. The job search application with quantile preferences in {ref}`sss-quant_dp` is based on {cite:t}`de2022dynamic`. The same reference includes a general theory of dynamic programming when certainty equivalents are computed using quantile operators and aggregation is time additive. The optimal default application in {ref}`sss-aod` is loosely based on {cite:t}`arellano2008default`. Influential contributions to this line of work include {cite:t}`yue2010sovereign`, {cite:t}`chatterjee2012maturity`, {cite:t}`arellano2012default`, {cite:t}`cruces2013sovereign`, {cite:t}`ghosh2013fiscal`, {cite:t}`gennaioli2014sovereign`, and {cite:t}`bocola2019quantitative`. At the start of the chapter we motivated RDPs by mentioning that equilibria in some models of production and economic geography can be computed using dynamic programming. Examples include {cite:t}`hsu2012central`, {cite:t}`hsu2014optimal`, {cite:t}`antras2020geography`, {cite:t}`kikuchi2021coase` and {cite:t}`tyazhelnikov2022production`. Early references for dynamic programming with risk-sensitive preferences include {cite:t}`jacobson`, {cite:t}`whittle1981risk`, and {cite:t}`hansen1995discounted`. Elegant modern treatments can be found in {cite:t}`asienkiewicz2017note` and {cite:t}`bauerle2024markov`, and an extension to general static risk measures is available in {cite:t}`bauerle2022markov`.
Risk-sensitivity is applied to the study of optimal growth in {cite:t}`bauerle2018stochastic`, and to optimal dividend payouts in {cite:t}`bauerle2017optimal`. Risk-sensitivity is also used in applications of reinforcement learning, where the underlying state process is not known. See, for example, {cite:t}`shen2014risk`, {cite:t}`majumdar2017risk` or {cite:t}`gao2021robust`. Dynamic programming problems that acknowledge model uncertainty by including adversarial agents to promote robust decision rules can be found in {cite:t}`cagetti2002robustness`, {cite:t}`hansen2011robustness`, and other related papers. {cite:t}`al2019recursive` study the connection between Epstein--Zin utility and parameter uncertainty. {cite:t}`ruszczynski2010risk` considers risk averse dynamic programming and time consistency. The smooth ambiguity model in {ref}`ss-smoothamb` is loosely adapted from {cite:t}`klibanoff2009recursive` and {cite:t}`ju2012ambiguity`. For applications of optimization under smooth ambiguity, see, for example, {cite:t}`guan2020time` or {cite:t}`yu2023time`. {cite:t}`zhao2020ambiguity` studies yield curves in a setting where ambiguity-averse agents face varying amounts of Knightian uncertainty over the short and long run. Readers who wish to see some motivation for the discussion of negative discounting in {ref}`sss-negrate` can consult {cite:t}`loewenstein1991workers`, who found that the majority of workers they surveyed reported a preference for increasing wage profiles over decreasing ones that yield the same undiscounted sum, even when it was pointed out that the latter could be used to construct a dominating consumption sequence. {cite:t}`loewenstein1991negative` obtained similar results. In summarizing their study, they argue that, in the context of the choice problems that they examined, "sequences of outcomes that decline in value are greatly disliked, indicating a negative rate of time preference" {cite:p}`loewenstein1991negative`.
In {ref}`sss-negrate` we considered dynamic programs with negative discount rates. A more general treatment of such problems can be found in {cite:t}`kikuchi2021coase`, which also shows how negative discount rate dynamic programs connect to static problems concerning equilibria in production networks and draws connections with Coase's theory of the firm. An algorithm that we neglected to discuss is stochastic gradient descent (or ascent) in policy space. Typically policies are parameterized via an approximation architecture that consists of basis functions, activation functions, and compositions of them (e.g., a neural network). In large models, such approximation is used even when the state and action spaces are finite, simply because the curse of dimensionality makes exact representations infeasible. For recent discussions of gradient descent in policy spaces see {cite:t}`nota2019policy`, {cite:t}`mei2020global`, and {cite:t}`bhandari2022global`. [^1]: For $\Hmax$ to be well-defined, we must always select the same $v$-greedy policy when the operator is applied to $v$. To this end, we enumerate the policy set $\Sigma$ and choose the first $v$-greedy policy. This choice of convention has no effect on convergence results. ======================================================================== ## Abstract Dynamic Programs (c-adps)= # Abstract Dynamic Programming In {prf:ref}`c-rdps` we introduced RDPs, stated their optimality properties, and investigated applications that satisfy optimality conditions. But we have yet to prove the core optimality and convergence results in {prf:ref}`t-fbk_rpd`. Rather than proving these results directly, we now present a very abstract version of a dynamic programming problem that consists of a family of self-maps on a partially ordered set. Doing so allows us to simplify proofs and extend the reach of dynamic programming theory. (The value of these extensions will become clearer in Volume 2.)
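
As a concrete preview of this operator-based view, the following minimal sketch treats the shortest path problem of {ref}`sss-apath` purely as a family of policy operators $\{T_\sigma\}$ acting on cost-to-go functions. It is not taken from the book's companion code: the four-node graph, the edge costs, and all function names (`T`, `policy_value`, `T_min`) are hypothetical and chosen only for illustration. The sketch computes each $v_\sigma$ as the fixed point of $T_\sigma$, takes the pointwise minimum over policies, and checks that the result solves the min-Bellman equation.

```python
from itertools import product

# Hypothetical graph (an assumption for illustration): destination d = 3.
# succ[x] lists the direct successors of x; cost[(x, y)] is the edge cost.
# Costs are zero only on the self-loop at d, matching the text's assumption.
d = 3
succ = {0: [1, 2], 1: [2, 3], 2: [3], 3: [3]}
cost = {(0, 1): 1.0, (0, 2): 5.0, (1, 2): 1.0,
        (1, 3): 4.0, (2, 3): 1.0, (3, 3): 0.0}
nodes = sorted(succ)

def T(sigma, v):
    """Policy operator: (T_sigma v)(x) = c(x, sigma(x)) + v(sigma(x))."""
    return {x: cost[x, sigma[x]] + v[sigma[x]] for x in nodes}

def policy_value(sigma):
    """Fixed point of T_sigma in [0, C], obtained by iterating from zero.

    Every path reaches d within len(nodes) steps, so that many iterations
    already give the total cost of following sigma from each node.
    """
    v = {x: 0.0 for x in nodes}
    for _ in range(len(nodes)):
        v = T(sigma, v)
    return v

def T_min(v):
    """Bellman min-operator: pointwise minimum of T_sigma v over policies."""
    return {x: min(cost[x, y] + v[y] for y in succ[x]) for x in nodes}

# Enumerate the whole (finite) family of feasible policies.
policies = [dict(zip(nodes, choice))
            for choice in product(*(succ[x] for x in nodes))]
values = [policy_value(sigma) for sigma in policies]

# Min-value function: pointwise minimum of the sigma-value functions.
v_min = {x: min(v[x] for v in values) for x in nodes}

# A v_min-min-greedy policy, built state by state.
sigma_greedy = {x: min(succ[x], key=lambda y: cost[x, y] + v_min[y])
                for x in nodes}

print(v_min)                                   # minimum cost-to-go
print(T_min(v_min) == v_min)                   # min-Bellman equation holds
print(policy_value(sigma_greedy) == v_min)     # greedy policy is min-optimal
```

Enumerating every policy is feasible only because the example is tiny; the point is that nothing beyond the operators $T_\sigma$ and the ordering of their fixed points is used.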
(s-otpro)= ## Abstract Dynamic Programs First, we define abstract dynamic programs and prove optimality results under a set of high-level assumptions. Then we connect these results to our {prf:ref}`c-rdps` optimality claims for RDPs. ### Preliminaries Let's cover some fundamental concepts that we'll use when considering abstract dynamic programs. #### Order Stability The first concept is related to stability of maps over partially ordered spaces. Our aim is to provide a weak notion of stability that can be applied in any partially ordered set (without any form of topology). Let $V$ be a partially ordered set and let $T$ be a self-map on $V$ with exactly one fixed point $\bar v$ in $V$. In this setting, we call $T$ - **upward stable** on $V$ if $v \in V$ and $v \preceq T \, v$ implies $v \preceq \bar v$, - **downward stable** on $V$ if $v \in V$ and $T \, v \preceq v$ implies $\bar v \preceq v$, and - **order stable** on $V$ if $T$ is both upward and downward stable. Figure {numref}`f-up_down_stable` gives an illustration of a map $T$ on $V = [0,1]$ that is order stable: all points mapped up by $T$ lie below its fixed point and all points mapped down by $T$ lie above its fixed point. The figure suggests that order stability is related to global stability, as defined in {ref}`sss-glostab`. We will affirm this in {prf:ref}`l-gsusds`. ```{figure} ../figures/up_down_stable.pdf :name: f-up_down_stable Order-stability of a self-map $T$ on $V = ([0,1], \leq)$ ``` ```{exercise} :label: ex-adps-auto-1 Let $\Xsf$ be finite and consider the self-map on $V \coloneq (\RR^\Xsf, \leq)$ defined by $T v = r + A v$ for some $r \in \RR^\Xsf$ and $A \in \lopx$ with $0 \leq A$ and $\rho(A)<1$. Prove that $T$ is order stable on $V$. ``` ```{solution} ex-adps-auto-1 By the Neumann series lemma, $T$ has a unique fixed point in $V$ given by $\bar v \coloneq (I-A)^{-1} r$.
$T$ is upward stable because, given $v \in \RR^\Xsf$ with $v \leq T \, v$, we have $v \leq r + A v$, or $(I - A) v \leq r$. By the Neumann series lemma, $(I - A)^{-1}$ is a positive linear operator (as the sum of nonnegative matrices), so we can multiply by this inverse to get $v \leq (I - A)^{-1} r = \bar v$. This proves upward stability. Reversing the inequalities shows that downward stability also holds. ``` ```{prf:lemma} :label: l-gsusds Let $\Xsf$ be finite, let $V$ be a subset of $\RR^\Xsf$, and let $T$ be an order preserving self-map on $V$. If $T$ is globally stable on $V$, then $T$ is order stable on $V$. ``` ```{prf:proof} Assume the stated conditions. By global stability, $T$ has a unique fixed point $\bar v$ in $V$. If $v \in V$ and $v \leq T \, v$, then iterating on this inequality and using the fact that $T$ is order preserving yields $v \leq T^k \, v$ for all $k \in \NN$. Applying global stability and taking the limit gives $v \leq \bar v$. Hence upward stability holds. The proof of downward stability is similar. ◻ ``` (sss-orddual)= #### Order Duals Given partially ordered set $V$, let $V^\partial = (V, \preceq^\partial)$ be the **order dual**, so that, for $u, v \in V$, we have $u \preceq^\partial v$ if and only if $v \preceq u$. (The notation is slightly confusing but the concept is simple: $V^\partial$ is just $V$ with the order reversed.) The following result will be useful. ```{prf:lemma} :label: l-odod $S$ is order stable on $V$ if and only if $S$ is order stable on $V^\partial$. ``` ```{prf:proof} Let $S$ be as stated. By definition, $S$ has a unique fixed point $\bar v \in V$. Hence it remains only to check that $S$ is upward and downward stable on $V^\partial$. Regarding upward stability, suppose $v \in V$ and $v \preceq^\partial S v$. Then $Sv \preceq v$ and hence $\bar v \preceq v$, by downward stability of $S$ on $V$. But then $v \preceq^\partial \bar v$, so $S$ is upward stable on $V^\partial$. 
The proof of downward stability is similar. We have shown that $S$ is order stable on $V^\partial$ whenever $S$ is order stable on $V$. The reverse implication holds because the order dual of $V^\partial$ is $V$. ◻ ``` (ss-adpmain)= ### Abstract Dynamic Programs In this section, we formalize abstract dynamic programs and present fundamental optimality results. {ref}`sss-prelude` starts the ball rolling with an informal overview. (sss-prelude)= #### Prelude We saw in {ref}`s-rdp_theory` that a globally stable RDP yields a set of feasible policies $\Sigma$ and, for each $\sigma \in \Sigma$, a policy operator $T_\sigma$ defined on the value space $V \subset \RR^\Xsf$. Notice that the dynamic program is fully specified by the family of operators $\{T_\sigma\}_{\sigma \in \Sigma}$ and the space $V$ that they act on. From this set of operators we obtain the set of lifetime values $\{v_\sigma\}_{\sigma \in \Sigma}$, with each $v_\sigma$ uniquely identified as a fixed point of $T_\sigma$. These lifetime values define the value function $v^*$ as the pointwise maximum $v^* = \vee_\sigma \, v_\sigma$. An optimal policy is then defined as a $\sigma \in \Sigma$ obeying $v_\sigma = v^*$. To shed unnecessary structure before the main optimality proofs, a natural idea is to start directly with an abstract set of "policy operators" $\{T_\sigma\}$ acting on some set $V$. One can then define lifetime values and optimality as in the previous paragraph and start to investigate conditions on the family of operators $\{T_\sigma\}$ that lead to optimality. We use these ideas as our starting point, beginning with an arbitrary family $\{T_\sigma\}$ of operators on a partially ordered set. (sss-opset)= #### Defining ADPs (sss-chopt)= An **abstract dynamic program** (**ADP**) is a pair $\aA = (V, \{T_\sigma\}_{\sigma \in \Sigma})$ such that 1. $V = (V, \preceq)$ is a partially ordered set, 2. $\{T_\sigma\} \coloneq \{T_\sigma\}_{\sigma \in \Sigma}$ is a family of self-maps on $V$, and 3. 
for all $v \in V$, the set $\{T_\sigma \, v\}_{\sigma \in \Sigma}$ has both a least and greatest element. Elements of the index set $\Sigma$ are called **policies** and elements of $\{T_\sigma \}$ are called **policy operators**. Given $v \in V$, a policy $\sigma$ in $\Sigma$ is called **$v$-greedy** if $T_{\sigma} \, v \succeq T_\tau \, v$ for all $\tau \in \Sigma$. Existence of a greatest element in (iii) of the definition is equivalent to the statement that each $v \in V$ has at least one $v$-greedy policy. ```{prf:remark} Existence of a least element in (iii) is needed only because we wish to consider minimization as well as maximization. For settings where only maximization is considered, this can be dropped from the list of assumptions. (An analogous statement holds for minimization and greatest elements.) We will mention least elements in {prf:ref}`eg-rdsareadps` and then disregard them until we treat minimization in {ref}`ss-minim`. ``` ```{prf:remark} In the applications treated in this chapter, $\preceq$ will always be the pointwise partial order. In Volume 2 other partial orders arise. ``` ```{prf:example} :label: eg-rdsareadps Let $\rR = (\Gamma, V, B)$ be an RDP with finite-state $\Xsf$, as defined in {ref}`sss-defrdps`. For each $\sigma$ in the feasible policy set $\Sigma$, let $T_\sigma$ be the corresponding policy operator, defined at $v \in V$ by $(T_\sigma \, v)(x) = B(x, \sigma(x), v)$. The pair $\aA_{\rR} \coloneq (V, \{T_\sigma\})$ is an ADP, since $V$ is partially ordered by $\leq$, $T_\sigma$ is a self-map on $V$ for all $\sigma \in \Sigma$, and, given $v \in V$, choosing $\bar \sigma \in \Sigma$ such that $\bar \sigma(x) \in \argmax_{a \in \Gamma(x)} B(x, a, v)$ for all $x \in \Xsf$ produces a $v$-greedy policy and a greatest element for $\{T_\sigma \, v\}$ (cf., {prf:ref}`ex-rdphgl`). A least element of $\{T_\sigma \, v\}$ can be generated by replacing "argmax" with "argmin." 
``` In the setting of {prf:ref}`eg-rdsareadps`, we call $\aA_{\rR}$ the ADP **generated by** $\rR$. ```{prf:example} :label: eg-mdpsareadps Let $\mM = (\Gamma, \beta, r, P)$ be an MDP, as defined in {ref}`sss-fsmdp`, with policy operators $\{T_\sigma\}$ defined by $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$ (as in {eq}`eq-mdpts`). $\aA_{\mM} \coloneq (\RR^\Xsf, \{T_\sigma\})$ is an ADP (as a special case of {prf:ref}`eg-rdsareadps`). We call $\aA_{\mM}$ the ADP **generated by** $\mM$. ``` We have just shown that RDPs are ADPs. But there are also ADPs that do not fit naturally into the RDP framework. The next two examples illustrate. In these examples, the Bellman equation does not match the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ due to the inverted order of expectation and maximization. ```{prf:example} :label: eg-qfacadp Recall the $Q$-factor MDP Bellman operator, which takes the form $$ (Sq)(x, a) = r(x, a) + \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x'), $$ (eq-paboo) with $q \in \RR^\Gsf$ and $(x,a) \in \Gsf$. (We are repeating {eq}`eq-pabo`.) The $Q$-factor policy operators $\{S_\sigma\}$ corresponding to {eq}`eq-paboo` are given by $$ (S_\sigma \, q)(x, a) = r(x, a) + \beta \sum_{x'} q(x', \sigma(x')) P(x, a, x') \qquad ((x,a) \in \Gsf). $$ (eq-pabop) Each $S_\sigma$ is a self-map on $\RR^\Gsf = (\RR^\Gsf, \leq)$. If $q \in \RR^\Gsf$ and $\sigma \in \Sigma$ is such that $\sigma(x) \in \argmax_{a \in \Gamma(x)}q(x, a)$ for all $x \in \Xsf$, then $S_\sigma \, q \geq S_{\tau} \, q$ on $\Gsf$ for all $\tau \in \Sigma$. Hence $\sigma$ is $q$-greedy and $\aA \coloneq (\RR^\Gsf, \{S_\sigma\})$ is an ADP. ``` ```{prf:example} :label: eg-rsqfac In reinforcement learning and related fields the $Q$-factor approach from {prf:ref}`eg-qfacadp` has been extended to risk-sensitive decision processes (see, e.g., {cite}`fei2021exponential`).
The corresponding $Q$-factor Bellman equation is given by $$ f(x, a) = r(x, a) + \frac{\beta}{\theta} \ln \left\{ \sum_{x'} \exp \left[ \theta \max_{a' \in \Gamma(x')} f(x', a') \right] P(x, a, x') \right\} \qquad ((x,a) \in \Gsf). $$ (eq-rsqbell) The policy operators over risk-sensitive $Q$-factors take the form $$ (Q_\sigma \, f)(x, a) = r(x, a) + \frac{\beta}{\theta} \ln \left[ \sum_{x'} \exp \left[ \theta f(x', \sigma(x')) \right] P(x, a, x') \right], $$ (eq-rspbell) where $f \in \RR^\Gsf$ and $\sigma \in \Sigma$. An argument similar to the one given in {prf:ref}`eg-qfacadp` confirms that each $f \in \RR^\Gsf$ has an $f$-greedy policy. Hence $(\RR^\Gsf, \{Q_\sigma\})$ is an ADP. ``` In {prf:ref}`c-ctime` we will see that continuous time dynamic programs can also be viewed as ADPs. ## Optimality In this section, we study optimality properties of ADPs, aiming for generalizations of the foundational results of dynamic programming. To achieve this aim we need to define optimality and provide sufficient conditions. ### Max-Optimality We begin with maximization. Later, in {ref}`ss-minim`, we will show that results for minimization problems are simple corollaries of maximization results. #### Lifetime Values The objective of dynamic programming is to optimize lifetime value. But what is lifetime value in this abstract context? Suppose that, for an ADP $(V, \{T_\sigma\})$ and fixed $\sigma \in \Sigma$, the policy operator $T_\sigma$ has a unique fixed point. In this setting, we write $v_\sigma$ for the fixed point of $T_\sigma$ and call it the **$\sigma$-value function**. We interpret it as the lifetime value of following policy $\sigma$. A closely related interpretation was discussed at length for RDPs in {ref}`sss-polval` and the situation here is analogous. ```{prf:example} Let $\mM$ be an MDP. If $\aA_\mM$ is the ADP generated by $\mM$, as in {prf:ref}`eg-mdpsareadps`, then the unique fixed point of $T_\sigma$ is $v_\sigma = (I-\beta P_\sigma)^{-1} r_\sigma$. 
This accords with our interpretation of fixed points of $T_\sigma$ as lifetime values, since $(I-\beta P_\sigma)^{-1} r_\sigma$ is precisely the lifetime value of $\sigma$ under the MDP assumptions (see {ref}`sss-fmdpv`). ``` ```{prf:example} Let $\aA = (V, \{T_\sigma\})$, where each $T_\sigma$ is a Koopmans operator on $V$, as defined in {ref}`ss-koopop`. A fixed point of a Koopmans operator is interpreted as lifetime utility under the preferences it represents (see {ref}`ss-koopop`). Thus $v_\sigma$, when well-defined, is the lifetime value associated with policy $\sigma$ and the preferences embedded in $T_\sigma$. ``` We call an ADP $\aA \coloneq (V, \{T_\sigma\})$ **well-posed** if every policy operator $T_\sigma$ has a unique fixed point in $V$. In view of the preceding discussion on lifetime values, well-posedness is a minimum requirement for constructing an optimality theory around ADPs. #### Operators Let $\aA = (V, \{T_\sigma\})$ be an ADP. We set $$ \tmax \, v \coloneq \bigvee_\sigma T_\sigma \, v \qquad (v \in V), $$ (eq-tbve) and call $\tmax$ the **Bellman operator** generated by $\aA$. Note that $\tmax$ is a well-defined self-map on $V$ by part (iii) of the definition of ADPs (existence of greedy policies). A function $v \in V$ is said to satisfy the **Bellman equation** if it is a fixed point of $\tmax$. The definition of $\tmax$ in {eq}`eq-tbve` includes all of the Bellman operators we have met as special cases. For example, consider an RDP $\rR = (\Gamma, V, B)$ with Bellman operator $(Tv)(x) = \max_{a \in \Gamma(x)}B(x,a,v)$. We can write $Tv$ as $\bigvee_\sigma T_\sigma \, v$, as shown in {prf:ref}`ex-rdpac`. Thus, the Bellman operator of the RDP agrees with the Bellman operator $\tmax$ of the corresponding ADP $\aA_{\rR}$. ```{exercise} :label: ex-adps-auto-2 Show that 1. $\sigma \in \Sigma$ is $v$-greedy if and only if $T_\sigma \, v = \tmax \, v$, and 2.
$\tmax$ in {eq}`eq-tbve` is order preserving whenever $T_\sigma$ is order preserving for all $\sigma \in \Sigma$. ``` ```{solution} ex-adps-auto-2 The first part of the exercise is immediate from the definitions. For the second, take $v, w \in V$ with $v \preceq w$. Since $T_\sigma$ is order preserving, we have $T_\sigma \, v \preceq T_\sigma \, w$ for all $\sigma \in \Sigma$. Hence $T_\sigma \, v \preceq \tmax w$ for all $\sigma \in \Sigma$. Since $\tmax v = T_\sigma \, v$ for any $v$-greedy $\sigma$, it follows that $\tmax v \preceq \tmax w$. ``` Below we consider Howard policy iteration (HPI) as an algorithm for solving for optimal policies of ADPs. We use precisely the same instruction set as for the RDP case, as shown in {prf:ref}`algo-hpi_rdps`. To further clarify the algorithm, we define a map $\Hmax$ from $V$ to $\{v_\sigma\}$ via $\Hmax \, v = v_\sigma$ where $\sigma$ is $v$-greedy. Iterating with $\Hmax$ generates the value sequence associated with Howard policy iteration.[^1] In what follows, we call $\Hmax$ the **Howard operator** generated by the ADP. #### Properties Let $\aA \coloneq (V, \{T_\sigma\}_{\sigma \in \Sigma})$ be an ADP. We call $\aA$ - **finite** if $\Sigma$ is a finite set, - **order stable** if every policy operator $T_\sigma$ is order stable on $V$, and - **max-stable** if $\aA$ is order stable and $\tmax$ has at least one fixed point in $V$. Obviously max-stable $\implies$ order stable $\implies$ well-posed. Regarding the definition of max-stability, existence of a fixed point of $\tmax$ in $V$ is a high-level assumption that can be challenging to verify in applications. At the same time, our main concern in the present volume is the case where $\aA$ is finite, and, in this setting, order stability is enough: ```{prf:proposition} :label: p-fposet If $\aA$ is order stable and finite, then $\aA$ is max-stable. ``` {prf:ref}`p-fposet` is proved in {ref}`sss-port`. ```{prf:corollary} :label: c-srsal Let $\rR$ be an RDP and let $\aA_\rR$ be the ADP generated by $\rR$.
If $\rR$ is globally stable, then $\aA_\rR$ is max-stable. ``` ```{prf:proof} Let $\rR$ and $\aA_\rR$ be as stated and suppose that $\rR$ is globally stable. In view of {prf:ref}`l-gsusds`, each policy operator is order stable. Hence $\aA_\rR$ is order stable. Since $\Sigma$ is finite, {prf:ref}`p-fposet` implies that $\aA_\rR$ is also max-stable. ◻ ``` ```{exercise} :label: ex-adps-auto-3 Show that the ADP described in {prf:ref}`eg-qfacadp` is max-stable. ``` Order stability is central to the optimality results just stated. While order stability is a somewhat nonstandard condition, the next result shows that, at least in simple settings, order stability is necessary for any discussion of optimality. ```{prf:proposition} :label: p-osnec Let $\aA = (V, \{T_\sigma\})$ be an ADP generated by an RDP $\rR = (\Gamma, V, B)$. If $V$ is an order interval in $\RR^\Xsf$, then the following statements are equivalent: 1. $\aA$ is well-posed. 2. $\aA$ is order stable. ``` ```{prf:proof} Let $\aA$ be as stated, with $V = [v_1, v_2]$ for some $v_1, v_2$ in $\RR^\Xsf$ with $v_1 \leq v_2$. Obviously (ii) $\implies$ (i). Regarding (i) $\implies$ (ii), let $\aA$ be well-posed and pick any policy operator $T_\sigma$. Since $\aA$ is well-posed, $T_\sigma$ has a unique fixed point $v_\sigma$ in $V$. Suppose $v \in V$ with $T_\sigma \, v \leq v$. Since $T_\sigma$ is order preserving, $T_\sigma$ is a self-map on $[v_1, v]$. By the Knaster--Tarski theorem (p. ), $T_\sigma$ has at least one fixed point in $[v_1, v]$. By uniqueness, that fixed point is $v_\sigma$. Hence $v_\sigma \leq v$ and downward stability holds. Upward stability can be confirmed via a similar argument. Hence $\aA$ is order stable. ◻ ``` (sss-opres)= #### Max-Optimality Results Let $\aA = (V, \{T_\sigma\})$ be a well-posed ADP with $\sigma$-value functions $\{ v_\sigma \}_{\sigma \in \Sigma}$.
We define $$ V_\Sigma \coloneq \{v_\sigma\}_{\sigma \in \Sigma} \quad \text{and} \quad V_u \coloneq \setntn{v \in V}{v \preceq Tv}. $$ ```{exercise} :label: ex-adps-auto-4 Prove that $V_\Sigma \subset V_u$. ``` ```{solution} ex-adps-auto-4 For all $v \in V_\Sigma$, we have $v = v_\sigma$ for some $\sigma$, and hence $Tv \succeq T_\sigma \, v = T_\sigma \, v_\sigma = v_\sigma = v$. ``` If $V_\Sigma$ has a greatest element, then we denote it by $\vmax$ and call it the **value function** generated by $\aA$. In this setting, a policy $\sigma \in \Sigma$ is called **optimal** for $\aA$ if $v_\sigma = \vmax$. We say that $\aA$ obeys **Bellman's principle of optimality** if $$ \sigma \in \Sigma \text{ is optimal for } \aA \quad \iff \quad \sigma \text{ is } \vmax \text{-greedy}. $$ These definitions are direct generalizations of the corresponding definitions for RDPs discussed in {prf:ref}`c-rdps`. We can now state our main optimality result for ADPs. ```{prf:theorem} :label: t-fbk If $\aA$ is finite and order stable, then 1. the set of $\sigma$-value functions $V_\Sigma$ has a greatest element $\vmax$, 2. $\vmax$ is the unique solution to the Bellman equation in $V$, 3. $\aA$ obeys Bellman's principle of optimality, 4. $\aA$ has at least one optimal policy, and 5. HPI returns an exact optimal policy in finitely many steps. ``` {prf:ref}`t-fbk` informs us that finite well-posed ADPs have first-rate optimality properties under a relatively mild stability condition. In {ref}`ss-orr` we use {prf:ref}`t-fbk` to prove all optimality results for RDPs stated in {prf:ref}`c-rdps`. The proof of {prf:ref}`t-fbk` is given in {ref}`sss-port`. Note that (iv) follows directly from (i) and is included only for completeness. #### General States This volume focuses on dynamic programming problems with finite states. Here we restrict ourselves to one high-level result for general state spaces. ```{prf:proposition} :label: p-fbkc If $\aA$ is max-stable, then (i)--(iv) of {prf:ref}`t-fbk` hold.
``` {prf:ref}`p-fbkc` tells us that we can drop finiteness of the policy set $\Sigma$ (which is implied by finite states and actions) whenever the Bellman operator has at least one fixed point. Various fixed-point methods are available for establishing this existence. We defer further details until Volume 2. {prf:ref}`p-fbkc` is proved in {ref}`sss-port`. #### Application: Mixed Strategies This section discusses adding mixed strategies to an RDP. We will need to apply {prf:ref}`p-fbkc` to discuss optimality because the set of mixed strategies is not finite. Let $\rR = (\Gamma, V, B)$ be an RDP with finite state space $\Xsf$, finite action space $\Asf$, policy set $\Sigma$ and Bellman operator $T$ (see {ref}`ss-frdpot`). A **mixed strategy** for $\rR$ is a map $\phi$ sending $x \in \Xsf$ into a distribution $\phi_x \in \dD(\Asf)$ supported on $\Gamma(x)$. In other words, for each $x \in \Xsf$, $$ \phi_x \colon \Asf \to [0, 1] \quad \text{and} \quad \sum_{a \in \Gamma(x)} \phi_x(a) = 1. $$ Let $\Phi$ be the set of all mixed strategies for $\rR$. For each mixed strategy $\phi \in \Phi$, we introduce the policy operator on $V$ defined by $$ (\hat T_\phi \, v)(x) = \sum_{a \in \Asf} B(x, a, v) \phi_x(a) \qquad (v \in V, \; x \in \Xsf). $$ The right-hand side is the expected lifetime value from current state $x$, when the current action is drawn from $\phi_x$ and future states are evaluated via $v$. ```{exercise} :label: ex-msgreedy Fix $v \in V$. Prove: If $\phi \in \Phi$ and, for each $x \in \Xsf$, the distribution $\phi_x$ is supported on $\argmax_{a \in \Gamma(x)} B(x, a, v)$, then $\hat T_\phi \, v \geq \hat T_\psi \, v$ for all $\psi \in \Phi$. ``` ```{solution} ex-msgreedy This follows directly from {prf:ref}`ex-msmax`. ``` ```{exercise} :label: ex-mstmax Show that, given $v \in V$ and $x \in \Xsf$ we have $$ \max_{\phi \in \Phi} \, (\hat T_\phi \, v)(x) = \max_{a \in \Gamma(x)} B(x, a, v).
$$ ``` ```{solution} ex-mstmax This result follows from {prf:ref}`ex-msgreedy`, since, at each $x$, the maximizing distribution $\phi_x$ is supported on $\argmax_{a \in \Gamma(x)} B(x, a, v)$. ``` It follows from this discussion that $\aA_M \coloneq (V, \{\hat T_\phi\}_{\phi \in \Phi})$ is an ADP (where "M" stands for "mixed"), and that the Bellman operator $\hat T$ associated with the ADP $\aA_M$ is given by $$ (\hat T v)(x) = \max_{a \in \Gamma(x)} B(x, a, v) = (Tv)(x) \qquad (v \in V, \; x \in \Xsf). $$ (eq-msbell) Let us assume for simplicity that $\rR$ is contracting (see {ref}`sss-crdps`), with modulus of contraction $\beta \in (0,1)$. Assume also that $V$ is closed in $\RR^\Xsf$. As a result, the value function $v^*$ for $\rR$ exists in $V$ and is the unique fixed point of $T$ in $V$ ({prf:ref}`c-rdpcis`). ```{exercise} :label: ex-mixcon Show that, under the assumptions previously stated, $\{\hat T_\phi\}_{\phi \in \Phi}$ and $\hat T$ are all contraction mappings. ``` By {prf:ref}`ex-mixcon`, the ADP $\aA_M$ is max-stable (since globally stable operators are order stable -- see {prf:ref}`l-gsusds` -- and the Bellman operator $\hat T$ has a fixed point). Hence, by {prf:ref}`p-fbkc`, the value function $\hat v^*$ for $\aA_M$ exists in $V$ and is the unique fixed point of $\hat T$ in $V$. But, by {eq}`eq-msbell`, $\hat T$ and $T$ agree on $V$. Hence $\hat v^* = v^*$. We conclude as follows: while the set of mixed strategies is larger than the set of pure strategies (i.e., deterministic policies), the maximal lifetime value from each state is the same. (ss-orr)= ### Optimality Results for RDPs In this section, we return to the optimality properties of RDPs, as first discussed in {ref}`sss-rdp_opres`. Our aim is to connect the ADP optimality results from {ref}`sss-opres` to the special case of RDPs and, through this process, complete the proofs of our key RDP optimality results from {prf:ref}`c-rdps`. 
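As a numerical aside on the mixed-strategy application above, the following sketch checks that averaging $B(x, a, v)$ over a mixed strategy never beats the pure maximum, and that a strategy concentrated on a maximizing action attains it. The array `B_vals` is a hypothetical table of values $B(x, a, v)$ at one fixed $v$, with every action assumed feasible at every state; none of these names come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3

# Hypothetical values B(x, a, v) at a fixed v. Assumption: Gamma(x) is the
# full action set at every state, so no feasibility mask is needed.
B_vals = rng.normal(size=(n_states, n_actions))

def T_hat_phi(phi, B_vals):
    """(T_hat_phi v)(x) = sum_a B(x, a, v) phi_x(a); each row of phi is a distribution."""
    return (B_vals * phi).sum(axis=1)

# Bellman operator of the pure-strategy problem: (T v)(x) = max_a B(x, a, v)
Tv = B_vals.max(axis=1)

# Draw many random mixed strategies; shape (1000, n_states, n_actions)
phis = rng.dirichlet(np.ones(n_actions), size=(1000, n_states))
dominated = all(np.all(T_hat_phi(phi, B_vals) <= Tv + 1e-12) for phi in phis)

# A mixed strategy that puts all mass on a maximizing action at each state
greedy = np.zeros((n_states, n_actions))
greedy[np.arange(n_states), B_vals.argmax(axis=1)] = 1.0
attained = np.allclose(T_hat_phi(greedy, B_vals), Tv)

print(dominated, attained)
```

Random sampling only illustrates the inequality for the strategies drawn; the exercises in the text supply the actual argument.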
(ss-aopi)= #### OPI Convergence The first step is to provide some preliminary results related to OPI convergence, where OPI obeys the algorithm given. Throughout, $\rR = (\Gamma, V, B)$ is a globally stable RDP with policy set $\Sigma$, policy operators $\{T_\sigma\}$, Bellman operator $T$, and value function $v^*$. As usual, $v_\sigma$ denotes the unique fixed point of $T_\sigma$ for all $\sigma \in \Sigma$. In the results that follow, $m$ is a fixed natural number indicating the OPI step size and $H$ and $W_m$ are as defined in {ref}`sss-rdpalgos`. ```{prf:lemma} :label: l-adpvficon If $v \in V_\Sigma \,$, then $T^k v \to v^*$ as $k \to \infty$. ``` ```{prf:proof} Fix $v \in V_\Sigma$. On one hand, $v \leq v^*$ and hence $T^k v \leq T^k v^* = v^*$ for all $k$. On the other hand, if $\sigma$ is any policy, then $T_\sigma^k \, v \leq T^k v$ for all $k$. Hence $T_\sigma^k \, v \leq T^k v \leq v^*$ for all $k$. If we now take $\sigma$ to be an optimal policy, which exists under the stated assumptions, we have $T_\sigma^k \, v \to v_\sigma = v^*$ as $k \to \infty$. Hence $T^k v \to v^*$, as required. ◻ ``` ```{prf:lemma} :label: l-opio The OPI operator $W_m$ is a self-map on $V_u$ and $$ v \in V_u \implies Tv \leq W_m v \leq T^m v \quad \text{for all } m \in \NN. $$ ``` ```{prf:proof} Regarding the self-map property, pick any $v \in V_u$. Since $T$ and $T_\sigma$ are order preserving, $v \leq Tv$ and $\sigma$ is $v$-greedy, we have $$ W_m v = T_\sigma T_\sigma^{m-1} v \leq T T_\sigma^{m-1} v \leq T T_\sigma^{m-1} T v = T T_\sigma^m v = TW_m v . $$ Hence $W_m v \in V_u$ and $W_m$ is invariant on $V_u$. To obtain the inequality $Tv \leq W_m v$, fix $v \in V_u$. Since $T_\sigma$ is order preserving, $v \leq T v$ and $\sigma$ is $v$-greedy, we have $$ T_\sigma^{m-1} v \leq T_\sigma^{m-1} T v = T_\sigma^{m-1} T_\sigma \, v = W_m v. $$ Continuing, in the same manner, gives $T_\sigma^{m-j} v \leq W_m v$ for $j < m$ and, in particular, $T_\sigma v \leq W_m v$. 
Because $\sigma$ is $v$-greedy, this yields $Tv \leq W_m v$. Regarding the second inequality, we use the fact that $T_\sigma \leq T$ on $V$ and $T$ and $T_\sigma$ are both order preserving to obtain $W_m v = T^m_\sigma v \leq T^m v$ (see {prf:ref}`ex-dompower`). ◻ ``` ```{prf:lemma} :label: l-rdpopiv For each $v \in V_u$ we have $T^k v \leq W_m^k v$ for all $k \in \NN$. ``` ```{prf:proof} Fix $v \in V_u$. Let $v_k = T^k v$ and $w_k = W_m^k v$ for all $k$. The claim is true at $k=1$ by {prf:ref}`l-opio`. Suppose it is true at $k-1$, so that $v_{k-1} \leq w_{k-1}$. We claim it is true at $k$ as well. To show this we take $\sigma$ to be $w_{k-1}$-greedy and, using the fact that $v \in V_u$ and $W_m V_u \subset V_u$, obtain $w_{k-1} \leq T w_{k-1} = T_\sigma \, w_{k-1}$. Since $T_\sigma$ is order preserving, this means that the sequence $(T^\ell_\sigma \, w_{k-1})_{\ell \in \NN}$ is increasing. As a result, we have $$ v_k = T v_{k-1} \leq T w_{k-1} = T_\sigma \, w_{k-1} \leq T_\sigma^m \, w_{k-1} = W_m w_{k-1} = w_k. $$ This proves the claim in {prf:ref}`l-rdpopiv`. ◻ ``` ```{prf:lemma} :label: l-opiifcon Let $v_0$ be any element of $V_\Sigma$ and let $v_k = W_m^k v_0$ for all $k \in \NN$. If $v_k = v_{k+1}$ for some $k \in \NN$, then $v_k = \vmax$ and every $v_k$-greedy policy is optimal. ``` ```{prf:proof} Let the sequence $(v_k)$ be as stated and suppose that $v_k = v_{k+1}$. Let $\sigma$ be $v_k$-greedy. It follows that $T^m_\sigma \, v_k = v_k$ and, moreover, $v_k \leq T v_k = T_\sigma \, v_k$, where the inequality holds because $v_k \in V_u$. As a result, $$ v_k \leq T_\sigma \, v_k \leq T^m_\sigma \, v_k = v_k. $$ In particular, $T v_k = T_\sigma \, v_k = v_k$, which in turn gives $v_k = \vmax$. Bellman's principle of optimality now implies that every $v_k$-greedy policy is optimal.
◻ ``` ```{prf:lemma} :label: l-vfigc If $(v_k) \subset V_u$ and $v_k \to v^*$ as $k \to \infty$, then there exists a $K \in \NN$ such that $$ k \geq K \implies \text{every $v_k$-greedy policy is optimal}. $$ ``` ```{prf:proof} Let $\rR$ be as stated and fix $(v_k) \subset V_u$ with $v_k \to v^*$ as $k \to \infty$. Let $\Sigma^*$ be the set of optimal policies and let $\Sigma' \coloneq \Sigma \setminus \Sigma^*$. Since $\Sigma'$ is finite, we have $$ e \coloneq \min_{\sigma \in \Sigma'} \|v_\sigma - v^*\|_\infty > 0. $$ Choose $K \in \NN$ such that $\|v_k - v^*\|_\infty < e$ for all $k \geq K$. Fix $k \geq K$ and let $\sigma$ be $v_k$-greedy. We claim that $\sigma$ is optimal. Indeed, since $v_k \in V_u$, we have $v_k \leq T v_k = T_\sigma \, v_k$, so, by upward stability, $v_k \leq v_\sigma$. As a result, $$ |v^* - v_\sigma | = v^* - v_\sigma \leq v^* - v_k . $$ Hence $\|v^* - v_\sigma\|_\infty \leq \| v^* - v_k \|_\infty < e$. But then $\sigma \notin \Sigma'$, so $\sigma$ is optimal. ◻ ``` #### Proofs of RDP Results In {ref}`ss-frdpot` we stated two key optimality results for RDPs, the first concerning globally stable RDPs ({prf:ref}`t-fbk_rpd`) and the second concerning bounded RDPs ({prf:ref}`t-fbk_rpd_bounded`). Let's now prove them. In what follows, $\rR = (\Gamma, V, B)$ is a well-posed RDP and $\aA_\rR \coloneq (V, \{T_\sigma\})$ is the ADP generated by $\rR$. ```{prf:proof} *Proof of {prf:ref}`t-fbk_rpd`.* Let $\rR$ be globally stable. Then $\aA_\rR$ is finite and max-stable, by {prf:ref}`c-srsal`. Hence the optimality and HPI convergence claims in {prf:ref}`t-fbk_rpd` follow from {prf:ref}`t-fbk`. Regarding OPI convergence, let $(v_k, \sigma_k)$ be as given in {eq}`eq-opivp`. From {prf:ref}`l-adpvficon` we obtain $T^k v_0 \to \vmax$. Also, from {prf:ref}`l-rdpopiv`, we have $T^k v_0 \leq v_k$ for all $k$.
In fact we also have $T^k v_0 \leq v_k \leq \vmax$ for all $k$, where the second inequality holds because $W_m$ has the property $W_m w \leq \vmax$ whenever $w \leq \vmax$. (If $w \leq \vmax$, then, taking $\sigma$ to be $w$-greedy, we have $T_\sigma \, w = T w \leq T \vmax = \vmax$, so, iterating $m$ times on this inequality, $W_m w \leq \vmax$.) The convergence $T^k v_0 \to \vmax$ and the bound $T^k v_0 \leq v_k \leq v^*$ for all $k$ together imply $v_k \to \vmax$ as $k \to \infty$. Given such convergence, {prf:ref}`l-vfigc` implies that there exists a $K \in \NN$ such that $\sigma_k$ is optimal whenever $k \geq K$. ◻ ``` ```{prf:proof} *Proof of {prf:ref}`t-fbk_rpd_bounded`.* Let $\rR = (\Gamma, V, B)$ be a bounded and well-posed RDP. In view of {prf:ref}`ex-boured`--{prf:ref}`ex-boured2`, it suffices to prove the optimality claims in {prf:ref}`t-fbk_rpd_bounded` for the reduced RDP $\hat \rR = (\Gamma, \hat V, B)$, where $\hat V$ is the order interval in $\RR^\Xsf$ generated by the bounding functions (i.e., $\hat V = [v_1, v_2]$). Let $\aA$ be the ADP generated by $\hat \rR$. By {prf:ref}`p-osnec`, $\aA$ is order stable. {prf:ref}`c-srsal` now implies that $\aA$ is max-stable. Hence the claims in {prf:ref}`t-fbk_rpd_bounded` follow from {prf:ref}`t-fbk`. ◻ ``` (ss-minim)= ### Min-Optimality Until now, our ADP theory has focused on maximization of lifetime values. Now we turn to minimization. One of our aims is to prove the RDP minimization results in {ref}`ss-minim_rdps`. We will see that ADP minimization results are easily recovered from ADP maximization results via order duality. Let $\aA = (V, \{T_\sigma\})$ be a well-posed ADP and let $V_\Sigma \coloneq \{v_\sigma\}$ be the set of $\sigma$-value functions. We call $\sigma \in \Sigma$ **min-optimal** for $\aA$ if $v_\sigma$ is a least element of $V_\Sigma$. When $V_\Sigma$ has a least element we denote it by $\vmin$ and call it the **min-value function** generated by $\aA$.
A policy $\sigma$ is called **$v$-min-greedy** if $T_\sigma \, v \preceq T_\tau \, v$ for all $\tau \in \Sigma$. Existence of a $v$-min-greedy policy for each $v \in V$ is guaranteed by the definition of ADPs. We say that $\aA$ obeys **Bellman's principle of min-optimality** if $$ \sigma \in \Sigma \text{ is min-optimal for } \aA \quad \iff \quad \sigma \text{ is } \vmin \text{-min-greedy}. $$ We define the **Bellman min-operator** corresponding to $\aA$ as the self-map $\tmin$ on $V$ defined by $\tmin v = \bigwedge_\sigma T_\sigma \, v$. This map is well-defined because $\{T_\sigma \, v\}_{\sigma \in \Sigma}$ has a least element and, moreover, $\sigma \in \Sigma$ is $v$-min-greedy if and only if $T_\sigma \, v = \tmin \, v$. We say that $v$ satisfies the **Bellman min-equation** if $\tmin v = v$. We call $\aA$ **min-stable** if $\aA$ is order stable and $\tmin$ has at least one fixed point in $V$. We define $\Hmin$ from $V$ to $\{v_\sigma\}$ via $\Hmin \, v = v_\sigma$ where $\sigma$ is $v$-min-greedy and call $\Hmin$ the **Howard min-operator** generated by $\aA$. Iterating with $\Hmin$ is called **min-HPI**. Results analogous to {prf:ref}`t-fbk` hold for the minimization case. ```{prf:theorem} :label: t-fbk_min If $\aA$ is min-stable, then 1. the min-value function $\vmin$ generated by $\aA$ exists in $V$, 2. $\vmin$ is the unique solution to the Bellman min-equation in $V$, 3. $\aA$ obeys Bellman's principle of min-optimality, and 4. $\aA$ has at least one min-optimal policy. If, in addition, $\Sigma$ is finite, then min-HPI converges to $\vmin$ in finitely many steps. ``` To prove {prf:ref}`t-fbk_min` we use order duality. Below, if $\aA \coloneq (V, \{T_\sigma\})$ is an ADP then its **dual** is $$ \aA^\partial \coloneq (V^\partial, \{T_\sigma\}) \; \text{ where } V^\partial \text{ is the order dual of } V. 
$$ In this setting, we let $\tmax^\partial$ be the Bellman operator for $\aA^\partial$, $(\vmax)^\partial$ be the value function for $\aA^\partial$, and so on. We note that $\aA$ is self-dual, in the sense that $(\aA^\partial)^\partial = \aA$, since the same is true for $V$. To make our terminology more symmetric, in the remainder of this section we refer to maximization-based optimal policies as **max-optimal**, the Bellman operator $\tmax$ defined by $\tmax \, v = \bigvee_\sigma T_\sigma \, v$ as the **Bellman max-operator**, and so on. ```{exercise} :label: ex-mmp Let $\aA$ be a well-posed ADP with dual $\aA^\partial$. Verify the following. 1. Given $v \in V$, $\sigma \in \Sigma$ is $v$-min-greedy for $\aA$ if and only if $\sigma$ is $v$-max-greedy for $\aA^\partial$, 2. $\tmin = \tmax^\partial$ and $\tmin^\partial = \tmax$, 3. $\Hmin = \Hmax^\partial$ and $\Hmin^\partial = \Hmax$, 4. $\aA$ is order stable if and only if $\aA^\partial$ is order stable, 5. $\aA$ is min-stable if and only if $\aA^\partial$ is max-stable, and, in this case, $\vmin = (\vmax)^\partial$, and 6. $\sigma \in \Sigma$ is max-optimal for $\aA$ if and only if $\sigma$ is min-optimal for $\aA^\partial$. ``` ```{solution} ex-mmp Regarding (i), fix $v \in V$. Policy $\sigma$ is $v$-min-greedy for $\aA$ if and only if $T_\sigma \, v \preceq T_\tau \, v$ for all $\tau \in \Sigma$, which is equivalent to $T_\sigma \, v \succeq^\partial T_\tau \, v$ for all $\tau \in \Sigma$. Hence $\sigma$ is $v$-min-greedy for $\aA$ if and only if $\sigma$ is $v$-max-greedy for $\aA^\partial$. Regarding (ii)--(iii), fix $v \in V$ and let $\sigma$ be $v$-min-greedy for $\aA$ (and hence $v$-max-greedy for $\aA^\partial$). We then have $\tmax^\partial v = T_\sigma \, v = \tmin v$. Hence $\tmax^\partial = \tmin$. Similarly, at the same $v$ and with the same policy $\sigma$, $\Hmax^\partial v$ is equal to $v_\sigma$ and so is $\Hmin \, v$. A similar argument gives $\tmin^\partial = \tmax$ and $\Hmin^\partial = \Hmax$.
Regarding (iv), {prf:ref}`l-odod` implies that $\aA$ is order stable if and only if $\aA^\partial$ is order stable. Regarding (v), $\tmin = \tmax^\partial$, so $\tmin$ has a fixed point in $V$ if and only if $\tmax^\partial$ has a fixed point in $V$. By this fact and (iv), $\aA$ is min-stable if and only if $\aA^\partial$ is max-stable. Moreover, in this setting, we have $\vmin = \bigwedge_\sigma v_\sigma = \bigvee_\sigma^\partial v_\sigma = (\vmax)^\partial$. Part (vi) follows from similar analysis and details are left to the reader. ``` Self-duality implies corollaries to {prf:ref}`ex-mmp` which we treat as self-evident. For example, $\aA$ is max-stable if and only if $\aA^\partial$ is min-stable, which follows from part (v) and the fact that $(\aA^\partial)^\partial = \aA$. ```{prf:proof} *Proof of {prf:ref}`t-fbk_min`.* Let $\aA$ be min-stable. By {prf:ref}`ex-mmp`, the dual $\aA^\partial$ is max-stable. Hence all of the conclusions of the max-optimality result in {prf:ref}`t-fbk` apply to $\aA^\partial$. All that remains is to translate these max-optimality results for $\aA^\partial$ back to min-optimality results for $\aA$. Regarding claim (i) of the min-optimality results, the max-optimality results for $\aA^\partial$ imply that $(\vmax)^\partial$ exists in $V$. But then $\vmin$ exists in $V$, since, by {prf:ref}`ex-mmp`, $\vmin = (\vmax)^\partial$. Regarding (ii), we know that $(\vmax)^\partial$ is the unique solution to $\tmax^\partial (\vmax)^\partial = (\vmax)^\partial$, so, applying {prf:ref}`ex-mmp` again, we have $\tmin \, \vmin = \vmin$. The remaining steps of the proof are similar and left to the reader. ◻ ``` (s-cn_abstract)= ## Chapter Notes As indicated in the notes for {prf:ref}`c-rdps`, our interest in abstract dynamic programming was inspired by {cite:t}`bertsekas2022abstract`. This chapter generalizes his framework by switching to a "completely abstract" setting based on analysis of self-maps on a partially ordered space.
The material here is based on {cite:t}`sargent2023completely`. Earlier work on dynamic programming in a setting with no topology can be found in {cite:t}`kamihigashi2014elementary`. [^1]: For $\Hmax$ to be well-defined, we must always select the same $v$-greedy policy when the operator is applied to $v$. We can use the axiom of choice to assign to each $v$ a designated $v$-greedy policy, although, in applications, a simple rule usually suffices. For example, if $\Sigma$ is finite, we can enumerate the policy set $\Sigma$ and choose the first $v$-greedy policy. ======================================================================== ## Continuous Time (c-ctime)= # Continuous-Time Earlier chapters treated dynamics in discrete time. Now we switch to continuous time. We restrict ourselves to finite state spaces, where continuous-time processes are pure jump processes. This allows us to provide a rigorous and self-contained treatment, while laying foundations for a treatment of general state problems. ## Continuous-Time Markov Chains In this section, we introduce continuous-time Markov models. In {ref}`s-ctmdp`, we will use them as components of continuous-time Markov decision processes. (ss-backg)= ### Background In {ref}`ss-markchain` we learned that if $(X_t) = (X_0, X_1, \ldots)$ is $P$-Markov, then the distributions $(\psi_t)$ of the state process obey $\psi_{t+1} = \psi_t P$ for all $t$. This update rule is a linear difference equation in distribution space, which in turn suggests that, once we switch to continuous-time, distributions will evolve according to linear *differential* equations in distribution space. This idea turns out to be correct. As such, we begin this chapter with some facts about linear differential equations. #### Scalar Exponentials Solutions to linear differential equations involve exponential functions. 
The real-valued **exponential function** can be defined by the power series $$ \me^x :=: \exp(x) \coloneq \sum_{k \geq 0} \frac{x^k}{k!} \qquad (x \in \RR). $$ (eq-expser) ```{prf:example} :label: eg-balance If $u_t$ is the balance of a savings account that pays a continuously compounded interest rate $r$, then the balance evolves according to $$ \dot u_t \coloneq \frac{\diff}{\diff t} u_t = r u_t \quad \text{for all } \; t \geq 0 \quad \text{with initial balance } u_0 \text{ given}. $$ (eq-odode) We understand {eq}`eq-odode` as a functional equation whose solution is an element $t \mapsto u_t$ of $C_1(\RR_+, \RR)$, the set of continuously differentiable functions from $\RR_+$ to $\RR$, that satisfies {eq}`eq-odode`. We claim that $u_t \coloneq \me^{r t} u_0$ is the only solution to {eq}`eq-odode` in $C_1(\RR_+, \RR)$. It is easy to check that this choice of $u_t$ obeys {eq}`eq-odode`. As for uniqueness, suppose that $t \mapsto y_t$ is another solution in $C_1(\RR_+, \RR)$, so that $\dot y_t = r y_t$ for all $t \geq 0$ and $y_0 = u_0$. Then $$ \frac{\diff}{\diff t} \left( y_t \, \me^{-rt} \right) = \dot y_t \, \me^{-rt} - r y_t \, \me^{-rt} = r y_t \, \me^{-rt} - r y_t \, \me^{-rt} = 0, $$ so $t \mapsto y_t \, \me^{-rt}$ is constant on $\RR_+$, implying existence of a $c \in \RR$ such that $y_t = c \, \me^{rt}$ for all $t \geq 0$. Setting $t=0$ and using the initial condition gives $c=u_0$. Hence, at any $t$, we have $y_t = \me^{rt} u_0 = u_t$. ``` The continuous-time system in {prf:ref}`eg-balance` is closely related to the discrete time difference equation $u_{t+1} = \me^{r} u_t$. Indeed, if we start at $u_0$, then the $t$-th iterate is $\me^{r t} u_0$, so solutions agree at integer times. We can think of the continuous-time system as one that interpolates between points in time of a corresponding discrete time system. The exponential $\me^\lambda$ of $\lambda = a + i b \in \CC$ can also be defined via {eq}`eq-expser`. 
From the identity $\me^{ib} = \cos(b) + i \sin(b)$, we obtain $$ \me^{\lambda} = \me^{a + ib} = \me^{a}(\cos(b) + i \sin(b)). $$ (eq-expcom) This equation will soon prove useful. (sss-expsid)= #### The Exponential Distribution A random variable $W$ is said to be **exponentially distributed** with rate $\theta$, and we write $W \eqdist \Exp(\theta)$, when the counter CDF $G$ satisfies $$ G(t) \coloneq \PP\{W > t\} = \me^{- \theta t} \qquad (t \geq 0). $$ Continuous-time Markov chains have a close relationship with the exponential distribution, a fact that stems from its being the only distribution having the **memoryless** property $$ \PP \{W > s + t \given W > s \} = \PP \{W > t\} \quad \text{for all } s, t > 0. $$ (eq-memoryless) ```{exercise} :label: ex-vmeml Verify that {eq}`eq-memoryless` holds when $W \eqdist \Exp(\theta)$. ``` ```{solution} ex-vmeml If $W \eqdist \Exp(\theta)$ and $s, t > 0$, then $$ \frac{ \PP \{W > s + t \text{ and } W > s \} } {\PP \{W > s\}} = \frac{ \PP \{W > s + t \} } {\PP \{W > s\}} = \frac{\me^{-\theta s - \theta t}}{\me^{-\theta s}} = \me^{-\theta t}. $$ This is equivalent to {eq}`eq-memoryless`. ``` The memoryless property is special. For example, the probability that an individual human being lives 70 years from birth is not equal to the probability that he or she lives another 70 years conditional on having reached age 70. In fact, the exponential distribution is the *only* memoryless distribution supported on the nonnegative reals: ```{prf:lemma} :label: l-expom If $W$ has counter CDF $G$ satisfying $0 < G(t) < 1$ for all $t > 0$, then the following statements are equivalent: 1. $W \eqdist \Exp(\theta)$ for some $\theta > 0$. 2. $W$ satisfies the memoryless property in {eq}`eq-memoryless`. ``` ```{prf:proof} {prf:ref}`ex-vmeml` treats (i) $\Rightarrow$ (ii). As for (ii) $\Rightarrow$ (i), suppose (ii) holds. Then $G$ has three properties: 1. $G$ is decreasing on $\RR_+$ (as is any counter CDF), 2.
$0 < G(t) < 1$ for all $t > 0$, and 3. $G(s + t) = G(s) G(t)$ for all $s, t > 0$. From (a)--(c) we will show that $$ G(t) = G(1)^t \qquad \text{for all } t \geq 0. $$ (eq-implex) This is sufficient to prove (i) because then $\theta \coloneq - \ln G(1)$ is a positive real number (by (b)) and, furthermore, $$ G(t) = \exp\{ \ln [ G(1)^t ] \} = \exp\{ \ln [ G(1) ] t \} = \exp( - \theta t). $$ To see that {eq}`eq-implex` holds, fix $m,n \in \NN$. We can use (c) to obtain both $G(m/n) = G(1/n)^m$ and $G(1) = G(1/n)^n$. It follows that $G(m/n)^n = G(1/n)^{m n} = G(1)^m$ and, raising to the power of $1/n$, we get {eq}`eq-implex` when $t=m/n$. The discussion so far confirms that {eq}`eq-implex` holds when $t$ is rational. So now take any $t \geq 0$ and rational sequences $(a_n)$ and $(b_n)$ converging to $t$ with $a_n \leq t \leq b_n$ for all $n$. By (a) we have $G(b_n) \leq G(t) \leq G(a_n)$ for all $n$, so $G(1)^{b_n} \leq G(t) \leq G(1)^{a_n}$ for all $n \in \NN$. Taking the limit in $n$ completes the proof. ◻ ``` #### Extension to Matrices The real exponential formula {eq}`eq-expser` extends to the **matrix exponential** via $$ \me^A \coloneq I + A + \frac{A^2}{2!} + \cdots = \sum_{k \geq 0} \frac{A^k}{k!}, $$ (eq-defexpmat) where $A$ is any square matrix. As we will see, the matrix exponential plays a key role in the solution of vector-valued linear differential equations. ```{exercise} :label: ex-ctime-auto-1 Let $A$ be $n \times n$ and let $\| \cdot \|$ be the operator norm. Show that {eq}`eq-defexpmat` converges, in the sense that $\| \sum_{k = 0}^m \frac{A^k}{k!} \|$ is bounded in $m$. ``` ```{solution} ex-ctime-auto-1 Using the triangle inequality and submultiplicative property of the matrix norm, we have $$ \left\| \sum_{k = 0}^m \frac{A^k}{k!} \right\| \leq \sum_{k = 0}^m \frac{\| A^k \|}{k!} \leq \sum_{k = 0}^m \frac{\| A\|^k }{k!} \leq \me^{\|A\|}, $$ where the last term uses the ordinary (scalar) exponential function defined in {eq}`eq-expser`.
(If you also want to prove that the scalar series in {eq}`eq-expser` converges, you can do so via the ratio test.) ``` ```{prf:lemma} :label: l-expcom Let $A$ and $B$ be square matrices. 1. If $A$ is diagonalizable with $A = P D P^{-1}$, then $\me^{A} = P \me^{D} P^{-1}$. 2. If $A$ and $B$ commute (i.e. $AB = BA$), then $\me^{A + B} = \me^{A} \me^{B}$. 3. If $m$ is any positive integer, then $\me^{mA} = (\me^{A})^m$. 4. $\lambda$ is an eigenvalue of $A$ if and only if $\me^\lambda$ is an eigenvalue of $\me^{A}$. 5. The function $\RR \ni t \mapsto \me^{t A}$ is differentiable in $t$, with $$ \frac{\diff}{\diff t} \me^{tA} = A \me^{t A} = \me^{t A} A. $$ (eq-matdiffl) 6. $\me^{A^\top} = (\me^A)^\top$. 7. The fundamental theorem of calculus holds, in the sense that $$ \me^{tA} - \me^{sA} = \int_s^t \me^{\tau A} A \diff \tau \quad \text{for all } s \leq t. $$ (eq-ftcexp) ``` In {prf:ref}`l-expcom` and what follows, integration or differentiation of a vector- or matrix-valued function is carried out element by element. For example, to differentiate a matrix $B(t) = (b_{ij}(t))$ that depends on $t$, we form a new matrix by differentiating each element $b_{ij}(t)$ with respect to $t$. The integral $\int_a^b B(t) \diff t$ is the matrix of integrals $\int_a^b b_{ij}(t) \diff t$. ```{exercise} :label: ex-ctime-auto-2 Prove part (i) of {prf:ref}`l-expcom`. ``` ```{solution} ex-ctime-auto-2 Given $A = P D P^{-1}$ we have $A^k = P D^k P^{-1}$ for all $k$, so $$ \me^A = \sum_{k \geq 0} \frac{A^k}{k!} = \sum_{k \geq 0} \frac{P D^k P^{-1}}{k!} = P \sum_{k \geq 0} \frac{D^k}{k!} P^{-1} = P \me^D P^{-1}. $$ ``` The proof of part (ii) of {prf:ref}`l-expcom` uses the definition of the exponential and the binomial formula. See, for example, {cite:t}`hirsh1974`. Part (iii) follows directly from part (ii). Part (iv) follows easily from part (i) when $A$ is diagonalizable (and can be proved more generally via the Jordan canonical form).
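Several parts of {prf:ref}`l-expcom` can be checked numerically. The sketch below approximates $\me^A$ by truncating the power series in {eq}`eq-defexpmat` and then tests parts (i)--(iv) and (vi) on a small example. The helper `expm_series`, the truncation length, and the matrices `A` and `B` are illustrative assumptions rather than anything taken from the book's companion code.

```python
import numpy as np

def expm_series(A, terms=60):
    """Approximate e^A by truncating the power series sum_k A^k / k!."""
    A = np.asarray(A, dtype=float)
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k      # term now equals A^k / k!
        out = out + term
    return out

# Hypothetical test matrices: A has real, distinct eigenvalues (-0.5 and -1.3),
# and B is a polynomial in A, so that AB = BA.
A = np.array([[-1.0, 0.5],
              [0.3, -0.8]])
B = 2.0 * A + np.eye(2)

# Part (i): e^A = P e^D P^{-1} when A = P D P^{-1}
eigvals, P = np.linalg.eig(A)
expA_diag = (P @ np.diag(np.exp(eigvals)) @ np.linalg.inv(P)).real
check_i = np.allclose(expm_series(A), expA_diag)

# Part (ii): e^{A+B} = e^A e^B for commuting A and B
check_ii = np.allclose(expm_series(A + B), expm_series(A) @ expm_series(B))

# Part (iii): e^{mA} = (e^A)^m, here with m = 3
check_iii = np.allclose(expm_series(3 * A),
                        np.linalg.matrix_power(expm_series(A), 3))

# Part (iv): the eigenvalues of e^A are e^lambda for lambda in the spectrum of A
eig_exp = np.sort(np.linalg.eigvals(expm_series(A)).real)
check_iv = np.allclose(eig_exp, np.sort(np.exp(eigvals).real))

# Part (vi): the exponential of the transpose is the transpose of the exponential
check_vi = np.allclose(expm_series(A.T), expm_series(A).T)

print(check_i, check_ii, check_iii, check_iv, check_vi)
```

The truncated series is used only because it mirrors the definition; for matrices with large norm a library routine such as `scipy.linalg.expm` is a more reliable choice.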
```{exercise} :label: ex-diffe Prove (v) of {prf:ref}`l-expcom`. A good starting point is to observe that, for any $t \in \RR$, $$ \frac{\diff}{\diff t} \me^{t A} = \lim_{h \to 0} \frac{\me^{t A + h A} - \me^{t A}}{h} = \me^{t A} \lim_{h \to 0} \frac{\me^{h A} - I}{h}. $$ (eq-diffe0) ``` ```{solution} ex-diffe We use the definition $\me^A = \sum_{k \geq 0} \frac{A^k}{k!}$ for the proof and fix $t \in \RR$. A common argument for differentiating $\me^{t A}$ with respect to $t$ is to take the derivative through the infinite sum to get $$ \frac{\diff}{\diff t} \me^{tA} = \left( A + t \frac{A^2}{1!} + t^2 \frac{A^3}{2!} + \cdots \right) = A \me^{t A}. $$ But this is not fully rigorous, since we have not justified interchange of limits. A better answer is to start with {eq}`eq-diffe0`, which gives $$ \frac{\diff}{\diff t} \me^{t A} = \me^{t A} \lim_{h \to 0} \frac{\me^{h A} - I}{h}, $$ and note that $$ \frac{\me^{h A} - I}{h} = A + \frac{1}{2!} h A^2 + \frac{1}{3!} h^2 A^3 + \cdots, $$ which converges to $A$ as $h \to 0$. ``` ```{exercise} :label: ex-invexp Using {prf:ref}`l-expcom`, show that, for any $n \times n$ matrix $A$, the matrix $\me^{A}$ is invertible, with inverse $\me^{-A}$. ``` ```{solution} ex-invexp Fix $A$ in $\matset{n}{n}$ and let $B=-A$. Evidently $AB = BA$, so $\me^A \me^B = \me^{A - A} = \me^0$. It is easy to check that $\me^0 = I$, so $\me^A \me^{-A} = I$. ``` As for (vii), we are drawing an analogy with the fundamental theorem of calculus for scalar-valued functions, which states that $f(t) - f(s) = \int_s^t f'(\tau) \diff \tau$ for all $s \leq t$, where $f'$ is the derivative of $f$. ```{exercise} :label: ex-ctime-auto-3 Prove part (vii) of {prf:ref}`l-expcom`. (Hint: use part (v).) ``` ```{solution} ex-ctime-auto-3 Fix $i,j$ with $1 \leq i,j \leq n$, let $e_k$ be the $k$-th canonical basis vector and let $f$ be the function on $\RR$ defined by $f(t) = \inner{e_i, \me^{tA} e_j}$.
Part (v) of {prf:ref}`l-expcom` tells us that $f'(t) = \inner{e_i, \me^{tA} A e_j}$. By the fundamental theorem of calculus, we have $f(t) - f(s) = \int_s^t f'(\tau) \diff \tau$, or $$ \inner{e_i, \me^{tA} e_j} - \inner{e_i, \me^{sA} e_j} = \int_s^t \inner{e_i, \me^{\tau A} A e_j} \diff \tau. $$ As this is true for any $i, j$, we have $\me^{tA} - \me^{sA} = \int_s^t \me^{\tau A} A \diff \tau$, which is what we need to prove. ``` (sss-vvivp)= ### Continuous-Time Flows Next, we study solutions of multivariate differential equations, with a focus on linear systems. These results lay foundations for our study of continuous-time Markov dynamics in {ref}`ss-tmarca`. (sss-ctds)= #### Continuous-Time Dynamical Systems Recall from {ref}`ss-cops` that a discrete dynamical system is a pair $(U, S)$, where $U$ is a set and $S$ is a self-map on $U$. Trajectories are sequences $(S^t u)_{t \geq 0} = (u, Su, S^2 u, \ldots)$, where $u \in U$ is the initial condition. These ideas can be extended to continuous time by considering a pair $(U, (S_t)_{t \geq 0})$ where $U$ is any set and $S_t$ is a self-map on $U$ for each $t \in \RR_+$. The interpretation is that if $u \in U$ is the current state of the system, then $S_t u$ will be the state after $t$ units of time. ```{prf:example} For the savings account in {prf:ref}`eg-balance` with solution $u_t \coloneq \me^{r t} u_0$, we can take $U = \RR$ and $S_t u = \me^{rt} u$. Then the state $S_t u$ at time $t$ is the balance at time $t$ associated with initial deposit $u$. ``` In general, to understand $(U, (S_t)_{t \geq 0})$ as a continuous-time dynamical system, we require that (a) $S_0$ is the identity map, so that the state after zero units of time is just the initial condition, and (b) if we start at $u$, move forward to $u_s \coloneq S_s u$, and then move again to $S_t u_s$ after another $t$ units of time, the outcome should be the same as moving from $u$ to $S_{s \,+\, t} \, u$ directly.
That is, $$ S_{s \,+\, t} = S_t \circ S_s \quad \text{for all } t, s \geq 0. $$ This is the **semigroup property**. One way that continuous-time dynamical systems arise is via initial value problems. An **initial value problem** (IVP) in $\RR^n$ consists of a differential equation $\dot u_t = f(u_t)$ paired with an initial condition $u_0 \in \RR^n$, where $u_t \in \RR^n$ and $f \colon \RR^n \to \RR^n$. Under suitable conditions on $f$, the solution $u_t \coloneq F(t, u_0)$ is uniquely defined for all $t \geq 0$, and, moreover, $$ F(0, u_0) = u_0 \quad \text{and} \quad F(s + t, u_0) = F(t, F(s, u_0)) \quad \text{for all } s, t \geq 0 $$ (see, e.g., {cite}`hirsh1974`, Section 8.7). Hence $(S_t)_{t \geq 0}$ defined by $S_t u = F(t, u)$ satisfies the semigroup property and $(\RR^n, (S_t)_{t \geq 0})$ is a continuous-time dynamical system. The function $f$ is called the **vector field** of $(\RR^n, (S_t)_{t \geq 0})$. #### Linear Initial Value Problems Given our interest in continuous-time Markov chains and their connection to linear systems (see the comments at the start of {ref}`ss-backg`), we focus primarily on linear differential equations. The next result discusses linear IVPs, illustrating the key role of the matrix exponential. In the statement, $A$ is $n \times n$ and both $\dot u_t$ and $u_t$ are column vectors in $\RR^n$. ```{prf:proposition} :label: p-linearivp The unique solution of the $n$-dimensional IVP $$ \dot u_t = A u_t, \qquad u_0 \in \RR^n \text{ given}, $$ (eq-ododes) in the set of continuously differentiable functions $t \mapsto u_t$ from $\RR_+$ to $\RR^n$ is $$ u_t \coloneq \me^{ t A} u_0 \qquad (t \geq 0). $$ (eq-ododesflow) ``` (Here $\dot u_t \coloneq \diff u_t / \diff t$ is defined by differentiating the vector $u_t$ element-by-element, as discussed after {prf:ref}`l-expcom`.) ```{prf:proof} *Proof of {prf:ref}`p-linearivp`.* That $u_t \coloneq \me^{ t A} u_0$ solves {eq}`eq-ododes` is immediate from {prf:ref}`ex-diffe`. 
The proof of uniqueness is omitted, although the logic is very similar to the scalar case, which was discussed in {prf:ref}`eg-balance`. ◻ ``` ```{exercise} :label: ex-ctime-auto-4 Let $P$ be $n \times n$ and consider the IVP $\dot \phi_t = \phi_t P$ and $\phi_0$ given, where each $\phi_t$ is a *row* vector in $\RR^n$. Prove that this IVP has the unique solution $\phi_t \coloneq \phi_0 \, \me^{ t P}$. ``` ```{solution} ex-ctime-auto-4 If we take $u_t = \phi_t^\top$ and transpose $\dot \phi_t = \phi_t P$ we get {eq}`eq-ododes` when $A = P^\top$. By {prf:ref}`p-linearivp`, the unique solution is $u_t = \me^{ t P^\top} u_0 = ( \me^{ t P})^\top u_0$. Transposing again gives $\phi_t = \phi_0 \, \me^{t P}$, as was to be shown. ``` {prf:ref}`p-linearivp` motivates us to study flows of the form $$ t \mapsto u_t, \quad u_t = \me^{t A} u_0 \qquad (t \geq 0), $$ (eq-ctflow) where $A$ is $n \times n$, $u_0$ is a vector in $\RR^n$ indicating the initial condition, and $u_t$ is the "state" of the system at time $t$. Figure {numref}`f-expo_curve_1` shows an example when $$ A = \begin{pmatrix} -2.0 & -0.4 & 0 \\ -1.4 & -1.0 & 2.2 \\ 0.0 & -2.0 & -0.6 \end{pmatrix}. $$ ```{figure} ../figures/expo_curve_1.pdf :name: f-expo_curve_1 Exponential flow $t \mapsto \me^{tA}u_0$ starting from $u_0 \in \RR^3$ ``` (sss-diagcase)= #### Stability in the Diagonalizable Case For an exponential flow such as {eq}`eq-ctflow`, a key question is whether or not $u_t \to 0$ as $t \to \infty$. (This will matter when we try to evaluate lifetime rewards over an infinite horizon in continuous time.) Rather than analyze these issues at every possible $u_0$, we directly consider the matrix-valued flow $t \mapsto \me^{ t A}$ and study whether $\me^{tA} \to 0$. The case where $A$ is diagonalizable provides a good starting point. Suppose $A = P^{-1} D P$ with $D = \diag_j (\lambda_j)$ containing the eigenvalues of $A$. 
Then, by {prf:ref}`l-expcom`, for any $t \geq 0$, we have $$ \me^{t A} = \me^{t P^{-1} D P} = P^{-1} \me^{t D } P. $$ (eq-dixt) ```{exercise} :label: ex-matexpdiag Prove that $\me^{t D} = \diag (\me^{t \lambda_1}, \ldots, \me^{t \lambda_n})$. ``` {prf:ref}`ex-matexpdiag` and equation {eq}`eq-dixt` tell us that the long run dynamics of $\me^{tA}$ are determined by the scalar flows $t \mapsto \me^{t \lambda_j}$. How does $\me^{t \lambda}$ evolve over time when $\lambda \in \CC$? To answer this question we write $\lambda = a + ib$ and apply {eq}`eq-expcom` to obtain $$ \me^{t\lambda} = \me^{ta}(\cos(tb) + i \sin(tb)). $$ This equation tells us that $$ \me^{t\lambda} \to 0 \text{ as } t \to \infty \quad \text{if and only if} \quad \real \lambda < 0, $$ (eq-kre) where $\real \lambda$ is the **real part** of $\lambda$ (i.e., if $\lambda = a + ib$, then $\real \lambda = a$). From this analysis, we conclude that, when $A$ is diagonalizable, we have $\me^{tA} \to 0$ if and only if $\real \lambda_j < 0$ for all $\lambda_j \in \sigma(A)$, where $\sigma(A)$ denotes the set of all eigenvalues (the **spectrum**) of $A$. Another way to put this is that $\me^{tA} \to 0$ if and only if $s(A) < 0$, where $$ s(A) \coloneq \max_{\lambda \in \sigma(A)} \real \lambda, $$ is the **spectral bound** of $A$. As the preceding analysis suggests, the spectral bound plays a key role in the asymptotics of exponential flows, just as a spectral radius governs asymptotics of trajectories of linear maps (see, e.g., {prf:ref}`ex-rcondi`). Section {ref}`sss-specbounds` expands on this analysis, while dropping the assumption that $A$ is diagonalizable. (sss-specbounds)= #### The General Case Let $A$ be any square matrix. In the following statement about a spectral bound, $\| \cdot \|$ is the operator norm defined in {ref}`ss-lineq`. ```{prf:lemma} :label: l-propsa0 If $\tau > 0$, then $\tau s(A) = s(\tau A)$. 
Moreover, $$ \me^{s(A)} = \rho( \me^A ) \quad \text{and} \quad s(A) = \lim_{t \to \infty} \frac{1}{t} \ln \| \me^{t A} \|. $$ (eq-salim) ``` ```{exercise} :label: ex-ctime-auto-5 Confirm that $\tau s(A) = s(\tau A)$ when $\tau > 0$. ``` ```{solution} ex-ctime-auto-5 For any $\tau > 0$, we have $$ \tau s(A) = \tau \max_{\lambda \in \sigma(A)} \real \lambda = \max_{\tau \lambda \in \sigma(\tau A)} \tau \real \lambda = s(\tau A). $$ (eq-sposhom) ``` ```{exercise} :label: ex-ctime-auto-6 Prove the first equality in {eq}`eq-salim`. ``` ```{solution} ex-ctime-auto-6 With $\xi \coloneq \max_{\lambda \in \sigma(A)} \real \lambda$, we have $$ \rho(\me^A) =\max_{\lambda \in \sigma(\me^A)} |\lambda| =\max_{\lambda \in \sigma(A)} |\me^\lambda| =\max_{\lambda \in \sigma(A)} \me^{\real \lambda} =\me^\xi. $$ (The second equality is by {prf:ref}`l-expcom`.) Hence $\rho(\me^A) = \me^{s(A)}$, as was to be shown. ``` ```{exercise} :label: ex-ctime-auto-7 The second equality in {eq}`eq-salim` is reminiscent of Gelfand's lemma. Confirm that it holds when the limit is taken over $t \in \NN$. ``` ```{solution} ex-ctime-auto-7 For $t \in \NN$, we have $$ \frac{1}{t} \ln \| \me^{t A} \| = \ln \left( \| \me^{t A} \|^{1/t} \right) = \ln \left( \| (\me^A)^t \|^{1/t} \right). $$ Taking the limit $t \to \infty$ and applying Gelfand's lemma, this sequence converges to $\ln \rho( \me^A)$. But $\ln \rho( \me^A) = s(A)$, by the first equality in {eq}`eq-salim`. This proves the second equality in {eq}`eq-salim`. ``` (The second equality in {eq}`eq-salim` also holds when the limit is taken over $t \in \RR_+$. See, for example, {cite}`engel2006short`.) The next theorem is a key stability result for exponential flows. Among other things, it extends to arbitrary $A$ the finding that $s(A) < 0$ is necessary and sufficient for stability. ```{prf:theorem} :label: t-duetoly For any square matrix $A$, the following statements are equivalent: 1. $s(A) < 0$. 2. $\| \me^{tA} \| \to 0$ as $t \to \infty$. 3. 
There exist $M, \omega > 0$ such that $\| \me^{t A} \| \leq M \me^{-t \omega}$ for all $t \geq 0$.
4. $\int_0^\infty \| \me^{t A} u_0 \|^p \diff t < \infty$ for all $p \geq 1$ and $u_0 \in \RR^n$.
```

A full proof of {prf:ref}`t-duetoly` in a general setting can be found in §V.II of {cite:t}`engel2006short`.

{prf:ref}`t-duetoly` tells us that the flow $t \mapsto \me^{tA} u_0$ converges to the origin at an exponential rate if and only if $s(A)<0$. The equivalence of (i) and (ii) was proved for the diagonalizable case in {ref}`sss-diagcase`. It can be viewed as the continuous-time analog of $\|B^k\| \to 0$ if and only if $\rho(B) < 1$ (see {prf:ref}`ex-rcondi`).

```{exercise}
:label: ex-ctime-auto-8

Prove that (i) implies (ii) without assuming that $A$ is diagonalizable. In addition, prove that (iii) implies (iv).
```

```{solution} ex-ctime-auto-8
Let's start with (i) $\implies$ (ii), or $s(A)<0$ implies $\| \me^{tA} \| \to 0$ as $t \to \infty$. Here is one proof that works for $t \in \NN$ and $t \to \infty$. Observe that, since $(\me^A)^t = \me^{t A}$, the powers $B^t$ of $B \coloneq \me^A$ match the flow $t \mapsto \me^{tA}$ at integer times. We have $B^t \to 0$ if and only if $\rho(B) < 1$. But, by {prf:ref}`l-propsa0`, $\rho(B) = \rho(\me^A) = \me^{s(A)}$. Hence $\rho(B) < 1$ is equivalent to $s(A) < 0$. Thus, $s(A) < 0$ is the exact condition we need to obtain $B^t = \me^{t A} \to 0$.

We can improve on this proof of (i) $\implies$ (ii) by allowing $t \in \RR_+$ and $t \to \infty$ as follows. Suppose $s(A) < 0$. Fix $\epsilon > 0$ such that $s(A)+\epsilon < 0$ and use {eq}`eq-salim` to obtain a $T < \infty$ such that $(1/t) \ln \| \me^{tA} \| \leq s(A) + \epsilon$ for all $t \geq T$. Equivalently, for $t \geq T$, we have $\| \me^{t A} \| \leq \me^{t(s(A)+\epsilon)}$. Since $s(A) + \epsilon < 0$, the right-hand side converges to zero as $t \to \infty$, and the claim follows.

That (iii) implies (iv) is immediate: Just substitute the bound in (iii) into the integral. Since $\| \me^{tA} u_0 \| \leq \| \me^{tA} \| \, \| u_0 \|$, this gives $\int_0^\infty \| \me^{t A} u_0 \|^p \diff t \leq M^p \| u_0 \|^p \int_0^\infty \me^{-t p \omega} \diff t = M^p \| u_0 \|^p / (p \omega) < \infty$.
```

(sss-semit)=
#### Semigroup Terminology

Advanced treatments of continuous-time systems often begin with semigroups. Let's briefly describe these and connect them to things we have studied earlier. (If you prefer to skip this section on first reading, you can move to the next one after noting that, given an $n \times n$ matrix $A$, the family $(S_t)_{t \geq 0} = (\me^{t A})_{t \geq 0}$ is called an **exponential semigroup** and that $A$ is called the **infinitesimal generator** of the semigroup.)

Let $\Xsf$ be a finite set and let $(S_t)_{t \geq 0}$ be a subset of $\lopx$ indexed by $t \in \RR_{+}$. The family $(S_t)_{t \geq 0}$ is called a **strongly continuous semigroup** or **$C_0$-semigroup** on $\RR^\Xsf$ if

1. $S_0 = I$, where $I$ is the identity,
2. $S_{t + \, t'} = S_t \circ S_{t'}$ for all $t, t' \geq 0$, and
3. $t \mapsto S_t u$ is a continuous map from $\RR_+$ to $\RR^\Xsf$ for every $u \in \RR^\Xsf$.

In essence, a $C_0$-semigroup on $\RR^\Xsf$ is a continuous-time dynamical system $(\RR^\Xsf, (S_t)_{\, t \geq 0})$ where each $S_t$ maps an initial state into a time $t$ state.

```{prf:example}
:label: eg-exposemi

Fix $A$ in $\lopx$ and let $(S_t)_{\, t \geq 0}$ be defined by $S_t = \me^{t A}$. Then $(S_t)_{\, t \geq 0}$ is a $C_0$-semigroup on $\RR^\Xsf$. To verify this we take $\Xsf = \{x_1, \ldots, x_n\}$ and view $S_t$ and $A$ as $n \times n$ matrices. The $C_0$-semigroup properties now follow directly from {prf:ref}`l-expcom`. For example, $t \mapsto S_t u$ is continuous because it is differentiable in $t$, by (v) of {prf:ref}`l-expcom`.
```

The semigroup perspective is important because it extends naturally to settings where $\Xsf$ is not finite, in which case we replace the finite-dimensional set $\RR^\Xsf$ with some (typically infinite-dimensional) class of functions $\gG \subset \RR^\Xsf$, and each $S_t$ becomes a linear operator mapping $\gG$ into itself.
At this level of generality, $S_t u$ can be the solution to a partial differential equation, or a stochastic differential equation (see, e.g., {cite}`engel2006short` or {cite}`applebaum2019semigroups`). Operator semigroup theory offers an elegant and powerful framework for handling such systems. For semigroups in general settings we often have no analytical expressions for $S_t$. This situation is like the one we encountered in the continuous-time system in {ref}`sss-ctds`, where $\dot u_t = f(u_t)$ and $f$ is potentially nonlinear. When no analytical solution $u_t$ exists, analyzing the dynamics requires us to try to infer its properties from the vector field $f$, so that $f$ becomes the primary focus of analysis. A natural question, then, is, given a semigroup $(S_t)_{\, t \geq 0}$ on $\lopx$, does there always exist a "vector field" type object that "generates" $(S_t)_{\, t \geq 0}$? When $\Xsf$ is finite, the answer is affirmative. This object, to be denoted by $A$, is called the **infinitesimal generator** of the semigroup and is defined by $$ A = \lim_{t \, \downarrow \, 0} \frac{S_t - S_0}{t} = \lim_{t \, \downarrow \, 0} \frac{S_t - I}{t} $$ (eq-infgen) At $u \in U$, the vector $A u$ indicates the instantaneous change in the state. More precisely, when $\Xsf$ is finite, we have: ```{prf:proposition} :label: p-fdsemi If $(S_t)_{\, t \geq 0}$ is a $C_0$-semigroup on $\RR^\Xsf$, then 1. there exists an $A \in \lopx$ such that $S_t = \me^{t A}$ for all $t \geq 0$, and 2. $A$ is the infinitesimal generator of $(S_t)_{\, t \geq 0}$. ``` Semigroups of the form described in {prf:ref}`p-fdsemi` are called **exponential semigroups** (or "uniformly continuous" semigroups). A full proof of {prf:ref}`p-fdsemi` can be found in the discussion of Theorem 2.12 of {cite:t}`engel2006short`. The results are not surprising, since the main claim is that, in finite dimensions, solutions to linear differential equations have exponential form. 
The fact that $A$ is the infinitesimal generator of the semigroup $(S_t)_{\, t \geq 0} = (\me^{tA})_{\, t \geq 0}$ follows from {prf:ref}`l-expcom`, which gives $$ \lim_{t \, \downarrow \, 0} \frac{S_t - S_0}{t} = \lim_{t \, \downarrow \, 0} \frac{\me^{t A} - \me^{0}}{t} = \frac{\diff}{\diff t} \me^{t A} \; \Bigr\rvert_{\, t = 0} = A \me^{0 A} = A. $$ The preceding discussion places our analysis in a wider context. To practice our new terminology, we can restate (i) $\iff$ (ii) from {prf:ref}`t-duetoly` by saying that the exponential semigroup $(S_t)_{\, t \geq 0} = (\me^{tA})_{\, t \geq 0}$ converges to zero if and only if the spectral bound of its infinitesimal generator is negative. (ss-tmarca)= ### Markov Semigroups Having studied multivariate linear dynamics, we are now ready to concentrate on the Markov case, where dynamics evolve in distribution space. For the most part we now switch to operator-theoretic notation, where $\Xsf$ is a finite set with $n$ elements, and an $n \times n$ matrix is identified with a linear operator on $\lopx$. As emphasized in {ref}`sss-matop`, this is merely a change in terminology, and all preceding results for matrices extend directly to linear operators. (sss-inmat)= #### Intensity Matrices If $(X_t)_{\, t \geq 0}$ is $P$-Markov on $\Xsf$ for some $P \in \mopx$, then the marginal distributions of $(X_t)_{t \geq 0}$ evolve according to the linear difference system $\psi_{t+1} = \psi_t P$ (see {ref}`ss-markchain`). We now seek a continuous-time analog in the form of a linear differential equation. To this end we call $Q \in \lopx$ an **intensity operator** or **intensity matrix**[^1] when $$ Q(x, x') \geq 0 \text{ whenever } x \not= x' \quad \text{and} \quad \sum_{x'} Q(x, x') = 0 \text{ for all } x \in \Xsf. $$ (eq-intensity) Let $$ \iopx = \text{ the set of all intensity operators in } \lopx. 
$$ ```{prf:example} The matrix $$ Q \coloneq \begin{pmatrix} -2 & 1 & 1 \\ 0 & -1 & 1 \\ 2 & 1 & -3 \end{pmatrix} $$ is an intensity matrix, since off-diagonal terms are nonnegative and rows sum to zero. ``` Consider the IVP $$ \dot \psi_t(x') = \sum_{x} Q(x, x') \psi_t(x) \qquad (t \geq 0, \; x' \in \Xsf), $$ which we can also write as $$ \dot \psi_t = \psi_t \, Q, \qquad \psi_0 \in \dD(\Xsf) \text{ given}. $$ (eq-mdife) when $\psi_t$ and $\dot \psi_t$ are understood to be row vectors. We say that $\dD(\Xsf)$ is **invariant** for the IVP {eq}`eq-mdife` if the solution $(\psi_t)_{t \geq 0}$ remains in $\dD(\Xsf)$ for all $t \geq 0$. In view of {prf:ref}`p-linearivp`, we can rephrase this by stating that $\dD(\Xsf)$ is invariant for {eq}`eq-mdife` whenever $$ \psi_0 \in \dD(\Xsf) \quad \implies \quad \psi_0 \, \me^{t Q} \in \dD(\Xsf) \text{ for all } t \geq 0. $$ (eq-ccmk) Our key result for this section shows the central role of intensity matrices: ```{prf:proposition} :label: p-pkol Fix $Q \in \lopx$ and set $P_t \coloneq \me^{tQ}$ for each $t \geq 0$. The following statements are equivalent: 1. $Q \in \iopx$. 2. $P_t \in \mopx$ for all $t \geq 0$. 3. the set of distributions $\dD(\Xsf)$ is invariant for the IVP {eq}`eq-mdife`. ``` {prf:ref}`p-pkol` tells us that the set $\iopx$ coincides with the set of continuous-time (and time-homogeneous) Markov models on $\Xsf$. Any specification outside this class fails to generate flows in distribution space. The proof is completed in several steps. For {prf:ref}`ex-pt1`--{prf:ref}`ex-qkr`, $Q \in \iopx$ and $P_t \coloneq \me^{tQ}$. ```{exercise} :label: ex-pt1 Show that $P_t \1 = \1$ for all $t \geq 0$. ``` ```{exercise} :label: ex-repq Set $\theta \coloneq \max_{x \in \Xsf} |Q(x,x)|$ and $K \coloneq I + \frac{1}{\theta} \, Q$, where $I$ is the $n \times n$ identity. (If $\theta=0$, then set $K \coloneq I$.) Prove that $K$ is a stochastic matrix and $Q = \theta (K - I)$. 
``` ```{exercise} :label: ex-qkr Using the representation for $Q$ obtained in {prf:ref}`ex-repq` and the definition of the matrix exponential, show that $P_t$ is nonnegative for all $t \geq 0$. ``` ```{solution} ex-qkr Recalling that, for matrix exponentials, $\me^{A+B} = \me^A \me^B$ whenever $AB = BA$, we have $$ \me^{tQ} = \me^{t \theta (K - I)} = \me^{-t \theta I} \me^{t \theta K} = \me^{-t \theta} \left( I + t \theta K + \frac{(t \theta)^2}{2!} K^2 + \cdots \right). $$ It is clear from this representation that all entries of $P_t = \me^{tQ}$ are nonnegative. ``` For the proof of {prf:ref}`p-pkol`, we have now shown that (i) implies (ii). Evidently (ii) implies (iii), because if $\psi_0 \in \dD$ and $\psi_t = \psi_0 P_t$ where $P_t$ is stochastic, then $\psi_t \in \dD(\Xsf)$. Hence it remains only to show that (iii) implies (i). ```{exercise} :label: ex-ctime-auto-9 Let $Q$ be $n \times n$ and assume (iii). Fix $x \in \Xsf$. By (iii) we have $\delta_x \, \me^{t Q} \1 = 1$ for all $t \geq 0$, where $\1$ is a vector of ones. Show that $\sum_{x'} Q(x, x') = 0$ using this identity. ``` ```{solution} ex-ctime-auto-9 Since $f(t) \coloneq \delta_x \, \me^{t Q} \1 = 1$ for all $t \geq 0$, we have $f'(t) = 0$. Recalling that $(\me^{t Q})' = \me^{t Q} Q$, this means that $$ \frac{\diff}{\diff t} \delta_x \, \me^{t Q} \1 = \delta_x \, \frac{\diff}{\diff t} \, \me^{t Q} \1 = \delta_x \, \me^{t Q} Q \1 = 0 , $$ for all $t \geq 0$. Evaluating at $t=0$, we get $\delta_x Q \1 = 0$. That is, $\sum_{x'} Q(x, x') = 0$. ``` ```{exercise} :label: ex-ctime-auto-10 Prove that $Q(x, x') \geq 0$ when (iii) holds and $x \not= x'$. ``` ```{solution} ex-ctime-auto-10 By {prf:ref}`l-expcom`, we have $$ \frac{\diff}{\diff t} \me^{tQ} = Q \me^{t Q} = \me^{t Q} Q \quad \text{for all} \quad t \geq 0. $$ (eq-qderiv) Evaluating {eq}`eq-qderiv` at $t=0$ and recalling that $\me^0 = I$ gives $$ Q = \lim_{h \, \downarrow \, 0} \; \frac{1}{h} \, ( \me^{h Q} - I ). 
$$ (eq-qdiffz)

Interpreting $\delta_x$ as a row vector and $\delta_{x'}$ as a column vector, while using the fact that $x \not= x'$ (so that $\delta_x I \delta_{x'} = 0$) combined with {eq}`eq-qdiffz`, we obtain

$$
    Q(x, x') = \delta_x Q \delta_{x'} = \delta_x \left[ \lim_{h \, \downarrow \, 0} \frac{\me^{h Q}}{h} \right] \delta_{x'} = \lim_{h \, \downarrow \, 0} \delta_x \frac{\me^{h Q}}{h} \delta_{x'}.
$$

Hence we need only show that $\delta_x \me^{h Q} \delta_{x'} \geq 0$ for all $h > 0$. By (iii), $\delta_x \me^{h Q}$ is a distribution, so the inequality holds.
```

Returning to {prf:ref}`p-pkol`, the last two exercises confirm that (iii) implies (i). The proof is now complete.

#### Interpretation

Section {ref}`sss-inmat` covered the mathematical relationship between intensity matrices and Markov operators. Let's now discuss the connection more informally, in order to build intuition.

To this end, let $(X_t)_{t \geq 0}$ be $P_h$-Markov in discrete time. Here $h > 0$ is the length of the time step. We write the corresponding distribution sequence $\psi_{t+h} = \psi_t P_h$ in terms of change per unit of time, as in

$$
    \frac{\psi_{t+h} - \psi_t}{h} = \psi_t \frac{P_h - I}{h} \quad \text{where} \quad I \text{ is the } n \times n \text{ identity}.
$$ (eq-mritd)

Continuous-time dynamics are obtained by taking the limit as $h \, \downarrow \, 0$. If we define

$$
    Q \coloneq \lim_{h \, \downarrow \, 0} \frac{P_h - I}{h},
$$ (eq-qsp)

and assume that limits exist, then {eq}`eq-mritd` becomes {eq}`eq-mdife`.

What properties does $Q$ have? Inspecting {eq}`eq-qsp` suggests that

$$
    Q(x, x') \approx \frac{P_h(x, x') - \1\{x = x'\}}{h}
$$ (eq-qapp)

when $h$ is small and positive.

```{exercise}
:label: ex-ctime-auto-11

Prove that, when $h > 0$ and $P_h$ is stochastic, the matrix on the right-hand side of {eq}`eq-qapp` is an intensity matrix.
``` ```{exercise} :label: ex-ctime-auto-12 To formalize {eq}`eq-qapp`, use the expression for the matrix exponential in {eq}`eq-defexpmat` to prove that if $P_t = \me^{tQ}$, then $$ P_h(x, x') = h \, Q(x, x') + o(h) \quad \text{whenever} \quad x \not= x' . $$ (eq-phsv) ``` ```{solution} ex-ctime-auto-12 Using the matrix exponential {eq}`eq-defexpmat` and $P_t = \me^{tQ}$ yields $$ P_t(x, x') = \1\{x = x'\} + t Q(x, x') + t^2 \frac{Q^2(x, x')}{2!} + t^3 \frac{Q^3(x,x')}{3!} + \cdots $$ Setting $t=h$ and using $o(h)$ to capture terms converging to zero faster than $h$ as $h \, \downarrow \, 0$ recovers {eq}`eq-phsv`. ``` Equation {eq}`eq-phsv` tells us that $Q(x, x')$ represents the instantaneous rate of flow out of state $x$ and into state $x'$. The on-diagonal value $P_h(x,x)$ just balances the off-diagonal probabilities. (sss-transemi)= #### Markov Semigroups Fix $Q \in \iopx$. In the terminology of {ref}`sss-semit`, the family of operators $(P_t)_{t \geq 0} = (\me^{tQ})_{t \geq 0} \subset \mopx$ that solves $\dot \psi_t = \psi_t Q$ (see {eq}`eq-ccmk`) is an exponential semigroup. Since each $P_t$ is in $\mopx$, it is also called the **Markov semigroup** generated by $Q$. It satisfies the semigroup property $P_{s \, + \, t} = P_s \, P_t$ for all $s, t \geq 0$, which can be written more explicitly as $$ P_{s+t} (x, x') = \sum_z P_s(x, z) P_t(z, x') \qquad (s, t \geq 0, \; x, x' \in \Xsf). $$ (eq-ctkchap) In the present setting, {eq}`eq-ctkchap` is called the (continuous-time) **Chapman--Kolmogorov equation**. It states that the probability of moving from $x$ to $x'$ over $s+t$ units of time equals the probability of moving from $x$ to $z$ over $s$ units of time, and then $z$ to $x'$ over $t$ units of time, summed over all $z$. Again following the terminology in {ref}`sss-semit`, the intensity matrix $Q$ that defines $(P_t)_{t \geq 0} = (\me^{tQ})_{t \geq 0}$ is also called the infinitesimal generator of $(P_t)_{t \geq 0}$. 
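The properties of the Markov semigroup described above can be checked numerically. The following is a minimal sketch rather than part of the book's companion code: it assumes the $3 \times 3$ intensity matrix from the example in {ref}`sss-inmat`, and the helper names `P` and `is_stochastic`, the times `s` and `t`, and the step size `h` are illustrative choices. It relies only on Julia's standard `LinearAlgebra` library, where `exp` applied to a square matrix computes the matrix exponential.

```julia
using LinearAlgebra

# Intensity matrix from the earlier example: nonnegative off-diagonal
# entries and rows that sum to zero.
Q = [-2.0  1.0  1.0;
      0.0 -1.0  1.0;
      2.0  1.0 -3.0]

# P(t) = exp(tQ); for a square matrix, `exp` is the matrix exponential.
P(t) = exp(t * Q)

# A stochastic matrix is nonnegative with unit row sums (up to rounding).
function is_stochastic(M; tol=1e-10)
    return all(M .>= -tol) && all(abs.(sum(M, dims=2) .- 1) .< tol)
end

s, t = 0.3, 1.2   # illustrative times

# Chapman-Kolmogorov: P_{s+t} should equal P_s * P_t.
ck_gap = maximum(abs.(P(s + t) - P(s) * P(t)))

# Infinitesimal generator: (P_h - I) / h should be close to Q for small h.
h = 1e-6
gen_gap = maximum(abs.((P(h) - I) / h - Q))

@show is_stochastic(P(s)) is_stochastic(P(t))
@show ck_gap gen_gap
```

Both gaps should be small: the first is pure rounding error, while the second shrinks roughly in proportion to `h` until floating point error takes over.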
From {prf:ref}`l-expcom`, the derivative of $\me^{tQ}$ is $Q\me^{tQ} = \me^{tQ}Q$. We can write this as - $\dot P_t = QP_t$, which is called the **Kolmogorov backward equation**, and - $\dot P_t = P_t Q$, which is called the **Kolmogorov forward equation**. We can work in the other direction as well: If we can establish that a function $t \mapsto P_t$ from $\RR_+$ to $\lopx$ satisfies either one of these equations, then $(P_t)_{t \geq 0}$ is a Markov semigroup with infinitesimal generator $Q$. The next proposition gives details. ```{prf:proposition} :label: p-koleq Let $Q$ be an intensity matrix. If $t \mapsto P_t$ is a differentiable function from $\RR_+$ to $\lopx$ such that $P_0 = I$ and either 1. $\dot P_t = QP_t$ or 2. $\dot P_t = P_tQ$, then $P_t = \me^{t Q}$ for all $t \geq 0$. ``` {prf:ref}`p-koleq` is a version of our result for linear IVPs in {prf:ref}`p-linearivp`, except that the IVP is now defined in operator space, rather than vector space. ### Continuous-Time Markov Chains We have discussed the connection between intensity matrices, Markov semigroups, and distribution flows. Let's now connect these objects to continuous-time Markov chains. In this section, we will (a) provide a formal definition of a continuous-time Markov chain associated with a given initial condition $\psi$ and intensity matrix $Q$, and (b) show how to construct such a chain algorithmically. We'll accomplish (b) in two steps: first showing how to construct a jump chain from certain primitives ({ref}`sss-jccon`--{ref}`sss-aid`) and then showing how to construct those primitives from a given initial condition $\psi$ and intensity matrix $Q$ ({ref}`sss-imjc`). (sss-ctmcdef)= #### Definition Let $C(\RR_+, \Xsf)$ be the set of right-continuous functions from $\RR_+$ to $\Xsf$ and let $(P_t)_{t \geq 0}$ be a Markov semigroup generated by some $Q \in \iopx$. 
A **continuous-time Markov chain** generated by $(P_t)_{t \geq 0}$ is a random function $(X_t)_{t \geq 0}$ that takes values in $C(\RR_+, \Xsf)$ and satisfies

$$
    \PP \{ X_{s \,+\, t} = x' \given \fF_s \} = P_t(X_s, x') \qquad \text{for all } s, t \geq 0 \text{ and } x' \in \Xsf,
$$ (eq-ptxt)

where $\fF_s \coloneq (X_\tau)_{0 \leq \tau \leq s}$ is the history of the process up to time $s$. To update from time $s$ to time $s + t$ given this history, we simply take the last value $X_s$ and update using $P_t$. Conditioning on $X_s = x$, we get

$$
    P_t(x, x') = \PP \{ X_{s \, + \, t} = x' \given X_s = x\} \qquad (s, t \geq 0, \; x, x' \in \Xsf).
$$

Mirroring terminology for discrete chains from {ref}`sss-dmcs`, we will call a continuous-time Markov chain $(X_t)_{t \geq 0}$ **$Q$-Markov** when {eq}`eq-ptxt` holds and $Q$ is the infinitesimal generator of $(P_t)_{t \geq 0}$.

In what follows, $\PP_x$ and $\EE_x$ denote probabilities and expectations conditional on $X_0 = x$. Given $h \in \RR^\Xsf$, we have

$$
    \EE_x \, h(X_t) = \sum_{x'} P_t(x, x')h(x') =: (P_t h)(x) .
$$

This expression mirrors the discrete time case discussed in {ref}`sss-ceo`.

(sss-jccon)=
#### A Jump Chain Construction

In {ref}`sss-ctmcdef` we defined a continuous-time Markov chain. In this section, we describe a standard method for constructing one by using three components:

1. an initial condition $\psi \in \dD(\Xsf)$,
2. a **jump matrix** $\Pi \in \mopx$, and
3. a **rate function** $\lambda$ mapping $\Xsf$ to $(0, \infty)$.

The process $(X_t)$ starts at state $x$, which is drawn from $\psi$, waits there for an exponential time $W$ with rate $\lambda(x)$, and then updates to a new state $x'$ drawn from $\Pi(x, \cdot)$. We take $x'$ as the new state for the process and repeat.

These ideas are restated in {prf:ref}`algo-ejc_algo`. In the algorithm, $(W_k)$ and $(Y_k)$ are drawn independently.
The process $(W_k)$ is called the sequence of **holding times** or **wait times**, the sums $J_k = \sum_{i=1}^k W_i$ are called the **jump times** and $(Y_k)$ is called the **embedded jump chain**. The jumps and the process $(X_t)_{t \geq 0}$ are illustrated in Figure {numref}`f-jump_chain`.

```{prf:algorithm} Jump chain algorithm
:label: algo-ejc_algo

- draw $Y_0$ from $\psi$, set $J_0 = 0$ and $k=1$
- while $t < \infty$:
    - draw $W_k$ independently from Exp$(\lambda(Y_{k-1}))$
    - $J_k \leftarrow J_{k-1} + W_k$
    - $X_t \leftarrow Y_{k-1}$ for all $t$ in $[J_{k-1}, J_k)$
    - draw $Y_k$ from $\Pi(Y_{k-1}, \cdot)$
    - $k \leftarrow k+1$
```

```{figure} figures/jump_chain.svg
:name: f-jump_chain

A jump chain sample path
```

Let $I \in \lopx$ be the identity matrix, so $I(x,x') = \1\{x = x'\}$, and define $Q \in \lopx$ via

$$
    Q(x, x') = \lambda(x)(\Pi(x, x') - I(x,x')) \qquad (x, x' \in \Xsf).
$$ (eq-qfromj)

It is easy to verify that $Q$ is an intensity matrix. In fact, $Q$ is the intensity matrix for the Markov semigroup associated with the process generated by {prf:ref}`algo-ejc_algo`. For $x \not= x'$, it tells us that probability flows from $x$ to $x'$ at rate $\lambda(x) \Pi(x, x')$, which is the rate at which jumps occur at $x$ times the probability that a jump from $x$ lands at $x'$. The next result formalizes these ideas.

```{prf:proposition}
:label: p-jcc

The process $(X_t)_{t \geq 0}$ generated by {prf:ref}`algo-ejc_algo` is $Q$-Markov.
```

To prove {prf:ref}`p-jcc` we take $(X_t)_{t \geq 0}$ to be as in the statement of the proposition and define $(P_t)_{t \geq 0}$ by $P_t(x, x') = \PP_x \{X_t = x'\}$ for all $x,x' \in \Xsf$. The proof uses the following steps:

1. Obtain an integral equation that $(P_t)_{t \geq 0}$ must satisfy.
2. Differentiate to obtain the Kolmogorov backward equation $\dot P_t = QP_t$.
3. Solve this differential equation to obtain $P_t = \me^{t Q}$ for all $t$.

Here is the first step.
In the statement, $\Pi P_{t-\tau}$ is the matrix product of $\Pi$ and $P_{t-\tau}$, while the equation in {eq}`eq-kbinteg` is sometimes called the **integrated Kolmogorov backward equation**.

```{prf:lemma}
:label: l-ainteg

For all $t \geq 0$ and $x, x'$ in $\Xsf$, the semigroup $(P_t)_{t \geq 0}$ satisfies

$$
    P_t(x, x') = e^{-t \lambda(x)} I(x, x') + \lambda(x) \int_0^t (\Pi P_{t-\tau})(x, x') e^{- \tau \lambda(x)} d \tau.
$$ (eq-kbinteg)
```

```{prf:proof}
Fixing $x, x' \in \Xsf$ and $t > 0$, we have

$$
    P_t(x, x') \coloneq \PP_x \{X_t = x'\} = \PP_x \{X_t = x', \; J_1 > t \} + \PP_x \{X_t = x', \; J_1 \leq t \}.
$$ (eq-pt_split)

Regarding the first term on the right-hand side of {eq}`eq-pt_split`, since $X_t = x$ on the event $\{J_1 > t\}$,

$$
    \PP_x \{X_t = x', \; J_1 > t \} = I(x, x') \PP_x \{J_1 > t \} = I(x, x') e^{- t \lambda(x)}.
$$ (eq-pt_first)

For the second term on the right-hand side of {eq}`eq-pt_split`, we obtain

$$
    \PP_x \{X_t = x', \; J_1 \leq t \} = \EE_x \left[ \1\{J_1 \leq t\} \PP_x \{X_t = x' \,|\, W_1, Y_1\} \right] = \EE_x \left[ \1\{J_1 \leq t\} P_{t - J_1} (Y_1, x') \right].
$$

Evaluating the expectation and using the independence of $J_1$ and $Y_1$, this becomes

$$
\begin{aligned}
    \PP_x \{X_t = x', \; J_1 \leq t \}
    & = \int_0^\infty \1\{\tau \leq t\} \sum_z \Pi(x, z) P_{t - \tau} (z, x') \lambda(x) e^{-\tau \lambda(x)} d \tau \\
    & = \lambda(x) \int_0^t \sum_z \Pi(x, z) P_{t - \tau} (z, x') e^{-\tau \lambda(x)} d \tau.
\end{aligned}
$$

Combining this result with {eq}`eq-pt_split` and {eq}`eq-pt_first` gives {eq}`eq-kbinteg`. ◻
```

Differentiating the integrated Kolmogorov backward equation produces the Kolmogorov backward equation:

```{prf:lemma}
:label: l-fromitkb

If $(P_t)_{t \geq 0}$ satisfies {eq}`eq-kbinteg`, then $P_0 = I$ and $\dot P_t = Q P_t$ for all $t \geq 0$.
```

```{prf:proof}
The claim that $P_0 = I$ follows from setting $t = 0$ in {eq}`eq-kbinteg`.
For the second claim, one can easily verify that, when $f$ is a differentiable function and $\alpha > 0$, we have $$ g(t) = e^{- t \alpha} f(t) \quad \implies \quad g'(t) = e^{- t \alpha} f'(t) - \alpha g(t) $$ (eq-gdiff) Note also that, with the change of variable $s = t - \tau$, we can rewrite {eq}`eq-kbinteg` as $$ P_t(x, x') = e^{-t \lambda(x)} \left\{ I(x, x') + \lambda(x) \int_0^t (\Pi P_s)(x, x') e^{s \lambda(x)} d s \right\}. $$ (eq-kbinteg2) Applying {eq}`eq-gdiff` produces $$ \dot P_t(x, x') = e^{-t \lambda(x)} \left\{ \lambda(x) (\Pi P_t)(x, x') e^{t \lambda(x)} \right\} - \lambda(x) P_t(x, x'). $$ Rearranging yields $\dot P_t(x, x') = \lambda(x) [ (\Pi - I) P_t](x, x')$, which is identical to $\dot P_t = Q P_t$. ◻ ``` ```{prf:proof} *Proof of {prf:ref}`p-jcc`.* {prf:ref}`p-jcc` follows directly from {prf:ref}`l-ainteg` and {prf:ref}`l-fromitkb`, combined with {prf:ref}`p-koleq`. ◻ ``` (sss-aid)= #### Application: Inventory Dynamics Let $X_t$ be a firm's inventory at time $t$. When current stock is $x > 0$, customers arrive at rate $\lambda(x)$, so the wait time for the next customer is an independent draw from the $\Exp (\lambda(x))$ distribution; $\lambda$ maps $\Xsf$ to $(0, \infty)$. The $k$-th customer demands $U_k$ units, where each $U_k$ is an independent draw from a fixed distribution $\phi$ on $\NN$. Purchases are constrained by inventory, however, so inventory falls by $U_k \wedge X_t$. When inventory hits zero the firm orders $b$ units of new stock. The wait time for new stock is also exponential, being an independent draw from $\Exp (\lambda(0))$. Let $Y$ represent the inventory size after the next jump (induced by either a customer purchase or ordering new stock), given current stock $x$. If $x > 0$, then $Y$ is a draw from the distribution of $x - U \wedge x$ where $U \sim \phi$. If $x=0$, then $Y \equiv b$. 
Hence $Y$ is a draw from $\Pi(x, \cdot)$, where $\Pi(0, y) = \1\{y=b\}$ and, for $0 < x \leq b$,

$$
    \Pi(x, y) =
    \begin{cases}
        0 & \text{ if } x \leq y \\
        \PP\{x - U = y\} & \text{ if } 0 < y < x \\
        \PP\{U \geq x\} & \text{ if } y = 0
    \end{cases}.
$$ (eq-jumpkern)

```{exercise}
:label: ex-ctime-auto-13

Prove that $\Pi$ is a stochastic matrix on $\Xsf \coloneq \{ 0, 1, \ldots, b\}$.
```

We can simulate the inventory process $(X_t)_{t \geq 0}$ via the jump chain algorithm. In this case, the wait time sequence $(W_k)$ is the wait time for customers (and for inventory when $X_t=0$) and the jump sequence $(Y_k)$ is the level of inventory immediately after each jump. By {prf:ref}`p-jcc`, the inventory process is $Q$-Markov with $Q$ given by $Q(x, x') = \lambda(x)(\Pi(x, x') - I(x,x'))$.

Figure {numref}`f-inventory_cont_time_1` shows a simulation when orders are geometric, so that

$$
    \phi(k) = \PP\{U = k\} = (1-\alpha)^{k-1} \alpha \qquad (k \in \NN, \; \alpha \in (0, 1)).
$$

In the simulation we set $\alpha=0.7$, $b=10$ and $\lambda(x) \equiv 0.5$. The figure plots $X_t$ for $t \in [0, 50]$. Since each wait time $W_i$ is a draw from $\Exp(0.5)$ the mean wait time is 2.0. The function that produces the map $t \mapsto X_t$ is shown in {numref}`list-inventory_cont_time`.

```{figure} ../figures/inventory_cont_time_1.pdf
:name: f-inventory_cont_time_1

Continuous-time inventory dynamics
```

```{code-block} julia
:name: list-inventory_cont_time
:caption: Continuous-time inventory dynamics (`inventory_cont_time.jl`)
:linenos:

using Random, Distributions

"""
Generate a path for inventory starting at b, up to time T.

Return the path as a function X(t) constructed from (J_k) and (Y_k).
"""
function sim_path(; T=10, seed=123, λ=0.5, α=0.7, b=10)

    J, Y = 0.0, b
    J_vals, Y_vals = [J], [Y]      # J_vals[k] = J_{k-1}, Y_vals[k] = Y_{k-1}
    Random.seed!(seed)
    φ = Exponential(1/λ)           # Wait times are exponential (mean 1/λ)
    G = Geometric(α)               # Orders are geometric

    while true
        W = rand(φ)
        J += W
        push!(J_vals, J)
        if Y == 0
            Y = b                  # Restock
        else
            U = rand(G) + 1        # Geometric on 1, 2,...
            Y = Y - min(Y, U)      # Sales are constrained by inventory
        end
        push!(Y_vals, Y)
        if J > T
            break
        end
    end

    # X(t) = Y_{k-1} for t in [J_{k-1}, J_k), valid for 0 <= t <= T
    function X(t)
        k = searchsortedlast(J_vals, t)   # last index with J_vals[k] <= t
        return Y_vals[k]
    end

    return X
end
```

(sss-imjc)=
#### From Intensity Matrices to Jump Chains

If $Q \in \lopx$ is a given intensity matrix, how should we produce a continuous-time $Q$-Markov chain? If we can construct a jump chain that is $Q$-Markov, then not only do we obtain existence of a $Q$-Markov chain but we also provide a way to simulate one (via {prf:ref}`algo-ejc_algo`).

To construct such a jump chain we first fix an intensity matrix $Q \in \lopx$ and, to simplify matters, assume that all rows of $Q$ are nonzero. This means that the process has no absorbing states (since nonzero rows are equivalent to $Q(x,x) < 0$ for all $x$, which in turn states that there is a nonzero outflow from each state). Then we set

$$
    \lambda(x) \coloneq -Q(x,x) \quad \text{and} \quad \Pi(x,x') \coloneq I(x,x') + \frac{Q(x,x')}{\lambda(x)}.
$$

It is straightforward to confirm that $\Pi \in \mopx$ and that $Q$ satisfies {eq}`eq-qfromj`. Hence, by {prf:ref}`p-jcc`, the process $(X_t)_{t \geq 0}$ generated by {prf:ref}`algo-ejc_algo` is $Q$-Markov.

(s-ctmdp)=
## Continuous-Time Markov Decision Processes

We are ready to turn to dynamic programming in continuous time. As in the discrete-time case, continuous-time dynamic programs aim to maximize a measure of lifetime value. In {ref}`ss-ctlv` we study lifetime valuations. In {ref}`ss-cadp` we learn how to maximize them.
(ss-ctlv)= ### Valuation In this section, we consider lifetime valuations associated with continuous reward flows, starting from a general semigroup perspective and then progressing to specific cases (such as expected lifetime value under constant discounting). Throughout, $\Xsf$ is a finite set. (sss-asgp)= #### A Semigroup Perspective For the discrete time problems with state-dependent discounting that we studied in {prf:ref}`c-state_dep`, lifetime valuations take the form $v = \sum_{t \geq 0} K^t h$ for some $h \in \RR^\Xsf$ and a positive linear operator $K$ on $\RR^\Xsf$. (See {prf:ref}`t-dpec` and {eq}`eq-vsig_stat_dep`.) For a continuous-time version we fix $h \in \RR^\Xsf$, take $(K_t)_{t \geq 0}$ to be a **positive** exponential semigroup in $\lopx$, where positive means $K_t \geq 0$ for all $t$, and set $$ v = \int_0^\infty K_t h \diff t. $$ (eq-vykh) Let $A \in \lopx$ be the infinitesimal generator of $(K_t)_{\, t \geq 0}$. The next result provides a condition for finiteness of $v$ and several characterizations. ```{prf:proposition} :label: p-gsgv If $s(A) < 0$, then 1. the integral in {eq}`eq-vykh` is finite and $$ v = \int_0^t K_\tau h \diff \tau + K_t v \quad \text{for all } t \geq 0, $$ (eq-vsby) 2. $A$ is bijective and $v = - A^{-1} h$, 3. $A^{-1} \leq 0$, and 4. the operator $U \colon \RR^\Xsf \to \RR^\Xsf$ defined by $$ U w = h + (I + A) w \qquad \left(w \in \RR^\Xsf\right) $$ (eq-uvh) is order stable on $\RR^\Xsf$ and $v$ in {eq}`eq-vykh` is the unique fixed point. ``` A way to understand {eq}`eq-vsby` is to view the valuation $v$ as a price that reflects prospective benefits from holding an asset. The asset yields a flow of benefits, where $h(x)$ is the instantaneous reward in state $x$. Rewards $t$ periods in the future are discounted by the pricing operator $K_t$. Thus, $(K_t h)(x)$ is the anticipated payoff $t$ periods ahead, discounted for the wait time and possibly also for risk as in {eq}`eq-frapt`. 
The value $v(x)$ is then lifetime value, which equals the current price. In this asset valuation setting, {eq}`eq-vsby` is a natural consistency condition. It says that the price of purchasing the asset today is equal to the payouts obtained from holding the asset from now until time $t$ and then selling it for current discounted value $K_t v$. (This is the continuous-time analog of {eq}`eq-fdstreq`.) The preceding discussion matches the semigroup perspective on asset pricing introduced in {cite:t}`garman1985towards` and {cite:t}`duffie1986intertemporal`. In addition to shedding light on {eq}`eq-vsby`, it also leads to the assertion that $v = - A^{-1} h$ in (ii), which is obtained by differentiating {eq}`eq-vsby`. Details are in the proof: ```{prf:proof} *Proof of {prf:ref}`p-gsgv`.* From {prf:ref}`p-fdsemi`, we have $K_t = \me^{tA}$ for all $t \geq 0$. Since $s(A) < 0$, {prf:ref}`t-duetoly` implies that the integral in {eq}`eq-vykh` is finite. For any $t \geq 0$, $$ v = \int_0^\infty K_\tau h \diff \tau = \int_0^t K_\tau h \diff \tau + \int_t^\infty K_\tau h \diff \tau . $$ Using the semigroup property and linearity of $K_t$, we can write the last term on the right-hand side as $$ \int_t^\infty K_\tau h \diff \tau = \int_0^\infty K_{t + \tau} h \diff \tau = \int_0^\infty K_t K_\tau h \diff \tau = K_t \int_0^\infty K_\tau h \diff \tau = K_t v. $$ Combining this result with the expression for $v$ in the previous display proves {eq}`eq-vsby`. This proves part (i) of the proposition. Turning to (ii), if we rearrange {eq}`eq-vsby` and divide by $t > 0$, we get $$ -\frac{K_t - I}{t} v = \frac{1}{t} \int_0^t K_\tau h \diff \tau . $$ (eq-kdrl) By the fundamental theorem of calculus, $$ \lim_{t \to 0} \frac{1}{t} \int_0^t K_\tau h \diff \tau = \frac{\diff}{\diff t} \int_0^t K_\tau h \diff \tau \; \Bigr\rvert_{\, t = 0} = K_0 \, h = I \, h = h. $$ As a result, taking $t \to 0$ in {eq}`eq-kdrl` and using the definition of the infinitesimal generator yields $- A v = h$. 
Moreover, since $s(A) < 0$, all eigenvalues of $A$ are nonzero. Hence $A$ has nonzero determinant and is therefore nonsingular (bijective). Combining these facts yields $v = - A^{-1} h$. Regarding (iii), fix $g \in \RR^\Xsf$ with $g \geq 0$. From the preceding results, the function $w = \int_0^\infty K_t g \diff t$ is finite and equals $-A^{-1} g$. Since $K_t \geq 0$ for all $t$, we have $w \geq 0$. Thus, $-A^{-1} g \geq 0$ whenever $g \geq 0$. Hence $-A^{-1} \geq 0$, or $A^{-1} \leq 0$. For (iv) we use the fact that $v$ obeys $-A v = h$ to obtain $v = h + (I + A) v$. Hence $v$ is a fixed point of $U$. Conversely, if $w$ is a fixed point of $U$, then $-A w = h$. But $A$ is invertible, so then $w = - A^{-1} h = v$. Hence $v$ is the only fixed point of $U$ in $\RR^\Xsf$. Order stability of $U$ requires upward and downward stability on $\RR^\Xsf$. For upward stability, suppose that $w \in \RR^\Xsf$ and $Uw \geq w$. Then $h + A w \geq 0$, or $- A w \leq h$. But $-A^{-1} \geq 0$, so $w \leq - A^{-1} h = v$, and upward stability holds. The proof of downward stability is similar. ◻ ``` #### Valuations as Expectations In applications, the expression $v = \int_0^\infty K_t \, h \diff t$ from {eq}`eq-vykh` typically arises as a discounted expectation over a flow of rewards. When analyzing $v$ we wish to deploy {prf:ref}`p-gsgv`, so we need to check that any expectation we propose results in $(K_t)$ being a semigroup. The next proposition provides one result along these lines. ```{prf:proposition} :label: p-mfgs If $(X_t)_{t \geq 0}$ is a continuous-time Markov chain on $\Xsf$ and $\delta \in \RR^\Xsf$, then the family of operators $(K_t)_{t \geq 0} \subset \lopx$ defined by $$ (K_t \, h)(x) = \EE_x \exp \left(- \int_0^t \delta(X_\tau) \diff \tau \right) h(X_t) \qquad (t \geq 0) $$ (eq-kthgen) is a positive $C_0$-semigroup. ``` In the proof of {prf:ref}`p-mfgs`, we will use the fact that $(X_t)_{t \geq 0}$ satisfies the **Markov property**. 
In particular, if $H$ is a real-valued function on the path space $C(\RR_+, \Xsf)$, then $$ \EE \left[ H( (X_\tau)_{\tau \geq s} ) \,|\, (X_\tau)_{\tau=0}^s \right] = \EE_{X_s} H( (X_\tau)_{\tau \geq 0} ). $$ (eq-ctmarp) For a proof of {eq}`eq-ctmarp`, see, for example, Chapter 2 of {cite:t}`liggett2010continuous`. ```{exercise} :label: ex-delpro Let $(X_t)_{t \geq 0}$ be as stated and, for each $s,t \in \RR_+$ with $s \leq t$, let $\eta(s, t)$ be the random variable defined by $$ \eta(s, t) = \exp \left(- \int_s^t \delta(X_\tau) \diff \tau \right). $$ Show that 1. $\eta(s,t) > 0$ for all $0 \leq s \leq t$, 2. $\eta(s,s) = 1$ for all $s \in \RR_+$, and 3. $\eta(0,s+t) = \eta(0, s) \, \eta(s, s+t)$ for all $s, t \in \RR_+$. ``` ```{prf:proof} *Proof of {prf:ref}`p-mfgs`.* Fix $h \in \RR^\Xsf$. Evidently $(K_0 \, h)(x) = h(x)$, so $K_0 = I$. Regarding the semigroup property, we fix $s \leq t$ and use {prf:ref}`ex-delpro` and the law of iterated expectations to obtain $$ (K_{s + \, t} \, h)(x) = \EE_x \, \eta(0,s + t) \, h(X_{s + t}) = \EE_x \, \left[ \eta(0,s) \, \EE \left[ \eta(s, s+t) \, h(X_{s + t}) \,|\, (X_\tau)_{\tau=0}^s \right] \right]. $$ Using the Markov property {eq}`eq-ctmarp`, the inner expectation in the last display can be expressed as $$ \EE \left[ \exp \left(- \int_s^{s+t} \delta(X_\tau) \diff \tau \right) h(X_{s + \, t}) \,|\, (X_\tau)_{\tau=0}^s \right] \\ = \EE_{X_s} \left[ \exp \left(- \int_0^t \delta(X_\tau) \diff \tau \right) h(X_t) \right] = (K_t \, h) (X_s), $$ so $$ (K_{s \, + \, t} \, h)(x) = \EE_x \eta(0, s) (K_t \, h) (X_s) = \EE_x \exp \left(- \int_0^s \delta(X_\tau) \diff \tau \right) (K_t \, h) (X_s) = (K_s \, K_t \, h) (x). $$ This argument confirms that $K_{s \, + \, t} = K_s \circ K_t$. To see that $K_t$ is a positive operator for all $t$, observe that if $h \geq 0$, then the expectation in {eq}`eq-kthgen` is nonnegative. Hence $K_t \, h \geq 0$ whenever $h \geq 0$. 
To prove continuity of $t \mapsto K_t h$, it suffices to show that $(K_t h)(x) \to h(x)$ as $t \downarrow 0$ (see, e.g., {cite}`engel2006short`, Proposition 1.3). This holds by right-continuity of $X_t$, which gives $h(X_t) \to h(x)$ as $t \downarrow 0$, and hence $$ \lim_{t \downarrow 0} (K_t h)(x) = \EE_x \lim_{t \downarrow 0} \exp \left(- \int_0^t \delta(X_\tau) \diff \tau \right) h(X_t) = h(x). $$ (Readers familiar with measure theory can justify the change of limit and expectation via the dominated convergence theorem.) ◻ ``` #### Constant Discounting Many studies of continuous-time dynamic programming with discounting use a constant discount rate. In this setting, the lifetime value in {eq}`eq-vykh` becomes $$ v(x) \coloneq \EE_x \int_0^\infty \me^{-t \delta} h(X_t) \diff t $$ (eq-exp_gs_ct) for some $\delta \in \RR$ and $h \in \RR^\Xsf$. Here $(X_t)_{t \geq 0}$ is a continuous-time Markov chain on the finite state space $\Xsf$, generated by a Markov semigroup $(P_t)_{t \geq 0}$ with intensity operator $Q$. The idea is that $h(X_t)$ is an instantaneous reward at each time $t$, while $\delta$ is a fixed discount rate. Equation {eq}`eq-exp_gs_ct` is the continuous-time version of {eq}`eq-exp_gs`. ```{prf:proposition} :label: p-expctval If $\delta > 0$, then $v$ in {eq}`eq-exp_gs_ct` is finite, $\delta I - Q$ is bijective, $$ (\delta I - Q)^{-1} \geq 0 \quad \text{and} \quad v = (\delta I - Q)^{-1} h. $$ (eq-ctvai) In addition, $v$ is the unique fixed point of $$ U w = h + (Q + (1 - \delta) I) w \qquad \left(w \in \RR^\Xsf\right) $$ (eq-uvhs) and $U$ is order stable on $\RR^\Xsf$. ``` ```{prf:proof} As a first step, we reverse the order of expectation and integration in {eq}`eq-exp_gs_ct` to get $$ v(x) = \int_0^\infty (K_t h)(x) \diff t \quad \text{where} \quad (K_t h)(x) \coloneq \me^{-t \delta} \, \EE_x \, h(X_t) = \me^{-t \delta} (P_t \, h)(x).
$$ (This change of order can be justified by Fubini's theorem, which can be applied when $\EE_x \, \int_0^\infty \me^{-t \delta} \, | h(X_t) | \diff t < \infty$. Since $\Xsf$ is finite, we have $|h| \leq M < \infty$ for some constant $M$, and the double integral is dominated by $M \int_0^\infty \me^{-t\delta} \diff t = M / \delta$.) Note that $K_t$ is a special case of {eq}`eq-kthgen`. Hence $(K_t)_{t \geq 0}$ is a positive $C_0$-semigroup. Its infinitesimal generator is $A \coloneq Q - \delta I$, since $K_t = \me^{-t \delta} P_t = \me^{t(Q - \delta I)}$. We claim that $s(A) < 0$. To see this, observe that (using {eq}`eq-salim`), $$ \me^{s(Q - \delta I)} = \rho(\me^{Q - \delta I}) = \rho(\me^Q \me^{- \delta I}) = \rho(\me^Q \me^{- \delta} I) = \me^{- \delta} \rho(\me^Q ) = \me^{- \delta} \rho(P_1) = \me^{- \delta}. $$ Taking logs gives $s(Q - \delta I) = -\delta$. Since $\delta > 0$, we have $s(Q - \delta I) < 0$, as claimed. We can now apply {prf:ref}`p-gsgv` with $A = Q - \delta I$ and $K_t = \me^{tA}$. The proposition tells us that $A$ is bijective, and $$ v = -A^{-1} h = (- A)^{-1} h = (\delta I - Q)^{-1} h . $$ It also tells us that $-A^{-1} \geq 0$, so $(\delta I - Q)^{-1} = (-A)^{-1} = - A^{-1} \geq 0$. This confirms both claims in {eq}`eq-ctvai`. Finally, the operator $U$ in {eq}`eq-uvhs` is a special case of $U$ in {eq}`eq-uvh`, with $A = Q - \delta I$, so $U$ is order stable with unique fixed point $v$ (by {prf:ref}`p-gsgv`). All of the claims in {prf:ref}`p-expctval` are now verified. ◻ ``` (ss-cadp)= ### Constructing a Decision Process In this section, we define continuous-time Markov decision processes, discuss optimality theory, and provide algorithms and applications. (definition)= #### Definition Given two finite sets $\Xsf$ and $\Asf$, called the state and action spaces respectively, we define a **continuous-time Markov decision process** (or **continuous-time MDP**) to be a tuple $\cC = (\Gamma, \delta, r, Q)$ consisting of 1.
a nonempty correspondence $\Gamma$ from $\Xsf$ to $\Asf$, referred to as the **feasible correspondence**, which in turn defines the **feasible state-action pairs** $$ \Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)}, $$ 2. a constant $\delta > 0$, referred to as the **discount rate**, 3. a function $r$ from $\Gsf$ to $\RR$, referred to as the **reward function**, and 4. an **intensity kernel** $Q$ from $\Gsf$ to $\Xsf$; that is, a map $Q$ from $\Gsf \times \Xsf$ to $\RR$ satisfying $$ \sum_{x'} Q(x, a, x') = 0 \quad \text{ for all } (x,a) \text{ in } \Gsf $$ and $Q(x, a, x') \geq 0$ whenever $x \not= x'$. Informally, at state $x$ with action $a$ over the short interval from $t$ to $t+h$, the controller receives instantaneous reward $r(x,a)h$ and the state transitions to state $x'$ with probability $Q(x, a, x') h + o(h)$. Paralleling our discussion of the discrete time case in {prf:ref}`c-mdps`, the set of **feasible policies** is $$ \Sigma \coloneq \setntn{\sigma \in \Asf^\Xsf} {\sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf}. $$ (eq-dmdp_fp_c) #### Lifetime Values Choosing policy $\sigma$ from $\Sigma$ means that we respond to state $X_t$ with action $A_t \coloneq \sigma(X_t)$ at every $t \geq 0$. The state then evolves according to the intensity operator $$ Q_\sigma(x, x') \coloneq Q(x, \sigma(x), x') \qquad (x, x' \in \Xsf). $$ Letting $$ P^\sigma_t \coloneq \me^{t Q_\sigma} \quad \text{and} \quad r_\sigma(x) \coloneq r(x, \sigma(x)) \qquad (x \in \Xsf) $$ the **lifetime value** of following $\sigma$ starting from state $x$ is $$ v_\sigma (x) \coloneq \EE_x \int_0^\infty \me^{-\delta t} r(X_t, \sigma(X_t)) \diff t = \EE_x \int_0^\infty \me^{-\delta t} r_\sigma(X_t) \diff t, $$ (eq-lvctm) where $(X_t)_{t \geq 0}$ is $Q_\sigma$-Markov with initial condition $x$. We call $v_\sigma$ the **$\sigma$-value function**. Since $\delta > 0$, we can apply {prf:ref}`p-expctval` to obtain $$ v_\sigma = (\delta I - Q_\sigma)^{-1} r_\sigma. 
$$ (eq-ctfunv) Representation {eq}`eq-ctfunv` provides a straightforward method for computing $v_\sigma$. #### Greedy Policies A policy $\sigma \in \Sigma$ is called **$v$-greedy** for $\cC$ if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{r(x, a) + \sum_{x'} v(x') Q(x, a, x')\right\} \quad \text{for all } x \in \Xsf. $$ (eq-ctmdpgp) As in the discrete time case, a $v$-greedy policy chooses actions optimally to trade off high current rewards against a high rate of flow into future states with high values. Unlike the discrete time case, where the discount factor enters the greedy condition, the discount rate does not appear in {eq}`eq-ctmdpgp` because the trade-off is instantaneous. #### Policy Iteration We introduce a continuous-time policy iteration algorithm that parallels discrete time HPI for Markov decision processes, as described in {ref}`sss-hpi`. The continuous-time HPI routine is given in {prf:ref}`algo-cthpi`, with the intuition being similar to that for the discrete time MDP version. We provide convergence results in {ref}`ss-ctmdpopt`. ```{prf:algorithm} Continuous-time Howard policy iteration :label: algo-cthpi - input $\sigma_0 \in \Sigma$, an initial guess of $\sigma^*$ - $k \leftarrow 0$ - $\epsilon \leftarrow 1$ - while $\epsilon > 0 $: - $v_k \leftarrow (\delta I - Q_{\sigma_k})^{-1} r_{\sigma_k}$ - $\sigma_{k+1} \leftarrow $ a $v_k$-greedy policy - $\epsilon \leftarrow \1 \{ \sigma_k \not= \sigma_{k+1} \}$ - $k \leftarrow k + 1$ - return $\sigma_k$ ``` (sss-ctmdppo)= #### Policy Operators For each $\sigma \in \Sigma$, let $T_\sigma$ be the operator defined at $v \in \RR^\Xsf$ by $$ T_\sigma \, v = r_\sigma + (Q_\sigma + (1 - \delta) I) v. $$ (eq-ctmdppo) As shown in {prf:ref}`p-expctval`, each $T_\sigma$ is order stable on $\RR^\Xsf$, with unique fixed point $v_\sigma$. Hence $\aA \coloneq (\RR^\Xsf, \{T_\sigma\})$ is an order stable ADP.
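As a rough illustration of {prf:ref}`algo-cthpi`, the following Julia sketch evaluates a policy via {eq}`eq-ctfunv`, computes a $v$-greedy policy via {eq}`eq-ctmdpgp`, and iterates until the policy repeats. The two-state, two-action model, its parameter values, and the function names `policy_value`, `greedy_policy`, and `howard_iteration` are hypothetical and chosen only for illustration. The sketch also assumes that every action is feasible in every state, so that $\Gamma(x) = \Asf$ for all $x$.

```julia
using LinearAlgebra

# Hypothetical model with illustrative numbers: states 1:2 and actions 1:2.
δ = 0.1                        # discount rate
r = [1.0 0.0;                  # r[x, a] is the reward flow
     2.0 0.5]
Q = zeros(2, 2, 2)             # Q[x, a, x'] is the intensity kernel
Q[1, 1, :] .= [-0.5,  0.5]     # each row sums to zero and is
Q[1, 2, :] .= [-2.0,  2.0]     # nonnegative off the diagonal
Q[2, 1, :] .= [ 1.0, -1.0]
Q[2, 2, :] .= [ 0.1, -0.1]

"Lifetime value of policy σ, computed as v_σ = (δI - Q_σ)^{-1} r_σ."
function policy_value(σ, r, Q, δ)
    n = length(σ)
    Q_σ = [Q[x, σ[x], y] for x in 1:n, y in 1:n]
    r_σ = [r[x, σ[x]] for x in 1:n]
    return (δ * I - Q_σ) \ r_σ
end

"A v-greedy policy: maximize r(x, a) + Σ_x' v(x') Q(x, a, x') over a."
function greedy_policy(v, r, Q)
    n, m = size(r)
    return [argmax([r[x, a] + dot(Q[x, a, :], v) for a in 1:m]) for x in 1:n]
end

"Howard policy iteration: stop as soon as the greedy policy repeats."
function howard_iteration(σ, r, Q, δ)
    while true
        v = policy_value(σ, r, Q, δ)
        σ_new = greedy_policy(v, r, Q)
        if σ_new == σ
            return (σ, v)
        end
        σ = σ_new
    end
end

σ_star, v_star = howard_iteration([2, 2], r, Q, δ)

# The returned value should be a fixed point of T_σ for the returned policy.
Q_star = [Q[x, σ_star[x], y] for x in 1:2, y in 1:2]
r_star = [r[x, σ_star[x]] for x in 1:2]
@assert v_star ≈ r_star + (Q_star + (1 - δ) * I) * v_star
```

Ties in the `argmax` step are broken by taking the first maximizer, which is one way to keep the stopping rule well defined; the text does not prescribe a particular tie-breaking rule.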
```{exercise} :label: ex-ctgadpg Show that $\sigma$ is $v$-greedy (i.e., {eq}`eq-ctmdpgp` holds) if and only if $\sigma$ is $v$-greedy for $\aA$ in the sense of {ref}`sss-opset`. ``` ```{solution} ex-ctgadpg Fix $v \in \RR^\Xsf$. Policy $\sigma$ is $v$-max-greedy for $\aA$ if and only if $T_\sigma \, v \geq T_\tau \, v$ for all $\tau \in \Sigma$, which in turn holds if and only if (eq-vgctp)= $$ \begin{aligned} r(x, \sigma(x)) & + \sum_{x'} v(x') Q(x, \sigma(x), x') + (1 - \delta) v(x) \\ & = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v(x') Q(x, a, x') \right\} + (1 - \delta) v(x), \end{aligned} $$ for all $x \in \Xsf$. Canceling terms, this reduces to $$ r(x, \sigma(x)) + \sum_{x'} v(x') Q(x, \sigma(x), x') = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v(x') Q(x, a, x') \right\}, $$ for all $x \in \Xsf$, which is equivalent to the definition of $v$-greedy for $\cC$ in {eq}`eq-ctmdpgp`. ``` (ss-ctmdpopt)= ### Optimality For a continuous-time MDP $\cC = (\Gamma, \delta, r, Q)$ with $\sigma$-value functions $\{v_\sigma\}$, - the **value function** generated by $\cC$ is $v^* \coloneq \bigvee_\sigma v_\sigma$, and - a policy $\sigma \in \Sigma$ is called **optimal** for $\cC$ if $v_\sigma = v^*$. A function $v \in \RR^\Xsf$ is said to satisfy a **Hamilton--Jacobi--Bellman** (**HJB**) equation if $$ \delta v(x) = \max_{a \in \Gamma(x)} \left\{r(x, a) + \sum_{x'} v(x') Q(x, a, x')\right\} \quad \text{for all } x \in \Xsf. $$ (eq-hjb) We say that $\cC$ obeys **Bellman's principle of optimality** if $$ \sigma \in \Sigma \text{ is optimal for } \cC \quad \iff \quad \sigma \text{ is } v^* \text{-greedy}. $$ Here is our main optimality result for continuous-time MDPs. ```{prf:theorem} :label: t-ct_fbk For any continuous-time MDP $\cC = (\Gamma, \delta, r, Q)$, 1. the value function $v^*$ is the unique solution to the HJB equation in $\RR^\Xsf$, 2. $\cC$ obeys Bellman's principle of optimality, and 3. $\cC$ has at least one optimal policy.
In addition, continuous-time HPI converges to an optimal policy in finitely many steps. ``` ```{prf:proof} Let $\cC = (\Gamma, \delta, r, Q)$ be a fixed continuous-time MDP with lifetime values $\{v_\sigma\}$ and value function $v^*$. Consider the order stable ADP $\aA \coloneq (\RR^\Xsf, \{T_\sigma\})$ discussed in {ref}`sss-ctmdppo`. The ADP Bellman max-operator is $T \coloneq \bigvee_\sigma T_\sigma$, which can be written more explicitly as $$ (T v)(x) = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v(x') Q(x, a, x') \right\} + (1 - \delta) v(x) . $$ (eq-ubvb) It is clear from {eq}`eq-ctmdpgp` and {prf:ref}`ex-ctgadpg` that, for each $v \in \RR^\Xsf$, the set of $v$-max-greedy policies is nonempty. Since $\Sigma$ is finite, it follows from {prf:ref}`p-fposet` that $\aA$ is max-stable. Hence, by {prf:ref}`t-fbk`, an optimal policy always exists and the value function $v^*$ is the unique fixed point of $T$ in $\RR^\Xsf$. The last statement is equivalent to the assertion that $v^*$ is the unique element of $\RR^\Xsf$ satisfying $$ v^*(x) = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v^*(x') Q(x, a, x') \right\} + (1 - \delta) v^*(x). $$ Rearranging this expression confirms that $v^*$ is the unique solution to the HJB equation in $\RR^\Xsf$. Applying {prf:ref}`t-fbk` again, a policy is optimal for $\aA$ if and only if $T_\sigma \, v^* = T v^*$. Since the definition of optimality for $\aA$ coincides with the definition of optimality for $\cC$, we see that $\cC$ obeys Bellman's principle of optimality. The continuous-time HPI routine described in {prf:ref}`algo-cthpi` is just ADP max-HPI (see {ref}`sss-opres`) specialized to the current setting. Hence, applying {prf:ref}`t-fbk` once more, continuous time HPI converges to an optimal policy in finitely many steps. ◻ ``` ### Application: Job Search Here we study a continuous-time version of the job search problem with separation considered in {ref}`ss-jsws`. 
As before, a worker can be either unemployed (state $0$) or employed (state $1$). When the worker is employed, she can be fired at any time. Firing occurs at rate $\alpha > 0$, meaning that the probability of being fired over the short interval from $t$ to $t+h$ is approximately $\alpha h$. When unemployed, the worker receives flow unemployment compensation $c$ and job offers at rate $\kappa$. She can choose either to accept or to reject an offer; she discounts the future at rate $\delta > 0$. We assume that job offers are associated with wage offers that take values in a finite set $\Wsf$. Let $P \in \mopw$ give probabilities for new wage draws, so that, conditional on the previous draw $w$, the next offer is drawn from $P(w, \cdot)$. For the state space we set $\Xsf = \{0, 1\} \times \Wsf$, with typical state $x = (s, w)$. Here $s$ is binary and indicates current employment status, while $w$ is the current wage. Let $$ \lambda(x) = \lambda(s, w) = \1\{s = 0\} \kappa + \1\{s = 1\} \alpha $$ denote the state-dependent jump rate, which switches between $\kappa$ and $\alpha$ depending on employment status. Let $a \in \Asf \coloneq \{0,1\}$ indicate the action, where $0$ means reject and $1$ means accept. Let $\Pi(x, a, x')$ represent the jump probabilities, with $$ \begin{aligned} \Pi((0, w), a, (0, w')) & = P(w, w') (1-a) \qquad \text{(unemployed to unemployed)} \\ \Pi((0, w), a, (1, w')) & = P(w, w') a \; \,\qquad \qquad \text{(unemployed to employed)} \\ \Pi((1, w), a, (0, w')) & = P(w, w') \quad \qquad \qquad \text{(employed to unemployed)} \\ \Pi((1, w), a, (1, w')) & = 0. \qquad\qquad\qquad \quad \; \; \, \, \text{(employed to employed)} \end{aligned} $$ The first two lines give jump probabilities for the state $(s, w)$ when the worker is unemployed and the action is $a$. The last two give jump probabilities when the worker is employed.
The reason that the probability assigned to the last line is zero is that a jump from $s=1$ occurs because the worker is fired, so the value of $s$ after the jump is zero. ```{exercise} :label: ex-ctime-auto-14 Prove that $\Pi$ is a stochastic kernel, in the sense that $\Pi \geq 0$ and $\sum_{x'} \Pi(x, a, x') = 1$ for all possible $(x, a) = ((s, w), a)$ in $\Xsf \times \{0,1\}$. ``` Motivated by the jump chain construction of intensity matrices in {eq}`eq-qfromj`, we set $$ Q(x, a, x') = \lambda(x) (\Pi(x, a, x') - I(x, x')). $$ It follows that, for any $\sigma \in \Sigma \coloneq \{0,1\}^\Xsf$, the operator $$ Q_\sigma(x, x') \coloneq \lambda(x) (\Pi(x, \sigma(x), x') - I(x, x')), $$ is an intensity matrix for the jump chain under policy $\sigma$. If we define $$ r(x, a) = r((s, w), a) = c \1\{s = 0\} + w \1\{s = 1\}, $$ then lifetime value is given by {eq}`eq-lvctm`, where $(X_t)_{t \geq 0}$ is $Q_\sigma$-Markov and $X_0 = x$. With $\Gamma$ defined by $\Gamma(x) = \Asf$ for all $x \in \Xsf$, the tuple $\cC = (\Gamma, \delta, r, Q)$ is a continuous-time MDP and {prf:ref}`t-ct_fbk` applies. In particular, an optimal policy exists and can be computed with HPI in a finite number of iterations. Figure {numref}`f-cont_time_js_pol` shows an optimal policy computed in this way. (Code and parameter values can be found in `cont_time_js.jl`.) The policy is of threshold type, with a reservation wage of around 12. Figure {numref}`f-cont_time_js_res` shows how this reservation wage changes with parameters. The reservation wage increases as the separation rate falls, as the offer rate increases, as the discount rate falls, and as unemployment compensation increases. ```{exercise} :label: ex-ctime-auto-15 Provide economic intuition for the monotone relationships between parameters and the reservation wage discussed in the preceding paragraph. 
``` ```{figure} ../figures/cont_time_js_pol.pdf :name: f-cont_time_js_pol Continuous-time job search policy ``` ```{figure} ../figures/cont_time_js_res.pdf :name: f-cont_time_js_res Continuous-time job search reservation wage ``` (s-cn_ctime)= ## Chapter Notes {cite:t}`applebaum2019semigroups` and {cite:t}`engel2006short` provide elegant introductions to semigroup theory and its applications in studying partial and stochastic differential equations. The beautiful book by {cite:t}`lasota1994chaos` covers connections among semigroups, Markov processes, and stochastic differential equations. {cite:t}`norris1998markov` provides a good introduction to continuous-time Markov chains, while {cite:t}`liggett2010continuous` is more advanced. A rigorous treatment of continuous-time MDPs can be found in {cite:t}`hernandez2012further`, which also handles the case where $\Xsf$ is countably infinite. Our approach is somewhat different, since our main optimality results rest on the ADP theory in {prf:ref}`c-adps`. In recent years, continuous-time dynamic programming has become more common in macroeconomic analysis. Influential references include {cite:t}`nuno2018social`, {cite:t}`kaplan2018monetary`, {cite:t}`achdou2022income`, and {cite:t}`fernandez2023financial`. For computational aspects, see {cite:t}`duarte2018machine`, {cite:t}`rafales2021equilibrium`, {cite:t}`rendahl2022continuous`, and {cite:t}`eslami2023art`. [^1]: Other names for intensity matrices include "$Q$-matrices" (which is fine until you need to use another symbol), "Kolmogorov matrices," and "infinitesimal stochastic matrices." ======================================================================== ## Suprema and Infima (c-areal)= # Suprema and Infima This section of the appendix contains an extremely brief review of basic facts concerning sets, functions, suprema, and infima. We recommend {cite}`bartle2011introduction` for those who wish to learn more.
(ss-setsfuns)= ## Sets and Functions A **set** is a collection of objects viewed as a whole. Examples include the set of **natural numbers** $\NN \coloneq \{1, 2, \ldots\}$ and $\natset{n} \coloneq \{1, 2, \ldots, n\}$ when $n \in \NN$. The set that contains no elements is called the **empty set** and denoted by $\emptyset$. Let $A$ and $B$ be two sets and let $A \times B$ be their **Cartesian product**, defined as the set of all ordered pairs $(a, b)$ such that $a \in A$ and $b \in B$. A **binary relation** $\sim$ between two sets $A$ and $B$ is a subset of $A \times B$. If $(a, b)$ is in this subset we write $a \sim b$. An **equivalence relation** on $A$ is a binary relation $\sim$ between $A$ and itself that is reflexive, symmetric, and transitive. That is, 1. $a \sim a$ for all $a \in A$, 2. $a \sim a'$ implies $a' \sim a$, and 3. $a \sim a'$ and $a' \sim a''$ implies $a \sim a''$. A **function** $f$ from set $A$ to set $B$, written $A \ni x \mapsto f(x) \in B$ or $f \colon A \to B$, is a rule (in fact, a binary relation) associating to each and every element $a$ in $A$ one and only one element $b \in B$. The point $b$ is also written as $f(a)$, and called the **image** of $a$ under $f$. For $C \subset A$, the set $f(C)$ is the set of all images of points in $C$, and is called the image of $C$ under $f$. Also, for $D \subset B$, the set $f^{-1}(D)$ is all points in $A$ that map into $D$ under $f$, and is called the **preimage** of $D$ under $f$. A function $f \colon A \to B$ is called **one-to-one** if distinct elements of $A$ are always mapped into distinct elements of $B$, and **onto** if every element of $B$ is the image under $f$ of at least one point in $A$. A **bijection** or **one-to-one correspondence** from $A$ to $B$ is a function $f$ from $A$ to $B$ that is both one-to-one and onto. A set $\Xsf$ is called **finite** if there exists a bijection from $\Xsf$ to $[n] \coloneq \{1, \ldots, n\}$ for some $n \in \NN$.
In this case, we can write $\Xsf = \{x_1, \ldots, x_n\}$. The number $n$ is called the **cardinality** of $\Xsf$. Note that, according to our definition, every finite set is automatically nonempty. If $f \colon A \to B$ and $g \colon B \to C$, then the **composition** of $f$ and $g$ is the function $g \circ f$ from $A$ to $C$ defined at $a \in A$ by $(g \circ f)(a) \coloneq g(f(a))$. ## Some Properties of The Real Line Given a subset $A$ of $\RR$, we call $u \in \RR$ an **upper bound** of $A$ if $a \leq u$ for all $a$ in $A$. A **lower bound** of $A$ is any number $\ell$ such that $\ell \leq a$ for all $a \in A$. If $A$ has both an upper and lower bound then $A$ is called **bounded**. Equivalently, $A$ is bounded whenever there exists an $n \in \NN$ with $A \subset [-n, n]$. Let $U(A)$ be the set of all upper bounds of $A$. An element $\bar u$ of $\RR$ is called a **supremum** or **least upper bound** of $A$ if 1. $\bar u \in U(A)$ and 2. $\bar u \leq u$ for every $u \in U(A)$. When a supremum of $A$ exists in $\RR$, we write it as $\sup A$. ```{prf:example} For the set $I \coloneq [0, 1] \subset \RR$, the number $1$ is an upper bound of $I$. Moreover, if $u$ is an upper bound of $I$, then $u \geq 1$. Hence $1$ is the supremum of $I$. ``` ```{prf:example} $\NN$ has no supremum in $\RR$, since the set of upper bounds is empty. ``` ```{exercise} :label: ex-appA-auto-1 Show that, for all of the sets $(0,1)$, $[0, 1)$ and $(0, 1]$, the number $1$ is the supremum of the set. ``` ```{exercise} :label: ex-appA-auto-2 Fix $A \subset \RR$. Prove that, for $s \in U(A)$, we have $s = \sup A$ if and only if, for all $\epsilon > 0$, there exists a point $a \in A$ with $a > s - \epsilon$. ``` ```{exercise} :label: ex-appA-auto-3 Fix $A \subset \RR$. Prove that $A$ has at most one supremum. ``` One of the most important properties of $\RR$ is stated below. ```{prf:theorem} :label: t-lubp Every nonempty subset of $\RR$ with an upper bound in $\RR$ has a supremum in $\RR$. 
``` {prf:ref}`t-lubp` is often taken as axiomatic in formal constructions of the real numbers. (Alternatively, one may assume completeness of the reals and then prove {prf:ref}`t-lubp` using this property. See, e.g., {cite}`bartle2011introduction`.) If $i \in \RR$ is a lower bound for $A$ and also satisfies $i \geq \ell$ for every lower bound $\ell$ of $A$, then $i$ is called the **infimum** of $A$ and we write $i = \inf A$. At most one such $i$ exists, and every nonempty subset of $\RR$ bounded from below has an infimum. A **real sequence** is a map $x$ from $\NN$ to $\RR$, with the value of the function at $k \in \NN$ typically denoted by $x_k$ rather than $x(k)$. A real sequence $x = (x_k)_{k \geq 1} \coloneq (x_k)_{k \in \NN}$ is said to **converge** to $\bar x \in \RR$ if, for each $\epsilon > 0$, there exists an $N \in \NN$ such that $|x_k - \bar x| < \epsilon$ for all $k \geq N$. In this case, we write $\lim_k x_k = \bar x$ or $x_k \to \bar x$. {cite}`bartle2011introduction` give an excellent introduction to real sequences and their basic properties. A real sequence $(x_k)_{k \geq 1}$ is called **increasing** if $x_k \leq x_{k+1}$ for all $k$ and **decreasing** if $x_{k+1} \leq x_k$ for all $k$. If $(x_k)_{k \geq 1}$ is increasing (resp., decreasing) and $x_k \to x \in \RR$ then we also write $x_k \uparrow x$ (resp., $x_k \downarrow x$). ```{exercise} :label: ex-mssl Let $(x_k)$ be a bounded monotone increasing sequence in $\RR$. Prove that $\sup_k x_k = \lim_k x_k$. ``` Let $(x_k)$ be a real sequence in $\RR$ and set $s_n \coloneq \sum_{k=1}^n x_k$. If the sequence $(s_n)$ converges to some $s \in \RR$, then we set $$ \sum_{k=1}^\infty x_k \coloneq \sum_{k \geq 1} x_k \coloneq s = \lim_{n \to \infty} s_n. $$ We say that the **series** $\sum_{k=1}^n x_k$ converges to $\sum_{k=1}^\infty x_k$. ## Max and Min A number $m$ contained in a subset $A$ of $\RR$ is called the **maximum** of $A$ and we write $m = \max A$ if $a \leq m$ for every $a \in A$. 
It is called the **minimum** of $A$ and we write $m = \min A$ if $a \geq m$ for every $a \in A$. ```{exercise} :label: ex-appA-auto-4 Prove: If $A$ is a finite subset of $\RR$, then $\sup A = \max A$. ``` A subset $A$ of $\RR$ is called **closed** if, for any sequence $(x_n)$ contained in $A$ and converging to some limit $x \in \RR$, the limit $x$ is in $A$. ```{exercise} :label: ex-exmm Show that, if $A$ is a closed and bounded subset of $\RR$, then $A$ has both a maximum and a minimum. ``` ```{exercise} :label: ex-appA-auto-5 Prove the following statements: 1. If $A \subset B$, then $\sup A \leq \sup B$. 2. If $s = \sup A$ and $s \in A$, then $s = \max A$. 3. If $i = \inf A$ and $i \in A$, then $i = \min A$. ``` Given an arbitrary set $D$ and a function $f \colon D \to \RR$, define $$ \sup_{x \in D} f(x) \coloneq \sup \setntn{f(x)}{x \in D} \quad \text{and} \quad \max_{x \in D} f(x) \coloneq \max \setntn{f(x)}{x \in D} $$ whenever the latter exists. The terms $\inf_{x \in D} f(x)$ and $\min_{x \in D} f(x)$ are defined analogously. A point $x^* \in D$ is called a - **maximizer** of $f$ on $D$ if $x^* \in D$ and $f(x^*) \geq f(x)$ for all $x \in D$, and a - **minimizer** of $f$ on $D$ if $x^* \in D$ and $f(x^*) \leq f(x)$ for all $x \in D$. Equivalently, $x^* \in D$ is a maximizer of $f$ on $D$ if $f(x^*) = \max_{x \in D} f(x)$, and a minimizer if $f(x^*) = \min_{x \in D} f(x)$. We define $$ \argmax_{x \in D} f(x) \coloneq \setntn{x^* \in D}{f(x^*) \geq f(x) \text{ for all } x \in D}. $$ The set $\argmin_{x \in D} f(x)$ is defined analogously. ======================================================================== ## Remaining Proofs (c-ai)= # Remaining Proofs ## {prf:ref}`c-fpt` Results ```{prf:proof} :label: p-l-eqfst *Proof of {prf:ref}`l-eqfst`.* Regarding (i), fix $\phi, \psi \in \dD(\Xsf)$ with $\phi \lefsd \psi$. Pick any $y \in \Xsf$. By transitivity of partial orders, the function $u(x) \coloneq \1\{y \preceq x\}$ is in $i\RR^\Xsf$.
Hence $\sum_x u(x) \phi(x) \leq \sum_x u(x) \psi(x)$. Given the definition of $u$, this is equivalent to $G^\phi(y) \leq G^\psi(y)$. As $y$ was chosen arbitrarily, we have $G^\phi \leq G^\psi$ pointwise on $\Xsf$.

Regarding (ii), let $\phi, \psi \in \dD(\Xsf)$ be such that $G^\phi \leq G^\psi$ and let $\Xsf$ be totally ordered by $\preceq$. We can write $\Xsf$ as $\{x_1, \ldots, x_n\}$ with $x_i \preceq x_{i+1}$ for all $i$. Pick any $u \in i\RR^\Xsf$ and let $\alpha_i = u(x_i)$. By {prf:ref}`ex-iu`, we can write $u$ as $u(x) = \sum_{i=1}^n s_i \1\{x \succeq x_i\}$ at each $x \in \Xsf$, where $s_i \geq 0$ for all $i$. Hence

$$
    \sum_{x \in \Xsf} u(x) \phi(x)
    = \sum_{x \in \Xsf} \sum_{i=1}^n s_i \1\{x \succeq x_i\} \phi(x)
    = \sum_{i=1}^n s_i \sum_{x \in \Xsf} \1\{x \succeq x_i\} \phi(x)
    = \sum_{i=1}^n s_i \, G^\phi(x_i).
$$

A similar argument gives $\sum_{x \in \Xsf} u(x) \psi(x) = \sum_{i=1}^n s_i \, G^\psi(x_i)$. Since $G^\phi \leq G^\psi$, we have

$$
    \sum_{x \in \Xsf} u(x) \phi(x)
    = \sum_{i=1}^n s_i \, G^\phi(x_i)
    \leq \sum_{i=1}^n s_i \, G^\psi(x_i)
    = \sum_{x \in \Xsf} u(x) \psi(x) .
$$

We conclude that $\phi \lefsd \psi$, as was to be shown. ◻
```

(s-state_dep_append)=
## {prf:ref}`c-state_dep` Results

We adopt the setting of {ref}`sss-do_theory` and consider the claim

$$
    \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t)
    = \sum_{t=0}^\infty \EE_x \, \left[ \prod_{i=0}^t \beta_i \right] h(X_t),
$$ (eq-passexp)

when $(X_t)$ is $P$-Markov with initial condition $x$ and $h \in \RR^\Xsf$. Throughout this discussion the assumption $\rho(L) < 1$ is in force (see {prf:ref}`t-dpec`). Unlike the rest of the book, we assume some familiarity with measure theory, at the level of, say, {cite}`dudley2002real`, Chapters 3 and 4.

To begin the discussion we set

$$
    F_T \coloneq \sum_{t=0}^T \delta_t \, h(X_t)
    \quad \text{and} \quad
    F \coloneq \sum_{t=0}^\infty \delta_t \, h(X_t)
    \quad \text{where} \quad
    \delta_t \coloneq \prod_{i=0}^t \beta_i.
$$

Our first aim is to show that $F$ is a well-defined random variable, in the sense that the sum converges almost surely. Since absolute convergence of real series implies convergence, and since finite expectation implies finiteness almost everywhere, it suffices to show that

$$
    \EE_x \, \sum_{t=0}^\infty \delta_t \, |h(X_t)| < \infty.
$$ (eq-exsid)

By the monotone convergence theorem (see, e.g., {cite}`dudley2002real`, Theorem 4.3.2), we have

$$
    \EE_x \, \sum_{t=0}^\infty \delta_t \, |h(X_t)|
    = \sum_{t=0}^\infty \EE_x \, \delta_t \, |h(X_t)|
    = \sum_{t=0}^\infty (L^t |h|)(x) ,
$$

where the last equality is by {eq}`eq-ebbh`. Since $\rho(L) < 1$, the series $\sum_{t \geq 0} L^t$ converges, so the right-hand side is finite. Hence {eq}`eq-exsid` holds, which in turn confirms that $F$ is well-defined and finite almost surely.

Now observe that, on the probability-one set where $F$ is finite, we have $F_T \to F$ as $T \to \infty$. Moreover,

$$
    |F_T| \leq \sum_{t=0}^T \delta_t \, |h(X_t)|
    \leq Y \coloneq \sum_{t=0}^\infty \delta_t \, |h(X_t)|,
$$

and, as shown above, $\EE_x \, Y < \infty$. By the dominated convergence theorem, we now have $\EE_x \, F = \lim_{T \to \infty} \EE_x \, F_T$, or, equivalently,

$$
    \EE_x \, \sum_{t=0}^\infty \delta_t \, h(X_t)
    = \lim_{T \to \infty} \EE_x \, \sum_{t=0}^T \delta_t \, h(X_t)
    = \lim_{T \to \infty} \sum_{t=0}^T \EE_x \, \delta_t \, h(X_t)
    = \sum_{t=0}^\infty \EE_x \, \delta_t \, h(X_t).
$$

Hence {eq}`eq-passexp` holds.

## {prf:ref}`c-val` Results

```{prf:proof}
:label: pt-du

*Proof of uniqueness for {prf:ref}`t-du`.* We focus on the concave case. Let $I$ be as in {prf:ref}`t-du` and suppose that $T$ is an order-preserving concave self-map on $I$ with $T \phi \gg \phi$. By {prf:ref}`t-kt_pre`, $T$ has least and greatest fixed points in $I$. We denote them by $a$ and $b$, respectively. Let

$$
    \lambda = \min_{x \in \Xsf} \frac{a(x) - \phi(x)}{b(x) - \phi(x)},
$$

and let $\bar x$ be a minimizer.
It follows immediately from its definition that $\lambda$ obeys $0 \leq \lambda \leq 1$ and

$$
    a(x) \geq \lambda b(x) + (1-\lambda) \phi(x)
    \quad \text{for all } x \in \Xsf \text{ with equality at } \bar x.
$$

As a result, using the fact that $T$ is order-preserving and concave, together with $Tb = b$, we have

$$
    a = Ta \geq T(\lambda b + (1-\lambda) \phi) \geq \lambda b + (1-\lambda) T \phi.
$$

Suppose now that $\lambda < 1$. Since $T \phi \gg \phi$, we get $a \gg \lambda b + (1-\lambda) \phi$, and evaluating this at $\bar x$ yields

$$
    a(\bar x) > \lambda b(\bar x) + (1-\lambda) \phi(\bar x) = a(\bar x),
$$

which is a contradiction. Hence $\lambda=1$ and, therefore, $a \geq b$. Since all fixed points $\bar u$ of $T$ in $I$ obey $a \leq \bar u \leq b$, we see that $a = b$ is the unique fixed point of $T$ in $I$. ◻
```

(sss-port)=
## {prf:ref}`c-adps` Results

Let's now turn to the proof of the core optimality results for ADPs. In what follows, $\aA = (V, \{T_\sigma\})$ is a well-posed ADP with Bellman operator $\tmax$ and $\sigma$-value functions $\{ v_\sigma \}_{\sigma \in \Sigma}$. We start with the following lemma.

```{prf:lemma}
:label: l-hmu

If $\aA$ is order stable, then the following statements hold:

1. $v \in V_u \implies v \preceq \Hmax \, v$.
2. If $\sigma \in \Sigma$ and $\tmax \, v_\sigma = v_\sigma$, then $v_\sigma = \vmax$.
3. If $v \in V$ and $\Hmax \, v = v$, then $v = \vmax$ and $\tmax \, \vmax = \vmax$.
4. If $\aA$ is finite, then $\vmax$ exists in $V$ and $\Hmax \, \vmax = \vmax$. Moreover, for all $v \in V$, the HPI sequence $(v_k)$ defined by $v_k = \Hmax^k v$ converges to $\vmax$ in finitely many steps.
5. Fix $v \in V$ and let $(v_k)$ be the HPI sequence defined by $v_k = \Hmax^k v$ for $k \in \NN$. If $v_{k+1} = v_k$ for some $k \in \NN$, then $v_k = \vmax$ and every $v_k$-greedy policy is optimal.
```

```{prf:proof}
Regarding (i), fix $v \in V_u$ and let $\tau$ be $v$-greedy, with $\Hmax v = v_{\tau}$. Since $v \in V_u$, we have $v \preceq \tmax \, v = T_\tau \, v$.
This inequality and upward stability of $T_\tau$ yield $v \preceq v_\tau$. But then $v \preceq \Hmax v$, as claimed.

Regarding (ii), suppose $\sigma \in \Sigma$ and $\tmax \, v_\sigma = v_\sigma$. Fix $\tau \in \Sigma$ and note that $v_\sigma = \tmax \, v_\sigma \succeq T_\tau \, v_\sigma$. Downward stability of $T_\tau$ implies $v_\sigma \succeq v_\tau$. Since $\tau \in \Sigma$ was arbitrary, $v_\sigma = \vmax$.

Regarding (iii), fix $v \in V$ with $\Hmax \, v = v$ and let $\sigma$ be such that $\Hmax \, v = v_\sigma$. Then $v_\sigma = v$, and, since $\sigma$ is $v$-greedy, $T_\sigma \, v = \tmax \, v$. But then $T_\sigma \, v_\sigma = \tmax \, v_\sigma$, and, since $v_\sigma = T_\sigma \, v_\sigma$, we have $v_\sigma = \tmax \, v_\sigma$. Part (ii) now implies $v = v_\sigma = \vmax$. This proves the first claim. Regarding the second, substituting $v_\sigma = \vmax$ into $v_\sigma = \tmax \, v_\sigma$ yields $\vmax = \tmax \,\vmax$.

For (iv), it suffices to show that $\Hmax \, \vmax = \vmax$ and there exists a $K \in \NN$ such that $\Hmax^K v = \vmax$. To this end, let $v_k = \Hmax^k v$ and note that $v_k \in V_\Sigma$ for all $k \geq 1$. Part (i) implies that $v_{k+1} \succeq v_k$ for all $k \in \NN$. Since the sequence $(v_k)$ is contained in the finite set $V_\Sigma$, it must be that $v_{K+1} = v_K$ for some $K \in \NN$ (since otherwise $V_\Sigma$ contains an infinite sequence of distinct points). But then $\Hmax \, v_K = v_{K+1} = v_K$, so $v_K$ is a fixed point of $\Hmax$. Part (iii) now implies that $v_K = \vmax$, and hence $\Hmax \, \vmax = \vmax$.

For (v), let $(v_k)$ be as stated and suppose that $v_{k+1} = v_k$ for some $k \in \NN$. Then $v_k$ is a fixed point of $\Hmax$, so, by (iii) above, we have $v_k = \vmax$. By Bellman's principle of optimality, every $v_k$-greedy policy is optimal. ◻
```

```{prf:proof}
:label: pp-fposet

*Proof of {prf:ref}`p-fposet`.* If $\aA$ is finite, then, by (iii) and (iv) of {prf:ref}`l-hmu`, the point $\vmax$ exists in $V$ and is a fixed point of $\tmax$.
◻
```

We first prove {prf:ref}`p-fbkc` and then return to {prf:ref}`t-fbk`.

```{prf:proof}
*Proof of {prf:ref}`p-fbkc`.* Let $\aA$ be max-stable. We need to establish the following claims.

1. $V_\Sigma$ has a greatest element $\vmax$,
2. $\vmax$ is the unique fixed point of $\tmax$ in $V$,
3. a policy is optimal if and only if it is $\vmax$-greedy, and
4. at least one optimal policy exists.

For claims (a) and (b), we observe that, by max-stability, $\tmax$ has a fixed point $\bar v$ in $V$. By existence of greedy policies, we can find a $\sigma \in \Sigma$ such that $\bar v = \tmax \, \bar v = T_\sigma \, \bar v$. But $T_\sigma$ has a unique fixed point in $V$, equal to $v_\sigma$, so $\bar v = v_\sigma$. Moreover, if $\tau$ is any policy, then $T_\tau \, \bar v \preceq \tmax \, \bar v = \bar v$ and hence, by downward stability, $v_\tau \preceq \bar v$. These facts imply that $\vmax \coloneq \bar v$ is the greatest element of $V_\Sigma$ and a fixed point of $\tmax$. Since greatest elements are unique, $\vmax$ is the only fixed point of $\tmax$ in $V$.

Regarding (c), parts (a) and (b) give $\vmax \in V$ and $\tmax \, \vmax = \vmax$. Now recall that $\sigma$ is optimal if and only if $v_\sigma = \vmax$. Since $v_\sigma$ is the unique fixed point of $T_\sigma$, this is equivalent to $T_\sigma \, \vmax = \vmax$. Since $\tmax \, \vmax = \vmax$, the last statement is equivalent to $T_\sigma \, \vmax = \tmax \vmax$, which is, in turn, equivalent to the statement that $\sigma$ is $\vmax$-greedy.

Part (d) follows directly from (a). ◻
```

```{prf:proof}
*Proof of {prf:ref}`t-fbk`.* Parts (i)--(iv) of {prf:ref}`t-fbk` follow from {prf:ref}`p-fbkc`, which provides optimality results for max-stable ADPs, and {prf:ref}`p-fposet`, which tells us that every finite order stable ADP is max-stable.

Regarding the final claim in {prf:ref}`t-fbk`, on convergence of HPI, suppose that $\aA$ is finite and order stable.
If HPI terminates, then (v) of {prf:ref}`l-hmu` implies that it returns an optimal policy. Part (iv) of the same lemma implies that HPI terminates in finitely many steps. ◻
```

========================================================================

## License & AI Training

# License & AI Training Permission

This page documents the licenses applied to *Dynamic Programming Volume I: Finite States* and the explicit permission granted by the authors and QuantEcon for indexing, text and data mining, and AI training use.

## Licenses

| Component | License |
|---|---|
| Book prose, equations, and figures | [Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA-4.0)](https://creativecommons.org/licenses/by-sa/4.0/) |
| Source code (Python, Julia, build scripts) | [MIT License](https://opensource.org/licenses/MIT) |

Third-party material (figures or quotations attributed to other sources) remains under its original license; such material is identified inline where it appears.

## Permission grant

The authors and QuantEcon explicitly permit and encourage the use of this book and its source files (prose, equations, figures, code, and bibliography) for:

- copying and indexing by search engines and crawlers;
- text and data mining;
- training, fine-tuning, evaluation, and benchmarking of AI / machine-learning models, including large language models;
- research, scholarship, and educational use;
- inclusion in derivative datasets, corpora, and embeddings.

This permission is granted with **attribution to the authors and QuantEcon** and is consistent with the licenses listed above.

## Preferred citation

```bibtex
@book{sargent_stachurski_dp1,
  author = {Sargent, Thomas J.
and Stachurski, John},
  title = {Dynamic Programming Volume I: Finite States},
  publisher = {QuantEcon},
  year = {2024},
  url = {https://dp.quantecon.org}
}
```

## Machine-readable artifacts

For LLM ingestion the site also publishes:

- [`/llms.txt`](/llms.txt) — curated chapter index ([llmstxt.org](https://llmstxt.org) standard)
- [`/llms-full.txt`](/llms-full.txt) — concatenated full Markdown source
- [`/robots.txt`](/robots.txt) — explicit `Allow` for major AI crawlers

## Contact

Questions about reuse, licensing, or attribution can be opened as issues on the [source repository](https://github.com/QuantEcon/book-dp1) or directed to .

========================================================================