# Dynamic Programming Volume II: General States

Authors: Thomas J. Sargent, John Stachurski
Source: https://github.com/QuantEcon/book-dp2
Site: https://book-dp2.quantecon.org
Prose license: CC-BY-SA-4.0
Code license: MIT

This file concatenates the full Markdown source of the book for LLM ingestion. Equations are LaTeX inside Markdown. Cross-references and figure paths refer to the published HTML site.

========================================================================

## index

This book covers the theory of dynamic programming with general state spaces. It is organized as follows:

**Front Matter**

- [](preface.md)
- [](common_symbols.md)

**Prelude**

- [](ch_egs.md) — Examples of Dynamic Programs

**Part I: General Theory**

- [](ch_adps.md) — Abstract Decision Processes
- [](ch_adps2.md) — ADPs on Pospace
- [](ch_adps3.md) — ADPs on Banach Space
- [](ch_transforms.md) — ADP Transformations

**Part II: Models and Applications**

- [](ch_ldps.md) — Linear Decision Processes
- [](ch_rdps.md) — Recursive Decision Processes
- [](ch_apps.md) — Additional Applications
- [](ch_approx_learning.md) — Approximation and Learning

**Appendices**

- [](ch_math_foundations.md) — Mathematical Background
- [](ch_solutions.md) — Solutions

========================================================================

## Preface

This book is the second of a two-volume sequence on theory and applications of dynamic programming. Volume 1 {cite:p}`sargent2025dynamic` focused on models with finite state and action spaces. This volume treats general state and action spaces. This is useful because many models are more easily expressed as optimization problems with infinite sets for states and actions. Moreover, settings with infinite states and choices give us access to new tools, including calculus and gradients.
Following on from Chapters 8 and 9 of {cite}`sargent2025dynamic`, we work within an abstract setting that builds on the framework in {cite}`bertsekas2022abstract` and {cite}`sargent2025partially`. Four features distinguish our treatment from the standard textbook approach (e.g., {cite:t}`stokey1989recursive,puterman2005markov,hernandez2012discrete`). First, at least for all high-level theory, we replace metric and contraction assumptions with order-theoretic ones. The policy operators of dynamic programming are almost universally order preserving, even when they fail to be contractions, and order theory turns out to provide the right foundation for a unified optimality theory. Second, this shift lets us treat, within a single framework, a very wide range of problems drawn from economics, finance, operations research, and artificial intelligence: state-dependent, negative, and nonlinear discounting; recursive, risk-sensitive, and ambiguity-averse preferences; robust and adversarial control; sequential analysis and stochastic shortest paths; post-action value functions used in structural estimation; distributional dynamic programming; and linear-quadratic control on the cone of positive semidefinite matrices. Most of these problems fall outside the reach of standard theory. Third, the framework specializes cleanly to the classical setting: standard Markov decision processes, optimal savings problems, natural resource problems, inventory control problems, discounted optimal stopping problems, and firm-valuation problems all reappear as concrete applications of a single abstract theory. Fourth, the resulting proofs are predominantly algebraic, rather than geometric or topological. This makes the core theory accessible with minimal functional-analytic background, simple to manipulate --- and a natural target for machine-assisted reasoning.
Although the core theory component of the book rests on relatively elementary order-based arguments, mathematical prerequisites for the entire book are significantly higher than those for Volume 1. In order to progress through all the applications, readers will need at least familiarity with basic concepts from functional analysis, measure theory, and order theory. For convenience, we have provided an extensive appendix that reviews these topics. While Volume 1 starts with specific models and gradually builds towards general theory -- an approach designed to help readers who are just starting to learn dynamic programming -- this volume takes more of a mathematician's approach by first setting forth the general theory, then specializing to particular applications. Volume 1 especially has been deeply influenced by the elegant and insightful book *Abstract Dynamic Programming* by Dimitri Bertsekas ({cite:p}`bertsekas2022abstract`), now in its third edition. While our approach here has diverged somewhat from his framework, it remains true that this volume is also inspired by Bertsekas's work. This book has been a combined effort involving many people. We are indebted to Schmidt Futures for financial support, stemming from a grant organized by Jim Savage. Thanks are also due to our PhD students Matheus Villas Boas Alves, Shu Hu, Nisha Peng, Simon Mishricky, Longye Tian, and Humphrey Yang, all of whom contributed significantly to the final product. In addition, many other friends, former students, and colleagues helped with preparation, either by directly reading and commenting on the book or via research collaborations. These include Jingni Yang, Yuchao Li, Qingyin Ma, Akshay Shanker, Alexis Akira Toda, Junnan Zhang, and Sylvia Zhao. We are deeply grateful for all of their contributions, while retaining full responsibility for any errors that remain. 
The second author would like to thank his wife and parents for their unflagging support, as well as the Institute of Economic Research at Kyoto University for hosting him during 2025, where substantial progress was made, and also the International University of Japan, the site of a final desperate push in the early months of 2026. He extends his profound appreciation and gratitude to his hosts Yoshi Nishiyama and Tadashi Sekiguchi at KIER, and to Yue Hua and Chien-Yu Huang at IUJ. A final thanks goes to the stoner rock bands Spaceslug, Elephant Tree, King Buffalo, and All Them Witches. Their songs provided the perfect backing tracks for constructing DP theory.

========================================================================

## Common Symbols and Terminology

## Mathematical Notation

```{list-table}
:header-rows: 0

* - $\1\{P\}$
  - indicator function ($1$ if statement $P$ is true, $0$ otherwise)
* - $\alpha \coloneq 1$
  - $\alpha$ is defined as equal to $1$
* - $f \equiv 1$
  - function $f$ is everywhere equal to $1$
* - $\bigvee$ and $\bigwedge$
  - supremum and infimum (see {ref}`sss-infsuppo`)
* - $\wp(A)$
  - the power set of $A$; that is, the set of all subsets of given set $A$
* - $\natset{n}$
  - $\{1, \ldots, n\}$
* - $\CC$
  - the complex numbers
* - $\NN$, $\ZZ$ and $\RR$
  - the natural numbers, integers and real numbers respectively
* - $\ZZ_+$, $\RR_+$, etc.
  - the nonnegative elements of $\ZZ$, $\RR$, etc.
* - $|x|$
  - absolute value of scalar or vector $x$ (modulus if $x \in \CC$)
* - $|B|$ for set $B$
  - the cardinality of $B$
* - $\RR^n$
  - all $n$-tuples of real numbers
* - $\RR^{m \times n}$
  - all $m \times n$ real matrices
* - $x \leq y \;\; (x,y \in \RR^n)$
  - $x_i \leq y_i$ for $i=1, \ldots, n$ (pointwise partial order)
* - $\RR^\Xsf$
  - all functions from $\Xsf$ to $\RR$
* - $b\Xsf$
  - all bounded (or bounded measurable) functions in $\RR^\Xsf$ (see {prf:ref}`eg-bx`)
* - $bc\Xsf$
  - all continuous functions in $b\Xsf$ (see {prf:ref}`eg-bcx0`)
* - $f(n) = \OO(\beta^n)$
  - there exists $C < \infty$ with $f(n) \leq C \beta^n$ for all $n \in \NN$
```

```{list-table}
:header-rows: 0

* - $\dD(\Xsf)$
  - the set of Borel probability measures on $\Xsf$ (see {ref}`ss-markop`)
* - $\blop(E, F)$
  - the set of bounded linear operators from $E$ to $F$ (see {ref}`ss-pnsl`)
* - $\la a, b \ra$
  - the inner product of $a$ and $b$
* - $v_n \uparrow v$
  - $(v_n)$ is increasing and $\bigvee_n v_n = v$ (see {ref}`sss-monseq`)
* - IID
  - independent and identically distributed
* - $X \eqdist Y$
  - $X$ and $Y$ have the same distribution
* - $X \sim F$
  - $X$ has distribution $F$
* - $F \lefsd G$
  - $F$ first order stochastically dominates $G$ (see {ref}`ss-sd`)
```

## Dynamic Programming Notation and Terminology

```{list-table}
:header-rows: 0

* - $(V, \TT)$
  - an ADP with value space $V$ and policy operators $T_\sigma \in \TT$ (see {ref}`sss-adpdef`)
* - $v_\sigma$
  - a $\sigma$-value function; fixed point of $T_\sigma$ (see {ref}`sss-adpdef`)
* - $T$
  - the Bellman operator, defined by $Tv = \bigvee_\sigma T_\sigma \, v$ (see {eq}`eq-adp_bellop`)
* - $H$
  - the Howard operator, defined by $Hv = v_\sigma$ where $\sigma$ is $v$-greedy (see {ref}`sss-top`)
* - $W$
  - the optimistic policy operator (see {eq}`eq-wmopmax`)
* - $V_G$
  - all $v \in V$ with at least one $v$-greedy policy (see {ref}`sss-subval`)
* - $V_U$
  - all $v \in V_G$ such that $v \preceq Tv$ (see {ref}`sss-subval`)
* - $V_\Sigma$
  - the set of fixed points of the policy operators (see {ref}`sss-subval`)
* - $\vmax$
  - the value function; greatest element of $V_\Sigma$ (see {ref}`sss-oabell`)
* - VFI
  - value function iteration (see {ref}`sss-mdp_algos` and {ref}`sss-adpal`)
* - OPI
  - optimistic policy iteration (see {ref}`sss-mdp_algos` and {ref}`sss-adpal`)
* - HPI
  - Howard policy iteration (see {ref}`sss-mdp_algos` and {ref}`sss-adpal`)
```

========================================================================

## Prelude: Examples of Dynamic Programs

Dynamic programming is a recursive technique for solving optimization problems. While initially developed for intertemporal problems (inventory management, investment planning, optimal savings and consumption, etc.), it has since been applied to various atemporal problems, ranging from genome sequencing and matrix multiplication to the structure of production chains. Creators of machine learning and artificial intelligence routinely use dynamic programming.

The sheer breadth of applications currently being tackled with dynamic programming is a challenge for presenting a modern theory. As well as the vast range of concrete problems faced in applied settings, researchers are adding features to their models that require extensions of the foundations of dynamic programming. Such new features include time-varying discount rates, nonlinear discounting, risk-sensitive control, ambiguity aversion, nonlinear time aggregation, and so on. Researchers added these features in their quests to create new models capable of coming closer to data sets of interest.

To set the scene, we begin with some problems that can be handled using classic methods. Then we discuss extensions and transformations that require more sophisticated theoretical foundations. Since our purpose here is to set the stage for the abstract theory that is the main focus of this Volume, our presentation here is mathematically informal at many points.
Results stated in this chapter are special cases of the abstract theory, which begins in {prf:ref}`c-adps`.

(s-fpintro)=
## A Firm Problem

This section uses a firm valuation and control problem to introduce core dynamic programming concepts. We begin with traditional methods, showing how recursive representations lead to the Bellman equation and optimal policies. We then discuss extensions involving unbounded rewards, time-varying discount rates, and risk-sensitive preferences.

(ss-tradmeth)=
### Models of a Firm

We first consider firm valuation under a Markov profit process, deriving a recursive representation for expected present value. Next, by giving the manager an option to sell the firm at a time of their choosing, we add a control. A Bellman-type optimality result is then stated and proved. This material establishes a template for the dynamic programs that recur throughout this chapter.

#### Valuation

A firm generates a random flow of profits $(\pi_t)_{t \geq 0} = \pi_0, \pi_1, \pi_2, \ldots$. At time $t$, a manager wants to calculate the present value of the profit process. Current profits $\pi_t$ are known but future values $\pi_{t+1}, \pi_{t+2}, \ldots$ are not. The manager knows the distribution of the process $(\pi_t)_{t \geq 0}$. Consequently the manager can compute the **expected present value**, namely

$$
    V_t \coloneq \EE_t \left[ \pi_t + \beta \pi_{t+1} + \beta^2 \pi_{t+2} + \cdots \right].
$$

Here $\beta$ is a **discount factor**, often reparameterized as $\beta = 1/(1+r)$ when $r$ is a discount **rate**. Thus, $\beta^m \pi_{t+m}$ is time $t+m$ profits discounted to the present date $t$. The symbol $\EE_t$ denotes mathematical expectation conditional on time $t$ information.
We can switch to a recursive representation of the sequence of valuations $(V_t)_{t \geq 0}$ by first writing $$ V_t = \pi_t + \beta \EE_t \left[ \pi_{t+1} + \beta \pi_{t+2} + \beta^2 \pi_{t+3} + \cdots \right] $$ and then applying the law of iterated expectations $\EE_t = \EE_t \EE_{t+1}$. This leads to $$ V_t = \pi_t + \beta \EE_t V_{t+1}. $$ (eq-firec) This expression will be important for us because the theory of dynamic programming is built around recursive relationships. Later we will see many variations of {eq}`eq-firec`. To make further progress in computing $(V_t)_{t \geq 0}$, let's assume that $\pi_t = \pi(X_t)$, where $(X_t)_{t \geq 0}$ is a discrete time Markov process taking values in a measurable space $(\Xsf, \bB)$. We assume that $(X_t)_{t \geq 0}$ is driven by a stochastic kernel $P$ (see {ref}`sss-sks`), so that $P(X_t, B)$ is the probability that $X_{t+1} \in B \in \bB$ given current state $X_t$. The function $\pi$ is assumed to be in $b\Xsf$, the set of bounded, $\bB$-measurable functions from $\Xsf$ to $\RR$. We understand $X_t$ as representing the "state of the world", including all factors affecting firm profits. {numref}`f-profit_paths` shows multiple realizations of the profit process $(\pi_t)_{t \geq 0}$ in the case where $\Xsf = \RR$, $\bB$ is the Borel sets, $\pi(x) = \exp(x)$, and $(X_t)$ is a discretization of an AR(1) process with $\rho = 0.9$ and $\nu = 0.2$. ```{figure} figures/profits_paths.pdf :name: f-profit_paths Sample profit paths ``` Under the Markov profits assumption, $V_t$ will depend on the current state $X_t$, since knowing the state helps predict future profits. At the same time, the Markov assumption means that earlier values $X_0, \ldots, X_{t-1}$ will not aid prediction once $X_t$ is known. This leads us to conjecture that $V_t = v(X_t)$ for some fixed function $v \colon \Xsf \to \RR$. Inserting this conjecture into {eq}`eq-firec` and evaluating at $X_t = x$ yields $$ v(x) = \pi(x) + \beta \int v(x') P(x, \diff x'). 
$$ (eq-firecm) Defining $(Pv)(x) \coloneq \int v(x') P(x, \diff x')$, which is consistent with the operator-theoretic notation in {ref}`ss-markop`, we can rewrite {eq}`eq-firecm` as $v = \pi + \beta P v$. Using the fact that $\pi \in b\Xsf$, and that the spectral radius of $\beta P$ is $\beta$ ({prf:ref}`l-mopfp`), this equation has a unique solution for $v$ in $b\Xsf$ whenever $\beta < 1$. The Neumann series lemma ({prf:ref}`t-nslbs`) tells us that the solution has the form $$ v = (I - \beta P)^{-1} \pi. $$ (eq-vsol) ```{figure} figures/profits_v_given_beta.pdf :name: f-profits_v_given_beta Value of the firm for different discount factors ``` {numref}`f-profits_v_given_beta` plots $v = (I - \beta P)^{-1} \pi$ over the state space $\Xsf$ for several choices of the discount factor $\beta$. The environment is the same as the one underlying {numref}`f-profit_paths`. As $\beta$ increases, the firm places greater weight on future profits, resulting in higher valuations across all states. ```{exercise} :label: ex-egs-auto-1 We began our analysis in a Markov environment by conjecturing that setting $V_t = v(X_t)$ for some $v$ would lead to a solution to the recursion {eq}`eq-firec`. Confirm that {eq}`eq-firec` holds when $v = (I - \beta P)^{-1} \pi$. ``` ```{solution} ex-egs-auto-1 From $v = (I - \beta P)^{-1} \pi$ we have $v = \pi + \beta P v$ and hence $v(X_t) = \pi(X_t) + \beta (Pv)(X_t)$. Since $(Pv)(X_t) = \EE_t v(X_{t+1})$, we recover $V_t = \pi(X_t) + \beta \EE_t V_{t+1}$. This is the recursive expression in {eq}`eq-firec`. ``` Notice how the difficult problem of computing a stochastic process $(V_t)_{t \geq 0}$ has been converted into the much easier problem of calculating $v = (I - \beta P)^{-1} \pi$. (sss-fintroc)= #### Control So far our manager has had no choice to make. Let's now give the manager the option to sell the firm to an outside buyer at the start of each period (before receiving current profits) for the fixed price $s$. 
We will describe the manager's decision with a binary policy function $\sigma$ that selects between selling the firm in the current period, after observing the current state $x$, and not selling --- in which case the manager again faces the option to sell the firm at the beginning of the next period. The same policy $\sigma$ is applied in every period, with $\sigma(x) \in \{0,1\}$ being the current decision. Modifying {eq}`eq-firecm` appropriately to a $\sigma$-dependent value function, we obtain the Bellman equation $$ v_\sigma(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v_\sigma(x') P(x, \diff x') \right], $$ (eq-firecms) where $v_\sigma(x)$ is the total value of the firm under the policy $\sigma$, conditional on state $x$. If $\sigma(x) = 1$, then the manager sells at payoff $s$ and the process ends, so $s$ is the total value received. Otherwise $\sigma(x) = 0$, indicating no sale, profit $\pi(x)$ is received, and the process continues, with discounted expected payoff $\beta \int v_\sigma(x') P(x, \diff x')$. This is the second term in {eq}`eq-firecms`. In what follows we call $v_\sigma$ the **$\sigma$-value function**. ```{prf:remark} We constructed {eq}`eq-firecm` in stages, starting with value as an infinite sum and then showing how it satisfies the recursion {eq}`eq-firecm`. The same can be done for {eq}`eq-firecms`, but it is a little more involved and we defer the details: in {ref}`sss-ndpolop`, we set up a very similar infinite sum and carefully prove that it satisfies a corresponding recursion. For now, {eq}`eq-firecms` is a natural extension of {eq}`eq-firecm`. ``` Let $\Sigma$ be the set of all policies, defined as all $\bB$-measurable functions mapping $\Xsf$ to $\{0,1\}$. For each $\sigma \in \Sigma$, let $v_\sigma$ be the function defined recursively in {eq}`eq-firecms`. We can use the Neumann series lemma to show that $v_\sigma$ is uniquely defined in $b\Xsf$. 
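
On a finite state space the Neumann series argument can be made concrete. The following is a minimal sketch, not a definitive implementation: the three-state kernel `P`, profits `pi`, sale price `s`, discount factor `beta`, and policy `sigma` are hypothetical values chosen for illustration. Writing $r_\sigma = \sigma s + (1-\sigma)\pi$ and $K_\sigma = \beta (1-\sigma) P$, equation {eq}`eq-firecms` reads $v_\sigma = r_\sigma + K_\sigma v_\sigma$, so $v_\sigma = \sum_{k \geq 0} K_\sigma^k r_\sigma$. The code accumulates partial sums of this series.

```python
# Hypothetical three-state example (all numbers are illustrative assumptions).
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]          # stochastic kernel: each row sums to one
pi = [0.5, 1.0, 2.0]           # profit in each state
s, beta = 20.0, 0.96           # sale price and discount factor
sigma = [1, 0, 0]              # policy: sell in state 0, continue elsewhere


def policy_value(sigma, P, pi, s, beta, tol=1e-12, max_terms=100_000):
    """Approximate v_sigma by partial sums of the Neumann series."""
    n = len(pi)
    # r_sigma = sigma * s + (1 - sigma) * pi
    r = [sigma[i] * s + (1 - sigma[i]) * pi[i] for i in range(n)]
    term = r[:]                # current term K^k r, starting at k = 0
    v = r[:]                   # running partial sum
    for _ in range(max_terms):
        # Next term: apply K = beta * diag(1 - sigma) * P to the current term
        term = [beta * (1 - sigma[i]) * sum(P[i][j] * term[j] for j in range(n))
                for i in range(n)]
        v = [v[i] + term[i] for i in range(n)]
        if max(abs(t) for t in term) < tol:
            break
    return v


v_sigma = policy_value(sigma, P, pi, s, beta)

# Residual of the recursion v = sigma * s + (1 - sigma) * (pi + beta * P v)
residual = max(
    abs(v_sigma[i]
        - (sigma[i] * s + (1 - sigma[i])
           * (pi[i] + beta * sum(P[i][j] * v_sigma[j] for j in range(3)))))
    for i in range(3)
)
```

In this sketch `v_sigma[0]` equals `s`, because the policy sells in state 0, and `residual` measures how closely the computed vector satisfies the recursion.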
Alternatively, we can introduce the operator $T_\sigma \colon b\Xsf \to b\Xsf$ via

$$
    (T_\sigma \, v)(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] \qquad (x \in \Xsf).
$$ (eq-fintrots)

We call $T_\sigma$ the **policy operator** defined by $\sigma$. In operator form it can be expressed as

$$
    T_\sigma \, v = \sigma s + (1-\sigma)(\pi + \beta Pv).
$$

Evidently, $v_\sigma$ solves {eq}`eq-firecms` if and only if $v_\sigma$ is a fixed point of $T_\sigma$. One easily shows that $T_\sigma$ is contracting (see {ref}`sss-conmap` for the definition) on the complete space $(b\Xsf, \| \cdot\|)$, where $\| \cdot \|$ is the supremum norm, defined by $\| f \| = \sup_{x \in \Xsf} |f(x)|$ for all $f \in b\Xsf$. Indeed, for arbitrary $v, w \in b\Xsf$,

$$
    |T_\sigma \, v - T_\sigma \, w| \leq (1-\sigma) \beta | Pv - Pw| \leq \beta | Pv - Pw| \leq \beta P | v - w|,
$$

where the absolute value $| \cdot |$ is applied pointwise (and the last inequality is by the triangle inequality for integrals --- or, if you prefer, by the inequality for positive operators in {prf:ref}`ex-ploa`). From the last expression we get

$$
    |T_\sigma \, v - T_\sigma \, w| \leq \beta P \| v - w \| \leq \beta \| v - w \|.
$$

Taking the supremum over the left-hand side, we find that $T_\sigma$ is a contraction of modulus $\beta$ on $b\Xsf$. Hence, the $\sigma$-value function $v_\sigma$ is the unique fixed point of $T_\sigma$ in the candidate space $b\Xsf$.

```{figure} figures/profits_fps.pdf
:name: f-profits_fps
:width: 95%

Policies and their lifetime value functions
```

{numref}`f-profits_fps` shows the value functions $v_\sigma$ and $v_\tau$ for two possible policies $\sigma$ and $\tau$. The policies are shown on the left and the value functions are on the right. The environment is the same as in {numref}`f-profits_v_given_beta`, with $\beta$ set to 0.96. The first policy $\sigma$ sells when the state is below 1.0 and continues otherwise.
The second policy $\tau$ oscillates between selling and continuing. Each value function is computed as the fixed point of the corresponding policy operator in $b\Xsf$. The horizontal dashed line indicates the sale price $s$. In terms of value, policy $\tau$ is outperformed by $\sigma$ everywhere on the state space.

Now let's consider optimality. A policy $\sigma$ is called **optimal** if $v_\tau(x) \leq v_\sigma(x)$ for all $\tau \in \Sigma$ and all $x \in \Xsf$. The **value function** $\vmax$, sometimes called the *optimal value function*, is defined by

$$
    \vmax(x) \coloneq \sup_{\sigma \in \Sigma} v_\sigma(x) \qquad (x \in \Xsf).
$$

Evidently $\sigma$ is an optimal policy if and only if $v_\sigma = \vmax$.

```{exercise}
:label: ex-egs-auto-2

Prove that $\vmax$ is well-defined as a real-valued function on $\Xsf$.
```

```{solution} ex-egs-auto-2
Fix $\sigma \in \Sigma$. Let $a = |s| + \| \pi \|$ and let $M = a/(1-\beta)$. Let $[-M, M]$ be all $v \in b\Xsf$ such that $|v| \leq M$. Repeated use of the triangle inequality shows that $|T_\sigma \, v| \leq |s| + |\pi| + \beta P |v|$. Hence, for $v \in [-M, M]$, we have $|T_\sigma \, v| \leq a + \beta M = M$. In particular, $T_\sigma$ is a self-map on the closed set $[-M, M]$. As a result, its fixed point $v_\sigma$ also lies in this set. From this fact we obtain $v_\sigma(x) \leq M$ for all $\sigma$ and all $x$. As nonempty subsets of $\RR$ that are bounded above have suprema, we conclude that $\vmax$ is well-defined ({prf:ref}`t-completeness`).
```

The problem of finding an optimal policy appears nontrivial, since $\Sigma$ is a large set whenever $\Xsf$ is large. Indeed, each $\sigma \in \Sigma$ is just an indicator function of a measurable set, so the cardinality of $\Sigma$ is the same as that of $\bB$. However, it turns out that we can find, characterize, and compute optimal policies relatively easily, using the theory of dynamic programming. In particular, we can state the following result, which is proved below in {ref}`sss-fintropp`.
```{prf:theorem} :label: t-fintroop The value function $\vmax$ is the unique $v \in b\Xsf$ that solves the functional equation $$ v(x) = \max \left\{ s ,\; \pi(x) + \beta \int v(x') P(x, \diff x') \right\} \qquad (x \in \Xsf). $$ (eq-fintroie) In addition, at least one optimal policy exists. Finally, a policy $\sigma \in \Sigma$ is optimal if and only if, for each $x \in \Xsf$, $$ \sigma(x) \in \argmax_{a \in \{0,1\}} \left\{ a s + (1 - a) \left[ \pi(x) + \beta \int \vmax(x') P(x, \diff x') \right] \right\}. $$ (eq-fintrob) ``` {prf:ref}`t-fintroop` is a simple consequence of Richard Bellman's (1920--1984) beautiful theory of dynamic programming {cite:p}`bellman1957dynamic`. Equation {eq}`eq-fintroie` is called the **Bellman equation**. The characterization in {eq}`eq-fintrob` has the following natural interpretation. The expression $\pi(x) + \beta \int \vmax(x') P(x, \diff x')$ is the payoff (expected present value) for choosing to continue, receiving current profits, and then behaving optimally (since we are valuing future states with $\vmax$). The best decision at $x$ is to continue if this is larger than $s$, which is achieved by setting $a=0$. Otherwise the manager should set $a=1$ and stop. We shall repeatedly use a term related to {prf:ref}`t-fintroop`. Thus, given $v \in b\Xsf$, we will agree to say that $\sigma \in \Sigma$ is **$v$-greedy** whenever $$ \sigma(x) \in \argmax_{a \in \{0,1\}} \left\{ a s + (1 - a) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] \right\} \quad \text{for all } x \in \Xsf. $$ (eq-fing) With this terminology, we can repeat the policy optimality characterization in {prf:ref}`t-fintroop` by saying that a policy is optimal if and only if it is $\vmax$-greedy. This idea is a cornerstone of our theory. 
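
When the state space is finite, the greedy condition {eq}`eq-fing` reduces to a pointwise comparison between the sale price and the value of continuing. The sketch below illustrates this under stated assumptions: the kernel `P`, profits `pi`, sale price `s`, discount factor `beta`, and candidate function `v` are hypothetical values, and ties are broken in favor of selling, which is just one admissible selection from the argmax.

```python
# Hypothetical three-state example (all numbers are illustrative assumptions).
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]          # stochastic kernel: each row sums to one
pi = [0.5, 1.0, 2.0]           # profit in each state
s, beta = 20.0, 0.96           # sale price and discount factor


def continuation_value(v, P, pi, beta):
    """Return pi + beta * P v, the payoff from continuing for one period."""
    n = len(pi)
    return [pi[i] + beta * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)]


def greedy(v, P, pi, s, beta):
    """Return a v-greedy policy: 1 (sell) where s >= pi + beta * P v, else 0."""
    cont = continuation_value(v, P, pi, beta)
    return [1 if s >= c else 0 for c in cont]


v = [10.0, 20.0, 40.0]         # an arbitrary candidate value function
sigma = greedy(v, P, pi, s, beta)
```

For these numbers the continuation payoffs are approximately 12.02, 21.16, and 36.56, so `sigma` is `[1, 0, 0]`: the policy sells in the first state and continues in the other two.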
```{prf:remark} If we adopt the tie-breaking convention that the manager always stops when the rewards to stopping (selling) and continuing are equal, we can replace {eq}`eq-fintrob` by stating that $\sigma$ is optimal if and only if $$ \sigma(x) = \1 \left\{ s \geq \pi(x) + \beta \int \vmax(x') P(x, \diff x') \right\} \qquad \text{for all } x \in \Xsf. $$ Here $\1\{S\}$ is an indicator function equal to one when the statement $S$ is true and zero otherwise. ``` (sss-fintrobe)= #### The Bellman Operator We noted above that a policy is optimal if and only if it is $\vmax$-greedy. This provides a direct avenue for computing optimal policies: First calculate $\vmax$ and then take a $\vmax$-greedy policy. A straightforward way to approximate $\vmax$ is to iterate on the **Bellman operator**, which, in the present setting, is the self-map $T$ on $b\Xsf$ defined at $v \in b\Xsf$ by $$ (T v)(x) = \max \left\{ s ,\; \pi(x) + \beta \int v(x') P(x, \diff x') \right\} \qquad (x \in \Xsf). $$ (eq-fintroio) In operator-theoretic notation, we write $T$ as $Tv = s \vee (\pi + \beta Pv)$. Evidently $v$ solves the Bellman equation if and only if $v$ is a fixed point of $T$. Note that $T$ is a contraction of modulus $\beta$ on $(b\Xsf, \| \cdot \|)$. Indeed, fixing $v, w \in b\Xsf$ and applying the elementary bound $$ |\alpha \vee x - \alpha \vee y| \leq |x - y| \qquad (\alpha, x, y \in \RR), $$ we get $$ |T v - T w| = |s \vee (\pi + \beta Pv) - s \vee (\pi + \beta Pw)| \leq \beta | Pv - Pw|. $$ The rest of the argument is identical to the one for contractivity of $T_\sigma$ in {ref}`sss-fintroc`. From this and {prf:ref}`t-fintroop`, we see that $T$ has a unique fixed point in $b\Xsf$ and that the fixed point is the value function $\vmax$. Moreover, for any $v$ in $b\Xsf$, we have $T^k v \to \vmax$ as $k \to \infty$. In other words, fixed point iteration (also called successive approximation) allows us to approximate $\vmax$ arbitrarily well. 
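
The successive approximation scheme just described can be sketched for a finite state space as follows. This is an illustrative sketch rather than the code behind the figures in this chapter: the kernel `P`, profits `pi`, sale price `s`, discount factor `beta`, tolerance, and iteration cap are all hypothetical choices.

```python
# Hypothetical three-state example (all numbers are illustrative assumptions).
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]          # stochastic kernel: each row sums to one
pi = [0.5, 1.0, 2.0]           # profit in each state
s, beta = 20.0, 0.96           # sale price and discount factor


def bellman(v, P, pi, s, beta):
    """Apply T: (Tv)(x) = max{s, pi(x) + beta * sum_x' v(x') P(x, x')}."""
    n = len(pi)
    return [max(s, pi[i] + beta * sum(P[i][j] * v[j] for j in range(n)))
            for i in range(n)]


def vfi(P, pi, s, beta, tol=1e-10, max_iter=10_000):
    """Iterate T from v_0 = 0 until the sup-norm step size falls below tol."""
    v = [0.0] * len(pi)
    for k in range(max_iter):
        v_new = bellman(v, P, pi, s, beta)
        step = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
        if step < tol:
            break
    return v, k + 1


v_star, num_iter = vfi(P, pi, s, beta)

# A v_star-greedy policy, breaking ties in favor of selling
sigma_star = [
    1 if s >= pi[i] + beta * sum(P[i][j] * v_star[j] for j in range(3)) else 0
    for i in range(3)
]
```

Because $T$ is a contraction of modulus $\beta$, a final step size below `tol` implies that the returned vector is within `tol * beta / (1 - beta)` of the fixed point in the supremum norm. With `beta` close to one this bound is loose, so a small tolerance is advisable.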
```{exercise} :label: ex-boeq0 Prove the following: for $v \in b\Xsf$, a policy $\sigma \in \Sigma$ is $v$-greedy if and only if $T v = T_\sigma \, v$. ``` ```{solution} ex-boeq0 This follows directly from the definitions. The statement that $\sigma$ is $v$-greedy is equivalent to $$ \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] = \max \left\{ s ,\; \pi(x) + \beta \int v(x') P(x, \diff x') \right\} $$ for all $x \in \Xsf$. Hence $T_\sigma \, v = Tv$. ``` ```{figure} figures/profits_optimal.pdf :name: f-profits_optimal :width: 95% Optimal policy and value function ``` {numref}`f-profits_optimal` shows an approximate optimal policy and an approximation to the value function $\vmax$. The latter was computed by iterating with the Bellman operator $T$ from initial condition $v_0 \equiv 0$ and monitoring for convergence (waiting for the step sizes $\| T^k v_0 - T^{k+1} v_0 \|$ to fall below some threshold). We then took the resulting function $v$ and computed a $v$-greedy policy. We used the same parameters as in {numref}`f-profits_fps`. The optimal policy has a threshold structure: the manager sells the firm when the state is below a critical value and continues operating otherwise. The value function $\vmax$ dominates the sale price $s$ everywhere, reflecting the value of the option to wait and sell later. (sss-fintropp)= #### Proving {prf:ref}`t-fintroop` In this book we will prove far more general results than {prf:ref}`t-fintroop`. So providing an immediate proof of the theorem here is actually redundant and postpones our presentation of some sample applications. Nevertheless, we now sketch a short proof of {prf:ref}`t-fintroop`, but alert readers that they can safely move on without digesting it now. ```{prf:proof} *Proof of {prf:ref}`t-fintroop`.* In {ref}`sss-fintrobe` we showed that $T$ is a contraction mapping on the complete space $(b\Xsf, \| \cdot \|)$. 
Hence $T$ is globally stable on $b\Xsf$ and therefore has a unique fixed point $\bar v \in b\Xsf$. Our first claim is that $\bar v = \vmax$. We show $\bar v \leq \vmax$ and then $\bar v \geq \vmax$. For the first inequality, let $\sigma \in \Sigma$ be $\bar v$-greedy. Recalling {prf:ref}`ex-boeq0`, we have $T_\sigma\, \bar v = T \bar v = \bar v$. Hence $\bar v$ is also a fixed point of $T_\sigma$. But the only fixed point of $T_\sigma$ in $b\Xsf$ is $v_\sigma$, so $\bar v = v_\sigma \leq \vmax$. This is our first inequality. As for the second, fix an arbitrary $\sigma \in \Sigma$ and observe that $T_\sigma \, \bar v \leq T \bar v = \bar v$. Since $T_\sigma$ is order-preserving (see {ref}`sss-opm`), this implies that $T^k_\sigma \, \bar v$ is decreasing and bounded above by $\bar v$. Because $T_\sigma$ is a contraction with fixed point $v_\sigma$, we can take the limit in $k$ to obtain $v_\sigma \leq \bar v$. Taking the supremum over $\sigma \in \Sigma$ yields $\vmax \leq \bar v$. This argument shows that $\vmax$ is a fixed point of $T$ in $b\Xsf$. Since $T$ is contracting on $b\Xsf$, we have confirmed that $\vmax$ is the unique solution to the Bellman equation in $b\Xsf$. Turning to our characterization of greedy policies, it follows from {prf:ref}`ex-boeq0` that $$ \sigma \text{ is } \vmax \text{-greedy} \quad \iff \quad T_\sigma \, \vmax = T \vmax \quad \iff \quad T_\sigma \, \vmax = \vmax. $$ The right hand side of this expression tells us that $\vmax$ is a fixed point of $T_\sigma$. But the only fixed point of $T_\sigma$ is $v_\sigma$, so the right hand side is equivalent to the statement $v_\sigma = \vmax$. By this chain of logic and the definition of optimality, we see that $$ \sigma \text{ is } \vmax \text{-greedy} \iff \vmax = v_\sigma \iff \text{ } \sigma \text{ is optimal}. $$ Since greedy policies exist for every $v$ in $b\Xsf$, this also proves existence of at least one optimal policy. ◻ ``` #### How About More General Policies? 
Up to now we have focused on **stationary Markov policies**, which, in our language, are measurable maps from $\Xsf$ to $\{0,1\}$. Restricting our attention to such policies prevents the manager from making decisions based on a longer history of states and actions, or from changing policies at some date. We have also ignored the possibility that the manager might wish to randomize actions, meaning that the choice is a probability of selling, rather than a fixed selection from $\{0,1\}$. It turns out that focusing exclusively on stationary Markov policies is appropriate in the current setting. We show in {ref}`ss-nonstat` that allowing nonstationary policy choices cannot lead to higher expected present value, and that this result holds far more generally. Moreover, randomization cannot improve outcomes in this setting. For a proof in a similar environment, see our discussion of mixed strategies in Section 9.2.1.6 of {cite}`sargent2025dynamic`. (ss-fintext)= ### Extensions The theory stated so far is elegant and can be extended in some directions with relatively little effort. For example, we can ask the manager to control inventories, capital stocks, and the size and disposition of a labor force across tasks. The same fundamental ideas still govern optimality, and the same solution approaches still work. But some extensions are more challenging and require moving beyond standard dynamic programs. We describe some of these challenges next. (sss-bcdr)= #### Beyond Constant Discount Rates One restrictive assumption in {ref}`ss-tradmeth` is that the discount factor $\beta$ is constant. In fact discount rates vary considerably. For example, market interest rates fluctuate substantially, responding to changes in monetary policy, inflation expectations, and risk premia. Interest rates charged to risky borrowers can fluctuate even more widely than benchmark rates. 
For a firm weighing the decision to continue operating versus exiting, the cost of capital at which future profits are discounted will have a substantial impact on optimal decisions. In practice, firms routinely incorporate time-varying discount rates into their strategic planning. {numref}`f-discount_rates` illustrates the extent of this variation using US data from the Federal Reserve. The top panel shows the federal funds rate, which transmits to firm financing costs through the credit channel. The bottom panel plots the real interest rate, computed as the 10-year Treasury yield minus twelve-month CPI inflation. Both sets of rates affect the intertemporal tradeoffs that firms face: when rates are high, future profits are discounted more heavily, reducing the present value of long-lived investments and making exit or disinvestment more attractive. Conversely, low or negative rates reduce the cost of waiting and encourage firms to invest, expand capacity, or delay exit from declining markets. The relative importance of real vs nominal rates varies across firms. The swings visible in {numref}`f-discount_rates` underscore the issues with assuming a fixed discount factor $\beta$ when studying intertemporal choices of firms. ```{figure} figures/discount_rates.pdf :name: f-discount_rates :width: 95% US nominal and real interest rates. ``` To accommodate dependence of discount factors on firm-specific or macroeconomic variables, we can specify that $\beta = b(X_t)$ for some suitable function $b$. Fortunately, this modification can be accommodated in the current setting. A discussion can be found in {ref}`sss-fvsd`. But the associated theory can become more complicated when the controller's actions affect state components that affect the discount factor. This happens, for example, in models of firms who face higher borrowing costs because they pursue high-risk strategies that involve occasionally running down their cash reserves. 
That makes the dynamic programming problem more challenging, particularly if we assume that interest rates are occasionally negative; such cases break contractivity properties of policy and Bellman operators. Optimality results for the finite state case can be found in {cite}`sargent2025dynamic`. We study more general cases in {prf:ref}`c-ldps`. (sss-urs)= #### Unbounded Rewards Another---more technical---issue with our analysis in {ref}`ss-tradmeth` is that $\pi$ is bounded. This directly embeds the problem in the space of bounded functions $b\Xsf$ and pairs naturally with the supremum norm. The contraction and optimality proofs unfold smoothly when working in such an environment. But assuming that $\pi$ is bounded can be overly restrictive. It turns out that we can drop the boundedness assumption without too much difficulty. One way is to assume instead that $\| \pi \|_\ell < \infty$ where $\ell \colon \Xsf \to [1, \infty)$ and $\| \cdot \|_\ell$ is the $\ell$-weighted norm defined by $\| f \|_\ell \coloneq \sup_{x \in \Xsf} |f(x)|/\ell(x)$. This approach is discussed in {ref}`sss-wrdpf`. Another option is to embed the problem in the class of integrable functions $L_1(\psi) \coloneq L_1(\Xsf, \bB, \psi)$ for a suitable measure $\psi$ on $(\Xsf, \bB)$. (For background see {ref}`ss-lp`.) For example, suppose that $\psi$ is a stationary distribution (see {ref}`sss-moif`) of the stochastic kernel $P$ and that $\pi \in L_1(\psi)$. The policy operators then send $L_1(\psi)$ into itself. Indeed, if $\pi \in L_1(\psi)$ and we fix $v \in L_1(\psi)$, then $|T_\sigma \, v| \leq |s| + |\pi| + \beta |P v|$, so $T_\sigma \, v$ is in $L_1(\psi)$ when $Pv$ is $\psi$-integrable. This is true by stationarity of $\psi$ --- see (i) of {prf:ref}`l-mopfpl` and the surrounding discussion. 
Moreover, $T_\sigma$ is a contraction map on $L_1(\psi)$, as can be seen by integrating both sides of the bound $|T_\sigma \, v - T_\sigma \, w| \leq \beta P | v - w|$ with respect to $\psi$ and using {eq}`eq-adjrules` to obtain $\int P | v - w| \diff \psi = \int |v - w| \diff \psi$. One can also show that the Bellman operator $T$ is a contraction with respect to the norm on $L_1(\psi)$ and then proceed to adapt the proof of {prf:ref}`t-fintroop`. Rather than provide all details here, we defer further discussion to {ref}`sss-fvur`.

(ss-brn)=
### Beyond Risk Neutrality

A limitation of the preceding analysis is how it treats risk. We assumed that the manager wants to maximize the expected present value of cash flow generated by the firm across different strategies (policies). But what if the manager wants something else?

Canonical theories of firm behavior suggest that expected present value is the appropriate criterion. In complete and frictionless markets, a firm's shareholders can hedge firm-specific risks by trading other securities, so the firm's manager should not worry about those risks {cite:p}`modigliani1958cost,smith1985determinants`.

But maybe financial markets are incomplete and maybe shareholders lack the information about risks or the financial sophistication needed to develop optimal hedging strategies. Maybe managers' incentives are not aligned with shareholders' interests. Managers might be concerned about a firm's survival and overweight downside risks relative to shareholders. Indeed, various studies offer evidence for such risk-averse decision-making within both small and large firms (see, e.g., {cite}`graham2013managerial`, {cite}`kerr2019risk`, or {cite}`almeida2024quantile`).

In this section we consider adjusting the manager's problem to incorporate some of these considerations.
(sss-rewarddist)= #### Distributions of Rewards Let's return to our jump-off point for optimization, where we decided to seek a policy that solves $\max_{\sigma \in \Sigma} v_\sigma(x)$ for all $x$. We know that $v_\sigma(x)$ is the expected lifetime value of policy $\sigma$ given initial condition $x$. As such, we can also write $$ v_\sigma(x) = \EE Z_\sigma, $$ (eq-vsiger) where $$ Z_\sigma \coloneq \sum_{t=0}^{T(\sigma)-1} \beta^t \pi_t + \beta^{T(\sigma)} s, \quad \text{with} \quad T(\sigma) \coloneq \inf \, \setntn{t \geq 0}{\sigma(X_t)=1}. $$ (ex-rsigrv) Here $T(\sigma)$ is the date at which the firm is sold at price $s$ (with convention $\inf \varnothing = \infty$, so that $T(\sigma)=\infty$ if the firm is never sold). Evidently $Z_\sigma$ is a random payoff, depending on the policy and the random path $(\pi_t)_{t \geq 0}$. The random variable $Z_\sigma$ depends on the initial state $x$ but we have chosen to suppress this in the notation. The left-hand subfigure in {numref}`f-profits_Z_sigma_combined` shows the distribution of $Z_{\sigopt}$, lifetime value under the optimal policy $\sigopt$, computed by fixing an initial condition $x$, simulating 100,000 profit paths from a given initial state, and then computing the discounted payoff along each path. (Parameter values were the same as in {numref}`f-profits_fps`.) The mean of the distribution is $v_{\sigopt}(x)$, which is also $\vmax(x)$. The policy is regarded as optimal because the mean of the distribution is larger than the mean of $Z_{\sigma}$ under any other policy, and given any other initial condition $x$. The right-hand subfigure compares $Z_{\sigopt}$ with $Z_\sigma$, where $\sigma$ is the policy that never sells. ```{figure} figures/profits_Z_sigma_combined.pdf :name: f-profits_Z_sigma_combined :width: 95% Distributions of the random payoff $Z_\sigma$ defined in {eq}`ex-rsigrv`. 
``` Let's consider these distributions from the perspective of firm managers when markets are not frictionless, and information is not perfect. In this case, as discussed at the start of {ref}`ss-brn`, managers care about more than just the mean of these distributions. While the mean will surely be of interest, these managers are likely to also care about factors such as variance, upside risk and downside risk. Let's now consider how we might insert preferences over such factors into our model. (sss-drl)= #### Distributional Dynamic Programming Some researchers have begun to construct a theory of "distributional dynamic programming" where the core idea is to track the distribution of the payoff across policies and initial conditions. In our context, this means choosing $\sigma$ so that $Z_\sigma$ has an "optimal" distribution. For example, a manager might want a distribution with a relatively high mean and low downside risk. {cite}`bellemare2023distributional` show that, for a relatively broad class of dynamic programming problems, a distributional version of the Bellman equation can be constructed, where the left- and right-hand sides of the Bellman equation are both distributions. We formalize this idea within the abstract dynamic programming framework in {ref}`ss-ddp`. In practice, the theory of distributional dynamic programming is constrained by the fact that there is no natural extension of the idea of a greedy policy to the setting of distributional Bellman equations. As such, we focus on environments where agents are able to specify loss or reward functions over distributions. The next few sections investigate such cases, while still preserving concern for tail properties and additional moments beyond the mean. (sss-mvar)= #### Mean-Variance Analysis Let $R$ be a random payoff that we want to evaluate. Mean-variance analysis proposes the criterion $\EE [R] - (\gamma/2) \var[R]$, where $\gamma$ parameterizes risk-aversion. 
In the context of our manager's problem, the mean-variance criterion tells us to solve $$ \max_{\sigma \in \Sigma} \; m_V(Z_\sigma) \quad \text{where} \quad m_V(Z_\sigma) \coloneq \left\{ \EE [Z_\sigma] - \frac{\gamma}{2} \var[Z_\sigma] \right\}. $$ (eq-mvar) Assuming that the initial condition is $x$, the first term $\EE [Z_\sigma]$ is just $v_\sigma(x)$. The second term is harder to calculate but its role is clear: it downweights policies that generate high variance payoffs, with the extent of downweighting depending on the size of $\gamma$. More risk averse managers will use larger values of $\gamma$ and their preferred policies will deviate more from what we previously defined to be optimal---that is, the policy that solves $\max_{\sigma \in \Sigma} v_\sigma(x) = \max_{\sigma \in \Sigma} \EE Z_\sigma$ for all $x$. (sss-alt)= #### Alternatives to Mean-Variance The mean-variance criterion is not the only way to formulate concerns about risk. An alternative formulation solves $$ \max_{\sigma \in \Sigma}\; e_\gamma(Z_\sigma) \quad \text{where} \quad e_\gamma(Z_\sigma) \coloneq - \frac{1}{\gamma} \ln \left[ \EE [\exp(- \gamma Z_\sigma)] \right] $$ Here $\EE$ again denotes the mathematical expectation with respect to the probability distribution of the random payoff, which the decision maker is assumed to know. The map $e_\gamma$ is called an **entropic certainty equivalent**. The parameter $\gamma$ parameterizes risk aversion. When $\gamma > 0$, the decision maker values $Z_\sigma$ below $\EE[Z_\sigma]$. We say that risk aversion is higher when $\gamma$ is larger. An introduction to these ideas is provided in Section 7.2.2 of {cite}`sargent2025dynamic`. The entropic criterion is attractive for several reasons. 
One is that, with sufficiently many finite moments, a Taylor expansion produces

$$
    e_\gamma(X) \;=\; \kappa_1 \;-\; \frac{\gamma}{2}\,\kappa_2 \;+\; \frac{\gamma^2}{6}\,\kappa_3 \;-\; \frac{\gamma^3}{24}\,\kappa_4 \;+\; \cdots,
$$

where $\kappa_n$ is the $n$-th cumulant of $X$:

$$
    \kappa_1 = \EE[X], \qquad \kappa_2 = \var[X], \qquad \kappa_3 = \EE\!\left[(X - \kappa_1)^3\right], \qquad \cdots
$$

This tells us that, with positive $\gamma$, the agent likes a high mean, dislikes variance, likes positive skewness (right tails), dislikes kurtosis (fat tails), etc. When higher moments are small we get

$$
    e_\gamma(Z_\sigma) \approx \EE [Z_\sigma] - \frac{\gamma}{2} \var[Z_\sigma],
$$

which connects us back to mean-variance analysis. The approximation above becomes exact when $Z_\sigma$ is normally distributed.

In addition, the entropic criterion can be regarded as an indirect utility function that emerges from a setting in which the manager doubts its probability model for the payoff $Z_\sigma$. In particular, let $P$ denote the manager's baseline model probability measure for $Z_\sigma$. Then it can be shown that

$$
    e_\gamma(Z_\sigma) = \min_Q\; \left\{ \EE_Q [Z_\sigma] + \frac{1}{\gamma} D_{KL}(Q \| P) \right\},
$$

where the minimum is over probability measures $Q$ absolutely continuous with respect to $P$ and $D_{KL}(Q \| P) \coloneq \EE_Q [\ln(\diff Q / \diff P)]$ is the Kullback--Leibler divergence, which measures the discrepancy between two probability distributions. Here the parameter $1/\gamma$ controls the size of a penalty that the minimizer pays for distorting $Q$ relative to the baseline probability model $P$; larger values of $\gamma$ lower this penalty and so allow the minimizing "player" who chooses $Q$ to range over a larger set of alternative models. Such an analysis connects entropic preferences to robust control and ambiguity aversion and will be discussed in {prf:ref}`c-rdps`.
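The link between the entropic certainty equivalent and the mean-variance criterion can be checked by simulation. The sketch below is only an illustration: the payoff distribution (normal with mean 10 and standard deviation 2), the value $\gamma = 0.5$, the sample size and the function names are all assumptions chosen for the example, not values taken from the firm problem.

```python
import math
import random
import statistics


def entropic_ce(sample, gamma):
    """Sample estimate of e_gamma(Z) = -(1/gamma) * ln E[exp(-gamma * Z)]."""
    # Shift by the largest exponent (log-sum-exp trick) for numerical stability.
    exponents = [-gamma * z for z in sample]
    shift = max(exponents)
    mean_exp = sum(math.exp(e - shift) for e in exponents) / len(sample)
    return -(shift + math.log(mean_exp)) / gamma


def mean_variance(sample, gamma):
    """Sample estimate of E[Z] - (gamma / 2) * var[Z]."""
    return statistics.fmean(sample) - 0.5 * gamma * statistics.pvariance(sample)


rng = random.Random(1234)
gamma = 0.5
# Hypothetical payoff distribution: normal, mean 10, standard deviation 2.
sample = [rng.gauss(10.0, 2.0) for _ in range(200_000)]

ce = entropic_ce(sample, gamma)
mv = mean_variance(sample, gamma)
print(f"entropic: {ce:.3f}   mean-variance: {mv:.3f}")
```

For a normal payoff the two numbers should agree up to simulation error, and both should sit below the sample mean when $\gamma > 0$. Replacing the normal draws with a skewed distribution is one way to see the higher cumulants pull the two criteria apart.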
The entropic criterion $e_\gamma(Z_\sigma)$ is a special case of a more general objective

$$
    \Phi(Z_\sigma) \coloneq \phi^{-1} \left\{ \EE [\phi(Z_\sigma)] \right\}
$$

where $\phi$ is a given function. Typically $\phi$ is concave, as is the case for $\phi(x) = -\exp(-\gamma x)$ when $\gamma > 0$, which recovers $e_\gamma$. Another example is the **Kreps--Porteus expectation**, which is obtained by setting $\phi(x) = x^{1-\gamma}$.

A third option for inserting preferences over risk is to evaluate

$$
    \VaR_\alpha(Z_\sigma) \coloneq \inf \, \setntn{c \in \RR}{\PP\{Z_\sigma+c<0\}\leq\alpha},
$$

where $\alpha \in [0,1]$ is a given constant. This objective is called **value-at-risk** and can be understood as the smallest cash injection $c$ such that the probability of a net loss is no more than $\alpha$. Intuitively, if $Z_\sigma$ has more downside risk, then $\VaR_\alpha(Z_\sigma)$ increases, as more cash is needed to keep the loss probability below the threshold. Thus, a manager seeking a low-risk policy might look to minimize $\VaR_\alpha(Z_\sigma)$, or, equivalently, to solve $\max_\sigma - \VaR_\alpha(Z_\sigma)$.

Value-at-risk became an industry standard in the 1990s, partly due to popularization through RiskMetrics, a risk management framework developed at J.P. Morgan. It has spawned a variety of alternatives and extensions, including conditional value-at-risk, entropic value-at-risk and relativistic value-at-risk. We will meet some of these ideas again in {prf:ref}`c-rdps`.

(sss-difficulties)=
#### Difficulties

While the risk-management concepts discussed above are all sensible, they complicate solving for an optimal policy because they make the objective nonlinear. For example, consider the mean-variance problem {eq}`eq-mvar`, which we can write as

$$
    \max_{\sigma \in \Sigma} m_V \left\{ \sum_{t=0}^{T(\sigma)-1} \beta^t \pi_t + \beta^{T(\sigma)} s \right\}.
$$ (eq-uuoo) The function $m_V$ is nonlinear due to the presence of the variance term, and this nonlinearity prevents us from passing $m_V$ through the sum and thereby deriving a recursive expression for the value of a strategy similar to {eq}`eq-firecms`. To see this, recall the role that linearity of the expectations operator $\EE_t$ played in our derivation of representation {eq}`eq-firecm`. Bellman's dynamic programming theory requires having a recursive representation for valuations under alternative strategies. Without that, numerical strategies for solving global optimization problems like $\max_\sigma U(Z_\sigma)$ can be poorly behaved, very high dimensional, and virtually inaccessible to the theory of dynamic programming. Even worse, there is an important sense in which the criterion of maximizing $m_V(Z_\sigma)$ over $\sigma \in \Sigma$ is no longer the right one, since there is no guarantee that the best strategy for the manager is to choose a stationary Markov policy and apply it in every period. For example, in our new nonlinear setting, it might be optimal for the manager to apply a given policy $\sigma$ in the first period and a second policy $\tau$ in all periods thereafter (see, e.g., Section 5 of {cite}`bauerle2024markov`). While this might still seem feasible---we need only to compute one more policy--- **time inconsistency** arises. The manager must be committed to switching to $\tau$ in the second period, since re-optimization would lead to choosing $\sigma$ again. Unfortunately none of the alternatives to mean-variance analysis discussed in {ref}`sss-alt` offer a way out of the problem described above, since nonlinearity in the objective again prevents construction of a recursive representation. 
As with mean-variance, this lack of recursivity requires deploying dynamic programs in new ways.[^1] (sss-fbtr)= #### Back to Recursion In {ref}`sss-difficulties`, we saw how nonlinearity intended to capture risk-preferences led to a breakdown of dynamic programming theory. Fortunately, there is a way to inject nonlinearity and risk-preferences into the manager's problem without breaking recursivity and hence access to the core ideas of dynamic programming. The idea is to stop trying to apply risk-preferences directly to the net present value sum and instead apply them period-by-period. We can do this by starting with the risk-neutral recursive valuation {eq}`eq-firecms` and modifying it to $$ v_\sigma(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta (K v_\sigma)(x) \right], $$ (eq-vswk) where $K$ is a possibly nonlinear operator from $b\Xsf$ to itself. For example, using the notation $(Pv)(x) = \int v(x') P(x, \diff x')$, we can set $$ (K v)(x) = (Pv)(x) - \frac{\gamma}{2} \int \left[v(x') - (Pv)(x)\right]^2 P(x, \diff x') $$ to implement the mean-variance criterion, or $$ (K v)(x) = - \frac{1}{\gamma} \ln \left\{ \int \exp(- \gamma v(x')) P(x, \diff x') \right\} $$ for entropic risk preferences. This alternative approach to inserting risk preferences into the manager's problem is somewhat less intuitive than the direct approach that we reviewed in {ref}`sss-mvar` and {ref}`sss-alt`. In addition, it introduces a new problem: is $v_\sigma$ actually well-defined by the nonlinear functional equation {eq}`eq-vswk`? On the other hand, it offers a major advantage: provided we can show that $v_\sigma$ is in fact well-defined --- which is a problem for fixed point theory --- the valuations are recursive by construction. Exploiting this fact, we can extend Bellman's original theory in very natural ways. This is one of the main subjects of this book. We will attack this problem in stages, beginning with an abstract recursive setup in {prf:ref}`c-adps`. 
The setup will be recursive in the sense that each valuation $v_\sigma$ will be represented as the fixed point of a possibly nonlinear policy operator $T_\sigma$. Our approach will then be to rewrite Bellman's optimality theory in this abstract setting and seek properties on the policy operators under which the main results go through. At the end of this process we will connect back to the applications from this chapter. Before progressing to this abstract theory, we look at some other concrete examples, beginning with finite state Markov decision processes. (s-mdps)= ## Finite MDPs Finite state Markov decision processes (MDPs) form the foundations of many quantitative modeling and reinforcement learning routines, as well as providing a benchmark setting for dynamic programming theory (see, e.g., {cite}`puterman2005markov` or Chapter 5 of {cite}`sargent2025dynamic`). In this section we introduce finite state MDPs and some extensions. As was the case for the firm problem considered in {ref}`s-fpintro`, our main objective is to introduce a class of dynamic programming problems that will motivate the abstract theory starting in {prf:ref}`c-adps`. While the presentation below is self-contained, readers wanting a slower pace and more examples might prefer to begin with Chapter 5 of {cite}`sargent2025dynamic`. (ss-fmdpt)= ### Theory In this section we introduce the finite MDP framework and state core optimality results---the Bellman equation and the greedy policy characterization of optimal policies. We then present three fundamental algorithms: value function iteration, Howard policy iteration, and optimistic policy iteration. We illustrate the main ideas with an application to firm cash management. 
(sss-tdtm)= #### The Discrete Time Model A finite state **Markov decision process** (finite MDP) consists of - a finite set $\Xsf$ called the **state space**, - a finite set $\Asf$ called the **action space**, and a tuple $(\Gamma, r, \beta, P)$, where - $\Gamma$ is a nonempty correspondence from $\Xsf$ to $\Asf$, which in turn defines the **feasible state-action pairs** $$\Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)},$$ - a **reward function** $r \colon \Gsf \to \RR$, - a **discount factor** $\beta$ in $[0,1)$, and - a **stochastic kernel** $P$ from $\Gsf$ to $\Xsf$, which provides transition probabilities for the next period state given current state and action. Since $P$ is a stochastic kernel, it satisfies $\sum_{x'} P(x, a, x') = 1$ for all $(x,a) \in \Gsf$. Given an initial condition $X_0 = x$, the objective is to maximize the expected discounted sum $$ \EE \sum_{t \geq 0} \beta^t r(X_t, A_t) \quad \st \quad A_t \in \Gamma(X_t) \text{ for all } t \geq 0. $$ Here $(X_t)_{t \geq 0}$ takes values in $\Xsf$ and $(A_t)_{t \geq 0}$ takes values in $\Asf$. After observing state $X_t$, the controller chooses action $A_t$ from the feasible set $\Gamma(X_t)$ and the new state $X_{t+1}$ is drawn from the distribution $P(X_t, A_t, \cdot)$. The constant $\beta \in [0,1)$ is a discount factor and $r$ is a reward function. In maximizing this objective, the action sequence $(A_t)$ must also satisfy an information constraint: Each $A_t$ is required to be measurable with respect to the $\sigma$-algebra generated by $(X_0, \ldots, X_t)$. Thus, the controller can use information from the past and present but not the future. As was the case for the firm problem in {ref}`s-fpintro`, it turns out that actions depending on all of $(X_0, \ldots, X_t)$ are no better than actions that depend only on $X_t$. We will show this formally in {ref}`ss-nonstat`. 
As a result, we focus on **stationary Markov policies**, where the same deterministic function of the state is applied at every point in time. We call such stationary Markov policies **feasible policies**, the set of which is given by

$$
    \Sigma \coloneq \setntn{\sigma \in \Asf^\Xsf} {\sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf}.
$$ (eq-mdpf)

For each $\sigma \in \Sigma$, we set

$$
    P_\sigma(x, x') \coloneq P(x, \sigma(x), x') \quad \text{and} \quad r_\sigma(x) \coloneq r(x, \sigma(x)).
$$ (eq-rps)

It follows from our assumptions on $P$ that $P_\sigma$ is a stochastic matrix, meaning that $P_\sigma \geq 0$ and all rows sum to one. By choosing a policy $\sigma$, the controller determines a reward function $r_\sigma$ on the state and Markov dynamics $P_\sigma$ for the state process.

Following notation in {ref}`ss-markop`, we write

$$
    (P_\sigma h)(x) = \sum_{x' \in \Xsf} h(x') P_\sigma(x, x') \qquad (h \in \RR^\Xsf, \; x \in \Xsf),
$$

interpreting this value as the expectation of $h(X_{t+1})$ when $X_t = x$ and the controller uses policy $\sigma$.

The lifetime value of $\sigma$ given $X_0 = x$ is

$$
    v_\sigma(x) \coloneq \EE \sum_{t \geq 0} \beta^t r(X_t, \sigma(X_t)) = \sum_{t \geq 0} \beta^t \EE \, r_\sigma(X_t)
$$

when $(X_t)_{t \geq 0}$ is a Markov chain generated by $P_\sigma$ with initial condition $X_0 = x \in \Xsf$. Since $\EE \, r_\sigma(X_t) = (P_\sigma^t r_\sigma)(x)$, the function $v_\sigma$ can be expressed pointwise on $\Xsf$ as

$$
    v_\sigma = \sum_{t \geq 0} (\beta P_\sigma)^t r_\sigma = (I-\beta P_\sigma)^{-1} r_\sigma,
$$ (eq-vsigmdp)

where $I$ is the identity map on $\RR^\Xsf$, the set of real-valued functions on $\Xsf$. This representation on the right-hand side is essentially the same as {eq}`eq-vsol`. In particular, the second equality comes from the Neumann series lemma. See also {cite}`puterman2005markov`, Theorem 6.1.1, or Chapter 5 of {cite}`sargent2025dynamic`.
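The two expressions for $v_\sigma$ in {eq}`eq-vsigmdp` can be compared numerically. The following is a minimal sketch in plain Python: the three-state matrix `P_sigma`, the reward vector `r_sigma` and the value of `beta` are made-up numbers for illustration, and the helper name `apply_P` is hypothetical. The code accumulates partial sums of the series $\sum_t (\beta P_\sigma)^t r_\sigma$ and then checks that the result solves the linear system $(I - \beta P_\sigma) v = r_\sigma$.

```python
# Hypothetical three-state example; all numbers are assumptions for illustration.
beta = 0.9
P_sigma = [
    [0.8, 0.2, 0.0],
    [0.1, 0.7, 0.2],
    [0.0, 0.3, 0.7],
]
r_sigma = [1.0, 0.5, -0.2]
n = len(r_sigma)


def apply_P(h):
    """Return P_sigma h, the conditional expectation of h(X_{t+1}) given X_t."""
    return [sum(P_sigma[i][j] * h[j] for j in range(n)) for i in range(n)]


# Partial sums of the Neumann series sum_t (beta P_sigma)^t r_sigma.
v_sigma = [0.0] * n
term = list(r_sigma)  # term holds (beta P_sigma)^t r_sigma at step t
for _ in range(2000):
    v_sigma = [v_sigma[i] + term[i] for i in range(n)]
    term = [beta * x for x in apply_P(term)]

# Check that v_sigma solves (I - beta P_sigma) v = r_sigma.
Pv = apply_P(v_sigma)
residual = max(abs(v_sigma[i] - beta * Pv[i] - r_sigma[i]) for i in range(n))
print("v_sigma:", [round(x, 4) for x in v_sigma])
print("residual:", residual)
```

The residual should be at the level of floating point error, since the neglected tail of the series is of order $\beta^{2000}$. In applications one would normally solve the linear system directly with a numerical linear algebra library rather than sum the series.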
The policy operator associated with given $\sigma$ for the finite MDP model takes the form

$$
    (T_\sigma \, v)(x) = r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x') \qquad (v \in \RR^\Xsf, \; x \in \Xsf).
$$ (eq-tsig_mdp0)

```{exercise}
:label: ex-tsfmdp

Prove that $T_\sigma$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ under the supremum norm, with unique fixed point equal to $v_\sigma$ in {eq}`eq-vsigmdp`.
```

```{solution} ex-tsfmdp
In operator notation, the action of $T_\sigma$ can be written as $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$. With $\| \cdot \|$ as the supremum norm and $v, v' \in \RR^\Xsf$, we have

$$
    \| T_\sigma \, v - T_\sigma \, v' \| = \beta \| P_\sigma \, (v - v') \| \leq \beta \| P_\sigma \| \, \| v - v' \| = \beta \| v - v' \|.
$$

(Readers who are less comfortable with an operator-theoretic approach can write these steps out pointwise, at fixed $x \in \Xsf$, and arrive at the same bound. In the last step we used $\| P_\sigma \| = 1$ from {prf:ref}`l-mopfp`.)

The unique fixed point solves $v = r_\sigma + \beta P_\sigma v$. Assuming that $I - \beta P_\sigma$ is invertible, the fixed point is $v_\sigma = (I-\beta P_\sigma)^{-1} r_\sigma$. The invertibility assumption holds because $\rho(P_\sigma) = 1$ (see {prf:ref}`l-mopfpl`) and hence $\rho(\beta P_\sigma) = \beta < 1$. See {prf:ref}`c-ibnl` for more details.
```

(sss-coreop)=
#### Core Optimality Results

The definition of the value function is the same as that for the firm problem in {ref}`sss-fintroc`:

$$
    \vmax(x) \coloneq \sup_{\sigma \in \Sigma} v_\sigma(x) \qquad (x \in \Xsf).
$$

Similarly, a policy $\sigma \in \Sigma$ is called **optimal** if $v_\tau \leq v_\sigma$ for all $\tau \in \Sigma$.

```{exercise}
:label: ex-vha

Let $\bar r = \max_{(x,a) \in \Gsf} | r(x, a)|$ and $M = \bar r / (1-\beta)$. Let $[-M, M]$ be the set of all $v \in \RR^\Xsf$ such that $|v| \leq M$. Show that 1. every $T_\sigma$ is a self-map on $[-M, M]$ and 2.
$|v_\sigma| \leq M$ for all $\sigma \in \Sigma$.

Conclude that $\vmax$ is well-defined as a real-valued function on $\Xsf$.
```

```{solution} ex-vha
Fix $\sigma \in \Sigma$. We have $|T_\sigma \, v| \leq \bar r + \beta P_\sigma |v|$. Hence, for $v \in [-M, M]$, we have $|T_\sigma \, v| \leq \bar r + \beta M = M$. In particular, $T_\sigma$ is a self-map on the closed set $[-M, M] \subset \RR^\Xsf$. As a result, its fixed point $v_\sigma$ also lies in this set. From this fact we obtain $|v_\sigma(x)| \leq M$ for all $\sigma$ and all $x$. It follows that $\vmax$ is well-defined ({prf:ref}`t-completeness`).
```

The **Bellman equation** for this problem is

$$
    v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\}.
$$ (eq-mdp_bell0)

This is a functional equation that restricts $v \in \RR^\Xsf$. The **Bellman operator** is given by

$$
    (T \, v)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \qquad\qquad (x \in \Xsf).
$$ (eq-mdpt00)

By construction, $v$ solves the Bellman equation if and only if $v$ is a fixed point of the Bellman operator.

In the next exercise, $d_\infty(f, g) \coloneq \sup_{x \in \Xsf}|f(x) - g(x)|$. {prf:ref}`c-supineq` may be helpful.

```{exercise}
:label: ex-mdptcm

Prove that $T$ is a contraction of modulus $\beta$ on $(\RR^\Xsf, d_\infty)$.
```

```{solution} ex-mdptcm
For arbitrary $v, w \in \RR^\Xsf$ and $x \in \Xsf$,

$$
\begin{aligned}
    |(Tv)(x) - (Tw)(x)|
    & \leq \beta \max_{a \in \Gamma(x)} \left| \sum_{x'} v(x')P(x,a,x') - \sum_{x'} w(x')P(x,a,x') \right| \\
    & \leq \beta \max_{a \in \Gamma(x)} \sum_{x'} \left| v(x') - w(x') \right| P(x,a,x') \\
    & \leq \beta \| v - w\|,
\end{aligned}
$$

where $\| \cdot \|$ is the supremum norm, and hence $\|Tv-Tw\| \leq \beta \|v-w\|$. In the first step, the $\max$ operation is taken outside the absolute value via the inequality in {prf:ref}`c-supineq`.
``` We say that $\sigma \in \Sigma$ is **$v$-greedy** if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \quad \text{for all } x \in \Xsf, $$ (eq-mdpmvg0) ```{exercise} :label: ex-mdpgc Prove that $\sigma$ is $v$-greedy if and only if $T_\sigma \, v \geq T_\tau \, v$ for all $\tau \in \Sigma$. ``` We can now state the following optimality result, which naturally mirrors our previous result for the firm problem ({prf:ref}`t-fintroop`). ```{prf:theorem} :label: t-mdpfo The value function $\vmax$ is the unique solution to the Bellman equation in $\RR^\Xsf$. In addition, a policy $\sigma \in \Sigma$ is optimal if and only if it is $\vmax$-greedy. At least one optimal policy exists. ``` A full proof of {prf:ref}`t-mdpfo` can be found in Chapter 5 of {cite}`sargent2025dynamic`. The proof is almost identical to that of {prf:ref}`t-fintroop`, which we provided for the firm problem. We will also prove {prf:ref}`t-mdpfo` in {ref}`sss-mdpopt`, as a special case of far more general results. The next obvious step is to use the results in {prf:ref}`t-mdpfo` to compute optimal policies. Next we consider algorithms designed for this purpose. (sss-mdp_algos)= #### Algorithms The three most important algorithms for solving dynamic programming problems are **value function iteration** (VFI), **Howard policy iteration**, and **optimistic policy iteration** (OPI). In the present setting, they take the forms of {prf:ref}`algo-vfi_os`, {prf:ref}`algo-hpi_os`, and {prf:ref}`algo-opi_os`. 
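Before turning to the formal statements, the Bellman operator, greedy policies and the claims in {prf:ref}`t-mdpfo` can be illustrated on a tiny example. The sketch below is written under loud assumptions: the two-state, two-action MDP (`Gamma`, `r`, `P`, `beta`) is invented for illustration, the function names are hypothetical, and policy values are approximated by iterating the policy operator rather than by solving a linear system. It iterates $T$ to an approximate fixed point, extracts a greedy policy, and compares the result with a brute-force maximum of $v_\sigma$ over all feasible policies.

```python
from itertools import product

# Hypothetical two-state, two-action MDP; all numbers are made up for illustration.
beta = 0.95
states = [0, 1]
Gamma = {0: [0, 1], 1: [0, 1]}                 # feasible actions at each state
r = {(0, 0): 1.0, (0, 1): 0.0,                 # r[(x, a)]
     (1, 0): 2.0, (1, 1): 0.5}
P = {(0, 0): [0.9, 0.1], (0, 1): [0.2, 0.8],   # P[(x, a)][x']
     (1, 0): [0.7, 0.3], (1, 1): [0.1, 0.9]}


def q(v, x, a):
    """Value of action a at state x when v is the continuation value."""
    return r[(x, a)] + beta * sum(v[y] * P[(x, a)][y] for y in states)


def T(v):
    """Bellman operator."""
    return [max(q(v, x, a) for a in Gamma[x]) for x in states]


def greedy(v):
    """A v-greedy policy, stored as a list indexed by state."""
    return [max(Gamma[x], key=lambda a: q(v, x, a)) for x in states]


def policy_value(sigma, num_iter=2000):
    """Approximate v_sigma by iterating the policy operator T_sigma."""
    v = [0.0 for _ in states]
    for _ in range(num_iter):
        v = [q(v, x, sigma[x]) for x in states]
    return v


def vfi(v, tol=1e-10):
    """Iterate T until successive iterates are within tol."""
    error = tol + 1
    while error > tol:
        v_new = T(v)
        error = max(abs(a - b) for a, b in zip(v_new, v))
        v = v_new
    return v, greedy(v)


v_star, sigma_star = vfi([0.0, 0.0])

# Brute force: pointwise maximum of v_sigma over all feasible policies.
all_policies = list(product(*(Gamma[x] for x in states)))
v_brute = [max(policy_value(s)[x] for s in all_policies) for x in states]
print(sigma_star, v_star, v_brute)
```

Brute-force enumeration is only feasible here because the example has four policies. The number of feasible policies grows exponentially with the number of states, which is why the iterative algorithms matter in practice.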
```{prf:algorithm} Value function iteration
:label: algo-vfi_os

- input $v \in \RR^\Xsf$, an initial guess of $\vmax$
- input $\tau$, a tolerance level for error
- $\epsilon \leftarrow \tau + 1$
- while $\epsilon > \tau $:
    - $v' \leftarrow Tv$
    - $\epsilon \leftarrow \| v' - v \|$
    - $v \leftarrow v'$
- return a $v$-greedy policy $\sigma$
```

```{prf:algorithm} Howard policy iteration
:label: algo-hpi_os

- input $\sigma \in \Sigma$ and tolerance $\tau$
- set $v \leftarrow (I - \beta P_{\sigma} )^{-1} r_{\sigma}$ and $\epsilon \leftarrow \tau + 1$
- while $\epsilon > \tau $:
    - $\sigma \leftarrow $ a $v$-greedy policy
    - $v' \leftarrow (I - \beta P_{\sigma} )^{-1} r_{\sigma}$
    - $\epsilon \leftarrow \| v' - v \|$
    - $v \leftarrow v'$
- return $\sigma$
```

```{prf:algorithm} Optimistic policy iteration
:label: algo-opi_os

- input $v \in \RR^\Xsf$, an initial guess of $\vmax$
- input $m \in \NN$ and tolerance $\tau$
- $\epsilon \leftarrow \tau + 1$
- while $\epsilon > \tau $:
    - $\sigma \leftarrow $ a $v$-greedy policy
    - $v' \leftarrow T_{\sigma}^m v$
    - $\epsilon \leftarrow \| v' - v \|$
    - $v \leftarrow v'$
- return $\sigma$
```

VFI amounts to iterating $k$ times with $T$ from some initial condition $v \in \RR^\Xsf$ (where $k$ is determined by a fixed tolerance level for error), producing an approximation $v_k \coloneq T^k v$ to $\vmax$, and then computing a $v_k$-greedy policy $\sigma$. This idea is natural: $\vmax$-greedy policies are optimal and, since $T$ is a contraction mapping with unique fixed point $\vmax$, the iterate $v_k$ is close to $\vmax$ when $k$ is large.

In Howard policy iteration (HPI), one begins with a guess $\sigma$ of the optimal policy and then iterates between computing the lifetime value of that policy (as given in {eq}`eq-vsigmdp`) and the corresponding greedy policy. In fact HPI is equivalent to Newton fixed point iteration applied to the Bellman operator. See, for example, Chapter 5 of {cite}`sargent2025dynamic`.

OPI can be thought of as a "convex combination" of VFI and HPI.
Instead of computing the lifetime value $v_\sigma = (I - \beta P_{\sigma})^{-1} r_{\sigma}$ of current policy guess $\sigma$, one computes $T_{\sigma}^m v$, which is an approximation to $v_\sigma$ (since $T_\sigma$ is a contraction with fixed point $v_\sigma$). There are two edge cases:

1. If $m$ is large, this approximation is tight, and hence OPI is close to HPI.
2. If $m=1$, OPI reduces to VFI.

For suitable intermediate values of $m$, OPI usually outperforms both VFI and HPI. Further intuition and discussion are provided in Chapter 5 of {cite}`sargent2025dynamic`.

For the finite MDP setting, we can state the following results:

```{prf:theorem}
:label: t-mdpac

Given any finite MDP, the algorithms VFI, OPI, and HPI all converge, in the sense that the generated sequence of candidates $(v_k)_{k \geq 0}$ converges to $\vmax$ when the tolerance $\tau$ is set to zero. Moreover, the policy sequence $(\sigma_k)_{k \geq 0}$ generated by HPI converges to an optimal policy in a finite number of steps.
```

We shall prove these results after we discuss convergence of these algorithms in a general setting in {ref}`sss-adpal`.

(sss-lps)=
#### Solving MDPs via Linear Programming

Many dynamic programs can be formulated as linear programs. We illustrate with finite MDPs. To do so, we recall that a typical linear program has the form

$$
    \min_v \inner{c,v} \; \text{ over all } \; v \in \RR^n \text{ with } Av \leq b.
$$ (eq-lp)

Here $c$ is a vector in $\RR^n$, the term $\inner{c,v}$ is the inner product $\sum_i c_i v_i$, $A$ is a matrix with $n$ columns and $b$ is a vector with the same length as $Av$. (Other LP formulations replace the inequality $Av \leq b$ with $Av = b$ or $Av \geq b$, or a mix of inequality and equality constraints. Standard LP algorithms and theory can be applied to all of these cases.)
To place the finite state MDP from {ref}`sss-tdtm` into this framework, we begin by setting $$ V_D \coloneq \setntn{v \in \RR^\Xsf}{Tv \leq v}, $$ where $T$ is the Bellman operator. Recalling that $\vmax$ represents the value function, we have the following result: ```{prf:lemma} :label: l-vd If $v \in V_D$, then $\vmax \leq v$. ``` ```{prf:proof} Fix $v \in V_D$. We saw in {prf:ref}`ex-mdptcm` that $T$ is globally stable. As $T$ is also order preserving, it follows from $Tv \leq v$ that $T^k v$ is a decreasing sequence on $(\RR^\Xsf, \leq)$ obeying $T^k v \leq v$ for all $k$. Taking the limit gives $\vmax \leq v$. ◻ ``` Now let $c$ be an everywhere positive element of $\RR^\Xsf$ and consider the linear program $$ \begin{aligned} & \min_v \inner{c, v} \\ & \st r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \leq v(x) \text{ for all } (x, a) \in \Gsf. \end{aligned} $$ (eq-lpmdp) Here $\inner{c, v} = \sum_x c(x) v(x)$ and the minimization is over all $v \in \RR^\Xsf$ that satisfy the stated constraint. As before, we require that $c$ is everywhere positive. Evidently {eq}`eq-lpmdp` takes the form of {eq}`eq-lp` after suitable assignment of indices. Thus, {eq}`eq-lpmdp` is a linear program. This leads us to ```{prf:proposition} :label: p-mdplp The value function $\vmax$ is the unique solution to {eq}`eq-lpmdp` in $\RR^\Xsf$. ``` ```{prf:proof} Let $\bar v$ be any solution to {eq}`eq-lpmdp` in $\RR^\Xsf$. The constraint in {eq}`eq-lpmdp` implies that $T \bar v \leq \bar v$. Since $\bar v$ is in $V_D$, {prf:ref}`l-vd` implies that $\vmax \leq \bar v$. In addition, $\bar v \leq \vmax$. Indeed, if $\vmax(x) < \bar v(x)$ for some $x$, then, using $\vmax \leq \bar v$ and the positivity of $c$, we have $\inner{c , \vmax} < \inner{c, \bar v}$. This contradicts the hypothesis that $\bar v$ solves {eq}`eq-lpmdp`, since $\vmax$ is also in the choice set. The contradiction confirms that $\bar v \leq \vmax$ and hence $\bar v = \vmax$. The claim in {prf:ref}`p-mdplp` follows. 
◻ ``` {prf:ref}`p-mdplp` implies that we can compute the value function and hence solve the MDP using linear programming techniques. The LP approach is useful for many models, such as those that incorporate additional linear constraints, e.g., bounds on expected rewards and resources, that are difficult to handle with iterative methods. On the other hand, the number of constraints in {eq}`eq-lpmdp` equals $|\Gsf|$, which can be large, so iterative methods are still often preferred. See {ref}`s-cn_egs` for references and further discussion. (sss-cm)= #### Example: Cash Management To illustrate finite MDPs in a concrete setting, we now study a cash management problem faced by a firm that must balance cash holdings against returns from securities. This problem dates back to the work of {cite:t}`baumol1952transactions` and {cite:t}`tobin1956interest`, who developed inventory-theoretic models of the demand for money. In their models, the decision maker decides how much cash to hold versus interest-bearing assets, balancing the opportunity cost of holding idle cash against the transaction costs of converting assets to cash. The decision maker manages fixed total wealth $\bar w$, which is divided between cash holdings $x$ and securities $s = \bar w - x$. Each period, the decision maker experiences random portfolio shocks and must decide whether to transfer funds between cash and securities. The decision maker earns returns on securities but pays transaction costs for transfers and faces penalties for insufficient cash. (Assuming that wealth is fixed allows us to get by tracking only $x$, and not $s$.) The state space for cash is $\Xsf = \{0, 1, \ldots, \bar w\}$. At state $x$, the feasible actions are transfers $a$ satisfying $0 \leq x+a \leq \bar w$. Thus, $\Gamma(x) = \{a \in \ZZ \mid -x \leq a \leq \bar w - x\}$. Portfolio shocks (cash payments, equity payments, debt restructuring, etc.) 
are iid, written as $(\xi_t)_{t \geq 1}$, and take values in a set $\Xi = \{-k, \ldots, k\}$ with probability mass function $\phi$. Transition probabilities are determined by the next-period state $$ F(x,a, \xi) = \max\{0, \min\{\bar w, x + a + \xi\}\}, $$ (eq-cm_transition) where the max and min keep $x'$ in the state space. The transition probabilities are, therefore, $$P(x, a, x') = \sum_{\xi \in \Xi} \1\{F(x, a, \xi) = x'\} \phi(\xi).$$ Flow profits are given by $$ \pi(x, a, \xi) \coloneq \rho (\bar w - x) - (c + \tau |a|) \1\{a \neq 0\} - p \, \1\{x + a + \xi < 0\}. $$ (eq-cm_R) Here $\rho$ is the rate of return on securities, $c$ is a fixed transaction cost, $\tau$ is a proportional transaction cost, and $p$ is a penalty for insufficient cash (in this case, when $x + a + \xi < 0$). To fit the problem into the MDP framework, we take expectations of flow profits to get the period reward $$ r(x, a) = \sum_{\xi \in \Xi} \pi(x, a, \xi) \, \phi(\xi). $$ (eq-cm_reward) Future payoffs are discounted using discount factor $\beta$. The set of feasible policies $\Sigma$ is defined from $\Gamma$ in the usual way (see {eq}`eq-mdpf`). The lifetime value of any given policy can be computed from $v_\sigma = (I - \beta P_\sigma)^{-1} r_\sigma$, as discussed in {ref}`sss-tdtm`. {numref}`f-cm_comparison` illustrates by using this formula to compute $\sigma$-value functions for two policies. The first is a "do nothing" policy that sets $a = 0$ for all states. The second is a target policy that always moves toward a fixed target cash level. As we will see, both policies are suboptimal. ```{figure} figures/cash_management_comparison.pdf :name: f-cm_comparison :width: 95% Value of a do-nothing policy ($\sigma_1$) and a target policy ($\sigma_2$) ``` Next let's solve for an approximately optimal policy using VFI, as described in {prf:ref}`algo-vfi_os`. {numref}`f-cm_optimal` shows the resulting (approximately) optimal policy and value function. 
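The computation can be sketched in a few lines of NumPy. This is a minimal illustration rather than the code used to produce the figures. It uses the parameter values reported for the example ($\bar w = 50$, $\rho = 0.02$, $c = 1$, $\tau = 0.1$, $p = 10$, $\beta = 0.95$, shocks uniform on $\{-5, \ldots, 5\}$). One device is our own assumption: actions are indexed by post-transfer cash $y = x + a$, which ranges over $\{0, \ldots, \bar w\}$ at every state, so that all arrays are rectangular. The function names `bellman`, `vfi`, and `policy_value` are hypothetical.

```python
import numpy as np

# Parameter values quoted in the text for the cash management example
w_bar, rho, c, tau, p, beta = 50, 0.02, 1.0, 0.1, 10.0, 0.95
shocks = np.arange(-5, 6)                       # support of xi
phi = np.full(shocks.size, 1 / shocks.size)     # uniform pmf

n = w_bar + 1
x = np.arange(n)[:, None]       # current cash (rows)
y = np.arange(n)[None, :]       # post-transfer cash y = x + a (columns)
a = y - x                       # implied transfer, feasible for every (x, y)

raw = np.arange(n)[:, None] + shocks[None, :]   # y + xi before truncation
nxt = np.clip(raw, 0, w_bar)                    # F(x, a, xi), depends on y only
shortfall = (raw < 0) @ phi                     # P{y + xi < 0} for each y

# Expected reward r(x, a), stored as r[x, y]
r = rho * (w_bar - x) - (c + tau * np.abs(a)) * (a != 0) - p * shortfall[None, :]

def bellman(v):
    """Return Tv and a v-greedy policy (expressed as transfers a = y - x)."""
    cont = v[nxt] @ phi                         # E v(F(x, a, xi)) for each y
    q = r + beta * cont[None, :]
    return q.max(axis=1), q.argmax(axis=1) - np.arange(n)

def vfi(tol=1e-8, max_iter=10_000):
    """Iterate with T until successive iterates differ by at most tol."""
    v = np.zeros(n)
    for _ in range(max_iter):
        v_new, _ = bellman(v)
        err = np.max(np.abs(v_new - v))
        v = v_new
        if err <= tol:
            break
    return v, bellman(v)[1]

def policy_value(sigma):
    """Lifetime value v_sigma = (I - beta P_sigma)^{-1} r_sigma."""
    target = np.arange(n) + sigma
    P = np.zeros((n, n))
    for j, prob in enumerate(phi):
        np.add.at(P, (np.arange(n), nxt[target, j]), prob)
    return np.linalg.solve(np.eye(n) - beta * P, r[np.arange(n), target])

v_star, sigma_star = vfi()
v_nothing = policy_value(np.zeros(n, dtype=int))   # do-nothing policy
```

Comparing `v_nothing` with `v_star` gives a numerical check that the do-nothing policy is dominated by the approximately optimal one.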
The policy shows the optimal transfer amount as a function of current cash holdings, while the value function shows the lifetime value of following the optimal policy. Here we set total wealth $\bar w = 50$, return on securities $\rho = 0.02$, fixed transaction cost $c = 1$, proportional transaction cost $\tau = 0.1$, penalty for insufficient cash $p = 10$, and $\beta = 0.95$. The cash flow shocks are uniformly distributed on $\{-5, -4, \ldots, 4, 5\}$. The optimal policy recommends that when cash holdings are low, the decision maker should move funds from securities to cash (take a positive action), and when cash holdings are high, move funds from cash to securities (a negative action). The value function declines for large cash balances because wealth is fixed and hence high cash balances mean low holdings of securities and reduced returns. ```{figure} figures/cash_management_optimal.pdf :name: f-cm_optimal :width: 95% Optimal policy and value function for the cash management problem ``` {numref}`f-cm_hpi` shows iterates for the policy sequence and the value sequence under the HPI algorithm. The initial policy is a do-nothing policy. As mentioned in {prf:ref}`t-mdpac`, HPI converges in a finite number of iterations. Here it converges in 5 iterations, so the last policy is the exact optimal policy (modulo floating point arithmetic), and the last $\sigma$-value function is the value function $\vmax$. The gap between the first value function, associated with the do-nothing policy, and the final value function, associated with the optimal policy, is the value of active cash management. This gap is largest at extreme cash levels, where the do-nothing policy either leaves the firm exposed to cash shortfalls or forgoes returns on securities. ```{figure} figures/cash_management_hpi.pdf :name: f-cm_hpi :width: 95% Iterating with HPI from the do-nothing policy ``` {numref}`f-cm_time_path` shows a simulated time path for cash and optimal cash transfers under the optimal policy.
Cash is held unchanged in many time periods as a result of transaction costs. ```{figure} figures/cash_management_time_path.pdf :name: f-cm_time_path :width: 80% Time path for cash and actions under the optimal policy ``` (ss-mdpct)= ### Continuous Time In this section we modify our finite state MDP model from {ref}`sss-tdtm` to a continuous time setting. With appropriate manipulations, our continuous time model can be embedded in the discrete time framework. #### Primitives and Values As in the discrete time case, $\Xsf$ and $\Asf$ are finite sets, while the controller is constrained by a feasible correspondence $\Gamma$ from $\Xsf$ to $\Asf$. The definitions of $\Gsf$, $\Sigma$, and $r$ are unchanged. Discounting is determined by a constant $\delta > 0$, referred to as the **discount rate**, while transitions are driven by an **intensity kernel** $Q$ from $\Gsf$ to $\Xsf$, which is a map $Q$ from $\Gsf \times \Xsf$ to $\RR$ that satisfies $$ \sum_{x'} Q(x, a, x') = 0 \text{ for all } (x,a) \text{ in } \Gsf \text{ and } Q(x, a, x') \geq 0 \text{ when } x \not= x'. $$ Informally, over the short interval from $t$ to $t+h$, the controller receives instantaneous reward $r(x,a)h$ and the state transitions to state $x' \neq x$ with probability $Q(x, a, x') h + o(h)$. For a fixed $\sigma \in \Sigma$, we obtain an intensity operator (i.e., an infinitesimal generator) $$ Q_\sigma(x, x') \coloneq Q(x, \sigma(x), x') \qquad (x, x' \in \Xsf) $$ that determines a continuous time Markov chain $(X_t)_{t \geq 0}$ with transition probabilities given by $P^\sigma_t \coloneq \me^{t Q_\sigma}$ for all $t \geq 0$. In particular, $$ \EE_x h(X_t) = (P^\sigma_t h)(x) \text{ for any } h \in \RR^\Xsf. $$ (For background see Chapter 10 of {cite}`sargent2025dynamic`.)
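To make the objects above concrete, the next sketch builds a small intensity matrix $Q_\sigma$ and computes $P^\sigma_t = \me^{t Q_\sigma}$ with `scipy.linalg.expm`. The three-state matrix and the function $h$ are hypothetical, chosen only for illustration. The sketch also checks numerically that $P^\sigma_h \approx I + h Q_\sigma$ for small $h$, in line with the informal description of transition probabilities over short intervals.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical intensity matrix Q_sigma for a fixed policy sigma:
# off-diagonal entries are nonnegative and each row sums to zero.
Q_sigma = np.array([[-2.0,  2.0,  0.0],
                    [ 3.0, -5.0,  2.0],
                    [ 0.0,  3.0, -3.0]])

t = 0.5
P_t = expm(t * Q_sigma)          # transition probabilities over horizon t

# E_x h(X_t) = (P_t h)(x) for an arbitrary function h on the state space
h_fn = np.array([1.0, 0.0, -1.0])
expected_h = P_t @ h_fn

# Over a short interval of length dt, P_dt is close to I + dt * Q_sigma
dt = 1e-4
short_run_gap = np.max(np.abs(expm(dt * Q_sigma) - (np.eye(3) + dt * Q_sigma)))
```

Each `P_t` produced this way should be a stochastic matrix, and the family satisfies the semigroup property $P^\sigma_{s+t} = P^\sigma_s P^\sigma_t$.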
Continuing to define $r_\sigma(x) \coloneq r(x, \sigma(x))$, the lifetime value of following $\sigma$ starting from state $x$ is $$ v_\sigma (x) = \EE_x \int_0^\infty \me^{-\delta t} r_\sigma(X_t) \diff t = \int_0^\infty \me^{-\delta t} (P^\sigma_t r_\sigma)(x) \diff t. $$ (eq-lvctm) (Passing the expectation through the integral can be justified by Fubini's theorem.) Using $\delta > 0$, we can rewrite $v_\sigma$ as $$ v_\sigma = \int_0^\infty \me^{t (Q_\sigma - \delta I)} r_\sigma \diff t = (\delta I - Q_\sigma)^{-1} r_\sigma. $$ (eq-ctfunv) The two representations for $v_\sigma$ are the continuous time analogs of the discrete-time representations given in {eq}`eq-vsigmdp`. A proof of the second equality is given in §10.2 of {cite}`sargent2025dynamic`. (Readers familiar with semigroup theory will recognize the two representations in {eq}`eq-ctfunv` as alternative expressions for the resolvent of the semigroup $(\me^{tQ})$ -- see, for example, {cite}`engel2006short`, Theorem 1.10.) (sss-ufiz)= #### Uniformization We can use the ADP framework to reformulate {eq}`eq-ctfunv` by making $v_\sigma$ be the fixed point of an order preserving policy operator. This process is called **uniformization**. The first step is to set $$ P(x, a, x') \coloneq \1\{x = x'\} + \frac{Q(x, a, x')}{m} \quad \text{where} \quad m \coloneq \max_{(x, a) \in \Gsf} |Q(x, a, x)|. $$ (eq-ct_reform_P) Then set $$ \beta \coloneq \frac{m}{m + \delta} \quad \text{and} \quad \hat r \coloneq \frac{r}{m + \delta}. $$ (eq-ct_reform_beta) As in the discrete time case, for each $\sigma \in \Sigma$, define $P_\sigma$ and $\hat r_\sigma$ according to $$ P_\sigma(x, x') \coloneq P(x, \sigma(x), x') \quad \text{and} \quad \hat r_\sigma(x) \coloneq \hat r(x, \sigma(x)). $$ ```{exercise} :label: ex-vsigalt Prove that 1. $P_\sigma$ is a stochastic matrix and 2. the $\sigma$-value function $v_\sigma$ obeys $v_\sigma = (I - \beta P_\sigma)^{-1} \hat r_\sigma$.
``` From {prf:ref}`ex-vsigalt`, we see that $v_\sigma$ is the unique fixed point in $V \coloneq \RR^\Xsf$ of the policy operator $$ T_\sigma \, v = \hat r_\sigma + \beta P_\sigma \, v. $$ (eq-tsigalt) #### Optimality Since {eq}`eq-tsigalt` becomes {eq}`eq-tsig_mdp0` after replacing $\hat r_\sigma$ with $r_\sigma$, we can apply the discrete time MDP theory in {ref}`s-mdps`. The Bellman equation becomes $$ v(x) = \max_{a \in \Gamma(x)} \left\{ \hat r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \qquad (x \in \Xsf). $$ (eq-rhatbe) The optimality properties in {prf:ref}`t-mdpfo` hold and, by {prf:ref}`t-mdpac`, VFI, OPI and HPI all converge. With $\vmax$ denoting the value function, a policy $\sigma \in \Sigma$ is optimal if and only if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ \hat r(x, a) + \beta \sum_{x'} \vmax(x') P(x, a, x') \right\} \quad \text{for all } x \in \Xsf. $$ The next two exercises unpack these equations and conditions to recover our original continuous time formulation. ```{exercise} :label: ex-hjbdbe Show that $v \in \RR^\Xsf$ obeys {eq}`eq-rhatbe` if and only if $$ \delta v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \sum_{x'} v(x') Q(x, a, x') \right\} \qquad (x \in \Xsf). $$ (eq-hjbde) ``` Equation {eq}`eq-hjbde` connects the exposition above to the traditional theory of continuous time MDPs (see, e.g., {cite}`guo2009continuous`). It is sometimes called the **Hamilton--Jacobi--Bellman** (**HJB**) equation, although that name is more commonly used when the state process is a diffusion. As in {prf:ref}`ex-hjbdbe`, it can be shown that $\sigma \in \Sigma$ is $\vmax$-greedy if and only if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \sum_{x'} \vmax(x') Q(x, a, x') \right\} \quad \text{for all } x \in \Xsf. $$ #### Example: Service Rate Control Here we study a queueing system where a firm controls service rates to maximize profit. The firm operates a service facility with finite capacity $N$.
Customers arrive according to a Poisson process with rate $\lambda$. The state $x \in \Xsf = \{0, 1, \ldots, N\}$ represents the number of customers currently waiting for service. The firm can control the service rate by selecting from a finite set of actions $\Asf$. Each action $a \in \Asf$ corresponds to a service rate $\mu(a)$. Higher service rates allow faster customer processing but incur greater operating costs. The intensity kernel $Q$ associated with this problem is $$ Q(x, a, x') = \begin{cases} \lambda & \text{if } x' = x + 1 \text{ and } x < N \\ \mu(a) & \text{if } x' = x - 1 \text{ and } x > 0 \\ 0 & \text{for all other } x' \text{ with } x \neq x' \text{.} \end{cases} $$ When $x=x'$, we set $Q(x, a, x') = -\sum_{y \neq x} Q(x, a, y)$ in order to ensure $\sum_{x'} Q(x, a, x') = 0$ at each $(x,a)$. All choices of $a$ are feasible, so $\Gamma(x) = \Asf$ for all $x$. The instantaneous profit rate is $$ r(x, a) = \mu(a) R \1\{x > 0\} - h x - c(a), $$ where $R$ is revenue per customer served, $h$ is holding cost per customer per unit time, and $c(a)$ is the service cost rate for action $a$. The first term represents revenue from serving customers, the second term captures the cost of customers waiting in the queue, and the third term is the cost of operating at service rate $\mu(a)$. The firm's objective is to maximize expected discounted profit flow $v_\sigma(x) = \EE_x \int_0^\infty \me^{-\delta t} r(X_t, \sigma(X_t)) \diff t$, where $\delta > 0$ is the discount rate. We solve the problem using the uniformization technique discussed in {ref}`sss-ufiz`. The first step is to calculate the uniformization rate $m = \max_{x,a} |Q(x,a,x)|$. Here - $Q(0, a, 0) = -\lambda$ (only arrivals when empty), - $Q(x, a, x) = -(\lambda + \mu(a))$ for $0 < x < N$ (both arrivals and departures), and - $Q(N, a, N) = -\mu(a)$ (only departures at capacity). Hence $$ m = \lambda + \bar \mu \quad \text{where} \quad \bar \mu \coloneq \max_a \mu(a). 
$$ Thus, following the specifications in {ref}`sss-ufiz`, we set $$ P(x, a, x') = \1\{x = x'\} + \frac{Q(x, a, x')}{\lambda + \bar \mu}, \quad \beta = \frac{\lambda + \bar \mu}{\lambda + \bar \mu + \delta}, \quad \hat r(x, a) = \frac{r(x, a)}{\lambda + \bar \mu + \delta}. $$ We then compute the optimal policy using VFI based on the Bellman equation {eq}`eq-rhatbe`. {numref}`f-queue_ctmdp` shows the optimal policy and value function for a system with capacity $N = 10$, arrival rate $\lambda = 2.0$, service rates $\mu = (2.5, 3.0, 3.5)$, revenue $R = 10$, holding cost $h = 2.5$, service costs $c = (1.5, 2.0, 4.5)$, and discount rate $\delta = 0.1$. When the queue is empty, the firm uses the lowest service rate to minimize operating costs. As the queue length increases, the firm gradually raises the service rate to balance the increasing holding costs against service costs. The value function increases initially as queue length grows (reflecting the value of serving customers) but eventually decreases as holding costs dominate. ```{figure} figures/queue_ctmdp_policy.pdf :name: f-queue_ctmdp :width: 95% Optimal service rate policy and value function for the queue system ``` (ss-mdpext)= ### Extensions In this section we discuss extensions of the MDP framework that parallel those for the firm problem. These include nonlinear objectives such as mean-variance and risk-sensitive preferences. As before, we find that nonlinearity forecloses a recursive structure, which motivates period-by-period reformulations that restore tractability. We also introduce ambiguity, where the controller faces uncertainty about transition probabilities. #### Nonlinear Criteria Some extensions to the firm problem discussed in {ref}`ss-fintext` have counterparts here.
For example, we discussed maximization problems of the form $$ \max_{\sigma \in \Sigma} U( Z_\sigma ) \quad \text{where} \quad Z_\sigma \coloneq \sum_{t=0}^{T(\sigma)-1} \beta^t \pi_t + \beta^{T(\sigma)} s, $$ (eq-cuoo) and $U$ is a nonlinear real-valued function. One example was given in {eq}`eq-uuoo`, where $U$ was the mean-variance map in {eq}`eq-mvar`. In other examples, $U$ emerges from value-at-risk, conditional value-at-risk, risk sensitivity, or a desire for robustness. There are obvious parallels for the MDP model we introduced in {ref}`ss-fmdpt`. We simply take the criterion from {eq}`eq-cuoo` and modify it to $$ \max_{\sigma \in \Sigma} U \left( Z_\sigma \right) \quad \text{where} \quad Z_\sigma \coloneq \sum_{t \geq 0} \beta^t r(X_t, \sigma(X_t)). $$ (eq-muoo) Here $(X_t)_{t \geq 0}$ is a Markov chain generated by $P_\sigma$ with fixed initial condition and $U$ is again, some given real-valued function. As before, we can choose $U$ to inject concern for mean-variance trade-offs, value-at-risk, conditional value-at-risk, risk sensitivity and a desire for robustness. (sss-btr)= #### Back to Recursions In {ref}`sss-difficulties` we discussed how optimization problems of the form {eq}`eq-cuoo` can be troublesome. The lack of a recursive structure prevents us from using Bellman machinery. The result is that we are left without a clear path to optimization, as well as the loss of time-consistency. Not surprisingly, all of these difficulties remain present when we switch to the MDP version in {eq}`eq-muoo`. For the most part, theorists have responded in ways similar to ones discussed for the firm problem in {ref}`sss-fbtr`, where recursive structure is enforced by applying nonlinear criteria period-by-period, rather than applying them directly to the sum representing lifetime value. 
For example, we can modify the policy operator {eq}`eq-fintrots` to $$ (T_\sigma \, v)(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta (K_\sigma \, v)(x) \right] $$ (eq-rfintrots) where, for each $\sigma \in \Sigma$, the map $K_\sigma$ is a given nonlinear operator. For instance, by setting $$ (K_\sigma \, v)(x) = - \frac{1}{\gamma} \ln \left\{ \int \exp(- \gamma v(x')) P_\sigma(x, \diff x') \right\} $$ we switch the MDP problem to entropic risk preferences. As for the firm problem, this alternative approach to inserting risk preferences is less intuitive than the direct approach in {eq}`eq-muoo` and raises a question: when is $v_\sigma$ well-defined by the nonlinear functional equation {eq}`eq-rfintrots`? On the other hand, the formulation in {eq}`eq-rfintrots` offers the advantage that valuations are recursive by construction. This allows us to apply solution methods that extend Bellman's original theory in natural ways. We explore these ideas in the remainder of the text, beginning with the abstract recursive setup in {prf:ref}`c-adps`. (sss-ambi)= #### Ambiguity The MDP framework has been extended to include a decision maker's concerns about misspecification of the probability distribution. For example, consider the cash management problem from {ref}`sss-cm`. Applying the definitions of the profit function $\pi$ and the transition function $F$ from that section, the risk-neutral problem can be written as $$ \max_{\sigma \in \Sigma} \, \EE_\phi \, \sum_{t \geq 0} \beta^t \pi(X_t, \sigma(X_t), \xi_{t+1}) $$ (eq-rncm) where $(X_t)_{t \geq 0}$ obeys $X_{t+1} = F(X_t, \sigma(X_t), \xi_{t+1})$ for all $t$. We subscript expectation with $\phi$ to emphasize the fact that the mathematical expectation over $\xi_{t+1}$ is taken with respect to distribution $\phi$. Suppose now that the manager doesn't know $\phi$ but does know that $\phi$ belongs to a set of possible distributions $\Phi$.
This is how the decision maker expresses **ambiguity** about the probability law that governs $\xi_{t+1}$. If we asked the decision maker to put a subjective probability distribution over $\Phi$, the decision maker would decline to do so. The decision maker proceeds in the spirit of Abraham Wald {cite}`wald1950statistical` by assuming only that he knows a set of possible models. To do this, he replaces {eq}`eq-rncm` with $$ \max_{\sigma \in \Sigma} \, \min_{\phi \in \Phi} \EE_\phi \, \sum_{t \geq 0} \beta^t \pi(X_t, \sigma(X_t), \xi_{t+1}). $$ (eq-rncma) By using this criterion, the manager seeks a decision rule that works well enough no matter which probability distribution $\phi \in \Phi$ governs $\xi_{t+1}$. A recursive structure is absent from criterion {eq}`eq-rncma`. One way out of this difficulty is to make our decision maker express model ambiguity in a way that is more susceptible to a recursive formulation. For example, we could ascribe to our decision maker a value function that solves the following Bellman equation: $$ v(x) = \max_{a \in \Gamma(x)} \min_{\phi \in \Phi} \sum_\xi \left\{ \pi(x, a, \xi) + \beta v(F(x, a, \xi)) \right\} \phi(\xi). $$ (eq-rbfp) As we will see in {ref}`ss-rcfp`, this kind of specification puts dynamic programming theory back in business. (s-og)= ## Optimal Savings This section presents an optimal savings problem (also called the optimal consumption problem and the income fluctuation problem). This problem is a building block for many economic models. It features a basic intertemporal trade-off between consuming now and consuming later, which can be analyzed using dynamic programming. Unlike the finite state problems above, the optimal savings model has a continuous state space, as well as a continuous action space. We define policies, policy operators, and lifetime values and state the key optimality results. We then look at extensions to Epstein--Zin preferences.
(ss-ogop)= ### Policies and Decisions In an optimal savings problem (sometimes called an "income fluctuation problem"), a household seeks to maximize $$ \EE \, \sum_{t=0}^{\infty} \beta^t u(C_t) \quad \text{s.t.} \quad W_{t+1} = R(W_t - C_t) + Y_{t+1} \quad \text{and} \quad 0 \leq C_t \leq W_t. $$ (eq-objos) The constraints in {eq}`eq-objos` are required to hold for all $t \geq 0$, and an initial condition $w_0$ is taken as given. The **utility function** $u \colon \RR_+ \to \RR$ maps current consumption $C_t$ into a utility value (loosely speaking, a measure of satisfaction), $\beta \in (0,1)$ is a **discount factor** indicating impatience, and $R > 0$ is a gross rate of return on assets. The variable $W_t$ represents wealth at time $t$, while $Y_t$ is labor income. To keep the model simple, we assume $(Y_t)$ is iid with common distribution $\phi \in \dD(\RR_+)$, the set of distributions (i.e., Borel probability measures) on $\RR_+$. (We study more general settings later.) The variable $W_t$ is the state of the dynamic program, while $C_t$ is the action. {numref}`f-og_decisions` shows the timing for the optimal savings problem. After observing $W_t$, the household chooses $C_t$ and hence $W_t - C_t$. Then labor income $Y_{t+1}$ is realized and the state updates to $W_{t+1}$. The process then repeats. ```{figure} figures/og_decisions.svg :name: f-og_decisions Timing for the optimal savings problem ``` In maximization problem {eq}`eq-objos` there is another constraint: $C_t$ can depend only on information available at time $t$. Formally, current consumption $C_t$ must be a (deterministic) Borel measurable function of shocks, states, and actions observed up to and including time $t$. Thus, the current action cannot depend on future values such as $Y_{t+1}$ or $W_{t+1}$. 
A mapping from the history of the state and the shocks into the current action is called a **policy function**.[^2] The infinite horizon, iid $(Y_t)$-process, time-invariant structure of the optimal savings problem lets us focus on policies that make current consumption $C_t$ be a deterministic function $\sigma$ of the current state $W_t$. (We will prove this later and discover how it depends on the iid-nature of the $(Y_t)$ process.) We impose the following simplifying conditions: ```{prf:assumption} :label: a-uf The function $u$ is continuous and bounded on $\RR_+$ and the distribution of labor income can be represented by a continuous density $\phi$ on $\RR_+$. ``` In a slight abuse of notation, we use $\phi$ to represent the density of labor income as well as the corresponding distribution (i.e., Borel probability measure on $\RR_+$). Thus, in the integrals below, $\phi(\diff y)$ and $\phi(y) \diff y$ have the same meaning. (ss-pfslv)= #### Lifetime Value In this setting, a **stationary Markov policy** is a Borel measurable map $\sigma$ from $\RR_+$ to itself. Here we refer to stationary Markov policies more simply as **policies**. We call a policy $\sigma$ **feasible** if $0 \leq \sigma(w) \leq w$ for all $w \in \RR_+$, so that the consumption response $c = \sigma(w)$ obeys the inequalities in {eq}`eq-objos`. Let $\Sigma$ denote the set of all feasible policies. We seek $\sigma \in \Sigma$ that maximizes expected lifetime value. For given $\sigma$ and initial condition $w = w_0$, expected lifetime value is $$ v_\sigma(w) = \EE \sum_{t \geq 0} \beta^t u(\sigma(W_t)) \quad \text{when } \; W_{t+1} = R(W_t - \sigma(W_t)) + Y_{t+1} $$ (eq-iidasl) for all $t \geq 0$ and $(W_t)_{t \geq 0}$ starts at $w$. Below, we refer to $v_\sigma$ as the **$\sigma$-value function**. It is helpful to represent $v_\sigma$ as a **policy operator** for $\sigma \in \Sigma$: $$ (T_\sigma \, v)(w) = u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y) \qquad (w \in \RR_+). 
$$ (eq-ogpolop) This policy operator is a continuous state analog of the finite MDP policy operator we saw in {eq}`eq-tsig_mdp0`. It acts on functions $v \in V$, where $$ V \coloneq b\RR_+ \coloneq \text{all bounded Borel measurable functions from } \RR_+ \text{ to } \RR. $$ Recall that $V$ is a Banach space (see {ref}`s-fa`) with supremum norm $\| v \| := \sup_x |v(x)|$. ```{exercise} :label: ex-tsvog Show that $T_\sigma \, V \subset V$ when {prf:ref}`a-uf` holds. ``` ```{solution} ex-tsvog Let {prf:ref}`a-uf` hold and fix $v \in V$. Let $\sigma$ be any policy in $\Sigma$. Since $v$ and $u$ are bounded, the function $$ (T_\sigma \, v)(w) = u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y) $$ is also bounded. Measurability follows from measurability of $v$, continuity of $u$ and measurability of $\sigma$. ``` Policy operators are useful because $v \in V$ is a fixed point of $T_\sigma$ if and only if it equals the $\sigma$-value function. Thus, the fixed point of $T_\sigma$ characterizes the lifetime value of $\sigma$. This is a consequence of the following lemma. ```{prf:lemma} :label: l-ogtsup If {prf:ref}`a-uf` holds, then every policy operator $T_\sigma$ is globally stable on $V$. Moreover, the unique fixed point of $T_\sigma$ in $V$ is the function $v_\sigma$ defined in {eq}`eq-iidasl`. ``` ```{prf:proof} Fix $\sigma \in \Sigma$ and set $r_\sigma \coloneq u \circ \sigma$. Let $P_\sigma$ be the Markov operator (see {ref}`sss-markop`) defined at $v \in V$ by $$ (P_\sigma \, v)(w) \coloneq \int v(R(w - \sigma(w)) + y) \phi(\diff y) \qquad (w \in \RR_+). $$ Using this notation, we can write $$ T_\sigma \, v = r_\sigma + \beta P_\sigma \, v. 
$$ (eq-tsiggrow) In {ref}`ss-markop` and {prf:ref}`c-ibnl` we show that $P_\sigma$ is a bounded linear operator from $V$ to itself and, using $\beta \in (0,1)$, that $T_\sigma$ is globally stable on $V$ with unique fixed point $v_\sigma \in V$ obeying $$ v_\sigma = (I - \beta P_\sigma)^{-1} r_\sigma = \sum_{t \geq 0} (\beta P_\sigma)^t \, r_\sigma . $$ (eq-psirs) (Here $I$ is the identity map on $V$ and the second equality follows from the Neumann series lemma.) It remains only to show that $v_\sigma$ in {eq}`eq-psirs` agrees with $v_\sigma$ defined in {eq}`eq-iidasl`. To obtain this we use the fact that, when $W_{t + 1} = R(W_t - \sigma(W_t)) + Y_{t + 1}$ for all $t$ and $W_0 = w$, $$ \left( P_\sigma^t \, r_\sigma \right)(w) = \EE \left[ r_\sigma(W_t) \, \given W_0 = w \right] = \EE \left[ u(\sigma(W_t)) \, \given W_0 = w \right]. $$ (eq-psirs2) (The first equality also uses results in {ref}`ss-markop`.) Combining this with the last expression in {eq}`eq-psirs`, we see that $v_\sigma$ in {eq}`eq-psirs` and {eq}`eq-iidasl` are identical. ◻ ``` Incidentally, one can use the law of iterated expectations to prove that the $\sigma$-value function $v_\sigma$ is a fixed point of $T_\sigma$. Write $$ v_\sigma(w) = u(\sigma(w)) + \EE \sum_{t \geq 1} \beta^t u(\sigma(W_t)) . $$ Letting $\EE_1$ be the expectation conditional on $W_1$, applying the law of iterated expectations implies $$ v_\sigma(w) = u(\sigma(w)) + \beta \EE \, \left[ \EE_1 \, \sum_{t \geq 1} \beta^{t-1} u(\sigma(W_t)) \right] = u(\sigma(w)) + \beta \EE \, v_\sigma(W_1). $$ Expanding the last expression yields $$ v_\sigma(w) = u(\sigma(w)) + \beta \int v_\sigma(R(w - \sigma(w)) + y) \phi(\diff y). $$ (eq-vsigreciid) Thus, $v_\sigma$ is a fixed point of $T_\sigma$. (sss-lval)= #### Lifetime Values as Limits In the previous section we learned that fixed points of policy operators represent lifetime value. What do finite iterates of policy operators represent? 
Fixing $\sigma$ and inspecting the definition of $T_\sigma$ (see {eq}`eq-ogpolop`) indicates that $(T_\sigma \, v)(w)$ represents the reward received from using policy $\sigma$ for one period, when $w$ is initial wealth and the function $v$ is used to evaluate the reward from wealth in the second period. We can lengthen the horizon by iterating with $T_\sigma$ while keeping the terminal value function $v$ fixed. Choosing $k \in \NN$ and using the expression for $T_\sigma$ in {eq}`eq-tsiggrow`, we get $$ T^k_\sigma \, v = r_\sigma + \beta P_\sigma \, r_\sigma + \cdots + (\beta P_\sigma)^{k-1} r_\sigma + (\beta P_\sigma)^k v $$ The expression on the right is the value of using policy $\sigma$ for $k$ periods and then receiving a reward for terminal wealth determined by the function $v$. Thus, it is the finite horizon value of following $\sigma$ under this terminal condition. It seems plausible that the infinite-horizon lifetime value of a policy $\sigma$ could equal the limit of finite horizon values, so that $$ v_\sigma = \lim_{k \to \infty} T^k_\sigma v. $$ (eq-vsigbylim) {prf:ref}`l-ogtsup` assures us that this is true: since $T_\sigma$ is globally stable on $V$ with unique fixed point $v_\sigma$, the limit in {eq}`eq-vsigbylim` exists and equals $v_\sigma$, independent of the terminal condition $v \in V$. {numref}`f-os_multi_policies` shows two arbitrarily chosen feasible policies and their lifetime values when $R=1.04$, $\beta=0.95$, $u(c)=1 - \exp(-c)$, and $Y_t = \exp(\nu Z_t)$ when $\nu=0.8$ and $Z_t$ is standard normal. The lifetime values were computed via {eq}`eq-vsigbylim`. ```{figure} figures/os_multi_policies.pdf :name: f-os_multi_policies :width: 95% Randomly chosen policies and their lifetime values ``` (sss-iidosopt)= ### Optimality The **value function** for the optimal savings model is $$ \vmax(w) \coloneq \sup_{\sigma \in \Sigma} v_\sigma (w) \qquad (w \in \RR_+). 
$$ (eq-osvalf) Under {prf:ref}`a-uf` the supremum is always well defined in $\RR$, since $u$ and hence $r_\sigma$ is bounded by some constant $M$, implying that, for any $w \in \RR_+$ and $\sigma \in \Sigma$, $$ v_\sigma(w) \leq \sum_{t \geq 0} \beta^t M = \frac{M}{1-\beta}. $$ A policy is called **optimal** if $v_\sigma = \vmax$; that is, if following the policy from every initial state $w$ leads to the largest possible lifetime value attainable from $w$. The set of feasible policies lies in an infinite-dimensional function space, so we cannot find an optimal policy by exhaustive search. We want a systematic and efficient search procedure. Following the techniques we used for the firm management problem in {ref}`s-fpintro`, our approach will be to (a) set up a Bellman equation to help us assign maximal lifetime values to states, and (b) solve for a greedy policy with respect to this maximizing function. (sss-og_bin)= #### Bellman's Method Fix $v \in V$. In the present setting, a policy $\sigma \in \Sigma$ will be called **$v$-greedy** if $$ \sigma(w) \in \argmax_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\} \quad \text{for all } w \geq 0. $$ (eq-osbellarg) A $v$-greedy policy uses $v$ to value next-period states and then chooses consumption optimally to trade off current utility against expected discounted future value associated with the implied level of savings. The following statements are both true: 1. Computing $v$-greedy policies is typically much easier than computing optimal policies, since we are only solving a two-period problem. 2. Computing $v$-greedy policies can be equivalent to computing optimal policies, given the right choice of $v$. What is the right choice of $v$? A natural candidate is the value function, since the value function tells us the maximal reward from alternative states. We explain this in more detail in {ref}`sss-dpros`. 
In that same section, we will also use the fact that the value function satisfies an important functional equation, which we now describe. We say that $v \in V$ satisfies the **Bellman equation** for the optimal savings problem if $$ v(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\} \quad \text{for all } w \geq 0. $$ (eq-osbell) Stating that $v$ solves the Bellman equation is equivalent to stating that $v$ is a fixed point of the **Bellman operator** $T$ that maps a value function $v(w)$ into a value function $(T v)(w)$ defined by $$ (T v)(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\} \qquad (w \geq 0). $$ (eq-osbell_op_0) The next lemma discusses properties of greedy policies and the Bellman operator. ```{prf:lemma} :label: l-conofos If {prf:ref}`a-uf` holds, then the function $f$ defined by $$ f(c, w) \coloneq u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \qquad (0 \leq c \leq w) $$ is continuous for all $v \in V$. In addition, 1. there exists at least one $v$-greedy policy for each $v \in V$ and 2. $Tv$ is continuous and bounded whenever $v \in V$. ``` ```{prf:proof} The claims in {prf:ref}`l-conofos` follow from results that will be proved later in the book. (The first claim follows from {prf:ref}`a-uf` and {prf:ref}`l-dksf`; see, in particular, {prf:ref}`eg-srssf`. The second and third claims then follow from {prf:ref}`t-berge`.) ◻ ``` ```{exercise} :label: ex-tmodb Prove that $T$ is a contraction (see {ref}`sss-conmap` for the definition) on $(V, \| \cdot \|)$. [Hint: Apply {prf:ref}`c-supineq`.] ``` ```{solution} ex-tmodb {prf:ref}`l-conofos` tells us that $T$ maps $V$ into $bc\RR_+$, which is a subset of $b\RR_+$. In particular, $T$ is a self-map on $V$.
For the contraction property, we apply the sup inequality from {prf:ref}`c-supineq` and the triangle inequality for integrals to obtain $$ \begin{aligned} |(Tv)(w) - (Tv')(w)| & \leq \max_{0 \leq c \leq w} \beta \int \left| v(R(w - c) + y) - v'(R(w - c) + y) \right| \phi(\diff y) \\ & \leq \beta \| v - v'\|. \end{aligned} $$ Taking the supremum gives $\|Tv - Tv'\| \leq \beta \|v-v'\|$. ``` Since $(V, \| \cdot \|)$ is a Banach space, the contraction property in {prf:ref}`ex-tmodb` implies that $T$ is globally stable on $V$. (See {ref}`sss-conmap` for details.) (sss-dpros)= #### DP Results for Optimal Savings Dynamic programming theory tells us that, under {prf:ref}`a-uf`, 1. at least one optimal policy exists, 2. the value function $\vmax$ is the unique solution to the Bellman equation in $V$, and 3. a policy $\sigma \in \Sigma$ is optimal if and only if it is $\vmax$-greedy. Direct proofs of claims 1--3 can be found in {cite}`stokey1989recursive`, {cite}`stachurski2022economic` and numerous other sources. The proofs heavily exploit the fact that the Bellman operator is a contraction mapping (as discussed in {prf:ref}`ex-tmodb`). We skip proofs for now, noting that they will be special cases of proofs we shall provide in {ref}`ss-osfirop`. Let's review what we've found so far. We started with one optimization problem---choosing an optimal consumption path $C_0, C_1, \ldots$ to maximize expected discounted lifetime utility---and ended up with another one---finding a greedy policy from the value function. Are we actually better off? The answer is: yes! Finding a greedy policy involves solving a scalar optimization problem for each state $w$, whereas our previous optimization problem was infinite dimensional. High dimensionality is the mountain we must climb in all hard optimization problems and here we have used the recursive structure inherent in the problem to map a route up to the top.
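To make the scalar optimization concrete, here is a minimal sketch that computes $v$-greedy policies on a wealth grid and iterates the Bellman operator to approximate $\vmax$. It uses the parameterization behind {numref}`f-os_multi_policies` ($R=1.04$, $\beta=0.95$, $u(c)=1-\exp(-c)$, $Y_t=\exp(\nu Z_t)$ with $\nu=0.8$). Everything else is an assumption made for illustration and is not the method used to produce the figures in this chapter: the wealth grid truncated at 20, the consumption grid, linear interpolation with flat extrapolation, Gauss--Hermite quadrature for the integral, and plain successive approximation with $T$. The names `w_grid`, `c_frac`, and `bellman_and_greedy` are hypothetical.

```python
import numpy as np

# Parameters taken from the example in the text
R, beta, nu = 1.04, 0.95, 0.8

def u(c):
    return 1.0 - np.exp(-c)

# Gauss-Hermite nodes/weights for Z ~ N(0, 1): E[h(Z)] is approximated by sum(p * h(z))
z, p = np.polynomial.hermite_e.hermegauss(15)
p = p / p.sum()                         # normalize weights to a probability vector
y = np.exp(nu * z)                      # income nodes, Y = exp(nu * Z)

# Illustrative grids (assumptions, not from the text)
w_grid = np.linspace(0.0, 20.0, 120)    # wealth grid, truncated at 20
c_frac = np.linspace(0.0, 1.0, 60)      # consumption as a fraction of wealth

def bellman_and_greedy(v):
    """Return (Tv, sigma) on w_grid, where sigma is v-greedy over the choice grid."""
    c = w_grid[:, None] * c_frac[None, :]                 # candidates 0 <= c <= w
    w_next = R * (w_grid[:, None] - c)[:, :, None] + y    # shape (w, c, y)
    # Linear interpolation of v; np.interp holds v flat beyond the last grid point
    ev = np.interp(w_next, w_grid, v) @ p                 # expected continuation value
    q = u(c) + beta * ev
    j = q.argmax(axis=1)                                  # one scalar problem per state
    rows = np.arange(w_grid.size)
    return q[rows, j], c[rows, j]

# Successive approximation with the (discretized) Bellman operator
v = np.zeros_like(w_grid)
for _ in range(1000):
    v_new, sigma = bellman_and_greedy(v)
    error = np.max(np.abs(v_new - v))
    v = v_new
    if error < 1e-6:
        break

print(f"sup-norm change at exit: {error:.2e}")
```

Each call to `bellman_and_greedy` solves one scalar maximization per grid point. Because interpolation and quadrature both average with nonnegative weights that sum to one, the discretized operator is still a contraction of modulus $\beta$ under the supremum norm, so the loop converges from any bounded initial guess.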
Of course this claim that we are better off is contingent on us being able to learn what the value function is, so that we can compute $\vmax$-greedy policies---or at least some reasonable approximation. We discuss this topic next. {numref}`f-os_multi_policies_2` shows an approximation of the optimal policy $\sigopt$ and the value function $\vmax$, both computed by OPI, for the same version of the optimal savings problem used in {numref}`f-os_multi_policies`. In this case we set $m=20$. ```{figure} figures/os_multi_policies_2.pdf :name: f-os_multi_policies_2 :width: 95% Approximating the optimal policy and value function via OPI ``` (sss-crracase)= #### Special Case: No Labor Income Let's quickly look at a version of the savings model where it's possible to get an analytical solution for the optimal policy and the value function. We will use this solution to help us investigate the role of parameters and, through this process, consider the need for extensions to the basic optimal savings model. To obtain an analytical solution, we set $Y_t \equiv 0$ and assume that the utility function has the CRRA form $$ u(c) \coloneq \frac{c^{1-\gamma}}{1-\gamma} \qquad (\gamma > 0, \; \gamma \not= 1). $$ (eq-os_crra) The conditions of the preceding discussion are not satisfied, since $u$ is not bounded on $\RR_+$ and may take the value $-\infty$. We assume instead that $\beta R^{1-\gamma} < 1$. This turns out to be sufficient to ensure finite lifetime values when consumption choices are positive: ```{exercise} :label: ex-egs-auto-3 Prove the following: If $\beta R^{1-\gamma} < 1$, then there exists a map $m \colon \RR_+ \to \{-\infty\} \cup \RR$ such that every feasible policy $\sigma$ obeys $v_\sigma(w) \leq m(w)$ for all $w \in \RR_+$. ``` For this CRRA problem, the optimal consumption policy is linear in $w$. 
That is, $$ \text{there exists a constant } \eta \text{ such that } \sigma(w) = \eta w \text{ is the optimal policy} $$ (eq-ceco) Let's verify this claim and also seek the value of the constant $\eta$. In doing so, we first observe that if {eq}`eq-ceco` holds, then $$ W_t = R^t (1 - \eta)^t w \quad \text{when } \; W_0 = w $$ and hence the value function $\vmax$ satisfies $$ \begin{aligned} \vmax(w) = \sum_t \beta^t u (\eta W_t) & = \sum_t \beta^t u \left( \eta R^t \left(1 -\eta \right)^t w \right) \\ & = \sum_t \beta^t \left( \eta R^t \left(1 -\eta \right)^t \right)^{1-\gamma} u \left( w \right) = \frac{\eta^{1-\gamma}}{1-\beta \left( R \left( 1-\eta \right) \right)^{1-\gamma}} u(w) \end{aligned} $$ Our conjecture is that the linear policy $\sigma(w) = \eta w$ satisfies the Bellman equation with the value function as given above. Under this conjecture, the Bellman equation becomes $$ \vmax(w) = \max_c \left\{ \frac{c^{1-\gamma}}{1-\gamma} + \beta \cdot \frac{\eta^{1-\gamma}}{1-\beta \left( R \left( 1-\eta \right) \right)^{1-\gamma}} \cdot \frac{\left(R \left( w-c \right) \right)^{1-\gamma}}{1-\gamma} \right\} $$ (eq-be) Taking the derivative with respect to $c$ yields the first-order condition $$ c^{-\gamma} + \beta m \left(R \left( w-c \right) \right)^{-\gamma} (-R) =0 \quad \text{ when } \; m \coloneq \frac{\eta^{1-\gamma}} { 1-\beta \left( R \left( 1-\eta \right) \right)^{1-\gamma} } $$ It then follows that $c^{-\gamma} = \beta m R^{1-\gamma}(w-c)^{-\gamma}$. 
Substituting the optimal policy $\sigma(w) = \eta w$ into this equality gives $$ \left( \eta w \right)^{-\gamma} = \frac{\beta R^{1-\gamma} \eta^{1-\gamma}} {1- \beta \left( R \left( 1-\eta \right) \right)^{1-\gamma}} (1-\eta)^{-\gamma} w^{-\gamma} $$ Now solving the above equality for $\eta$ yields $$ \eta = 1 - \left( \beta R^{1-\gamma} \right)^{1/\gamma} $$ In this connection, given any initial wealth $w$, the value function becomes $$ \vmax(w) = \frac{\eta^{1-\gamma}}{1-\beta \left( R \left( 1-\eta \right) \right)^{1-\gamma}} u(w) = \frac{\left( 1 - \left( \beta R^{1-\gamma} \right)^{1/\gamma} \right)^{1-\gamma}} {1-\beta R^{1-\gamma} \left( \beta R^{1-\gamma} \right)^{\frac{1-\gamma}{\gamma}}} u(w) =\eta^{-\gamma} u(w). $$ It is not difficult to verify that $\vmax(w) = \eta^{-\gamma} u(w)$ solves the Bellman equation {eq}`eq-be` for any $w$. The parameter $\gamma$ governs the curvature of the utility function and hence preferences about consumption smoothing. To see this, observe that consumption at time $t$ is $C_t = \eta W_t = \eta (R(1-\eta))^t w$, so the consumption growth factor is $C_{t+1}/C_t = (\beta R)^{1/\gamma}$. When $\gamma$ is large, the utility function has high curvature and the agent dislikes variation in consumption across time. Conversely, when $\gamma$ is small, the agent is more tolerant of consumption variation, leading to steeper paths. {numref}`fig-crra_paths` illustrates these effects for $\beta = 0.96$ and $R=1$. ```{figure} figures/crra_consumption_paths.pdf :name: fig-crra_paths :width: 70% Optimal consumption paths under CRRA utility for different values of $\gamma$. ``` (ss-ogezintro)= ### Epstein--Zin Preferences There are a number of issues and limitations associated with the basic optimal savings model we have discussed so far. Moreover, these limitations tend to bind more often as we move towards quantitative analysis and interesting research applications. 
In this section we discuss issues related to risk and intertemporal substitution. This discussion will motivate us to introduce Epstein--Zin preferences, which are a particularly popular specification of intertemporal preferences in economics and finance. #### Risk vs EIS One issue is that, under the model considered so far, the curvature of the utility function $u$ simultaneously governs both risk aversion (e.g., a more strongly concave utility function indicates stronger aversion to risk) and willingness to substitute consumption across time (as we saw in {numref}`fig-crra_paths`, where increasing $\gamma$ led to flatter consumption paths). Willingness to substitute consumption is usually measured by the elasticity of intertemporal substitution (EIS), which, for the CRRA utility function, is $1/\gamma$. Larger $\gamma$ pushes down the EIS, indicating preference for smooth consumption over time. The fact that utility curvature controls both risk preferences and the EIS binds attitudes toward uncertainty together with attitudes toward intertemporal substitution. Researchers have found that to explain various macro-finance patterns, it helps to unbind them and allow separate parameters to describe these two attitudes. For example, matching observed equity premia using standard asset pricing models requires high risk aversion, but high $\gamma$ under CRRA implies a small EIS, which creates other difficulties including what is called a risk-free rate puzzle (see, e.g., Chapter 13 of {cite}`ljungqvist2012recursive`). #### EZ Preferences Epstein--Zin preferences {cite:p}`epstein1989risk,weil1990nonexpected` play a big role in many macro-finance models.
Under these preferences, the Bellman equation {eq}`eq-osbell` from the standard optimal savings model becomes $$ v(w) = \max_{0 \leq c \leq w} \left[ (1-\beta) c^{1-1/\psi} + \beta \left( \int v(R(w-c) + y)^{1-\gamma} \phi(\diff y) \right)^{\frac{1-1/\psi}{1-\gamma}} \right]^{\frac{1}{1-1/\psi}} $$ where $\psi > 0$ is the EIS and $\gamma > 0$ is the coefficient of relative risk aversion. The inner expectation applies risk adjustment to future value via the Kreps--Porteus expectation, which we met earlier in {ref}`sss-alt`. The outer CES aggregator governs intertemporal substitution. The policy operator {eq}`eq-ogpolop` now becomes $$ (T_\sigma v)(w) = \left[ (1-\beta) \sigma(w)^{1-1/\psi} + \beta \left( \int v(R(w-\sigma(w)) + y)^{1-\gamma} \phi(\diff y) \right)^{\frac{1-1/\psi}{1-\gamma}} \right]^{\frac{1}{1-1/\psi}} $$ for any feasible policy $\sigma$. {numref}`f-ez_multi_policies` shows two arbitrarily chosen policies and their lifetime values under Epstein--Zin preferences, using the same income process as in {numref}`f-os_multi_policies`. The $\sigma$-value functions are now computed by iterating on our new version of $T_\sigma$. Parameters are $R=1.04$, $\beta=0.95$, $\gamma=5$ (risk aversion), and $\psi=1.5$ (EIS). ```{figure} figures/ez_multi_policies.pdf :name: f-ez_multi_policies :width: 95% Policies and lifetime values under Epstein--Zin preferences ``` {numref}`f-ez_optimal` shows the optimal policy and value function under Epstein--Zin preferences, computed via OPI. Compared to the standard expected utility case in {numref}`f-os_multi_policies_2`, the optimal consumption policy is qualitatively similar---increasing and concave in wealth---but the value function differs in interpretation and scale. With $\gamma > 1/\psi$, the agent exhibits preference for early resolution of uncertainty, which affects how future risk is valued. 
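The Epstein--Zin policy operator above can be sketched numerically as follows. The block uses the stated parameters ($R=1.04$, $\beta=0.95$, $\gamma=5$, $\psi=1.5$) and the income process $Y_t = \exp(\nu Z_t)$ with $\nu = 0.8$. The policy $\sigma(w) = w/2$, the wealth grid, the quadrature rule, the interpolation scheme, and the names `T_sigma` and `sigma_value` are all hypothetical choices made for illustration; this is not the code behind the figures.

```python
import numpy as np

# Parameters from the text
R, beta, gamma, psi, nu = 1.04, 0.95, 5.0, 1.5, 0.8
rho = 1.0 - 1.0 / psi                   # exponent of the outer CES aggregator

# Gauss-Hermite nodes/weights for Z ~ N(0, 1), normalized to probabilities
z, p = np.polynomial.hermite_e.hermegauss(11)
p = p / p.sum()
y = np.exp(nu * z)                      # income nodes (all strictly positive)

w_grid = np.linspace(0.0, 20.0, 200)    # illustrative wealth grid (assumption)
sigma = 0.5 * w_grid                    # hypothetical feasible policy: consume half

def T_sigma(v):
    """Epstein-Zin policy operator for sigma, applied to a positive v on w_grid."""
    w_next = R * (w_grid - sigma)[:, None] + y            # shape (w, y)
    v_next = np.interp(w_next, w_grid, v)                 # flat beyond the grid
    # Kreps-Porteus certainty equivalent of next-period value
    ce = (v_next ** (1.0 - gamma) @ p) ** (1.0 / (1.0 - gamma))
    return ((1.0 - beta) * sigma ** rho + beta * ce ** rho) ** (1.0 / rho)

def sigma_value(v0, tol=1e-6, max_iter=5000):
    """Iterate T_sigma from v0 until successive iterates are within tol."""
    v = v0
    for n_iter in range(1, max_iter + 1):
        v_new = T_sigma(v)
        err = np.max(np.abs(v_new - v))
        v = v_new
        if err < tol:
            break
    return v, n_iter, err

# Start from a constant that dominates consumption on the grid
v0 = np.full_like(w_grid, w_grid.max())
v_sigma, n_iter, err = sigma_value(v0)
print(f"stopped after {n_iter} iterations, last sup-norm change {err:.2e}")
```

The initial condition is chosen so that $T_\sigma v_0 \leq v_0$ on the grid. Since the discretized operator is order preserving, the iterates then form a decreasing sequence of positive functions. No contraction argument is invoked, so the loop simply reports the last sup-norm change instead of assuming a convergence rate.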
```{figure} figures/ez_optimal.pdf :name: f-ez_optimal :width: 95% Optimal policy and value function under Epstein--Zin preferences ``` {numref}`f-ez_gamma_comparison` explores how risk aversion affects optimal consumption. The figure shows optimal policies for $\gamma \in \{1.25, 5, 20\}$, holding the EIS fixed at $\psi = 1.5$. Higher risk aversion leads to more precautionary saving: at each wealth level, the agent consumes less and saves more as $\gamma$ increases. Values of $\gamma$ around 10--20 are common in the long-run risk literature {cite:p}`bansal2004risks`, where high risk aversion is needed to match observed asset pricing moments. ```{figure} figures/ez_gamma_comparison.pdf :name: f-ez_gamma_comparison :width: 75% Optimal consumption by risk aversion $\gamma$ ``` #### Optimization Theory While the preceding analysis illustrates the potential usefulness of Epstein--Zin preferences, it puts us on shaky ground technically. For example, optimality properties of the ordinary optimal savings model in {ref}`sss-dpros` depend on contractivity of the Bellman operator. (For a sense of why, read the proof of {prf:ref}`t-fintroop`.) The Bellman operator associated with the Epstein--Zin Bellman equation is not a contraction under the supremum distance for the most quantitatively significant parameterizations, and the same is true for the policy operators. This means that, in order to handle both the standard and the Epstein--Zin variations of the savings problem, we require a more general theory of dynamic programming that can handle both contractive and non-contractive settings. We begin constructing appropriate tools in {prf:ref}`c-adps`. (s-wald)= ## Sequential Analysis This section presents a Bayesian formulation of a statistical decision problem described by {cite}`bertsekas1976dynamic`. Unlike the previous examples, there is no discounting, so the Bellman operator is not a contraction. 
Nonetheless, the same conceptual framework applies: the optimal loss function solves a Bellman equation and optimal policies have a threshold structure. In subsequent chapters, we will build a theory of dynamic programming that can handle this no-discounting case. For now, our objective is to motivate the theory by exploring the application through guess-work and simulation.[^3] ### Introduction We now consider a Bayesian formulation of the sequential testing problem originally studied by Milton Friedman, Allen Wallis, and Abraham Wald {cite:p}`wald1947sequential,arrow1949bayes`. The following is an account of how the problem was conceived and came to the attention of Wald. The account is by Milton Friedman, one of the giants of 20th Century economics, and relates to his work during World War II as an analyst at the U.S. Government's Statistical Research Group at Columbia University. > In order to understand the story, it is necessary to have an idea of a simple statistical problem, and of the standard procedure for dealing with it. The actual problem out of which sequential analysis grew will serve. The Navy has two alternative designs (say A and B) for a projectile. It wants to determine which is superior. To do so it undertakes a series of paired firings. On each round, it assigns the value 1 or 0 to A accordingly as its performance is superior or inferior to that of B and conversely 0 or 1 to B. The Navy asks the statistician how to conduct the test and how to analyze the results. > > The standard statistical answer was to specify a number of firings and a pair of percentages (e.g., 53% and 47%) and tell the client that if A receives a 1 in more than 53% of the firings, it can be regarded as superior; if it receives a 1 in fewer than 47%, B can be regarded as superior; if the percentage is between 47% and 53%, neither can be so regarded. > > When Allen Wallis was discussing such a problem with (Navy) Captain Garret L. 
Schuyler, the captain objected that such a test, to quote from Allen's account, may prove wasteful. If a wise and seasoned ordnance officer like Schuyler were on the premises, he would see after the first few thousand or even few hundred \[rounds\] that the experiment need not be completed either because the new method is obviously inferior or because it is obviously superior beyond what was hoped for. Friedman and Wallis worked on the problem for a while but didn't completely solve it. Realizing that, they told Wald about the problem. That set Wald on a path that led him to create sequential analysis {cite:p}`wald1947sequential`. While the story above relates to wartime activity, sequential analysis has many significant applications in economics, finance, operations research, and other fields. Examples include determining the number of clinical trials before bringing a drug to market, real-time fraud detection, algorithmic trading, supply chain monitoring, and experimental interface design by social media companies. On a technical level, this problem differs from the other problems we have investigated so far in that it involves no discounting. As a result, the Bellman operator is not necessarily a contraction. Nonetheless, we will find ways to prove that the core concepts from dynamic programming theory still apply. The setting is as follows: A decision-maker observes a sequence of iid draws $Z_1, Z_2, Z_3, \ldots$ from an unknown distribution $f$. The distribution $f$ is either $f_0$ or $f_1$, where both $f_0$ and $f_1$ are known probability densities. After observing each draw, the decision-maker must choose one of three actions: 1. Accept the hypothesis that $f = f_0$ and stop. 2. Accept the hypothesis that $f = f_1$ and stop. 3. Draw another observation at cost $c > 0$. The decision-maker incurs a loss whenever she makes an incorrect decision. 
The losses are as follows: - loss $L_0$ when incorrectly accepting $f_0$ (in fact $f = f_1$) - loss $L_1$ when incorrectly accepting $f_1$ (in fact $f = f_0$) Both $L_0$ and $L_1$ are strictly positive. The objective is to minimize the expected loss, which includes both the cost of sampling and the potential loss from incorrect terminal decisions. The decision-maker begins with a prior belief $\pi_0 \in (0,1)$ that $f = f_1$. The state variable is the posterior belief $\pi_n$, which represents the probability that $f = f_1$ given observations $1, \ldots, n$. After observing $Z_{n+1}$, the posterior is updated via Bayes' rule: $$ \pi_{n+1} = \kappa(\pi_n, Z_{n+1}), \quad \text{where} \quad \kappa(\pi, z) \coloneq \frac{\pi f_1(z)}{(1-\pi) f_0(z) + \pi f_1(z)}. $$ (eq-waldbayes) Notice that $(\pi_n)_{n \geq 0}$ is Markovian over the sampling process. ```{prf:remark} The expression in {eq}`eq-waldbayes` is more easily understood as an application of Bayes' rule if we write it (informally) as $$ \PP \{f = f_1 \,|\, Z \} = \frac{\PP \{Z \,|\, f = f_1\}\PP \{f = f_1\}} {\PP \{Z \}} $$ and expand the denominator via $\PP \{Z \} = \sum_{h \in \{f_0, f_1\}} \PP \{Z \,|\, f = h\} \PP \{f = h\}$. ``` Given current belief $\pi$, the next draw from the $(Z_n)_{n \in \NN}$ sequence has the predicted distribution $$ \psi(\pi, z) \coloneq (1-\pi)f_0(z) + \pi f_1(z). $$ (eq-predden) The controller uses this distribution to take expectations over next-sample draws from the $(Z_n)_{n \geq 1}$ process. (optimality-1)= ### Optimality For this sequential sampling problem, the **Bellman equation** for minimizing loss has the form $$ g(\pi) = \min \left\{ \pi L_0, \; (1-\pi) L_1, \; c + \int g(\kappa(\pi, z)) \psi(\pi, z) \diff z \right\}. $$ (eq-waldie) The Bellman equation can be understood as follows: The value $g(\pi)$ represents the minimum expected loss given current belief state $\pi$. This value is itself the minimum over three terms, each of which corresponds to a choice.
The first term is associated with accepting $f_0$ and has expected loss $\pi L_0$, since $\pi$ is the (subjective) probability that $f = f_1$. The second term is for accepting $f_1$. This has expected loss $(1-\pi) L_1$, since $1-\pi$ is the probability that $f = f_0$. The last term is the expected loss associated with continuing to the next sample and then behaving optimally. We now state an optimality result that parallels our earlier theorems. Let $\Xsf = (0,1)$ be the state space for the belief state $\pi$ and let $b\Xsf_+$ denote the set of bounded, Borel measurable functions from $\Xsf$ to $\RR_+$. The action space is $\Asf = \{0, 1, 2\}$, where action $0$ represents accepting $f_0$, action $1$ represents accepting $f_1$, and action $2$ represents continuing to sample. The set of all feasible policies, denoted by $\Sigma$, is all Borel measurable $\sigma \colon \Xsf \to \{0, 1, 2\}$. ```{prf:theorem} :label: t-waldop The optimal loss function $\gmin$ is the unique solution in $b\Xsf_+$ to the Bellman equation {eq}`eq-waldie`. A policy $\sigma \in \Sigma$ is optimal if and only if, for each $\pi \in \Xsf$, $$ \sigma(\pi) \in \argmin_{a \in \{0,1,2\}} Q(\pi, a) $$ (eq-waldob) where $$ Q(\pi, 0) = \pi L_0, \quad Q(\pi, 1) = (1-\pi) L_1, \quad \text{and} \quad Q(\pi, 2) = c + \int \gmin(\kappa(\pi, z)) \psi(\pi, z) \diff z. $$ At least one optimal policy exists. ``` ```{figure} figures/wald_distributions.pdf :name: f-wald_distributions :width: 95% Distributions and sample paths for $f_0$ and $f_1$ ``` We prove this theorem via a more general result in {prf:ref}`t-ndbk`. For now, to illustrate the key ideas, we consider a specific example where $f_0 = \text{Beta}(3, 4)$ and $f_1 = \text{Beta}(4, 3)$, as shown in {numref}`f-wald_distributions`. The figure also shows iid sample paths generated by the densities $f_0$ and $f_1$. The remaining parameters are set to $L_0 = 25$, $L_1 = 25$, and $c = 0.5$. 
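A minimal sketch of how one might approximate $\gmin$ and an optimal policy for this example is given next. It iterates on the right-hand side of {eq}`eq-waldie` starting from $g_0 \equiv 0$, using $f_0 = \text{Beta}(3,4)$, $f_1 = \text{Beta}(4,3)$, $L_0 = L_1 = 25$, and $c = 0.5$. The belief grid, the midpoint rule for the integral, the normalization of the density weights, linear interpolation, and the names `Q`, `pi_grid`, `q0`, and `q1` are assumptions made for illustration.

```python
import numpy as np

# Parameters from the example in the text
L0, L1, c = 25.0, 25.0, 0.5

# Midpoint grid on (0, 1) for integrating over the next observation z
z = (np.arange(50) + 0.5) / 50
f0 = z**2 * (1 - z)**3                  # Beta(3, 4) density up to a constant
f1 = z**3 * (1 - z)**2                  # Beta(4, 3) density, same constant
# Probability weights on the z grid; the common constant cancels in Bayes' rule
q0, q1 = f0 / f0.sum(), f1 / f1.sum()

pi_grid = np.linspace(1e-3, 1 - 1e-3, 200)   # illustrative belief grid

def Q(g):
    """Losses from actions 0 (accept f0), 1 (accept f1), 2 (continue); shape (3, n)."""
    pi = pi_grid[:, None]
    psi = (1 - pi) * q0 + pi * q1            # predictive weights, each row sums to 1
    kappa = pi * q1 / psi                    # posterior after observing each z
    # Linear interpolation of g; np.interp holds g flat outside the belief grid
    cont = c + (np.interp(kappa, pi_grid, g) * psi).sum(axis=1)
    return np.stack([pi_grid * L0, (1 - pi_grid) * L1, cont])

# Iterate on the Bellman equation from g_0 = 0
g = np.zeros_like(pi_grid)
for n_iter in range(1, 2001):
    g_new = Q(g).min(axis=0)
    err = np.max(np.abs(g_new - g))
    g = g_new
    if err < 1e-8:
        break

policy = Q(g).argmin(axis=0)                 # greedy action at each belief
t0 = pi_grid[policy == 0].max()              # largest belief at which f0 is accepted
t1 = pi_grid[policy == 1].min()              # smallest belief at which f1 is accepted
print(f"iterations: {n_iter}, cutoffs on the grid: t0={t0:.3f}, t1={t1:.3f}")
```

Because $g_0 \equiv 0$ lies below its image under the operator and the operator is order preserving, the iterates increase and are bounded above by $\min\{\pi L_0, (1-\pi) L_1\}$. There is no discount factor to supply a contraction modulus, so the stopping rule here is a practical check and not a guaranteed rate.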
```{figure} figures/wald_optimal.pdf :name: f-wald_optimal :width: 95% Optimal policy and loss function ``` {numref}`f-wald_optimal` shows the optimal policy and the corresponding loss function $\gmin$. The functions were computed by a version of value function iteration, starting from initial condition $g_0 \equiv 0$. The state space $\Xsf$ was discretized into a grid of 200 points, and the integral over future observations was approximated using a grid of 50 points over the support of the distributions. The left panel displays the optimal action as a function of the posterior belief $\pi$. As predicted by {prf:ref}`t-waldop`, the optimal policy has a threshold structure: there exist cutoffs $t_0, t_1 \in [0,1]$ with $t_0 \leq t_1$ such that - accept $f_0$ if $\pi \leq t_0$, - accept $f_1$ if $\pi \geq t_1$, or - continue sampling if $t_0 < \pi < t_1$. {numref}`f-wald_belief_paths` shows dynamics of the belief state under the optimal policy. The belief state $\pi_n$ shifts according to the update rule $\pi_{n+1} = \kappa(\pi_n, Z_{n+1})$, with the samples $(Z_n)_{n \in \NN}$ being drawn from either $f_0$ or $f_1$. When the true distribution is $f_0$, the belief $\pi_n$ tends to drift downward toward zero; when it is $f_1$, the belief drifts upward toward one. Under the optimal policy, sampling terminates once the belief exits the continuation region $(t_0, t_1)$, at which point the corresponding hypothesis is accepted. ```{figure} figures/wald_belief_paths.pdf :name: f-wald_belief_paths :width: 95% Belief paths under the optimal policy ``` (summary)= ## Summary The examples in this chapter illustrate the breadth of problems that dynamic programming can address, but they also expose the limits of classical methods. The firm problem and finite MDPs with constant discounting yield contracting Bellman operators, making optimality theory and computation straightforward. 
However, several of the extensions and models we encountered require a more general foundation: Epstein--Zin preferences and the sequential analysis problem produce Bellman operators that are not contractions; risk-sensitive and robust formulations involve nonlinear aggregators; and distributional dynamic programming operates on a non-standard value space ordered by stochastic dominance. The abstract theory developed in the next chapter provides a unified framework that accommodates all of these settings. (s-cn_egs)= ## Chapter Notes Richard Bellman's ({cite:year}`bellman1957dynamic`) monograph established dynamic programming as a unified framework for sequential optimization, introducing optimality concepts and recursive functional equations that form the foundations of this text. David Blackwell made major contributions to the mathematical theory, proving contraction properties for discounted problems with finite state spaces {cite:p}`blackwell1962discrete` and extending these results to Borel spaces using order-preserving operators {cite:p}`blackwell1965`. Eric Denardo ({cite:year}`denardo1967contraction`) further generalized contraction results to a broad class of sequential decision problems, introducing conditions that anticipate many ideas in this text. The LP formulation for MDPs discussed in {ref}`sss-lps` has a long history. LP methods are particularly useful for constrained MDPs, where the controller faces additional restrictions on expected rewards or resource usage. Such problems arise in network routing, healthcare resource allocation, and other applications. See {cite}`altman1999constrained` for a textbook treatment. The LP formulation is also central to average-reward problems, where occupation measures play a key role (see, e.g., Chapter 8 of {cite}`puterman2005markov`). For large state-action spaces, approximate LP methods using basis function representations can help scale the approach; {cite}`de2003linear` provides a foundational treatment. 
The firm problem we studied in {ref}`s-fpintro` is closely related to classic references such as {cite}`jovanovic1982selection` and {cite}`hopenhayn1992stochastic`, and has been extended by many authors (see, e.g., {cite}`alessandria2021firm` or {cite}`sterk2021nature`). Regarding the extensions to the firm problem in {ref}`ss-brn`, excellent discussions of Markov decision processes with risk-sensitive objectives can be found in {cite}`bauerle2024markov` and {cite}`bauerle2025time`. We borrowed from their exposition in several parts of the chapter and return to the key ideas later in the text. The variational formula connecting risk-sensitivity to robustness is developed in {cite}`anantharam2017variational`; see also Chapter 8 of {cite}`sargent2025dynamic` for further discussion. The discussion in {ref}`sss-difficulties` mentions dynamic (time) inconsistency. For analysis of time inconsistency in macroeconomic models and its connections to dynamic programming, see {cite}`sargent2024critique`, {cite:t}`sargent2024calvoml`, and {cite:t}`sargent2024calvoml_2`. For recent theoretical work, see {cite}`stanca2025restricted`, {cite}`strack2026dynamic`, and {cite}`bayraktar2023stability` on the stability of equilibria in time-inconsistent stopping. The finite MDP framework in {ref}`s-mdps` is treated comprehensively in {cite}`puterman2005markov`; see also Chapter 5 of {cite}`sargent2025dynamic` for an introductory treatment. The three core algorithms we presented --- VFI, HPI, and OPI --- are discussed in these sources. Howard policy iteration was introduced in {cite}`howard1960dynamic`. The cash management application in {ref}`sss-cm` builds on the inventory-theoretic models of money demand developed by {cite:t}`baumol1952transactions` and {cite:t}`tobin1956interest`. 
Our continuous time MDP model in {ref}`ss-mdpct` follows the framework described in {cite}`guo2009continuous`; the uniformization technique we used to reduce continuous time problems to discrete time ones is standard (see, e.g., {cite}`puterman2005markov`, Chapter 11). The optimal savings problem in {ref}`s-og` is also called the income fluctuation problem. It was studied in an early and influential form by {cite}`brock1972optimal`, who analyzed optimal growth under uncertainty with discounted CRRA utility. It has become a core building block for heterogeneous agent models following {cite}`bewley1986stationary`, {cite}`huggett1993risk`, and {cite}`aiyagari1994uninsured`. Recent analysis can be found in {cite}`carroll2011theoretical`, {cite}`li2014solving`, {cite}`lehrer2018effect`, {cite}`light2018precautionary`, {cite}`ma2020income`, and {cite}`ma2021theory`. For a continuous-time treatment, see {cite}`achdou2022income`. See {cite}`stokey1989recursive` and {cite}`stachurski2022economic` for textbook treatments of the underlying dynamic programming theory. The Epstein--Zin preferences discussed in {ref}`ss-ogezintro` were introduced by {cite}`epstein1989risk` and {cite}`weil1990nonexpected`, building on earlier work of {cite}`kreps1978temporal`. The separation of risk aversion from the elasticity of intertemporal substitution enabled by Epstein--Zin utility has been central to the long-run risk literature initiated by {cite}`bansal2004risks`, where small persistent consumption shocks get heavily priced, generating realistic equity premia using "reasonable" parameters. The equity premium puzzle was posed by {cite}`mehra1985equity`; the associated risk-free rate puzzle was posed by {cite}`weil_equity_1989`. Both puzzles are discussed extensively in Chapter 13 of {cite}`ljungqvist2012recursive`. Chapter 7 of {cite}`sargent2025dynamic` provides an introduction to recursive preferences. 
The discussion of ambiguity in {ref}`sss-ambi` connects to a large literature on robust decision-making under model uncertainty. The minimax formulation we presented follows the approach of {cite}`wald1950statistical`; see also {cite}`ellsberg1961risk` for a foundational discussion of ambiguity aversion and {cite}`hansen2011robustness` for connections to robust control. Recent work on dynamic programming under ambiguity includes {cite}`maccheroni2006dynamic`, {cite}`klibanoff2009recursive`, {cite}`marinacci2019unique`, {cite}`neufeld2023markov`, {cite}`cerreia2026making`, {cite}`benyamine2026dynamicprogrammingepistemicuncertainty`, and {cite}`wang2026nonrectangularaveragerewardrobustmdps`. An excellent survey on ambiguity and its implications for economics and finance can be found in {cite}`ilut2023ambiguity`. The sequential analysis problem in {ref}`s-wald` originated with {cite}`wald1947sequential` and {cite}`arrow1949bayes`. The Bayesian formulation we presented follows {cite}`bertsekas1976dynamic`. For treatments from frequentist and Bayesian perspectives, respectively, see {cite}`SargentStachurski2026qe1` and {cite}`SargentStachurski2026qe2`. The introduction to this chapter mentioned applications of dynamic programming to atemporal problems, such as genome sequencing and the structure of production chains. For one discussion of the former see {cite}`gu2023gendp`; for the latter see, for example, {cite}`kikuchi2021coase`. We mentioned also that many recent applications of dynamic programming are connected to machine learning and artificial intelligence. Introductions to the literature can be found in {cite}`bertsekas2021rollout` and {cite}`kochenderfer2022algorithms`. [^1]: The literature on "recursive contracts" in macroeconomics makes progress here by using a set of procedures that have been called "dynamic programming squared". {cite:t}`ljungqvist2012recursive` devote a suite of chapters to that topic. 
[^2]: In engineering it is sometimes called a *closed loop control* to emphasize that the control must be a measurable function of an observed history and *not* depend on as yet unrealized random variables. [^3]: In his formulation, Abraham Wald {cite}`wald1947sequential` proceeded as a frequentist statistician, using objects from Neyman-Pearson's hypothesis testing theory. For descriptions of the problem from the distinct frequentist and Bayesian perspectives, see {cite}`SargentStachurski2026qe1` and {cite}`SargentStachurski2026qe2`. ======================================================================== ## Abstract Decision Processes > One of the principal objects of theoretical research in any department of knowledge is to find the point of view from which the subject appears in its greatest simplicity. > > ---Josiah Willard Gibbs Having reviewed some applications in {prf:ref}`c-egs`, our next aim is to present a general theory of dynamic programming that includes these applications as special cases and extends to many new problems. The framework developed in this chapter leads with order theory rather than metric structure. The reason is that contraction-based arguments on Banach spaces, the standard foundation for dynamic programming, are fragile. Many natural variants of the textbook setting --- nonstandard discounting, nonlinear period aggregators, nonstandard value spaces, and others --- break it. As suggested by the examples in {prf:ref}`c-egs`, contraction is the exception rather than the rule once one moves outside the additively separable, expected-reward template. What does unify these settings is that policy operators are order preserving and that lifetime values are characterized as fixed points of those operators on partially ordered sets. This chapter develops the optimality and convergence theory those ingredients support. 
This investment will pay off in later chapters: subsequent theory uses this foundation to establish, for many classes of models, the same fundamental optimality results and algorithmic convergence guarantees one expects from classical theory, even without contraction-type properties. The classical theory itself is recovered as a special case: the firm problem of {ref}`s-fpintro`, the finite MDP of {ref}`s-mdps`, and the optimal savings problem of {ref}`s-og` all reappear as concrete instances. A second feature of the framework is methodological. Order-theoretic proofs are almost entirely algebraic: each result reduces to systematic application of a small number of named hypotheses, with few analytic prerequisites. This setup pays off in two ways. It eases human analysis --- a proof in this chapter typically needs nothing beyond applications of definitions and chains of inequalities --- and it makes the theory a natural target for machine-assisted reasoning. We begin with definitions and basic properties. Some readers will find it helpful to review the main order-theoretic concepts in {ref}`s-prelim` before proceeding further. The optimality results given in this chapter are high-level. We use them mostly as inputs for downstream optimality theory, rather than for applications per se. (In particular, from {prf:ref}`c-adps2` on, we leverage findings from this chapter to construct downstream results that are more easily applied to specific problems.) When presenting this high-level theory, we refrain from giving examples until {ref}`s-adpapps`, hoping that readers have reviewed {prf:ref}`c-egs`, or otherwise have strong backgrounds in core concepts of dynamic programming. Such readers will be able to appreciate the practical value of the abstract framework to be presented here. (s-aops)= ## ADPs on Posets We begin by defining abstract dynamic programs and listing some of their properties, including the Bellman equation and Bellman operator. 
We then state conditions under which essential optimality properties hold. (ss-defs)= ### Definitions and Properties In this section we define abstract dynamic programs and list some simple properties. (sss-adpdef)= #### Key Definitions Let's start with the most important concept. ```{prf:definition} An **abstract dynamic program** (ADP) is a pair $(V, \TT)$, where 1. $V = (V, \preceq)$ is a partially ordered set and 2. $\TT = \setntn{T_\sigma}{\sigma \in \Sigma}$ is a nonempty family of order preserving self-maps on $V$. ``` In what follows, - $V$ is called the **value space**. - Each operator $T_\sigma$ in $\TT$ is called a **policy operator**. - $\Sigma$ is an arbitrary index set and elements of $\Sigma$ will be referred to as **policies**. In applications we will impose conditions under which each $T_\sigma$ has a unique fixed point in $V$. In these settings, the significance of $T_\sigma$ is that its fixed point represents the lifetime value of following policy $\sigma$. We will denote the unique fixed point of $T_\sigma$ by $v_\sigma$ and call it the **$\sigma$-value function**. {numref}`f-adp_three_policies` illustrates a simple ADP with $V = \RR$ (paired with the usual order) and $\Sigma = \{\sigma, \sigma', \sigma''\}$. Each policy operator is an affine self-map on $\RR$, and each has a unique fixed point: $v_\sigma$, $v_{\sigma'}$, and $v_{\sigma''}$, respectively. ```{figure} figures/adp_three_policies.svg :name: f-adp_three_policies An ADP on $\RR$ ``` ```{prf:remark} In most applications we analyze below, $(V, \preceq)$ is a space of real-valued functions paired with the pointwise partial order. But this is not universally the case. For example, in {ref}`ss-ddp`, $(V, \preceq)$ is a set of maps from the state space into a space of probability measures paired with pointwise stochastic dominance. In {ref}`ss-lq`, $(V, \preceq)$ is a space of matrices paired with the Loewner partial order. 
In {ref}`sss-fvur`, it is a subset of $L_p$ paired with the almost everywhere partial order. ``` In {prf:ref}`c-egs`, the concept of greedy policies (see {eq}`eq-osbellarg`) played a key role. It will also play a key role here, in our ADP framework. The definition is as follows. ```{prf:definition} :label: def-def101 Given $v \in V$, we say that $$ \sigma \in \Sigma \text{ is } v \text{-greedy if } T_\tau \, v \preceq T_\sigma \, v \text{ for all } \tau \in \Sigma. $$ (eq-adpgr) ``` As shown below, {prf:ref}`def-def101` generalizes the notion of greedy policies for the problems considered in {prf:ref}`c-egs`. Throughout this book we let $$ V_G \coloneq \setntn{v \in V}{\text{ at least one } v \text{-greedy policy exists}}. $$ (eq-vgee) {numref}`f-bellman_envelope_a` illustrates the $v$-greedy concept for the ADP from {numref}`f-adp_three_policies`, evaluated at the marked point $v$. The three values $T_\sigma \, v$, $T_{\sigma'} \, v$, $T_{\sigma''} \, v$ are indicated: here $T_{\sigma'} \, v$ is the largest, so $\sigma'$ is the $v$-greedy policy. In applications, solving for a $v$-greedy policy is often straightforward. Moreover, there exist conditions under which solving the overall problem reduces to solving for a $v$-greedy policy with a "correct" choice of $v$. We pursue this idea below. (sss-props)= #### Properties of ADPs We now list properties useful for ADP optimality theory. First, we call $(V, \TT)$ - **well-posed** if each $T_\sigma \in \TT$ has a unique fixed point in $V$. Well-posedness is a minimal precondition for a coherent dynamic program: to maximize lifetime values, we need them to be well-defined. When well-posedness holds we will always use $v_\sigma$ to denote the unique fixed point of $T_\sigma$. We call $(V, \TT)$ - **finite** if $\TT$ is a finite set. Finiteness holds for the finite MDPs described in {ref}`s-mdps`, as well as other related settings with finite states and actions.
Not surprisingly, finite ADPs have attractive optimality properties. We call $(V, \TT)$ - **regular** if $V_G = V$; that is, if a $v$-greedy policy exists for all $v \in V$. Regularity helps us to construct algorithms and to obtain existence of optimal policies. Since $V = V_G$ under regularity, {prf:ref}`l-torper` implies that the Bellman operator $T$, defined in {eq}`eq-adp_bellop` below, is well-defined on all of $V$ whenever this property holds. We focus primarily on regular ADPs. We call $(V, \TT)$ - **order stable** if each $T_\sigma \in \TT$ is order stable on $V$. Recalling the definitions in {ref}`sss-orstab`, this means that $(V, \TT)$ is well-posed and, for each $\sigma \in \Sigma$ and $v \in V$, - (a) $v \preceq T_\sigma \, v$ implies $v \preceq v_\sigma$, and - (b) $T_\sigma \, v \preceq v$ implies $v_\sigma \preceq v$. Most proofs in this chapter use only properties (a) and (b). Occasionally we will need monotone convergence of the iterates, in which case we will assume strong order stability. We call $(V, \TT)$ - **strongly order stable** if each $T_\sigma \in \TT$ is strongly order stable on $V$. As per the discussion in {ref}`sss-orstab`, this implies that each $T_\sigma$ is order stable and, in addition, $v \preceq T_\sigma \, v$ implies $T_\sigma^n v \uparrow v_\sigma$ and $T_\sigma \, v \preceq v$ implies $T_\sigma^n v \downarrow v_\sigma$. (sss-adpbell)= #### The Bellman Equation The Bellman equation, which plays a central role in optimality theory for the optimal savings problem of {ref}`sss-og_bin`, is again key for ADPs. Here we define the ADP Bellman equation and note some preliminary observations. Throughout this section, $(V, \TT)$ is an ADP with policy set $\Sigma$. ```{prf:definition} We say that $v \in V$ satisfies the **Bellman equation** if $$ v = \bigvee_{\sigma} T_\sigma \, v \qquad (v \in V). $$ (eq-adp_belleq) ``` In {eq}`eq-adp_belleq`, the supremum is taken over all $\sigma \in \Sigma$. There is, in general, no guarantee that the supremum exists.
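To make the definitions so far concrete, the following is a minimal sketch of a finite ADP on $V = \RR$ with three affine policy operators $T_\sigma \, v = a_\sigma + b_\sigma v$, where $0 < b_\sigma < 1$. The coefficients, the policy names, and the helper functions (`T_policy`, `policy_value`, `greedy`, `satisfies_bellman`) are hypothetical choices made for illustration; they are not taken from the figures in this chapter. Exact rational arithmetic is used so that equality tests are reliable.

```python
from fractions import Fraction as F

# Hypothetical coefficients (a, b): T_sigma v = a + b * v with 0 < b < 1,
# so each operator is order preserving with unique fixed point a / (1 - b).
POLICIES = {
    "sigma": (F(1), F(1, 2)),
    "sigma_p": (F(2), F(1, 4)),
    "sigma_pp": (F(1, 2), F(9, 10)),
}


def T_policy(name, v):
    """Apply the policy operator T_sigma to v."""
    a, b = POLICIES[name]
    return a + b * v


def policy_value(name):
    """Return v_sigma, the unique fixed point of T_sigma."""
    a, b = POLICIES[name]
    return a / (1 - b)


def greedy(v):
    """Return a v-greedy policy: one whose image T_sigma v dominates all others."""
    return max(POLICIES, key=lambda name: T_policy(name, v))


def satisfies_bellman(v):
    """Check v = sup_sigma T_sigma v (a maximum here, since Sigma is finite)."""
    return max(T_policy(name, v) for name in POLICIES) == v


values = {name: policy_value(name) for name in POLICIES}
print(values)
print(greedy(F(0)))
print(satisfies_bellman(F(5)), satisfies_bellman(F(8, 3)))
```

Under these assumed coefficients the $\sigma$-value functions are $2$, $8/3$ and $5$, the policy `sigma_p` is greedy at $v = 0$, and $v = 5$ satisfies the Bellman equation while $v = 8/3$ does not. The example is well-posed and regular because $\Sigma$ is finite and $\RR$ is totally ordered. It is also order stable, since $v \leq a + b v$ holds exactly when $v \leq a / (1 - b)$, and similarly for the reverse inequality.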
```{prf:definition} We define the **Bellman operator** generated by $(V, \TT)$ via $$ Tv \coloneq \bigvee_{\sigma} T_\sigma \, v \quad \text{whenever the supremum exists}. $$ (eq-adp_bellop) ``` Evidently $v \in V$ satisfies the Bellman equation if and only if $Tv$ exists and $Tv = v$. {numref}`f-bellman_envelope_b` illustrates the Bellman operator for the simple ADP introduced earlier. At each $v$, the value $Tv$ is the largest of $T_\sigma \, v$, $T_{\sigma'} \, v$, $T_{\sigma''} \, v$: $T$ is the upper envelope of the three policy operators. The fixed point of $T$ is $v_{\sigma''}$, which is also the largest of the three $\sigma$-value functions and hence the value function $\vmax$. ```{figure} figures/adp_three_policies_greedy.svg :name: f-bellman_envelope_a σ′ is v-greedy ``` ```{figure} figures/adp_three_policies_envelope.svg :name: f-bellman_envelope_b T as upper envelope ``` The next lemma provides some essential facts about $T$ on $V_G$ (as defined in {eq}`eq-vgee`). ```{prf:lemma} :label: l-torper The Bellman operator $T$ has the following properties: 1. $T$ is well-defined and order preserving on $V_G$. 2. For $v \in V_G$ we have - $T_\sigma \, v \preceq Tv$ for all $\sigma \in \Sigma$ and - $T_\sigma \, v = T v$ if and only if $\sigma$ is $v$-greedy. ``` ```{prf:proof} We begin with part (ii). Fix $v \in V_G$ and let $\sigma$ be $v$-greedy. Then, by definition, $T_\sigma \, v$ is the greatest element of $\{T_\tau \, v\}_{\tau \in \Sigma}$. A greatest element is also a supremum, so we have $$ T v \coloneq \bigvee_{\tau \in \Sigma} T_\tau \, v = T_\sigma \, v . $$ This gives both (a) and $\Leftarrow$ in (b) of part (ii). For $\Rightarrow$ of (b), if $Tv = T_\sigma \, v$, then $T_\tau \, v \preceq T_\sigma \, v$ for all $\tau \in \Sigma$. In particular, $\sigma$ is $v$-greedy. Next we prove (i). For $v \in V_G$, a $v$-greedy policy exists, so $Tv$ is well-defined by (b) of part (ii). 
Regarding the order preserving claim, fix $v, w \in V_G$ with $v \preceq w$. Let $\sigma \in \Sigma$ be $v$-greedy. Since $T_\sigma$ is order preserving, we have $T v = T_\sigma \, v \preceq T_\sigma \, w \preceq T w$, where the final inequality is (a) of part (ii) applied at $w$. ◻ ``` (sss-subval)= #### Subsets of the Value Space We often refer to the following three subsets of the value space $V$, the first of which was already introduced in {eq}`eq-vgee` and is repeated here for convenience: ```{prf:definition} Given an ADP $(V, \TT)$ with Bellman operator $T$, we set - $V_G \coloneq$ all $v \in V$ such that at least one $v$-greedy policy exists. - $V_U \coloneq$ all $v \in V_G$ with $v \preceq Tv$. - $V_\Sigma \coloneq$ all $v \in V$ such that $T_\sigma \, v = v$ for some $T_\sigma \in \TT$. ``` In the next lemma, $(V, \TT)$ is any ADP. ```{prf:lemma} :label: l-vsigvu $V_\Sigma \cap V_G \subset V_U$. ``` ```{prf:proof} Fix $v \in V_\Sigma \cap V_G$ and let $T_\sigma \in \TT$ be such that $T_\sigma \, v = v$. Using $v \in V_G$ and {prf:ref}`l-torper`, we have $v = T_\sigma \, v \preceq T v$. Hence $v \in V_U$. ◻ ``` ### Optimization In this subsection we define optimality for ADPs, connect it to the Bellman equation, and state the fundamental optimality properties. We then give sufficient conditions for these properties to hold. (sss-oabell)= #### Optimality and the Bellman Equation We say that a policy $\sigma \in \Sigma$ is **optimal** for $(V, \TT)$ if $v_\sigma$ is a greatest element of $V_\Sigma$. In other words, $\sigma$ is optimal if it attains the "highest possible" lifetime value. Perhaps the most important aspect of the theory of dynamic programming is the link between optimality and the Bellman equation. To clarify this link we introduce some terminology. Let $(V, \TT)$ be a well-posed ADP and set $$ \vmax \coloneq \bigvee_\sigma v_\sigma = \bigvee V_\Sigma \quad \text{whenever the supremum exists}. $$ When $\vmax$ exists (i.e., when the supremum exists) we call $\vmax$ the **value function** of the ADP.
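The distinction between a supremum of $V_\Sigma$ and a greatest element of $V_\Sigma$ matters once the order is only partial. The sketch below assumes, purely for illustration, that $V = \RR^2$ with the pointwise order and that the $\sigma$-value functions are supplied directly as tuples (for instance, as the fixed points of constant policy operators, which are order preserving and have unique fixed points). The function names and the numbers are hypothetical.

```python
def leq(v, w):
    """Pointwise partial order on tuples of equal length."""
    return all(a <= b for a, b in zip(v, w))


def greatest_element(points):
    """Return the greatest element of a finite set of tuples, or None."""
    for p in points:
        if all(leq(q, p) for q in points):
            return p
    return None


def pointwise_sup(points):
    """Supremum of a finite set in R^n under the pointwise order."""
    return tuple(max(coords) for coords in zip(*points))


def optimal_policies(values):
    """Policies whose value is a greatest element of V_Sigma (possibly none)."""
    top = greatest_element(list(values.values()))
    if top is None:
        return []
    return [name for name, v in values.items() if v == top]


# Assumed sigma-value functions: two incomparable points.
values_a = {"sigma": (1, 0), "tau": (0, 1)}
# Adding a third policy whose value dominates both.
values_b = {"sigma": (1, 0), "tau": (0, 1), "rho": (2, 3)}

print(pointwise_sup(list(values_a.values())), optimal_policies(values_a))
print(pointwise_sup(list(values_b.values())), optimal_policies(values_b))
```

In the first case the supremum $(1, 1)$ exists in $\RR^2$ but is not attained by any policy, so the value function exists while no policy is optimal. In the second case the supremum $(2, 3)$ is the value of `rho`, which is therefore optimal.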
The following statements are obvious from the definitions: - Existence of an optimal policy $\sigma$ implies that $\vmax$ exists and is equal to $v_\sigma$. - If $\vmax = v_\sigma$ for some $\sigma \in \Sigma$, then $\sigma$ is optimal. At the same time, existence of $\vmax$ does not generally imply existence of a greatest element (and hence an optimal policy). ```{prf:definition} We say that **Bellman's principle of optimality holds** if $$ \setntn{\sigma \in \Sigma}{\sigma \text{ is optimal }} = \setntn{\sigma \in \Sigma}{\sigma \text{ is } \vmax \text{-greedy }}. $$ (eq-bpo) ``` (When $\vmax$ does not exist both sets are understood as empty.) The following results will be useful for studying optimality. ```{prf:lemma} :label: l-fo For a well-posed ADP, the following statements are valid. 1. If $\vmax$ exists in $V$ and satisfies the Bellman equation, then Bellman's principle of optimality holds. 2. $\vmax$ exists in $V_G$ and satisfies the Bellman equation if and only if an optimal policy exists and Bellman's principle of optimality holds. ``` ```{prf:proof} Regarding part (i), suppose that $\vmax$ exists in $V$. We prove the equality in {eq}`eq-bpo` when $T\vmax=\vmax$. Suppose first that $\sigma \in \Sigma$ is optimal, so that $v_\sigma = \vmax$. Since $T_\sigma \, v_\sigma = v_\sigma$, this implies $T_\sigma \, \vmax = \vmax$. But $T \, \vmax = \vmax$, so $T_\sigma \, \vmax = T \vmax$. Hence $\sigma$ is $\vmax$-greedy, since $T_\tau \, \vmax \preceq T \vmax = T_\sigma \, \vmax$ for all $\tau$. Suppose next that $\sigma$ is $\vmax$-greedy, so that $T_\sigma \, \vmax = T \vmax = \vmax$. But $v_\sigma$ is the unique fixed point of $T_\sigma$ in $V$, so $v_\sigma = \vmax$. Hence $\sigma$ is an optimal policy. Regarding part (ii), suppose that $\vmax$ exists in $V_G$ and $T\vmax = \vmax$. Let $\sigma$ be $\vmax$-greedy. Then $T_\sigma \, \vmax = T \vmax = \vmax$. But $v_\sigma$ is the unique fixed point of $T_\sigma$, so $v_\sigma = \vmax$. 
Hence $\sigma$ is an optimal policy. Bellman's principle of optimality follows from part (i). Regarding (ii, $\Leftarrow$), let $\sigma \in \Sigma$ be optimal, so that $v_\sigma = \vmax$. By Bellman's principle of optimality, the policy $\sigma$ is $\vmax$-greedy. As a result, $T \vmax = T_\sigma \, \vmax = T_\sigma \, v_\sigma = v_\sigma = \vmax$. In particular, $\vmax$ exists and satisfies the Bellman equation. ◻ ``` #### The Fundamental Optimality Properties Throughout this section, $(V, \TT)$ is a well-posed ADP. ```{prf:definition} We say that the **fundamental optimality properties hold** if 1. at least one optimal policy exists, 2. $\vmax$ exists and is the unique solution to the Bellman equation in $V_G$, and 3. Bellman's principle of optimality holds. ``` It is important to emphasize here that (B1)--(B3) are not independent. Indeed, (B2) implies both (B1) and (B3), as shown in the next proposition. ```{prf:proposition} :label: p-foo The fundamental optimality properties hold if and only if $\vmax$ exists and is the unique solution to the Bellman equation in $V_G$. ``` ```{prf:proof} Regarding ($\Leftarrow$), suppose that $\vmax$ exists and is the unique solution to the Bellman equation in $V_G$. (B2) holds by the hypothesis. By {prf:ref}`l-fo`, an optimal policy exists and Bellman's principle of optimality holds. This gives (B1) and (B3) respectively. The claim ($\Rightarrow$) is trivial, so the proof of {prf:ref}`p-foo` is complete. ◻ ``` Despite the fact that (B2) implies both (B1) and (B3), we have stated them together because, in terms of applications, all three parts are significant. (B1) tells us that the problem at hand has a solution. (B3) implies that a solution can be computed by taking a $\vmax$-greedy policy, and that any other optimal policy must also be $\vmax$-greedy. Finally, (B2) provides a restriction that can help us calculate $\vmax$. Provided that we search in $V_G$, any fixed point of $T$ is equal to $\vmax$. 
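As a sanity check on how (B1)--(B3) fit together, the following sketch evaluates them on a small finite ADP over $V = \RR$ with affine policy operators $T_\sigma \, v = a_\sigma + b_\sigma v$ and $0 < b_\sigma < 1$. The coefficients, policy names, and helper functions are hypothetical and chosen only for illustration; two policies are given the same lifetime value so that the sets in Bellman's principle of optimality contain more than one element.

```python
from fractions import Fraction as F

# Hypothetical coefficients (a, b) for T_sigma v = a + b * v, with 0 < b < 1.
POLICIES = {
    "sigma": (F(1), F(1, 2)),
    "sigma_p": (F(2), F(1, 4)),
    "sigma_pp": (F(1, 2), F(9, 10)),
    "sigma_ppp": (F(5, 2), F(1, 2)),
}


def T_policy(name, v):
    """Apply the policy operator T_sigma to v."""
    a, b = POLICIES[name]
    return a + b * v


def policy_value(name):
    """Unique fixed point v_sigma of T_sigma."""
    a, b = POLICIES[name]
    return a / (1 - b)


def bellman(v):
    """Bellman operator: the maximum of T_sigma v over the finite policy set."""
    return max(T_policy(name, v) for name in POLICIES)


def greedy_set(v):
    """All v-greedy policies."""
    top = bellman(v)
    return {name for name in POLICIES if T_policy(name, v) == top}


v_max = max(policy_value(name) for name in POLICIES)
optimal = {name for name in POLICIES if policy_value(name) == v_max}

b1 = len(optimal) > 0                 # (B1): an optimal policy exists
b2_fixed = bellman(v_max) == v_max    # part of (B2): v_max solves the Bellman equation
b3 = optimal == greedy_set(v_max)     # (B3): optimal policies = v_max-greedy policies

print(v_max, sorted(optimal), b1, b2_fixed, b3)
```

The code checks that $\vmax$ is a fixed point of $T$ but does not test uniqueness. In this particular example uniqueness holds for a separate reason: $T$ is a maximum of finitely many affine maps with slopes in $(0, 1)$ and is therefore a contraction on $\RR$.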
{prf:ref}`p-foo` tells us our main goal is to construct conditions under which $\vmax$ exists and is the unique fixed point of $T$ in $V_G$. We begin this task in {ref}`ss-suffcon`. The next exercise generalizes a well-known result from more traditional dynamic programming frameworks (see, e.g., {cite}`puterman2005markov`, Theorem 6.2.6). ```{exercise} :label: ex-adps-auto-1 Let $(V, \TT)$ be regular and well-posed. Prove the following: If $\vmax$ exists and is the unique fixed point of $T$ in $V$, then $\sigma \in \Sigma$ is optimal if and only if $T v_\sigma = v_\sigma$. ``` ```{solution} ex-adps-auto-1 Let the stated conditions hold. Suppose first that $\sigma$ is optimal, so that $v_\sigma = \vmax$. Since $T\vmax = \vmax$ we have $T v_\sigma = v_\sigma$. For the reverse implication, suppose that $T v_\sigma = v_\sigma$ holds. As $\vmax$ is the unique fixed point of $T$ in $V$ we have $v_\sigma = \vmax$. Since $\vmax$ is the greatest element of $V_\Sigma$, this proves that $\sigma$ is an optimal policy. ``` (sss-fpvf)= #### From Fixed Points to Optimality In this section we investigate the following question: When does existence of a solution to the Bellman equation imply the fundamental optimality properties (B1)--(B3)? We will see that order stability is useful here. The results stated in this section should be understood as intermediate inputs: we use them in downstream theorems that are tuned to applications. In the statement of the next result, $(V, \TT)$ is an ADP and $T$ is the Bellman operator. ```{prf:theorem} :label: t-bk Let $(V, \TT)$ be a well-posed ADP such that $$ T_\sigma \, v \preceq v \implies v_\sigma \preceq v \qquad \text{for all } \sigma \in \Sigma \text{ and } v \in V. $$ In this setting, the following are equivalent: 1. $T$ has a fixed point in $V_G$. 2. The fundamental optimality properties hold. ``` ```{prf:proof} *Proof of {prf:ref}`t-bk`.* Let $(V, \TT)$ be as stated. 
Suppose first that (i) holds, with $T \bar v = \bar v$ and $\bar v \in V_G$. In view of {prf:ref}`p-foo`, it suffices to show that $\vmax$ exists and is the unique solution to the Bellman equation in $V_G$. By the characterization of greedy policies in {prf:ref}`l-torper`, we can choose a $\sigma \in \Sigma$ such that $\bar v = T \, \bar v = T_\sigma \, \bar v$. By well-posedness, the unique fixed point of $T_\sigma$ in $V$ is its $\sigma$-value function $v_\sigma$, so the last equation yields $\bar v = v_\sigma$. Moreover, if $\tau$ is any policy, then $T_\tau \, v_\sigma \preceq T \, v_\sigma = v_\sigma$ and hence, by hypothesis, $v_\tau \preceq v_\sigma$. These facts imply that the fixed point $\bar v$ is a greatest element of $V_\Sigma$. Moreover, if $v'$ is another fixed point of $T$ in $V_G$, then, by the same argument, $v'$ is also a greatest element of $V_\Sigma$. Since greatest elements are unique, we have $v' = \bar v$. These arguments prove that $\vmax$ exists and is the unique solution to the Bellman equation in $V_G$, so (ii) holds. Conversely, suppose that (ii) holds. Then (B2) states that $\vmax$ exists and is the unique solution to the Bellman equation in $V_G$, so $\vmax \in V_G$ is a fixed point of $T$, giving (i). ◻ ``` ```{prf:corollary} :label: c-bk If $(V, \TT)$ is order stable and $T$ has a fixed point in $V_G$, then the fundamental optimality properties hold. ``` ```{prf:proof} Let $(V, \TT)$ be order stable and suppose that $T$ has a fixed point in $V_G$. Order stability implies well-posedness and property (b) (i.e., $T_\sigma \, v \preceq v \implies v_\sigma \preceq v$). Hence {prf:ref}`t-bk`(i) applies, giving the fundamental optimality properties. ◻ ``` The next result is almost a corollary of {prf:ref}`t-bk`. It imposes a strong order completeness condition on the value space. (See {ref}`ss-ococ` for background on order completeness.) ```{prf:theorem} :label: t-bkn0 Let $(V, \TT)$ be regular and well-posed. 
If $V$ is chain complete, then the fundamental optimality properties hold. ``` ```{prf:proof} Let $(V, \TT)$ be as stated and let $T$ be the Bellman operator generated by $(V, \TT)$. By {prf:ref}`t-bk`, it suffices to verify that (a) the hypothesis $T_\sigma \, v \preceq v \implies v_\sigma \preceq v$ holds, and (b) $T$ has a fixed point in $V_G$. Property (a) holds by {prf:ref}`l-ocius`. For (b), $T$ is order preserving ({prf:ref}`l-torper`) and $V$ is chain complete, so the Knaster--Tarski fixed point theorem ({prf:ref}`t-ccfp`) gives a fixed point in $V$, which lies in $V_G = V$ by regularity. ◻ ``` While the chain completeness assumption is strong, {prf:ref}`t-bkn0` is nonetheless significant: It says that, at least for this idealized setting, regular well-posed ADPs possess all the optimality properties that we seek. What drives this strong result? The definition of ADPs is itself doing some heavy lifting: it requires that lifetime values are inherently recursive and reflect order structure (being fixed points of order-preserving operators). Bellman's fundamental ideas work smoothly in this setting when we add some regularity. (s-adpop)= ## Algorithms and Convergence In this section we introduce algorithms for solving ADPs---including value function iteration, Howard policy iteration and optimistic policy iteration---and establish conditions under which they converge. We treat both maximization and minimization, extending the latter via the theory of dual ADPs. (sss-adpal)= ### Algorithms In this section we discuss algorithms for dynamic programming and present some preliminary results. The three algorithms we consider are value function iteration (VFI), Howard policy iteration (HPI) and optimistic policy iteration (OPI). These algorithms generalize the ones presented for the finite MDP case in {ref}`sss-mdp_algos`. (sss-top)= #### Operators As a first step, we introduce two operators. Throughout {ref}`sss-top` we take $(V, \TT)$ to be a fixed ADP. 
As usual, when $(V, \TT)$ is well-posed, the unique fixed point of $T_\sigma \in \TT$ in $V$ is denoted by $v_\sigma$. ```{prf:definition} We define the **Howard policy operator** corresponding to $(V, \TT)$ via $$ H \colon V_G \to V_\Sigma, \qquad H v = v_\sigma \quad \text{ where } \sigma \text{ is } v \text{-greedy}, $$ as well as, for each $m \in \NN$, the **optimistic policy operator** $$ W \colon V_G \to V, \qquad W v = T^m_\sigma v \quad \text{where } \sigma \text{ is } v \text{-greedy}. $$ (eq-wmopmax) ``` (Here and below, the dependence of $W$ on $m$ is often suppressed to simplify notation.) Like the Bellman operator $T$, the map $W$ is well-defined on all of $V$ when $(V, \TT)$ is regular. The Howard policy operator $H$ is well-defined on all of $V$ when $(V, \TT)$ is well-posed and regular. Note that, for both of these maps, we always select the same $v$-greedy policy when applying them to some fixed $v$.[^1] Below we will associate VFI, OPI, and HPI with fixing a $v \in V$ and then iteratively applying the operators $T$, $W$, and $H$ respectively. A small amount of thought will convince you that, for the optimal savings ADP described in {ref}`sss-egosadp`, these iterative procedures coincide with our earlier description of VFI, HPI, and OPI from {ref}`sss-mdp_algos`. The next lemma collects useful results for the operators introduced above. In the statement, $(V, \TT)$ is an ADP with Howard policy operator $H$, optimistic policy operator $W$ and Bellman operator $T$. We assume regularity, so that $V_G = V$. ```{prf:lemma} :label: l-meta If $(V, \TT)$ is regular and well-posed, then the following statements hold. 1. If $v \in V$ with $Hv=v$, then $Tv=v$. 2. The operators $T, W$ and $H$ all map $V_U$ to itself. 3. If $v \in V_U$, then $T v \preceq W v \preceq T^m v$. ``` ```{prf:proof} Regarding (L1), fix $v \in V$ with $H \, v = v$ and let $\sigma$ be a $v$-greedy policy such that $H \, v = v_\sigma$. Then $v_\sigma = v$. 
Since $\sigma$ is $v$-greedy, $T_\sigma \, v = T \, v$. Since $v_\sigma$ is fixed for $T_\sigma$, we also have $T_\sigma \, v = v$. Combining the last two equalities proves (L1). Regarding (L2), fix $v \in V_U$. Since $v \preceq Tv$ and $T$ is order preserving on $V_U$, we have $Tv \preceq TTv$. Hence $Tv \in V_U$. Regarding $W$, let $\sigma$ be $v$-greedy with $Wv = T_\sigma^m v$. Since $\sigma$ is $v$-greedy, $Tv = T_\sigma \, v$. Using this and the order preserving property of $T$ and $T_\sigma$, we get $$ \begin{aligned} W v &= T_\sigma \, T_\sigma^{m-1} v \preceq T \, T_\sigma^{m-1} v &&\text{(since } T_\sigma \preceq T \text{)} \\ &\preceq T \, T_\sigma^{m-1} \, T v &&\text{(since } v \preceq Tv \text{ and } T, T_\sigma \text{ order preserving)} \\ &= T \, T_\sigma^{m-1} \, T_\sigma \, v = T \, T_\sigma^m \, v = T W v &&\text{(using } Tv = T_\sigma \, v \text{).} \end{aligned} $$ Hence $W v \in V_U$. Finally, regarding $H$, we observe that $Hv \in V_\Sigma$ and, since $V_G = V$ by regularity, {prf:ref}`l-vsigvu` gives $V_\Sigma \subset V_U$. To prove (L3) we fix $v \in V_U$. Letting $\sigma$ be $v$-greedy, we have $v \preceq Tv = T_\sigma \, v$. Iterating on this inequality with $T_\sigma$ proves that $(T_\sigma^k \, v)$ is increasing. In particular, $Tv = T_\sigma \, v \preceq W v$. For the second inequality in (L3) we use the fact that $T_\sigma \preceq T$ on $V$ and $T$ and $T_\sigma$ are both order preserving to obtain $W v = T^m_\sigma v \preceq T^m v$. ◻ ``` The next lemma adds order stability and derives additional implications. ```{prf:lemma} :label: l-cl2 If $(V, \TT)$ is regular and order stable, then $$ v \in V_U \qquad \implies \qquad T^n v \preceq W^n v \quad \text{and} \quad T^n v \preceq H^n v $$ (eq-rank) for all $n \in \NN$. Moreover, $T^n v$, $H^n v$, and $W^n v$ are all increasing in $n$. 
``` ```{prf:proof} *Proof of {prf:ref}`l-cl2`.* Our first claim is that $$ u,v \in V_U \text{ with } u \preceq v \implies Tu \preceq Wv \; \text{ and } Tu \preceq Hv. $$ (eq-uv) To show this we fix such $u, v$ and use regularity to select a $v$-greedy policy $\sigma$. Since $(V, \TT)$ is order stable, the $\sigma$-value function (unique fixed point of $T_\sigma$) exists. Let $v_\sigma$ be the $\sigma$-value function, so that $T_\sigma \, v_\sigma = v_\sigma$ and $v_\sigma = H v$. Since $v \in V_U$ we have $$ v \preceq Tv = T_\sigma \, v \preceq T^m_\sigma \, v = W v \preceq v_\sigma = Hv. $$ (eq-vhv) The second inequality is by iterating on $v \preceq T_\sigma \, v$, while the third follows from order stability applied to $T^m_\sigma \, v \preceq T_\sigma \, T^m_\sigma \, v$. Since $Tu \preceq Tv$, we can use {eq}`eq-vhv` to obtain {eq}`eq-uv`. Iterating on {eq}`eq-uv` produces {eq}`eq-rank` (recall from {prf:ref}`l-meta` that $T$, $W$ and $H$ map $V_U$ to itself). The last claim in {prf:ref}`l-cl2` follows from {eq}`eq-vhv`, which tells us that elements of $V_U$ are mapped up by $T$, $W$, and $H$. ◻ ``` ```{prf:remark} The proof above does not use the full strength of order stability. It only uses well-posedness and the property that $v \in V$ with $v \preceq T_\sigma \, v$ implies $v \preceq v_\sigma$. ``` #### Convergence In this section, we assume that *$(V, \TT)$ is a regular ADP and the fundamental optimality properties hold*. Let $\vmax$ denote the value function. ```{prf:definition} In this setting, we say that - **VFI converges** if $T^n v \uparrow \vmax$ for all $v \in V_U$, - **OPI converges** if $W^n v \uparrow \vmax$ for all $v \in V_U$, and - **HPI converges** if $H^n v \uparrow \vmax$ for all $v \in V_U$. If, for all $v \in V_U$, there exists an $n \in \NN$ with $H^n v = \vmax$, we say that HPI converges in finitely many steps. ``` For OPI convergence, the meaning is that convergence holds for any choice of the OPI step size $m \in \NN$. ```{exercise} :label: ex-vfibyopi Prove that convergence of OPI implies convergence of VFI.
``` ```{solution} ex-vfibyopi Convergence of OPI implies convergence of VFI because OPI reduces to VFI when $m=1$ (since $W v = T_\sigma \, v$ when $\sigma$ is $v$-greedy, so $W = T$). ``` ```{prf:remark} :label: r-vfibyopi Despite the implication in {prf:ref}`ex-vfibyopi`, our statements of results often mention VFI explicitly -- mainly for the benefit of readers who skim through books reading only the main theorems. ``` The next lemma is a useful preliminary. ```{prf:lemma} :label: l-rwubv If $(V, \TT)$ is regular and order stable, then $v \preceq \vmax$ for all $v \in V_U$. ``` ```{prf:proof} Let $(V, \TT)$ be as stated and fix $v \in V_U$. By regularity, there exists a $\sigma \in \Sigma$ with $Tv = T_\sigma \, v$. Since $v \in V_U$, we have $v \preceq T_\sigma \, v$. This inequality and order stability yield $v \preceq v_\sigma$, where $v_\sigma$ is the fixed point of $T_\sigma$. As a consequence, $v \preceq \vmax$. ◻ ``` The next lemma sharpens {prf:ref}`l-cl2` by adding upper bounds in terms of $\vmax$. ```{prf:lemma} :label: l-algchains If $(V, \TT)$ is regular and order stable, then for every $v \in V_U$ and every $n \in \NN$, $$ v \preceq T^n v \preceq W^n v \preceq \vmax \qquad \text{and} \qquad v \preceq T^n v \preceq H^n v \preceq \vmax. $$ (eq-wchain) ``` ```{prf:proof} Fix $v \in V_U$. By {prf:ref}`l-cl2`, $T^n v$, $W^n v$, $H^n v$ are increasing in $n$ and $T^n v \preceq W^n v$, $T^n v \preceq H^n v$ for all $n$. In particular, $v \preceq T^n v$ in both chains. It remains to show that $W^n v \preceq \vmax$ and $H^n v \preceq \vmax$ for all $n$. We argue by induction on $n$. For $n = 0$: $v \preceq \vmax$ by {prf:ref}`l-rwubv`. For the inductive step, assume $W^n v \preceq \vmax$. By {prf:ref}`l-meta`(L2), $W^n v \in V_U$, so we may pick $\sigma$ that is $(W^n v)$-greedy. 
Applying {eq}`eq-vhv` at $W^n v$ yields $W^{n+1} v = T_\sigma^m \, W^n v \preceq v_\sigma \preceq \vmax$, the last inequality holding because $\vmax$ is the greatest element of $V_\Sigma$. The argument for $H^n v \preceq \vmax$ is identical, using $H^{n+1} v = v_\sigma \preceq \vmax$ in the last step. ◻ ``` ```{prf:corollary} :label: c-cl2 If $(V, \TT)$ is regular and order stable, then convergence of VFI implies convergence of OPI and HPI. ``` ```{prf:proof} Fix $v \in V_U$. By {prf:ref}`l-algchains`, $T^n v \preceq W^n v \preceq \vmax$ and $T^n v \preceq H^n v \preceq \vmax$ for all $n$. Hence, if $T^n v \uparrow \vmax$, then $W^n v \uparrow \vmax$ and $H^n v \uparrow \vmax$: both sequences are increasing by {prf:ref}`l-cl2`, both are bounded above by $\vmax$, and any upper bound of either sequence is also an upper bound of $(T^n v)$ and hence dominates $\vmax$. ◻ ``` (ss-suffcon)= ### Optimality and Convergence In this section we introduce conditions for optimality and for convergence of algorithms in ADPs that are well-posed and regular. These high-level results will later be used as inputs for lower level results that are more straightforward to check in applications. In each case, we will aim for the fundamental optimality properties and the convergence of the three major algorithms. #### Case I: Finite ADPs In applications we often deal with dynamic programs that have finitely many states and actions. This finiteness implies that the set of feasible policies is finite. The next result deals with this case. ```{prf:theorem} :label: t-bkf Let $(V, \TT)$ be regular and order stable. If $(V, \TT)$ is also finite, then 1. the fundamental optimality properties hold and 2. HPI converges in finitely many steps. ``` ```{prf:proof} Let $(V, \TT)$ be as stated. Fix $v$ in $V_U$ (which is nonempty by {prf:ref}`l-vsigvu`). Since $(V, \TT)$ is well-posed and regular, the Howard policy operator $H$ is well-defined on $V$. Let $v_n = H^n v$ for all $n \geq 0$. By {prf:ref}`l-cl2`, we have $v_n \preceq v_{n+1}$ for all $n$.
Since $(v_n)_{n \geq 1}$ is contained in the finite set $V_\Sigma$, it must be that $v_{n+1} = v_n$ for some $n \in \NN$. But then $H v_n = v_n$, so, by {prf:ref}`l-meta`, we have $T v_n = v_n$. Since $T$ has a fixed point in $V$, the fundamental optimality properties hold ({prf:ref}`c-bk`). We have also shown that HPI converges in finitely many steps. ◻
```

(sss-otccc)=
#### Case II: Chain Complete Value Space

Next we state a result for the chain complete setting that extends {prf:ref}`t-bkn0`. It shows that adding strong order stability is enough to provide convergence of all algorithms.

```{prf:theorem}
:label: t-bkn

Let $(V, \TT)$ be regular and strongly order stable. If $V$ is chain complete, then

1. the fundamental optimality properties hold and
2. VFI, OPI and HPI all converge.
```

```{prf:proof}
Since strong order stability implies order stability and hence well-posedness, part (i) is immediate from {prf:ref}`t-bkn0`. Regarding convergence of VFI, fix $v \in V_U$ and let $v_n \coloneq T^n v$. By {prf:ref}`l-rwubv`, we have $v_n \preceq \vmax$ for all $n$. Also, let $\bot$ be the least element of $V$, let $\sigma$ be an optimal policy and let $w_n \coloneq T_\sigma^n \, \bot$. We have $w_n \preceq T^n \bot \preceq T^n v = v_n$ and, by strong order stability, $w_n \uparrow \vmax$. Hence $v_n \uparrow \vmax$. Convergence of OPI and HPI now follows from {prf:ref}`c-cl2`. ◻
```

(sss-sdedcon)=
#### Case III: Countably Dedekind Complete Value Space

The result in this section replaces chain completeness with countable Dedekind completeness and order continuity. To state the result we need two new definitions. We call $(V, \TT)$

- **order continuous** if each $T_\sigma \in \TT$ is order continuous on $V$.

Order continuity means that $T_\sigma \, v_n \uparrow T_\sigma \, v$ whenever $\sigma \in \Sigma$ and $v_n \uparrow v$. This technical condition holds in many of the applications we consider.
A general discussion of order continuous operators is provided in {ref}`ss-ordercon`. In addition, we call $(V, \TT)$ - **order bounded** if there exists a $u \in V$ with $T_\sigma \, u \preceq u$ for all $T_\sigma \in \TT$. ```{exercise} :label: ex-vub Let $(V, \TT)$ be order bounded, with $T_\sigma \, u \preceq u$ for all $\sigma \in \Sigma$. Prove the following: If $(V, \TT)$ is regular and order stable, then $v \preceq u$ for all $v \in V_U$. ``` ```{solution} ex-vub Assume the conditions and fix $v \in V_U$. Let $\sigma$ be $v$-greedy, so that $Tv = T_\sigma \, v$. Then $v \preceq T v = T_\sigma \, v$, so, by order stability, $v \preceq v_\sigma$. At the same time, $T_\sigma \, u \preceq u$ implies $v_\sigma \preceq u$ (again by order stability). Putting these inequalities together gives $v \preceq u$. ``` Now we can state the main result of this section. ```{prf:theorem} :label: t-dede Let $(V, \TT)$ be regular and order stable. If, in addition, $V$ is countably Dedekind complete and $(V, \TT)$ is both order bounded and order continuous, then 1. the fundamental optimality properties hold and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} In view of {prf:ref}`c-bk`, the fundamental optimality properties will hold when $T$ has a fixed point in $V$. To see that this is true, fix any $v \in V_U$ (which is nonempty by {prf:ref}`l-vsigvu`) and set $v_n \coloneq T^n v$. Since $(V, \TT)$ is order bounded, there is a $u \in V$ with $Tu \preceq u$ and $v \preceq u$ (see {prf:ref}`ex-vub`). Hence $v_n \preceq u$ for all $n$. Because $V$ is countably Dedekind complete, we deduce existence of a $\bar v \in V$ with $v_n \uparrow \bar v$. We claim that $T \bar v = \bar v$. Indeed, $v_{n+1} = T v_n \preceq T \bar v$ for all $n$, so, taking the supremum, $\bar v \preceq T \bar v$. 
For the reverse inequality we take $\sigma$ to be $\bar v$-greedy and use order continuity of $T_\sigma$ to obtain

$$
    T \bar v = T_\sigma \, \bar v = T_\sigma \, \bigvee_n v_n = \bigvee_n T_\sigma \, v_n \preceq \bigvee_n T \, v_n = \bigvee_n v_{n+1} = \bar v.
$$

Hence $T \bar v = \bar v$, so $T$ has a fixed point in $V$ and the fundamental optimality properties are proved. In view of these properties, the only fixed point of $T$ in $V$ is $\vmax$. Hence $T^n v = v_n \uparrow \bar v = \vmax$. This proves convergence of VFI. Convergence of OPI and HPI follows from {prf:ref}`c-cl2`. ◻
```

{numref}`tab-convergence_cases` summarizes the conditions and conclusions for the three cases treated above.

````{table}
:name: tab-convergence_cases

Conditions and conclusions for the three convergence cases

```{list-table}
:header-rows: 2
:enumerated: false

* -
  - Case I
  - Case II
  - Case III
* -
  - ({prf:ref}`t-bkf`)
  - ({prf:ref}`t-bkn`)
  - ({prf:ref}`t-dede`)
* - Regular
  - ✓
  - ✓
  - ✓
* - Order stable
  - ✓
  - ✓ (strongly)
  - ✓
* - Finite
  - ✓
  -
  -
* - Chain complete
  -
  - ✓
  -
* - Countably Dedekind complete
  -
  -
  - ✓
* - Order bounded
  -
  -
  - ✓
* - Order continuous
  -
  -
  - ✓
* - Optimality properties
  - ✓
  - ✓
  - ✓
* - VFI, OPI, HPI converge
  -
  - ✓
  - ✓
* - HPI finite convergence
  - ✓
  -
  -
```
````

(ss-minim)=
### Minimization

In some dynamic programs, the objective is to minimize lifetime cost rather than to maximize lifetime rewards. While we focus primarily on maximization in this book, the present section discusses how to handle minimization problems.

The content of this section can be summarized by the following statement: for a given ADP $(V, \TT)$, a minimization problem can be converted to a maximization problem by reversing the partial order on $V$. Further details are given below. Readers who prefer to focus on maximization results can safely skip ahead.

(sss-mindef)=
#### Definitions

Let $(V, \TT)$ be an ADP with policy set $\Sigma$.
We define the **Bellman min-operator** $\tmin$ corresponding to $(V, \TT)$ by $$ \tmin v = \bigwedge_\sigma T_\sigma \, v \quad \text{whenever the infimum exists.} $$ We say that $v \in V$ satisfies the **Bellman min-equation** if $\tmin v = v$. Paralleling the maximization terminology, we say that - $\sigma \in \Sigma$ is **$v$-min-greedy** if $T_{\sigma} \, v \preceq T_\tau \, v$ for all $\tau \in \Sigma$. Analogous to the max case, we have $$ \sigma \text{ is } v \text{-min-greedy} \quad \iff \quad T_\sigma \, v = \tmin v. $$ In addition, we say that $(V, \TT)$ is - **min-order bounded** if there exists a $b \in V$ with $b \preceq T_\sigma \, b$ for all $T_\sigma \in \TT$, and - **min-regular** if, for each $v \in V$, at least one $v$-min-greedy policy exists. We let $$ V^G_{\triangledown} = \setntn{v \in V}{\text{ at least one } v \text{-min-greedy policy exists}}. $$ Now suppose $(V, \TT)$ is well-posed and let $V_\Sigma$ be defined as before (i.e., the set of $\sigma$-value functions). In this setting we set $\vmin = \bigwedge_\sigma v_\sigma$ and call it the **min-value function** whenever the infimum exists. Also, we say that - $\sigma \in \Sigma$ is **min-optimal** for $(V, \TT)$ if $v_\sigma = \vmin$, and - $(V, \TT)$ obeys **Bellman's principle of min-optimality** if $$ \sigma \in \Sigma \text{ is min-optimal for } (V, \TT) \quad \iff \quad \sigma \text{ is } \vmin \text{-min-greedy}. $$ When $(V, \TT)$ is min-regular and well-posed, we define the **Howard policy min-operator** corresponding to $(V, \TT)$ via $$ \Hmin \colon V^G_{\triangledown} \to V_\Sigma, \qquad \Hmin v = v_\sigma \quad \text{ where } \sigma \text{ is } v \text{-min-greedy}, $$ as well as, for each $m \in \NN$, the **optimistic policy min-operator** via $$ \Wmin \colon V^G_{\triangledown} \to V, \qquad \Wmin v \coloneq T^m_\sigma v \quad \text{where} \quad \sigma \text{ is } v \text{-min-greedy}. $$ (eq-wmop) We say that the **fundamental min-optimality properties hold** if 1. 
at least one min-optimal policy exists,
2. $\vmin$ exists and is the unique solution to the Bellman min-equation in $V^G_\triangledown$, and
3. Bellman's principle of min-optimality holds.

This definition parallels (B1)--(B3).

Let $V_D$ be the set of all $v \in V^G_{\triangledown}$ with $\tmin v \preceq v$. We say that

- **min-VFI converges** if $\tmin^n v \downarrow \vmin$ for all $v \in V_D$,
- **min-OPI converges** if $\Wmin^n v \downarrow \vmin$ for all $v \in V_D$ and all $m \in \NN$, and
- **min-HPI converges** if $\Hmin^n v \downarrow \vmin$ for all $v \in V_D$.

For clarity, when discussing maximization and minimization in the same section, we add a "max-" prefix to the previously introduced definitions that pertain to maximization. For example,

- "$v$-greedy policies" will be referred to as **$v$-max-greedy policies**,
- "optimal policies" will be referred to as **max-optimal policies**,
- "the Bellman equation" will be referred to as the **Bellman max-equation**,

and so on.

(sss-dualadps)=
#### Dual ADPs

Let's now investigate how minimization problems can be converted to maximization problems in this abstract setting. The key idea is that taking infima in $(V, \preceq)$ corresponds to taking suprema in the order-dual $(V, \preceq^\partial)$.

Recall from {ref}`sss-orddual` that if $V \coloneq (V, \preceq)$ is a partially ordered set, then its order dual $V^\partial \coloneq (V, \preceq^\partial)$ is the set $V$ paired with the partial order $\preceq^\partial$ obtained by setting $u \preceq^\partial v$ if and only if $v \preceq u$.

If $(V, \TT)$ is any ADP, then we call $(V, {\TT})^\partial \coloneq (V^\partial, \TT)$ the **dual** of $(V, \TT)$. In other words, the dual $(V, {\TT})^\partial$ of $(V, \TT)$ is the ADP created by maintaining the same family of policy operators $\TT$ while replacing the poset $V$ with its order dual $V^\partial$.

```{exercise}
:label: ex-adps-auto-2

Show that $(V, {\TT})^\partial$ is an ADP.
```

```{solution} ex-adps-auto-2
Let $(V, {\TT})^\partial$ be the dual of $(V, \TT)$. Each $T_\sigma \in \TT$ is a self-map on the poset $V^\partial$. Moreover, for any $T_\sigma \in \TT$, we have

$$
    v \preceq^\partial w \implies w \preceq v \implies T_\sigma \, w \preceq T_\sigma \, v \implies T_\sigma \, v \preceq^\partial T_\sigma \, w.
$$

Hence $T_\sigma$ is order preserving on $V^\partial$.
```

Regarding notation for $(V, {\TT})^\partial$,

- the Bellman max-operator will be denoted by $T^\partial$,
- the Bellman min-operator will be denoted by $\tmin^\partial$,
- the max-value function will be denoted by $\vmaxd$,
- $V_G^\partial$ is the set of all $v \in V$ such that at least one $v$-max-greedy policy exists for $(V, \TT)^\partial$,

and similarly for the remaining objects.

Each ADP is self-dual, in the sense that $((V, {\TT})^\partial)^\partial = (V, \TT)$. This follows from the fact that $(V^\partial)^\partial = V$ for every partially ordered set $V$.

```{exercise}
:label: ex-mmp

Let $(V, \TT)$ be a well-posed ADP with dual $(V, {\TT})^\partial$. Fix $v \in V$ and verify the following:

1. $\sigma$ is $v$-min-greedy for $(V, \TT)$ if and only if $\sigma$ is $v$-max-greedy for $(V, {\TT})^\partial$.
2. $(V, \TT)$ is min-regular if and only if $(V, {\TT})^\partial$ is max-regular.
3. $(V, \TT)$ is min-order bounded if and only if $(V, {\TT})^\partial$ is max-order bounded.
4. If $\tmax^\partial v$ exists then so does $\tmin v$, and, moreover, $\tmin v = \tmax^\partial v$.
5. If $W^\partial v$ exists then so does $\Wmin v$, and, moreover, $\Wmin v = W^\partial v$.
6. If $H^\partial v$ exists then so does $\Hmin v$, and, moreover, $\Hmin v = H^\partial v$.
7. If $\vmaxd$ exists for $(V, {\TT})^\partial$, then $\vmin$ exists for $(V, \TT)$ and $\vmin = \vmaxd$.
8. $\sigma \in \Sigma$ is min-optimal for $(V, \TT)$ if and only if $\sigma$ is max-optimal for $(V, {\TT})^\partial$.
9. $V^G_{\triangledown} = V_G^\partial$.
```

```{solution} ex-mmp
Regarding (i), fix $v \in V$.
Policy $\sigma$ is $v$-min-greedy for $(V, \TT)$ if and only if $T_\sigma \, v \preceq T_\tau \, v$ for all $\tau \in \Sigma$, which is equivalent to $T_\tau \, v \preceq^\partial T_\sigma \, v$ for all $\tau \in \Sigma$. Hence $\sigma$ is $v$-min-greedy for $(V, \TT)$ if and only if $\sigma$ is $v$-max-greedy for $(V, {\TT})^\partial$. Claim (ii) follows from (i). Claim (iii) holds because $b \preceq T_\sigma \, b$ for all $\sigma$ is equivalent to $T_\sigma \, b \preceq^\partial b$ for all $\sigma$. Claim (iv) is immediate from {prf:ref}`ex-dualsi`. The proofs of the remaining claims are also straightforward and details are left to the reader.
```

Self-duality implies corollaries to {prf:ref}`ex-mmp` that we treat as self-evident. For example, $\sigma \in \Sigma$ is min-optimal for $(V, {\TT})^\partial$ if and only if $\sigma$ is max-optimal for $(V, \TT)$.

Part (viii) of {prf:ref}`ex-mmp` tells us that we can solve an ADP for a min-optimal policy by switching to the dual ADP and maximizing.

```{exercise}
:label: ex-bmbm

Show that Bellman's principle of min-optimality holds for $(V, \TT)$ if and only if Bellman's principle of max-optimality holds for $(V, {\TT})^\partial$.
```

```{solution} ex-bmbm
Bellman's principle of min-optimality for $(V, \TT)$ states that

$$
    \setntn{\sigma \in \Sigma}{\sigma \text{ is min-optimal for } (V, \TT)} = \setntn{\sigma \in \Sigma}{\sigma \text{ is } \vmin \text{-min-greedy}}.
$$

Bellman's principle of max-optimality for $(V, {\TT})^\partial$ states that

$$
    \setntn{\sigma \in \Sigma}{\sigma \text{ is max-optimal for } (V, {\TT})^\partial} = \setntn{\sigma \in \Sigma}{\sigma \text{ is } \vmaxd \text{-max-greedy for } (V, {\TT})^\partial}.
$$

We show that the former principle implies the latter using the facts established in {prf:ref}`ex-mmp`. To this end, observe that the statement that $\sigma$ is max-optimal for $(V, \TT)^\partial$ is equivalent to the statement that $\sigma$ is min-optimal for $(V, \TT)$.
Since Bellman's principle of min-optimality holds for $(V, \TT)$, this is equivalent to the statement that $\sigma$ is $\vmin$-min-greedy for $(V, \TT)$, which is equivalent to the statement that $\sigma$ is $\vmin$-max-greedy for $(V, {\TT})^\partial$, which is in turn equivalent to the statement that $\sigma$ is $\vmaxd$-max-greedy for $(V, {\TT})^\partial$. This argument confirms that the former principle implies the latter. The proof of the converse implication is similar and omitted. ``` (optimality-and-convergence)= #### Optimality and Convergence We can now easily translate max-optimality results to min-optimality results and vice versa. Our key tool is the next exercise. ```{exercise} :label: t-fbk_min Let $(V, \TT)$ be a well-posed ADP with dual $(V, {\TT})^\partial$. Show that the fundamental max-optimality properties hold for $(V, {\TT})^\partial$ if and only if the fundamental min-optimality properties hold for $(V, \TT)$. Moreover, show that 1. max-VFI converges for $(V, {\TT})^\partial$ if and only if min-VFI converges for $(V, \TT)$, 2. max-OPI converges for $(V, {\TT})^\partial$ if and only if min-OPI converges for $(V, \TT)$, and 3. max-HPI converges for $(V, {\TT})^\partial$ if and only if min-HPI converges for $(V, \TT)$. ``` ```{solution} t-fbk_min We prove each equivalence using the correspondences established in {prf:ref}`ex-mmp`. Regarding the fundamental optimality properties, suppose the fundamental max-optimality properties hold for $(V, {\TT})^\partial$. Then (B1) holds for $(V, {\TT})^\partial$, so at least one max-optimal policy exists for $(V, {\TT})^\partial$. By {prf:ref}`ex-mmp`(viii), this policy is min-optimal for $(V, \TT)$, giving (B1'). For (B2'), note that $\vmaxd$ is the unique solution to the Bellman max-equation in $V_G^\partial$ for $(V, {\TT})^\partial$. 
By {prf:ref}`ex-mmp`(iv), $\tmin v = \tmax^\partial v$ for all relevant $v$, and by {prf:ref}`ex-mmp`(ix), $V^G_\triangledown = V_G^\partial$, so the Bellman max-equation for the dual in $V_G^\partial$ is precisely the Bellman min-equation for $(V, \TT)$ in $V^G_\triangledown$. Moreover, by {prf:ref}`ex-mmp`(vii), $\vmin = \vmaxd$, so $\vmin$ is the unique solution in $V^G_\triangledown$, giving (B2'). For (B3'), Bellman's principle of max-optimality for the dual states that a policy is max-optimal for $(V, {\TT})^\partial$ if and only if it is $\vmaxd$-max-greedy for $(V, {\TT})^\partial$. By {prf:ref}`ex-mmp`(viii), max-optimality for the dual is equivalent to min-optimality for $(V, \TT)$. By {prf:ref}`ex-mmp`(i), $\vmaxd$-max-greediness for the dual is equivalent to $\vmaxd$-min-greediness for $(V, \TT)$, which by {prf:ref}`ex-mmp`(vii) is $\vmin$-min-greediness. This gives (B3'). The converse follows by the same argument applied to $((V, {\TT})^\partial)^\partial = (V, \TT)$. For (i), the set $V_U^\partial$ of all $v \in V_G^\partial$ with $v \preceq^\partial T^\partial v$ equals $V_D$ by {prf:ref}`ex-mmp`(iv) and (ix), since $v \preceq^\partial T^\partial v$ means $\tmax^\partial v \preceq v$, i.e., $\tmin v \preceq v$. Moreover, $(\tmax^\partial)^n v \to \vmaxd$ in the order on $V^\partial$ means $\tmin^n v \downarrow \vmin$ in $V$. Claims (ii) and (iii) follow similarly using {prf:ref}`ex-mmp`(v) and (vi). ``` Below we use {prf:ref}`t-fbk_min` to prove theorems providing sufficient conditions for minimization results. As one example of how {prf:ref}`t-fbk_min` can be applied, we construct a min-version of {prf:ref}`t-bk`. In the statement, $\tmin$ is the Bellman min-operator. ```{prf:theorem} :label: t-minbk Let $(V, \TT)$ be a well-posed ADP such that $$ v \preceq T_\sigma \, v \implies v \preceq v_\sigma \qquad \text{for all } \sigma \in \Sigma \text{ and } v \in V. $$ In this setting, the following are equivalent: 1. 
$\tmin$ has a fixed point in $V^G_\triangledown$.
2. The fundamental min-optimality properties hold.
```

```{prf:proof}
Let $(V, \TT)$ be as stated and consider the dual $(V, {\TT})^\partial$. By {prf:ref}`ex-mmp`(iv), $\tmin v = \tmax^\partial v$ for all $v$ where either side is defined. By {prf:ref}`ex-mmp`(ix), $V^G_\triangledown = V_G^\partial$. Hence claim (i) is equivalent to "$T^\partial$ has a fixed point in $V_G^\partial$". Moreover, $T_\sigma \, v \preceq^\partial v$ means $v \preceq T_\sigma \, v$, which by hypothesis gives $v \preceq v_\sigma$, i.e., $v_\sigma \preceq^\partial v$. So the hypothesis of {prf:ref}`t-bk` holds for the dual, and {prf:ref}`t-bk` states that "$T^\partial$ has a fixed point in $V_G^\partial$" is equivalent to the fundamental max-optimality properties for $(V, {\TT})^\partial$. By {prf:ref}`t-fbk_min`, that is equivalent to the fundamental min-optimality properties for $(V, \TT)$. ◻
```

```{prf:corollary}
:label: c-minbk

If $(V, \TT)$ is order stable and $\tmin$ has a fixed point in $V^G_\triangledown$, then the fundamental min-optimality properties hold.
```

```{prf:proof}
Let $(V, \TT)$ be order stable and suppose that $\tmin$ has a fixed point in $V^G_\triangledown$. Order stability implies well-posedness and property (a) (i.e., $v \preceq T_\sigma \, v \implies v \preceq v_\sigma$). Hence the hypotheses of {prf:ref}`t-minbk` are satisfied. Since statement (i) of that theorem holds by assumption, statement (ii) holds as well, which gives the fundamental min-optimality properties. ◻
```

(s-adpapps)=
## Applications

In this section we show how several important models fit into the ADP framework. Applications include firm valuation, optimal savings, finite Markov decision processes and linear-quadratic control.

(sss-firmas)=
### Firm Valuation

Recall the firm problem from {ref}`s-fpintro`, where, repeating {eq}`eq-fintrots`, the policy operators took the form

$$
    (T_\sigma \, v)(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] \qquad (x \in \Xsf).
$$ (eq-fintrots2)

The policy set $\Sigma$ was defined as the set of all Borel measurable functions mapping $\Xsf$ to $\{0,1\}$. Let $\TT_{\rm FV} \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ and let $\leq$ be the pointwise partial order on $b\Xsf$.

```{exercise}
:label: ex-vfpsm

Prove that each $T_\sigma$ is an order preserving self-map on $(b\Xsf, \leq)$.
```

```{solution} ex-vfpsm
To check the self-map property, we first fix $v \in b\Xsf$. Since policies are Borel measurable, the map $r_\sigma$ is Borel measurable. Since, in addition, $P_\sigma$ is a self-map on $b\Xsf$ ({prf:ref}`l-mopfp`), the function $T_\sigma \, v$ is Borel measurable. Since $r_\sigma$ is bounded and $b\Xsf$ is closed under linear operations, we see that $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$ is in $b\Xsf$. The order preserving property follows from the fact that each $P_\sigma$ is order preserving (see {prf:ref}`l-mopfp`).
```

The monotonicity in {prf:ref}`ex-vfpsm` implies that the pair $(b\Xsf, \TT_{\rm FV})$ is an ADP.

We saw in {ref}`sss-fintroc` that each $T_\sigma$ is a contraction map on $b\Xsf$. Hence $(b\Xsf, \TT_{\rm FV})$ is well-posed. As we discussed in {ref}`sss-fintroc`, the unique fixed point $v_\sigma$ of $T_\sigma$ assigns lifetime values (expected present value of the firm) to states (initial conditions for the underlying Markov chain) under $\sigma$.

In {ref}`s-fpintro` we introduced the Bellman operator $T$ (see {eq}`eq-fintroio`) and the notion of greedy policies (see {eq}`eq-fing`). Let's make sure that these agree with the new ADP definitions from this chapter.

To begin, fix $v \in b\Xsf$. On one hand, in {ref}`sss-fintroc`, we called $\sigma \in \Sigma$ **$v$-greedy** whenever

$$
    \sigma(x) \in \argmax_{a \in \{0,1\}} \left\{ a s + (1 - a) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] \right\} \quad \text{for all } x \in \Xsf.
$$ (eq-fing2) On the other hand, in our ADP setting, we called $\sigma$ $v$-greedy when $\sigma \in \Sigma$ and $T_\tau \, v \preceq T_\sigma \, v$ for all $\tau \in \Sigma$. Here this translates to $$ \tau(x) s + (1 - \tau(x)) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] \leq \\ \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] $$ for all $\tau \in \Sigma$ and all $x \in \Xsf$, which is equivalent to stating that $\sigma$ obeys {eq}`eq-fing2`. Hence, for the firm ADP $(b\Xsf, \TT_{\rm FV})$, the two definitions agree. A $v$-greedy policy can always be chosen. Indeed, if we set $$ \sigma(x) = \1\left\{s \geq \pi(x) + \beta \int v(x') P(x, \diff x')\right\} \qquad (x \in \Xsf), $$ then $\sigma$ is Borel measurable, since $\pi$ and $x \mapsto \int v(x') P(x, \diff x')$ are both Borel measurable, and $\sigma$ obeys {eq}`eq-fing2`. Given that we can always choose a $\sigma \in \Sigma$ obeying {eq}`eq-fing2`, the firm ADP $(b\Xsf, \TT_{\rm FV})$ is regular. The ADP definition of the Bellman operator presented in {ref}`sss-adpbell` is $Tv = \bigvee_\sigma T_\sigma \, v$. For our firm ADP $(b\Xsf, \TT_{\rm FV})$, this agrees with the original definition we gave in {ref}`sss-fintrobe`. Indeed, given that the ADP is regular, we can state that $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$-greedy ({prf:ref}`l-torper`). We have just shown that any such $\sigma$ obeys {eq}`eq-fing2`. Taking such a $\sigma$ and applying $Tv = T_\sigma \, v$ yields $$ (T v)(x) = \max \left\{ s ,\; \pi(x) + \beta \int v(x') P(x, \diff x') \right\} $$ for all $x \in \Xsf$. This ADP construction of $T$ agrees with the original definition we presented in {eq}`eq-fintroio`. ```{exercise} :label: ex-firmoc Show that the firm ADP $(b\Xsf, \TT_{\rm FV})$ is order continuous. ``` ```{solution} ex-firmoc Fix $\sigma \in \Sigma$ and let $(v_n)$ be a sequence in $b\Xsf$ with $v_n \uparrow v$. 
By {prf:ref}`l-pcid`, $v_n(x') \uparrow v(x')$ in $\RR$ for all $x' \in \Xsf$. For any $x \in \Xsf$, we have $$ (T_\sigma \, v_n)(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v_n(x') P(x, \diff x') \right]. $$ By the monotone convergence theorem, $\int v_n(x') P(x, \diff x') \uparrow \int v(x') P(x, \diff x')$. Hence $(T_\sigma \, v_n)(x) \uparrow (T_\sigma \, v)(x)$ for all $x$. Applying {prf:ref}`l-pcid` again gives $T_\sigma \, v_n \uparrow T_\sigma \, v$, confirming order continuity. ``` ```{exercise} :label: ex-firmob Show that the firm ADP $(b\Xsf, \TT_{\rm FV})$ is order bounded. ``` ```{solution} ex-firmob Let $M \coloneq (s + \|\pi\|) / (1-\beta)$ and set $u \equiv M$. For any $\sigma \in \Sigma$ and $x \in \Xsf$, if $\sigma(x) = 1$ then $(T_\sigma \, u)(x) = s \leq M$. If $\sigma(x) = 0$ then $(T_\sigma \, u)(x) = \pi(x) + \beta M \leq \|\pi\| + \beta M \leq M$. Hence $T_\sigma \, u \leq u$ for all $\sigma$. ``` It is possible to prove optimality results for the firm problem here, verifying and extending {prf:ref}`t-fintroop`. For example, a proof can be constructed using {prf:ref}`t-dede`. For now we'll refrain from doing so. The reason is that we build additional ADP theory below, leveraging the framework set out in this chapter. Tackling specific applications will then become much easier. (sss-egosadp)= ### Optimal Savings Consider the optimal savings model from {ref}`s-og`, with {prf:ref}`a-uf` in force. We can represent this model as an ADP by taking $V \coloneq b\RR_+$ as the value space, paired with the pointwise order $\leq$, letting $\Sigma$ be the set of (Borel measurable) feasible policies, as defined in {ref}`ss-pfslv`, and setting $\TT_{\rm OS} \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$, where each policy operator $T_\sigma$ is as given in {eq}`eq-ogpolop`. 
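
For orientation, the inequality {eq}`eq-ospg` below compares $T_\tau \, v$ and $T_\sigma \, v$ pointwise, which indicates that the policy operators have the following form. This display is a restatement inferred from {eq}`eq-ospg`, offered as a reading aid; the authoritative definition remains {eq}`eq-ogpolop`.

$$
    (T_\sigma \, v)(w)
    = u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y)
    \qquad (w \in \RR_+).
$$

Here $\sigma(w) \in [0, w]$ is the choice made at wealth $w$, matching the constraint $0 \leq c \leq w$ that appears in {eq}`eq-osgr`.
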
It is straightforward to verify that each $T_\sigma \in \TT_{\rm OS}$ is order preserving under $\leq$, and {prf:ref}`ex-tsvog` confirms that $T_\sigma$ maps $V$ to itself. Hence $(V, \TT_{\rm OS})$ is an ADP. By {prf:ref}`l-ogtsup`, each policy operator $T_\sigma$ has a unique fixed point (i.e., $\sigma$-value function) $v_\sigma$. Consistent with the discussion in {ref}`sss-adpdef`, the real number $v_\sigma(w)$ represents the lifetime value of policy $\sigma$, conditional on initial wealth state $W_0 = w$. In {eq}`eq-osbellarg` we defined the concept of a $v$-greedy policy for the optimal savings model. Earlier, in {eq}`eq-adpgr`, we introduced the notion of a $v$-greedy policy for an arbitrary ADP. The second definition is a generalization of the first. Indeed, if $\sigma$ obeys the optimal savings greedy condition {eq}`eq-osbellarg` and $\tau$ is any other feasible policy, then $$ u(\tau(w)) + \beta \int v(R(w - \tau(w)) + y) \phi(\diff y) \leq \\ u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y) \qquad \text{for all } w \in \RR_+. $$ (eq-ospg) This is equivalent to the statement $T_\tau \, v \leq T_\sigma \, v$ for all $\tau \in \Sigma$, which is, in the present setting, the ADP definition of $v$-greedy (given that the partial order is $\leq$). Conversely, if $\sigma \in \Sigma$ obeys $T_\tau \, v \leq T_\sigma \, v$ for all $\tau \in \Sigma$, then it obeys {eq}`eq-ospg`. By appealing to {prf:ref}`l-conofos`, we can strengthen this to $$ \sigma(w) \in \argmax_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\} \quad \text{for all } w \in \RR_+. $$ (eq-osgr) In other words, $\sigma$ is $v$-greedy in the sense of {ref}`s-og`. In addition, the ADP Bellman operator for $(V, \TT_{\rm OS})$, as defined in {eq}`eq-adp_bellop`, is a generalization of the optimal savings Bellman operator given in {eq}`eq-osbell_op_0`. To see this, let $T = \bigvee_\sigma T_\sigma$ be the ADP Bellman operator and fix $v \in V$. 
By (ii) of {prf:ref}`l-torper`, we have $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$-greedy. Letting $\sigma$ be a policy that obeys {eq}`eq-osgr` (such a policy exists by {prf:ref}`l-conofos`), fixing $w \in \RR_+$ and combining these facts, we get

$$
    (Tv)(w) = (T_\sigma v)(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\}.
$$

This confirms the claim that, in the setting of $(V, \TT_{\rm OS})$, the ADP Bellman operator reduces to the optimal savings Bellman operator in {eq}`eq-osbell_op_0`.

In view of {prf:ref}`l-conofos`, a $v$-greedy policy exists for every $v \in V$. Hence $(V, \TT_{\rm OS})$ is regular.

In {prf:ref}`l-ogtsup` we showed that each $T_\sigma \in \TT_{\rm OS}$ has a unique fixed point, so $(V, \TT_{\rm OS})$ is well-posed. The same lemma also showed that each policy operator in $\TT_{\rm OS}$ is globally stable. {prf:ref}`l-pspace` now implies that $(V, \TT_{\rm OS})$ is order stable.

```{exercise}
:label: ex-osvu

Assuming $u \geq 0$, show that $u \in V_U$ (i.e., $u \leq Tu$).
```

```{solution} ex-osvu
For any $w \in \RR_+$, taking $c = w$ in the maximization gives

$$
    (T u)(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int u(R(w - c) + y) \phi(\diff y) \right\} \geq u(w) + \beta \int u(y) \phi(\diff y) \geq u(w),
$$

where the last inequality uses $u \geq 0$. Hence $u \in V_U$ as claimed.
```

```{exercise}
:label: ex-adps-auto-3

Show that $(V, \TT_{\rm OS})$ is order bounded and order continuous.
```

```{solution} ex-adps-auto-3
Fix $T_\sigma \in \TT_{\rm OS}$, which we can write as $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$ (see {eq}`eq-tsiggrow`). Recalling that the utility function is bounded (by {prf:ref}`a-uf`), the constant $M \coloneq \sup u$ is finite. With $\bar v \coloneq M/(1-\beta)$, we have $T_\sigma \, \bar v \leq M + \beta P_\sigma \, \bar v = M + \beta M / (1-\beta) = \bar v$. Hence $(V, \TT_{\rm OS})$ is order bounded. Order continuity follows from {prf:ref}`ex-iooc`.
```

(ss-fsms)=
### MDPs as ADPs

In {ref}`s-mdps` we introduced a finite MDP $(\Gamma, r, \beta, P)$ with finite state space $\Xsf$ and finite action space $\Asf$. Now we show that this finite MDP can be framed as an ADP $(\RR^\Xsf, \TT_{\rm MDP})$ and then discuss its properties.

(sss-masa)=
#### ADP Representations for MDPs

(sss-mdpprop)=
Let $(\Gamma, r, \beta, P)$ be as above. We recall that the policy operators take the form

$$
    (T_\sigma \, v)(x) = r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x') \qquad (v \in \RR^\Xsf, \; x \in \Xsf).
$$ (eq-tsig_mdp)

We set $\TT_{\rm MDP} \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$, where $\Sigma$ is the set of feasible policies, and pair $\RR^\Xsf$ with the pointwise partial order $\leq$. Since each $T_\sigma \in \TT_{\rm MDP}$ is order preserving on $(\RR^\Xsf, \leq)$, the pair $(\RR^\Xsf, \TT_{\rm MDP})$ is an ADP.

We proved in {prf:ref}`ex-tsfmdp` that each $T_\sigma \in \TT_{\rm MDP}$ has a unique fixed point in $\RR^\Xsf$ given by $(I-\beta P_\sigma)^{-1} r_\sigma$. Hence $(\RR^\Xsf, \TT_{\rm MDP})$ is well-posed.

The ADP $(\RR^\Xsf, \TT_{\rm MDP})$ is also order continuous. Indeed, in $(\RR^\Xsf, \leq)$, the statement $v_n \uparrow v$ is equivalent to $v_n(x) \uparrow v(x)$ in $\RR$ for all $x \in \Xsf$ ({prf:ref}`l-pcid`). As a result, every continuous order preserving map is order continuous. In particular, this applies to each $T_\sigma$, which is affine and hence continuous.

```{exercise}
:label: ex-mdpob

Show that $(\RR^\Xsf, \TT_{\rm MDP})$ is order bounded.
```

```{solution} ex-mdpob
Let $\bar r = \max_{(x,a) \in \Gsf} | r(x, a)|$, set $M = \bar r / (1-\beta)$, and let $u \equiv M$ be the constant function on $\Xsf$. For any $\sigma \in \Sigma$ and $x \in \Xsf$,

$$
    (T_\sigma \, u)(x) = r(x, \sigma(x)) + \beta \sum_{x'} M \cdot P(x, \sigma(x), x') \leq \bar r + \beta M = M = u(x).
$$

Hence $T_\sigma \, u \leq u$ for all $\sigma \in \Sigma$, so $(\RR^\Xsf, \TT_{\rm MDP})$ is order bounded.
```

The ADP $(\RR^\Xsf, \TT_{\rm MDP})$ is also order stable.
Indeed, each $T_\sigma$ has the form $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$, where $P_\sigma$ is a stochastic matrix. Since $\beta P_\sigma \geq 0$ and $\rho(\beta P_\sigma) = \beta < 1$, {prf:ref}`ex-affsop` implies that each $T_\sigma$ is order stable. In {ref}`sss-coreop` we introduced the concept of $v$-greedy policies for the finite MDP, as well as the Bellman operator and the Bellman equation. Let's make sure that these are in fact special cases of our ADP definitions from {ref}`ss-defs`. ```{exercise} :label: ex-mdpgreedy Fix $v \in \RR^\Xsf$. Prove the following: If $\sigma \in \Sigma$ obeys $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \quad \text{for all } x \in \Xsf, $$ (eq-mdpmvg) then $\sigma$ is $v$-greedy in the ADP sense; that is, that $T_\sigma \, v \geq T_\tau \, v$ for all $\tau \in \Sigma$. ``` ```{exercise} :label: ex-mdp_bell Recall that the ADP Bellman operator is defined by the expression $T v = \bigvee_\sigma T_\sigma \, v$. Show that, in the present setting, this can also be written as $$ (T \, v)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \qquad\qquad (x \in \Xsf). $$ (eq-mdpt0) ``` ```{solution} ex-mdp_bell Since $\RR^\Xsf$ is endowed with the pointwise partial order, for given $v \in \RR^\Xsf$ and $x \in \Xsf$, the ADP Bellman operator $Tv = \bigvee_\sigma T_\sigma \, v$ reduces to $$ (T \, v)(x) = \sup_{\sigma \in \Sigma} (T_\sigma \, v)(x) = \sup_{\sigma \in \Sigma} \left\{ r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x') \right\}. $$ By the definition of $\Sigma$, we can also write this as $$ (T \, v)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\}, $$ (eq-mdpt) which is identical to {eq}`eq-mdpt0`. 
``` From {eq}`eq-mdpt0` it follows that the ADP Bellman equation for $(\RR^\Xsf, \TT_{\rm MDP})$ is given by $$ v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \right\} \qquad\qquad (x \in \Xsf). $$ (eq-mdp_bell) This aligns with the traditional expression for the Bellman equation that we set out in {eq}`eq-mdp_bell0`. ```{exercise} :label: ex-mdpsreg Show that the ADP $(\RR^\Xsf, \TT_{\rm MDP})$ is regular. ``` ```{solution} ex-mdpsreg This follows from finiteness of $\Asf$, nonemptiness of $\Gamma$, and the characterization of greedy policies given in {prf:ref}`ex-mdpgreedy`. ``` (sss-mdpopt)= #### Optimality for Finite MDP Models We made a number of optimality claims for the finite MDP model in {ref}`sss-coreop`. Later we'll build on the ADP optimality results in ways that make verifying these claims straightforward. Nevertheless, for the sake of the exercise, we now prove the claims from {ref}`sss-coreop` using {prf:ref}`t-dede`, and establish some additional results along the way. Readers who prefer to move on can skip this section without loss of continuity. We begin with the following claim. ```{prf:proposition} :label: p-mdpo For the ADP $(\RR^\Xsf, \TT_{\rm MDP})$, 1. the fundamental optimality properties hold, 2. VFI, OPI and HPI all converge, and 3. HPI converges in finitely many steps. ``` ```{prf:proof} We saw in {ref}`sss-mdpprop` that $(\RR^\Xsf, \TT_{\rm MDP})$ is regular and order stable, so claims (i) and (iii) follow from {prf:ref}`t-bkf`. For claim (ii), we can use the fact that $\RR^\Xsf$ is Dedekind complete ({prf:ref}`eg-rxcc`), and that $(\RR^\Xsf, \TT_{\rm MDP})$ is order bounded and order continuous, and appeal to {prf:ref}`t-dede`. ◻ ``` ```{prf:remark} :label: r-mdpo {prf:ref}`p-mdpo` can be proved more easily using contraction-based arguments. We cover this approach below, in {ref}`sss-mdpsbc`. ``` The statement in {prf:ref}`p-mdpo` is easily translated into more standard MDP terminology.
For example, it tells us that the value function $\vmax$ solves the Bellman equation, which we know is given by {eq}`eq-mdp_bell`, and, by the characterization of greedy policies in {eq}`eq-mdpmvg` plus Bellman's principle of optimality, that a policy $\sigma$ is optimal if and only if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{x'} \vmax(x') P(x, a, x') \right\} \quad \text{for all } x \in \Xsf. $$ Later we will show that ADP theory can also handle many extensions to the basic MDP framework. The next exercise asks you to work through an alternative to the proof of {prf:ref}`p-mdpo`. ```{exercise} :label: ex-adps-auto-4 Let $\hat V = [-M, M]$, where $M$ is as defined in {prf:ref}`ex-vha`. We showed in {prf:ref}`ex-vha` that each $T_\sigma$ is a self-map on $\hat V$. Using {prf:ref}`t-bkn`, show that the ADP $(\hat V, \TT_{\rm MDP})$ obeys the fundamental optimality properties and VFI, OPI, and HPI all converge. ``` ```{solution} ex-adps-auto-4 We saw that $(\RR^\Xsf, \TT_{\rm MDP})$ is regular and hence $(\hat V, \TT_{\rm MDP})$ is regular. Moreover, since every $T_\sigma \in \TT_{\rm MDP}$ is contracting on $\RR^\Xsf$ and hence $\hat V$, the ADP $(\hat V, \TT_{\rm MDP})$ is strongly order stable ({prf:ref}`l-pspace`). Since $\hat V$ is chain complete ({prf:ref}`eg-rxcc2`), {prf:ref}`t-bkn` implies that the fundamental optimality properties hold and VFI, OPI, and HPI all converge. ``` (ss-ddp)= ### Distributional Dynamic Programming In {ref}`sss-drl` we mentioned distributional dynamic programming, which seeks to track not just the expected discounted return under a policy, but also the full probability distribution of the random discounted return $\sum_{t \geq 0} \beta^t r(X_t, \sigma(X_t))$. This richer object captures variance, quantiles, tail risk, and other features that matter when agents are not risk-neutral. Here we review some of the basic concepts and show how these ideas can be embedded into the ADP framework. 
Further reading can be found in {ref}`s-cn_adps`. #### The Distributional ADP Let $\Xsf$ be a metric space of states, let $\Asf$ be a metric space of actions, let $r \colon \Xsf \times \Asf \to \RR$ be a bounded measurable reward function, let $P$ be a stochastic kernel on $\Xsf$ given $\Xsf \times \Asf$ (see {ref}`ss-markop`), and let $\beta \in (0,1)$. A policy is a measurable map $\sigma \colon \Xsf \to \Asf$ and $\Sigma$ denotes the set of all feasible policies. Let $\dD_1(\RR)$ denote the set of Borel probability measures on $\RR$ with finite first moment. A **distributional value function** is a stochastic kernel $\eta$ from $\Xsf$ to $\RR$ (see {ref}`ss-markop`) with $\eta(x, \cdot) \in \dD_1(\RR)$ for each $x \in \Xsf$; it assigns a return distribution $\eta(x, \cdot)$ to each state $x$. We let $\hH$ denote the set of all distributional value functions $\eta$ satisfying the uniform bound $$ \sup_{x \in \Xsf} \int |z| \, \eta(x, \diff z) < \infty. $$ This condition ensures that the metric introduced in {ref}`sss-wpdist` below is well-defined. Given $\eta \in \hH$, we write $\eta(x, h) \coloneq \int h(z) \, \eta(x, \diff z)$ for bounded measurable $h \colon \RR \to \RR$. The space $\hH$ is equipped with the **pointwise stochastic dominance order**, which we denote by $\trianglelefteq$ and define by $\eta \trianglelefteq \eta'$ if $\eta(x, \cdot) \lefsd \eta'(x, \cdot)$ for every $x \in \Xsf$ (see {ref}`ss-sd`). We now define the distributional analogue of the scalar policy operator $T_\sigma$. In the scalar case, $$ (T_\sigma \, v)(x) = r_\sigma(x) + \beta \int v(x') \, P_\sigma(x, \diff x'), $$ where $r_\sigma(x) \coloneq r(x, \sigma(x))$ and $P_\sigma(x, \cdot) \coloneq P(x, \sigma(x), \cdot)$. The distributional version replaces the scalar continuation value $v(x')$ with a random draw from the distribution $\eta(x', \cdot)$, and replaces addition and scalar multiplication with the corresponding operations on distributions. 
As a first step, given $\eta \in \hH$, we define the **continuation kernel** $P_\sigma \otimes \eta$ from $\Xsf$ to $\RR$ by $$ (P_\sigma \otimes \eta)(x, B) \coloneq \int \eta(x', B) \, P_\sigma(x, \diff x') \qquad (B \in \bB(\RR)). $$ (eq-contker) Here $(P_\sigma \otimes \eta)(x, \cdot)$ is a distributional analog of the continuation value: the distribution of the random value obtained by drawing $x' \sim P_\sigma(x, \cdot)$ and then drawing from $\eta(x', \cdot)$. Given a policy $\sigma \in \Sigma$, the **distributional policy operator** $D_\sigma \colon \hH \to \hH$ is defined by $$ (D_\sigma \, \eta)(x, h) \coloneq \int h(r_\sigma(x) + \beta v) \, (P_\sigma \otimes \eta)(x, \diff v) $$ (eq-ddpop) for bounded measurable $h \colon \RR \to \RR$. Here $h$ should be understood as a test function that we use to characterize the distribution $(D_\sigma \, \eta)(x, \cdot)$, which is the law of $r_\sigma(x) + \beta V$ when $V \sim (P_\sigma \otimes \eta)(x, \cdot)$: today's reward plus a discounted draw from the continuation distribution. This mirrors the scalar recursion. Expanding the continuation kernel, {eq}`eq-ddpop` can be equivalently written as $$ (D_\sigma \, \eta)(x, h) = \int \left[ \int h(r_\sigma(x) + \beta v) \, \eta(x', \diff v) \right] P_\sigma(x, \diff x'). $$ (eq-ddpopex) ```{prf:proposition} :label: p-ddpop Each $D_\sigma$ is an order preserving self-map on $(\hH, \trianglelefteq)$. ``` ```{prf:proof} Fix $\sigma \in \Sigma$ and $\eta \in \hH$. *Self-map.* For each $x$, the measure $(D_\sigma \, \eta)(x, \cdot)$ is the pushforward of the probability measure $(P_\sigma \otimes \eta)(x, \cdot)$ under the map $v \mapsto r_\sigma(x) + \beta v$, and is therefore a probability measure. Since $P_\sigma \otimes \eta$ is a stochastic kernel from $\Xsf$ to $\RR$ (as a composition of stochastic kernels) and the map $(x, v) \mapsto r_\sigma(x) + \beta v$ is jointly measurable, it follows that $D_\sigma \, \eta$ is again a stochastic kernel from $\Xsf$ to $\RR$. 
Finally, letting $M \coloneq \sup_{x,a} |r(x,a)|$ and $C \coloneq \sup_x \int |z| \, \eta(x, \diff z) < \infty$, we have $$ \sup_x \int |z| \, (D_\sigma \, \eta)(x, \diff z) \leq M + \beta C < \infty, $$ so $D_\sigma \, \eta \in \hH$. *Order preservation.* Fix $\eta \trianglelefteq \eta'$ in $\hH$, $x \in \Xsf$, and let $h \colon \RR \to \RR$ be bounded and increasing. Since $\beta > 0$, the map $v \mapsto h(r_\sigma(x) + \beta v)$ is also increasing. Because $\eta(x', \cdot) \lefsd \eta'(x', \cdot)$ for every $x'$, we have $$ \int h(r_\sigma(x) + \beta z) \, \eta(x', \diff z) \leq \int h(r_\sigma(x) + \beta z) \, \eta'(x', \diff z) $$ for every $x'$. Integrating against the nonnegative measure $P_\sigma(x, \diff x')$ and applying {eq}`eq-ddpopex` gives $(D_\sigma \, \eta)(x, h) \leq (D_\sigma \, \eta')(x, h)$. As $x$ and $h$ are arbitrary, $D_\sigma \eta \trianglelefteq D_\sigma \eta'$. ◻ ``` Since each $D_\sigma$ is an order preserving self-map on $\hH$, the pair $(\hH, \TT_{\rm DDP})$ is an ADP, where $\TT_{\rm DDP} \coloneq \setntn{D_\sigma}{\sigma \in \Sigma}$. The value space is $\hH$ (distributional value functions), the partial order is pointwise stochastic dominance, and the policy operators are $\{D_\sigma\}$. (sss-wpdist)= #### Well-Posedness As in the scalar MDP case, well-posedness follows from a contraction argument. To state it, we need a metric on distributions. The **Wasserstein-$1$ distance** between $\mu, \nu \in \dD_1(\RR)$ is $$ W_1(\mu, \nu) \coloneq \sup_{\|h\|_{\rm Lip} \leq 1} \left[ \int h \, \diff\mu - \int h \, \diff\nu \right], $$ where the supremum is over all $1$-Lipschitz functions $h \colon \RR \to \RR$. We equip $\hH$ with the **supremum Wasserstein metric** $$ \bar d_1(\eta, \eta') \coloneq \sup_{x \in \Xsf} W_1(\eta(x, \cdot), \eta'(x, \cdot)). $$ Under this metric, $\hH$ is a complete metric space. 
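To make these objects concrete, here is a minimal sketch of $D_\sigma$ and $\bar d_1$ in a finite setting. Everything specific to the sketch is an assumption made for illustration and is not taken from the text: the state space is finite, each $\eta(x, \cdot)$ is stored as a finite list of atoms with weights, and the names `apply_D`, `w1` and `d_bar` are hypothetical. The function `w1` relies on the standard identity $W_1(\mu, \nu) = \int |F_\mu(z) - F_\nu(z)| \diff z$ for distributions on the real line. The script checks numerically the contraction bound that appears in {prf:ref}`ex-ddpcon`.

```python
import numpy as np

def w1(atoms_a, w_a, atoms_b, w_b):
    """Wasserstein-1 distance between two finitely supported distributions on R,
    computed as the integral of |F_a - F_b| (an identity valid on the real line)."""
    z = np.concatenate([atoms_a, atoms_b])
    signed = np.concatenate([w_a, -np.asarray(w_b)])
    order = np.argsort(z)
    z, signed = z[order], signed[order]
    cdf_gap = np.cumsum(signed)               # F_a - F_b on [z[i], z[i+1])
    return float(np.sum(np.abs(cdf_gap[:-1]) * np.diff(z)))

def d_bar(eta, eta_prime):
    """Supremum (here: maximum over finitely many states) of W_1 distances."""
    return max(w1(a, wa, b, wb) for (a, wa), (b, wb) in zip(eta, eta_prime))

def apply_D(eta, r_sigma, P_sigma, beta):
    """Distributional policy operator on atom/weight representations.
    (D eta)(x, .) mixes, with weights P_sigma[x, x'], the laws of
    r_sigma[x] + beta * V where V is drawn from eta(x', .)."""
    n = len(eta)
    out = []
    for x in range(n):
        atoms = np.concatenate([r_sigma[x] + beta * eta[xp][0] for xp in range(n)])
        weights = np.concatenate([P_sigma[x, xp] * eta[xp][1] for xp in range(n)])
        out.append((atoms, weights))
    return out

# Hypothetical random instance with n states and k atoms per state
rng = np.random.default_rng(0)
n, k, beta = 3, 4, 0.9
r_sigma = rng.uniform(-1, 1, size=n)
P_sigma = rng.uniform(size=(n, n))
P_sigma /= P_sigma.sum(axis=1, keepdims=True)   # rows sum to one

def random_eta():
    eta = []
    for _ in range(n):
        w = rng.uniform(size=k)
        eta.append((rng.normal(size=k), w / w.sum()))
    return eta

eta, eta_prime = random_eta(), random_eta()
lhs = d_bar(apply_D(eta, r_sigma, P_sigma, beta),
            apply_D(eta_prime, r_sigma, P_sigma, beta))
rhs = beta * d_bar(eta, eta_prime)
print("contraction bound holds:", lhs <= rhs + 1e-12)
```

Storing atoms explicitly keeps the operator exact, at the cost of multiplying the number of atoms by the number of states on every application, so this representation is only practical for a handful of iterations.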
```{exercise} :label: ex-ddpcon Show that $D_\sigma$ is a $\beta$-contraction on $(\hH, \bar d_1)$: $$ \bar d_1(D_\sigma \eta,\, D_\sigma \eta') \leq \beta \, \bar d_1(\eta, \eta') \qquad \text{for all } \eta, \eta' \in \hH. $$ [Hint: if $h$ is $1$-Lipschitz, then $v \mapsto h(r_\sigma(x) + \beta v)$ is $\beta$-Lipschitz.] ``` ```{solution} ex-ddpcon Fix $x \in \Xsf$ and let $h$ be $1$-Lipschitz. Define $g(v) \coloneq h(r_\sigma(x) + \beta v)$, which is $\beta$-Lipschitz. Applying {eq}`eq-ddpopex` and setting $\tilde g \coloneq g/\beta$, which is $1$-Lipschitz, $$ \begin{aligned} (D_\sigma \eta)(x, h) - (D_\sigma \eta')(x, h) &= \int [\eta(x', g) - \eta'(x', g)] \, P_\sigma(x, \diff x') \\ &= \beta \int [\eta(x', \tilde g) - \eta'(x', \tilde g)] \, P_\sigma(x, \diff x') \\ &\leq \beta \int W_1(\eta(x', \cdot), \eta'(x', \cdot)) \, P_\sigma(x, \diff x') \\ &\leq \beta \, \bar d_1(\eta, \eta'). \end{aligned} $$ Taking the supremum over all $1$-Lipschitz $h$ and then over $x$ gives the result. ``` By the Banach contraction mapping theorem, each $D_\sigma$ has a unique fixed point $\eta_\sigma \in \hH$. Hence $(\hH, \TT_{\rm DDP})$ is well-posed. Moreover, since each $D_\sigma$ is a contraction and hence globally stable on $\hH$, {prf:ref}`l-pspace` implies that $(\hH, \TT_{\rm DDP})$ is order stable. #### Identification of the Fixed Point Let $(X_t)_{t \geq 0}$ be the $P_\sigma$-Markov chain with $X_0 = x$ and define the discounted return $$ Z_\sigma(x) \coloneq \sum_{t=0}^\infty \beta^t r(X_t, \sigma(X_t)). $$ In the scalar case, the fixed point of $T_\sigma$ is the expected-return function $v_\sigma(x) = \EE Z_\sigma(x)$. What is the distributional analogue? The fixed point of $D_\sigma$ should be the distribution of $Z_\sigma(x)$. Since $r$ is bounded, say $|r| \leq M$, we have $|Z_\sigma(x)| \leq M/(1-\beta)$ almost surely, so the distribution of $Z_\sigma(x)$ lies in $\dD_1(\RR)$. 
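The bound on $Z_\sigma(x)$ and the link to the scalar value function can be illustrated by simulation. The following sketch draws truncated sample paths of the discounted return for a small finite-state chain. The two-state chain, the rewards, the truncation horizon `T`, and the function name `simulate_returns` are assumptions chosen for illustration. Truncating at `T` changes each draw by at most $\beta^T M / (1 - \beta)$.

```python
import numpy as np

def simulate_returns(r_sigma, P_sigma, beta, x0, T=200, num_paths=10_000, seed=0):
    """Draw truncated samples of Z_sigma(x0) = sum_t beta^t r_sigma(X_t)."""
    rng = np.random.default_rng(seed)
    n = len(r_sigma)
    cum_P = np.cumsum(P_sigma, axis=1)
    state = np.full(num_paths, x0)
    Z = np.zeros(num_paths)
    for t in range(T):
        Z += beta**t * r_sigma[state]
        u = rng.uniform(size=num_paths)
        # inverse-CDF draw of the next state, one draw per path
        state = np.minimum((u[:, None] > cum_P[state]).sum(axis=1), n - 1)
    return Z

# Hypothetical two-state example
beta = 0.9
r_sigma = np.array([1.0, -1.0])
P_sigma = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
M = np.max(np.abs(r_sigma))

Z = simulate_returns(r_sigma, P_sigma, beta, x0=0)
# Scalar policy value v_sigma = (I - beta P_sigma)^{-1} r_sigma
v_sigma = np.linalg.solve(np.eye(2) - beta * P_sigma, r_sigma)

print("bound M / (1 - beta):", M / (1 - beta))
print("largest |Z| in sample:", np.abs(Z).max())
print("sample mean and v_sigma(0):", Z.mean(), v_sigma[0])
print("5% and 95% quantiles:", np.quantile(Z, [0.05, 0.95]))
```

The sample mean approximates $v_\sigma(0)$, while the quantiles are features of the return distribution that the scalar value function does not record.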
```{exercise} :label: ex-ddpfp Let $\hat \eta(x, \cdot)$ be the distribution of $Z_\sigma(x)$. Verify that $\hat \eta \in \hH$ and show that $D_\sigma \hat \eta = \hat \eta$. Conclude that $\eta_\sigma = \hat \eta$. ``` ```{solution} ex-ddpfp Since $|Z_\sigma(x)| \leq M/(1-\beta)$ a.s. for all $x$, the distribution $\hat\eta(x, \cdot)$ has bounded support and hence lies in $\dD_1(\RR)$ with $\sup_x \int |z| \, \hat\eta(x, \diff z) \leq M/(1-\beta) < \infty$. Measurability of $x \mapsto \hat\eta(x, B)$ follows from the canonical Markov construction in {ref}`sss-sks`: $\hat\eta(x, B) = \PP_x\{Z_\sigma \in B\}$, where $\PP_x$ is the law of the $(P_\sigma, x)$-Markov chain, and $Z_\sigma$ is a measurable functional of the path. Hence $\hat\eta \in \hH$. For the fixed point property, fix $x \in \Xsf$ and a bounded measurable $h$. The decomposition $Z_\sigma(x) = r_\sigma(x) + \beta Z_\sigma(X_1)$ and the Markov property give $$ \begin{aligned} \hat\eta(x, h) &= \EE[h(Z_\sigma(x))] = \EE[h(r_\sigma(x) + \beta Z_\sigma(X_1))] \\ &= \int \EE[h(r_\sigma(x) + \beta Z_\sigma(x'))] \, P_\sigma(x, \diff x') \\ &= \int \left[ \int h(r_\sigma(x) + \beta z) \, \hat\eta(x', \diff z) \right] P_\sigma(x, \diff x') = (D_\sigma \hat\eta)(x, h). \end{aligned} $$ Hence $D_\sigma \hat\eta = \hat\eta$. Since $D_\sigma$ has a unique fixed point, $\hat\eta = \eta_\sigma$. ``` Thus, while the scalar $\sigma$-value function $v_\sigma(x) = \EE[Z_\sigma(x)]$ records only the mean of the random return, the distributional $\sigma$-value function $\eta_\sigma(x, \cdot)$ records its full distribution---including variance, quantiles and tail behavior. The scalar value function is recovered as a special case: $v_\sigma(x) = \int z \, \eta_\sigma(x, \diff z)$. {numref}`f-distributional_dp` illustrates the action of the distributional policy operator on the optimal savings model from {ref}`s-og`, using the policy $\sigma$ computed in {numref}`f-os_multi_policies_2`. 
The left panel shows $D_{\sigma}^5 \eta_0$, an early iterate starting from the initial condition $\eta_0(x, \cdot) = \delta_0$ for all $x$. The right panel shows the converged distributional value function $\eta_{\sigma}$, which assigns to each state $x$ the full distribution of the discounted return $Z_{\sigma}(x)$. The iterations were implemented using the categorical projection method of {cite:t}`bellemare2017distributional`: the return axis is discretized into a fixed grid of atoms, and at each step the affine shift $z \mapsto r_\sigma(x) + \beta z$ is projected back onto this grid via linear interpolation of probability mass. The mean of $\eta_{\sigma}(x, \cdot)$ at each $x$ recovers the scalar value function $v_\sigma$ shown in {numref}`f-os_multi_policies_2`. ```{figure} figures/distributional_dp.pdf :name: f-distributional_dp :width: 95% Iterating $D_\sigma$: early iterate (left) and converged $\eta_\sigma$ (right) ``` #### Regularity One complication, in terms of developing a theory of distributional dynamic programming, is failure of regularity. Even when the state and action spaces are finite, greedy policies typically fail to exist. To see why, observe that, in the present setting, a policy $\sigma$ is $\eta$-greedy when $$ \int h(r_\tau(x) + \beta v) \, (P_\tau \otimes \eta)(x, \diff v) \leq \int h(r_\sigma(x) + \beta v) \, (P_\sigma \otimes \eta)(x, \diff v) $$ (eq-hrhr) for all $x \in \Xsf$, all $\tau \in \Sigma$ and all $h \in ib\RR$. A natural approach to this problem is to solve $$ \max_a \int \left[ \int h(r(x,a) + \beta v) \, \eta(x', \diff v) \right] P(x, a, \diff x') $$ at each $x$, and produce a policy $\sigma$ from the correspondence of maximizers. The problem with this idea is that, in most cases, the solution will depend on $h$. For example, if $h$ is concave with high curvature then optimal choices will avoid risk. If $h$ is linear, optimal choices will ignore risk.
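The dependence of the maximizer on $h$ can be seen in a toy calculation. In the sketch below, all numbers are hypothetical: there is a single decision with zero current reward, and each action induces a continuation distribution given by a short list of atoms and probabilities. The names `lotteries`, `objective`, `h_linear` and `h_concave` are illustrative only. Both test functions are bounded and increasing. The first is linear on the relevant range and the second is concave on $z \geq 0$.

```python
import numpy as np

beta = 0.9
# Hypothetical continuation distributions (atoms, probabilities) for each action
lotteries = {
    "safe":  (np.array([1.0]),      np.array([1.0])),
    "risky": (np.array([0.0, 2.2]), np.array([0.5, 0.5])),
}

def objective(h, action, reward=0.0):
    """Integral of h(reward + beta * v) against the action's continuation distribution."""
    atoms, probs = lotteries[action]
    return float(probs @ h(reward + beta * atoms))

def h_linear(z):
    # Bounded and increasing; equals z on [-10, 10]
    return np.clip(z, -10.0, 10.0)

def h_concave(z):
    # Bounded and increasing; strongly concave on z >= 0
    return -np.exp(-2.0 * np.maximum(z, 0.0))

for name, h in [("linear", h_linear), ("concave", h_concave)]:
    best = max(lotteries, key=lambda a: objective(h, a))
    print(name, "->", best)
```

With these numbers the linear test function selects the risky action (0.99 against 0.9) and the concave one selects the safe action, so no single action is best for both test functions.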
This makes it very difficult to attain {eq}`eq-hrhr` for all $h$ in $ib\RR$. Without regularity, the core optimization loop of dynamic programming---solve the Bellman equation, then extract a greedy policy---breaks down at the second step. In the reinforcement learning literature, practitioners who use distributional methods typically select actions using the mean of the return distribution---which amounts to standard scalar greediness---while exploiting the distributional representation to improve function approximation and learning dynamics. At a deeper level, the failure of regularity reflects the fact that, in order to set up a well-defined criterion for control, one must first commit to how risk is valued. This commitment takes the form of a specific nonlinear aggregator $K$ applied period-by-period, as described in {ref}`sss-fbtr` and {ref}`sss-btr` (see also the discussion of certainty equivalents and risk preferences in {ref}`sss-ces`). Such a commitment collapses the distributional object back to a scalar recursion, restoring regularity and the full apparatus of dynamic programming. This is the approach adopted throughout the remainder of this book. (ss-lq)= ### LQ Control LQ control is a major sub-field of dynamic programming, routinely applied to problems in engineering, economics, operations research and elsewhere. In this section we describe a canonical LQ problem and show how it can be solved using ADP methods. Rather than aiming to provide new results, we plan to show that LQ problems can be cleanly handled by the theory provided above, instead of requiring specialized machinery. (sss-lqdis)= #### Description We consider a deterministic undiscounted **linear-quadratic (LQ) control problem**, defined as a tuple $(Q, R, A, B)$, where - $A$ is $k \times k$ and $B$ is $k \times m$, - $Q$ is $k \times k$ and positive semidefinite, and - $R$ is $m \times m$ and positive definite. 
The objective of the LQ problem is to solve $$ \min_{(u_t)} \sum_{t \geq 0} \left[ x_t^\top Q x_t + u_t^\top R u_t \right] $$ (eq-lqobj) subject to $$ x_{t+1} = A x_t + B u_t \quad \text{ for all } t \geq 0. $$ The vector $x_t \in \RR^k$ is called the **state variable** and $u_t \in \RR^m$ is called the **action** or **control**. Note that $x_t^\top Q x_t + u_t^\top R u_t \geq 0$ for all $t$, so the infinite sum {eq}`eq-lqobj` takes values in $[0, \infty]$. The Bellman equation takes the form $$ \ell(x) = \min_{u \in \RR^m} \left\{ x^\top Q x + u^\top R u + \ell(Ax + B u) \right\} \quad \text{for all } x \in \RR^k. $$ (eq-lqbe) Since the solution to this problem is well-known, we will not seek new results. Rather, our aim is to illustrate how we can embed the LQ problem into the ADP framework and recover existing results in a relatively straightforward way. (sss-preric)= #### Preamble: LQ Background Before we go further, let's set up a framework for working with LQ problems and note down some standard results from the literature. In what follows, $A, B, R,$ and $Q$ are as described in {ref}`sss-lqdis`. We let - $\pP$ be the set of positive semidefinite $k \times k$ matrices - $\preccurlyeq$ be the **Loewner partial order** on $\RR^{k \times k}$, so that $$ M \preccurlyeq N \quad \iff \quad N - M \in \pP. $$ We use $0$ to denote the zero element of $\RR^{k \times k}$, so that $P$ is in $\pP$ if and only if $0 \preccurlyeq P$. Also, we define - the **Riccati map** $\bR \colon \pP \to \pP$ via $$ \bR(P) = A^\top P A - A^\top P B(B^\top P B + R)^{-1} B^\top P A + Q, \quad \text{and} $$ (eq-ricc) - the **control gain map** $\bF \colon \pP \to \RR^{m \times k}$ via $$ \bF (P) = - (B^\top P B + R)^{-1} B^\top P A . $$ (eq-lqus) Below we will connect the Riccati map to the Bellman operator for this problem and the control gain map will help us select decision rules. The fixed point equation $P = \bR(P)$ is called the **Riccati equation**. 
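The two maps just defined translate directly into code. The following is a minimal NumPy sketch in which the function names `riccati_map` and `gain_map` and the scalar test case $A = B = Q = R = 1$ are assumptions made for illustration. Repeated application of $\bR$ from $P = 0$ is used here only as a simple way to locate a fixed point in this scalar case, where the map reduces to $P \mapsto P/(P+1) + 1$. It is not a claim about convergence in general.

```python
import numpy as np

def gain_map(P, A, B, R):
    """Control gain map F(P) = -(B'PB + R)^{-1} B'PA."""
    return -np.linalg.solve(B.T @ P @ B + R, B.T @ P @ A)

def riccati_map(P, A, B, Q, R):
    """Riccati map R(P) = A'PA - A'PB (B'PB + R)^{-1} B'PA + Q, for symmetric P."""
    K = B.T @ P @ A
    return A.T @ P @ A - K.T @ np.linalg.solve(B.T @ P @ B + R, K) + Q

# Scalar illustration, with each matrix stored as a 1 x 1 array
A = B = Q = R = np.array([[1.0]])
P = np.zeros((1, 1))
for _ in range(100):
    P = riccati_map(P, A, B, Q, R)

F = gain_map(P, A, B, R)
print("approximate fixed point P:", P[0, 0])
print("gain F(P):", F[0, 0])
print("|A + BF|:", abs((A + B @ F)[0, 0]))
```

In this scalar case the Riccati equation reduces to $P^2 = P + 1$, so the nonnegative solution is $(1 + \sqrt{5})/2$. Using `np.linalg.solve` avoids forming the inverse of $B^\top P B + R$ explicitly.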
Note that the inverse in {eq}`eq-ricc` exists because $R$ is positive definite and $P$ is positive semidefinite, implying that $B^\top P B + R$ is also positive definite. The next exercise states a well-known identity whose proof requires some algebra. ```{exercise} :label: ex-afric Show that $$ \bR(P) = (A + BF)^\top P (A + BF) + F^\top R F + Q $$ whenever $F = \bF(P)$. ``` ```{solution} ex-afric Let $M \coloneq (B^\top P B + R)^{-1}$ and $K \coloneq B^\top P A$, so that $F = -MK$. From {eq}`eq-ricc`, the Riccati map satisfies $$ \bR(P) = A^\top P A - K^\top M K + Q. $$ Since $A + BF = A - BMK$, expanding the right-hand side of the target expression gives $$ \begin{aligned} &(A + BF)^\top P (A + BF) + F^\top R F + Q \\ &= (A - BMK)^\top P (A - BMK) + K^\top M^\top R M K + Q \\ &= A^\top P A - A^\top P BMK - K^\top M^\top B^\top P A \\ & \qquad \qquad + K^\top M^\top B^\top P B M K + K^\top M^\top R M K + Q \\ &= A^\top P A - 2K^\top M K + K^\top M^\top (B^\top P B + R) M K + Q. \end{aligned} $$ Here the last step uses $A^\top P B = K^\top$, which holds by symmetry of $P$, and the symmetry of $M$, which is inherited from $B^\top P B + R$. Since $(B^\top P B + R) M = I$, we also have $K^\top M^\top (B^\top P B + R) M K = K^\top M K$. Hence the right-hand side of the target expression equals $A^\top P A - K^\top M K + Q = \bR(P)$, as required. ``` ```{exercise} :label: ex-adps-auto-5 Confirm that $\bR$ does in fact map $\pP$ into itself. ``` ```{solution} ex-adps-auto-5 Fix $P \in \pP$. We need to show that $\bR(P)$ is positive semidefinite. This follows from {prf:ref}`ex-afric`, since each of the three terms on the right-hand side is positive semidefinite. (We use the fact that if $E$ is positive semidefinite and $G$ is any conformable matrix, then $G^\top E G$ is positive semidefinite, as follows easily from the definition.) ``` The next lemma shows that the control gain map selects matrices that are akin to "min-greedy policies," although we need to be a bit careful with that terminology, since we also want our policies to be stable (as clarified below).
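The identity in {prf:ref}`ex-afric` and the self-map property of $\bR$ can be checked numerically on a random instance. The sketch below is illustrative only: the dimensions, the random draws and the variable names are assumptions, with $Q$ and $P$ built as Gram matrices so that they are positive semidefinite and $R$ shifted by the identity so that it is positive definite.

```python
import numpy as np

rng = np.random.default_rng(1)
k, m = 3, 2

# Hypothetical random instance
A = rng.normal(size=(k, k))
B = rng.normal(size=(k, m))
C = rng.normal(size=(k, k))
Q = C.T @ C                      # positive semidefinite
D = rng.normal(size=(m, m))
R = D.T @ D + np.eye(m)          # positive definite
E = rng.normal(size=(k, k))
P = E.T @ E                      # positive semidefinite

S = B.T @ P @ B + R
K = B.T @ P @ A
F = -np.linalg.solve(S, K)                           # control gain F(P)
riccati = A.T @ P @ A - K.T @ np.linalg.solve(S, K) + Q   # Riccati map R(P)
closed_loop = (A + B @ F).T @ P @ (A + B @ F) + F.T @ R @ F + Q

min_eig = np.linalg.eigvalsh((riccati + riccati.T) / 2).min()
print("identity holds:", np.allclose(riccati, closed_loop))
print("smallest eigenvalue of R(P):", min_eig)
```

The matrix is symmetrized before calling `np.linalg.eigvalsh` only to guard against rounding error. A nonnegative smallest eigenvalue is consistent with $\bR(P) \in \pP$.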
```{prf:lemma} :label: l-fxeam If $P \in \pP$, $F = \bF(P)$ and $\ell(x) = x^\top P x$, then $$ Fx = \argmin_{u \in \RR^m} \left\{ x^\top Q x + u^\top R u + \ell(Ax + B u) \right\} \quad \text{for all } x \in \RR^k. $$ (eq-fxeam) Moreover, the minimizer is unique. ``` ```{prf:proof} Fix $x \in \RR^k$. Since $R$ is positive definite and $P$ is positive semidefinite, the map $u \mapsto x^\top Q x + u^\top R u + (Ax + Bu)^\top P (Ax + Bu)$ is strictly convex in $u$. Setting its gradient with respect to $u$ equal to zero yields $2Ru + 2B^\top P (Ax + Bu) = 0$, which gives $$ u = -(B^\top P B + R)^{-1} B^\top P A x = Fx. $$ Since the function is strictly convex, this critical point is the unique minimizer. ◻ ``` In stating the next result we let $C$ be such that $C^\top C = Q$. We refer to {cite}`bertsekas2012dynamic` for the definitions of observability and controllability. ```{prf:lemma} :label: l-pminc If $(A, B)$ is controllable and $(A, C)$ is observable, then 1. The Riccati map $\bR$ has a unique fixed point in $\pP$. 2. If $P = \bR(P)$ and $F = \bF(P)$, then $\rho(A + BF) < 1$. ``` ```{prf:proof} See Proposition 4.4.1 of {cite}`bertsekas2012dynamic`. ◻ ``` #### Policies In the LQ setting, a **control matrix** is any $F \in \RR^{m \times k}$. Under a given control matrix $F$, the current control obeys $u_t = F x_t$ and the update rule for the state is $x_{t+1} = A x_t + B F x_t$. Hence the state evolves according to $x_t = (A + BF)^t x_0$. Following {eq}`eq-lqobj`, the lifetime cost of following $F$, starting at initial condition $x_0 \in \RR^k$, is $$ \ell_F (x_0) \coloneq \sum_{t=0}^\infty x_t^\top \left( F^\top R F + Q \right) x_t \; \text{ with } x_t = (A + BF)^t x_0. $$ (eq-lqlife) ```{prf:example} Suppose that $m=k=1$ and $Q=R=1$, so that a control matrix is a scalar $F \in \RR$. 
In view of {eq}`eq-lqlife`, the lifetime cost of $F$ when starting at $x_0$ is $$ \ell_F(x_0) = c x_0^2 \quad \text{where} \quad c \coloneq (F^2 + 1) \sum_{t=0}^\infty (A + BF)^{2t}. $$ (eq-lqlvod) The sum is finite when $|A + BF| < 1$, in which case $c = (F^2 + 1) / (1 - (A + BF)^2)$. {numref}`f-lq_illustration_1` plots $\ell_F$ for different choices of $F$ when $A=B=1$ and the stability condition holds. Of the two alternatives, $F=-0.6$ attains the lower cost from every state. ``` ```{figure} figures/lq_illustration_1.pdf :name: f-lq_illustration_1 The function $\ell_F$ for different choices of $F$ ``` Returning to the general case, finite lifetime costs require driving the state to zero fast enough for the sum {eq}`eq-lqlife` to converge. In this connection, extending the condition $|A + BF| < 1$ from the one-dimensional example, a control matrix $F$ is called **stable** if the spectral radius condition $\rho(A + BF) < 1$ holds. ```{exercise} :label: ex-adps-auto-6 Prove that, for any fixed $x_0 \in \RR^k$, the sequence $x_t = (A + BF)^t x_0$ converges to zero as $t \to \infty$ when $F$ is a stable control matrix. ``` ```{solution} ex-adps-auto-6 Let $F$ be as stated and fix $t \in \NN$. Let $E = A + BF$. Observe that $\|x_t\| \leq \|E^t\| \|x_0\|$, where the first norm on the right hand side is the operator norm on $\RR^{k \times k}$. By Gelfand's formula for the spectral radius, there exists an $\epsilon \in (0, 1)$ such that $\|E^t\| \leq \epsilon^t$ for all sufficiently large $t$. Hence $\|x_t\| \leq \epsilon^t \|x_0\|$ for all such $t$. The claim follows. ``` (sss-lqasadp)= #### From LQ to ADP We wish to produce an ADP representation of the LQ problem. To this end, we set - $\Sigma \coloneq$ the set of stable control matrices, and - $\TT \coloneq \setntn{\bT_F}{F \in \Sigma}$ with each policy operator $P \mapsto \bT_F(P)$ defined by $$ \bT_F (P) = Q + F^\top R F + (A + BF)^\top P (A + BF) .
$$ (eq-lqpolop) To match earlier ADP terminology, a stable control matrix is also referred to as a **policy**. ```{exercise} :label: ex-etfo Show that every $\bT_F$ in $\TT$ is an order preserving self-map on $(\pP, \preccurlyeq)$. ``` ```{solution} ex-etfo Fix $F \in \Sigma$. $\bT_F$ is a self-map on $\pP$ because $\pP$ is stable under addition and $(A + BF)^\top P (A + BF)$ is in $\pP$ whenever $P \in \pP$. To see that $\bT_F$ is order preserving, fix $P_1 \preccurlyeq P_2$. Since $P_2 - P_1$ is positive semidefinite, so is $$ \bT_F(P_2) - \bT_F(P_1) = (A + BF)^\top (P_2 - P_1) (A + BF). $$ (eq-nfnf) The expression on the right-hand side of {eq}`eq-nfnf` is positive semidefinite when $P_2 - P_1$ is positive semidefinite. Hence $\bT_F(P_1) \preccurlyeq \bT_F(P_2)$. ``` It follows from {prf:ref}`ex-etfo` that $(\pP, \TT)$ is an ADP. #### Interpretation and Properties The fixed point equation $P = \bT_F(P)$ can be interpreted as a recursive equation for lifetime cost under policy $F$. To see this, suppose $P = \bT_F(P)$ and set $\ell(x) \coloneq x^\top P x$. Using {eq}`eq-lqpolop` and a bit of algebra, you will be able to confirm that this function $\ell$ obeys the recursion $$ \ell(x) = x^\top Q x + (Fx)^\top R (Fx) + \ell((A + BF)x). $$ (eq-lqre) The right-hand side equals the current state cost $x^\top Q x$, plus the current action cost $(Fx)^\top R (Fx)$, plus the lifetime cost from the next state $(A + BF)x$. This is exactly the recursive structure of lifetime cost under policy $F$. ```{prf:lemma} :label: l-wplq If $F \in \Sigma$, then $\bT_F$ is globally stable on $\pP$ with unique fixed point $$ P_F \coloneq \sum_{t=0}^\infty [(A + BF)^t]^\top \left(F^\top R F + Q\right) (A + BF)^t. $$ (eq-lqnf) ``` ```{prf:proof} Fix $\bT_F \in \TT$. Let $\bL_F$ be a linear self-map on $\RR^{k \times k}$ defined by $\bL_F (P) \coloneq (A + BF)^\top P (A + BF)$. Since $F$ is stable, $\rho(\bL_F) < 1$ on $\RR^{k \times k}$. 
Hence, by the Neumann series lemma (see, in particular, {prf:ref}`c-ibnl`), the map $\bT_F$ is globally stable on $\pP$ with unique fixed point $$ P_F = \sum_{t=0}^\infty \bL_F^t \left(F^\top R F + Q\right) = \sum_{t=0}^\infty [(A + BF)^t]^\top \left(F^\top R F + Q\right) (A + BF)^t. $$ This verifies the expression for $P_F$ in {eq}`eq-lqnf`. ◻ ``` Comparing {eq}`eq-lqnf` with {eq}`eq-lqlife`, we see that $x^\top P_F \, x = \ell_F(x)$ for all $x \in \RR^k$. Hence $P_F$ is the matrix representation of the lifetime cost function. ```{prf:lemma} :label: l-lqwpos The ADP $(\pP, \TT)$ is order stable. ``` ```{prf:proof} Fix any $\bT_F \in \TT$. By {prf:ref}`l-wplq`, $\bT_F$ is globally stable. By {prf:ref}`l-pspace`, globally stable order preserving self-maps are order stable. Hence $(\pP, \TT)$ is order stable. ◻ ``` #### Min-Greedy Policies Fixing $P \in \pP$, the ADP definition of min-greedy policies tells us that $F \in \Sigma$ is $P$-min-greedy if and only if $\bT_F(P) \preccurlyeq \bT_G(P)$ for all $G \in \Sigma$. The next exercise is useful for characterizing min-greedy policies. ```{exercise} :label: ex-mgreedy Prove that, for $P \in \pP$ and control matrix $F \in \RR^{m \times k}$, $$ F = \bF(P) \quad \iff \quad \bT_F(P) \preccurlyeq \bT_G(P) \text{ for every } G \in \RR^{m \times k}. $$ ``` ```{solution} ex-mgreedy Fix $P \in \pP$ and $x \in \RR^k$. Let $F = \bF(P)$. Since $F x$ is the minimizer of {eq}`eq-lqbe` when $\ell(x) = x^\top P x$, we have $$ x^\top Q x + x^\top F^\top R Fx + x^\top (A + BF)^\top P (A + BF) x \leq x^\top Q x + u^\top R u + \ell(Ax + B u) $$ (eq-fbg) for any $u \in \RR^m$. Setting $G$ to be any control matrix and letting $u = Gx$ allows us to write {eq}`eq-fbg` as $x^\top \bT_F(P) \, x \leq x^\top \bT_G(P) \, x$. Hence $\bT_F(P) \preccurlyeq \bT_G(P)$. Now let $\ell(x) = x^\top P x$ and let $F'$ be any control matrix.
If $\bT_{F'}(P) \preccurlyeq \bT_G(P)$ holds for every control matrix $G$, then it holds when $G = \bF(P)$. Using this choice of $G$ and fixed $x \in \RR^k$, we get $x^\top \bT_{F'}(P) \, x \leq x^\top \bT_G(P) \, x$ and hence $$ x^\top Q x + u_{F'}^\top R u_{F'} + \ell(Ax + Bu_{F'}) \leq \min_{u \in \RR^m} \left\{ x^\top Q x + u^\top R u + \ell(Ax + B u) \right\} $$ where $u_{F'} = F'x$. Since $x$ was arbitrary, uniqueness of the minimizer in {prf:ref}`l-fxeam` now implies that $F'x = \bF(P) x$ for all $x \in \RR^k$, and hence $F' = - (B^\top P B + R)^{-1} B^\top P A = \bF(P)$. ``` In obtaining optimality results, one problem we have is that $(\pP, \TT)$ is not always regular. On the one hand, if we take an arbitrary $P \in \pP$ and then calculate the control gain matrix $F = \bF(P)$, {prf:ref}`ex-mgreedy` assures us that we get the "min-greedy" property $\bT_F(P) \preccurlyeq \bT_G(P)$. On the other hand, we have no guarantee that $F$ is actually in $\Sigma$. In particular, $F$ might not be a *stable* control matrix. For this reason, we introduce a smaller set $\pP_S \subset \pP$ consisting of all positive semidefinite matrices $P$ whose control gain $\bF(P)$ is stable. That is, we set $$ \pP_S \coloneq \setntn{P \in \pP}{\rho(A + B \bF(P))<1} = \setntn{P \in \pP}{\bF(P) \in \Sigma}. $$ With this definition, the following lemma is immediate from {prf:ref}`ex-mgreedy`. ```{prf:lemma} :label: l-nsgpo If $P \in \pP_S$, then the control gain $\bF(P)$ lies in $\Sigma$ and is $P$-min-greedy. ``` Note that $\pP_S$ is not necessarily closed under the policy operators $\bT_F$, so we cannot take $(\pP_S, \TT)$ as an ADP. (the-bellman-equation)= #### The Bellman Equation From the definition in {ref}`sss-adpbell`, the Bellman min-operator associated with the ADP $(\pP, \TT)$ is defined by $$ \bT (P) = \bigwedge_{F \in \Sigma} \bT_F (P) \;\; \text{ whenever the infimum exists}. $$ If $P \in \pP_S$ and $F = \bF(P)$, then, by {prf:ref}`l-nsgpo`, $F$ is $P$-min-greedy.
Hence $\bT (P) = \bT_F (P)$ (see {prf:ref}`l-torper`), so $$ \bT(P) = Q + F^\top R F + (A + BF)^\top P (A + BF) \quad\text{when} \quad F = \bF(P). $$ (eq-lqadpt) Recalling {prf:ref}`ex-afric`, this means that ```{prf:lemma} :label: l-beri The Bellman min-operator $\bT$ equals the Riccati map $\bR$ on $\pP_S$. ``` The next lemma connects this to the Riccati equation and another version of the LQ Bellman equation. ```{prf:lemma} :label: l-lqbel For $P \in \pP_S$, the following statements are equivalent: 1. $P$ solves the Riccati equation $\bR(P) = P$. 2. $P$ solves the Bellman min-equation $\bT(P) = P$. 3. The function $\ell(x) \coloneq x^\top P x$ obeys {eq}`eq-lqbe`. ``` ```{prf:proof} Equivalence of (i) and (ii) follows from {prf:ref}`l-beri`. Regarding (iii), suppose that (ii) holds, so that $P \in \pP_S$ and $P = \bT(P)$. Then, setting $\ell(x) \coloneq x^\top P x$ and using the expression for $\bT(P)$ in {eq}`eq-lqadpt` yields $$ \ell(x) = x^\top Q x + x^\top F^\top R F x + \ell(Ax + BFx). $$ Applying {eq}`eq-fxeam` yields (iii). Using similar logic we can show that (iii) implies (ii). ◻ ``` #### Optimality Suppose that $(A, B)$ is controllable and $(A, C)$ is observable. Since $\bT$ and $\bR$ agree on $\pP_S$, {prf:ref}`l-pminc` tells us that $\bT$ has a fixed point $P^*$ in $\pP_S$. We call $P^*$ the **minimum loss matrix**. Let $\pP_\Sigma$ be the set of lifetime costs, so that $$ \pP_\Sigma = \setntn{P = P_F}{F \in \Sigma}. $$ Applying {prf:ref}`t-minbk` and the fact that $(\pP, \TT)$ is order stable ({prf:ref}`l-lqwpos`), we obtain the following optimality results: 1. $P^*$ is the least element of $(\pP_\Sigma, \preccurlyeq)$, 2. $P^*$ obeys the Bellman min-equation $\bT P^* = P^*$, and 3. a policy $F$ is min-optimal for $(\pP, \TT)$ if and only if $F$ is $P^*$-min-greedy. We can translate (a)--(c) into more familiar optimality results for LQ problems. 
For example, (b) combined with {prf:ref}`l-lqbel` tells us that $$ \ell^*(x) = \min_{u \in \RR^m} \left\{ x^\top Q x + u^\top R u + \ell^*(Ax + B u) \right\} \quad (x \in \RR^k), $$ (eq-lqbestar) where $\ell^*$ is defined by $\ell^*(x) = x^\top P^* x$ for all $x$. Moreover, if we now set $F = \bF(P^*)$, then, by {prf:ref}`l-nsgpo`, the policy $F$ is $P^*$-min-greedy. Hence, by (c), $F$ is min-optimal. (s-cn_adps)= ## Chapter Notes This chapter is based on the abstract dynamic programming framework of {cite}`sargent2025partially`, which builds on theory found in {cite}`denardo1967contraction`, {cite}`bertsekas1977monotone`, {cite}`verdu1987abstract`, {cite}`szepesvari1998non`, {cite}`kamihigashi2014elementary`, {cite}`bertsekas2017regular`, {cite}`li2021online`, and, in particular, {cite}`bertsekas2022abstract`. The paper by {cite}`sargent2025partially` adds a layer of abstraction over these earlier frameworks by shifting analysis to families of policy operators on partially ordered sets. Doing so makes it possible to treat a wider class of problems and generate new results, as discussed in later chapters. Other precursors to the framework described in this chapter include {cite}`porteus1975optimality` and {cite}`kreps1977optimality`, who pioneered the use of order-theoretic methods to extend dynamic programming optimality theory beyond the standard expected discounted reward criterion. (An overview of this line of work appears in Appendix 6 of {cite}`kreps2013microeconomic`.) Their operator-based approach accommodates non-additive objectives, such as expected utility criteria, risk-sensitive preferences and stochastic games. Building on this foundation, {cite}`kreps1979dynamic` showed that non-additive recursive preferences are amenable to dynamic programming, work that inspired the recursive utility framework of {cite}`epstein1989risk`. The discussion of distributional dynamic programming in {ref}`ss-ddp` builds on {cite}`bellemare2017distributional`.
The objective of this line of work is to replace scalar value functions with return distributions, enabling agents to reason about risk, variability, and higher-order statistics of outcomes. The theoretical foundations have been extended by several authors, including {cite}`dabney2018distributional`, {cite}`rowland2018analysis`, {cite}`bauerle2025distributional`, {cite}`marthe2026`, and {cite}`bauerle2026markov`. Distributional methods have found application in risk-sensitive control, robotics, and finance. Linear-quadratic optimal control theory was developed during the late 1950s and 1960s, when the Riccati equation emerged as a key tool for computing optimal feedback policies. These methods have been widely applied in economics. {cite}`sargent1987dynamic` provides an early treatment connecting LQ foundations to economic modeling. {cite}`hansen1980formulating` showed how to formulate and estimate dynamic linear rational expectations models using LQ methods. The risk-sensitive extension to linear-exponential-quadratic-Gaussian (LEQG) control, developed by {cite}`whittle1981risk` and adapted for discounted problems in economics by {cite}`hansen1995discounted`, provides a bridge between LQ control and robust decision-making under model uncertainty; see {cite}`hansen2011robustness` for a comprehensive treatment. LQ methods underpin much of macroeconomic modeling, including the analysis of fiscal policy, monetary policy, and business cycle dynamics. See {cite}`hansen2013recursive` and {cite}`ljungqvist2012recursive` for numerous applications. Research at the intersection of nonlinear dynamics and LQ control is currently quite active. One of the key ideas is to approximate nonlinear systems with very high dimensional linear systems, and then to approximate those linear systems via singular value decomposition. For further discussion of these topics, see {cite}`kutz2016dynamic` or {cite}`brunton2019data`. 
[^1]: In general, designating a $v$-greedy policy for all $v \in V$ requires the Axiom of Choice. In practice, many applications induce some structure on the policy set that can be used to produce simple selection mechanisms. ======================================================================== ## ADPs on Pospace In this chapter, we add topological structure to the value space and the policy operators. This allows us to provide sufficient conditions for optimality that are often easy to use in applications. Proofs of the theorems presented in this chapter leverage the poset ADP theory that we constructed in {prf:ref}`c-adps`. We begin by studying ADPs on pospace, then specialize to partially ordered metric spaces, where contractivity arguments become available. Minimization counterparts and a treatment of nonstationary policies round out the theoretical development. In {ref}`s-adp2apps` we apply the theory to discrete MDPs and Q-factors, optimal savings, no-discount optimal stopping, and the sequential analysis problem from {ref}`s-wald`. ## Adding Topology In this section we introduce ADPs in settings where the value space is a poset and also a pospace (i.e., partially ordered space, see {ref}`ss-pospace` for the definition and basic properties). We study optimality in settings where the policy operators have topological stability properties, as well as order properties. *Throughout this chapter, $V$ is always assumed to be a pospace.* In {ref}`ss-adppospace` we study ADPs on pospace, leveraging global stability to obtain optimality and convergence results. In {ref}`ss-ams` we specialize to partially ordered metric spaces, where contraction-based arguments apply. {ref}`ss-minrpospace` provides minimization counterparts and {ref}`ss-nonstat` treats nonstationary policies. (ss-adppospace)= ### ADPs on Pospace Let $V$ be a pospace and let $(V, \TT)$ be an ADP. 
Recall that a self-map $S$ on $V$ is called globally stable when $S$ has a unique fixed point $\bar v$ in $V$ and $S^n v \to \bar v$ as $n \to \infty$ for all $v \in V$. (See {ref}`ss-scon` for more background.) We say that - $(V, \TT)$ is **globally stable** if each $T_\sigma \in \TT$ is globally stable on $V$. Obviously, if $(V, \TT)$ is globally stable, then $(V, \TT)$ is well-posed. We also have the following useful preliminary result, which is an immediate consequence of {prf:ref}`l-pspace`. ```{prf:lemma} :label: l-gsios If $(V, \TT)$ is globally stable, then $(V, \TT)$ is strongly order stable. ``` The next result shows that regularity and global stability together yield strong optimality properties. ```{prf:theorem} :label: t-pospace Let $(V, \TT)$ be regular and globally stable. If $T$ has a fixed point in $V$, then 1. the fundamental optimality properties hold and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} Let $(V, \TT)$ and $T$ be as stated. Since $(V, \TT)$ is order stable ({prf:ref}`l-gsios`), part (i) follows from {prf:ref}`c-bk`. Regarding convergence of VFI, fix $v \in V_U$, let $\sigma$ be an optimal policy and let $\vmax$ be the value function. We have $T_\sigma \, v \preceq T v \preceq \vmax$, where the last inequality is by {prf:ref}`l-rwubv` (since, by that lemma, $v \preceq \vmax$ and, therefore, $Tv \preceq T\vmax = \vmax$). From this chain of inequalities, combined with the fact that $T_\sigma \preceq T$ on $V$, we obtain $T_\sigma^n \, v \preceq T^n v \preceq \vmax$ for all $n$. As $\sigma$ is optimal and $(V, \TT)$ is globally stable, the policy operator $T_\sigma$ is globally stable with unique fixed point $\vmax$. Because $T_\sigma^n \, v \preceq \vmax$ for all $n$ and $T_\sigma^n \, v \to \vmax$, {prf:ref}`l-posmcoc` implies that the supremum of $T^n_\sigma \, v$ is $\vmax$. From this fact and $T_\sigma^n \, v \preceq T^n v \preceq \vmax$ for all $n$, the supremum of $T^n v$ is also $\vmax$ ({prf:ref}`ex-supsand`). 
Moreover, this sequence is increasing (because $v \in V_U$), which leads us to $T^n v \uparrow \vmax$. Hence VFI converges. Since $(V, \TT)$ is regular and order stable (by {prf:ref}`l-gsios`), convergence of VFI implies convergence of OPI and HPI ({prf:ref}`c-cl2`). ◻ ``` {prf:ref}`t-pospace` is a high-level result because a condition (existence of a fixed point) is placed on the derived object $T$, rather than the primitives $V$ and $\TT$. Below, we present a sequence of results that leverage {prf:ref}`t-pospace` while also placing all assumptions on primitives. The first is a relatively obvious but useful corollary, which adds convergence of VFI and OPI to {prf:ref}`t-bkf`. ```{prf:corollary} :label: c-pospace Let $(V, \TT)$ be regular and globally stable. If $(V, \TT)$ is also finite, then 1. the fundamental optimality properties hold and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} Since $(V, \TT)$ is globally stable, $(V, \TT)$ is also order stable ({prf:ref}`l-gsios`). Since it is also regular and finite, the fundamental optimality properties hold ({prf:ref}`t-bkf`). In particular, $T$ has a fixed point in $V$, so all the conclusions of {prf:ref}`t-pospace` hold. ◻ ``` The conditions of the next theorem are similar to those of {prf:ref}`t-dede`, after replacing order stability and order continuity with global stability. ```{prf:theorem} :label: t-tspo Let $(V, \TT)$ be regular and globally stable. If, in addition, $(V, \TT)$ is order bounded and $V$ is countably Dedekind complete, then 1. the fundamental optimality properties hold and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} In view of {prf:ref}`t-pospace`, we need only show that $T$ has a fixed point in $V$. To see this, we use the order bounded property to take a $u \in V$ such that $T_\sigma \, u \preceq u$ for all $\sigma \in \Sigma$. Now let $v$ be any element of $V_\Sigma$. Global stability implies order stability ({prf:ref}`l-gsios`), so we have $v \preceq u$.
Moreover, $v \preceq T v$, since $V_\Sigma \subset V_U$ (by regularity and {prf:ref}`l-vsigvu`). Hence $T$ is a self-map on the order interval $[v, u]$. Letting $v_n \coloneq T^n v$ and applying countable Dedekind completeness, we have $v_n \uparrow \bar v$ for some $\bar v \in [v, u]$. We claim that $T \bar v = \bar v$. To see this, first observe that $v_{n+1} = T v_n \preceq T \bar v$ for all $n$, so $\bar v \preceq T \bar v$. Letting $\sigma$ be $\bar v$-greedy, we have $\bar v \preceq T \bar v = T_\sigma \, \bar v$, so, by order stability, $\bar v \preceq v_\sigma$. Also, letting $w_n \coloneq T_\sigma^n \, v$, we have $w_n \preceq v_n \preceq \bar v \preceq v_\sigma$ for all $n \in \NN$. By global stability, $w_n \to v_\sigma$. Applying {prf:ref}`l-posmcoc`, we also have $\vee_n w_n = v_\sigma$. From this fact and the previous chain of inequalities, it must be that $\bar v = v_\sigma$. Hence $T \bar v = T_\sigma \, \bar v = T_\sigma \, v_\sigma = v_\sigma = \bar v$. This completes the proof that $\bar v$ is a fixed point of $T$. ◻ ``` (ss-ams)= ### ADPs on Metric Space In this section we add more structure by assuming that our value space $V$ is in fact a **partially ordered metric space**, by which we mean that $V = (V, d, \preceq)$ where $d$ is a metric on $V$ and $(V, \preceq)$ is a pospace under the topology generated by $d$. By construction, this specializes the earlier setting of {ref}`ss-adppospace`, where $(V, \preceq)$ is just a pospace. Throughout {ref}`ss-ams`, - $(V, \TT)$ is an ADP and - $V = (V, d, \preceq)$ is a partially ordered metric space. We will say that the ADP $(V, \TT)$ is **semi-regular** if there exists a closed subset $V_0$ of $V$ such that $V_0 \subset V_G$ and $T V_0 \subset V_0$. In addition, we will say that VFI is **geometrically convergent on $V_0$** if $\vmax$ exists and there is a $\beta \in (0,1)$ such that $$ d(T^n v, \vmax) = \OO(\beta^n) \text{ as } n \to \infty \text{ for each } v \in V_0. $$ Here $f(n) = \OO(\beta^n)$ means that there exists a $C < \infty$ with $f(n) \leq C \beta^n$ for all $n \in \NN$. In stating the next theorem, we use the notion of sup-nonexpansiveness from {ref}`sss-snms`. ```{prf:theorem} :label: t-contract Let $(V, \TT)$ be semi-regular on $V_0$ and let $d$ be complete and sup-nonexpansive. If each $T_\sigma \in \TT$ is a contraction of modulus $\beta$ on $V$, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $V_0$, and 3. VFI is geometrically convergent on $V_0$. If, in addition, $(V, \TT)$ is regular, then OPI and HPI also converge. ``` ```{prf:proof} Let $V$ and $\TT$ have the stated properties. By completeness and Banach's fixed point theorem, the ADP $(V, \TT)$ is globally stable and hence order stable ({prf:ref}`l-gsios`). Moreover, each $T_\sigma$ is a contraction of modulus $\beta$ on $V$ and $T$ is a well-defined self-map on $V_0$. {prf:ref}`l-snms` now implies that $T$ is a contraction of modulus $\beta$ on $V_0$. As $V_0$ is closed and $V$ is complete, this implies that $T$ has a fixed point in $V_0 \subset V_G$. Hence, by {prf:ref}`c-bk`, the fundamental optimality properties hold. It follows that $\vmax$ exists and is the unique fixed point of $T$ in $V_G$. Since $T$ has a fixed point in $V_0$, this also implies that $\vmax \in V_0$. Geometric convergence of VFI now follows from contractivity of $T$ on $V_0$. If $(V, \TT)$ is also regular, then convergence of OPI and HPI follows from {prf:ref}`t-pospace`. ◻ ``` (ss-minrpospace)= ### Minimization Because of {prf:ref}`t-fbk_min`, the optimality and convergence theorems based around maximization can easily be converted to theorems for minimization. In this section we give some examples. Our first is a min-version of {prf:ref}`t-pospace`. ```{prf:theorem} :label: t-minpospace Let $(V, \TT)$ be min-regular and globally stable. Let $\tmin$ be the Bellman min-operator for $(V, \TT)$.
If $\tmin$ has a fixed point in $V$, then 1. the fundamental min-optimality properties hold and 2. min-VFI, min-OPI and min-HPI all converge. ``` ```{prf:proof} Let $(V, \TT)$ be as stated and let $(V, \TT)^\partial$ be the dual ADP. Since $(V, \TT)$ is min-regular, $(V, \TT)^\partial$ is max-regular ({prf:ref}`ex-mmp`). Since $(V, \TT)$ is globally stable, $(V, \TT)^\partial$ is likewise globally stable. By assumption, the Bellman min-operator $\tmin$ for $(V, \TT)$ has a fixed point $\bar v$ in $V$. Since the Bellman max-operator $T^\partial$ for $(V, \TT)^\partial$ satisfies $T^\partial = \tmin$ (the supremum under $\preceq^\partial$ equals the infimum under $\preceq$), the element $\bar v$ is also a fixed point of $T^\partial$. As a result, {prf:ref}`t-pospace` implies that, for $(V, \TT)^\partial$, the fundamental max-optimality properties hold and max-VFI, max-OPI and max-HPI all converge. The conclusions of {prf:ref}`t-minpospace` now follow from {prf:ref}`t-fbk_min`. ◻ ``` For the rest of {ref}`ss-minrpospace`, $V = (V, d, \preceq)$ is always assumed to be a partially ordered metric space. Our next result is a min-version of {prf:ref}`t-contract`, restricted to the regular case. ```{prf:theorem} :label: t-mincontract Let $(V, \TT)$ be min-regular and let $d$ be complete and sup-nonexpansive. If each $T_\sigma \in \TT$ is a contraction of modulus $\beta$ on $V$, then 1. the fundamental min-optimality properties hold and 2. min-VFI, min-OPI and min-HPI all converge. ``` ```{prf:proof} Let $(V, \TT)$ be as stated and let $(V, \TT)^\partial$ be the dual ADP. Since $(V, \TT)$ is min-regular, $(V, \TT)^\partial$ is max-regular ({prf:ref}`ex-mmp`). Since every $T_\sigma \in \TT$ is a contraction of modulus $\beta$ on $V$, {prf:ref}`t-contract` implies that, for $(V, \TT)^\partial$, the fundamental max-optimality properties hold and max-VFI, max-OPI and max-HPI all converge. The conclusions of {prf:ref}`t-mincontract` now follow from {prf:ref}`t-fbk_min`.
◻ ``` Now let's consider a min-version of {prf:ref}`t-tspo`. ```{prf:theorem} :label: t-mintspo Let $(V, \TT)$ be min-regular and globally stable. If, in addition, $(V, \TT)$ is min-order bounded and $V$ is countably Dedekind complete, then 1. the fundamental min-optimality properties hold and 2. min-VFI, min-OPI and min-HPI all converge. ``` ```{prf:proof} Let $(V, \TT)$ be as stated and let $(V, \TT)^\partial$ be the dual ADP. The set $V^\partial$ is countably Dedekind complete by {prf:ref}`ex-cdcdual`. Since $(V, \TT)$ is min-regular and min-order bounded, the dual $(V, \TT)^\partial$ is max-regular and max-order bounded ({prf:ref}`ex-mmp`). Since global stability of $(V, \TT)$ is equivalent to global stability of $(V, \TT)^\partial$, the conditions of {prf:ref}`t-tspo` all hold for $(V, \TT)^\partial$. As a result, for $(V, \TT)^\partial$, the fundamental max-optimality properties hold and max-VFI, max-OPI and max-HPI all converge. The conclusions of {prf:ref}`t-mintspo` now follow from {prf:ref}`t-fbk_min`. ◻ ``` (ss-nonstat)= ### Nonstationary Policies In all of the preceding discussion we focused on stationary policies. For example, in the context of the optimal savings problem from {ref}`s-og`, we fixed a policy $\sigma$ and computed its lifetime value $v_\sigma$ by assuming that $\sigma$ is applied at every $t$ in $\{0,1,\ldots\}$. In particular, in {ref}`sss-lval`, we showed that, for $v$ arbitrarily chosen from the value space, $$ v_\sigma = \lim_{n \to \infty} T^n_\sigma \, v. $$ (eq-vsigbylim2) This expression illustrates how lifetime value is obtained by repeatedly applying the same policy. But is this focus on stationary policies justified? Could it be that higher lifetime value is available when we allow a change of policy in each period?
To address this question, suppose that we can select a **policy plan** $\bar \sigma \coloneq (\sigma_t)_{t \geq 0}$ in the infinite Cartesian product $\times_{t \geq 0} \Sigma$ and apply the $t$-th element $\sigma_t$ at time $t$. Generalizing {eq}`eq-vsigbylim2`, the lifetime value of $\bar \sigma$ can be defined by $$ v_{\bar \sigma} = \lim_{n \to \infty} T_{\sigma_0} T_{\sigma_1} \cdots T_{\sigma_n} v. $$ (eq-vsignos) Of course, for the definition in {eq}`eq-vsignos` to make sense we need to know that the limit exists. Ideally, it should also be independent of $v$. (In {eq}`eq-vsignos`, we iterate backwards in time, applying $T_{\sigma_n}$ first, because $v$ is best thought of as a terminal condition, rather than an initial condition. See {ref}`sss-lval` for intuition.) Since the expression {eq}`eq-vsignos` requires a topology, we consider an ADP $(V, \TT)$ where $V = (V, \preceq)$ is a partially ordered space. We also assume that the topology on $V$ is generated by a metric $d$. As usual, $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ is a family of order preserving self-maps on $V$. To ensure that the limit in {eq}`eq-vsignos` exists we also require the following: ```{prf:assumption} :label: a-uc The metric $d$ is complete and sup-nonexpansive (see {ref}`sss-snms`). In addition, there exists a positive constant $\lambda$ with $\lambda < 1$ and $$ d( T_\sigma \, v , T_\sigma \, w ) \leq \lambda d(v , w) \quad \text{for all } v, w \in V \text{ and all } \sigma \in \Sigma. $$ Finally, for all $v \in V$ we have $\sup_{\sigma \in \Sigma} d(v, T_\sigma \, v) < \infty$. ``` We will make use of the following preliminary results. ```{prf:lemma} :label: l-aue If {prf:ref}`a-uc` holds, then claims (i)--(ii) below are valid. If, in addition, $(V, \TT)$ is semi-regular, then claim (iii) is also valid. 1.
For each $v \in V$ and policy plan $\hat \sigma \coloneq (\sigma_t)_{t \geq 0}$, the limit $$ v_{\hat \sigma} \coloneq \lim_{n \to \infty} T_{\sigma_0} \cdots T_{\sigma_n} \, v $$ exists in $V$ and is independent of $v$. 2. Every $T_\sigma \in \TT$ is continuous and globally stable on $V$, with unique fixed point $v_\sigma$ satisfying $$ v_\sigma = \lim_{j \to \infty} T_\sigma^j \, v \quad \text{for all } v \in V. $$ (eq-sconv) 3. There exists a $v \in V$ such that $v = \bigvee_{\sigma \in \Sigma} T_\sigma \, v$. ``` ```{prf:proof} Fix $v \in V$ and policy plan $\hat \sigma \coloneq (\sigma_t)_{t \geq 0}$. Given the policy plan above and $m \leq n$, we adopt the following simplified notation: $$ T_{m, n} \coloneq T_{\sigma_m} \circ T_{\sigma_{m+1}} \circ \cdots \circ T_{\sigma_n}. $$ Let $v_n = T_{0, n} v$. Our claim is that $\lim_n v_n$ exists in $V$ and is independent of $v$. To see this, observe first that $(v_n)$ is Cauchy, since, fixing $m, j \in \NN$, $$ d(v_m, v_{m+j}) \leq \lambda^{m+1} d \left( v, T_{m+1, m+j} v \right) , $$ and, by repeatedly applying the triangle inequality, $$ d (v, T_{m+1, m+j} v) \leq d (v, T_{m+1, m+1} v) + d (T_{m+1, m+1} v, T_{m+1, m+2} v) \\ + \cdots + d (T_{m+1, m+j-1} v, T_{m+1, m+j} v) . $$ By {prf:ref}`a-uc`, there exists a finite constant $b$ satisfying $d ( v, T_{\sigma_j} v ) \leq b$ for all $j$. Since $T_{m+1, m+i} v = T_{m+1, m+i-1} T_{\sigma_{m+i}} v$ and $T_{m+1, m+i-1}$ is a composition of $i-1$ contractions of modulus $\lambda$, the $i$-th term in the last bound is at most $\lambda^{i-1} b$. Hence $$ d (v, T_{m+1, m+j} v) \leq b + \lambda b + \cdots + \lambda^{j-1} b \leq \frac{b}{1-\lambda}. $$ As a result, $d(v_m, v_{m+j}) \leq \lambda^{m+1} b / (1- \lambda)$. This shows that $(v_n)$ is Cauchy. Using completeness of $V$ and letting $\bar v$ be the limit of this sequence, we argue that $\bar v$ is independent of $v$. Indeed, if $w_n \coloneq T_{0, n} \, w$ for some $w \in V$, then $d(v_n, w_n) \leq \lambda^{n+1} d(v, w)$ for all $n$, so that $(v_n)$ and $(w_n)$ have the same limit.
Hence $\lim_{n \to \infty} T_{0, n} \, v$ exists in $V$ and is independent of the initial condition $v$. This proves claim (i). The result in (ii) is immediate because, by {prf:ref}`a-uc`, every $T_\sigma \in \TT$ is a contraction mapping (and therefore continuous) on the complete metric space $V$. Finally, for (iii), {prf:ref}`l-snms` implies that the Bellman operator $T$ is also a contraction map and, therefore, has at least one fixed point in $V$. ◻ ``` We can now prove the main result of this section, which shows that any policy plan is (weakly) dominated in value by a stationary policy. ```{prf:theorem} :label: t-spo If $(V, \TT)$ is regular and {prf:ref}`a-uc` holds, then the fundamental optimality properties hold. In addition, given any policy plan $\bar \sigma$, there exists a stationary policy $\sigma$ such that $v_{\bar \sigma} \preceq v_\sigma$. ``` ```{prf:proof} The first claim is immediate from {prf:ref}`t-contract`. Regarding the second, let $\bar \sigma \coloneq (\sigma_t)_{t \geq 0}$ be an arbitrary policy plan and let $\sigma$ be an optimal policy. Since each policy operator is order preserving and dominated by $T$, and since $v_\sigma$ is a fixed point of $T$, we have $T_{\sigma_0} \cdots T_{\sigma_j} \, v_\sigma \preceq T^{j+1} v_\sigma = v_\sigma$ for all $j$. By {prf:ref}`l-aue`, the left-hand side converges to $v_{\bar \sigma}$ as $j \to \infty$, and closedness of the partial order yields $v_{\bar \sigma} \preceq v_\sigma$. This proves that every policy plan is weakly dominated by a stationary policy. ◻ ``` (s-adp2apps)= ## Applications We now apply the theory developed above to a range of problems. {ref}`ss-apps` revisits discrete MDPs and introduces Q-factors. {ref}`ss-osfirop` treats optimal savings under both strong and weak continuity assumptions. {ref}`ss-nodos` develops a no-discount optimal stopping framework, which is then applied to sequential analysis in {ref}`ss-sar`.
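Before turning to these applications, here is a small numerical sketch of {prf:ref}`t-spo` in a finite MDP with policy operators $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$. It draws random policy plans, approximates their lifetime values by truncating the composition in {eq}`eq-vsignos`, and compares them with the fixed point of the Bellman operator. The randomly generated rewards and transition probabilities, the sizes `n` and `m`, the truncation length, and the helper names are all assumptions made for illustration, and every action is assumed feasible in every state.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, beta = 4, 3, 0.9               # hypothetical numbers of states/actions, discount factor
r = rng.uniform(size=(n, m))         # r[x, a]: reward (illustrative random data)
P = rng.uniform(size=(n, m, n))      # P[x, a, x']: transition probabilities
P /= P.sum(axis=2, keepdims=True)    # normalize each row into a distribution

states = np.arange(n)

def T_sigma(v, sigma):
    """Policy operator: (T_sigma v)(x) = r(x, sigma(x)) + beta * sum_x' v(x') P(x, sigma(x), x')."""
    return r[states, sigma] + beta * P[states, sigma] @ v

def T(v):
    """Bellman operator: pointwise maximum over actions."""
    return np.max(r + beta * P @ v, axis=1)

def plan_value(plan, v):
    """Approximate T_{sigma_0} ... T_{sigma_N} v, applying the last operator first."""
    for sigma in plan[::-1]:
        v = T_sigma(v, sigma)
    return v

# Approximate the value function by iterating T from zero.
v_star = np.zeros(n)
for _ in range(1_000):
    v_star = T(v_star)

# Random policy plans, truncated at 300 periods (beta**300 is negligible).
plans = [rng.integers(m, size=(300, n)) for _ in range(20)]
values = [plan_value(plan, np.zeros(n)) for plan in plans]

# Each plan value should lie weakly below v_star, up to rounding.
dominated = all(np.all(val <= v_star + 1e-10) for val in values)
print(dominated)
```

Calling `plan_value` with a different terminal condition `v` should change the result only by an amount of order `beta**300`, in line with claim (i) of {prf:ref}`l-aue`.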
(ss-apps)= ### Discrete MDPs and Q-Factors In {ref}`sss-mdpsbc` we re-derive the optimality results for finite MDPs using the pospace theory of this chapter. In {ref}`sss-qfmo` we introduce the Q-factor formulation of the MDP and establish its optimality properties. The Q-factor formulation underpins reinforcement learning, one of the most influential branches of modern AI. Reinforcement learning algorithms, most notably Q-learning, use Q-factors to learn optimal policies from data, without requiring a model of the environment. Here we focus on the underlying dynamic programming theory associated with Q-factors, taking the model as given. Further discussion of Q-learning can be found in {ref}`s-cn_adps2` and {ref}`ss-qlearn`. (sss-mdpsbc)= #### MDP Optimality, Again In {ref}`s-mdps` we introduced a finite MDP $(\Gamma, r, \beta, P)$ with state space $\Xsf$ and action space $\Asf$. In {ref}`sss-masa` we showed that this finite MDP can be framed as the ADP $(\RR^\Xsf, \TT_{\rm MDP})$, with policy operators given by $T_\sigma = r_\sigma + \beta P_\sigma$. In {ref}`sss-mdpopt` we established all of the major optimality properties for $(\RR^\Xsf, \TT_{\rm MDP})$. For the purpose of illustration, working in a familiar and simple environment, let's re-prove these results using the theorems in this chapter. First, we know from {ref}`sss-masa` that $(\RR^\Xsf, \TT_{\rm MDP})$ is regular. In {prf:ref}`ex-tsfmdp` we proved that every policy operator $T_\sigma$ is globally stable. Since $\TT_{\rm MDP}$ is finite, {prf:ref}`c-pospace` applies. This proves validity of the fundamental optimality properties and convergence of all algorithms. Since global stability implies order stability, convergence of HPI in finite time holds by {prf:ref}`t-bkf`. ```{exercise} :label: ex-mdpc Consider again the MDP model $(\Gamma, r, \beta, P)$ on state space $\Xsf$ and with action space $\Asf$, but now suppose that $\Asf$ and $\Xsf$ are countable rather than finite.
Suppose in addition that the reward function $r$ is bounded and $\Gamma(x)$ is finite for all $x \in \Xsf$. The remaining MDP definitions from {ref}`sss-tdtm` are unchanged. Let $b\Xsf$ be the set of all bounded functions on $\Xsf$. As before, $\Sigma$ is the set of all maps from $\Xsf$ to $\Asf$ and $\TT_{\rm MDP}$ is the set of all policy operators of the form $T_\sigma \, v = r_\sigma + \beta P_\sigma \, v$. Prove that 1. $(b\Xsf, \TT_{\rm MDP})$ is an ADP, 2. the fundamental optimality properties hold, and 3. VFI, OPI, and HPI all converge. ``` ```{solution} ex-mdpc Since $|T_\sigma \, v| \leq |r_\sigma| + \beta P_\sigma |v|$, the image $T_\sigma \, v$ is bounded whenever $v$ is bounded. Hence $T_\sigma$ is a self-map on $b\Xsf$. Clearly $T_\sigma$ is order preserving. Hence $(b\Xsf, \TT_{\rm MDP})$ is an ADP. Since each set $\Gamma(x)$ is finite, the proof in {ref}`sss-mdpprop` that the finite ADP is regular extends directly to the countable case. The proof in {prf:ref}`ex-tsfmdp` that every policy operator $T_\sigma$ is a contraction of modulus $\beta < 1$ under the supremum norm also extends to the countable case without significant modifications. The supremum norm remains sup-nonexpansive and complete on $b\Xsf$. Hence {prf:ref}`t-contract` applies. This proves validity of the fundamental optimality properties and convergence of all algorithms. ``` (sss-qfmo)= #### The Q-Factor Model Next we examine the Q-factor variation of the MDP model. This modification provides an alternative view on the Bellman equation that unlocks a stochastic approximation approach to reinforcement learning. We can study optimality of the Q-factor variation either by treating it directly, as an ADP in its own right, or by inferring optimality properties from the original MDP version of the problem, which we just obtained in {ref}`sss-mdpsbc`. The first approach is treated here and the second is treated in {prf:ref}`c-transforms`.
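As a numerical preview of the direct approach, the sketch below takes a small finite MDP and compares two computations: value function iteration with the usual Bellman operator, and iteration of a Q-factor Bellman operator of the form $(Sq)(x, a) = r(x, a) + \beta \sum_{x'} \max_{a'} q(x', a') P(x, a, x')$, which is derived in this subsection. The random model data, the sizes `n` and `m`, the iteration count, and the function names are assumptions made for illustration, and all actions are assumed feasible in every state.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, beta = 5, 3, 0.95              # hypothetical sizes and discount factor
r = rng.uniform(size=(n, m))         # r[x, a]: reward (illustrative random data)
P = rng.uniform(size=(n, m, n))      # P[x, a, x']: transition probabilities
P /= P.sum(axis=2, keepdims=True)

def T(v):
    """MDP Bellman operator acting on value functions v (shape (n,))."""
    return np.max(r + beta * P @ v, axis=1)

def S_sigma(q, sigma):
    """Q-factor policy operator: r(x, a) + beta * sum_x' q(x', sigma(x')) P(x, a, x')."""
    return r + beta * P @ q[np.arange(n), sigma]

def S(q):
    """Q-factor Bellman operator: the next-period action maximizes q(x', .)."""
    return r + beta * P @ q.max(axis=1)

def fixed_point(op, x, n_iter=2_000):
    """Plain successive approximation (beta**2000 is negligible)."""
    for _ in range(n_iter):
        x = op(x)
    return x

v_star = fixed_point(T, np.zeros(n))
q_star = fixed_point(S, np.zeros((n, m)))
sigma_star = q_star.argmax(axis=1)   # a policy that maximizes q_star(x, .) at each x

# Numerical check of the contraction property of S under the supremum norm.
q1, q2 = rng.normal(size=(n, m)), rng.normal(size=(n, m))
lhs = np.max(np.abs(S(q1) - S(q2)))
rhs = beta * np.max(np.abs(q1 - q2))
print(lhs <= rhs, np.max(np.abs(v_star - q_star.max(axis=1))))
```

In this sketch the fixed point of `S` should be the Q-factor corresponding to `v_star`, and maximizing it over actions should recover `v_star`; the assertions used to test the block check exactly these relations.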
To begin, we take the MDP model from {ref}`sss-mdpsbc` and, given $v \in \RR^\Xsf$, set $$ q(x, a) \coloneq r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') \qquad ((x,a) \in \Gsf). $$ (eq-qfac) The function $q$ is called the **Q-factor** corresponding to $v$. We will convert the original MDP Bellman equation {eq}`eq-mdp_bell` into an equation in Q-factors. The first step is to observe that, given $q$ in {eq}`eq-qfac`, the Bellman equation can be written as $v(x) = \max_{a \in \Gamma(x)}q(x, a)$. Taking expectations and discounting on both sides of this equation yields $$ \beta \sum_{x'} v(x') P(x, a, x') = \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x'). $$ Adding $r(x,a)$ and using the definition of $q$ again gives $$ q(x, a) = r(x, a) + \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x'). $$ (eq-pabo0) This is the **Q-factor Bellman equation**. To study it, we introduce a family of policy operators $\SS \coloneq \setntn{S_\sigma}{\sigma \in \Sigma}$ via $$ (S_\sigma \, q)(x, a) = r(x, a) + \beta \sum_{x'} q(x', \sigma(x')) P(x, a, x') \qquad ((x,a) \in \Gsf). $$ (eq-pabos) Here $S_\sigma$ acts on functions $q \in \RR^\Gsf$. The set $\RR^\Gsf$ is paired with the pointwise partial order. ```{exercise} :label: ex-adps2-auto-1 Prove that $(\RR^\Gsf, \SS)$ is an ADP. ``` ```{solution} ex-adps2-auto-1 Fix $\sigma \in \Sigma$. Clearly, $S_\sigma$ is a self-map on $\RR^{\Gsf}$. For any $q, f \in \RR^{\Gsf}$ with $q \leq f$, we have $q(x, a) \leq f(x, a)$ for each $(x, a) \in \Gsf$, and therefore $(S_\sigma \, q)(x, a) \leq (S_\sigma \, f)(x, a)$ for each $(x, a) \in \Gsf$. This implies $S_\sigma \, q \leq S_\sigma \, f$, so $S_\sigma$ is order preserving. ``` ```{exercise} :label: ex-qfacrgs Fixing $q \in \RR^\Gsf$, show that $\sigma \in \Sigma$ is $q$-greedy for the ADP $(\RR^\Gsf, \SS)$ if and only if $\sigma(x)$ is in $\argmax_{a \in \Gamma(x)} q(x, a)$ for all $x \in \Xsf$.
``` ```{solution} ex-qfacrgs If $\sigma(x) \in \argmax_{a \in \Gamma(x)} q(x, a)$ for all $x \in \Xsf$, then $q(x, \sigma(x)) = \max_{a \in \Gamma(x)} q(x, a)$ for all $x \in \Xsf$. Hence, for any $\tau \in \Sigma$, we have $(S_\tau q)(x, a) \leq (S_\sigma \, q)(x, a)$ for each $(x, a) \in \Gsf$, which implies that $S_\tau q \leq S_\sigma \, q$. Hence $\sigma$ is a $q$-greedy policy. The reverse implication is left to the reader. ``` By definition, the ADP Bellman operator corresponding to $(\RR^\Gsf, \SS)$ obeys $S q \coloneq \bigvee_\sigma S_\sigma \, q$. The next exercise helps us connect this to the Q-factor Bellman equation {eq}`eq-pabo0`. ```{exercise} :label: ex-qpogdy Show that $Sq$ can also be written as $$ (Sq)(x, a) = r(x, a) + \beta \sum_{x'} \max_{a' \in \Gamma(x')}q(x', a') P(x, a, x') $$ (eq-pabo) for all $q \in \RR^\Gsf$. ``` ```{solution} ex-qpogdy Fix $q \in \RR^{\Gsf}$ and choose $\sigma$ such that $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} q(x, a) \qquad (x \in \Xsf). $$ By {prf:ref}`ex-qfacrgs`, $\sigma \in \Sigma$ is $q$-greedy. The representation {eq}`eq-pabo` now follows from $Sq = S_\sigma \, q$ (see {prf:ref}`l-torper`). ``` {prf:ref}`ex-qpogdy` tells us that, as expected, $q \in \RR^\Gsf$ is a fixed point of $S$ if and only if it is a solution to the Q-factor Bellman equation {eq}`eq-pabo0`. Now let's turn to optimality. ```{exercise} :label: ex-saac Prove that each $S_\sigma$ is a contraction of modulus $\beta$ on $\RR^\Gsf$ with respect to the supremum norm. ``` ```{solution} ex-saac Fix $\sigma \in \Sigma$ and $q, f \in \RR^{\Gsf}$. For each $(x, a) \in \Gsf$, we have $$ \begin{aligned} |(S_\sigma \, q)(x, a) - (S_\sigma \, f)(x, a)| & \leq \beta \sum_{x'} |q(x', \sigma(x')) - f(x', \sigma(x'))| P(x, a, x') \\ & \leq \beta \sum_{x'} \|q - f\|_{\infty} P(x, a, x') = \beta \|q - f\|_{\infty}. \end{aligned} $$ Taking the supremum over $(x, a)$ yields $\|S_\sigma \, q - S_\sigma \, f\|_{\infty} \leq \beta \|q - f\|_{\infty}$.
``` The next exercise asks you to confirm the core optimality properties also hold for the Q-factor MDP. You may like to use {prf:ref}`t-contract`. ```{exercise} :label: ex-qfaco Prove that the fundamental optimality properties hold for $(\RR^\Gsf, \SS)$, that VFI, OPI and HPI all converge, and that HPI converges in finitely many steps. ``` ```{solution} ex-qfaco Since $\Asf$ is finite and $\Gamma$ is nonempty, the characterization of greedy policies in {prf:ref}`ex-qfacrgs` implies that $(\RR^\Gsf, \SS)$ is regular. By {prf:ref}`ex-saac`, each $S_\sigma \in \SS$ is a contraction of modulus $\beta$ on $\RR^\Gsf$. The claim now follows from either {prf:ref}`c-pospace` or {prf:ref}`t-contract`. Since global stability implies order stability, convergence of HPI in finite time holds by {prf:ref}`t-bkf`. ``` (ss-osfirop)= ### Optimal Savings In {ref}`s-og` we introduced a simple optimal savings problem. In {ref}`sss-egosadp`, we converted the optimal savings problem from {ref}`s-og` into an ADP $(V, \TT_{\rm OS})$, where $V = b\RR_+$ and each $T_\sigma \in \TT_{\rm OS}$ takes the form $$ (T_\sigma \, v)(w) = u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y) \qquad (w \in \RR_+). $$ (eq-ogpolopr) Now we turn to optimality. In {ref}`sss-ossc` we will maintain the conditions in {prf:ref}`a-uf`, so that $u$ is continuous and bounded on $\RR_+$ and the distribution of labor income can be represented by a continuous density. In {ref}`sss-oswc` we will drop the continuous density assumption. (sss-ossc)= #### The Strongly Continuous Case Maintaining {prf:ref}`a-uf`, we prove the following optimality properties, which were stated without proof in {ref}`sss-dpros`. ```{prf:proposition} :label: p-osfo The optimal savings ADP $(V, \TT_{\rm OS})$ obeys the fundamental optimality properties and VFI, OPI and HPI converge. ``` ```{prf:proof} In {prf:ref}`l-ogtsup`, we showed that $(V, \TT_{\rm OS})$ is globally stable. 
In {ref}`sss-egosadp`, we showed that $(V, \TT_{\rm OS})$ is regular and order bounded. Since $V$ is countably Dedekind complete ({prf:ref}`c-sdcms`), the claims in {prf:ref}`p-osfo` follow from {prf:ref}`t-tspo`. ◻
```

```{prf:remark}
:label: r-oso

The results in {prf:ref}`p-osfo` can also be obtained via {prf:ref}`t-contract`. Interested readers can try this as an exercise.
```

Let's briefly translate these ADP results into the optimal savings results stated in {ref}`sss-dpros`. We showed in {ref}`sss-egosadp` that $v \in V$ satisfies the ADP Bellman equation $\bigvee_\sigma T_\sigma \, v=v$ if and only if it satisfies

$$
    v(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\} \quad \text{for all } w \geq 0.
$$ (eq-osbell2)

By this fact and the fundamental optimality properties, the value function $\vmax$ exists and is the unique solution to {eq}`eq-osbell2` in $V$.

In {ref}`sss-egosadp`, we said that a policy $\sigma$ is $v$-greedy if and only if

$$
    \sigma(w) \in \argmax_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\} \quad \text{for all } w \geq 0.
$$ (eq-osbellarg2)

Applying Bellman's principle of optimality, a policy is optimal if and only if it satisfies {eq}`eq-osbellarg2` with $v$ replaced by $\vmax$.

(sss-oswc)=
#### The Weakly Continuous Case

Now let's prove a result similar to {prf:ref}`p-osfo` under weaker conditions. In particular, we modify {prf:ref}`a-uf` by dropping the assumption that $\phi$ is a continuous density. Instead we'll let $\phi$ be an arbitrary probability measure on the Borel subsets of $\RR_+$. Other conditions in {prf:ref}`a-uf` are maintained.

Without the continuity of $\phi$, {prf:ref}`l-conofos` fails. In particular, we cannot claim that each $v \in b\RR_+$ has a greedy policy. As a result, $(V, \TT_{\rm OS})$ is no longer regular. However, we do have the following result, stated as a solved exercise.
```{exercise}
:label: ex-oswgt

Prove the following: If $v \in bc\RR_+$, then at least one $v$-greedy policy exists. Moreover, the Bellman operator maps $bc\RR_+$ into itself.
```

```{solution} ex-oswgt
Fix $v \in bc\RR_+$. Since $u, v$ are both continuous and bounded, an application of the dominated convergence theorem confirms that the map

$$
    (w, c) \mapsto u(c) + \beta \int v(R(w - c) + y) \phi(\diff y)
$$

is continuous on the set of feasible state-action pairs $\Gsf$. Applying {prf:ref}`t-berge`, we see that there exists a measurable selection $\sigma$ such that

$$
    u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y)
    = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\}
$$

for all $w \in \RR_+$. This measurable selection is $v$-greedy. The same theorem tells us that

$$
    (Tv)(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \right\}
$$

is continuous on $\RR_+$. Since $Tv$ is also bounded, we have $Tv \in bc\RR_+$.
```

With the results from this exercise in hand, we can prove the next proposition.

```{prf:proposition}
:label: p-osfow

The optimal savings ADP $(V, \TT_{\rm OS})$ obeys the fundamental optimality properties, the value function is continuous, and VFI converges geometrically on $bc\RR_+$.
```

```{prf:proof}
We apply {prf:ref}`t-contract` with $V = b\RR_+$ and $V_0 = bc\RR_+$. The ADP is semi-regular because $V_0 \subset V_G$ and $T V_0 \subset V_0$ ({prf:ref}`ex-oswgt`). The supremum norm is complete and sup-nonexpansive. It is straightforward to confirm that each $T_\sigma \in \TT$ is a contraction of modulus $\beta$ on $V$. This means that the conditions of {prf:ref}`t-contract` hold and the claims in {prf:ref}`p-osfow` are all valid. (Continuity of $\vmax$ is by $\vmax \in V_0$.)
◻ ``` (ss-nodos)= ### No-Discount Optimal Stopping Many important applications---including sampling problems, shortest path and routing problems, bandit problems, and reinforcement learning tasks---involve no discounting. In the absence of a discount factor, contractivity of the policy operators typically fails, so the results based on contraction mappings do not directly apply. This necessitates more sophisticated techniques. One of our main aims is to provide foundations for solving the sequential sampling problem from {ref}`s-wald`. #### Setup Let $(\Xsf, \bB)$ be a measurable space. We recall from {ref}`sss-sks` that a discrete time $\Xsf$-valued stochastic process $(X_t)_{t \geq 0}$ on probability space $(\Omega, \fF, \PP_x)$ is called **$P$-Markov** if $$ \PP \{X_{t+1} \in B \given \fF_t\} = P(X_t, B) \quad \PP \text{-a.s.\ for all } t \geq 0 \text{ and } B \in \bB \text{. } $$ We write $\PP_x$ and $\EE_x$ for probabilities and expectations when conditioning on $X_0 = x$. Let $b\Xsf_+$ be all $g \in b\Xsf$ taking only nonnegative values. We consider a cost minimization problem with Bellman equation $$ g(x) = \min \left\{ e(x), c(x) + \int g(x') P(x, \diff x') \right\}, $$ (eq-bell) where $e, c$ are functions in $b\Xsf_+$, while $P$ is a stochastic kernel on $\Xsf$. The function $e$ is called the **exit cost function** and $c$ is called the **flow cost**. The Bellman equation corresponds to a setting where a controller observes a $P$-Markov state process $(X_t)_{t \geq 0}$ and decides when to stop. Stopping at time $t$ incurs the one-off penalty $e(X_t)$. Continuing incurs the flow cost $c(X_t)$, followed by transition to the new state $X_{t+1}$ and the opportunity to decide again. #### Policies A policy is a $\bB$-measurable map $\sigma$ from $\Xsf$ to $\{0, 1\}$, with $\sigma(x) = 1$ indicating the decision to stop in state $x$. Given a policy $\sigma$, we call $$ E_\sigma \coloneq \setntn{x \in \Xsf}{\sigma(x) = 1} $$ the **exit region** for $\sigma$. 
We call its complement $E_\sigma^c$ the **continuation region**. To each policy $\sigma$, we associate the **stopping time** $$ \tau^\sigma \coloneq \inf\setntn{t \geq 0}{X_t \in E_\sigma} = \inf\setntn{t \geq 0}{\sigma(X_t) = 1}. $$ (eq-tau) Here and below, the convention for the infimum is that $\inf \varnothing \coloneq \infty$. Also, given $\sigma$, we define the **$\sigma$-loss function** via $$ g_\sigma(x) \coloneq \EE_x \left[ \sum_{t=0}^{\tau^\sigma-1} c(X_t) + e(X_{\tau^\sigma}) \right]. $$ (eq-lcf) Here $\sum_{t=0}^{-1} c(X_t)$ is understood as $0$, so that $g_\sigma(x) = e(x)$ when $x \in E_\sigma$. The function $g_\sigma$ takes values in $[0, \infty]$ and $g_\sigma(x)$ represents the total expected cost when applying $\sigma$ in every period, conditional on starting in state $x$. (sss-lbp)= #### The Lower Bound Policy One policy of particular interest is $\bar \sigma = \1\{e \leq c\}$. To simplify notation, the exit region for this policy is denoted $$ \bar E \coloneq E_{\bar \sigma} = \setntn{x \in \Xsf}{e(x) \leq c(x)}. $$ (eq-cerex) and the stopping time is denoted $$ \bar \tau \coloneq \tau^{\bar \sigma} = \inf\setntn{t \geq 0}{\bar \sigma(X_t) = 1}. $$ (eq-bartau) We call $\bar E$ the **certain exit region**. Under the innocuous convention that the controller always stops when indifferent between stopping and continuing, any optimal policy will choose to stop when $x \in \bar E$. The reason is that the controller has the opportunity to exit at cost $e(x) \leq c(x)$, and continuing incurs $c(x)$ plus additional costs in subsequent stages. The fact that the controller always stops when $x \in \bar E$ allows us to shrink the policy space. Specifically, we consider only policies where $\sigma(x) = 1$ for all $x \in \bar E$. Let $\Sigma$ be the set of all such policies. We can also express this set via $$ \Sigma \coloneq \{ \text{all } \bB \text{-measurable } \sigma \colon \Xsf \to \{0,1\} \text{ with } \bar \sigma \leq \sigma \}. 
$$ (eq-ndsig) Since $\bar \sigma \leq \sigma$ for all $\sigma \in \Sigma$, we refer to $\bar \sigma$ as the **lower bound policy**. Also, since $$ \bar \sigma \leq \sigma \; \implies \; \tau^\sigma \leq \bar \tau \;\; \PP\text{-a.s.}, $$ (eq-sbou) we refer to $\bar \tau$ as the **upper bound stopping time**. Our key assumption is as follows. ```{prf:assumption} :label: a-cap The upper bound stopping time $\bar \tau$ obeys $$ \sup_{x \in \Xsf} \EE_x \bar \tau < \infty. $$ ``` For {prf:ref}`a-cap`, it suffices to check that $\sup_{x \in \bar E^c} \EE_x \bar \tau$ is finite, since $\bar \tau = 0$ with probability one when $x \in \bar E$. Below we show that {prf:ref}`a-cap` is sufficient for all of the major optimality results associated with dynamic programming. ```{prf:lemma} :label: l-fomc When {prf:ref}`a-cap` holds, the $\sigma$-loss function {eq}`eq-lcf` is finite and bounded on $\Xsf$ for all $\sigma \in \Sigma$. ``` ```{prf:proof} Fix $\sigma \in \Sigma$. By {prf:ref}`a-cap`, there exists a constant $M < \infty$ with $\EE_x \bar \tau \leq M$ for all $x \in \Xsf$. In addition, the functions $c$ and $e$ are bounded, so we can take constants $N_c, N_e$ with $c \leq N_c$ and $e \leq N_e$. As a result, $$ g_\sigma(x) \leq N_c \, \EE_x \tau^\sigma + N_e \leq N_c \, \EE_x \bar \tau + N_e \leq N_c \, M + N_e, $$ where, in the second inequality, $\EE_x \tau^\sigma \leq \EE_x \bar \tau$ by {eq}`eq-sbou`. ◻ ``` (sss-ndpolop)= #### Policy Operators For each $\sigma \in \Sigma$, we define a **policy operator** $T_\sigma$ via $$ (T_\sigma \, g)(x) = \sigma(x) e(x) + (1 - \sigma(x)) \left[c(x) + \int g(x') P(x, \diff x') \right], $$ (eq-ndpol) or, in operator notation, as $$ T_\sigma \, g = \sigma e + (1-\sigma) c + K_\sigma \, g \qquad (g \in b\Xsf), $$ (eq-tsk) where $$ (K_\sigma \, g)(x) \coloneq (1 - \sigma(x)) \int g(x') P(x, \diff x'). \qquad (x \in \Xsf). $$ (eq-ksig) In {eq}`eq-tsk`, expressions such as $\sigma e$ are understood as pointwise products. 
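To make the definition concrete, here is a minimal sketch of the policy operator on a finite state space, written in plain Python. The three-state chain, the cost vectors and the policy are hypothetical values chosen only for illustration; none of them come from the text. The sketch evaluates $T_\sigma \, g$ both pointwise, as in {eq}`eq-ndpol`, and in the operator form {eq}`eq-tsk`, and the two evaluations should agree.

```python
# Hypothetical finite model with states 0, 1, 2 (illustration only).
e = [1.0, 4.0, 2.0]                # exit cost e(x)
c = [2.0, 1.0, 1.0]                # flow cost c(x)
P = [[1.0, 0.0, 0.0],              # stochastic kernel: P[x][y] = P(x, {y})
     [0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5]]

sigma_bar = [1 if e[x] <= c[x] else 0 for x in range(3)]   # lower bound policy
sigma = [1, 0, 1]                  # a policy with sigma_bar <= sigma

def expect(P, g, x):
    """Return the integral of g against P(x, .), a finite sum here."""
    return sum(P[x][y] * g[y] for y in range(len(g)))

def policy_operator(sigma, e, c, P, g):
    """Pointwise form: sigma*e + (1 - sigma)*[c + Pg]."""
    return [sigma[x] * e[x] + (1 - sigma[x]) * (c[x] + expect(P, g, x))
            for x in range(len(g))]

def K(sigma, P, g):
    """The linear operator K_sigma: (1 - sigma(x)) * (Pg)(x)."""
    return [(1 - sigma[x]) * expect(P, g, x) for x in range(len(g))]

def policy_operator_opform(sigma, e, c, P, g):
    """Operator form: sigma*e + (1 - sigma)*c + K_sigma g."""
    Kg = K(sigma, P, g)
    return [sigma[x] * e[x] + (1 - sigma[x]) * c[x] + Kg[x]
            for x in range(len(g))]

g = [1.0, 1.0, 2.0]
Tg_pointwise = policy_operator(sigma, e, c, P, g)
Tg_operator = policy_operator_opform(sigma, e, c, P, g)
print(Tg_pointwise, Tg_operator)
```

In this example state $0$ lies in the certain exit region, since $e(0) \leq c(0)$, so the chosen policy stops there, as membership of $\Sigma$ requires.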
As usual, the policy operator associated with $\sigma$ is introduced with the idea that its fixed point gives lifetime value -- in this case lifetime cost functions -- generated by $\sigma$. This turns out to be true here as well, although the proof is not entirely trivial. It requires some familiarity with shift operators and the Markov property in its general form. An introduction to these topics can be found in Chapter 3 of {cite}`meyn2009markov`. ```{prf:lemma} :label: l-tsfg For every $\sigma \in \Sigma$, the $\sigma$-loss function $g_\sigma$ is a fixed point of $T_\sigma$. ``` ```{prf:proof} Fix $\sigma \in \Sigma$. By {prf:ref}`l-fomc`, $g_\sigma$ is bounded on $\Xsf$, so all expectations below are well-defined. To simplify notation, for the duration of this proof we set $\tau \coloneq \tau^\sigma$ and $E \coloneq E_\sigma$. For $x \in E$, we have $\sigma(x)=1$ and hence $(T_\sigma \, g_\sigma)(x) = e(x)$. At the same time, {eq}`eq-lcf` implies that $g_\sigma(x) = e(x)$ also holds for such $x$. In particular, $T_\sigma \, g_\sigma = g_\sigma$ on $E$. Hence, to complete the proof, we only need to show that $T_\sigma \, g_\sigma = g_\sigma$ on $E^c$. To this end, fix $x \notin E$ and define the random variable $$ H \coloneq \sum_{t=0}^{\tau - 1} c(X_t) + e(X_{\tau}), $$ so that $$ g_\sigma(x) = \EE_x H \quad \text{and} \quad g_\sigma(X_1) = \EE_{X_1} H = \EE_x [ H \circ \theta \given X_1 ]. $$ On the right-hand side, $\theta \colon (x_0, x_1, \ldots) \mapsto (x_1, x_2, \ldots)$ is the shift operator on the sequence space $\Xsf^\infty$, and the last equality is by the Markov property {eq}`eq-markovprop`. We can write $g_\sigma(X_1)$ more explicitly by expanding $H \circ \theta$ to get $$ g_\sigma(X_1) = \EE_x \left[ \sum_{t=0}^{\tau \circ \, \theta - 1} c(X_{t+1}) + e(X_{\tau \circ \, \theta + 1}) \, \given \, X_1 \right] = \EE_x \left[ \sum_{t=1}^{\tau \circ \, \theta} c(X_t) + e(X_{\tau \circ \, \theta + 1}) \, \given \, X_1 \right]. 
$$

From the law of iterated expectations, this yields

$$
    \EE_x \, g_\sigma(X_1)
    = \EE_x \left[ \sum_{t=1}^{\tau \circ \, \theta} c(X_t) + e(X_{\tau \circ \, \theta + 1}) \right].
$$ (eq-exgs)

Recalling that $x \in E^c$, so that $\sigma(x)=0$, we have

$$
    (T_\sigma \, g_\sigma)(x)
    = c(x) + \EE_x \, g_\sigma(X_1)
    = c(x) + \EE_x \left[ \sum_{t=1}^{\tau \circ \, \theta} c(X_t) + e(X_{\tau \circ \, \theta + 1}) \right].
$$

Using the fact that $x \in E^c$, so that the first visit to $E$ occurs no earlier than $t=1$, we obtain $\tau \circ \theta = \tau - 1$ and $X_{\tau \circ \, \theta + 1} = X_\tau$. (The stopping time $\tau \circ \theta$ counts time using the shifted sequence $(X_1, X_2, \ldots)$ and so equals $\tau - 1$; the stopped value on the shifted path then sits at position $\tau \circ \theta + 1 = \tau$ in the original sequence.) Applying these facts to the last display yields

$$
    (T_\sigma \, g_\sigma)(x)
    = c(x) + \EE_x \left[ \sum_{t=1}^{\tau - 1} c(X_t) + e(X_\tau) \right]
    = \EE_x \left[ \sum_{t=0}^{\tau - 1} c(X_t) + e(X_\tau) \right]
    = g_\sigma(x).
$$

This confirms that $T_\sigma \, g_\sigma = g_\sigma$ on $E^c$, so the proof of {prf:ref}`l-tsfg` is done. ◻
```

(sss-noadp)=
#### ADP Formulation

Let $\TT$ be the set of all policy operators, as defined in {eq}`eq-ndpol`, indexed over the restricted policy set $\Sigma$ defined in {eq}`eq-ndsig`. Since each $T_\sigma$ is order preserving, the pair $(b\Xsf_+, \TT)$ forms an ADP.

For this ADP and given $g \in b\Xsf_+$, a policy $\sigma \in \Sigma$ is $g$-min-greedy when $T_\sigma g \leq T_s g$ for all $s \in \Sigma$. It is easy to check that at least one such policy always exists. Indeed, such a policy can be found by setting

$$
    \sigma(x) = \1 \left\{ e(x) \leq c(x) + \int g(x') P(x, \diff x') \right\} \qquad (x \in \Xsf).
$$

This policy is in $\Sigma$, since $\sigma$ is $\bB$-measurable and, in addition,

$$
    e(x) \leq c(x) \implies e(x) \leq c(x) + \int g(x') P(x, \diff x'),
$$

so that $\bar \sigma \leq \sigma$.
Moreover, it is clear that

$$
    (T_\sigma \, g)(x) = \min \left\{ e(x), c(x) + \int g(x') P(x, \diff x') \right\} \leq (T_s \, g)(x)
$$

for all $s \in \Sigma$. This argument also proves that the ADP $(b\Xsf_+, \TT)$ is regular.

In essence, a $g$-min-greedy policy treats $g$ as a loss function, using it to assign a total expected cost to each state, and makes the current best choice accordingly.

#### Stability of the Policy Operators

In this section we prove that $(b\Xsf_+, \TT)$ is globally stable.

```{prf:proposition}
:label: p-nodgs

If {prf:ref}`a-cap` holds, then every $T_\sigma$ is globally stable on $b\Xsf_+$.
```

We prove the proposition using a sequence of lemmas. In the statement of the next lemma, $\sigma$ is a fixed policy, $K_\sigma$ is as defined in {eq}`eq-ksig`, and $\tau^\sigma$ is as defined in {eq}`eq-tau`.

```{prf:lemma}
:label: l-kf

For all $n \in \NN$ and $f$ in $b\Xsf$, we have

$$
    \| K_\sigma^n f \| \leq \| f\| \cdot \sup_{x \in \Xsf} \PP_x \left\{ \tau^\sigma \geq n \right\}.
$$
```

```{prf:proof}
Fixing $f \in b\Xsf$ and $x_0 \in \Xsf$, we iterate with $K_\sigma$ and apply the triangle inequality to obtain

$$
    |(K^n_\sigma f)(x_0)|
    \leq (1-\sigma(x_0)) \int (1-\sigma(x_1)) \int \cdots \, (1-\sigma(x_{n-1})) \\
    \int |f(x_n)| P(x_{n-1}, \diff x_n) P(x_{n-2}, \diff x_{n-1}) \cdots P(x_0, \diff x_1).
$$

Hence, if $(X_t)$ is $P$-Markov and starts at $x_0$, then

$$
\begin{aligned}
    |(K^n_\sigma f)(x_0)|
    & \leq \| f\| \int \cdots \int \prod_{t=0}^{n-1} (1 - \sigma(x_t)) P(x_{n-1}, \diff x_n) P(x_{n-2}, \diff x_{n-1}) \cdots P(x_0, \diff x_1) \\
    & = \| f\| \cdot \PP_{x_0} \bigcap_{t=0}^{n-1} \, \{\sigma(X_t) = 0\}
    = \| f\| \cdot \PP_{x_0} \{\tau^\sigma \geq n\}.
\end{aligned}
$$

Taking the supremum on the right and then the left produces the bound in {prf:ref}`l-kf`. ◻
```

We will also use the following result regarding stopping times.
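Before stating it, we note that the bound in {prf:ref}`l-kf` is easy to check numerically when the state space is finite. The sketch below is illustrative only: the three-state kernel, the policy and the test function are hypothetical. It relies on the identity $\PP_x\{\tau^\sigma \geq n\} = (K_\sigma^n 1)(x)$, which is the final chain of equalities in the proof of {prf:ref}`l-kf` applied to the constant function $f \equiv 1$.

```python
# Hypothetical three-state example (illustration only).
sigma = [1, 0, 0]                  # stop in state 0, continue elsewhere
P = [[1.0, 0.0, 0.0],              # P[x][y] = P(x, {y})
     [0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5]]
f = [3.0, -2.0, 1.0]               # an arbitrary bounded function

def apply_K(sigma, P, f):
    """One application of K_sigma: (1 - sigma(x)) * sum_y f(y) P(x, y)."""
    n = len(f)
    return [(1 - sigma[x]) * sum(P[x][y] * f[y] for y in range(n))
            for x in range(n)]

def iterate_K(sigma, P, f, n):
    """Return K_sigma^n f."""
    for _ in range(n):
        f = apply_K(sigma, P, f)
    return f

def sup_norm(f):
    return max(abs(v) for v in f)

checks = []
for n in range(1, 8):
    lhs = sup_norm(iterate_K(sigma, P, f, n))
    # K_sigma^n applied to the constant 1 gives x -> P_x{tau >= n}
    survival = iterate_K(sigma, P, [1.0, 1.0, 1.0], n)
    bound = sup_norm(f) * max(survival)
    checks.append((n, lhs, bound))

for n, lhs, bound in checks:
    print(n, lhs, bound)
```

In this example the survival probabilities shrink as $n$ grows, which is the behavior the next lemma establishes in general.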
```{prf:lemma}
:label: l-ibex

If {prf:ref}`a-cap` holds, then, for all $\sigma \in \Sigma$,

$$
    \lim_{n \to \infty} \sup_{x \in \Xsf} \PP_x \{\tau^\sigma \geq n\} = 0.
$$ (eq-ibex)
```

```{prf:proof}
Let $\bar \tau$ be as in {eq}`eq-bartau`. Under {prf:ref}`a-cap`, there exists an $M < \infty$ with $\EE_x \bar \tau \leq M$ for all $x$. In addition, for any $x \in \Xsf$ and any $n > 0$, we have

$$
    \EE_x \bar \tau
    = \sum_{s=0}^{\infty} \PP_x\{\bar \tau > s\}
    \geq \sum_{s=0}^n \PP_x\{\bar \tau > s\}
    \geq n \cdot \PP_x\{\bar \tau > n\},
$$

where the last inequality holds because $\bar \tau > n$ implies $\bar \tau > s$ for all $s \leq n$. Combining these facts yields the Markov-type bound

$$
    \PP_x\{\bar \tau > n\} \leq \frac{\EE_x \bar \tau}{n} \leq \frac{M}{n} \quad \text{for all } x \in \Xsf.
$$

As a consequence,

$$
    \lim_{n \to \infty} \sup_{x \in \Xsf} \PP_x\{\bar \tau > n\} = 0.
$$ (eq-ld)

Now fix $\sigma \in \Sigma$. By {eq}`eq-sbou` we have $\tau^\sigma \leq \bar \tau$ and hence, since stopping times are integer-valued, $\{\tau^\sigma \geq n\} = \{\tau^\sigma > n - 1\} \subset \{\bar \tau > n - 1\}$ for all $n \geq 1$. Combining this with {eq}`eq-ld` yields {eq}`eq-ibex`. ◻
```

```{prf:lemma}
:label: l-con

If {prf:ref}`a-cap` holds, then, for every $\sigma \in \Sigma$, the policy operator $T_\sigma$ is asymptotically contracting on $b\Xsf$.
```

```{prf:proof}
Fixing $f, g \in b\Xsf$, using the definition of $T_\sigma$ in {eq}`eq-tsk` and the bound in {prf:ref}`l-kf` produces

$$
    \|T_\sigma^n \, f - T_\sigma^n \, g\|
    \leq \|K_\sigma^n (f - g) \|
    \leq \|f - g \| \cdot \sup_{x \in \Xsf} \PP_x \left\{ \tau^\sigma \geq n \right\}
$$

for all $n \in \NN$. The claim in {prf:ref}`l-con` now follows from {prf:ref}`l-ibex`. ◻
```

```{prf:proof}
*Proof of {prf:ref}`p-nodgs`.* Global stability of $T_\sigma$ follows from asymptotic contractiveness ({prf:ref}`l-con`), existence of a fixed point ({prf:ref}`l-tsfg`), and {prf:ref}`ex-acgs`.
◻ ``` #### Optimality We define the **minimum loss function** $\gmin$ via $$ \gmin(x) \coloneq \inf_{\sigma \in \Sigma} g_\sigma(x) \qquad (x \in \Xsf). $$ (eq-gstar) The function $\gmin$ also takes values in $[0,\infty)$ and is well-defined everywhere on $\Xsf$. The minimum loss function $\gmin$ is equal to the min-value function $\vmin$ of the ADP, as defined in {ref}`sss-mindef`. (This is because, by definition, $\vmin = \bigwedge_\sigma v_\sigma$, which is $\vmin = \bigwedge_\sigma g_\sigma$ in the current setting. This equation reduces to {eq}`eq-gstar` when working in $b\Xsf$ with the pointwise partial order.) A policy $\sigma \in \Sigma$ is called **optimal** if $g_\sigma \leq g_s$ for all $s \in \Sigma$. This is equivalent to the statement that $\sigma$ attains the minimum possible cost from every state, and is equivalent to the ADP definition in {ref}`sss-mindef`. We can now state the following result. ```{prf:theorem} :label: t-ndbk If {prf:ref}`a-cap` holds, then, for the no-discount optimal stopping ADP $(b\Xsf_+, \TT)$, 1. the fundamental min-optimality properties hold and 2. the min-VFI, min-OPI, and min-HPI algorithms all converge. ``` ```{prf:proof} We saw in {ref}`sss-noadp` that $(b\Xsf_+, \TT)$ is regular, and in {prf:ref}`p-nodgs` that $(b\Xsf_+, \TT)$ is globally stable. The ADP is min-order bounded, as well, since $T_\sigma 0 \geq 0$ for all $\sigma$. Since $b\Xsf_+$ is countably Dedekind complete, all of the conditions of {prf:ref}`t-mintspo` are satisfied. This implies the conclusions of {prf:ref}`t-ndbk`. ◻ ``` (ss-sar)= ### Sequential Analysis Revisited With {prf:ref}`t-ndbk` in hand, we are in a position to prove optimality results for the sequential analysis problem we presented in {ref}`s-wald`. First we reduce the sequential analysis problem to a special case of the no-discount optimal stopping problem treated in {prf:ref}`t-ndbk`. Then we check the conditions of that theorem. 
The only significant condition is {prf:ref}`a-cap`, so this is where we will be investing all our effort. ```{prf:remark} For simplicity, we assume below that the observations $(Z_n)$ are real-valued and that $f_0, f_1$ are densities with respect to Lebesgue measure. However, the arguments generalize without difficulty to the case where the observations take values in an arbitrary measurable space $(\Zsf, \aA)$ and the densities are defined with respect to a $\sigma$-finite reference measure $\mu$, with $\diff z$ replaced by $\mu(\diff z)$ throughout. ``` #### Set Up Our first step is to show that the hard part of the sequential analysis problem is a special case of the no-discount optimal stopping problem from {ref}`ss-nodos`. To this end, let's remind ourselves of the set up in {ref}`s-wald`. We recall that the state space for the belief state $\pi$ is $\Xsf = (0,1)$ and action space is $\Asf = \{0, 1, 2\}$, where action $0$ represents accepting $f_0$, action $1$ represents accepting $f_1$, and action $2$ represents continuing to sample. We assume that the two densities are defined on $\RR$. Repeating {eq}`eq-waldie`, the Bellman equation has the form $$ g(\pi) = \min \left\{ \pi L_0, \; (1-\pi) L_1, \; c + \int g(\pi') P(\pi, \diff \pi') \right\} $$ (eq-waldie2) for $\pi \in (0,1)$, where the stochastic kernel $P$ obeys $$ (Pg)(\pi) \coloneq \int g(\kappa(\pi, z)) \psi(\pi, z) \diff z. $$ (eq-waldsk) Here, recalling {eq}`eq-predden`, $$ \psi(\pi,z) = (1-\pi)f_0(z) + \pi f_1(z) $$ is the predictive density and $$ \kappa(\pi, z) = \frac{\pi f_1(z)}{(1-\pi) f_0(z) + \pi f_1(z)} = \frac{\pi f_1(z)}{\psi(\pi, z)} $$ is the Bayesian update rule. Together, $\psi$ and $\kappa$ define the stochastic kernel $P$ governing the belief state $(\pi_n)_{n \geq 0}$, describing how beliefs evolve from the perspective of the controller when the observation sequence $(Z_n)_{n \geq 1}$ is forecast using the predictive density. 
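The belief dynamics are simple to compute once $f_0$ and $f_1$ are specified. The following sketch is a hedged illustration, not part of the model above: it assumes, purely for concreteness, that $f_0$ and $f_1$ are the $N(0,1)$ and $N(1,1)$ densities, and the function names are our own. It implements $\psi$ and $\kappa$ and then uses a midpoint rule to evaluate $\int \kappa(\pi, z) \psi(\pi, z) \diff z$, the expected next-period belief under the kernel $P$.

```python
import math

def normal_pdf(z, mu, sd=1.0):
    """Density of N(mu, sd^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Assumption for illustration: f0 = N(0, 1) and f1 = N(1, 1).
def f0(z):
    return normal_pdf(z, 0.0)

def f1(z):
    return normal_pdf(z, 1.0)

def psi(pi, z):
    """Predictive density of the next observation given belief pi."""
    return (1 - pi) * f0(z) + pi * f1(z)

def kappa(pi, z):
    """Bayesian update of the belief pi after observing z."""
    return pi * f1(z) / psi(pi, z)

def expected_next_belief(pi, lo=-10.0, hi=11.0, m=20_000):
    """Midpoint-rule approximation of the integral of kappa * psi over z."""
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        z = lo + (i + 0.5) * h
        total += kappa(pi, z) * psi(pi, z) * h
    return total

print(kappa(0.5, 2.0))            # an observation near the mean of f1 raises the belief
print(expected_next_belief(0.3))  # close to 0.3
```

The second printed value should be close to the current belief, reflecting the fact that $\kappa(\pi, z) \psi(\pi, z) = \pi f_1(z)$ integrates to $\pi$.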
In all of what follows, the cost $c$ and the losses $L_0$ and $L_1$ are assumed to be positive constants. Our aim is to characterize and solve for optimal policies. In terms of dynamic programming, we can simplify the sequential analysis to a binary stopping problem. Our first step is to set $$ e(\pi) \coloneq \min\{\pi L_0, \; (1-\pi) L_1\}. $$ (eq-epi) Now consider the Bellman equation $$ g(\pi) = \min \left\{ e(\pi), \; c + (Pg)(\pi) \right\}. $$ (eq-waldie3) We claim that solving this dynamic program is sufficient for solving the sequential analysis problem. To see this, suppose we are able to show that the fundamental min-optimality properties hold for this dynamic program. Let the min-value function be denoted by $\gmin$. Continuing the convention that the controller always stops when indifferent between stopping and continuing, Bellman's principle of min-optimality tells the controller to stop if and only if $e(\pi) \leq c + (P\gmin)(\pi)$. In the present setting, this means that the controller stops if and only if at least one of the stopping losses $\pi L_0$ and $(1-\pi)L_1$ is less than or equal to the continuation loss $c + (P\gmin)(\pi)$. When this stop occurs, the controller then makes the static choice over the two density options (selecting $f_0$ or $f_1$) depending on which of $\pi L_0$ and $(1-\pi)L_1$ is smaller. Since this static problem is trivial, we can concentrate on solving the dynamic problem represented by the Bellman equation {eq}`eq-waldie3`. The stopping problem just described is a special case of the no-discount optimal stopping from {ref}`ss-nodos`, with $e$ as the exit cost function in {eq}`eq-epi` and the flow cost $c$ constant over the state space $\Xsf = (0,1)$. (To confirm this, compare the Bellman equation {eq}`eq-waldie3` with the general case in {eq}`eq-bell`.) As a result, {prf:ref}`t-ndbk` applies. All we need to do is check that {prf:ref}`a-cap` is valid in the current setting. 
This turns out to be true whenever $f_0$ and $f_1$ are distinct. Our proof will rely on a bound for martingale stopping times in {prf:ref}`t-exit`. #### Verifying {prf:ref}`a-cap` Let's consider verification of {prf:ref}`a-cap` in the present setting. Let $\bar \tau$ be the upper bound stopping time for the belief state (i.e., the stopping time in {eq}`eq-bartau` specialized to the current setting). This stopping time is defined in terms of the certain exit region (see {eq}`eq-cerex`), which, for our problem, is $$ \bar E \coloneq \setntn{\pi \in (0,1)}{\min\{\pi L_0, \; (1-\pi) L_1\} \leq c}. $$ Equivalently, $$ \pi \in \bar E \iff \pi \leq \frac{c}{L_0} \quad \text{or} \quad \pi \geq 1 - \frac{c}{L_1}. $$ The lower bound policy $\bar \sigma$ is (recalling its definition from {ref}`sss-lbp`) the indicator function for $\bar E$, stopping the process whenever the belief state enters the certain exit region. The upper bound stopping time is, therefore, $$ \bar \tau = \inf \left\{ n \geq 0 \,:\, \pi_n \leq \frac{c}{L_0} \text{ or } \pi_n \geq 1 - \frac{c}{L_1} \right\}. $$ (eq-btb) Here the $(\pi_n)$ process evolves according to the kernel $P$ from {eq}`eq-waldsk`: given $\pi_n$, we draw $Z_{n+1}$ independently from $\psi(\pi_n, \cdot)$, and then set $$ \pi_{n+1} = \kappa(\pi_n, Z_{n+1}) = \frac{\pi_n f_1(Z_{n+1})}{\psi(\pi_n, Z_{n+1})}. $$ (eq-bsu) (Note here that division by zero is not a concern: For $\pi \in (0,1)$, we have $\psi(\pi, z) > 0$ if and only if $f_0(z) + f_1(z) > 0$. Since $Z_{n+1}$ is drawn from $\psi(\pi, \cdot)$, we have $\psi(\pi_n, Z_{n+1}) > 0$ almost surely.) Matching {eq}`eq-ndsig`, the policy set $\Sigma$ under consideration is all $\bB$-measurable $\sigma \colon \Xsf \to \{0,1\}$ with $\bar \sigma \leq \sigma$. For $\sigma$ in this set, the policy operator takes the form $$ (T_\sigma \, g)(\pi) = \sigma(\pi) e(\pi) + (1 - \sigma(\pi)) \left[c + \int g(\pi') P(\pi, \diff \pi') \right]. 
$$ (eq-ndpoll)

Let $V$ be all bounded Borel measurable functions from $(0,1)$ to $\RR_+$. With $\TT_{\rm SA}$ as the family of operators described by {eq}`eq-ndpoll`, indexed by $\sigma \in \Sigma$, the pair $(V, \TT_{\rm SA})$ forms an ADP. For this ADP, we can state the following result. In the statement, two functions are distinct when they are not equal almost everywhere.

```{prf:proposition}
:label: p-waldop2

If $f_0$ and $f_1$ are distinct, then for the sequential analysis ADP $(V, \TT_{\rm SA})$, the fundamental min-optimality properties hold and min-VFI, min-OPI, and min-HPI all converge.
```

To prove {prf:ref}`p-waldop2`, we recognize $(V, \TT_{\rm SA})$ as a special case of the no-discount optimal stopping ADP treated in {prf:ref}`t-ndbk`. As such, we only need to verify {prf:ref}`a-cap`, which amounts to showing that $\sup_{0 < \pi < 1} \EE_\pi \bar \tau$ is finite.

As a first step, we recall that, given two probability densities $f_0$ and $f_1$ on $\RR$, the **triangular discrimination** is

$$
    \Delta(f_0, f_1) := \int \frac{[f_1(z) - f_0(z)]^2}{f_0(z) + f_1(z)} \, \diff z,
$$

where the integrand is defined to be zero when $f_0(z) + f_1(z) = 0$.

```{prf:lemma}
:label: l-delta_positive

If $f_0$ and $f_1$ are distinct, then $\Delta(f_0, f_1) > 0$.
```

```{prf:proof}
If $f_0$ and $f_1$ are distinct, then they differ on a set $E$ of positive measure. For $z \in E$ we have $f_0(z) + f_1(z) > 0$ (since otherwise both are zero, which contradicts the definition of $E$). This means that the integrand in the definition of $\Delta(f_0, f_1)$ is positive on $E$. Integration of a positive function over a set of positive measure yields a positive value. ◻
```

Now let's investigate the properties of stopping times for the process $(\pi_n)$ from {eq}`eq-bsu`. We will begin with a generic stopping time

$$
    \tau = \inf\{n \geq 0 : \pi_n \notin (a,b)\},
$$

where $a, b$ are numbers satisfying $0 < a < b < 1$.
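To get a feel for these objects, the sketch below approximates $\Delta(f_0, f_1)$ by a midpoint rule and simulates the exit time $\tau$ by drawing observations from the predictive density. It is an illustration under stated assumptions rather than part of the argument: we take $f_0$ and $f_1$ to be the $N(0,1)$ and $N(1,1)$ densities, set $a = 0.2$ and $b = 0.8$, and fix the random seed. All names are hypothetical. The constant `delta` is the quantity $a^2 (1-b)^2 \Delta(f_0, f_1)$ that appears in {prf:ref}`l-belief`.

```python
import math
import random

MU0, MU1 = 0.0, 1.0   # assumed means of f0 and f1 (unit variance)

def normal_pdf(z, mu):
    """Density of N(mu, 1) at z."""
    return math.exp(-0.5 * (z - mu) ** 2) / math.sqrt(2 * math.pi)

def triangular_discrimination(lo=-10.0, hi=11.0, m=20_000):
    """Midpoint-rule approximation of Delta(f0, f1)."""
    h = (hi - lo) / m
    total = 0.0
    for i in range(m):
        z = lo + (i + 0.5) * h
        p0, p1 = normal_pdf(z, MU0), normal_pdf(z, MU1)
        if p0 + p1 > 0:            # integrand is zero where both densities vanish
            total += (p1 - p0) ** 2 / (p0 + p1) * h
    return total

def exit_time(pi0, a, b, rng, max_steps=100_000):
    """Simulate tau = inf{n >= 0 : pi_n not in (a, b)} starting from pi0."""
    pi, n = pi0, 0
    while a < pi < b and n < max_steps:
        # Draw Z from the predictive density: f1 with prob. pi, else f0.
        mu = MU1 if rng.random() < pi else MU0
        z = rng.gauss(mu, 1.0)
        p0, p1 = normal_pdf(z, MU0), normal_pdf(z, MU1)
        pi = pi * p1 / ((1 - pi) * p0 + pi * p1)   # Bayesian update
        n += 1
    return n

rng = random.Random(0)
a, b = 0.2, 0.8
Delta = triangular_discrimination()
delta = a ** 2 * (1 - b) ** 2 * Delta
num_paths = 2_000
mean_tau = sum(exit_time(0.5, a, b, rng) for _ in range(num_paths)) / num_paths
print(Delta, 1 / delta, mean_tau)
```

In experiments of this kind the simulated mean exit time is typically far smaller than $1/\delta$, so the bound should be read as a crude but uniform guarantee rather than a sharp estimate.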
```{prf:lemma} :label: l-belief If $f_0$ and $f_1$ are distinct, then, for any initial condition $\pi_0 \in (0,1)$, we have $$ \EE_{\pi_0}[\tau] < \frac{1}{\delta}, \quad \text{where} \quad \delta := a^2 (1-b)^2 \Delta(f_0, f_1). $$ ``` ```{prf:proof} Fix $\pi_0 \in (0,1)$. If $\pi_0$ is not in $(a,b)$, then $\tau = 0$, in which case the claim is trivial. Hence, from now on, we assume that $\pi_0$ is in $(a,b)$. Note that $(\pi_n)_{n \geq 0}$ is a martingale, since $$ \EE[\pi_{n+1} \mid \pi_n = \pi] = \int \frac{\pi f_1(z)}{\psi(\pi, z)} \cdot \psi(\pi, z) \, \diff z = \int \pi f_1(z) \, \diff z = \pi. $$ We apply {prf:ref}`t-exit` to this bounded martingale. Our main task is to obtain $\delta$ in {eq}`eq-var_bound`. For fixed $\pi$, the law of motion {eq}`eq-bsu` yields $$ \EE[\pi^2_{n+1} \mid \pi_n = \pi] = \int \frac{\pi^2 f^2_1(z)}{\psi(\pi, z)} \, \diff z = \pi^2 \int \frac{f^2_1(z)}{\psi(\pi, z)} \, \diff z, $$ (As for the triangular discrimination, the integrand is defined to be zero when $\psi(\pi, z) = 0$, which occurs only when $f_0(z) = f_1(z) = 0$). Using the fact that $(\pi_n)$ is a martingale, we get $$ v(\pi) := \EE[(\pi_{n+1} - \pi_n)^2 \mid \pi_n = \pi] = \pi^2 \left[\int \frac{f^2_1(z)}{\psi(\pi, z)} \, \diff z - 1\right]. $$ To simplify the bracketed term, we observe that $$ \begin{aligned} \int \frac{[f_1(z) - \psi(\pi, z)]^2}{\psi(\pi, z)} \, \diff z &= \int \frac{f_1(z)^2}{\psi(\pi, z)} \, \diff z - 2\int f_1(z) \, \diff z + \int \psi(\pi, z) \, \diff z \\ &= \int \frac{f_1(z)^2}{\psi(\pi, z)} \, \diff z - 1. \end{aligned} $$ Using this fact, combined with $f_1(z) - \psi(\pi, z) = (1-\pi)[f_1(z) - f_0(z)]$ and the definition of $v(\pi)$, gives $$ v(\pi) = \pi^2 (1-\pi)^2 \int \frac{[f_1(z) - f_0(z)]^2}{\psi(\pi, z)} \, \diff z. $$ (eq-var_formula) Since $\psi(\pi, z) \leq f_0(z) + f_1(z)$, we have $$ \int \frac{[f_1(z) - f_0(z)]^2}{\psi(\pi, z)} \, \diff z \geq \int \frac{[f_1(z) - f_0(z)]^2}{f_0(z) + f_1(z)} \, \diff z = \Delta(f_0, f_1). 
$$ (eq-lower_bound)

Combining {eq}`eq-var_formula` and {eq}`eq-lower_bound`, we obtain, for $\pi \in [a,b]$,

$$
    v(\pi) \geq \pi^2 (1-\pi)^2 \Delta(f_0, f_1) \geq a^2 (1-b)^2 \Delta(f_0, f_1) =: \delta.
$$

By {prf:ref}`l-delta_positive`, the triangular discrimination is strictly positive, so $\delta > 0$.

Since $v(\pi) = \EE[(\pi_{n+1} - \pi_n)^2 \mid \pi_n = \pi]$, this verifies condition {eq}`eq-var_bound` for $M_n = \pi_n$ and this choice of $\delta$. Applying {prf:ref}`t-exit` and using $(\pi_\tau - \pi_0)^2 < 1$, which holds because $\pi_0 \in (a,b)$ and $\pi_\tau \in [0,1]$, we get

$$
    \EE_{\pi_0}[\tau] \leq \frac{\EE_{\pi_0}[(\pi_\tau - \pi_0)^2]}{\delta} < \frac{1}{\delta}.
$$

This verifies the claim in {prf:ref}`l-belief`. ◻
```

```{prf:proof}
*Proof of {prf:ref}`p-waldop2`.* Let $f_0$ and $f_1$ be distinct. Since $(V, \TT_{\rm SA})$ is a special case of the no-discount optimal stopping ADP treated in {prf:ref}`t-ndbk`, we only need to show that $\sup_{0 < \pi < 1} \EE_\pi \bar \tau$ is finite, where $\bar \tau$ is as defined in {eq}`eq-btb`.

If $c/L_0 + c/L_1 \geq 1$, then $\bar E = (0,1)$, so $\bar \tau = 0$ a.s. and {prf:ref}`a-cap` holds trivially. Otherwise, $a \coloneq c/L_0$ and $b \coloneq 1 - c/L_1$ satisfy $0 < a < b < 1$, and we can apply {prf:ref}`l-belief` to $\bar\tau$. This leads us to $\EE_{\pi}[\bar \tau] < 1/\delta$, where

$$
    \delta
    \coloneq \left(\frac{c}{L_0}\right)^2 \left(1 - 1 + \frac{c}{L_1}\right)^2 \Delta(f_0, f_1)
    = \left(\frac{c^2}{L_0L_1}\right)^2 \Delta(f_0, f_1).
$$

Since $f_0$ and $f_1$ are distinct, {prf:ref}`l-delta_positive` implies that $\Delta(f_0, f_1) > 0$. The bound $\EE_{\pi}[\bar \tau] < 1/\delta$ is valid for all $\pi \in (0, 1)$, confirming that {prf:ref}`a-cap` holds when $f_0$ and $f_1$ are distinct. As a result, all the claims in {prf:ref}`t-ndbk` are valid. ◻
```

(s-cn_adps2)=
## Chapter Notes

The results in this chapter extend the abstract dynamic programming framework of {prf:ref}`c-adps` by adding topological structure to the value space.
The contraction-based approach in {ref}`ss-ams` is rooted in the classical work of {cite}`denardo1967contraction` and {cite}`bertsekas2022abstract`. Earlier foundations for the operator-theoretic approach to dynamic programming include {cite}`blackwell1965` on discounted models, {cite}`strauch1966negative` on negative dynamic programming, and {cite}`bertsekas1977monotone` on monotone mappings without contraction. The order-theoretic perspective on fixed points used in {ref}`ss-adppospace` connects to {cite}`marinacci2019unique`, who establish uniqueness results for Tarski-type fixed points of monotone operators. The pospace ADP framework of this chapter is developed in {cite}`sargent2025partially`. The Q-factor representation of MDPs treated in {ref}`sss-qfmo` is used extensively in reinforcement learning, where the Bellman equation over Q-factors provides the basis for model-free algorithms. The name originates from Q-learning, introduced by {cite}`watkins1989learning`; see {cite}`watkins1992qlearning` for the convergence proof and {cite}`tsitsiklis1994asynchronous` for convergence analysis under asynchronous updates. Standard references on reinforcement learning include {cite}`bertsekas1996neuro` and {cite}`sutton2018reinforcement`. The optimal savings problem in {ref}`ss-osfirop` has a long history, originating with {cite}`brock1972optimal` in the stochastic setting; see also {cite}`stokey1989recursive`. The strongly continuous case draws on classical results for MDPs with continuous densities, while the weakly continuous case uses Berge's maximum theorem and relates to the Feller-continuity approach to MDPs developed in {cite}`hernandez2012discrete` and {cite}`hernandez2012further`. The no-discount optimal stopping framework in {ref}`ss-nodos` covers problems where contractivity fails due to the absence of discounting. 
Such problems arise in shortest path problems, where the foundational reference is {cite}`bertsekas1991stochastic`; in bandit problems, where the seminal work is {cite}`gittins1979bandit`; and in sequential sampling. For general treatments of optimal stopping theory, see {cite}`shiryaev2007optimal`, {cite}`peskir2006`, and {cite}`chow1971great`. The sequential analysis problem treated in {ref}`ss-sar` goes back to the pioneering work of {cite}`wald1947sequential`. The optimality of the sequential probability ratio test was established by {cite}`wald1948optimum`. Modern treatments and extensions of sequential analysis include {cite}`degroot1970optimal`, {cite}`siegmund1985sequential`, and {cite}`lai2001sequential`. The triangular discrimination used in our proof of {prf:ref}`l-belief` is a classical $f$-divergence; see {cite}`topsoe2000inequalities` for sharp inequalities involving this divergence. The general theory of $f$-divergences was introduced independently by {cite}`csiszar1967information` and {cite}`ali1966general`; for a modern treatment, see {cite}`liese2006divergences`. ======================================================================== ## ADPs on Banach Space Many applications of interest have some kind of algebraic structure, for example when the value space is a subset of a vector space. In this chapter, we add algebraic structure to the value space and the policy operators. This structure can be exploited to provide sharp optimality conditions and more explicit results. When introducing algebraic structure to ADP theory, we work in the setting of Banach lattices. These spaces are attractive for ADP analysis, due to well-integrated order, algebraic and metric properties. (See {ref}`sss-bldef` for background on Banach lattices. We recall that, given a Banach space $E$, $\blop(E)$ is the set of all bounded linear operators from $E$ to itself and, for $L \in \blop(E)$, the symbol $\rho(L)$ denotes the spectral radius (see {ref}`ss-pnsl`). 
We also use the notion of positive operators defined in {ref}`sss-posops`. Loosely speaking, positive operators between Banach lattices are generalizations of nonnegative matrices.) We begin by developing optimality theory for ADPs on Banach lattices, covering contractions and Blackwell's condition ({ref}`ss-contract`), order contractions ({ref}`ss-abls`), and concavity-based methods ({ref}`ss-ovscon`). We then apply the theory to firm valuation, real options, and structural estimation. (adps-on-banach-space)= ## ADPs on Banach Space In this section, $E$ is always a Banach lattice. For $V$ contained in $E$, an ADP $(V, \TT)$ will be called **additive** if each $T_\sigma \in \TT$ has the form $$ T_\sigma \, v = r_\sigma + K_\sigma \, v $$ (eq-additive) where $r_\sigma \in E$ and $K_\sigma$ is an order-preserving self-map on $V$. We study contractions and Blackwell's condition in {ref}`ss-contract`, order contractions in {ref}`ss-abls`, and concavity-based methods in {ref}`ss-ovscon`. (ss-contract)= ### Contractions In {ref}`sss-conbs` we review eventually contracting maps and Blackwell's condition. In {ref}`sss-obas` we use these tools to obtain optimality results for ADPs satisfying a Blackwell-type discounting condition. (sss-conbs)= #### Contractions in Banach Space Let $E = (E, \|\cdot\|, \leq)$ be a Banach lattice with positive cone $E_+$. Let $S$ be a self-map on a subset $V$ of $E$. Recall from {ref}`sss-conmap` that $S$ is said to be **eventually contracting** on $V$ if $S^n$ is a contraction for some $n \in \NN$; that is, there exists a $\lambda \in [0, 1)$ and an $n \in \NN$ such that $$ \|S^n u - S^n v \| \leq \lambda \|u - v\| \quad \text{for all} \quad u, v \in V, $$ (eq-ucb) The following result is an obvious consequence of {prf:ref}`t-bfpt22`. ```{prf:theorem} :label: t-ec If $S$ is eventually contracting on a closed subset $V$ of $E$, then $S$ is globally stable on $V$, with unique fixed point $v^*$. 
In addition, there exists a $\beta \in (0,1)$ such that $\|S^m v - v^*\| = \OO(\beta^m)$ as $m \to \infty$ for each $v \in V$. ``` ```{prf:proof} Since $V$ is a closed subset of the Banach space $E$, it is complete. Since $S$ is eventually contracting on $V$, there exist $\lambda \in [0, 1)$ and $n \in \NN$ such that $S^n$ is a contraction of modulus $\lambda$ on $V$. By {prf:ref}`t-bfpt22`, $S$ is globally stable on $V$ with unique fixed point $v^*$. For the convergence rate, fix $v \in V$ and write $m = qn + r$ with $0 \leq r < n$. Applying {prf:ref}`t-bfpt` to $S^n$ gives $$ \|S^m v - v^*\| = \|(S^n)^q (S^r v) - v^*\| \leq \lambda^q \|S^r v - v^*\| = \OO(\beta^m), $$ where $\beta \coloneq \lambda^{1/n}$, since $\lambda^q = \beta^{m-r}$ and $\|S^r v - v^*\|$ is bounded over $r \in \{0, \ldots, n-1\}$. ◻ ``` There is a well-known technique for testing for contractivity (or eventual contractivity) of order preserving maps via **Blackwell's condition**. This technique is often used for dynamic programming (see, e.g., {cite}`stokey1989recursive`, Theorem 3.3). Here we state an abstract version, for a self-map $S$ on $V \subset E$, focusing on contractivity in one step. (For eventual contractivity, replace $S$ with $S^n$.) In the statement, we assume that $E$ has a normalized order unit. Recalling the definition in {ref}`sss-orus`, this means that there exists an $e \in E_+$ obeying $\| e\| = 1$ and $|v| \leq \|v\| e$ for all $v \in E$. Also, we recall that $V \subset E$ is called **increasing** if $u, v \in E$ with $u \leq v$ and $u \in V$ implies $v \in V$. ```{prf:lemma} :label: l-bw0 Let $V$ be an increasing subset of $E$. Let $S$ be an order-preserving self-map on $V$. If there exists a $\lambda \in [0,1)$ such that $$ S(v + \kappa e) \leq Sv + \lambda \kappa e \quad \text{for all } v \in V \text{ and all } \kappa \in \RR_+, $$ (eq-sce) then $S$ is a contraction of modulus $\lambda$ on $V$. ``` ```{prf:proof} Let $S, V$ have the stated conditions. 
Fix $v, w \in V$. We have $$ Sv = S(w + v - w) \leq S(w + |v - w|) \leq S(w + \|v - w\| \cdot e), $$ where the inequalities follow from the monotonicity of $S$ and the properties of the order unit $e$. Applying {eq}`eq-sce` and rearranging gives $$ Sv - Sw \leq \lambda \|v - w\| \cdot e. $$ Reversing the roles of $v$ and $w$ gives $|Sv - Sw| \leq \lambda \|v - w\| \cdot e$. Since $\| \cdot \|$ is a lattice norm, we have $\|Sv - Sw\| \leq \lambda \|v-w\|$. ◻ ``` ```{exercise} :label: ex-hk {cite}`harrison1978speculative` study an operator of the form $$ (Sp)(x) = \beta \max_{i \in I} \int [p(x') + g(x')] P_i(x, \diff x') \qquad (x \in \Xsf). $$ Here $p$ is a price function, $g$ is a state-contingent cash flow and $\{P_i\}_{i \in I}$ is a finite family of stochastic kernels on a metric space $\Xsf$. We assume that $g \in b\Xsf$ and $\beta \in [0,1)$. Taking $\1$ as the unit in {prf:ref}`l-bw0`, use that lemma to show that $S$ is a contraction of modulus $\beta$ on $b\Xsf$. ``` ```{solution} ex-hk We apply {prf:ref}`l-bw0` with $E = V = b\Xsf$, $e = \1$, and $\lambda = \beta$. The set $V = b\Xsf$ is increasing and the operator $S$ is order-preserving: if $p \leq q$ pointwise, then $\int [p(x') + g(x')] P_i(x, \diff x') \leq \int [q(x') + g(x')] P_i(x, \diff x')$ for each $i$, and taking the maximum over $i$ and multiplying by $\beta \geq 0$ preserves the inequality. For the discounting condition, fix $p \in b\Xsf$ and $\kappa \geq 0$. Since each $P_i$ is a stochastic kernel, $\int \kappa \, P_i(x, \diff x') = \kappa$, so $$ S(p + \kappa\1)(x) = \beta \max_{i \in I} \int [p(x') + \kappa + g(x')] P_i(x, \diff x') = (Sp)(x) + \beta \kappa. $$ Hence $S(p + \kappa\1) \leq Sp + \beta \kappa \1$, and {prf:ref}`l-bw0` gives the result. ``` (sss-obas)= #### Optimality for Blackwell ADPs Throughout this section, $E$ is a Banach lattice containing a normalized order unit $e$ and $V$ is a closed increasing subset of $E$. 
In the statement of the next theorem, $V_0$ is a subset of $V$. ```{prf:theorem} :label: t-blackwell Let $(V, \TT)$ be an ADP. Suppose that there exists a $\lambda \in [0,1)$ such that $$ T_\sigma \, (v + \kappa e) \leq T_\sigma \, v + \lambda \kappa e $$ (eq-badp) for all $v \in V$, all $\kappa \in \RR_+$ and all $\sigma \in \Sigma$. If, in addition, $(V, \TT)$ is semi-regular on $V_0$ and $V_0$ is closed in $V$, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $V_0$, and 3. VFI converges geometrically on $V_0$. If $(V, \TT)$ is also regular, then OPI and HPI converge. ``` ```{prf:proof} Since $E$ has a normalized order unit, the metric induced by the norm on $E$ is sup-nonexpansive ({prf:ref}`p-blsn`). Because each $T_\sigma$ is order preserving and obeys {eq}`eq-badp`, {prf:ref}`l-bw0` implies that all policy operators are contractions of modulus $\lambda$ on $V$. The stated results now follow from {prf:ref}`t-contract`. ◻ ``` A self-map $M$ on $V$ will be called a **certainty equivalent operator** if $M$ is order preserving and **translation invariant**; that is, $$ M(v + \kappa e) = Mv + \kappa e \quad \text{for all } v \in V \text{ and all } \kappa \in \RR_+. $$ (The terminology comes from the abstract theory of certainty equivalents. We cover these objects in more detail in {ref}`sss-ces`.) The next exercise treats a special case in which the policy operators of an additive ADP (see {eq}`eq-additive`) are built from certainty equivalent operators. As before, $V$ is a closed increasing subset of $E$ and $V_0$ is a subset of $V$. ```{exercise} :label: ex-monetary Let $(V, \TT)$ be an ADP where, for each $\sigma \in \Sigma$, the policy operator has the form $$ T_\sigma \, v = r_\sigma + \beta M_\sigma \, v $$ (eq-mts) for some certainty equivalent operator $M_\sigma$ and some $\beta \in \RR_+$ with $\beta < 1$.
Prove the following: If $(V, \TT)$ is semi-regular on $V_0$ and $V_0$ is closed in $V$, then (i)--(iii) in {prf:ref}`t-blackwell` hold. If $(V, \TT)$ is also regular, then OPI and HPI converge. ``` ```{solution} ex-monetary Fix $v$ and $\kappa \in \RR_+$. Since $M_\sigma$ is a certainty equivalent operator, we have $$ T_\sigma \, ( v + \kappa e) = r_\sigma + \beta M_\sigma \, ( v + \kappa e) = r_\sigma + \beta M_\sigma \, v + \beta \kappa e = T_\sigma \, v + \beta \kappa e. $$ The claims now follow from {prf:ref}`t-blackwell`. ``` (ss-abls)= ### Order Contractions We first define order contractions and establish fixed point results. We then obtain optimality conditions for order contracting ADPs and specialize to the case of affine policy operators in {ref}`sss-affa`. (order-contractions)= #### Order Contractions Let $E = (E, \|\cdot\|, \leq)$ be a Banach lattice with positive cone $E_+$. We call $D \colon E_+ \to E_+$ a **discount operator** on $E$ if 1. $D0 = 0$, 2. $D$ is order-preserving, and 3. $D$ is eventually contracting. The reason we call such an operator $D$ a discount operator will become clearer below. Intuitively, $D$ is eventually contracting and has a fixed point at zero, so $D^n h \to 0$ for all $h \in E_+$. If we think of $h$ as a time-$n$ payoff and $D^n h$ as its (state-contingent) present value, then the properties of $D$ imply that the present value is increasing in the payoff (since $D$ is order-preserving) and the present value converges to zero as the future date of the payoff moves to the infinite future. ```{exercise} :label: ex-do Show that if $D$ is a discount operator on $E$, then there exist $\lambda \in [0,1)$ and $n \in \NN$ such that $\| D^n h \| \leq \lambda \| h \|$ for all $h \in E_+$. ``` ```{solution} ex-do Since $D$ is a discount operator, $D 0 = 0$ and $D$ is eventually contracting: there exist $\lambda \in [0,1)$ and $n \in \NN$ such that $\|D^n u - D^n v\| \leq \lambda \|u - v\|$ for all $u, v \in E_+$. 
Setting $v = 0$ gives $\|D^n h\| \leq \lambda \|h\|$ for all $h \in E_+$. ``` ```{prf:example} :label: eg-oclc0 Every positive linear operator $D$ on $E$ satisfying $\rho(D) < 1$ is a discount operator when viewed as a map on $E_+$. Linearity implies that $D0=0$. Positivity of $D$ implies that $D$ is order-preserving on $E_+$ and, for any $h \in E_+$, we have $\| D^n h \| \leq \|D^n \| \|h\|$. In addition, $\rho(D) < 1$ yields $\|D^n \| < 1$ for some $n$ in $\NN$ ({prf:ref}`ex-igf`), so $D$ is eventually contracting. ``` Fix $V \subset E$. Let $S$ be a self-map on $V$. We call $S$ **an order contraction of modulus $D$** on $V$ if there exists a discount operator $D$ on $E$ such that $$ |S \,v - S \, w| \leq D |v - w| \quad \text{for all} \quad v, w \in V. $$ (eq-oc) Order contracting maps obey the following fixed point result. ```{prf:theorem} :label: t-evconbl If $V$ is closed and $S$ is order contracting on $V$, then $S$ is globally stable on $V$, with unique fixed point $v^*$. In addition, there exists a $\beta \in (0,1)$ such that $\|S^m v - v^*\| = \OO(\beta^m)$ as $m \to \infty$ for each $v \in V$. ``` ```{prf:proof} Let the stated conditions hold. In particular, $S$ is order contracting on $V$ with respect to some discount operator $D$. Since $|S \,v - S \, w| \leq D |v - w|$ and $D$ is order preserving, iterating gives $|S^m v - S^m w| \leq D^m | v - w |$ for all $m \in \NN$. Because $D$ is a discount operator, {prf:ref}`ex-do` gives $\lambda \in [0,1)$ and $n \in \NN$ with $\| D^n h \| \leq \lambda \| h \|$ for all $h \in E_+$. Since the norm is a lattice norm (see {ref}`sss-bldef`), setting $m = n$ gives $$ \| S^n v - S^n w \| \leq \| D^n |v - w |\| \leq \lambda \| v - w \|. $$ Hence $S^n$ is a contraction and the claims follow from {prf:ref}`t-ec`. 
◻ ``` ```{prf:example} :label: eg-oclc If $S$ is a self-map on $V$ and $D$ is a positive linear operator on $E$ satisfying $\rho(D) < 1$ and $$ |S \,v - S \, w| \leq |D v - D w| \quad \text{for all } v, w \in V, $$ (eq-iboc) then $D$ is a discount operator and $S$ is an order contraction of modulus $D$ on $V$. Indeed, from positivity of $D$ we have $|D v - D w| \leq D|v - w|$, so {eq}`eq-oc` holds. {prf:ref}`eg-oclc0` showed that $D$ is a discount operator. ``` The next exercise states a variation on Blackwell's condition that is suitable for order contractions. To begin, we take $V$ to be a subset of $E$ such that $$ v + h \in V \text{ whenever } v \in V \text{ and } h \in E_+. $$ ```{exercise} :label: ex-black Let $V$ have the stated property and let $S$ be an order-preserving self-map on $V$. Prove the following: if there exists a discount operator $D$ such that $$ S (v + h) \leq S \, v + D h \quad \text{for all} \quad v\in V \text{ and } h \in E_+, $$ (eq-bec) then $S$ is an order contraction of modulus $D$ on $V$. ``` ```{solution} ex-black Fixing $v, w \in V$, we have $$ S \, v = S (w + v - w) \leq S (w + |v - w|) \leq S \,w + D |v - w|. $$ Rearranging gives $S \, v - S \, w \leq D|v-w|$. Reversing the roles of $v$ and $w$ yields {eq}`eq-oc`. ``` #### Order Contracting ADPs Next we study ADPs where the value space $V$ is a subset of a Banach lattice $E$. The results in this section extend to ADPs where the operators are order contractions. We do not require existence of a normalized order unit, as in {ref}`sss-obas`. This will allow us to handle ADPs that evolve in $L_p$ spaces and simplify treatment of features such as state-dependent discounting. Our first result treats the case where $\TT$ is finite. ```{prf:theorem} :label: t-banach_bkf Let $(V, \TT)$ be regular and let $V$ be closed in $E$. If $\TT$ is finite and each $T_\sigma$ is an order contraction on $V$, then 1. the fundamental optimality properties hold, and 2. VFI, OPI and HPI all converge. 
``` ```{prf:proof} Let $(V, \TT)$ have the stated properties. Each $T_\sigma$ is order contracting and hence globally stable (by {prf:ref}`t-evconbl`). Since $\TT$ is finite, the stated claims follow from {prf:ref}`c-pospace`. ◻ ``` Our second result replaces finiteness with uniform order contractivity. In the statement of the next theorem, $V_0$ is a subset of $V$ and $V$ is closed in $E$. ```{prf:theorem} :label: t-banach_bk Let $(V, \TT)$ be an ADP. Suppose there exists a discount operator $D$ such that each $T_\sigma \in \TT$ is an order contraction of modulus $D$. If, in addition, $(V, \TT)$ is semi-regular on $V_0$ and $V_0$ is closed in $V$, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $V_0$, and 3. VFI converges geometrically on $V_0$. If, in addition, $(V, \TT)$ is regular, then OPI and HPI also converge. ``` The proof is straightforward: ```{prf:proof} Let $(V, \TT)$ have the stated properties. Each $T_\sigma$ is order contracting with respect to $D$ on the closed set $V$ and therefore globally stable (by {prf:ref}`t-evconbl`). Hence the ADP is globally stable, and therefore order stable ({prf:ref}`l-gsios`). Regarding the Bellman operator, observe that, for $v, w \in V_0$, $$ T_\sigma \, v = T_\sigma \, w + T_\sigma \, v - T_\sigma \, w \leq T_\sigma \, w + |T_\sigma \, v - T_\sigma \, w| \leq T w + D | v - w|, $$ (eq-pett) where the last step uses $T_\sigma \leq T$ on $V_0$ and the order contraction bound. Taking the supremum over $\sigma$ gives $Tv - Tw \leq D|v-w|$. Reversing the roles of $v$ and $w$ yields $|Tv - Tw| \leq D|v-w|$. Hence $T$ is order contracting of modulus $D$ on $V_0$. Since $V_0$ is closed in $V$ and $V$ is closed in $E$, the set $V_0$ is closed in $E$. Applying {prf:ref}`t-evconbl` to $T$ on $V_0$, we see that $T$ has a unique fixed point in $V_0 \subset V_G$. Hence, by {prf:ref}`c-bk`, the fundamental optimality properties hold. 
Because, under those properties, $\vmax$ is the unique fixed point of $T$ in $V_G$, and because $T$ has a fixed point in $V_0 \subset V_G$, we see that $\vmax \in V_0$. This proves claims (i)--(ii). For (iii), {prf:ref}`t-evconbl` applied to $T$ on $V_0$ also gives $\|T^m v - \vmax\| = \OO(\beta^m)$ for each $v \in V_0$, which is geometric convergence of VFI. Convergence of OPI and HPI under regularity follows from {prf:ref}`t-pospace`. ◻ ``` (sss-affa)= #### Order Contractive Linear Models Next we focus on ADPs operating in Banach lattices where the policy operators are affine. As before, $E$ is a Banach lattice with positive cone $E_+$, linear operators $\blop(E)$ and positive linear operators $\blop_+(E)$. ```{prf:theorem} :label: t-affineban_f Let $(V, \TT)$ be a regular additive ADP, where $V$ is closed in $E$ and each $T_\sigma \in \TT$ has the form $$ T_\sigma \, v = r_\sigma + K_\sigma \, v \quad \text{for some } r_\sigma \in E \text{ and } K_\sigma \in \blop_+(E). $$ If $\TT$ is finite and $\rho(K_\sigma) < 1$ for all $\sigma \in \Sigma$, then 1. the fundamental optimality properties hold, and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} Fix $\sigma \in \Sigma$. Since $T_\sigma$ is affine with slope $K_\sigma$, we have $|T_\sigma \, v - T_\sigma \, w| = |K_\sigma v - K_\sigma w|$ for all $v, w \in V$. Since $\rho(K_\sigma) < 1$, {prf:ref}`eg-oclc` implies that $K_\sigma$ is a discount operator and $T_\sigma$ is order contracting with modulus $K_\sigma$. Since $\TT$ is finite, the claims follow from {prf:ref}`t-banach_bkf`. ◻ ``` Our second result replaces finiteness with a uniform discount operator bound. In the statement, $V_0$ is a subset of $V$ and $V$ is closed in $E$. ```{prf:theorem} :label: t-affineban_sr Let $(V, \TT)$ be an additive ADP where each $T_\sigma \in \TT$ has the form $$ T_\sigma \, v = r_\sigma + K_\sigma \, v \quad \text{for some } r_\sigma \in E \text{ and } K_\sigma \in \blop_+(E). 
$$ Suppose there exists a discount operator $D$ on $E$ such that $K_\sigma \leq D$ on $E_+$ for all $\sigma \in \Sigma$. If, in addition, $(V, \TT)$ is semi-regular on $V_0$ and $V_0$ is closed in $V$, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $V_0$, and 3. VFI converges geometrically on $V_0$. If $(V, \TT)$ is also regular, then OPI and HPI converge. ``` ```{prf:proof} Fixing $\sigma \in \Sigma$ and $v, w \in V$, we have $$ |T_\sigma \, v - T_\sigma \, w| = |K_\sigma \, v - K_\sigma \, w| \leq K_\sigma \, | v - w| \leq D \, | v - w|. $$ Hence each $T_\sigma$ is an order contraction of modulus $D$, and the claims follow from {prf:ref}`t-banach_bk`. ◻ ``` (ss-ovscon)= ### Concavity and Convexity Some dynamic programs involve nonlinear policy operators that fail to be contractions. For example, models with recursive preferences or ambiguity often have these features. In this setting, we can deploy alternative fixed point results related to concavity and convexity of operators. Here we apply such results to obtain optimality conditions for ADPs. We first state Du's fixed point theorem for concave and convex operators and then apply it to obtain optimality conditions for ADPs on order intervals. #### Fixed Point Results Let $E = (E, \|\cdot\|, \leq)$ be a Banach lattice and suppose that $V = [a, b]$ for some $a, b \in E$. In this setting, we say that a self-map $S$ on $V$ satisfies **Du's conditions** if either 1. $S$ is concave and $S \, a \geq a + \epsilon (b-a)$ for some $\epsilon \in (0,1)$, or 2. $S$ is convex and $S \, b \leq b - \epsilon (b-a)$ for some $\epsilon \in (0,1)$. We state a result below connecting Du's conditions to global stability. Before doing so, we note some useful sufficient conditions that can be used when the positive cone of the Banach lattice has nonempty interior. To state them we write $x \ll y$ if $y - x$ is interior to $E_+$. 
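To make Du's conditions concrete, here is a minimal one-dimensional sketch in Python. The map $S x = 1 + \sqrt{x}$ on $V = [0, 4]$ is a hypothetical example chosen purely for illustration; it does not come from the text. It is order preserving and concave, maps $[0, 4]$ into itself, and satisfies $S a = 1 \geq a + \epsilon (b - a)$ with $\epsilon = 1/4$, so the concave branch of Du's conditions holds. (In $\RR$ the interior of the positive cone is $(0, \infty)$, so this is also an instance of $a \ll S a$.) Solving $x = 1 + \sqrt{x}$ by hand gives the fixed point $(3 + \sqrt{5})/2$, and iterates started at either endpoint should approach it.

```python
import math

a, b = 0.0, 4.0   # order interval V = [a, b] (illustrative choice)

def S(x):
    """Hypothetical concave, increasing self-map on [a, b]."""
    return 1.0 + math.sqrt(x)

# Concave branch of Du's conditions: S a >= a + eps * (b - a), eps in (0, 1)
eps = 0.25
du_condition_holds = S(a) >= a + eps * (b - a)

def iterate(x, num_steps=100):
    """Apply S repeatedly, starting from x."""
    for _ in range(num_steps):
        x = S(x)
    return x

from_below = iterate(a)   # iterates started at the bottom of the interval
from_above = iterate(b)   # iterates started at the top of the interval
fixed_point = (3.0 + math.sqrt(5.0)) / 2.0   # solves x = 1 + sqrt(x)

print(du_condition_holds, from_below, from_above, fixed_point)
```

Both starting points are used because global stability on $[a, b]$ requires convergence from every initial condition, and, since $S$ is order preserving, the iterates from $a$ and $b$ bracket the iterates from any other point in the interval.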
```{prf:lemma} :label: l-riesz_con If $S$ is an order-preserving self-map on $V$ and either 1. $S$ is concave and $a \ll S \, a$ or 2. $S$ is convex and $S \, b \ll b$ then $S$ satisfies Du's conditions. ``` ```{prf:proof} If (a') holds, then $S \, a - a$ is interior to the positive cone $E_+$, so we can take a positive $\epsilon$ such that $S \, a - a \geq \epsilon (b-a)$. Hence (a) holds. The proof of (b) is similar. ◻ ``` Here is the stability result. ```{prf:theorem} :label: t-du Let $V = [a,b]$ and let $S$ be an order-preserving self-map on $V$. If $S$ satisfies Du's conditions, then $S$ is globally stable on $V$. ``` For a proof consult {cite}`du1990fixed` or {cite}`zhang2012variational`, Theorem 2.1.2. ```{figure} figures/du_conditions.svg :name: f-du_conditions Du's conditions in one dimension ``` #### Optimality Theory The following result provides optimality conditions for ADPs where the policy operators satisfy Du's conditions. ```{prf:theorem} :label: t-riesz_con Let $(V, \TT)$ be a regular ADP where $V = [a, b] \subset E$ and each $T_\sigma \in \TT$ satisfies Du's conditions. If either 1. $E$ is countably Dedekind complete, 2. $\TT$ is finite, or 3. the Bellman operator $T$ also satisfies Du's conditions, then 1. the fundamental optimality properties hold, and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} Let $(V, \TT)$ be as stated. By {prf:ref}`t-du`, Du's conditions are sufficient for global stability of each policy operator $T_\sigma$ on $V$, so $(V, \TT)$ is globally stable. If (a) holds, then, since $(V, \TT)$ is also regular and $V$ is bounded above, claims (i) and (ii) are implied by {prf:ref}`t-tspo`. If (b) holds, then, since $\TT$ is finite, the stated claims follow from {prf:ref}`c-pospace`. If (c) holds, then, since $T$ has a fixed point in $V$, the claims follow from {prf:ref}`t-pospace`. ◻ ``` ## Applications We apply the theory developed above to three classes of problems. 
In {ref}`ss-firmex` we study firm valuation under constant discounting, unbounded rewards, and state-dependent discounting. In {ref}`ss-ro` we analyze a real option problem. In {ref}`ss-struct` we treat structural estimation models, including state-dependent discounting and non-expected utility preferences. (ss-firmex)= ### Firm Valuation Let's now return to the firm valuation problem from {ref}`s-fpintro` and discuss optimality results. We will look at the original bounded case with discounting, a second case with unbounded rewards (boundedness of the profit function is replaced by an integrability condition), and a third case involving state-dependent discounting. (sss-condi)= #### Constant Discounting In {ref}`sss-firmas`, we revisited the firm problem from {ref}`s-fpintro` and showed that $(b\Xsf, \TT_{\rm FV})$ is an ADP, where $\TT_{\rm FV} \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ is the set of all policy operators having the form $$ T_\sigma \, v = \sigma s + (1 - \sigma) \left( \pi + \beta Pv \right). $$ (eq-fintrots2o) (Here $\sigma s$ is understood as the map $x \mapsto s\sigma(x)$ and so on.) We recall that the policy set $\Sigma$ is all $\bB$-measurable functions mapping $\Xsf$ to $\{0,1\}$, which coincides with the set of indicator functions on $\bB$, and that the indicator function $$ \sigma = \1\left\{s \geq \pi + \beta Pv \right\} $$ (eq-sigfv) is $v$-greedy. Existence of $v$-greedy policies for all $v$ implies that the firm ADP is regular. In {prf:ref}`t-fintroop` we stated optimality results for the firm valuation problem. In {ref}`sss-fintropp` we supplied a direct proof. Here's a more general result and a proof using the theory from this chapter: ```{prf:proposition} :label: p-fvg For the firm valuation ADP $(b\Xsf, \TT_{\rm FV})$, the fundamental optimality properties hold and VFI, HPI, and OPI all converge. ``` ```{prf:proof} We apply {prf:ref}`t-blackwell` with $e = \1$ and $\lambda = \beta$. 
Writing $T_\sigma \, v = r_\sigma + K_\sigma \, v$ where $$ r_\sigma \coloneq \sigma s + (1 - \sigma) \pi \quad \text{and} \quad K_\sigma \coloneq (1-\sigma) \beta P, $$ (eq-rsks) and using $P \1 = \1$, we have $T_\sigma(v + \kappa \1) = T_\sigma \, v + (1-\sigma) \beta \kappa \1 \leq T_\sigma \, v + \beta \kappa \1$ for all $v \in b\Xsf$ and $\kappa \in \RR_+$. As $(b\Xsf, \TT_{\rm FV})$ is regular, the conclusions of {prf:ref}`t-blackwell` apply. ◻ ``` As usual, the ADP Bellman operator $T$ obeys $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$-greedy, so, using the description of greedy policies above, we have $T v = s \vee (\pi + \beta P v)$. The Bellman equation is therefore $v = s \vee (\pi + \beta P v)$, which, in expanded form, becomes $$ v(x) = \max \left\{ s, \; \pi(x) + \beta \int v(x') P(x, \diff x') \right\} $$ (eq-fvmodbel0) {prf:ref}`p-fvg` implies that the value function $\vmax \coloneq \bigvee_\sigma v_\sigma$ solves the Bellman equation {eq}`eq-fvmodbel0`, and that it can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are $\vmax$-greedy, so $\sigma = \1\{s \geq \pi + \beta P \vmax\}$ is optimal. This is the only optimal policy under the convention that the manager always sells the firm when indifferent between selling and continuing. ```{figure} figures/profits_hpi.pdf :name: f-profits_hpi :width: 95% HPI iterates for the firm valuation problem ``` {numref}`f-profits_hpi` illustrates HPI applied to the firm valuation problem using the same parameters as {numref}`f-profits_optimal`. The initial policy is $\sigma_0 \equiv 0$ (never sell). At each step $k$, the lifetime value $v_{\sigma_k}$ is obtained by solving the linear system $v = r_{\sigma_k} + K_{\sigma_k} v$, where $r_{\sigma_k}$ and $K_{\sigma_k}$ are as defined in {eq}`eq-rsks`. The next policy $\sigma_{k+1}$ is then set to be $v_{\sigma_k}$-greedy. 
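The HPI steps just described can be sketched for a finite state space as follows. This is a minimal illustration under stated assumptions, not the code behind the figure: the three-state kernel `P`, the profit vector `pi`, the scrap value `s` and the discount factor `beta` are hypothetical values, and the helper names (`solve`, `continuation`, `lifetime_value`, `greedy`, `hpi`) are introduced here only for the example. The sketch uses the Python standard library alone so that it is self-contained; a numerical library would normally handle the linear solve.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def continuation(v, pi, beta, P):
    """Continuation value pi + beta * P v, state by state."""
    n = len(pi)
    return [pi[i] + beta * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)]

def lifetime_value(sigma, s, pi, beta, P):
    """Solve v = r_sigma + K_sigma v with r_sigma, K_sigma as in eq-rsks."""
    n = len(pi)
    r = [sigma[i] * s + (1 - sigma[i]) * pi[i] for i in range(n)]
    A = [[(1.0 if i == j else 0.0) - (1 - sigma[i]) * beta * P[i][j]
          for j in range(n)] for i in range(n)]
    return solve(A, r)

def greedy(v, s, pi, beta, P):
    """v-greedy policy: sell (1) when s >= pi + beta * P v."""
    return [1 if s >= c else 0 for c in continuation(v, pi, beta, P)]

def hpi(s, pi, beta, P, max_iter=100):
    sigma = [0] * len(pi)          # initial policy: never sell
    values = []
    for _ in range(max_iter):
        v = lifetime_value(sigma, s, pi, beta, P)
        values.append(v)
        new_sigma = greedy(v, s, pi, beta, P)
        if new_sigma == sigma:     # policy is greedy for its own value
            break
        sigma = new_sigma
    return sigma, v, values

# Hypothetical three-state example (all numbers are assumptions)
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
pi = [-1.0, 1.0, 3.0]
s, beta = 12.0, 0.9

sigma_star, v_star, values = hpi(s, pi, beta, P)
print(sigma_star, v_star, len(values))
```

The loop stops once a policy is greedy with respect to its own lifetime value, at which point that value solves the Bellman equation $v = s \vee (\pi + \beta P v)$. The list `values` stores the successive lifetime values $v_{\sigma_k}$, which play the same role in this toy example as the value iterates plotted in {numref}`f-profits_hpi`.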
In the right panel of {numref}`f-profits_hpi`, the value functions rise monotonically towards $\vmax$ (shown in black). The left panel shows the corresponding sell thresholds converging to the optimal policy $\sigopt$. (sss-fvur)= #### Unbounded Rewards {prf:ref}`p-fvg` used the assumptions in {prf:ref}`t-fintroop`, one of which was that the profit function $\pi$ is bounded. In practice, many of the functions that we use for applied modeling are unbounded. In {ref}`sss-urs` we sketched ideas for extending the optimality results to an unbounded setting, where $\pi$ is assumed instead to be integrable. Let's continue that discussion here. Instead of boundedness, we assume $\pi$ lies in $L_1(\psi) \coloneq L_1(\Xsf, \bB, \psi)$ for a distribution $\psi$ on $(\Xsf, \bB)$ that is stationary for $P$ (see {ref}`sss-moif`). We endow $L_1(\psi)$ with the almost everywhere pointwise partial order (see {ref}`sss-lpipo`). In particular, for $f, g \in L_1(\psi)$, the statement $f \leq g$ means that $\setntn{x \in \Xsf}{f(x) > g(x)}$ has $\psi$-measure zero. The firm valuation ADP is now $(L_1(\psi), \TT_{\rm FV})$, where each $T_\sigma$ in $\TT_{\rm FV}$ again has the form {eq}`eq-fintrots2o`, but now with $v$ and $\pi$ being elements of $L_1(\psi)$, while $P$ is understood as a Markov operator on $L_1(\psi)$. In {ref}`sss-urs` we showed that each policy operator $T_\sigma$ maps $L_1(\psi)$ into itself. Moreover, $T_\sigma$ is order preserving with respect to $\leq$: if $v \leq w$ holds $\psi$-almost everywhere, then, since $\psi$ is stationary for $P$, $$ \int v(x') P(x, \diff x') \leq \int w(x') P(x, \diff x') $$ for $\psi$-almost all $x \in \Xsf$, and $T_\sigma \, v \leq T_\sigma \, w$ easily follows. This confirms that $(L_1(\psi), \TT_{\rm FV})$ is an ADP. The ADP is regular because, given $v \in L_1(\psi)$, the indicator in {eq}`eq-sigfv` is still $v$-greedy. The claims in {prf:ref}`p-fvg` extend to the ADP $(L_1(\psi), \TT_{\rm FV})$. The proof follows from {prf:ref}`t-affineban_sr`.
Each $T_\sigma$ has the required form $T_\sigma \, v = r_\sigma + K_\sigma \, v$ for $r_\sigma \in L_1(\psi)$ and $K_\sigma \in \blop_+(L_1(\psi))$; we again set $r_\sigma$ and $K_\sigma$ as in {eq}`eq-rsks`. With $K \coloneq \beta P$, we have $K_\sigma \leq K$ for all $\sigma$ and, by {prf:ref}`l-mopfpl`, $\rho(K) = \rho(\beta P) = \beta < 1$. Since $K$ is a positive linear operator with spectral radius less than one, it is a discount operator ({prf:ref}`eg-oclc0`). As $(L_1(\psi), \TT_{\rm FV})$ is regular, the conclusions of {prf:ref}`t-affineban_sr` apply. (sss-fvsd)= #### State-Dependent Discounting We emphasized the importance of time-varying discount rates in {ref}`sss-bcdr`. To incorporate such variation into our model, we set $r_t = r(X_t)$, where $r$ is a $\bB$-measurable function, and define $\beta(x) = 1/(1+r(x))$ for all $x \in \Xsf$. We require that $r > -1$, so that $\beta > 0$. For simplicity, we also assume that $\beta$ is bounded and, as in {ref}`sss-condi`, that the profit function $\pi$ is bounded. At current state $x$, a fixed policy $\sigma$ yields lifetime firm value $v_\sigma(x)$, where $v_\sigma$ satisfies the recursion $$ v_\sigma(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta(x) \int v_\sigma(x') P(x, \diff x') \right] \quad (x \in \Xsf). $$ Equivalently, $v_\sigma$ is a fixed point of $T_\sigma$ defined at $v \in b\Xsf$ by $$ T_\sigma \, v = \sigma s + (1-\sigma)(\pi + Kv), $$ (eq-fppol0) where $$ (Kv)(x) \coloneq \beta(x) \int v(x') P(x, \diff x') \qquad (v \in b\Xsf, \; x \in \Xsf). $$ The operator $K$ discounts future cash flows given the dynamics of discounting embedded in $\beta$ and $P$. Since $\beta$ is bounded, $K$ is a positive linear operator sending $b\Xsf$ into itself. It follows that each $T_\sigma$ is an order-preserving self-map on $b\Xsf$. Hence, letting $\TT_{\rm FV}$ be all policy operators of the form {eq}`eq-fppol0`, with $\sigma$ ranging over the policy set $\Sigma$, the pair $(b\Xsf, \TT_{\rm FV})$ is an ADP. It represents the dynamic decision problem for the firm under state-dependent discounting.
Using similar arguments to the constant discounting case, we can confirm that the policy $\sigma = \1\{s \geq \pi + Kv\}$ is $v$-greedy. The ADP Bellman operator $T$ obeys $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$-greedy, so, using the description of greedy policies just given, we have $T v = s \vee (\pi + K v)$. The Bellman equation is therefore $v = s \vee (\pi + K v)$, which, in expanded form, becomes $$ v(x) = \max \left\{ s, \; \pi(x) + \beta(x) \int v(x') P(x, \diff x') \right\} $$ (eq-fvmodbel) Since greedy policies always exist, the ADP $(b\Xsf, \TT_{\rm FV})$ is regular. In order to obtain optimality results, we need some degree of stability for the ADP. To this end, we impose the following condition: ```{prf:assumption} :label: a-firms0 The spectral radius of the operator $K$ obeys $\rho(K)<1$. ``` {prf:ref}`a-firms0` requires that the discount factor is sufficiently small "on average," so that lifetime values are finite and uniquely defined. To illustrate the factors that influence $\rho(K)$, suppose that $(X_t)$ follows the AR(1) process $X_{t+1} = \mu(1-\alpha) + \alpha X_t + \nu Z_{t+1}$, with $Z_t$ iid standard normal, $0 < \alpha < 1$, and $\nu > 0$, discretized via Tauchen's method, with discount factor $\beta(x) = e^x$. {numref}`f-sdd_rho_K` shows contour plots of $\rho(K)$ as $\alpha$ and $\nu$ vary, for two values of the long-run mean $\mu$. The black line marks the boundary $\rho(K) = 1$; {prf:ref}`a-firms0` holds below and to the left of this line. A more negative $\mu$ (lower average discount factor) enlarges the stable region. In both panels, $\rho(K)$ increases with persistence and volatility, reflecting the fact that greater variation in the state process pushes the "average" discount factor upward. 
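The computation of $\rho(K)$ in this AR(1) setting can be sketched as follows. This is a rough illustration, not the code used for the figure: the grid size `n = 25`, the grid width `m = 3` standard deviations, and the parameter values are all assumptions, the Tauchen discretization is written out by hand, and the spectral radius is approximated by power iteration (which is adequate here because the discretized $K$ is a nonnegative matrix with positive diagonal and positive transitions between neighboring states). Only the Python standard library is used.

```python
import math

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tauchen(n, mu, alpha, nu, m=3.0):
    """Discretize X' = mu (1 - alpha) + alpha X + nu Z on an n-point grid."""
    sd = nu / math.sqrt(1.0 - alpha**2)      # stationary std deviation
    lo = mu - m * sd
    d = 2.0 * m * sd / (n - 1)               # grid step
    x = [lo + i * d for i in range(n)]
    P = []
    for xi in x:
        mean = mu * (1.0 - alpha) + alpha * xi
        row = []
        for j, xj in enumerate(x):
            upper = norm_cdf((xj + d / 2 - mean) / nu)
            lower = norm_cdf((xj - d / 2 - mean) / nu)
            if j == 0:
                row.append(upper)            # all mass below first midpoint
            elif j == n - 1:
                row.append(1.0 - lower)      # all mass above last midpoint
            else:
                row.append(upper - lower)
        P.append(row)
    return x, P

def spectral_radius(K, tol=1e-12, max_iter=10_000):
    """Power iteration for a nonnegative matrix K."""
    n = len(K)
    v = [1.0] * n
    rho = 0.0
    for _ in range(max_iter):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_rho = max(w)
        if new_rho == 0.0:
            return 0.0
        v = [wi / new_rho for wi in w]
        if abs(new_rho - rho) < tol:
            return new_rho
        rho = new_rho
    return rho

def rho_K(mu, alpha, nu, beta=math.exp, n=25):
    """Spectral radius of (K v)(x) = beta(x) * sum_x' v(x') P(x, x')."""
    x, P = tauchen(n, mu, alpha, nu)
    K = [[beta(x[i]) * P[i][j] for j in range(n)] for i in range(n)]
    return spectral_radius(K)

# Illustrative parameter values (assumptions, not those of the figure)
x, P = tauchen(25, -0.05, 0.9, 0.01)
rho_base = rho_K(-0.05, 0.9, 0.01)
rho_low_mu = rho_K(-0.10, 0.9, 0.01)
print(rho_base, rho_low_mu)
```

In this discretization the grid is centered on $\mu$ and the transition probabilities depend only on deviations from $\mu$, so with $\beta(x) = e^x$ a shift of $\mu$ by $\delta$ rescales $K$, and hence $\rho(K)$, by $e^\delta$. This is one way to see why a more negative $\mu$ enlarges the region where {prf:ref}`a-firms0` holds.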
```{figure} figures/sdd_rho_K.pdf :name: f-sdd_rho_K :width: 95% Spectral radius $\rho(K)$ as a function of the AR(1) parameters, with $\beta(x) = e^x$ ``` We can now state the following result for the firm valuation ADP $(b\Xsf, \TT_{\rm FV})$ under state-dependent discounting. ```{prf:proposition} :label: p-fvmodopt If {prf:ref}`a-firms0` holds, then the fundamental optimality properties hold and VFI, HPI, and OPI all converge. ``` ```{exercise} :label: ex-pfvod Prove {prf:ref}`p-fvmodopt`. ``` ```{solution} ex-pfvod We apply {prf:ref}`t-affineban_sr`. Each $T_\sigma$ has the form $T_\sigma \, v = r_\sigma + K_\sigma \, v$ for $r_\sigma \in b\Xsf$ and $K_\sigma \in \blop_+(b\Xsf)$; we set $r_\sigma \coloneq \sigma s + (1 - \sigma) \pi$ and $K_\sigma \coloneq (1-\sigma) K$, where $(1-\sigma) K$ is the operator defined by $((1-\sigma) K v)(x) = (1-\sigma(x)) (Kv)(x)$. Since $0 \leq K_\sigma \leq K$ for all $\sigma$ and $\rho(K) < 1$ by {prf:ref}`a-firms0`, the conditions of {prf:ref}`t-affineban_sr` hold. As $(b\Xsf, \TT_{\rm FV})$ is regular, the conclusions of the theorem apply. ``` (ss-ro)= ### A Real Option Problem Consider a firm that has developed a prototype product and faces a strategic decision: launch now, or continue development? The relative payoffs for these different choices depend on the economic environment, which evolves stochastically over time. This scenario is an example of a **real option problem**. The firm holds an option to launch, analogous to a call option on a financial asset, and must determine when to exercise it. Waiting has value when it allows the firm to launch under more favorable conditions. #### Set Up A state process $(X_t)$ summarizes payoff-relevant information at time $t$, such as demand conditions, competitive pressure, factor prices, and regulatory stance. The state influences the firm through two channels. Development costs $c(X_t)$ rise when input prices increase or skilled labor becomes scarce. 
Post-launch profitability $\pi(X_t)$ depends on demand strength and competitive conditions at the time of launch. Management strategies must balance these forces, weighing the benefits of launching in different states against the costs of continued development. We assume that $(X_t)_{t \geq 0}$ is $P$-Markov, where $P$ is a stochastic kernel on the measurable space $(\Xsf, \bB)$. At the start of time $t$, the firm pays a current flow cost $c(X_t)$ for development. Then, after observing the new state $X_{t+1}$, management decides whether or not to launch the product. If they decide to launch, the firm receives a state-contingent profit flow $(\pi(X_{t+j}))_{j \geq 1}$, where $\pi \colon \Xsf \to \RR$ is a given function. If they decide to wait, then no launch occurs and the process repeats. We impose the following weak conditions. ```{prf:assumption} :label: a-feba The Markov operator $P$ has a unique stationary distribution $\phi$ on $(\Xsf, \bB)$. The functions $c, \pi$ are $\bB$-measurable and $\phi$-integrable. The function $\beta$ is bounded, nonnegative, and $\bB$-measurable. ``` We will seek solutions to the optimization problem in the space $L_1(\phi) \coloneq L_1(\Xsf, \bB, \phi)$. As in {ref}`sss-fvur`, we endow $L_1(\phi)$ with the usual $L_1$ norm and the $\phi$-a.e. pointwise order $\leq$, so that $f \leq g$ means $\phi \{f > g\} = 0$. Let $(Q_{t+j})_{j \geq 1}$ be the expected present value of the profit flow conditional on deciding to launch. This sequence obeys the recursion $$ Q_{t + j} = \pi(X_{t+j}) + \beta(X_{t+j}) \EE_{t + j} Q_{t + j + 1} \qquad (j \geq 1). $$ Exploiting the time homogeneity of the state process and following the same logic that we used to solve the recursion {eq}`eq-firec`, we find that $Q_{t+j} = q(X_{t+j})$ for all $j \geq 1$, where $q$ is the function that solves $$ q(x) = \pi(x) + \beta(x) \int q(x') P(x, \diff x') \qquad (x \in \Xsf). 
$$ Letting $K$ be the discount operator defined on $L_1(\phi)$ via $$ (Kv)(x) \coloneq \beta(x) \int v(x') P(x, \diff x') \qquad (v \in L_1(\phi), \; x \in \Xsf), $$ we can write the recursion as $q = \pi + K q$. Under the $\phi$-a.e. pointwise order, $K$ is order-preserving. We can confirm that $K$ maps $L_1(\phi)$ to itself by using the fact that, under {prf:ref}`a-feba`, the discount function $\beta$ obeys $|\beta| \leq N$ for some $N \in \NN$. Hence, for $v \in L_1(\phi)$, $$ |(Kv)(x)| \leq |\beta(x)| \int |v(x')| P(x, \diff x') \leq N \int |v(x')| P(x, \diff x'), $$ and $v \in L_1(\phi)$ implies $P|v| \in L_1(\phi)$ by {prf:ref}`l-mopfpl`. (sss-rolv)= #### Lifetime Values We derive lifetime values under a spectral radius condition on the discount operator and establish well-posedness of the ADP. Optimality results are given in {ref}`sss-firmexop`. We impose conditions under which the recursion $q = \pi + K q$ has a unique solution, so that the value $q$ of launching the product is well defined. ```{prf:assumption} :label: a-firms The discount operator $K$ obeys $\rho(K)<1$. ``` Under {prf:ref}`a-firms`, the solution to the recursion $q = \pi + K q$ is $$ q = (I - K)^{-1} \pi. $$ With this function in hand, we can write out the lifetime value of policies. A policy here is a Borel measurable map $\sigma$ from $\Xsf$ to $\{0,1\}$ with $\sigma(x) = 1$ indicating the decision to launch the product at state $x$ and $\sigma(x) = 0$ indicating the decision to continue. As usual, we use $\Sigma$ to represent the set of all policies. If $\sigma \in \Sigma$ and $x \in \Xsf$, then $v_\sigma(x)$ denotes total firm value under policy $\sigma$, given initial state $x$. This function $v_\sigma$ obeys the recursion $$ v_\sigma(x) = -c(x) + \beta(x) \int [\sigma(x') q(x') + (1 - \sigma(x')) v_\sigma(x') ] P(x, \diff x') \quad (x \in \Xsf). 
$$ Equivalently, $v_\sigma$ is a fixed point of $T_\sigma$ defined at $v \in L_1(\phi)$ by $$ T_\sigma \, v = -c + K (\sigma \, q + (1 - \sigma) v) $$ (eq-fppol) Under {prf:ref}`a-feba` and {prf:ref}`a-firms`, each $T_\sigma$ is a self-map on $L_1(\phi)$. Since $K$ is a positive operator, each $T_\sigma$ is order preserving. Hence, letting $\TT_{\rm RO}$ be all policy operators of the form {eq}`eq-fppol`, with $\sigma$ ranging over the policy set $\Sigma$, the pair $(L_1(\phi), \TT_{\rm RO})$ is an ADP. ```{exercise} :label: ex-rovsig Show that, when {prf:ref}`a-feba` and {prf:ref}`a-firms` hold, $(L_1(\phi), \TT_{\rm RO})$ is well-posed, and that, for each $\sigma \in \Sigma$, the unique fixed point of $T_\sigma$ in $L_1(\phi)$ is $$ v_\sigma \coloneq (I - K_\sigma)^{-1}(-c + K \sigma \, q), $$ (eq-firmvsig) where $K_\sigma$ is the operator defined by $$ (K_\sigma \, f)(x) = \beta(x) \int f(x') (1-\sigma(x')) P(x, \diff x') \qquad (f \in L_1(\phi), \; x \in \Xsf). $$ ``` ```{solution} ex-rovsig It is straightforward to show that $K_\sigma \leq K$. Together, $K_\sigma \leq K$ and $\rho(K)<1$ imply that $\rho(K_\sigma) < 1$ (see {prf:ref}`t-orspr`). The Neumann series lemma now implies that $v_\sigma$ in {eq}`eq-firmvsig` is well-defined and is the unique fixed point of $T_\sigma$ in $L_1(\phi)$. ``` (sss-firmexop)= #### Optimality Let's now turn to optimality. We start by studying greedy policies. ```{exercise} :label: ex-firmreg Show that, for $v \in L_1(\phi)$, the policy $\sigma = \1\{q \geq v\}$ is $v$-greedy. ``` ```{solution} ex-firmreg Fix $v \in L_1(\phi)$ and consider the policy $\sigma = \1\{q \geq v\}$. Under this policy we have $$ \sigma q + (1 - \sigma) v = q \vee v \geq \tau q + (1 - \tau) v \quad \text{for all } \tau \in \Sigma. $$ Since $K$ is a positive operator, this yields $T_\sigma \, v \geq T_\tau \, v$ for all $\tau \in \Sigma$. Hence $\sigma$ is $v$-greedy. 
``` {prf:ref}`ex-firmreg` implies that $(L_1(\phi), \TT_{\rm RO})$ is regular. From {prf:ref}`l-torper` we know that the ADP Bellman operator satisfies $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$-greedy. Using this fact and the greedy policy $\sigma = \1\{q \geq v\}$ from {prf:ref}`ex-firmreg`, we obtain $$ T v = T_\sigma \, v = -c + K (\sigma \, q + (1 - \sigma) v) = -c + K (q \vee v). $$ (eq-firmbo) It follows that the Bellman equation for the ADP is $v = -c + K (q \vee v)$. Rewriting this expression using the definition of $K$, we get $$ v(x) = -c(x) + \beta(x) \int \max \left\{q(x'), v(x') \right\} P(x, \diff x') $$ (eq-fe_modbell) We can now state the following result for the real option ADP $(L_1(\phi), \TT_{\rm RO})$. ```{prf:proposition} :label: p-fpfbaa If {prf:ref}`a-feba` and {prf:ref}`a-firms` hold, then so do the fundamental optimality properties, and VFI, HPI, and OPI all converge. ``` ```{prf:proof} We prove {prf:ref}`p-fpfbaa` by checking the conditions of {prf:ref}`t-affineban_sr`. The ADP $(L_1(\phi), \TT_{\rm RO})$ is regular (by {prf:ref}`ex-firmreg`) and affine, with $r_\sigma \coloneq -c + K \sigma \, q$ and $K_\sigma \coloneq K (1 - \sigma)$. Since $r_\sigma$ lies in $L_1(\phi)$, since $0 \leq K_\sigma \leq K$ for all $\sigma \in \Sigma$, and since, by assumption, $\rho(K) < 1$, the conditions of {prf:ref}`t-affineban_sr` all hold. ◻ ``` {prf:ref}`p-fpfbaa` implies that the value function $\vmax$ solves the Bellman equation {eq}`eq-fe_modbell`, and that $\vmax$ can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are $\vmax$-greedy, which, by {prf:ref}`ex-firmreg`, translates to setting $\sigma = \1\{q \geq \vmax\}$. (ss-struct)= ### Structural Estimation Structural estimation is a core sub-field of quantitative economics that also plays a significant role in finance, marketing, operations research, and adjacent fields. 
Under this approach to estimation, researchers model economic agents as if they solve dynamic programs. The econometric challenge is to infer parameters that bring the model outputs (which are typically simulated from solutions to the underlying dynamic programs) as close as possible to the data. {prf:ref}`algo-se` gives an outline of the idea, with DP($\theta$) referring to a given dynamic program using a fixed parameterization indexed by $\theta$. ```{prf:algorithm} Structural Estimation :label: algo-se - input: data $\Dsf$, initial parameter guess $\theta_0$, tolerance $\tau$ - $\theta \leftarrow \theta_0$ - while true: - compute optimal policy $\sigma_\theta$ by solving DP$(\theta)$ - simulate model outputs $\Msf_\theta$ under $\sigma_\theta$ - compute distance $d(\Msf_\theta, \Dsf)$ between model and data - if $d(\Msf_\theta, \Dsf) < \tau$: - **break** - update $\theta$ to reduce $d(\Msf_\theta, \Dsf)$ - return $\theta$ ``` Typically, DP($\theta$) needs to be solved many times before convergence. In this section, we set aside the estimation step, where $d(\Msf_\theta, \Dsf)$ is constructed and $\theta$ is updated. We focus instead on the step where we compute optimal policy $\sigma_\theta$ by solving DP$(\theta)$. The types of dynamic programs typically adopted in this field have some interesting characteristics, which motivates our study. We note that structural estimation is sometimes called dynamic discrete choice because the action space is typically finite. Below we will look at settings where the state space is arbitrary and the action space is finite. (sss-pavf)= #### Post-Action Value Functions {cite}`rust1987optimal` and many subsequent authors study discrete choice problems with modified Bellman equations that take the form $$ g(z, a) = \int \int \max_{a' \in \Asf} \left\{ r(z', e', a') + \beta g(z', a') \right\} F(\diff e' \given z) G(\diff z' \given z, a). 
$$ Here $F$ and $G$ are conditional distributions, while $e$ is, in most cases, a form of unobserved heterogeneity. By taking $x = (z, e)$ and relabeling, we can write this equation as $$ g(x, a) = \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x') $$ (eq-stbe) Here $(x,a)$ takes values in $\Gsf \coloneq \Xsf \times \Asf$. We take $\Xsf$ to be a metric space. The reward function $r \in \RR^\Gsf$ is assumed to be bounded and Borel measurable, while $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$. The function $g$ is usually called a **post-action value function**, since it returns the value of the state after committing to a given action in the current period. The Bellman equation {eq}`eq-stbe` is nonstandard relative to traditional presentations of dynamic programming due to (a) the reversed order of integration and maximization, and (b) the dependence of $g$ on both state and action. It also fails to fit into the abstract dynamic programming formulation of {cite}`bertsekas2022abstract`, {cite}`ren2021dynamic`, {cite}`toda2024essential`, etc. At the same time, there are significant advantages of working with this version of the Bellman equation in structural estimation settings (see, e.g., {cite}`rust1994structural`, {cite}`kristensen2021solving`, or Chapter 5 of {cite}`sargent2025dynamic`). Despite its nontraditional format, we can set this problem up as an ADP by taking $\Sigma$ to be the set of Borel measurable maps from $\Xsf$ to $\Asf$ and, for each $\sigma \in \Sigma$, introducing the policy operator $$ (\hat T_\sigma \, g)(x, a) = \int [ r(x', \sigma(x')) + \beta g(x', \sigma(x')) ] P(x, a, \diff x'). $$ (eq-rsig) Recalling that $\Gsf$ is the product space $\Xsf \times \Asf$, let $(b\Gsf, \leq)$ be the set of bounded Borel measurable functions in $\RR^\Gsf$ paired with the pointwise partial order. 
Using boundedness and measurability of $r$, it is straightforward to show that each $\hat T_\sigma$ is an order preserving self map on $(b\Gsf, \leq)$. Letting $\hat{\TT}_{\rm SE}$ be the set of all such $\hat T_\sigma$, the pair $(b\Gsf, \hat{\TT}_{\rm SE})$ forms an ADP. Any $\sigma \in \Sigma$ obeying $$ \sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta g(x, a)\} \quad \text{for all } x \in \Xsf $$ (eq-ggr) is $g$-greedy, since, for such a $\sigma$ and any $\tau \in \Sigma$, we clearly have $$ \hat T_\tau \, g(x, a) \leq \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x') = \hat T_\sigma \, g(x, a) $$ for all $(x, a) \in \Gsf$. Does such a $\sigma$ necessarily exist? On one hand, $\Asf$ is finite and nonempty, so the argmax set is nonempty for all $x$. On the other hand, we still need to address the following measurability issue: ```{exercise} :label: ex-dmeis Prove that there exists a function $\sigma \colon \Xsf \to \Asf$ obeying {eq}`eq-ggr` that is also Borel measurable. ``` (To address this problem, we can use a measurable selection theorem, such as {prf:ref}`t-berge`. But that theorem has a difficult proof. In the solution to {prf:ref}`ex-dmeis`, we use a more elementary argument.) ```{solution} ex-dmeis To construct a measurable selection, we set $q(x, a) \coloneq r(x, a) + \beta g(x, a)$ on $\Gsf$. The map $x \mapsto q(x, a)$ is Borel measurable for each $a \in \Asf$ and $m(x) \coloneq \max_a q(x, a)$ is also measurable (because $\Asf$ is finite). Since $\Asf$ is finite, we can enumerate it as $\Asf = \{a_1, \ldots, a_n\}$ and define $\sigma$ by setting $\sigma(x) = a_{i}$ where $i$ is the smallest index such that $q(x, a_i) = m(x)$. Since $i$ depends on $x$, we write it more explicitly as $i(x)$. 
With this definition, for fixed $k \in [n]$, the set $\setntn{x}{i(x) = k}$ equals $$ \setntn{x}{q(x, a_k) = m(x) \,} \cap \setntn{x}{q(x, a_j) < m(x) \text{ for all } j < k \,} $$ As the intersection of measurable sets, this set is Borel measurable. Hence $x \mapsto i(x)$ is Borel measurable. It follows that $\sigma$ is a measurable selection. ``` In view of {prf:ref}`ex-dmeis`, the ADP $(b\Gsf, \hat{\TT}_{\rm SE})$ is regular. ```{exercise} :label: ex-setsc Show that each $\hat T_\sigma$ is a contraction of modulus $\beta$ on $b\Gsf$ under the supremum norm. ``` ```{solution} ex-setsc Fix $g, h \in b\Gsf$ and $\sigma \in \Sigma$. For any $(x, a) \in \Gsf$, we have $$ (\hat T_\sigma g)(x, a) - (\hat T_\sigma h)(x, a) = \beta \int [g(x', \sigma(x')) - h(x', \sigma(x'))] P(x, a, \diff x'). $$ Taking absolute values and using the triangle inequality for integrals, $$ |(\hat T_\sigma g)(x, a) - (\hat T_\sigma h)(x, a)| \leq \beta \int |g(x', \sigma(x')) - h(x', \sigma(x'))| P(x, a, \diff x'). $$ Since $|g(x', \sigma(x')) - h(x', \sigma(x'))| \leq \|g - h\|$ for all $x'$, we obtain $$ |(\hat T_\sigma g)(x, a) - (\hat T_\sigma h)(x, a)| \leq \beta \|g - h\| \int P(x, a, \diff x') = \beta \|g - h\|. $$ Taking the supremum over $(x, a) \in \Gsf$ yields the desired bound. ``` Given $g$, let $\sigma$ be a $g$-greedy policy. The ADP Bellman operator obeys $\hat T g = \hat T_\sigma \, g$. Using this fact and {eq}`eq-ggr`, we see that $$ (\hat T g)(x, a) = \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x') $$ for all $(x,a) \in \Gsf$. It follows that the ADP Bellman equation $g = \hat T g$ is equivalent to {eq}`eq-stbe`, confirming that our ADP accurately represents the problem we began with. We can now turn to optimality. ```{prf:proposition} :label: p-struct For the structural estimation ADP $(b\Gsf, \hat{\TT}_{\rm SE})$, the fundamental optimality properties hold and VFI, HPI, and OPI all converge. 
``` ```{prf:proof} We apply {prf:ref}`t-blackwell` with $e = \1$ and $\lambda = \beta$. Since $P$ is a stochastic kernel, $\int P(x, a, \diff x') = 1$, and hence $\hat T_\sigma(g + \kappa \1) = \hat T_\sigma \, g + \beta \kappa \1$ for all $g \in b\Gsf$ and $\kappa \in \RR_+$. As $(b\Gsf, \hat{\TT}_{\rm SE})$ is regular, the conclusions of {prf:ref}`t-blackwell` apply. ◻ ``` (sss-stca)= #### State-Dependent Discounting Next we consider a modification of the structural estimation model in {ref}`sss-pavf` that includes state-dependent discounting. In this setting, the Bellman equation becomes $$ g(x, a) = \sum_{x'} \max_{a' \in \Asf} \left[ r(x', a') + \beta(x') g(x', a') \right] P(x, a, x') $$ (eq-eseb) where $(x,a) \in \Gsf \coloneq \Xsf \times \Asf$ and $\Asf, \Xsf$ are finite and nonempty. As usual, $r$ is a reward function on $\Gsf$ and $P$ is a transition kernel from $\Gsf$ to $\Xsf$. The discount factor $\beta$ is allowed to be a function of the state. We let $\| \cdot \|$ be the supremum norm and take $\Sigma$ to be the set of all functions from $\Xsf$ to $\Asf$. The state $\Xsf$ is taken to be finite to simplify the analysis. Given $\sigma \in \Sigma$, let $\hat T_\sigma$ be defined at $g \in \RR^\Gsf$ and $(x,a) \in \Gsf$ by $$ (\hat T_\sigma \, g) (x, a) = \sum_{x'} \left[ r(x', \sigma(x')) + \beta(x') g(x', \sigma(x')) \right] P(x, a, x') $$ Let $\hat{\TT}_{\rm SE} = \{\hat T_\sigma\}_{\sigma \in \Sigma}$. Each $\hat T_\sigma$ is an order-preserving self-map on $\RR^\Gsf$, so $(\RR^\Gsf, \hat{\TT}_{\rm SE})$ is an ADP. For each $g \in \RR^\Gsf$, we can construct a $g$-greedy policy by taking a $\sigma \in \Sigma$ such that $$ \sigma(x) \in \argmax_{a \in \Asf} [r(x, a) + \beta(x) g(x, a)] \quad \text{for all } x \in \Xsf. $$ (eq-seg) Since $\Asf$ is finite and nonempty, such a $\sigma$ exists. (We have no measurability issues here because $\Xsf$ is also finite.) As a result, the ADP $(\RR^\Gsf, \hat{\TT}_{\rm SE})$ is regular. 
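With finite $\Xsf$ and $\Asf$, the policy operator and the greedy construction in {eq}`eq-seg` reduce to array operations. The sketch below uses randomly generated primitives of assumed size, and the names `T_hat_sigma` and `greedy` are hypothetical. It enumerates every policy in $\Sigma$ to confirm that the greedy policy attains the pointwise maximum of $\hat T_\tau \, g$ over $\tau$.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
nx, na = 3, 2                                  # illustrative sizes
r = rng.random((nx, na))                       # reward r(x, a)
beta = rng.uniform(0.7, 1.05, size=nx)         # discount factor beta(x)
P = rng.random((nx, na, nx))
P /= P.sum(axis=2, keepdims=True)              # P[x, a, :] is a distribution over x'

def T_hat_sigma(g, sigma):
    """Policy operator: sum over x' of [r(x', s(x')) + beta(x') g(x', s(x'))] P(x, a, x')."""
    idx = np.arange(nx)
    h = r[idx, sigma] + beta * g[idx, sigma]   # a function of x' only
    return P @ h                               # array of shape (nx, na)

def greedy(g):
    """A g-greedy policy: sigma(x) in argmax_a [r(x, a) + beta(x) g(x, a)]."""
    return np.argmax(r + beta[:, None] * g, axis=1)

g = rng.normal(size=(nx, na))
sigma = greedy(g)
Tg = T_hat_sigma(g, sigma)

# Enumerate all na**nx policies and confirm that the greedy one dominates
dominates = all(
    np.all(T_hat_sigma(g, np.array(tau)) <= Tg + 1e-12)
    for tau in itertools.product(range(na), repeat=nx)
)
print(dominates)
```

Enumeration is only feasible for tiny problems, since $\Sigma$ has $|\Asf|^{|\Xsf|}$ elements; it is used here purely as a check.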
Let $\hat T$ be the Bellman operator, so that $\hat T g \coloneq \bigvee_\sigma \hat T_\sigma \, g$. When evaluated at a $g$-greedy policy $\sigma$, we have $\hat T_\sigma \, g = \hat T g$. Using this equality and {eq}`eq-seg` yields $$ (\hat T g)(x, a) = \sum_{x'} \max_{a' \in \Asf} \left[ r(x', a') + \beta(x') g(x', a') \right] P(x, a, x'). $$ This confirms that solutions to the ADP Bellman equation $\hat T g = g$ solve the original Bellman equation {eq}`eq-eseb`. For each $\sigma \in \Sigma$ we set $$ (K_\sigma g)(x, a) = \sum_{x'} \beta(x') g(x', \sigma(x')) P(x, a, x') \qquad ((x, a) \in \Gsf) $$ (eq-ksigxa) We can now state the following result: ```{prf:proposition} :label: p-structsd If $\rho(K_\sigma) < 1$ for all $\sigma \in \Sigma$, then the fundamental optimality properties hold for $(\RR^\Gsf, \hat{\TT}_{\rm SE})$ and OPI, VFI, and HPI all converge. In addition, HPI converges in finitely many steps. ``` This discounting condition generalizes the traditional assumption that $\beta$ is constant and strictly less than one, in which case $\rho(K_\sigma) < 1$ always holds. ```{prf:proof} *Proof of {prf:ref}`p-structsd`.* We have already shown that $(\RR^\Gsf, \hat{\TT}_{\rm SE})$ is regular. Moreover, for fixed $\sigma \in \Sigma$ and $g \in \RR^\Gsf$, we have $$ (\hat T_\sigma \, g)(x,a) = \sum_{x'} r_\sigma(x') P(x, a, x') + (K_\sigma \, g)(x, a), $$ where $r_\sigma(x') \coloneq r(x', \sigma(x'))$ and $K_\sigma$ is as defined in {eq}`eq-ksigxa`. Since $\hat{\TT}_{\rm SE}$ is finite and $\rho(K_\sigma) < 1$ for all $\sigma \in \Sigma$, the first two claims in {prf:ref}`p-structsd` follow from {prf:ref}`t-affineban_f`. Since global stability implies order stability ({prf:ref}`l-pspace`), the last claim regarding HPI can be proved using {prf:ref}`t-bkf`. 
◻ ``` (sss-noneu)= #### Beyond Expected Utility Some studies find incompatibilities between data and predictions of models that use additively separable preferences and mathematical expectation to evaluate uncertain outcomes (see, e.g., {cite}`lu2024didharoldzuerchertimeseparable`). To further this line of analysis, we revisit the basic structural estimation model in {ref}`sss-pavf` while replacing mathematical expectation with a general certainty equivalent operator. As in {ref}`sss-pavf`, spaces of bounded real-valued functions are paired with the pointwise order $\leq$, and the supremum norm, to be denoted by $\| \cdot \|$. The state space $\Xsf$ is a metric space, $\Asf$ is a finite choice set, $\Gsf \coloneq \Xsf \times \Asf$, the reward function $r \colon \Gsf \to \RR$ is measurable, and $\beta \in (0,1)$ is a constant discount factor. However, we modify the Bellman equation {eq}`eq-stbe` for the post-action value function to $$ g(x, a) = (M H g)(x, a) \quad \text{where } (Hg)(x') \coloneq \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right]. $$ (eq-neustbe) Here $M$ is a certainty equivalent operator mapping $b\Xsf$ into $b\Gsf$; that is, $M$ is order-preserving and $M(v + \kappa \1) = Mv + \kappa \1$ for all $\kappa \in \RR_+$. An example is given below in {ref}`sss-baue`. Let $\Sigma$ be the set of Borel measurable maps from $\Xsf$ to $\Asf$. Given $\sigma \in \Sigma$, we set $$ (\hat T_\sigma \, g)(x, a) = (M H_\sigma \, g)(x, a) \quad \text{where } (H_\sigma \, g)(x') \coloneq r(x', \sigma(x')) + \beta g(x', \sigma(x')). $$ With $\hat{\TT}_{\rm SE} \coloneq \{\hat T_\sigma\}_{\sigma \in \Sigma}$, the pair $(b\Gsf, \hat{\TT}_{\rm SE})$ is an ADP and $\sigma \in \Sigma$ is $g$-greedy whenever $$ \sigma(x) \in \argmax_{a' \in \Asf} \left[ r(x, a') + \beta g(x, a') \right] \quad \text{for all } x \in \Xsf. $$ (eq-neug) Since $\Asf$ is finite and nonempty, such a policy always exists. 
(A function $\sigma$ obeying {eq}`eq-neug` can be chosen to be measurable---see the solution to {prf:ref}`ex-dmeis`.) Given $g \in b\Gsf$, the ADP Bellman operator $\hat T$ satisfies $\hat T g = \hat T_\sigma \, g$ whenever $\sigma$ is $g$-greedy. Using this fact and {eq}`eq-neug`, we obtain $\hat T g = M H g$. Hence any fixed point of $\hat T$ solves the original Bellman equation {eq}`eq-neustbe`. ```{prf:proposition} :label: p-stor The fundamental optimality properties hold for $(b\Gsf, \hat{\TT}_{\rm SE})$, and VFI, OPI, and HPI all converge. ``` ```{prf:proof} We apply {prf:ref}`t-blackwell` with $e = \1$ and $\lambda = \beta$. As regularity was already confirmed above, it suffices to show that $$ \hat T_\sigma \, (g + \kappa e) \leq \hat T_\sigma \, g + \beta \kappa e $$ (eq-badp2) for all $g \in b\Gsf$ and all $\kappa \in \RR_+$. To verify this, fix $\sigma$, $g$ and $\kappa$ as above. Since $(H_\sigma \, g)(x') = r(x', \sigma(x')) + \beta g(x', \sigma(x'))$, we have $H_\sigma(g + \kappa \1) = H_\sigma \, g + \beta \kappa \1$. Since $M$ is a certainty equivalent operator, $\hat T_\sigma(g + \kappa \1) = M(H_\sigma \, g + \beta \kappa \1) = M(H_\sigma \, g) + \beta \kappa \1 = \hat T_\sigma \, g + \beta \kappa \1$. Hence {eq}`eq-badp2` holds with equality. ◻ ``` (sss-baue)= #### The Risk-Sensitive Case As an illustration, suppose that $M$ is the risk-sensitive certainty equivalent $$ (M f)(x, a) \coloneq - \frac{1}{\gamma} \ln \left\{ \int \exp \left[ -\gamma f(x') \right] P(x, a, \diff x') \right\} \qquad ((x, a) \in \Gsf), $$ where $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$ and $\gamma$ is a nonzero constant. 
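The following sketch illustrates the risk-sensitive case on a finite state space, where integrals against $P$ become sums. The primitives are randomly generated, the sizes and the values of $\beta$ and $\gamma$ are assumptions made for illustration, and the function names `M` and `H` simply mirror the notation in {eq}`eq-neustbe`. The code checks the constant-shift property of $M$ and then iterates $g \mapsto M H g$.

```python
import numpy as np

rng = np.random.default_rng(2)
nx, na = 4, 3                       # illustrative sizes
beta, gamma = 0.9, 0.5              # discount factor and risk-sensitivity parameter
r = rng.random((nx, na))            # bounded reward r(x, a)
P = rng.random((nx, na, nx))
P /= P.sum(axis=2, keepdims=True)   # P[x, a, :] is a distribution over x'

def M(f):
    """Risk-sensitive certainty equivalent: -(1/gamma) ln E exp(-gamma f(X'))."""
    return -np.log(P @ np.exp(-gamma * f)) / gamma

def H(g):
    """(Hg)(x') = max over a' of [r(x', a') + beta g(x', a')]."""
    return np.max(r + beta * g, axis=1)

f = rng.normal(size=nx)
shift_ok = bool(np.allclose(M(f + 2.0), M(f) + 2.0))   # M(f + k 1) = M f + k 1

# Value function iteration on g = M H g, starting from zero
g = np.zeros((nx, na))
for _ in range(5000):
    g_new = M(H(g))
    err = np.max(np.abs(g_new - g))
    g = g_new
    if err < 1e-10:
        break

sigma = np.argmax(r + beta * g, axis=1)   # greedy policy at the approximate fixed point
residual = np.max(np.abs(M(H(g)) - g))
print(shift_ok, sigma, residual)
```

For large $|\gamma|$ or large rewards, the exponential can overflow or underflow; a log-sum-exp formulation would be a more robust choice in that case.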
Among other things, {prf:ref}`p-stor` tells us that $\sigma \in \Sigma$ is optimal if and only if $$ \sigma(x) \in \argmax_{a' \in \Asf} \left[ r(x, a') + \beta \gmax(x, a') \right] \quad \text{for all } x \in \Xsf, $$ where $\gmax$ is the unique solution to the functional equation $$ g(x, a) = - \frac{1}{\gamma} \ln \left\{ \int \exp \left\{ -\gamma \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] \right\} P(x, a, \diff x') \right\} $$ in the value space $b\Gsf$. Alternative certainty equivalents are discussed in {ref}`sss-ces`. (s-cn_adps3)= ## Chapter Notes This chapter extends the ADP framework of {prf:ref}`c-adps` and {prf:ref}`c-adps2` by working within partially ordered metric and topological spaces, which provide a natural setting for combining contraction-based and order-theoretic reasoning. In practice, most of the pospaces we examine are subsets of Banach lattices. Background on Banach lattices and positive operators can be found in the references listed in the chapter notes of {prf:ref}`c-adps`. Du's conditions for concave and convex operators, treated in {ref}`ss-ovscon`, originate in {cite}`du1990fixed`; see also {cite}`zhang2012variational`, Theorem 2.1.2. Certainty equivalent operators, which appear in {ref}`sss-noneu`, are covered in depth in {ref}`sss-ces`. The applications in this chapter draw on several fields. The discounting condition in {prf:ref}`a-firms` is similar to restrictions found in {cite}`hansen2012recursive` and {cite}`borovivcka2020necessary`. General results on dynamic programming with state-dependent discounting can be found in {cite}`stachurski2021dynamic`. The structural estimation framework in {ref}`ss-struct` originates with the classic work of {cite}`rust1987optimal`; for further background, see {cite}`rust1994structural`, {cite}`igami2020`, {cite}`kristensen2021solving`, and Chapter 5 of {cite}`sargent2025dynamic`. 
A duality-based perspective on dynamic discrete-choice models is developed in {cite}`chiong2016duality`. The non-expected utility extension in {ref}`sss-noneu` is motivated by evidence reviewed in {cite}`lu2024didharoldzuerchertimeseparable`. Nonexpected utility is considered again in {ref}`sss-ces`. ======================================================================== ## ADP Transformations A recurring task in mathematics is establishing when two apparently different objects are, in a precise sense, the same. Such equivalences are formalized by invertible, structure-preserving maps. For example, a group isomorphism is a bijection that preserves the group operation, revealing that two groups share identical algebraic structure despite different representations. A topological conjugacy between two dynamical systems is a homeomorphism intertwining their transition maps, establishing that the systems have the same dynamic behavior up to a relabeling of the state space. In the setting of dynamic programming, the relevant structure is order, and the appropriate notion of equivalence is that of an order isomorphism between posets. We now investigate transformations of dynamic programs that preserve optimality structure. First, in {ref}`ss-iso`, we investigate order isomorphisms, under which connections are exact. Two dynamic programs linked by isomorphisms are "the same" in terms of their optimality properties. In elementary terms, this is analogous to the way that $g = \phi \circ f$ has the same maximizer as $f$ when $\phi$ is strictly increasing; and that a maximizer of $g$ is a minimizer of $f$ when $\phi$ is strictly decreasing. Isomorphisms are useful but there is also a sense in which they preserve *too much* structure. 
Often we want to transform a dynamic program into another one that differs along at least some dimensions -- for example, in terms of dimensionality, or smoothness -- and try to understand the original problem by studying the second more tractable one. We investigate such transformations in {ref}`ss-fdps`, under the title of "factored" dynamic programs. In {ref}`s-subs`, we introduce factored dynamic programs (FDPs), which involve non-bijective transformations that link a primary ADP to a subordinate ADP of potentially lower dimension. We show how optimality properties transfer between the two. {ref}`s-transapps` applies both isomorphisms and FDPs to concrete settings, including Q-factor models, structural estimation, and Epstein--Zin preferences. (ss-iso)= ## Isomorphisms In this section we introduce a concept of "isomorphic" dynamic programs. In particular, we describe an isomorphic relationship that leads to essentially equivalent optimality properties. The basic idea can be explained with a simple example. Consider the savings problem from {ref}`sss-crracase`, with Bellman equation $$ v(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta v(w - c) \right\}. $$ If $u$ has heavy curvature near zero, one might consider taking an exponential transformation, aiming to work with functions that are easier to approximate numerically. Applying $\exp$ to both sides, writing $\hat u$ for $\exp \circ \, u$ and $\hat v$ for $\exp \circ \, v$, we get $$ \hat v(w) = \max_{0 \leq c \leq w} \left\{ \hat u(c) \hat v(w - c)^\beta \right\}. $$ Not surprisingly, these two dynamic programs turn out to have exactly the same optimal policies, giving us two viable angles of attack. The theory in this section clarifies and generalizes this idea, with the aim of allowing us to apply it effectively to both simple and complex problems. Along the way we will meet useful concepts such as topological conjugacy for dynamic systems and order isomorphisms. 
We will exploit these ideas to study additional optimization problems and algorithms, including time iteration methods and decision problems with ambiguity. ### Background Concepts We begin with conjugate dynamical systems in {ref}`sss-orderiso`, which preserve fixed point structure. In {ref}`sss-topcon`, we strengthen conjugacy to topological conjugacy, which additionally preserves stability. In {ref}`sss-orderconj`, we develop order conjugacy, an order-theoretic alternative that is better suited to passing optimality properties between ADPs. (sss-orderiso)= #### Conjugate Dynamics We recall that a (discrete time) **dynamical system** is a pair $(V, S)$, where $V$ is any set and $S$ is a self-map on $V$. Two dynamical systems $(V, S)$ and $(\hat V, \hat S)$ are said to be **conjugate under $F$** (or just **conjugate**) if $F$ is a bijection from $V$ to $\hat V$ and $F \circ S = \hat S \circ F$ on $V$. We can also write the last equality as $S = F^{-1} \circ \hat S \circ F$. This helps us understand the conjugacy relationship: shifting a point $v \in V$ to $S v$ via $S$ is equivalent to 1. moving $v$ into $\hat V$ via $\hat v = F v$, 2. applying $\hat S$ to produce $\hat S \hat v$, and then 3. moving the result $\hat S \hat v$ back to the original space $V$ using $F^{-1}$. ```{image} figures/tikzcd_conjugacy_inline.svg :width: 300px :align: center ``` ```{prf:example} Consider the dynamical systems $(\RR, S)$ and $((0,\infty), \hat S)$ where $Sx= ax + b$ and $\hat S x = \exp(b) x^a$. The map $S$ is called the **log-linearization** of $\hat S$. With $F = \exp$ we have $$ FSx = \exp(b) \exp(a x) = \exp(b) \exp(x)^a = \hat S F x. $$ Hence $(\RR, S)$ and $((0,\infty), \hat S)$ are conjugate under $F$. ``` ```{prf:example} :label: eg-diag Let $A$ be a diagonalizable $m \times m$ matrix, so that $A = E D E^{-1}$ where $D$ is a diagonal matrix of eigenvalues and the columns of $E$ are the corresponding eigenvectors. 
(Diagonalizability means that the eigenvectors are linearly independent, so $E$ is invertible.) Consider the dynamical systems $(\RR^m, A)$ and $(\RR^m, D)$ where the matrices $A$ and $D$ are understood as maps. With $F = E^{-1}$, we have $$ F A x = E^{-1} A x = E^{-1} E D E^{-1} x = D E^{-1} x = D F x. $$ Hence the two systems are conjugate under $F$. This is useful because the diagonal system $(\RR^m, D)$ is much easier to analyze (see {prf:ref}`eg-diag2`). ``` The next result lists some consequences of conjugacy. ```{prf:proposition} :label: p-iccm If $(V,S)$ and $(\hat V, \hat S)$ are conjugate, then 1. $S^n = F^{-1} \hat S^n F$ for all $n \in \NN$, 2. $v$ is a fixed point of $S$ if and only if $F v$ is a fixed point of $\hat S$, 3. $\hat v$ is a fixed point of $\hat S$ if and only if $F^{-1} \hat v$ is a fixed point of $S$, and 4. $v$ is the unique fixed point of $S$ in $V$ if and only if $Fv$ is the unique fixed point of $\hat S$ in $\hat V$. ``` The proofs of these claims are straightforward. For example, regarding (i), suppose that $S^n = F^{-1} \hat S^n F$ at some fixed $n$. Then, using conjugacy, $$ S^{n+1} = S S^n = S F^{-1} \hat S^n F = F^{-1} \hat S \hat S^n F = F^{-1} \hat S^{n+1} F. $$ ```{exercise} :label: ex-transforms-auto-1 Prove part (iv) of {prf:ref}`p-iccm`. ``` ```{solution} ex-transforms-auto-1 Let $v$ be the unique fixed point of $S$ in $V$. Then, using $F \circ S = \hat S \circ F$, we have $\hat S F v = F S v = F v$, so $F v$ is a fixed point of $\hat S$ in $\hat V$. Moreover, if $w$ is another fixed point of $\hat S$ in $\hat V$, then, by rearranging $F \circ S = \hat S \circ F$, we obtain $S F^{-1} w = F^{-1} \hat S w = F^{-1} w$, so $F^{-1} w$ is a fixed point of $S$. Since $v$ is the only fixed point of $S$, we then have $v = F^{-1} w$, or $w = Fv$. In particular, $Fv$ is the only fixed point of $\hat S$. 
``` (sss-topcon)= #### Topological Conjugacy For us, perhaps the most important consequence of {prf:ref}`p-iccm` is that, if two dynamical systems $(V, S)$ and $(\hat V, \hat S)$ are conjugate, then the system $(V, S)$ has a unique fixed point if and only if $(\hat V, \hat S)$ has a unique fixed point. At the same time, conjugacy is not enough to pass on stability properties. For this we need topological conjugacy. To state this property, let $V$ and $\hat V$ be Hausdorff topological spaces. A map $F \colon V \to \hat V$ is called a **homeomorphism** if $F$ is a bijection from $V$ to $\hat V$ and, in addition, both $F$ and its inverse are continuous. The dynamical systems $(V, S)$ and $(\hat V, \hat S)$ are called **topologically conjugate** when these two systems are conjugate under a bijection $F$ and, in addition, $F$ is a homeomorphism. In this setting, we have the following result: ```{prf:proposition} :label: p-tc If $(V, S)$ and $(\hat V, \hat S)$ are topologically conjugate, then $S$ is globally stable on $V$ if and only if $\hat S$ is globally stable on $\hat V$. ``` The following example gives an elementary but nonetheless important illustration of the value of {prf:ref}`p-tc`. ```{prf:example} :label: eg-diag2 Continuing with {prf:ref}`eg-diag`, since $F \coloneq E^{-1}$ is a linear bijection on $\RR^m$, it is a homeomorphism. Hence $(\RR^m, A)$ and $(\RR^m, D)$ are topologically conjugate. By {prf:ref}`p-tc`, $A$ is globally stable on $\RR^m$ if and only if $D$ is globally stable on $\RR^m$. The diagonal system is easy to analyze: $D^k x \to 0$ for all $x \in \RR^m$ if and only if every diagonal entry of $D$ has modulus strictly less than one. Since the diagonal entries of $D$ are the eigenvalues of $A$, we conclude that $A^k v \to 0$ for all $v \in \RR^m$ if and only if all eigenvalues of $A$ lie strictly inside the unit circle. ``` We will use {prf:ref}`p-tc` when we study Euler equations and time iteration in {ref}`ss-euler`.
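The conjugacy identities above are easy to check numerically. The following is a minimal sketch, not part of the formal development, that revisits the log-linearization example from {ref}`sss-orderiso` with $S x = a x + b$, $\hat S y = \exp(b) y^a$ and $F = \exp$. The parameter values `a`, `b`, the initial point `x0`, and the helper `iterate` are illustrative assumptions chosen here, with $|a| < 1$ so that $S$ is globally stable on $\RR$.

```python
import math

# Illustrative parameter values (assumptions, not taken from the text)
a, b = 0.5, 1.0

def S(x):
    """Affine map S x = a x + b on the real line."""
    return a * x + b

def S_hat(y):
    """Log-linearized counterpart on (0, infinity)."""
    return math.exp(b) * y**a

F = math.exp       # bijection from R onto (0, infinity)
F_inv = math.log   # its inverse

def iterate(f, x, n):
    """Apply the map f to x a total of n times (hypothetical helper)."""
    for _ in range(n):
        x = f(x)
    return x

x0, n = 0.3, 5

# Conjugacy: F(S x) should equal S_hat(F x)
conj_gap = abs(F(S(x0)) - S_hat(F(x0)))

# Part (i) of p-iccm: S^n = F^{-1} S_hat^n F
iter_gap = abs(iterate(S, x0, n) - F_inv(iterate(S_hat, F(x0), n)))

# Part (ii) of p-iccm: the fixed point x* = b / (1 - a) of S
# is mapped by F to a fixed point of S_hat
x_star = b / (1 - a)
fp_gap = abs(S_hat(F(x_star)) - F(x_star))

# Stability is passed across the homeomorphism F (p-tc):
# iterates of S_hat approach F(x*)
limit_gap = abs(iterate(S_hat, F(x0), 200) - F(x_star))

print(conj_gap, iter_gap, fp_gap, limit_gap)
```

All four gaps should be zero up to floating point error. The last check relies on $F = \exp$ being a homeomorphism from $\RR$ onto $(0, \infty)$, which is what allows {prf:ref}`p-tc` to transfer global stability from $S$ to $\hat S$.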
(sss-orderconj)= #### Order Conjugacy In the previous two sections we introduced conjugacy and then strengthened it to topological conjugacy, a fairly standard approach. Now, however, we will step back to conjugacy and strengthen it using order rather than topology. The order-theoretic notion we develop will be analogous to topological conjugacy, while at the same time being better suited to passing optimality properties from one ADP to another. To begin, we take $V$ and $\hat V$ to be posets and consider two dynamical systems $(V, S)$ and $(\hat V, \hat S)$. We call these systems **order conjugate under $F$** if 1. $(V, S)$ and $(\hat V, \hat S)$ are conjugate under $F$ and, 2. $F$ is an order isomorphism (see {ref}`sss-orfs`). To indicate that such an $F$ can be found, we simply say that $(V, S)$ and $(\hat V, \hat S)$ are **order conjugate**. ```{exercise} :label: ex-dseqref Prove that order conjugacy is an equivalence relation on the set of dynamical systems over partially ordered sets. ``` ```{solution} ex-dseqref Let $D$ be the set of all dynamical systems $(V, S)$ where $V$ is partially ordered. For $(V, S)$ and $(\hat V, \hat S)$ in $D$, we write $(V, S) \sim (\hat V, \hat S)$ when $(V, S)$ and $(\hat V, \hat S)$ are order conjugate. We claim that $\sim$ is reflexive, symmetric and transitive. Reflexivity is obvious: every $(V, S)$ in $D$ is order conjugate to itself under the identity map $I$. Symmetry is also straightforward: If $(V, S)$ and $(\hat V, \hat S)$ are order conjugate under $F$, then $(\hat V, \hat S)$ and $(V, S)$ are order conjugate under $F^{-1}$. Finally, if $(V, S) \sim (V', S')$ under $F$ and $(V', S') \sim (V'', S'')$ under $G$, then $G \circ F$ is an order isomorphism from $V$ to $V''$ and $$ G \circ F \circ S = G \circ S' \circ F = S'' \circ G \circ F \text{ on } V. $$ Hence $(V, S) \sim (V'', S'')$ and $\sim$ is also transitive. ``` The next lemma shows one benefit of establishing order conjugacy. 
It can be thought of as an order-theoretic version of {prf:ref}`p-tc`. ```{prf:lemma} :label: l-ocos If $(V, S)$ and $(\hat V, \hat S)$ are order conjugate under $F$, then 1. $S$ is order stable on $V$ if and only if $\hat S$ is order stable on $\hat V$, and 2. $S$ is strongly order stable on $V$ if and only if $\hat S$ is strongly order stable on $\hat V$. ``` ```{exercise} :label: ex-transforms-auto-2 Prove {prf:ref}`l-ocos`. ``` ```{solution} ex-transforms-auto-2 Let $(V, S)$ and $(\hat V, \hat S)$ be order conjugate under $F$, with respective fixed points $v$ and $\hat v = Fv$. By {prf:ref}`p-iccm`, $\hat v$ is the unique fixed point of $\hat S$ in $\hat V$. For (i), suppose $S$ is order stable on $V$ and let $\hat w$ satisfy $\hat S \hat w \preceq \hat w$. Then $F^{-1} \hat S \hat w \preceq F^{-1} \hat w$, so $S F^{-1} \hat w \preceq F^{-1} \hat w$. By order stability of $S$, $v \preceq F^{-1} \hat w$, i.e., $\hat v = Fv \preceq \hat w$. The proof that $\hat w \preceq \hat S \hat w$ implies $\hat w \preceq \hat v$ is similar. For (ii), suppose $S$ is strongly order stable on $V$ and let $\hat w$ satisfy $\hat S \hat w \preceq \hat w$. As above, $S F^{-1} \hat w \preceq F^{-1} \hat w$. Strong order stability of $S$ now gives $S^n F^{-1} \hat w \downarrow v$. Applying {prf:ref}`ex-ios` yields $F S^n F^{-1} \hat w \downarrow Fv = \hat v$, and since $\hat S^n = F S^n F^{-1}$ by {prf:ref}`p-iccm`, we obtain $\hat S^n \hat w \downarrow \hat v$. The proof that $\hat w \preceq \hat S \hat w$ implies $\hat S^n \hat w \uparrow \hat v$ is similar. Both reverse implications hold by symmetry. ``` (ss-isoopt)= ### Isomorphic ADPs In this section, we use order conjugacy to connect dynamic programs. We are interested in whether or not ADPs can be connected by order isomorphisms (or anti-isomorphisms), and what implications this has for optimality. In {ref}`sss-isoadp`, we define isomorphic ADPs and show that isomorphism is an equivalence relation. 
In {ref}`sss-iaoth`, we prove that isomorphic ADPs share the same optimality and convergence properties. {ref}`sss-aiadps` extends the analysis to anti-isomorphic ADPs, where maximization in one ADP corresponds to minimization in the other. Finally, in {ref}`sss-ezwo`, we apply both isomorphic and anti-isomorphic relationships to establish optimality for an Epstein--Zin preference model. (sss-isoadp)= #### Definition and Consequences Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be ADPs with policy sets $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ and $\hat{\TT} \coloneq \setntn{\hat T_\sigma}{\sigma \in \Sigma}$. We call these ADPs **isomorphic** under $F$ if 1. these two ADPs have the same policy set $\Sigma$, and 2. $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ are order conjugate under $F$ for all $\sigma \in \Sigma$. Part (ii) requires that $F$ is an order isomorphism from $V$ to $\hat V$ and $$ F \circ T_\sigma = \hat T_\sigma \circ F \;\; \text{ on } V \text{ for all } \sigma \in \Sigma. $$ (eq-adpih) ```{prf:example} Consider an ADP where $T_\sigma$, the policy operator, has the form $$ (T_\sigma \, v)(w) = u(\sigma(w)) + \beta v(w - \sigma(w)). $$ (eq-ospolopr) (This is the problem we introduced at the start of {ref}`ss-iso`.) This operator maps $c\RR$, the set of all continuous real-valued functions on $\RR$, into itself. Consider also a "multiplicative" version $$ (\hat T_\sigma \, \hat v)(w) = \hat u(\sigma(w)) [\hat v(w - \sigma(w))]^\beta $$ (eq-ospoloph) where $\hat u \coloneq \exp \circ \, u$ and $\hat T_\sigma$ acts on functions in $c(0,\infty)$, the set of continuous everywhere positive functions on the positive reals. Then, given $v \in c\RR$, we have $$ \begin{aligned} \exp [(T_\sigma \, v)(w)] & = \exp [ u(\sigma(w)) ] \cdot \exp[ \beta v(w - \sigma(w))] \\ & = \hat u(\sigma(w)) \cdot [\exp v(w - \sigma(w)) ]^\beta = (\hat T_\sigma \, \exp \circ \, v)(w). 
\end{aligned} $$ Since $v$ and $w$ were chosen arbitrarily, we find that $F \circ T_\sigma = \hat T_\sigma \circ F$ on $c\RR$, where $F$ is the transformation given by $F v \coloneq \exp \circ \, v$. {prf:ref}`ex-eth` tells us that $F$ is an order isomorphism from $c\RR$ to $c(0,\infty)$, so $(c\RR, \TT)$ and $(c(0,\infty), \hat{\TT})$ are isomorphic. ``` ```{prf:lemma} :label: l-isoadpeq Isomorphism between ADPs is an equivalence relation on the set of ADPs. ``` In other words, if $\mathbf A$ is the set of all ADPs and, for $(V, \TT), (\hat V, \hat{\TT}) \in \mathbf A$, the symbol $(V, \TT) \sim (\hat V, \hat{\TT})$ means $(V, \TT)$ and $(\hat V, \hat{\TT})$ are isomorphic, then $\sim$ is reflexive, symmetric and transitive. ```{exercise} :label: ex-transforms-auto-3 Prove {prf:ref}`l-isoadpeq`. (Hint: Use {prf:ref}`ex-dseqref`.) ``` (sss-iaoth)= #### Isomorphisms and Optimality We seek relationships between optimality properties of isomorphic ADPs. For all of this section, we take $(V, \TT)$ and $(\hat V, \hat{\TT})$ to be two ADPs with $\TT = \setntn{T_\sigma}{\sigma \in \Sigma}$ and $\hat{\TT} = \setntn{\hat T_\sigma}{\sigma \in \Sigma}$. When they exist, we let - $v_\sigma$ (resp., $\hat v_\sigma$) be the unique fixed point of $T_\sigma$ (resp., $\hat T_\sigma$), - $\tmax$ (resp., $\htmax$) be the Bellman operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$), and - $\vmax$ (resp., $\hvmax$) be the value function of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$). The next theorem shows that isomorphic ADPs share the same regularity and optimality properties: ```{prf:theorem} :label: t-iso If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are isomorphic under $F$, then 1. $\sigma$ is $v$-greedy for $(V, \TT)$ if and only if $\sigma$ is $Fv$-greedy for $(\hat V, \hat{\TT})$, 2. $(V, \TT)$ is regular if and only if $(\hat V, \hat{\TT})$ is regular, 3. $(V, \TT)$ is well-posed if and only if $(\hat V, \hat{\TT})$ is well-posed, 4. 
$(V, \TT)$ is order stable if and only if $(\hat V, \hat{\TT})$ is order stable, and 5. $\sigma$ is optimal for $(V, \TT)$ if and only if $\sigma$ is optimal for $(\hat V, \hat{\TT})$. ``` ```{prf:proof} Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be isomorphic under $F$. Regarding (i), fix $v \in V$ and suppose that $\sigma$ is $v$-greedy for $(V, \TT)$. Then $T_\tau \, v \preceq T_\sigma \, v$ and hence $F T_\tau \, v \preceq F T_\sigma \, v$ for all $\tau \in \Sigma$. Conjugacy now implies that $\hat T_\tau \, F v \preceq \hat T_\sigma \, F v$ for all $\tau \in \Sigma$, so $\sigma$ is $Fv$-greedy for $(\hat V, \hat{\TT})$. The converse implication is symmetric. Claim (ii) is immediate from claim (i). Claims (iii) and (iv) follow directly from order conjugacy of the policy operators (as in {eq}`eq-adpih`) and {prf:ref}`l-ocos`. Regarding (v), we use order conjugacy of the policy operators to obtain $F v_\sigma = \hat v_\sigma$ for all $\sigma \in \Sigma$, from which it follows that $$ v_\sigma = \bigvee_\tau v_\tau \quad \iff \quad F v_\sigma = F \bigvee_\tau v_\tau = \bigvee_\tau F v_\tau \quad \iff \quad \hat v_\sigma = \bigvee_\tau \hat v_\tau $$ In other words, $\sigma$ is optimal for $(V, \TT)$ if and only if $\sigma$ is optimal for $(\hat V, \hat{\TT})$. ◻ ``` The next theorem studies the case when $(V, \TT)$ and $(\hat V, \hat{\TT})$ are regular and well-posed. It tells us that isomorphic ADPs have the same optimality properties. ```{prf:theorem} :label: t-iso2 Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be regular, well-posed ADPs. If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are isomorphic under $F$, then 1. the Bellman operators obey $$ F \circ \tmax = \htmax \circ F \text{ on } V, $$ (eq-isotmax) 2. when they exist, the value functions for $(V, \TT)$ and $(\hat V, \hat{\TT})$ are related by $\hvmax = F \vmax$, and 3. the fundamental optimality properties hold for $(V, \TT)$ if and only if they hold for $(\hat V, \hat{\TT})$. 
``` ```{prf:proof} Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be as stated. Regarding (i), we fix $v \in V$ and apply {eq}`eq-adpih` to obtain $$ F T v = F \bigvee_\sigma T_\sigma \, v = \bigvee_\sigma F T_\sigma \, v = \bigvee_\sigma \hat T_\sigma \, F v = \htmax \, F \, v. $$ (The second equality follows from regularity and {prf:ref}`l-ois`.) This confirms {eq}`eq-isotmax`, so $T$ and $\hat T$ are order conjugate under $F$. Regarding (ii), suppose that $\vmax = \bigvee_\sigma v_\sigma$ exists. Then $\hvmax = \bigvee_\sigma F v_\sigma = F \bigvee_\sigma v_\sigma = F \vmax$. Regarding (iii), suppose that the fundamental optimality properties hold for $(V, \TT)$. We need only show that they likewise hold for $(\hat V, \hat{\TT})$, since the reverse implication then holds by symmetry. First, an optimal policy exists for $(\hat V, \hat{\TT})$ by existence for $(V, \TT)$ and part (v) of {prf:ref}`t-iso`. Second, $\vmax$ is the unique fixed point of $T$ and, in addition, $(V, T)$ and $(\hat V, \hat T)$ are order conjugate, so $F\vmax$ is the unique fixed point of $\hat T$. In view of (ii), this means that $\hvmax$ is the unique fixed point of $\hat T$. Bellman's principle of optimality also holds for $(\hat V, \hat{\TT})$ by {prf:ref}`l-fo` (or by (i) and (v) of {prf:ref}`t-iso`). ◻ ``` The next theorem considers convergence of algorithms. ```{prf:theorem} :label: t-iso3 Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be regular, well-posed ADPs. If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are isomorphic under $F$, then 1. the respective optimistic policy operators $W$ and $\hat W$ obey $$ F \circ W = \hat W \circ F \text{ on } V, \text{ and} $$ (eq-isoop) 2. the respective Howard policy operators $H$ and $\hat H$ obey $$ F \circ H = \hat H \circ F \text{ on } V. $$ (eq-isohow) Moreover, if the fundamental optimality properties hold for one and hence both of these ADPs, then the following statements are true. 1.
VFI converges for $(V, \TT)$ if and only if VFI converges for $(\hat V, \hat{\TT})$, 2. OPI converges for $(V, \TT)$ if and only if OPI converges for $(\hat V, \hat{\TT})$, and 3. HPI converges for $(V, \TT)$ if and only if HPI converges for $(\hat V, \hat{\TT})$. ``` The proof is long but straightforward and presented as a solved exercise. ```{exercise} :label: ex-lbs Prove {prf:ref}`t-iso3`. ``` ```{solution} ex-lbs Fix $m \in \NN$. Let $W \coloneq W_m$ and $H$ be the optimistic and Howard policy operators for $(V, \TT)$. Let $\hat W \coloneq \hat W_m$ and $\hat H$ be the optimistic and Howard policy operators for $(\hat V, \hat{\TT})$. Fix $v \in V$ and let $\sigma$ be $v$-greedy for $(V, \TT)$, so that $W v = T^m _\sigma v$. By {prf:ref}`t-iso`, $\sigma$ is $Fv$-greedy for $(\hat V, \hat{\TT})$. Hence $\hat W F v = \hat T^m_\sigma F v = F T^m_\sigma v = F W v$. This proves that {eq}`eq-isoop` holds. Similarly, continuing to assume that $\sigma$ is $v$-greedy for $(V, \TT)$, we have $Hv = v_\sigma$ and, because $\sigma$ is $Fv$-greedy for $(\hat V, \hat{\TT})$, we also have $\hat H Fv = \hat v_\sigma$. As a result, $F^{-1} \hat H F v = F^{-1} \hat v_\sigma = F^{-1} F v_\sigma = v_\sigma = H v$. Hence {eq}`eq-isohow` also holds. Regarding (iii)--(v), we prove only (v), since the remaining arguments are similar. Suppose that HPI converges for $(V, \TT)$ and fix $\hat v \in \hat V_U$. Then $v \coloneq F^{-1} \hat v$ is in $V_U$, since $v = F^{-1} \hat v \preceq F^{-1} \hat T \hat v = T F^{-1} \hat v = Tv$. As a result, we have $H^n v \uparrow \vmax$. But then $F H^n v \uparrow F \vmax = \hat \vmax$, where $\uparrow$ is by {prf:ref}`ex-ios` and the equality is by {prf:ref}`t-iso2`(ii). Since $H$ and $\hat H$ are conjugate under $F$, we also have $F H^n v = \hat H^n F v$. Combining the last two equalities gives us $\hat H^n \hat v = \hat H^n F v = F H^n v \uparrow \hat \vmax$. 
As $\hat v$ was chosen arbitrarily from $\hat V_U$, we see that HPI converges for $(\hat V, \hat{\TT})$. ``` (sss-aiadps)= #### The Anti-Isomorphic Case In this section, we switch to studying anti-isomorphic ADPs. In doing so, we will consider minimization as well as maximization. We follow the notational conventions and terminology introduced in {ref}`ss-minim`. Most readers will find it helpful to review that section before reading this one. Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be ADPs with the same policy set. In line with the notation in {ref}`sss-iaoth`, we let - $\tmin$ (resp., $\htmin$) be the Bellman min-operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$), - $\vmin$ (resp., $\hvmin$) be the min-value function of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$), - $\Hmin$ (resp., $\hHmin$) be the Howard policy min-operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$), and - $\Wmin$ (resp., $\hWmin$) be the optimistic policy min-operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$). As was the case in {ref}`ss-minim`, we enhance clarity by adding a "max-" prefix to previously introduced definitions that pertain to maximization. For example, - "optimal policies" will be referred to as "max-optimal policies", - the "Bellman equation" will be referred to as the "Bellman max-equation", - the "Bellman operator" will be referred to as the "Bellman max-operator", and so on. We call $(V, \TT)$ and $(\hat V, \hat{\TT})$ **anti-isomorphic** under $F$ if these two ADPs have the same policy set $\Sigma$ and, in addition, $F$ is an anti-isomorphism from $V$ to $\hat V$ such that {eq}`eq-adpih` holds. We can also express this relationship in terms of isomorphisms and duality of ADPs, as defined in {ref}`sss-dualadps`: ```{exercise} :label: ex-aiaw Show that $(V, \TT)$ and $(\hat V, \hat{\TT})$ are anti-isomorphic under $F$ if and only if $(V, \TT)$ and $(\hat V, \hat{\TT})^\partial$ are isomorphic under $F$. (Hint: See {prf:ref}`ex-dualanti`). 
``` Here is an optimality result for anti-isomorphic ADPs that parallels {prf:ref}`t-iso`. ```{prf:theorem} :label: t-antiiso If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are anti-isomorphic under $F$, then 1. $\sigma$ is $v$-max-greedy for $(V, \TT)$ if and only if $\sigma$ is $Fv$-min-greedy for $(\hat V, \hat{\TT})$, 2. $(V, \TT)$ is max-regular if and only if $(\hat V, \hat{\TT})$ is min-regular, 3. $(V, \TT)$ is well-posed if and only if $(\hat V, \hat{\TT})$ is well-posed, 4. $(V, \TT)$ is order stable if and only if $(\hat V, \hat{\TT})$ is order stable, and 5. $\sigma$ is max-optimal for $(V, \TT)$ if and only if $\sigma$ is min-optimal for $(\hat V, \hat{\TT})$. ``` ```{prf:proof} Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be anti-isomorphic, so that $(V, \TT)$ is isomorphic to $(\hat V, \hat{\TT})^\partial$ ({prf:ref}`ex-aiaw`). Regarding (i), suppose that $\sigma$ is $v$-max-greedy for $(V, \TT)$. Then, by {prf:ref}`t-iso`, $\sigma$ is $F v$-max-greedy for $(\hat V, \hat {\TT})^\partial$. Applying {prf:ref}`ex-mmp`, we see that $\sigma$ is $Fv$-min-greedy for $(\hat V, \hat{\TT})$. The proof of the reverse implication is analogous. This proves (i) and the proofs for the remaining claims are similar. Details are left to the reader. ◻ ``` ```{prf:theorem} :label: t-antiiso2 Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be well-posed ADPs, where $(V, \TT)$ is max-regular and $(\hat V, \hat{\TT})$ is min-regular. If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are anti-isomorphic under $F$, then 1. the Bellman operators $\tmax$ and $\htmin$ are connected via $$ F \circ \tmax = \htmin \circ F \text{ on } V, $$ (eq-antisotmax) 2. when they exist, the value functions for $(V, \TT)$ and $(\hat V, \hat{\TT})$ are related by $\hvmin = F \vmax$, 3. the fundamental max-optimality properties hold for $(V, \TT)$ if and only if the fundamental min-optimality properties hold for $(\hat V, \hat{\TT})$. ``` ```{exercise} :label: ex-antiiso2 Prove {prf:ref}`t-antiiso2`.
``` ```{solution} ex-antiiso2 Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be anti-isomorphic under $F$, so that $(V, \TT)$ and $(\hat V, \hat{\TT})^\partial$ are isomorphic under $F$. {prf:ref}`t-iso2` implies that $F \circ \tmax = \htmax^\partial \circ F$, where $\htmax^\partial$ is the Bellman max-operator of $(\hat V, \hat{\TT})^\partial$. {prf:ref}`ex-mmp` gives $\htmax^\partial = \htmin$. Combining these results gives {eq}`eq-antisotmax`. Regarding (ii), since $(V, \TT)$ and $(\hat V, \hat{\TT})^\partial$ are isomorphic under $F$, {prf:ref}`t-iso2` yields $\hvmaxd = F \vmax$. Applying {prf:ref}`ex-mmp` again, we have $\hvmaxd = \hvmin$. Hence $\hvmin = F \vmax$. Regarding (iii), {prf:ref}`t-iso2` implies that the fundamental max-optimality properties hold for $(V, \TT)$ if and only if these same max-optimality properties hold for $(\hat V, \hat {\TT})^\partial$. {prf:ref}`t-fbk_min` tells us that the fundamental max-optimality properties hold for $(\hat V, \hat{\TT})^\partial$ if and only if the fundamental min-optimality properties hold for $(\hat V, \hat{\TT})$. ``` Now we consider convergence of algorithms in the anti-isomorphic case. ```{prf:theorem} :label: t-antiiso3 Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be well-posed ADPs that are max-regular and min-regular, respectively. If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are anti-isomorphic under $F$, then 1. the optimistic policy operators $W$ and $\hWmin$ obey $$ F \circ W = \hWmin \circ F \text{ on } V, \text{ and} $$ (eq-antiisoop) 2. the Howard policy operators $H$ and $\hHmin$ obey $$ F \circ H = \hHmin \circ F \text{ on } V. $$ (eq-antiisohow) Moreover, if the fundamental optimality properties hold for one and hence both of these ADPs, then the following statements are true. 1. max-VFI converges for $(V, \TT)$ if and only if min-VFI converges for $(\hat V, \hat{\TT})$, 2. max-OPI converges for $(V, \TT)$ if and only if min-OPI converges for $(\hat V, \hat{\TT})$, and 3.
max-HPI converges for $(V, \TT)$ if and only if min-HPI converges for $(\hat V, \hat{\TT})$. ``` ```{exercise} :label: ex-antiiso3 Prove {prf:ref}`t-antiiso3`. ``` ```{solution} ex-antiiso3 Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be anti-isomorphic under $F$, so that $(V, \TT)$ and $(\hat V, \hat{\TT})^\partial$ are isomorphic under $F$. {prf:ref}`t-iso3` implies that $F \circ W = \hWmax^\partial \circ F$, where $\hWmax^\partial$ is the optimistic policy max-operator of $(\hat V, \hat{\TT})^\partial$. {prf:ref}`ex-mmp` gives $\hWmax^\partial = \hWmin$. Combining these results gives {eq}`eq-antiisoop`. The proof of {eq}`eq-antiisohow` is similar. Regarding (iii), if max-VFI converges for $(V, \TT)$, then by {prf:ref}`t-iso3`, max-VFI converges for $(\hat V, \hat{\TT})^\partial$. But then min-VFI converges for $(\hat V, \hat{\TT})$, by {prf:ref}`t-fbk_min`. The proof of the converse implication is symmetric, and the proofs of (iv) and (v) are similar. ``` (sss-ezwo)= ### Example: Epstein--Zin Optimality In this section, we use the theory of isomorphic and anti-isomorphic ADPs to study optimality properties of a modified MDP model that incorporates Epstein--Zin preferences (motivation for which was provided in {ref}`ss-ogezintro`). We will show how isomorphic and anti-isomorphic relationships can be used to simplify analysis. To begin, consider a variation of the finite state MDP from {ref}`s-mdps` where the Bellman equation is modified to $$ v(x) = \max_{a \in \Gamma(x)} \left\{ (1-\beta) r(x, a)^\alpha + \beta \left[ (Lv)(x,a) \right]^{\alpha} \right\}^{1/\alpha}, $$ (eq-ezwb) with $$ (Lv)(x, a) \coloneq \left( \sum_{x'} v(x')^\nu P(x, a, x') \right)^{1/\nu}. $$ Here $\Xsf$, $\Asf$, $r$, $\Gamma$, $\beta$, and $P$ are all as in {ref}`s-mdps`, while $\Gsf$ is the set of feasible state-action pairs.
The parameters $\alpha$ and $\nu$ are connected to the notation of {ref}`ss-ogezintro` via $\alpha = 1 - 1/\psi$ and $\nu = 1 - \gamma$, where $\psi$ is the EIS and $\gamma$ is the coefficient of relative risk aversion. The change of notation is made to simplify the presentation. If $\alpha=\nu=1$, then this Epstein--Zin model reduces to an ordinary finite state MDP. In this section, however, we allow $\alpha$ and $\nu$ to take any nonzero values. Using the notation $$ r_\sigma(x) \coloneq r(x, \sigma(x)) \quad \text{and} \quad (L_\sigma v)(x) \coloneq (Lv)(x, \sigma(x)), $$ we can write a corresponding policy operator $T_\sigma$ as $$ T_\sigma \, v = \left\{ (1-\beta) r_\sigma^\alpha + \beta \left( L_\sigma \, v \right)^{\alpha} \right\}^{1/\alpha}. $$ (eq-tseza) Let $\TT_{\rm EZ}$ be the set of all such $T_\sigma$. We assume that $r$ is strictly positive, so that $T_\sigma$ maps $(0, \infty)^\Xsf$ into itself. {cite}`sargent2025dynamic` establish optimality properties for this specification when $P_\sigma$ is irreducible for all $\sigma \in \Sigma$. Here we drop this assumption and establish the same optimality properties. To this end, let $\theta = \nu/\alpha$. Fix $\epsilon > 0$ with $\min r^\alpha - \epsilon > 0$. Let $$ \hat V = [v_1, v_2] \quad \text{where} \quad v_1 = m_1 \wedge m_2 \text{ and } v_2 = m_1 \vee m_2. $$ Here $m_1 \coloneq \left( \min r^\alpha - \epsilon \right)^\theta$ and $m_2 \coloneq \left( \max r^\alpha + \epsilon \right)^\theta$. Let $F$ be defined by $$ F \, v = v^\nu \qquad \text{with } v \in (0, \infty)^\Xsf, $$ where the exponent $\nu$ is applied pointwise to $v$, and set $$ V \coloneq F^{-1} \hat V = \setntn{v \in (0, \infty)^\Xsf}{v_1 \leq v^\nu \leq v_2}. $$ (eq-ezwv) We are interested in optimality properties of $(V, \TT_{\rm EZ})$. While we can try to tackle this ADP directly, the arguments become significantly easier after a transformation. 
To pursue this path, we introduce the auxiliary ADP $(\hat V, \hat{\TT}_{\rm EZ})$ with $\hat V$ as defined above and $$ \hat T_\sigma \, v = \left\{ (1-\beta) r_\sigma^\alpha + \beta \left( P_\sigma \, v \right)^{1/\theta} \right\}^\theta. $$ (eq-htaux) In the next exercise, $f \ll g$ means $f(x) < g(x)$ for all $x$, as in {ref}`ss-ovscon`. ```{exercise} :label: ex-topbot Verify the following inequalities: For $v_1$ and $v_2$ as defined above, we have $v_1 \ll \hat T_\sigma \, v_1$ and $\hat T_\sigma \, v_2 \ll v_2$. ``` ```{solution} ex-topbot Suppose first that $\theta > 0$, so that $v_1 = m_1$ and $v_2 = m_2$. Then $$ \begin{aligned} \hat T_\sigma \, v_1 &= \left\{ (1-\beta) r^\alpha_\sigma + \beta (\min r^\alpha - \epsilon) \right\}^\theta \\ &\geq \left\{ (1-\beta) \min r^\alpha + \beta (\min r^\alpha - \epsilon) \right\}^\theta = \left\{ \min r^\alpha - \beta \epsilon \right\}^\theta > m_1 = v_1. \end{aligned} $$ In addition, $$ \begin{aligned} \hat T_\sigma \, v_2 &= \left\{ (1-\beta) r_\sigma^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta \\ &\leq \left\{ (1-\beta) \max r^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta = \left\{ \max r^\alpha + \beta \epsilon \right\}^\theta < m_2 = v_2. \end{aligned} $$ If $\theta < 0$, then $v_1 = m_2$ and $v_2 = m_1$, so $$ \begin{aligned} \hat T_\sigma \, v_1 &= \left\{ (1-\beta) r_\sigma^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta \\ &\geq \left\{ (1-\beta) \max r^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta = \left\{ \max r^\alpha + \beta \epsilon \right\}^\theta > m_2 = v_1. \end{aligned} $$ In addition, $$ \begin{aligned} \hat T_\sigma \, v_2 &= \left\{ (1-\beta) r_\sigma^\alpha + \beta (\min r^\alpha - \epsilon) \right\}^\theta \\ &\leq \left\{ (1-\beta) \min r^\alpha + \beta (\min r^\alpha - \epsilon) \right\}^\theta = \left\{ \min r^\alpha - \beta \epsilon \right\}^\theta < m_1 = v_2. 
\end{aligned} $$ ``` ```{exercise} :label: ex-coonf Confirm the following statements. 1. If $0 < \theta \leq 1$, then $\hat T_\sigma$ is convex on $\hat V$. 2. If $\theta < 0$ or $1 \leq \theta$, then $\hat T_\sigma$ is concave on $\hat V$. ``` ```{solution} ex-coonf For $c, t > 0$, let $f(t) \coloneq ((1-\beta)c + \beta t^{1/\theta})^\theta$. Simple calculations show that $f' > 0$, that $f'' \leq 0$ when $\theta < 0$ or $1 \leq \theta$ (with equality only at $\theta = 1$), and that $f'' \geq 0$ when $0 < \theta \leq 1$ (again with equality only at $\theta = 1$). Hence $f$ is concave when $\theta < 0$ or $1 \leq \theta$, and convex when $0 < \theta \leq 1$. The claim in the exercise follows easily from these facts and the definition of $\hat T_\sigma$ in {eq}`eq-htaux`. ``` The results from {prf:ref}`ex-topbot` and {prf:ref}`ex-coonf` allow us to establish optimality properties for the auxiliary ADP $(\hat V, \hat{\TT}_{\rm EZ})$. ```{prf:lemma} :label: l-faux The following statements are both true. 1. The fundamental max-optimality results hold for $(\hat V, \hat{\TT}_{\rm EZ})$ and max-VFI, max-OPI, and max-HPI all converge. 2. The fundamental min-optimality results hold for $(\hat V, \hat{\TT}_{\rm EZ})$ and min-VFI, min-OPI, and min-HPI all converge. ``` ```{prf:proof} Fix $\sigma \in \Sigma$. Since $\hat T_\sigma$ is order-preserving and $v_1 \leq v \leq v_2$ for all $v \in \hat V$, {prf:ref}`ex-topbot` gives $v_1 \ll \hat T_\sigma \, v_1 \leq \hat T_\sigma \, v \leq \hat T_\sigma \, v_2 \ll v_2$. Also, by {prf:ref}`ex-coonf`, $\hat T_\sigma$ is either concave or convex. {prf:ref}`t-riesz_con` now implies that the fundamental max-optimality properties hold for $(\hat V, \hat{\TT}_{\rm EZ})$ and max-VFI, max-OPI, and max-HPI all converge. (See, in particular, {prf:ref}`l-riesz_con`.) Applying {prf:ref}`t-fbk_min`, we see that the second statement in {prf:ref}`l-faux` is true.
◻ ``` ```{exercise} :label: ex-ezii Show that, for all $\sigma \in \Sigma$, we have $F \circ T_\sigma = \hat T_\sigma \circ F$ on $V$. ``` ```{solution} ex-ezii Fix $\sigma \in \Sigma$ and $v \in V$. On the one hand, $$ F \, T_\sigma \, v = (T_\sigma \, v)^\nu = \left\{ (1-\beta) r_\sigma^\alpha + \beta (P_\sigma\, v^\nu)^{\alpha/\nu} \right\}^{\nu/\alpha} = \left\{ (1-\beta) r_\sigma^\alpha + \beta (P_\sigma\, v^\nu)^{1/\theta} \right\}^{\theta}. $$ On the other, $$ \hat T_\sigma \, F \, v = \hat T_\sigma \, v^\nu = \left\{ (1-\beta) r_\sigma^\alpha + \beta (P_\sigma\, v^\nu)^{1/\theta} \right\}^{\theta}. $$ Hence $F \circ T_\sigma = \hat T_\sigma \circ F$ on $V$, as claimed. ``` The next result follows easily from the conclusion of {prf:ref}`ex-ezii`. ```{prf:lemma} :label: l-eziso The following statements are true: 1. If $\nu > 0$, then $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are isomorphic. 2. If $\nu < 0$, then $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are anti-isomorphic. ``` ```{prf:proof} If $\nu < 0$, then $F$ is an order anti-isomorphism from $V$ to $\hat V$. (Obviously $F$ is order-reversing. Also, $F$ is clearly one-to-one and, by construction, $F$ maps $V$ onto $\hat V$.) From this fact and {prf:ref}`ex-ezii`, the ADPs $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are anti-isomorphic. If $\nu > 0$, then $F$ is an order isomorphism from $V$ to $\hat V$, so, applying the result of {prf:ref}`ex-ezii` again, $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are isomorphic. ◻ ``` We are now ready to state and prove the main result of this section. ```{prf:proposition} :label: p-ezsbiai The fundamental max-optimality properties hold for $(V, \TT_{\rm EZ})$. In addition, max-VFI, max-OPI, and max-HPI all converge. ``` ```{prf:proof} *Proof of {prf:ref}`p-ezsbiai`.* Suppose first that $\nu > 0$. Then, by {prf:ref}`l-eziso`, $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are isomorphic. 
Moreover, by {prf:ref}`l-faux`, the fundamental max-optimality results hold for $(\hat V, \hat{\TT}_{\rm EZ})$ and max-VFI, max-OPI, and max-HPI all converge. {prf:ref}`t-iso2` and {prf:ref}`t-iso3` now imply that the same results hold for $(V, \TT_{\rm EZ})$. Next, suppose that $\nu < 0$. Then, by {prf:ref}`l-eziso`, $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are anti-isomorphic. Moreover, by {prf:ref}`l-faux`, the fundamental min-optimality results hold for $(\hat V, \hat{\TT}_{\rm EZ})$, which implies that $(\hat V, \hat{\TT}_{\rm EZ})$ is well-posed and min-regular. {prf:ref}`t-antiiso` then gives that $(V, \TT_{\rm EZ})$ is also well-posed and max-regular. In addition, min-VFI, min-OPI, and min-HPI all converge for $(\hat V, \hat{\TT}_{\rm EZ})$. Applying {prf:ref}`t-antiiso2` and {prf:ref}`t-antiiso3`, we see that the fundamental max-optimality results hold for $(V, \TT_{\rm EZ})$ and max-VFI, max-OPI, and max-HPI all converge. ◻ ``` The relationship between $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ allows us to use either one to solve for an optimal policy. For example, if $\nu < 0$, then, by {prf:ref}`t-antiiso`, any min-optimal policy for $(\hat V, \hat{\TT}_{\rm EZ})$ will be max-optimal for $(V, \TT_{\rm EZ})$. Hence we can solve for the Epstein--Zin max-optimal policy either by directly solving $(V, \TT_{\rm EZ})$ or by solving $(\hat V, \hat{\TT}_{\rm EZ})$ for a min-optimal policy. The best choice depends on computational simplicity and numerical stability. (s-subs)= ## Semiconjugate Relationships In this section we introduce an asymmetric relationship between ADPs that involves a form of factorization. This factorization produces two versions of an ADP: a primary one and a subordinate one. Typically, the primary ADP will be relatively standard, while the subordinate ADP can be thought of as a variation that might have some analytical or computational advantages.
Under certain conditions, studying the "simpler" subordinate ADP will shed light on outcomes and solutions for the primary ADP. In the applications we consider, the associated transformations are not bijective, unlike the isomorphic relationships considered in {ref}`ss-iso`. Sometimes the lack of bijective transformations occurs because one dynamic program (the primary ADP) evolves in a higher dimensional space than the other (the subordinate ADP). Although the transformations in question are not bijections, we nonetheless show that the primary and subordinate ADPs have tight connections in terms of optimality. This section is related to the dynamic programs associated with the modified Bellman equations we introduced in Chapter 5 of {cite}`sargent2025dynamic`. Relative to that theory, the exposition below is more concise and more general. As a result, we can easily cover additional variations on the MDP Bellman equation, of which there are many, as well as studying relationships between ADPs beyond the traditional MDP setting. ### Strong Semiconjugacy We begin by introducing a similarity notion related to conjugacy (see {ref}`sss-orderiso`) and show how it links the dynamics of trajectories on posets. (sss-semiconj)= #### Definition Let $(V, S)$ and $(\hat V, \hat S)$ be dynamical systems, where $V$ and $\hat V$ are posets. In this setting, we call $(V, S)$ and $(\hat V, \hat S)$ **strongly semiconjugate under $F,G$** when there exist maps $F \colon V \to \hat V$ and $G \colon \hat V \to V$ such that $$ S = G \circ F \text{ on } V \qquad \text{and} \qquad \hat S = F \circ G \text{ on } \hat V. $$ (eq-cojm) The "semiconjugate" terminology comes from the fact that, when {eq}`eq-cojm` holds, $$ F \circ S = \hat S \circ F \quad \text{and} \quad G \circ \hat S = S \circ G . $$ (eq-oscpo) An immediate implication of the definitions is: if either $F$ or $G$ is an order isomorphism, then the systems $(V, S)$ and $(\hat V, \hat S)$ are order conjugate. 
However, in the applications we consider, neither $F$ nor $G$ will be bijective. ```{exercise} :label: ex-transforms-auto-4 Confirm that {eq}`eq-cojm` implies {eq}`eq-oscpo`. ``` {numref}`fig-conj_semiconj` helps to illustrate the difference between conjugacy and strong semiconjugacy. ```{figure} figures/conj_semiconj.svg :name: fig-conj_semiconj Comparison of conjugacy and strong semiconjugacy ``` Like order conjugacy, strong semiconjugacy can be used to derive useful relationships between dynamical systems. The next lemma lists relationships that will be helpful when we turn to dynamic programming. ```{prf:lemma} :label: l-foo0 Let $(V, S)$ and $(\hat V, \hat S)$ be strongly semiconjugate under $F, G$. In this setting, 1. if $v$ is a fixed point of $S$ in $V$, then $Fv$ is a fixed point of $\hat S$ in $\hat V$. 2. if $w$ is a fixed point of $\hat S$ in $\hat V$, then $G w$ is a fixed point of $S$ in $V$. 3. $S$ has a unique fixed point in $V$ if and only if $\hat S$ has a unique fixed point in $\hat V$. ``` ```{prf:proof} Let $(V, S)$ and $(\hat V, \hat S)$ be as stated. If $v$ is a fixed point of $S$ in $V$, then $\hat S F v = F S v = F v$, so $Fv$ is a fixed point of $\hat S$ in $\hat V$. Similarly, if $w$ is a fixed point of $\hat S$ in $\hat V$, then $S G w = G \hat S w = G w$, so $G w$ is a fixed point of $S$ in $V$. This proves (i)--(ii). Regarding (iii), suppose that $v$ is the only fixed point of $S$ in $V$. By (i), $Fv$ is a fixed point of $\hat S$ in $\hat V$. Suppose in addition that $w$ is fixed for $\hat S$. Then $F G w = w$ and hence $G F G w = G w$, or $S G w = G w$. Since $v$ is the only fixed point of $S$ in $V$, we have $G w = v$. Applying $F$ gives $\hat S w = F v$. But $w$ is fixed for $\hat S$, so $w = F v$. This shows that $\hat S$ has exactly one fixed point in $\hat V$. The reverse implication holds by symmetry. ◻ ``` The next lemma extends {prf:ref}`l-foo0` to cover order stability. 
```{prf:lemma} :label: l-foo0b Let $(V, S)$ and $(\hat V, \hat S)$ be strongly semiconjugate under $F, G$, and let $F, G$ be either both order preserving or both order reversing. Then 1. $S$ is order stable on $V$ if and only if $\hat S$ is order stable on $\hat V$. 2. If, in addition, $F$ and $G$ are order continuous, then $S$ is strongly order stable on $V$ if and only if $\hat S$ is strongly order stable on $\hat V$. ``` ```{prf:proof} Suppose first that $S$ is order stable on $V$, with unique fixed point $\bar v \in V$. By {prf:ref}`l-foo0`, $F \bar v$ is the unique fixed point of $\hat S$ in $\hat V$. To verify the order-stability conditions on $\hat V$, fix $w \in \hat V$ with $w \preceq \hat S w$. If $F$ and $G$ are order preserving, then $G w \preceq G \hat S w = S G w$, so order stability of $S$ gives $G w \preceq \bar v$. Applying $F$ yields $F G w \preceq F \bar v$, that is, $\hat S w \preceq F \bar v$. Combined with $w \preceq \hat S w$, this gives $w \preceq F \bar v$. If instead $F$ and $G$ are order reversing, then $G \hat S w \preceq G w$, i.e., $S G w \preceq G w$, so order stability of $S$ gives $\bar v \preceq G w$. Applying the order-reversing $F$ yields $F G w \preceq F \bar v$, again giving $w \preceq \hat S w \preceq F \bar v$. The proof that $\hat S w \preceq w$ implies $F \bar v \preceq w$ is similar. Hence $\hat S$ is order stable on $\hat V$. The reverse implication holds by symmetry, proving (i). Regarding (ii), suppose that $S$ is strongly order stable on $V$ and fix $w \in \hat V$ with $w \preceq \hat S w$. If $F$ and $G$ are order preserving, then $G w \preceq S G w$, so strong order stability of $S$ gives $S^n(Gw) \uparrow \bar v$. Applying $F$ yields $F(S^n(Gw)) \uparrow F \bar v$. If instead $F$ and $G$ are order reversing, then $S G w \preceq G w$, so strong order stability of $S$ gives $S^n(Gw) \downarrow \bar v$, and the order-reversing $F$ gives $F(S^n(Gw)) \uparrow F \bar v$. 
In both cases, using $\hat S^{n+1} w = F(S^n(Gw))$ (a consequence of {eq}`eq-oscpo`), we obtain $\hat S^n w \uparrow F \bar v$. The proof that $\hat S w \preceq w$ implies $\hat S^n w \downarrow F \bar v$ is similar. Hence $\hat S$ is strongly order stable on $\hat V$. The reverse implication holds by symmetry, proving (ii). ◻ ``` When we get to applications, our main aim will be to convert a given system $(V,S)$ into a "nicer" system $(\hat V, \hat S)$ and then learn about $(V, S)$ by studying $(\hat V, \hat S)$. In particular, we wish to (a) deduce the existence of a unique fixed point of $(V,S)$ only by studying $(\hat V, \hat S)$, and (b) compute this unique fixed point, working only with $(\hat V, \hat S)$. The next theorem shows the way. It draws on {prf:ref}`l-foo0` and {prf:ref}`l-foo0b` for translation of fixed points and order stability, and then adds a convergence result. ```{prf:theorem} :label: t-sosf Let $(V, S)$ and $(\hat V, \hat S)$ be strongly semiconjugate under order-preserving $F, G$, and let $G$ be order continuous. If $(\hat V, \hat S)$ is strongly order stable with unique fixed point $\bar w$, then $(V, S)$ is strongly order stable with unique fixed point $\bar v \coloneq G \bar w$ and $$ w \in \hat V \text{ and } w \preceq \hat S w \quad \implies \quad G \hat S^n w \uparrow \bar v . $$ (eq-oros) ``` Notice that, in {eq}`eq-oros`, we iterate only in the "nice" system $(\hat V, \hat S)$, and finally transfer back to $V$ using the mapping $G$. ```{prf:proof} *Proof of {prf:ref}`t-sosf`.* Let $(V, S)$ and $(\hat V, \hat S)$ be as stated. The fact that $(V, S)$ is strongly order stable with unique fixed point $\bar v \coloneq G \bar w$ follows from {prf:ref}`l-foo0` and {prf:ref}`l-foo0b`. Regarding {eq}`eq-oros`, fix $w \in \hat V$ with $w \preceq \hat S w$. By strong order stability of $(\hat V, \hat S)$, we have $\hat S^n w \uparrow \bar w$. Order continuity of $G$ then yields $G \hat S^n w \uparrow G \bar w = \bar v$. 
◻ ``` (sss-orsc)= #### The Order-Reversing Case {prf:ref}`l-foo0b` already tells us that order stability transfers between strongly semiconjugate systems when $F$ and $G$ are both order reversing. The next result parallels {prf:ref}`t-sosf` for this case. ```{prf:theorem} :label: t-sosfr Let $(V, S)$ and $(\hat V, \hat S)$ be strongly semiconjugate under order-reversing $F, G$, and suppose that $w_n \downarrow w$ in $\hat V$ implies $G w_n \uparrow Gw$ in $V$. If $(\hat V, \hat S)$ is strongly order stable with unique fixed point $\bar w$, then $(V, S)$ is strongly order stable with unique fixed point $\bar v \coloneq G \bar w$ and $$ w \in \hat V \text{ and } \hat S w \preceq w \quad \implies \quad G \hat S^n w \uparrow \bar v . $$ (eq-orosr) ``` ```{prf:proof} Strong order stability of $(V, S)$ follows from {prf:ref}`l-foo0` and {prf:ref}`l-foo0b`. Regarding {eq}`eq-orosr`, fix $w \in \hat V$ with $\hat S w \preceq w$. By strong order stability of $(\hat V, \hat S)$, we have $\hat S^n w \downarrow \bar w$. The hypothesis on $G$ then yields $G \hat S^n w \uparrow G \bar w = \bar v$. ◻ ``` Note that the initial condition changes from $w \preceq \hat S w$ in {prf:ref}`t-sosf` to $\hat S w \preceq w$ in {prf:ref}`t-sosfr`. Starting above the fixed point in $\hat V$ and iterating down generates a sequence that, after applying the order-reversing map $G$, converges up to $\bar v$. #### Application: Firm Entry To illustrate strong semiconjugacy and its implications, we now study a firm entry problem from {cite}`fajgelbaum2017uncertainty`, slightly extended to allow discount rates to change over time. The basic structure of the model is similar to the firm problem we studied in {ref}`s-fpintro`. We show that the functional equation that describes firm value can be solved in lower-dimensional space and then mapped back to the original higher-dimensional space. 
Analogous to {eq}`eq-fintroie`, we take the lifetime value of a firm to be a function $v$ that solves $$ v(z, f) = \max \left\{ s(z) - f, \; \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\}. $$ (eq-lf) Here $z$ is an exogenous state, taking values in $\RR^k$, the real number $f$ represents an iid fixed cost, with distribution $\phi$, and $Q$ is a stochastic kernel for the exogenous state. The value $s(z)$ represents the present value of profits for the firm if it chooses to enter in the current period. Discounting is implemented via a state-dependent factor $\beta(z) > 0$. Let $\Esf \subseteq \RR_+$ be the set where $f$ takes values and let $\Zsf \subseteq \RR^k$ be the set where $z$ takes values. Let $\Xsf$ be the Cartesian product $\Zsf \times \Esf$. Let $b\Xsf$ and $b\Zsf$ be the sets of real-valued, bounded, Borel measurable functions on $\Xsf$ and $\Zsf$ respectively. We endow both function spaces with the pointwise partial order $\leq$ and the supremum norm. ```{prf:assumption} :label: a-fv The following conditions hold: 1. $\beta \colon \Zsf \to (0, \infty)$ is bounded and Borel measurable, 2. the profit function $s \colon \Zsf \to \RR$ is bounded and Borel measurable, and 3. there exists an $n \in \NN$ such that $$ \sup_{z \in \Zsf} \EE_z \prod_{t=0}^{n-1} \beta(Z_t) < 1. $$ ``` In (iii), the process $(Z_t)$ is $Q$-Markov with initial condition $z$. Note that, in the traditional constant discount rate setting, the expectation in (iii) is just $\beta^n$ for some constant $\beta \in (0,1)$, so the condition is automatically satisfied. To solve {eq}`eq-lf` we introduce an operator $S$, defined at each $v \in b\Xsf$ by $$ (Sv)(z, f) = \max \left\{ s(z) - f, \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\}. $$ Evidently, $v$ is a fixed point of $S$ if and only if it solves the firm valuation equation {eq}`eq-lf`.
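To make the operator $S$ concrete, here is a minimal sketch on a finite grid, where the integrals become finite sums. The grids, probabilities, and parameter values are hypothetical and chosen only for illustration. Note that the continuation term does not depend on the current fixed cost $f$, since $v$ enters only through its average over $f'$.

```python
# Hypothetical discretization (not from the text): two exogenous states,
# three fixed-cost values. Integrals are replaced by finite sums.
Z = [0, 1]                       # indices of exogenous states z
F_GRID = [0.0, 0.5, 1.0]         # possible fixed costs f
PHI = [0.2, 0.5, 0.3]            # phi: probabilities of each fixed cost
Q = [[0.9, 0.1], [0.2, 0.8]]     # Q[z][z']: transition probabilities
S_PROFIT = [0.4, 1.5]            # s(z): value of entering now
BETA = [0.95, 0.90]              # beta(z): state-dependent discount factor


def continuation(v, z):
    """beta(z) * sum_{z'} sum_{f'} v(z', f') phi(f') Q(z, z')."""
    avg = [sum(v[zp][j] * PHI[j] for j in range(len(F_GRID))) for zp in Z]
    return BETA[z] * sum(avg[zp] * Q[z][zp] for zp in Z)


def S(v):
    """Apply the discretized operator S to v, stored as v[z][j]."""
    return [[max(S_PROFIT[z] - f, continuation(v, z)) for f in F_GRID]
            for z in Z]


def iterate(v, tol=1e-10, max_iter=10_000):
    """Successive approximation: apply S until the sup-norm change is small."""
    for _ in range(max_iter):
        v_new = S(v)
        err = max(abs(a - b)
                  for row_new, row in zip(v_new, v)
                  for a, b in zip(row_new, row))
        v = v_new
        if err < tol:
            return v
    raise RuntimeError("successive approximation did not converge")


v_bar = iterate([[0.0] * len(F_GRID) for _ in Z])
for z in Z:
    print(z, [round(x, 4) for x in v_bar[z]])
```

In this sketch every $\beta(z)$ is below one, so condition (iii) of {prf:ref}`a-fv` holds with $n = 1$ and the iteration converges from any initial guess.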
Although we can study $S$ directly, this fixed point problem can be solved in a lower-dimensional space by working with an alternative operator $T \colon b\Zsf \to b\Zsf$ defined at each $w \in b\Zsf$ by $$ (T w)(z) = \int \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \phi(\diff f) $$ ```{prf:lemma} :label: l-gs If {prf:ref}`a-fv` holds, then $T$ is strongly order stable on $b\Zsf$. ``` ```{prf:proof} Clearly, $T$ is a self-map on $b\Zsf$. For any $w_1, w_2 \in b\Zsf$, we have $$ \begin{aligned} |(T w_1)(z) - (T w_2)(z)| & \leq \beta(z) \int |w_1(z') - w_2(z')| Q(z, \diff z'). \end{aligned} $$ This inequality and Theorem 2.1 of {cite}`stachurski2021dynamic` imply that $(b\Zsf, T)$ is eventually contracting and hence globally stable under the conditions of {prf:ref}`a-fv`. Since $T$ is order preserving, the claim that $T$ is strongly order stable now follows from {prf:ref}`l-pspace`. ◻ ``` To connect $S$ and $T$, we introduce the map $G \colon b\Zsf \to b\Xsf$ defined via $$ (G w)(z, f) = \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \qquad ((z,f) \in \Xsf) $$ and the map $F \colon b\Xsf \to b\Zsf$ defined by $$ (Fv)(z) = \int v(z, f) \phi(\diff f) \qquad (z \in \Zsf). $$ The significance of $F$ and $G$ stems from the next lemma. ```{prf:lemma} :label: l-stfp $(b\Xsf, S)$ and $(b\Zsf, T)$ are strongly semiconjugate under order-preserving $F, G$. ``` ```{prf:proof} Clearly, $F$ and $G$ are order preserving. For each $v \in b\Xsf$, we have $$ \begin{aligned} (G F v)(z, f) &= \max \left\{ s(z) - f, \beta(z) \int (F v)(z') Q(z, \diff z') \right\} \\ &= \max \left\{ s(z) - f, \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\} = (Sv)(z, f). \end{aligned} $$ This confirms that $S = G \circ F$ holds. 
In addition, $T = F \circ G$ holds because, for each $w \in b\Zsf$, $$ \begin{aligned} (F G w)(z) &= \int (G w)(z, f) \phi(\diff f) \\ &= \int \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \phi(\diff f) = (Tw)(z). \end{aligned} $$ This proves that $(b\Xsf, S)$ and $(b\Zsf, T)$ are strongly semiconjugate under order-preserving $F, G$, as claimed. ◻ ``` We are now ready to use the low-dimensional system $(b\Zsf, T)$ to solve the higher-dimensional system $(b\Xsf, S)$. ```{prf:proposition} :label: p-fk If {prf:ref}`a-fv` holds, then the firm valuation functional equation {eq}`eq-lf` has a unique solution $\bar v$ in $b\Xsf$. Moreover, for $w \in b\Zsf$, $$ w \preceq T w \implies G T^n w \uparrow \bar v. $$ ``` In proving {prf:ref}`p-fk`, we use the following lemma. ```{prf:lemma} :label: l-gcont The map $G$ is order continuous. ``` ```{prf:proof} The statement $w_n \uparrow w$ in $b\Zsf$ is equivalent to the real convergence of $w_n(z)$ up to $w(z)$ for all $z \in \Zsf$ ({prf:ref}`l-pcid`). Using this fact, we fix $w_n \uparrow w$ in $b\Zsf$ and obtain $w_n(z') \uparrow w(z')$ for any $z' \in \Zsf$. Next we fix $(z, f) \in \Xsf$ and use the dominated convergence theorem to obtain $\int w_n(z') Q(z, \diff z') \uparrow \int w(z') Q(z, \diff z')$. This in turn implies that $(G w_n)(z, f) \uparrow (G w)(z, f)$. Thus, using {prf:ref}`l-pcid` again, $G w_n \uparrow G w$. ◻ ``` ```{prf:proof} *Proof of {prf:ref}`p-fk`.* We saw in {prf:ref}`l-stfp` that $(b\Xsf, S)$ and $(b\Zsf, T)$ are strongly semiconjugate under order-preserving $F, G$. In {prf:ref}`l-gcont` we proved that $G$ is order continuous. Finally, we proved in {prf:ref}`l-gs` that $T$ is strongly order stable on $b\Zsf$. All the claims in {prf:ref}`p-fk` now follow from {prf:ref}`t-sosf`. ◻ ``` (ss-fdps)= ### Factored Dynamic Programs Now we switch from studying dynamical systems to studying dynamic programs. 
This is straightforward because we view dynamic programs (ADPs) as families of dynamical systems. (definition)= #### Definition A **factored dynamic program** (FDP) is a tuple $(V, F, \hat V, \GG)$ where 1. $V$ and $\hat V$ are nonempty posets, 2. $F$ is a map from $V$ to $\hat V$, 3. $\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}$ is a family of maps from $\hat V$ to $V$, and 4. the set $\{G_\sigma \, \hat v\}_{\sigma \in \Sigma}$ has a greatest element for every $\hat v \in \hat V$. If $F$ and every $G_\sigma$ are order preserving, then we call $(V, F, \hat V, \GG)$ an **order-preserving FDP**. (In {ref}`ss-orfdps`, we introduce order-reversing FDPs. This case is rarer, typically involving some form of nonstandard preferences.) Given an FDP $(V, F, \hat V, \GG)$, we introduce an operator family $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ by setting $$ T_\sigma = G_\sigma \circ F \text{ for all } \sigma \in \Sigma. $$ Evidently $(V, \TT)$ is an ADP. We call it the **primary ADP** generated by $(V, F, \hat V, \GG)$. The FDP $(V, F, \hat V, \GG)$ also produces a second ADP $(\hat V, \hat{\TT})$, where the policy operators in $\hat{\TT}$ take the form $$ \hat T_\sigma = F \circ G_\sigma \qquad \text{ for all } \sigma \in \Sigma. $$ We call $(\hat V, \hat{\TT})$ the **subordinate ADP** generated by $(V, F, \hat V, \GG)$. The figure below illustrates these constructions, with the numbers indicating the order in which mappings are applied. (The ordering of the tuple $(V, F, \hat V, \GG)$ traces the primary cycle on the left.) (sss-sprs)= #### Some Preliminary Results In this section we examine some basic consequences of the definitions, focusing on the order-preserving case. To this end, let $(V, F, \hat V, \GG)$ be an order-preserving FDP. We set $$ \Gmax \, \hat v \coloneq \bigvee_\sigma G_\sigma \, \hat v \qquad (\hat v \in \hat V), $$ (eq-gv) which is well-defined by (iv) in the definition of FDPs.
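As a small illustration of these definitions, consider the following scalar example. It is hypothetical and is not one of the applications treated below. Let $V = \hat V = \RR$ with the usual order, let $\Sigma = \{1, 2\}$, and fix constants $\beta \in (0, 1)$ and $r_1 < r_2$. Set $F v = \beta v$ and $G_\sigma \, \hat v = r_\sigma + \hat v$. Both maps are order preserving and the set $\{G_1 \hat v, G_2 \hat v\}$ has greatest element $r_2 + \hat v$, so $(V, F, \hat V, \GG)$ is an order-preserving FDP. Direct calculation gives

$$
\begin{aligned}
    T_\sigma \, v & = G_\sigma F v = r_\sigma + \beta v, \\
    \hat T_\sigma \, \hat v & = F G_\sigma \, \hat v = \beta (r_\sigma + \hat v), \\
    \Gmax \, \hat v & = r_2 + \hat v.
\end{aligned}
$$

The unique fixed points of $T_\sigma$ and $\hat T_\sigma$ are, respectively,

$$
    v_\sigma = \frac{r_\sigma}{1 - \beta}
    \quad \text{and} \quad
    \hat v_\sigma = \frac{\beta \, r_\sigma}{1 - \beta}.
$$

Substitution confirms that, in this example, $F v_\sigma = \hat v_\sigma$ and $G_\sigma \, \hat v_\sigma = r_\sigma + \beta r_\sigma / (1 - \beta) = v_\sigma$.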
We now state some preliminary results concerning the ADPs generated by $(V, F, \hat V, \GG)$. ```{prf:lemma} :label: l-gmax If $(V, \TT)$ is the primary ADP generated by $(V, F, \hat V, \GG)$, then 1. the Bellman operator $T$ obeys $T = \Gmax \circ F$ on $V$, and 2. $\sigma$ is $v$-greedy for $(V, \TT)$ if and only if $G_\sigma F v = \Gmax F v$. ``` ```{prf:proof} Regarding (i), for any $v \in V$ we have $T v = \bigvee_\sigma T_\sigma \, v = \bigvee_\sigma G_\sigma \, F v = \Gmax F v$. Claim (ii) follows from (i) by {prf:ref}`l-torper`. ◻ ``` ```{prf:lemma} :label: l-gmh If $(\hat V, \hat{\TT})$ is the subordinate ADP generated by $(V, F, \hat V, \GG)$, then 1. the Bellman operator $\hat T$ obeys $\hat T = F \circ \Gmax$ on $\hat V$, and 2. if $G_\sigma \, \hat v = \Gmax \hat v$, then $\sigma$ is $\hat v$-greedy for $(\hat V, \hat{\TT})$. ``` ```{prf:proof} Regarding (i), fix $\hat v \in \hat V$. By (iv), the set $\{G_\sigma \, \hat v\}$ has a greatest element, so $$ \htmax \hat v = \bigvee_\sigma F G_\sigma \, \hat v = F \bigvee_\sigma G_\sigma \, \hat v = F \Gmax \hat v. $$ The second equality holds because $F$ is order preserving and hence maps the greatest element of $\{G_\sigma \, \hat v\}$ to the greatest element of $\{F G_\sigma \, \hat v\}$. Regarding (ii), if $G_\sigma \, \hat v = \Gmax \hat v$, then applying $F$ to both sides gives $\hat T_\sigma \, \hat v = \hat T \hat v$. Hence $\sigma$ is $\hat v$-greedy for $(\hat V, \hat{\TT})$. ◻ ``` Let $(V, F, \hat V, \GG)$ be an order-preserving FDP with primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$. We have already observed that the policy operators of $(V, \TT)$ and $(\hat V, \hat{\TT})$ obey $$ T_\sigma = G_\sigma \circ F \quad \text{and} \quad \hat T_\sigma = F \circ G_\sigma $$ (eq-oscpo0) for all $\sigma \in \Sigma$. It follows that each pair of policy systems $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ is strongly semiconjugate under $F, G_\sigma$. As shown in {prf:ref}`l-gmax` and {prf:ref}`l-gmh`, we also have $$ \tmax = \Gmax \circ F \; \text{ on } V \qquad \text{and} \qquad \htmax = F \circ \Gmax \; \text{ on } \hat V.
$$ (eq-ttse) ```{prf:lemma} :label: l-subcg2 The systems $(V, T)$ and $(\hat V, \hat T)$ are strongly semiconjugate under $F, \Gmax$. ``` The proof is immediate from {eq}`eq-ttse`. The strong semiconjugacy results in {prf:ref}`l-subcg2` imply helpful similarity properties for the two ADPs. The next lemma helps to illustrate. ```{prf:lemma} :label: l-subop The following relationships hold: 1. $(\hat V, \hat{\TT})$ is well-posed if and only if $(V, \TT)$ is well-posed, and 2. $(\hat V, \hat{\TT})$ is order stable if and only if $(V, \TT)$ is order stable. In either case, the $\sigma$-value functions are linked by $$ \hat v_\sigma = F v_\sigma \quad \text{and} \quad v_\sigma = G_\sigma \, \hat v_\sigma \quad \text{for all} \quad \sigma \in \Sigma. $$ (eq-vshvs) ``` ```{prf:proof} All claims follow from {prf:ref}`l-foo0` and {prf:ref}`l-foo0b` and the observation that $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ are strongly semiconjugate under order-preserving $F, G_\sigma$ at every $\sigma \in \Sigma$. ◻ ``` Note that {eq}`eq-ttse` implies that $(V, \tmax)$ and $(\hat V, \htmax)$ are strongly semiconjugate under order-preserving $F, \Gmax$. This fact allows us to connect optima for the two ADPs and we use it repeatedly below. #### Optimality Now we are ready to study the extent to which optimality properties are transferred under subordination. As before, the context is that $(V, F, \hat V, \GG)$ is an order-preserving FDP with primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$. The symbols $\tmax$ and $\htmax$ denote their respective Bellman operators. When they exist, $$ \vmax = \bigvee_\sigma v_\sigma \quad \text{and} \quad \hvmax = \bigvee_\sigma \hat v_\sigma $$ will represent their respective value functions. We begin with our main optimality result for order-preserving FDPs and the two ADPs they generate. ```{prf:theorem} :label: t-rgsc The following statements are equivalent: 1. The fundamental optimality properties hold for $(V, \TT)$. 2. 
The fundamental optimality properties hold for $(\hat V, \hat{\TT})$. If either and hence both of these statements are true, then 1. the value functions obey $$ \vmax = \Gmax \, \hvmax \quad \text{and} \quad \hvmax = F \, \vmax, $$ (eq-nttse) 2. $G_\sigma \, \hvmax = \Gmax \, \hvmax$ $\implies$ $\sigma$ is optimal for $(V, \TT)$, and 3. $\sigma$ is optimal for $(V, \TT)$ $\implies$ $\sigma$ is optimal for $(\hat V, \hat{\TT})$. ``` In the proof of {prf:ref}`t-rgsc`, we repeatedly use the strong semiconjugacy results in {prf:ref}`l-foo0` to transfer fixed points from one value space to another. ```{prf:proof} *Proof of {prf:ref}`t-rgsc`.* Suppose that (a) holds. Then $\vmax$ exists and is the unique fixed point of $\tmax$ in $V$. {prf:ref}`l-subcg2` states that $(V, \tmax)$ and $(\hat V, \htmax)$ are strongly semiconjugate under $F, \Gmax$, so, by {prf:ref}`l-foo0`, $F\vmax$ is the unique fixed point of $\htmax$ in $\hat V$. We claim that $\hvmax = F \vmax$. To see this, observe first that, given $\sigma \in \Sigma$, we have $v_\sigma \preceq \vmax$, so, applying the fixed point translation in {eq}`eq-vshvs`, we get $\hat v_\sigma = F v_\sigma \preceq F \vmax$. The last inequality becomes an equality if $\sigma$ is optimal for $(V, \TT)$. This proves that the supremum $\hvmax = \bigvee_\sigma \hat v_\sigma$ is equal to $F \vmax$. In particular, $\hvmax$ exists and is the unique fixed point of $\hat T$ in $\hat V$. By {prf:ref}`p-foo`, (b) holds. Now suppose that (b) holds. Then $\hvmax$ exists and is the unique fixed point of $\htmax$ in $\hat V$. Since $(V, \tmax)$ and $(\hat V, \htmax)$ are strongly semiconjugate under $F, \Gmax$, the element $\Gmax \hvmax$ is the unique fixed point of $\tmax$ in $V$. To prove (a), we need only show that $\vmax$ exists and $\vmax = \Gmax \hvmax$, since {prf:ref}`p-foo` then gives the fundamental optimality properties. First take any $\sigma \in \Sigma$. 
We have $\hat v_\sigma \preceq \hvmax$ and, by strong semiconjugacy of $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ under $F, G_\sigma$, the equality $v_\sigma = G_\sigma \, \hat v_\sigma$. Since $G_\sigma$ is order preserving, these facts together give $v_\sigma = G_\sigma \, \hat v_\sigma \preceq G_\sigma \, \hvmax \preceq \Gmax \hvmax$. This proves that $\Gmax \hvmax$ is an upper bound of $V_\Sigma$ in $V$. Thus, we can complete the proof by producing a $\sigma \in \Sigma$ with $v_\sigma = \Gmax \hvmax$. To this end, we choose $\sigma$ such that $G_\sigma \, \hvmax = \Gmax \hvmax$, which is possible by condition (iv) in the definition of FDPs. Bellman's principle of optimality holds for $(\hat V, \hat{\TT})$ (by (b) and {prf:ref}`l-fo`) and applying $F$ to the previous equality gives $\hat T_\sigma \, \hvmax = \hat T \hvmax$. Hence $\sigma$ is optimal for $(\hat V, \hat{\TT})$, so $\hat v_\sigma = \hvmax$. Combining this equality with $G_\sigma \, \hvmax = \Gmax \hvmax$ yields $v_\sigma = G_\sigma \, \hat v_\sigma = G_\sigma \, \hvmax = \Gmax \hvmax$. Now suppose that (a) and (b) hold. The arguments above show that {eq}`eq-nttse` is valid, which proves (i). Regarding (ii), let $\sigma \in \Sigma$ be such that $G_\sigma \, \hvmax = \Gmax \, \hvmax$. Applying {eq}`eq-nttse` yields $G_\sigma \, F \vmax = \Gmax F \vmax$, or $T_\sigma \, \vmax = \tmax \, \vmax$. By Bellman's principle of optimality, $\sigma$ is optimal for $(V, \TT)$. Regarding (iii), let $\sigma$ be optimal for $(V, \TT)$. Since $(V, \TT)$ obeys the fundamental optimality properties, $\sigma$ is $\vmax$-greedy (i.e., $T_\sigma \, \vmax = \tmax \vmax$). Also, by {eq}`eq-nttse`, we have $\hvmax = F \vmax$. Therefore, $$ \hat T_\sigma \, \hvmax = \hat T_\sigma \, F \vmax = F G_\sigma \, F \vmax = F \, T_\sigma \, \vmax = F \, T \vmax = F \vmax = \hvmax = \htmax \, \hvmax. $$ Thus, $\sigma$ is $\hvmax$-greedy for $(\hat V, \hat{\TT})$.
But Bellman's principle of optimality also holds for $(\hat V, \hat{\TT})$, so $\sigma$ is optimal for $(\hat V, \hat{\TT})$. ◻ ``` (sss-aci)= #### A Converse Implication We continue to assume that $(V, F, \hat V, \GG)$ is an order-preserving FDP with primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$. We saw in {prf:ref}`t-rgsc` that the following implication holds: if $\sigma$ is optimal for $(V, \TT)$, then $\sigma$ is optimal for $(\hat V, \hat{\TT})$. The converse is not in general true. Often this is because, at an intuitive level, the subordinate ADP is blind to policy behavior at states that are unreachable under the transition dynamics, while the primary ADP cares about every state. (An example of this scenario is given in {ref}`sss-oh_conv`.) To obtain such a converse, we use a strict monotonicity condition on $F$. In what follows, $\prec$ refers to the strict inequality defined in {ref}`sss-stmon`. In particular, for elements $u, v$, the statement $u \prec v$ means that $u \preceq v$ and not $u = v$. Also, $F$ is strictly order preserving when $F$ is order preserving and $u \prec v$ implies $Fu \prec Fv$. ```{prf:proposition} :label: p-stmon Let the fundamental optimality properties hold for $(\hat V, \hat{\TT})$ and suppose, in addition, that $F$ is strictly order preserving. In this setting, 1. the fundamental optimality properties also hold for $(V, \TT)$ and 2. $\sigma$ is optimal for $(\hat V, \hat{\TT})$ $\implies$ $\sigma$ is optimal for $(V, \TT)$. ``` ```{prf:proof} Part (i) is immediate from {prf:ref}`t-rgsc` and only included for completeness. Regarding part (ii), let $\sigma$ be optimal for $(\hat V, \hat{\TT})$. Since the fundamental optimality properties hold for $(\hat V, \hat{\TT})$, the policy $\sigma$ must be $\hvmax$-greedy, from which we obtain $\hat T_\sigma \, \hvmax = \htmax \hvmax$. By definition of $\Gmax$, we also have $G_\sigma \, \hvmax \preceq \Gmax \hvmax$. 
Suppose in addition that $G_\sigma \, \hvmax \prec \Gmax \hvmax$. Since $F$ is strictly order-preserving, this leads to $F G_\sigma \, \hvmax \prec F \Gmax \hvmax$. But this contradicts $\hat T_\sigma \, \hvmax = \htmax \hvmax$, so $G_\sigma \, \hvmax \prec \Gmax \hvmax$ cannot hold. Hence, it must be that $G_\sigma \, \hvmax = \Gmax \hvmax$. By (ii) of {prf:ref}`t-rgsc`, we see that $\sigma$ is optimal for $(V, \TT)$. ◻ ``` (ss-orfdps)= ### Order-Reversing FDPs An **order-reversing FDP** is an FDP $(V, F, \hat V, \GG)$ where $F$ is order-reversing and each $G_\sigma$ is order-reversing. As in the order-preserving case, $T_\sigma \coloneq G_\sigma \circ F$ and $\hat T_\sigma \coloneq F \circ G_\sigma$ define the policy operators for the primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$, respectively. Since $F$ and $G_\sigma$ are both order reversing, each $T_\sigma$ and $\hat T_\sigma$ is order preserving, so these are valid ADPs. Throughout this section, - $(V, F, \hat V, \GG)$ is a given FDP, - $(V, \TT)$ is the primary ADP, and - $(\hat V, \hat{\TT})$ is the subordinate ADP. The primary ADP behaves just as in the order-preserving case: ```{prf:lemma} :label: l-gmaxr If $(V, F, \hat V, \GG)$ is order-reversing, then 1. the Bellman max-operator $\tmax$ obeys $\tmax = \Gmax \circ F$ on $V$, and 2. $\sigma$ is $v$-greedy for $(V, \TT)$ if and only if $G_\sigma F v = \Gmax F v$. ``` ```{prf:proof} The proof is identical to the proof of {prf:ref}`l-gmax`: claim (i) follows from $\tmax v = \bigvee_\sigma G_\sigma F v = \Gmax F v$ and (ii) follows from (i). ◻ ``` The subordinate ADP is where the order-reversing case diverges. Because $F$ reverses order, it converts the supremum $\Gmax$ into an infimum, so $F \circ \Gmax$ gives the Bellman *min*-operator rather than the max-operator. ```{prf:lemma} :label: l-gmhr If $(V, F, \hat V, \GG)$ is order-reversing, then 1. the Bellman min-operator $\htmin$ obeys $\htmin = F \circ \Gmax$ on $\hat V$, and 2. 
if $G_\sigma \, \hat v = \Gmax \hat v$, then $\sigma$ is $\hat v$-min-greedy for $(\hat V, \hat{\TT})$. ``` ```{prf:proof} Regarding (i), fix $\hat v \in \hat V$. By (iv), the set $\{G_\sigma \, \hat v\}$ has a greatest element, so $$ \htmin \hat v = \bigwedge_\sigma F G_\sigma \, \hat v = F \bigvee_\sigma G_\sigma \, \hat v = F \Gmax \hat v. $$ The second equality uses the fact that $F$ is order reversing and $\{G_\sigma \, \hat v\}$ has a greatest element, so $F$ maps this greatest element to the least element of $\{F G_\sigma \, \hat v\}$. Regarding (ii), if $G_\sigma \, \hat v = \Gmax \hat v$, then $\hat T_\sigma \, \hat v = F G_\sigma \, \hat v = F \Gmax \hat v = \htmin \hat v$. Hence $\sigma$ is $\hat v$-min-greedy for $(\hat V, \hat{\TT})$. ◻ ``` The policy operator identities $T_\sigma = G_\sigma \circ F$ and $\hat T_\sigma = F \circ G_\sigma$ hold for any FDP, regardless of order properties, so each pair of policy systems $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ is strongly semiconjugate under $F, G_\sigma$. Combining these with {prf:ref}`l-gmaxr` and {prf:ref}`l-gmhr`, the Bellman operators obey $$ \tmax = \Gmax \circ F \; \text{ on } V \qquad \text{and} \qquad \htmin = F \circ \Gmax \; \text{ on } \hat V. $$ (eq-ttser) ```{prf:lemma} :label: l-subcg2r The systems $(V, \tmax)$ and $(\hat V, \htmin)$ are strongly semiconjugate under $F, \Gmax$. ``` The proof is immediate from {eq}`eq-ttser`. Since $F$ and each $G_\sigma$ are order reversing, the next exercise gives a parallel of {prf:ref}`l-subop`. ```{exercise} :label: ex-subopr Prove the following: 1. $(\hat V, \hat{\TT})$ is well-posed if and only if $(V, \TT)$ is well-posed, and 2. $(\hat V, \hat{\TT})$ is order stable if and only if $(V, \TT)$ is order stable. Show also that, in either case, the $\sigma$-value functions are linked by $$ \hat v_\sigma = F v_\sigma \quad \text{and} \quad v_\sigma = G_\sigma \, \hat v_\sigma \quad \text{for all} \quad \sigma \in \Sigma. 
$$ (eq-vshvsr) ``` ```{solution} ex-subopr All claims follow from {prf:ref}`l-foo0` and {prf:ref}`l-foo0b` and the observation that $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ are strongly semiconjugate under order-reversing $F, G_\sigma$ at every $\sigma \in \Sigma$. ``` We now study optimality for order-reversing FDPs. The primary ADP $(V, \TT)$ is a standard maximization problem, while the subordinate ADP $(\hat V, \hat{\TT})$ is a minimization problem, with Bellman min-operator $\htmin = F \circ \Gmax$. When they exist, $$ \vmax = \bigvee_\sigma v_\sigma \quad \text{and} \quad \hvmin = \bigwedge_\sigma \hat v_\sigma $$ will denote the max-value function of $(V, \TT)$ and the min-value function of $(\hat V, \hat{\TT})$, respectively. The next result is the order-reversing analog of {prf:ref}`t-rgsc`, extended to include a converse implication under a strict monotonicity condition. In the statement, $F$ is called *strictly order reversing* when $F$ is order reversing and $u \prec v$ implies $Fv \prec Fu$. ```{prf:theorem} :label: t-rgscr The following statements are equivalent: 1. The fundamental max-optimality properties hold for $(V, \TT)$. 2. The fundamental min-optimality properties hold for $(\hat V, \hat{\TT})$. If either and hence both of these statements are true, then 1. the value functions obey $$ \vmax = \Gmax \, \hvmin \quad \text{and} \quad \hvmin = F \, \vmax, $$ (eq-nttser) 2. $G_\sigma \, \hvmin = \Gmax \, \hvmin$ $\implies$ $\sigma$ is optimal for $(V, \TT)$, and 3. $\sigma$ is optimal for $(V, \TT)$ $\implies$ $\sigma$ is min-optimal for $(\hat V, \hat{\TT})$. If, in addition, $F$ is strictly order reversing, then 1. $\sigma$ is min-optimal for $(\hat V, \hat{\TT})$ $\implies$ $\sigma$ is optimal for $(V, \TT)$. ``` ```{prf:proof} The proof follows the structure of {prf:ref}`t-rgsc`, with $\htmin$ replacing $\htmax$ and inequalities reversed by the order-reversing maps. 
For (a)$\implies$(b), {prf:ref}`l-subcg2r` gives that $(V, \tmax)$ and $(\hat V, \htmin)$ are strongly semiconjugate under $F, \Gmax$, so $F\vmax$ is the unique fixed point of $\htmin$ by {prf:ref}`l-foo0`. Since $F$ is order reversing, $\hat v_\sigma = F v_\sigma \succeq F \vmax$ for all $\sigma$, with equality for optimal $\sigma$. Hence $\hvmin = F \vmax$ and (b) follows from the min-analog of {prf:ref}`p-foo`. For (b)$\implies$(a), since $G_\sigma$ is order reversing and $\hat v_\sigma \succeq \hvmin$, we get $v_\sigma = G_\sigma \hat v_\sigma \preceq G_\sigma \hvmin \preceq \Gmax \hvmin$. Choosing $\sigma$ with $G_\sigma \hvmin = \Gmax \hvmin$ and applying Bellman's principle of min-optimality yields $\hat v_\sigma = \hvmin$, so $v_\sigma = \Gmax \hvmin$. Thus $\vmax = \Gmax \hvmin$ and (a) follows from {prf:ref}`p-foo`. Parts (ii) and (iii) follow by the same arguments as in {prf:ref}`t-rgsc`, replacing $\hvmax$ with $\hvmin$ and max-greedy with min-greedy throughout. Regarding (iv), let $\sigma$ be min-optimal for $(\hat V, \hat{\TT})$, so that $\hat T_\sigma \, \hvmin = \htmin \hvmin$. Supposing $G_\sigma \, \hvmin \prec \Gmax \hvmin$ leads to $F \Gmax \hvmin \prec F G_\sigma \, \hvmin$ (since $F$ is strictly order reversing), contradicting the previous equality. Hence $G_\sigma \, \hvmin = \Gmax \hvmin$, and (ii) gives the result. ◻ ``` (s-transapps)= ## Applications In {ref}`sss-qsub`, we show that the standard MDP and its Q-factor variant are the primary and subordinate ADPs of a single order-preserving FDP, unifying results previously proved separately. In {ref}`sss-expvissub`, we apply the same framework to connect discrete choice Bellman equations with the post-action value function formulation used in structural estimation. In {ref}`ss-ezr`, we revisit the Epstein--Zin model and use factored dynamic programs to obtain a lower-dimensional subordinate ADP that is more efficient to solve. 
(sss-qsub)= ### A Deeper Analysis of Q-Factors In {ref}`ss-apps` we discussed both MDPs and Q-factor MDPs, proving separately that they satisfy the fundamental optimality properties. Of course, these two models are connected. Here we unify the models by representing them as the primary and subordinate ADPs of an FDP. We work with the finite-state MDP environment described in {ref}`ss-apps`. Using the primitives described there, we form an order-preserving FDP $(V, F, \hat V, \GG)$ by setting $$ (Fv)(x, a) = r(x, a) + \beta \sum_{x'} v(x')P(x,a,x') \qquad \left(v \in \RR^\Xsf, \; (x,a) \in \Gsf \right) $$ (eq-fv) and $$ (G_\sigma f)(x) = f(x, \sigma(x)) \qquad \left(\sigma \in \Sigma, \; f \in \RR^\Gsf \right), $$ with $V \coloneq \RR^\Xsf$ and $\hat V \coloneq \RR^\Gsf$. As required, $F$ maps $V$ to $\hat V$, and $G_\sigma$ maps $\hat V$ to $V$, and both are order preserving. Also, fixing $f \in \hat V$ and choosing $\sigma$ such that $\sigma(x) \in \argmax_{a \in \Gamma(x)} f(x, a)$, we verify the existence of a policy $\sigma$ with $G_\tau \, f \leq G_\sigma f$ for all $\tau \in \Sigma$. Hence $(V, F, \hat V, \GG)$ is an order-preserving FDP, as claimed. The primary ADP $(V, \TT)$ generated by $(V, F, \hat V, \GG)$ is produced by setting $T_\sigma = G_\sigma \circ F$ for each $\sigma$, which gives $$ (T_\sigma \, v)(x) = (G_\sigma \, F \, v)(x) = r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x'). $$ Thus, $(V, \TT)$ is nothing but the standard ADP generated from an MDP (see {ref}`sss-masa`). The subordinate ADP $(\hat V, \hat{\TT})$ generated by $(V, F, \hat V, \GG)$ is produced by setting $\hat T_\sigma = F \circ G_\sigma$ for each $\sigma$, which gives $$ (\hat T_\sigma \, f)(x, a) = (F \, G_\sigma \, f)(x, a) = r(x, a) + \beta \sum_{x'} f(x', \sigma(x')) P(x, a, x'). $$ This map $\hat T_\sigma$ is identical to the Q-factor policy operator $S_\sigma$ we constructed in {eq}`eq-pabos`.
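The two compositions above can be checked numerically on a small example. The following is a minimal sketch, not part of the formal development. It assumes randomly generated primitives `r`, `P` and `beta`, and it assumes that every action is feasible in every state, so that $\Gamma(x) = \Asf$. All names in the code are hypothetical.

```python
import numpy as np

# Hypothetical primitives for a small finite MDP (illustration only).
# Assumption: every action is feasible in every state, so Gamma(x) = A.
rng = np.random.default_rng(0)
n, m, beta = 4, 3, 0.95                    # number of states, actions; discount factor
r = rng.uniform(size=(n, m))               # r[x, a]
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)          # P[x, a, :] is a distribution over x'
states = np.arange(n)

def F(v):
    """(F v)(x, a) = r(x, a) + beta * sum_x' v(x') P(x, a, x')."""
    return r + beta * (P @ v)

def G(sigma, f):
    """(G_sigma f)(x) = f(x, sigma(x))."""
    return f[states, sigma]

sigma = rng.integers(m, size=n)            # an arbitrary policy
v = rng.normal(size=n)                     # an element of V
f = rng.normal(size=(n, m))                # an element of V-hat

# Displayed formulas for the primary and subordinate policy operators
T_sigma_v = r[states, sigma] + beta * P[states, sigma] @ v
S_sigma_f = r + beta * (P @ f[states, sigma])

primary_ok = np.allclose(G(sigma, F(v)), T_sigma_v)        # T_sigma = G_sigma o F
subordinate_ok = np.allclose(F(G(sigma, f)), S_sigma_f)    # T-hat_sigma = F o G_sigma

# A policy that maximizes f(x, a) over a dominates every other policy under G
greedy = f.argmax(axis=1)
dominates = bool(np.all(G(sigma, f) <= G(greedy, f)))
```

In this sketch, `primary_ok` and `subordinate_ok` compare the compositions $G_\sigma \circ F$ and $F \circ G_\sigma$ with the displayed formulas, while `dominates` illustrates the existence of a policy attaining the pointwise maximum of $G_\tau f$ over $\tau$.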
Thus, $(\hat V, \hat{\TT})$ is just the Q-factor ADP we examined in {ref}`sss-qfmo` (where the ADP was written as $(\RR^\Gsf, \SS)$). We already know that the fundamental optimality properties hold for both of these models, and that VFI, OPI, and HPI all converge. We took the time to prove these facts separately, in {ref}`sss-mdpsbc` and {ref}`sss-qfmo`. {prf:ref}`t-rgsc` now tells us that one of these steps was unnecessary. Since these two ADPs are the primary and subordinate elements of the FDP $(V, F, \hat V, \GG)$, establishing these facts for either one of the pairs is enough to establish them for the other. In addition, we can use {prf:ref}`t-rgsc` to formally connect the value functions and optimal policies. In the next proposition, $\vmax$ is the MDP value function and $\qmax$ is the Q-factor value function. ```{prf:proposition} :label: p-qffdp For the MDP model and the Q-factor model from {ref}`sss-qfmo`, the following relationships hold: 1. For all $(x,a) \in \Gsf$, the value functions obey $\vmax(x) = \max_{a \in \Gamma(x)} \, \qmax(x, a)$ and $$ \qmax(x, a) = r(x, a) + \beta \sum_{x'} \vmax(x')P(x,a,x'). $$ 2. $\sigma$ is optimal for the Q-factor model whenever $\sigma$ is optimal for the MDP model. 3. $\sigma$ is optimal for the MDP model whenever $\qmax(x, \sigma(x)) = \max_{a \in \Gamma(x)} \qmax(x, a)$. ``` In {ref}`sss-aci`, we discussed the fact that the converse to (ii) fails to hold without additional conditions. In particular, to get the converse to (ii), we require that $F$ is strictly order preserving. Here's how that result looks in the present case. In the result, the statement that $\Xsf$ has no **isolated point** under $P$ means that there is no $x' \in \Xsf$ such that $P(x,a,x')=0$ for all $(x,a) \in \Gsf$. ```{prf:proposition} :label: p-stmonq If $\Xsf$ has no isolated point under $P$, then any policy $\sigma$ that is optimal for the Q-factor model is also optimal for the MDP model. 
``` ```{prf:proof} In view of {prf:ref}`p-stmon` and the FDP connection between the MDP model $(V, \TT)$ and the Q-factor ADP $(\RR^\Gsf, \SS)$, we only need to show that $F$ is strictly order preserving. To see this, we can use {prf:ref}`eg-sopk`, which tells us that $P$ is strictly order preserving under the no isolated point condition. It follows that $v \mapsto r + \beta Pv$ is strictly order preserving. In other words, $F$ in {eq}`eq-fv` is strictly order preserving, as was to be shown. ◻ ``` (sss-expvissub)= ### Structural Estimation via Transforms In {ref}`ss-struct` we considered post-action value functions for discrete choice models in the context of structural estimation. Post-action value functions are a transformation of a more standard Bellman equation. Here we unify these two models through the lens of FDPs. The setting we consider is the same as in {ref}`sss-pavf`. The state space $\Xsf$ is a metric space. The action set $\Asf$ is finite. The set of policies $\Sigma$ is all measurable maps from $\Xsf$ to $\Asf$ and, correspondingly, $\Gsf \coloneq \Xsf \times \Asf$. The reward function $r \in \RR^\Gsf$ is assumed to be bounded and Borel measurable, while $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$. Consider the tuple $(b\Xsf, F, b\Gsf, \GG)$, where $$ (Fv)(x, a) = \int v(x')P(x,a, \diff x') \qquad (v \in b\Xsf) $$ (eq-evalf) and $$ (G_\sigma \, g)(x) = r(x, \sigma(x)) + \beta g(x, \sigma(x)) \qquad (g \in b\Gsf). $$ (eq-gsigexpval) We claim that $(b\Xsf, F, b\Gsf, \GG)$ is an order-preserving FDP. Clearly $F$ is an order-preserving map from $b\Xsf$ to $b\Gsf$, while each $G_\sigma$ is an order-preserving map from $b\Gsf$ to $b\Xsf$.
Moreover, we proved in {prf:ref}`ex-dmeis` that there exists a measurable map $\sigma \colon \Xsf \to \Asf$ obeying $$ \sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta g(x, a)\} \quad \text{for all } x \in \Xsf $$ (eq-ggr2) For any such $\sigma$ we have $G_\tau \, g \leq G_\sigma \, g$ for all $\tau \in \Sigma$. These facts confirm our claim. The primary ADP $(b\Xsf, \TT_{\rm SE})$ for this model is the "natural" discrete choice version of the dynamic program. To describe it, we use $T_\sigma = G_\sigma \circ F$ to determine the policy operators, yielding $$ (T_\sigma \, v)(x) = r(x, \sigma(x)) + \beta \int v(x') P(x, \sigma(x), \diff x'). $$ The corresponding Bellman equation is $$ v(x) = \max_{a \in \Asf} \left\{ r(x, a) + \beta \int v(x') P(x, a, \diff x') \right\} \qquad (v \in b\Xsf). $$ For the subordinate ADP $(b\Gsf, \hat{\TT}_{\rm SE})$, the policy operators are given by $\hat T_\sigma \, g = F G_\sigma \, g$, yielding $$ (\hat T_\sigma \, g)(x, a) = \int \left\{ r(x', \sigma(x')) + \beta g(x', \sigma(x')) \right\} P(x, a, \diff x') \qquad (g \in b\Gsf). $$ The corresponding Bellman equation is $$ g(x, a) = \int \max_{a' \in \Asf} \left\{ r(x', a') + \beta g(x', a') \right\} P(x, a, \diff x'), $$ which is exactly the post-action value function Bellman equation we examined in {eq}`eq-stbe`. We proved in {prf:ref}`p-struct` that, for this ADP $(b\Gsf, \hat{\TT}_{\rm SE})$, the fundamental optimality properties hold. {prf:ref}`t-rgsc` now implies that the same properties hold for $(b\Xsf, \TT_{\rm SE})$. It also tells us that any policy optimal for $(b\Xsf, \TT_{\rm SE})$ is also optimal for $(b\Gsf, \hat{\TT}_{\rm SE})$, and that a policy $\sigma$ is optimal for $(b\Xsf, \TT_{\rm SE})$ whenever $G_\sigma \, \gmax = \Gmax \, \gmax$. (Here $\Gmax = \bigvee_\sigma G_\sigma$ and $\gmax$ is the value function for $(b\Gsf, \hat{\TT}_{\rm SE})$, which is the unique solution to the post-action value function Bellman equation.) 
In the present setting, this means we can compute an optimal policy for the primary ADP as follows: 1. Compute the unique (in $b\Gsf$) fixed point $\gmax$ of the Bellman operator $\hat T$ corresponding to the subordinate (post-action value) ADP $(b\Gsf, \hat{\TT}_{\rm SE})$ and 2. compute a policy $\sigma$ obeying $$ \sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta \gmax(x, a)\} \quad \text{for all } x \in \Xsf. $$ Also, if the function $F$ in {eq}`eq-evalf` is strictly order preserving, then by {prf:ref}`p-stmon`, we can compute an optimal policy for $(b\Xsf, \TT_{\rm SE})$ by solving for an optimal policy for $(b\Gsf, \hat{\TT}_{\rm SE})$. The strictly order preserving property does not hold for $F$ in general. For a sufficient condition when $\Xsf$ is finite, see {prf:ref}`eg-sopk`. (ss-ezr)= ### Epstein--Zin Revisited We consider a special case of the Epstein--Zin model from {ref}`sss-ezwo` involving optimal savings with iid endowments. Using the FDP framework, we construct a subordinate ADP that operates on functions of wealth alone, rather than wealth and endowment jointly. This yields significant computational savings, which we quantify numerically. #### Subordination in an Epstein--Zin Setting In this section we consider a special case of the Epstein--Zin ADP $(V, \TT)$ analyzed in {ref}`sss-ezwo`. The special case concerns optimal savings in the presence of an iid endowment process. We will produce a subordinate ADP via a transformation reminiscent of the expected value transformation of an ordinary MDP in {ref}`sss-expvissub`. We will see that this subordinate ADP is easier to analyze and solve. We begin with a finite set $\Wsf$ of possible wealth values and a finite set $\Esf$ of possible values for the endowment process. (Finiteness helps simplify the exposition and can be replaced by continuity and compactness conditions.) 
The Bellman equation takes the form $$ v(w, e) = \max_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta \left( \sum_{e'} v(w', e')^\nu \phi(e') \right)^{\alpha/ \nu} \right\}^{1/\alpha}. $$ (eq-vew) Here $\Gamma(w, e) \subset \Wsf$ is the set of all feasible choices for next period wealth $w'$ given current wealth $w$ and current endowment $e$. The new endowment $e'$ is drawn independently from distribution $\phi$, which maps $\Esf$ into $[0,1]$. This model is a special case of the Epstein--Zin ADP from {ref}`sss-ezwo`. To see this we set $\Xsf \coloneq \Wsf \times \Esf$, with typical element $x = (w, e)$. Let $V$ be the order interval $[v_1^{1/\nu}, v_2^{1/\nu}] \subset (0, \infty)^\Xsf$ defined in {ref}`sss-ezwo` (see {eq}`eq-ezwv`). With $a = w'$ and $\Asf = \Wsf$, the Bellman equation {eq}`eq-vew` is a special case of {eq}`eq-ezwb`. To improve analysis we produce an order-preserving FDP where the primary ADP is the model just discussed and the subordinate ADP operates in a lower-dimensional state space. To do so we set $V$ as above, $$ (Fv)(w) = \left\{ \sum_{e} v(w, e)^\nu \phi(e) \right\}^{1/\nu} \qquad (w \in \Wsf), $$ which maps $V$ to $\hat V \coloneq F(V)$, and $$ (G_\sigma \, h)(w, e) = \left\{ (1-\beta) r(w, \sigma(w, e), e)^\alpha + \beta h(\sigma(w, e))^\alpha \right\}^{1/\alpha} \qquad ((w,e) \in \Xsf), $$ (eq-ezgsig) Both $F$ and $G_\sigma$ are order-preserving. Assuming that $\Gamma(w, e)$ is nonempty at each $(w, e) \in \Xsf$, one easily verifies the existence, for each $h \in \hat V$, of a policy $\sigma$ such that $$ \sigma(w, e) \in \argmax_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta h(w')^{\alpha} \right\}^{1/\alpha} \quad \text{for all } (w,e) \in \Xsf. $$ (eq-ggsig) For this $\sigma$ we have $G_\tau h \leq G_\sigma h$ for all $\tau \in \Sigma$. Hence, with $\GG = \{G_\sigma\}_{\sigma \in \Sigma}$, the tuple $(V, F, \hat V, \GG)$ is an order-preserving FDP. 
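To make the construction concrete, here is a minimal numerical sketch of $F$ and $G_\sigma$ on small grids. The parameter values, the grids, the reward $r(w, w', e) = w - w' + e$ and the feasible set $\Gamma(w, e) = \{w' \in \Wsf : w' \leq w\}$ are all assumptions made for illustration, and the function names are hypothetical. The sketch checks that $G_\sigma \circ F$ reproduces the right-hand side of {eq}`eq-vew` evaluated at $w' = \sigma(w, e)$, and that a policy chosen as in {eq}`eq-ggsig` dominates an arbitrary feasible policy.

```python
import numpy as np

# Hypothetical parameters and grids (illustration only)
alpha, nu, beta = 0.5, -1.0, 0.9
W = np.linspace(0.0, 2.0, 6)               # wealth grid
E = np.array([0.5, 1.0, 1.5])              # endowment values (all positive)
phi = np.array([0.3, 0.4, 0.3])            # endowment distribution
nW, nE = len(W), len(E)
i_idx, k_idx = np.indices((nW, nE))        # wealth and endowment indices

def F(v):
    """(F v)(w) = {sum_e v(w, e)^nu phi(e)}^(1/nu); v has shape (nW, nE)."""
    return (v**nu @ phi)**(1 / nu)

def G(sigma, h):
    """(G_sigma h)(w, e); sigma[i, k] is the index of the chosen w' in W."""
    r = W[:, None] - W[sigma] + E[None, :]           # assumed reward w - w' + e
    return ((1 - beta) * r**alpha + beta * h[sigma]**alpha)**(1 / alpha)

def greedy(h):
    """A policy attaining the maximum over feasible w' <= w at each (w, e)."""
    r = W[:, None, None] - W[None, None, :] + E[None, :, None]
    feasible = W[None, None, :] <= W[:, None, None]
    vals = (1 - beta) * np.where(feasible, r, 1.0)**alpha + beta * h**alpha
    vals = np.where(feasible, vals**(1 / alpha), -np.inf)
    return vals.argmax(axis=2)

rng = np.random.default_rng(0)
v = rng.uniform(0.5, 2.0, size=(nW, nE))   # a positive function on X = W x E
tau = rng.integers(0, i_idx + 1)           # random feasible policy (index <= i)
h = F(v)

# Right-hand side of the Bellman equation at w' = tau(w, e), computed directly
r_tau = W[:, None] - W[tau] + E[None, :]
inner = v[tau]**nu @ phi                   # sum over e' of v(w', e')^nu phi(e')
direct = ((1 - beta) * r_tau**alpha + beta * inner**(alpha / nu))**(1 / alpha)

composition_ok = np.allclose(G(tau, F(v)), direct)
sigma = greedy(h)
dominates = bool(np.all(G(tau, h) <= G(sigma, h) + 1e-12))
```

Note that `F` maps arrays of shape `(nW, nE)` to arrays of shape `(nW,)`, which is the dimension reduction exploited in the remainder of this section.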
```{exercise} :label: ex-transforms-auto-5 Show that the primary ADP for this FDP has the form $(V, \TT)$, where each $T_\sigma \in \TT$ is given by $$ (T_\sigma \, v)(w, e) = \left\{ (1-\beta) r(w, \sigma(w, e), e)^\alpha + \beta \left( \sum_{e'} v(\sigma(w, e), e')^\nu \phi(e') \right)^{\alpha/ \nu} \right\}^{1/\alpha}. $$ (eq-ezmwe0p) ``` Inspecting {eq}`eq-ezmwe0p`, we see that the corresponding Bellman equation is {eq}`eq-vew`. Thus, the primary ADP $(V, \TT)$ corresponds to the original problem we considered at the start of this section. Let's now look at the subordinate problem. ```{exercise} :label: ex-transforms-auto-6 Show that the subordinate ADP for the FDP $(V, F, \hat V, \GG)$ is $(\hat V, \hat{\TT})$, where each $\hat T_\sigma \in \hat{\TT}$ has the form $$ (\hat T_\sigma \, h)(w) = \left\{ \sum_e \left\{ (1-\beta) r(w, \sigma(w, e), e)^\alpha + \beta h(\sigma(w, e))^\alpha \right\}^{\nu/\alpha} \phi(e) \right\}^{1/\nu}. $$ (eq-ezmwepr) ``` The benefit of working with $(\hat V, \hat{\TT})$ is that $\hat T_\sigma$ acts on functions that depend only on $w$ rather than on both $w$ and $e$ (as is the case for $T_\sigma$). These lower dimensional operations are significantly more efficient, even when $\Esf$ is relatively small. Since $(V, \TT)$ is a special case of the ADP discussed in {ref}`sss-ezwo`, {prf:ref}`p-ezsbiai` implies that the fundamental optimality properties hold for $(V, \TT)$. As a result, {prf:ref}`t-rgsc` implies that they also hold for $(\hat V, \hat{\TT})$. It also tells us that we can obtain an optimal policy for $(V, \TT)$ by finding the value function $\hvmax$ for $(\hat V, \hat{\TT})$ and then calculating a policy $\sigma$ obeying $G_\sigma \, \hvmax = \Gmax \, \hvmax$. By the definition of $G_\sigma$ in {eq}`eq-ezgsig`, this means that we solve for $\sigma$ satisfying {eq}`eq-ggsig` after setting $h = \hvmax$. 
To compute $\hvmax$, we can use {prf:ref}`t-bkf`, which tells us that Howard policy iteration applied to $(\hat V, \hat{\TT})$ converges to $\hvmax$ in finitely many steps. Summarizing this analysis, an optimal policy for $(V, \TT)$ can be computed via {prf:ref}`algo-hpowb`. ```{prf:algorithm} Solving $(V, \TT)$ via $(\hat V, \hat{\TT})$ :label: algo-hpowb - input $\sigma_0 \in \Sigma$, set $k \leftarrow 0$ and $\epsilon \leftarrow 1$ - while $\epsilon > 0 $: - $h_k \leftarrow $ the fixed point of $\hat T_{\sigma_k}$ - $\sigma_{k+1} \leftarrow $ an $h_k$-greedy policy, satisfying $$ \sigma_{k+1}(w, e) \in \argmax_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta h_k(w')^\alpha \right\}^{1/\alpha} $$ - $\epsilon \leftarrow \1\{ \sigma_k \not= \sigma_{k+1} \}$ - $k \leftarrow k + 1$ - return $\sigma_{k-1}$ ``` {numref}`f-policies` shows $w \mapsto \sigopt(w, e)$ for two values of $e$ (smallest and largest) when $\sigopt$ is the optimal policy, calculated using {prf:ref}`algo-hpowb`. In the computation we set $\Gamma(w, e) = [0, w]$ and $r(w, s, e) = w - s + e$. We chose $\alpha$ and $\nu$ to match the values used in {cite}`schorfheide2018identifying`. In {numref}`f-rel_timing` we display the relative speed gain from using the lower-dimensional model $(\hat V, \hat{\TT})$ instead of $(V, \TT)$ across multiple choices of $|\Wsf|$ and $|\Esf|$. The speed gain is the time required to solve for an optimal policy for $(V, \TT)$ using HPI applied to $(V, \TT)$ (as in {prf:ref}`t-bkf`), divided by the time required to solve for the same optimal policy via {prf:ref}`algo-hpowb`. The speed gain increases linearly in the size of $\Esf$.
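A minimal sketch of {prf:ref}`algo-hpowb` is given below. It is an illustration under stated assumptions, not the code used to produce the figures. The parameter values and grids are hypothetical and are not those of {cite}`schorfheide2018identifying`, and the feasible set is the discretization $\{w' \in \Wsf : w' \leq w\}$ of $[0, w]$. The fixed point of $\hat T_{\sigma_k}$ is approximated by successive approximation, which is a choice made for this sketch, and a cap on the number of policy updates is added as a safeguard.

```python
import numpy as np

# Hypothetical parameters and grids (illustration only)
alpha, nu, beta = 0.5, -1.0, 0.9
W = np.linspace(0.0, 2.0, 20)              # wealth grid
E = np.array([0.5, 1.0, 1.5])              # endowment values (all positive)
phi = np.array([0.3, 0.4, 0.3])            # endowment distribution

# r[i, k, j] = W[i] - W[j] + E[k]; choice j is feasible at (i, k) iff W[j] <= W[i]
r = W[:, None, None] - W[None, None, :] + E[None, :, None]
feasible = np.broadcast_to(W[None, None, :] <= W[:, None, None], r.shape)
r_safe = np.where(feasible, r, 1.0)        # placeholder where infeasible

def action_values(h):
    """B[i, k, j] = {(1-beta) r^alpha + beta h(w')^alpha}^(1/alpha), -inf if infeasible."""
    B = ((1 - beta) * r_safe**alpha + beta * h[None, None, :]**alpha)**(1 / alpha)
    return np.where(feasible, B, -np.inf)

def T_hat(sigma, h):
    """Subordinate policy operator: apply G_sigma, then F."""
    i, k = np.indices(sigma.shape)
    Gh = action_values(h)[i, k, sigma]     # (G_sigma h)(w, e)
    return (Gh**nu @ phi)**(1 / nu)        # (F G_sigma h)(w)

def policy_value(sigma, tol=1e-10, max_iter=10_000):
    """Approximate the fixed point of T_hat(sigma, .) by successive approximation."""
    h = np.ones(len(W))
    for _ in range(max_iter):
        h_new = T_hat(sigma, h)
        if np.max(np.abs(h_new - h)) < tol:
            break
        h = h_new
    return h_new

def greedy(h):
    """An h-greedy policy, stored as indices into W."""
    return action_values(h).argmax(axis=2)

def solve(max_steps=100):
    """Howard policy iteration on the subordinate ADP."""
    sigma = np.zeros((len(W), len(E)), dtype=int)    # sigma_0: always choose w' = W[0]
    for _ in range(max_steps):
        h = policy_value(sigma)
        sigma_new = greedy(h)
        if np.array_equal(sigma_new, sigma):         # policy unchanged: stop
            break
        sigma = sigma_new
    return sigma, policy_value(sigma)

sigma_star, h_star = solve()
```

Every array updated inside `policy_value` has length $|\Wsf|$, whereas the analogous step for the primary ADP would update an array of size $|\Wsf| \times |\Esf|$.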
```{figure} figures/policies.pdf :name: f-policies Optimal savings policy with Epstein--Zin preference ``` ```{figure} figures/rel_timing.pdf :name: f-rel_timing Speed gain from replacing $(V, \TT)$ with subordinate model $(\hat V, \hat{\TT})$ ``` (s-cn_transforms)= ## Chapter Notes A recurring theme in applied dynamic programming has been the rearrangement of the Bellman equation into alternative functional forms in order to obtain analytical, computational, or statistical advantages. Familiar instances include reservation-wage and continuation-value formulations of optimal stopping problems, used as early as the job-search analysis of {cite}`mccall1970` and developed extensively in the real-options literature (see, e.g., {cite}`dixit2012investment`); the expected and integrated value-function transformations used in structural estimation of discrete choice models, pioneered by {cite}`rust1987optimal`; the Q-factor representations introduced for reinforcement learning by {cite}`watkins1989learning`; and lower-dimensional reformulations that integrate out independent or uncontrollable states. Systematic studies of such transformations can be found in {cite}`ma2019optimal` and {cite}`ma2021dynamic`. The order-theoretic framework adopted in this chapter, building on the abstract foundations of {cite}`sargent2025partially`, unifies and extends those analyses by treating the underlying maps in their own right rather than as recipes for individual Bellman equations. Closely related ideas in the more concrete setting of finite-state MDPs can be found in Chapter 5 of {cite}`sargent2025dynamic`. The topological conjugacy results in {ref}`sss-topcon` are applied in {ref}`ss-eqvti` of {prf:ref}`c-apps` to establish the equivalence of value function iteration and time iteration. The applications in {ref}`s-transapps` unify topics treated in their own right elsewhere in this volume. 
References to the underlying literatures can be found in the corresponding chapter notes: see {ref}`s-cn_adps2` for Q-factor representations and reinforcement learning, {ref}`s-cn_adps3` for structural estimation of dynamic discrete choice models, and {ref}`s-cn_egs` for Epstein--Zin preferences. The optimal harvest model in {ref}`ss-oh` of {prf:ref}`c-apps` is another example of a factored dynamic program in the sense of {ref}`ss-fdps`. The firm-entry application in {ref}`s-subs` is adapted from {cite}`fajgelbaum2017uncertainty`, slightly extended to permit time-varying discounting; for related industry-dynamics models see {cite}`hopenhayn1992entry`. The parameter values used in {numref}`f-policies` follow {cite}`schorfheide2018identifying`. ======================================================================== ## Linear Decision Processes In this chapter, we define linear decision processes (LDPs). In terms of level of generality, we can think of LDPs as sitting between MDPs, as discussed in {ref}`s-mdps`, and the ADPs we considered in {prf:ref}`c-adps`--{prf:ref}`c-transforms`: $$ \text{MDPs } \subset \text{ LDPs } \subset \text{ ADPs}, $$ with all inclusions being strict. LDPs have a range of advantages over MDPs while maintaining much of their tractability. One advantage is that we can work with state-dependent discounting, which is particularly important for economic and financial applications. Another is that their flexible structure makes them easy to apply. For example, optimal stopping problems can be embedded directly into the LDP framework, whereas embedding optimal stopping problems into MDPs requires expanding the state space. LDPs differ from ADPs by including actions explicitly, instead of taking policy operators as the basic primitive. This is a more traditional perspective: one where controllers observe states and respond to those states by choosing actions. 
Ultimately, a choice of action given a state will take the form of a policy function; that will lead us back to the ADPs. By studying this circle, we can leverage theory from our earlier chapters. LDPs are more limited than ADPs, but also more concrete and more structured. For example, they provide an algebraic formula for computing lifetime values similar to the one available for MDPs (see, e.g., {eq}`eq-vsigmdp`). This formula is not available for general ADPs. Thus, for LDPs, the HPI step requiring computation of lifetime values from policies is fully articulated, at least at a theoretical level. Another advantage of LDPs, relative to ADPs, is that we can start to construct systematic conditions for regularity, or existence of greedy policies, unlike in previous chapters. We begin with the theoretical foundations in {ref}`ss-affdy`. After introducing Feller properties in {ref}`sss-feller`, we define LDPs in {ref}`sss-ldpdef` and present optimality results. We then treat exogenous discount processes ({ref}`ss-exsdd`) and specialize the LDP framework to MDPs on general state spaces ({ref}`ss-mdpsrv`). In this chapter, we focus on the bounded case; unbounded models are handled in {prf:ref}`c-rdps`. {ref}`sss-nrm` and {ref}`sss-appospp` apply the theory to natural resource management and optimal savings with stochastic rates of return. (ss-affdy)= ## Theory In this section we develop the foundational theory of linear decision processes. We begin in {ref}`sss-feller` by studying Feller properties of transition kernels, which provide the continuity conditions needed for existence of optimal policies. We then define LDPs in {ref}`sss-ldpdef`, give examples, and discuss lifetime values. Next we present optimality results and their implications. {ref}`ss-exsdd` treats exogenous discount processes, and {ref}`ss-mdpsrv` specializes the LDP framework to Markov decision processes on general state spaces.
(sss-feller)= ### Feller Properties Since we are always interested in whether or not optimal policies exist, we study conditions under which future values are continuous in states and actions. In the case of LDPs, this continuity will require that integrals of transition kernels vary continuously with actions. (For background on transition kernels see {ref}`sss-sks`.) Here we provide a collection of definitions and results that help us address this question. Throughout this section, - $\Xsf$ and $\Asf$ are separable metric spaces, - $\| \cdot \| \coloneq \| \cdot \|_\infty$ is the supremum norm, - $\Gsf$ is a subset of $\Xsf \times \Asf$, and - $K$ is a transition kernel mapping $b\Xsf$ to $b\Gsf$. Here you can think of $\Gsf$ as a collection of feasible state-action pairs. The last statement means that $$ (Kv)(x, a) \coloneq \int v(x') K(x, a, \diff x') $$ is in $b\Gsf$ whenever $v \in b\Xsf$. Extending standard terminology, we will say that the transition kernel $K$ is - **weak Feller** if $Kh$ is continuous on $\Gsf$ whenever $h \in b c\Xsf$ and - **strong Feller** if $Kh$ is continuous on $\Gsf$ whenever $h \in b \Xsf$. Let's look at some special cases. ```{prf:example} :label: eg-wfell Suppose that $\Wsf$ is another metric space and that $K$ has the form $$ (Kh)(x, a) = \beta(x, a) \int h(F(x, a, w)) \phi(\diff w) \qquad ((x, a) \in \Gsf), $$ where $\phi$ is a distribution on $\Wsf$, $F \colon \Gsf \times \Wsf \to \Xsf$ is Borel measurable, $(x, a) \mapsto F(x, a, w)$ is continuous for all $w \in \Wsf$, and $\beta \in bc\Gsf$. This corresponds to the case where discounting depends on states and actions, while the state evolves according to $$ X_{t+1} = F(X_t, A_t, W_{t+1}) \quad \text{with} \quad (W_t)_{t \geq 1} \iidsim \phi \in \dD(\Wsf). $$ In this setting, $K$ is weak Feller. 
Indeed, taking $(x_n, a_n) \to (x, a)$ in $\Gsf$ and assuming $h \in b c\Xsf$, the dominated convergence theorem yields $$ \beta(x_n, a_n) \int h(F(x_n, a_n, w)) \phi(\diff w) \to \beta(x, a) \int h(F(x, a, w)) \phi(\diff w) $$ as $n \to \infty$. In particular, $Kh$ is continuous on $\Gsf$. More generally, $K$ is weak Feller whenever $(x, a) \mapsto F(x, a, w)$ is continuous for $\phi$-almost all $w \in \Wsf$. ``` The strong Feller property requires more conditions, since we need to map a potentially discontinuous function $h$ into a continuous function $Kh$. For this, we rely on smoothing properties of the integral. To obtain these properties we introduce a "dominating" measure $\mu$ on $(\Xsf, \bB)$, which we assume to be $\sigma$-finite. A Borel measurable map $p$ from $\Gsf \times \Xsf$ to $\RR$ is called a **density kernel** from $\Gsf$ to $\Xsf$ with dominating measure $\mu$ if $p$ is nonnegative and $$ \int p(x, a, x') \mu(\diff x') = 1 \quad \text{for all } (x, a) \in \Gsf. $$ We say that a stochastic kernel $P$ from $\Gsf$ to $\Xsf$ has density kernel $p$ with dominating measure $\mu$ if $p$ is a density kernel on $\Xsf$ and $$ P(x, a, B) = \int_B p(x, a, x') \mu(\diff x') \quad \text{for all } (x, a, B) \in \Gsf \times \bB. $$ If the dominating measure $\mu$ is not identified in the discussion below then we will be referring to Lebesgue measure, and we write $\diff x$ instead of $\mu(\diff x)$. The following lemma shows how a continuous density kernel can transform discontinuous functions into continuous ones under integration. ```{prf:lemma} :label: l-dksf Let the transition kernel $K$ have the form $$ (Kh)(x, a) = \beta(x, a) \int h(x') p(x, a, x') \mu(\diff x') $$ (eq-kwdm) for some $\beta \in bc\Gsf$ and a density kernel $p$ with respect to some dominating measure $\mu$. If $(x,a) \mapsto p(x, a, x')$ is continuous on $\Gsf$ for $\mu$-almost all $x' \in \Xsf$, then $K$ is strong Feller. ``` ```{prf:proof} Fix $h \in b\Xsf$. 
Since products of continuous functions are continuous, we need only show that $(Ph)(x,a) \coloneq \int h(x') p(x, a, x') \mu(\diff x')$ is continuous. Given $(x_n, a_n) \to (x, a)$ in $\Gsf$, we have $$ |(Ph)(x_n, a_n) - (Ph)(x, a)| \leq \|h\| \int |p(x_n, a_n, x') - p(x, a, x')| \mu(\diff x'). $$ The continuity condition on $p$ gives $p(x_n, a_n, x') \to p(x, a, x')$ for $\mu$-almost all $x' \in \Xsf$, so Scheffé's lemma applies. This yields $(Ph)(x_n, a_n) \to (Ph)(x,a)$. The strong Feller property follows. ◻ ``` ```{prf:example} :label: eg-srssf Suppose that $\Gsf \subset \RR^k$ and $\Xsf = \RR^m$. Let $g$ be a continuous map from $\Gsf$ to $\Xsf$, and let $\beta \colon \Xsf \to \RR$ be continuous and bounded. We consider the transition kernel from $\Gsf$ to $\Xsf$ given by $$ (Kh)(x, a) = \beta(x) \int h[g(x, a) + w] \phi(w) \diff w \qquad ((x, a) \in \Gsf), $$ (eq-addkern) where the density $\phi$ is continuous on $\Xsf$. In this setting, $K$ is strong Feller. Indeed, fix $h \in b\Xsf$. The change of variable $x' = g(x, a) + w$ yields $$ \begin{aligned} (Kh)(x, a) & = \beta(x) \int h[ g(x, a) + w ] \phi(w) \diff w \\ & = \beta(x) \int h(x') \phi(x' - g(x, a)) \diff x'. \end{aligned} $$ The strong Feller property now follows from continuity of the functions $\phi$ and $g$, combined with {prf:ref}`l-dksf`. ``` ### LDPs We now introduce LDPs and study their basic properties. {ref}`sss-ldpdef` defines LDPs and connects them to the ADP framework. We then present several examples, showing which models can and cannot be expressed as LDPs. Finally, we discuss lifetime values and their computation in the LDP setting. (sss-ldpdef)= #### Definition Let $\Xsf$ and $\Asf$ be separable metric spaces, referred to henceforth as the **state** and **action spaces**. As before, $\| \cdot \|$ denotes the supremum norm on $b \Xsf$. Given $\Xsf$ and $\Asf$, a **linear decision process (LDP)** is a tuple $(\Gamma, r, K)$ containing 1.
a nonempty correspondence $\Gamma$ from $\Xsf$ to $\Asf$ called the **feasible correspondence**, with an associated set of **feasible state-action pairs** $$ \Gsf \coloneq \graph \Gamma = \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)}, $$ (eq-ldpdg) 2. a bounded Borel measurable **reward function** $r$ mapping $\Gsf$ into $\RR$, and 3. a **transition kernel** $K$ from $\Gsf$ to $\Xsf$ satisfying $Kv \in b\Gsf$ whenever $v \in b\Xsf$. The set $\Gamma(x)$ represents all actions available to a controller in state $x$. Figure {numref}`f-gamma` shows an illustration of one possible correspondence $\Gamma$ when $\Asf = \Xsf = \RR_+$, along with $\Gsf$, the resulting set of feasible state-action pairs. When representing the LDP by the tuple $(\Gamma, r, K)$, we are treating $\Xsf$ and $\Asf$ as understood from context. ```{figure} figures/gamma.pdf :name: f-gamma Feasible correspondence and feasible state-action pairs ``` For the LDP $(\Gamma, r, K)$, a **feasible policy** is a Borel measurable map $\sigma \colon \Xsf \to \Asf$ such that $\sigma(x) \in \Gamma(x)$ for all $x \in \Xsf$. {numref}`f-feasible_policy` shows a feasible policy $\sigma$ in the same setting. ```{figure} figures/feasible_policy.pdf :name: f-feasible_policy The action $\sigma(x)$ lies in $\Gamma(x)$ for all $x$ ``` We let $\Sigma$ denote the set of all feasible policies. With these policies in hand, we define the set of policy operators associated with $(\Gamma, r, K)$ via $$ (T_\sigma \, v)(x) = r(x, \sigma(x)) + \int v(x') K(x, \sigma(x), \diff x') \qquad (x \in \Xsf), $$ (eq-tsaff) where $v$ varies over $b \Xsf$. The assumption that $\Xsf$ and $\Asf$ are metric spaces is important in some applications and irrelevant in others. For simplicity, we maintain it throughout. When $\Xsf$ and $\Asf$ are discrete, the metric in question is always understood to be the discrete metric. 
In this case, every subset of these sets is a Borel set, so the measurability constraint in the definition of $\Sigma$ never binds. (sss-lpdsadps)= #### ADP Representation With $$ K_\sigma(x, x') \coloneq K(x, \sigma(x), x') \quad \text{and} \quad r_\sigma(x) \coloneq r(x, \sigma(x)), $$ we can also write the policy operator {eq}`eq-tsaff` as $$ T_\sigma \, v = r_\sigma + K_\sigma \, v. $$ Given $v \in b \Xsf$, we have $Kv \in b\Gsf$ and hence $K_\sigma v \in b\Xsf$. Since $b \Xsf$ is a vector space, it follows that $T_\sigma \, v$ is in $b \Xsf$. Since $K$ is a transition kernel, $K_\sigma$ is a positive linear operator, so $T_\sigma$ is order preserving. Hence $$ (b \Xsf, \TT_{\rm LDP}) \quad \text{with} \quad \TT_{\rm LDP} \coloneq \setntn{T_\sigma}{\sigma \in \Sigma} $$ is an ADP. We call $(V, \TT_{\rm LDP})$ the **ADP generated by** $(\Gamma, r, K)$ and use the following obvious conventions: - $(\Gamma, r, K)$ is called **well-posed** (resp., regular, order stable, etc.) if $(V, \TT_{\rm LDP})$ is well-posed (resp., regular, order stable, etc.). - $v_\sigma$ is the **$\sigma$-value function** for $(\Gamma, r, K)$ when $v_\sigma$ is the $\sigma$-value function for $(V, \TT_{\rm LDP})$, - $\sigma$ is called **optimal** for $(\Gamma, r, K)$ when $\sigma$ is optimal for $(V, \TT_{\rm LDP})$, - etc. We notice that each $T_\sigma$ has the affine form from the ADP analysis in {ref}`sss-affa`, with $K_\sigma \in \blop_+(b\Xsf)$ by {prf:ref}`t-poblc2`. We will use the theorems in that section for some of our optimality results. #### Examples Let's discuss some examples. Some but not all of these examples can be framed as LDPs. ```{prf:example} :label: eg-mdpisldp The finite MDP $(\Gamma, r, \beta, P)$ from {ref}`sss-tdtm` generates an LDP $(\Gamma, r, K)$ after setting $K(x, a, x') \coloneq \beta P(x, a, x')$. 
If $v \in b\Xsf$, then $Kv \in b\Gsf$, since $$ \int |v(x')| K(x, a, \diff x') = \beta \int |v(x')| P(x, a, \diff x') \leq \beta \| v \| $$ for all $(x, a) \in \Gsf$. ``` ```{prf:example} :label: eg-fvisldp The firm valuation problem with state-dependent discounting from {ref}`sss-fvsd` generates an LDP $(\Gamma, r, K)$. For this problem, the state space is $\Xsf$, the action space is $\Asf = \{0, 1\}$, and $\Gamma(x) = \Asf$ for all $x$. We set $$ r(x, a) \coloneq a s + (1 - a) \pi(x) \quad \text{and} \quad K(x, a, \diff x') \coloneq (1 - a) \beta(x) P(x, \diff x'). $$ Since $\pi$ is bounded and measurable, $r$ is bounded and measurable. Since $\beta$ is assumed to be bounded, $v \in b\Xsf$ implies $Kv \in b\Gsf$. ``` ```{prf:example} :label: eg-osisldp The optimal savings model from {ref}`s-og` can be framed as an LDP. The state is wealth and the action is current consumption, both taking values in $\RR_+$. The feasible correspondence is $\Gamma(x) = [0, x]$, the reward function is $r(x, a) = u(a)$, and the transition kernel $K$ is defined by $$ \int v(x') K(x, a, \diff x') = \beta \int v( R (x - a) + y) \phi(\diff y). $$ Since $u$ is bounded, $r$ is likewise bounded. Clearly $v \in b\Xsf$ implies $Kv \in b\Gsf$. ``` ```{prf:example} :label: eg-rsnotldp Consider a risk-sensitive MDP with policy operators of the form $$ (T_\sigma \, v)(x) = r(x, \sigma(x)) + \frac{\beta}{\theta} \ln \left[ \sum_{x'} \exp(\theta v(x')) P(x,\sigma(x),x') \right]. $$ (This is an MDP with expectation replaced by an entropic certainty equivalent, as previously discussed in {ref}`sss-alt`.) This model cannot be represented as an LDP because the nonlinear log-sum-exp term prevents us from expressing the model in the form of {eq}`eq-tsaff`. ``` ```{prf:example} The real option problem from {ref}`ss-ro` cannot be cast into the LDP framework. One reason is the inverted expectation and max in the Bellman equation (see {eq}`eq-fe_modbell`).
Another is that the value space $L_1(\phi)$ is a set of equivalence classes of functions, rather than a set of functions. ``` #### Lifetime Values Let $(\Gamma, r, K)$ be an LDP with state space $\Xsf$ and action space $\Asf$. Given a policy $\sigma \in \Sigma$, the **$\sigma$-value function** $v_\sigma$ is defined as the fixed point of the policy operator $T_\sigma$ in {eq}`eq-tsaff`. As a result, $v_\sigma$ satisfies the recursion $$ v_\sigma = r_\sigma + K_\sigma v_\sigma. $$ (eq-ldpvs) If the spectral radius condition $\rho(K_\sigma) < 1$ holds, then, by the Neumann series lemma (see, in particular, {prf:ref}`c-ibnl`), the operator $I - K_\sigma$ is invertible on $b \Xsf$ and the unique solution to {eq}`eq-ldpvs` is $$ v_\sigma = (I - K_\sigma)^{-1} r_\sigma = \sum_{t=0}^{\infty} K_\sigma^t r_\sigma. $$ (eq-ldpvsn) The $t$-th term $K_\sigma^t r_\sigma$ gives the expected reward at time $t$ under policy $\sigma$, discounted back to the present. The explicit representation of $v_\sigma$ in {eq}`eq-ldpvsn` is valuable for computation. For example, the MDP version of HPI in {prf:ref}`algo-hpi_os` can be extended to the current setting by replacing $v \leftarrow (I - \beta P_{\sigma} )^{-1} r_{\sigma}$ with $v \leftarrow (I - K_{\sigma} )^{-1} r_{\sigma}$. Under the conditions of {prf:ref}`p-affdpw`, with $K$ strong Feller, this algorithm converges. ### Optimality Results Now we turn to optimality results. We first treat the case where $\TT$ is finite, and then shift to general (metric) state and action spaces by adding continuity conditions. We conclude by deriving implications for greedy policies and the Bellman operator. In the following, we suppose that $(\Gamma, r, K)$ is an LDP with state space $\Xsf$ and action space $\Asf$. As before, these sets are separable metric spaces (with the discrete topology when finite).
As shown in {ref}`sss-ldpdef`, the LDP $(\Gamma, r, K)$ generates an ADP $(b\Xsf, \TT_{\rm LDP})$ where each $T_\sigma \in \TT_{\rm LDP}$ has the affine form $T_\sigma v = r_\sigma + K_\sigma v$. We will infer optimality of the LDP by studying this ADP. #### Results First we present a result that works for the finite case. ```{prf:proposition} :label: p-affdpf If $\TT$ is finite and $\rho(K_\sigma) < 1$ for all $\sigma \in \Sigma$, then $(\Gamma, r, K)$ satisfies the fundamental optimality properties and VFI, HPI, and OPI all converge. ``` ```{prf:proof} Regularity is obvious in the finite case, so this is a direct consequence of {prf:ref}`t-affineban_f`. ◻ ``` To shift to the general case, we inject some continuity. ```{prf:assumption} :label: a-affdp The following conditions hold: 1. $\Gamma$ is nonempty, continuous and compact-valued. 2. The reward $r$ is continuous on $\Gsf$. ``` Recalling the definition of a discount operator, we can state the following result. ```{prf:proposition} :label: p-affdpw Let {prf:ref}`a-affdp` hold and let $K$ be weak Feller. If there exists a discount operator $D$ such that $K_\sigma \leq D$ on $b\Xsf_+$ for all $\sigma \in \Sigma$, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $bc\Xsf$, and 3. VFI converges geometrically on $bc\Xsf$. If, in addition, $K$ is strong Feller, then OPI and HPI converge. ``` ```{prf:proof} Let the stated conditions hold and let $(b\Xsf, \TT_{\rm LDP})$ be the ADP generated by $(\Gamma, r, K)$. We apply {prf:ref}`t-affineban_sr` with $V = b\Xsf$ and $V_0 = bc\Xsf$. The ADP has the required affine form $T_\sigma \, v = r_\sigma + K_\sigma \, v$ with $K_\sigma \in \blop_+(b\Xsf)$, and the discount operator condition $K_\sigma \leq D$ on $b\Xsf_+$ holds by hypothesis. We verify semi-regularity on $bc\Xsf$. Fix $v \in bc\Xsf$.
Since $r$ is continuous on $\Gsf$ by {prf:ref}`a-affdp` and $K$ is weak Feller, the map $(x, a) \mapsto r(x,a) + \int v(x') K(x, a, \diff x')$ is continuous on $\Gsf$. Combined with our assumptions on $\Gamma$, {prf:ref}`t-berge` implies that a $v$-greedy policy exists and that $Tv \in bc\Xsf$. Hence $bc\Xsf \subset V_G$ and $T(bc\Xsf) \subset bc\Xsf$. Since $bc\Xsf$ is closed in $b\Xsf$, claims (i)--(iii) follow from {prf:ref}`t-affineban_sr`. For the last claim, if $K$ is strong Feller, the same argument applies to any $v \in b\Xsf$, giving regularity. OPI and HPI convergence then follow from {prf:ref}`t-affineban_sr`. ◻ ``` #### Implications Let $(\Gamma, r, K)$ be a given LDP. Clearly $\sigma \in \Sigma$ is $v$-greedy for $(\Gamma, r, K)$ if and only if $$ r(x, \tau(x)) + \int v(x') K(x, \tau(x), \diff x') \leq r(x, \sigma(x)) + \int v(x') K(x, \sigma(x), \diff x') $$ (eq-sbtau2) for all $\tau \in \Sigma$ and $x \in \Xsf$. The Bellman operator obeys $$ (Tv)(x) = \sup_{\sigma \in \Sigma} \left\{ r(x, \sigma(x)) + \int v(x') K(x, \sigma(x), \diff x') \right\} \qquad (x \in \Xsf). $$ If, say, the conditions of {prf:ref}`p-affdpw` hold and $K$ is strong Feller, then, for every $v \in b \Xsf$, there always exists a $\sigma \in \Sigma$ obeying $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \int v(x') K(x,a, \diff x') \right\} \quad \text{for all } x \in \Xsf. $$ (eq-ldpgre) (See the proof of {prf:ref}`p-affdpw`.) In this setting, a policy $\sigma \in \Sigma$ is $v$-greedy if and only if {eq}`eq-ldpgre` holds. Moreover, the Bellman operator simplifies to $$ (Tv)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \int v(x') K(x, a, \diff x') \right\} $$ (eq-ldpbellop) for every $v \in b \Xsf$. These expressions remain valid in the weak Feller setting when we restrict to $v \in b c \Xsf$. (ss-exsdd)= ### Exogenous Discount Processes In {prf:ref}`c-adps3` we looked at several settings that include state-dependent discounting.
In each case the setting was relatively simple: either a binary stopping problem or a model with discrete states and actions. Here we'll look at a problem with continuous state and action spaces. To make this setting tractable, we'll insist that the discount factor process depends only on an exogenous state (i.e., a state that is not influenced by decisions of the agent). #### Discount Factor Processes When the discount factor varies over time, forming a sequence $(\beta_t)_{t \geq 0}$, the present value of a random time $t$ payoff $H_t$ has the general form $\EE \, \beta_0 \cdots \beta_{t-1} H_t$. In this section we formalize this idea in a Markov environment and examine some simple consequences. Let $\Zsf$ be a metric space and let $Q$ be a stochastic kernel on $\Zsf$. Let $(Z_t)$ be $Q$-Markov on $\Zsf$. Let $\beta \in b\Zsf$ be a nonnegative function and consider the discount factor process $(\beta_t)_{t \geq 0}$ where $\beta_t \coloneq \beta(Z_t)$ for all $t$. We introduce the operator $$ (K_Q h)(z) \coloneq \beta(z) \int h(z') Q(z, \diff z'). $$ (eq-kq) Our next lemma connects powers of $K_Q$ to expected present values. ```{prf:lemma} :label: l-kq For all $z \in \Zsf$, all $h \in b\Zsf$, and all $n \in \NN$, we have $$ (K_Q^n h)(z) = \EE_z \, \beta_0 \cdots \beta_{n-1} h(Z_n). $$ ``` We can confirm this rather natural expression by induction. ```{prf:proof} When $n=1$, we have $(K_Q h)(z) = \beta(z) \int h(z') Q(z, \diff z') = \EE_z \, \beta_0 \, h(Z_1)$. For the inductive step, suppose the claim holds at $n$. Then $$ \begin{aligned} (K_Q^{n+1} h)(z) & = (K_Q (K_Q^n h))(z) = \beta(z) \int (K_Q^n h)(z') Q(z, \diff z') \\ & = \beta(z) \int \EE_{z'} \, \beta_0 \cdots \beta_{n-1} h(Z_n) \, Q(z, \diff z') \\ & = \EE_z \, \beta_0 \cdots \beta_n \, h(Z_{n+1}), \end{aligned} $$ where the last step uses the law of iterated expectations and the Markov property. ◻ ``` The next result follows from Gelfand's formula for the spectral radius; the details of the argument can be seen in {prf:ref}`eg-oclc0`. ```{prf:lemma} :label: l-sddeq The following two statements are equivalent. 1. $\rho(K_Q) < 1$. 2. There exists an $n \in \NN$ and $\lambda \in [0,1)$ such that $\| K_Q^n h \| \leq \lambda \| h \|$ for all $h \in b\Zsf$. ``` Now consider pricing an infinite horizon cash flow $(h(Z_t))_{t \geq 0}$. We set $$ q(z) \coloneq \EE_z \, \sum_{t \geq 0} \prod_{i=0}^{t-1} \beta(Z_i) \cdot h(Z_t). $$ ```{exercise} :label: ex-fqz Prove: If $h \in b\Zsf$ and $\rho(K_Q) < 1$, then $q = (I - K_Q)^{-1} h$. ``` ```{solution} ex-fqz By {prf:ref}`l-kq`, $K_Q^t h(z) = \EE_z \, \beta_0 \cdots \beta_{t-1} h(Z_t)$, so $$ q(z) = \EE_z \sum_{t \geq 0} \prod_{i=0}^{t-1} \beta(Z_i) \cdot h(Z_t) = \sum_{t \geq 0} (K_Q^t h)(z). $$ Since $\rho(K_Q) < 1$, the Neumann series lemma implies that $\sum_{t \geq 0} K_Q^t$ converges in operator norm to $(I - K_Q)^{-1}$. Hence the sum is well defined, $q = (I - K_Q)^{-1} h$, and $q \in b\Zsf$ since $h \in b\Zsf$. ``` #### An LDP with Exogenous Discounting Let $\Xsf$ and $\Asf$ be separable metric spaces and let $(\Gamma, r, K)$ be an LDP with state space $\Xsf$ and action space $\Asf$. Suppose further that $\Xsf$ is a product space of the form $\Ysf \times \Zsf$ and that $K$ has the form $$ (Kh)(x, a) = (Kh)(y,z, a) = \beta(z) \int \sum_{z'} h(y', z') Q(z, z') R(y, z, a, \diff y') , $$ where - $R$ is a stochastic kernel from $\Gsf$ to $\Ysf$, - $Q$ is a stochastic kernel from $\Zsf$ to $\Zsf$, and - $\beta$ is an element of $bc\Zsf$. We call $R$ the **endogenous kernel**, $Q$ the **exogenous kernel** and $\beta$ the **discount function**. The expression for $K$ tells us that the endogenous state $y$ updates via the kernel $R$, depending on current state $x=(y,z)$ and action $a$, while $z$ updates via $Q$. Since we are taking products, the two updates are independent. The exogenous process feeds into values and hence optimal policies through its impact on the discount factor.
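
When $\Zsf$ is finite, the operator $K_Q$ in {eq}`eq-kq` reduces to the matrix $\operatorname{diag}(\beta) \, Q$, so the condition $\rho(K_Q) < 1$ and the pricing formula $q = (I - K_Q)^{-1} h$ from {prf:ref}`ex-fqz` can be checked numerically. The following is a minimal sketch in Python with NumPy. The two-state values of `beta`, `Q`, and `h` are illustrative assumptions chosen for this sketch, not parameters fixed by the theory.

```python
import numpy as np

# Illustrative two-state primitives (assumed values, for this sketch only)
beta = np.array([0.9, 0.99])          # discount function beta(z)
Q = np.array([[0.99, 0.01],
              [0.01, 0.99]])          # stochastic matrix on Z
h = np.array([1.0, 2.0])              # bounded cash flow h(z)

# (K_Q h)(z) = beta(z) * sum_{z'} h(z') Q(z, z'), i.e. K_Q = diag(beta) @ Q
K_Q = beta[:, None] * Q

# Spectral radius of K_Q: largest eigenvalue modulus
rho = np.max(np.abs(np.linalg.eigvals(K_Q)))

# Present value q = (I - K_Q)^{-1} h, which is valid when rho < 1
q = np.linalg.solve(np.eye(len(beta)) - K_Q, h)

# Cross-check against a truncated Neumann series sum_{t < T} K_Q^t h
q_series = np.zeros_like(h)
term = h.copy()
for _ in range(5_000):
    q_series += term
    term = K_Q @ term

print(f"rho(K_Q) = {rho:.4f}")
print("q (linear solve)   =", q)
print("q (Neumann series) =", q_series)
```

Solving the linear system is preferable to forming the inverse explicitly, while the truncated series mirrors the representation $\sum_{t \geq 0} K_Q^t h$ used in the solution to {prf:ref}`ex-fqz`.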
To make our lives slightly easier, we'll assume that $\Zsf$ is finite. As with every other finite set, we endow $\Zsf$ with the discrete topology. ```{prf:assumption} :label: a-rc The endogenous kernel $R$ is such that, for every $z \in \Zsf$, the map $$ (y, a) \mapsto \int g(y') R(y, z, a, \diff y') $$ is continuous on $$ \Gsf_z \coloneq \setntn{(y,a) \in \Ysf \times \Asf}{a \in \Gamma(y, z)} $$ whenever $g \in bc\Ysf$. ``` Let $K_Q$ be defined as in {eq}`eq-kq`. In this setting, we have the following result. ```{prf:proposition} :label: p-exdisc Let the continuity assumptions {prf:ref}`a-affdp` and {prf:ref}`a-rc` hold. If, in addition, $\rho(K_Q) < 1$, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $bc\Xsf$, and 3. VFI converges geometrically on $bc\Xsf$. ``` ```{prf:proof} We apply {prf:ref}`p-affdpw`. {prf:ref}`a-affdp` holds by hypothesis, so it remains to verify that (i) $K$ is weak Feller and (ii) there exists a discount operator $D$ on $b\Xsf$ such that $K_\sigma \leq D$ on $b\Xsf_+$ for all $\sigma \in \Sigma$. To verify (i), we fix $h \in bc\Xsf$. Since $\Zsf$ is finite (and has the discrete topology), it suffices to show $(y,a) \mapsto (Kh)(y, z, a)$ is continuous for fixed $z \in \Zsf$. Fix any such $z$. For each $z' \in \Zsf$, the map $y' \mapsto h(y', z')$ is continuous on $\Ysf$ (since $h \in bc\Xsf$ and $\Zsf$ is discrete), so {prf:ref}`a-rc` implies that $(y, a) \mapsto \int h(y', z') R(y, z, a, \diff y')$ is continuous on $\Gsf_z$. Since $\Zsf$ is finite, a weighted sum of these maps is again continuous, so $$ (y, a) \mapsto \beta(z) \int \sum_{z'} h(y', z') Q(z, z') R(y, z, a, \diff y') = (Kh)(y, z, a) $$ is continuous on $\Gsf_z$. Hence $K$ is weak Feller. To verify (ii), we introduce the operator $D$ on $b\Xsf$ via $$ (D h)(y, z) \coloneq \beta(z) \sup_{a \in \Gamma(y,z)} \int \sum_{z'} h(y', z')Q(z, z') R(y, z, a, \diff y') . $$ Since $D$ takes the supremum over $a$, we immediately have $K_\sigma \leq D$ on $b\Xsf_+$ for all $\sigma \in \Sigma$.
It remains to show that $D$ is a discount operator (i.e., $D0 = 0$, $D$ is order preserving, and $D$ is eventually contracting). The first two properties are clear. For eventual contractivity, fix $h \in b\Xsf_+$ and observe that, since $h \leq \|h\|$ and $R, Q$ are stochastic kernels, $$ (Dh)(y, z) \leq \beta(z) \|h\| = \|h\| \cdot (K_Q \1)(z). $$ Since $D$ is order preserving, we can iterate on this bound to obtain $$ \begin{aligned} (D^2 h)(y, z) & \leq \beta(z) \sup_{a \in \Gamma(y,z)} \int \sum_{z'} \|h\| \cdot (K_Q \1)(z') \, Q(z, z') R(y, z, a, \diff y') \\ & = \|h\| \cdot \beta(z) \sum_{z'} (K_Q \1)(z') \, Q(z, z') = \|h\| \cdot (K_Q^2 \1)(z). \end{aligned} $$ Continuing to iterate, we obtain, for all $(y, z) \in \Xsf$ and all $n \in \NN$, $$ (D^n h)(y, z) \leq \| h \| \cdot (K_Q^n \1)(z). $$ (eq-dnhbound) Taking the supremum over the right-hand side and then the left yields $\| D^n h \| \leq \| h \| \cdot \| K_Q^n \1 \|$. Since $\rho(K_Q) < 1$, {prf:ref}`l-sddeq` provides an $n \in \NN$ and $\lambda \in [0,1)$ with $\| K_Q^n \1 \| \leq \lambda$, so $D$ is eventually contracting. We conclude that $D$ is a discount operator on $b\Xsf$, and the claims follow from {prf:ref}`p-affdpw`. ◻ ``` (ss-mdpsrv)= ### Markov Decision Processes We treated discrete MDPs in {ref}`s-mdps`. Let's now consider MDPs on general state spaces. Mathematically, MDPs are LDPs with a fixed discount factor and Markov dynamics under any fixed policy. On one hand, MDPs are a special case of LDPs and need no separate theoretical discussion. On the other hand, MDPs are a benchmark representation of a dynamic program, used throughout mathematics, operations research, and computer science. For this reason we'll take the time to specialize our LDP results to the Markov setting. Throughout this section, $\Xsf$ and $\Asf$ are separable metric spaces. (theory)= #### Theory Let $(\Gamma, r, K)$ be an LDP with state space $\Xsf$ and action space $\Asf$.
This LDP is called a **Markov Decision Process** (MDP) when the transition kernel has the form $$ \int v(x') K(x, a, \diff x') = \beta \int v(x') P(x, a, \diff x') $$ (eq-mdpdefp) for some $\beta \in [0, 1)$ and some stochastic kernel $P$ from $\Gsf$ to $\Xsf$. The MDP above will be represented by the tuple $(\Gamma, r, \beta, P)$. The ADP generated by this MDP will be denoted $(b\Xsf, \TT_{\rm MDP})$, where $$ T_\sigma \, v = r_\sigma + \beta P_\sigma \, v, \quad \text{where} \quad r_\sigma(x) \coloneq r(x, \sigma(x)) \quad \text{and} \quad P_\sigma(x, \diff x') \coloneq P(x, \sigma(x), \diff x'). $$ (eq-mdpops) Choosing a policy $\sigma$ picks out a stochastic kernel $P_\sigma$ on $\Xsf$, so choosing a policy is akin to picking an $\Xsf$-valued Markov process. The following optimality result is an immediate consequence of {prf:ref}`p-affdpw`. ```{prf:proposition} :label: p-mdpwf If {prf:ref}`a-affdp` holds and $P$ is weak Feller, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $bc\Xsf$, and 3. VFI converges geometrically on $bc\Xsf$. If, in addition, $P$ is strong Feller, then OPI and HPI converge. ``` ```{prf:proof} Since $K = \beta P$, the kernel $K$ inherits the Feller property from $P$. Let $(D h)(x) \coloneq \beta \sup_{a \in \Gamma(x)} \int h(x')P(x,a, \diff x')$. Then $K_\sigma \leq D$ on $b\Xsf_+$ for all $\sigma \in \Sigma$ and, for $h \in b\Xsf_+$, $$ (D h)(x) \leq \beta \| h \| \sup_{a \in \Gamma(x)} \int P(x,a, \diff x') = \beta \| h \|. $$ Since $\beta < 1$, we see that $D$ is a discount operator. The claims now follow from {prf:ref}`p-affdpw`. ◻ ``` ```{prf:example} The basic optimal savings problem we studied in {ref}`s-og` is a strong Feller MDP. To put the model in this framework we set - $\Xsf = \Asf = \RR_+$, - $\Gamma(x) = [0, x]$, - $r(x, a) = u(a)$, and - $p(x, a, x') = \phi(x' - R(x - a))$, where $\phi$ is the continuous density of the income process.
Using the change of variable $y = x' - R(x-a)$, we can write the Bellman equation as $$ \begin{aligned} v(x) & = \max_{0 \leq a \leq x} \left\{ u(a) + \beta \int v(x') p(x, a, x') \diff x' \right\} \\ & = \max_{0 \leq a \leq x} \left\{ u(a) + \beta \int v(R(x - a) + y) \phi(y) \diff y \right\} \end{aligned} $$ (To simplify the change of variable argument, we are assuming that $\phi$ is defined on all of $\RR$ with $\phi(y)=0$ whenever $y \leq 0$. The integrals above are taken over all of $\RR$.) The optimality results we obtained for the optimal savings model in {ref}`s-og` can be recovered from {prf:ref}`p-mdpwf`. ``` #### Implications Since MDPs are such an important special case, we briefly specialize the implications from {ref}`sss-ldpdef` to the MDP setting, replacing the general transition kernel $K$ with $\beta P$. If the conditions of {prf:ref}`p-mdpwf` hold and $P$ is strong Feller, then, for every $v \in b \Xsf$, there exists a $\sigma \in \Sigma$ obeying $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \int v(x') P(x,a, \diff x') \right\} \quad \text{for all } x \in \Xsf, $$ (eq-mdpgre) and a policy $\sigma \in \Sigma$ is $v$-greedy if and only if {eq}`eq-mdpgre` holds. Moreover, the Bellman operator simplifies to $$ (Tv)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \int v(x') P(x, a, \diff x') \right\} $$ (eq-mdpbellop) for every $v \in b \Xsf$. These expressions remain valid without the strong Feller condition when we restrict to $v \in b c \Xsf$. ## Applications We apply the theory developed above to two classes of problems. In {ref}`sss-nrm` we study a natural resource management problem with state-dependent discounting. In {ref}`sss-appospp` we analyze an optimal savings problem with stochastic rates of return on assets. 
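
Before turning to these applications, it may help to see the MDP Bellman operator {eq}`eq-mdpbellop` in a finite setting, where the integral becomes a sum. The following is a minimal sketch in Python with NumPy of value function iteration followed by a policy evaluation step $v_\sigma = (I - \beta P_\sigma)^{-1} r_\sigma$. The randomly generated primitives `r` and `P`, the sizes `n` and `m`, and the helper names `bellman`, `greedy`, and `vfi` are all hypothetical and introduced only for this illustration. For simplicity, every action is assumed feasible in every state.

```python
import numpy as np

def bellman(v, r, P, beta):
    """(Tv)(x) = max_a { r(x, a) + beta * sum_{x'} v(x') P(x, a, x') }."""
    return np.max(r + beta * (P @ v), axis=1)

def greedy(v, r, P, beta):
    """Return a v-greedy policy as an array of action indices."""
    return np.argmax(r + beta * (P @ v), axis=1)

def vfi(r, P, beta, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator from v = 0 until successive iterates are close."""
    v = np.zeros(r.shape[0])
    for _ in range(max_iter):
        v_new = bellman(v, r, P, beta)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

# Hypothetical primitives: n states, m actions, all actions feasible
rng = np.random.default_rng(0)
n, m, beta = 5, 3, 0.95
r = rng.uniform(size=(n, m))               # rewards r(x, a)
P = rng.uniform(size=(n, m, n))            # unnormalized transitions
P /= P.sum(axis=2, keepdims=True)          # P(x, a, .) is now a distribution

v_star = vfi(r, P, beta)
sigma = greedy(v_star, r, P, beta)

# Policy evaluation: v_sigma = (I - beta P_sigma)^{-1} r_sigma
idx = np.arange(n)
P_sigma, r_sigma = P[idx, sigma], r[idx, sigma]
v_sigma = np.linalg.solve(np.eye(n) - beta * P_sigma, r_sigma)

print("max |v_star - v_sigma| =", np.max(np.abs(v_star - v_sigma)))
```

The final comparison illustrates the link between the approximate fixed point of $T$ and the lifetime value of a greedy policy. On continuous state spaces, the sum over $x'$ would have to be replaced by some approximation of the integral, which this sketch does not attempt.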
(sss-nrm)= ### Natural Resource Management We consider a natural resource management application with Bellman equation $$ v(y, z) = \max_{0 \leq e \leq y} \left\{ \pi(e) + \beta(z) \int \sum_{z'} v(f(y - e) \xi, z') Q(z, z')\phi(\diff \xi) \right\}. $$ Here $y$ is the stock of the resource, $e$ is the current usage, $Q$ is a stochastic kernel on a finite set $\Zsf$, $\phi$ is a distribution on $\RR_+$, $\pi \colon \RR_+ \to \RR$ is a profit function, $f \colon \RR_+ \to \RR_+$ is a transition function that updates the resource, $\beta \colon \Zsf \to \RR_+$ is a discount factor function, and $\xi$ is a multiplicative shock. The quantity $f(y-e) \xi$ is the next period stock. If, say, $\xi$ is concentrated at $1$ and $f(y-e) = y-e$, then this is exploitation of a nonrenewable resource. Another interpretation is that $y$ is a stock of fish at a given fishery, $e$ is current catch, $f$ is a transition rule that updates the stock given biological properties and environmental factors, and $\xi$ is a random shock to updating. We assume that $\pi$ is continuous and bounded, and that the function $f$ is continuous. In the exogenous discounting setting of {ref}`ss-exsdd`, the state space is $\Xsf = \RR_+ \times \Zsf$, the action space is $\RR_+$, the feasible correspondence is $\Gamma(y, z) = [0, y]$, the reward function is $r(y, z, e) = \pi(e)$, and the transition kernel is $$ (Kh)(y,z,e) = \beta(z) \sum_{z'} \int h(f(y-e) \xi, z') \phi(\diff \xi) Q(z,z'). $$ The endogenous kernel $R$ is determined by $$ \int g(y') R(y, z, e, \diff y') = \int g(f(y-e) \xi) \phi(\diff \xi). $$ ```{prf:proposition} :label: p-nro For the natural resource model, if $\rho(K_Q) < 1$, then the fundamental optimality properties hold, the value function $\vmax$ lies in $bc\Xsf$, and VFI converges geometrically on $bc\Xsf$. ``` ```{prf:proof} We verify that the associated LDP $(\Gamma, r, K)$ satisfies the conditions of {prf:ref}`p-exdisc`.
For {prf:ref}`a-affdp`, $\Gamma(y,z) = [0,y]$ is nonempty, continuous and compact-valued (see {prf:ref}`ex-ecco`), and $r = \pi$ is continuous by assumption. For {prf:ref}`a-rc`, fix $g \in bc\RR_+$. We need $(y, e) \mapsto \int g(f(y-e)\xi) \phi(\diff \xi)$ to be continuous on the feasible state-action pairs. Since $f$ and $g$ are continuous, so is $(y, e) \mapsto g(f(y-e)\xi)$ for each $\xi$, and the dominated convergence theorem gives the required continuity. Since $\rho(K_Q) < 1$ by hypothesis, all conditions of {prf:ref}`p-exdisc` are met and the conclusions therein hold. ◻ ``` The state evolves according to $$ y_{t+1} = f(y_t - \sigma(y_t))\xi_{t+1} $$ (eq-nrrlom) where $\sigma$ is the optimal consumption policy. Let's take a look at the kind of outcomes we can generate when $\beta$ is fixed, so that the exogenous shock process is degenerate. For simulation purposes, profits take the exponential form $\pi(x) = 1 - \exp(-\theta x^\gamma)$, while the transition function is set to $f(x) = x^\alpha \ell(x)$. Here $\ell$ is a generalized logistic function, while $\xi$ is lognormal.[^1] We compute the optimal policy $\sigma$ using value function iteration and then study the dynamics associated with the law of motion {eq}`eq-nrrlom`. ```{figure} figures/nrm_policy.pdf :name: f-nrm_policy Optimal policy and dynamics for the natural resource model ``` {numref}`f-nrm_policy` shows the optimal consumption policy $\sigma$ when $\beta = 0.96$, along with the 45 degree line, the map $y \mapsto f(y) \EE \xi$, which shows the expected next period stock with zero consumption, and the map $y \mapsto f(y - \sigma(y)) \EE \xi$, which shows expected dynamics under the optimal policy. Interestingly, the optimal choice for this parameterization is to consume none of the resource when the stock is small, enabling the stock to grow. Consumption only becomes positive when the stock is large enough to remain stable at a relatively high level. 
Of course, this kind of behavior will only be seen when the agent is sufficiently patient. {numref}`f-nrm_sk` shows more detail on the dynamics by examining the stochastic kernel associated with the Markov dynamics in {eq}`eq-nrrlom`, after taking logs. Each stochastic kernel is represented as a contour plot of the relevant conditional density. The four subplots correspond to four different values of the discount factor $\beta$. For each value of $\beta$, the plot shows where probability mass for next period stock concentrates relative to current stock, given the associated optimal policy. Mass above the 45 degree line implies that the state moves up on average, while mass below indicates that the state drifts down. As $\beta$ increases, the optimal policy adjusts to reduce current consumption and increase conservation, leading to probability mass shifting upward at each current state value. The changes in the stochastic kernel in {numref}`f-nrm_sk` seem minor but in fact they have large impacts on long run outcomes. {numref}`f-nrm_density` illustrates this by showing an estimate of the stationary distribution corresponding to each Markov process. Densities were estimated by simulating $100$ independent paths of length $1{,}000$ from a common initial condition. The plots show a sharp transition around $\beta=0.95$. For $\beta$ around that level, the long run stock is low. For slightly higher $\beta$, the optimal path leads to much larger stocks (recalling that we are working in logs). ```{figure} figures/nrm_sk.pdf :name: f-nrm_sk Stochastic kernel under the optimal policy at different $\beta$ ``` ```{figure} figures/nrm_density.pdf :name: f-nrm_density Variation in the stationary distribution across $\beta$ values ``` Up until now we've taken $\beta$ as a fixed parameter when computing optimal policies. Now we allow it to vary with an exogenous state $z$ via $\beta(z)$, in line with our theoretical analysis in {prf:ref}`p-nro`. 
To illustrate the effect of state-dependent discounting, we set $\Zsf = \{0.9, 0.99\}$ and $\beta(z) = z$. The exogenous state follows a two-state Markov chain with persistence $0.99$ in each state. Other model parameters are as in the fixed-$\beta$ experiments above. We computed optimal policies via value function iteration on the product space $\RR_+ \times \Zsf$. {numref}`f-nrm_sdd_sim` shows the outcome of simulating $20$ independent paths of the resource stock under the optimal policy, given a single realization of the exogenous process $(Z_t)$. The top panel displays the discount factor $\beta_t$, while the bottom panel shows the corresponding log stock $\log y_t$ over multiple alternative paths for $(\xi_t)$. During patient regimes, the stock tends to grow as the optimal policy shifts toward conservation. When the discount factor drops, the agent increases exploitation and the stock tends to decline. ```{figure} figures/nrm_sdd_policy.pdf :name: f-nrm_sdd_policy Optimal investment policy under state-dependent discounting ``` ```{figure} figures/nrm_sdd_sim.pdf :name: f-nrm_sdd_sim Simulated resource dynamics under state-dependent discounting ``` (sss-appospp)= ### Stochastic Rates of Return As our next application, we consider a savings problem with a persistent state process and a stochastic rate of return on assets. Stochastic returns on assets appear to be important in generating sufficiently heavy right tails in wealth distributions when we take models to the data. 
In this model, - the state is $x = (w, z)$, where $w \in \RR_+$ is wealth and $z$ is an exogenous state process on finite set $\Zsf$ with stochastic kernel (matrix) $Q$, - the action $a$ is current consumption $c$, taking values in $\RR_+$, - the feasible correspondence is $\Gamma(x) = \Gamma(w, z) = [0, w]$, - the reward is $r(x, a) = r((w, z), c) = u(c)$, where $u$ is bounded and continuous, - the discount factor is $\beta \in (0,1)$, and - the stochastic kernel takes the form $$ \int v(x') P(x, a, \diff x') = \sum_{z'} \int v[ R(z') (w - c) + y(z', s') , z' ] \phi(\diff s') Q(z, z'). $$ The kernel can be explained as follows: Labor income is affected by an iid shock $s'$ drawn from distribution $\phi \in \dD(\Ssf)$, where $\Ssf$ is a topological space. In addition, both the interest rate and labor income are impacted by a common persistent component $z$. The latter is driven by stochastic matrix $Q$. We give $\Zsf$ the discrete topology and $\Xsf = \RR_+ \times \Zsf$ the product topology. We apply {prf:ref}`p-mdpwf` to this model. For {prf:ref}`a-affdp`, continuity and compact-valuedness of $\Gamma$ follow from {prf:ref}`ex-ecco`, and $r = u$ is continuous by assumption. It remains to verify that $P$ is weak Feller. Fixing $v \in bc\Xsf$, we must show that the mapping $$ m(w, z, c) \coloneq \sum_{z'} \int v[ R(z') (w - c) + y(z', s') , z' ] \phi(\diff s') Q(z, z') $$ is continuous on $\Gsf$. Taking $(w_n, z_n, c_n) \to (w, z, c)$ in $\Gsf$, since $\Zsf$ has the discrete topology, $(z_n)$ is eventually constant at $z$. Hence it suffices to show that $m(w_n, z, c_n) \to m(w, z, c)$. This follows from continuity and boundedness of $v$ and the dominated convergence theorem. Hence {prf:ref}`p-mdpwf` applies and the conclusions therein hold. The Bellman operator takes the form $$ (Tv)(w,z) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \, \sum_{z' \in \Zsf} \int v[ R(z') (w - c) + y(z', s') , z' ] \phi(\diff s') Q(z, z') \right\}. 
$$ ```{exercise} :label: ex-ldps-auto-1 Consider an optimal savings model identical to the one described in {ref}`sss-appospp` except that agents die with a time-dependent probability in each period (see, e.g., {cite}`de2020nonlinear`). To accommodate this feature, we modify the state, setting it to $x = (w, z, t) \in \Xsf := \RR_+ \times \Zsf \times \ZZ_+$, where $t$ represents time. The transition kernel becomes $$ \int v(x') K(x, c, \diff x') = \beta \, q(t) \sum_{z' \in \Zsf} \int v[ R(z') (w - c) + y(z', s') , z', t+1 ] \phi(\diff s') Q(z, z'), $$ where $q(t) \in [0, 1]$ is the survival probability at age $t$. Higher probability of dying reduces the expected continuation value. Show that {prf:ref}`p-affdpw` applies. (Impose the discrete topology on $\ZZ_+$.) ``` ```{solution} ex-ldps-auto-1 We verify the conditions of {prf:ref}`p-affdpw`. For {prf:ref}`a-affdp`, the correspondence $\Gamma(w, z, t) = [0, w]$ is continuous and compact-valued by {prf:ref}`ex-ecco`, and $r = u$ is continuous by assumption. For the discount operator condition, define $$ (Dh)(w,z,t) \coloneq \beta \max_{c \in [0,w]} \sum_{z'} \int h[ R(z') (w - c) + y(z', s') , z', t+1 ] \phi(\diff s') Q(z, z'). $$ Since $q(t) \leq 1$, we have $K_\sigma \leq D$ on $b\Xsf_+$ for all $\sigma \in \Sigma$. Moreover, for $h \in b\Xsf_+$, the stochastic kernel structure gives $(Dh)(w,z,t) \leq \beta \|h\|$, so $D$ is a contraction of modulus $\beta$ and hence a discount operator. It remains to show that $K$ is weak Feller. Fix $v \in bc\Xsf$ and let $(w_n, z_n, t_n, c_n) \to (w, z, t, c)$ in $\Gsf$. Since $\Zsf$ and $\ZZ_+$ have the discrete topology, $(z_n, t_n)$ is eventually constant at $(z, t)$. Hence it suffices to show that $$ \sum_{z'} \int v[ R(z') (w_n - c_n) + y(z', s') , z', t+1 ] \phi(\diff s') Q(z, z') $$ converges to the same expression with $(w_n, c_n)$ replaced by $(w, c)$. This follows from continuity and boundedness of $v$ and the dominated convergence theorem. 
``` (s-cns_ldps)= ## Chapter Notes The Feller properties discussed in {ref}`sss-feller` are standard tools in the theory of Markov chains and stochastic processes. For further background, see {cite}`hernandez2012discrete` or {cite}`bauerle2011markov`. The use of Feller conditions to guarantee existence of optimal policies in MDPs and dynamic programs dates back to the foundational work of {cite}`blackwell1965`. Scheffé's lemma, used in the proof of {prf:ref}`l-dksf`, is a classical result in measure theory. Standard proofs of the optimality results we stated for MDPs on general state spaces ({ref}`ss-mdpsrv`) can be found in {cite}`puterman2005markov`, {cite}`bauerle2011markov`, {cite}`hernandez2012discrete`, {cite}`stachurski2022economic`, or {cite}`sargent2025dynamic`. The exposition of exogenous discount processes in {ref}`ss-exsdd` is partly based on {cite}`stachurski2021dynamic`. State-dependent discounting in the context of dynamic programming is also studied in {cite}`jaskiewicz2014variable`. The natural resource management model in {ref}`sss-nrm` is a standard bioeconomic exploitation model; see {cite}`clark2010mathematical` for background. Versions with state-dependent discounting are relevant for modeling resource management under fluctuating economic conditions. For a discussion of stochastic rates of return on financial income, as considered in {ref}`sss-appospp`, see {cite}`benhabib2015wealth` or {cite}`stachurski2019impossibility`. The latter shows that heavy-tailed wealth distributions can also be generated by time preference shocks, but this channel is relatively unrealistic, since it requires that all households in the economy simultaneously experience time preference shocks in the same direction. Additional work on the relationship between stochastic discount factors and wealth distributions includes {cite}`toda2019wealth`, {cite}`ma2020income`, and {cite}`nirei2015wealth`.
What we have called linear decision processes (LDPs) might be confused with Markov decision processes having linear reward or cost functions. The latter are a special case of the former. For a recent discussion of MDPs with linear cost functions, see {cite}`rantzer2022explicit` and {cite}`li2024semilinear`. [^1]: The logistic function is $\ell(x) = a + (b-a)/(1 + \exp(-c(x-d)))$ with $a=1$, $b=1.5$, $c=20$, $d=1$. Other parameters are $\theta = 0.5$, $\gamma = 0.9$, $\alpha = 0.7$, and $\xi \sim \text{LN}(-0.1, 0.2)$. The optimal policy was computed by value function iteration on a grid of $500$ state points and $2{,}000$ action points using JAX. ======================================================================== ## Recursive Decision Processes In this chapter we study what {cite}`sargent2025dynamic` call *recursive decision processes* (RDPs). The following display shows where RDPs fit relative to the other major classes of DP models studied so far in this book: $$ \text{MDPs } \subset \text{ LDPs } \subset \text{ RDPs } \subset \text{ ADPs}. $$ The main role of RDPs is to extend LDPs by accommodating nonlinearities in aggregators and calculation of present values. Another difference between this chapter and our discussion of LDPs is that we allow for unbounded rewards and value functions. While this can also be done in an LDP setting, the corresponding analysis turns out to be cleaner when working with RDPs. {ref}`s-rdps` introduces the RDP framework, provides examples, clarifies the relationships between RDPs, LDPs and ADPs, and discusses existence of greedy policies. {ref}`ss-ucs0` and {ref}`ss-ucs` present optimality results---first for bounded rewards and then for unbounded rewards handled via weighted contractions---while {ref}`ss-posol` gives conditions under which the value function is monotone, concave, or uniquely determined. 
After a digression on certainty equivalents ({ref}`sss-ces`), we extend the optimality theory to MDPs with general certainty equivalents ({ref}`ss-mpdsces`). {ref}`sss-osapp` applies the theory to optimal savings with unbounded utility, and we then study irreversible investment under risk neutrality, risk aversion, and ambiguity aversion. (s-rdps)= ## Introduction {ref}`ss-rdps` defines RDPs and provides examples. We then clarify the relationships between RDPs, LDPs and ADPs, and discuss existence of greedy policies in {ref}`ss-rpdeg`. (ss-rdps)= ### Definition and Examples To a first approximation, RDPs are dynamic programs with a Bellman equation of the form $$ v(x) = \max_{a \in \Gamma(x)} B(x, a, v) \qquad (x \in \Xsf) $$ (eq-belleq0) for some suitable choice of $B$. Here $x$ is the state, $a$ is an action, $\Gamma$ is a feasible correspondence and $B$ is an "aggregator," with interpretation > $B(x, a, v) =$ total lifetime rewards, contingent on current action $a$, current state $x$ and the use of $v$ to evaluate future states. In {ref}`sss-rdpdef`--{ref}`sss-serdp` we improve this definition and then provide examples. As usual, in a topological space setting, "measurable" means "Borel measurable" unless otherwise stated. (sss-rdpdef)= #### Definition Let $\Xsf$ and $\Asf$ be separable metric spaces, referred to henceforth as the **state** and **action spaces** respectively. Given these spaces, a **recursive decision process** (RDP) is a tuple $(\Gamma, V, B)$ containing 1. a nonempty correspondence $\Gamma$ from $\Xsf$ to $\Asf$ called the **feasible correspondence**, with an associated set of **feasible state-action pairs** $$ \Gsf := \graph \Gamma = \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)} $$ and an associated set of **feasible policies** $$ \Sigma \coloneq \{ \text{all measurable } \sigma \colon \Xsf \to \Asf \text{ satisfying } \sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf \}, $$ 2. 
a subset $V$ of $\RR^\Xsf$ called the **value space**, 3. a map $B \colon \Gsf \times V \to \RR$, referred to as an **aggregator**, satisfying the monotonicity condition $$ v \leq w \implies B(x, a, v) \leq B(x, a, w) $$ (eq-mon) for every $v, w \in V$ and every $(x, a) \in \Gsf$, and the consistency condition $$ \sigma \in \Sigma \text{ and } v \in V \; \implies \; m(x) \coloneq B(x, \sigma(x), v) \text{ is in } V. $$ (eq-con) Several objects, such as $\Xsf$, $\Asf$ and $\Gamma$, are familiar from our definition of LDPs in {ref}`sss-ldpdef`. Analogous to the LDP case, when representing the RDP by the tuple $(\Gamma, V, B)$, we are treating $\Xsf$ and $\Asf$ as understood from context. The value space $V$ is a class of functions that assign values to states. The order on the left side of {eq}`eq-mon` is the usual pointwise partial order for functions. The monotonicity restriction is natural: relative to $v$, if rewards are at least as high under $w$ in every future state, then the total rewards we can extract under $w$ should be at least as high. The final condition, in {eq}`eq-con`, is a consistency condition implying that $V$ is large enough to capture the value of following a particular policy. (sss-mdpsarerdps)= #### Example: Finite MDPs In {ref}`s-mdps` we introduced the basic MDP model, with finite state space $\Xsf$, finite action space $\Asf$, and remaining primitives $(\Gamma, r, \beta, P)$ as given in {ref}`sss-tdtm`. This maps easily to the RDP setting by taking $V = \RR^\Xsf$, $\Gamma$ as given, and $$ B(x, a, v) = r(x, a) + \beta \sum_{x'} v(x') P(x, a, x') $$ for $(x, a) \in \Gsf$ and $v \in \RR^\Xsf$. ```{exercise} :label: ex-rdps-auto-1 Confirm that, for this RDP, the monotonicity and consistency conditions {eq}`eq-mon` and {eq}`eq-con` both hold. ``` For this model, it is clear that the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from {eq}`eq-belleq0` agrees with the original expression we gave in {eq}`eq-mdp_bell0`.
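The finite MDP aggregator above can be sketched numerically. The following is a minimal illustration, assuming hypothetical randomly generated primitives `r`, `P` and a boolean mask `feasible` standing in for $\Gamma$ (none of these values come from the text). It evaluates $B(x, a, v)$ on all state-action pairs, spot-checks the monotonicity condition {eq}`eq-mon` on one pair $v \leq w$, and iterates the map $v \mapsto \max_{a \in \Gamma(x)} B(x, a, v)$ to approximate a solution of the Bellman equation.

```python
import numpy as np

# Hypothetical primitives for a small finite MDP (illustrative values only).
rng = np.random.default_rng(0)
n, m, beta = 4, 3, 0.95                   # number of states, actions; discount factor
r = rng.uniform(size=(n, m))              # rewards r(x, a)
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)         # P[x, a, :] is a distribution over x'
feasible = np.ones((n, m), dtype=bool)    # feasible[x, a] is True iff a is in Gamma(x)
feasible[0, 2] = False                    # assumption: action 2 is infeasible at state 0


def B(v):
    """Array of B(x, a, v) = r(x, a) + beta * sum_x' v(x') P(x, a, x')."""
    return r + beta * (P @ v)


def T(v):
    """Maximize B(x, a, v) over the feasible actions at each state x."""
    return np.max(np.where(feasible, B(v), -np.inf), axis=1)


# Monotonicity spot check: v <= w pointwise should give B(., ., v) <= B(., ., w).
v = rng.normal(size=n)
w = v + rng.uniform(size=n)               # w >= v pointwise by construction
monotone = bool(np.all(B(v) <= B(w)))

# Iterate T to approximate the fixed point of the Bellman equation.
v_star = np.zeros(n)
for _ in range(2_000):
    v_star = T(v_star)
bellman_residual = float(np.max(np.abs(T(v_star) - v_star)))
```

Masking infeasible actions with `-np.inf` is one simple way to encode $\Gamma$ on a finite action set; it relies on every state having at least one feasible action, as required by the definition of an RDP.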
(sss-firmrdp)= #### Example: The Firm Valuation Problem Recall the firm decision problem we analyzed in {ref}`sss-fintroc`, where the decision is binary ($0$ means continue and $1$ means sell) and the state $x$ takes values in a set $\Xsf$ and evolves via stochastic kernel $P$. To map this problem to an RDP we set $\Asf = \{0,1\}$ and $\Gamma(x) = \Asf$ for all $x$. We take $V \coloneq b\Xsf$ as the value space and set $$ B(x, a, v) = a s + (1 - a) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right]. $$ (eq-fvagg) The monotonicity condition {eq}`eq-mon` clearly holds. A policy is a $\bB$-measurable map $\sigma \colon \Xsf \to \{0,1\}$. Given any such policy and any $v \in b\Xsf$, the function $$ m(x) \coloneq \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] $$ is in $b\Xsf$ (since $\pi$ is assumed to be bounded), so the consistency condition {eq}`eq-con` also holds. For this model, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ becomes $$ v(x) = \max_{a \in \{0,1\}} \left\{ a s + (1 - a) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] \right\}. $$ This is equivalent to the original statement of the Bellman equation in {prf:ref}`t-fintroop`. #### Firm Valuation with Unbounded Profits The firm valuation problem can also fit into the RDP framework when profits are unbounded, at least in some cases. For example, suppose that $\Xsf = \RR_+$, that $\ell$ is a given weight function on $\RR_+$ (see {ref}`sss-weisup`), and that there exist nonnegative constants $\alpha, \eta, \delta$ such that $$ \pi(x) \leq \eta \ell(x) + \delta \quad \text{and} \quad \int \ell(x') P(x, \diff x') \leq \alpha \ell(x) \qquad (x \in \Xsf). $$ (eq-fvwkb) (These conditions bound the rate at which profits grow.) We again take $B$ as in {eq}`eq-fvagg` and $\Gamma(x) = \{0,1\}$ for all $x \in \RR_+$. We set $V$ equal to $b_\ell \Xsf$, the set of measurable functions $v \in \RR^\Xsf$ with $\| v \|_\ell < \infty$.
(Here $\| \cdot \|_\ell$ denotes the $\ell$-weighted supremum norm, as in {ref}`sss-weisup`.) ```{exercise} :label: ex-fvubrdp Show that the monotonicity and consistency conditions in the definition of an RDP hold for the tuple $(\Gamma, V, B)$ defined above. ``` ```{solution} ex-fvubrdp For monotonicity, fix $(x, a) \in \Gsf$ and $v, w \in V$ with $v \leq w$. Since $P(x, \cdot)$ is a nonnegative measure, we have $\int v(x') P(x, \diff x') \leq \int w(x') P(x, \diff x')$. Because $a \in \{0,1\}$, the coefficient $(1 - a)$ is nonnegative, so $B(x, a, v) \leq B(x, a, w)$. For consistency, fix $\sigma \in \Sigma$ and $v \in V = b_\ell \Xsf$. Consider $m(x) \coloneq B(x, \sigma(x), v)$. Since $\sigma$ is measurable and $\pi$ and $x \mapsto \int v(x') P(x, \diff x')$ are measurable, $m$ is measurable. For $\ell$-boundedness, we have $$ |m(x)| \leq |s| + |\pi(x)| + \beta \int |v(x')| P(x, \diff x'). $$ Since $v \in b_\ell \Xsf$, we have $|v| \leq \|v\|_\ell \ell$, giving $\int |v(x')| P(x, \diff x') \leq \|v\|_\ell \int \ell(x') P(x, \diff x') \leq \alpha \|v\|_\ell \ell(x)$ by {eq}`eq-fvwkb`. Also $|\pi(x)| \leq \eta \ell(x) + \delta$ by {eq}`eq-fvwkb`. Hence $$ |m(x)| \leq (|s| + \delta) + (\eta + \beta \alpha \|v\|_\ell)\, \ell(x), $$ so $m \in b_\ell \Xsf = V$ and the consistency condition holds. ``` (sss-serdp)= #### Example: Optimal Savings Consider the optimal savings problem studied in {ref}`s-og`. The state is $w \in \RR_+$ and the action is $c \in \RR_+$. The feasible correspondence is $\Gamma(w) = [0, w]$ and $V \coloneq b\RR_+$ is the value space. We set $$ B(w, c, v) = u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \qquad (v \in V, \; 0 \leq c \leq w). $$ As in {prf:ref}`a-uf`, we take $u$ to be bounded and continuous. Under these restrictions, the tuple $(\Gamma, V, B)$ is an RDP. The function $B$ is real-valued and the monotonicity condition {eq}`eq-mon` clearly holds. 
The consistency condition {eq}`eq-con` holds because, by the definition of $\Gamma$, a policy is a Borel measurable map $\sigma \colon \RR_+ \to \RR_+$ with $0 \leq \sigma(w) \leq w$ for all $w$, and given any such policy and any $v \in b\RR_+$, the function $$ m(w) \coloneq u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y) $$ is measurable and bounded (since $u$ is bounded and continuous). For this model, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from {eq}`eq-belleq0` agrees with the optimal savings Bellman equation in {eq}`eq-osbell`. (sss-ezisrdp)= #### Example: Savings with Kreps--Porteus Expectations Consider a variation of the optimal savings model from {ref}`sss-serdp` with Epstein--Zin-type preferences. To simplify the presentation, we set the EIS parameter to $\psi = \infty$, so that the CES aggregator reduces to addition, while retaining the nonlinear Kreps--Porteus expectation over future values. The Bellman equation becomes $$ v(w) = \max_{0 \leq c \leq w} \left\{ (1-\beta) u(c) + \beta \left( \int v(R(w-c) + y)^{1-\gamma} \phi(\diff y) \right)^{\frac{1}{1-\gamma}} \right\} $$ (eq-ezsbell) where $\gamma > 0$ with $\gamma \neq 1$ is the coefficient of relative risk aversion. As before, the state is $w \in \RR_+$, the action is $c \in \RR_+$, and $\Gamma(w) = [0, w]$. To avoid raising zero to a negative power, we assume that $u$ is measurable and that there exist constants $0 < \underline{u} \leq \bar{u} < \infty$ with $\underline{u} \leq u(c) \leq \bar{u}$ for all $c \in \RR_+$. The aggregator is $$ B(w, c, v) = (1-\beta) u(c) + \beta \left( \int v(R(w-c) + y)^{1-\gamma} \phi(\diff y) \right)^{\frac{1}{1-\gamma}}. $$ We set the value space $V$ to be all measurable functions $v \colon \RR_+ \to [\underline{u}, \bar{u}]$. ```{exercise} :label: ex-ezrdp Show that $(\Gamma, V, B)$ is an RDP. ``` ```{solution} ex-ezrdp For monotonicity, fix $(w, c) \in \Gsf$ and $v, v' \in V$ with $v \leq v'$. 
We show that the Kreps--Porteus expectation is increasing: if $\gamma < 1$, then $v^{1-\gamma} \leq (v')^{1-\gamma}$ pointwise, so the integral increases, and raising to $1/(1-\gamma) > 0$ preserves the inequality. If $\gamma > 1$, then $v^{1-\gamma} \geq (v')^{1-\gamma}$ (the power reverses order) so the integral decreases, but raising to $1/(1-\gamma) < 0$ reverses order again. In both cases $B(w, c, v) \leq B(w, c, v')$, confirming {eq}`eq-mon`. For consistency, fix $\sigma \in \Sigma$ and $v \in V$. Since $v$ takes values in $[\underline{u}, \bar{u}] \subset (0, \infty)$, the Kreps--Porteus expectation of $v$ also lies in $[\underline{u}, \bar{u}]$ (by the same argument as above, applied with the constant functions $\underline{u}$ and $\bar{u}$ in place of $v$ and $v'$). Hence $$ \underline{u} \leq (1-\beta) \underline{u} + \beta \underline{u} \leq m(w) \leq (1-\beta) \bar{u} + \beta \bar{u} = \bar{u} $$ where $m(w) \coloneq B(w, \sigma(w), v)$. Since $m$ is also measurable, we have $m \in V$, confirming {eq}`eq-con`. ``` With $B$ defined as above, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from {eq}`eq-belleq0` agrees with {eq}`eq-ezsbell`. (sss-mdpssr)= #### Example: MDPs with Modified Rewards Some authors use an MDP framework where current rewards depend on the next period state, so that the Bellman equation has the form $$ v(x) = \max_{a \in \Gamma(x)} \sum_{x'} \left\{ r(x, a, x') + \beta v(x') \right\} P(x, a, x') \qquad (x \in \Xsf). $$ (eq-mmbell) Here $r$ maps $\Gsf \times \Xsf$ to $\RR$ and other primitives are unchanged. We take $V = \RR^\Xsf$, $\Gamma$ as given, and set $$ B(x, a, v) = \sum_{x'} \left\{ r(x, a, x') + \beta v(x') \right\} P(x, a, x') $$ Evidently, for the associated RDP $(\Gamma, V, B)$, the monotonicity and consistency conditions {eq}`eq-mon` and {eq}`eq-con` both hold. 
For this choice of $B$, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ agrees with the modified MDP Bellman equation in {eq}`eq-mmbell`. (sss-serez)= #### Example: Risk-Sensitive Preferences In {prf:ref}`eg-rsnotldp` we discussed a risk-sensitive MDP with entropic certainty equivalent. This model can be embedded in the RDP framework by setting $V = \RR^\Xsf$, $\Gamma$ as given, and $$ B(x, a, v) \coloneq r(x, a) + \frac{\beta}{\theta} \ln \left[ \sum_{x'} \exp(\theta v(x')) P(x,a,x') \right]. $$ (eq-rsbb) The parameter $\theta$ is any nonzero real value. ```{exercise} :label: ex-rdps-auto-2 Confirm that, for the associated RDP $(\Gamma, V, B)$, the monotonicity and consistency conditions {eq}`eq-mon` and {eq}`eq-con` both hold. ``` When $B$ is given by {eq}`eq-rsbb`, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from {eq}`eq-belleq0` becomes $$ v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \frac{\beta}{\theta} \ln \left[ \sum_{x'} \exp(\theta v(x')) P(x,a,x') \right] \right\}. $$ ### RDPs vs LDPs vs ADPs As mentioned at the start of the chapter, we have MDPs $\subset$ LDPs $\subset$ RDPs $\subset$ ADPs and the inclusions are all strict. We already know that the first inclusion is strict (consider, for example, the LDP with state-dependent discounting in {ref}`ss-exsdd`). Here we review the remaining relationships. #### LDPs are RDPs Let $(\Gamma, r, K)$ be an LDP with state and action spaces $\Xsf$, $\Asf$, as defined in {ref}`sss-ldpdef`. Setting $V = b\Xsf$ and $$ B(x, a, v) = r(x, a) + \int v(x') K(x,a, \diff x') \qquad ((x,a) \in \Gsf, \; v \in V), $$ (eq-ldpagg) the resulting tuple $(\Gamma, V, B)$ is an RDP. To see this, note that $V = b\Xsf$ is a subset of $\RR^\Xsf$, so we only need to check the monotonicity and consistency conditions for the aggregator $B$. For monotonicity, fix $(x, a) \in \Gsf$ and $v, w \in V$ with $v \leq w$.
Since $K(x, a, \cdot)$ is a nonnegative measure, we have $\int v(x') K(x, a, \diff x') \leq \int w(x') K(x, a, \diff x')$ and hence $B(x, a, v) \leq B(x, a, w)$. For consistency, fix $\sigma \in \Sigma$ and $v \in b\Xsf$. We need to show that $m(x) \coloneq B(x, \sigma(x), v)$ is in $b\Xsf$. This follows from the LDP conditions, which require $r \in b\Gsf$ and $Kv \in b\Gsf$ whenever $v \in b\Xsf$. The risk-sensitive MDP in {ref}`sss-serez` is an RDP but not an LDP, since the aggregator is nonlinear in future values. (sss-rva)= #### RDPs are ADPs Every RDP generates an ADP. To see this, let $(\Gamma, V, B)$ be an RDP with state space $\Xsf$ and action space $\Asf$. The set $V$ is paired with the pointwise partial order. With $\Sigma$ as the set of feasible policies and given $\sigma$ in $\Sigma$, we define $T_\sigma$ by $$ (T_\sigma \, v)(x) = B(x, \sigma(x), v) \qquad (x \in \Xsf, \; v \in V). $$ The monotonicity and consistency conditions in {eq}`eq-mon`--{eq}`eq-con` imply that $T_\sigma$ is an order-preserving self-map on $V$. Hence, with $\TT$ as the set of all policy operators, the pair $(V, \TT)$ is an ADP. We call $(V, \TT)$ the **ADP generated by $(\Gamma, V, B)$**. For ADPs generated by RDPs, we can provide intuitive representations of greedy policies and the Bellman equation. For example, we recall from our ADP definition in {eq}`eq-adpgr` that a policy $\sigma \in \Sigma$ is $v$-greedy for ADP $(V, \TT)$ if $T_\tau \, v \leq T_\sigma \, v$ for all $\tau \in \Sigma$. If $(V, \TT)$ is generated by $(\Gamma, V, B)$, then this is equivalent to the statement that $$ B(x, \tau(x), v) \leq B(x, \sigma(x), v) \quad \text{for all } \tau \in \Sigma \text{ and } x \in \Xsf. $$ (eq-sbtau) Also, we recall that the ADP Bellman operator is defined by $Tv = \bigvee_\sigma T_\sigma \, v$ whenever the supremum exists. 
When $(V, \TT)$ is generated by $(\Gamma, V, B)$, this is equivalent to the statement $(Tv)(x) = \sup_{\sigma \in \Sigma} B(x, \sigma(x), v)$ for all $x \in \Xsf$ whenever the pointwise supremum exists (see {prf:ref}`ex-polim`). Under reasonable conditions on $\Gamma$ and $B$, we will show that this can be improved to the stronger form $$ (Tv)(x) = \max_{a \in \Gamma(x)} B(x, a, v) \qquad (x \in \Xsf) $$ (eq-tvrdpg) (see {prf:ref}`l-exgref` and {prf:ref}`l-exgre`). ```{exercise} :label: ex-tcrdpsiso Let $(\Gamma, V, B)$ be an RDP with $V \subset M^\Xsf$ for some $M \subset \RR$. Let $\phi$ be an order isomorphism from $M$ onto a subset $\hat M$ of $\RR$, set $\hat V \coloneq \{\phi \circ v : v \in V\}$, and suppose that $\hat B$ is such that $(\Gamma, \hat V, \hat B)$ is an RDP and $$ B(x, a, v) = \phi^{-1}[ \hat B(x, a, \phi \circ v) ] \quad \text{for all } v \in V \text{ and } (x, a) \in \Gsf. $$ (eq-bbhat) Prove that the generated ADPs $(V, \TT)$ and $(\hat V, \hat{\TT})$ are isomorphic. ``` ```{solution} ex-tcrdpsiso Let $F v \coloneq \phi \circ v$. By construction, $F$ maps $V$ onto $\hat V$. Moreover, $F$ is an order isomorphism: since $\phi$ is an order isomorphism on $\RR$, we have $v(x) \leq w(x) \iff \phi(v(x)) \leq \phi(w(x))$ for each $x \in \Xsf$, and hence $v \leq w$ pointwise if and only if $Fv \leq Fw$ pointwise (cf. {prf:ref}`ex-vmvm`). It remains to verify the conjugacy condition. Fix $\sigma \in \Sigma$ and $v \in V$. Evaluating {eq}`eq-bbhat` at $a = \sigma(x)$ gives $B(x, \sigma(x), v) = \phi^{-1}[\hat B(x, \sigma(x), \phi \circ v)]$ for all $x$, which is $T_\sigma \, v = F^{-1} \circ \hat T_\sigma \circ F \, v$. Hence the generated ADPs are isomorphic. ``` Since every RDP is an ADP, we can use ADP optimality results to study RDPs. 
Given an RDP $(\Gamma, V, B)$ and its generated ADP $(V, \TT)$, we make the obvious connections, saying that - $(\Gamma, V, B)$ is **regular** if $(V, \TT)$ is regular, - $T$ is **the Bellman operator** for $(\Gamma, V, B)$ when $T$ is the Bellman operator for $(V, \TT)$, - $\sigma$ is **optimal** for $(\Gamma, V, B)$ when $\sigma$ is optimal for $(V, \TT)$, - etc. #### Not all ADPs are RDPs Although the RDP framework is broad, there are significant dynamic programs that fall outside this framework. ```{prf:example} :label: eg-firmexno Recall the real option problem from {ref}`ss-ro` with Bellman equation $$ v(x) = -c(x) + \beta(x) \int \max \left\{q(x'), v(x') \right\} P(x, \diff x') $$ (eq-fe_modbell2) We saw in {ref}`sss-rolv` that the problem can be represented as an ADP. At the same time this ADP cannot be directly placed in the RDP framework for two reasons. One is that the $\max$ operator in {eq}`eq-fe_modbell2` is inside the integral (as compared to {eq}`eq-belleq0`, where the $\max$ is on the outside). The second is that, in the setting of {ref}`sss-rolv`, the value space is an $L_p$ space, consisting of equivalence classes of functions, rather than a space of real-valued functions. ``` ```{prf:example} In {ref}`sss-pavf` we considered a structural estimation problem with Bellman equation $$ g(x, a) = \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x') $$ (eq-stbe2) Similar to the situation in {prf:ref}`eg-firmexno`, this cannot be directly placed in the RDP framework because the $\max$ operator is inside the expectation. ``` In the examples above, it is possible to rearrange the problem so that the $\max$ operator is shifted to the outside and, thereby, construct a version that fits the RDP framework. But there are good reasons to avoid this, related to smoothness and dimensionality (see, e.g., {cite}`kristensen2021solving` or {cite}`rust1994structural`). 
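The rearrangement mentioned above can be sketched numerically for the equation {eq}`eq-stbe2`. The following is a minimal illustration under simplifying assumptions that are not from the text: finite state and action sets, every action feasible at every state, and hypothetical random primitives `r` and `P`. It solves the version with the $\max$ inside the expectation for $g$, solves the standard version with the $\max$ outside for $v$, and checks that the two are linked by $v(x) = \max_{a} [r(x, a) + \beta g(x, a)]$.

```python
import numpy as np

# Hypothetical finite primitives (illustrative values, not from the text).
rng = np.random.default_rng(0)
n, m, beta = 5, 3, 0.9                  # number of states, actions; discount factor
r = rng.uniform(size=(n, m))            # r(x, a)
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)       # P[x, a, :] sums to one


def fixed_point(op, x0, tol=1e-12, max_iter=10_000):
    """Iterate x <- op(x) until successive iterates are within tol."""
    x = x0
    for _ in range(max_iter):
        x_new = op(x)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x


# Max inside the expectation:
# g(x, a) = sum_x' max_a' [r(x', a') + beta g(x', a')] P(x, a, x').
g = fixed_point(lambda g: P @ np.max(r + beta * g, axis=1), np.zeros((n, m)))

# Max outside: v(x) = max_a { r(x, a) + beta sum_x' v(x') P(x, a, x') }.
v = fixed_point(lambda v: np.max(r + beta * (P @ v), axis=1), np.zeros(n))

# The two solutions are linked by v(x) = max_a [r(x, a) + beta g(x, a)].
link_error = float(np.max(np.abs(v - np.max(r + beta * g, axis=1))))
```

In this finite sketch both maps are contractions of modulus `beta` in the supremum norm, so simple iteration suffices; the check only illustrates the algebraic link between the two formulations and says nothing about the smoothness or dimensionality considerations raised above.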
```{prf:example} :label: eg-lqrdp In {ref}`ss-lq` we investigated linear-quadratic (LQ) problems with Bellman equations such as $$ v(x) = \min_u \left\{ u^\top R u + x^\top Q x + v(Ax + B u) \right\}. $$ (eq-lqbe2) Here $R, Q, A$, and $B$ are matrices, while $u$ and $x$ are vectors. This model looks similar to an RDP if we set $\Xsf = \RR^k$, $\Asf = \RR^m$, $\Gamma(x) = \Asf$ for all $x \in \Xsf$ and, for the aggregator, $$ B(x, u, v) = u^\top Ru + x^\top Q x + v(Ax + Bu) . $$ However, in {ref}`ss-lq` we took $\Sigma$ to be a set of *stable* controls, so that $F \in \Sigma$ means that $F$ is a matrix and $\rho(A + BF) < 1$. Thus, we restrict $\Sigma$ beyond just feasibility of actions, in contrast to the specification of $\Sigma$ within the definition of an RDP. The LQ problem is difficult to handle without such additional restrictions on policies. ``` (ss-rpdeg)= ### Existence of Greedy Policies As always, existence of greedy policies is important for our analysis. In this section, we investigate RDP environments where greedy policies exist. We begin with finite action spaces and then move to the general case. #### Finite Actions We begin with the discrete choice setting, where greedy policies always exist. ```{prf:lemma} :label: l-exgref Let $(\Gamma, V, B)$ be an RDP. If $\Asf$ is finite, then, for all $v \in V$, 1. $\sigma \in \Sigma$ is $v$-greedy if and only if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} B(x, a, v) \qquad \text{for all } x \in \Xsf, $$ (eq-defgreedy) 2. at least one $v$-greedy policy exists, and 3. the Bellman operator satisfies $$ (Tv)(x) = \max_{a \in \Gamma(x)} B(x, a, v) \qquad (x \in \Xsf). $$ (eq-tvrdpsc) ``` ```{prf:proof} Fix $v \in V$ and enumerate $\Asf = \{a_1, \ldots, a_n\}$. The claim (i, $\Leftarrow$) is obvious: if $\sigma \in \Sigma$ obeys {eq}`eq-defgreedy` then {eq}`eq-sbtau` clearly holds. Next we prove (ii). 
Since $\Gamma(x)$ is nonempty and finite for each $x$, the set $\argmax_{a \in \Gamma(x)} B(x, a, v)$ is nonempty for all $x$. Define $\sigma(x) = a_{i(x)}$ where $i(x)$ is the smallest index $i$ such that $a_i \in \argmax_{a \in \Gamma(x)} B(x, a, v)$. Then $\sigma$ satisfies {eq}`eq-defgreedy`. Moreover, $\sigma$ is Borel measurable. To see this, for each $k$, let $$ S_k \coloneq \setntn{x}{a_k \in \Gamma(x) \text{ and } B(x, a_k, v) \geq B(x, a_j, v) \text{ for all } a_j \in \Gamma(x)}. $$ Each $S_k$ is Borel, since it is defined by finitely many inequalities involving measurable functions. Moreover, $\sigma^{-1}(a_k) = S_k \setminus (S_1 \cup \cdots \cup S_{k-1})$, which is also Borel. These claims show that $\sigma \in \Sigma$ and $\sigma$ obeys {eq}`eq-defgreedy`. By (i, $\Leftarrow$), $\sigma$ is $v$-greedy. In particular, at least one $v$-greedy policy exists. For (iii), since a $v$-greedy policy exists, {prf:ref}`l-torper` gives $Tv = T_\sigma \, v$ for any $v$-greedy $\sigma$. Combined with {eq}`eq-defgreedy`, this yields {eq}`eq-tvrdpsc`. Now we return to (i, $\Rightarrow$). Suppose $\sigma$ is $v$-greedy. By {prf:ref}`l-torper`, $T_\sigma \, v = Tv$, and hence, by (iii), $$ B(x, \sigma(x), v) = (T_\sigma \, v)(x) = (Tv)(x) = \max_{a \in \Gamma(x)} B(x, a, v) $$ for all $x$, which is {eq}`eq-defgreedy`. ◻ ``` Note that, in the simple setting of {prf:ref}`l-exgref`, the Bellman equation takes the form of {eq}`eq-belleq0`. Below we investigate more complex settings where this is still true. #### Continuous Actions Now we drop the finiteness restriction on $\Asf$. In these general RDP settings, existence is less trivial. Here we state one useful result. ```{prf:lemma} :label: l-exgre Let $(\Gamma, V, B)$ be an RDP and fix $v \in V$. If - $\Gamma$ is continuous and compact-valued on $\Xsf$, and - $(x, a) \mapsto B(x, a, v)$ is continuous on $\Gsf$, then 1. 
a policy $\sigma \in \Sigma$ is $v$-greedy if and only if $$ \sigma(x) \in \argmax_{a \in \Gamma(x)} B(x, a, v) \quad \text{for all } x \in \Xsf $$ (eq-rdpgam) 2. at least one $v$-greedy policy exists, 3. the Bellman operator is well-defined at $v$, the function $Tv$ is continuous on $\Xsf$, and $$ (Tv)(x) = \max_{a \in \Gamma(x)} B(x, a, v) \quad \text{for all } x \in \Xsf. $$ (eq-tva) If, in addition, $\sigma(x)$ is the unique maximizer of $B(x, a, v)$ over $\Gamma(x)$ for each $x \in \Xsf$, then $\sigma$ is continuous on $\Xsf$. ``` ```{prf:proof} Fix $v \in V$ and suppose that the stated conditions hold. We use the following two facts, which follow from {prf:ref}`t-berge`: - $m(x) \coloneq \max_{a \in \Gamma(x)} B(x, a, v)$ is well-defined and continuous on $\Xsf$, and - there exists a measurable $\sigma \in \Sigma$ satisfying {eq}`eq-rdpgam`. The claim (i, $\Leftarrow$) is obvious: if $\sigma \in \Sigma$ obeys {eq}`eq-rdpgam` then {eq}`eq-sbtau` clearly holds. Claim (ii) follows from the second fact and (i, $\Leftarrow$). For (iii), existence of a $v$-greedy policy implies that $T$ is well-defined at $v$. For the remaining claims, let $\sigma$ be the $v$-greedy policy constructed above, which satisfies {eq}`eq-rdpgam`. {prf:ref}`l-torper` gives $Tv = T_\sigma \, v$, so, by {eq}`eq-rdpgam`, $(Tv)(x) = m(x)$ for all $x$. This yields {eq}`eq-tva`. Also, by the first fact, the function $Tv$ is continuous. For (i, $\Rightarrow$), suppose $\sigma$ is $v$-greedy. By {prf:ref}`l-torper`, $T_\sigma \, v = Tv$, and hence, by (iii), $$ B(x, \sigma(x), v) = (T_\sigma \, v)(x) = (Tv)(x) = \max_{a \in \Gamma(x)} B(x, a, v) $$ for all $x$, which is {eq}`eq-rdpgam`. The final claim follows directly from {prf:ref}`t-berge`. ◻ ``` (s-rdpo)= ## Optimality Results Let's put together some sufficient conditions for optimality of RDP models. We will focus here on models that are naturally contracting. This permits us to handle dynamic programs with both bounded and unbounded rewards.
We begin in {ref}`ss-ucs0` with the bounded case, where the aggregator $B$ is bounded and satisfies a Blackwell-type discounting condition. In {ref}`ss-ucs` we extend to potentially unbounded rewards using weighted contractions. Finally, {ref}`ss-posol` investigates properties of solutions, giving sufficient conditions for the value function to be monotone, concave, or uniquely determined, and for the optimal policy to be continuous. (ss-ucs0)= ### Bounded Contractions RDPs have strong optimality properties when they uniformly contract values. The current section investigates this case. Throughout this section, we assume that values are bounded. This typically occurs when reward functions are bounded. Later, in {ref}`ss-ucs`, we will consider unbounded settings. (sss-wrdpa0)= #### Framework Let $\Xsf, \Asf$ be separable metric spaces, let $\Gamma$ be a nonempty correspondence from $\Xsf$ to $\Asf$, and let $\Gsf$ be the feasible state-action pairs (see {ref}`sss-rdpdef`). Set $V = b\Xsf$. Let $B \colon \Gsf \times V \to \RR$ be a given function such that - $(x, a) \mapsto B(x, a, v)$ is measurable on $\Gsf$ for all $v \in V$, and - $B(x, a, w) \leq B(x, a, v)$ for all $w \leq v$ in $V$ and $(x, a) \in \Gsf$. ```{prf:assumption} :label: a-uca20 The function $B$ is bounded and, in addition, there exists a $\lambda \in [0,1)$ such that $$ B(x, a, v + \kappa) \leq B(x, a, v) + \lambda \kappa \quad \text{for all } (x, a, v) \in \Gsf \times V \text{ and all } \kappa \in \RR_+. $$ ``` The tuple $(\Gamma, V, B)$ is an RDP. To see this, note that the monotonicity condition {eq}`eq-mon` is given by the second restriction on $B$ above. For the consistency condition {eq}`eq-con`, fix $\sigma \in \Sigma$ and $v \in V$. The function $x \mapsto B(x, \sigma(x), v)$ is measurable, since $\sigma$ is measurable and $(x,a) \mapsto B(x,a,v)$ is measurable on $\Gsf$, and bounded, since $B$ is bounded by {prf:ref}`a-uca20`. Hence $B(\cdot, \sigma(\cdot), v) \in b\Xsf = V$. 
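The discounting inequality in {prf:ref}`a-uca20` can be checked numerically on a simple case. The sketch below assumes finite state and action sets, all actions feasible, hypothetical random primitives `r` and `P`, and the entropic aggregator from {eq}`eq-rsbb` with an illustrative value `theta = -0.5`. For this aggregator, adding a constant $\kappa$ to $v$ raises $B(x, a, v)$ by exactly $\beta \kappa$, so the inequality holds with $\lambda = \beta$. The code also compares $\| Tv - Tw \|_\infty$ with $\beta \| v - w \|_\infty$ on one pair of functions.

```python
import numpy as np

# Hypothetical finite primitives (illustrative values, not from the text).
rng = np.random.default_rng(1)
n, m, beta, theta = 5, 2, 0.9, -0.5     # states, actions, discount factor, risk parameter
r = rng.uniform(size=(n, m))            # r(x, a)
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)       # P[x, a, :] sums to one


def B(v):
    """Entropic aggregator r + (beta/theta) * log(sum_x' exp(theta v(x')) P(x, a, x'))."""
    z = theta * v
    shift = z.max()                     # subtract the max to stabilize the exponential
    return r + (beta / theta) * (shift + np.log(P @ np.exp(z - shift)))


def T(v):
    """Maximize B(x, a, v) over all actions (assumption: Gamma(x) is the full action set)."""
    return B(v).max(axis=1)


v = rng.normal(size=n)
w = rng.normal(size=n)
kappa = 2.0

# Discounting: B(x, a, v + kappa) - [B(x, a, v) + beta * kappa] should be <= 0.
gap = float(np.max(B(v + kappa) - (B(v) + beta * kappa)))

# Sup-norm comparison of T v and T w against beta * ||v - w||.
lhs = float(np.max(np.abs(T(v) - T(w))))
rhs = beta * float(np.max(np.abs(v - w)))
```

This only illustrates the discounting part of the assumption on one draw; it is a spot check, not a proof, and the boundedness requirement is automatic here because the state space is finite.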
(sss-wrdpf0)= #### Finite Actions We seek optimality results for the RDP $(\Gamma, V, B)$ introduced in {ref}`sss-wrdpa0`. The simplest case is when the choice set is always finite. Also note that, in this setting, a policy $\sigma$ is $v$-greedy if and only if {eq}`eq-defgreedy` holds. ```{prf:proposition} :label: p-gsdpc2f0 Let {prf:ref}`a-uca20` hold. If $\Asf$ is finite, then 1. the fundamental optimality properties hold, 2. VFI converges geometrically on $V$, and 3. OPI and HPI also converge. If, in addition, $\Xsf$ is finite, then HPI converges in finitely many steps. ``` ```{prf:proof} We apply {prf:ref}`t-blackwell` with $E = b\Xsf$, $e = \1$, and $V = V_0 = b\Xsf$. The function $\1$ is a normalized order unit of $b\Xsf$. {prf:ref}`a-uca20` provides the Blackwell condition {eq}`eq-badp`: evaluating at $a = \sigma(x)$ gives $T_\sigma(v + \kappa \1) \leq T_\sigma \, v + \lambda \kappa \1$ for all $v \in V$, $\kappa \in \RR_+$, and $\sigma \in \Sigma$. Since $V_0 = V$, it is trivially closed in $V$. By {prf:ref}`l-exgref`, every $v \in V$ has a $v$-greedy policy, so the ADP is regular. The optimality and convergence claims follow from {prf:ref}`t-blackwell`. If $\Xsf$ is also finite, then $\Sigma$ is finite and hence $\TT$ is finite. Each $T_\sigma$ is a contraction and therefore globally stable, so the ADP is order stable by {prf:ref}`l-gsios`. Hence HPI converges in finitely many steps by {prf:ref}`t-bkf`. ◻ ``` (sss-cc0)= #### The Continuous Case We now drop the finiteness assumption, while continuing to work with the RDP $(\Gamma, V, B)$ introduced in {ref}`sss-wrdpa0`. In place of finiteness, we consider two continuity conditions on $B$. ```{prf:assumption} :label: a-wuca0 $\Gamma$ is continuous and compact-valued, while $(x, a) \mapsto B(x, a, v)$ is continuous on $\Gsf$ whenever $v \in bc \Xsf$. 
``` ```{prf:assumption} :label: a-suca0 In addition to the conditions of {prf:ref}`a-wuca0`, the map $(x, a) \mapsto B(x, a, v)$ is continuous on $\Gsf$ whenever $v \in b\Xsf$. ``` The main result of this section is as follows. ```{prf:proposition} :label: p-gsdpc20 If {prf:ref}`a-uca20` and {prf:ref}`a-wuca0` hold, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $bc \Xsf$, and 3. VFI converges geometrically on $bc \Xsf$. If {prf:ref}`a-suca0` holds in place of {prf:ref}`a-wuca0`, then OPI and HPI also converge. ``` ```{prf:proof} We apply {prf:ref}`t-blackwell` with $E = b\Xsf$, $e = \1$, $V = b\Xsf$, and $V_0 = bc\Xsf$. The function $\1$ is a normalized order unit of $b\Xsf$. As in the proof of {prf:ref}`p-gsdpc2f0`, {prf:ref}`a-uca20` provides the Blackwell condition {eq}`eq-badp`. The set $bc\Xsf$ is closed in $b\Xsf$ under the supremum norm. We verify semi-regularity on $V_0$. Fix $v \in bc\Xsf$. By {prf:ref}`a-wuca0`, the map $(x,a) \mapsto B(x,a,v)$ is continuous on $\Gsf$. {prf:ref}`l-exgre` then gives that a $v$-greedy policy exists and that $Tv$ is continuous on $\Xsf$. Since $B$ is bounded ({prf:ref}`a-uca20`), we have $Tv \in bc\Xsf$. Hence $bc\Xsf \subset V_G$ and $T(bc\Xsf) \subset bc\Xsf$, confirming semi-regularity. Claims (i)--(iii) now follow from {prf:ref}`t-blackwell`. Under {prf:ref}`a-suca0`, the same argument applies to every $v \in V$, so $V_G = V$ and the ADP is regular. Convergence of OPI and HPI then follows from {prf:ref}`t-blackwell`. ◻ ``` (ss-ucs)= ### Weighted Contractions In {ref}`ss-ucs0`, we considered RDPs that are both contracting and bounded. Some useful RDPs fail to have this boundedness property. Here we extend our results to potentially unbounded problems that still retain contractivity. 
(While the results obtained in {ref}`ss-ucs0` are special cases of the results presented here (after minor modifications), we decided to present them separately in order to provide simple sufficient conditions in the bounded case.) When maximizing, the theory works best for problems where rewards are unbounded above and bounded below. (One approach to the reverse type of unboundedness can be found in {cite}`ma2022unbounded`.) Because we focus on such problems, we will typically assume that rewards are nonnegative. This costs no generality in such settings, since optimal policies are invariant to additive shifts. Throughout this section, $\ell$ is a weight function on $\Xsf$, $\| \cdot \|_\ell$ denotes the $\ell$-weighted supremum norm, and $b_\ell \Xsf$ is all $f \colon \Xsf \to \RR$ with $f/\ell \in b\Xsf$. See {ref}`sss-weisup` for background and discussion of weight functions and the space $b_\ell \Xsf$. (sss-wrdpa)= #### Framework Let $\Gamma$ be a nonempty correspondence from $\Xsf$ to $\Asf$ and let $\Gsf$ be the feasible state-action pairs (see {ref}`sss-rdpdef`). Set $V = b_\ell \Xsf_+$, so that $V$ is the nonnegative functions in $b_\ell \Xsf$. Let $B \colon \Gsf \times V \to \RR_+$ be a given function. (In the current setting, where $B$ can be unbounded, we restrict attention to the case where $B$ is nonnegative. Since the weighted contraction approach pursued here works best for rewards that are unbounded above but bounded below, imposing nonnegativity costs very little in the way of generality.) We suppose that - $(x, a) \mapsto B(x, a, v)$ is measurable on $\Gsf$ for all $v \in V$, and - $B(x, a, w) \leq B(x, a, v)$ for all $w \leq v$ in $V$ and $(x, a) \in \Gsf$. We also require two conditions related to contractivity and $\ell$-boundedness: ```{prf:assumption} :label: a-uca2 The following statements are true: 1. 
There exists a $\lambda \in [0,1)$ such that $$ B(x, a, v + \kappa \ell) \leq B(x, a, v) + \lambda \kappa \ell(x) \quad \text{for all } (x, a, v) \in \Gsf \times V \text{ and all } \kappa \in \RR_+. $$ 2. For any $v \in V$, there exist constants $M, N \in \RR_+$ such that $$ B(x, a, v) \leq M + N \ell(x) \quad \text{for all } (x, a) \in \Gsf. $$ ``` We refer to the two parts of {prf:ref}`a-uca2` as (U1) and (U2), respectively. Consider the tuple $(\Gamma, V, B)$. ```{prf:lemma} :label: l-isrdp If {prf:ref}`a-uca2` holds, then $(\Gamma, V, B)$ is an RDP. ``` ```{prf:proof} The only nontrivial claim is that the consistency condition {eq}`eq-con` holds. To verify this, let $\Sigma$ be the feasible policies, fix $v \in V$ and $\sigma \in \Sigma$, and consider the policy operator $(T_\sigma \, v)(x) = B(x, \sigma(x), v)$. Since $\sigma$ is, by definition, measurable, our restrictions on $B$ imply that $T_\sigma \, v$ is measurable and nonnegative. Moreover, (U2) yields constants $M, N$ such that $T_\sigma \, v \leq M + N \ell$, so $T_\sigma \, v$ is $\ell$-bounded. Hence $T_\sigma \, v \in V$ and the consistency condition holds. ◻ ``` (sss-wrdpf)= #### Finite Actions We seek optimality results for the RDP $(\Gamma, V, B)$ introduced in {prf:ref}`l-isrdp`. The simplest case is when the choice set is always finite: ```{prf:assumption} :label: a-fuca $\Asf$ is finite. ``` In this case we have the following result: ```{prf:proposition} :label: p-gsdpc2f If {prf:ref}`a-uca2` and {prf:ref}`a-fuca` hold, then 1. the fundamental optimality properties hold, 2. VFI converges geometrically on $V$, and 3. OPI and HPI also converge. If, in addition, $\Xsf$ is finite, then HPI converges in finitely many steps. ``` ```{prf:proof} We apply {prf:ref}`t-blackwell` with $E = b_\ell \Xsf$, $e = \ell$, and $V = V_0 = b_\ell \Xsf_+$. The weight function $\ell$ is a normalized order unit of $b_\ell \Xsf$, and condition (U1) implies the Blackwell condition {eq}`eq-badp`. Since $V_0 = V$, it is trivially closed in $V$.
By {prf:ref}`l-exgref`, every $v \in V$ has a $v$-greedy policy, so the ADP is regular. The claims (i)--(iii) now follow from {prf:ref}`t-blackwell`. If $\Xsf$ is also finite, then $\Sigma$ is finite and hence $\TT$ is finite. Each $T_\sigma$ is a contraction and therefore globally stable, so the ADP is order stable by {prf:ref}`l-gsios`. Hence HPI converges in finitely many steps by {prf:ref}`t-bkf`. ◻ ``` (sss-cc)= #### The Continuous Case We now drop the finiteness assumption, while continuing to work with the RDP $(\Gamma, V, B)$ introduced in {prf:ref}`l-isrdp`. ```{prf:assumption} :label: a-wuca The weight function $\ell$ is continuous, $\Gamma$ is continuous and compact-valued, and $(x, a) \mapsto B(x, a, v)$ is continuous on $\Gsf$ whenever $v \in b_\ell c\Xsf$. ``` ```{prf:assumption} :label: a-suca In addition to the conditions of {prf:ref}`a-wuca`, the map $(x, a) \mapsto B(x, a, v)$ is continuous on $\Gsf$ whenever $v \in b_\ell \Xsf$. ``` ```{prf:proposition} :label: p-gsdpc2 If {prf:ref}`a-wuca` and {prf:ref}`a-uca2` hold, then 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $b_\ell c \Xsf_+$, and 3. VFI converges geometrically on $b_\ell c \Xsf_+$. If {prf:ref}`a-suca` holds in place of {prf:ref}`a-wuca`, then OPI and HPI also converge. ``` ```{prf:proof} We apply {prf:ref}`t-blackwell` with $E = b_\ell \Xsf$, $e = \ell$, $V = b_\ell \Xsf_+$, and $V_0 = b_\ell c \Xsf_+$. The weight function $\ell$ is a normalized order unit of $b_\ell \Xsf$, and condition (U1) provides the Blackwell condition {eq}`eq-badp`: evaluating at $a = \sigma(x)$ gives $T_\sigma(v + \kappa \ell) \leq T_\sigma \, v + \lambda \kappa \ell$ for all $v \in V$, $\kappa \in \RR_+$, and $\sigma \in \Sigma$. We verify semi-regularity on $V_0$. Fix $v \in b_\ell c \Xsf_+$. By {prf:ref}`a-wuca`, the map $(x, a) \mapsto B(x, a, v)$ is continuous on $\Gsf$. 
Our restrictions on $\Gamma$ and {prf:ref}`l-exgre` imply that $v$ has at least one greedy policy and that $Tv \in b_\ell c \Xsf_+$. Hence $V_0 \subset V_G$ and $T V_0 \subset V_0$. Since $\ell$ is continuous ({prf:ref}`a-wuca`), $V_0$ is closed in $V$ ({prf:ref}`t-wsnsc`), and claims (i)--(iii) follow from {prf:ref}`t-blackwell`. For the last claim, if {prf:ref}`a-suca` holds, the same argument applies to all $v \in V$, giving regularity. OPI and HPI convergence then follow from {prf:ref}`t-blackwell`. ◻ ``` (ss-posol)= ### Properties of Solutions In this section, we seek sufficient conditions for the value and policy functions to have useful shape and continuity properties. We *adopt the setting of {prf:ref}`p-gsdpc2`* and study the properties of the RDP $(\Gamma, V, B)$ discussed in that result. In the proofs below, we repeatedly use {prf:ref}`l-csgs`. (sss-mvals)= #### Monotone Values First, we seek conditions under which the value function is increasing. In addition to the conditions in {prf:ref}`p-gsdpc2`, we suppose that $\Xsf$ is partially ordered by $\preceq$. Let $$ ib_\ell c\Xsf_+ \coloneq \text{ the set of increasing functions in } b_\ell c\Xsf_+. $$ ```{exercise} :label: ex-rdps-auto-3 Show that $i b_\ell c\Xsf_+$ is a closed subset of $b_\ell c\Xsf_+$. ``` ```{solution} ex-rdps-auto-3 Let $(v_n)$ be a sequence in $ib_\ell c\Xsf_+$ with $\|v_n - v\|_\ell \to 0$ for some $v \in b_\ell c\Xsf_+$. By {prf:ref}`ex-wncpc`, $v_n(x) \to v(x)$ for every $x \in \Xsf$. If $x \preceq x'$, then $v_n(x) \leq v_n(x')$ for all $n$, so passing to the limit gives $v(x) \leq v(x')$. Hence $v$ is increasing and $v \in ib_\ell c\Xsf_+$. ``` ```{prf:assumption} :label: a-visiso If $x \preceq x'$, then, for all $v \in i b_\ell c\Xsf_+$ and all $a \in \Gamma(x)$, $$ \Gamma(x) \subset \Gamma(x') \quad \text{ and } \quad B(x, a, v) \leq B(x', a, v). $$ ``` Both conditions in {prf:ref}`a-visiso` are monotonicity conditions.
The first is equivalent to stating that $\Gamma$ is order preserving when viewed as a map from $(\Xsf, \preceq)$ to $(\wp(\Asf), \subset)$. Here $\wp(\Asf)$ is the set of all subsets of $\Asf$ and $\subset$ is the partial order induced by set inclusion ({prf:ref}`eg-poac`). ```{prf:proposition} :label: p-vmon If {prf:ref}`a-visiso` holds, then $\vmax$ is increasing on $\Xsf$. ``` ```{prf:proof} It suffices to show that $T$ is invariant on $ib_\ell c\Xsf_+$, since, by {prf:ref}`p-gsdpc2`, $T$ is globally stable on $b_\ell c\Xsf_+$ and, in addition, $i b_\ell c\Xsf_+$ is closed in $b_\ell c\Xsf_+$. To see that this holds, pick any $v \in i b_\ell c\Xsf_+$ and fix $x$ and $x'$ with $x \preceq x'$. Since $T$ is invariant on $b_\ell c \Xsf_+$, we need only show that $Tv$ is increasing. But this must be so, since, by {prf:ref}`a-visiso`, $$ \sup_{a \in \Gamma(x)} B(x, a, v) \leq \sup_{a \in \Gamma(x)} B(x', a, v) \leq \sup_{a \in \Gamma(x')} B(x', a, v). $$ Hence $Tv(x) \leq Tv(x')$ and $T$ is invariant on $ib_\ell c\Xsf_+$. ◻ ``` ```{exercise} :label: ex-ogiv Consider again the optimal savings problem from {ref}`sss-osapp`. We saw in that discussion that the conditions of {prf:ref}`p-gsdpc2` hold. Show that the conditions of {prf:ref}`a-visiso` also hold. ``` ```{solution} ex-ogiv The state space is $\Xsf = \RR_+$ with the usual order. First, $w \leq w'$ implies $[0, w] \subset [0, w']$, so $\Gamma(w) \subset \Gamma(w')$. Second, fix $w \leq w'$, $c \in \Gamma(w)$, and $v \in ib_\ell c\RR_+$. Since $v$ is increasing and $R(w - c) \leq R(w' - c)$, we have $v(R(w - c) + y) \leq v(R(w' - c) + y)$ for every $y \geq 0$. Integrating gives $B(w, c, v) \leq B(w', c, v)$. ``` #### Concavity Next we seek sufficient conditions for the value function to be concave. In this section, we assume that both $\Xsf$ and $\Asf$ are convex subsets of a vector space. 
```{prf:assumption} :label: a-convexfc The set of feasible state-action pairs $\Gsf$ is convex and $(x, a) \mapsto B(x, a, v)$ is concave on $\Gsf$ whenever $v$ is concave on $\Xsf$. ``` The convexity requirement on $\Gsf$ in {prf:ref}`a-convexfc` is equivalent to the statement that, for all $x, x'$ in $\Xsf$, all $a \in \Gamma(x)$, all $a' \in \Gamma(x')$, and all $\lambda \in [0, 1]$, we have $$ \lambda a + (1-\lambda) a' \in \Gamma(\lambda x + (1-\lambda) x'). $$ By taking $x=x'$, we see that each set $\Gamma(x)$ is convex in $\Asf$. ```{prf:proposition} :label: p-visco Let the conditions of {prf:ref}`p-gsdpc2` hold. If, in addition, {prf:ref}`a-convexfc` holds, then $\vmax$ is concave on $\Xsf$. ``` ```{prf:proof} Let $c b_\ell c \Xsf_+$ be the concave functions in $b_\ell c \Xsf_+$. By a similar argument to the one used in the proof of {prf:ref}`p-vmon`, it suffices to show that $T$ is invariant on $c b_\ell c \Xsf_+$. To this end, fix $v$ in $c b_\ell c \Xsf_+$, $\lambda$ in $[0, 1]$ and $x_0, x_1 \in \Xsf$. Let $a_i$ satisfy $Tv(x_i) = B(x_i, a_i, v)$ for each $i$. Let $x_\lambda = \lambda x_0 + (1-\lambda) x_1$ and $a_\lambda = \lambda a_0 + (1-\lambda) a_1$. By convexity of $\Gsf$, we know that $a_\lambda$ lies in $\Gamma(x_\lambda)$. Combining this with concavity of $(x, a) \mapsto B(x, a, v)$ on $\Gsf$ gives $$ \lambda B(x_0, a_0, v) + (1- \lambda) B(x_1, a_1, v) \leq B(x_\lambda, a_\lambda, v) \leq Tv(x_\lambda). $$ The left-hand side is $\lambda Tv(x_0) + (1-\lambda) Tv(x_1)$, so we have proved concavity of $Tv$. Hence $T$ is invariant on $c b_\ell c \Xsf_+$, and the claim in {prf:ref}`p-visco` holds. ◻ ``` ```{exercise} :label: ex-conos Consider the optimal savings problem studied in {prf:ref}`ex-ogiv`. Prove that if $u$ is also concave, then $\vmax$ is increasing and concave. ``` ```{solution} ex-conos Monotonicity of $\vmax$ follows from {prf:ref}`ex-ogiv` and {prf:ref}`p-vmon`. For concavity, we verify {prf:ref}`a-convexfc`. The set $\Gsf = \{(w, c) : 0 \leq c \leq w\}$ is convex. Fix concave $v$ on $\RR_+$.
The map $(w, c) \mapsto u(c)$ is concave in $(w, c)$ since $u$ is concave. The map $(w, c) \mapsto v(R(w - c) + y)$ is concave in $(w, c)$ for each $y$, since $(w, c) \mapsto R(w - c) + y$ is affine and the composition of a concave function with an affine map is concave. Integrating preserves concavity, so $(w, c) \mapsto B(w, c, v)$ is concave on $\Gsf$. {prf:ref}`p-visco` now gives the result. ``` #### Uniqueness and Continuity When the conditions of {prf:ref}`p-gsdpc2` are in force, we know that at least one optimal policy exists in $\Sigma$. The question we ask now is, when is it unique? Not surprisingly, uniqueness can be obtained with a form of strict concavity. ```{prf:assumption} :label: a-convexfcs {prf:ref}`a-convexfc` is satisfied and, in addition, $a \mapsto B(x, a, v)$ is strictly concave on $\Gamma(x)$ for all $x$ in $\Xsf$ and all concave $v$ on $\Xsf$. ``` ```{prf:proposition} :label: p-convexfcs Let the conditions of {prf:ref}`p-gsdpc2` hold. If {prf:ref}`a-convexfcs` also holds, then the optimal policy is both unique and continuous. ``` ```{prf:proof} Let {prf:ref}`a-convexfcs` hold. An optimal policy exists, since we are assuming the conditions of {prf:ref}`p-gsdpc2`. Thus, it remains to show uniqueness and continuity. By the same proposition, a policy is optimal if and only if it is $\vmax$-greedy. So, to prove uniqueness, it suffices to show that there cannot be two such policies in $\Sigma$. To see this, observe that $\vmax$ is concave on $\Xsf$ by {prf:ref}`p-visco`. Hence, under {prf:ref}`a-convexfcs`, the map $a \mapsto B(x, a, \vmax)$ is strictly concave at each $x$. Strictly concave functions have unique maximizers, so the $\vmax$-greedy policy is unique in $\Sigma$. Continuity now follows from {prf:ref}`l-exgre`. ◻ ``` ```{exercise} :label: ex-conos2 Consider the optimal savings model in the setting of {prf:ref}`ex-conos`. Suppose that, in addition, $u$ is strictly concave.
Show that, in this setting, the optimal policy is unique and continuous. ``` ```{solution} ex-conos2 We verify {prf:ref}`a-convexfcs`. By {prf:ref}`ex-conos`, {prf:ref}`a-convexfc` holds. For strict concavity, fix $w \in \RR_+$ and concave $v$ on $\RR_+$. Since $u$ is strictly concave, $c \mapsto u(c)$ is strictly concave on $[0, w]$. The map $c \mapsto \int v(R(w - c) + y) \phi(\diff y)$ is concave (as shown in {prf:ref}`ex-conos`). The sum of a strictly concave function and a concave function is strictly concave, so $c \mapsto B(w, c, v)$ is strictly concave on $\Gamma(w)$. {prf:ref}`p-convexfcs` now gives the result. ``` (sss-ces)= ### Digression on Certainty Equivalents Before continuing with the theory of RDPs, it will be helpful to review risk measures and certainty equivalents. These concepts are in one-to-one correspondence: we convert between them by flipping signs. Certainty equivalents can be understood as extensions of mathematical expectation that include attitudes towards risk. In later sections, we will tie the discussion of risk measures and certainty equivalents back into RDP theory and its applications. (The existence of parallel literatures on risk measures and certainty equivalents reflects the fact that researchers in finance and engineering often think about minimizing risk, while economists typically concern themselves with maximizing rewards.[^1] In this book, we tend to work with certainty equivalents, although the following discussion will allow readers to translate between the two.) Throughout the following discussion, the triple $(\Omega, \fF, \PP)$ is a probability space and $L_\infty \coloneq L_\infty(\Omega, \fF, \PP)$ is the set of essentially bounded random variables on $(\Omega, \fF, \PP)$; that is, all random variables $Z$ admitting an $N \in \NN$ with $|Z| \leq N$ $\PP$-a.s. In this setting, a **risk measure** is a map $\rR \colon L_\infty \to \RR$ satisfying 1. 
Monotonicity: If $Z, Z' \in L_\infty$ and $Z \leq Z'$ $\PP$-a.s., then $\rR(Z') \leq \rR(Z)$. 2. Cash invariance: $\rR(Z + a) = \rR(Z) - a$ for all $Z \in L_\infty$ and $a \in \RR$. A **certainty equivalent** is a map $\eE \colon L_\infty \to \RR$ satisfying 1. Monotonicity: If $Z, Z' \in L_\infty$ and $Z \leq Z'$ $\PP$-a.s., then $\eE(Z) \leq \eE(Z')$. 2. Cash invariance: $\eE(Z + a) = \eE(Z) + a$ for all $Z \in L_\infty$ and $a \in \RR$. (Note that the meaning of monotonicity and cash invariance changes from $\rR$ to $\eE$.) ```{exercise} :label: ex-fsign Prove the following: 1. A map $\eE \colon L_\infty \to \RR$ is a certainty equivalent if and only if $\rR = -\eE$ is a risk measure. 2. Any convex combination of two certainty equivalents is also a certainty equivalent. ``` ```{solution} ex-fsign The first claim follows directly from the definitions, so we prove the second. Let $\eE_0$ and $\eE_1$ be certainty equivalents, fix $\lambda \in [0, 1]$, and let $\hat{\eE} = \lambda \eE_0 + (1-\lambda)\eE_1$. Fix $a \in \RR$ and $Z \in L_\infty$. Since $\eE_0$ and $\eE_1$ are certainty equivalents, we have $$ \hat{\eE}(Z + a) = \lambda \eE_0(Z + a) + (1-\lambda) \eE_1(Z + a) = \hat{\eE}(Z) + a. $$ Hence $\hat{\eE}$ is cash invariant. Monotonicity of $\hat{\eE}$ follows because $\lambda$ and $1 - \lambda$ are nonnegative and both $\eE_0$ and $\eE_1$ are monotone. Hence $\hat{\eE}$ is a certainty equivalent. ``` We now define several significant subclasses of risk measures and state the corresponding properties of the associated certainty equivalent $\eE = -\rR$. A risk measure $\rR$ is called **convex** if $$ \rR(\lambda Z + (1 - \lambda) Z') \leq \lambda\, \rR(Z) + (1 - \lambda)\, \rR(Z') $$ for all $Z, Z' \in L_\infty$ and $\lambda \in [0, 1]$. Obviously, $\rR$ is convex if and only if its negation $\eE \coloneq -\rR$ is **concave**: $$ \eE(\lambda Z + (1 - \lambda) Z') \geq \lambda\, \eE(Z) + (1 - \lambda)\, \eE(Z'). $$ Concavity of $\eE$ captures the idea that diversification is weakly preferred. A risk measure $\rR$ is called **coherent** if it is convex and **positively homogeneous**, meaning that $\rR(\lambda Z) = \lambda\, \rR(Z)$ for all $Z \in L_\infty$ and $\lambda > 0$.
The certainty equivalent $\eE = -\rR$ is then concave and positively homogeneous. Together, these two properties imply that $\eE$ is superadditive: $\eE(Z + Z') \geq \eE(Z) + \eE(Z')$ for all $Z, Z' \in L_\infty$. #### Duality There is a dual representation theorem for convex risk measures, originally due to {cite}`follmer2002convex`, that helps us interpret and manipulate these functionals. Here we restate their result in terms of concave certainty equivalents. In doing so, we will restrict attention to the **law invariant case**; that is, the case where $\eE(Z)$ depends only on the distribution of $Z$ for all $Z \in L_\infty$.[^2] ```{prf:theorem} :label: thm-dual Let $\eE$ be a concave certainty equivalent. If $\eE$ is law invariant, then there exists a penalty function $\alpha \colon \pP(\RR) \to [0, \infty]$ such that, for all $Z \in L_\infty$, $$ \eE(Z) = \inf_{Q \ll P_Z} \left\{ \EE_Q[Z] + \alpha(Q) \right\}. $$ (eq-cedual) ``` In the theorem statement, $P_Z$ is $\PP \circ Z^{-1}$, the distribution of $Z$, and the infimum is over all $Q \in \pP(\RR)$ such that $Q$ is absolutely continuous with respect to $P_Z$. The constraint $Q \ll P_Z$ means that if $P_Z$ says an event is impossible, then $Q$ must also say it's impossible. One way to interpret {eq}`eq-cedual` is in terms of an adversarial agent who chooses $Q$ to minimize the expected return $\EE_Q[Z]$, while being constrained by a penalty term $\alpha(Q)$. This is the **robust optimization** point of view: the agent makes choices that are robust to variations by a real or fictitious adversary. The penalty function $\alpha$ controls how far the adversary is able to deviate from the reference model $P_Z$. From this perspective, the absolute continuity condition $Q \ll P_Z$ means that the adversary is allowed to disagree about how likely different scenarios are, but not about which scenarios are conceivable. A second interpretation involves **ambiguity**. The agent does not know the true model and $P_Z$ is only a reference point.
The agent's cautious reasoning forces him to entertain a range of plausible models. The penalty term $\alpha(Q)$ reflects how implausible $Q$ is relative to $P_Z$. The absolute continuity constraint defines what the agent considers to be possible: the set of scenarios that could actually occur. ```{prf:remark} :label: r-dp The statement of {prf:ref}`thm-dual` usually includes a "Fatou condition," which is related to continuity. We omit this condition because we have assumed law invariance, which, combined with concavity, is enough for the conclusions of {prf:ref}`thm-dual`. See {cite}`JouiniSchachermayerTouzi2006`. ``` (sss-egsces)= #### Examples of Certainty Equivalents Let's look at examples, focusing primarily on certainty equivalents. Throughout this discussion, $Z$ is an element of $L_\infty$, $P_Z$ is its distribution, $F_Z$ is its cdf, and $F_Z^{-1}$ is the inverse cdf. The simplest certainty equivalent is mathematical expectation: $\eE(Z) = \EE[Z]$. This corresponds to risk neutrality: the agent is indifferent between any random variable and its mean. The other extreme is the **pessimistic certainty equivalent** $$ \eE_p(Z) \coloneq \operatorname{ess\,inf} Z = \sup \setntn{a \in \RR}{\PP\{Z < a\} = 0}. $$ We can think of $\eE_p(Z)$ as the left-hand end point of the support of $Z$. For the pessimistic certainty equivalent, the dual representation {eq}`eq-cedual` becomes $$ \eE_p(Z) = \inf_{Q \ll P_Z} \EE_Q[Z], $$ which corresponds to a zero penalty. Both of these examples are coherent. Another example is the **$\alpha$-quantile certainty equivalent** $$ \qQ_\alpha(Z) = F_Z^{-1}(\alpha), \qquad \alpha \in (0, 1). $$ The value $\qQ_\alpha(Z)$ is the $\alpha$-quantile of $Z$. The corresponding risk measure $\rR_\alpha = - \qQ_\alpha$ is just **$\alpha$-level value-at-risk (VaR)**. VaR suffers from some pathologies. For example, VaR is not convex, and hence can increase under diversification.
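The failure of convexity can be seen in a small discrete example. The sketch below is illustrative only: the two-loan setup, the default probability $0.04$, the level $\alpha = 0.05$, and the helper `quantile` are assumptions made here rather than anything taken from the text. It computes $\qQ_\alpha$ for a single loan and for an equally weighted portfolio of two independent loans.

```python
import numpy as np

def quantile(values, probs, alpha):
    """F^{-1}(alpha) = inf{x : F(x) >= alpha} for a finite distribution."""
    values = np.asarray(values, dtype=float)
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(values)
    cdf = np.cumsum(probs[order])
    # first index at which the cdf reaches alpha (clipped for rounding safety)
    idx = min(np.searchsorted(cdf, alpha, side="left"), len(cdf) - 1)
    return values[order][idx]

alpha = 0.05
loss, p_default = -100.0, 0.04   # hypothetical loan: lose 100 with prob. 0.04

# A single loan Z_i takes the value -100 or 0
single_vals = [loss, 0.0]
single_probs = [p_default, 1 - p_default]

# Equally weighted portfolio (Z_1 + Z_2) / 2 of two independent loans
pairs = [(a, b) for a in range(2) for b in range(2)]
port_vals = [(single_vals[a] + single_vals[b]) / 2 for a, b in pairs]
port_probs = [single_probs[a] * single_probs[b] for a, b in pairs]

q_single = quantile(single_vals, single_probs, alpha)
q_port = quantile(port_vals, port_probs, alpha)
print(q_single, q_port)
```

Each loan has $\qQ_\alpha(Z_i) = 0$ because default has probability $0.04 < \alpha$, while at least one default occurs with probability $1 - 0.96^2 = 0.0784 > \alpha$, so the portfolio quantile is $-50$. Thus $\qQ_\alpha(\tfrac{1}{2} Z_1 + \tfrac{1}{2} Z_2) < \tfrac{1}{2} \qQ_\alpha(Z_1) + \tfrac{1}{2} \qQ_\alpha(Z_2)$: concavity of $\qQ_\alpha$ fails and the associated VaR rises under diversification.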
These deficiencies have motivated the introduction of **conditional value at risk (CVaR)** (also called average value at risk, or expected shortfall), defined as $$ \rR_\alpha(Z) = -\frac{1}{\alpha} \int_0^\alpha F_Z^{-1}(t)\, dt, \qquad \alpha \in (0, 1]. $$ The corresponding **CVaR certainty equivalent** is $\eE_\alpha(Z) = - \rR_\alpha(Z)$, interpreted as the mean of the $\alpha$-tail of the distribution of $Z$, that is, the average over the worst $\alpha$-fraction of outcomes. The CVaR certainty equivalent is coherent and admits the dual representation $$ \eE_\alpha(Z) = \inf\, \left\{ \EE_Q[Z] \,:\, Q \ll P_Z,\; \frac{\diff Q}{\diff P_Z} \leq \frac{1}{\alpha} \right\}. $$ The parameter $\alpha$ interpolates between risk neutrality and the pessimistic case: $$ \alpha = 1 \implies \eE_1(Z) = \EE[Z], \qquad \alpha \to 0 \implies \eE_\alpha(Z) \to \operatorname{ess\,inf} Z. $$ Another important case, already discussed in {prf:ref}`c-egs`, is the **entropic certainty equivalent** $$ \eE_\gamma(Z) = -\frac{1}{\gamma} \ln \EE\, \left[\exp(-\gamma Z)\right] \qquad (\gamma > 0). $$ (eq-entcert) This certainty equivalent is concave but not coherent. The dual representation is $$ \eE_\gamma(Z) = \inf_{Q \ll P_Z} \left\{ \EE_Q[Z] + \frac{1}{\gamma}\, D_{\mathrm{KL}}(Q \,\|\, P_Z) \right\}, $$ (eq-entcertdual) where $D_{\mathrm{KL}}(Q \,\|\, P_Z) = \EE_Q\!\left[\log \frac{\diff Q}{\diff P_Z}\right]$ is the Kullback--Leibler divergence. The parameter $\gamma$ controls the degree of risk aversion and interpolates between risk neutrality and the worst case. In particular, $\gamma \to 0$ implies $\eE_\gamma(Z) \to \EE[Z]$, while $\gamma \to \infty$ implies $\eE_\gamma(Z) \to \operatorname{ess\,inf}\, Z$. It is worth noting here that the Kreps--Porteus expectation $\kK(Z) \coloneq (\EE[Z^{1-\gamma}])^{1/(1-\gamma)}$ is *not* a certainty equivalent, at least according to our definition. While monotonicity holds, $\kK$ fails cash invariance.
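The following sketch evaluates the CVaR and entropic certainty equivalents and the Kreps--Porteus expectation on a two-point distribution and checks cash invariance numerically. The payoff values, probabilities, parameter values, and function names are illustrative assumptions rather than anything taken from the text.

```python
import numpy as np

z = np.array([1.0, 4.0])   # outcomes of a hypothetical positive payoff Z
p = np.array([0.5, 0.5])   # their probabilities

def cvar_ce(z, p, alpha):
    """CVaR certainty equivalent: (1/alpha) * integral of F^{-1} over (0, alpha)."""
    order = np.argsort(z)
    z, p = z[order], p[order]
    cdf = np.cumsum(p)
    # probability mass of each outcome that lies in the lower alpha-tail
    mass = np.minimum(cdf, alpha) - np.minimum(cdf - p, alpha)
    return (mass @ z) / alpha

def entropic(z, p, gamma):
    """Entropic certainty equivalent -(1/gamma) * ln E[exp(-gamma Z)]."""
    return -np.log(p @ np.exp(-gamma * z)) / gamma

def kreps_porteus(z, p, gamma):
    """Kreps--Porteus expectation (E[Z^(1-gamma)])^(1/(1-gamma))."""
    return (p @ z ** (1 - gamma)) ** (1 / (1 - gamma))

shift = 5.0   # the constant a in the cash invariance condition

# Each line prints the functional at Z + a, followed by its value at Z plus a
print("CVaR:         ", cvar_ce(z + shift, p, 0.75), cvar_ce(z, p, 0.75) + shift)
print("entropic:     ", entropic(z + shift, p, 1.0), entropic(z, p, 1.0) + shift)
print("Kreps-Porteus:", kreps_porteus(z + shift, p, 0.5),
      kreps_porteus(z, p, 0.5) + shift)

# Limiting behavior of the entropic certainty equivalent
print("small gamma:", entropic(z, p, 1e-6), "mean:", p @ z)
print("large gamma:", entropic(z, p, 50.0), "ess inf:", z.min())
```

In this sketch the CVaR and entropic certainty equivalents shift one-for-one with the added constant, whereas $\kK(Z + 5) \approx 7.42$ differs from $\kK(Z) + 5 = 7.25$ when $\gamma = 0.5$.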
This is, in essence, why Bellman and policy operators based around Epstein--Zin preferences often fail to be contractions. We discuss Kreps--Porteus expectations again in {ref}`ss-kpvsrs`. (sss-cecon)= #### Continuity Let $\eE$ be a certainty equivalent on $L_\infty = L_\infty(\Omega, \fF, \PP)$. We call $\eE$ **continuous** if, given any uniformly bounded sequence $(Z_n)_{n \in \NN}$ in $L_\infty$ and any $Z \in L_\infty$, we have $$ \eE(Z_n) \to \eE(Z) \quad \text{whenever } \; Z_n \to Z \; \text{ } \PP \text{-almost surely}. $$ In the definition above, uniform boundedness means that there exists an $M < \infty$ with $|Z_n| \leq M$ almost surely for all $n$. Continuity will be useful for the optimality theory developed below. ```{prf:example} The risk-neutral certainty equivalent $\eE = \EE$ is continuous. This follows immediately from the dominated convergence theorem ({prf:ref}`t-dct`). ``` ```{prf:example} :label: eg-cvarc The CVaR certainty equivalent $\eE_\alpha$ is continuous. ``` ```{prf:proof} Fix a sequence $(Z_n)$ and $Z$ in $L_\infty$ with $Z_n \to Z$ $\PP$-a.s. and $|Z_n| \leq M$ for all $n$. Let $F_n$ and $F$ be the respective cdfs. Since almost sure convergence implies convergence in distribution, $F_n(x) \to F(x)$ at every continuity point of $F$. By the quantile convergence theorem, $F_n^{-1}(t) \to F^{-1}(t)$ at every continuity point of $F^{-1}$, which excludes at most countably many $t$. Since $|F_n^{-1}(t)| \leq M$ for all $t$, the dominated convergence theorem gives $$ \eE_\alpha(Z_n) = \frac{1}{\alpha} \int_0^\alpha F_n^{-1}(t)\, dt \;\to\; \frac{1}{\alpha} \int_0^\alpha F^{-1}(t)\, dt = \eE_\alpha(Z), $$ confirming continuity. ◻ ``` ```{exercise} :label: ex-entc Let $\eE_\gamma$ be the entropic certainty equivalent with parameter $\gamma > 0$. Show that $\eE_\gamma$ is continuous. ``` ```{solution} ex-entc Fix a sequence $(Z_n)$ and $Z$ in $L_\infty$ with $Z_n \to Z$ $\PP$-a.s. and $|Z_n| \leq M$ for all $n$. 
By continuity of $\exp$ we have $\exp(-\gamma Z_n) \to \exp(-\gamma Z)$ $\PP$-a.s. Now $|\exp(-\gamma Z_n)| \leq \exp(\gamma M)$ and bounded convergence imply $\EE[\exp(-\gamma Z_n)] \to \EE[\exp(-\gamma Z)]$. Since the limit is strictly positive, continuity of $\ln$ gives $\eE_\gamma(Z_n) \to \eE_\gamma(Z)$. ``` (ss-mpdsces)= ### MDPs with Certainty Equivalents We first set up a standard MDP framework with an aggregator based on mathematical expectation and then replace the expectation with a general certainty equivalent, showing that the fundamental optimality results carry over when the certainty equivalent is continuous. (sss-mdpframe)= #### MDP Framework We begin by setting up a basic MDP framework that can then be adapted to add risk preferences. To this end, let $\Xsf$ and $\Asf$ be arbitrary metric spaces and let $\Gamma$ be a nonempty correspondence from $\Xsf$ to $\Asf$. Let $\Gsf = \setntn{(x,a)\in \Xsf \times \Asf}{a \in \Gamma(x)}$. Consider an RDP with feasible correspondence $\Gamma$ and aggregator $$ B(x, a, v) = r(x,a) + \beta \, \EE [ v(f(x,a,\xi)) ]. $$ (eq-mdpfvv) Here $\xi$ is a random element that takes values in a metric space $\Zsf$ and has distribution $\phi$, $f$ is a measurable function from $\Gsf \times \Zsf$ to $\Xsf$, and $r$ is a measurable function from $\Gsf$ to $\RR$. The discount factor $\beta$ obeys $0 \leq \beta < 1$. The value space is set to $b\Xsf$. ```{prf:assumption} :label: a-mdpcea The primitives described above obey the following conditions: 1. The correspondence $\Gamma$ is compact-valued and continuous. 2. The reward function $r$ is bounded and continuous. 3. The map $(x,a) \mapsto f(x,a,z)$ is continuous on $\Gsf$ for all $z \in \Zsf$. ``` We can treat optimality and convergence of algorithms for the associated RDP $(\Gamma, b\Xsf, B)$ using {prf:ref}`p-mdpwf` from {prf:ref}`c-ldps`. Here, however, we'll extend the model to use arbitrary certainty equivalents in place of $\EE$. Results for $\EE$ will be a special case.
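As a rough computational sketch of what this replacement involves, the code below implements a Bellman operator with a pluggable certainty equivalent for a small finite model. The model itself (ten states, reward $\sqrt{a}$, a three-point shock, $\beta = 0.9$) and all function names are hypothetical and chosen only for illustration; working with a finite state space sidesteps the continuity conditions in {prf:ref}`a-mdpcea`.

```python
import numpy as np

n, beta = 10, 0.9                      # hypothetical: states 0..9, discount factor
shocks = np.array([0, 1, 2])           # support of the shock xi
probs = np.array([0.3, 0.4, 0.3])      # its distribution phi

def expectation(z, p):
    """Risk-neutral certainty equivalent."""
    return p @ z

def entropic(z, p, gamma=2.0):
    """Entropic certainty equivalent with risk-aversion parameter gamma."""
    return -np.log(p @ np.exp(-gamma * z)) / gamma

def bellman(v, ce):
    """(Tv)(x) = max over a in Gamma(x) of r(x, a) + beta * ce[v(f(x, a, xi))]."""
    Tv = np.empty(n)
    for x in range(n):
        candidates = []
        for a in range(x + 1):                         # Gamma(x) = {0, ..., x}
            nxt = np.minimum(x - a + shocks, n - 1)    # f(x, a, xi), capped at n - 1
            candidates.append(np.sqrt(a) + beta * ce(v[nxt], probs))
        Tv[x] = max(candidates)
    return Tv

def vfi(ce, tol=1e-10, max_iter=10_000):
    """Iterate T from v = 0 until successive iterates are within tol."""
    v = np.zeros(n)
    for _ in range(max_iter):
        v_new = bellman(v, ce)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v_new

# Monotonicity plus cash invariance make T a contraction of modulus beta
rng = np.random.default_rng(0)
v, w = rng.normal(size=n), rng.normal(size=n)
for ce in (expectation, entropic):
    gap = np.max(np.abs(bellman(v, ce) - bellman(w, ce)))
    print(ce.__name__, gap <= beta * np.max(np.abs(v - w)) + 1e-12)

v_neutral, v_averse = vfi(expectation), vfi(entropic)
print(np.all(v_averse <= v_neutral + 1e-8))
```

The only place the certainty equivalent enters is the argument `ce`, so swapping risk attitudes leaves the rest of the algorithm unchanged. Since the entropic certainty equivalent never exceeds the mean (by Jensen's inequality), the risk-averse value function in this sketch lies weakly below the risk-neutral one.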
The next section gives details. (sss-ee)= #### Certainty Equivalents for MDPs Let's consider replacing the expectation in {eq}`eq-mdpfvv`, which corresponds to risk-neutrality over continuation values, with an arbitrary certainty equivalent $\eE$. The aggregator is now $$ B_{\eE}(x, a, v) \coloneq r(x,a) + \beta \, \eE [v(f(x,a,\xi))]. $$ (eq-mdpfce) Other primitives are left unchanged. The value space continues to be $b\Xsf$. We assume throughout that $(x,a) \mapsto \eE [v(f(x,a,\xi))]$ is measurable on $\Gsf$. We consider the RDP $(\Gamma, b\Xsf, B_{\eE})$. ```{prf:proposition} :label: p-cemdps If $\eE$ is continuous, then, for the RDP $(\Gamma, b\Xsf, B_{\eE})$, 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $bc \Xsf$, and 3. VFI converges geometrically on $bc \Xsf$. ``` ```{prf:proof} We verify the conditions of {prf:ref}`p-gsdpc20`. By {prf:ref}`a-mdpcea`, $\Gamma$ is compact-valued and continuous. Measurability of $(x, a) \mapsto B_{\eE}(x, a, v)$ on $\Gsf$ holds for each $v \in V$ by our standing assumption. Monotonicity of $B_{\eE}$ in $v$ follows from monotonicity of $\eE$. Since $r$ is bounded ({prf:ref}`a-mdpcea`) and $\eE$ maps bounded random variables to finite values, the map $(x, a) \mapsto B_{\eE}(x, a, v)$ is bounded for each $v \in V$. Moreover, cash invariance of $\eE$ gives $$ B_{\eE}(x, a, v + \kappa) = B_{\eE}(x, a, v) + \beta \kappa $$ for all $\kappa \in \RR_+$, so {prf:ref}`a-uca20` holds with $\lambda = \beta$. For {prf:ref}`a-wuca0`, fix $v \in bc\Xsf$ and let $(x_n, a_n) \to (x, a)$ in $\Gsf$. By {prf:ref}`a-mdpcea`, $f(x_n, a_n, z) \to f(x, a, z)$ for all $z \in \Zsf$, so continuity of $v$ gives $v(f(x_n, a_n, \xi)) \to v(f(x, a, \xi))$ almost surely. Since $|v| \leq \|v\|_\infty$, continuity of $\eE$ gives $\eE\, v(f(x_n, a_n, \xi)) \to \eE\, v(f(x, a, \xi))$. Combined with continuity of $r$, we conclude that $(x, a) \mapsto B_{\eE}(x, a, v)$ is continuous on $\Gsf$. 
The claims now follow from {prf:ref}`p-gsdpc20`. ◻ ``` ## Applications We now apply the RDP optimality theory developed in {ref}`s-rdpo` to a range of dynamic programming problems. In {ref}`sss-osapp` we revisit the optimal savings problem, this time with utility unbounded above, and verify the conditions of our weighted contraction results. We then study irreversible investment under risk neutrality ({ref}`sss-abm`), risk aversion ({ref}`sss-iririsk`), and ambiguity aversion ({ref}`ss-rcfp`). (sss-osapp)= ### Optimal Savings with Utility Unbounded Above Here we again consider the optimal savings model from {ref}`s-og`, but without the boundedness restriction on $u$. In particular, we assume that $u$ is continuous, nonnegative, and increasing, and that $$ \ell(w) \coloneq \EE \sum_{t \geq 0} \delta^t u(\hat W_t) < \infty $$ (eq-wbua) for some $\delta \in (\beta, 1)$. Here $(\hat W_t)$ is defined recursively via $\hat W_{t+1} = R \hat W_t + Y_{t+1}$ with $\hat W_0 = w$ and $(Y_t) \iidsim \phi$. We can think of $(\hat W_t)$ as an upper bound process for wealth, achieved when consumption is always zero. As before, $\phi$ is a continuous density on $\RR_+$. Our aim is to provide conditions under which the conclusions of {prf:ref}`p-gsdpc2` apply. We set $V = b_\ell \RR_+$, $\Gamma(w) = [0, w]$, and $$ B(w, c, v) = u(c) + \beta \int v(R(w - c) + y) \phi(\diff y). $$ ```{exercise} :label: ex-osu1u2 Show that (U1) and (U2) hold with $\lambda = \beta/\delta < 1$. ``` ```{solution} ex-osu1u2 For (U1), we have $$ B(w, c, v + \kappa \ell) - B(w, c, v) = \beta \kappa \int \ell(R(w - c) + y) \phi(\diff y) \leq \beta \kappa \int \ell(Rw + y) \phi(\diff y), $$ where the inequality uses $c \geq 0$ and the fact that $\ell$ is increasing (which follows from its definition as the expected sum of nonnegative terms along increasing sample paths). 
By the recursive structure of $\ell$, we have $$ \ell(w) = u(w) + \delta \int \ell(Rw + y) \phi(\diff y), $$ so $\int \ell(Rw + y) \phi(\diff y) \leq \ell(w)/\delta$. Hence $B(w, c, v + \kappa \ell) - B(w, c, v) \leq (\beta/\delta) \kappa \ell(w)$, confirming (U1). For (U2), fix $v \in V$. Since $u \geq 0$ and $c \leq w$, we have $u(c) \leq u(w) \leq \ell(w)$. Also, $|v| \leq \|v\|_\ell \ell$, so $$ |B(w, c, v)| \leq u(c) + \beta \|v\|_\ell \int \ell(Rw + y) \phi(\diff y) \leq \ell(w) + (\beta/\delta) \|v\|_\ell \ell(w), $$ which confirms (U2). ``` ```{exercise} :label: ex-oscsu Show that {prf:ref}`a-suca` holds for this model. ``` ```{solution} ex-oscsu Fix $v \in b_\ell \RR_+$ and let $(w_n, c_n) \to (w, c)$ in $\Gsf$. Since $u$ is continuous, it suffices to show that $\int v(R(w_n - c_n) + y) \phi(y) \diff y \to \int v(R(w - c) + y) \phi(y) \diff y$. Set $s_n = R(w_n - c_n)$ and $s = R(w - c)$, so $s_n \to s$. The change of variable $x' = s_n + y$ gives $\int v(s_n + y) \phi(y) \diff y = \int v(x') \phi(x' - s_n) \diff x'$, and similarly with $s$ in place of $s_n$ (cf. {prf:ref}`eg-srssf`). Fix $\epsilon > 0$ and set $\bar s = \sup_n s_n$. Since $\int \ell(\bar s + y) \phi(y) \diff y < \infty$ (by the definition of $\ell$), we can choose $M$ so that $\int_{M - \bar s}^\infty \ell(\bar s + y) \phi(y) \diff y < \epsilon$. Using $|v| \leq \|v\|_\ell \ell$ and $\ell$ increasing, the integral over $(M, \infty)$ is bounded by $\|v\|_\ell \epsilon$ for all $n$, and the same bound holds with $s$ in place of $s_n$. On $[0, M]$, the function $v$ is bounded by $\|v\|_\ell \max_{[0, M]} \ell < \infty$ (since $\ell$ is continuous), and Scheffé's lemma together with continuity of $\phi$ gives $\int_0^M |v(x')| \cdot |\phi(x' - s_n) - \phi(x' - s)| \diff x' \to 0$. Combining gives the result. ``` The results above imply that {prf:ref}`a-suca` and {prf:ref}`a-uca2` both hold. As a result, the conclusions of {prf:ref}`p-gsdpc2` apply. 
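The sketch below illustrates VFI for this model on a truncated wealth grid. Everything specific in it is an assumption made for illustration: the utility $u(c) = \sqrt{c}$ (continuous, nonnegative, and increasing), lognormal shocks approximated by a fixed Monte Carlo sample, the parameter values, the consumption-share grid, and linear interpolation with the value held constant beyond the upper grid point. Truncation makes the discretized problem bounded, so the code does not exercise the weighted norm itself.

```python
import numpy as np

beta, R = 0.9, 1.02                        # hypothetical parameters
w_grid = np.linspace(0.0, 10.0, 50)        # truncated wealth grid
shares = np.linspace(0.0, 1.0, 41)         # consumption as a share of wealth
y = np.exp(0.1 * np.random.default_rng(0).standard_normal(100))  # draws from phi

def T(v):
    """Discretized Bellman operator; v holds values on w_grid."""
    Tv = np.empty_like(v)
    for i, w in enumerate(w_grid):
        c = shares * w                                   # candidate consumption
        next_w = R * (w - c)[:, None] + y[None, :]       # shape (len(c), len(y))
        # linear interpolation of v, constant beyond the last grid point
        v_next = np.interp(next_w.ravel(), w_grid, v).reshape(next_w.shape)
        cont = v_next.mean(axis=1)                       # Monte Carlo integral
        Tv[i] = np.max(np.sqrt(c) + beta * cont)
    return Tv

def vfi(tol=1e-6, max_iter=1_000):
    """Iterate T from v = 0; return the final iterate and the iteration count."""
    v = np.zeros_like(w_grid)
    for k in range(max_iter):
        v_new = T(v)
        error = np.max(np.abs(v_new - v))
        v = v_new
        if error < tol:
            break
    return v, k + 1

v_star, n_iter = vfi()
print(n_iter, np.max(np.abs(T(v_star) - v_star)))
```

Because consumption is parametrized by shares of wealth, the discretized operator maps increasing functions to increasing functions, in line with {prf:ref}`p-vmon`. These checks concern only the truncated, discretized operator; the exact statements for the original model are those delivered by {prf:ref}`p-gsdpc2`.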
For example, the value function $\vmax$ exists, is an element of $b_\ell c \RR_+$, and satisfies $$ \vmax(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int \vmax(R(w - c) + y) \phi(\diff y) \right\} $$ for all $w \geq 0$. Moreover, VFI, OPI, and HPI all converge. In general, OPI and VFI are the easiest to implement. {numref}`f-vfi_vs_opi` illustrates the runtime of OPI as a function of $m$ for this model with CRRA utility $u(c) = (c^{1-\gamma} - 1)/(1-\gamma)$ and $\gamma = 0.5$. The runtime of VFI is shown as a horizontal line. Since VFI is the special case of OPI with $m=1$, the leftmost point of the OPI curve coincides with the VFI runtime. The minimum is attained near $m = 40$, where OPI runs roughly six times faster than VFI. Runtime then rises linearly in $m$. The key message is that OPI dominates VFI over a wide range of $m$. ```{figure} figures/os_vfi_vs_opi.pdf :name: f-vfi_vs_opi Runtime of OPI vs. VFI for the optimal savings model ``` (ss-irri)= ### Irreversible Investment We begin with the risk-neutral case in {ref}`sss-abm`, where standard contraction arguments apply. In {ref}`sss-iririsk` we introduce risk aversion by replacing mathematical expectation with a certainty equivalent, using the framework developed in {ref}`sss-ces`. (sss-abm)= #### The Risk-Neutral Case Let's begin with a canonical firm problem with irreversible investment, where the Bellman equation is $$ v(k, z) = \max_{i \in \Gamma(k, z)} \left\{ f(k,z) - i + \beta \int v(i + (1-\delta) k, g(z, \xi)) \phi(\diff \xi) \right\}. $$ Here $k \in \RR_+$ is capital stock, $i \in \RR_+$ is investment, $\beta \in (0,1)$ is the discount factor, $\delta \in (0,1)$ is a depreciation rate, $f$ is a production function, and $z \in \RR^m$ is an exogenous state vector. The feasible correspondence is defined by $\Gamma(k,z) = [0, \theta f(k, z)]$, where $\theta > 0$ is a borrowing constraint parameter. 
The state process evolves according to $$ Z_{t+1} = g(Z_t, \xi_{t+1}), \qquad (\xi_t)_{t \geq 0} \iidsim \phi. $$ Each $\xi_t$ takes values in a metric space $\Zsf$, the distribution $\phi$ is an element of $\pP(\Zsf)$, and $g \colon \RR^m \times \Zsf \to \RR^m$. ```{prf:assumption} :label: a-fia The function $f$ is bounded and $f$ and $g$ are both continuous. ``` Some comments are in order. First, to simplify the presentation, we've set the output price to unity, so that $f(k,z)$ is both output and revenue. This can easily be modified. Second, the boundedness restriction on $f$ is not automatically satisfied in many cases but greatly simplifies the analysis. In terms of quantitative applications, the cost is not large. For example, $f(k,z)=zk^\alpha$ can be replaced with $f(k,z) = \min\{zk^\alpha, y\}$ for large $y$. If $y$ is very large then the impact on choices and values is negligible. This firm problem is called an **irreversible investment model** because $i$ is required to be nonnegative. To frame this problem as an RDP, we set $V = b\Xsf$, where $\Xsf \coloneq \RR_+ \times \RR^m$, and $$ B(k,z,i, v) = f(k,z) - i + \beta \int v(i + (1-\delta) k, g(z, \xi)) \phi(\diff \xi) $$ The set of policies $\Sigma$ is all measurable maps from $\Xsf$ to $\RR_+$ satisfying the feasibility constraint. ```{prf:proposition} :label: p-fip If {prf:ref}`a-fia` holds, then $(\Gamma, V, B)$ is an RDP. Moreover, 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $bc \Xsf$, and 3. VFI converges geometrically on $bc \Xsf$. ``` ```{exercise} :label: ex-cfip Prove {prf:ref}`p-fip` by checking the conditions of {prf:ref}`p-cemdps`. ``` ```{solution} ex-cfip The model is an instance of the MDP framework of {ref}`ss-mpdsces` with $\eE = \EE$. 
Under {prf:ref}`a-fia`, $\Gamma$ is continuous, nonempty, and compact-valued ({prf:ref}`ex-ecco`), $r(k,z,i) = f(k,z) - i$ is bounded and continuous, and the transition $(k,z,i,\xi) \mapsto (i + (1-\delta) k, g(z,\xi))$ is continuous in $(k,z,i)$ for each $\xi$, so {prf:ref}`a-mdpcea` holds. Mathematical expectation is a continuous certainty equivalent (by the dominated convergence theorem), so the claims follow from {prf:ref}`p-cemdps`. ``` {numref}`f-fi_policy` shows an example computation, plotting optimal investment as a function of capital for two productivity levels. The plot also compares this irreversible case ($i \geq 0$) with a reversible benchmark where the firm can also disinvest ($i \geq -(1 - \delta)k$). At low capital, both firms invest identically. At high capital, with the same level of productivity, the reversible firm disinvests, while the irreversible firm sets $i$ to the lower bound constraint of zero. Importantly, for intermediate levels of capital, the reversible firm invests more aggressively, knowing that it can sell capital later if productivity drops. The irreversible firm faces a higher effective cost of capital due to the option value of waiting and targets a lower stock.[^3] {numref}`f-fi_sim` shows simulated paths for both firms facing identical productivity shocks. The reversible firm tracks productivity more closely, boosting capital during good times and shedding it during downturns. The irreversible firm adjusts sluggishly on the downside, since it can only reduce capital through depreciation. ```{figure} figures/firm_investment.pdf :name: f-fi_policy Investment policies: irreversible vs. 
reversible ``` ```{figure} figures/firm_investment_sim.pdf :name: f-fi_sim :width: 95% Simulated capital and investment paths under common shocks ``` (sss-iririsk)= #### Adding Risk Aversion As discussed in {ref}`ss-brn`, actual firm behavior often deviates from the risk-neutral benchmark attained under the assumption of frictionless complete markets. Here we extend the model from {ref}`sss-abm` in order to discuss this case. We swap the Bellman equation from {ref}`sss-abm` with $$ v(k, z) = \max_{0 \leq i \leq \theta f(k,z)} \left\{ f(k, z) - i + \beta \eE [v(i + (1-\delta) k, g(z, \xi))] \right\}. $$ (eq-ara) The only change is that mathematical expectation has been replaced with a certainty equivalent $\eE$. The term $\xi$ should be understood as a random element on $\Zsf$ with distribution $\phi$. We assume that the map $(k, z, i) \mapsto \eE [v(i + (1-\delta) k, g(z, \xi))]$ is measurable on $\Gsf$ for all $v \in b\Xsf$. We can set this model up as an RDP by taking $V = b\Xsf$, $\Gamma(k,z) = [0, \theta f(k,z)]$, and $$ B(k,z,i,v) = f(k,z) - i + \beta \eE [v(i + (1-\delta) k, g(z, \xi))]. $$ By the monotonicity of certainty equivalents, we have $B(k,z,i,v) \leq B(k,z,i,v')$ whenever $v \leq v'$. Also, by our measurability assumption and boundedness of $f$, the map sending $(k,z)$ into $B(k,z, \sigma(k,z), v)$ is bounded and measurable whenever $\sigma \in \Sigma$ and $v \in V$. This confirms that $(\Gamma, V, B)$ is an RDP. ```{prf:proposition} :label: p-fipr If {prf:ref}`a-fia` holds and $\eE$ is a continuous certainty equivalent, then, for the certainty equivalent variation of the firm investment model, 1. the fundamental optimality properties hold, 2. the value function $\vmax$ lies in $bc \Xsf$, and 3. VFI converges geometrically on $bc \Xsf$. 
``` ```{prf:proof} The model is an instance of the MDP framework of {ref}`ss-mpdsces`, with state $x = (k,z)$, action $a = i$, reward $r(k,z,i) = f(k,z) - i$, and transition $(k,z,i,\xi) \mapsto (i + (1-\delta) k, g(z,\xi))$. Under {prf:ref}`a-fia`, $\Gamma$ is continuous, nonempty, and compact-valued ({prf:ref}`ex-ecco`), $r$ is bounded and continuous, and the transition is continuous in $(k,z,i)$ for each $\xi$, so {prf:ref}`a-mdpcea` holds. The claims now follow from {prf:ref}`p-cemdps`. ◻ ``` ```{prf:example} {prf:ref}`p-fipr` applies whenever $\eE$ is the CVaR certainty equivalent or the entropic certainty equivalent. See {prf:ref}`eg-cvarc` and {prf:ref}`ex-entc` respectively. ``` #### Numerical Illustration To illustrate the impact of risk aversion on firm behavior, we now compute optimal policies under the specific certainty equivalent $$ \eE_\lambda(Z) \coloneq \lambda \EE[Z] - (1 - \lambda)\, \rR_\alpha(Z). $$ (eq-mqce) Here $\lambda \in (0,1)$ and $\rR_\alpha$ is value-at-risk at a fixed $\alpha$, which will be set to the industry standard value $0.05$. The certainty equivalent puts positive weight on both expected rewards and VaR, matching common management practice. Decreasing $\lambda$ increases concern for left tail events. The map $\eE_\lambda$ is a valid certainty equivalent: $-\rR_\alpha = \qQ_\alpha$, the $\alpha$-quantile certainty equivalent, and convex combinations of certainty equivalents are certainty equivalents ({prf:ref}`ex-fsign`). Note that $\eE_\lambda$ is not a continuous certainty equivalent, since VaR can jump under small perturbations. This means that {prf:ref}`p-fipr` does not directly apply. (We are treating VaR here because of its popularity in applications, rather than its attractive theoretical properties.) At the same time, when we implement the model on a machine, all numerical quantities are ultimately represented by a finite set of double-precision floats. 
In this sense, the model as actually computed is an RDP with finite action sets. By {prf:ref}`p-gsdpc2f0`, optimal policies exist and VFI converges. {numref}`f-fi_risk_policy` compares optimal investment policies for the risk-neutral firm (certainty equivalent $\EE$) and a risk-averse firm using $\eE_\lambda$.[^4] At both productivity levels, the risk-averse firm invests less aggressively. The intuition is that the quantile component of $\eE_\lambda$ penalizes downside outcomes in the continuation value, which lowers the perceived return to investment. Because the firm cannot reverse investment decisions, the option value of waiting is amplified by risk aversion. {numref}`f-fi_risk_sim` shows simulated paths for both firms facing identical productivity shocks. The risk-averse firm maintains a persistently lower capital stock and invests more cautiously throughout the sample. During periods of high productivity, the gap is especially pronounced: the risk-neutral firm boosts capital aggressively, while the risk-averse firm is restrained, anticipating the possibility of future downturns. ```{figure} figures/firm_investment_risk.pdf :name: f-fi_risk_policy Investment policies: risk-neutral vs. risk-averse ``` ```{figure} figures/firm_investment_risk_sim.pdf :name: f-fi_risk_sim :width: 95% Simulated paths: risk-neutral vs. risk-averse under common shocks ``` (ss-rcfp)= ### Firm Investment under Ambiguity In {ref}`sss-ambi` we discussed how concern for model misspecification can be incorporated into dynamic programs. Here we return to this topic in the context of irreversible investment. We first formulate the robust control version of the firm problem and then show how duality reduces it to the risk-sensitive case already covered by our theory.
#### Model Formulation To set up a robust control version of the investment problem we formulate the Bellman equation as $$ v(k, z) = \max_i \left\{ f(k, z) - i + \beta \inf_{\psi \ll \phi} \left[ \int v(k', g(z, x)) \psi(\diff x) + \frac{1}{\gamma} D_{\mathrm{KL}}(\psi \,\|\, \phi) \right] \right\}. $$ Here $k' \coloneq i + (1-\delta) k$ and the maximization is over $i$ with $0 \leq i \leq \theta f(k,z)$. In this case we interpret the problem as one where the manager does not fully trust the model: she fears misspecification in terms of the distribution $\phi$ of the shock sequence $(\xi_t)$ and hence lacks full confidence when calculating expectations of continuation values. Nonetheless, she is willing to treat $\phi$ as a reference model. She entertains distributions $\psi$ that deviate from $\phi$, provided that they don't assign positive probability to events that $\phi$ deems impossible. The penalty term $(1/\gamma) D_{\mathrm{KL}}(\psi \,\|\, \phi)$ can be thought of as a soft constraint. Models further from the reference point (in terms of KL divergence) are regarded as less plausible. If $\gamma$ is close to zero then the penalty term will be very large for even small deviations. Because the evaluation of the continuation value involves an infimum, only very small deviations are considered. This corresponds to greater trust in the model. Conversely, larger values of $\gamma$ indicate deeper distrust. #### Risk-Sensitive Formulation Using the duality in {eq}`eq-entcertdual`, we can rewrite the robust control Bellman equation for the firm problem as $$ v(k, z) = \max_i \left\{ f(k, z) - i - \frac{\beta}{\gamma} \ln \left[ \int \exp[-\gamma v(k', g(z, x))] \phi(\diff x) \right] \right\}. $$ This is a version of {eq}`eq-ara`, with $\eE$ set to the entropic certainty equivalent $\eE_\gamma$. Since $\eE_\gamma$ is continuous ({prf:ref}`ex-entc`), the conditions of {prf:ref}`p-fipr` hold under {prf:ref}`a-fia`. 
As a result, for this model, the fundamental optimality properties hold, the value function $\vmax$ lies in $bc \Xsf$, and VFI converges geometrically on $bc \Xsf$. The above discussion shows that we do not require any new machinery to tackle the somewhat intimidating robust control version of the investment problem: a duality-based approach allows us to switch to a setting where we already have all the results we need. {numref}`f-fi_entrop_policy` compares optimal investment policies under three specifications: the risk-neutral benchmark ($\gamma \to 0$) and the entropic certainty equivalent $\eE_\gamma$ at two levels of ambiguity aversion.[^5] As $\gamma$ increases---reflecting deeper distrust in the reference model---investment falls. The manager who entertains a wider range of alternative models, and who evaluates continuation values under the worst-case distribution of the KL-penalized problem, perceives a lower return to committing capital. The effect is monotone in $\gamma$: higher ambiguity aversion leads to uniformly less aggressive investment across all capital levels. {numref}`f-fi_entrop_sim` shows simulated paths for all three firms facing identical productivity shocks. The more ambiguity-averse firm maintains a persistently lower capital stock. During periods of high productivity, the differences are most visible: the risk-neutral firm ramps up capital, while the ambiguity-averse firm invests more cautiously, hedging against the possibility that the favorable conditions are less persistent than the reference model suggests.
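The duality between the robust control and risk-sensitive formulations can be checked numerically on a finite shock space. The following Python sketch is an illustration only and is not the code behind the figures: the five continuation values, the uniform reference distribution `phi`, the value $\gamma = 0.5$, and all function names are made-up assumptions. It compares the entropic certainty equivalent $-(1/\gamma) \ln \sum_j \phi_j \exp(-\gamma v_j)$ with the penalized objective $\sum_j \psi_j v_j + (1/\gamma) D_{\mathrm{KL}}(\psi \,\|\, \phi)$, evaluated at the exponentially tilted distribution $\psi^*_j \propto \phi_j \exp(-\gamma v_j)$ and at randomly drawn alternatives.

```python
import math
import random


def entropic_ce(values, probs, gamma):
    """-(1/gamma) * ln sum_j probs[j] * exp(-gamma * values[j]), computed stably."""
    m = min(values)  # shift so that the largest exponent is zero
    s = sum(p * math.exp(-gamma * (v - m)) for v, p in zip(values, probs))
    return m - math.log(s) / gamma


def kl(psi, phi):
    """KL divergence of psi from phi on a finite set (terms with psi_j = 0 vanish)."""
    return sum(q * math.log(q / p) for q, p in zip(psi, phi) if q > 0)


def robust_objective(values, psi, phi, gamma):
    """Expected value under psi plus the KL penalty with weight 1/gamma."""
    mean = sum(q * v for q, v in zip(psi, values))
    return mean + kl(psi, phi) / gamma


def tilted(values, phi, gamma):
    """Exponentially tilted distribution, proportional to phi_j * exp(-gamma * v_j)."""
    m = min(values)
    w = [p * math.exp(-gamma * (v - m)) for v, p in zip(values, phi)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(0)
values = [rng.uniform(0.0, 10.0) for _ in range(5)]  # hypothetical continuation values
phi = [0.2] * 5                                      # hypothetical reference model
gamma = 0.5                                          # hypothetical robustness parameter

ce = entropic_ce(values, phi, gamma)
dual = robust_objective(values, tilted(values, phi, gamma), phi, gamma)

# Gap between the penalized objective and the entropic value at random alternatives
gaps = []
for _ in range(1000):
    raw = [rng.random() + 1e-12 for _ in range(5)]
    total = sum(raw)
    psi = [x / total for x in raw]
    gaps.append(robust_objective(values, psi, phi, gamma) - ce)

print(f"entropic CE = {ce:.6f}, objective at tilted psi = {dual:.6f}")
print(f"smallest gap over random alternatives = {min(gaps):.6f}")
```

In this sketch the two printed values on the first line agree up to rounding, and no randomly drawn alternative attains a smaller objective, which is what the duality predicts on a finite shock space.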
```{figure} figures/firm_investment_entrop.pdf :name: f-fi_entrop_policy Investment policies under ambiguity aversion ``` ```{figure} figures/firm_investment_entrop_sim.pdf :name: f-fi_entrop_sim :width: 95% Simulated paths under common shocks with varying ambiguity aversion ``` (ss-kpvsrs)= ### Kreps--Porteus vs Risk-Sensitivity We return to the setup in {ref}`sss-mdpframe`, where $\Xsf$, $\Asf$ are arbitrary metric spaces, $\Gamma$ is a nonempty correspondence from $\Xsf$ to $\Asf$, and $B(x, a, v) = r(x,a) + \beta \EE [v(f(x,a,\xi))]$. We suppose, as in {prf:ref}`a-mdpcea`, that the correspondence $\Gamma$ is compact-valued and continuous, the reward function $r$ is bounded and continuous, and that the map $(x,a) \mapsto f(x,a,z)$ is continuous on $\Gsf$ for all $z \in \Zsf$. As discussed in {ref}`sss-mdpframe`, the fundamental optimality properties hold and VFI converges on $bc \Xsf$. In {ref}`sss-ee` we extended this basic MDP analysis to settings where the aggregator has the form $B(x, a, v) \coloneq r(x,a) + \beta \eE v(f(x,a,\xi))$. In {prf:ref}`p-cemdps` we showed that, when $\eE$ is continuous, the fundamental optimality properties hold, the value function $\vmax$ lies in $bc \Xsf$, and VFI converges geometrically on $bc \Xsf$. One special case is the entropic setting, where $$ B_{\rm RS}(x, a, v) = r(x,a) + \frac{\beta}{\theta} \ln \EE \left[ \exp(\theta \cdot v(f(x,a,\xi))) \right]. $$ (eq-brs) This model is called a **risk-sensitive MDP**. The modified expectation is an application of the entropic certainty equivalent {eq}`eq-entcert` with $\theta = -\gamma$. This modified expectation allows for parameterization of risk-sensitivity through $\theta$, with $\theta < 0$ injecting risk-aversion. Since the entropic certainty equivalent is continuous ({prf:ref}`ex-entc`), we can apply {prf:ref}`p-cemdps`. This tells us that all of the preceding convergence and optimality results apply. 
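To make the role of $\theta$ in {eq}`eq-brs` concrete, here is a small Python sketch that evaluates a sample analogue of $B_{\rm RS}$ at a single state-action pair. It is only an illustration under stated assumptions: the next-period values are made-up Gaussian draws standing in for $v(f(x,a,\xi))$, the reward and discount factor are arbitrary, and the helper names `log_mean_exp` and `b_rs` are hypothetical. The log of the sample mean is computed with a max-shift to avoid overflow.

```python
import math
import random


def log_mean_exp(xs):
    """Return ln of the sample mean of exp(x), shifting by max(xs) for stability."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))


def b_rs(reward, next_values, beta, theta):
    """Sample analogue of r + (beta / theta) * ln E exp(theta * v(X'))."""
    return reward + (beta / theta) * log_mean_exp([theta * v for v in next_values])


rng = random.Random(42)
next_values = [rng.gauss(5.0, 2.0) for _ in range(1000)]  # made-up draws of v(f(x, a, xi))
reward, beta = 1.0, 0.95                                  # arbitrary illustrative values

neutral = reward + beta * sum(next_values) / len(next_values)
averse = b_rs(reward, next_values, beta, theta=-1.0)      # theta < 0
seeking = b_rs(reward, next_values, beta, theta=1.0)      # theta > 0
near_neutral = b_rs(reward, next_values, beta, theta=1e-6)

print(f"theta = -1: {averse:.4f}")
print(f"risk-neutral: {neutral:.4f}")
print(f"theta = +1: {seeking:.4f}")
```

By Jensen's inequality, the $\theta < 0$ evaluation lies below the risk-neutral one and the $\theta > 0$ evaluation lies above it, while $\theta$ close to zero approximately recovers the risk-neutral aggregator.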
Another alternative is to replace the entropic certainty equivalent with Kreps--Porteus expectations, leading to aggregator $$ B_{\rm KP}(x, a, v) = r(x,a) + \beta \left\{ \EE [ v(f(x,a,\xi))^\nu ] \right\}^{1/\nu} \qquad (\nu \in \RR \text{ and } \nu \not= 0). $$ Here, in order to avoid running into trouble with exponents, we require that $r > 0$ and take the value space $V$ to be all functions in $b\Xsf$ that take only positive values. We discussed such an RDP in {ref}`sss-ezisrdp`. Note, however, that the Kreps--Porteus expectation fails cash invariance, and, as such, is not a certainty equivalent (as previously discussed in {ref}`sss-egsces`). As a result, the preceding optimality theory does not apply. In particular, we cannot appeal to {prf:ref}`p-cemdps`. Moreover, the aggregator $B_{\rm KP}$ is not generally contracting, in the sense that {prf:ref}`a-uca20` typically fails. Instead, the RDP $(\Gamma, V, B_{\rm KP})$ has to be treated with other methods, such as the convexity-based techniques used in {ref}`sss-ezwo`. There is, however, a *multiplicative* variation on the Kreps--Porteus RDP that is simple to analyze. The model is obtained by setting $$ B_{\rm MKP}(x, a, v) = r(x,a) \cdot \left\{ \EE [ v(f(x,a,\xi))^\nu ] \right\}^{\beta/\nu}, $$ while continuing to assume that $r$ is everywhere positive. The parameter $\beta$ is a discount factor for the multiplicative model, and is assumed to take values in $[0,1)$. We call $(\Gamma, V, B_{\rm MKP})$ the **multiplicative Kreps--Porteus RDP**. It turns out that the multiplicative Kreps--Porteus RDP and the additive risk-sensitive RDP $(\Gamma, b\Xsf, B_{\rm RS})$ are closely related---in fact they are isomorphic. To illustrate this, we take logs of the Bellman equation associated with the multiplicative Kreps--Porteus RDP, obtaining $$ \ln v(x) = \max_{a \in \Gamma(x)} \left\{ \ln r(x,a) + \frac{\beta}{\nu} \ln \EE [ v(f(x,a,\xi))^\nu ] \right\}. 
$$ Setting $\hat v = \ln v$ and $\hat r = \ln r$ yields $$ \hat v(x) = \max_{a \in \Gamma(x)} \left\{ \hat r(x,a) + \frac{\beta}{\nu} \ln \EE [ \exp( \nu \cdot \hat v(f(x,a,\xi))) ] \right\}. $$ This is the Bellman equation for {eq}`eq-brs`, after replacing $r$ with $\hat r$ and $\theta$ with $\nu$. ```{exercise} :label: ex-rskp Formalize the discussion above by showing that the corresponding RDPs are isomorphic. (Consider using {prf:ref}`ex-tcrdpsiso`.) ``` ```{solution} ex-rskp We apply {prf:ref}`ex-tcrdpsiso`. The multiplicative Kreps--Porteus RDP is $(\Gamma, V, B_{\rm MKP})$ with $V$ the bounded measurable functions from $\Xsf$ to $(0, \infty)$. Let $\phi = \ln \colon (0, \infty) \to \RR$, which is an order isomorphism. Then $\hat V = \{\phi \circ v : v \in V\} = b\Xsf$, and $(\Gamma, \hat V, B_{\rm RS})$ is the risk-sensitive RDP (after replacing $r$ with $\hat r = \ln r$ and $\theta$ with $\nu$). It remains to verify {eq}`eq-bbhat`: for all $v \in V$ and $(x, a) \in \Gsf$, we need $B_{\rm MKP}(x, a, v) = \phi^{-1}[B_{\rm RS}(x, a, \phi \circ v)]$. Setting $\hat v = \phi \circ v = \ln \circ \, v$, we have $$ \begin{aligned} \phi^{-1}[B_{\rm RS}(x, a, \hat v)] &= \exp\!\left[ \ln r(x,a) + \frac{\beta}{\nu} \ln \EE\, \exp(\nu \cdot \hat v(f(x,a,\xi))) \right] \\ &= r(x,a) \cdot \left\{ \EE\, \exp(\nu \cdot \ln v(f(x,a,\xi))) \right\}^{\beta/\nu} \\ &= r(x,a) \cdot \left\{ \EE [ v(f(x,a,\xi))^\nu ] \right\}^{\beta/\nu} = B_{\rm MKP}(x, a, v). \end{aligned} $$ The conclusion now follows from {prf:ref}`ex-tcrdpsiso`. ``` (s-cn_rdps)= ## Chapter Notes RDPs were introduced in Chapter 8 of {cite}`sargent2025dynamic` in settings where the state space is finite. The theory in this chapter extends that treatment to general state spaces. The study of contracting dynamic programs with abstract Bellman equations was begun by {cite}`denardo1967contraction`. Extensive discussion can be found in {cite}`bertsekas2022abstract`. 
(The terminology is slightly confusing: the abstract dynamic programs studied in {cite}`denardo1967contraction` and {cite}`bertsekas2022abstract` are similar to the RDPs studied in this chapter. For us, however, abstract dynamic programs are the more general objects introduced in {ref}`ss-defs`.) The optimality results in {ref}`ss-ucs0`--{ref}`ss-ucs` combine the RDP framework with the Blackwell contraction theory of {prf:ref}`c-adps3`. Our framework is similar to the contractive models in {cite}`bertsekas2022abstract`. The results in {ref}`ss-posol` on monotonicity, concavity, uniqueness, and continuity of solutions extend related results in {cite}`bauerle2018stochastic`. In {ref}`ss-ucs` we treated RDPs where rewards are bounded below and unbounded above. Related work can be found in {cite}`toda2023unbounded`. One approach to the reverse case---where rewards are bounded above and unbounded below---can be found in {cite}`ma2022unbounded`. Their idea is to rearrange the Bellman equation so that the transformed problem has bounded rewards, allowing standard contraction mapping arguments to be applied. The transformation is inspired by the Q-function used in reinforcement learning. Regarding Euler equations, early results along the lines of {prf:ref}`p-env` were established by {cite}`mirman1975optimal` and {cite}`benveniste1979differentiability`. The certainty equivalents and risk measures discussed in {ref}`sss-ces` are standard tools in mathematical finance and decision theory. The dual representation theorem for convex risk measures is due to {cite}`follmer2002convex`; see also {cite}`JouiniSchachermayerTouzi2006`. The quantile certainty equivalent and its risk measure counterpart (VaR) have been studied in dynamic programming environments by {cite}`de2019dynamic`, {cite}`de2022static`, {cite}`almeida2024quantile`, {cite}`de2025dynamic`, and {cite}`decastro2025comparison`, among others. 
The robust control formulation in {ref}`ss-rcfp` builds on {cite}`hansen2001robust` and {cite}`hansen2011robustness`, who developed the multiplier preference approach to robustness in dynamic economic models. The duality between robust control and risk-sensitive preferences, which we exploit to reduce the robust problem to the entropic certainty equivalent case, is a central theme of that literature. For a recent textbook treatment that uses the RDP framework, see {cite}`toda2024essential`. [^1]: If we were more cynical, we would add that existence of these two literatures also reflects the fact that researchers can publish more papers if they study the same thing under different names. [^2]: More formally, a certainty equivalent $\eE$ is called **law invariant** if there exists a functional $e \colon \pP(\RR) \to \RR$ such that $\eE(Z) = e(\PP \circ Z^{-1})$ for all $Z \in L_\infty$. [^3]: We set $f(k,z) = \min\{zk^\alpha, y\}$ with $y=1000$, $\alpha = 0.3$, $\beta = 0.95$, $\delta = 0.1$, and $\theta = 1.5$. The exogenous state follows $Z_t = \exp(X_t)$ where $(X_t)$ is AR(1) with persistence $\rho = 0.9$ and volatility $\nu = 0.2$, discretized via the Tauchen method. The value function is approximated on a grid via linear interpolation of $v(\cdot, z)$ for each $z$, and solved via VFI. [^4]: The parameterization is the same as for the risk-neutral case and $\lambda = 0.5$. The dynamic program is solved via VFI. [^5]: The parameterization is the same as for the risk-neutral case, with $\gamma = 0.05$ and $\gamma = 0.5$ for the entropic certainty equivalent. VFI converges geometrically in all cases. ======================================================================== ## Additional Applications (s-optstop)= ## Job Search We formulate a basic iid job search model as an ADP and establish optimality results. We then reduce dimensionality via continuation values and study parametric monotonicity of the reservation wage. 
Finally, we extend the model to allow correlated wage draws with a Markov structure. (ss-jobsearch)= ### The Basic Model We begin with the job search problem of {cite}`mccall1970`, a finite state version of which was discussed at length in Chapter 1 of {cite}`sargent2025dynamic`. Here we consider a general state version and allow wages and rewards to be unbounded. We describe the model, construct the associated ADP on $L_1(\phi)$, and verify that it is well-posed and regular. We also consider smaller value spaces that can be used when the offer distribution has bounded support. We then establish the fundamental optimality properties and convergence of the standard algorithms. (sss-jsdes)= #### Description In each period, an unemployed worker receives a wage offer $W_t$, drawn from some known distribution $\phi$. The worker can accept the current offer or wait until the following period and consider a new offer. ```{prf:assumption} :label: a-woff_fin_mean The offer sequence $(W_t)_{t \geq 0}$ is iid and takes values in a Borel set $\Wsf \subset \RR_+$. The distribution $\phi$ has finite mean. The worker discounts future payoffs via discount factor $\beta \in (0,1)$. ``` Let $L_1(\phi) \coloneq L_1(\Wsf, \bB, \phi)$ be all Borel measurable $f \colon \Wsf \to \RR$ with $\int |f| \diff \phi < \infty$. As usual, functions equal $\phi$-almost everywhere are identified and $f \leq g$ means that $\{f > g\}$ has measure zero under $\phi$. Let $\Sigma$ be all Borel measurable $\sigma \colon \Wsf \to \{0,1\}$. Each such $\sigma$ can be understood as a policy, mapping states to actions: If $\sigma(w)=1$, then the unemployed worker stops and accepts current offer $w$. If $\sigma(w)=0$, she continues. Consider first a two-period problem. In period zero, the worker can either accept observed wage offer $w_0 \sim \phi$ or continue to the next period, receiving unemployment compensation $c$ and random payoff $v(W_1)$. 
The offer $W_1$ is drawn from $\phi$ and $v$ is a given "terminal reward" function. Under policy $\sigma$, which maps the wage offer $w_0$ into an accept/reject decision, the expected present value of her payoff is $$ v_\sigma(w_0) \coloneq \sigma(w_0) w_0 + (1-\sigma(w_0)) \left[ c + \beta \int v(w') \phi(\diff w') \right]. $$ (eq-jstp) If $\sigma(w_0)=1$ the worker accepts and receives reward $w_0$. If $\sigma(w_0)=0$, then she rejects and receives expected continuation reward $c + \beta \int v(w') \phi(\diff w')$. Now we switch to an infinite horizon. Jobs are assumed to be permanent, so the present value of stopping with wage offer $w$ in hand is $$ \frac{w}{1-\beta} = w + \beta w + \beta^2 w + \cdots $$ (eq-jss) Fixing $\sigma \in \Sigma$, let $v_\sigma(w)$ be the lifetime value of following policy $\sigma$ given initial wage offer $w$. By analogy with {eq}`eq-jstp`, we expect $v_\sigma$ to obey the recursion $$ v_\sigma(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v_\sigma(w') \phi(\diff w') \right] \quad \text{for all } w \in \Wsf. $$ (eq-jsvsig) Compared to {eq}`eq-jstp`, we have taken the value of stopping from {eq}`eq-jss` and also replaced the terminal value function $v$ on the right-hand side of {eq}`eq-jstp` with $v_\sigma$. This is because we now work with an infinite horizon, so that {eq}`eq-jsvsig` becomes a recursion in $v_\sigma$. Continuing to hold $\sigma$ fixed, we introduce the policy operator $v \mapsto T_\sigma \, v$ via $$ (T_\sigma \, v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v(w') \phi(\diff w') \right]. $$ (eq-js_poloper) Since $L_1(\phi)$ is closed under linear operations, policies are Borel measurable, and {prf:ref}`a-woff_fin_mean` is in force, we have $T_\sigma \, v \in L_1(\phi)$ whenever $v \in L_1(\phi)$. Clearly $T_\sigma$ is order preserving on $(L_1(\phi), \leq)$. Hence, with $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$, the pair $(L_1(\phi), \TT)$ is an ADP. 
By construction, any fixed point of $T_\sigma$ solves {eq}`eq-jsvsig`, so each such fixed point $v_\sigma$ has the interpretation of assigning lifetime values to states under $\sigma$. ```{exercise} :label: ex-apps-auto-1 Prove that $(L_1(\phi), \TT)$ is well-posed. ``` ```{solution} ex-apps-auto-1 Fix $\sigma \in \Sigma$. Straightforward arguments show that $T_\sigma$ is a contraction of modulus $\beta$ on $L_1(\phi)$. Since $L_1(\phi)$ is complete and $\beta \in (0,1)$, it follows that $T_\sigma$ has a unique fixed point. Hence $(L_1(\phi), \TT)$ is well-posed. ``` Fix $v \in L_1(\phi)$ and consider the policy $\sigma$ given by $$ \sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int v(w') \phi(\diff w') \right\} \qquad (w \in \Wsf). $$ (eq-js_greedy) This policy tells the worker to stop when the payoff from stopping is at least as large as the expected payoff from continuing, assuming that $v$ is used to value future states. ```{exercise} :label: ex-jsvg Show that $\sigma$ is $v$-greedy. ``` ```{solution} ex-jsvg For fixed $w \in \Wsf$, combining {eq}`eq-js_poloper` and {eq}`eq-js_greedy` gives $$ (T_\sigma \, v)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \int v(w') \phi(\diff w') \right\}. $$ (eq-js_tsigag) From {eq}`eq-js_tsigag` we see that $T_\tau \, v \leq T_\sigma \, v$ for any $\tau \in \Sigma$, so $\sigma$ is $v$-greedy as claimed. ``` Since $\sigma$ in {eq}`eq-js_greedy` is well-defined at any $v \in L_1(\phi)$, and also Borel measurable, the ADP $(L_1(\phi), \TT)$ is regular. ```{exercise} :label: ex-apps-auto-2 Show that the Bellman operator of ADP $(L_1(\phi), \TT)$ obeys $$ (T v)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \int v(w') \phi(\diff w') \right\} \qquad (v \in V, \; w \in \Wsf). $$ (eq-tvjsadp) ``` ```{solution} ex-apps-auto-2 This follows immediately from {eq}`eq-js_tsigag`, which gives $T_\sigma \, v$ at a $v$-greedy policy $\sigma$, and {prf:ref}`l-torper`.
``` The expression for $T$ in {eq}`eq-tvjsadp` aligns with the job search Bellman operator from Chapter 1 of {cite}`sargent2025dynamic`, after replacing the finite expectation with an integral. #### Reducing the Value Space In the preceding analysis we used $L_1(\phi)$ for the value space because $\phi$ is allowed to have unbounded support. Since $w$ can be arbitrarily large, this implies that the function $T_\sigma \, v$ in {eq}`eq-js_poloper` can be unbounded. The set $L_1(\phi)$ can handle unbounded functions. To complete this section, the next exercise looks at settings where $\phi$ has bounded support and considers how we might exploit this by selecting a smaller value space. ```{exercise} :label: ex-jsvs Suppose that $\Wsf$ is bounded and let 1. $b\Wsf$ be the bounded Borel measurable functions on $\Wsf$, 2. $bc\Wsf$ be the continuous functions in $b\Wsf$, 3. $ibc\Wsf$ be the increasing (i.e., nondecreasing) functions in $bc\Wsf$, and 4. $ibc\Wsf_+$ be the nonnegative functions in $ibc\Wsf$. Show that if $V$ is any of these spaces, paired with the pointwise order, and each $T_\sigma \in \TT$ is given by {eq}`eq-js_poloper`, then $(V, \TT)$ is an ADP with Bellman operator given by {eq}`eq-tvjsadp`. ``` ```{exercise} :label: ex-apps-auto-3 Show that, in the setting of {prf:ref}`ex-jsvs`, the job search problem is an RDP. In your answer, set $V = b\Wsf$. ``` ```{solution} ex-apps-auto-3 We set $\Gamma(w) = \{0, 1\}$ for every $w \in \Wsf$, $V = b\Wsf$, and $$ B(w, a, v) = a \frac{w}{1-\beta} + (1 - a) \left[ c + \beta \int v(w') \phi(\diff w') \right]. $$ (eq-gmodagg) For monotonicity, if $v \leq u$ pointwise, then $\int v \diff \phi \leq \int u \diff \phi$, so $B(w, a, v) \leq B(w, a, u)$ for all $(w,a) \in \Gsf$. For consistency, fix $\sigma \in \Sigma$ and $v \in b\Wsf$, and let $m(w) \coloneq B(w, \sigma(w), v)$. Since $\Wsf$ is bounded, $w/(1-\beta)$ is bounded on $\Wsf$, and $\int v \diff \phi$ is finite since $v$ is bounded. Hence $m$ is bounded.
Moreover, $m$ is Borel measurable because $\sigma$ is measurable and $w \mapsto w/(1-\beta)$ is continuous. Thus $m \in b\Wsf$. ``` (sss-searchop0)= #### Optimality with iid Offers Let's return now to the general setting of {prf:ref}`a-woff_fin_mean`, where $\Wsf \subset \RR_+$ can be unbounded, and use $L_1(\phi)$ for the value space. We consider optimality properties and convergence of algorithms for the job search ADP $(L_1(\phi), \TT)$. ```{prf:proposition} :label: p-jsiid If {prf:ref}`a-woff_fin_mean` is in force, then 1. the fundamental optimality properties hold, and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} We use {prf:ref}`t-affineban_sr`. We saw above that the ADP $(L_1(\phi), \TT)$ is regular. Moreover, each $T_\sigma$ is affine, with $T_\sigma \, v = r_\sigma + K_\sigma \, v$, where $r_\sigma \coloneq \sigma e + (1 - \sigma) c$, $e \coloneq w/(1-\beta)$, and $K_\sigma \, v \coloneq (1 - \sigma) \beta \int v \diff \phi$. We have $0 \leq K_\sigma \leq D$, where $D v \coloneq \beta \int v \diff \phi$. Since $\rho(D) = \beta < 1$, the conditions of {prf:ref}`t-affineban_sr` hold. ◻ ``` Since the fundamental optimality properties hold, the value function $\vmax$ is a fixed point of the Bellman operator $T$ and a policy $\sigma$ is optimal if and only if it is $\vmax$-greedy, which is to say that $$ \sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int \vmax(w') \phi(\diff w') \right\} $$ for all $w \in \RR_+$. (We are assuming that the agent accepts the job offer if indifferent.) In other words, the optimal rule is to stop if and only if $$ w \geq (1 - \beta) \left[ c + \beta \int \vmax(w') \phi(\diff w') \right]. $$ The term on the right-hand side is called the **reservation wage**. This representation of optimal behavior is convenient because the reservation wage provides a scalar summary of the solution to the problem.
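As a rough illustration of how the reservation wage might be computed from the Bellman operator, the Python sketch below runs VFI on a Monte Carlo sample of offers. Everything specific here is an assumption made for the example rather than part of the model above: offers are taken to be lognormal with hypothetical parameters `mu` and `s`, the values of `beta` and `c` are illustrative, the integral is replaced by a sample mean, and each candidate value function is stored only through its values at the sampled offers.

```python
import math
import random

beta, c = 0.9, 1.0           # illustrative discount factor and compensation (assumed)
mu, s, n = 0.0, 0.5, 2000    # assumed lognormal offer distribution and sample size

rng = random.Random(1234)
draws = [math.exp(mu + s * rng.gauss(0.0, 1.0)) for _ in range(n)]
stop_values = [w / (1 - beta) for w in draws]    # value of accepting each sampled offer


def bellman(v):
    """Apply T to v, where v is stored as its values at the sampled offers."""
    cont = c + beta * sum(v) / n    # sample analogue of c + beta * (integral of v)
    return [max(sv, cont) for sv in stop_values]


# Value function iteration from v = 0
v = [0.0] * n
for _ in range(10_000):
    v_new = bellman(v)
    err = max(abs(a - b) for a, b in zip(v_new, v))
    v = v_new
    if err < 1e-10:
        break

cont_value = c + beta * sum(v) / n
res_wage = (1 - beta) * cont_value
sigma = [1 if w >= res_wage else 0 for w in draws]   # accept when indifferent

print(f"approximate reservation wage: {res_wage:.4f}")
print(f"share of sampled offers accepted: {sum(sigma) / n:.3f}")
```

The computed number is only an approximation to the reservation wage of the assumed model, since it inherits Monte Carlo error from the sample of offers.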
(ss-rtbe)= ### Rearranging the Bellman Equation In the iid case, the Bellman equation can be reduced to a scalar fixed point problem in a single continuation value. We derive this reduction, connect it to the theory of factored DPs, and use it to study how the reservation wage varies with model parameters. (sss-conval)= #### Continuation Values In view of {eq}`eq-tvjsadp`, a function $v$ satisfies the Bellman equation when $$ v(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \int v(w') \phi(\diff w') \right\} \quad \text{for all } w \in \Wsf. $$ (eq-jsvfr) Taking $v$ as given, consider the term $$ h = c + \beta \int v(w') \phi(\diff w') . $$ (eq-jsdh) We can use $h$ to eliminate the function $v$ from {eq}`eq-jsvfr`. To do so we insert $h$ on the right-hand side, replace $w$ with $w'$ in {eq}`eq-jsvfr`, take expectations, multiply by $\beta$ and add $c$ to obtain $$ h = c + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w'). $$ (eq-jsodb) This is a nonlinear equation in $h$, the solution of which, henceforth denoted $\hmax$, is the **optimal continuation value** of our problem. Obtaining $\hmax$ allows us to solve the dynamic programming problem, since any policy $\sigma$ satisfying $$ \sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax \right\} \quad \text{for all } w \in \RR_+ $$ (eq-opjs2) is optimal. (We discuss this more formally in {ref}`sss-iidjsfdp`.) Another way to write {eq}`eq-opjs2` is $$ \sigma(w) = \1 \left\{ w \geq \wopt \right\} \quad \text{where } \; \wopt \coloneq (1 - \beta) \hmax, $$ (eq-opjs3) where the final term is the reservation wage. In order to solve {eq}`eq-jsodb`, we introduce the mapping $$ g(h) = c + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w') \qquad (h \in \RR_+). $$ (eq-jsdg) It is constructed such that any solution to {eq}`eq-jsodb` is a fixed point of $g$ and vice versa. 
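The scalar fixed point problem for $g$ in {eq}`eq-jsdg` can be sketched numerically as follows. This is an illustration under stated assumptions: offers are lognormal with hypothetical parameters `mu` and `s`, the integral is replaced by a Monte Carlo average, and the helper names `make_g` and `solve_continuation_value` are invented for this example.

```python
import numpy as np

def make_g(w_draws, c, beta):
    """Return a Monte Carlo approximation of g(h) = c + beta * E max{W'/(1-beta), h}."""
    e_draws = w_draws / (1 - beta)        # stopping values w'/(1 - beta)
    def g(h):
        return c + beta * np.mean(np.maximum(e_draws, h))
    return g

def solve_continuation_value(g, h0=0.0, tol=1e-8, max_iter=10_000):
    """Successive approximation h <- g(h) until successive iterates are within tol."""
    h = h0
    for _ in range(max_iter):
        h_new = g(h)
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    raise RuntimeError("fixed point iteration did not converge")

# Hypothetical parameters (assumptions made for this sketch)
mu, s = 0.0, 1.0
c, beta = 1.0, 0.9
rng = np.random.default_rng(0)
w_draws = np.exp(mu + s * rng.standard_normal(100_000))

g = make_g(w_draws, c, beta)
h_star = solve_continuation_value(g)      # approximate optimal continuation value
w_star = (1 - beta) * h_star              # reservation wage, as in eq-opjs3
```

Because the same draws are reused at every evaluation, the approximate `g` is a deterministic function of `h`, which keeps the iteration well behaved.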
```{exercise} :label: ex-jscuf0 Using the elementary bound $$ |\alpha \vee x - \alpha \vee y| \leq |x - y| \qquad (\alpha, x, y \in \RR), $$ (eq-elmb2) show that $g$ is a contraction of modulus $\beta$ on $\RR_+$ under the usual Euclidean distance. ``` The result of {prf:ref}`ex-jscuf0` implies that $g$ has a unique fixed point $\hmax$ in $\RR_+$, which is the optimal continuation value. {numref}`f-iid_job_search_2` shows the function $g$ when $\ln W_t = \mu + s Z_t$ for standard normal $Z_t$, while $\beta = 0.9$ and $c = 1.0$. The integral in {eq}`eq-jsdg` is computed by Monte Carlo. The unique fixed point is $\hmax$. ```{figure} figures/iid_job_search_2.pdf :name: f-iid_job_search_2 The function $g$ from {eq}`eq-jsdg` ``` One obvious advantage of the formulation of the problem in {ref}`sss-conval` is that, instead of searching for a value function $\vmax$ in the infinite-dimensional space $L_1(\phi)$, we only need to solve for the fixed point of $g$ in the one-dimensional space $\RR_+$. The next exercise further reduces the search space to a bounded interval in $\RR_+$. ```{exercise} :label: ex-jscuf Find a constant $K$ such that $g$ maps $[0, K]$ to itself. ``` ```{solution} ex-jscuf Let $\bar w \coloneq \int w \phi(\diff w)$ and let $f(h) \coloneq c + \beta \bar w / (1-\beta) + \beta h$. Since $\max\{w'/(1-\beta), h\} \leq w'/(1-\beta) + h$ for $w', h \geq 0$, we have $g(h) \leq f(h)$ for all $h \in \RR_+$. The unique fixed point of $f$ is $K \coloneq (c + \beta \bar w / (1-\beta))/(1 - \beta)$. Since $g \leq f$ pointwise and $g$ is increasing (by inspection of {eq}`eq-jsdg`), we have $g(K) \leq f(K) = K$. Moreover, $g(0) = c + \beta \int (w'/(1-\beta)) \phi(\diff w') \geq 0$. Hence $g$ maps $[0, K]$ to itself. ``` {numref}`f-iid_job_search` shows the reservation wage, computed by iterating on $g$ to obtain (an approximation to) $\hmax$ and then calculating $\wopt$ via {eq}`eq-opjs3`. 
In the computation, $c$ and the distribution $\phi$ are as for the last figure, while $\beta$ ranges from $0.9$ to $0.99$. ```{figure} figures/iid_job_search.pdf :name: f-iid_job_search Reservation wage as a function of $\beta$ ``` ```{exercise} :label: ex-apps-auto-4 As a computational exercise, using the same specification as {numref}`f-iid_job_search` and $\beta=0.98$, compute $\hmax$ as a fixed point of $g$ in {eq}`eq-jsdg` and then $\vmax$ via $\vmax(w) = \max \left\{ w/(1-\beta), \hmax \right\}$. Next, compute $\vmax$ as a fixed point of the Bellman operator {eq}`eq-tvjsadp`. Plot both and confirm that the plots are essentially identical. ``` (sss-iidjsfdp)= #### An FDP Perspective As an exercise, let's connect the transformation discussed in {ref}`sss-conval` to the theory of FDPs in {prf:ref}`c-transforms`. For this discussion we adopt the environment of {ref}`sss-searchop0` and set 1. $V = L_1(\phi)$, 2. $\hat V = \RR_+$, 3. $F \colon V \to \hat V$ with $Fv = c + \beta \int v(w') \phi(\diff w')$, and 4. $G_\sigma \colon \hat V \to V$ with $(G_\sigma \, h)(w) = \sigma(w) (w/(1-\beta)) + (1-\sigma(w)) h$. Clearly, given $h \in \hat V$, we can attain the bound $G_\tau h \leq G_\sigma h$ for all $\tau \in \Sigma$ by setting $$ \sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq h \right\} \qquad (w \in \Wsf). $$ (eq-jsg) Let $\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}$. Since $F$ and each $G_\sigma$ are order-preserving, the tuple $(V, F, \hat V, \GG)$ is an order-preserving FDP. For the primary ADP generated by $(V, F, \hat V, \GG)$, the policy operators have the form $$ (T_\sigma \, v)(w) = (G_\sigma F v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[c + \beta \int v(w') \phi(\diff w') \right]. $$ This is identical to {eq}`eq-js_poloper`, so the primary ADP is just the original job search ADP $(L_1(\phi), \TT)$ from {ref}`sss-jsdes`. 
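The factorization $T_\sigma = G_\sigma F$ and the greedy choice in {eq}`eq-jsg` can be checked numerically on a finite wage grid. The sketch below is only illustrative: the grid, the uniform offer probabilities, the parameter values, and the function names are all assumptions introduced here.

```python
import numpy as np

# Hypothetical finite-support primitives (assumptions for this sketch)
w = np.linspace(0.5, 2.5, 51)
phi = np.full(w.size, 1 / w.size)
c, beta = 1.0, 0.9
e = w / (1 - beta)                        # stopping values

def F(v):
    """F v = c + beta * sum_w' v(w') phi(w'), a scalar in V-hat."""
    return c + beta * (v @ phi)

def G(sigma, h):
    """(G_sigma h)(w) = sigma(w) * w/(1-beta) + (1 - sigma(w)) * h."""
    return sigma * e + (1 - sigma) * h

def T_sigma(sigma, v):
    """Primary policy operator T_sigma = G_sigma F."""
    return G(sigma, F(v))

def greedy(h):
    """Policy from eq-jsg: stop exactly when w/(1-beta) >= h."""
    return (e >= h).astype(float)

rng = np.random.default_rng(0)
v = rng.uniform(0.0, 30.0, size=w.size)   # an arbitrary candidate value function
h = F(v)
sigma = greedy(h)
best = G(sigma, h)

# Compare against 100 randomly drawn policies tau
others = [G(rng.integers(0, 2, size=w.size), h) for _ in range(100)]
dominated = all(np.all(g_tau <= best) for g_tau in others)
```

Here `dominated` records whether $G_\tau h \leq G_\sigma h$ held for every sampled $\tau$, and `T_sigma(sigma, v)` coincides with the pointwise maximum of `e` and `h`.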
Regarding the subordinate ADP generated by $(V, F, \hat V, \GG)$, the policy operators have the form $$ \hat T_\sigma \, h = F G_\sigma h = c + \beta \int \left[ \sigma(w') \frac{w'}{1-\beta} + (1-\sigma(w')) h \right] \phi(\diff w'). $$ The associated Bellman operator is $$ \hat T \, h = c + \beta \int \max \left\{ \frac{w'}{1-\beta} , h \right\} \phi(\diff w'). $$ On inspection, we see that the fixed point of $\hat T$ is a solution to {eq}`eq-jsodb`. Thus, the subordinate ADP $(\hat V, \hat{\TT})$ represents the continuation value problem from {ref}`sss-conval`. These observations formalize the ideas expressed in {ref}`sss-conval`. For example, {prf:ref}`t-rgsc` tells us that a policy $\sigma \in \Sigma$ will be optimal for the job search problem when $\hmax$ is a fixed point of $\hat T$ and $\sigma$ obeys $G_\sigma \, \hmax = \Gmax \hmax$. (Here $\Gmax$ is the supremum $\bigvee_\sigma G_\sigma$.) In view of {eq}`eq-jsg`, such a $\sigma$ can be found by setting $$ \sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax \right\} \qquad (w \in \Wsf). $$ This policy aligns with the (informally derived) solution from {eq}`eq-opjs2`. (sss-jspm)= #### Parametric Monotonicity How does the solution to the job search problem vary with parameters? In terms of monotonicity, one way to answer this is to appeal to {prf:ref}`p-ofpdsms`. Since $g$ is an increasing contraction mapping on $\RR_+$, this proposition implies that any parameter that shifts up the function $g$ in {eq}`eq-jsdg` pointwise on $\RR_+$ also shifts its fixed point up. ```{prf:example} The optimal continuation value $\hmax$ is increasing in $c$. Indeed, if $c_1 \leq c_2$, then $$ c_1 + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w') \leq c_2 + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w'). $$ Thus, the function $g$ shifts up everywhere when $c$ increases and hence $\hmax$ increases with $c$.
This is as expected, since higher unemployment compensation makes the value of continuing to the next period greater. ``` ```{exercise} :label: ex-jscu Prove that the reservation wage $\wopt$ is increasing in unemployment compensation $c$. Prove also that $\hmax$ is increasing in $\beta$. ``` {numref}`f-iid_job_search` suggests that $\wopt$ is also increasing in $\beta$. Since $\wopt = (1-\beta) \hmax$, we cannot infer this directly from the fact that $\hmax$ is increasing in $\beta$. Instead, we take the fixed point equation for $\hmax$ in {eq}`eq-jsodb`, multiply both sides by $1-\beta$ and substitute $w = (1-\beta) h$, which uses the definition of the reservation wage from {eq}`eq-opjs3`, to obtain a new fixed point equation $f(w) = w$ where $$ f(w) \coloneq c (1-\beta) + \beta \int \max \left\{ w' ,\, w \right\} \phi(\diff w'). $$ (eq-jswf) ```{exercise} :label: ex-apps-auto-5 Let $\bar w = \int w' \phi(\diff w')$ be the mean wage offer. Prove the following: If $c \leq \bar w$, then the reservation wage $\wopt$ is increasing in $\beta$. ``` ```{solution} ex-apps-auto-5 We apply {prf:ref}`p-ofpdsms`. Fixing $w \in \RR_+$, it suffices to show that the value $f(w) \coloneq c (1-\beta) + \beta \int \max \left\{ w' ,\, w \right\} \phi(\diff w')$ shifts up when $\beta$ increases. This is true when $c \leq \bar w$, because $f(w)$ is a weighted average of $c$ and the integral term, with weights $1-\beta$ and $\beta$, and the second term is at least as large as the first: $$ \int \max \left\{ w' ,\, w \right\} \phi(\diff w') \geq \int w' \phi(\diff w') = \bar w \geq c. $$ Increasing $\beta$ puts more weight on the larger term, so $f(w)$ increases with $\beta$. ``` ```{exercise} :label: ex-jstau Let $\tau$ be the first passage time to employment for an unemployed agent. That is, $$ \tau \coloneq \inf \setntn{t \geq 0}{\sigopt(W_t) = 1}. $$ Prove that the mean first passage time $\EE \tau$ increases with $c$. ``` How do shifts in the wage offer distribution affect the reservation wage?
One observation is that a shift to a "more favorable" wage distribution should increase the reservation wage, since an agent who continues can expect better offers. ```{exercise} :label: ex-apps-auto-6 Let $\phi, \psi$ be two offer distributions on $\RR_+$ and let $\wopt_\phi$ and $\wopt_\psi$ be the associated reservation wages. Suppose that both distributions are supported on $[0, M]$. Prove the following: If $\psi$ first-order stochastically dominates $\phi$ (see {ref}`ss-sd`), then $\wopt_\phi \leq \wopt_\psi$. ``` ```{solution} ex-apps-auto-6 The proof is another application of {prf:ref}`p-ofpdsms`. Let $\psi$ and $\phi$ have the stated properties. Since the reservation wage is the fixed point of $f$ in {eq}`eq-jswf`, it is enough to fix $w \in \RR_+$ and show that $$ \int \max \left\{ w' ,\, w \right\} \phi(\diff w') \leq \int \max \left\{ w' ,\, w \right\} \psi(\diff w'). $$ Since $w' \mapsto \max\{w', w\}$ is bounded and increasing on $[0, M]$, this inequality follows directly from the definition of stochastic dominance. ``` A more interesting monotonicity result for this model concerns the volatility of the wage process and its impact on the reservation wage. For this problem, greater volatility encourages patience because the option value of waiting is larger. The next exercise asks you to verify this, using the concept of mean-preserving spreads. ```{exercise} :label: ex-apps-auto-7 Prove: If $\psi$ is a mean-preserving spread of $\phi$, then $\wopt_\phi \leq \wopt_\psi$. ``` ```{solution} ex-apps-auto-7 Let $\psi$ and $\phi$ have the stated properties and fix $w \in \RR_+$. In view of {prf:ref}`p-ofpdsms`, it is enough to show that, under the stated assumptions, the value $f(w)$ in {eq}`eq-jswf` increases with the mean-preserving spread, or, equivalently, $$ \int \max \left\{ w' ,\, w \right\} \phi(\diff w') \leq \int \max \left\{ w' ,\, w \right\} \psi(\diff w').
$$ (eq-jsmps) To see that this is so, observe that, by the definition of a mean-preserving spread, there exists a pair $(w', Z)$ such that $\EE[ Z \given w' ] = 0$, $w' \eqdist \phi$ and $w' + Z \eqdist \psi$. By this fact and the law of iterated expectations, $$ \int \max \left\{ w' ,\, w \right\} \psi(\diff w') = \EE \left[ \max \left\{ w' + Z ,\, w \right\} \right] = \EE \left[ \EE \left[ \max \left\{ w' + Z ,\, w \right\} \, | \, w' \right] \right]. $$ An application of Jensen's inequality now produces $$ \int \max \left\{ w' ,\, w \right\} \psi(\diff w') \geq \EE \max \left\{ \EE [w' + Z \given w' ] ,\, w \right\} . $$ Using $\EE[w' \given w'] = w'$ and $\EE[Z \given w'] = 0$ confirms {eq}`eq-jsmps`. ``` (ss-cwd)= ### Job Search with Correlated Wage Draws In our simplistic model of job search we have so far assumed that wage offer draws are iid. Now let's allow these offers to have a Markov structure: ```{prf:assumption} :label: a-jsmara The wage sequence $(W_t)$ is $P$-Markov (see {ref}`ss-markop`) on a Borel set $\Wsf \subset \RR_+$, where $P$ is a Markov operator on $\Wsf$. The Markov operator $P$ has a stationary distribution $\phi$ on $\Wsf$ with finite mean, so that $\int w \phi(\diff w) < \infty$. ``` (sss-jsmwd)= #### An ADP Representation As before, $\Sigma$ is the set of all Borel measurable functions $\sigma$ mapping $\Wsf$ to $\{0, 1\}$. Each policy operator $T_\sigma$ is adjusted to $$ (T_\sigma \, v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v(w') P(w, \diff w') \right]. $$ We can write $T_\sigma$ more succinctly as $$ T_\sigma \, v = \sigma e + (1 - \sigma) (c + \beta Pv) \quad \text{when} \quad e(w) \coloneq \frac{w}{1-\beta}, $$ (eq-tsmsjs) with products such as $\sigma e$ defined pointwise. ```{exercise} :label: ex-tsalqm Prove that $T_\sigma$ is an order preserving self-map on $L_1(\phi)$. ``` ```{solution} ex-tsalqm Fix $v \in L_1(\phi)$ and $\sigma \in \Sigma$. 
For the self-map property, we need to show that $\sigma e + (1 - \sigma) (c + \beta Pv)$ is again in $L_1(\phi)$. Borel measurability is obvious from the Borel measurability of elements of $\Sigma$ and assumptions on the primitives. Regarding $\phi$-integrability, it suffices to show that the individual terms in the sum are integrable. That $\sigma e$ is integrable follows from $0 \leq \sigma e \leq e$ and the finite mean condition in {prf:ref}`a-jsmara`. Also, $(1-\sigma) c$ is integrable because $\phi$ is a probability measure. Finally, $|(1-\sigma) \beta Pv| \leq P|v|$ and $P$ maps $L_1(\phi)$ to itself (see {prf:ref}`l-mopfpl`). The order preserving property of $T_\sigma$ follows from the fact that $P$ is a positive linear operator. ``` ```{exercise} :label: ex-jsmeg Given $v \in L_1(\phi)$, show that the policy $\sigma \in \Sigma$ given by $$ \sigma(w) \coloneq \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int v(w') P(w, \diff w') \right\} \qquad (w \in \Wsf) $$ (eq-tvjsadpmar0) is $v$-greedy. Show, in addition, that the ADP Bellman operator corresponding to $(L_1(\phi), \TT)$ obeys $$ (T v)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \int v(w') P(w, \diff w') \right\} \qquad (v \in L_1(\phi), \; w \in \Wsf). $$ (eq-tvjsadpmar) ``` It follows from {prf:ref}`ex-tsalqm` and {prf:ref}`ex-jsmeg` that, with $\TT$ defined as the set of all $T_\sigma$ in {eq}`eq-tsmsjs` with $\sigma \in \Sigma$, the pair $(L_1(\phi), \TT)$ is a regular ADP. ```{exercise} :label: ex-opjsbv0 Show that every policy operator $T_\sigma$ is order continuous on $L_1(\phi)$. ``` ```{solution} ex-opjsbv0 To see that $T_\sigma$ is order continuous, observe that $T_\sigma$ can be expressed as $T_\sigma \, v = a + K_\sigma \, v$, where $a \in L_1(\phi)$ and $K_\sigma = \beta (1-\sigma) P \in \blop(L_1(\phi))$. By {prf:ref}`c-lpdc`, the operator $K_\sigma$ is order continuous. It follows that $T_\sigma$ is itself order continuous (see, e.g., {prf:ref}`ex-imroc`).
``` ```{exercise} :label: ex-jswp Show that the unique fixed point of $T_\sigma$ in $L_1(\phi)$ is $$ v_\sigma \coloneq (I - \beta (1-\sigma) P)^{-1} (\sigma e + (1 - \sigma) c). $$ (eq-jsvso) ``` ```{solution} ex-jswp Since $0 \leq \beta (1 - \sigma) P \leq \beta P$ in the pointwise order on $\blop(L_1(\phi))$, an application of {prf:ref}`t-orspr` yields $\rho(\beta (1-\sigma) P) \leq \rho(\beta P) = \beta \rho(P) = \beta < 1$. Hence, by the Neumann series lemma, and in particular {prf:ref}`c-ibnl`, $T_\sigma$ has a unique fixed point in $L_1(\phi)$ given by $v_\sigma$ in {eq}`eq-jsvso`. ``` We can now state an optimality result for the job search model with Markov wage draws. ```{prf:proposition} :label: p-jsmdo If {prf:ref}`a-jsmara` holds, then the Markov job search ADP $(L_1(\phi), \TT)$ is well-posed. Moreover, 1. the fundamental optimality properties hold, and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} We showed above that $(L_1(\phi), \TT)$ is regular. From {eq}`eq-tsmsjs` we can write $T_\sigma \, v = r_\sigma + K_\sigma \, v$ when $r_\sigma \coloneq \sigma e + (1 - \sigma) c$ and $K_\sigma \coloneq (1 - \sigma) \beta P$. Observe that $0 \leq K_\sigma \leq K$ when $K \coloneq \beta P$. In addition, $\rho(K) = \beta < 1$ by {prf:ref}`l-mopfpl`. As a result, the conditions of {prf:ref}`t-affineban_sr` are satisfied. ◻ ``` The next exercise suggests a way to reduce the value space. ```{exercise} :label: ex-vslvb Let $$ \bar v \coloneq (I - \beta P)^{-1}(e + c) \quad \text{and} \quad V \coloneq \left\{ v \in L_1(\phi) \; : \; 0 \leq v \leq \bar v \right\}. $$ Show that, for all $\sigma \in \Sigma$, we have $v_\sigma \leq \bar v$ and $T_\sigma \, V \subset V$. ``` ```{solution} ex-vslvb Fix $\sigma \in \Sigma$. Using the power series representation from the Neumann series lemma, we have $$ v_\sigma = \sum_{t \geq 0} [\beta (1-\sigma) P]^t (\sigma e + (1 - \sigma) c) \leq \sum_{t \geq 0} (\beta P)^t ( e + c) = \bar v.
$$ Next we show that $v \in V$ implies $T_\sigma v \in V$. To this end, fix $v \in V$. Evidently $0 \leq T_\sigma \, v$. Moreover, since $v \leq \bar v$, $$ T_\sigma \, v \leq e + c + \beta P v \leq e + c + \beta P (I - \beta P)^{-1}(e + c). $$ Using the power series representation, the right-hand side can be expressed as $$ e + c + \beta P \sum_{t \geq 0} (\beta P)^t (e + c) = e + c + \sum_{t \geq 1} (\beta P)^t (e + c) = \sum_{t \geq 0} (\beta P)^t (e + c) = \bar v. $$ We have confirmed that $0 \leq T_\sigma \, v \leq \bar v$ when $v \in V$, so each $T_\sigma$ is a self-map on $V$. ``` Since $T_\sigma$ is also order preserving, $(V, \TT)$ is an ADP. ```{exercise} :label: ex-apps-auto-8 Prove: The optimality results from {ref}`sss-jsmwd` also hold for $(V, \TT)$. ``` ```{exercise} :label: ex-apps-auto-9 Prove: HPI converges in finitely many steps when $|\Wsf| < \infty$. ``` ```{solution} ex-apps-auto-9 If $\Wsf$ is finite, then the set of policies $\Sigma$, which is the set of maps from $\Wsf$ into $\{0,1\}$, is also finite. In addition, $(V, \TT)$ is globally stable and hence order stable. Hence {prf:ref}`t-bkf` applies. ``` (ss-numstud)= #### A Numerical Study {numref}`f-job_search_vs` shows the output of HPI under a range of parameter values. First we generate a stochastic matrix $P$ for wage offers via Tauchen's method, discretizing the AR(1) process $W_{t+1} = \rho W_t + \nu \xi_{t+1}$ where $(\xi_t)$ is iid and standard normal. We set $\beta=0.99$, $\rho=0.9$, $\nu=0.2$ and $n=500$, where $n$ is the number of grid points in the discretization. In the left subfigure we plot $\vmax$, computed by HPI, as well as the exit option $e(w) = w/(1-\beta)$ and the reservation wage, which is $\bar w \coloneq \min\{w \in \Wsf : \sigopt(w) = 1\}$ when $\sigopt$ is $\vmax$-greedy. The reservation wage is the minimum wage offer at which the unemployed worker accepts. In the right subfigure we vary the volatility parameter $\nu$ over $0.1$ to $0.2$ and plot $\bar w$ as a function of $\nu$, holding other parameters fixed.
Notice that the reservation wage increases with wage offer volatility. The reason is that more volatility increases the upside of waiting, due to the possibility of high future offers. At the same time, downside risk is mitigated by the ability to reject a bad offer. ```{figure} figures/job_search_vs.pdf :name: f-job_search_vs Solution to the job search problem ``` {numref}`f-job_search_algos` shows the first two iterates of HPI, OPI and VFI, as well as the value function $\vmax$ and the shared initial condition $v$. Parameter values are the same as the left-hand subfigure in {numref}`f-job_search_vs`. In the case of OPI, $m$ is set to 10. We see that HPI converges faster than VFI in terms of reduced distance to the value function per iteration. The rate for OPI is between HPI and VFI. ```{figure} figures/job_search_algos.pdf :name: f-job_search_algos Comparison of algorithms (job search) ``` #### Persistent and Transient Components Let's now look at a more sophisticated wage offer process, with persistent and transient components. In particular, we assume that $(W_t)$ obeys $$ W_t = \exp(Z_t) + \exp(\mu + \sigma \zeta_t), \qquad \text{where} \quad Z_{t+1} = \rho Z_t + d + s \epsilon_{t+1} $$ (eq-jswages) Here $\mu, d \in \RR$, $\sigma, s$ are positive, and $\rho \in (-1, 1)$. The sequences $(\zeta_t)_{t \geq 1}$ and $(\epsilon_t)_{t \geq 1}$ are independent, iid, and standard normal. Thus, the persistent component $\exp(Z_t)$ and the transient component are lognormal. The model is otherwise unchanged. The state becomes $(w, z) \in (0,\infty) \times \RR$ and the Bellman equation is $$ v(w, z) = \max \left\{ \frac{w}{1-\beta}, c + \beta \, \EE_z v(w', z') \right\} . $$ (eq-jsmvf) Here $\EE_z$ is expectation conditional on $z$. The expectation term can be written more explicitly as $$ \EE_z v(w', z') = \int v \left[ \exp(\rho z + d + s \epsilon) + \exp(\mu + \sigma \zeta), \rho z + d + s \epsilon \right] \, \phi(\diff \epsilon, \diff \zeta). 
$$ Here $z$ and the parameters are fixed and $\phi$ is the $N(0, I)$ distribution on $\RR^2$. Rather than analyzing this model directly, we can first reduce dimensionality by transforming it via continuation values, analogous to the technique we used in {ref}`ss-rtbe`. As a first step, let $h(z)$ be the continuation value from {eq}`eq-jsmvf`: $$ h(z) \coloneq c + \beta \, \EE_z v(w', z') \qquad (z \in \RR). $$ (Notice that $h$ is a function now, as opposed to the iid setting of {eq}`eq-jsdh`. This is not surprising, since the current state can be used to predict future wages, which in turn determine future value.) Given $h$, the Bellman equation can be written as $v(w, z) = \max \left\{w/(1-\beta), \, h(z) \right\}$. Combining this with the definition of $h$, we see that $$ h(z) = c + \beta \, \EE_z \max \left\{ \frac{w'}{1-\beta}, h(z') \right\} \qquad (z \in \RR). $$ (eq-jscvfh) (Note the similarity with {eq}`eq-jsodb`.) The function $h$ is defined on all of $\RR$, since this is the domain of $z$. If we can obtain the solution $\hmax$ to this functional equation, we can use it to act optimally via the policy $$ \sigopt(w, z) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax(z) \right\}. $$ (eq-opjs4) To formalize these ideas, we can construct an ADP such that the Bellman equation agrees with {eq}`eq-jscvfh`. To do so we take $\Sigma$ to be all Borel measurable functions sending $(w', z') \in (0, \infty) \times \RR$ to $\{0,1\}$ and, for each $\sigma \in \Sigma$, we set $$ (\hat T_\sigma \, h)(z) \coloneq c + \beta \, \EE_z \left\{ \sigma(w', z') \frac{w'}{1-\beta} + (1- \sigma(w', z')) h(z') \right\} . $$ (eq-jsdqp) We take $\phi$ to be the stationary distribution of $(Z_t)$ and consider each $\hat T_\sigma$ as a mapping over all $h \in L_1(\phi) \coloneq L_1(\RR, \bB, \phi)$. Let $\hat{\TT}$ be all such $\hat T_\sigma$ as $\sigma$ ranges over $\Sigma$. The pair $(L_1(\phi), \hat{\TT})$ forms an ADP. 
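One way to approximate the right-hand side of {eq}`eq-jscvfh` numerically is to represent $h$ on a grid for $z$, replace the conditional expectation with a Monte Carlo average, and interpolate $h$ linearly between grid points. The sketch below follows that route and iterates to an approximate fixed point. All parameter values, the grid, the name `sig` for the scale of the transient shock, and the function names are assumptions made for illustration.

```python
import numpy as np

def make_T_hat(z_grid, eps, zeta, c, beta, rho, d, s, mu, sig):
    """Build h -> c + beta * E_z max{w'/(1-beta), h(z')} on z_grid."""
    # Next-period states for every grid point (rows) and every draw (columns)
    z_next = rho * z_grid[:, None] + d + s * eps[None, :]
    w_next = np.exp(z_next) + np.exp(mu + sig * zeta)[None, :]
    e_next = w_next / (1 - beta)

    def T_hat(h):
        # Linear interpolation; np.interp holds h constant outside the grid,
        # which is an approximation in the tails
        h_next = np.interp(z_next.ravel(), z_grid, h).reshape(z_next.shape)
        return c + beta * np.mean(np.maximum(e_next, h_next), axis=1)

    return T_hat

def iterate(T_hat, h0, tol=1e-6, max_iter=5_000):
    """Successive approximation with a sup-norm stopping rule on the grid."""
    h = h0
    for _ in range(max_iter):
        h_new = T_hat(h)
        if np.max(np.abs(h_new - h)) < tol:
            return h_new
        h = h_new
    raise RuntimeError("iteration did not converge")

# Hypothetical parameters (assumptions for this sketch)
c, beta = 1.0, 0.95
rho, d, s = 0.9, 0.0, 0.1
mu, sig = 0.0, 0.5

rng = np.random.default_rng(0)
eps, zeta = rng.standard_normal((2, 1_000))
sd_z = s / np.sqrt(1 - rho**2)            # stationary std. dev. of Z_t
z_grid = d / (1 - rho) + np.linspace(-3 * sd_z, 3 * sd_z, 50)

T_hat = make_T_hat(z_grid, eps, zeta, c, beta, rho, d, s, mu, sig)
h_star = iterate(T_hat, np.zeros_like(z_grid))

def sigma_star(w, z):
    """Policy from eq-opjs4: accept (1) when w/(1-beta) >= h_star(z)."""
    return int(w / (1 - beta) >= np.interp(z, z_grid, h_star))
```

The state space of the approximated problem is one-dimensional (a grid over $z$), whereas working with $v(w, z)$ directly would require a two-dimensional grid.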
```{exercise} :label: ex-dgmj Prove: If $h \in L_1(\phi)$, then the policy $\sigma$ given by $$ \sigma(w', z') = \1 \left\{ \frac{w'}{1-\beta} \geq h(z') \right\} \qquad ((w', z') \in (0, \infty) \times \RR) $$ is $h$-greedy for $(L_1(\phi), \hat{\TT})$. ``` We can alternatively write $\hat T_\sigma h$ as $\hat T_\sigma \, h = m_\sigma + K_\sigma \, h$, where $$ m_\sigma(z) \coloneq c + \beta \, \EE_z \left\{\sigma(w', z') \frac{w'}{1-\beta} \right\} \quad \text{and} \quad (K_\sigma \, h)(z) \coloneq \beta \, \EE_z \, (1- \sigma(w', z')) h(z'). $$ Each $K_\sigma$ is a positive linear operator on $L_1(\phi)$ and, moreover, for the positive linear operator $K$ defined by $$ (K h)(z) \coloneq \beta \EE_z \, h(z') = \beta \EE h(\rho z + d + s \epsilon_{t+1}), $$ we have $0 \leq K_\sigma \, h \leq Kh$. By {prf:ref}`l-mopfpl`, the spectral radius of $K$ equals $\beta < 1$. Hence, by {prf:ref}`t-affineban_sr`, the fundamental optimality properties hold, and VFI, OPI and HPI all converge. Let's now characterize the Bellman operator, which is defined on $L_1(\phi)$ by $\hat Th = \bigvee_\sigma \, \hat T_\sigma \, h$. ```{exercise} :label: ex-apps-auto-10 Given $h \in L_1(\phi)$, prove that $\hat Th$ obeys $$ (\hat Th)(z) = c + \beta \, \EE_z \max \left\{ \frac{w'}{1-\beta}, h(z') \right\} \qquad (z \in \RR). $$ (eq-jsdq) ``` ```{solution} ex-apps-auto-10 For any $h$-greedy policy $\sigma$, we have $\hat T_\sigma \, h = \hat Th$ ({prf:ref}`l-torper`). Using this fact and {prf:ref}`ex-dgmj` yields {eq}`eq-jsdq`. ``` ```{exercise} :label: ex-qsl2 Prove that $\hat T$ is a contraction of modulus $\beta$ on $L_1(\phi)$. ``` ```{solution} ex-qsl2 Fix $g, h \in L_1(\phi)$. By Jensen's inequality and {eq}`eq-elmb2`, we have $$ \begin{aligned} |(\hat Tg)(z) - (\hat Th)(z)| & \leq \beta \, \EE_z \left| \max \left\{ \frac{w'}{1-\beta}, g(z') \right\} - \max \left\{ \frac{w'}{1-\beta}, h(z') \right\} \right| \\ & \leq \beta \, \EE_z \left| g(z') - h(z') \right|. 
\end{aligned} $$ Let $Z$ be a draw from $\phi$. Taking the expectation of the last inequality with $z = Z$ and using the fact that $\phi$ is stationary gives $$ \EE |(\hat Tg)(Z) - (\hat Th)(Z)| \leq \beta \EE \EE_Z \left| g(z') - h(z') \right| = \beta \EE \left| g(z') - h(z') \right|. $$ This proves that $\|\hat Tg - \hat Th \| \leq \beta \, \| g - h \|$, where $\| \cdot \|$ is the norm on $L_1(\phi)$. ``` Since $L_1(\phi)$ is complete, Banach's contraction mapping theorem implies that $\hat T$ has a unique fixed point $\hmax$ in $L_1(\phi)$. ```{exercise} :label: ex-apps-auto-11 Let $c_a$ and $c_b$ be two levels of unemployment compensation satisfying $c_a \leq c_b$. Let $\hat T^a$ and $\hat T^b$ be the corresponding continuation value operators, so that $$ (\hat T^i h)(z) = c_i + \beta \, \EE_z \max \left\{ \frac{w'}{1-\beta}, h(z') \right\} \qquad (i \in \{a, b\}, \; z \in \RR). $$ Let $h_a$ and $h_b$ be their respective fixed points. Show that $h_a \leq h_b$ pointwise on $\RR$. ``` ```{solution} ex-apps-auto-11 Since $c_a \leq c_b$, we have $\hat T^a h \leq \hat T^b h$ for all $h \in L_1(\phi)$. By {prf:ref}`ex-qsl2`, $\hat T^b$ is a contraction on $L_1(\phi)$ and hence globally stable. Applying {prf:ref}`p-ofpdsms` gives $h_a \leq h_b$. ``` (s-osext)= ## Extensions In this section we extend the basic job search framework in several directions. {ref}`ss-nondis` introduces nonlinear discounting, where the discount factor depends on the magnitude of continuation value. {ref}`ss-nonexp` treats nonlinear expectations via the Kreps--Porteus certainty equivalent. {ref}`ss-jsl` considers job search with learning, where the offer distribution is unknown and the worker updates beliefs. {ref}`ss-jsep` adds job separation risk. (ss-nondis)= ### Nonlinear Discounting Next we consider a setting where discounting is a nonlinear function of continuation value. 
One motivation for this generalized setup is **magnitude effects**, under which, for some individuals, discount rates seem to decrease with the size of the reward (i.e., large rewards are discounted less, so the discount factor is increasing in the size of the reward; see, e.g., {cite}`green1997rate`). Our aim is to solve the job search problem under this new setup. We suppose that wage offers are $P$-Markov on a Borel set $\Wsf \subset \RR_+$ and the value of continuing is given by $$ h(w) = c + \int \beta[ v(w') ] P(w, \diff w'), $$ (eq-hnd) where $\beta \colon \RR_+ \to \RR$ is a discount factor function. Given this nonlinear discounting formulation, the lifetime value of a constant wage stream that pays $w$ is $e(w)$, where $e$ is a fixed point of the operator $$ (H g)(w) = w + \beta(g(w)) \qquad (w \in \Wsf). $$ (eq-fes) (In the case where $\beta(x) = \beta x$ for some fixed $\beta \in (0,1)$, we get $e(w) = w / (1-\beta)$, so we recover the standard constant discount case.) To simplify the analysis, we assume that $\Wsf = [w_1, w_2]$ where $0 < w_1 < w_2$. We also assume that $0 < c < w_1$, so that the worst wage offer is better than unemployment compensation, and that $w_2 \geq 1 - b$, where $b$ is the scale parameter of the discount function defined next (this technical condition is used below when we bound $H$). For the discount factor function we set $$ \beta(x) \coloneq b F(x, \lambda) \quad \text{where } b \in (0,1) \text{ and } F(x, \lambda) = 1 - \exp(-\lambda x). $$ Obtaining optimality results for this model is not entirely trivial because the policy and Bellman operators are not, in general, contractions. This is due to the fact that $\beta$ can be steep close to zero, as shown in {numref}`f-expcdf`. At the same time, $\beta$ is concave, which gives us some hope that we can use the concavity-based fixed point and optimality results from {ref}`ss-ovscon`. ```{figure} figures/expcdf.pdf :name: f-expcdf The discount function $\beta$ for different choices of $\lambda$ and $b=0.99$ ``` We begin by analyzing $H$ in {eq}`eq-fes`.
As a first step, we set $$ V \coloneq [0 , \bar v] \quad \text{where} \quad \bar v \coloneq \frac{c + w_2}{1 - b}. $$ In this expression, $V$ is understood as an order interval in $b\Wsf$. We endow $V$ with the pointwise partial order $\leq$ and the supremum norm. ```{exercise} :label: ex-apps-auto-12 Show that $H$ is an order-preserving self-map on $V$. ``` ```{solution} ex-apps-auto-12 Since $\beta$ is increasing, $g \leq h$ implies $\beta(g(w)) \leq \beta(h(w))$ for all $w$, and hence $Hg \leq Hh$, so $H$ is order preserving. For the self-map property, fix $g \in V$. The lower bound $(Hg)(w) = w + \beta(g(w)) \geq w_1 + \beta(0) = w_1 > 0$ holds for all $w \in \Wsf$. For the upper bound, $g \leq \bar v$ implies $\beta(g(w)) \leq \beta(\bar v) < b$, so $(Hg)(w) \leq w_2 + b \leq \bar v$, where the last step uses $w_2 \geq 1 - b$: this gives $b(1 - b - w_2) \leq 0 < c$, which rearranges to $(w_2 + b)(1 - b) \leq c + w_2$. ``` ```{exercise} :label: ex-apps-auto-13 Show that $H$ satisfies Du's conditions on $V$ and hence has a unique fixed point $e \in V$. ``` ```{solution} ex-apps-auto-13 By the preceding exercise, $H$ is an order-preserving self-map on $V$. Since $\beta$ is concave, so is $H$. Moreover, $(H \, 0)(w) = w + \beta(0) = w \geq w_1 > 0$ for all $w \in \Wsf$, which gives $0 \ll H \, 0$. By {prf:ref}`l-riesz_con`, $H$ satisfies Du's conditions. {prf:ref}`t-du` then implies that $H$ is globally stable on $V$, so $H$ has a unique fixed point $e$ in $V$ and successive approximation using $H$ converges to $e$ from any initial condition in $V$. ``` In general, there is no closed-form expression for $e$, but it can be computed numerically by iterating on $H$. The policy set $\Sigma$ is all measurable $\sigma \colon \Wsf \to \{0,1\}$. Each policy operator $T_\sigma$ becomes $$ (T_\sigma \, v)(w) = \sigma(w) \, e(w) + (1-\sigma(w)) \left[ c + \int \beta[ v(w') ] P(w, \diff w') \right]. $$ ```{exercise} :label: ex-apps-auto-14 Prove that each $T_\sigma$ maps $V$ into itself. ``` ```{solution} ex-apps-auto-14 Fix $\sigma \in \Sigma$ and $v \in V$.
For the lower bound, $(T_\sigma \, v)(w) \geq \min\{e(w),\, c + \beta(0)\} = \min\{e(w),\, c\} = c > 0$ (using $\beta(0) = 0$ and $e(w) = w + \beta(e(w)) \geq w_1 > c$). For the upper bound, $e(w) \leq \bar v$ since $e \in V$, and $c + \int \beta[v(w')] P(w, \diff w') \leq c + b < \bar v$ since $\beta(x) < b$ for all finite $x$. Hence $T_\sigma \, v \in V$. ``` Let $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$. Since every $T_\sigma$ is order-preserving, $(V, \TT)$ is an ADP. Extending our earlier analysis, a policy $\sigma$ is $v$-greedy whenever $$ \sigma(w) = \1 \left\{ e(w) \geq c + \int \beta[v(w')] P(w, \diff w') \right\} \qquad (w \in \Wsf). $$ Since such a policy exists, the ADP $(V, \TT)$ is regular. ```{exercise} :label: ex-ndoc Show that the nonlinear discount ADP satisfies the conditions of {prf:ref}`t-riesz_con`. ``` ```{solution} ex-ndoc Regularity has already been established and $b\Wsf$ is countably Dedekind complete ({prf:ref}`c-sdcms`). Hence, to check the conditions of {prf:ref}`t-riesz_con`, it suffices to show that, for given $\sigma$, the operator $T_\sigma$ is concave on $V$ and satisfies Du's conditions. Concavity of $T_\sigma$ follows from concavity of $\beta$ and monotonicity of the integral. For Du's conditions, we showed above that $(T_\sigma \, v)(w) \geq c > 0$ for all $v \in V$ and $w \in \Wsf$. In particular, $(T_\sigma \, 0)(w) \geq c > 0$ for all $w$, giving $0 \ll T_\sigma \, 0$. By {prf:ref}`l-riesz_con`, $T_\sigma$ satisfies Du's conditions. ``` (ss-nonexp)= ### Nonlinear Expectations In the last section we modified the job search model to include nonlinear discounting. Here we drop nonlinear discounting but assume instead that the job seeker uses a nonlinear expectation of future values. In particular, $$ T_\sigma \, v = \sigma e + (1-\sigma) ( c + \beta Rv) $$ where $$ (Rv)(w) \coloneq \left( \int v^{1-\gamma}(w') P(w, \diff w') \right)^{1/(1-\gamma)} \qquad (w \in \Wsf, \; \gamma \not= 1). $$ The operator $R$ is the Kreps--Porteus operator.
The value $\gamma$ parameterizes risk aversion for the unemployed worker with respect to intertemporal gambles. When $\gamma = 0$, we recover the standard linear expectation; when $\gamma > 0$, the agent is risk-averse; when $\gamma < 0$, the agent is risk-loving. The constant $\beta$ lies in $(0,1)$. As before, $e(w) \coloneq w/(1-\beta)$ is the stopping reward. (Stopped rewards are deterministic so $e(w)$ is not affected by $\gamma$.) The operator $T_\sigma$ and the operator $R$ act on the set $$ V \coloneq [c , \bar v] \quad \text{where} \quad \bar v \coloneq \frac{c + w_2}{1 - \beta}. $$ As in {ref}`ss-nondis`, $V$ is understood as an order interval in $b\Wsf$ and $0 < c < w_1 < w_2$. ```{exercise} :label: ex-cff Show that 1. $R$ is order preserving on $V$ and maps any constant function to itself. 2. $R$ is concave when $\gamma \geq 0$ and convex when $\gamma \leq 0$. ``` Letting $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$, we aim to show that $(V, \TT)$ is an ADP and, moreover, that the conditions of {prf:ref}`t-riesz_con` hold. As a first step, we observe that, for fixed $\sigma \in \Sigma$ and for sufficiently small positive $\epsilon$, $$ (T_\sigma \, c)(w) \geq \min \left\{ \frac{c}{1-\beta}, \; c + \beta c \right\} \geq c + \epsilon (\bar v - c). $$ Also, for sufficiently small positive $\epsilon$, $$ (T_\sigma \, \bar v)(w) \leq \max \left\{ \frac{w_2}{1-\beta}, \; c + \beta \bar v \right\} \leq \max \left\{ \bar v - \frac{c}{1-\beta}, \; \bar v - w_2 \right\} \leq \bar v - \epsilon (\bar v - c). $$ Since $R$ and hence $T_\sigma$ is order preserving, these facts tell us that $T_\sigma$ maps $V$ to itself, so $(V, \TT)$ is an ADP. Combining the last two $\epsilon$ bounds with fact (ii) above, we see that the conditions of {prf:ref}`t-riesz_con` are satisfied for every $\gamma \in \RR \setminus \{1\}$. As a result, the fundamental optimality properties hold and VFI, OPI, and HPI all converge. 
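The convergence claim can be illustrated with a minimal sketch of VFI on a finite wage grid. Everything concrete here is an assumption made for illustration: the grid, the IID uniform kernel `P` stored as a row-stochastic list of lists, the values of `c` and `beta`, and the function names. The sketch iterates the pointwise maximum of the stopping value $e$ and the continuation value $c + \beta Rv$, starting from the constant function $c$ at the bottom of $V$.

```python
def kreps_porteus(v, P, gamma):
    """Return Rv, where (Rv)(w) = (sum_w' v(w')**(1-gamma) P[w][w'])**(1/(1-gamma))."""
    rho = 1.0 - gamma  # requires gamma != 1 and strictly positive v
    return [sum(p * x**rho for p, x in zip(row, v)) ** (1.0 / rho) for row in P]


def bellman(v, wages, P, c, beta, gamma):
    """Pointwise maximum of the stopping value e(w) and the continuation value."""
    Rv = kreps_porteus(v, P, gamma)
    return [max(w / (1.0 - beta), c + beta * r) for w, r in zip(wages, Rv)]


def solve_vfi(wages, P, c, beta, gamma, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator from the constant function c."""
    v = [c] * len(wages)
    for _ in range(max_iter):
        v_new = bellman(v, wages, P, c, beta, gamma)
        if max(abs(a - b) for a, b in zip(v, v_new)) < tol:
            return v_new
        v = v_new
    return v


def reservation_wage(wages, P, c, beta, gamma):
    """Smallest grid wage at which stopping is weakly preferred to continuing."""
    v = solve_vfi(wages, P, c, beta, gamma)
    Rv = kreps_porteus(v, P, gamma)
    return min(w for w, r in zip(wages, Rv) if w / (1.0 - beta) >= c + beta * r)


# Hypothetical parameters with 0 < c < w_1 < w_2 and IID uniform offers on the grid.
n = 21
wages = [1.0 + i / (n - 1) for i in range(n)]  # w_1 = 1, w_2 = 2
P = [[1.0 / n] * n for _ in range(n)]
c, beta = 0.5, 0.9
res_wages = {g: reservation_wage(wages, P, c, beta, g) for g in (-2.0, 0.0, 0.5, 2.0)}
```

Under these assumptions, `res_wages` maps each value of $\gamma$ to a reservation wage on the grid. Setting `gamma = 0.0` reduces `kreps_porteus` to the ordinary conditional expectation, which gives a simple consistency check.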
{numref}`f-kp_job_search` shows the reservation wage $\bar w$ as a function of $\gamma$. The figure was computed as follows. Fixing $\gamma$, we calculated $\vmax$ via VFI, set $$ \sigma(w) \coloneq \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \left( \int [\vmax(w')]^{1-\gamma} P(w, \diff w') \right)^{1/(1-\gamma)} \right\} $$ and then set $\bar w \coloneq \min\{w \in \Wsf : \sigma(w) = 1\}$. Aside from $\gamma$, the parameters used in the calculations were the same as those given in {ref}`ss-numstud`. The figure shows that the reservation wage decreases with $\gamma$, as the job seeker becomes progressively more risk-averse. Increasing risk aversion means that gambles over future payoffs become less attractive, which favors stopping over continuing. This encourages the job seeker to lower the reservation wage. ```{figure} figures/kp_job_search.pdf :name: f-kp_job_search The reservation wage as a function of $\gamma$ ``` (ss-jsl)= ### Job Search with Learning Next we consider a variation of the job search model from §6.6 of {cite}`ljungqvist2012recursive`. The framework is the iid setting of {ref}`sss-jsdes`, apart from the fact that the wage offer distribution $\phi$ is unknown to the worker. Instead, the agent learns about $\phi$ by starting with a prior belief and then successively updating her beliefs based on observed wage offers. The learning process and hence the stopping problem is similar to the one we analyzed in {ref}`s-wald`. One difference is that we will use discounting, which simplifies optimality results. Another is that we will use a transformation to reduce dimensionality. (sss-jslo)= #### The Model The structure of information is as follows: The worker knows there are two possible offer distributions, with densities $f$ and $g$. At the start of time, nature selects $\phi$ to be either $f$ or $g$, the wage distribution from which the entire sequence $(W_t)_{t \geq 0}$ will be drawn. 
This choice is not observed by the worker, who puts prior probability $\pi_0$ on $f$ being chosen. In other words, the worker's initial guess of $\phi$ is $\pi_0 f(w) + (1 - \pi_0) g(w)$. Beliefs subsequently update according to Bayes' rule. Thus, the agent, having observed $W_{t+1}$, updates $\pi_t$ to $\pi_{t+1}$ via $$ \pi_{t+1} = \frac{f(W_{t+1})\pi_t}{f(W_{t+1}) \pi_t + g(W_{t+1}) (1 - \pi_t)}. $$ (eq-odu_pi_rec) (Note the connection to the learning dynamics we obtained for the sequential analysis problem in {eq}`eq-waldbayes`.) ```{prf:assumption} :label: a-fg The densities $f$ and $g$ are positive on $(0, M)$ and zero elsewhere. ``` Using {eq}`eq-odu_pi_rec`, we can formulate an ADP representation of the optimal stopping problem. Dropping time subscripts, let $\phi_{\pi} \coloneq \pi f + (1 - \pi) g$ represent the estimate of the wage offer distribution given belief $\pi$ and let $$ \kappa(w, \pi) \coloneq \frac{\pi f(w)}{\pi f(w) + (1 - \pi) g(w)} \qquad (w \in (0, M), \; \pi \in (0, 1)). $$ In particular, $\kappa(w, \pi)$ is the updated value $\pi'$ of $\pi$ having observed draw $w$. The state is $(w, \pi) \in (0, M) \times (0, 1)$ and $\pi$ is referred to as the **belief state**. The policy operators for this learning search problem take the form $$ (T_\sigma \, v)(w, \pi) = \sigma(w, \pi) \frac{w}{1 - \beta} + (1 - \sigma(w, \pi)) \left[ c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w' \right]. $$ Each $T_\sigma$ acts on $v \in V$, which we define as the set of bounded Borel measurable functions on $(0, M) \times (0, 1)$. Let $\TT = \setntn{T_\sigma }{\sigma \in \Sigma}$. Evidently $(V, \TT)$ is an ADP. ```{exercise} :label: ex-apps-auto-15 Given $v \in V$, show that $\sigma$ defined by $$ \sigma(w, \pi) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w' \right\} $$ is $v$-greedy for $(V, \TT)$. 
``` ```{exercise} :label: ex-apps-auto-16 Show that the Bellman equation for this ADP is given by $$ v(w, \pi) = \max \left\{ \frac{w}{1 - \beta}, \, c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w' \right\}. $$ (eq-odu_mvf) ``` (sss-jsle)= #### An Efficient Solution Method Rather than tackling $(V, \TT)$ directly, we will introduce a variation with a lower dimensional state space. To begin, fix $v \in V$ and let $\omega(\pi)$ be the corresponding reservation wage at belief state $\pi$, which is the wage level at which the worker is indifferent between accepting and rejecting. This value satisfies $$ \frac{\omega(\pi)}{1 - \beta} = c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w'. $$ (eq-odu_mvf2) We combine {eq}`eq-odu_mvf` and {eq}`eq-odu_mvf2` to obtain $$ v(w, \pi) = \max \left\{ \frac{w}{1 - \beta} ,\, \frac{\omega(\pi)}{1 - \beta} \right\} $$ and then use this expression to eliminate $v$ in {eq}`eq-odu_mvf2`, obtaining $$ \omega(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w'. $$ (eq-odu_mvf4) Equation {eq}`eq-odu_mvf4` can be understood as a functional equation in $\omega$. Equivalently, the map $\omega$ is the fixed point of the operator $\hat T$ given by $$ (\hat T \omega)(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w'. $$ (eq-odu_dq) When this fixed point is well-defined we call it the **optimal reservation wage function**. The value $\omega(\pi)$ will indicate the smallest wage offer at which the worker is willing to accept, given her current belief state $\pi$. ```{exercise} :label: p-odu_qc Prove that $\hat T$ is a contraction of modulus $\beta$ on $\hat V \coloneq b(0,1)$. ``` ```{solution} p-odu_qc Let $\| \cdot \|$ be the supremum norm on $b(0,1)$. First we show that $\hat T$ is a self-mapping on $b(0,1)$. 
To this end, pick any $\omega \in b(0,1)$ and consider the function $\hat T\omega$ defined by {eq}`eq-odu_dq`. Evidently $\hat T\omega$ is Borel measurable. To see that this function is bounded, observe that, by the triangle inequality and the fact that $\phi_\pi$ is a density, $$ |(\hat T \omega)(\pi)| \leq (1 - \beta) c + \beta \max\{ M , \| \omega \| \}. $$ The right hand side does not depend on $\pi$ so $\hat T\omega$ is bounded as claimed. Next let's establish the contraction property. Fix $\omega, \psi \in b(0,1)$ and $\pi \in (0, 1)$. Using the triangle inequality for integrals and the bound {eq}`eq-elmb2` yields $$ |(\hat T \omega)(\pi) - (\hat T \psi)(\pi)| \leq \beta \int \left| \omega [ \kappa(w', \pi) ] - \psi [ \kappa(w', \pi) ] \right| \, \phi_{\pi}(w') \, \diff w' \leq \beta \| \omega - \psi \|. $$ Taking the supremum over $\pi$ gives $\|\hat T \omega - \hat T \psi\| \leq \beta \| \omega - \psi \|$. ``` ```{exercise} :label: ex-apps-auto-17 Prove that $\hat T$ maps $bc(0,1)$ to itself. ``` ```{solution} ex-apps-auto-17 Let $\omega$ be bounded and continuous on $(0,1)$. To show that $\hat T \omega$ is continuous, we need to prove that $$ \int \max \left\{ w', \omega [ \kappa(w', \pi_n) ] \right\} \, \phi_{\pi_n}(w') \, \diff w' \to \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w' $$ when $(\pi_n)$ is a sequence converging to $\pi \in (0, 1)$. For fixed $w'$, both $\kappa(w', \pi)$ and $\phi_\pi(w')$ are continuous in $\pi$, so, by the dominated convergence theorem, it suffices to show that $$ H_n(w') \coloneq \max \left\{ w', \omega [ \kappa(w', \pi_n) ] \right\} \, \phi_{\pi_n}(w') $$ satisfies $\sup_n |H_n(w')| \leq H(w')$ for some $H \colon [0, M] \to \RR$ with $\int H(w') \diff w' < \infty$. Such an $H$ does indeed exist: one suitable choice is $$ H(w') \coloneq \max \left\{ M, \| \omega \| \right\} \, (f(w') + g(w')) . 
$$ ``` (parametric-monotonicity)= #### Parametric Monotonicity Let's try computing the optimal reservation wage function using the ideas described above. The wage offer distributions are set to $$ f = \text{Beta}(4, 2) \quad \text{and} \quad g = \text{Beta}(2, 4), $$ (eq-tdfg) as shown in {numref}`f-odu_2`. The other parameters are $c=0.1$ and $\beta = 0.95$. Since $\hat T$ is a contraction of modulus $\beta$ on $\hat V$, a unique solution $\omegaopt$ to the reservation wage functional equation exists in $\hat V$ and $\hat T^k \omega \to \omegaopt$ uniformly as $k \to \infty$, for any $\omega \in \hat V$. {numref}`f-odu_1` shows the result of this iteration, the optimal reservation wage, as a function of $\pi$, the belief state. ```{figure} figures/odu_2.pdf :name: f-odu_2 The two unknown densities $f$ and $g$ ``` ```{figure} figures/odu_1.pdf :name: f-odu_1 Optimal reservation wage function $\omegaopt$ ``` Note that the optimal reservation wage function $\omegaopt$ in {numref}`f-odu_1` is increasing in $\pi$. This result seems reasonable: In {numref}`f-odu_2`, the density $f$ puts more mass on higher draws, so, as our belief shifts toward $f$, our reservation wage should increase. The next proposition gives conditions for such monotonicity. ```{prf:proposition} :label: p-jsemlr If $f$ and $g$ have the monotone likelihood ratio property, then $\omegaopt$ is increasing in $\pi$. ``` ```{prf:proof} Let $ib(0,1)$ be all increasing functions in $b(0,1)$. As $ib(0,1)$ is closed in $b(0,1)$ (see, e.g., {prf:ref}`ex-icul`), it suffices to show that $\hat T$ is invariant on $ib(0,1)$. So pick any $\omega \in ib(0,1)$. Since $\hat T$ maps $b(0,1)$ to itself, we need only show that $\hat T\omega$ is increasing. For this it suffices to show that, with $$ h(w', \pi) \coloneq \omega \left[ \frac{\pi f(w')}{\pi f(w') + (1 - \pi) g(w')} \right] $$ the function $$ \pi \mapsto \int \max \left\{ w', h (w', \pi) \right\} \, \phi_{\pi}(w') \, \diff w' $$ is increasing. 
This will be true if we can establish that (i) $h$ is increasing in both $\pi$ and $w'$, and (ii) the map $\pi \mapsto \phi_\pi$ is isotone with respect to $\lefsd$. To see that (i) holds, write $h$ as $$ h(w', \pi) = \omega \left[ \frac{1}{1 + [(1 - \pi)/ \pi] [g(w')/f(w')]} \right]. $$ Since $\omega$ is increasing, this expression is increasing in $\pi$. Also, $f$ and $g$ are assumed to have the monotone likelihood ratio property, which means that $g(w')/f(w')$ is decreasing in $w'$, and hence $h(w', \pi)$ is increasing in $w'$. Thus, condition (i) is established. Condition (ii) follows from {prf:ref}`p-mlrisd`, along with the result of {prf:ref}`ex-fosdcvx`. ◻ ``` ```{exercise} :label: ex-fosdcvx Let $F$ and $G$ be two distributions on $\RR$ with $G \lefsd F$. Let $H_\alpha$ be the convex combination defined by $$ H_\alpha \coloneq \alpha F + (1-\alpha) G \qquad (0 \leq \alpha \leq 1). $$ Show that $\alpha_1 \leq \alpha_2$ implies $H_{\alpha_1} \lefsd H_{\alpha_2}$. ``` ```{solution} ex-fosdcvx For fixed $\alpha$ and any increasing bounded function $u$, we have $$ \int u \diff H_\alpha = \int u \diff G + \alpha \left( \int u \diff F - \int u \diff G \right). $$ By the fact that $G \lefsd F$ and $u$ is increasing, this expression is increasing in $\alpha$. Hence $\alpha_1 \leq \alpha_2$ implies $H_{\alpha_1} \lefsd H_{\alpha_2}$ as claimed. ``` ```{exercise} :label: ex-apps-auto-18 Show that $f$ and $g$ in {eq}`eq-tdfg` have the monotone likelihood ratio property. Hint: the Gamma function is increasing over the interval $[2, 4]$. ``` (ss-jsep)= ### Job Search with Separation We consider a version of the job search model from {ref}`sss-jsmwd` where separation can occur. In particular, an existing match between worker and firm dissolves with probability $\alpha$ every period. Note that this discussion extends a treatment of a similar model in a finite-state setting from Chapter 3 of {cite}`sargent2025dynamic`. 
To simplify the discussion, we assume that the set of possible wage offers $\Wsf \subset \RR_+$ is bounded above by some positive constant $M$. The state space for the problem is $\Xsf \coloneq \{e, u\} \times \Wsf$, with a typical element $(s, w)$ denoting employment status $s$ (here $e$ means employed and $u$ means unemployed), and current offer $w$. A policy is a Borel measurable map $\sigma \colon \Wsf \to \{0,1\}$, where, as usual, $\sigma(w)=0$ means "reject the current offer" and $\sigma(w)=1$ means "accept." The wage offer sequence $(W_t)$ is assumed to be $P$-Markov on $\Wsf$. The value space $V$ will be all bounded and Borel measurable $v \colon \Xsf \to \RR$ and we endow $V$ with the supremum norm and the pointwise partial order. The policy operators take the form $$ (T_\sigma \, v)(e, w) = w + \beta \left[ \alpha \int v(u, w') P(w, \diff w') + (1-\alpha) v(e, w) \right] $$ (eq-tve) and $$ (T_\sigma \, v)(u, w) = \sigma(w) v(e, w) + (1 - \sigma(w)) \left[ c + \beta \, \int v(u, w') P(w, \diff w') \right]. $$ (eq-tvu) The right-hand side of the first expression is the current value of being employed with offer $w$ in hand, given the continuation values embodied in $v$. The right-hand side of the second expression is the current value of being unemployed with offer $w$ in hand, conditional on using policy $\sigma$. We can solve this problem directly by setting up the corresponding ADP, but we can also start by simplifying the value space in a way we now describe. This will make the analysis easier and help with computation. The first step is to regard {eq}`eq-tve` as a fixed point problem, replacing $(T_\sigma \, v)(e, w)$ with $v(e, w)$ on the left hand side and treating $v(u, \cdot)$ as given. Simple algebra then gives $$ v(e, w) = \frac{1}{1 - \beta(1-\alpha)} \left[ w + \alpha \beta \int v(u, w') P (w, \diff w') \right]. $$ (eq-veee) Let's write this in operator notation. In doing so, we will rewrite $v(u, \cdot)$ as $v_u$ and $v(e, \cdot)$ as $v_e$. 
Setting $$ h(w) \coloneq \frac{1}{1 - \beta(1-\alpha)} w, \quad \text{and} \quad \gamma \coloneq \frac{\alpha \beta}{1 - \beta(1-\alpha)}, $$ we have $v_e = h + \gamma P v_u$. We substitute this expression into {eq}`eq-tvu` to get $$ T_\sigma \, v_u = \sigma(h + \gamma Pv_u) + (1 - \sigma) (c + \beta P v_u). $$ (eq-tsrr) We take $b\Wsf$ as the value space and let $\TT = \setntn{T_\sigma}{\sigma \in \Sigma}$. As before, $\Sigma$ is all Borel measurable maps from $\Wsf$ to $\{0,1\}$. Recalling that $\Wsf$ is bounded above, one can easily confirm that $T_\sigma$ maps $b\Wsf$ to itself. Clearly $T_\sigma$ is order preserving. Hence $(b\Wsf, \TT)$ is an ADP. Given $v_u \in b\Wsf$, set $v_e = h + \gamma P v_u$ and consider the policy $\sigma$ defined by $$ \sigma(w) \coloneq \1 \left\{ v_e(w) \geq c + \beta \, \int \, v_u(w') P(w, \diff w') \right\} \quad \text{for all } w \in \Wsf. $$ (eq-sjss) We claim that $\sigma$ is $v_u$-greedy. Indeed, for such a $\sigma$ and any alternative policy $s$ we have $$ T_s \, v_u = s(h + \gamma Pv_u) + (1 - s) (c + \beta P v_u) \leq (h + \gamma Pv_u) \vee (c + \beta P v_u) = T_\sigma \, v_u . $$ The expression for $\sigma$ in {eq}`eq-sjss` is natural because it tells the worker to accept employment whenever its value is at least as high as the expected present value of continuing, given the continuation value for unemployment associated with $v_u$. ```{exercise} :label: ex-tsjsc Prove the following: There exists a $\lambda \in (0,1)$ such that $T_\sigma$ is a contraction of modulus $\lambda$ on $b\Wsf$ for all $\sigma \in \Sigma$. ``` ```{solution} ex-tsjsc Let $K \coloneq \beta P$ and $J \coloneq \gamma P$. Fix $f, g \in b\Wsf$ and $\sigma \in \Sigma$. Pointwise on $\Wsf$, we have $$ |T_\sigma \, f - T_\sigma \, g| = |\sigma J (f - g) + (1 - \sigma) K(f - g)| \leq |J (f - g)| \vee |K(f - g)|. 
$$ But $|J (f - g)| \leq \gamma P |f-g|$ and hence $$ |J (f - g)| \leq \gamma \| f - g\|. $$ A similar argument shows that $|K (f - g)| \leq \beta \| f - g\|$. Hence $$ |T_\sigma \, f - T_\sigma \, g| \leq (\beta \| f - g\|) \vee (\gamma \| f - g\|) = (\beta \vee \gamma) \| f - g \|. $$ Taking the supremum over $\Wsf$ and using $\beta < 1$ and $\gamma < 1$, we see that $T_\sigma$ is a contraction of modulus $\beta \vee \gamma$, which does not depend on $\sigma$. ``` Regarding optimality, we have the following result. ```{prf:proposition} :label: p-jsmdo2 The ADP $(b\Wsf, \TT)$ is well-posed. Moreover, 1. the fundamental optimality properties hold, and 2. VFI, OPI and HPI all converge. ``` ```{prf:proof} We showed above that every $v_u \in b\Wsf$ has at least one greedy policy, so the ADP is regular. The claims in {prf:ref}`p-jsmdo2` now follow from {prf:ref}`ex-tsjsc` and {prf:ref}`t-contract`. ◻ ``` The value function $\vmax_u$ for an unemployed worker satisfies the recursion $$ \vmax_u(w) = \max \left\{ \vmax_e(w) ,\, c + \beta \, \int \vmax_u(w') P(w, \diff w') \right\} \qquad (w \in \Wsf), $$ (eq-jsvu) where $\vmax_e$ is the value function for an employed worker, that is, the lifetime value of a worker who starts the period employed at wage $w$. Value function $\vmax_e$ satisfies $$ \vmax_e(w) = w + \beta \left[ \alpha \int \vmax_u(w') P(w, \diff w') + (1-\alpha) \vmax_e(w) \right] \qquad (w \in \Wsf). $$ (eq-jsve) This equation states that the value accruing to an employed worker is the current wage plus the discounted expected value of being either employed or unemployed next period. We claim that, when $0 < \alpha, \beta < 1$, the system {eq}`eq-jsvu`--{eq}`eq-jsve` has a unique solution $(\vmax_e, \vmax_u)$ in $b\Wsf \times b\Wsf$. Solving {eq}`eq-jsve` for $\vmax_e(w)$ and substituting the result into {eq}`eq-jsvu` yields $$ \vmax_u(w) = \max \left\{ \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P \vmax_u)(w) \right) ,\, c + \beta \, (P \vmax_u)(w) \right\}. $$ (eq-jsvut) ```{exercise} :label: ex-apps-auto-19 Prove that there exists a unique $\vmax_u \in \vV$ that solves {eq}`eq-jsvut`. 
Propose a convergent method for computing both $\vmax_u$ and $\vmax_e$. ``` The stopping and continuation values are given by $$ \sopt(w) \coloneq \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P \vmax_u)(w) \right) \quad \text{and} \quad \hmax(w) \coloneq c + \beta \, (P \vmax_u)(w) $$ respectively, for each $w \in \Wsf$. The value function $\vmax_u$ is the pointwise maximum (i.e., $\vmax_u = \sopt \vee \hmax$). The worker's optimal policy while unemployed is $$ \sigopt(w) \coloneq \1\{\sopt(w) \geq \hmax(w)\}. $$ ## More Applications In this section we present several additional applications of the theory developed in earlier chapters. {ref}`ss-coase` studies problems with negative discounting and connects them to Coase's theory of the firm. {ref}`ss-oh` treats an optimal harvest problem and shows how factorization reduces dimensionality. {ref}`ss-euler` develops Euler equation methods for continuous choice problems, and {ref}`ss-eqvti` establishes the equivalence of value function iteration and time iteration. (ss-coase)= ### Coase Meets Bellman In this section, we study a production chain model that can capture the key idea from Coase's classic study on the nature of the firm {cite}`coase1937nature`. We then show how the equilibrium price function can also be recovered as the solution to a dynamic programming problem. The dynamic programming problem is itself interesting: it analyzes loss minimization with negative discounting, a scenario that appears to have significant empirical relevance. Moreover, negative discounting makes the application of dynamic programming nontrivial. We solve the problem using order stability arguments combined with the theory developed in {ref}`ss-suffcon`. (sss-apc)= #### A Production Chain {cite}`coase1937nature` argues that firms have nontrivial size because of transaction costs associated with using the market. (One example is the cost of negotiating, drafting, monitoring and enforcing contracts with suppliers. 
Others include search frictions, transaction fees, taxes, and information costs --- see, e.g., {cite}`williamson1979transaction,north1993institutions,blume2009trading`). Because of these costs, entrepreneurs and managers can sometimes coordinate production at a lower cost within the firm. At the same time, a countervailing force, referred to by {cite}`coase1937nature` as "diminishing returns to management," prevents firms from expanding without limit. Rising costs per task can be thought of as driven by the expanding informational requirements associated with larger planning problems, leading to progressively higher management costs, incentive problems and misallocation of resources. The size of firms is determined by the trade-off associated with these two forces (transaction costs and diminishing returns to management). Here we discuss a model that captures these ideas. In the model, firms produce a unit of a final good via sequential completion of processing stages. Stages are indexed by $t \in [0,1]$, with $t=1$ indicating that the good is complete. One allocation of tasks is illustrated in {numref}`f-coase_subp`. In this example, firm 1 sells one unit of the completed good to a final buyer. Firm 1 then contracts with firm 2 to purchase the partially completed good at stage $t_1$, with the intention of implementing the remaining $1 - t_1$ tasks in-house (i.e., processing from stage $t_1$ to stage $1$). Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at stage $t_2$. Firm 3 decides to complete the chain, selecting $t_3 = 0$. The value $t_i$ is called the **upstream boundary** of firm $i$. ```{figure} figures/coase_subp.svg :name: f-coase_subp Recursive allocation of production tasks ``` Production now unfolds from upstream to downstream. First, firm 3 completes processing stages from $t_3 = 0$ up to $t_2$ and transfers the good to firm 2. 
Firm 2 then processes from $t_2$ up to $t_1$ and transfers the good to firm 1, who processes from $t_1$ to $1$ and delivers the completed good to the final buyer. In what follows, the length of the interval of stages carried out by firm $i$ is denoted by $a_i$ and referred to as the **range** of tasks carried out by firm $i$. {numref}`f-coase_no` helps to clarify notation. ```{figure} figures/coase_no.svg :name: f-coase_no Notation ``` There is a countable infinity of ex ante identical firms and no fixed costs or barriers to entry. An **allocation** is a nonnegative sequence $A = \{a_i\}_{i \in \NN}$. An allocation $A$ defines a division of tasks across firms, with $a_i$ being the range of tasks implemented by the $i$-th firm. If $a_i = 0$ then firm $i$ is understood to be inactive. An allocation $A$ is called **feasible** if there exists some finite $I$ with $\sum_{i = 1}^I a_i = 1$. Feasibility means that the entire production process is completed by finitely many firms. Given a feasible allocation $A$, let $\{t_i\}$ represent the corresponding transaction stages, defined by $t_0 = 1$ and $t_i = t_{i-1} - a_i$. In particular, $t_{i-1}$ is the downstream boundary of firm $i$ and $t_i$ is its upstream boundary (as in {numref}`f-coase_no`). Firms face a price function $p \colon [0, 1] \to \RR_+$, with $p(t)$ indicating the price of the good at stage $t$. Since the $i$-th firm purchases the good at stage $t_i$, sells it at stage $t_{i-1}$, and undertakes the remaining $a_i$ tasks in-house, its total costs are its processing costs $c(a_i)$ plus gross input cost $\delta p(t_i)$. The term $\delta > 1$ represents transaction costs. Transaction costs are incurred only by the buyer (a simplifying assumption), so its profits are $$ \pi_i = p(t_{i-1}) - c(a_i) - \delta p(t_i) . $$ (eq-coase_prof) Diminishing returns to management are implemented by assuming that $c$ is increasing and strictly convex. 
We also assume that $c$ is continuously differentiable, with $c(0)=0$ and $c'(0) > 0$. ```{prf:definition} :label: d-coase_eq Given a price function $p$ and a feasible allocation $A = \{a_i\}$, let $\{t_i\}$ be the corresponding firm boundaries and let $\{\pi_i\}$ be the corresponding profits. The pair $(p, A)$ is called an **equilibrium** for the production chain if 1. $\; p(0) = 0$, 2. $\; \pi_i = 0$ for all $i$, and 3. $p(s) - c(s - t) - \delta p(t) \leq 0$ for any pair $s, t$ with $0 \leq t \leq s \leq 1$. ``` Condition (i) is a zero profit condition for suppliers of initial inputs (which are costless). Condition (ii) states that all active firms make zero profits. Condition (iii) ensures that no firm in the production chain has an incentive to deviate, and that no inactive firm can enter and extract positive profits. To construct an equilibrium we introduce the operator $T$ mapping $p$ to $Tp$ via $$ (Tp)(s) = \min_{t \leq s} \, \{ c(s-t) + \delta p(t) \} \qquad (0 \leq s \leq 1). $$ (eq-coase_deft) Here and below, the restriction $0 \leq t$ in the minimum is understood. Since $\delta > 1$, the map $T$ is not a contraction in any obvious metric, and $T^n p$ diverges for many choices of $p$, even when continuous and bounded.[^1] Nevertheless, there exists a domain on which $T$ is well-behaved: the set of convex increasing continuous functions $p \colon [0,1] \to \RR$ such that $c'(0) s \leq p(s) \leq c(s)$ for all $0 \leq s \leq 1$. We denote this set of functions by $\pP$. ```{prf:proposition} :label: t-coase_bk Under the stated assumptions, 1. $T$ maps $\pP$ into itself, 2. $T$ has a unique fixed point in $\pP$, denoted below by $p^*$, and 3. for all $p \in \pP$, we have $T^k p \to p^*$ uniformly as $k \to \infty$. ``` This result is proved in {cite:t}`kikuchi2018span`. In {ref}`sss-ond` below, we give an alternative proof by constructing a dynamic program whose value function coincides with $p^*$. 
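The iteration $T^k p \to p^*$ can be sketched on a grid. The following is a rough illustration under stated assumptions rather than a definitive implementation: stages $s$ and $t$ are restricted to an evenly spaced grid on $[0,1]$, the cost function is taken to be $c(a) = e^{\theta a} - 1$ with $\theta = 10$, and $\delta = 1.2$. The function names are hypothetical. The initial condition is $p = c$, which is convex and increasing and satisfies $c'(0) s \leq c(s)$, so it lies in $\pP$.

```python
import math


def coase_T(p, costs, delta):
    """Grid version of (Tp)(s) = min_{0 <= t <= s} { c(s - t) + delta * p(t) }.

    p[j] is the price at grid point j and costs[k] = c(k * h) for grid step h.
    """
    return [min(costs[i - j] + delta * p[j] for j in range(i + 1))
            for i in range(len(p))]


def solve_price(cost, delta, n=201, tol=1e-12, max_iter=5000):
    """Iterate T from p = c on an evenly spaced grid over [0, 1]."""
    grid = [i / (n - 1) for i in range(n)]
    costs = [cost(s) for s in grid]
    p = costs[:]  # start from p = c
    for _ in range(max_iter):
        p_new = coase_T(p, costs, delta)
        err = max(abs(a - b) for a, b in zip(p, p_new))
        p = p_new
        if err < tol:
            break
    return grid, costs, p


theta, delta = 10.0, 1.2  # illustrative values (assumptions)
grid, costs, p_star = solve_price(lambda a: math.exp(theta * a) - 1.0, delta)
```

On the grid, the $k$-th iterate from $p = c$ is the least cost of delivering each stage with at most $k + 1$ firms, so the iterates decrease and settle after finitely many steps. The approximate fixed point `p_star` can then be checked against the bounds $c'(0) s \leq p(s) \leq c(s)$ that define $\pP$.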
The significance of $p^*$ is that there exists an allocation $A^*$ such that $(p^*, A^*)$ is an equilibrium for the production chain in the sense of {prf:ref}`d-coase_eq`. To construct this allocation, we introduce the **equilibrium choice function** $$ t^*(s) \coloneq \argmin_{ t \leq s} \{c(s - t) + \delta p^*(t) \}. $$ (eq-coase_argmins) By definition, $t^*(s)$ is the cost minimizing upstream boundary for a firm contracted to deliver the good at stage $s$ and facing the price function $p^*$. Since $p^*$ lies in $\pP$ and since $c$ is strictly convex, the minimizer $t^*(s)$ exists and is uniquely defined. We can use $t^*$ to construct an equilibrium allocation. The optimal upstream boundary of firm 1 is $t^*(1)$. Hence firm 2's optimal upstream boundary is $t^*(t^*(1))$. Continuing in this way produces the sequence $\{t^*_i\}$ defined by $$ t_0^* = 1 \quad \text{and} \quad t_i^* = t^*(t^*_{i-1}) . $$ (eq-coase_ralloc) The sequence ends when a firm chooses to complete all remaining tasks. We label this firm (and hence the number of firms in the chain) as $n^*$, defined by $n^* \coloneq \inf\setntn{i \in \NN}{t^*_i = 0}$. The task allocation corresponding to {eq}`eq-coase_ralloc` is given by $a_i^* \coloneq t_{i-1}^* - t_i^*$ for all $i$. The function $p^*$ is called the **equilibrium price function** and $A^*$ is called the **equilibrium allocation**. {cite}`kikuchi2018span` prove the following result: ```{prf:proposition} :label: t-coase_bk2 The value $n^*$ is well-defined and finite, the allocation $A^* = \{a^*_i\}$ is feasible, and the pair $(p^*, A^*)$ is an equilibrium for the production chain. ``` The idea of the proof is as follows. As a fixed point of $T$, the equilibrium price function satisfies $$ p^*(s) = \min_{t \leq s} \, \{ c(s - t) + \delta p^*(t) \} \quad \text{for all} \quad s \in [0, 1]. $$ (eq-coase_epe) From this equation it is clear that $p^*$ satisfies part (iii) of {prf:ref}`d-coase_eq`. 
Moreover, the equilibrium upstream boundary for firm $i$ is the minimizer in {eq}`eq-coase_epe` when $s$ is its downstream boundary, so profits are zero for all incumbent firms. Hence part (ii) of {prf:ref}`d-coase_eq` is satisfied. Part (i) of the definition is immediate from the fact that $p^* \in \pP$, whence we obtain $0 = c'(0) \cdot 0 \leq p^*(0) \leq c(0) = 0$. More details can be found in {cite:t}`kikuchi2018span`. The first order condition for the minimization in {eq}`eq-coase_epe` is $$ \delta (p^*)'(t^*(s)) = c'(s - t^*(s)). $$ The left-hand side is the marginal cost of purchasing an additional unit of input through the market (including the transaction cost markup $\delta$), while the right-hand side is the marginal cost of performing one more task in-house. At the optimum, these two forces balance, just as {cite}`coase1937nature` argued verbally: "a firm will tend to expand until the costs of organizing an extra transaction within the firm become equal to the costs of carrying out the same transaction by means of an exchange on the open market." For more discussion (and a proof of the differentiability of $p^*$), see {cite}`kikuchi2018span`. {numref}`f-coase_firm_boundaries` shows equilibrium prices and allocations for two values of the transaction cost parameter, using the exponential cost function $c(a) = e^{\theta a} - 1$ with $\theta=10$. The vertical lines are firm boundaries, computed via {eq}`eq-coase_ralloc`. When $\delta = 1.02$ (top panel), transaction costs are low and we see many small firms. When $\delta = 1.2$ (bottom panel), higher transaction costs encourage less use of the market and more in-house production, yielding fewer, larger firms. 
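The boundary recursion {eq}`eq-coase_ralloc` can be sketched as follows. This is a minimal illustration under assumptions, not the code used for the figure: stages are restricted to a grid of 201 points, the price function is approximated by iterating $T$ from $p = c$, and the function names are hypothetical. The cost function and the two values of $\delta$ are those quoted above.

```python
import math


def solve_price(costs, delta, tol=1e-12, max_iter=5000):
    """Approximate p^* on the grid by iterating T from p = c."""
    n = len(costs)
    p = costs[:]
    for _ in range(max_iter):
        p_new = [min(costs[i - j] + delta * p[j] for j in range(i + 1))
                 for i in range(n)]
        err = max(abs(a - b) for a, b in zip(p, p_new))
        p = p_new
        if err < tol:
            break
    return p


def firm_boundaries(costs, p, delta):
    """Grid indices of t_0^* = 1, t_1^*, ..., ending at the first boundary equal to 0."""
    i = len(costs) - 1
    boundaries = [i]
    while i > 0:
        # Search over t < s only: t = s costs delta * p(s) > p(s), so it is
        # never optimal when s > 0, and excluding it guarantees termination.
        i = min(range(i), key=lambda j: costs[i - j] + delta * p[j])
        boundaries.append(i)
    return boundaries


n, theta = 201, 10.0
costs = [math.exp(theta * k / (n - 1)) - 1.0 for k in range(n)]  # c on the grid
chains = {}
for delta in (1.02, 1.2):
    p = solve_price(costs, delta)
    chains[delta] = [j / (n - 1) for j in firm_boundaries(costs, p, delta)]

# Number of active firms is the number of boundaries minus one.
num_firms = {delta: len(chain) - 1 for delta, chain in chains.items()}
```

Comparing `num_firms[1.02]` with `num_firms[1.2]` gives a numerical counterpart to the pattern described above, with the caveat that the grid forces firm ranges to be multiples of the grid step.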
```{figure} figures/coase_firm_boundaries.pdf :name: f-coase_firm_boundaries Firm boundaries for $\delta = 1.02$ (top) and $\delta = 1.2$ (bottom) ``` (sss-ond)= #### Optimality with Negative Discounting Interestingly, we can write down a dynamic program such that the value function corresponds with the equilibrium price function defined above. The dynamic program involves negative discounting and is itself useful and informative. Some background discussion and motivation is provided in {ref}`s-cn_apps`. In the model, an agent is faced with a task of measure $\hat x > 0$. At time $t$ they have $x_t$ units of the task remaining. They then take action $a_t \geq 0$ and incur loss $c(a_t)$. The state moves to $x_t - a_t$. We interpret $a_t$ as effort and $c(a_t)$ as disutility. The optimization problem is $$ \min_{\{a_t\}} \; \sum_{t=0}^{\infty} \delta^t c(a_t) \;\; \text{ s.t. } \;\; \sum_{t=0}^{\infty} a_t = \hat x. $$ (eq-dpm) Throughout this section, we suppose that $\delta > 1$, that $c(0) = 0$, and that $c$ is continuously differentiable, strictly increasing and strictly convex. The convexity in $c$ encourages the agent to defer some effort. Negative discounting ($\delta > 1$) has the opposite effect: the agent wants to get the task "over and done with." This trade-off determines the optimum. We also assume that $c'(0) > 0$ and that there exists an $\eta \in (0, \hat x)$ satisfying $$ c'(\eta) = \delta c'(0). $$ (eq-coase_eta_eq) Such an $\eta$ exists and is unique whenever $\hat x$ is large enough, since $c'$ is continuous and strictly increasing with $c'(0) < \delta c'(0)$. Let $V$ be all increasing $v \colon [0, \hat x] \to \RR$ with $v(0)=0$. The policy operators for this dynamic program take the form $$ (T_\sigma \, v)(x) = c(x - \sigma(x)) + \delta v(\sigma(x)) \qquad (v \in V, \; 0 \leq x \leq \hat x). $$ Here a **policy** is a function $\sigma \colon [0, \hat x] \to [0, \hat x]$ satisfying 1. $0 \leq \sigma(x) \leq x$ for all $x$, 2. 
$\sigma$ is increasing,
3. the effort level $\pi(x) \coloneq x - \sigma(x)$ is increasing, and
4. $\sigma(x) = 0$ if and only if $x \leq \eta$.

Let $\Sigma$ be the set of all such policies.

```{prf:remark}
:label: r-r

We have chosen the policy set carefully. This is necessary because $\delta > 1$ means that the policy operators are not contractions. We need to restrict $\Sigma$ in order to ensure good behavior. (In fact we chose $\Sigma$ by setting up the Bellman equation and guessing properties that should be satisfied by the optimal policy.)
```

Note that (ii) and (iii) together imply that $\sigma$ is $1$-Lipschitz, and hence continuous. Note also that $T_\sigma v \in V$ whenever $\sigma \in \Sigma$ and $v \in V$. This follows from $(T_\sigma v)(0) = c(0) + \delta v(0) = 0$ and the fact that $T_\sigma v$ is increasing, which holds because $\sigma$, $\pi$, $c$ and $v$ are all increasing.

Let $\TT$ be the set of all policy operators. Each $T_\sigma$ is an order-preserving self-map on $V$, so $(V, \TT)$ is an ADP.

The next exercise characterizes iterates of $T_\sigma$ and can be solved by induction.

```{exercise}
:label: ex-nd_iter

Show that, for all $\sigma \in \Sigma$, $k \in \NN$ and $v \in V$,

$$
    (T_\sigma^k v)(x) = \sum_{j=0}^{k-1} \delta^j \, c(\pi(\sigma^j(x))) + \delta^k v(\sigma^k(x)).
$$ (eq-coase_tsig_k)
```

```{solution} ex-nd_iter
Induction on $k$. The base case $k=1$ is the definition of $T_\sigma$. Assuming {eq}`eq-coase_tsig_k` holds for $k$,

$$
\begin{aligned}
    (T_\sigma^{k+1} v)(x)
    &= c(\pi(x)) + \delta \, (T_\sigma^k v)(\sigma(x)) \\
    &= c(\pi(x)) + \delta \sum_{j=0}^{k-1} \delta^j \, c(\pi(\sigma^{j+1}(x))) + \delta^{k+1} v(\sigma^{k+1}(x)) \\
    &= \sum_{j=0}^{k} \delta^j \, c(\pi(\sigma^j(x))) + \delta^{k+1} v(\sigma^{k+1}(x)).
\end{aligned}
$$
```

Somewhat surprisingly, each $T_\sigma$ is well-behaved on $V$. The next lemma gives details.
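Before turning to the lemma, it may help to check {eq}`eq-coase_tsig_k` numerically for one concrete policy. The sketch below is purely illustrative: it borrows the exponential cost $c(a) = e^{\theta a} - 1$ with $\theta = 10$ and $\delta = 1.2$, sets $\hat x = 1$, and uses the hypothetical policy $\sigma(x) = \max\{x - \eta, 0\}/2$, which satisfies conditions (i) to (iv). It compares repeated application of $T_\sigma$ with the closed-form sum, and then checks that two different starting points in $V$ give the same result after $\lceil \hat x / \eta \rceil$ steps.

```python
import math

delta, theta, x_hat = 1.2, 10.0, 1.0     # illustrative parameter values
eta = math.log(delta) / theta            # solves c'(eta) = delta * c'(0) for this c

def c(a):
    return math.exp(theta * a) - 1.0

def sigma(x):
    # Hypothetical policy: zero iff x <= eta, with sigma and pi both increasing
    return 0.5 * max(x - eta, 0.0)

def pi(x):
    return x - sigma(x)

def T_sigma(v):
    """Policy operator: (T_sigma v)(x) = c(x - sigma(x)) + delta * v(sigma(x))."""
    return lambda x: c(x - sigma(x)) + delta * v(sigma(x))

def iterate(v, k):
    """Return T_sigma^k v by applying T_sigma k times."""
    for _ in range(k):
        v = T_sigma(v)
    return v

def closed_form(v, k, x):
    """Right-hand side of the iterate formula, evaluated at x."""
    total, state = 0.0, x
    for j in range(k):
        total += delta**j * c(pi(state))
        state = sigma(state)
    return total + delta**k * v(state)

v1 = lambda x: x            # two increasing functions with v(0) = 0
v2 = lambda x: c(x)
points = [0.0, 0.01, 0.3, 0.7, 1.0]
k0 = math.ceil(x_hat / eta)

formula_ok = all(
    math.isclose(iterate(v1, 3)(x), closed_form(v1, 3, x), rel_tol=1e-9, abs_tol=1e-12)
    for x in points
)
stable_ok = all(
    math.isclose(iterate(v1, k0)(x), iterate(v2, k0)(x), rel_tol=1e-9, abs_tol=1e-12)
    for x in points
)
print(formula_ok, stable_ok)
```

For this particular policy the state reaches zero well before $\lceil \hat x / \eta \rceil$ steps, because savings are halved at each step.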
```{prf:lemma} :label: l-ndos Each $T_\sigma$ has a unique fixed point $v_\sigma$ in $V$. Moreover, there exists a $k_0 \in \NN$ such that $T_\sigma^k v = v_\sigma$ for all $v \in V$ and all $k \geq k_0$. ``` ```{prf:proof} Fix $\sigma \in \Sigma$. Condition (iii) yields $\pi(x) \geq \pi(\eta) = \eta$ for all $x \geq \eta$, so $\sigma(x) \leq x - \eta$ whenever $x \geq \eta$, and hence $\sigma^k(x) = 0$ for all $k \geq \lceil x / \eta \rceil$. Setting $k_0 \coloneq \lceil \hat x / \eta \rceil$ and using {eq}`eq-coase_tsig_k` with $v(0) = 0$ gives $$ (T_\sigma^k v)(x) = \sum_{j=0}^{k_0-1} \delta^j \, c(\pi(\sigma^j(x))) \eqcolon v_\sigma(x) \qquad \text{for all } k \geq k_0. $$ (eq-coase_vsig) The right-hand side is independent of $v$, so all orbits reach $v_\sigma$ in at most $k_0$ steps. In particular, taking $v = v_\sigma$ gives $T_\sigma^k v_\sigma = v_\sigma$ for all $k \geq k_0$, and hence also for $k = 1$ (since $T_\sigma v_\sigma = T_\sigma^{k_0+1} v_\sigma = v_\sigma$, using $k_0 + 1 \geq k_0$). Thus $v_\sigma$ is a fixed point, lying in $V$ since $v_\sigma(0) = 0$ and each term $\delta^j c(\pi(\sigma^j(\cdot)))$ is increasing. If $v \in V$ is any other fixed point, then $v = T_\sigma^{k_0} v = v_\sigma$. ◻ ``` Let $V_0$ be all convex continuous $v \in V$ satisfying $c'(0) x \leq v(x) \leq c(x)$ for all $0 \leq x \leq \hat x$. The proofs of the next two lemmas are straightforward but lengthy, so we present them as exercises. ```{prf:lemma} :label: l-ndeg If $v \in V_0$, then a $v$-min-greedy policy exists. Moreover, the Bellman operator $T$ is well-defined on $V_0$ and, for each $v \in V_0$, we have $$ (Tv)(x) = \min_{a \leq x} \, \{ c(x-a) + \delta v(a) \} \quad \text{for all} \quad x \in [0, \hat x]. $$ ``` Here and below, the restriction $0 \leq a$ in the minimum is understood. ```{exercise} :label: ex-nd_greedy Prove {prf:ref}`l-ndeg`. ``` ```{solution} ex-nd_greedy Fix $v \in V_0$ and let $g(x, y) \coloneq c(x - y) + \delta v(y)$. 
Since $v$ is continuous and $[0, x]$ is compact, the minimizer $\sigma(x) \coloneq \argmin_{0 \leq y \leq x} g(x, y)$ exists for every $x$. We verify that $\sigma \in \Sigma$. Condition (i) holds by the constraint. For condition (iv), first suppose $x \leq \eta$. If $\sigma(x) = y > 0$, then $g(x, y) < g(x, 0) = c(x)$, i.e., $c(x - y) + \delta v(y) < c(x)$. Since $v(y) \geq c'(0) y$ and $\delta c'(0) = c'(\eta) \geq c'(x)$ (because $x \leq \eta$ and $c'$ is increasing), we get $\delta v(y) \geq c'(x) y$. But then $c(x-y) + c'(x) y < c(x)$, contradicting convexity of $c$. Hence $\sigma(x) = 0$. Conversely, suppose $\sigma(x) = 0$, so $g(x, 0) \leq g(x, y)$ for all $y \leq x$. Since $v(y) \leq c(y)$, this gives $c(x) \leq c(x - y) + \delta c(y)$ for all $y \leq x$, and hence $(c(x) - c(x-y))/y \leq \delta c(y)/y$. Taking $y \to 0$ yields $c'(x) \leq \delta c'(0) = c'(\eta)$, so $x \leq \eta$. For conditions (ii) and (iii), observe that $g$ has strictly decreasing differences: for $x_2 > x_1$, the function $y \mapsto g(x_2, y) - g(x_1, y) = c(x_2 - y) - c(x_1 - y)$ is strictly decreasing (since $c$ is strictly convex). By monotone comparative statics, the minimizer $\sigma(x)$ is increasing in $x$. Similarly, writing $g(x, y) = c(x - y) + \delta v(y)$ in terms of the effort $a = x - y$ as $\tilde g(x, a) = c(a) + \delta v(x - a)$, the function $\tilde g$ also has decreasing differences in $(x, a)$ (since $v$ is convex), so the minimizing effort $\pi(x) = x - \sigma(x)$ is increasing in $x$. Finally, $Tv = T_\sigma v$ by construction, so $T$ is well-defined on $V_0$. ``` ```{prf:lemma} :label: l-ndtu $T$ maps $V_0$ into itself and has a unique fixed point $\bar v$ in $V_0$. Moreover, $T^k v = \bar v$ for all $v \in V_0$ and all $k \geq k_0 \coloneq \lceil \hat x / \eta \rceil$. ``` ```{exercise} :label: ex-nd_bellman Prove {prf:ref}`l-ndtu`. ``` ```{solution} ex-nd_bellman We first show that $T$ maps $V_0$ into itself. Fix $v \in V_0$ and let $w = Tv$. 
We have $w(0) = c(0) = 0$ and $w(x) \leq c(x)$ (evaluating the minimum at $y = 0$). For the lower bound, $c(x-y) \geq c'(0)(x-y)$ and $v(y) \geq c'(0)y$ give $c(x-y) + \delta v(y) \geq c'(0)(x-y) + \delta c'(0)y \geq c'(0) x$, so $w(x) \geq c'(0) x$. Continuity of $w$ follows from Berge's theorem. For convexity, let $x_\lambda = \lambda x_1 + (1-\lambda)x_2$ and let $y_i$ denote the minimizer at $x_i$. Setting $y_\lambda = \lambda y_1 + (1-\lambda) y_2 \leq x_\lambda$, convexity of $c$ and $v$ gives $$ w(x_\lambda) \leq c(x_\lambda - y_\lambda) + \delta v(y_\lambda) \leq \lambda w(x_1) + (1-\lambda) w(x_2). $$ Finally, monotonicity of $w$ follows from monotonicity of $\sigma$ and $\pi$ (established in {prf:ref}`l-ndeg`), since $w(x) = c(\pi(x)) + \delta v(\sigma(x))$ and all constituent functions are increasing. Hence $w \in V_0$. We now show that $T$ converges in finitely many steps. By {prf:ref}`l-ndeg`, the minimizer $\sigma(x)$ of $Tv$ satisfies $\sigma(x) = 0$ for $x \leq \eta$. Hence $Tv(x) = c(x)$ for all $x \leq \eta$, regardless of $v \in V_0$. We claim by induction that $T^k v$ is independent of $v \in V_0$ on $[0, k\eta]$. The base case $k = 1$ was just established. For the inductive step, suppose $T^k v$ agrees with a fixed function $\bar v$ on $[0, k\eta]$ for every $v \in V_0$. Since $T^k v \in V_0$ (as $T$ maps $V_0$ into itself), {prf:ref}`l-ndeg` gives a minimizer $\sigma(x)$ for $T(T^k v)$ at each $x$ with $\sigma(x) \leq x - \eta$. For $x \leq (k+1)\eta$, this gives $\sigma(x) \leq k\eta$, so $T^k v(\sigma(x)) = \bar v(\sigma(x))$. Hence $T^{k+1} v(x) = \min_{0 \leq y \leq x}\{c(x-y) + \delta T^k v(y)\}$ depends only on $\bar v$ on $[0, k\eta]$, and is therefore independent of $v$. This completes the induction. For $k_0 = \lceil \hat x / \eta \rceil$, the function $T^{k_0} v = \bar v$ is independent of $v \in V_0$ on all of $[0, \hat x]$. 
Since $\bar v = T^{k_0} v \in V_0$ for any $v \in V_0$, and $T\bar v = T^{k_0+1} v = \bar v$ (as $k_0 + 1 \geq k_0$), $\bar v$ is a fixed point of $T$ in $V_0$. Uniqueness follows as in {prf:ref}`l-ndos`: if $q \in V_0$ satisfies $Tq = q$, then $q = T^{k_0} q = \bar v$. ``` ```{prf:theorem} :label: t-coase_ndbk Under the stated conditions, the fundamental optimality properties hold for $(V, \TT)$. In particular, the unique fixed point $\bar v$ of $T$ in $V_0$ is the min-value function $\vmin$, and a policy $\sigma \in \Sigma$ is optimal if and only if it satisfies $$ \sigma(x) \in \argmin_{a \leq x} \, \{ c(x-a) + \delta \vmin(a) \} \quad \text{for all } x \in [0, \hat x]. $$ There exists exactly one optimal policy. ``` ```{prf:proof} We apply {prf:ref}`c-minbk`. By {prf:ref}`l-ndos`, each $T_\sigma$ is globally stable on $V$. Since each $T_\sigma$ is also order preserving, {prf:ref}`l-pspace` gives order stability of $(V, \TT)$. By {prf:ref}`l-ndeg`, every $v \in V_0$ admits a $v$-min-greedy policy, so $V_0 \subset V^G_\triangledown$. By {prf:ref}`l-ndtu`, $T$ has a fixed point $\bar v \in V_0 \subset V^G_\triangledown$. Hence {prf:ref}`c-minbk` applies and the fundamental min-optimality properties hold, giving $\bar v = \vmin$. Finally, the optimal policy is unique because $a \mapsto c(x - a) + \delta \vmin(a)$ is strictly convex (as $c$ is strictly convex and $\vmin$ is convex), so the minimizer is unique at each $x$. ◻ ``` ##### Connection to the production chain. Setting $\hat x = 1$, the Bellman operator in {prf:ref}`t-coase_ndbk` coincides with the operator $T$ defined in {eq}`eq-coase_deft`, and $V_0 = \pP$. {prf:ref}`l-ndtu` then provides an alternative proof of {prf:ref}`t-coase_bk`: part (i) is the statement that $T$ maps $V_0 = \pP$ into itself, part (ii) follows from uniqueness of $\bar v$ in $V_0 = \pP$ (with $p^* = \bar v = \vmin$), and part (iii) is the finite-step convergence $T^k p \to \bar v$ for all $p \in \pP$. 
In particular, the equilibrium price function of the production chain coincides with the value function of the negative-discounting dynamic program. (ss-oh)= ### Optimal Harvests In this section we examine a model of forestry management and optimal harvests. We set up the problem in {ref}`sss-serh` and then show in {ref}`sss-fact_oh` how factorization reduces the dimensionality of the state space. {ref}`sss-oh_conv` discusses when optimality for the subordinate ADP implies optimality for the primary. (sss-serh)= #### Setup A manager controls a timber plantation with biomass $s_t$ at time $t$. The unit price for timber at time $t$ is $p_t$. The manager observes $(s_t, p_t)$ and decides whether to harvest or not. A decision to harvest generates revenue $s_t p_t$. If she chooses to wait, then time updates to the next period and the process repeats. Biomass takes values in $\Ssf$, a closed and bounded interval in $\RR_+$, and evolves according to $s_{t+1} = q(s_t)$, where $q$ is a continuous self-map on $\Ssf$. If $q(0) > 0$, then the plantation regenerates after each harvest. If not, the plantation never regenerates and the problem below is an optimal stopping problem. We assume that the price sequence $(p_t)$ is iid with distribution $\phi$ on closed and bounded interval $\Esf \subset \RR_+$. The cost of harvesting given biomass $s$ is $m(s)$. The cost of maintaining the plantation for one period, rather than harvesting, is $c(s)$. Both $m$ and $c$ are continuous real-valued functions on $\Ssf$. The firm is risk neutral and discounts the future using discount factor $\beta < 1$. The state space for the model is $\Ssf \times \Esf$. The Bellman equation can be expressed as $$ v(s, p) = \max \\ \left\{ p s - m(s) + \beta \int v(q(0), p') \phi(\diff p') ,\; - c(s) + \beta \int v(q(s), p') \phi(\diff p') \right\}. 
\; $$ (eq-harbell) Alternatively, we can write $$ (T v)(s, p) = \max_a \left\{ r(s, p, a) + \beta \int v[f(s, a), p'] \phi(\diff p') \right\}, $$ (eq-hb2) where $a$ takes values in $\{0, 1\}$, and $$ r(s, p, a) \coloneq a (p s - m(s)) - (1-a) c(s) \quad \text{and} \quad f(s, a) \coloneq q[(1 - a)s]. $$ Biomass $s$ takes values in $\Ssf$, while the price $p$ takes values in $\Esf$. Both $\Ssf$ and $\Esf$ are closed and bounded intervals in $\RR_+$. The functions $m$ and $c$ are in $bc\Ssf$, while $q$ is a continuous self-map on $\Ssf$. A feasible policy $\sigma$ is a measurable map from $\Ssf \times \Esf$ to $\{0, 1\}$, with $0$ indicating the decision not to harvest and $1$ indicating harvest. The policy operator corresponding to this model is $$ (T_\sigma \, v)(s, p) \coloneq r_\sigma(s, p) + \beta \int v[f(s, \sigma(s, p)), p'] \phi(\diff p'), $$ where $$ r_\sigma(s, p) = r(s, p, \sigma(s, p)) \qquad (\sigma \in \Sigma, \; s \in \Ssf, \; p \in \Esf). $$ With $\Sigma$ as the set of all feasible policies, $V$ as the set of bounded measurable real-valued functions on $\Ssf \times \Esf$, and $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$, the pair $(V, \TT)$ is an ADP with Bellman operator equal to {eq}`eq-hb2`. (sss-fact_oh)= #### Factorization We can reduce dimensionality for this ADP via a transformation. To construct it, we set $\hat V$ to be the bounded measurable real-valued functions on $\Ssf \times \{0,1\}$. If we set $$ \begin{aligned} (Fv)(s, a) & \coloneq \int v[f(s, a), p'] \phi(\diff p') \qquad (v \in V), \\ (G_\sigma \, w)(s, p) & \coloneq r_\sigma(s, p) + \beta w(s, \sigma(s, p)) \qquad (w \in \hat V), \end{aligned} $$ and $\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}$, then 1. $(V, F, \hat V, \GG)$ is an order-preserving FDP and 2. $(V, \TT)$ is the primary ADP. ```{exercise} :label: ex-apps-auto-20 Confirm these claims. ``` ```{solution} ex-apps-auto-20 We verify the conditions in the definition of an order-preserving FDP. 
The map $F$ is order preserving because, for $u \leq v$ pointwise, we have $u[f(s,a), p'] \leq v[f(s,a), p']$ for all $p'$, and integration preserves this inequality. Each $G_\sigma$ is order preserving because $w \leq w'$ pointwise implies $w(s, \sigma(s,p)) \leq w'(s, \sigma(s,p))$ and hence $(G_\sigma w)(s,p) \leq (G_\sigma w')(s,p)$. Given any $w \in \hat V$, the supremum $\GG w$ defined by $$ (\GG w)(s, p) = \max_a \left\{ r(s, p, a) + \beta w(s, a) \right\} $$ exists since the maximum is over the finite set $\{0,1\}$. This is the greatest element of $\{G_\sigma w\}_{\sigma \in \Sigma}$. We have now confirmed that $(V, F, \hat V, \GG)$ is an order-preserving FDP. For (2), we verify that $(V, \TT)$ is the primary ADP for this FDP. We need $T_\sigma = G_\sigma \circ F$. Indeed, for any $v \in V$, $$ \begin{aligned} (G_\sigma F v)(s, p) & = r_\sigma(s, p) + \beta (Fv)(s, \sigma(s,p)) \\ & = r_\sigma(s, p) + \beta \int v[f(s, \sigma(s,p)), p'] \phi(\diff p') = (T_\sigma v)(s, p). \end{aligned} $$ ``` For the subordinate ADP $(\hat V, \hat{\TT})$, we get $$ \begin{aligned} (\hat T_\sigma \, w)(s, a) & = (F G_\sigma \, w)(s, a) \\ & = \int \left\{ r_\sigma(f(s, a), p') + \beta w[f(s, a), \sigma(f(s, a), p')] \right\} \phi(\diff p'). \end{aligned} $$ Notice that the functions in $V$ are defined over points $(s, p) \in \Ssf \times \Esf$, while functions in $\hat V$ are defined over points $(s, a) \in \Ssf \times \{0, 1\}$. The functions in the second set are typically much easier to work with (because $\Esf$ is larger than $\{0,1\}$). Let $\hat T$ be the Bellman operator corresponding to the subordinate ADP $(\hat V, \hat{\TT})$. In light of {prf:ref}`t-rgsc`, we can compute an optimal policy for $(V, \TT)$ by 1. computing the unique fixed point $\wopt$ of $\hat T$ in $\hat V$ and 2. finding a policy $\sigma$ obeying $$ \sigma(s, p) \in \argmax_a \left\{ r(s, p, a) + \beta \wopt(s, a) \right\} \qquad ((s, p) \in \Ssf \times \Esf). 
$$ (sss-oh_conv)= #### The Converse Implication Since the subordinate ADP $(\hat V, \hat{\TT})$ has a lower-dimensional state space, it is natural to want to solve it first and transfer the resulting optimal policy back to the primary ADP $(V, \TT)$. As discussed in {ref}`sss-aci`, however, this is not always valid: a policy that is optimal for the subordinate need not be optimal for the primary. The reason can be understood as follows. The subordinate ADP evaluates a policy $\sigma$ only at continuation states of the form $(f(s, a), p')$, where $p'$ is drawn from $\phi$. In the harvest model, these biomass levels lie in $q(\Ssf) \cup \{q(0)\}$. If the growth function $q$ does not map $\Ssf$ onto itself, then there exist biomass levels $s_0 \in \Ssf$ that never arise as continuations. For example, if $q$ maps $\Ssf = [0, 10]$ into $[2, 8]$, then biomass $s_0 = 1$ is a valid initial state but can never occur after the first period. The subordinate ADP never evaluates $\sigma$ at such states, so a subordinate-optimal policy may make an arbitrarily poor harvest decision there. The primary ADP, however, evaluates $\sigma$ at every state $(s, p) \in \Ssf \times \Esf$ and would detect the suboptimal choice. The additional condition needed for the converse is that $F$ be strictly order preserving (see {prf:ref}`p-stmon`). In this model, sufficient conditions include: $\phi$ has a positive density on $\Esf$, $q$ maps $\Ssf$ onto itself, and we restrict attention to continuous functions. Under these conditions, any strict difference between two value functions at some $(s_0, p_0)$ propagates through $F$ to a strict difference in the subordinate value space, closing the gap. (ss-euler)= ### Euler Equation Methods The Euler equation is a first-order condition for optimality that can be used for analysis and computation. Euler equations are available in a range of settings, including but not limited to household savings problems and optimal growth problems. 
Here we will work in the context of a smooth optimal growth model with iid shocks. Admittedly, this version of the model is too simple to be useful for serious economic analysis. At the same time it provides a convenient vehicle for our exploration of Euler equations and related topics. Most of the ideas discussed here carry over to other settings where Euler equations can be obtained. We begin with the model and collect basic optimality results. We then derive the envelope condition and use it to obtain the Euler equation, first in sequential form and then as a functional equation on policies. The functional form leads to the Coleman--Reffett operator $K$, whose fixed points are solutions to the Euler equation. Time iteration---iterating with $K$---provides a method for computing the optimal policy directly, without first solving for the value function. We also discuss the endogenous grid method, which accelerates time iteration by avoiding a nonlinear root-finding step. In {ref}`ss-eqvti` we establish that VFI and time iteration are equivalent, in the sense that $T$ and $K$ are topologically conjugate. (sss-agm)= #### A Growth Model We consider an optimal growth model where shocks are multiplicative and iid, with income evolving according to $$ y_{t+1} = f(y_t - c_t) \xi_{t+1}, $$ (eq-oslomy) where $f$ is a production function. The Bellman equation is $$ v(y) = \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(z) \diff z \right\} $$ (eq-defgp20) for every $y \in \RR_+$, where $u$ is a utility function and $\phi$ is the density of the shock process. We work in the following environment: ```{prf:assumption} :label: a-mism The discount factor $\beta$ obeys $0 < \beta < 1$. In addition, - $u$ and $f$ are continuously differentiable, strictly increasing, and strictly concave, - the density $\phi$ is continuous on $\RR_+$, - $f(0) = 0$ and $f(k) \to \infty$ as $k \to \infty$, and - $u$ is bounded, with $u(0) = 0$ and $u'(0) = \infty$. 
```

```{exercise}
:label: ex-uprime_zero

Show that, under {prf:ref}`a-mism`, $u'(c) \to 0$ as $c \to \infty$.
```

```{solution} ex-uprime_zero
Since $u$ is concave and differentiable, $u(c) \leq u(c_0) + u'(c_0)(c - c_0)$ for all $c \geq 0$ and $c_0 > 0$. Setting $c = 0$ and using $u(0) = 0$ gives $u(c_0) \geq u'(c_0) c_0$. If $\limsup_{c \to \infty} u'(c) >0$, then there exist $\epsilon > 0$ and arbitrarily large $c_0$ with $u'(c_0) \geq \epsilon$, giving $u(c_0) \geq \epsilon c_0$ for arbitrarily large $c_0$, which contradicts boundedness of $u$. Since $u' \geq 0$, it follows that $u'(c) \to 0$ as $c \to \infty$.
```

We can set this up as an RDP with $\Xsf = \Asf = \RR_+$ by defining

- $V = b\Xsf$,
- $\Gamma(y) = [0,y]$, and
- $B(y, c, v) = u(c) + \beta \int v(f(y - c) z) \phi(z) \diff z$.

The next exercise shows that $(\Gamma, V, B)$ has strong continuity properties.

```{exercise}
:label: ex-ogsf

Show that $(y, c) \mapsto B(y, c, v)$ is continuous on $\Gsf$ for all $v \in V$.
```

```{solution} ex-ogsf
The term $u(c)$ is continuous in $(y,c)$ by {prf:ref}`a-mism`. For the integral term, fix $v \in V = b\Xsf$ and let $(y_n, c_n) \to (y, c)$ in $\Gsf$. Setting $g(y,c) = f(y-c)$, the integrand $v(g(y_n, c_n) z) \phi(z)$ converges to $v(g(y,c) z) \phi(z)$ for all $z$, by continuity of $v \circ g$ and $\phi$. Since $|v(g(y_n, c_n) z) \phi(z)| \leq \|v\| \phi(z)$ and $\phi$ is integrable, the dominated convergence theorem gives $\int v(g(y_n, c_n) z) \phi(z) \diff z \to \int v(g(y,c) z) \phi(z) \diff z$. (See also {prf:ref}`eg-srssf`.)
```

```{prf:proposition}
:label: p-ogbpiid

If {prf:ref}`a-mism` holds, then the fundamental optimality properties hold and VFI, OPI and HPI all converge.
```

```{exercise}
:label: ex-og

Prove {prf:ref}`p-ogbpiid`.
```

```{solution} ex-og
We verify the conditions of {prf:ref}`p-gsdpc20`. For {prf:ref}`a-uca20`, boundedness of $B$ follows from boundedness of $u$ and $v$. The discounting condition holds with $\lambda = \beta$ because

$$
    B(y, c, v + \kappa) = u(c) + \beta \int (v + \kappa)(f(y-c) z) \phi(z) \diff z = B(y, c, v) + \beta \kappa.
$$ For {prf:ref}`a-suca0`, the constraint correspondence $\Gamma(y) = [0,y]$ is continuous and compact-valued, and continuity of $(y,c) \mapsto B(y,c,v)$ on $\Gsf$ for any $v \in b\Xsf$ follows from {prf:ref}`ex-ogsf` (the proof there uses only boundedness of $v$, not continuity). The claims now follow from {prf:ref}`p-gsdpc20`. ``` ```{prf:lemma} :label: l-ogu If {prf:ref}`a-mism` holds, then 1. the value function $\vmax$ is increasing, concave, and continuous, and 2. the optimal policy is unique. ``` ```{exercise} :label: ex-apps-auto-21 Prove {prf:ref}`l-ogu`. ``` ```{solution} ex-apps-auto-21 Under {prf:ref}`a-mism`, the conditions of {prf:ref}`p-ogbpiid` hold, so the fundamental optimality properties are in force. For monotonicity, {prf:ref}`a-visiso` is satisfied: $\Gamma(y) = [0,y] \subset [0,y'] = \Gamma(y')$ when $y \leq y'$, and $B(y,c,v) \leq B(y',c,v)$ for increasing $v$ because $f(y-c) \leq f(y'-c)$. Hence $\vmax$ is increasing by {prf:ref}`p-vmon`. Concavity of $\vmax$ follows from {prf:ref}`p-visco`, since $\Gsf$ is convex and $(y,c) \mapsto B(y,c,v)$ is concave on $\Gsf$ whenever $v$ is concave (using concavity of $u$ and $f$). Continuity holds because $\vmax \in bc\Xsf$ by {prf:ref}`p-ogbpiid`. For (ii), strict concavity of $u$ gives strict concavity of $c \mapsto B(y,c,v)$ for concave $v$, so the optimal policy is unique by {prf:ref}`p-convexfcs`. ``` (sss-envthms)= #### Envelope Theorems We will make use of a differential characterization of greedy policies that is closely connected to the Euler equation and will be useful in what follows. A proof can be found in Section 12.1 of {cite}`stachurski2022economic`. ```{prf:proposition} :label: p-env Let $v$ be an increasing concave function in $bc\RR_+$ and let $\sigma$ be the unique $v$-greedy policy in $\Sigma$. If the conditions of {prf:ref}`a-mism` hold, then 1. $\sigma$ is strictly increasing and interior, while 2. 
$Tv$ is strictly concave, strictly increasing, continuously differentiable and satisfies $$ (Tv)' = u' \circ \sigma \quad \text{ on } \; (0, \infty) $$ (eq-og_env) ``` One interesting aspect of {prf:ref}`p-env` is that $v$ does not have to be differentiable. Hence, the Bellman operator is smoothing, in the sense that images of some nonsmooth functions are smooth. Here's an important corollary: ```{prf:corollary} :label: c-env Let $\sigma$ be the unique optimal policy and let $\vmax$ be the value function. If the conditions of {prf:ref}`a-mism` hold, then 1. $\sigma$ is strictly increasing and interior, while 2. $\vmax$ is strictly concave, strictly increasing, continuously differentiable, and satisfies $$ (\vmax)' = u' \circ \sigma \quad \text{on } \; (0, \infty) $$ (eq-cpi_env) ``` ```{prf:proof} By {prf:ref}`l-ogu`, the value function $\vmax$ is increasing, concave, and continuous. Hence (i) of {prf:ref}`c-env` follows from (i) of {prf:ref}`p-env`. To obtain (ii), use the fact that $\vmax = T \vmax$ combined with part (ii) of {prf:ref}`p-env`. ◻ ``` {prf:ref}`c-env` has been presented in many forms in the economics literature and {eq}`eq-cpi_env` is often called the **envelope condition**. (sss-seq)= #### The Sequential Euler Equation In the present setting, the Euler equation takes the form $$ u'(c_t) = \beta \, \EE_t \left[ u'(c_{t+1}) f'(y_t - c_t) \xi_{t+1} \right] $$ (eq-og_euler) We refer to {eq}`eq-og_euler` as the **sequential Euler equation** because it restricts the endogenous sequence $(c_t)$. The left-hand side is the marginal utility of consuming one additional unit today. The right-hand side is the expected marginal benefit of saving that unit instead: one unit of savings yields a gross return of $f'(y_t - c_t) \xi_{t+1}$ in the next period, which is then valued at marginal utility $u'(c_{t+1})$ and discounted by $\beta$. 
At the optimum, these two quantities are equalized---any deviation from {eq}`eq-og_euler` would allow the agent to improve lifetime utility by shifting consumption between periods. The Euler equation is typically understood as a necessary condition for optimality of a consumption-savings path. However, when applied to policies rather than consumption paths, it turns out to be sufficient as well. Moreover, studying it leads to new insights on optimal behavior and computational methods. To investigate these ideas, let's shift to a policy-based perspective, where the Euler equation becomes a functional restriction on policies. Set $$ \Sigma_{\cC} \coloneq \text{ all continuous, strictly increasing } \sigma \in \Sigma \text{ satisfying } 0 < \sigma(y) < y \text{.} $$ By continuity, each $\sigma$ in $\Sigma_{\cC}$ satisfies $\sigma(0) = 0$. In what follows, we will say that $\sigma \in \Sigma_{\cC}$ **satisfies the Euler equation** if $$ (u'\circ \sigma)(y) = \beta \int (u'\circ \sigma)(f(y - \sigma(y)) z) f'(y - \sigma(y)) z \phi(\diff z) \;\; \text{ for all } y > 0. $$ (eq-eu_fe) ```{exercise} :label: ex-pupu Let $\sigma$ be a policy satisfying {eq}`eq-eu_fe` and let $(c_t)$ be the corresponding consumption path, so that $c_t = \sigma(y_t)$ for all $t$ and $(y_t)$ obeys {eq}`eq-oslomy`. Show that $(c_t)$ satisfies the sequential Euler equation {eq}`eq-og_euler`. ``` ```{solution} ex-pupu Since $c_t = \sigma(y_t)$, evaluating {eq}`eq-eu_fe` at $y = y_t$ gives $$ u'(c_t) = \beta \int (u' \circ \sigma)(f(y_t - c_t) z) \, f'(y_t - c_t) \, z \, \phi(\diff z). $$ By {eq}`eq-oslomy`, $y_{t+1} = f(y_t - c_t) \xi_{t+1}$, and $c_{t+1} = \sigma(y_{t+1})$. Substituting $z = \xi_{t+1}$ and replacing the integral with the conditional expectation gives {eq}`eq-og_euler`. ``` To solve the functional Euler equation, we convert it into a fixed point problem. 
Consider the operator $K$ from $\Sigma_{\cC}$ to itself defined as follows: for each $\sigma \in \Sigma_{\cC}$ and each $y > 0$, we set

$$
    K\sigma(y) \coloneq \text{ the } c \text{ in } (0, y) \text{ that solves } \; u'(c) = \beta \int (u'\circ \sigma)(f(y - c) z) f'(y - c) z \phi(\diff z).
$$

We call $K$ the **Coleman--Reffett** operator. It is well defined since, for any $\sigma \in \Sigma_{\cC}$, the map $c \mapsto u'(c)$ is continuous and strictly decreasing on $(0, y)$ with $u'(c) \to +\infty$ as $c \downarrow 0$, while the integral term is continuous and strictly increasing in $c$ on $(0, y)$ and diverges to $+\infty$ as $c \uparrow y$. It follows that there exists exactly one solution $c$ in the interval $(0, y)$.

It is immediate from the definition that $\sigma$ in $\Sigma_{\cC}$ is a fixed point of $K$ if and only if it satisfies the Euler equation. Thus, the Coleman--Reffett operator plays the same role for the optimal policy that the Bellman operator plays for the value function.

```{exercise}
:label: ex-criso

Show that $K$ is order preserving on $\Sigma_{\cC}$, in the sense that if $\sigma_a$ and $\sigma_b$ are elements of $\Sigma_{\cC}$ with $\sigma_a \leq \sigma_b$, then $K \sigma_a \leq K \sigma_b$.
```

```{solution} ex-criso
Let $\sigma_a$ and $\sigma_b$ be elements of $\Sigma_{\cC}$ with $\sigma_a \leq \sigma_b$. To see that $K \sigma_a \leq K \sigma_b$, observe that, for arbitrary $\sigma \in \Sigma_{\cC}$ and fixed $y > 0$, the value $K\sigma(y)$ is the zero of

$$
    H_\sigma(c) \coloneq u'(c) - \beta \int (u'\circ \sigma)(f(y - c) z) f'(y - c) z \phi(\diff z).
$$

Since $u'$ is decreasing, $\sigma_a \leq \sigma_b$ implies $u' \circ \sigma_a \geq u' \circ \sigma_b$, and hence $H_{\sigma_a}(c) \leq H_{\sigma_b}(c)$ for every $c \in (0, y)$. Both functions are strictly decreasing in $c$, so, since $H_{\sigma_b}$ is pointwise greater, its zero is also larger. In other words, $K\sigma_a(y) \leq K\sigma_b(y)$. Since $y$ was arbitrary, our proof is now done.
``` #### Time Iteration **Time iteration** means computing the optimal policy by iterating with $K$: starting from an initial guess $\sigma_0 \in \Sigma_{\cC}$, we generate the sequence $(\sigma_k)$ defined by $\sigma_{k+1} = K \sigma_k$. In {ref}`ss-eqvti` we will show that this sequence converges to the optimal policy $\sigopt$, the unique fixed point of $K$ in $\Sigma_{\cC}$. The proof uses the fact that $K$ and the Bellman operator $T$ are topologically conjugate, so the iterates of $K$ converge whenever VFI converges, and at the same rate. {numref}`fig-ti_iterates` illustrates time iteration for the growth model in {ref}`sss-agm` with log utility $u(c) = \ln c$, Cobb--Douglas production $f(k) = k^\alpha$, and lognormal shocks. This specification does not satisfy all of the conditions in {prf:ref}`a-mism` (for example, $u$ is not bounded), but it admits a closed-form optimal policy $\sigopt(y) = (1 - \alpha \beta) y$, which allows us to measure the accuracy of the algorithm directly. The left panel shows the first few iterates of $K$ starting from $\sigma_0(y) = y$ (consume everything), which converge visibly toward $\sigopt$. The right panel plots the sup-norm error $\| K^n \sigma_0 - \sigopt \|$, confirming rapid convergence. ```{figure} figures/ti_iterates.pdf :name: fig-ti_iterates :width: 100% Iterates of the Coleman--Reffett operator $K$, starting from $\sigma_0(y) = y$, with $\alpha = 0.4$ and $\beta = 0.96$. ``` In practice, time iteration is often more efficient than VFI because policy functions typically have less curvature than value functions, making them easier to approximate on a grid. However, each iterate of $K$ requires solving a nonlinear equation at every grid point, which can be costly. The endogenous grid method, discussed next, eliminates this root-finding step. 
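The following sketch shows one way the experiment described above might be set up. It is not the code behind {numref}`fig-ti_iterates`: the income grid, the lognormal parameters `mu` and `s`, the Monte Carlo approximation of the integral, and the use of vectorized bisection for the root-finding step are all assumptions made for illustration. The policy is stored as a vector of values on the grid and evaluated off the grid by linear interpolation.

```python
import numpy as np

alpha, beta = 0.4, 0.96                 # parameter values from the figure
mu, s = 0.0, 0.1                        # assumed lognormal shock parameters
rng = np.random.default_rng(0)
shocks = np.exp(mu + s * rng.standard_normal(250))   # Monte Carlo draws of the shock

y_grid = np.linspace(1e-4, 4.0, 120)    # assumed income grid

def euler_gap(c, sigma_vals):
    """u'(c) minus the right-hand side of the Euler equation, at every grid point."""
    k = y_grid - c                                       # savings y - c
    y_next = (k**alpha)[:, None] * shocks[None, :]       # f(y - c) z
    c_next = np.interp(y_next.ravel(), y_grid, sigma_vals).reshape(y_next.shape)
    rhs = beta * alpha * k**(alpha - 1.0) * np.mean(shocks[None, :] / c_next, axis=1)
    return 1.0 / c - rhs

def K(sigma_vals, n_bisect=60):
    """Approximate Coleman--Reffett operator: solve for c in (0, y) by bisection."""
    lo = np.full_like(y_grid, 1e-10)
    hi = y_grid - 1e-10
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        gap_positive = euler_gap(mid, sigma_vals) > 0    # gap is decreasing in c
        lo = np.where(gap_positive, mid, lo)
        hi = np.where(gap_positive, hi, mid)
    return 0.5 * (lo + hi)

sigma_star = (1 - alpha * beta) * y_grid   # closed-form optimal policy
sigma = y_grid.copy()                      # sigma_0(y) = y
errors = []
for _ in range(20):
    sigma = K(sigma)
    errors.append(np.max(np.abs(sigma - sigma_star)))

print(f"sup-norm error after 1 step: {errors[0]:.3e}, after 20 steps: {errors[-1]:.3e}")
```

In this specification one can check by hand that $K$ maps a linear policy $\sigma(y) = \theta y$ to $\theta y / (\theta + \alpha \beta)$, whose unique positive fixed point is $\theta = 1 - \alpha \beta$. The shock cancels from the Euler equation in this case, so the sketch mainly illustrates the mechanics of the iteration.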
(sss-egm)= #### The Endogenous Grid Method In the standard implementation of time iteration, we fix a grid of income values $\{y_i\}$ and, for each $y_i$, solve the nonlinear equation $$ u'(c) = \beta \int (u'\circ \sigma)(f(y_i - c) z) f'(y_i - c) z \, \phi(\diff z) $$ for $c$ using a root-finding algorithm. This is the most expensive step in each iteration. The **endogenous grid method** (EGM), introduced by {cite}`Carroll2006`, avoids root-finding by reversing the logic: instead of fixing income $y$ and solving for consumption $c$, we fix savings $s = y - c$ and solve for $c$ directly. Specifically, given the current policy $\sigma \in \Sigma_{\cC}$ and a fixed grid of savings values $\{s_j\}$, we compute $$ c_j = (u')^{-1} \left( \beta \int (u'\circ \sigma)(f(s_j) z) \, f'(s_j) \, z \, \phi(\diff z) \right) $$ (eq-egm_c) and then set $y_j = c_j + s_j$. Each pair $(y_j, c_j)$ satisfies the Euler equation by construction. The updated policy $K\sigma$ is then reconstructed from the pairs $\{(y_j, c_j)\}$ by interpolation. The key advantage is that {eq}`eq-egm_c` requires only evaluation of $(u')^{-1}$, which is available in closed form for standard utility functions (e.g., $(u')^{-1}(p) = p^{-1/\gamma}$ for CRRA utility). This replaces the iterative root-finding step with a single function evaluation at each grid point, leading to substantial speed gains. The name "endogenous grid method" reflects the fact that the income grid $\{y_j\}$ is not fixed in advance but is determined endogenously by the savings grid and the current policy. (ss-eqvti)= ### Equivalence of VFI and Time Iteration We now show that value function iteration and time iteration are, in a precise sense, equivalent. To do so, we construct a bijection $M$ between value functions and policies under which the Bellman operator $T$ and the Coleman--Reffett operator $K$ are conjugate. 
We then upgrade this conjugacy to a topological conjugacy, which allows us to transfer the known convergence properties of $T$ to $K$. #### Conjugacy of $K$ and $T$ The key step is to build a bijection $M$ between the space of value functions and the space of policies that intertwines $T$ and $K$. The envelope condition ({prf:ref}`p-env`) provides the natural link: given a value function $v$, the map $M$ recovers the corresponding greedy policy by inverting the marginal utility. Throughout this section {prf:ref}`a-mism` is in force. First let $V_{\cC}$ be all strictly concave, continuously differentiable $v$ mapping $\RR_+$ to itself and satisfying $v(0) = 0$ and $v'(y) > u'(y)$ whenever $y > 0$. For $v \in V_{\cC}$ let $Mv$ be defined by $$ (M v)(y) = (u')^{-1}(v' (y)) $$ when $y> 0$ and $(M v)(0) =0$. For this mapping, we have the following result: ```{prf:lemma} :label: l-mmi The map $M$ is a bijection from $V_{\cC}$ to $\Sigma_{\cC}$, with inverse $M^{-1}$ mapping $\sigma \in \Sigma_{\cC}$ into $v$ defined by $$ (M^{-1} \sigma)(y) = \int_0^y u'(\sigma(x)) \diff x. $$ (eq-og_im) Moreover, for every increasing concave $v \in bc\RR_+$, the function $M T v$ is the unique $v$-greedy policy. ``` ```{prf:proof} For ease of exposition we set $m(y) \coloneq (u')^{-1} (y)$ whenever $y > 0$. From {prf:ref}`a-mism`, one can show that $m$ is a continuous and strictly decreasing bijection from $(0,\infty)$ to itself. Now observe that, for fixed $v \in V_{\cC}$, the derivative $v'$ is a continuous, strictly decreasing function. Hence $M v = m \circ v'$ is strictly increasing and continuous. Moreover, interiority holds because $v'$ strictly dominates $u'$, implying that, when $y > 0$, $$ (M v)(y) = m(v'(y)) < m(u'(y)) = y $$ In particular, $\sigma(y) \coloneq (Mv)(y)$ is an element of $\Sigma_{\cC}$. To see that each $\sigma \in \Sigma_{\cC}$ has a preimage $v \in V_{\cC}$ with $Mv = \sigma$, fix any $\sigma \in \Sigma_{\cC}$ and let $v$ be given by {eq}`eq-og_im`. 
An application of the Fundamental Theorem of Calculus yields $v \in V_{\cC}$ and $Mv = \sigma$. It is also true that $M$ is one-to-one on $V_{\cC}$. To see this, suppose that $v$ and $w$ are elements of $V_{\cC}$ satisfying $Mv = Mw$. Then $v(0) = w(0) = 0$ and $v' = w'$ on $(0, \infty)$. The Fundamental Theorem of Calculus then implies that $v = w$ on $\RR_+$. Finally, given $v \in V_{\cC}$, let $\sigma$ be the unique $v$-greedy policy. The claim is that $\sigma = M T v$, or, equivalently (since $(MTv)(y) = (u')^{-1}((Tv)'(y))$), that $u'(\sigma(y)) = (Tv)'(y)$ for all $y > 0$. This statement was established in {prf:ref}`p-env`. ◻ ``` The significance of $M$ is that the systems $(V_{\cC}, T)$ and $(\Sigma_{\cC}, K)$ are conjugate under this mapping: ```{prf:proposition} :label: p-ktc If {prf:ref}`a-mism` holds, then $$ T = M^{-1} \circ K \circ M \text{ on } V_{\cC} $$ (eq-kvt) ``` ```{prf:proof} Since $M$ is bijective as a map between $V_{\cC}$ and $\Sigma_{\cC}$, we can equivalently show that $M \circ T = K \circ M$, or, more explicitly, that $$ (M T v)(y) = ( K M v)(y) \text{ for any } v \in V_{\cC} \text{ and any } y \in (0, \infty) $$ (eq-kvt2) To establish {eq}`eq-kvt2`, fix $v \in V_{\cC}$ and $y > 0$. We saw in {prf:ref}`l-mmi` that $\sigma \coloneq M T v$ is the unique $v$-greedy policy. This policy necessarily satisfies the first order condition $$ u'(\sigma(y)) = \beta \int v' (f(y - \sigma(y)) z ) f'(y - \sigma(y)) z \phi(\diff z) $$ On the other hand, $KMv(y)$ is the unique $c$ in $(0, y)$ that solves $$ \begin{aligned} u'(c) & = \beta \int (u' \circ (Mv)) (f(y - c) z ) f'(y - c) z \phi(\diff z) \\ & = \beta \int (u' \circ ((u')^{-1} \circ v')) (f(y - c) z ) f'(y - c) z \phi(\diff z) \\ & = \beta \int v'(f(y - c) z ) f'(y - c) z \phi(\diff z) \end{aligned} $$ In particular, $c = \sigma(y)$. In other words, $KMv(y) = MTv(y)$, as was to be shown.
◻ ``` This shows that VFI and time iteration are essentially isomorphic---they are topologically conjugate dynamical systems with identical long-run behavior. The practical benefits of time iteration stem from its numerical advantages: it operates directly in policy space and, combined with methods such as EGM, can avoid costly root-finding steps. #### From Conjugacy to Topological Conjugacy Suppose we can show that the bijection $M$ is also continuous as a map from $V_{\cC}$ to $\Sigma_{\cC}$ and that its inverse is likewise continuous. Then {eq}`eq-kvt` implies that $(V_{\cC}, T)$ and $(\Sigma_{\cC}, K)$ are topologically conjugate (see {ref}`sss-topcon`). This will be informative about $(\Sigma_{\cC}, K)$ because topologically conjugate dynamical systems have essentially identical properties---and we already have a significant amount of information about the trajectories of $T$. The way we will show that $M$ and its inverse are continuous is to cook up a metric on $\Sigma_{\cC}$ that essentially guarantees this result. The metric in question is $$ \rho(\sigma_1, \sigma_2) = d_\infty( M^{-1} \sigma_1 , M^{-1} \sigma_2 ) \coloneq \| M^{-1} \sigma_1 - M^{-1} \sigma_2 \|_\infty $$ ```{exercise} :label: ex-apps-auto-22 Show that $M$ is continuous as a map from $(V_{\cC}, d_\infty)$ to $(\Sigma_{\cC}, \rho)$. Show that $M^{-1}$ is continuous as a map from $(\Sigma_{\cC}, \rho)$ to $(V_{\cC}, d_\infty)$. ``` ```{solution} ex-apps-auto-22 Both maps are isometries under these metrics. For $M$, take any $v_1, v_2 \in V_{\cC}$ and observe that $$ \rho(M v_1, M v_2) = d_\infty(M^{-1} M v_1, M^{-1} M v_2) = d_\infty(v_1, v_2). $$ For $M^{-1}$, take any $\sigma_1, \sigma_2 \in \Sigma_{\cC}$ and observe that $$ d_\infty(M^{-1} \sigma_1, M^{-1} \sigma_2) = \rho(\sigma_1, \sigma_2). $$ Isometries are continuous, so both claims follow. 
``` Now we can state the following result, which infers properties about the time iteration and its fixed point from corresponding properties of the Bellman operator. ```{prf:proposition} :label: p-eu_fe If {prf:ref}`a-mism` holds and $\sigma$ is the unique optimal policy, then 1. $(\Sigma_{\cC}, K)$ is globally stable and 2. the unique fixed point of $K$ in $\Sigma_{\cC}$ is $\sigma$. In particular, $c \in \Sigma_{\cC}$ is optimal if and only if it satisfies the Euler equation {eq}`eq-eu_fe`. ``` ```{prf:proof} Let the conditions of the proposition hold and let $\sigma$ be the unique optimal policy. We have already seen that $T$ is globally stable on $bc\RR_+$ under $d_\infty$ and therefore it is globally stable on $V_{\cC}$ under the same metric. (That $T$ is a self-map on $V_{\cC}$ follows from {prf:ref}`p-env`.) Moreover, when we endow $\Sigma_{\cC}$ with the metric $\rho$, $T$ and $K$ are topologically conjugate under the bijection $M$. In view of {prf:ref}`p-tc`, this implies that $K$ is globally stable in $\Sigma_{\cC}$. Since $\vmax$ is the unique fixed point of $T$ in $V_{\cC}$, {prf:ref}`p-iccm` tells us that the unique fixed point of $K$ in $\Sigma_{\cC}$ is $M \vmax$. As shown in {prf:ref}`l-mmi`, $M \vmax = M T \vmax$ is the unique $\vmax$-greedy policy, which, by Bellman's principle of optimality, is $\sigma$. ◻ ``` ```{exercise} :label: ex-abs Let $\beta_a \leq \beta_b$ be two discount factors in $(0, 1)$. Let $\sigma_a$ and $\sigma_b$ be the respective optimal consumption policies. Show that, in the setting of {prf:ref}`p-eu_fe`, we have $\sigma_b \leq \sigma_a$. Provide intuition. ``` ```{solution} ex-abs Let $\beta_a$ and $\beta_b$ be two discount factors with the stated properties. Let $K_a$ and $K_b$ be the respective Coleman--Reffett operators. 
For $\sigma_b \leq \sigma_a$ to hold, it suffices (by {prf:ref}`p-ofpdsms`) to show that $K_a$ is order preserving, globally stable on $\Sigma_{\cC}$, and has the property $K_b \sigma \leq K_a \sigma$ for any $\sigma \in \Sigma_{\cC}$. That $K_a$ is order preserving was established in {prf:ref}`ex-criso`. That $K_a$ is globally stable was shown in {prf:ref}`p-eu_fe`. To see that $K_b \sigma \leq K_a \sigma$ for all $\sigma \in \Sigma_{\cC}$, fix any such $\sigma$ and any $y \in (0, \infty)$. Observe that $K_i \sigma(y)$ is the zero of $$ H_i(c) \coloneq u'(c) - \beta_i \int (u'\circ \sigma)(f(y - c) z) f'(y - c) z \phi(\diff z) $$ for $i = a, b$. Since $\beta_a \leq \beta_b$ and the integral is nonnegative, $H_b \leq H_a$ pointwise on $(0, y)$. Because each $H_i$ is strictly decreasing in $c$, the zero of $H_b$ is no larger than the zero of $H_a$. In other words, $K_b \sigma(y) \leq K_a \sigma(y)$. Since $y$ was arbitrary, this completes our proof. The intuition for the result is that larger $\beta$ means greater patience, which encourages saving and hence reduces consumption. ``` (s-cn_apps)= ## Chapter Notes The job search model in {ref}`ss-jobsearch` is due to {cite}`mccall1970`. {cite}`mortensen1986job` provides an extensive survey of the job search literature and its connections to labor market analysis. A finite-state treatment of the McCall model and related problems can be found in {cite}`sargent2025dynamic`. The job search model with learning in {ref}`sss-jslo` is closely related to {cite}`rothschild1974searching`, who studied optimal search when the distribution of offers is unknown. {cite}`esponda2021` provide an equilibrium condition for agents controlling MDPs under Bayesian learning. The Kreps--Porteus certainty equivalent used in {ref}`ss-nonexp` originates from {cite}`kreps1978temporal`; see also {cite}`epstein1989risk` for a continuous-time formulation and further analysis. The job search model with separation in {ref}`ss-jsep` extends a finite-state treatment from Chapter 3 of {cite}`sargent2025dynamic`.
The form of nonlinear discounting analyzed in {ref}`ss-nondis` was introduced by {cite}`jaskiewicz2014variable` and {cite}`bauerle2021stochastic`. As mentioned, one motivation is magnitude effects, under which discount rates decrease with the size of the reward. For evidence in favor of such effects, see, for example, {cite}`green1997rate`. The Coase-meets-Bellman production chain model in {ref}`sss-apc` is based on the work of {cite:t}`kikuchi2018span`. The negative discount optimality problem constructed in {ref}`sss-ond` is based on {cite:t}`kikuchi2021coase`. Negative discount rates seem to arise naturally in some settings. {cite}`thaler1981some`, {cite}`loewenstein1991negative` and {cite}`loewenstein1991workers` document separate instances of such phenomena. For example, {cite}`loewenstein1991workers` found that the majority of surveyed workers reported a preference for increasing wage profiles over decreasing ones, even when it was pointed out that the latter could be used to construct a dominating consumption sequence. {cite}`loewenstein1991negative` obtained similar results, stating that "sequences of outcomes that decline in value are greatly disliked, indicating a negative rate of time preference" {cite}`loewenstein1991negative`. The optimal harvest model in {ref}`ss-oh` illustrates how factored dynamic programs can reduce dimensionality in models with iid shocks. The stochastic optimal growth model described in {ref}`sss-agm` was first set out and analyzed in {cite}`brock1972optimal`. {cite}`stokey1989recursive` provides a textbook treatment of the dynamic programming theory for optimal growth, including the envelope theorem and Euler equation methods. The envelope condition in {ref}`sss-envthms` is a version of the classic result of {cite}`benveniste1979differentiability`, who established differentiability of the value function at interior optima; see also {cite}`milgrom2002envelope` for extensions to arbitrary choice sets. 
The Coleman--Reffett operator in {ref}`sss-seq` is named after {cite}`coleman1990solving`, who introduced policy-function iteration as an alternative to value function iteration for solving the stochastic growth model, and {cite}`coleman1991equilibrium`, who extended the method to distorted economies. {cite}`datta2002existence` generalized the approach using lattice-theoretic methods. The endogenous grid method was introduced by {cite}`Carroll2006`. Extensions to handle non-convex and discrete--continuous choice problems were developed by {cite}`fella2014generalized` and {cite}`iskhakov2017endogenous`, respectively. The dramatic speed improvements offered by EGM have made it a standard component of the structural estimation toolkit for life-cycle and consumer-choice models. The topological conjugacy approach used in {ref}`ss-eqvti` to establish equivalence of VFI and time iteration draws on ideas developed in {cite}`sargent2025partially`. [^1]: For example, if $p \equiv 1$, then $T^n p = \delta^n 1$, which diverges to $+\infty$. ======================================================================== ## Approximation and Learning The theory developed in earlier chapters can be extended to address two important problems faced by applied researchers. One is the need for approximation. The other is how to learn optimal policies from samples, based on limited knowledge. This short chapter addresses these issues. Our aim here is limited, namely, to provide a brief introduction to these large topics and connect them to the ADP theory studied above. The chapter consists of two sections. {ref}`s-approx` develops the theory of approximate value function iteration, showing that nonexpansive approximation operators preserve contractivity and lead to clean error bounds. It also introduces stochastic approximation, which provides a way to study the convergence properties of iterative algorithms and forms the theoretical foundation for the learning algorithms that follow. 
{ref}`s-sim_learn` treats simulation-based learning, covering Q-learning, risk-sensitive Q-learning, and policy gradient methods. (s-approx)= ## Approximation This section discusses approximate dynamic programming. Approximation is needed because (a) models with continuous states and actions cannot be implemented exactly on computers, and (b) many problems are high-dimensional and hence computationally expensive. We focus on approximate value function iteration. We begin in {ref}`ss-approx_methods` with a review of common approximation methods, including local and global approximation. In {ref}`ss-nonexp_approx`, we show that nonexpansive approximation operators preserve contractivity and derive error bounds for the resulting policies. {ref}`ss-sa` introduces stochastic approximation, a general framework for finding fixed points from noisy evaluations that underpins the learning algorithms of {ref}`s-sim_learn`. (ss-approx_methods)= ### Approximation Methods Suppose that we want to represent a function $f \colon \Xsf \to \RR$ using an approximation $\hat f$ that can be implemented on a machine using a finite number of parameters. One common approach is to take a normed linear space $V$ containing $f$ and $n$ elements $b_1, \ldots, b_n$ from $V$. The approximation then takes the form $$ \hat f(x) = \sum_{i=1}^n \alpha_i b_i(x) $$ (eq-hatf) The scalars $\alpha_1, \ldots, \alpha_n$ are chosen so that $\hat f$ is close to $f$. In this setting, the elements $b_i$ are typically called **basis functions**. In practice, one often fixes a sequence of functions $(b_n)_{n \in \NN} \subset V$ such that, with $B_n$ defined as the linear span of $b_1, \ldots, b_n$ for each $n$, the union $\bigcup_{n \in \NN} B_n$ is dense in $V$. Next we introduce a sequence of **approximation operators** $(L_n)_{n \in \NN}$ such that $L_n f$ has the form of $\hat f$ in {eq}`eq-hatf`. 
For example, given $f \in V$, the function $L_n f$ might equal the closest element in $B_n$ to $f$ under the specified norm on $V$. See, for example, {cite}`cheney2013analysis` or {cite}`atkinson2005theoretical`. (sss-la)= #### Local Approximation One variation on this idea, often called **local approximation**, proceeds by first choosing a set of grid points $\{x_1, \ldots, x_n\}$ contained in $\Xsf$ and a set of basis functions $\{\kappa_1, \ldots, \kappa_n\}$ defined on the state space $\Xsf$ with the property that $$ \sum_{i=1}^n \kappa_i(x) = 1 \text{ for all } x \in \Xsf \quad \text{and} \quad \kappa_j \geq 0 \quad \text{for all } j \in \{1, \ldots, n\}. $$ (eq-kerb) Elements of a family of basis functions satisfying {eq}`eq-kerb` are sometimes called **weighting functions** or **kernels**, and an approximation of a function $f$ taking the form $$ (Lf)(x) = \sum_{i=1}^n f(x_i) \kappa_i(x) $$ (eq-kernav) is called a **local approximator** or a **kernel averager**. The key idea is that $(Lf)(x)$, the approximation to $f(x)$ at $x \in \Xsf$, is a weighted average of the values of $f$ at the grid points. Typically, $\kappa_i(x)$ is relatively large when $x$ is near $x_i$, so that $f(x_i)$ has a large weight in determining $Lf(x)$. ```{prf:example} :label: eg-gka Gaussian kernel averagers are often used for local approximation. Given grid points $\{x_1, \ldots, x_n\} \subset \RR^d$ and bandwidth $h > 0$, set $$ K_h(x, x_i) \coloneq \exp\!\left( -\frac{\|x - x_i\|^2}{2h^2} \right) $$ and define $\kappa_i(x) \coloneq K_h(x, x_i) / \sum_{j=1}^n K_h(x, x_j)$. The resulting kernel averager {eq}`eq-kernav` assigns larger weight to grid points near $x$. Clearly {eq}`eq-kerb` is satisfied. ``` {numref}`f-gka` illustrates the Gaussian kernel averager from {prf:ref}`eg-gka` in one dimension. The left panel shows the kernel weights $\kappa_i(x)$ for six grid points: each $\kappa_i$ peaks at $x_i$ and decays smoothly, and the weights sum to one at every $x$. 
The right panel shows a function $f$ and its kernel-averager approximation $Lf$; the approximation tracks $f$ near the grid points and smoothly blends between them. (Unlike the interpolation scheme in {prf:ref}`eg-pli`, a Gaussian kernel averager need not reproduce $f$ exactly at the grid points, because every weight $\kappa_j(x_i)$ is strictly positive.) ```{figure} figures/gaussian_kernel_averager.pdf :name: f-gka :width: 95% Gaussian kernel averager: kernels (left) and approximation (right) ``` ```{prf:example} :label: eg-pli Continuous piecewise linear interpolation on a one-dimensional grid can be viewed as a kernel averager. Taking grid points $a = x_1 < x_2 < \cdots < x_n = b$, define the "hat" basis functions $$ \kappa_i(x) \coloneq \begin{cases} (x - x_{i-1}) / (x_i - x_{i-1}) & \text{if } x \in [x_{i-1}, x_i] \\ (x_{i+1} - x) / (x_{i+1} - x_i) & \text{if } x \in [x_i, x_{i+1}] \\ 0 & \text{otherwise} \end{cases} $$ with the obvious modifications at the endpoints. The resulting kernel averager {eq}`eq-kernav` is the unique continuous function that is linear on each interval $[x_i, x_{i+1}]$ and agrees with $f$ at all grid points. One easily checks that $\kappa_i \geq 0$ and $\sum_i \kappa_i(x) = 1$ for all $x \in [a, b]$, so {eq}`eq-kerb` holds. These ideas and results extend to continuous piecewise linear interpolation in $\RR^d$ (see, e.g., {cite}`stachurski2008continuous`). ``` We will exploit the following property in {ref}`ss-nonexp_approx` to derive error bounds for approximate value function iteration. ```{prf:lemma} :label: l-neka Let $L$ be the kernel averager in {eq}`eq-kernav` with each $\kappa_i \in b\Xsf$. Then $L$ maps $b\Xsf$ into itself and is nonexpansive under the supremum norm. ``` ```{prf:proof} The first claim is true because $Lf$ is a finite linear combination of the bounded functions $\kappa_i$ and $b\Xsf$ is a linear space. Regarding the second, pick any $f, g \in b\Xsf$ and $x \in \Xsf$. Using {eq}`eq-kerb`, $$ | (Lf)(x) - (Lg)(x) | = \left| \sum_{i=1}^n (f(x_i) - g(x_i)) \kappa_i(x) \right| \leq \sum_{i=1}^n |f(x_i) - g(x_i)| \kappa_i(x) \leq \| f - g \|. $$ Taking the supremum over $x$ confirms the claim. ◻ ``` (sss-ga)= #### Global Approximation Another approach involves **global approximation** methods.
Here we drop the partition-of-unity constraint {eq}`eq-kerb` from the basis functions and the function values $\{f(x_i)\}$ are replaced with a more generic set of coefficients $\{\theta_j\}_{j=1}^k$. With the basis denoted by $\{b_1, \ldots, b_k\}$, the approximations take the form $$ (G_\theta f)(x) = \sum_{j=1}^k \theta_j b_j(x). $$ The number of basis elements $k$ is typically smaller than the number of grid points, seeking a parsimonious representation. Coefficients are chosen by fitting. A typical criterion is least squares minimization: $$ \hat \theta \coloneq \argmin_\theta \sum_{i=1}^n [(G_\theta f)(x_i) - f(x_i)]^2. $$ If $Z_{ij} = b_j(x_i)$ and $y = (f(x_1), \ldots, f(x_n))^\top$, then $\hat \theta$ takes the familiar form $(Z^\top Z)^{-1} Z^\top y$ whenever $Z$ has full column rank. ```{prf:remark} Neural networks provide a form of global approximation where the basis functions $b_j$ are themselves parameterized. The universal approximation theorem {cite:p}`hornik1989multilayer` guarantees that sufficiently wide or deep networks can approximate any continuous function on a compact set to arbitrary precision. Unfortunately, nonexpansiveness is generally not preserved by neural network approximations. ``` (ss-nonexp_approx)= ### Error Bounds In this section, we develop error bounds for approximate value function iteration in the ADP framework of {prf:ref}`c-adps2`. The key result is that when the approximation operator is nonexpansive, the composition of the Bellman operator with the approximation operator inherits contractivity, and the error in the computed policy can be bounded in terms of the approximation quality. (sss-fvi_frame)= #### Framework Throughout this section, $(V, \TT)$ is an ADP with $V = (V, d, \preceq)$ a partially ordered metric space. An operator $L \colon V \to V$ is called **nonexpansive** with respect to $d$ when $$ d(Lv, Lw) \leq d(v, w) \qquad \text{for all } v, w \in V. $$ (eq-lne) We make the following assumption. 
```{prf:assumption} :label: a-fvi The ADP $(V, \TT)$ is regular and each $T_\sigma \in \TT$ is a contraction of modulus $\beta$ on $(V, d)$. The metric $d$ is complete and sup-nonexpansive (see {ref}`sss-snms`). ``` Under {prf:ref}`a-fvi`, the conditions of {prf:ref}`t-contract` are satisfied (with $V_0 = V$, since regularity implies semi-regularity with $V_0 = V$). As a result, the fundamental optimality properties hold, the Bellman operator $T$ is a contraction of modulus $\beta$ on $V$, the value function $\vmax$ is the unique fixed point of $T$ in $V$, and VFI, OPI, and HPI all converge. Let $L$ be a self-map on $V$. We call $L$ the **approximation operator** and define the **approximate Bellman operator** via $\hat T \coloneq L \circ T$. ```{prf:lemma} :label: l-ltcon Let {prf:ref}`a-fvi` hold. If $L$ is nonexpansive on $(V, d)$, then $\hat T$ is a contraction of modulus $\beta$ on $(V, d)$. ``` ```{prf:proof} For any $v, w \in V$, $$ d(\hat T v, \hat T w) = d(L T v, L T w) \leq d(T v, T w) \leq \beta \, d(v, w). $$ ◻ ``` By {prf:ref}`l-ltcon` and the Banach contraction mapping theorem, $\hat T$ has a unique fixed point $\hat v \in V$, and the iterates $\hat T^n v_0$ converge geometrically to $\hat v$ for any starting point $v_0 \in V$. The **fitted value iteration** (FVI) algorithm iterates with $\hat T$ from an initial condition $v_0$ and, upon termination, returns a $v_N$-greedy policy. ```{prf:algorithm} Fitted value iteration :label: algo-fvi - input $v \in V$, an initial guess of $\vmax$ - input $\tau$, a tolerance level for error - $\epsilon \leftarrow \tau + 1$ - while $\epsilon > \tau $: - $v' \leftarrow LTv$ - $\epsilon \leftarrow d(v', v)$ - $v \leftarrow v'$ - **return** a $v$-greedy policy $\sigma$ ``` The next two results bound the policy error when the FVI algorithm terminates after $N$ iterations. 
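The FVI routine above can be sketched in a few lines of Python. The model here is purely illustrative and is not taken from the text: the state is income $y \in [0, 1]$, the reward is $\sqrt{y - s}$ for savings $s \in [0, y]$, next-period income is $s^\alpha$, and the maximization runs over a finite set of candidate savings levels. The approximation operator $L$ is piecewise linear interpolation on a fixed grid, which is nonexpansive by {prf:ref}`l-neka`. All function names are hypothetical.

```python
import bisect

BETA, ALPHA = 0.9, 0.4


def interp(x, xs, ys):
    """Piecewise linear interpolation: the approximation operator L in this sketch."""
    i = bisect.bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 2))
    w = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + w * (ys[i + 1] - ys[i])


def bellman(y, grid, v, n_actions=50):
    """(Tv)(y) and a maximizer, with savings limited to a finite candidate set."""
    best_val, best_s = -float("inf"), 0.0
    for j in range(n_actions + 1):
        s = y * (j / n_actions)                   # candidate savings in [0, y]
        reward = max(y - s, 0.0) ** 0.5           # illustrative reward sqrt(y - s)
        val = reward + BETA * interp(s ** ALPHA, grid, v)
        if val > best_val:
            best_val, best_s = val, s
    return best_val, best_s


def fitted_value_iteration(grid, v, tol=1e-6, max_iter=1000):
    """Iterate v <- L T v, storing v through its values at the grid points."""
    steps = []                                    # d(v', v) at each iteration
    eps = tol + 1.0
    while eps > tol and len(steps) < max_iter:
        new_v = [bellman(y, grid, v)[0] for y in grid]
        # Two interpolants on a common grid attain their sup distance at a grid point.
        eps = max(abs(a - b) for a, b in zip(new_v, v))
        steps.append(eps)
        v = new_v
    policy = [bellman(y, grid, v)[1] for y in grid]   # v-greedy savings on the grid
    return v, policy, steps


grid = [i / 20 for i in range(21)]
v, policy, steps = fitted_value_iteration(grid, [0.0] * len(grid))
print(f"terminated after {len(steps)} iterations, final step {steps[-1]:.2e}")
```

Because $\hat T = L \circ T$ is a contraction of modulus $\beta$ ({prf:ref}`l-ltcon`), the recorded step sizes in `steps` should shrink by at least the factor $\beta$ at every iteration, which gives a simple sanity check on an implementation.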
(sss-fvi_error)= #### Error Bounds Let $v_0 \in V$ be the initial condition in the FVI routine and define $v_n \coloneq \hat T^n v_0 = (LT)^n v_0$ for each $n$. Let $e_N \coloneq d(v_N, v_{N-1})$ be the step size at termination. Let $\sigma$ be a $v_N$-greedy policy, so that $T_\sigma \, v_N = T v_N$. ```{prf:theorem} :label: t-fvi1 Let {prf:ref}`a-fvi` hold. If $L$ is nonexpansive on $(V,d)$ and the FVI algorithm terminates after $N$ iterations, then $$ d(\vmax, v_\sigma) \leq \frac{2}{1-\beta} \left( \beta \, e_N + d(LT v_N, T v_N) \right). $$ (eq-fvi1) ``` The bound {eq}`eq-fvi1` decomposes the error into two terms. The first term $\beta \, e_N$ reflects the accuracy of the iterative process and decreases geometrically with the number of iterations. The second term $d(LTv_N, Tv_N)$ is the approximation error at the final iterate---how well $L$ represents $Tv_N$. Both terms are computable by the algorithm. ```{prf:proof} *Proof of {prf:ref}`t-fvi1`.* By the triangle inequality, $$ d(\vmax, v_\sigma) \leq d(\vmax, v_N) + d(v_N, v_\sigma). $$ (eq-fvitri) We bound each term separately. *Bounding $d(\vmax, v_N)$.* Since $\vmax = T\vmax$ and $T$ is a $\beta$-contraction, $$ d(\vmax, v_N) \leq d(T\vmax, Tv_N) + d(Tv_N, v_N) \leq \beta \, d(\vmax, v_N) + d(Tv_N, v_N), $$ and hence $$ (1-\beta) \, d(\vmax, v_N) \leq d(Tv_N, v_N). $$ (eq-fvic1) To bound $d(Tv_N, v_N)$, we use $v_{N+1} = LTv_N$: $$ \begin{aligned} d(Tv_N, v_N) &\leq d(Tv_N, LTv_N) + d(LTv_N, v_N) \\ &= d(Tv_N, LTv_N) + d(LTv_N, LTv_{N-1}) \\ &\leq d(Tv_N, LTv_N) + d(Tv_N, Tv_{N-1}) \\ &\leq d(Tv_N, LTv_N) + \beta \, e_N. \end{aligned} $$ Here the third line uses nonexpansiveness of $L$ and the fourth uses contractivity of $T$. Combining with {eq}`eq-fvic1`, $$ (1-\beta) \, d(\vmax, v_N) \leq \beta \, e_N + d(LTv_N, Tv_N). $$ (eq-fvibd1) *Bounding $d(v_N, v_\sigma)$.* Since $\sigma$ is $v_N$-greedy, $T_\sigma \, v_N = Tv_N$. 
Using this and the fact that $T_\sigma$ is a $\beta$-contraction with fixed point $v_\sigma$, $$ \begin{aligned} d(v_N, v_\sigma) & \leq d(v_N, Tv_N) + d(Tv_N, v_\sigma) \\ & = d(v_N, Tv_N) + d(T_\sigma v_N, T_\sigma v_\sigma) \leq d(v_N, Tv_N) + \beta \, d(v_N, v_\sigma). \end{aligned} $$ Hence $(1-\beta) \, d(v_N, v_\sigma) \leq d(v_N, Tv_N)$, and the same bound as above gives $$ (1-\beta) \, d(v_N, v_\sigma) \leq \beta \, e_N + d(LTv_N, Tv_N). $$ (eq-fvibd2) *Combining.* Adding {eq}`eq-fvibd1` and {eq}`eq-fvibd2` and using {eq}`eq-fvitri` yields {eq}`eq-fvi1`. ◻ ``` The next result provides an alternative bound that replaces the approximation error at the final iterate with the approximation error at the true value function $\vmax$. ```{prf:theorem} :label: t-fvi2 Under {prf:ref}`a-fvi`, if $L$ is nonexpansive on $(V,d)$ and the FVI algorithm terminates after $N$ iterations, then $$ d(\vmax, v_\sigma) \leq \frac{2}{(1-\beta)^2} \left( \beta \, e_N + d(L\vmax, \vmax) \right). $$ (eq-fvi2) ``` The bound {eq}`eq-fvi2` trades a larger constant $(1-\beta)^{-2}$ for a simpler approximation term $d(L\vmax, \vmax)$, which depends only on how well the approximation operator $L$ represents the true value function. This is useful for a priori analysis of approximation architectures. ```{prf:proof} *Proof of {prf:ref}`t-fvi2`.* We first show that $$ (1-\beta) \, d(\vmax, v_\sigma) \leq 2 \, d(v_N, \vmax). $$ (eq-fviclaim3) To this end, fix $v \in V$ and observe that $d(v, Tv) \leq d(v, \vmax) + d(\vmax, Tv) \leq (1+\beta) d(v, \vmax)$. Using this at $v = v_N$ and repeating the argument for $d(v_N, v_\sigma)$ from the proof of {prf:ref}`t-fvi1`, we get $(1-\beta) d(v_N, v_\sigma) \leq (1+\beta) d(v_N, \vmax)$. Combining with {eq}`eq-fvitri`, $$ d(\vmax, v_\sigma) \leq d(\vmax, v_N) + \frac{1+\beta}{1-\beta} d(v_N, \vmax) = \frac{2}{1-\beta} d(v_N, \vmax). $$ This gives {eq}`eq-fviclaim3`. Next, let $\hat v$ be the unique fixed point of $\hat T = LT$ in $V$. 
By the triangle inequality, $$ d(\vmax, v_N) \leq d(\vmax, \hat v) + d(\hat v, v_N). $$ (eq-fvic4tri) For the first term, using $\vmax = T\vmax$ and $\hat v = LT\hat v$: $$ d(\vmax, \hat v) \leq d(\vmax, L\vmax) + d(L\vmax, \hat v) = d(\vmax, L\vmax) + d(LT\vmax, LT\hat v) \leq d(\vmax, L\vmax) + \beta \, d(\vmax, \hat v), $$ so $(1-\beta) \, d(\vmax, \hat v) \leq d(\vmax, L\vmax)$. For the second term, using $\hat v = \hat T \hat v$ and $v_{N+1} = \hat T v_N$: $$ d(\hat v, v_N) \leq d(\hat T \hat v, \hat T v_N) + d(\hat T v_N, v_N) \leq \beta \, d(\hat v, v_N) + \beta \, e_N, $$ where $d(\hat T v_N, v_N) = d(v_{N+1}, v_N) \leq \beta \, e_N$ by contractivity of $\hat T$. Hence $(1-\beta) \, d(\hat v, v_N) \leq \beta \, e_N$. Combining with {eq}`eq-fvic4tri`: $$ (1-\beta) \, d(\vmax, v_N) \leq d(\vmax, L\vmax) + \beta \, e_N. $$ Substituting into {eq}`eq-fviclaim3` yields {eq}`eq-fvi2`. ◻ ``` (sss-op_approx)= #### Order-Preserving Approximations One attractive special case arises when the approximation operator is not only nonexpansive but also order preserving. In the next result we set $\hat{\TT} \coloneq \setntn{L \circ T_\sigma}{\sigma \in \Sigma}$. ```{prf:proposition} :label: p-approx_adp Let {prf:ref}`a-fvi` hold. If $L \colon V \to V$ is both nonexpansive and order preserving, then the pair $(L(V), \hat{\TT})$ is an ADP. ``` ```{prf:proof} Each $L \circ T_\sigma$ is order preserving as a composition of order preserving maps. If $v \in L(V)$, then $T_\sigma \, v \in V$ and $L T_\sigma \, v \in L(V)$, so each $L \circ T_\sigma$ is a self-map on $L(V)$. ◻ ``` In this setting, the approximate model $(L(V), \hat{\TT})$ inherits the structure of an ADP, and the optimality theory from {prf:ref}`c-adps` and {prf:ref}`c-adps2` applies directly to it. The error bounds in {prf:ref}`t-fvi1` and {prf:ref}`t-fvi2` quantify the gap between the optimal policies of the original and approximate models. 
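Kernel averagers are a leading example of an approximation operator with both properties. As a numerical illustration (not a proof), the sketch below builds the Gaussian kernel averager of {prf:ref}`eg-gka` in one dimension and checks nonexpansiveness and order preservation at a set of sample points. The grid, the bandwidth `h`, the test functions `f` and `g`, and all function names are assumptions made for this example.

```python
import math


def gaussian_weights(x, grid, h):
    """Normalized Gaussian kernel weights kappa_i(x) for bandwidth h."""
    raw = [math.exp(-((x - xi) ** 2) / (2.0 * h ** 2)) for xi in grid]
    total = sum(raw)
    return [r / total for r in raw]


def kernel_average(f_vals, x, grid, h):
    """(Lf)(x) = sum_i f(x_i) kappa_i(x)."""
    return sum(fv * w for fv, w in zip(f_vals, gaussian_weights(x, grid, h)))


def f(x):
    return math.sin(3.0 * x)


def g(x):
    return math.sin(3.0 * x) + x ** 2              # g >= f everywhere


grid = [i / 5 for i in range(6)]                   # six grid points on [0, 1]
h = 0.15                                           # hypothetical bandwidth
f_vals = [f(x) for x in grid]
g_vals = [g(x) for x in grid]
test_points = [i / 200 for i in range(201)]

Lf = [kernel_average(f_vals, x, grid, h) for x in test_points]
Lg = [kernel_average(g_vals, x, grid, h) for x in test_points]

gap_L = max(abs(a - b) for a, b in zip(Lf, Lg))             # sampled sup |Lf - Lg|
gap_grid = max(abs(a - b) for a, b in zip(f_vals, g_vals))  # max_i |f(x_i) - g(x_i)|
order_ok = all(a <= b + 1e-12 for a, b in zip(Lf, Lg))      # Lf <= Lg at the samples

print("nonexpansive on the sample:", gap_L <= gap_grid + 1e-12)
print("order preserving on the sample:", order_ok)
```

Both checks rely only on the weights being nonnegative and summing to one, so the same test applies to any other family of kernels satisfying {eq}`eq-kerb`.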
```{prf:lemma} :label: l-kaop The kernel averager $L$ defined in {eq}`eq-kernav` is order preserving on $(b\Xsf, \leq)$. ``` ```{prf:proof} Let $f, g \in b\Xsf$ with $f \leq g$. For each $x \in \Xsf$, $$ (Lf)(x) = \sum_{i=1}^n f(x_i) \kappa_i(x) \leq \sum_{i=1}^n g(x_i) \kappa_i(x) = (Lg)(x), $$ where the inequality uses $f(x_i) \leq g(x_i)$ and $\kappa_i(x) \geq 0$. ◻ ``` ```{prf:remark} Combining {prf:ref}`l-neka` and {prf:ref}`l-kaop`, the kernel averagers of {ref}`sss-la` (including piecewise linear interpolation as a special case; see {prf:ref}`eg-pli`) are both nonexpansive and order preserving. By {prf:ref}`p-approx_adp`, these approximation methods yield approximate ADPs to which the full optimality theory applies. ``` (ss-sa)= ### Stochastic Approximation In the previous section, we discussed methods for controlling error when iterating with an approximate Bellman operator $\hat T = L \circ T$. In that setting, the operator $T$ is known. In many applications, however, $T$ itself cannot be evaluated exactly because it involves an expectation over an unknown or intractable distribution. Instead, we can only observe noisy evaluations of $T$. **Stochastic approximation** provides a framework for finding fixed points in this setting. Here we provide a fast introduction. Stochastic approximation will drive the convergence theory of the learning algorithms we present below in {ref}`s-sim_learn`. #### Fixed Point Iteration and Damping Stochastic approximation can be thought of as a general technique for solving fixed point problems or root finding problems when information is limited. Here we focus on the fixed point case. In our analysis, $T \colon \Theta \to \Theta$ will be a contraction of modulus $\beta$ on a closed subset $\Theta$ of $\RR^n$. By Banach's fixed point theorem, $T$ has a unique fixed point $\bar \theta$ and $T^k \theta \to \bar \theta$ for any $\theta \in \Theta$. 
To start thinking about stochastic approximation, we first recall **damped iteration**, which fixes $\alpha \in (0,1)$ and updates via $$ \theta_{k+1} = \theta_k + \alpha (T \theta_k - \theta_k). $$ (eq-damped) This amounts to iterating with the map $F\theta \coloneq \theta + \alpha (T\theta - \theta)$. Damped iteration is an alternative to regular fixed point iteration that has advantages in some cases. ```{prf:lemma} :label: l-damped If $T$ is a contraction of modulus $\beta$ on $(\Theta, \| \cdot \|)$, then $F$ is a contraction of modulus $1 - \alpha + \alpha \beta < 1$ with the same fixed point $\bar \theta$. ``` ```{prf:proof} First, $F \bar \theta = \bar \theta + \alpha(T \bar \theta - \bar \theta) = \bar \theta$. For contractivity, $$ \begin{aligned} \| F\theta - F\theta' \| & = \| (1-\alpha)(\theta - \theta') + \alpha(T\theta - T\theta') \| \\ & \leq (1-\alpha) \| \theta - \theta' \| + \alpha \| T\theta - T\theta' \| \leq (1-\alpha + \alpha \beta) \| \theta - \theta' \|. \end{aligned} $$ Since $\beta < 1$, we have $1-\alpha + \alpha\beta < 1$. ◻ ``` Note that the modulus $1 - \alpha + \alpha\beta$ is larger than $\beta$, so {prf:ref}`l-damped` guarantees convergence but does not by itself imply any speed gain. One setting where damped iteration can nevertheless converge faster than standard iteration is when the iterates oscillate. {numref}`f-sa_damped_trajectory` and {numref}`f-sa_damped_norms` illustrate this case with the linear map $T\theta = A\theta$, where $$ A = \begin{pmatrix} 0.5 & 0.6 \\ 0.4 & -0.7 \end{pmatrix}. $$ The fixed point is the origin. The eigenvalues of $A$ are real, approximately $0.67$ and $-0.87$, and the negative eigenvalue causes the iterates to flip sign from step to step. Standard iteration therefore produces a zigzagging trajectory, while damped iteration (with $\alpha = 0.7$) converges more smoothly.
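The comparison can be reproduced in a few lines. The following sketch iterates the linear map $T\theta = A\theta$ with and without damping and reports the norm of the final iterate. The starting point and the number of steps are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

A = [[0.5, 0.6], [0.4, -0.7]]


def T(theta):
    """The linear map T(theta) = A theta."""
    return [A[0][0] * theta[0] + A[0][1] * theta[1],
            A[1][0] * theta[0] + A[1][1] * theta[1]]


def damped_step(theta, alpha):
    """One step of theta + alpha * (T theta - theta); alpha = 1 is standard iteration."""
    return [t + alpha * (Tt - t) for t, Tt in zip(theta, T(theta))]


def iterate(theta0, alpha, n_steps):
    """Return the list of iterates theta_0, ..., theta_{n_steps}."""
    path = [theta0]
    for _ in range(n_steps):
        path.append(damped_step(path[-1], alpha))
    return path


theta0 = [1.0, 1.0]                                # hypothetical starting point
standard = iterate(theta0, alpha=1.0, n_steps=30)
damped = iterate(theta0, alpha=0.7, n_steps=30)

for name, path in [("standard", standard), ("damped", damped)]:
    print(f"{name:>8}: norm after 30 steps = {math.hypot(*path[-1]):.3e}")
```

Damping replaces each eigenvalue $\lambda$ of $A$ with $(1 - \alpha) + \alpha \lambda$. With $\alpha = 0.7$ this moves the eigenvalues from roughly $0.67$ and $-0.87$ to roughly $0.77$ and $-0.31$, so the largest modulus falls and the damped iterates approach the origin faster in this example.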
```{figure} figures/sa_damped_trajectory_standard.pdf :name: f-sa_damped_trajectory Standard iteration ``` ```{figure} figures/sa_damped_trajectory_damped.pdf :name: f-sa_damped_trajectory-b Damped iteration (α = 0.7) ``` ```{figure} figures/sa_damped_norms.pdf :name: f-sa_damped_norms Convergence comparison: norm of iterates ``` #### The Robbins--Monro Algorithm Now suppose that $T$ is a contraction on $\Theta$ with unique fixed point $\bar \theta$, but we can only evaluate $T$ with noise. That is, given input $\theta$, we observe $T\theta + W$, where $W$ is a zero-mean random disturbance. We cannot observe $T\theta$ or $W$ separately, only their sum. The **Robbins--Monro algorithm** {cite:p}`robbins1951stochastic` generalizes damped iteration to this noisy setting. It tells us to fix an initial $\theta_0 \in \Theta$ and update via $$ \theta_{k+1} = \theta_k + \alpha_k (T \theta_k + W_{k+1} - \theta_k), $$ (eq-robbins_monro) where the **learning rate** $(\alpha_k) \subset (0,1)$ satisfies $\alpha_k \to 0$. Compare with damped iteration in {eq}`eq-damped`: the only differences are (a) the learning rate decreases over time, and (b) the noise term $W_{k+1}$ enters the update. The decreasing learning rate ensures that, asymptotically, the noise is averaged away while the signal from $T$ accumulates. #### Convergence The following well-known result, due to {cite:t}`tsitsiklis1994asynchronous`, provides conditions under which the Robbins--Monro iterates converge to $\bar \theta$. In the statement, $(\fF_k)_{k \geq 0}$ is the filtration generated by $\theta_0, W_1, \ldots, W_k$. ```{prf:theorem} :label: t-sa_conv Let $T$ be an order-preserving contraction on $\Theta \subset \RR^n$ with fixed point $\bar \theta$, and let $(\theta_k)$ be generated by {eq}`eq-robbins_monro`. If 1. $\EE [W_{k+1} \,|\, \fF_k] = 0$ for all $k \geq 0$, 2. $\EE [\| W_{k+1} \|^2 \,|\, \fF_k] \leq C(1 + \| \theta_k \|^2)$ a.s. for some $C > 0$, and 3. 
$\sum_{k \geq 0} \alpha_k = \infty$ and $\sum_{k \geq 0} \alpha_k^2 < \infty$, then $\theta_k \to \bar \theta$ with probability one. ``` The restrictions on $(\alpha_k)$ in (iii) are called the **Robbins--Monro conditions**. The restriction $\sum \alpha_k = \infty$ ensures that the learning rates do not decay too fast, so the algorithm can reach $\bar \theta$ from any starting point. The condition $\sum \alpha_k^2 < \infty$ ensures that the accumulated noise variance is finite. A standard choice is $\alpha_k = (k+1)^{-\gamma}$ for $\gamma \in (1/2, 1)$. ```{prf:remark} :label: rem-async_sa The update {eq}`eq-robbins_monro` is synchronous: a single scalar $\alpha_k$ is applied to every component of $\theta$ at each step. {cite}`tsitsiklis1994asynchronous` proves a more general **asynchronous** version in which only one component of $\theta$ is updated at each step, using a per-component learning rate, provided every component is updated infinitely often and the Robbins--Monro conditions hold componentwise. This asynchronous extension is what underlies the sequential algorithm in {ref}`sss-asv` and the Q-learning convergence theorem in {ref}`ss-qlearn`. ``` (sss-sa_asset)= #### Example: Asset Pricing To illustrate stochastic approximation for a fixed point problem, we consider a basic asset pricing model. The model is deliberately simple, so that the exact solution can be computed by linear algebra (allowing us to verify numerically that stochastic approximation converges to the correct answer). In the model, the price of the asset satisfies $$ V_t = \beta \, \EE [V_{t+1} + D_{t+1} \mid \fF_t], $$ where $\beta \in (0,1)$ is a discount factor and $D_t$ is the dividend at time $t$. (See Section 6.3 of {cite}`sargent2025dynamic` for background on the model.) Assume that $(X_t)$ is $P$-Markov on a finite set $\Xsf$ with transition matrix $P$, and $D_{t+1} = d(X_{t+1})$. 
The equilibrium price function $v$ is the unique fixed point in $\RR^\Xsf$ of the map $(Tv)(x) = \beta \sum_{x'} [v(x') + d(x')] P(x, x')$. ```{exercise} :label: ex-ap_contraction Show that the operator $T$ defined above is a contraction of modulus $\beta$ on $(\RR^\Xsf, \| \cdot \|_\infty)$. ``` ```{solution} ex-ap_contraction Fix $v, w \in \RR^\Xsf$ and $x \in \Xsf$. Then $$ \begin{aligned} |(Tv)(x) - (Tw)(x)| & = \beta \left| \sum_{x'} [v(x') - w(x')] P(x, x') \right| \\ & \leq \beta \sum_{x'} |v(x') - w(x')| P(x, x') \leq \beta \| v - w \|_\infty. \end{aligned} $$ Taking the supremum over $x$ gives $\| Tv - Tw \|_\infty \leq \beta \| v - w \|_\infty$. ``` Setting $K \coloneq \beta P$, the unique fixed point of $T$ is $v^* = (I - K)^{-1} Kd$. We can also compute $v^*$ using stochastic approximation. To do this, we use the random map $v \mapsto \hat Tv$ given by the following algorithm: ```{prf:algorithm} :label: algo-approx_learning-auto-1 - for $x \in \Xsf$: - draw $X' \sim P(x, \cdot)$ - set $(\hat Tv)(x) = \beta [v(X') + d(X')]$ ``` We understand $\hat T$ as an operator that maps an element $v$ of $\RR^\Xsf$ into a random vector $\hat Tv$ in $\RR^\Xsf$. Note that $$ \EE \, [(\hat Tv)(x)] = \sum_{x'} \beta [v(x') + d(x')] P(x, x') = (Tv)(x). $$ Now we iterate via $$ v_{k+1} = v_k + \alpha_k (\hat T v_k - v_k) $$ (eq-adamp) which parallels the damped iteration in {eq}`eq-damped`. With $W_{k+1} \coloneq \hat Tv_k - Tv_k$ we have $\EE [W_{k+1} \mid \fF_k] = 0$ and $$ v_{k+1} = v_k + \alpha_k (Tv_k + (\hat T v_k - Tv_k) - v_k) = v_k + \alpha_k (Tv_k + W_{k+1} - v_k), $$ so updating obeys the Robbins--Monro rule {eq}`eq-robbins_monro`. The operator $T$ is order preserving since $v \leq w$ implies $Tv \leq Tw$ (the sum involves nonnegative weights). Conditions (i) and (iii) of {prf:ref}`t-sa_conv` hold by construction.[^1] Thus {prf:ref}`t-sa_conv` applies, and $(v_k)$ converges to $v^*$ with probability one.
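A minimal sketch of the batch scheme {eq}`eq-adamp` follows. It uses a hypothetical three-state chain with $\beta = 0.9$ rather than the 25-state specification described below. To stay dependency-free, it obtains the benchmark $v^*$ by iterating $T$ (valid by Banach's fixed point theorem) instead of solving the linear system. The matrix `P`, the dividend vector `d`, the seed, and the iteration counts are all assumptions chosen for illustration.

```python
import random

beta = 0.9                      # hypothetical discount factor
P = [[0.8, 0.2, 0.0],           # hypothetical transition matrix
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
d = [0.5, 1.0, 1.5]             # hypothetical dividend function d(x)
n = len(d)


def T(v):
    """Exact operator (Tv)(x) = beta * sum_x' [v(x') + d(x')] P(x, x')."""
    return [beta * sum((v[y] + d[y]) * P[x][y] for y in range(n))
            for x in range(n)]


def T_hat(v, rng):
    """Random operator: one draw X' ~ P(x, .) for each state x."""
    out = []
    for x in range(n):
        y = rng.choices(range(n), weights=P[x])[0]
        out.append(beta * (v[y] + d[y]))
    return out


def exact_solution(num_iter=1000):
    """Benchmark v* via successive approximation with the exact T."""
    v = [0.0] * n
    for _ in range(num_iter):
        v = T(v)
    return v


def batch_sa(num_iter=20_000, seed=0):
    """Robbins-Monro iteration v <- v + alpha_k (T_hat v - v)."""
    rng = random.Random(seed)
    v = [0.0] * n
    for k in range(num_iter):
        alpha = (k + 1) ** -0.55          # learning rate from the text
        tv = T_hat(v, rng)
        v = [vi + alpha * (ti - vi) for vi, ti in zip(v, tv)]
    return v


v_star = exact_solution()
v_sa = batch_sa()
print("exact:", v_star)
print("SA:   ", v_sa)
print("max abs error:", max(abs(a - b) for a, b in zip(v_sa, v_star)))
```

Only `T_hat` is used inside `batch_sa`, so the iteration never evaluates the expectation over $P$ directly. The exact operator appears only in the benchmark.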
{numref}`f-sa_asset_pricing_batch` compares the stochastic approximation solution with the exact linear algebra solution $v^* = (I - K)^{-1} Kd$ for a specification with $|\Xsf| = 25$, $\beta = 0.96$, $d(x) = \me^x$, and $(X_t)$ discretized from an AR(1) process with $\rho = 0.9$ and $\sigma = 0.02$. The learning rate is $\alpha_k = (k+1)^{-0.55}$. After $500$ iterations (left panel), the stochastic approximation (SA) estimate is uniformly below $v^*$ but has already captured the shape of the price function. After $50{,}000$ iterations (right panel), the SA solution closely matches the exact value. ```{figure} figures/sa_asset_pricing_batch.pdf :name: f-sa_asset_pricing_batch :width: 95% Batch stochastic approximation vs. linear algebra for asset pricing: partial convergence (left) and near-full convergence (right) ``` (sss-asv)= #### A Sequential Version An alternative way to apply stochastic approximation is to simulate a single trajectory $(X_t)_{t \geq 0}$ of the $P$-Markov chain and update $v$ sequentially. In particular, at each step $t$, only the entry $v(X_t)$ is modified: $$ v_{t+1}(X_t) = v_t(X_t) + \alpha_t \big[ (\hat T_s v_t)(X_t) - v_t(X_t) \big], $$ (eq-sa_asset_update) where $$ (\hat T_s v_t)(X_t) \coloneq \beta [v_t(X_{t+1}) + d(X_{t+1})] . $$ For states $y \neq X_t$ we hold the estimate steady, setting $v_{t+1}(y) = v_t(y)$. Note that $\hat T_s$ is a single-sample version of the operator $\hat T$ from {ref}`sss-sa_asset`: it uses the one transition observed along the simulated path instead of a fresh draw for every state. The asynchronous extension of {prf:ref}`t-sa_conv` noted in {prf:ref}`rem-async_sa` shows that the iterates $v_t$ converge to $v^*$ with probability one, provided every state is visited infinitely often. The advantage of the sequential version will be clearer when we study Q-learning in {ref}`ss-qlearn`. In real-world Q-learning applications, sequences of observations are naturally generated by the environment and the agent's interactions with that environment.
Optimal policies can be computed from these sequences without knowing or even estimating underlying dynamics. {numref}`f-sa_asset_pricing` illustrates the sequential version on the same asset pricing model from {ref}`sss-sa_asset`, using per-state learning rates $\alpha_t = n_t(X_t)^{-0.55}$, where $n_t(x)$ counts the number of visits to state $x$ by time $t$. The left panel shows the SA estimate after $10^4$ updates. Convergence is fastest near the center of the state space, where the stationary distribution of the chain concentrates most of its mass and states are visited most frequently. At the tails, fewer visits lead to slower learning. After $5 \times 10^7$ updates (right panel), the SA solution closely matches the exact value. ```{figure} figures/sa_asset_pricing.pdf :name: f-sa_asset_pricing :width: 95% Sequential stochastic approximation vs. linear algebra for asset pricing: partial convergence (left) and near-full convergence (right) ``` (s-sim_learn)= ## Simulation and Learning In most of this book we have worked in settings where the underlying model is known. In many applications, the model is unknown and the agent must learn from data generated by interacting with the environment. This section covers two approaches to learning: Q-learning (value-based) and policy gradient methods (policy-based). We emphasize that this section is intended only as a very brief introduction to a vast literature. (ss-qlearn)= ### Q-Learning Q-learning is a model-free algorithm that learns optimal policies from data, without knowledge of transition probabilities or reward functions. In this section we connect Q-learning to the Q-factor ADP framework developed in {ref}`sss-qfmo`. 
#### Q-Factors and Model-Free Learning Recall from {ref}`sss-qfmo` that, for a finite MDP $(\Gamma, r, \beta, P)$ with state space $\Xsf$ and action space $\Asf$, the Q-factor Bellman equation takes the form $$ q(x, a) = r(x, a) + \beta \sum_{x'} \max_{a' \in \Gamma(x')} q(x', a') \; P(x, a, x') \qquad ((x,a) \in \Gsf). $$ (eq-qfbe) As shown in {prf:ref}`p-qffdp`, 1. the unique solution $\qmax$ to {eq}`eq-qfbe` obeys $\vmax(x) = \max_{a \in \Gamma(x)} \qmax(x, a)$, where $\vmax$ is the value function, and 2. $\sigma$ is an optimal policy whenever $\sigma(x) = \argmax_{a \in \Gamma(x)} \qmax(x, a)$ for all $x$. Item (ii) tells us that the optimal policy can be calculated from $\qmax$ without knowledge of the transition kernel $P$. This observation opens the door to learning: if we can approximate $\qmax$ from samples, we can compute an approximately optimal policy without knowing the model. #### The Q-Learning Algorithm Q-learning approximates $\qmax$ by stochastic approximation of the Q-factor Bellman operator. At each step $t$, the agent is in state $X_t = x$, takes action $A_t = a$, observes reward $R_{t+1} = r(x, a)$ and next state $X_{t+1} \sim P(x, a, \cdot)$, and updates the Q-function (also called the **Q-table**) via $$ q_{t+1}(x, a) = (1 - \alpha_t) \, q_t(x, a) + \alpha_t \left( R_{t+1} + \beta \max_{a' \in \Gamma(X_{t+1})} q_t(X_{t+1}, a') \right). $$ (eq-qlupdate) Here $\alpha_t \in (0,1)$ is a **learning rate**. The term in parentheses is a single-sample estimate of the right-hand side of {eq}`eq-qfbe`. The update blends this fresh sample with the current estimate. The agent needs to observe only the tuple $(X_t, A_t, R_{t+1}, X_{t+1})$ at each step. Actions are selected by a **behavior policy** that balances exploration (visiting new state-action pairs) with exploitation (using the current estimate of $\qmax$). 
A common choice is the $\epsilon$-**greedy** strategy: with probability $\epsilon$, choose a random feasible action; otherwise choose $\argmax_{a \in \Gamma(x)} q_t(x, a)$. Because the convergence target $\qmax$ does not depend on how actions are selected---only on the Bellman equation {eq}`eq-qfbe`---Q-learning is called an **off-policy** method. #### Convergence Convergence of Q-learning follows from the stochastic approximation theory developed in {ref}`ss-sa`. To see this, let $S$ be the Q-factor Bellman operator defined in {eq}`eq-pabo0`, so that $(Sq)(x, a) = r(x, a) + \beta \sum_{x'} \max_{a'} q(x', a') P(x, a, x')$. By {prf:ref}`ex-saac`, each policy operator $S_\sigma$ is a contraction of modulus $\beta$ on $(\RR^\Gsf, \| \cdot \|_\infty)$. Since the supremum norm is sup-nonexpansive, {prf:ref}`t-contract` gives that $S$ is also a contraction of modulus $\beta$ with unique fixed point $\qmax$. At each step $t$, the agent observes $(X_t, A_t, R_{t+1}, X_{t+1})$ and computes the sample $$ (\hat S q_t)(X_t, A_t) \coloneq R_{t+1} + \beta \max_{a' \in \Gamma(X_{t+1})} q_t(X_{t+1}, a'). $$ With this notation, the Q-learning update {eq}`eq-qlupdate` becomes $$ q_{t+1}(X_t, A_t) = q_t(X_t, A_t) + \alpha_t \big[ (\hat S q_t)(X_t, A_t) - q_t(X_t, A_t) \big], $$ which parallels {eq}`eq-sa_asset_update`. Since the fixed point of $S$ is $\qmax$, applying the asynchronous version of {prf:ref}`t-sa_conv` noted in {prf:ref}`rem-async_sa` leads to the following classical result {cite:p}`watkins1992qlearning,tsitsiklis1994asynchronous`: ```{prf:theorem} :label: t-qlconv Consider a finite MDP with $\beta \in (0,1)$. If every state-action pair $(x, a) \in \Gsf$ is visited infinitely often and the learning rates satisfy the Robbins--Monro conditions of {prf:ref}`t-sa_conv`, then the Q-learning iterates satisfy $q_t \to \qmax$ almost surely. 
``` {prf:ref}`t-qlconv` is often referred to as a "tabular" result: it assumes that the Q-table stores a separate value for every $(x, a)$ pair. An alternative approach, which scales to high dimensions, is to combine Q-learning with one of the function approximation schemes from {ref}`s-approx`. When the scheme in question is neural nets we get deep Q-learning, which has been highly successful in certain environments. See {ref}`s-cn_approx` for more discussion. #### Example: Inventory Management To illustrate Q-learning, we consider an inventory management problem where a firm manages stock of a single product. Inventory $(X_t)_{t \geq 0}$ evolves according to $$ X_{t+1} = \max(X_t - D_{t+1}, 0) + A_t, $$ where $D_t$ is iid demand and $A_t \in \{0, \ldots, K - X_t\}$ is the order quantity chosen by the firm. The state space is $\Xsf = \{0, \ldots, K\}$. Per-period profits are $$ \pi(X_t, A_t, D_{t+1}) = p \min(X_t, D_{t+1}) - c \, A_t - \kappa \1\{A_t > 0\}, $$ where $p$ is the selling price, $c$ is the unit ordering cost, and $\kappa$ is a fixed cost of placing an order. The firm seeks to maximize expected discounted profits $\EE \sum_{t \geq 0} \beta^t \pi(X_t, A_t, D_{t+1})$. Now suppose that a manager does not know the demand distribution, the cost parameters, or the transition dynamics. The manager can still learn the optimal ordering policy by Q-learning. At each period, the manager observes the current inventory $X_t$, places an order $A_t$, records the resulting profit, and observes the next-period inventory $X_{t+1}$. These four quantities suffice for the update {eq}`eq-qlupdate`. {numref}`f-inventory_qlearning` compares the Q-learning solution with the exact VFI solution for this problem, with $K = 20$, selling price $p = 1$, geometric demand with parameter $\theta = 0.7$, $c = 0.2$, $\kappa = 0.8$, and $\beta = 0.98$. 
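The update {eq}`eq-qlupdate` with an $\epsilon$-greedy behavior policy can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code behind the figure. It uses a smaller capacity $K = 5$ and far fewer steps. It takes geometric demand to be supported on $\{0, 1, \ldots\}$, with $D = d$ having probability $(1 - \theta)^d \theta$. It uses per-pair learning rates $\alpha = (n(x, a) + 1)^{-0.55}$, where $n(x, a)$ counts visits to $(x, a)$. The value $\epsilon = 0.1$ and all function names are likewise hypothetical.

```python
import random


def q_learning_inventory(K=5, p=1.0, c=0.2, kappa=0.8, beta=0.98,
                         theta=0.7, eps=0.1, steps=100_000, seed=0):
    """Tabular Q-learning for the inventory model (illustrative sketch)."""
    rng = random.Random(seed)
    # q[x][a] is stored only for feasible orders a in {0, ..., K - x}
    q = [[0.0] * (K - x + 1) for x in range(K + 1)]
    visits = [[0] * (K - x + 1) for x in range(K + 1)]
    x = 0
    for _ in range(steps):
        # Behavior policy: epsilon-greedy over feasible actions
        if rng.random() < eps:
            a = rng.randrange(K - x + 1)
        else:
            a = max(range(K - x + 1), key=q[x].__getitem__)
        # Environment step; the learner sees only (x, a, reward, x_next)
        demand = 0
        while rng.random() > theta:       # assumed geometric on {0, 1, ...}
            demand += 1
        reward = p * min(x, demand) - c * a - kappa * (a > 0)
        x_next = max(x - demand, 0) + a
        # Q-learning update with an assumed per-pair learning rate
        alpha = (visits[x][a] + 1) ** -0.55
        visits[x][a] += 1
        target = reward + beta * max(q[x_next])
        q[x][a] = (1 - alpha) * q[x][a] + alpha * target
        x = x_next
    return q


def greedy_policy(q):
    """Order quantity that maximizes the learned Q-values at each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


q = q_learning_inventory()
print("learned policy:", greedy_policy(q))
print("learned values:", [max(row) for row in q])
```

The learner never touches `theta`, `p`, `c`, or `kappa` directly. They enter only through the simulated environment step, which stands in for observed data.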
After 20 million steps of interaction, the learned value function and policy closely match those obtained by exact value function iteration with full model knowledge. ```{figure} figures/inventory_qlearning.pdf :name: f-inventory_qlearning :width: 95% Value function and policy: VFI (exact) vs. Q-learning (model-free) ``` The 20 million steps required in this small example illustrate a general feature of model-free methods: because they do not exploit knowledge of transition probabilities or reward functions, they must extract all structural information from observed data, which can require very long trajectories. In practice, the cost of exploration is compounded by the fact that the agent typically needs to make reasonable decisions at each time step, limiting the extent to which it can explore freely. These challenges motivate both deep Q-learning (which uses function approximation to generalize across states) and the policy gradient methods discussed in {ref}`ss-policy_grad` (which search directly in policy space). (ss-rs_qlearn)= ### Risk-Sensitive Q-Learning The Q-learning framework can be extended to accommodate risk-sensitive preferences. As we will see, the extension provides a concrete illustration of the order-reversing FDP theory developed in {ref}`ss-orfdps`. #### Risk-Sensitive Bellman Equation We can construct a risk-sensitive version of the inventory problem Bellman equation by replacing the expectation with a certainty equivalent (see, e.g., {ref}`sss-serez`). Using the entropic certainty equivalent with parameter $\gamma > 0$, the Bellman equation becomes $$ v(x) = \max_{a \in \Gamma(x)} \left( -\frac{1}{\gamma} \right) \ln \left\{ \sum_{x'} \exp \left[ -\gamma \left( r(x, a) + \beta \, v(x') \right) \right] P(x, a, x') \right\}. $$ (eq-rsbell) Larger $\gamma$ implies stronger aversion to downside risk. 
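To see the effect of $\gamma$ numerically, consider the entropic certainty equivalent $-\frac{1}{\gamma} \ln \EE \exp(-\gamma Z)$ of a random payoff $Z$. This is the object inside the maximization in {eq}`eq-rsbell`, with $Z = r(x, a) + \beta \, v(X')$. The sketch below evaluates it for a hypothetical three-point payoff distribution; the values and probabilities are assumptions for illustration only.

```python
import math


def entropic_ce(values, probs, gamma):
    """Entropic certainty equivalent -(1/gamma) * ln E[exp(-gamma * Z)]."""
    m = sum(pr * math.exp(-gamma * z) for z, pr in zip(values, probs))
    return -math.log(m) / gamma


values = [0.0, 1.0, 2.0]        # hypothetical payoffs
probs = [0.25, 0.5, 0.25]       # hypothetical probabilities
mean = sum(pr * z for z, pr in zip(values, probs))

print("mean payoff:", mean)
for gamma in (0.01, 1.0, 2.0):
    print("gamma =", gamma, "certainty equivalent =",
          entropic_ce(values, probs, gamma))
```

For small $\gamma$ the certainty equivalent is close to the mean payoff, and it falls as $\gamma$ rises, so bad outcomes receive more weight.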
(sss-rs_fdp)= #### The Order-Reversing FDP Structure Our aim is to derive a Q-factor Bellman equation for the risk-sensitive problem, analogous to {eq}`eq-qfbe`, and then build a Q-learning algorithm from it. One challenge is how to shift expectation to the outside of the maximization step, as in the standard Q-factor equation (see {eq}`eq-qfbe`). We will use the order-reversing FDP framework of {ref}`ss-orfdps` to factor the Bellman equation into a suitable form, enabling single-sample updates. We set up the FDP $(V, F, \hat V, \GG)$ as follows. The primary value space is $V \coloneq \RR^\Xsf$ and the subordinate value space is $\hat V \coloneq \RR_{++}^\Gsf$ (strictly positive functions on $\Gsf$), each with the pointwise partial order. The connecting map $F \colon V \to \hat V$ is defined by $$ (Fv)(x, a) \coloneq \sum_{x'} \exp\!\left[ -\gamma \left( r(x, a) + \beta \, v(x') \right) \right] P(x, a, x'). $$ (eq-rsfmap) Noting that the image of $F$ lies in $\RR_{++}^\Gsf$ (since the exponential function is strictly positive), we define the family $\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}$ of maps from $\hat V$ to $V$ by $$ (G_\sigma \, q)(x) \coloneq -\frac{1}{\gamma} \ln\!\big(q(x, \sigma(x))\big). $$ (eq-rsgmap) The map $G_\sigma$ evaluates the Q-function at the action prescribed by $\sigma$ and applies $\psi^{-1}(y) = -(1/\gamma)\ln(y)$ to convert the positive Q-value back to the value scale. ```{exercise} :label: ex-approx_learning-auto-1 Show that both $F$ and $G_\sigma$ are order-reversing. ``` ```{solution} ex-approx_learning-auto-1 Let us start with $F$. If $v \leq v'$ pointwise, then $-\gamma(r(x,a) + \beta v(x')) \geq -\gamma(r(x,a) + \beta v'(x'))$ for all $x'$, so $(Fv)(x,a) \geq (Fv')(x,a)$. For $G_\sigma$: if $q \leq q'$ pointwise, then $\ln(q(x, \sigma(x))) \leq \ln(q'(x, \sigma(x)))$, and multiplication by $-1/\gamma < 0$ reverses the inequality, giving $G_\sigma \, q \geq G_\sigma \, q'$. 
``` Since $\Gamma(x)$ is finite, the set $\{G_\sigma \, q\}_{\sigma \in \Sigma}$ has a greatest element for every $q \in \hat V$. Thus, $(V, F, \hat V, \GG)$ is an order-reversing FDP. #### Primary and Subordinate Bellman Equations We have chosen $(V, F, \hat V, \GG)$ so that the primary ADP for this FDP has the Bellman equation {eq}`eq-rsbell`. To see this, recall that, in the FDP setting, $\Gmax$ is defined by $\Gmax q \coloneq \bigvee_\sigma G_\sigma \, q$. Here this gives $(\Gmax q)(x) = -(1/\gamma)\ln(\min_{a \in \Gamma(x)} q(x, a))$, since $\psi^{-1}$ is decreasing. The primary policy operators $T_\sigma \coloneq G_\sigma \circ F$ give $$ (T_\sigma \, v)(x) = -\frac{1}{\gamma} \ln\!\left( \sum_{x'} P(x, \sigma(x), x') \exp\!\left[-\gamma(r(x, \sigma(x)) + \beta \, v(x'))\right] \right). $$ With $\tmax = \Gmax \circ F$ (by {prf:ref}`l-gmaxr`), we recover {eq}`eq-rsbell`. For the subordinate ADP, by {prf:ref}`l-gmhr`, the Bellman min-operator is $\htmin = F \circ \Gmax$. Computing explicitly: $$ \begin{aligned} (\htmin q)(x, a) &= (F(\Gmax q))(x, a) \\ &= \sum_{x'} P(x, a, x') \exp\!\left[-\gamma\!\left(r(x, a) + \beta \cdot \left(-\tfrac{1}{\gamma}\right) \ln\!\big(\min_{a'} q(x', a')\big)\right)\right] \\ &= \sum_{x'} P(x, a, x') \left[\exp(-\gamma \, r(x, a)) \cdot \left(\min_{a' \in \Gamma(x')} q(x', a')\right)^\beta \right]. \end{aligned} $$ Thus the subordinate Bellman equation $q = \htmin q$ takes the form $$ q(x, a) = \sum_{x'} P(x, a, x') \left[ \exp(-\gamma \, r(x, a)) \cdot \left(\min_{a' \in \Gamma(x')} q(x', a')\right)^\beta \right]. $$ (eq-rsqbe) This is the **risk-sensitive Q-factor Bellman equation**---a fixed point equation in $q$ alone, and the min-analogue of the standard Q-factor Bellman equation {eq}`eq-qfbe`. Notice that the expectation (summation over $x'$) appears on the outside of {eq}`eq-rsqbe` because it originates from the map $F$ in the FDP factorization. 
The maps $G_\sigma$ act pointwise and do not involve the transition kernel $P$. This separation makes Q-learning possible: we can replace the expectation with a single-sample observation, just as in the standard case. Our original goal is to find an optimal policy for the primary risk-sensitive MDP. The Q-learning algorithm described below operates on the subordinate Q-factor problem, converging to its min-value function $\qmin$. The key question is whether $\qmin$ suffices to recover an optimal policy for the primary problem. {prf:ref}`t-rgscr` gives an affirmative answer: any policy $\sigma$ satisfying $G_\sigma \, \qmin = \Gmax \, \qmin$ is optimal for the primary problem $(V, \TT)$. Since $(\Gmax q)(x) = -(1/\gamma)\ln(\min_a q(x,a))$, this condition reduces to $\sigopt(x) = \argmin_{a \in \Gamma(x)} \qmin(x, a)$. In other words, once Q-learning has converged, we read off the optimal policy by minimizing the learned Q-function at each state. ```{exercise} :label: ex-rs_contraction Let $\hat T_\sigma = F \circ G_\sigma$ be the subordinate policy operator for the risk-sensitive FDP defined above, mapping $\RR_{++}^\Gsf$ to itself. Define the metric $d(q, q') \coloneq \| \ln q - \ln q' \|_\infty$ on $\RR_{++}^\Gsf$. Show that $\hat T_\sigma$ is a contraction of modulus $\beta$ under $d$. ``` ```{solution} ex-rs_contraction Using the expressions derived above, $$ (\hat T_\sigma q)(x, a) = \exp(-\gamma \, r(x, a)) \sum_{x'} P(x, a, x') \, q(x', \sigma(x'))^\beta. $$ Suppose $d(q, q') \leq \delta$, so that $\me^{-\delta} \leq q(x, a) / q'(x, a) \leq \me^{\delta}$ for all $(x, a) \in \Gsf$. Then $q(x', \sigma(x'))^\beta$ lies between $\me^{-\beta \delta}$ and $\me^{\beta \delta}$ times $q'(x', \sigma(x'))^\beta$. 
Since $P(x, a, \cdot)$ has nonnegative weights summing to one, $$ \me^{-\beta\delta} \, (\hat T_\sigma q')(x, a) \leq (\hat T_\sigma q)(x, a) \leq \me^{\beta\delta} \, (\hat T_\sigma q')(x, a), $$ where the $\exp(-\gamma \, r(x, a))$ factor cancels in the ratio. Taking logarithms and the supremum over $(x, a)$ gives $d(\hat T_\sigma q, \hat T_\sigma q') \leq \beta \, \delta = \beta \, d(q, q')$. ``` (sss-rsbehav)= #### Optimal Behavior Under Risk Sensitivity Before turning to Q-learning, we illustrate how risk sensitivity affects the optimal policy and the induced dynamics in the inventory model of {ref}`ss-qlearn`. We apply value function iteration to the risk-sensitive Bellman equation {eq}`eq-rsbell` for three levels of risk sensitivity, $\gamma \in \{0.01, 1.0, 2.0\}$, holding all other parameters at the values used for {numref}`f-inventory_qlearning`. {numref}`f-rs_inventory_vfi_gamma` shows the resulting optimal order quantities. A more risk-sensitive firm (larger $\gamma$) orders less stock, accepting lower expected sales in exchange for more predictable profits. {numref}`f-rs_inventory_sim_gamma` confirms the effect in simulation: inventory paths under the optimal policy stabilize at lower levels as $\gamma$ increases. ```{figure} figures/rs_inventory_vfi_gamma.pdf :name: f-rs_inventory_vfi_gamma Optimal policy as a function of inventory, for $\gamma \in \{0.01, 1.0, 2.0\}$ ``` ```{figure} figures/rs_inventory_sim_gamma.pdf :name: f-rs_inventory_sim_gamma :width: 85% Inventory dynamics under the optimal policy at $\gamma \in \{0.01, 1.0, 2.0\}$ ``` #### The Risk-Sensitive Q-Learning Update The corresponding Q-learning update replaces the expectation in {eq}`eq-rsqbe` with a single sample: $$ q_{t+1}(x, a) = (1 - \alpha_t) \, q_t(x, a) + \alpha_t \left[ \exp(-\gamma \, R_{t+1}) \cdot \left(\min_{a' \in \Gamma(X_{t+1})} q_t(X_{t+1}, a')\right)^\beta \right]. $$ (eq-rsqlupdate) Note several differences from the standard case. 
First, the optimal policy minimizes rather than maximizes $q$. Second, the reward enters through $\exp(-\gamma R_{t+1})$ rather than additively. Third, the continuation value enters as a power $(\min_{a'} q_t)^\beta$ rather than a scaled sum $\beta \cdot \max_{a'} q_t$. Fourth, to make the calculations, the agent now needs to know $\gamma$ as well as $\beta$. ```{prf:remark} As far as we know, convergence of the risk-sensitive Q-learning iterates is an open question. {prf:ref}`ex-rs_contraction` shows that each subordinate policy operator $\hat T_\sigma$ is a contraction of modulus $\beta$ under $d(q, q') = \|\ln q - \ln q'\|_\infty$ on $\RR_{++}^\Gsf$, and each $\hat T_\sigma$ is order preserving (since $F$ and $G_\sigma$ both reverse order). However, the metric $d$ is not generated by a norm, so the asynchronous stochastic approximation result of {cite:t}`tsitsiklis1994asynchronous` does not directly apply. Convergence seems likely but we are not aware of a result that proves it. ``` {numref}`f-rs_inventory_q_learning` shows the value function and policy produced by the risk-sensitive Q-learning update {eq}`eq-rsqlupdate` after $2 \times 10^7$ steps on the inventory model at $\gamma = 1.0$, alongside the exact VFI solution. Q-learning recovers the optimal policy and value function despite the open theoretical question above; see the QuantEcon lectures for implementation details and code. ```{figure} figures/rs_inventory_q_learning.pdf :name: f-rs_inventory_q_learning :width: 95% Risk-sensitive inventory at $\gamma = 1.0$: VFI vs. Q-learning after $2 \times 10^7$ steps ``` (ss-policy_grad)= ### Policy-Based Methods The methods discussed so far are *value-based*: they learn a value function (or Q-function) and extract a policy from it. An alternative approach is to parameterize the policy directly and optimize over the parameter space. Such **policy gradient** methods are widely used in modern reinforcement learning. 
#### The Policy Gradient Approach Rather than searching over all feasible policies, we restrict attention to a parameterized family $\{\sigma(\cdot, \theta) : \theta \in \Theta\}$, where $\theta \in \Theta \subset \RR^m$ is a parameter vector. Given an initial condition $x_0$, define $$ M(\theta) \coloneq v_{\sigma(\cdot, \theta)}(x_0), $$ the lifetime value of the policy $\sigma(\cdot, \theta)$ starting from $x_0$. The policy gradient method maximizes $M(\theta)$ over $\theta$ by gradient ascent: $$ \theta_{n+1} = \theta_n + \lambda_n \nabla_\theta \hat M(\theta_n), $$ (eq-pgascent) where $\lambda_n > 0$ is a step size and $\hat M$ is a Monte Carlo approximation to $M$ described below. To illustrate, consider the optimal savings problem from {ref}`s-og`: a household chooses a consumption policy to maximize $\EE \sum_{t \geq 0} \beta^t u(c_t)$, where $u$ is a smooth utility function, subject to $$ a_{t+1} = R(a_t - c_t) + Y_{t+1}, \qquad c_t \in [0, a_t], $$ (eq-pgifp) with iid income shocks $(Y_t)$ and gross interest rate $R > 0$. A policy $\sigma(a, \theta)$ maps current assets $a$ to consumption $c$, parameterized by $\theta$ (for example, the weights of a neural network). The state dynamics are a smooth function of the policy: given a realization of the shock sequence, $a_{t+1}$ is differentiable in $\theta$ through $c_t = \sigma(a_t, \theta)$. #### Monte Carlo Approximation In general, $M(\theta)$ cannot be evaluated exactly because it involves an expectation over the shock sequence. Instead, we form a Monte Carlo approximation by simulating $N$ independent paths of length $T$ under the policy $\sigma(\cdot, \theta)$. For each path $i = 1, \ldots, N$, set $a^i_0 = a_0$ and generate $$ c^i_t = \sigma(a^i_t, \theta), \qquad a^i_{t+1} = R(a^i_t - c^i_t) + Y^i_{t+1}, \qquad (t = 0, \ldots, T-1), $$ where $(Y^i_t)$ are iid draws. The Monte Carlo estimate is $$ \hat M(\theta) = \frac{1}{N} \sum_{i=1}^N \sum_{t=0}^{T-1} \beta^t u(\sigma(a^i_t, \theta)). 
$$ (eq-pgmc) The key observation is that $\hat M(\theta)$ is differentiable in $\theta$ whenever $u$ and $\sigma(\cdot, \theta)$ are differentiable, because the state dynamics {eq}`eq-pgifp` are a smooth function of the action. (The shocks $Y^i_{t+1}$ do not depend on $\theta$, so they are treated as constants when differentiating.) Each simulated path defines a computational graph from $\theta$ through the policy evaluations and state transitions to the cumulative payoff. Modern automatic differentiation libraries can differentiate through these graphs efficiently, providing $\nabla_\theta \hat M(\theta)$ at cost comparable to evaluating $\hat M(\theta)$ itself. The gradient estimate is substituted into the ascent step {eq}`eq-pgascent`. A common and effective choice for the policy parameterization is a neural network mapping states to actions, with $\theta$ collecting all the weights and biases. The universal approximation theorem {cite:p}`hornik1989multilayer` guarantees that sufficiently large networks can approximate any continuous policy to arbitrary accuracy on compact sets. The main advantage of policy gradient methods is that they handle continuous state and action spaces naturally and scale to high-dimensional problems. Discussion of related literature can be found in {ref}`s-cn_approx`. {numref}`f-policy_gradient_ifp` applies the policy gradient method to the household problem {eq}`eq-pgifp` with CRRA utility ($\gamma = 1.5$), $\beta = 0.96$, $R = 1.01$, and iid lognormal income. The policy is parameterized by a neural network with three hidden layers of six units each, trained for $400$ epochs using $100$ paths of length $200$. The learned consumption policy closely matches the exact solution obtained by the endogenous grid method (EGM; see {ref}`sss-egm`). For this low-dimensional problem, EGM is far more efficient; we use it here as a trusted benchmark to verify the policy gradient solution.
The advantage of policy gradient methods emerges in high-dimensional settings where traditional grid-based methods are infeasible. ```{figure} figures/policy_gradient_ifp.pdf :name: f-policy_gradient_ifp Consumption policy: EGM (exact) vs. policy gradient (neural network) ``` Interestingly, the policy gradient method recovers the globally optimal policy despite the fact that we are optimizing lifetime value at a single initial condition $a_0$. This raises a natural question: when does maximizing at one initial condition generate an optimal policy, that is, a policy that is optimal across all states? We address this question in the next section. (sss-local_global)= #### From Local to Global Optimality In general, $v_\sigma(x_0) = \vmax(x_0)$ does not imply $v_\sigma = \vmax$. But does the implication hold in some settings? {cite}`stachurski2024local` analyze this question for MDPs on general state spaces. Here we present a simplified version for finite MDPs that conveys the core idea. In stating the main result, we recall that a stochastic matrix $P_\sigma$ is called **irreducible** if, for every pair of states $x, y \in \Xsf$, there exists an $m \in \NN$ with $P_\sigma^m(x, y) > 0$. ```{prf:theorem} :label: t-local_global Consider a finite MDP with $\beta \in (0,1)$. Let $\sigma$ be a feasible policy such that $P_\sigma$ is irreducible. Then the following are equivalent: 1. There exists an $x \in \Xsf$ with $v_\sigma(x) = \vmax(x)$. 2. The policy $\sigma$ is optimal (i.e., $v_\sigma = \vmax$). ``` ```{prf:proof} That (ii) implies (i) is immediate. For the converse, suppose $v_\sigma(x) = \vmax(x)$ for some $x \in \Xsf$. Define $h \coloneq \vmax - v_\sigma$. By definition of $\vmax$, we have $h \geq 0$. We claim that $h = 0$. Since $v_\sigma$ is the fixed point of $T_\sigma$ and $\vmax = T\vmax \geq T_\sigma \vmax$, we have $$ \vmax - v_\sigma \geq T_\sigma \vmax - T_\sigma v_\sigma = \beta P_\sigma (\vmax - v_\sigma), $$ that is, $h \geq \beta P_\sigma h$.
Iterating gives $h \geq \beta^n P_\sigma^n h$ for all $n \in \NN$. Evaluating at $x$, $$ 0 \leq \beta^n (P_\sigma^n h)(x) \leq h(x) = 0 \qquad \text{for all } n \in \NN, $$ (eq-hprop) where the first inequality uses $h \geq 0$ and $P_\sigma^n h \geq 0$, and the second uses $h \geq \beta^n P_\sigma^n h$. Hence $(P_\sigma^n h)(x) = 0$ for all $n \geq 1$. Now suppose, for the sake of contradiction, that $h(y) > 0$ for some $y \in \Xsf$. By irreducibility, there exists an $m$ with $P_\sigma^m(x, y) > 0$. Then $$ (P_\sigma^m h)(x) = \sum_{y'} h(y') P_\sigma^m(x, y') \geq h(y) P_\sigma^m(x, y) > 0, $$ contradicting $(P_\sigma^m h)(x) = 0$. Hence $h = 0$, proving (ii). ◻ ``` {prf:ref}`t-local_global` tells us that policy gradient methods---which optimize at a single initial condition---produce globally optimal policies provided the candidate policy induces irreducible dynamics. Irreducibility ensures that every state is "reachable" from $x_0$, so optimality propagates throughout the state space. This helps explain the global convergence observed in {numref}`f-policy_gradient_ifp` for the household problem {eq}`eq-pgifp`. The income shocks $(Y_t)$ are lognormal with unbounded support, so they spread the state across a wide range of asset levels regardless of the consumption policy. This provides a form of irreducibility: every region of the state space is visited with positive probability under any feasible policy, allowing optimality at the initial condition $a_0$ to propagate globally. (s-cn_approx)= ## Chapter Notes The curse of dimensionality and the role of approximation in dynamic programming are discussed at length in {cite}`powell2007approximate` and {cite}`rust1997using`. {prf:ref}`t-fvi1` and {prf:ref}`t-fvi2` generalize the error bounds for fitted value iteration in {cite}`stachurski2008continuous` from concrete MDPs on Euclidean state spaces to the abstract ADP setting. 
{cite}`li2026performanceguaranteesdatadrivensequential` provide performance guarantees for approximate dynamic programming in the form of ratio bounds. {cite}`lee2026bellmanfixedpointgeometry` studies convergence of Q-factor value function iteration. The use of neural networks for approximate dynamic programming has become popular in recent years. The classic reference is {cite}`bertsekas1996neuro`. More recent examples include {cite}`maliar2021deep`, {cite}`kase2022estimating`, {cite}`han2026deepham`, {cite}`pascal2024artificial`, {cite}`kamber2025estimating`, {cite}`ashwin2024neural`, and {cite}`oguz2026trainingneuralnetworksembedded`. See {cite}`scheidegger2026deep` for an overview of the field. The stochastic approximation framework of {ref}`ss-sa` originates with {cite}`robbins1951stochastic`. The convergence result in {prf:ref}`t-sa_conv` is based on {cite}`tsitsiklis1994asynchronous`, who proves convergence in a more general asynchronous setting. {cite}`sutton2018reinforcement` give a comprehensive textbook treatment of stochastic approximation and temporal difference methods in the reinforcement learning context. {cite}`szepesvari2010algorithms` provides a concise and mathematically rigorous introduction to reinforcement learning algorithms, including detailed convergence analysis for both value-based and policy-based methods. {cite}`agarwal2021reinforcement` give a modern theoretical treatment of reinforcement learning with an emphasis on sample complexity and regret bounds. Q-learning was introduced by {cite}`watkins1992qlearning`, with convergence established by {cite}`tsitsiklis1994asynchronous`. For a more recent discussion of convergence, see {cite}`regehr2021elementary`. Deep Q-learning was mentioned in {ref}`ss-qlearn`. {cite}`bertsekas1996neuro` laid the theoretical foundations for combining neural network function approximation with dynamic programming. 
The potential of this approach was demonstrated by {cite}`mnih2015human`, who showed that a deep Q-network agent could reach human-level performance across a range of Atari games without game-specific engineering. {cite}`yang2020theoretical` provide a theoretical analysis of deep Q-learning. {cite}`wang2024deep` provide a recent survey of the deep reinforcement learning literature. The risk-sensitive Q-learning formulation of {ref}`ss-rs_qlearn` connects to the broader literature on risk-sensitive Markov decision processes surveyed in {cite}`bauerle2024markov`. {cite}`littman1996generalized` is a valuable early contribution to the theoretical foundations of Q-learning in potentially nonstandard environments. The use of deep learning for solving dynamic programming problems in economics is explored by {cite}`azinovic2022deep`. Policy gradient methods, discussed briefly in {ref}`ss-policy_grad`, are treated in depth by {cite}`sutton2018reinforcement`, {cite}`szepesvari2010algorithms`, and {cite}`agarwal2021reinforcement`. The discussion in {ref}`sss-local_global` is a simplified version of the result in {cite}`stachurski2024local`, which treats MDPs on general state spaces under an appropriate generalization of irreducibility. [^1]: For condition (ii), note that $W_{k+1}(x) = (\hat Tv_k)(x) - (Tv_k)(x) = \beta [v_k(X') + d(X')] - (Tv_k)(x)$, so $|W_{k+1}(x)| \leq 2\beta(\|v_k\|_\infty + \|d\|_\infty)$. Hence $\EE[\|W_{k+1}\|^2 \mid \fF_k] \leq 4 |\Xsf| \beta^2 (\|v_k\|_\infty + \|d\|_\infty)^2 \leq C(1 + \|v_k\|^2)$ for a constant $C$ depending only on $\beta$, $|\Xsf|$, and $\|d\|_\infty$. ======================================================================== ## Mathematical Background This chapter collects the mathematical tools employed throughout the book. It is intended as a reference; readers should consult the relevant sections as needed. At minimum, all readers should be familiar with the material in {ref}`s-found` before starting {prf:ref}`c-adps`. 
(s-found)= ## Foundations In this first section of the chapter, we cover several foundational ideas in analysis, including metrics and partial orders. (sss-si)= ### Properties of the Real Line Let's start with the real line. This set has a natural metric and a natural order, both of which have many important properties. (Later we will investigate conditions under which such properties extend to more general spaces.) (sss-mmsi)= #### Min, Max, Sup and Inf A point $u \in \RR$ is called an **upper bound** of a set $A \subset \RR$ if $a \leq u$ for all $a \in A$. Let $U(A)$ be the set of upper bounds of $A$. If $s \in U(A)$ and $s \leq u$ for all $u \in U(A)$, then $s$ is called the **supremum** of $A$ and we write $s = \sup A$. At most one such supremum $s$ exists. If $s$ is in $U(A)$ then the following are equivalent: 1. $s = \sup A$ 2. for all $\epsilon > 0$, there exists a point $a \in A$ with $a > s - \epsilon$ ```{prf:theorem} :label: t-completeness If $A \subset \RR$ is nonempty and bounded above, then $\sup A$ exists in $\RR$. ``` {prf:ref}`t-completeness` is essentially axiomatic. It is equivalent to the "completeness" property of $\RR$ that we discuss in {ref}`sss-comprl`. For $A \subset \RR$, a **lower bound** of $A$ is any number $\ell$ such that $\ell \leq a$ for all $a \in A$. If $i \in \RR$ is a lower bound for $A$ and also satisfies $i \geq \ell$ for every lower bound $\ell$ of $A$, then $i$ is called the **infimum** of $A$ and we write $i = \inf A$. At most one such $i$ exists, and every nonempty subset of $\RR$ bounded from below has an infimum. We adopt the following conventions: - If $A$ is not bounded above, then $\sup A = +\infty$. - If $A$ is not bounded below, then $\inf A = -\infty$. - If $A = \varnothing$, then $\sup A = -\infty$ and $\inf A = +\infty$. A number $m$ contained in a subset $A$ of $\RR$ is called the **maximum** of $A$ and we write $m = \max A$ if $a \leq m$ for every $a \in A$. 
It is called the **minimum** of $A$ if $a \geq m$ for every $a \in A$. Given an arbitrary set $D$ and $f \in \RR^D$, we set $$ \sup_{x \in D} f(x) \coloneq \sup \setntn{f(x)}{x \in D} \quad \text{and} \quad \max_{x \in D} f(x) \coloneq \max \setntn{f(x)}{x \in D}. $$ A point $x^* \in D$ is called a - **maximizer** of $f$ on $D$ if $x^* \in D$ and $f(x^*) \geq f(x)$ for all $x \in D$, and a - **minimizer** of $f$ on $D$ if $x^* \in D$ and $f(x^*) \leq f(x)$ for all $x \in D$. Equivalently, $x^* \in D$ is a maximizer of $f$ on $D$ if $f(x^*) = \max_{x \in D} f(x)$, and a minimizer if $f(x^*) = \min_{x \in D} f(x)$. We define $$ \argmax_{x \in D} f(x) \coloneq \setntn{x^* \in D}{f(x^*) \geq f(x) \text{ for all } x \in D}. $$ The set $\argmin_{x \in D} f(x)$ is defined analogously. As usual, given $a, b \in \RR$ we write $a \wedge b$ for $\min\{a, b\}$ and $a \vee b$ for $\max\{a, b\}$. Regarding the order structure of $\RR$, the following relationships are sometimes helpful: Given $x, y \in \RR$ and $a \in \RR_+$, 1. $x + y = x \vee y + x \wedge y$ 2. $|x - y| = x \vee y - x \wedge y$ 3. $|x - y| = x + y - 2 (x \wedge y)$ 4. $|x - y| = 2 ( x \vee y) -x -y$ 5. $a(x \vee y) = (ax ) \vee (ay)$ 6. $a(x \wedge y) = (ax ) \wedge (ay)$ (sss-comprl)= #### Completeness of the Real Line Recall that a sequence $(x_n) \subset \RR$ is called **Cauchy** if, for all $\epsilon > 0$, there exists an $N \in \NN$ with $|x_n - x_m| < \epsilon$ whenever $n, m \geq N$. ```{prf:theorem} :label: t-compreal A real sequence converges in $\RR$ if and only if it is Cauchy. ``` The statement that every Cauchy sequence in $\RR$ converges should be understood as axiomatic. It states that, once the irrational numbers are mixed in with the rational numbers, there are "no more gaps" in the real line. This is called the **completeness** property of $\RR$. See, e.g., {cite}`bartle2011introduction`. 
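The six order identities for $\RR$ listed earlier in this section are easy to spot-check numerically. The following is a minimal sketch, not part of the original text: the function name `check_identities`, the test grid, and the tolerance are all illustrative assumptions. The last call shows why the scalar $a$ is required to lie in $\RR_+$.

```python
import itertools

def check_identities(x, y, a, tol=1e-12):
    """Return a list of six booleans, one per identity, for reals x, y and scalar a.

    Hypothetical helper for illustration; join = x v y and meet = x ^ y.
    """
    join, meet = max(x, y), min(x, y)
    return [
        abs((x + y) - (join + meet)) <= tol,           # 1. x + y = x v y + x ^ y
        abs(abs(x - y) - (join - meet)) <= tol,        # 2. |x - y| = x v y - x ^ y
        abs(abs(x - y) - (x + y - 2 * meet)) <= tol,   # 3. |x - y| = x + y - 2(x ^ y)
        abs(abs(x - y) - (2 * join - x - y)) <= tol,   # 4. |x - y| = 2(x v y) - x - y
        abs(a * join - max(a * x, a * y)) <= tol,      # 5. a(x v y) = (ax) v (ay)
        abs(a * meet - min(a * x, a * y)) <= tol,      # 6. a(x ^ y) = (ax) ^ (ay)
    ]

# Check all six identities on a small grid with nonnegative scalars.
grid = [-2.5, -1.0, 0.0, 0.5, 3.0]
scalars = [0.0, 0.5, 2.0]
all_hold = all(
    all(check_identities(x, y, a))
    for x, y, a in itertools.product(grid, grid, scalars)
)

# With a negative scalar, identity 5 fails: multiplication by a < 0 reverses order.
negative_case = check_identities(1.0, 2.0, -1.0)
```

A finite grid check is of course not a proof; it only guards against sign slips when the identities are used in calculations.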
(s-prelim)= ### Partial Orders Partially ordered spaces are the natural habitat for dynamic programs. In this section we introduce the key ideas needed for the book. (sss-posets)= #### Partially Ordered Sets The pair $(V, \preceq)$ is called a **partially ordered set** -- or **poset** -- if $V$ is any nonempty set and $\preceq$ is a relation on $V$ such that, for any $u, v, w$ in $V$, 1. $u \preceq u$ (reflexivity), 2. $u \preceq v$ and $v \preceq u$ implies $u = v$ (antisymmetry), and 3. $u \preceq v$ and $v \preceq w$ implies $u \preceq w$ (transitivity). The relation $\preceq$ is called a **partial order** on $V$. We often write $V$ instead of $(V, \preceq)$ when $\preceq$ is understood. We sometimes say that $w$ **dominates** $v$ when $v \preceq w$. ```{prf:example} The usual order $\leq$ is a partial order on $\RR$. ``` ```{prf:example} :label: eg-poac If $\sS$ is any collection of sets, then $\subset$ is a partial order on $\sS$. For example, if $E,F \in \sS$, then $E \subset F$ and $F \subset E$ implies $E=F$. ``` ```{prf:example} :label: ex-rnpo For $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$ in $\RR^n$, we write $u \leq v$ if $u_i \leq v_i$ for $i = 1, \ldots, n$. It is simple to confirm that $\leq$ is a partial order on $\RR^n$. ``` A subset $C$ of a poset $(V, \preceq)$ is called a **chain** in $V$ if either $u \preceq v$ or $v \preceq u$ for all $u, v \in C$. A poset $(V, \preceq)$ is called **totally ordered** if $V$ itself is a chain. ```{prf:example} $(\RR, \leq)$ is totally ordered, while $\setntn{(q, q)}{q \in \ZZ}$ is a chain in $\RR^2$. ``` In applications, one of the most important notions of partial order is the pointwise partial order. To define it we let $U, V$ be nonempty sets with $V$ partially ordered by $\preceq$. Let $V^U$ be a set of maps from $U$ to $V$. For each $f, g \in V^U$, we set $$ f \preceq g \quad \iff \quad f(u) \preceq g(u) \text{ for all } u \in U.
$$ (eq-ppo) Then $\preceq$ is a partial order on $V^U$, usually called the **pointwise order** on $V^U$. One very common setting is where $V \subset \RR^\Xsf$ for some nonempty set $\Xsf$. In this setting, we always write the pointwise partial order as $\leq$. In particular, for arbitrary $u, v \in \RR^\Xsf$ we write $u \leq v$ if and only if $u(x) \leq v(x)$ for all $x \in \Xsf$. The partial order in {prf:ref}`ex-rnpo` is a special case, when $\Xsf = \{1, \ldots, n\}$. #### Bounds Let $V$ be a poset. $I \subset V$ is called an **order interval** in $V$ if there exist $a, b$ in $V$ with $a \preceq b$ such that $$ I = \setntn{v \in V}{a \preceq v \preceq b} . $$ In this case we also write $I = [a,b]$. ```{prf:example} If $(b\Xsf, \leq)$ is all bounded functions in $\RR^\Xsf$, paired with the pointwise order, and $I = [g, h]$ for some $g, h \in b\Xsf$, then $I$ is the order interval in $b\Xsf$ containing all $f \in b\Xsf$ such that $g(x) \leq f(x) \leq h(x)$ for all $x$ in $\Xsf$. ``` Given a poset $V$ and a subset $A$ of $V$, we call - $u \in V$ an **upper bound** of $A$ if $a \preceq u$ for all $a$ in $A$ and - $\ell \in V$ a **lower bound** of $A$ if $\ell \preceq a$ for all $a$ in $A$. A subset $A$ of poset $V$ is called **bounded above** (resp., **bounded below**) if the set of upper bounds (resp., lower bounds) of $A$ is nonempty (e.g., $A$ is bounded above when there exists at least one $v \in V$ with $a \preceq v$ for all $a \in A$). $A$ is called **order bounded** in $V$ if $A$ is both bounded above and bounded below. Obviously, $A$ is order bounded in $V$ if and only if there exists an order interval $I \subset V$ such that $A \subset I$. ```{exercise} :label: ex-math_foundations-auto-1 Prove that, for the poset $(\RR^n, \leq)$, a set $A \subset \RR^n$ is order bounded if and only if it is bounded; that is, if and only if there exists an $M \in \NN$ with $\|a\| \leq M$ for all $a \in A$.
``` ```{solution} ex-math_foundations-auto-1 If $A$ is order bounded, with $A \subset [u, v] \subset \RR^n$, then, given $a \in A$, we have $|a_i| \leq |u_i| \vee |v_i| \leq \| u \|_\infty \vee \| v \|_\infty =: M$ for all $i$, and hence $\|a \|_\infty \leq M$. Hence $A$ is bounded with respect to the norm $\| \cdot \|_\infty$, and therefore with respect to any norm on $\RR^n$ by equivalence of norms (see {ref}`sss-nonvecsp`). Conversely, if $A$ is bounded, with $\| a \|_\infty \leq M$ for all $a \in A$, then $- M \1 \leq a \leq M \1$ for all $a \in A$. Hence $A$ is order bounded. ``` #### Greatest and Least Elements Given poset $V$ and $A \subset V$, we say that - $g \in V$ is the **greatest element** of $A$ if $g \in A$ and $a \in A \implies a \preceq g$; and - $\ell \in V$ is the **least element** of $A$ if $\ell \in A$ and $a \in A \implies \ell \preceq a$. In other words, a greatest element of $A$ is an upper bound of $A$ that is also contained in $A$, while a least element of $A$ is a lower bound of $A$ also contained in $A$. ```{prf:example} Continuing {prf:ref}`eg-poac`, if $\wp(A)$ is the set of all subsets of set $A$, then $\subset$ is a partial order on $\wp(A)$. Since $B \subset A$ for all $B \in \wp(A)$, we see that $A$ is the greatest element of $\wp(A)$. The least element is $\varnothing$. ``` ```{exercise} :label: ex-glun Prove: A subset $A$ of a poset $V$ can have at most one greatest element and at most one least element. ``` Not all subsets of partially ordered sets have greatest elements. For one example, observe that $\NN \subset (\RR, \leq)$ has no greatest element. In this case the set of upper bounds is empty, so finding a greatest element is impossible. We can also have situations where the set of upper bounds is nonempty but no greatest element exists. ```{prf:example} In {numref}`f-unit_circ`, a point $y \in \RR^2$ obeys $y \leq x$ only when $y$ lies to the southwest of $x$ (below and to the left of the dashed lines).
Thus, $x$ is not a greatest element of the circle $C$ shown in the figure. Some thought will convince you that no other point in $C$ is a greatest element of $C$. ``` ```{figure} figures/unit_circ.pdf :name: f-unit_circ The unit circle in $\RR^2$ has no greatest element ``` If a poset $V$ has a greatest element, that element is sometimes called the **top** of $V$. A least element of $V$ is sometimes called the **bottom** of $V$. ```{prf:example} If $C$ is all continuous $f \colon \RR \to [0,1]$, paired with the pointwise order $\leq$, then $C$ is order bounded with top $\equiv 1$ and bottom $\equiv 0$. ``` (sss-infsuppo)= #### Suprema and Infima Let $A$ be a subset of poset $V$ and let $U(A)$ be the set of all upper bounds of $A$ in $V$. We call $s \in V$ the **supremum** of $A$ if $s$ is a least element of $U(A)$. Since least elements are unique ({prf:ref}`ex-glun`), subsets of $V$ can have at most one supremum. When it exists, the supremum of $A$ is denoted by $\bigvee A$. Also, - if $A = \{a_i\}_{i \in I}$ for some index set $I$, we write $\bigvee A$ as $\bigvee_i \, a_i$. - Given $u$ and $v$ in $V$, the supremum $\bigvee \{u, v\}$ is also called the **join** of $u$ and $v$, and is written $u \vee v$. ```{prf:example} :label: eg-rse If $V = (\RR, \leq)$, then the notion of supremum for $A \subset \RR$ reduces to the usual one from real analysis (see {ref}`sss-mmsi`). In this setting (and only in this setting), we also use the traditional notation $\sup A$ to indicate the supremum of $A$. ``` While every $A \subset \RR$ that is bounded above has a supremum ({prf:ref}`t-completeness`), the same is not true for arbitrary posets. ```{prf:example} :label: eg-ccexa Let $C$ be the continuous functions from $[0, 1]$ into $\RR$. Consider the sequence of functions $F = \{f_n\}_{n \geq 2}$ where $f_n(x) = 0$ when $0 \leq x \leq 1/2$, $f_n(x) = n(x-1/2)$ when $1/2 \leq x \leq 1/n + 1/2$ and $f_n(x) = 1$ otherwise. 
If $g \in U(F)$, the set of upper bounds of $F$ in $C$, then, by continuity and the upper bound property, it must be that $g \geq 1$ on $[1/2, 1]$. Given any $g \in U(F)$, we can always take a $g' \in U(F)$ with $g' \leq g$ and $g'(x) < g(x)$ for at least one $x$. Hence $U(F)$ has no least element and, as a result, $F$ has no supremum in $C$. ``` ```{exercise} :label: ex-supsand Let $(u_n)$, $(v_n)$ and $(w_n)$ be sequences in a poset $V$ with $u_n \preceq v_n \preceq w_n$ for all $n$. Suppose $\bigvee_n u_n = \bigvee_n w_n =: s$. Prove that $\bigvee_n v_n = s$. ``` ```{solution} ex-supsand Since $v_n \preceq w_n \preceq s$ for all $n$, the element $s$ is an upper bound for $(v_n)$. Now let $t$ be any other upper bound, so that $v_n \preceq t$ for all $n$. In this case we also have $u_n \preceq t$ for all $n$. But $s$ is the least upper bound for $(u_n)$, so $s \preceq t$. This shows that $s \preceq t$ for any upper bound $t$ of $(v_n)$. In particular, $s$ is the supremum of $(v_n)$. ``` {numref}`f-lattice` provides a visualization of $u \vee v$ and $u \wedge v$ when $V = (\RR^2, \leq)$. {numref}`f-infsup` provides a visualization of $f \vee g$ and $f \wedge g$ when $V = (\RR^\Xsf, \leq)$ for some subset $\Xsf$ of the reals. In both cases, $\leq$ is the pointwise partial order. ```{figure} figures/lattice.svg :name: f-lattice The points $u \vee v$ and $u \wedge v$ in $\RR^2$ ``` ```{figure} figures/infsup.svg :name: f-infsup Functions $f \vee g$ and $f \wedge g$ when defined on a subset of $\RR$ ``` Given $A$ contained in poset $V$, an element of $V$ is called the **infimum** of $A$ if it is a greatest element of the set of lower bounds of $A$. The infimum of $A$ is typically denoted $\bigwedge A$. If $V \subset \RR$ with the usual order $\leq$, then we also use the notation $\inf A$. Also, - if $A = \{a_i\}_{i \in I}$ for some index set $I$, we sometimes write $\bigwedge A$ as $\bigwedge_i \, a_i$.
- Given $u$ and $v$ in $V$, the infimum $\bigwedge \{u, v\}$ is also called the **meet** of $u$ and $v$, and is written $u \wedge v$. ```{exercise} :label: ex-polim Let $\Xsf$ be any nonempty set, let $V \subset \RR^\Xsf$, and consider $(V, \leq)$ as a poset when $\leq$ is the pointwise partial order. Let $G$ be a nonempty subset of $V$ and let $s$ and $i$ be given by $$ s(x) := \sup_{g \in G} g(x) \quad \text{and} \quad i(x) := \inf_{g \in G} g(x) \qquad (x \in \Xsf) $$ (eq-pwsai) (The $\sup$ and $\inf$ on the right-hand side follow the rules in {ref}`sss-si`, with $s$ taking values in $(-\infty, +\infty]$ and $i$ taking values in $[-\infty, \infty)$.) Prove that 1. If $s \in V$, then $\bigvee G$ exists in $V$ and $\bigvee G = s$. 2. If $i \in V$, then $\bigwedge G$ exists in $V$ and $\bigwedge G = i$. ``` ```{solution} ex-polim We prove (i). If $s \in V$, then $s \in U(G)$. Also, if $w \in U(G)$, then $g \leq w$ for all $g \in G$ and hence $s \leq w$. This proves that $s =\bigvee G$. ``` ```{exercise} :label: ex-pposi2 Let $C$ be all continuous functions from $[0,1]$ to itself paired with the pointwise order. Provide an example of a $G \subset C$ such that $\bigvee G$ exists in $C$ and yet $\bigvee G$ is not equal to $s$, the pointwise supremum in {eq}`eq-pwsai`. ``` ```{solution} ex-pposi2 Let $C$ be as stated and let $G$ be all $f_n \in C$ with $f_n(x) = x^{1/n}$ for all $x$. Clearly $s(x) = \1\{x > 0\}$, which is not in $C$. At the same time, if $g \in C$ dominates all $f_n$, then $g$ is continuous and equals $1$ for all $x \in (0,1]$, so $g \equiv 1$. Thus, $U(G)$, the set of upper bounds of $G$ in $C$, is equal to $\{1\}$. In particular, $\bigvee G = 1$, so $\bigvee G$ and $s$ are distinct. ``` For the poset $(\RR, \leq)$, every $x \in \RR$ is an upper bound of the empty set $\varnothing$, since, vacuously, $x$ dominates all elements of $\varnothing$. Since $\RR$ has no least element, the set $\varnothing \subset \RR$ has no supremum in $\RR$. 
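The definitions of upper bound and supremum, including the vacuous case of the empty set, can be made concrete by brute force on a small finite poset. The sketch below is an illustration only: it assumes the divisibility order on two hand-picked sets of positive integers, and the helper names `upper_bounds`, `least_element`, and `supremum` are hypothetical.

```python
def upper_bounds(A, V, leq):
    """All u in V with a <= u for every a in A (every u qualifies when A is empty)."""
    return [u for u in V if all(leq(a, u) for a in A)]

def least_element(B, leq):
    """The least element of B if one exists, otherwise None."""
    for b in B:
        if all(leq(b, c) for c in B):
            return b
    return None

def supremum(A, V, leq):
    """The supremum of A in V: the least element of the set of upper bounds."""
    return least_element(upper_bounds(A, V, leq), leq)

def divides(u, v):
    """Divisibility order on positive integers: u precedes v when u divides v."""
    return v % u == 0

# A poset with a least element (1): suprema exist here, even for the empty set.
V1 = [1, 2, 3, 4, 6, 12]
sup_23 = supremum([2, 3], V1, divides)        # upper bounds are 6 and 12
sup_empty = supremum([], V1, divides)         # every element is an upper bound

# A poset with no least element: upper bounds of {2, 3} are 12 and 18,
# which are not comparable, and the empty set has no supremum either.
V2 = [2, 3, 12, 18]
no_sup_23 = supremum([2, 3], V2, divides)
no_sup_empty = supremum([], V2, divides)

print(sup_23, sup_empty, no_sup_23, no_sup_empty)
```

In `V1` the supremum of the empty set is the least element of the poset, while in `V2`, which has no least element, the empty set has no supremum.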
By related reasoning, if we restrict attention to $([0,1], \leq)$, we see that $0$ is the supremum of $\varnothing \subset [0,1]$. The next exercise extends this line of reasoning. ```{exercise} :label: ex-eocpl Let $V$ be any poset. Prove the following: 1. If $b \coloneq \bigvee \varnothing$ exists in $V$, then $V$ has a least element and that least element is $b$. 2. If $b$ is the least element of $V$, then $\bigvee \varnothing = b$ in $V$. (Analogous statements are true for the infimum of $\varnothing$ and the greatest element of $V$.) ``` ```{solution} ex-eocpl Suppose first that $b \coloneq \bigvee \varnothing$ exists in $V$. If $v \in V$, then $v$ is an upper bound of $\varnothing$, since the statement $u \preceq v$ for all $u \in \varnothing$ is vacuously true. Hence $b \preceq v$. This proves that $b$ is the least element of $V$. Next, suppose that $b$ is the least element of $V$. Then $b$ is an upper bound of $\varnothing$, since the statement $v \preceq b$ for all $v \in \varnothing$ is vacuously true. By the same reasoning, every element of $V$ is an upper bound of $\varnothing$. As the least element of $V$, $b$ is therefore the least upper bound of $\varnothing$, so $\bigvee \varnothing = b$. ``` (sss-orddual)= #### Order Duals Given partially ordered set $V$, let $V^\partial = (V, \preceq^\partial)$ be the **order dual** (also called the **dual**), so that, for $u, v \in V$, we have $u \preceq^\partial v$ if and only if $v \preceq u$. We use $\bigvee^\partial A$ to denote the supremum of $A \subset V^\partial$ in $V^\partial$ and $\bigwedge^\partial A$ for the infimum. ```{exercise} :label: ex-dualsi Fix $A \subset V$ and prove the following. 1. If $\bigvee A$ exists in $V$, then $\bigwedge^\partial A$ exists in $V^\partial$ and $\bigwedge^\partial A = \bigvee A$. 2. If $\bigwedge A$ exists in $V$, then $\bigvee^\partial A$ exists in $V^\partial$ and $\bigvee^\partial A = \bigwedge A$. ``` (sss-monseq)= #### Monotone Sequences Let $V$ be any poset.
A sequence $(v_n)_{n \geq 1}$ in $V$ is called **increasing** if $v_n \preceq v_{n+1}$ for all $n \in \NN$, and **decreasing** if $v_{n+1} \preceq v_n$ for all $n \in \NN$. We write - $v_n \uparrow v$ when $(v_n)$ is increasing and $\bigvee_n v_n = v$ and - $v_n \downarrow v$ when $(v_n)$ is decreasing and $\bigwedge_n v_n = v$. These symbols generalize standard notation for convergence of monotone sequences in $\RR$. For example, if $(u_n)$ is increasing in $\RR$ and its limit is $u$, then one writes $u_n \uparrow u$. This is a special case of the usage above, since $u$ is also the supremum of $(u_n)$ under the standard order on $\RR$. In some settings, the order theoretic concepts $\uparrow$ and $\downarrow$ have simple pointwise characterizations. The next lemma gives one such characterization for $\uparrow$ (and an analogous result holds for $\downarrow$). In the statement of the lemma, $V \subset \RR^\Xsf$ for some nonempty $\Xsf$ and $\leq$ is the pointwise partial order. Also, we say that $V$ is **closed under pointwise suprema** if, for every increasing $(v_n) \subset V$ that is bounded above, the pointwise supremum $s(x) = \sup_n v_n(x)$ is an element of $V$. ```{prf:lemma} :label: l-pcid Let $V \subset \RR^\Xsf$ be closed under pointwise suprema, let $(v_n)$ be a sequence in $V$ and let $v$ be an element of $V$. In this setting, $$ v_n(x) \uparrow v(x) \text{ in } \RR \text{ for all } x \in \Xsf \quad \iff \quad v_n \uparrow v. $$ ``` ```{prf:proof} ($\Rightarrow$) Let $(v_n)$ be increasing, let $v$ be in $V$ and suppose that $v_n(x) \uparrow v(x)$ in $\RR$ for all $x \in \Xsf$. By {prf:ref}`ex-polim`, $\bigvee_n v_n$ exists in $V$ and equals $v$. Since $(v_n)$ is increasing, we have $v_n \uparrow v$. ($\Leftarrow$) Suppose that $v_n \uparrow v$ for some $v \in V$. Fix $x \in \Xsf$ and note that $v_n(x)$ is increasing and bounded above by $v(x)$. Hence the pointwise supremum function $s(x) = \sup_n v_n(x)$ exists in $\RR^\Xsf$ and $s \leq v$. 
Since $V$ is closed under pointwise suprema, we also have $s \in V$. Since $v_n \leq s \leq v$ for all $n$ and $\bigvee_n v_n = v$, we see that $s = v$. This means that, for any $x \in \Xsf$, we have $\sup_n v_n(x) = v(x)$. Hence $v_n(x) \uparrow v(x)$. ◻ ``` (sss-opm)= #### Order Preserving Maps A map $S$ from poset $V = (V, \preceq)$ to poset $U = (U, \trianglelefteq)$ is called - **order preserving** if $v, w \in V$ and $v \preceq w$ implies $Sv \trianglelefteq Sw$, and - **order reversing** if $v, w \in V$ and $v \preceq w$ implies $Sw \trianglelefteq Sv$. ```{prf:example} Let $\leq$ be the pointwise partial order on $\RR^n$. If $A$ is an $n \times n$ matrix with nonnegative values and $b$ is a vector in $\RR^n$, then the affine operator $S$ sending $v$ to $Av + b$ is order preserving on $(\RR^n, \leq)$. Indeed, if $u \leq v$, then $u - v \leq 0$ and hence $A(u-v) \leq 0$. It follows that $Au - Av \leq 0$ and, therefore, $Su - Sv \leq 0$. ``` In the definition of order preserving above, one common setting is when $U = \RR$ with its standard order. In this case, the mapping $S$ is often called **increasing**. We will also use this terminology. The result in the next exercise uses the fact that the standard order on $\RR$ is closed (i.e., preserved under limits). ```{exercise} :label: ex-icul Let $\Xsf$ be any poset and let $b\Xsf$ be the bounded real-valued functions on $\Xsf$ paired with the supremum norm. Let $ib\Xsf$ be the set of increasing functions in $b\Xsf$. Prove that $ib\Xsf$ is closed in $b\Xsf$. ``` ```{solution} ex-icul Let $(f_n)$ be a sequence in $ib\Xsf$ such that $f_n \to f$ for some $f \in b\Xsf$. The function $f$ is increasing because, for $x, x' \in \Xsf$ with $x \preceq x'$, we have $f_n(x) \leq f_n(x')$ for all $n$, and hence, taking the limit, $f(x) \leq f(x')$. ``` ```{exercise} :label: ex-dompower Let $(V, \preceq)$ be a partially ordered set and let $\sS$ be the set of all order preserving self-maps on $V$.
Let $\preceq$ be the pointwise order on $\sS$ (i.e., $S \preceq T$ if $Sv \preceq Tv$ for all $v \in V$). In this setting, prove the following statements: 1. If $S \in \sS$, then $S^k \in \sS$ for all $k \in \NN$. 2. If $S, T \in \sS$ and $S \preceq T$, then $S^k \preceq T^k$ for all $k \in \NN$. ``` ```{solution} ex-dompower Part (i) follows easily from a simple induction argument. Regarding part (ii), let $S, T \in \sS$ obey $S \preceq T$. We claim that $S^k \preceq T^k$ holds for all $k \in \NN$. Clearly it holds for $k=1$. If it also holds at $k-1$, then, for any $u \in V$, we have $S^k u = S S^{k-1} u \preceq S T^{k-1} u \preceq T T^{k-1} u = T^k u$, where we used the induction hypothesis, the order preserving property of $S$ and the assumption that $S \preceq T$. ``` (sss-stmon)= #### Strict Monotonicity Now let's examine a form of strict monotonicity. We consider posets $V = (V, \preceq)$ and $W = (W, \trianglelefteq)$. For $u, v \in V$, we write $u \prec v$ if $u \preceq v$ and not $u = v$. For $x, y \in W$, we write $x \vartriangleleft y$ if $x \trianglelefteq y$ and not $x = y$. We call a map $S$ from $V$ to $W$ **strictly order preserving** if $v \prec w$ implies $S v \vartriangleleft S w$. In the example below, $\leq$ is the pointwise partial order and $u < v$ means $u \leq v$ and $u \neq v$. ```{prf:example} Let $\Xsf$ and $\Usf$ be finite sets and let $P$ be a nonnegative function on $\Usf \times \Xsf$ such that, for each $x \in \Xsf$, there exists a $u \in \Usf$ with $P(u, x) > 0$. Consider the map $P$ from $\RR^\Xsf$ to $\RR^\Usf$ defined by $$ (Pv)(u) = \sum_{x} v(x) P(u, x) \qquad (u \in \Usf). $$ We claim that $P$ is strictly order preserving. Indeed, if $v < w$ in $\RR^\Xsf$, then there exists some $\bar x \in \Xsf$ such that $v(\bar x) < w(\bar x)$. Also, by assumption, there exists some $u \in \Usf$ such that $P(u, \bar x) > 0$. As a result, we have $$ \begin{aligned} (P v)(u) & = \sum_{x \neq \, \bar x} v(x) P(u, x) + v(\bar x) P(u, \bar x) \\ & < \sum_{x \neq \, \bar x} v(x) P(u, x) + w(\bar x) P(u, \bar x) \leq \sum_{x} w(x) P(u, x) = (P w)(u). \end{aligned} $$ Since $P$ is nonnegative, we also have $P v \leq P w$. Hence, $P v < P w$.
``` (sss-orfs)= #### Order Isomorphisms A surjective map $F$ from poset $(V,\preceq)$ to poset $(\hat V, \trianglelefteq)$ is called an - **order isomorphism** if $v \preceq w \iff F v \trianglelefteq F w$, and an - **order anti-isomorphism** if $v \preceq w \iff F w \trianglelefteq F v$. (Surjective means that $F$ maps $V$ onto $\hat V$, so each $\hat v \in \hat V$ has a preimage.) When such an order isomorphism (resp., anti-isomorphism) exists, we say that $V$ and $\hat V$ are isomorphic (resp., anti-isomorphic). ```{prf:example} Let $V = \hat V = \RR^n_+$ with the usual pointwise order. Consider $F$ mapping $v \in V$ to $v^2 \in \hat V$, where the operation $v \mapsto v^2$ acts pointwise. $F$ is clearly onto and, for $v, w \geq 0$, we have $w \leq v$ if and only if $w^2 \leq v^2$. Hence $F$ is an order isomorphism. ``` In the next two exercises, $\Xsf$ is any nonempty set and all spaces of real-valued functions have the pointwise partial order. Scalar actions on functions are applied pointwise; for example, given $h \in \RR^\Xsf$, the function $\exp h$ maps $x$ to $\exp(h(x))$. ```{exercise} :label: ex-eth Given $h \in \RR^\Xsf$, let $Fh = \exp(\theta h)$. Show that $F$ is an order isomorphism (resp., anti-isomorphism) from $\RR^\Xsf$ to $(0, \infty)^\Xsf$ whenever $\theta > 0$ (resp., $\theta < 0$). ``` ```{figure} figures/eth_viz.svg :name: fig-eth_viz Visualization of {prf:ref}`ex-eth` with $\Xsf = [0,1]$ ``` {numref}`fig-eth_viz` provides a visualization of the result in {prf:ref}`ex-eth` when $\Xsf = [0,1]$. The two functions are $h(x) = x^2 - 1/2$, and $h'(x) = x$, so that $h \leq h'$. The middle panel sets $\theta = 2$, which preserves the order. The right panel sets $\theta =-2$, which reverses order. The next exercise generalizes {prf:ref}`ex-eth`. ```{exercise} :label: ex-vmvm Let $$ V = M^\Xsf \quad \text{and} \quad \hat V = \hat{M}^\Xsf \quad \text{where } M, \hat M \subset \RR. 
$$ (eq-vmvm) Let $\phi$ be a bijection from $M$ onto $\hat M$ and let $F v = \phi \circ v$. Prove the following: - $F$ is a bijection from $V$ to $\hat V$. - $\phi$ is an order isomorphism $\implies$ $F$ is an order isomorphism. - $\phi$ is an order anti-isomorphism $\implies$ $F$ is an order anti-isomorphism. ``` In the following exercises, $(V, \preceq)$ and $(\hat V, \trianglelefteq)$ are arbitrary posets. ```{exercise} :label: ex-math_foundations-auto-2 Let $F \colon V \to \hat V$. Show that $F$ is a bijection whenever $F$ is an order isomorphism (resp., order anti-isomorphism). ``` ```{solution} ex-math_foundations-auto-2 We address the first claim. Let $F$ be an order isomorphism from $V$ to $\hat V$ and observe that, by reflexivity, $Fv = Fv'$ implies $Fv \trianglelefteq Fv'$ and $Fv' \trianglelefteq Fv$. Since $F$ is order isomorphic, this yields $v \preceq v'$ and $v' \preceq v$. Using antisymmetry, we get $v = v'$. Hence $F$ is one-to-one as well as onto, and therefore bijective. ``` ```{exercise} :label: ex-iffom Let $F \colon V \to \hat V$ be a bijection. Show that 1. $F$ is an order isomorphism if and only if $F$ and $F^{-1}$ are order preserving, and 2. $F$ is an order anti-isomorphism if and only if $F$ and $F^{-1}$ are order reversing. ``` ```{prf:lemma} :label: l-ois Let $F$ be an order isomorphism from $V$ to $\hat V$. If the supremum of $\{v_\alpha\}_{\alpha \in \Lambda} \subset V$ exists in $V$, then $$ \bigvee_\alpha F v_\alpha \text{ exists in } \hat V \text{ and } \bigvee_\alpha F v_\alpha = F \bigvee_\alpha v_\alpha. $$ ``` ```{exercise} :label: ex-math_foundations-auto-3 Prove {prf:ref}`l-ois`. ``` ```{solution} ex-math_foundations-auto-3 Let $\{p_\alpha\}_{\alpha \in \Lambda}$ be a subset of $V$, let $V$, $\hat V$, and $F$ be as stated, and let $\bar p \coloneq \bigvee_\alpha p_\alpha$. We need to show that $\bar q \coloneq F \bar p$ is the supremum of $\{F p_\alpha\}_{\alpha \in \Lambda}$. 
First, $p_\alpha \preceq \bar p$ for all $\alpha$, so $F p_\alpha \trianglelefteq F \bar p = \bar q$ for all $\alpha$. In particular, $\bar q$ is an upper bound of $\{F p_\alpha\}_{\alpha \in \Lambda}$. Moreover, if $u$ is any upper bound of $\{F p_\alpha\}_{\alpha \in \Lambda}$, then $F p_\alpha \trianglelefteq u$ and hence $p_\alpha \preceq F^{-1} u$ for all $\alpha$, so $\bar p \preceq F^{-1} u$. But then $\bar q = F \bar p \trianglelefteq u$. Hence $\bar q$ is the supremum of $\{F p_\alpha\}_{\alpha \in \Lambda}$, as was to be shown. ``` The next exercise is related to {prf:ref}`l-ois`. ```{exercise} :label: ex-ios Let $V, \hat V$ be posets, let $(v_n)$ be a sequence in $V$, and let $F$ be a map from $V$ to $\hat V$. Prove the following: 1. If $F$ is an order isomorphism, then $v_n \uparrow v$ in $V$ if and only if $Fv_n \uparrow Fv$ in $\hat V$. 2. If $F$ is an order anti-isomorphism, then $v_n \uparrow v$ in $V$ if and only if $Fv_n \downarrow Fv$ in $\hat V$. ``` ```{exercise} :label: ex-dualanti Prove the following claims: 1. If $V, \hat V$ are order isomorphic, then $V$ is totally ordered if and only if $\hat V$ is totally ordered. 2. $F$ is an order anti-isomorphism from $V$ to $\hat V$ if and only if $F$ is an order isomorphism from $V$ to the dual $\hat V^\partial$. ``` (sss-orstab)= #### Order Stability In {ref}`sss-dpros` we discussed the fact that contractivity of the Bellman operator plays a significant role in the proof of Bellman-type optimality results for the optimal savings problem. Contractivity is a metric property that has no immediate counterpart in an abstract partially ordered set. This motivates us to introduce weaker conditions on operators that are well defined in any poset and, at the same time, strong enough to generate useful optimality results. This section gives details. Let $V$ be a poset and let $S$ be a self-map on $V$. In this setting, we call $S$ **order stable** if 1. $S$ has a unique fixed point $\bar v$ in $V$, 2.
$v \in V$ with $v \preceq S \, v$ implies $v \preceq \bar v$, and 3. $v \in V$ with $S \, v \preceq v$ implies $\bar v \preceq v$. Conditions (ii) and (iii) say that points mapped up by $S$ lie below the fixed point, while points mapped down lie above it. We strengthen this notion by adding monotone convergence to the fixed point. We call $S$ **strongly order stable** if $S$ is order stable and, in addition, 1. $v \in V$ with $v \preceq S \, v$ implies $S^n v \uparrow \bar v$, and 2. $v \in V$ with $S \, v \preceq v$ implies $S^n v \downarrow \bar v$. {numref}`f-up_down_stable` gives an illustration of a strongly order stable map $S$ on $V = [0,1]$. All points mapped up by $S$ lie below and converge up to its unique fixed point. All points mapped down by $S$ lie above and converge down to its fixed point. ```{figure} figures/up_down_stable.pdf :name: f-up_down_stable A strongly order stable map $S$ on $[0,1]$ ``` Most results in this book require only order stability. Strong order stability is invoked in a small number of places where monotone convergence of the iterates is needed. The following result is useful when we consider minimization problems. ```{prf:lemma} :label: l-odod $S$ is order stable on $V$ if and only if $S$ is order stable on $V^\partial$. The same equivalence holds for strong order stability. ``` ```{prf:proof} Let $S$ be as stated. By definition, $S$ has a unique fixed point $\bar v \in V$. Hence it remains only to verify conditions (ii) and (iii) of order stability on $V^\partial$. Regarding (ii), suppose $v \in V$ and $v \preceq^\partial S v$. Then $Sv \preceq v$ and hence $\bar v \preceq v$, by (iii) applied to $S$ on $V$. But then $v \preceq^\partial \bar v$, so (ii) holds on $V^\partial$. The proof that (iii) holds on $V^\partial$ is similar. We have shown that $S$ is order stable on $V^\partial$ whenever $S$ is order stable on $V$. The reverse implication holds because the dual of $V^\partial$ is $V$. 
The argument for strong order stability is similar: $S^n v \uparrow \bar v$ in $V$ is equivalent to $S^n v \downarrow \bar v$ in $V^\partial$, so (iv) on $V$ corresponds to (v) on $V^\partial$ and vice versa. ◻ ``` (sss-metrics)= ### Metric Space In this section we define metric spaces and review convergence, open and closed sets, compactness, and completeness. #### Definition Let $V$ be a nonempty set. A function $d \colon V \times V \to \RR$ is called a **metric** on $V$ if, for any $u, v, w \in V$, 1. $d(u,v) \geq 0$ (nonnegativity), 2. $d(u,v)=0 \iff u=v$ (identifiability), 3. $d(u,v)=d(v,u)$ (symmetry), and 4. $d(u,v)\leq d(u,w)+d(w,v)$ (triangle inequality). Together, the pair $(V, d)$ is called a **metric space**. When the metric is clear from context we refer to the metric space by the symbol $V$ alone. ```{prf:example} :label: eg-bx Let $\Xsf$ be any set. Let $b\Xsf$ denote all bounded functions from $\Xsf$ to $\RR$. For all $f, g$ in $b\Xsf$, let $$ \| f \|_\infty \coloneq \sup_{x \in \Xsf} |f(x)| \quad \text{and} \quad d_\infty(f, g) \coloneq \|f - g \|_\infty. $$ The map $f \mapsto \| f \|_\infty$ is called the **supremum norm** and $d_\infty$ is called the **supremum distance**. The pair $(b\Xsf, d_\infty)$ is a metric space. The triangle inequality holds because, given $f, g, h$ in $b\Xsf$ and $x \in \Xsf$, we have (by the triangle inequality in $\RR$), $$ |f(x) - g(x)| \leq |f(x) - h(x)| + |h(x) - g(x)| \leq d_\infty (f, h) + d_\infty(h, g). $$ The right side is an upper bound for the left side, so $d_\infty(f, g) \leq d_\infty (f, h) + d_\infty(h, g)$. ``` ```{prf:example} :label: eg-ellp Let $\Xsf$ be finite or countable and fix $p$ with $1 \leq p < \infty$. Let $$ \| h \|_p \coloneq \left\{ \sum_{x \in \Xsf} |h(x)|^p \right\}^{1/p} \quad \text{and} \quad d_p(g, h) = \| g - h \|_p. $$ With $\ell_p(\Xsf) \coloneq \left\{ h \in \RR^\Xsf \,:\, \| h \|_p < \infty \right\}$ the pair $(\ell_p(\Xsf), d_p)$ is a metric space.
The triangle inequality can be established via the **Hölder inequality** that states that $\| f g \|_1 \leq \| f \|_p \, \| g \|_q$ whenever $p, q \in [1, \infty]$ with $1/p + 1/q = 1$. In this setting the triangle inequality is also called the **Minkowski inequality**. ``` ```{prf:example} :label: eg-euc If, in {prf:ref}`eg-ellp`, we take $\Xsf = \{1, \ldots, n\}$ and $p=2$, then $\RR^\Xsf$ is naturally identified with $\RR^n$, the set of real-valued $n$ vectors $u = (u_1, \ldots, u_n)$, while $d_2(u, v)$ is the ordinary Euclidean distance $(\sum_{i=1}^n (u_i - v_i)^2)^{1/2}$ between vectors $u$ and $v$. ``` If $(V, d)$ is a metric space and $N \subset V$, then $(N, d)$ is also a metric space (where $d$ in the second case is defined by restricting the original metric to $(u, v) \in N \times N$). ```{prf:example} :label: eg-bcx0 Let $\Xsf$ be a metric space and let $bc\Xsf$ be all continuous functions in $b\Xsf$. Since $bc\Xsf$ is a subset of $b\Xsf$, $(bc\Xsf, d_\infty)$ is a metric space. ``` ```{exercise} :label: ex-discmet Let $V$ be any nonempty set and consider the **discrete metric** on $V$ given by $d(u, v) = \1 \{u \not= v \}$. Confirm that $d$ is a metric on $V$. ``` ```{solution} ex-discmet The proofs are straightforward. For example, to see that $d$ satisfies the triangle inequality, pick any $u,v, w \in V$. We claim that $d(u,v)\leq d(u,w)+d(w,v)$. If $u=v$, this bound is trivial, so suppose they are distinct. We then need to show that $1 \leq d(u,w)+d(w,v)$. Suppose to the contrary that $d(u,w)+d(w,v) = 0$. It follows that $u=w$ and $v=w$. But then $u=v$, which is a contradiction. ``` #### Convergence Given any point $u$ in metric space $(V, d)$, the **$\epsilon$-ball** around $u$ is the set $$ B_\epsilon(u) \coloneq \setntn{v \in V}{d(u, v) < \epsilon} . $$ We say that sequence $(u_n) \subset V$ **converges to $u \in V$** if $$ \forall \, \epsilon > 0,\; \exists \, n_\epsilon \in \NN \st n \geq n_\epsilon \implies u_n \in B_\epsilon(u). 
$$ ```{prf:example} Recall that a sequence $(x_n)$ in $\RR$ converges to $x \in \RR$ if, given any $\epsilon > 0$, there is an $N \in \NN$ such that $|x_n - x| < \epsilon$ for all $n \geq N$. This is equivalent to the statement that $x_n \to x$ in the metric space $(V, d)$ when $V = \RR$ and $d(x, y) = |x - y|$. ``` ```{exercise} :label: ex-discconv Let $d$ be the discrete metric. Show that, for any $u \in V$, there exists an $\epsilon > 0$ such that $B_\epsilon(u) = \{u \}$. Show in addition that if $(u_n)$ is a sequence in $V$ converging to some point in $V$, then $(u_n)$ is eventually constant. ``` A **subsequence** of a sequence $(u_n)$ in $V$ is any sequence of the form $(u_{n_k})_{k \geq 1}$, where $(n_k)$ is a strictly increasing sequence in $\NN$. A metric space $V$ is called **separable** if there exists a countable set $A \subset V$ such that, for any $v \in V$, there exists a sequence $(a_n)$ contained in $A$ with $a_n \to v$. For example, $\RR$ is separable because any $v \in \RR$ can be expressed as the limit of a rational sequence. Separability is useful in certain settings -- particularly when we need to combine topology and measure (see, e.g., {prf:ref}`t-berge`). In the applications we consider, most spaces will be separable. (sss-oclo)= #### Open and Closed Sets Let $V$ be a metric space. A point $u \in A \subset V$ is called **interior** to $A$ if there exists an $\epsilon > 0$ such that $B_\epsilon(u) \subset A$. A subset $G$ of $V$ is called **open** in $V$ if every $u \in G$ is interior to $G$. For example, every subset of a discrete metric space is open, since $B_{1/2}(u) = \{u\}$ for any $u$. A subset $F$ of $V$ is called **closed** if given any sequence $(u_n)$ satisfying $u_n \in F$ for all $n$ and $u_n \to u$ for some $u \in V$, the point $u$ is in $F$. In other words, $F$ contains the limit points of all convergent sequences that take values in $F$. 
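The claim that every ball of radius $1/2$ in a discrete metric space is a singleton can be checked by brute force on a small finite set. The following is a minimal sketch only: the set `V` and the helper names `discrete_metric` and `ball` are hypothetical and chosen purely for illustration.

```python
from itertools import product


def discrete_metric(u, v):
    """Discrete metric: 0 if the points coincide, 1 otherwise."""
    return 0 if u == v else 1


def ball(center, eps, space, d):
    """Open eps-ball around `center` in the finite set `space` under metric d."""
    return {v for v in space if d(center, v) < eps}


V = {"a", "b", "c", "d"}  # hypothetical finite space

# The triangle inequality holds for every triple of points.
triangle_ok = all(
    discrete_metric(u, v) <= discrete_metric(u, w) + discrete_metric(w, v)
    for u, v, w in product(V, repeat=3)
)

# Every ball of radius 1/2 is a singleton, so each point of any subset G
# is interior to G and hence every subset is open.
singleton_balls = all(ball(u, 0.5, V, discrete_metric) == {u} for u in V)
```

Any radius in $(0, 1]$ produces the same singleton balls, while any radius above $1$ returns the whole space.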
Arbitrary unions and finite intersections of open sets are open, while arbitrary intersections and finite unions of closed sets are closed. A set $G \subset V$ is open if and only if $G^c$ is closed. ```{prf:example} Limits in $\RR$ preserve orders, so $a \leq x_n \leq b$ for all $n \in \NN$ and $x_n \to x$ implies $a \leq x \leq b$. Thus, any closed interval $[a, b]$ in $\RR$ is closed in the standard (one-dimensional Euclidean) metric. ``` ```{prf:example} :label: eg-bcx As in {prf:ref}`eg-bcx0`, let $S$ be a metric space and let $bcS$ be the set of all continuous functions in $bS$ (see {prf:ref}`eg-bx` for the definition). The set $bcS$ is a closed set in $bS$ because uniform limits of continuous functions are continuous. ``` (sss-commet)= #### Compactness A set $D$ in $V$ is called **bounded** if there exists a finite $K$ such that $d(u, v) \leq K$ whenever $u, v \in D$. A sequence in $V$ is called bounded if its range is a bounded subset of $V$. A subset $K$ of $V$ is called **precompact** in $V$ if every sequence in $K$ has a subsequence converging to some point in $V$. The set $K$ is called **compact** if, in addition, the limit points always lie in $K$. Thus, $K$ is compact if and only if $K$ is closed and precompact. The following theorem is a bedrock of real analysis. ```{prf:theorem} :label: t-cies A subset $K$ of Euclidean metric space is precompact if and only if it is bounded. As a result, the set $K$ is compact if and only if $K$ is closed and bounded. ``` Every precompact subset of a metric space is bounded, but the converse is not true in general. For example, consider the set $b\RR$ with the supremum distance ({prf:ref}`eg-bx`). Let $f_n$ be the normal density with variance 1 and mean $n$ for each $n$ in $\NN$. The set $\{f_n\}_{n \in \NN}$ is bounded, since $d_\infty(f_n, f_m) \leq 1$ for all $n, m$. But it is not precompact. For example, the sequence $\{f_n\}_{n \in \NN}$ has no convergent subsequence.
Indeed, for distinct $n, m$ we have $d_\infty(f_n, f_m) \geq f_n(n) - f_m(n) \geq (1 - e^{-1/2})/\sqrt{2\pi} > 0$, so the terms of any subsequence stay uniformly separated and the subsequence cannot converge. Later, in {ref}`sss-compactvs`, we will see sufficient conditions for boundedness to imply precompactness. (ss-completeness)= #### Completeness Let $V = (V, d)$ be a metric space. Analogous to the real case (see {ref}`sss-comprl`), a sequence $(u_n) \subset V$ is called **Cauchy** if, given any $\epsilon > 0$, there exists an $n_\epsilon \in \NN$ such that $n, m \geq n_\epsilon$ implies $d(u_n, u_m) < \epsilon$. The space $(V, d)$ is called **complete** if every Cauchy sequence in $V$ converges in $V$. Examples of complete spaces include $\RR^n$ paired with any metric generated by a norm, the set of $n \times k$ matrices paired with any metric generated by a norm, the space $(\ell_p(\Xsf), d_p)$ for countable $\Xsf$ and $p \in [1, \infty]$, the space $(b\Xsf, d_\infty)$, and the space $(bc\Xsf, d_\infty)$. Two metrics $d_1$ and $d_2$ on $V$ are called **equivalent** if there are positive constants $\alpha, \beta$ such that $\alpha d_1(u, v) \leq d_2(u, v) \leq \beta d_1(u, v)$ for all $u, v \in V$. Equivalent metrics generate the same Cauchy sequences, so completeness is preserved under equivalence. ## Topology Topological spaces are a generalization of metric spaces. They are useful for two reasons. One is that there exist interesting and useful topological spaces that cannot be represented as metric spaces. The second is that, by stripping away some of the structure naturally present in metric spaces, topological arguments add simplicity and clarity to many discussions in analysis. (ss-ts)= ### Topological Space We begin by introducing topological spaces and investigating some of their core characteristics. #### Definition and Examples A **topological space** is a pair $(V, \tau)$ where $V$ is a nonempty set and $\tau$ is a collection of subsets of $V$ such that 1. $\varnothing$ and $V$ are both in $\tau$, 2. $\tau$ is closed under finite intersections, and 3.
$\tau$ is closed under arbitrary unions. Statements (ii) and (iii) mean that $$ A, B \in \tau \implies A \cap B \in \tau \quad \text{and} \quad \aA \subset \tau \implies \cup_{A \in \aA} A \in \tau. $$ The family $\tau$ is called a **topology** on $V$. The elements of $\tau$ are called **open sets**. Complements of open sets are called **closed**. ```{prf:example} Let $V$ be any nonempty set. The set of all subsets of $V$ is a topology on $V$, referred to as the **discrete topology**. ``` ```{prf:example} Let $(V, d)$ be a metric space and let $\tau$ be the set of all open subsets of $V$ (as defined in {ref}`sss-oclo`). Then $\tau$ is a topology on $V$, called the topology **generated by $d$**. ``` ```{prf:example} :label: eg-smtop If $V$ is any nonempty set and $\aA$ is any nonempty collection of subsets of $V$, then there exists a uniquely defined "smallest" topology $\tau$ that contains $\aA$, constructed by taking the intersection of all topologies containing $\aA$. (The family of topologies containing $\aA$ is not empty -- at minimum, it contains the discrete topology -- and one easily confirms that the intersection of a nonempty family of topologies is again a topology.) We call $\tau$ the topology **generated by $\aA$**. ``` A subset $N$ of a topological space $V = (V, \tau)$ is called a **neighborhood** of a point $v \in V$ if there exists a $G \in \tau$ with $v \in G \subset N$. A topological space $V$ is called a **Hausdorff space** if, for any $u, v \in V$ with $u \not= v$, there exist neighborhoods $N$ of $u$ and $M$ of $v$ with $N \cap M = \varnothing$. Every metrizable space is Hausdorff, and all topological spaces we consider in this book are Hausdorff spaces. (sss-nets)= #### Nets We briefly introduce nets, which generalize sequences. Nets are important because (a) they characterize topologies, in a sense described below, and (b) nets allow us to describe definitions and properties in a way that connects neatly to sequence-based definitions in metric spaces.
Let $A$ be any nonempty set. A **preorder** on $A$ is a relation $\preceq$ on $A \times A$ such that, for any $a, b, c$ in $A$ we have $a \preceq a$ (reflexivity) and $a \preceq b$ and $b \preceq c$ implies $a \preceq c$ (transitivity). Obviously any antisymmetric preorder on $A$ is a partial order on $A$. A **directed set** is a nonempty set $A$ and a preorder $\preceq$ on $A$ such that, for any $a, b \in A$, there exists a $c \in A$ with $a \preceq c$ and $b \preceq c$. ```{prf:example} The set of natural numbers $\NN$ is a directed set when paired with the usual order $\leq$. ``` ```{prf:example} If $L$ is a lattice then $L$ is also a directed set. ``` Let $V$ be any set. A **net** in $V$ is a function from a directed set $A$ to $V$, typically written as $v_\bullet$ or $(v_\alpha)_{\alpha \in A}$. We sometimes simplify the latter to $(v_\alpha)$. The interpretation is that $\alpha \in A$ is mapped to $v_\alpha \in V$. Obviously any sequence $(v_n)$ in $V$ is also a net in $V$. A net $(v_\alpha)_{\alpha \in A}$ in $V$ is said to **converge** to $v \in V$ and we write $v_\alpha \to v$ if, for any neighborhood $N$ of $v$, there exists a $\beta \in A$ such that $v_\alpha \in N$ whenever $\beta \preceq \alpha$. This generalizes the concept of convergence of sequences in metric space. It is easy to check that convergent nets in $V$ have unique limits whenever $V$ is Hausdorff. (The converse is also true.) The next theorem shows that nets can be used to characterize topologies. ```{prf:theorem} :label: t-nchart A subset $C$ of a topological space $V$ is closed in $V$ if and only if every convergent net contained in $C$ converges to an element of $C$. ``` For a proof of {prf:ref}`t-nchart`, see Theorem 2.14 of {cite}`aliprantis2006border`. Let $(v_\alpha)_{\alpha \in A}$ and $(w_\beta)_{\beta \in B}$ be two nets in $V$. 
The net $(w_\beta)_{\beta \in B}$ is called a **subnet** of $(v_\alpha)_{\alpha \in A}$ if there exists an order preserving map $p$ from $B$ to $A$ such that (i) $w_\beta = v_{p(\beta)}$ for all $\beta \in B$ and (ii) for all $\alpha \in A$, there exists an $\alpha' \in p(B)$ such that $\alpha \preceq \alpha'$. Subnets generalize subsequences. For example, if $w_n = 1/n^2$ and $v_n = 1/n$ for $n \in \NN$, then $(w_n)_{n \in \NN}$ is a subnet of $(v_n)_{n \in \NN}$ in $\RR$ (take $A = B = \NN$ and $p(n) = n^2$). A subset $K$ of topological space $V$ is called **compact** if, given any net $(v_\alpha)$ contained in $K$, there exists a subnet $(w_\beta)$ of $(v_\alpha)$ and a point $v \in K$ such that $w_\beta \to v$. This generalizes the notion of a compact subset of a metric space, as given in {ref}`sss-commet`. #### Continuous Functions Let $V$ and $W$ be topological spaces. A function $f \colon V \to W$ is said to be **continuous** at $v \in V$ if, for any net $(v_\alpha)$ in $V$ with $v_\alpha \to v$ in $V$ we have $f(v_\alpha) \to f(v)$ in $W$. If $f$ is continuous at every $v \in V$ we simply say that $f$ is continuous. It is well known that $f$ is continuous on $V$ if and only if $f^{-1}(G)$ is open in $V$ whenever $G$ is open in $W$. (For a proof of this equivalence, see, e.g., Theorem 2.28 of {cite}`aliprantis2006border`.) ```{prf:example} If $V$ is paired with the discrete topology $\tau = \wp(V)$, then every function from $V$ into another topological space $W$ is continuous. ``` One of the most important features of continuous functions is that they carry compact sets into compact sets (see, e.g., §2.3 of {cite}`dudley2002real`): ```{prf:theorem} :label: t-concomcom If $f$ is a continuous function from topological space $V$ to topological space $W$, then $f(K)$ is compact in $W$ whenever $K$ is compact in $V$.
``` #### Initial Topologies Let $V$ be a nonempty set and, for each $\alpha$ in index set $\Lambda$, let $f_\alpha$ be a function from $V$ to topological space $(W_\alpha, \tau_\alpha)$. The **initial topology** generated by $\{f_\alpha\}_{\alpha \in \Lambda}$ is the topology $\tau$ on $V$ generated (in the sense of {prf:ref}`eg-smtop`) by the family of sets $$ \aA \coloneq \setntn{f_\alpha^{-1}(G)}{G \in \tau_\alpha, \; \alpha \in \Lambda}. $$ Evidently each $f_\alpha$ is continuous with respect to $\tau$ on $V$. The following lemma characterizes convergence with respect to the initial topology. In the statement, $\tau$ is the initial topology just described. ```{prf:lemma} :label: l-convit For net $(v_\beta)$ in $V$ and given $v \in V$, the following statements are equivalent: 1. $v_\beta \to v$ under the initial topology $\tau$ and 2. $f_\alpha(v_\beta) \to f_\alpha(v)$ under $\tau_\alpha$ for all $\alpha \in \Lambda$. ``` #### Metrizable Spaces A topological space $(V, \tau)$ is called **metrizable** if there exists a metric $d$ on $V$ such that $d$ generates the topology $\tau$. In metrizable spaces, sequences have the same "rights" as sequences in Euclidean space, in the sense that they determine the topology and hence other derived objects such as continuous functions. For example, given two metrizable spaces $V$ and $W$, 1. a function $f \colon V \to W$ is **continuous** if and only if, for any $v \in V$ and any sequence $(v_n) \subset V$, we have $f(v_n) \to f(v)$ in $W$ whenever $v_n \to v$ in $V$, and 2. a set $C$ is closed in $V$ if and only if any convergent sequence contained in $C$ converges to an element of $C$. Equivalent metrics generate the same topology, which is why it is often more convenient to discuss topologies than metrics. For example, we will see later (in {ref}`s-fa`) that all metrics on Euclidean space $\RR^n$ generated by a norm are equivalent. Hence, while there are infinitely many norms on $\RR^n$, they all generate the same topology.
This means that, when discussing norm topologies, we can speak unambiguously about open sets, compact sets, continuous functions, etc. (sss-prodtop)= #### Product Topologies Let $\{(V_n, \tau_n)\}_{n \in \NN}$ be a family of topological spaces and consider the Cartesian product $V = \prod_{n \in \NN} V_n$. The $i$-th projection map on $V$ is the function $\pi_i$ sending $v = (v_n)_{n \in \NN} \in V$ into $v_i$. The **product topology** on $V$, denoted here by $\tau$, is the initial topology generated by the set of projection maps $\{ \pi_n \}_{n \in \NN}$. The following result is a direct consequence of {prf:ref}`l-convit`. ```{prf:lemma} :label: l-ptopnet If $(v_\alpha) = ((v^1_\alpha, v^2_\alpha, \ldots))_{\alpha \in \Lambda}$ is a net in $V$ and $v = (v^1, v^2, \ldots)$ is an element of $V$, then $v_\alpha \to v$ in the product topology if and only if $v^i_\alpha \to v^i$ in $(V_i, \tau_i)$ for all $i \in \NN$. ``` ```{prf:example} :label: eg-euprod Consider $\RR$ with its usual topology, generated by the metric $d(u, v) = |u-v|$. The set of $n$-vectors $\RR^n$ is the Cartesian product of $n$ copies of $\RR$. The projections can be identified with the canonical basis vectors $e_1, \ldots, e_n$, since, given $u \in \RR^n$, the $i$-th projection is $\pi_i(u) = u_i = \inner{u, e_i}$. In view of {prf:ref}`l-convit`, a sequence $(u_k)$ converges to $u \in \RR^n$ in the product topology if and only if $\inner{u_k, e_i} \to \inner{u, e_i}$ in $\RR$ for all $i$ in $\{1, \ldots, n\}$. In other words, a sequence in $\RR^n$ converges in the product topology if and only if it converges pointwise. ``` More generally, if $((\Xsf_i, d_i))_{i=1}^n$ are metric spaces and $\Xsf \coloneq \prod_i \Xsf_i$ has the product topology, then $(u_k) \subset \Xsf$ converges to $u \in \Xsf$ if and only if $d_i(\pi_i(u_k), \pi_i(u)) \to 0$ for all $i$, where $\pi_i$ is the $i$-th projection on $\Xsf$. (sss-twt)= #### Existence of Extrema For finite subsets of $\RR$, maxima and minima clearly exist. For infinite collections the same is not true.
For example, the set $(0, 1)$ has neither a maximum nor a minimum. Under what conditions on primitives are maxima and minima guaranteed to exist? There are multiple approaches to this issue, depending on the structure of the problem. In this section we treat one of the most fundamental, attributed to the German mathematician Karl Weierstrass (1815--1897). Let $f$ be a function from a metric space $V$ to $\RR$ and let $v$ be a point in $V$. The function $f$ is called - **lower semicontinuous** at $v$ if $f(v) \leq \liminf_n f(v_n)$ for every $V$-valued sequence $(v_n)$ with $v_n \to v$, and - **upper semicontinuous** at $v$ if $f(v) \geq \limsup_n f(v_n)$ for every $V$-valued sequence $(v_n)$ with $v_n \to v$. If $f$ is lower semicontinuous at every point in $V$, then $f$ is called lower semicontinuous, and similarly for upper semicontinuity. A proof of the next theorem can be found in {cite}`jahn2020introduction`. ```{prf:theorem} :label: t-wt Let $K$ be a compact subset of $V$ and let $f \colon K \to \RR$. 1. If $f$ is lower semicontinuous, then $f$ has a minimizer on $K$. 2. If $f$ is upper semicontinuous, then $f$ has a maximizer on $K$. In particular, if $f$ is a continuous function from $K$ to $\RR$, then $f$ has both a maximizer and a minimizer in $K$. ``` (ss-scon)= ### Stability and Contractions One of the most important approaches to fixed points in metric space is via the theory of contractive maps. Here we review key results. (sss-stames)= #### Fixed Points Let $V$ be any set and let $S$ be a self-map on $V$. If $v \in V$ obeys $Sv = v$, then $v$ is called a **fixed point** of $S$ in $V$. For example, if $V = \RR$ and $S$ is the identity, then every point in $\RR$ is fixed under $S$. If, instead, $Sx = x^2$, then the set of fixed points is $\{0, 1\}$. Now let $V$ be a topological space. We call $S \colon V \to V$ **globally stable** on $V$ if $S$ has a unique fixed point $u^* \in V$ and $S^k u \to u^*$ as $k \to \infty$ for all $u \in V$.
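To make the definition of global stability concrete, the following minimal sketch iterates a simple self-map on $\RR$ from several starting points. The affine map `S`, the helper `iterate`, and the chosen starting points are all assumptions made for illustration; they do not come from the text.

```python
def iterate(S, u, k):
    """Return S^k u, the k-th iterate of the self-map S started at u."""
    for _ in range(k):
        u = S(u)
    return u


def S(u):
    # Hypothetical affine self-map on R; its unique fixed point is u* = 2.
    return 0.5 * u + 1.0


u_star = 2.0
starts = [-100.0, 0.0, 2.0, 57.0]  # arbitrary initial conditions

# Distance to the fixed point after 60 iterations, for each starting point.
errors = [abs(iterate(S, u, 60) - u_star) for u in starts]
```

Here $|S^k u - u^*| = 2^{-k} |u - u^*|$, so every trajectory converges to the same point. By contrast, the map $Sx = x^2$ discussed above cannot be globally stable on $\RR$, since it has two fixed points.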
When $V$ is metrizable, with metric $d$, a self-map $S$ is called **asymptotically contracting** if $d(S^n u, S^n v) \to 0$ as $n \to \infty$ for all $u, v \in V$. ```{exercise} :label: ex-acgs Show that if $S$ is asymptotically contracting and has a fixed point in $V$, then $S$ is globally stable on $V$. ``` We will often make use of the following lemma. ```{prf:lemma} :label: l-csgs Let $S$ be globally stable on topological space $V$ with unique fixed point $\bar v \in V$. If $U \subset V$ is closed in $V$ and $S U \subset U$, then $\bar v \in U$. ``` ```{prf:proof} Fix $u \in U$. Since $S$ is globally stable on $V$, we have $v_n \coloneq S^n u \to \bar v$ as $n \to \infty$. Since $SU \subset U$, we also have $v_n \in U$ for all $n$. As $U$ is closed in $V$, these two facts imply that $\bar v \in U$ (see {prf:ref}`t-nchart`). ◻ ``` (sss-conmap)= #### Contractions A self-map $S$ on metric space $V \coloneq (V, d)$ is called **contracting** or, more specifically, **a contraction of modulus $\lambda$** if there exists a $\lambda \in [0, 1)$ such that $$ d(Su, Sv) \leq \lambda d(u, v) \quad \text{for all} \quad u, v \in V $$ (eq-uc) ```{prf:theorem} :label: t-bfpt If $V$ is a complete metric space and $S \colon V \to V$ is a contraction of modulus $\lambda$ on $V$, then $S$ has a unique fixed point $u^*$ in $V$ and $$ d(S^n u, u^*) \leq \lambda^n d(u, u^*) \quad \text{for all } n \in \NN $$ ``` For a proof, see, for example, {cite}`aliprantis2006border`, Theorem 3.48. Most of the conclusions of Banach's contraction mapping theorem carry over when $S$ is **eventually contracting**; that is, when $S^k$ is contracting for some $k \in \NN$. A proof can be found on p. 9 of {cite}`goebel1990topics`. ```{prf:theorem} :label: t-bfpt22 If $V$ is complete, $S$ is a self-map on $V$ and $S^k$ is a contraction on $V$ for some $k \in \NN$, then $S$ is globally stable on $V$. ``` ## Measure and Integration In this section we review measurable functions and integration theory. 
Measurable functions generalize continuous functions while remaining closed under standard arithmetic and limiting operations, and they admit a well-defined theory of integration. Throughout, for real-valued $f$ on an arbitrary domain, we set $f^+ \coloneq f \vee 0$ and $f^- \coloneq - (f \wedge 0)$. See {numref}`f-gdecomp` for an illustration. The function $f^+$ is called the **positive part** of $f$, while $f^-$ is called the **negative part**. The identity $f = f^+ - f^-$ always holds, so the pair $f^+$, $f^-$ provides a decomposition of $f$ into the difference between two nonnegative functions. ```{figure} figures/gdecomp.pdf :name: f-gdecomp Decomposition of functions ``` (ss-measure)= ### Measure Theory We review measurable spaces, measurable functions, parametric continuity and measurable selections, and measures. (sss-measurab)= #### Measurable Space Let $\Xsf$ be any nonempty set. A collection of subsets $\aA$ of $\Xsf$ is called a **$\sigma$-algebra** on $\Xsf$ if 1. $\Xsf \in \aA$, 2. $A \in \aA$ implies $A^c \in \aA$, and 3. if $\{A_n\}_{n \geq 1}$ is a sequence contained in $\aA$, then $\cup_n A_n \in \aA$. A pair $(\Xsf, \aA)$ where $\Xsf$ is a nonempty set and $\aA$ is a $\sigma$-algebra on $\Xsf$ is called a **measurable space**. Points (ii) and (iii) tell us that $\aA$ is "stable" under the taking of complements and unions. By De Morgan's law $(\cap_n A_n)^c = \cup_n A_n^c$, any $\sigma$-algebra is stable under countable intersections too. By (i) and (ii), $\varnothing \in \aA$ also holds. ```{prf:example} Given any set $\Xsf$ and any $A \subset \Xsf$, the family of sets $\aA \coloneq \{\Xsf, A, A^c, \varnothing\}$ is a $\sigma$-algebra on $\Xsf$. ``` ```{prf:example} The power set $\wp(\Xsf)$ is a $\sigma$-algebra on $\Xsf$, as is the pair $\{\varnothing, \Xsf\}$. ``` One way to define a $\sigma$-algebra is to take a collection $\cC$ of subsets of $\Xsf$, and consider the smallest $\sigma$-algebra that contains this collection. 
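When $\Xsf$ is finite, the smallest $\sigma$-algebra containing a given collection can be computed by brute force, which may help build intuition. The sketch below is illustrative only: the function name `generate_sigma_algebra` and the example set are hypothetical, and the approach relies on the assumption that $\Xsf$ is finite, so that closure under complements and pairwise unions is enough.

```python
def generate_sigma_algebra(X, C):
    """Smallest sigma-algebra on the finite set X containing the collection C.

    On a finite set, closure under complements and pairwise unions suffices,
    because every countable union reduces to a finite one.
    """
    X = frozenset(X)
    sets = {frozenset(), X} | {frozenset(A) for A in C}
    while True:
        complements = {X - A for A in sets}
        unions = {A | B for A in sets for B in sets}
        new = complements | unions
        if new <= sets:  # nothing new was produced, so the family is closed
            return sets
        sets |= new


X = {1, 2, 3, 4}
sigma = generate_sigma_algebra(X, [{1}])
```

Starting from the single set $A = \{1\}$, the result is the four-element family $\{\Xsf, A, A^c, \varnothing\}$ from the example above.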
```{prf:definition} :label: d-gsa Let $\cC$ be any collection of subsets of $\Xsf$. The **$\sigma$-algebra generated by $\cC$** is the smallest $\sigma$-algebra on $\Xsf$ that contains $\cC$, and is denoted by $\sigma(\cC)$.[^1] ``` Now let $\Xsf$ be a metric space. The family of **Borel sets** on $\Xsf$, denoted by either $\bB$ or $\bB_\Xsf$ depending on whether or not the underlying space is clear, is defined as the $\sigma$-algebra generated by the open sets of $\Xsf$. Evidently $\bB$ contains not only all the open subsets of $\Xsf$ but also all the closed ones. From these sets we can continue taking complements and countable unions and everything we produce must be a Borel set. In fact it turns out that every set we work with in day-to-day analysis is a Borel set. #### Measurable Functions Given two arbitrary measurable spaces $(\Xsf, \aA)$ and $(\Ysf, \bB)$, a function $f$ from $\Xsf$ to $\Ysf$ is called $(\aA, \bB)$-**measurable** if $$ f^{-1}(B) \text{ is in } \aA \text{ whenever } B \in \bB. $$ In other words, measurable functions are those functions that pull measurable sets back to measurable sets. If $\Ysf$ is a metric space and $\bB$ is its family of Borel sets, then we will say that $f$ is **Borel measurable**. It can be shown in this case (see, e.g., {cite}`cinlar2011probability`, Proposition 2.3) that $f$ is Borel measurable if and only if either one of the following apparently weaker conditions is satisfied: 1. $f^{-1}(G)$ is in $\aA$ whenever $G$ is open in $\Ysf$. 2. $\Ysf$ is a Borel subset of $\RR$ and $f^{-1}((-\infty, \alpha))$ is in $\aA$ for all $\alpha \in \RR$. From this result it is immediate that every continuous function from $\Xsf$ to $\Ysf$ is also Borel measurable. While the class of continuous functions has beautiful properties and is closed under uniform limits (see {prf:ref}`eg-bcx`), it is not closed under pointwise limits,[^2] which makes it hard to work with in some instances.
On the other hand, the set of Borel functions *is* closed under the taking of pointwise limits: ```{prf:lemma} :label: l-limbm If $(\Xsf, \aA)$ is a measurable space and $\{f_n\}$ is a sequence of real valued Borel measurable functions on $(\Xsf, \aA)$, then the functions $$ f \coloneq \sup_{n} f_n, \quad f \coloneq \limsup_{n \to \infty} f_n, \quad \text{and } \; f \coloneq \lim_{n \to \infty} f_n, $$ are all Borel measurable on $(\Xsf, \aA)$ whenever they exist. The same is true if we replace sup with inf. ``` In fact, in our setting, the set of Borel measurable functions is precisely the smallest class of functions that contains the continuous functions and is closed under the taking of pointwise limits (see, e.g., §11.7 of {cite}`kechris2012classical`). It is also true that compositions of Borel measurable functions are also Borel measurable, and, when the functions are real-valued, that Borel measurability is preserved under algebraic operations. The next lemma gives one statement of these results: ```{prf:lemma} :label: l-limbmext If $(\Xsf, \aA)$ is a measurable space, $\alpha, \beta$ are real scalars and $f$ and $g$ are real-valued Borel measurable functions on $(\Xsf, \aA)$, then the functions $$ \alpha f + \beta g, \quad f g \quad \text{and } \quad f/g \text{ when } g \not= 0 $$ are all Borel measurable functions on $(\Xsf, \aA)$. ``` See {cite}`cinlar2011probability`, Chapter 1, Section 2 for proofs. (sss-pcms)= #### Parametric Continuity and Measurable Selections We often wish to know whether or not continuity passes from primitives to solutions. For example, we might ask whether an equilibrium object, constructed through a process that involves optimization, varies continuously with parameters. The most commonly used theorem in this domain is **Berge's theorem of the maximum**. Here we state a version of Berge's theorem. Throughout, $\Asf$ and $\Xsf$ are metric spaces. 
A **correspondence** from $\Xsf$ to $\Asf$ is a map $\Gamma$ from $\Xsf$ to the set of all subsets of $\Asf$. A correspondence $\Gamma$ from $\Xsf$ to $\Asf$ is called **nonempty** if $\Gamma(x)$ is nonempty for all $x \in \Xsf$. A function $\sigma$ from $\Xsf$ to $\Asf$ is called a **measurable selection** with respect to $\Gamma$ if $\sigma$ is Borel measurable and $\sigma(x) \in \Gamma(x)$ for all $x \in \Xsf$. A nonempty correspondence $\Gamma$ is called - **compact-valued** if $\Gamma(x)$ is compact for all $x \in \Xsf$, - **lower hemi-continuous** at $x \in \Xsf$ if, for any $y \in \Gamma(x)$ and any $(x_n)$ with $x_n \to x$, there exists a sequence $(y_n)$ with $y_n \in \Gamma(x_n)$ for all $n$ and $y_n \to y$, - **upper hemi-continuous** at $x \in \Xsf$ if, for any sequence $(x_n)$ with $x_n \to x$ and any sequence $(y_n)$ with $y_n \in \Gamma(x_n)$ for all $n$, there exists a convergent subsequence of $(y_n)$ whose limit is in $\Gamma(x)$, and - **continuous** on $\Xsf$ if it is both lower and upper hemi-continuous at every $x \in \Xsf$. ```{exercise} :label: ex-ecco Let $\Xsf$ and $\Asf$ be subsets of finite-dimensional Euclidean space and let $g, h$ be continuous functions from $\Xsf$ to $\Asf$ with $g(x) \leq h(x)$ for all $x \in \Xsf$, where $\leq$ is the pointwise partial order. Let $\Gamma(x) = [g(x), h(x)]$. Prove that $\Gamma$ is compact-valued and continuous on $\Xsf$. ``` Let $\Gamma$ be a nonempty, compact-valued correspondence from $\Xsf$ to $\Asf$. Let $q$ be a real valued function on $\Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)}$ and set $$ m(x) \coloneq \max_{a \in \Gamma(x)} q(x, a) \qquad (x \in \Xsf) $$ (eq-bergemx) whenever the maximum is well defined. ```{prf:theorem} :label: t-berge If $\Gamma$ is continuous on $\Xsf$, $q$ is continuous on $\Gsf$, and $\Asf$ is separable, then 1. $m$ is well defined and continuous on $\Xsf$, and 2.
there exists a measurable selection $\sigma$ such that $q(x, \sigma(x)) = m(x)$ for all $x$. If, in addition, the maximizer in {eq}`eq-bergemx` is unique at each $x$, then the uniquely defined measurable selection $\sigma$ is continuous. ``` A proof of the continuity results in {prf:ref}`t-berge` can be found in §17.5 of {cite}`aliprantis2006border`. Existence of a measurable selection is proved in §18.19 of the same reference. (sss-measures)= #### Measures Through the theory constructed above, we can identify broad classes of sets and functions that are relatively well behaved (e.g., Borel sets and Borel functions). This opens the way to analyzing how to (a) measure these sets and (b) integrate the functions. The first step is to introduce the notion of a **measure**, which is a map $\mu$ from a $\sigma$-algebra $\aA$ to $[0, \infty]$ satisfying 1. $\mu(\varnothing) = 0$ and 2. $\mu(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu(A_n)$ whenever $\{A_n\} \subset \aA$ is disjoint. Here disjointness of $\{A_n\}$ means that any two distinct sets in this sequence are disjoint. ```{prf:example} :label: eg-count Let $\Xsf = \{x_1, x_2, \ldots\}$ be a countable set paired with $\wp(\Xsf)$, the set of all its subsets. Define $c \colon \wp(\Xsf) \to \RR_+$ by $c(A) = |A|$, where $|A|$ is the number of elements in $A$, with $c(A) = \infty$ if $A$ is infinite. Some thought will convince you that $c$ is a measure on $\wp(\Xsf)$. This measure is called the **counting measure**. ``` ```{prf:example} :label: eg-leb It can be proved (see, e.g., {cite}`dudley2002real`, §3.2) that there exists exactly one measure on the Borel subsets of $\RR^2$ that assigns area to rectangles in the usual way (i.e., area is the product of the two sides). It is called **Lebesgue measure** and often denoted by $\lambda$. The measure $\lambda$ also assigns the usual measures of area to other standard sets, such as circles. Indeed, if $C$ is a circle with radius $r$, then $\lambda(C) = \pi r^2$. 
In other words, $\lambda$ is the measure that gives us the classical notion of area in the plane, as taught in basic geometry. ``` ```{prf:example} {prf:ref}`eg-leb` introduced Lebesgue measure for $\RR^2$. An analogous version, also called Lebesgue measure and denoted by $\lambda$, is defined on the Borel subsets of $\RR^k$ for every $k \in \NN$. For example, if $k=1$ and $I$ is an interval in $\RR$, then $\lambda(I)$ is the length of that interval. ``` Returning to the general case of a measure $\mu$ on measurable space $(\Xsf, \aA)$, if there exists a sequence of sets $(A_n) \subset \aA$ with $\mu(A_n) < \infty$ for all $n$ and $\cup_n A_n = \Xsf$, then $\mu$ is called **$\sigma$-finite**. If $\mu(\Xsf) < \infty$, then $\mu$ is called **finite**. If $\mu(\Xsf) = 1$, then $\mu$ is called a **probability measure**. If $\Xsf$ is a metric space and $\aA = \bB$ (the Borel sets), then $\mu$ is called a **Borel measure**. If $\aA = \bB$ and $\mu(\Xsf) = 1$, then $\mu$ is called a **Borel probability measure**. For a Borel probability measure $\mu$, the value $\mu(B)$ usually is interpreted as the probability that, when a random element of $\Xsf$ is selected, that element is in $B$. ```{prf:example} :label: eg-pmf Take the setting of {prf:ref}`eg-count` but now let the measure be given by $\nu(A) = \sum_{x \in A} p(x)$ instead of $|A|$, where $p$ is a function from $\Xsf$ to $\RR_+$. It's not hard to see that $\nu$ defines a measure on $\wp(\Xsf)$. If $\sum_{x \in \Xsf} p(x) < \infty$ then $\nu$ is a finite Borel measure. If the sum equals unity then $\nu$ is a Borel probability measure. ``` A **measure space** is a triple $(\Xsf, \aA, \mu)$ where $(\Xsf, \aA)$ is a measurable space and $\mu$ is a measure on $\aA$. If $\mu(\Xsf) = 1$, then the measure space is also called a **probability space**. In this case it is common to write the measure space as $(\Omega, \fF, \PP)$. 
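The discrete measures in {prf:ref}`eg-count` and {prf:ref}`eg-pmf` are easy to experiment with on a finite set. The following Python sketch is only an illustration: the set `X`, the weights `p` and the function names `nu` and `counting` are assumptions made for the example, not objects defined in the text. It checks additivity over two disjoint sets and confirms that the total mass is one, so that $\nu$ is a probability measure in this instance.

```python
from fractions import Fraction

# Hypothetical finite state space and weights p(x) >= 0 (illustrative only).
X = ["a", "b", "c", "d"]
p = {"a": Fraction(1, 2), "b": Fraction(1, 4),
     "c": Fraction(1, 8), "d": Fraction(1, 8)}

def nu(A):
    """nu(A) = sum of p(x) over x in A, for A a subset of X."""
    return sum((p[x] for x in A), Fraction(0))

def counting(A):
    """Counting measure c(A) = |A|."""
    return len(set(A))

A, B = {"a", "b"}, {"d"}            # two disjoint sets
additive = nu(A | B) == nu(A) + nu(B)
total = nu(set(X))                  # total mass of the space
print(additive, total, counting(A | B))
```

Exact rational arithmetic via `Fraction` is used so that the additivity check is not affected by floating point rounding.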
A **random variable** on probability space $(\Omega, \fF, \PP)$ is an $(\fF, \bB)$-measurable map $X$ from $\Omega$ to $\RR$ paired with its Borel sets $\bB$. More generally, given measurable space $(E, \eE)$, an $E$-valued **random element** on probability space $(\Omega, \fF, \PP)$ is an $(\fF, \eE)$-measurable map $X$ from $\Omega$ to $E$. The **distribution** of this random element $X$ is the probability measure $P$ defined by $$ P(B) = \PP\setntn{\omega \in \Omega}{X(\omega) \in B} \qquad (B \in \eE) $$ Here's a reassuring fact implying that Borel probability measures on $\RR$ are isomorphic to a set of very familiar objects. ```{prf:theorem} There is a one-to-one correspondence between $\fF$, the set of cumulative distribution functions on $\RR$, and the set of Borel probability measures on $\RR$. For each $F \in \fF$, the corresponding probability measure $\mu$ satisfies $$ \mu((a, b]) = F(b) - F(a) \text{ for all } a, b \in \RR \text{ with } a < b $$ ``` More generally, we have the interpretation $$ \mu(B) = \text{ probability that } x \in B \text{ when } x \text{ is drawn from } F $$ (sss-prodmeas)= #### Product Spaces and Product Measures Given measurable spaces $(\Xsf, \aA)$ and $(\Ysf, \bB)$, the **product $\sigma$-algebra** $\aA \otimes \bB$ is the $\sigma$-algebra on $\Xsf \times \Ysf$ generated by all sets of the form $A \times B$ with $A \in \aA$ and $B \in \bB$. The pair $(\Xsf \times \Ysf, \, \aA \otimes \bB)$ is called the **product measurable space**. If $\mu$ and $\nu$ are $\sigma$-finite measures on $(\Xsf, \aA)$ and $(\Ysf, \bB)$ respectively, then there exists a unique measure $\mu \otimes \nu$ on $\aA \otimes \bB$ satisfying $$ (\mu \otimes \nu)(A \times B) = \mu(A) \, \nu(B) \qquad (A \in \aA, \; B \in \bB). $$ The measure $\mu \otimes \nu$ is called the **product measure**. The construction extends naturally to finite products of measurable spaces.
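For discrete measures on finite sets, the defining property of the product measure can be checked directly. The sketch below is a minimal illustration under stated assumptions: the point weights `mu_w` and `nu_w` and the helper names are hypothetical, and the product measure of a set of pairs is computed by summing products of point weights, which is how the product measure acts in this finite setting.

```python
from fractions import Fraction
from itertools import product

# Hypothetical point weights defining mu on X and nu on Y (illustrative only).
mu_w = {"x1": Fraction(1, 3), "x2": Fraction(2, 3)}
nu_w = {"y1": Fraction(1, 4), "y2": Fraction(1, 4), "y3": Fraction(1, 2)}

def measure(weights, A):
    """Measure of A under the discrete measure with the given point weights."""
    return sum((weights[a] for a in A), Fraction(0))

def product_measure(E):
    """Product measure of a set E of pairs (x, y), summing mu{x} * nu{y}."""
    return sum((mu_w[x] * nu_w[y] for (x, y) in E), Fraction(0))

A, B = {"x2"}, {"y1", "y3"}
rect = set(product(A, B))                       # the rectangle A x B
lhs = product_measure(rect)                     # product measure of A x B
rhs = measure(mu_w, A) * measure(nu_w, B)       # mu(A) * nu(B)
print(lhs == rhs, lhs)
```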
(ss-integ)= ### Integration We define abstract integrals and review their key properties, including monotonicity and the dominated convergence theorem. #### Abstract Integrals Let $(\Xsf, \aA)$ be a measurable space and let $m\aA_+$ be the set of nonnegative real-valued Borel measurable functions on $(\Xsf, \aA)$. We define an **integral** on $m\aA_+$ to be a function $I \colon m\aA_+ \to [0, \infty]$ such that 1. $I(f) = 0$ when $f = 0$ everywhere on $\Xsf$, 2. $f_1 \leq f_2 \leq \cdots$ and $\lim_{n\to \infty} f_n = f$ implies $\lim_{n \to \infty} I(f_n) = I(f)$, and 3. $\alpha, \beta \geq 0$ and $f, g \in m\aA_+$ implies $I(\alpha f + \beta g) = \alpha I(f) + \beta I(g)$. The limit in (ii) is a pointwise limit, so that $\lim_{n\to \infty} f_n = f$ means $\lim_{n\to \infty} f_n(x) = f(x)$ for every $x \in \Xsf$. The following important result, proved in Chapter 1 of {cite}`cinlar2011probability`, states that *every* measure on a measurable space creates a unique and well defined integral. ```{prf:theorem} :label: t-adefi Let $(\Xsf, \aA)$ be a measurable space. There exists a one-to-one correspondence between the set of measures on $(\Xsf, \aA)$ and the set of integrals on $m\aA_+$. For any measure $\mu$, the corresponding integral $I_\mu$ satisfies $$ I_\mu (\1_B) = \mu(B) \text{ whenever } B \in \aA $$ (eq-iiim) ``` The value $I_\mu (f)$ is called the **integral of $f$ under $\mu$** and the following notation is common: $$ I_\mu (f) :=: \int f \diff \mu :=: \int f(x) \mu(\diff x). $$ ```{prf:example} :label: eg-lebint Let $\bB$ be the Borel sets on $\RR$ and let $\lambda$ be Lebesgue measure. According to {prf:ref}`t-adefi`, there exists an integral $I_\lambda$ that associates to each Borel measurable function from $\RR$ to $\RR_+$ a number $I_\lambda(f)$, often written as $\int f \diff \lambda$ or just $\int f(x) \diff x$.
If $f$ is continuous and supported on an interval $[a, b]$, then $I_\lambda(f)$ equals $\int_a^b f(x) \diff x$ in the standard Riemann sense (see, e.g., {cite}`cinlar2011probability`, §1.4). For example, with $f(x) = x^2$ on $[0, 1]$ and zero elsewhere, we have $\int f \diff \lambda = 1/3$. ``` The integral $I_\lambda$ introduced in {prf:ref}`eg-lebint` is called the **Lebesgue integral**, and it extends the standard Riemann integral to a larger set of functions (the Borel measurable functions), while at the same time guaranteeing that the attractive properties (i)--(iii) in the definition of the integral will hold. Equation {eq}`eq-iiim` makes sense in this setting because if, say, $f = \1_{[a, b]}$, then $$ I_\lambda (f) = \lambda( [a, b] ) = b - a, $$ where the first equality is by {eq}`eq-iiim` and the second is by the fact that Lebesgue measure assigns length to intervals. The value $b-a$ is also what we would expect for the integral, since it is the area under the curve for this simple function.[^3] ```{prf:example} :label: eg-pmfint As in {prf:ref}`eg-pmf`, let $\Xsf$ be countable, let $p \colon \Xsf \to \RR_+$ with $\sum_{x \in \Xsf} p(x) < \infty$, and let $\nu$ be the measure on $\wp(\Xsf)$ defined by $\nu(A) = \sum_{x \in A} p(x)$. Then, for any $f \colon \Xsf \to \RR_+$, $$ \int f(x) \nu(\diff x) = \sum_{x \in \Xsf} f(x) p(x). $$ In particular, taking $p \equiv 1$ gives the counting measure $c$ from {prf:ref}`eg-count`, and integration under $c$ reduces to summation. ``` If $\mu$ is a probability measure and $w \colon \Xsf \to \RR$, then one often writes $\EE w(x)$ for the integral of $w(x)$ with respect to $\mu$. That is, $$ \EE w(x) = \int w \diff \mu $$ Here we are thinking of $x$ as a random variable drawn from distribution $\mu$ and the integral corresponds to the **expectation** of $w(x)$ under $\mu$. 
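The two integrals discussed above can be illustrated numerically. In the Python sketch below, the weights `p`, the test functions `w` and `g`, and the helper names `integral` and `midpoint` are all assumptions made for illustration. The first part evaluates $\int f \diff \nu = \sum_x f(x) p(x)$ for a discrete probability measure and checks linearity in the sense of property (iii). The second part approximates $\int_0^1 x^2 \diff x$ with a midpoint Riemann sum, which should be close to the value $1/3$ quoted in {prf:ref}`eg-lebint`.

```python
from fractions import Fraction

# Hypothetical probability weights on a finite set (illustrative only).
p = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

def integral(f, weights):
    """Integral of f under the discrete measure: sum of f(x) * p(x)."""
    return sum((f(x) * wt for x, wt in weights.items()), Fraction(0))

w = lambda x: x * x
g = lambda x: 2 * x + 1

expectation = integral(w, p)        # E w(x) when x is drawn from p

# Linearity, property (iii) of an integral, with alpha = 3 and beta = 2.
linear = (integral(lambda x: 3 * w(x) + 2 * g(x), p)
          == 3 * integral(w, p) + 2 * integral(g, p))

def midpoint(f, a, b, n=1000):
    """Midpoint Riemann sum approximating the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

approx = midpoint(w, 0.0, 1.0)      # should be close to 1/3
print(expectation, linear, approx)
```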
(sss-propin)= #### Properties of Integrals Given a measure space $(\Xsf, \aA, \mu)$, a property is said to hold **$\mu$-almost everywhere** (or **$\mu$-almost surely** when $\mu$ is a probability measure) if it holds on all of $\Xsf$ except possibly a set of $\mu$-measure zero. A sequence $(f_n)$ converges to $f$ $\mu$-almost everywhere if $f_n(x) \to f(x)$ for $\mu$-almost every $x \in \Xsf$. The integral extends to functions that take negative values, not just the nonnegative functions in $m\aA_+$. Indeed, if $(\Xsf, \aA, \mu)$ is a measure space and $f \in m\aA$ is not necessarily nonnegative, then we can still decompose it into the difference between two nonnegative functions via $f = f^+ - f^-$, where $f^+ \coloneq \max\{f, 0\}$ and $f^- \coloneq \max\{-f, 0\}$. Imposing linearity, we now set $$ \int f \diff \mu \coloneq \int f^+ \diff \mu - \int f^- \diff \mu . $$ The only risk here is that both terms on the right equal $+\infty$, in which case the integral is not well defined. If both integrals are finite we call $f$ **integrable** with respect to $\mu$. In what follows, we leave $(\Xsf, \aA, \mu)$ fixed and write the integral $I_\mu(f)$ of $f$ under $\mu$ as $\int f \diff \mu$. We note that every integral is increasing, in the sense that $$ f \leq g \implies \int f \diff \mu \leq \int g \diff \mu. $$ (eq-integmon) To see this, observe that $g - f$ is nonnegative (and measurable) and hence $\int (g - f) \diff \mu$ is well defined and nonnegative. Now, using the linearity in part (iii) of the definition of an integral (see {prf:ref}`t-adefi`), we have $$ \int g \diff \mu = \int (g - f + f) \diff \mu = \int (g - f) \diff \mu + \int f \diff \mu \geq \int f \diff \mu. $$ A battery of useful limit theorems exists for the integral we have defined. In our statements of these results, $(\Xsf, \aA, \mu)$ is any measure space and $f$ and $f_n$ are $(\aA, \bB)$-measurable functions from $\Xsf$ to $\RR$ for all $n \in \NN$. ```{prf:theorem} :label: t-dct Let $\lim_{n \to \infty} f_n = f$ hold $\mu$-almost everywhere on $\Xsf$. If either 1.
$-\infty < \int f_1 \diff \mu$ and $f_n \leq f_{n+1}$ for all $n \in \NN$, or 2. there exists a $g \in m\aA_+$ with $\int g \diff \mu < \infty$ and $| f_n | \leq g$ for all $n \in \NN$, then $$ \lim \int f_n \diff \mu = \int f \diff \mu. $$ (eq-dctlim) ``` The first implication (i.e., (i) $\implies$ {eq}`eq-dctlim`) is called the **monotone convergence theorem**. The second is called the **dominated convergence theorem**. ### Conditioning Next we review prediction based on conditional expectations. Conditional expectations are themselves a cornerstone of economic theory and empirics, since they describe optimal forecasts based on limited information. Here we provide a brief treatment of the general setting that suffices for what follows. #### Definition Let $Y$ and the elements of $\gG \coloneq \{X_1, \ldots, X_k\}$ be scalar random variables. Consider the problem of predicting $Y$ given $\gG$. That is, we wish to form a prediction of the value that $Y$ will take once $X_1, \ldots, X_k$ are known, without any additional information on the state of the world. Another way to say this is that we seek a (nonrandom) function $f \colon \RR^k \to \RR$ such that $$ \hat Y \coloneq f(X_1, \ldots, X_k) \text{ is a good predictor of } Y. $$ To find such an $f$ we must define what "good" means. The most common definition in the present context is that **mean squared error** $\EE[(\hat Y - Y)^2]$ is small. Thus, we have a minimization problem in function space (the set from which $f$ is chosen). Based on projection arguments, it can be shown that there exists an essentially unique $\hat f$ in the set of functions from $\RR^k$ to $\RR$ that solves $$ \hat f = \argmin_f \EE[ (Y - f(X_1, \ldots, X_k))^2 ]. $$ (eq-fmin) (See, e.g., {cite}`cinlar2011probability`.) We call the resulting variable $$ \hat Y \coloneq \hat f(X_1, \ldots, X_k) $$ the **conditional expectation** of $Y$ given $\gG$. 
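The minimization in {eq}`eq-fmin` can be illustrated in a finite setting with $k = 1$. The sketch below is a toy example under stated assumptions: the joint distribution `joint` of $(X, Y)$ is hypothetical, and the comparison runs only over a coarse grid of candidate predictors, so it is a sanity check and not a proof. It computes $\hat f(x)$ as the probability-weighted average of $Y$ on the event $\{X = x\}$ and confirms that no candidate on the grid attains a smaller mean squared error.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution of (X, Y) on a small grid (illustrative only).
joint = {
    (0, 0): Fraction(1, 4), (0, 2): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (1, 5): Fraction(3, 8),
}

def cond_mean(x):
    """Weighted average of Y on the event {X = x}, from the joint weights."""
    mass = sum(wt for (xi, _), wt in joint.items() if xi == x)
    return sum(y * wt for (xi, y), wt in joint.items() if xi == x) / mass

def mse(f):
    """Mean squared error E[(Y - f(X))^2] for a predictor stored as a dict."""
    return sum(wt * (y - f[x]) ** 2 for (x, y), wt in joint.items())

f_hat = {x: cond_mean(x) for x in (0, 1)}
best = mse(f_hat)

# Compare against every predictor taking values on a coarse grid.
grid = [Fraction(k, 2) for k in range(0, 13)]
no_better = all(mse({0: a, 1: b}) >= best for a, b in product(grid, grid))
print(f_hat, best, no_better)
```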
Common alternative notations for $\hat Y$ include $$ \EE_{\gG} Y :=: \EE[Y \given \gG] :=: \EE[Y \given X_1, \ldots, X_k] . $$ In the present context, $\gG$ is often called an **information set**. (sss-condprop)= #### Properties In the next proposition, a random variable $Y$ is called **$\gG$-measurable** if there exists a function $f$ such that $Y = f(X_1, \ldots, X_k)$. Intuitively, $Y$ is perfectly predictable given the data in $\gG$. ```{prf:proposition} :label: p-core Let $X$ and $Y$ be random variables with finite first moment and let $\gG$ and $\hH$ be information sets. The following properties hold: 1. $\EE_{\gG} X$ is $\gG$-measurable 2. If $\gG \subset \hH$, then $\EE_{\gG} [ \EE_{\hH} Y] = \EE_{\gG} Y$ and $\EE[ \EE_{\gG} Y ] = \EE Y$. 3. If $Y$ is independent of the variables in $\gG$, then $\EE_{\gG} Y = \EE Y$. 4. If $Y$ is $\gG$-measurable, then $\EE_{\gG} Y = Y$. 5. If $X$ is $\gG$-measurable, then $\EE_{\gG} [ X Y ] = X \EE_{\gG} Y$. 6. $\EE_{\gG} [ \alpha X + \beta Y ] = \alpha \EE_{\gG} X + \beta \EE_{\gG} Y$ for all $\alpha,\beta$ in $\RR$. ``` Property (vi) states that the linearity of expectations is preserved under conditioning. Property (ii) is called the **law of iterated expectations**, and is shared by all projections. Property (v) is sometimes called **conditional determinism**, since $X$ can be treated like a constant when it is pinned down by the information set. A full proof of {prf:ref}`p-core` can be found in {cite}`cinlar2011probability`. (ss-martin)= ### Martingales In this section we provide a brief introduction to martingales, one of the most important classes of stochastic processes, and a result on stopping times that's needed for the theory of optimal stopping. #### Discrete-Time Martingales Let $(\Omega, \fF, \PP)$ be a probability space and let $(\fF_t)_{t \geq 0}$ be a sequence of $\sigma$-algebras with $\fF_t \subset \fF_{t+1} \subset \fF$ for all $t$, called a **filtration**. 
A sequence of random variables $(M_t)_{t \geq 0}$ is called a **martingale** with respect to $(\fF_t)$ if, for all $t \geq 0$, 1. $M_t$ is $\fF_t$-measurable, 2. $\EE |M_t| < \infty$, and 3. $\EE[M_{t+1} \mid \fF_t] = M_t$. A **stopping time** with respect to $(\fF_t)$ is a random variable $\tau$ taking values in $\{0, 1, 2, \ldots\} \cup \{\infty\}$ such that $\{\tau \leq t\} \in \fF_t$ for all $t \geq 0$. #### Martingale Stopping Times Next we present the optional stopping theorem for discrete-time martingales. ```{prf:theorem} :label: t-ost Let $(M_t, \fF_t)$ be a martingale and let $\tau$ be a stopping time. If $\tau$ is bounded (i.e., $\tau \leq N$ for some constant $N$), then $$ \EE[M_\tau] = \EE[M_0]. $$ ``` Using this, we establish a general result on exit times for bounded martingales. In the statement of the theorem, $(M_t, \fF_t)$ is a discrete-time martingale with $M_0 = m$, and $$ \tau = \inf\{t \geq 0 : M_t \notin (a,b)\}. $$ for some $a,b$ in $\RR$ with $a < b$. ```{prf:theorem} :label: t-exit Let $(M_t, \fF_t)$ be bounded, in the sense that there exists a $K \in \RR$ with $|M_t| \leq K$ with probability one for all $t$. If, in addition, there exists a $\delta > 0$ with $$ \delta \leq \EE[(M_{t+1} - M_t)^2 \mid \fF_t] \quad \text{for all } t < \tau, $$ (eq-var_bound) then $$ \EE_m[\tau] \leq \frac{\EE_m[(M_\tau - m)^2]}{\delta}. $$ ``` ```{prf:proof} The **quadratic variation** process for this martingale is $$ \langle M \rangle_t = \sum_{s=0}^{t-1} (M_{s+1} - M_s)^2. $$ (The empty sum is understood to be zero, so $\langle M \rangle_0 = 0$.) We claim that $(M_t - M_0)^2 - \langle M \rangle_t$ is a martingale. To see this we write $M_{t+1} - M_0 = (M_{t+1} - M_t) + (M_t - M_0)$ and square it to get $$ (M_{t+1} - M_0)^2 = (M_t - M_0)^2 + 2(M_t - M_0)(M_{t+1} - M_t) + (M_{t+1} - M_t)^2. $$ Taking conditional expectations and using the martingale property, $$ \EE[(M_{t+1} - M_0)^2 \mid \fF_t] = (M_t - M_0)^2 + \EE[(M_{t+1} - M_t)^2 \mid \fF_t]. 
$$ (eq-sq_expand) By definition of quadratic variation, $\langle M \rangle_{t+1} = \langle M \rangle_t + (M_{t+1} - M_t)^2$, so $$ \EE[\langle M \rangle_{t+1} \mid \fF_t] = \langle M \rangle_t + \EE[(M_{t+1} - M_t)^2 \mid \fF_t]. $$ (eq-qv_expand) Subtracting {eq}`eq-qv_expand` from {eq}`eq-sq_expand`, we get $$ \EE[(M_{t+1} - M_0)^2 - \langle M \rangle_{t+1} \mid \fF_t] = (M_t - M_0)^2 - \langle M \rangle_t. $$ This is the martingale property for $(M_t - M_0)^2 - \langle M \rangle_t$. Since $(M_t)$ is bounded, this process is integrable for each $t$, and hence is a martingale. Since $\tau \wedge n \leq n$ is a bounded stopping time, the optional stopping theorem ({prf:ref}`t-ost`) gives $$ \EE[(M_{\tau \wedge n} - M_0)^2 - \langle M \rangle_{\tau \wedge n}] = \EE[(M_0 - M_0)^2 - \langle M \rangle_0] = 0, $$ and hence $$ \EE[(M_{\tau \wedge n} - M_0)^2] = \EE[\langle M \rangle_{\tau \wedge n}]. $$ (eq-ost) In addition, for $t < \tau$, we have $$ \EE[\langle M \rangle_{t+1} - \langle M \rangle_t \mid \fF_t] = \EE[(M_{t+1} - M_t)^2 \mid \fF_t] \geq \delta, $$ so $$ \EE[\langle M \rangle_{\tau \wedge n}] = \EE\left[\sum_{t=0}^{\tau \wedge n-1} (M_{t+1} - M_t)^2\right] \geq \delta \cdot \EE[\tau \wedge n]. $$ (eq-qv_lower) Combining {eq}`eq-qv_lower` and {eq}`eq-ost` yields $$ \EE[\tau \wedge n] \leq \frac{\EE[(M_{\tau \wedge n} - M_0)^2]}{\delta}. $$ Taking $n \to \infty$ and applying monotone convergence on the left and dominated convergence on the right (using boundedness of the martingale) gives the bound claimed in {prf:ref}`t-exit`. ◻ ``` ## Vector Spaces and Norms We humans have natural geometric intuition about the space $\RR^n$ when $n = 3$. 
If this intuition can be expressed algebraically, then $\RR^3$ results often extend to $\RR^n$ for arbitrary $n \in \NN$ -- and also to more general collections of objects, such as matrices, complex numbers and real-valued functions, provided that these collections are assigned some basic algebraic structure analogous to that enjoyed by vectors in $\RR^3$. Of course we need to formalize what "analogous" means by codifying the properties that we need the algebraic operations to satisfy. This leads to the concept of (abstract) vector space. In this section we recall the definition of such spaces and review key properties. (ss-absvec)= ### Vector Space We begin with linear algebraic properties in abstract sets that generalize the idea of adding and scalar multiplying vectors in $\RR^n$. Then we discuss properties of subsets of and maps over these abstract "vector spaces." (ss-avs)= #### Definition and Properties A **vector space** (also called a **linear space**) is a triple $(E, + , \cdot)$ where $E$ is a nonempty set, $+$ is a map from $E \times E$ to $E$ called **addition** and $\cdot$ is a map from $\RR \times E$ to $E$ called **scalar multiplication**, such that for all $u, v, w \in E$ and $\alpha, \beta \in \RR$, 1. $u + (v + w) = (u + v) + w$ 2. $u + v = v + u$ 3. there exists an element $0 \in E$, called the **origin**, s.t. $u + 0 = u$ for all $u \in E$ 4. for all $u \in E$, there exists a $v \in E$ such that $u + v = 0$ 5. $\alpha \cdot (\beta \cdot u) = (\alpha \cdot \beta) \cdot u$ 6. $1 \cdot u = u$ 7. $\alpha \cdot (u + v) = \alpha \cdot u + \alpha \cdot v$ 8. $(\alpha + \beta) \cdot u = \alpha \cdot u + \beta \cdot u$ In practice, the $\cdot$ symbol is usually omitted, so $\alpha u \coloneq \alpha \cdot u$. In the present context, the values $\alpha, \beta, \ldots$ are often called **scalars**. 
Also, the origin, which shares the symbol $0$ with the zero element from $\RR$, is sometimes referred to as the **additive identity**.[^4] ```{prf:example} The obvious example of a vector space is $\RR^n$ with the usual notions of addition and scalar multiplication. The origin in (iii) is the $n$-vector of zeros, while $v$ in (iv) is $-u$. All of the axioms are satisfied under this identification. (It would be highly surprising if this was not true, since $\RR^n$ is the model for the axioms.) ``` ```{prf:example} :label: eg-fvsp The set $\RR^\Xsf$ of real-valued functions on an arbitrary nonempty set $\Xsf$ is a real vector space when paired with the usual notions of addition and scalar multiplication of functions: for $f, g \in \RR^\Xsf$ and $\alpha \in \RR$, the functions $f+g$ and $\alpha f$ are defined by $$ (f + g)(x) \coloneq f(x) + g(x) \quad \text{and} \quad (\alpha f)(x) = \alpha f(x). $$ The zero element is $f \equiv 0$. Axioms (i)--(viii) are easily verified. ``` The vector space $\RR^n$ is a special case of {prf:ref}`eg-fvsp`, obtained when $\Xsf = \natset{n}$. (sss-convexity)= #### Convexity Given vector space $E$, set $C \subset E$ is called **convex** if $u, v \in C$ and $\alpha \in [0,1]$ implies $\alpha u + (1-\alpha) v \in C$. In other words, $C$ is closed under the taking of convex combinations. When $E$ is any vector space, a nonempty subset $C$ of $E$ is called a **cone** in $E$ if 1. $C$ is convex, 2. $x \in C$ and $-x \in C$ implies $x = 0$ and 3. $\alpha x \in C$ whenever $x \in C$ and $\alpha \geq 0$. (Some authors refer to $C$ as a "pointed convex cone.") #### Linear Maps and Subspaces Analogous to the case of $\RR^n$, a **linear subspace** of vector space $E$ is a set $S \subset E$ satisfying $$ \alpha, \beta \in \RR \text{ and } u, v \in S \; \implies \; \alpha u + \beta v \in S. 
$$ (eq-dplsvs) The proof of the next proposition is a useful exercise: ```{prf:proposition} :label: p-iovs If $(E, +, \cdot)$ is a vector space and $S$ is a linear subspace of $E$, then $(S, + , \cdot)$ is itself a vector space. ``` ```{prf:example} :label: eg-boundedvs Let $b\Xsf$ be the set of all bounded functions in $\RR^\Xsf$. This set is a linear subspace of $\RR^\Xsf$. Indeed, if $f$ and $g$ are bounded on $\Xsf$, then so is $\alpha f + \beta g$ for any scalars $\alpha$ and $\beta$, as follows from the triangle inequality. Now {prf:ref}`p-iovs` implies that $b\Xsf$ is a real vector space in its own right. ``` ```{prf:example} :label: eg-contvs Let $c\Xsf$ be the set of continuous functions from metric space $\Xsf$ to $\RR$. Condition {eq}`eq-dplsvs` holds for $c\Xsf$ when treated as a subset of $\RR^\Xsf$, since continuity is preserved under addition and scalar multiplication. Hence $c\Xsf$ is a linear subspace of $\RR^\Xsf$ and a vector space in its own right. ``` A **linear operator** from vector space $E$ into vector space $F$ is a map $A \colon E \to F$ satisfying $$ \alpha, \beta \in \RR \text{ and } u, v \in E \; \implies \; A(\alpha u + \beta v ) = \alpha A u + \beta A v. $$ (eq-linopvs) ```{prf:example} A matrix $A \in \RR^{n \times k}$ is a linear operator from $\RR^k$ to $\RR^n$ when identified with the map $x \mapsto Ax$. ``` It can in fact be shown that every linear operator from $\RR^k$ to $\RR^n$ can be represented by an $n \times k$ matrix. ```{prf:example} :label: eg-scountp Let $\Xsf$ be countable and consider the operator $P$ mapping $h \in b\Xsf$ into $Ph$ in $b\Xsf$ defined by $$ (Ph)(x) = \sum_{x' \in \Xsf} p(x, x') h(x'), $$ (eq-phxdc) where $p$ is nonnegative and obeys $\sum_{x' \in \Xsf} p(x,x')=1$ for all $x \in \Xsf$. ``` ```{exercise} :label: ex-math_foundations-auto-4 Show that $P$ (i) maps $b\Xsf$ to itself and (ii) is linear on $b\Xsf$.
``` The "kernel function" $p$ in {eq}`eq-phxdc` can be identified with a matrix in $\RR^{n \times n}$ when $|\Xsf|=n \in \NN$. No such identification exists when $|\Xsf|=\infty$. #### Bases and Dimension A **linear combination** of vectors $u_1,\ldots, u_k$ in $E$ is a vector of the form $\alpha_1 u_1 + \cdots + \alpha_k u_k$ where $\alpha_1,\ldots, \alpha_k$ are scalars. A set $S \subset E$ is called **linearly independent** if, for any finite set $\{u_1, \ldots, u_k\} \subset S$, we have $$ \alpha_1 u_1 + \cdots + \alpha_k u_k = 0 \text{ implies } \alpha_1 = \cdots = \alpha_k = 0. $$ A **basis** of a linear subspace $S$ of $E$ is a linearly independent subset $B$ of $S$ that spans $S$ (i.e., each $u \in S$ can be expressed as a finite linear combination of elements of $B$). ```{prf:theorem} :label: t-eqbase2 For a vector space $E$, the following statements are true: 1. $E$ has at least one basis. 2. If $E$ has a basis with $n$ elements, then every basis of $E$ has $n$ elements. 3. If $E$ has an infinite basis, then every basis is infinite. ``` A proof can be found in {cite}`jan`. In case (ii), we say that $E$ is **$n$-dimensional**. $E$ is called **finite-dimensional** if $E$ is $n$-dimensional for some $n \in \NN$. In case (iii), we call $E$ **infinite-dimensional**. ```{prf:example} If $\Xsf$ is finite, then $\RR^\Xsf$ is finite dimensional with dimension $|\Xsf|$. A basis is provided by the functions $f_i$ defined by $f_i(x) = \1\{x = i\}$. Any $g \in \RR^\Xsf$ can be expressed as a linear combination of these basis vectors via $g(x) = \sum_{i \in \Xsf} g(i) \1\{x = i\}$. ``` (s-fa)= ### Normed Vector Space In this section we recall basic definitions and properties concerning normed vector space and linear operators acting on such space. (sss-nonvecsp)= #### Norms on Vector Space Given vector space $E$, a map $\| \cdot \| \colon E \to \RR$ is called a **norm** on $E$ if, for any $\alpha \in \RR$ and any $u, v \in E$, 1. $\| u \| \geq 0$ (nonnegativity) 2. $\| u \| =0 \iff u=0$ (positive definiteness) 3. $\| \alpha u \| = |\alpha| \| u\|$ (positive homogeneity) and 4. $\| u + v \| \leq \| u \| + \| v \|$ (triangle inequality)
The pair $(E, \| \cdot \|) = ((E, +, \cdot), \| \cdot \|)$ is called a **normed vector space** (or **normed linear space**). When $\| \cdot \|$ is understood we refer to the space using the symbol $E$. ```{prf:example} Euclidean vector space is the canonical example: The mapping defined on $\RR^k$ by $\| u\| = \sqrt{\inner{u, u}}$ with $\inner{u, u} = \sum_{i=1}^k u_i^2$ is a norm on $\RR^k$. The triangle inequality can be proved via the Cauchy--Schwarz inequality. The objective of studying normed linear spaces, as defined above, is to extend this canonical example and leverage its analysis to more general settings. ``` Consider a normed vector space $(E, \| \cdot\|)$ with origin $0$. Recalling the definition of boundedness from metric spaces, one can show that a subset $S$ of $E$ is bounded if and only if there exists an $M \in \NN$ such that $\| u \| \leq M$ for all $u \in S$. If $(E, \| \cdot \|)$ is a normed linear space, then $(E, d)$ is a metric space when $d(u,v) \coloneq \| u - v \|$. All metric concepts extend to $(E, \| \cdot \|)$. For example, - $G \subset E$ is said to be open in $E$ if $G$ is open in $(E, d)$. - A sequence $(v_n)$ in $E$ is said to converge to $v \in E$ if $d(v_n, v) \to 0$. Let $E$ be a vector space and let $\| \cdot \|$ and $\| \cdot \|'$ be two norms on $E$. These norms are said to be **equivalent** if there exist finite positive constants $A, B$ such that $\|x\| \leq A \|x\|'$ and $\|x\|' \leq B \|x\|$ for all $x \in E$. The following result is fundamental. See, for example, {cite}`aliprantis1998principles`, Theorem 27.6. ```{prf:theorem} :label: t-eon If $E$ is finite-dimensional, then any two norms on $E$ are equivalent. ``` (sss-comip)= #### Completeness Completeness is essential to many important theorems in applied analysis.
Fortunately, the completeness of $\RR$ is inherited by many useful spaces. For example, ```{prf:theorem} :label: t-rncomplete Every finite-dimensional normed vector space is complete. ``` A proof can be found in, e.g., {cite}`aliprantis1998principles`, Theorem 27.6. A complete normed vector space is called a **Banach space**. There are many other important Banach spaces, beyond the finite-dimensional ones. ```{prf:example} :label: eg-bxbs Recall that in {prf:ref}`eg-bx`, we imposed a distance on $f, g$ in $b\Xsf$ via $$ d_\infty(f, g) \coloneq \|f - g \|_\infty \quad \text{where} \quad \| f \|_\infty \coloneq \sup_{x \in \Xsf} |f(x)| $$ The pair $(b\Xsf, \| \cdot \|_\infty)$ forms a Banach space. The completeness of this space is inherited from the completeness of $\RR$ (see, e.g., section 3.2 of {cite}`aliprantis2006border`). ``` ```{prf:example} :label: eg-bcxbs We previously discussed the fact that $bc\Xsf$ is a closed subset of $b\Xsf$, and that closed subsets of complete metric spaces are complete. Hence $(bc\Xsf, \| \cdot \|_\infty)$ forms a Banach space. ``` ```{prf:example} :label: eg-ellisban Following {prf:ref}`eg-ellp`, we define $$ \| h \|_p \coloneq \left\{ \sum_{x \in \Xsf} |h(x)|^p \right\}^{1/p} \quad \text{and} \quad \ell_p(\Xsf) \coloneq \left\{ h \in \RR^\Xsf \,:\, \| h \|_p < \infty \right\}. $$ The pair $(\ell_p(\Xsf), \| \cdot \|_p)$ is a Banach space. See {ref}`ss-lp` for more details. ``` (sss-compactvs)= #### Compactness Let $(E, \| \cdot \|)$ be a normed vector space. All equivalent metrics induce the same precompact sets and the same bounded sets in $E$. Since all norms on a finite-dimensional space are equivalent ({prf:ref}`t-eon`), any metric induced by a norm on a finite dimensional vector space has the property that its precompact and bounded sets coincide (cf., {prf:ref}`t-cies`). The next theorem states this fact for the record.
```{prf:theorem} :label: t-bw A subset of a finite-dimensional normed vector space is compact if and only if it is closed and bounded. ``` In line with our discussion in {ref}`sss-commet`, this one-to-one pairing between closed bounded sets and compact sets breaks down in infinite-dimensional spaces. In fact, the closed unit ball of a normed vector space $E$ is compact if and only if $E$ is finite-dimensional. (ss-lp)= #### $L_p$ Spaces Let $\mu$ be a $\sigma$-finite measure on measurable space $(\Xsf, \aA)$ and let $p \geq 1$. The space $L_p(\Xsf, \aA, \mu)$ consists of all Borel measurable functions $f \colon \Xsf \to \RR$ with $\int |f|^p \diff \mu$ finite. Functions that agree $\mu$-almost everywhere are identified. The functional $\|f\|_p \coloneq \left(\int |f|^p \diff \mu\right)^{1/p}$ is a norm on $L_p(\Xsf, \aA, \mu)$. ```{prf:theorem} :label: t-comlod The space $L_p(\Xsf, \aA, \mu)$ paired with the norm $\| \cdot \|_p$ is a Banach space. ``` **Scheffé's identity** provides a useful quantitative interpretation of $d_1$ distance between densities: For any densities $f$ and $g$ on $(\Xsf, \aA, \mu)$, we have $$ \|f - g\|_1 = 2 \times \sup_{B \in \aA} \left| \int_B f \diff \mu - \int_B g \diff \mu \right| $$ (eq-schi) Finally, **Scheffé's lemma** is useful for testing $L_1$ convergence: ```{prf:lemma} :label: l-scheffe If $(f_n)$ and $f$ are in $L_1(\Xsf, \aA, \mu)$ and $f_n \to f$ $\mu$-almost everywhere as $n \to \infty$, then $$ \int |f_n - f| \diff \mu \to 0 \text{ if and only if } \int |f_n| \diff \mu \to \int |f| \diff \mu. $$ ``` In the case where $f_n$ and $f$ are densities, we have $\int |f_n| \diff \mu = \int |f| \diff \mu = 1$ for all $n$, so Scheffé's lemma tells us that $f_n \to f$ in $L_1$ whenever $f_n \to f$ $\mu$-almost everywhere. (ss-pnsl)= ### Bounded Linear Operators If $E$ and $F$ are normed linear spaces and $A$ is a linear operator from $E$ to $F$, then the **operator norm** of $A$ is defined as $$ \| A \| \coloneq \sup_{\| u \| = 1} \| A u \|. $$ (eq-defsnif) (Here $\| u\|$ is the norm of $u$ in $E$ and $\|Au\|$ is the norm of $Au$ in $F$.)
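As a finite-dimensional illustration of {eq}`eq-defsnif`, the following sketch takes $E = F = \RR^3$ with the Euclidean norm, in which case the operator norm of a matrix equals its largest singular value. The matrix `A`, the sample size, and the helper name `sampled_operator_norm` are illustrative assumptions rather than objects from the text. Random sampling of the unit sphere only yields a lower bound on the supremum.

```python
import numpy as np

def sampled_operator_norm(A, num_samples=10_000, seed=0):
    """Lower bound on sup_{||u|| = 1} ||A u|| from random unit vectors."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((num_samples, A.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)   # rows now lie on the unit sphere
    return np.linalg.norm(U @ A.T, axis=1).max()    # largest sampled value of ||A u||

# Hypothetical matrix, chosen only for illustration
A = np.array([[0.5, 0.2, 0.0],
              [0.1, 0.4, 0.3],
              [0.0, 0.2, 0.6]])

exact = np.linalg.norm(A, 2)        # largest singular value of A
approx = sampled_operator_norm(A)   # sampled lower bound on the supremum
```

The sampled value approaches `exact` from below as the number of samples grows, which matches the interpretation of the operator norm as a supremum over the unit sphere.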
When $\| A\|$ is finite, $A$ is called a **bounded linear operator**. The set of all bounded linear operators from $E$ to $F$ will be denoted $\blop(E, F)$. If $E=F$ then we write $\blop(E)$. Every $A \in \blop(E, F)$ is continuous, since, for $u_n \to u$ in $E$ we have $$ \| Au_n - Au\| \leq \|A\| \|u_n - u\| \to 0. $$ The converse is also true: every continuous linear operator from $E$ to $F$ is bounded -- see §2.7 of {cite}`kreyszig1978introductory` for a proof of this fact, as well as {prf:ref}`t-fdimbl` below. ```{prf:theorem} :label: t-fdimbl If $E$ is finite-dimensional, then every linear operator from $E$ to $F$ is bounded. ``` As suggested by the name, the operator norm is a norm on $\blop(E, F)$. The details are left as an exercise. The operator norm is **submultiplicative**: If $A, B \in \blop(E)$, then $\|A B \| \coloneq \| A \circ B \| \leq \| A \| \cdot \| B \|$. Iteratively applying the submultiplicative property gives $\|A^i\| \leq \|A \|^i$ for any $i \in \NN$ and $A \in \blop(E)$, where $A^i$ is the $i$-th composition of $A$ with itself. Once we have a norm on $\blop(E, F)$, we have an induced metric given by $d(A, B) = \| A - B \|$, and $\blop(E, F)$ will be a Banach space whenever this metric is complete. ```{prf:theorem} If $F$ is a Banach space, then $\blop(E, F)$ is also a Banach space. ``` Let $E$ be a Banach space and let $A$ be an element of $\blop(E)$. A complex scalar $\lambda$ is called an **eigenvalue** of $A \in \blop(E)$ if there exists a nonzero vector $e$ such that $Ae = \lambda e$. The **spectrum** of $A$, typically denoted $\sigma(A)$, is the set of all scalars $\lambda$ such that $\lambda I - A$ fails to be bijective on $E$. Any eigenvalue $\lambda$ lies in $\sigma(A)$ because if $Ae = \lambda e$ for some nonzero $e$, then $\lambda I - A$ maps $e$ to $0$, while also mapping $0$ to $0$. Hence $\lambda I - A$ is not injective, and therefore not bijective.
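The submultiplicative property and the link between eigenvalues and the spectrum can be checked numerically in the matrix case. The sketch below assumes $E = \RR^2$ with the Euclidean norm; the matrices `A` and `B` and the helper `op_norm` are hypothetical choices made only for illustration.

```python
import numpy as np

def op_norm(M):
    """Operator norm induced by the Euclidean norm (largest singular value)."""
    return np.linalg.norm(M, 2)

# Hypothetical matrices, chosen only for illustration
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])
B = np.array([[0.3, 0.0],
              [0.7, 0.2]])

# Submultiplicativity: ||A B|| <= ||A|| ||B||, so this gap is nonnegative
gap_product = op_norm(A) * op_norm(B) - op_norm(A @ B)

# Iterating gives ||A^i|| <= ||A||^i, so each of these gaps is nonnegative
gap_powers = [op_norm(A)**i - op_norm(np.linalg.matrix_power(A, i))
              for i in range(1, 6)]

# At each eigenvalue lam, the matrix lam * I - A is singular (zero determinant),
# so it fails to be bijective and lam lies in the spectrum of A
eigenvalues = np.linalg.eigvals(A)
dets = [np.linalg.det(lam * np.eye(2) - A) for lam in eigenvalues]
```

In finite dimensions the spectrum consists only of eigenvalues, so the determinant test above captures all of $\sigma(A)$. This is not true for operators on general Banach spaces.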
For $A \in \blop(E)$, the **spectral radius** of $A$ is defined as $$ \rho(A) \coloneq \sup \setntn{|\lambda|}{ \lambda \in \sigma(A)} $$ (eq-dsr) It is well known (see, e.g., {cite}`kreyszig1978introductory`, §7.3) that 1. $\rho(A) \leq \| A \|$, where $\| \cdot \|$ is the operator norm, and 2. $\| A^k \|^{1/k} \to \rho(A)$ as $k \to \infty$ (**Gelfand's formula**). ```{exercise} :label: ex-igf Confirm the existence of a $k \in \NN$ with $\| A^k \| < 1$ when $\rho(A)<1$. ``` ```{exercise} :label: ex-odk Let $\phi$ be a probability measure on measurable space $(\Xsf, \aA)$ and let $\beta$ be a positive real number. Let $K$ be the linear operator on $L_1(\phi) \coloneq L_1(\Xsf, \aA, \phi)$ defined by $Kv = \beta \|v\| \1$ for all $v \in L_1(\phi)$. Prove that $\rho(K)=\beta$. ``` ```{solution} ex-odk Because $\beta$ is an eigenvalue of $K$ with eigenvector $\1$ (since $K \1 = \beta \1$), the definition of the spectral radius (see {eq}`eq-dsr`) implies that $\rho(K) \geq \beta$. At the same time, for any $v \in L_1(\phi)$ with $\| v \| = 1$ we have $\| Kv \| = \beta$, so $\| K \| = \beta$. Hence $\rho(K) \leq \| K \| = \beta$. We conclude that $\rho(K)=\beta$. ``` The following theorem is essential for many results in the book. ```{prf:theorem} :label: t-nslbs If $E$ is a Banach space, $A$ is an element of $\blop(E)$ and $\rho(A) < 1$, then $I - A$ is nonsingular and $$ (I - A)^{-1} = \sum_{j=0}^{\infty} A^j $$ ``` (The infinite sum is defined as the limit of the partial sums in $\blop(E)$. Hence the infinite sum exists if and only if the partial sums converge in $\blop(E)$.) ```{prf:proof} First observe that the sequence $B_n \coloneq \sum_{i=0}^n A^i$ is Cauchy when $\rho(A) < 1$. Indeed, using the operator norm, $$ \| B_k - B_{k + n} \| = \left\| \sum_{i = k+1}^{k+n} A^i \right\| \leq \sum_{i = k+1}^{k+n} \| A^i \| \leq \sum_{i > k} \| A^i \|. $$ The final term will converge to zero in $k$ if $\sum_{i=0}^{\infty} \| A^i \|$ is finite.
By the root test for convergence of series, this will be true whenever we have $\limsup_{i \to \infty} \| A^i \|^{1/i} < 1$. We know this is true by the hypothesis $\rho(A)<1$ and Gelfand's formula. Since $\blop(E)$ is complete under the operator norm, this Cauchy property implies that the limit $\sum_{i=0}^{\infty} A^i$ exists. Moreover, $(I - A) \sum_{i=0}^{\infty} A^i = I$, since $$ \left\| (I - A) \sum_{i=0}^{\infty} A^i - I \right\| = \lim_{n \to \infty} \left\| (I - A) \sum_{i=0}^n A^i - I \right\| = \lim_{n \to \infty} \left\| A^{n+1} \right\| $$ and the right hand side converges to zero by $\rho(A) < 1$ and Gelfand's formula. ◻ ``` An immediate consequence is global stability of affine maps with spectral radius less than one: ```{prf:corollary} :label: c-ibnl Fix $A \in \blop(E)$ and $b \in E$, where $E$ is a Banach space. If $\rho(A) < 1$, then the operator $T$ on $E$ defined by $Tv = Av + b$ is globally stable, with unique fixed point in $E$ given by $$ \bar v \coloneq (I - A)^{-1} b = \sum_{t \geq 0} A^t b. $$ ``` ```{exercise} :label: ex-affsop Consider the self-map on $\RR^k$ defined by $S v = r + A v$ where $r \in \RR^k$ and $A \in \RR^{k \times k}$. Show that $S$ is strongly order stable on $\RR^k$ when $A \geq 0$ and $\rho(A)<1$. ``` ```{solution} ex-affsop By the Neumann series lemma, $S$ has a unique fixed point in $\RR^k$ given by $\bar v \coloneq (I-A)^{-1} r$. Fix $v \in \RR^k$ with $v \leq S v$. Since $S$ is order preserving, the sequence $(S^n v)$ is increasing. The $n$-th element of this sequence is $S^n v = A^n v + \sum_{i=0}^{n-1} A^i r$. Since $\rho(A) < 1$, the pointwise limit of this sequence is $\sum_{i=0}^\infty A^i r = (I - A)^{-1} r = \bar v$. By {prf:ref}`l-pcid`, $\bar v$ is also the supremum of $(S^n v)$. This proves one direction of the definition of strong order stability. The proof of the other direction is similar. 
``` (s-order)= ## Order In this section we study order completeness and order continuity, ordered vector spaces and Riesz spaces, the interplay between topology and order including Banach lattices and weighted sup-norm spaces, Markov models, and orders over distributions. (ss-ococ)= ### Order Continuity and Order Completeness When studying the real line $\RR$, we can define completeness either as existence of limits for Cauchy sequences ({prf:ref}`t-compreal`) or as existence of suprema for bounded above sets ({prf:ref}`t-completeness`). The first idea can be extended to metric spaces by generalizing the concept of Cauchy sequences (see {ref}`ss-completeness`). The second can be extended to posets by analogy with existence of suprema. The aim of this section is to describe this second concept of completeness. #### Lattices and Chains When we discuss posets, there are multiple notions of completeness, with each one determined by the classes of sets that are required to have suprema. Specifically, a nonempty poset $V$ is called - a **lattice** if every finite subset of $V$ has both a supremum and an infimum in $V$, - **chain complete** if every chain in $V$ has a supremum and an infimum in $V$, and - **countably chain complete** if every at most countable chain in $V$ has a supremum and an infimum in $V$. Note that chain completeness implies countable chain completeness but not conversely, and that neither concept implies nor is implied by the lattice property. In the definitions above, finite sets are understood to be nonempty (i.e., in one-to-one correspondence with $\{1, \ldots, n\}$ for some $n \in \NN$). Also, "at most countable" means empty, finite or countable. The fact that the empty set is included has significance, as the next lemma illustrates. ```{prf:lemma} :label: l-obdccc2 If $V$ is countably chain complete, then $V$ is order bounded. ``` ```{prf:proof} This follows from {prf:ref}`ex-eocpl`, since $\varnothing$ is a chain in $V$.
◻ ``` ```{exercise} :label: ex-charcc Consider a poset $V$ where the next two conditions hold for any sequence $(v_n) \subset V$: 1. If $(v_n)$ is decreasing, then there exists a $v \in V$ with $v_n \downarrow v$. 2. If $(v_n)$ is increasing, then there exists a $v \in V$ with $v_n \uparrow v$. Prove that $V$ is countably chain complete whenever $V$ is order bounded. ``` ```{solution} ex-charcc Let $V$ be an order bounded poset satisfying the two convergence conditions in {prf:ref}`ex-charcc` and let $C$ be an at most countable chain in $V$. If $C = \emptyset$, then the supremum of $C$ is the least element of $V$ and the infimum is the greatest element ({prf:ref}`ex-eocpl`). In particular, $C$ has an infimum and a supremum in $V$. If $C$ is a nonempty countable chain in $V$, then we can enumerate it as either an increasing or a decreasing sequence in $V$. In either case, existence of an infimum or supremum follows from the two convergence conditions. Finally, if $C$ is a nonempty finite chain, then we can view it as an eventually constant sequence and the same logic applies. ``` ```{prf:example} :label: eg-rxcc2 If $V = [g, h]$ is all $f \in \RR^\Xsf$ with $g \leq f \leq h$, then $(V, \leq)$ is chain complete. To see this, let $C$ be a chain in $V$. If $C$ is nonempty, then $s(x) := \sup_{f \in C} f(x)$ and $i(x) := \inf_{f \in C} f(x)$ are well-defined elements of $V$, and also the supremum and infimum of $C$ in $V$ by {prf:ref}`ex-polim`. If $C$ is empty, then we take $\bigvee C = g$ and $\bigwedge C = h$, which are again the supremum and infimum of $C$ in $V$ (see {prf:ref}`ex-eocpl`). Either way, every chain in $V$ has a supremum and an infimum. ``` ```{exercise} :label: ex-math_foundations-auto-5 Give an example of a lattice that is not chain complete. ``` ```{solution} ex-math_foundations-auto-5 The set $\RR^n$ with the pointwise partial order is a lattice but not chain complete. ``` ```{exercise} :label: ex-ccdual Let $V$ be any poset.
Show that the dual poset $V^\partial$ is countably chain complete whenever $V$ is countably chain complete. ``` ```{solution} ex-ccdual We use the characterization of countable chain completeness from {prf:ref}`ex-charcc`. Assume the stated conditions. Clearly $V^\partial$ is order bounded, with top given by the bottom of $V$ and bottom given by the top of $V$. Also, if $(v_n)$ is an increasing sequence in $V^\partial$, then $(v_n)$ is decreasing in $V$. Since $V$ is countably chain complete, there exists a $v \in V$ with $v_n \downarrow v$. By {prf:ref}`ex-dualsi`, the infimum $v$ is the supremum of $(v_n)$ in $V^\partial$. In particular, there exists a $v$ with $v_n \uparrow v$ in $V^\partial$. A similar argument handles the case where $(v_n)$ is decreasing in $V^\partial$. We conclude that $V^\partial$ is countably chain complete. ``` The following result is a version of the Knaster--Tarski fixed point theorem for chain complete posets. In the statement, $V$ is a nonempty poset. ```{prf:theorem} :label: t-ccfp If $V$ is chain complete and $S$ is an order preserving self-map on $V$, then $S$ has a fixed point in $V$. ``` ```{prf:proof} See, for example, Theorems 8.11 and 8.22 of {cite}`davey2002introduction`. ◻ ``` ```{prf:lemma} :label: l-ocius Let $S$ be an order-preserving self-map on $V$. If $S$ has at most one fixed point in $V$ and $V$ is chain complete, then $S$ has a unique fixed point $\bar v$ in $V$ and 1. $v \in V$ with $v \preceq S \, v$ implies $v \preceq \bar v$, and 2. $v \in V$ with $S \, v \preceq v$ implies $\bar v \preceq v$. ``` ```{prf:proof} First suppose that $V$ is chain complete, with greatest element $\top$ and least element $\bot$. Fix $v \in V$ with $Sv \preceq v$. Since $I \coloneq [\bot, v]$ is itself chain complete, and since $S$ maps $I$ to itself and is order preserving, the Knaster--Tarski fixed point theorem implies that $S$ has a fixed point $\bar v$ in $I$. By assumption, $\bar v$ is the only fixed point of $S$ in $V$.
Moreover, $\bar v \in I$, so $\bar v \preceq v$. This proves (ii) in the lemma. A similar argument proves (i). ◻ ``` A **sublattice** of a lattice $V$ is a subset $S$ of $V$ with the property that $u \vee v$ and $u \wedge v$ are in $S$ whenever $u, v \in S$. ```{exercise} :label: ex-bcsl Let $\Xsf$ be a metric space and let $bc\Xsf$ be the bounded continuous functions from $\Xsf$ to $\RR$. Prove that $bc\Xsf$ is a sublattice of $\RR^\Xsf$. ``` (sss-dede)= #### Dedekind Completeness Consider the canonical partially ordered set $(\RR^k, \leq)$. This set is not countably chain complete: for example, letting $\1$ be a vector of ones, the increasing sequence $(v_n) = (n \1)$ has no supremum. At the same time, $(\RR^k, \leq)$ certainly has some completeness properties. For example, it follows easily from {prf:ref}`ex-polim` that every bounded above subset of $\RR^k$ has a supremum, and every bounded below subset of $\RR^k$ has an infimum. This motivates the following definitions: A partially ordered set $V$ is called **Dedekind complete** if, for any nonempty $A \subset V$, 1. $A$ is bounded above $\implies$ $A$ has a supremum in $V$ and 2. $A$ is bounded below $\implies$ $A$ has an infimum in $V$. $V$ is called **countably Dedekind complete** if, for any nonempty finite or countable $A \subset V$, 1. $A$ is bounded above $\implies$ $A$ has a supremum in $V$ and 2. $A$ is bounded below $\implies$ $A$ has an infimum in $V$. ```{prf:example} :label: eg-rxcc If $\Xsf$ is any nonempty set, then $(\RR^\Xsf, \leq)$ is Dedekind complete. Indeed, if $G \subset \RR^\Xsf$ is nonempty and bounded above, then $s(x) = \sup_{g \in G} g(x)$ exists in $\RR$ at each $x \in \Xsf$. Hence, by {prf:ref}`ex-polim`, the supremum $\bigvee G$ exists in $\RR^\Xsf$ (and equals $s$). A similar argument shows that any nonempty bounded below subset of $\RR^\Xsf$ has an infimum. ``` There are natural connections between Dedekind (resp., countable Dedekind) completeness and chain (resp. 
countable chain) completeness. Here is one simple result. ```{prf:lemma} :label: l-obdcccd Let $V$ be any poset. - If $I = [a, b] \subset V$ and $V$ is Dedekind complete, then $(I, \preceq)$ is chain complete. - If $I = [a, b] \subset V$ and $V$ is countably Dedekind complete, then $(I, \preceq)$ is countably chain complete. ``` ```{prf:proof} For part (i), let $I = [a, b]$, $V$ be as stated and let $A$ be a subset of $I$. On one hand, if $A$ is nonempty, then, by Dedekind completeness, $s \coloneq \bigvee A$ exists in $V$. Since $a \preceq s \preceq b$, we have $s \in I$, and $s$ is the supremum of $A$ in $(I, \preceq)$. A similar argument shows that $A$ has an infimum in $(I, \preceq)$. On the other hand, if $A = \varnothing$, then $a$ is an upper bound of $\varnothing$ (vacuously -- see {prf:ref}`ex-eocpl`) and $a \preceq v$ for every upper bound $v$ of $\varnothing$ in $I$ (in fact for every $v \in I$). Hence $a$ is the supremum of $A$ in $(I, \preceq)$. A similar argument shows that $b$ is the infimum of $\varnothing$ in $(I, \preceq)$. The proof of part (ii) is very similar to the proof of part (i). ◻ ``` ```{exercise} :label: ex-cdcdual Let $V$ be any poset. Show that $V^\partial$ is countably Dedekind complete whenever $V$ is countably Dedekind complete. ``` ```{solution} ex-cdcdual Let $V$ be countably Dedekind complete and let $A$ be finite or countable and, for the partial order $\preceq^\partial$, bounded above by some $b \in V$. This means that $a \preceq^\partial b$ for all $a \in A$, so $b \preceq a$ for all $a \in A$. In other words, $b$ is a lower bound for $A$ in $V$. Thus, by countable Dedekind completeness, the infimum of $A$ exists in $V$. Applying {prf:ref}`ex-dualsi`, we see that the supremum of $A$ exists in $V^\partial$. The proof for the bounded below case is similar. 
``` (ss-ordercon)= #### Order Continuity We call a map $S$ from poset $V$ to poset $W$ **order continuous** on $V$ if $$ S v_n \uparrow S v \quad \text{whenever } v_n \uparrow v. $$ In other words, if $(v_n) \subset V$ with $v_n \uparrow v \in V$, then $\bigvee_n S v_n$ exists in $W$ and equals $Sv$. ```{prf:remark} The definition of order continuity varies across subfields of mathematics. The notion we use here is relatively weak but is all we will need for our analysis. (In some sources, what we call order continuity is referred to as $\sigma$-order continuity, or countable order continuity. Since we use no other notions of order continuity, we maintain the simpler name.) ``` In the next lemma, $V$ and $W$ are arbitrary posets. ```{prf:lemma} :label: l-cciop Every order continuous map from $V$ to $W$ is order preserving. ``` ```{prf:proof} Let $S$ be an order continuous map from $V$ to $W$. Fix $v, v' \in V$ with $v \preceq v'$. Let $(v_n)$ be such that $v_1 = v$ and $v_n = v'$ for all $n > 1$. Evidently $\bigvee_n v_n = v'$. Since $S$ is order continuous, the supremum $\bigvee S v_n$ exists in $W$ and equals $Sv'$. This tells us that the supremum of $\{Sv, Sv'\}$ is $S v'$. Hence $Sv \preceq Sv'$ and $S$ is order preserving. ◻ ``` Next we state a variation on the Tarski--Kantorovich fixed point theorem. ```{prf:theorem} :label: t-tk Let $S$ be an order continuous self-map on $V$. If $V$ is countably Dedekind complete and there exist elements $v_a \preceq v_b$ in $V$ with $v_a \preceq S v_a$ and $S v_b \preceq v_b$, then there exists a $\bar v \in V$ such that $S^n v_a \uparrow \bar v$ and, moreover, $S \bar v = \bar v$. ``` ```{prf:proof} Let $S, V$ be as stated. Fix $v_a \preceq v_b$ in $V$ with $v_a \preceq S v_a$ and $S v_b \preceq v_b$. The map $S$ is order continuous and hence order preserving, so the sequence $(v_n) \coloneq (S^n v_a)$ is increasing.
As the set $V$ is countably Dedekind complete and the sequence is bounded above by $v_b$, the suprema $\bigvee_{n \geq 1} v_n$ and $\bigvee_{n \geq 1} S v_n$ exist in $V$. If $\bar v \coloneq \bigvee_n v_n$, then, by order continuity, $S \bar v = S \bigvee_{n \geq 1} v_n = \bigvee_{n \geq 1} S v_n = \bigvee_{n \geq 2} v_n = \bar v$. Hence $S \bar v = \bar v$. We have also shown that $S^n v_a \uparrow \bar v$. ◻ ``` Here's a more standard version of the Tarski--Kantorovich fixed point theorem. ```{prf:theorem} :label: t-tk2 If $S$ is an order continuous self-map on $V$ and $V$ is countably chain complete, then $S$ has a fixed point in $V$. ``` ```{prf:proof} By countable chain completeness, $V$ has a least element $v_a$ and greatest element $v_b$. For these elements we have $v_a \preceq S v_a$ and $S v_b \preceq v_b$. An essentially identical argument to the one in the proof of {prf:ref}`t-tk` shows that $(S^n v_a)$ has a supremum and that supremum is a fixed point of $S$. ◻ ``` The next lemma is analogous to {prf:ref}`l-ocius`. ```{prf:lemma} :label: l-ocius2 Let $S$ be a self-map on countably chain complete poset $V$. If $S$ has at most one fixed point in $V$ and $S$ is order continuous, then $S$ has a unique fixed point $\bar v$ in $V$ and 1. $v \in V$ with $v \preceq S \, v$ implies $v \preceq \bar v$, and 2. $v \in V$ with $S \, v \preceq v$ implies $\bar v \preceq v$. ``` ```{prf:proof} The proof is similar to that of {prf:ref}`l-ocius`, with the only significant difference being that the Tarski--Kantorovich theorem ({prf:ref}`t-tk2`) is used to obtain the fixed point. ◻ ``` (ss-ovs)= ### Ordered Vector Space Next we add algebraic structure to posets. The combination of algebraic operations and order will allow us to develop sharp sufficient conditions for dynamic programs and convergence of algorithms. (sss-ovs)= #### Definition and Properties Let $E = (E, +, \cdot)$ be a vector space with origin $0$ (see {ref}`ss-absvec`) and let $\leq$ be a partial order on $E$.
We call $(E, \leq)$ an **ordered vector space** if the order is preserved under addition and nonnegative scalar multiplication; that is, if 1. $u \leq v$ implies $u + b \leq v + b$ for any $b \in E$, and 2. $u \leq v$ and $\alpha \in \RR$ with $0 \leq \alpha$ implies $\alpha u \leq \alpha v$. The **positive cone** of $E$, typically denoted by $E_+$, is all $v \in E$ with $0 \leq v$. ```{prf:example} :label: eg-rnovs $\RR^n$ is an ordered vector space under the pointwise order $\leq$, with positive cone equal to the set of nonnegative vectors in $\RR^n$. ``` ```{exercise} :label: ex-math_foundations-auto-6 Let $\mathcal S$ be the vector space of all symmetric $n \times n$ matrices (with addition and scalar multiplication defined in the obvious way) and let $\nN$ be the negative semidefinite matrices in $\mathcal S$. As in {ref}`sss-lqasadp`, we impose the Loewner partial order, writing $A \preccurlyeq B$ when $A-B \in \nN$. Show that $(\mathcal S, \preccurlyeq)$ is an ordered vector space. ``` ```{exercise} :label: ex-rxovs Let $\Xsf$ be any nonempty set and let $\RR^\Xsf$ be the vector space of real-valued functions on $\Xsf$. Let $\leq$ be the pointwise partial order. Show that $(\RR^\Xsf, \leq)$ is an ordered vector space. ``` If $(E, \leq)$ is an ordered vector space and $u, v, w \in E$, then 1. $u \leq 0$ and $v \leq 0$ implies $u + v \leq 0$, 2. $u \leq v$ implies $-v \leq -u$, 3. $(u \vee v) + w = (u + w) \vee (v + w)$, and 4. $\alpha (u \vee v) = (\alpha u) \vee (\alpha v)$ whenever $\alpha \geq 0$. These facts follow directly from the definitions. Using the definition in {ref}`sss-infsuppo`, if $(v_n)$ is a sequence in ordered vector space $E$ and $v \in E$, then the statement $v_n \uparrow v$ means that $(v_n)$ is increasing and $\bigvee_n v_n = v$. ```{exercise} :label: ex-imroc Prove that 1. if $u_n \uparrow 0$ and $v_n \uparrow 0$, then $u_n + v_n \uparrow 0$, and 2. if $u_n \uparrow u$ and $b \in E$, then $u_n + b \uparrow u + b$. 
``` ```{solution} ex-imroc Suppose $u_n \uparrow 0$ and $v_n \uparrow 0$. Let $U$ be the set of upper bounds of $(u_n + v_n)$. Since $u_n \leq 0$ and $v_n \leq 0$ for all $n$ we see that $0 \in U$. Fixing any $w \in U$, monotonicity of the sequences gives $u_n + v_m \leq w$ for all $n, m$, from which we obtain $u_n \leq w - v_m$ for all $n, m$ and hence $0 \leq w - v_m$ (because $0$ is the supremum of $(u_n)$). Rearranging gives $v_m \leq w$ for all $m$ and hence $0 \leq w$. This proves that $0$ is a least element of $U$, so $0$ is the supremum of $(u_n + v_n)$. Regarding the second claim, suppose $u_n \uparrow u$ and fix $b \in E$. Let $U$ be the set of upper bounds of $(u_n + b)$. Since $u_n \leq u$ for all $n$ we see that $u + b \in U$. If $w \in U$, then $u_n \leq w - b$ for all $n$, so $u \leq w - b$, or $u + b \leq w$. This proves that $u + b$ is a least element of $U$, so $u+b$ is the supremum of $(u_n + b)$. ``` ```{prf:lemma} :label: l-ocproper Let $(u_n)$ and $(v_n)$ be sequences in ordered vector space $E$ and let $\alpha, \beta$ be nonnegative constants. The following implications hold: 1. If $u_n \uparrow u$ and $v_n \uparrow v$, then $\alpha u_n + \beta v_n \uparrow \alpha u + \beta v$. 2. If $u_n \uparrow u$, then $-u_n \downarrow -u$. ``` ```{prf:proof} These claims follow from Theorem 10.2 of {cite}`zaanen2012introduction`. They can also be obtained by applying and extending the results in {prf:ref}`ex-imroc`. ◻ ``` In some settings, a partial order is introduced into a vector space $E$ by first choosing a (pointed convex) cone $C$ on $E$ (see {ref}`sss-convexity`) and stating that $u \leq v$ if and only if $v - u \in C$. The following discussion clarifies this idea. ```{exercise} :label: ex-math_foundations-auto-7 With $\leq$ defined as above, show that $(E, \leq)$ is an ordered vector space and that $C$ is the positive cone of $(E, \leq)$. 
``` ```{exercise} :label: ex-math_foundations-auto-8 Continuing the previous exercise, show that if $E$ is a normed linear space and $C$ is closed in $E$, then $\leq$ is a closed partial order. ``` ```{exercise} :label: ex-math_foundations-auto-9 Show conversely that if $(E, \leq)$ is an ordered vector space, then the positive cone in $E$ is a (pointed convex) cone. ``` (sss-posops)= #### Operators on Ordered Vector Space A linear operator $T$ mapping ordered vector space $E$ to itself is called **positive** if $T$ is invariant on the positive cone; that is, if $u \in E$ and $u \geq 0$ implies $Tu \geq 0$. ```{prf:example} Let $\leq$ be the pointwise order on $\RR^n$ and let $A$ be an $n \times n$ matrix. We identify $A$ with the linear operator $\RR^n \ni x \mapsto Ax \in \RR^n$. This operator is positive if and only if all elements of $A$ are nonnegative. ``` (In the canonical example given above, *positive* operators are identified with *nonnegative* matrices. Unfortunately, this notational inconsistency is deeply embedded in the existing literature so we must accept it.) ```{exercise} :label: ex-math_foundations-auto-10 Prove that a linear operator mapping $E$ to itself is positive if and only if it is order preserving. ``` Let $E$ be an ordered vector space and let $A \colon E \to E$ be a linear operator. Recalling the definition in {ref}`ss-ordercon`, $A$ is order continuous on $E$ when $(v_n) \subset E$ and $v_n \uparrow v \in E$ implies $Av_n \uparrow Av$. By {prf:ref}`l-cciop`, every order continuous linear operator is order preserving -- and hence positive. The next exercise can be completed using {prf:ref}`l-ocproper`. ```{exercise} :label: ex-looce Show that, for a positive linear operator $A \colon E \to E$, the following statements are equivalent: 1. $A$ is order continuous on $E$. 2. $Av_n \downarrow Av$ whenever $(v_n) \subset E$ and $v_n \downarrow v \in E$. 3. $Av_n \downarrow 0$ whenever $(v_n) \subset E$ and $v_n \downarrow 0$.
``` ```{solution} ex-looce We prove that (iii) implies (i) and leave other details to the reader. Let $A$ be a positive linear operator mapping $E \to E$. Fix $(v_n) \subset E$ with $v_n \uparrow v \in E$. Using results from {prf:ref}`l-ocproper` we have $v - v_n \downarrow 0$ and hence, by (iii), $A (v - v_n) \downarrow 0$. Using linearity and results from {prf:ref}`l-ocproper` again, we obtain $A v_n \uparrow A v$. Hence (i) holds. ``` A self-map $S$ on a convex subset $C$ of ordered vector space $E \coloneq (E, \leq)$ is called **convex** on $C$ if $$ S(\lambda v + (1-\lambda) v') \leq \lambda Sv + (1-\lambda) Sv' \text{ whenever } v \leq v' \in C \text{ and } 0\leq \lambda \leq 1 $$ The map $S$ is called **concave** on $C$ if $$ \lambda Sv + (1-\lambda) Sv' \leq S(\lambda v + (1-\lambda) v') \text{ whenever } v \leq v' \in C \text{ and } 0\leq \lambda \leq 1 $$ ```{exercise} :label: ex-math_foundations-auto-11 Let $(\Xsf, \aA)$ be a measurable space and let $C = [0, b]$ be an order interval of $b\Xsf$. Suppose that $S v = u + f(K v)$ maps $C$ to itself, where $u \in b\Xsf$, $K$ is a linear operator on $b\Xsf$ and $f \colon \RR_+ \to \RR_+$ is concave. The function $f$ is applied pointwise, so that $f(Kv)(x) = f((Kv)(x))$. Prove that $S$ is concave on $C$. ``` (sss-riesz)= #### Riesz Space Next we introduce Riesz spaces, which are ordered vector spaces with lattice structure. This structure allows for the introduction of a notion of absolute value, which behaves similarly to the pointwise absolute value over vectors in $\RR^n$. Absolute value in turn helps us clarify and quantify the actions of operators, providing new opportunities for establishing optimality conditions in dynamic programs. An ordered vector space $E$ is called a **Riesz space** if $E$ is a lattice. With $\vee$ and $\wedge$ as the lattice operations and $u, v, w \in E$, the following properties always hold: 1. $u \wedge v = - ((-u) \vee (-v))$ and $u \vee v = - ((-u) \wedge (-v))$. 2.
$(u \wedge v) + w = (u + w) \wedge (v + w)$ and $(u \vee v) + w = (u + w) \vee (v + w)$. These facts can be easily verified and other related results are found in Chapter 2 of {cite}`zaanen2012introduction`. ```{exercise} :label: ex-math_foundations-auto-12 Show that if $E$ is an ordered vector space and $E$ is closed under $\vee$ (i.e., $u, v \in E$ implies $u \vee v \in E$), then $E$ is a Riesz space. ``` ```{prf:example} :label: eg-rxddc If $\Xsf$ is any nonempty set and $\leq$ is the pointwise order, then $(\RR^\Xsf, \leq)$ is a Riesz space. Indeed, $(\RR^\Xsf, \leq)$ is an ordered vector space ({prf:ref}`ex-rxovs`) and, given $f, g \in \RR^\Xsf$, the pointwise maximum $x \mapsto \max\{f(x) , g(x)\}$ is the supremum $f \vee g$ of $\{f, g\}$. This can be checked directly or by referring back to {prf:ref}`ex-polim`. ``` ```{prf:lemma} :label: l-lsrs Let $V$ be a linear subspace of a Riesz space $E = (E, \leq)$. If $V$ is a sublattice of $E$, then $(V, \leq)$ is a Riesz space. ``` ```{prf:proof} It is clear that any linear subspace of an ordered vector space is again an ordered vector space. Moreover, since $V$ is a sublattice of $(E, \leq)$, it follows that $(V, \leq)$ is itself a lattice. Thus, $V = (V, \leq)$ is a Riesz space. ◻ ``` ```{exercise} :label: ex-math_foundations-auto-13 Provide a counterexample to the claim that every linear subspace $L$ of a Riesz space $(E, \leq)$ is a Riesz space under $\leq$. ``` ```{solution} ex-math_foundations-auto-13 The set of differentiable functions in $bc(0,1)$, the bounded continuous functions on $(0, 1)$, is a linear subspace of $bc(0,1)$ but not a lattice -- and hence not a Riesz space. ``` For element $u$ of any Riesz space $(E, \leq)$ we use the notation $$ |u| \coloneq u \vee (-u), \quad u^+ \coloneq u \vee 0 \quad \text{and} \quad u^- \coloneq (-u) \vee 0. $$ These points in $E$ are called the **absolute value**, **positive part**, and **negative part** of $u$ respectively. One easily shows that $|-u| = |u|$. 
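For the Riesz space $(\RR^n, \leq)$ with the pointwise order, these definitions reduce to familiar elementwise operations. The following sketch uses NumPy, with the join $u \vee v$ implemented as the elementwise maximum. The helper names and the test vector are hypothetical and serve only to illustrate the definitions.

```python
import numpy as np

def join(u, v):
    """Supremum u ∨ v in R^n under the pointwise order."""
    return np.maximum(u, v)

def abs_value(u):
    """Absolute value |u| = u ∨ (-u)."""
    return join(u, -u)

def pos_part(u):
    """Positive part u^+ = u ∨ 0."""
    return join(u, np.zeros_like(u))

def neg_part(u):
    """Negative part u^- = (-u) ∨ 0."""
    return join(-u, np.zeros_like(u))

# Hypothetical vector, chosen only for illustration
u = np.array([1.5, -2.0, 0.0, 3.0])

same_as_numpy_abs = np.array_equal(abs_value(u), np.abs(u))   # lattice |u| vs elementwise |u|
symmetric = np.array_equal(abs_value(-u), abs_value(u))       # |-u| = |u|
```

In this setting the lattice absolute value coincides with the ordinary elementwise absolute value, which is the sense in which the abstract definition generalizes the Euclidean case.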
Also, ```{prf:lemma} :label: l-poavrs For any $u, v \in E$ we have 1. $u = u^+ - u^-$ and $|u| = u^+ + u^-$, 2. $|u| = 0$ if and only if $u=0$, 3. $|u| + |v| = |u + v| \vee |u - v|$, and 4. if $u \in E_+\,$, then $|v| \leq u \iff v \in [-u, u]$. ``` ```{prf:proof} For (i), since $(u \vee v) + w = (u+w) \vee (v+w)$ holds in any ordered vector space, we have $u^+ - u = (u \vee 0) - u = 0 \vee (-u) = u^-$, giving the first equality. For the second, $u \vee (-u) = u \vee (-u) + u - u = (2u \vee 0) - u = 2u^+ - u = 2 u^+ - u^+ + u^- = u^+ + u^-$. For (iii) we refer to Theorem 5.3 of {cite}`zaanen2012introduction`. Regarding (iv), we have $$ |v| \leq u \; \iff \; v \vee (-v) \leq u \; \iff \; v \leq u \text{ and } -v \leq u \; \iff \; -u \leq v \leq u. $$ ◻ ``` Notice that (iii) implies the triangle inequality $|u+v| \leq |u| + |v|$. ```{exercise} :label: ex-ploa Prove that if $K$ is a positive linear operator on Riesz space $E$, then $|Ku| \leq K |u|$ for all $u \in E$. ``` ```{solution} ex-ploa Let $K$ be as described and fix $u \in E$. We use linearity, positivity, and {prf:ref}`l-poavrs` to obtain $$ |K u| = |K(u^+ - u^-)| = |Ku^+ - Ku^-| \leq |Ku^+| + |Ku^-| = Ku^+ + Ku^- = K(u^+ + u^-) = K|u|. $$ ``` We will make use of the following lemma: ```{prf:lemma} :label: l-supineq If $E$ is a Riesz space and $(v_\alpha), (w_\alpha)$ are subsets of $E$, then $$ \left| \vee_\alpha \, v_\alpha - \vee_\alpha \, w_\alpha \right| \leq \vee_\alpha \, \left| v_\alpha - w_\alpha \right| $$ whenever the suprema exist. ``` ```{prf:proof} We have $v_\alpha = v_\alpha - w_\alpha + w_\alpha \leq |v_\alpha-w_\alpha| + w_\alpha$ and hence $\vee_\alpha v_\alpha \leq \vee_\alpha |v_\alpha-w_\alpha| + \vee_\alpha w_\alpha$. Rearranging gives one side of the inequality in {prf:ref}`l-supineq`. The other side is obtained by reversing the roles of $v_\alpha$ and $w_\alpha$. ◻ ``` Here's an obvious corollary when $E = \RR$. ```{prf:corollary} :label: c-supineq Let $\Xsf$ be any set. 
If $f, g \in \RR^\Xsf$ are both bounded above, then $$ |\sup f - \sup g| \leq \sup |f-g|. $$ ``` (sss-sfmsr)= #### Riesz Spaces of Measurable Functions Let $(\Xsf, \aA, \mu)$ be a $\sigma$-finite measure space. As usual, we let - $m\Xsf$ be the real-valued Borel measurable functions on $(\Xsf, \aA)$ and - $b\Xsf$ be the bounded functions in $m\Xsf$. The vector spaces $m\Xsf$ and $b\Xsf$ are both Riesz spaces when paired with the pointwise partial order $\leq$, with $b\Xsf$ a subset of $m\Xsf$. ```{exercise} :label: ex-math_foundations-auto-14 Show that $m\Xsf$ is a Riesz space. Next, show that $b\Xsf$ is a Riesz space using {prf:ref}`l-lsrs`. ``` (sss-lpipo)= #### Almost Everywhere Pointwise Order Let $(\Xsf, \aA, \mu)$ be as in the previous section and fix $p \in [1, \infty)$. Let $L_p \coloneq L_p(\Xsf, \aA, \mu)$ be the Banach space of equivalence classes defined in {ref}`ss-lp`. Let $\leq$ be defined by $f \leq g$ if and only if $\setntn{x \in \Xsf}{f(x) > g(x)}$ has $\mu$-measure zero. ```{exercise} :label: ex-math_foundations-auto-15 The relation $\leq$ introduced above can be stated more formally as follows: given equivalence classes $f, g$ in $L_p$ we write $f \leq g$ if, for any functions $f_0 \in f$ and $g_0 \in g$, the set $\setntn{x \in \Xsf}{f_0(x) > g_0(x)}$ has $\mu$-measure zero. Using this definition, show that $L_p$ is partially ordered under $\leq$. ``` The space $(L_p, \leq)$ just described is a Riesz space. For example, if $f, g \in L_p$, then $|f \vee g| \leq |f| + |g|$ and $|f| + |g| \in L_p$, so $\int |f \vee g|^p \diff \mu \leq \int (|f| + |g|)^p \diff \mu < \infty$. Hence $f \vee g \in L_p$. (sss-ssmf)= #### Dedekind Completeness of Riesz Spaces Since each Riesz space is a partially ordered space, the notions of Dedekind and countable Dedekind completeness apply directly. Moreover, when testing these forms of completeness, one-sided conditions suffice. The following one-sided condition is particularly simple.
```{prf:lemma} :label: l-oscon A Riesz space $E$ is countably Dedekind complete if and only if every bounded increasing sequence in $E$ has a supremum. ``` For a proof of {prf:ref}`l-oscon`, see Theorem 12.1 of {cite}`zaanen2012introduction`. We will make repeated use of the following fact. ```{prf:lemma} :label: l-lpdc The Riesz space $L_p(\Xsf, \aA, \mu)$ from {ref}`sss-lpipo` is Dedekind complete. ``` A proof of {prf:ref}`l-lpdc` can be found in Example 12.5 of {cite}`zaanen2012introduction`. Several interesting function spaces are naturally ordered by the pointwise partial order. Next we study the completeness properties of such Riesz spaces. We will make use of the following lemma, in the statement of which, for $(v_n) \subset \RR^\Xsf$, the symbol $\sup_n v_n$ indicates the pointwise supremum. ```{prf:lemma} :label: l-sdcms Let $\Xsf$ be any nonempty set and let $\leq$ be the pointwise partial order on $\RR^\Xsf$. Let $E = (E, \leq)$ be a Riesz space contained in $\RR^\Xsf$. If $\sup_n v_n$ is in $E$ whenever $(v_n) \subset E$ is increasing and bounded above, then $E$ is countably Dedekind complete. ``` ```{prf:proof} Let $E$ be as above and let $(v_n)$ be a sequence in $E$ that is increasing and bounded above. By assumption, $s \coloneq \sup_n v_n$ exists in $E$. By {prf:ref}`ex-polim` we have $\bigvee_n v_n = s$. In view of {prf:ref}`l-oscon`, the space $E$ is countably Dedekind complete. ◻ ``` Now let $(\Xsf, \aA, \mu)$ be a $\sigma$-finite measure space and let $m\Xsf$ and $b\Xsf$ be the Riesz spaces discussed in {ref}`sss-sfmsr`. As above, let $\leq$ be the pointwise partial order. ```{prf:corollary} :label: c-sdcms The spaces $m\Xsf$ and $b\Xsf$ are countably Dedekind complete under $\leq$. ``` ```{prf:proof} Consider the poset $m\Xsf$. Let $(v_n)$ be increasing and bounded above in $m\Xsf$. In view of {prf:ref}`l-sdcms` we need only show that $s \coloneq \sup_n v_n$ is in $m\Xsf$.
This follows from existence of suprema in $\RR$ when subsets are bounded above (so $s$ is real-valued) and {prf:ref}`l-limbm`, which implies measurability. Next consider the poset $b\Xsf$. Let $(v_n) \subset b\Xsf$ be increasing and bounded above by $w \in b\Xsf$. By the same argument as the last paragraph, we have $s \coloneq \sup_n v_n \in m\Xsf$. Moreover, $v_1 \leq s \leq w$ with $v_1, w \in b\Xsf$. Hence $s \in b\Xsf$. The claim now follows from {prf:ref}`l-sdcms`. ◻ ``` (ss-bl)= ### Topology and Order In some applications it will be helpful to draw on results that use topological or metric structure. In this section we note some elementary facts about topological, metric and normed spaces where order is also present. (ss-pospace)= #### Partially Ordered Space A partial order $\preceq$ on topological space $V$ is called **closed** if, given any two nets $(u_\alpha)_{\alpha \in \Lambda}$ and $(v_\alpha)_{\alpha \in \Lambda}$ contained in $V$, $$ u_\alpha \to u, \;\; v_\alpha \to v \; \text{ and } \; u_\alpha \preceq v_\alpha \text{ for all } \alpha \in \Lambda \quad \implies \; u \preceq v. $$ (eq-cpor) A **partially ordered space**, also called a **pospace**, is a Hausdorff topological space endowed with a closed partial order. (We make the Hausdorff assumption so that sequences have unique limits.) ```{prf:example} One canonical pospace is $\RR^n$ paired with the product topology and the pointwise partial order $\leq$. The pointwise order $\leq$ is closed because convergence in the product topology implies pointwise convergence ({ref}`sss-prodtop`), and limits in $\RR$ preserve the usual real-valued order $\leq$. ``` The next lemma connects topological and order convergence in partially ordered space $V = (V, \preceq)$. In the statement, $(v_\alpha)_{\alpha \in \Lambda}$ is a net in $V$. ```{prf:lemma} :label: l-posmcoc If $v \in V$ with $v_\alpha \to v$ and $v_\alpha \preceq v$ for all $\alpha \in \Lambda$, then $\bigvee_\alpha v_\alpha = v$. 
``` ```{prf:proof} Let $(v_\alpha)$ and $v$ be as stated. By assumption, $v$ is an upper bound of $(v_\alpha)$. If $w$ is any other upper bound, then $v_\alpha \preceq w$ for all $\alpha$. Since $\preceq$ is closed and $v_\alpha \to v$, this implies $v \preceq w$. Hence $v$ is the least upper bound of $(v_\alpha)$. ◻ ``` The next lemma shows how global stability (see {ref}`sss-stames`) interacts with order stability in the setting of partially ordered space. ```{prf:lemma} :label: l-pspace Let $V$ be a partially ordered space and let $S$ be an order preserving self-map on $V$. If $S$ is globally stable on $V$, then $S$ is strongly order stable on $V$. ``` ```{prf:proof} Let $S, V$ have the stated properties and let $\bar v$ be the unique fixed point of $S$ in $V$. If $v \in V$ and $v \preceq S \, v$, then, iterating on this inequality and using the fact that $S$ is order preserving, we have $v \preceq S^n v$ for all $n \in \NN$. Since the partial order is closed and $S$ is globally stable, taking the limit gives $v \preceq \bar v$. Using this inequality and $v \preceq S^n v$, we have $v \preceq S^n v \preceq S^n \bar v = \bar v$ for all $n$. Since $S^n v \to \bar v$, {prf:ref}`l-posmcoc` implies that $S^n v \uparrow \bar v$. This proves one direction of the definition of strong order stability. The proof of the other direction is similar. ◻ ``` The following result can be used to compare fixed points of operators. In the statement, $V = (V, \preceq)$ is a pospace and $\sS(V)$ is all self-maps on $V$, ordered pointwise (i.e., for $S, T \in \sS(V)$, we have $S \preceq T$ if and only if $Sv \preceq Tv$ for all $v \in V$). ```{prf:proposition} :label: p-ofpdsms Fix $S, T$ in $\sS(V)$. If $S \preceq T$ and, in addition, $T$ is order preserving and globally stable on $V$, then its unique fixed point dominates any fixed point of $S$. ``` ```{exercise} :label: ex-math_foundations-auto-16 Prove {prf:ref}`p-ofpdsms`. 
``` ```{solution} ex-math_foundations-auto-16 Assume the conditions of the proposition and let $u_T$ be the unique fixed point of $T$ in $V$. Let $u_S$ be any fixed point of $S$. Since $S \preceq T$, we have $u_S = S u_S \preceq T u_S$. Applying $T$ to both sides of this inequality and using the order-preserving property of $T$ and transitivity of $\preceq$ gives $u_S \preceq T^2 u_S$. Continuing in this fashion yields $u_S \preceq T^k u_S$ for all $k \in \NN$. Taking the limit in $k$ and using the fact that $\preceq$ is closed gives $u_S \preceq u_T$. ``` (sss-snms)= #### Partially Ordered Metric Space A **partially ordered metric space** is a tuple $(V, \preceq, d)$ where $(V, \preceq)$ is a poset, $d$ is a metric on $V$, and $(V, \preceq)$ is a pospace under the topology induced by $d$ on $V$. In particular, $\preceq$ is closed with respect to $d$, so that $$ d(v_n, v) \to 0 \text{ and } d(u_n , u) \to 0 \text{ with } u_n \preceq v_n \text{ for all } n \text{ implies } u \preceq v \text{.} $$ On a partially ordered metric space $V = (V, \preceq, d)$, the metric $d$ is called **sup-nonexpansive** if, for all subsets $(v_\alpha)$ and $(w_\alpha)$ of $V$, we have $$ d \left( \vee_\alpha \, v_\alpha, \vee_\alpha \, w_\alpha \right) \leq \sup_\alpha \, d(v_\alpha, w_\alpha) $$ (eq-sne) whenever the suprema exist. Sup-nonexpansive metrics will be useful for us because contraction properties are passed from collections of mappings to their upper envelopes. The next lemma explains. In the statement, 1. $(V, \preceq, d)$ is a partially ordered metric space, 2. $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ is a collection of self-maps on $V$, 3. $V_0$ is a subset of $V$ and $Tv \coloneq \vee_\sigma T_\sigma \, v$ exists at each $v \in V_0$. ```{prf:lemma} :label: l-snms Let $d$ be sup-nonexpansive. If $TV_0 \subset V_0$ and each $T_\sigma \in \TT$ is a contraction of modulus $\beta$ on $V$, then $T$ is a contraction of modulus $\beta$ on $V_0$. 
``` ```{prf:proof} Let the stated conditions hold, so that $T$ is a self-map on $V_0$. For any $v, w \in V_0$, we have $$ d(Tv, Tw) = d \left( \vee_\sigma T_\sigma \, v, \vee_\sigma T_\sigma \, w \right) \leq \sup_\sigma \, d ( T_\sigma \, v, T_\sigma \, w ) \leq \beta \, d ( v, w ). $$ This shows that $T$ is a contraction of modulus $\beta$ on $V_0$. ◻ ``` (sss-bldef)= #### Banach Lattices Let $E = (E, \leq)$ be a Riesz space and let $\| \cdot \|$ be a complete norm on $E$, so that $(E, \| \cdot \|)$ is a Banach space. If the norm is compatible with the order structure on $E$, in the sense that $\|u\| \leq \|v\|$ whenever $|u| \leq |v|$, then $\| \cdot \|$ is called a **lattice norm** and $E \coloneq (E, \leq, \| \cdot \|)$ is called a **Banach lattice**. ```{prf:example} :label: eg-lpbl Let $(L_p, \leq)$ be the Riesz space defined in {ref}`sss-lpipo`, where $f \leq g$ if $\mu\{f > g\} = 0$. Paired with this partial order, the Banach space $L_p$ becomes a Banach lattice. ``` If $(v_n)$ is a sequence in $E$ then convergence is as defined for sequences in normed linear space (see {ref}`sss-nonvecsp`): $v_n \to v$ means that $\| v_n - v \| \to 0$ as $n \to \infty$. This should not be confused with $v_n \uparrow v$ and $v_n \downarrow v$, which are defined in terms of suprema and infima (see {ref}`ss-ordercon`). Some relationships between the different forms of convergence are discussed in the next theorem. ```{prf:theorem} :label: t-poblc If $E = (E, \leq, \| \cdot \|)$ is a Banach lattice, then 1. $\leq$ is a closed partial order on $E$ under the norm topology, 2. if $(v_n) \subset E$ is increasing and $v_n \to v$, then $v_n \uparrow v$, and 3. if $(v_n) \subset E$ is decreasing and $v_n \to v$, then $v_n \downarrow v$. ``` ```{prf:proof} A proof of part (i) can be found in 18.4 of {cite}`zaanen2012introduction`. Regarding (ii), suppose that $(v_n)$ is increasing and $v_n \to v$. Fix $m \in \NN$.
By part (i) and $v_m \leq v_n$ for all $n \geq m$, we have $v_m \leq v$. Since $m$ was arbitrary, $v$ is an upper bound of $(v_n)$. If $w$ is another upper bound of $(v_n)$, then $v_n \leq w$ for all $n$ and hence, using part (i) once more, $v \leq w$. Hence $v$ is the least upper bound of $(v_n)$. The proof of (iii) is similar. ◻ ``` (sss-orus)= #### Order Units Let $E$ be a Banach lattice. An element $e \in E_+$ will be called a **normalized order unit** if $\|e\| = 1$ and $|u| \leq \|u\|\, e$ for all $u \in E$. For example, if $E = b\Xsf$, then $e = \1$ is a normalized order unit.[^5] ```{prf:proposition} :label: p-blsn If $E$ is a Banach lattice with normalized order unit $e$, then the norm on $E$ is sup-nonexpansive. ``` ```{prf:proof} Let $\{u_\alpha\}$ and $\{v_\alpha\}$ be subsets of $E$ such that $\vee_\alpha u_\alpha$ and $\vee_\alpha v_\alpha$ exist. Let $c = \sup_\alpha \|u_\alpha - v_\alpha\|$ and assume $c < \infty$, since otherwise the claim is trivial. For each $\alpha$, we have $$u_\alpha = v_\alpha + (u_\alpha - v_\alpha) \leq v_\alpha + |u_\alpha - v_\alpha| \leq \vee_\beta v_\beta + \|u_\alpha - v_\alpha\|\,e \leq \vee_\beta v_\beta + c\,e.$$ Since $\vee_\beta v_\beta + c\,e$ is an upper bound for every $u_\alpha$, we have $\vee_\alpha u_\alpha \leq \vee_\beta v_\beta + c\,e$. By symmetry, $\vee_\beta v_\beta \leq \vee_\alpha u_\alpha + c\,e$. Together these give $\left|\vee_\alpha u_\alpha - \vee_\beta v_\beta\right| \leq c\,e$, and since $\|e\| = 1$, $$\Bigl\|\vee_\alpha u_\alpha - \vee_\alpha v_\alpha\Bigr\| \leq c = \sup_{\alpha} \|u_\alpha - v_\alpha\|. $$ ◻ ``` (sss-weisup)= #### Weighted Sup-Norm Spaces In this section we introduce a class of Banach lattices that are useful for handling unbounded dynamic programming problems. To this end, let $\Xsf$ be a topological space. A **weight function** on $\Xsf$ is a mapping $\ell \in m\Xsf$ with $\ell(x) \geq 1$ for all $x \in \Xsf$.
Given a weight function $\ell$ and $v \in \RR^\Xsf$ we introduce the **$\ell$-weighted supremum norm** $$ \| v \|_\ell \coloneq \sup_{x \in \Xsf} \, \frac{|v(x)|}{\ell(x)}. $$ In this setting, we let - $b_\ell \Xsf$ be all $v \in m\Xsf$ such that $\| v \|_\ell < \infty$ and - $b_\ell c\Xsf$ be the continuous functions in $b_\ell \Xsf$. Elements of $b_\ell \Xsf$ are called **$\ell$-bounded** functions. ```{exercise} :label: ex-math_foundations-auto-17 Show that $b_{\ell}\Xsf$ and $b_{\ell} c\Xsf$ are both linear subspaces of $\RR^\Xsf$ under the usual pointwise notions of addition and scalar multiplication of functions. ``` ```{exercise} :label: ex-math_foundations-auto-18 Show that $b\Xsf \subset b_{\ell}\Xsf$ and $bc\Xsf \subset b_{\ell} c\Xsf$. ``` ```{solution} ex-math_foundations-auto-18 If $v \in b\Xsf$, then $|v| \leq M$ for some $M \in \NN$. But then $|v| / \ell \leq M$, since $\ell \geq 1$. Hence $v \in b_{\ell}\Xsf$. The proof of the second case is similar. ``` ```{exercise} :label: ex-math_foundations-auto-19 Show that $\| \cdot \|_\ell$ is a norm on $b_{\ell}\Xsf$. ``` ```{solution} ex-math_foundations-auto-19 The only nontrivial part of the proof is the triangle inequality. This is still quite straightforward: If $u, v \in b_{\ell}\Xsf$, then, using the triangle inequality in $\RR$, $$ \left| \frac{u}{\ell} + \frac{v}{\ell} \right| \leq \left|\frac{u}{\ell} \right| + \left|\frac{v}{\ell}\right| \leq \| u \|_\ell+ \|v\|_\ell. $$ Taking the supremum on the left-hand side completes the proof. ``` ```{exercise} :label: ex-wncpc Prove that convergence in $b_\ell \Xsf$ implies pointwise convergence; that is, if $(w_n)$ is a sequence in $b_{\ell} \Xsf$ and $\|w_n - w\|_\ell \to 0$ for some $w \in b_{\ell} \Xsf$, then $w_n(x) \to w(x)$ for every $x \in \Xsf$. ``` ```{solution} ex-wncpc Pick any $x \in \Xsf$. We have $|w_n(x) / \ell(x) - w(x) / \ell(x)| \leq \|w_n - w\|_\ell \to 0$ and hence $|w_n(x) - w(x) | \leq \|w_n - w\|_\ell \ell(x) \to 0$. ``` The next theorem gives conditions under which the spaces discussed above are Banach lattices.
Proofs can be found in §12.2.1 of {cite}`stachurski2022economic`. ```{prf:theorem} :label: t-wsnsc $b_{\ell}\Xsf$ is a Banach lattice under the norm $\| \cdot \|_\ell$ and the usual pointwise order. If $\ell$ is a continuous function, then $b_\ell c \Xsf$ is also a Banach lattice. ``` ```{exercise} :label: ex-math_foundations-auto-20 Show that $\ell$ is a normalized order unit (see {ref}`sss-orus`) in $b_\ell \Xsf$. ``` ```{solution} ex-math_foundations-auto-20 $\|\ell\|_\ell = 1$ because $\|\ell\|_\ell = \sup_{x \in \Xsf} \ell(x)/\ell(x)$. The proof that $|f| \leq \|f\|_\ell\, \ell$ is immediate from the definition. ``` #### Positive Operators on Banach Lattices If $E$ is a Banach lattice, then, as in {ref}`ss-pnsl`, we take $\blop(E)$ to be the norm bounded (and hence norm continuous) linear self-maps on $E$. Let $\blop_+(E)$ be the positive linear self-maps on $E$. ```{prf:theorem} :label: t-poblc2 If $E$ is a Banach lattice, then $\blop_+(E) \subset \blop(E)$. ``` ```{prf:proof} See Theorem 15.1 of {cite}`zaanen2012introduction`. ◻ ``` ```{exercise} :label: ex-bllob Given $A \in \blop_+(E)$, prove that $$ \| A \| = \sup \setntn{ \| A u \| }{ u \in E_+ \text{ and } \| u \| = 1 }. $$ ``` ```{solution} ex-bllob Let $s$ be the supremum of $\| A u \|$ over all $u \in E_+$ with $\| u \| = 1$. Clearly $s \leq \|A\|$. To see that the reverse inequality holds, fix $u \in E$ with $\| u \| = 1$ and let $v = |u| \in E_+$. We claim that $\|Au \| \leq \|Av\|$, which suffices for $\|A \| \leq s$. To verify this claim, we use $|Au| \leq A|u| = Av = |Av|$ and the lattice norm property of $\| \cdot \|$ to obtain $\|Au \| \leq \|Av\|$. ``` Continuing in the setting of {prf:ref}`ex-bllob`, the pointwise partial order on $\blop(E)$ is defined by $A \leq B$ whenever $Av \leq Bv$ for all $v \in E_+$. The set $\blop_+(E)$ coincides with the positive cone of $\blop(E)$.
On this positive cone, the spectral radius is order preserving: ```{prf:theorem} :label: t-orspr If $A, B \in \blop_+(E)$ and $A \leq B$, then $\|A\| \leq \|B\|$ and $\rho(A) \leq \rho(B)$. ``` ```{prf:proof} Fix $0 \leq A \leq B$. If $u \in E_+$ and $\|u\| = 1$, then $0 \leq Au \leq Bu$, so $\|Au \| \leq \|Bu\|$. The bound $\|A\| \leq \|B\|$ now follows from {prf:ref}`ex-bllob`. Regarding the second claim, we use $0 \leq A \leq B$ and induction to obtain $0 \leq A^k \leq B^k$ for all $k \in \NN$. Hence, by the first claim, $\|A^k \| \leq \|B^k\|$ for all $k$. The inequality $\rho(A) \leq \rho(B)$ now follows from Gelfand's formula. ◻ ``` A Banach lattice $E$ is said to have a **$\sigma$-order continuous norm** if $$ (v_n) \subset E \text{ and } v_n \downarrow 0 \quad \implies \quad \| v_n \| \to 0. $$ ```{prf:example} :label: eg-lpdc0 The Banach lattice $L_p = (L_p(\Xsf, \aA, \mu), \leq)$ discussed in {prf:ref}`eg-lpbl` has $\sigma$-order continuous norm. See, for example, {cite}`zaanen2012introduction`, §17. ``` ```{prf:theorem} :label: t-epooc Let $E$ be a Banach lattice. If $E$ has $\sigma$-order continuous norm, then every positive linear operator from $E$ to itself is order continuous. ``` ```{prf:proof} Let $E$ be as stated and let $A$ be a positive linear self-map on $E$. Fix $(v_n) \subset E$ with $v_n \downarrow 0$. Since $A$ is a bounded linear operator ({prf:ref}`t-poblc2`) and $E$ has $\sigma$-order continuous norm, we have $\|A v_n\| \leq \|A \| \|v_n\| \to 0$ in $\RR$. Also, $(A v_n)$ is decreasing because $A$ is positive and $(v_n)$ is decreasing. Applying {prf:ref}`t-poblc` yields $Av_n \downarrow 0$. By {prf:ref}`ex-looce`, this convergence is sufficient for order continuity of $A$. ◻ ``` In the next corollary, $L_p$ is the Banach lattice discussed in {prf:ref}`eg-lpbl`. ```{prf:corollary} :label: c-lpdc If $p \in [1, \infty)$, then every positive linear operator from $L_p$ to itself is order continuous.
``` ```{prf:proof} This follows from {prf:ref}`t-epooc` and {prf:ref}`eg-lpdc0`. ◻ ``` (ss-markop)= ### Markov Models Many dynamic programs have some form of Markov structure (or can be coerced into a Markov framework by suitably changing the state space). Here we review key ideas related to Markov processes and state some useful results. In all of this section ({ref}`ss-markop`), $\Xsf$ is a metric space with Borel sets $\bB$. The symbol $\dD(\Xsf)$ is the set of all distributions (Borel probability measures) on $\Xsf$. If $\Xsf$ is finite, then the metric on $\Xsf$ is the discrete metric (under which all real-valued functions on $\Xsf$ are continuous and $\bB$ is the set of all subsets of $\Xsf$). (sss-sks)= #### Stochastic Kernels Let $\Usf$ be a second metric space. A **transition kernel** from $\Usf$ to $\Xsf$ is a function $N$ from $\Usf \times \bB$ to $\RR_+$ with the property that $u \mapsto N(u,B)$ is Borel measurable for each $B \in \bB$ and $B \mapsto N(u, B)$ is a measure on $(\Xsf, \bB)$ for all $u \in \Usf$. A **stochastic kernel** from $\Usf$ to $\Xsf$ is a transition kernel $P$ from $\Usf$ to $\Xsf$ satisfying $P(u, \Xsf) =1$ for all $u \in \Usf$. Informally, the stochastic kernel $P$ takes a point $u \in \Usf$ and randomly "transitions" to a new point in $\Xsf$ via the distribution $P(u, \diff x)$. A common setting is where $\Usf = \Xsf$. In this case we say that $N$ is a **transition kernel on $\Xsf$**, while $P$ is a **stochastic kernel on $\Xsf$**. ```{prf:example} If $\Xsf$ is finite and $p \colon \Xsf \times \Xsf \to [0,1]$ obeys $\sum_{x' \in \Xsf} p(x, x') = 1$ for all $x \in \Xsf$, then $P$ defined by $$ P(x,B) = \sum_{x' \in B} p(x,x') \qquad (x \in \Xsf,\; B \subset \Xsf) $$ is a stochastic kernel on $\Xsf$. 
``` ```{prf:example} If $\mu$ is a $\sigma$-finite measure on $(\Xsf, \bB)$ and $p \colon \Xsf \times \Xsf \to \RR_+$ is Borel measurable with $\int p(x, x') \mu(\diff x') = 1$ for all $x \in \Xsf$, then $P$ defined by $$ P(x,B) = \int_B p(x,x') \mu(\diff x') \qquad (x \in \Xsf,\; B \in \bB) $$ is a stochastic kernel on $\Xsf$. ``` An $\Xsf$-valued stochastic process $(X_t)_{t =0}^\infty$ on $(\Omega, \fF, \PP)$ is called **$(P, \psi)$-Markov** if $X_0 \eqdist \psi$ and $\PP\{ X_{t+1} \in B \given X_t\} = P(X_t, B)$ with probability one for all $t \geq 0$ and $B \in \bB$. If $\psi = \delta_x$ for some $x \in \Xsf$, then we say $(X_t)_{t \geq 0}$ is **$(P, x)$-Markov**. We also say that $(X_t)_{t \geq 0}$ is **$P$-Markov** if $(X_t)_{t \geq 0}$ is $(P, \psi)$-Markov for some $\psi \in \dD(\Xsf)$. Given any stochastic kernel $P$ on $\Xsf$ and any initial condition $x \in \Xsf$, a $(P, x)$-Markov process always exists. In particular, we can take the **canonical construction**: set $\Omega = \Xsf^\infty$, let $X_t(\omega) = \omega_t$ be the coordinate projections, and let $\PP_x$ be the unique probability measure on $\Xsf^\infty$ (equipped with the product $\sigma$-algebra) such that $X_0 = x$ $\PP_x$-a.s. and $\PP_x\{X_{t+1} \in B \given X_0, \ldots, X_t\} = P(X_t, B)$. Existence of $\PP_x$ follows from the Ionescu-Tulcea theorem (see, e.g., {cite}`meyn2009markov`, Chapter 3). Let $\theta \colon \Xsf^\infty \to \Xsf^\infty$ denote the **shift operator** defined by $\theta(x_0, x_1, \ldots) = (x_1, x_2, \ldots)$. The following is the **Markov property** in its general form: for the canonical chain, if $f \colon \Xsf^\infty \to \RR$ is measurable and either nonnegative or bounded, then $$ \EE_x [ f \circ \theta \given X_1 ] = \EE_{X_1} f \quad \PP_x\text{-a.s.} $$ (eq-markovprop) For a proof see {cite}`meyn2009markov`, p. 63.
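When $\Xsf$ is finite, a $(P, x)$-Markov path can be generated directly from the kernel by drawing each $X_{t+1}$ from the row $p(X_t, \cdot)$. The following is a minimal sketch under that assumption. The matrix `p`, the function name `simulate_chain`, and the horizon are illustrative choices rather than objects from the text.

```python
import numpy as np

def simulate_chain(p, x0, T, rng):
    """Draw X_0, ..., X_T from the finite-state kernel p, started at x0."""
    p = np.asarray(p, dtype=float)
    # Each row p[x, :] must be a distribution over next-period states.
    assert np.all(p >= 0) and np.allclose(p.sum(axis=1), 1.0)
    path = np.empty(T + 1, dtype=int)
    path[0] = x0
    for t in range(T):
        # X_{t+1} is drawn from P(X_t, .); earlier history is not used.
        path[t + 1] = rng.choice(len(p), p=p[path[t]])
    return path

# Hypothetical three-state kernel, chosen only for illustration.
p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])
rng = np.random.default_rng(0)
path = simulate_chain(p, x0=0, T=10_000, rng=rng)
```

Because `p[0, 2] = 0`, the simulated path never jumps directly from state 0 to state 2, which gives a simple sanity check on the sampler.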
```{prf:example} :label: eg-srs Suppose $(X_t)_{t \geq 0}$ is defined by $$ X_{t+1} = F(X_t, W_{t+1}), \quad (W_t)_{t \geq 1} \iidsim \phi, \quad X_0 \sim \psi $$ (eq-naro) where $(W_t)_{t \geq 1}$ is a sequence of iid random elements taking values in metric space $\Zsf$, $F \colon \Xsf \times \Zsf \to \Xsf$ is Borel measurable, and $X_0$ and $(W_t)_{t \geq 1}$ are defined on a common probability space $(\Omega, \fF, \PP)$ and are jointly independent. As a stochastic process, $(X_t)$ is $P$-Markov when $$ P(x,B) \coloneq \PP\{F(x,W_{t+1}) \in B\} = \int \1_B[F(x,z)] \phi(\diff z) \qquad (x \in \Xsf, \; B \in \bB). $$ (eq-artsk) ``` ```{exercise} :label: ex-iooc Let $V$ be a linear subspace of $m\Xsf$ and let $K$ be the **integral operator** defined by $$ (Kv)(x) = \int v(x') N(x, \diff x') \qquad (v \in V, \; x \in \Xsf), $$ where $N$ is a transition kernel (see {ref}`sss-sks`) on $(\Xsf, \bB)$ and we assume that $KV \subset V$. Show that $K$ is a positive linear operator on $V$. In addition, prove the following: If $V$ is closed under pointwise suprema (see {ref}`sss-monseq`), then $K$ is order continuous on $V$. ``` ```{solution} ex-iooc Let $V$, $K$ and $N$ be as stated. Positivity and linearity follow from basic properties of the integral (see {ref}`sss-propin`). Regarding order continuity, fix $v_n \uparrow v$. We claim that $Kv_n \uparrow K v$. Since $V$ is closed under pointwise suprema, {prf:ref}`l-pcid` implies that $v_n$ increases to $v$ pointwise. Moreover, $K$ is positive and therefore order preserving, so $Kv_n$ is increasing and bounded above by $Kv$. As $V$ is closed under pointwise suprema, it follows that $Kv_n$ increases pointwise to the pointwise supremum $\sup_n K v_n$. By the monotone convergence theorem, the limit is $Kv$. Applying {prf:ref}`l-pcid` again, we get $Kv_n \uparrow Kv$. ``` (sss-markop)= #### Markov Operators As before, let $m\Xsf$ be all the measurable functions on $\Xsf$ and let $P$ be a stochastic kernel on $\Xsf$.
Given $h \in m\Xsf$ we set $$ (P h)(x) \coloneq \int h(x') P(x, \diff x') \qquad (x \in \Xsf) $$ (eq-amo) whenever the integral is well-defined. We call $P$ the **Markov operator** generated by the stochastic kernel $P$. We use the same symbol because stochastic kernels and Markov operators can be placed in one-to-one correspondence via $$ P(x,B) = (P \1_B)(x) \qquad (x \in \Xsf, \, B \in \bB). $$ (eq-edos) (The stochastic kernel is on the left and the Markov operator is on the right.) (We already studied a version of $P$ in {prf:ref}`eg-scountp`.) Intuitively, $(P h)(x)$ represents the expectation of $h(X_{t+1})$ given $X_t = x$. We extend this interpretation below. ```{prf:example} The Markov operator generated by the kernel $P$ in {eq}`eq-artsk` takes the form $$ (P h)(x) = \int h[ F(x,z) ] \phi(\diff z) \qquad (x \in \Xsf, \; h \in b\Xsf). $$ (eq-moifsd) This expression is intuitive because $(P h)(x)$ represents the expectation of $h(X_{t+1})$ given $X_t = x$, and $h(X_{t+1})= h[F(X_t,W_{t+1})]$. Hence $(P h)(x) = \EE h[F(x,W_{t+1})]$, which is the right hand side of {eq}`eq-moifsd`. ``` ```{prf:lemma} :label: l-ocsk Let $(V, \leq)$ be a subset of the Borel measurable functions on $\Xsf$, ordered pointwise, and let $P$ be a stochastic kernel on $\Xsf$. If $PV \subset V$, then $P$ is order continuous on $V$. ``` ```{prf:proof} Let $V$ and $P$ be as stated. Let $(v_n)$ be an increasing sequence in $V$ with $v_n \uparrow v \in V$. By {prf:ref}`l-pcid`, $v_n$ converges pointwise to $v$. Fixing $x \in \Xsf$ and applying the monotone convergence theorem ({prf:ref}`t-dct`), we have $$ \lim_n \int v_n(x') P(x, \diff x') = \int v(x') P(x, \diff x'). $$ Since this holds for all $x$, another application of {prf:ref}`l-pcid` gives $Pv_n \uparrow Pv$. ◻ ``` Given a stochastic kernel $P$ on $\Xsf$, a distribution $\phi \in \dD(\Xsf)$ is called **stationary** for $P$ if $$ \phi(B) = \int P(x, B) \phi(\diff x) \quad \text{ for all } B \in \bB.
$$ ```{prf:example} Consider the AR(1) process $X_{t+1} = \rho X_t + \epsilon_{t+1}$ on $\Xsf = \RR$, where $|\rho| < 1$ and $(\epsilon_t)$ are IID $\nN(0, \sigma^2)$. The stochastic kernel is $P(x, B) = \PP\{\rho x + \epsilon \in B\}$. The unique stationary distribution is $\nN(0, \sigma^2 / (1 - \rho^2))$. ``` #### General Properties The following lemma lists useful properties of the Markov operator ({eq}`eq-amo`) when considered as a linear operator on $b\Xsf$, the set of bounded Borel measurable functions on $\Xsf$. Proofs can be found in {cite}`meyn2009markov`, Chapter 3. ```{prf:lemma} :label: l-mopfp If $P$ is a stochastic kernel on $\Xsf$ and $h \mapsto Ph$ is the Markov operator defined by {eq}`eq-amo`, then the following statements hold: 1. $P$ maps $b\Xsf$ to itself. 2. $P \1_\Xsf = \1_\Xsf$ pointwise on $\Xsf$. 3. $P$ is order preserving on $b\Xsf$: if $h, g \in b\Xsf$ and $h \leq g$, then $P h \leq P g$ 4. $P$ is linear on $b\Xsf$. 5. $\| P^t \| = 1$ for all $t \geq 0$, where $\| \cdot \|$ is the operator norm on $\blop(b\Xsf)$. 6. $\rho(P) = 1$, where $\rho$ is spectral radius on $\blop(b\Xsf)$. ``` (Note that (vi) follows from (v) and Gelfand's formula for the spectral radius.) Here is a fundamental result linking the stochastic kernel $P$, Markov operator $P$, and any $P$-Markov process $(X_t)_{t \geq 0}$. For a proof see {cite}`meyn2009markov`, Proposition 3.4.2. ```{prf:theorem} :label: t-fdofmo Let $P$ be a stochastic kernel on $\Xsf$ and let $(X_t)_{t \geq 0}$ be $P$-Markov. The corresponding Markov operator $P$ on $b\Xsf$ obeys $$ (P^t h)(x) = \EE [ h(X_t) \, | \, X_0 = x ] \quad \text{for all } t \geq 0 \text{ and } h \in b\Xsf. $$ (eq-mllab) ``` (sss-moif)= #### Markov Operators on Integrable Functions Sometimes we wish to consider Markov operators as linear operators over a space of integrable functions. To this end, let $P$ be a stochastic kernel on $\Xsf$ and let $\phi$ be stationary for $P$. 
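In the finite-state case, the Markov operator is multiplication by a stochastic matrix and a stationary distribution is a row vector $\phi$ with $\phi = \phi P$. The sketch below checks this numerically, along with $P \1 = \1$ and $\|P\| = 1$ in the operator norm induced by the supremum norm. The matrix `p` and the function `h` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical irreducible three-state kernel.
p = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])
h = np.array([1.0, -2.0, 0.5])
ones = np.ones(3)

# Markov operator on functions: (P h)(x) = sum_{x'} h(x') p(x, x').
Ph = p @ h

# Operator norm induced by the sup norm is the largest absolute row sum.
op_norm = np.linalg.norm(p, ord=np.inf)

# A stationary distribution is a left eigenvector for eigenvalue 1,
# normalized so that its entries sum to one.
eigvals, eigvecs = np.linalg.eig(p.T)
k = np.argmin(np.abs(eigvals - 1.0))
phi = np.real(eigvecs[:, k])
phi = phi / phi.sum()

# t-step conditional expectations are given by matrix powers: P^t h.
P5h = np.linalg.matrix_power(p, 5) @ h
```

Here `phi @ p` reproduces `phi` up to floating point error, and every entry of `P5h` lies between the minimum and maximum of `h`, as expected for a conditional expectation.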
As before, we use the same symbol $P$ for the Markov operator defined in {eq}`eq-amo`. The space $L_1(\phi) \coloneq L_1(\Xsf, \bB, \phi)$ is the Banach lattice discussed in {prf:ref}`eg-lpdc0`. ```{prf:lemma} :label: l-mopfpl The following statements hold: 1. $P$ is an element of $\blop(L_1(\phi))$; that is, a bounded linear self-map from $L_1(\phi)$ to itself. 2. $P$ is order preserving on $L_1(\phi)$. 3. $\| P^t \| = 1$ for all $t \geq 0$, where $\| \cdot \|$ is the operator norm on $\blop(L_1(\phi))$. 4. $\rho(P) = 1$, where $\rho$ is spectral radius on $\blop(L_1(\phi))$. ``` The proof of part (i) follows from the following adjoint rule: ```{prf:lemma} :label: l-adjrule Let $P$ be a stochastic kernel on $(\Xsf, \bB)$. If $\mu$ is a probability measure on $(\Xsf, \bB)$ and $h$ is a Borel measurable function from $\Xsf$ to $\RR$, either nonnegative or bounded, then $$ \int Ph \diff \mu = \int h \diff \mu P. $$ (eq-adjrule) ``` ```{prf:proof} A proof of this claim for bounded $h$ can be found in Lemma 9.2.8 of {cite}`stachurski2022economic`. To extend this to nonnegative but possibly unbounded $h$ we consider $h_n = h \wedge n$ for each $n \in \NN$ and use the bounded result to obtain $\int Ph_n \diff \mu = \int h_n \diff \mu P$ for all $n$. The equality {eq}`eq-adjrule` now follows from taking limits, with repeated applications of the monotone convergence theorem. ◻ ``` If we apply {eq}`eq-adjrule` with the stationary distribution $\phi$ we get $$ \int Ph \diff \phi = \int h \diff \phi. $$ (eq-adjrules) To obtain (i) of {prf:ref}`l-mopfpl` we can use $\int |Ph| \diff \phi \leq \int P|h| \diff \phi$ and then apply {eq}`eq-adjrules` with $h$ replaced by $|h|$. This proves that $Ph$ is $\phi$-integrable whenever $h$ is $\phi$-integrable. Linearity of $P$ is immediate. Part (ii) follows from order preservation of the integral. For (iii), $\|P\| \leq 1$ follows from the bound on $\int |Ph| \diff \phi$ just obtained, and $\|P\| \geq 1$ from $P\1 = \1$.
Part (iv) follows from (iii) and Gelfand's formula. (ss-sd)= ### Orders over Distributions Distributions are objects that decision makers naturally have preferences over. For example, speculators care about probability distributions over returns of prospective investments, often preferring distributions that offer high average returns with low risk. A planner might have preferences over the cross-sectional distributions of consumption and wealth. In this section, we discuss common methods for ordering distributions and their relationships with each other. #### Stochastic Dominance Let $\Xsf$ be a metric space and let $\dD(\Xsf)$ be the set of all distributions (i.e., Borel probability measures) on $\Xsf$. Let $ib\Xsf$ be the increasing bounded real-valued functions on $\Xsf$. For $\mu$ and $\nu$ in $\dD(\Xsf)$, we say that - $\nu$ **first order stochastically dominates** $\mu$ and write $\mu \lefsd \nu$ if $$ \int u(x) \mu(\diff x) \leq \int u(x) \nu(\diff x) \; \text{ for every } u \text{ in } ib\Xsf \text{ and} $$ - $\nu$ **second order stochastically dominates** $\mu$ and write $\mu \lessd \nu$ if $$ \int u(x) \mu(\diff x) \leq \int u(x) \nu(\diff x) \; \text{ for every concave } u \text{ in } ib\Xsf. $$ If we refer to stochastic dominance without explicitly stating the order, then the understanding is that we mean *first* order stochastic dominance. ```{prf:example} If $x$ and $y$ are points in $\RR$ with $x \leq y$, then $\delta_x \lefsd \delta_y$, since, for any increasing function $u$ we have $u(x) \leq u(y)$. ``` ```{exercise} :label: ex-math_foundations-auto-21 Let $Y$ be a random variable on $\RR$ with distribution $\mu$. Let $m$ be a nonnegative constant and let $\nu$ be the distribution of $Y + m$. Show that $\mu$ is stochastically dominated by $\nu$. ``` Suppose now that $\Xsf$ is a Borel subset of $\RR$ and fix $F, G \in \dD(\Xsf)$. We understand $F$ and $G$ as cumulative distribution functions. 
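The definition of first order stochastic dominance can be explored numerically by comparing sample averages of bounded increasing functions. The sketch below draws a sample of $Y$ from a standard normal (an arbitrary choice of $\mu$), shifts it by a nonnegative constant, and compares the two averages for a few test functions. The names `y`, `m`, and `test_functions` are illustrative and not part of the text.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.standard_normal(5_000)   # draws of Y, with distribution mu
m = 0.3                          # nonnegative shift
shifted = y + m                  # draws of Y + m, with distribution nu

# A few bounded increasing test functions u.
test_functions = {
    "tanh": np.tanh,
    "clip": lambda x: np.clip(x, -1.0, 1.0),
    "indicator": lambda x: (x > 0.5).astype(float),
}

# Sample analogue of int u d(nu) - int u d(mu) for each u.
gaps = {name: u(shifted).mean() - u(y).mean()
        for name, u in test_functions.items()}
```

Each entry of `gaps` is nonnegative because $u(Y + m) \geq u(Y)$ holds draw by draw when $u$ is increasing and $m \geq 0$. A finite set of test functions illustrates the ordering but does not prove it.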
When testing first order stochastic dominance, it is sufficient to restrict attention to increasing functions $u \in b\Xsf$ that take the form $u(x) = \1\{a < x\}$ for some $a \in \Xsf$ (see, e.g., {cite}`stachurski2022economic`, §9.4.1). Recalling the interpretation of the integral given in {eq}`eq-iiim`, this leads to the statement that $F \lefsd G$ if and only if $1 - F(a) \leq 1 - G(a)$ for all $a \in \Xsf$, or

$$
    F \lefsd G
    \iff
    G(x) \leq F(x) \quad \text{ for all } x \in \Xsf.
$$ (eq-adfsd)

```{exercise}
:label: ex-rypa

The relation $\lefsd$ yields a partial order on $\dD(\Xsf)$. Prove this in the one-dimensional setting from the previous paragraph, where $\Xsf \subset \RR$.
```

```{solution} ex-rypa
The claim is that $\lefsd$ yields a partial order on $\dD(\Xsf)$. Reflexivity and transitivity are immediate from the definition. Antisymmetry follows from the characterization in {eq}`eq-adfsd`: if $F \lefsd G$ and $G \lefsd F$, then $G(x) \leq F(x)$ and $F(x) \leq G(x)$ for all $x \in \Xsf$, so $F = G$.
```

#### Monotone Likelihood Ratios

Here is a property that implies first order stochastic dominance. Consider a pair of distributions $(F, G)$ with positive densities $f$ and $g$ on an interval $I$ contained in $\RR$. We say that $(f, g)$ has a **monotone likelihood ratio** if $f/g$ is increasing on $I$; that is, if

$$
    x, x' \in I \text{ and } x \leq x'
    \implies
    \frac{f(x)}{g(x)} \leq \frac{f(x')}{g(x')}.
$$ (eq-mlr)

```{prf:example}
The exponential density is $p(x, \lambda) = \lambda e^{-\lambda x}$ on $\RR_+$, where $\lambda$ is a positive constant. Taking the ratio $r(x) = p(x, \lambda_1) / p(x, \lambda_2)$ of exponential densities with $\lambda_1 \leq \lambda_2$, we have

$$
    r(x) = \frac{\lambda_1}{\lambda_2} \exp((\lambda_2 - \lambda_1) x)
    \qquad (x \in \RR_+).
$$

Since $r$ is increasing in $x$, the pair $(p(\cdot, \lambda_1), p(\cdot, \lambda_2))$ has the monotone likelihood ratio property.
```

```{prf:proposition}
:label: p-mlrisd

If $(f, g)$ has the monotone likelihood ratio property on $I$, then $G \lefsd F$.
```

```{prf:proof}
Let $a := \inf I$ and $b := \sup I$. (These values can be infinite.)
Writing the monotone likelihood ratio property as

$$
    x \leq x' \implies f(x) g(x') \leq f(x') g(x)
$$ (eq-mlr2)

and integrating with respect to $x$ from $a$ to $x'$ gives $F(x') g(x') \leq f(x') G(x')$. Also, integrating {eq}`eq-mlr2` with respect to $x'$ from $x$ to $b$ gives $f(x) [1 - G(x)] \leq [1-F(x)] g(x)$. Setting $x = x' = y$ in the last two inequalities yields

$$
    \frac{1 - G(y)}{1 - F(y)} \leq \frac{g(y)}{f(y)} \leq \frac{G(y)}{F(y)} .
$$

Cross-multiplying the outer terms gives $F(y) [1 - G(y)] \leq G(y) [1 - F(y)]$, which simplifies to $F(y) \leq G(y)$. Since $y$ was arbitrary, {eq}`eq-adfsd` yields $G \lefsd F$. ◻
```

(sss-mps)=
#### Mean-Preserving Spreads

We will be concerned with analyzing how behavior changes when decisions become "riskier" in some sense. To analyze such scenarios, we introduce the notion of a mean-preserving spread. In particular, for a given distribution $\phi$, we say that $\psi$ is a **mean-preserving spread** of $\phi$ if there exists a pair of random variables $(Y, Z)$ such that

$$
    \EE[Z \given Y] = 0, \quad Y \eqdist \phi \quad \text{and } \; Y + Z \eqdist \psi.
$$

Thus, $\psi$ is a mean-preserving spread of $\phi$ if it adds noise without changing the mean.

```{exercise}
:label: ex-math_foundations-auto-22

Let $\phi = N(0, 1)$ and let $\psi=N(0,2)$. Show that $\psi$ is a mean-preserving spread of $\phi$.
```

```{solution} ex-math_foundations-auto-22
If we take $Y$ and $Z$ to be independent and standard normal, then $\EE[Z \given Y] = \EE Z = 0$, $Y \eqdist \phi$ and, since sums of independent Gaussians are Gaussian, $Y + Z \eqdist N(0,2) = \psi$. Hence $\psi$ is a mean-preserving spread of $\phi$, as claimed.
```

```{exercise}
:label: ex-math_foundations-auto-23

Prove that if $\phi$ is a mean-preserving spread of $\psi$, then $\psi$ second order stochastically dominates $\phi$. [Hint: Use Jensen's inequality.]
```

```{solution} ex-math_foundations-auto-23
Let $\phi$ be a mean-preserving spread of $\psi$.
Then there exists a random pair $(Y, Z)$ such that

$$
    Y \eqdist \psi, \quad Y + Z \eqdist \phi \quad \text{and } \; \EE[Z \given Y] = 0.
$$

Fixing arbitrary concave $u \in ib\RR$ and applying Jensen's inequality to the conditional expectation,

$$
    \EE\, u(Y + Z) = \EE \, \EE[ u(Y + Z) \given Y] \leq \EE \, u ( \EE[ Y + Z \given Y] ) = \EE \, u ( Y ).
$$

Therefore $\int u(x) \phi(\diff x) = \EE \, u(Y + Z) \leq \EE u(Y) = \int u(x) \psi(\diff x)$. Since $u$ was an arbitrary concave element of $ib\RR$, we have $\phi \lessd \psi$.
```

## Chapter Notes

Good introductions to real analysis include {cite}`bartle2011introduction` and {cite}`aliprantis1998principles`. For topology, measure theory, and functional analysis at an advanced level, {cite}`aliprantis2006border` provides comprehensive coverage, while {cite}`dudley2002real` and {cite}`kreyszig1978introductory` offer accessible treatments. Fixed point theory in metric spaces is developed in {cite}`goebel1990topics`, and nonlinear optimization is covered in {cite}`jahn2020introduction`.

For lattice theory and order, {cite}`davey2002introduction` provides a thorough introduction. High quality monographs on Riesz spaces, Banach lattices and positive operators include {cite}`aliprantis2006border`, {cite}`aliprantis2006positive`, {cite}`zaanen2012introduction`, {cite}`meyer2012banach`, and {cite}`batkai2017positive`.

For further reading on probability theory and stochastic processes, {cite}`pollard2002user`, {cite}`cinlar2011probability` and {cite}`dudley2002real` are outstanding. For Markov chains and stochastic stability, the standard reference is {cite}`meyn2009markov`.

[^1]: More precisely, $\sigma(\cC)$ is the intersection of all $\sigma$-algebras on $\Xsf$ that contain $\cC$. One can show that $\sigma(\cC)$ is always a well defined $\sigma$-algebra, since the collection of such $\sigma$-algebras is nonempty (it contains at least $\wp(\Xsf)$) and any intersection of $\sigma$-algebras is again a $\sigma$-algebra.

[^2]: For example, the pointwise limit of the sequence of functions $\{f_n\}$ given by $f_n(x) = x^n$ on $[0, 1]$ is discontinuous.

[^3]: A more general perspective on {eq}`eq-iiim` that you might find useful is as follows. Suppose we identify measurable sets with their indicator functions. Then $\mu$ already provides us with an "integral" over the indicators in $m\aA_+$. The map $I_\mu$ extends the reach of this function to all of $m\aA_+$.

[^4]: Some authors would call what we have described a *real* vector space, which can then be extended to the notion of complex vector spaces. We have no need for this extension here, so we drop the adjective "real."

[^5]: Not all Banach lattices have normalized order units. In fact it can be shown that a Banach lattice $E$ admits a normalized order unit if and only if $E$ is an AM-space with unit. We omit these details.

========================================================================

## Solutions

```{exerciselist}
```

========================================================================

## License & AI Training

# License & AI Training Permission

This page documents the licenses applied to *Dynamic Programming Volume II: General States* and the explicit permission granted by the authors and QuantEcon for indexing, text and data mining, and AI training use.

## Licenses

| Component | License |
|---|---|
| Book prose, equations, and figures | [Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA-4.0)](https://creativecommons.org/licenses/by-sa/4.0/) |
| Source code (Python, build scripts) | [MIT License](https://opensource.org/licenses/MIT) |

Third-party material (figures or quotations attributed to other sources) remains under its original license; such material is identified inline where it appears.

## Permission grant

The authors and QuantEcon explicitly permit and encourage the use of this book and its source files (prose, equations, figures, code, and bibliography) for:

- copying and indexing by search engines and crawlers;
- text and data mining;
- training, fine-tuning, evaluation, and benchmarking of AI / machine-learning models, including large language models;
- research, scholarship, and educational use;
- inclusion in derivative datasets, corpora, and embeddings.

This permission is granted with **attribution to the authors and QuantEcon** and is consistent with the licenses listed above.

## Preferred citation

```bibtex
@book{sargent_stachurski_dp2,
  author    = {Sargent, Thomas J. and Stachurski, John},
  title     = {Dynamic Programming Volume II: General States},
  publisher = {QuantEcon},
  year      = {2026},
  url       = {https://book-dp2.quantecon.org}
}
```

## Machine-readable artifacts

For LLM ingestion the site also publishes:

- [`/llms.txt`](/llms.txt): curated chapter index ([llmstxt.org](https://llmstxt.org) standard)
- [`/llms-full.txt`](/llms-full.txt): concatenated full Markdown source
- [`/robots.txt`](/robots.txt): explicit `Allow` for major AI crawlers

## Contact

Questions about reuse, licensing, or attribution can be opened as issues on the [source repository](https://github.com/QuantEcon/book-dp2) or directed to .

========================================================================