ADP Transformations - Dynamic Programming Volume II: General States

A recurring task in mathematics is establishing when two apparently different objects are, in a precise sense, the same. Such equivalences are formalized by invertible, structure-preserving maps. For example, a group isomorphism is a bijection that preserves the group operation, revealing that two groups share identical algebraic structure despite different representations. A topological conjugacy between two dynamical systems is a homeomorphism intertwining their transition maps, establishing that the systems have the same dynamic behavior up to a relabeling of the state space. In the setting of dynamic programming, the relevant structure is order, and the appropriate notion of equivalence is that of an order isomorphism between posets.

We now investigate transformations of dynamic programs that preserve optimality structure. First, in Section 5.1, we investigate order isomorphisms, under which connections are exact. Two dynamic programs linked by isomorphisms are “the same” in terms of their optimality properties. In elementary terms, this is analogous to the fact that $g = \phi \circ f$ has the same maximizers as $f$ when $\phi$ is strictly increasing, and that a maximizer of $g$ is a minimizer of $f$ when $\phi$ is strictly decreasing.
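As a quick numerical illustration of this elementary fact, the following Python sketch compares maximizers and minimizers on a finite grid. The objective `f`, the grid, and the two transformations are hypothetical choices made only for illustration.

```python
import math

# Hypothetical objective and grid, chosen only for illustration.
grid = [i / 10 for i in range(41)]           # 0.0, 0.1, ..., 4.0

def f(x):
    return 2.0 - (x - 1.3) ** 2

def g_inc(x):
    return math.exp(f(x))                    # phi = exp is strictly increasing

def g_dec(x):
    return -3.0 * f(x)                       # phi(t) = -3t is strictly decreasing

# Strictly increasing phi: g and f share a maximizer.
same_maximizer = max(grid, key=f) == max(grid, key=g_inc)

# Strictly decreasing phi: a maximizer of g is a minimizer of f.
max_is_min = max(grid, key=g_dec) == min(grid, key=f)

print(same_maximizer, max_is_min)
```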

Isomorphisms are useful, but there is also a sense in which they preserve too much structure. Often we want to transform a dynamic program into another one that differs along at least some dimensions (for example, in dimensionality or smoothness) and then understand the original problem by studying the second, more tractable one. In Section 5.2, we introduce factored dynamic programs (FDPs; see Section 5.2.2), which involve non-bijective transformations that link a primary ADP to a subordinate ADP of potentially lower dimension. We show how optimality properties transfer between the two. Section 5.3 applies both isomorphisms and FDPs to concrete settings, including Q-factor models, structural estimation, and Epstein–Zin preferences.

5.1 Isomorphisms

In this section we introduce a concept of “isomorphic” dynamic programs. In particular, we describe an isomorphic relationship that leads to essentially equivalent optimality properties. The basic idea can be explained with a simple example. Consider the savings problem from Section 1.3.2.3, with Bellman equation

v(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta v(w - c) \right\}.

If $u$ has heavy curvature near zero, one might consider taking an exponential transformation, aiming to work with functions that are easier to approximate numerically. Applying $\exp$ to both sides, writing $\hat u$ for $\exp \circ \, u$ and $\hat v$ for $\exp \circ \, v$ , we get

\hat v(w) = \max_{0 \leq c \leq w} \left\{ \hat u(c) \hat v(w - c)^\beta \right\}.

Not surprisingly, these two dynamic programs turn out to have exactly the same optimal policies, giving us two viable angles of attack. The theory in this section clarifies and generalizes this idea, with the aim of allowing us to apply it effectively to both simple and complex problems.
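The following Python sketch illustrates this claim on a discretized version of the savings problem. The integer wealth grid, the utility function $u(c) = \sqrt{c}$, the discount factor, and the candidate value function are all assumptions made for illustration and are not taken from the text. The sketch checks that applying $\exp$ after the additive Bellman operator agrees with applying the multiplicative Bellman operator after $\exp$, and that a greedy consumption choice for the additive problem also attains the maximum in the multiplicative problem.

```python
import math

beta, W = 0.9, 20                       # assumed discount factor; wealth grid {0, ..., W}

def u(c):
    return math.sqrt(c)                 # hypothetical utility

def T(v):
    """Additive Bellman operator: max over c of u(c) + beta * v(w - c)."""
    return [max(u(c) + beta * v[w - c] for c in range(w + 1))
            for w in range(W + 1)]

def T_hat(v_hat):
    """Multiplicative Bellman operator, with u_hat = exp(u)."""
    return [max(math.exp(u(c)) * v_hat[w - c] ** beta for c in range(w + 1))
            for w in range(W + 1)]

def greedy(v):
    """A v-greedy consumption policy for the additive problem."""
    return [max(range(w + 1), key=lambda c: u(c) + beta * v[w - c])
            for w in range(W + 1)]

v = [math.log(1.0 + w) for w in range(W + 1)]   # arbitrary candidate value function
v_hat = [math.exp(x) for x in v]                # its image under exp

# exp(T v) should agree with T_hat(exp v), up to rounding.
Tv, Tv_hat = T(v), T_hat(v_hat)
operators_agree = all(math.isclose(math.exp(a), b, rel_tol=1e-9)
                      for a, b in zip(Tv, Tv_hat))

# The additive greedy choice should attain the multiplicative maximum.
sigma = greedy(v)
greedy_agrees = all(
    math.isclose(math.exp(u(sigma[w])) * v_hat[w - sigma[w]] ** beta,
                 Tv_hat[w], rel_tol=1e-9)
    for w in range(W + 1))

print(operators_agree, greedy_agrees)
```

The comparison uses a tolerance rather than exact equality because the two operators accumulate floating-point rounding differently.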

Along the way we will meet useful concepts such as topological conjugacy for dynamic systems and order isomorphisms. We will exploit these ideas to study additional optimization problems and algorithms, including time iteration methods and decision problems with ambiguity.

5.1.1 Background Concepts

We begin with conjugate dynamical systems in Section 5.1.1.1, which preserve fixed point structure. In Section 5.1.1.2, we strengthen conjugacy to topological conjugacy, which additionally preserves stability. In Section 5.1.1.3, we develop order conjugacy, an order-theoretic alternative that is better suited to passing optimality properties between ADPs.

5.1.1.1 Conjugate Dynamics

We recall that a (discrete time) dynamical system is a pair $(V, S)$ , where $V$ is any set and $S$ is a self-map on $V$ . Two dynamical systems $(V, S)$ and $(\hat V, \hat S)$ are said to be conjugate under $F$ (or just conjugate) if

$F$ is a bijection from $V$ to $\hat V$ and $F \circ S = \hat S \circ F$ on $V$ .

We can also write the last equality as $S = F^{-1} \circ \hat S \circ F$ . This helps us understand the conjugacy relationship: shifting a point $v \in V$ to $S v$ via $S$ is equivalent to

moving $v$ into $\hat V$ via $\hat v = F v$ ,
applying $\hat S$ to produce $\hat S \hat v$ , and then
moving the result $\hat S \hat v$ back to the original space $V$ using $F^{-1}$ .

The next result lists some consequences of conjugacy.

The proofs of these claims are straightforward. For example, regarding (i), suppose that $S^n = F^{-1} \hat S^n F$ at some fixed $n$ . Then, using conjugacy,

S^{n+1} = S S^n = S F^{-1} \hat S^n F = F^{-1} \hat S \hat S^n F = F^{-1} \hat S^{n+1} F.
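Since $V$ can be any set, conjugacy can be checked by brute force on a small finite example. The Python sketch below uses a hypothetical self-map on a four-element set and a relabeling bijection `F` (both are assumptions chosen for illustration). It builds $\hat S = F \circ S \circ F^{-1}$ and then checks the conjugacy identity, the iterate identity $S^n = F^{-1} \circ \hat S^n \circ F$, and the correspondence between fixed points.

```python
# Hypothetical finite dynamical system (V, S) with V = {0, 1, 2, 3}.
S = {0: 1, 1: 2, 2: 2, 3: 0}                 # the only fixed point is 2
F = {0: "a", 1: "b", 2: "c", 3: "d"}         # a bijection from V to V_hat
F_inv = {y: x for x, y in F.items()}

# Define S_hat so that S_hat = F o S o F^{-1}.
S_hat = {F[x]: F[S[x]] for x in S}

def iterate(f, x, n):
    """Apply the self-map f (stored as a dict) n times to x."""
    for _ in range(n):
        x = f[x]
    return x

# F o S = S_hat o F on V.
conjugate = all(F[S[x]] == S_hat[F[x]] for x in S)

# S^n = F^{-1} o S_hat^n o F for several n.
iterates_match = all(
    iterate(S, x, n) == F_inv[iterate(S_hat, F[x], n)]
    for x in S for n in range(6))

# Fixed points of S map to fixed points of S_hat under F.
fixed_S = {x for x in S if S[x] == x}
fixed_S_hat = {y for y in S_hat if S_hat[y] == y}

print(conjugate, iterates_match, fixed_S, fixed_S_hat)
```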

5.1.1.2 Topological Conjugacy

For us, perhaps the most important consequence of Proposition 5.1.1 is that, if two dynamical systems $(V, S)$ and $(\hat V, \hat S)$ are conjugate, then $(V, S)$ has a unique fixed point if and only if $(\hat V, \hat S)$ has a unique fixed point. At the same time, conjugacy is not enough to pass on stability properties. For this we need topological conjugacy.

To state this property, let $V$ and $\hat V$ be Hausdorff topological spaces. A map $F \colon V \to \hat V$ is called a homeomorphism if $F$ is a bijection from $V$ to $\hat V$ and, in addition, both $F$ and its inverse are continuous. The dynamical systems $(V, S)$ and $(\hat V, \hat S)$ are called topologically conjugate when these two systems are conjugate under a bijection $F$ and, in addition, $F$ is a homeomorphism.

In this setting, we have the following result:

The following example gives an elementary but nonetheless important illustration of the value of Proposition 5.1.2.

We will use Proposition 5.1.2 when we study Euler equations and time iteration in Section 8.3.3.
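As a numerical illustration of how topological conjugacy can pass on stability, consider the hypothetical system $S k = A k^\alpha$ on $(0, \infty)$ with $0 < \alpha < 1$. This system, the parameter values, and the starting points are assumptions chosen for this sketch and are not taken from the text. The map $F = \ln$ is a homeomorphism from $(0, \infty)$ to $\RR$, and it conjugates $S$ to the affine map $\hat S x = \ln A + \alpha x$, which is a contraction of modulus $\alpha$ on $\RR$. The sketch checks the conjugacy identity and then confirms that iterates of $S$ converge to $k^* = A^{1/(1-\alpha)}$ from several starting points.

```python
import math

A, alpha = 2.0, 0.3                      # hypothetical parameters with 0 < alpha < 1

def S(k):
    return A * k ** alpha                # dynamics on V = (0, infinity)

def S_hat(x):
    return math.log(A) + alpha * x       # affine contraction on V_hat = R

F = math.log                             # homeomorphism from (0, infinity) to R

def iterate(f, x, n):
    for _ in range(n):
        x = f(x)
    return x

# F o S = S_hat o F at a few sample points.
conjugate = all(math.isclose(F(S(k)), S_hat(F(k)), abs_tol=1e-12)
                for k in (0.01, 0.5, 1.0, 7.0, 50.0))

# The fixed point of S_hat is ln(A) / (1 - alpha); its preimage under F is k_star.
k_star = A ** (1.0 / (1.0 - alpha))
limits = [iterate(S, k0, 100) for k0 in (0.01, 1.0, 50.0)]

print(conjugate, k_star, limits)
```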

5.1.1.3 Order Conjugacy

In the previous two sections we introduced conjugacy and then strengthened it to topological conjugacy, a fairly standard approach. Now, however, we will step back to conjugacy and strengthen it using order rather than topology. The order-theoretic notion we develop will be analogous to topological conjugacy, while at the same time being better suited to passing optimality properties from one ADP to another.

To begin, we take $V$ and $\hat V$ to be posets and consider two dynamical systems $(V, S)$ and $(\hat V, \hat S)$ . We call these systems order conjugate under $F$ if

$(V, S)$ and $(\hat V, \hat S)$ are conjugate under $F$ and,
$F$ is an order isomorphism (see Section A.1.2.9).

To indicate that such an $F$ can be found, we simply say that $(V, S)$ and $(\hat V, \hat S)$ are order conjugate.

Solution to Exercise 5.1.2

Let $D$ be the set of all dynamical systems $(V, S)$ where $V$ is partially ordered. For $(V, S)$ and $(\hat V, \hat S)$ in $D$ , we write $(V, S) \sim (\hat V, \hat S)$ when $(V, S)$ and $(\hat V, \hat S)$ are order conjugate. We claim that $\sim$ is reflexive, symmetric and transitive. Reflexivity is obvious: every $(V, S)$ in $D$ is order conjugate to itself under the identity map $I$ . Symmetry is also straightforward: If $(V, S)$ and $(\hat V, \hat S)$ are order conjugate under $F$ , then $(\hat V, \hat S)$ and $(V, S)$ are order conjugate under $F^{-1}$ . Finally, if $(V, S) \sim (V', S')$ under $F$ and $(V', S') \sim (V'', S'')$ under $G$ , then $G \circ F$ is an order isomorphism from $V$ to $V''$ and

G \circ F \circ S = G \circ S' \circ F = S'' \circ G \circ F \text{ on } V.

Hence $(V, S) \sim (V'', S'')$ and $\sim$ is also transitive.

The next lemma shows one benefit of establishing order conjugacy. It can be thought of as an order-theoretic version of Proposition 5.1.2.

Solution to Exercise 5.1.3

Let $(V, S)$ and $(\hat V, \hat S)$ be order conjugate under $F$ , with respective fixed points $v$ and $\hat v = Fv$ . By Proposition 5.1.1, $\hat v$ is the unique fixed point of $\hat S$ in $\hat V$ .

For (i), suppose $S$ is order stable on $V$ and let $\hat w$ satisfy $\hat S \hat w \preceq \hat w$ . Then $F^{-1} \hat S \hat w \preceq F^{-1} \hat w$ , so $S F^{-1} \hat w \preceq F^{-1} \hat w$ . By order stability of $S$ , $v \preceq F^{-1} \hat w$ , i.e., $\hat v = Fv \preceq \hat w$ . The proof that $\hat w \preceq \hat S \hat w$ implies $\hat w \preceq \hat v$ is similar.

For (ii), suppose $S$ is strongly order stable on $V$ and let $\hat w$ satisfy $\hat S \hat w \preceq \hat w$ . As above, $S F^{-1} \hat w \preceq F^{-1} \hat w$ . Strong order stability of $S$ now gives $S^n F^{-1} \hat w \downarrow v$ . Applying Exercise A.1.15 yields $F S^n F^{-1} \hat w \downarrow Fv = \hat v$ , and since $\hat S^n = F S^n F^{-1}$ by Proposition 5.1.1, we obtain $\hat S^n \hat w \downarrow \hat v$ . The proof that $\hat w \preceq \hat S \hat w$ implies $\hat S^n \hat w \uparrow \hat v$ is similar.

Both reverse implications hold by symmetry.

5.1.2 Isomorphic ADPs

In this section, we use order conjugacy to connect dynamic programs. We are interested in whether or not ADPs can be connected by order isomorphisms (or anti-isomorphisms), and what implications this has for optimality. In Section 5.1.2.1, we define isomorphic ADPs and show that isomorphism is an equivalence relation. In Section 5.1.2.2, we prove that isomorphic ADPs share the same optimality and convergence properties. Section 5.1.2.3 extends the analysis to anti-isomorphic ADPs, where maximization in one ADP corresponds to minimization in the other. Finally, in Section 5.1.3, we apply both isomorphic and anti-isomorphic relationships to establish optimality for an Epstein–Zin preference model.

5.1.2.1 Definition and Consequences

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be ADPs with policy operators $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ and $\hat{\TT} \coloneq \setntn{\hat T_\sigma}{\sigma \in \Sigma}$ . We call these ADPs isomorphic under $F$ if

(i) these two ADPs have the same policy set $\Sigma$ , and
(ii) $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ are order conjugate under $F$ for all $\sigma \in \Sigma$ .

Part (ii) requires that $F$ is an order isomorphism from $V$ to $\hat V$ and

F \circ T_\sigma = \hat T_\sigma \circ F \;\; \text{ on } V \text{ for all } \sigma \in \Sigma.

(5.1)

Example 5.1.4

Consider an ADP where $T_\sigma$ , the policy operator, has the form

(T_\sigma \, v)(w) = u(\sigma(w)) + \beta v(w - \sigma(w)).

(5.2)

(This is the problem we introduced at the start of Section 5.1.) This operator maps $c\RR$ , the set of all continuous real-valued functions on $\RR$ , into itself. Consider also a “multiplicative” version

(\hat T_\sigma \, \hat v)(w) = \hat u(\sigma(w)) [\hat v(w - \sigma(w))]^\beta

(5.3)

where $\hat u \coloneq \exp \circ \, u$ and $\hat T_\sigma$ acts on functions in $c(0,\infty)$ , the set of continuous everywhere positive functions on the positive reals. Then, given $v \in c\RR$ , we have

\begin{aligned} \exp [(T_\sigma \, v)(w)] & = \exp [ u(\sigma(w)) ] \cdot \exp[ \beta v(w - \sigma(w))] \\ & = \hat u(\sigma(w)) \cdot [\exp v(w - \sigma(w)) ]^\beta = (\hat T_\sigma \, \exp \circ \, v)(w). \end{aligned}

Since $v$ and $w$ were chosen arbitrarily, we find that $F \circ T_\sigma = \hat T_\sigma \circ F$ on $c\RR$ , where $F$ is the transformation given by $F v \coloneq \exp \circ \, v$ . Exercise A.1.10 tells us that $F$ is an order isomorphism from $c\RR$ to $c(0,\infty)$ , so $(c\RR, \TT)$ and $(c(0,\infty), \hat{\TT})$ are isomorphic.
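One consequence of this conjugacy is that fixed points are carried across by $F$, so that $\exp \circ \, v_\sigma$ is the fixed point of $\hat T_\sigma$ whenever $v_\sigma$ is the fixed point of $T_\sigma$. The Python sketch below checks this on a discretized version of Example 5.1.4. The integer wealth grid, the utility function, the discount factor, and the "consume about half" policy are all assumptions made for illustration.

```python
import math

beta, W = 0.9, 20                        # assumed discount factor; wealth grid {0, ..., W}

def u(c):
    return math.sqrt(c)                  # hypothetical utility

def sigma(w):
    return (w + 1) // 2                  # hypothetical policy: consume about half

def T_sigma(v):
    """Additive policy operator, as in (5.2), restricted to the grid."""
    return [u(sigma(w)) + beta * v[w - sigma(w)] for w in range(W + 1)]

def T_hat_sigma(v_hat):
    """Multiplicative policy operator, as in (5.3), with u_hat = exp(u)."""
    return [math.exp(u(sigma(w))) * v_hat[w - sigma(w)] ** beta
            for w in range(W + 1)]

def iterate(op, v, n=300):
    for _ in range(n):
        v = op(v)
    return v

v0 = [0.0] * (W + 1)
v_sigma = iterate(T_sigma, v0)                                   # fixed point of T_sigma
v_hat_sigma = iterate(T_hat_sigma, [math.exp(x) for x in v0])    # fixed point of T_hat_sigma

# F v_sigma = exp(v_sigma) should coincide with v_hat_sigma, up to rounding.
fixed_points_match = all(math.isclose(math.exp(a), b, rel_tol=1e-9)
                         for a, b in zip(v_sigma, v_hat_sigma))
print(fixed_points_match)
```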

In other words, if $\mathbf A$ is the set of all ADPs and, for $(V, \TT), (\hat V, \hat{\TT}) \in \mathbf A$ , the symbol $(V, \TT) \sim (\hat V, \hat{\TT})$ means $(V, \TT)$ and $(\hat V, \hat{\TT})$ are isomorphic, then $\sim$ is reflexive, symmetric and transitive.

5.1.2.2 Isomorphisms and Optimality

We seek relationships between optimality properties of isomorphic ADPs. For all of this section, we take $(V, \TT)$ and $(\hat V, \hat{\TT})$ to be two ADPs with $\TT = \setntn{T_\sigma}{\sigma \in \Sigma}$ and $\hat{\TT} = \setntn{\hat T_\sigma}{\sigma \in \Sigma}$ . When they exist, we let

$v_\sigma$ (resp., $\hat v_\sigma$ ) be the unique fixed point of $T_\sigma$ (resp., $\hat T_\sigma$ ),
$\tmax$ (resp., $\htmax$ ) be the Bellman operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$ ), and
$\vmax$ (resp., $\hvmax$ ) be the value function of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$ ).

The next theorem shows that isomorphic ADPs share the same regularity and optimality properties:

Proof

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be isomorphic under $F$ . Regarding (i), fix $v \in V$ and suppose that $\sigma$ is $v$ -greedy for $(V, \TT)$ . Then $T_\tau \, v \preceq T_\sigma \, v$ and hence $F T_\tau \, v \preceq F T_\sigma \, v$ for all $\tau \in \Sigma$ . Conjugacy now implies that $\hat T_\tau \, F v \preceq \hat T_\sigma \, F v$ for all $\tau \in \Sigma$ , so $\sigma$ is $Fv$ -greedy for $(\hat V, \hat{\TT})$ . The converse implication is symmetric.

Claim (ii) is immediate from claim (i). Claims (iii) and (iv) follow directly from order conjugacy of the policy operators (as in (5.1)) and Lemma 5.1.3. Regarding (v), we use order conjugacy of the policy operators to obtain $F v_\sigma = \hat v_\sigma$ for all $\sigma \in \Sigma$ , from which it follows that

v_\sigma = \bigvee_\tau v_\tau \quad \iff \quad F v_\sigma = F \bigvee_\tau v_\tau = \bigvee_\tau F v_\tau \quad \iff \quad \hat v_\sigma = \bigvee_\tau \hat v_\tau

In other words, $\sigma$ is optimal for $(V, \TT)$ if and only if $\sigma$ is optimal for $(\hat V, \hat{\TT})$ . ◻

The next theorem studies the case when $(V, \TT)$ and $(\hat V, \hat{\TT})$ are regular and well-posed. It tells us that isomorphic ADPs have the same optimality properties.

Proof

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be as stated. Regarding (i), we fix $v \in V$ and apply (5.1) to obtain

F T v = F \bigvee_\sigma T_\sigma \, v = \bigvee_\sigma F T_\sigma \, v = \bigvee_\sigma \hat T_\sigma \, F v = \htmax \, F \, v.

(The second equality follows from regularity and Lemma A.1.4.) This confirms (5.4), so $T$ and $\hat T$ are order conjugate under $F$ .

Regarding (ii), suppose that $\vmax = \bigvee_\sigma v_\sigma$ exists. Then $\hvmax = \bigvee_\sigma F v_\sigma = F \bigvee_\sigma v_\sigma = F \vmax$ .

Regarding (iii), suppose that the fundamental optimality properties hold for $(V, \TT)$ . We need only show that they likewise hold for $(\hat V, \hat{\TT})$ , since the reverse implication then holds by symmetry. First, an optimal policy exists for $(\hat V, \hat{\TT})$ by existence for $(V, \TT)$ and part (v) of Theorem 5.1.5. Second, $\vmax$ is the unique fixed point of $T$ and, in addition, $(V, T)$ and $(\hat V, \hat T)$ are order conjugate, so $F\vmax$ is the unique fixed point of $\hat T$ . In view of (ii), this means that $\hvmax$ is the unique fixed point of $\hat T$ . Bellman’s principle of optimality also holds for $(\hat V, \hat{\TT})$ by Lemma 2.1.3 (or by (i) and (v) of Theorem 5.1.5). ◻

The next theorem considers convergence of algorithms.

Theorem 5.1.7

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be regular, well-posed ADPs. If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are isomorphic under $F$ , then

(i) the respective optimistic policy operators $W$ and $\hat W$ obey

F \circ W = \hat W \circ F \text{ on } V, \text{ and}

(5.5)

(ii) the respective Howard policy operators $H$ and $\hat H$ obey

F \circ H = \hat H \circ F \text{ on } V.

(5.6)

Moreover, if the fundamental optimality properties hold for one and hence both of these ADPs, then the following statements are true.

(iii) VFI converges for $(V, \TT)$ if and only if VFI converges for $(\hat V, \hat{\TT})$ ,
(iv) OPI converges for $(V, \TT)$ if and only if OPI converges for $(\hat V, \hat{\TT})$ , and
(v) HPI converges for $(V, \TT)$ if and only if HPI converges for $(\hat V, \hat{\TT})$ .

The proof is long but straightforward and presented as a solved exercise.

Solution to Exercise 5.1.5

Fix $m \in \NN$ . Let $W \coloneq W_m$ and $H$ be the optimistic and Howard policy operators for $(V, \TT)$ . Let $\hat W \coloneq \hat W_m$ and $\hat H$ be the optimistic and Howard policy operators for $(\hat V, \hat{\TT})$ . Fix $v \in V$ and let $\sigma$ be $v$ -greedy for $(V, \TT)$ , so that $W v = T^m _\sigma v$ . By Theorem 5.1.5, $\sigma$ is $Fv$ -greedy for $(\hat V, \hat{\TT})$ . Hence $\hat W F v = \hat T^m_\sigma F v = F T^m_\sigma v = F W v$ . This proves that (5.5) holds. Similarly, continuing to assume that $\sigma$ is $v$ -greedy for $(V, \TT)$ , we have $Hv = v_\sigma$ and, because $\sigma$ is $Fv$ -greedy for $(\hat V, \hat{\TT})$ , we also have $\hat H Fv = \hat v_\sigma$ . As a result, $F^{-1} \hat H F v = F^{-1} \hat v_\sigma = F^{-1} F v_\sigma = v_\sigma = H v$ . Hence (5.6) also holds.

Regarding (iii)–(v), we prove only (v), since the remaining arguments are similar. Suppose that HPI converges for $(V, \TT)$ and fix $\hat v \in \hat V_U$ . Then $v \coloneq F^{-1} \hat v$ is in $V_U$ , since $v = F^{-1} \hat v \preceq F^{-1} \hat T \hat v = T F^{-1} \hat v = Tv$ . As a result, we have $H^n v \uparrow \vmax$ . But then $F H^n v \uparrow F \vmax = \hvmax$ , where $\uparrow$ is by Exercise A.1.15 and the equality is by Theorem 5.1.6(ii). Since $H$ and $\hat H$ are conjugate under $F$ , we also have $F H^n v = \hat H^n F v$ . Combining the last two results gives us $\hat H^n \hat v = \hat H^n F v = F H^n v \uparrow \hvmax$ . As $\hat v$ was chosen arbitrarily from $\hat V_U$ , we see that HPI converges for $(\hat V, \hat{\TT})$ .

5.1.2.3 The Anti-Isomorphic Case

In this section, we switch to studying anti-isomorphic ADPs. In doing so, we will consider minimization as well as maximization. We follow the notational conventions and terminology introduced in Section 2.2.3. Most readers will find it helpful to review that section before reading this one.

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be ADPs with the same policy set. In line with the notation in Section 5.1.2.2, we let

$\tmin$ (resp., $\htmin$ ) be the Bellman min-operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$ ),
$\vmin$ (resp., $\hvmin$ ) be the min-value function of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$ ),
$\Hmin$ (resp., $\hHmin$ ) be the Howard policy min-operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$ ), and
$\Wmin$ (resp., $\hWmin$ ) be the optimistic policy min-operator of $(V, \TT)$ (resp., $(\hat V, \hat{\TT})$ ).

As was the case in Section 2.2.3, we enhance clarity by adding a “max-” prefix to previously introduced definitions that pertain to maximization. For example,

“optimal policies” will be referred to as “max-optimal policies”,
the “Bellman equation” will be referred to as the “Bellman max-equation”,
the “Bellman operator” will be referred to as the “Bellman max-operator”,

and so on.

We call $(V, \TT)$ and $(\hat V, \hat{\TT})$ anti-isomorphic under $F$ if these two ADPs have the same policy set $\Sigma$ and, in addition, $F$ is an anti-isomorphism from $V$ to $\hat V$ such that (5.1) holds.

We can also express this relationship in terms of isomorphisms and duality of ADPs, as defined in Section 2.2.3.2:

Here is an optimality result for anti-isomorphic ADPs that parallels Theorem 5.1.5.

Solution to Exercise 5.1.7

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be anti-isomorphic under $F$ , so that $(V, \TT)$ and $(\hat V, \hat{\TT})^\partial$ are isomorphic under $F$ . Theorem 5.1.6 implies that $F \circ \tmax = \htmax^\partial \circ F$ , where $\htmax^\partial$ is the Bellman max-operator of $(\hat V, \hat{\TT})^\partial$ . Exercise 2.2.4 gives $\htmax^\partial = \htmin$ . Combining these results gives (5.7).

Regarding (ii), since $(V, \TT)$ and $(\hat V, \hat{\TT})^\partial$ are isomorphic under $F$ , Theorem 5.1.6 yields $\hvmaxd = F \vmax$ . Applying Exercise 2.2.4 again, we have $\hvmaxd = \hvmin$ . Hence $\hvmin = F \vmax$ .

Regarding (iii), Theorem 5.1.6 implies that the fundamental max-optimality properties hold for $(V, \TT)$ if and only if these same max-optimality properties hold for $(\hat V, \hat {\TT})^\partial$ . Exercise 2.2.6 tells us that the fundamental max-optimality properties hold for $(\hat V, \hat{\TT})^\partial$ if and only if the fundamental min-optimality properties hold for $(\hat V, \hat{\TT})$ .

Now we consider convergence of algorithms in the anti-isomorphic case.

Theorem 5.1.10

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be well-posed ADPs that are max-regular and min-regular, respectively. If $(V, \TT)$ and $(\hat V, \hat{\TT})$ are anti-isomorphic under $F$ , then

(i) the optimistic policy operators $W$ and $\hWmin$ obey

F \circ W = \hWmin \circ F \text{ on } V, \text{ and}

(5.8)

(ii) the Howard policy operators $H$ and $\hHmin$ obey

F \circ H = \hHmin \circ F \text{ on } V.

(5.9)

Moreover, if the fundamental optimality properties hold for one and hence both of these ADPs, then the following statements are true.

(iii) max-VFI converges for $(V, \TT)$ if and only if min-VFI converges for $(\hat V, \hat{\TT})$ ,
(iv) max-OPI converges for $(V, \TT)$ if and only if min-OPI converges for $(\hat V, \hat{\TT})$ , and
(v) max-HPI converges for $(V, \TT)$ if and only if min-HPI converges for $(\hat V, \hat{\TT})$ .

Solution to Exercise 5.1.8

Let $(V, \TT)$ and $(\hat V, \hat{\TT})$ be anti-isomorphic under $F$ , so that $(V, \TT)$ and $(\hat V, \hat{\TT})^\partial$ are isomorphic under $F$ . Theorem 5.1.7 implies that $F \circ W = \hWmax^\partial \circ F$ , where $\hWmax^\partial$ is the optimistic policy max-operator of $(\hat V, \hat{\TT})^\partial$ . Exercise 2.2.4 gives $\hWmax^\partial = \hWmin$ . Combining these results gives (5.8). The proof of (5.9) is similar.

Regarding (iii), if max-VFI converges for $(V, \TT)$ , then by Theorem 5.1.7, max-VFI converges for $(\hat V, \hat{\TT})^\partial$ . But then min-VFI converges for $(\hat V, \hat{\TT})$ , by Exercise 2.2.6. The proof of the converse implication is symmetric, and the proofs of (iv) and (v) are similar.

5.1.3 Example: Epstein–Zin Optimality

In this section, we use the theory of isomorphic and anti-isomorphic ADPs to study optimality properties of a modified MDP model that incorporates Epstein–Zin preferences (motivation for which was provided in Section 1.3.3). We will show how isomorphic and anti-isomorphic relationships can be used to simplify analysis.

To begin, consider a variation of the finite state MDP from Section 1.2 where the Bellman equation is modified to

v(x) = \max_{a \in \Gamma(x)} \left\{ (1-\beta) r(x, a)^\alpha + \beta \left[ (Lv)(x,a) \right]^{\alpha} \right\}^{1/\alpha},

(5.10)

with

(Lv)(x, a) \coloneq \left( \sum_{x'} v(x')^\nu P(x, a, x') \right)^{1/\nu}.

Here $\Xsf$ , $\Asf$ , $r$ , $\Gamma$ , $\beta$ , and $P$ are all as in Section 1.2, while $\Gsf$ is the set of feasible state-action pairs. The parameters $\alpha$ and $\nu$ are connected to the notation of Section 1.3.3 via $\alpha = 1 - 1/\psi$ and $\nu = 1 - \gamma$ , where $\psi$ is the EIS and $\gamma$ is the coefficient of relative risk aversion. The change of notation is made to simplify the presentation. If $\alpha=\nu=1$ , then this Epstein–Zin model reduces to an ordinary finite state MDP. In this section, however, we allow $\alpha$ and $\nu$ to take any nonzero values.

Using the notation

r_\sigma(x) \coloneq r(x, \sigma(x)) \quad \text{and} \quad (L_\sigma v)(x) \coloneq (Lv)(x, \sigma(x)),

we can write a corresponding policy operator $T_\sigma$ as

T_\sigma \, v = \left\{ (1-\beta) r_\sigma^\alpha + \beta \left( L_\sigma \, v \right)^{\alpha} \right\}^{1/\alpha}.

(5.11)

Let $\TT_{\rm EZ}$ be the set of all such $T_\sigma$ .
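The policy operator in (5.11) is straightforward to compute on a finite state space. The Python sketch below is one minimal way to do so under stated assumptions: the two-state reward vector `r_sigma`, the transition matrix `P_sigma`, the parameter values, and the candidate `v` are hypothetical, and the function names are introduced here for illustration. The last lines check the reduction to the linear form $(1-\beta) r_\sigma + \beta P_\sigma v$ when $\alpha = \nu = 1$.

```python
import math

def L_sigma(v, P_sigma, nu):
    """(L_sigma v)(x) = (sum over x' of v(x')**nu * P_sigma[x][x'])**(1/nu)."""
    return [sum(p * vx ** nu for p, vx in zip(row, v)) ** (1.0 / nu)
            for row in P_sigma]

def T_sigma(v, r_sigma, P_sigma, alpha, nu, beta):
    """Epstein-Zin policy operator, following (5.11)."""
    return [((1 - beta) * r ** alpha + beta * l ** alpha) ** (1.0 / alpha)
            for r, l in zip(r_sigma, L_sigma(v, P_sigma, nu))]

# Hypothetical two-state example under a fixed policy sigma.
beta = 0.95
r_sigma = [1.0, 2.0]                       # strictly positive rewards
P_sigma = [[0.9, 0.1], [0.4, 0.6]]         # rows sum to one
v = [1.5, 1.8]                             # strictly positive candidate

ez_value = T_sigma(v, r_sigma, P_sigma, alpha=0.5, nu=-1.0, beta=beta)

# With alpha = nu = 1 the operator is linear in v.
linear = T_sigma(v, r_sigma, P_sigma, alpha=1.0, nu=1.0, beta=beta)
mdp = [(1 - beta) * r + beta * sum(p * vx for p, vx in zip(row, v))
       for r, row in zip(r_sigma, P_sigma)]

print(ez_value, linear, mdp)
```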

We assume that $r$ is strictly positive, so that $T_\sigma$ maps $(0, \infty)^\Xsf$ into itself. Sargent & Stachurski (2025) establish optimality properties for this specification when $P_\sigma$ is irreducible for all $\sigma \in \Sigma$ . Here we drop this assumption and establish the same optimality properties.

To this end, let $\theta = \nu/\alpha$ . Fix $\epsilon > 0$ with $\min r^\alpha - \epsilon > 0$ . Let

\hat V = [v_1, v_2] \quad \text{where} \quad v_1 = m_1 \wedge m_2 \text{ and } v_2 = m_1 \vee m_2.

Here $m_1 \coloneq \left( \min r^\alpha - \epsilon \right)^\theta$ and $m_2 \coloneq \left( \max r^\alpha + \epsilon \right)^\theta$ . Let $F$ be defined by

F \, v = v^\nu \qquad \text{with } v \in (0, \infty)^\Xsf,

where the exponent $\nu$ is applied pointwise to $v$ , and set

V \coloneq F^{-1} \hat V = \setntn{v \in (0, \infty)^\Xsf}{v_1 \leq v^\nu \leq v_2}.

(5.12)

We are interested in optimality properties of $(V, \TT_{\rm EZ})$ . While we can try to tackle this ADP directly, the arguments become significantly easier after a transformation. To pursue this path, we introduce the auxiliary ADP $(\hat V, \hat{\TT}_{\rm EZ})$ with $\hat V$ as defined above and

\hat T_\sigma \, v = \left\{ (1-\beta) r_\sigma^\alpha + \beta \left( P_\sigma \, v \right)^{1/\theta} \right\}^\theta.

(5.13)

In the next exercise, $f \ll g$ means $f(x) < g(x)$ for all $x$ , as in Section 4.1.3.

Solution to Exercise 5.1.9

Suppose first that $\theta > 0$ , so that $v_1 = m_1$ and $v_2 = m_2$ . Then

\begin{aligned} \hat T_\sigma \, v_1 &= \left\{ (1-\beta) r^\alpha_\sigma + \beta (\min r^\alpha - \epsilon) \right\}^\theta \\ &\geq \left\{ (1-\beta) \min r^\alpha + \beta (\min r^\alpha - \epsilon) \right\}^\theta = \left\{ \min r^\alpha - \beta \epsilon \right\}^\theta > m_1 = v_1. \end{aligned}

In addition,

\begin{aligned} \hat T_\sigma \, v_2 &= \left\{ (1-\beta) r_\sigma^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta \\ &\leq \left\{ (1-\beta) \max r^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta = \left\{ \max r^\alpha + \beta \epsilon \right\}^\theta < m_2 = v_2. \end{aligned}

If $\theta < 0$ , then $v_1 = m_2$ and $v_2 = m_1$ , so

\begin{aligned} \hat T_\sigma \, v_1 &= \left\{ (1-\beta) r_\sigma^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta \\ &\geq \left\{ (1-\beta) \max r^\alpha + \beta (\max r^\alpha + \epsilon) \right\}^\theta = \left\{ \max r^\alpha + \beta \epsilon \right\}^\theta > m_2 = v_1. \end{aligned}

In addition,

\begin{aligned} \hat T_\sigma \, v_2 &= \left\{ (1-\beta) r_\sigma^\alpha + \beta (\min r^\alpha - \epsilon) \right\}^\theta \\ &\leq \left\{ (1-\beta) \min r^\alpha + \beta (\min r^\alpha - \epsilon) \right\}^\theta = \left\{ \min r^\alpha - \beta \epsilon \right\}^\theta < m_1 = v_2. \end{aligned}
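The inequalities above can be checked numerically. The Python sketch below evaluates $\hat T_\sigma$ from (5.13) at the constant functions $v_1$ and $v_2$ for one positive and one negative value of $\theta$. The two-state rewards, transition matrix, and parameter values are hypothetical, and the sketch assumes that `r_sigma` contains both the smallest and the largest feasible reward, so that the minimum and maximum of $r^\alpha$ can be read off from it.

```python
def T_hat_sigma(v, r_sigma, P_sigma, alpha, theta, beta):
    """Auxiliary policy operator, following (5.13)."""
    Pv = [sum(p * vx for p, vx in zip(row, v)) for row in P_sigma]
    return [((1 - beta) * r ** alpha + beta * pv ** (1.0 / theta)) ** theta
            for r, pv in zip(r_sigma, Pv)]

def interval(r_values, alpha, theta, eps):
    """Return the constants v_1 = min(m_1, m_2) and v_2 = max(m_1, m_2)."""
    r_alpha = [r ** alpha for r in r_values]
    m1 = (min(r_alpha) - eps) ** theta
    m2 = (max(r_alpha) + eps) ** theta
    return min(m1, m2), max(m1, m2)

# Hypothetical primitives; eps satisfies min r**alpha - eps > 0.
beta, alpha, eps = 0.95, 0.5, 0.1
r_sigma = [1.0, 2.0]
P_sigma = [[0.9, 0.1], [0.4, 0.6]]

def maps_strictly_inside(theta):
    """Check v_1 << T_hat_sigma v_1 and T_hat_sigma v_2 << v_2 pointwise."""
    v1, v2 = interval(r_sigma, alpha, theta, eps)
    n = len(r_sigma)
    lower = T_hat_sigma([v1] * n, r_sigma, P_sigma, alpha, theta, beta)
    upper = T_hat_sigma([v2] * n, r_sigma, P_sigma, alpha, theta, beta)
    return all(v1 < a for a in lower) and all(b < v2 for b in upper)

checks = {theta: maps_strictly_inside(theta) for theta in (0.5, -2.0)}
print(checks)
```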

The results from Exercise 5.1.9 and Exercise 5.1.10 allow us to establish optimality properties for the auxiliary ADP $(\hat V, \hat{\TT}_{\rm EZ})$ .

Solution to Exercise 5.1.11

Fix $\sigma \in \Sigma$ and $v \in V$ . On the one hand,

F \, T_\sigma \, v = (T_\sigma \, v)^\nu = \left\{ (1-\beta) r_\sigma^\alpha + \beta (P_\sigma\, v^\nu)^{\alpha/\nu} \right\}^{\nu/\alpha} = \left\{ (1-\beta) r_\sigma^\alpha + \beta (P_\sigma\, v^\nu)^{1/\theta} \right\}^{\theta}.

On the other,

\hat T_\sigma \, F \, v = \hat T_\sigma \, v^\nu = \left\{ (1-\beta) r_\sigma^\alpha + \beta (P_\sigma\, v^\nu)^{1/\theta} \right\}^{\theta}.

Hence $F \circ T_\sigma = \hat T_\sigma \circ F$ on $V$ , as claimed.
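The identity $F \circ T_\sigma = \hat T_\sigma \circ F$ can also be confirmed numerically. The Python sketch below does this for one positive and one negative value of $\nu$, using hypothetical two-state primitives and function names introduced here for illustration. The candidate `v` is only required to be strictly positive, since the algebra above does not use the bounds that define $V$.

```python
import math

def T_sigma(v, r_sigma, P_sigma, alpha, nu, beta):
    """Epstein-Zin policy operator, following (5.11)."""
    Lv = [sum(p * vx ** nu for p, vx in zip(row, v)) ** (1.0 / nu)
          for row in P_sigma]
    return [((1 - beta) * r ** alpha + beta * l ** alpha) ** (1.0 / alpha)
            for r, l in zip(r_sigma, Lv)]

def T_hat_sigma(w, r_sigma, P_sigma, alpha, nu, beta):
    """Auxiliary policy operator, following (5.13), with theta = nu / alpha."""
    theta = nu / alpha
    Pw = [sum(p * wx for p, wx in zip(row, w)) for row in P_sigma]
    return [((1 - beta) * r ** alpha + beta * pw ** (1.0 / theta)) ** theta
            for r, pw in zip(r_sigma, Pw)]

def F(v, nu):
    """The transformation F v = v**nu, applied pointwise."""
    return [x ** nu for x in v]

# Hypothetical two-state primitives.
beta, alpha = 0.95, 0.5
r_sigma = [1.0, 2.0]
P_sigma = [[0.9, 0.1], [0.4, 0.6]]
v = [1.5, 1.8]

def conjugacy_holds(nu):
    lhs = F(T_sigma(v, r_sigma, P_sigma, alpha, nu, beta), nu)
    rhs = T_hat_sigma(F(v, nu), r_sigma, P_sigma, alpha, nu, beta)
    return all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(lhs, rhs))

results = {nu: conjugacy_holds(nu) for nu in (0.25, -1.0)}

# F preserves the pointwise order when nu > 0 and reverses it when nu < 0.
order_preserved = F([1.0], 0.25)[0] < F([2.0], 0.25)[0]
order_reversed = F([1.0], -1.0)[0] > F([2.0], -1.0)[0]
print(results, order_preserved, order_reversed)
```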

The next result follows easily from the conclusion of Exercise 5.1.11.

We are now ready to state and prove the main result of this section.

Proof of Proposition 5.1.13.

Suppose first that $\nu > 0$ . Then, by Lemma 5.1.12, $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are isomorphic. Moreover, by Lemma 5.1.11, the fundamental max-optimality results hold for $(\hat V, \hat{\TT}_{\rm EZ})$ and max-VFI, max-OPI, and max-HPI all converge. Theorem 5.1.6 and Theorem 5.1.7 now imply that the same results hold for $(V, \TT_{\rm EZ})$ .

Next, suppose that $\nu < 0$ . Then, by Lemma 5.1.12, $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ are anti-isomorphic. Moreover, by Lemma 5.1.11, the fundamental min-optimality results hold for $(\hat V, \hat{\TT}_{\rm EZ})$ , which implies that $(\hat V, \hat{\TT}_{\rm EZ})$ is well-posed and min-regular. Theorem 5.1.8 then gives that $(V, \TT_{\rm EZ})$ is also well-posed and max-regular. In addition, min-VFI, min-OPI, and min-HPI all converge for $(\hat V, \hat{\TT}_{\rm EZ})$ . Applying Theorem 5.1.9 and Theorem 5.1.10, we see that the fundamental max-optimality results hold for $(V, \TT_{\rm EZ})$ and max-VFI, max-OPI, and max-HPI all converge. ◻

The relationship between $(V, \TT_{\rm EZ})$ and $(\hat V, \hat{\TT}_{\rm EZ})$ allows us to use either one to solve for an optimal policy. For example, if $\nu < 0$ , then, by Theorem 5.1.8, any min-optimal policy for $(\hat V, \hat{\TT}_{\rm EZ})$ will be max-optimal for $(V, \TT_{\rm EZ})$ . Hence we can solve for the Epstein–Zin max-optimal policy either by directly solving $(V, \TT_{\rm EZ})$ or by solving $(\hat V, \hat{\TT}_{\rm EZ})$ for a min-optimal policy. The best choice depends on computational simplicity and numerical stability.

5.2 Semiconjugate Relationships

In this section we introduce an asymmetric relationship between ADPs that involves a form of factorization. This factorization produces two versions of an ADP: a primary one and a subordinate one. Typically, the primary ADP will be relatively standard, while the subordinate ADP can be thought of as a variation that might have some analytical or computational advantages. Under certain conditions, studying the “simpler” subordinate ADP will shed light on outcomes and solutions for the primary ADP.

In the applications we consider, the associated transformations are not bijective, unlike the isomorphic relationships considered in Section 5.1. Sometimes the lack of bijective transformations occurs because one dynamic program (the primary ADP) evolves in a higher dimensional space than the other (the subordinate ADP). Although the transformations in question are not bijections, we nonetheless show that the primary and subordinate ADPs have tight connections in terms of optimality.

This section is related to the dynamic programs associated with the modified Bellman equations we introduced in Chapter 5 of Sargent & Stachurski (2025). Relative to that theory, the exposition below is more concise and more general. As a result, we can easily cover additional variations on the MDP Bellman equation, of which there are many, as well as studying relationships between ADPs beyond the traditional MDP setting.

5.2.1 Strong Semiconjugacy

We begin by introducing a similarity notion related to conjugacy (see Section 5.1.1.1) and show how it links the dynamics of trajectories on posets.

5.2.1.1 Definition

Let $(V, S)$ and $(\hat V, \hat S)$ be dynamical systems, where $V$ and $\hat V$ are posets. In this setting, we call $(V, S)$ and $(\hat V, \hat S)$ strongly semiconjugate under $F,G$ when there exist maps $F \colon V \to \hat V$ and $G \colon \hat V \to V$ such that

S = G \circ F \text{ on } V \qquad \text{and} \qquad \hat S = F \circ G \text{ on } \hat V.

(5.14)

The “semiconjugate” terminology comes from the fact that, when (5.14) holds,

F \circ S = \hat S \circ F \quad \text{and} \quad G \circ \hat S = S \circ G .

(5.15)

An immediate implication of the definitions is: if either $F$ or $G$ is an order isomorphism, then the systems $(V, S)$ and $(\hat V, \hat S)$ are order conjugate. However, in the applications we consider, neither $F$ nor $G$ will be bijective.

Figure 5.1 helps to illustrate the difference between conjugacy and strong semiconjugacy.

Figure 5.1: Comparison of conjugacy and strong semiconjugacy

Like order conjugacy, strong semiconjugacy can be used to derive useful relationships between dynamical systems. The next lemma lists relationships that will be helpful when we turn to dynamic programming.

Proof

Let $(V, S)$ and $(\hat V, \hat S)$ be as stated. If $v$ is a fixed point of $S$ in $V$ , then $\hat S F v = F S v = F v$ , so $Fv$ is a fixed point of $\hat S$ in $\hat V$ . Similarly, if $w$ is a fixed point of $\hat S$ in $\hat V$ , then $S G w = G \hat S w = G w$ , so $G w$ is a fixed point of $S$ in $V$ . This proves (i)–(ii).

Regarding (iii), suppose that $v$ is the only fixed point of $S$ in $V$ . By (i), $Fv$ is a fixed point of $\hat S$ in $\hat V$ . Suppose in addition that $w$ is fixed for $\hat S$ . Then $F G w = w$ and hence $G F G w = G w$ , or $S G w = G w$ . Since $v$ is the only fixed point of $S$ in $V$ , we have $G w = v$ . Applying $F$ gives $\hat S w = F v$ . But $w$ is fixed for $\hat S$ , so $w = F v$ . This shows that $\hat S$ has exactly one fixed point in $\hat V$ . The reverse implication holds by symmetry. ◻

The next lemma extends Lemma 5.2.1 to cover order stability.

Proof

Suppose first that $S$ is order stable on $V$ , with unique fixed point $\bar v \in V$ . By Lemma 5.2.1, $F \bar v$ is the unique fixed point of $\hat S$ in $\hat V$ . To verify the order-stability conditions on $\hat V$ , fix $w \in \hat V$ with $w \preceq \hat S w$ . If $F$ and $G$ are order preserving, then $G w \preceq G \hat S w = S G w$ , so order stability of $S$ gives $G w \preceq \bar v$ . Applying $F$ yields $F G w \preceq F \bar v$ , that is, $\hat S w \preceq F \bar v$ . Combined with $w \preceq \hat S w$ , this gives $w \preceq F \bar v$ . If instead $F$ and $G$ are order reversing, then $G \hat S w \preceq G w$ , i.e., $S G w \preceq G w$ , so order stability of $S$ gives $\bar v \preceq G w$ . Applying the order-reversing $F$ yields $F G w \preceq F \bar v$ , again giving $w \preceq \hat S w \preceq F \bar v$ . The proof that $\hat S w \preceq w$ implies $F \bar v \preceq w$ is similar. Hence $\hat S$ is order stable on $\hat V$ . The reverse implication holds by symmetry, proving (i).

Regarding (ii), suppose that $S$ is strongly order stable on $V$ and fix $w \in \hat V$ with $w \preceq \hat S w$ . If $F$ and $G$ are order preserving, then $G w \preceq S G w$ , so strong order stability of $S$ gives $S^n(Gw) \uparrow \bar v$ . Applying $F$ yields $F(S^n(Gw)) \uparrow F \bar v$ . If instead $F$ and $G$ are order reversing, then $S G w \preceq G w$ , so strong order stability of $S$ gives $S^n(Gw) \downarrow \bar v$ , and the order-reversing $F$ gives $F(S^n(Gw)) \uparrow F \bar v$ . In both cases, using $\hat S^{n+1} w = F(S^n(Gw))$ (a consequence of (5.15)), we obtain $\hat S^n w \uparrow F \bar v$ . The proof that $\hat S w \preceq w$ implies $\hat S^n w \downarrow F \bar v$ is similar. Hence $\hat S$ is strongly order stable on $\hat V$ . The reverse implication holds by symmetry, proving (ii). ◻
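The iterate identity $\hat S^{n+1} w = F(S^n(Gw))$ can be checked directly from the factorizations $S = G \circ F$ and $\hat S = F \circ G$ by regrouping the composition:

\hat S^{n+1} = (F \circ G)^{n+1} = F \circ (G \circ F)^n \circ G = F \circ S^n \circ G \qquad (n \geq 0).

For $n = 0$ this reduces to $\hat S = F \circ G$, and each additional power inserts one more copy of $G \circ F = S$ between the outer $F$ and $G$.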

When we get to applications, our main aim will be to convert a given system $(V,S)$ into a “nicer” system $(\hat V, \hat S)$ and then learn about $(V, S)$ by studying $(\hat V, \hat S)$ . In particular, we wish to (a) deduce the existence of a unique fixed point of $(V,S)$ only by studying $(\hat V, \hat S)$ , and (b) compute this unique fixed point, working only with $(\hat V, \hat S)$ . The next theorem shows the way. It draws on Lemma 5.2.1 and Lemma 5.2.2 for translation of fixed points and order stability, and then adds a convergence result.

Notice that, in (5.16), we iterate only in the “nice” system $(\hat V, \hat S)$ , and finally transfer back to $V$ using the mapping $G$ .
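The following is a minimal sketch of this strategy on a toy pair of systems, not an example taken from the text. The vector `a`, the scalar `beta`, the maps `F` and `G`, and the helper name `solve_via_nice_system` are all assumptions made for illustration. Here $V = \RR^3$ and $\hat V = \RR$, both maps are order preserving, and all iteration happens in the one-dimensional system.

```python
import numpy as np

def solve_via_nice_system(F, G, w0, tol=1e-12, max_iter=10_000):
    """Iterate S_hat = F o G from w0, then map the limit back to V with G."""
    w = w0
    for _ in range(max_iter):
        w_next = F(G(w))                      # one step of S_hat
        converged = np.max(np.abs(w_next - w)) < tol
        w = w_next
        if converged:
            break
    return G(w), w

# Toy system (an assumption for illustration): V = R^3, V_hat = R.
a, beta = np.array([1.0, 2.0, 6.0]), 0.5
F = lambda v: np.mean(v)                      # order preserving, V -> V_hat
G = lambda w: a + beta * w                    # order preserving, V_hat -> V
S = lambda v: G(F(v))                         # S = G o F

# Start from w0 = 0, which satisfies w0 <= S_hat(w0) = 3.
v_bar, w_bar = solve_via_nice_system(F, G, w0=0.0)
```

In this toy case $\hat S w = 3 + w/2$, so the fixed point in $\hat V$ is $6$, and applying $G$ returns a fixed point of $S$ in $V$ without ever iterating on vectors in $\RR^3$.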

5.2.1.2 The Order-Reversing Case

Lemma 5.2.2 already tells us that order stability transfers between strongly semiconjugate systems when $F$ and $G$ are both order reversing. The next result parallels Theorem 5.2.3 for this case.

Note that the initial condition changes from $w \preceq \hat S w$ in Theorem 5.2.3 to $\hat S w \preceq w$ in Theorem 5.2.4. Starting above the fixed point in $\hat V$ and iterating down generates a sequence that, after applying the order-reversing map $G$ , converges up to $\bar v$ .

5.2.1.3 Application: Firm Entry

To illustrate strong semiconjugacy and its implications, we now study a firm entry problem from Fajgelbaum et al. (2017), slightly extended to allow discount rates to change over time. The basic structure of the model is similar to the firm problem we studied in Section 1.1. We show that the functional equation that describes firm value can be solved in a lower-dimensional space and then mapped back to the original higher-dimensional space.

Analogous to (1.6), we take the lifetime value of a firm to be a function $v$ that solves

v(z, f) = \max \left\{ s(z) - f, \; \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\}.

(5.18)

Here $z$ is an exogenous state, taking values in $\RR^k$ , the real number $f$ represents an IID fixed cost, with distribution $\phi$ , and $Q$ is a stochastic kernel for the exogenous state. The value $s(z)$ represents the present value of profits for the firm if it chooses to enter in the current period. Discounting is implemented via a state-dependent factor $\beta(z) > 0$ .

Let $\Esf \subseteq \RR_+$ be the set where $f$ takes values and let $\Zsf \subseteq \RR^k$ be the set where $z$ takes values. Let $\Xsf$ be the Cartesian product $\Zsf \times \Esf$. Let $b\Xsf$ and $b\Zsf$ be the sets of real-valued bounded Borel measurable functions on $\Xsf$ and $\Zsf$ respectively. We endow both function spaces with the pointwise partial order $\leq$ and the supremum norm.

In (iii), the process $(Z_t)$ is $Q$ -Markov with initial condition $z$ . Note that, in the traditional constant discount rate setting, the term in (iii) is just $\beta^n$ for some constant $\beta \in (0,1)$ , so the condition is automatically satisfied.

To solve (5.18) we introduce an operator $S$ , defined at each $v \in b\Xsf$ by

(Sv)(z, f) = \max \left\{ s(z) - f, \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\}.

Evidently, $v$ is a fixed point of $S$ if and only if it solves the firm valuation equation (5.18).

Although we can study $S$ directly, this fixed point problem can be solved in a lower-dimensional space by working with an alternative operator $T \colon b\Zsf \to b\Zsf$ defined at each $w \in b\Zsf$ by

(T w)(z) = \int \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \phi(\diff f).

To connect $S$ and $T$ , we introduce the map $G \colon b\Zsf \to b\Xsf$ defined via

(G w)(z, f) = \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \qquad ((z,f) \in \Xsf)

and the map $F \colon b\Xsf \to b\Zsf$ defined by

(Fv)(z) = \int v(z, f) \phi(\diff f) \qquad (z \in \Zsf).

The significance of $F$ and $G$ stems from the next lemma.

Proof

Clearly, $F$ and $G$ are order preserving. For each $v \in b\Xsf$ , we have

\begin{aligned} (G F v)(z, f) &= \max \left\{ s(z) - f, \beta(z) \int (F v)(z') Q(z, \diff z') \right\} \\ &= \max \left\{ s(z) - f, \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\} = (Sv)(z, f). \end{aligned}

This confirms that $S = G \circ F$ holds. In addition, $T = F \circ G$ holds because, for each $w \in b\Zsf$ ,

\begin{aligned} (F G w)(z) &= \int (G w)(z, f) \phi(\diff f) \\ &= \int \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \phi(\diff f) = (Tw)(z). \end{aligned}

This proves that $(b\Xsf, S)$ and $(b\Zsf, T)$ are strongly semiconjugate under order-preserving $F, G$ , as claimed. ◻
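The factorization can also be checked numerically. The sketch below is a finite-grid illustration under stated assumptions: the grids, the random Markov matrix standing in for $Q$, the uniform distribution standing in for $\phi$, and the particular values of `s` and `beta` are all hypothetical choices, not taken from Fajgelbaum et al. (2017). It confirms $S = G \circ F$ and $T = F \circ G$ on random inputs, iterates $T$ on functions of $z$ alone, and then maps the result back with $G$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_z, n_f = 5, 4                                  # hypothetical grid sizes
Q = rng.random((n_z, n_z))
Q /= Q.sum(axis=1, keepdims=True)                # Markov matrix standing in for Q
f_grid = np.linspace(0.0, 1.0, n_f)              # support of the fixed cost f
phi = np.full(n_f, 1.0 / n_f)                    # distribution of f
s = np.linspace(0.5, 2.0, n_z)                   # entry payoff s(z)
beta = np.linspace(0.85, 0.95, n_z)              # state-dependent discount factor

def F(v):                                        # (Fv)(z) = sum_f v(z, f) phi(f)
    return v @ phi

def G(w):                                        # (Gw)(z, f) = max{s(z) - f, beta(z)(Qw)(z)}
    cont = beta * (Q @ w)
    return np.maximum(s[:, None] - f_grid[None, :], cont[:, None])

def S(v):                                        # S written out from its definition
    cont = beta * (Q @ (v @ phi))
    return np.maximum(s[:, None] - f_grid[None, :], cont[:, None])

def T(w):                                        # T written out from its definition
    cont = beta * (Q @ w)
    return np.maximum(s[:, None] - f_grid[None, :], cont[:, None]) @ phi

v_test, w_test = rng.random((n_z, n_f)), rng.random(n_z)
check_S = np.allclose(S(v_test), G(F(v_test)))   # S = G o F
check_T = np.allclose(T(w_test), F(G(w_test)))   # T = F o G

w = np.zeros(n_z)
for _ in range(1_000):                           # iterate in the low-dimensional space
    w = T(w)
v_star = G(w)                                    # map back to functions of (z, f)
```

Because `beta` is bounded above by 0.95 in this sketch, `T` is a contraction in the supremum norm, which is why a fixed number of iterations suffices here. In general the convergence argument relies on the order-theoretic conditions discussed in the text.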

We are now ready to use the low-dimensional system $(b\Zsf, T)$ to solve the higher-dimensional system $(b\Xsf, S)$ .

In proving Proposition 5.2.7, we use the following lemma.

Proof

The statement $w_n \uparrow w$ in $b\Zsf$ is equivalent to the real convergence of $w_n(z)$ up to $w(z)$ for all $z \in \Zsf$ (Lemma A.1.3). Using this fact, we fix $w_n \uparrow w$ in $b\Zsf$ and obtain $w_n(z') \uparrow w(z')$ for any $z' \in \Zsf$ . Next we fix $(z, f) \in \Xsf$ and use the dominated convergence theorem to obtain $\int w_n(z') Q(z, \diff z') \uparrow \int w(z') Q(z, \diff z')$ . This in turn implies that $(G w_n)(z, f) \uparrow (G w)(z, f)$ . Thus, using Lemma A.1.3 again, $G w_n \uparrow G w$ . ◻

5.2.2 Factored Dynamic Programs

Now we switch from studying dynamical systems to studying dynamic programs. This is straightforward because we view dynamic programs (ADPs) as families of dynamical systems.

5.2.2.1 Definition

A factored dynamic program (FDP) is a tuple $(V, F, \hat V, \GG)$ where

(i) $V$ and $\hat V$ are nonempty posets,
(ii) $F$ is a map from $V$ to $\hat V$,
(iii) $\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}$ is a family of maps from $\hat V$ to $V$, and
(iv) the set $\{G_\sigma \, \hat v\}_{\sigma \in \Sigma}$ has a greatest element for every $\hat v \in \hat V$.

If $F$ and all $G_\sigma$ are order preserving, then we call $(V, F, \hat V, \GG)$ an order-preserving FDP. (In Section 5.2.3, we introduce order-reversing FDPs. This case is rarer, typically involving some form of nonstandard preferences.)

Given an FDP $(V, F, \hat V, \GG)$ , we introduce an operator family $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ by setting

T_\sigma = G_\sigma \circ F \text{ for all } \sigma \in \Sigma.

Evidently $(V, \TT)$ is an ADP. We call it the primary ADP generated by $(V, F, \hat V, \GG)$ .

The FDP $(V, F, \hat V, \GG)$ also produces a second ADP $(\hat V, \hat{\TT})$ , where the policy operators in $\hat{\TT}$ take the form

\hat T_\sigma = F \circ G_\sigma \qquad \text{ for all } \sigma \in \Sigma.

We call $(\hat V, \hat{\TT})$ the subordinate ADP generated by $(V, F, \hat V, \GG)$ . The figure below illustrates, with the numbers indicating the order in which mappings are applied. (The ordering of the tuple $(V, F, \hat V, \GG)$ traces the primary cycle on the left.)

5.2.2.2 Some Preliminary Results

In this section we examine some basic consequences of the definitions, focusing on the order-preserving case.

To this end, let $(V, F, \hat V, \GG)$ be an order-preserving FDP. We set

\Gmax \, \hat v \coloneq \bigvee_\sigma G_\sigma \, \hat v \qquad (\hat v \in \hat V),

(5.19)

which is well-defined by (iv) in the definition of FDPs.

We now state some preliminary results concerning the ADPs generated by $(V, F, \hat V, \GG)$ .

Let $(V, F, \hat V, \GG)$ be an order-preserving FDP with primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$. We have already observed that the policy operators of $(V, \TT)$ and $(\hat V, \hat{\TT})$ obey

T_\sigma = G_\sigma \circ F \quad \text{and} \quad \hat T_\sigma = F \circ G_\sigma

(5.20)

for all $\sigma \in \Sigma$ . It follows that each pair of policy systems $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ is strongly semiconjugate under $F, G_\sigma$ . As shown in Lemma 5.2.10, we also have

\tmax = \Gmax \circ F \; \text{ on } V \qquad \text{and} \qquad \htmax = F \circ \Gmax \; \text{ on } \hat V.

(5.21)

The proof is immediate from (5.21).

The strong semiconjugacy results in Lemma 5.2.11 imply helpful similarity properties for the two ADPs. The next lemma helps to illustrate.

Note that (5.21) implies that $(V, \tmax)$ and $(\hat V, \htmax)$ are strongly semiconjugate under order-preserving $F, \Gmax$ . This fact allows us to connect optima for the two ADPs and we use it repeatedly below.

5.2.2.3 Optimality

Now we are ready to study the extent to which optimality properties are transferred under subordination. As before, the context is that $(V, F, \hat V, \GG)$ is an order-preserving FDP with primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$ . The symbols $\tmax$ and $\htmax$ denote their respective Bellman operators. When they exist,

\vmax = \bigvee_\sigma v_\sigma \quad \text{and} \quad \hvmax = \bigvee_\sigma \hat v_\sigma

will represent their respective value functions.

We begin with our main optimality result for order-preserving FDPs and the two ADPs they generate.

In the proof of Theorem 5.2.13, we repeatedly use the strong semiconjugacy results in Lemma 5.2.1 to transfer fixed points from one value space to another.

Proof

Proof of Theorem 5.2.13.

Suppose that (a) holds. Then $\vmax$ exists and is the unique fixed point of $\tmax$ in $V$. Lemma 5.2.11 states that $(V, \tmax)$ and $(\hat V, \htmax)$ are strongly semiconjugate under $F, \Gmax$, so, by Lemma 5.2.1, $F\vmax$ is the unique fixed point of $\htmax$ in $\hat V$. We claim that $\hvmax = F \vmax$. To see this, observe first that, given $\sigma \in \Sigma$, we have $v_\sigma \preceq \vmax$, so, applying the fixed point translation in (5.22), we get $\hat v_\sigma = F v_\sigma \preceq F \vmax$. The last inequality becomes an equality if $\sigma$ is optimal for $(V, \TT)$, and such a policy exists because (a) holds. This proves that the supremum $\hvmax = \bigvee_\sigma \hat v_\sigma$ is equal to $F \vmax$. In particular, $\hvmax$ exists and is the unique fixed point of $\htmax$ in $\hat V$. By Proposition 2.1.4, (b) holds.

Now suppose that (b) holds. Then $\hvmax$ exists and is the unique fixed point of $\htmax$ in $\hat V$ . Since $(V, \tmax)$ and $(\hat V, \htmax)$ are strongly semiconjugate under $F, \Gmax$ , the element $\Gmax \hvmax$ is the unique fixed point of $\tmax$ in $V$ . To prove (a), we need only show that $\vmax$ exists and $\vmax = \Gmax \hvmax$ , since Proposition 2.1.4 then gives the fundamental optimality properties. First take any $\sigma \in \Sigma$ . We have $\hat v_\sigma \preceq \hvmax$ and, by strong semiconjugacy of $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ under $F, G_\sigma$ , the equality $v_\sigma = G_\sigma \, \hat v_\sigma$ . Using these facts together gives $v_\sigma = G_\sigma \, \hat v_\sigma \preceq G_\sigma \, \hvmax \preceq \Gmax \hvmax$ . This proves that $\Gmax \hvmax$ is an upper bound of $V_\Sigma$ in $V$ . Thus, we can complete the proof by producing a $\sigma \in \Sigma$ with $v_\sigma = \Gmax \hvmax$ .

To this end, we choose $\sigma$ such that $G_\sigma \, \hvmax = \Gmax \hvmax$, which is possible by condition (iv) in the definition of FDPs. Applying $F$ to this equality gives $\hat T_\sigma \, \hvmax = \htmax \hvmax$, so $\sigma$ is $\hvmax$-greedy. Since Bellman’s principle of optimality holds for $(\hat V, \hat{\TT})$ (by (b) and Lemma 2.1.3), $\sigma$ is optimal for $(\hat V, \hat{\TT})$, so $\hat v_\sigma = \hvmax$. Combining this equality with $G_\sigma \, \hvmax = \Gmax \hvmax$ yields $v_\sigma = G_\sigma \, \hat v_\sigma = G_\sigma \, \hvmax = \Gmax \hvmax$.

Now suppose that (a) and (b) hold. The arguments above also showed that (5.23) is valid. Regarding (ii), let $\sigma \in \Sigma$ be such that $G_\sigma \, \hvmax = \Gmax \, \hvmax$ . Applying (5.23) yields $G_\sigma \, F \vmax = \Gmax F \vmax$ , or $T_\sigma \, \vmax = \tmax \, \vmax$ . By Bellman’s principle of optimality, $\sigma$ is optimal for $(V, \TT)$ .

Regarding (iii), let $\sigma$ be optimal for $(V, \TT)$ . Since $(V, \TT)$ obeys the fundamental optimality properties, $\sigma$ is $\vmax$ -greedy (i.e., $T_\sigma \, \vmax = \tmax \vmax$ ). Also, by (5.23), we have $\hvmax = F \vmax$ . Therefore,

\hat T_\sigma \, \hvmax = \hat T_\sigma \, F \vmax = F G_\sigma \, F \vmax = F \, T_\sigma \, \vmax = F \, \tmax \vmax = F \vmax = \hvmax = \htmax \, \hvmax.

Thus, $\sigma$ is $\hvmax$ -greedy for $(\hat V, \hat{\TT})$ . But Bellman’s principle of optimality also holds for $(\hat V, \hat{\TT})$ , so $\sigma$ is optimal for $(\hat V, \hat{\TT})$ . ◻

5.2.2.4 A Converse Implication

We continue to assume that $(V, F, \hat V, \GG)$ is an order-preserving FDP with primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$ . We saw in Theorem 5.2.13 that the following implication holds: if $\sigma$ is optimal for $(V, \TT)$ , then $\sigma$ is optimal for $(\hat V, \hat{\TT})$ . The converse is not in general true. Often this is because, at an intuitive level, the subordinate ADP is blind to policy behavior at states that are unreachable under the transition dynamics, while the primary ADP cares about every state. (An example of this scenario is given in Section 8.3.2.3.)

To obtain such a converse, we use a strict monotonicity condition on $F$ . In what follows, $\prec$ refers to the strict inequality defined in Section A.1.2.8. In particular, for elements $u, v$ , the statement $u \prec v$ means that $u \preceq v$ and not $u = v$ . Also, $F$ is strictly order preserving when $F$ is order preserving and $u \prec v$ implies $Fu \prec Fv$ .

Proof

Part (i) is immediate from Theorem 5.2.13 and only included for completeness. Regarding part (ii), let $\sigma$ be optimal for $(\hat V, \hat{\TT})$ . Since the fundamental optimality properties hold for $(\hat V, \hat{\TT})$ , the policy $\sigma$ must be $\hvmax$ -greedy, from which we obtain $\hat T_\sigma \, \hvmax = \htmax \hvmax$ . By definition of $\Gmax$ , we also have $G_\sigma \, \hvmax \preceq \Gmax \hvmax$ . Suppose in addition that $G_\sigma \, \hvmax \prec \Gmax \hvmax$ . Since $F$ is strictly order-preserving, this leads to $F G_\sigma \, \hvmax \prec F \Gmax \hvmax$ . But this contradicts $\hat T_\sigma \, \hvmax = \htmax \hvmax$ , so $G_\sigma \, \hvmax \prec \Gmax \hvmax$ cannot hold. Hence, it must be that $G_\sigma \, \hvmax = \Gmax \hvmax$ . By (ii) of Theorem 5.2.13, we see that $\sigma$ is optimal for $(V, \TT)$ . ◻

5.2.3 Order-Reversing FDPs

An order-reversing FDP is an FDP $(V, F, \hat V, \GG)$ where $F$ is order-reversing and each $G_\sigma$ is order-reversing. As in the order-preserving case, $T_\sigma \coloneq G_\sigma \circ F$ and $\hat T_\sigma \coloneq F \circ G_\sigma$ define the policy operators for the primary ADP $(V, \TT)$ and subordinate ADP $(\hat V, \hat{\TT})$ , respectively. Since $F$ and $G_\sigma$ are both order reversing, each $T_\sigma$ and $\hat T_\sigma$ is order preserving, so these are valid ADPs.

Throughout this section,

$(V, F, \hat V, \GG)$ is a given order-reversing FDP,
$(V, \TT)$ is the primary ADP, and
$(\hat V, \hat{\TT})$ is the subordinate ADP.

The primary ADP behaves just as in the order-preserving case:

The subordinate ADP is where the order-reversing case diverges. Because $F$ reverses order, it converts the supremum $\Gmax$ into an infimum, so $F \circ \Gmax$ gives the Bellman min-operator rather than the max-operator.

Proof

Regarding (i), fix $\hat v \in \hat V$ . By (iv), the set $\{G_\sigma \, \hat v\}$ has a greatest element, so

\htmin \hat v = \bigwedge_\sigma F G_\sigma \, \hat v = F \bigvee_\sigma G_\sigma \, \hat v = F \Gmax \hat v.

The second equality uses the fact that $F$ is order reversing and $\{G_\sigma \, \hat v\}$ has a greatest element, so $F$ maps this greatest element to the least element of $\{F G_\sigma \, \hat v\}$ . Regarding (ii), if $G_\sigma \, \hat v = \Gmax \hat v$ , then $\hat T_\sigma \, \hat v = F G_\sigma \, \hat v = F \Gmax \hat v = \htmin \hat v$ . Hence $\sigma$ is $\hat v$ -min-greedy for $(\hat V, \hat{\TT})$ . ◻

The policy operator identities $T_\sigma = G_\sigma \circ F$ and $\hat T_\sigma = F \circ G_\sigma$ hold for any FDP, regardless of order properties, so each pair of policy systems $(V, T_\sigma)$ and $(\hat V, \hat T_\sigma)$ is strongly semiconjugate under $F, G_\sigma$ . Combining these with Lemma 5.2.15 and Lemma 5.2.16, the Bellman operators obey

\tmax = \Gmax \circ F \; \text{ on } V \qquad \text{and} \qquad \htmin = F \circ \Gmax \; \text{ on } \hat V.

(5.24)

The proof is immediate from (5.24). Since $F$ and each $G_\sigma$ are order reversing, the next exercise gives a parallel of Lemma 5.2.12.

We now study optimality for order-reversing FDPs. The primary ADP $(V, \TT)$ is a standard maximization problem, while the subordinate ADP $(\hat V, \hat{\TT})$ is a minimization problem, with Bellman min-operator $\htmin = F \circ \Gmax$ . When they exist,

\vmax = \bigvee_\sigma v_\sigma \quad \text{and} \quad \hvmin = \bigwedge_\sigma \hat v_\sigma

will denote the max-value function of $(V, \TT)$ and the min-value function of $(\hat V, \hat{\TT})$ , respectively.

The next result is the order-reversing analog of Theorem 5.2.13, extended to include a converse implication under a strict monotonicity condition. In the statement, $F$ is called strictly order reversing when $F$ is order reversing and $u \prec v$ implies $Fv \prec Fu$ .

Theorem 5.2.18

The following statements are equivalent:

(a) The fundamental max-optimality properties hold for $(V, \TT)$.
(b) The fundamental min-optimality properties hold for $(\hat V, \hat{\TT})$.

If either and hence both of these statements are true, then

(i) the value functions obey

\vmax = \Gmax \, \hvmin \quad \text{and} \quad \hvmin = F \, \vmax,

(5.26)

(ii) $G_\sigma \, \hvmin = \Gmax \, \hvmin$ $\implies$ $\sigma$ is optimal for $(V, \TT)$, and
(iii) $\sigma$ is optimal for $(V, \TT)$ $\implies$ $\sigma$ is min-optimal for $(\hat V, \hat{\TT})$.

If, in addition, $F$ is strictly order reversing, then

(iv) $\sigma$ is min-optimal for $(\hat V, \hat{\TT})$ $\implies$ $\sigma$ is optimal for $(V, \TT)$.

Proof

The proof follows the structure of Theorem 5.2.13, with $\htmin$ replacing $\htmax$ and inequalities reversed by the order-reversing maps. For (a) $\implies$ (b), Lemma 5.2.17 gives that $(V, \tmax)$ and $(\hat V, \htmin)$ are strongly semiconjugate under $F, \Gmax$ , so $F\vmax$ is the unique fixed point of $\htmin$ by Lemma 5.2.1. Since $F$ is order reversing, $\hat v_\sigma = F v_\sigma \succeq F \vmax$ for all $\sigma$ , with equality for optimal $\sigma$ . Hence $\hvmin = F \vmax$ and (b) follows from the min-analog of Proposition 2.1.4.

For (b) $\implies$ (a), since $G_\sigma$ is order reversing and $\hat v_\sigma \succeq \hvmin$ , we get $v_\sigma = G_\sigma \hat v_\sigma \preceq G_\sigma \hvmin \preceq \Gmax \hvmin$ . Choosing $\sigma$ with $G_\sigma \hvmin = \Gmax \hvmin$ and applying Bellman’s principle of min-optimality yields $\hat v_\sigma = \hvmin$ , so $v_\sigma = \Gmax \hvmin$ . Thus $\vmax = \Gmax \hvmin$ and (a) follows from Proposition 2.1.4.

Parts (ii) and (iii) follow by the same arguments as in Theorem 5.2.13, replacing $\hvmax$ with $\hvmin$ and max-greedy with min-greedy throughout.

Regarding (iv), let $\sigma$ be min-optimal for $(\hat V, \hat{\TT})$ , so that $\hat T_\sigma \, \hvmin = \htmin \hvmin$ . Supposing $G_\sigma \, \hvmin \prec \Gmax \hvmin$ leads to $F \Gmax \hvmin \prec F G_\sigma \, \hvmin$ (since $F$ is strictly order reversing), contradicting the previous equality. Hence $G_\sigma \, \hvmin = \Gmax \hvmin$ , and (ii) gives the result. ◻

5.3 Applications

In Section 5.3.1, we show that the standard MDP and its Q-factor variant are the primary and subordinate ADPs of a single order-preserving FDP, unifying results previously proved separately. In Section 5.3.2, we apply the same framework to connect discrete choice Bellman equations with the post-action value function formulation used in structural estimation. In Section 5.3.3, we revisit the Epstein–Zin model and use factored dynamic programs to obtain a lower-dimensional subordinate ADP that is more efficient to solve.

5.3.1 A Deeper Analysis of Q-Factors

In Section 3.2.1 we discussed both MDPs and Q-factor MDPs, proving separately that they satisfy the fundamental optimality properties. Of course, these two models are connected. Here we unify the models by representing them as the primary and subordinate ADPs of an FDP.

We work with the finite-state MDP environment described in Section 3.2.1. Using the primitives described there, we form an order-preserving FDP $(V, F, \hat V, \GG)$ by setting

(Fv)(x, a) = r(x, a) + \beta \sum_{x'} v(x')P(x,a,x') \qquad \left(v \in \RR^\Xsf, \; (x,a) \in \Gsf \right)

(5.27)

and

(G_\sigma f)(x) = f(x, \sigma(x)) \qquad \left(\sigma \in \Sigma, \; f \in \RR^\Gsf \right),

with $V \coloneq \RR^\Xsf$ and $\hat V \coloneq \RR^\Gsf$ . As required, $F$ maps $V$ to $\hat V$ , and $G_\sigma$ maps $\hat V$ to $V$ , and both are order preserving. Also, fixing $f \in \hat V$ and choosing $\sigma$ such that $\sigma(x) \in \argmax_{a \in \Gamma(x)} f(x, a)$ , we verify the existence of a policy $\sigma$ with $G_\tau \, f \leq G_\sigma f$ for all $\tau \in \Sigma$ . Hence $(V, F, \hat V, \GG)$ is an order-preserving FDP, as claimed.

The primary ADP $(V, \TT)$ generated by $(V, F, \hat V, \GG)$ is produced by setting $T_\sigma = G_\sigma \circ F$ for each $\sigma$ , which gives

(T_\sigma \, v)(x) = (G_\sigma \, F \, v)(x) = r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x').

Thus, $(V, \TT)$ is nothing but the standard ADP generated from an MDP (see Section 2.3.3.1).

The subordinate ADP $(\hat V, \hat{\TT})$ generated by $(V, F, \hat V, \GG)$ is produced by setting $\hat T_\sigma = F \circ G_\sigma$ for each $\sigma$ , which gives

(\hat T_\sigma \, f)(x, a) = (F \, G_\sigma \, f)(x, a) = r(x, a) + \beta \sum_{x'} f(x', \sigma(x')) P(x, a, x').

This map $\hat T_\sigma$ is identical to the Q-factor policy operator $S_\sigma$ we constructed in (3.6). Thus, $(\hat V, \hat{\TT})$ is just the Q-factor ADP we examined in Section 3.2.1.2 (where the ADP was written as $(\RR^\Gsf, \SS)$ ).

We already know that the fundamental optimality properties hold for both of these models, and that VFI, OPI, and HPI all converge. We took the time to prove these facts separately, in Section 3.2.1.1 and Section 3.2.1.2. Theorem 5.2.13 now tells us that one of these steps was unnecessary. Since these two ADPs are the primary and subordinate ADPs of the FDP $(V, F, \hat V, \GG)$, establishing these facts for either one is enough to establish them for the other.
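As a numerical illustration of this connection, the sketch below builds $F$, $G_\sigma$ and $\Gmax$ for a small randomly generated MDP and iterates the two Bellman operators $\Gmax \circ F$ and $F \circ \Gmax$ side by side. The sizes, the random primitives `r` and `P`, and the assumption that every action is feasible in every state are hypothetical choices made for illustration. The limits can then be compared through the relations $\hvmax = F \vmax$ and $\vmax = \Gmax \hvmax$ obtained in the proof of Theorem 5.2.13.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, beta = 4, 3, 0.9                    # states, actions, discount factor
r = rng.random((n, m))                    # r[x, a]
P = rng.random((n, m, n))
P /= P.sum(axis=2, keepdims=True)         # P[x, a, x']

def F(v):                                 # (5.27): maps R^X into R^G
    return r + beta * (P @ v)

def G(sigma, f):                          # (G_sigma f)(x) = f(x, sigma(x))
    return f[np.arange(n), sigma]

def G_max(f):                             # pointwise maximum of G_sigma f over policies
    return f.max(axis=1)

v = np.zeros(n)
q = np.zeros((n, m))
for _ in range(1_000):
    v = G_max(F(v))                       # Bellman operator of the primary (MDP) ADP
    q = F(G_max(q))                       # Bellman operator of the Q-factor ADP

sigma = q.argmax(axis=1)                  # a policy with G_sigma q = G_max q
```

After the loop, `q` agrees with `F(v)` and `v` agrees with `G_max(q)` up to floating point error, so either fixed point can be recovered from the other with a single application of $F$ or $\Gmax$.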

In addition, we can use Theorem 5.2.13 to formally connect the value functions and optimal policies. In the next proposition, $\vmax$ is the MDP value function and $\qmax$ is the Q-factor value function.

In Section 5.2.2.4, we discussed the fact that the converse to (ii) fails to hold without additional conditions. In particular, to get the converse to (ii), we require that $F$ is strictly order preserving. Here’s how that result looks in the present case. In the result, the statement that $\Xsf$ has no isolated point under $P$ means that there is no $x' \in \Xsf$ such that $P(x,a,x')=0$ for all $(x,a) \in \Gsf$ .
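A small constructed example may help show why some condition of this kind is needed. In the sketch below, which is a hypothetical two-state, two-action MDP and not an example from the text, every state-action pair leads to state 0, so state 1 is isolated under $P$. The policy `tau` differs from the optimal policy only at state 1. The helper `policy_values` is an illustrative name, and it uses the fixed point translation $\hat v_\sigma = F v_\sigma$ to obtain Q-factor policy values.

```python
import numpy as np

beta = 0.5
r = np.array([[1.0, 0.0],
              [1.0, 0.0]])                         # r[x, a]
P = np.zeros((2, 2, 2))
P[:, :, 0] = 1.0                                   # every (x, a) leads to state 0

def policy_values(sigma):
    idx = np.arange(2)
    r_sigma, P_sigma = r[idx, sigma], P[idx, sigma]
    # v_sigma is the fixed point of T_sigma, i.e. v = r_sigma + beta P_sigma v
    v = np.linalg.solve(np.eye(2) - beta * P_sigma, r_sigma)
    q = r + beta * (P @ v)                         # q_sigma = F v_sigma
    return v, q

v_star, q_star = policy_values(np.array([0, 0]))   # action 0 everywhere
v_tau, q_tau = policy_values(np.array([0, 1]))     # differs only at state 1
```

Here `q_tau` equals `q_star`, so `tau` attains the same Q-factor policy value, while `v_tau` is strictly below `v_star` at state 1. The Q-factor ADP never evaluates the policy at the isolated state, which is the intuition described in Section 5.2.2.4.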

5.3.2 Structural Estimation via Transforms

In Section 4.2.3 we considered post-action value functions for discrete choice models in the context of structural estimation. Post-action value functions are a transformation of a more standard Bellman equation. Here we unify these two models through the lens of FDPs.

The setting we consider is the same as in Section 4.2.3.1. The state space $\Xsf$ is a metric space. The action set $\Asf$ is finite. The set of policies $\Sigma$ is all measurable maps from $\Xsf$ to $\Asf$ and, correspondingly, $\Gsf \coloneq \Xsf \times \Asf$ . The reward function $r \in \RR^\Gsf$ is assumed to be bounded and Borel measurable, while $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$ .

Consider the tuple $(b\Xsf, F, b\Gsf, \GG)$ , where

(Fv)(x, a) = \int v(x')P(x,a, \diff x') \qquad (v \in b\Xsf)

(5.28)

and

(G_\sigma \, g)(x) = r(x, \sigma(x)) + \beta g(x, \sigma(x)) \qquad (g \in b\Gsf).

(5.29)

We claim that $(b\Xsf, F, b\Gsf, \GG)$ is an order-preserving FDP. Clearly $F$ is an order-preserving map from $b\Xsf$ to $b\Gsf$ , while each $G_\sigma$ is an order preserving map from $b\Gsf$ to $b\Xsf$ . Moreover, we proved in Exercise 4.2.4 that there exists a measurable map $\sigma \colon \Xsf \to \Asf$ obeying

\sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta g(x, a)\} \quad \text{for all } x \in \Xsf.

(5.30)

For any such $\sigma$ we have $G_\tau \, g \leq G_\sigma \, g$ for all $\tau \in \Sigma$ . These facts confirm our claim.

The primary ADP $(b\Xsf, \TT_{\rm SE})$ for this model is the “natural” discrete choice version of the dynamic program. To describe it, we use $T_\sigma = G_\sigma \circ F$ to determine the policy operators, yielding

(T_\sigma \, v)(x) = r(x, \sigma(x)) + \beta \int v(x') P(x, \sigma(x), \diff x').

The corresponding Bellman equation is

v(x) = \max_{a \in \Asf} \left\{ r(x, a) + \beta \int v(x') P(x, a, \diff x') \right\} \qquad (v \in b\Xsf).

For the subordinate ADP $(b\Gsf, \hat{\TT}_{\rm SE})$ , the policy operators are given by $\hat T_\sigma \, g = F G_\sigma \, g$ , yielding

(\hat T_\sigma \, g)(x, a) = \int \left\{ r(x', \sigma(x')) + \beta g(x', \sigma(x')) \right\} P(x, a, \diff x') \qquad (g \in b\Gsf).

The corresponding Bellman equation is

g(x, a) = \int \max_{a' \in \Asf} \left\{ r(x', a') + \beta g(x', a') \right\} P(x, a, \diff x'),

which is exactly the post-action value function Bellman equation we examined in (4.20).

We proved in Proposition 4.2.4 that, for this ADP $(b\Gsf, \hat{\TT}_{\rm SE})$ , the fundamental optimality properties hold. Theorem 5.2.13 now implies that the same properties hold for $(b\Xsf, \TT_{\rm SE})$ . It also tells us that any policy optimal for $(b\Xsf, \TT_{\rm SE})$ is also optimal for $(b\Gsf, \hat{\TT}_{\rm SE})$ , and that a policy $\sigma$ is optimal for $(b\Xsf, \TT_{\rm SE})$ whenever $G_\sigma \, \gmax = \Gmax \, \gmax$ . (Here $\Gmax = \bigvee_\sigma G_\sigma$ and $\gmax$ is the value function for $(b\Gsf, \hat{\TT}_{\rm SE})$ , which is the unique solution to the post-action value function Bellman equation.)

In the present setting, this means we can compute an optimal policy for the primary ADP as follows:

Compute the unique (in $b\Gsf$) fixed point $\gmax$ of the Bellman operator $\htmax$ corresponding to the subordinate (post-action value) ADP $(b\Gsf, \hat{\TT}_{\rm SE})$ and
compute a policy $\sigma$ obeying

\sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta \gmax(x, a)\} \quad \text{for all } x \in \Xsf.
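The two steps above can be sketched on a finite state space as follows. This is an illustration under assumptions: the state space is replaced by a finite set, and the sizes, discount factor, and random primitives `r` and `P` are hypothetical. Integrals against $P$ become matrix products.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, beta = 6, 3, 0.95                   # hypothetical sizes and discount factor
r = rng.random((n, m))                    # r[x, a]
P = rng.random((n, m, n))
P /= P.sum(axis=2, keepdims=True)         # P[x, a, x']

def F(v):                                 # (5.28): expected next-period value
    return P @ v

def G_max(g):                             # max over a of r(x, a) + beta g(x, a)
    return (r + beta * g).max(axis=1)

g = np.zeros((n, m))
for _ in range(2_000):                    # step 1: fixed point of F o G_max
    g = F(G_max(g))
sigma = (r + beta * g).argmax(axis=1)     # step 2: greedy policy computed from g

v_star = G_max(g)                         # candidate value function of the primary ADP
```

In this sketch `v_star` satisfies the primary Bellman equation up to floating point error, and `sigma` attains the maximum in that equation at every state, which is what the text predicts for a policy obeying $G_\sigma \, \gmax = \Gmax \, \gmax$.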

Also, if the function $F$ in (5.28) is strictly order preserving, then by Proposition 5.2.14, we can compute an optimal policy for $(b\Xsf, \TT_{\rm SE})$ by solving for an optimal policy for $(b\Gsf, \hat{\TT}_{\rm SE})$ . The strictly order preserving property does not hold for $F$ in general. For a sufficient condition when $\Xsf$ is finite, see Example A.1.12.

5.3.3 Epstein–Zin Revisited

We consider a special case of the Epstein–Zin model from Section 5.1.3 involving optimal savings with IID endowments. Using the FDP framework, we construct a subordinate ADP that operates on functions of wealth alone, rather than wealth and endowment jointly. This yields significant computational savings, which we quantify numerically.

5.3.3.1 Subordination in an Epstein–Zin Setting

In this section we consider a special case of the Epstein–Zin ADP $(V, \TT)$ analyzed in Section 5.1.3. The special case concerns optimal savings in the presence of an IID endowment process. We will produce a subordinate ADP via a transformation reminiscent of the expected value transformation of an ordinary MDP in Section 5.3.2. We will see that this subordinate ADP is easier to analyze and solve.

We begin with a finite set $\Wsf$ of possible wealth values and a finite set $\Esf$ of possible values for the endowment process. (Finiteness helps simplify the exposition and can be replaced by continuity and compactness conditions.) The Bellman equation takes the form

v(w, e) = \max_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta \left( \sum_{e'} v(w', e')^\nu \phi(e') \right)^{\alpha/ \nu} \right\}^{1/\alpha}.

(5.31)

Here $\Gamma(w, e) \subset \Wsf$ is the set of all feasible choices for next period wealth $w'$ given current wealth $w$ and current endowment $e$ . The new endowment $e'$ is drawn independently from distribution $\phi$ , which maps $\Esf$ into $[0,1]$ .

This model is a special case of the Epstein–Zin ADP from Section 5.1.3. To see this we set $\Xsf \coloneq \Wsf \times \Esf$ , with typical element $x = (w, e)$ . Let $V$ be the order interval $[v_1^{1/\nu}, v_2^{1/\nu}] \subset (0, \infty)^\Xsf$ defined in Section 5.1.3 (see (5.12)). With $a = w'$ and $\Asf = \Wsf$ , the Bellman equation (5.31) is a special case of (5.10).

To simplify the analysis, we construct an order-preserving FDP where the primary ADP is the model just discussed and the subordinate ADP operates in a lower-dimensional state space. To do so we set $V$ as above,

(Fv)(w) = \left\{ \sum_{e} v(w, e)^\nu \phi(e) \right\}^{1/\nu} \qquad (w \in \Wsf),

which maps $V$ to $\hat V \coloneq F(V)$ , and

(G_\sigma \, h)(w, e) = \left\{ (1-\beta) r(w, \sigma(w, e), e)^\alpha + \beta h(\sigma(w, e))^\alpha \right\}^{1/\alpha} \qquad ((w,e) \in \Xsf).

(5.32)

Both $F$ and $G_\sigma$ are order-preserving. Assuming that $\Gamma(w, e)$ is nonempty at each $(w, e) \in \Xsf$ , one easily verifies the existence, for each $h \in \hat V$ , of a policy $\sigma$ such that

\sigma(w, e) \in \argmax_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta h(w')^{\alpha} \right\}^{1/\alpha} \quad \text{for all } (w,e) \in \Xsf.

(5.33)
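The selection in (5.33) is a finite search at each state, which the following Python sketch makes concrete. The grids, parameter values, reward `r(w, s, e) = w - s + e`, and feasible set $\{w' \in \Wsf : w' \leq w\}$ are assumptions made for illustration, and ties are broken in favor of the smallest feasible $w'$.

```python
# Illustrative primitives: every value below is an assumption for this sketch.
beta, alpha = 0.9, 0.5
W = [0.0, 0.5, 1.0, 1.5]      # wealth grid, sorted in increasing order
E = [0.5, 1.0]                # endowment grid


def r(w, s, e):
    """Assumed reward: consumption when saving s out of wealth w and endowment e."""
    return w - s + e


def objective(w, s, e, hs):
    """The maximand in (5.33) at next-period wealth s, where hs = h(s)."""
    return ((1 - beta) * r(w, s, e) ** alpha + beta * hs ** alpha) ** (1 / alpha)


def greedy(h):
    """Return an h-greedy policy; sigma[i][j] is an index into W."""
    sigma = []
    for i, w in enumerate(W):
        row = []
        for e in E:
            feasible = range(i + 1)   # indices k with W[k] <= w (assumed Gamma)
            row.append(max(feasible, key=lambda k: objective(w, W[k], e, h[k])))
        sigma.append(row)
    return sigma


sigma_flat = greedy([1.0, 1.0, 1.0, 1.0])         # constant continuation value
sigma_steep = greedy([1.0, 10.0, 100.0, 1000.0])  # steeply increasing h
```

With a constant $h$ there is no benefit to saving in this example, so the greedy policy saves nothing, while the steeply increasing $h$ makes saving all current wealth greedy.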

For this $\sigma$ we have $G_\tau h \leq G_\sigma h$ for all $\tau \in \Sigma$. Hence, with $\GG = \{G_\sigma\}_{\sigma \in \Sigma}$, the tuple $(V, F, \hat V, \GG)$ is an order-preserving FDP.

Inspecting (5.34), we see that the corresponding Bellman equation is (5.31). Thus, the primary ADP $(V, \TT)$ corresponds to the original problem we considered at the start of this section. Let’s now look at the subordinate problem.
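The following display is a sketch of the two families of policy operators rather than a quotation from the text. It assumes the factorization convention of Section 5.2.2, under which the primary policy operators are $T_\sigma = G_\sigma F$ and the subordinate policy operators are $\hat T_\sigma = F G_\sigma$. Under that assumption, composing $F$ with $G_\sigma$ from (5.32) in each order gives:

```latex
(T_\sigma v)(w, e)
  = \left\{ (1-\beta) r(w, \sigma(w, e), e)^\alpha
    + \beta \left( \sum_{e'} v(\sigma(w, e), e')^\nu \phi(e') \right)^{\alpha/\nu}
    \right\}^{1/\alpha}
  \qquad ((w, e) \in \Xsf),

(\hat T_\sigma h)(w)
  = \left\{ \sum_{e} \left[ (1-\beta) r(w, \sigma(w, e), e)^\alpha
    + \beta h(\sigma(w, e))^\alpha \right]^{\nu/\alpha} \phi(e)
    \right\}^{1/\nu}
  \qquad (w \in \Wsf).
```

Maximizing the first expression over $\sigma(w, e) \in \Gamma(w, e)$ reproduces the right-hand side of (5.31), while the second expression involves only functions of $w$.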

The benefit of working with $(\hat V, \hat{\TT})$ is that $\hat T_\sigma$ acts on functions that depend only on $w$, which are determined by $|\Wsf|$ values, rather than on functions of both $w$ and $e$, which require $|\Wsf| \times |\Esf|$ values (as is the case for $T_\sigma$). These lower-dimensional operations are significantly more efficient, even when $\Esf$ is relatively small.

Since $(V, \TT)$ is a special case of the ADP discussed in Section 5.1.3, Proposition 5.1.13 implies that the fundamental optimality properties hold for $(V, \TT)$. As a result, Theorem 5.2.13 implies that they also hold for $(\hat V, \hat{\TT})$. It also tells us that we can obtain an optimal policy for $(V, \TT)$ by finding the value function $\hvmax$ for $(\hat V, \hat{\TT})$ and then calculating a policy $\sigma$ obeying $G_\sigma \, \hvmax = \Gmax \, \hvmax$. By the definition of $G_\sigma$ in (5.32), this means that we solve for $\sigma$ satisfying (5.33) after setting $h = \hvmax$.

To compute $\hvmax$, we can use Theorem 2.2.6, which tells us that Howard policy iteration applied to $(\hat V, \hat{\TT})$ converges to $\hvmax$ in finitely many steps. Summarizing this analysis, an optimal policy for $(V, \TT)$ can be computed via Algorithm 5.3.1.

Algorithm 5.3.1 (Solving $(V, \TT)$ via $(\hat V, \hat{\TT})$)

input $\sigma_0 \in \Sigma$, set $k \leftarrow 0$ and $\epsilon \leftarrow 1$
while $\epsilon > 0$ do
  1. $h_k \leftarrow$ the fixed point of $\hat T_{\sigma_k}$
  2. $\sigma_{k+1} \leftarrow$ an $h_k$-greedy policy, satisfying
     $\sigma_{k+1}(w, e) \in \argmax_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta h_k(w')^\alpha \right\}^{1/\alpha}$
  3. $\epsilon \leftarrow \1\{ \sigma_k \neq \sigma_{k+1} \}$
  4. $k \leftarrow k + 1$
end
return $\sigma_{k-1}$
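The steps of Algorithm 5.3.1 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the implementation used for the figures: the grids, distribution, and parameter values are arbitrary; the feasible set is taken to be $\{w' \in \Wsf : w' \leq w\}$ with $r(w, s, e) = w - s + e$; the subordinate policy operator is assumed to be $\hat T_\sigma = F G_\sigma$; and the fixed point in step 1 is approximated by successive approximation with a tolerance, which is a choice made here rather than part of the algorithm.

```python
# Illustrative primitives: every value below is an assumption for this sketch.
beta, alpha, nu = 0.9, 0.5, -1.0
W = [0.25 * i for i in range(9)]   # wealth grid on [0, 2], increasing
E = [0.5, 1.0, 1.5]                # endowment grid
phi = [0.25, 0.5, 0.25]            # IID endowment distribution on E


def r(w, s, e):
    """Assumed reward: consumption when saving s out of wealth w and endowment e."""
    return w - s + e


def value(w, s, e, hs):
    """The maximand in (5.33) at next-period wealth s, where hs = h(s)."""
    return ((1 - beta) * r(w, s, e) ** alpha + beta * hs ** alpha) ** (1 / alpha)


def T_hat(sigma, h):
    """Subordinate policy operator, assumed here to be F composed with G_sigma."""
    out = []
    for i, w in enumerate(W):
        total = 0.0
        for j, e in enumerate(E):
            k = sigma[i][j]        # index of next-period wealth sigma(w, e)
            total += value(w, W[k], e, h[k]) ** nu * phi[j]
        out.append(total ** (1 / nu))
    return out


def fixed_point(sigma, tol=1e-10, max_iter=10_000):
    """Step 1: approximate the fixed point of T_hat(sigma, .) by iteration."""
    h = [1.0] * len(W)
    for _ in range(max_iter):
        h_new = T_hat(sigma, h)
        if max(abs(a - b) for a, b in zip(h_new, h)) < tol:
            return h_new
        h = h_new
    return h


def greedy(h):
    """Step 2: an h-greedy policy; sigma[i][j] indexes the best w' <= W[i]."""
    return [[max(range(i + 1), key=lambda k: value(w, W[k], e, h[k]))
             for e in E] for i, w in enumerate(W)]


def solve(sigma, max_steps=1_000):
    """Howard policy iteration on the subordinate ADP, as in Algorithm 5.3.1."""
    for _ in range(max_steps):
        h = fixed_point(sigma)
        sigma_new = greedy(h)
        if sigma_new == sigma:     # step 3: stop once the policy repeats
            break
        sigma = sigma_new
    return sigma, h


sigma_0 = [[0] * len(E) for _ in W]    # initial policy: save nothing
sigma_star, h_star = solve(sigma_0)
```

The returned `sigma_star` is a policy on the full state space $\Wsf \times \Esf$, even though every value function handled inside the loop is a list of length $|\Wsf|$. The iteration caps `max_iter` and `max_steps` are safeguards added for this sketch.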

Figure 5.2 shows $w \mapsto \sigopt(w, e)$ for two values of $e$ (smallest and largest) when $\sigopt$ is the optimal policy, calculated using Algorithm 5.3.1. In the computation we set $\Gamma(w, e) = [0, w]$ and $r(w, s, e) = w - s + e$. We chose $\alpha$ and $\nu$ to match the values used in Schorfheide et al. (2018).

In Figure 5.3 we display the relative speed gain from using the lower-dimensional model $(\hat V, \hat{\TT})$ instead of $(V, \TT)$ across multiple choices of $|\Wsf|$ and $|\Esf|$. The speed gain is the time required to solve for an optimal policy for $(V, \TT)$ using HPI applied to $(V, \TT)$ (as in Theorem 2.2.6), divided by the time required to solve for the same optimal policy via Algorithm 5.3.1. The speed gain increases linearly in the size of $\Esf$.

Figure 5.2: Optimal savings policy with Epstein–Zin preferences

Figure 5.3: Speed gain from replacing $(V, \TT)$ with subordinate model $(\hat V, \hat{\TT})$

5.4 Chapter Notes

A recurring theme in applied dynamic programming has been the rearrangement of the Bellman equation into alternative functional forms in order to obtain analytical, computational, or statistical advantages. Familiar instances include reservation-wage and continuation-value formulations of optimal stopping problems, used as early as the job-search analysis of McCall (1970) and developed extensively in the real-options literature (see, e.g., Dixit & Pindyck (2012)); the expected and integrated value-function transformations used in structural estimation of discrete choice models, pioneered by Rust (1987); the Q-factor representations introduced for reinforcement learning by Watkins (1989); and lower-dimensional reformulations that integrate out independent or uncontrollable states. Systematic studies of such transformations can be found in Ma & Stachurski (2019) and Ma & Stachurski (2021). The order-theoretic framework adopted in this chapter, building on the abstract foundations of Sargent & Stachurski (2025), unifies and extends those analyses by treating the underlying maps in their own right rather than as recipes for individual Bellman equations. Closely related ideas in the more concrete setting of finite-state MDPs can be found in Chapter 5 of Sargent & Stachurski (2025).

The topological conjugacy results in Section 5.1.1.2 are applied in Section 8.3.4 of Chapter 8 to establish the equivalence of value function iteration and time iteration.

The applications in Section 5.3 unify topics treated in their own right elsewhere in this volume. References to the underlying literatures can be found in the corresponding chapter notes: see Section 3.3 for Q-factor representations and reinforcement learning, Section 4.3 for structural estimation of dynamic discrete choice models, and Section 1.6 for Epstein–Zin preferences. The optimal harvest model in Section 8.3.2 of Chapter 8 is another example of a factored dynamic program in the sense of Section 5.2.2.

The firm-entry application in Section 5.2.1.3 is adapted from Fajgelbaum et al. (2017), slightly extended to permit time-varying discounting; for related industry-dynamics models see Hopenhayn (1992). The parameter values used in Figure 5.2 follow Schorfheide et al. (2018).

References

Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
Fajgelbaum, P. D., Schaal, E., & Taschereau-Dumouchel, M. (2017). Uncertainty traps. The Quarterly Journal of Economics, 132(4), 1641–1692.
Stachurski, J., & Zhang, J. (2021). Dynamic programming with state-dependent discounting. Journal of Economic Theory, 192, 105190.
Schorfheide, F., Song, D., & Yaron, A. (2018). Identifying long-run risks: A Bayesian mixed-frequency approach. Econometrica, 86(2), 617–654.
McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics, 84(1), 113–126.
Dixit, R. K., & Pindyck, R. S. (2012). Investment Under Uncertainty. Princeton University Press.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55(5), 999–1033.
Watkins, C. J. C. H. (1989). Learning from delayed rewards. PhD thesis, King’s College, Cambridge, United Kingdom.
Ma, Q., & Stachurski, J. (2019). Optimal timing of decisions: A general theory based on continuation values. Journal of Economic Dynamics and Control, 101, 62–81.
Ma, Q., & Stachurski, J. (2021). Dynamic programming deconstructed: Transformations of the Bellman equation and computational efficiency. Operations Research, 69(5), 1591–1607.
Sargent, T. J., & Stachurski, J. (2025). Dynamic programs on partially ordered sets. SIAM Journal on Control and Optimization, in press.
Hopenhayn, H. A. (1992). Entry, exit, and firm dynamics in long run equilibrium. Econometrica, 60(5), 1127–1150.
