Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

5 ADP Transformations

A recurring task in mathematics is establishing when two apparently different objects are, in a precise sense, the same. Such equivalences are formalized by invertible, structure-preserving maps. For example, a group isomorphism is a bijection that preserves the group operation, revealing that two groups share identical algebraic structure despite different representations. A topological conjugacy between two dynamical systems is a homeomorphism intertwining their transition maps, establishing that the systems have the same dynamic behavior up to a relabeling of the state space. In the setting of dynamic programming, the relevant structure is order, and the appropriate notion of equivalence is that of an order isomorphism between posets.

We now investigate transformations of dynamic programs that preserve optimality structure. First, in Section 5.1, we investigate order isomorphisms, under which connections are exact. Two dynamic programs linked by isomorphisms are “the same” in terms of their optimality properties. In elementary terms, this is analogous to the way that g=ϕfg = \phi \circ f has the same maximizer as ff when ϕ\phi is strictly increasing; and that a maximizer of gg is a minimizer of ff when ϕ\phi is strictly decreasing.

Isomorphisms are useful but there is also a sense in which they preserve too much structure. Often we want to transform a dynamic program into another one that differs along at least some dimensions -- for example, in terms of dimensionality, or smoothness -- and try to understand the original problem by studying the second more tractable one. We investigate such transformations in Section 5.2.2, under the title of “factored” dynamic programs. In Section 5.2, we introduce factored dynamic programs (FDPs), which involve non-bijective transformations that link a primary ADP to a subordinate ADP of potentially lower dimension. We show how optimality properties transfer between the two. Section 5.3 applies both isomorphisms and FDPs to concrete settings, including Q-factor models, structural estimation, and Epstein--Zin preferences.

5.1Isomorphisms

In this section we introduce a concept of “isomorphic” dynamic programs. In particular, we describe an isomorphic relationship that leads to essentially equivalent optimality properties. The basic idea can be explained with a simple example. Consider the savings problem from Section 1.3.2.3, with Bellman equation

v(w)=max0cw{u(c)+βv(wc)}.v(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta v(w - c) \right\}.

If uu has heavy curvature near zero, one might consider taking an exponential transformation, aiming to work with functions that are easier to approximate numerically. Applying exp\exp to both sides, writing u^\hat u for expu\exp \circ \, u and v^\hat v for expv\exp \circ \, v, we get

v^(w)=max0cw{u^(c)v^(wc)β}.\hat v(w) = \max_{0 \leq c \leq w} \left\{ \hat u(c) \hat v(w - c)^\beta \right\}.

Not surprisingly, these two dynamic programs turn out to have exactly the same optimal policies, giving us two viable angles of attack. The theory in this section clarifies and generalizes this idea, with the aim of allowing us to apply it effectively to both simple and complex problems.

Along the way we will meet useful concepts such as topological conjugacy for dynamic systems and order isomorphisms. We will exploit these ideas to study additional optimization problems and algorithms, including time iteration methods and decision problems with ambiguity.

5.1.1Background Concepts

We begin with conjugate dynamical systems in Section 5.1.1.1, which preserve fixed point structure. In Section 5.1.1.2, we strengthen conjugacy to topological conjugacy, which additionally preserves stability. In Section 5.1.1.3, we develop order conjugacy, an order-theoretic alternative that is better suited to passing optimality properties between ADPs.

5.1.1.1Conjugate Dynamics

We recall that a (discrete time) dynamical system is a pair (V,S)(V, S), where VV is any set and SS is a self-map on VV. Two dynamical systems (V,S)(V, S) and (V^,S^)(\hat V, \hat S) are said to be conjugate under FF (or just conjugate) if

FF is a bijection from VV into V^\hat V and FS=S^FF \circ S = \hat S \circ F on VV.

We can also write the last equality as S=F1S^FS = F^{-1} \circ \hat S \circ F. This helps us understand the conjugacy relationship: shifting a point vVv \in V to SvS v via SS is equivalent to

  1. moving vv into V^\hat V via v^=Fv\hat v = F v,

  2. applying S^\hat S to produce S^v^\hat S \hat v, and then

  3. moving the result S^v^\hat S \hat v back to the original space VV using F1F^{-1}.

The next result lists some consequences of conjugacy.

The proofs of these claims are straightforward. For example, regarding (i), suppose that Sn=F1S^nFS^n = F^{-1} \hat S^n F at some fixed nn. Then, using conjugacy,

Sn+1=SSn=SF1S^nF=F1S^S^nF=F1S^n+1F.S^{n+1} = S S^n = S F^{-1} \hat S^n F = F^{-1} \hat S \hat S^n F = F^{-1} \hat S^{n+1} F.

5.1.1.2Topological Conjugacy

For us, perhaps the most important consequence of Proposition 5.1.1 is that, if two dynamical systems (V,S)(V, S) and (V^,S^)(\hat V, \hat S) have this property, then the system (V,S)(V, S) has a unique fixed point if and only if (V^,S^)(\hat V, \hat S) has a unique fixed point. At the same time, conjugacy is not enough to pass on stability properties. For this we need topological conjugacy.

To state this property, let VV and V^\hat V be Hausdorff topological spaces. A map F ⁣:VV^F \colon V \to \hat V is called a homeomorphism if FF is a bijection from VV to V^\hat V and, in addition, both FF and its inverse are continuous. The dynamical systems (V,S)(V, S) and (V^,S^)(\hat V, \hat S) are called topologically conjugate when these two systems are conjugate under a bijection FF and, in addition, FF is a homeomorphism.

In this setting, we have the following result:

The following example gives an elementary but nonetheless important illustration of the value of Proposition 5.1.2.

We will use Proposition 5.1.2 when we study Euler equations and time iteration in Section 8.3.3.

5.1.1.3Order Conjugacy

In the previous two sections we introduced conjugacy and then strengthened it to topological conjugacy, a fairly standard approach. Now, however, we will step back to conjugacy and strengthen it using order rather than topology. The order-theoretic notion we develop will be analogous to topological conjugacy, while at the same time being better suited to passing optimality properties from one ADP to another.

To begin, we take VV and V^\hat V to be posets and consider two dynamical systems (V,S)(V, S) and (V^,S^)(\hat V, \hat S). We call these systems order conjugate under FF if

  1. (V,S)(V, S) and (V^,S^)(\hat V, \hat S) are conjugate under FF and,

  2. FF is an order isomorphism (see Section A.1.2.9).

To indicate that such an FF can be found, we simply say that (V,S)(V, S) and (V^,S^)(\hat V, \hat S) are order conjugate.

The next lemma shows one benefit of establishing order conjugacy. It can be thought of as an order-theoretic version of Proposition 5.1.2.

5.1.2Isomorphic ADPs

In this section, we use order conjugacy to connect dynamic programs. We are interested in whether or not ADPs can be connected by order isomorphisms (or anti-isomorphisms), and what implications this has for optimality. In Section 5.1.2.1, we define isomorphic ADPs and show that isomorphism is an equivalence relation. In Section 5.1.2.2, we prove that isomorphic ADPs share the same optimality and convergence properties. Section 5.1.2.3 extends the analysis to anti-isomorphic ADPs, where maximization in one ADP corresponds to minimization in the other. Finally, in Section 5.1.3, we apply both isomorphic and anti-isomorphic relationships to establish optimality for an Epstein--Zin preference model.

5.1.2.1Definition and Consequences

Let (V,T)(V, \TT) and (V^,T^)(\hat V, \hat{\TT}) be ADPs with policy sets T{Tσ:σΣ}\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma} and T^{T^σ:σΣ}\hat{\TT} \coloneq \setntn{\hat T_\sigma}{\sigma \in \Sigma}. We call these ADPs isomorphic under FF if

  1. these two ADPs have the same policy set Σ\Sigma, and

  2. (V,Tσ)(V, T_\sigma) and (V^,T^σ)(\hat V, \hat T_\sigma) are order conjugate under FF for all σΣ\sigma \in \Sigma.

Part (ii) requires that FF is an order isomorphism from VV to V^\hat V and

FTσ=T^σF     on V for all σΣ.F \circ T_\sigma = \hat T_\sigma \circ F \;\; \text{ on } V \text{ for all } \sigma \in \Sigma.

In other words, if A\mathbf A is the set of all ADPs and, for (V,T),(V^,T^)A(V, \TT), (\hat V, \hat{\TT}) \in \mathbf A, the symbol (V,T)(V^,T^)(V, \TT) \sim (\hat V, \hat{\TT}) means (V,T)(V, \TT) and (V^,T^)(\hat V, \hat{\TT}) are isomorphic, then \sim is reflexive, symmetric and transitive.

5.1.2.2Isomorphisms and Optimality

We seek relationships between optimality properties of isomorphic ADPs. For all of this section, we take (V,T)(V, \TT) and (V^,T^)(\hat V, \hat{\TT}) to be two ADPs with T={Tσ:σΣ}\TT = \setntn{T_\sigma}{\sigma \in \Sigma} and T^={T^σ:σΣ}\hat{\TT} = \setntn{\hat T_\sigma}{\sigma \in \Sigma}. When they exist, we let

The next theorem shows that isomorphic ADPs share the same regularity and optimality properties:

The next theorem studies the case when (V,T)(V, \TT) and (V^,T^)(\hat V, \hat{\TT}) are regular and well-posed. It tells us that isomorphic ADPs have the same optimality properties.

The next theorem considers convergence of algorithms.

The proof is long but straightforward and presented as a solved exercise.

5.1.2.3The Anti-Isomorphic Case

In this section, we switch to studying anti-isomorphic ADPs. In doing so, we will consider minimization as well as maximization. We follow the notational conventions and terminology introduced in Section 2.2.3. Most readers will find it helpful to review that section before reading this one.

Let (V,T)(V, \TT) and (V^,T^)(\hat V, \hat{\TT}) be ADPs with the same policy set. In line with the notation in Section 5.1.2.2, we let

As was the case in Section 2.2.3, we enhance clarity by adding a “max-” prefix to previously introduced definitions that pertain to maximization. For example,

and so on.

We call (V,T)(V, \TT) and (V^,T^)(\hat V, \hat{\TT}) anti-isomorphic under FF if these two ADPs have the same policy set Σ\Sigma and, in addition, FF is an anti-isomorphism from VV to V^\hat V such that (5.7) holds.

We can also express this relationship in terms of isomorphisms and duality of ADPs, as defined in Section 2.2.3.2:

Here is an optimality result for anti-isomorphic ADPs that parallels Theorem 5.1.5.

Now we consider convergence of algorithms in the anti-isomorphic case.

5.1.3Example: Epstein--Zin Optimality

In this section, we use the theory of isomorphic and anti-isomorphic ADPs to study optimality properties of a modified MDP model that incorporates Epstein--Zin preferences (motivation for which was provided in Section 1.3.3). We will show how isomorphic and anti-isomorphic relationships can be used to simplify analysis.

To begin, consider a variation of the finite state MDP from Section 1.2 where the Bellman equation is modified to

v(x)=maxaΓ(x){(1β)r(x,a)α+β[(Lv)(x,a)]α}1/α,v(x) = \max_{a \in \Gamma(x)} \left\{ (1-\beta) r(x, a)^\alpha + \beta \left[ (Lv)(x,a) \right]^{\alpha} \right\}^{1/\alpha},

with

(Lv)(x,a)(xv(x)νP(x,a,x))1/ν.(Lv)(x, a) \coloneq \left( \sum_{x'} v(x')^\nu P(x, a, x') \right)^{1/\nu}.

Here X\Xsf, A\Asf, rr, Γ\Gamma, β\beta, and PP are all as in Section 1.2, while G\Gsf is the feasible state-action pairs. The parameters α\alpha and ν\nu are connected to the notation of Section 1.3.3 via α=11/ψ\alpha = 1 - 1/\psi and ν=1γ\nu = 1 - \gamma, where ψ\psi is the EIS and γ\gamma is the coefficient of relative risk aversion. The change of notation is made to simplify the presentation. If α=ν=1\alpha=\nu=1, then this Epstein--Zin model reduces to an ordinary finite state MDP. In this section, however, we allow α\alpha and ν\nu to take any nonzero values.

Using the notation

rσ(x)r(x,σ(x))and(Lσv)(x)(Lv)(x,σ(x)),r_\sigma(x) \coloneq r(x, \sigma(x)) \quad \text{and} \quad (L_\sigma v)(x) \coloneq (Lv)(x, \sigma(x)),

we can write a corresponding policy operator TσT_\sigma as

Tσv={(1β)rσα+β(Lσv)α}1/α.T_\sigma \, v = \left\{ (1-\beta) r_\sigma^\alpha + \beta \left( L_\sigma \, v \right)^{\alpha} \right\}^{1/\alpha}.

Let TEZ\TT_{\rm EZ} be the set of all such TσT_\sigma.

We assume that rr is strictly positive, so that TσT_\sigma maps (0,)X(0, \infty)^\Xsf into itself. Sargent & Stachurski (2025) establish optimality properties for this specification when PσP_\sigma is irreducible for all σΣ\sigma \in \Sigma. Here we drop this assumption and establish the same optimality properties.

To this end, let θ=ν/α\theta = \nu/\alpha. Fix ϵ>0\epsilon > 0 with minrαϵ>0\min r^\alpha - \epsilon > 0. Let

V^=[v1,v2]wherev1=m1m2 and v2=m1m2.\hat V = [v_1, v_2] \quad \text{where} \quad v_1 = m_1 \wedge m_2 \text{ and } v_2 = m_1 \vee m_2.

Here m1(minrαϵ)θm_1 \coloneq \left( \min r^\alpha - \epsilon \right)^\theta and m2(maxrα+ϵ)θm_2 \coloneq \left( \max r^\alpha + \epsilon \right)^\theta. Let FF be defined by

Fv=vνwith v(0,)X,F \, v = v^\nu \qquad \text{with } v \in (0, \infty)^\Xsf,

where the exponent ν\nu is applied pointwise to vv, and set

VF1V^={v(0,)X:v1vνv2}.V \coloneq F^{-1} \hat V = \setntn{v \in (0, \infty)^\Xsf}{v_1 \leq v^\nu \leq v_2}.

We are interested in optimality properties of (V,TEZ)(V, \TT_{\rm EZ}). While we can try to tackle this ADP directly, the arguments become significantly easier after a transformation. To pursue this path, we introduce the auxiliary ADP (V^,T^EZ)(\hat V, \hat{\TT}_{\rm EZ}) with V^\hat V as defined above and

T^σv={(1β)rσα+β(Pσv)1/θ}θ.\hat T_\sigma \, v = \left\{ (1-\beta) r_\sigma^\alpha + \beta \left( P_\sigma \, v \right)^{1/\theta} \right\}^\theta.

In the next exercise, fgf \ll g means f(x)<g(x)f(x) < g(x) for all xx, as in Section 4.1.3.

The results from Exercise 5.9 and Exercise 5.10 allow us to establish optimality properties for the auxiliary ADP (V^,T^EZ)(\hat V, \hat{\TT}_{\rm EZ}).

The next result follows easily from the conclusion of Exercise 5.11.

We are now ready to state and prove the main result of this section.

The relationship between (V,TEZ)(V, \TT_{\rm EZ}) and (V^,T^EZ)(\hat V, \hat{\TT}_{\rm EZ}) allows us to use either one to solve for an optimal policy. For example, if ν<0\nu < 0, then, by Theorem 5.1.8, any min-optimal policy for (V^,T^EZ)(\hat V, \hat{\TT}_{\rm EZ}) will be max-optimal for (V,TEZ)(V, \TT_{\rm EZ}). Hence we can solve for the Epstein--Zin max-optimal policy either by directly solving (V,TEZ)(V, \TT_{\rm EZ}) or by solving (V^,T^EZ)(\hat V, \hat{\TT}_{\rm EZ}) for a min-optimal policy. The best choice depends on computational simplicity and numerical stability.

5.2Semiconjugate Relationships

In this section we introduce an asymmetric relationship between ADPs that involves a form of factorization. This factorization produces two versions of an ADP: a primary one and a subordinate one. Typically, the primary ADP will be relatively standard, while the subordinate ADP can be thought of as a variation that might have some analytical or computational advantages. Under certain conditions, studying the “simpler” subordinate ADP will shed light on outcomes and solutions for the primary ADP.

In the applications we consider, the associated transformations are not bijective, unlike the isomorphic relationships considered in Section 5.1. Sometimes the lack of bijective transformations occurs because one dynamic program (the primary ADP) evolves in a higher dimensional space than the other (the subordinate ADP). Although the transformations in question are not bijections, we nonetheless show that the primary and subordinate ADPs have tight connections in terms of optimality.

This section is related to the dynamic programs associated with the modified Bellman equations we introduced in Chapter 5 of Sargent & Stachurski (2025). Relative to that theory, the exposition below is more concise and more general. As a result, we can easily cover additional variations on the MDP Bellman equation, of which there are many, as well as studying relationships between ADPs beyond the traditional MDP setting.

5.2.1Strong Semiconjugacy

We begin by introducing a similarity notion related to conjugacy (see Section 5.1.1.1) and show how it links the dynamics of trajectories on posets.

5.2.1.1Definition

Let (V,S)(V, S) and (V^,S^)(\hat V, \hat S) be dynamical systems, where VV and V^\hat V are posets. In this setting, we call (V,S)(V, S) and (V^,S^)(\hat V, \hat S) strongly semiconjugate under F,GF,G when there exist maps F ⁣:VV^F \colon V \to \hat V and G ⁣:V^VG \colon \hat V \to V such that

S=GF on VandS^=FG on V^.S = G \circ F \text{ on } V \qquad \text{and} \qquad \hat S = F \circ G \text{ on } \hat V.

The “semiconjugate” terminology comes from the fact that, when (5.33) holds,

FS=S^FandGS^=SG.F \circ S = \hat S \circ F \quad \text{and} \quad G \circ \hat S = S \circ G .

An immediate implication of the definitions is: if either FF or GG is an order isomorphism, then the systems (V,S)(V, S) and (V^,S^)(\hat V, \hat S) are order conjugate. However, in the applications we consider, neither FF nor GG will be bijective.

Figure 5.1 helps to illustrate the difference between conjugacy and strong semiconjugacy.

Comparison of conjugacy and strong semiconjugacy

Figure 5.1:Comparison of conjugacy and strong semiconjugacy

Like order conjugacy, strong semiconjugacy can be used to derive useful relationships between dynamical systems. The next lemma lists relationships that will be helpful when we turn to dynamic programming.

The next lemma extends Lemma 5.2.1 to cover order stability.

When we get to applications, our main aim will be to convert a given system (V,S)(V,S) into a “nicer” system (V^,S^)(\hat V, \hat S) and then learn about (V,S)(V, S) by studying (V^,S^)(\hat V, \hat S). In particular, we wish to (a) deduce the existence of a unique fixed point of (V,S)(V,S) only by studying (V^,S^)(\hat V, \hat S), and (b) compute this unique fixed point, working only with (V^,S^)(\hat V, \hat S). The next theorem shows the way. It draws on Lemma 5.2.1 and Lemma 5.2.2 for translation of fixed points and order stability, and then adds a convergence result.

Notice that, in (5.35), we iterate only in the “nice” system (V^,S^)(\hat V, \hat S), and finally transfer back to VV using the mapping GG.

5.2.1.2The Order-Reversing Case

Lemma 5.2.2 already tells us that order stability transfers between strongly semiconjugate systems when FF and GG are both order reversing. The next result parallels Theorem 5.2.3 for this case.

Note that the initial condition changes from wS^ww \preceq \hat S w in Theorem 5.2.3 to S^ww\hat S w \preceq w in Theorem 5.2.4. Starting above the fixed point in V^\hat V and iterating down generates a sequence that, after applying the order-reversing map GG, converges up to vˉ\bar v.

5.2.1.3Application: Firm Entry

To illustrate strong semiconjugacy and its implications, we now study a firm entry problem from Fajgelbaum et al. (2017), slightly extended to allow discount rates to change over time. The basic structure of the model is similar to the firm problem we studied in Section 1.1. We show that the functional equation that describes firm value can be solved in lower-dimensional space and then mapped back to the original higher-dimensional space.

Analogous to (1.12) on page , we take the lifetime value of a firm to be a function vv that solves

v(z,f)=max{s(z)f,  β(z)v(z,f)ϕ( ⁣df)Q(z, ⁣dz)}.v(z, f) = \max \left\{ s(z) - f, \; \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\}.

Here zz is an exogenous state, taking values in Rk\RR^k, the real number ff represents an iid fixed cost, with distribution ϕ\phi, and QQ is a stochastic kernel for the exogenous state. The value s(z)s(z) represents the present value of profits for the firm if it chooses to enter in the current period. Discounting is implemented via a state-dependent factor β(z)>0\beta(z) > 0.

Let ER+\Esf \subseteq \RR_+ be the set where ff takes values and let ZRk\Zsf \subseteq \RR^k be the set where zz takes values. Let X\Xsf be the cartesian product Z×E\Zsf \times \Esf. Let bXb\Xsf and bZb\Zsf be real-valued bounded and Borel measurable functions on X\Xsf and Z\Zsf respectively. We endow both function spaces with the pointwise partial order \leq and the supremum norm.

In (iii), the process (Zt)(Z_t) is QQ-Markov with initial condition zz. Note that, in the traditional constant discount rate setting, the term in (iii) is just βn\beta^n for some constant β(0,1)\beta \in (0,1), so the condition is automatically satisfied.

To solve (5.37) we introduce an operator SS, defined at each vbXv \in b\Xsf by

(Sv)(z,f)=max{s(z)f,β(z)v(z,f)ϕ( ⁣df)Q(z, ⁣dz)}.(Sv)(z, f) = \max \left\{ s(z) - f, \beta(z) \int \int v(z', f') \phi(\diff f') Q(z, \diff z') \right\}.

Evidently, vv is a fixed point of SS if and only if it solves the firm valuation equation (5.37).

Although we can study SS directly, this fixed point problem can be solved in a lower-dimensional space by working with an alternative operator T ⁣:bZbZT \colon b\Zsf \to b\Zsf defined at each wbZw \in b\Zsf by

(Tw)(z)=max{s(z)f,β(z)w(z)Q(z, ⁣dz)}ϕ( ⁣df)(T w)(z) = \int \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \phi(\diff f)

To connect SS and TT, we introduce the map G ⁣:bZbXG \colon b\Zsf \to b\Xsf defined via

(Gw)(z,f)=max{s(z)f,β(z)w(z)Q(z, ⁣dz)}((z,f)X)(G w)(z, f) = \max \left\{ s(z) - f, \beta(z) \int w(z') Q(z, \diff z') \right\} \qquad ((z,f) \in \Xsf)

and the map F ⁣:bXbZF \colon b\Xsf \to b\Zsf defined by

(Fv)(z)=v(z,f)ϕ( ⁣df)(zZ).(Fv)(z) = \int v(z, f) \phi(\diff f) \qquad (z \in \Zsf).

The significance of FF and GG stems from the next lemma.

We are now ready to use the low-dimensional system (bZ,T)(b\Zsf, T) to solve the higher-dimensional system (bX,S)(b\Xsf, S).

In proving Proposition 5.2.7, we use the following lemma.

5.2.2Factored Dynamic Programs

Now we switch from studying dynamical systems to studying dynamic programs. This is straightforward because we view dynamic programs (ADPs) as families of dynamical systems.

5.2.2.1Definition

A factored dynamic program (FDP) is a tuple (V,F,V^,G)(V, F, \hat V, \GG) where

  1. VV and V^\hat V are nonempty posets,

  2. FF is a map from VV to V^\hat V, and

  3. G{Gσ}σΣ\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma} is a family of maps from V^\hat V to VV, and

  4. the set {Gσv^}σΣ\{G_\sigma \, \hat v\}_{\sigma \in \Sigma} has a greatest element for every v^V^\hat v \in \hat V.

If FF and all GσG_\sigma are all order preserving, then we call (V,F,V^,G)(V, F, \hat V, \GG) an order preserving FDP. (In Section 5.2.3, we introduce order-reversing FDPs. This case is rarer, typically involving some form of nonstandard preferences.)

Given an FDP (V,F,V^,G)(V, F, \hat V, \GG), we introduce an operator family T{Tσ:σΣ}\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma} by setting

Tσ=GσF for all σΣ.T_\sigma = G_\sigma \circ F \text{ for all } \sigma \in \Sigma.

Evidently (V,T)(V, \TT) is an ADP. We call it the primary ADP generated by (V,F,V^,G)(V, F, \hat V, \GG).

The FDP (V,F,V^,G)(V, F, \hat V, \GG) also produces a second ADP (V^,T^)(\hat V, \hat{\TT}), where the policy operators in T^\hat{\TT} take the form

T^σ=FGσ for all σΣ,\hat T_\sigma = F \circ G_\sigma \qquad \text{ for all } \sigma \in \Sigma,

We call (V^,T^)(\hat V, \hat{\TT}) the subordinate ADP generated by (V,F,V^,G)(V, F, \hat V, \GG). The figure below illustrates, with the numbers indicating the order in which mappings are applied. (The ordering of the tuple (V,F,V^,G)(V, F, \hat V, \GG) traces the primary cycle on the left.)

5.2.2.2Some Preliminary Results

In this section we examine some basic consequences of the definitions, focusing on the order-preserving case.

To this end, let (V,F,V^,G)(V, F, \hat V, \GG) be an order-preserving FDP. We set

Gv^σGσv^(v^V^),\Gmax \, \hat v \coloneq \bigvee_\sigma G_\sigma \, \hat v \qquad (\hat v \in \hat V),

which is well-defined by (iv) in the definition of FDPs.

We now state some preliminary results concerning the ADPs generated by (V,F,V^,G)(V, F, \hat V, \GG).

Let (V,F,V^,G)(V, F, \hat V, \GG) be an order-preserving FDP with generated ADP (V,T)(V, \TT) and subordinate ADP (V^,T^)(\hat V, \hat{\TT}). We have already observed that the policy operators of (V,T)(V, \TT) and (V^,T^)(\hat V, \hat{\TT}) obey

Tσ=GσFandT^σ=FGσT_\sigma = G_\sigma \circ F \quad \text{and} \quad \hat T_\sigma = F \circ G_\sigma

for all σΣ\sigma \in \Sigma. It follows that each pair of policy systems (V,Tσ)(V, T_\sigma) and (V^,T^σ)(\hat V, \hat T_\sigma) is strongly semiconjugate under F,GσF, G_\sigma. As shown in Lemma 5.2.10, we also have

T=GF   on VandT^=FG   on V^.\tmax = \Gmax \circ F \; \text{ on } V \qquad \text{and} \qquad \htmax = F \circ \Gmax \; \text{ on } \hat V.

The proof is immediate from (5.52).

The strong semiconjugacy results in Lemma 5.2.11 imply helpful similarity properties for the two ADPs. The next lemma helps to illustrate.

Note that (5.52) implies that (V,T)(V, \tmax) and (V^,T^)(\hat V, \htmax) are strongly semiconjugate under order-preserving F,GF, \Gmax. This fact allows us to connect optima for the two ADPs and we use it repeatedly below.

5.2.2.3Optimality

Now we are ready to study the extent to which optimality properties are transferred under subordination. As before, the context is that (V,F,V^,G)(V, F, \hat V, \GG) is an order-preserving FDP with primary ADP (V,T)(V, \TT) and subordinate ADP (V^,T^)(\hat V, \hat{\TT}). The symbols T\tmax and T^\htmax denote their respective Bellman operators. When they exist,

v=σvσandv^=σv^σ\vmax = \bigvee_\sigma v_\sigma \quad \text{and} \quad \hvmax = \bigvee_\sigma \hat v_\sigma

will represent their respective value functions.

We begin with our main optimality result for order-preserving FDPs and the two ADPs they generate.

In the proof of Theorem 5.2.13, we repeatedly use the strong semiconjugacy results in Lemma 5.2.1 to transfer fixed points from one value space to another.

5.2.2.4A Converse Implication

We continue to assume that (V,F,V^,G)(V, F, \hat V, \GG) is an order-preserving FDP with primary ADP (V,T)(V, \TT) and subordinate ADP (V^,T^)(\hat V, \hat{\TT}). We saw in Theorem 5.2.13 that the following implication holds: if σ\sigma is optimal for (V,T)(V, \TT), then σ\sigma is optimal for (V^,T^)(\hat V, \hat{\TT}). The converse is not in general true. Often this is because, at an intuitive level, the subordinate ADP is blind to policy behavior at states that are unreachable under the transition dynamics, while the primary ADP cares about every state. (An example of this scenario is given in Section 8.3.2.3.)

To obtain such a converse, we use a strict monotonicity condition on FF. In what follows, \prec refers to the strict inequality defined in Section A.1.2.8. In particular, for elements u,vu, v, the statement uvu \prec v means that uvu \preceq v and not u=vu = v. Also, FF is strictly order preserving when FF is order preserving and uvu \prec v implies FuFvFu \prec Fv.

5.2.3Order-Reversing FDPs

An order-reversing FDP is an FDP (V,F,V^,G)(V, F, \hat V, \GG) where FF is order-reversing and each GσG_\sigma is order-reversing. As in the order-preserving case, TσGσFT_\sigma \coloneq G_\sigma \circ F and T^σFGσ\hat T_\sigma \coloneq F \circ G_\sigma define the policy operators for the primary ADP (V,T)(V, \TT) and subordinate ADP (V^,T^)(\hat V, \hat{\TT}), respectively. Since FF and GσG_\sigma are both order reversing, each TσT_\sigma and T^σ\hat T_\sigma is order preserving, so these are valid ADPs.

Throughout this section,

The primary ADP behaves just as in the order-preserving case:

The subordinate ADP is where the order-reversing case diverges. Because FF reverses order, it converts the supremum G\Gmax into an infimum, so FGF \circ \Gmax gives the Bellman min-operator rather than the max-operator.

The policy operator identities Tσ=GσFT_\sigma = G_\sigma \circ F and T^σ=FGσ\hat T_\sigma = F \circ G_\sigma hold for any FDP, regardless of order properties, so each pair of policy systems (V,Tσ)(V, T_\sigma) and (V^,T^σ)(\hat V, \hat T_\sigma) is strongly semiconjugate under F,GσF, G_\sigma. Combining these with Lemma 5.2.15 and Lemma 5.2.16, the Bellman operators obey

T=GF   on VandT^=FG   on V^.\tmax = \Gmax \circ F \; \text{ on } V \qquad \text{and} \qquad \htmin = F \circ \Gmax \; \text{ on } \hat V.

The proof is immediate from (5.58). Since FF and each GσG_\sigma are order reversing, the next exercise gives a parallel of Lemma 5.2.12.

We now study optimality for order-reversing FDPs. The primary ADP (V,T)(V, \TT) is a standard maximization problem, while the subordinate ADP (V^,T^)(\hat V, \hat{\TT}) is a minimization problem, with Bellman min-operator T^=FG\htmin = F \circ \Gmax. When they exist,

v=σvσandv^=σv^σ\vmax = \bigvee_\sigma v_\sigma \quad \text{and} \quad \hvmin = \bigwedge_\sigma \hat v_\sigma

will denote the max-value function of (V,T)(V, \TT) and the min-value function of (V^,T^)(\hat V, \hat{\TT}), respectively.

The next result is the order-reversing analog of Theorem 5.2.13, extended to include a converse implication under a strict monotonicity condition. In the statement, FF is called strictly order reversing when FF is order reversing and uvu \prec v implies FvFuFv \prec Fu.

5.3Applications

In Section 5.3.1, we show that the standard MDP and its Q-factor variant are the primary and subordinate ADPs of a single order-preserving FDP, unifying results previously proved separately. In Section 5.3.2, we apply the same framework to connect discrete choice Bellman equations with the post-action value function formulation used in structural estimation. In Section 5.3.3, we revisit the Epstein--Zin model and use factored dynamic programs to obtain a lower-dimensional subordinate ADP that is more efficient to solve.

5.3.1A Deeper Analysis of Q-Factors

In Section 3.2.1 we discussed both MDPs and Q-factor MDPs, proving separately that they satisfy the fundamental optimality properties. Of course, these two models are connected. Here we unify the models by representing them as the primary and subordinate pairs of an FDP.

We work with the finite-state MDP environment described in Section 3.2.1. Using the primitives described there, we form an order-preserving FDP (V,F,V^,G)(V, F, \hat V, \GG) by setting

(Fv)(x,a)=r(x,a)+βxv(x)P(x,a,x)(vRX,  (x,a)G)(Fv)(x, a) = r(x, a) + \beta \sum_{x'} v(x')P(x,a,x') \qquad \left(v \in \RR^\Xsf, \; (x,a) \in \Gsf \right)

and

(Gσf)(x)=f(x,σ(x))(σΣ,  fRG),(G_\sigma f)(x) = f(x, \sigma(x)) \qquad \left(\sigma \in \Sigma, \; f \in \RR^\Gsf \right),

with VRXV \coloneq \RR^\Xsf and V^RG\hat V \coloneq \RR^\Gsf. As required, FF maps VV to V^\hat V, and GσG_\sigma maps V^\hat V to VV, and both are order preserving. Also, fixing fV^f \in \hat V and choosing σ\sigma such that σ(x)argmaxaΓ(x)f(x,a)\sigma(x) \in \argmax_{a \in \Gamma(x)} f(x, a), we verify the existence of a policy σ\sigma with GτfGσfG_\tau \, f \leq G_\sigma f for all τΣ\tau \in \Sigma. Hence (V,F,V^,G)(V, F, \hat V, \GG) is an order-preserving FDP, as claimed.

The primary ADP (V,T)(V, \TT) generated by (V,F,V^,G)(V, F, \hat V, \GG) is produced by setting Tσ=GσFT_\sigma = G_\sigma \circ F for each σ\sigma, which gives

(Tσv)(x)=(GσFv)(x)=r(x,σ(x))+βxv(x)P(x,σ(x),x).(T_\sigma \, v)(x) = (G_\sigma \, F \, v)(x) = r(x, \sigma(x)) + \beta \sum_{x'} v(x') P(x, \sigma(x), x').

Thus, (V,T)(V, \TT) is nothing but the standard ADP generated from an MDP (see Section 2.3.3.1).

The subordinate ADP (V^,T^)(\hat V, \hat{\TT}) generated by (V,F,V^,G)(V, F, \hat V, \GG) is produced by setting T^σ=FGσ\hat T_\sigma = F \circ G_\sigma for each σ\sigma, which gives

(T^σf)(x,a)=(FGσf)(x,a)=r(x,a)+βxf(x,σ(x))P(x,a,x).(\hat T_\sigma \, f)(x, a) = (F \, G_\sigma \, f)(x, a) = r(x, a) + \beta \sum_{x'} f(x', \sigma(x')) P(x, a, x').

This map T^σ\hat T_\sigma is identical to the Q-factor policy operator SσS_\sigma we constructed in (3.14), on page . Thus, (V^,T^)(\hat V, \hat{\TT}) is just the Q-factor ADP we examined in Section 3.2.1.2 (where the ADP was written as (RG,S)(\RR^\Gsf, \SS)).

We already know that the fundamental optimality properties hold for both of these models, and that VFI, OPI, and HPI all converge. We took the time to prove these facts separately, in Section 3.2.1.1 and Section 3.2.1.2. Theorem 5.2.13 now tells us that one of these steps was unnecessary. Since these two ADPs are the primary and subordinate elements of the FDP (V,F,V^,G)(V, F, \hat V, \GG), establishing these facts for either one of the pairs is enough to establish them for the other.

In addition, we can use Theorem 5.2.13 to formally connect the value functions and optimal policies. In the next proposition, v\vmax is the MDP value function and q\qmax is the Q-factor value function.

In Section 5.2.2.4, we discussed the fact that the converse to (ii) fails to hold without additional conditions. In particular, to get the converse to (ii), we require that FF is strictly order preserving. Here’s how that result looks in the present case. In the result, the statement that X\Xsf has no isolated point under PP means that there is no xXx' \in \Xsf such that P(x,a,x)=0P(x,a,x')=0 for all (x,a)G(x,a) \in \Gsf.

5.3.2Structural Estimation via Transforms

In Section 4.2.3 we considered post-action value functions for discrete choice models in the context of structural estimation. Post-action value functions are a transformation of a more standard Bellman equation. Here we unify these two models through the lens of FDPs.

The setting we consider is the same as in Section 4.2.3.1. The state space X\Xsf is a metric space. The action set A\Asf is finite. The set of policies Σ\Sigma is all measurable maps from X\Xsf to A\Asf and, correspondingly, GX×A\Gsf \coloneq \Xsf \times \Asf. The reward function rRGr \in \RR^\Gsf is assumed to be bounded and Borel measurable, while PP is a stochastic kernel from G\Gsf to X\Xsf.

Consider the tuple (bX,F,bG,G)(b\Xsf, F, b\Gsf, \GG), where

(Fv)(x,a)=v(x)P(x,a, ⁣dx)(vbX)(Fv)(x, a) = \int v(x')P(x,a, \diff x') \qquad (v \in b\Xsf)

and

(Gσg)(x)=r(x,σ(x))+βg(x,σ(x)).(gbG).(G_\sigma \, g)(x) = r(x, \sigma(x)) + \beta g(x, \sigma(x)). \qquad (g \in b\Gsf).

We claim that (bX,F,bG,G)(b\Xsf, F, b\Gsf, \GG) is an order-preserving FDP. Clearly FF is an order-preserving map from bXb\Xsf to bGb\Gsf, while each GσG_\sigma is an order preserving map from bGb\Gsf to bXb\Xsf. Moreover, we proved in Exercise 4.8 that there exists a measurable map σ ⁣:XA\sigma \colon \Xsf \to \Asf obeying

σ(x)argmaxaA{r(x,a)+βg(x,a)}for all xX\sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta g(x, a)\} \quad \text{for all } x \in \Xsf

For any such σ\sigma we have GτgGσgG_\tau \, g \leq G_\sigma \, g for all τΣ\tau \in \Sigma. These facts confirm our claim.

The primary ADP (bX,TSE)(b\Xsf, \TT_{\rm SE}) for this model is the “natural” discrete choice version of the dynamic program. To describe it, we use Tσ=GσFT_\sigma = G_\sigma \circ F to determine the policy operators, yielding

(Tσv)(x)=r(x,σ(x))+βv(x)P(x,σ(x), ⁣dx).(T_\sigma \, v)(x) = r(x, \sigma(x)) + \beta \int v(x') P(x, \sigma(x), \diff x').

The corresponding Bellman equation is

v(x)=maxaA{r(x,a)+βv(x)P(x,a, ⁣dx)}(vbX).v(x) = \max_{a \in \Asf} \left\{ r(x, a) + \beta \int v(x') P(x, a, \diff x') \right\} \qquad (v \in b\Xsf).

For the subordinate ADP (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}), the policy operators are given by T^σg=FGσg\hat T_\sigma \, g = F G_\sigma \, g, yielding

(T^σg)(x,a)={r(x,σ(x))+βg(x,σ(x))}P(x,a, ⁣dx)(gbG).(\hat T_\sigma \, g)(x, a) = \int \left\{ r(x', \sigma(x')) + \beta g(x', \sigma(x')) \right\} P(x, a, \diff x') \qquad (g \in b\Gsf).

The corresponding Bellman equation is

g(x,a)=maxaA{r(x,a)+βg(x,a)}P(x,a, ⁣dx),g(x, a) = \int \max_{a' \in \Asf} \left\{ r(x', a') + \beta g(x', a') \right\} P(x, a, \diff x'),

which is exactly the post-action value function Bellman equation we examined in (4.45).

We proved in Proposition 4.2.4 that, for this ADP (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}), the fundamental optimality properties hold. Theorem 5.2.13 now implies that the same properties hold for (bX,TSE)(b\Xsf, \TT_{\rm SE}). It also tells us that any policy optimal for (bX,TSE)(b\Xsf, \TT_{\rm SE}) is also optimal for (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}), and that a policy σ\sigma is optimal for (bX,TSE)(b\Xsf, \TT_{\rm SE}) whenever Gσg=GgG_\sigma \, \gmax = \Gmax \, \gmax. (Here G=σGσ\Gmax = \bigvee_\sigma G_\sigma and g\gmax is the value function for (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}), which is the unique solution to the post-action value function Bellman equation.)

In the present setting, this means we can compute an optimal policy for the primary ADP as follows:

  1. Compute the unique (in bGb\Gsf) fixed point g\gmax of the Bellman operator T^\hat T corresponding to the subordinate (post-action value) ADP (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}) and

  2. compute a policy σ\sigma obeying

σ(x)argmaxaA{r(x,a)+βg(x,a)}for all xX.\sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta \gmax(x, a)\} \quad \text{for all } x \in \Xsf.

Also, if the function FF in (5.67) is strictly order preserving, then by Proposition 5.2.14, we can compute an optimal policy for (bX,TSE)(b\Xsf, \TT_{\rm SE}) by solving for an optimal policy for (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}). The strictly order preserving property does not hold for FF in general. For a sufficient condition when X\Xsf is finite, see Example A.1.12.

5.3.3Epstein--Zin Revisited

We consider a special case of the Epstein--Zin model from Section 5.1.3 involving optimal savings with iid endowments. Using the FDP framework, we construct a subordinate ADP that operates on functions of wealth alone, rather than wealth and endowment jointly. This yields significant computational savings, which we quantify numerically.

5.3.3.1Subordination in an Epstein--Zin Setting

In this section we consider a special case of the Epstein--Zin ADP (V,T)(V, \TT) analyzed in Section 5.1.3. The special case concerns optimal savings in the presence of an iid endowment process. We will produce a subordinate ADP via a transformation reminiscent of the expected value transformation of an ordinary MDP in Section 5.3.2. We will see that this subordinate ADP is easier to analyze and solve.

We begin with a finite set W\Wsf of possible wealth values and a finite set E\Esf of possible values for the endowment process. (Finiteness helps simplify the exposition and can be replaced by continuity and compactness conditions.) The Bellman equation takes the form

v(w,e)=maxwΓ(w,e){(1β)r(w,w,e)α+β(ev(w,e)νϕ(e))α/ν}1/α.v(w, e) = \max_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta \left( \sum_{e'} v(w', e')^\nu \phi(e') \right)^{\alpha/ \nu} \right\}^{1/\alpha}.

Here Γ(w,e)W\Gamma(w, e) \subset \Wsf is the set of all feasible choices for next period wealth ww' given current wealth ww and current endowment ee. The new endowment ee' is drawn independently from distribution ϕ\phi, which maps E\Esf into [0,1][0,1].

This model is a special case of the Epstein--Zin ADP from Section 5.1.3. To see this we set XW×E\Xsf \coloneq \Wsf \times \Esf, with typical element x=(w,e)x = (w, e). Let VV be the order interval [v11/ν,v21/ν](0,)X[v_1^{1/\nu}, v_2^{1/\nu}] \subset (0, \infty)^\Xsf defined in Section 5.1.3 (see (5.25)). With a=wa = w' and A=W\Asf = \Wsf, the Bellman equation (5.75) is a special case of (5.19).

To improve analysis we produce an order-preserving FDP where the primary ADP is the model just discussed and the subordinate ADP operates in a lower-dimensional state space. To do so we set VV as above,

(Fv)(w)={ev(w,e)νϕ(e)}1/ν(wW),(Fv)(w) = \left\{ \sum_{e} v(w, e)^\nu \phi(e) \right\}^{1/\nu} \qquad (w \in \Wsf),

which maps VV to V^F(V)\hat V \coloneq F(V), and

(Gσh)(w,e)={(1β)r(w,σ(w,e),e)α+βh(σ(w,e))α}1/α((w,e)X),(G_\sigma \, h)(w, e) = \left\{ (1-\beta) r(w, \sigma(w, e), e)^\alpha + \beta h(\sigma(w, e))^\alpha \right\}^{1/\alpha} \qquad ((w,e) \in \Xsf),

Both FF and GσG_\sigma are order-preserving. Assuming that Γ(w,e)\Gamma(w, e) is nonempty at each (w,e)X(w, e) \in \Xsf, one easily verifies the existence, for each hV^h \in \hat V, of a policy σ\sigma such that

σ(w,e)argmaxwΓ(w,e){(1β)r(w,w,e)α+βh(w)α}1/αfor all (w,e)X.\sigma(w, e) \in \argmax_{w' \in \Gamma(w, e)} \left\{ (1-\beta) r(w, w', e)^\alpha + \beta h(w')^{\alpha} \right\}^{1/\alpha} \quad \text{for all } (w,e) \in \Xsf.

For this σ\sigma we have GτhGσhG_\tau h \leq G_\sigma h for all τΣ\tau \in \Sigma. Hence, with G={Gσ}σΣ\GG = \{G_\sigma\}_{\sigma \in \Sigma}, the tuple (V,F,V^,G)(V, F, \hat V, \GG) is an order-preserving FDP.

Inspecting (5.79), we see that the corresponding Bellman equation is (5.75). Thus, the primary ADP (V,T)(V, \TT) corresponds to the original problem we considered at the start of this section. Let’s now look at the subordinate problem.

The benefit of working with (V^,T^)(\hat V, \hat{\TT}) is that T^σ\hat T_\sigma acts on functions that depend only on ww rather than on both ww and ee (as is the case for TσT_\sigma). These lower dimensional operations are significantly more efficient, even when E\Esf is relatively small.

Since (V,T)(V, \TT) is a special case of the ADP discussed in Section 5.1.3, Proposition 5.1.13 implies that the fundamental optimality properties hold for (V,T)(V, \TT). As a result, Theorem 5.2.13 implies that they also hold for (V^,T^)(\hat V, \hat{\TT}). It also tells us that we can obtain an optimal policy for (V,T)(V, \TT) by finding the value function v^\hvmax for (V^,T^)(\hat V, \hat{\TT}) and then calculating a policy σ\sigma obeying Gσv^=Gv^G_\sigma \, \hvmax = \Gmax \, \hvmax. By the definition of GσG_\sigma in (5.77), this means that we solve for σ\sigma satisfying (5.78) after setting h=v^h = \hvmax.

To compute v^\hvmax, we can use Theorem 2.2.6, which tells us that Howard policy iteration applied to (V^,T^)(\hat V, \hat{\TT}) converges to v^\hvmax in finitely many steps. Summarizing this analysis, an optimal policy for (V,T)(V, \TT) can be computed via Algorithm 5.3.1.

Figure 5.2 shows wσ(w,e)w \mapsto \sigopt(w, e) for two values of ee (smallest and largest) when σ\sigopt is the optimal policy, calculated using Algorithm 5.3.1. In the computation we set Γ(w,e)=[0,w]\Gamma(w, e) = [0, w] and r(w,s,e)=ws+er(w, s, e) = w - s + e. We chose α\alpha and ν\nu to match the values used in Schorfheide et al. (2018).

In Figure 5.3 we display the relative speed gain from using the lower-dimensional model (V^,T^)(\hat V, \hat{\TT}) instead of (V,T)(V, \TT) across multiple choices of W|\Wsf| and E|\Esf|. The speed gain is the time required to solve an optimal policy for (V,T)(V, \TT) using HPI applied to (V,T)(V, \TT) (as in Theorem 2.2.6), divided by the time required to solve for the same optimal policy via Algorithm 5.3.1. The speed gain increases linearly in the size of E\Esf.

Optimal savings policy with Epstein--Zin preference

Figure 5.2:Optimal savings policy with Epstein--Zin preference

Speed gain from replacing (V, \TT) with subordinate model (\hat V, \hat{\TT})

Figure 5.3:Speed gain from replacing (V,T)(V, \TT) with subordinate model (V^,T^)(\hat V, \hat{\TT})

5.4Chapter Notes

A recurring theme in applied dynamic programming has been the rearrangement of the Bellman equation into alternative functional forms in order to obtain analytical, computational, or statistical advantages. Familiar instances include reservation-wage and continuation-value formulations of optimal stopping problems, used as early as the job-search analysis of McCall (1970) and developed extensively in the real-options literature (see, e.g., Dixit & Pindyck (2012)); the expected and integrated value-function transformations used in structural estimation of discrete choice models, pioneered by Rust (1987); the Q-factor representations introduced for reinforcement learning by Watkins (1989); and lower-dimensional reformulations that integrate out independent or uncontrollable states. Systematic studies of such transformations can be found in Ma & Stachurski (2019) and Ma & Stachurski (2021). The order-theoretic framework adopted in this chapter, building on the abstract foundations of Sargent & Stachurski (2025), unifies and extends those analyses by treating the underlying maps in their own right rather than as recipes for individual Bellman equations. Closely related ideas in the more concrete setting of finite-state MDPs can be found in Chapter 5 of Sargent & Stachurski (2025).

The topological conjugacy results in Section 5.1.1.2 are applied in Section 8.3.4 of Chapter 8 to establish the equivalence of value function iteration and time iteration.

The applications in Section 5.3 unify topics treated in their own right elsewhere in this volume. References to the underlying literatures can be found in the corresponding chapter notes: see Section 3.3 for Q-factor representations and reinforcement learning, Section 4.3 for structural estimation of dynamic discrete choice models, and Section 1.6 for Epstein--Zin preferences. The optimal harvest model in Section 8.3.2 of Chapter 8 is another example of a factored dynamic program in the sense of Section 5.2.2.

The firm-entry application in Section 5.2 is adapted from Fajgelbaum et al. (2017), slightly extended to permit time-varying discounting; for related industry-dynamics models see Hopenhayn (1992). The parameter values used in Figure 5.2 follow Schorfheide et al. (2018).

References
  1. Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
  2. Fajgelbaum, P. D., Schaal, E., & Taschereau-Dumouchel, M. (2017). Uncertainty traps. The Quarterly Journal of Economics, 132(4), 1641–1692.
  3. Stachurski, J., & Zhang, J. (2021). Dynamic programming with state-dependent discounting. Journal of Economic Theory, 192, 105190.
  4. Schorfheide, F., Song, D., & Yaron, A. (2018). Identifying long-run risks: A Bayesian mixed-frequency approach. Econometrica, 86(2), 617–654.
  5. McCall, J. J. (1970). Economics of Information and Job Search. The Quarterly Journal of Economics, 84(1), 113–126.
  6. Dixit, R. K., & Pindyck, R. S. (2012). Investment under uncertainty. In Investment Under Uncertainty. Princeton University Press.
  7. Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 999–1033.
  8. Watkins, C. J. C. H. (1989). Learning from delayed rewards [Techreport]. PhD Thesis, King’s College, Cambridge United Kingdom.
  9. Ma, Q., & Stachurski, J. (2019). Optimal timing of decisions: A general theory based on continuation values. Journal of Economic Dynamics and Control, 101, 62–81.
  10. Ma, Q., & Stachurski, J. (2021). Dynamic programming deconstructed: Transformations of the Bellman equation and computational efficiency. Operations Research, 69(5), 1591–1607.
  11. Sargent, T. J., & Stachurski, J. (2025). Dynamic Programs on Partially Ordered Sets. SIAM Journal on Optimization and Control, in press.
  12. Hopenhayn, H. A. (1992). Entry, exit, and firm dynamics in long run equilibrium. Econometrica: Journal of the Econometric Society, 1127–1150.