Many applications of interest have some kind of algebraic structure, for example when the value space is a subset of a vector space. In this chapter, we add algebraic structure to the value space and the policy operators. This structure can be exploited to provide sharp optimality conditions and more explicit results.
When introducing algebraic structure to ADP theory, we work in the setting of Banach lattices. These spaces are attractive for ADP analysis, due to well-integrated order, algebraic and metric properties. (See Section A.5.3.3 for background on Banach lattices. We recall that, given a Banach space E, B(E) is the set of all bounded linear operators from E to itself and, for L∈B(E), the symbol ρ(L) denotes the spectral radius (see Section A.4.3). We also use the notion of positive operators defined in Section A.5.2.2. Loosely speaking, positive operators between Banach lattices are generalizations of nonnegative matrices.)
We begin by developing optimality theory for ADPs on Banach lattices, covering contractions and Blackwell’s condition (Section 4.1.1), order contractions (Section 4.1.2), and concavity-based methods (Section 4.1.3). We then apply the theory to firm valuation, real options, and structural estimation.
In Section 4.1.1.1 we review eventually contracting maps and Blackwell’s condition. In Section 4.1.1.2 we use these tools to obtain optimality results for ADPs satisfying a Blackwell-type discounting condition.
Let E=(E,∥⋅∥,⩽) be a Banach lattice with positive cone E+. Let S be a self-map on a subset V of E. Recall from Section A.2.2.2 that S is said to be eventually contracting on V if Sn is a contraction for some n∈N; that is, there exists a λ∈[0,1) and an n∈N such that
The following result is an obvious consequence of Theorem A.2.8.
There is a well-known technique for testing for contractivity (or eventual contractivity) of order preserving maps via Blackwell’s condition. This technique is often used for dynamic programming (see, e.g., Stokey & Lucas (1989), Theorem 3.3). Here we state an abstract version, for a self-map S on V⊂E, focusing on contractivity in one step. (For eventual contractivity, replace S with Sn.) In the statement, we assume that E has a normalized order unit. Recalling the definition in Section A.5.3.4, this means that there exists an e∈E+ obeying ∥e∥=1 and ∣v∣⩽∥v∥e for all v∈E. Also, we recall that V⊂E is called increasing if u,v∈E with u⩽v and u∈V implies v∈V.
Throughout this section, E is a Banach lattice containing a normalized order unit e and V is a closed increasing subset of E. In the statement of the next theorem, V0 is a subset of V.
A self-map M on V will be called a certainty equivalent operator if M is order preserving and translation invariant; that is,
(The terminology comes from the abstract theory of certainty equivalents. We cover these objects in more detail in Section 7.2.4.)
The next exercise treats a special case in which the policy operators of an additive ADP (see (4.1)) are built from certainty equivalent operators. Certainty equivalents are treated in more detail in Section 7.2.4. As before, V is a closed increasing subset of E and V0 is a subset of V.
We first define order contractions and establish fixed point results. We then obtain optimality conditions for order contracting ADPs and specialize to the case of affine policy operators in Section 4.1.2.3.
Let E=(E,∥⋅∥,⩽) be a Banach lattice with positive cone E+. We call D:E+→E+ a discount operator on E if
D0=0,
D is order-preserving, and
D is eventually contracting.
The reason we call such an operator D a discount operator will become clearer below. Intuitively, D is eventually contracting and has a fixed point at zero, so Dnh→0 for all h∈E+. If we think of h as a time-n payoff and Dnh as its (state-contingent) present value, then the properties of D imply that the present value is increasing in the payoff (since D is order-preserving) and the present value converges to zero as the future date of the payoff moves to the infinite future.
Fix V⊂E. Let S be a self-map on V. We call San order contraction of modulus D on V if there exists a discount operator D on E such that
Order contracting maps obey the following fixed point result.
The next exercise states a variation on Blackwell’s condition that is suitable for order contractions. To begin, we take V to be a subset of E such that
Next we study ADPs where the value space V is a subset of a Banach lattice E. The results in this section extend to ADPs where the operators are order contractions. We do not require existence of a normalized order unit, as in Section 4.1.1.2. This will allow us to handle ADPs that evolve in Lp spaces and simplify treatment of features such as state-dependent discounting.
Our first result treats the case where T is finite.
Our second result replaces finiteness with uniform order contractivity. In the statement of the next theorem, V0 is a subset of V and V is closed in E.
Next we focus on ADPs operating in Banach lattices where the policy operators are affine. As before, E is a Banach lattice with positive cone E+, linear operators B(E) and positive linear operators B+(E).
Our second result replaces finiteness with a uniform discount operator bound. In the statement, V0 is a subset of V and V is closed in E.
Some dynamic programs involve nonlinear policy operators that fail to be contractions. For example, models with recursive preferences or ambiguity often have these features. In this setting, we can deploy alternative fixed point results related to concavity and convexity of operators. Here we apply such results to obtain optimality conditions for ADPs.
We first state Du’s fixed point theorem for concave and convex operators and then apply it to obtain optimality conditions for ADPs on order intervals.
Let E=(E,∥⋅∥,⩽) be a Banach lattice and suppose that V=[a,b] for some a,b∈E. In this setting, we say that a self-map S on V satisfies Du’s conditions if either
S is concave and Sa⩾a+ϵ(b−a) for some ϵ∈(0,1), or
S is convex and Sb⩽b−ϵ(b−a) for some ϵ∈(0,1).
We state a result below connecting Du’s conditions to global stability. Before doing so, we note some useful sufficient conditions that can be used when the positive cone of the Banach lattice has nonempty interior. To state them we write x≪y if y−x is interior to E+.
Here is the stability result.
For a proof consult Du (1990) or Zhang (2012), Theorem 2.1.2.
We apply the theory developed above to three classes of problems. In Section 4.2.1 we study firm valuation under constant discounting, unbounded rewards, and state-dependent discounting. In Section 4.2.2 we analyze a real option problem. In Section 4.2.3 we treat structural estimation models, including state-dependent discounting and non-expected utility preferences.
Let’s now return to the firm valuation problem from Section 1.1 and discuss optimality results. We will look at the original bounded case with discounting, a second case with unbounded rewards (boundedness of the profit function is replaced by an integrability condition), and a third case involving state-dependent discounting.
In Section 2.3.1, we revisited the firm problem from Section 1.1 and showed that (bX,TFV) is an ADP, where TFV:={Tσ:σ∈Σ} is the set of all policy operators having the form
(Here σs is understood as the map x↦sσ(x) and so on.) We recall that the policy set Σ is all B-measurable functions mapping X to {0,1}, which coincides with the set of indicator functions on B, and that the indicator function
is v-greedy. Existence of v-greedy policies for all v implies that the firm ADP is regular.
In Theorem 1.1.1 we stated optimality results for the firm valuation problem. In Section 1.1.1.4 we supplied a direct proof. Here’s a more general result and a proof using the theory from this chapter:
As usual, the ADP Bellman operator T obeys Tv=Tσv whenever σ is v-greedy, so, using the description of greedy policies above, we have Tv=s∨(π+βPv). The Bellman equation is therefore v=s∨(π+βPv), which, in expanded form, becomes
Proposition 4.2.1 implies that the value function v∗:=⋁σvσ solves the Bellman equation (4.26), and that it can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are v∗-greedy, so σ=1{s⩾π+βPv∗} is optimal. This is the only optimal policy under the convention that the manager always sells the firm when indifferent between selling and continuing.
Figure 4.2:HPI iterates for the firm valuation problem
Figure 4.2 illustrates HPI applied to the firm valuation problem using the same parameters as Figure 1.4. The initial policy is σ0≡0 (never sell). At each step k, the lifetime value vσk is obtained by solving the linear system v=rσk+Kσkv, where rσk and Kσk are as defined in (4.25). The next policy σk+1 is then set to be vσk-greedy. In the right panel, the value functions rise monotonically towards v∗ (shown in black). The left panel shows the corresponding sell thresholds converging to the optimal policy σ∗.
Proposition 4.2.1 used the assumptions in Theorem 1.1.1, one of which was that the profit function π is bounded. In practice, many of the functions that we use for applied modeling are unbounded. In Section 1.1.2.2 we sketched ideas for extending the optimality results to an unbounded setting, where π is assumed instead to be integrable. Let’s continue that discussion here.
Instead of boundedness, we assume π lies in L1(ψ):=L1(X,B,ψ) for a distribution ψ on (X,B) that is stationary for P (see Section A.5.4.4). We endow L1(ψ) with the almost everywhere pointwise partial order (see Section A.5.2.5). In particular, for f,g∈L1(ψ), the statement f⩽g means that {x∈X:f(x)>g(x)} has ψ-measure zero.
The firm valuation ADP is now (L1(ψ),TFV), where each Tσ in TFV again has the form (4.23), but now with v and π being elements of L1(ψ), while P is understood as a Markov operator on L1(ψ). In Section 1.1.2.2 we showed that each policy operator Tσ maps L1(ψ) into itself. Moreover, Tσ is order preserving with respect to ⩽; if v⩽w holds ψ-almost everywhere, then
for all x∈X, and Tσv⩽Tσw easily follows. This confirms that (L1(ψ),TFV) is an ADP. The ADP is regular because, given v∈L1(ψ), the indicator in (4.24) is still v-greedy.
The claims in Proposition 4.2.1 extend to the ADP (L1(ψ),TFV). The proof follows from Theorem 4.1.8. Each Tσ has the required form Tσv=rσ+Kσv for rσ∈L1(ψ) and Kσ∈B+(L1(ψ)); we again set rσ and Kσ as in (4.25). With K:=βP, we have Kσ⩽K for all σ and, by Lemma A.5.32 on page , ρ(K)=ρ(βP)=β<1. As (L1(ψ),TFV) is regular, the conclusions of Theorem 4.1.8 apply.
We emphasized the importance of time-varying discount rates in Section 1.1.2.1. To incorporate such variation into our model, we set rt=r(Xt), where r is a B-measurable function, and then β(x)=1/(1+r(x)) for all x∈X. We require that r>−1, so that $\beta
0.Forsimplicity,wealsoassumethat\betaisboundedand,asinref‘sss−condi‘,thattheprofitfunction\piisbounded.Atcurrentstatex,afixedpolicy\sigmayieldslifetimefirmvaluev_\sigma(x),wherev_\sigma$ satisfies the recursion
The operator K discounts future cash flows given the dynamics of discounting embedded in β and P. Since β is bounded, K is a positive linear operator sending bX into itself. It follows that each Tσ is an order-preserving self-map on bX. Hence, letting TFV be all policy operators of the form (4.29), with σ ranging over the policy set Σ, the pair (bX,TFV) is an ADP. It represents the dynamic decision problem for the firm under state-dependent discounting.
Using similar arguments to the constant discounting case, we can confirm that the policy σ=1{s⩾π+Kv} is v-greedy. The ADP Bellman operator T obeys Tv=Tσv whenever σ is v-greedy, so, using the description of greedy policies just given, we have $T v = s \vee (\pi
K v).TheBellmanequationisthereforev = s \vee (\pi + K v)$, which, in expanded form, becomes
Since greedy policies always exist, the ADP (bX,TFV) is regular.
In order to obtain optimality results, we need some degree of stability for the ADP. To this end, we impose the following condition:
Assumption 4.2.1 requires that the discount factor is sufficiently small “on average,” so that lifetime values are finite and uniquely defined.
To illustrate the factors that influence ρ(K), suppose that (Xt) follows the AR(1) process Xt+1=μ(1−α)+αXt+νZt+1, with Zt iid standard normal, 0<α<1, and ν>0, discretized via Tauchen’s method, with discount factor β(x)=ex. Figure 4.3 shows contour plots of ρ(K) as α and ν vary, for two values of the long-run mean μ. The black line marks the boundary ρ(K)=1; Assumption 4.2.1 holds below and to the left of this line. A more negative μ (lower average discount factor) enlarges the stable region. In both panels, ρ(K) increases with persistence and volatility, reflecting the fact that greater variation in the state process pushes the “average” discount factor upward.
Figure 4.3:Spectral radius ρ(K) as a function of the AR(1) parameters, with β(x)=ex
We can now state the following result for the firm valuation ADP (bX,TFV) under state-dependent discounting.
Consider a firm that has developed a prototype product and faces a strategic decision: launch now, or continue development? The relative payoffs for these different choices depend on the economic environment, which evolves stochastically over time. This scenario is an example of a real option problem. The firm holds an option to launch, analogous to a call option on a financial asset, and must determine when to exercise it. Waiting has value when it allows the firm to launch under more favorable conditions.
A state process (Xt) summarizes payoff-relevant information at time t, such as demand conditions, competitive pressure, factor prices, and regulatory stance. The state influences the firm through two channels. Development costs c(Xt) rise when input prices increase or skilled labor becomes scarce. Post-launch profitability π(Xt) depends on demand strength and competitive conditions at the time of launch. Management strategies must balance these forces, weighing the benefits of launching in different states against the costs of continued development.
We assume that (Xt)t⩾0 is P-Markov, where P is a stochastic kernel on the measurable space (X,B). At the start of time t, the firm pays a current flow cost c(Xt) for development. Then, after observing the new state Xt+1, management decides whether or not to launch the product. If they decide to launch, the firm receives a state-contingent profit flow (π(Xt+j))j⩾1, where π:X→R is a given function. If they decide to wait, then no launch occurs and the process repeats. We impose the following weak conditions.
We will seek solutions to the optimization problem in the space L1(ϕ):=L1(X,B,ϕ). As in Section 4.2.1.2, we endow L1(ϕ) with the usual L1 norm and the ϕ-a.e. pointwise order ⩽, so that f⩽g means ϕ{f>g}=0.
Let (Qt+j)j⩾1 be the expected present value of the profit flow conditional on deciding to launch. This sequence obeys the recursion
Exploiting the time homogeneity of the state process and following the same logic that we used to solve the recursion (1.3) on page , we find that Qt+j=q(Xt+j) for all j⩾1, where q is the function that solves
we can write the recursion as q=π+Kq. Under the ϕ-a.e. pointwise order, K is order-preserving.
We can confirm that K maps L1(ϕ) to itself by using the fact that, under Assumption 4.2.2, the discount function β obeys ∣β∣⩽N for some N∈N. Hence, for v∈L1(ϕ),
We derive lifetime values under a spectral radius condition on the discount operator and establish well-posedness of the ADP. Optimality results are given in Section 4.2.2.3.
We impose conditions under which the recursion q=π+Kq has a unique solution, implying that the profit function associated with launching the product is well defined.
With this function in hand, we can write out the lifetime value of policies. A policy here is a Borel measurable map σ from X to {0,1} with σ(x)=1 indicating the decision to launch the product at state x and σ(x)=0 indicating the decision to continue. As usual, we use Σ to represent the set of all policies. If σ∈Σ and x∈X, then vσ(x) denotes total firm value under policy σ, given initial state x. This function vσ obeys the recursion
Under Assumption 4.2.2 and Assumption 4.2.3, each Tσ is a self-map on L1(ϕ). Since K is a positive operator, each Tσ is order preserving. Hence, letting TRO be all policy operators of the form (4.38), with σ ranging over the policy set Σ, the pair (L1(ϕ),TRO) is an ADP.
Let’s now turn to optimality. We start by studying greedy policies.
Exercise 4.7 implies that (L1(ϕ),TRO) is regular. From Lemma 2.1.1 we know that the ADP Bellman operator satisfies Tv=Tσv whenever σ is v-greedy. Using this fact and the greedy policy σ=1{q⩾v} from Exercise 4.7, we obtain
We can now state the following result for the real option ADP (L1(ϕ),TRO).
Proposition 4.2.3 implies that the value function v∗ solves the Bellman equation (4.43), and that v∗ can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are v∗-greedy, which, by Exercise 4.7, translates to setting σ=1{q⩾v∗}.
Structural estimation is a core sub-field of quantitative economics that also plays a significant role in finance, marketing, operations research, and adjacent fields. Under this approach to estimation, researchers model economic agents as if they solve dynamic programs. The econometric challenge is to infer parameters that bring the model outputs (which are typically simulated from solutions to the underlying dynamic programs) as close as possible to the data. Algorithm 4.2.1 gives an outline of the idea, with DP(θ) referring to a given dynamic program using a fixed parameterization indexed by θ.
Typically, DP(θ) needs to be solved many times before convergence. In this section, we set aside the estimation step, where d(Mθ,D) is constructed and θ is updated. We focus instead on the step where we compute optimal policy σθ by solving DP(θ). The types of dynamic programs typically adopted in this field have some interesting characteristics, which motivates our study.
We note that structural estimation is sometimes called dynamic discrete choice because the action space is typically finite. Below we will look at settings where the state space is arbitrary and the action space is finite.
Here F and G are conditional distributions, while e is, in most cases, a form of unobserved heterogeneity. By taking x=(z,e) and relabeling, we can write this equation as
Here (x,a) takes values in G:=X×A. We take X to be a metric space. The reward function r∈RG is assumed to be bounded and Borel measurable, while P is a stochastic kernel from G to X. The function g is usually called a post-action value function, since it returns the value of the state after committing to a given action in the current period.
The Bellman equation (4.45) is nonstandard relative to traditional presentations of dynamic programming due to (a) the reversed order of integration and maximization, and (b) the dependence of g on both state and action. It also fails to fit into the abstract dynamic programming formulation of Bertsekas (2022), Ren & Stachurski (2021), Toda (2024), etc. At the same time, there are significant advantages of working with this version of the Bellman equation in structural estimation settings (see, e.g., Rust (1994), Kristensen et al. (2021), or Chapter 5 of Sargent & Stachurski (2025)).
Despite its nontraditional format, we can set this problem up as an ADP by taking Σ to be the set of Borel measurable maps from X to A and, for each σ∈Σ, introducing the policy operator
Recalling that G is the product space X×A, let (bG,⩽) be the set of bounded Borel measurable functions in RG paired with the pointwise partial order. Using boundedness and measurability of r, it is straightforward to show that each T^σ is an order preserving self map on (bG,⩽). Letting T^SE be the set of all such T^σ, the pair (bG,T^SE) forms an ADP.
Does such a σ necessarily exist? On one hand, A is finite and nonempty, so the argmax set is nonempty for all x. On the other hand, we still need to address the following measurability issue:
(To address this problem, we can use a measurable selection theorem, such as Theorem A.3.3. But that theorem has a difficult proof. In the solution to Exercise 4.8, we use a more elementary argument.)
In view of Exercise 4.8, the ADP (bG,T^SE) is regular.
Given g, let σ be a g-greedy policy. The ADP Bellman operator obeys T^g=T^σg. Using this fact and (4.47), we see that
for all (x,a)∈G. It follows that the ADP Bellman equation g=T^g is equivalent to (4.45), confirming that our ADP accurately represents the problem we began with. We can now turn to optimality.
Next we consider a modification of the structural estimation model in Section 4.2.3.1 that includes state-dependent discounting. In this setting, the Bellman equation becomes
where (x,a)∈G:=X×A and A,X are finite and nonempty. As usual, r is a reward function on G and P is a transition kernel from G to X. The discount factor β is allowed to be a function of the state. We let ∥⋅∥ be the supremum norm and take Σ to be the set of all functions from X to A. The state X is taken to be finite to simplify the analysis.
Given σ∈Σ, let T^σ be defined at g∈RG and (x,a)∈G by
Let T^SE={T^σ}σ∈Σ. Each T^σ is an order-preserving self-map on RG, so (RG,T^SE) is an ADP. For each g∈RG, we can construct a g-greedy policy by taking a σ∈Σ such that
Since A is finite and nonempty, such a σ exists. (We have no measurability issues here because X is also finite.) As a result, the ADP (RG,T^SE) is regular.
Let T^ be the Bellman operator, so that T^g:=⋁σT^σg. When evaluated at a g-greedy policy σ, we have T^σg=T^g. Using this equality and (4.56) yields
Some studies find incompatibilities between data and predictions of models that use additively separable preferences and mathematical expectation to evaluate uncertain outcomes (see, e.g., Lu et al. (2024)). To further this line of analysis, we revisit the basic structural estimation model in Section 4.2.3.1 while replacing mathematical expectation with a general certainty equivalent operator.
As in Section 4.2.3.1, spaces of bounded real-valued functions are paired with the pointwise order ⩽, and the supremum norm, to be denoted by ∥⋅∥. The state space X is a metric space, A is a finite choice set, G:=X×A, the reward function r:G→R is measurable, and β∈(0,1) is a constant discount factor. However, we modify the Bellman equation (4.45) for the post-action value function to
Here M is a certainty equivalent operator mapping bX into bG; that is, M is order-preserving and M(v+κ1)=Mv+κ1 for all κ∈R+. An example is given below in Section 4.2.3.4.
Let Σ be the set of Borel measurable maps from X to A. Given σ∈Σ, we set
Since A is finite and nonempty, such a policy always exists. (A function σ obeying (4.62) can be chosen to be measurable---see the solution to Exercise 4.8.)
Given g∈bG, the ADP Bellman operator T^ satisfies T^g=T^σg whenever σ is g-greedy. Using this fact and (4.62), we obtain T^g=MHg. Hence any fixed point of T^ solves the original Bellman equation (4.60).
where P is a stochastic kernel from G to X and γ is a nonzero constant. Among other things, Proposition 4.2.6 tells us that σ∈Σ is optimal if and only if
This chapter extends the ADP framework of Chapter 2 and Chapter 3 by working within partially ordered metric and topological spaces, which provide a natural setting for combining contraction-based and order-theoretic reasoning. In practice, most of the pospaces we examine are subsets of Banach lattices. Background on Banach lattices and positive operators can be found in the references listed in the chapter notes of Chapter 2. Du’s conditions for concave and convex operators, treated in Section 4.1.3, originate in Du (1990); see also Zhang (2012), Theorem 2.1.2. Certainty equivalent operators, which appear in Section 4.2.3.3, are covered in depth in Section 7.2.4.
The applications in this chapter draw on several fields. The discounting condition in Assumption 4.2.3 is similar to restrictions found in Hansen & Scheinkman (2012) and Borovička & Stachurski (2020). General results on dynamic programming with state-dependent discounting can be found in Stachurski & Zhang (2021). The structural estimation framework in Section 4.2.3 originates with the classic work of Rust (1987); for further background, see Rust (1994), Igami (2020), Kristensen et al. (2021), and Chapter 5 of Sargent & Stachurski (2025). A duality-based perspective on dynamic discrete-choice models is developed in Chiong et al. (2016). The non-expected utility extension in Section 4.2.3.3 is motivated by evidence reviewed in Lu et al. (2024). Nonexpected utility is considered again in Section 7.2.4.
Stokey, N. L., & Lucas, R. E. (1989). Recursive methods in dynamic economics. Harvard University Press.
Harrison, J. M., & Kreps, D. M. (1978). Speculative investor behavior in a stock market with heterogeneous expectations. The Quarterly Journal of Economics, 92(2), 323–336.
Du, Y. (1990). Fixed points of increasing operators in ordered Banach spaces and applications. Applicable Analysis, 38(01–02), 1–20.
Zhang, Z. (2012). Variational, topological, and partial order methods with their applications (Vol. 29). Springer.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 999–1033.
Bertsekas, D. P. (2022). Abstract dynamic programming (3rd ed.). Athena Scientific.
Ren, G., & Stachurski, J. (2021). Dynamic programming with value convexity. Automatica, 130, 109641.
Toda, A. A. (2024). Essential Mathematics for Economics (1st ed., p. 308). Chapman. 10.1201/9781032698953
Rust, J. (1994). Structural estimation of Markov decision processes. Handbook of Econometrics, 4, 3081–3143.
Kristensen, D., Mogensen, P. K., Moon, J. M., & Schjerning, B. (2021). Solving dynamic discrete choice models using smoothing and sieve methods. Journal of Econometrics, 223(2), 328–360.
Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
Lu, J., Luo, Y., Saito, K., & Xin, Y. (2024). Did Harold Zuercher Have Time-Separable Preferences?
Hansen, L. P., & Scheinkman, J. A. (2012). Recursive utility in a Markov environment with stochastic growth. Proceedings of the National Academy of Sciences, 109(30), 11967–11972.
Borovička, J., & Stachurski, J. (2020). Necessary and sufficient conditions for existence and uniqueness of recursive utilities. The Journal of Finance.
Stachurski, J., & Zhang, J. (2021). Dynamic programming with state-dependent discounting. Journal of Economic Theory, 192, 105190.