Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

4 ADPs on Banach Space

Many applications of interest have some kind of algebraic structure, for example when the value space is a subset of a vector space. In this chapter, we add algebraic structure to the value space and the policy operators. This structure can be exploited to provide sharp optimality conditions and more explicit results.

When introducing algebraic structure to ADP theory, we work in the setting of Banach lattices. These spaces are attractive for ADP analysis, due to well-integrated order, algebraic and metric properties. (See Section A.5.3.3 for background on Banach lattices. We recall that, given a Banach space EE, B(E)\blop(E) is the set of all bounded linear operators from EE to itself and, for LB(E)L \in \blop(E), the symbol ρ(L)\rho(L) denotes the spectral radius (see Section A.4.3). We also use the notion of positive operators defined in Section A.5.2.2. Loosely speaking, positive operators between Banach lattices are generalizations of nonnegative matrices.)

We begin by developing optimality theory for ADPs on Banach lattices, covering contractions and Blackwell’s condition (Section 4.1.1), order contractions (Section 4.1.2), and concavity-based methods (Section 4.1.3). We then apply the theory to firm valuation, real options, and structural estimation.

4.1ADPs on Banach Space

In this section, EE is always a Banach lattice. For VV contained in EE, an ADP (V,T)(V, \TT) will be called additive if each TσTT_\sigma \in \TT has the form

Tσv=rσ+KσvT_\sigma \, v = r_\sigma + K_\sigma \, v

where rσEr_\sigma \in E and KσK_\sigma is an order-preserving self-map on VV.

We study contractions and Blackwell’s condition in Section 4.1.1, order contractions in Section 4.1.2, and concavity-based methods in Section 4.1.3.

4.1.1Contractions

In Section 4.1.1.1 we review eventually contracting maps and Blackwell’s condition. In Section 4.1.1.2 we use these tools to obtain optimality results for ADPs satisfying a Blackwell-type discounting condition.

4.1.1.1Contractions in Banach Space

Let E=(E,,)E = (E, \|\cdot\|, \leq) be a Banach lattice with positive cone E+E_+. Let SS be a self-map on a subset VV of EE. Recall from Section A.2.2.2 that SS is said to be eventually contracting on VV if SnS^n is a contraction for some nNn \in \NN; that is, there exists a λ[0,1)\lambda \in [0, 1) and an nNn \in \NN such that

SnuSnvλuvfor allu,vV,\|S^n u - S^n v \| \leq \lambda \|u - v\| \quad \text{for all} \quad u, v \in V,

The following result is an obvious consequence of Theorem A.2.8.

There is a well-known technique for testing for contractivity (or eventual contractivity) of order preserving maps via Blackwell’s condition. This technique is often used for dynamic programming (see, e.g., Stokey & Lucas (1989), Theorem 3.3). Here we state an abstract version, for a self-map SS on VEV \subset E, focusing on contractivity in one step. (For eventual contractivity, replace SS with SnS^n.) In the statement, we assume that EE has a normalized order unit. Recalling the definition in Section A.5.3.4, this means that there exists an eE+e \in E_+ obeying e=1\| e\| = 1 and vve|v| \leq \|v\| e for all vEv \in E. Also, we recall that VEV \subset E is called increasing if u,vEu, v \in E with uvu \leq v and uVu \in V implies vVv \in V.

4.1.1.2Optimality for Blackwell ADPs

Throughout this section, EE is a Banach lattice containing a normalized order unit ee and VV is a closed increasing subset of EE. In the statement of the next theorem, V0V_0 is a subset of VV.

A self-map MM on VV will be called a certainty equivalent operator if MM is order preserving and translation invariant; that is,

 M(v+κe)=Mv+κe for all vV and all κR+ .\text{ } M(v + \kappa e) = Mv + \kappa e \text{ for all } v \in V \text{ and all } \kappa \in \RR_+ \text{ }.

(The terminology comes from the abstract theory of certainty equivalents. We cover these objects in more detail in Section 7.2.4.)

The next exercise treats a special case in which the policy operators of an additive ADP (see (4.1)) are built from certainty equivalent operators. Certainty equivalents are treated in more detail in Section 7.2.4. As before, VV is a closed increasing subset of EE and V0V_0 is a subset of VV.

4.1.2Order Contractions

We first define order contractions and establish fixed point results. We then obtain optimality conditions for order contracting ADPs and specialize to the case of affine policy operators in Section 4.1.2.3.

4.1.2.1Order Contractions

Let E=(E,,)E = (E, \|\cdot\|, \leq) be a Banach lattice with positive cone E+E_+. We call D ⁣:E+E+D \colon E_+ \to E_+ a discount operator on EE if

  1. D0=0D0 = 0,

  2. DD is order-preserving, and

  3. DD is eventually contracting.

The reason we call such an operator DD a discount operator will become clearer below. Intuitively, DD is eventually contracting and has a fixed point at zero, so Dnh0D^n h \to 0 for all hE+h \in E_+. If we think of hh as a time-nn payoff and DnhD^n h as its (state-contingent) present value, then the properties of DD imply that the present value is increasing in the payoff (since DD is order-preserving) and the present value converges to zero as the future date of the payoff moves to the infinite future.

Fix VEV \subset E. Let SS be a self-map on VV. We call SS an order contraction of modulus DD on VV if there exists a discount operator DD on EE such that

SvSwDvwfor allv,wV.|S \,v - S \, w| \leq D |v - w| \quad \text{for all} \quad v, w \in V.

Order contracting maps obey the following fixed point result.

The next exercise states a variation on Blackwell’s condition that is suitable for order contractions. To begin, we take VV to be a subset of EE such that

v+hV whenever vV and hE+.v + h \in V \text{ whenever } v \in V \text{ and } h \in E_+.

4.1.2.2Order Contracting ADPs

Next we study ADPs where the value space VV is a subset of a Banach lattice EE. The results in this section extend to ADPs where the operators are order contractions. We do not require existence of a normalized order unit, as in Section 4.1.1.2. This will allow us to handle ADPs that evolve in LpL_p spaces and simplify treatment of features such as state-dependent discounting.

Our first result treats the case where T\TT is finite.

Our second result replaces finiteness with uniform order contractivity. In the statement of the next theorem, V0V_0 is a subset of VV and VV is closed in EE.

The proof is straightforward:

4.1.2.3Order Contractive Linear Models

Next we focus on ADPs operating in Banach lattices where the policy operators are affine. As before, EE is a Banach lattice with positive cone E+E_+, linear operators B(E)\blop(E) and positive linear operators B+(E)\blop_+(E).

Our second result replaces finiteness with a uniform discount operator bound. In the statement, V0V_0 is a subset of VV and VV is closed in EE.

4.1.3Concavity and Convexity

Some dynamic programs involve nonlinear policy operators that fail to be contractions. For example, models with recursive preferences or ambiguity often have these features. In this setting, we can deploy alternative fixed point results related to concavity and convexity of operators. Here we apply such results to obtain optimality conditions for ADPs.

We first state Du’s fixed point theorem for concave and convex operators and then apply it to obtain optimality conditions for ADPs on order intervals.

4.1.3.1Fixed Point Results

Let E=(E,,)E = (E, \|\cdot\|, \leq) be a Banach lattice and suppose that V=[a,b]V = [a, b] for some a,bEa, b \in E. In this setting, we say that a self-map SS on VV satisfies Du’s conditions if either

  1. SS is concave and Saa+ϵ(ba)S \, a \geq a + \epsilon (b-a) for some ϵ(0,1)\epsilon \in (0,1), or

  2. SS is convex and Sbbϵ(ba)S \, b \leq b - \epsilon (b-a) for some ϵ(0,1)\epsilon \in (0,1).

We state a result below connecting Du’s conditions to global stability. Before doing so, we note some useful sufficient conditions that can be used when the positive cone of the Banach lattice has nonempty interior. To state them we write xyx \ll y if yxy - x is interior to E+E_+.

Here is the stability result.

For a proof consult Du (1990) or Zhang (2012), Theorem 2.1.2.

Du’s conditions in one dimension

Figure 4.1:Du’s conditions in one dimension

4.1.3.2Optimality Theory

The following result provides optimality conditions for ADPs where the policy operators satisfy Du’s conditions.

4.2Applications

We apply the theory developed above to three classes of problems. In Section 4.2.1 we study firm valuation under constant discounting, unbounded rewards, and state-dependent discounting. In Section 4.2.2 we analyze a real option problem. In Section 4.2.3 we treat structural estimation models, including state-dependent discounting and non-expected utility preferences.

4.2.1Firm Valuation

Let’s now return to the firm valuation problem from Section 1.1 and discuss optimality results. We will look at the original bounded case with discounting, a second case with unbounded rewards (boundedness of the profit function is replaced by an integrability condition), and a third case involving state-dependent discounting.

4.2.1.1Constant Discounting

In Section 2.3.1, we revisited the firm problem from Section 1.1 and showed that (bX,TFV)(b\Xsf, \TT_{\rm FV}) is an ADP, where TFV{Tσ:σΣ}\TT_{\rm FV} \coloneq \setntn{T_\sigma}{\sigma \in \Sigma} is the set of all policy operators having the form

Tσv=σs+(1σ)(π+βPv).T_\sigma \, v = \sigma s + (1 - \sigma) \left( \pi + \beta Pv \right).

(Here σs\sigma s is understood as the map xsσ(x)x \mapsto s\sigma(x) and so on.) We recall that the policy set Σ\Sigma is all B\bB-measurable functions mapping X\Xsf to {0,1}\{0,1\}, which coincides with the set of indicator functions on B\bB, and that the indicator function

σ=1{sπ+βPv}\sigma = \1\left\{s \geq \pi + \beta Pv \right\}

is vv-greedy. Existence of vv-greedy policies for all vv implies that the firm ADP is regular.

In Theorem 1.1.1 we stated optimality results for the firm valuation problem. In Section 1.1.1.4 we supplied a direct proof. Here’s a more general result and a proof using the theory from this chapter:

As usual, the ADP Bellman operator TT obeys Tv=TσvTv = T_\sigma \, v whenever σ\sigma is vv-greedy, so, using the description of greedy policies above, we have Tv=s(π+βPv)T v = s \vee (\pi + \beta P v). The Bellman equation is therefore v=s(π+βPv)v = s \vee (\pi + \beta P v), which, in expanded form, becomes

v(x)=max{s,  π(x)+βv(x)P(x, ⁣dx)}v(x) = \max \left\{ s, \; \pi(x) + \beta \int v(x') P(x, \diff x') \right\}

Proposition 4.2.1 implies that the value function vσvσ\vmax \coloneq \bigvee_\sigma v_\sigma solves the Bellman equation (4.26), and that it can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are v\vmax-greedy, so σ=1{sπ+βPv}\sigma = \1\{s \geq \pi + \beta P \vmax\} is optimal. This is the only optimal policy under the convention that the manager always sells the firm when indifferent between selling and continuing.

HPI iterates for the firm valuation problem

Figure 4.2:HPI iterates for the firm valuation problem

Figure 4.2 illustrates HPI applied to the firm valuation problem using the same parameters as Figure 1.4. The initial policy is σ00\sigma_0 \equiv 0 (never sell). At each step kk, the lifetime value vσkv_{\sigma_k} is obtained by solving the linear system v=rσk+Kσkvv = r_{\sigma_k} + K_{\sigma_k} v, where rσkr_{\sigma_k} and KσkK_{\sigma_k} are as defined in (4.25). The next policy σk+1\sigma_{k+1} is then set to be vσkv_{\sigma_k}-greedy. In the right panel, the value functions rise monotonically towards v\vmax (shown in black). The left panel shows the corresponding sell thresholds converging to the optimal policy σ\sigopt.

4.2.1.2Unbounded Rewards

Proposition 4.2.1 used the assumptions in Theorem 1.1.1, one of which was that the profit function π\pi is bounded. In practice, many of the functions that we use for applied modeling are unbounded. In Section 1.1.2.2 we sketched ideas for extending the optimality results to an unbounded setting, where π\pi is assumed instead to be integrable. Let’s continue that discussion here.

Instead of boundedness, we assume π\pi lies in L1(ψ)L1(X,B,ψ)L_1(\psi) \coloneq L_1(\Xsf, \bB, \psi) for a distribution ψ\psi on (X,B)(\Xsf, \bB) that is stationary for PP (see Section A.5.4.4). We endow L1(ψ)L_1(\psi) with the almost everywhere pointwise partial order (see Section A.5.2.5). In particular, for f,gL1(ψ)f, g \in L_1(\psi), the statement fgf \leq g means that {xX:f(x)>g(x)}\setntn{x \in \Xsf}{f(x) > g(x)} has ψ\psi-measure zero.

The firm valuation ADP is now (L1(ψ),TFV)(L_1(\psi), \TT_{\rm FV}), where each TσT_\sigma in TFV\TT_{\rm FV} again has the form (4.23), but now with vv and π\pi being elements of L1(ψ)L_1(\psi), while PP is understood as a Markov operator on L1(ψ)L_1(\psi). In Section 1.1.2.2 we showed that each policy operator TσT_\sigma maps L1(ψ)L_1(\psi) into itself. Moreover, TσT_\sigma is order preserving with respect to \leq; if vwv \leq w holds ψ\psi-almost everywhere, then

v(x)P(x, ⁣dx)w(x)P(x, ⁣dx)\int v(x') P(x, \diff x') \leq \int w(x') P(x, \diff x')

for all xXx \in \Xsf, and TσvTσwT_\sigma \, v \leq T_\sigma \, w easily follows. This confirms that (L1(ψ),TFV)(L_1(\psi), \TT_{\rm FV}) is an ADP. The ADP is regular because, given vL1(ψ)v \in L_1(\psi), the indicator in (4.24) is still vv-greedy.

The claims in Proposition 4.2.1 extend to the ADP (L1(ψ),TFV)(L_1(\psi), \TT_{\rm FV}). The proof follows from Theorem 4.1.8. Each TσT_\sigma has the required form Tσv=rσ+KσvT_\sigma \, v = r_\sigma + K_\sigma \, v for rσL1(ψ)r_\sigma \in L_1(\psi) and KσB+(L1(ψ))K_\sigma \in \blop_+(L_1(\psi)); we again set rσr_\sigma and KσK_\sigma as in (4.25). With KβPK \coloneq \beta P, we have KσKK_\sigma \leq K for all σ\sigma and, by Lemma A.5.32 on page , ρ(K)=ρ(βP)=β<1\rho(K) = \rho(\beta P) = \beta < 1. As (L1(ψ),TFV)(L_1(\psi), \TT_{\rm FV}) is regular, the conclusions of Theorem 4.1.8 apply.

4.2.1.3State-Dependent Discounting

We emphasized the importance of time-varying discount rates in Section 1.1.2.1. To incorporate such variation into our model, we set rt=r(Xt)r_t = r(X_t), where rr is a B\bB-measurable function, and then β(x)=1/(1+r(x))\beta(x) = 1/(1+r(x)) for all xXx \in \Xsf. We require that r>1r > -1, so that $\beta

0.Forsimplicity,wealsoassumethat. For simplicity, we also assume that \betaisboundedand,asinrefssscondi,thattheprofitfunction is bounded and, as in {ref}`sss-condi`, that the profit function \piisbounded.Atcurrentstate is bounded. At current state x,afixedpolicy, a fixed policy \sigmayieldslifetimefirmvalue yields lifetime firm value v_\sigma(x),where, where v_\sigma$ satisfies the recursion

vσ(x)=σ(x)s+(1σ(x))[π(x)+β(x)vσ(x)P(x, ⁣dx)](xX).v_\sigma(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta(x) \int v_\sigma(x') P(x, \diff x') \right] \quad (x \in \Xsf).

Equivalently, vσv_\sigma is a fixed point of TσT_\sigma defined at vbXv \in b\Xsf by

Tσv=σs+(1σ)(π+Kv),T_\sigma \, v = \sigma s + (1-\sigma)(\pi + Kv),

where

(Kv)(x)β(x)v(x)P(x, ⁣dx)(vbX,  xX).(Kv)(x) \coloneq \beta(x) \int v(x') P(x, \diff x') \qquad (v \in b\Xsf, \; x \in \Xsf).

The operator KK discounts future cash flows given the dynamics of discounting embedded in β\beta and PP. Since β\beta is bounded, KK is a positive linear operator sending bXb\Xsf into itself. It follows that each TσT_\sigma is an order-preserving self-map on bXb\Xsf. Hence, letting TFV\TT_{\rm FV} be all policy operators of the form (4.29), with σ\sigma ranging over the policy set Σ\Sigma, the pair (bX,TFV)(b\Xsf, \TT_{\rm FV}) is an ADP. It represents the dynamic decision problem for the firm under state-dependent discounting.

Using similar arguments to the constant discounting case, we can confirm that the policy σ=1{sπ+Kv}\sigma = \1\{s \geq \pi + Kv\} is vv-greedy. The ADP Bellman operator TT obeys Tv=TσvTv = T_\sigma \, v whenever σ\sigma is vv-greedy, so, using the description of greedy policies just given, we have $T v = s \vee (\pi

v(x)=max{s,  π(x)+β(x)v(x)P(x, ⁣dx)}v(x) = \max \left\{ s, \; \pi(x) + \beta(x) \int v(x') P(x, \diff x') \right\}

Since greedy policies always exist, the ADP (bX,TFV)(b\Xsf, \TT_{\rm FV}) is regular.

In order to obtain optimality results, we need some degree of stability for the ADP. To this end, we impose the following condition:

Assumption 4.2.1 requires that the discount factor is sufficiently small “on average,” so that lifetime values are finite and uniquely defined.

To illustrate the factors that influence ρ(K)\rho(K), suppose that (Xt)(X_t) follows the AR(1) process Xt+1=μ(1α)+αXt+νZt+1X_{t+1} = \mu(1-\alpha) + \alpha X_t + \nu Z_{t+1}, with ZtZ_t iid standard normal, 0<α<10 < \alpha < 1, and ν>0\nu > 0, discretized via Tauchen’s method, with discount factor β(x)=ex\beta(x) = e^x. Figure 4.3 shows contour plots of ρ(K)\rho(K) as α\alpha and ν\nu vary, for two values of the long-run mean μ\mu. The black line marks the boundary ρ(K)=1\rho(K) = 1; Assumption 4.2.1 holds below and to the left of this line. A more negative μ\mu (lower average discount factor) enlarges the stable region. In both panels, ρ(K)\rho(K) increases with persistence and volatility, reflecting the fact that greater variation in the state process pushes the “average” discount factor upward.

Spectral radius \rho(K) as a function of the AR(1) parameters, with \beta(x) = e^x

Figure 4.3:Spectral radius ρ(K)\rho(K) as a function of the AR(1) parameters, with β(x)=ex\beta(x) = e^x

We can now state the following result for the firm valuation ADP (bX,TFV)(b\Xsf, \TT_{\rm FV}) under state-dependent discounting.

4.2.2A Real Option Problem

Consider a firm that has developed a prototype product and faces a strategic decision: launch now, or continue development? The relative payoffs for these different choices depend on the economic environment, which evolves stochastically over time. This scenario is an example of a real option problem. The firm holds an option to launch, analogous to a call option on a financial asset, and must determine when to exercise it. Waiting has value when it allows the firm to launch under more favorable conditions.

4.2.2.1Set Up

A state process (Xt)(X_t) summarizes payoff-relevant information at time tt, such as demand conditions, competitive pressure, factor prices, and regulatory stance. The state influences the firm through two channels. Development costs c(Xt)c(X_t) rise when input prices increase or skilled labor becomes scarce. Post-launch profitability π(Xt)\pi(X_t) depends on demand strength and competitive conditions at the time of launch. Management strategies must balance these forces, weighing the benefits of launching in different states against the costs of continued development.

We assume that (Xt)t0(X_t)_{t \geq 0} is PP-Markov, where PP is a stochastic kernel on the measurable space (X,B)(\Xsf, \bB). At the start of time tt, the firm pays a current flow cost c(Xt)c(X_t) for development. Then, after observing the new state Xt+1X_{t+1}, management decides whether or not to launch the product. If they decide to launch, the firm receives a state-contingent profit flow (π(Xt+j))j1(\pi(X_{t+j}))_{j \geq 1}, where π ⁣:XR\pi \colon \Xsf \to \RR is a given function. If they decide to wait, then no launch occurs and the process repeats. We impose the following weak conditions.

We will seek solutions to the optimization problem in the space L1(ϕ)L1(X,B,ϕ)L_1(\phi) \coloneq L_1(\Xsf, \bB, \phi). As in Section 4.2.1.2, we endow L1(ϕ)L_1(\phi) with the usual L1L_1 norm and the ϕ\phi-a.e. pointwise order \leq, so that fgf \leq g means ϕ{f>g}=0\phi \{f > g\} = 0.

Let (Qt+j)j1(Q_{t+j})_{j \geq 1} be the expected present value of the profit flow conditional on deciding to launch. This sequence obeys the recursion

Qt+j=π(Xt+j)+β(Xt+j)Et+jQt+j+1(j1).Q_{t + j} = \pi(X_{t+j}) + \beta(X_{t+j}) \EE_{t + j} Q_{t + j + 1} \qquad (j \geq 1).

Exploiting the time homogeneity of the state process and following the same logic that we used to solve the recursion (1.3) on page , we find that Qt+j=q(Xt+j)Q_{t+j} = q(X_{t+j}) for all j1j \geq 1, where qq is the function that solves

q(x)=π(x)+β(x)q(x)P(x, ⁣dx)(xX).q(x) = \pi(x) + \beta(x) \int q(x') P(x, \diff x') \qquad (x \in \Xsf).

Letting KK be the discount operator defined on L1(ϕ)L_1(\phi) via

(Kv)(x)β(x)v(x)P(x, ⁣dx)(vL1(ϕ),  xX).(Kv)(x) \coloneq \beta(x) \int v(x') P(x, \diff x') \qquad (v \in L_1(\phi), \; x \in \Xsf).

we can write the recursion as q=π+Kqq = \pi + K q. Under the ϕ\phi-a.e. pointwise order, KK is order-preserving.

We can confirm that KK maps L1(ϕ)L_1(\phi) to itself by using the fact that, under Assumption 4.2.2, the discount function β\beta obeys βN|\beta| \leq N for some NNN \in \NN. Hence, for vL1(ϕ)v \in L_1(\phi),

(Kv)(x)β(x)v(x)P(x, ⁣dx)Nv(x)P(x, ⁣dx),|(Kv)(x)| \leq |\beta(x)| \int |v(x')| P(x, \diff x') \leq N \int |v(x')| P(x, \diff x'),

and vL1(ϕ)v \in L_1(\phi) implies PvL1(ϕ)P|v| \in L_1(\phi) by Lemma A.5.32.

4.2.2.2Lifetime Values

We derive lifetime values under a spectral radius condition on the discount operator and establish well-posedness of the ADP. Optimality results are given in Section 4.2.2.3.

We impose conditions under which the recursion q=π+Kqq = \pi + K q has a unique solution, implying that the profit function associated with launching the product is well defined.

Under Assumption 4.2.3, the solution to the recursion q=π+Kqq = \pi + K q is

q=(IK)1π.q = (I - K)^{-1} \pi.

With this function in hand, we can write out the lifetime value of policies. A policy here is a Borel measurable map σ\sigma from X\Xsf to {0,1}\{0,1\} with σ(x)=1\sigma(x) = 1 indicating the decision to launch the product at state xx and σ(x)=0\sigma(x) = 0 indicating the decision to continue. As usual, we use Σ\Sigma to represent the set of all policies. If σΣ\sigma \in \Sigma and xXx \in \Xsf, then vσ(x)v_\sigma(x) denotes total firm value under policy σ\sigma, given initial state xx. This function vσv_\sigma obeys the recursion

vσ(x)=c(x)+β(x)[σ(x)q(x)+(1σ(x))vσ(x)]P(x, ⁣dx)(xX).v_\sigma(x) = -c(x) + \beta(x) \int [\sigma(x') q(x') + (1 - \sigma(x')) v_\sigma(x') ] P(x, \diff x') \quad (x \in \Xsf).

Equivalently, vσv_\sigma is a fixed point of TσT_\sigma defined at vL1(ϕ)v \in L_1(\phi) by

Tσv=c+K(σq+(1σ)v)T_\sigma \, v = -c + K (\sigma \, q + (1 - \sigma) v)

Under Assumption 4.2.2 and Assumption 4.2.3, each TσT_\sigma is a self-map on L1(ϕ)L_1(\phi). Since KK is a positive operator, each TσT_\sigma is order preserving. Hence, letting TRO\TT_{\rm RO} be all policy operators of the form (4.38), with σ\sigma ranging over the policy set Σ\Sigma, the pair (L1(ϕ),TRO)(L_1(\phi), \TT_{\rm RO}) is an ADP.

4.2.2.3Optimality

Let’s now turn to optimality. We start by studying greedy policies.

Exercise 4.7 implies that (L1(ϕ),TRO)(L_1(\phi), \TT_{\rm RO}) is regular. From Lemma 2.1.1 we know that the ADP Bellman operator satisfies Tv=TσvTv = T_\sigma \, v whenever σ\sigma is vv-greedy. Using this fact and the greedy policy σ=1{qv}\sigma = \1\{q \geq v\} from Exercise 4.7, we obtain

Tv=Tσv=c+K(σq+(1σ)v)=c+K(qv).T v = T_\sigma \, v = -c + K (\sigma \, q + (1 - \sigma) v) = -c + K (q \vee v).

It follows that the Bellman equation for the ADP is v=c+K(qv)v = -c + K (q \vee v). Rewriting this expression using the definition of KK, we get

v(x)=c(x)+β(x)max{q(x),v(x)}P(x, ⁣dx)v(x) = -c(x) + \beta(x) \int \max \left\{q(x'), v(x') \right\} P(x, \diff x')

We can now state the following result for the real option ADP (L1(ϕ),TRO)(L_1(\phi), \TT_{\rm RO}).

Proposition 4.2.3 implies that the value function v\vmax solves the Bellman equation (4.43), and that v\vmax can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are v\vmax-greedy, which, by Exercise 4.7, translates to setting σ=1{qv}\sigma = \1\{q \geq \vmax\}.

4.2.3Structural Estimation

Structural estimation is a core sub-field of quantitative economics that also plays a significant role in finance, marketing, operations research, and adjacent fields. Under this approach to estimation, researchers model economic agents as if they solve dynamic programs. The econometric challenge is to infer parameters that bring the model outputs (which are typically simulated from solutions to the underlying dynamic programs) as close as possible to the data. Algorithm 4.2.1 gives an outline of the idea, with DP(θ\theta) referring to a given dynamic program using a fixed parameterization indexed by θ\theta.

Typically, DP(θ\theta) needs to be solved many times before convergence. In this section, we set aside the estimation step, where d(Mθ,D)d(\Msf_\theta, \Dsf) is constructed and θ\theta is updated. We focus instead on the step where we compute optimal policy σθ\sigma_\theta by solving DP(θ)(\theta). The types of dynamic programs typically adopted in this field have some interesting characteristics, which motivates our study.

We note that structural estimation is sometimes called dynamic discrete choice because the action space is typically finite. Below we will look at settings where the state space is arbitrary and the action space is finite.

4.2.3.1Post-Action Value Functions

Rust (1987) and many subsequent authors study discrete choice problems with modified Bellman equations that take the form

g(z,a)=maxaA{r(z,e,a)+βg(z,a)}F( ⁣dez)G( ⁣dzz,a).g(z, a) = \int \int \max_{a' \in \Asf} \left\{ r(z', e', a') + \beta g(z', a') \right\} F(\diff e' \given z) G(\diff z' \given z, a).

Here FF and GG are conditional distributions, while ee is, in most cases, a form of unobserved heterogeneity. By taking x=(z,e)x = (z, e) and relabeling, we can write this equation as

g(x,a)=maxaA[r(x,a)+βg(x,a)]P(x,a, ⁣dx)g(x, a) = \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x')

Here (x,a)(x,a) takes values in GX×A\Gsf \coloneq \Xsf \times \Asf. We take X\Xsf to be a metric space. The reward function rRGr \in \RR^\Gsf is assumed to be bounded and Borel measurable, while PP is a stochastic kernel from G\Gsf to X\Xsf. The function gg is usually called a post-action value function, since it returns the value of the state after committing to a given action in the current period.

The Bellman equation (4.45) is nonstandard relative to traditional presentations of dynamic programming due to (a) the reversed order of integration and maximization, and (b) the dependence of gg on both state and action. It also fails to fit into the abstract dynamic programming formulation of Bertsekas (2022), Ren & Stachurski (2021), Toda (2024), etc. At the same time, there are significant advantages of working with this version of the Bellman equation in structural estimation settings (see, e.g., Rust (1994), Kristensen et al. (2021), or Chapter 5 of Sargent & Stachurski (2025)).

Despite its nontraditional format, we can set this problem up as an ADP by taking Σ\Sigma to be the set of Borel measurable maps from X\Xsf to A\Asf and, for each σΣ\sigma \in \Sigma, introducing the policy operator

(T^σg)(x,a)=[r(x,σ(x))+βg(x,σ(x))]P(x,a, ⁣dx).(\hat T_\sigma \, g)(x, a) = \int [ r(x', \sigma(x')) + \beta g(x', \sigma(x')) ] P(x, a, \diff x').

Recalling that G\Gsf is the product space X×A\Xsf \times \Asf, let (bG,)(b\Gsf, \leq) be the set of bounded Borel measurable functions in RG\RR^\Gsf paired with the pointwise partial order. Using boundedness and measurability of rr, it is straightforward to show that each T^σ\hat T_\sigma is an order preserving self map on (bG,)(b\Gsf, \leq). Letting T^SE\hat{\TT}_{\rm SE} be the set of all such T^σ\hat T_\sigma, the pair (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}) forms an ADP.

Any σΣ\sigma \in \Sigma obeying

σ(x)argmaxaA{r(x,a)+βg(x,a)}for all xX\sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta g(x, a)\} \quad \text{for all } x \in \Xsf

is gg-greedy, since, for such a σ\sigma and any τΣ\tau \in \Sigma, we clearly have

T^τg(x,a)maxaA[r(x,a)+βg(x,a)]P(x,a, ⁣dx)=T^σg(x,a)\hat T_\tau \, g(x, a) \leq \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x') = \hat T_\sigma \, g(x, a)

for all (x,a)G(x, a) \in \Gsf.

Does such a σ\sigma necessarily exist? On one hand, A\Asf is finite and nonempty, so the argmax set is nonempty for all xx. On the other hand, we still need to address the following measurability issue:

(To address this problem, we can use a measurable selection theorem, such as Theorem A.3.3. But that theorem has a difficult proof. In the solution to Exercise 4.8, we use a more elementary argument.)

In view of Exercise 4.8, the ADP (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}) is regular.

Given gg, let σ\sigma be a gg-greedy policy. The ADP Bellman operator obeys T^g=T^σg\hat T g = \hat T_\sigma \, g. Using this fact and (4.47), we see that

(T^g)(x,a)=maxaA[r(x,a)+βg(x,a)]P(x,a, ⁣dx)(\hat T g)(x, a) = \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x')

for all (x,a)G(x,a) \in \Gsf. It follows that the ADP Bellman equation g=T^gg = \hat T g is equivalent to (4.45), confirming that our ADP accurately represents the problem we began with. We can now turn to optimality.

4.2.3.2State-Dependent Discounting

Next we consider a modification of the structural estimation model in Section 4.2.3.1 that includes state-dependent discounting. In this setting, the Bellman equation becomes

g(x,a)=xmaxaA[r(x,a)+β(x)g(x,a)]P(x,a,x)g(x, a) = \sum_{x'} \max_{a' \in \Asf} \left[ r(x', a') + \beta(x') g(x', a') \right] P(x, a, x')

where (x,a)GX×A(x,a) \in \Gsf \coloneq \Xsf \times \Asf and A,X\Asf, \Xsf are finite and nonempty. As usual, rr is a reward function on G\Gsf and PP is a transition kernel from G\Gsf to X\Xsf. The discount factor β\beta is allowed to be a function of the state. We let \| \cdot \| be the supremum norm and take Σ\Sigma to be the set of all functions from X\Xsf to A\Asf. The state X\Xsf is taken to be finite to simplify the analysis.

Given σΣ\sigma \in \Sigma, let T^σ\hat T_\sigma be defined at gRGg \in \RR^\Gsf and (x,a)G(x,a) \in \Gsf by

(T^σg)(x,a)=x[r(x,σ(x))+β(x)g(x,σ(x))]P(x,a,x)(\hat T_\sigma \, g) (x, a) = \sum_{x'} \left[ r(x', \sigma(x')) + \beta(x') g(x', \sigma(x')) \right] P(x, a, x')

Let T^SE={T^σ}σΣ\hat{\TT}_{\rm SE} = \{\hat T_\sigma\}_{\sigma \in \Sigma}. Each T^σ\hat T_\sigma is an order-preserving self-map on RG\RR^\Gsf, so (RG,T^SE)(\RR^\Gsf, \hat{\TT}_{\rm SE}) is an ADP. For each gRGg \in \RR^\Gsf, we can construct a gg-greedy policy by taking a σΣ\sigma \in \Sigma such that

σ(x)argmaxaA[r(x,a)+β(x)g(x,a)]for all xX.\sigma(x) \in \argmax_{a \in \Asf} [r(x, a) + \beta(x) g(x, a)] \quad \text{for all } x \in \Xsf.

Since A\Asf is finite and nonempty, such a σ\sigma exists. (We have no measurability issues here because X\Xsf is also finite.) As a result, the ADP (RG,T^SE)(\RR^\Gsf, \hat{\TT}_{\rm SE}) is regular.

Let T^\hat T be the Bellman operator, so that T^gσT^σg\hat T g \coloneq \bigvee_\sigma \hat T_\sigma \, g. When evaluated at a gg-greedy policy σ\sigma, we have T^σg=T^g\hat T_\sigma \, g = \hat T g. Using this equality and (4.56) yields

(T^g)(x,a)=xmaxaA[r(x,a)+β(x)g(x,a)]P(x,a,x).(\hat T g)(x, a) = \sum_{x'} \max_{a' \in \Asf} \left[ r(x', a') + \beta(x') g(x', a') \right] P(x, a, x').

This confirms that solutions to the ADP Bellman equation T^g=g\hat T g = g solve the original Bellman equation (4.54).

For each σΣ\sigma \in \Sigma we set

(Kσg)(x,a)=xβ(x)g(x,σ(x))P(x,a,x)(x,a)G(K_\sigma g)(x, a) = \sum_{x'} \beta(x') g(x', \sigma(x')) P(x, a, x') \qquad (x, a) \in \Gsf

We can now state the following result:

This discounting condition generalizes the traditional assumption that β\beta is constant and strictly less than one, in which case ρ(Kσ)<1\rho(K_\sigma) < 1 always holds.

4.2.3.3Beyond Expected Utility

Some studies find incompatibilities between data and predictions of models that use additively separable preferences and mathematical expectation to evaluate uncertain outcomes (see, e.g., Lu et al. (2024)). To further this line of analysis, we revisit the basic structural estimation model in Section 4.2.3.1 while replacing mathematical expectation with a general certainty equivalent operator.

As in Section 4.2.3.1, spaces of bounded real-valued functions are paired with the pointwise order \leq, and the supremum norm, to be denoted by \| \cdot \|. The state space X\Xsf is a metric space, A\Asf is a finite choice set, GX×A\Gsf \coloneq \Xsf \times \Asf, the reward function r ⁣:GRr \colon \Gsf \to \RR is measurable, and β(0,1)\beta \in (0,1) is a constant discount factor. However, we modify the Bellman equation (4.45) for the post-action value function to

g(x,a)=(MHg)(x,a)where (Hg)(x)maxaA[r(x,a)+βg(x,a)].g(x, a) = (M H g)(x, a) \quad \text{where } (Hg)(x') \coloneq \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right].

Here MM is a certainty equivalent operator mapping bXb\Xsf into bGb\Gsf; that is, MM is order-preserving and M(v+κ1)=Mv+κ1M(v + \kappa \1) = Mv + \kappa \1 for all κR+\kappa \in \RR_+. An example is given below in Section 4.2.3.4.

Let Σ\Sigma be the set of Borel measurable maps from X\Xsf to A\Asf. Given σΣ\sigma \in \Sigma, we set

(T^σg)(x,a)=(MHσg)(x,a)where (Hσg)(x)r(x,σ(x))+βg(x,σ(x)).(\hat T_\sigma \, g)(x, a) = (M H_\sigma \, g)(x, a) \quad \text{where } (H_\sigma \, g)(x') \coloneq r(x', \sigma(x')) + \beta g(x', \sigma(x')).

With T^SE{T^σ}σΣ\hat{\TT}_{\rm SE} \coloneq \{\hat T_\sigma\}_{\sigma \in \Sigma}, the pair (bG,T^SE)(b\Gsf, \hat{\TT}_{\rm SE}) is an ADP and σΣ\sigma \in \Sigma is gg-greedy whenever

σ(x)argmaxaA[r(x,a)+βg(x,a)]for all xX.\sigma(x) \in \argmax_{a' \in \Asf} \left[ r(x, a') + \beta g(x, a') \right] \quad \text{for all } x \in \Xsf.

Since A\Asf is finite and nonempty, such a policy always exists. (A function σ\sigma obeying (4.62) can be chosen to be measurable---see the solution to Exercise 4.8.)

Given gbGg \in b\Gsf, the ADP Bellman operator T^\hat T satisfies T^g=T^σg\hat T g = \hat T_\sigma \, g whenever σ\sigma is gg-greedy. Using this fact and (4.62), we obtain T^g=MHg\hat T g = M H g. Hence any fixed point of T^\hat T solves the original Bellman equation (4.60).

4.2.3.4The Risk-Sensitive Case

As an illustration, suppose that MM is the risk-sensitive certainty equivalent

(Mf)(x,a)1γln{exp[γf(x)]P(x,a, ⁣dx)}((x,a)G),(M f)(x, a) \coloneq - \frac{1}{\gamma} \ln \left\{ \int \exp \left[ -\gamma f(x') \right] P(x, a, \diff x') \right\} \qquad ((x, a) \in \Gsf),

where PP is a stochastic kernel from G\Gsf to X\Xsf and γ\gamma is a nonzero constant. Among other things, Proposition 4.2.6 tells us that σΣ\sigma \in \Sigma is optimal if and only if

σ(x)argmaxaA[r(x,a)+βg(x,a)]for all xX,\sigma(x) \in \argmax_{a' \in \Asf} \left[ r(x, a') + \beta \gmax(x, a') \right] \quad \text{for all } x \in \Xsf,

where g\gmax is the unique solution to the functional equation

g(x,a)=1γln{exp{γmaxaA[r(x,a)+βg(x,a)]}P(x,a, ⁣dx)}g(x, a) = - \frac{1}{\gamma} \ln \left\{ \int \exp \left\{ -\gamma \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] \right\} P(x, a, \diff x') \right\}

in the value space bGb\Gsf.

Alternative certainty equivalents are discussed in Section 7.2.4.

4.3Chapter Notes

This chapter extends the ADP framework of Chapter 2 and Chapter 3 by working within partially ordered metric and topological spaces, which provide a natural setting for combining contraction-based and order-theoretic reasoning. In practice, most of the pospaces we examine are subsets of Banach lattices. Background on Banach lattices and positive operators can be found in the references listed in the chapter notes of Chapter 2. Du’s conditions for concave and convex operators, treated in Section 4.1.3, originate in Du (1990); see also Zhang (2012), Theorem 2.1.2. Certainty equivalent operators, which appear in Section 4.2.3.3, are covered in depth in Section 7.2.4.

The applications in this chapter draw on several fields. The discounting condition in Assumption 4.2.3 is similar to restrictions found in Hansen & Scheinkman (2012) and Borovička & Stachurski (2020). General results on dynamic programming with state-dependent discounting can be found in Stachurski & Zhang (2021). The structural estimation framework in Section 4.2.3 originates with the classic work of Rust (1987); for further background, see Rust (1994), Igami (2020), Kristensen et al. (2021), and Chapter 5 of Sargent & Stachurski (2025). A duality-based perspective on dynamic discrete-choice models is developed in Chiong et al. (2016). The non-expected utility extension in Section 4.2.3.3 is motivated by evidence reviewed in Lu et al. (2024). Nonexpected utility is considered again in Section 7.2.4.

References
  1. Stokey, N. L., & Lucas, R. E. (1989). Recursive methods in dynamic economics. Harvard University Press.
  2. Harrison, J. M., & Kreps, D. M. (1978). Speculative investor behavior in a stock market with heterogeneous expectations. The Quarterly Journal of Economics, 92(2), 323–336.
  3. Du, Y. (1990). Fixed points of increasing operators in ordered Banach spaces and applications. Applicable Analysis, 38(01–02), 1–20.
  4. Zhang, Z. (2012). Variational, topological, and partial order methods with their applications (Vol. 29). Springer.
  5. Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 999–1033.
  6. Bertsekas, D. P. (2022). Abstract dynamic programming (3rd ed.). Athena Scientific.
  7. Ren, G., & Stachurski, J. (2021). Dynamic programming with value convexity. Automatica, 130, 109641.
  8. Toda, A. A. (2024). Essential Mathematics for Economics (1st ed., p. 308). Chapman. 10.1201/9781032698953
  9. Rust, J. (1994). Structural estimation of Markov decision processes. Handbook of Econometrics, 4, 3081–3143.
  10. Kristensen, D., Mogensen, P. K., Moon, J. M., & Schjerning, B. (2021). Solving dynamic discrete choice models using smoothing and sieve methods. Journal of Econometrics, 223(2), 328–360.
  11. Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
  12. Lu, J., Luo, Y., Saito, K., & Xin, Y. (2024). Did Harold Zuercher Have Time-Separable Preferences?
  13. Hansen, L. P., & Scheinkman, J. A. (2012). Recursive utility in a Markov environment with stochastic growth. Proceedings of the National Academy of Sciences, 109(30), 11967–11972.
  14. Borovička, J., & Stachurski, J. (2020). Necessary and sufficient conditions for existence and uniqueness of recursive utilities. The Journal of Finance.
  15. Stachurski, J., & Zhang, J. (2021). Dynamic programming with state-dependent discounting. Journal of Economic Theory, 192, 105190.