ADPs on Banach Space - Dynamic Programming Volume II: General States

Many applications of interest have some kind of algebraic structure, for example when the value space is a subset of a vector space. In this chapter, we add algebraic structure to the value space and the policy operators. This structure can be exploited to provide sharp optimality conditions and more explicit results.

When introducing algebraic structure to ADP theory, we work in the setting of Banach lattices. These spaces are attractive for ADP analysis, due to well-integrated order, algebraic and metric properties. (See Section A.5.3.3 for background on Banach lattices. We recall that, given a Banach space $E$ , $\blop(E)$ is the set of all bounded linear operators from $E$ to itself and, for $L \in \blop(E)$ , the symbol $\rho(L)$ denotes the spectral radius (see Section A.4.3). We also use the notion of positive operators defined in Section A.5.2.2. Loosely speaking, positive operators between Banach lattices are generalizations of nonnegative matrices.)

We begin by developing optimality theory for ADPs on Banach lattices, covering contractions and Blackwell’s condition (Section 4.1.1), order contractions (Section 4.1.2), and concavity-based methods (Section 4.1.3). We then apply the theory to firm valuation, real options, and structural estimation.

4.1 ADPs on Banach Space

In this section, $E$ is always a Banach lattice. For $V$ contained in $E$ , an ADP $(V, \TT)$ will be called additive if each $T_\sigma \in \TT$ has the form

T_\sigma \, v = r_\sigma + K_\sigma \, v

(4.1)

where $r_\sigma \in E$ and $K_\sigma$ is an order-preserving self-map on $V$ .

We study contractions and Blackwell’s condition in Section 4.1.1, order contractions in Section 4.1.2, and concavity-based methods in Section 4.1.3.

4.1.1 Contractions

In Section 4.1.1.1 we review eventually contracting maps and Blackwell’s condition. In Section 4.1.1.2 we use these tools to obtain optimality results for ADPs satisfying a Blackwell-type discounting condition.

4.1.1.1 Contractions in Banach Space

Let $E = (E, \|\cdot\|, \leq)$ be a Banach lattice with positive cone $E_+$. Let $S$ be a self-map on a subset $V$ of $E$. Recall from Section A.2.2.2 that $S$ is said to be eventually contracting on $V$ if $S^n$ is a contraction for some $n \in \NN$; that is, there exist $\lambda \in [0, 1)$ and $n \in \NN$ such that

\|S^n u - S^n v \| \leq \lambda \|u - v\| \quad \text{for all} \quad u, v \in V.

(4.2)

The following result is an obvious consequence of Theorem A.2.8.

Proof

Since $V$ is a closed subset of the Banach space $E$ , it is complete. Since $S$ is eventually contracting on $V$ , there exist $\lambda \in [0, 1)$ and $n \in \NN$ such that $S^n$ is a contraction of modulus $\lambda$ on $V$ . By Theorem A.2.8, $S$ is globally stable on $V$ with unique fixed point $v^*$ . For the convergence rate, fix $v \in V$ and write $m = qn + r$ with $0 \leq r < n$ . Applying Theorem A.2.7 to $S^n$ gives

\|S^m v - v^*\| = \|(S^n)^q (S^r v) - v^*\| \leq \lambda^q \|S^r v - v^*\| = \OO(\beta^m),

where $\beta \coloneq \lambda^{1/n}$ , since $\lambda^q = \beta^{m-r}$ and $\|S^r v - v^*\|$ is bounded over $r \in \{0, \ldots, n-1\}$ . ◻
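To see the convergence claim at work, the following sketch iterates an affine map on $\RR^2$ (with the sup norm) that is eventually contracting but not a contraction in one step. The matrix, the vector, and all function names are illustrative assumptions and do not come from the text.

```python
# Sketch: S v = A v + b on R^2 with the sup norm, where
# A = [[0.5, 2.0], [0.0, 0.5]] and b = (1, 1) are assumed values.
# The sup-norm operator norm of A is 2.5, so S is not a contraction, but
# the norm of A^n equals 0.5**n * (1 + 4n), which is below 1 once n >= 5.

def S(v):
    return (0.5 * v[0] + 2.0 * v[1] + 1.0, 0.5 * v[1] + 1.0)

def sup_dist(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def iterate(v, m):
    """Return S^m v."""
    for _ in range(m):
        v = S(v)
    return v

v_star = (10.0, 2.0)            # the unique solution of v = A v + b
u, w = (0.0, 1.0), (0.0, 0.0)

# Ratio of distances after one step (expansion) and after five steps (contraction)
one_step = sup_dist(S(u), S(w)) / sup_dist(u, w)
five_step = sup_dist(iterate(u, 5), iterate(w, 5)) / sup_dist(u, w)

# Distance to the fixed point after m iterations from the origin
errors = [sup_dist(iterate((0.0, 0.0), m), v_star) for m in (10, 50, 100)]
```

Here `one_step` exceeds one while `five_step` is below one, and the entries of `errors` shrink geometrically, in line with the $\OO(\beta^m)$ rate in the proof.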

There is a well-known technique for testing for contractivity (or eventual contractivity) of order preserving maps via Blackwell’s condition. This technique is often used for dynamic programming (see, e.g., Stokey & Lucas (1989), Theorem 3.3). Here we state an abstract version, for a self-map $S$ on $V \subset E$ , focusing on contractivity in one step. (For eventual contractivity, replace $S$ with $S^n$ .) In the statement, we assume that $E$ has a normalized order unit. Recalling the definition in Section A.5.3.4, this means that there exists an $e \in E_+$ obeying $\| e\| = 1$ and $|v| \leq \|v\| e$ for all $v \in E$ . Also, we recall that $V \subset E$ is called increasing if $u, v \in E$ with $u \leq v$ and $u \in V$ implies $v \in V$ .

Proof

Let $S, V$ have the stated conditions. Fix $v, w \in V$ . We have

Sv = S(w + v - w) \leq S(w + |v - w|) \leq S(w + \|v - w\| \cdot e),

where the inequalities follow from the monotonicity of $S$ and the properties of the order unit $e$ . Applying (4.3) and rearranging gives

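Sv - Sw \leq \lambda \|v - w\| \cdot e.

Reversing the roles of $v$ and $w$ gives $|Sv - Sw| \leq \lambda \|v - w\| \cdot e$. Since the norm of a Banach lattice is monotone in absolute values and $\|e\| = 1$, taking norms yields $\|Sv - Sw\| \leq \lambda \|v - w\|$. ◻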

Solution to Exercise 4.1.1

We apply Lemma 4.1.2 with $E = V = b\Xsf$ , $e = \1$ , and $\lambda = \beta$ . The set $V = b\Xsf$ is increasing and the operator $S$ is order-preserving: if $p \leq q$ pointwise, then $\int [p(x') + g(x')] P_i(x, \diff x') \leq \int [q(x') + g(x')] P_i(x, \diff x')$ for each $i$ , and taking the maximum over $i$ and multiplying by $\beta \geq 0$ preserves the inequality.

For the discounting condition, fix $p \in b\Xsf$ and $\kappa \geq 0$ . Since each $P_i$ is a stochastic kernel, $\int \kappa \, P_i(x, \diff x') = \kappa$ , so

S(p + \kappa\1)(x) = \beta \max_{i \in I} \int [p(x') + \kappa + g(x')] P_i(x, \diff x') = (Sp)(x) + \beta \kappa.

Hence $S(p + \kappa\1) \leq Sp + \beta \kappa \1$ , and Lemma 4.1.2 gives the result.
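The argument above can be checked numerically on a finite state space, where integrals become sums. The sketch below is only an illustration: the discount factor, the function $g$, the two kernels, and the test vectors are assumed values chosen for the example.

```python
# Finite-state sketch of (S p)(x) = beta * max_i sum_x' [p(x') + g(x')] P_i(x, x').
# BETA, G and the two stochastic matrices in KERNELS are illustrative assumptions.

BETA = 0.9
G = [1.0, 0.0, 2.0]
KERNELS = [
    [[0.5, 0.5, 0.0], [0.1, 0.8, 0.1], [0.0, 0.3, 0.7]],   # P_1
    [[1.0, 0.0, 0.0], [0.3, 0.3, 0.4], [0.2, 0.2, 0.6]],   # P_2
]

def S(p):
    n = len(p)
    return [
        BETA * max(sum((p[y] + G[y]) * P[x][y] for y in range(n)) for P in KERNELS)
        for x in range(n)
    ]

def sup_norm(v):
    return max(abs(a) for a in v)

p = [0.0, 5.0, -1.0]
q = [2.0, -3.0, 4.0]
kappa = 1.5

# Discounting condition: S(p + kappa 1) = S p + beta kappa 1
shifted = S([a + kappa for a in p])
discount_gap = sup_norm([s - (a + BETA * kappa) for s, a in zip(shifted, S(p))])

# Resulting contraction bound: ||S p - S q|| <= beta ||p - q||
lhs = sup_norm([a - b for a, b in zip(S(p), S(q))])
rhs = BETA * sup_norm([a - b for a, b in zip(p, q)])
```

Up to floating point error, `discount_gap` is zero and `lhs` does not exceed `rhs`.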

4.1.1.2 Optimality for Blackwell ADPs

Throughout this section, $E$ is a Banach lattice containing a normalized order unit $e$ and $V$ is a closed increasing subset of $E$ . In the statement of the next theorem, $V_0$ is a subset of $V$ .

A self-map $M$ on $V$ will be called a certainty equivalent operator if $M$ is order preserving and translation invariant; that is,

M(v + \kappa e) = Mv + \kappa e \quad \text{for all } v \in V \text{ and all } \kappa \in \RR_+.

(The terminology comes from the abstract theory of certainty equivalents. We cover these objects in more detail in Section 7.2.4.)
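One familiar operator with these two properties is the entropic (risk-sensitive) certainty equivalent $(Mv)(x) = \theta^{-1} \ln \sum_{x'} P(x, x') \exp(\theta v(x'))$ on a finite state space, with $e = \1$. The sketch below checks translation invariance and monotonicity at a few points; the kernel, the parameter $\theta$, and the test vectors are assumptions made for illustration only.

```python
import math

THETA = -0.5                      # nonzero risk-sensitivity parameter (assumed value)
P = [[0.7, 0.3], [0.4, 0.6]]      # assumed stochastic matrix

def M(v):
    """Entropic certainty equivalent: (Mv)(x) = (1/theta) log sum_x' P(x,x') exp(theta v(x'))."""
    n = len(v)
    return [
        math.log(sum(P[x][y] * math.exp(THETA * v[y]) for y in range(n))) / THETA
        for x in range(n)
    ]

v = [1.0, 3.0]
w = [1.5, 3.0]                    # w >= v pointwise
kappa = 2.0

# Translation invariance: M(v + kappa 1) = M v + kappa 1
shift_gap = max(abs(a - (b + kappa)) for a, b in zip(M([x + kappa for x in v]), M(v)))

# Order preservation: v <= w implies M v <= M w
monotone = all(a <= b for a, b in zip(M(v), M(w)))
```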

The next exercise treats a special case in which the policy operators of an additive ADP (see (4.1)) are built from certainty equivalent operators. As before, $V$ is a closed increasing subset of $E$ and $V_0$ is a subset of $V$.

4.1.2 Order Contractions

We first define order contractions and establish fixed point results. We then obtain optimality conditions for order contracting ADPs and specialize to the case of affine policy operators in Section 4.1.2.3.

4.1.2.1 Order Contractions

Let $E = (E, \|\cdot\|, \leq)$ be a Banach lattice with positive cone $E_+$ . We call $D \colon E_+ \to E_+$ a discount operator on $E$ if

$D0 = 0$ ,
$D$ is order-preserving, and
$D$ is eventually contracting.

The reason we call such an operator $D$ a discount operator will become clearer below. Intuitively, $D$ is eventually contracting and has a fixed point at zero, so $D^n h \to 0$ for all $h \in E_+$ . If we think of $h$ as a time- $n$ payoff and $D^n h$ as its (state-contingent) present value, then the properties of $D$ imply that the present value is increasing in the payoff (since $D$ is order-preserving) and the present value converges to zero as the future date of the payoff moves to the infinite future.

Fix $V \subset E$ . Let $S$ be a self-map on $V$ . We call $S$ an order contraction of modulus $D$ on $V$ if there exists a discount operator $D$ on $E$ such that

|S \,v - S \, w| \leq D |v - w| \quad \text{for all} \quad v, w \in V.

(4.6)

Order contracting maps obey the following fixed point result.

Proof

\| S^n v - S^n w \| \leq \| D^n |v - w |\| \leq \lambda \| v - w \|.

Hence $S^n$ is a contraction and the claims follow from Theorem 4.1.1. ◻

The next exercise states a variation on Blackwell’s condition that is suitable for order contractions. To begin, we take $V$ to be a subset of $E$ such that

v + h \in V \text{ whenever } v \in V \text{ and } h \in E_+.

4.1.2.2 Order Contracting ADPs

Next we study ADPs where the value space $V$ is a subset of a Banach lattice $E$ . The results in this section extend to ADPs where the operators are order contractions. We do not require existence of a normalized order unit, as in Section 4.1.1.2. This will allow us to handle ADPs that evolve in $L_p$ spaces and simplify treatment of features such as state-dependent discounting.

Our first result treats the case where $\TT$ is finite.

Our second result replaces finiteness with uniform order contractivity. In the statement of the next theorem, $V_0$ is a subset of $V$ and $V$ is closed in $E$ .

The proof is straightforward:

Proof

Let $(V, \TT)$ have the stated properties. Each $T_\sigma$ is order contracting with respect to $D$ on the closed set $V$ and therefore globally stable (by Theorem 4.1.4). Hence the ADP is globally stable, and therefore order stable (Lemma 3.1.1). Regarding the Bellman operator, observe that, for $v, w \in V_0$ ,

T_\sigma \, v = T_\sigma \, w + T_\sigma \, v - T_\sigma \, w \leq T_\sigma \, w + |T_\sigma \, v - T_\sigma \, w| \leq T w + D | v - w|,

(4.9)

where the last step uses $T_\sigma \leq T$ on $V_0$ and the order contraction bound. Taking the supremum over $\sigma$ gives $Tv - Tw \leq D|v-w|$ . Reversing the roles of $v$ and $w$ yields $|Tv - Tw| \leq D|v-w|$ . Hence $T$ is order contracting of modulus $D$ on $V_0$ . Since $V_0$ is closed in $V$ and $V$ is closed in $E$ , the set $V_0$ is closed in $E$ . Applying Theorem 4.1.4 to $T$ on $V_0$ , we see that $T$ has a unique fixed point in $V_0 \subset V_G$ . Hence, by Corollary 2.1.6, the fundamental optimality properties hold. Because, under those properties, $\vmax$ is the unique fixed point of $T$ in $V_G$ , and because $T$ has a fixed point in $V_0 \subset V_G$ , we see that $\vmax \in V_0$ . This proves claims (i)--(ii). For (iii), Theorem 4.1.4 applied to $T$ on $V_0$ also gives $\|T^m v - \vmax\| = \OO(\beta^m)$ for each $v \in V_0$ , which is geometric convergence of VFI. Convergence of OPI and HPI under regularity follows from Theorem 3.1.2. ◻

4.1.2.3 Order Contractive Linear Models

Next we focus on ADPs operating in Banach lattices where the policy operators are affine. As before, $E$ is a Banach lattice with positive cone $E_+$ , linear operators $\blop(E)$ and positive linear operators $\blop_+(E)$ .

Our second result replaces finiteness with a uniform discount operator bound. In the statement, $V_0$ is a subset of $V$ and $V$ is closed in $E$ .

Theorem 4.1.8

Let $(V, \TT)$ be an additive ADP where each $T_\sigma \in \TT$ has the form

T_\sigma \, v = r_\sigma + K_\sigma \, v \quad \text{for some } r_\sigma \in E \text{ and } K_\sigma \in \blop_+(E).

Suppose there exists a discount operator $D$ on $E$ such that $K_\sigma \leq D$ on $E_+$ for all $\sigma \in \Sigma$ . If, in addition, $(V, \TT)$ is semi-regular on $V_0$ and $V_0$ is closed in $V$ , then

(i) the fundamental optimality properties hold,
(ii) the value function $\vmax$ lies in $V_0$, and
(iii) VFI converges geometrically on $V_0$.

If $(V, \TT)$ is also regular, then OPI and HPI converge.

4.1.3 Concavity and Convexity

Some dynamic programs involve nonlinear policy operators that fail to be contractions. For example, models with recursive preferences or ambiguity often have these features. In this setting, we can deploy alternative fixed point results related to concavity and convexity of operators. Here we apply such results to obtain optimality conditions for ADPs.

We first state Du’s fixed point theorem for concave and convex operators and then apply it to obtain optimality conditions for ADPs on order intervals.

4.1.3.1 Fixed Point Results

Let $E = (E, \|\cdot\|, \leq)$ be a Banach lattice and suppose that $V = [a, b]$ for some $a, b \in E$ . In this setting, we say that a self-map $S$ on $V$ satisfies Du’s conditions if either

$S$ is concave and $S \, a \geq a + \epsilon (b-a)$ for some $\epsilon \in (0,1)$ , or
$S$ is convex and $S \, b \leq b - \epsilon (b-a)$ for some $\epsilon \in (0,1)$ .

We state a result below connecting Du’s conditions to global stability. Before doing so, we note some useful sufficient conditions that can be used when the positive cone of the Banach lattice has nonempty interior. To state them we write $x \ll y$ if $y - x$ is interior to $E_+$ .

Here is the stability result.

For a proof consult Du (1990) or Zhang (2012), Theorem 2.1.2.
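A one-dimensional example helps fix ideas. Take $E = \RR$, $V = [0, 4]$ and $S v = 1 + \sqrt{v}$, which is increasing and concave and satisfies $S a = 1 \geq a + \epsilon(b - a)$ with $\epsilon = 1/4$. The map is not a contraction on $V$, since its slope is unbounded near zero. The sketch below, with all names and values chosen for illustration, iterates $S$ from both endpoints.

```python
import math

A, B = 0.0, 4.0                  # order interval V = [a, b] in E = R (assumed example)
EPS = 0.25

def S(v):
    """Increasing, concave self-map on [0, 4]; its slope is unbounded near 0."""
    return 1.0 + math.sqrt(v)

# Concave case of Du's conditions: S a >= a + eps (b - a)
du_condition = S(A) >= A + EPS * (B - A)

# S maps [a, b] into itself because it is increasing with S a >= a and S b <= b
self_map = A <= S(A) and S(B) <= B

def iterate(v, m=200):
    for _ in range(m):
        v = S(v)
    return v

v_star = ((1.0 + math.sqrt(5.0)) / 2.0) ** 2     # solves v = 1 + sqrt(v)
from_a, from_b = iterate(A), iterate(B)
```

Both `from_a` and `from_b` agree with `v_star` up to floating point error, which is the behavior that global stability predicts.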

Figure 4.1: Du’s conditions in one dimension

4.1.3.2 Optimality Theory

The following result provides optimality conditions for ADPs where the policy operators satisfy Du’s conditions.

4.2 Applications

We apply the theory developed above to three classes of problems. In Section 4.2.1 we study firm valuation under constant discounting, unbounded rewards, and state-dependent discounting. In Section 4.2.2 we analyze a real option problem. In Section 4.2.3 we treat structural estimation models, including state-dependent discounting and non-expected utility preferences.

4.2.1 Firm Valuation

Let’s now return to the firm valuation problem from Section 1.1 and discuss optimality results. We will look at the original bounded case with discounting, a second case with unbounded rewards (boundedness of the profit function is replaced by an integrability condition), and a third case involving state-dependent discounting.

4.2.1.1 Constant Discounting

In Section 2.3.1, we revisited the firm problem from Section 1.1 and showed that $(b\Xsf, \TT_{\rm FV})$ is an ADP, where $\TT_{\rm FV} \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ is the set of all policy operators having the form

T_\sigma \, v = \sigma s + (1 - \sigma) \left( \pi + \beta Pv \right).

(4.10)

(Here $\sigma s$ is understood as the map $x \mapsto s\sigma(x)$ and so on.) We recall that the policy set $\Sigma$ is all $\bB$-measurable functions mapping $\Xsf$ to $\{0,1\}$, which coincides with the set of indicator functions of sets in $\bB$, and that the indicator function

\sigma = \1\left\{s \geq \pi + \beta Pv \right\}

(4.11)

is $v$ -greedy. Existence of $v$ -greedy policies for all $v$ implies that the firm ADP is regular.

In Theorem 1.1.1 we stated optimality results for the firm valuation problem. In Section 1.1.1.4 we supplied a direct proof. Here’s a more general result and a proof using the theory from this chapter:

Proof

We apply Theorem 4.1.3 with $e = \1$ and $\lambda = \beta$ . Writing $T_\sigma \, v = r_\sigma + K_\sigma \, v$ where

r_\sigma \coloneq \sigma s + (1 - \sigma) \pi \quad \text{and} \quad K_\sigma \coloneq (1-\sigma) \beta P,

(4.12)

and using $P \1 = \1$ , we have $T_\sigma(v + \kappa \1) = T_\sigma \, v + (1-\sigma) \beta \kappa \1 \leq T_\sigma \, v + \beta \kappa \1$ for all $v \in b\Xsf$ and $\kappa \in \RR_+$ . As $(b\Xsf, \TT_{\rm FV})$ is regular, the conclusions of Theorem 4.1.3 apply. ◻

As usual, the ADP Bellman operator $T$ obeys $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$ -greedy, so, using the description of greedy policies above, we have $T v = s \vee (\pi + \beta P v)$ . The Bellman equation is therefore $v = s \vee (\pi + \beta P v)$ , which, in expanded form, becomes

v(x) = \max \left\{ s, \; \pi(x) + \beta \int v(x') P(x, \diff x') \right\}

(4.13)

Proposition 4.2.1 implies that the value function $\vmax \coloneq \bigvee_\sigma v_\sigma$ solves the Bellman equation (4.13), and that it can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are $\vmax$ -greedy, so $\sigma = \1\{s \geq \pi + \beta P \vmax\}$ is optimal. This is the only optimal policy under the convention that the manager always sells the firm when indifferent between selling and continuing.
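On a finite state space the Bellman equation (4.13) can be solved by successive approximation. The following is a minimal sketch of VFI under assumed parameters: the three-state kernel, the profit vector, the scrap value, and the function names are all illustrative and are not the parameters used for the figures in the text.

```python
BETA, SCRAP = 0.95, 12.0                     # discount factor and scrap value s (assumed)
PROFIT = [0.2, 0.6, 1.0]                     # pi(x) on a three-point state space (assumed)
P = [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]

def continuation(v):
    """Continuation value pi + beta P v."""
    n = len(v)
    return [PROFIT[x] + BETA * sum(P[x][y] * v[y] for y in range(n)) for x in range(n)]

def T(v):
    """Bellman operator: pointwise maximum of s and pi + beta P v."""
    return [max(SCRAP, c) for c in continuation(v)]

def vfi(v, tol=1e-10, max_iter=10_000):
    """Iterate T until successive iterates differ by less than tol in the sup norm."""
    for _ in range(max_iter):
        new_v = T(v)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("VFI did not converge")

v_max = vfi([0.0] * len(PROFIT))

# A v_max-greedy policy, selling when indifferent
sigma_opt = [int(SCRAP >= c) for c in continuation(v_max)]
```

Because $T$ is a contraction of modulus $\beta$ in the sup norm, the stopping rule bounds the Bellman residual of the returned vector by roughly `BETA * tol`.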

Figure 4.2: HPI iterates for the firm valuation problem

Figure 4.2 illustrates HPI applied to the firm valuation problem using the same parameters as Figure 1.4. The initial policy is $\sigma_0 \equiv 0$ (never sell). At each step $k$ , the lifetime value $v_{\sigma_k}$ is obtained by solving the linear system $v = r_{\sigma_k} + K_{\sigma_k} v$ , where $r_{\sigma_k}$ and $K_{\sigma_k}$ are as defined in (4.12). The next policy $\sigma_{k+1}$ is then set to be $v_{\sigma_k}$ -greedy. In the right panel, the value functions rise monotonically towards $\vmax$ (shown in black). The left panel shows the corresponding sell thresholds converging to the optimal policy $\sigopt$ .
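The HPI loop just described can be sketched in a few lines on a finite state space. The parameters below are assumed for illustration and differ from those behind Figure 4.2. To keep the example free of linear algebra libraries, policy evaluation is done by iterating $T_\sigma$ to convergence rather than by solving the linear system directly; this is a substitution made for the sketch, not the method used in the text.

```python
BETA, SCRAP = 0.95, 12.0                     # assumed discount factor and scrap value
PROFIT = [0.2, 0.6, 1.0]                     # assumed profit function pi
P = [[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
N = len(PROFIT)

def continuation(v):
    return [PROFIT[x] + BETA * sum(P[x][y] * v[y] for y in range(N)) for x in range(N)]

def T_sigma(sigma, v):
    """Policy operator (4.10): receive s where sigma = 1, continue where sigma = 0."""
    return [SCRAP if sigma[x] else c for x, c in enumerate(continuation(v))]

def lifetime_value(sigma, tol=1e-12, max_iter=100_000):
    """Approximate the fixed point of T_sigma by iteration (stand-in for a linear solve)."""
    v = [0.0] * N
    for _ in range(max_iter):
        new_v = T_sigma(sigma, v)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("policy evaluation did not converge")

def greedy(v):
    """v-greedy policy (4.11), selling when indifferent."""
    return [int(SCRAP >= c) for c in continuation(v)]

def hpi(sigma, max_steps=50):
    """Alternate policy evaluation and greedy improvement until the policy repeats."""
    values = []
    for _ in range(max_steps):
        v = lifetime_value(sigma)
        values.append(v)
        new_sigma = greedy(v)
        if new_sigma == sigma:
            return sigma, values
        sigma = new_sigma
    raise RuntimeError("HPI did not terminate")

sigma_opt, values = hpi([0] * N)             # start from "never sell"
```

The list `values` records the lifetime values $v_{\sigma_k}$, which should increase pointwise from one step to the next, up to the evaluation tolerance.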

4.2.1.2 Unbounded Rewards

Proposition 4.2.1 used the assumptions in Theorem 1.1.1, one of which was that the profit function $\pi$ is bounded. In practice, many of the functions that we use for applied modeling are unbounded. In Section 1.1.2.2 we sketched ideas for extending the optimality results to an unbounded setting, where $\pi$ is assumed instead to be integrable. Let’s continue that discussion here.

Instead of boundedness, we assume $\pi$ lies in $L_1(\psi) \coloneq L_1(\Xsf, \bB, \psi)$ for a distribution $\psi$ on $(\Xsf, \bB)$ that is stationary for $P$ (see Section A.5.4.4). We endow $L_1(\psi)$ with the almost everywhere pointwise partial order (see Section A.5.2.5). In particular, for $f, g \in L_1(\psi)$ , the statement $f \leq g$ means that $\setntn{x \in \Xsf}{f(x) > g(x)}$ has $\psi$ -measure zero.

The firm valuation ADP is now $(L_1(\psi), \TT_{\rm FV})$ , where each $T_\sigma$ in $\TT_{\rm FV}$ again has the form (4.10), but now with $v$ and $\pi$ being elements of $L_1(\psi)$ , while $P$ is understood as a Markov operator on $L_1(\psi)$ . In Section 1.1.2.2 we showed that each policy operator $T_\sigma$ maps $L_1(\psi)$ into itself. Moreover, $T_\sigma$ is order preserving with respect to $\leq$ ; if $v \leq w$ holds $\psi$ -almost everywhere, then

\int v(x') P(x, \diff x') \leq \int w(x') P(x, \diff x')

for $\psi$-almost all $x \in \Xsf$ (stationarity of $\psi$ gives $P(x, N) = 0$ for $\psi$-almost all $x$ whenever $\psi(N) = 0$), and $T_\sigma \, v \leq T_\sigma \, w$ easily follows. This confirms that $(L_1(\psi), \TT_{\rm FV})$ is an ADP. The ADP is regular because, given $v \in L_1(\psi)$, the indicator in (4.11) is still $v$-greedy.

The claims in Proposition 4.2.1 extend to the ADP $(L_1(\psi), \TT_{\rm FV})$ . The proof follows from Theorem 4.1.8. Each $T_\sigma$ has the required form $T_\sigma \, v = r_\sigma + K_\sigma \, v$ for $r_\sigma \in L_1(\psi)$ and $K_\sigma \in \blop_+(L_1(\psi))$ ; we again set $r_\sigma$ and $K_\sigma$ as in (4.12). With $K \coloneq \beta P$ , we have $K_\sigma \leq K$ for all $\sigma$ and, by Lemma A.5.32, $\rho(K) = \rho(\beta P) = \beta < 1$ . As $(L_1(\psi), \TT_{\rm FV})$ is regular, the conclusions of Theorem 4.1.8 apply.

4.2.1.3 State-Dependent Discounting

We emphasized the importance of time-varying discount rates in Section 1.1.2.1. To incorporate such variation into our model, we set $r_t = r(X_t)$ , where $r$ is a $\bB$ -measurable function, and then $\beta(x) = 1/(1+r(x))$ for all $x \in \Xsf$ . We require that $r > -1$ , so that $\beta > 0$ . For simplicity, we also assume that $\beta$ is bounded and, as in Section 4.2.1.1, that the profit function $\pi$ is bounded. At current state $x$ , a fixed policy $\sigma$ yields lifetime firm value $v_\sigma(x)$ , where $v_\sigma$ satisfies the recursion

v_\sigma(x) = \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta(x) \int v_\sigma(x') P(x, \diff x') \right] \quad (x \in \Xsf).

Equivalently, $v_\sigma$ is a fixed point of $T_\sigma$ defined at $v \in b\Xsf$ by

T_\sigma \, v = \sigma s + (1-\sigma)(\pi + Kv),

(4.14)

where

(Kv)(x) \coloneq \beta(x) \int v(x') P(x, \diff x') \qquad (v \in b\Xsf, \; x \in \Xsf).

The operator $K$ discounts future cash flows given the dynamics of discounting embedded in $\beta$ and $P$ . Since $\beta$ is bounded, $K$ is a positive linear operator sending $b\Xsf$ into itself. It follows that each $T_\sigma$ is an order-preserving self-map on $b\Xsf$ . Hence, letting $\TT_{\rm FV}$ be all policy operators of the form (4.14), with $\sigma$ ranging over the policy set $\Sigma$ , the pair $(b\Xsf, \TT_{\rm FV})$ is an ADP. It represents the dynamic decision problem for the firm under state-dependent discounting.

Using similar arguments to the constant discounting case, we can confirm that the policy $\sigma = \1\{s \geq \pi + Kv\}$ is $v$ -greedy. The ADP Bellman operator $T$ obeys $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$ -greedy, so, using the description of greedy policies just given, we have $T v = s \vee (\pi + K v)$ . The Bellman equation is therefore $v = s \vee (\pi + K v)$ , which, in expanded form, becomes

v(x) = \max \left\{ s, \; \pi(x) + \beta(x) \int v(x') P(x, \diff x') \right\}

(4.15)

Since greedy policies always exist, the ADP $(b\Xsf, \TT_{\rm FV})$ is regular.

In order to obtain optimality results, we need some degree of stability for the ADP. To this end, we impose the following condition:

Assumption 4.2.1 requires that the discount factor is sufficiently small “on average,” so that lifetime values are finite and uniquely defined.

To illustrate the factors that influence $\rho(K)$ , suppose that $(X_t)$ follows the AR(1) process $X_{t+1} = \mu(1-\alpha) + \alpha X_t + \nu Z_{t+1}$ , with $Z_t$ IID standard normal, $0 < \alpha < 1$ , and $\nu > 0$ , discretized via Tauchen’s method, with discount factor $\beta(x) = e^x$ . Figure 4.3 shows contour plots of $\rho(K)$ as $\alpha$ and $\nu$ vary, for two values of the long-run mean $\mu$ . The black line marks the boundary $\rho(K) = 1$ ; Assumption 4.2.1 holds below and to the left of this line. A more negative $\mu$ (lower average discount factor) enlarges the stable region. In both panels, $\rho(K)$ increases with persistence and volatility, reflecting the fact that greater variation in the state process pushes the “average” discount factor upward.

Figure 4.3: Spectral radius $\rho(K)$ as a function of the AR(1) parameters, with $\beta(x) = e^x$
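A calculation of this kind can be sketched as follows. The code discretizes the AR(1) process with a basic version of Tauchen's method, forms $K$ with $\beta(x) = e^x$, and approximates $\rho(K)$ by power iteration. The grid size, grid width, parameter values, and function names are assumptions made for the sketch, and power iteration is used here only because the discretized $K$ is nonnegative and assumed to be primitive; this is not necessarily how the figure was produced.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tauchen(n, mu, alpha, nu, m=3.0):
    """Discretize X' = mu(1 - alpha) + alpha X + nu Z onto an n-point grid."""
    std_x = nu / math.sqrt(1.0 - alpha**2)         # stationary standard deviation
    lo, hi = mu - m * std_x, mu + m * std_x
    d = (hi - lo) / (n - 1)
    grid = [lo + i * d for i in range(n)]
    P = []
    for x in grid:
        mean = mu * (1.0 - alpha) + alpha * x
        row = []
        for j, y in enumerate(grid):
            upper = 1.0 if j == n - 1 else norm_cdf((y + d / 2 - mean) / nu)
            lower = 0.0 if j == 0 else norm_cdf((y - d / 2 - mean) / nu)
            row.append(upper - lower)
        P.append(row)
    return grid, P

def spectral_radius(K, iters=2000):
    """Power iteration from a positive vector; assumes K is nonnegative and primitive."""
    n = len(K)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(w)
        v = [wi / rho for wi in w]
    return rho

def rho_K(mu, alpha, nu, n=15):
    """Spectral radius of (K v)(x) = beta(x) sum_x' P(x, x') v(x') with beta(x) = exp(x)."""
    grid, P = tauchen(n, mu, alpha, nu)
    K = [[math.exp(x) * p for p in row] for x, row in zip(grid, P)]
    return spectral_radius(K)

rho = rho_K(mu=-0.05, alpha=0.8, nu=0.05)
```

Since the rows of $P$ sum to one, the row sums of $K$ are the values $\beta(x)$, so the computed radius always lies between the smallest and largest discount factor on the grid.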

We can now state the following result for the firm valuation ADP $(b\Xsf, \TT_{\rm FV})$ under state-dependent discounting.

Solution to Exercise 4.2.1

We apply Theorem 4.1.8. Each $T_\sigma$ has the form $T_\sigma \, v = r_\sigma + K_\sigma \, v$ for $r_\sigma \in b\Xsf$ and $K_\sigma \in \blop_+(b\Xsf)$ ; we set $r_\sigma \coloneq \sigma s + (1 - \sigma) \pi$ and $K_\sigma \coloneq (1-\sigma) K$ , where $(1-\sigma) K$ is the operator defined by $((1-\sigma) K v)(x) = (1-\sigma(x)) (Kv)(x)$ . Since $0 \leq K_\sigma \leq K$ for all $\sigma$ and $\rho(K) < 1$ by Assumption 4.2.1, the conditions of Theorem 4.1.8 hold. As $(b\Xsf, \TT_{\rm FV})$ is regular, the conclusions of the theorem apply.

4.2.2 A Real Option Problem

Consider a firm that has developed a prototype product and faces a strategic decision: launch now, or continue development? The relative payoffs for these different choices depend on the economic environment, which evolves stochastically over time. This scenario is an example of a real option problem. The firm holds an option to launch, analogous to a call option on a financial asset, and must determine when to exercise it. Waiting has value when it allows the firm to launch under more favorable conditions.

4.2.2.1 Set Up

A state process $(X_t)$ summarizes payoff-relevant information at time $t$ , such as demand conditions, competitive pressure, factor prices, and regulatory stance. The state influences the firm through two channels. Development costs $c(X_t)$ rise when input prices increase or skilled labor becomes scarce. Post-launch profitability $\pi(X_t)$ depends on demand strength and competitive conditions at the time of launch. Management strategies must balance these forces, weighing the benefits of launching in different states against the costs of continued development.

We assume that $(X_t)_{t \geq 0}$ is $P$ -Markov, where $P$ is a stochastic kernel on the measurable space $(\Xsf, \bB)$ . At the start of time $t$ , the firm pays a current flow cost $c(X_t)$ for development. Then, after observing the new state $X_{t+1}$ , management decides whether or not to launch the product. If they decide to launch, the firm receives a state-contingent profit flow $(\pi(X_{t+j}))_{j \geq 1}$ , where $\pi \colon \Xsf \to \RR$ is a given function. If they decide to wait, then no launch occurs and the process repeats. We impose the following weak conditions.

We will seek solutions to the optimization problem in the space $L_1(\phi) \coloneq L_1(\Xsf, \bB, \phi)$ . As in Section 4.2.1.2, we endow $L_1(\phi)$ with the usual $L_1$ norm and the $\phi$ -a.e. pointwise order $\leq$ , so that $f \leq g$ means $\phi \{f > g\} = 0$ .

Let $(Q_{t+j})_{j \geq 1}$ be the expected present value of the profit flow conditional on deciding to launch. This sequence obeys the recursion

Q_{t + j} = \pi(X_{t+j}) + \beta(X_{t+j}) \EE_{t + j} Q_{t + j + 1} \qquad (j \geq 1).

Exploiting the time homogeneity of the state process and following the same logic that we used to solve the recursion (1.1), we find that $Q_{t+j} = q(X_{t+j})$ for all $j \geq 1$ , where $q$ is the function that solves

q(x) = \pi(x) + \beta(x) \int q(x') P(x, \diff x') \qquad (x \in \Xsf).

Letting $K$ be the discount operator defined on $L_1(\phi)$ via

(Kv)(x) \coloneq \beta(x) \int v(x') P(x, \diff x') \qquad (v \in L_1(\phi), \; x \in \Xsf),

we can write the recursion as $q = \pi + K q$ . Under the $\phi$ -a.e. pointwise order, $K$ is order-preserving.

We can confirm that $K$ maps $L_1(\phi)$ to itself by using the fact that, under Assumption 4.2.2, the discount function $\beta$ obeys $|\beta| \leq N$ for some $N \in \NN$ . Hence, for $v \in L_1(\phi)$ ,

|(Kv)(x)| \leq |\beta(x)| \int |v(x')| P(x, \diff x') \leq N \int |v(x')| P(x, \diff x'),

and $v \in L_1(\phi)$ implies $P|v| \in L_1(\phi)$ by Lemma A.5.32.

4.2.2.2 Lifetime Values

We derive lifetime values under a spectral radius condition on the discount operator and establish well-posedness of the ADP. Optimality results are given in Section 4.2.2.3.

We impose conditions under which the recursion $q = \pi + K q$ has a unique solution, implying that the profit function associated with launching the product is well defined.

Under Assumption 4.2.3, the solution to the recursion $q = \pi + K q$ is

q = (I - K)^{-1} \pi.
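When $\rho(K) < 1$, the inverse $(I - K)^{-1}$ equals the Neumann series $\sum_{n \geq 0} K^n$, so on a finite state space $q$ can be approximated by summing $K^n \pi$ term by term. The sketch below uses assumed values for $\beta$, $P$ and $\pi$. Note that $\beta$ exceeds one in the last state, so $K$ is not a contraction in one step, although $K^2$ is for these values.

```python
# All numerical values and names below are illustrative assumptions.
BETA = [0.9, 0.85, 1.02]                  # state-dependent discount factors
P = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.5, 0.4, 0.1]]
PROFIT = [1.0, 2.0, 0.5]                  # pi(x)
N = len(PROFIT)

def K(v):
    """(K v)(x) = beta(x) * sum_x' P(x, x') v(x')."""
    return [BETA[x] * sum(P[x][y] * v[y] for y in range(N)) for x in range(N)]

def solve_q(tol=1e-12, max_terms=100_000):
    """Sum the Neumann series q = sum_n K^n pi until the terms are negligible."""
    term, q = PROFIT[:], PROFIT[:]
    for _ in range(max_terms):
        term = K(term)
        q = [a + b for a, b in zip(q, term)]
        if max(abs(t) for t in term) < tol:
            return q
    raise RuntimeError("series did not converge; is rho(K) < 1?")

q = solve_q()

# Residual of the recursion q = pi + K q
residual = max(abs(a - (p + k)) for a, p, k in zip(q, PROFIT, K(q)))

# Sup-norm operator norms of K and K^2 (row sums, since K is nonnegative)
ones = [1.0] * N
norm_K, norm_K2 = max(K(ones)), max(K(K(ones)))
```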

With this function in hand, we can write out the lifetime value of policies. A policy here is a Borel measurable map $\sigma$ from $\Xsf$ to $\{0,1\}$ with $\sigma(x) = 1$ indicating the decision to launch the product at state $x$ and $\sigma(x) = 0$ indicating the decision to continue. As usual, we use $\Sigma$ to represent the set of all policies. If $\sigma \in \Sigma$ and $x \in \Xsf$ , then $v_\sigma(x)$ denotes total firm value under policy $\sigma$ , given initial state $x$ . This function $v_\sigma$ obeys the recursion

v_\sigma(x) = -c(x) + \beta(x) \int [\sigma(x') q(x') + (1 - \sigma(x')) v_\sigma(x') ] P(x, \diff x') \quad (x \in \Xsf).

Equivalently, $v_\sigma$ is a fixed point of $T_\sigma$ defined at $v \in L_1(\phi)$ by

T_\sigma \, v = -c + K (\sigma \, q + (1 - \sigma) v)

(4.16)

Under Assumption 4.2.2 and Assumption 4.2.3, each $T_\sigma$ is a self-map on $L_1(\phi)$ . Since $K$ is a positive operator, each $T_\sigma$ is order preserving. Hence, letting $\TT_{\rm RO}$ be all policy operators of the form (4.16), with $\sigma$ ranging over the policy set $\Sigma$ , the pair $(L_1(\phi), \TT_{\rm RO})$ is an ADP.

4.2.2.3 Optimality

Let’s now turn to optimality. We start by studying greedy policies.

Exercise 4.2.3 implies that $(L_1(\phi), \TT_{\rm RO})$ is regular. From Lemma 2.1.1 we know that the ADP Bellman operator satisfies $Tv = T_\sigma \, v$ whenever $\sigma$ is $v$ -greedy. Using this fact and the greedy policy $\sigma = \1\{q \geq v\}$ from Exercise 4.2.3, we obtain

T v = T_\sigma \, v = -c + K (\sigma \, q + (1 - \sigma) v) = -c + K (q \vee v).

(4.18)

It follows that the Bellman equation for the ADP is $v = -c + K (q \vee v)$ . Rewriting this expression using the definition of $K$ , we get

v(x) = -c(x) + \beta(x) \int \max \left\{q(x'), v(x') \right\} P(x, \diff x')

(4.19)

We can now state the following result for the real option ADP $(L_1(\phi), \TT_{\rm RO})$ .

Proposition 4.2.3 implies that the value function $\vmax$ solves the Bellman equation (4.19), and that $\vmax$ can be computed, at least approximately, by VFI, HPI or OPI. Moreover, policies are optimal if and only if they are $\vmax$ -greedy, which, by Exercise 4.2.3, translates to setting $\sigma = \1\{q \geq \vmax\}$ .
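For a finite-state version of the problem, these steps can be sketched as follows: compute $q$ from $q = \pi + Kq$, iterate the Bellman operator $Tv = -c + K(q \vee v)$, and read off the policy $\1\{q \geq \vmax\}$. All parameter values and function names are assumptions chosen for illustration, with $\rho(K) < 1$ holding for the values used.

```python
# All numerical values and names below are illustrative assumptions.
BETA = [0.9, 0.85, 1.02]                  # state-dependent discount factors
P = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.5, 0.4, 0.1]]
PROFIT = [1.0, 2.0, 0.5]                  # post-launch profit pi(x)
COST = [0.3, 0.2, 0.4]                    # development cost c(x)
N = len(PROFIT)

def K(v):
    """(K v)(x) = beta(x) * sum_x' P(x, x') v(x')."""
    return [BETA[x] * sum(P[x][y] * v[y] for y in range(N)) for x in range(N)]

def fixed_point(op, v, tol=1e-12, max_iter=100_000):
    """Successive approximation, stopping when iterates are within tol in sup norm."""
    for _ in range(max_iter):
        new_v = op(v)
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    raise RuntimeError("iteration did not converge")

# Value of launching: q = pi + K q
q = fixed_point(lambda u: [p + k for p, k in zip(PROFIT, K(u))], [0.0] * N)

def T(v):
    """Bellman operator (4.18): T v = -c + K (max of q and v)."""
    return [-c + k for c, k in zip(COST, K([max(a, b) for a, b in zip(q, v)]))]

v_max = fixed_point(T, [0.0] * N)          # VFI
sigma_opt = [int(a >= b) for a, b in zip(q, v_max)]   # launch where q >= v_max
```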

4.2.3 Structural Estimation

Structural estimation is a core sub-field of quantitative economics that also plays a significant role in finance, marketing, operations research, and adjacent fields. Under this approach to estimation, researchers model economic agents as if they solve dynamic programs. The econometric challenge is to infer parameters that bring the model outputs (which are typically simulated from solutions to the underlying dynamic programs) as close as possible to the data. Algorithm 4.2.1 gives an outline of the idea, with DP($\theta$) referring to a given dynamic program using a fixed parameterization indexed by $\theta$.

Typically, DP($\theta$) needs to be solved many times before convergence. In this section, we set aside the estimation step, where $d(\Msf_\theta, \Dsf)$ is constructed and $\theta$ is updated. We focus instead on the step where we compute an optimal policy $\sigma_\theta$ by solving DP($\theta$). The types of dynamic programs typically adopted in this field have some interesting characteristics, which motivate our study.

We note that structural estimation is sometimes called dynamic discrete choice because the action space is typically finite. Below we will look at settings where the state space is arbitrary and the action space is finite.

4.2.3.1 Post-Action Value Functions

Rust (1987) and many subsequent authors study discrete choice problems with modified Bellman equations that take the form

g(z, a) = \int \int \max_{a' \in \Asf} \left\{ r(z', e', a') + \beta g(z', a') \right\} F(\diff e' \given z) G(\diff z' \given z, a).

Here $F$ and $G$ are conditional distributions, while $e$ is, in most cases, a form of unobserved heterogeneity. By taking $x = (z, e)$ and relabeling, we can write this equation as

g(x, a) = \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x')

(4.20)

Here $(x,a)$ takes values in $\Gsf \coloneq \Xsf \times \Asf$ . We take $\Xsf$ to be a metric space. The reward function $r \in \RR^\Gsf$ is assumed to be bounded and Borel measurable, while $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$ . The function $g$ is usually called a post-action value function, since it returns the value of the state after committing to a given action in the current period.

The Bellman equation (4.20) is nonstandard relative to traditional presentations of dynamic programming due to (a) the reversed order of integration and maximization, and (b) the dependence of $g$ on both state and action. It also fails to fit into the abstract dynamic programming formulation of Bertsekas (2022), Ren & Stachurski (2021), Toda (2024), etc. At the same time, there are significant advantages of working with this version of the Bellman equation in structural estimation settings (see, e.g., Rust (1994), Kristensen et al. (2021), or Chapter 5 of Sargent & Stachurski (2025)).

Despite its nontraditional format, we can set this problem up as an ADP by taking $\Sigma$ to be the set of Borel measurable maps from $\Xsf$ to $\Asf$ and, for each $\sigma \in \Sigma$ , introducing the policy operator

(\hat T_\sigma \, g)(x, a) = \int [ r(x', \sigma(x')) + \beta g(x', \sigma(x')) ] P(x, a, \diff x').

(4.21)

Recalling that $\Gsf$ is the product space $\Xsf \times \Asf$ , let $(b\Gsf, \leq)$ be the set of bounded Borel measurable functions in $\RR^\Gsf$ paired with the pointwise partial order. Using boundedness and measurability of $r$ , it is straightforward to show that each $\hat T_\sigma$ is an order preserving self map on $(b\Gsf, \leq)$ . Letting $\hat{\TT}_{\rm SE}$ be the set of all such $\hat T_\sigma$ , the pair $(b\Gsf, \hat{\TT}_{\rm SE})$ forms an ADP.

Any $\sigma \in \Sigma$ obeying

\sigma(x) \in \argmax_{a \in \Asf} \{r(x, a) + \beta g(x, a)\} \quad \text{for all } x \in \Xsf

(4.22)

is $g$ -greedy, since, for such a $\sigma$ and any $\tau \in \Sigma$ , we clearly have

(\hat T_\tau \, g)(x, a) \leq \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x') = (\hat T_\sigma \, g)(x, a)

for all $(x, a) \in \Gsf$ .

Does such a $\sigma$ necessarily exist? On one hand, $\Asf$ is finite and nonempty, so the argmax set is nonempty for all $x$ . On the other hand, we still need to address the following measurability issue:

(To address this problem, we can use a measurable selection theorem, such as Theorem A.3.3. Since that theorem has a difficult proof, the solution to Exercise 4.2.4 uses a more elementary argument.)

Solution to Exercise 4.2.4

To construct a measurable selection, we set $q(x, a) \coloneq r(x, a) + \beta g(x, a)$ on $\Gsf$ . The map $x \mapsto q(x, a)$ is Borel measurable for each $a \in \Asf$ and $m(x) \coloneq \max_a q(x, a)$ is also measurable (because $\Asf$ is finite). Since $\Asf$ is finite, we can enumerate it as $\Asf = \{a_1, \ldots, a_n\}$ and define $\sigma$ by setting $\sigma(x) = a_{i}$ where $i$ is the smallest index such that $q(x, a_i) = m(x)$ . Since $i$ depends on $x$ , we write it more explicitly as $i(x)$ . With this definition, for fixed $k \in [n]$ , the set $\setntn{x}{i(x) = k}$ equals

\setntn{x}{q(x, a_k) = m(x) \,} \cap \setntn{x}{q(x, a_j) < m(x) \text{ for all } j < k \,}

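Both sets are Borel measurable, because $m$ and each $x \mapsto q(x, a_j)$ are Borel measurable and the second set is a finite intersection of sets of the form $\setntn{x}{q(x, a_j) < m(x)}$ . Hence their intersection is Borel measurable. Since $i$ takes only finitely many values, it follows that $x \mapsto i(x)$ is Borel measurable, and therefore $\sigma$ is a measurable selection.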
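On a finite grid, the smallest-index tie-breaking rule used in this solution can be sketched in Python. This is an illustration only: the array layout (`q[x, k]` holding $q(x, a_k)$ with a zero-based index `k`), the function names and the numbers are assumptions and do not appear in the text. NumPy's `argmax` reports the first maximizer, which matches the rule of choosing the smallest index $i$ with $q(x, a_i) = m(x)$ .

```python
import numpy as np

def smallest_index_selector(q):
    """Return i with i[x] = smallest k such that q[x, k] equals the row maximum."""
    return np.argmax(q, axis=1)   # argmax reports the first maximizer

def level_set(q, k):
    """Mask of {x : q(x, a_k) = m(x)} intersected with {x : q(x, a_j) < m(x) for all j < k}."""
    m = q.max(axis=1)
    attains_max = q[:, k] == m
    earlier_fall_short = np.all(q[:, :k] < m[:, None], axis=1)  # vacuously true when k = 0
    return attains_max & earlier_fall_short

# Hypothetical values of q(x, a_k) at three states (rows) and three actions (columns)
q = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, 2.0],
              [3.0, 1.0, 3.0]])
i = smallest_index_selector(q)
```

For each `k`, the mask `level_set(q, k)` should coincide with `i == k`, mirroring the set identity used to establish measurability.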

In view of Exercise 4.2.4, the ADP $(b\Gsf, \hat{\TT}_{\rm SE})$ is regular.

Solution to Exercise 4.2.5

Fix $g, h \in b\Gsf$ and $\sigma \in \Sigma$ . For any $(x, a) \in \Gsf$ , we have

(\hat T_\sigma g)(x, a) - (\hat T_\sigma h)(x, a) = \beta \int [g(x', \sigma(x')) - h(x', \sigma(x'))] P(x, a, \diff x').

Taking absolute values and using the triangle inequality for integrals,

|(\hat T_\sigma g)(x, a) - (\hat T_\sigma h)(x, a)| \leq \beta \int |g(x', \sigma(x')) - h(x', \sigma(x'))| P(x, a, \diff x').

Since $|g(x', \sigma(x')) - h(x', \sigma(x'))| \leq \|g - h\|$ for all $x'$ , we obtain

|(\hat T_\sigma g)(x, a) - (\hat T_\sigma h)(x, a)| \leq \beta \|g - h\| \int P(x, a, \diff x') = \beta \|g - h\|.

Taking the supremum over $(x, a) \in \Gsf$ yields the desired bound.
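The bound can be checked numerically on a finite discretization. The sketch below is not part of the text: it assumes finite $\Xsf$ and $\Asf$ with `n` states and `m` actions, stores $r$ and $g$ as arrays of shape `(n, m)` and $P$ as an array of shape `(n, m, n)`, and uses randomly generated primitives. All names are hypothetical.

```python
import numpy as np

def policy_operator(g, sigma, r, P, beta):
    """(T_sigma g)(x, a) = sum_{x'} [r(x', s(x')) + beta g(x', s(x'))] P(x, a, x')."""
    xs = np.arange(r.shape[0])
    h = r[xs, sigma] + beta * g[xs, sigma]   # integrand evaluated at each x'
    return P @ h                             # P has shape (n, m, n); result has shape (n, m)

rng = np.random.default_rng(0)
n, m, beta = 5, 3, 0.95                      # hypothetical sizes and discount factor
r = rng.uniform(size=(n, m))
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)            # each P[x, a, :] is a distribution over x'
sigma = rng.integers(m, size=n)              # an arbitrary policy
g, h = rng.normal(size=(n, m)), rng.normal(size=(n, m))

# Supremum-norm distance between images, and beta times the distance between inputs
lhs = np.max(np.abs(policy_operator(g, sigma, r, P, beta)
                    - policy_operator(h, sigma, r, P, beta)))
rhs = beta * np.max(np.abs(g - h))
```

Comparing `lhs` with `rhs` for different draws of `g`, `h` and `sigma` gives a quick sanity check of the inequality derived above.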

Given $g$ , let $\sigma$ be a $g$ -greedy policy. The ADP Bellman operator obeys $\hat T g = \hat T_\sigma \, g$ . Using this fact and (4.22), we see that

(\hat T g)(x, a) = \int \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] P(x, a, \diff x')

for all $(x,a) \in \Gsf$ . It follows that the ADP Bellman equation $g = \hat T g$ is equivalent to (4.20), confirming that our ADP accurately represents the problem we began with. We can now turn to optimality.
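On a finite discretization, the Bellman operator just derived can be iterated directly. The following is a minimal sketch under assumptions not made in the text: finite $\Xsf$ and $\Asf$ , arrays `r` of shape `(n, m)` and `P` of shape `(n, m, n)`, randomly generated primitives, and hypothetical function names. Successive approximation is one possible solution method, chosen here for simplicity.

```python
import numpy as np

def bellman_operator(g, r, P, beta):
    """(T g)(x, a) = sum_{x'} max_{a'} [r(x', a') + beta g(x', a')] P(x, a, x')."""
    return P @ np.max(r + beta * g, axis=1)

def solve_by_iteration(r, P, beta, tol=1e-10, max_iter=10_000):
    """Iterate T from g = 0 until successive iterates differ by less than tol."""
    g = np.zeros_like(r)
    for _ in range(max_iter):
        g_new = bellman_operator(g, r, P, beta)
        if np.max(np.abs(g_new - g)) < tol:
            return g_new
        g = g_new
    return g

rng = np.random.default_rng(1)
n, m, beta = 4, 2, 0.9                       # hypothetical sizes and discount factor
r = rng.uniform(size=(n, m))
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)            # each P[x, a, :] sums to one

g_star = solve_by_iteration(r, P, beta)
sigma_star = np.argmax(r + beta * g_star, axis=1)   # a g_star-greedy policy, as in (4.22)
```

Note that the maximization is applied inside the sum over $x'$ , reflecting the reversed order of integration and maximization in (4.20).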

4.2.3.2State-Dependent Discounting¶

Next we consider a modification of the structural estimation model in Section 4.2.3.1 that includes state-dependent discounting. In this setting, the Bellman equation becomes

g(x, a) = \sum_{x'} \max_{a' \in \Asf} \left[ r(x', a') + \beta(x') g(x', a') \right] P(x, a, x')

(4.23)

where $(x,a) \in \Gsf \coloneq \Xsf \times \Asf$ and $\Asf, \Xsf$ are finite and nonempty. As usual, $r$ is a reward function on $\Gsf$ and $P$ is a transition kernel from $\Gsf$ to $\Xsf$ . The discount factor $\beta$ is allowed to be a function of the state. We let $\| \cdot \|$ be the supremum norm and take $\Sigma$ to be the set of all functions from $\Xsf$ to $\Asf$ . The state space $\Xsf$ is taken to be finite to simplify the analysis.

Given $\sigma \in \Sigma$ , let $\hat T_\sigma$ be defined at $g \in \RR^\Gsf$ and $(x,a) \in \Gsf$ by

(\hat T_\sigma \, g) (x, a) = \sum_{x'} \left[ r(x', \sigma(x')) + \beta(x') g(x', \sigma(x')) \right] P(x, a, x')

Let $\hat{\TT}_{\rm SE} = \{\hat T_\sigma\}_{\sigma \in \Sigma}$ . Each $\hat T_\sigma$ is an order-preserving self-map on $\RR^\Gsf$ , so $(\RR^\Gsf, \hat{\TT}_{\rm SE})$ is an ADP. For each $g \in \RR^\Gsf$ , we can construct a $g$ -greedy policy by taking a $\sigma \in \Sigma$ such that

\sigma(x) \in \argmax_{a \in \Asf} [r(x, a) + \beta(x) g(x, a)] \quad \text{for all } x \in \Xsf.

(4.24)

Since $\Asf$ is finite and nonempty, such a $\sigma$ exists. (We have no measurability issues here because $\Xsf$ is also finite.) As a result, the ADP $(\RR^\Gsf, \hat{\TT}_{\rm SE})$ is regular.

Let $\hat T$ be the Bellman operator, so that $\hat T g \coloneq \bigvee_\sigma \hat T_\sigma \, g$ . When evaluated at a $g$ -greedy policy $\sigma$ , we have $\hat T_\sigma \, g = \hat T g$ . Using this equality and (4.24) yields

(\hat T g)(x, a) = \sum_{x'} \max_{a' \in \Asf} \left[ r(x', a') + \beta(x') g(x', a') \right] P(x, a, x').

This confirms that solutions to the ADP Bellman equation $\hat T g = g$ solve the original Bellman equation (4.23).
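The relationship between the policy operators and the Bellman operator in this finite setting can be illustrated with a short Python sketch. The array layout (`r` of shape `(n, m)`, `beta` of shape `(n,)`, `P` of shape `(n, m, n)`), the function names and the random primitives are assumptions made for the illustration.

```python
import numpy as np

def policy_operator_sdd(g, sigma, r, beta, P):
    """(T_sigma g)(x, a) = sum_{x'} [r(x', s(x')) + beta(x') g(x', s(x'))] P(x, a, x')."""
    xs = np.arange(r.shape[0])
    return P @ (r[xs, sigma] + beta * g[xs, sigma])

def bellman_operator_sdd(g, r, beta, P):
    """(T g)(x, a) = sum_{x'} max_{a'} [r(x', a') + beta(x') g(x', a')] P(x, a, x')."""
    return P @ np.max(r + beta[:, None] * g, axis=1)

def greedy_sdd(g, r, beta):
    """A g-greedy policy as in (4.24), breaking ties by smallest action index."""
    return np.argmax(r + beta[:, None] * g, axis=1)

rng = np.random.default_rng(2)
n, m = 4, 3                                   # hypothetical sizes
r = rng.uniform(size=(n, m))
beta = np.array([1.02, 0.9, 0.8, 0.85])       # hypothetical state-dependent discount factors
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)
g = rng.normal(size=(n, m))

sigma = greedy_sdd(g, r, beta)
tau = rng.integers(m, size=n)                 # an arbitrary competing policy
Tg = bellman_operator_sdd(g, r, beta, P)
T_sigma_g = policy_operator_sdd(g, sigma, r, beta, P)
T_tau_g = policy_operator_sdd(g, tau, r, beta, P)
```

Here `T_sigma_g` should agree with `Tg`, while `T_tau_g` should be pointwise no larger than `Tg`.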

For each $\sigma \in \Sigma$ we set

(K_\sigma g)(x, a) = \sum_{x'} \beta(x') g(x', \sigma(x')) P(x, a, x') \qquad (x, a) \in \Gsf

(4.25)

We can now state the following result:

This discounting condition generalizes the traditional assumption that $\beta$ is constant and strictly less than one, in which case $\rho(K_\sigma) < 1$ always holds.
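To see how the spectral radius condition can hold even when $\beta$ exceeds one at some states, consider the following sketch. It is an illustration under assumptions that are not in the text: two states, two actions, a kernel `P` that puts probability 0.5 on each next state, and the values `beta = (1.05, 0.8)`. The helper names are hypothetical. In this example every row of $K_\sigma$ is identical, so its only nonzero eigenvalue is $0.5 \times 1.05 + 0.5 \times 0.8 = 0.925$ .

```python
import itertools
import numpy as np

def K_matrix(sigma, beta, P):
    """Matrix of K_sigma in (4.25), with rows and columns indexed by pairs (x, a)."""
    n, m = P.shape[0], P.shape[1]
    xs = np.arange(n)
    K = np.zeros((n, m, n, m))
    K[:, :, xs, sigma] = beta * P      # entry ((x, a), (x', sigma(x'))) = beta(x') P(x, a, x')
    return K.reshape(n * m, n * m)

def spectral_radius(A):
    """Largest modulus among the eigenvalues of A."""
    return np.max(np.abs(np.linalg.eigvals(A)))

beta = np.array([1.05, 0.8])           # discount factor exceeds one in state 0
P = np.full((2, 2, 2), 0.5)            # next state is uniform, whatever (x, a) is

# Spectral radius of K_sigma for each of the four policies
radii = {sigma: spectral_radius(K_matrix(np.array(sigma), beta, P))
         for sigma in itertools.product(range(2), repeat=2)}
```

Each entry of `radii` is approximately 0.925, so the condition $\rho(K_\sigma) < 1$ holds for all $\sigma$ in this example even though $\max_x \beta(x) > 1$ .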

Proof of Proposition 4.2.5.

We have already shown that $(\RR^\Gsf, \hat{\TT}_{\rm SE})$ is regular. Moreover, for fixed $\sigma \in \Sigma$ and $g \in \RR^\Gsf$ , we have

(\hat T_\sigma \, g)(x,a) = \sum_{x'} r_\sigma(x') P(x, a, x') + (K_\sigma \, g)(x, a),

where $r_\sigma(x') \coloneq r(x', \sigma(x'))$ and $K_\sigma$ is as defined in (4.25). Since $\hat{\TT}_{\rm SE}$ is finite and $\rho(K_\sigma) < 1$ for all $\sigma \in \Sigma$ , the first two claims in Proposition 4.2.5 follow from Theorem 4.1.7. Since global stability implies order stability (Lemma A.5.19), the last claim regarding HPI can be proved using Theorem 2.2.6. ◻
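The decomposition of $\hat T_\sigma$ used in the proof suggests a direct way to run HPI in the finite case: the fixed point of $\hat T_\sigma$ solves a linear system in $K_\sigma$ . The sketch below is one possible implementation under stated assumptions, not a definitive one. The array layout, function names and primitives are hypothetical, and `P` and `beta` are chosen so that $\sum_{x'} \beta(x') P(x, a, x') < 1$ for every $(x, a)$ , which implies $\rho(K_\sigma) < 1$ for every $\sigma$ .

```python
import numpy as np

def lifetime_value(sigma, r, beta, P):
    """Fixed point of T_sigma, found by solving (I - K_sigma) g = r_hat_sigma."""
    n, m = r.shape
    xs = np.arange(n)
    K = np.zeros((n, m, n, m))
    K[:, :, xs, sigma] = beta * P                  # K_sigma as in (4.25)
    K = K.reshape(n * m, n * m)
    r_hat = (P @ r[xs, sigma]).ravel()             # sum_{x'} r(x', sigma(x')) P(x, a, x')
    return np.linalg.solve(np.eye(n * m) - K, r_hat).reshape(n, m)

def greedy(g, r, beta):
    """A g-greedy policy as in (4.24), breaking ties by smallest action index."""
    return np.argmax(r + beta[:, None] * g, axis=1)

def howard_policy_iteration(r, beta, P, max_iter=1_000):
    """Alternate policy evaluation and greedy improvement until the policy repeats."""
    sigma = np.zeros(r.shape[0], dtype=int)
    for _ in range(max_iter):
        g = lifetime_value(sigma, r, beta, P)
        new_sigma = greedy(g, r, beta)
        if np.array_equal(new_sigma, sigma):
            break
        sigma = new_sigma
    return sigma, g

rng = np.random.default_rng(3)
n, m = 4, 3                                        # hypothetical sizes
r = rng.uniform(size=(n, m))
beta = np.array([1.02, 0.9, 0.8, 0.85])            # exceeds one in state 0 only
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)
P = 0.5 * P + 0.5 / n                              # mix with uniform: P[x, a, 0] <= 0.625

sigma_star, g_star = howard_policy_iteration(r, beta, P)
```

When the loop stops because the policy repeats, `g_star` is the fixed point of $\hat T_\sigma$ at a policy that is greedy for it, so it should also satisfy the Bellman equation (4.23) up to numerical error.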

4.2.3.3Beyond Expected Utility¶

Some studies find incompatibilities between data and predictions of models that use additively separable preferences and mathematical expectation to evaluate uncertain outcomes (see, e.g., Lu et al. (2024)). To further this line of analysis, we revisit the basic structural estimation model in Section 4.2.3.1 while replacing mathematical expectation with a general certainty equivalent operator.

As in Section 4.2.3.1, spaces of bounded real-valued functions are paired with the pointwise order $\leq$ , and the supremum norm, to be denoted by $\| \cdot \|$ . The state space $\Xsf$ is a metric space, $\Asf$ is a finite choice set, $\Gsf \coloneq \Xsf \times \Asf$ , the reward function $r \colon \Gsf \to \RR$ is measurable, and $\beta \in (0,1)$ is a constant discount factor. However, we modify the Bellman equation (4.20) for the post-action value function to

g(x, a) = (M H g)(x, a) \quad \text{where } (Hg)(x') \coloneq \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right].

(4.26)

Here $M$ is a certainty equivalent operator mapping $b\Xsf$ into $b\Gsf$ ; that is, $M$ is order-preserving and $M(v + \kappa \1) = Mv + \kappa \1$ for all $\kappa \in \RR_+$ . An example is given below in Section 4.2.3.4.

Let $\Sigma$ be the set of Borel measurable maps from $\Xsf$ to $\Asf$ . Given $\sigma \in \Sigma$ , we set

(\hat T_\sigma \, g)(x, a) = (M H_\sigma \, g)(x, a) \quad \text{where } (H_\sigma \, g)(x') \coloneq r(x', \sigma(x')) + \beta g(x', \sigma(x')).

With $\hat{\TT}_{\rm SE} \coloneq \{\hat T_\sigma\}_{\sigma \in \Sigma}$ , the pair $(b\Gsf, \hat{\TT}_{\rm SE})$ is an ADP and $\sigma \in \Sigma$ is $g$ -greedy whenever

\sigma(x) \in \argmax_{a' \in \Asf} \left[ r(x, a') + \beta g(x, a') \right] \quad \text{for all } x \in \Xsf.

(4.27)

Since $\Asf$ is finite and nonempty, such a policy always exists. (A function $\sigma$ obeying (4.27) can be chosen to be measurable—see the solution to Exercise 4.2.4.)

Given $g \in b\Gsf$ , the ADP Bellman operator $\hat T$ satisfies $\hat T g = \hat T_\sigma \, g$ whenever $\sigma$ is $g$ -greedy. Using this fact and (4.27), we obtain $\hat T g = M H g$ . Hence any fixed point of $\hat T$ solves the original Bellman equation (4.26).

Proof

We apply Theorem 4.1.3. As regularity was already confirmed above, it suffices to show that

\hat T_\sigma \, (g + \kappa e) \leq \hat T_\sigma \, g + \beta \kappa e

(4.28)

for all $g \in b\Gsf$ and all $\kappa \in \RR_+$ . To verify this, fix $\sigma$ , $g$ and $\kappa$ as above. Since $(H_\sigma \, g)(x') = r(x', \sigma(x')) + \beta g(x', \sigma(x'))$ , we have $H_\sigma(g + \kappa \1) = H_\sigma \, g + \beta \kappa \1$ . Since $M$ is a certainty equivalent operator, $\hat T_\sigma(g + \kappa \1) = M(H_\sigma \, g + \beta \kappa \1) = M(H_\sigma \, g) + \beta \kappa \1 = \hat T_\sigma \, g + \beta \kappa \1$ . Hence (4.28) holds with equality, with $e = \1$ . ◻

4.2.3.4The Risk-Sensitive Case¶

As an illustration, suppose that $M$ is the risk-sensitive certainty equivalent

(M f)(x, a) \coloneq - \frac{1}{\gamma} \ln \left\{ \int \exp \left[ -\gamma f(x') \right] P(x, a, \diff x') \right\} \qquad ((x, a) \in \Gsf),

where $P$ is a stochastic kernel from $\Gsf$ to $\Xsf$ and $\gamma$ is a nonzero constant. Among other things, Proposition 4.2.6 tells us that $\sigma \in \Sigma$ is optimal if and only if

\sigma(x) \in \argmax_{a' \in \Asf} \left[ r(x, a') + \beta \gmax(x, a') \right] \quad \text{for all } x \in \Xsf,

where $\gmax$ is the unique solution to the functional equation

g(x, a) = - \frac{1}{\gamma} \ln \left\{ \int \exp \left\{ -\gamma \max_{a' \in \Asf} \left[ r(x', a') + \beta g(x', a') \right] \right\} P(x, a, \diff x') \right\}

in the value space $b\Gsf$ .

Alternative certainty equivalents are discussed in Section 7.2.4.
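The risk-sensitive case can be explored numerically on a finite discretization. The sketch below is illustrative and rests on assumptions not made in the text: finite $\Xsf$ and $\Asf$ , arrays `r` of shape `(n, m)` and `P` of shape `(n, m, n)`, random primitives, $\gamma = 0.5$ , and hypothetical function names. The shift by `c` inside the logarithm is a standard device for avoiding overflow and does not change the value of $M f$ .

```python
import numpy as np

def risk_sensitive_ce(f, P, gamma):
    """(M f)(x, a) = -(1/gamma) log sum_{x'} exp(-gamma f(x')) P(x, a, x')."""
    z = -gamma * f
    c = z.max()                                    # shift to avoid overflow in exp
    return -(c + np.log(P @ np.exp(z - c))) / gamma

def bellman_operator_rs(g, r, P, beta, gamma):
    """(T g)(x, a) = (M H g)(x, a), where (H g)(x') = max_{a'} [r(x', a') + beta g(x', a')]."""
    return risk_sensitive_ce(np.max(r + beta * g, axis=1), P, gamma)

rng = np.random.default_rng(4)
n, m, beta, gamma = 4, 2, 0.9, 0.5                 # hypothetical parameters
r = rng.uniform(size=(n, m))
P = rng.uniform(size=(n, m, n))
P /= P.sum(axis=2, keepdims=True)

# Successive approximation from g = 0
g = np.zeros((n, m))
for _ in range(1_000):
    g_new = bellman_operator_rs(g, r, P, beta, gamma)
    converged = np.max(np.abs(g_new - g)) < 1e-10
    g = g_new
    if converged:
        break

sigma = np.argmax(r + beta * g, axis=1)            # greedy policy at the computed g
```

Adding a constant $\kappa$ to `f` should shift `risk_sensitive_ce(f, P, gamma)` by $\kappa$ , and adding $\kappa$ to `g` should shift `bellman_operator_rs(g, ...)` by $\beta \kappa$ , in line with the argument behind (4.28).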

4.3Chapter Notes¶

This chapter extends the ADP framework of Chapter 2 and Chapter 3 by working within partially ordered metric and topological spaces, which provide a natural setting for combining contraction-based and order-theoretic reasoning. In practice, most of the pospaces we examine are subsets of Banach lattices. Background on Banach lattices and positive operators can be found in the references listed in the chapter notes of Chapter 2. Du’s conditions for concave and convex operators, treated in Section 4.1.3, originate in Du (1990); see also Zhang (2012), Theorem 2.1.2. Certainty equivalent operators, which appear in Section 4.2.3.3, are covered in depth in Section 7.2.4.

The applications in this chapter draw on several fields. The discounting condition in Assumption 4.2.3 is similar to restrictions found in Hansen & Scheinkman (2012) and Borovička & Stachurski (2020). General results on dynamic programming with state-dependent discounting can be found in Stachurski & Zhang (2021). The structural estimation framework in Section 4.2.3 originates with the classic work of Rust (1987); for further background, see Rust (1994), Igami (2020), Kristensen et al. (2021), and Chapter 5 of Sargent & Stachurski (2025). A duality-based perspective on dynamic discrete-choice models is developed in Chiong et al. (2016). The non-expected utility extension in Section 4.2.3.3 is motivated by evidence reviewed in Lu et al. (2024). Non-expected utility is considered again in Section 7.2.4.

References¶

Stokey, N. L., & Lucas, R. E. (1989). Recursive methods in dynamic economics. Harvard University Press.
Harrison, J. M., & Kreps, D. M. (1978). Speculative investor behavior in a stock market with heterogeneous expectations. The Quarterly Journal of Economics, 92(2), 323–336.
Du, Y. (1990). Fixed points of increasing operators in ordered Banach spaces and applications. Applicable Analysis, 38(01–02), 1–20.
Zhang, Z. (2012). Variational, topological, and partial order methods with their applications (Vol. 29). Springer.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55(5), 999–1033.
Bertsekas, D. P. (2022). Abstract dynamic programming (3rd ed.). Athena Scientific.
Ren, G., & Stachurski, J. (2021). Dynamic programming with value convexity. Automatica, 130, 109641.
Toda, A. A. (2024). Essential Mathematics for Economics (1st ed., p. 308). Chapman and Hall/CRC. https://doi.org/10.1201/9781032698953
Rust, J. (1994). Structural estimation of Markov decision processes. Handbook of Econometrics, 4, 3081–3143.
Kristensen, D., Mogensen, P. K., Moon, J. M., & Schjerning, B. (2021). Solving dynamic discrete choice models using smoothing and sieve methods. Journal of Econometrics, 223(2), 328–360.
Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
Lu, J., Luo, Y., Saito, K., & Xin, Y. (2024). Did Harold Zuercher Have Time-Separable Preferences?
Hansen, L. P., & Scheinkman, J. A. (2012). Recursive utility in a Markov environment with stochastic growth. Proceedings of the National Academy of Sciences, 109(30), 11967–11972.
Borovička, J., & Stachurski, J. (2020). Necessary and sufficient conditions for existence and uniqueness of recursive utilities. The Journal of Finance.
Stachurski, J., & Zhang, J. (2021). Dynamic programming with state-dependent discounting. Journal of Economic Theory, 192, 105190.
