
7 Recursive Decision Processes

In this chapter we study what Sargent & Stachurski (2025) call recursive decision processes (RDPs). The following display shows where RDPs fit relative to the other major classes of DP models studied so far in this book:

$$\text{MDPs } \subset \text{ LDPs } \subset \text{ RDPs } \subset \text{ ADPs}.$$

The main role of RDPs is to extend LDPs by accommodating nonlinearities in aggregators and calculation of present values. Another difference between this chapter and our discussion of LDPs is that we allow for unbounded rewards and value functions. While this can also be done in an LDP setting, the corresponding analysis turns out to be cleaner when working with RDPs.

Section 7.1 introduces the RDP framework, provides examples, clarifies the relationships between RDPs, LDPs and ADPs, and discusses existence of greedy policies. Section 7.2.1 and Section 7.2.2 present optimality results, first for bounded rewards and then for unbounded rewards handled via weighted contractions. Section 7.2.3 gives conditions under which the value function is monotone or concave and the optimal policy is unique and continuous. After a digression on certainty equivalents (Section 7.2.4), we extend the optimality theory to MDPs with general certainty equivalents (Section 7.2.5). Section 7.3.1 applies the theory to optimal savings with unbounded utility, and we then study irreversible investment under risk neutrality, risk aversion, and ambiguity aversion.

7.1 Introduction

Section 7.1.1 defines RDPs and provides examples. Section 7.1.2 clarifies the relationships between RDPs, LDPs and ADPs, and Section 7.1.3 discusses existence of greedy policies.

7.1.1 Definition and Examples

To a first approximation, RDPs are dynamic programs with a Bellman equation of the form

$$v(x) = \max_{a \in \Gamma(x)} B(x, a, v) \qquad (x \in \Xsf) \tag{7.2}$$

for some suitable choice of $B$. Here $x$ is the state, $a$ is an action, $\Gamma$ is a feasible correspondence and $B$ is an “aggregator,” with interpretation

$B(x, a, v) =$ total lifetime rewards, contingent on current action $a$, current state $x$ and the use of $v$ to evaluate future states.

Section 7.1.1.1 makes this definition precise, and the subsections that follow provide examples. As usual, in a topological space setting, “measurable” means “Borel measurable” unless otherwise stated.

7.1.1.1 Definition

Let $\Xsf$ and $\Asf$ be separable metric spaces, referred to henceforth as the state and action spaces respectively. Given these spaces, a recursive decision process (RDP) is a tuple $(\Gamma, V, B)$ containing

  1. a nonempty correspondence $\Gamma$ from $\Xsf$ to $\Asf$ called the feasible correspondence, with an associated set of feasible state-action pairs

$$\Gsf := \graph \Gamma = \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)}$$

and an associated set of feasible policies

$$\Sigma \coloneq \{ \text{all measurable } \sigma \colon \Xsf \to \Asf \text{ satisfying } \sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf \},$$

  2. a subset $V$ of $\RR^\Xsf$ called the value space, and

  3. a map $B \colon \Gsf \times V \to \RR$, referred to as an aggregator, satisfying the monotonicity condition

$$v \leq w \implies B(x, a, v) \leq B(x, a, w) \tag{7.5}$$

for every $v, w \in V$ and every $(x, a) \in \Gsf$, and the consistency condition

$$\sigma \in \Sigma \text{ and } v \in V \; \implies \; m(x) \coloneq B(x, \sigma(x), v) \text{ is in } V. \tag{7.6}$$

Several objects, such as $\Xsf$, $\Asf$ and $\Gamma$, are familiar from our definition of LDPs in Section 6.1.2.1. Analogous to the LDP case, when representing the RDP by the tuple $(\Gamma, V, B)$, we treat $\Xsf$ and $\Asf$ as understood from context.

The value space $V$ is a class of functions that assign values to states. The order on the left side of (7.5) is the usual pointwise partial order for functions. The monotonicity restriction is natural: if rewards are at least as high under $w$ as under $v$ in every future state, then the total rewards we can extract under $w$ should be at least as high.

The final condition, (7.6), is a consistency condition ensuring that $V$ is large enough to capture the value of following a particular policy.

7.1.1.2 Example: Finite MDPs

In Section 1.2 we introduced the basic MDP model, with finite state space $\Xsf$, finite action space $\Asf$, and remaining primitives $(\Gamma, r, \beta, P)$ as given in Section 1.2.1.1. This maps easily to the RDP setting by taking $V = \RR^\Xsf$, $\Gamma$ as given, and

$$B(x, a, v) = r(x, a) + \beta \sum_{x'} v(x') P(x, a, x')$$

for $(x, a) \in \Gsf$ and $v \in \RR^\Xsf$.

For this model, it is clear that the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from (7.2) agrees with the original expression we gave in (1.46).
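As a concrete illustration, the following minimal Python sketch encodes a finite MDP through its aggregator $B$ and iterates the Bellman operator $(Tv)(x) = \max_{a \in \Gamma(x)} B(x, a, v)$. The primitives are hypothetical values chosen for illustration: two states, two actions, the arrays `r` and `P`, and $\beta = 0.9$. The helper names are ours, not the book's.

```python
# Minimal sketch: a finite MDP expressed as an RDP via its aggregator B.
# The primitives r, P and beta are hypothetical, chosen only for illustration.

beta = 0.9
states = [0, 1]
actions = [0, 1]

# r[x][a] is the current reward; P[x][a][y] is the probability of moving to y.
r = [[1.0, 0.0], [0.0, 2.0]]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]

def Gamma(x):
    # Every action is feasible in every state in this toy example.
    return actions

def B(x, a, v):
    # Aggregator: current reward plus discounted expected continuation value.
    return r[x][a] + beta * sum(v[y] * P[x][a][y] for y in states)

def T(v):
    # Bellman operator: (Tv)(x) = max over a in Gamma(x) of B(x, a, v).
    return [max(B(x, a, v) for a in Gamma(x)) for x in states]

def greedy(v):
    # A v-greedy policy selects a maximizer of a -> B(x, a, v) at each state.
    return [max(Gamma(x), key=lambda a: B(x, a, v)) for x in states]

def value_iteration(tol=1e-10, max_iter=10_000):
    # Successive approximation: iterate T until the sup-norm change is below tol.
    v = [0.0 for _ in states]
    for _ in range(max_iter):
        v_new = T(v)
        if max(abs(p - q) for p, q in zip(v_new, v)) < tol:
            return v_new
        v = v_new
    return v

v_star = value_iteration()
sigma_star = greedy(v_star)
print(v_star, sigma_star)
```

With these numbers the iteration settles on the policy that chooses action 1 in both states.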

7.1.1.3 Example: The Firm Valuation Problem

Recall the firm decision problem we analyzed in Section 1.1.1.2, where the decision is binary (0 means continue and 1 means sell) and the state $x$ takes values in a set $\Xsf$ and evolves via stochastic kernel $P$. To map this problem to an RDP we set $\Asf = \{0,1\}$ and $\Gamma(x) = \Asf$ for all $x$. We take $V \coloneq b\Xsf$ as the value space and set

$$B(x, a, v) = a s + (1 - a) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right]. \tag{7.8}$$

The monotonicity condition (7.5) clearly holds. A policy is a $\bB$-measurable map $\sigma \colon \Xsf \to \{0,1\}$. Given any such policy and any $v \in b\Xsf$, the function

$$m(x) \coloneq \sigma(x) s + (1 - \sigma(x)) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right]$$

is in $b\Xsf$ (since $\pi$ is assumed to be bounded), so the consistency condition (7.6) also holds. For this model, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ becomes

$$v(x) = \max_{a \in \{0,1\}} \left\{ a s + (1 - a) \left[ \pi(x) + \beta \int v(x') P(x, \diff x') \right] \right\}.$$

This is equivalent to the original statement of the Bellman equation in Theorem 1.1.1.
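The sketch below discretizes the firm valuation problem to three states so that the integral becomes a finite sum. It works under assumed primitives: the profit vector `pi`, the transition matrix `P`, the sale price `s = 8` and $\beta = 0.9$ are all hypothetical. The aggregator follows (7.8), with $a = 1$ meaning sell.

```python
# Minimal sketch: firm valuation as an RDP on a three-point state space.
# All primitives below are hypothetical; a = 1 means sell, a = 0 means continue.

beta = 0.9
s = 8.0                          # sale price received when the firm sells
pi = [0.0, 1.0, 2.0]             # current profit pi(x) in each state x
P = [[0.8, 0.2, 0.0],            # P[x][y]: probability of moving from x to y
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
states = range(len(pi))

def B(x, a, v):
    # Aggregator (7.8): a * s + (1 - a) * [pi(x) + beta * sum_y v(y) P(x, y)].
    continuation = pi[x] + beta * sum(v[y] * P[x][y] for y in states)
    return a * s + (1 - a) * continuation

def T(v):
    # Bellman operator: maximize the aggregator over a in {0, 1}.
    return [max(B(x, a, v) for a in (0, 1)) for x in states]

v = [0.0] * len(pi)
for _ in range(1000):            # beta**1000 is negligible, so this is ample
    v = T(v)

# A v-greedy policy; ties are broken in favor of selling.
sigma = [1 if B(x, 1, v) >= B(x, 0, v) else 0 for x in states]
print(v, sigma)
```

For these values the greedy policy sells only in the lowest-profit state 0.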

7.1.1.4 Firm Valuation with Unbounded Profits

The firm valuation problem can also fit into the RDP framework when profits are unbounded, at least in some cases. For example, suppose that $\Xsf = \RR_+$, that $\ell$ is a given weight function on $\RR_+$ (see Section A.5.3.5), and that there exist nonnegative constants $\alpha, \eta, \delta$ such that

$$\pi(x) \leq \eta \ell(x) + \delta \quad \text{and} \quad \int \ell(x') P(x, \diff x') \leq \alpha \ell(x) \qquad (x \in \Xsf).$$

(These conditions bound the rate at which profits grow.) We again take $B$ as in (7.8) and $\Gamma(x) = \{0,1\}$ for all $x \in \RR_+$. We set $V$ equal to $b_\ell \Xsf$, the set of measurable functions $v \in \RR^\Xsf$ with $\| v \|_\ell < \infty$. (Here $\| \cdot \|_\ell$ denotes the $\ell$-weighted supremum norm, as in Section A.5.3.5.)

7.1.1.5 Example: Optimal Savings

Consider the optimal savings problem studied in Section 1.3. The state is $w \in \RR_+$ and the action is $c \in \RR_+$. The feasible correspondence is $\Gamma(w) = [0, w]$ and $V \coloneq b\RR_+$ is the value space. We set

$$B(w, c, v) = u(c) + \beta \int v(R(w - c) + y) \phi(\diff y) \qquad (v \in V, \; 0 \leq c \leq w).$$

As in Assumption 1.3.1, we take $u$ to be bounded and continuous. Under these restrictions, the tuple $(\Gamma, V, B)$ is an RDP. The function $B$ is real-valued and the monotonicity condition (7.5) clearly holds. The consistency condition (7.6) also holds. By the definition of $\Gamma$, a policy is a Borel measurable map $\sigma \colon \RR_+ \to \RR_+$ with $0 \leq \sigma(w) \leq w$ for all $w$. Given any such policy and any $v \in b\RR_+$, the function

$$m(w) \coloneq u(\sigma(w)) + \beta \int v(R(w - \sigma(w)) + y) \phi(\diff y)$$

is measurable and bounded (since $u$ is bounded and continuous and $v$ is bounded).

For this model, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from (7.2) agrees with the optimal savings Bellman equation in (1.98).

7.1.1.6 Example: Savings with Kreps--Porteus Expectations

Consider a variation of the optimal savings model from Section 7.1.1.5 with Epstein--Zin-type preferences. To simplify the presentation, we set the EIS parameter to $\psi = \infty$, so that the CES aggregator reduces to addition, while retaining the nonlinear Kreps--Porteus expectation over future values. The Bellman equation becomes

$$v(w) = \max_{0 \leq c \leq w} \left\{ (1-\beta) u(c) + \beta \left( \int v(R(w-c) + y)^{1-\gamma} \phi(\diff y) \right)^{\frac{1}{1-\gamma}} \right\} \tag{7.16}$$

where $\gamma > 0$ with $\gamma \neq 1$ is the coefficient of relative risk aversion.

As before, the state is $w \in \RR_+$, the action is $c \in \RR_+$, and $\Gamma(w) = [0, w]$. To avoid raising zero to a negative power, we assume that $u$ is measurable and that there exist constants $0 < \underline{u} \leq \bar{u} < \infty$ with $\underline{u} \leq u(c) \leq \bar{u}$ for all $c \in \RR_+$. The aggregator is

$$B(w, c, v) = (1-\beta) u(c) + \beta \left( \int v(R(w-c) + y)^{1-\gamma} \phi(\diff y) \right)^{\frac{1}{1-\gamma}}.$$

We set the value space $V$ to be all measurable functions $v \colon \RR_+ \to [\underline{u}, \bar{u}]$.

With $B$ defined as above, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from (7.2) agrees with (7.16).
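A short bounding argument shows that $B$ maps $V$ into the interval $[\underline{u}, \bar{u}]$, which is the substantive part of the consistency condition (7.6). The derivation below is a sketch that sets measurability aside. It assumes, as in the text, that $\gamma > 0$, $\gamma \neq 1$, $\underline{u} \leq u \leq \bar{u}$ and $v \in V$. The shorthand $K_v$ is introduced here only for convenience.

```latex
% Kreps--Porteus term, for v with values in [\underline{u}, \bar{u}] (so v > 0):
K_v(w, c) \coloneq \left( \int v(R(w-c) + y)^{1-\gamma} \phi(\diff y) \right)^{\frac{1}{1-\gamma}}.
% t \mapsto t^{1-\gamma} and s \mapsto s^{1/(1-\gamma)} are both increasing (\gamma < 1)
% or both decreasing (\gamma > 1), so K_v is monotone in v; it also maps constants to
% themselves. Combining this with \underline{u} \leq u(c) \leq \bar{u}:
\underline{u} \leq v \leq \bar{u}
  \;\implies\; \underline{u} \leq K_v(w, c) \leq \bar{u}
  \;\implies\; \underline{u} = (1-\beta)\underline{u} + \beta \underline{u}
      \leq B(w, c, v) \leq (1-\beta)\bar{u} + \beta \bar{u} = \bar{u}.
```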

7.1.1.7 Example: MDPs with Modified Rewards

Some authors use an MDP framework where current rewards depend on the next period state, so that the Bellman equation has the form

$$v(x) = \max_{a \in \Gamma(x)} \sum_{x'} \left\{ r(x, a, x') + \beta v(x') \right\} P(x, a, x') \qquad (x \in \Xsf). \tag{7.19}$$

Here $r$ maps $\Gsf \times \Xsf$ to $\RR$ and other primitives are unchanged. We take $V = \RR^\Xsf$, $\Gamma$ as given, and set

$$B(x, a, v) = \sum_{x'} \left\{ r(x, a, x') + \beta v(x') \right\} P(x, a, x').$$

Evidently, for the associated RDP $(\Gamma, V, B)$, the monotonicity and consistency conditions (7.5) and (7.6) both hold. For this choice of $B$, the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ agrees with the modified MDP Bellman equation in (7.19).
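Because $\sum_{x'} P(x, a, x') = 1$, this aggregator agrees with the standard finite-MDP aggregator of Section 7.1.1.2 once current rewards are replaced by their conditional means $\bar r(x, a) \coloneq \sum_{x'} r(x, a, x') P(x, a, x')$. The Python sketch below checks this numerically on randomly generated primitives. The sizes, seed and helper names are hypothetical.

```python
import random

random.seed(42)
nx, na, beta = 4, 3, 0.95        # hypothetical sizes and discount factor

def random_distribution(n):
    # Random probability vector of length n.
    w = [random.random() for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]

# r3[x][a][y] = r(x, a, x') and P[x][a][y] = P(x, a, x').
r3 = [[[random.uniform(-1, 1) for _ in range(nx)] for _ in range(na)] for _ in range(nx)]
P = [[random_distribution(nx) for _ in range(na)] for _ in range(nx)]
v = [random.uniform(-5, 5) for _ in range(nx)]

def B_modified(x, a, v):
    # Aggregator with rewards that depend on the next-period state.
    return sum((r3[x][a][y] + beta * v[y]) * P[x][a][y] for y in range(nx))

def r_bar(x, a):
    # Expected current reward conditional on (x, a).
    return sum(r3[x][a][y] * P[x][a][y] for y in range(nx))

def B_standard(x, a, v):
    # Standard finite-MDP aggregator with rewards r_bar.
    return r_bar(x, a) + beta * sum(v[y] * P[x][a][y] for y in range(nx))

max_gap = max(abs(B_modified(x, a, v) - B_standard(x, a, v))
              for x in range(nx) for a in range(na))
print(max_gap)   # zero up to floating-point rounding
```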

7.1.1.8 Example: Risk-Sensitive Preferences

In Example 6.1.6 we discussed a risk-sensitive MDP with entropic certainty equivalent. This model can be embedded in the RDP framework by setting $V = \RR^\Xsf$, $\Gamma$ as given, and

$$B(x, a, v) \coloneq r(x, a) + \frac{\beta}{\theta} \ln \left[ \sum_{x'} \exp(\theta v(x')) P(x,a,x') \right]. \tag{7.21}$$

The parameter $\theta$ is any nonzero real value.

When $B$ is given by (7.21), the RDP Bellman equation $v(x) = \max_{a \in \Gamma(x)} B(x, a, v)$ from (7.2) becomes

$$v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \frac{\beta}{\theta} \ln \left[ \sum_{x'} \exp(\theta v(x')) P(x,a,x') \right] \right\}.$$
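A minimal Python sketch of the risk-sensitive aggregator (7.21) on a hypothetical two-state, two-action model follows. It computes fixed points of the Bellman operator by successive approximation. The log term is evaluated with a log-sum-exp shift; this is an implementation choice to avoid overflow, not something the text prescribes. By Jensen's inequality, the log term lies below the expectation when $\theta < 0$ and above it when $\theta > 0$, and it approaches the expectation as $\theta \to 0$.

```python
import math

# Hypothetical two-state, two-action primitives for the risk-sensitive RDP.
beta = 0.9
states, actions = [0, 1], [0, 1]
r = [[1.0, 0.0], [0.0, 2.0]]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]

def risk_sensitive_B(x, a, v, theta):
    # r(x,a) + (beta/theta) * ln sum_y exp(theta * v(y)) P(x,a,y), computed with
    # a log-sum-exp shift by m = max_y theta * v(y) so exp() cannot overflow.
    support = [y for y in states if P[x][a][y] > 0]
    m = max(theta * v[y] for y in support)
    s = sum(P[x][a][y] * math.exp(theta * v[y] - m) for y in support)
    return r[x][a] + beta * (m + math.log(s)) / theta

def expected_B(x, a, v):
    # Risk-neutral benchmark: r(x,a) + beta * sum_y v(y) P(x,a,y).
    return r[x][a] + beta * sum(v[y] * P[x][a][y] for y in states)

def T(v, theta):
    # Bellman operator for the risk-sensitive aggregator.
    return [max(risk_sensitive_B(x, a, v, theta) for a in actions) for x in states]

def fixed_point(theta, tol=1e-10):
    # Successive approximation starting from v = 0.
    v = [0.0 for _ in states]
    while True:
        v_new = T(v, theta)
        if max(abs(p - q) for p, q in zip(v_new, v)) < tol:
            return v_new
        v = v_new

v = [3.0, -1.0]
for theta in (-2.0, 1e-6, 2.0):
    print(theta, risk_sensitive_B(0, 1, v, theta), expected_B(0, 1, v))
```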

7.1.2 RDPs vs LDPs vs ADPs

As mentioned at the start of the chapter, we have MDPs $\subset$ LDPs $\subset$ RDPs $\subset$ ADPs, and the inclusions are all strict. We already know that the first inclusion is strict (consider, for example, the LDP with state-dependent discounting in Section 6.1.4). Here we review the remaining relationships.

7.1.2.1 LDPs are RDPs

Let $(\Gamma, r, K)$ be an LDP with state and action spaces $\Xsf$, $\Asf$, as defined in Section 6.1.2.1. Setting $V = b\Xsf$ and

$$B(x, a, v) = r(x, a) + \int v(x') K(x,a, \diff x') \qquad ((x,a) \in \Gsf, \; v \in V),$$

the resulting tuple $(\Gamma, V, B)$ is an RDP. To see this, note that $V = b\Xsf$ is a subset of $\RR^\Xsf$, so we only need to check the monotonicity and consistency conditions for the aggregator $B$.

For monotonicity, fix $(x, a) \in \Gsf$ and $v, w \in V$ with $v \leq w$. Since $K(x, a, \cdot)$ is a nonnegative measure, we have $\int v(x') K(x, a, \diff x') \leq \int w(x') K(x, a, \diff x')$ and hence $B(x, a, v) \leq B(x, a, w)$.

For consistency, fix $\sigma \in \Sigma$ and $v \in b\Xsf$. We need to show that $m(x) \coloneq B(x, \sigma(x), v)$ is in $b\Xsf$. This follows from the LDP conditions, which require $r \in b\Gsf$ and $Kv \in b\Gsf$ whenever $v \in b\Xsf$, together with measurability of $x \mapsto (x, \sigma(x))$.

The risk-sensitive MDP in Section 7.1.1.8 is an RDP but not an LDP, since the aggregator is nonlinear in future values.

7.1.2.2 RDPs are ADPs

Every RDP generates an ADP. To see this, let $(\Gamma, V, B)$ be an RDP with state space $\Xsf$ and action space $\Asf$. The set $V$ is paired with the pointwise partial order. With $\Sigma$ as the set of feasible policies and given $\sigma$ in $\Sigma$, we define $T_\sigma$ by

$$(T_\sigma \, v)(x) = B(x, \sigma(x), v) \qquad (x \in \Xsf, \; v \in V).$$

The monotonicity and consistency conditions in (7.5)--(7.6) imply that $T_\sigma$ is an order-preserving self-map on $V$. Hence, with $\TT$ as the set of all policy operators, the pair $(V, \TT)$ is an ADP. We call $(V, \TT)$ the ADP generated by $(\Gamma, V, B)$.
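The construction of $T_\sigma$ from $B$ is mechanical, as the following Python sketch shows. It uses a hypothetical two-state model with the nonlinear risk-sensitive aggregator of Section 7.1.1.8 ($\theta = -1$). The sketch spot-checks order preservation on random pairs $v \leq w$; this is a numerical check, not a proof. It then approximates the fixed point of $T_\sigma$ by successive approximation.

```python
import math
import random

# Hypothetical finite RDP: two states, two actions and the nonlinear
# risk-sensitive aggregator of Section 7.1.1.8 with theta = -1.
beta, theta = 0.9, -1.0
states = [0, 1]
r = [[1.0, 0.0], [0.0, 2.0]]
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]

def B(x, a, v):
    s = sum(P[x][a][y] * math.exp(theta * v[y]) for y in states)
    return r[x][a] + beta * math.log(s) / theta

def policy_operator(sigma):
    # Build T_sigma from B: (T_sigma v)(x) = B(x, sigma(x), v).
    def T_sigma(v):
        return [B(x, sigma[x], v) for x in states]
    return T_sigma

def fixed_point_of(op, tol=1e-10):
    # Successive approximation of the fixed point of a self-map on R^X.
    v = [0.0 for _ in states]
    while True:
        v_new = op(v)
        if max(abs(p - q) for p, q in zip(v_new, v)) < tol:
            return v_new
        v = v_new

random.seed(0)
T_sigma = policy_operator([0, 1])   # sigma(0) = 0, sigma(1) = 1
order_preserved = True
for _ in range(1000):
    v = [random.uniform(-10.0, 10.0) for _ in states]
    w = [vi + random.uniform(0.0, 5.0) for vi in v]   # w >= v pointwise
    if any(tv > tw + 1e-12 for tv, tw in zip(T_sigma(v), T_sigma(w))):
        order_preserved = False

v_sigma = fixed_point_of(T_sigma)
print(order_preserved, v_sigma)
```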

For ADPs generated by RDPs, we can provide intuitive representations of greedy policies and the Bellman equation. For example, we recall from our ADP definition in (2.1) that a policy $\sigma \in \Sigma$ is $v$-greedy for ADP $(V, \TT)$ if $T_\tau \, v \leq T_\sigma \, v$ for all $\tau \in \Sigma$. If $(V, \TT)$ is generated by $(\Gamma, V, B)$, then this is equivalent to the statement that

$$B(x, \tau(x), v) \leq B(x, \sigma(x), v) \quad \text{for all } \tau \in \Sigma \text{ and } x \in \Xsf.$$

Also, we recall that the ADP Bellman operator is defined by $Tv = \bigvee_\sigma T_\sigma \, v$ whenever the supremum exists. When $(V, \TT)$ is generated by $(\Gamma, V, B)$, this is equivalent to the statement $(Tv)(x) = \sup_{\sigma \in \Sigma} B(x, \sigma(x), v)$ for all $x \in \Xsf$ whenever the pointwise supremum exists (see Exercise A.4). Under reasonable conditions on $\Gamma$ and $B$, we will show that this can be improved to the stronger form

$$(Tv)(x) = \max_{a \in \Gamma(x)} B(x, a, v) \qquad (x \in \Xsf)$$

(see Lemma 7.1.1 and Lemma 7.1.2).

Since every RDP generates an ADP, we can use ADP optimality results to study RDPs. Given an RDP $(\Gamma, V, B)$ and its generated ADP $(V, \TT)$, we make the obvious connections, saying that

7.1.2.3 Not all ADPs are RDPs

Although the RDP framework is broad, there are significant dynamic programs that fall outside this framework.

In the examples above, it is possible to rearrange the problem so that the $\max$ operator is shifted to the outside and, thereby, construct a version that fits the RDP framework. But there are good reasons to avoid this, related to smoothness and dimensionality (see, e.g., Kristensen et al. (2021) or Rust (1994)).

7.1.3 Existence of Greedy Policies

As always, existence of greedy policies is important for our analysis. In this section, we investigate RDP environments where greedy policies exist. We begin with finite action spaces and then move to the general case.

7.1.3.1 Finite Actions

We begin with the discrete choice setting, where greedy policies always exist.

Note that, in the simple setting of Lemma 7.1.1, the Bellman equation takes the form of (7.2). Below we investigate more complex settings where this is still true.

7.1.3.2 Continuous Actions

Now we drop the finiteness restriction on $\Asf$. In these general RDP settings, existence is less trivial. Here we state one useful result.

7.2 Optimality Results

Let’s put together some sufficient conditions for optimality of RDP models. We will focus here on models that are naturally contracting. This permits us to handle dynamic programs with both bounded and unbounded rewards.

We begin in Section 7.2.1 with the bounded case, where the aggregator $B$ is bounded and satisfies a Blackwell-type discounting condition. In Section 7.2.2 we extend to potentially unbounded rewards using weighted contractions. Finally, Section 7.2.3 investigates properties of solutions, giving sufficient conditions for the value function to be monotone or concave and for the optimal policy to be unique and continuous.

7.2.1 Bounded Contractions

RDPs have strong optimality properties when they uniformly contract values. The current section investigates this case. Throughout this section, we assume that values are bounded. This typically occurs when reward functions are bounded. Later, in Section 7.2.2, we will consider unbounded settings.

7.2.1.1 Framework

Let $\Xsf, \Asf$ be separable metric spaces, let $\Gamma$ be a nonempty correspondence from $\Xsf$ to $\Asf$, and let $\Gsf$ be the feasible state-action pairs (see Section 7.1.1.1). Set $V = b\Xsf$. Let $B \colon \Gsf \times V \to \RR$ be a given function such that

The tuple $(\Gamma, V, B)$ is an RDP. To see this, note that the monotonicity condition (7.5) is given by the second restriction on $B$ above. For the consistency condition (7.6), fix $\sigma \in \Sigma$ and $v \in V$. The function $x \mapsto B(x, \sigma(x), v)$ has two properties:

  1. It is measurable, since $\sigma$ is measurable and $(x,a) \mapsto B(x,a,v)$ is measurable on $\Gsf$.

  2. It is bounded, since $B$ is bounded by Assumption 7.2.1.

Hence $B(\cdot, \sigma(\cdot), v) \in b\Xsf = V$.

7.2.1.2 Finite Actions

We seek optimality results for the RDP $(\Gamma, V, B)$ introduced in Section 7.2.1.1. The simplest case is when the choice set is always finite. Also note that, in this setting, a policy $\sigma$ is $v$-greedy if and only if (7.32) holds.

7.2.1.3 The Continuous Case

We now drop the finiteness assumption, while continuing to work with the RDP $(\Gamma, V, B)$ introduced in Section 7.2.1.1. In place of finiteness, we consider two continuity conditions on $B$.

The main result of this section is as follows.

7.2.2 Weighted Contractions

In Section 7.2.1, we considered RDPs that are both contracting and bounded. Some useful RDPs fail to have this boundedness property. Here we extend our results to potentially unbounded problems that still retain contractivity. (After minor modifications, the results obtained in Section 7.2.1 are special cases of the results presented here. We present them separately in order to provide simple sufficient conditions in the bounded case.)

When maximizing, the theory works best for problems where rewards are unbounded above and bounded below. (One approach to the reverse type of unboundedness can be found in Ma et al. (2022).) Because we focus on such problems, we will typically assume that rewards are nonnegative. This costs no generality in such settings, since optimal policies are invariant to additive shifts.

Throughout this section, $\ell$ is a weight function on $\Xsf$, $\| \cdot \|_\ell$ denotes the $\ell$-weighted supremum norm, and $b_\ell \Xsf$ is the set of all $f \colon \Xsf \to \RR$ with $f/\ell \in b\Xsf$. See Section A.5.3.5 for background and discussion of weight functions and the space $b_\ell \Xsf$.
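To make the weighted norm concrete, the Python sketch below evaluates $\| f \|_\ell = \sup_x |f(x)| / \ell(x)$ on a finite grid, where the supremum becomes a maximum. The weight $\ell(x) = 1 + x$ and the function $f(x) = x + \sqrt{x}$ are hypothetical choices. Here $f$ is unbounded on $\RR_+$, yet $f/\ell$ is bounded with supremum $(1 + \sqrt{2})/2$, so $f \in b_\ell \Xsf$ when $\Xsf = \RR_+$.

```python
import math

# Minimal sketch: the ell-weighted supremum norm evaluated on a finite grid.
# The weight ell and the function f are hypothetical illustrative choices.
grid = [i * 0.5 for i in range(2001)]           # points 0, 0.5, ..., 1000

def ell(x):
    return 1.0 + x                               # a weight function with ell >= 1

def f(x):
    return x + math.sqrt(x)                      # unbounded on R_+, but f / ell is bounded

def weighted_sup_norm(g, weight, points):
    # ||g||_ell = sup_x |g(x)| / weight(x); on a finite grid this is a max.
    return max(abs(g(x)) / weight(x) for x in points)

sup_norm = max(abs(f(x)) for x in grid)          # grows without bound as the grid extends
ell_norm = weighted_sup_norm(f, ell, grid)       # stays below (1 + sqrt(2)) / 2
print(sup_norm, ell_norm)
```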

7.2.2.1 Framework

Let $\Gamma$ be a nonempty correspondence from $\Xsf$ to $\Asf$ and let $\Gsf$ be the feasible state-action pairs (see Section 7.1.1.1). Set $V = b_\ell \Xsf_+$, so that $V$ is the set of nonnegative functions in $b_\ell \Xsf$. Let $B \colon \Gsf \times V \to \RR_+$ be a given function. (In the current setting, where $B$ can be unbounded, we restrict attention to the case where $B$ is nonnegative. Since the weighted contraction approach pursued here works best for rewards that are unbounded above but bounded below, imposing nonnegativity costs very little in the way of generality.)

We suppose that

We also require two conditions related to contractivity and $\ell$-boundedness:

Consider the tuple $(\Gamma, V, B)$.

7.2.2.2 Finite Actions

We seek optimality results for the RDP $(\Gamma, V, B)$ introduced in Lemma 7.2.3. The simplest case is when the choice set is always finite:

In this case we have the following result:

7.2.2.3 The Continuous Case

We now drop the finiteness assumption, while continuing to work with the RDP $(\Gamma, V, B)$ introduced in Lemma 7.2.3.

7.2.3 Properties of Solutions

In this section, we seek sufficient conditions for the value and policy functions to have useful shape and continuity properties. We adopt the setting of Proposition 7.2.5 and study the properties of the RDP $(\Gamma, V, B)$ discussed in that result. In the proofs below, we repeatedly use Lemma A.2.6.

7.2.3.1 Monotone Values

First, we seek conditions under which the value function is increasing. In addition to the conditions in Proposition 7.2.5, we suppose that $\Xsf$ is partially ordered by $\preceq$. Let

$$ib_\ell c\Xsf_+ \coloneq \text{ the set of increasing functions in } b_\ell c\Xsf_+.$$

Both conditions in Assumption 7.2.8 are monotonicity conditions. The first is equivalent to stating that $\Gamma$ is order preserving when viewed as a map from $(\Xsf, \preceq)$ to $(\wp(\Asf), \subset)$. Here $\wp(\Asf)$ is the set of all subsets of $\Asf$ and $\subset$ is the partial order induced by set inclusion (Example A.1.2).

7.2.3.2 Concavity

Next we seek sufficient conditions for the value function to be concave. In this section, we assume that both $\Xsf$ and $\Asf$ are convex subsets of a vector space.

The convexity requirement on $\Gsf$ in Assumption 7.2.9 is equivalent to the following statement. For all $x, x'$ in $\Xsf$, all $a \in \Gamma(x)$, all $a' \in \Gamma(x')$ and all $\lambda \in [0, 1]$, we have

$$\lambda a + (1-\lambda) a' \in \Gamma(\lambda x + (1-\lambda) x').$$

By taking $x = x'$, we see that each set $\Gamma(x)$ is convex in $\Asf$.

7.2.3.3 Uniqueness and Continuity

When the conditions of Proposition 7.2.5 are in force, we know that at least one optimal policy exists in $\Sigma$. The question we ask now is: when is it unique? Not surprisingly, uniqueness can be obtained with a form of strict concavity.

7.2.4 Digression on Certainty Equivalents

Before continuing with the theory of RDPs, it will be helpful to review risk measures and certainty equivalents. These concepts are in one-to-one correspondence: we convert between them by flipping signs. Certainty equivalents can be understood as extensions of mathematical expectation that include attitudes towards risk. In later sections, we will tie the discussion of risk measures and certainty equivalents back into RDP theory and its applications.

(The existence of parallel literatures on risk measures and certainty equivalents reflects the fact that researchers in finance and engineering often think about minimizing risk, while economists typically concern themselves with maximizing rewards.[1] In this book, we tend to work with certainty equivalents, although the following discussion will allow readers to translate between the two.)

Throughout the following discussion, the triple $(\Omega, \fF, \PP)$ is a probability space and $L_\infty \coloneq L_\infty(\Omega, \fF, \PP)$ is the set of essentially bounded random variables on $(\Omega, \fF, \PP)$. That is, $L_\infty$ contains all random variables $Z$ admitting an $N \in \NN$ with $|Z| \leq N$ $\PP$-a.s.

In this setting, a risk measure is a map $\rR \colon L_\infty \to \RR$ satisfying

  1. Monotonicity: If $Z, Z' \in L_\infty$ and $Z \leq Z'$ $\PP$-a.s., then $\rR(Z') \leq \rR(Z)$.

  2. Cash invariance: $\rR(Z + a) = \rR(Z) - a$ for all $Z \in L_\infty$ and $a \in \RR$.

A certainty equivalent is a map $\eE \colon L_\infty \to \RR$ satisfying

  1. Monotonicity: If $Z, Z' \in L_\infty$ and $Z \leq Z'$ $\PP$-a.s., then $\eE(Z) \leq \eE(Z')$.

  2. Cash invariance: $\eE(Z + a) = \eE(Z) + a$ for all $Z \in L_\infty$ and $a \in \RR$.

(Note that the meaning of monotonicity and cash invariance changes from $\rR$ to $\eE$.)
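For random variables with finitely many outcomes, both definitions can be checked directly. The Python sketch below represents $Z$ by hypothetical outcomes `z` and probabilities `p`. It uses two certainty equivalents discussed below, expectation and the essential infimum, and forms the corresponding risk measures by flipping signs. It then asserts cash invariance for one shift and monotonicity for one pointwise-larger variable. This illustrates the definitions rather than proving anything.

```python
# Minimal sketch: certainty equivalents and risk measures for a random variable
# with finitely many outcomes z[i] occurring with probabilities p[i].
# The distribution and the shift below are hypothetical.

def expectation(z, p):
    return sum(zi * pi for zi, pi in zip(z, p))

def ess_inf(z, p):
    # Smallest outcome that occurs with positive probability.
    return min(zi for zi, pi in zip(z, p) if pi > 0)

def risk_measure_from(ce):
    # Flip signs: R(Z) = -E(Z).
    return lambda z, p: -ce(z, p)

z = [-10.0, 0.0, 5.0, 20.0]
p = [0.1, 0.2, 0.3, 0.4]
a = 3.0
z_shift = [zi + a for zi in z]                     # Z + a
z_bigger = [zi + 1.0 for zi in z[:2]] + z[2:]      # Z' >= Z outcome by outcome

for ce in (expectation, ess_inf):
    R = risk_measure_from(ce)
    # Cash invariance: E(Z + a) = E(Z) + a and R(Z + a) = R(Z) - a.
    assert abs(ce(z_shift, p) - (ce(z, p) + a)) < 1e-12
    assert abs(R(z_shift, p) - (R(z, p) - a)) < 1e-12
    # Monotonicity: Z <= Z' implies E(Z) <= E(Z') and R(Z') <= R(Z).
    assert ce(z, p) <= ce(z_bigger, p)
    assert R(z_bigger, p) <= R(z, p)

print(expectation(z, p), ess_inf(z, p))
```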

We now define several significant subclasses of risk measures and state the corresponding properties of the associated certainty equivalent $\eE = -\rR$.

A risk measure $\rR$ is called convex if

$$\rR(\lambda Z + (1 - \lambda) Z') \leq \lambda\, \rR(Z) + (1 - \lambda)\, \rR(Z')$$

for all $Z, Z' \in L_\infty$ and $\lambda \in [0, 1]$. Obviously, $\rR$ is convex if and only if its negation $\eE \coloneq -\rR$ is concave:

$$\eE(\lambda Z + (1 - \lambda) Z') \geq \lambda\, \eE(Z) + (1 - \lambda)\, \eE(Z').$$

Concavity of $\eE$ captures the idea that diversification is weakly preferred.

A risk measure $\rR$ is called coherent if it is convex and positively homogeneous, meaning that $\rR(\lambda Z) = \lambda\, \rR(Z)$ for all $Z \in L_\infty$ and $\lambda > 0$. The certainty equivalent $\eE = -\rR$ is then concave and positively homogeneous. Together, these two properties imply that $\eE$ is superadditive.

7.2.4.1 Duality

There is a dual representation theorem for convex risk measures, originally due to Föllmer & Schied (2002), that helps us interpret and manipulate these functionals. Here we restate their result in terms of concave certainty equivalents. In doing so, we will restrict attention to the law invariant case; that is, the case where $\eE(Z)$ depends only on the distribution of $Z$ for all $Z \in L_\infty$.[2]

In the theorem statement, $P_Z$ is $\PP \circ Z^{-1}$, the distribution of $Z$. The infimum is over all $Q \in \pP(\RR)$ such that $Q$ is absolutely continuous with respect to $P_Z$. The constraint $Q \ll P_Z$ means that if $P_Z$ says an event is impossible, then $Q$ must also say it is impossible.

One way to interpret (7.50) is in terms of an adversarial agent who chooses $Q$ to minimize the expected return $\EE_Q[Z]$, while being constrained by a penalty term $\alpha(Q)$. This is the robust optimization point of view: the agent makes choices that are robust to variations by a real or fictitious adversary. The penalty function $\alpha$ controls how far the adversary is able to deviate from the reference model $P_Z$. From this perspective, the absolute continuity condition $Q \ll P_Z$ means that the adversary is allowed to disagree about how likely different scenarios are, but not about which scenarios are conceivable.

A second interpretation involves ambiguity. The agent does not know the true model and $P_Z$ is only a reference point. The agent’s cautious reasoning forces him to entertain a range of plausible models. The penalty term $\alpha(Q)$ reflects how implausible $Q$ is relative to $P_Z$. The absolute continuity constraint defines what the agent considers to be possible, that is, the set of scenarios that could actually occur.

7.2.4.2 Examples of Certainty Equivalents

Let’s look at examples, focusing primarily on certainty equivalents. Throughout this discussion, $Z$ is an element of $L_\infty$, $P_Z$ is its distribution, $F_Z$ is its cdf, and $F_Z^{-1}$ is the inverse cdf.

The simplest certainty equivalent is mathematical expectation: $\eE(Z) = \EE[Z]$. This corresponds to risk neutrality: the agent is indifferent between any random variable and its mean. The other extreme is the pessimistic certainty equivalent

$$\eE_p(Z) \coloneq \operatorname{ess\,inf} Z = \sup \setntn{a \in \RR}{\PP\{Z < a\} = 0}.$$

We can think of $\eE_p(Z)$ as the left-hand end point of the support of $Z$. For the pessimistic certainty equivalent, the dual representation (7.50) becomes

$$\eE_p(Z) = \inf_{Q \ll P_Z} \EE_Q[Z].$$

Both of these examples are coherent.

Another example is the $\alpha$-quantile certainty equivalent

$$\qQ_\alpha(Z) = F_Z^{-1}(\alpha), \qquad \alpha \in (0, 1).$$

The value $\qQ_\alpha(Z)$ is the $\alpha$-quantile of $Z$. The corresponding risk measure $\rR_\alpha = - \qQ_\alpha$ is just $\alpha$-level value-at-risk (VaR).

VaR admits some pathologies. For example, VaR is not convex, and hence can increase under diversification. These deficiencies have motivated the introduction of conditional value at risk (CVaR) (also called average value at risk, or expected shortfall), defined as

$$\rR_\alpha(Z) = -\frac{1}{\alpha} \int_0^\alpha F_Z^{-1}(t)\, dt, \qquad \alpha \in (0, 1].$$

The corresponding CVaR certainty equivalent is $\eE_\alpha(Z) = - \rR_\alpha(Z)$. It is interpreted as the mean of the $\alpha$-tail of the distribution of $Z$, that is, the average over the worst $\alpha$-fraction of outcomes. The CVaR certainty equivalent is coherent and admits the dual representation

$$\eE_\alpha(Z) = \inf\, \left\{ \EE_Q[Z] \,:\, Q \ll P_Z,\; \frac{\diff Q}{\diff P_Z} \leq \frac{1}{\alpha} \right\}.$$

The parameter $\alpha$ interpolates between the two previous cases:

$$\alpha = 1 \implies \eE_1(Z) = \EE[Z], \qquad \alpha \to 0 \implies \eE_\alpha(Z) \to \operatorname{ess\,inf} Z.$$
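For a random variable with finitely many outcomes, $F_Z^{-1}$ is a step function, so both the quantile and the CVaR integral can be computed exactly. The Python sketch below uses the left-continuous inverse $F_Z^{-1}(t) = \inf\{z : F_Z(z) \geq t\}$. The outcomes and probabilities are hypothetical. The small tolerance in `quantile_ce` only guards against rounding in the cumulative sums.

```python
# Minimal sketch: alpha-quantile and CVaR certainty equivalents for a random
# variable with finitely many outcomes. The data below are hypothetical.

def _sorted_atoms(z, p):
    # Sort (outcome, probability) pairs by outcome, ascending.
    pairs = sorted(zip(z, p))
    return [zi for zi, _ in pairs], [pi for _, pi in pairs]

def quantile_ce(z, p, alpha):
    # Left-continuous inverse cdf: the smallest z with F_Z(z) >= alpha.
    zs, ps = _sorted_atoms(z, p)
    cum = 0.0
    for zi, pi in zip(zs, ps):
        cum += pi
        if cum >= alpha - 1e-12:     # tolerance for rounding in cum
            return zi
    return zs[-1]

def cvar_ce(z, p, alpha):
    # (1/alpha) * integral_0^alpha F_Z^{-1}(t) dt. For a discrete law,
    # F_Z^{-1}(t) equals the k-th smallest outcome on (F_{k-1}, F_k].
    zs, ps = _sorted_atoms(z, p)
    total, cum = 0.0, 0.0
    for zi, pi in zip(zs, ps):
        width = max(0.0, min(cum + pi, alpha) - cum)
        total += zi * width
        cum += pi
    return total / alpha

z = [20.0, -10.0, 5.0, 0.0]
p = [0.4, 0.1, 0.3, 0.2]
cvar_values = [cvar_ce(z, p, alpha) for alpha in (0.05, 0.3, 0.6, 1.0)]
print(quantile_ce(z, p, 0.5), cvar_values)
```

With these numbers the CVaR certainty equivalent rises from $-10$ (the essential infimum) at $\alpha = 0.05$ to the mean $8.5$ at $\alpha = 1$.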

Another important case, already discussed in Chapter 1, is the entropic certainty equivalent

$$\eE_\gamma(Z) = -\frac{1}{\gamma} \ln \EE\, \left[\exp(-\gamma Z)\right] \qquad (\gamma > 0).$$

This equivalent is concave but not coherent. The dual representation is

$$\eE_\gamma(Z) = \inf_{Q \ll P_Z} \left\{ \EE_Q[Z] + \frac{1}{\gamma}\, D_{\mathrm{KL}}(Q \,\|\, P_Z) \right\},$$

where $D_{\mathrm{KL}}(Q \,\|\, P_Z) = \EE_Q\!\left[\ln \frac{\diff Q}{\diff P_Z}\right]$ is the Kullback--Leibler divergence. The parameter $\gamma$ controls the degree of risk aversion and interpolates between risk neutrality and the worst case. In particular, $\gamma \to 0$ implies $\eE_\gamma(Z) \to \EE[Z]$, while $\gamma \to \infty$ implies $\eE_\gamma(Z) \to \operatorname{ess\,inf}\, Z$.
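The limiting behavior in $\gamma$ is easy to see numerically. The Python sketch below computes $\eE_\gamma$ for a hypothetical four-point distribution with mean $8.5$ and essential infimum $-10$. Every exponent is shifted by the smallest outcome; this implementation choice keeps `math.exp` from overflowing for large $\gamma$.

```python
import math

def expectation(z, p):
    return sum(zi * pi for zi, pi in zip(z, p))

def entropic_ce(z, p, gamma):
    # E_gamma(Z) = -(1/gamma) ln E[exp(-gamma Z)], gamma > 0.
    # Shifting by m = smallest outcome keeps every exponent <= 0.
    m = min(zi for zi, pi in zip(z, p) if pi > 0)
    s = sum(pi * math.exp(-gamma * (zi - m)) for zi, pi in zip(z, p) if pi > 0)
    return m - math.log(s) / gamma

z = [-10.0, 0.0, 5.0, 20.0]    # hypothetical outcomes
p = [0.1, 0.2, 0.3, 0.4]       # hypothetical probabilities
for gamma in (1e-6, 0.1, 1.0, 10.0, 1000.0):
    print(f"gamma = {gamma:>8}: E_gamma(Z) = {entropic_ce(z, p, gamma):.4f}")
```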

It is worth noting here that the Kreps--Porteus expectation $\kK(Z) \coloneq (\EE[Z^{1-\gamma}])^{1/(1-\gamma)}$ is not a certainty equivalent, at least according to our definition. While monotonicity holds, $\kK$ fails cash invariance. This is, in essence, why Bellman and policy operators based on Epstein--Zin preferences often fail to be contractions. We discuss Kreps--Porteus expectations again in Section 7.3.4.
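A two-point example makes the failure concrete. The Python sketch below uses hypothetical outcomes $1$ and $4$ with equal probability and $\gamma = 2$, in which case $\kK$ reduces to the harmonic mean. Then $\kK(Z) = 1.6$, but adding $a = 1$ to every outcome raises $\kK$ to $1/0.35 \approx 2.857$, an increase of about $1.26$ rather than $1$. Scaling $Z$ by $2$, by contrast, doubles $\kK$.

```python
def kreps_porteus(z, p, gamma):
    # K(Z) = (E[Z^(1 - gamma)])^(1 / (1 - gamma)), for Z > 0 and gamma != 1.
    e = sum(pi * zi ** (1 - gamma) for zi, pi in zip(z, p))
    return e ** (1 / (1 - gamma))

z = [1.0, 4.0]                 # hypothetical outcomes
p = [0.5, 0.5]                 # hypothetical probabilities
gamma, a = 2.0, 1.0

k = kreps_porteus(z, p, gamma)                           # harmonic mean when gamma = 2
k_shift = kreps_porteus([zi + a for zi in z], p, gamma)  # K(Z + a)
k_scaled = kreps_porteus([2.0 * zi for zi in z], p, gamma)
print(k, k_shift, k_shift - k, k_scaled)
```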

7.2.4.3 Continuity

Let $\eE$ be a certainty equivalent on $L_\infty = L_\infty(\Omega, \fF, \PP)$. We call $\eE$ continuous if, given any uniformly bounded sequence $(Z_n)_{n \in \NN}$ in $L_\infty$ and any $Z \in L_\infty$, we have

$$\eE(Z_n) \to \eE(Z) \quad \text{whenever } Z_n \to Z \;\; \PP\text{-almost surely}.$$

In the definition above, uniform boundedness means that there exists an $M < \infty$ with $|Z_n| \leq M$ almost surely for all $n$. Continuity will be useful for the optimality theory developed below.

7.2.5 MDPs with Certainty Equivalents

We first set up a standard MDP framework with aggregator based on mathematical expectation and then replace the expectation with a general certainty equivalent, showing that the fundamental optimality results carry over when the certainty equivalent is continuous.

7.2.5.1MDP Framework

We begin by setting up a basic MDP framework that can then be adapted to add risk preferences. To this end, let X\Xsf and A\Asf be arbitrary metric spaces and let Γ\Gamma be a nonempty correspondence from X\Xsf to A\Asf. Let G={(x,a)X×A:aΓ(x)}\Gsf = \setntn{(x,a)\in \Xsf \times \Asf}{a \in \Gamma(x)}. Consider an RDP with feasible correspondence Γ\Gamma and aggregator

$$B(x, a, v) = r(x,a) + \beta \, \EE [ v(f(x,a,\xi)) ].$$

Here $\xi$ is a random element that takes values in a metric space $\Zsf$ and has distribution $\phi$, $f$ is a measurable function from $\Gsf \times \Zsf$ to $\Xsf$, and $r$ is a measurable function from $\Gsf$ to $\RR$. The discount factor $\beta$ obeys $0 \leq \beta < 1$. The value space is set to $b\Xsf$.

We can treat optimality and convergence of algorithms for the associated RDP $(\Gamma, b\Xsf, B)$ using Proposition 6.1.7 from Chapter 6. Here, however, we’ll extend the model to use arbitrary certainty equivalents in place of $\EE$. Results for $\EE$ will be a special case. The next section gives details.

7.2.5.2 Certainty Equivalents for MDPs

Let’s consider replacing the expectation in (7.61), which corresponds to risk neutrality over continuation values, with an arbitrary certainty equivalent $\eE$. The aggregator is now

$$B_{\eE}(x, a, v) \coloneq r(x,a) + \beta \, \eE [v(f(x,a,\xi))].$$

Other primitives are left unchanged. The value space continues to be $b\Xsf$. We assume throughout that $(x,a) \mapsto \eE [v(f(x,a,\xi))]$ is measurable on $\Gsf$ for each $v \in b\Xsf$. We consider the RDP $(\Gamma, b\Xsf, B_{\eE})$.
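
As a concrete illustration, the Bellman operator $T v(x) = \max_{a \in \Gamma(x)} \{ r(x,a) + \beta \, \eE [v(f(x,a,\xi))] \}$ can be sketched on a finite model. Here we assume that the law of $f(x,a,\xi)$ is given directly by a transition array `P[x, a]`. The model sizes, random primitives, and all function names are hypothetical. The certainty equivalent enters as a plug-in argument, and passing the expectation recovers the risk-neutral case. The test also checks numerically that the entropic operator has modulus $\beta$. This is what one expects from monotonicity plus cash invariance.

```python
import numpy as np

def expectation(v, q):
    # Risk-neutral benchmark: E[v(X')] when X' has distribution q
    return q @ v

def entropic(gamma):
    # Entropic certainty equivalent -(1/gamma) ln E[exp(-gamma v(X'))], gamma > 0
    def ce(v, q):
        m = v.min()                               # shift for numerical stability
        return m - np.log(q @ np.exp(-gamma * (v - m))) / gamma
    return ce

def bellman(v, r, P, beta, ce):
    """T v(x) = max_a { r(x, a) + beta * CE[v(X') | x, a] } on a finite model.
    r has shape (n, k); P[x, a] is the distribution of the next state."""
    n, k = r.shape
    cont = np.array([[ce(v, P[x, a]) for a in range(k)] for x in range(n)])
    vals = r + beta * cont
    return vals.max(axis=1), vals.argmax(axis=1)

def vfi(r, P, beta, ce, tol=1e-10, max_iter=10_000):
    v = np.zeros(r.shape[0])
    for _ in range(max_iter):
        v_new, sigma = bellman(v, r, P, beta, ce)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, sigma
        v = v_new
    raise RuntimeError("VFI did not converge")

# Hypothetical finite model: 5 states, 3 actions, random rewards and transitions
rng = np.random.default_rng(0)
n, k, beta = 5, 3, 0.9
r = rng.uniform(0.0, 1.0, size=(n, k))
P = rng.dirichlet(np.ones(n), size=(n, k))        # shape (n, k, n)

v_rn, sigma_rn = vfi(r, P, beta, expectation)     # risk neutral
v_ra, sigma_ra = vfi(r, P, beta, entropic(2.0))   # risk averse
```

By Jensen's inequality the entropic certainty equivalent lies below the mean, so the risk-averse value function is pointwise below the risk-neutral one.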

7.3 Applications

We now apply the RDP optimality theory developed in Section 7.2 to a range of dynamic programming problems. In Section 7.3.1 we revisit the optimal savings problem, this time with utility unbounded above, and verify the conditions of our weighted contraction results. We then study irreversible investment under risk neutrality (Section 7.3.2.1), risk aversion (Section 7.3.2.2), and ambiguity aversion (Section 7.3.3).

7.3.1 Optimal Savings with Utility Unbounded Above

Here we again consider the optimal savings model from Section 1.3, but without the boundedness restriction on $u$. In particular, we assume that $u$ is continuous, nonnegative, and increasing, and that

$$\ell(w) \coloneq \EE \sum_{t \geq 0} \delta^t u(\hat W_t) < \infty$$

for some $\delta \in (\beta, 1)$. Here $(\hat W_t)$ is defined recursively via $\hat W_{t+1} = R \hat W_t + Y_{t+1}$ with $\hat W_0 = w$ and $(Y_t) \iidsim \phi$. We can think of $(\hat W_t)$ as an upper bound process for wealth, achieved when consumption is always zero. As before, $\phi$ is a continuous density on $\RR_+$. Our aim is to provide conditions under which the conclusions of Proposition 7.2.5 apply.
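
The weight function $\ell$ can be estimated by simulating the zero-consumption process $(\hat W_t)$ and truncating the series. In the sketch below, the following are all hypothetical choices: $u(c) = \sqrt{c}$, $R = 1.01$, $\delta = 0.96$, lognormal income, the truncation horizon, and the name `ell_mc`. With these values $\delta \sqrt{R} < 1$. Since $\EE \sqrt{\hat W_t} \leq \sqrt{\EE \hat W_t}$ grows at most like $R^{t/2}$, this is enough for $\ell(w) < \infty$.

```python
import numpy as np

def ell_mc(w, R=1.01, delta=0.96, T=2_000, num_paths=1_000, seed=1234):
    """Monte Carlo estimate of ell(w) = E sum_{t >= 0} delta^t u(W_t), truncated at T terms,
    with u(c) = sqrt(c), W_{t+1} = R W_t + Y_{t+1}, W_0 = w and lognormal Y (all hypothetical)."""
    rng = np.random.default_rng(seed)        # fixed seed: common random numbers across w
    Y = rng.lognormal(mean=0.0, sigma=0.5, size=(num_paths, T))
    W = np.full(num_paths, float(w))
    total = np.zeros(num_paths)
    for t in range(T):
        total += delta**t * np.sqrt(W)       # add the discounted term delta^t u(W_t)
        W = R * W + Y[:, t]                  # zero consumption: all wealth carries over
    return total.mean()

for w in [0.0, 1.0, 10.0]:
    print(w, ell_mc(w))
```

Because the same shocks are reused for every $w$ and $u$ is increasing, the estimates are increasing in initial wealth, just like $\ell$ itself.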

We set $V = b_\ell \RR_+$, $\Gamma(w) = [0, w]$, and

$$B(w, c, v) = u(c) + \beta \int v(R(w - c) + y) \phi(\diff y).$$

The results above imply that Assumption 7.2.7 and Assumption 7.2.4 both hold. As a result, the conclusions of Proposition 7.2.5 apply. For example, the value function $\vmax$ exists, is an element of $b_\ell c \RR_+$, and satisfies

$$\vmax(w) = \max_{0 \leq c \leq w} \left\{ u(c) + \beta \int \vmax(R(w - c) + y) \phi(\diff y) \right\}$$

for all $w \geq 0$. Moreover, VFI, OPI, and HPI all converge.

Of the three algorithms, OPI and VFI are the easiest to implement. Figure 7.1 illustrates the runtime of OPI as a function of $m$ for this model with CRRA utility $u(c) = (c^{1-\gamma} - 1)/(1-\gamma)$ and $\gamma = 0.5$. (This $u$ equals $-2$ at $c = 0$. Adding the constant $1/(1-\gamma) = 2$ makes it nonnegative, as assumed above, without changing optimal policies.) The runtime of VFI is shown as a horizontal line. Since VFI is the special case of OPI with $m=1$, the leftmost point of the OPI curve coincides with the VFI runtime. The minimum is attained near $m = 40$, where OPI runs roughly six times faster than VFI. Runtime then rises linearly in $m$. The key message is that OPI dominates VFI over a wide range of $m$.

Figure 7.1: Runtime of OPI vs. VFI for the optimal savings model
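
The OPI scheme behind Figure 7.1 can be sketched on a discretized version of the model. The sketch works as follows:

- Wealth and savings share one grid.
- Income takes two values.
- Continuation values are linearly interpolated.
- Each outer step computes a greedy policy $\sigma$ and then applies $T_\sigma$ a total of $m$ times, so $m = 1$ is VFI.

All parameter values, the grid, and the function names are hypothetical and are not those behind the figure. The sketch counts outer iterations rather than measuring runtime.

```python
import numpy as np

# Hypothetical parameters, not those behind Figure 7.1
R, beta, gamma = 1.01, 0.96, 0.5
y_vals, y_probs = np.array([0.5, 1.5]), np.array([0.5, 0.5])   # two-point income shock
grid = np.linspace(0.0, 20.0, 200)    # wealth grid; savings s = w - c are chosen on this grid

def u(c):
    return (c**(1 - gamma) - 1) / (1 - gamma)

C = grid[:, None] - grid[None, :]                      # C[i, j]: consumption when w = grid[i], s = grid[j]
U = np.where(C >= 0, u(np.maximum(C, 0.0)), -np.inf)   # -inf marks infeasible savings choices
W_next = R * grid[:, None] + y_vals[None, :]           # W_next[j, m]: next wealth from savings j, shock m

def expected_cont(v):
    # E v(R s_j + Y) for each savings level; np.interp holds v constant beyond the grid edge
    return np.interp(W_next, grid, v) @ y_probs

def greedy(v):
    vals = U + beta * expected_cont(v)[None, :]
    sigma = vals.argmax(axis=1)
    return vals[np.arange(len(grid)), sigma], sigma    # (T v, a v-greedy policy)

def opi(m, tol=1e-6, max_iter=10_000):
    """Optimistic policy iteration; m = 1 reduces to VFI. Returns (v, sigma, outer iterations)."""
    v = np.full(len(grid), u(0.0) / (1 - beta))   # value of zero consumption forever, so T v >= v
    for it in range(1, max_iter + 1):
        Tv, sigma = greedy(v)
        if np.max(np.abs(Tv - v)) < tol:
            return v, sigma, it
        v = Tv                                     # first application of T_sigma
        for _ in range(m - 1):                     # m - 1 further applications of T_sigma
            v = U[np.arange(len(grid)), sigma] + beta * expected_cont(v)[sigma]
    raise RuntimeError("OPI did not converge")

v_vfi, sigma_vfi, n_vfi = opi(m=1)
v_opi, sigma_opi, n_opi = opi(m=40)
print(n_vfi, n_opi)    # OPI needs far fewer outer iterations
```

Each OPI outer step is more expensive than a VFI step. This is why total runtime, as in the figure, eventually rises with $m$.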

7.3.2 Irreversible Investment

We begin with the risk-neutral case in Section 7.3.2.1, where standard contraction arguments apply. In Section 7.3.2.2 we introduce risk aversion by replacing mathematical expectation with a certainty equivalent, using the framework developed in Section 7.2.4.

7.3.2.1 The Risk-Neutral Case

Let’s begin with a canonical firm problem with irreversible investment, where the Bellman equation is

$$v(k, z) = \max_{i \in \Gamma(k, z)} \left\{ f(k,z) - i + \beta \int v(i + (1-\delta) k, g(z, \xi)) \phi(\diff \xi) \right\}.$$

Here $k \in \RR_+$ is capital stock, $i \in \RR_+$ is investment, $\beta \in (0,1)$ is the discount factor, $\delta \in (0,1)$ is a depreciation rate, $f$ is a production function, and $z \in \RR^m$ is an exogenous state vector. The feasible correspondence is defined by $\Gamma(k,z) = [0, \theta f(k, z)]$, where $\theta > 0$ is a borrowing constraint parameter. The state process evolves according to

$$Z_{t+1} = g(Z_t, \xi_{t+1}), \qquad (\xi_t)_{t \geq 0} \iidsim \phi.$$

Each $\xi_t$ takes values in a metric space $\Zsf$, the distribution $\phi$ is an element of $\pP(\Zsf)$, and $g \colon \RR^m \times \Zsf \to \RR^m$.

Some comments are in order. First, to simplify the presentation, we’ve set the output price to unity, so that $f(k,z)$ is both output and revenue. This can easily be modified. Second, the boundedness restriction on $f$ fails for many standard production functions. It does, however, greatly simplify the analysis, and the cost for quantitative applications is small. For example, $f(k,z) = zk^\alpha$ can be replaced with $f(k,z) = \min\{zk^\alpha, y\}$ for large $y$. If $y$ is very large, then the impact on choices and values is negligible.

This firm problem is called an irreversible investment model because $i$ is required to be nonnegative. To frame this problem as an RDP, we set $V = b\Xsf$, where $\Xsf \coloneq \RR_+ \times \RR^m$, and

$$B(k,z,i, v) = f(k,z) - i + \beta \int v(i + (1-\delta) k, g(z, \xi)) \phi(\diff \xi).$$

The set of policies $\Sigma$ consists of all measurable maps from $\Xsf$ to $\RR_+$ satisfying the feasibility constraint.

Figure 7.2 shows an example computation, plotting optimal investment as a function of capital for two productivity levels. The plot also compares this irreversible case ($i \geq 0$) with a reversible benchmark where the firm can also disinvest ($i \geq -(1 - \delta)k$). At low capital, both firms invest identically. At high capital, with the same level of productivity, the reversible firm disinvests, while the irreversible firm sets $i$ at its lower bound of zero. Importantly, for intermediate levels of capital, the reversible firm invests more aggressively, knowing that it can sell capital later if productivity drops. The irreversible firm faces a higher effective cost of capital due to the option value of waiting and targets a lower stock.[3]

Figure 7.3 shows simulated paths for both firms facing identical productivity shocks. The reversible firm tracks productivity more closely, boosting capital during good times and shedding it during downturns. The irreversible firm adjusts sluggishly on the downside, since it can only reduce capital through depreciation.

Figure 7.2: Investment policies: irreversible vs. reversible

Figure 7.3: Simulated capital and investment paths under common shocks
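
A minimal sketch of the irreversible vs. reversible comparison is given below. Some settings follow footnote 3: $\alpha$, $\beta$, $\delta$, $\theta$ and the output cap. Other settings are hypothetical simplifications:

- a two-state Markov chain for $z$ replaces the discretized AR(1);
- next-period capital $k'$ is chosen directly on the grid, so no interpolation is needed;
- the function name `solve` is illustrative.

Reversibility is modeled by dropping the constraint $i \geq 0$. Since the grid is strictly positive, $k' > 0$ already implies $i \geq -(1-\delta)k$.

```python
import numpy as np

# alpha, beta, delta, theta and the output cap follow footnote 3; the chain for z is hypothetical
alpha, beta, delta, theta, y_cap = 0.3, 0.95, 0.1, 1.5, 1000.0
z_vals = np.array([0.8, 1.2])
Q = np.array([[0.9, 0.1],
              [0.1, 0.9]])                   # Q[j, j2] = P(z' = z_vals[j2] | z = z_vals[j])
k_grid = np.linspace(0.1, 10.0, 300)         # next-period capital k' is chosen on this grid

f = np.minimum(z_vals[None, :] * k_grid[:, None]**alpha, y_cap)   # f[i, j] = f(k_i, z_j)
inv = k_grid[None, :] - (1 - delta) * k_grid[:, None]             # inv[i, l]: investment to reach k'_l

def solve(reversible, tol=1e-8, max_iter=5_000):
    """VFI for the firm problem; returns (v, investment policy), both of shape (len(k_grid), 2)."""
    lower = -np.inf if reversible else 0.0   # the irreversible case imposes i >= 0
    feasible = (inv[:, None, :] >= lower) & (inv[:, None, :] <= theta * f[:, :, None])
    v = np.zeros((len(k_grid), len(z_vals)))
    for _ in range(max_iter):
        EV = v @ Q.T                          # EV[l, j] = E[v(k'_l, z') | z = z_vals[j]]
        vals = f[:, :, None] - inv[:, None, :] + beta * EV.T[None, :, :]   # vals[i, j, l]
        vals = np.where(feasible, vals, -np.inf)
        v_new = vals.max(axis=2)
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    else:
        raise RuntimeError("VFI did not converge")
    policy = vals.argmax(axis=2)              # index l of the chosen k'
    return v_new, inv[np.arange(len(k_grid))[:, None], policy]

v_irr, i_irr = solve(reversible=False)
v_rev, i_rev = solve(reversible=True)
```

Because the reversible firm has a larger feasible set, its value function is pointwise at least as high. At high capital it chooses negative investment.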

7.3.2.2 Adding Risk Aversion

As discussed in Section 1.1.3, actual firm behavior often deviates from the risk-neutral benchmark attained under the assumption of frictionless complete markets. Here we extend the model from Section 7.3.2.1 in order to discuss this case. We replace the Bellman equation from Section 7.3.2.1 with

$$v(k, z) = \max_{0 \leq i \leq \theta f(k,z)} \left\{ f(k, z) - i + \beta \eE [v(i + (1-\delta) k, g(z, \xi))] \right\}.$$

The only change is that mathematical expectation has been replaced with a certainty equivalent $\eE$. The term $\xi$ should be understood as a random element on $\Zsf$ with distribution $\phi$. We assume that the map $(k, z, i) \mapsto \eE [v(i + (1-\delta) k, g(z, \xi))]$ is measurable on $\Gsf$ for all $v \in b\Xsf$.

We can set this model up as an RDP by taking $V = b\Xsf$, $\Gamma(k,z) = [0, \theta f(k,z)]$, and

$$B(k,z,i,v) = f(k,z) - i + \beta \eE [v(i + (1-\delta) k, g(z, \xi))].$$

By the monotonicity of certainty equivalents, we have $B(k,z,i,v) \leq B(k,z,i,v')$ whenever $v \leq v'$. Also, by our measurability assumption and boundedness of $f$, the map sending $(k,z)$ into $B(k,z, \sigma(k,z), v)$ is bounded and measurable whenever $\sigma \in \Sigma$ and $v \in V$. This confirms that $(\Gamma, V, B)$ is an RDP.

7.3.2.3 Numerical Illustration

To illustrate the impact of risk aversion on firm behavior, we now compute optimal policies under the specific certainty equivalent

$$\eE_\lambda(Z) \coloneq \lambda \EE[Z] - (1 - \lambda)\, \rR_\alpha(Z).$$

Here $\lambda \in (0,1)$ and $\rR_\alpha$ is value-at-risk at a fixed level $\alpha$, which will be set to the industry-standard value 0.05. The certainty equivalent puts positive weight on both expected rewards and VaR, matching common management practice. Decreasing $\lambda$ increases concern for left-tail events. The map $\eE_\lambda$ is a valid certainty equivalent for two reasons. First, $-\rR_\alpha = \qQ_\alpha$, the $\alpha$-quantile certainty equivalent. Second, convex combinations of certainty equivalents are certainty equivalents (Exercise 7.10).
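
On a sample of continuation values, $\eE_\lambda$ can be estimated with the empirical mean and quantile. The sketch assumes the lower quantile convention $\qQ_\alpha(Z) = \inf\{t : \PP(Z \leq t) \geq \alpha\}$, which corresponds to NumPy's `method="inverted_cdf"` (available in NumPy 1.22 and later). The simulated data and function names are hypothetical. The printed checks confirm cash invariance and monotonicity on the sample.

```python
import numpy as np

def quantile_ce(z, alpha):
    # Empirical lower alpha-quantile inf{t : F_n(t) >= alpha}; the lower convention is an assumption
    return np.quantile(z, alpha, method="inverted_cdf")

def e_lambda(z, lam=0.5, alpha=0.05):
    # lam * E[Z] - (1 - lam) * VaR_alpha(Z), with VaR_alpha(Z) = -Q_alpha(Z)
    var_alpha = -quantile_ce(z, alpha)
    return lam * np.mean(z) - (1 - lam) * var_alpha

rng = np.random.default_rng(42)
z = rng.normal(loc=1.0, scale=2.0, size=100_000)   # hypothetical sample of continuation values
c = 3.0
print(e_lambda(z))
print(e_lambda(z + c) - e_lambda(z))               # equals c up to rounding (cash invariance)
print(e_lambda(z + np.abs(z)) >= e_lambda(z))      # monotone: larger outcomes, larger value
```

Lowering `lam` shifts weight toward the 5% quantile, which sits well below the mean here, so the certainty equivalent falls.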

Note that $\eE_\lambda$ is not a continuous certainty equivalent, since VaR can jump under small perturbations. This means that Proposition 7.3.2 does not directly apply. (We are treating VaR here because of its popularity in applications, rather than its attractive theoretical properties.) At the same time, when we implement the model on a machine, all numerical quantities are ultimately represented by a finite set of double-precision floats. In this sense, the model as actually computed is an RDP with finite action sets. By Proposition 7.2.1, optimal policies exist and VFI converges.
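
Here is a concrete instance of the jump, assuming $\qQ_\alpha$ is the lower quantile. Take $\Omega = [0,1]$ with the uniform distribution. Let $Z = \mathbb{1}\{\omega > \alpha\}$ and $Z_n = \mathbb{1}\{\omega > \alpha - 1/n\}$. Then $|Z_n| \leq 1$ and $Z_n(\omega) \to Z(\omega)$ for every $\omega \neq \alpha$. However, $\qQ_\alpha(Z_n) = 1$ for all $n$, while $\qQ_\alpha(Z) = 0$. Both the quantile and the entropic certainty equivalent depend only on the distribution of their argument, so the sketch works with the two-point laws directly. The value $\gamma = 2$ for the entropic comparison is a hypothetical choice.

```python
import numpy as np

alpha, gamma = 0.05, 2.0
vals = np.array([0.0, 1.0])              # Z and Z_n take values in {0, 1}

def lower_quantile(probs, a=alpha):
    # inf{t : P(Z <= t) >= a} for the two-point distribution on `vals`
    return vals[np.searchsorted(np.cumsum(probs), a)]

def entropic(probs, g=gamma):
    # -(1/g) ln E[exp(-g Z)]
    return -np.log(probs @ np.exp(-g * vals)) / g

def probs_n(n):
    # Law of Z_n = 1{omega > alpha - 1/n} when omega is uniform on [0, 1]
    return np.array([alpha - 1 / n, 1 - alpha + 1 / n])

limit = np.array([alpha, 1 - alpha])     # law of Z = 1{omega > alpha}
for n in [100, 10_000, 1_000_000]:
    print(n, lower_quantile(probs_n(n)), entropic(probs_n(n)))
print("limit", lower_quantile(limit), entropic(limit))
```

The quantile stays at 1 along the sequence and then drops to 0 in the limit. The entropic values, by contrast, converge to the limiting value.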

Figure 7.4 compares optimal investment policies for the risk-neutral firm (certainty equivalent $\EE$) and a risk-averse firm using $\eE_\lambda$.[4] At both productivity levels, the risk-averse firm invests less aggressively. The intuition is that the quantile component of $\eE_\lambda$ penalizes downside outcomes in the continuation value, which lowers the perceived return to investment. Because the firm cannot reverse investment decisions, the option value of waiting is amplified by risk aversion.

Figure 7.5 shows simulated paths for both firms facing identical productivity shocks. The risk-averse firm maintains a persistently lower capital stock and invests more cautiously throughout the sample. During periods of high productivity, the gap is especially pronounced: the risk-neutral firm boosts capital aggressively, while the risk-averse firm is restrained, anticipating the possibility of future downturns.

Figure 7.4: Investment policies: risk-neutral vs. risk-averse

Figure 7.5: Simulated paths: risk-neutral vs. risk-averse under common shocks

7.3.3 Firm Investment under Ambiguity

In Section 1.2.3.3 we discussed how concern for model misspecification can be incorporated into dynamic programs. Here we return to this topic in the context of irreversible investment. We first formulate the robust control version of the firm problem and then show how duality reduces it to the risk-sensitive case already covered by our theory.

7.3.3.1 Model Formulation

To set up a robust control version of the investment problem we formulate the Bellman equation as

$$v(k, z) = \max_i \left\{ f(k, z) - i + \beta \inf_{\psi \ll \phi} \left[ \int v(k', g(z, x)) \psi(\diff x) + \frac{1}{\gamma} D_{\mathrm{KL}}(\psi \,\|\, \phi) \right] \right\}.$$

Here $k' \coloneq i + (1-\delta) k$ and the maximization is over $i$ with $0 \leq i \leq \theta f(k,z)$. In this case we interpret the problem as one where the manager does not fully trust the model. She fears misspecification of the distribution $\phi$ of the shock sequence $(\xi_t)$ and hence lacks full confidence when calculating expectations of continuation values. Nonetheless, she is willing to treat $\phi$ as a reference model. She entertains distributions $\psi$ that deviate from $\phi$, provided that they don’t assign positive probability to events that $\phi$ deems impossible (this is the requirement $\psi \ll \phi$).

The penalty term $(1/\gamma) D_{\mathrm{KL}}(\psi \,\|\, \phi)$ can be thought of as a soft constraint. Models further from the reference point (in terms of KL divergence) are regarded as less plausible. If $\gamma$ is close to zero, then the penalty term will be very large for even small deviations. Because the continuation value is evaluated at the infimum, the minimizing $\psi$ then stays close to $\phi$. This corresponds to greater trust in the model. Conversely, larger values of $\gamma$ indicate deeper distrust.

7.3.3.2 Risk-Sensitive Formulation

Using the duality in (7.58), we can rewrite the robust control Bellman equation for the firm problem as

$$v(k, z) = \max_i \left\{ f(k, z) - i - \frac{\beta}{\gamma} \ln \left[ \int \exp[-\gamma v(k', g(z, x))] \phi(\diff x) \right] \right\}.$$

This is a version of (7.73), with $\eE$ set to the entropic certainty equivalent $\eE_\gamma$. Since $\eE_\gamma$ is continuous (Exercise 7.11), the conditions of Proposition 7.3.2 hold under Assumption 7.3.1. As a result, for this model, the fundamental optimality properties hold, the value function $\vmax$ lies in $bc \Xsf$, and VFI converges geometrically on $bc \Xsf$.

The above discussion shows that we do not require any new machinery to tackle the somewhat intimidating robust control version of the investment problem: a duality based approach allows us to switch to a setting where we already have all the results we need.
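
The duality used above states that $\inf_{\psi \ll \phi} [ \int v \, \diff\psi + \frac{1}{\gamma} D_{\mathrm{KL}}(\psi \,\|\, \phi) ] = -\frac{1}{\gamma} \ln \int \exp(-\gamma v) \, \diff\phi$. It can be checked numerically on a finite reference distribution. The infimum is attained at the distorted distribution $\psi^* \propto \phi \exp(-\gamma v)$. The reference distribution, values, $\gamma$, and function names below are hypothetical.

```python
import numpy as np

def kl(psi, phi):
    # D_KL(psi || phi) for psi << phi, using the convention 0 log 0 = 0
    mask = psi > 0
    return np.sum(psi[mask] * np.log(psi[mask] / phi[mask]))

def penalized_value(psi, v, phi, gamma):
    # Expected value under psi plus the KL penalty (1/gamma) D_KL(psi || phi)
    return psi @ v + kl(psi, phi) / gamma

def entropic_value(v, phi, gamma):
    # -(1/gamma) ln E_phi[exp(-gamma v)]
    return -np.log(phi @ np.exp(-gamma * v)) / gamma

rng = np.random.default_rng(7)
n, gamma = 6, 0.5
phi = rng.dirichlet(np.ones(n))       # hypothetical reference distribution
v = rng.normal(size=n)                # hypothetical continuation values

# Worst-case distortion: psi* proportional to phi * exp(-gamma v)
psi_star = phi * np.exp(-gamma * v)
psi_star /= psi_star.sum()
print(penalized_value(psi_star, v, phi, gamma), entropic_value(v, phi, gamma))

# Any other psi << phi gives a weakly larger penalized value
trials = rng.dirichlet(np.ones(n), size=1_000)
print(min(penalized_value(psi, v, phi, gamma) for psi in trials) >= entropic_value(v, phi, gamma))
```

The gap between the penalized value at any $\psi$ and the entropic value equals $\frac{1}{\gamma} D_{\mathrm{KL}}(\psi \,\|\, \psi^*)$, which is why the check cannot fail.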

Figure 7.6 compares optimal investment policies under three specifications: the risk-neutral benchmark ($\gamma \to 0$) and the entropic certainty equivalent $\eE_\gamma$ at two levels of ambiguity aversion.[5] As $\gamma$ increases, reflecting deeper distrust in the reference model, investment falls. The manager entertains a wider range of alternative models and evaluates continuation values under the worst-case distribution, net of the KL penalty. She therefore perceives a lower return to committing capital. The effect is monotone in $\gamma$: higher ambiguity aversion leads to uniformly less aggressive investment across all capital levels.

Figure 7.7 shows simulated paths for all three firms facing identical productivity shocks. The more ambiguity-averse firm maintains a persistently lower capital stock. During periods of high productivity, the differences are most visible: the risk-neutral firm ramps up capital, while the ambiguity-averse firm invests more cautiously, hedging against the possibility that the favorable conditions are less persistent than the reference model suggests.

Figure 7.6: Investment policies under ambiguity aversion

Figure 7.7: Simulated paths under common shocks with varying ambiguity aversion

7.3.4 Kreps--Porteus vs. Risk-Sensitivity

We return to the setup in Section 7.2.5.1, where $\Xsf$ and $\Asf$ are arbitrary metric spaces, $\Gamma$ is a nonempty correspondence from $\Xsf$ to $\Asf$, and $B(x, a, v) = r(x,a) + \beta \EE [v(f(x,a,\xi))]$. We suppose, as in Assumption 7.2.11, that the correspondence $\Gamma$ is compact-valued and continuous, the reward function $r$ is bounded and continuous, and the map $(x,a) \mapsto f(x,a,z)$ is continuous on $\Gsf$ for all $z \in \Zsf$. As discussed in Section 7.2.5.1, the fundamental optimality properties hold and VFI converges on $bc \Xsf$.

In Section 7.2.5.2 we extended this basic MDP analysis to settings where the aggregator has the form $B(x, a, v) \coloneq r(x,a) + \beta \eE [v(f(x,a,\xi))]$. In Proposition 7.2.10 we showed that, when $\eE$ is continuous, the fundamental optimality properties hold, the value function $\vmax$ lies in $bc \Xsf$, and VFI converges geometrically on $bc \Xsf$.

One special case is the entropic setting, where

$$B_{\rm RS}(x, a, v) = r(x,a) + \frac{\beta}{\theta} \ln \EE \left[ \exp(\theta \cdot v(f(x,a,\xi))) \right].$$

This model is called a risk-sensitive MDP. The modified expectation is an application of the entropic certainty equivalent (7.57) with $\theta = -\gamma$. This modified expectation allows for parameterization of risk-sensitivity through $\theta$, with $\theta < 0$ injecting risk aversion. Since the entropic certainty equivalent is continuous (Exercise 7.11), we can apply Proposition 7.2.10. This tells us that all of the preceding convergence and optimality results apply.

Another alternative is to replace the entropic certainty equivalent with Kreps--Porteus expectations, leading to aggregator

$$B_{\rm KP}(x, a, v) = r(x,a) + \beta \left\{ \EE [ v(f(x,a,\xi))^\nu ] \right\}^{1/\nu} \qquad (\nu \in \RR \text{ and } \nu \not= 0).$$

Here, in order to avoid running into trouble with exponents, we require that $r > 0$ and take the value space $V$ to be all functions in $b\Xsf$ that take only positive values. We discussed such an RDP in Section 7.1.1.6.

Note, however, that the Kreps--Porteus expectation fails cash invariance and, as such, is not a certainty equivalent (as previously discussed in Section 7.2.4.2). As a result, the preceding optimality theory does not apply. In particular, we cannot appeal to Proposition 7.2.10. Moreover, the aggregator $B_{\rm KP}$ is not generally contracting, in the sense that Assumption 7.2.1 typically fails. Instead, the RDP $(\Gamma, V, B_{\rm KP})$ has to be treated with other methods, such as the convexity-based techniques used in Section 5.1.3.

There is, however, a multiplicative variation on the Kreps--Porteus RDP that is simple to analyze. The model is obtained by setting

$$B_{\rm MKP}(x, a, v) = r(x,a) \cdot \left\{ \EE [ v(f(x,a,\xi))^\nu ] \right\}^{\beta/\nu},$$

while continuing to assume that $r$ is everywhere positive. The parameter $\beta$ is a discount factor for the multiplicative model, and is assumed to take values in $[0,1)$. We call $(\Gamma, V, B_{\rm MKP})$ the multiplicative Kreps--Porteus RDP.

It turns out that the multiplicative Kreps--Porteus RDP and the additive risk-sensitive RDP $(\Gamma, b\Xsf, B_{\rm RS})$ are closely related; in fact, they are isomorphic. To illustrate this, we take logs of the Bellman equation associated with the multiplicative Kreps--Porteus RDP, obtaining

$$\ln v(x) = \max_{a \in \Gamma(x)} \left\{ \ln r(x,a) + \frac{\beta}{\nu} \ln \EE [ v(f(x,a,\xi))^\nu ] \right\}.$$

Setting $\hat v = \ln v$ and $\hat r = \ln r$ yields

$$\hat v(x) = \max_{a \in \Gamma(x)} \left\{ \hat r(x,a) + \frac{\beta}{\nu} \ln \EE [ \exp( \nu \cdot \hat v(f(x,a,\xi))) ] \right\}.$$

This is the Bellman equation for (7.78), after replacing $r$ with $\hat r$ and $\theta$ with $\nu$.
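
The isomorphism can be verified numerically on a hypothetical finite model, where the transition array `P[x, a]` plays the role of the law of $f(x,a,\xi)$. The sketch makes two checks. First, $\ln(T_{\rm MKP} v) = T_{\rm RS}(\ln v)$ with $\hat r = \ln r$ and $\theta = \nu$. Second, exponentiating the fixed point of the risk-sensitive operator gives a fixed point of the multiplicative Kreps--Porteus operator. Sizes, parameters, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, beta, nu = 4, 3, 0.9, -2.0                # hypothetical sizes and parameters
r = rng.uniform(0.5, 2.0, size=(n, k))          # strictly positive rewards
P = rng.dirichlet(np.ones(n), size=(n, k))      # P[x, a] is the law of the next state

def T_mkp(v):
    # max_a r(x, a) * (E[v(X')^nu])^(beta / nu), for strictly positive v
    return np.max(r * (P @ v**nu) ** (beta / nu), axis=1)

def T_rs(h):
    # max_a ln r(x, a) + (beta / nu) ln E[exp(nu h(X'))]
    return np.max(np.log(r) + (beta / nu) * np.log(P @ np.exp(nu * h)), axis=1)

v = rng.uniform(0.5, 3.0, size=n)               # any strictly positive candidate
print(np.max(np.abs(np.log(T_mkp(v)) - T_rs(np.log(v)))))   # zero up to rounding

# Iterate the risk-sensitive operator (a contraction with modulus beta) to its fixed point
h = np.zeros(n)
for _ in range(1_000):
    h = T_rs(h)
v_star = np.exp(h)                              # candidate fixed point of T_mkp
print(np.max(np.abs(T_mkp(v_star) - v_star) / v_star))
```

Solving the additive risk-sensitive problem and exponentiating is therefore a practical route to the multiplicative Kreps--Porteus value function.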

7.4 Chapter Notes

RDPs were introduced in Chapter 8 of Sargent & Stachurski (2025) in settings where the state space is finite. The theory in this chapter extends that treatment to general state spaces. The study of contracting dynamic programs with abstract Bellman equations was begun by Denardo (1967). Extensive discussion can be found in Bertsekas (2022). (The terminology is slightly confusing: the abstract dynamic programs studied in Denardo (1967) and Bertsekas (2022) are similar to the RDPs studied in this chapter. For us, however, abstract dynamic programs are the more general objects introduced in Section 2.1.1.)

The optimality results in Sections 7.2.1--7.2.2 combine the RDP framework with the Blackwell contraction theory of Chapter 4. Our framework is similar to the contractive models in Bertsekas (2022). The results in Section 7.2.3 on monotonicity, concavity, uniqueness, and continuity of solutions extend related results in Bäuerle & Jaśkiewicz (2018).

In Section 7.2.2 we treated RDPs where rewards are bounded below and unbounded above. Related work can be found in Toda (2023). One approach to the reverse case---where rewards are bounded above and unbounded below---can be found in Ma et al. (2022). Their idea is to rearrange the Bellman equation so that the transformed problem has bounded rewards, allowing standard contraction mapping arguments to be applied. The transformation is inspired by the Q-function used in reinforcement learning.

Regarding Euler equations, early results along the lines of Proposition 8.3.9 were established by Mirman & Zilcha (1975) and Benveniste & Scheinkman (1979).

The certainty equivalents and risk measures discussed in Section 7.2.4 are standard tools in mathematical finance and decision theory. The dual representation theorem for convex risk measures is due to Föllmer & Schied (2002); see also Jouini et al. (2006). The quantile certainty equivalent and its risk measure counterpart (VaR) have been studied in dynamic programming environments by Castro & Galvao (2019), Castro & Galvao (2022), Almeida et al. (2024), Castro et al. (2025), and Castro & Galvao (2025), among others.

The robust control formulation in Section 7.3.3 builds on Hansen & Sargent (2001) and Hansen & Sargent (2011), who developed the multiplier preference approach to robustness in dynamic economic models. The duality between robust control and risk-sensitive preferences, which we exploit to reduce the robust problem to the entropic certainty equivalent case, is a central theme of that literature.

For a recent textbook treatment that uses the RDP framework, see Toda (2024).

Footnotes
  1. If we were more cynical, we would add that existence of these two literatures also reflects the fact that researchers can publish more papers if they study the same thing under different names.

  2. More formally, a certainty equivalent $\eE$ is called law invariant if there exists a functional $e \colon \pP(\RR) \to \RR$ such that $\eE(Z) = e(\PP \circ Z^{-1})$ for all $Z \in L_\infty$.

  3. We set $f(k,z) = \min\{zk^\alpha, y\}$ with $y = 1000$, $\alpha = 0.3$, $\beta = 0.95$, $\delta = 0.1$, and $\theta = 1.5$. The exogenous state follows $Z_t = \exp(X_t)$, where $(X_t)$ is AR(1) with persistence $\rho = 0.9$ and volatility $\nu = 0.2$, discretized via the Tauchen method. The value function is approximated on a grid via linear interpolation of $v(\cdot, z)$ for each $z$, and the model is solved via VFI.

  4. The parameterization is the same as for the risk-neutral case, with $\lambda = 0.5$. The dynamic program is solved via VFI.

  5. The parameterization is the same as for the risk-neutral case, with $\gamma = 0.05$ and $\gamma = 0.5$ for the entropic certainty equivalent. VFI converges geometrically in all cases.

References
  1. Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
  2. Kristensen, D., Mogensen, P. K., Moon, J. M., & Schjerning, B. (2021). Solving dynamic discrete choice models using smoothing and sieve methods. Journal of Econometrics, 223(2), 328–360.
  3. Rust, J. (1994). Structural estimation of Markov decision processes. Handbook of Econometrics, 4, 3081–3143.
  4. Ma, Q., Stachurski, J., & Toda, A. A. (2022). Unbounded dynamic programming via the Q-transform. Journal of Mathematical Economics, 100, 102652.
  5. Föllmer, H., & Schied, A. (2002). Convex Measures of Risk and Trading Constraints. Finance and Stochastics, 6(4), 429–447. 10.1007/s007800200072
  6. Jouini, E., Schachermayer, W., & Touzi, N. (2006). Law Invariant Risk Measures Have the Fatou Property. In S. Kusuoka & A. Yamazaki (Eds.), Advances in Mathematical Economics (Vol. 9, pp. 49–71). Springer. 10.1007/4-431-34342-3_4
  7. Denardo, E. V. (1967). Contraction Mappings in the Theory Underlying Dynamic Programming. SIAM Review, 9(2), 165–177.
  8. Bertsekas, D. P. (2022). Abstract dynamic programming (3rd ed.). Athena Scientific.
  9. Bäuerle, N., & Jaśkiewicz, A. (2018). Stochastic optimal growth model with risk sensitive preferences. Journal of Economic Theory, 173, 181–200.
  10. Toda, A. A. (2023). Unbounded Markov Dynamic Programming with Weighted Supremum Norm Perov Contractions.
  11. Mirman, L. J., & Zilcha, I. (1975). On optimal growth under uncertainty. Journal of Economic Theory, 11(3), 329–339.
  12. Benveniste, L. M., & Scheinkman, J. A. (1979). On the differentiability of the value function in dynamic models of economics. Econometrica, 47(3), 727–732.
  13. de Castro, L., & Galvao, A. F. (2019). Dynamic quantile models of rational behavior. Econometrica, 87(6), 1893–1939.
  14. de Castro, L., & Galvao, A. F. (2022). Static and dynamic quantile preferences. Economic Theory, 73(2–3), 747–779.
  15. Almeida, H., Campello, M., de Castro, L. I., & Galvao Jr, A. F. (2024). A Quantile Model of Firm Investment (Technical report). National Bureau of Economic Research.