Additional Applications - Dynamic Programming Volume II: General States

8.1 Job Search

We formulate a basic IID job search model as an ADP and establish optimality results. We then reduce dimensionality via continuation values and study parametric monotonicity of the reservation wage. Finally, we extend the model to allow correlated wage draws with a Markov structure.

8.1.1 The Basic Model

We begin with the job search problem of McCall (1970), a finite state version of which was discussed at length in Chapter 1 of Sargent & Stachurski (2025). Here we consider a general state version and allow wages and rewards to be unbounded.

We describe the model, construct the associated ADP on $L_1(\phi)$ , and verify that it is well-posed and regular. We also consider smaller value spaces that can be used when the offer distribution has bounded support. We then establish the fundamental optimality properties and convergence of the standard algorithms.

8.1.1.1 Description

In each period, an unemployed worker receives a wage offer $W_t$ , drawn from some known distribution $\phi$ . The worker can accept the current offer or wait until the following period and consider a new offer.

Let $L_1(\phi) \coloneq L_1(\Wsf, \bB, \phi)$ be all Borel measurable $f \colon \Wsf \to \RR$ with $\int |f| \diff \phi < \infty$ . As usual, functions equal $\phi$ -almost everywhere are identified and $f \leq g$ means that $\{f > g\}$ has measure zero under $\phi$ . Let $\Sigma$ be all Borel measurable $\sigma \colon \Wsf \to \{0,1\}$ . Each such $\sigma$ can be understood as a policy, mapping states to actions: If $\sigma(w)=1$ , then the unemployed worker stops and accepts current offer $w$ . If $\sigma(w)=0$ , she continues.

Consider first a two-period problem. In period zero, the worker can either accept observed wage offer $w_0 \sim \phi$ or continue to the next period, receiving unemployment compensation $c$ and random payoff $v(W_1)$ . The offer $W_1$ is drawn from $\phi$ and $v$ is a given “terminal reward” function. Under policy $\sigma$ , which maps the wage offer $w_0$ into an accept/reject decision, the expected present value of her payoff is

v_\sigma(w_0) \coloneq \sigma(w_0) w_0 + (1-\sigma(w_0)) \left[ c + \beta \int v(w') \phi(\diff w') \right].

(8.1)

If $\sigma(w_0)=1$ the worker accepts and receives reward $w_0$ . If $\sigma(w_0)=0$ , then she rejects and receives expected continuation reward $c + \beta \int v(w') \phi(\diff w')$ .

Now we switch to an infinite horizon. Jobs are assumed to be permanent, so the present value of stopping with wage offer $w$ in hand is

\frac{w}{1-\beta} = w + \beta w + \beta^2 w + \cdots

(8.2)

Fixing $\sigma \in \Sigma$ , let $v_\sigma(w)$ be the lifetime value of following policy $\sigma$ given initial wage offer $w$ . By analogy with (8.1), we expect $v_\sigma$ to obey the recursion

v_\sigma(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v_\sigma(w') \phi(\diff w') \right] \quad \text{for all } w \in \Wsf.

(8.3)

Compared to (8.1), we have taken the value of stopping from (8.2) and also replaced the terminal value function $v$ on the right-hand side of (8.1) with $v_\sigma$ . This is because we now work with an infinite horizon, so that (8.3) becomes a recursion in $v_\sigma$ .

Continuing to hold $\sigma$ fixed, we introduce the policy operator $v \mapsto T_\sigma \, v$ via

(T_\sigma \, v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v(w') \phi(\diff w') \right].

(8.4)

Since $L_1(\phi)$ is closed under linear operations, policies are Borel measurable, and Assumption 8.1.1 is in force, we have $T_\sigma \, v \in L_1(\phi)$ whenever $v \in L_1(\phi)$ . Clearly $T_\sigma$ is order preserving on $(L_1(\phi), \leq)$ . Hence, with $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ , the pair $(L_1(\phi), \TT)$ is an ADP. By construction, any fixed point of $T_\sigma$ solves (8.3), so each such fixed point $v_\sigma$ has the interpretation of assigning lifetime values to states under $\sigma$ .

Fix $v \in L_1(\phi)$ and consider the policy $\sigma$ given by

\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int v(w') \phi(\diff w') \right\} \qquad (w \in \Wsf).

(8.5)

This policy tells the worker to stop when the payoff from stopping is larger than the expected payoff from continuing, assuming that $v$ is used to value future states.

Since $\sigma$ in (8.6) is well-defined at any $v \in L_1(\phi)$ , and also Borel measurable, the ADP $(L_1(\phi), \TT)$ is regular.

The expression for $T$ in (8.7) aligns with the job search Bellman operator from Chapter 1 of Sargent & Stachurski (2025), after replacing the finite expectation with an integral.
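The policy operator in (8.4) and the greedy policy in (8.5) can be sketched on a finite wage grid, where the integral becomes a weighted sum. The grid, the uniform offer probabilities, and the values of `c` and `beta` below are hypothetical choices made only for illustration, and the function names are not from the text.

```python
def continuation_value(v, phi, c, beta):
    """Finite-grid analogue of c + beta * int v dphi."""
    return c + beta * sum(v_j * p_j for v_j, p_j in zip(v, phi))


def policy_operator(sigma, v, w_grid, phi, c, beta):
    """Apply T_sigma from (8.4): stop where sigma = 1, continue where sigma = 0."""
    h = continuation_value(v, phi, c, beta)
    return [s * w / (1 - beta) + (1 - s) * h for s, w in zip(sigma, w_grid)]


def greedy_policy(v, w_grid, phi, c, beta):
    """The policy in (8.5): accept when stopping is worth at least as much as continuing."""
    h = continuation_value(v, phi, c, beta)
    return [1 if w / (1 - beta) >= h else 0 for w in w_grid]


# Hypothetical primitives: five equally likely wage offers.
w_grid = [0.5, 1.0, 1.5, 2.0, 2.5]
phi = [0.2, 0.2, 0.2, 0.2, 0.2]
c, beta = 1.0, 0.9

# Repeatedly apply T_sigma with sigma chosen v-greedy (value function iteration).
v = [0.0] * len(w_grid)
for _ in range(200):
    sigma = greedy_policy(v, w_grid, phi, c, beta)
    v = policy_operator(sigma, v, w_grid, phi, c, beta)

print("greedy policy at the limit:", greedy_policy(v, w_grid, phi, c, beta))
```

Because offers are IID, `v` enters each operator only through the scalar `continuation_value`, which is the observation exploited in Section 8.1.2.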

8.1.1.2 Reducing the Value Space

In the preceding analysis we used $L_1(\phi)$ for the value space because $\phi$ is allowed to have unbounded support. Since $w$ can then be arbitrarily large, the function $T_\sigma \, v$ in (8.4) can be unbounded, and $L_1(\phi)$ accommodates such functions. To complete this section, the next exercise looks at settings where $\phi$ has bounded support and considers how we might exploit this by selecting a smaller value space.

Solution to Exercise 8.1.5

We set $\Gamma(w) = \{0, 1\}$ for every $w \in \Wsf$ , $V = b\Wsf$ , and

B(w, a, v) = a \frac{w}{1-\beta} + (1 - a) \left[ c + \beta \int v(w') \phi(\diff w') \right].

(8.8)

For monotonicity, if $v \leq u$ pointwise, then $\int v \diff \phi \leq \int u \diff \phi$ , so $B(w, a, v) \leq B(w, a, u)$ for all $(w,a) \in \Gsf$ . For consistency, fix $\sigma \in \Sigma$ and $v \in b\Wsf$ , and let $m(w) \coloneq B(w, \sigma(w), v)$ . Since $\Wsf$ is bounded, $w/(1-\beta)$ is bounded on $\Wsf$ , and $\int v \diff \phi$ is finite since $v$ is bounded. Hence $m$ is bounded. Moreover, $m$ is Borel measurable because $\sigma$ is measurable and $w \mapsto w/(1-\beta)$ is continuous. Thus $m \in b\Wsf$ .

8.1.1.3 Optimality with IID Offers

Let’s return now to the general setting of Assumption 8.1.1, where $\Wsf \subset \RR_+$ can be unbounded, and use $L_1(\phi)$ for the value space. We consider optimality properties and convergence of algorithms for the job search ADP $(L_1(\phi), \TT)$ .

Since the fundamental optimality properties hold, the value function $\vmax$ is a fixed point of the Bellman operator $T$ and a policy $\sigma$ is optimal if and only if it is $\vmax$ -greedy, which is to say that

\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int \vmax(w') \phi(\diff w') \right\}

for all $w \in \RR_+$ . (We are assuming that the agent accepts the job offer if indifferent.) In other words, the optimal rule is to stop if and only if

w \geq (1 - \beta) \left[ c + \beta \int \vmax(w') \phi(\diff w') \right].

The term on the right-hand side is called the reservation wage. This representation of optimal behavior is convenient because the reservation wage provides a scalar summary of the solution to the problem.

8.1.2 Rearranging the Bellman Equation

In the IID case, the Bellman equation can be reduced to a scalar fixed point problem in a single continuation value. We derive this reduction, connect it to the theory of factored DPs, and use it to study how the reservation wage varies with model parameters.

8.1.2.1 Continuation Values

In view of (8.7), a function $v$ satisfies the Bellman equation when

v(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \int v(w') \phi(\diff w') \right\} \quad \text{for all } w \in \Wsf.

(8.9)

Taking $v$ as given, consider the term

h = c + \beta \int v(w') \phi(\diff w') .

(8.10)

We can use $h$ to eliminate the function $v$ from (8.9). To do so we insert $h$ on the right-hand side, replace $w$ with $w'$ in (8.9), take expectations, multiply by $\beta$ and add $c$ to obtain

h = c + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w').

(8.11)

This is a nonlinear equation in $h$ , the solution of which, henceforth denoted $\hmax$ , is the optimal continuation value of our problem. Obtaining $\hmax$ allows us to solve the dynamic programming problem, since any policy $\sigma$ satisfying

\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax \right\} \quad \text{for all } w \in \RR_+

(8.12)

is optimal. (We discuss this more formally in Section 8.1.2.2.) Another way to write (8.12) is

\sigma(w) = \1 \left\{ w \geq \wopt \right\} \quad \text{where } \; \wopt \coloneq (1 - \beta) \hmax,

(8.13)

Here $\wopt$ is the reservation wage.

In order to solve (8.11), we introduce the mapping

g(h) = c + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w') \qquad (h \in \RR_+).

(8.14)

It is constructed such that any solution to (8.11) is a fixed point of $g$ and vice versa.

The result of Exercise 8.1.6 implies that $g$ has a unique fixed point $\hmax$ in $\RR_+$ , which is the optimal continuation value. Figure 8.1 shows the function $g$ when $\ln W_t = \mu + s Z_t$ for standard normal $Z_t$ , while $\beta = 0.9$ and $c = 1.0$ . The integral in (8.14) is computed by Monte Carlo. The unique fixed point is $\hmax$ .

Figure 8.1: The function $g$ from (8.14)
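The computation behind this figure can be sketched as follows: draw wages from the lognormal distribution, replace the integral in (8.14) with a sample mean, and iterate on $g$ until successive values agree. The values $\beta = 0.9$ and $c = 1.0$ follow the text, while $\mu = 0$, $s = 1$, the seed, and the sample size are assumptions, since the text does not report them.

```python
import random


def make_g(wage_draws, c, beta):
    """Return h -> g(h) from (8.14), with the integral replaced by a sample mean."""
    stop_values = [w / (1 - beta) for w in wage_draws]
    n = len(stop_values)

    def g(h):
        return c + beta * sum(max(e, h) for e in stop_values) / n

    return g


def iterate_to_fixed_point(g, h0=0.0, tol=1e-8, max_iter=10_000):
    """Successive approximation h_{k+1} = g(h_k)."""
    h = h0
    for _ in range(max_iter):
        h_new = g(h)
        if abs(h_new - h) < tol:
            return h_new
        h = h_new
    raise RuntimeError("no convergence within max_iter iterations")


# beta and c follow the text; mu, s, the seed and the sample size are assumed.
beta, c = 0.9, 1.0
mu, s = 0.0, 1.0
rng = random.Random(1234)
wage_draws = [rng.lognormvariate(mu, s) for _ in range(5_000)]

g = make_g(wage_draws, c, beta)
h_star = iterate_to_fixed_point(g)
w_star = (1 - beta) * h_star       # reservation wage, as in (8.13)
print(f"h* ~ {h_star:.3f}, reservation wage ~ {w_star:.3f}")
```

The fixed point computed here is that of the sample version of $g$, so it approximates $\hmax$ only up to Monte Carlo error.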

One obvious advantage of the formulation of the problem in Section 8.1.2.1 is that, instead of searching for a value function $\vmax$ in the infinite-dimensional space $L_1(\phi)$ , we only need to solve for the fixed point of $g$ in the one-dimensional space $\RR_+$ . The next exercise further reduces the search space to a bounded interval in $\RR_+$ .

Solution to Exercise 8.1.7

Let $\bar w \coloneq \int w \phi(\diff w)$ and let $f(h) \coloneq c + \beta \bar w / (1-\beta) + \beta h$. Since $\max\{w'/(1-\beta), h\} \leq w'/(1-\beta) + h$ for $w', h \geq 0$, we have $g(h) \leq f(h)$ for all $h \in \RR_+$. The unique fixed point of $f$ is $K \coloneq (c + \beta \bar w / (1-\beta))/(1 - \beta)$. Since $g \leq f$ pointwise, we have $g(K) \leq f(K) = K$. Moreover, $g(0) = c + \beta \int (w'/(1-\beta)) \phi(\diff w') \geq 0$. Since $g$ is increasing (by inspection of (8.14)), it follows that $0 \leq g(0) \leq g(h) \leq g(K) \leq K$ for every $h \in [0, K]$. Hence $g$ maps $[0, K]$ to itself.

Figure 8.2 shows the reservation wage, computed by iterating on $g$ to obtain (an approximation to) $\hmax$ and then calculating $\wopt$ via (8.13). In the computation, $c$ and the distribution $\phi$ are as for the last figure, while $\beta$ ranges from 0.9 to 0.99.

Figure 8.2: Reservation wage as a function of $\beta$

8.1.2.2 An FDP Perspective

As an exercise, let’s connect the transformation discussed in Section 8.1.2.1 to the theory of FDPs in Chapter 5. For this discussion we adopt the environment of Section 8.1.1.3 and set

$V = L_1(\phi)$ ,
$\hat V = \RR_+$ ,
$F \colon V \to \hat V$ with $Fv = c + \beta \int v(w') \phi(\diff w')$ , and
$G_\sigma \colon \hat V \to V$ with $(G_\sigma \, h)(w) = \sigma(w) (w/(1-\beta)) + (1-\sigma(w)) h$ .

Clearly, given $h \in \hat V$ , we can attain the bound $G_\tau h \leq G_\sigma h$ for all $\tau \in \Sigma$ by setting

\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq h \right\} \qquad (w \in \Wsf).

(8.16)

Let $\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}$ . Since $F$ and each $G_\sigma$ are order-preserving, the tuple $(V, F, \hat V, \GG)$ is an order-preserving FDP.

For the primary ADP generated by $(V, F, \hat V, \GG)$ , the policy operators have the form

(T_\sigma \, v)(w) = (G_\sigma F v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[c + \beta \int v(w') \phi(\diff w') \right].

This is identical to (8.4), so the primary ADP is just the original job search ADP $(L_1(\phi), \TT)$ from Section 8.1.1.1.

Regarding the subordinate ADP generated by $(V, F, \hat V, \GG)$, the policy operators have the form

\hat T_\sigma \, h = F G_\sigma h = c + \beta \int \left[ \sigma(w') \frac{w'}{1-\beta} + (1-\sigma(w')) h \right] \phi(\diff w').

The associated Bellman operator is

\hat T \, h = c + \beta \int \max \left\{ \frac{w'}{1-\beta} , h \right\} \phi(\diff w').

On inspection, we see that the fixed point of $\hat T$ is a solution to (8.11). Thus, the subordinate ADP $(\hat V, \hat{\TT})$ represents the continuation value problem from Section 8.1.2.1.

These observations formalize the ideas expressed in Section 8.1.2.1. For example, Theorem 5.2.13 tells us that a policy $\sigma \in \Sigma$ will be optimal for the job search problem when $\hmax$ is a fixed point of $\hat T$ and $\sigma$ obeys $G_\sigma \, \hmax = G \, \hmax$. (Here $G$ is the supremum $\bigvee_\sigma G_\sigma$.) In view of (8.16), such a $\sigma$ can be found by setting

\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax \right\} \qquad (w \in \Wsf).

This policy aligns with the (informally derived) solution from (8.12).
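The factorization can be checked numerically on a finite wage grid. The sketch below builds $F$ and $G_\sigma$ as defined above, composes them in both orders, and confirms by brute force over all policies that the supremum of $F G_\sigma h$ matches the scalar Bellman form. The grid, probabilities, and parameter values are hypothetical.

```python
from itertools import product

# Hypothetical finite-support example: five equally likely wage offers.
w_grid = [0.5, 1.0, 1.5, 2.0, 2.5]
phi = [0.2] * 5
c, beta = 1.0, 0.9


def F(v):
    """F v = c + beta * int v dphi, a map from V into V-hat (a scalar here)."""
    return c + beta * sum(v_j * p_j for v_j, p_j in zip(v, phi))


def G(sigma, h):
    """(G_sigma h)(w) = sigma(w) w/(1-beta) + (1 - sigma(w)) h, from V-hat into V."""
    return [s * w / (1 - beta) + (1 - s) * h for s, w in zip(sigma, w_grid)]


def T(sigma, v):
    """Primary policy operator T_sigma = G_sigma F."""
    return G(sigma, F(v))


def T_hat(sigma, h):
    """Subordinate policy operator T-hat_sigma = F G_sigma."""
    return F(G(sigma, h))


def bellman_hat(h):
    """The scalar map h -> c + beta * int max{w'/(1-beta), h} dphi."""
    return c + beta * sum(max(w / (1 - beta), h) * p for w, p in zip(w_grid, phi))


policies = list(product([0, 1], repeat=len(w_grid)))   # all 32 policies on the grid
h = 12.0
sup_over_policies = max(T_hat(sigma, h) for sigma in policies)
greedy = tuple(1 if w / (1 - beta) >= h else 0 for w in w_grid)   # as in (8.16)
print(sup_over_policies, bellman_hat(h), T_hat(greedy, h))
```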

8.1.2.3 Parametric Monotonicity

How does the solution to the job search problem vary with parameters? In terms of monotonicity, one way to answer this is to appeal to Proposition A.5.20. Since $g$ is an increasing contraction mapping on $\RR_+$ , this proposition implies that any parameter that shifts up the function $g$ in (8.14) pointwise on $\RR_+$ also shifts its fixed point up.

Example 8.1.1

The optimal continuation value $\hmax$ is increasing in $c$ . Indeed, if $c_1 \leq c_2$ , then

c_1 + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w') \leq c_2 + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w').

Thus, the function $g$ shifts up everywhere when $c$ increases and hence $\hmax$ increases with $c$ . This is as expected, since higher unemployment compensation makes the value of continuing to the next period greater.

Figure 8.2 suggests that $\wopt$ is also increasing in $\beta$ . Since $\wopt = (1-\beta) \hmax$ , we cannot infer this directly from the fact that $\hmax$ is increasing in $\beta$ . Instead, we take the fixed point equation for $\hmax$ in (8.11) and substitute $w = (1-\beta) h$ , which uses the definition of the reservation wage from (8.13), to obtain a new fixed point equation $f(w) = w$ where

f(w) \coloneq c (1-\beta) + \beta \int \max \left\{ w' ,\, w \right\} \phi(\diff w').

(8.17)

Solution to Exercise 8.1.10

We apply Proposition A.5.20. Fixing $w \in \RR_+$ , it suffices to show that the value $f(w) \coloneq c (1-\beta) + \beta \int \max \left\{ w' ,\, w \right\} \phi(\diff w')$ shifts up when $\beta$ increases. This is true when $c \leq \bar w$ , because $f(w)$ is the weighted average of two terms and the second term is larger than the first:

\int \max \left\{ w' ,\, w \right\} \phi(\diff w') \geq \int w' \phi(\diff w') = \bar w \geq c.

Increasing $\beta$ puts more weight on the larger term, so $f(w)$ increases with $\beta$ .
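As a rough numerical check, the fixed point of $f$ in (8.17) can be computed for several values of $\beta$. Since $f(w) - w$ is strictly decreasing in $w$, the sketch below uses bisection rather than the successive approximation described in the text. The lognormal parameters, seed, sample size, and the grid of $\beta$ values are assumptions.

```python
import random


def reservation_wage(wage_draws, c, beta, tol=1e-10):
    """Solve f(w) = w for f in (8.17) by bisection, using a sample mean for the integral."""
    n = len(wage_draws)

    def excess(w):
        # f(w) - w, strictly decreasing in w because f has slope at most beta < 1
        return c * (1 - beta) + beta * sum(max(x, w) for x in wage_draws) / n - w

    lo, hi = 0.0, max(max(wage_draws), c) + 1.0   # excess(lo) >= 0 > excess(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) >= 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Assumed primitives: lognormal offers with mu = 0 and s = 1, and c = 1.
rng = random.Random(1234)
wage_draws = [rng.lognormvariate(0.0, 1.0) for _ in range(2_000)]
c = 1.0
betas = [0.90, 0.93, 0.96, 0.99]
w_stars = [reservation_wage(wage_draws, c, b) for b in betas]
for b, w in zip(betas, w_stars):
    print(f"beta = {b:.2f}: reservation wage ~ {w:.4f}")
```

The monotonicity argument applies to the empirical distribution as long as the sample mean of the draws is at least `c`, which is the sample analogue of the condition $c \leq \bar w$.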

How do shifts in the wage offer distribution affect the reservation wage? One observation is that a shift to a “more favorable” wage distribution should increase the reservation wage, since an agent who continues can expect better offers.

A more interesting monotonicity result for this model concerns the volatility of the wage process and its impact on the reservation wage. For this problem, greater volatility encourages patience because the option value of waiting is larger. The next exercise asks you to verify this, using the concept of mean-preserving spreads.

Solution to Exercise 8.1.13

Let $\psi$ and $\phi$ have the stated properties and fix $w \in \RR_+$ . In view of Proposition A.5.20, it is enough to show that, under the stated assumptions, the value $f(w)$ in (8.17) increases with the mean-preserving spread, or, equivalently

\int \max \left\{ w' ,\, w \right\} \phi(\diff w') \leq \int \max \left\{ w' ,\, w \right\} \psi(\diff w').

(8.18)

To see that this is so, observe that, by the definition of a mean-preserving spread, there exists a pair $(w', Z)$ such that $\EE[ Z \given w' ] = 0$ , $w' \eqdist \phi$ and $w' + Z \eqdist \psi$ . By this fact and the law of iterated expectations,

\int \max \left\{ w' ,\, w \right\} \psi(\diff w') = \EE \left[ \max \left\{ w' + Z ,\, w \right\} \right] = \EE \left[ \EE \left[ \max \left\{ w' + Z ,\, w \right\} \, | \, w' \right] \right].

An application of Jensen’s inequality now produces

\int \max \left\{ w' ,\, w \right\} \psi(\diff w') \geq \EE \max \left\{ \EE [w' + Z \given w' ] ,\, w \right\} .

Using $\EE[w' \given w'] = w'$ and $\EE[Z \given w'] = 0$ confirms (8.18).
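Inequality (8.18) can be illustrated with an explicit mean-preserving spread of an empirical distribution. In the sketch below each draw $x$ from $\phi$ is replaced by $x/2$ and $3x/2$ with equal weight, so the added noise has conditional mean zero by construction. This particular construction, the lognormal parameters, and the values of `c` and `beta` are assumptions made for illustration.

```python
import random


def expected_max(draws, w):
    """Sample analogue of int max{w', w} dist(dw')."""
    return sum(max(x, w) for x in draws) / len(draws)


def reservation_wage(draws, c, beta, n_iter=300):
    """Iterate on f from (8.17), starting at zero."""
    w = 0.0
    for _ in range(n_iter):
        w = c * (1 - beta) + beta * expected_max(draws, w)
    return w


rng = random.Random(0)
phi_draws = [rng.lognormvariate(0.0, 0.5) for _ in range(1_000)]
# Given w' = x, the noise Z equals -x/2 or +x/2 with equal weight, so E[Z | w'] = 0.
psi_draws = [x + z for x in phi_draws for z in (-0.5 * x, 0.5 * x)]

c, beta = 1.0, 0.9
w_phi = reservation_wage(phi_draws, c, beta)
w_psi = reservation_wage(psi_draws, c, beta)
print(f"reservation wage under phi ~ {w_phi:.4f}, under psi ~ {w_psi:.4f}")
```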

8.1.3 Job Search with Correlated Wage Draws

In our simple model of job search we have so far assumed that wage offer draws are IID. Now let’s allow these offers to have a Markov structure.

8.1.3.1 An ADP Representation

As before, $\Sigma$ is the set of all Borel measurable functions $\sigma$ mapping $\Wsf$ to $\{0, 1\}$ . Each policy operator $T_\sigma$ is adjusted to

(T_\sigma \, v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v(w') P(w, \diff w') \right].

We can write $T_\sigma$ more succinctly as

T_\sigma \, v = \sigma e + (1 - \sigma) (c + \beta Pv) \quad \text{when} \quad e(w) \coloneq \frac{w}{1-\beta},

(8.19)

with products such as $\sigma e$ defined pointwise.

Solution to Exercise 8.1.14

Fix $v \in L_1(\phi)$ and $\sigma \in \Sigma$. For the self-map property, we need to show that $\sigma e + (1 - \sigma) (c + \beta Pv)$ is again in $L_1(\phi)$. Borel measurability is obvious from the Borel measurability of elements of $\Sigma$ and assumptions on the primitives. Regarding $\phi$-integrability, it suffices to show that the individual terms in the sum are integrable. That $\sigma e$ is integrable follows from Assumption 8.1.2. Also, $(1-\sigma) c$ is integrable because $\phi$ is a probability measure. Finally, $|(1-\sigma) \beta Pv| \leq |Pv|$ and $P$ maps $L_1(\phi)$ to itself (see Lemma A.5.32), so $(1-\sigma) \beta Pv$ is integrable.

The order preserving property of $T_\sigma$ follows from the fact that $P$ is a positive linear operator.

Exercise 8.1.15

Given $v \in L_1(\phi)$ , show that the policy $\sigma \in \Sigma$ given by

\sigma(w) \coloneq \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int v(w') P(w, \diff w') \right\} \qquad (w \in \Wsf).

(8.20)

is $v$ -greedy. Show, in addition, that the ADP Bellman operator corresponding to $(L_1(\phi), \TT)$ obeys

(T v)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \int v(w') P(w, \diff w') \right\} \qquad (v \in L_1(\phi), \; w \in \Wsf).

(8.21)

It follows from Exercise 8.1.14 and Exercise 8.1.15 that, with $\TT$ defined as the set of all $T_\sigma$ in (8.19) as $\sigma$ ranges over $\Sigma$, the pair $(L_1(\phi), \TT)$ is a regular ADP.
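The operators in (8.19) through (8.21) are easy to write down when the wage process is a finite Markov chain, in which case $P$ is a stochastic matrix and $Pv$ is a matrix-vector product. The three-state chain and parameter values below are hypothetical. The sketch checks by brute force that the policy in (8.20) attains the pointwise maximum over all policies.

```python
from itertools import product

# Hypothetical three-state wage chain; P is a stochastic matrix (rows sum to one).
w_grid = [1.0, 2.0, 3.0]
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
c, beta = 1.5, 0.9
e = [w / (1 - beta) for w in w_grid]          # stopping value e(w) = w / (1 - beta)


def apply_P(v):
    """(Pv)(w_i) = sum_j P[i][j] v(w_j)."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def T_sigma(sigma, v):
    """Policy operator (8.19): sigma e + (1 - sigma)(c + beta P v), pointwise."""
    return [s * e_i + (1 - s) * (c + beta * pv)
            for s, e_i, pv in zip(sigma, e, apply_P(v))]


def greedy(v):
    """The policy in (8.20)."""
    return [1 if e_i >= c + beta * pv else 0 for e_i, pv in zip(e, apply_P(v))]


def bellman(v):
    """The Bellman operator in (8.21)."""
    return [max(e_i, c + beta * pv) for e_i, pv in zip(e, apply_P(v))]


v = list(e)                                    # candidate: the value of always accepting
Tv = bellman(v)
print("greedy policy:", greedy(v))
print("T v:", [round(x, 3) for x in Tv])
```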

We can now state an optimality result for the job search model with Markov wage draws.

The next exercise suggests a way to reduce the value space.

Solution to Exercise 8.1.18

Fix $\sigma \in \Sigma$ . Using the power series representation from the Neumann series lemma, we have

v_\sigma = \sum_{t \geq 0} [\beta (1-\sigma) P]^t (\sigma e + (1 - \sigma) c) \leq \sum_{t \geq 0} (\beta P)^t ( e + c) = \bar v.

Next we show that $v \in V$ implies $T_\sigma v \in V$ . To this end, fix $v \in V$ . Evidently $0 \leq T_\sigma \, v$ . Moreover, since $v \leq \bar v$ ,

T_\sigma \, v \leq e + c + \beta P v \leq e + c + \beta P (I - \beta P)^{-1}(e + c).

Using the power series representation, the right hand side can be expressed as

e + c + \beta P \sum_{t \geq 0} (\beta P)^t (e + c) = e + c + \sum_{t \geq 1} (\beta P)^t (e + c) = \sum_{t \geq 0} (\beta P)^t (e + c) = \bar v.

We have confirmed that $0 \leq T_\sigma \, v \leq \bar v$ when $v \in V$ , so each $T_\sigma$ is a self-map on $V$ .

Since $T_\sigma$ is also order preserving, $(V, \TT)$ is an ADP.

8.1.3.2 A Numerical Study

Figure 8.3 shows the output of HPI under a range of parameter values. First we generate a stochastic matrix $P$ for wage offers via Tauchen’s method, discretizing the AR(1) process $W_{t+1} = \rho W_t + \nu \xi_{t+1}$ where $(\xi_t)$ is IID and standard normal. We set $\beta=0.99$, $\rho=0.9$, $\nu=0.2$ and $n=500$. In the left subfigure we plot $\vmax$, computed by HPI, as well as the exit option $e(w) = w/(1-\beta)$ and the reservation wage, which is $\bar w \coloneq \min\{w \in \Wsf : \sigopt(w) = 1\}$ when $\sigopt$ is $\vmax$-greedy. The reservation wage is the minimum wage offer at which the unemployed worker accepts.

In the right subfigure we vary the volatility parameter $\nu$ from 0.1 to 0.2 and plot $\bar w$ as a function of $\nu$, holding other parameters fixed. Notice that the reservation wage increases with wage offer volatility. The reason is that more volatility increases the upside of waiting, due to the possibility of high future offers. At the same time, downside risk is mitigated by the ability to reject a bad offer.

Figure 8.3: Solution to the job search problem

Figure 8.4 shows the first two iterates of HPI, OPI and VFI, as well as the value function $\vmax$ and the shared initial condition $v$. Parameter values are the same as in the left-hand subfigure of Figure 8.3. In the case of OPI, $m$ is set to 10. We see that HPI converges faster than VFI in terms of reduced distance to the value function per iteration. The rate for OPI is between HPI and VFI.

Figure 8.4: Comparison of algorithms (job search)
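A small self-contained comparison of HPI and VFI is sketched below. It does not reproduce the figures: the wage chain is a hypothetical reflecting random walk on 20 grid points rather than a Tauchen discretization, and $\beta = 0.95$ and $c = 1$ are assumed values. Policy evaluation solves the linear system $v = \sigma e + (1-\sigma)(c + \beta P v)$ exactly with a short elimination routine.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[pivot] = M[pivot], M[i]
        for r in range(n):
            if r != i:
                factor = M[r][i] / M[i][i]
                M[r] = [x - factor * y for x, y in zip(M[r], M[i])]
    return [M[i][n] / M[i][i] for i in range(n)]


n, c, beta = 20, 1.0, 0.95                     # assumed values
w_grid = [0.5 + 0.1 * i for i in range(n)]     # hypothetical wage grid
e = [w / (1 - beta) for w in w_grid]

# Hypothetical reflecting random walk on the grid (not Tauchen's method).
P = [[0.0] * n for _ in range(n)]
for i in range(n):
    P[i][max(i - 1, 0)] += 0.25
    P[i][i] += 0.5
    P[i][min(i + 1, n - 1)] += 0.25


def cont_value(v):
    """c + beta * (P v), state by state."""
    return [c + beta * sum(p * x for p, x in zip(row, v)) for row in P]


def greedy(v):
    return [1 if e_i >= h_i else 0 for e_i, h_i in zip(e, cont_value(v))]


def policy_value(sigma):
    """Solve v = sigma e + (1 - sigma)(c + beta P v) for v_sigma."""
    A = [[(1.0 if i == j else 0.0) - beta * (1 - sigma[i]) * P[i][j]
          for j in range(n)] for i in range(n)]
    b = [s * e_i + (1 - s) * c for s, e_i in zip(sigma, e)]
    return solve(A, b)


def hpi(max_iter=100):
    """Howard policy iteration: evaluate, improve, stop when the policy repeats."""
    sigma = [1] * n                            # start from "always accept"
    for k in range(1, max_iter + 1):
        v = policy_value(sigma)
        new_sigma = greedy(v)
        if new_sigma == sigma:
            return v, sigma, k
        sigma = new_sigma
    raise RuntimeError("HPI did not terminate")


def vfi(tol=1e-10, max_iter=100_000):
    """Value function iteration from v = 0."""
    v = [0.0] * n
    for k in range(1, max_iter + 1):
        v_new = [max(e_i, h_i) for e_i, h_i in zip(e, cont_value(v))]
        if max(abs(a - b) for a, b in zip(v_new, v)) < tol:
            return v_new, k
        v = v_new
    raise RuntimeError("VFI did not converge")


v_hpi, sigma_hpi, k_hpi = hpi()
v_vfi, k_vfi = vfi()
print(f"HPI policy evaluations: {k_hpi}, VFI iterations: {k_vfi}")
```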

8.1.3.3 Persistent and Transient Components

Let’s now look at a more sophisticated wage offer process, with persistent and transient components. In particular, we assume that $(W_t)$ obeys

W_t = \exp(Z_t) + \exp(\mu + \sigma \zeta_t), \qquad \text{where} \quad Z_{t+1} = \rho Z_t + d + s \epsilon_{t+1}

(8.23)

Here $\mu, d \in \RR$ , $\sigma, s$ are positive, and $\rho \in (-1, 1)$ . The sequences $(\zeta_t)_{t \geq 1}$ and $(\epsilon_t)_{t \geq 1}$ are independent, IID, and standard normal. Thus, the persistent component $\exp(Z_t)$ and the transient component are lognormal. The model is otherwise unchanged. The state becomes $(w, z) \in (0,\infty) \times \RR$ and the Bellman equation is

v(w, z) = \max \left\{ \frac{w}{1-\beta}, c + \beta \, \EE_z v(w', z') \right\} .

(8.24)

Here $\EE_z$ is expectation conditional on $z$ . The expectation term can be written more explicitly as

\EE_z v(w', z') = \int v \left[ \exp(\rho z + d + s \epsilon) + \exp(\mu + \sigma \zeta), \rho z + d + s \epsilon \right] \, \phi(\diff \epsilon, \diff \zeta).

Here $z$ and the parameters are fixed and $\phi$ is the $N(0, I)$ distribution on $\RR^2$ .

Rather than analyzing this model directly, we can first reduce dimensionality by transforming it via continuation values, analogous to the technique we used in Section 8.1.2. As a first step, let $h(z)$ be the continuation value from (8.24):

h(z) \coloneq c + \beta \, \EE_z v(w', z') \qquad (z \in \RR).

(8.25)

(Notice that $h$ is a function now, as opposed to the IID setting of (8.10). This is not surprising, since the current state can be used to predict future wages, which in turn determine future value.)

Given $h$ , the Bellman equation can be written as $v(w, z) = \max \left\{w/(1-\beta), \, h(z) \right\}$ . Combining this with the definition of $h$ , we see that

h(z) = c + \beta \, \EE_z \max \left\{ \frac{w'}{1-\beta}, h(z') \right\} \qquad (z \in \RR).

(8.26)

(Note the similarity with (8.11).) The function $h$ is defined on all of $\RR$ , since this is the domain of $z$ . If we can obtain the solution $\hmax$ to this functional equation, we can use it to act optimally via the policy

\sigopt(w, z) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax(z) \right\}.

(8.27)

To formalize these ideas, we can construct an ADP such that the Bellman equation agrees with (8.26). To do so we take $\Sigma$ to be all Borel measurable functions sending $(w', z') \in (0, \infty) \times \RR$ to $\{0,1\}$ and, for each $\sigma \in \Sigma$ , we set

(\hat T_\sigma \, h)(z) \coloneq c + \beta \, \EE_z \left\{ \sigma(w', z') \frac{w'}{1-\beta} + (1- \sigma(w', z')) h(z') \right\} .

(8.28)

We take $\phi$ to be the stationary distribution of $(Z_t)$ and consider each $\hat T_\sigma$ as a mapping over all $h \in L_1(\phi) \coloneq L_1(\RR, \bB, \phi)$ . Let $\hat{\TT}$ be all such $\hat T_\sigma$ as $\sigma$ ranges over $\Sigma$ . The pair $(L_1(\phi), \hat{\TT})$ forms an ADP.

We can alternatively write $\hat T_\sigma h$ as $\hat T_\sigma \, h = m_\sigma + K_\sigma \, h$ , where

m_\sigma(z) \coloneq c + \beta \, \EE_z \left\{\sigma(w', z') \frac{w'}{1-\beta} \right\} \quad \text{and} \quad (K_\sigma \, h)(z) \coloneq \beta \, \EE_z \, (1- \sigma(w', z')) h(z').

Each $K_\sigma$ is a positive linear operator on $L_1(\phi)$ and, moreover, for the positive linear operator $K$ defined by

(K h)(z) \coloneq \beta \EE_z \, h(z') = \beta \EE h(\rho z + d + s \epsilon_{t+1}),

we have $0 \leq K_\sigma \, h \leq Kh$ . By Lemma A.5.32, the spectral radius of $K$ equals $\beta < 1$ . Hence, by Theorem 4.1.8, the fundamental optimality properties hold, and VFI, OPI and HPI all converge.

Let’s now characterize the Bellman operator, which is defined on $L_1(\phi)$ by $\hat Th = \bigvee_\sigma \, \hat T_\sigma \, h$ .

Solution to Exercise 8.1.23

Fix $g, h \in L_1(\phi)$ . By Jensen’s inequality and (8.15), we have

\begin{aligned} |(\hat Tg)(z) - (\hat Th)(z)| & \leq \beta \, \EE_z \left| \max \left\{ \frac{w'}{1-\beta}, g(z') \right\} - \max \left\{ \frac{w'}{1-\beta}, h(z') \right\} \right| \\ & \leq \beta \, \EE_z \left| g(z') - h(z') \right|. \end{aligned}

Let $Z$ be a draw from $\phi$ . Taking the expectation of the last inequality with $z = Z$ and using the fact that $\phi$ is stationary gives

\EE |(\hat Tg)(Z) - (\hat Th)(Z)| \leq \beta \EE \EE_Z \left| g(z') - h(z') \right| = \beta \EE \left| g(z') - h(z') \right|.

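Because $\phi$ is stationary, $z'$ is again distributed according to $\phi$, so the last expression equals $\beta \| g - h \|$, where $\| \cdot \|$ is the norm on $L_1(\phi)$. Hence $\hat T$ is a contraction of modulus $\beta$ on $L_1(\phi)$. Since $L_1(\phi)$ is complete, Banach’s contraction mapping theorem implies that $\hat T$ has a unique fixed point $\hmax$ in $L_1(\phi)$.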
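One way to approximate $\hmax$ in (8.26) numerically is to discretize $(Z_t)$ with Tauchen's method, sample the transient component by Monte Carlo, and iterate on the resulting finite-dimensional operator. The sketch below follows that route. All parameter values, the grid size, the number of draws, and the seed are assumptions, and `sig` stands in for the parameter $\sigma$ of (8.23) to avoid a clash with the policy.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tauchen(n, rho, d, s, width=3.0):
    """Discretize Z' = rho Z + d + s eps on n evenly spaced points (Tauchen's method)."""
    mean, std = d / (1 - rho), s / math.sqrt(1 - rho**2)
    step = 2 * width * std / (n - 1)
    z = [mean - width * std + i * step for i in range(n)]
    Q = []
    for z_i in z:
        m = rho * z_i + d
        upper = [norm_cdf((z_j + step / 2 - m) / s) for z_j in z]   # CDF at right cell edges
        upper[-1] = 1.0
        Q.append([upper[0]] + [upper[j] - upper[j - 1] for j in range(1, n)])
    return z, Q


# All values below are assumptions chosen for illustration.
beta, c = 0.95, 1.0
rho, d, s = 0.9, 0.0, 0.1          # persistent component
mu, sig = 0.0, 0.2                 # transient component
z_grid, Q = tauchen(15, rho, d, s)
rng = random.Random(0)
transient = [math.exp(mu + sig * rng.gauss(0.0, 1.0)) for _ in range(200)]


def T_hat(h):
    """Discretized right-hand side of (8.26)."""
    # inner[j] approximates E max{w'/(1-beta), h(z')} given z' = z_j
    inner = [sum(max((math.exp(z_j) + t) / (1 - beta), h_j) for t in transient)
             / len(transient) for z_j, h_j in zip(z_grid, h)]
    return [c + beta * sum(q * x for q, x in zip(row, inner)) for row in Q]


h = [0.0] * len(z_grid)
for _ in range(1_000):
    h_new = T_hat(h)
    err = max(abs(a - b) for a, b in zip(h_new, h))
    h = h_new
    if err < 1e-8:
        break


def sigma_star(w, z_index):
    """Policy (8.27) on the grid: accept when w/(1-beta) >= h(z)."""
    return 1 if w / (1 - beta) >= h[z_index] else 0


print([round(x, 2) for x in h])
```

With $\rho > 0$ the computed continuation value is nondecreasing in $z$, so the implied reservation wage $(1-\beta) h(z)$ rises with the persistent component.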

8.2 Extensions

In this section we extend the basic job search framework in several directions. Section 8.2.1 introduces nonlinear discounting, where the discount factor depends on the magnitude of continuation value. Section 8.2.2 treats nonlinear expectations via the Kreps–Porteus certainty equivalent. Section 8.2.3 considers job search with learning, where the offer distribution is unknown and the worker updates beliefs. Section 8.2.4 adds job separation risk.

8.2.1 Nonlinear Discounting

Next we consider a setting where discounting is a nonlinear function of continuation value. One motivation for this generalized setup is magnitude effects, under which, for some individuals, discount rates seem to decrease with the size of the reward (i.e., large rewards are discounted less, so the discount factor is increasing in the size of the reward; see, e.g., Green et al. (1997)). Our aim is to solve the job search problem again under this new setup.

We suppose that wage offers are $P$ -Markov on a Borel set $\Wsf \subset \RR_+$ and the value of continuing is given by

h(w) = c + \int \beta[ v(w') ] P(w, \diff w'),

(8.30)

where $\beta \colon \RR_+ \to \RR$ is a discount factor function. Given this nonlinear discounting formulation, the lifetime value of a constant wage stream that pays $w$ is $e(w)$ , where $e$ is a fixed point of the operator

(H g)(w) = w + \beta(g(w)) \qquad (w \in \Wsf).

(8.31)

(In the case where $\beta(x) = \beta x$ for some fixed $\beta \in (0,1)$ , we get $e(w) = w / (1-\beta)$ , so we recover the standard constant discount case.) To simplify the analysis, we assume that $\Wsf = [w_1, w_2]$ where $0 < w_1 < w_2$ . We also assume that $0 < c < w_1$ and $w_2 \geq 1 - b$ , so that the worst wage offer is better than unemployment compensation. For the discount factor function we set

\beta(x) \coloneq b F(x, \lambda) \quad \text{where } b \in (0,1) \text{ and } F(x, \lambda) = 1 - \exp(-\lambda x).

Obtaining optimality results for this model is not entirely trivial because the policy and Bellman operators are not, in general, contractions. This is due to the fact that $\beta$ can be steep close to zero, as shown in Figure 8.5. At the same time, $\beta$ is concave, which gives us some hope that we can use the concavity-based fixed point and optimality results from Section 4.1.3.

The discount function \beta for different choices of \lambda and b=0.99 — Figure 8.5:The discount function $\beta$ for different choices of $\lambda$ and $b=0.99$

We begin by analyzing $H$ in (8.31). As a first step, we set

V \coloneq [0 , \bar v] \quad \text{where} \quad \bar v \coloneq \frac{c + w_2}{1 - b}.

In this expression, $V$ is understood as an order interval in $b\Wsf$ . We endow $V$ with the pointwise partial order $\leq$ and the supremum norm.

In general, there is no closed-form expression for $e$ , but it can be computed numerically by iterating on $H$ .
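Since $H$ in (8.31) acts on each wage separately, the iteration reduces to a scalar recursion at each grid point. The following sketch starts from $g = 0$ and uses the discount function $\beta(x) = b(1 - \exp(-\lambda x))$ given above. The values $b = 0.99$ and $\lambda = 1$ and the wage grid on $[1, 2]$ are assumptions.

```python
import math

b, lam = 0.99, 1.0                 # assumed parameter values


def discount(x):
    """beta(x) = b * (1 - exp(-lam * x))."""
    return b * (1.0 - math.exp(-lam * x))


def lifetime_value(w, tol=1e-12, max_iter=10_000):
    """Iterate g <- (H g)(w) = w + beta(g) from g = 0, one wage at a time."""
    g = 0.0
    for _ in range(max_iter):
        g_new = w + discount(g)
        if abs(g_new - g) < tol:
            return g_new
        g = g_new
    raise RuntimeError("no convergence within max_iter iterations")


w_grid = [1.0 + 0.1 * i for i in range(11)]     # hypothetical W = [1, 2]
e = [lifetime_value(w) for w in w_grid]
for w, e_w in zip(w_grid, e):
    print(f"w = {w:.1f}: e(w) ~ {e_w:.4f}")
```

Because $\beta$ is increasing and bounded above by $b$, the iterates started at zero rise monotonically and stay below $w + b$.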

The policy set $\Sigma$ is all measurable $\sigma \colon \Wsf \to \{0,1\}$ . Each policy operator $T_\sigma$ becomes

(T_\sigma \, v)(w) = \sigma(w) \, e(w) + (1-\sigma(w)) \left[ c + \int \beta[ v(w') ] P(w, \diff w') \right].

Let $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ . Since every $T_\sigma$ is order-preserving, $(V, \TT)$ is an ADP. Extending our earlier analysis, a policy $\sigma$ is $v$ -greedy whenever

\sigma(w) \coloneq \1 \left\{ e(w) \geq c + \int \beta[v(w')] P(w, \diff w') \right\} \qquad (w \in \Wsf).

Since such a policy exists, the ADP $(V, \TT)$ is regular.

8.2.2 Nonlinear Expectations

In the last section we modified the job search model to include nonlinear discounting. Here we drop nonlinear discounting but assume instead that the job seeker uses a nonlinear expectation of future values. In particular,

T_\sigma \, v = \sigma e + (1-\sigma) ( c + \beta Rv)

where

(Rv)(w) \coloneq \left( \int v^{1-\gamma}(w') P(w, \diff w') \right)^{1/(1-\gamma)} \qquad (w \in \Wsf, \; \gamma \neq 1).

The operator $R$ is the Kreps–Porteus operator. The value $\gamma$ parameterizes risk aversion for the unemployed worker with respect to intertemporal gambles. When $\gamma = 0$ , we recover the standard linear expectation; when $\gamma > 0$ , the agent is risk-averse; when $\gamma < 0$ , the agent is risk-loving. The constant $\beta$ lies in $(0,1)$ . As before, $e(w) \coloneq w/(1-\beta)$ is the stopping reward. (Stopped rewards are deterministic so $e(w)$ is not affected by $\gamma$ .)
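On a finite wage chain the Kreps–Porteus operator is a row-wise power mean. The sketch below evaluates it for several values of $\gamma$, with a hypothetical stochastic matrix and value vector, and illustrates that $Rv$ coincides with $Pv$ at $\gamma = 0$ and falls as $\gamma$ rises.

```python
def kreps_porteus(v, P, gamma):
    """(Rv)(w_i) = (sum_j P[i][j] v_j^(1-gamma))^(1/(1-gamma)), for gamma != 1 and v > 0."""
    power = 1.0 - gamma
    return [sum(p * x**power for p, x in zip(row, v)) ** (1.0 / power) for row in P]


# Hypothetical three-state chain and a strictly positive value vector.
P = [[0.8, 0.2, 0.0],
     [0.1, 0.8, 0.1],
     [0.0, 0.2, 0.8]]
v = [10.0, 20.0, 30.0]

gammas = [-2.0, 0.0, 0.5, 2.0, 5.0]
for gamma in gammas:
    print(gamma, [round(x, 3) for x in kreps_porteus(v, P, gamma)])
```

The ordering across $\gamma$ is the power mean inequality: the mean with exponent $1 - \gamma$ is increasing in that exponent, so a more risk-averse agent assigns a lower certainty equivalent to the same gamble.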

The operator $T_\sigma$ and the operator $R$ act on the set

V \coloneq [c , \bar v] \quad \text{where} \quad \bar v \coloneq \frac{c + w_2}{1 - \beta}.

As in Section 8.2.1, $V$ is understood as an order interval in $b\Wsf$ and $0 < c < w_1 < w_2$ .

Letting $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ , we aim to show that $(V, \TT)$ is an ADP and, moreover, that the conditions of Theorem 4.1.11 hold. As a first step, we observe that, for fixed $\sigma \in \Sigma$ and for sufficiently small positive $\epsilon$ ,

(T_\sigma \, c)(w) \geq \min \left\{ \frac{c}{1-\beta}, \; c + \beta c \right\} \geq c + \epsilon (\bar v - c).

Also, for sufficiently small positive $\epsilon$ ,

(T_\sigma \, \bar v)(w) \leq \max \left\{ \frac{w_2}{1-\beta}, \; c + \beta \bar v \right\} \leq \max \left\{ \bar v - \frac{c}{1-\beta}, \; \bar v - w_2 \right\} \leq \bar v - \epsilon (\bar v - c).

Since $R$ and hence $T_\sigma$ is order preserving, these facts tell us that $T_\sigma$ maps $V$ to itself, so $(V, \TT)$ is an ADP.

Combining the last two $\epsilon$ bounds with fact (ii) above, we see that the conditions of Theorem 4.1.11 are satisfied for every $\gamma \in \RR \setminus \{1\}$ . As a result, the fundamental optimality properties hold and VFI, OPI, and HPI all converge.

Figure 8.6 shows the reservation wage $\bar w$ as a function of $\gamma$ . The figure was computed as follows. Fixing $\gamma$ , we calculated $\vmax$ via VFI, set

\sigma(w) \coloneq \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \left( \int [\vmax(w')]^{1-\gamma} P(w, \diff w') \right)^{1/(1-\gamma)} \right\}

and then set $\bar w \coloneq \min\{w \in \Wsf : \sigma(w) = 1\}$ . Aside from $\gamma$ , the parameters used in the calculations were the same as those given in Section 8.1.3.2. The figure shows that the reservation wage decreases with $\gamma$ , as the job seeker becomes progressively more risk-averse. Increasing risk aversion means that gambles over future payoffs become less attractive, which favors stopping over continuing. This encourages the job seeker to lower the reservation wage.

Figure 8.6: The reservation wage as a function of $\gamma$
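The steps used to produce the figure can be sketched in code. This is a minimal illustration rather than the code behind Figure 8.6: it assumes IID offers that are uniform on a finite grid (so that $P(w, \cdot)$ does not depend on $w$), and the wage grid and the values of `c` and `beta` are hypothetical stand-ins for the parameters of Section 8.1.3.2.

```python
import numpy as np

def reservation_wage(gamma, w, q, c=1.0, beta=0.95, tol=1e-10, max_iter=10_000):
    """Approximate the reservation wage by VFI (IID offers, gamma != 1)."""
    stop = w / (1 - beta)                    # value of accepting each offer
    v = np.full(w.shape, c)                  # start at the bottom of V = [c, v_bar]
    for _ in range(max_iter):
        # certainty equivalent of next-period value under offer probabilities q
        ce = (q @ v ** (1 - gamma)) ** (1 / (1 - gamma))
        v_new = np.maximum(stop, c + beta * ce)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    ce = (q @ v ** (1 - gamma)) ** (1 / (1 - gamma))
    accept = stop >= c + beta * ce           # the policy sigma defined in the text
    return w[accept].min()                   # smallest accepted offer

# Hypothetical primitives: 50 equally likely offers with 0 < c < w_1 < w_2
w_grid = np.linspace(2.0, 20.0, 50)
q = np.full(50, 1 / 50)
gammas = [-1.0, 0.0, 0.5, 2.0, 5.0]
res_wages = [reservation_wage(g, w_grid, q) for g in gammas]
```

Under these assumptions the list `res_wages` is weakly decreasing in $\gamma$, in line with the discussion above.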

8.2.3Job Search with Learning¶

Next we consider a variation of the job search model from §6.6 of Ljungqvist & Sargent (2018). The framework is the IID setting of Section 8.1.1.1, apart from the fact that the wage offer distribution $\phi$ is unknown to the worker. Instead, the agent learns about $\phi$ by starting with a prior belief and then successively updating her beliefs based on observed wage offers. The learning process and hence the stopping problem is similar to the one we analyzed in Section 1.4. One difference is that we will use discounting, which simplifies optimality results. Another is that we will use a transformation to reduce dimensionality.

8.2.3.1The Model¶

The structure of information is as follows: The worker knows there are two possible offer distributions, with densities $f$ and $g$ . At the start of time, nature selects $\phi$ to be either $f$ or $g$ , the wage distribution from which the entire sequence $(W_t)_{t \geq 0}$ will be drawn. This choice is not observed by the worker, who puts prior probability $\pi_0$ on $f$ being chosen. In other words, the worker’s initial guess of $\phi$ is $\pi_0 f(w) + (1 - \pi_0) g(w)$ . Beliefs subsequently update according to Bayes’ rule. Thus, the agent, having observed $W_{t+1}$ , updates $\pi_t$ to $\pi_{t+1}$ via

\pi_{t+1} = \frac{f(W_{t+1})\pi_t}{f(W_{t+1}) \pi_t + g(W_{t+1}) (1 - \pi_t)}.

(8.32)

(Note the connection to the learning dynamics we obtained for the sequential analysis problem in (1.58).)

Using (8.32), we can formulate an ADP representation of the optimal stopping problem. Dropping time subscripts, let $\phi_{\pi} \coloneq \pi f + (1 - \pi) g$ represent the estimate of the wage offer distribution given belief $\pi$ and let

\kappa(w, \pi) \coloneq \frac{\pi f(w)}{\pi f(w) + (1 - \pi) g(w)} \qquad (w \in (0, M), \; \pi \in (0, 1)).

In particular, $\kappa(w, \pi)$ is the updated value $\pi'$ of $\pi$ having observed draw $w$ . The state is $(w, \pi) \in (0, M) \times (0, 1)$ and $\pi$ is referred to as the belief state.
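The belief update is easy to experiment with numerically. The sketch below is only an illustration: it borrows the Beta densities specified later in (8.38), so that $M = 1$, and it assumes that nature has selected $f$. The names `kappa`, `f` and `g` mirror the notation above, while the seed and sample size are arbitrary choices.

```python
import numpy as np

def f(w):
    return 20 * w**3 * (1 - w)        # Beta(4, 2) density on (0, 1)

def g(w):
    return 20 * w * (1 - w)**3        # Beta(2, 4) density on (0, 1)

def kappa(w, pi):
    """Posterior probability on f after observing w, given prior pi."""
    return pi * f(w) / (pi * f(w) + (1 - pi) * g(w))

rng = np.random.default_rng(0)
pi = 0.5                              # prior belief pi_0
draws = rng.beta(4, 2, size=200)      # offers drawn from f (assumed for this demo)
for w in draws:
    pi = kappa(w, pi)                 # apply (8.32) once per observation
```

Offers above $0.5$ are more likely under $f$ than under $g$ and so raise the belief, while offers below $0.5$ lower it. After many draws from $f$ the belief should be close to one.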

The policy operators for this learning search problem take the form

(T_\sigma \, v)(w, \pi) = \sigma(w, \pi) \frac{w}{1 - \beta} + (1 - \sigma(w, \pi)) \left[ c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w' \right].

Each $T_\sigma$ acts on $v \in V$ , which we define as the set of bounded Borel measurable functions on $(0, M) \times (0, 1)$ . Let $\TT = \setntn{T_\sigma }{\sigma \in \Sigma}$ . Evidently $(V, \TT)$ is an ADP.

8.2.3.2An Efficient Solution Method¶

Rather than tackling $(V, \TT)$ directly, we will introduce a variation with a lower dimensional state space. To begin, fix $v \in V$ and let $\omega(\pi)$ be the corresponding reservation wage at belief state $\pi$ , which is the wage level at which the worker is indifferent between accepting and rejecting. This value satisfies

\frac{\omega(\pi)}{1 - \beta} = c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w'.

(8.34)

We combine (8.33) and (8.34) to obtain

v(w, \pi) = \max \left\{ \frac{w}{1 - \beta} ,\, \frac{\omega(\pi)}{1 - \beta} \right\}

and then use this expression to eliminate $v$ in (8.34), obtaining

\omega(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w'.

(8.35)

Equation (8.35) can be understood as a functional equation in $\omega$ . Equivalently, the map $\omega$ is the fixed point of the operator $\hat T$ given by

(\hat T \omega)(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w'.

(8.36)

When this fixed point is well-defined we call it the optimal reservation wage function. The value $\omega(\pi)$ will indicate the smallest wage offer at which the worker is willing to accept, given her current belief state $\pi$ .

Solution to Exercise 8.2.8

Let $\| \cdot \|$ be the supremum norm on $b(0,1)$ . First we show that $\hat T$ is a self-mapping on $b(0,1)$ . To this end, pick any $\omega \in b(0,1)$ and consider the function $\hat T\omega$ defined by (8.36). Evidently $\hat T\omega$ is Borel measurable. To see that this function is bounded, observe that, by the triangle inequality and the fact that $\phi_\pi$ is a density,

|(\hat T \omega)(\pi)| \leq (1 - \beta) c + \beta \max\{ M , \| \omega \| \}

(8.37)

The right hand side does not depend on $\pi$ so $\hat T\omega$ is bounded as claimed.

Next let’s establish the contraction property. Fix $\omega, \psi \in b(0,1)$ and $\pi \in (0, 1)$ . Using the triangle inequality for integrals and the bound (8.15) yields

|(\hat T \omega)(\pi) - (\hat T \psi)(\pi)| \leq \beta \int \left| \omega [ \kappa(w', \pi) ] - \psi [ \kappa(w', \pi) ] \right| \, \phi_{\pi}(w') \, \diff w' \leq \beta \| \omega - \psi \|.

Taking the supremum over $\pi$ gives $\|\hat T \omega - \hat T \psi\| \leq \beta \| \omega - \psi \|$ .

Solution to Exercise 8.2.9

Let $\omega$ be bounded and continuous on $(0,1)$ . To show that $\hat T \omega$ is continuous, we need to prove that

\int \max \left\{ w', \omega [ \kappa(w', \pi_n) ] \right\} \, \phi_{\pi_n}(w') \, \diff w' \to \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w'

when $(\pi_n)$ is a sequence converging to $\pi \in (0, 1)$ . For fixed $w'$ , both $\kappa(w', \pi)$ and $\phi_\pi(w')$ are continuous in $\pi$ , so, by the dominated convergence theorem, it suffices to show that

H_n(w') \coloneq \max \left\{ w', \omega [ \kappa(w', \pi_n) ] \right\} \, \phi_{\pi_n}(w')

satisfies $\sup_n |H_n(w')| \leq H(w')$ for some $H \colon [0, M] \to \RR$ with $\int H(w') \diff w' < \infty$ . Such an $H$ does indeed exist: one suitable choice is

H(w') \coloneq \max \left\{ M, \| \omega \| \right\} \, (f(w') + g(w')) .

8.2.3.3Parametric Monotonicity¶

Let’s try computing the optimal reservation wage function using the ideas described above. The wage offer distributions are set to

f = \text{Beta}(4, 2) \quad \text{and} \quad g = \text{Beta}(2, 4),

(8.38)

as shown in Figure 8.7. The other parameters are $c=0.1$ and $\beta = 0.95$ . Since $\hat T$ is a contraction of modulus $\beta$ on $\hat V$ , a unique solution $\omegaopt$ to the reservation wage functional equation exists in $\hat V$ and $\hat T^k \omega \to \omegaopt$ uniformly as $k \to \infty$ , for any $\omega \in \hat V$ . Figure 8.8 shows the result of this iteration, the optimal reservation wage, as a function of $\pi$ , the belief state.

Figure 8.7: The two unknown densities $f$ and $g$
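The iteration $\hat T^k \omega$ can be sketched as follows. This is one possible discretization, not necessarily the one used for Figure 8.8: the grid sizes, the linear interpolation of $\omega$ over beliefs, and the midpoint quadrature with normalized weights are all assumptions made for illustration.

```python
import numpy as np

c, beta = 0.1, 0.95

n_w, n_pi = 200, 50
w = (np.arange(n_w) + 0.5) / n_w              # midpoints of (0, 1), so f, g > 0
f_w = 20 * w**3 * (1 - w)                     # Beta(4, 2) density
g_w = 20 * w * (1 - w)**3                     # Beta(2, 4) density
f_p, g_p = f_w / f_w.sum(), g_w / g_w.sum()   # quadrature weights summing to one

pi_grid = np.linspace(1e-3, 1 - 1e-3, n_pi)

# kappa[i, j] = updated belief from pi_grid[i] after observing w[j]
num = pi_grid[:, None] * f_w[None, :]
kappa = num / (num + (1 - pi_grid[:, None]) * g_w[None, :])
# phi[i, j] = probability placed on w[j] under belief pi_grid[i]
phi = pi_grid[:, None] * f_p[None, :] + (1 - pi_grid[:, None]) * g_p[None, :]

def T_hat(omega):
    """Apply (8.36) on the grid, interpolating omega linearly in pi."""
    omega_next = np.interp(kappa, pi_grid, omega)        # omega[kappa(w', pi)]
    integrand = np.maximum(w[None, :], omega_next)
    return (1 - beta) * c + beta * np.sum(integrand * phi, axis=1)

omega = np.full(n_pi, c)                      # any bounded initial guess works
for _ in range(2000):
    omega_new = T_hat(omega)
    done = np.max(np.abs(omega_new - omega)) < 1e-8
    omega = omega_new
    if done:
        break
```

Because the discretized operator remains a contraction of modulus $\beta$, the loop stops after a few hundred iterations. The array `omega` then approximates $\omegaopt$ on `pi_grid`.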

Note that the optimal reservation wage function $\omegaopt$ in Figure 8.8 is increasing in $\pi$ . This result seems reasonable: In Figure 8.7, the density $f$ puts more mass on higher draws, so, as our belief shifts toward $f$ , our reservation wage should increase. The next proposition gives conditions for such monotonicity.

Proof

Let $ib(0,1)$ be all increasing functions in $b(0,1)$ . As $ib(0,1)$ is closed in $b(0,1)$ (see, e.g., Exercise A.1.8), it suffices to show that $\hat T$ is invariant on $ib(0,1)$ . So pick any $\omega \in ib(0,1)$ . Since $\hat T$ maps $b(0,1)$ to itself, we need only show that $\hat T\omega$ is increasing. For this it suffices to show that, with

h(w', \pi) \coloneq \omega \left[ \frac{\pi f(w')}{\pi f(w') + (1 - \pi) g(w')} \right]

the function

\pi \mapsto \int \max \left\{ w', h (w', \pi) \right\} \, \phi_{\pi}(w') \, \diff w'

is increasing. This will be true if we can establish that (i) $h$ is increasing in both $\pi$ and $w'$ , and (ii) the map $\pi \mapsto \phi_\pi$ is isotone with respect to $\lefsd$ . To see that (i) holds, write $h$ as

h(w', \pi) = \omega \left[ \frac{1}{1 + [(1 - \pi)/ \pi] [g(w')/f(w')]} \right]

Since $\omega$ is increasing, this expression is increasing in $\pi$ . Also, $f$ and $g$ are assumed to have the monotone likelihood ratio property, which means that $g(w')/f(w')$ is decreasing in $w'$ , and hence $h(w', \pi)$ is increasing in $w'$ . Thus, condition (i) is established.

Condition (ii) follows from Proposition A.5.34, along with the result of Exercise 8.2.10. ◻

8.2.4Job Search with Separation¶

We consider a version of the job search model from Section 8.1.3.1 where separation can occur. In particular, an existing match between worker and firm dissolves with probability $\alpha$ every period. Note that this discussion extends a treatment of a similar model in a finite-state setting from Chapter 3 of Sargent & Stachurski (2025).

To simplify the discussion, we assume that the set of possible wage offers $\Wsf \subset \RR_+$ is bounded above by some positive constant $M$ . The state space for the problem is $\Xsf \coloneq \{e, u\} \times \Wsf$ , with a typical element $(s, w)$ denoting employment status $s$ (here $e$ means employed and $u$ means unemployed), and current offer $w$ . A policy is a Borel measurable map $\sigma \colon \Wsf \to \{0,1\}$ , where, as usual, $\sigma(w)=0$ means “reject the current offer” and $\sigma(w)=1$ means “accept.”

The wage offer sequence $(W_t)$ is assumed to be $P$ -Markov on $\Wsf$ . The value space $V$ will be all bounded and Borel measurable $v \colon \Xsf \to \RR$ and we endow $V$ with the supremum norm and the pointwise partial order.

The policy operators take the form

(T_\sigma \, v)(e, w) = w + \beta \left[ \alpha \int v(u, w') P(w, \diff w') + (1-\alpha) v(e, w) \right]

(8.39)

and

(T_\sigma \, v)(u, w) = \sigma(w) v(e, w) + (1 - \sigma(w)) \left[ c + \beta \, \int v(u, w') P(w, \diff w') \right].

(8.40)

The right-hand side of the first expression is the current value of being employed with offer $w$ in hand, given the continuation values embodied in $v$ . The right-hand side of the second expression is the current value of being unemployed with offer $w$ in hand, conditional on using policy $\sigma$ .

We can solve this problem directly by setting up the corresponding ADP, but we can also start by simplifying the value space in a way we now describe. This will make the analysis easier and help with computation. The first step is to regard (8.39) as a fixed point problem, replacing $(T_\sigma \, v)(e, w)$ with $v(e, w)$ on the left hand side and treating $v(u, \cdot)$ as given. Simple algebra then gives

v(e, w) = \frac{1}{1 - \beta(1-\alpha)} \left[ w + \alpha \beta \int v(u, w') P (w, \diff w') \right].

(8.41)

Let’s write this in operator notation. In doing so, we will rewrite $v(u, \cdot)$ as $v_u$ and $v(e, \cdot)$ as $v_e$ . Setting

h(w) \coloneq \frac{1}{1 - \beta(1-\alpha)} w, \quad \text{and} \quad \gamma \coloneq \frac{\alpha \beta}{1 - \beta(1-\alpha)},

we have $v_e = h + \gamma P v_u$ . We substitute this expression into (8.40) to get

T_\sigma \, v_u = \sigma(h + \gamma Pv_u) + (1 - \sigma) (c + \beta P v_u).

(8.42)

We take $b\Wsf$ as the value space and let $\TT = \setntn{T_\sigma}{\sigma \in \Sigma}$ . As before, $\Sigma$ is all Borel measurable maps from $\Wsf$ to $\{0,1\}$ . Recalling that $\Wsf$ is bounded above, one can easily confirm that $T_\sigma$ maps $b\Wsf$ to itself. Clearly $T_\sigma$ is order preserving. Hence $(b\Wsf, \TT)$ is an ADP.

Given $v_u \in b\Wsf$ , set $v_e = h + \gamma P v_u$ and consider the policy $\sigma$ defined by

\sigma(w) \coloneq \1 \left\{ v_e(w) \geq c + \beta \, \int \, v_u(w') P(w, \diff w') \right\} \quad \text{for all } w \in \Wsf.

(8.43)

We claim that $\sigma$ is $v_u$ -greedy. Indeed, for such a $\sigma$ and any alternative policy $s$ we have

T_s \, v_u = s(h + \gamma Pv_u) + (1 - s) (c + \beta P v_u) \leq (h + \gamma Pv_u) \vee (c + \beta P v_u) = T_\sigma \, v_u .

The expression for $\sigma$ in (8.43) is natural because it tells the worker to accept employment whenever its value is higher than the expected present value of continuing, given the continuation value for unemployment associated with $v_u$ .

Solution to Exercise 8.2.12

Let $K \coloneq \beta P$ and $J \coloneq \gamma P$ . Fix $f, g \in b\Wsf$ and $\sigma \in \Sigma$ . Pointwise on $\Wsf$ , we have

|T_\sigma \, f - T_\sigma \, g| = |\sigma J (f - g) + (1 - \sigma) K(f - g)| \leq |J (f - g)| \vee |K(f - g)|.

But $|J (f - g)| \leq \gamma P |f-g|$ and hence

|J (f - g)| \leq \gamma \| f - g\|

A similar argument shows that $|K (f - g)| \leq \beta \| f - g\|$ . Hence

|T_\sigma \, f - T_\sigma \, g| \leq (\beta \| f - g\|) \vee (\gamma \| f - g\|) = (\beta \vee \gamma) \| f - g \|.

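Since $\beta < 1$ and $\gamma < 1$ (the latter because $\alpha \beta < \alpha \beta + (1 - \beta) = 1 - \beta(1 - \alpha)$ ), the operator $T_\sigma$ is a contraction of modulus $\beta \vee \gamma$ .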

Regarding optimality, we have the following result.

The value function $\vmax_u$ for an unemployed worker satisfies the recursion

\vmax_u(w) = \max \left\{ \vmax_e(w) ,\, c + \beta \, \int \vmax_u(w') P(w, \diff w') \right\} \qquad (w \in \Wsf),

(8.44)

where $\vmax_e$ is the value function for an employed worker, that is, the lifetime value of a worker who starts the period employed at wage $w$ . Value function $\vmax_e$ satisfies

\vmax_e(w) = w + \beta \left[ \alpha \int \vmax_u(w') P(w, \diff w') + (1-\alpha) \vmax_e(w) \right] \qquad (w \in \Wsf).

(8.45)

This equation states that the value accruing to an employed worker is the current wage plus the discounted expected value of being either employed or unemployed next period.

We claim that, when $0 < \alpha, \beta < 1$ , the system (8.44)–(8.45) has a unique solution $(\vmax_e, \vmax_u)$ in $b\Wsf \times b\Wsf$ .

Solving (8.45) for $\vmax_e$ , as in (8.41), and substituting the result into (8.44) yields

\vmax_u(w) = \max \left\{ \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P \vmax_u)(w) \right) ,\, c + \beta \, (P \vmax_u)(w) \right\}.

(8.46)

The stopping and continuation values are given by

\sopt(w) \coloneq \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P \vmax_u)(w) \right) \quad \text{and} \quad \hmax(w) \coloneq c + \beta \, (P \vmax_u)(w)

respectively, for each $w \in \Wsf$ . The value function $\vmax_u$ is the pointwise maximum (i.e., $\vmax_u = \sopt \vee \hmax$ ). The worker’s optimal policy while unemployed is

\sigopt(w) \coloneq \1\{\sopt(w) \geq \hmax(w)\}.
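The objects above can be computed by successive approximation. The following is a minimal sketch under stated assumptions: wages live on a finite grid, the Markov matrix `P` mixes persistence with uniform redraws, and all parameter values are hypothetical. None of these choices come from the text.

```python
import numpy as np

alpha, beta, c = 0.1, 0.95, 1.0               # hypothetical parameter values
w = np.linspace(0.5, 2.0, 40)                 # hypothetical bounded wage grid
n = len(w)
rho = 0.6                                     # offer persistence, for illustration
P = rho * np.eye(n) + (1 - rho) * np.full((n, n), 1 / n)   # rows sum to one

h = w / (1 - beta * (1 - alpha))
gamma = alpha * beta / (1 - beta * (1 - alpha))

def T(v_u):
    """Bellman operator: pointwise max of stopping and continuation values."""
    Pv = P @ v_u
    return np.maximum(h + gamma * Pv, c + beta * Pv)

v_u = np.zeros(n)
for _ in range(5000):
    v_new = T(v_u)
    done = np.max(np.abs(v_new - v_u)) < 1e-10
    v_u = v_new
    if done:
        break

Pv = P @ v_u
v_e = h + gamma * Pv                          # employed value, as in (8.41)
sigma = (v_e >= c + beta * Pv).astype(int)    # 1 = accept, 0 = reject
```

Iterating on $v_u$ alone halves the size of the problem relative to working with $(v_e, v_u)$ jointly, since $v_e$ is recovered afterwards from (8.41).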

8.3More Applications¶

In this section we present several additional applications of the theory developed in earlier chapters. Section 8.3.1 studies problems with negative discounting and connects them to Coase’s theory of the firm. Section 8.3.2 treats an optimal harvest problem and shows how factorization reduces dimensionality. Section 8.3.3 develops Euler equation methods for continuous choice problems, and Section 8.3.4 establishes the equivalence of value function iteration and time iteration.

8.3.1Coase Meets Bellman¶

In this section, we study a production chain model that can capture the key idea from Coase’s classic study on the nature of the firm (Coase, 1937). We then show how the equilibrium price function can also be recovered as the solution to a dynamic programming problem. The dynamic programming problem is itself interesting: it analyzes loss minimization with negative discounting, a scenario that appears to have significant empirical relevance. Moreover, negative discounting makes the application of dynamic programming nontrivial. We solve the problem using order stability arguments combined with the theory developed in Section 2.2.2.

8.3.1.1A Production Chain¶

Coase (1937) argues that firms have nontrivial size because of transaction costs associated with using the market. (One example is the cost of negotiating, drafting, monitoring and enforcing contracts with suppliers. Others include search frictions, transaction fees, taxes, and information costs; see, e.g., Williamson (1979), North (1993), and Blume et al. (2009).) Because of these costs, entrepreneurs and managers can sometimes coordinate production at a lower cost within the firm.

At the same time, a countervailing force, referred to by Coase (1937) as “diminishing returns to management,” prevents firms expanding without limit. Rising costs per task can be thought of as driven by the expanding informational requirements associated with larger planning problems, leading to progressively higher management costs, incentive problems and misallocation of resources. The size of firms is determined by the trade-off associated with these two forces (transaction costs and diminishing returns to management).

Here we discuss a model that captures these ideas. In the model, firms produce a unit of a final good via sequential completion of processing stages. Stages are indexed by $t \in [0,1]$ , with $t=1$ indicating that the good is complete. One allocation of tasks is illustrated in Figure 8.9. In this example, firm 1 sells one unit of the completed good to a final buyer. Firm 1 then contracts with firm 2 to purchase the partially completed good at stage $t_1$ , with the intention of implementing the remaining $1 - t_1$ tasks in-house (i.e., processing from stage $t_1$ to stage 1). Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at stage $t_2$ . Firm 3 decides to complete the chain, selecting $t_3 = 0$ . The value $t_i$ is called the upstream boundary of firm $i$ .

Figure 8.9: Recursive allocation of production tasks

Production now unfolds from upstream to downstream. First, firm 3 completes processing stages from $t_3 = 0$ up to $t_2$ and transfers the good to firm 2. Firm 2 then processes from $t_2$ up to $t_1$ and transfers the good to firm 1, who processes from $t_1$ to 1 and delivers the completed good to the final buyer. In what follows, the length of the interval of stages carried out by firm $i$ is denoted by $a_i$ and referred to as the range of tasks carried out by firm $i$ . Figure 8.10 helps to clarify notation.

There is a countable infinity of ex ante identical firms and no fixed costs or barriers to entry. An allocation is a nonnegative sequence $A = \{a_i\}_{i \in \NN}$ . An allocation $A$ defines a division of tasks across firms, with $a_i$ being the range of tasks implemented by the $i$ -th firm. If $a_i = 0$ then firm $i$ is understood to be inactive. An allocation $A$ is called feasible if there exists some finite $I$ with $\sum_{i = 1}^I a_i = 1$ . Feasibility means that the entire production process is completed by finitely many firms. Given a feasible allocation $A$ , let $\{t_i\}$ represent the corresponding transaction stages, defined by $t_0 = 1$ and $t_i = t_{i-1} - a_i$ . In particular, $t_{i-1}$ is the downstream boundary of firm $i$ and $t_i$ is its upstream boundary (as in Figure 8.10).
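The mapping from a feasible allocation to its transaction stages is just a running subtraction. A small sketch, where the function name `transaction_stages` and the example allocation are hypothetical:

```python
import numpy as np

def transaction_stages(a):
    """Return (t_0, t_1, ..., t_I) for a feasible allocation a = (a_1, ..., a_I)."""
    a = np.asarray(a, dtype=float)
    if np.any(a < 0) or not np.isclose(a.sum(), 1.0):
        raise ValueError("allocation must be nonnegative and sum to one")
    # t_0 = 1 and t_i = t_{i-1} - a_i, i.e., t_i = 1 - (a_1 + ... + a_i)
    return 1.0 - np.concatenate(([0.0], np.cumsum(a)))

# Three active firms: firm 1 covers [0.5, 1], firm 2 [0.2, 0.5], firm 3 [0, 0.2]
t = transaction_stages([0.5, 0.3, 0.2])
```

Here `t[i - 1]` is the downstream boundary of firm $i$ and `t[i]` is its upstream boundary.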

Firms face a price function $p \colon [0, 1] \to \RR_+$ , with $p(t)$ indicating the price of the good at stage $t$ . Since the $i$ -th firm purchases the good at stage $t_i$ , sells it at stage $t_{i-1}$ , and undertakes the remaining $a_i$ tasks in-house, its total costs are its processing costs $c(a_i)$ plus gross input cost $\delta p(t_i)$ . The term $\delta > 1$ represents transaction costs. Transaction costs are incurred only by the buyer (a simplifying assumption), so its profits are

\pi_i = p(t_{i-1}) - c(a_i) - \delta p(t_i) .

(8.47)

Diminishing returns to management are implemented by assuming that $c$ is increasing and strictly convex. We also assume that $c$ is continuously differentiable, with $c(0)=0$ and $c'(0) > 0$ .

Condition (i) is a zero profit condition for suppliers of initial inputs (which are costless). Condition (ii) states that all active firms make zero profits. Condition (iii) ensures that no firm in the production chain has an incentive to deviate, and that no inactive firm can enter and extract positive profits.

To construct an equilibrium we introduce the operator $T$ mapping $p$ to $Tp$ via

(Tp)(s) = \min_{t \leq s} \, \{ c(s-t) + \delta p(t) \} \qquad (0 \leq s \leq 1).

(8.48)

Here and below, the restriction $0 \leq t$ in the minimum is understood. Since $\delta > 1$ , the map $T$ is not a contraction in any obvious metric, and $T^n p$ diverges for many choices of $p$ , even when continuous and bounded.^[1] Nevertheless, there exists a domain on which $T$ is well-behaved: the set of convex increasing continuous functions $p \colon [0,1] \to \RR$ such that $c'(0) s \leq p(s) \leq c(s)$ for all $0 \leq s \leq 1$ . We denote this set of functions by $\pP$ .

This result is proved in Kikuchi et al. (2018). In Section 8.3.1.2 below, we give an alternative proof by constructing a dynamic program whose value function coincides with $p^*$ .

The significance of $p^*$ is that there exists an allocation $A^*$ such that $(p^*, A^*)$ is an equilibrium for the production chain in the sense of Definition 8.3.1. To construct this allocation, we introduce the equilibrium choice function

t^*(s) \coloneq \argmin_{ t \leq s} \{c(s - t) + \delta p^*(t) \}.

(8.49)

By definition, $t^*(s)$ is the cost minimizing upstream boundary for a firm contracted to deliver the good at stage $s$ and facing the price function $p^*$ . Since $p^*$ lies in $\pP$ and since $c$ is strictly convex, the minimizer $t^*(s)$ exists and is uniquely defined.

We can use $t^*$ to construct an equilibrium allocation. The optimal upstream boundary of firm 1 is $t^*(1)$ . Hence firm 2’s optimal upstream boundary is $t^*(t^*(1))$ . Continuing in this way produces the sequence $\{t^*_i\}$ defined by

t_0^* = 1 \quad \text{and} \quad t_i^* = t^*(t^*_{i-1}) .

(8.50)

The sequence ends when a firm chooses to complete all remaining tasks. We label this firm (and hence the number of firms in the chain) as $n^*$ , defined by $n^* \coloneq \inf\setntn{i \in \NN}{t^*_i = 0}$ . The task allocation corresponding to (8.50) is given by $a_i^* \coloneq t_{i-1}^* - t_i^*$ for all $i$ . The function $p^*$ is called the equilibrium price function and $A^*$ is called the equilibrium allocation. Kikuchi et al. (2018) prove the following result:

The idea of the proof is as follows. As a fixed point of $T$ , the equilibrium price function satisfies

p^*(s) = \min_{t \leq s} \, \{ c(s - t) + \delta p^*(t) \} \quad \text{for all} \quad s \in [0, 1].

(8.51)

From this equation it is clear that $p^*$ satisfies part (iii) of Definition 8.3.1. Moreover, the equilibrium upstream boundary for firm $i$ is the minimizer in (8.51) when $s$ is its downstream boundary, so profits are zero for all incumbent firms. Hence part (ii) of Definition 8.3.1 is satisfied. Part (i) of the definition is immediate from the fact that $p^* \in \pP$ , whence we obtain $p^*(0) \leq c(0) = 0$ . More details can be found in Kikuchi et al. (2018).

The first order condition for the minimization in (8.51) is

\delta (p^*)'(t^*(s)) = c'(s - t^*(s)).

The left-hand side is the marginal cost of purchasing an additional unit of input through the market (including the transaction cost markup $\delta$ ), while the right-hand side is the marginal cost of performing one more task in-house. At the optimum, these two forces balance, just as Coase (1937) argued verbally: “a firm will tend to expand until the costs of organizing an extra transaction within the firm become equal to the costs of carrying out the same transaction by means of an exchange on the open market.” For more discussion (and a proof of the differentiability of $p^*$ ), see Kikuchi et al. (2018).

Figure 8.11 shows equilibrium prices and allocations for two values of the transaction cost parameter, using the exponential cost function $c(a) = e^{\theta a} - 1$ with $\theta=10$ . The vertical lines are firm boundaries, computed via (8.50). When $\delta = 1.02$ (top panel), transaction costs are low and we see many small firms. When $\delta = 1.2$ (bottom panel), higher transaction costs encourage less use of the market and more in-house production, yielding fewer, larger firms.

Figure 8.11: Firm boundaries for $\delta = 1.02$ (top) and $\delta = 1.2$ (bottom)
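A rough way to reproduce this kind of calculation is to iterate $T$ on a grid, starting from $p = c$, and then follow (8.50). The sketch below assumes a uniform grid of 1001 stages and restricts the minimization in (8.48) to grid points. These are illustrative choices, and the function name `solve_chain` is hypothetical.

```python
import numpy as np

def solve_chain(delta, theta=10.0, n=1001, tol=1e-10, max_iter=1000):
    """Iterate T from p = c on a grid; return grid, prices and firm boundaries."""
    s = np.linspace(0.0, 1.0, n)
    gap = s[:, None] - s[None, :]                  # gap[i, j] = s_i - t_j
    # c(s - t) where t <= s, and +inf where t > s (infeasible)
    cost = np.where(gap >= 0, np.exp(theta * np.maximum(gap, 0.0)) - 1.0, np.inf)
    p = np.exp(theta * s) - 1.0                    # initial guess p = c
    for _ in range(max_iter):
        p_new = np.min(cost + delta * p[None, :], axis=1)
        done = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if done:
            break
    t_star = np.argmin(cost + delta * p[None, :], axis=1)   # grid index of t*(s)
    # follow (8.50) from s = 1 upstream until some firm chooses t = 0
    idx, bounds = n - 1, [1.0]
    while idx > 0 and len(bounds) <= n:
        idx = t_star[idx]
        bounds.append(s[idx])
    return s, p, np.array(bounds)

s, p_low, b_low = solve_chain(delta=1.02)
_, p_high, b_high = solve_chain(delta=1.2)
n_low, n_high = len(b_low) - 1, len(b_high) - 1    # number of active firms
```

With these settings `n_low` exceeds `n_high`, matching the comparison between the two panels. The exact counts depend on the grid, since the smallest upstream firms are comparable in size to the grid step when $\delta = 1.02$.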

8.3.1.2Optimality with Negative Discounting¶

Interestingly, we can write down a dynamic program such that the value function corresponds with the equilibrium price function defined above. The dynamic program involves negative discounting and is itself useful and informative. Some background discussion and motivation is provided in Section 8.4.

In the model, an agent is faced with a task of measure $\hat x > 0$ . At time $t$ they have $x_t$ units of the task remaining. They then take action $a_t \geq 0$ and incur loss $c(a_t)$ . The state moves to $x_t - a_t$ . We interpret $a_t$ as effort and $c(a_t)$ as disutility. The optimization problem is

\min_{\{a_t\}} \; \sum_{t=0}^{\infty} \delta^t c(a_t) \;\; \text{ s.t. } \;\; \sum_{t=0}^{\infty} a_t = \hat x.

(8.52)

Throughout this section, we suppose that $\delta > 1$ , that $c(0) = 0$ , and that $c$ is continuously differentiable, strictly increasing and strictly convex. The convexity in $c$ encourages the agent to defer some effort. Negative discounting ( $\delta > 1$ ) has the opposite effect: the agent wants to get the task “over and done with.” This trade-off determines the optimum.

We also assume that $c'(0) > 0$ and that there exists an $\eta \in (0, \hat x)$ satisfying

c'(\eta) = \delta c'(0).

(8.53)

Such an $\eta$ exists and is unique whenever $\hat x$ is large enough, since $c'$ is continuous and strictly increasing with $c'(0) < \delta c'(0)$ .
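For the exponential cost function $c(a) = e^{\theta a} - 1$ used in Figure 8.11, condition (8.53) reads $\theta e^{\theta \eta} = \delta \theta$, so $\eta = \ln(\delta) / \theta$. The snippet below checks this numerically. The value $\theta = 10$ is taken from that example, while $\delta = 1.2$ is simply one of the two values used there.

```python
import math

theta, delta = 10.0, 1.2

def c_prime(a):
    """Derivative of c(a) = exp(theta * a) - 1."""
    return theta * math.exp(theta * a)

eta = math.log(delta) / theta        # closed-form solution of (8.53)
residual = c_prime(eta) - delta * c_prime(0.0)
```

In this case $\eta$ lies in $(0, \hat x)$ whenever $\hat x > \ln(\delta)/\theta$, which makes precise what "large enough" means for this cost function.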

Let $V$ be all increasing $v \colon [0, \hat x] \to \RR$ with $v(0)=0$ . The policy operators for this dynamic program take the form

(T_\sigma \, v)(x) = c(x - \sigma(x)) + \delta v(\sigma(x)) \qquad (v \in V, \; 0 \leq x \leq \hat x).

Here a policy is a function $\sigma \colon [0, \hat x] \to [0, \hat x]$ satisfying

(i) $0 \leq \sigma(x) \leq x$ for all $x$ ,
(ii) $\sigma$ is increasing,
(iii) the effort level $\pi(x) \coloneq x - \sigma(x)$ is increasing, and
(iv) $\sigma(x) = 0$ if and only if $x \leq \eta$ .

Let $\Sigma$ be the set of all such policies.

Note that (ii) and (iii) together imply that $\sigma$ is 1-Lipschitz, and hence continuous. Note also that $T_\sigma v \in V$ whenever $\sigma \in \Sigma$ and $v \in V$ . This follows from $(T_\sigma v)(0) = c(0) + \delta v(0) = 0$ and the fact that $T_\sigma v$ is increasing, which is true because $\sigma$ , $\pi$ , $c$ and $v$ are all increasing.

Let $\TT$ be the set of all policy operators. Each $T_\sigma$ is an order-preserving self-map on $V$ , so $(V, \TT)$ is an ADP. The next exercise characterizes iterates of $T_\sigma$ and can be solved by induction.

Solution to Exercise 8.3.1

Induction on $k$ . The base case $k=1$ is the definition of $T_\sigma$ . Assuming (8.54) holds for $k$ ,

\begin{aligned} (T_\sigma^{k+1} v)(x) &= c(\pi(x)) + \delta \, (T_\sigma^k v)(\sigma(x)) \\ &= c(\pi(x)) + \delta \sum_{j=0}^{k-1} \delta^j \, c(\pi(\sigma^{j+1}(x))) + \delta^{k+1} v(\sigma^{k+1}(x)) \\ &= \sum_{j=0}^{k} \delta^j \, c(\pi(\sigma^j(x))) + \delta^{k+1} v(\sigma^{k+1}(x)). \end{aligned}

Somewhat surprisingly, each $T_\sigma$ is well-behaved on $V$ . The next lemma gives details.

Proof

Fix $\sigma \in \Sigma$ . Condition (iii) yields $\pi(x) \geq \pi(\eta) = \eta$ for all $x \geq \eta$ , so $\sigma(x) \leq x - \eta$ whenever $x \geq \eta$ , and hence $\sigma^k(x) = 0$ for all $k \geq \lceil x / \eta \rceil$ . Setting $k_0 \coloneq \lceil \hat x / \eta \rceil$ and using (8.54) with $v(0) = 0$ gives

(T_\sigma^k v)(x) = \sum_{j=0}^{k_0-1} \delta^j \, c(\pi(\sigma^j(x))) \eqcolon v_\sigma(x) \qquad \text{for all } k \geq k_0.

(8.55)

The right-hand side is independent of $v$ , so all orbits reach $v_\sigma$ in at most $k_0$ steps. In particular, taking $v = v_\sigma$ gives $T_\sigma^k v_\sigma = v_\sigma$ for all $k \geq k_0$ , and hence also for $k = 1$ (since $T_\sigma v_\sigma = T_\sigma^{k_0+1} v_\sigma = v_\sigma$ , using $k_0 + 1 \geq k_0$ ). Thus $v_\sigma$ is a fixed point, lying in $V$ since $v_\sigma(0) = 0$ and each term $\delta^j c(\pi(\sigma^j(\cdot)))$ is increasing. If $v \in V$ is any other fixed point, then $v = T_\sigma^{k_0} v = v_\sigma$ . ◻
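The finite-step convergence in this proof can be seen in a small experiment. The sketch assumes the exponential cost $c(a) = e^{\theta a} - 1$ with hypothetical values $\theta = 1$, $\delta = 1.2$ and $\hat x = 1$, and uses the particular policy $\sigma(x) = \max\{x - \eta, 0\}$, which satisfies the four conditions defining $\Sigma$. The two initial functions are arbitrary elements of $V$.

```python
import math
import numpy as np

theta, delta, x_hat = 1.0, 1.2, 1.0           # hypothetical parameter values
eta = math.log(delta) / theta                 # solves c'(eta) = delta * c'(0)
k0 = math.ceil(x_hat / eta)                   # the bound from the proof

def c(a):
    return np.exp(theta * a) - 1.0

def sigma(x):
    return np.maximum(x - eta, 0.0)           # next state under this policy

def T_sigma(v):
    """Return the function T_sigma v, built from the function v."""
    return lambda x: c(x - sigma(x)) + delta * v(sigma(x))

def iterate(v, k):
    """Apply T_sigma to v a total of k times."""
    for _ in range(k):
        v = T_sigma(v)
    return v

x = np.linspace(0.0, x_hat, 101)
v1 = lambda x: x                              # two different elements of V
v2 = lambda x: 5.0 * x**2
orbit1 = iterate(v1, k0)(x)
orbit2 = iterate(v2, k0)(x)
```

After `k0` applications the two orbits agree on the whole grid, and further applications of `T_sigma` leave them unchanged, as the lemma asserts.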

Let $V_0$ be all convex continuous $v \in V$ satisfying $c'(0) x \leq v(x) \leq c(x)$ for all $0 \leq x \leq \hat x$ . The proofs of the next two lemmas are straightforward but lengthy, so we present them as exercises.

Here and below, the restriction $0 \leq a$ in the minimum is understood.

Solution to Exercise 8.3.2

Fix $v \in V_0$ and let $g(x, y) \coloneq c(x - y) + \delta v(y)$ . Since $v$ is continuous and $[0, x]$ is compact, the minimizer $\sigma(x) \coloneq \argmin_{0 \leq y \leq x} g(x, y)$ exists for every $x$ . We verify that $\sigma \in \Sigma$ . Condition (i) holds by the constraint.

For condition (iv), first suppose $x \leq \eta$ . If $\sigma(x) = y > 0$ , then $g(x, y) < g(x, 0) = c(x)$ , i.e., $c(x - y) + \delta v(y) < c(x)$ . Since $v(y) \geq c'(0) y$ and $\delta c'(0) = c'(\eta) \geq c'(x)$ (because $x \leq \eta$ and $c'$ is increasing), we get $\delta v(y) \geq c'(x) y$ . But then $c(x-y) + c'(x) y < c(x)$ , contradicting convexity of $c$ . Hence $\sigma(x) = 0$ . Conversely, suppose $\sigma(x) = 0$ , so $g(x, 0) \leq g(x, y)$ for all $y \leq x$ . Since $v(y) \leq c(y)$ , this gives $c(x) \leq c(x - y) + \delta c(y)$ for all $y \leq x$ , and hence $(c(x) - c(x-y))/y \leq \delta c(y)/y$ . Taking $y \to 0$ yields $c'(x) \leq \delta c'(0) = c'(\eta)$ , so $x \leq \eta$ .

For conditions (ii) and (iii), observe that $g$ has strictly decreasing differences: for $x_2 > x_1$ , the function $y \mapsto g(x_2, y) - g(x_1, y) = c(x_2 - y) - c(x_1 - y)$ is strictly decreasing (since $c$ is strictly convex). By monotone comparative statics, the minimizer $\sigma(x)$ is increasing in $x$ . Similarly, writing $g(x, y) = c(x - y) + \delta v(y)$ in terms of the effort $a = x - y$ as $\tilde g(x, a) = c(a) + \delta v(x - a)$ , the function $\tilde g$ also has decreasing differences in $(x, a)$ (since $v$ is convex), so the minimizing effort $\pi(x) = x - \sigma(x)$ is increasing in $x$ .

Finally, $Tv = T_\sigma v$ by construction, so $T$ is well-defined on $V_0$ .

Solution to Exercise 8.3.3

We first show that $T$ maps $V_0$ into itself. Fix $v \in V_0$ and let $w = Tv$ . We have $w(0) = c(0) = 0$ and $w(x) \leq c(x)$ (evaluating the minimum at $y = 0$ ). For the lower bound, $c(x-y) \geq c'(0)(x-y)$ and $v(y) \geq c'(0)y$ give $c(x-y) + \delta v(y) \geq c'(0)(x-y) + \delta c'(0)y \geq c'(0) x$ , so $w(x) \geq c'(0) x$ . Continuity of $w$ follows from Berge’s theorem. For convexity, let $x_\lambda = \lambda x_1 + (1-\lambda)x_2$ and let $y_i$ denote the minimizer at $x_i$ . Setting $y_\lambda = \lambda y_1 + (1-\lambda) y_2 \leq x_\lambda$ , convexity of $c$ and $v$ gives

w(x_\lambda) \leq c(x_\lambda - y_\lambda) + \delta v(y_\lambda) \leq \lambda w(x_1) + (1-\lambda) w(x_2).

Finally, monotonicity of $w$ follows from monotonicity of $\sigma$ and $\pi$ (established in Lemma 8.3.4), since $w(x) = c(\pi(x)) + \delta v(\sigma(x))$ and all constituent functions are increasing. Hence $w \in V_0$ .

We now show that $T$ converges in finitely many steps. By Lemma 8.3.4, the minimizer $\sigma(x)$ of $Tv$ satisfies $\sigma(x) = 0$ for $x \leq \eta$ . Hence $Tv(x) = c(x)$ for all $x \leq \eta$ , regardless of $v \in V_0$ .

We claim by induction that $T^k v$ is independent of $v \in V_0$ on $[0, k\eta]$ . The base case $k = 1$ was just established. For the inductive step, suppose $T^k v$ agrees with a fixed function $\bar v$ on $[0, k\eta]$ for every $v \in V_0$ . Since $T^k v \in V_0$ (as $T$ maps $V_0$ into itself), Lemma 8.3.4 gives a minimizer $\sigma(x)$ for $T(T^k v)$ at each $x$ with $\sigma(x) \leq x - \eta$ . For $x \leq (k+1)\eta$ , this gives $\sigma(x) \leq k\eta$ , so $T^k v(\sigma(x)) = \bar v(\sigma(x))$ . Hence $T^{k+1} v(x) = \min_{0 \leq y \leq x}\{c(x-y) + \delta T^k v(y)\}$ depends only on $\bar v$ on $[0, k\eta]$ , and is therefore independent of $v$ . This completes the induction.

For $k_0 = \lceil \hat x / \eta \rceil$ , the function $T^{k_0} v = \bar v$ is independent of $v \in V_0$ on all of $[0, \hat x]$ . Since $\bar v = T^{k_0} v \in V_0$ for any $v \in V_0$ , and $T\bar v = T^{k_0+1} v = \bar v$ (as $k_0 + 1 \geq k_0$ ), $\bar v$ is a fixed point of $T$ in $V_0$ . Uniqueness follows as in Lemma 8.3.3: if $q \in V_0$ satisfies $Tq = q$ , then $q = T^{k_0} q = \bar v$ .

Proof

We apply Corollary 2.2.10. By Lemma 8.3.3, each $T_\sigma$ is globally stable on $V$ . Since each $T_\sigma$ is also order preserving, Lemma A.5.19 gives order stability of $(V, \TT)$ . By Lemma 8.3.4, every $v \in V_0$ admits a $v$ -min-greedy policy, so $V_0 \subset V^G_\triangledown$ . By Lemma 8.3.5, $T$ has a fixed point $\bar v \in V_0 \subset V^G_\triangledown$ . Hence Corollary 2.2.10 applies and the fundamental min-optimality properties hold, giving $\bar v = \vmin$ . Finally, the optimal policy is unique because $a \mapsto c(x - a) + \delta \vmin(a)$ is strictly convex (as $c$ is strictly convex and $\vmin$ is convex), so the minimizer is unique at each $x$ . ◻

Connection to the production chain.

Setting $\hat x = 1$ , the Bellman operator in Theorem 8.3.6 coincides with the operator $T$ defined in (8.48), and $V_0 = \pP$ . Lemma 8.3.5 then provides an alternative proof of Proposition 8.3.1: part (i) is the statement that $T$ maps $V_0 = \pP$ into itself, part (ii) follows from uniqueness of $\bar v$ in $V_0 = \pP$ (with $p^* = \bar v = \vmin$ ), and part (iii) is the finite-step convergence $T^k p \to \bar v$ for all $p \in \pP$ . In particular, the equilibrium price function of the production chain coincides with the value function of the negative-discounting dynamic program.

8.3.2Optimal Harvests¶

In this section we examine a model of forestry management and optimal harvests. We set up the problem in Section 8.3.2.1 and then show in Section 8.3.2.2 how factorization reduces the dimensionality of the state space. Section 8.3.2.3 discusses when optimality for the subordinate ADP implies optimality for the primary.

8.3.2.1Setup¶

A manager controls a timber plantation with biomass $s_t$ at time $t$ . The unit price for timber at time $t$ is $p_t$ . The manager observes $(s_t, p_t)$ and decides whether to harvest or not. A decision to harvest generates revenue $s_t p_t$ . If she chooses to wait, then time updates to the next period and the process repeats.

Biomass takes values in $\Ssf$ , a closed and bounded interval in $\RR_+$ , and evolves according to $s_{t+1} = q(s_t)$ , where $q$ is a continuous self-map on $\Ssf$ . If $q(0) > 0$ , then the plantation regenerates after each harvest. If not, the plantation never regenerates and the problem below is an optimal stopping problem.

We assume that the price sequence $(p_t)$ is IID with distribution $\phi$ on a closed and bounded interval $\Esf \subset \RR_+$ . The cost of harvesting given biomass $s$ is $m(s)$ . The cost of maintaining the plantation for one period, rather than harvesting, is $c(s)$ . Both $m$ and $c$ are continuous real-valued functions on $\Ssf$ . The manager is risk neutral and discounts the future using discount factor $\beta < 1$ .

The state space for the model is $\Ssf \times \Esf$ . The Bellman equation can be expressed as

v(s, p) = \max \left\{ p s - m(s) + \beta \int v(q(0), p') \phi(\diff p') ,\; - c(s) + \beta \int v(q(s), p') \phi(\diff p') \right\}.

(8.56)

Alternatively, we can write

(T v)(s, p) = \max_a \left\{ r(s, p, a) + \beta \int v[f(s, a), p'] \phi(\diff p') \right\},

(8.57)

where $a$ takes values in $\{0, 1\}$ , and

r(s, p, a) \coloneq a (p s - m(s)) - (1-a) c(s) \quad \text{and} \quad f(s, a) \coloneq q[(1 - a)s].

Biomass $s$ takes values in $\Ssf$ , while the price $p$ takes values in $\Esf$ . Both $\Ssf$ and $\Esf$ are closed and bounded intervals in $\RR_+$ . The functions $m$ and $c$ are in $bc\Ssf$ , while $q$ is a continuous self-map on $\Ssf$ .

A feasible policy $\sigma$ is a measurable map from $\Ssf \times \Esf$ to $\{0, 1\}$ , with 0 indicating the decision not to harvest and 1 indicating harvest. The policy operator corresponding to this model is

(T_\sigma \, v)(s, p) \coloneq r_\sigma(s, p) + \beta \int v[f(s, \sigma(s, p)), p'] \phi(\diff p'),

where

r_\sigma(s, p) = r(s, p, \sigma(s, p)) \qquad (\sigma \in \Sigma, \; s \in \Ssf, \; p \in \Esf).

With $\Sigma$ as the set of all feasible policies, $V$ as the set of bounded measurable real-valued functions on $\Ssf \times \Esf$ , and $\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ , the pair $(V, \TT)$ is an ADP with Bellman operator equal to (8.57).

8.3.2.2Factorization¶

We can reduce dimensionality for this ADP via a transformation. To construct it, we set $\hat V$ to be the bounded measurable real-valued functions on $\Ssf \times \{0,1\}$ . If we set

\begin{aligned} (Fv)(s, a) & \coloneq \int v[f(s, a), p'] \phi(\diff p') \qquad (v \in V), \\ (G_\sigma \, w)(s, p) & \coloneq r_\sigma(s, p) + \beta w(s, \sigma(s, p)) \qquad (w \in \hat V), \end{aligned}

and $\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}$ , then

(1) $(V, F, \hat V, \GG)$ is an order-preserving FDP and
(2) $(V, \TT)$ is the primary ADP.

Solution to Exercise 8.3.4

We verify the conditions in the definition of an order-preserving FDP. The map $F$ is order preserving because, for $u \leq v$ pointwise, we have $u[f(s,a), p'] \leq v[f(s,a), p']$ for all $p'$ , and integration preserves this inequality. Each $G_\sigma$ is order preserving because $w \leq w'$ pointwise implies $w(s, \sigma(s,p)) \leq w'(s, \sigma(s,p))$ and hence $(G_\sigma w)(s,p) \leq (G_\sigma w')(s,p)$ . Given any $w \in \hat V$ , the supremum $\GG w$ defined by

(\GG w)(s, p) = \max_a \left\{ r(s, p, a) + \beta w(s, a) \right\}

exists since the maximum is over the finite set $\{0,1\}$ . This is the greatest element of $\{G_\sigma w\}_{\sigma \in \Sigma}$ . We have now confirmed that $(V, F, \hat V, \GG)$ is an order-preserving FDP.

For (2), we verify that $(V, \TT)$ is the primary ADP for this FDP. We need $T_\sigma = G_\sigma \circ F$ . Indeed, for any $v \in V$ ,

\begin{aligned} (G_\sigma F v)(s, p) & = r_\sigma(s, p) + \beta (Fv)(s, \sigma(s,p)) \\ & = r_\sigma(s, p) + \beta \int v[f(s, \sigma(s,p)), p'] \phi(\diff p') = (T_\sigma v)(s, p). \end{aligned}

For the subordinate ADP $(\hat V, \hat{\TT})$ , we get

\begin{aligned} (\hat T_\sigma \, w)(s, a) & = (F G_\sigma \, w)(s, a) \\ & = \int \left\{ r_\sigma(f(s, a), p') + \beta w[f(s, a), \sigma(f(s, a), p')] \right\} \phi(\diff p'). \end{aligned}

Notice that the functions in $V$ are defined over points $(s, p) \in \Ssf \times \Esf$ , while functions in $\hat V$ are defined over points $(s, a) \in \Ssf \times \{0, 1\}$ . The functions in the second set are typically much easier to work with (because $\Esf$ is larger than $\{0,1\}$ ).

Let $\hat T$ be the Bellman operator corresponding to the subordinate ADP $(\hat V, \hat{\TT})$ . In light of Theorem 5.2.13, we can compute an optimal policy for $(V, \TT)$ by

computing the unique fixed point $\wopt$ of $\hat T$ in $\hat V$ and
finding a policy $\sigma$ obeying

\sigma(s, p) \in \argmax_a \left\{ r(s, p, a) + \beta \wopt(s, a) \right\} \qquad ((s, p) \in \Ssf \times \Esf).
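The two-step procedure above can be sketched in code. This is an illustration only: the functional forms for $q$, $m$ and $c$, the interval endpoints, the value of $\beta$, the equally weighted price nodes standing in for $\phi$, and the use of linear interpolation in $s$ are all assumptions made for the example. The function `T_hat` implements $(\hat T w)(s, a) = \int \max_{a'} \{ r(f(s,a), p', a') + \beta w(f(s,a), a') \} \phi(\diff p')$ on a biomass grid.

```python
import numpy as np

beta = 0.95                                # discount factor (assumed value)
s_grid = np.linspace(0.0, 10.0, 201)       # biomass grid on S = [0, 10] (assumed)
p_nodes = np.linspace(0.5, 1.5, 51)        # equally weighted price nodes on E (assumed)

q = lambda s: 1.0 + 0.9 * s                # growth map with q(0) > 0 (assumed)
m = lambda s: 0.1 + 0.05 * s               # harvest cost (assumed)
c = lambda s: 0.2 + 0.0 * s                # maintenance cost (assumed)

def r(s, p, a):
    return a * (p * s - m(s)) - (1 - a) * c(s)

def f(s, a):
    return q((1 - a) * s)

def T_hat(w):
    """Subordinate Bellman operator; w has shape (len(s_grid), 2)."""
    w_new = np.empty_like(w)
    for a in (0, 1):
        s1 = f(s_grid, a)                  # continuation biomass for each s
        vals = [r(s1[:, None], p_nodes[None, :], b)
                + beta * np.interp(s1, s_grid, w[:, b])[:, None]
                for b in (0, 1)]
        # maximize over next-period action, then average over price nodes
        w_new[:, a] = np.maximum(vals[0], vals[1]).mean(axis=1)
    return w_new

# Step 1: successive approximation of the fixed point of T_hat
w_star = np.zeros((len(s_grid), 2))
for _ in range(5000):
    w_next = T_hat(w_star)
    done = np.max(np.abs(w_next - w_star)) < 1e-10
    w_star = w_next
    if done:
        break

# Step 2: a policy that is greedy with respect to w_star
def sigma(s, p):
    harvest = r(s, p, 1) + beta * np.interp(s, s_grid, w_star[:, 1])
    wait = r(s, p, 0) + beta * np.interp(s, s_grid, w_star[:, 0])
    return np.where(harvest > wait, 1, 0)

print(sigma(5.0, p_nodes))                 # harvest decisions at s = 5 across prices
```

Note that the iteration runs over arrays indexed by $(s, a)$ rather than $(s, p)$, which is the dimensionality reduction described above. In this sketch `w_star[:, 1]` is constant in $s$, because $f(s, 1) = q(0)$ for every $s$.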

8.3.2.3The Converse Implication¶

Since the subordinate ADP $(\hat V, \hat{\TT})$ has a lower-dimensional state space, it is natural to want to solve it first and transfer the resulting optimal policy back to the primary ADP $(V, \TT)$ . As discussed in Section 5.2.2.4, however, this is not always valid: a policy that is optimal for the subordinate need not be optimal for the primary.

The reason can be understood as follows. The subordinate ADP evaluates a policy $\sigma$ only at continuation states of the form $(f(s, a), p')$ , where $p'$ is drawn from $\phi$ . In the harvest model, these biomass levels lie in $q(\Ssf) \cup \{q(0)\}$ . If the growth function $q$ does not map $\Ssf$ onto itself, then there exist biomass levels $s_0 \in \Ssf$ that never arise as continuations. For example, if $q$ maps $\Ssf = [0, 10]$ into $[2, 8]$ , then biomass $s_0 = 1$ is a valid initial state but can never occur after the first period. The subordinate ADP never evaluates $\sigma$ at such states, so a subordinate-optimal policy may make an arbitrarily poor harvest decision there. The primary ADP, however, evaluates $\sigma$ at every state $(s, p) \in \Ssf \times \Esf$ and would detect the suboptimal choice.

The additional condition needed for the converse is that $F$ be strictly order preserving (see Proposition 5.2.14). In this model, sufficient conditions include: $\phi$ has a positive density on $\Esf$ , $q$ maps $\Ssf$ onto itself, and we restrict attention to continuous functions. Under these conditions, any strict difference between two value functions at some $(s_0, p_0)$ propagates through $F$ to a strict difference in the subordinate value space, closing the gap.

8.3.3Euler Equation Methods¶

The Euler equation is a first-order condition for optimality that can be used for analysis and computation. Euler equations are available in a range of settings, including but not limited to household savings problems and optimal growth problems. Here we will work in the context of a smooth optimal growth model with IID shocks. Admittedly, this version of the model is too simple to be useful for serious economic analysis. At the same time it provides a convenient vehicle for our exploration of Euler equations and related topics. Most of the ideas discussed here carry over to other settings where Euler equations can be obtained.

We begin with the model and collect basic optimality results. We then derive the envelope condition and use it to obtain the Euler equation, first in sequential form and then as a functional equation on policies. The functional form leads to the Coleman–Reffett operator $K$ , whose fixed points are solutions to the Euler equation. Time iteration—iterating with $K$ —provides a method for computing the optimal policy directly, without first solving for the value function. We also discuss the endogenous grid method, which accelerates time iteration by avoiding a nonlinear root-finding step. In Section 8.3.4 we establish that VFI and time iteration are equivalent, in the sense that $T$ and $K$ are topologically conjugate.

8.3.3.1A Growth Model¶

We consider an optimal growth model where shocks are multiplicative and IID, with income evolving according to

y_{t+1} = f(y_t - c_t) \xi_{t+1},

(8.58)

where $f$ is a production function. The Bellman equation is

v(y) = \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(z) \diff z \right\}

(8.59)

for every $y \in \RR_+$ , where $u$ is a utility function and $\phi$ is the density of the shock process.

We work in the following environment:

We can set this up as an RDP with $\Xsf = \Asf = \RR_+$ by defining

$V = b\Xsf$ ,
$\Gamma(y) = [0,y]$ , and
$B(y, c, v) = u(c) + \beta \int v(f(y - c) z) \phi(z) \diff z$ .

The next lemma shows that $(\Gamma, V, B)$ has strong continuity properties.

Solution to Exercise 8.3.6

The term $u(c)$ is continuous in $(y,c)$ by Assumption 8.3.1. For the integral term, fix $v \in V = b\Xsf$ and let $(y_n, c_n) \to (y, c)$ in $\Gsf$ . Setting $g(y,c) = f(y-c)$ , the integrand $v(g(y_n, c_n) z) \phi(z)$ converges to $v(g(y,c) z) \phi(z)$ for all $z$ , by continuity of $v \circ g$ and $\phi$ . Since $|v(g(y_n, c_n) z) \phi(z)| \leq \|v\| \phi(z)$ and $\phi$ is integrable, the dominated convergence theorem gives $\int v(g(y_n, c_n) z) \phi(z) \diff z \to \int v(g(y,c) z) \phi(z) \diff z$ . (See also Example 6.1.2.)

Solution to Exercise 8.3.7

We verify the conditions of Proposition 7.2.2. For Assumption 7.2.1, boundedness of $B$ follows from boundedness of $u$ and $v$ . The discounting condition holds with $\lambda = \beta$ because

B(y, c, v + \kappa) = u(c) + \beta \int (v + \kappa)(f(y-c) z) \phi(z) \diff z = B(y, c, v) + \beta \kappa.

For Assumption 7.2.3, the constraint correspondence $\Gamma(y) = [0,y]$ is continuous and compact-valued, and continuity of $(y,c) \mapsto B(y,c,v)$ on $\Gsf$ for any $v \in b\Xsf$ follows from Exercise 8.3.6 (the proof there uses only boundedness of $v$ , not continuity). The claims now follow from Proposition 7.2.2.

Solution to Exercise 8.3.8

Under Assumption 8.3.1, the conditions of Proposition 8.3.7 hold, so the fundamental optimality properties are in force. For monotonicity, Assumption 7.2.8 is satisfied: $\Gamma(y) = [0,y] \subset [0,y'] = \Gamma(y')$ when $y \leq y'$ , and $B(y,c,v) \leq B(y',c,v)$ for increasing $v$ because $f(y-c) \leq f(y'-c)$ . Hence $\vmax$ is increasing by Proposition 7.2.6. Concavity of $\vmax$ follows from Proposition 7.2.7, since $\Gsf$ is convex and $(y,c) \mapsto B(y,c,v)$ is concave on $\Gsf$ whenever $v$ is concave (using concavity of $u$ and $f$ ). Continuity holds because $\vmax \in bc\Xsf$ by Proposition 8.3.7. For (ii), strict concavity of $u$ gives strict concavity of $c \mapsto B(y,c,v)$ for concave $v$ , so the optimal policy is unique by Proposition 7.2.8.

8.3.3.2Envelope Theorems¶

We will make use of a differential characterization of greedy policies that is closely connected to the Euler equation and will be useful in what follows. A proof can be found in Section 12.1 of Stachurski (2022).

One interesting aspect of Proposition 8.3.9 is that $v$ does not have to be differentiable. Hence, the Bellman operator is smoothing, in the sense that images of some nonsmooth functions are smooth.

Here’s an important corollary:

Corollary 8.3.10 has been presented in many forms in the economics literature and (8.61) is often called the envelope condition.

8.3.3.3The Sequential Euler Equation¶

In the present setting, the Euler equation takes the form

u'(c_t) = \beta \, \EE_t \left[ u'(c_{t+1}) f'(y_t - c_t) \xi_{t+1} \right]

(8.62)

We refer to (8.62) as the sequential Euler equation because it restricts the endogenous sequence $(c_t)$ .

The left-hand side is the marginal utility of consuming one additional unit today. The right-hand side is the expected marginal benefit of saving that unit instead: one unit of savings yields a gross return of $f'(y_t - c_t) \xi_{t+1}$ in the next period, which is then valued at marginal utility $u'(c_{t+1})$ and discounted by $\beta$ . At the optimum, these two quantities are equalized—any deviation from (8.62) would allow the agent to improve lifetime utility by shifting consumption between periods.

The Euler equation is typically understood as a necessary condition for optimality of a consumption-savings path. However, when applied to policies rather than consumption paths, it turns out to be sufficient as well. Moreover, studying it leads to new insights on optimal behavior and computational methods.

To investigate these ideas, let’s shift to a policy-based perspective, where the Euler equation becomes a functional restriction on policies. Set

\Sigma_{\cC} \coloneq \text{ all continuous, strictly increasing } \sigma \in \Sigma \text{ satisfying } 0 < \sigma(y) < y \text{.}

By continuity, each $\sigma$ in $\Sigma_{\cC}$ satisfies $\sigma(0) = 0$ . In what follows, we will say that $\sigma \in \Sigma_{\cC}$ satisfies the Euler equation if

(u'\circ \sigma)(y) = \beta \int (u'\circ \sigma)(f(y - \sigma(y)) z) f'(y - \sigma(y)) z \phi(\diff z) \;\; \text{ for all } y > 0.

(8.63)

To solve the functional Euler equation, we convert it into a fixed point problem. Consider the operator $K$ from $\Sigma_{\cC}$ to itself defined as follows: for each $\sigma \in \Sigma_{\cC}$ and each $y > 0$ , we set

K\sigma(y) \coloneq \text{ the } c \text{ in } (0, y) \text{ that solves } \; u'(c) = \beta \int (u'\circ \sigma)(f(y - c) z) f'(y - c) z \phi(\diff z).

We call $K$ the Coleman–Reffett operator. It is well defined since, for any $\sigma \in \Sigma_{\cC}$ , the map $c \mapsto u'(c)$ is continuous and strictly decreasing on $(0, y)$ with $u'(c) \to +\infty$ as $c \downarrow 0$ , while the integral term is continuous and strictly increasing in $c$ on $(0, y)$ and diverges to $+\infty$ as $c \uparrow y$ . It follows that there exists exactly one solution $c$ in the interval $(0, y)$ .

It is immediate from the definition that $\sigma$ in $\Sigma_{\cC}$ is a fixed point of $K$ if and only if it satisfies the Euler equation. Thus, the Coleman–Reffett operator plays the same role for the optimal policy that the Bellman operator plays for the value function.

Solution to Exercise 8.3.10

Let $\sigma_a$ and $\sigma_b$ be elements of $\Sigma_{\cC}$ with $\sigma_a \leq \sigma_b$ . To see that $K \sigma_a \leq K \sigma_b$ , observe that, for arbitrary $\sigma \in \Sigma_{\cC}$ and fixed $y \in \RR_+$ , the value $K\sigma(y)$ is the zero of

H_\sigma(c) \coloneq u'(c) - \beta \int (u'\circ \sigma)(f(y - c) z) f'(y - c) z \phi(\diff z)

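Given our assumptions on the primitives, we have $H_{\sigma_a}(c) \leq H_{\sigma_b}(c)$ for every $c \in (0, y)$ , because $\sigma_a \leq \sigma_b$ and $u'$ is decreasing, so that $u' \circ \sigma_a \geq u' \circ \sigma_b$ . Both functions are strictly decreasing in $c$ and $H_{\sigma_b}$ is pointwise greater, so the zero of $H_{\sigma_b}$ is at least as large as the zero of $H_{\sigma_a}$ . In other words, $K\sigma_a(y) \leq K\sigma_b(y)$ . Since $y$ was arbitrary, our proof is now done.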

8.3.3.4Time Iteration¶

Time iteration means computing the optimal policy by iterating with $K$ : starting from an initial guess $\sigma_0 \in \Sigma_{\cC}$ , we generate the sequence $(\sigma_k)$ defined by $\sigma_{k+1} = K \sigma_k$ . In Section 8.3.4 we will show that this sequence converges to the optimal policy $\sigopt$ , the unique fixed point of $K$ in $\Sigma_{\cC}$ . The proof uses the fact that $K$ and the Bellman operator $T$ are topologically conjugate, so the iterates of $K$ converge whenever VFI converges, and at the same rate.

Figure 8.12 illustrates time iteration for the growth model in Section 8.3.3.1 with log utility $u(c) = \ln c$ , Cobb–Douglas production $f(k) = k^\alpha$ , and lognormal shocks. This specification does not satisfy all of the conditions in Assumption 8.3.1 (for example, $u$ is not bounded), but it admits a closed-form optimal policy $\sigopt(y) = (1 - \alpha \beta) y$ , which allows us to measure the accuracy of the algorithm directly. The left panel shows the first few iterates of $K$ starting from $\sigma_0(y) = y$ (consume everything), which converge visibly toward $\sigopt$ . The right panel plots the sup-norm error $\| K^n \sigma_0 - \sigopt \|$ , confirming rapid convergence.

Figure 8.12: Iterates of the Coleman–Reffett operator $K$ , starting from $\sigma_0(y) = y$ , with $\alpha = 0.4$ and $\beta = 0.96$ .
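As a check on the closed form quoted above, one can verify by hand that $\sigma(y) = \theta y$ with $\theta \coloneq 1 - \alpha\beta$ satisfies the functional Euler equation (8.63) under log utility and Cobb–Douglas production. The symbols $\theta$ and $s$ are shorthand introduced only for this calculation. With $u'(c) = 1/c$ , $f'(k) = \alpha k^{\alpha - 1}$ and savings $s \coloneq y - \sigma(y) = \alpha\beta y$ , the right-hand side of (8.63) becomes

\beta \int \frac{1}{\theta s^\alpha z} \, \alpha s^{\alpha - 1} z \, \phi(\diff z) = \frac{\alpha \beta}{\theta s} = \frac{\alpha \beta}{\theta \alpha \beta y} = \frac{1}{\theta y} = u'(\sigma(y)).

The shock $z$ cancels inside the integral, so the identity holds for any shock distribution supported on $(0, \infty)$ .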

In practice, time iteration is often more efficient than VFI because policy functions typically have less curvature than value functions, making them easier to approximate on a grid. However, each iterate of $K$ requires solving a nonlinear equation at every grid point, which can be costly. The endogenous grid method, discussed next, eliminates this root-finding step.
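The following sketch shows one way time iteration might be implemented for the log utility and Cobb–Douglas specification. It is not the code behind Figure 8.12. The shock standard deviation `shock_sd`, the income grid, the Gauss–Hermite quadrature rule, and the use of plain bisection as the root-finder are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

alpha, beta = 0.4, 0.96
shock_sd = 0.1                               # std. dev. of ln(z), assumed value
y_grid = np.linspace(0.0, 4.0, 121)          # income grid (assumed)

# Gauss-Hermite rule for ln z ~ N(0, shock_sd^2)
nodes, weights = np.polynomial.hermite.hermgauss(7)
z = np.exp(shock_sd * np.sqrt(2.0) * nodes)
pz = weights / np.sqrt(np.pi)                # probabilities attached to each z

u_prime = lambda c: 1.0 / c                  # log utility
f = lambda k: k**alpha                       # Cobb-Douglas production
f_prime = lambda k: alpha * k**(alpha - 1.0)

def euler_rhs(c, y, sigma):
    """beta * E[u'(sigma(f(y - c) z)) f'(y - c) z], with sigma tabulated on y_grid."""
    k = y - c
    y_next = f(k)[:, None] * z[None, :]
    c_next = np.interp(y_next.ravel(), y_grid, sigma).reshape(y_next.shape)
    return beta * f_prime(k) * ((u_prime(c_next) * z) @ pz)

def K(sigma, n_bisect=60):
    """One step of time iteration: solve the Euler equation at each y > 0."""
    y = y_grid[1:]
    lo, hi = np.zeros_like(y), y.copy()
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        excess = u_prime(mid) - euler_rhs(mid, y, sigma)  # decreasing in c
        lo = np.where(excess > 0, mid, lo)   # root lies to the right of mid
        hi = np.where(excess > 0, hi, mid)   # root lies to the left of mid
    return np.concatenate(([0.0], 0.5 * (lo + hi)))

sigma_star = (1 - alpha * beta) * y_grid     # closed-form optimal policy
sigma = y_grid.copy()                        # sigma_0(y) = y
errors = []
for n in range(30):
    sigma = K(sigma)
    errors.append(np.max(np.abs(sigma - sigma_star)))
print(errors[:5], errors[-1])
```

The inner bisection loop is the root-finding cost referred to above: every application of $K$ evaluates the integral term many times at each grid point.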

8.3.3.5The Endogenous Grid Method¶

In the standard implementation of time iteration, we fix a grid of income values $\{y_i\}$ and, for each $y_i$ , solve the nonlinear equation

u'(c) = \beta \int (u'\circ \sigma)(f(y_i - c) z) f'(y_i - c) z \, \phi(\diff z)

for $c$ using a root-finding algorithm. This is the most expensive step in each iteration.

The endogenous grid method (EGM), introduced by Carroll (2006), avoids root-finding by reversing the logic: instead of fixing income $y$ and solving for consumption $c$ , we fix savings $s = y - c$ and solve for $c$ directly. Specifically, given the current policy $\sigma \in \Sigma_{\cC}$ and a fixed grid of savings values $\{s_j\}$ , we compute

c_j = (u')^{-1} \left( \beta \int (u'\circ \sigma)(f(s_j) z) \, f'(s_j) \, z \, \phi(\diff z) \right)

(8.64)

and then set $y_j = c_j + s_j$ . Each pair $(y_j, c_j)$ satisfies the Euler equation by construction. The updated policy $K\sigma$ is then reconstructed from the pairs $\{(y_j, c_j)\}$ by interpolation.

The key advantage is that (8.64) requires only evaluation of $(u')^{-1}$ , which is available in closed form for standard utility functions (e.g., $(u')^{-1}(p) = p^{-1/\gamma}$ for CRRA utility). This replaces the iterative root-finding step with a single function evaluation at each grid point, leading to substantial speed gains.

The name “endogenous grid method” reflects the fact that the income grid $\{y_j\}$ is not fixed in advance but is determined endogenously by the savings grid and the current policy.
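A minimal sketch of the EGM update (8.64) for the log utility and Cobb–Douglas specification follows. The savings grid, the shock standard deviation `shock_sd`, the quadrature rule, and the representation of a policy as a table of `(y_pts, c_pts)` pairs with linear interpolation are assumptions made for illustration.

```python
import numpy as np

alpha, beta = 0.4, 0.96
shock_sd = 0.1                               # std. dev. of ln(z), assumed value
s_grid = np.linspace(1e-6, 2.0, 120)         # fixed savings grid (assumed)

# Gauss-Hermite rule for ln z ~ N(0, shock_sd^2)
nodes, weights = np.polynomial.hermite.hermgauss(7)
z = np.exp(shock_sd * np.sqrt(2.0) * nodes)
pz = weights / np.sqrt(np.pi)

u_prime = lambda c: 1.0 / c                  # log utility
u_prime_inv = lambda p: 1.0 / p              # closed-form inverse of u'
f = lambda k: k**alpha
f_prime = lambda k: alpha * k**(alpha - 1.0)

def egm_step(y_pts, c_pts):
    """Given a table (y_pts, c_pts) for sigma, return a table for K sigma."""
    y_next = f(s_grid)[:, None] * z[None, :]
    c_next = np.interp(y_next.ravel(), y_pts, c_pts).reshape(y_next.shape)
    rhs = beta * f_prime(s_grid) * ((u_prime(c_next) * z) @ pz)
    c_new = u_prime_inv(rhs)                 # equation (8.64), no root-finding
    y_new = c_new + s_grid                   # endogenous income grid
    # prepend the point (0, 0), consistent with sigma(0) = 0
    return np.concatenate(([0.0], y_new)), np.concatenate(([0.0], c_new))

# Start from sigma_0(y) = y, tabulated at two points
y_pts, c_pts = np.array([0.0, 10.0]), np.array([0.0, 10.0])
for _ in range(30):
    y_pts, c_pts = egm_step(y_pts, c_pts)

ratio = c_pts[1:] / y_pts[1:]                # consumption share of income
print(ratio.min(), ratio.max(), 1 - alpha * beta)
```

Each update evaluates the integral once per savings point and applies $(u')^{-1}$ directly, in contrast with the repeated evaluations needed by a root-finder. In this specification the computed consumption share should approach $1 - \alpha\beta$ at every point of the endogenous grid.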

8.3.4Equivalence of VFI and Time Iteration¶

We now show that value function iteration and time iteration are, in a precise sense, equivalent. To do so, we construct a bijection $M$ between value functions and policies under which the Bellman operator $T$ and the Coleman–Reffett operator $K$ are conjugate. We then upgrade this conjugacy to a topological conjugacy, which allows us to transfer the known convergence properties of $T$ to $K$ .

8.3.4.1Conjugacy of $K$ and $T$ ¶

The key step is to build a bijection $M$ between the space of value functions and the space of policies that intertwines $T$ and $K$ . The envelope condition (Proposition 8.3.9) provides the natural link: given a value function $v$ , the map $M$ recovers the corresponding greedy policy by inverting the marginal utility.

Throughout this section Assumption 8.3.1 is in force.

First let $V_{\cC}$ be all strictly concave, continuously differentiable $v$ mapping $\RR_+$ to itself and satisfying $v(0) = 0$ and $v'(y) > u'(y)$ whenever $y > 0$ . For $v \in V_{\cC}$ let $Mv$ be defined by

(M v)(y) = (u')^{-1}(v' (y))

(8.65)

when $y> 0$ and $(M v)(0) =0$ . For this mapping, we have the following result:

Proof

For ease of exposition we set $m(y) \coloneq (u')^{-1} (y)$ whenever $y > 0$ . From Assumption 8.3.1, one can show that $m$ is a continuous and strictly decreasing bijection from $(0,\infty)$ to itself. Now observe that, for fixed $v \in V_{\cC}$ , the derivative $v'$ is a continuous, strictly decreasing function. Hence $M v = m \circ v'$ is strictly increasing and continuous. Moreover, interiority holds because $v'$ strictly dominates $u'$ , implying that, when $y > 0$ ,

(M v)(y) = m(v'(y)) < m(u'(y)) = y

(8.67)

In particular, $\sigma(y) \coloneq (Mv)(y)$ is an element of $\Sigma_{\cC}$ .

To see that each $\sigma \in \Sigma_{\cC}$ has a preimage $v \in V_{\cC}$ with $Mv = \sigma$ , fix any $\sigma \in \Sigma_{\cC}$ and let $v$ be given by (8.66). An application of the Fundamental Theorem of Calculus yields $v \in V_{\cC}$ and $Mv = \sigma$ . It is also true that $M$ is one-to-one on $V_{\cC}$ . To see this, suppose that $v$ and $w$ are elements of $V_{\cC}$ satisfying $Mv = Mw$ . Then $v(0) = w(0) = 0$ and $v' = w'$ on $(0, \infty)$ . The Fundamental Theorem of Calculus then implies that $v = w$ on $\RR_+$ .

Finally, given $v \in V_{\cC}$ , let $\sigma$ be the unique $v$ -greedy policy. The claim is that $\sigma = M T v$ , or, equivalently, that $u'(\sigma(y)) = (Tv)'(y)$ for all $y > 0$ . That this statement is true has already been established, in Proposition 8.3.9. ◻

The significance of $M$ is that the systems $(V_{\cC}, T)$ and $(\Sigma_{\cC}, K)$ are conjugate under this mapping:

Proof

Since $M$ is bijective as a map between $V_{\cC}$ and $\Sigma_{\cC}$ , we can equivalently show that $M \circ T = K \circ M$ , or, more explicitly, that

(M T v)(y) = ( K M v)(y) \text{ for any } v \in V_{\cC} \text{ and any } y \in (0, \infty)

(8.69)

To establish (8.69), fix $v \in V_{\cC}$ and $y > 0$ . We saw in Lemma 8.3.11 that $\sigma \coloneq M T v$ is the unique $v$ -greedy policy. This policy necessarily satisfies the first order condition

u'(\sigma(y)) = \beta \int v' (f(y - \sigma(y)) z ) f'(y - \sigma(y)) z \phi(\diff z)

(8.70)

On the other hand, $KMv(y)$ is the unique $c$ in $(0, y)$ that solves

\begin{aligned} u'(c) & = \beta \int (u' \circ (Mv)) (f(y - c) z ) f'(y - c) z \phi(\diff z) \\ & = \beta \int (u' \circ ((u')^{-1} \circ v')) (f(y - c) z ) f'(y - c) z \phi(\diff z) \\ & = \beta \int v'(f(y - c) z ) f'(y - c) z \phi(\diff z) \end{aligned}

In particular, $c = \sigma(y)$ . In other words, $KMv(y) = MTv(y)$ , as was to be shown. ◻

This shows that VFI and time iteration are essentially isomorphic: the systems $(V_{\cC}, T)$ and $(\Sigma_{\cC}, K)$ are conjugate under $M$ , so every trajectory of $T$ is mapped one-to-one onto a trajectory of $K$ . Section 8.3.4.2 upgrades this to topological conjugacy, which implies identical long-run behavior. The practical benefits of time iteration stem from its numerical advantages: it operates directly in policy space and, combined with methods such as EGM, can avoid costly root-finding steps.

8.3.4.2From Conjugacy to Topological Conjugacy¶

Suppose we can show that the bijection $M$ is also continuous as a map from $V_{\cC}$ to $\Sigma_{\cC}$ and that its inverse is likewise continuous. Then (8.68) implies that $(V_{\cC}, T)$ and $(\Sigma_{\cC}, K)$ are topologically conjugate (see Section 5.1.1.2). This will be informative about $(\Sigma_{\cC}, K)$ because topologically conjugate dynamical systems have essentially identical properties—and we already have a significant amount of information about the trajectories of $T$ .

The way we will show that $M$ and its inverse are continuous is to cook up a metric on $\Sigma_{\cC}$ that essentially guarantees this result. The metric in question is

\rho(\sigma_1, \sigma_2) = d_\infty( M^{-1} \sigma_1 , M^{-1} \sigma_2 ) \coloneq \| M^{-1} \sigma_1 - M^{-1} \sigma_2 \|_\infty

Now we can state the following result, which infers properties of time iteration and its fixed point from the corresponding properties of the Bellman operator.

Proof

Let the conditions of the proposition hold and let $\sigma$ be the unique optimal policy. We have already seen that $T$ is globally stable on $bc\RR_+$ under $d_\infty$ and therefore it is globally stable on $V_{\cC}$ under the same metric. (That $T$ is a self-map on $V_{\cC}$ follows from Proposition 8.3.9.) Moreover, when we endow $\Sigma_{\cC}$ with the metric $\rho$ , $T$ and $K$ are topologically conjugate under the bijection $M$ . In view of Proposition 5.1.2, this implies that $K$ is globally stable in $\Sigma_{\cC}$ . Since $\vmax$ is the unique fixed point of $T$ in $V_{\cC}$ , Proposition 5.1.1 tells us that the unique fixed point of $K$ in $\Sigma_{\cC}$ is $M \vmax$ . As shown in Lemma 8.3.11, $M \vmax = M T \vmax$ is the unique $\vmax$ -greedy policy, which, by Bellman’s principle of optimality, is $\sigma$ . ◻

Solution to Exercise 8.3.12

Let $\beta_a$ and $\beta_b$ be two discount factors with the stated properties. Let $K_a$ and $K_b$ be the respective Coleman–Reffett operators. For $\sigma_b \leq \sigma_a$ to hold, it suffices (by Proposition A.5.20) to show that $K_a$ is order preserving, globally stable on $\Sigma_{\cC}$ , and has the property $K_b \sigma \leq K_a \sigma$ for any $\sigma \in \Sigma_{\cC}$ . That $K_a$ is order preserving was established in Exercise 8.3.10. That $K_a$ is globally stable was shown in Proposition 8.3.13. To see that $K_b \sigma \leq K_a \sigma$ for all $\sigma \in \Sigma_{\cC}$ , fix any such $\sigma$ and any $y \in (0, \infty)$ . Observe that $K_i \sigma(y)$ is the zero of

H_i(c) \coloneq u'(c) - \beta_i \int (u'\circ \sigma)(f(y - c) z) f'(y - c) z \phi(\diff z)

for $i = a, b$ . Evidently $H_b \leq H_a$ pointwise on $(0, y)$ , so the zero of $H_b$ is no larger. In other words, $K_b \sigma(y) \leq K_a \sigma(y)$ . Since $y$ was arbitrary, this completes our proof.

The intuition for the result is that larger $\beta$ means greater patience, which encourages saving and hence reduces consumption.

8.4Chapter Notes¶

The job search model in Section 8.1.1 is due to McCall (1970). Mortensen (1986) provides an extensive survey of the job search literature and its connections to labor market analysis. A finite-state treatment of the McCall model and related problems can be found in Sargent & Stachurski (2025). The job search model with learning in Section 8.2.3.1 is closely related to Rothschild (1974), who studied optimal search when the distribution of offers is unknown. Esponda & Pouzo (2021) provide an equilibrium condition for agents controlling MDPs under Bayesian learning.

The Kreps–Porteus certainty equivalent used in Section 8.2.2 originates from Kreps & Porteus (1978); see also Epstein & Zin (1989) for a recursive formulation and further analysis. The job search model with separation in Section 8.2.4 extends a finite-state treatment from Chapter 3 of Sargent & Stachurski (2025).

The form of nonlinear discounting analyzed in Section 8.2.1 was introduced by Jaśkiewicz et al. (2014) and Bäuerle et al. (2021). As mentioned, one motivation is magnitude effects, under which discount rates decrease with the size of the reward. For evidence in favor of such effects, see, for example, Green et al. (1997).

The Coase-meets-Bellman production chain model in Section 8.3.1.1 is based on the work of Kikuchi et al. (2018). The negative discount optimality problem constructed in Section 8.3.1.2 is based on Kikuchi et al. (2021). Negative discount rates seem to arise naturally in some settings. Thaler (1981), Loewenstein & Prelec (1991) and Loewenstein & Sicherman (1991) document separate instances of such phenomena. For example, Loewenstein & Sicherman (1991) found that the majority of surveyed workers reported a preference for increasing wage profiles over decreasing ones, even when it was pointed out that the latter could be used to construct a dominating consumption sequence. Loewenstein & Prelec (1991) obtained similar results, stating that “sequences of outcomes that decline in value are greatly disliked, indicating a negative rate of time preference.”

The optimal harvest model in Section 8.3.2 illustrates how factored dynamic programs can reduce dimensionality in models with IID shocks.

The stochastic optimal growth model described in Section 8.3.3.1 was first set out and analyzed in Brock & Mirman (1972). Stokey & Lucas (1989) provides a textbook treatment of the dynamic programming theory for optimal growth, including the envelope theorem and Euler equation methods. The envelope condition in Section 8.3.3.2 is a version of the classic result of Benveniste & Scheinkman (1979), who established differentiability of the value function at interior optima; see also Milgrom & Segal (2002) for extensions to arbitrary choice sets. The Coleman–Reffett operator in Section 8.3.3.3 is named after Coleman (1990), who introduced policy-function iteration as an alternative to value function iteration for solving the stochastic growth model, and Coleman (1991), who extended the method to distorted economies. Datta et al. (2002) generalized the approach using lattice-theoretic methods.

The endogenous grid method was introduced by Carroll (2006). Extensions to handle non-convex and discrete–continuous choice problems were developed by Fella (2014) and Iskhakov et al. (2017), respectively. The dramatic speed improvements offered by EGM have made it a standard component of the structural estimation toolkit for life-cycle and consumer-choice models.

The topological conjugacy approach used in Section 8.3.4 to establish equivalence of VFI and time iteration draws on ideas developed in Sargent & Stachurski (2025).

Footnotes¶

For example, if $p \equiv 1$ , then $T^n p = \delta^n 1$ , which diverges to $+\infty$ .

References¶

McCall, J. J. (1970). Economics of Information and Job Search. The Quarterly Journal of Economics, 84(1), 113–126.
Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
Green, L., Myerson, J., & McFadden, E. (1997). Rate of temporal discounting decreases with amount of reward. Memory & Cognition, 25, 715–723.
Ljungqvist, L., & Sargent, T. J. (2018). Recursive macroeconomic theory (4th ed.). MIT Press.
Coase, R. H. (1937). The nature of the firm. Economica, 4(16), 386–405.
Williamson, O. E. (1979). Transaction-cost economics: the governance of contractual relations. Journal of Law and Economics, 22(2), 233–261.
North, D. C. (1993). Institutions, transaction costs and productivity in the long run. In Washington University in St. Louis [Techreport].
Blume, L. E., Easley, D., Kleinberg, J., & Tardos, E. (2009). Trading networks with price-setting agents. Games and Economic Behavior, 67(1), 36–50.
Kikuchi, T., Nishimura, K., & Stachurski, J. (2018). Span of control, transaction costs and the structure of production chains. Theoretical Economics, 13(2), 729–760.
Stachurski, J. (2022). Economic dynamics: theory and computation (2nd ed.). MIT Press.
Carroll, C. D. (2006). The method of endogenous gridpoints for solving dynamic stochastic optimization problems. Economics Letters, 91(3), 312–320.
Mortensen, D. T. (1986). Job search and labor market analysis. Handbook of Labor Economics, 2, 849–919.
Rothschild, M. (1974). Searching for the lowest price when the distribution of prices is unknown. Journal of Political Economy, 82(4), 689–711.
Esponda, I., & Pouzo, D. (2021). Equilibrium in misspecified Markov decision processes. Theoretical Economics, 16(2), 717–757.
Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46(1), 185–200.
