Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

8 Additional Applications

8.1Job Search

We formulate a basic iid job search model as an ADP and establish optimality results. We then reduce dimensionality via continuation values and study parametric monotonicity of the reservation wage. Finally, we extend the model to allow correlated wage draws with a Markov structure.

8.1.1The Basic Model

We begin with the job search problem of McCall (1970), a finite state version of which was discussed at length in Chapter 1 of Sargent & Stachurski (2025). Here we consider a general state version and allow wages and rewards to be unbounded.

We describe the model, construct the associated ADP on L1(ϕ)L_1(\phi), and verify that it is well-posed and regular. We also consider smaller value spaces that can be used when the offer distribution has bounded support. We then establish the fundamental optimality properties and convergence of the standard algorithms.

8.1.1.1Description

In each period, an unemployed worker receives a wage offer WtW_t, drawn from some known distribution ϕ\phi. The worker can accept the current offer or wait until the following period and consider a new offer.

Let L1(ϕ)L1(W,B,ϕ)L_1(\phi) \coloneq L_1(\Wsf, \bB, \phi) be all Borel measurable f ⁣:WRf \colon \Wsf \to \RR with f ⁣dϕ<\int |f| \diff \phi < \infty. As usual, functions equal ϕ\phi-almost everywhere are identified and fgf \leq g means that {f>g}\{f > g\} has measure zero under ϕ\phi. Let Σ\Sigma be all Borel measurable σ ⁣:W{0,1}\sigma \colon \Wsf \to \{0,1\}. Each such σ\sigma can be understood as a policy, mapping states to actions: If σ(w)=1\sigma(w)=1, then the unemployed worker stops and accepts current offer ww. If σ(w)=0\sigma(w)=0, she continues.

Consider first a two-period problem. In period zero, the worker can either accept observed wage offer w0ϕw_0 \sim \phi or continue to the next period, receiving unemployment compensation cc and random payoff v(W1)v(W_1). The offer W1W_1 is drawn from ϕ\phi and vv is a given “terminal reward” function. Under policy σ\sigma, which maps the wage offer w0w_0 into an accept/reject decision, the expected present value of her payoff is

vσ(w0)σ(w0)w0+(1σ(w0))[c+βv(w)ϕ( ⁣dw)].v_\sigma(w_0) \coloneq \sigma(w_0) w_0 + (1-\sigma(w_0)) \left[ c + \beta \int v(w') \phi(\diff w') \right].

If σ(w0)=1\sigma(w_0)=1 the worker accepts and receives reward w0w_0. If σ(w0)=0\sigma(w_0)=0, then she rejects and receives expected continuation reward c+βv(w)ϕ( ⁣dw)c + \beta \int v(w') \phi(\diff w').

Now we switch to an infinite horizon. Jobs are assumed to be permanent, so the present value of stopping with wage offer ww in hand is

w1β=w+βw+β2w+\frac{w}{1-\beta} = w + \beta w + \beta^2 w + \cdots

Fixing σΣ\sigma \in \Sigma, let vσ(w)v_\sigma(w) be the lifetime value of following policy σ\sigma given initial wage offer ww. By analogy with (8.1), we expect vσv_\sigma to obey the recursion

vσ(w)=σ(w)w1β+(1σ(w))[c+βvσ(w)ϕ( ⁣dw)]for all wW.v_\sigma(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v_\sigma(w') \phi(\diff w') \right] \quad \text{for all } w \in \Wsf.

Compared to (8.1), we have taken the value of stopping from (8.2) and also replaced the terminal value function vv on the right-hand side of (8.1) with vσv_\sigma. This is because we now work with an infinite horizon, so that (8.3) becomes a recursion in vσv_\sigma.

Continuing to hold σ\sigma fixed, we introduce the policy operator vTσvv \mapsto T_\sigma \, v via

(Tσv)(w)=σ(w)w1β+(1σ(w))[c+βv(w)ϕ( ⁣dw)].(T_\sigma \, v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v(w') \phi(\diff w') \right].

Since L1(ϕ)L_1(\phi) is closed under linear operations, policies are Borel measurable, and Assumption 8.1.1 is in force, we have TσvL1(ϕ)T_\sigma \, v \in L_1(\phi) whenever vL1(ϕ)v \in L_1(\phi). Clearly TσT_\sigma is order preserving on (L1(ϕ),)(L_1(\phi), \leq). Hence, with T{Tσ:σΣ}\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}, the pair (L1(ϕ),T)(L_1(\phi), \TT) is an ADP. By construction, any fixed point of TσT_\sigma solves (8.3), so each such fixed point vσv_\sigma has the interpretation of assigning lifetime values to states under σ\sigma.

Fix vL1(ϕ)v \in L_1(\phi) and consider the policy σ\sigma given by

σ(w)=1{w1βc+βv(w)ϕ( ⁣dw)}(wW).\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int v(w') \phi(\diff w') \right\} \qquad (w \in \Wsf).

This policy tells the worker to stop when the payoff from stopping is larger than the expected payoff from continuing, assuming that vv is used to value future states.

Since σ\sigma in (8.6) is well-defined at any vL1(ϕ)v \in L_1(\phi), and also Borel measurable, the ADP (L1(ϕ),T)(L_1(\phi), \TT) is regular.

The expression for TT in (8.7) aligns with the job search Bellman operator from Chapter 1 of Sargent & Stachurski (2025), after replacing the finite expectation with an integral.

8.1.1.2Reducing the Value Space

In the preceding analysis we used L1(ϕ)L_1(\phi) for the value space because ϕ\phi is allowed to have unbounded support. Since ww can be arbitrarily large, this implies that the function TσvT_\sigma \, v in (8.4) is unbounded. The set L1(ϕ)L_1(\phi) can handle unbounded functions. To complete this section, the next exercise looks at settings where ϕ\phi has bounded support and considers how we might exploit this by selecting a smaller value space.

8.1.1.3Optimality with iid Offers

Let’s return now to the general setting of Assumption 8.1.1, where WR+\Wsf \subset \RR_+ can be unbounded, and use L1(ϕ)L_1(\phi) for the value space. We consider optimality properties and convergence of algorithms for the job search ADP (L1(ϕ),T)(L_1(\phi), \TT).

Since the fundamental optimality properties hold, the value function v\vmax is a fixed point of the Bellman operator TT and a policy σ\sigma is optimal if and only if it is v\vmax-greedy, which is to say that

σ(w)=1{w1βc+βv(w)ϕ( ⁣dw)}\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \int \vmax(w') \phi(\diff w') \right\}

for all wR+w \in \RR_+. (We are assuming that the agent accepts the job offer if indifferent.) In other words, the optimal rule is to stop if and only if

w(1β)[c+βv(w)ϕ( ⁣dw)].w \geq (1 - \beta) \left[ c + \beta \int \vmax(w') \phi(\diff w') \right].

The term on the right-hand side is called the reservation wage. This representation of optimal behavior is convenient because the reservation wage provides a scalar summary of the solution to the problem.

8.1.2Rearranging the Bellman Equation

In the iid case, the Bellman equation can be reduced to a scalar fixed point problem in a single continuation value. We derive this reduction, connect it to the theory of factored DPs, and use it to study how the reservation wage varies with model parameters.

8.1.2.1Continuation Values

In view of (8.7), a function vv satisfies the Bellman equation when

v(w)=max{w1β,c+βv(w)ϕ( ⁣dw)}for all wW.v(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \int v(w') \phi(\diff w') \right\} \quad \text{for all } w \in \Wsf.

Taking vv as given, consider the term

h=c+βv(w)ϕ( ⁣dw).h = c + \beta \int v(w') \phi(\diff w') .

We can use hh to eliminate the function vv from (8.11). To do so we insert hh on the right-hand side, replace ww with ww' in (8.11), take expectations, multiply by β\beta and add cc to obtain

h=c+βmax{w1β,h}ϕ( ⁣dw).h = c + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w').

This is a nonlinear equation in hh, the solution of which, henceforth denoted h\hmax, is the optimal continuation value of our problem. Obtaining h\hmax allows us to solve the dynamic programming problem, since any policy σ\sigma satisfying

σ(w)=1{w1βh}for all wR+\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax \right\} \quad \text{for all } w \in \RR_+

is optimal. (We discuss this more formally in Section 8.1.2.2.) Another way to write (8.14) is

σ(w)=1{ww}where   w(1β)h,\sigma(w) = \1 \left\{ w \geq \wopt \right\} \quad \text{where } \; \wopt \coloneq (1 - \beta) \hmax,

where the final term is the reservation wage.

In order to solve (8.13), we introduce the mapping

g(h)=c+βmax{w1β,h}ϕ( ⁣dw)(hR+).g(h) = c + \beta \int \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(\diff w') \qquad (h \in \RR_+).

It is constructed such that any solution to (8.13) is a fixed point of gg and vice versa.

The result of Exercise 8.6 implies that gg has a unique fixed point h\hmax in R+\RR_+, which is the optimal continuation value. Figure 8.1 shows the function gg when lnWt=μ+sZt\ln W_t = \mu + s Z_t for standard normal ZtZ_t, while β=0.9\beta = 0.9 and c=1.0c = 1.0. The integral in (8.16) is computed by Monte Carlo. The unique fixed point is h\hmax.

The function g from

Figure 8.1:The function gg from (8.16)

One obvious advantage of the formulation of the problem in Section 8.1.2.1 is that, instead of searching for a value function v\vmax in the infinite-dimensional space L1(ϕ)L_1(\phi), we only need to solve for the fixed point of gg in the one-dimensional space R+\RR_+. The next exercise further reduces the search space to a bounded interval in R+\RR_+.

Figure 8.2 shows the reservation wage, computed by iterating on gg to obtain (an approximation to) h\hmax and then calculating w\wopt via (8.15). In the computation, cc and the distribution ϕ\phi are as for the last figure, while β\beta ranges from 0.9 to 0.99.

8.1.2.2An FDP Perspective

As an exercise, let’s connect the transformation discussed in Section 8.1.2.1 to the theory of FDPs in Chapter 5. For this discussion we adopt the environment of Section 8.1.1.3 and set

  1. V=L1(ϕ)V = L_1(\phi),

  2. V^=R+\hat V = \RR_+,

  3. F ⁣:VV^F \colon V \to \hat V with Fv=c+βv(w)ϕ( ⁣dw)Fv = c + \beta \int v(w') \phi(\diff w'), and

  4. Gσ ⁣:V^VG_\sigma \colon \hat V \to V with (Gσh)(w)=σ(w)(w/(1β))+(1σ(w))h(G_\sigma \, h)(w) = \sigma(w) (w/(1-\beta)) + (1-\sigma(w)) h.

Clearly, given hV^h \in \hat V, we can attain the bound GτhGσhG_\tau h \leq G_\sigma h for all τΣ\tau \in \Sigma by setting

σ(w)=1{w1βh}(wW).\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq h \right\} \qquad (w \in \Wsf).

Let G{Gσ}σΣ\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}. Since FF and each GσG_\sigma are order-preserving, the tuple (V,F,V^,G)(V, F, \hat V, \GG) is an order-preserving FDP.

For the primary ADP generated by (V,F,V^,G)(V, F, \hat V, \GG), the policy operators have the form

(Tσv)(w)=(GσFv)(w)=σ(w)w1β+(1σ(w))[c+βv(w)ϕ( ⁣dw)].(T_\sigma \, v)(w) = (G_\sigma F v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[c + \beta \int v(w') \phi(\diff w') \right].

This is identical to (8.4), so the primary ADP is just the original job search ADP (L1(ϕ),T)(L_1(\phi), \TT) from Section 8.1.1.1.

Regarding the subordinate ADP generated by (V,V^,F,G)(V, \hat V, F, \GG), the policy operators have the form

T^σh=FGσh=c+β[σ(w)w1β+(1σ(w))h]ϕ( ⁣dw).\hat T_\sigma \, h = F G_\sigma h = c + \beta \int \left[ \sigma(w') \frac{w'}{1-\beta} + (1-\sigma(w')) h \right] \phi(\diff w').

The associated Bellman operator is

T^h=c+βmax{w1β,h}ϕ( ⁣dw).\hat T \, h = c + \beta \int \max \left\{ \frac{w'}{1-\beta} , h \right\} \phi(\diff w').

On inspection, we see that the fixed point of T^\hat T is a solution to (8.13). Thus, the subordinate ADP (V^,T^)(\hat V, \hat{\TT}) represents the continuation value problem from Section 8.1.2.1.

These observations formalize the ideas expressed in Section 8.1.2.1. For example, Theorem 5.2.13 tells us that a policy σΣ\sigma \in \Sigma will be optimal for the job search problem when h\hmax is a fixed point of T^\hat T and σ\sigma obeys Gσh=GhG_\sigma \, \hmax = \Gmax \hmax. (Here GG is the supremum σGσ\bigvee_\sigma G_\sigma.) In view of (8.18), such a σ\sigma can be found by setting

σ(w)=1{w1βh}(wW).\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax \right\} \qquad (w \in \Wsf).

This policy aligns with the (informally derived) solution from (8.14).

8.1.2.3Parametric Monotonicity

How does the solution to the job search problem vary with parameters? In terms of monotonicity, one way to answer this is to appeal to Proposition A.5.20 on page . Since gg is an increasing contraction mapping on R+\RR_+, this proposition implies that any parameter that shifts up the function gg in (8.16) pointwise on R+\RR_+ also shifts its fixed point up.

Figure 8.2 suggests that w\wopt is also increasing in β\beta. Since w=(1β)h\wopt = (1-\beta) \hmax, we cannot infer this directly from the fact that h\hmax is increasing in β\beta. Instead, we take the fixed point equation for h\hmax in (8.13) and substitute w=(1β)hw = (1-\beta) h, which uses the definition of the reservation wage from (8.15), to obtain a new fixed point equation f(w)=wf(w) = w where

f(w)c(1β)+βmax{w,w}ϕ( ⁣dw).f(w) \coloneq c (1-\beta) + \beta \int \max \left\{ w' ,\, w \right\} \phi(\diff w').

How do shifts in the wage offer distribution affect the reservation wage? One observation is that a shift to a “more favorable” wage distribution should increase the reservation wage, since an agent who continues can expect better offers.

A more interesting monotonicity result for this model concerns the volatility of the wage process and its impact on the reservation wage. For this problem, greater volatility encourages patience because the option value of waiting is larger. The next exercise asks you to verify this, using the concept of mean-preserving spreads (page ).

8.1.3Job Search with Correlated Wage Draws

In our simplistic model of job search we have so far assumed that wage offer draws are iid. Now let’s allow these offers to have a Markov structure:

8.1.3.1An ADP Representation

As before, Σ\Sigma is the set of all Borel measurable functions σ\sigma mapping W\Wsf to {0,1}\{0, 1\}. Each policy operator TσT_\sigma is adjusted to

(Tσv)(w)=σ(w)w1β+(1σ(w))[c+βv(w)P(w, ⁣dw)].(T_\sigma \, v)(w) = \sigma(w) \frac{w}{1-\beta} + (1-\sigma(w)) \left[ c + \beta \int v(w') P(w, \diff w') \right].

We can write TσT_\sigma more succinctly as

Tσv=σe+(1σ)(c+βPv)whene(w)w1β,T_\sigma \, v = \sigma e + (1 - \sigma) (c + \beta Pv) \quad \text{when} \quad e(w) \coloneq \frac{w}{1-\beta},

with products such as σe\sigma e defined pointwise.

It follows from Exercise 8.14 and Exercise 8.15 that, with T\TT as all TσT_\sigma in (8.32) for some σΣ\sigma \in \Sigma, the pair (L1(ϕ),T)(L_1(\phi), \TT) is a regular ADP.

We can now state an optimality result for the job search model with Markov wage draws.

The next exercise suggests a way to reduce the value space.

Since TσT_\sigma is also order preserving, (V,T)(V, \TT) is an ADP.

8.1.3.2A Numerical Study

Figure 8.3 shows the output of HPI under a range of parameter values. First we generate a stochastic matrix PP for wage offers via Tauchen’s method, discretizing the AR1 process Wt+1=ρWt+νξt+1W_{t+1} = \rho W_t + \nu \xi_{t+1} where (ξt)(\xi_t) is iid and standard normal. We set β=0.99\beta=0.99, ρ=0.9\rho=0.9, ν=0.2\nu=0.2 and n=500n=500. In the left subfigure we plot v\vmax, computed by HPI, as well as the exit option e(w)=w/(1β)e(w) = w/(1-\beta) and the reservation wage, which is wˉmin{wW:σ(w)=1}\bar w \coloneq \min\{w \in \Wsf : \sigopt(w) = 1\} when σ\sigopt is v\vmax-greedy. The reservation wage is the minimum wage offer at which the unemployed worker accepts.

In the right subfigure we vary the volatility parameter ν\nu over 0.1 to 0.2 and plot wˉ\bar w as a function of ν\nu, holding other parameters fixed. Notice that the reservation wage increases with wage offer volatility. The reason is that more volatility increases the upside of waiting, due to the possibility of high future offers. At the same time, downside risk is mitigated by the ability to reject a bad offer.

Solution to the job search problem

Figure 8.3:Solution to the job search problem

Figure 8.4 shows the first two iterates of HPI, OPI and VFI, as well as the value function v\vmax and the shared initial condition vv. Parameter values are the same as the left-hand subfigure in Figure 8.3. In the case of OPI, mm is set to 10. We see that HPI converges faster than VFI in terms of reduced distance to the value function per iteration. The rate for OPI is between HPI and VFI.

Comparison of algorithms (job search)

Figure 8.4:Comparison of algorithms (job search)

8.1.3.3Persistent and Transient Components

Let’s now look at a more sophisticated wage offer process, with persistent and transient components. In particular, we assume that (Wt)(W_t) obeys

Wt=exp(Zt)+exp(μ+σζt),whereZt+1=ρZt+d+sϵt+1W_t = \exp(Z_t) + \exp(\mu + \sigma \zeta_t), \qquad \text{where} \quad Z_{t+1} = \rho Z_t + d + s \epsilon_{t+1}

Here μ,dR\mu, d \in \RR, σ,s\sigma, s are positive, and ρ(1,1)\rho \in (-1, 1). The sequences (ζt)t1(\zeta_t)_{t \geq 1} and (ϵt)t1(\epsilon_t)_{t \geq 1} are independent, iid, and standard normal. Thus, the persistent component exp(Zt)\exp(Z_t) and the transient component are lognormal. The model is otherwise unchanged. The state becomes (w,z)(0,)×R(w, z) \in (0,\infty) \times \RR and the Bellman equation is

v(w,z)=max{w1β,c+βEzv(w,z)}.v(w, z) = \max \left\{ \frac{w}{1-\beta}, c + \beta \, \EE_z v(w', z') \right\} .

Here Ez\EE_z is expectation conditional on zz. The expectation term can be written more explicitly as

Ezv(w,z)=v[exp(ρz+d+sϵ)+exp(μ+σζ),ρz+d+sϵ]ϕ( ⁣dϵ, ⁣dζ).\EE_z v(w', z') = \int v \left[ \exp(\rho z + d + s \epsilon) + \exp(\mu + \sigma \zeta), \rho z + d + s \epsilon \right] \, \phi(\diff \epsilon, \diff \zeta).

Here zz and the parameters are fixed and ϕ\phi is the N(0,I)N(0, I) distribution on R2\RR^2.

Rather than analyzing this model directly, we can first reduce dimensionality by transforming it via continuation values, analogous to the technique we used in Section 8.1.2. As a first step, let h(z)h(z) be the continuation value from (8.41):

h(z)c+βEzv(w,z)(zR).h(z) \coloneq c + \beta \, \EE_z v(w', z') \qquad (z \in \RR).

(Notice that hh is a function now, as opposed to the iid setting of (8.12). This is not surprising, since the current state can be used to predict future wages, which in turn determine future value.)

Given hh, the Bellman equation can be written as v(w,z)=max{w/(1β),h(z)}v(w, z) = \max \left\{w/(1-\beta), \, h(z) \right\}. Combining this with the definition of hh, we see that

h(z)=c+βEzmax{w1β,h(z)}(zR).h(z) = c + \beta \, \EE_z \max \left\{ \frac{w'}{1-\beta}, h(z') \right\} \qquad (z \in \RR).

(Note the similarity with (8.13).) The function hh is defined on all of R\RR, since this is the domain of zz. If we can obtain the solution h\hmax to this functional equation, we can use it to act optimally via the policy

σ(w,z)=1{w1βh(z)}.\sigopt(w, z) = \1 \left\{ \frac{w}{1-\beta} \geq \hmax(z) \right\}.

To formalize these ideas, we can construct an ADP such that the Bellman equation agrees with (8.44). To do so we take Σ\Sigma to be all Borel measurable functions sending (w,z)(0,)×R(w', z') \in (0, \infty) \times \RR to {0,1}\{0,1\} and, for each σΣ\sigma \in \Sigma, we set

(T^σh)(z)c+βEz{σ(w,z)w1β+(1σ(w,z))h(z)}.(\hat T_\sigma \, h)(z) \coloneq c + \beta \, \EE_z \left\{ \sigma(w', z') \frac{w'}{1-\beta} + (1- \sigma(w', z')) h(z') \right\} .

We take ϕ\phi to be the stationary distribution of (Zt)(Z_t) and consider each T^σ\hat T_\sigma as a mapping over all hL1(ϕ)L1(R,B,ϕ)h \in L_1(\phi) \coloneq L_1(\RR, \bB, \phi). Let T^\hat{\TT} be all such T^σ\hat T_\sigma as σ\sigma ranges over Σ\Sigma. The pair (L1(ϕ),T^)(L_1(\phi), \hat{\TT}) forms an ADP.

We can alternatively write T^σh\hat T_\sigma h as T^σh=mσ+Kσh\hat T_\sigma \, h = m_\sigma + K_\sigma \, h, where

mσ(z)c+βEz{σ(w,z)w1β}and(Kσh)(z)βEz(1σ(w,z))h(z).m_\sigma(z) \coloneq c + \beta \, \EE_z \left\{\sigma(w', z') \frac{w'}{1-\beta} \right\} \quad \text{and} \quad (K_\sigma \, h)(z) \coloneq \beta \, \EE_z \, (1- \sigma(w', z')) h(z').

Each KσK_\sigma is a positive linear operator on L1(ϕ)L_1(\phi) and, moreover, for the positive linear operator KK defined by

(Kh)(z)βEzh(z)=βEh(ρz+d+sϵt+1),(K h)(z) \coloneq \beta \EE_z \, h(z') = \beta \EE h(\rho z + d + s \epsilon_{t+1}),

we have 0KσhKh0 \leq K_\sigma \, h \leq Kh. By Lemma A.5.32, the spectral radius of KK equals β<1\beta < 1. Hence, by Theorem 4.1.8, the fundamental optimality properties hold, and VFI, OPI and HPI all converge.

Let’s now characterize the Bellman operator, which is defined on L1(ϕ)L_1(\phi) by T^h=σT^σh\hat Th = \bigvee_\sigma \, \hat T_\sigma \, h.

Since L1(ϕ)L_1(\phi) is complete, Banach’s contraction mapping theorem implies that T^\hat T has a unique fixed point h\hmax in L1(ϕ)L_1(\phi).

8.2Extensions

In this section we extend the basic job search framework in several directions. Section 8.2.1 introduces nonlinear discounting, where the discount factor depends on the magnitude of continuation value. Section 8.2.2 treats nonlinear expectations via the Kreps--Porteus certainty equivalent. Section 8.2.3 considers job search with learning, where the offer distribution is unknown and the worker updates beliefs. Section 8.2.4 adds job separation risk.

8.2.1Nonlinear Discounting

Next we consider a setting where discounting is a nonlinear function of continuation value. One motivation for this generalized setup is magnitude effects, under which, for some individuals, discount rates seem to decrease with the size of the reward (i.e., large rewards are discounted less, so the discount factor is increasing in the size of the reward; see, e.g., Green et al. (1997)). Our aim is to resolve the job search problem under this new setup.

We suppose that wage offers are PP-Markov on a Borel set WR+\Wsf \subset \RR_+ and the value of continuing is given by

h(w)=c+β[v(w)]P(w, ⁣dw),h(w) = c + \int \beta[ v(w') ] P(w, \diff w'),

where β ⁣:R+R\beta \colon \RR_+ \to \RR is a discount factor function. Given this nonlinear discounting formulation, the lifetime value of a constant wage stream that pays ww is e(w)e(w), where ee is a fixed point of the operator

(Hg)(w)=w+β(g(w))(wW).(H g)(w) = w + \beta(g(w)) \qquad (w \in \Wsf).

(In the case where β(x)=βx\beta(x) = \beta x for some fixed β(0,1)\beta \in (0,1), we get e(w)=w/(1β)e(w) = w / (1-\beta), so we recover the standard constant discount case.) To simplify the analysis, we assume that W=[w1,w2]\Wsf = [w_1, w_2] where 0<w1<w20 < w_1 < w_2. We also assume that 0<c<w10 < c < w_1 and w21bw_2 \geq 1 - b, so that the worst wage offer is better than unemployment compensation. For the discount factor function we set

β(x)bF(x,λ)where b(0,1) and F(x,λ)=1exp(λx).\beta(x) \coloneq b F(x, \lambda) \quad \text{where } b \in (0,1) \text{ and } F(x, \lambda) = 1 - \exp(-\lambda x).

Obtaining optimality results for this model is not entirely trivial because the policy and Bellman operators are not, in general, contractions. This is due to the fact that β\beta can be steep close to zero, as shown in Figure 8.5. At the same time, β\beta is concave, which gives us some hope that we can use the concavity-based fixed point and optimality results from Section 4.1.3.

The discount function \beta for different choices of \lambda and b=0.99

Figure 8.5:The discount function β\beta for different choices of λ\lambda and b=0.99b=0.99

We begin by analyzing HH in (8.55). As a first step, we set

V[0,vˉ]wherevˉc+w21b.V \coloneq [0 , \bar v] \quad \text{where} \quad \bar v \coloneq \frac{c + w_2}{1 - b}.

In this expression, VV is understood as an order interval in bWb\Wsf. We endow VV with the pointwise partial order \leq and the supremum norm.

In general, there is no closed-form expression for ee, but it can be computed numerically by iterating on HH.

The policy set Σ\Sigma is all measurable σ ⁣:W{0,1}\sigma \colon \Wsf \to \{0,1\}. Each policy operator TσT_\sigma becomes

(Tσv)(w)=σ(w)e(w)+(1σ(w))[c+β[v(w)]P(w, ⁣dw)],(T_\sigma \, v)(w) = \sigma(w) \, e(w) + (1-\sigma(w)) \left[ c + \int \beta[ v(w') ] P(w, \diff w') \right],

Let T{Tσ:σΣ}\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}. Since every TσT_\sigma is order-preserving, (V,T)(V, \TT) is an ADP. Extending our earlier analysis, a policy σ\sigma is vv-greedy whenever

σ(w)1{e(w)c+β[v(w)]P(w, ⁣dw)}(wW).\sigma(w) \coloneq \1 \left\{ e(w) \geq c + \int \beta[v(w')] P(w, \diff w') \right\} \qquad (w \in \Wsf).

Since such a policy exists, the ADP (V,T)(V, \TT) is regular.

8.2.2Nonlinear Expectations

In the last section we modified the job search model to include nonlinear discounting. Here we drop nonlinear discounting but assume instead that the job seeker uses a nonlinear expectation of future values. In particular,

Tσv=σe+(1σ)(c+βRv)T_\sigma \, v = \sigma e + (1-\sigma) ( c + \beta Rv)

where

(Rv)(w)(v1γ(w)P(w, ⁣dw))1/(1γ)(wW,  γ1).(Rv)(w) \coloneq \left( \int v^{1-\gamma}(w') P(w, \diff w') \right)^{1/(1-\gamma)} \qquad (w \in \Wsf, \; \gamma \not= 1).

The operator RR is the Kreps--Porteus operator. The value γ\gamma parameterizes risk aversion for the unemployed worker with respect to intertemporal gambles. When γ=0\gamma = 0, we recover the standard linear expectation; when γ>0\gamma > 0, the agent is risk-averse; when γ<0\gamma < 0, the agent is risk-loving. The constant β\beta lies in (0,1)(0,1). As before, e(w)w/(1β)e(w) \coloneq w/(1-\beta) is the stopping reward. (Stopped rewards are deterministic so e(w)e(w) is not affected by γ\gamma.)

The operator TσT_\sigma and the operator RR act on the set

V[c,vˉ]wherevˉc+w21β.V \coloneq [c , \bar v] \quad \text{where} \quad \bar v \coloneq \frac{c + w_2}{1 - \beta}.

As in Section 8.2.1, VV is understood as an order interval in bWb\Wsf and 0<c<w1<w20 < c < w_1 < w_2.

Letting T{Tσ:σΣ}\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}, we aim to show that (V,T)(V, \TT) is an ADP and, moreover, that the conditions of Theorem 4.1.11 hold. As a first step, we observe that, for fixed σΣ\sigma \in \Sigma and for sufficiently small positive ϵ\epsilon,

(Tσc)(w)min{c1β,  c+βc}c+ϵ(vˉc).(T_\sigma \, c)(w) \geq \min \left\{ \frac{c}{1-\beta}, \; c + \beta c \right\} \geq c + \epsilon (\bar v - c).

Also, for sufficiently small positive ϵ\epsilon,

(Tσvˉ)(w)max{w21β,  c+βvˉ}max{vˉc1β,  vˉw2}vˉϵ(vˉc).(T_\sigma \, \bar v)(w) \leq \max \left\{ \frac{w_2}{1-\beta}, \; c + \beta \bar v \right\} \leq \max \left\{ \bar v - \frac{c}{1-\beta}, \; \bar v - w_2 \right\} \leq \bar v - \epsilon (\bar v - c).

Since RR and hence TσT_\sigma is order preserving, these facts tell us that TσT_\sigma maps VV to itself, so (V,T)(V, \TT) is an ADP.

Combining the last two ϵ\epsilon bounds with fact (ii) above, we see that the conditions of Theorem 4.1.11 are satisfied for every γR{1}\gamma \in \RR \setminus \{1\}. As a result, the fundamental optimality properties hold and VFI, OPI, and HPI all converge.

Figure 8.6 shows the reservation wage wˉ\bar w as a function of γ\gamma. The figure was computed as follows. Fixing γ\gamma, we calculated v\vmax via VFI, set

σ(w)1{w1βc+β([v(w)]1γP(w, ⁣dw))1/(1γ)}\sigma(w) \coloneq \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \left( \int [\vmax(w')]^{1-\gamma} P(w, \diff w') \right)^{1/(1-\gamma)} \right\}

and then set wˉmin{wW:σ(w)=1}\bar w \coloneq \min\{w \in \Wsf : \sigma(w) = 1\}. Aside from γ\gamma, the parameters used in the calculations were the same as those given in Section 8.1.3.2. The figure shows that the reservation wage decreases with γ\gamma, as the job seeker becomes progressively more risk-averse. Increasing risk aversion means that gambles over future payoffs become less attractive, which favors stopping over continuing. This encourages the job seeker to lower the reservation wage.

8.2.3Job Search with Learning

Next we consider a variation of the job search model from §6.6 of Ljungqvist & Sargent (2018). The framework is the iid setting of Section 8.1.1.1, apart from the fact that the wage offer distribution ϕ\phi is unknown to the worker. Instead, the agent learns about ϕ\phi by starting with a prior belief and then successively updating her beliefs based on observed wage offers. The learning process and hence the stopping problem is similar to the one we analyzed in Section 1.4. One difference is that we will use discounting, which simplifies optimality results. Another is that we will use a transformation to reduce dimensionality.

8.2.3.1The Model

The structure of information is as follows: The worker knows there are two possible offer distributions, with densities ff and gg. At the start of time, nature selects ϕ\phi to be either ff or gg, the wage distribution from which the entire sequence (Wt)t0(W_t)_{t \geq 0} will be drawn. This choice is not observed by the worker, who puts prior probability π0\pi_0 on ff being chosen. In other words, the worker’s initial guess of ϕ\phi is π0f(w)+(1π0)g(w)\pi_0 f(w) + (1 - \pi_0) g(w). Beliefs subsequently update according to Bayes’ rule. Thus, the agent, having observed Wt+1W_{t+1}, updates πt\pi_t to πt+1\pi_{t+1} via

πt+1=f(Wt+1)πtf(Wt+1)πt+g(Wt+1)(1πt).\pi_{t+1} = \frac{f(W_{t+1})\pi_t}{f(W_{t+1}) \pi_t + g(W_{t+1}) (1 - \pi_t)}.

(Note the connection to the learning dynamics we obtained for the sequential analysis problem in (1.113).)

Using (8.66), we can formulate an ADP representation of the optimal stopping problem. Dropping time subscripts, let ϕππf+(1π)g\phi_{\pi} \coloneq \pi f + (1 - \pi) g represent the estimate of the wage offer distribution given belief π\pi and let

κ(w,π)πf(w)πf(w)+(1π)g(w)(w(0,M),  π(0,1)).\kappa(w, \pi) \coloneq \frac{\pi f(w)}{\pi f(w) + (1 - \pi) g(w)} \qquad (w \in (0, M), \; \pi \in (0, 1)).

In particular, κ(w,π)\kappa(w, \pi) is the updated value π\pi' of π\pi having observed draw ww. The state is (w,π)(0,M)×(0,1)(w, \pi) \in (0, M) \times (0, 1) and π\pi is referred to as the belief state.

The policy operators for this learning search problem take the form

(Tσv)(w,π)=σ(w,π)w1β+(1σ(w,π))[c+βv(w,κ(w,π))ϕπ(w) ⁣dw].(T_\sigma \, v)(w, \pi) = \sigma(w, \pi) \frac{w}{1 - \beta} + (1 - \sigma(w, \pi)) \left[ c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w' \right].

Each TσT_\sigma acts on vVv \in V, which we define as the set of bounded Borel measurable functions on (0,M)×(0,1)(0, M) \times (0, 1). Let T={Tσ:σΣ}\TT = \setntn{T_\sigma }{\sigma \in \Sigma}. Evidently (V,T)(V, \TT) is an ADP.

8.2.3.2An Efficient Solution Method

Rather than tackling (V,T)(V, \TT) directly, we will introduce a variation with a lower dimensional state space. To begin, fix vVv \in V and let ω(π)\omega(\pi) be the corresponding reservation wage at belief state π\pi, which is the wage level at which the worker is indifferent between accepting and rejecting. This value satisfies

ω(π)1β=c+βv(w,κ(w,π))ϕπ(w) ⁣dw.\frac{\omega(\pi)}{1 - \beta} = c + \beta \int v(w', \kappa(w', \pi)) \, \phi_{\pi}(w') \, \diff w'.

We combine (8.70) and (8.71) to obtain

v(w,π)=max{w1β,ω(π)1β}v(w, \pi) = \max \left\{ \frac{w}{1 - \beta} ,\, \frac{\omega(\pi)}{1 - \beta} \right\}

and then use this expression to eliminate vv in (8.71), obtaining

ω(π)=(1β)c+βmax{w,ω[κ(w,π)]}ϕπ(w) ⁣dw.\omega(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w'.

Equation (8.73) can be understood as a functional equation in ω\omega. Equivalently, the map ω\omega is the fixed point of the operator T^\hat T given by

(T^ω)(π)=(1β)c+βmax{w,ω[κ(w,π)]}ϕπ(w) ⁣dw.(\hat T \omega)(\pi) = (1 - \beta) c + \beta \int \max \left\{ w', \omega [ \kappa(w', \pi) ] \right\} \, \phi_{\pi}(w') \, \diff w'.

When this fixed point is well-defined we call it the optimal reservation wage function. The value ω(π)\omega(\pi) will indicate the smallest wage offer at which the worker is willing to accept, given her current belief state π\pi.

8.2.3.3Parametric Monotonicity

Let’s try computing the optimal reservation wage function using the ideas described above. The wage offer distributions are set to

f=Beta(4,2)andg=Beta(2,4),f = \text{Beta}(4, 2) \quad \text{and} \quad g = \text{Beta}(2, 4),

as shown in Figure 8.7. The other parameters are c=0.1c=0.1 and β=0.95\beta = 0.95. Since T^\hat T is a contraction of modulus β\beta on V^\hat V, a unique solution ω\omegaopt to the reservation wage functional equation exists in V^\hat V and T^kωω\hat T^k \omega \to \omegaopt uniformly as kk \to \infty, for any ωV^\omega \in \hat V. Figure 8.8 shows the result of this iteration, the optimal reservation wage, as a function of π\pi, the belief state.

The two unknown densities f and g

Figure 8.7:The two unknown densities ff and gg

Optimal reservation wage function \omegaopt

Figure 8.8:Optimal reservation wage function ω\omegaopt

Note that the optimal reservation wage function ω\omegaopt in Figure 8.8 is increasing in π\pi. This result seems reasonable: In Figure 8.7, the density ff puts more mass on higher draws, so, as our belief shifts toward ff, our reservation wage should increase. The next proposition gives conditions for such monotonicity.

8.2.4Job Search with Separation

We consider a version of the job search model from Section 8.1.3.1 where separation can occur. In particular, an existing match between worker and firm dissolves with probability α\alpha every period. Note that this discussion extends a treatment of a similar model in a finite-state setting from Chapter 3 of Sargent & Stachurski (2025).

To simplify the discussion, we assume that the set of possible wage offers WR+\Wsf \subset \RR_+ is bounded above by some positive constant MM. The state space for the problem is X{e,u}×W\Xsf \coloneq \{e, u\} \times \Wsf, with a typical element (s,w)(s, w) denoting employment status ss (here ee means employed and uu means unemployed), and current offer ww. A policy is a Borel measurable map σ ⁣:W{0,1}\sigma \colon \Wsf \to \{0,1\}, where, as usual, σ(w)=0\sigma(w)=0 means “reject the current offer” and σ(w)=1\sigma(w)=1 means “accept.”

The wage offer sequence (Wt)(W_t) is assumed to be PP-Markov on W\Wsf. The value space VV will be all bounded and Borel measurable v ⁣:XRv \colon \Xsf \to \RR and we endow VV with the supremum norm and the pointwise partial order.

The policy operators take the form

(Tσv)(e,w)=w+β[αv(u,w)P(w, ⁣dw)+(1α)v(e,w)](T_\sigma \, v)(e, w) = w + \beta \left[ \alpha \int v(u, w') P(w, \diff w') + (1-\alpha) v(e, w) \right]

and

(Tσv)(u,w)=σ(w)v(e,w)+(1σ(w))[c+βv(u,w)P(w, ⁣dw)].(T_\sigma \, v)(u, w) = \sigma(w) v(e, w) + (1 - \sigma(w)) \left[ c + \beta \, \int v(u, w') P(w, \diff w') \right].

The right-hand side of the first expression is the current value of being employed with offer ww in hand, given the continuation values embodied in vv. The right-hand side of the second expression is the current value of being unemployed with offer ww in hand, conditional on using policy σ\sigma.

We can solve this problem directly by setting up the corresponding ADP, but we can also start by simplifying the value space in a way we now describe. This will make the analysis easier and help with computation. The first step is to regard (8.86) as a fixed point problem, replacing (Tσv)(e,w)(T_\sigma \, v)(e, w) with v(e,w)v(e, w) on the left hand side and treating v(u,)v(u, \cdot) as given. Simple algebra then gives

v(e,w)=11β(1α)[w+αβv(u,w)P(w, ⁣dw)].v(e, w) = \frac{1}{1 - \beta(1-\alpha)} \left[ w + \alpha \beta \int v(u, w') P (w, \diff w') \right].

Let’s write this in operator notation. In doing so, we will rewrite v(u,)v(u, \cdot) as vuv_u and v(e,)v(e, \cdot) as vev_e. Setting

h(w)11β(1α)w,andγαβ1β(1α),h(w) \coloneq \frac{1}{1 - \beta(1-\alpha)} w, \quad \text{and} \quad \gamma \coloneq \frac{\alpha \beta}{1 - \beta(1-\alpha)},

we have ve=h+γPvuv_e = h + \gamma P v_u. We substitute this expression into (8.87) to get

Tσvu=σ(h+γPvu)+(1σ)(c+βPvu),T_\sigma \, v_u = \sigma(h + \gamma Pv_u) + (1 - \sigma) (c + \beta P v_u),

We take bWb\Wsf as the value space and let T={Tσ:σΣ}\TT = \setntn{T_\sigma}{\sigma \in \Sigma}. As before, Σ\Sigma is all Borel measurable maps from W\Wsf to {0,1}\{0,1\}. Recalling that W\Wsf is bounded above, one can easily confirm that TσT_\sigma maps bWb\Wsf to itself. Clearly TσT_\sigma is order preserving. Hence (bW,T)(b\Wsf, \TT) is an ADP.

Given vubWv_u \in b\Wsf, set ve=h+γPvuv_e = h + \gamma P v_u and consider the policy σ\sigma defined by

σ(w)1{ve(w)c+βvu(w)P(w, ⁣dw)}for all wW.\sigma(w) \coloneq \1 \left\{ v_e(w) \geq c + \beta \, \int \, v_u(w') P(w, \diff w') \right\} \quad \text{for all } w \in \Wsf.

We claim that σ\sigma is vuv_u-greedy. Indeed, for such a σ\sigma and any alternative policy ss we have

Tsvu=s(h+γPvu)+(1s)(c+βPvu)(h+γPvu)(c+βPvu)=Tσvu.T_s \, v_u = s(h + \gamma Pv_u) + (1 - s) (c + \beta P v_u) \leq (h + \gamma Pv_u) \vee (c + \beta P v_u) = T_\sigma \, v_u .

The expression for σ\sigma in (8.91) is natural because it tells the worker to accept employment whenever its value is higher than the expected present value of continuing, given the continuation value for unemployment associated with vuv_u.

Regarding optimality, we have the following result.

The value function vu\vmax_u for an unemployed worker satisfies the recursion

vu(w)=max{ve(w),c+βvu(w)P(w, ⁣dw)}(wW),\vmax_u(w) = \max \left\{ \vmax_e(w) ,\, c + \beta \, \int \vmax_u(w') P(w, \diff w') \right\} \qquad (w \in \Wsf),

where ve\vmax_e is the value function for an employed worker, that is, the lifetime value of a worker who starts the period employed at wage ww. Value function ve\vmax_e satisfies

ve(w)=w+β[αvu(w)P(w, ⁣dw)+(1α)ve(w)](wW).\vmax_e(w) = w + \beta \left[ \alpha \int \vmax_u(w') P(w, \diff w') + (1-\alpha) \vmax_e(w) \right] \qquad (w \in \Wsf).

This equation states that value accruing to an employed worker is current wage plus the discounted expected value of being either employed or unemployed next period.

We claim that, when 0<α,β<10 < \alpha, \beta < 1, the system (8.96)--(8.97) has a unique solution (ve,vu)(\vmax_e, \vmax_u) in bW×bWb\Wsf \times b\Wsf.

Substituting into (8.96) yields

vu(w)=max{11β(1α)(w+αβ(Pvu)(w)),c+β(Pvu)(w)}.\vmax_u(w) = \max \left\{ \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P \vmax_u)(w) \right) ,\, c + \beta \, (P \vmax_u)(w) \right\}.

The stopping and continuation values are given by

s(w)11β(1α)(w+αβ(Pvu)(w))andh(w)c+β(Pvu)(w)\sopt(w) \coloneq \frac{1}{1 - \beta(1-\alpha)} \left(w + \alpha \beta (P \vmax_u)(w) \right) \quad \text{and} \quad \hmax(w) \coloneq c + \beta \, (P \vmax_u)(w)

respectively, for each wWw \in \Wsf. The value function vu\vmax_u is the pointwise maximum (i.e., vu=sh\vmax_u = \sopt \vee \hmax). The worker’s optimal policy while unemployed is

σ(w)1{s(w)h(w)}.\sigopt(w) \coloneq \1\{\sopt(w) \geq \hmax(w)\}.

8.3More Applications

In this section we present several additional applications of the theory developed in earlier chapters. Section 8.3.1 studies problems with negative discounting and connects them to Coase’s theory of the firm. Section 8.3.2 treats an optimal harvest problem and shows how factorization reduces dimensionality. Section 8.3.3 develops Euler equation methods for continuous choice problems, and Section 8.3.4 establishes the equivalence of value function iteration and time iteration.

8.3.1Coase Meets Bellman

In this section, we study a production chain model that can capture the key idea from Coase’s classic study on the nature of the firm Coase (1937). We then show how the equilibrium price function can also be recovered as the solution to a dynamic programming problem. The dynamic programming problem is itself interesting: it analyzes loss minimization with negative discounting, a scenario that appears to have significant empirical relevance. Moreover, negative discounting makes the application of dynamic programming nontrivial. We solve the problem using order stability arguments combined with the theory developed in Section 2.2.2.

8.3.1.1A Production Chain

Coase (1937) argues that firms have nontrivial size because of transaction costs associated with using the market. (One example is the cost of negotiating, drafting, monitoring and enforcing contracts with suppliers. Others include search frictions, transaction fees, taxes, and information costs --- see, e.g., Williamson (1979)North (1993)Blume et al. (2009)). Because of these costs, entrepreneurs and managers can sometimes coordinate production at a lower cost within the firm.

At the same time, a countervailing force, referred to by Coase (1937) as “diminishing returns to management,” prevents firms expanding without limit. Rising costs per task can be thought of as driven by the expanding informational requirements associated with larger planning problems, leading to progressively higher management costs, incentive problems and misallocation of resources. The size of firms is determined by the trade-off associated with these two forces (transaction costs and diminishing returns to management).

Here we discuss a model that captures these ideas. In the model, firms produce a unit of a final good via sequential completion of processing stages. Stages are indexed by t[0,1]t \in [0,1], with t=1t=1 indicating that the good is complete. One allocation of tasks is illustrated in Figure 8.9. In this example, firm 1 sells one unit of the completed good to a final buyer. Firm 1 then contracts with firm 2 to purchase the partially completed good at stage t1t_1, with the intention of implementing the remaining 1t11 - t_1 tasks in-house (i.e., processing from stage t1t_1 to stage 1). Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at stage t2t_2. Firm 3 decides to complete the chain, selecting t3=0t_3 = 0. The value tit_i is called the upstream boundary of firm ii.

Recursive allocation of production tasks

Figure 8.9:Recursive allocation of production tasks

Production now unfolds from upstream to downstream. First, firm 3 completes processing stages from t3=0t_3 = 0 up to t2t_2 and transfers the good to firm 2. Firm 2 then processes from t2t_2 up to t1t_1 and transfers the good to firm 1, who processes from t1t_1 to 1 and delivers the completed good to the final buyer. In what follows, the length of the interval of stages carried out by firm ii is denoted by aia_i and referred to as the range of tasks carried out by firm ii. Figure 8.10 helps to clarify notation.

Notation

Figure 8.10:Notation

There is a countable infinity of ex ante identical firms and no fixed costs or barriers to entry. An allocation is a nonnegative sequence A={ai}iNA = \{a_i\}_{i \in \NN}. An allocation AA defines a division of tasks across firms, with aia_i being the range of tasks implemented by the ii-th firm. If ai=0a_i = 0 then firm ii is understood to be inactive. An allocation AA is called feasible if there exists some finite II with i=1Iai=1\sum_{i = 1}^I a_i = 1. Feasibility means that the entire production process is completed by finitely many firms. Given a feasible allocation AA, let {ti}\{t_i\} represent the corresponding transaction stages, defined by t0=1t_0 = 1 and ti=ti1ait_i = t_{i-1} - a_i. In particular, ti1t_{i-1} is the downstream boundary of firm ii and tit_i is its upstream boundary (as in Figure 8.10).

Firms face a price function p ⁣:[0,1]R+p \colon [0, 1] \to \RR_+, with p(t)p(t) indicating the price of the good at stage tt. Since the ii-th firm purchases the good at stage tit_i, sells it at stage ti1t_{i-1}, and undertakes the remaining aia_i tasks in-house, its total costs are its processing costs c(ai)c(a_i) plus gross input cost δp(ti)\delta p(t_i). The term δ>1\delta > 1 represents transaction costs. Transaction costs are incurred only by the buyer (a simplifying assumption), so its profits are

πi=p(ti1)c(ai)δp(ti).\pi_i = p(t_{i-1}) - c(a_i) - \delta p(t_i) .

Diminishing returns to management are implemented by assuming that cc is increasing and strictly convex. We also assume that cc is continuously differentiable, with c(0)=0c(0)=0 and c(0)>0c'(0) > 0.

Condition (i) is a zero profit condition for suppliers of initial inputs (which are costless). Condition (ii) states that all active firms make zero profits. Condition (iii) ensures that no firm in the production chain has an incentive to deviate, and that no inactive firm can enter and extract positive profits.

To construct an equilibrium we introduce the operator TT mapping pp to TpTp via

(Tp)(s)=mints{c(st)+δp(t)}(0s0).(Tp)(s) = \min_{t \leq s} \, \{ c(s-t) + \delta p(t) \} \qquad (0 \leq s \leq 0).

Here and below, the restriction 0t0 \leq t in the minimum is understood. Since δ>1\delta > 1, the map TT is not a contraction in any obvious metric, and TnpT^n p diverges for many choices of pp, even when continuous and bounded.[1] Nevertheless, there exists a domain on which TT is well-behaved: the set of convex increasing continuous functions p ⁣:[0,1]Rp \colon [0,1] \to \RR such that c(0)sp(s)c(s)c'(0) s \leq p(s) \leq c(s) for all 0s10 \leq s \leq 1. We denote this set of functions by P\pP.

This result is proved in Kikuchi et al. (2018). In Section 8.3.1.2 below, we give an alternative proof by constructing a dynamic program whose value function coincides with pp^*.

The significance of pp^* is that there exists an allocation AA^* such that (p,A)(p^*, A^*) is an equilibrium for the production chain in the sense of Definition 8.3.1. To construct this allocation, we introduce the equilibrium choice function

t(s)argmints{c(st)+δp(t)}.t^*(s) \coloneq \argmin_{ t \leq s} \{c(s - t) + \delta p^*(t) \}.

By definition, t(s)t^*(s) is the cost minimizing upstream boundary for a firm contracted to deliver the good at stage ss and facing the price function pp^*. Since pp^* lies in P\pP and since cc is strictly convex, the minimizer t(s)t^*(s) exists and is uniquely defined.

We can use tt^* to construct an equilibrium allocation. The optimal upstream boundary of firm 1 is t(1)t^*(1). Hence firm 2’s optimal upstream boundary is t(t(1))t^*(t^*(1)). Continuing in this way produces the sequence {ti}\{t^*_i\} defined by

t0=1andti=t(ti1).t_0^* = 1 \quad \text{and} \quad t_i^* = t^*(t^*_{i-1}) .

The sequence ends when a firm chooses to complete all remaining tasks. We label this firm (and hence the number of firms in the chain) as nn^*, defined by ninf{iN:ti=0}n^* \coloneq \inf\setntn{i \in \NN}{t^*_i = 0}. The task allocation corresponding to (8.104) is given by aiti1tia_i^* \coloneq t_{i-1}^* - t_i^* for all ii. The function pp^* is called the equilibrium price function and AA^* is called the equilibrium allocation. Kikuchi et al. (2018) prove the following result:

The idea of the proof is as follows. As a fixed point of TT, the equilibrium price function satisfies

p(s)=mints{c(st)+δp(t)}for alls[0,1].p^*(s) = \min_{t \leq s} \, \{ c(s - t) + \delta p^*(t) \} \quad \text{for all} \quad s \in [0, 1].

From this equation it is clear that pp^* satisfies part (iii) of Definition 8.3.1. Moreover, the equilibrium upstream boundary for firm ii is the minimizer in (8.105) when ss is its downstream boundary, so profits are zero for all incumbent firms. Hence part (ii) of Definition 8.3.1 is satisfied. Part (i) of the definition is immediate from the fact that pPp^* \in \pP, whence we obtain p(0)c(0)=0p^*(0) \leq c(0) = 0. More details can be found in Kikuchi et al. (2018).

The first order condition for the minimization in (8.105) is

δ(p)(t(s))=c(st(s)).\delta (p^*)'(t^*(s)) = c'(s - t^*(s)).

The left-hand side is the marginal cost of purchasing an additional unit of input through the market (including the transaction cost markup δ\delta), while the right-hand side is the marginal cost of performing one more task in-house. At the optimum, these two forces balance, just as Coase (1937) argued verbally: “a firm will tend to expand until the costs of organizing an extra transaction within the firm become equal to the costs of carrying out the same transaction by means of an exchange on the open market.” For more discussion (and a proof of the differentiability of pp^*), see Kikuchi et al. (2018).

Figure 8.11 shows equilibrium prices and allocations for two values of the transaction cost parameter, using the exponential cost function c(a)=eθa1c(a) = e^{\theta a} - 1 with θ=10\theta=10. The vertical lines are firm boundaries, computed via (8.104). When δ=1.02\delta = 1.02 (top panel), transaction costs are low and we see many small firms. When δ=1.2\delta = 1.2 (bottom panel), higher transaction costs encourage less use of the market and more in-house production, yielding fewer, larger firms.

Firm boundaries for \delta = 1.02 (top) and \delta = 1.2 (bottom)

Figure 8.11:Firm boundaries for δ=1.02\delta = 1.02 (top) and δ=1.2\delta = 1.2 (bottom)

8.3.1.2Optimality with Negative Discounting

Interestingly, we can write down a dynamic program such that the value function corresponds with the equilibrium price function defined above. The dynamic program involves negative discounting and is itself useful and informative. Some background discussion and motivation is provided in Section 8.4.

In the model, an agent is faced with a task of measure x^>0\hat x > 0. At time tt they have xtx_t units of the task remaining. They then take action at0a_t \geq 0 and incur loss c(at)c(a_t). The state moves to xtatx_t - a_t. We interpret ata_t as effort and c(at)c(a_t) as disutility. The optimization problem is

min{at}  t=0δtc(at)     s.t.     t=0at=x^.\min_{\{a_t\}} \; \sum_{t=0}^{\infty} \delta^t c(a_t) \;\; \text{ s.t. } \;\; \sum_{t=0}^{\infty} a_t = \hat x.

Throughout this section, we suppose that δ>1\delta > 1, that c(0)=0c(0) = 0, and that cc is continuously differentiable, strictly increasing and strictly convex. The convexity in cc encourages the agent to defer some effort. Negative discounting (δ>1\delta > 1) has the opposite effect: the agent wants to get the task “over and done with.” This trade-off determines the optimum.

We also assume that c(0)>0c'(0) > 0 and that there exists an η(0,x^)\eta \in (0, \hat x) satisfying

c(η)=δc(0).c'(\eta) = \delta c'(0).

Such an η\eta exists and is unique whenever x^\hat x is large enough, since cc' is continuous and strictly increasing with c(0)<δc(0)c'(0) < \delta c'(0).

Let VV be all increasing v ⁣:[0,x^]Rv \colon [0, \hat x] \to \RR with v(0)=0v(0)=0. The policy operators for this dynamic program take the form

(Tσv)(x)=c(xσ(x))+δv(σ(x))(vV,  0xx^).(T_\sigma \, v)(x) = c(x - \sigma(x)) + \delta v(\sigma(x)) \qquad (v \in V, \; 0 \leq x \leq \hat x).

Here a policy is a function σ ⁣:[0,x^][0,x^]\sigma \colon [0, \hat x] \to [0, \hat x] satisfying

  1. 0σ(x)x0 \leq \sigma(x) \leq x for all xx,

  2. σ\sigma is increasing,

  3. the effort level π(x)xσ(x)\pi(x) \coloneq x - \sigma(x) is increasing, and

  4. σ(x)=0\sigma(x) = 0 if and only if xηx \leq \eta.

Let Σ\Sigma be the set of all such policies.

Note that (ii) and (iii) together imply that σ\sigma is 1-Lipschitz, and hence continuous. Note also that, for each σΣ\sigma \in \Sigma, we have TσvVT_\sigma v \in V whenever σΣ\sigma \in \Sigma and vVv \in V. This follows from (Tσv)(0)=c(0)+δv(0)=0(T_\sigma v)(0) = c(0) + \delta v(0) = 0 and the fact that TσvT_\sigma v is increasing --- which is true because σ\sigma, π\pi, cc and vv are all increasing.

Let T\TT be the set of all policy operators. Each TσT_\sigma is an order-preserving self-map on VV, so (V,T)(V, \TT) is an ADP. The next exercise characterizes iterates of TσT_\sigma and can be solved by induction.

Somewhat surprisingly, each TσT_\sigma is well-behaved on VV. The next lemma gives details.

Let V0V_0 be all convex continuous vVv \in V satisfying c(0)xv(x)c(x)c'(0) x \leq v(x) \leq c(x) for all 0xx^0 \leq x \leq \hat x. The proofs of the next two lemmas are straightforward but lengthy, so we present them as exercises.

Here and below, the restriction 0a0 \leq a in the minimum is understood.

8.3.1.2.1Connection to the production chain.

Setting x^=1\hat x = 1, the Bellman operator in Theorem 8.3.6 coincides with the operator TT defined in (8.102), and V0=PV_0 = \pP. Lemma 8.3.5 then provides an alternative proof of Proposition 8.3.1: part (i) is the statement that TT maps V0=PV_0 = \pP into itself, part (ii) follows from uniqueness of vˉ\bar v in V0=PV_0 = \pP (with p=vˉ=vp^* = \bar v = \vmin), and part (iii) is the finite-step convergence TkpvˉT^k p \to \bar v for all pPp \in \pP. In particular, the equilibrium price function of the production chain coincides with the value function of the negative-discounting dynamic program.

8.3.2Optimal Harvests

In this section we examine a model of forestry management and optimal harvests. We set up the problem in Section 8.3.2.1 and then show in Section 8.3.2.2 how factorization reduces the dimensionality of the state space. Section 8.3.2.3 discusses when optimality for the subordinate ADP implies optimality for the primary.

8.3.2.1Setup

A manager controls a timber plantation with biomass sts_t at time tt. The unit price for timber at time tt is ptp_t. The manager observes (st,pt)(s_t, p_t) and decides whether to harvest or not. A decision to harvest generates revenue stpts_t p_t. If she chooses to wait, then time updates to the next period and the process repeats.

Biomass takes values in S\Ssf, a closed and bounded interval in R+\RR_+, and evolves according to st+1=q(st)s_{t+1} = q(s_t), where qq is a continuous self-map on S\Ssf. If q(0)>0q(0) > 0, then the plantation regenerates after each harvest. If not, the plantation never regenerates and the problem below is an optimal stopping problem.

We assume that the price sequence (pt)(p_t) is iid with distribution ϕ\phi on closed and bounded interval ER+\Esf \subset \RR_+. The cost of harvesting given biomass ss is m(s)m(s). The cost of maintaining the plantation for one period, rather than harvesting, is c(s)c(s). Both mm and cc are continuous real-valued functions on S\Ssf. The firm is risk neutral and discounts the future using discount factor β<1\beta < 1.

The state space for the model is S×E\Ssf \times \Esf. The Bellman equation can be expressed as

v(s,p)=max{psm(s)+βv(q(0),p)ϕ( ⁣dp),  c(s)+βv(q(s),p)ϕ( ⁣dp)}.  v(s, p) = \max \\ \left\{ p s - m(s) + \beta \int v(q(0), p') \phi(\diff p') ,\; - c(s) + \beta \int v(q(s), p') \phi(\diff p') \right\}. \;

Alternatively, we can write

(Tv)(s,p)=maxa{r(s,p,a)+βv[f(s,a),p]ϕ( ⁣dp)},(T v)(s, p) = \max_a \left\{ r(s, p, a) + \beta \int v[f(s, a), p'] \phi(\diff p') \right\},

where aa takes values in {0,1}\{0, 1\}, and

r(s,p,a)a(psm(s))(1a)c(s)andf(s,a)q[(1a)s].r(s, p, a) \coloneq a (p s - m(s)) - (1-a) c(s) \quad \text{and} \quad f(s, a) \coloneq q[(1 - a)s].

Biomass ss takes values in S\Ssf, while the price pp takes values in E\Esf. Both S\Ssf and E\Esf are closed and bounded intervals in R+\RR_+. The functions mm and cc are in bcSbc\Ssf, while qq is a continuous self-map on S\Ssf.

A feasible policy σ\sigma is a measurable map from S×E\Ssf \times \Esf to {0,1}\{0, 1\}, with 0 indicating the decision not to harvest and 1 indicating harvest. The policy operator corresponding to this model is

(Tσv)(s,p)rσ(s,p)+βv[f(s,σ(s,p)),p]ϕ( ⁣dp),(T_\sigma \, v)(s, p) \coloneq r_\sigma(s, p) + \beta \int v[f(s, \sigma(s, p)), p'] \phi(\diff p'),

where

rσ(s,p)=r(s,p,σ(s,p))(σΣ,  sS,  pE).r_\sigma(s, p) = r(s, p, \sigma(s, p)) \qquad (\sigma \in \Sigma, \; s \in \Ssf, \; p \in \Esf).

With Σ\Sigma as the set of all feasible policies, VV as the set of bounded measurable real-valued functions on S×E\Ssf \times \Esf, and T{Tσ:σΣ}\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}, the pair (V,T)(V, \TT) is an ADP with Bellman operator equal to (8.117).

8.3.2.2Factorization

We can reduce dimensionality for this ADP via a transformation. To construct it, we set V^\hat V to be the bounded measurable real-valued functions on S×{0,1}\Ssf \times \{0,1\}. If we set

(Fv)(s,a)v[f(s,a),p]ϕ( ⁣dp)(vV),(Gσw)(s,p)rσ(s,p)+βw(s,σ(s,p))(wV^),\begin{aligned} (Fv)(s, a) & \coloneq \int v[f(s, a), p'] \phi(\diff p') \qquad (v \in V), \\ (G_\sigma \, w)(s, p) & \coloneq r_\sigma(s, p) + \beta w(s, \sigma(s, p)) \qquad (w \in \hat V), \end{aligned}

and G{Gσ}σΣ\GG \coloneq \{G_\sigma\}_{\sigma \in \Sigma}, then

  1. (V,F,V^,G)(V, F, \hat V, \GG) is an order-preserving FDP and

  2. (V,T)(V, \TT) is the primary ADP.

For the subordinate ADP (V^,T^)(\hat V, \hat{\TT}), we get

(T^σw)(s,a)=(FGσw)(s,a)={rσ(f(s,a),p)+βw[f(s,a),σ(f(s,a),p)]}ϕ( ⁣dp).\begin{aligned} (\hat T_\sigma \, w)(s, a) & = (F G_\sigma \, w)(s, a) \\ & = \int \left\{ r_\sigma(f(s, a), p') + \beta w[f(s, a), \sigma(f(s, a), p')] \right\} \phi(\diff p'). \end{aligned}

Notice that the functions in VV are defined over points (s,p)S×E(s, p) \in \Ssf \times \Esf, while functions in V^\hat V are defined over points (s,a)S×{0,1}(s, a) \in \Ssf \times \{0, 1\}. The functions in the second set are typically much easier to work with (because E\Esf is larger than {0,1}\{0,1\}).

Let T^\hat T be the Bellman operator corresponding to the subordinate ADP (V^,T^)(\hat V, \hat{\TT}). In light of Theorem 5.2.13, we can compute an optimal policy for (V,T)(V, \TT) by

  1. computing the unique fixed point w\wopt of T^\hat T in V^\hat V and

  2. finding a policy σ\sigma obeying

σ(s,p)argmaxa{r(s,p,a)+βw(s,a)}((s,p)S×E).\sigma(s, p) \in \argmax_a \left\{ r(s, p, a) + \beta \wopt(s, a) \right\} \qquad ((s, p) \in \Ssf \times \Esf).

8.3.2.3The Converse Implication

Since the subordinate ADP (V^,T^)(\hat V, \hat{\TT}) has a lower-dimensional state space, it is natural to want to solve it first and transfer the resulting optimal policy back to the primary ADP (V,T)(V, \TT). As discussed in Section 5.2.2.4, however, this is not always valid: a policy that is optimal for the subordinate need not be optimal for the primary.

The reason can be understood as follows. The subordinate ADP evaluates a policy σ\sigma only at continuation states of the form (f(s,a),p)(f(s, a), p'), where pp' is drawn from ϕ\phi. In the harvest model, these biomass levels lie in q(S){q(0)}q(\Ssf) \cup \{q(0)\}. If the growth function qq does not map S\Ssf onto itself, then there exist biomass levels s0Ss_0 \in \Ssf that never arise as continuations. For example, if qq maps S=[0,10]\Ssf = [0, 10] into [2,8][2, 8], then biomass s0=1s_0 = 1 is a valid initial state but can never occur after the first period. The subordinate ADP never evaluates σ\sigma at such states, so a subordinate-optimal policy may make an arbitrarily poor harvest decision there. The primary ADP, however, evaluates σ\sigma at every state (s,p)S×E(s, p) \in \Ssf \times \Esf and would detect the suboptimal choice.

The additional condition needed for the converse is that FF be strictly order preserving (see Proposition 5.2.14). In this model, sufficient conditions include: ϕ\phi has a positive density on E\Esf, qq maps S\Ssf onto itself, and we restrict attention to continuous functions. Under these conditions, any strict difference between two value functions at some (s0,p0)(s_0, p_0) propagates through FF to a strict difference in the subordinate value space, closing the gap.

8.3.3Euler Equation Methods

The Euler equation is a first-order condition for optimality that can be used for analysis and computation. Euler equations are available in a range of settings, including but not limited to household savings problems and optimal growth problems. Here we will work in the context of a smooth optimal growth model with iid shocks. Admittedly, this version of the model is too simple to be useful for serious economic analysis. At the same time it provides a convenient vehicle for our exploration of Euler equations and related topics. Most of the ideas discussed here carry over to other settings where Euler equations can be obtained.

We begin with the model and collect basic optimality results. We then derive the envelope condition and use it to obtain the Euler equation, first in sequential form and then as a functional equation on policies. The functional form leads to the Coleman--Reffett operator KK, whose fixed points are solutions to the Euler equation. Time iteration---iterating with KK---provides a method for computing the optimal policy directly, without first solving for the value function. We also discuss the endogenous grid method, which accelerates time iteration by avoiding a nonlinear root-finding step. In Section 8.3.4 we establish that VFI and time iteration are equivalent, in the sense that TT and KK are topologically conjugate.

8.3.3.1A Growth Model

We consider an optimal growth model where shocks are multiplicative and iid, with income evolving according to

yt+1=f(ytct)ξt+1,y_{t+1} = f(y_t - c_t) \xi_{t+1},

where ff is a production function. The Bellman equation is

v(y)=max0cy{u(c)+βv(f(yc)z)ϕ(z) ⁣dz}v(y) = \max_{0 \leq c \leq y} \left\{ u(c) + \beta \int v(f(y - c) z) \phi(z) \diff z \right\}

for every yR+y \in \RR_+, where uu is a utility function and ϕ\phi is the density of the shock process.

We work in the following environment:

We can set this up as an RDP with X=A=R+\Xsf = \Asf = \RR_+ by defining

The next lemma shows that (Γ,V,B)(\Gamma, V, B) has strong continuity properties.

8.3.3.2Envelope Theorems

We will make use of a differential characterization of greedy policies that is closely connected to the Euler equation and will be useful in what follows. A proof can be found in Section 12.1 of Stachurski (2022).

One interesting aspect of Proposition 8.3.9 is that vv does not have to be differentiable. Hence, the Bellman operator is smoothing, in the sense that images of some nonsmooth functions are smooth.

Here’s an important corollary:

Corollary 8.3.10 has been presented in many forms in the economics literature and (8.130) is often called the envelope condition.

8.3.3.3The Sequential Euler Equation

In the present setting, the Euler equation takes the form

u(ct)=βEt[u(ct+1)f(ytct)ξt+1]u'(c_t) = \beta \, \EE_t \left[ u'(c_{t+1}) f'(y_t - c_t) \xi_{t+1} \right]

We refer to (8.131) as the sequential Euler equation because it restricts the endogenous sequence (ct)(c_t).

The left-hand side is the marginal utility of consuming one additional unit today. The right-hand side is the expected marginal benefit of saving that unit instead: one unit of savings yields a gross return of f(ytct)ξt+1f'(y_t - c_t) \xi_{t+1} in the next period, which is then valued at marginal utility u(ct+1)u'(c_{t+1}) and discounted by β\beta. At the optimum, these two quantities are equalized---any deviation from (8.131) would allow the agent to improve lifetime utility by shifting consumption between periods.

The Euler equation is typically understood as a necessary condition for optimality of a consumption-savings path. However, when applied to policies rather than consumption paths, it turns out to be sufficient as well. Moreover, studying it leads to new insights on optimal behavior and computational methods.

To investigate these ideas, let’s shift to a policy-based perspective, where the Euler equation becomes a functional restriction on policies. Set

ΣC all continuous, strictly increasing σΣ satisfying 0<σ(y)<y.\Sigma_{\cC} \coloneq \text{ all continuous, strictly increasing } \sigma \in \Sigma \text{ satisfying } 0 < \sigma(y) < y \text{.}

By continuity, each σ\sigma in ΣC\Sigma_{\cC} satisfies σ(0)=0\sigma(0) = 0. In what follows, we will say that σΣC\sigma \in \Sigma_{\cC} satisfies the Euler equation if

(uσ)(y)=β(uσ)(f(yσ(y))z)f(yσ(y))zϕ( ⁣dz)     for all y>0.(u'\circ \sigma)(y) = \beta \int (u'\circ \sigma)(f(y - \sigma(y)) z) f'(y - \sigma(y)) z \phi(\diff z) \;\; \text{ for all } y > 0.

To solve the functional Euler equation, we convert it into a fixed point problem. Consider the operator KK from ΣC\Sigma_{\cC} to itself defined as follows: for each σΣC\sigma \in \Sigma_{\cC} and each y>0y > 0, the value

Kσ(y) the c in (0,y) that solves   u(c)=β(uσ)(f(yc)z)f(yc)zϕ( ⁣dz).K\sigma(y) \coloneq \text{ the } c \text{ in } (0, y) \text{ that solves } \; u'(c) = \beta \int (u'\circ \sigma)(f(y - c) z) f'(y - c) z \phi(\diff z).

We call KK the Coleman--Reffett operator. It is well defined since, for any σΣC\sigma \in \Sigma_{\cC}, the map cu(c)c \mapsto u'(c) is continuous and strictly decreasing on (0,y)(0, y) with u(c)+u'(c) \to +\infty as c0c \downarrow 0, while the integral term is continuous and strictly increasing in cc on (0,y)(0, y) and diverges to ++\infty as cyc \uparrow y. It follows that there exists exactly one solution cc in the interval (0,y)(0, y).

It is immediate from the definition that σ\sigma in ΣC\Sigma_{\cC} is a fixed point of KK if and only if it satisfies the Euler equation. Thus, the Coleman--Reffett operator plays the same role for the optimal policy that the Bellman operator plays for the value function.

8.3.3.4Time Iteration

Time iteration means computing the optimal policy by iterating with KK: starting from an initial guess σ0ΣC\sigma_0 \in \Sigma_{\cC}, we generate the sequence (σk)(\sigma_k) defined by σk+1=Kσk\sigma_{k+1} = K \sigma_k. In Section 8.3.4 we will show that this sequence converges to the optimal policy σ\sigopt, the unique fixed point of KK in ΣC\Sigma_{\cC}. The proof uses the fact that KK and the Bellman operator TT are topologically conjugate, so the iterates of KK converge whenever VFI converges, and at the same rate.

Figure 8.12 illustrates time iteration for the growth model in Section 8.3.3.1 with log utility u(c)=lncu(c) = \ln c, Cobb--Douglas production f(k)=kαf(k) = k^\alpha, and lognormal shocks. This specification does not satisfy all of the conditions in Assumption 8.3.1 (for example, uu is not bounded), but it admits a closed-form optimal policy σ(y)=(1αβ)y\sigopt(y) = (1 - \alpha \beta) y, which allows us to measure the accuracy of the algorithm directly. The left panel shows the first few iterates of KK starting from σ0(y)=y\sigma_0(y) = y (consume everything), which converge visibly toward σ\sigopt. The right panel plots the sup-norm error Knσ0σ\| K^n \sigma_0 - \sigopt \|, confirming rapid convergence.

Iterates of the Coleman--Reffett operator K, starting from \sigma_0(y) = y, with \alpha = 0.4 and \beta = 0.96.

Figure 8.12:Iterates of the Coleman--Reffett operator KK, starting from σ0(y)=y\sigma_0(y) = y, with α=0.4\alpha = 0.4 and β=0.96\beta = 0.96.

In practice, time iteration is often more efficient than VFI because policy functions typically have less curvature than value functions, making them easier to approximate on a grid. However, each iterate of KK requires solving a nonlinear equation at every grid point, which can be costly. The endogenous grid method, discussed next, eliminates this root-finding step.

8.3.3.5The Endogenous Grid Method

In the standard implementation of time iteration, we fix a grid of income values {yi}\{y_i\} and, for each yiy_i, solve the nonlinear equation

u(c)=β(uσ)(f(yic)z)f(yic)zϕ( ⁣dz)u'(c) = \beta \int (u'\circ \sigma)(f(y_i - c) z) f'(y_i - c) z \, \phi(\diff z)

for cc using a root-finding algorithm. This is the most expensive step in each iteration.

The endogenous grid method (EGM), introduced by Carroll (2006), avoids root-finding by reversing the logic: instead of fixing income yy and solving for consumption cc, we fix savings s=ycs = y - c and solve for cc directly. Specifically, given the current policy σΣC\sigma \in \Sigma_{\cC} and a fixed grid of savings values {sj}\{s_j\}, we compute

cj=(u)1(β(uσ)(f(sj)z)f(sj)zϕ( ⁣dz))c_j = (u')^{-1} \left( \beta \int (u'\circ \sigma)(f(s_j) z) \, f'(s_j) \, z \, \phi(\diff z) \right)

and then set yj=cj+sjy_j = c_j + s_j. Each pair (yj,cj)(y_j, c_j) satisfies the Euler equation by construction. The updated policy KσK\sigma is then reconstructed from the pairs {(yj,cj)}\{(y_j, c_j)\} by interpolation.

The key advantage is that (8.138) requires only evaluation of (u)1(u')^{-1}, which is available in closed form for standard utility functions (e.g., (u)1(p)=p1/γ(u')^{-1}(p) = p^{-1/\gamma} for CRRA utility). This replaces the iterative root-finding step with a single function evaluation at each grid point, leading to substantial speed gains.

The name “endogenous grid method” reflects the fact that the income grid {yj}\{y_j\} is not fixed in advance but is determined endogenously by the savings grid and the current policy.

8.3.4Equivalence of VFI and Time Iteration

We now show that value function iteration and time iteration are, in a precise sense, equivalent. To do so, we construct a bijection MM between value functions and policies under which the Bellman operator TT and the Coleman--Reffett operator KK are conjugate. We then upgrade this conjugacy to a topological conjugacy, which allows us to transfer the known convergence properties of TT to KK.

8.3.4.1Conjugacy of KK and TT

The key step is to build a bijection MM between the space of value functions and the space of policies that intertwines TT and KK. The envelope condition (Proposition 8.3.9) provides the natural link: given a value function vv, the map MM recovers the corresponding greedy policy by inverting the marginal utility.

Throughout this section Assumption 8.3.1 is in force.

First let VCV_{\cC} be all strictly concave, continuously differentiable vv mapping R+\RR_+ to itself and satisfying v(0)=0v(0) = 0 and v(y)>u(y)v'(y) > u'(y) whenever y>0y > 0. For vVCv \in V_{\cC} let MvMv be defined by

(Mv)(y)=(u)1(v(y))(M v)(y) = (u')^{-1}(v' (y))

when y>0y> 0 and (Mv)(0)=0(M v)(0) =0. For this mapping, we have the following result:

The significance of MM is that the systems (VC,T)(V_{\cC}, T) and (ΣC,K)(\Sigma_{\cC}, K) are conjugate under this mapping:

This shows that VFI and time iteration are essentially isomorphic---they are topologically conjugate dynamical systems with identical long-run behavior. The practical benefits of time iteration stem from its numerical advantages: it operates directly in policy space and, combined with methods such as EGM, can avoid costly root-finding steps.

8.3.4.2From Conjugacy to Topological Conjugacy

Suppose we can show that the bijection MM is also continuous as a map from VCV_{\cC} to ΣC\Sigma_{\cC} and that its inverse is likewise continuous. Then (8.142) implies that (VC,T)(V_{\cC}, T) and (ΣC,K)(\Sigma_{\cC}, K) are topologically conjugate (see Section 5.1.1.2). This will be informative about (ΣC,K)(\Sigma_{\cC}, K) because topologically conjugate dynamical systems have essentially identical properties---and we already have a significant amount of information about the trajectories of TT.

The way we will show that MM and its inverse are continuous is to cook up a metric on ΣC\Sigma_{\cC} that essentially guarantees this result. The metric in question is

ρ(σ1,σ2)=d(M1σ1,M1σ2)M1σ1M1σ2\rho(\sigma_1, \sigma_2) = d_\infty( M^{-1} \sigma_1 , M^{-1} \sigma_2 ) \coloneq \| M^{-1} \sigma_1 - M^{-1} \sigma_2 \|_\infty

Now we can state the following result, which infers properties about the time iteration and its fixed point from corresponding properties of the Bellman operator.

8.4Chapter Notes

The job search model in Section 8.1.1 is due to McCall (1970). Mortensen (1986) provides an extensive survey of the job search literature and its connections to labor market analysis. A finite-state treatment of the McCall model and related problems can be found in Sargent & Stachurski (2025). The job search model with learning in Section 8.2.3.1 is closely related to Rothschild (1974), who studied optimal search when the distribution of offers is unknown. Esponda & Pouzo (2021) provide an equilibrium condition for agents controlling MDPs under Bayesian learning.

The Kreps--Porteus certainty equivalent used in Section 8.2.2 originates from Kreps & Porteus (1978); see also Epstein & Zin (1989) for a continuous-time formulation and further analysis. The job search model with separation in Section 8.2.4 extends a finite-state treatment from Chapter 3 of Sargent & Stachurski (2025).

The form of nonlinear discounting analyzed in Section 8.2.1 was introduced by Jaśkiewicz et al. (2014) and Bäuerle et al. (2021). As mentioned, one motivation is magnitude effects, under which discount rates decrease with the size of the reward. For evidence in favor of such effects, see, for example, Green et al. (1997).

The Coase-meets-Bellman production chain model in Section 8.3.1.1 is based on the work of Kikuchi et al. (2018). The negative discount optimality problem constructed in Section 8.3.1.2 is based on Kikuchi et al. (2021). Negative discount rates seem to arise naturally in some settings. Thaler (1981), Loewenstein & Prelec (1991) and Loewenstein & Sicherman (1991) document separate instances of such phenomena. For example, Loewenstein & Sicherman (1991) found that the majority of surveyed workers reported a preference for increasing wage profiles over decreasing ones, even when it was pointed out that the latter could be used to construct a dominating consumption sequence. Loewenstein & Prelec (1991) obtained similar results, stating that “sequences of outcomes that decline in value are greatly disliked, indicating a negative rate of time preference” Loewenstein & Prelec (1991).

The optimal harvest model in Section 8.3.2 illustrates how factored dynamic programs can reduce dimensionality in models with iid shocks.

The stochastic optimal growth model described in Section 8.3.3.1 was first set out and analyzed in Brock & Mirman (1972). Stokey & Lucas (1989) provides a textbook treatment of the dynamic programming theory for optimal growth, including the envelope theorem and Euler equation methods. The envelope condition in Section 8.3.3.2 is a version of the classic result of Benveniste & Scheinkman (1979), who established differentiability of the value function at interior optima; see also Milgrom & Segal (2002) for extensions to arbitrary choice sets. The Coleman--Reffett operator in Section 8.3.3.3 is named after Coleman (1990), who introduced policy-function iteration as an alternative to value function iteration for solving the stochastic growth model, and Coleman (1991), who extended the method to distorted economies. Datta et al. (2002) generalized the approach using lattice-theoretic methods.

The endogenous grid method was introduced by Carroll (2006). Extensions to handle non-convex and discrete--continuous choice problems were developed by Fella (2014) and Iskhakov et al. (2017), respectively. The dramatic speed improvements offered by EGM have made it a standard component of the structural estimation toolkit for life-cycle and consumer-choice models.

The topological conjugacy approach used in Section 8.3.4 to establish equivalence of VFI and time iteration draws on ideas developed in Sargent & Stachurski (2025).

Footnotes
  1. For example, if p1p \equiv 1, then Tnp=δn1T^n p = \delta^n 1, which diverges to ++\infty.

References
  1. McCall, J. J. (1970). Economics of Information and Job Search. The Quarterly Journal of Economics, 84(1), 113–126.
  2. Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
  3. Green, L., Myerson, J., & McFadden, E. (1997). Rate of temporal discounting decreases with amount of reward. Memory & Cognition, 25, 715–723.
  4. Ljungqvist, L., & Sargent, T. J. (2018). Recursive macroeconomic theory (4th ed.). MIT press.
  5. Coase, R. H. (1937). The nature of the firm. Economica, 4(16), 386–405.
  6. Williamson, O. E. (1979). Transaction-cost economics: the governance of contractual relations. Journal of Law and Economics, 22(2), 233–261.
  7. North, D. C. (1993). Institutions, transaction costs and productivity in the long run. In Washington University in St. Louis [Techreport].
  8. Blume, L. E., Easley, D., Kleinberg, J., & Tardos, E. (2009). Trading networks with price-setting agents. Games and Economic Behavior, 67(1), 36–50.
  9. Kikuchi, T., Nishimura, K., & Stachurski, J. (2018). Span of control, transaction costs and the structure of production chains. Theoretical Economics, 13(2), 729–760.
  10. Stachurski, J. (2022). Economic dynamics: theory and computation (2nd ed.). MIT Press.
  11. Carroll, C. D. (2006). The method of endogenous gridpoints for solving dynamic stochastic optimization problems. Economics Letters, 91(3), 312–320.
  12. Mortensen, D. T. (1986). Job search and labor market analysis. Handbook of Labor Economics, 2, 849–919.
  13. Rothschild, M. (1974). Searching for the Lowest Price When the Distribution of Prices Is Unknown. Journal of Political Economy, 82(4), 689–711.
  14. Esponda, I., & Pouzo, D. (2021). Equilibrium in misspecified Markov decision processes. Theoretical Economics, 16(2), 717–757.
  15. Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica: Journal of the Econometric Society, 185–200.