We formulate a basic iid job search model as an ADP and establish optimality results. We then reduce dimensionality via continuation values and study parametric monotonicity of the reservation wage. Finally, we extend the model to allow correlated wage draws with a Markov structure.
We begin with the job search problem of McCall (1970), a finite state version of which was discussed at length in Chapter 1 of Sargent & Stachurski (2025). Here we consider a general state version and allow wages and rewards to be unbounded.
We describe the model, construct the associated ADP on L1(ϕ), and verify that it is well-posed and regular. We also consider smaller value spaces that can be used when the offer distribution has bounded support. We then establish the fundamental optimality properties and convergence of the standard algorithms.
In each period, an unemployed worker receives a wage offer Wt, drawn from some known distribution ϕ. The worker can accept the current offer or wait until the following period and consider a new offer.
Let L1(ϕ):=L1(W,B,ϕ) be all Borel measurable f:W→R with ∫∣f∣dϕ<∞. As usual, functions equal ϕ-almost everywhere are identified and f⩽g means that {f>g} has measure zero under ϕ. Let Σ be all Borel measurable σ:W→{0,1}. Each such σ can be understood as a policy, mapping states to actions: If σ(w)=1, then the unemployed worker stops and accepts current offer w. If σ(w)=0, she continues.
Consider first a two-period problem. In period zero, the worker can either accept observed wage offer w0∼ϕ or continue to the next period, receiving unemployment compensation c and random payoff v(W1). The offer W1 is drawn from ϕ and v is a given “terminal reward” function. Under policy σ, which maps the wage offer w0 into an accept/reject decision, the expected present value of her payoff is
Fixing σ∈Σ, let vσ(w) be the lifetime value of following policy σ given initial wage offer w. By analogy with (8.1), we expect vσ to obey the recursion
vσ(w)=σ(w)1−βw+(1−σ(w))[c+β∫vσ(w′)ϕ(dw′)]for all w∈W.
Compared to (8.1), we have taken the value of stopping from (8.2) and also replaced the terminal value function v on the right-hand side of (8.1) with vσ. This is because we now work with an infinite horizon, so that (8.3) becomes a recursion in vσ.
Continuing to hold σ fixed, we introduce the policy operator v↦Tσv via
Since L1(ϕ) is closed under linear operations, policies are Borel measurable, and Assumption 8.1.1 is in force, we have Tσv∈L1(ϕ) whenever v∈L1(ϕ). Clearly Tσ is order preserving on (L1(ϕ),⩽). Hence, with T:={Tσ:σ∈Σ}, the pair (L1(ϕ),T) is an ADP. By construction, any fixed point of Tσ solves (8.3), so each such fixed point vσ has the interpretation of assigning lifetime values to states under σ.
This policy tells the worker to stop when the payoff from stopping is larger than the expected payoff from continuing, assuming that v is used to value future states.
Since σ in (8.6) is well-defined at any v∈L1(ϕ), and also Borel measurable, the ADP (L1(ϕ),T) is regular.
The expression for T in (8.7) aligns with the job search Bellman operator from Chapter 1 of Sargent & Stachurski (2025), after replacing the finite expectation with an integral.
In the preceding analysis we used L1(ϕ) for the value space because ϕ is allowed to have unbounded support. Since w can be arbitrarily large, this implies that the function Tσv in (8.4) is unbounded. The set L1(ϕ) can handle unbounded functions. To complete this section, the next exercise looks at settings where ϕ has bounded support and considers how we might exploit this by selecting a smaller value space.
Let’s return now to the general setting of Assumption 8.1.1, where W⊂R+ can be unbounded, and use L1(ϕ) for the value space. We consider optimality properties and convergence of algorithms for the job search ADP (L1(ϕ),T).
Since the fundamental optimality properties hold, the value function v∗ is a fixed point of the Bellman operator T and a policy σ is optimal if and only if it is v∗-greedy, which is to say that
The term on the right-hand side is called the reservation wage. This representation of optimal behavior is convenient because the reservation wage provides a scalar summary of the solution to the problem.
In the iid case, the Bellman equation can be reduced to a scalar fixed point problem in a single continuation value. We derive this reduction, connect it to the theory of factored DPs, and use it to study how the reservation wage varies with model parameters.
We can use h to eliminate the function v from (8.11). To do so we insert h on the right-hand side, replace w with w′ in (8.11), take expectations, multiply by β and add c to obtain
This is a nonlinear equation in h, the solution of which, henceforth denoted h∗, is the optimal continuation value of our problem. Obtaining h∗ allows us to solve the dynamic programming problem, since any policy σ satisfying
It is constructed such that any solution to (8.13) is a fixed point of g and vice versa.
The result of Exercise 8.6 implies that g has a unique fixed point h∗ in R+, which is the optimal continuation value. Figure 8.1 shows the function g when lnWt=μ+sZt for standard normal Zt, while β=0.9 and c=1.0. The integral in (8.16) is computed by Monte Carlo. The unique fixed point is h∗.
One obvious advantage of the formulation of the problem in Section 8.1.2.1 is that, instead of searching for a value function v∗ in the infinite-dimensional space L1(ϕ), we only need to solve for the fixed point of g in the one-dimensional space R+. The next exercise further reduces the search space to a bounded interval in R+.
Figure 8.2 shows the reservation wage, computed by iterating on g to obtain (an approximation to) h∗ and then calculating w∗ via (8.15). In the computation, c and the distribution ϕ are as for the last figure, while β ranges from 0.9 to 0.99.
As an exercise, let’s connect the transformation discussed in Section 8.1.2.1 to the theory of FDPs in Chapter 5. For this discussion we adopt the environment of Section 8.1.1.3 and set
V=L1(ϕ),
V^=R+,
F:V→V^ with Fv=c+β∫v(w′)ϕ(dw′), and
Gσ:V^→V with (Gσh)(w)=σ(w)(w/(1−β))+(1−σ(w))h.
Clearly, given h∈V^, we can attain the bound Gτh⩽Gσh for all τ∈Σ by setting
On inspection, we see that the fixed point of T^ is a solution to (8.13). Thus, the subordinate ADP (V^,T^) represents the continuation value problem from Section 8.1.2.1.
These observations formalize the ideas expressed in Section 8.1.2.1. For example, Theorem 5.2.13 tells us that a policy σ∈Σ will be optimal for the job search problem when h∗ is a fixed point of T^ and σ obeys Gσh∗=Gh∗. (Here G is the supremum ⋁σGσ.) In view of (8.18), such a σ can be found by setting
How does the solution to the job search problem vary with parameters? In terms of monotonicity, one way to answer this is to appeal to Proposition A.5.20 on page . Since g is an increasing contraction mapping on R+, this proposition implies that any parameter that shifts up the function g in (8.16) pointwise on R+ also shifts its fixed point up.
Figure 8.2 suggests that w∗ is also increasing in β. Since w∗=(1−β)h∗, we cannot infer this directly from the fact that h∗ is increasing in β. Instead, we take the fixed point equation for h∗ in (8.13) and substitute w=(1−β)h, which uses the definition of the reservation wage from (8.15), to obtain a new fixed point equation f(w)=w where
How do shifts in the wage offer distribution affect the reservation wage? One observation is that a shift to a “more favorable” wage distribution should increase the reservation wage, since an agent who continues can expect better offers.
A more interesting monotonicity result for this model concerns the volatility of the wage process and its impact on the reservation wage. For this problem, greater volatility encourages patience because the option value of waiting is larger. The next exercise asks you to verify this, using the concept of mean-preserving spreads (page ).
Figure 8.3 shows the output of HPI under a range of parameter values. First we generate a stochastic matrix P for wage offers via Tauchen’s method, discretizing the AR1 process Wt+1=ρWt+νξt+1 where (ξt) is iid and standard normal. We set β=0.99, ρ=0.9, ν=0.2 and n=500. In the left subfigure we plot v∗, computed by HPI, as well as the exit option e(w)=w/(1−β) and the reservation wage, which is wˉ:=min{w∈W:σ∗(w)=1} when σ∗ is v∗-greedy. The reservation wage is the minimum wage offer at which the unemployed worker accepts.
In the right subfigure we vary the volatility parameter ν over 0.1 to 0.2 and plot wˉ as a function of ν, holding other parameters fixed. Notice that the reservation wage increases with wage offer volatility. The reason is that more volatility increases the upside of waiting, due to the possibility of high future offers. At the same time, downside risk is mitigated by the ability to reject a bad offer.
Figure 8.4 shows the first two iterates of HPI, OPI and VFI, as well as the value function v∗ and the shared initial condition v. Parameter values are the same as the left-hand subfigure in Figure 8.3. In the case of OPI, m is set to 10. We see that HPI converges faster than VFI in terms of reduced distance to the value function per iteration. The rate for OPI is between HPI and VFI.
Here μ,d∈R, σ,s are positive, and ρ∈(−1,1). The sequences (ζt)t⩾1 and (ϵt)t⩾1 are independent, iid, and standard normal. Thus, the persistent component exp(Zt) and the transient component are lognormal. The model is otherwise unchanged. The state becomes (w,z)∈(0,∞)×R and the Bellman equation is
Here z and the parameters are fixed and ϕ is the N(0,I) distribution on R2.
Rather than analyzing this model directly, we can first reduce dimensionality by transforming it via continuation values, analogous to the technique we used in Section 8.1.2. As a first step, let h(z) be the continuation value from (8.41):
(Notice that h is a function now, as opposed to the iid setting of (8.12). This is not surprising, since the current state can be used to predict future wages, which in turn determine future value.)
Given h, the Bellman equation can be written as v(w,z)=max{w/(1−β),h(z)}. Combining this with the definition of h, we see that
(Note the similarity with (8.13).) The function h is defined on all of R, since this is the domain of z. If we can obtain the solution h∗ to this functional equation, we can use it to act optimally via the policy
To formalize these ideas, we can construct an ADP such that the Bellman equation agrees with (8.44). To do so we take Σ to be all Borel measurable functions sending (w′,z′)∈(0,∞)×R to {0,1} and, for each σ∈Σ, we set
We take ϕ to be the stationary distribution of (Zt) and consider each T^σ as a mapping over all h∈L1(ϕ):=L1(R,B,ϕ). Let T^ be all such T^σ as σ ranges over Σ. The pair (L1(ϕ),T^) forms an ADP.
We can alternatively write T^σh as T^σh=mσ+Kσh, where
we have 0⩽Kσh⩽Kh. By Lemma A.5.32, the spectral radius of K equals β<1. Hence, by Theorem 4.1.8, the fundamental optimality properties hold, and VFI, OPI and HPI all converge.
Let’s now characterize the Bellman operator, which is defined on L1(ϕ) by T^h=⋁σT^σh.
Since L1(ϕ) is complete, Banach’s contraction mapping theorem implies that T^ has a unique fixed point h∗ in L1(ϕ).
In this section we extend the basic job search framework in several directions. Section 8.2.1 introduces nonlinear discounting, where the discount factor depends on the magnitude of continuation value. Section 8.2.2 treats nonlinear expectations via the Kreps--Porteus certainty equivalent. Section 8.2.3 considers job search with learning, where the offer distribution is unknown and the worker updates beliefs. Section 8.2.4 adds job separation risk.
Next we consider a setting where discounting is a nonlinear function of continuation value. One motivation for this generalized setup is magnitude effects, under which, for some individuals, discount rates seem to decrease with the size of the reward (i.e., large rewards are discounted less, so the discount factor is increasing in the size of the reward; see, e.g., Green et al. (1997)). Our aim is to resolve the job search problem under this new setup.
We suppose that wage offers are P-Markov on a Borel set W⊂R+ and the value of continuing is given by
where β:R+→R is a discount factor function. Given this nonlinear discounting formulation, the lifetime value of a constant wage stream that pays w is e(w), where e is a fixed point of the operator
(In the case where β(x)=βx for some fixed β∈(0,1), we get e(w)=w/(1−β), so we recover the standard constant discount case.) To simplify the analysis, we assume that W=[w1,w2] where 0<w1<w2. We also assume that 0<c<w1 and w2⩾1−b, so that the worst wage offer is better than unemployment compensation. For the discount factor function we set
Obtaining optimality results for this model is not entirely trivial because the policy and Bellman operators are not, in general, contractions. This is due to the fact that β can be steep close to zero, as shown in Figure 8.5. At the same time, β is concave, which gives us some hope that we can use the concavity-based fixed point and optimality results from Section 4.1.3.
Figure 8.5:The discount function β for different choices of λ and b=0.99
We begin by analyzing H in (8.55). As a first step, we set
In the last section we modified the job search model to include nonlinear discounting. Here we drop nonlinear discounting but assume instead that the job seeker uses a nonlinear expectation of future values. In particular,
The operator R is the Kreps--Porteus operator. The value γ parameterizes risk aversion for the unemployed worker with respect to intertemporal gambles. When γ=0, we recover the standard linear expectation; when γ>0, the agent is risk-averse; when γ<0, the agent is risk-loving. The constant β lies in (0,1). As before, e(w):=w/(1−β) is the stopping reward. (Stopped rewards are deterministic so e(w) is not affected by γ.)
The operator Tσ and the operator R act on the set
As in Section 8.2.1, V is understood as an order interval in bW and 0<c<w1<w2.
Letting T:={Tσ:σ∈Σ}, we aim to show that (V,T) is an ADP and, moreover, that the conditions of Theorem 4.1.11 hold. As a first step, we observe that, for fixed σ∈Σ and for sufficiently small positive ϵ,
Since R and hence Tσ is order preserving, these facts tell us that Tσ maps V to itself, so (V,T) is an ADP.
Combining the last two ϵ bounds with fact (ii) above, we see that the conditions of Theorem 4.1.11 are satisfied for every γ∈R∖{1}. As a result, the fundamental optimality properties hold and VFI, OPI, and HPI all converge.
Figure 8.6 shows the reservation wage wˉ as a function of γ. The figure was computed as follows. Fixing γ, we calculated v∗ via VFI, set
and then set wˉ:=min{w∈W:σ(w)=1}. Aside from γ, the parameters used in the calculations were the same as those given in Section 8.1.3.2. The figure shows that the reservation wage decreases with γ, as the job seeker becomes progressively more risk-averse. Increasing risk aversion means that gambles over future payoffs become less attractive, which favors stopping over continuing. This encourages the job seeker to lower the reservation wage.
Figure 8.6:The reservation wage as a function of γ
Next we consider a variation of the job search model from §6.6 of Ljungqvist & Sargent (2018). The framework is the iid setting of Section 8.1.1.1, apart from the fact that the wage offer distribution ϕ is unknown to the worker. Instead, the agent learns about ϕ by starting with a prior belief and then successively updating her beliefs based on observed wage offers. The learning process and hence the stopping problem is similar to the one we analyzed in Section 1.4. One difference is that we will use discounting, which simplifies optimality results. Another is that we will use a transformation to reduce dimensionality.
The structure of information is as follows: The worker knows there are two possible offer distributions, with densities f and g. At the start of time, nature selects ϕ to be either f or g, the wage distribution from which the entire sequence (Wt)t⩾0 will be drawn. This choice is not observed by the worker, who puts prior probability π0 on f being chosen. In other words, the worker’s initial guess of ϕ is π0f(w)+(1−π0)g(w). Beliefs subsequently update according to Bayes’ rule. Thus, the agent, having observed Wt+1, updates πt to πt+1 via
(Note the connection to the learning dynamics we obtained for the sequential analysis problem in (1.113).)
Using (8.66), we can formulate an ADP representation of the optimal stopping problem. Dropping time subscripts, let ϕπ:=πf+(1−π)g represent the estimate of the wage offer distribution given belief π and let
Rather than tackling (V,T) directly, we will introduce a variation with a lower dimensional state space. To begin, fix v∈V and let ω(π) be the corresponding reservation wage at belief state π, which is the wage level at which the worker is indifferent between accepting and rejecting. This value satisfies
When this fixed point is well-defined we call it the optimal reservation wage function. The value ω(π) will indicate the smallest wage offer at which the worker is willing to accept, given her current belief state π.
as shown in Figure 8.7. The other parameters are c=0.1 and β=0.95. Since T^ is a contraction of modulus β on V^, a unique solution ω∗ to the reservation wage functional equation exists in V^ and T^kω→ω∗ uniformly as k→∞, for any ω∈V^. Figure 8.8 shows the result of this iteration, the optimal reservation wage, as a function of π, the belief state.
Note that the optimal reservation wage function ω∗ in Figure 8.8 is increasing in π. This result seems reasonable: In Figure 8.7, the density f puts more mass on higher draws, so, as our belief shifts toward f, our reservation wage should increase. The next proposition gives conditions for such monotonicity.
We consider a version of the job search model from Section 8.1.3.1 where separation can occur. In particular, an existing match between worker and firm dissolves with probability α every period. Note that this discussion extends a treatment of a similar model in a finite-state setting from Chapter 3 of Sargent & Stachurski (2025).
To simplify the discussion, we assume that the set of possible wage offers W⊂R+ is bounded above by some positive constant M. The state space for the problem is X:={e,u}×W, with a typical element (s,w) denoting employment status s (here e means employed and u means unemployed), and current offer w. A policy is a Borel measurable map σ:W→{0,1}, where, as usual, σ(w)=0 means “reject the current offer” and σ(w)=1 means “accept.”
The wage offer sequence (Wt) is assumed to be P-Markov on W. The value space V will be all bounded and Borel measurable v:X→R and we endow V with the supremum norm and the pointwise partial order.
The right-hand side of the first expression is the current value of being employed with offer w in hand, given the continuation values embodied in v. The right-hand side of the second expression is the current value of being unemployed with offer w in hand, conditional on using policy σ.
We can solve this problem directly by setting up the corresponding ADP, but we can also start by simplifying the value space in a way we now describe. This will make the analysis easier and help with computation. The first step is to regard (8.86) as a fixed point problem, replacing (Tσv)(e,w) with v(e,w) on the left hand side and treating v(u,⋅) as given. Simple algebra then gives
We take bW as the value space and let T={Tσ:σ∈Σ}. As before, Σ is all Borel measurable maps from W to {0,1}. Recalling that W is bounded above, one can easily confirm that Tσ maps bW to itself. Clearly Tσ is order preserving. Hence (bW,T) is an ADP.
Given vu∈bW, set ve=h+γPvu and consider the policy σ defined by
The expression for σ in (8.91) is natural because it tells the worker to accept employment whenever its value is higher than the expected present value of continuing, given the continuation value for unemployment associated with vu.
Regarding optimality, we have the following result.
The value function vu∗ for an unemployed worker satisfies the recursion
where ve∗ is the value function for an employed worker, that is, the lifetime value of a worker who starts the period employed at wage w. Value function ve∗ satisfies
This equation states that value accruing to an employed worker is current wage plus the discounted expected value of being either employed or unemployed next period.
We claim that, when 0<α,β<1, the system (8.96)--(8.97) has a unique solution (ve∗,vu∗) in bW×bW.
In this section we present several additional applications of the theory developed in earlier chapters. Section 8.3.1 studies problems with negative discounting and connects them to Coase’s theory of the firm. Section 8.3.2 treats an optimal harvest problem and shows how factorization reduces dimensionality. Section 8.3.3 develops Euler equation methods for continuous choice problems, and Section 8.3.4 establishes the equivalence of value function iteration and time iteration.
In this section, we study a production chain model that can capture the key idea from Coase’s classic study on the nature of the firm Coase (1937). We then show how the equilibrium price function can also be recovered as the solution to a dynamic programming problem. The dynamic programming problem is itself interesting: it analyzes loss minimization with negative discounting, a scenario that appears to have significant empirical relevance. Moreover, negative discounting makes the application of dynamic programming nontrivial. We solve the problem using order stability arguments combined with the theory developed in Section 2.2.2.
Coase (1937) argues that firms have nontrivial size because of transaction costs associated with using the market. (One example is the cost of negotiating, drafting, monitoring and enforcing contracts with suppliers. Others include search frictions, transaction fees, taxes, and information costs --- see, e.g., Williamson (1979)North (1993)Blume et al. (2009)). Because of these costs, entrepreneurs and managers can sometimes coordinate production at a lower cost within the firm.
At the same time, a countervailing force, referred to by Coase (1937) as “diminishing returns to management,” prevents firms expanding without limit. Rising costs per task can be thought of as driven by the expanding informational requirements associated with larger planning problems, leading to progressively higher management costs, incentive problems and misallocation of resources. The size of firms is determined by the trade-off associated with these two forces (transaction costs and diminishing returns to management).
Here we discuss a model that captures these ideas. In the model, firms produce a unit of a final good via sequential completion of processing stages. Stages are indexed by t∈[0,1], with t=1 indicating that the good is complete. One allocation of tasks is illustrated in Figure 8.9. In this example, firm 1 sells one unit of the completed good to a final buyer. Firm 1 then contracts with firm 2 to purchase the partially completed good at stage t1, with the intention of implementing the remaining 1−t1 tasks in-house (i.e., processing from stage t1 to stage 1). Firm 2 repeats this procedure, forming a contract with firm 3 to purchase the good at stage t2. Firm 3 decides to complete the chain, selecting t3=0. The value ti is called the upstream boundary of firm i.
Figure 8.9:Recursive allocation of production tasks
Production now unfolds from upstream to downstream. First, firm 3 completes processing stages from t3=0 up to t2 and transfers the good to firm 2. Firm 2 then processes from t2 up to t1 and transfers the good to firm 1, who processes from t1 to 1 and delivers the completed good to the final buyer. In what follows, the length of the interval of stages carried out by firm i is denoted by ai and referred to as the range of tasks carried out by firm i. Figure 8.10 helps to clarify notation.
There is a countable infinity of ex ante identical firms and no fixed costs or barriers to entry. An allocation is a nonnegative sequence A={ai}i∈N. An allocation A defines a division of tasks across firms, with ai being the range of tasks implemented by the i-th firm. If ai=0 then firm i is understood to be inactive. An allocation A is called feasible if there exists some finite I with ∑i=1Iai=1. Feasibility means that the entire production process is completed by finitely many firms. Given a feasible allocation A, let {ti} represent the corresponding transaction stages, defined by t0=1 and ti=ti−1−ai. In particular, ti−1 is the downstream boundary of firm i and ti is its upstream boundary (as in Figure 8.10).
Firms face a price function p:[0,1]→R+, with p(t) indicating the price of the good at stage t. Since the i-th firm purchases the good at stage ti, sells it at stage ti−1, and undertakes the remaining ai tasks in-house, its total costs are its processing costs c(ai) plus gross input cost δp(ti). The term δ>1 represents transaction costs. Transaction costs are incurred only by the buyer (a simplifying assumption), so its profits are
Diminishing returns to management are implemented by assuming that c is increasing and strictly convex. We also assume that c is continuously differentiable, with c(0)=0 and c′(0)>0.
Condition (i) is a zero profit condition for suppliers of initial inputs (which are costless). Condition (ii) states that all active firms make zero profits. Condition (iii) ensures that no firm in the production chain has an incentive to deviate, and that no inactive firm can enter and extract positive profits.
To construct an equilibrium we introduce the operator T mapping p to Tp via
Here and below, the restriction 0⩽t in the minimum is understood. Since δ>1, the map T is not a contraction in any obvious metric, and Tnp diverges for many choices of p, even when continuous and bounded.[1] Nevertheless, there exists a domain on which T is well-behaved: the set of convex increasing continuous functions p:[0,1]→R such that c′(0)s⩽p(s)⩽c(s) for all 0⩽s⩽1. We denote this set of functions by P.
This result is proved in Kikuchi et al. (2018). In Section 8.3.1.2 below, we give an alternative proof by constructing a dynamic program whose value function coincides with p∗.
The significance of p∗ is that there exists an allocation A∗ such that (p∗,A∗) is an equilibrium for the production chain in the sense of Definition 8.3.1. To construct this allocation, we introduce the equilibrium choice function
By definition, t∗(s) is the cost minimizing upstream boundary for a firm contracted to deliver the good at stage s and facing the price function p∗. Since p∗ lies in P and since c is strictly convex, the minimizer t∗(s) exists and is uniquely defined.
We can use t∗ to construct an equilibrium allocation. The optimal upstream boundary of firm 1 is t∗(1). Hence firm 2’s optimal upstream boundary is t∗(t∗(1)). Continuing in this way produces the sequence {ti∗} defined by
The sequence ends when a firm chooses to complete all remaining tasks. We label this firm (and hence the number of firms in the chain) as n∗, defined by n∗:=inf{i∈N:ti∗=0}. The task allocation corresponding to (8.104) is given by ai∗:=ti−1∗−ti∗ for all i. The function p∗ is called the equilibrium price function and A∗ is called the equilibrium allocation. Kikuchi et al. (2018) prove the following result:
The idea of the proof is as follows. As a fixed point of T, the equilibrium price function satisfies
From this equation it is clear that p∗ satisfies part (iii) of Definition 8.3.1. Moreover, the equilibrium upstream boundary for firm i is the minimizer in (8.105) when s is its downstream boundary, so profits are zero for all incumbent firms. Hence part (ii) of Definition 8.3.1 is satisfied. Part (i) of the definition is immediate from the fact that p∗∈P, whence we obtain p∗(0)⩽c(0)=0. More details can be found in Kikuchi et al. (2018).
The first order condition for the minimization in (8.105) is
The left-hand side is the marginal cost of purchasing an additional unit of input through the market (including the transaction cost markup δ), while the right-hand side is the marginal cost of performing one more task in-house. At the optimum, these two forces balance, just as Coase (1937) argued verbally: “a firm will tend to expand until the costs of organizing an extra transaction within the firm become equal to the costs of carrying out the same transaction by means of an exchange on the open market.” For more discussion (and a proof of the differentiability of p∗), see Kikuchi et al. (2018).
Figure 8.11 shows equilibrium prices and allocations for two values of the transaction cost parameter, using the exponential cost function c(a)=eθa−1 with θ=10. The vertical lines are firm boundaries, computed via (8.104). When δ=1.02 (top panel), transaction costs are low and we see many small firms. When δ=1.2 (bottom panel), higher transaction costs encourage less use of the market and more in-house production, yielding fewer, larger firms.
Figure 8.11:Firm boundaries for δ=1.02 (top) and δ=1.2 (bottom)
Interestingly, we can write down a dynamic program such that the value function corresponds with the equilibrium price function defined above. The dynamic program involves negative discounting and is itself useful and informative. Some background discussion and motivation is provided in Section 8.4.
In the model, an agent is faced with a task of measure x^>0. At time t they have xt units of the task remaining. They then take action at⩾0 and incur loss c(at). The state moves to xt−at. We interpret at as effort and c(at) as disutility. The optimization problem is
Throughout this section, we suppose that δ>1, that c(0)=0, and that c is continuously differentiable, strictly increasing and strictly convex. The convexity in c encourages the agent to defer some effort. Negative discounting (δ>1) has the opposite effect: the agent wants to get the task “over and done with.” This trade-off determines the optimum.
We also assume that c′(0)>0 and that there exists an η∈(0,x^) satisfying
Here a policy is a function σ:[0,x^]→[0,x^] satisfying
0⩽σ(x)⩽x for all x,
σ is increasing,
the effort level π(x):=x−σ(x) is increasing, and
σ(x)=0 if and only if x⩽η.
Let Σ be the set of all such policies.
Note that (ii) and (iii) together imply that σ is 1-Lipschitz, and hence continuous. Note also that, for each σ∈Σ, we have Tσv∈V whenever σ∈Σ and v∈V. This follows from (Tσv)(0)=c(0)+δv(0)=0 and the fact that Tσv is increasing --- which is true because σ, π, c and v are all increasing.
Let T be the set of all policy operators. Each Tσ is an order-preserving self-map on V, so (V,T) is an ADP. The next exercise characterizes iterates of Tσ and can be solved by induction.
Somewhat surprisingly, each Tσ is well-behaved on V. The next lemma gives details.
Let V0 be all convex continuous v∈V satisfying c′(0)x⩽v(x)⩽c(x) for all 0⩽x⩽x^. The proofs of the next two lemmas are straightforward but lengthy, so we present them as exercises.
Here and below, the restriction 0⩽a in the minimum is understood.
Setting x^=1, the Bellman operator in Theorem 8.3.6 coincides with the operator T defined in (8.102), and V0=P. Lemma 8.3.5 then provides an alternative proof of Proposition 8.3.1: part (i) is the statement that T maps V0=P into itself, part (ii) follows from uniqueness of vˉ in V0=P (with p∗=vˉ=v▽∗), and part (iii) is the finite-step convergence Tkp→vˉ for all p∈P. In particular, the equilibrium price function of the production chain coincides with the value function of the negative-discounting dynamic program.
In this section we examine a model of forestry management and optimal harvests. We set up the problem in Section 8.3.2.1 and then show in Section 8.3.2.2 how factorization reduces the dimensionality of the state space. Section 8.3.2.3 discusses when optimality for the subordinate ADP implies optimality for the primary.
A manager controls a timber plantation with biomass st at time t. The unit price for timber at time t is pt. The manager observes (st,pt) and decides whether to harvest or not. A decision to harvest generates revenue stpt. If she chooses to wait, then time updates to the next period and the process repeats.
Biomass takes values in S, a closed and bounded interval in R+, and evolves according to st+1=q(st), where q is a continuous self-map on S. If q(0)>0, then the plantation regenerates after each harvest. If not, the plantation never regenerates and the problem below is an optimal stopping problem.
We assume that the price sequence (pt) is iid with distribution ϕ on closed and bounded interval E⊂R+. The cost of harvesting given biomass s is m(s). The cost of maintaining the plantation for one period, rather than harvesting, is c(s). Both m and c are continuous real-valued functions on S. The firm is risk neutral and discounts the future using discount factor β<1.
The state space for the model is S×E. The Bellman equation can be expressed as
Biomass s takes values in S, while the price p takes values in E. Both S and E are closed and bounded intervals in R+. The functions m and c are in bcS, while q is a continuous self-map on S.
A feasible policy σ is a measurable map from S×E to {0,1}, with 0 indicating the decision not to harvest and 1 indicating harvest. The policy operator corresponding to this model is
With Σ as the set of all feasible policies, V as the set of bounded measurable real-valued functions on S×E, and T:={Tσ:σ∈Σ}, the pair (V,T) is an ADP with Bellman operator equal to (8.117).
We can reduce dimensionality for this ADP via a transformation. To construct it, we set V^ to be the bounded measurable real-valued functions on S×{0,1}. If we set
Notice that the functions in V are defined over points (s,p)∈S×E, while functions in V^ are defined over points (s,a)∈S×{0,1}. The functions in the second set are typically much easier to work with (because E is larger than {0,1}).
Let T^ be the Bellman operator corresponding to the subordinate ADP (V^,T^). In light of Theorem 5.2.13, we can compute an optimal policy for (V,T) by
computing the unique fixed point w∗ of T^ in V^ and
Since the subordinate ADP (V^,T^) has a lower-dimensional state space, it is natural to want to solve it first and transfer the resulting optimal policy back to the primary ADP (V,T). As discussed in Section 5.2.2.4, however, this is not always valid: a policy that is optimal for the subordinate need not be optimal for the primary.
The reason can be understood as follows. The subordinate ADP evaluates a policy σ only at continuation states of the form (f(s,a),p′), where p′ is drawn from ϕ. In the harvest model, these biomass levels lie in q(S)∪{q(0)}. If the growth function q does not map S onto itself, then there exist biomass levels s0∈S that never arise as continuations. For example, if q maps S=[0,10] into [2,8], then biomass s0=1 is a valid initial state but can never occur after the first period. The subordinate ADP never evaluates σ at such states, so a subordinate-optimal policy may make an arbitrarily poor harvest decision there. The primary ADP, however, evaluates σ at every state (s,p)∈S×E and would detect the suboptimal choice.
The additional condition needed for the converse is that F be strictly order preserving (see Proposition 5.2.14). In this model, sufficient conditions include: ϕ has a positive density on E, q maps S onto itself, and we restrict attention to continuous functions. Under these conditions, any strict difference between two value functions at some (s0,p0) propagates through F to a strict difference in the subordinate value space, closing the gap.
The Euler equation is a first-order condition for optimality that can be used for analysis and computation. Euler equations are available in a range of settings, including but not limited to household savings problems and optimal growth problems. Here we will work in the context of a smooth optimal growth model with iid shocks. Admittedly, this version of the model is too simple to be useful for serious economic analysis. At the same time it provides a convenient vehicle for our exploration of Euler equations and related topics. Most of the ideas discussed here carry over to other settings where Euler equations can be obtained.
We begin with the model and collect basic optimality results. We then derive the envelope condition and use it to obtain the Euler equation, first in sequential form and then as a functional equation on policies. The functional form leads to the Coleman--Reffett operator K, whose fixed points are solutions to the Euler equation. Time iteration---iterating with K---provides a method for computing the optimal policy directly, without first solving for the value function. We also discuss the endogenous grid method, which accelerates time iteration by avoiding a nonlinear root-finding step. In Section 8.3.4 we establish that VFI and time iteration are equivalent, in the sense that T and K are topologically conjugate.
We will make use of a differential characterization of greedy policies that is closely connected to the Euler equation and will be useful in what follows. A proof can be found in Section 12.1 of Stachurski (2022).
One interesting aspect of Proposition 8.3.9 is that v does not have to be differentiable. Hence, the Bellman operator is smoothing, in the sense that images of some nonsmooth functions are smooth.
Here’s an important corollary:
Corollary 8.3.10 has been presented in many forms in the economics literature and (8.130) is often called the envelope condition.
We refer to (8.131) as the sequential Euler equation because it restricts the endogenous sequence (ct).
The left-hand side is the marginal utility of consuming one additional unit today. The right-hand side is the expected marginal benefit of saving that unit instead: one unit of savings yields a gross return of f′(yt−ct)ξt+1 in the next period, which is then valued at marginal utility u′(ct+1) and discounted by β. At the optimum, these two quantities are equalized---any deviation from (8.131) would allow the agent to improve lifetime utility by shifting consumption between periods.
The Euler equation is typically understood as a necessary condition for optimality of a consumption-savings path. However, when applied to policies rather than consumption paths, it turns out to be sufficient as well. Moreover, studying it leads to new insights on optimal behavior and computational methods.
To investigate these ideas, let’s shift to a policy-based perspective, where the Euler equation becomes a functional restriction on policies. Set
ΣC:= all continuous, strictly increasing σ∈Σ satisfying 0<σ(y)<y.
To solve the functional Euler equation, we convert it into a fixed point problem. Consider the operator K from ΣC to itself defined as follows: for each σ∈ΣC and each y>0, the value
Kσ(y):= the c in (0,y) that solves u′(c)=β∫(u′∘σ)(f(y−c)z)f′(y−c)zϕ(dz).
We call K the Coleman--Reffett operator. It is well defined since, for any σ∈ΣC, the map c↦u′(c) is continuous and strictly decreasing on (0,y) with u′(c)→+∞ as c↓0, while the integral term is continuous and strictly increasing in c on (0,y) and diverges to +∞ as c↑y. It follows that there exists exactly one solution c in the interval (0,y).
It is immediate from the definition that σ in ΣC is a fixed point of K if and only if it satisfies the Euler equation. Thus, the Coleman--Reffett operator plays the same role for the optimal policy that the Bellman operator plays for the value function.
Time iteration means computing the optimal policy by iterating with K: starting from an initial guess σ0∈ΣC, we generate the sequence (σk) defined by σk+1=Kσk. In Section 8.3.4 we will show that this sequence converges to the optimal policy σ∗, the unique fixed point of K in ΣC. The proof uses the fact that K and the Bellman operator T are topologically conjugate, so the iterates of K converge whenever VFI converges, and at the same rate.
Figure 8.12 illustrates time iteration for the growth model in Section 8.3.3.1 with log utility u(c)=lnc, Cobb--Douglas production f(k)=kα, and lognormal shocks. This specification does not satisfy all of the conditions in Assumption 8.3.1 (for example, u is not bounded), but it admits a closed-form optimal policy σ∗(y)=(1−αβ)y, which allows us to measure the accuracy of the algorithm directly. The left panel shows the first few iterates of K starting from σ0(y)=y (consume everything), which converge visibly toward σ∗. The right panel plots the sup-norm error ∥Knσ0−σ∗∥, confirming rapid convergence.
Figure 8.12:Iterates of the Coleman--Reffett operator K, starting from σ0(y)=y, with α=0.4 and β=0.96.
In practice, time iteration is often more efficient than VFI because policy functions typically have less curvature than value functions, making them easier to approximate on a grid. However, each iterate of K requires solving a nonlinear equation at every grid point, which can be costly. The endogenous grid method, discussed next, eliminates this root-finding step.
for c using a root-finding algorithm. This is the most expensive step in each iteration.
The endogenous grid method (EGM), introduced by Carroll (2006), avoids root-finding by reversing the logic: instead of fixing income y and solving for consumption c, we fix savings s=y−c and solve for c directly. Specifically, given the current policy σ∈ΣC and a fixed grid of savings values {sj}, we compute
and then set yj=cj+sj. Each pair (yj,cj) satisfies the Euler equation by construction. The updated policy Kσ is then reconstructed from the pairs {(yj,cj)} by interpolation.
The key advantage is that (8.138) requires only evaluation of (u′)−1, which is available in closed form for standard utility functions (e.g., (u′)−1(p)=p−1/γ for CRRA utility). This replaces the iterative root-finding step with a single function evaluation at each grid point, leading to substantial speed gains.
The name “endogenous grid method” reflects the fact that the income grid {yj} is not fixed in advance but is determined endogenously by the savings grid and the current policy.
We now show that value function iteration and time iteration are, in a precise sense, equivalent. To do so, we construct a bijection M between value functions and policies under which the Bellman operator T and the Coleman--Reffett operator K are conjugate. We then upgrade this conjugacy to a topological conjugacy, which allows us to transfer the known convergence properties of T to K.
The key step is to build a bijection M between the space of value functions and the space of policies that intertwines T and K. The envelope condition (Proposition 8.3.9) provides the natural link: given a value function v, the map M recovers the corresponding greedy policy by inverting the marginal utility.
First let VC be all strictly concave, continuously differentiable v mapping R+ to itself and satisfying v(0)=0 and v′(y)>u′(y) whenever y>0. For v∈VC let Mv be defined by
when y>0 and (Mv)(0)=0. For this mapping, we have the following result:
The significance of M is that the systems (VC,T) and (ΣC,K) are conjugate under this mapping:
This shows that VFI and time iteration are essentially isomorphic---they are topologically conjugate dynamical systems with identical long-run behavior. The practical benefits of time iteration stem from its numerical advantages: it operates directly in policy space and, combined with methods such as EGM, can avoid costly root-finding steps.
Suppose we can show that the bijection M is also continuous as a map from VC to ΣC and that its inverse is likewise continuous. Then (8.142) implies that (VC,T) and (ΣC,K) are topologically conjugate (see Section 5.1.1.2). This will be informative about (ΣC,K) because topologically conjugate dynamical systems have essentially identical properties---and we already have a significant amount of information about the trajectories of T.
The way we will show that M and its inverse are continuous is to cook up a metric on ΣC that essentially guarantees this result. The metric in question is
Now we can state the following result, which infers properties about the time iteration and its fixed point from corresponding properties of the Bellman operator.
The job search model in Section 8.1.1 is due to McCall (1970). Mortensen (1986) provides an extensive survey of the job search literature and its connections to labor market analysis. A finite-state treatment of the McCall model and related problems can be found in Sargent & Stachurski (2025). The job search model with learning in Section 8.2.3.1 is closely related to Rothschild (1974), who studied optimal search when the distribution of offers is unknown. Esponda & Pouzo (2021) provide an equilibrium condition for agents controlling MDPs under Bayesian learning.
The Kreps--Porteus certainty equivalent used in Section 8.2.2 originates from Kreps & Porteus (1978); see also Epstein & Zin (1989) for a continuous-time formulation and further analysis. The job search model with separation in Section 8.2.4 extends a finite-state treatment from Chapter 3 of Sargent & Stachurski (2025).
The form of nonlinear discounting analyzed in Section 8.2.1 was introduced by Jaśkiewicz et al. (2014) and Bäuerle et al. (2021). As mentioned, one motivation is magnitude effects, under which discount rates decrease with the size of the reward. For evidence in favor of such effects, see, for example, Green et al. (1997).
The Coase-meets-Bellman production chain model in Section 8.3.1.1 is based on the work of Kikuchi et al. (2018). The negative discount optimality problem constructed in Section 8.3.1.2 is based on Kikuchi et al. (2021). Negative discount rates seem to arise naturally in some settings. Thaler (1981), Loewenstein & Prelec (1991) and Loewenstein & Sicherman (1991) document separate instances of such phenomena. For example, Loewenstein & Sicherman (1991) found that the majority of surveyed workers reported a preference for increasing wage profiles over decreasing ones, even when it was pointed out that the latter could be used to construct a dominating consumption sequence. Loewenstein & Prelec (1991) obtained similar results, stating that “sequences of outcomes that decline in value are greatly disliked, indicating a negative rate of time preference” Loewenstein & Prelec (1991).
The optimal harvest model in Section 8.3.2 illustrates how factored dynamic programs can reduce dimensionality in models with iid shocks.
The stochastic optimal growth model described in Section 8.3.3.1 was first set out and analyzed in Brock & Mirman (1972). Stokey & Lucas (1989) provides a textbook treatment of the dynamic programming theory for optimal growth, including the envelope theorem and Euler equation methods. The envelope condition in Section 8.3.3.2 is a version of the classic result of Benveniste & Scheinkman (1979), who established differentiability of the value function at interior optima; see also Milgrom & Segal (2002) for extensions to arbitrary choice sets. The Coleman--Reffett operator in Section 8.3.3.3 is named after Coleman (1990), who introduced policy-function iteration as an alternative to value function iteration for solving the stochastic growth model, and Coleman (1991), who extended the method to distorted economies. Datta et al. (2002) generalized the approach using lattice-theoretic methods.
The endogenous grid method was introduced by Carroll (2006). Extensions to handle non-convex and discrete--continuous choice problems were developed by Fella (2014) and Iskhakov et al. (2017), respectively. The dramatic speed improvements offered by EGM have made it a standard component of the structural estimation toolkit for life-cycle and consumer-choice models.
The topological conjugacy approach used in Section 8.3.4 to establish equivalence of VFI and time iteration draws on ideas developed in Sargent & Stachurski (2025).
McCall, J. J. (1970). Economics of Information and Job Search. The Quarterly Journal of Economics, 84(1), 113–126.
Sargent, T. J., & Stachurski, J. (2025). Dynamic Programming: Finite States. Cambridge University Press.
Green, L., Myerson, J., & McFadden, E. (1997). Rate of temporal discounting decreases with amount of reward. Memory & Cognition, 25, 715–723.
Ljungqvist, L., & Sargent, T. J. (2018). Recursive macroeconomic theory (4th ed.). MIT press.
Coase, R. H. (1937). The nature of the firm. Economica, 4(16), 386–405.
Williamson, O. E. (1979). Transaction-cost economics: the governance of contractual relations. Journal of Law and Economics, 22(2), 233–261.
North, D. C. (1993). Institutions, transaction costs and productivity in the long run. In Washington University in St. Louis [Techreport].
Blume, L. E., Easley, D., Kleinberg, J., & Tardos, E. (2009). Trading networks with price-setting agents. Games and Economic Behavior, 67(1), 36–50.
Kikuchi, T., Nishimura, K., & Stachurski, J. (2018). Span of control, transaction costs and the structure of production chains. Theoretical Economics, 13(2), 729–760.
Stachurski, J. (2022). Economic dynamics: theory and computation (2nd ed.). MIT Press.
Carroll, C. D. (2006). The method of endogenous gridpoints for solving dynamic stochastic optimization problems. Economics Letters, 91(3), 312–320.
Mortensen, D. T. (1986). Job search and labor market analysis. Handbook of Labor Economics, 2, 849–919.
Rothschild, M. (1974). Searching for the Lowest Price When the Distribution of Prices Is Unknown. Journal of Political Economy, 82(4), 689–711.
Esponda, I., & Pouzo, D. (2021). Equilibrium in misspecified Markov decision processes. Theoretical Economics, 16(2), 717–757.
Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica: Journal of the Econometric Society, 185–200.