We study problems of maximizing lifetime rewards in settings in which decision-makers face risks. The job search model studied in Chapter 1 and Chapter 3 is one example. Others include an entrepreneur who decides whether to exit or enter a market, a borrower who considers defaulting on a loan, a firm that contemplates introducing a new technology, or a portfolio manager deciding whether to exercise a real or financial option.
These can all be formulated as dynamic programming and have common features that facilitate sharp characterizations of optimality. They are all two-action (or binary choice) problems that provide good laboratories for studying some special dynamic programs in which recursive representations are particularly enlightening.
We begin with a standard theory of optimal stopping and then consider alternative approaches that feature continuation values and threshold policies. We aim to provide a rigorous discussion of optimality that refines our less formal analysis of job search in Section 1.3 and Section 3.3.1.
Let X be a finite set. Given X, an optimal stopping problem is a tuple S=(β,P,c,e) that consists of
a discount factor β∈(0,1),
a Markov operator P∈M(RX),
a continuation reward functionc∈RX, and
an exit reward functione∈RX.
Given a P-Markov chain (Xt)t⩾0, a decision-maker observes the state Xt in each period and decides whether to continue or stop. If she chooses to stop, she receives final reward e(Xt) and the process terminates. If she decides to continue, then she receives c(Xt) and the process repeats next period. Lifetime rewards are
where Rt equals c(Xt) while the agent continues, e(Xt) when the agent stops, and zero thereafter.
Optimal decisions are described by a policy function, which is a map σ from X to {0,1}. After observing state x at any given time, the decision-maker takes action σ(x), where 0 means “continue” and 1 means “stop.” Implicit in this formulation is the assumption that the current state contains enough information for the agent to decide whether or not to stop.
Let Σ be the set of functions from X to {0,1}. Let vσ(x) denote the expected lifetime value of following policy σ now and in every future period, given optimal stopping problem S=(β,P,c,e) and current state x∈X. We call vσ the σ-value function. We also call vσ(x) the lifetime value of policy σ conditional on initial state x. Section Section 4.1.1.2, shows that vσ is well defined and describes how to calculate it. A policy σ∗∈Σ is called optimal for S if
Indeed, if σ(x)=1, then (4.3) states that vσ(x)=e(x), which is what we expect: if we choose to stop at a given state, then lifetime value from that state equals the exit reward. If, instead, σ(x)=0, then (4.3) becomes
which is also what we expect: the value of continuing is the current reward plus the discounted expected reward obtained by continuing with policy σ next period.
We want to solve (4.3) for vσ. To this end, we define rσ∈RX and Lσ∈L(RX) via
The significance of Proposition 4.1.1 is that by construction vσ is a fixed point of Tσ. By the contraction property in Proposition 4.1.1, vσ is the only fixed point of Tσ in RX and, moreover, iterates of Tσ always converge to vσ.
In the job search problem in Section 3.3.1, we argued that the value function equals the fixed point of the Bellman operator. Here we make the same argument more formally in the more general setting of optimal stopping.
First, given an optimal stopping problem S=(β,P,c,e) with σ-value functions {vσ}σ∈Σ, we define the value functionv∗ of S via
so that v∗(x) is the maximal lifetime value available to an agent facing current state x. Following notation in Section 2.2.2.1, we can also write v∗=∨σvσ.
Given that solving the maximization in (4.13) is, in general, a difficult problem, how can we obtain the value function? The following steps can do the job:
formulate a Bellman equation for the value function of the optimal stopping problem, namely,
A v-greedy policy uses v to assign values to states and then chooses to stop or continue based on the action that generates a higher payoff.
With this language in place, the next proposition makes precise our informal Paragraph argument that optimal choices can be made using the value function.
Proposition 4.1.3 is a version of Bellman’s principle of optimality. We shall prove this principle in a more general setting in Chapter 5.
The theory just presented tells us that successive approximation using the Bellman operator converges to v∗ and v∗-greedy policies are optimal. These facts make value function iteration (VFI) a natural algorithm for solving optimal stopping problems. (VFI for optimal stopping problems corresponds to VFI for job search, as shown.) Later, in Theorem 8.1.1, we will show that when the number of iterates is sufficiently large, VFI produces an optimal policy.
In Section 3.2.2.2 we discussed firm valuation using expected present value of the cash flow generated by profits. This is a standard approach. However, it ignores that firms have the option to cease operations and sell all remaining assets. In this section, we consider firm valuation in the presence of an exit option.
Consider a firm whose productivity is exogenous and evolves according to a Q-Markov chain (Zt) on finite set Z⊂R. Profits are given by πt=π(Zt) for some fixed π∈RZ. At the start of each period, the firm decides whether to remain in operation and receive current profit πt, or to exit and receive scrap value s>0 for sale of physical assets. Discounting is at fixed rate r and β:=1/(1+r). We assume that r>0.
Let Σ be all σ:Z→{0,1}. For given σ∈Σ and v∈RZ, the corresponding policy operator is
We saw in Section 4.1.1.2--Section 4.1.1.3 that Tσ has a unique fixed point vσ and that vσ(z) represents the value of following policy σ forever, conditional on Z0=z.
The Bellman operator for the firm’s problem is the order-preserving self-map T on RZ defined by
Let v∗ be the value function for this problem. By Proposition 4.1.2, v∗ is the unique fixed point of T in RZ and the unique solution to the Bellman equation. Moreover, successive approximation from any v∈RZ converges to v∗. Finally, by Proposition 4.1.3, a policy is optimal if and only if it is v∗-greedy.
Figure Figure 4.1 plots v∗, computed via VFI (i.e., successive approximation using T, along with the stopping value s and the continuation value function h∗=π+βQv∗, under the parameterization given in Listing 1. As implied by the Bellman equation, v∗ is the pointwise maximum of s and h∗. The v∗-greedy policy instruct the firm to exit when the continuation value of the firm falls below the scrap value.
Figure 4.1:Value function for firms with exit option
1
2
3
4
5
6
7
8
9
10
"Creates an instance of the firm exit model."
function create_exit_model(;
n=200, # productivity grid size
ρ=0.95, μ=0.1, ν=0.1, # persistence, mean and volatility
β=0.98, s=100.0 # discount factor and scrap value
)
mc = tauchen(n, ρ, ν, μ)
z_vals, Q = mc.state_values, mc.p
return (; n, z_vals, Q, β, s)
end
If we define w by w(z)=Ez∑t⩾0βtπt for all z∈Z, then w(z) is the value of the firm given Z0=z when the firm never exits so that w evaluates the firm according to expected present value of the profit stream. Figure Figure 4.2 shows the no-exit value w based on the parameterization in Listing 1.
Choosing never to exit is a feasible policy. Since v∗ involves maximization of firm value over the set of all feasible policies, it must be at least as large as the value of never exiting.
We study monotonicity in values and actions in the general optimal stopping problem described in Section 4.1.1, with X as the state space, e as the exit reward function, and c as the continuation reward function.
The optimal policy in the iid job search problem takes the form σ∗(w)=1{w⩾w∗} for all w∈W, where w∗:=(1−β)h∗ is the reservation wage and h∗ is the continuation value. This optimal policy is of threshold type: once the wage offer exceeds the threshold, the decision is to stop.
Since threshold policies are convenient, let us now try to characterize them.
Throughout this section, we take X to be a subset of R. Elements of X are ordered by ⩽, the usual order on R.
For a binary function on X⊂R, the condition that σ∗ is decreasing means that the decision-maker chooses to exit when x is sufficiently small.
In the settings of Exercise 4.9--Exercise 4.11, the optimal policy is either increasing or decreasing. Since X is totally ordered, monotonicity implies that a threshold policy is optimal. For example, if σ∗ is increasing, then we take x∗ to be the smallest x∈X such that σ∗(x)=1. For such an x∗ we have
In Section 1.3.2.2 we solved the job search problem with iid draws by computing the continuation value h∗ directly and then choosing the policy σ∗(w)=1{w/(1−β)⩾h∗}. We saw that this approach is more efficient than first computing the value function, since the continuation value is one-dimensional rather than ∣W∣-dimensional.
In Section 3.3.1.2, we tried the same approach for the job search problem with Markov state, where wage draws are correlated. We gathered fewer benefits from using the continuation value approach in that setting, since the continuation value function has the same dimensionality as the value function.
These observations motivate us to explore continuation value methods more carefully. In this section, we formulate a continuation value approach for the general optimal stopping problem and verify convergence. We will see that, while all relevant state components must be included in the value function, purely transitory components do not affect continuation values. Hence the continuation value approach is at least as efficient and sometimes substantially more so.
Another asymmetry between value functions and continuation value functions is that the latter are typically smoother. For example, in job search problems, the value function is usually kinked at the reservation wage, while the continuation value function is smooth. Greater smoothness comes from taking expectations over stochastic transitions: integration acts as a smoothing operation. Like lower dimensionality, increased smoothness facilitates analysis and computation.
Let h∗ be the continuation value function for the optimal stopping problem defined in (4.27). To compute h∗ directly we begin with the optimal stopping version of the Bellman equation evaluated at v∗ and rewrite it as
Taking expectations of both sides of the equation conditional on current state x produces ∑x′v∗(x′)P(x,x′)=∑x′max{e(x′),h∗(x′)}P(x,x′). Multiplying by β, adding c(x), and using the definition of h∗, we get
The beginning of Section 4.1.4 mentioned that switching from value function iteration to continuation value iteration can substantially reduce the dimensionality of the problem in some cases. Here we describe situations where this works.
To begin, let W and Z be two finite sets and suppose that ϕ∈D(W) and Q∈M(RZ). Let (Wt) be iid with distribution ϕ and let (Zt) be an Q-Markov chain on Z. If (Wt) and (Zt) are independent, then (Xt) defined by Xt=(Wt,Zt) is P-Markov on X, where
Since the right hand side depends on both w and z, the Bellman operator acts on an n-dimensional space, where n:=∣X∣=∣W∣×∣Z∣.
However, if we inspect the right hand side of (4.35), we see that the continuation value function depends only on z. Dependence on w vanishes because w does not help predict w′. Thus, the continuation value function is an object in ∣Z∣-dimensional space. The continuation value operator
Consider the firm valuation problem from Section 4.1.2 but suppose now that scrap value fluctuates with prices of underlying assets. For simplicity let’s assume that scrap value at each time t is given by the iid sequence (St), where each St has density ϕ on R+. The corresponding Bellman operator is
We can convert this problem to a finite-state space optimal stopping problem by discretizing the density ϕ onto a finite grid contained in R+. However, since continuation values depend only on z, a better approach is to switch to a continuation value operator.
We discussed American options briefly in Example 4.1.2. Here we investigate this class of derivatives more carefully. We focus on American call options that provide the right to buy a particular stock or bond at a fixed strike priceK at any time before a set expiration date. The market price of the asset at time t is denoted by St.
We discussed a case in which the expiration date is infinity in Example 4.1.2. However, options without termination dates -- also called perpetual options -- are rare in practice. Hence we focus on the finite-horizon case. We are interested in computing the expected value of holding the option when discounting with a fixed interest rate, a typical assumption when pricing American options.
Finite horizon American options can be priced by backward induction in an approach like the one we used for the finite horizon job search problem discussed in Chapter 1. Alternatively, we can embed finite horizon options into the theory of infinite-horizon optimal stopping. We use the second approach here, since we have just presented a theory for infinite-horizon optimal stopping.
To this end, we take T∈N to be a fixed integer indicating the date of expiration. The option is purchased at t=0 and can be exercised at any t∈N with t⩽T. To include t in the current state, we set
The idea is that time is updated via t′=m(t), so that time increments at each update until t=T+1. After that we hold t constant. Bounding time at T+1 keeps the state space finite.
We assume that a stock price St evolves according to
Here (Zt)t⩾0 is Q-Markov on finite set Z for some Q∈M(RZ) and W is also finite. This means that the share price has both persistent and transient stochastic components. If we set parameters so that (Zt)t⩾0 resembles a random walk, price changes will be difficult to predict.
To form a Section 4.1.1.1 optimal stopping problem, we must specify the state and clarify the P∈M(RX) that maps to the state process. We set the state space to X:=T×W×Z and
Thus, time updates deterministically via t′=m(t) and z′ and w′ are drawn independently from Q(z,⋅) and ϕ respectively.
As for a perpetual option, the continuation reward is zero and the discount factor is β:=1/(1+r), where r>0 is a fixed risk-free rate. The exit reward can be expressed as 1{t⩽T}(St−K) so that exercising at time t earns the owner St−K up to expiry and zero thereafter. In terms of the state (t,z), the exit reward is
where t′=m(t). This value function v(t,w,z) neatly captures the value of the option: It is the maximum of current exercise value and the discounted expected value of carrying the option over to the next period.
Since the problem just described is an optimal stopping problem in the sense of Section 4.1.1.1, all of the optimality results attained for that problem apply. In particular, iterates of the Bellman operator converge to the value function v∗ and, moreover, a policy is optimal if and only if it is v∗-greedy.
We can do better than VFI. Since (Wt)t⩾0 is iid and appears only in the exit reward, we can reduce dimensionality by switching to the continuation value operator, which, in this case, can be expressed as
As proved in Section 4.1.4, the unique fixed point of C is the continuation value function h∗, and Ckh→h∗ as k→∞ for all h∈RX. With the fixed point in hand, we can compute the optimal policy as
Here σ∗(t,w,z)=1 prescribes exercising the option at time t.
Figure Figure 4.3 provides a visual representation of optimal actions under the default parameterization described in Listing 2. Each of the three figures show contour lines of the net exit reward f(t,w,z):=e(t,w,z)−h∗(w,z), viewed as a function of (w,z), when t is held fixed. The date t for each subfigure is shown in the title. The optimal policy exercises the option when f(t,w,z)⩾0.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
using QuantEcon, LinearAlgebra, IterTools
"Creates an instance of the option model with log S_t = Z_t + W_t."
function create_american_option_model(;
n=100, μ=10.0, # Markov state grid size and mean value
ρ=0.98, ν=0.2, # persistence and volatility for Markov state
s=0.3, # volatility parameter for W_t
r=0.01, # interest rate
K=10.0, T=200) # strike price and expiration date
t_vals = collect(1:T+1)
mc = tauchen(n, ρ, ν)
z_vals, Q = mc.state_values .+ μ, mc.p
w_vals, φ, β = [-s, s], [0.5, 0.5], 1 / (1 + r)
e(t, i_w, i_z) = (t ≤ T) * (z_vals[i_z] + w_vals[i_w] - K)
return (; t_vals, z_vals, w_vals, Q, φ, T, β, K, e)
end
Program 2:Pricing and American option (american_option.jl)
In each subfigure, the exercise region, which is the set (w,z) such that f(t,w,z)⩾0, correspond to the northeast part of the figure, where w and z are both large. The boundary between exercise and continuing is the zero contour line, which is shown in black. Notice that the size of the exercise region expands with t. This is because the value of waiting decreases when the set of possible exercise dates declines.
Figure Figure 4.4 provides some simulations of the stock price process (St)t⩾0 over the lifetime of the option, again using the default parameterization described in Listing 2. The blue region in the top part of each subfigure contains values of the stock price St=Zt+Wt such that St⩾K. In this configuration in which the price of the underlying exceeds the strike price, the option is said to be “in the money.” The figure also shows an optimal exercise date that is the first t such that e(t,Wt,Zt)⩾h∗(Wt,Zt) in a simulation.
Figure 4.3:Exercise region for the American option
Figure 4.4:Simulations for the American option process
Consider a firm that engages in costly research and development (R&D) in order to develop a new product. The firm decides whether to continue developing the product before starting to market it or to stop developing and start marketing it. For simplicity, we assume that the value of bringing the product to market is a one-time lump sum payment πt=π(Xt), where (Xt) is a P-Markov chain on finite set X with P∈M(RX). The flow cost of investing in R&D is Ct per period, where (Ct) is a stochastic process. Future payoffs are discounted at rate r>0 and we set β:=1/(1+r).
As a first take on this problem, suppose that Ct≡c∈R+ for all t and formulate an optimal stopping problem with exit reward e=π and constant continuation reward −c. The Bellman equation is
Since (Ct) is iid, we would ideally like to integrate it out in the manner of Section 4.1.4.2, thereby lowering the dimensionality of the problem. However, note that the continuation value associated with (4.52) is
From Exercise 4.16, we see that (4.56) has a unique solution g∗ in RX that can be computed by successive approximation. With g∗ in hand, we can compute the optimal policy via
Various textbooks treat optimal stopping in depth, although most use continuous time. Peskir & Shiryaev (2006) and Shiryaev (2007) are good examples.
There are many applications of optimal stopping in economics and finance, with influential early research papers including McCall (1970), Jovanovic (1982), Hopenhayn (1992), and Ericson & Pakes (1995). Arellano (2008) considers borrowing on international financial markets with the option of sovereign default (see Section 8.2.1.5). Riedel (2009) studies optimal stopping under Knightian uncertainty. Fajgelbaum et al. (2017) include an optimal stopping problem for firms in a model of uncertainty traps.
The firm problem with optimal exit has been used to analyze firm dynamics and firm size distributions in equilibrium models with heterogeneous firms. Hopenhayn (1992) is the classic reference. Perla & Tonetti (2014) construct a growth model in which firms at the bottom of the productivity distribution imitate more productive firms. Carvalho & Grassi (2019) analyze business cycles in a setting of firm growth with exit and a Pareto distribution of firms.
Infinite duration American options are analyzed in Mordecki (2002). Practical methods for pricing American options are provided by Longstaff & Schwartz (2001), Rogers (2002), and Kohler et al. (2010).
Replacement problems are an important optimal stopping problem not treated in this chapter. An important early paper by Rust (1987) uses dynamic programming to find optimal replacement policies for of engine parts and goes on to fit the model to data. Section 5.3.1 discusses structural estimation in the style of Rust (1987) and others.
We are studying American options in discrete time. Options with discrete exercise times are sometimes called Bermudan options. References for the continuous-time case are provided in Section 4.3.
Actually, in most definitions, u is also restricted to be bounded and measurable, in order to ensure that the integrals are finite. These technicalities can be ignored in the exercise.
Peskir, G., & Shiryaev, A. (2006). Optimal Stopping and Free-boundary Problems. Springer Verlag.
Shiryaev, A. N. (2007). Optimal Stopping Rules (Vol. 8). Springer Science & Business Media.
McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics, 84(1), 113–126.
Jovanovic, B. (1982). Selection and the Evolution of Industry. Econometrica, 50(3), 649–670.
Hopenhayn, H. A. (1992). Entry, exit, and firm dynamics in long run equilibrium. Econometrica, 60, 1127–1150.
Ericson, R., & Pakes, A. (1995). Markov-perfect industry dynamics: A framework for empirical work. The Review of Economic Studies, 62(1), 53–82.
Arellano, C. (2008). Default risk and income fluctuations in emerging economies. American Economic Review, 98(3), 690–712.
Riedel, F. (2009). Optimal stopping with multiple priors. Econometrica, 77(3), 857–908.
Fajgelbaum, P. D., Schaal, E., & Taschereau-Dumouchel, M. (2017). Uncertainty traps. The Quarterly Journal of Economics, 132(4), 1641–1692.
Perla, J., & Tonetti, C. (2014). Equilibrium imitation and growth. Journal of Political Economy, 122(1), 52–76.
Carvalho, V. M., & Grassi, B. (2019). Large firm dynamics and the business cycle. American Economic Review, 109(4), 1375–1425.
Mordecki, E. (2002). Optimal stopping and perpetual options for Lévy processes. Finance and Stochastics, 6(4), 473–493.
Longstaff, F. A., & Schwartz, E. S. (2001). Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14(1), 113–147.
Rogers, L. C. (2002). Monte Carlo valuation of American options. Mathematical Finance, 12(3), 271–286.
Kohler, M., Krzyżak, A., & Todorovic, N. (2010). Pricing of high-dimensional American options by neural networks. Mathematical Finance, 20(3), 383–410.