Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

4 Optimal Stopping

Authors
Affiliations
New York University
Australian National University

We study problems of maximizing lifetime rewards in settings in which decision-makers face risks. The job search model studied in Chapter 1 and Chapter 3 is one example. Others include an entrepreneur who decides whether to exit or enter a market, a borrower who considers defaulting on a loan, a firm that contemplates introducing a new technology, or a portfolio manager deciding whether to exercise a real or financial option.

These can all be formulated as dynamic programming and have common features that facilitate sharp characterizations of optimality. They are all two-action (or binary choice) problems that provide good laboratories for studying some special dynamic programs in which recursive representations are particularly enlightening.

4.1Introduction to Optimal Stopping

We begin with a standard theory of optimal stopping and then consider alternative approaches that feature continuation values and threshold policies. We aim to provide a rigorous discussion of optimality that refines our less formal analysis of job search in Section 1.3 and Section 3.3.1.

4.1.1Theory

Our first step is to set out the fundamental theory of discrete time infinite-horizon optimal stopping problems.

4.1.1.1The Stopping Problem

Let X\Xsf be a finite set. Given X\Xsf, an optimal stopping problem is a tuple S=(β,P,c,e)\sS = (\beta, P, c, e) that consists of

  1. a discount factor β(0,1)\beta \in (0,1),

  2. a Markov operator PM(RX)P \in \mopx,

  3. a continuation reward function cRXc \in \RR^\Xsf, and

  4. an exit reward function eRXe \in \RR^\Xsf.

Given a PP-Markov chain (Xt)t0(X_t)_{t \geq 0}, a decision-maker observes the state XtX_t in each period and decides whether to continue or stop. If she chooses to stop, she receives final reward e(Xt)e(X_t) and the process terminates. If she decides to continue, then she receives c(Xt)c(X_t) and the process repeats next period. Lifetime rewards are

Et0βtRt,\EE \sum_{t \geq 0} \beta^t R_t,

where RtR_t equals c(Xt)c(X_t) while the agent continues, e(Xt)e(X_t) when the agent stops, and zero thereafter.

Optimal decisions are described by a policy function, which is a map σ\sigma from X\Xsf to {0,1}\{0,1\}. After observing state xx at any given time, the decision-maker takes action σ(x)\sigma(x), where 0 means “continue” and 1 means “stop.” Implicit in this formulation is the assumption that the current state contains enough information for the agent to decide whether or not to stop.

Let Σ\Sigma be the set of functions from X\Xsf to {0,1}\{0,1\}. Let vσ(x)v_\sigma(x) denote the expected lifetime value of following policy σ\sigma now and in every future period, given optimal stopping problem S=(β,P,c,e)\sS = (\beta, P, c, e) and current state xXx \in \Xsf. We call vσv_\sigma the σ\sigma-value function. We also call vσ(x)v_\sigma(x) the lifetime value of policy σ\sigma conditional on initial state xx. Section Section 4.1.1.2, shows that vσv_\sigma is well defined and describes how to calculate it. A policy σΣ\sigma^* \in \Sigma is called optimal for S\sS if

vσ(x)=maxσΣvσ(x)for all xX.v_{\sigma^*}(x) = \max_{\sigma \in \Sigma} v_\sigma(x) \quad \text{for all } x \in \Xsf.

4.1.1.2Lifetime Values

Fixing σΣ\sigma \in \Sigma, let us consider how to compute the lifetime value vσ(x)v_\sigma(x) of following σ\sigma conditional on X0=xX_0 = x. Evidently, vσv_\sigma satisfies

vσ(x)=σ(x)e(x)+(1σ(x))[c(x)+βxXvσ(x)P(x,x)]for all xX.v_\sigma(x) = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x' \in \Xsf} v_\sigma(x') P(x, x') \right] \quad \text{for all } x \in \Xsf .

Indeed, if σ(x)=1\sigma(x)=1, then (4.3) states that vσ(x)=e(x)v_\sigma(x) = e(x), which is what we expect: if we choose to stop at a given state, then lifetime value from that state equals the exit reward. If, instead, σ(x)=0\sigma(x)=0, then (4.3) becomes

vσ(x)=c(x)+βxvσ(x)P(x,x),v_\sigma(x) = c(x) + \beta \sum_{x'} v_\sigma(x') P(x, x') ,

which is also what we expect: the value of continuing is the current reward plus the discounted expected reward obtained by continuing with policy σ\sigma next period.

We want to solve (4.3) for vσv_\sigma. To this end, we define rσRXr_\sigma \in \RR^\Xsf and LσL(RX)L_\sigma \in \lopx via

rσ(x)σ(x)e(x)+(1σ(x))c(x)andLσ(x,x)β(1σ(x))P(x,x).r_\sigma(x) \coloneq \sigma(x) e(x) + (1-\sigma(x)) c(x) \quad \text{and} \quad L_\sigma(x, x') \coloneq \beta (1-\sigma(x)) P(x, x').

With this notation, we can write (4.3) pointwise as vσ=rσ+Lσvσv_\sigma = r_\sigma + L_\sigma \, v_\sigma. If ρ(Lσ)<1\rho(L_\sigma) < 1, then

vσ=(ILσ)1rσ.v_\sigma = (I - L_\sigma)^{-1} \, r_\sigma.

By Exercise 4.1 and the Neumann series lemma, vσv_\sigma is uniquely defined by (4.6).

4.1.1.3Policy Operators

For the proofs, it will be helpful to view vσv_\sigma as the fixed point of an operator. We associate each σΣ\sigma \in \Sigma with a policy operator TσT_\sigma defined at vRXv \in \RR^{\Xsf} by

(Tσv)(x)=σ(x)e(x)+(1σ(x))[c(x)+βxv(x)P(x,x)],(T_\sigma \, v)(x) = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x'} v(x') P(x, x') \right],

for each xXx \in \Xsf. With this notation, (4.3) can be written as vσ=Tσvσv_\sigma = T_\sigma \, v_\sigma.

Using the notation in Section 4.1.1.2, we can also define TσT_\sigma via

Tσv=rσ+Lσv.T_\sigma \, v = r_\sigma + L_\sigma \, v .

The significance of Proposition 4.1.1 is that by construction vσv_\sigma is a fixed point of TσT_\sigma. By the contraction property in Proposition 4.1.1, vσv_\sigma is the only fixed point of TσT_\sigma in RX\RR^\Xsf and, moreover, iterates of TσT_\sigma always converge to vσv_\sigma.

4.1.1.4The Value Function

In the job search problem in Section 3.3.1, we argued that the value function equals the fixed point of the Bellman operator. Here we make the same argument more formally in the more general setting of optimal stopping.

First, given an optimal stopping problem S=(β,P,c,e)\sS = (\beta, P, c, e) with σ\sigma-value functions {vσ}σΣ\{v_\sigma\}_{\sigma \in \Sigma}, we define the value function vv^* of S\sS via

v(x)maxσΣvσ(x)(xX),v^*(x) \coloneq \max_{\sigma \in \Sigma} v_\sigma(x) \qquad (x \in \Xsf),

so that v(x)v^*(x) is the maximal lifetime value available to an agent facing current state xx. Following notation in Section 2.2.2.1, we can also write v=σvσv^* = \vee_\sigma \, v_\sigma.

Given that solving the maximization in (4.13) is, in general, a difficult problem, how can we obtain the value function? The following steps can do the job:

  1. formulate a Bellman equation for the value function of the optimal stopping problem, namely,

v(x)=max{e(x),c(x)+βxv(x)P(x,x)}(xX),v(x) = \max \left\{ e(x), c(x) + \beta \sum_{x'} v(x') P(x, x') \right\} \qquad (x \in \Xsf),
  1. prove that this Bellman equation has a unique solution in RX\RR^\Xsf, and then

  2. show that this solution equals the value function, as defined in (4.13).

We shall complete these steps in Section 4.1.1.5.

4.1.1.5The Bellman Operator

Define the Bellman operator for the optimal stopping problem S=(β,P,c,e)\sS = (\beta, P, c, e) as

(Tv)(x)=max{e(x),c(x)+βxv(x)P(x,x)},(Tv)(x) = \max \left\{ e(x) ,\, c(x) + \beta \sum_{x'} v(x') P(x, x') \right\},

where xXx \in \Xsf and vRXv \in \RR^{\Xsf}. By construction, any fixed point of TT solves the Bellman equation and vice versa. Pointwise, we can express TT via Tv=e(c+βPv)Tv = e \vee (c + \beta Pv).

Our main result for this section is:

4.1.1.6Optimal Policies

Paralleling the definition provided in the discussion of job search (Section 1.3), for each vRXv \in \RR^\Xsf, we call σΣ\sigma \in \Sigma vv-greedy if, for all xXx \in \Xsf,

σ(x)argmaxa{0,1}{ae(x)+(1a)[c(x)+βxv(x)P(x,x)]}.\sigma(x) \in \argmax_{a \in \{0,1\}} \left\{ a e(x) + (1-a) \left[ c(x) + \beta \, \sum_{x'} v(x') P(x, x') \right] \right\}.

A vv-greedy policy uses vv to assign values to states and then chooses to stop or continue based on the action that generates a higher payoff.

With this language in place, the next proposition makes precise our informal Paragraph argument that optimal choices can be made using the value function.

Proposition 4.1.3 is a version of Bellman’s principle of optimality. We shall prove this principle in a more general setting in Chapter 5.

4.1.1.7Value Function Iteration

The theory just presented tells us that successive approximation using the Bellman operator converges to vv^* and vv^*-greedy policies are optimal. These facts make value function iteration (VFI) a natural algorithm for solving optimal stopping problems. (VFI for optimal stopping problems corresponds to VFI for job search, as shown.) Later, in Theorem 8.1.1, we will show that when the number of iterates is sufficiently large, VFI produces an optimal policy.

4.1.2Firm Valuation with Exit

In Section 3.2.2.2 we discussed firm valuation using expected present value of the cash flow generated by profits. This is a standard approach. However, it ignores that firms have the option to cease operations and sell all remaining assets. In this section, we consider firm valuation in the presence of an exit option.

4.1.2.1Optional Exit

Consider a firm whose productivity is exogenous and evolves according to a QQ-Markov chain (Zt)(Z_t) on finite set ZR\Zsf \subset \RR. Profits are given by πt=π(Zt)\pi_t = \pi(Z_t) for some fixed πRZ\pi \in \RR^\Zsf. At the start of each period, the firm decides whether to remain in operation and receive current profit πt\pi_t, or to exit and receive scrap value s>0s > 0 for sale of physical assets. Discounting is at fixed rate rr and β1/(1+r)\beta \coloneq 1/(1+r). We assume that r>0r > 0.

Let Σ\Sigma be all σ ⁣:Z{0,1}\sigma \colon \Zsf \to \{0,1\}. For given σΣ\sigma \in \Sigma and vRZv \in \RR^\Zsf, the corresponding policy operator is

(Tσv)(z)=σ(z)s+(1σ(z))[π(z)+βzv(z)Q(z,z)](zZ).(T_\sigma v)(z) = \sigma(z) s + (1-\sigma(z)) \left[ \pi(z) + \beta \sum_{z'} v(z') Q(z, z') \right] \qquad (z \in \Zsf).

We saw in Section 4.1.1.2--Section 4.1.1.3 that TσT_\sigma has a unique fixed point vσv_\sigma and that vσ(z)v_\sigma(z) represents the value of following policy σ\sigma forever, conditional on Z0=zZ_0 = z.

The Bellman operator for the firm’s problem is the order-preserving self-map TT on RZ\RR^\Zsf defined by

(Tv)(z)=max{s,π(z)+βzv(z)Q(z,z)}(zZ).(Tv)(z) = \max \left\{ s, \pi(z) + \beta \sum_{z'} v(z') Q(z, z') \right\} \quad (z \in \Zsf).

Pointwise, TT can be written as Tv=s(π+βQv)Tv = s \vee (\pi + \beta Q v).

Let vv^* be the value function for this problem. By Proposition 4.1.2, vv^* is the unique fixed point of TT in RZ\RR^\Zsf and the unique solution to the Bellman equation. Moreover, successive approximation from any vRZv \in \RR^\Zsf converges to vv^*. Finally, by Proposition 4.1.3, a policy is optimal if and only if it is vv^*-greedy.

Figure Figure 4.1 plots vv^*, computed via VFI (i.e., successive approximation using TT, along with the stopping value ss and the continuation value function h=π+βQvh^* = \pi + \beta Q v^*, under the parameterization given in Listing 1. As implied by the Bellman equation, vv^* is the pointwise maximum of ss and hh^*. The vv^*-greedy policy instruct the firm to exit when the continuation value of the firm falls below the scrap value.

Value function for firms with exit option

Figure 4.1:Value function for firms with exit option

1
2
3
4
5
6
7
8
9
10
"Creates an instance of the firm exit model."
function create_exit_model(;
        n=200,                  # productivity grid size
        ρ=0.95, μ=0.1, ν=0.1,   # persistence, mean and volatility
        β=0.98, s=100.0         # discount factor and scrap value
    )
    mc = tauchen(n, ρ, ν, μ)
    z_vals, Q = mc.state_values, mc.p
    return (; n, z_vals, Q, β, s)
end

Program 1:Firm exit model (firm_exit.jl)

4.1.2.2Exit Versus No-Exit

If we define ww by w(z)=Ezt0βtπtw(z) = \EE_z \sum_{t \geq 0} \beta^t \pi_t for all zZz \in \Zsf, then w(z)w(z) is the value of the firm given Z0=zZ_0 = z when the firm never exits so that ww evaluates the firm according to expected present value of the profit stream. Figure Figure 4.2 shows the no-exit value ww based on the parameterization in Listing 1.

Firm value with and without exit

Figure 4.2:Firm value with and without exit

In Figure Figure 4.2, we see that wvw \leq v^* on Z\Zsf. Let’s now prove that this is always true.

To show wvw \leq v^*, first observe that w=(IβQ)1πw = (I - \beta Q)^{-1} \pi, by β<1\beta < 1 and Lemma 3.2.1. Rearranging gives w=π+βQww = \pi + \beta Q w.

Now note that under the policy σ0\sigma \equiv 0, where the firm chooses never to exit, we have $T_\sigma v = \pi

Choosing never to exit is a feasible policy. Since vv^* involves maximization of firm value over the set of all feasible policies, it must be at least as large as the value of never exiting.

4.1.3Monotonicity

We study monotonicity in values and actions in the general optimal stopping problem described in Section 4.1.1, with X\Xsf as the state space, ee as the exit reward function, and cc as the continuation reward function.

4.1.3.1Monotone Values

Let vv^* be the value function of an optimal stopping problem defined by X\Xsf, PP, β\beta, cc and ee and define a continuation value function hh^*

h(x)c(x)+βxv(x)P(x,x)(xX).h^*(x) \coloneq c(x) + \beta \sum_{x'} v^*(x') P(x, x') \qquad (x \in \Xsf).

(The continuation reward function cc and the continuation value function hh^* are distinct objects.)

Let X\Xsf be partially ordered and let iRXi\RR^\Xsf be the increasing functions in RX\RR^\Xsf.

4.1.3.2Monotone Actions

The optimal policy in the iid job search problem takes the form σ(w)=1{ww}\sigma^*(w) = \1\{w \geq w^* \} for all wWw \in \Wsf, where w(1β)hw^* \coloneq (1-\beta) h^* is the reservation wage and hh^* is the continuation value. This optimal policy is of threshold type: once the wage offer exceeds the threshold, the decision is to stop.

Since threshold policies are convenient, let us now try to characterize them.

Throughout this section, we take X\Xsf to be a subset of R\RR. Elements of X\Xsf are ordered by \leq, the usual order on R\RR.

For a binary function on XR\Xsf \subset \RR, the condition that σ\sigma^* is decreasing means that the decision-maker chooses to exit when xx is sufficiently small.

In the settings of Exercise 4.9--Exercise 4.11, the optimal policy is either increasing or decreasing. Since X\Xsf is totally ordered, monotonicity implies that a threshold policy is optimal. For example, if σ\sigma^* is increasing, then we take xx^* to be the smallest xXx \in \Xsf such that σ(x)=1\sigma^*(x) = 1. For such an xx^* we have

x<x      σ(x)=0andxx      σ(x)=1.x < x^* \; \implies \sigma^*(x)=0 \quad \text{and} \quad x \geq x^* \; \implies \sigma^*(x)=1.

4.1.4Continuation Values

In Section 1.3.2.2 we solved the job search problem with iid draws by computing the continuation value hh^* directly and then choosing the policy σ(w)=1{w/(1β)h}\sigma^*(w) = \1 \left\{ w/(1-\beta) \geq h^* \right\}. We saw that this approach is more efficient than first computing the value function, since the continuation value is one-dimensional rather than W|\Wsf|-dimensional.

In Section 3.3.1.2, we tried the same approach for the job search problem with Markov state, where wage draws are correlated. We gathered fewer benefits from using the continuation value approach in that setting, since the continuation value function has the same dimensionality as the value function.

These observations motivate us to explore continuation value methods more carefully. In this section, we formulate a continuation value approach for the general optimal stopping problem and verify convergence. We will see that, while all relevant state components must be included in the value function, purely transitory components do not affect continuation values. Hence the continuation value approach is at least as efficient and sometimes substantially more so.

Another asymmetry between value functions and continuation value functions is that the latter are typically smoother. For example, in job search problems, the value function is usually kinked at the reservation wage, while the continuation value function is smooth. Greater smoothness comes from taking expectations over stochastic transitions: integration acts as a smoothing operation. Like lower dimensionality, increased smoothness facilitates analysis and computation.

4.1.4.1The Continuation Value Operator

Let hh^* be the continuation value function for the optimal stopping problem defined in (4.27). To compute hh^* directly we begin with the optimal stopping version of the Bellman equation evaluated at vv^* and rewrite it as

v(x)=max{e(x),h(x)}(xX).v^*(x') = \max \left\{ e(x'), h^*(x') \right\} \qquad (x' \in \Xsf).

Taking expectations of both sides of the equation conditional on current state xx produces xv(x)P(x,x)=xmax{e(x),h(x)}P(x,x)\sum_{x'} v^*(x') P(x, x') = \sum_{x'} \max \left\{ e(x'), h^*(x') \right\} P(x, x'). Multiplying by β\beta, adding c(x)c(x), and using the definition of hh^*, we get

h(x)=c(x)+βxmax{e(x),h(x)}P(x,x)(xX).h^*(x) = c(x) + \beta \sum_{x'} \max \left\{ e(x'), h^*(x') \right\} P(x, x') \qquad (x \in \Xsf).

This expression motivates us to introduce a continuation value operator C ⁣:RXRXC \colon \RR^\Xsf \to \RR^\Xsf via

(Ch)(x)=c(x)+βxmax{e(x),h(x)}P(x,x)(xX).(Ch)(x) = c(x) + \beta \sum_{x'} \max \left\{ e(x'), h(x') \right\} P(x, x') \qquad (x \in \Xsf).

Proposition 4.1.5 provides the following alternative method to compute the optimal policy that does not involve VFI:

  1. Use successive approximations to hh^* with CC and

  2. Calculate σ\sigma^* via σ(x)=1{e(x)h(x)}\sigma^*(x) = \1\{e(x) \geq h^*(x)\} for each xXx \in \Xsf.

In Section 4.1.4.2 we discuss settings where this approach is advantageous.

4.1.4.2Dimensionality Reduction

The beginning of Section 4.1.4 mentioned that switching from value function iteration to continuation value iteration can substantially reduce the dimensionality of the problem in some cases. Here we describe situations where this works.

To begin, let W\Wsf and Z\Zsf be two finite sets and suppose that ϕD(W)\phi \in \dD(\Wsf) and QM(RZ)Q \in \mopz. Let (Wt)(W_t) be iid with distribution ϕ\phi and let (Zt)(Z_t) be an QQ-Markov chain on Z\Zsf. If (Wt)(W_t) and (Zt)(Z_t) are independent, then (Xt)(X_t) defined by Xt=(Wt,Zt)X_t = (W_t, Z_t) is PP-Markov on X\Xsf, where

P(x,x)=P((w,z),(w,z))=ϕ(w)Q(z,z).P(x, x') = P((w, z), (w', z')) = \phi(w') Q(z, z').

Suppose that the continuation reward depends only on zz so that we can write the Bellman operator as

(Tv)(w,z)=max{e(w,z),c(z)+βwWzZv(w,z)ϕ(w)Q(z,z)}.(Tv)(w, z) = \max \left\{ e(w, z) ,\, c(z) + \beta \sum_{w' \in \Wsf}\sum_{z' \in \Zsf} v(w', z') \phi(w') Q(z, z') \right\}.

Since the right hand side depends on both ww and zz, the Bellman operator acts on an nn-dimensional space, where nX=W×Zn \coloneq |\Xsf| = |\Wsf| \times |\Zsf|.

However, if we inspect the right hand side of (4.35), we see that the continuation value function depends only on zz. Dependence on ww vanishes because ww does not help predict ww'. Thus, the continuation value function is an object in Z|\Zsf|-dimensional space. The continuation value operator

(Ch)(z)=c(z)+βwzmax{e(w,z),h(z)}ϕ(w)Q(z,z)(zZ)(Ch)(z) = c(z) + \beta \sum_{w'} \sum_{z'} \max \left\{ e(w', z'), h(z') \right\} \phi(w') Q(z, z') \qquad (z \in \Zsf)

acts on this lower dimensional-space.

More examples of dimensionality reduction are illustrated in the applications.

4.1.4.3Application to Firm Value

Consider the firm valuation problem from Section 4.1.2 but suppose now that scrap value fluctuates with prices of underlying assets. For simplicity let’s assume that scrap value at each time tt is given by the iid sequence (St)(S_t), where each StS_t has density ϕ\phi on R+\RR_+. The corresponding Bellman operator is

(Tv)(z,s)=max{s,π(z)+βzv(z,s)ϕ(s) ⁣dsQ(z,z)}.(Tv)(z, s) = \max \left\{ s, \pi(z) + \beta \sum_{z'} \int v(z', s') \phi(s') \diff s' Q(z, z') \right\}.

We can convert this problem to a finite-state space optimal stopping problem by discretizing the density ϕ\phi onto a finite grid contained in R+\RR_+. However, since continuation values depend only on zz, a better approach is to switch to a continuation value operator.

4.2Further Applications

In this section, we discuss some additional applications of optimal stopping.

4.2.1American Options

We discussed American options briefly in Example 4.1.2. Here we investigate this class of derivatives more carefully. We focus on American call options that provide the right to buy a particular stock or bond at a fixed strike price KK at any time before a set expiration date. The market price of the asset at time tt is denoted by StS_t.

We discussed a case in which the expiration date is infinity in Example 4.1.2. However, options without termination dates -- also called perpetual options -- are rare in practice. Hence we focus on the finite-horizon case. We are interested in computing the expected value of holding the option when discounting with a fixed interest rate, a typical assumption when pricing American options.

Finite horizon American options can be priced by backward induction in an approach like the one we used for the finite horizon job search problem discussed in Chapter 1. Alternatively, we can embed finite horizon options into the theory of infinite-horizon optimal stopping. We use the second approach here, since we have just presented a theory for infinite-horizon optimal stopping.

To this end, we take TNT \in \NN to be a fixed integer indicating the date of expiration. The option is purchased at t=0t=0 and can be exercised at any tNt \in \NN with tTt \leq T. To include tt in the current state, we set

T{1,,T+1}andm(t)min{t+1,T+1} for all tT.\Tsf \coloneq \{1, \ldots, T+1\} \quad \text{and} \quad m(t) \coloneq \min\{t+1, T+1\} \text{ for all } t \in \Tsf.

The idea is that time is updated via t=m(t)t' = m(t), so that time increments at each update until t=T+1t=T+1. After that we hold tt constant. Bounding time at T+1T+1 keeps the state space finite.

We assume that a stock price StS_t evolves according to

St=Zt+Wtwhere(Wt)t0 iid ϕD(W).S_t = Z_t + W_t \quad \text{where} \quad (W_t)_{t \geq 0} \iidsim \phi \in \dD(\Wsf).

Here (Zt)t0(Z_t)_{t \geq 0} is QQ-Markov on finite set Z\Zsf for some QM(RZ)Q \in \mopz and W\Wsf is also finite. This means that the share price has both persistent and transient stochastic components. If we set parameters so that (Zt)t0(Z_t)_{t \geq 0} resembles a random walk, price changes will be difficult to predict.

To form a Section 4.1.1.1 optimal stopping problem, we must specify the state and clarify the PM(RX)P \in \mopx that maps to the state process. We set the state space to XT×W×Z\Xsf \coloneq \Tsf \times \Wsf \times \Zsf and

P((t,w,z),(t,w,z))1{t=m(t)}ϕ(w)Q(z,z).P((t, w, z), (t', w', z')) \coloneq \1\{t' = m(t)\} \phi(w') Q(z, z').

Thus, time updates deterministically via t=m(t)t' = m(t) and zz' and ww' are drawn independently from Q(z,)Q(z, \cdot) and ϕ\phi respectively.

As for a perpetual option, the continuation reward is zero and the discount factor is β1/(1+r)\beta \coloneq 1/(1+r), where r>0r > 0 is a fixed risk-free rate. The exit reward can be expressed as 1{tT}(StK)\1\{t \leq T\} (S_t - K) so that exercising at time tt earns the owner StKS_t - K up to expiry and zero thereafter. In terms of the state (t,z)(t, z), the exit reward is

e(t,w,z)1{tT}[z+wK].e(t, w, z) \coloneq \1\{t \leq T\} [z + w - K].

The Bellman equation can be written

v(t,w,z)=max{e(t,w,z),βwzv(t,w,z)ϕ(w)Q(z,z)},v(t, w, z) = \max \left\{ e(t, w, z) ,\, \beta \sum_{w'}\sum_{z'} v(t', w', z') \phi(w') Q(z, z') \right\},

where t=m(t)t' = m(t). This value function v(t,w,z)v(t, w, z) neatly captures the value of the option: It is the maximum of current exercise value and the discounted expected value of carrying the option over to the next period.

Since the problem just described is an optimal stopping problem in the sense of Section 4.1.1.1, all of the optimality results attained for that problem apply. In particular, iterates of the Bellman operator converge to the value function vv^* and, moreover, a policy is optimal if and only if it is vv^*-greedy.

We can do better than VFI. Since (Wt)t0(W_t)_{t \geq 0} is iid and appears only in the exit reward, we can reduce dimensionality by switching to the continuation value operator, which, in this case, can be expressed as

(Ch)(t,z)=βzwmax{e(t,w,z),h(t,z)}ϕ(w)Q(z,z).(Ch)(t, z) = \beta \sum_{z'} \sum_{w'} \max \left\{ e(t', w', z'), \, h(t', z') \right\} \phi(w') Q(z, z').

As proved in Section 4.1.4, the unique fixed point of CC is the continuation value function hh^*, and CkhhC^k h \to h^* as kk \to \infty for all hRXh \in \RR^\Xsf. With the fixed point in hand, we can compute the optimal policy as

σ(t,w,z)=1{e(t,w,z)h(t,z)}.\sigma^*(t, w, z) = \1 \left\{ e(t, w, z) \geq h^*(t, z) \right\}.

Here σ(t,w,z)=1\sigma^*(t, w, z) = 1 prescribes exercising the option at time tt.

Figure Figure 4.3 provides a visual representation of optimal actions under the default parameterization described in Listing 2. Each of the three figures show contour lines of the net exit reward f(t,w,z)e(t,w,z)h(w,z)f(t, w, z) \coloneq e(t, w, z) - h^*(w, z), viewed as a function of (w,z)(w, z), when tt is held fixed. The date tt for each subfigure is shown in the title. The optimal policy exercises the option when f(t,w,z)0f(t, w, z) \geq 0.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
using QuantEcon, LinearAlgebra, IterTools

"Creates an instance of the option model with log S_t = Z_t + W_t."
function create_american_option_model(;
        n=100, μ=10.0,  # Markov state grid size and mean value
        ρ=0.98, ν=0.2,  # persistence and volatility for Markov state
        s=0.3,          # volatility parameter for W_t
        r=0.01,         # interest rate 
        K=10.0, T=200)  # strike price and expiration date
    t_vals = collect(1:T+1)
    mc = tauchen(n, ρ, ν)
    z_vals, Q = mc.state_values .+ μ, mc.p
    w_vals, φ, β = [-s, s], [0.5, 0.5], 1 / (1 + r)
    e(t, i_w, i_z) = (t ≤ T) * (z_vals[i_z] + w_vals[i_w] - K)
    return (; t_vals, z_vals, w_vals, Q, φ, T, β, K, e)
end

Program 2:Pricing and American option (american_option.jl)

In each subfigure, the exercise region, which is the set (w,z)(w, z) such that f(t,w,z)0f(t, w, z) \geq 0, correspond to the northeast part of the figure, where ww and zz are both large. The boundary between exercise and continuing is the zero contour line, which is shown in black. Notice that the size of the exercise region expands with tt. This is because the value of waiting decreases when the set of possible exercise dates declines.

Figure Figure 4.4 provides some simulations of the stock price process (St)t0(S_t)_{t \geq 0} over the lifetime of the option, again using the default parameterization described in Listing 2. The blue region in the top part of each subfigure contains values of the stock price St=Zt+WtS_t = Z_t + W_t such that StKS_t \geq K. In this configuration in which the price of the underlying exceeds the strike price, the option is said to be “in the money.” The figure also shows an optimal exercise date that is the first tt such that e(t,Wt,Zt)h(Wt,Zt)e(t, W_t, Z_t) \geq h^*(W_t, Z_t) in a simulation.

Exercise region for the American option

Figure 4.3:Exercise region for the American option

Simulations for the American option process

Figure 4.4:Simulations for the American option process

4.2.2Research and Development

Consider a firm that engages in costly research and development (R&D) in order to develop a new product. The firm decides whether to continue developing the product before starting to market it or to stop developing and start marketing it. For simplicity, we assume that the value of bringing the product to market is a one-time lump sum payment πt=π(Xt)\pi_t = \pi(X_t), where (Xt)(X_t) is a PP-Markov chain on finite set X\Xsf with PM(RX)P \in \mopx. The flow cost of investing in R&D is CtC_t per period, where (Ct)(C_t) is a stochastic process. Future payoffs are discounted at rate r>0r > 0 and we set β1/(1+r)\beta \coloneq 1/(1+r).

4.2.2.1Constant R&D Costs

As a first take on this problem, suppose that CtcR+C_t \equiv c \in \RR_+ for all tt and formulate an optimal stopping problem with exit reward e=πe=\pi and constant continuation reward c-c. The Bellman equation is

v(x)=max{π(x),c+βxv(x)P(x,x)}(xX).v(x) = \max \left\{ \pi(x), -c + \beta \sum_{x'} v(x') P(x, x') \right\} \qquad (x \in \Xsf).

4.2.2.2IID R&D Costs

Let’s suppose now that (Ct)t0(C_t)_{t \geq 0} is iid with common distribution ϕD(W)\phi \in \dD(\Wsf). The Bellman equation is

v(c,x)=max{π(x),c+βxcv(c,x)ϕ(c)P(x,x)}.v(c, x) = \max \left\{ \pi(x), -c + \beta \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x') \right\}.

Since (Ct)(C_t) is iid, we would ideally like to integrate it out in the manner of Section 4.1.4.2, thereby lowering the dimensionality of the problem. However, note that the continuation value associated with (4.52) is

h(c,x)c+βxcv(c,x)ϕ(c)P(x,x),h(c, x) \coloneq -c + \beta \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x'),

which still depends on cc.

Fortunately, there is a way to eliminate cc. Define the expected value g(x)g(x) in state xx by

g(x)xcv(c,x)ϕ(c)P(x,x).g(x) \coloneq \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x').

Rewrite the Bellman equation using gg and replacing (c,x)(c, x) with (c,x)(c', x') to get

v(c,x)=max{π(x),c+βg(x)}.v(c', x') = \max \left\{ \pi(x'), -c' + \beta g(x') \right\}.

Averaging over (c,x)(c', x') and using the definition of gg again gives

g(x)=xcmax{π(x),c+βg(x)}ϕ(c)P(x,x).g(x) = \sum_{x'} \sum_{c'} \max \left\{ \pi(x'), -c' + \beta g(x') \right\} \phi(c') P(x, x').

This is a functional equation in gg that depends only on xx. To solve it, we introduce the operator RR defined by

(Rg)(x)=xcmax{π(x),c+βg(x)}ϕ(c)P(x,x)(xX).(Rg)(x) = \sum_{x'} \sum_{c'} \max \left\{ \pi(x'), -c' + \beta g(x') \right\} \phi(c') P(x, x') \quad (x \in \Xsf).

From Exercise 4.16, we see that (4.56) has a unique solution gg^* in RX\RR^\Xsf that can be computed by successive approximation. With gg^* in hand, we can compute the optimal policy via

σ(c,x)=1{π(x)c+βg(x)}.\sigma^*(c, x) = \1 \left\{ \pi(x) \geq -c + \beta g^*(x) \right\}.

4.3Chapter Notes

Various textbooks treat optimal stopping in depth, although most use continuous time. Peskir & Shiryaev (2006) and Shiryaev (2007) are good examples.

There are many applications of optimal stopping in economics and finance, with influential early research papers including McCall (1970), Jovanovic (1982), Hopenhayn (1992), and Ericson & Pakes (1995). Arellano (2008) considers borrowing on international financial markets with the option of sovereign default (see Section 8.2.1.5). Riedel (2009) studies optimal stopping under Knightian uncertainty. Fajgelbaum et al. (2017) include an optimal stopping problem for firms in a model of uncertainty traps.

The firm problem with optimal exit has been used to analyze firm dynamics and firm size distributions in equilibrium models with heterogeneous firms. Hopenhayn (1992) is the classic reference. Perla & Tonetti (2014) construct a growth model in which firms at the bottom of the productivity distribution imitate more productive firms. Carvalho & Grassi (2019) analyze business cycles in a setting of firm growth with exit and a Pareto distribution of firms.

Infinite duration American options are analyzed in Mordecki (2002). Practical methods for pricing American options are provided by Longstaff & Schwartz (2001), Rogers (2002), and Kohler et al. (2010).

Replacement problems are an important optimal stopping problem not treated in this chapter. An important early paper by Rust (1987) uses dynamic programming to find optimal replacement policies for of engine parts and goes on to fit the model to data. Section 5.3.1 discusses structural estimation in the style of Rust (1987) and others.

Footnotes
  1. We are studying American options in discrete time. Options with discrete exercise times are sometimes called Bermudan options. References for the continuous-time case are provided in Section 4.3.

  2. Actually, in most definitions, uu is also restricted to be bounded and measurable, in order to ensure that the integrals are finite. These technicalities can be ignored in the exercise.

References
  1. Peskir, G., & Shiryaev, A. (2006). Optimal Stopping and Free-boundary Problems. Springer Verlag.
  2. Shiryaev, A. N. (2007). Optimal Stopping Rules (Vol. 8). Springer Science & Business Media.
  3. McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics, 84(1), 113–126.
  4. Jovanovic, B. (1982). Selection and the Evolution of Industry. Econometrica, 50(3), 649–670.
  5. Hopenhayn, H. A. (1992). Entry, exit, and firm dynamics in long run equilibrium. Econometrica, 60, 1127–1150.
  6. Ericson, R., & Pakes, A. (1995). Markov-perfect industry dynamics: A framework for empirical work. The Review of Economic Studies, 62(1), 53–82.
  7. Arellano, C. (2008). Default risk and income fluctuations in emerging economies. American Economic Review, 98(3), 690–712.
  8. Riedel, F. (2009). Optimal stopping with multiple priors. Econometrica, 77(3), 857–908.
  9. Fajgelbaum, P. D., Schaal, E., & Taschereau-Dumouchel, M. (2017). Uncertainty traps. The Quarterly Journal of Economics, 132(4), 1641–1692.
  10. Perla, J., & Tonetti, C. (2014). Equilibrium imitation and growth. Journal of Political Economy, 122(1), 52–76.
  11. Carvalho, V. M., & Grassi, B. (2019). Large firm dynamics and the business cycle. American Economic Review, 109(4), 1375–1425.
  12. Mordecki, E. (2002). Optimal stopping and perpetual options for Lévy processes. Finance and Stochastics, 6(4), 473–493.
  13. Longstaff, F. A., & Schwartz, E. S. (2001). Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14(1), 113–147.
  14. Rogers, L. C. (2002). Monte Carlo valuation of American options. Mathematical Finance, 12(3), 271–286.
  15. Kohler, M., Krzyżak, A., & Todorovic, N. (2010). Pricing of high-dimensional American options by neural networks. Mathematical Finance, 20(3), 383–410.