Optimal Stopping - Dynamic Programming Volume I: Finite States

We study problems of maximizing lifetime rewards in settings in which decision-makers face risks. The job search model studied in Chapter 1 and Chapter 3 is one example. Others include an entrepreneur who decides whether to exit or enter a market, a borrower who considers defaulting on a loan, a firm that contemplates introducing a new technology, or a portfolio manager deciding whether to exercise a real or financial option.

These can all be formulated as dynamic programs and have common features that facilitate sharp characterizations of optimality. They are all two-action (or binary choice) problems that provide good laboratories for studying some special dynamic programs in which recursive representations are particularly enlightening.

4.1 Introduction to Optimal Stopping

We begin with a standard theory of optimal stopping and then consider alternative approaches that feature continuation values and threshold policies. We aim to provide a rigorous discussion of optimality that refines our less formal analysis of job search in Section 1.3 and Section 3.3.1.

4.1.1 Theory

Our first step is to set out the fundamental theory of discrete time infinite-horizon optimal stopping problems.

4.1.1.1 The Stopping Problem

Let $\Xsf$ be a finite set. Given $\Xsf$ , an optimal stopping problem is a tuple $\sS = (\beta, P, c, e)$ that consists of

(i) a discount factor $\beta \in (0,1)$ ,

(ii) a Markov operator $P \in \mopx$ ,

(iii) a continuation reward function $c \in \RR^\Xsf$ , and

(iv) an exit reward function $e \in \RR^\Xsf$ .

Given a $P$ -Markov chain $(X_t)_{t \geq 0}$ , a decision-maker observes the state $X_t$ in each period and decides whether to continue or stop. If she chooses to stop, she receives final reward $e(X_t)$ and the process terminates. If she decides to continue, then she receives $c(X_t)$ and the process repeats next period. Lifetime rewards are

\EE \sum_{t \geq 0} \beta^t R_t,

where $R_t$ equals $c(X_t)$ while the agent continues, $e(X_t)$ when the agent stops, and zero thereafter.

Optimal decisions are described by a policy function, which is a map $\sigma$ from $\Xsf$ to $\{0,1\}$ . After observing state $x$ at any given time, the decision-maker takes action $\sigma(x)$ , where 0 means “continue” and 1 means “stop.” Implicit in this formulation is the assumption that the current state contains enough information for the agent to decide whether or not to stop.

Let $\Sigma$ be the set of functions from $\Xsf$ to $\{0,1\}$. Let $v_\sigma(x)$ denote the expected lifetime value of following policy $\sigma$ now and in every future period, given optimal stopping problem $\sS = (\beta, P, c, e)$ and current state $x \in \Xsf$. We call $v_\sigma$ the $\sigma$-value function. We also call $v_\sigma(x)$ the lifetime value of policy $\sigma$ conditional on initial state $x$. Section 4.1.1.2 shows that $v_\sigma$ is well defined and describes how to calculate it. A policy $\sigma^* \in \Sigma$ is called optimal for $\sS$ if

v_{\sigma^*}(x) = \max_{\sigma \in \Sigma} v_\sigma(x) \quad \text{for all } x \in \Xsf.

(4.1)

4.1.1.2 Lifetime Values

Fixing $\sigma \in \Sigma$ , let us consider how to compute the lifetime value $v_\sigma(x)$ of following $\sigma$ conditional on $X_0 = x$ . Evidently, $v_\sigma$ satisfies

v_\sigma(x) = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x' \in \Xsf} v_\sigma(x') P(x, x') \right] \quad \text{for all } x \in \Xsf .

(4.2)

Indeed, if $\sigma(x)=1$ , then (4.2) states that $v_\sigma(x) = e(x)$ , which is what we expect: if we choose to stop at a given state, then lifetime value from that state equals the exit reward. If, instead, $\sigma(x)=0$ , then (4.2) becomes

v_\sigma(x) = c(x) + \beta \sum_{x'} v_\sigma(x') P(x, x') ,

(4.3)

which is also what we expect: the value of continuing is the current reward plus the discounted expected reward obtained by continuing with policy $\sigma$ next period.

We want to solve (4.2) for $v_\sigma$ . To this end, we define $r_\sigma \in \RR^\Xsf$ and $L_\sigma \in \lopx$ via

r_\sigma(x) \coloneq \sigma(x) e(x) + (1-\sigma(x)) c(x) \quad \text{and} \quad L_\sigma(x, x') \coloneq \beta (1-\sigma(x)) P(x, x').

With this notation, we can write (4.2) pointwise as $v_\sigma = r_\sigma + L_\sigma \, v_\sigma$ . If $\rho(L_\sigma) < 1$ , then

v_\sigma = (I - L_\sigma)^{-1} \, r_\sigma.

(4.4)

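Since the entries of $L_\sigma$ are nonnegative and each of its rows sums to at most $\beta$, we have $\rho(L_\sigma) \leq \beta < 1$ (see Exercise 4.1.1). Hence, by the Neumann series lemma, $I - L_\sigma$ is invertible and $v_\sigma$ is uniquely defined by (4.4).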
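As a minimal sketch of (4.4), assuming a hypothetical three-state problem with made-up values for $\beta$, $P$, $c$ and $e$ (none of these numbers come from the text), we can compute $v_\sigma$ by solving the linear system $(I - L_\sigma) v = r_\sigma$ instead of forming the inverse explicitly:

```julia
using LinearAlgebra

# Hypothetical three-state optimal stopping problem (illustrative numbers only)
β = 0.9
P = [0.5 0.3 0.2;
     0.2 0.6 0.2;
     0.1 0.3 0.6]          # Markov matrix: rows sum to one
c = [1.0, 2.0, 3.0]        # continuation rewards c(x)
e = [10.0, 15.0, 40.0]     # exit rewards e(x)

"Compute v_σ = (I - L_σ)^(-1) r_σ for a binary policy σ (1 = stop, 0 = continue)."
function policy_value(σ, β, P, c, e)
    r_σ = σ .* e .+ (1 .- σ) .* c      # r_σ(x) = σ(x)e(x) + (1 - σ(x))c(x)
    L_σ = β .* (1 .- σ) .* P           # row x of βP is zeroed out when σ(x) = 1
    return (I - L_σ) \ r_σ             # solve the linear system (4.4)
end

σ = [0, 0, 1]                          # continue in states 1 and 2, stop in state 3
v_σ = policy_value(σ, β, P, c, e)      # ≈ [30.607, 31.977, 40.0]
```

Solving with `\` is generally cheaper and numerically more stable than computing $(I - L_\sigma)^{-1}$ and then multiplying by $r_\sigma$.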

4.1.1.3 Policy Operators

For the proofs, it will be helpful to view $v_\sigma$ as the fixed point of an operator. We associate each $\sigma \in \Sigma$ with a policy operator $T_\sigma$ defined at $v \in \RR^{\Xsf}$ by

(T_\sigma \, v)(x) = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x'} v(x') P(x, x') \right],

(4.5)

for each $x \in \Xsf$ . With this notation, (4.2) can be written as $v_\sigma = T_\sigma \, v_\sigma$ .

Solution to Exercise 4.1.2

Fix $\sigma \in \Sigma$ . If $f, g \in \RR^\Xsf$ , $f \leq g$ and $x \in \Xsf$ , then

\begin{aligned} (T_\sigma g)(x) - (T_\sigma f)(x) & = (1-\sigma(x)) \left[ \beta \sum_{x'} g(x') P(x, x') - \beta \sum_{x'} f(x') P(x, x') \right] \\ & = (1-\sigma(x))\beta \sum_{x'} (g(x') - f(x')) P(x, x'). \end{aligned}

Since $g(x') \geq f(x')$ for all $x'$ we have $(T_\sigma g)(x) \geq (T_\sigma f)(x)$ for all $x$ .

Using the notation in Section 4.1.1.2, we can also define $T_\sigma$ via

T_\sigma \, v = r_\sigma + L_\sigma \, v .

Proposition 4.1.1 is significant because $v_\sigma$ is, by construction, a fixed point of $T_\sigma$. Hence, by the contraction property in Proposition 4.1.1, $v_\sigma$ is the only fixed point of $T_\sigma$ in $\RR^\Xsf$ and, moreover, iterates of $T_\sigma$ converge to $v_\sigma$ from any starting point in $\RR^\Xsf$.
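The following sketch, built on a hypothetical three-state problem (an assumption, not data from the text), iterates $T_\sigma$ from two different starting points and compares the results with the direct solution of (4.4):

```julia
using LinearAlgebra

# Hypothetical three-state problem (illustrative numbers only)
β = 0.9
P = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.1 0.3 0.6]
c = [1.0, 2.0, 3.0]
e = [10.0, 15.0, 40.0]
σ = [0, 0, 1]

# Policy operator (4.5): (T_σ v)(x) = σ(x)e(x) + (1 - σ(x))[c(x) + β Σ v(x')P(x, x')]
T_σ(v) = σ .* e .+ (1 .- σ) .* (c .+ β .* (P * v))

"Iterate T_σ from v0 until successive iterates differ by less than tol in sup norm."
function iterate_policy_operator(v0; tol=1e-10, max_iter=10_000)
    v = v0
    for _ in 1:max_iter
        v_new = T_σ(v)
        maximum(abs.(v_new - v)) < tol && return v_new
        v = v_new
    end
    return v
end

v_direct = (I - β .* (1 .- σ) .* P) \ (σ .* e .+ (1 .- σ) .* c)   # solution of (4.4)
v_from_zero = iterate_policy_operator(zeros(3))
v_from_high = iterate_policy_operator(fill(1_000.0, 3))
```

Both iterations should end up within a small tolerance of `v_direct`, as the contraction property predicts.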

Solution to Exercise 4.1.3

Fix $\sigma \in \Sigma$ . Given $f, g \in \RR^\Xsf$ and $x \in \Xsf$ , we have

\begin{aligned} |(T_\sigma f)(x) - (T_\sigma \, g)(x)| & = \left| \, (1-\sigma(x))\beta \sum_{x'} (f(x') - g(x')) P(x, x') \, \right| \\ & \leq \beta \left| \sum_{x'} [f(x') - g(x')] P(x, x') \right|. \end{aligned}

Applying the triangle inequality and $\sum_{x'} P(x, x')=1$ , we obtain

|(T_\sigma f)(x) - (T_\sigma \, g)(x)| \leq \beta \sum_{x'} |f(x') - g(x')| P(x, x') \leq \beta \| f - g \|_\infty .

Taking the supremum over all $x$ on the left-hand side of this expression leads to

\|T_\sigma f - T_\sigma \, g \|_\infty \leq \beta \| f - g \|_\infty .

Since $f, g$ were arbitrary elements of $\RR^\Xsf$ , the contraction claim is proved.

4.1.1.4 The Value Function

In the job search problem in Section 3.3.1, we argued that the value function equals the fixed point of the Bellman operator. Here we make the same argument more formally in the more general setting of optimal stopping.

First, given an optimal stopping problem $\sS = (\beta, P, c, e)$ with $\sigma$ -value functions $\{v_\sigma\}_{\sigma \in \Sigma}$ , we define the value function $v^*$ of $\sS$ via

v^*(x) \coloneq \max_{\sigma \in \Sigma} v_\sigma(x) \qquad (x \in \Xsf),

(4.6)

so that $v^*(x)$ is the maximal lifetime value available to an agent facing current state $x$ . Following notation in Section 2.2.2.1, we can also write $v^* = \vee_\sigma \, v_\sigma$ .

Solving the maximization in (4.6) directly is generally impractical, since $\Sigma$ contains $2^{|\Xsf|}$ policies. How, then, can we obtain the value function? The following steps do the job:

(i) formulate a Bellman equation for the value function of the optimal stopping problem, namely,

v(x) = \max \left\{ e(x), c(x) + \beta \sum_{x'} v(x') P(x, x') \right\} \qquad (x \in \Xsf),

(4.7)

(ii) prove that this Bellman equation has a unique solution in $\RR^\Xsf$ , and then

(iii) show that this solution equals the value function, as defined in (4.6).

We shall complete these steps in Section 4.1.1.5.

4.1.1.5 The Bellman Operator

Define the Bellman operator for the optimal stopping problem $\sS = (\beta, P, c, e)$ as

(Tv)(x) = \max \left\{ e(x) ,\, c(x) + \beta \sum_{x'} v(x') P(x, x') \right\},

(4.8)

where $x \in \Xsf$ and $v \in \RR^{\Xsf}$ . By construction, any fixed point of $T$ solves the Bellman equation and vice versa. Pointwise, we can express $T$ via $Tv = e \vee (c + \beta Pv)$ .

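Our main result for this section is the following.

Proposition 4.1.2. The value function $v^*$ is the unique fixed point of $T$ in $\RR^\Xsf$ and hence the unique solution to the Bellman equation (4.7). Moreover, $T^k v \to v^*$ as $k \to \infty$ for every $v \in \RR^\Xsf$.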

Solution to Exercise 4.1.5

This result follows from Lemma 2.2.3. For the sake of the exercise, we also provide a direct proof:

Take any $f, g$ in $\RR^\Xsf$ . Writing the operators pointwise and applying the last result in Lemma 2.2.1 gives

\begin{aligned} |Tf - Tg| & = | e \vee (c + \beta Pf) - e \vee (c + \beta Pg)| \\ & \leq \left| \beta Pf - \beta Pg \right| \\ & = \beta \left| P(f-g) \right| \\ & \leq \beta P \left| f-g \right|. \end{aligned}

(Here the last inequality uses the result in Exercise 2.2.7.) Since $P \geq 0$ and $P \1 = \1$, we have $P | f-g | \leq P \|f-g\|_\infty \1 = \|f-g\|_\infty \1$, so

|Tf - Tg | \leq \beta \| f - g \|_\infty \1.

Taking the maximum over $x \in \Xsf$ on both sides gives $\|Tf-Tg\|_\infty \leq \beta \|f-g\|_\infty$. Since $f, g$ were arbitrary elements of $\RR^\Xsf$, the contraction claim is verified.

Proof of Proposition 4.1.2.

With the result of Exercise 4.1.5 in hand, we need only show that the unique fixed point $\bar v$ of $T$ in $\RR^\Xsf$ is equal to $v^* = \vee_\sigma \, v_\sigma$ . We show $\bar v \leq v^*$ and then $\bar v \geq v^*$ .

For the first inequality, let $\sigma \in \Sigma$ be defined by

\sigma(x) = \1 \left\{ e(x) \geq c(x) + \beta \, \sum_{x'} \bar v(x') P(x, x') \right\} \quad \text{for all } x \in \Xsf.

Observe that for this choice of $\sigma$ we have, for any $x \in \Xsf$ ,

\begin{aligned} (T_\sigma \, \bar v)(x) & = \sigma(x) e(x) + (1-\sigma(x)) \left[ c(x) + \beta \sum_{x'} \bar v(x') P(x, x') \right] \\ & = \max \left\{ e(x) ,\, c(x) + \beta \sum_{x'} \bar v(x') P(x, x') \right\} = (T\bar v)(x) = \bar v(x). \end{aligned}

In particular, $T_\sigma \, \bar v = \bar v$. Since the only fixed point of $T_\sigma$ in $\RR^\Xsf$ is $v_\sigma$, we have $\bar v = v_\sigma$, and hence $\bar v \leq v^*$ by the definition of $v^*$. This is our first inequality.

Regarding the second, fix $\sigma \in \Sigma$ and observe that $T v \geq T_\sigma v$ for all $v \in \RR^\Xsf$ . Since $T$ is order-preserving and globally stable, Proposition 2.2.7 implies that $v_\sigma \leq \bar v$ . Taking the maximum over $\sigma \in \Sigma$ yields $v^* \leq \bar v$ . ◻
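To see the conclusion of this proof on a concrete case, the sketch below (a hypothetical three-state problem with illustrative numbers) computes the fixed point of $T$ by successive approximation and compares it with the brute-force maximum in (4.6) over all $2^{|\Xsf|}$ policies. Brute force is only feasible here because $|\Xsf| = 3$.

```julia
using LinearAlgebra

# Hypothetical three-state problem (illustrative numbers only)
β = 0.9
P = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.1 0.3 0.6]
c = [1.0, 2.0, 3.0]
e = [10.0, 15.0, 40.0]
n = length(c)

# Bellman operator (4.8) in pointwise form: Tv = e ∨ (c + βPv)
T(v) = max.(e, c .+ β .* (P * v))

"Approximate the fixed point of T by successive approximation from v = 0."
function bellman_fixed_point(; tol=1e-12, max_iter=10_000)
    v = zeros(n)
    for _ in 1:max_iter
        v_new = T(v)
        maximum(abs.(v_new - v)) < tol && return v_new
        v = v_new
    end
    return v
end

# σ-value function from (4.4); σ is a 0/1 vector with 1 = stop
policy_value(σ) = (I - β .* (1 .- σ) .* P) \ (σ .* e .+ (1 .- σ) .* c)

# Brute-force version of (4.6): evaluate all 2^n policies and take the pointwise max
all_policies = [digits(k, base=2, pad=n) for k in 0:(2^n - 1)]
v_star_brute = reduce((a, b) -> max.(a, b), policy_value.(all_policies))

v_bar = bellman_fixed_point()
```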

4.1.1.6 Optimal Policies

Paralleling the definition provided in the discussion of job search (Section 1.3), for each $v \in \RR^\Xsf$ , we call $\sigma \in \Sigma$ $v$ -greedy if, for all $x \in \Xsf$ ,

\sigma(x) \in \argmax_{a \in \{0,1\}} \left\{ a e(x) + (1-a) \left[ c(x) + \beta \, \sum_{x'} v(x') P(x, x') \right] \right\}.

(4.9)

A $v$ -greedy policy uses $v$ to assign values to states and then chooses to stop or continue based on the action that generates a higher payoff.

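With this language in place, the next proposition makes precise our informal Section 1.1.2 argument that optimal choices can be made using the value function.

Proposition 4.1.3. A policy $\sigma \in \Sigma$ is optimal for $\sS$ if and only if it is $v^*$-greedy.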

Proposition 4.1.3 is a version of Bellman’s principle of optimality. We shall prove this principle in a more general setting in Chapter 5.

4.1.1.7 Value Function Iteration

The theory just presented tells us that successive approximation using the Bellman operator converges to $v^*$ and that $v^*$-greedy policies are optimal. These facts make value function iteration (VFI) a natural algorithm for solving optimal stopping problems. (VFI for optimal stopping problems is the same algorithm as VFI for job search, with the Bellman operator (4.8) in place of the job search Bellman operator.) Later, in Theorem 8.1.1, we will show that when the number of iterates is sufficiently large, VFI produces an optimal policy.
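A minimal VFI sketch for a generic finite optimal stopping problem follows. It iterates $T$ until successive iterates are within a tolerance and then returns a $v$-greedy policy as in (4.9), breaking ties in favor of stopping. The function name `vfi`, the stopping rule and the example numbers are illustrative choices, not taken from the text.

```julia
"""
    vfi(β, P, c, e; tol, max_iter)

Value function iteration for the optimal stopping problem (β, P, c, e) on a finite
state space. Returns an approximation to v* and a v-greedy policy as in (4.9),
coded as 1 = stop and 0 = continue, with ties broken in favor of stopping.
"""
function vfi(β, P, c, e; tol=1e-10, max_iter=100_000)
    v = copy(e)                                # any initial guess works, since T is a contraction
    for _ in 1:max_iter
        v_new = max.(e, c .+ β .* (P * v))     # v ← Tv
        converged = maximum(abs.(v_new - v)) < tol
        v = v_new
        converged && break
    end
    h = c .+ β .* (P * v)                      # value of continuing, given v
    σ = Int.(e .>= h)                          # greedy step: stop when e(x) ≥ continuation value
    return v, σ
end

# Hypothetical three-state example (illustrative numbers only)
β = 0.9
P = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.1 0.3 0.6]
c = [1.0, 2.0, 3.0]
e = [10.0, 15.0, 40.0]
v_star, σ_star = vfi(β, P, c, e)               # σ_star == [0, 0, 1]
```

In this example the exit reward in state 3 is large enough that stopping there is optimal, while in states 1 and 2 the value of continuing exceeds the exit reward.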

4.1.2 Firm Valuation with Exit

In Section 3.2.2.2 we discussed firm valuation using expected present value of the cash flow generated by profits. This is a standard approach. However, it ignores that firms have the option to cease operations and sell all remaining assets. In this section, we consider firm valuation in the presence of an exit option.

4.1.2.1 Optional Exit

Consider a firm whose productivity is exogenous and evolves according to a $Q$ -Markov chain $(Z_t)$ on finite set $\Zsf \subset \RR$ . Profits are given by $\pi_t = \pi(Z_t)$ for some fixed $\pi \in \RR^\Zsf$ . At the start of each period, the firm decides whether to remain in operation and receive current profit $\pi_t$ , or to exit and receive scrap value $s > 0$ for sale of physical assets. Discounting is at fixed rate $r$ and $\beta \coloneq 1/(1+r)$ . We assume that $r > 0$ .

Let $\Sigma$ be all $\sigma \colon \Zsf \to \{0,1\}$ . For given $\sigma \in \Sigma$ and $v \in \RR^\Zsf$ , the corresponding policy operator is

(T_\sigma v)(z) = \sigma(z) s + (1-\sigma(z)) \left[ \pi(z) + \beta \sum_{z'} v(z') Q(z, z') \right] \qquad (z \in \Zsf).

We saw in Section 4.1.1.2–Section 4.1.1.3 that $T_\sigma$ has a unique fixed point $v_\sigma$ and that $v_\sigma(z)$ represents the value of following policy $\sigma$ forever, conditional on $Z_0 = z$ .

The Bellman operator for the firm’s problem is the order-preserving self-map $T$ on $\RR^\Zsf$ defined by

(Tv)(z) = \max \left\{ s, \pi(z) + \beta \sum_{z'} v(z') Q(z, z') \right\} \quad (z \in \Zsf).

Pointwise, $T$ can be written as $Tv = s \vee (\pi + \beta Q v)$ .

Let $v^*$ be the value function for this problem. By Proposition 4.1.2, $v^*$ is the unique fixed point of $T$ in $\RR^\Zsf$ and the unique solution to the Bellman equation. Moreover, successive approximation from any $v \in \RR^\Zsf$ converges to $v^*$ . Finally, by Proposition 4.1.3, a policy is optimal if and only if it is $v^*$ -greedy.

Figure 4.1 plots $v^*$, computed via VFI (i.e., successive approximation using $T$), along with the stopping value $s$ and the continuation value function $h^* = \pi + \beta Q v^*$, under the parameterization given in Listing 1. As implied by the Bellman equation, $v^*$ is the pointwise maximum of $s$ and $h^*$. The $v^*$-greedy policy instructs the firm to exit when the continuation value of the firm falls below the scrap value.

Figure 4.1: Value function for firms with exit option

```julia
using QuantEcon   # provides tauchen

"Creates an instance of the firm exit model."
function create_exit_model(;
        n=200,                  # productivity grid size
        ρ=0.95, μ=0.1, ν=0.1,   # persistence, mean and volatility
        β=0.98, s=100.0         # discount factor and scrap value
    )
    mc = tauchen(n, ρ, ν, μ)    # finite Markov chain approximating an AR(1) process
    z_vals, Q = mc.state_values, mc.p
    return (; n, z_vals, Q, β, s)
end
```

Program 1: Firm exit model (firm_exit.jl)
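The sketch below applies VFI to the firm exit problem. To keep it self-contained it does not call `tauchen`; instead it uses a hypothetical three-state productivity chain and assumes that profits equal productivity, $\pi(z) = z$. Neither choice is taken from Listing 1.

```julia
# Hypothetical three-state productivity chain (illustrative; not Listing 1's Tauchen grid)
z_vals = [-1.0, 0.5, 2.0]
Q = [0.80 0.15 0.05;
     0.10 0.80 0.10;
     0.05 0.15 0.80]
β, s = 0.95, 10.0
π_vals = copy(z_vals)                  # assumption: profits equal productivity, π(z) = z

"VFI for the firm exit problem Tv = s ∨ (π + βQv); returns v*, h* and the exit policy."
function solve_firm_exit(π_vals, Q, β, s; tol=1e-10, max_iter=100_000)
    v = fill(s, length(π_vals))
    for _ in 1:max_iter
        v_new = max.(s, π_vals .+ β .* (Q * v))
        done = maximum(abs.(v_new - v)) < tol
        v = v_new
        done && break
    end
    h = π_vals .+ β .* (Q * v)         # continuation value h* = π + βQv*
    σ = Int.(s .>= h)                  # 1 = exit when the scrap value is at least h*
    return v, h, σ
end

v_star, h_star, σ_exit = solve_firm_exit(π_vals, Q, β, s)   # σ_exit == [1, 0, 0]
```

With these numbers the firm exits only in the lowest productivity state, and `v_star` equals the pointwise maximum of `s` and `h_star`.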

4.1.2.2 Exit Versus No-Exit

If we define $w$ by $w(z) = \EE_z \sum_{t \geq 0} \beta^t \pi_t$ for all $z \in \Zsf$ , then $w(z)$ is the value of the firm given $Z_0 = z$ when the firm never exits so that $w$ evaluates the firm according to expected present value of the profit stream. Figure 4.2 shows the no-exit value $w$ based on the parameterization in Listing 1.

Figure 4.2: Firm value with and without exit

In Figure 4.2, we see that $w \leq v^*$ on $\Zsf$ . Let’s now prove that this is always true.

To show $w \leq v^*$ , first observe that $w = (I - \beta Q)^{-1} \pi$ , by $\beta < 1$ and Lemma 3.2.1. Rearranging gives $w = \pi + \beta Q w$ .

Now note that under the policy $\sigma \equiv 0$ , where the firm chooses never to exit, we have $T_\sigma v = \pi + \beta Q v$ . Hence the unique fixed point of $T_\sigma$ is $w$ . As a result, $w = v_\sigma$ for $\sigma \equiv 0$ . But $v^* \geq v_\sigma$ for all $\sigma \in \Sigma$ . This proves that $w \leq v^*$ .

Choosing never to exit is a feasible policy. Since $v^*$ involves maximization of firm value over the set of all feasible policies, it must be at least as large as the value of never exiting.
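The inequality $w \leq v^*$ can also be checked numerically on a small example. Assuming a hypothetical three-state chain and $\pi(z) = z$ (both illustrative), the sketch below computes $w = (I - \beta Q)^{-1}\pi$ with a linear solve, computes $v^*$ by iterating $T$, and reports the value of the exit option $v^* - w$ state by state.

```julia
using LinearAlgebra

# Hypothetical parameterization (illustrative only)
z_vals = [-1.0, 0.5, 2.0]
Q = [0.80 0.15 0.05;
     0.10 0.80 0.10;
     0.05 0.15 0.80]
β, s = 0.95, 10.0
π_vals = copy(z_vals)                 # assumption: π(z) = z

# No-exit value: w = (I - βQ)^(-1) π, the fixed point of T_σ for σ ≡ 0
w = (I - β * Q) \ π_vals

# Value with the exit option, by successive approximation with Tv = s ∨ (π + βQv)
function value_with_exit(v0; tol=1e-10)
    v = v0
    while true
        v_new = max.(s, π_vals .+ β .* (Q * v))
        maximum(abs.(v_new - v)) < tol && return v_new
        v = v_new
    end
end

v_star = value_with_exit(copy(w))
option_value = v_star - w             # gain from the option to exit, state by state
```

Here $Q$ has strictly positive entries and $s$ exceeds $w$ in the lowest state, so the gain comes out strictly positive in every state, in line with the solution to Exercise 4.1.7.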

Solution to Exercise 4.1.7

First observe that, since $v^* \geq w$ and $T$ is order-preserving, we have $v^* = Tv^* \geq Tw = s \vee (\pi + \beta Qw) = s \vee w$ . From this we get $v^* \geq s \vee w$ and applying $T$ to both sides gives $v^* \geq T(s \vee w)$ .

Next, observe that

T(s \vee w) = s \vee (\pi + \beta Q (s \vee w)) \geq \pi + \beta Q (s \vee w) \gg \pi + \beta Q w = w

where the strict inequality is by Exercise 2.2.8. We conclude that $v^* \geq T(s \vee w) \gg w$ , as was to be shown.

Intuitively, the option to exit adds value to firms everywhere in the state space, since $Q \gg 0$ implies that the state can shift to a region of the state space where exit is optimal in a later period.

Solution to Exercise 4.1.8

For the model described, the Bellman equation takes the form

v(p) = \max \left\{ s, \, \max_{\ell \geq 0} \pi(\ell, p) + \beta \sum_{p'} v(p') Q(p, p') \right\}.

Straightforward calculus shows that maximized one-period profits are $\pi(p) = p^2/ (4w)$ . Hence the final expression is

v(p) = \max \left\{ s, \, \frac{p^2}{4w} + \beta \sum_{p'} v(p') Q(p, p') \right\}.

4.1.3 Monotonicity

We study monotonicity in values and actions in the general optimal stopping problem described in Section 4.1.1, with $\Xsf$ as the state space, $e$ as the exit reward function, and $c$ as the continuation reward function.

4.1.3.1 Monotone Values

Let $v^*$ be the value function of an optimal stopping problem defined by $\Xsf$, $P$, $\beta$, $c$ and $e$, and define the continuation value function $h^*$ by

h^*(x) \coloneq c(x) + \beta \sum_{x'} v^*(x') P(x, x') \qquad (x \in \Xsf).

(4.10)

(The continuation reward function $c$ and the continuation value function $h^*$ are distinct objects.)

Let $\Xsf$ be partially ordered and let $i\RR^\Xsf$ be the increasing functions in $\RR^\Xsf$ .

4.1.3.2 Monotone Actions

The optimal policy in the IID job search problem takes the form $\sigma^*(w) = \1\{w \geq w^* \}$ for all $w \in \Wsf$, where $w^* \coloneq (1-\beta) h^*$ is the reservation wage and $h^*$ is the continuation value. This optimal policy is of threshold type: once the wage offer reaches the threshold, the decision is to stop.

Since threshold policies are convenient, let us now try to characterize them.

Throughout this section, we take $\Xsf$ to be a subset of $\RR$ . Elements of $\Xsf$ are ordered by $\leq$ , the usual order on $\RR$ .

For a binary function on $\Xsf \subset \RR$ , the condition that $\sigma^*$ is decreasing means that the decision-maker chooses to exit when $x$ is sufficiently small.

In the settings of Exercise 4.1.9–Exercise 4.1.11, the optimal policy is either increasing or decreasing. Since $\Xsf$ is totally ordered, monotonicity implies that a threshold policy is optimal. For example, if $\sigma^*$ is increasing, then we take $x^*$ to be the smallest $x \in \Xsf$ such that $\sigma^*(x) = 1$ . For such an $x^*$ we have

x < x^* \; \implies \sigma^*(x)=0 \quad \text{and} \quad x \geq x^* \; \implies \sigma^*(x)=1.
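For a policy stored as a vector on a sorted grid, extracting the threshold $x^*$ amounts to finding the first index at which the policy equals 1. The helper `threshold` below is a hypothetical illustration, not part of the text's code:

```julia
"""
Given an increasing binary policy σ on a sorted grid x_vals, return the threshold
x* = smallest x with σ(x) = 1, or `nothing` if the policy never stops.
"""
function threshold(x_vals, σ)
    @assert issorted(x_vals) "grid must be sorted in increasing order"
    @assert issorted(σ) "policy must be increasing"
    i = findfirst(==(1), σ)
    return i === nothing ? nothing : x_vals[i]
end

x_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
σ = [0, 0, 1, 1, 1]
x_star = threshold(x_vals, σ)     # 3.0
```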

4.1.4 Continuation Values

In Section 1.3.2.2 we solved the job search problem with IID draws by computing the continuation value $h^*$ directly and then choosing the policy $\sigma^*(w) = \1 \left\{ w/(1-\beta) \geq h^* \right\}$ . We saw that this approach is more efficient than first computing the value function, since the continuation value is one-dimensional rather than $|\Wsf|$ -dimensional.

In Section 3.3.1.2, we tried the same approach for the job search problem with Markov state, where wage draws are correlated. We gathered fewer benefits from using the continuation value approach in that setting, since the continuation value function has the same dimensionality as the value function.

These observations motivate us to explore continuation value methods more carefully. In this section, we formulate a continuation value approach for the general optimal stopping problem and verify convergence. We will see that, while all relevant state components must be included in the value function, purely transitory components do not affect continuation values. Hence the continuation value approach is at least as efficient and sometimes substantially more so.

Another asymmetry between value functions and continuation value functions is that the latter are typically smoother. For example, in job search problems, the value function is usually kinked at the reservation wage, while the continuation value function is smooth. Greater smoothness comes from taking expectations over stochastic transitions: integration acts as a smoothing operation. Like lower dimensionality, increased smoothness facilitates analysis and computation.

4.1.4.1 The Continuation Value Operator

Let $h^*$ be the continuation value function for the optimal stopping problem defined in (4.10). To compute $h^*$ directly we begin with the optimal stopping version of the Bellman equation evaluated at $v^*$ and rewrite it as

v^*(x') = \max \left\{ e(x'), h^*(x') \right\} \qquad (x' \in \Xsf).

(4.11)

Taking expectations of both sides of the equation conditional on current state $x$ produces $\sum_{x'} v^*(x') P(x, x') = \sum_{x'} \max \left\{ e(x'), h^*(x') \right\} P(x, x')$ . Multiplying by $\beta$ , adding $c(x)$ , and using the definition of $h^*$ , we get

h^*(x) = c(x) + \beta \sum_{x'} \max \left\{ e(x'), h^*(x') \right\} P(x, x') \qquad (x \in \Xsf).

(4.12)

This expression motivates us to introduce a continuation value operator $C \colon \RR^\Xsf \to \RR^\Xsf$ via

(Ch)(x) = c(x) + \beta \sum_{x'} \max \left\{ e(x'), h(x') \right\} P(x, x') \qquad (x \in \Xsf).

(4.13)

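Proposition 4.1.5. The operator $C$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ with respect to the supremum norm, and its unique fixed point in $\RR^\Xsf$ is $h^*$.

Proposition 4.1.5 provides the following alternative method to compute the optimal policy that does not involve VFI: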

(i) Use successive approximations to $h^*$ with $C$ and

(ii) Calculate $\sigma^*$ via $\sigma^*(x) = \1\{e(x) \geq h^*(x)\}$ for each $x \in \Xsf$ .

In Section 4.1.4.2 we discuss settings where this approach is advantageous.
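Steps (i) and (ii) can be sketched as follows for a small hypothetical problem (the numbers are illustrative). The operator `C` implements (4.13) in vectorized form, $Ch = c + \beta P (e \vee h)$, and $v^*$ is recovered afterwards from (4.11).

```julia
# Hypothetical three-state problem (illustrative numbers only)
β = 0.9
P = [0.5 0.3 0.2; 0.2 0.6 0.2; 0.1 0.3 0.6]
c = [1.0, 2.0, 3.0]
e = [10.0, 15.0, 40.0]

# Continuation value operator (4.13) in vectorized form: Ch = c + βP(e ∨ h)
C(h) = c .+ β .* (P * max.(e, h))

"Step (i): successive approximation with C; step (ii): σ*(x) = 1{e(x) ≥ h*(x)}."
function solve_by_continuation(; tol=1e-10, max_iter=100_000)
    h = copy(c)
    for _ in 1:max_iter
        h_new = C(h)
        done = maximum(abs.(h_new - h)) < tol
        h = h_new
        done && break
    end
    return h, Int.(e .>= h)
end

h_star, σ_star = solve_by_continuation()
v_star = max.(e, h_star)               # recover v* from (4.11)
```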

Proof of Proposition 4.1.5.

Fix $f, g \in \RR^\Xsf$ and $x \in \Xsf$. By the triangle inequality and the elementary bound $|\alpha \vee a - \alpha \vee b| \leq |a - b|$, valid for all $\alpha, a, b \in \RR$, we have

\begin{aligned} |(Cf)(x) - (Cg)(x)| & \leq \beta \sum_{x'} \left| \max \left\{ e(x'), f(x') \right\} - \max \left\{ e(x'), g(x') \right\} \right| P(x, x') \\ & \leq \beta \sum_{x'} \left| f(x') - g(x') \right| P(x, x'). \end{aligned}

The right hand side is dominated by $\beta \| f - g \|_\infty$ . Taking the maximum on the left hand side gives

\| Cf - Cg \|_\infty \leq \beta \| f - g\|_\infty,

which confirms that $C$ is a contraction of modulus $\beta$ on $\RR^\Xsf$ .

From the contraction property, we know that $C$ has exactly one fixed point in $\RR^\Xsf$ ; (4.12) implies that $h^*$ is that fixed point. ◻

4.1.4.2 Dimensionality Reduction

The beginning of Section 4.1.4 mentioned that switching from value function iteration to continuation value iteration can substantially reduce the dimensionality of the problem in some cases. Here we describe situations where this works.

To begin, let $\Wsf$ and $\Zsf$ be two finite sets and suppose that $\phi \in \dD(\Wsf)$ and $Q \in \mopz$. Let $(W_t)$ be IID with distribution $\phi$ and let $(Z_t)$ be a $Q$-Markov chain on $\Zsf$. If $(W_t)$ and $(Z_t)$ are independent, then $(X_t)$ defined by $X_t = (W_t, Z_t)$ is $P$-Markov on $\Xsf \coloneq \Wsf \times \Zsf$, where

P(x, x') = P((w, z), (w', z')) = \phi(w') Q(z, z').

Suppose that the continuation reward depends only on $z$ so that we can write the Bellman operator as

(Tv)(w, z) = \max \left\{ e(w, z) ,\, c(z) + \beta \sum_{w' \in \Wsf}\sum_{z' \in \Zsf} v(w', z') \phi(w') Q(z, z') \right\}.

(4.14)

Since the right hand side depends on both $w$ and $z$ , the Bellman operator acts on an $n$ -dimensional space, where $n \coloneq |\Xsf| = |\Wsf| \times |\Zsf|$ .

However, if we inspect the right hand side of (4.14), we see that the continuation value function depends only on $z$ . Dependence on $w$ vanishes because $w$ does not help predict $w'$ . Thus, the continuation value function is an object in $|\Zsf|$ -dimensional space. The continuation value operator

(Ch)(z) = c(z) + \beta \sum_{w'} \sum_{z'} \max \left\{ e(w', z'), h(z') \right\} \phi(w') Q(z, z') \qquad (z \in \Zsf)

(4.15)

acts on this lower-dimensional space.
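The sketch below illustrates the reduction with a hypothetical example in which $|\Wsf| = |\Zsf| = 2$, $c(z) = 0.1 z$ and $e(w, z) = w + z$ (all assumptions). It iterates (4.14) on the full $|\Wsf| \times |\Zsf|$ array and (4.15) on a vector of length $|\Zsf|$, and then checks that both routes give the same continuation values.

```julia
# Hypothetical example: W_t IID on w_vals with pmf φ, Z_t Markov on z_vals with matrix Q
w_vals, φ = [-1.0, 1.0], [0.5, 0.5]
z_vals, Q = [1.0, 3.0], [0.9 0.1; 0.2 0.8]
β = 0.9
c = 0.1 .* z_vals                    # continuation reward depends only on z (assumed form)
E = w_vals .+ z_vals'                # E[i, j] = e(w_i, z_j) = w_i + z_j (assumed form)

# Bellman operator (4.14) acting on V[i_w, i_z]: |W| × |Z| unknowns
function T(V)
    cont = c .+ β .* (Q * (V' * φ))  # expected continuation, a function of z only
    return max.(E, cont')            # the same continuation value for every w
end

# Continuation value operator (4.15) acting on h[i_z]: only |Z| unknowns
C(h) = c .+ β .* (Q * (max.(E, h')' * φ))

"Successive approximation for a contraction f, starting from x0."
function fixed_point(f, x0; tol=1e-10, max_iter=100_000)
    x = x0
    for _ in 1:max_iter
        x_new = f(x)
        done = maximum(abs.(x_new - x)) < tol
        x = x_new
        done && break
    end
    return x
end

V_star = fixed_point(T, zeros(length(w_vals), length(z_vals)))
h_star = fixed_point(C, zeros(length(z_vals)))
```

With larger grids the saving grows, since `C` works with $|\Zsf|$ unknowns regardless of the size of $\Wsf$.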

More examples of dimensionality reduction are illustrated in the applications.

4.1.4.3 Application to Firm Value

Consider the firm valuation problem from Section 4.1.2 but suppose now that scrap value fluctuates with prices of underlying assets. For simplicity let’s assume that scrap value at each time $t$ is given by the IID sequence $(S_t)$ , where each $S_t$ has density $\phi$ on $\RR_+$ . The corresponding Bellman operator is

(Tv)(z, s) = \max \left\{ s, \pi(z) + \beta \sum_{z'} \int v(z', s') \phi(s') \diff s' Q(z, z') \right\}.

We can convert this problem to a finite-state space optimal stopping problem by discretizing the density $\phi$ onto a finite grid contained in $\RR_+$ . However, since continuation values depend only on $z$ , a better approach is to switch to a continuation value operator.
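As an illustration, suppose (purely as an assumption) that $\phi$ is the exponential density with rate $\lambda$. Then $\int \max\{s', h\} \phi(s') \diff s' = h + e^{-\lambda h}/\lambda$ when $h \geq 0$ and $1/\lambda$ when $h < 0$, so the inner integral has a closed form and the continuation value operator acts on $\RR^\Zsf$ alone, with no grid for $s$. The profits and transition matrix below are hypothetical.

```julia
# Assumption for illustration: S_t ~ Exponential with rate λ (mean 1/λ), so that
# E max{S, h} = h + exp(-λh)/λ for h ≥ 0, and E max{S, h} = E S = 1/λ for h < 0
expected_max(h, λ) = h >= 0 ? h + exp(-λ * h) / λ : 1 / λ

# Continuation value operator: (Ch)(z) = π(z) + β Σ_{z'} E max{S', h(z')} Q(z, z')
firm_C(h, π_vals, Q, β, λ) = π_vals .+ β .* (Q * expected_max.(h, λ))

"Iterate firm_C to (approximate) convergence, starting from h = 0."
function solve_h(π_vals, Q, β, λ; tol=1e-10, max_iter=100_000)
    h = zeros(length(π_vals))
    for _ in 1:max_iter
        h_new = firm_C(h, π_vals, Q, β, λ)
        done = maximum(abs.(h_new - h)) < tol
        h = h_new
        done && break
    end
    return h
end

π_vals = [-1.0, 0.5, 2.0]                             # hypothetical profits π(z)
Q = [0.80 0.15 0.05; 0.10 0.80 0.10; 0.05 0.15 0.80]  # hypothetical transition matrix
β, λ = 0.95, 0.1                                      # mean scrap value 1/λ = 10
h_star = solve_h(π_vals, Q, β, λ)
# The firm exits at (z, s) iff s ≥ h*(z); the probability of that event is P(S ≥ h*(z))
exit_prob = min.(1.0, exp.(-λ .* h_star))
```

Lowering $\lambda$ raises the mean scrap value and shifts the exponential distribution up in the sense of first-order stochastic dominance, so the computed $h^*$ should not fall when $\lambda$ decreases.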

Exercise 4.1.13

In Section 2.2.4 we defined stochastic dominance for distributions on finite sets. For densities $\phi$ and $\psi$ on $\RR_+$, the definition is similar: we say that $\psi$ stochastically dominates $\phi$ and write $\phi \lefsd \psi$ if $\int u(x) \phi(x) \diff x \leq \int u(x) \psi(x) \diff x$ for every increasing function $u$ on $\RR_+$.^[2] With this definition, show that if $\phi_a$ and $\phi_b$ are two alternative densities for scrap value and $\phi_a \lefsd \phi_b$, then $\sigma^*_a \geq \sigma^*_b$ pointwise on $\Zsf$, where $\sigma^*_i$ is the optimal policy corresponding to density $\phi_i$ for $i \in \{a, b\}$. Interpret this result.

Solution to Exercise 4.1.13

Let $\phi_a$ and $\phi_b$ be as stated. For $i \in \{a, b\}$ and $h \in \RR^\Zsf$ , let

(C_i h)(z) = \pi(z) + \beta \sum_{z'} \int \max \{s', h(z')\} \phi_i(s') \diff s' Q(z, z') .

Since, for each $z' \in \Zsf$ , the function $s' \mapsto \max \{s', h(z')\}$ is increasing, we have

\sum_{z'} \int \max \{s', h(z')\} \phi_a(s') \diff s' Q(z, z') \leq \sum_{z'} \int \max \{s', h(z')\} \phi_b(s') \diff s' Q(z, z') .

Hence $C_a h \leq C_b h$ for all $h \in \RR^\Zsf$ . As $C_b$ is order-preserving and globally stable, Proposition 2.2.7 implies that the fixed point of $C_b$ dominates the fixed point of $C_a$ . That is, $h^*_a \leq h^*_b$ . But then, for any $z \in \Zsf$ , we have $h^*_a(z) \leq h^*_b(z)$ and hence

\sigma_b^*(z) = \1\{s \geq h^*_b(z)\} \leq \1\{s \geq h^*_a(z)\} = \sigma_a^*(z).

The interpretation of $\sigma_b^* \leq \sigma_a^*$ is that the firm exits at fewer states when the scrap value distribution is $\phi_b$. This makes sense, since the current scrap value offer $s$ is already known, while future offers are more promising under $\phi_b$ than under $\phi_a$. Hence continuing is more attractive.

4.2 Further Applications

In this section, we discuss some additional applications of optimal stopping.

4.2.1 American Options

We discussed American options briefly in Example 4.1.2. Here we investigate this class of derivatives more carefully. We focus on American call options that provide the right to buy a particular stock or bond at a fixed strike price $K$ at any time before a set expiration date. The market price of the asset at time $t$ is denoted by $S_t$ .

We discussed a case in which the expiration date is infinity in Example 4.1.2. However, options without termination dates – also called perpetual options – are rare in practice. Hence we focus on the finite-horizon case. We are interested in computing the expected value of holding the option when discounting with a fixed interest rate, a typical assumption when pricing American options.

Finite horizon American options can be priced by backward induction in an approach like the one we used for the finite horizon job search problem discussed in Chapter 1. Alternatively, we can embed finite horizon options into the theory of infinite-horizon optimal stopping. We use the second approach here, since we have just presented a theory for infinite-horizon optimal stopping.

To this end, we take $T \in \NN$ to be a fixed integer indicating the date of expiration. The option is purchased at $t=0$ and can be exercised at any $t \in \NN$ with $t \leq T$ . To include $t$ in the current state, we set

\Tsf \coloneq \{1, \ldots, T+1\} \quad \text{and} \quad m(t) \coloneq \min\{t+1, T+1\} \text{ for all } t \in \Tsf.

The idea is that time is updated via $t' = m(t)$ , so that time increments at each update until $t=T+1$ . After that we hold $t$ constant. Bounding time at $T+1$ keeps the state space finite.

We assume that a stock price $S_t$ evolves according to

S_t = Z_t + W_t \quad \text{where} \quad (W_t)_{t \geq 0} \iidsim \phi \in \dD(\Wsf).

Here $(Z_t)_{t \geq 0}$ is $Q$ -Markov on finite set $\Zsf$ for some $Q \in \mopz$ and $\Wsf$ is also finite. This means that the share price has both persistent and transient stochastic components. If we set parameters so that $(Z_t)_{t \geq 0}$ resembles a random walk, price changes will be difficult to predict.

To form a Section 4.1.1.1 optimal stopping problem, we must specify the state space and the Markov operator $P \in \mopx$ that governs the state process. We set the state space to $\Xsf \coloneq \Tsf \times \Wsf \times \Zsf$ and

P((t, w, z), (t', w', z')) \coloneq \1\{t' = m(t)\} \phi(w') Q(z, z').

Thus, time updates deterministically via $t' = m(t)$ and $z'$ and $w'$ are drawn independently from $Q(z, \cdot)$ and $\phi$ respectively.

As for a perpetual option, the continuation reward is zero and the discount factor is $\beta \coloneq 1/(1+r)$, where $r > 0$ is a fixed risk-free rate. The exit reward can be expressed as $\1\{t \leq T\} (S_t - K)$, so that exercising at time $t$ earns the owner $S_t - K$ up to expiry and zero thereafter. In terms of the state $(t, w, z)$, the exit reward is

e(t, w, z) \coloneq \1\{t \leq T\} [z + w - K].

The Bellman equation can be written

v(t, w, z) = \max \left\{ e(t, w, z) ,\, \beta \sum_{w'}\sum_{z'} v(t', w', z') \phi(w') Q(z, z') \right\},

where $t' = m(t)$ . This value function $v(t, w, z)$ neatly captures the value of the option: It is the maximum of current exercise value and the discounted expected value of carrying the option over to the next period.

Since the problem just described is an optimal stopping problem in the sense of Section 4.1.1.1, all of the optimality results attained for that problem apply. In particular, iterates of the Bellman operator converge to the value function $v^*$ and, moreover, a policy is optimal if and only if it is $v^*$ -greedy.

We can do better than VFI. Since $(W_t)_{t \geq 0}$ is IID and appears only in the exit reward, we can reduce dimensionality by switching to the continuation value operator, which, in this case, can be expressed as

(Ch)(t, z) = \beta \sum_{z'} \sum_{w'} \max \left\{ e(t', w', z'), \, h(t', z') \right\} \phi(w') Q(z, z').

(4.16)

As proved in Section 4.1.4, the unique fixed point of $C$ is the continuation value function $h^*$, and $C^k h \to h^*$ as $k \to \infty$ for all $h \in \RR^{\Tsf \times \Zsf}$. With the fixed point in hand, we can compute the optimal policy as

\sigma^*(t, w, z) = \1 \left\{ e(t, w, z) \geq h^*(t, z) \right\}.

Here $\sigma^*(t, w, z) = 1$ prescribes exercising the option at time $t$ .
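The sketch below computes $h^*$ by iterating (4.16) on a small, hypothetical parameterization (a five-point reflecting random walk for $Z_t$ and $T = 10$, stored as `T_exp`); it does not use QuantEcon and is not the parameterization of Listing 2. Since `t_vals` is $1, \ldots, T+1$, the row index of `H` coincides with $t$.

```julia
# Hypothetical small parameterization (not the values in Listing 2)
T_exp = 10                                # expiration date T
t_vals = collect(1:T_exp+1)               # {1, ..., T+1}, so t_vals[t] == t
K = 10.0                                  # strike price
β = 1 / (1 + 0.01)                        # discount factor with r = 0.01
w_vals, φ = [-0.3, 0.3], [0.5, 0.5]       # IID transient component W_t
z_vals = [8.0, 9.0, 10.0, 11.0, 12.0]     # persistent component Z_t
n_z = length(z_vals)
Q = zeros(n_z, n_z)                       # reflecting random walk on z_vals
for i in 1:n_z
    Q[i, max(i - 1, 1)] += 0.25
    Q[i, min(i + 1, n_z)] += 0.25
    Q[i, i] += 0.5
end

m(t) = min(t + 1, T_exp + 1)                          # time update t' = m(t)
exit_reward(t, w, z) = (t <= T_exp) * (z + w - K)     # e(t, w, z) = 1{t ≤ T}(z + w - K)

"Continuation value operator (4.16); H[t, j] stores h(t, z_j)."
function option_C(H)
    H_new = similar(H)
    for t in t_vals
        tp = m(t)
        # g[j] = Σ_{w'} max{e(t', w', z_j), h(t', z_j)} φ(w')
        g = [sum(max(exit_reward(tp, w, z), H[tp, j]) * p for (w, p) in zip(w_vals, φ))
             for (j, z) in enumerate(z_vals)]
        H_new[t, :] = β .* (Q * g)
    end
    return H_new
end

function solve_option(; tol=1e-12, max_iter=10_000)
    H = zeros(length(t_vals), n_z)
    for _ in 1:max_iter
        H_new = option_C(H)
        done = maximum(abs.(H_new - H)) < tol
        H = H_new
        done && break
    end
    return H
end

H_star = solve_option()
# Optimal policy: exercise at (t, w_i, z_j) iff e(t, w_i, z_j) ≥ h*(t, z_j)
exercise(t, i, j) = exit_reward(t, w_vals[i], z_vals[j]) >= H_star[t, j]
```

Because $t' = m(t)$ moves forward in time except at $T+1$, the iteration in this example settles after roughly $T+1$ steps, mirroring backward induction.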

Figure 4.3 provides a visual representation of optimal actions under the default parameterization described in Listing 2. Each of the three subfigures shows contour lines of the net exit reward $f(t, w, z) \coloneq e(t, w, z) - h^*(t, z)$, viewed as a function of $(w, z)$, when $t$ is held fixed. The date $t$ for each subfigure is shown in its title. The optimal policy exercises the option when $f(t, w, z) \geq 0$.

```julia
using QuantEcon, LinearAlgebra, IterTools

"Creates an instance of the option model with S_t = Z_t + W_t."
function create_american_option_model(;
        n=100, μ=10.0,  # Markov state grid size and mean value
        ρ=0.98, ν=0.2,  # persistence and volatility for Markov state
        s=0.3,          # volatility parameter for W_t
        r=0.01,         # interest rate
        K=10.0, T=200)  # strike price and expiration date
    t_vals = collect(1:T+1)
    mc = tauchen(n, ρ, ν)
    z_vals, Q = mc.state_values .+ μ, mc.p
    w_vals, φ, β = [-s, s], [0.5, 0.5], 1 / (1 + r)
    e(t, i_w, i_z) = (t ≤ T) * (z_vals[i_z] + w_vals[i_w] - K)
    return (; t_vals, z_vals, w_vals, Q, φ, T, β, K, e)
end
```

Program 2: Pricing an American option (american_option.jl)

In each subfigure, the exercise region, which is the set of $(w, z)$ such that $f(t, w, z) \geq 0$, corresponds to the northeast part of the figure, where $w$ and $z$ are both large. The boundary between exercising and continuing is the zero contour line, which is shown in black. Notice that the exercise region expands with $t$. This is because the value of waiting falls as the number of remaining exercise dates declines.
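The claim that the exercise region expands with $t$ can be checked numerically with a sketch that computes $f(t, w, z) = e(t, w, z) - h^*(t, z)$ on a small grid and counts the $(w, z)$ pairs with $f \geq 0$ at each date $t \leq T$. The five-state chain for $(Z_t)$, the parameter values, the assumption $m(t) = \min\{t+1, T+1\}$, and the helper `solve_h` are hypothetical and are not taken from american_option.jl.

```julia
# Hypothetical helper: iterate the continuation value operator C of (4.16),
# assuming m(t) = min(t + 1, T + 1). Rows of h index t = 1, ..., T + 1 and
# columns index the states of Z.
function solve_h(z_vals, Q, w_vals, φ, β, K, T; tol=1e-10, max_iter=100_000)
    e(t, i_w, i_z) = (t ≤ T) * (z_vals[i_z] + w_vals[i_w] - K)  # exit reward
    m(t) = min(t + 1, T + 1)                                    # next date t'
    n_z, n_w = length(z_vals), length(w_vals)
    h = zeros(T + 1, n_z)
    for _ in 1:max_iter
        h_new = [β * sum(max(e(m(t), i_w, j), h[m(t), j]) * φ[i_w] * Q[i, j]
                         for i_w in 1:n_w, j in 1:n_z)
                 for t in 1:T+1, i in 1:n_z]
        err = maximum(abs.(h_new .- h))
        h = h_new
        err < tol && break
    end
    return h
end

# Hypothetical five-state chain for Z that moves only to neighbouring states
z_vals = collect(9.0:0.5:11.0)
Q = [i == j ? 0.8 : (abs(i - j) == 1 ? 0.1 : 0.0) for i in 1:5, j in 1:5]
Q = Q ./ sum(Q, dims=2)                  # rescale boundary rows to sum to one
w_vals, φ, β, K, T = [-0.3, 0.3], [0.5, 0.5], 0.99, 10.0, 10
h_star = solve_h(z_vals, Q, w_vals, φ, β, K, T)

# Net exit reward f(t, w, z) = e(t, w, z) - h*(t, z) for t ≤ T, and the number
# of grid points (w, z) in the exercise region {f ≥ 0} at each date
f(t, i_w, i_z) = (z_vals[i_z] + w_vals[i_w] - K) - h_star[t, i_z]
region_size = [count(f(t, i_w, i_z) ≥ 0 for i_w in eachindex(w_vals), i_z in eachindex(z_vals))
               for t in 1:T]
println("exercise region size by date: ", region_size)
```

Since the exit reward is the same at every date $t \leq T$ while $h^*(t, z)$ weakly falls as $t$ rises, $f$ weakly rises with $t$, which is why the counts never decrease.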

Figure 4.4 provides some simulations of the stock price process $(S_t)_{t \geq 0}$ over the lifetime of the option, again using the default parameterization described in Listing 2. The blue region in the top part of each subfigure contains values of the stock price $S_t = Z_t + W_t$ such that $S_t \geq K$. When the price of the underlying exceeds the strike price in this way, the option is said to be “in the money.” The figure also shows an optimal exercise date, which is the first $t$ such that $e(t, W_t, Z_t) \geq h^*(t, Z_t)$ in a given simulation.
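A simulation in the spirit of this figure can be sketched as follows: draw $(Z_t, W_t)$, record $S_t = Z_t + W_t$, and stop at the first $t \leq T$ with $e(t, W_t, Z_t) \geq h^*(t, Z_t)$. The chain, the parameter values, the assumption $m(t) = \min\{t+1, T+1\}$, and the helpers `solve_h`, `draw`, and `simulate_exercise` are hypothetical.

```julia
using Random

# Hypothetical helper: iterate the continuation value operator C of (4.16),
# assuming m(t) = min(t + 1, T + 1); rows of h index t = 1, ..., T + 1.
function solve_h(z_vals, Q, w_vals, φ, β, K, T; tol=1e-10, max_iter=100_000)
    e(t, i_w, i_z) = (t ≤ T) * (z_vals[i_z] + w_vals[i_w] - K)  # exit reward
    m(t) = min(t + 1, T + 1)                                    # next date t'
    n_z, n_w = length(z_vals), length(w_vals)
    h = zeros(T + 1, n_z)
    for _ in 1:max_iter
        h_new = [β * sum(max(e(m(t), i_w, j), h[m(t), j]) * φ[i_w] * Q[i, j]
                         for i_w in 1:n_w, j in 1:n_z)
                 for t in 1:T+1, i in 1:n_z]
        err = maximum(abs.(h_new .- h))
        h = h_new
        err < tol && break
    end
    return h
end

# Hypothetical helper: draw an index from the probability vector p
draw(rng, p) = min(searchsortedfirst(cumsum(p), rand(rng)), length(p))

# Simulate (Z_t, W_t) for t = 1, ..., T starting from Z_1 = z_vals[i_z]. Returns
# the first t with e(t, W_t, Z_t) ≥ h*(t, Z_t), or `nothing` if the option is
# never exercised, together with the prices S_t = Z_t + W_t up to that date.
function simulate_exercise(h, z_vals, Q, w_vals, φ, K, T, i_z, rng)
    S = Float64[]
    for t in 1:T
        s = z_vals[i_z] + w_vals[draw(rng, φ)]  # S_t = Z_t + W_t
        push!(S, s)
        if s - K ≥ h[t, i_z]                    # exercise condition
            return t, S
        end
        i_z = draw(rng, Q[i_z, :])              # Z_{t+1} drawn from row Z_t of Q
    end
    return nothing, S
end

# Hypothetical five-state chain and parameters
z_vals = collect(9.0:0.5:11.0)
Q = [i == j ? 0.8 : (abs(i - j) == 1 ? 0.1 : 0.0) for i in 1:5, j in 1:5]
Q = Q ./ sum(Q, dims=2)
w_vals, φ, β, K, T = [-0.3, 0.3], [0.5, 0.5], 0.99, 10.0, 10
h_star = solve_h(z_vals, Q, w_vals, φ, β, K, T)

rng = MersenneTwister(42)
τ, S = simulate_exercise(h_star, z_vals, Q, w_vals, φ, K, T, 3, rng)
println("exercise date: ", τ, ", prices: ", S)
```

Since $h^* \geq 0$ in this model, any exercise date found this way satisfies $S_\tau - K \geq h^*(\tau, Z_\tau) \geq 0$, so the option is exercised only when it is in the money.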

Figure 4.3: Exercise region for the American option

Figure 4.4: Simulations for the American option process

4.2.2 Research and Development

Consider a firm that engages in costly research and development (R&D) in order to develop a new product. In each period, the firm decides whether to continue developing the product or to stop developing and start marketing it. For simplicity, we assume that the value of bringing the product to market is a one-time lump sum payment $\pi_t = \pi(X_t)$, where $(X_t)$ is a $P$-Markov chain on a finite set $\Xsf$ with $P \in \mopx$. The flow cost of investing in R&D is $C_t$ per period, where $(C_t)$ is a stochastic process. Future payoffs are discounted at rate $r > 0$, and we set $\beta \coloneq 1/(1+r)$.

4.2.2.1 Constant R&D Costs

As a first take on this problem, suppose that $C_t \equiv c$ for some constant $c \in \RR_+$, and formulate an optimal stopping problem with exit reward $e = \pi$ and constant continuation reward $-c$. The Bellman equation is

v(x) = \max \left\{ \pi(x), -c + \beta \sum_{x'} v(x') P(x, x') \right\} \qquad (x \in \Xsf).

(4.17)
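For a finite state space, (4.17) can be solved by value function iteration. The sketch below assumes a user-supplied vector `π_vals` of payoffs $\pi(x)$ and a stochastic matrix `P`; the function name `solve_rd_constant` and the two-state example are hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: VFI for (4.17),
#   v(x) = max{π(x), -c + β Σ_x' v(x') P(x, x')}.
# Returns v and the greedy policy (true = stop R&D and start marketing).
function solve_rd_constant(π_vals, P, c, β; tol=1e-10, max_iter=100_000)
    v = copy(π_vals)                             # initial guess v = π
    for _ in 1:max_iter
        v_new = max.(π_vals, -c .+ β .* (P * v))
        err = maximum(abs.(v_new .- v))
        v = v_new
        err < tol && break
    end
    σ = π_vals .≥ -c .+ β .* (P * v)
    return v, σ
end

# Two states: the product is worthless in state 1 and worth 10 in state 2
P = [0.5 0.5; 0.5 0.5]
v, σ = solve_rd_constant([0.0, 10.0], P, 0.5, 0.95)
```

In this example the firm keeps developing in state 1 even though $\pi = 0$ there, because continuing gives it a chance to reach state 2: the value in state 1 solves $v = -0.5 + 0.475\,(v + 10)$, so $v = 4.25/0.525 \approx 8.095$.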

4.2.2.2 IID R&D Costs

Now suppose that $(C_t)_{t \geq 0}$ is IID with common distribution $\phi \in \dD(\Wsf)$, where $\Wsf$ is the finite set of possible cost values. The Bellman equation is

v(c, x) = \max \left\{ \pi(x), -c + \beta \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x') \right\}.

(4.18)

Since $(C_t)$ is IID, we would ideally like to integrate it out in the manner of Section 4.1.4.2, thereby lowering the dimensionality of the problem. However, note that the continuation value associated with (4.18) is

h(c, x) \coloneq -c + \beta \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x'),

which still depends on $c$, since the current cost is paid only if the firm continues.
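To see this dependence concretely, one can solve (4.18) on the full $(c, x)$ grid by value function iteration and then evaluate $h(c, x)$. The sketch below does this for a hypothetical instance with three cost levels and two Markov states; the names `solve_full`, `c_vals`, and `π_vals` are illustrative.

```julia
using LinearAlgebra

# Hypothetical helper: VFI for (4.18) on the full (c, x) grid;
# v[i_c, i_x] approximates v(c, x).
function solve_full(c_vals, φ, π_vals, P, β; tol=1e-12, max_iter=100_000)
    v = zeros(length(c_vals), length(π_vals))
    for _ in 1:max_iter
        Ev = P * (v' * φ)   # Σ_x' Σ_c' v(c', x') φ(c') P(x, x'), one entry per x
        v_new = [max(π_vals[i_x], -c_vals[i_c] + β * Ev[i_x])
                 for i_c in eachindex(c_vals), i_x in eachindex(π_vals)]
        err = maximum(abs.(v_new .- v))
        v = v_new
        err < tol && break
    end
    return v
end

# Hypothetical instance: three cost levels and two Markov states
c_vals, φ = [0.0, 1.0, 2.0], [0.3, 0.4, 0.3]
π_vals, P, β = [1.0, 5.0], [0.8 0.2; 0.3 0.7], 0.95

v = solve_full(c_vals, φ, π_vals, P, β)
Ev = P * (v' * φ)
# Continuation value h(c, x) on the grid: rows index c, columns index x
h = [-c_vals[i_c] + β * Ev[i_x] for i_c in eachindex(c_vals), i_x in eachindex(π_vals)]
```

Each row of `h` corresponds to one cost level, and the rows differ, so the continuation value in this form cannot be stored as a function of $x$ alone.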

Fortunately, there is a way to eliminate $c$, because it enters the continuation value only through the additive term $-c$. Define the expected value $g(x)$ in state $x$ by

g(x) \coloneq \sum_{x'} \sum_{c'} v(c', x') \phi(c') P(x, x').

(4.19)

With this notation, $h(c, x) = -c + \beta g(x)$. Rewriting the Bellman equation (4.18) in terms of $g$ and relabeling $(c, x)$ as $(c', x')$ gives

v(c', x') = \max \left\{ \pi(x'), -c' + \beta g(x') \right\}.

Multiplying both sides by $\phi(c') P(x, x')$, summing over $(c', x')$, and using the definition of $g$ in (4.19) on the left-hand side gives

g(x) = \sum_{x'} \sum_{c'} \max \left\{ \pi(x'), -c' + \beta g(x') \right\} \phi(c') P(x, x').

(4.20)

This is a functional equation in $g$, a function of $x$ alone. To solve it, we introduce the operator $R$ defined by

(Rg)(x) = \sum_{x'} \sum_{c'} \max \left\{ \pi(x'), -c' + \beta g(x') \right\} \phi(c') P(x, x') \quad (x \in \Xsf).

From Exercise 4.2.3, we see that (4.20) has a unique solution $g^*$ in $\RR^\Xsf$ that can be computed by successive approximation. With $g^*$ in hand, we can compute the optimal policy via

\sigma^*(c, x) = \1 \left\{ \pi(x) \geq -c + \beta g^*(x) \right\}.
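Successive approximation with $R$ and the resulting policy can be sketched as follows. The instance, the tolerance, and the names `R`, `solve_rd_iid`, `c_vals`, and `π_vals` are hypothetical. Note that the iteration runs over the two Markov states only, whereas the full problem (4.18) has one state per $(c, x)$ pair.

```julia
using LinearAlgebra

# The operator R: (Rg)(x) = Σ_x' Σ_c' max{π(x'), -c' + β g(x')} φ(c') P(x, x')
function R(g, c_vals, φ, π_vals, P, β)
    # inner[j] averages max{π(x'), -c' + β g(x')} over c' for x' = x_j
    inner = [sum(max(π_vals[j], -c_vals[k] + β * g[j]) * φ[k] for k in eachindex(c_vals))
             for j in eachindex(π_vals)]
    return P * inner                             # then average over x' given x
end

# Hypothetical helper: iterate g ← Rg, then form σ*(c, x) = 1{π(x) ≥ -c + β g*(x)}
function solve_rd_iid(c_vals, φ, π_vals, P, β; tol=1e-12, max_iter=100_000)
    g = zeros(length(π_vals))
    for _ in 1:max_iter
        g_new = R(g, c_vals, φ, π_vals, P, β)
        err = maximum(abs.(g_new .- g))
        g = g_new
        err < tol && break
    end
    σ = [π_vals[i_x] ≥ -c_vals[i_c] + β * g[i_x]  # true = stop R&D and market
         for i_c in eachindex(c_vals), i_x in eachindex(π_vals)]
    return g, σ
end

# Hypothetical instance: three cost levels and two Markov states
c_vals, φ = [0.0, 1.0, 2.0], [0.3, 0.4, 0.3]
π_vals, P, β = [1.0, 5.0], [0.8 0.2; 0.3 0.7], 0.95
g_star, σ_star = solve_rd_iid(c_vals, φ, π_vals, P, β)
```

Because $-c + \beta g^*(x)$ is decreasing in $c$, the policy has a threshold form in the cost for each $x$: the firm stops developing and starts marketing when $c \geq \beta g^*(x) - \pi(x)$.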

4.3 Chapter Notes

Various textbooks treat optimal stopping in depth, although most use continuous time. Peskir & Shiryaev (2006) and Shiryaev (2007) are good examples.

There are many applications of optimal stopping in economics and finance, with influential early research papers including McCall (1970), Jovanovic (1982), Hopenhayn (1992), and Ericson & Pakes (1995). Arellano (2008) considers borrowing on international financial markets with the option of sovereign default (see Section 8.2.1.5). Riedel (2009) studies optimal stopping under Knightian uncertainty. Fajgelbaum et al. (2017) include an optimal stopping problem for firms in a model of uncertainty traps.

The firm problem with optimal exit has been used to analyze firm dynamics and firm size distributions in equilibrium models with heterogeneous firms. Hopenhayn (1992) is the classic reference. Perla & Tonetti (2014) construct a growth model in which firms at the bottom of the productivity distribution imitate more productive firms. Carvalho & Grassi (2019) analyze business cycles in a setting of firm growth with exit and a Pareto distribution of firms.

Infinite-duration (perpetual) American options are analyzed in Mordecki (2002). Practical methods for pricing American options are provided by Longstaff & Schwartz (2001), Rogers (2002), and Kohler et al. (2010).

Replacement problems are an important class of optimal stopping problems not treated in this chapter. An important early paper by Rust (1987) uses dynamic programming to find optimal replacement policies for engine parts and goes on to fit the model to data. Section 5.3.1 discusses structural estimation in the style of Rust (1987) and others.

Footnotes

We are studying American options in discrete time. Options with discrete exercise times are sometimes called Bermudan options. References for the continuous-time case are provided in Section 4.3.
Actually, in most definitions, $u$ is also restricted to be bounded and measurable, in order to ensure that the integrals are finite. These technicalities can be ignored in the exercise.

References

Peskir, G., & Shiryaev, A. (2006). Optimal Stopping and Free-boundary Problems. Springer Verlag.
Shiryaev, A. N. (2007). Optimal Stopping Rules (Vol. 8). Springer Science & Business Media.
McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics, 84(1), 113–126.
Jovanovic, B. (1982). Selection and the Evolution of Industry. Econometrica, 50(3), 649–670.
Hopenhayn, H. A. (1992). Entry, exit, and firm dynamics in long run equilibrium. Econometrica, 60, 1127–1150.
Ericson, R., & Pakes, A. (1995). Markov-perfect industry dynamics: A framework for empirical work. The Review of Economic Studies, 62(1), 53–82.
Arellano, C. (2008). Default risk and income fluctuations in emerging economies. American Economic Review, 98(3), 690–712.
Riedel, F. (2009). Optimal stopping with multiple priors. Econometrica, 77(3), 857–908.
Fajgelbaum, P. D., Schaal, E., & Taschereau-Dumouchel, M. (2017). Uncertainty traps. The Quarterly Journal of Economics, 132(4), 1641–1692.
Perla, J., & Tonetti, C. (2014). Equilibrium imitation and growth. Journal of Political Economy, 122(1), 52–76.
Carvalho, V. M., & Grassi, B. (2019). Large firm dynamics and the business cycle. American Economic Review, 109(4), 1375–1425.
Mordecki, E. (2002). Optimal stopping and perpetual options for Lévy processes. Finance and Stochastics, 6(4), 473–493.
Longstaff, F. A., & Schwartz, E. S. (2001). Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14(1), 113–147.
Rogers, L. C. (2002). Monte Carlo valuation of American options. Mathematical Finance, 12(3), 271–286.
Kohler, M., Krzyżak, A., & Todorovic, N. (2010). Pricing of high-dimensional American options by neural networks. Mathematical Finance, 20(3), 383–410.