Introduction - Dynamic Programming Volume I: Finite States

The temporal structure of a typical dynamic program is

The state $X_t$ is a vector listing current values of variables deemed relevant to choosing the current action. The action $A_t$ is a vector describing choices of a set of decision variables. If $T < \infty$ , then the problem has a finite horizon. Otherwise it is an infinite horizon problem. Figure 1.1 illustrates the first two rounds of a dynamic program. As shown in the figure, a rule for updating the state depends on the current state and action.

Dynamic programming provides a way to maximize the expected lifetime reward of a decision-maker who receives a prospective reward sequence $(R_t)_{t \geq 0}$ and who confronts a system that maps today’s state and control into the next period’s state. A lifetime reward is an aggregation of the individual period rewards $(R_t)_{t \geq 0}$ into a single value. An example of lifetime reward is an expected discounted sum $\EE \sum_{t \geq 0} \beta^t R_t$ for some $\beta \in (0,1)$ .
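As a small illustration of aggregating period rewards into a lifetime reward, the following sketch evaluates a discounted sum along one deterministic, finite reward path, so no expectation is needed. The function name `discounted_sum`, the value of $\beta$, and the reward path are assumptions chosen for illustration.

```julia
# Discounted sum of a finite reward path: sum over t of β^t * R_t.
# β and the reward path are illustrative values, not taken from the text.
discounted_sum(rewards, β) =
    sum(β^t * r for (t, r) in zip(0:length(rewards)-1, rewards))

β = 0.96
rewards = fill(1.0, 200)              # constant reward of 1 for 200 periods
lifetime = discounted_sum(rewards, β)
println(lifetime)                     # bounded above by 1 / (1 - β) = 25
```

With a constant reward of one, the sum equals $(1 - \beta^{200})/(1 - \beta)$, which lies just below the infinite-horizon value $1/(1-\beta)$.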

Dynamic programming has a vast array of applications, from robotics and artificial intelligence to the sequencing of DNA. Dynamic programming is used every day to control aircraft, route shipping, test products, recommend information on media platforms, and solve research problems. Some companies produce specialized computer chips that are designed for specific dynamic programs.

Within economics and finance, dynamic programming is applied to topics including unemployment, monetary policy, fiscal policy, asset pricing, firm investment, wealth dynamics, inventory control, commodity pricing, sovereign default, the division of labor, natural resource extraction, human capital accumulation, retirement decisions, portfolio choice, and dynamic pricing. We discuss some of these applications in the rest of the book.

The core theory of dynamic programming is relatively simple and concise. But implementation can be computationally demanding. That situation provides one of the major challenges facing the field of dynamic programming.

In this book, we discuss fundamental theory, traditional economic applications, and recent applications with computationally demanding environments. We also cover recent trends towards more sophisticated specifications of lifetime rewards, often called recursive preferences. Throughout the book, theory and computation are combined, since, for interesting problems, brute-force computation is futile, while theory alone provides limited insights. The interplay between interesting applications, fundamental theory, computational methods, and evolving hardware capability makes dynamic programming exciting.

1.1 Bellman Equations

In this section, we introduce the recursive structure of dynamic programming in a simple setting. After solving a finite-horizon model, we consider an infinite-horizon version and explain how it produces a system of nonlinear equations. Then we turn to methods for solving such systems.

1.1.1 Finite-Horizon Job Search

We begin with a celebrated model of job search created by McCall (1970). McCall analyzed the decision problem of an unemployed worker in terms of current and prospective wage offers, impatience, and the availability of unemployment compensation. Here we study a simple version of the model in which essential ideas of dynamic programming are particularly clear.

Readers who are familiar with Bellman equations can skim this section quickly and proceed directly to Section 1.2.

1.1.1.1 A Two-Period Problem

Imagine someone who begins her working life at time $t=1$ without employment. While unemployed, she receives a new job offer paying wage $W_t$ at each date $t$ . She can accept the offer and work permanently at that wage level or reject the offer, receive unemployment compensation $c$ , and draw a new offer next period. We assume that the wage offer sequence is IID and nonnegative, with distribution $\phi$ . In particular,

$\Wsf \subset \RR_+$ is a finite set of possible wage outcomes and
$\phi \colon \Wsf \to [0, 1]$ is a probability distribution on $\Wsf$ , assigning a probability $\phi(w)$ to each possible wage outcome $w$ .

The worker is impatient. Impatience is parameterized by a time discount factor $\beta \in (0, 1)$ , so that the present value of a next-period payoff of $y$ dollars is $\beta y$ . Since $\beta < 1$ , the worker will be tempted to accept reasonable offers, rather than to wait for better ones. A key question is how long to wait.

Suppose as a first step that working life is just two periods. To solve our problem, we work backwards, starting at the final date $t=2$ , after $W_2$ has been observed.^[1] If she is already employed, the worker has no decision to make: She continues working at her current wage. If she is unemployed, then she should take the largest of $c$ and $W_2$ .

Now we step back to $t=1$ . At this time, having received offer $W_1$ , the unemployed worker’s options are (a) accept $W_1$ and receive it in both periods or (b) reject it, receive unemployment compensation $c$ , and then, in the second period, choose the maximum of $W_2$ and $c$ .

Let’s assume that the worker seeks to maximize expected present value (EPV). The EPV of option (a) is $W_1 + \beta W_1$ , which is also called the stopping value. The EPV of option (b), also called the continuation value, is $h_1 \coloneq c + \beta \, \EE \max\{c, W_2\}$ . More explicitly,

h_1 = c + \beta \sum_{w' \in \Wsf} v_2(w') \phi(w'), \quad \text{where} \quad v_2(w) \coloneq \max\{c, w\}.

(1.2)

The optimal choice at $t=1$ is now clear: Accept the offer if $W_1 + \beta W_1 \geq h_1$ and reject otherwise. A decision tree is shown in Figure 1.2.

Figure 1.2: Decision tree for a two-period problem

1.1.1.2 Comments on Information

In determining the optimal choice, we assumed that the worker (a) cares about expected values and (b) knows how to compute them. In Chapter 7 and Chapter 8 we discuss how to extend or weaken these assumptions. Some of these extensions allow decision-makers to focus on measurements that differ from expected values. Other extensions assume that the decision-maker does not know underlying probability distributions. For now we put these issues aside and return to the setup discussed in Section 1.1.1.1.

1.1.1.3 Value Functions

A key idea in dynamic programming is to use “value functions” to track maximal lifetime rewards from a given state at a given time. The time 2 value function $v_2$ defined in (1.2) returns the maximum value obtained in the final stage for each possible realization of the time 2 wage offer. The time 1 value function $v_1$ evaluated at $w \in \Wsf$ is

v_1(w) \coloneq \max \left\{ w + \beta w ,\, c + \beta \, \sum_{w' \in \Wsf} v_2(w') \phi(w') \right\}.

(1.3)

It represents the present value of expected lifetime income after receiving the first offer $w$ , conditional on choosing optimally in both periods.

Figure 1.3: The value function $v_1$ and the reservation wage

The value function is shown in Figure 1.3. This figure also shows the reservation wage

w_1^* \coloneq \frac{h_1}{1+\beta}.

(1.4)

It is the $w$ that solves the indifference condition

w + \beta w = c + \beta \, \sum_{w' \in \Wsf} v_2(w') \phi(w'),

and equates the value of stopping to the value of continuing. For an offer $W_1$ above $w_1^*$ , the stopping value exceeds the continuation value. For an offer below the reservation wage, the reverse is true. Hence, the optimal choice for the worker at $t=1$ is completely described by the reservation wage.

Parameters and functions underlying the figure are shown in Listing 1.

using Distributions

"Creates an instance of the job search model, stored as a NamedTuple."
function create_job_search_model(;
        n=50,        # wage grid size
        w_min=10.0,  # lowest wage
        w_max=60.0,  # highest wage
        a=200,       # wage distribution parameter
        b=100,       # wage distribution parameter
        β=0.96,      # discount factor
        c=10.0       # unemployment compensation
    )
    w_vals = collect(LinRange(w_min, w_max, n+1))
    ϕ = [pdf(BetaBinomial(n, a, b), k) for k in 0:n]  # probability of each of the n+1 grid points
    return (; n, w_vals, ϕ, β, c)
end

" Computes lifetime value at t=1 given current wage w_1 = w. "
function v_1(w, model)
    (; n, w_vals, ϕ, β, c) = model
    h_1 = c + β * max.(c, w_vals)'ϕ
    return max(w + β * w, h_1)
end

" Computes reservation wage at t=1. "
function res_wage(model)
    (; n, w_vals, ϕ, β, c) = model
    h_1 = c + β * max.(c, w_vals)'ϕ
    return h_1 / (1 + β)
end

Listing 1: Computing $v_1$ and $w^*_1$ (two_period_job_search.jl)

Equation (1.4) is instructive. We can see that higher unemployment compensation $c$ shifts up the continuation value $h_1$ and increases the reservation wage. As a result, the worker will, on average, spend more time unemployed when unemployment compensation is higher.
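The comparative static in $c$ can be checked numerically. The sketch below recomputes $w_1^* = h_1/(1+\beta)$ for several values of $c$. To keep it self-contained, it assumes a uniform distribution $\phi$ on an evenly spaced wage grid rather than the beta-binomial specification of Listing 1; the function name `reservation_wage` and all parameter values are illustrative.

```julia
# Reservation wage w_1^* = h_1 / (1 + β), with h_1 = c + β * Σ max(c, w') ϕ(w').
# The wage grid and the uniform ϕ are illustrative assumptions.
function reservation_wage(c; β=0.96, w_vals=collect(10.0:1.0:60.0))
    ϕ = fill(1 / length(w_vals), length(w_vals))   # uniform wage distribution
    h_1 = c + β * sum(max.(c, w_vals) .* ϕ)        # continuation value
    return h_1 / (1 + β)
end

for c in (5.0, 10.0, 20.0, 40.0)
    println("c = $c  =>  w_1^* = ", round(reservation_wage(c), digits=3))
end
```

Under these assumptions the printed reservation wages rise with $c$, in line with the discussion of (1.4).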

1.1.1.4 Three Periods

Now let’s suppose that working life begins at $t=0$ , so that it covers the three periods $t=0,1,2$ . Figure 1.4 shows the decision tree for the three periods. Notice that the subtree containing nodes 1 and 2 is just the decision tree for the two-period problem in Figure 1.2. We will use this to find optimal actions.

Figure 1.4: Decision tree for the job seeker

At $t=0$ , the value of accepting the current offer $W_0$ is $W_0 + \beta W_0 + \beta^2 W_0$ , while the maximal value of rejecting and waiting is $c$ plus, after discounting by $\beta$ , the maximum value that can be obtained by behaving optimally from $t=1$ . We have already calculated this value: It is just $v_1(W_1)$ , as given in (1.3)!

The maximal time zero value $v_0(w)$ is the maximum of the value of these two options, given $W_0 = w$ , so we can write

v_0(w) = \max \left\{ w + \beta \, w + \beta^2 \, w ,\, c + \beta \, \sum_{w' \in \Wsf} v_1(w') \phi(w') \right\}.

(1.5)

By plugging $v_1$ from (1.3) into this expression, we can determine $v_0$ , as well as the optimal action, the one that achieves the largest value in the max term in (1.5).

Figure 1.4 illustrates how the backward induction process works. The last-period value function $v_2$ is trivial to obtain. With $v_2$ in hand, we can compute $v_1$ . With $v_1$ in hand, we can compute $v_0$ . Once all the value functions are available, we can calculate whether to accept or reject the current offer at each point in time.

Notice how we broke the three-period problem down into a pair of two-period problems, given by (1.3) and (1.5). Breaking many-period problems down into a sequence of two-period problems is the essence of dynamic programming. The recursive relationships between $v_0$ and $v_1$ in (1.5), as well as between $v_1$ and $v_2$ in (1.3), are examples of what are called Bellman equations. We will see many other examples.
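The backward induction steps for the three-period problem can be sketched in a few lines. The code computes $v_2$, then $v_1$ from (1.3), then $v_0$ from (1.5), each stored as a vector over the wage grid. The grid, the uniform $\phi$, and the values of $\beta$ and $c$ are assumptions chosen for illustration.

```julia
# Backward induction for the three-period job search problem.
# Wage grid, uniform ϕ, β and c are illustrative assumptions.
w_vals = collect(10.0:1.0:60.0)
ϕ = fill(1 / length(w_vals), length(w_vals))
β, c = 0.96, 10.0

v_2 = max.(c, w_vals)                        # terminal value
h_1 = c + β * sum(v_2 .* ϕ)                  # continuation value at t = 1
v_1 = max.(w_vals .+ β .* w_vals, h_1)       # equation (1.3)
h_0 = c + β * sum(v_1 .* ϕ)                  # continuation value at t = 0
v_0 = max.(w_vals .* (1 + β + β^2), h_0)     # equation (1.5)

# Accept at t = 0 exactly when the stopping value is at least h_0.
accept_0 = w_vals .* (1 + β + β^2) .>= h_0
println("Lowest wage accepted at t = 0: ", w_vals[findfirst(accept_0)])
```

Each value function is obtained from the one that follows it in time, which is the recursive structure described above.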

1.1.2 Infinite Horizon

Next, we consider an infinite horizon problem that in some ways is more challenging but in other ways simpler. On the one hand, the lack of a terminal period means that backward induction requires a subtler justification. On the other hand, the infinite horizon means that the worker always faces an infinite future, so that we only have to study a single value function and need not keep track of the number of remaining periods in the problem. This will become clearer as the section unfolds.^[2]

With this discussion in mind, let us consider a worker who aims to maximize

\EE \sum_{t=0}^{\infty} \beta^t R_t,

(1.6)

where $R_t \in \{c, W_t\}$ is earnings at time $t$ . As before, jobs are permanent, so accepting a job at a given wage means earning that wage in every subsequent period.

Let’s clarify our assumptions:

Here and in what follows, for any finite or countable set $F$ , the symbol $\dD(F)$ indicates the set of distributions on $F$ .

As with the finite-horizon case, infinite-horizon dynamic programming involves a two-step procedure that first assigns values to states and then deduces optimal actions given those values. We begin with an informal discussion and then formalize the main ideas.

To trade off current and future rewards optimally, we need to compare the current payoffs we get from our two choices with the states that those choices lead to and the maximum value that can be extracted from those states. But how do we calculate the maximum value that can be extracted from each state when lifetime is infinite?

Consider first the present expected lifetime value of being employed with wage $w \in \Wsf$ . This case is easy because, under the current assumptions, workers who accept a job are employed forever. Lifetime payoff is

w + \beta w + \beta^2 w + \cdots = \frac{w}{1 - \beta}.

(1.7)

How about the maximum present expected lifetime value attainable when entering the current period unemployed with wage offer $w$ in hand? Denote this (as yet unknown) value by $v^*(w)$ . We call $v^*$ the value function. While $v^*$ is not trivial to pin down, the task is not impossible. Our first step in the right direction is to observe that it satisfies the Bellman equation

v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*(w') \phi(w') \right\},

(1.8)

at every $w \in \Wsf$ . (Here $w'$ is the offer next period.)

Our reasoning is as follows: The first term inside the max operation is the stopping value, or lifetime payoff from accepting current offer $w$ . The second term inside the max operation is the continuation value, or current expected value of rejecting and behaving optimally thereafter. The maximal value is obtained by selecting the largest of these two alternatives.

Note the similarity between (1.8) and our finite horizon Bellman equations (1.3) and (1.5). The only real difference is that the value function is no longer time-dependent. This is because the worker always looks forward toward an infinite horizon, regardless of the current date.

Equation (1.8) is to be solved for a function $v^* \in \RR^\Wsf$ , the set of all functions from $\Wsf$ to $\RR$ . Once we have solved for $v^*$ (assuming this is possible), optimal choices can be made by observing current $w$ and then choosing the largest of the two alternatives on the right-hand side of (1.8), just as we did in the finite horizon case. This idea – that optimal choices can be made by computing the value function and maximizing the right-hand side of the Bellman equation – is called Bellman’s principle of optimality, and will be a cornerstone of what follows. Later we prove it in a general setting.
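To make the objects in (1.8) concrete, the following sketch codes its right-hand side as a map acting on candidate value vectors, together with the accept/reject rule implied by a given candidate. The names `T`, `continuation_value`, and `accept`, the wage grid, the uniform $\phi$, and the parameter values are all assumptions chosen for illustration. The candidate `v` is a guess, not $v^*$.

```julia
# Right-hand side of (1.8) as a map v ↦ Tv on vectors indexed by the wage grid.
# Grid, uniform ϕ, β and c are illustrative assumptions.
w_vals = collect(10.0:1.0:60.0)
ϕ = fill(1 / length(w_vals), length(w_vals))
β, c = 0.96, 10.0

stopping_value = w_vals ./ (1 - β)
continuation_value(v) = c + β * sum(v .* ϕ)
T(v) = max.(stopping_value, continuation_value(v))

# Choice implied by a candidate v: accept when stopping is at least as good.
accept(v) = stopping_value .>= continuation_value(v)

v = stopping_value            # candidate: the value of always accepting
println(count(accept(T(v))), " of ", length(w_vals), " offers accepted given T(v)")
```

A vector $v$ solves (1.8) exactly when `T(v)` returns `v` itself, which is why the problem can be treated with fixed-point methods.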

To solve for $v^*$ , we use fixed-point theory, our topic in the next section. Later, in Section 1.3, we return to the job search problem and apply fixed-point theory to solve for $v^*$ .

1.2 Stability and Contractions

In this section, we cover enough fixed-point theory to solve an infinite horizon job search problem. (In Chapter 2 we consider more general results.) Readers who are familiar with the Neumann series lemma and Banach’s fixed-point theorem can skim this section and proceed to Section 1.3.

1.2.1 Vector Space

To begin, we recall some fundamental properties of real numbers, finite-dimensional vector space, basic topology, and equivalence of norms.

1.2.1.1 Real and Complex Vectors

For the most part, we are interested in vectors whose elements are real numbers (as distinguished from complex numbers). Before investigating such vectors, let’s provide some useful language about the real line $\RR$ . (You might want to review some elementary concepts from real analysis in Appendix A, such as suprema, infima, minima, maxima, and convergence.)

Given $a, b \in \RR$ , let $a \vee b \coloneq \max\{a, b\}$ and $a \wedge b \coloneq \min\{a, b\}$ . The absolute value of $a \in \RR$ is defined as $|a| \coloneq a \vee (-a)$ .

A real-valued vector $u = (u_1, \ldots, u_n)$ is a finite real sequence with $u_i \in \RR$ as the $i$ -th element. The set of all real vectors of length $n$ is denoted by $\RR^n$ . The inner product of $n$ -vectors $(u_1, \ldots, u_n)$ and $(v_1, \ldots, v_n)$ is $\inner{u,v} \coloneq \sum_{i=1}^n u_i v_i$ .

The set $\CC$ of complex numbers is defined in the appendix to Sargent & Stachurski (2023) and many other places, as is the set $\CC^n$ of all complex-valued $n$ -vectors. We assume readers know what complex numbers are and how to compute the modulus of a complex number.

1.2.1.2 Norms

The Euclidean norm on $\RR^n$ is defined as

\| u \| \coloneq \sqrt{ \inner{u, u} } \qquad (u \in \RR^n).

Because they provide more flexibility when checking conditions that underlie various results, some alternative norms on $\RR^n$ are important for applications of fixed-point theory.

As a first step, recall that a function $\| \cdot \| \colon \RR^n \to \RR$ is called a norm on $\RR^n$ if, for any $\alpha \in \RR$ and $u, v \in \RR^n$ ,

(a) $\| u \| \geq 0$ (nonnegativity),

(b) $\| u \| = 0 \iff u = 0$ (positive definiteness),

(c) $\| \alpha u \| = |\alpha| \| u \|$ (absolute homogeneity), and

(d) $\| u + v \| \leq \| u \| + \| v \|$ (triangle inequality).

The Euclidean norm on $\RR^n$ satisfies the Cauchy–Schwarz inequality

| \inner{u, v} | \leq \| u \| \cdot \| v \| \quad \text{for all } u, v \in \RR^n .

This inequality can be used to prove that the triangle inequality holds for the Euclidean norm (see, e.g., Kreyszig (1978)).

The $\ell_1$ norm and the Euclidean norm are special cases of the so-called $\ell_p$ norm, which is defined for $p \geq 1$ by

\| u \|_p \coloneq \left( \sum_{i=1}^n |u_i|^p \right)^{1/p}.

(1.10)

It can be shown that $u \mapsto \| u \|_p$ is a norm for all $p \geq 1$ , as suggested by the name (see, e.g., Kreyszig (1978)). For this norm, the subadditivity asserted in (d) is called Minkowski’s inequality.

Since the Euclidean case is obtained by setting $p=2$ , the Euclidean norm is also called the $\ell_2$ norm, and we write $\| \cdot \|_2$ rather than $\| \cdot \|$ when extra clarity is required.

(The symbol $\| u \|_\infty$ is used because, for all $u \in \RR^n$ , we have $\| u \|_p \to \| u \|_\infty$ as $p \to \infty$ .)
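The limit $\| u \|_p \to \| u \|_\infty$ is easy to observe numerically. The sketch below assumes that $\| u \|_\infty$ denotes the supremum norm $\max_i |u_i|$ and uses `norm` from Julia's `LinearAlgebra` standard library, which accepts $p$ as its second argument. The test vector is arbitrary.

```julia
using LinearAlgebra

u = [3.0, -4.0, 1.0]                  # arbitrary test vector
for p in (1, 2, 4, 16, 64)
    println("p = $p: ", norm(u, p))   # ℓ_p norm from (1.10)
end
println("sup norm: ", norm(u, Inf))   # max_i |u_i| = 4.0
```

The printed values decrease toward the supremum norm as $p$ grows.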

For the next exercise, we recall that the indicator function of logical statement $P$ , denoted here by $\1\{P\}$ , takes value 1 (resp., 0) if $P$ is true (resp., false). For example, if $x, y \in \RR$ , then

\1\{x \leq y\} = \begin{cases} 1 & \text{ if } x \leq y \\ 0 & \text{ otherwise} . \end{cases}

If $A \subset S$ , where $S$ is any set, then $\1_A(x) \coloneq \1\{x \in A\}$ for all $x \in S$ .

1.2.1.3 Equivalence of Vector Norms

An important property of a finite-dimensional normed vector space is that all norms are “equivalent.” Let’s review this result and discuss why it matters.

To begin, recall that when $u$ and $(u_m) \coloneq (u_m)_{m \in \NN}$ are all elements of $\RR^n$ , we say that $(u_m)$ converges to $u$ and write $u_m \to u$ if

\| u_m - u \| \to 0 \text{ as } m \to \infty \text{ for some norm } \| \cdot \| \text{ on } \RR^n.

It might seem that this definition is imprecise. Don’t we need to clarify that the convergence is with respect to a particular norm?

No, we don’t. This is because any two norms $\| \cdot \|_a$ and $\| \cdot \|_b$ on $\RR^n$ are equivalent in the sense that there exist finite positive constants $M, N$ such that

M \|u\|_a \leq \| u\|_b \leq N \| u \|_a \quad \text{for all } u \in \RR^n.

(1.11)

(See, e.g., Kreyszig (1978).)
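For a concrete instance of (1.11), take $\| \cdot \|_a$ to be the supremum norm and $\| \cdot \|_b$ to be the $\ell_1$ norm. Then $M = 1$ and $N = n$ work, since the largest absolute entry is no larger than the sum of absolute entries, which in turn is no larger than $n$ times the largest. The following sketch checks these bounds on randomly drawn vectors; the dimension, the seed, and the small floating-point slack are arbitrary choices.

```julia
using LinearAlgebra, Random

Random.seed!(0)
n = 5
for _ in 1:1000
    u = randn(n)
    a, b = norm(u, Inf), norm(u, 1)
    @assert a <= b <= n * a + 1e-12   # (1.11) with M = 1 and N = n
end
println("Bounds held on all sampled vectors.")
```

A random check is not a proof, but it shows how the constants in (1.11) behave for one familiar pair of norms.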

The next exercise tells us that pointwise convergence and norm convergence are the same thing in finite dimensions.

Recall that a set $C \subset \RR^n$ is called bounded if there exists an $M \in \NN$ with $\|x\| \leq M$ for all $x \in C$ ; and closed in $\RR^n$ if, for all $u \in \RR^n$ and sequences $(u_m) \subset C$ such that $u_m \to u$ as $m \to \infty$ , we also have $u \in C$ . A set $G \subset \RR^n$ is called open in $\RR^n$ if $G^c$ is closed in $\RR^n$ . A set $N$ is called a neighborhood of $u \in \RR^n$ if there exists an open set $G \subset \RR^n$ with $u \in G \subset N$ . A map $T$ from $U \subset \RR^n$ to $\RR^k$ is called continuous at $u \in U$ if $Tu_m \to Tu$ for any $(u_m) \subset U$ with $u_m \to u$ ; and continuous if $T$ is continuous at every $u \in U$ . These notions apply to any norm, since convergence does not depend on our choice of norm.

1.2.1.4 Matrices and Neumann Series

Next, we discuss geometric series in matrix space, along with the Neumann series lemma, one of many useful results in applied and numerical analysis.

Before starting we recall that if $A = (a_{ij})$ is an $n \times n$ matrix with $i,j$ -th element $a_{ij}$ , then the definition of matrix multiplication tells us that for $u \in \RR^n$ , the $i$ -th element of $Au$ is $\sum_{j=1}^n a_{ij}u_j$ , while the $j$ -th element of $u^\top A$ is $\sum_{i=1}^n a_{ij}u_i$ . Think of $u \mapsto Au$ and $u \mapsto u^\top A$ as two different mappings, each of which takes an $n$ -vector and produces a new $n$ -vector.

Just as we considered norms of vectors in Section 1.2.1.2, we will find it helpful to have a notion of norms of matrices. A real-valued map defined on $\RR^{n \times n}$ , the set of real $n \times n$ matrices, is called a matrix norm if it has the following properties: for any $\alpha \in \RR$ and any $n\times n$ matrices $A, B$ ,

(a) $\| A \| \geq 0$ ,

(b) $\| A \| = 0 \iff A = 0$ ,

(c) $\| \alpha A \| = |\alpha| \| A \|$ , and

(d) $\| A + B \| \leq \| A \| + \| B \|$ .

These are called nonnegativity, positive definiteness, absolute homogeneity, and the triangle inequality, analogous to the norms on $\RR^n$ discussed in Section 1.2.1.2.

An example of a matrix norm is the so-called operator norm

\| B \|_o \coloneq \max_{\|u\| = 1} \| B u \|.

(1.12)

Here $B$ is $n \times n$ , $u$ is in $\RR^n$ and the norm on the right-hand side is the Euclidean norm over the $n$ -vector $B u$ . Another example of a matrix norm is the supremum norm defined as

\| B \|_\infty \coloneq \max_{1 \leq i, j \leq n} |b_{ij}|, \quad \text{ where } b_{ij} \text{ is the } i,j \text{-th element of } B.

(1.13)

Some matrix norms have the submultiplicative property, which means that, for all $A, B \in \RR^{n \times n}$ , we have $\| A B \| \leq \|A \| \|B\|$ .

In what follows we often use the operator norm as our choice of matrix norm (partly because of its attractive submultiplicative property). Hence, by convention, an expression such as $\| A\|$ refers to the operator norm $\|A\|_o$ of $A$ .
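The difference between the two matrix norms above shows up in a small example. In the sketch below, `opnorm` from Julia's `LinearAlgebra` standard library computes the operator norm induced by the Euclidean norm, while `sup_norm` is a helper name introduced here for (1.13). The matrices are chosen to show that the supremum norm can fail to be submultiplicative.

```julia
using LinearAlgebra

sup_norm(B) = maximum(abs, B)    # the norm in (1.13)

A = ones(2, 2)
B = ones(2, 2)

# The operator norm satisfies ‖AB‖ ≤ ‖A‖ ‖B‖ (small slack for rounding).
println(opnorm(A * B) <= opnorm(A) * opnorm(B) + 1e-12)

# The supremum norm does not: every entry of A * B equals 2.
println(sup_norm(A * B), " > ", sup_norm(A) * sup_norm(B))
```

Here $\| AB \|_\infty = 2$ while $\| A \|_\infty \| B \|_\infty = 1$, so the submultiplicative inequality fails for this pair under the supremum norm.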

Analogous to the vector case, we say that a sequence $(A_k)$ of $n \times n$ matrices converges to an $n \times n$ matrix $A$ and write $A_k \to A$ if $\| A_k - A \| \to 0$ as $k \to \infty$ . Just as with vectors, this form of norm convergence holds if and only if each element of $A_k$ converges to the corresponding element of $A$ . The proof is similar to the solution to Exercise 1.2.8.

If $A$ is an $n \times n$ matrix, then $\lambda \in \CC$ is called an eigenvalue of $A$ if there exists a nonzero $e \in \CC^n$ such that $Ae = \lambda e$ . (Here $\CC$ is the set of complex numbers and $\CC^n$ is the set of complex $n$ -vectors.) A vector $e$ satisfying this equality is called an eigenvector of $A$ and $(\lambda, e)$ is called an eigenpair.

In Julia, we can compute the eigenvalues of a square matrix $A$ via eigvals(A). The code

using LinearAlgebra
A = [0 -1;
     1  0]
display(eigvals(A))

produces

2-element Vector{ComplexF64}:
 0.0 - 1.0im
 0.0 + 1.0im

Here im stands for $i$ , the imaginary unit, so the eigenvalues of $A$ are $-i$ and $i$ .

Turning to geometric series, let us begin in one dimension. Consider the one-dimensional linear equation $u = au + b$ , where $a, b$ are given and $u$ is unknown. Its solution $u^*$ satisfies

|a| < 1 \quad \implies \quad u^* = \frac{b}{1-a} = \sum_{k \geq 0} a^k b.

(1.14)

This scalar result extends naturally to vectors. To show this we suppose that $u$ and $b$ are column vectors in $\RR^n$ , and that $A$ is an $n \times n$ matrix. We consider the vector equation $u = A u + b$ . For the next result, we recall that the spectral radius of $A$ is defined as

\rho(A) \coloneq \max\setntn{|\lambda|}{\lambda \text{ is an eigenvalue of } A}

(1.15)

Here $|\lambda|$ indicates the modulus of complex number $\lambda$ .

With $I$ as the $n \times n$ identity matrix, we can state the following result.

(Neumann series lemma) If $\rho(A) < 1$ , then $I - A$ is nonsingular and $(I - A)^{-1} = \sum_{k \geq 0} A^k$ .

It follows directly that the vector system $u = A u + b$ has a unique solution $u^* = (I - A)^{-1} b = \sum_{k \geq 0} A^k b$ whenever $\rho(A) < 1$ . This is the multivariate extension of (1.14).

The code in Listing 2 shows how to compute the spectral radius of an arbitrary matrix $A$ in Julia. The print statement produces 0.5828, so, for this matrix, $\rho(A)<1$ .

using LinearAlgebra                         
ρ(A) = maximum(abs(λ) for λ in eigvals(A))  # Spectral radius
A = [0.4 0.1;                               # Test with arbitrary A
     0.7 0.2]
print(ρ(A))

Listing 2: Computing a spectral radius (compute_spec_rad.jl)

The rest of this section works through the proof of the Neumann series lemma, with several parts left as exercises. An informal proof of the lemma runs as follows. If $S \coloneq \sum_{k \geq 0} A^k$ , then

I + AS = I + A \sum_{k \geq 0} A^k = I + A + A^2 + \cdots = S.

Rearranging $I + AS = S$ gives $S = (I - A)^{-1}$ , which matches the claim in the Neumann series lemma.

This informal argument lacks rigor. To make it rigorous, we must prove (a) that the sum $\sum_{k \geq 0} A^k$ converges and (b) that the matrix $I-A$ is invertible.

A proof of Lemma 1.2.2 can be found in Chapter 12 of Bollobás (1999). The second result is sometimes called Gelfand’s formula.
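Gelfand's formula states that $\| A^k \|^{1/k} \to \rho(A)$ as $k \to \infty$. The sketch below illustrates this convergence for the matrix used in Listing 2, measuring $\| A^k \|$ with the operator norm via `opnorm`. The particular values of $k$ are arbitrary.

```julia
using LinearAlgebra

A = [0.4 0.1;
     0.7 0.2]
ρ = maximum(abs, eigvals(A))      # spectral radius

for k in (1, 5, 25, 100)
    println("k = $k: ", opnorm(A^k)^(1 / k))
end
println("ρ(A) = ", ρ)
```

Since $\| A^k \|^{1/k}$ eventually falls below one whenever $\rho(A) < 1$, the norms $\| A^k \|$ decay geometrically, which is the key to showing that $\sum_{k \geq 0} A^k$ converges.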

From this last result, one can show that $(I-A)^{-1}$ exists by computing it:

Listing 3 helps illustrate the result in Exercise 1.2.14, although we truncate the infinite sum $\sum_{k \geq 0} A^k$ at 50.

using LinearAlgebra  # provides the identity operator I

# Primitives
A = [0.4 0.1;
     0.7 0.2]

# Method one: direct inverse
B_inverse = inv(I - A)

# Method two: power series
function power_series(A)
    B_sum = zeros((2, 2))
    A_power = I
    for k in 1:50
        B_sum += A_power
        A_power = A_power * A
    end
    return B_sum
end

# Print maximal error
print(maximum(abs.(B_inverse - power_series(A))))

Listing 3: Matrix inversion versus power series (power_series.jl)

The output 5.621e-12 is close enough to zero for many practical purposes.

1.2.2 Nonlinear Systems

While the Neumann series lemma is a powerful tool for solving linear systems, it doesn’t help us with nonlinear problems. In this section, we present Banach’s fixed-point theorem, one of a variety of techniques for handling nonlinear systems. (Chapter 2 introduces other methods.)

1.2.2.1 Fixed Points

A standard approach to solving an equation is to formulate it as a fixed-point problem. This section provides the basic definitions and some simple results from fixed-point theory.

Let $U$ be any nonempty set. We call $T$ a self-map on $U$ if $T$ is a function from $U$ into itself. For a self-map $T$ on $U$ , a point $u^* \in U$ is called a fixed point of $T$ in $U$ if $T u^* = u^*$ . (In fixed-point theory, it is common to write $T u$ for the image of $u$ under $T$ , rather than $T(u)$ .)

Figure 1.5 shows another example, for a self-map $T$ on $U \coloneq [0, 2]$ . Fixed points are numbers $u \in [0, 2]$ where $T$ meets the 45-degree line. In this case there are three.

Figure 1.5: Graph and fixed points of $T \colon u \mapsto 2.125/(1 + u^{-4})$
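The three fixed points in Figure 1.5 can be located numerically. Two of them sit at the endpoints: $T 0 = 0$ (reading $0^{-4}$ as $+\infty$, which is how floating-point arithmetic treats it) and $T 2 = 2$. The sketch below finds the interior one by bisection on $u \mapsto Tu - u$. The helper `bisect` and the bracketing interval $[0.5, 1.5]$ are choices made here for illustration.

```julia
# The map from Figure 1.5 on U = [0, 2].
T(u) = 2.125 / (1 + u^(-4))

# Simple bisection on g; assumes g changes sign on [lo, hi].
function bisect(g, lo, hi; tol=1e-10)
    while hi - lo > tol
        mid = (lo + hi) / 2
        if sign(g(mid)) == sign(g(lo))
            lo = mid
        else
            hi = mid
        end
    end
    return (lo + hi) / 2
end

u_mid = bisect(u -> T(u) - u, 0.5, 1.5)
println("T(0) = ", T(0.0), ", interior fixed point ≈ ", u_mid, ", T(2) = ", T(2.0))
```

Bisection works here because $Tu - u$ is negative at $0.5$ and positive at $1.5$, so it must cross zero in between.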

When considering fixed points, given a self-map $T$ on $U$ , we typically seek conditions on $T$ and $U$ under which the following properties hold:

$T$ has at least one fixed point on $U$ (existence),
$T$ has at most one fixed point on $U$ (uniqueness), and
the fixed point of $T$ on $U$ can be computed numerically.

1.2.2.2 Global Stability

A self-map $T$ on $U$ is called globally stable on $U$ if $T$ has a unique fixed point $u^*$ in $U$ and $T^k u \to u^*$ as $k \to \infty$ for all $u \in U$ . Here $T^k$ indicates $k$ compositions of $T$ with itself. Global stability is a desirable property in the setting of dynamic programming. A number of our results rely on it.
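The map in Figure 1.5 is a useful example of a self-map that is not globally stable on $U = [0, 2]$: it has more than one fixed point, and the limit of $T^k u$ depends on the starting point $u$. The sketch below iterates from two nearby initial conditions. The helper name `iterate_map` and the number of iterations are arbitrary choices.

```julia
T(u) = 2.125 / (1 + u^(-4))   # map from Figure 1.5

# Apply T to u a total of k times.
function iterate_map(T, u, k)
    for _ in 1:k
        u = T(u)
    end
    return u
end

println(iterate_map(T, 0.9, 100))   # trajectory heads toward 0
println(iterate_map(T, 1.0, 100))   # trajectory heads toward 2
```

Starting points on either side of the interior fixed point are driven to different limits, so neither uniqueness nor convergence from every $u \in U$ holds for this map.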

Let $T$ be a self-map on $U \subset \RR^n$ . We call $T$ invariant on $C \subset U$ and call $C$ an invariant set if $T$ is also a self-map on $C$ ; that is, if $u \in C$ implies $Tu \in C$ .

1.2.2.3 Banach’s Fixed-Point Theorem

Next, we present the Banach fixed-point theorem, a workhorse for analyzing nonlinear operators.

Let $U$ be a nonempty subset of $\RR^n$ and let $\| \cdot \|$ be a norm on $\RR^n$ . A self-map $T$ on $U$ is called a contraction on $U$ with respect to $\| \cdot \|$ if there exists a $\lambda < 1$ such that

\| Tu - Tv \| \leq \lambda \| u - v \| \quad \text{for all} \quad u, v \in U.

(1.17)

The constant $\lambda$ is called the modulus of contraction.
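As an example, the affine map $Tu = Au + b$ satisfies $\| Tu - Tv \| = \| A(u - v) \| \leq \| A \| \| u - v \|$, so it is a contraction with respect to the Euclidean norm whenever the operator norm of $A$ is less than one. The sketch below checks (1.17) on random pairs for one such $A$. The matrix, the vector $b$, the seed, and the rounding slack are illustrative choices.

```julia
using LinearAlgebra, Random

A, b = [0.4 0.1; 0.7 0.2], [1.0, 2.0]
T(u) = A * u + b
λ = opnorm(A)                         # Euclidean operator norm, about 0.84

Random.seed!(1234)
ok = all(1:1000) do _
    u, v = randn(2), randn(2)
    norm(T(u) - T(v)) <= λ * norm(u - v) + 1e-12
end
println("λ = ", λ, ", inequality (1.17) held on all samples: ", ok)
```

Here $\lambda = \| A \|$ serves as a modulus of contraction. The sampling only illustrates the inequality; the bound itself follows from the definition of the operator norm.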

The following theorem features a contraction.

We prove Theorem 1.2.3 in stages that build on the following exercises.

A fundamental property of $\RR^n$ is that if $(v_m)$ is a Cauchy sequence in $\RR^n$ , then there exists a $\bar v \in \RR^n$ such that $(v_m)$ converges to $\bar v$ . (This property is called completeness of the vector space $\RR^n$ . See, for example, Çınlar & Vanderbei (2013).) Hence it follows from Exercise 1.2.22 that $(u_m)$ has a limit $u^* \in \RR^n$ .

1.2.3 Successive Approximation

Consider a self-map $T$ on $U \subset \RR^n$ . We seek algorithms that compute fixed points of $T$ whenever they exist.

1.2.3.1 Iteration

If $T$ is globally stable on $U$ , then a natural algorithm for approximating the unique fixed point $u^*$ of $T$ in $U$ is to pick any $u \in U$ and iterate with $T$ for some finite number of steps:

By the definition of global stability, $(u_k)_{k \geq 0}$ converges to $u^*$ . The algorithm just described is called either successive approximation or fixed-point iteration. Listing 4 provides a function that implements this procedure. Distances between points are measured with the $\ell_\infty$ norm.

"""
Computes an approximate fixed point of a given operator T 
via successive approximation.

"""
function successive_approx(T,                  # operator (callable)
                           u_0;                # initial condition
                           tolerance=1e-6,     # error tolerance
                           max_iter=10_000,    # max iteration bound
                           print_step=25)      # print at multiples
    u = u_0
    error = Inf
    k = 1

    while error > tolerance && k <= max_iter
        
        u_new = T(u)
        error = maximum(abs.(u_new - u))

        if k % print_step == 0
            println("Completed iteration $k with error $error.")
        end

        u = u_new
        k += 1
    end

    if error <= tolerance
        println("Terminated successfully in $(k - 1) iterations.")
    else
        println("Warning: hit iteration bound.")
    end

    return u
end

Listing 4: Successive approximation (s_approx.jl)

include("s_approx.jl")
using LinearAlgebra

# Compute the fixed point of Tu = Au + b via linear algebra
A, b = [0.4 0.1; 0.7 0.2], [1.0; 2.0]
u_star = (I - A) \ b  # compute (I - A)^{-1} * b

# Compute the fixed point via successive approximation
T(u) = A * u + b
u_0 = [1.0; 1.0]
u_star_approx = successive_approx(T, u_0)

# Test for approximate equality (prints "true")
print(isapprox(u_star, u_star_approx, rtol=1e-5))

Listing 5: Using successive approximations to compute $u^*$ (linear_iter.jl)

Listing 5 applies successive approximation to the map $Tu = Au + b$ using the function defined in s_approx.jl. Figure 1.6 shows the sequence of iterates generated by four runs of the successive approximation algorithm, each with a different starting condition $u_0$ . The map and parameters are the same as in Listing 5. It is clear from the figure that a good choice of initial condition (i.e., one that is close to the fixed point) accelerates convergence.

Figure 1.6: Successive approximations from different initial conditions

Of course, for $Tu = Au + b$ with $\rho(A)<1$, there is a more direct method to compute the fixed point: the Neumann series lemma tells us that $u^* = (I-A)^{-1} b$, so we can apply a numerical linear equation solver. However, even for this case, successive approximation is sometimes used instead. One reason is that the linear system can be very large (that is, $A$ can be high-dimensional), making application of a linear solver problematic. Another is that we might be satisfied with a quick approximation of the fixed point, computed with a few iterations of $T$. Both of these situations can arise in dynamic programming.
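To make the "quick approximation" idea concrete, the following sketch reuses the matrix $A$ and vector $b$ from Listing 5 and compares a few iterations of $T$, started at the zero vector, with the exact solution. Starting from zero, $k$ iterations of $T$ reproduce the truncated Neumann sum $b + Ab + \cdots + A^{k-1} b$. The helper names `iterate_T` and `neumann_partial_sum` are ours and are introduced only for illustration.

```julia
using LinearAlgebra

A = [0.4 0.1; 0.7 0.2]
b = [1.0, 2.0]

# Exact fixed point from the Neumann series lemma: u* = (I - A)^{-1} b
u_star = (I - A) \ b

# Apply Tu = Au + b to u a total of k times
function iterate_T(A, b, u, k)
    for _ in 1:k
        u = A * u + b
    end
    return u
end

# Truncated Neumann series b + Ab + ... + A^(k-1) b
neumann_partial_sum(A, b, k) = sum(A^i * b for i in 0:(k-1))

u_10 = iterate_T(A, b, zeros(2), 10)
err_10 = maximum(abs.(u_10 - u_star))
println("Error after 10 iterations: $err_10")
```

Ten iterations already give a rough approximation, and each further iteration shrinks the error geometrically.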

1.2.3.2 A One-Dimensional Example

To illustrate successive approximations in a nonlinear setting, we use the Solow–Swan growth model, which is a good place to begin presenting a theory of economic growth. A fixed point for the Solow–Swan model can be computed with pencil and paper. The model also provides a good laboratory for studying how successive approximations might converge to a fixed point.

One version of the Solow–Swan growth dynamics is

k_{t+1} = s f(k_t) + (1 - \delta) k_t, \qquad t = 0, 1, \ldots,

(1.19)

where $k_t$ is capital stock per worker, $f \colon (0, \infty) \to (0, \infty)$ is a production function, $s > 0$ is a saving rate and $\delta \in (0,1)$ is a rate of depreciation. If we set $g(k) \coloneq sf(k) + (1-\delta)k$ , then iterating with $g$ from a starting point $k_0$ (i.e., setting $k_{t+1} = g(k_t)$ for all $t \geq 0$ ) generates the sequence in (1.19). We can also understand this process as using successive approximation to compute the fixed point of $g$ .

Solution to Exercise 1.2.25

By the definition of the derivative, for any $x \in U \coloneq (0, \infty)$ , we have

\lim_{y \to x} \left| \frac{g(y)-g(x)}{y - x} - g'(x) \right| = 0.

Hence, by the reverse triangle inequality, for fixed $\epsilon > 0$, we can take an $\eta > 0$ such that

\left| \frac{g(y)-g(x)}{y - x} \right| > |g'(x)| - \epsilon = g'(x) - \epsilon,

for all $y \in (x - \eta, x + \eta)$ with $y \neq x$. Rearranging gives

|g(x)-g(y)| > [g'(x) - \epsilon] |x-y|,

for all such $y$. But $g'(x) = s \alpha x^{\alpha-1} + 1 - \delta$, where $\delta$ is the depreciation rate, and this derivative diverges to $+\infty$ as $x \to 0$. Hence we can choose $x$ and $\epsilon$ such that $g'(x) - \epsilon \geq 1$. It follows that, for any $\lambda \in [0,1)$, we can find a pair $x, y$ such that $|g(x)-g(y)| > \lambda |x-y|$. Hence $g$ is not a contraction map under $|\cdot|$.
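The failure of the contraction property can also be seen numerically. The sketch below assumes $f(k) = k^\alpha$, which is consistent with the derivative used in the solution, together with the illustrative values $s = 0.3$, $\alpha = 0.3$ and $\delta = 0.4$ (these values are our assumption, not part of the exercise). It evaluates the slope of the chord of $g$ between $x$ and $2x$ for points $x$ approaching zero.

```julia
# Assumed parameter values, for illustration only
s, α, δ = 0.3, 0.3, 0.4
g(k) = s * k^α + (1 - δ) * k

# Slope of the chord of g between x and y
chord_slope(x, y) = abs(g(x) - g(y)) / abs(x - y)

for x in (1.0, 1e-2, 1e-4, 1e-6)
    println("x = $x: chord slope on [x, 2x] = ", chord_slope(x, 2x))
end
```

The slopes grow without bound as $x$ shrinks, so no single $\lambda < 1$ can bound them.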

Although the model specified in Exercise 1.2.25 does not generate a contraction, it is globally stable. The next exercise asks you to prove this.

Figure 1.7 illustrates the dynamics in a 45-degree diagram when $f(k) = A k^\alpha$ . In the top subfigure, $A=2.0$ , $\alpha=0.3$ , $s=0.3$ and $\delta=0.4$ . The function $g$ is plotted alongside the 45-degree line. When $g(k_t)$ lies strictly above the 45-degree line, then $k_{t+1} = g(k_t) > k_t$ and so capital per worker rises. If $g(k_t) < k_t$ then it falls. A trajectory $(k_t)_{t \geq 0}$ that is produced by starting from a particular choice of $k_0$ is traced out in the figure.

Figure 1.7: Successive approximation for the Solow–Swan model

The figure illustrates that $k^*$ is the unique fixed point of $g$ in $U$ and all sequences converge to it. The second statement can be rephrased as: successive approximation successfully computes the fixed point of $g$ by stepping through the time path of capital.
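The convergence shown in the figure can be reproduced in a few lines. The sketch below uses the parameter values stated for the top subfigure and compares the result of iterating with $g$ against the pencil-and-paper fixed point, which follows from solving $k = sAk^\alpha + (1-\delta)k$ for $k$. The function name `iterate_solow`, the number of steps and the starting points are our choices.

```julia
A, α, s, δ = 2.0, 0.3, 0.3, 0.4
g(k) = s * A * k^α + (1 - δ) * k

# Solving k = g(k) by hand gives k* = (sA/δ)^(1/(1-α))
k_star = (s * A / δ)^(1 / (1 - α))

# Iterate k_{t+1} = g(k_t) for a fixed number of steps, starting at k_0
function iterate_solow(g, k_0; num_steps=200)
    k = k_0
    for _ in 1:num_steps
        k = g(k)
    end
    return k
end

for k_0 in (0.25, 1.0, 3.0)
    k_T = iterate_solow(g, k_0)
    println("k_0 = $k_0: distance to k* after 200 steps = ", abs(k_T - k_star))
end
```

Starting points on either side of $k^*$ lead to the same limit, in line with the global stability claimed above.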

1.2.4 Finite-Dimensional Function Space

In Section 1.1.2 we introduced a Bellman equation for the infinite horizon job search problem. The unknown object in the Bellman equation is a function $v^*$ defined on the set $\Wsf$ of possible wage offers. Below we discuss how to solve for this unknown function.

Since the set of wage offers is finite we can write $\Wsf$ as $\{w_1, \ldots, w_n\}$ for some $n \in\NN$ . If we adopt this convention and also write $v^*(w_i)$ as $v^*_i$ , then we can view $v^*$ as a vector $(v^*_1, \ldots, v^*_n)$ in $\RR^n$ . The vector interpretation is useful when coding, since vectors (numerical arrays) are an efficient data type.

Nevertheless, for mathematical exposition, we usually find it more convenient to express function-like objects (e.g., value functions) as functions rather than vectors. Thus, we typically write $v^*(w)$ instead of $v^*_i$ .

Section 1.2.4.1 clarifies our notation with respect to functions and vectors.

1.2.4.1 Pointwise Operations on Functions

If $\Xsf$ is any set and $u$ maps $\Xsf$ to $\RR$ , then we call $u$ a real-valued function on $\Xsf$ and write $u \colon \Xsf \to \RR$ . Throughout, the symbol $\RR^\Xsf$ denotes the set of all real-valued functions on $\Xsf$ . This is a special case of the symbol $B^A$ that represents the set of all functions from $A$ to $B$ , where $A$ and $B$ are sets.

If $u, v \in \RR^\Xsf$ and $\alpha, \beta \in \RR$ , then the expressions $\alpha u + \beta v$ and $uv$ also represent elements of $\RR^\Xsf$ , defined at $x \in \Xsf$ by

(\alpha u + \beta v)(x) = \alpha u(x) + \beta v(x) \quad \text{and} \quad (uv)(x) = u(x)v(x).

(1.20)

Similarly, $|u|$ , $u \vee v$ , and $u \wedge v$ are real-valued functions on $\Xsf$ defined by

|u|(x) = |u(x)|, \quad (u \vee v)(x) = u(x) \vee v(x) \;\; \text{ and } \;\; (u \wedge v)(x) = u(x) \wedge v(x).

(1.21)

Figure 1.8 illustrates functions $u \vee v$ and $u \wedge v$ when $\Xsf$ is a subset of $\RR$ .

Similarly, if $u = (u_i)_{i=1}^n$ and $v = (v_i)_{i=1}^n$ are vectors in $\RR^n$ , then

|u| \coloneq (|u_i|)_{i=1}^n, \quad u \wedge v \coloneq (u_i \wedge v_i)_{i=1}^n \quad \text{and} \quad u \vee v \coloneq (u_i \vee v_i)_{i=1}^n.

(1.22)

Figure 1.9 illustrates these operations in $\RR^2$.

Figure 1.9: The vectors $u \vee v$ and $u \wedge v$ in $\RR^2$
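In code, the pointwise operations in (1.20)–(1.22) correspond to broadcasting over arrays. The following sketch uses two arbitrary vectors in $\RR^3$ chosen for illustration.

```julia
u = [1.0, -2.0, 3.0]
v = [0.5, 4.0, -1.0]

abs_u   = abs.(u)                # |u|: elementwise absolute value
u_join  = max.(u, v)             # u ∨ v: elementwise maximum
u_meet  = min.(u, v)             # u ∧ v: elementwise minimum
lin     = 2.0 .* u .+ 3.0 .* v   # αu + βv with α = 2 and β = 3
prod_uv = u .* v                 # uv: pointwise product

println("u ∨ v = $u_join, u ∧ v = $u_meet")
```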

1.2.4.2 Functions versus Vectors

Let $\Xsf$ be finite, so that $\Xsf = \{x_1, \ldots, x_n\}$ for some $n \in \NN$ . The set $\RR^\Xsf$ is the vector space $\RR^n$ expressed in different notation. The next lemma clarifies.

The claim in Lemma 1.2.4 is obvious: a real-valued function $u$ on $\Xsf$ is uniquely identified by the set of values that it takes on $\Xsf$ , which is an $n$ -tuple of real numbers.

Throughout the text, whenever the supporting set $\Xsf$ is finite, we freely use the identification in (1.23). For example, if $\| \cdot \|$ is any norm on $\RR^n$ , then $\| \cdot \|$ extends to $\RR^\Xsf$ via the identification in (1.23). That is, for $u \in \RR^\Xsf$ , the value $\| u \|$ is given by the norm of the vector $(u(x_1), \ldots, u(x_n)) \in \RR^n$ .

We say that a subset of $\RR^\Xsf$ is closed (resp., open, compact, etc.) if the corresponding subset of $\RR^n$ is closed (resp., open, compact, etc.).

With these conventions, the Neumann series lemma and Banach’s contraction mapping theorem extend directly from $\RR^n$ to $\RR^\Xsf$ . For example, if $|\Xsf|=n$ , $C$ is closed in $\RR^\Xsf$ and $T$ is a contraction on $C \subset \RR^\Xsf$ , in the sense that $T \colon C \to C$ and

\text{ there exists a } \lambda \in [0, 1) \ \st \quad \| Tf - Tg \| \leq \lambda \| f - g \| \quad \text{for all} \quad f, g \in C,

then $T$ has a unique fixed point $f^*$ in $C$ and

\| T^k f - f^* \| \leq \lambda^k \| f - f^* \| \quad \text{for all } k \in \NN \text{ and } f \in C.

Incidentally, in the preceding paragraph $T$ is a function that sends functions into functions (e.g., sends $f$ into $Tf$ ). To help distinguish $T$ from the functions that it acts on, $T$ in this setting is often called an operator rather than a function. This is a convention rather than a formal distinction: from a mathematical perspective, an operator is just a function.
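The geometric error bound from Banach's theorem can be checked numerically. The sketch below takes $C = \RR^2$ with the supremum norm and the affine map $Tu = Au + b$ from Listing 5. For this map, $\| Tu - Tv \|_\infty \leq \| A \|_\infty \| u - v \|_\infty$, where $\| A \|_\infty$ is the maximum absolute row sum, so we may take $\lambda = \| A \|_\infty$. The function name `bound_holds` and the starting point are our choices.

```julia
using LinearAlgebra

A = [0.4 0.1; 0.7 0.2]
b = [1.0, 2.0]
T(u) = A * u + b

λ = opnorm(A, Inf)          # maximum absolute row sum of A
u_star = (I - A) \ b        # fixed point of T

# Check ‖T^k f - f*‖∞ ≤ λ^k ‖f - f*‖∞ for k = 1, ..., num_steps
function bound_holds(T, f, f_star, λ; num_steps=50)
    initial_gap = norm(f - f_star, Inf)
    u = f
    for k in 1:num_steps
        u = T(u)
        # the small slack guards against floating-point rounding
        norm(u - f_star, Inf) <= λ^k * initial_gap + 1e-12 || return false
    end
    return true
end

println("λ = $λ, bound holds: ", bound_holds(T, [10.0, -5.0], u_star, λ))
```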

A foundational class of operators acting on $\RR^\Xsf$ is the set of linear operators. There is a strong sense in which linear operators are just matrices. We investigate these ideas in Section 2.3.3. At the same time, when studying dynamic programming we also use many operators that are not linear. One example is the “Bellman operator,” which we start to investigate in Section 1.3.1.2.

1.2.4.3 Distributions

Given a set $\Xsf$ with $n$ elements, the set of probability distributions on $\Xsf$ is written as $\dD(\Xsf)$ and contains all $\phi \in \RR_+^\Xsf$ with $\sum_{x \in \Xsf} \phi(x) =1$ . Since we can identify any $f \in \RR^\Xsf$ with a corresponding vector in $\RR^n$ , the set $\dD(\Xsf)$ can also be thought of as a subset of $\RR^n$ . This collection of vectors (i.e., the nonnegative vectors that sum to unity) is also called the unit simplex. Given $\Xsf_0 \subset \Xsf$ and $\phi \in \dD(\Xsf)$ , we say that $\phi$ is supported on $\Xsf_0$ if $\phi(x) > 0$ implies $x \in \Xsf_0$ .

Fix $h \in \RR^\Xsf$ and $\phi \in \dD(\Xsf)$ . Let $X$ be a random variable with distribution $\phi$ , so that $\PP\{X = x\} = \phi(x)$ for all $x \in \Xsf$ . The expectation of $h(X)$ is

\EE h(X) \coloneq \sum_{x \in \Xsf} h(x) \phi(x) = \inner{h, \phi}.

If $\Xsf \subset \RR$ , then the cumulative distribution function (CDF) corresponding to $\phi$ is the map $\Phi$ from $\Xsf$ to $\RR$ given by

\Phi(x) \coloneq \PP\{X \leq x\} = \sum_{x' \in \Xsf} \1\{x' \leq x\} \phi(x').

If $\tau \in [0,1]$ , then the $\tau$ -th quantile of $X$ is

Q_\tau \,X \coloneq \min \setntn{x \in \Xsf}{\Phi(x) \geq \tau}.

(1.24)

If $\tau = 1/2$ , then $Q_\tau \,X$ is called the median of $X$ .
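The definitions above translate directly into array operations once $\Xsf$ is listed as a vector. The sketch below uses a small three-point distribution chosen for illustration. The function names `expectation`, `cdf` and `quantile_finite` are ours.

```julia
using LinearAlgebra

x_vals = [1.0, 2.0, 4.0]       # the finite set X
ϕ = [0.25, 0.25, 0.5]          # a distribution on X

# Expectation of h(X) as the inner product of h and ϕ
expectation(h_vals, ϕ) = dot(h_vals, ϕ)

# CDF: Φ(x) is the sum of ϕ(x') over all x' ≤ x
cdf(x, x_vals, ϕ) = sum(ϕ[x_vals .<= x])

# τ-th quantile as in (1.24): the smallest x in X with Φ(x) ≥ τ
function quantile_finite(τ, x_vals, ϕ)
    candidates = [x for x in x_vals if cdf(x, x_vals, ϕ) >= τ]
    return minimum(candidates)
end

println("E X = ", expectation(x_vals, ϕ))
println("E X^2 = ", expectation(x_vals .^ 2, ϕ))
println("median = ", quantile_finite(0.5, x_vals, ϕ))
```

For general probabilities, rounding in the cumulative sum can matter when $\tau$ is very close to one, so production code would need a tolerance there.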

Evidently, if the median of $X$ is $x$ , then the median of $X + \alpha$ will be $x + \alpha$ . This same logic carries over to arbitrary quantiles, as the next exercise asks you to show.

Solution to Exercise 1.2.28

Fix $\tau \in [0,1]$, $X \sim \phi \in \dD(\Xsf)$ and $\alpha \in \RR$. Let $\Phi_X$ be the CDF of $X$. Let $Y \coloneq X + \alpha$, let $\Ysf \coloneq \setntn{x + \alpha}{x \in \Xsf}$ and let $\Phi_Y$ be the CDF of $Y$. Note that $\Phi_Y(y) = \PP\{Y \leq y\} = \PP\{X \leq y - \alpha\} = \Phi_X(y-\alpha)$ for all $y \in \Ysf$.

Let $x^* \coloneq Q_\tau \,X$ and let $y^* = Q_\tau(X + \alpha) = \min\setntn{y \in \Ysf}{\Phi_Y(y) \geq \tau}.$ We need to show that $y^* = x^* + \alpha$ . We do this by proving $y^* \geq x^* + \alpha$ and $y^* \leq x^* + \alpha$ .

For the first inequality, fix $y \in \Ysf$ such that $\Phi_Y(y) \geq \tau$ . Let $x = y - \alpha$ . We then have $\Phi_Y(x + \alpha) \geq \tau$ and hence $\Phi_X(x) \geq \tau$ . Hence $x \geq x^*$ , or $y \geq x^* + \alpha$ . Since this last inequality holds for any $y \in \Ysf$ with $\Phi_Y(y) \geq \tau$ , we have $y^* \geq x^* + \alpha$ .

For the reverse inequality, fix $x \in \Xsf$ with $\Phi_X(x) \geq \tau$ and set $y = x + \alpha$ . We have $\Phi_Y(y) = \Phi_X(y - \alpha) = \Phi_X(x) \geq \tau$ , so $y \geq y^*$ , or $x \geq y^* - \alpha$ . Since the last inequality holds for all $x \in \Xsf$ with $\Phi_X(x) \geq \tau$ , we have $x^* \geq y^* - \alpha$ . Rearranging gives $y^* \leq x^* + \alpha$ , as was to be shown.
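As a sanity check on this result, the following sketch verifies $Q_\tau(X + \alpha) = Q_\tau X + \alpha$ on a grid of values of $\tau$ for one small distribution. The distribution, the shift $\alpha = 3$ and the function name `quantile_finite` are assumptions made for illustration, and a numerical check of this kind is of course no substitute for the proof.

```julia
# τ-th quantile of a distribution ϕ on the finite set x_vals
function quantile_finite(τ, x_vals, ϕ)
    cdf(x) = sum(ϕ[x_vals .<= x])
    candidates = [x for x in x_vals if cdf(x) >= τ]
    return minimum(candidates)
end

x_vals = [1.0, 2.0, 4.0]
ϕ = [0.25, 0.25, 0.5]
α = 3.0
y_vals = x_vals .+ α     # support of Y = X + α; the probabilities are unchanged

shift_ok = all(quantile_finite(τ, y_vals, ϕ) == quantile_finite(τ, x_vals, ϕ) + α
               for τ in 0.0:0.125:1.0)
println("Shift property holds on the grid: $shift_ok")
```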

1.3 Infinite-Horizon Job Search

Armed with fixed-point methods, we return to the job search problem discussed in Section 1.1.2.

1.3.1 Values and Policies

In this section, we solve for the value function of an infinite horizon job search problem and associated optimal choices.

1.3.1.1 Optimal Choices

Let’s recall the strategy for solving the infinite-horizon job search problem we proposed in Section 1.1.2. The first step is to compute the optimal value function $v^*$ that solves the Bellman equation

v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*(w') \phi(w') \right\} \qquad (w \in \Wsf).

(1.25)

Suppose for a moment that we can compute $v^*$ , and let

h^* \coloneq c + \beta \sum_{w'} v^*(w') \phi(w')

(1.26)

be the infinite-horizon continuation value that equals the maximal lifetime value that the worker can receive, contingent on deciding to continue being unemployed today.

With $h^*$ in hand, the optimal decision at any given time, facing the current wage draw $w \in \Wsf$ , is as follows:

(i) If $w / (1-\beta) \geq h^*$ , then accept the job offer.

(ii) If not, then reject and wait for the next offer.

This decision maximizes lifetime value given the current offer.

(Later we will prove that this decision process is optimal as claimed. For now, however, we focus on computing $v^*$ and $h^*$ .)

1.3.1.2 The Bellman Operator

The method proposed in Section 1.3.1.1 requires that we solve for $v^*$ . To do so, we introduce a Bellman operator $T$ defined at $v \in \RR^\Wsf$ that is constructed to assure that any fixed point of $T$ solves the Bellman equation and vice versa:

(Tv)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \sum_{w' \in \Wsf} v(w') \phi(w') \right\} \qquad (w \in \Wsf).

(1.27)

Let $\vV \coloneq \RR^\Wsf_+$ and let $\| \cdot \|_\infty$ be the supremum norm on $\vV$. We measure the distance between two elements $f, g$ of $\vV$ by $\| f - g\|_\infty = \max_{w \in \Wsf}|f(w) - g(w)|$. Under this distance, we have the following result.

An implication of Proposition 1.3.1 is that $T^k v \to v^*$ as $k \to \infty$ for any $v \in \vV$, so we can compute $v^*$ to any required degree of accuracy by successive approximation. We now turn to the proof.

Our proof of Proposition 1.3.1 uses the elementary bound

|\alpha \vee x - \alpha \vee y| \leq |x - y| \qquad (\alpha, x, y \in \RR)

(1.28)

Proof of Proposition 1.3.1.

Take any $f, g$ in $\vV$ and fix any $w \in \Wsf$ . Apply the bound in (1.28) to get

\begin{aligned} |(Tf)(w) - (Tg)(w)| & \leq \left| c + \beta \sum_{w'} f(w') \phi(w') - \left( c + \beta \sum_{w'} g(w') \phi(w') \right) \right| \\ & = \beta \left| \sum_{w'} [f(w') - g(w')] \phi(w') \right|. \end{aligned}

Apply the triangle inequality to obtain

|(Tf)(w) - (Tg)(w)| \leq \beta \sum_{w'} |f(w') - g(w')| \phi(w') \leq \beta \| f - g \|_\infty .

Taking the supremum over all $w$ on the left-hand side of this expression leads to

\|Tf - Tg \|_\infty \leq \beta \| f - g \|_\infty .

Since $f, g$ were arbitrary elements of $\vV$ , the contraction claim is verified. ◻
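The contraction inequality can be spot-checked on randomly drawn pairs $f, g$. The sketch below uses hypothetical primitives: an evenly spaced wage grid, a uniform offer distribution, $\beta = 0.96$ and $c = 10$. These are assumptions made so that the block is self-contained and they need not match the model used elsewhere in the chapter. The names `bellman`, `sup_dist` and `contraction_holds` are ours.

```julia
# Hypothetical primitives: 51 wage offers on an even grid, uniform distribution
w_vals = collect(range(10.0, 60.0, length=51))
ϕ = fill(1 / length(w_vals), length(w_vals))
β, c = 0.96, 10.0

# The Bellman operator in (1.27), acting on a vector v
function bellman(v)
    h = c + β * sum(v .* ϕ)              # continuation value implied by v
    return max.(w_vals ./ (1 - β), h)
end

# Supremum distance between two vectors
sup_dist(f, g) = maximum(abs.(f - g))

# Check ‖Tf - Tg‖∞ ≤ β ‖f - g‖∞ on randomly drawn nonnegative pairs
function contraction_holds(; num_pairs=1_000, scale=2_000.0)
    for _ in 1:num_pairs
        f = scale .* rand(length(w_vals))
        g = scale .* rand(length(w_vals))
        # the small slack guards against floating-point rounding
        sup_dist(bellman(f), bellman(g)) <= β * sup_dist(f, g) + 1e-9 || return false
    end
    return true
end

println("Contraction inequality holds on all draws: ", contraction_holds())
```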

1.3.1.3 Optimal Policies

A dynamic program seeks optimal policies. We briefly introduce the notion of a policy and relate it to the job search application.

In general, for a dynamic program, choices by the controller aim to maximize lifetime rewards and consist of a state-contingent sequence $(A_t)_{t \geq 0}$ specifying how the agent acts at each point in time. Workers do not know what the future will bring, so it is natural to assume that $A_t$ can depend on present and past events but not future ones. Hence $A_t$ is a function of the current state $X_t$ and past state-action pairs $(A_{t-i}, X_{t-i})$ for $i \geq 1$ . That is,

A_t = \sigma_t( X_t, A_{t-1}, X_{t-1}, A_{t-2}, X_{t-2}, \ldots, A_0, X_0)

for some function $\sigma_t$ ; $\sigma_t$ is called a time $t$ policy function.

A key insight of dynamic programming is that some problems can be set up so that the optimal current action can be expressed as a function of the current state $X_t$ .

If the current state $X_t$ is enough to determine a current optimal action, then policies are just maps from states to actions. So we can write $A_t = \sigma(X_t)$ for some function $\sigma$ . A policy function that depends only on the current state is often called a Markov policy. Since all policies we consider will be Markov policies, we refer to them more concisely as “policies.”

Remark 1.3.1

In the last paragraph, we dropped the time subscript on $\sigma$ with no loss of generality because we can always include the date $t$ in the current state (i.e., if $Y_t$ is the state without time, then we can set $X_t = (t, Y_t)$). Whether this is necessary depends on the problem at hand. For the job search model with finite horizon, the date matters because opportunities for future earnings decrease with the passage of time. For the infinite horizon version of the problem, in which an agent always looks forward toward an infinite horizon, the only current information that matters to the agent at time $t$ is the wage offer $W_t$. As a result, the calendar date $t$ does not affect the agent’s decision at time $t$, so there is no need to include time in the state. (In Section 8.1.3.5, we will formalize this argument.)

In the job search model, the state is the current wage offer and possible actions are to accept or to reject the current offer. With 0 interpreted as reject and 1 understood as accept, the action space is $\{0,1\}$ , so a policy is a map $\sigma$ from $\Wsf$ to $\{0,1\}$ . Let $\Sigma$ be the set of all such maps.

A policy is an “instruction manual”: for an agent following $\sigma \in \Sigma$, if the current wage offer is $w$, the agent always responds with $\sigma(w) \in \{0, 1\}$. The policy dictates whether the agent accepts or rejects at any given wage.

For each $v \in \vV$ , a $v$ -greedy policy is a $\sigma \in \Sigma$ satisfying

\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \, \sum_{w' \in \Wsf} v(w') \phi(w') \right\} \quad \text{for all } w \in \Wsf.

(1.29)

Equation (1.29) says that an agent accepts if $w/(1-\beta)$ is at least as large as the continuation value computed using $v$ and rejects otherwise. Our discussion of optimal choices in Section 1.3.1.1 can now be summarized as the recommendation

\text{Adopt a } v^* \text{-greedy policy.}

This statement is sometimes called Bellman’s principle of optimality.

Inserting $v^*$ into (1.29) and rearranging, we can express a $v^*$ -greedy policy via

\sigma^*(w) = \1 \left\{ w \geq w^* \right\} \quad \text{where } \; w^* \coloneq (1 - \beta) h^* .

(1.30)

The quantity $w^*$ in (1.30) is called the reservation wage, and parallels the reservation wage that we introduced for the finite-horizon problem. Equation (1.30) states that value maximization requires accepting an offer if and only if it is at least as large as the reservation wage. Thus, $w^*$ provides a scalar description of an optimal policy.

1.3.2 Computation

Let’s turn to computation. In Section 1.3.2.1, we apply a standard dynamic programming method, called value function iteration. In Section 1.3.2.2, we apply a more specialized method that uses the structure of the job search problem to accelerate computation.

1.3.2.1 Value Function Iteration

Recall that, by Proposition 1.3.1, we can compute an approximate optimal policy by applying successive approximation via the Bellman operator. In the language of dynamic programming, this is called value function iteration. Algorithm 1.1 provides a full description.

While $T^k v$ rarely attains $v^*$ for $k < \infty$ , we can obtain a close approximation by monitoring distances between successive iterates, waiting until they become small enough. Later we will study how these distances depend on $k$ , the number of iterations, as well as on parameters defining rewards and opportunities.

Listing 6 implements value function iteration for the infinite-horizon job search model, using the function for successive approximation from Listing 4.

include("two_period_job_search.jl")
include("s_approx.jl")

" The Bellman operator. "
function T(v, model)
    (; n, w_vals, ϕ, β, c) = model
    return [max(w / (1 - β), c + β * v'ϕ) for w in w_vals]
end

" Get a v-greedy policy. "
function get_greedy(v, model)
    (; n, w_vals, ϕ, β, c) = model
    σ = w_vals ./ (1 - β) .>= c .+ β * v'ϕ  # Boolean policy vector
    return σ
end
        
" Solve the infinite-horizon IID job search model by VFI. "
function vfi(model=default_model)
    v_init = zero(model.w_vals)
    v_star = successive_approx(v -> T(v, model), v_init)
    σ_star = get_greedy(v_star, model)
    return v_star, σ_star
end

Listing 6: Value function iteration (iid_job_search.jl)

Figure 1.10 shows a sequence of iterates $(T^k v)_k$ when $v \equiv 0$ and parameters are as given in Listing 1. Iterates 0, 1, and 2 are shown, in addition to iterate 1000, which we take as a good approximation to the limiting function. If you experiment with different initial conditions, you will see that they all converge to the same limit.

Figure 1.10: A sequence of iterates of the Bellman operator

Figure 1.11 shows an approximation of $v^*$ computed using the code in Listing 6, along with the stopping reward $w/(1-\beta)$ and the corresponding continuation value (1.26). As anticipated, the value function is the pointwise supremum of the stopping reward and the continuation value. The worker chooses to accept an offer only when that offer exceeds some value close to 43.5.

Figure 1.11: The approximate value function for job search

1.3.2.2 Computing the Continuation Value Directly

The technique we employed to solve the job search model in Section 1.3.1 follows a standard approach to dynamic programming. But for this particular problem, there is an easier way to compute the optimal policy that sidesteps calculating the value function. This section explains how.

Recall that the value function satisfies the Bellman equation

v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \sum_{w'} v^*(w') \phi(w') \right\} \qquad (w \in \Wsf),

(1.31)

and that the continuation value is given by (1.26). We can use $h^*$ to eliminate $v^*$ from (1.31). First we insert $h^*$ on the right-hand side of (1.31) and then we replace $w$ with $w'$ , which gives $v^*(w') = \max \left\{ w'/(1-\beta) ,\, h^* \right\}$ . Then we take mathematical expectations of both sides, multiply by $\beta$ and add $c$ to obtain

h^* = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1-\beta} ,\, h^* \right\} \phi(w').

(1.32)

To obtain the unknown value $h^*$ , we introduce the mapping $g \colon \RR_+ \to \RR_+$ defined by

g(h) = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(w').

(1.33)

By construction, $h^*$ solves (1.32) if and only if $h^*$ is a fixed point of $g$ .

Figure 1.12 shows the function $g$ using the discrete wage offer distribution and parameters as adopted previously. The unique fixed point is $h^*$ .

Figure 1.12: Computing the continuation value as the fixed point of $g$

Exercise 1.3.2 implies that we can compute $h^*$ by choosing arbitrary $h \in \RR_+$ and iterating with $g$ . Doing so produces a value of approximately 1086. (The associated reservation wage is $w^* = (1-\beta) h^* \approx 43.4$ .) Computation of $h^*$ using this method is much faster than value function iteration because the fixed-point problem is in $\RR_+$ rather than $\RR^n_+$ .

With $h^*$ in hand, we have solved the dynamic programming problem, since a policy $\sigma^*$ is $v^*$ -greedy if and only if it satisfies

\sigma^*(w) = \1 \left\{ \frac{w}{1-\beta} \geq h^* \right\} \qquad (w \in \Wsf).

(1.34)
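The procedure described in this section can be sketched as follows. The primitives are hypothetical (an evenly spaced wage grid, a uniform offer distribution, $\beta = 0.96$ and $c = 10$) and are chosen only so that the block runs on its own. Because they need not match the parameters used for the figures, the resulting numbers can differ from those reported in the text. The function name `compute_hstar` is ours.

```julia
# Hypothetical primitives: even wage grid and uniform offer distribution
w_vals = collect(range(10.0, 60.0, length=51))
ϕ = fill(1 / length(w_vals), length(w_vals))
β, c = 0.96, 10.0

# The map g in (1.33)
g(h) = c + β * sum(max.(w_vals ./ (1 - β), h) .* ϕ)

# Iterate h ← g(h) until successive values differ by at most `tolerance`
function compute_hstar(g; h_init=0.0, tolerance=1e-8, max_iter=10_000)
    h = h_init
    for _ in 1:max_iter
        h_new = g(h)
        abs(h_new - h) <= tolerance && return h_new
        h = h_new
    end
    error("no convergence within max_iter iterations")
end

h_star = compute_hstar(g)
w_star = (1 - β) * h_star                 # reservation wage, as in (1.30)
σ_star = w_vals ./ (1 - β) .>= h_star     # policy in (1.34) as a Boolean vector

println("h* ≈ $h_star, w* ≈ $w_star")
```

Each iteration works with a single scalar rather than a vector of length $n$, which is the source of the speed gain over value function iteration.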

1.4 Chapter Notes

Dynamic programming is often attributed to Richard Bellman (1920–1984). Both the term “dynamic programming” and the technique were popularized by Bellman (1957). According to his autobiography, Bellman chose the name dynamic programming to avoid giving the impression that he was conducting mathematical research within RAND Corporation. His ultimate boss, Secretary of Defense Charles Wilson, apparently disliked such research (Bellman, 1984).

For treatments of dynamic programming from the perspective of economics and finance, see, for example, Sargent (1987), Stokey & Lucas (1989), Van & Dana (2003), Bäuerle & Rieder (2011), or Stachurski (2022).

The job search model was introduced by McCall (1970). The McCall model and its extensions transformed economists’ way of thinking about labor markets (see, e.g., Lucas (1978)). Influential extensions to the job search model include Burdett (1978), Jovanovic (1979), Pissarides (1979), Jovanovic (1984), Mortensen (1986), Ljungqvist (2002) and Chetty (2008). Rogerson et al. (2005) provides a useful survey.

For elementary real analysis, the book by Bartle & Sherbert (2011) is excellent. Ok (2007) is a superb treatment of real analysis and how it is used throughout economic theory. Discussions of Banach’s theorem and the Neumann series lemma can be found in Cheney (2013) and Atkinson & Han (2005). Rocha & Vailakis (2010) provides an extension to Banach’s theorem that requires only local contractivity.

Footnotes

The procedure of solving the last period first and then working back in time is called backward induction. Starting with the last period makes sense because there is no future to consider.
Incidentally, imposing an infinite horizon is not the same as assuming humans live forever. Rather, it corresponds to the idea that humans have no specific “termination” date. More generally, we can understand an infinite horizon as an approximation to a finite horizon in which observations are recorded at relatively high frequency and no clear termination date exists.
Hint: To prove that $A$ is invertible and $B = A^{-1}$ , it suffices to show that $AB = I$ .

References

McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics, 84(1), 113–126.
Sargent, T., & Stachurski, J. (2023). Economic Networks: Theory and Computation. Cambridge University Press.
Kreyszig, E. (1978). Introductory Functional Analysis with Applications (Vol. 1). Wiley New York.
Bollobás, B. (1999). Linear Analysis: An Introductory Course. Cambridge University Press.
Çınlar, E., & Vanderbei, R. J. (2013). Real and Convex Analysis. Springer Science & Business Media.
Bellman, R. (1957). Dynamic Programming. In Science. American Association for the Advancement of Science.
Bellman, R. (1984). Eye of the Hurricane. World Scientific.
Sargent, T. (1987). Dynamic Macroeconomic Theory. Harvard University Press.
Stokey, N., & Lucas, R. (1989). Recursive Methods in Dynamic Economics. Harvard University Press.
Van, C., & Dana, R.-A. (2003). Dynamic Programming in Economics. Springer.
Bäuerle, N., & Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer Science & Business Media.
Stachurski, J. (2022). Economic Dynamics: Theory and Computation (2nd ed.). MIT Press.
Lucas, R. E. (1978). Unemployment policy. American Economic Review, 68(2), 353–357.
Burdett, K. (1978). A theory of employee job search and quit rates. American Economic Review, 68(1), 212–220.
Jovanovic, B. (1979). Firm-specific capital and turnover. Journal of Political Economy, 87(6), 1246–1260.