Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

1 Introduction

Authors
Affiliations
New York University
Australian National University

The temporal structure of a typical dynamic program is

The state XtX_t is a vector listing current values of variables deemed relevant to choosing the current action. The action AtA_t is a vector describing choices of a set of decision variables. If T<T < \infty, then the problem has a finite horizon. Otherwise it is an infinite horizon problem. Figure Figure 1.1 illustrates the first two rounds of a dynamic program. As shown in the figure, a rule for updating the state depends on the current state and action.

A dynamic program

Figure 1.1:A dynamic program

Dynamic programming provides a way to maximize the expected lifetime reward of a decision-maker who receives a prospective reward sequence (Rt)t0(R_t)_{t \geq 0} and who confronts a system that maps today’s state and control into the next period’s state. A lifetime reward is an aggregation of the individual period rewards (Rt)t0(R_t)_{t \geq 0} into a single value. An example of lifetime reward is an expected discounted sum Et0βtRt\EE \sum_{t \geq 0} \beta^t R_t for some β(0,1)\beta \in (0,1).

Dynamic programming has a vast array of applications, from robotics and artificial intelligence to the sequencing of DNA. Dynamic programming is used every day to control aircraft, route shipping, test products, recommend information on media platforms, and solve research problems. Some companies produce specialized computer chips that are designed for specific dynamic programs.

Within economics and finance, dynamic programming is applied to topics including unemployment, monetary policy, fiscal policy, asset pricing, firm investment, wealth dynamics, inventory control, commodity pricing, sovereign default, the division of labor, natural resource extraction, human capital accumulation, retirement decisions, portfolio choice, and dynamic pricing. We discuss some of these applications in the rest of the book.

The core theory of dynamic programming is relatively simple and concise. But implementation can be computationally demanding. That situation provides one of the major challenges facing the field of dynamic programming.

In this book, we discuss fundamental theory, traditional economic applications, and recent applications with computationally demanding environments. We also cover recent trends towards more sophisticated specifications of lifetime rewards, often called recursive preferences. Throughout the book, theory and computation are combined, since, for interesting problems, brute-force computation is futile, while theory alone provides limited insights. The interplay between interesting applications, fundamental theory, computational methods, and evolving hardware capability makes dynamic programming exciting.

1.1Bellman Equations

In this section, we introduce the recursive structure of dynamic programming in a simple setting. After solving a finite-horizon model, we consider an infinite-horizon version and explain how it produces a system of nonlinear equations. Then we turn to methods for solving such systems.

We begin with a celebrated model of job search created by McCall (1970). McCall analyzed the decision problem of an unemployed worker in terms of current and prospective wage offers, impatience, and the availability of unemployment compensation. Here we study a simple version of the model in which essential ideas of dynamic programming are particularly clear.

Readers who are familiar with Bellman equations can skim this section quickly and proceed directly to Section Section 1.2.

1.1.1.1A Two-Period Problem

Imagine someone who begins her working life at time t=1t=1 without employment. While unemployed, she receives a new job offer paying wage WtW_t at each date tt. She can accept the offer and work permanently at that wage level or reject the offer, receive unemployment compensation cc, and draw a new offer next period. We assume that the wage offer sequence is iid and nonnegative, with distribution ϕ\phi. In particular,

The worker is impatient. Impatience is parameterized by a time discount factor β(0,1)\beta \in (0, 1), so that the present value of a next-period payoff of yy dollars is βy\beta y. Since β<1\beta < 1, the worker will be tempted to accept reasonable offers, rather than to wait for better ones. A key question is how long to wait.

Suppose as a first step that working life is just two periods. To solve our problem, we work backwards, starting at the final date t=2t=2, after W2W_2 has been observed.[1] If she is already employed, the worker has no decision to make: She continues working at her current wage. If she is unemployed, then she should take the largest of cc and W2W_2.

Now we step back to t=1t=1. At this time, having received offer W1W_1, the unemployed worker’s options are (a) accept W1W_1 and receive it in both periods or (b) reject it, receive unemployment compensation cc, and then, in the second period, choose the maximum of W2W_2 and cc.

Let’s assume that the worker seeks to maximize EPV. The EPV of option (a) is W1+βW1W_1 + \beta W_1, which is also called the stopping value. The EPV of option (b), also called the continuation value, is h1c+βEmax{c,W2}h_1 \coloneq c + \beta \, \EE \max\{c, W_2\}. More explicitly,

h1=c+βwWv2(w)ϕ(w),wherev2(w)max{c,w}.h_1 = c + \beta \sum_{w' \in \Wsf} v_2(w') \phi(w'), \quad \text{where} \quad v_2(w) \coloneq \max\{c, w\}.

The optimal choice at t=1t=1 is now clear: Accept the offer if W1+βW1h1W_1 + \beta W_1 \geq h_1 and reject otherwise. A decision tree is shown in Figure Figure 1.2.

Decision tree for a two-period problem

Figure 1.2:Decision tree for a two-period problem

1.1.1.2Comments on Information

In determining the optimal choice, we assumed that the worker (a) cares about expected values and (b) knows how to compute them. In Chapter 7 and Chapter 8 we discuss how to extend or weaken these assumptions. Some of these extensions allow decision-makers to focus on measurements that differ from expected values. Other extensions assume that the decision-maker does not know underlying probability distributions. For now we put these issues aside and return to the setup discussed in Section 1.1.1.1.

1.1.1.3Value Functions

A key idea in dynamic programming is to use “value functions” to track maximal lifetime rewards from a given state at a given time. The time 2 value function v2v_2 defined in (1.2) returns the maximum value obtained in the final stage for each possible realization of the time 2 wage offer. The time 1 value function v1v_1 evaluated at wWw \in \Wsf is

v1(w)max{w+βw,c+βwWv2(w)ϕ(w)}.v_1(w) \coloneq \max \left\{ w + \beta w ,\, c + \beta \, \sum_{w' \in \Wsf} v_2(w') \phi(w') \right\}.

It represents the present value of expected lifetime income after receiving the first offer ww, conditional on choosing optimally in both periods.

The value function v_1 and the reservation wage

Figure 1.3:The value function v1v_1 and the reservation wage

The value function is shown in Figure Figure 1.3. This figure also shows the reservation wage

w1h11+β.w_1^* \coloneq \frac{h_1}{1+\beta}.

It is the ww that solves the indifference condition

w+βw=c+βwWv2(w)ϕ(w),w + \beta w = c + \beta \, \sum_{w' \in \Wsf} v_2(w') \phi(w'),

and equates the value of stopping to the value of continuing. For an offer W1W_1 above w1w_1^*, the stopping value exceeds the continuation value. For an offer below the reservation wage, the reverse is true. Hence, the optimal choice for the worker at t=1t=1 is completely described by the reservation wage.

Parameters and functions underlying the figure are shown in Listing 1.

Equation (1.4) is instructive. We can see that higher unemployment compensation cc shifts up the continuation value h1h_1 and increases the reservation wage. As a result, the worker will, on average, spend more time unemployed when unemployment compensation is higher.

1.1.1.4Three Periods

Now let’s suppose that the worker works in period t=0t=0 as well as t=1,2t=1,2. Figure Figure 1.4 shows the decision tree for the three periods. Notice that the subtree containing nodes 1 and 2 is just the decision tree for the two-period problem in Figure Figure 1.2. We will use this to find optimal actions.

Decision tree for the job seeker

Figure 1.4:Decision tree for the job seeker

At t=0t=0, the value of accepting the current offer W0W_0 is W0+βW0+β2W0W_0 + \beta W_0 + \beta^2 W_0, while the maximal value of rejecting and waiting is cc plus, after discounting by β\beta, the maximum value that can be obtained by behaving optimally from t=1t=1. We have already calculated this value: It is just v1(W1)v_1(W_1), as given in (1.3)!

The maximal time zero value v0(w)v_0(w) is the maximum of the value of these two options, given W0=wW_0 = w, so we can write

v0(w)=max{w+βw+β2w,c+βwWv1(w)ϕ(w)}.v_0(w) = \max \left\{ w + \beta \, w + \beta^2 \, w ,\, c + \beta \, \sum_{w' \in \Wsf} v_1(w') \phi(w') \right\}.

By plugging v1v_1 from (1.3) into this expression, we can determine v0v_0, as well as the optimal action, the one that achieves the largest value in the max term in (1.6).

Figure Figure 1.4 illustrates how the backward induction process works. The last-period value function v2v_2 is trivial to obtain. With v2v_2 in hand, we can compute v1v_1. With v1v_1 in hand, we can compute v0v_0. Once all the value functions are available, we can calculate whether to accept or reject the current offer at each point in time.

Notice how we subdivided the three-period problem down into a pair of two-period problems, given by (1.3) and (1.6). Breaking many-period problems down into a sequence of two-period problems is the essence of dynamic programming. The recursive relationships between v0v_0 and v1v_1 in (1.6), as well as between v1v_1 and v2v_2 in (1.3), are examples of what are called Bellman equations. We will see many other examples.

1.1.2Infinite Horizon

Next, we consider an infinite horizon problem that in some ways is more challenging but in other ways simpler. On the one hand, the lack of a terminal period means that backward induction requires a subtler justification. On the other hand, the infinite horizon means that the worker always faces an infinite future, so that we only have to study a single-value function and need not keep track of the number of remaining periods in the problem. This will become clearer as the section unfolds.[2]

With this discussion in mind, let us consider a worker who aims to maximize

Et=0βtRt,\EE \sum_{t=0}^{\infty} \beta^t R_t,

where Rt{c,Wt}R_t \in \{c, W_t\} is earnings at time tt. As before, jobs are permanent, so accepting a job at a given wage means earning that wage in every subsequent period.

Let’s clarify our assumptions:

Here and in what follows, for any finite or countable set FF, the symbol D(F)\dD(F) indicates the set of distributions on FF.

As with the finite-state case, infinite-horizon dynamic programming involves a two-step procedure that first assigns values to states and then deduces optimal actions given those values. We begin with an informal discussion and then formalize the main ideas.

To trade off current and future rewards optimally, we need to compare the current payoffs we get from our two choices with the states that those choices lead to and the maximum value that can be extracted from those states. But how do we calculate the maximum value that can be extracted from each state when lifetime is infinite?

Consider first the present expected lifetime value of being employed with wage wWw \in \Wsf. This case is easy because, under the current assumptions, workers who accept a job are employed forever. Lifetime payoff is

w+βw+β2w+=w1β.w + \beta w + \beta^2 w + \cdots = \frac{w}{1 - \beta}.

How about the maximum present expected lifetime value attainable when entering the current period unemployed with wage offer ww in hand? Denote this (as yet unknown) value by v(w)v^*(w). We call vv^* the value function. While vv^* is not trivial to pin down, the task is not impossible. Our first step in the right direction is to observe that it satisfies the Bellman equation

v(w)=max{w1β,c+βwWv(w)ϕ(w)},v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*(w') \phi(w') \right\},

at every wWw \in \Wsf. (Here ww' is the offer next period.)

Our reasoning is as follows: The first term inside the max operation is the stopping value, or lifetime payoff from accepting current offer ww. The second term inside the max operation is the continuation value, or current expected value of rejecting and behaving optimally thereafter. The maximal value is obtained by selecting the largest of these two alternatives.

Note the similarity between (1.9) and our finite horizon Bellman equations (1.3) and (1.6). The only real difference is that the value function is no longer time-dependent. This is because the worker always looks forward toward an infinite horizon, regardless of the current date.

Equation (1.9) is to be solved for a function vRWv^* \in \RR^\Wsf, the set of all functions from W\Wsf to R\RR. Once we have solved for vv^* (assuming this is possible), optimal choices can be made by observing current ww and then choosing the largest of the two alternatives on the right-hand side of (1.9), just as we did in the finite horizon case. This idea -- that optimal choices can be made by computing the value function and maximizing the right-hand side of the Bellman equation -- is called Bellman’s principle of optimality, and will be a cornerstone of what follows. Later we prove it in a general setting.

To solve for vv^*, we use fixed-point theory, our topic in the next section. Later, in Section Section 1.3, we return to the job search problem and apply fixed-point theory to solve for vv^*.

1.2Stability and Contractions

In this section, we cover enough fixed-point theory to solve an infinite horizon job search problem. (In Chapter 2 we consider more general results.) Readers who are familiar with the Neumann series lemma and Banach’s fixed-point theorem can skim this section and proceed to Section 1.3.

1.2.1Vector Space

To begin, we recall some fundamental properties of real numbers, finite-dimensional vector space, basic topology, and equivalence of norms.

1.2.1.1Real and Complex Vectors

For the most part, we are interested in vectors whose elements are real numbers (as distinguished from complex numbers). Before investigating such vectors, let’s provide some useful language about the real line R\RR. (You might want to review some elementary concepts from real analysis in Appendix §Appendix A, such as suprema, infima, minima, maxima, and convergence.)

Given a,bRa, b \in \RR, let abmax{a,b}a \vee b \coloneq \max\{a, b\} and abmin{a,b}a \wedge b \coloneq \min\{a, b\}. The absolute value of aRa \in \RR is defined as aa(a)|a| \coloneq a \vee (-a).

A real-valued vector u=(u1,,un)u = (u_1, \ldots, u_n) is a finite real sequence with uiRu_i \in \RR as the ii-th element. The set of all real vectors of length nn is denoted by Rn\RR^n. The inner product of nn-vectors (u1,,un)(u_1, \ldots, u_n) and (v1,,vn)(v_1, \ldots, v_n) is u,vi=1nuivi\inner{u,v} \coloneq \sum_{i=1}^n u_i v_i.

The set C\CC of complex numbers is defined in the appendix to Sargent & Stachurski (2023) and many other places; as is the set Cn\CC^n of all complex-valued nn-vectors. We assume readers know what complex numbers are and how to compute the modulus of a complex number.

1.2.1.2Norms

The Euclidean norm on a real vector space is defined as

uu,u(uRn).\| u \| \coloneq \sqrt{ \inner{u, u} } \qquad (u \in \RR^n).

Because they provide more flexibility when checking conditions that underlie various results, some alternative norms on Rn\RR^n are important for applications of fixed-point theory.

As a first step, recall that a function  ⁣:RnR\| \cdot \| \colon \RR^n \to \RR is called a norm on Rn\RR^n if, for any αR\alpha \in \RR and u,vRnu, v \in \RR^n,

2

  1. u0\| u \| \geq 0

  2. u=0    u=0\| u \| =0 \iff u=0

  3. αu=αu\| \alpha u \| = |\alpha| \| u\| and

  4. u+vu+v\| u + v \| \leq \| u \| + \| v \|

  5. (nonnegativity)

  6. (positive definiteness)

  7. (absolute homogeneity)

  8. (triangle inequality)

The Euclidean norm on Rn\RR^n satisfies the Cauchy--Schwarz inequality

u,vuvfor all u,vRn.| \inner{u, v} | \leq \| u \| \cdot \| v \| \quad \text{for all } u, v \in \RR^n .

This inequality can be used to prove that the triangle inequality holds for the Euclidean norm (see, e.g., Kreyszig (1978)).

The 1\ell_1 norm and the Euclidean norm are special cases of the so-called p\ell_p norm, which is defined for p1p \geq 1 by

up(i=1nuip)1/p.\| u \|_p \coloneq \left( \sum_{i=1}^n |u_i|^p \right)^{1/p}.

It can be shown that uupu \mapsto \| u \|_p is a norm for all p1p \geq 1, as suggested by the name (see, e.g., Kreyszig (1978)). For this norm, the subadditivity asserted in (d) is called Minkowski’s inequality.

Since the Euclidean case is obtained by setting p=2p=2, the Euclidean norm is also called the 2\ell_2 norm, and we write 2\| \cdot \|_2 rather than \| \cdot \| when extra clarity is required.

(The symbol u\| u \|_\infty is used because, for all uRnu \in \RR^n, we have upu\| u \|_p \to \| u \|_\infty as pp \to \infty.)

For the next exercise, we recall that the indicator function of logical statement PP, denoted here by 1{P}\1\{P\}, takes value 1 (resp., 0) if PP is true (resp., false). For example, if x,yRx, y \in \RR, then

1{xy}={1 if xy0 otherwise.\1\{x \leq y\} = \begin{cases} 1 & \text{ if } x \leq y \\ 0 & \text{ otherwise} . \end{cases}

If ASA \subset S, where SS is any set, then 1A(x)1{xA}\1_A(x) \coloneq \1\{x \in A\} for all xSx \in S.

1.2.1.3Equivalence of Vector Norms

An important property of a finite-dimensional normed vector space is that all norms are “equivalent.” Let’s review this result and discuss why it matters.

To begin, recall that when uu and (um)(um)mN(u_m) \coloneq (u_m)_{m \in \NN} are all elements of Rn\RR^n, we say that (um)(u_m) converges to uu and write umuu_m \to u if

umu0 as m for some norm  on Rn.\| u_m - u \| \to 0 \text{ as } m \to \infty \text{ for some norm } \| \cdot \| \text{ on } \RR^n.

It might seem that this definition is imprecise. Don’t we need to clarify that the convergence is with respect to a particular norm?

No we don’t. This is because any two norms a\| \cdot \|_a and b\| \cdot \|_b on Rn\RR^n are equivalent in the sense that there exist finite positive constants M,NM, N such that

MuaubNuafor all uRn.M \|u\|_a \leq \| u\|_b \leq N \| u \|_a \quad \text{for all } u \in \RR^n.

(See, e.g., Kreyszig (1978).)

The next exercise tells us that pointwise convergence and norm convergence are the same thing in finite dimensions.

Recall that a set CRnC \subset \RR^n is called bounded if there exists an MNM \in \NN with xM\|x\| \leq M for all xCx \in C; and closed in Rn\RR^n if, for all uRnu \in \RR^n and sequences (um)C(u_m) \subset C such that umuu_m \to u as mm \to \infty, we also have uCu \in C. A set GRnG \subset \RR^n is called open in Rn\RR^n if GcG^c is closed in Rn\RR^n. A set NN is called a neighborhood of uRnu \in \RR^n if there exists an open set GRnG \subset \RR^n with uGNu \in G \subset N. A map TT from URnU \subset \RR^n to Rk\RR^k is called continuous at uUu \in U if TumTuTu_m \to Tu for any (um)U(u_m) \subset U with umuu_m \to u; and continuous if TT is continuous at every uUu \in U. These notions apply to any norm, since convergence does not depend on our choice of norm.

1.2.1.4Matrices and Neumann Series

Next, we discuss geometric series in matrix space, along with the Neumann series lemma, one of many useful results in applied and numerical analysis.

Before starting we recall that if A=(aij)A = (a_{ij}) is an n×nn \times n matrix with i,ji,j-th element aija_{ij}, then the definition of matrix multiplication tells us that for uRnu \in \RR^n, the ii-th element of AuAu is j=1naijuj\sum_{j=1}^n a_{ij}u_j, while the jj-th element of uAu^\top A is i=1naijui\sum_{i=1}^n a_{ij}u_i. Think of uAuu \mapsto Au and uuAu \mapsto u^\top A is two different mappings, each of which takes an nn-vector and produces a new nn-vector.

Just as we considered norms of vectors in Section 1.2.1.2, we will find it helpful to have a notion of norms of matrices. A real-valued map defined on Rn×n\RR^{n \times n}, the set of real n×nn \times n matrices, is called a matrix norm if it has the following properties: for any αR\alpha \in \RR and any n×nn\times n matrices A,BA, B,

  1. A0\| A \| \geq 0,

  2. A=0    A=0\| A \| =0 \iff A=0,

  3. αA=αA\| \alpha A \| = |\alpha| \| A\|,

  4. A+BA+B\| A + B \| \leq \| A \| + \| B \|, and

These are called nonnegativity, positive definiteness, absolute homogeneity, and the triangle inequality, analogous to the norms on Rn\RR^n discussed in Section 1.2.1.2.

An example of a matrix norm is the so-called operator norm

Bomaxu=1Bu.\| B \|_o \coloneq \max_{\|u\| = 1} \| B u \|.

Here BB is n×nn \times n, uu is in Rn\RR^n and the norm on the right-hand side is the Euclidean norm over the nn-vector BuB u. Another example of a matrix norm is the supremum norm defined as

Bmax1i,jnbij, where bij is the i,j-th element of B.\| B \|_\infty \coloneq \max_{1 \leq i, j \leq n} |b_{ij}|, \quad \text{ where } b_{ij} \text{ is the } i,j \text{-th element of } B.

Some matrix norms have the submultiplicative property, which means that, for all A,BRn×nA, B \in \RR^{n \times n}, we have ABAB\| A B \| \leq \|A \| \|B\|.

In what follows we often use the operator norm as our choice of matrix norm (partly because of its attractive submultiplicative property). Hence, by convention, an expression such as A\| A\| refers to the operator norm Ao\|A\|_o of AA.

Analogous to the vector case, we say that a sequence (Ak)(A_k) of n×nn \times n matrices converges to an n×nn \times n matrix AA and write AkAA_k \to A if AkA0\| A_k - A \| \to 0 as kk \to \infty. Just as with vectors, this form of norm convergence holds if and only if each element of AkA_k converges to the corresponding element of AA. The proof is similar to the solution to Exercise 1.11.

If AA is an n×nn \times n matrix, then λC\lambda \in \CC is called an eigenvalue of AA if there exists a nonzero eCne \in \CC^n such that Ae=λeAe = \lambda e. (Here C\CC is the set of complex numbers and Cn\CC^n is the set of complex nn-vectors.) A vector ee satisfying this equality is called an eigenvector of AA and (λ,e)(\lambda, e) is called an eigenpair.

In Julia, we can compute the eigenvalues of a square matrix AA via eigvals(A). The code

using LinearAlgebra
A = [0 -1;
     1  0]
println(eigvals(A))

produces

2-element Vector{ComplexF64}:
 0.0 - 1.0im
 0.0 + 1.0im

Here im stands for ii, the imaginary unit, so the eigenvalues of AA are i-i and ii.

Turning to geometric series, let us begin in one dimension. Consider the one-dimensional linear equation u=au+bu = au + b, where a,ba, b are given and uu is unknown. Its solution uu^* satisfies

a<1    u=b1a=k0akb.|a| < 1 \quad \implies \quad u^* = \frac{b}{1-a} = \sum_{k \geq 0} a^k b.

This scalar result extends naturally to vectors. To show this we suppose that uu and bb are column vectors in Rn\RR^n, and that AA is an n×nn \times n matrix. We consider the vector equation u=Au+bu = A u + b. For the next result, we recall that the spectral radius of AA is defined as

ρ(A)max{λ:λ is an eigenvalue of A}\rho(A) \coloneq \max\setntn{|\lambda|}{\lambda \text{ is an eigenvalue of } A}

Here λ|\lambda| indicates the modulus of complex number λ\lambda.

With II as the n×nn \times n identity matrix, we can state the following result.

It follows directly that the vector system u=Au+bu = A u + b has a unique solution u=(IA)1b=k0Akbu^* = (I - A)^{-1} b = \sum_{k \geq 0} A^k b whenever ρ(A)<1\rho(A) < 1. This is the multivariate extension of (1.19).

The code in Listing 2 shows how to compute the spectral radius of an arbitrary matrix AA in Julia. The print statement produces 0.5828, so, for this matrix, ρ(A)<1\rho(A)<1.

1
2
3
4
5
using LinearAlgebra                         
ρ(A) = maximum(abs(λ) for λ in eigvals(A))  # Spectral radius
A = [0.4 0.1;                               # Test with arbitrary A
     0.7 0.2]
print(ρ(A))

Program 2:Computing a spectral radius (compute_spec_rad.jl)

The rest of this section works through the proof of the Neumann series lemma, with several parts left as exercises. An informal proof of the lemma runs as follows. If Sk0AkS \coloneq \sum_{k \geq 0} A^k, then

I+AS=I+Ak0Ak=I+A+A2+=S.I + AS = I + A \sum_{k \geq 0} A^k = I + A + A^2 + \cdots = S.

Rearranging I+AS=SI + AS = S gives S=(IA)1S = (I - A)^{-1}, which matches the claim in the Neumann series lemma.

This informal argument lacks rigor. To make it rigorous, we must prove (a) that the sum k0Ak\sum_{k \geq 0} A^k converges and (b) that the matrix IAI-A is invertible.

A proof of Lemma 1.2.2 can be found in Chapter 12 of Bollobás (1999). The second result is sometimes called Gelfand’s formula.

From this last result, one can show that (IA)1(I-A)^{-1} exists by computing it:

Listing 3 helps illustrate the result in Exercise 1.17, although we truncate the infinite sum k0Ak\sum_{k \geq 0} A^k at 50.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# Primitives
A = [0.4 0.1;
     0.7 0.2]

# Method one: direct inverse
B_inverse = inv(I - A)

# Method two: power series
function power_series(A)
    B_sum = zeros((2, 2))
    A_power = I
    for k in 1:50
        B_sum += A_power
        A_power = A_power * A
    end
    return B_sum
end

# Print maximal error
print(maximum(abs.(B_inverse - power_series(A))))

Program 3:Matrix inversion versus power series (power_series.jl)

The output 5.621e-12 is close enough to zero for many practical purposes.

1.2.2Nonlinear Systems

While the Neumann series lemma is a powerful tool for solving linear systems, it doesn’t help us with nonlinear problems. In this section, we present Banach’s fixed-point theorem, one of a variety of techniques for handling nonlinear systems. (Chapter 2 introduces other methods.)

1.2.2.1Fixed Points

A standard approach to solving an equation is to formulate it as a fixed-point problem. This section provides the basic definitions and some simple results from fixed-point theory.

Let UU be any nonempty set. We call TT a self-map on UU if TT is a function from UU into itself. For a self-map TT on UU, a point uUu^* \in U is called a fixed point of TT in UU if Tu=uT u^* = u^*. (In fixed-point theory, it is common to write TuT u for the image of uu under TT, rather than T(u)T(u).)

Figure Figure 1.5 shows another example, for a self-map TT on U[0,2]U \coloneq [0, 2]. Fixed points are numbers u[0,2]u \in [0, 2] where TT meets the 45-degree line. In this case there are three.

Graph and fixed points of T \colon u \mapsto 2.125/(1 + u^{-4})

Figure 1.5:Graph and fixed points of T ⁣:u2.125/(1+u4)T \colon u \mapsto 2.125/(1 + u^{-4})

When considering fixed points, given a self-map TT on UU, we typically seek conditions on TT and UU under which the following properties hold:

1.2.2.2Global Stability

A self-map TT on UU is called globally stable on UU if TT has a unique fixed point uu^* in UU and TkuuT^k u \to u^* as kk \to \infty for all uUu \in U. Here TkT^k indicates kk compositions of TT with itself. Global stability is a desirable property in the setting of dynamic programming. A number of our results rely on it.

Let TT be a self-map on URnU \subset \RR^n. We call TT invariant on CUC \subset U and call CC an invariant set if TT is also a self-map on CC; that is, if uCu \in C implies TuCTu \in C.

1.2.2.3Banach’s Fixed-Point Theorem

Next, we present the Banach fixed-point theorem, a workhorse for analyzing nonlinear operators.

Let UU be a nonempty subset of Rn\RR^n and let \| \cdot \| be a norm on Rn\RR^n. A self-map TT on UU is called a contraction on UU with respect to \| \cdot \| if there exists a λ<1\lambda < 1 such that

TuTvλuvfor allu,vU.\| Tu - Tv \| \leq \lambda \| u - v \| \quad \text{for all} \quad u, v \in U.

The constant λ\lambda is called the modulus of contraction.

The following theorem features a contraction.

We prove Theorem 1.2.3 in stages that build on the following exercises.

A fundamental property of Rn\RR^n is that if (vm)(v_m) is a Cauchy sequence in Rn\RR^n, then there exists a vˉRn\bar v \in \RR^n such that (vm)(v_m) converges to vˉ\bar v. (This property is called completeness of the vector space Rn\RR^n. See, for example, Çınlar & Vanderbei (2013).) Hence it follows from Exercise 1.25 that (um)(u_m) has a limit uRnu^* \in \RR^n.

1.2.3Successive Approximation

Consider a self-map TT on URnU \subset \RR^n. We seek algorithms that compute fixed points of TT whenever they exist.

1.2.3.1Iteration

If TT is globally stable on UU, then a natural algorithm for approximating the unique fixed point uu^* of TT in UU is to pick any uUu \in U and iterate with TT for some finite number of steps:

By the definition of global stability, (uk)k0(u_k)_{k \geq 0} converges to uu^*. The algorithm just described is called either successive approximation or fixed-point iteration. Listing 4 provides a function that implements this procedure. Distances between points are measured with the \ell_\infty norm.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
"""
Computes an approximate fixed point of a given operator T 
via successive approximation.

"""
function successive_approx(T,                  # operator (callable)
                           u_0;                # initial condition
                           tolerance=1e-6,     # error tolerance
                           max_iter=10_000,    # max iteration bound
                           print_step=25)      # print at multiples
    u = u_0
    error = Inf
    k = 1

    while (error > tolerance) & (k <= max_iter)
        
        u_new = T(u)
        error = maximum(abs.(u_new - u))

        if k % print_step == 0
            println("Completed iteration $k with error $error.")
        end

        u = u_new
        k += 1
    end

    if error <= tolerance
        println("Terminated successfully in $k iterations.")
    else
        println("Warning: hit iteration bound.")
    end

    return u
end

Program 4:Successive approximation (s_approx.jl)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
include("s_approx.jl")
using LinearAlgebra

# Compute the fixed point of Tu = Au + b via linear algebra
A, b = [0.4 0.1; 0.7 0.2], [1.0; 2.0]
u_star = (I - A) \ b  # compute (I - A)^{-1} * b

# Compute the fixed point via successive approximation
T(u) = A * u + b
u_0 = [1.0; 1.0]
u_star_approx = successive_approx(T, u_0)

# Test for approximate equality (prints "true")
print(isapprox(u_star, u_star_approx, rtol=1e-5))

Program 5:Using successive approximations to compute uu^* (linear_iter.jl)

Listing 5 applies successive approximation to the map Tu=Au+bTu = Au + b using the function defined in s_approx.jl. Figure Figure 1.6 shows the sequence of iterates generated by four runs of the successive approximation algorithm, each with a different starting condition u0u_0. The map and parameters are the same as in Listing 5. It is clear from the figure that a good choice of initial condition (i.e., one that is close to the fixed point) accelerates convergence.

Successive approximations from different initial conditions

Figure 1.6:Successive approximations from different initial conditions

Of course for Tu=Au+bTu = Au + b with ρ(A)<1\rho(A)<1, there is a more direct method to compute the fixed point: The Neumann series lemma tells us that u=(IA)1bu^* = (I-A)^{-1} b so we can apply a numerical linear equation solver. However, even for this case, sometimes successive approximation is used instead. One reason is that (IA)1(I-A)^{-1} can be very large, making application of a linear solver problematic. Another is that we might be satisfied with a quick approximation of the fixed point, computed with a few iterations of TT. Both of these situations can arise in dynamic programming.

1.2.3.2A One-Dimensional Example

To illustrate successive approximations in a nonlinear setting, we use the Solow--Swan growth model, which is a good place to begin presenting a theory of economic growth. A fixed point for the Solow--Swan model can be computed with pencil and paper. The model also provides a good laboratory for studying how successive approximations might converge to a fixed point.

One version of the Solow--Swan growth dynamics is

kt+1=sf(kt)+(1δ)kt,t=0,1,,k_{t+1} = s f(k_t) + (1 - \delta) k_t, \qquad t = 0, 1, \ldots,

where ktk_t is capital stock per worker, f ⁣:(0,)(0,)f \colon (0, \infty) \to (0, \infty) is a production function, s>0s > 0 is a saving rate and δ(0,1)\delta \in (0,1) is a rate of depreciation. If we set g(k)sf(k)+(1δ)kg(k) \coloneq sf(k) + (1-\delta)k, then iterating with gg from a starting point k0k_0 (i.e., setting kt+1=g(kt)k_{t+1} = g(k_t) for all t0t \geq 0) generates the sequence in (1.31). We can also understand this process as using successive approximation to compute the fixed point of gg.

Although the model specified in Exercise 1.28 does not generate a contraction, it is globally stable. The next exercise asks you to prove this.

Figure Figure 1.7 illustrates the dynamics in a 45-degree diagram when f(k)=Akαf(k) = A k^\alpha. In the top subfigure, A=2.0A=2.0, α=0.3\alpha=0.3, s=0.3s=0.3 and δ=0.4\delta=0.4. The function gg is plotted alongside the 45-degree line. When g(kt)g(k_t) lies strictly above the 45-degree line, then kt+1=g(kt)>ktk_{t+1} = g(k_t) > k_t and so capital per worker rises. If g(kt)<ktg(k_t) < k_t then it falls. A trajectory (kt)t0(k_t)_{t \geq 0} that is produced by starting from a particular choice of k0k_0 is traced out in the figure.

Successive approximation for the Solow--Swan model

Figure 1.7:Successive approximation for the Solow--Swan model

The figure illustrates that kk^* is the unique fixed point of gg in UU and all sequences converge to it. The second statement can be rephrased as: successive approximation successfully computes the fixed point of gg by stepping through the time path of capital.

1.2.4Finite-Dimensional Function Space

In Paragraph we introduced a Bellman equation for the infinite horizon job search problem. The unknown object in the Bellman equation is a function vv^* defined on the set W\Wsf of possible wage offers. Below we discuss how to solve for this unknown function.

Since the set of wage offers is finite we can write W\Wsf as {w1,,wn}\{w_1, \ldots, w_n\} for some nNn \in\NN. If we adopt this convention and also write v(wi)v^*(w_i) as viv^*_i, then we can view vv^* as a vector (v1,,vn)(v^*_1, \ldots, v^*_n) in Rn\RR^n. The vector interpretation is useful when coding, since vectors (numerical arrays) are an efficient data type.

Nevertheless, for mathematical exposition, we usually find it more convenient to express function-like objects (e.g., value functions) as functions rather than vectors. Thus, we typically write v(w)v^*(w) instead of viv^*_i.

Section Section 1.2.4.1 clarifies our notation with respect to functions and vectors.

1.2.4.1Pointwise Operations on Functions

If X\Xsf is any set and uu maps X\Xsf to R\RR, then we call uu a real-valued function on X\Xsf and write u ⁣:XRu \colon \Xsf \to \RR. Throughout, the symbol RX\RR^\Xsf denotes the set of all real-valued functions on X\Xsf. This is a special case of the symbol BAB^A that represents the set of all functions from AA to BB, where AA and BB are sets.

If u,vRXu, v \in \RR^\Xsf and α,βR\alpha, \beta \in \RR, then the expressions αu+βv\alpha u + \beta v and uvuv also represent elements of RX\RR^\Xsf, defined at xXx \in \Xsf by

(αu+βv)(x)=αu(x)+βv(x)and(uv)(x)=u(x)v(x).(\alpha u + \beta v)(x) = \alpha u(x) + \beta v(x) \quad \text{and} \quad (uv)(x) = u(x)v(x).

Similarly, u|u|, uvu \vee v, and uvu \wedge v are real-valued functions on X\Xsf defined by

u(x)=u(x),(uv)(x)=u(x)v(x)     and     (uv)(x)=u(x)v(x).|u|(x) = |u(x)|, \quad (u \vee v)(x) = u(x) \vee v(x) \;\; \text{ and } \;\; (u \wedge v)(x) = u(x) \wedge v(x).

Figure Figure 1.8 illustrates functions uvu \vee v and uvu \wedge v when X\Xsf is a subset of R\RR.

Functions u \vee v and u \wedge v

Figure 1.8:Functions uvu \vee v and uvu \wedge v

Similarly, if u=(ui)i=1nu = (u_i)_{i=1}^n and v=(vi)i=1nv = (v_i)_{i=1}^n are vectors in Rn\RR^n, then

u(ui)i=1n,uv(uivi)i=1nanduv(uivi)i=1n.|u| \coloneq (|u_i|)_{i=1}^n, \quad u \wedge v \coloneq (u_i \wedge v_i)_{i=1}^n \quad \text{and} \quad u \vee v \coloneq (u_i \vee v_i)_{i=1}^n.

Figure Figure 1.9 illustrates in R2\RR^2.

The vectors u \vee v and u \wedge v in \RR^2

Figure 1.9:The vectors uvu \vee v and uvu \wedge v in R2\RR^2

1.2.4.2Functions versus Vectors

Let X\Xsf be finite, so that X={x1,,xn}\Xsf = \{x_1, \ldots, x_n\} for some nNn \in \NN. The set RX\RR^\Xsf is the vector space Rn\RR^n expressed in different notation. The next lemma clarifies.

The claim in Lemma 1.2.4 is obvious: a real-valued function uu on X\Xsf is uniquely identified by the set of values that it takes on X\Xsf, which is an nn-tuple of real numbers.

Throughout the text, whenever the supporting set X\Xsf is finite, we freely use the identification in (1.39). For example, if \| \cdot \| is any norm on Rn\RR^n, then \| \cdot \| extends to RX\RR^\Xsf via the identification in (1.39). That is, for uRXu \in \RR^\Xsf, the value u\| u \| is given by the norm of the vector (u(x1),,u(xn))Rn(u(x_1), \ldots, u(x_n)) \in \RR^n.

We say that a subset of RX\RR^\Xsf is closed (resp., open, compact, etc.) if the corresponding subset of Rn\RR^n is closed (resp., open, compact, etc.)

With these conventions, the Neumann series lemma and Banach’s contraction mapping theorem extend directly from Rn\RR^n to RX\RR^\Xsf. For example, if X=n|\Xsf|=n, CC is closed in RX\RR^\Xsf and TT is a contraction on CRXC \subset \RR^\Xsf, in the sense that T ⁣:CCT \colon C \to C and

 there exists a λ[0,1)  s.t. TfTgλfgfor allf,gC,\text{ there exists a } \lambda \in [0, 1) \ \st \quad \| Tf - Tg \| \leq \lambda \| f - g \| \quad \text{for all} \quad f, g \in C,

then TT has a unique fixed point ff^* in CC and

Tnffλnfffor all nN and fRX.\| T^n f - f^* \| \leq \lambda^n \| f - f^* \| \quad \text{for all } n \in \NN \text{ and } f \in \RR^\Xsf.

Incidentally, in the preceding paragraph TT is a function that sends functions into functions (e.g., sends ff into TfTf). To help distinguish TT from the functions that it acts on, TT in this setting is often called an operator rather than a function. This is a convention rather than a formal distinction: from a mathematical perspective, an operator is just a function.

A foundational class of operators acting on RX\RR^\Xsf is the set of linear operators. There is a strong sense in which linear operators are just matrices. We investigate these ideas in Section 2.3.3. At the same time, when studying dynamic programming we also use many operators that are not linear. One example is the “Bellman operator,” which we start to investigate in Section 1.3.1.2.

1.2.4.3Distributions

Given a set X\Xsf with nn elements, the set of probability distributions on X\Xsf is written as D(X)\dD(\Xsf) and contains all ϕR+X\phi \in \RR_+^\Xsf with xXϕ(x)=1\sum_{x \in \Xsf} \phi(x) =1. Since we can identify any fRXf \in \RR^\Xsf with a corresponding vector in Rn\RR^n, the set D(X)\dD(\Xsf) can also be thought of as a subset of Rn\RR^n. This collection of vectors (i.e., the nonnegative vectors that sum to unity) is also called the unit simplex. Given X0X\Xsf_0 \subset \Xsf and ϕD(X)\phi \in \dD(\Xsf), we say that ϕ\phi is supported on X0\Xsf_0 if ϕ(x)>0\phi(x) > 0 implies xX0x \in \Xsf_0.

Fix hRXh \in \RR^\Xsf and ϕD(X)\phi \in \dD(\Xsf). Let XX be a random variable with distribution ϕ\phi, so that P{X=x}=ϕ(x)\PP\{X = x\} = \phi(x) for all xXx \in \Xsf. The expectation of h(X)h(X) is

Eh(X)xXh(x)ϕ(x)=h,ϕ.\EE h(X) \coloneq \sum_{x \in \Xsf} h(x) \phi(x) = \inner{h, \phi}.

If XR\Xsf \subset \RR, then the cumulative distribution function (CDF) corresponding to ϕ\phi is the map Φ\Phi from X\Xsf to R\RR given by

Φ(x)P{Xx}=xX1{xx}ϕ(x).\Phi(x) \coloneq \PP\{X \leq x\} = \sum_{x' \in \Xsf} \1\{x' \leq x\} \phi(x').

If τ[0,1]\tau \in [0,1], then the τ\tau-th quantile of XX is

QτXmin{xX:Φ(x)τ}.Q_\tau \,X \coloneq \min \setntn{x \in \Xsf}{\Phi(x) \geq \tau}.

If τ=1/2\tau = 1/2, then QτXQ_\tau \,X is called the median of XX.

Evidently, if the median of XX is xx, then the median of X+αX + \alpha will be x+αx + \alpha. This same logic carries over to arbitrary quantiles, as the next exercise asks you to show.

1.3Infinite-Horizon Job Search

Armed with fixed-point methods, we return to the job search problem discussed in Section 1.1.2.

1.3.1Values and Policies

In this section, we solve for the value function of an infinite horizon job search problem and associated optimal choices.

1.3.1.1Optimal Choices

Let’s recall the strategy for solving the infinite-horizon job search problem we proposed in Paragraph. The first step is to compute the optimal value function vv^* that solves the Bellman equation

v(w)=max{w1β,c+βwWv(w)ϕ(w)}(wW).v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \, \sum_{w' \in \Wsf} \, v^*(w') \phi(w') \right\} \qquad (w \in \Wsf).

Suppose for a moment that we can compute vv^*, and let

hc+βwv(w)ϕ(w)h^* \coloneq c + \beta \sum_{w'} v^*(w') \phi(w')

be the infinite-horizon continuation value that equals the maximal lifetime value that the worker can receive, contingent on deciding to continue being unemployed today.

With hh^* in hand, the optimal decision at any given time, facing the current wage draw wWw \in \Wsf, is as follows:

  1. If w/(1β)hw / (1-\beta) \geq h^*, then accept the job offer.

  2. If not, then reject and wait for the next offer.

This decision maximizes lifetime value given the current offer.

(Later we will prove that this decision process is optimal as claimed. For now, however, we focus on computing vv^* and hh^*.)

1.3.1.2The Bellman Operator

The method proposed in Section 1.3.1.1 requires that we solve for vv^*. To do so, we introduce a Bellman operator TT defined at vRWv \in \RR^\Wsf that is constructed to assure that any fixed point of TT solves the Bellman equation and vice versa:

(Tv)(w)=max{w1β,c+βwWv(w)ϕ(w)}(wW).(Tv)(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \sum_{w' \in \Wsf} v(w') \phi(w') \right\} \qquad (w \in \Wsf).

Let VR+W\vV \coloneq \RR^\Wsf_+ and let \| \cdot \|_\infty be the supremum norm on V\vV. We measure the distance between two elements f,gf, g of V\vV by $| f

Now we turn to the proof of Proposition 1.3.1. An implication of the proposition is that TkvvT^k v \to v^* as kk \to \infty for any vVv \in \vV, so we can compute vv^* to any required degree of accuracy by successive approximation.

Our proof of Proposition 1.3.1 uses the elementary bound

αxαyxy(α,x,yR)|\alpha \vee x - \alpha \vee y| \leq |x - y| \qquad (\alpha, x, y \in \RR)

1.3.1.3Optimal Policies

A dynamic program seeks optimal policies. We briefly introduce the notion of a policy and relate it to the job search application.

In general, for a dynamic program, choices by the controller aim to maximize lifetime rewards and consist of a state-contingent sequence (At)t0(A_t)_{t \geq 0} specifying how the agent acts at each point in time. Workers do not know what the future will bring, so it is natural to assume that AtA_t can depend on present and past events but not future ones. Hence AtA_t is a function of the current state XtX_t and past state-action pairs (Ati,Xti)(A_{t-i}, X_{t-i}) for i1i \geq 1. That is,

At=σt(Xt,At1,Xt1,At2,Xt2,,A0,X0)A_t = \sigma_t( X_t, A_{t-1}, X_{t-1}, A_{t-2}, X_{t-2}, \ldots, A_0, X_0)

for some function σt\sigma_t; σt\sigma_t is called a time tt policy function.

A key insight of dynamic programming is that some problems can be set up so that the optimal current action can be expressed as a function of the current state XtX_t.

If the current state XtX_t is enough to determine a current optimal action, then policies are just maps from states to actions. So we can write At=σ(Xt)A_t = \sigma(X_t) for some function σ\sigma. A policy function that depends only on the current state is often called a Markov policy. Since all policies we consider will be Markov policies, we refer to them more concisely as “policies.”

In the job search model, the state is the current wage offer and possible actions are to accept or to reject the current offer. With 0 interpreted as reject and 1 understood as accept, the action space is {0,1}\{0,1\}, so a policy is a map σ\sigma from W\Wsf to {0,1}\{0,1\}. Let Σ\Sigma be the set of all such maps.

A policy is an “instruction manual”: for an agent following σΣ\sigma \in \Sigma, if current wage offer is ww, the agent always responds with σ(w){0,1}\sigma(w) \in \{0, 1\}. The policy dictates whether the agent accepts or rejects at any given wage.

For each vVv \in \vV, a vv-greedy policy is a σΣ\sigma \in \Sigma satisfying

σ(w)=1{w1βc+βwWv(w)ϕ(w)}for all wW.\sigma(w) = \1 \left\{ \frac{w}{1-\beta} \geq c + \beta \, \sum_{w' \in \Wsf} v(w') \phi(w') \right\} \quad \text{for all } w \in \Wsf.

Equation (1.54) says that an agent accepts if w/(1β)w/(1-\beta) exceeds the continuation value computed using vv and rejects otherwise. Our discussion of optimal choices in Section 1.3.1.1 can now be summarized as the recommendation

Adopt a v-greedy policy.\text{Adopt a } v^* \text{-greedy policy.}

This statement is sometimes called Bellman’s principle of optimality.

Inserting vv^* into (1.54) and rearranging, we can express a vv^*-greedy policy via

σ(w)=1{ww}where   w(1β)h.\sigma^*(w) = \1 \left\{ w \geq w^* \right\} \quad \text{where } \; w^* \coloneq (1 - \beta) h^* .

The quantity ww^* in (1.56) is called the reservation wage, and parallels the reservation wage that we introduced for the finite-horizon problem. Equation (1.56) states that value maximization requires accepting an offer if and only if it exceeds the reservation wage. Thus, ww^* provides a scalar description of an optimal policy.

1.3.2Computation

Let’s turn to computation. In Section 1.3.2.1, we apply a standard dynamic programming method, called value function iteration. In Section 1.3.2.2, we apply a more specialized method that uses the structure of the job search problem to accelerate computation.

1.3.2.1Value Function Iteration

Recall that, by Proposition 1.3.1, we can compute an approximate optimal policy by applying successive approximation via the Bellman operator. In the language of dynamic programming, this is called value function iteration. Algorithm 1.3 provides a full description.

While TkvT^k v rarely attains vv^* for k<k < \infty, we can obtain a close approximation by monitoring distances between successive iterates, waiting until they become small enough. Later we will study how these distances depend on kk, the number of iterations, as well as on parameters defining rewards and opportunities.

Listing 6 implements value function iteration for the infinite-horizon job search model, using the function for successive approximation from Listing 4.

Figure Figure 1.10 shows a sequence of iterates (Tkv)k(T^k v)_k when v0v \equiv 0 and parameters are as given in Listing 1. Iterates 0,10, 1, and 2 are shown, in addition to iterate 1000, which we take as a good approximation to the limiting function. If you experiment with different initial conditions, you will see that they all converge to the same limit.

A sequence of iterates of the Bellman operator

Figure 1.10:A sequence of iterates of the Bellman operator

Figure Figure 1.11 shows an approximation of vv^* computed using the code in Listing 6, along with the stopping reward w/(1β)w/(1-\beta) and the corresponding continuation value (1.46). As anticipated, the value function is the pointwise supremum of the stopping reward and the continuation value. The worker chooses to accept an offer only when that offer exceeds some value close to 43.5.

The approximate value function for job search

Figure 1.11:The approximate value function for job search

1.3.2.2Computing the Continuation Value Directly

The technique we employed to solve the job search model in Section 1.3.1 follows a standard approach to dynamic programming. But for this particular problem, there is an easier way to compute the optimal policy that sidesteps calculating the value function. This section explains how.

Recall that the value function satisfies the Bellman equation

v(w)=max{w1β,c+βwv(w)ϕ(w)}(wW),v^*(w) = \max \left\{ \frac{w}{1-\beta} ,\, c + \beta \sum_{w'} v^*(w') \phi(w') \right\} \qquad (w \in \Wsf),

and that the continuation value is given by (1.46). We can use hh^* to eliminate vv^* from (1.57). First we insert hh^* on the right-hand side of (1.57) and then we replace ww with ww', which gives v(w)=max{w/(1β),h}v^*(w') = \max \left\{ w'/(1-\beta) ,\, h^* \right\}. Then we take mathematical expectations of both sides, multiply by β\beta and add cc to obtain

h=c+βwmax{w1β,h}ϕ(w).h^* = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1-\beta} ,\, h^* \right\} \phi(w').

To obtain the unknown value hh^*, we introduce the mapping g ⁣:R+R+g \colon \RR_+ \to \RR_+ defined by

g(h)=c+βwmax{w1β,h}ϕ(w).g(h) = c + \beta \sum_{w'} \max \left\{ \frac{w'}{1-\beta} ,\, h \right\} \phi(w').

By construction, hh^* solves (1.58) if and only if hh^* is a fixed point of gg.

Figure Figure 1.12 shows the function gg using the discrete wage offer distribution and parameters as adopted previously. The unique fixed point is hh^*.

Computing the continuation value as the fixed point of g

Figure 1.12:Computing the continuation value as the fixed point of gg

Exercise 1.33 implies that we can compute hh^* by choosing arbitrary hR+h \in \RR_+ and iterating with gg. Doing so produces a value of approximately 1086. (The associated reservation wage is w=(1β)h43.4w^* = (1-\beta) h^* \approx 43.4.) Computation of hh^* using this method is much faster than value function iteration because the fixed-point problem is in R+\RR_+ rather than R+n\RR^n_+.

With hh^* in hand, we have solved the dynamic programming problem, since a policy σ\sigma^* is vv^*-greedy if and only if it satisfies

σ(w)=1{w1βh}(wR+).\sigma^*(w) = \1 \left\{ \frac{w}{1-\beta} \geq h^* \right\} \qquad (w \in \RR_+).

1.4Chapter Notes

Dynamic programming is often attributed to Richard Bellman (1920--1984). Both the term “dynamic programming” and the technique were popularized by Bellman (1957). According to his autobiography, Bellman chose the name dynamic programming to avoid giving the impression that he was conducting mathematical research within RAND Corporation. His ultimate boss, Secretary of Defense Charles Wilson, apparently disliked such research Bellman, 1984.

For treatments of dynamic programming from the perspective of economics and finance, see, for example, Sargent (1987), Stokey & Lucas (1989), Van & Dana (2003), Bäuerle & Rieder (2011), or Stachurski (2022).

The job search model was introduced by McCall (1970). The McCall model and its extensions transformed economists’ way of thinking about labor markets (see, e.g., Lucas (1978)). Influential extensions to the job search model include Burdett (1978), Jovanovic (1979), Pissarides (1979), Jovanovic (1984), Mortensen (1986), Ljungqvist (2002) and Chetty (2008). Rogerson et al. (2005) provides a useful survey.

For elementary real analysis, the book by Bartle & Sherbert (2011) is excellent. Ok (2007) is a superb treatment of real analysis and how it is used throughout economic theory. Discussions of Banach’s theorem and the Neumann series lemma can be found in Cheney (2013) and Atkinson & Han (2005). Rocha & Vailakis (2010) provides an extension to Banach’s theorem that requires only local contractivity.

Footnotes
  1. The procedure of solving the last period first and then working back in time is called backward induction. Starting with the last period makes sense because there is no future to consider.

  2. Incidentally, imposing an infinite horizon is not the same as assuming humans live forever. Rather, it corresponds to the idea that humans have no specific “termination” date. More generally, we can understand an infinite horizon as an approximation to a finite horizon in which observations are recorded at relatively high frequency and no clear termination date exists.

  3. Hint: To prove that AA is invertible and B=A1B = A^{-1}, it suffices to show that AB=IAB = I.

References
  1. McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics, 84(1), 113–126.
  2. Sargent, T., & Stachurski, J. (2023). Economic Networks: Theory and Computation. Cambridge University Press.
  3. Kreyszig, E. (1978). Introductory Functional Analysis with Applications (Vol. 1). Wiley New York.
  4. Bollobás, B. (1999). Linear Analysis: An Introductory Course. Cambridge University Press.
  5. Çınlar, E., & Vanderbei, R. J. (2013). Real and Convex Analysis. Springer Science & Business Media.
  6. Bellman, R. (1957). Dynamic Programming. In Science. American Association for the Advancement of Science.
  7. Bellman, R. (1984). Eye of the Hurricane. World Scientific.
  8. Sargent, T. (1987). Dynamic Macroeconomic Theory. Harvard University Press.
  9. Stokey, N., & Lucas, R. (1989). Recursive Methods in Dynamic Economics. Harvard University Press.
  10. Van, C., & Dana, R.-A. (2003). Dynamic Programming in Economics. Springer.
  11. Bäuerle, N., & Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer Science & Business Media.
  12. Stachurski, J. (2022). Economic Dynamics: Theory and Computation (2nd ed.). MIT Press.
  13. Lucas, R. E. (1978). Unemployment policy. American Economic Review, 68(2), 353–357.
  14. Burdett, K. (1978). A theory of employee job search and quit rates. American Economic Review, 68(1), 212–220.
  15. Jovanovic, B. (1979). Firm-specific capital and turnover. Journal of Political Economy, 87(6), 1246–1260.