The temporal structure of a typical dynamic program is
The state is a vector listing current values of variables deemed relevant to choosing the current action. The action is a vector describing choices of a set of decision variables. If , then the problem has a finite horizon. Otherwise it is an infinite horizon problem. Figure Figure 1.1 illustrates the first two rounds of a dynamic program. As shown in the figure, a rule for updating the state depends on the current state and action.
Figure 1.1:A dynamic program
Dynamic programming provides a way to maximize the expected lifetime reward of a decision-maker who receives a prospective reward sequence and who confronts a system that maps today’s state and control into the next period’s state. A lifetime reward is an aggregation of the individual period rewards into a single value. An example of lifetime reward is an expected discounted sum for some .
Dynamic programming has a vast array of applications, from robotics and artificial intelligence to the sequencing of DNA. Dynamic programming is used every day to control aircraft, route shipping, test products, recommend information on media platforms, and solve research problems. Some companies produce specialized computer chips that are designed for specific dynamic programs.
Within economics and finance, dynamic programming is applied to topics including unemployment, monetary policy, fiscal policy, asset pricing, firm investment, wealth dynamics, inventory control, commodity pricing, sovereign default, the division of labor, natural resource extraction, human capital accumulation, retirement decisions, portfolio choice, and dynamic pricing. We discuss some of these applications in the rest of the book.
The core theory of dynamic programming is relatively simple and concise. But implementation can be computationally demanding. That situation provides one of the major challenges facing the field of dynamic programming.
In this book, we discuss fundamental theory, traditional economic applications, and recent applications with computationally demanding environments. We also cover recent trends towards more sophisticated specifications of lifetime rewards, often called recursive preferences. Throughout the book, theory and computation are combined, since, for interesting problems, brute-force computation is futile, while theory alone provides limited insights. The interplay between interesting applications, fundamental theory, computational methods, and evolving hardware capability makes dynamic programming exciting.
1.1Bellman Equations¶
In this section, we introduce the recursive structure of dynamic programming in a simple setting. After solving a finite-horizon model, we consider an infinite-horizon version and explain how it produces a system of nonlinear equations. Then we turn to methods for solving such systems.
1.1.1Finite-Horizon Job Search¶
We begin with a celebrated model of job search created by McCall (1970). McCall analyzed the decision problem of an unemployed worker in terms of current and prospective wage offers, impatience, and the availability of unemployment compensation. Here we study a simple version of the model in which essential ideas of dynamic programming are particularly clear.
Readers who are familiar with Bellman equations can skim this section quickly and proceed directly to Section Section 1.2.
1.1.1.1A Two-Period Problem¶
Imagine someone who begins her working life at time without employment. While unemployed, she receives a new job offer paying wage at each date . She can accept the offer and work permanently at that wage level or reject the offer, receive unemployment compensation , and draw a new offer next period. We assume that the wage offer sequence is iid and nonnegative, with distribution . In particular,
is a finite set of possible wage outcomes and
is a probability distribution on , assigning a probability to each possible wage outcome .
The worker is impatient. Impatience is parameterized by a time discount factor , so that the present value of a next-period payoff of dollars is . Since , the worker will be tempted to accept reasonable offers, rather than to wait for better ones. A key question is how long to wait.
Suppose as a first step that working life is just two periods. To solve our problem, we work backwards, starting at the final date , after has been observed.[1] If she is already employed, the worker has no decision to make: She continues working at her current wage. If she is unemployed, then she should take the largest of and .
Now we step back to . At this time, having received offer , the unemployed worker’s options are (a) accept and receive it in both periods or (b) reject it, receive unemployment compensation , and then, in the second period, choose the maximum of and .
Let’s assume that the worker seeks to maximize EPV. The EPV of option (a) is , which is also called the stopping value. The EPV of option (b), also called the continuation value, is . More explicitly,
The optimal choice at is now clear: Accept the offer if and reject otherwise. A decision tree is shown in Figure Figure 1.2.
Figure 1.2:Decision tree for a two-period problem
1.1.1.2Comments on Information¶
In determining the optimal choice, we assumed that the worker (a) cares about expected values and (b) knows how to compute them. In Chapter 7 and Chapter 8 we discuss how to extend or weaken these assumptions. Some of these extensions allow decision-makers to focus on measurements that differ from expected values. Other extensions assume that the decision-maker does not know underlying probability distributions. For now we put these issues aside and return to the setup discussed in Section 1.1.1.1.
1.1.1.3Value Functions¶
A key idea in dynamic programming is to use “value functions” to track maximal lifetime rewards from a given state at a given time. The time 2 value function defined in (1.2) returns the maximum value obtained in the final stage for each possible realization of the time 2 wage offer. The time 1 value function evaluated at is
It represents the present value of expected lifetime income after receiving the first offer , conditional on choosing optimally in both periods.

Figure 1.3:The value function and the reservation wage
The value function is shown in Figure Figure 1.3. This figure also shows the reservation wage
It is the that solves the indifference condition
and equates the value of stopping to the value of continuing. For an offer above , the stopping value exceeds the continuation value. For an offer below the reservation wage, the reverse is true. Hence, the optimal choice for the worker at is completely described by the reservation wage.
Parameters and functions underlying the figure are shown in Listing 1.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30using Distributions "Creates an instance of the job search model, stored as a NamedTuple." function create_job_search_model(; n=50, # wage grid size w_min=10.0, # lowest wage w_max=60.0, # highest wage a=200, # wage distribution parameter b=100, # wage distribution parameter β=0.96, # discount factor c=10.0 # unemployment compensation ) w_vals = collect(LinRange(w_min, w_max, n+1)) ϕ = pdf(BetaBinomial(n, a, b)) return (; n, w_vals, ϕ, β, c) end " Computes lifetime value at t=1 given current wage w_1 = w. " function v_1(w, model) (; n, w_vals, ϕ, β, c) = model h_1 = c + β * max.(c, w_vals)'ϕ return max(w + β * w, h_1) end " Computes reservation wage at t=1. " function res_wage(model) (; n, w_vals, ϕ, β, c) = model h_1 = c + β * max.(c, w_vals)'ϕ return h_1 / (1 + β) end
Program 1:Computing and (two_period_job_search.jl)
Equation (1.4) is instructive. We can see that higher unemployment compensation shifts up the continuation value and increases the reservation wage. As a result, the worker will, on average, spend more time unemployed when unemployment compensation is higher.
1.1.1.4Three Periods¶
Now let’s suppose that the worker works in period as well as . Figure Figure 1.4 shows the decision tree for the three periods. Notice that the subtree containing nodes 1 and 2 is just the decision tree for the two-period problem in Figure Figure 1.2. We will use this to find optimal actions.
Figure 1.4:Decision tree for the job seeker
At , the value of accepting the current offer is , while the maximal value of rejecting and waiting is plus, after discounting by , the maximum value that can be obtained by behaving optimally from . We have already calculated this value: It is just , as given in (1.3)!
The maximal time zero value is the maximum of the value of these two options, given , so we can write
By plugging from (1.3) into this expression, we can determine , as well as the optimal action, the one that achieves the largest value in the max term in (1.6).
Figure Figure 1.4 illustrates how the backward induction process works. The last-period value function is trivial to obtain. With in hand, we can compute . With in hand, we can compute . Once all the value functions are available, we can calculate whether to accept or reject the current offer at each point in time.
Notice how we subdivided the three-period problem down into a pair of two-period problems, given by (1.3) and (1.6). Breaking many-period problems down into a sequence of two-period problems is the essence of dynamic programming. The recursive relationships between and in (1.6), as well as between and in (1.3), are examples of what are called Bellman equations. We will see many other examples.
1.1.2Infinite Horizon¶
Next, we consider an infinite horizon problem that in some ways is more challenging but in other ways simpler. On the one hand, the lack of a terminal period means that backward induction requires a subtler justification. On the other hand, the infinite horizon means that the worker always faces an infinite future, so that we only have to study a single-value function and need not keep track of the number of remaining periods in the problem. This will become clearer as the section unfolds.[2]
With this discussion in mind, let us consider a worker who aims to maximize
where is earnings at time . As before, jobs are permanent, so accepting a job at a given wage means earning that wage in every subsequent period.
Let’s clarify our assumptions:
Here and in what follows, for any finite or countable set , the symbol indicates the set of distributions on .
As with the finite-state case, infinite-horizon dynamic programming involves a two-step procedure that first assigns values to states and then deduces optimal actions given those values. We begin with an informal discussion and then formalize the main ideas.
To trade off current and future rewards optimally, we need to compare the current payoffs we get from our two choices with the states that those choices lead to and the maximum value that can be extracted from those states. But how do we calculate the maximum value that can be extracted from each state when lifetime is infinite?
Consider first the present expected lifetime value of being employed with wage . This case is easy because, under the current assumptions, workers who accept a job are employed forever. Lifetime payoff is
How about the maximum present expected lifetime value attainable when entering the current period unemployed with wage offer in hand? Denote this (as yet unknown) value by . We call the value function. While is not trivial to pin down, the task is not impossible. Our first step in the right direction is to observe that it satisfies the Bellman equation
at every . (Here is the offer next period.)
Our reasoning is as follows: The first term inside the max operation is the stopping value, or lifetime payoff from accepting current offer . The second term inside the max operation is the continuation value, or current expected value of rejecting and behaving optimally thereafter. The maximal value is obtained by selecting the largest of these two alternatives.
Note the similarity between (1.9) and our finite horizon Bellman equations (1.3) and (1.6). The only real difference is that the value function is no longer time-dependent. This is because the worker always looks forward toward an infinite horizon, regardless of the current date.
Equation (1.9) is to be solved for a function , the set of all functions from to . Once we have solved for (assuming this is possible), optimal choices can be made by observing current and then choosing the largest of the two alternatives on the right-hand side of (1.9), just as we did in the finite horizon case. This idea -- that optimal choices can be made by computing the value function and maximizing the right-hand side of the Bellman equation -- is called Bellman’s principle of optimality, and will be a cornerstone of what follows. Later we prove it in a general setting.
To solve for , we use fixed-point theory, our topic in the next section. Later, in Section Section 1.3, we return to the job search problem and apply fixed-point theory to solve for .
1.2Stability and Contractions¶
In this section, we cover enough fixed-point theory to solve an infinite horizon job search problem. (In Chapter 2 we consider more general results.) Readers who are familiar with the Neumann series lemma and Banach’s fixed-point theorem can skim this section and proceed to Section 1.3.
1.2.1Vector Space¶
To begin, we recall some fundamental properties of real numbers, finite-dimensional vector space, basic topology, and equivalence of norms.
1.2.1.1Real and Complex Vectors¶
For the most part, we are interested in vectors whose elements are real numbers (as distinguished from complex numbers). Before investigating such vectors, let’s provide some useful language about the real line . (You might want to review some elementary concepts from real analysis in Appendix §Appendix A, such as suprema, infima, minima, maxima, and convergence.)
Given , let and . The absolute value of is defined as .
A real-valued vector is a finite real sequence with as the -th element. The set of all real vectors of length is denoted by . The inner product of -vectors and is .
The set of complex numbers is defined in the appendix to Sargent & Stachurski (2023) and many other places; as is the set of all complex-valued -vectors. We assume readers know what complex numbers are and how to compute the modulus of a complex number.
1.2.1.2Norms¶
The Euclidean norm on a real vector space is defined as
Because they provide more flexibility when checking conditions that underlie various results, some alternative norms on are important for applications of fixed-point theory.
As a first step, recall that a function is called a norm on if, for any and ,
2
and
(nonnegativity)
(positive definiteness)
(absolute homogeneity)
(triangle inequality)
The Euclidean norm on satisfies the Cauchy--Schwarz inequality
This inequality can be used to prove that the triangle inequality holds for the Euclidean norm (see, e.g., Kreyszig (1978)).
The norm and the Euclidean norm are special cases of the so-called norm, which is defined for by
It can be shown that is a norm for all , as suggested by the name (see, e.g., Kreyszig (1978)). For this norm, the subadditivity asserted in (d) is called Minkowski’s inequality.
Since the Euclidean case is obtained by setting , the Euclidean norm is also called the norm, and we write rather than when extra clarity is required.
(The symbol is used because, for all , we have as .)
For the next exercise, we recall that the indicator function of logical statement , denoted here by , takes value 1 (resp., 0) if is true (resp., false). For example, if , then
If , where is any set, then for all .
1.2.1.3Equivalence of Vector Norms¶
An important property of a finite-dimensional normed vector space is that all norms are “equivalent.” Let’s review this result and discuss why it matters.
To begin, recall that when and are all elements of , we say that converges to and write if
It might seem that this definition is imprecise. Don’t we need to clarify that the convergence is with respect to a particular norm?
No we don’t. This is because any two norms and on are equivalent in the sense that there exist finite positive constants such that
(See, e.g., Kreyszig (1978).)
The next exercise tells us that pointwise convergence and norm convergence are the same thing in finite dimensions.
Recall that a set is called bounded if there exists an with for all ; and closed in if, for all and sequences such that as , we also have . A set is called open in if is closed in . A set is called a neighborhood of if there exists an open set with . A map from to is called continuous at if for any with ; and continuous if is continuous at every . These notions apply to any norm, since convergence does not depend on our choice of norm.
1.2.1.4Matrices and Neumann Series¶
Next, we discuss geometric series in matrix space, along with the Neumann series lemma, one of many useful results in applied and numerical analysis.
Before starting we recall that if is an matrix with -th element , then the definition of matrix multiplication tells us that for , the -th element of is , while the -th element of is . Think of and is two different mappings, each of which takes an -vector and produces a new -vector.
Just as we considered norms of vectors in Section 1.2.1.2, we will find it helpful to have a notion of norms of matrices. A real-valued map defined on , the set of real matrices, is called a matrix norm if it has the following properties: for any and any matrices ,
,
,
,
, and
These are called nonnegativity, positive definiteness, absolute homogeneity, and the triangle inequality, analogous to the norms on discussed in Section 1.2.1.2.
An example of a matrix norm is the so-called operator norm
Here is , is in and the norm on the right-hand side is the Euclidean norm over the -vector . Another example of a matrix norm is the supremum norm defined as
Some matrix norms have the submultiplicative property, which means that, for all , we have .
In what follows we often use the operator norm as our choice of matrix norm (partly because of its attractive submultiplicative property). Hence, by convention, an expression such as refers to the operator norm of .
Analogous to the vector case, we say that a sequence of matrices converges to an matrix and write if as . Just as with vectors, this form of norm convergence holds if and only if each element of converges to the corresponding element of . The proof is similar to the solution to Exercise 1.11.
If is an matrix, then is called an eigenvalue of if there exists a nonzero such that . (Here is the set of complex numbers and is the set of complex -vectors.) A vector satisfying this equality is called an eigenvector of and is called an eigenpair.
In Julia, we can compute the eigenvalues of a square matrix via eigvals(A). The code
using LinearAlgebra
A = [0 -1;
1 0]
println(eigvals(A))produces
2-element Vector{ComplexF64}:
0.0 - 1.0im
0.0 + 1.0imHere im stands for , the imaginary unit, so the eigenvalues of are and .
Turning to geometric series, let us begin in one dimension. Consider the one-dimensional linear equation , where are given and is unknown. Its solution satisfies
This scalar result extends naturally to vectors. To show this we suppose that and are column vectors in , and that is an matrix. We consider the vector equation . For the next result, we recall that the spectral radius of is defined as
Here indicates the modulus of complex number .
With as the identity matrix, we can state the following result.
It follows directly that the vector system has a unique solution whenever . This is the multivariate extension of (1.19).
The code in Listing 2 shows how to compute the spectral radius of an arbitrary matrix in Julia. The print statement produces 0.5828, so, for this matrix, .
1 2 3 4 5using LinearAlgebra ρ(A) = maximum(abs(λ) for λ in eigvals(A)) # Spectral radius A = [0.4 0.1; # Test with arbitrary A 0.7 0.2] print(ρ(A))
Program 2:Computing a spectral radius (compute_spec_rad.jl)
The rest of this section works through the proof of the Neumann series lemma, with several parts left as exercises. An informal proof of the lemma runs as follows. If , then
Rearranging gives , which matches the claim in the Neumann series lemma.
This informal argument lacks rigor. To make it rigorous, we must prove (a) that the sum converges and (b) that the matrix is invertible.
A proof of Lemma 1.2.2 can be found in Chapter 12 of Bollobás (1999). The second result is sometimes called Gelfand’s formula.
From this last result, one can show that exists by computing it:
Listing 3 helps illustrate the result in Exercise 1.17, although we truncate the infinite sum at 50.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20# Primitives A = [0.4 0.1; 0.7 0.2] # Method one: direct inverse B_inverse = inv(I - A) # Method two: power series function power_series(A) B_sum = zeros((2, 2)) A_power = I for k in 1:50 B_sum += A_power A_power = A_power * A end return B_sum end # Print maximal error print(maximum(abs.(B_inverse - power_series(A))))
Program 3:Matrix inversion versus power series (power_series.jl)
The output 5.621e-12 is close enough to zero for many practical purposes.
1.2.2Nonlinear Systems¶
While the Neumann series lemma is a powerful tool for solving linear systems, it doesn’t help us with nonlinear problems. In this section, we present Banach’s fixed-point theorem, one of a variety of techniques for handling nonlinear systems. (Chapter 2 introduces other methods.)
1.2.2.1Fixed Points¶
A standard approach to solving an equation is to formulate it as a fixed-point problem. This section provides the basic definitions and some simple results from fixed-point theory.
Let be any nonempty set. We call a self-map on if is a function from into itself. For a self-map on , a point is called a fixed point of in if . (In fixed-point theory, it is common to write for the image of under , rather than .)
Figure Figure 1.5 shows another example, for a self-map on . Fixed points are numbers where meets the 45-degree line. In this case there are three.

Figure 1.5:Graph and fixed points of
When considering fixed points, given a self-map on , we typically seek conditions on and under which the following properties hold:
has at least one fixed point on (existence),
has at most one fixed point on (uniqueness), and
the fixed point of on can be computed numerically.
1.2.2.2Global Stability¶
A self-map on is called globally stable on if has a unique fixed point in and as for all . Here indicates compositions of with itself. Global stability is a desirable property in the setting of dynamic programming. A number of our results rely on it.
Let be a self-map on . We call invariant on and call an invariant set if is also a self-map on ; that is, if implies .
1.2.2.3Banach’s Fixed-Point Theorem¶
Next, we present the Banach fixed-point theorem, a workhorse for analyzing nonlinear operators.
Let be a nonempty subset of and let be a norm on . A self-map on is called a contraction on with respect to if there exists a such that
The constant is called the modulus of contraction.
The following theorem features a contraction.
We prove Theorem 1.2.3 in stages that build on the following exercises.
A fundamental property of is that if is a Cauchy sequence in , then there exists a such that converges to . (This property is called completeness of the vector space . See, for example, Çınlar & Vanderbei (2013).) Hence it follows from Exercise 1.25 that has a limit .
1.2.3Successive Approximation¶
Consider a self-map on . We seek algorithms that compute fixed points of whenever they exist.
1.2.3.1Iteration¶
If is globally stable on , then a natural algorithm for approximating the unique fixed point of in is to pick any and iterate with for some finite number of steps:
By the definition of global stability, converges to . The algorithm just described is called either successive approximation or fixed-point iteration. Listing 4 provides a function that implements this procedure. Distances between points are measured with the norm.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35""" Computes an approximate fixed point of a given operator T via successive approximation. """ function successive_approx(T, # operator (callable) u_0; # initial condition tolerance=1e-6, # error tolerance max_iter=10_000, # max iteration bound print_step=25) # print at multiples u = u_0 error = Inf k = 1 while (error > tolerance) & (k <= max_iter) u_new = T(u) error = maximum(abs.(u_new - u)) if k % print_step == 0 println("Completed iteration $k with error $error.") end u = u_new k += 1 end if error <= tolerance println("Terminated successfully in $k iterations.") else println("Warning: hit iteration bound.") end return u end
Program 4:Successive approximation (s_approx.jl)
1 2 3 4 5 6 7 8 9 10 11 12 13 14include("s_approx.jl") using LinearAlgebra # Compute the fixed point of Tu = Au + b via linear algebra A, b = [0.4 0.1; 0.7 0.2], [1.0; 2.0] u_star = (I - A) \ b # compute (I - A)^{-1} * b # Compute the fixed point via successive approximation T(u) = A * u + b u_0 = [1.0; 1.0] u_star_approx = successive_approx(T, u_0) # Test for approximate equality (prints "true") print(isapprox(u_star, u_star_approx, rtol=1e-5))
Program 5:Using successive approximations to compute (linear_iter.jl)
Listing 5 applies successive approximation to the map using the function defined in s_approx.jl. Figure Figure 1.6 shows the sequence of iterates generated by four runs of the successive approximation algorithm, each with a different starting condition . The map and parameters are the same as in Listing 5. It is clear from the figure that a good choice of initial condition (i.e., one that is close to the fixed point) accelerates convergence.

Figure 1.6:Successive approximations from different initial conditions
Of course for with , there is a more direct method to compute the fixed point: The Neumann series lemma tells us that so we can apply a numerical linear equation solver. However, even for this case, sometimes successive approximation is used instead. One reason is that can be very large, making application of a linear solver problematic. Another is that we might be satisfied with a quick approximation of the fixed point, computed with a few iterations of . Both of these situations can arise in dynamic programming.
1.2.3.2A One-Dimensional Example¶
To illustrate successive approximations in a nonlinear setting, we use the Solow--Swan growth model, which is a good place to begin presenting a theory of economic growth. A fixed point for the Solow--Swan model can be computed with pencil and paper. The model also provides a good laboratory for studying how successive approximations might converge to a fixed point.
One version of the Solow--Swan growth dynamics is
where is capital stock per worker, is a production function, is a saving rate and is a rate of depreciation. If we set , then iterating with from a starting point (i.e., setting for all ) generates the sequence in (1.31). We can also understand this process as using successive approximation to compute the fixed point of .
Although the model specified in Exercise 1.28 does not generate a contraction, it is globally stable. The next exercise asks you to prove this.
Figure Figure 1.7 illustrates the dynamics in a 45-degree diagram when . In the top subfigure, , , and . The function is plotted alongside the 45-degree line. When lies strictly above the 45-degree line, then and so capital per worker rises. If then it falls. A trajectory that is produced by starting from a particular choice of is traced out in the figure.

Figure 1.7:Successive approximation for the Solow--Swan model
The figure illustrates that is the unique fixed point of in and all sequences converge to it. The second statement can be rephrased as: successive approximation successfully computes the fixed point of by stepping through the time path of capital.
1.2.4Finite-Dimensional Function Space¶
In Paragraph we introduced a Bellman equation for the infinite horizon job search problem. The unknown object in the Bellman equation is a function defined on the set of possible wage offers. Below we discuss how to solve for this unknown function.
Since the set of wage offers is finite we can write as for some . If we adopt this convention and also write as , then we can view as a vector in . The vector interpretation is useful when coding, since vectors (numerical arrays) are an efficient data type.
Nevertheless, for mathematical exposition, we usually find it more convenient to express function-like objects (e.g., value functions) as functions rather than vectors. Thus, we typically write instead of .
Section Section 1.2.4.1 clarifies our notation with respect to functions and vectors.
1.2.4.1Pointwise Operations on Functions¶
If is any set and maps to , then we call a real-valued function on and write . Throughout, the symbol denotes the set of all real-valued functions on . This is a special case of the symbol that represents the set of all functions from to , where and are sets.
If and , then the expressions and also represent elements of , defined at by
Similarly, , , and are real-valued functions on defined by
Figure Figure 1.8 illustrates functions and when is a subset of .
Figure 1.8:Functions and
Similarly, if and are vectors in , then
Figure Figure 1.9 illustrates in .

Figure 1.9:The vectors and in
1.2.4.2Functions versus Vectors¶
Let be finite, so that for some . The set is the vector space expressed in different notation. The next lemma clarifies.
The claim in Lemma 1.2.4 is obvious: a real-valued function on is uniquely identified by the set of values that it takes on , which is an -tuple of real numbers.
Throughout the text, whenever the supporting set is finite, we freely use the identification in (1.39). For example, if is any norm on , then extends to via the identification in (1.39). That is, for , the value is given by the norm of the vector .
We say that a subset of is closed (resp., open, compact, etc.) if the corresponding subset of is closed (resp., open, compact, etc.)
With these conventions, the Neumann series lemma and Banach’s contraction mapping theorem extend directly from to . For example, if , is closed in and is a contraction on , in the sense that and
then has a unique fixed point in and
Incidentally, in the preceding paragraph is a function that sends functions into functions (e.g., sends into ). To help distinguish from the functions that it acts on, in this setting is often called an operator rather than a function. This is a convention rather than a formal distinction: from a mathematical perspective, an operator is just a function.
A foundational class of operators acting on is the set of linear operators. There is a strong sense in which linear operators are just matrices. We investigate these ideas in Section 2.3.3. At the same time, when studying dynamic programming we also use many operators that are not linear. One example is the “Bellman operator,” which we start to investigate in Section 1.3.1.2.
1.2.4.3Distributions¶
Given a set with elements, the set of probability distributions on is written as and contains all with . Since we can identify any with a corresponding vector in , the set can also be thought of as a subset of . This collection of vectors (i.e., the nonnegative vectors that sum to unity) is also called the unit simplex. Given and , we say that is supported on if implies .
Fix and . Let be a random variable with distribution , so that for all . The expectation of is
If , then the cumulative distribution function (CDF) corresponding to is the map from to given by
If , then the -th quantile of is
If , then is called the median of .
Evidently, if the median of is , then the median of will be . This same logic carries over to arbitrary quantiles, as the next exercise asks you to show.
1.3Infinite-Horizon Job Search¶
Armed with fixed-point methods, we return to the job search problem discussed in Section 1.1.2.
1.3.1Values and Policies¶
In this section, we solve for the value function of an infinite horizon job search problem and associated optimal choices.
1.3.1.1Optimal Choices¶
Let’s recall the strategy for solving the infinite-horizon job search problem we proposed in Paragraph. The first step is to compute the optimal value function that solves the Bellman equation
Suppose for a moment that we can compute , and let
be the infinite-horizon continuation value that equals the maximal lifetime value that the worker can receive, contingent on deciding to continue being unemployed today.
With in hand, the optimal decision at any given time, facing the current wage draw , is as follows:
If , then accept the job offer.
If not, then reject and wait for the next offer.
This decision maximizes lifetime value given the current offer.
(Later we will prove that this decision process is optimal as claimed. For now, however, we focus on computing and .)
1.3.1.2The Bellman Operator¶
The method proposed in Section 1.3.1.1 requires that we solve for . To do so, we introduce a Bellman operator defined at that is constructed to assure that any fixed point of solves the Bellman equation and vice versa:
Let and let be the supremum norm on . We measure the distance between two elements of by $| f
g| = \max_{w \in \Wsf}|f(w) - g(w)|$. Under this distance, we have the following result.
Now we turn to the proof of Proposition 1.3.1. An implication of the proposition is that as for any , so we can compute to any required degree of accuracy by successive approximation.
Our proof of Proposition 1.3.1 uses the elementary bound
1.3.1.3Optimal Policies¶
A dynamic program seeks optimal policies. We briefly introduce the notion of a policy and relate it to the job search application.
In general, for a dynamic program, choices by the controller aim to maximize lifetime rewards and consist of a state-contingent sequence specifying how the agent acts at each point in time. Workers do not know what the future will bring, so it is natural to assume that can depend on present and past events but not future ones. Hence is a function of the current state and past state-action pairs for . That is,
for some function ; is called a time policy function.
A key insight of dynamic programming is that some problems can be set up so that the optimal current action can be expressed as a function of the current state .
If the current state is enough to determine a current optimal action, then policies are just maps from states to actions. So we can write for some function . A policy function that depends only on the current state is often called a Markov policy. Since all policies we consider will be Markov policies, we refer to them more concisely as “policies.”
In the job search model, the state is the current wage offer and possible actions are to accept or to reject the current offer. With 0 interpreted as reject and 1 understood as accept, the action space is , so a policy is a map from to . Let be the set of all such maps.
A policy is an “instruction manual”: for an agent following , if current wage offer is , the agent always responds with . The policy dictates whether the agent accepts or rejects at any given wage.
For each , a -greedy policy is a satisfying
Equation (1.54) says that an agent accepts if exceeds the continuation value computed using and rejects otherwise. Our discussion of optimal choices in Section 1.3.1.1 can now be summarized as the recommendation
This statement is sometimes called Bellman’s principle of optimality.
Inserting into (1.54) and rearranging, we can express a -greedy policy via
The quantity in (1.56) is called the reservation wage, and parallels the reservation wage that we introduced for the finite-horizon problem. Equation (1.56) states that value maximization requires accepting an offer if and only if it exceeds the reservation wage. Thus, provides a scalar description of an optimal policy.
1.3.2Computation¶
Let’s turn to computation. In Section 1.3.2.1, we apply a standard dynamic programming method, called value function iteration. In Section 1.3.2.2, we apply a more specialized method that uses the structure of the job search problem to accelerate computation.
1.3.2.1Value Function Iteration¶
Recall that, by Proposition 1.3.1, we can compute an approximate optimal policy by applying successive approximation via the Bellman operator. In the language of dynamic programming, this is called value function iteration. Algorithm 1.3 provides a full description.
While rarely attains for , we can obtain a close approximation by monitoring distances between successive iterates, waiting until they become small enough. Later we will study how these distances depend on , the number of iterations, as well as on parameters defining rewards and opportunities.
Listing 6 implements value function iteration for the infinite-horizon job search model, using the function for successive approximation from Listing 4.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24include("two_period_job_search.jl") include("s_approx.jl") " The Bellman operator. " function T(v, model) (; n, w_vals, ϕ, β, c) = model return [max(w / (1 - β), c + β * v'ϕ) for w in w_vals] end " Get a v-greedy policy. " function get_greedy(v, model) (; n, w_vals, ϕ, β, c) = model σ = w_vals ./ (1 - β) .>= c .+ β * v'ϕ # Boolean policy vector return σ end " Solve the infinite-horizon IID job search model by VFI. " function vfi(model=default_model) (; n, w_vals, ϕ, β, c) = model v_init = zero(model.w_vals) v_star = successive_approx(v -> T(v, model), v_init) σ_star = get_greedy(v_star, model) return v_star, σ_star end
Program 6:Value function iteration (iid_job_search.jl)
Figure Figure 1.10 shows a sequence of iterates when and parameters are as given in Listing 1. Iterates , and 2 are shown, in addition to iterate 1000, which we take as a good approximation to the limiting function. If you experiment with different initial conditions, you will see that they all converge to the same limit.

Figure 1.10:A sequence of iterates of the Bellman operator
Figure Figure 1.11 shows an approximation of computed using the code in Listing 6, along with the stopping reward and the corresponding continuation value (1.46). As anticipated, the value function is the pointwise supremum of the stopping reward and the continuation value. The worker chooses to accept an offer only when that offer exceeds some value close to 43.5.

Figure 1.11:The approximate value function for job search
1.3.2.2Computing the Continuation Value Directly¶
The technique we employed to solve the job search model in Section 1.3.1 follows a standard approach to dynamic programming. But for this particular problem, there is an easier way to compute the optimal policy that sidesteps calculating the value function. This section explains how.
Recall that the value function satisfies the Bellman equation
and that the continuation value is given by (1.46). We can use to eliminate from (1.57). First we insert on the right-hand side of (1.57) and then we replace with , which gives . Then we take mathematical expectations of both sides, multiply by and add to obtain
To obtain the unknown value , we introduce the mapping defined by
By construction, solves (1.58) if and only if is a fixed point of .
Figure Figure 1.12 shows the function using the discrete wage offer distribution and parameters as adopted previously. The unique fixed point is .

Figure 1.12:Computing the continuation value as the fixed point of
Exercise 1.33 implies that we can compute by choosing arbitrary and iterating with . Doing so produces a value of approximately 1086. (The associated reservation wage is .) Computation of using this method is much faster than value function iteration because the fixed-point problem is in rather than .
With in hand, we have solved the dynamic programming problem, since a policy is -greedy if and only if it satisfies
1.4Chapter Notes¶
Dynamic programming is often attributed to Richard Bellman (1920--1984). Both the term “dynamic programming” and the technique were popularized by Bellman (1957). According to his autobiography, Bellman chose the name dynamic programming to avoid giving the impression that he was conducting mathematical research within RAND Corporation. His ultimate boss, Secretary of Defense Charles Wilson, apparently disliked such research Bellman, 1984.
For treatments of dynamic programming from the perspective of economics and finance, see, for example, Sargent (1987), Stokey & Lucas (1989), Van & Dana (2003), Bäuerle & Rieder (2011), or Stachurski (2022).
The job search model was introduced by McCall (1970). The McCall model and its extensions transformed economists’ way of thinking about labor markets (see, e.g., Lucas (1978)). Influential extensions to the job search model include Burdett (1978), Jovanovic (1979), Pissarides (1979), Jovanovic (1984), Mortensen (1986), Ljungqvist (2002) and Chetty (2008). Rogerson et al. (2005) provides a useful survey.
For elementary real analysis, the book by Bartle & Sherbert (2011) is excellent. Ok (2007) is a superb treatment of real analysis and how it is used throughout economic theory. Discussions of Banach’s theorem and the Neumann series lemma can be found in Cheney (2013) and Atkinson & Han (2005). Rocha & Vailakis (2010) provides an extension to Banach’s theorem that requires only local contractivity.
The procedure of solving the last period first and then working back in time is called backward induction. Starting with the last period makes sense because there is no future to consider.
Incidentally, imposing an infinite horizon is not the same as assuming humans live forever. Rather, it corresponds to the idea that humans have no specific “termination” date. More generally, we can understand an infinite horizon as an approximation to a finite horizon in which observations are recorded at relatively high frequency and no clear termination date exists.
Hint: To prove that is invertible and , it suffices to show that .
- McCall, J. J. (1970). Economics of information and job search. The Quarterly Journal of Economics, 84(1), 113–126.
- Sargent, T., & Stachurski, J. (2023). Economic Networks: Theory and Computation. Cambridge University Press.
- Kreyszig, E. (1978). Introductory Functional Analysis with Applications (Vol. 1). Wiley New York.
- Bollobás, B. (1999). Linear Analysis: An Introductory Course. Cambridge University Press.
- Çınlar, E., & Vanderbei, R. J. (2013). Real and Convex Analysis. Springer Science & Business Media.
- Bellman, R. (1957). Dynamic Programming. In Science. American Association for the Advancement of Science.
- Bellman, R. (1984). Eye of the Hurricane. World Scientific.
- Sargent, T. (1987). Dynamic Macroeconomic Theory. Harvard University Press.
- Stokey, N., & Lucas, R. (1989). Recursive Methods in Dynamic Economics. Harvard University Press.
- Van, C., & Dana, R.-A. (2003). Dynamic Programming in Economics. Springer.
- Bäuerle, N., & Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer Science & Business Media.
- Stachurski, J. (2022). Economic Dynamics: Theory and Computation (2nd ed.). MIT Press.
- Lucas, R. E. (1978). Unemployment policy. American Economic Review, 68(2), 353–357.
- Burdett, K. (1978). A theory of employee job search and quit rates. American Economic Review, 68(1), 212–220.
- Jovanovic, B. (1979). Firm-specific capital and turnover. Journal of Political Economy, 87(6), 1246–1260.