To prepare to analyze dynamic programs, we now study stochastic processes generated by Markov chains. These processes are widely used to construct economic and financial models.
At the end of this chapter we return to the job search problem from Chapter 1 and allow wage draws to be correlated over time (rather than iid). We use a Markov chain to generated serially correlated wage draws.
Throughout this chapter, the symbol X represents a finite set.
Fix X={x1,…,xn} and P∈M(RX). We interpret P(x,x′) as the probability that a random process moves from x to x′ over one unit of time. For this interpretation to make sense we need P(x,x′) to be nonnegative and ∑x′∈XP(x,x′) to equal one for every x∈X, since we want the chain to stay somewhere in the state space after each update. These are exactly the properties guaranteed by the assumption P∈M(RX) (see Exercise 2.58).
To formalize ideas, let (Xt):=(Xt)t⩾0 be a sequence of random variables taking values in X and call (Xt) a Markov chain on state spaceX if there exists a P∈M(RX) such that
To simplify terminology, we also call (Xt)P-Markov when (3.1) holds. We call either X0 or its distribution ψ0 the initial condition of (Xt), depending on context. P is also called the transition matrix of the Markov chain.
The definition of a Markov chain says two things:
When updating to Xt+1 from Xt, earlier states are not required.
P encodes all of the information required to perform the update, given the current state Xt.
One way to think about Markov chains is algorithmically: Fix P∈M(RX) and let ψ0 be an element of D(X). Now generate (Xt) via Algorithm 3.1. The resulting sequence is P-Markov with initial condition ψ0.
As an example, consider a firm whose inventory of some product follows S--s dynamics, meaning that the firm waits until its inventory falls below some level s>0 and then immediately replenishes by ordering S units. This pattern of decisions can be rationalized if ordering requires paying a fixed cost. Thus, in Section 5.2.1, we will show that S--s behavior is optimal in a setting where fixed costs exist and the firm’s aim is to maximize its present value.
To represent S--s dynamics, we suppose that a firm’s inventory (Xt)t⩾0 of a given product obeys
Listing 1 provides code that simulates inventory paths and computes other objects of interest. Since the state space X={x1,…,xn} corresponds to {0,…,S+s} and Julia indexing starts at 1, we set xi=i−1. This convention is used when computing P[i, j]{.julia}, which corresponds to P(xi,xj). The code in the listing is used to produce the simulation of inventories in Figure Figure 3.1.
The function compute_mc{.julia} returns an instance of a MarkovChain{.Julia} object that can store both the state X and the transition probabilities. The QuantEcon.jl{.julia} library defines this data type and provides functions that simulate a Markov chains, compute a stationary distribution, and perform related tasks.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
using Distributions, QuantEcon, IterTools
function create_inventory_model(; S=100, # Order size
s=10, # Order threshold
p=0.4) # Demand parameter
ϕ = Geometric(p)
h(x, d) = max(x - d, 0) + S * (x <= s)
return (; S, s, ϕ, h)
end
"Simulate the inventory process."
function sim_inventories(model; ts_length=200)
(; S, s, ϕ, h) = model
X = Vector{Int32}(undef, ts_length)
X[1] = S # Initial condition
for t in 1:(ts_length-1)
X[t+1] = h(X[t], rand(ϕ))
end
return X
end
"Compute the transition probabilities and state."
function compute_mc(model; d_max=100)
(; S, s, ϕ, h) = model
n = S + s + 1 # Size of state space
state_vals = collect(0:(S + s))
P = Matrix{Float64}(undef, n, n)
for (i, j) in product(1:n, 1:n)
P[i, j] = sum((h(i-1, d) == j-1) * pdf(ϕ, d) for d in 0:d_max)
end
return MarkovChain(P, state_vals)
end
"Compute the stationary distribution of the model."
function compute_stationary_dist(model)
mc = compute_mc(model)
return mc.state_values, stationary_distributions(mc)[1]
end
Program 1:An implementation of S--s inventory dynamics (inventory_sim.jl)
Given a finite state space X, k⩾0 and P∈M(RX), let Pk be the k-th power of P. (If k=0, then Pk is the identity matrix.) Since M(RX) is closed under multiplication (Exercise 2.48), Pk is in M(RX) for all k⩾0. In this context, Pk is sometimes called the k-step transition matrix corresponding to P. In what follows, Pk(x,x′) denotes the (x,x′)-th element of the matrix representation of Pk.
The k-step transition matrix has the following interpretation: If (Xt) is P-Markov, then for any t,k∈Z+ and x,x′∈X,
Thus, Pk provides the k-step transition probabilities for the P-Markov chain (Xt).
We can now give the following useful characterization of irreducibility:
Thus, irreducibility of P means that the P-Markov chain eventually visits any state from any other state with positive probability.
Several libraries have code for testing irreducibility, including QuantEcon.jl. See Listing 2 for an example of a call to this functionality. In this case, irreducibility fails because state 2 is an absorbing state. Once entered, the probability of ever leaving that state is zero. (A subset Y of X with this property is called an absorbing set.)
1
2
3
4
5
using QuantEcon
P = [0.1 0.9;
0.0 1.0]
mc = MarkovChain(P)
print(is_irreducible(mc))
Program 2:Testing irreducibility (is_irreducible.jl)
To verify (3.9), rewrite it as P{Xt+1=x′}=∑xP{Xt+1=x′∣Xt=x}P{Xt=x}, which is true by the law of total probability. With each ψt regarded as a row vector, (3.9) can also be written as
Equation (3.10) tells us that dynamics of marginal distributions for Markov chains are generated by deterministic linear difference equations in distribution space. This is remarkable because the dynamics that drive (Xt) are stochastic and can be arbitrarily nonlinear.
Iterating on (3.10), we get ψt=ψ0Pt for all t. In summary,
(Xt)t⩾0 is P-Markov with X0=dψ0⟹Xt=dψ0Pt for all t⩾0.
For (3.11) and ψt+1=ψtP to hold, each ψt must be a row vector. In what follows, we always treat the distributions (ψt)t⩾0 of (Xt)t⩾0 as row vectors.
Consistent with our definition of stationary distributions in Section 2.3.1.3, a marginal distribution ψ∗∈D(X) is called stationary for P if
In vector form, this is ψ∗P=ψ∗. By this definition and (3.9), if ψ∗ is stationary and Xt has distribution ψ∗, then so does Xt+k for all k⩾1.
We saw in Exercise 2.48 that every irreducible P∈M(RX) has exactly one stationary distribution in D(X). The following ergodic property holds under the same assumptions.
Property (3.15) tells us that, with probability one (i.e., for almost every P-Markov chain that we generate), the fraction of time that the chain spends in any given state is, in the limit, equal to the probability assigned to that state by the stationary distribution. Markov chains with this property are sometimes said to be ergodic.
Since the S--s inventory model from Section 3.1.1.2 is irreducible, the ergodicity result from Theorem 3.1.2 applies. In particular, the process has only one stationary distribution ψ∗ in D(X), where X={0,…,S+s}, and (3.15) is valid. Figure Figure 3.2 illustrates this by plotting both the stationary distribution ψ∗ (which is computed using the code in Listing 1), and the value m(y):=k1∑t=0k−11{Xt=y} at each y∈X for k set to 1,000,000. As predicted by the theorem, the fraction of time spent by the chain in each state is close to the probability assigned by ψ∗.
Suppose that a day laborer is either unemployed (Xt=1) or employed (Xt=2) in each period. In state 1 he is hired with probability α∈(0,1). In state 2 he is fired with probability β∈(0,1). The corresponding state space and transition matrix are
To simplify numerical calculations, we sometimes approximate a continuous-state Markov process with a Markov chain. For example, consider a linear Gaussian AR(1) model, where (Xt)t⩾0 evolves in R according to
Process (3.19) is also ergodic in a similar sense to (3.15): On average, realizations of the process spend most of their time in regions of the state where the stationary distribution puts high probability mass. (You can check this via simulations if you wish.) Hence, in the discretization that follows, we shall put the discrete state space in this area.
To discretize (3.19) we use Tauchen’s method, starting with the case b=0.[1] As a first step, we choose n as the number of states for the discrete approximation and m as an integer that sets the width of the state space. Then we create a state space X:={x1,…,xn}⊂R as an equispaced grid that brackets the stationary mean on both sides by m standard deviations:
set x1=−mσx,
set xn=mσx and
set xi+1=xi+s where s=(xn−x1)/(n−1) and i in [n−1].
The next step is to create an n×n matrix P that approximates the dynamics in (3.19). For i,j∈[n],
if j=1, then set P(xi,xj)=F(x1−ρxi+s/2).
If j=n, then set P(xi,xj)=1−F(xn−ρxi−s/2).
Otherwise, set P(xi,xj)=F(xj−ρxi+s/2)−F(xj−ρxi−s/2).
The first two are boundary rules and the third applies Exercise 3.11.
Finally, if b=0, then we shift the state space to center it on the mean μx of the stationary distribution N(μx,σx2). This is done by replacing xi with xi+μx for each i.
Julia routines that compute X and P can be found in the library QuantEcon.jl.
Figure Figure 3.3 compares the continuous stationary distribution ψ∗ and the unique stationary distribution of the discrete approximation when X and P are constructed using Tauchen’s method when ρ=0.9, b=0.0, ν=1.0 and the discretization parameters are n=15 and m=3.
Figure 3.3:Comparison of ψ∗=N(μx,σx2) and its discrete approximant
In this section, we discuss how to compute conditional expectations for Markov chains. The theory will be essential for the study of finite Markov decision processes, since, in these models, lifetime rewards are mathematical expectations of flow reward functions of Markov states.
where (Xt) is any P-Markov chain on X. In terms of matrix algebra, viewing h has an n×1 column vector, the expression (Ph)(x) is one element of the vector Ph obtained by premultiplying h by P.
The interpretation in (3.24) extends to powers of P. In particular, we have
The law of iterated expectations is a workhorse in economics and finance. One version of the law states that if X and Y are two random variables, then E[E[Y∣X]]=E[Y]. Let’s show how this law operates for Markov chains.
Let (Xt) be P-Markov with X0=dψ0. Fix t,k∈N. Set Et:=E[⋅∣Xt]. We claim that
Thus, P is monotone increasing if shifting up the current state shifts up the next-period state, in the sense that its distribution increases in the stochastic dominance ordering (see Section 2.2.4) on D(X). Below, we will see that monotonicity of Markov operators is closely related to monotonicity of value functions in dynamic programming.
Monotonicity of Markov operators is related to positive autocorrelation. To illustrate the idea, consider the AR(1) model Xt+1=ρXt+σϵt+1 from Section 3.1.3 and suppose we apply Tauchen discretization, mapping the parameters ρ,σ and a discretization size n into a Markov operator P on state space X={x1,…,xn}⊂R, totally ordered by ⩽. If ρ⩾0, so that positive autocorrelation holds, then P is monotone increasing.
Dynamic programs often form a lifetime value V0 as a geometric sum of a reward sequence (Rt)t⩾0 with constant discount factor, so that V0=E∑t=0∞βtRt for some β>0. We saw this in (1.1), where we aggregated a profit stream (πt)t⩾0 into an expected present value of the firm, and again in (1.7), where a worker evaluates lifetime earnings. In this section, we study expectations of geometric sums.
Consider a firm that receives random profit stream (πt)t⩾0. Supposes that the value of the firm equals the expected present value of its profit stream. Suppose for now that the interest rate is constant at r>0. With β:=1/(1+r), total valuation is
To compute this value, we need to know how profits evolve. A common strategy is to set πt=π(Xt) for some fixed π∈RX, where (Xt)t⩾0 is a state process. For known dynamics of (Xt) and function π, the value V0 in (3.36) can be computed.
Here we assume that (Xt) is P-Markov for P∈M(RX) with finite X. Then conditioning on X0=x, we can write the value as
It is plausible that the value of the firm will be higher for a return process in which higher states generate higher profits and predict higher future states. The next exercise confirms this.
To model consumption-saving choices we want to evaluate different consumption paths, where a consumption path is a nonnegative random sequence (Ct)t⩾0. In what follows we consider consumption paths such that Ct=c(Xt) for all t⩾0, where c∈R+X and (Xt)t⩾0 is P-Markov on finite set X. Thus, consumption streams are time-invariant functions of a finite-state Markov chain.
In a standard “time additive” model of consumer preferences with constant geometric discounting, the time zero value of a consumption stream (Ct)t⩾0, given current state X0=x∈X, is
where β∈(0,1) is a discount factor and u:R+→R is called the flow utility function. Dependence of v(x) on x comes from the initial condition X0=x influencing the Markov state process and, therefore, the consumption path.
Using Ct=c(Xt) and defining r:=u∘c we can write v(x)=Ex∑t⩾0βtr(Xt). By Lemma 3.2.1, this sum is finite and v can be expressed as
while c(x)=exp(x), so that consumption takes the form Ct=exp(Xt), and (Xt)t⩾0 is a Tauchen discretization (see Section 3.1.3) of Xt+1=ρXt+νWt+1 where (Wt)t⩾1 is iid and standard normal. Parameters are n=25, β=0.98, ρ=0.96, ν=0.05 and γ=2. We set r=u∘c and solved for v via (3.40).
In this section, we extend the job search problem studied in Section 1.3 to a setting with Markov wage offers. We discuss additional structure when the Markov operator for wage offers is monotone increasing. We will also allow job separations to occur.
The value functionv∗ for the Markov job search model is now defined as follows: v∗(w) is the maximum lifetime value that can be obtained when the worker is unemployed with current wage offer is w in hand. Value function v∗ satisfies Bellman equation
Bellman equation (3.42) extends a corresponding Bellman equation for the iid case (cf. (1.45)). (A full proof is given in Chapter 4.) The Bellman operator corresponding to (3.42) is
Let V:=R+W and endow V with the pointwise partial order ⩽ and the supremum norm, so that ∥f−g∥∞=maxw∈W∣f(w)−g(w)∣.
We recommend that you study the proof of the next lemma, since the same style of argument occurs often in the book.
In view of the contraction property established in Exercise 3.21, we can use value function iteration (i) to compute an approximation v to the value function and (ii) to calculate the v-greedy policy that approximates the optimal policy. Code for implementing this procedure is in Listing 4. The definition of a v-greedy policy resembles that of the iid case (see (1.54)).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
using QuantEcon, LinearAlgebra
include("s_approx.jl")
"Creates an instance of the job search model with Markov wages."
function create_markov_js_model(;
n=200, # wage grid size
ρ=0.9, # wage persistence
ν=0.2, # wage volatility
β=0.98, # discount factor
c=1.0 # unemployment compensation
)
mc = tauchen(n, ρ, ν)
w_vals, P = exp.(mc.state_values), mc.p
return (; n, w_vals, P, β, c)
end
" The Bellman operator Tv = max{e, c + β P v} with e(w) = w / (1-β)."
function T(v, model)
(; n, w_vals, P, β, c) = model
h = c .+ β * P * v
e = w_vals ./ (1 - β)
return max.(e, h)
end
" Get a v-greedy policy."
function get_greedy(v, model)
(; n, w_vals, P, β, c) = model
σ = w_vals / (1 - β) .>= c .+ β * P * v
return σ
end
"Solve the infinite-horizon Markov job search model by VFI."
function vfi(model)
v_init = zero(model.w_vals)
v_star = successive_approx(v -> T(v, model), v_init)
σ_star = get_greedy(v_star, model)
return v_star, σ_star
end
Program 4:Job search with Markov state (markov_js.jl)
The continuation value depends on w because the current offer helps predict the offer next period, which in turn affects the value of continuing. The functions w↦w/(1−β), h∗ and v∗ corresponding to the default model in Listing 4 are shown in Figure Figure 3.5.
Figure 3.5:Value, stopping, and continuation for Markov job search
Exercise 3.24 suggests an alternative way to solve the job search problem: iterate with Q to obtain the continuation value function h∗ and then use the policy
that tells the worker to accept when the current stopping value exceeds the current continuation value.
We saw that, in the iid case, a computational strategy based on continuation values is far more efficient than value function iteration (see Section 1.3.2.2). Since continuation values are functions rather than scalars, here the two approaches (iterating with T vs iterating with Q) are more similar. In Chapter 4 we discuss alternative computational strategies in more detail, seeking conditions under which one approach will be more efficient than the other.
We now modify the job search problem discussed in Section 3.3.1 by adding separations. In particular, an existing match between worker and firm terminates with probability α every period. (This is an extension because setting α=0 recovers the permanent job scenario from Section 3.3.1.)
The worker now views the loss of a job as a capital loss and a spell of unemployment as an investment. In what follows, the wage process and discount factor are unchanged from Section 3.3.1. As before, V:=R+W is endowed with the supremum norm.
The value function vu∗ for an unemployed worker satisfies the recursion
where ve∗ is the value function for an employed worker, that is, the lifetime value of a worker who starts the period employed at wage w. Value function ve∗ satisfies
This equation states that value accruing to an employed worker is current wage plus the discounted expected value of being either employed or unemployed next period.
We claim that, when 0<α,β<1, the system (3.51)--(3.52) has a unique solution (ve∗,vu∗) in V×V. To show this we first solve (3.52) in terms of ve∗(w) to obtain
Figure Figure 3.6 shows the value function vu∗ for an unemployed worker, which is the fixed point of (3.54), as well as the stopping and continuation values, which are given by
respectively, for each w∈W. Parameters are as in Listing 5. The value function vu∗ is the pointwise maximum (i.e., vu∗=s∗∨h∗). The worker’s optimal policy while unemployed is
Figure Figure 3.7 shows how the reservation wage changes with α. To produce this figure we solved the model for the reservation wage at 10 values of α in an evenly spaced grid ranging 0 to 1. The reservation wage falls with α, since time spent unemployed is a capital investment in better wages, and the value of this investment declines as the separation rate rises.
Figure 3.7:Reservation wage versus separation rate
Many good textbooks on Markov chains exist, including Norris (1998), Häggström & others (2002), and Privault (2013). Sargent & Stachurski (2023) provides a relatively comprehensive treatment from a network perspective that is a natural one for Markov chains. Other economic applications are discussed in Stokey & Lucas (1989) and Ljungqvist & Sargent (2018). Meyer (2000) gives a detailed account of the theory of nonnegative matrices. Another useful reference is Horn & Johnson (2012).
A systematic study of monotone Markov chains was initiated by Daley (1968). Monotone Markov methods have many important applications in economics. See, for example, Hopenhayn & Prescott (1992), Kamihigashi & Stachurski (2014), Jaśkiewicz & Nowak (2014), Balbus et al. (2014), Foss et al. (2018) and Hu & Shmaya (2019).
Tauchen’s method Tauchen, 1986 is simple but sub-optimal in some cases. For a more general discretization method and a survey of the literature, see Farmer & Toda (2017).
To justify the first equality, care must be taken when pushing expectations through infinite sums. In the present setting, justification can be provided via the dominated convergence theorem (see, e.g., Dudley (2002), Theorem 4.3.5). A proof of a more general result can be found in Section B.2.
Brémaud, P. (2020). Markov Chains: Gibbs Fields, Monte Carlo Simulation and Queues (Vol. 31). Springer Nature.
Norris, J. R. (1998). Markov Chains. Cambridge University Press.
Häggström, O., & others. (2002). Finite Markov Chains and Algorithmic Applications. Cambridge University Press.
Privault, N. (2013). Understanding Markov Chains: Examples and Applications. Springer-Verlag Singapore.
Sargent, T., & Stachurski, J. (2023). Economic Networks: Theory and Computation. Cambridge University Press.
Stokey, N., & Lucas, R. (1989). Recursive Methods in Dynamic Economics. Harvard University Press.
Ljungqvist, L., & Sargent, T. (2018). Recursive Macroeconomic Theory (4th ed.). MIT Press.
Meyer, C. D. (2000). Matrix Analysis and Applied Linear Algebra (Vol. 71). Siam.
Horn, R. A., & Johnson, C. R. (2012). Matrix Analysis. Cambridge University Press.
Daley, D. (1968). Stochastically monotone Markov Chains. Probability Theory and Related Fields, 10(4), 305–317.
Hopenhayn, H. A., & Prescott, E. C. (1992). Stochastic monotonicity and stationary distributions for dynamic economies. Econometrica, 60(6), 1387–1406.
Kamihigashi, T., & Stachurski, J. (2014). Stochastic stability in monotone economies. Theoretical Economics, 9(2), 383–407.
Jaśkiewicz, A., & Nowak, A. S. (2014). Stationary Markov perfect equilibria in risk sensitive stochastic overlapping generations models. Journal of Economic Theory, 151, 411–447.
Balbus, Ł., Reffett, K., & Woźny, Ł. (2014). A constructive study of Markov equilibria in stochastic games with strategic complementarities. Journal of Economic Theory, 150, 815–840.
Foss, S., Shneer, V., Thomas, J. P., & Worrall, T. (2018). Stochastic stability of monotone economies in regenerative environments. Journal of Economic Theory, 173, 334–360.