This chapter discusses techniques that underlie the optimization and fixed-point methods used throughout the book. Many of these techniques relate to order. Order-theoretic concepts will prove valuable not only for fixed-point methods but also for understanding the main concepts in dynamic programming. Chapter 8 will show core components of dynamic programming can be expressed in terms of simple order-theoretic constructs.
First we treat a technique for simplifying analysis of stability and fixed points that we’ll apply in applications.
To illustrate the idea, suppose that we want to study dynamics induced by a self-map T on U⊂Rn. We might want to know if a unique fixed point of T exists and if iterates of T converge to a fixed point. One approach is to apply fixed-point theory to T.
However, sometimes there is an easier approach: Transform T into a “simpler” map T^ and study its the fixed-point properties. For this to work, we need to be sure that useful properties we discover about T^ will transmit themselves back to properties of T, the map that actually interests us.
This section explains a notion of conjugacy that formalizes these ideas. The study of conjugate relationships originated in the field of dynamical systems theory. Later we will apply this approach to operators that arise in contexts of dynamic programming and recursive preferences.
A dynamical system is a pair (U,T), where U is any set and T is a self-map on U. Two dynamical systems (U,T) and (U^,T^) are said to be conjugate under Φ if Φ is a bijection from U into U^ such that T=Φ−1∘T^∘Φ on U.
Conjugacy of (U,T) and (U^,T^) under Φ can be understood as follows: Shifting a point u∈U to Tu via T is equivalent to moving u into U^ via u^=Φu, applying T^, and then moving the result back using Φ−1:
Figure 2.1:Conjugacy of dynamical systems (U,T) and (hatU,hatT) under Phi.
The next two exercises illustrate benefits of establishing a conjugate relationship between two dynamical systems.
The next result summarizes the most important consequences of our findings.
In particular, if T has a unique fixed point on U if and only if T^ has a unique fixed point on U^.
Let U and U^ be two subsets of Rn. A function Φ from U to U^ is called a homeomorphism if it is continuous, bijective, and its inverse Φ−1 is also continuous.
Assume again that U and U^ are subsets of Rn. In this setting, we say that dynamical systems (U,T) and (U^,T^) are topologically conjugate under Φ if (U,T) and (U^,T^) are conjugate under Φ and, in addition, Φ is a homeomorphism.
The next exercise asks you to show that topologically conjugacy is an equivalence relation, as defined in Section A.1.
From the preceding exercises we can state the following useful result:
In Section 1.2.2.2 we investigated global stability. Here we introduce local stability and provide a sufficient condition for situations in which the map is smooth.
Let U be a subset of Rn and let T be a self-map on U. A fixed point u∗ of T in U is called locally stable for the dynamical system (U,T) if there exists an open set O⊂U such that u∗∈O and Tku→u∗ as k→∞ for every u∈O. In other words, the domain of attraction for u∗ contains an open neighborhood of u∗.
For an interior fixed point x∗ of a smooth self-map g on an interval of R, local stability holds whenever ∣g′(x∗)∣<1. The proof strategy proceeds as follows: When ∣g′(x∗)∣<1, the first-order linear approximation
is a contraction of modulus ∣g′(x∗)∣ with unique fixed point x∗. Hence all trajectories of g^ converge to x∗. Moreover, since g and g^ are similar in a neighborhood of x∗, the same is true for trajectories of g starting close to x∗.
The next theorem formalizes this line of argument and extends it to multiple dimensions. In stating the theorem, we take T to be a self-map on U with fixed point u∗ in U and assume that T is continuously differentiable on U. Recall that the Jacobian of T at u∈U is
Combining this theorem with the result of Exercise 2.6, we see that, under the conditions of the theorem, u∗ is globally stable for (O,T), and hence locally stable for (U,T), whenever (O,T^) is globally stable. By the Neumann series lemma, the first-order approximation will be globally stable whenever JT(u∗) has spectral radius less than one. Thus, we have
To discuss relative rates of convergence we fix a norm ∥⋅∥ on Rn and take a sequence (uk)k⩾0⊂Rn converging to u∗∈Rn. Set ek:=∥uk−u∗∥ for all k. We say that (uk) converges to u∗at rate at least q if q⩾1 and, for some β∈(0,∞) and N∈N, we have
If q=2, then we say that convergence is (at least) quadratic.
If q=1 and β<1, then we say that convergence is (at least) linear.
Orders of convergence are studied in the neighborhood of zero, implying that higher orders are faster. For example, suppose ϵk:=∥uk−u∗∥ is the size of the error and that uk converges to u∗ quadratically. If, say, ϵk=10−5, then ϵk+1≈β10−10. Provided that β is not large, the number of accurate digits roughly doubles at each step.
Successive approximations typically converge at a linear rate. To see this in one dimension, try the following exercise.
(The restriction that 0<∣T′u∗∣<1 in Exercise 2.7 is mild. For example, given convergence of successive approximation to the fixed point, we expect ∣T′u∗∣<1, since this inequality implies that u∗ is locally stable.)
While successive approximation always converges when global stability holds, faster fixed-point algorithms can often be obtained by leveraging extra information, such as gradients. Newton’s method is an important gradient-based technique. (As we discuss in Section 5.1.4.2, Newton’s method is a key component of algorithms for solving dynamic programs.)
While Newton’s method is often used to solve for roots of a given function, here we use it to find fixed points.
Suppose first that T is a differentiable self-map on an open set U⊂Rn and that we want to find a fixed point of T. Our plan is to start with a guess u0 of the fixed point and then update it to u1. To do this we use the first-order approximation T^ of T around u0 and solve for the fixed point of T^ -- which we can do exactly since T^ is linear. We take this new point as u1 and then continue.
If T is one-dimensional then T^u:=Tu0+T′u0(u−u0). For n>1 we replace T′u0 with the Jacobian of T at u0, which we write as JT(u0). We then solve T^u1=u1 for u1, which gives
u1=(I−JT(u0))−1(Tu0−JT(u0)u0)(I is the n×n identity).
Figure Figure 2.2 shows u0 and u1 when n=1 and Tu=1+u/(u+1) and u0=0.5. The value u1 is the fixed point of the first-order approximation T^. It is closer to the fixed point of T than u0, as desired.
Figure 2.2:First step of Newton’s method applied to T
Newton’s (fixed-point) method continues in the same way, from u1 to u2 and so on, leading to the sequence of points
Figure Figure 2.3 shows both the Newton approximation sequence and the successive approximation sequence applied to computing the fixed point of the Solow--Swan model from Section 1.2.3.2. We use two different initial conditions (top and bottom subfigures). Both sequences converge, but the Newton sequences converge faster.
Figure 2.3:Newton’s method applied to the Solow--Swan update rule
A fast rate of convergence for Newton scheme can be confirmed theoretically: Under mild conditions, there exists a neighborhood of the fixed point within which the Newton iterates converge quadratically. See, for example, Theorem 5.4.1 of Atkinson & Han (2005). Some dynamic programming algorithms take advantage of this fast rate of convergence (see Section 5.1.4.3).
Sometimes we can accelerate computations by exploiting a problem’s special structure (e.g., differentiability, convexity, monotonicity). But we often face a trade-off between speed and robustness to details of problem specification. More robust methods impose less structure.
Relative to other algorithms, successive approximation tends to be robust but slow. We saw one illustration of the relatively slow rate of convergence in Figure Figure 2.3. But we can also see its relatively strong robustness properties via the same example, by inspecting Figure Figure 2.4, which compares the update rule of successive approximation (the function g) with the update rule for Newton’s method (the function Q in (2.13)). Also plotted is the dashed 45 degree line.
The parameterization is the same as for the top subfigure in Figure Figure 1.7. As previously discussed, the shape of g implies global convergence of successive approximation. However, Q is well-behaved near the fixed point (i.e., very flat and hence strongly contractive) but poorly behaved away from the fixed point. This illustrates that Newton’s method is fast but generally less robust.
Figure 2.4:Robustness of successive approximation versus Newton’s method
We have discussed rates of convergence for fixed-point methods. Mathematicians and computer scientists also analyze algorithms via worst-case complexity, which measures the number of fundamental operations (e.g., addition and multiplication of floating point numbers) when an algorithm acts on data that is least favorable for good performance. These measures are attractive because they are independent of the software and hardware platforms on which algorithms are implemented.
Software and hardware matter not just for absolute performance of algorithms but also for relative performance. For example, although a single update step in successive approximation can often be partially parallelized, the algorithm is inherently serial, in the sense that the (k+1)-th iterate cannot be computed until iterate k is available. Moreover, because the rate of convergence is typically slow (i.e., linear), there can be many small serial steps. This limits parallelization.
Newton’s method is also serial to some degree, since we are just iterating with a different map (the operator Q in (2.13)). However, because it involves inverting matrices of possibly high dimension, each step is computationally intensive. At the same time, since the rate of convergence is faster, we have to take fewer steps. In this sense, the algorithm is less serial -- it involves a smaller number of more expensive steps. Because it is less serial, Newton’s method offers far more potential for parallelization. Thus, the speed gain associated with Newton’s method can become very large when using effective parallelization.
A partial order on a nonempty set P is a relation ⪯ on P×P that, for any p,q,r in P, satisfies
2
p⪯p
p⪯q and q⪯p implies p=q and
p⪯q and q⪯r implies p⪯r
(reflexivity),
(antisymmetry), and
(transitivity).
The pair (P,⪯) is called a partially ordered set. For convenience, we sometimes write P for (P,⪯) and q⪰p for p⪯q. The statement p⪯q⪯r means p⪯q and q⪯r.
In what follows, for u,v∈RX, we write u≪v if u(x)<v(x) for all x∈X.
The preceding pointwise concepts extend immediately to vectors, since vectors are just real-valued functions under the identification asserted in Lemma 1.2.4. In particular, for vectors u=(u1,…,un) and v=(v1,…,vn) in Rn, we write
u⩽v if ui⩽vi for all i∈[n] and
u≪v if ui<vi for all i∈[n].
Statements u⩾v and u≫v are defined analogously. Figure Figure 2.5 illustrates. Naturally, ⩽ is called the pointwise order on Rn.
Figure 2.5:Pointwise we have u⩽v and u≪v but not w⩽v
A partial order ⪯ on P is called total if, for all p,q∈P, either p⪯q or q⪯p.
Given a partially ordered set (P,⪯) and A⊂P, we say that g∈P is a greatest element of A if g∈A and, in addition, a∈A⟹a⪯g. We call ℓ∈P a least element of A if ℓ∈A and, in addition, a∈A⟹ℓ⪯a.
If A is totally ordered, then a greatest element g of A is also called a maximum of A, whereas a least element ℓ of A is also called a minimum. See Appendix Appendix A for more about maxima and minima.
Concepts of suprema and infima on the real line (Appendix Appendix A) extend naturally to partially ordered sets. Given a partially ordered set (P,⪯) and a nonempty subset A of P, we call u∈P an upper bound of A if a⪯u for all a in A. Letting UP(A) be the set of all upper bounds of A in P, we call uˉ∈P a supremum of A if
Thus, uˉ is the least element (see Section 2.2.1.2) of the set of upper bounds UP(A), whenever it exists.
If P⊂R and ⪯ is ⩽, then the notion of supremum on a partially ordered set reduces to the elementary definition of the supremum for subsets of the real line discussed in Appendix Appendix A.
Letting A be a subset of partially ordered space P,
the supremum of A is typically denoted ⋁A.
If A={ai}i∈I for some index set I, we also write ⋁A as ⋁iai.
If A={a,b}, then ⋁A is also written as a∨b.
Suprema and greatest elements are clearly related. The next exercise clarifies this.
We call ℓ∈P a lower bound of A if a⪰ℓ for all a in A. An element ℓˉ of P is called a infimum of A if ℓˉ is a lower bound of A and ℓˉ⪰ℓ for every lower bound ℓ of A. We use analogous notation to denote the infimum. For example, if A={a,b}, then ⋀A is also written as a∧b.
For us, the pointwise partial order ⩽ introduced in Example 2.2.2 is especially useful. In this section, we review some properties of this order. Throughout, X is an arbitrary finite set.
2.2.2.1Suprema and Infima under a Pointwise Order¶
Given u,v∈RX, the symbol u∧v is possibly ambiguous because we used the symbol both for a pointwise minimum in Section 1.2.4.1 and an infimum of {u,v} in Section 2.2.1.3. Fortunately, for elements of the partially ordered set (RX,⩽), these two definitions coincide. Indeed, if f(x):=min{u(x),v(x)} for all x∈X, then
f is a lower bound for {u,v} in (RX,⩽), and
g⩽u and g⩽v implies g⩽f.
Hence f is the infimum of {u,v} in (RX,⩽).
A subset V of RX is called a sublattice of RX if
u,v∈V implies u∨v∈V and u∧v∈V.
Above we discussed the fact that, for a pair of functions {u,v}, the supremum in (RX,⩽) is the pointwise maximum, whereas the infimum in (RX,⩽) is the pointwise minimum. The same principle holds for finite collections of functions. Thus, if {vi}:={vi}i∈I is a finite subset of RX, then, for all x∈X,
The next example discusses greatest elements in the setting of pointwise order.
Figure Figure 2.6 helps illustrate Example 2.2.7. In this case, v∗ is not in {vσ} and {vσ} has no greatest element (since neither vσ′⩽vσ′′ nor vσ′′⩽vσ′).
Given a partially ordered set (P,⪯) and a,b∈P, the order interval[a,b] is defined as all p∈P such that a⪯p⪯b. (If a⪯b fails, the order interval is empty.)
In this section, we note some useful inequalities and identities related to the pointwise partial order on RX. As before, X is any finite set.
These results follow immediately from proofs of corresponding claims when f,g, and h are in R. For example, by the usual triangle inequality for scalars, we have ∣f(x)+g(x)∣⩽∣f(x)∣+∣g(x)∣ for all x∈X. This is equivalent to the statement ∣f+g∣⩽∣f∣+∣g∣ in (i). Similarly, inequality (v) follows directly from a corresponding scalar inequality that was already proved in Exercise 1.32,.
A complete proof of lemma Lemma 2.2.1 can be found with Theorem 30.1 of Aliprantis & Burkinshaw (1998).
The inequality in Lemma 2.2.2 helps with dynamic programming problems that involve maximization. The next exercise concerns minimization.
We end this section with a discussion of upper envelopes. To frame the discussion, we take {Tσ}:={Tσ}σ∈Σ to be a finite family of self-maps on a sublattice V of RX. Consider some properties of the operator T on V defined by
It follows from the sublattice property that T is a self-map on V. In some sources, T is called the upper envelope of the functions {Tσ}. The following lemma will be useful for dynamic programming.
Order-preserving maps appear throughout the theory of dynamic programming. Here we define them and state a condition for contractivity that requires the order preserving property.
Regarding the definitions in (2.28) and (2.29), when (Q,⊴)=(R,⩽), it is common to say “increasing” instead of order-preserving, and “decreasing” instead of order-reversing. We adopt this terminology. In particular, given partially ordered set (P,⪯), we call h∈RP
increasing if p⪯p′ implies h(p)⩽h(p′) and
decreasing if p⪯p′ implies h(p)⩾h(p′).
We use the symbol iRP for the set of increasing functions in RP.
The next exercise shows that, in a totally ordered setting, an increasing function can be represented as the sum of increasing binary functions.
Our discussion of Banach’s Theorem in Section 1.2.2.3 showed the usefulness of contractivity. For an order-preserving operator on a subset of RX, the following condition often simplifies establishing this property. In the statement of the lemma, U is a subset of RX, partially ordered by ⩽, and X is finite. Also, U has the property that u∈U and c∈R+ implies $u
So far we have discussed partial orders over vectors, functions and sets. It is also useful to have a partial order over distributions that tells us when one distribution is in some sense “larger” than another. In this section, we introduce a partial order over some distributions commonly used in economics and finance.
Let’s start with an example. Recall that a random variable X is binomial B(n,0.5) if it counts the number of heads in n flips of a fair coin. Figure Figure 2.7 shows two distributions, ϕ=dX∼B(10,0.5) and ψ=dY∼B(18,0.5). Since Y counts over more flips, we expect it to take larger values in some sense, and we also expect its distribution ψ to reflect this. How can we make these thoughts precise?
A standard order over distributions that captures this idea is defined as follows: Given finite set X partially ordered by ⪯ and ϕ,ψ∈D(X), we say that ψstochastically dominatesϕ and write ϕ⪯Fψ if
The relation ⪯F is also called first-order stochastic dominance to differentiate it from other forms of stochastic order.
A good way to interpret first-order stochastic dominance is to suppose that an agent has preferences over outcomes in X described by a utility function u∈RX. Suppose in addition that the agent prefers more to less, in the sense that u∈iRX, and that the agent ranks lotteries over X according to expected utility, so that the agent evaluates ϕ∈D(X) according to ∑xu(x)ϕ(x). Then the agent (weakly) prefers ψ to ϕ whenever ϕ⪯Fψ.
We can say more. Consider the class A of all agents who (a) have preferences over outcomes in X, (b) prefer more to less, and (c) rank lotteries over X according to expected utility. Then ϕ⪯Fψ if and only if every agent in A prefers ψ to ϕ.
To state another useful perspective on stochastic dominance, we introduce the notation
For a given distribution ϕ, the function Gϕ is sometimes called the counter CDF (counter cumulative distribution function) of ϕ.
The proof is given. Figure Figure 2.8 helps to illustrate. Here X⊂R and ϕ and ψ are distributions on X. We can see that ϕ⪯Fψ because the counter CDFs are ordered in the sense that Gϕ⩽Gψ pointwise on X.
We are often interested in whether a change in a parameter shifts an outcome up or down. For example, a parameter might appear in a central bank decision rule for pegging an interest rate, and we want to know whether increasing that parameter will increase steady state inflation. By providing sufficient conditions for monotone shifts in fixed points, results in this section can help answer such questions.
Let (P,⪯) be a partially ordered set. Given two self-maps S and T on a set P, we write S⪯T if Su⪯Tu for every u∈P and say that TdominatesS on P.
One might assume that, in a setting where T dominates S, the fixed points of T will be larger. This can hold, as in Figure Figure 2.9, but it can also fail, as in Figure Figure 2.10. A difference between these two situations is that in Figure Figure 2.9 the map T is globally stable. This leads us to our next result.
Figure 2.9:Ordered fixed points when global stability holds
Figure 2.10:Reverse-ordered fixed points when global stability fails
As an application of Proposition 2.2.7, consider again the Solow--Swan growth model kt+1=g(kt):=sf(kt)+(1−δ)kt. We saw in Section 1.2.3.2 that if f(k)=Akα where A>0 and α∈(0,1), then g is globally stable on M:=(0,∞). Clearly k↦g(k) is order-preserving on M. If we now increase, say, the savings rate s, then g will be shifted up everywhere, implying, via Proposition 2.2.7, that the fixed point also rises. Exercise 2.45 asks you to step through the details.
Figure Figure 2.11 helps illustrate the results of Exercise 2.45. The top left sub-figure shows a baseline parameterization, with A=2.0, s=α=0.3 and δ=0.4. The other sub-figures show how the steady state changes as parameters deviate from that baseline.
Figure 2.11:Parametric monotonicity for the Solow--Swan model
Figure Figure 2.12 gives an illustration of the result in Exercise 2.46. Here an increase in β leads to a larger continuation value. This seems reasonable, since larger β indicates more concern about outcomes in future periods.
Figure 2.12:Parametric monotonicity in β for the continuation value
While the preceding examples of parametric monotonicity are all one-dimensional, we will soon see that Proposition 2.2.7 can also be applied in high-dimensional settings.
Many aspects of dynamic programming are most clearly framed using operator theory. In this section, we discuss linear operators and their connections to matrices. We emphasize nonnegative matrices and so-called positive linear operators that arise naturally in dynamic programming.
We call a matrix Anonnegative and write A⩾0 if all elements of A are nonnegative. We call Aeverywhere positive and write A≫0 if all elements of A are strictly positive. A square matrix A is called irreducible if A⩾0 and ∑k=1∞Ak≫0. An interpretation in terms of connected networks is given in Chapter 1 of Sargent & Stachurski (2023).
Let A be n×n. It is not always true that the spectral radius ρ(A) is an eigenvalue of A.[1] However, when A⩾0, the spectral radius is always an eigenvalue. The following theorem states this result and several extensions.
The convergence in (2.43) provides a sharp characterization of large powers of A that will prove useful in what follows. The assumption that A is everywhere positive can be weakened without affecting this convergence. A complete statement and full proof of the Perron--Frobenius theorem can be found in Meyer (2000).
We can use the Perron--Frobenius theorem to provide bounds on the spectral radius of a nonnegative matrix. Fix n×n matrix A=(aij) and set
Let A be an n×n matrix. We know from Gelfand’s formula that if ∥⋅∥ is any matrix norm, then ∥Ak∥1/k→ρ(A) as k→∞. While useful, this lemma can be difficult to apply because it involves matrix norms. Fortunately, when A is nonnegative, we have the following variation, which only involves vector norms.
The expression on the left of (2.44) is sometimes called the local spectral radius of A at h. Lemma 2.3.3 gives one set of conditions under which a local spectral radius equals the spectral radius. This result will be useful when we examine state-dependent discounting in Chapter 6.
For a proof of Lemma 2.3.3 see Theorem 9.1 of Krasnosel’skii et al. (1972).
where 1 is a column vector of ones, so that P is nonnegative and has unit row sums. The Perron--Frobenius theorem will be useful for the following exercise.
The vector ψ in part (iii) of Exercise 2.48 is called a stationary distribution for P. Such distributions play an important role in the theory of Markov chains. We discuss their interpretation and significance in Section 3.1.2.
We illustrate the power of the Perron--Frobenius theorem by showing how it helps us analyze a model of employment and unemployment flows in a large population.
The model is sometimes called a “lake model” because there are two pools of workers: those who are currently employed and those who are currently unemployed but still seeking work. The flows between states are as follows:
Workers exit the labor market at rate d.
New workers enter the labor market at rate b.
Employed workers separate from their jobs and become unemployed at rate α.
Unemployed workers find jobs at rate λ.
We assume that all parameters lie in (0,1). New workers are initially unemployed.
Transition rates between two pools appear in Figure Figure 2.13. For example, the rate of flow from employment to unemployment is α(1−d), which equals the fraction of employed workers who remained in the labor market and separated from their jobs.
Let et and ut be the number of employed and unemployed workers at time t respectively. The total population (of workers) is nt:=et+ut. In view of the rates just stated, the number of unemployed workers evolves according to
The three terms on the right correspond to the newly unemployed (due to separation), the unemployed who failed to find jobs last period, and new entrants into the labor force. The number of employed workers evolves according to
Evolution of the time series for ut, et and nt is illustrated in Figure Figure 2.14. We set parameters to α=0.01, λ=0.1, d=0.02, and b=0.025. The initial population of unemployed and employed workers are u0=0.6 and e0=1.2, respectively. The series grow over the long run due to net population growth.
Figure 2.14:Time series for et, ut and nt, (lake_2.jl)
Can we say more about the dynamics of this system? For example, what long-run unemployment rate should we expect? Also, do long-run outcomes depend heavily on the initial conditions u0 and e0? Can we make some general statements that hold regardless of the initial state?
To begin to address these questions, we first organize the linear system for (et) and (ut) by setting
With these definitions, we can write the dynamics as xt+1=Axt. As a result, xt=Atx0, where x0=(u0e0)⊤.
The overall growth rate of the total labor force is g=b−d, in the sense that nt+1=(1+g)nt for all t.
In the language of Perron--Frobenius theory, the right eigenvector xˉ is called the dominant eigenvector, since it corresponds to the dominant (i.e., largest) eigenvalue ρ(A). This eigenvector plays an important role in determining long-run outcomes. In the remainder of this section we illustrate this fact.
To begin, recall that αxˉ is also a right eigenvector corresponding to the eigenvalue ρ(A) when α>0. The set D:={x∈R2:x=αxˉ for some α>0} is shown as a dashed black line in Figure Figure 2.15. The figure also shows two time paths, each of the form (xt)t⩾0=(Atx0)t⩾0, generated from two different initial conditions. In both cases, we see that both paths converge to D over time. The figure suggests that paths share strong similarities in the long run that are determined by the dominant eigenvector xˉ.
Figure 2.15:Time paths xt=Atx0 for two choices of x0 (lake_1.jl)
To see why this is so, we return (2.43) from to the Perron--Frobenius theorem, which tells us that, since A≫0, we have
where nt=(1+g)tn0 and n0=u0+e0. This says that, regardless of the initial condition, the state xt scales along xˉ at the rate of population growth. This is precisely what we saw in Figure Figure 2.15.
We can provide additional interpretations to the components uˉ and eˉ of xˉ. Since nt is the size of the workforce at time t, the rate of unemployment is ut/nt. As just shown, for large t this is close to (ntuˉ)/nt=uˉ. Hence uˉ is the long-term rate of unemployment along the stable growth path. Similarly, the other component eˉ of the dominant eigenvector is the long-run employment rate.
In summary, the dominant eigenvector provides with both the long-run rate of unemployment and the stable growth path, to which all trajectories with positive initial conditions converge over time.
There are two ways to think about a matrix. In one definition, an n×k matrix A is an n×k array of (real) numbers. In the second, A is a linear operator from Rk to Rn that takes a vector u∈Rk and sends it to Au in Rn. Let’s clarify these ideas in a setting where n=k. While the matrix representation is important, the linear operator representation is more fundamental and more general.
(We write Lu instead of L(u), etc.) For example, if A is an n×n matrix, then the map from u to Au defines a linear operator, since the rules of matrix algebra yield A(αu+βv)=αAu+βAv.
We just showed that each matrix can be regarded as a linear operator. In fact the converse is also true:
A proof of Theorem 2.3.4 can be found in Kreyszig (1978) and many other sources.
Why introduce linear operators if they are essentially the same as matrices? One reason is that, while a one-to-one correspondence between linear operators and matrices holds in Rn, the concept of linear operators is far more general. Linear operators can be defined over many different kinds of sets whose elements have vector-like properties. This is related to the point that we made about function spaces in Remark 1.2.2.
Another reason is computational: The matrix representation of a linear operator can be tedious to construct and difficult to instantiate in memory in large problems. We illustrate this point in Section 2.3.3.3.
The definition of linear operators on Rn extends naturally to linear operators on RX when X={x1,…,xn}: A linear operator on RX is a map L from RX to itself such that, for all u,v∈RX and α,β∈R, we have L(αu+βv)=αLu+βLv. In what follows,
We use the same symbol L on both sides of the equals sign because both represent essentially the same object (in the sense that a matrix A can be viewed as a collection of numbers (Aij) or as a linear map u↦Au).
The function L on the right-hand side of (2.56) is sometimes called the “kernel” of the operator L. However, we will call it a matrix in what follows, since L(x,x′)=L(xi,xj) is just an n×n array of real numbers. When more precision is required, we will call it the matrix representation of L.
In essence, the operation in (2.56) is just matrix multiplication: (Lu)(x) is row x of the matrix product Lu.
The eigenvalues and eigenvectors of the linear operator L are defined as the eigenvalues and eigenvectors of its matrix representation. The spectral radius ρ(L) of L is defined analogously.
We used the same symbol for the operator L on the left-hand side of (2.56) and its matrix representation on the right because these two objects are in one-to-one correspondence. In particular, every L∈L(RX) can be expressed in the form of (2.56) for a suitable choice of matrix (L(x,x′)). Readers who are comfortable with these claims can skip ahead to Section 2.3.3.3. The next lemma provides more details.
Lemma 2.3.5 needs no formal proof. Theorem 2.3.4 already tells us that (a) and (b) are in one-to-one correspondence. Also, (b) and (c) are in one-to-one correspondence because each L∈L(RX) can be identified with a linear operator u↦Lu on Rn by pairing u,Lu∈RX with its vector representation in Rn (see Section 1.2.4.2). Finally, (d) and (a) are in one-to-one correspondence under the identification L(xi,xj)↔Lij.
At the end of Section 2.3.3.1 we claimed that working with linear operators brings some computational advantages vis-à-vis working with matrices. This section fills in some details (Readers who prefer not to think about computational issues at this point can skip ahead to Section 2.3.3.4.)
To illustrate the main idea, consider a setting where the state space X takes the form X=Y×Z with ∣Y∣=j and ∣Z∣=k. A typical element of X is x=(y,z). As we shall see, this kind of setting arises naturally in dynamic programming.
Let Q be a map from Z×Z to R (i.e., a k×k matrix) and consider the operator sending u∈RX to Lu∈RX according to the rule
Since L is a linear operator on RX, Lemma 2.3.5 tells us that L can be represented as an n×n matrix (L(xi,xj))=(Lij), where n=∣X∣=j×k. To construct this matrix, we first need to “flatten” Y×Z into a set X={x1,…,xn} with a single index. There are two natural ways to do this. Considering Y×Z as a two-dimensional array with typical element (yi,zj), we can (a) stack all k columns vertically into one long column, or (b) concatenate all j rows into one long row. The first arrangement is called column-major ordering and is the default for languages such as Julia and Fortran. The second is called row-major ordering and is the default for languages such as Python and C. Either way we obtain a set of elements indexed by 1,…,n.
After adopting one of these conventions, Lemma 2.3.5 assures us we can construct a uniquely defined n×n matrix that represents L. Once we decide how to construct this matrix, we can instantiate it in computer memory and compute the operation u↦Lu by matrix multiplication.
There are, however, several disadvantages to implementing L using this matrix-based approach. One is that constructing the matrix representation is tedious. Another is that confusion can arise when swapping between column- and row-major orderings in order to shift between languages or to communicate with colleagues. A third is that differences are introduced between computer code and the natural representation (2.57), which can be a source of bugs. A fourth issue is that an n×n matrix has to be instantiated in memory, even though the linear operation in (2.57) is only an inner product in Rk. The last issue can be alleviated in most languages by employing sparse matrices, but doing so adds boilerplate and can be a source of inefficiency.
Because of these issues, most modern scientific computing environments support linear operators directly, as well as actions on linear operators such as inverting linear maps. These considerations encourage us to take an operator-based approach.
Having agreed on the benefits of an operator-theoretic exposition, let us now describe some kinds of linear operators. We continue to assume that X is a finite set with n elements.
The set R+X of all u∈RX with u⩾0 is called the positive cone of RX. An operator L∈L(RX) is called positive if L is invariant on the positive cone; that is, if
Viewed as matrices, elements of M(RX) are nonnegative matrices whose rows sum to one. The next exercise asks you to confirm this.
In the next exercise, you can think of ϕ as a row vector and ϕP as premultiplying the matrix P by this row vector. Chapter 3 uses the map ϕ↦ϕP to update marginal distributions generated by Markov chains.
Markov operators are important for us because they generate Markov dynamics, a foundation of dynamic programming. Thus, (2.63) is a rule for updating distributions by one period under the Markov dynamics specified by P. We’ll use it often in the next chapter.
Davey & Priestley (2002) provide a good introduction to partial orders and order-theoretic concepts. Our favorite books on fixed points and analysis include Ok (2007), Zhang (2012), Cheney (2013), and Atkinson & Han (2005). Good background material on order-theoretic fixed-point methods can be found in Guo et al. (2004) and Zhang (2012).
Atkinson, K., & Han, W. (2005). Theoretical Numerical Analysis (Vol. 39). Springer.
Aliprantis, C. D., & Burkinshaw, O. (1998). Principles of Real Analysis (3rd ed.). Academic Press.
Sargent, T., & Stachurski, J. (2023). Economic Networks: Theory and Computation. Cambridge University Press.
Meyer, C. D. (2000). Matrix Analysis and Applied Linear Algebra (Vol. 71). Siam.
Krasnosel’skii, M. A., Vainikko, G. M., Zabreiko, P. P., Rutitskii, Ya. B., & Stetsenko, V. Ya. (1972). Approximate Solution of Operator Equations. Springer Netherlands.
Kreyszig, E. (1978). Introductory Functional Analysis with Applications (Vol. 1). Wiley New York.
Zaanen, A. C. (2012). Introduction to Operator Theory in Riesz Spaces. Springer.
Davey, B. A., & Priestley, H. A. (2002). Introduction to Lattices and Order. Cambridge University Press.
Ok, E. A. (2007). Real Analysis with Economic Applications (Vol. 10). Princeton University Press.
Zhang, Z. (2012). Variational, Topological, and Partial Order Methods with Their Applications (Vol. 29). Springer.
Cheney, W. (2013). Analysis for Applied Mathematics (Vol. 208). Springer Science & Business Media.
Guo, D., Cho, Y. J., & Zhu, J. (2004). Partial Ordering Methods in Nonlinear Problems. Nova Publishers.