This chapter collects the mathematical tools employed throughout the book. It is intended as a reference; readers should consult the relevant sections as needed. At minimum, all readers should be familiar with the material in Section A.1 before starting Chapter 2.
Let’s start with the real line. This set has a natural metric and a natural order, both of which have many important properties. (Later we will investigate conditions under which such properties extend to more general spaces.)
A point u∈R is called an upper bound of a set A⊂R if a⩽u for all a∈A. Let U(A) be the set of upper bounds of A. If s∈U(A) and s⩽u for all u∈U(A), then s is called the supremum of A and we write s=supA. At most one such supremum s exists. If s is in U(A) then the following are equivalent:
s=supA
for all ϵ>0, there exists a point a∈A with a>s−ϵ
Theorem A.1.1 is essentially axiomatic. It is equivalent to the “completeness” property of R that we discuss in Section A.1.1.2.
For A⊂R, a lower bound of A is any number ℓ such that ℓ⩽a for all a∈A. If i∈R is a lower bound for A and also satisfies i⩾ℓ for every lower bound ℓ of A, then i is called the infimum of A and we write i=infA. At most one such i exists, and every nonempty subset of R bounded from below has an infimum.
We adopt the following conventions:
If A is not bounded above, then supA=+∞.
If A is not bounded below, then infA=−∞.
If A=∅, then supA=−∞ and infA=+∞.
A number m contained in a subset A of R is called the maximum of A and we write m=maxA if a⩽m for every a∈A. It is called the minimum of A if a⩾m for every a∈A.
As usual, given a,b∈R we write a∧b for min{a,b} and a∨b for max{a,b}. Regarding the order structure of R, the following relationships are sometimes helpful: Given x,y∈R and a∈R+,
Recall that a sequence (xn)⊂R is called Cauchy if, for all ϵ>0, there exists an N∈N with ∣xn−xm∣<ϵ whenever n,m⩾N.
The statement that every Cauchy sequence in R converges should be understood as axiomatic. It states that, once the irrational numbers are mixed in with the rational numbers, there are “no more gaps” in the real line. This is called the completeness property of R. See, e.g., Bartle & Sherbert (2011).
The pair (V,⪯) is called a partially ordered set -- or poset -- if V is any nonempty set and ⪯ is a relation ⪯ on V×V such that, for any u,v,w in V,
2
u⪯u
u⪯v and v⪯u implies u=v and
u⪯v and v⪯w implies u⪯w
(reflexivity)
(antisymmetry)
(transitivity)
The relation ⪯ is called a partial order on V. We often write V instead of (V,⪯) when ⪯ is understood. We sometimes say that wdominatesv when v⪯w.
A subset C of a poset (V,⪯) is called a chain in V if either u⪯v or v⪯u for all u,v∈C. A poset (V,⪯) is called totally ordered if V itself is a chain.
In applications, one of the most important notions of partial order is the pointwise partial order. To define it we let U,V be nonempty sets with V partially ordered by ⪯. Let VU be a set of maps from U to V. For each f,g∈VU, we set
Then ⪯ is a partial order on VU, usually called the pointwise order on VU.
One very common setting is where V⊂RX for some nonempty set X. In this setting, we always write the pointwise partial order as ⩽. In particular, for arbitrary u,v∈RX we write u⩽v if and only if u(x)⩽v(x) for all x∈X. The partial order in Example A.1.3 is a special case, when X={1,…,n}.
A subset A of poset V is called bounded above (resp., bounded below) if the set of upper bounds (resp., lower bounds) of A is nonempty (i.e., there exists at least one v∈V with a⪯v for all a∈A). A is called order bounded in V if A is both bounded above and bounded below. Obviously, A is order bounded in V if and only if there exists an order interval I⊂V such that A⊂I.
g∈V is the greatest element of A if g∈A and a∈A⟹a⪯g; and
ℓ∈V is the least element of A if ℓ∈A and a∈A⟹ℓ⪯a.
In other words, a greatest element of A is an upper bound of A that is also contained in A, while a least element of A is a lower bound of A also contained in A.
Not all subsets of partially ordered sets have greatest elements. For one example, observe that N⊂(R,⩽) has no greatest element. In this case the set of upper bounds is empty, so finding a greatest element is impossible. We can also have situations where the set of upper bounds is nonempty but no greatest element exists.
Figure A.1:The unit circle in R2 has no greatest element
If a poset V has a greatest element, that element is sometimes called the top of V. A least element of V is sometimes called the bottom of V.
Let A be a subset of poset V and let U(A) be the set of all upper bounds of A in V. We call s∈V the supremum of A if s is a least element of U(A). Since least elements are unique (Exercise A.2), subsets of V can have at most one supremum. When it exists, the supremum of A is denoted by ⋁A. Also,
if A={ai}i∈I for some index set I, we write ⋁A as ⋁iai.
Given u and v in V, the supremum ⋁{u,v} is also called the join of u and v, and is written u∨v.
While every A⊂R that is bounded above has a supremum (Theorem A.1.1), the same is not true for arbitrary posets.
Figure A.2 provides a visualization of u∨v and u∧v when V=(R2,⩽). Figure A.3 provides a visualization of f∨g and f∧g when V=(RX,⩽) for some subset X of the reals. In both cases, ⩽ is the pointwise partial order.
Figure A.3:Functions f∨g and f∧g when defined on a subset of R
Given A contained in poset V, an element of V is called the infimum of A if it is a greatest element of the set of lower bounds of A. The infimum of A is typically denoted ⋀A. If V⊂R with the usual order ⩽, then we also use the notation infA. Also,
if A={ai}i∈I for some index set I, we sometimes write ⋀A as ⋀iai.
Given u and v in V, the infimum ⋀{u,v} is also called the meet of u and v, and is written u∧v.
For the poset (R,⩽), every x∈R is an upper bound of the empty set ∅, since, vacuously, x dominates all elements of ∅. Since R has no least element, the set ∅⊂R has no supremum in R. By related reasoning, if we restrict attention to ([0,1],⩽), we see that 0 is the supremum of ∅⊂[0,1]. The next exercise extends this line of reasoning.
Given partially ordered set V, let V∂=(V,⪯∂) be the order dual (also called the dual), so that, for u,v∈V, we have u⪯∂v if and only if v⪯u. We use ⋁∂A to denote the supremum of A⊂V∂ in V∂ and ⋀∂ for the infimum.
Let V be any poset. A sequence (vn)n⩾1 in V is called increasing if vn⪯vn+1 for all n∈N, and decreasing if vn+1⪯vn for all n∈N. We write
vn↑v when (vn) is increasing and ⋁nvn=v and
vn↓v when (vn) is decreasing and ⋀nvn=v.
These symbols generalize standard notation for convergence of monotone sequences in R. For example, if (un) is increasing in R and its limit is u, then one writes un↑u. This is a special case of the usage above, since u is also the supremum of (un) under the standard order on R.
In some settings, the order theoretic concepts ↑ and ↓ have simple pointwise characterizations. The next lemma gives one such characterization for ↑ (and an analogous result holds for ↓). In the statement of the lemma, V⊂RX for some nonempty X and ⩽ is the pointwise partial order. Also, we say that V is closed under pointwise suprema if, for every increasing (vn)⊂V that is bounded above, the pointwise supremum s(x)=supnvn(x) is an element of V.
A self-map S from poset V=(V,⪯) to poset U=(U,⊴) is called
order preserving if v,w∈V and v⪯w implies Sv⊴Sw, and
order reversing if v,w∈V and v⪯w implies Sw⊴Sv.
In the definition of order preserving above, one common setting is when U=R with its standard order. In this case, the mapping S is often called increasing. We will also use this terminology. The result in the next exercise uses the fact that the standard order on R is closed (i.e., preserved under limits).
Now let’s examine a form of strict monotonicity. We consider posets V=(V,⪯) and W=(W,⊴). For u,v∈V, we write u≺v if u⪯v and not u=v. For x,y∈W, we write x⊲y if x⊴y and not x=y. We call a map S from V to Wstrictly order preserving if v≺w implies Sv⊲Sw. In the example below, ⩽ is the pointwise partial order and u<v means u⩽v and not u=v.
A surjective map F from poset (V,⪯) to poset (V^,⊴) is called an
order isomorphism if v⪯w⟺Fv⊴Fw, and an
order anti-isomorphism if v⪯w⟺Fw⊴Fv.
(Surjective means that F maps V onto V^, so each v^∈V^ has a preimage.) When such an order isomorphism (resp., anti-isomorphism) exists, we say that V and V^ are isomorphic (resp., anti-isomorphic).
In the next two exercises, X is any nonempty set and all spaces of real-valued functions have the pointwise partial order. Scalar actions on functions are applied pointwise; for example, given h∈RX, the function exph maps x to exp(h(x)).
Figure A.4 provides a visualization of the result in Exercise A.10 when X=[0,1]. The two functions are h(x)=x2−1/2, and h′(x)=x, so that h⩽h′. The middle panel sets θ=2, which preserves the order. The right panel sets θ=−2, which reverses order.
In Section 1.3.2.2 we discussed the fact that contractivity of the Bellman operator plays a significant role in the proof of Bellman-type optimality results for the optimal savings problem. Contractivity is a metric property that has no immediate counterpart in an abstract partially ordered set. This motivates us to introduce weaker conditions on operators that are well defined in any poset and, at the same time, strong enough to generate useful optimality results. This section gives details.
Let V be a poset and let S be a self-map on V. In this setting, we call Sorder stable if
S has a unique fixed point vˉ in V,
v∈V with v⪯Sv implies v⪯vˉ, and
v∈V with Sv⪯v implies vˉ⪯v.
Conditions (ii) and (iii) say that points mapped up by S lie below the fixed point, while points mapped down lie above it.
We strengthen this notion by adding monotone convergence to the fixed point. We call Sstrongly order stable if S is order stable and, in addition,
v∈V with v⪯Sv implies Snv↑vˉ, and
v∈V with Sv⪯v implies Snv↓vˉ.
Figure A.5 gives an illustration of a strongly order stable map S on V=[0,1]. All points mapped up by S lie below and converge up to its unique fixed point. All points mapped down by S lie above and converge down to its fixed point.
Most results in this book require only order stability. Strong order stability is invoked in a small number of places where monotone convergence of the iterates is needed.
The following result is useful when we consider minimization problems.
Let V be a nonempty set. A function d:V×V→R is called a metric on V if, for any u,v,w∈V,
2
d(u,v)⩾0,
d(u,v)=0⟺u=v,
d(u,v)=d(v,u) and
d(u,v)⩽d(u,w)+d(w,v).
(nonnegativity)
(identifiability)
(symmetry)
(triangle inequality)
Together, the pair (V,d) is called a metric space. When the metric is clear from context we refer to the metric space by the symbol V alone.
If (V,d) is a metric space and N⊂V, then (N,d) is also a metric space (where d in the second case is defined by restricting the original metric to (u,v)∈N×N).
A subsequence of a sequence (un) in V is any sequence of the form (unk)k⩾1, where (nk) is a strictly increasing sequence in N.
A metric space V is called separable if there exists a countable set A⊂V such that, for any v∈V, there exists a sequence (an) contained in A with an→v. For example, R is separable because any v∈R can be expressed as the limit of a rational sequence. Separability is useful in certain settings -- particularly when we need to combine topology and measure (see, e.g., Theorem A.3.3). In the applications we consider, most spaces will be separable.
Let V be a metric space. A point u∈A⊂V is called interior to A if there exists an ϵ>0 such that Bϵ(u)⊂A.
A subset G of V is called open in V if every u∈G is interior to G. For example, every subset of a discrete metric space is open, since B1/2(u)={u} for any u.
A subset F of V is called closed if given any sequence (un) satisfying un∈F for all n and un→u for some u∈V, the point u is in F. In other words, F contains the limit points of all convergent sequences that take values in F. Arbitrary unions and finite intersections of open sets are open, while arbitrary intersections and finite unions of closed sets are closed. A set G⊂V is open if and only if Gc is closed.
A set D in V is called bounded if there exists a finite K such that d(u,v)⩽K whenever u,v∈D. A sequence in V is called bounded if its range is a bounded subset of V. A subset K of V is called precompact in V if every sequence in K has a subsequence converging to some point in V. The set K is called compact if, in addition, the limit points always lie in K. Thus, K is compact if and only if K is closed and precompact.
The following theorem is a bedrock of real analysis.
Every precompact subset of a metric space is bounded, but the converse is not true in general. For example, consider the set bR with the supremum distance (Example A.1.14). Let fn be the normal density with variance 1 and mean n for each n in N. The set {fn}n∈N is bounded, since d∞(fn,fm)⩽1 for all n,m. But it is not precompact. For example, the sequence {fn}n∈N has no convergent subsequence. Indeed, every pair of distinct points fn,fm in the sequence has d∞(fn,fm)=1.
Later, in Section A.4.2.3, we will see sufficient conditions for boundedness to imply precompactness.
Let V=(V,d) be a metric space. Analogous to the real case (see Section A.1.1.2), a sequence (un)⊂V is called Cauchy if, given any ϵ>0, there exists an nϵ∈N such that n,m⩾nϵ implies d(un,um)<ϵ. (V,d) is called complete if every Cauchy sequence in V converges in V. Examples of complete spaces include Rn paired with any metric generated by a norm, the set of n×k matrices paired with any metric generated by a norm, the space (ℓp(X),dp) for countable X and p∈[1,∞], the space (bX,d∞), and the space (bcX,d∞). Two metrics d1 and d2 on V are called equivalent if there are positive constants α,β such that αd1(u,v)⩽d2(u,v)⩽βd1(u,v) for all u,v∈V. Equivalent metrics generate the same Cauchy sequences, so completeness is preserved under equivalence.
Topological spaces are a generalization of metric spaces. They are useful for two reasons. One is that there exist interesting and useful topological spaces that cannot be represented as metric spaces. The second is that, by stripping away some of the structure naturally present in metric spaces, topological arguments add simplicity and clarity to many discussions in analysis.
The family τ is called a topology on V. The elements of τ are called open sets. Complements of open sets are called closed.
A subset N of a topological space V=(V,τ) is called a neighborhood of a point v∈V if there exists a G∈τ with v∈G⊂N. A topological space V is called a Hausdorff space if, for any u,v∈V with u=v, there exist neighborhoods N of u and M of v with N∩M=∅. Every metrizable space is Hausdorff, and all topological spaces we consider in this book are Hausdorff spaces.
We briefly introduce nets, which are a generalization of a sequence. Nets are important because (a) they characterize topologies, in a sense described below, and (b) nets allow us to describe definitions and properties in a way that connects neatly to sequence-based definitions in metric spaces.
Let A be any nonempty set. A preorder on A is a relation ⪯ on A×A such that, for any a,b,c in A we have a⪯a (reflexivity) and a⪯b and b⪯c implies a⪯c (transitivity). Obviously any antisymmetric preorder on A is a partial order on A. A directed set is a nonempty set A and a preorder ⪯ on A such that, for any a,b∈A, there exists a c∈A with a⪯c and b⪯c.
Let V be any set. A net in V is a function from a directed set A to V, typically written as v∙ or (vα)α∈A. We sometimes simplify the latter to (vα). The interpretation is that α∈A is mapped to vα∈V. Obviously any sequence (vn) in V is also a net in V.
A net (vα)α∈A in V is said to converge to v∈V and we write vα→v if, for any neighborhood N of v, there exists a β∈A such that vα∈N whenever β⪯α. This generalizes the concept of convergence of sequences in metric space. It is easy to check that convergent nets in V have unique limits whenever V is Hausdorff. (The converse is also true.)
The next theorem shows that nets can be used to characterize topologies.
For a proof of Theorem A.2.1, see Theorem 2.14 of Aliprantis & Border (2006).
Let (vα)α∈A and (wβ)β∈B be two nets in V. The net (wβ)β∈B is called a subnet of (vα)α∈A if there exists an order preserving map p from B to A such that (i) wβ=vp(α) for all α∈A and (ii) for all α∈A, there exists a α′∈p(B) such that α⪯α′.
Subnets generalize subsequences. For example, suppose that wn=1/n2 and vn=1/n for n∈N, then (wn)n∈N is a subnet of (vn)n∈N in R (take A=B=N and p(n)=n2).
A subset K of topological space V is called compact if, given any net (vα) contained in K, there exists a subnet (wβ) of (vα) and a point v∈K such that wβ→v. This generalizes the notion of a compact subset of a metric space, as given in Section A.1.3.4.
Let V and W be topology spaces. A function f:V→W is said to be continuous at v∈V if, for any net (vα) in V with vα→v in V we have f(vα)→f(v) in W. If f is continuous at every v∈V we simply say that f is continuous. It is well-known that f is continuous on V if and only if f−1(G) is open in V whenever G is open in W. (For a proof of this equivalence, see, e.g., Theorem 2.28 of Aliprantis & Border (2006).)
One of the most important features of continuous functions is that they carry compact sets into compact sets (see, e.g., §2.3 of Dudley (2002)):
Let V be a nonempty set and, for each α in index set Λ, let fα be a function from V to topological space (Wα,τα). The initial topology generated by {fα}α∈Λ is the topology τ on V generated (in the sense of Example A.2.3) by the family of sets
Evidently each fα is continuous with respect to τ on V. The following lemma nicely characterizes convergence with respect to the initial topology. In the statement, τ is the initial topology just described.
A topological space (V,τ) is called metrizable if there exists a metric d on V such that d generates the topology τ. In metrizable spaces, sequences have the same “rights” as sequences in Euclidean space, in the sense that they determine the topology and hence other derived objects such as continuous functions. For example, given two metrizable spaces V and W,
a function f:V→W is continuous if and only if, for u∈V and any sequence (vn)⊂V we have f(vn)→f(v) in W whenever vn→v in V.
A set C is closed in V if and only if any convergent sequence contained in C converges to an element of C.
Equivalent metrics generate the same topology, which is why it is often nicer to discuss topologies than metrics. For example, we will see later (in Section A.4.2) that all metrics on Euclidean space Rn generated by a norm are equivalent. Hence, while there are infinitely many norms on Rn, they all generate the same topology. This means that, when discussing norm topologies, we can speak unambiguously about open sets, compact sets, continuous functions, etc.
Let {(Vn,τn)}n∈N be a family of topological spaces and consider the Cartesian product V=∏n∈NVn. The i-th projection map on V is the function πi sending v=(vn)n∈N∈V into vi. The product topology on V, denoted here by τ, is the initial topology generated by the set of projection maps {πn}n∈N. The following result is a direct consequence of Lemma A.2.3.
More generally, if ((Xi,di))i=1n are metric spaces and X:=∏iXi has the product topology, then (uk)⊂X converges to u∈X if and only if di(uk,u)→0 for all i.
For finite subsets of R, maxima and minima clearly exist. For infinite collections the same is not true. For example, the set (0,1) has neither a maximum nor a minimum.
Under what conditions on primitives are maxima and minima guaranteed to exist? There are multiple approaches to this issue, depending on the structure of the problem. In this section we treat one of the most fundamental, attributed to the German mathematician Karl Weierstrass (1815--1897).
Let f be a function from a metric space V to R. Let (vn) be an V-valued sequence and let v be a point in V. The function f is called
lower semicontinuous at v if vn→v implies f(v)⩽liminfnf(vn), and
upper semicontinuous at v if vn→v implies f(v)⩾limsupnf(vn).
If f is lower semicontinuous at every point in V, then f is called lower semicontinuous, and similarly for upper continuity.
A proof of the next theorem can be found in Jahn (2020).
Let V be any set and let S be a self-map on V. If v∈V obeys Sv=v, then v is called a fixed point of S in V. For example, if V=R and S is the identity, then every point in R is fixed under S. If, instead, Sx=x2, then the set of fixed points is {0,1}.
Now let V be a topological space. We call S:V→Vglobally stable on V if S has a unique fixed point u∗∈V and Sku→u∗ as k→∞ for all u∈V. When V is metrizable, with metric d, a self-map S is called asymptotically contracting if d(Snu,Snv)→0 as n→∞ for all u,v∈V.
For a proof, see, for example, Aliprantis & Border (2006), Theorem 3.48.
Most of the conclusions of Banach’s contraction mapping theorem carry over when S is eventually contracting; that is, when Sk is contracting for some k∈N. A proof can be found on p. 9 of Goebel & Kirk (1990).
In this section we review measurable functions and integration theory. Measurable functions generalize continuous functions while remaining closed under standard arithmetic and limiting operations, and they admit a well-defined theory of integration. Throughout, for real-valued f on an arbitrary domain, we set f+:=f∨0 and f−:=−(f∧0). See Figure A.6 for an illustration. The function f+ is called the positive part of f, while f− is called the negative part. The identity f=f+−f− always holds, so the pair f+, f− provides a decomposition of f into the difference between two nonnegative functions.
Let X be any nonempty set. A collection of subsets A of X is called a σ-algebra on X if
X∈A,
A∈A implies Ac∈A, and
if {An}n⩾1 is a sequence contained in A, then ∪nAn∈A.
A pair (X,A) where X is a nonempty set and A is a σ-algebra on X is called a measurable space.
Points (ii) and (iii) tell us that A is “stable” under the taking of complements and unions. By De Morgan’s law (∩nAn)c=∪nAnc, any σ-algebra is stable under countable intersections too. By (i) and (ii), ∅∈A also holds.
One way to define a σ-algebra is to take a collection C of subsets of X, and consider the smallest σ-algebra that contains this collection.
Now let X be a metric space. The family of Borel sets on X, denoted by either B or BX depending on whether or not the underlying space is clear, is defined as the σ-algebra generated by the open sets of X. Evidently B contains not only all the open subsets of X but also all the closed ones. From these sets we can continue taking complements and countable unions and everything we produce must be a Borel set. In fact it turns out that every set we work with in day-to-day analysis is a Borel set.
In other words, measurable functions are those functions that pull measurable sets back to measurable sets. If Y is a metric space and B is its Borel sets, then we will say that f is Borel measurable. It can be shown in this case (see, e.g., Çınlar (2011), Proposition 2.3) that f is Borel measurable if and only if either one of the following apparently weaker conditions are satisfied:
f−1(G) is in A whenever G is open in Y
Y is a Borel subset of R and f−1((−∞,α)) is in A for all α∈R.
From this result it is immediate that every continuous function from X to Y is also Borel measurable.
While the class of continuous functions has beautiful properties and is closed under uniform limits (see Example A.1.20), it is not closed under pointwise limits,[2] which makes it hard to work with in some instances. On the other hand, the set of Borel functions is closed under the taking of pointwise limits:
In fact, in our setting, the set of Borel measurable functions is precisely the smallest class of functions that contains the continuous functions and is closed under the taking of pointwise limits (see, e.g., §11.7 of Kechris (2012)).
It is also true that compositions of Borel measurable functions are also Borel measurable, and, when the functions are real-valued, that Borel measurability is preserved under algebraic operations. The next lemma gives one statement of these results:
See Çınlar (2011), Chapter 1, Section 2 for proofs.
A.3.1.3Parametric Continuity and Measurable Selections¶
We often wish to know whether or not continuity passes from primitives to solutions. For example, we might ask whether an equilibrium object, constructed through a process that involves optimization, varies continuously with parameters. The most commonly used theorem in this domain is Berge’s theorem of the maximum. Here we state a version of Berge’s theorem. Throughout, A and X are metric spaces.
A correspondence from X to A is a map Γ from X to the set of all subsets of A. A correspondence Γ from X to A is called nonempty if Γ(x) is nonempty for all x∈X. A function σ from X to A is called a measurable selection with respect to Γ if σ is Borel measurable and σ(x)∈Γ(x) for all x∈X.
A nonempty correspondence Γ is called
compact-valued if Γ(x) is compact for all x∈X,
lower hemi-continuous at x∈X if, for any y∈Γ(x) and any (xn) with xn→x, there exists a sequence (yn) with yn∈Γ(xn) for all n and yn→y,
upper hemi-continuous at x∈X if, for any sequence (xn) with xn→x and any sequence (yn) with yn∈Γ(xn) for all n, there exists a convergent subsequence of (yn) whose limit is in Γ(x), and
continuous on X if it is both lower and upper hemi-continuous at every x∈X.
Let Γ be a nonempty, compact-valued correspondence from X to A. Let q be a real valued function on G:={(x,a)∈X×A:a∈Γ(x)} and set
A proof of the continuity results in Theorem A.3.3 can be found in §17.5 of Aliprantis & Border (2006). Existence of a measurable selection is proved in §18.19 of the same reference.
Through the theory constructed above, we can identify broad classes of sets and functions that are relatively well behaved (e.g., Borel sets and Borel functions). This opens the way to analyzing how to (a) measure these sets and (b) integrate the functions. The first step is to introduce the notion of a measure, which is a map μ from a σ-algebra A to [0,∞] satisfying
μ(∅)=0 and
μ(∪n=1∞An)=∑n=1∞μ(An) whenever {An}⊂A is disjoint.
Here disjointness of {An} means that any two distinct sets in this sequence are disjoint.
Returning to the general case of a measure μ on measurable space (X,A), if there exists a sequence of sets (An)⊂A with μ(An)<∞ for all n and ∪nAn=X, then μ is called σ-finite. If μ(X)<∞, then μ is called finite. If μ(X)=1, then μ is called a probability measure.
If X is a metric space and A=B (the Borel sets), then μ is called a Borel measure. If A=B and μ(X)=1, then μ is called a Borel probability measure. For a Borel probability measure μ, the value μ(B) usually is interpreted as the probability that, when a random element of X is selected, that element is in B.
A measure space is a triple (X,A,μ) where (X,A) is a measurable space and μ is a measure on A. If μ(X)=1, then the measure space is also called a probability space. In this case it is common to write the measure space as (Ω,F,P). A random variable on probability space (Ω,F,P) is an (F,B)-measurable map X from Ω to R paired with its Borel sets B. More generally, given measurable space (E,E), an E-valued random element on probability space (Ω,F,P) is an (F,E)-measurable map X from Ω to E. The distribution of this random element X is the probability measure P defined by
Given measurable spaces (X,A) and (Y,B), the product σ-algebraA⊗B is the σ-algebra on X×Y generated by all sets of the form A×B with A∈A and B∈B. The triple (X×Y,A⊗B) is called the product measurable space.
If μ and ν are σ-finite measures on (X,A) and (Y,B) respectively, then there exists a unique measure μ⊗ν on A⊗B satisfying
Let (X,A) be a measurable space and let mA+ be the set of nonnegative real-valued Borel measurable functions on (X,A). We define an integral on mA+ to be a function I:mA+→[0,∞] such that
I(f)=0 when f=0 everywhere on X,
f1⩽f2⩽⋯ and limn→∞fn=f implies limn→∞I(fn)=I(f), and
α,β⩾0 and f,g∈mA+ implies I(αf+βg)=αI(f)+βI(g).
The limit in (ii) is a pointwise limit, so that limn→∞fn=f means limn→∞fn(x)=f(x) for every x∈X.
The following important result, proved in chapter 1 of Çınlar (2011), states that every measure on a measurable space creates a unique and well defined integral.
The value Iμ(f) is called the integral of f under μ and the following notation is common:
The integral Iλ introduced in Example A.3.7 is called the Lebesgue integral, and it extends the standard Riemann integral to a larger set of functions (the Borel measurable functions), while at the same time guaranteeing that the attractive properties (i)--(iii) in the definition of the integral will hold.
Equation (A.28) makes sense in this setting because if, say, f=1[a,b], then
where the first equality is by (A.28) and the second is by the fact that Lebesgue measure assigns length to intervals. The value b−a is also what we would expect for the integral, since it is the area under the curve for this simple function.[3]
If μ is a probability measure and w:X→R, then one often writes Ew(x) for the integral of w(x) with respect to μ. That is,
Given a measure space (X,A,μ), a property is said to hold μ-almost everywhere (or μ-almost surely when μ is a probability measure) if it holds on all of X except possibly a set of μ-measure zero. A sequence (fn) converges to fμ-almost everywhere if fn(x)→f(x) for μ-almost every x∈X.
The integral extends to functions that take negative values, as well as just the nonnegative functions in mA+. Indeed, if (X,A,μ) is a measure space and f∈mA is not necessarily nonnegative, then we can still decompose it into the difference between two nonnegative functions via f=f+−f−. Imposing linearity, we now set
The only risk here is that both terms on the right equal +∞, in which case the integral is not well defined. If both integrals are finite we call fintegrable with respect to μ.
In what follows, we leave (X,A,μ) fixed and write the integral Iμ(f) of f under μ as ∫fdμ. We note that every integral is increasing, in the sense that
To see this, observe that g−f is nonnegative (and measurable) and hence ∫(g−f)dμ is well defined and nonnegative. Now, using the linearity in part (iii) of Theorem A.3.5, we have
A battery of useful limit theorems exist for the integral we have defined. In our statements of these results, (X,A,μ) is any measure space and f and fn are (A,B)-measurable functions from X to R for all n∈N.
The first implication (i.e., (i) ⟹(A.36)) is called the monotone convergence theorem. The second is called the dominated convergence theorem.
Next we review prediction based on conditional expectations. Conditional expectations are themselves a cornerstone of economic theory and empirics, since they describe optimal forecasts based on limited information. Here we provide a brief treatment of the general setting that suffices for what follows.
Let Y and the elements of G:={X1,…,Xk} be scalar random variables. Consider the problem of predicting Y given G. That is, we wish to form a prediction of the value that Y will take once X1,…,Xk are known, without any additional information on the state of the world. Another way to say this is that we seek a (nonrandom) function f:Rk→R such that
To find such an f we must define what “good” means. The most common definition in the present context is that mean squared errorE[(Y^−Y)2] is small. Thus, we have a minimization problem in function space (the set from which f is chosen). Based on projection arguments, it can be shown that there exists an essentially unique f^ in the set of functions from Rk to R that solves
In the next proposition, a random variable Y is called G-measurable if there exists a function f such that Y=f(X1,…,Xk). Intuitively, Y is perfectly predictable given the data in G.
Property (vi) states that the linearity of expectations is preserved under conditioning. Property (ii) is called the law of iterated expectations, and is shared by all projections. Property (v) is sometimes called conditional determinism, since X can be treated like a constant when it is pinned down by the information set. A full proof of Proposition A.3.7 can be found in Çınlar (2011).
In this section we provide a brief introduction to martingales, one of the most important classes of stochastic processes, and a result on stopping times that’s needed for the theory of optimal stopping.
Let (Ω,F,P) be a probability space and let (Ft)t⩾0 be a sequence of σ-algebras with Ft⊂Ft+1⊂F for all t, called a filtration. A sequence of random variables (Mt)t⩾0 is called a martingale with respect to (Ft) if, for all t⩾0,
Mt is Ft-measurable,
E∣Mt∣<∞, and
E[Mt+1∣Ft]=Mt.
A stopping time with respect to (Ft) is a random variable τ taking values in {0,1,2,…}∪{∞} such that {τ⩽t}∈Ft for all t⩾0.
Next we present the optional stopping theorem for discrete-time martingales.
Using this, we establish a general result on exit times for bounded martingales. In the statement of the theorem, (Mt,Ft) is a discrete-time martingale with M0=m, and
We humans have natural geometric intuition about the space Rn when n=3. If this intuition can be expressed algebraically, then R3 results often extend to Rn for arbitrary n∈N -- and also to more general collections of objects, such as matrices, complex numbers and real-valued functions, provided that these collections are assigned some basic algebraic structure analogous to that enjoyed by vectors in R3.
Of course we need to formalize what “analogous” means by codifying the properties that we need the algebraic operations to satisfy. This leads to the concept of (abstract) vector space. In this section we recall the definition of such spaces and review key properties.
We begin with linear algebraic properties in abstract sets that generalize the idea of adding and scalar multiplying vectors in Rn. Then we discuss properties of subsets of and maps over these abstract “vector spaces.”
A vector space (also called a linear space) is a triple (E,+,⋅) where E is a nonempty set, + is a map from E×E to E called addition and ⋅ is a map from R×E to E called scalar multiplication, such that for all u,v,w∈E and α,β∈R,
u+(v+w)=(u+v)+w
u+v=v+u
there exists an element 0∈E, called the origin, s.t. u+0=u for all u∈E
for all u∈E, there exists a v∈E such that u+v=0
α⋅(β⋅u)=(α⋅β)⋅u
1⋅u=u
α⋅(u+v)=α⋅u+α⋅v
(α+β)⋅u=α⋅u+β⋅u
In practice, the ⋅ symbol is usually omitted, so αu:=α⋅u. In the present context, the values α,β,… are often called scalars. Also, the origin, which shares the symbol 0 with the zero element from R, is sometimes referred to as the additive identity.[4]
The vector space Rn is a special case of Example A.4.2, obtained when X=[n].
Given vector space E, set C⊂E is called convex if u,v∈C and α∈[0,1] implies αu+(1−α)v∈C. In other words, C is closed under the taking of convex combinations.
When E is any vector space, a nonempty subset C of E is called a cone in E if
C is convex,
x∈C and −x∈C implies x=0 and
αx∈C whenever x∈C and α⩾0.
(Some authors refer to C as a “pointed convex cone.”)
A linear combination of vectors u1,…,uk in E is a vector of the form α1u1+⋯+αkuk where α1,…,αk are scalars. A set S⊂E is called linearly independent if, for any finite set {u1,…,uk}⊂S, we have
A basis of a linear subspace S of E is a linearly independent subset B of S that spans S (i.e., each u∈S can be expressed as a finite linear combination of elements of B).
A proof can be found in Jänich (1994). In case (ii), we say that E is n-dimensional. E is called finite-dimensional if E is n-dimensional for some n∈N. In case (iii), we call Einfinite-dimensional.
Given vector space E, a map ∥⋅∥:E→R is called a norm on E if, for any α∈R and any u,v∈E,
2
∥u∥⩾0
∥u∥=0⟺u=0
∥αu∥=∣α∣∥u∥ and
∥u+v∥⩽∥u∥+∥v∥
(nonnegativity)
(positive definiteness)
(positive homogeneity)
(triangle inequality)
The pair (E,∥⋅∥)=((E,+,⋅),∥⋅∥) is called a normed vector space (or normed linear space). When ∥⋅∥ is understood we refer to the space using the symbol E.
Consider a normed vector space (E,∥⋅∥) with origin 0. Recalling the definition of boundedness from metric spaces, one can show that a subset S of E is bounded if and only if there exists an M∈N such that ∥u∥⩽M for all u∈S.
If (E,∥⋅∥) is a normed linear space, then (E,d) is a metric space when d(u,v):=∥u−v∥. All metric concepts extend to (E,∥⋅∥). For example,
G⊂E is said to be open in E if G is open in (E,d).
A sequence (vn) in E is said to converge to v∈E if d(vn,v)→0.
Let E be a vector space and let ∥⋅∥ and ∥⋅∥′ be two norm on E. These norms are said to be equivalent if there exist finite positive constants A,B such that ∥x∥⩽A∥x∥′ and ∥x∥′⩽B∥x∥ for all x∈E. The following result is fundamental. See, for example, Aliprantis & Burkinshaw (1998), Theorem 27.6.
Completeness is essential to many important theorems in applied analysis. Fortunately, the completeness of R is inherited by many useful spaces. For example,
A proof can be found in, e.g., Aliprantis & Burkinshaw (1998), Theorem 27.6.
A complete normed vector space is called a Banach space. There are many other important Banach spaces, beyond the finite-dimensional ones.
Let (E,∥⋅∥) be a normed vector space. All equivalent metrics induce the same precompact sets and the same bounded sets in E. Since all norms on a finite-dimensional space are equivalent (Theorem A.4.3), any metric induced by a norm on a finite dimensional vector space has the property that its precompact and bounded sets coincide (cf., Theorem A.1.6). The next theorem states this fact for the record.
In line with our discussion in Section A.1.3.4, this one-to-one pairing between closed bounded sets and compact sets breaks down in infinite dimensional spaces. In fact, the closed unit ball of a normed vector space E is compact if and only if E is finite-dimensional.
Let μ be a σ-finite measure on measurable space (X,A) and let p⩾1. The space Lp(X,A,μ) consists of all Borel measurable functions f:X→R with ∫∣f∣pdμ finite. Functions that agree μ-almost everywhere are identified. The functional ∥f∥p:=(∫∣f∣pdμ)1/p is a norm on Lp(X,A,μ).
Scheffés identity provides a useful quantitative interpretation of d1 distance between densities: For any densities f and g on (X,A,μ), we have
(Here ∥u∥ is the norm of u in E and ∥Au∥ is the norm of Au in F.) When ∥A∥ is finite, A is called a bounded linear operator. The set of all bounded linear operators from E to F will be denoted B(E,F). If E=F then we write B(E). Every A∈B(E,F) is continuous, since, for un→u in E we have
The converse is also true: every continuous linear operator from E to F is bounded -- see §2.7 of Kreyszig (1978) for a proof of this fact, as well as Theorem A.4.8 below.
As suggested by the name, the operator norm is a norm on B(E,F). The details are left as an exercise.
The operator norm is submultiplicative: If A,B∈B(E), then ∥AB∥:=∥A∘B∥⩽∥A∥⋅∥B∥. Iteratively applying the submultiplicative property gives ∥Ai∥⩽∥A∥i for any i∈N and A∈B(E), where Ai is the i-th composition of A with itself.
Once we have a norm on B(E,F), we have an induced metric given by d(A,B)=∥A−B∥, and B(E,F) will be a Banach space whenever this metric is complete.
Let E be a Banach space and let A be an element of B(E). A complex scalar λ is called an eigenvalue of A∈B(E) if there exists a nonzero vector e such that Ae=λe. The spectrum of A, typically denoted σ(A), is the set of all scalar λ such that λI−A fails to be bijective on E. Any eigenvalue λ lies in σ(A) because if Ae=λe for some nonzero e, then λI−A maps e to 0, while also mapping 0 to 0. Hence λI−A is not bijective. For A∈B(E), the spectral radius of A is defined as
In this section we study order completeness and order continuity, ordered vector spaces and Riesz spaces, the interplay between topology and order including Banach lattices and weighted sup-norm spaces, Markov models, and orders over distributions.
When studying the real line R, we can define completeness either as existence of limits for Cauchy sequences (Theorem A.1.2) or as existence of suprema for bounded above sets (Theorem A.1.1). The first idea can be extended to metric spaces by generalizing the concept of Cauchy sequences (see Section A.1.3.5). The second can be extended to posets by analogy with existence of suprema. The aim of this section is to describe this second concept of completeness.
When we discuss posets, there are multiple notions of completeness, with each one determined by the classes of sets that are required to have suprema. Specifically, a nonempty poset V is called
a lattice if every finite subset of V has both a supremum and an infimum in V,
chain complete if every chain in V has a supremum and an infimum in V, and
countably chain complete if every at most countable chain in V has a supremum and an infimum in V.
Note that chain completeness implies countable chain completeness but not conversely, and that neither concept implies nor is implied by the lattice property.
In the definitions above, finite sets are understood to be nonempty (i.e., in one-to-one correspondence with {1,…,n} for some n∈N). Also, “at most countable” means empty, finite or countable. The fact that the empty set is included has significance, as the next lemma illustrates.
The following result is a version of the Knaster--Tarski fixed point theorem for chain complete posets. In the statement, V is a nonempty poset.
A sublattice of a lattice V is a subset S of V with the property that u∨v and u∧v are in S whenever u,v∈S.
Consider the canonical partially ordered set (Rk,⩽). This set is not countably chain complete: for example, letting 1 be a vector of ones, the increasing sequence (vn)=(n1) has no supremum. At the same time, (Rk,⩽) certainly has some completeness properties. For example, it follows easily from Exercise A.4 that every bounded above subset of Rk has a supremum, and every bounded below subset of Rk has an infimum. This motivates the following definitions:
A partially ordered set V is called Dedekind complete if, for any nonempty A⊂V,
A is bounded above ⟹A has a supremum in V and
A is bounded below ⟹A has an infimum in V.
V is called countably Dedekind complete if, for any nonempty finite or countable A⊂V,
A is bounded above ⟹A has a supremum in V and
A is bounded below ⟹A has an infimum in V.
There are natural connections between Dedekind (resp., countable Dedekind) completeness and chain (resp. countable chain) completeness. Here is one simple result.
Next we add algebraic structure to posets. The combination of algebraic operations and order will allow us to develop sharp sufficient conditions for dynamic programs and convergence of algorithms.
Let E=(E,+,⋅) be a vector space with origin 0 (see Section A.4.1) and let ⩽ be a partial order on E. We call (E,⩽) an ordered vector space if the order is preserved under addition and nonnegative scalar multiplication; that is, if
u⩽v implies u+b⩽v+b for any b∈E, and
u⩽v and α∈R with 0⩽α implies αu⩽αv.
The positive cone of E, typically denoted by E+, is all v∈E with 0⩽v.
If (E,⩽) is an ordered vector space and u,v,w∈E, then
u⩽0 and v⩽0 implies u+v⩽0,
u⩽v implies −v⩽−u,
(u∨v)+w=(u+w)∨(v+w), and
α(u∨v)=(αu)∨(αv) whenever α⩾0.
These facts follow directly from the definitions.
Using the definition in Section A.1.2.4, if (vn) is a sequence in ordered vector space E and v∈E, then the statement vn↑v means that (vn) is increasing and ⋁nvn=v.
In some settings, a partial order is introduced into a vector space E by first choosing a (pointed convex) cone C on E (see Section A.4.1.2) and stating that u⩽v if and only if v−u∈C. The following discussion clarifies this idea.
A linear operator T mapping ordered vector space E to itself is called positive if T is invariant on the positive cone; that is, if u∈E and u⩾0 implies Tu⩾0.
(In the canonical example given above, positive operators are identified with nonnegative matrices. Unfortunately, this notational inconsistency is deeply embedded in the existing literature so we must accept it.)
Let E be an ordered vector space and let A:E→E be a linear operator. Recalling the definition in Section A.5.1.3, A is order continuous on E when (vn)⊂E and vn↑v∈E implies Avn↑Av. By Lemma A.5.5, every order continuous linear operator is order preserving -- and hence positive. The next exercise can be completed using Lemma A.5.9.
A self-map S on a convex subset C of ordered vector space E:=(E,⩽) is called convex on C if
S(λv+(1−λ)v′)⩽λSv+(1−λ)Sv′ whenever v⩽v′∈C and 0⩽λ⩽1
Next we introduce Riesz spaces, which are ordered vector spaces with lattice structure. This structure allows for the introduction of a notion of absolute value, which behaves similarly to the pointwise absolute value over vectors in Rn. Absolute value in turn helps us clarify and quantify the actions of operators, providing new opportunities for establishing optimality conditions in dynamic programs.
An ordered vector space E is called a Riesz space if E is a lattice. With ∨ and ∧ as the lattice operations and u,v,w∈E, the following properties always hold:
u∧v=−((−u)∨(−v)) and u∨v=−((−u)∧(−v)).
(u∧v)+w=(u+w)∧(v+w) and (u∨v)+w=(u+w)∨(v+w).
These facts can be easily verified and other related results are found in Chapter 2 of Zaanen (2012).
For element u of any Riesz space (E,⩽) we use the notation
Let (X,A,μ) be as in the previous section and fix p∈[1,∞). Let Lp:=Lp(X,A,μ) be the Banach space of equivalence classes defined in Section A.4.2.4. Let ⩽ be defined by f⩽g if and only if {x∈X:f(x)>g(x)} has μ-measure zero.
The space (Lp,⩽) just described is a Riesz space. For example, if f,g∈Lp, then ∣f∨g∣⩽∣f∣+∣g∣, and ∫∣f∣dμ and ∫∣g∣dμ are both finite. Hence f∨g∈Lp.
Since each Riesz space is a partially ordered space, the notions of Dedekind and countable Dedekind completeness apply directly. Moreover, when testing these forms of completeness, one-sided conditions suffice. The following one-sided condition is particularly simple.
For a proof of Lemma A.5.14, see Theorem 12.1 of Zaanen (2012).
We will make repeated use of the following fact.
A proof of Lemma A.5.15 can be found in Example 12.5 of Zaanen (2012).
Several interesting function spaces are naturally ordered by the pointwise partial order. Next we study the completeness properties of such Riesz spaces. We will make use of the following lemma, in the statement of which, for (vn)⊂RX, the symbol supnvn indicates the pointwise supremum.
Now let (X,A,μ) be a σ-finite measure space and let mX and bX be the Riesz spaces discussed in Section A.5.2.4. As above, let ⩽ be the pointwise partial order.
In some applications it will be helpful to draw on results that use topological or metric structure. In this section we note some elementary facts about topological, metric and normed spaces where order is also present.
A partially ordered space, also called a pospace, is a Hausdorff topological space endowed with a closed partial order. (We make the Hausdorff assumption so that sequences have unique limits.)
The next lemma connects topological and order convergence in partially ordered space V=(V,⪯). In the statement, (vα)α∈Λ is a net in V.
The next lemma shows how global stability (see Section A.2.2.1) interacts with order stability in the setting of partially ordered space.
The following result can be used to compare fixed points of operators. In the statement, V=(V,⪯) is a pospace and S(V) is all self-maps on V, ordered pointwise (i.e., for S,T∈S(V), we have S⪯T if and only if Sv⪯Tv for all v∈V).
A partially ordered metric space is a tuple (V,⪯,d) where (V,⪯) is a poset, d is a metric on V, and (V,⪯) is a pospace under the topology induced by d on V. In particular, ⪯ is closed with respect to d, so that
d(vn,v)→0 and d(un,u)→0 with un⪯vn for all n implies u⪯v.
Sup-nonexpansive metrics will be useful for us because contraction properties are passed from collections of mappings to their upper envelopes. The next lemma explains. In the statement,
(V,⪯,d) is a partially ordered metric space,
T:={Tσ:σ∈Σ} is a collection of self-maps on V,
V0 is a subset of V and Tv:=∨σTσv exists at each v∈V0.
Let E=(E,⩽) be a Riesz space and let ∥⋅∥ be a complete norm on E, so that (E,∥⋅∥) is a Banach space. If the norm is compatible with the order structure on E, in the sense that ∥u∥⩽∥v∥ whenever ∣u∣⩽∣v∣, then ∥⋅∥ is called a lattice norm and E:=(E,⩽,∥⋅∥) is called a Banach lattice.
If (vn) is a sequence in E then convergence is as defined for sequences in normed linear space (see Section A.4.2.1): vn→v means that $| v_n
v | \to 0asn \to \infty.Thisshouldnotbeconfusedwithv_n \uparrow
vandv_n \downarrow v$, which are defined in terms of suprema and infima (see Section A.5.1.3). Some relationships between the different forms of convergence are discussed in the next theorem.
Let E be a Banach lattice. An element e∈E+ will be called a normalized order unit if ∥e∥=1 and ∣u∣⩽∥u∥e for all u∈E. For example, if E=bX, then e=1 is a normalized order unit.[5]
In this section we introduce a class of Banach lattices that are useful for handing unbounded dynamic programming problems. To this end, let X be a topological space. A weight function on X is a mapping ℓ∈mX with ℓ(x)⩾1 for all x∈X. Given a weight function ℓ and v∈RX we introduce the ℓ-weighted supremum norm
If E is a Banach lattice, then, as in Section A.4.3, we take B(E) to be the norm bounded (and hence norm continuous) linear self-maps on E. Let B+(E) be the positive linear self-maps on E.
Continuing in the setting of Exercise A.50, the pointwise partial order on B(E) is defined by A⩽B whenever Av⩽Bv for all v∈E. The set B+(E) coincides with the positive cone of B(E). On this positive cone, the spectral radius is order preserving:
A Banach lattice E is said to have a σ-order continuous norm if
Many dynamic programs have some form of Markov structure (or can be coerced into a Markov framework by suitably changing the state space). Here we review key ideas related to Markov processes and state some useful results.
In all of this section (Section A.5.4), X is a metric space with Borel sets B. The symbol D(X) is the set of all distributions (Borel probability measures) on X. If X is finite, then the metric on X is the discrete metric (under which all real-valued functions on X are continuous and B is the set of all subsets of X).
Let U be a second metric space. A transition kernel from U to X is a function N from U×B to R+ with the property that u↦N(u,B) is Borel measurable for each B∈B and B↦N(u,B) is a measure on (X,B) for all u∈U. A stochastic kernel from U to X is a transition kernel P from U to X satisfying P(u,X)=1 for all u∈U. Informally, the stochastic kernel P takes a point u∈U and randomly “transitions” to a new point in X via the distribution P(u,dx).
A common setting is where U=X. In this case we say that N is a transition kernel on X, while P is a stochastic kernel on X.
An X-valued stochastic process (Xt)t=0∞ on (Ω,F,P) is called (P,ψ)-Markov if X0=dψ and P{Xt+1∈B∣Xt}=P(Xt,dx′) with probability one for all t⩾0. If ψ=δx for some x∈X, then we say (Xt)t⩾0 is (P,x)-Markov. We also say that (Xt)t⩾0 is P-Markov if (Xt)t⩾0 is (P,ψ)-Markov for some ψ∈D(X).
Given any stochastic kernel P on X and any initial condition x∈X, a (P,x)-Markov process always exists. In particular, we can take the canonical construction: set Ω=X∞, let Xt(ω)=ωt be the coordinate projections, and let Px be the unique probability measure on X∞ (equipped with the product σ-algebra) such that X0=xPx-a.s. and Px{Xt+1∈B∣X0,…,Xt}=P(Xt,B). Existence of Px follows from the Ionescu-Tulcea theorem (see, e.g., Meyn & Tweedie (2009), Chapter 3).
Let θ:X∞→X∞ denote the shift operator defined by θ(x0,x1,…)=(x1,x2,…). The following is the Markov property in its general form: for the canonical chain, if f:X∞→R is measurable and either nonnegative or bounded, then
whenever the integral is well-defined. We call P the Markov operator generated by the stochastic kernel P. We use the same symbol because stochastic kernels and Markov operators can be placed in one-to-one correspondence via
(The stochastic kernel is on the left and the Markov operator is on the right.)
(We already studied a version of P in Example A.4.6 on page .) Intuitively, (Ph)(x) represents the expectation of h(Xt+1) given Xt=x. We extend this interpretation below.
Given a stochastic kernel P on X, a distribution ϕ∈D is called stationary for P if
The following lemma lists useful properties of the Markov operator ((A.95)) when considered as a linear operator on bX, the set of bounded Borel measurable functions on X. Proofs can be found in Meyn & Tweedie (2009), Chapter 3.
(Note that (vi) follows from (v) and Gelfand’s formula for the spectral radius.)
Here is a fundamental result linking the stochastic kernel P, Markov operator P, and any P-Markov process (Xt)t⩾0. For a proof see Meyn & Tweedie (2009), Proposition 3.4.2.
Sometimes we wish to consider Markov operators as linear operators over a space of integrable functions. To this end, let P be a stochastic kernel on X and let ϕ be stationary for P. As before, we use the same symbol P for the Markov operator defined in (A.95). The space L1(ϕ):=L1(X,B,ϕ) is the Banach lattice discussed in Example A.5.8.
The proof of part (i) follows from the following adjoint rule:
To obtain (i) of Lemma A.5.32 we can use ∫∣Ph∣dψ⩽∫P∣h∣dψ and the apply (A.102) with h replaced by ∣h∣. This proves that Ph is ψ-integrable whenever h is ψ-integrable. Linearity of P is immediate. Part (ii) follows from order preservation of the integral. For (iii), ∥P∥⩽1 follows from the bound on ∫∣Ph∣dψ just obtained, and ∥P∥⩾1 from P1=1. Part (iv) follows from (iii) and Gelfand’s formula.
Distributions are objects that decision makers naturally have preferences over. For example, speculators care about probability distributions over returns of prospective investments, often preferring distributions that offer high average returns with low risk. A planner might have preferences over the cross-sectional distributions of consumption and wealth. In this section, we discuss common methods for ordering distributions and their relationships with each other.
Let X be a metric space and let D(X) be the set of all distributions (i.e., Borel probability measures) on X. Let ibX be the increasing bounded real-valued functions on X. For μ and ν in D(X), we say that
νfirst order stochastically dominatesμ and write μ⪯Fν if
If we refer to stochastic dominance without explicitly stating the order, then the understanding is that we mean first order stochastic dominance.
Suppose now that X is a Borel subset of R and fix F,G∈D(X). We understand F and G as cumulative distribution functions. When testing first order stochastic dominance, it is sufficient to restrict attention to increasing functions u∈bX that take the form u(x)=1{a<x} for some a∈X (see, e.g., Stachurski (2022), §9.4.1). Recalling the interpretation of the integral given in (A.28), this leads to the statement that F⪯FG if and only if 1−F(a)⩽1−G(a) for all a∈X, or
Here is a property that implies first order stochastic dominance: Consider a pair of distributions (F,G) with positive densities f and g on an interval I contained in R. We say that (f,g) has a monotone likelihood ratio if f/g is increasing on I; that is, if
We will be concerned with analyzing how behavior changes when decisions become “riskier” in some sense. To analyze such scenarios, we introduce the notion of a mean-preserving spread. In particular, for a given distribution ϕ, we say that ψ is a mean-preserving spread of ϕ if there exists a pair of random variables (Y,Z) such that
Good introductions to real analysis include Bartle & Sherbert (2011) and Aliprantis & Burkinshaw (1998). For topology, measure theory, and functional analysis at an advanced level, Aliprantis & Border (2006) provides comprehensive coverage, while Dudley (2002) and Kreyszig (1978) offer accessible treatments. Fixed point theory in metric spaces is developed in Goebel & Kirk (1990), and nonlinear optimization is covered in Jahn (2020).
For lattice theory and order, Davey & Priestley (2002) provides a thorough introduction. High quality monographs on Riesz spaces, Banach lattices and positive operators include Aliprantis & Border (2006), Aliprantis & Burkinshaw (2006), Zaanen (2012), Meyer-Nieberg (2012), and Bátkai et al. (2017).
For further reading on probability theory and stochastic processes, Pollard (2002), Çınlar (2011) and Dudley (2002) are outstanding. For Markov chains and stochastic stability, the standard reference is Meyn & Tweedie (2009).
More precisely, σ(C) is the intersection of all σ-algebras on X that contain C. One can show that σ(C) is always a well defined σ-algebra, since the intersection is nonempty (it at least contains ℘(X)) and any intersection of σ-algebras is again a σ-algebra.
A more general perspective on (A.28) that you might find useful is as follows. Suppose we identify measurable sets with their indicator functions. Then μ already provides us with an “integral” over the indicators in mA+. The map Iμ extends the reach of this function to all of mA+.
Some authors would call what we have described as a real vector space, which can then be extended to the notion of complex vector spaces. We have no need for this extension here, so we drop the adjective “real.”
Not all Banach lattices have normalized order units. In fact it can be shown that a Banach lattice E admits a normalized order unit if and only if E is an AM-space with unit. We omit these details.