Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

A Mathematical Background

This chapter collects the mathematical tools employed throughout the book. It is intended as a reference; readers should consult the relevant sections as needed. At minimum, all readers should be familiar with the material in Section A.1 before starting Chapter 2.

A.1Foundations

In this first section of the chapter, we cover several foundational ideas in analysis, including metrics and partial orders.

A.1.1Properties of the Real Line

Let’s start with the real line. This set has a natural metric and a natural order, both of which have many important properties. (Later we will investigate conditions under which such properties extend to more general spaces.)

A.1.1.1Min, Max, Sup and Inf

A point uRu \in \RR is called an upper bound of a set ARA \subset \RR if aua \leq u for all aAa \in A. Let U(A)U(A) be the set of upper bounds of AA. If sU(A)s \in U(A) and sus \leq u for all uU(A)u \in U(A), then ss is called the supremum of AA and we write s=supAs = \sup A. At most one such supremum ss exists. If ss is in U(A)U(A) then the following are equivalent:

  1. s=supAs = \sup A

  2. for all ϵ>0\epsilon > 0, there exists a point aAa \in A with a>sϵa > s - \epsilon

Theorem A.1.1 is essentially axiomatic. It is equivalent to the “completeness” property of R\RR that we discuss in Section A.1.1.2.

For ARA \subset \RR, a lower bound of AA is any number \ell such that a\ell \leq a for all aAa \in A. If iRi \in \RR is a lower bound for AA and also satisfies ii \geq \ell for every lower bound \ell of AA, then ii is called the infimum of AA and we write i=infAi = \inf A. At most one such ii exists, and every nonempty subset of R\RR bounded from below has an infimum.

We adopt the following conventions:

A number mm contained in a subset AA of R\RR is called the maximum of AA and we write m=maxAm = \max A if ama \leq m for every aAa \in A. It is called the minimum of AA if ama \geq m for every aAa \in A.

Given an arbitrary set DD and fRDf \in \RR^D, we set

supxDf(x)sup{f(x):xD}andmaxxDf(x)max{f(x):xD}.\sup_{x \in D} f(x) \coloneq \sup \setntn{f(x)}{x \in D} \quad \text{and} \quad \max_{x \in D} f(x) \coloneq \max \setntn{f(x)}{x \in D}.

A point xDx^* \in D is called a

Equivalently, xDx^* \in D is a maximizer of ff on DD if f(x)=maxxDf(x)f(x^*) = \max_{x \in D} f(x), and a minimizer if f(x)=minxDf(x)f(x^*) = \min_{x \in D} f(x). We define

argmaxxDf(x){xD:f(x)f(x) for all xD}.\argmax_{x \in D} f(x) \coloneq \setntn{x^* \in D}{f(x^*) \geq f(x) \text{ for all } x \in D}.

The set argminxDf(x)\argmin_{x \in D} f(x) is defined analogously.

As usual, given a,bRa, b \in \RR we write aba \wedge b for min{a,b}\min\{a, b\} and aba \vee b for max{a,b}\max\{a, b\}. Regarding the order structure of R\RR, the following relationships are sometimes helpful: Given x,yRx, y \in \RR and aR+a \in \RR_+,

  1. x+y=xy+xyx + y = x \vee y + x \wedge y

  2. xy=xyxy|x - y| = x \vee y - x \wedge y

  3. xy=x+y2(xy)|x - y| = x + y - 2 (x \wedge y)

  4. xy=2(xy)xy|x - y| = 2 ( x \vee y) -x -y

  5. a(xy)=(ax)(ay)a(x \vee y) = (ax ) \vee (ay)

  6. a(xy)=(ax)(ay)a(x \wedge y) = (ax ) \wedge (ay)

A.1.1.2Completeness of the Real Line

Recall that a sequence (xn)R(x_n) \subset \RR is called Cauchy if, for all ϵ>0\epsilon > 0, there exists an NNN \in \NN with xnxm<ϵ|x_n - x_m| < \epsilon whenever n,mNn, m \geq N.

The statement that every Cauchy sequence in R\RR converges should be understood as axiomatic. It states that, once the irrational numbers are mixed in with the rational numbers, there are “no more gaps” in the real line. This is called the completeness property of R\RR. See, e.g., Bartle & Sherbert (2011).

A.1.2Partial Orders

Partially ordered spaces are the natural habitat for dynamic programs, In this section we introduce the key ideas needed for the book.

A.1.2.1Partially Ordered Sets

The pair (V,)(V, \preceq) is called a partially ordered set -- or poset -- if VV is any nonempty set and \preceq is a relation \preceq on V×VV \times V such that, for any u,v,wu, v, w in VV,

2

  1. uuu \preceq u

  2. uvu \preceq v and vuv \preceq u implies u=vu = v and

  3. uvu \preceq v and vwv \preceq w implies uwu \preceq w

  4. (reflexivity)

  5. (antisymmetry)

  6. (transitivity)

The relation \preceq is called a partial order on VV. We often write VV instead of (V,)(V, \preceq) when \preceq is understood. We sometimes say that ww dominates vv when vwv \preceq w.

A subset CC of a poset (V,)(V, \preceq) is called a chain in VV if either uvu \preceq v or vuv \preceq u for all u,vCu, v \in C. A poset (V,)(V, \preceq) is called totally ordered if VV itself is a chain.

In applications, one of the most important notions of partial order is the pointwise partial order. To define it we let U,VU, V be nonempty sets with VV partially ordered by \preceq. Let VUV^U be a set of maps from UU to VV. For each f,gVUf, g \in V^U, we set

fg    f(u)g(u) for all uU.f \preceq g \quad \iff \quad f(u) \preceq g(u) \text{ for all } u \in U.

Then \preceq is a partial order on VUV^U, usually called the pointwise order on VUV^U.

One very common setting is where VRXV \subset \RR^\Xsf for some nonempty set X\Xsf. In this setting, we always write the pointwise partial order as \leq. In particular, for arbitrary u,vRXu, v \in \RR^\Xsf we write uvu \leq v if and only if u(x)v(x)u(x) \leq v(x) for all xXx \in \Xsf. The partial order in Example A.1.3 is a special case, when X={1,,n}\Xsf = \{1, \ldots, n\}.

A.1.2.2Bounds

Let VV be a poset. IVI \subset V is called an order interval in VV if there exists an a,ba, b in VV with aba \preceq b such that

I={vV:avb}.I = \setntn{v \in V}{a \preceq v \preceq b} .

In this case we also write I=[a,b]I = [a,b].

Given a poset VV and a subset AA of VV, we call

A subset AA of poset VV is called bounded above (resp., bounded below) if the set of upper bounds (resp., lower bounds) of AA is nonempty (i.e., there exists at least one vVv \in V with ava \preceq v for all aAa \in A). AA is called order bounded in VV if AA is both bounded above and bounded below. Obviously, AA is order bounded in VV if and only if there exists an order interval IVI \subset V such that AIA \subset I.

A.1.2.3Greatest and Least Elements

Given poset VV and AVA \subset V, we say that

In other words, a greatest element of AA is an upper bound of AA that is also contained in AA, while a least element of AA is a lower bound of AA also contained in AA.

Not all subsets of partially ordered sets have greatest elements. For one example, observe that N(R,)\NN \subset (\RR, \leq) has no greatest element. In this case the set of upper bounds is empty, so finding a greatest element is impossible. We can also have situations where the set of upper bounds is nonempty but no greatest element exists.

The unit circle in \RR^2 has no greatest element

Figure A.1:The unit circle in R2\RR^2 has no greatest element

If a poset VV has a greatest element, that element is sometimes called the top of VV. A least element of VV is sometimes called the bottom of VV.

A.1.2.4Suprema and Infima

Let AA be a subset of poset VV and let U(A)U(A) be the set of all upper bounds of AA in VV. We call sVs \in V the supremum of AA if ss is a least element of U(A)U(A). Since least elements are unique (Exercise A.2), subsets of VV can have at most one supremum. When it exists, the supremum of AA is denoted by A\bigvee A. Also,

While every ARA \subset \RR that is bounded above has a supremum (Theorem A.1.1), the same is not true for arbitrary posets.

Figure A.2 provides a visualization of uvu \vee v and uvu \wedge v when V=(R2,)V = (\RR^2, \leq). Figure A.3 provides a visualization of fgf \vee g and fgf \wedge g when V=(RX,)V = (\RR^\Xsf, \leq) for some subset X\Xsf of the reals. In both cases, \leq is the pointwise partial order.

The points u \vee v and u \wedge v in \RR^2

Figure A.2:The points uvu \vee v and uvu \wedge v in R2\RR^2

Functions f \vee g and f \wedge g when defined on a subset of \RR

Figure A.3:Functions fgf \vee g and fgf \wedge g when defined on a subset of R\RR

Given AA contained in poset VV, an element of VV is called the infimum of AA if it is a greatest element of the set of lower bounds of AA. The infimum of AA is typically denoted A\bigwedge A. If VRV \subset \RR with the usual order \leq, then we also use the notation infA\inf A. Also,

For the poset (R,)(\RR, \leq), every xRx \in \RR is an upper bound of the empty set \varnothing, since, vacuously, xx dominates all elements of \varnothing. Since R\RR has no least element, the set R\varnothing \subset \RR has no supremum in R\RR. By related reasoning, if we restrict attention to ([0,1],)([0,1], \leq), we see that 0 is the supremum of [0,1]\varnothing \subset [0,1]. The next exercise extends this line of reasoning.

A.1.2.5Order Duals

Given partially ordered set VV, let V=(V,)V^\partial = (V, \preceq^\partial) be the order dual (also called the dual), so that, for u,vVu, v \in V, we have uvu \preceq^\partial v if and only if vuv \preceq u. We use A\bigvee^\partial A to denote the supremum of AVA \subset V^\partial in VV^\partial and \bigwedge^\partial for the infimum.

A.1.2.6Monotone Sequences

Let VV be any poset. A sequence (vn)n1(v_n)_{n \geq 1} in VV is called increasing if vnvn+1v_n \preceq v_{n+1} for all nNn \in \NN, and decreasing if vn+1vnv_{n+1} \preceq v_n for all nNn \in \NN. We write

These symbols generalize standard notation for convergence of monotone sequences in R\RR. For example, if (un)(u_n) is increasing in R\RR and its limit is uu, then one writes unuu_n \uparrow u. This is a special case of the usage above, since uu is also the supremum of (un)(u_n) under the standard order on R\RR.

In some settings, the order theoretic concepts \uparrow and \downarrow have simple pointwise characterizations. The next lemma gives one such characterization for \uparrow (and an analogous result holds for \downarrow). In the statement of the lemma, VRXV \subset \RR^\Xsf for some nonempty X\Xsf and \leq is the pointwise partial order. Also, we say that VV is closed under pointwise suprema if, for every increasing (vn)V(v_n) \subset V that is bounded above, the pointwise supremum s(x)=supnvn(x)s(x) = \sup_n v_n(x) is an element of VV.

A.1.2.7Order Preserving Maps

A self-map SS from poset V=(V,)V = (V, \preceq) to poset U=(U,)U = (U, \trianglelefteq) is called

In the definition of order preserving above, one common setting is when U=RU = \RR with its standard order. In this case, the mapping SS is often called increasing. We will also use this terminology. The result in the next exercise uses the fact that the standard order on R\RR is closed (i.e., preserved under limits).

A.1.2.8Strict Monotonicity

Now let’s examine a form of strict monotonicity. We consider posets V=(V,)V = (V, \preceq) and W=(W,)W = (W, \trianglelefteq). For u,vVu, v \in V, we write uvu \prec v if uvu \preceq v and not u=vu = v. For x,yWx, y \in W, we write xyx \vartriangleleft y if xyx \trianglelefteq y and not x=yx = y. We call a map SS from VV to WW strictly order preserving if vwv \prec w implies SvSwS v \vartriangleleft S w. In the example below, \leq is the pointwise partial order and u<vu<v means uvu\leq v and not u=vu=v.

A.1.2.9Order Isomorphisms

A surjective map FF from poset (V,)(V,\preceq) to poset (V^,)(\hat V, \trianglelefteq) is called an

(Surjective means that FF maps VV onto V^\hat V, so each v^V^\hat v \in \hat V has a preimage.) When such an order isomorphism (resp., anti-isomorphism) exists, we say that VV and V^\hat V are isomorphic (resp., anti-isomorphic).

In the next two exercises, X\Xsf is any nonempty set and all spaces of real-valued functions have the pointwise partial order. Scalar actions on functions are applied pointwise; for example, given hRXh \in \RR^\Xsf, the function exph\exp h maps xx to exp(h(x))\exp(h(x)).

Visualization of  with \Xsf = [0,1]

Figure A.4:Visualization of Exercise A.10 with X=[0,1]\Xsf = [0,1]

Figure A.4 provides a visualization of the result in Exercise A.10 when X=[0,1]\Xsf = [0,1]. The two functions are h(x)=x21/2h(x) = x^2 - 1/2, and h(x)=xh'(x) = x, so that hhh \leq h'. The middle panel sets θ=2\theta = 2, which preserves the order. The right panel sets θ=2\theta =-2, which reverses order.

The next exercise generalizes Exercise A.10.

In the following exercises, (V,)(V, \preceq) and (V^,)(\hat V, \trianglelefteq) are arbitrary posets.

The next exercise is related to Lemma A.1.4.

A.1.2.10Order Stability

In Section 1.3.2.2 we discussed the fact that contractivity of the Bellman operator plays a significant role in the proof of Bellman-type optimality results for the optimal savings problem. Contractivity is a metric property that has no immediate counterpart in an abstract partially ordered set. This motivates us to introduce weaker conditions on operators that are well defined in any poset and, at the same time, strong enough to generate useful optimality results. This section gives details.

Let VV be a poset and let SS be a self-map on VV. In this setting, we call SS order stable if

  1. SS has a unique fixed point vˉ\bar v in VV,

  2. vVv \in V with vSvv \preceq S \, v implies vvˉv \preceq \bar v, and

  3. vVv \in V with SvvS \, v \preceq v implies vˉv\bar v \preceq v.

Conditions (ii) and (iii) say that points mapped up by SS lie below the fixed point, while points mapped down lie above it.

We strengthen this notion by adding monotone convergence to the fixed point. We call SS strongly order stable if SS is order stable and, in addition,

  1. vVv \in V with vSvv \preceq S \, v implies SnvvˉS^n v \uparrow \bar v, and

  2. vVv \in V with SvvS \, v \preceq v implies SnvvˉS^n v \downarrow \bar v.

Figure A.5 gives an illustration of a strongly order stable map SS on V=[0,1]V = [0,1]. All points mapped up by SS lie below and converge up to its unique fixed point. All points mapped down by SS lie above and converge down to its fixed point.

A strongly order stable map S on [0,1]

Figure A.5:A strongly order stable map SS on [0,1][0,1]

Most results in this book require only order stability. Strong order stability is invoked in a small number of places where monotone convergence of the iterates is needed.

The following result is useful when we consider minimization problems.

A.1.3Metric Space

In this section we define metric spaces and review convergence, open and closed sets, compactness, and completeness.

A.1.3.1Definition

Let VV be a nonempty set. A function d ⁣:V×VRd \colon V \times V \to \RR is called a metric on VV if, for any u,v,wVu, v, w \in V,

2

  1. d(u,v)0d(u,v) \geq 0,

  2. d(u,v)=0    u=vd(u,v)=0 \iff u=v,

  3. d(u,v)=d(v,u)d(u,v)=d(v,u) and

  4. d(u,v)d(u,w)+d(w,v)d(u,v)\leq d(u,w)+d(w,v).

  5. (nonnegativity)

  6. (identifiability)

  7. (symmetry)

  8. (triangle inequality)

Together, the pair (V,d)(V, d) is called a metric space. When the metric is clear from context we refer to the metric space by the symbol VV alone.

If (V,d)(V, d) is a metric space and NVN \subset V, then (N,d)(N, d) is also a metric space (where dd in the second case is defined by restricting the original metric to (u,v)N×N(u, v) \in N \times N).

A.1.3.2Convergence

Given any point uu in metric space (V,d)(V, d), the ϵ\epsilon-ball around uu is the set

Bϵ(u){vV:d(u,v)<ϵ}.B_\epsilon(u) \coloneq \setntn{v \in V}{d(u, v) < \epsilon} .

We say that sequence (un)V(u_n) \subset V converges to uVu \in V if

ϵ>0,  nϵN s.t. nnϵ    unBϵ(u).\forall \, \epsilon > 0,\; \exists \, n_\epsilon \in \NN \st n \geq n_\epsilon \implies u_n \in B_\epsilon(u).

A subsequence of a sequence (un)(u_n) in VV is any sequence of the form (unk)k1(u_{n_k})_{k \geq 1}, where (nk)(n_k) is a strictly increasing sequence in N\NN.

A metric space VV is called separable if there exists a countable set AVA \subset V such that, for any vVv \in V, there exists a sequence (an)(a_n) contained in AA with anva_n \to v. For example, R\RR is separable because any vRv \in \RR can be expressed as the limit of a rational sequence. Separability is useful in certain settings -- particularly when we need to combine topology and measure (see, e.g., Theorem A.3.3). In the applications we consider, most spaces will be separable.

A.1.3.3Open and Closed Sets

Let VV be a metric space. A point uAVu \in A \subset V is called interior to AA if there exists an ϵ>0\epsilon > 0 such that Bϵ(u)AB_\epsilon(u) \subset A.

A subset GG of VV is called open in VV if every uGu \in G is interior to GG. For example, every subset of a discrete metric space is open, since B1/2(u)={u}B_{1/2}(u) = \{u\} for any uu.

A subset FF of VV is called closed if given any sequence (un)(u_n) satisfying unFu_n \in F for all nn and unuu_n \to u for some uVu \in V, the point uu is in FF. In other words, FF contains the limit points of all convergent sequences that take values in FF. Arbitrary unions and finite intersections of open sets are open, while arbitrary intersections and finite unions of closed sets are closed. A set GVG \subset V is open if and only if GcG^c is closed.

A.1.3.4Compactness

A set DD in VV is called bounded if there exists a finite KK such that d(u,v)Kd(u, v) \leq K whenever u,vDu, v \in D. A sequence in VV is called bounded if its range is a bounded subset of VV. A subset KK of VV is called precompact in VV if every sequence in KK has a subsequence converging to some point in VV. The set KK is called compact if, in addition, the limit points always lie in KK. Thus, KK is compact if and only if KK is closed and precompact.

The following theorem is a bedrock of real analysis.

Every precompact subset of a metric space is bounded, but the converse is not true in general. For example, consider the set bRb\RR with the supremum distance (Example A.1.14). Let fnf_n be the normal density with variance 1 and mean nn for each nn in N\NN. The set {fn}nN\{f_n\}_{n \in \NN} is bounded, since d(fn,fm)1d_\infty(f_n, f_m) \leq 1 for all n,mn, m. But it is not precompact. For example, the sequence {fn}nN\{f_n\}_{n \in \NN} has no convergent subsequence. Indeed, every pair of distinct points fn,fmf_n, f_m in the sequence has d(fn,fm)=1d_\infty(f_n, f_m) = 1.

Later, in Section A.4.2.3, we will see sufficient conditions for boundedness to imply precompactness.

A.1.3.5Completeness

Let V=(V,d)V = (V, d) be a metric space. Analogous to the real case (see Section A.1.1.2), a sequence (un)V(u_n) \subset V is called Cauchy if, given any ϵ>0\epsilon > 0, there exists an nϵNn_\epsilon \in \NN such that n,mnϵn, m \geq n_\epsilon implies d(un,um)<ϵd(u_n, u_m) < \epsilon. (V,d)(V, d) is called complete if every Cauchy sequence in VV converges in VV. Examples of complete spaces include Rn\RR^n paired with any metric generated by a norm, the set of n×kn \times k matrices paired with any metric generated by a norm, the space (p(X),dp)(\ell_p(\Xsf), d_p) for countable X\Xsf and p[1,]p \in [1, \infty], the space (bX,d)(b\Xsf, d_\infty), and the space (bcX,d)(bc\Xsf, d_\infty). Two metrics d1d_1 and d2d_2 on VV are called equivalent if there are positive constants α,β\alpha, \beta such that αd1(u,v)d2(u,v)βd1(u,v)\alpha d_1(u, v) \leq d_2(u, v) \leq \beta d_1(u, v) for all u,vVu, v \in V. Equivalent metrics generate the same Cauchy sequences, so completeness is preserved under equivalence.

A.2Topology

Topological spaces are a generalization of metric spaces. They are useful for two reasons. One is that there exist interesting and useful topological spaces that cannot be represented as metric spaces. The second is that, by stripping away some of the structure naturally present in metric spaces, topological arguments add simplicity and clarity to many discussions in analysis.

A.2.1Topological Space

We begin by introducing topological spaces and investigating some of their core characteristics.

A.2.1.1Definition and Examples

A topological space is a pair (V,τ)(V, \tau) where VV is a nonempty set and τ\tau is a collection of subsets of VV such that

  1. \varnothing and VV are both in τ\tau,

  2. τ\tau is closed under finite intersections, and

  3. τ\tau is closed under arbitrary unions.

Statements (ii) and (iii) mean that

A,Bτ    ABτandAτ    AAAτ.A, B \in \tau \implies A \cap B \in \tau \quad \text{and} \quad \aA \subset \tau \implies \cup_{A \in \aA} A \in \tau.

The family τ\tau is called a topology on VV. The elements of τ\tau are called open sets. Complements of open sets are called closed.

A subset NN of a topological space V=(V,τ)V = (V, \tau) is called a neighborhood of a point vVv \in V if there exists a GτG \in \tau with vGNv \in G \subset N. A topological space VV is called a Hausdorff space if, for any u,vVu, v \in V with uvu \not= v, there exist neighborhoods NN of uu and MM of vv with NM=N \cap M = \varnothing. Every metrizable space is Hausdorff, and all topological spaces we consider in this book are Hausdorff spaces.

A.2.1.2Nets

We briefly introduce nets, which are a generalization of a sequence. Nets are important because (a) they characterize topologies, in a sense described below, and (b) nets allow us to describe definitions and properties in a way that connects neatly to sequence-based definitions in metric spaces.

Let AA be any nonempty set. A preorder on AA is a relation \preceq on A×AA \times A such that, for any a,b,ca, b, c in AA we have aaa \preceq a (reflexivity) and aba \preceq b and bcb \preceq c implies aca \preceq c (transitivity). Obviously any antisymmetric preorder on AA is a partial order on AA. A directed set is a nonempty set AA and a preorder \preceq on AA such that, for any a,bAa, b \in A, there exists a cAc \in A with aca \preceq c and bcb \preceq c.

Let VV be any set. A net in VV is a function from a directed set AA to VV, typically written as vv_\bullet or (vα)αA(v_\alpha)_{\alpha \in A}. We sometimes simplify the latter to (vα)(v_\alpha). The interpretation is that αA\alpha \in A is mapped to vαVv_\alpha \in V. Obviously any sequence (vn)(v_n) in VV is also a net in VV.

A net (vα)αA(v_\alpha)_{\alpha \in A} in VV is said to converge to vVv \in V and we write vαvv_\alpha \to v if, for any neighborhood NN of vv, there exists a βA\beta \in A such that vαNv_\alpha \in N whenever βα\beta \preceq \alpha. This generalizes the concept of convergence of sequences in metric space. It is easy to check that convergent nets in VV have unique limits whenever VV is Hausdorff. (The converse is also true.)

The next theorem shows that nets can be used to characterize topologies.

For a proof of Theorem A.2.1, see Theorem 2.14 of Aliprantis & Border (2006).

Let (vα)αA(v_\alpha)_{\alpha \in A} and (wβ)βB(w_\beta)_{\beta \in B} be two nets in VV. The net (wβ)βB(w_\beta)_{\beta \in B} is called a subnet of (vα)αA(v_\alpha)_{\alpha \in A} if there exists an order preserving map pp from BB to AA such that (i) wβ=vp(α)w_\beta = v_{p(\alpha)} for all αA\alpha \in A and (ii) for all αA\alpha \in A, there exists a αp(B)\alpha' \in p(B) such that αα\alpha \preceq \alpha'.

Subnets generalize subsequences. For example, suppose that wn=1/n2w_n = 1/n^2 and vn=1/nv_n = 1/n for nNn \in \NN, then (wn)nN(w_n)_{n \in \NN} is a subnet of (vn)nN(v_n)_{n \in \NN} in R\RR (take A=B=NA = B = \NN and p(n)=n2p(n) = n^2).

A subset KK of topological space VV is called compact if, given any net (vα)(v_\alpha) contained in KK, there exists a subnet (wβ)(w_\beta) of (vα)(v_\alpha) and a point vKv \in K such that wβvw_\beta \to v. This generalizes the notion of a compact subset of a metric space, as given in Section A.1.3.4.

A.2.1.3Continuous Functions

Let VV and WW be topology spaces. A function f ⁣:VWf \colon V \to W is said to be continuous at vVv \in V if, for any net (vα)(v_\alpha) in VV with vαvv_\alpha \to v in VV we have f(vα)f(v)f(v_\alpha) \to f(v) in WW. If ff is continuous at every vVv \in V we simply say that ff is continuous. It is well-known that ff is continuous on VV if and only if f1(G)f^{-1}(G) is open in VV whenever GG is open in WW. (For a proof of this equivalence, see, e.g., Theorem 2.28 of Aliprantis & Border (2006).)

One of the most important features of continuous functions is that they carry compact sets into compact sets (see, e.g., §2.3 of Dudley (2002)):

A.2.1.4Initial Topologies

Let VV be a nonempty set and, for each α\alpha in index set Λ\Lambda, let fαf_\alpha be a function from VV to topological space (Wα,τα)(W_\alpha, \tau_\alpha). The initial topology generated by {fα}αΛ\{f_\alpha\}_{\alpha \in \Lambda} is the topology τ\tau on VV generated (in the sense of Example A.2.3) by the family of sets

A{fα1(G):Gτα,  αΛ}.\aA \coloneq \setntn{f_\alpha^{-1}(G)}{G \in \tau_\alpha, \; \alpha \in \Lambda}.

Evidently each fαf_\alpha is continuous with respect to τ\tau on VV. The following lemma nicely characterizes convergence with respect to the initial topology. In the statement, τ\tau is the initial topology just described.

A.2.1.5Metrizable Spaces

A topological space (V,τ)(V, \tau) is called metrizable if there exists a metric dd on VV such that dd generates the topology τ\tau. In metrizable spaces, sequences have the same “rights” as sequences in Euclidean space, in the sense that they determine the topology and hence other derived objects such as continuous functions. For example, given two metrizable spaces VV and WW,

  1. a function f ⁣:VWf \colon V \to W is continuous if and only if, for uVu \in V and any sequence (vn)V(v_n) \subset V we have f(vn)f(v)f(v_n) \to f(v) in WW whenever vnvv_n \to v in VV.

  2. A set CC is closed in VV if and only if any convergent sequence contained in CC converges to an element of CC.

Equivalent metrics generate the same topology, which is why it is often nicer to discuss topologies than metrics. For example, we will see later (in Section A.4.2) that all metrics on Euclidean space Rn\RR^n generated by a norm are equivalent. Hence, while there are infinitely many norms on Rn\RR^n, they all generate the same topology. This means that, when discussing norm topologies, we can speak unambiguously about open sets, compact sets, continuous functions, etc.

A.2.1.6Product Topologies

Let {(Vn,τn)}nN\{(V_n, \tau_n)\}_{n \in \NN} be a family of topological spaces and consider the Cartesian product V=nNVnV = \prod_{n \in \NN} V_n. The ii-th projection map on VV is the function πi\pi_i sending v=(vn)nNVv = (v_n)_{n \in \NN} \in V into viv_i. The product topology on VV, denoted here by τ\tau, is the initial topology generated by the set of projection maps {πn}nN\{ \pi_n \}_{n \in \NN}. The following result is a direct consequence of Lemma A.2.3.

More generally, if ((Xi,di))i=1n((\Xsf_i, d_i))_{i=1}^n are metric spaces and XiXi\Xsf \coloneq \prod_i \Xsf_i has the product topology, then (uk)X(u_k) \subset \Xsf converges to uXu \in \Xsf if and only if di(uk,u)0d_i(u_k, u) \to 0 for all ii.

A.2.1.7Existence of Extrema

For finite subsets of R\RR, maxima and minima clearly exist. For infinite collections the same is not true. For example, the set (0,1)(0, 1) has neither a maximum nor a minimum.

Under what conditions on primitives are maxima and minima guaranteed to exist? There are multiple approaches to this issue, depending on the structure of the problem. In this section we treat one of the most fundamental, attributed to the German mathematician Karl Weierstrass (1815--1897).

Let ff be a function from a metric space VV to R\RR. Let (vn)(v_n) be an VV-valued sequence and let vv be a point in VV. The function ff is called

If ff is lower semicontinuous at every point in VV, then ff is called lower semicontinuous, and similarly for upper continuity.

A proof of the next theorem can be found in Jahn (2020).

A.2.2Stability and Contractions

One of the most important approaches to fixed points in metric space is via the theory of contractive maps. Here we review key results.

A.2.2.1Fixed Points

Let VV be any set and let SS be a self-map on VV. If vVv \in V obeys Sv=vSv = v, then vv is called a fixed point of SS in VV. For example, if V=RV = \RR and SS is the identity, then every point in R\RR is fixed under SS. If, instead, Sx=x2Sx = x^2, then the set of fixed points is {0,1}\{0, 1\}.

Now let VV be a topological space. We call S ⁣:VVS \colon V \to V globally stable on VV if SS has a unique fixed point uVu^* \in V and SkuuS^k u \to u^* as kk \to \infty for all uVu \in V. When VV is metrizable, with metric dd, a self-map SS is called asymptotically contracting if d(Snu,Snv)0d(S^n u, S^n v) \to 0 as nn \to \infty for all u,vVu, v \in V.

We will often make use of the following lemma.

A.2.2.2Contractions

A self-map SS on metric space V(V,d)V \coloneq (V, d) is called contracting or, more specifically, a contraction of modulus λ\lambda if there exists a λ[0,1)\lambda \in [0, 1) such that

d(Su,Sv)λd(u,v)for allu,vVd(Su, Sv) \leq \lambda d(u, v) \quad \text{for all} \quad u, v \in V

For a proof, see, for example, Aliprantis & Border (2006), Theorem 3.48.

Most of the conclusions of Banach’s contraction mapping theorem carry over when SS is eventually contracting; that is, when SkS^k is contracting for some kNk \in \NN. A proof can be found on p. 9 of Goebel & Kirk (1990).

A.3Measure and Integration

In this section we review measurable functions and integration theory. Measurable functions generalize continuous functions while remaining closed under standard arithmetic and limiting operations, and they admit a well-defined theory of integration. Throughout, for real-valued ff on an arbitrary domain, we set f+f0f^+ \coloneq f \vee 0 and f(f0)f^- \coloneq - (f \wedge 0). See Figure A.6 for an illustration. The function f+f^+ is called the positive part of ff, while ff^- is called the negative part. The identity f=f+ff = f^+ - f^- always holds, so the pair f+f^+, ff^- provides a decomposition of ff into the difference between two nonnegative functions.

Decomposition of functions

Figure A.6:Decomposition of functions

A.3.1Measure Theory

We review measurable spaces, measurable functions, parametric continuity and measurable selections, and measures.

A.3.1.1Measurable Space

Let X\Xsf be any nonempty set. A collection of subsets A\aA of X\Xsf is called a σ\sigma-algebra on X\Xsf if

  1. XA\Xsf \in \aA,

  2. AAA \in \aA implies AcAA^c \in \aA, and

  3. if {An}n1\{A_n\}_{n \geq 1} is a sequence contained in A\aA, then nAnA\cup_n A_n \in \aA.

A pair (X,A)(\Xsf, \aA) where X\Xsf is a nonempty set and A\aA is a σ\sigma-algebra on X\Xsf is called a measurable space.

Points (ii) and (iii) tell us that A\aA is “stable” under the taking of complements and unions. By De Morgan’s law (nAn)c=nAnc(\cap_n A_n)^c = \cup_n A_n^c, any σ\sigma-algebra is stable under countable intersections too. By (i) and (ii), A\varnothing \in \aA also holds.

One way to define a σ\sigma-algebra is to take a collection C\cC of subsets of X\Xsf, and consider the smallest σ\sigma-algebra that contains this collection.

Now let X\Xsf be a metric space. The family of Borel sets on X\Xsf, denoted by either B\bB or BX\bB_\Xsf depending on whether or not the underlying space is clear, is defined as the σ\sigma-algebra generated by the open sets of X\Xsf. Evidently B\bB contains not only all the open subsets of X\Xsf but also all the closed ones. From these sets we can continue taking complements and countable unions and everything we produce must be a Borel set. In fact it turns out that every set we work with in day-to-day analysis is a Borel set.

A.3.1.2Measurable Functions

Given two arbitrary measurable spaces (X,A)(\Xsf, \aA) and (Y,B)(\Ysf, \bB), a function ff from X\Xsf to Y\Ysf is called (A,B)(\aA, \bB)-measurable if

f1(B) is in A whenever BB.f^{-1}(B) \text{ is in } \aA \text{ whenever } B \in \bB.

In other words, measurable functions are those functions that pull measurable sets back to measurable sets. If Y\Ysf is a metric space and B\bB is its Borel sets, then we will say that ff is Borel measurable. It can be shown in this case (see, e.g., Çınlar (2011), Proposition 2.3) that ff is Borel measurable if and only if either one of the following apparently weaker conditions are satisfied:

  1. f1(G)f^{-1}(G) is in A\aA whenever GG is open in Y\Ysf

  2. Y\Ysf is a Borel subset of R\RR and f1((,α))f^{-1}((-\infty, \alpha)) is in A\aA for all αR\alpha \in \RR.

From this result it is immediate that every continuous function from X\Xsf to Y\Ysf is also Borel measurable.

While the class of continuous functions has beautiful properties and is closed under uniform limits (see Example A.1.20), it is not closed under pointwise limits,[2] which makes it hard to work with in some instances. On the other hand, the set of Borel functions is closed under the taking of pointwise limits:

In fact, in our setting, the set of Borel measurable functions is precisely the smallest class of functions that contains the continuous functions and is closed under the taking of pointwise limits (see, e.g., §11.7 of Kechris (2012)).

It is also true that compositions of Borel measurable functions are also Borel measurable, and, when the functions are real-valued, that Borel measurability is preserved under algebraic operations. The next lemma gives one statement of these results:

See Çınlar (2011), Chapter 1, Section 2 for proofs.

A.3.1.3Parametric Continuity and Measurable Selections

We often wish to know whether or not continuity passes from primitives to solutions. For example, we might ask whether an equilibrium object, constructed through a process that involves optimization, varies continuously with parameters. The most commonly used theorem in this domain is Berge’s theorem of the maximum. Here we state a version of Berge’s theorem. Throughout, A\Asf and X\Xsf are metric spaces.

A correspondence from X\Xsf to A\Asf is a map Γ\Gamma from X\Xsf to the set of all subsets of A\Asf. A correspondence Γ\Gamma from X\Xsf to A\Asf is called nonempty if Γ(x)\Gamma(x) is nonempty for all xXx \in \Xsf. A function σ\sigma from X\Xsf to A\Asf is called a measurable selection with respect to Γ\Gamma if σ\sigma is Borel measurable and σ(x)Γ(x)\sigma(x) \in \Gamma(x) for all xXx \in \Xsf.

A nonempty correspondence Γ\Gamma is called

Let Γ\Gamma be a nonempty, compact-valued correspondence from X\Xsf to A\Asf. Let qq be a real valued function on G{(x,a)X×A:aΓ(x)}\Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)} and set

m(x)maxaΓ(x)q(x,a)(xX)m(x) \coloneq \max_{a \in \Gamma(x)} q(x, a) \qquad (x \in \Xsf)

whenever the maximum is well defined.

A proof of the continuity results in Theorem A.3.3 can be found in §17.5 of Aliprantis & Border (2006). Existence of a measurable selection is proved in §18.19 of the same reference.

A.3.1.4Measures

Through the theory constructed above, we can identify broad classes of sets and functions that are relatively well behaved (e.g., Borel sets and Borel functions). This opens the way to analyzing how to (a) measure these sets and (b) integrate the functions. The first step is to introduce the notion of a measure, which is a map μ\mu from a σ\sigma-algebra A\aA to [0,][0, \infty] satisfying

  1. μ()=0\mu(\varnothing) = 0 and

  2. μ(n=1An)=n=1μ(An)\mu(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu(A_n) whenever {An}A\{A_n\} \subset \aA is disjoint.

Here disjointness of {An}\{A_n\} means that any two distinct sets in this sequence are disjoint.

Returning to the general case of a measure μ\mu on measurable space (X,A)(\Xsf, \aA), if there exists a sequence of sets (An)A(A_n) \subset \aA with μ(An)<\mu(A_n) < \infty for all nn and nAn=X\cup_n A_n = \Xsf, then μ\mu is called σ\sigma-finite. If μ(X)<\mu(\Xsf) < \infty, then μ\mu is called finite. If μ(X)=1\mu(\Xsf) = 1, then μ\mu is called a probability measure.

If X\Xsf is a metric space and A=B\aA = \bB (the Borel sets), then μ\mu is called a Borel measure. If A=B\aA = \bB and μ(X)=1\mu(\Xsf) = 1, then μ\mu is called a Borel probability measure. For a Borel probability measure μ\mu, the value μ(B)\mu(B) usually is interpreted as the probability that, when a random element of X\Xsf is selected, that element is in BB.

A measure space is a triple (X,A,μ)(\Xsf, \aA, \mu) where (X,A)(\Xsf, \aA) is a measurable space and μ\mu is a measure on A\aA. If μ(X)=1\mu(\Xsf) = 1, then the measure space is also called a probability space. In this case it is common to write the measure space as (Ω,F,P)(\Omega, \fF, \PP). A random variable on probability space (Ω,F,P)(\Omega, \fF, \PP) is an (F,B)(\fF, \bB)-measurable map XX from Ω\Omega to R\RR paired with its Borel sets B\bB. More generally, given measurable space (E,E)(E, \eE), an EE-valued random element on probability space (Ω,F,P)(\Omega, \fF, \PP) is an (F,E)(\fF, \eE)-measurable map XX from Ω\Omega to EE. The distribution of this random element XX is the probability measure PP defined by

P(B)=P{ωΩ:X(ω)B}(BE)P(B) = \PP\setntn{\omega \in \Omega}{X(\omega) \in B} \qquad (B \in \eE)

Here’s a reassuring fact implying that Borel probability measures on R\RR are isomorphic to a set of very familiar objects.

More generally, we have the interpretation

μ(B)= probability that xB when x is drawn from F\mu(B) = \text{ probability that } x \in B \text{ when } x \text{ is drawn from } F

A.3.1.5Product Spaces and Product Measures

Given measurable spaces (X,A)(\Xsf, \aA) and (Y,B)(\Ysf, \bB), the product σ\sigma-algebra AB\aA \otimes \bB is the σ\sigma-algebra on X×Y\Xsf \times \Ysf generated by all sets of the form A×BA \times B with AAA \in \aA and BBB \in \bB. The triple (X×Y,AB)(\Xsf \times \Ysf, \, \aA \otimes \bB) is called the product measurable space.

If μ\mu and ν\nu are σ\sigma-finite measures on (X,A)(\Xsf, \aA) and (Y,B)(\Ysf, \bB) respectively, then there exists a unique measure μν\mu \otimes \nu on AB\aA \otimes \bB satisfying

(μν)(A×B)=μ(A)ν(B)(AA,  BB).(\mu \otimes \nu)(A \times B) = \mu(A) \, \nu(B) \qquad (A \in \aA, \; B \in \bB).

The measure μν\mu \otimes \nu is called the product measure. The construction extends naturally to finite products of measurable spaces.

A.3.2Integration

We define abstract integrals and review their key properties, including monotonicity and the dominated convergence theorem.

A.3.2.1Abstract Integrals

Let (X,A)(\Xsf, \aA) be a measurable space and let mA+m\aA_+ be the set of nonnegative real-valued Borel measurable functions on (X,A)(\Xsf, \aA). We define an integral on mA+m\aA_+ to be a function I ⁣:mA+[0,]I \colon m\aA_+ \to [0, \infty] such that

  1. I(f)=0I(f) = 0 when f=0f = 0 everywhere on X\Xsf,

  2. f1f2f_1 \leq f_2 \leq \cdots and limnfn=f\lim_{n\to \infty} f_n = f implies limnI(fn)=I(f)\lim_{n \to \infty} I(f_n) = I(f), and

  3. α,β0\alpha, \beta \geq 0 and f,gmA+f, g \in m\aA_+ implies I(αf+βg)=αI(f)+βI(g)I(\alpha f + \beta g) = \alpha I(f) + \beta I(g).

The limit in (ii) is a pointwise limit, so that limnfn=f\lim_{n\to \infty} f_n = f means limnfn(x)=f(x)\lim_{n\to \infty} f_n(x) = f(x) for every xXx \in \Xsf.

The following important result, proved in chapter 1 of Çınlar (2011), states that every measure on a measurable space creates a unique and well defined integral.

The value Iμ(f)I_\mu (f) is called the integral of ff under μ\mu and the following notation is common:

Iμ(f):=:f ⁣dμ:=:f(x)μ( ⁣dx).I_\mu (f) :=: \int f \diff \mu :=: \int f(x) \mu(\diff x).

The integral IλI_\lambda introduced in Example A.3.7 is called the Lebesgue integral, and it extends the standard Riemann integral to a larger set of functions (the Borel measurable functions), while at the same time guaranteeing that the attractive properties (i)--(iii) in the definition of the integral will hold.

Equation (A.28) makes sense in this setting because if, say, f=1[a,b]f = \1_{[a, b]}, then

Iλ(f)=λ([a,b])=ba,I_\lambda (f) = \lambda( [a, b] ) = b - a,

where the first equality is by (A.28) and the second is by the fact that Lebesgue measure assigns length to intervals. The value bab-a is also what we would expect for the integral, since it is the area under the curve for this simple function.[3]

If μ\mu is a probability measure and w ⁣:XRw \colon \Xsf \to \RR, then one often writes Ew(x)\EE w(x) for the integral of w(x)w(x) with respect to μ\mu. That is,

Ew(x)=w ⁣dμ\EE w(x) = \int w \diff \mu

Here we are thinking of xx as a random variable drawn from distribution μ\mu and the integral corresponds to the expectation of w(x)w(x) under μ\mu.

A.3.2.2Properties of Integrals

Given a measure space (X,A,μ)(\Xsf, \aA, \mu), a property is said to hold μ\mu-almost everywhere (or μ\mu-almost surely when μ\mu is a probability measure) if it holds on all of X\Xsf except possibly a set of μ\mu-measure zero. A sequence (fn)(f_n) converges to ff μ\mu-almost everywhere if fn(x)f(x)f_n(x) \to f(x) for μ\mu-almost every xXx \in \Xsf.

The integral extends to functions that take negative values, as well as just the nonnegative functions in mA+m\aA_+. Indeed, if (X,A,μ)(\Xsf, \aA, \mu) is a measure space and fmAf \in m\aA is not necessarily nonnegative, then we can still decompose it into the difference between two nonnegative functions via f=f+ff = f^+ - f^-. Imposing linearity, we now set

f ⁣dμf+ ⁣dμf ⁣dμ.\int f \diff \mu \coloneq \int f^+ \diff \mu - \int f^- \diff \mu .

The only risk here is that both terms on the right equal ++\infty, in which case the integral is not well defined. If both integrals are finite we call ff integrable with respect to μ\mu.

In what follows, we leave (X,A,μ)(\Xsf, \aA, \mu) fixed and write the integral Iμ(f)I_\mu(f) of ff under μ\mu as f ⁣dμ\int f \diff \mu. We note that every integral is increasing, in the sense that

fg    f ⁣dμg ⁣dμ.f \leq g \implies \int f \diff \mu \leq \int g \diff \mu.

To see this, observe that gfg - f is nonnegative (and measurable) and hence (gf) ⁣dμ\int (g - f) \diff \mu is well defined and nonnegative. Now, using the linearity in part (iii) of Theorem A.3.5, we have

g ⁣dμ=(gf+f) ⁣dμ=(gf) ⁣dμ+f ⁣dμf ⁣dμ.\int g \diff \mu = \int (g - f + f) \diff \mu = \int (g - f) \diff \mu + \int f \diff \mu \geq \int f \diff \mu.

A battery of useful limit theorems exist for the integral we have defined. In our statements of these results, (X,A,μ)(\Xsf, \aA, \mu) is any measure space and ff and fnf_n are (A,B)(\aA, \bB)-measurable functions from X\Xsf to R\RR for all nNn \in \NN.

The first implication (i.e., (i)     \implies (A.36)) is called the monotone convergence theorem. The second is called the dominated convergence theorem.

A.3.3Conditioning

Next we review prediction based on conditional expectations. Conditional expectations are themselves a cornerstone of economic theory and empirics, since they describe optimal forecasts based on limited information. Here we provide a brief treatment of the general setting that suffices for what follows.

A.3.3.1Definition

Let YY and the elements of G{X1,,Xk}\gG \coloneq \{X_1, \ldots, X_k\} be scalar random variables. Consider the problem of predicting YY given G\gG. That is, we wish to form a prediction of the value that YY will take once X1,,XkX_1, \ldots, X_k are known, without any additional information on the state of the world. Another way to say this is that we seek a (nonrandom) function f ⁣:RkRf \colon \RR^k \to \RR such that

Y^f(X1,,Xk) is a good predictor of Y.\hat Y \coloneq f(X_1, \ldots, X_k) \text{ is a good predictor of } Y.

To find such an ff we must define what “good” means. The most common definition in the present context is that mean squared error E[(Y^Y)2]\EE[(\hat Y - Y)^2] is small. Thus, we have a minimization problem in function space (the set from which ff is chosen). Based on projection arguments, it can be shown that there exists an essentially unique f^\hat f in the set of functions from Rk\RR^k to R\RR that solves

f^=argminfE[(Yf(X1,,Xk))2].\hat f = \argmin_f \EE[ (Y - f(X_1, \ldots, X_k))^2 ].

(See, e.g., Çınlar (2011).) We call the resulting variable

Y^f^(X1,,Xk)\hat Y \coloneq \hat f(X_1, \ldots, X_k)

the conditional expectation of YY given G\gG. Common alternative notations for Y^\hat Y include

EGY:=:E[YG]:=:E[YX1,,Xk].\EE_{\gG} Y :=: \EE[Y \given \gG] :=: \EE[Y \given X_1, \ldots, X_k] .

In the present context, G\gG is often called an information set.

A.3.3.2Properties

In the next proposition, a random variable YY is called G\gG-measurable if there exists a function ff such that Y=f(X1,,Xk)Y = f(X_1, \ldots, X_k). Intuitively, YY is perfectly predictable given the data in G\gG.

Property (vi) states that the linearity of expectations is preserved under conditioning. Property (ii) is called the law of iterated expectations, and is shared by all projections. Property (v) is sometimes called conditional determinism, since XX can be treated like a constant when it is pinned down by the information set. A full proof of Proposition A.3.7 can be found in Çınlar (2011).

A.3.4Martingales

In this section we provide a brief introduction to martingales, one of the most important classes of stochastic processes, and a result on stopping times that’s needed for the theory of optimal stopping.

A.3.4.1Discrete-Time Martingales

Let (Ω,F,P)(\Omega, \fF, \PP) be a probability space and let (Ft)t0(\fF_t)_{t \geq 0} be a sequence of σ\sigma-algebras with FtFt+1F\fF_t \subset \fF_{t+1} \subset \fF for all tt, called a filtration. A sequence of random variables (Mt)t0(M_t)_{t \geq 0} is called a martingale with respect to (Ft)(\fF_t) if, for all t0t \geq 0,

  1. MtM_t is Ft\fF_t-measurable,

  2. EMt<\EE |M_t| < \infty, and

  3. E[Mt+1Ft]=Mt\EE[M_{t+1} \mid \fF_t] = M_t.

A stopping time with respect to (Ft)(\fF_t) is a random variable τ\tau taking values in {0,1,2,}{}\{0, 1, 2, \ldots\} \cup \{\infty\} such that {τt}Ft\{\tau \leq t\} \in \fF_t for all t0t \geq 0.

A.3.4.2Martingale Stopping Times

Next we present the optional stopping theorem for discrete-time martingales.

Using this, we establish a general result on exit times for bounded martingales. In the statement of the theorem, (Mt,Ft)(M_t, \fF_t) is a discrete-time martingale with M0=mM_0 = m, and

τ=inf{t0:Mt(a,b)}.\tau = \inf\{t \geq 0 : M_t \notin (a,b)\}.

for some a,ba,b in R\RR with a<ba < b.

A.4Vector Spaces and Norms

We humans have natural geometric intuition about the space Rn\RR^n when n=3n = 3. If this intuition can be expressed algebraically, then R3\RR^3 results often extend to Rn\RR^n for arbitrary nNn \in \NN -- and also to more general collections of objects, such as matrices, complex numbers and real-valued functions, provided that these collections are assigned some basic algebraic structure analogous to that enjoyed by vectors in R3\RR^3.

Of course we need to formalize what “analogous” means by codifying the properties that we need the algebraic operations to satisfy. This leads to the concept of (abstract) vector space. In this section we recall the definition of such spaces and review key properties.

A.4.1Vector Space

We begin with linear algebraic properties in abstract sets that generalize the idea of adding and scalar multiplying vectors in Rn\RR^n. Then we discuss properties of subsets of and maps over these abstract “vector spaces.”

A.4.1.1Definition and Properties

A vector space (also called a linear space) is a triple (E,+,)(E, + , \cdot) where EE is a nonempty set, ++ is a map from E×EE \times E to EE called addition and \cdot is a map from R×E\RR \times E to EE called scalar multiplication, such that for all u,v,wEu, v, w \in E and α,βR\alpha, \beta \in \RR,

  1. u+(v+w)=(u+v)+wu + (v + w) = (u + v) + w

  2. u+v=v+uu + v = v + u

  3. there exists an element 0E0 \in E, called the origin, s.t. u+0=uu + 0 = u for all uEu \in E

  4. for all uEu \in E, there exists a vEv \in E such that u+v=0u + v = 0

  5. α(βu)=(αβ)u\alpha \cdot (\beta \cdot u) = (\alpha \cdot \beta) \cdot u

  6. 1u=u1 \cdot u = u

  7. α(u+v)=αu+αv\alpha \cdot (u + v) = \alpha \cdot u + \alpha \cdot v

  8. (α+β)u=αu+βu(\alpha + \beta) \cdot u = \alpha \cdot u + \beta \cdot u

In practice, the \cdot symbol is usually omitted, so αuαu\alpha u \coloneq \alpha \cdot u. In the present context, the values α,β,\alpha, \beta, \ldots are often called scalars. Also, the origin, which shares the symbol 0 with the zero element from R\RR, is sometimes referred to as the additive identity.[4]

The vector space Rn\RR^n is a special case of Example A.4.2, obtained when X=[n]\Xsf = \natset{n}.

A.4.1.2Convexity

Given vector space EE, set CEC \subset E is called convex if u,vCu, v \in C and α[0,1]\alpha \in [0,1] implies αu+(1α)vC\alpha u + (1-\alpha) v \in C. In other words, CC is closed under the taking of convex combinations.

When EE is any vector space, a nonempty subset CC of EE is called a cone in EE if

  1. CC is convex,

  2. xCx \in C and xC-x \in C implies x=0x = 0 and

  3. αxC\alpha x \in C whenever xCx \in C and α0\alpha \geq 0.

(Some authors refer to CC as a “pointed convex cone.”)

A.4.1.3Linear Maps and Subspaces

Analogous to the case of Rn\RR^n, a linear subspace of vector space EE is a set SES \subset E satisfying

α,βR and u,vS        αu+βvS.\alpha, \beta \in \RR \text{ and } u, v \in S \; \implies \; \alpha u + \beta v \in S.

The proof of the next proposition is a useful exercise:

A linear operator from vector space EE into vector space FF is a map A ⁣:EFA \colon E \to F satisfying

α,βR and u,vE        A(αu+βv)=αAu+βAv.\alpha, \beta \in \RR \text{ and } u, v \in E \; \implies \; A(\alpha u + \beta v ) = \alpha A u + \beta A v.

It can in fact be shown that every linear operator from Rk\RR^k to Rn\RR^n can be represented by an n×kn \times k matrix.

The “kernel function” pp in (A.58) operator can be identified with a matrix in Rn×n\RR^{n \times n} when X=nN|\Xsf|=n \in \NN. No such identification exists when X=|\Xsf|=\infty.

A.4.1.4Bases and Dimension

A linear combination of vectors u1,,uku_1,\ldots, u_k in EE is a vector of the form α1u1++αkuk\alpha_1 u_1 + \cdots + \alpha_k u_k where α1,,αk\alpha_1,\ldots, \alpha_k are scalars. A set SES \subset E is called linearly independent if, for any finite set {u1,,uk}S\{u_1, \ldots, u_k\} \subset S, we have

α1u1++αkuk=0 implies α1==αk=0.\alpha_1 u_1 + \cdots + \alpha_k u_k = 0 \text{ implies } \alpha_1 = \cdots = \alpha_k = 0.

A basis of a linear subspace SS of EE is a linearly independent subset BB of SS that spans SS (i.e., each uSu \in S can be expressed as a finite linear combination of elements of BB).

A proof can be found in Jänich (1994). In case (ii), we say that EE is nn-dimensional. EE is called finite-dimensional if EE is nn-dimensional for some nNn \in \NN. In case (iii), we call EE infinite-dimensional.

A.4.2Normed Vector Space

In this section we recall basic definitions and properties concerning normed vector space and linear operators acting on such space.

A.4.2.1Norms on Vector Space

Given vector space EE, a map  ⁣:ER\| \cdot \| \colon E \to \RR is called a norm on EE if, for any αR\alpha \in \RR and any u,vEu, v \in E,

2

  1. u0\| u \| \geq 0

  2. u=0    u=0\| u \| =0 \iff u=0

  3. αu=αu\| \alpha u \| = |\alpha| \| u\| and

  4. u+vu+v\| u + v \| \leq \| u \| + \| v \|

  5. (nonnegativity)

  6. (positive definiteness)

  7. (positive homogeneity)

  8. (triangle inequality)

The pair (E,)=((E,+,),)(E, \| \cdot \|) = ((E, +, \cdot), \| \cdot \|) is called a normed vector space (or normed linear space). When \| \cdot \| is understood we refer to the space using the symbol EE.

Consider a normed vector space (E,)(E, \| \cdot\|) with origin 0. Recalling the definition of boundedness from metric spaces, one can show that a subset SS of EE is bounded if and only if there exists an MNM \in \NN such that uM\| u \| \leq M for all uSu \in S.

If (E,)(E, \| \cdot \|) is a normed linear space, then (E,d)(E, d) is a metric space when d(u,v)uvd(u,v) \coloneq \| u - v \|. All metric concepts extend to (E,)(E, \| \cdot \|). For example,

Let EE be a vector space and let \| \cdot \| and \| \cdot \|' be two norm on EE. These norms are said to be equivalent if there exist finite positive constants A,BA, B such that xAx\|x\| \leq A \|x\|' and xBx\|x\|' \leq B \|x\| for all xEx \in E. The following result is fundamental. See, for example, Aliprantis & Burkinshaw (1998), Theorem 27.6.

A.4.2.2Completeness

Completeness is essential to many important theorems in applied analysis. Fortunately, the completeness of R\RR is inherited by many useful spaces. For example,

A proof can be found in, e.g., Aliprantis & Burkinshaw (1998), Theorem 27.6.

A complete normed vector space is called a Banach space. There are many other important Banach spaces, beyond the finite-dimensional ones.

A.4.2.3Compactness

Let (E,)(E, \| \cdot \|) be a normed vector space. All equivalent metrics induce the same precompact sets and the same bounded sets in EE. Since all norms on a finite-dimensional space are equivalent (Theorem A.4.3), any metric induced by a norm on a finite dimensional vector space has the property that its precompact and bounded sets coincide (cf., Theorem A.1.6). The next theorem states this fact for the record.

In line with our discussion in Section A.1.3.4, this one-to-one pairing between closed bounded sets and compact sets breaks down in infinite dimensional spaces. In fact, the closed unit ball of a normed vector space EE is compact if and only if EE is finite-dimensional.

A.4.2.4LpL_p Spaces

Let μ\mu be a σ\sigma-finite measure on measurable space (X,A)(\Xsf, \aA) and let p1p \geq 1. The space Lp(X,A,μ)L_p(\Xsf, \aA, \mu) consists of all Borel measurable functions f ⁣:XRf \colon \Xsf \to \RR with fp ⁣dμ\int |f|^p \diff \mu finite. Functions that agree μ\mu-almost everywhere are identified. The functional fp(fp ⁣dμ)1/p\|f\|_p \coloneq \left(\int |f|^p \diff \mu\right)^{1/p} is a norm on Lp(X,A,μ)L_p(\Xsf, \aA, \mu).

Scheffés identity provides a useful quantitative interpretation of d1d_1 distance between densities: For any densities ff and gg on (X,A,μ)(\Xsf, \aA, \mu), we have

fg1=2×supBABf ⁣dμBg ⁣dμ\|f - g\|_1 = 2 \times \sup_{B \in \aA} \left| \int_B f \diff \mu - \int_B g \diff \mu \right|

Finally, Scheffés lemma is useful for testing L1L_1 convergence:

In the case where fnf_n and ff are densities, Scheffé’s lemma tells us that fnff_n \to f in L1L_1 if and only if fnff_n \to f almost everywhere.

A.4.3Bounded Linear Operators

If EE and FF are normed linear spaces, then the operator norm of AA is defined as

Asupu=1Au.\| A \| \coloneq \sup_{\| u \| = 1} \| A u \|.

(Here u\| u\| is the norm of uu in EE and Au\|Au\| is the norm of AuAu in FF.) When A\| A\| is finite, AA is called a bounded linear operator. The set of all bounded linear operators from EE to FF will be denoted B(E,F)\blop(E, F). If E=FE=F then we write B(E)\blop(E). Every AB(E,F)A \in \blop(E, F) is continuous, since, for unuu_n \to u in EE we have

AunAuAunu0.\| Au_n - Au\| \leq \|A\| \|u_n - u\| \to 0.

The converse is also true: every continuous linear operator from EE to FF is bounded -- see §2.7 of Kreyszig (1978) for a proof of this fact, as well as Theorem A.4.8 below.

As suggested by the name, the operator norm is a norm on B(E,F)\blop(E, F). The details are left as an exercise.

The operator norm is submultiplicative: If A,BB(E)A, B \in \blop(E), then ABABAB\|A B \| \coloneq \| A \circ B \| \leq \| A \| \cdot \| B \|. Iteratively applying the submultiplicative property gives AiAi\|A^i\| \leq \|A \|^i for any iNi \in \NN and AB(E)A \in \blop(E), where AiA^i is the ii-th composition of AA with itself.

Once we have a norm on B(E,F)\blop(E, F), we have an induced metric given by d(A,B)=ABd(A, B) = \| A - B \|, and B(E,F)\blop(E, F) will be a Banach space whenever this metric is complete.

Let EE be a Banach space and let AA be an element of B(E)\blop(E). A complex scalar λ\lambda is called an eigenvalue of AB(E)A \in \blop(E) if there exists a nonzero vector ee such that Ae=λeAe = \lambda e. The spectrum of AA, typically denoted σ(A)\sigma(A), is the set of all scalar λ\lambda such that λIA\lambda I - A fails to be bijective on EE. Any eigenvalue λ\lambda lies in σ(A)\sigma(A) because if Ae=λeAe = \lambda e for some nonzero ee, then λIA\lambda I - A maps ee to 0, while also mapping 0 to 0. Hence λIA\lambda I - A is not bijective. For AB(E)A \in \blop(E), the spectral radius of AA is defined as

ρ(A)sup{λ:λσ(A)}\rho(A) \coloneq \sup \setntn{|\lambda|}{ \lambda \in \sigma(A)}

It is well known (see, e.g., Kreyszig (1978), §7.3) that

  1. ρ(A)A\rho(A) \leq \| A \|, where \| \cdot \| is the operator norm, and

  2. Ak1/kρ(A)\| A^k \|^{1/k} \to \rho(A) as kk \to \infty (Gelfand’s formula).

The following theorem is essential for many results in the book.

(The infinite sum is defined as the limit of the partial sums in EE. Hence the infinite sum exists if and only if the partial sums converge in EE.)

An immediate consequence is global stability of affine maps with spectral radius less than one:

A.5Order

In this section we study order completeness and order continuity, ordered vector spaces and Riesz spaces, the interplay between topology and order including Banach lattices and weighted sup-norm spaces, Markov models, and orders over distributions.

A.5.1Order Continuity and Order Completeness

When studying the real line R\RR, we can define completeness either as existence of limits for Cauchy sequences (Theorem A.1.2) or as existence of suprema for bounded above sets (Theorem A.1.1). The first idea can be extended to metric spaces by generalizing the concept of Cauchy sequences (see Section A.1.3.5). The second can be extended to posets by analogy with existence of suprema. The aim of this section is to describe this second concept of completeness.

A.5.1.1Lattices and Chains

When we discuss posets, there are multiple notions of completeness, with each one determined by the classes of sets that are required to have suprema. Specifically, a nonempty poset VV is called

Note that chain completeness implies countable chain completeness but not conversely, and that neither concept implies nor is implied by the lattice property.

In the definitions above, finite sets are understood to be nonempty (i.e., in one-to-one correspondence with {1,,n}\{1, \ldots, n\} for some nNn \in \NN). Also, “at most countable” means empty, finite or countable. The fact that the empty set is included has significance, as the next lemma illustrates.

The following result is a version of the Knaster--Tarski fixed point theorem for chain complete posets. In the statement, VV is a nonempty poset.

A sublattice of a lattice VV is a subset SS of VV with the property that uvu \vee v and uvu \wedge v are in SS whenever u,vSu, v \in S.

A.5.1.2Dedekind Completeness

Consider the canonical partially ordered set (Rk,)(\RR^k, \leq). This set is not countably chain complete: for example, letting 1\1 be a vector of ones, the increasing sequence (vn)=(n1)(v_n) = (n \1) has no supremum. At the same time, (Rk,)(\RR^k, \leq) certainly has some completeness properties. For example, it follows easily from Exercise A.4 that every bounded above subset of Rk\RR^k has a supremum, and every bounded below subset of Rk\RR^k has an infimum. This motivates the following definitions:

A partially ordered set VV is called Dedekind complete if, for any nonempty AVA \subset V,

  1. AA is bounded above     \implies AA has a supremum in VV and

  2. AA is bounded below     \implies AA has an infimum in VV.

VV is called countably Dedekind complete if, for any nonempty finite or countable AVA \subset V,

  1. AA is bounded above     \implies AA has a supremum in VV and

  2. AA is bounded below     \implies AA has an infimum in VV.

There are natural connections between Dedekind (resp., countable Dedekind) completeness and chain (resp. countable chain) completeness. Here is one simple result.

A.5.1.3Order Continuity

We call a map SS from poset VV to poset WW order continuous on VV if

SvnSvwhenever vnv.S v_n \uparrow S v \quad \text{whenever } v_n \uparrow v.

In other words, if (vn)V(v_n) \subset V with vnvVv_n \uparrow v \in V, then nSvn\bigvee_n S v_n exists in WW and equals SvSv.

In the next lemma, VV and WW are arbitrary posets.

Next we state a variation on the Tarski--Kantorovich fixed point theorem.

Here’s a more standard version of the Tarski--Kantorovich fixed point theorem.

The next lemma is analogous to Lemma A.5.3.

A.5.2Ordered Vector Space

Next we add algebraic structure to posets. The combination of algebraic operations and order will allow us to develop sharp sufficient conditions for dynamic programs and convergence of algorithms.

A.5.2.1Definition and Properties

Let E=(E,+,)E = (E, +, \cdot) be a vector space with origin 0 (see Section A.4.1) and let \leq be a partial order on EE. We call (E,)(E, \leq) an ordered vector space if the order is preserved under addition and nonnegative scalar multiplication; that is, if

  1. uvu \leq v implies u+bv+bu + b \leq v + b for any bEb \in E, and

  2. uvu \leq v and αR\alpha \in \RR with 0α0 \leq \alpha implies αuαv\alpha u \leq \alpha v.

The positive cone of EE, typically denoted by E+E_+, is all vEv \in E with 0v0 \leq v.

If (E,)(E, \leq) is an ordered vector space and u,v,wEu, v, w \in E, then

  1. u0u \leq 0 and v0v \leq 0 implies u+v0u + v \leq 0,

  2. uvu \leq v implies vu-v \leq -u,

  3. (uv)+w=(u+w)(v+w)(u \vee v) + w = (u + w) \vee (v + w), and

  4. α(uv)=(αu)(αv)\alpha (u \vee v) = (\alpha u) \vee (\alpha v) whenever α0\alpha \geq 0.

These facts follow directly from the definitions.

Using the definition in Section A.1.2.4, if (vn)(v_n) is a sequence in ordered vector space EE and vEv \in E, then the statement vnvv_n \uparrow v means that (vn)(v_n) is increasing and nvn=v\bigvee_n v_n = v.

In some settings, a partial order is introduced into a vector space EE by first choosing a (pointed convex) cone CC on EE (see Section A.4.1.2) and stating that uvu \leq v if and only if vuCv - u \in C. The following discussion clarifies this idea.

A.5.2.2Operators on Ordered Vector Space

A linear operator TT mapping ordered vector space EE to itself is called positive if TT is invariant on the positive cone; that is, if uEu \in E and u0u \geq 0 implies Tu0Tu \geq 0.

(In the canonical example given above, positive operators are identified with nonnegative matrices. Unfortunately, this notational inconsistency is deeply embedded in the existing literature so we must accept it.)

Let EE be an ordered vector space and let A ⁣:EEA \colon E \to E be a linear operator. Recalling the definition in Section A.5.1.3, AA is order continuous on EE when (vn)E(v_n) \subset E and vnvEv_n \uparrow v \in E implies AvnAvAv_n \uparrow Av. By Lemma A.5.5, every order continuous linear operator is order preserving -- and hence positive. The next exercise can be completed using Lemma A.5.9.

A self-map SS on a convex subset CC of ordered vector space E(E,)E \coloneq (E, \leq) is called convex on CC if

S(λv+(1λ)v)λSv+(1λ)Sv whenever vvC and 0λ1S(\lambda v + (1-\lambda) v') \leq \lambda Sv + (1-\lambda) Sv' \text{ whenever } v \leq v' \in C \text{ and } 0\leq \lambda \leq 1

The map SS is called concave on CC if

λSv+(1λ)SvS(λv+(1λ)v) whenever vvC and 0λ1\lambda Sv + (1-\lambda) Sv' \leq S(\lambda v + (1-\lambda) v') \text{ whenever } v \leq v' \in C \text{ and } 0\leq \lambda \leq 1

A.5.2.3Riesz Space

Next we introduce Riesz spaces, which are ordered vector spaces with lattice structure. This structure allows for the introduction of a notion of absolute value, which behaves similarly to the pointwise absolute value over vectors in Rn\RR^n. Absolute value in turn helps us clarify and quantify the actions of operators, providing new opportunities for establishing optimality conditions in dynamic programs.

An ordered vector space EE is called a Riesz space if EE is a lattice. With \vee and \wedge as the lattice operations and u,v,wEu, v, w \in E, the following properties always hold:

  1. uv=((u)(v))u \wedge v = - ((-u) \vee (-v)) and uv=((u)(v))u \vee v = - ((-u) \wedge (-v)).

  2. (uv)+w=(u+w)(v+w)(u \wedge v) + w = (u + w) \wedge (v + w) and (uv)+w=(u+w)(v+w)(u \vee v) + w = (u + w) \vee (v + w).

These facts can be easily verified and other related results are found in Chapter 2 of Zaanen (2012).

For element uu of any Riesz space (E,)(E, \leq) we use the notation

uu(u),u+u0andu(u)0.|u| \coloneq u \vee (-u), \quad u^+ \coloneq u \vee 0 \quad \text{and} \quad u^- \coloneq (-u) \vee 0.

These points in EE are called the absolute value, positive part, and negative part of uu respectively. One easily shows that u=u|-u| = |u|. Also,

Notice that (iii) implies the triangle inequality u+vu+v|u+v| \leq |u| + |v|.

We will make use of the following lemma:

Here’s an obvious corollary when E=RE = \RR.

A.5.2.4Riesz Spaces of Measurable Functions

Let (X,A,μ)(\Xsf, \aA, \mu) be a σ\sigma-finite measure space. As usual, we let

The vector spaces mXm\Xsf and bXb\Xsf are both Riesz spaces when paired with the pointwise partial order \leq, with bXb\Xsf a subset of mXm\Xsf.

A.5.2.5Almost Everywhere Pointwise Order

Let (X,A,μ)(\Xsf, \aA, \mu) be as in the previous section and fix p[1,)p \in [1, \infty). Let LpLp(X,A,μ)L_p \coloneq L_p(\Xsf, \aA, \mu) be the Banach space of equivalence classes defined in Section A.4.2.4. Let \leq be defined by fgf \leq g if and only if {xX:f(x)>g(x)}\setntn{x \in \Xsf}{f(x) > g(x)} has μ\mu-measure zero.

The space (Lp,)(L_p, \leq) just described is a Riesz space. For example, if f,gLpf, g \in L_p, then fgf+g|f \vee g| \leq |f| + |g|, and f ⁣dμ\int |f| \diff \mu and g ⁣dμ\int |g| \diff \mu are both finite. Hence fgLpf \vee g \in L_p.

A.5.2.6Dedekind completeness of Riesz Space

Since each Riesz space is a partially ordered space, the notions of Dedekind and countable Dedekind completeness apply directly. Moreover, when testing these forms of completeness, one-sided conditions suffice. The following one-sided condition is particularly simple.

For a proof of Lemma A.5.14, see Theorem 12.1 of Zaanen (2012).

We will make repeated use of the following fact.

A proof of Lemma A.5.15 can be found in Example 12.5 of Zaanen (2012).

Several interesting function spaces are naturally ordered by the pointwise partial order. Next we study the completeness properties of such Riesz spaces. We will make use of the following lemma, in the statement of which, for (vn)RX(v_n) \subset \RR^\Xsf, the symbol supnvn\sup_n v_n indicates the pointwise supremum.

Now let (X,A,μ)(\Xsf, \aA, \mu) be a σ\sigma-finite measure space and let mXm\Xsf and bXb\Xsf be the Riesz spaces discussed in Section A.5.2.4. As above, let \leq be the pointwise partial order.

A.5.3Topology and Order

In some applications it will be helpful to draw on results that use topological or metric structure. In this section we note some elementary facts about topological, metric and normed spaces where order is also present.

A.5.3.1Partially Ordered Space

A partial order \preceq on topological space VV is called closed if, given any two nets (uα)αΛ(u_\alpha)_{\alpha \in \Lambda} and (vα)αΛ(v_\alpha)_{\alpha \in \Lambda} contained in VV,

uαu,    vαv   and   uαvα for all αΛ      uv.u_\alpha \to u, \;\; v_\alpha \to v \; \text{ and } \; u_\alpha \preceq v_\alpha \text{ for all } \alpha \in \Lambda \quad \implies \; u \preceq v.

A partially ordered space, also called a pospace, is a Hausdorff topological space endowed with a closed partial order. (We make the Hausdorff assumption so that sequences have unique limits.)

The next lemma connects topological and order convergence in partially ordered space V=(V,)V = (V, \preceq). In the statement, (vα)αΛ(v_\alpha)_{\alpha \in \Lambda} is a net in VV.

The next lemma shows how global stability (see Section A.2.2.1) interacts with order stability in the setting of partially ordered space.

The following result can be used to compare fixed points of operators. In the statement, V=(V,)V = (V, \preceq) is a pospace and S(V)\sS(V) is all self-maps on VV, ordered pointwise (i.e., for S,TS(V)S, T \in \sS(V), we have STS \preceq T if and only if SvTvSv \preceq Tv for all vVv \in V).

A.5.3.2Partially Ordered Metric Space

A partially ordered metric space is a tuple (V,,d)(V, \preceq, d) where (V,)(V, \preceq) is a poset, dd is a metric on VV, and (V,)(V, \preceq) is a pospace under the topology induced by dd on VV. In particular, \preceq is closed with respect to dd, so that

d(vn,v)0 and d(un,u)0 with unvn for all n implies uv.d(v_n, v) \to 0 \text{ and } d(u_n , u) \to 0 \text{ with } u_n \preceq v_n \text{ for all } n \text{ implies } u \preceq v \text{.}

On a partially ordered metric space V=(V,,d)V = (V, \preceq, d), the metric dd is called sup-nonexpansive if, for all subsets (vα)(v_\alpha) and (wα)(w_\alpha) of VV, we have

d(αvα,αwα)supαd(vα,wα)d \left( \vee_\alpha \, v_\alpha, \vee_\alpha \, w_\alpha \right) \leq \sup_\alpha \, d(v_\alpha, w_\alpha)

whenever the suprema exist.

Sup-nonexpansive metrics will be useful for us because contraction properties are passed from collections of mappings to their upper envelopes. The next lemma explains. In the statement,

  1. (V,,d)(V, \preceq, d) is a partially ordered metric space,

  2. T{Tσ:σΣ}\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma} is a collection of self-maps on VV,

  3. V0V_0 is a subset of VV and TvσTσvTv \coloneq \vee_\sigma T_\sigma \, v exists at each vV0v \in V_0.

A.5.3.3Banach Lattices

Let E=(E,)E = (E, \leq) be a Riesz space and let \| \cdot \| be a complete norm on EE, so that (E,)(E, \| \cdot \|) is a Banach space. If the norm is compatible with the order structure on EE, in the sense that uv\|u\| \leq \|v\| whenever uv|u| \leq |v|, then \| \cdot \| is called a lattice norm and E(E,,)E \coloneq (E, \leq, \| \cdot \|) is called a Banach lattice.

If (vn)(v_n) is a sequence in EE then convergence is as defined for sequences in normed linear space (see Section A.4.2.1): vnvv_n \to v means that $| v_n

A.5.3.4Order Units

Let EE be a Banach lattice. An element eE+e \in E_+ will be called a normalized order unit if e=1\|e\| = 1 and uue|u| \leq \|u\|\, e for all uEu \in E. For example, if E=bXE = b\Xsf, then e=1e = \1 is a normalized order unit.[5]

A.5.3.5Weighted Sup-Norm Spaces

In this section we introduce a class of Banach lattices that are useful for handing unbounded dynamic programming problems. To this end, let X\Xsf be a topological space. A weight function on X\Xsf is a mapping mX\ell \in m\Xsf with (x)1\ell(x) \geq 1 for all xXx \in \Xsf. Given a weight function \ell and vRXv \in \RR^\Xsf we introduce the \ell-weighted supremum norm

vsupxXv(x)(x).\| v \|_\ell \coloneq \sup_{x \in \Xsf} \, \frac{|v(x)|}{\ell(x)}.

In this setting, we let

Elements of bXb_\ell \Xsf are called \ell-bounded functions.

The next theorem gives conditions under which the spaces discussed above are Banach lattices. Proofs can be found in §12.2.1 of Stachurski (2022).

A.5.3.6Positive Operators on Banach Lattices

If EE is a Banach lattice, then, as in Section A.4.3, we take B(E)\blop(E) to be the norm bounded (and hence norm continuous) linear self-maps on EE. Let B+(E)\blop_+(E) be the positive linear self-maps on EE.

Continuing in the setting of Exercise A.50, the pointwise partial order on B(E)\blop(E) is defined by ABA \leq B whenever AvBvAv \leq Bv for all vEv \in E. The set B+(E)\blop_+(E) coincides with the positive cone of B(E)\blop(E). On this positive cone, the spectral radius is order preserving:

A Banach lattice EE is said to have a σ\sigma-order continuous norm if

(vn)E and vn0    vn0.(v_n) \subset E \text{ and } v_n \downarrow 0 \quad \implies \quad \| v_n \| \to 0.

In the next example, LpL_p is the Banach lattice discussed in Example A.5.7.

A.5.4Markov Models

Many dynamic programs have some form of Markov structure (or can be coerced into a Markov framework by suitably changing the state space). Here we review key ideas related to Markov processes and state some useful results.

In all of this section (Section A.5.4), X\Xsf is a metric space with Borel sets B\bB. The symbol D(X)\dD(\Xsf) is the set of all distributions (Borel probability measures) on X\Xsf. If X\Xsf is finite, then the metric on X\Xsf is the discrete metric (under which all real-valued functions on X\Xsf are continuous and B\bB is the set of all subsets of X\Xsf).

A.5.4.1Stochastic Kernels

Let U\Usf be a second metric space. A transition kernel from U\Usf to X\Xsf is a function NN from U×B\Usf \times \bB to R+\RR_+ with the property that uN(u,B)u \mapsto N(u,B) is Borel measurable for each BBB \in \bB and BN(u,B)B \mapsto N(u, B) is a measure on (X,B)(\Xsf, \bB) for all uUu \in \Usf. A stochastic kernel from U\Usf to X\Xsf is a transition kernel PP from U\Usf to X\Xsf satisfying P(u,X)=1P(u, \Xsf) =1 for all uUu \in \Usf. Informally, the stochastic kernel PP takes a point uUu \in \Usf and randomly “transitions” to a new point in X\Xsf via the distribution P(u, ⁣dx)P(u, \diff x).

A common setting is where U=X\Usf = \Xsf. In this case we say that NN is a transition kernel on X\Xsf, while PP is a stochastic kernel on X\Xsf.

An X\Xsf-valued stochastic process (Xt)t=0(X_t)_{t =0}^\infty on (Ω,F,P)(\Omega, \fF, \PP) is called (P,ψ)(P, \psi)-Markov if X0=dψX_0 \eqdist \psi and P{Xt+1BXt}=P(Xt, ⁣dx)\PP\{ X_{t+1} \in B \given X_t\} = P(X_t, \diff x') with probability one for all t0t \geq 0. If ψ=δx\psi = \delta_x for some xXx \in \Xsf, then we say (Xt)t0(X_t)_{t \geq 0} is (P,x)(P, x)-Markov. We also say that (Xt)t0(X_t)_{t \geq 0} is PP-Markov if (Xt)t0(X_t)_{t \geq 0} is (P,ψ)(P, \psi)-Markov for some ψD(X)\psi \in \dD(\Xsf).

Given any stochastic kernel PP on X\Xsf and any initial condition xXx \in \Xsf, a (P,x)(P, x)-Markov process always exists. In particular, we can take the canonical construction: set Ω=X\Omega = \Xsf^\infty, let Xt(ω)=ωtX_t(\omega) = \omega_t be the coordinate projections, and let Px\PP_x be the unique probability measure on X\Xsf^\infty (equipped with the product σ\sigma-algebra) such that X0=xX_0 = x Px\PP_x-a.s. and Px{Xt+1BX0,,Xt}=P(Xt,B)\PP_x\{X_{t+1} \in B \given X_0, \ldots, X_t\} = P(X_t, B). Existence of Px\PP_x follows from the Ionescu-Tulcea theorem (see, e.g., Meyn & Tweedie (2009), Chapter 3).

Let θ ⁣:XX\theta \colon \Xsf^\infty \to \Xsf^\infty denote the shift operator defined by θ(x0,x1,)=(x1,x2,)\theta(x_0, x_1, \ldots) = (x_1, x_2, \ldots). The following is the Markov property in its general form: for the canonical chain, if f ⁣:XRf \colon \Xsf^\infty \to \RR is measurable and either nonnegative or bounded, then

Ex[fθX1]=EX1fPx-a.s.\EE_x [ f \circ \theta \given X_1 ] = \EE_{X_1} f \quad \PP_x\text{-a.s.}

For a proof see Meyn & Tweedie (2009), p. 63.

A.5.4.2Markov Operators

As before, let mXm\Xsf be all the measurable functions on X\Xsf and let PP be a stochastic kernel on X\Xsf. Given hmXh \in m\Xsf we set

(Ph)(x)h(x)P(x, ⁣dx)(xX).(P h)(x) \coloneq \int h(x') P(x, \diff x') \qquad (x \in \Xsf).

whenever the integral is well-defined. We call PP the Markov operator generated by the stochastic kernel PP. We use the same symbol because stochastic kernels and Markov operators can be placed in one-to-one correspondence via

P(x,B)=(P1B)(x)(xX,BB).P(x,B) = (P \1_B)(x) \qquad (x \in \Xsf, \, B \in \bB).

(The stochastic kernel is on the left and the Markov operator is on the right.)

(We already studied a version of PP in Example A.4.6 on page .) Intuitively, (Ph)(x)(P h)(x) represents the expectation of h(Xt+1)h(X_{t+1}) given Xt=xX_t = x. We extend this interpretation below.

Given a stochastic kernel PP on X\Xsf, a distribution ϕD\phi \in \dD is called stationary for PP if

ϕ(B)=P(x,B)ϕ( ⁣dx) for all BB.\phi(B) = \int P(x, B) \phi(\diff x) \quad \text{ for all } B \in \bB.

A.5.4.3General Properties

The following lemma lists useful properties of the Markov operator ((A.95)) when considered as a linear operator on bXb\Xsf, the set of bounded Borel measurable functions on X\Xsf. Proofs can be found in Meyn & Tweedie (2009), Chapter 3.

(Note that (vi) follows from (v) and Gelfand’s formula for the spectral radius.)

Here is a fundamental result linking the stochastic kernel PP, Markov operator PP, and any PP-Markov process (Xt)t0(X_t)_{t \geq 0}. For a proof see Meyn & Tweedie (2009), Proposition 3.4.2.

A.5.4.4Markov Operators on Integrable Functions

Sometimes we wish to consider Markov operators as linear operators over a space of integrable functions. To this end, let PP be a stochastic kernel on X\Xsf and let ϕ\phi be stationary for PP. As before, we use the same symbol PP for the Markov operator defined in (A.95). The space L1(ϕ)L1(X,B,ϕ)L_1(\phi) \coloneq L_1(\Xsf, \bB, \phi) is the Banach lattice discussed in Example A.5.8.

The proof of part (i) follows from the following adjoint rule:

If we apply (A.101) with stationary ψ\psi we get

Ph ⁣dψ=h ⁣dψ.\int Ph \diff \psi = \int h \diff \psi.

To obtain (i) of Lemma A.5.32 we can use Ph ⁣dψPh ⁣dψ\int |Ph| \diff \psi \leq \int P|h| \diff \psi and the apply (A.102) with hh replaced by h|h|. This proves that PhPh is ψ\psi-integrable whenever hh is ψ\psi-integrable. Linearity of PP is immediate. Part (ii) follows from order preservation of the integral. For (iii), P1\|P\| \leq 1 follows from the bound on Ph ⁣dψ\int |Ph| \diff \psi just obtained, and P1\|P\| \geq 1 from P1=1P\1 = \1. Part (iv) follows from (iii) and Gelfand’s formula.

A.5.5Orders over Distributions

Distributions are objects that decision makers naturally have preferences over. For example, speculators care about probability distributions over returns of prospective investments, often preferring distributions that offer high average returns with low risk. A planner might have preferences over the cross-sectional distributions of consumption and wealth. In this section, we discuss common methods for ordering distributions and their relationships with each other.

A.5.5.1Stochastic Dominance

Let X\Xsf be a metric space and let D(X)\dD(\Xsf) be the set of all distributions (i.e., Borel probability measures) on X\Xsf. Let ibXib\Xsf be the increasing bounded real-valued functions on X\Xsf. For μ\mu and ν\nu in D(X)\dD(\Xsf), we say that

u(x)μ( ⁣dx)u(x)ν( ⁣dx)   for every u in ibX and\int u(x) \mu(\diff x) \leq \int u(x) \nu(\diff x) \; \text{ for every } u \text{ in } ib\Xsf \text{ and}
u(x)μ( ⁣dx)u(x)ν( ⁣dx)   for every concave u in ibX.\int u(x) \mu(\diff x) \leq \int u(x) \nu(\diff x) \; \text{ for every concave } u \text{ in } ib\Xsf.

If we refer to stochastic dominance without explicitly stating the order, then the understanding is that we mean first order stochastic dominance.

Suppose now that X\Xsf is a Borel subset of R\RR and fix F,GD(X)F, G \in \dD(\Xsf). We understand FF and GG as cumulative distribution functions. When testing first order stochastic dominance, it is sufficient to restrict attention to increasing functions ubXu \in b\Xsf that take the form u(x)=1{a<x}u(x) = \1\{a < x\} for some aXa \in \Xsf (see, e.g., Stachurski (2022), §9.4.1). Recalling the interpretation of the integral given in (A.28), this leads to the statement that FFGF \lefsd G if and only if 1F(a)1G(a)1 - F(a) \leq 1 - G(a) for all aXa \in \Xsf, or

FFG    G(x)F(x) for all xXF \lefsd G \iff G(x) \leq F(x) \quad \text{ for all } x \in \Xsf

A.5.5.2Monotone Likelihood Ratios

Here is a property that implies first order stochastic dominance: Consider a pair of distributions (F,G)(F, G) with positive densities ff and gg on an interval II contained in R\RR. We say that (f,g)(f, g) has a monotone likelihood ratio if f/gf/g is increasing on II; that is, if

x,xI and xx    f(x)g(x)f(x)g(x)x, x' \in I \text{ and } x \leq x' \implies \frac{f(x)}{g(x)} \leq \frac{f(x')}{g(x')}

Since rr is increasing in xx, the monotone likelihood ratio property holds.

A.5.5.3Mean-Preserving Spreads

We will be concerned with analyzing how behavior changes when decisions become “riskier” in some sense. To analyze such scenarios, we introduce the notion of a mean-preserving spread. In particular, for a given distribution ϕ\phi, we say that ψ\psi is a mean-preserving spread of ϕ\phi if there exists a pair of random variables (Y,Z)(Y, Z) such that

E[ZY]=0,Y=dϕand   Y+Z=dψ\EE[Z \given Y] = 0, \quad Y \eqdist \phi \quad \text{and } \; Y + Z \eqdist \psi

Thus, ψ\psi is a mean-preserving spread of ϕ\phi if it adds noise without changing the mean.

A.6Chapter Notes

Good introductions to real analysis include Bartle & Sherbert (2011) and Aliprantis & Burkinshaw (1998). For topology, measure theory, and functional analysis at an advanced level, Aliprantis & Border (2006) provides comprehensive coverage, while Dudley (2002) and Kreyszig (1978) offer accessible treatments. Fixed point theory in metric spaces is developed in Goebel & Kirk (1990), and nonlinear optimization is covered in Jahn (2020).

For lattice theory and order, Davey & Priestley (2002) provides a thorough introduction. High quality monographs on Riesz spaces, Banach lattices and positive operators include Aliprantis & Border (2006), Aliprantis & Burkinshaw (2006), Zaanen (2012), Meyer-Nieberg (2012), and Bátkai et al. (2017).

For further reading on probability theory and stochastic processes, Pollard (2002), Çınlar (2011) and Dudley (2002) are outstanding. For Markov chains and stochastic stability, the standard reference is Meyn & Tweedie (2009).

Footnotes
  1. More precisely, σ(C)\sigma(\cC) is the intersection of all σ\sigma-algebras on X\Xsf that contain C\cC. One can show that σ(C)\sigma(\cC) is always a well defined σ\sigma-algebra, since the intersection is nonempty (it at least contains (X)\wp(\Xsf)) and any intersection of σ\sigma-algebras is again a σ\sigma-algebra.

  2. For example, the pointwise limit of the sequence of functions {fn}\{f_n\} given by fn(x)=xnf_n(x) = x^n on [0,1][0, 1] is discontinuous.

  3. A more general perspective on (A.28) that you might find useful is as follows. Suppose we identify measurable sets with their indicator functions. Then μ\mu already provides us with an “integral” over the indicators in mA+m\aA_+. The map IμI_\mu extends the reach of this function to all of mA+m\aA_+.

  4. Some authors would call what we have described as a real vector space, which can then be extended to the notion of complex vector spaces. We have no need for this extension here, so we drop the adjective “real.”

  5. Not all Banach lattices have normalized order units. In fact it can be shown that a Banach lattice EE admits a normalized order unit if and only if EE is an AM-space with unit. We omit these details.

References
  1. Bartle, R. G., & Sherbert, D. R. (2011). Introduction to real analysis (4th ed.). Hoboken, NJ: Wiley.
  2. Aliprantis, C. D., & Border, C., Kim. (2006). Infinite dimensional analysis: a hitchhiker’s guide (3rd ed.). Springer-Verlag, New York.
  3. Dudley, R. M. (2002). Real analysis and probability (Vol. 74). Cambridge University Press.
  4. Jahn, J. (2020). Introduction to the theory of nonlinear optimization (4th ed.). Springer Nature.
  5. Goebel, K., & Kirk, W. A. (1990). Topics in metric fixed point theory. Cambridge university press.
  6. Çınlar, E. (2011). Probability and stochastics (Vol. 261). Springer Science & Business Media.
  7. Kechris, A. (2012). Classical descriptive set theory (Vol. 156). Springer Science & Business Media.
  8. Jänich, K. (1994). Linear algebra. Undergraduate Texts in Mathematics. Springer-Verlag, New York, 7, 8.
  9. Aliprantis, C. D., & Burkinshaw, O. (1998). Principles of real analysis (3rd ed.). Academic Press.
  10. Kreyszig, E. (1978). Introductory functional analysis with applications (Vol. 1). wiley New York.
  11. Davey, B. A., & Priestley, H. A. (2002). Introduction to lattices and order. Cambridge University Press.
  12. Zaanen, A. C. (2012). Introduction to operator theory in Riesz spaces. Springer.
  13. Stachurski, J. (2022). Economic dynamics: theory and computation (2nd ed.). MIT Press.
  14. Meyn, S. P., & Tweedie, R. L. (2009). Markov chains and stochastic stability. Cambridge University Press.
  15. Aliprantis, C. D., & Burkinshaw, O. (2006). Positive operators (Vol. 119). Springer Science & Business Media.