Mathematical Background - Dynamic Programming Volume II: General States

This chapter collects the mathematical tools employed throughout the book. It is intended as a reference; readers should consult the relevant sections as needed. At minimum, all readers should be familiar with the material in Section A.1 before starting Chapter 2.

A.1 Foundations

In this first section of the chapter, we cover several foundational ideas in analysis, including metrics and partial orders.

A.1.1 Properties of the Real Line

Let’s start with the real line. This set has a natural metric and a natural order, both of which have many important properties. (Later we will investigate conditions under which such properties extend to more general spaces.)

A.1.1.1 Min, Max, Sup and Inf

A point $u \in \RR$ is called an upper bound of a set $A \subset \RR$ if $a \leq u$ for all $a \in A$ . Let $U(A)$ be the set of upper bounds of $A$ . If $s \in U(A)$ and $s \leq u$ for all $u \in U(A)$ , then $s$ is called the supremum of $A$ and we write $s = \sup A$ . At most one such supremum $s$ exists. If $s$ is in $U(A)$ then the following are equivalent:

$s = \sup A$
for all $\epsilon > 0$ , there exists a point $a \in A$ with $a > s - \epsilon$

Theorem A.1.1 is essentially axiomatic. It is equivalent to the “completeness” property of $\RR$ that we discuss in Section A.1.1.2.

For $A \subset \RR$ , a lower bound of $A$ is any number $\ell$ such that $\ell \leq a$ for all $a \in A$ . If $i \in \RR$ is a lower bound for $A$ and also satisfies $i \geq \ell$ for every lower bound $\ell$ of $A$ , then $i$ is called the infimum of $A$ and we write $i = \inf A$ . At most one such $i$ exists, and every nonempty subset of $\RR$ bounded from below has an infimum.

We adopt the following conventions:

If $A$ is not bounded above, then $\sup A = +\infty$ .
If $A$ is not bounded below, then $\inf A = -\infty$ .
If $A = \varnothing$ , then $\sup A = -\infty$ and $\inf A = +\infty$ .

A number $m$ contained in a subset $A$ of $\RR$ is called the maximum of $A$ and we write $m = \max A$ if $a \leq m$ for every $a \in A$ . It is called the minimum of $A$ if $a \geq m$ for every $a \in A$ .

Given an arbitrary set $D$ and $f \in \RR^D$ , we set

\sup_{x \in D} f(x) \coloneq \sup \setntn{f(x)}{x \in D} \quad \text{and} \quad \max_{x \in D} f(x) \coloneq \max \setntn{f(x)}{x \in D}.

A point $x^* \in D$ is called a

maximizer of $f$ on $D$ if $x^* \in D$ and $f(x^*) \geq f(x)$ for all $x \in D$ , and a
minimizer of $f$ on $D$ if $x^* \in D$ and $f(x^*) \leq f(x)$ for all $x \in D$ .

Equivalently, $x^* \in D$ is a maximizer of $f$ on $D$ if $f(x^*) = \max_{x \in D} f(x)$ , and a minimizer if $f(x^*) = \min_{x \in D} f(x)$ . We define

\argmax_{x \in D} f(x) \coloneq \setntn{x^* \in D}{f(x^*) \geq f(x) \text{ for all } x \in D}.

The set $\argmin_{x \in D} f(x)$ is defined analogously.
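When $D$ is finite, the definitions above can be evaluated by direct enumeration. The following is a minimal sketch under that assumption; the helper names `argmax_set` and `argmin_set` and the test function $f(x) = x^2$ are illustrative choices, not notation from the text.

```python
def argmax_set(f, D):
    """Return the set of maximizers of f on the finite, nonempty collection D."""
    D = list(D)
    best = max(f(x) for x in D)            # max over D of f(x)
    return {x for x in D if f(x) == best}


def argmin_set(f, D):
    """Return the set of minimizers of f on the finite, nonempty collection D."""
    D = list(D)
    best = min(f(x) for x in D)            # min over D of f(x)
    return {x for x in D if f(x) == best}


D = range(-2, 3)                 # D = {-2, -1, 0, 1, 2}
f = lambda x: x * x              # hypothetical objective f(x) = x^2

maximizers = argmax_set(f, D)    # f attains its maximum at two points
minimizers = argmin_set(f, D)    # f attains its minimum at one point
```

The example illustrates that $\argmax$ is a set: it may contain several points, and for infinite $D$ it may be empty.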

As usual, given $a, b \in \RR$ we write $a \wedge b$ for $\min\{a, b\}$ and $a \vee b$ for $\max\{a, b\}$ . Regarding the order structure of $\RR$ , the following relationships are sometimes helpful: Given $x, y \in \RR$ and $a \in \RR_+$ ,

$x + y = x \vee y + x \wedge y$
$|x - y| = x \vee y - x \wedge y$
$|x - y| = x + y - 2 (x \wedge y)$
$|x - y| = 2 ( x \vee y) -x -y$
$a(x \vee y) = (ax ) \vee (ay)$
$a(x \wedge y) = (ax ) \wedge (ay)$
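The six identities can be checked mechanically on a grid of rational numbers, which avoids floating point rounding. This is only a sanity check, not a proof; the function name `check_identities` and the grids are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def check_identities(x, y, a):
    """Return True if all six identities hold for the given x, y and scalar a."""
    join, meet = max(x, y), min(x, y)      # x v y and x ^ y
    return all([
        x + y == join + meet,
        abs(x - y) == join - meet,
        abs(x - y) == x + y - 2 * meet,
        abs(x - y) == 2 * join - x - y,
        a * join == max(a * x, a * y),
        a * meet == min(a * x, a * y),
    ])


values = [Fraction(n, 3) for n in range(-6, 7)]    # grid of x and y values
scalars = [Fraction(n, 2) for n in range(0, 5)]    # nonnegative scalars a

all_hold = all(check_identities(x, y, a)
               for x, y, a in product(values, values, scalars))

# The restriction a >= 0 matters: with a = -1 the last two identities fail.
fails_for_negative = not check_identities(Fraction(1), Fraction(2), Fraction(-1))
```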

A.1.1.2 Completeness of the Real Line

Recall that a sequence $(x_n) \subset \RR$ is called Cauchy if, for all $\epsilon > 0$ , there exists an $N \in \NN$ with $|x_n - x_m| < \epsilon$ whenever $n, m \geq N$ .

The statement that every Cauchy sequence in $\RR$ converges should be understood as axiomatic. It states that, once the irrational numbers are mixed in with the rational numbers, there are “no more gaps” in the real line. This is called the completeness property of $\RR$ . See, e.g., Bartle & Sherbert (2011).

A.1.2 Partial Orders

Partially ordered spaces are the natural habitat for dynamic programs. In this section we introduce the key ideas needed for the book.

A.1.2.1 Partially Ordered Sets

The pair $(V, \preceq)$ is called a partially ordered set – or poset – if $V$ is any nonempty set and $\preceq$ is a relation on $V \times V$ such that, for any $u, v, w$ in $V$ ,

$u \preceq u$ (reflexivity),
$u \preceq v$ and $v \preceq u$ implies $u = v$ (antisymmetry), and
$u \preceq v$ and $v \preceq w$ implies $u \preceq w$ (transitivity).

The relation $\preceq$ is called a partial order on $V$ . We often write $V$ instead of $(V, \preceq)$ when $\preceq$ is understood. We sometimes say that $w$ dominates $v$ when $v \preceq w$ .

A subset $C$ of a poset $(V, \preceq)$ is called a chain in $V$ if either $u \preceq v$ or $v \preceq u$ for all $u, v \in C$ . A poset $(V, \preceq)$ is called totally ordered if $V$ itself is a chain.

In applications, one of the most important notions of partial order is the pointwise partial order. To define it we let $U, V$ be nonempty sets with $V$ partially ordered by $\preceq$ . Let $V^U$ be a set of maps from $U$ to $V$ . For each $f, g \in V^U$ , we set

f \preceq g \quad \iff \quad f(u) \preceq g(u) \text{ for all } u \in U.

(A.1)

Then $\preceq$ is a partial order on $V^U$ , usually called the pointwise order on $V^U$ .

One very common setting is where $V \subset \RR^\Xsf$ for some nonempty set $\Xsf$ . In this setting, we always write the pointwise partial order as $\leq$ . In particular, for arbitrary $u, v \in \RR^\Xsf$ we write $u \leq v$ if and only if $u(x) \leq v(x)$ for all $x \in \Xsf$ . The partial order in Example A.1.3 is a special case, when $\Xsf = \{1, \ldots, n\}$ .
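For finite $\Xsf$, the pointwise order on $\RR^\Xsf$ can be sketched in a few lines. Here functions are stored as dictionaries keyed by the elements of $\Xsf$; this representation and the name `leq` are assumptions made for illustration.

```python
def leq(u, v):
    """Pointwise order on R^X: u <= v iff u(x) <= v(x) for every x in X."""
    assert u.keys() == v.keys()            # both functions live on the same X
    return all(u[x] <= v[x] for x in u)


u = {"a": 1.0, "b": 2.0}
v = {"a": 1.5, "b": 2.0}
w = {"a": 0.0, "b": 3.0}

u_below_v = leq(u, v)                      # u(x) <= v(x) at every x
comparable_uw = leq(u, w) or leq(w, u)     # neither dominates the other
```

The pair `u`, `w` shows that the pointwise order is partial rather than total as soon as $\Xsf$ has at least two elements.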

A.1.2.2 Bounds

Let $V$ be a poset. A set $I \subset V$ is called an order interval in $V$ if there exist $a, b$ in $V$ with $a \preceq b$ such that

I = \setntn{v \in V}{a \preceq v \preceq b} .

In this case we also write $I = [a,b]$ .

Given a poset $V$ and a subset $A$ of $V$ , we call

$u \in V$ an upper bound of $A$ if $a \preceq u$ for all $a$ in $A$ and
$\ell \in V$ a lower bound of $A$ if $\ell \preceq a$ for all $a$ in $A$ .

A subset $A$ of poset $V$ is called bounded above (resp., bounded below) if the set of upper bounds (resp., lower bounds) of $A$ is nonempty (i.e., there exists at least one $v \in V$ with $a \preceq v$ for all $a \in A$ ). $A$ is called order bounded in $V$ if $A$ is both bounded above and bounded below. Obviously, $A$ is order bounded in $V$ if and only if there exists an order interval $I \subset V$ such that $A \subset I$ .

Solution to Exercise A.1.1

If $A$ is order bounded, with $A \subset [u, v] \subset \RR^n$ , then, given $a \in A$ , we have $|a_i| \leq |u_i| \vee |v_i| \leq \| u \|_\infty \vee \| v \|_\infty =: M$ for all $i$ , and hence $\|a \|_\infty \leq M$ . Hence $A$ is bounded with respect to the norm $\| \cdot \|_\infty$ , and therefore with respect to any norm on $\RR^n$ by equivalence of norms (see Section A.4.2.1). Conversely, if $A$ is bounded, with $\| a \|_\infty \leq M$ for all $a \in A$ , then $- M \1 \leq a \leq M \1$ for all $a \in A$ . Hence $A$ is order bounded.

A.1.2.3 Greatest and Least Elements

Given poset $V$ and $A \subset V$ , we say that

$g \in V$ is the greatest element of $A$ if $g \in A$ and $a \in A \implies a \preceq g$ ; and
$\ell \in V$ is the least element of $A$ if $\ell \in A$ and $a \in A \implies \ell \preceq a$ .

In other words, a greatest element of $A$ is an upper bound of $A$ that is also contained in $A$ , while a least element of $A$ is a lower bound of $A$ also contained in $A$ .
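For a finite subset $A$ the definitions can be applied by brute force. The sketch below uses the divisibility order on the positive integers ($a \preceq b$ when $a$ divides $b$) as a hypothetical example of a partial order; the helper names are illustrative and `None` is used to signal that no such element exists.

```python
def greatest_element(A, leq):
    """Return the greatest element of the finite collection A, or None if none exists."""
    for g in A:
        if all(leq(a, g) for a in A):      # g is in A and dominates every a in A
            return g
    return None


def least_element(A, leq):
    """Return the least element of the finite collection A, or None if none exists."""
    for low in A:
        if all(leq(low, a) for a in A):    # low is in A and lies below every a in A
            return low
    return None


divides = lambda a, b: b % a == 0          # a precedes b iff a divides b

A = [1, 2, 3, 6]                           # has both a greatest and a least element
B = [2, 3]                                 # 2 and 3 are not comparable

top_A, bottom_A = greatest_element(A, divides), least_element(A, divides)
top_B, bottom_B = greatest_element(B, divides), least_element(B, divides)
```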

Not all subsets of partially ordered sets have greatest elements. For one example, observe that $\NN \subset (\RR, \leq)$ has no greatest element. In this case the set of upper bounds is empty, so finding a greatest element is impossible. We can also have situations where the set of upper bounds is nonempty but no greatest element exists.

Figure A.1: The unit circle in $\RR^2$ has no greatest element

If a poset $V$ has a greatest element, that element is sometimes called the top of $V$ . A least element of $V$ is sometimes called the bottom of $V$ .

A.1.2.4 Suprema and Infima

Let $A$ be a subset of poset $V$ and let $U(A)$ be the set of all upper bounds of $A$ in $V$ . We call $s \in V$ the supremum of $A$ if $s$ is a least element of $U(A)$ . Since least elements are unique (Exercise A.1.2), subsets of $V$ can have at most one supremum. When it exists, the supremum of $A$ is denoted by $\bigvee A$ . Also,

if $A = \{a_i\}_{i \in I}$ for some index set $I$ , we write $\bigvee A$ as $\bigvee_i \, a_i$ .
Given $u$ and $v$ in $V$ , the supremum $\bigvee \{u, v\}$ is also called the join of $u$ and $v$ , and is written $u \vee v$ .

While every $A \subset \RR$ that is bounded above has a supremum (Theorem A.1.1), the same is not true for arbitrary posets.

Example A.1.10

Let $C$ be the continuous functions from $[0, 1]$ into $\RR$ . Consider the sequence of functions $F = \{f_n\}_{n \geq 2}$ where $f_n(x) = 0$ when $0 \leq x \leq 1/2$ , $f_n(x) = n(x-1/2)$ when $1/2 \leq x \leq 1/n + 1/2$ and $f_n(x) = 1$ otherwise. If $g \in U(F)$ , the set of upper bounds of $F$ in $C$ , then, by continuity and the upper bound property, it must be that $g \geq 1$ on $[1/2, 1]$ . Given any $g \in U(F)$ , we can always take a $g' \in U(F)$ with $g' \leq g$ and $g'(x) < g(x)$ for at least one $x$ . (Since $g(1/2) \geq 1 > 0$ , continuity leaves room to lower $g$ slightly just to the left of $1/2$ while keeping it nonnegative there.) Hence $U(F)$ has no least element and, as a result, $F$ has no supremum in $C$ .

Figure A.2 provides a visualization of $u \vee v$ and $u \wedge v$ when $V = (\RR^2, \leq)$ . Figure A.3 provides a visualization of $f \vee g$ and $f \wedge g$ when $V = (\RR^\Xsf, \leq)$ for some subset $\Xsf$ of the reals. In both cases, $\leq$ is the pointwise partial order.

Figure A.2: The points $u \vee v$ and $u \wedge v$ in $\RR^2$

Figure A.3: Functions $f \vee g$ and $f \wedge g$ when defined on a subset of $\RR$

Given $A$ contained in poset $V$ , an element of $V$ is called the infimum of $A$ if it is a greatest element of the set of lower bounds of $A$ . The infimum of $A$ is typically denoted $\bigwedge A$ . If $V \subset \RR$ with the usual order $\leq$ , then we also use the notation $\inf A$ . Also,

if $A = \{a_i\}_{i \in I}$ for some index set $I$ , we sometimes write $\bigwedge A$ as $\bigwedge_i \, a_i$ .
Given $u$ and $v$ in $V$ , the infimum $\bigwedge \{u, v\}$ is also called the meet of $u$ and $v$ , and is written $u \wedge v$ .
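In $(\RR^n, \leq)$ with the pointwise order, the join and meet of two vectors are the componentwise maximum and minimum. The following minimal sketch stores vectors as tuples; the names `join`, `meet`, and `leq` are illustrative assumptions.

```python
def leq(u, v):
    """Pointwise order on R^n."""
    return all(a <= b for a, b in zip(u, v))


def join(u, v):
    """Componentwise maximum, the supremum of {u, v} in (R^n, <=)."""
    return tuple(max(a, b) for a, b in zip(u, v))


def meet(u, v):
    """Componentwise minimum, the infimum of {u, v} in (R^n, <=)."""
    return tuple(min(a, b) for a, b in zip(u, v))


u, v = (1, 3), (2, 1)                      # u and v are not comparable
j, m = join(u, v), meet(u, v)

# j is an upper bound of {u, v} and lies below any other upper bound, e.g. (5, 5).
j_is_upper_bound = leq(u, j) and leq(v, j)
j_below_other_bound = leq(j, (5, 5))
```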

Exercise A.1.4

Let $\Xsf$ be any nonempty set, let $V \subset \RR^\Xsf$ , and consider $(V, \leq)$ as a poset when $\leq$ is the pointwise partial order. Let $G$ be a nonempty subset of $V$ and let $s$ and $i$ be given by

s(x) := \sup_{g \in G} g(x) \quad \text{and} \quad i(x) := \inf_{g \in G} g(x) \qquad (x \in \Xsf)

(A.2)

(The $\sup$ and $\inf$ on the right-hand side follow the rules in Section A.1.1, with $s$ taking values in $(-\infty, +\infty]$ and $i$ taking values in $[-\infty, \infty)$ .) Prove that

If $s \in V$ , then $\bigvee G$ exists in $V$ and $\bigvee G = s$ .
If $i \in V$ , then $\bigwedge G$ exists in $V$ and $\bigwedge G = i$ .

For the poset $(\RR, \leq)$ , every $x \in \RR$ is an upper bound of the empty set $\varnothing$ , since, vacuously, $x$ dominates all elements of $\varnothing$ . Since $\RR$ has no least element, the set $\varnothing \subset \RR$ has no supremum in $\RR$ . By related reasoning, if we restrict attention to $([0,1], \leq)$ , we see that 0 is the supremum of $\varnothing \subset [0,1]$ . The next exercise extends this line of reasoning.

A.1.2.5 Order Duals

Given partially ordered set $V$ , let $V^\partial = (V, \preceq^\partial)$ be the order dual (also called the dual), so that, for $u, v \in V$ , we have $u \preceq^\partial v$ if and only if $v \preceq u$ . We use $\bigvee^\partial A$ to denote the supremum of $A \subset V^\partial$ in $V^\partial$ and $\bigwedge^\partial$ for the infimum.

A.1.2.6 Monotone Sequences

Let $V$ be any poset. A sequence $(v_n)_{n \geq 1}$ in $V$ is called increasing if $v_n \preceq v_{n+1}$ for all $n \in \NN$ , and decreasing if $v_{n+1} \preceq v_n$ for all $n \in \NN$ . We write

$v_n \uparrow v$ when $(v_n)$ is increasing and $\bigvee_n v_n = v$ and
$v_n \downarrow v$ when $(v_n)$ is decreasing and $\bigwedge_n v_n = v$ .

These symbols generalize standard notation for convergence of monotone sequences in $\RR$ . For example, if $(u_n)$ is increasing in $\RR$ and its limit is $u$ , then one writes $u_n \uparrow u$ . This is a special case of the usage above, since $u$ is also the supremum of $(u_n)$ under the standard order on $\RR$ .

In some settings, the order theoretic concepts $\uparrow$ and $\downarrow$ have simple pointwise characterizations. The next lemma gives one such characterization for $\uparrow$ (and an analogous result holds for $\downarrow$ ). In the statement of the lemma, $V \subset \RR^\Xsf$ for some nonempty $\Xsf$ and $\leq$ is the pointwise partial order. Also, we say that $V$ is closed under pointwise suprema if, for every increasing $(v_n) \subset V$ that is bounded above, the pointwise supremum $s(x) = \sup_n v_n(x)$ is an element of $V$ .

Proof

( $\Rightarrow$ ) Let $(v_n)$ be increasing, let $v$ be in $V$ and suppose that $v_n(x) \uparrow v(x)$ in $\RR$ for all $x \in \Xsf$ . By Exercise A.1.4, $\bigvee_n v_n$ exists in $V$ and equals $v$ . Since $(v_n)$ is increasing, we have $v_n \uparrow v$ . ( $\Leftarrow$ ) Suppose that $v_n \uparrow v$ for some $v \in V$ . Fix $x \in \Xsf$ and note that $v_n(x)$ is increasing and bounded above by $v(x)$ . Hence the pointwise supremum function $s(x) = \sup_n v_n(x)$ exists in $\RR^\Xsf$ and $s \leq v$ . Since $V$ is closed under pointwise suprema, we also have $s \in V$ . Since $v_n \leq s \leq v$ for all $n$ and $\bigvee_n v_n = v$ , we see that $s = v$ . This means that, for any $x \in \Xsf$ , we have $\sup_n v_n(x) = v(x)$ . Hence $v_n(x) \uparrow v(x)$ . ◻

A.1.2.7 Order Preserving Maps

A map $S$ from poset $V = (V, \preceq)$ to poset $U = (U, \trianglelefteq)$ is called

order preserving if $v, w \in V$ and $v \preceq w$ implies $Sv \trianglelefteq Sw$ , and
order reversing if $v, w \in V$ and $v \preceq w$ implies $Sw \trianglelefteq Sv$ .

In the definition of order preserving above, one common setting is when $U = \RR$ with its standard order. In this case, the mapping $S$ is often called increasing. We will also use this terminology. The result in the next exercise uses the fact that the standard order on $\RR$ is closed (i.e., preserved under limits).

A.1.2.8 Strict Monotonicity

Now let’s examine a form of strict monotonicity. We consider posets $V = (V, \preceq)$ and $W = (W, \trianglelefteq)$ . For $u, v \in V$ , we write $u \prec v$ if $u \preceq v$ and not $u = v$ . For $x, y \in W$ , we write $x \vartriangleleft y$ if $x \trianglelefteq y$ and not $x = y$ . We call a map $S$ from $V$ to $W$ strictly order preserving if $v \prec w$ implies $S v \vartriangleleft S w$ . In the example below, $\leq$ is the pointwise partial order and $u<v$ means $u\leq v$ and not $u=v$ .

Example A.1.12

Let $\Usf$ and $\Xsf$ be finite sets, and let $\RR^\Xsf$ and $\RR^\Usf$ be ordered by $\leq$ . Let $P$ be a map from $\Usf \times \Xsf$ to $\RR_+$ . Assume that, for any $x \in \Xsf$ , there exists some $u \in \Usf$ such that $P(u, x)>0$ . Consider the map $P$ from $\RR^\Xsf$ to $\RR^\Usf$ defined by

(Pv)(u) = \sum_{x} v(x) P(u, x) \qquad (u \in \Usf).

We claim that $P$ is strictly order preserving. Indeed, if $v, w \in \RR^\Xsf$ with $v < w$ , then there exists some $\bar x \in \Xsf$ such that $v(\bar x) < w(\bar x)$ . Also, by assumption, there exists some $u \in \Usf$ such that $P(u, \bar x) > 0$ . As a result, we have

\begin{aligned} (P v)(u) & = \sum_{x \neq \, \bar x} v(x) P(u, x) + v(\bar x) P(u, \bar x) \\ & < \sum_{x \neq \, \bar x} v(x) P(u, x) + w(\bar x) P(u, \bar x) \leq \sum_{x} w(x) P(u, x) = (P w)(u). \end{aligned}

Since $P$ is nonnegative and $v \leq w$ , we also have $P v \leq P w$ pointwise. Hence, $P v < P w$ .
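The claim in Example A.1.12 can be illustrated numerically. In the sketch below the kernel is stored as a nested list indexed as `P[u][x]`; the specific matrices and vectors are hypothetical, chosen so that the arithmetic is exact.

```python
def apply(P, v):
    """Compute (Pv)(u) = sum over x of v(x) P(u, x) for each u."""
    return [sum(v[x] * row[x] for x in range(len(v))) for row in P]


def leq(a, b):
    return all(s <= t for s, t in zip(a, b))


def strictly_less(a, b):
    """a < b in the pointwise sense: a <= b and a != b."""
    return leq(a, b) and a != b


# Nonnegative kernel on U = {0, 1, 2}, X = {0, 1}; every column has a positive entry.
P = [[1, 0],
     [0, 2],
     [1, 1]]
v, w = [1, 1], [1, 2]                      # v < w: they differ only at x = 1

Pv, Pw = apply(P, v), apply(P, w)
strict = strictly_less(Pv, Pw)

# If the column for x = 1 is identically zero, the assumption fails and so does strictness.
P0 = [[1, 0],
      [1, 0]]
strict_without_assumption = strictly_less(apply(P0, v), apply(P0, w))
```

The second kernel shows why the positivity assumption on each column cannot be dropped.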

A.1.2.9 Order Isomorphisms

A surjective map $F$ from poset $(V,\preceq)$ to poset $(\hat V, \trianglelefteq)$ is called an

order isomorphism if $v \preceq w \iff F v \trianglelefteq F w$ , and an
order anti-isomorphism if $v \preceq w \iff F w \trianglelefteq F v$ .

(Surjective means that $F$ maps $V$ onto $\hat V$ , so each $\hat v \in \hat V$ has a preimage.) When such an order isomorphism (resp., anti-isomorphism) exists, we say that $V$ and $\hat V$ are isomorphic (resp., anti-isomorphic).

In the next two exercises, $\Xsf$ is any nonempty set and all spaces of real-valued functions have the pointwise partial order. Scalar actions on functions are applied pointwise; for example, given $h \in \RR^\Xsf$ , the function $\exp h$ maps $x$ to $\exp(h(x))$ .

Figure A.4: Visualization of Exercise A.1.10 with $\Xsf = [0,1]$

Figure A.4 provides a visualization of the result in Exercise A.1.10 when $\Xsf = [0,1]$ . The two functions are $h(x) = x^2 - 1/2$ , and $h'(x) = x$ , so that $h \leq h'$ . The middle panel sets $\theta = 2$ , which preserves the order. The right panel sets $\theta =-2$ , which reverses order.

The next exercise generalizes Exercise A.1.10.

In the following exercises, $(V, \preceq)$ and $(\hat V, \trianglelefteq)$ are arbitrary posets.

Solution to Exercise A.1.14

Let $\{p_\alpha\}_{\alpha \in \Lambda}$ be a subset of $V$ , let $V$ , $\hat V$ , and $F$ be as stated, and let $\bar p \coloneq \bigvee_\alpha p_\alpha$ . We need to show that $\bar q \coloneq F \bar p$ is the supremum of $\{F p_\alpha\}_{\alpha \in \Lambda}$ . First, $p_\alpha \preceq \bar p$ for all $\alpha$ , so $F p_\alpha \trianglelefteq F \bar p = \bar q$ for all $\alpha$ . In particular, $\bar q$ is an upper bound of $\{F p_\alpha\}_{\alpha \in \Lambda}$ . Moreover, if $u$ is any upper bound of $\{F p_\alpha\}_{\alpha \in \Lambda}$ , then $F p_\alpha \trianglelefteq u$ and hence $p_\alpha \preceq F^{-1} u$ for all $\alpha$ , so $\bar p \preceq F^{-1} u$ . But then $\bar q = F \bar p \trianglelefteq u$ . Hence $\bar q$ is the supremum of $\{F p_\alpha\}_{\alpha \in \Lambda}$ , as was to be shown.

The next exercise is related to Lemma A.1.4.

A.1.2.10 Order Stability

In Section 1.3.2.2 we discussed the fact that contractivity of the Bellman operator plays a significant role in the proof of Bellman-type optimality results for the optimal savings problem. Contractivity is a metric property that has no immediate counterpart in an abstract partially ordered set. This motivates us to introduce weaker conditions on operators that are well defined in any poset and, at the same time, strong enough to generate useful optimality results. This section gives details.

Let $V$ be a poset and let $S$ be a self-map on $V$ . In this setting, we call $S$ order stable if

(i) $S$ has a unique fixed point $\bar v$ in $V$ ,
(ii) $v \in V$ with $v \preceq S \, v$ implies $v \preceq \bar v$ , and
(iii) $v \in V$ with $S \, v \preceq v$ implies $\bar v \preceq v$ .

Conditions (ii) and (iii) say that points mapped up by $S$ lie below the fixed point, while points mapped down lie above it.

We strengthen this notion by adding monotone convergence to the fixed point. We call $S$ strongly order stable if $S$ is order stable and, in addition,

(iv) $v \in V$ with $v \preceq S \, v$ implies $S^n v \uparrow \bar v$ , and
(v) $v \in V$ with $S \, v \preceq v$ implies $S^n v \downarrow \bar v$ .

Figure A.5 gives an illustration of a strongly order stable map $S$ on $V = [0,1]$ . All points mapped up by $S$ lie below and converge up to its unique fixed point. All points mapped down by $S$ lie above and converge down to its fixed point.

Figure A.5: A strongly order stable map $S$ on $[0,1]$

Most results in this book require only order stability. Strong order stability is invoked in a small number of places where monotone convergence of the iterates is needed.
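To make the definitions concrete, the sketch below checks order stability and monotone convergence numerically for one simple map on $V = [0,1]$. The affine map $S v = v/2 + 1/4$ , with fixed point $\bar v = 1/2$ , is a hypothetical example chosen for illustration; it is not claimed to be the map plotted in Figure A.5, and a grid check is of course not a proof.

```python
def S(v):
    """Hypothetical self-map on [0, 1] with unique fixed point 1/2."""
    return 0.5 * v + 0.25


v_bar = 0.5
grid = [k / 100 for k in range(101)]       # grid on V = [0, 1]

# Points mapped up lie below the fixed point; points mapped down lie above it.
mapped_up_below = all(v <= v_bar for v in grid if v <= S(v))
mapped_down_above = all(v_bar <= v for v in grid if S(v) <= v)


def iterates(v, n=60):
    """Return the path v, Sv, S^2 v, ..., S^n v."""
    path = [v]
    for _ in range(n):
        path.append(S(path[-1]))
    return path


up = iterates(0.0)                         # starts at a point mapped up by S
down = iterates(1.0)                       # starts at a point mapped down by S

up_is_increasing = all(a <= b for a, b in zip(up, up[1:]))
down_is_decreasing = all(a >= b for a, b in zip(down, down[1:]))
```

In this example the iterates from `0.0` rise toward $1/2$ and the iterates from `1.0` fall toward $1/2$ , matching the behavior required of a strongly order stable map.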

The following result is useful when we consider minimization problems.

Proof

Let $S$ be as stated. By definition, $S$ has a unique fixed point $\bar v \in V$ . Hence it remains only to verify conditions (ii) and (iii) of order stability on $V^\partial$ . Regarding (ii), suppose $v \in V$ and $v \preceq^\partial S v$ . Then $Sv \preceq v$ and hence $\bar v \preceq v$ , by (iii) applied to $S$ on $V$ . But then $v \preceq^\partial \bar v$ , so (ii) holds on $V^\partial$ . The proof that (iii) holds on $V^\partial$ is similar. We have shown that $S$ is order stable on $V^\partial$ whenever $S$ is order stable on $V$ . The reverse implication holds because the dual of $V^\partial$ is $V$ .

The argument for strong order stability is similar: $S^n v \uparrow \bar v$ in $V$ is equivalent to $S^n v \downarrow \bar v$ in $V^\partial$ , so (iv) on $V$ corresponds to (v) on $V^\partial$ and vice versa. ◻

A.1.3 Metric Space

In this section we define metric spaces and review convergence, open and closed sets, compactness, and completeness.

A.1.3.1 Definition

Let $V$ be a nonempty set. A function $d \colon V \times V \to \RR$ is called a metric on $V$ if, for any $u, v, w \in V$ ,

$d(u,v) \geq 0$ (nonnegativity),
$d(u,v)=0 \iff u=v$ (identifiability),
$d(u,v)=d(v,u)$ (symmetry), and
$d(u,v)\leq d(u,w)+d(w,v)$ (triangle inequality).

Together, the pair $(V, d)$ is called a metric space. When the metric is clear from context we refer to the metric space by the symbol $V$ alone.

Example A.1.14 (The space $b\Xsf$)

Let $\Xsf$ be any set. Let $b\Xsf$ denote all bounded functions from $\Xsf$ to $\RR$ . For all $f, g$ in $b\Xsf$ , let

\| f \|_\infty \coloneq \sup_{x \in \Xsf} |f(x)| \quad \text{and} \quad d_\infty(f, g) \coloneq \|f - g \|_\infty.

The map $f \mapsto \| f \|_\infty$ is called the supremum norm and $d_\infty$ is called the supremum distance. The pair $(b\Xsf, d_\infty)$ is a metric space. The triangle inequality holds because, given $f, g, h$ in $b\Xsf$ and $x \in \Xsf$ , we have (by the triangle inequality in $\RR$ ),

|f(x) - g(x)| \leq |f(x) - h(x)| + |h(x) - g(x)| \leq d_\infty (f, h) + d_\infty(h, g).

The right side is an upper bound for the left side, so $d_\infty(f, g) \leq d_\infty (f, h) + d_\infty(h, g)$ .
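When $\Xsf$ is finite, the supremum in the definition of $\| \cdot \|_\infty$ is a maximum, and the metric axioms can be checked directly. In the sketch below functions on $\Xsf = \{0, 1, 2\}$ are stored as lists; the function names and the sample values are assumptions made for illustration.

```python
def sup_norm(f):
    """Supremum norm of a function on a finite set, stored as a list of values."""
    return max(abs(y) for y in f)


def d_inf(f, g):
    """Supremum distance between f and g."""
    return sup_norm([a - b for a, b in zip(f, g)])


f = [0, 2, -1]
g = [1, -1, 3]
h = [2, 0, 0]

dist_fg = d_inf(f, g)                      # largest pointwise gap between f and g
dist_fh = d_inf(f, h)
dist_hg = d_inf(h, g)

triangle_holds = dist_fg <= dist_fh + dist_hg
symmetric = d_inf(f, g) == d_inf(g, f)
identifiable = d_inf(f, f) == 0 and dist_fg > 0
```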

Example A.1.15 (The space $\ell_p(\Xsf)$)

Let $\Xsf$ be finite or countable and fix $p$ with $1 \leq p < \infty$ . Let

\| h \|_p \coloneq \left\{ \sum_{x \in \Xsf} |h(x)|^p \right\}^{1/p} \quad \text{and} \quad d_p(g, h) = \| g - h \|_p.

With $\ell_p(\Xsf) \coloneq \left\{ h \in \RR^\Xsf \,:\, \| h \|_p < \infty \right\}$ , the pair $(\ell_p(\Xsf), d_p)$ is a metric space. The triangle inequality can be established via the Hölder inequality, which states that $\| f g \|_1 \leq \| f \|_p \, \| g \|_q$ whenever $p, q \in [1, \infty]$ with $1/p + 1/q = 1$ . In this setting the triangle inequality is also called the Minkowski inequality.
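Both inequalities can be spot-checked numerically on a finite $\Xsf$ . The sketch below uses the conjugate pair $p = 3$ , $q = 3/2$ and two arbitrary vectors; all names and values are illustrative assumptions, and a small tolerance guards against floating point rounding.

```python
def p_norm(h, p):
    """The l_p norm of a function on a finite set, stored as a list of values."""
    return sum(abs(y) ** p for y in h) ** (1 / p)


f = [1.0, -2.0, 3.0]
g = [0.5, 4.0, -1.0]
p, q = 3.0, 1.5                            # conjugate exponents: 1/3 + 2/3 = 1
tol = 1e-12

# Hoelder: ||f g||_1 <= ||f||_p ||g||_q, where f g is the pointwise product.
holder_lhs = sum(abs(a * b) for a, b in zip(f, g))
holder_rhs = p_norm(f, p) * p_norm(g, q)
holder_holds = holder_lhs <= holder_rhs + tol

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p.
minkowski_lhs = p_norm([a + b for a, b in zip(f, g)], p)
minkowski_rhs = p_norm(f, p) + p_norm(g, p)
minkowski_holds = minkowski_lhs <= minkowski_rhs + tol
```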

If $(V, d)$ is a metric space and $N \subset V$ , then $(N, d)$ is also a metric space (where $d$ in the second case is defined by restricting the original metric to $(u, v) \in N \times N$ ).

A.1.3.2 Convergence

Given any point $u$ in metric space $(V, d)$ , the $\epsilon$ -ball around $u$ is the set

B_\epsilon(u) \coloneq \setntn{v \in V}{d(u, v) < \epsilon} .

We say that sequence $(u_n) \subset V$ converges to $u \in V$ if

\forall \, \epsilon > 0,\; \exists \, n_\epsilon \in \NN \st n \geq n_\epsilon \implies u_n \in B_\epsilon(u).

A subsequence of a sequence $(u_n)$ in $V$ is any sequence of the form $(u_{n_k})_{k \geq 1}$ , where $(n_k)$ is a strictly increasing sequence in $\NN$ .

A metric space $V$ is called separable if there exists a countable set $A \subset V$ such that, for any $v \in V$ , there exists a sequence $(a_n)$ contained in $A$ with $a_n \to v$ . For example, $\RR$ is separable because any $v \in \RR$ can be expressed as the limit of a rational sequence. Separability is useful in certain settings – particularly when we need to combine topology and measure (see, e.g., Theorem A.3.3). In the applications we consider, most spaces will be separable.

A.1.3.3 Open and Closed Sets

Let $V$ be a metric space. A point $u \in A \subset V$ is called interior to $A$ if there exists an $\epsilon > 0$ such that $B_\epsilon(u) \subset A$ .

A subset $G$ of $V$ is called open in $V$ if every $u \in G$ is interior to $G$ . For example, every subset of a discrete metric space is open, since $B_{1/2}(u) = \{u\}$ for any $u$ .

A subset $F$ of $V$ is called closed if given any sequence $(u_n)$ satisfying $u_n \in F$ for all $n$ and $u_n \to u$ for some $u \in V$ , the point $u$ is in $F$ . In other words, $F$ contains the limit points of all convergent sequences that take values in $F$ . Arbitrary unions and finite intersections of open sets are open, while arbitrary intersections and finite unions of closed sets are closed. A set $G \subset V$ is open if and only if $G^c$ is closed.

A.1.3.4 Compactness

A set $D$ in $V$ is called bounded if there exists a finite $K$ such that $d(u, v) \leq K$ whenever $u, v \in D$ . A sequence in $V$ is called bounded if its range is a bounded subset of $V$ . A subset $K$ of $V$ is called precompact in $V$ if every sequence in $K$ has a subsequence converging to some point in $V$ . The set $K$ is called compact if, in addition, the limit points always lie in $K$ . Thus, $K$ is compact if and only if $K$ is closed and precompact.

The following theorem is a bedrock of real analysis.

Every precompact subset of a metric space is bounded, but the converse is not true in general. For example, consider the set $b\RR$ with the supremum distance (Example A.1.14). Let $f_n$ be the normal density with variance 1 and mean $n$ for each $n$ in $\NN$ . The set $\{f_n\}_{n \in \NN}$ is bounded, since $d_\infty(f_n, f_m) \leq 1$ for all $n, m$ . But it is not precompact. For example, the sequence $\{f_n\}_{n \in \NN}$ has no convergent subsequence. Indeed, every pair of distinct points $f_n, f_m$ in the sequence satisfies $d_\infty(f_n, f_m) \geq f_n(n) - f_m(n) \geq (1 - e^{-1/2})/\sqrt{2\pi} > 0$ , so no subsequence is Cauchy.
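The separation between the densities can be examined numerically. Evaluating both densities at the mean of $f_n$ gives a lower bound on $d_\infty(f_n, f_m)$ , since the supremum distance dominates the gap at any single point. The sketch below computes this lower bound for a handful of pairs; the function names are illustrative assumptions.

```python
import math


def normal_pdf(x, mean):
    """Density at x of the normal distribution with the given mean and variance 1."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2 * math.pi)


def separation_lower_bound(n, m):
    """|f_n(n) - f_m(n)|, which is a lower bound on d_inf(f_n, f_m)."""
    return abs(normal_pdf(n, n) - normal_pdf(n, m))


pairs = [(n, m) for n in range(1, 8) for m in range(1, 8) if n != m]
smallest_gap = min(separation_lower_bound(n, m) for n, m in pairs)

# Each density is bounded by its peak value 1 / sqrt(2 pi), so the set is bounded.
peak = normal_pdf(0.0, 0.0)
```

Because the gap between distinct elements stays bounded away from zero, no subsequence can be Cauchy, which is consistent with the failure of precompactness.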

Later, in Section A.4.2.3, we will see sufficient conditions for boundedness to imply precompactness.

A.1.3.5 Completeness

Let $V = (V, d)$ be a metric space. Analogous to the real case (see Section A.1.1.2), a sequence $(u_n) \subset V$ is called Cauchy if, given any $\epsilon > 0$ , there exists an $n_\epsilon \in \NN$ such that $n, m \geq n_\epsilon$ implies $d(u_n, u_m) < \epsilon$ . $(V, d)$ is called complete if every Cauchy sequence in $V$ converges in $V$ . Examples of complete spaces include $\RR^n$ paired with any metric generated by a norm, the set of $n \times k$ matrices paired with any metric generated by a norm, the space $(\ell_p(\Xsf), d_p)$ for countable $\Xsf$ and $p \in [1, \infty]$ , the space $(b\Xsf, d_\infty)$ , and the space $(bc\Xsf, d_\infty)$ . Two metrics $d_1$ and $d_2$ on $V$ are called equivalent if there are positive constants $\alpha, \beta$ such that $\alpha d_1(u, v) \leq d_2(u, v) \leq \beta d_1(u, v)$ for all $u, v \in V$ . Equivalent metrics generate the same Cauchy sequences, so completeness is preserved under equivalence.

A.2 Topology

Topological spaces are a generalization of metric spaces. They are useful for two reasons. One is that there exist interesting and useful topological spaces that cannot be represented as metric spaces. The second is that, by stripping away some of the structure naturally present in metric spaces, topological arguments add simplicity and clarity to many discussions in analysis.

A.2.1 Topological Space

We begin by introducing topological spaces and investigating some of their core characteristics.

A.2.1.1 Definition and Examples

A topological space is a pair $(V, \tau)$ where $V$ is a nonempty set and $\tau$ is a collection of subsets of $V$ such that

(i) $\varnothing$ and $V$ are both in $\tau$ ,
(ii) $\tau$ is closed under finite intersections, and
(iii) $\tau$ is closed under arbitrary unions.

Statements (ii) and (iii) mean that

A, B \in \tau \implies A \cap B \in \tau \quad \text{and} \quad \aA \subset \tau \implies \cup_{A \in \aA} A \in \tau.

The family $\tau$ is called a topology on $V$ . The elements of $\tau$ are called open sets. Complements of open sets are called closed.

A subset $N$ of a topological space $V = (V, \tau)$ is called a neighborhood of a point $v \in V$ if there exists a $G \in \tau$ with $v \in G \subset N$ . A topological space $V$ is called a Hausdorff space if, for any $u, v \in V$ with $u \neq v$ , there exist neighborhoods $N$ of $u$ and $M$ of $v$ with $N \cap M = \varnothing$ . Every metrizable space is Hausdorff, and all topological spaces we consider in this book are Hausdorff spaces.

A.2.1.2 Nets

We briefly introduce nets, which generalize sequences. Nets are important because (a) they characterize topologies, in a sense described below, and (b) they allow us to state definitions and properties in a way that connects neatly to sequence-based definitions in metric spaces.

Let $A$ be any nonempty set. A preorder on $A$ is a relation $\preceq$ on $A \times A$ such that, for any $a, b, c$ in $A$ we have $a \preceq a$ (reflexivity) and $a \preceq b$ and $b \preceq c$ implies $a \preceq c$ (transitivity). Obviously any antisymmetric preorder on $A$ is a partial order on $A$ . A directed set is a nonempty set $A$ and a preorder $\preceq$ on $A$ such that, for any $a, b \in A$ , there exists a $c \in A$ with $a \preceq c$ and $b \preceq c$ .

Let $V$ be any set. A net in $V$ is a function from a directed set $A$ to $V$ , typically written as $v_\bullet$ or $(v_\alpha)_{\alpha \in A}$ . We sometimes simplify the latter to $(v_\alpha)$ . The interpretation is that $\alpha \in A$ is mapped to $v_\alpha \in V$ . Obviously any sequence $(v_n)$ in $V$ is also a net in $V$ .

A net $(v_\alpha)_{\alpha \in A}$ in $V$ is said to converge to $v \in V$ and we write $v_\alpha \to v$ if, for any neighborhood $N$ of $v$ , there exists a $\beta \in A$ such that $v_\alpha \in N$ whenever $\beta \preceq \alpha$ . This generalizes the concept of convergence of sequences in metric space. It is easy to check that convergent nets in $V$ have unique limits whenever $V$ is Hausdorff. (The converse is also true.)

The next theorem shows that nets can be used to characterize topologies.

For a proof of Theorem A.2.1, see Theorem 2.14 of Aliprantis & Border (2006).

Let $(v_\alpha)_{\alpha \in A}$ and $(w_\beta)_{\beta \in B}$ be two nets in $V$ . The net $(w_\beta)_{\beta \in B}$ is called a subnet of $(v_\alpha)_{\alpha \in A}$ if there exists an order preserving map $p$ from $B$ to $A$ such that (i) $w_\beta = v_{p(\beta)}$ for all $\beta \in B$ and (ii) for all $\alpha \in A$ , there exists an $\alpha' \in p(B)$ such that $\alpha \preceq \alpha'$ .

Subnets generalize subsequences. For example, if $w_n = 1/n^2$ and $v_n = 1/n$ for $n \in \NN$ , then $(w_n)_{n \in \NN}$ is a subnet of $(v_n)_{n \in \NN}$ in $\RR$ (take $A = B = \NN$ and $p(n) = n^2$ ).
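The subnet example can be verified on a finite truncation of $\NN$ . The sketch below checks, for indices up to 50, that $w_n = v_{p(n)}$ , that $p$ is order preserving, and that the range of $p$ eventually dominates every index. Exact rational arithmetic is used, and the truncation level is an arbitrary choice made for illustration.

```python
from fractions import Fraction

v = lambda n: Fraction(1, n)               # v_n = 1/n
w = lambda n: Fraction(1, n * n)           # w_n = 1/n^2
p = lambda n: n * n                        # index map p(n) = n^2

N = range(1, 51)                           # finite truncation of the index set

matches = all(w(n) == v(p(n)) for n in N)                 # w_n = v_{p(n)}
order_preserving = all(p(n) <= p(n + 1) for n in N)       # n <= n + 1 implies p(n) <= p(n + 1)
cofinal = all(any(a <= p(b) for b in N) for a in N)       # each index lies below some p(b)
```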

A subset $K$ of topological space $V$ is called compact if, given any net $(v_\alpha)$ contained in $K$ , there exists a subnet $(w_\beta)$ of $(v_\alpha)$ and a point $v \in K$ such that $w_\beta \to v$ . This generalizes the notion of a compact subset of a metric space, as given in Section A.1.3.4.

A.2.1.3 Continuous Functions

Let $V$ and $W$ be topological spaces. A function $f \colon V \to W$ is said to be continuous at $v \in V$ if, for any net $(v_\alpha)$ in $V$ with $v_\alpha \to v$ in $V$ we have $f(v_\alpha) \to f(v)$ in $W$ . If $f$ is continuous at every $v \in V$ we simply say that $f$ is continuous. It is well-known that $f$ is continuous on $V$ if and only if $f^{-1}(G)$ is open in $V$ whenever $G$ is open in $W$ . (For a proof of this equivalence, see, e.g., Theorem 2.28 of Aliprantis & Border (2006).)

One of the most important features of continuous functions is that they carry compact sets into compact sets (see, e.g., §2.3 of Dudley (2002)):

A.2.1.4 Initial Topologies

Let $V$ be a nonempty set and, for each $\alpha$ in index set $\Lambda$ , let $f_\alpha$ be a function from $V$ to topological space $(W_\alpha, \tau_\alpha)$ . The initial topology generated by $\{f_\alpha\}_{\alpha \in \Lambda}$ is the topology $\tau$ on $V$ generated (in the sense of Example A.2.3) by the family of sets

\aA \coloneq \setntn{f_\alpha^{-1}(G)}{G \in \tau_\alpha, \; \alpha \in \Lambda}.

Evidently each $f_\alpha$ is continuous with respect to $\tau$ on $V$ . The following lemma nicely characterizes convergence with respect to the initial topology. In the statement, $\tau$ is the initial topology just described.

A.2.1.5Metrizable Spaces¶

A topological space $(V, \tau)$ is called metrizable if there exists a metric $d$ on $V$ such that $d$ generates the topology $\tau$ . In metrizable spaces, sequences have the same “rights” as sequences in Euclidean space, in the sense that they determine the topology and hence other derived objects such as continuous functions. For example, given two metrizable spaces $V$ and $W$ ,

a function $f \colon V \to W$ is continuous if and only if, for any $v \in V$ and any sequence $(v_n) \subset V$ , we have $f(v_n) \to f(v)$ in $W$ whenever $v_n \to v$ in $V$ .
A set $C$ is closed in $V$ if and only if any convergent sequence contained in $C$ converges to an element of $C$ .

Equivalent metrics generate the same topology, which is why it is often nicer to discuss topologies than metrics. For example, we will see later (in Section A.4.2) that all metrics on Euclidean space $\RR^n$ generated by a norm are equivalent. Hence, while there are infinitely many norms on $\RR^n$ , they all generate the same topology. This means that, when discussing norm topologies, we can speak unambiguously about open sets, compact sets, continuous functions, etc.

A.2.1.6Product Topologies¶

Let $\{(V_n, \tau_n)\}_{n \in \NN}$ be a family of topological spaces and consider the Cartesian product $V = \prod_{n \in \NN} V_n$ . The $i$ -th projection map on $V$ is the function $\pi_i$ sending $v = (v_n)_{n \in \NN} \in V$ into $v_i$ . The product topology on $V$ , denoted here by $\tau$ , is the initial topology generated by the set of projection maps $\{ \pi_n \}_{n \in \NN}$ . The following result is a direct consequence of Lemma A.2.3.

Example A.2.7

Consider $\RR$ with its usual topology, generated by the metric $d(u, v) = |u-v|$ . The set of $n$ -vectors $\RR^n$ is the Cartesian product of $n$ copies of $\RR$ . The projections can be identified with the canonical basis vectors $e_1, \ldots, e_n$ , since, given $u \in \RR^n$ , the $i$ -th projection is $\pi_i(u) = u_i = \inner{u, e_i}$ . In view of Lemma A.2.3, a sequence $(u_k)$ converges to $u \in \RR^n$ in the product topology if and only if $\inner{u_k, e_i} \to \inner{u, e_i}$ in $\RR$ for all $i$ in $\{1, \ldots, n\}$ . In other words, a sequence in $\RR^n$ converges in the product topology if and only if it converges pointwise.

More generally, if $((\Xsf_i, d_i))_{i=1}^n$ are metric spaces and $\Xsf \coloneq \prod_i \Xsf_i$ has the product topology, then $(u_k) \subset \Xsf$ converges to $u \in \Xsf$ if and only if $d_i(\pi_i(u_k), \pi_i(u)) \to 0$ for all $i$ .

A.2.1.7Existence of Extrema¶

For finite subsets of $\RR$ , maxima and minima clearly exist. For infinite collections the same is not true. For example, the set $(0, 1)$ has neither a maximum nor a minimum.

Under what conditions on primitives are maxima and minima guaranteed to exist? There are multiple approaches to this issue, depending on the structure of the problem. In this section we treat one of the most fundamental, attributed to the German mathematician Karl Weierstrass (1815–1897).

Let $f$ be a function from a metric space $V$ to $\RR$ . Let $(v_n)$ be a $V$ -valued sequence and let $v$ be a point in $V$ . The function $f$ is called

lower semicontinuous at $v$ if $v_n \to v$ implies $f(v) \leq \liminf_n f(v_n)$ , and
upper semicontinuous at $v$ if $v_n \to v$ implies $f(v) \geq \limsup_n f(v_n)$ .

If $f$ is lower semicontinuous at every point in $V$ , then $f$ is called lower semicontinuous, and similarly for upper semicontinuity.

A proof of the next theorem can be found in Jahn (2020).

A.2.2Stability and Contractions¶

One of the most important approaches to fixed points in metric space is via the theory of contractive maps. Here we review key results.

A.2.2.1Fixed Points¶

Let $V$ be any set and let $S$ be a self-map on $V$ . If $v \in V$ obeys $Sv = v$ , then $v$ is called a fixed point of $S$ in $V$ . For example, if $V = \RR$ and $S$ is the identity, then every point in $\RR$ is fixed under $S$ . If, instead, $Sx = x^2$ , then the set of fixed points is $\{0, 1\}$ .

Now let $V$ be a topological space. We call $S \colon V \to V$ globally stable on $V$ if $S$ has a unique fixed point $u^* \in V$ and $S^k u \to u^*$ as $k \to \infty$ for all $u \in V$ . When $V$ is metrizable, with metric $d$ , a self-map $S$ is called asymptotically contracting if $d(S^n u, S^n v) \to 0$ as $n \to \infty$ for all $u, v \in V$ .

We will often make use of the following lemma.

A.2.2.2Contractions¶

A self-map $S$ on metric space $V \coloneq (V, d)$ is called contracting or, more specifically, a contraction of modulus $\lambda$ if there exists a $\lambda \in [0, 1)$ such that

d(Su, Sv) \leq \lambda d(u, v) \quad \text{for all} \quad u, v \in V

(A.4)

For a proof, see, for example, Aliprantis & Border (2006), Theorem 3.48.

Most of the conclusions of Banach’s contraction mapping theorem carry over when $S$ is eventually contracting; that is, when $S^k$ is contracting for some $k \in \NN$ . A proof can be found on p. 9 of Goebel & Kirk (1990).
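To make the contraction property (A.4) concrete, here is a minimal Python sketch of successive approximation. The map $Sx = x/2 + 1$ on $\RR$ is a hypothetical example chosen for illustration: it is a contraction of modulus $\lambda = 1/2$ under $d(u, v) = |u - v|$ with fixed point $u^* = 2$ . Applying (A.4) repeatedly with $v = u^*$ gives $d(S^k u, u^*) \leq \lambda^k d(u, u^*)$ , which the code checks numerically.

```python
def S(x):
    """Illustrative contraction on R with modulus 0.5 (hypothetical example)."""
    return 0.5 * x + 1.0

lam = 0.5      # modulus of contraction for S
u_star = 2.0   # fixed point of S, since S(2) = 2

def iterate(S, u, k):
    """Return S^k u, the k-th iterate of S starting from u."""
    for _ in range(k):
        u = S(u)
    return u

u0 = 10.0  # arbitrary initial condition

# Distance between the k-th iterate and the fixed point
errors = [abs(iterate(S, u0, k) - u_star) for k in range(20)]

# Geometric bound implied by (A.4): |S^k u - u*| <= lam^k |u - u*|
bounds = [lam**k * abs(u0 - u_star) for k in range(20)]
```

In this example the bound holds with equality because $S$ is affine; for a general contraction one should expect strict inequality.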

A.3Measure and Integration¶

In this section we review measurable functions and integration theory. Measurable functions generalize continuous functions while remaining closed under standard arithmetic and limiting operations, and they admit a well-defined theory of integration. Throughout, for real-valued $f$ on an arbitrary domain, we set $f^+ \coloneq f \vee 0$ and $f^- \coloneq - (f \wedge 0)$ . See Figure A.6 for an illustration. The function $f^+$ is called the positive part of $f$ , while $f^-$ is called the negative part. The identity $f = f^+ - f^-$ always holds, so the pair $f^+$ , $f^-$ provides a decomposition of $f$ into the difference between two nonnegative functions.

A.3.1Measure Theory¶

We review measurable spaces, measurable functions, parametric continuity and measurable selections, and measures.

A.3.1.1Measurable Space¶

Let $\Xsf$ be any nonempty set. A collection of subsets $\aA$ of $\Xsf$ is called a $\sigma$ -algebra on $\Xsf$ if

$\Xsf \in \aA$ ,
$A \in \aA$ implies $A^c \in \aA$ , and
if $\{A_n\}_{n \geq 1}$ is a sequence contained in $\aA$ , then $\cup_n A_n \in \aA$ .

A pair $(\Xsf, \aA)$ where $\Xsf$ is a nonempty set and $\aA$ is a $\sigma$ -algebra on $\Xsf$ is called a measurable space.

Points (ii) and (iii) tell us that $\aA$ is “stable” under the taking of complements and unions. By De Morgan’s law $(\cap_n A_n)^c = \cup_n A_n^c$ , any $\sigma$ -algebra is stable under countable intersections too. By (i) and (ii), $\varnothing \in \aA$ also holds.

One way to define a $\sigma$ -algebra is to take a collection $\cC$ of subsets of $\Xsf$ , and consider the smallest $\sigma$ -algebra that contains this collection.
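When $\Xsf$ is finite, the smallest $\sigma$ -algebra containing a collection $\cC$ can be computed by brute force. The sketch below is one way to do this, assuming a small finite set and a hypothetical collection `C`; on a finite set it suffices to close the family under complements and pairwise unions. The function name `generate_sigma_algebra` is ours and not part of any library.

```python
from itertools import combinations

def generate_sigma_algebra(X, C):
    """Smallest sigma-algebra on the finite set X containing the sets in C.

    On a finite set, closing under complements and pairwise unions suffices.
    """
    X = frozenset(X)
    family = {X, frozenset()} | {frozenset(c) for c in C}
    while True:
        new = {X - A for A in family}                        # complements
        new |= {A | B for A, B in combinations(family, 2)}   # pairwise unions
        if new <= family:                                    # nothing new: closed
            return family
        family |= new

X = {1, 2, 3, 4}
C = [{1}, {1, 2}]          # illustrative generating collection
sigma = generate_sigma_algebra(X, C)
```

Here the generated $\sigma$ -algebra consists of all unions of the three blocks $\{1\}$ , $\{2\}$ and $\{3, 4\}$ , so it has $2^3 = 8$ elements and does not contain $\{3\}$ .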

Now let $\Xsf$ be a metric space. The family of Borel sets on $\Xsf$ , denoted by either $\bB$ or $\bB_\Xsf$ depending on whether or not the underlying space is clear, is defined as the $\sigma$ -algebra generated by the open sets of $\Xsf$ . Evidently $\bB$ contains not only all the open subsets of $\Xsf$ but also all the closed ones. From these sets we can continue taking complements and countable unions and everything we produce must be a Borel set. In fact it turns out that every set we work with in day-to-day analysis is a Borel set.

A.3.1.2Measurable Functions¶

Given two arbitrary measurable spaces $(\Xsf, \aA)$ and $(\Ysf, \bB)$ , a function $f$ from $\Xsf$ to $\Ysf$ is called $(\aA, \bB)$ -measurable if

f^{-1}(B) \text{ is in } \aA \text{ whenever } B \in \bB.

In other words, measurable functions are those functions that pull measurable sets back to measurable sets. If $\Ysf$ is a metric space and $\bB$ is its Borel sets, then we will say that $f$ is Borel measurable. It can be shown in this case (see, e.g., Çınlar (2011), Proposition 2.3) that $f$ is Borel measurable if and only if either one of the following apparently weaker conditions is satisfied:

$f^{-1}(G)$ is in $\aA$ whenever $G$ is open in $\Ysf$
$\Ysf$ is a Borel subset of $\RR$ and $f^{-1}((-\infty, \alpha))$ is in $\aA$ for all $\alpha \in \RR$ .

From this result it is immediate that every continuous function from $\Xsf$ to $\Ysf$ is also Borel measurable.

While the class of continuous functions has beautiful properties and is closed under uniform limits (see Example A.1.20), it is not closed under pointwise limits,^[2] which makes it hard to work with in some instances. On the other hand, the set of Borel functions is closed under the taking of pointwise limits:

In fact, in our setting, the set of Borel measurable functions is precisely the smallest class of functions that contains the continuous functions and is closed under the taking of pointwise limits (see, e.g., §11.7 of Kechris (2012)).

It is also true that compositions of Borel measurable functions are Borel measurable, and, when the functions are real-valued, that Borel measurability is preserved under algebraic operations. The next lemma gives one statement of these results:

See Çınlar (2011), Chapter 1, Section 2 for proofs.

A.3.1.3Parametric Continuity and Measurable Selections¶

We often wish to know whether or not continuity passes from primitives to solutions. For example, we might ask whether an equilibrium object, constructed through a process that involves optimization, varies continuously with parameters. The most commonly used theorem in this domain is Berge’s theorem of the maximum. Here we state a version of Berge’s theorem. Throughout, $\Asf$ and $\Xsf$ are metric spaces.

A correspondence from $\Xsf$ to $\Asf$ is a map $\Gamma$ from $\Xsf$ to the set of all subsets of $\Asf$ . A correspondence $\Gamma$ from $\Xsf$ to $\Asf$ is called nonempty if $\Gamma(x)$ is nonempty for all $x \in \Xsf$ . A function $\sigma$ from $\Xsf$ to $\Asf$ is called a measurable selection with respect to $\Gamma$ if $\sigma$ is Borel measurable and $\sigma(x) \in \Gamma(x)$ for all $x \in \Xsf$ .

A nonempty correspondence $\Gamma$ is called

compact-valued if $\Gamma(x)$ is compact for all $x \in \Xsf$ ,
lower hemi-continuous at $x \in \Xsf$ if, for any $y \in \Gamma(x)$ and any $(x_n)$ with $x_n \to x$ , there exists a sequence $(y_n)$ with $y_n \in \Gamma(x_n)$ for all $n$ and $y_n \to y$ ,
upper hemi-continuous at $x \in \Xsf$ if, for any sequence $(x_n)$ with $x_n \to x$ and any sequence $(y_n)$ with $y_n \in \Gamma(x_n)$ for all $n$ , there exists a convergent subsequence of $(y_n)$ whose limit is in $\Gamma(x)$ , and
continuous on $\Xsf$ if it is both lower and upper hemi-continuous at every $x \in \Xsf$ .

Let $\Gamma$ be a nonempty, compact-valued correspondence from $\Xsf$ to $\Asf$ . Let $q$ be a real valued function on $\Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)}$ and set

m(x) \coloneq \max_{a \in \Gamma(x)} q(x, a) \qquad (x \in \Xsf)

(A.5)

whenever the maximum is well defined.
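To illustrate (A.5), the following Python sketch approximates $m$ by grid search in a hypothetical example: $\Xsf = \Asf = [0, 1]$ , $\Gamma(x) = [0, x]$ and $q(x, a) = a (x - a)$ . These choices are assumptions made for illustration only. In this case the maximizer is $a = x/2$ , so $m(x) = x^2 / 4$ , which varies continuously with $x$ .

```python
def q(x, a):
    """Illustrative objective (hypothetical example): q(x, a) = a * (x - a)."""
    return a * (x - a)

def m(x, grid_size=1001):
    """Approximate m(x), the max of q(x, a) over a in Gamma(x) = [0, x]."""
    # Evenly spaced grid on Gamma(x); an odd grid size includes the midpoint x/2
    a_grid = [x * i / (grid_size - 1) for i in range(grid_size)]
    return max(q(x, a) for a in a_grid)

x_grid = [i / 10 for i in range(11)]   # states 0.0, 0.1, ..., 1.0
m_vals = [m(x) for x in x_grid]        # approximate value of m at each state
```

Grid search is used only for transparency; it relies on $\Gamma(x)$ being a compact interval, so that a fine grid comes close to the maximizer.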

A proof of the continuity results in Theorem A.3.3 can be found in §17.5 of Aliprantis & Border (2006). Existence of a measurable selection is proved in §18.19 of the same reference.

A.3.1.4Measures¶

Through the theory constructed above, we can identify broad classes of sets and functions that are relatively well behaved (e.g., Borel sets and Borel functions). This opens the way to analyzing how to (a) measure these sets and (b) integrate the functions. The first step is to introduce the notion of a measure, which is a map $\mu$ from a $\sigma$ -algebra $\aA$ to $[0, \infty]$ satisfying

$\mu(\varnothing) = 0$ and
$\mu(\cup_{n=1}^\infty A_n) = \sum_{n=1}^\infty \mu(A_n)$ whenever $\{A_n\} \subset \aA$ is disjoint.

Here disjointness of $\{A_n\}$ means that any two distinct sets in this sequence are disjoint.

Returning to the general case of a measure $\mu$ on measurable space $(\Xsf, \aA)$ , if there exists a sequence of sets $(A_n) \subset \aA$ with $\mu(A_n) < \infty$ for all $n$ and $\cup_n A_n = \Xsf$ , then $\mu$ is called $\sigma$ -finite. If $\mu(\Xsf) < \infty$ , then $\mu$ is called finite. If $\mu(\Xsf) = 1$ , then $\mu$ is called a probability measure.

If $\Xsf$ is a metric space and $\aA = \bB$ (the Borel sets), then $\mu$ is called a Borel measure. If $\aA = \bB$ and $\mu(\Xsf) = 1$ , then $\mu$ is called a Borel probability measure. For a Borel probability measure $\mu$ , the value $\mu(B)$ usually is interpreted as the probability that, when a random element of $\Xsf$ is selected, that element is in $B$ .

A measure space is a triple $(\Xsf, \aA, \mu)$ where $(\Xsf, \aA)$ is a measurable space and $\mu$ is a measure on $\aA$ . If $\mu(\Xsf) = 1$ , then the measure space is also called a probability space. In this case it is common to write the measure space as $(\Omega, \fF, \PP)$ . A random variable on probability space $(\Omega, \fF, \PP)$ is an $(\fF, \bB)$ -measurable map $X$ from $\Omega$ to $\RR$ paired with its Borel sets $\bB$ . More generally, given measurable space $(E, \eE)$ , an $E$ -valued random element on probability space $(\Omega, \fF, \PP)$ is an $(\fF, \eE)$ -measurable map $X$ from $\Omega$ to $E$ . The distribution of this random element $X$ is the probability measure $P$ defined by

P(B) = \PP\setntn{\omega \in \Omega}{X(\omega) \in B} \qquad (B \in \eE)

Here’s a reassuring fact implying that Borel probability measures on $\RR$ are isomorphic to a set of very familiar objects.

More generally, we have the interpretation

\mu(B) = \text{ probability that } x \in B \text{ when } x \text{ is drawn from } F

A.3.1.5Product Spaces and Product Measures¶

Given measurable spaces $(\Xsf, \aA)$ and $(\Ysf, \bB)$ , the product $\sigma$ -algebra $\aA \otimes \bB$ is the $\sigma$ -algebra on $\Xsf \times \Ysf$ generated by all sets of the form $A \times B$ with $A \in \aA$ and $B \in \bB$ . The pair $(\Xsf \times \Ysf, \, \aA \otimes \bB)$ is called the product measurable space.

If $\mu$ and $\nu$ are $\sigma$ -finite measures on $(\Xsf, \aA)$ and $(\Ysf, \bB)$ respectively, then there exists a unique measure $\mu \otimes \nu$ on $\aA \otimes \bB$ satisfying

(\mu \otimes \nu)(A \times B) = \mu(A) \, \nu(B) \qquad (A \in \aA, \; B \in \bB).

The measure $\mu \otimes \nu$ is called the product measure. The construction extends naturally to finite products of measurable spaces.
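For finite sets, where every subset is measurable, the product measure can be written down explicitly. The sketch below assumes two hypothetical finite measures given by point masses and checks the defining identity on one measurable rectangle. All names and numerical values are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two finite measures described by their point masses (illustrative values)
mu = {"x1": Fraction(1, 2), "x2": Fraction(3, 2)}
nu = {"y1": Fraction(1, 3), "y2": Fraction(2, 3), "y3": Fraction(1)}

# Product measure on single points: mass of (x, y) is mu({x}) * nu({y})
mu_nu = {(x, y): mu[x] * nu[y] for x, y in product(mu, nu)}

def measure(weights, E):
    """Measure of the finite set E under the given point masses."""
    return sum(weights[e] for e in E)

A = {"x2"}
B = {"y1", "y3"}
rectangle = set(product(A, B))          # the set A x B

lhs = measure(mu_nu, rectangle)         # (mu x nu)(A x B)
rhs = measure(mu, A) * measure(nu, B)   # mu(A) * nu(B)
```

Both sides equal $2$ in this example, as the defining identity requires.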

A.3.2Integration¶

We define abstract integrals and review their key properties, including monotonicity and the dominated convergence theorem.

A.3.2.1Abstract Integrals¶

Let $(\Xsf, \aA)$ be a measurable space and let $m\aA_+$ be the set of nonnegative real-valued Borel measurable functions on $(\Xsf, \aA)$ . We define an integral on $m\aA_+$ to be a function $I \colon m\aA_+ \to [0, \infty]$ such that

$I(f) = 0$ when $f = 0$ everywhere on $\Xsf$ ,
$f_1 \leq f_2 \leq \cdots$ and $\lim_{n\to \infty} f_n = f$ implies $\lim_{n \to \infty} I(f_n) = I(f)$ , and
$\alpha, \beta \geq 0$ and $f, g \in m\aA_+$ implies $I(\alpha f + \beta g) = \alpha I(f) + \beta I(g)$ .

The limit in (ii) is a pointwise limit, so that $\lim_{n\to \infty} f_n = f$ means $\lim_{n\to \infty} f_n(x) = f(x)$ for every $x \in \Xsf$ .

The following important result, proved in Chapter 1 of Çınlar (2011), states that every measure on a measurable space creates a unique and well defined integral.

The value $I_\mu (f)$ is called the integral of $f$ under $\mu$ and the following notation is common:

I_\mu (f) :=: \int f \diff \mu :=: \int f(x) \mu(\diff x).

The integral $I_\lambda$ introduced in Example A.3.7 is called the Lebesgue integral, and it extends the standard Riemann integral to a larger set of functions (the Borel measurable functions), while at the same time guaranteeing that the attractive properties (i)–(iii) in the definition of the integral will hold.

Equation (A.6) makes sense in this setting because if, say, $f = \1_{[a, b]}$ , then

I_\lambda (f) = \lambda( [a, b] ) = b - a,

where the first equality is by (A.6) and the second is by the fact that Lebesgue measure assigns length to intervals. The value $b-a$ is also what we would expect for the integral, since it is the area under the curve for this simple function.^[3]

If $\mu$ is a probability measure and $w \colon \Xsf \to \RR$ , then one often writes $\EE w(x)$ for the integral of $w(x)$ with respect to $\mu$ . That is,

\EE w(x) = \int w \diff \mu

Here we are thinking of $x$ as a random variable drawn from distribution $\mu$ and the integral corresponds to the expectation of $w(x)$ under $\mu$ .

A.3.2.2Properties of Integrals¶

Given a measure space $(\Xsf, \aA, \mu)$ , a property is said to hold $\mu$ -almost everywhere (or $\mu$ -almost surely when $\mu$ is a probability measure) if it holds on all of $\Xsf$ except possibly a set of $\mu$ -measure zero. A sequence $(f_n)$ converges to $f$ $\mu$ -almost everywhere if $f_n(x) \to f(x)$ for $\mu$ -almost every $x \in \Xsf$ .

The integral extends beyond the nonnegative functions in $m\aA_+$ to functions that take negative values. Indeed, if $(\Xsf, \aA, \mu)$ is a measure space and $f \in m\aA$ is not necessarily nonnegative, then we can still decompose it into the difference between two nonnegative functions via $f = f^+ - f^-$ . Imposing linearity, we now set

\int f \diff \mu \coloneq \int f^+ \diff \mu - \int f^- \diff \mu .

The only risk here is that both terms on the right equal $+\infty$ , in which case the integral is not well defined. If both integrals are finite we call $f$ integrable with respect to $\mu$ .
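On a finite measure space the definition above reduces to weighted sums, which makes it easy to inspect. The following sketch assumes a hypothetical three-point space with point masses `mu` and a function `f` that takes both signs; it computes $f^+$ , $f^-$ and the integral of $f$ as the difference of the two nonnegative integrals.

```python
from fractions import Fraction

# A finite measure space: points 0, 1, 2 with point masses (illustrative values)
mu = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
f = {0: Fraction(-3), 1: Fraction(2), 2: Fraction(1)}

f_plus = {x: max(f[x], 0) for x in f}     # positive part: max(f, 0)
f_minus = {x: -min(f[x], 0) for x in f}   # negative part: -min(f, 0)

def integral(g, mu):
    """Integral of a nonnegative function g under the point masses mu."""
    return sum(g[x] * mu[x] for x in mu)

# Integral of f defined as the difference of two nonnegative integrals
int_f = integral(f_plus, mu) - integral(f_minus, mu)
```

Both integrals on the right are finite here ( $5/4$ and $3/4$ respectively), so $f$ is integrable and its integral equals $1/2$ .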

In what follows, we leave $(\Xsf, \aA, \mu)$ fixed and write the integral $I_\mu(f)$ of $f$ under $\mu$ as $\int f \diff \mu$ . We note that every integral is increasing, in the sense that

f \leq g \implies \int f \diff \mu \leq \int g \diff \mu.

(A.7)

To see this, observe that $g - f$ is nonnegative (and measurable) and hence $\int (g - f) \diff \mu$ is well defined and nonnegative. Now, using the linearity in part (iii) of Theorem A.3.5, we have

\int g \diff \mu = \int (g - f + f) \diff \mu = \int (g - f) \diff \mu + \int f \diff \mu \geq \int f \diff \mu.

A battery of useful limit theorems exists for the integral we have defined. In our statements of these results, $(\Xsf, \aA, \mu)$ is any measure space and $f$ and $f_n$ are $(\aA, \bB)$ -measurable functions from $\Xsf$ to $\RR$ for all $n \in \NN$ .

The first implication (i.e., (i) $\implies$ (A.8)) is called the monotone convergence theorem. The second is called the dominated convergence theorem.

A.3.3Conditioning¶

Next we review prediction based on conditional expectations. Conditional expectations are themselves a cornerstone of economic theory and empirics, since they describe optimal forecasts based on limited information. Here we provide a brief treatment of the general setting that suffices for what follows.

A.3.3.1Definition¶

Let $Y$ and the elements of $\gG \coloneq \{X_1, \ldots, X_k\}$ be scalar random variables. Consider the problem of predicting $Y$ given $\gG$ . That is, we wish to form a prediction of the value that $Y$ will take once $X_1, \ldots, X_k$ are known, without any additional information on the state of the world. Another way to say this is that we seek a (nonrandom) function $f \colon \RR^k \to \RR$ such that

\hat Y \coloneq f(X_1, \ldots, X_k) \text{ is a good predictor of } Y.

To find such an $f$ we must define what “good” means. The most common definition in the present context is that mean squared error $\EE[(\hat Y - Y)^2]$ is small. Thus, we have a minimization problem in function space (the set from which $f$ is chosen). Based on projection arguments, it can be shown that there exists an essentially unique $\hat f$ in the set of functions from $\RR^k$ to $\RR$ that solves

\hat f = \argmin_f \EE[ (Y - f(X_1, \ldots, X_k))^2 ].

(A.9)

(See, e.g., Çınlar (2011).) We call the resulting variable

\hat Y \coloneq \hat f(X_1, \ldots, X_k)

the conditional expectation of $Y$ given $\gG$ . Common alternative notations for $\hat Y$ include

\EE_{\gG} Y :=: \EE[Y \given \gG] :=: \EE[Y \given X_1, \ldots, X_k] .

In the present context, $\gG$ is often called an information set.
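When the underlying probability space is finite and $k = 1$ , the minimizer in (A.9) can be computed directly: $\hat f(x)$ is the probability-weighted average of $Y$ over the outcomes with $X = x$ . The sketch below assumes a hypothetical space with five equally likely outcomes and compares the mean squared error of $\hat f$ with that of one arbitrary alternative predictor. The data and the alternative predictor are illustrative only.

```python
from fractions import Fraction
from collections import defaultdict

# Finite probability space: equally likely outcomes (x, y) (illustrative data)
outcomes = [(0, 1), (0, 3), (1, 2), (1, 6), (1, 10)]
prob = Fraction(1, len(outcomes))

# f_hat(x) is the probability-weighted average of Y over outcomes with X = x
num, den = defaultdict(Fraction), defaultdict(Fraction)
for x, y in outcomes:
    num[x] += prob * y
    den[x] += prob
f_hat = {x: num[x] / den[x] for x in num}

def mse(f):
    """Mean squared error E[(Y - f(X))^2] of the predictor f(X)."""
    return sum(prob * (y - f(x)) ** 2 for x, y in outcomes)

mse_hat = mse(lambda x: f_hat[x])     # error of the conditional expectation
mse_alt = mse(lambda x: 2 + 3 * x)    # error of some other predictor
```

In this example $\hat f(0) = 2$ and $\hat f(1) = 6$ , with mean squared error $34/5$ , compared to $37/5$ for the alternative.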

A.3.3.2Properties¶

In the next proposition, a random variable $Y$ is called $\gG$ -measurable if there exists a function $f$ such that $Y = f(X_1, \ldots, X_k)$ . Intuitively, $Y$ is perfectly predictable given the data in $\gG$ .

Property (vi) states that the linearity of expectations is preserved under conditioning. Property (ii) is called the law of iterated expectations, and is shared by all projections. Property (v) is sometimes called conditional determinism, since $X$ can be treated like a constant when it is pinned down by the information set. A full proof of Proposition A.3.7 can be found in Çınlar (2011).

A.3.4Martingales¶

In this section we provide a brief introduction to martingales, one of the most important classes of stochastic processes, and a result on stopping times that’s needed for the theory of optimal stopping.

A.3.4.1Discrete-Time Martingales¶

Let $(\Omega, \fF, \PP)$ be a probability space and let $(\fF_t)_{t \geq 0}$ be a sequence of $\sigma$ -algebras with $\fF_t \subset \fF_{t+1} \subset \fF$ for all $t$ , called a filtration. A sequence of random variables $(M_t)_{t \geq 0}$ is called a martingale with respect to $(\fF_t)$ if, for all $t \geq 0$ ,

$M_t$ is $\fF_t$ -measurable,
$\EE |M_t| < \infty$ , and
$\EE[M_{t+1} \mid \fF_t] = M_t$ .

A stopping time with respect to $(\fF_t)$ is a random variable $\tau$ taking values in $\{0, 1, 2, \ldots\} \cup \{\infty\}$ such that $\{\tau \leq t\} \in \fF_t$ for all $t \geq 0$ .

A.3.4.2Martingale Stopping Times¶

Next we present the optional stopping theorem for discrete-time martingales.

Using this, we establish a general result on exit times for bounded martingales. In the statement of the theorem, $(M_t, \fF_t)$ is a discrete-time martingale with $M_0 = m$ , and

\tau = \inf\{t \geq 0 : M_t \notin (a,b)\}

for some $a,b$ in $\RR$ with $a < b$ .

Proof

The quadratic variation process for this martingale is

\langle M \rangle_t = \sum_{s=0}^{t-1} (M_{s+1} - M_s)^2.

(The empty sum is understood to be zero, so $\langle M \rangle_0 = 0$ .) We claim that $(M_t - M_0)^2 - \langle M \rangle_t$ is a martingale. To see this we write $M_{t+1} - M_0 = (M_{t+1} - M_t) + (M_t - M_0)$ and square it to get

(M_{t+1} - M_0)^2 = (M_t - M_0)^2 + 2(M_t - M_0)(M_{t+1} - M_t) + (M_{t+1} - M_t)^2.

Taking conditional expectations and using the martingale property,

\EE[(M_{t+1} - M_0)^2 \mid \fF_t] = (M_t - M_0)^2 + \EE[(M_{t+1} - M_t)^2 \mid \fF_t].

(A.11)

By definition of quadratic variation, $\langle M \rangle_{t+1} = \langle M \rangle_t + (M_{t+1} - M_t)^2$ , so

\EE[\langle M \rangle_{t+1} \mid \fF_t] = \langle M \rangle_t + \EE[(M_{t+1} - M_t)^2 \mid \fF_t].

(A.12)

Subtracting (A.12) from (A.11), we get

\EE[(M_{t+1} - M_0)^2 - \langle M \rangle_{t+1} \mid \fF_t] = (M_t - M_0)^2 - \langle M \rangle_t.

This is the martingale property for $(M_t - M_0)^2 - \langle M \rangle_t$ . Since $(M_t)$ is bounded, this process is integrable for each $t$ , and hence is a martingale. Since $\tau \wedge n \leq n$ is a bounded stopping time, the optional stopping theorem (Theorem A.3.8) gives

\EE[(M_{\tau \wedge n} - M_0)^2 - \langle M \rangle_{\tau \wedge n}] = \EE[(M_0 - M_0)^2 - \langle M \rangle_0] = 0,

and hence

\EE[(M_{\tau \wedge n} - M_0)^2] = \EE[\langle M \rangle_{\tau \wedge n}].

(A.13)

In addition, for $t < \tau$ , we have

\EE[\langle M \rangle_{t+1} - \langle M \rangle_t \mid \fF_t] = \EE[(M_{t+1} - M_t)^2 \mid \fF_t] \geq \delta,

so, summing over $t < \tau \wedge n$ and applying the law of iterated expectations (note that $\{t < \tau\} \in \fF_t$ ),

\EE[\langle M \rangle_{\tau \wedge n}] = \EE\left[\sum_{t=0}^{\tau \wedge n-1} (M_{t+1} - M_t)^2\right] \geq \delta \cdot \EE[\tau \wedge n].

(A.14)

Combining (A.14) and (A.13) yields

\EE[\tau \wedge n] \leq \frac{\EE[(M_{\tau \wedge n} - M_0)^2]}{\delta}.

Taking $n \to \infty$ and applying monotone convergence on the left and dominated convergence on the right (using boundedness of the martingale) gives the bound claimed in Theorem A.3.9. ◻
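The identity (A.13) can be checked exactly in a simple special case. The sketch below assumes that $(M_t)$ is the simple symmetric random walk started at an integer $m$ with integer barriers $a < m < b$ ; this example is ours and is chosen because every squared increment equals one, so $\langle M \rangle_{\tau \wedge n} = \tau \wedge n$ , we can take $\delta = 1$ , and (A.13) reduces to $\EE[(M_{\tau \wedge n} - M_0)^2] = \EE[\tau \wedge n]$ . The function name `stopped_walk` is hypothetical.

```python
from fractions import Fraction

def stopped_walk(m, a, b, n):
    """Exact law of the stopped walk at time n and E[min(tau, n)].

    The walk starts at m and moves +1 or -1 with probability 1/2 each;
    tau is the first time the walk leaves the open interval (a, b).
    """
    dist = {m: Fraction(1)}        # law of the stopped walk at the current time t
    expected_tau = Fraction(0)     # accumulates the sum over t < n of P(tau > t)
    for _ in range(n):
        # Mass still strictly inside (a, b): these paths have tau > t
        alive = {s: p for s, p in dist.items() if a < s < b}
        expected_tau += sum(alive.values())
        # Stopped paths stay where they are; live paths take one more step
        new = {s: p for s, p in dist.items() if not (a < s < b)}
        for s, p in alive.items():
            for step in (-1, 1):
                new[s + step] = new.get(s + step, Fraction(0)) + p / 2
        dist = new
    return dist, expected_tau

m, a, b, n = 2, 0, 5, 40
dist, e_tau = stopped_walk(m, a, b, n)

# E[(M_{min(tau, n)} - M_0)^2], computed from the exact law of the stopped walk
second_moment = sum(p * (s - m) ** 2 for s, p in dist.items())
```

Because the computation uses exact rational arithmetic, the two sides of (A.13) agree exactly here rather than approximately.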

A.4Vector Spaces and Norms¶

We humans have natural geometric intuition about the space $\RR^n$ when $n = 3$ . If this intuition can be expressed algebraically, then $\RR^3$ results often extend to $\RR^n$ for arbitrary $n \in \NN$ – and also to more general collections of objects, such as matrices, complex numbers and real-valued functions, provided that these collections are assigned some basic algebraic structure analogous to that enjoyed by vectors in $\RR^3$ .

Of course we need to formalize what “analogous” means by codifying the properties that we need the algebraic operations to satisfy. This leads to the concept of (abstract) vector space. In this section we recall the definition of such spaces and review key properties.

A.4.1Vector Space¶

We begin with linear algebraic properties in abstract sets that generalize the idea of adding and scalar multiplying vectors in $\RR^n$ . Then we discuss properties of subsets of and maps over these abstract “vector spaces.”

A.4.1.1Definition and Properties¶

A vector space (also called a linear space) is a triple $(E, + , \cdot)$ where $E$ is a nonempty set, $+$ is a map from $E \times E$ to $E$ called addition and $\cdot$ is a map from $\RR \times E$ to $E$ called scalar multiplication, such that for all $u, v, w \in E$ and $\alpha, \beta \in \RR$ ,

$u + (v + w) = (u + v) + w$
$u + v = v + u$
there exists an element $0 \in E$ , called the origin, such that $u + 0 = u$ for all $u \in E$
for all $u \in E$ , there exists a $v \in E$ such that $u + v = 0$
$\alpha \cdot (\beta \cdot u) = (\alpha \cdot \beta) \cdot u$
$1 \cdot u = u$
$\alpha \cdot (u + v) = \alpha \cdot u + \alpha \cdot v$
$(\alpha + \beta) \cdot u = \alpha \cdot u + \beta \cdot u$

In practice, the $\cdot$ symbol is usually omitted, so $\alpha u \coloneq \alpha \cdot u$ . In the present context, the values $\alpha, \beta, \ldots$ are often called scalars. Also, the origin, which shares the symbol 0 with the zero element from $\RR$ , is sometimes referred to as the additive identity.^[4]

The vector space $\RR^n$ is a special case of Example A.4.2, obtained when $\Xsf = \natset{n}$ .

A.4.1.2Convexity¶

Given vector space $E$ , set $C \subset E$ is called convex if $u, v \in C$ and $\alpha \in [0,1]$ implies $\alpha u + (1-\alpha) v \in C$ . In other words, $C$ is closed under the taking of convex combinations.

When $E$ is any vector space, a nonempty subset $C$ of $E$ is called a cone in $E$ if

$C$ is convex,
$x \in C$ and $-x \in C$ implies $x = 0$ and
$\alpha x \in C$ whenever $x \in C$ and $\alpha \geq 0$ .

(Some authors refer to $C$ as a “pointed convex cone.”)

A.4.1.3Linear Maps and Subspaces¶

Analogous to the case of $\RR^n$ , a linear subspace of vector space $E$ is a set $S \subset E$ satisfying

\alpha, \beta \in \RR \text{ and } u, v \in S \; \implies \; \alpha u + \beta v \in S.

(A.15)

The proof of the next proposition is a useful exercise:

A linear operator from vector space $E$ into vector space $F$ is a map $A \colon E \to F$ satisfying

\alpha, \beta \in \RR \text{ and } u, v \in E \; \implies \; A(\alpha u + \beta v ) = \alpha A u + \beta A v.

(A.16)

It can in fact be shown that every linear operator from $\RR^k$ to $\RR^n$ can be represented by an $n \times k$ matrix.
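The representing matrix can be constructed explicitly: its $j$ -th column is the image of the $j$ -th canonical basis vector. The following sketch illustrates this for a hypothetical linear operator from $\RR^3$ to $\RR^2$ , using plain Python lists; the operator `A` and the helper `matvec` are assumptions made for illustration.

```python
def A(u):
    """Illustrative linear operator from R^3 to R^2 (hypothetical example)."""
    x, y, z = u
    return [2 * x - y, y + 3 * z]

k, n = 3, 2   # A maps R^k into R^n

# Canonical basis vectors e_1, ..., e_k of R^k
basis = [[1 if i == j else 0 for i in range(k)] for j in range(k)]

# Column j of the representing matrix is A e_j; we store the matrix by rows
columns = [A(e) for e in basis]
matrix = [[columns[j][i] for j in range(k)] for i in range(n)]

def matvec(M, u):
    """Matrix-vector product M u."""
    return [sum(row[j] * u[j] for j in range(len(u))) for row in M]

u = [1, -2, 4]   # arbitrary test vector
```

By linearity, `matvec(matrix, u)` agrees with `A(u)` for every `u`, since any `u` is a linear combination of the basis vectors.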

The “kernel function” $p$ in (A.17) can be identified with a matrix in $\RR^{n \times n}$ when $|\Xsf|=n \in \NN$ . No such identification exists when $|\Xsf|=\infty$ .

A.4.1.4Bases and Dimension¶

A linear combination of vectors $u_1,\ldots, u_k$ in $E$ is a vector of the form $\alpha_1 u_1 + \cdots + \alpha_k u_k$ where $\alpha_1,\ldots, \alpha_k$ are scalars. A set $S \subset E$ is called linearly independent if, for any finite set $\{u_1, \ldots, u_k\} \subset S$ , we have

\alpha_1 u_1 + \cdots + \alpha_k u_k = 0 \text{ implies } \alpha_1 = \cdots = \alpha_k = 0.

A basis of a linear subspace $S$ of $E$ is a linearly independent subset $B$ of $S$ that spans $S$ (i.e., each $u \in S$ can be expressed as a finite linear combination of elements of $B$ ).

A proof can be found in Jänich (1994). In case (ii), we say that $E$ is $n$ -dimensional. $E$ is called finite-dimensional if $E$ is $n$ -dimensional for some $n \in \NN$ . In case (iii), we call $E$ infinite-dimensional.

A.4.2Normed Vector Space¶

In this section we recall basic definitions and properties concerning normed vector space and linear operators acting on such space.

A.4.2.1Norms on Vector Space¶

Given vector space $E$ , a map $\| \cdot \| \colon E \to \RR$ is called a norm on $E$ if, for any $\alpha \in \RR$ and any $u, v \in E$ ,

$\| u \| \geq 0$ (nonnegativity),
$\| u \| =0 \iff u=0$ (positive definiteness),
$\| \alpha u \| = |\alpha| \| u\|$ (positive homogeneity), and
$\| u + v \| \leq \| u \| + \| v \|$ (triangle inequality).
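The four norm properties can be spot-checked numerically. The sketch below does this for three familiar candidates on $\RR^4$ : the sum of absolute values, the Euclidean length and the maximum absolute value. These examples, the sample size and the tolerances are our own choices, and a random check of this kind illustrates the definition without proving anything.

```python
import math
import random

# Three candidate norms on R^n (illustrative examples)
norms = {
    "l1": lambda u: sum(abs(x) for x in u),
    "l2": lambda u: math.sqrt(sum(x * x for x in u)),
    "sup": lambda u: max(abs(x) for x in u),
}

random.seed(0)   # fixed seed so that the check is reproducible

def rand_vec(n=4):
    """Random vector in R^n with entries drawn uniformly from [-1, 1]."""
    return [random.uniform(-1, 1) for _ in range(n)]

tol = 1e-9       # tolerance for floating point rounding
ok = True
for name, norm in norms.items():
    ok = ok and norm([0.0] * 4) == 0.0            # the origin has norm zero
    for _ in range(1000):
        u, v = rand_vec(), rand_vec()
        alpha = random.uniform(-5, 5)
        scaled = [alpha * x for x in u]
        total = [x + y for x, y in zip(u, v)]
        ok = ok and norm(u) >= 0                                     # nonnegativity
        ok = ok and abs(norm(scaled) - abs(alpha) * norm(u)) < tol   # homogeneity
        ok = ok and norm(total) <= norm(u) + norm(v) + tol           # triangle inequality
```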

Consider a normed vector space $(E, \| \cdot\|)$ with origin 0. Recalling the definition of boundedness from metric spaces, one can show that a subset $S$ of $E$ is bounded if and only if there exists an $M \in \NN$ such that $\| u \| \leq M$ for all $u \in S$ .

$G \subset E$ is said to be open in $E$ if $G$ is open in $(E, d)$ , where $d(u, v) \coloneq \| u - v \|$ is the metric induced by the norm.
A sequence $(v_n)$ in $E$ is said to converge to $v \in E$ if $d(v_n, v) \to 0$ .

Let $E$ be a vector space and let $\| \cdot \|$ and $\| \cdot \|'$ be two norms on $E$ . These norms are said to be equivalent if there exist finite positive constants $A, B$ such that $\|x\| \leq A \|x\|'$ and $\|x\|' \leq B \|x\|$ for all $x \in E$ . The following result is fundamental. See, for example, Aliprantis & Burkinshaw (1998), Theorem 27.6.

A.4.2.2Completeness¶

Completeness is essential to many important theorems in applied analysis. Fortunately, the completeness of $\RR$ is inherited by many useful spaces. For example,

A proof can be found in, e.g., Aliprantis & Burkinshaw (1998), Theorem 27.6.

A complete normed vector space is called a Banach space. There are many other important Banach spaces, beyond the finite-dimensional ones.

A.4.2.3Compactness¶

Let $(E, \| \cdot \|)$ be a normed vector space. All equivalent metrics induce the same precompact sets and the same bounded sets in $E$ . Since all norms on a finite-dimensional space are equivalent (Theorem A.4.3), any metric induced by a norm on a finite dimensional vector space has the property that its precompact and bounded sets coincide (cf., Theorem A.1.6). The next theorem states this fact for the record.

In line with our discussion in Section A.1.3.4, this one-to-one pairing between closed bounded sets and compact sets breaks down in infinite dimensional spaces. In fact, the closed unit ball of a normed vector space $E$ is compact if and only if $E$ is finite-dimensional.

A.4.2.4 $L_p$ Spaces¶

Let $\mu$ be a $\sigma$ -finite measure on measurable space $(\Xsf, \aA)$ and let $p \geq 1$ . The space $L_p(\Xsf, \aA, \mu)$ consists of all Borel measurable functions $f \colon \Xsf \to \RR$ with $\int |f|^p \diff \mu$ finite. Functions that agree $\mu$ -almost everywhere are identified. The functional $\|f\|_p \coloneq \left(\int |f|^p \diff \mu\right)^{1/p}$ is a norm on $L_p(\Xsf, \aA, \mu)$ .

Scheffé’s identity provides a useful quantitative interpretation of $d_1$ distance between densities: For any densities $f$ and $g$ on $(\Xsf, \aA, \mu)$ , we have

\|f - g\|_1 = 2 \times \sup_{B \in \aA} \left| \int_B f \diff \mu - \int_B g \diff \mu \right|

(A.18)
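The identity (A.18) can be verified by brute force when $\Xsf$ is finite and $\mu$ is counting measure, since then every subset is measurable and integrals are sums. The sketch below assumes two hypothetical densities on a four-point set and enumerates all subsets $B$ .

```python
from fractions import Fraction
from itertools import chain, combinations

X = [0, 1, 2, 3]

# Densities with respect to counting measure on X (illustrative values)
f = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8)}
g = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 4), 3: Fraction(1, 4)}

def subsets(points):
    """All subsets of a finite collection of points, as tuples."""
    sizes = range(len(points) + 1)
    return chain.from_iterable(combinations(points, r) for r in sizes)

# Left side of (A.18): the L1 distance between f and g
l1_distance = sum(abs(f[x] - g[x]) for x in X)

# Supremum on the right side of (A.18), taken over all subsets B of X
sup_gap = max(abs(sum(f[x] - g[x] for x in B)) for B in subsets(X))
```

In this example the supremum is attained at $B = \{0\}$ , the set on which $f > g$ , and both sides of (A.18) equal $1/2$ .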

Finally, Scheffé’s lemma is useful for testing $L_1$ convergence:

In the case where $f_n$ and $f$ are densities, Scheffé’s lemma tells us that $f_n \to f$ in $L_1$ whenever $f_n \to f$ almost everywhere.

A.4.3Bounded Linear Operators¶

If $E$ and $F$ are normed linear spaces and $A$ is a linear operator from $E$ to $F$ , then the operator norm of $A$ is defined as

\| A \| \coloneq \sup_{\| u \| = 1} \| A u \|.

(A.19)

(Here $\| u\|$ is the norm of $u$ in $E$ and $\|Au\|$ is the norm of $Au$ in $F$ .) When $\| A\|$ is finite, $A$ is called a bounded linear operator. The set of all bounded linear operators from $E$ to $F$ will be denoted $\blop(E, F)$ . If $E=F$ then we write $\blop(E)$ . Every $A \in \blop(E, F)$ is continuous, since, for $u_n \to u$ in $E$ we have

\| Au_n - Au\| \leq \|A\| \|u_n - u\| \to 0.

The converse is also true: every continuous linear operator from $E$ to $F$ is bounded – see §2.7 of Kreyszig (1978) for a proof of this fact, as well as Theorem A.4.8 below.

As suggested by the name, the operator norm is a norm on $\blop(E, F)$ . The details are left as an exercise.

The operator norm is submultiplicative: If $A, B \in \blop(E)$ , then $\|A B \| \coloneq \| A \circ B \| \leq \| A \| \cdot \| B \|$ . Iteratively applying the submultiplicative property gives $\|A^i\| \leq \|A \|^i$ for any $i \in \NN$ and $A \in \blop(E)$ , where $A^i$ is the $i$ -th composition of $A$ with itself.

Once we have a norm on $\blop(E, F)$ , we have an induced metric given by $d(A, B) = \| A - B \|$ , and $\blop(E, F)$ will be a Banach space whenever this metric is complete.

Let $E$ be a Banach space and let $A$ be an element of $\blop(E)$ . A complex scalar $\lambda$ is called an eigenvalue of $A$ if there exists a nonzero vector $e$ such that $Ae = \lambda e$ . The spectrum of $A$ , typically denoted $\sigma(A)$ , is the set of all scalars $\lambda$ such that $\lambda I - A$ fails to be bijective on $E$ . Any eigenvalue $\lambda$ lies in $\sigma(A)$ because if $Ae = \lambda e$ for some nonzero $e$ , then $\lambda I - A$ maps $e$ to 0, while also mapping 0 to 0. Hence $\lambda I - A$ is not injective, and therefore not bijective. The spectral radius of $A$ is defined as

\rho(A) \coloneq \sup \setntn{|\lambda|}{ \lambda \in \sigma(A)}

(A.20)

It is well known (see, e.g., Kreyszig (1978), §7.3) that

$\rho(A) \leq \| A \|$ , where $\| \cdot \|$ is the operator norm, and
$\| A^k \|^{1/k} \to \rho(A)$ as $k \to \infty$ (Gelfand’s formula).
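Both facts can be illustrated numerically when $E = \RR^2$ with the Euclidean norm, so that the operator norm is the largest singular value. The matrix `A` and the exponent `k` below are hypothetical choices made only for this sketch.

```python
import numpy as np

# Hypothetical 2x2 matrix, viewed as a bounded linear operator on R^2.
A = np.array([[0.5, 1.0],
              [0.0, 0.4]])

# Operator norm induced by the Euclidean norm (largest singular value).
op_norm = np.linalg.norm(A, ord=2)

# Spectral radius: largest modulus of an eigenvalue.
rho = max(abs(np.linalg.eigvals(A)))

# Gelfand approximation ||A^k||^(1/k) for one large value of k.
k = 200
gelfand = np.linalg.norm(np.linalg.matrix_power(A, k), ord=2) ** (1 / k)

print(rho, op_norm, gelfand)
```

Here the operator norm exceeds one while the spectral radius is one half, which shows that the bound $\rho(A) \leq \| A \|$ can be far from tight.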

The following theorem is essential for many results in the book.

(The infinite sum is defined as the limit of the partial sums in $\blop(E)$ under the operator norm. Hence the infinite sum exists if and only if the partial sums converge in $\blop(E)$ .)

Proof

First observe that the sequence $B_n \coloneq \sum_{i=0}^n A^i$ is Cauchy when $\rho(A) < 1$ . Indeed, using the operator norm,

\| B_k - B_{k + n} \| = \left\| \sum_{i = k+1}^{k+n} A^i \right\| \leq \sum_{i = k+1}^{k+n} \| A^i \| \leq \sum_{i \geq k} \| A^i \|.

The final term will converge to zero in $k$ if $\sum_{i=0}^{\infty} \| A^i \|$ is finite. By the root test for convergence of series, this will be true whenever we have $\limsup_{i \to \infty} \| A^i \|^{1/i} < 1$ . We know this is true by the hypothesis $\rho(A)<1$ and Gelfand’s formula.

Since $\blop(E)$ is complete under the operator norm, this Cauchy property implies that the limit $\sum_{i=0}^{\infty} A^i$ exists. Moreover, $(I - A) \sum_{i=0}^{\infty} A^i = I$ , since

\left\| (I - A) \sum_{i=0}^{\infty} A^i - I \right\| = \lim_{n \to \infty} \left\| (I - A) \sum_{i=0}^n A^i - I \right\| = \lim_{n \to \infty} \left\| A^{n+1} \right\|

and the right hand side converges to zero by $\rho(A) < 1$ and Gelfand’s formula. ◻
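The convergence of the partial sums can be observed directly in finite dimensions. The sketch below assumes $E = \RR^2$ and a hypothetical matrix `A` with $\rho(A) = 0.5$ and operator norm greater than one, and compares the partial sum $B_n$ with the inverse of $I - A$.

```python
import numpy as np

# Hypothetical matrix with rho(A) = 0.5 < 1 although ||A|| > 1.
A = np.array([[0.5, 1.0],
              [0.0, 0.4]])
I = np.eye(2)

# Partial sum B_n = sum_{i=0}^{n} A^i with n = 199, built iteratively.
B = np.zeros_like(A)
power = I.copy()          # holds A^i at the start of iteration i
for i in range(200):
    B += power
    power = power @ A

inverse = np.linalg.inv(I - A)
error = np.linalg.norm(B - inverse, ord=2)
print(error)
```

The example also shows why the hypothesis is stated in terms of the spectral radius rather than the operator norm: the series converges here even though $\| A \| > 1$.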

An immediate consequence is global stability of affine maps with spectral radius less than one:

Solution to Exercise A.4.4

By the Neumann series lemma, $S$ has a unique fixed point in $\RR^k$ given by $\bar v \coloneq (I-A)^{-1} r$ . Fix $v \in \RR^k$ with $v \leq S v$ . Since $S$ is order preserving, the sequence $(S^n v)$ is increasing. The $n$ -th element of this sequence is $S^n v = A^n v + \sum_{i=0}^{n-1} A^i r$ . Since $\rho(A) < 1$ , the pointwise limit of this sequence is $\sum_{i=0}^\infty A^i r = (I - A)^{-1} r = \bar v$ . By Lemma A.1.3, $\bar v$ is also the supremum of $(S^n v)$ . This proves one direction of the definition of strong order stability. The proof of the other direction is similar.
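The argument can be traced numerically. The sketch below assumes a hypothetical nonnegative matrix `A` with $\rho(A) < 1$ (so that $S v = A v + r$ is order preserving) and starts from $v = 0$, which satisfies $v \leq S v$ because $r \geq 0$.

```python
import numpy as np

# Hypothetical nonnegative matrix with rho(A) < 1 and a nonnegative vector r.
A = np.array([[0.2, 0.5],
              [0.3, 0.1]])
r = np.array([1.0, 2.0])


def S(v):
    """Affine map S v = A v + r."""
    return A @ v + r


v_bar = np.linalg.solve(np.eye(2) - A, r)   # fixed point (I - A)^{-1} r

v = np.zeros(2)           # v <= S v holds here because S 0 = r >= 0
increasing = True
for _ in range(100):
    v_next = S(v)
    # Small tolerance guards against rounding once the iterates have converged.
    increasing = increasing and bool(np.all(v_next >= v - 1e-12))
    v = v_next

print(v, v_bar, increasing)
```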

A.5Order¶

In this section we study order completeness and order continuity, ordered vector spaces and Riesz spaces, the interplay between topology and order including Banach lattices and weighted sup-norm spaces, Markov models, and orders over distributions.

A.5.1Order Continuity and Order Completeness¶

When studying the real line $\RR$ , we can define completeness either as existence of limits for Cauchy sequences (Theorem A.1.2) or as existence of suprema for bounded above sets (Theorem A.1.1). The first idea can be extended to metric spaces by generalizing the concept of Cauchy sequences (see Section A.1.3.5). The second can be extended to posets by analogy with existence of suprema. The aim of this section is to describe this second concept of completeness.

A.5.1.1Lattices and Chains¶

When we discuss posets, there are multiple notions of completeness, with each one determined by the classes of sets that are required to have suprema. Specifically, a nonempty poset $V$ is called

a lattice if every finite subset of $V$ has both a supremum and an infimum in $V$ ,
chain complete if every chain in $V$ has a supremum and an infimum in $V$ , and
countably chain complete if every at most countable chain in $V$ has a supremum and an infimum in $V$ .

Note that chain completeness implies countable chain completeness but not conversely, and that neither concept implies nor is implied by the lattice property.

In the definitions above, finite sets are understood to be nonempty (i.e., in one-to-one correspondence with $\{1, \ldots, n\}$ for some $n \in \NN$ ). Also, “at most countable” means empty, finite or countable. The fact that the empty set is included has significance, as the next lemma illustrates.

Solution to Exercise A.5.3

We use the characterization of countable chain completeness from Exercise A.5.1. Assume the stated conditions. Clearly $V^\partial$ is order bounded, with top given by the bottom of $V$ and bottom given by the top of $V$ . Also, if $(v_n)$ is an increasing sequence in $V^\partial$ , then $(v_n)$ is decreasing in $V$ . Since $V$ is countably chain complete, there exists a $v \in V$ with $v_n \downarrow v$ . By Exercise A.1.7, the infimum $v$ is the supremum of $(v_n)$ in $V^\partial$ . In particular, there exists a $v$ with $v_n \uparrow v$ in $V^\partial$ . A similar argument handles the case where $(v_n)$ is decreasing in $V^\partial$ . We conclude that $V^\partial$ is countably chain complete.

The following result is a version of the Knaster–Tarski fixed point theorem for chain complete posets. In the statement, $V$ is a nonempty poset.

A sublattice of a lattice $V$ is a subset $S$ of $V$ with the property that $u \vee v$ and $u \wedge v$ are in $S$ whenever $u, v \in S$ .

A.5.1.2Dedekind Completeness¶

Consider the canonical partially ordered set $(\RR^k, \leq)$ . This set is not countably chain complete: for example, letting $\1$ be a vector of ones, the increasing sequence $(v_n) = (n \1)$ has no supremum. At the same time, $(\RR^k, \leq)$ certainly has some completeness properties. For example, it follows easily from Exercise A.1.4 that every bounded above subset of $\RR^k$ has a supremum, and every bounded below subset of $\RR^k$ has an infimum. This motivates the following definitions:

A partially ordered set $V$ is called Dedekind complete if, for any nonempty $A \subset V$ ,

$A$ is bounded above $\implies$ $A$ has a supremum in $V$ and
$A$ is bounded below $\implies$ $A$ has an infimum in $V$ .

$V$ is called countably Dedekind complete if, for any nonempty finite or countable $A \subset V$ ,

$A$ is bounded above $\implies$ $A$ has a supremum in $V$ and
$A$ is bounded below $\implies$ $A$ has an infimum in $V$ .

There are natural connections between Dedekind (resp., countable Dedekind) completeness and chain (resp. countable chain) completeness. Here is one simple result.

Proof

For part (i), let $I = [a, b]$ , $V$ be as stated and let $A$ be a subset of $I$ . On one hand, if $A$ is nonempty, then, by Dedekind completeness, $s \coloneq \bigvee A$ exists in $V$ . Since $a \preceq s \preceq b$ , we have $s \in I$ , and $s$ is the supremum of $A$ in $(I, \preceq)$ . A similar argument shows that $A$ has an infimum in $(I, \preceq)$ . On the other hand, if $A = \varnothing$ , then $a$ is an upper bound of $\varnothing$ (vacuously -- see Exercise A.1.6) and $a \preceq v$ for every upper bound $v$ of $\varnothing$ in $I$ (in fact for every $v \in I$ ). Hence $a$ is the supremum of $A$ in $(I, \preceq)$ . A similar argument shows that $b$ is the infimum of $\varnothing$ in $(I, \preceq)$ .

The proof of part (ii) is very similar to the proof of part (i). ◻

A.5.1.3Order Continuity¶

We call a map $S$ from poset $V$ to poset $W$ order continuous on $V$ if

S v_n \uparrow S v \quad \text{whenever } v_n \uparrow v.

In other words, if $(v_n) \subset V$ with $v_n \uparrow v \in V$ , then $\bigvee_n S v_n$ exists in $W$ and equals $Sv$ .

In the next lemma, $V$ and $W$ are arbitrary posets.

Next we state a variation on the Tarski–Kantorovich fixed point theorem.

Proof

Let $S, V$ be as stated. Fix $v_a \preceq v_b$ in $V$ with $v_a \preceq S v_a$ and $S v_b \preceq v_b$ . The map $S$ is order continuous and hence order preserving, so the sequence $(v_n) \coloneq (S^n v_a)$ is increasing. As the set $V$ is countably Dedekind complete and the sequence is bounded above by $v_b$ , the suprema $\bigvee_{n \geq 1} v_n$ and $\bigvee_{n \geq 1} S v_n$ exist in $V$ . If $\bar v \coloneq \bigvee_n v_n$ , then, by order continuity, $S \bar v = S \bigvee_{n \geq 1} v_n = \bigvee_{n \geq 1} S v_n = \bigvee_{n \geq 2} v_n = \bar v$ . Hence $S \bar v = \bar v$ . We have also shown that $S^n v_a \uparrow \bar v$ . ◻

Here’s a more standard version of the Tarski–Kantorovich fixed point theorem.

The next lemma is analogous to Lemma A.5.3.

A.5.2Ordered Vector Space¶

Next we add algebraic structure to posets. The combination of algebraic operations and order will allow us to develop sharp sufficient conditions for dynamic programs and convergence of algorithms.

A.5.2.1Definition and Properties¶

Let $E = (E, +, \cdot)$ be a vector space with origin 0 (see Section A.4.1) and let $\leq$ be a partial order on $E$ . We call $(E, \leq)$ an ordered vector space if the order is preserved under addition and nonnegative scalar multiplication; that is, if

$u \leq v$ implies $u + b \leq v + b$ for any $b \in E$ , and
$u \leq v$ and $\alpha \in \RR$ with $0 \leq \alpha$ implies $\alpha u \leq \alpha v$ .

The positive cone of $E$ , typically denoted by $E_+$ , is the set of all $v \in E$ with $0 \leq v$ .

If $(E, \leq)$ is an ordered vector space and $u, v, w \in E$ , then

$u \leq 0$ and $v \leq 0$ implies $u + v \leq 0$ ,
$u \leq v$ implies $-v \leq -u$ ,
$(u \vee v) + w = (u + w) \vee (v + w)$ , and
$\alpha (u \vee v) = (\alpha u) \vee (\alpha v)$ whenever $\alpha \geq 0$ .

These facts follow directly from the definitions.

Using the definition in Section A.1.2.4, if $(v_n)$ is a sequence in ordered vector space $E$ and $v \in E$ , then the statement $v_n \uparrow v$ means that $(v_n)$ is increasing and $\bigvee_n v_n = v$ .

Solution to Exercise A.5.8

Suppose $u_n \uparrow 0$ and $v_n \uparrow 0$ . Let $U$ be the set of upper bounds of $(u_n + v_n)$ . Since $u_n \leq 0$ and $v_n \leq 0$ for all $n$ we see that $0 \in U$ . Fixing any $w \in U$ , monotonicity of the sequences gives $u_n + v_m \leq w$ for all $n, m$ , from which we obtain $u_n \leq w - v_m$ for all $n, m$ and hence $0 \leq w - v_m$ (because 0 is the supremum of $(u_n)$ ). Rearranging gives $v_m \leq w$ for all $m$ and hence $0 \leq w$ . This proves that 0 is a least element of $U$ , so 0 is the supremum of $(u_n + v_n)$ .

Regarding the second claim, suppose $u_n \uparrow u$ and fix $b \in E$ . Let $U$ be the set of upper bounds of $(u_n + b)$ . Since $u_n \leq u$ for all $n$ we see that $u + b \in U$ . If $w \in U$ , then $u_n \leq w - b$ for all $n$ , so $u \leq w - b$ , or $u + b \leq w$ . This proves that $u + b$ is a least element of $U$ , so $u+b$ is the supremum of $(u_n + b)$ .

In some settings, a partial order is introduced into a vector space $E$ by first choosing a (pointed convex) cone $C$ on $E$ (see Section A.4.1.2) and stating that $u \leq v$ if and only if $v - u \in C$ . The following discussion clarifies this idea.

A.5.2.2Operators on Ordered Vector Space¶

A linear operator $T$ mapping ordered vector space $E$ to itself is called positive if $T$ is invariant on the positive cone; that is, if $u \in E$ and $u \geq 0$ implies $Tu \geq 0$ .

(In the canonical example given above, positive operators are identified with nonnegative matrices. Unfortunately, this notational inconsistency is deeply embedded in the existing literature so we must accept it.)

Let $E$ be an ordered vector space and let $A \colon E \to E$ be a linear operator. Recalling the definition in Section A.5.1.3, $A$ is order continuous on $E$ when $(v_n) \subset E$ and $v_n \uparrow v \in E$ implies $Av_n \uparrow Av$ . By Lemma A.5.5, every order continuous linear operator is order preserving – and hence positive. The next exercise can be completed using Lemma A.5.9.

A self-map $S$ on a convex subset $C$ of ordered vector space $E \coloneq (E, \leq)$ is called convex on $C$ if

S(\lambda v + (1-\lambda) v') \leq \lambda Sv + (1-\lambda) Sv' \text{ whenever } v \leq v' \in C \text{ and } 0\leq \lambda \leq 1

The map $S$ is called concave on $C$ if

\lambda Sv + (1-\lambda) Sv' \leq S(\lambda v + (1-\lambda) v') \text{ whenever } v \leq v' \in C \text{ and } 0\leq \lambda \leq 1

A.5.2.3Riesz Space¶

Next we introduce Riesz spaces, which are ordered vector spaces with lattice structure. This structure allows for the introduction of a notion of absolute value, which behaves similarly to the pointwise absolute value over vectors in $\RR^n$ . Absolute value in turn helps us clarify and quantify the actions of operators, providing new opportunities for establishing optimality conditions in dynamic programs.

An ordered vector space $E$ is called a Riesz space if $E$ is a lattice. With $\vee$ and $\wedge$ as the lattice operations and $u, v, w \in E$ , the following properties always hold:

$u \wedge v = - ((-u) \vee (-v))$ and $u \vee v = - ((-u) \wedge (-v))$ .
$(u \wedge v) + w = (u + w) \wedge (v + w)$ and $(u \vee v) + w = (u + w) \vee (v + w)$ .

These facts can be easily verified and other related results are found in Chapter 2 of Zaanen (2012).

For element $u$ of any Riesz space $(E, \leq)$ we use the notation

|u| \coloneq u \vee (-u), \quad u^+ \coloneq u \vee 0 \quad \text{and} \quad u^- \coloneq (-u) \vee 0.

These points in $E$ are called the absolute value, positive part, and negative part of $u$ respectively. One easily shows that $|-u| = |u|$ . Also,

Proof

For (i), since $(u \vee v) + w = (u+w) \vee (v+w)$ holds in any ordered vector space, we have $u^+ - u = (u \vee 0) - u = 0 \vee (-u) = u^-$ , giving the first equality. For the second, $u \vee (-u) = u \vee (-u) + u - u = (2u \vee 0) - u = 2u^+ - u = 2 u^+ - u^+ + u^- = u^+ + u^-$ . For (iii) we refer to Theorem 5.3 of Zaanen (2012). Regarding (iv), we have

|v| \leq u \; \iff \; v \vee (-v) \leq u \; \iff \; v \leq u \text{ and } -v \leq u \; \iff \; -u \leq v \leq u.

Notice that (iii) implies the triangle inequality $|u+v| \leq |u| + |v|$ .
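In the Riesz space $\RR^n$ with the pointwise order, the lattice operations are componentwise maxima, so these identities are easy to check by hand or by machine. The vectors `u` and `v` and the helper names below are hypothetical.

```python
import numpy as np

u = np.array([1.5, -2.0, 0.0, 3.0])     # hypothetical vectors in R^4
v = np.array([-1.0, 0.5, 2.0, -4.0])


def pos(x):
    """Positive part: x v 0, taken componentwise."""
    return np.maximum(x, 0.0)


def neg(x):
    """Negative part: (-x) v 0, taken componentwise."""
    return np.maximum(-x, 0.0)


def absval(x):
    """Absolute value: x v (-x), taken componentwise."""
    return np.maximum(x, -x)


decomposition_ok = np.array_equal(u, pos(u) - neg(u))        # u = u^+ - u^-
absolute_ok = np.array_equal(absval(u), pos(u) + neg(u))     # |u| = u^+ + u^-
triangle_ok = bool(np.all(absval(u + v) <= absval(u) + absval(v)))
print(decomposition_ok, absolute_ok, triangle_ok)
```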

We will make use of the following lemma:

Here’s an obvious corollary when $E = \RR$ .

A.5.2.4Riesz Spaces of Measurable Functions¶

Let $(\Xsf, \aA, \mu)$ be a $\sigma$ -finite measure space. As usual, we let

$m\Xsf$ be the real-valued Borel measurable functions on $(\Xsf, \aA)$ and
$b\Xsf$ be the bounded functions in $m\Xsf$ .

The vector spaces $m\Xsf$ and $b\Xsf$ are both Riesz spaces when paired with the pointwise partial order $\leq$ , with $b\Xsf$ a subset of $m\Xsf$ .

A.5.2.5Almost Everywhere Pointwise Order¶

Let $(\Xsf, \aA, \mu)$ be as in the previous section and fix $p \in [1, \infty)$ . Let $L_p \coloneq L_p(\Xsf, \aA, \mu)$ be the Banach space of equivalence classes defined in Section A.4.2.4. Let $\leq$ be defined by $f \leq g$ if and only if $\setntn{x \in \Xsf}{f(x) > g(x)}$ has $\mu$ -measure zero.

The space $(L_p, \leq)$ just described is a Riesz space. For example, if $f, g \in L_p$ , then $|f \vee g| \leq |f| + |g|$ , and $|f| + |g|$ is in $L_p$ because $\int |f|^p \diff \mu$ and $\int |g|^p \diff \mu$ are both finite and $L_p$ is a vector space. Hence $f \vee g \in L_p$ .

A.5.2.6Dedekind Completeness of Riesz Space¶

Since each Riesz space is a partially ordered space, the notions of Dedekind and countable Dedekind completeness apply directly. Moreover, when testing these forms of completeness, one-sided conditions suffice. The following one-sided condition is particularly simple.

For a proof of Lemma A.5.14, see Theorem 12.1 of Zaanen (2012).

We will make repeated use of the following fact.

A proof of Lemma A.5.15 can be found in Example 12.5 of Zaanen (2012).

Several interesting function spaces are naturally ordered by the pointwise partial order. Next we study the completeness properties of such Riesz spaces. We will make use of the following lemma, in the statement of which, for $(v_n) \subset \RR^\Xsf$ , the symbol $\sup_n v_n$ indicates the pointwise supremum.

Now let $(\Xsf, \aA, \mu)$ be a $\sigma$ -finite measure space and let $m\Xsf$ and $b\Xsf$ be the Riesz spaces discussed in Section A.5.2.4. As above, let $\leq$ be the pointwise partial order.

Proof

Consider the poset $m\Xsf$ . Let $(v_n)$ be increasing and bounded above in $m\Xsf$ . In view of Lemma A.5.16 we need only show that $s \coloneq \sup_n v_n$ is in $m\Xsf$ . This follows from existence of suprema in $\RR$ when subsets are bounded above (so $s$ is real-valued) and Lemma A.3.1, which implies measurability.

Next consider the poset $b\Xsf$ . Let $(v_n) \subset b\Xsf$ be increasing and bounded above by $w \in b\Xsf$ . By the same argument as the last paragraph, we have $s \coloneq \sup_n v_n \in m\Xsf$ . Moreover, $v_1 \leq s \leq w$ with $v_1, w \in b\Xsf$ . Hence $s \in b\Xsf$ . The claim now follows from Lemma A.5.16. ◻

A.5.3Topology and Order¶

In some applications it will be helpful to draw on results that use topological or metric structure. In this section we note some elementary facts about topological, metric and normed spaces where order is also present.

A.5.3.1Partially Ordered Space¶

A partial order $\preceq$ on topological space $V$ is called closed if, given any two nets $(u_\alpha)_{\alpha \in \Lambda}$ and $(v_\alpha)_{\alpha \in \Lambda}$ contained in $V$ ,

u_\alpha \to u, \;\; v_\alpha \to v \; \text{ and } \; u_\alpha \preceq v_\alpha \text{ for all } \alpha \in \Lambda \quad \implies \; u \preceq v.

(A.21)

A partially ordered space, also called a pospace, is a Hausdorff topological space endowed with a closed partial order. (We make the Hausdorff assumption so that sequences have unique limits.)

The next lemma connects topological and order convergence in partially ordered space $V = (V, \preceq)$ . In the statement, $(v_\alpha)_{\alpha \in \Lambda}$ is a net in $V$ .

The next lemma shows how global stability (see Section A.2.2.1) interacts with order stability in the setting of partially ordered space.

Proof

Let $S, V$ have the stated properties and let $\bar v$ be the unique fixed point of $S$ in $V$ . If $v \in V$ and $v \preceq S \, v$ , then, iterating on this inequality and using the fact that $S$ is order preserving, we have $v \preceq S^n v$ for all $n \in \NN$ . Since the partial order is closed and $S$ is globally stable, taking the limit gives $v \preceq \bar v$ . Using this inequality and $v \preceq S^n v$ , we have $v \preceq S^n v \preceq S^n \bar v = \bar v$ for all $n$ . Since $S^n v \to \bar v$ , Lemma A.5.18 implies that $S^n v \uparrow \bar v$ . This proves one direction of the definition of strong order stability. The proof of the other direction is similar. ◻

The following result can be used to compare fixed points of operators. In the statement, $V = (V, \preceq)$ is a pospace and $\sS(V)$ is all self-maps on $V$ , ordered pointwise (i.e., for $S, T \in \sS(V)$ , we have $S \preceq T$ if and only if $Sv \preceq Tv$ for all $v \in V$ ).

A.5.3.2Partially Ordered Metric Space¶

A partially ordered metric space is a tuple $(V, \preceq, d)$ where $(V, \preceq)$ is a poset, $d$ is a metric on $V$ , and $(V, \preceq)$ is a pospace under the topology induced by $d$ on $V$ . In particular, $\preceq$ is closed with respect to $d$ , so that

d(v_n, v) \to 0 \text{ and } d(u_n , u) \to 0 \text{ with } u_n \preceq v_n \text{ for all } n \text{ implies } u \preceq v \text{.}

On a partially ordered metric space $V = (V, \preceq, d)$ , the metric $d$ is called sup-nonexpansive if, for all subsets $(v_\alpha)$ and $(w_\alpha)$ of $V$ , we have

d \left( \vee_\alpha \, v_\alpha, \vee_\alpha \, w_\alpha \right) \leq \sup_\alpha \, d(v_\alpha, w_\alpha)

(A.22)

whenever the suprema exist.

Sup-nonexpansive metrics will be useful for us because contraction properties are passed from collections of mappings to their upper envelopes. The next lemma explains. In the statement,

$(V, \preceq, d)$ is a partially ordered metric space,
$\TT \coloneq \setntn{T_\sigma}{\sigma \in \Sigma}$ is a collection of self-maps on $V$ ,
$V_0$ is a subset of $V$ and $Tv \coloneq \vee_\sigma T_\sigma \, v$ exists at each $v \in V_0$ .

A.5.3.3Banach Lattices¶

If $(v_n)$ is a sequence in $E$ then convergence is as defined for sequences in normed linear space (see Section A.4.2.1): $v_n \to v$ means that $\| v_n - v \| \to 0$ as $n \to \infty$ . This should not be confused with $v_n \uparrow v$ and $v_n \downarrow v$ , which are defined in terms of suprema and infima (see Section A.5.1.3). Some relationships between the different forms of convergence are discussed in the next theorem.

A.5.3.4Order Units¶

Let $E$ be a Banach lattice. An element $e \in E_+$ will be called a normalized order unit if $\|e\| = 1$ and $|u| \leq \|u\|\, e$ for all $u \in E$ . For example, if $E = b\Xsf$ , then $e = \1$ is a normalized order unit.^[5]

Proof

Let $\{u_\alpha\}$ and $\{v_\alpha\}$ be subsets of $E$ such that $\vee_\alpha u_\alpha$ and $\vee_\alpha v_\alpha$ exist. Let $c = \sup_\alpha \|u_\alpha - v_\alpha\|$ and assume $c < \infty$ , since otherwise the claim is trivial. For each $\alpha$ , we have

u_\alpha = v_\alpha + (u_\alpha - v_\alpha) \leq v_\alpha + |u_\alpha - v_\alpha| \leq \vee_\beta v_\beta + \|u_\alpha - v_\alpha\|\,e \leq \vee_\beta v_\beta + c\,e.

(A.23)

Since $\vee_\beta v_\beta + c\,e$ is an upper bound for every $u_\alpha$ , we have $\vee_\alpha u_\alpha \leq \vee_\beta v_\beta + c\,e$ . By symmetry, $\vee_\beta v_\beta \leq \vee_\alpha u_\alpha + c\,e$ . Together these give $\left|\vee_\alpha u_\alpha - \vee_\beta v_\beta\right| \leq c\,e$ , and since $\|e\| = 1$ ,

\Bigl\|\vee_\alpha u_\alpha - \vee_\alpha v_\alpha\Bigr\| \leq c = \sup_{\alpha} \|u_\alpha - v_\alpha\|.

(A.24)

◻
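The inequality (A.24) can be checked numerically in the case $E = b\Xsf$ with $\Xsf$ finite, where the norm is the supremum norm and $e = \1$ . The arrays below are randomly generated and purely illustrative; rows index $\alpha$ and columns index points of $\Xsf$ .

```python
import numpy as np

rng = np.random.default_rng(0)

# Rows index alpha, columns index points of a finite set X (both hypothetical).
U = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))


def sup_norm(x):
    """Supremum norm of a function on the finite set X."""
    return np.max(np.abs(x))


# Pointwise suprema over alpha are column-wise maxima.
lhs = sup_norm(U.max(axis=0) - V.max(axis=0))
rhs = max(sup_norm(U[a] - V[a]) for a in range(U.shape[0]))
print(lhs, rhs, lhs <= rhs)
```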

A.5.3.5Weighted Sup-Norm Spaces¶

In this section we introduce a class of Banach lattices that are useful for handling unbounded dynamic programming problems. To this end, let $\Xsf$ be a topological space. A weight function on $\Xsf$ is a mapping $\ell \in m\Xsf$ with $\ell(x) \geq 1$ for all $x \in \Xsf$ . Given a weight function $\ell$ and $v \in \RR^\Xsf$ we introduce the $\ell$ -weighted supremum norm

\| v \|_\ell \coloneq \sup_{x \in \Xsf} \, \frac{|v(x)|}{\ell(x)}.

In this setting, we let

Elements of $b_\ell \Xsf$ are called $\ell$ -bounded functions.
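To see why weighting helps with unbounded functions, consider $v(x) = x^2$ on $\RR$ , which has infinite supremum norm but is $\ell$ -bounded for $\ell(x) = 1 + x^2$ . The sketch below evaluates both quantities on a finite grid standing in for $\RR$ ; the grid, the weight function and the helper names are assumptions made for illustration.

```python
import numpy as np

# Finite grid standing in for X = R; the true suprema are over all of R.
x = np.linspace(-50.0, 50.0, 10_001)


def weight(x):
    """Hypothetical weight function, bounded below by one."""
    return 1.0 + x**2


def weighted_sup_norm(v_vals, ell_vals):
    """Grid approximation of sup_x |v(x)| / ell(x)."""
    return np.max(np.abs(v_vals) / ell_vals)


v_vals = x**2
plain_sup = np.max(np.abs(v_vals))                 # grows without bound as the grid widens
weighted = weighted_sup_norm(v_vals, weight(x))    # stays below one for any grid
print(plain_sup, weighted)
```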

The next theorem gives conditions under which the spaces discussed above are Banach lattices. Proofs can be found in §12.2.1 of Stachurski (2022).

A.5.3.6Positive Operators on Banach Lattices¶

If $E$ is a Banach lattice, then, as in Section A.4.3, we take $\blop(E)$ to be the norm bounded (and hence norm continuous) linear self-maps on $E$ . Let $\blop_+(E)$ be the positive linear self-maps on $E$ .

Continuing in the setting of Exercise A.5.26, the pointwise partial order on $\blop(E)$ is defined by $A \leq B$ whenever $Av \leq Bv$ for all $v \in E$ . The set $\blop_+(E)$ coincides with the positive cone of $\blop(E)$ . On this positive cone, the spectral radius is order preserving:

A Banach lattice $E$ is said to have a $\sigma$ -order continuous norm if

(v_n) \subset E \text{ and } v_n \downarrow 0 \quad \implies \quad \| v_n \| \to 0.

In the next example, $L_p$ is the Banach lattice discussed in Example A.5.7.

A.5.4Markov Models¶

Many dynamic programs have some form of Markov structure (or can be coerced into a Markov framework by suitably changing the state space). Here we review key ideas related to Markov processes and state some useful results.

In all of this section (Section A.5.4), $\Xsf$ is a metric space with Borel sets $\bB$ . The symbol $\dD(\Xsf)$ is the set of all distributions (Borel probability measures) on $\Xsf$ . If $\Xsf$ is finite, then the metric on $\Xsf$ is the discrete metric (under which all real-valued functions on $\Xsf$ are continuous and $\bB$ is the set of all subsets of $\Xsf$ ).

A.5.4.1Stochastic Kernels¶

Let $\Usf$ be a second metric space. A transition kernel from $\Usf$ to $\Xsf$ is a function $N$ from $\Usf \times \bB$ to $\RR_+$ with the property that $u \mapsto N(u,B)$ is Borel measurable for each $B \in \bB$ and $B \mapsto N(u, B)$ is a measure on $(\Xsf, \bB)$ for all $u \in \Usf$ . A stochastic kernel from $\Usf$ to $\Xsf$ is a transition kernel $P$ from $\Usf$ to $\Xsf$ satisfying $P(u, \Xsf) =1$ for all $u \in \Usf$ . Informally, the stochastic kernel $P$ takes a point $u \in \Usf$ and randomly “transitions” to a new point in $\Xsf$ via the distribution $P(u, \diff x)$ .

A common setting is where $\Usf = \Xsf$ . In this case we say that $N$ is a transition kernel on $\Xsf$ , while $P$ is a stochastic kernel on $\Xsf$ .

An $\Xsf$ -valued stochastic process $(X_t)_{t =0}^\infty$ on $(\Omega, \fF, \PP)$ is called $(P, \psi)$ -Markov if $X_0 \eqdist \psi$ and $\PP\{ X_{t+1} \in B \given X_t\} = P(X_t, B)$ with probability one for all $B \in \bB$ and all $t \geq 0$ . If $\psi = \delta_x$ for some $x \in \Xsf$ , then we say $(X_t)_{t \geq 0}$ is $(P, x)$ -Markov. We also say that $(X_t)_{t \geq 0}$ is $P$ -Markov if $(X_t)_{t \geq 0}$ is $(P, \psi)$ -Markov for some $\psi \in \dD(\Xsf)$ .

Given any stochastic kernel $P$ on $\Xsf$ and any initial condition $x \in \Xsf$ , a $(P, x)$ -Markov process always exists. In particular, we can take the canonical construction: set $\Omega = \Xsf^\infty$ , let $X_t(\omega) = \omega_t$ be the coordinate projections, and let $\PP_x$ be the unique probability measure on $\Xsf^\infty$ (equipped with the product $\sigma$ -algebra) such that $X_0 = x$ $\PP_x$ -a.s. and $\PP_x\{X_{t+1} \in B \given X_0, \ldots, X_t\} = P(X_t, B)$ . Existence of $\PP_x$ follows from the Ionescu-Tulcea theorem (see, e.g., Meyn & Tweedie (2009), Chapter 3).

Let $\theta \colon \Xsf^\infty \to \Xsf^\infty$ denote the shift operator defined by $\theta(x_0, x_1, \ldots) = (x_1, x_2, \ldots)$ . The following is the Markov property in its general form: for the canonical chain, if $f \colon \Xsf^\infty \to \RR$ is measurable and either nonnegative or bounded, then

\EE_x [ f \circ \theta \given X_1 ] = \EE_{X_1} f \quad \PP_x\text{-a.s.}

(A.25)

For a proof see Meyn & Tweedie (2009), p. 63.

Example A.5.11 (Stochastic recursive sequence)

Suppose $(X_t)_{t \geq 0}$ is defined by

X_{t+1} = F(X_t, W_{t+1}), \quad (W_t)_{t \geq 1} \iidsim \phi, \quad X_0 \sim \psi

(A.26)

where $(W_t)_{t \geq 1}$ are IID random elements taking values in metric space $\Zsf$ , $F \colon \Xsf \times \Zsf \to \Xsf$ is Borel measurable, and $X_0$ and $(W_t)_{t \geq 1}$ are defined on a common probability space $(\Omega, \fF, \PP)$ and are jointly independent. As a stochastic process, $(X_t)$ is $P$ -Markov when

P(x,B) \coloneq \PP\{F(x,W_{t+1}) \in B\} = \int \1_B[F(x,z)] \phi(\diff z) \qquad (x \in \Xsf, \; B \in \bB).

(A.27)
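Formula (A.27) suggests a simple Monte Carlo approximation of the kernel: draw shocks from $\phi$ and record how often $F(x, W)$ lands in $B$ . The sketch below assumes a linear Gaussian specification $F(x, w) = a x + s w$ with $\phi$ standard normal, for which $P(x, B)$ is available in closed form when $B = (-\infty, c]$ . The parameter values and function names are hypothetical.

```python
import math

import numpy as np

rng = np.random.default_rng(1234)

a, s = 0.9, 0.5                         # hypothetical AR(1) parameters


def F(x, w):
    """Update rule X_{t+1} = F(X_t, W_{t+1})."""
    return a * x + s * w


def kernel_mc(x, upper, n=200_000):
    """Monte Carlo estimate of P(x, B) for B = (-inf, upper]."""
    w = rng.standard_normal(n)          # draws from phi = N(0, 1)
    return np.mean(F(x, w) <= upper)


def kernel_exact(x, upper):
    """Closed form P(x, B) = Phi((upper - a x) / s) under the same assumptions."""
    z = (upper - a * x) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


estimate = kernel_mc(1.0, 1.2)
exact = kernel_exact(1.0, 1.2)
print(estimate, exact)
```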

A.5.4.2Markov Operators¶

As before, let $m\Xsf$ be all the measurable functions on $\Xsf$ and let $P$ be a stochastic kernel on $\Xsf$ . Given $h \in m\Xsf$ we set

(P h)(x) \coloneq \int h(x') P(x, \diff x') \qquad (x \in \Xsf)

(A.28)

whenever the integral is well-defined. We call $P$ the Markov operator generated by the stochastic kernel $P$ . We use the same symbol because stochastic kernels and Markov operators can be placed in one-to-one correspondence via

P(x,B) = (P \1_B)(x) \qquad (x \in \Xsf, \, B \in \bB).

(A.29)

(The stochastic kernel is on the left and the Markov operator is on the right.)

(We already studied a version of $P$ in Example A.4.6.) Intuitively, $(P h)(x)$ represents the expectation of $h(X_{t+1})$ given $X_t = x$ . We extend this interpretation below.

Given a stochastic kernel $P$ on $\Xsf$ , a distribution $\phi \in \dD(\Xsf)$ is called stationary for $P$ if

\phi(B) = \int P(x, B) \phi(\diff x) \quad \text{ for all } B \in \bB.
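When $\Xsf$ is finite, a stochastic kernel is a row-stochastic matrix, the Markov operator is matrix-vector multiplication, and a stationary distribution is a left eigenvector for the eigenvalue one. The following sketch, with a hypothetical three-state kernel, illustrates (A.28), (A.29) and the stationarity condition.

```python
import numpy as np

# Hypothetical stochastic kernel on X = {0, 1, 2}; row x is the distribution P(x, .).
P = np.array([[0.9, 0.1, 0.0],
              [0.4, 0.4, 0.2],
              [0.1, 0.1, 0.8]])

h = np.array([1.0, 2.0, 5.0])           # a function on X
Ph = P @ h                              # (Ph)(x) = sum over x' of h(x') P(x, x')

ind_B = np.array([0.0, 1.0, 1.0])       # indicator of B = {1, 2}
P_B = P @ ind_B                         # recovers P(x, B), as in (A.29)

# Stationary distribution: left eigenvector of P for the eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(P.T)
phi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
phi = phi / phi.sum()                   # normalize to a probability vector

print(Ph, P_B, phi)
```

Since `phi` is stationary, integrating `Ph` against `phi` gives the same value as integrating `h` against `phi`, which can be confirmed by comparing `phi @ Ph` with `phi @ h`.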

A.5.4.3General Properties¶

The following lemma lists useful properties of the Markov operator (A.28) when considered as a linear operator on $b\Xsf$ , the set of bounded Borel measurable functions on $\Xsf$ . Proofs can be found in Meyn & Tweedie (2009), Chapter 3.

(Note that (vi) follows from (v) and Gelfand’s formula for the spectral radius.)

Here is a fundamental result linking the stochastic kernel $P$ , Markov operator $P$ , and any $P$ -Markov process $(X_t)_{t \geq 0}$ . For a proof see Meyn & Tweedie (2009), Proposition 3.4.2.

A.5.4.4Markov Operators on Integrable Functions¶

Sometimes we wish to consider Markov operators as linear operators over a space of integrable functions. To this end, let $P$ be a stochastic kernel on $\Xsf$ and let $\phi$ be stationary for $P$ . As before, we use the same symbol $P$ for the Markov operator defined in (A.28). The space $L_1(\phi) \coloneq L_1(\Xsf, \bB, \phi)$ is the Banach lattice discussed in Example A.5.8.

The proof of part (i) follows from the following adjoint rule:

If we apply (A.32) with stationary $\psi$ we get

\int Ph \diff \psi = \int h \diff \psi.

(A.33)

To obtain (i) of Lemma A.5.32 we can use $\int |Ph| \diff \psi \leq \int P|h| \diff \psi$ and then apply (A.33) with $h$ replaced by $|h|$ . This proves that $Ph$ is $\psi$ -integrable whenever $h$ is $\psi$ -integrable. Linearity of $P$ is immediate. Part (ii) follows from order preservation of the integral. For (iii), $\|P\| \leq 1$ follows from the bound on $\int |Ph| \diff \psi$ just obtained, and $\|P\| \geq 1$ from $P\1 = \1$ . Part (iv) follows from (iii) and Gelfand’s formula.

A.5.5Orders over Distributions¶

Distributions are objects that decision makers naturally have preferences over. For example, speculators care about probability distributions over returns of prospective investments, often preferring distributions that offer high average returns with low risk. A planner might have preferences over the cross-sectional distributions of consumption and wealth. In this section, we discuss common methods for ordering distributions and their relationships with each other.

A.5.5.1Stochastic Dominance¶

Let $\Xsf$ be a metric space and let $\dD(\Xsf)$ be the set of all distributions (i.e., Borel probability measures) on $\Xsf$ . Let $ib\Xsf$ be the increasing bounded real-valued functions on $\Xsf$ . For $\mu$ and $\nu$ in $\dD(\Xsf)$ , we say that

$\nu$ first order stochastically dominates $\mu$ and write $\mu \lefsd \nu$ if

\int u(x) \mu(\diff x) \leq \int u(x) \nu(\diff x) \; \text{ for every } u \text{ in } ib\Xsf \text{ and}

$\nu$ second order stochastically dominates $\mu$ and write $\mu \lessd \nu$ if

\int u(x) \mu(\diff x) \leq \int u(x) \nu(\diff x) \; \text{ for every concave } u \text{ in } ib\Xsf.

If we refer to stochastic dominance without explicitly stating the order, then the understanding is that we mean first order stochastic dominance.

Suppose now that $\Xsf$ is a Borel subset of $\RR$ and fix $F, G \in \dD(\Xsf)$ . We understand $F$ and $G$ as cumulative distribution functions. When testing first order stochastic dominance, it is sufficient to restrict attention to increasing functions $u \in b\Xsf$ that take the form $u(x) = \1\{a < x\}$ for some $a \in \Xsf$ (see, e.g., Stachurski (2022), §9.4.1). Recalling the interpretation of the integral given in (A.6), this leads to the statement that $F \lefsd G$ if and only if $1 - F(a) \leq 1 - G(a)$ for all $a \in \Xsf$ , or

F \lefsd G \iff G(x) \leq F(x) \quad \text{ for all } x \in \Xsf

(A.34)
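
The characterization in (A.34) is easy to check numerically for distributions with finite support. The following is a minimal sketch, assuming two hypothetical probability mass functions `f_pmf` and `g_pmf` on the common ordered support $\{0, 1, 2, 3\}$ ; the helper names `cdf` and `fosd_leq` are illustrative and not part of the text.

```python
from itertools import accumulate


def cdf(pmf):
    """Cumulative sums of a pmf listed in increasing order of the support."""
    return list(accumulate(pmf))


def fosd_leq(f_pmf, g_pmf, tol=1e-12):
    """True if F is dominated by G, tested as G(x) <= F(x) at every support point."""
    F, G = cdf(f_pmf), cdf(g_pmf)
    return all(Gx <= Fx + tol for Fx, Gx in zip(F, G))


# Hypothetical distributions on the ordered support {0, 1, 2, 3}
support = [0, 1, 2, 3]
f_pmf = [0.4, 0.3, 0.2, 0.1]  # more mass on low outcomes
g_pmf = [0.1, 0.2, 0.3, 0.4]  # more mass on high outcomes

f_below_g = fosd_leq(f_pmf, g_pmf)  # does G dominate F?
g_below_f = fosd_leq(g_pmf, f_pmf)  # does F dominate G?


def u(x):
    """An increasing bounded test function (one arbitrary choice)."""
    return min(x, 2)


# Cross-check against the definition using the single test function u
mean_u_f = sum(u(x) * p for x, p in zip(support, f_pmf))
mean_u_g = sum(u(x) * p for x, p in zip(support, g_pmf))

print(f_below_g, g_below_f, mean_u_f <= mean_u_g)
```

The tolerance `tol` only guards against floating point rounding in the cumulative sums. Checking one test function does not establish dominance by itself; it is included here only as a consistency check on the CDF comparison.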

A.5.5.2Monotone Likelihood Ratios¶

Here is a property that implies first order stochastic dominance: Consider a pair of distributions $(F, G)$ with positive densities $f$ and $g$ on an interval $I$ contained in $\RR$ . We say that $(f, g)$ has a monotone likelihood ratio if $f/g$ is increasing on $I$ ; that is, if

x, x' \in I \text{ and } x \leq x' \implies \frac{f(x)}{g(x)} \leq \frac{f(x')}{g(x')}

(A.35)

Since $r$ is increasing in $x$ , the monotone likelihood ratio property holds.

Proof

We show that if $(f, g)$ has a monotone likelihood ratio, then $G \lefsd F$ . Let $a := \inf I$ and $b := \sup I$ . (These values can be infinite.) Writing the monotone likelihood ratio property as

x \leq x' \implies f(x) g(x') \leq f(x') g(x)

(A.36)

and integrating with respect to $x$ from $a$ to $x'$ gives $F(x') g(x') \leq f(x') G(x')$ . Also, integrating (A.36) with respect to $x'$ from $x$ to $b$ gives $f(x) [1 - G(x)] \leq [1-F(x)] g(x)$ . Setting $x = x' = y$ in the last two inequalities yields

\frac{1 - G(y)}{1 - F(y)} \leq \frac{g(y)}{f(y)} \leq \frac{G(y)}{F(y)} .

This implies $F(y) \leq G(y)$ for arbitrary $y$ , so $G \lefsd F$ . ◻
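
As a numerical illustration of this argument, consider a pair of exponential distributions on $I = (0, \infty)$ . The sketch below assumes hypothetical rates `rate_f = 1.0` and `rate_g = 2.0` , so that $f/g$ is proportional to $e^{x}$ and hence increasing. It checks, on a finite grid only, the monotone likelihood ratio property, the two-sided bound on $g/f$ from the proof, and the resulting ordering $F \leq G$ .

```python
import math

# Hypothetical rates: f has the smaller rate, hence the heavier right tail
rate_f, rate_g = 1.0, 2.0


def f(x):
    return rate_f * math.exp(-rate_f * x)  # density of F


def g(x):
    return rate_g * math.exp(-rate_g * x)  # density of G


def F(x):
    return 1.0 - math.exp(-rate_f * x)  # cdf of F


def G(x):
    return 1.0 - math.exp(-rate_g * x)  # cdf of G


grid = [0.05 * k for k in range(1, 201)]  # grid points in (0, 10]

# Monotone likelihood ratio: f/g is increasing along the grid
ratios = [f(x) / g(x) for x in grid]
mlr_holds = all(r0 <= r1 for r0, r1 in zip(ratios, ratios[1:]))

# Two-sided bound from the proof: (1 - G)/(1 - F) <= g/f <= G/F
sandwich_holds = all(
    (1 - G(y)) / (1 - F(y)) <= g(y) / f(y) <= G(y) / F(y) for y in grid
)

# Conclusion of the proof: F <= G pointwise, so F dominates G
cdf_ordered = all(F(y) <= G(y) for y in grid)

print(mlr_holds, sandwich_holds, cdf_ordered)
```

A grid check of this kind is a sanity test rather than a proof; the inequalities for all $y \in I$ follow from the argument above.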

A.5.5.3Mean-Preserving Spreads¶

We will be concerned with analyzing how behavior changes when decisions become “riskier” in some sense. To analyze such scenarios, we introduce the notion of a mean-preserving spread. In particular, for a given distribution $\phi$ , we say that $\psi$ is a mean-preserving spread of $\phi$ if there exists a pair of random variables $(Y, Z)$ such that

\EE[Z \given Y] = 0, \quad Y \eqdist \phi \quad \text{and } \; Y + Z \eqdist \psi

Thus, $\psi$ is a mean-preserving spread of $\phi$ if it adds noise without changing the mean.
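
To make the definition concrete, here is a small sketch that builds a mean-preserving spread of a discrete distribution. It assumes a hypothetical $\phi$ that is uniform on $\{1, 2, 3\}$ and noise $Z$ equal to $\pm 1$ with equal probability, drawn independently of $Y$ , so that $\EE[Z \given Y] = 0$ holds. The names `phi` , `noise` , `psi` and the concave function `u` are illustrative choices.

```python
from collections import defaultdict
from fractions import Fraction
import math

# Hypothetical phi: Y is uniform on {1, 2, 3}
phi = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}

# Noise Z is -1 or +1 with equal probability, independent of Y, so E[Z | Y] = 0
noise = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

# Distribution psi of Y + Z, obtained by summing over all (y, z) pairs
psi = defaultdict(Fraction)
for y, p in phi.items():
    for z, q in noise.items():
        psi[y + z] += p * q


def mean(dist):
    """Exact mean of a discrete distribution given as {point: probability}."""
    return sum(x * p for x, p in dist.items())


def expect(func, dist):
    """Expectation of func under a discrete distribution (floating point)."""
    return sum(func(x) * float(p) for x, p in dist.items())


def u(x):
    """A concave, increasing function that is bounded on the support."""
    return -math.exp(-x)


mean_phi, mean_psi = mean(phi), mean(psi)
eu_phi, eu_psi = expect(u, phi), expect(u, psi)

print(mean_phi, mean_psi)  # the two means coincide
print(eu_psi <= eu_phi)    # the concave expectation is lower under the spread
```

Exact rational arithmetic via `Fraction` is used for the means so that equality can be tested without rounding error. The comparison of `eu_psi` and `eu_phi` reflects the fact that adding conditionally mean-zero noise weakly lowers the expectation of a concave function.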

Solution to Exercise A.5.31

Let $\phi$ be a mean-preserving spread of $\psi$ . Then there exists a random pair $(Y, Z)$ such that

Y \eqdist \psi, \quad Y + Z \eqdist \phi \quad \text{and } \; \EE[Z \given Y] = 0.

Fixing arbitrary concave $u \in ib\RR$ and applying Jensen’s inequality,

\EE\, u(Y + Z) = \EE \, \EE[ u(Y + Z) \given Y] \leq \EE \, u ( \EE[ Y + Z \given Y] ) = \EE \, u ( Y ).

Therefore $\int u(x) \phi(\diff x) = \EE \, u(Y + Z) \leq \EE u(Y) = \int u(x) \psi(\diff x)$ . Since $u$ was an arbitrary concave function in $ib\RR$ , we conclude that $\phi \lessd \psi$ .

A.6Chapter Notes¶

Good introductions to real analysis include Bartle & Sherbert (2011) and Aliprantis & Burkinshaw (1998). For topology, measure theory, and functional analysis at an advanced level, Aliprantis & Border (2006) provides comprehensive coverage, while Dudley (2002) and Kreyszig (1978) offer accessible treatments. Fixed point theory in metric spaces is developed in Goebel & Kirk (1990), and nonlinear optimization is covered in Jahn (2020).

For lattice theory and order, Davey & Priestley (2002) provides a thorough introduction. High quality monographs on Riesz spaces, Banach lattices and positive operators include Aliprantis & Border (2006), Aliprantis & Burkinshaw (2006), Zaanen (2012), Meyer-Nieberg (2012), and Bátkai et al. (2017).

For further reading on probability theory and stochastic processes, Pollard (2002), Çınlar (2011) and Dudley (2002) are outstanding. For Markov chains and stochastic stability, the standard reference is Meyn & Tweedie (2009).

Footnotes¶

More precisely, $\sigma(\cC)$ is the intersection of all $\sigma$ -algebras on $\Xsf$ that contain $\cC$ . One can show that $\sigma(\cC)$ is always a well defined $\sigma$ -algebra, since the family of such $\sigma$ -algebras is nonempty (it contains at least $\wp(\Xsf)$ ) and any intersection of $\sigma$ -algebras is again a $\sigma$ -algebra.
For example, the pointwise limit of the sequence of functions $\{f_n\}$ given by $f_n(x) = x^n$ on $[0, 1]$ is discontinuous.
A more general perspective on (A.6) that you might find useful is as follows. Suppose we identify measurable sets with their indicator functions. Then $\mu$ already provides us with an “integral” over the indicators in $m\aA_+$ . The map $I_\mu$ extends the reach of this function to all of $m\aA_+$ .
Some authors would call what we have described a real vector space, a notion that can then be extended to complex vector spaces. We have no need for this extension here, so we drop the adjective “real.”
Not all Banach lattices have normalized order units. In fact it can be shown that a Banach lattice $E$ admits a normalized order unit if and only if $E$ is an AM-space with unit. We omit these details.

References¶

Bartle, R. G., & Sherbert, D. R. (2011). Introduction to real analysis (4th ed.). Hoboken, NJ: Wiley.
Aliprantis, C. D., & Border, K. C. (2006). Infinite dimensional analysis: a hitchhiker’s guide (3rd ed.). Springer-Verlag, New York.
Dudley, R. M. (2002). Real analysis and probability (Vol. 74). Cambridge University Press.
Jahn, J. (2020). Introduction to the theory of nonlinear optimization (4th ed.). Springer Nature.
Goebel, K., & Kirk, W. A. (1990). Topics in metric fixed point theory. Cambridge University Press.
Çınlar, E. (2011). Probability and stochastics (Vol. 261). Springer Science & Business Media.
Kechris, A. (2012). Classical descriptive set theory (Vol. 156). Springer Science & Business Media.
Jänich, K. (1994). Linear algebra. Undergraduate Texts in Mathematics. Springer-Verlag, New York.
Aliprantis, C. D., & Burkinshaw, O. (1998). Principles of real analysis (3rd ed.). Academic Press.
Kreyszig, E. (1978). Introductory functional analysis with applications (Vol. 1). Wiley, New York.
Davey, B. A., & Priestley, H. A. (2002). Introduction to lattices and order. Cambridge University Press.
Zaanen, A. C. (2012). Introduction to operator theory in Riesz spaces. Springer.
Stachurski, J. (2022). Economic dynamics: theory and computation (2nd ed.). MIT Press.
Meyn, S. P., & Tweedie, R. L. (2009). Markov chains and stochastic stability. Cambridge University Press.
Aliprantis, C. D., & Burkinshaw, O. (2006). Positive operators (Vol. 119). Springer Science & Business Media.
