Common Symbols and Terminology - Dynamic Programming Volume II: General States

$\1\{P\}$	indicator function (1 if statement $P$ is true, 0 otherwise)
$\alpha \coloneq 1$	$\alpha$ is defined as equal to 1
$f \equiv 1$	function $f$ is everywhere equal to 1
$\bigvee$ and $\bigwedge$	supremum and infimum (see Section A.1.2.4)
$\wp(A)$	the power set of $A$; that is, the set of all subsets of $A$
$\natset{n}$	$\{1, \ldots, n\}$
$\CC$	the complex numbers
$\NN$, $\ZZ$ and $\RR$	the natural numbers, integers and real numbers, respectively
$\ZZ_+$, $\RR_+$, etc.	the nonnegative elements of $\ZZ$, $\RR$, etc.
$|x|$	absolute value of scalar or vector $x$ (modulus if $x \in \CC$)
$|B|$ for set $B$	the cardinality of $B$
$\RR^n$	all $n$-tuples of real numbers
$\RR^{m \times n}$	all $m \times n$ real matrices
$x \leq y \;\; (x,y \in \RR^n)$	$x_i \leq y_i$ for $i = 1, \ldots, n$ (pointwise partial order)
$\RR^\Xsf$	all functions from $\Xsf$ to $\RR$
$b\Xsf$	all bounded (or bounded measurable) functions in $\RR^\Xsf$ (see Example A.1.14)
$bc\Xsf$	all continuous functions in $b\Xsf$ (see Example A.1.17)
$f(n) = \OO(\beta^n)$	there exists $C < \infty$ with $f(n) \leq C \beta^n$ for all $n \in \NN$
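As a small illustration of the pointwise partial order, the indicator function, and suprema and infima on $\RR^n$, the following sketch may help. The function names `leq` and `indicator` and the vectors `x` and `y` are hypothetical choices made for this example only; they do not come from the text.

```python
import numpy as np

def leq(x, y):
    """Pointwise partial order on R^n: x <= y iff x_i <= y_i for every i."""
    return bool(np.all(np.asarray(x) <= np.asarray(y)))

def indicator(statement):
    """Indicator function: 1 if the statement is true, 0 otherwise."""
    return 1 if statement else 0

# Two illustrative vectors in R^2 (assumed values, not from the text).
x = np.array([1.0, 3.0])
y = np.array([2.0, 2.0])

# Neither x <= y nor y <= x holds, so the order is partial rather than total.
comparable = leq(x, y) or leq(y, x)

# Under the pointwise order, the supremum and infimum of the pair {x, y}
# are the coordinatewise maximum and minimum.
sup_xy = np.maximum(x, y)
inf_xy = np.minimum(x, y)
```

Here `x` and `y` are not comparable, yet both lie between `inf_xy` and `sup_xy` in the pointwise order.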

$\dD(\Xsf)$	the set of Borel probability measures on $\Xsf$ (see Section A.5.4)
$\blop(E, F)$	the set of bounded linear operators from $E$ to $F$ (see Section A.4.3)
$\la a, b \ra$	the inner product of $a$ and $b$
$v_n \uparrow v$	$(v_n)$ is increasing and $\bigvee_n v_n = v$ (see Section A.1.2.6)
IID	independent and identically distributed
$X \eqdist Y$	$X$ and $Y$ have the same distribution
$X \sim F$	$X$ has distribution $F$
$F \lefsd G$	$F$ first order stochastically dominates $G$ (see Section A.5.5)

$(V, \TT)$	an abstract dynamic program (ADP) with value space $V$ and policy operators $T_\sigma \in \TT$ (see Section 2.1.1.1)
$v_\sigma$	a $\sigma$-value function; fixed point of $T_\sigma$ (see Section 2.1.1.1)
$T$	the Bellman operator, defined by $Tv = \bigvee_\sigma T_\sigma \, v$ (see (2.4))
$H$	the Howard operator, defined by $Hv = v_\sigma$ where $\sigma$ is $v$-greedy (see Section 2.2.1.1)
$W$	the optimistic policy operator (see (2.6))
$V_G$	all $v \in V$ with at least one $v$-greedy policy (see Section 2.1.1.4)
$V_U$	all $v \in V_G$ such that $v \preceq Tv$ (see Section 2.1.1.4)
$V_\Sigma$	the set of fixed points of the policy operators (see Section 2.1.1.4)
$\vmax$	the value function; greatest element of $V_\Sigma$ (see Section 2.1.2.1)
VFI	value function iteration (see Section 1.2.1.3 and Section 2.2.1)
OPI	optimistic policy iteration (see Section 1.2.1.3 and Section 2.2.1)
HPI	Howard policy iteration (see Section 1.2.1.3 and Section 2.2.1)
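To make the operators $T_\sigma$, $T$, $H$ and $W$ concrete, here is a minimal sketch for a small finite-state discounted problem. This setting is an assumption made for illustration; the text itself works with general states and abstract policy operators. The rewards `r`, transition kernel `P`, discount factor `beta`, step count `m`, and all function names are hypothetical. In particular, `W` is written as $m$ applications of $T_\sigma$ with $\sigma$ a $v$-greedy policy, which is one common form of the optimistic policy operator and may differ in detail from the definition in (2.6).

```python
import numpy as np

# Hypothetical finite problem: 2 states, 2 actions (illustrative values only).
beta = 0.9                                   # discount factor
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])                   # r[x, a]: reward
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.8, 0.2], [0.2, 0.8]]])     # P[x, a, x']: transition probabilities

def q(v):
    """Action values: r(x, a) + beta * sum_x' P(x, a, x') v(x')."""
    return r + beta * (P @ v)

def greedy(v):
    """A v-greedy policy: maximize the action value at each state."""
    return q(v).argmax(axis=1)

def T_sigma(v, sigma):
    """Policy operator for policy sigma."""
    return q(v)[np.arange(len(v)), sigma]

def T(v):
    """Bellman operator: pointwise supremum of T_sigma v over policies."""
    return q(v).max(axis=1)

def v_sigma(sigma):
    """sigma-value function: the fixed point of T_sigma, via a linear solve."""
    idx = np.arange(len(sigma))
    r_s, P_s = r[idx, sigma], P[idx, sigma]
    return np.linalg.solve(np.eye(len(sigma)) - beta * P_s, r_s)

def H(v):
    """Howard operator: Hv = v_sigma where sigma is v-greedy."""
    return v_sigma(greedy(v))

def W(v, m=5):
    """Optimistic policy operator (assumed form): m steps of T_sigma, sigma v-greedy."""
    sigma = greedy(v)
    for _ in range(m):
        v = T_sigma(v, sigma)
    return v

def iterate(op, v, tol=1e-10, max_iter=10_000):
    """Apply op repeatedly until successive iterates are within tol."""
    for _ in range(max_iter):
        v_new = op(v)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new
    return v

v0 = np.zeros(2)
v_vfi = iterate(T, v0)   # VFI: iterate the Bellman operator
v_hpi = iterate(H, v0)   # HPI: iterate the Howard operator
v_opi = iterate(W, v0)   # OPI: iterate the optimistic policy operator
```

In this toy problem the three iterations agree up to numerical tolerance, and the common limit is a fixed point of `T`.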

Mathematical Notation