Notation
Mathematical Notation
$\mathbb{1}\{P\}$ | indicator function (1 if statement $P$ is true, 0 otherwise) |
$\alpha := 1$ | $\alpha$ is defined as equal to 1 |
$f \equiv 1$ | function $f$ is everywhere equal to 1 |
$\bigvee$ and $\bigwedge$ | supremum and infimum (see Section A.1.2.4) |
$\wp(A)$ | the power set of $A$; that is, the set of all subsets of given set $A$ |
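$\mathbb{C}$ | the complex numbers |
$\mathbb{N}$, $\mathbb{Z}$ and $\mathbb{R}$ | the natural numbers, integers and real numbers respectively |
$\mathbb{Z}_+$, $\mathbb{R}_+$, etc. | the nonnegative elements of $\mathbb{Z}$, $\mathbb{R}$, etc. |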
$|x|$ | absolute value of scalar or vector $x$ (modulus if $x \in \mathbb{C}$) |
$|A|$ for set $A$ | the cardinality of $A$ |
$\mathbb{R}^n$ | all $n$-tuples of real numbers |
$\mathbb{R}^{n \times k}$ | all real $n \times k$ matrices |
for (pointwise partial order) | |
$\mathbb{R}^X$ | all functions from $X$ to $\mathbb{R}$ |
all bounded (or bounded measurable) functions in (see Example A.1.14) | |
all continuous functions in (see Example A.1.17) | |
there exists with for all |
the set of Borel probability measures on (see Section A.5.4) | |
the set of bounded linear operators from to (see Section A.4.3) | |
$\langle a, b \rangle$ | the inner product of $a$ and $b$ |
is increasing and (see Section A.1.2.6) | |
IID | independent and identically distributed |
$X \stackrel{d}{=} Y$ | $X$ and $Y$ have the same distribution |
$X \sim F$ | $X$ has distribution $F$ |
first order stochastically dominates (see Section A.5.5) |
Dynamic Programming Notation and Terminology
an ADP with value space and policy operators (see Section 2.1.1.1) | |
a -value function; fixed point of (see Section 2.1.1.1) | |
the Bellman operator, defined by (see (2.4)) | |
the Howard operator, defined by where is -greedy (see Section 2.2.1.1) | |
the optimistic policy operator (see (2.10)) | |
all with at least one -greedy policy (see Section 2.1.1.4) | |
all such that (see Section 2.1.1.4) | |
the set of fixed points of the policy operators (see Section 2.1.1.4) | |
the value function; greatest element of (see Section 2.1.2.1) | |
VFI | value function iteration (see Section 1.2.1.3 and Section 2.2.1) |
OPI | optimistic policy iteration (see Section 1.2.1.3 and Section 2.2.1) |
HPI | Howard policy iteration (see Section 1.2.1.3 and Section 2.2.1) |