B.1 Chapter 2 Results
B.2 Chapter 6 Results
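We adopt the setting of Section 6.1.1.2 and consider the claim

$$
\mathbb{E}_x \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t)
= \sum_{t=0}^\infty \mathbb{E}_x \left[ \prod_{i=0}^t \beta_i \right] h(X_t)
\tag{B.3}
$$

when $(X_t)$ is $P$-Markov with initial condition $X_0 = x$ and $h \in \mathbb{R}^{\mathsf{X}}$. Throughout this discussion the assumption $\rho(L) < 1$ is in force (see Theorem 6.1.1). Unlike the rest of the book, we assume some familiarity with measure theory, at the level of, say, Dudley (2002), Chapters 3 and 4.

To begin the discussion we set

$$
Y := \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t).
\tag{B.4}
$$

Our first aim is to show that $Y$ is a well-defined random variable, in the sense that the sum converges almost surely. Since absolute convergence of real series implies convergence, and since finite expectation implies finiteness almost everywhere, it suffices to show that

$$
\mathbb{E}_x \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] |h(X_t)| < \infty.
\tag{B.5}
$$

By the monotone convergence theorem (see, e.g., Dudley (2002), Theorem 4.3.2), we have

$$
\mathbb{E}_x \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] |h(X_t)|
= \sum_{t=0}^\infty \mathbb{E}_x \left[ \prod_{i=0}^t \beta_i \right] |h(X_t)|
= \sum_{t=0}^\infty (L^t |h|)(x),
$$

where the last equality is by (6.7). Since $\rho(L) < 1$, the series $\sum_{t \geq 0} L^t |h|$ converges, so we have shown that (B.5) holds, which in turn confirms that $Y$ is well-defined and finite almost surely.

Now let $Y_n := \sum_{t=0}^n \left[ \prod_{i=0}^t \beta_i \right] h(X_t)$ denote the $n$-th partial sum and observe that, on the probability one set where $Y$ is finite, we have $Y_n \to Y$ as $n \to \infty$. Moreover,

$$
|Y_n| \leq \bar{Y} := \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] |h(X_t)|
\quad \text{for all } n,
$$

and, as shown above, $\mathbb{E}_x \bar{Y} < \infty$. By the dominated convergence theorem, we now have $\mathbb{E}_x Y_n \to \mathbb{E}_x Y$, or, equivalently,

$$
\lim_{n \to \infty} \sum_{t=0}^n \mathbb{E}_x \left[ \prod_{i=0}^t \beta_i \right] h(X_t)
= \mathbb{E}_x \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t).
$$

Hence (B.3) holds.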
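The argument above can be illustrated numerically. The following is a minimal sketch, not part of the proof. It assumes that the setting of Section 6.1.1.2 uses discount factors of the form $b(X_{t-1}, X_t)$ with discount $1$ at $t = 0$, and a discount operator $L(x, x') = b(x, x') P(x, x')$. The two-state chain and all numbers below are hypothetical. For a finite horizon the expectation of the discounted sum is computed by enumerating every path and compared with the partial sum of $L^t h$. The dominating series built from $|h|$ is then checked to be finite.

```python
from itertools import product

# Hypothetical two-state example; every number here is an illustrative assumption.
P = [[0.9, 0.1], [0.2, 0.8]]        # transition matrix of the Markov chain
b = [[0.95, 0.90], [0.98, 0.92]]    # state-dependent discount factors b(x, x')
h = [1.0, -2.0]                     # payoff function, deliberately signed
states = range(2)

# Assumed form of the discount operator: L(x, x') = b(x, x') * P(x, x').
L = [[b[i][j] * P[i][j] for j in states] for i in states]

def apply(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in states) for i in states]

def partial_sum(g, n):
    """Return sum_{t=0}^{n} L^t g, accumulated term by term."""
    total, term = [0.0, 0.0], list(g)
    for _ in range(n + 1):
        total = [s + u for s, u in zip(total, term)]
        term = apply(L, term)
    return total

def expected_truncated_sum(x, n):
    """E_x of the discounted sum up to time n, by enumerating all paths."""
    value = 0.0
    for tail in product(states, repeat=n):
        path = (x,) + tail
        prob, disc, payoff = 1.0, 1.0, h[x]   # the t = 0 term has discount 1
        for prev, cur in zip(path, path[1:]):
            prob *= P[prev][cur]              # probability of the path so far
            disc *= b[prev][cur]              # cumulative discount factor
            payoff += disc * h[cur]
        value += prob * payoff
    return value

def solve_2x2(g):
    """Solve (I - L) v = g by Cramer's rule."""
    a, c = 1 - L[0][0], -L[0][1]
    d, e = -L[1][0], 1 - L[1][1]
    det = a * e - c * d
    return [(e * g[0] - c * g[1]) / det, (a * g[1] - d * g[0]) / det]

n = 10
lhs = [expected_truncated_sum(x, n) for x in states]   # expectation of the sum
rhs = partial_sum(h, n)                                # sum of expectations
limit = partial_sum(h, 2000)                           # long partial sum
exact = solve_2x2(h)                                   # (I - L)^{-1} h
dominating = partial_sum([abs(z) for z in h], 2000)    # bound used for (B.5)

print("finite horizon, expectation of sum:", lhs)
print("finite horizon, sum of L^t h:      ", rhs)
print("long partial sum:", limit, " exact solve:", exact)
print("dominating series:", dominating)
```

Since $L$ is nonnegative with row sums below one in this example, its spectral radius is below one, which is the condition the appendix keeps in force. The enumeration is exponential in the horizon, so it is only meant for small `n`.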
B.3 Chapter 7 Results
B.4 Chapter 9 Results
Let’s now turn to the proof of the core optimality results for ADPs. In what follows, $(V, \mathbb{T})$ is a well-posed ADP with Bellman operator $T$ and $\sigma$-value functions $\{v_\sigma\}_{\sigma \in \Sigma}$. We start with
We first prove Proposition 9.2.5 and then return to Theorem 9.2.4.
- Dudley, R. M. (2002). Real Analysis and Probability (Vol. 74). Cambridge University Press.