B Remaining Proofs

Authors
Affiliations
New York University
Australian National University

B.1 Chapter 2 Results

B.2 Chapter 6 Results

We adopt the setting of Section 6.1.1.2 and consider the claim

$$
\EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t) = \sum_{t=0}^\infty \EE_x \, \left[ \prod_{i=0}^t \beta_i \right] h(X_t), \tag{B.3}
$$

when $(X_t)$ is $P$-Markov with initial condition $x$ and $h \in \RR^\Xsf$. Throughout this discussion the assumption $\rho(L) < 1$ is in force (see Theorem 6.1.1). Unlike the rest of the book, we assume some familiarity with measure theory, at the level of, say, Dudley (2002), Chapters 3 and 4.

To begin the discussion we set

$$
F_T \coloneq \sum_{t=0}^T \delta_t \, h(X_t) \quad \text{and} \quad F \coloneq \sum_{t=0}^\infty \delta_t \, h(X_t) \quad \text{where} \quad \delta_t \coloneq \prod_{i=0}^t \beta_i.
$$

Our first aim is to show that $F$ is a well-defined random variable, in the sense that the sum converges almost surely. Since absolute convergence of real series implies convergence, and since a nonnegative random variable with finite expectation is finite almost surely, it suffices to show that

$$
\EE_x \, \sum_{t=0}^\infty \delta_t \, |h(X_t)| < \infty. \tag{B.5}
$$
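The step from finite expectation to almost sure finiteness can be spelled out with Markov's inequality. In the sketch below, $Z$ is a generic nonnegative random variable standing in for the sum inside the expectation, and $\mathbb{P}_x$ is assumed notation for the probability measure underlying $\EE_x$ (neither symbol appears in the text above). For every $n \in \mathbb{N}$,

```latex
\mathbb{P}_x\{Z = \infty\}
  \leq \mathbb{P}_x\{Z \geq n\}
  \leq \frac{\mathbb{E}_x Z}{n}
  \to 0
  \qquad (n \to \infty).
```

Hence $\mathbb{P}_x\{Z = \infty\} = 0$ whenever $\mathbb{E}_x Z < \infty$.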

By the monotone convergence theorem (see, e.g., Dudley (2002), Theorem 4.3.2), we have

$$
\EE_x \, \sum_{t=0}^\infty \delta_t \, |h(X_t)| = \sum_{t=0}^\infty \EE_x \, \delta_t \, |h(X_t)| = \sum_{t=0}^\infty (L^t |h|)(x) ,
$$

where the last equality is by (6.7). Since $\rho(L) < 1$, the series $\sum_{t \geq 0} L^t |h|$ converges, so (B.5) holds, which in turn confirms that $F$ is well-defined and finite almost surely.
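As a rough numerical illustration of why $\rho(L) < 1$ makes the right-hand sum finite, the sketch below compares partial sums of $\sum_t L^t g$ with the Neumann-series limit $(I - L)^{-1} g$ on a two-state example. Everything concrete here is an assumption made for illustration: the form $L(x, x') = b(x, x') P(x, x')$ is not stated in this appendix, and the matrices `P` and `b` and the vector `g` (playing the role of $|h|$) are made up. The code uses plain Python lists rather than a numerical library to stay self-contained.

```python
# Sketch: partial sums of sum_t L^t g versus the limit (I - L)^{-1} g.
# ASSUMPTION: L(x, x') = b(x, x') * P(x, x'); P, b and g are hypothetical.

def mat_vec(A, v):
    """Return the matrix-vector product A v for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

P = [[0.9, 0.1],
     [0.2, 0.8]]            # hypothetical Markov matrix
b = [[0.95, 0.90],
     [1.02, 0.97]]          # hypothetical discount factors; one exceeds 1
L = [[b[i][j] * P[i][j] for j in range(2)] for i in range(2)]

# L is nonnegative with all row sums below 1, and the spectral radius
# is bounded by the maximum row sum, so rho(L) < 1 in this example.
max_row_sum = max(sum(row) for row in L)

g = [1.0, 3.0]              # plays the role of |h|

# Partial sums S_T = sum_{t=0}^T L^t g, built up term by term.
term, partial = g[:], g[:]
for _ in range(2000):
    term = mat_vec(L, term)
    partial = [s + t for s, t in zip(partial, term)]

# Limit (I - L)^{-1} g via the explicit inverse of a 2x2 matrix.
a11, a12 = 1 - L[0][0], -L[0][1]
a21, a22 = -L[1][0], 1 - L[1][1]
det = a11 * a22 - a12 * a21
limit = [(a22 * g[0] - a12 * g[1]) / det,
         (-a21 * g[0] + a11 * g[1]) / det]

print("max row sum of L:", max_row_sum)
print("partial sums    :", partial)
print("Neumann limit   :", limit)
```

Note that one entry of `b` exceeds one, so individual discount factors need not be below unity for the series to converge in this example.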

Now observe that, on the probability one set where $F$ is finite, we have $F_T \to F$ as $T \to \infty$. Moreover,

$$
|F_T| \leq \sum_{t=0}^T \delta_t \, |h(X_t)| \leq Y \coloneq \sum_{t=0}^\infty \delta_t \, |h(X_t)|,
$$

and, as shown above, $\EE_x \, Y < \infty$. By the dominated convergence theorem, we now have $\EE_x \, F = \lim_{T \to \infty} \EE_x \, F_T$, or, equivalently,

$$
\EE_x \, \sum_{t=0}^\infty \delta_t \, h(X_t) = \lim_{T \to \infty} \EE_x \, \sum_{t=0}^T \delta_t \, h(X_t) = \lim_{T \to \infty} \sum_{t=0}^T \EE_x \, \delta_t \, h(X_t) = \sum_{t=0}^\infty \EE_x \, \delta_t \, h(X_t).
$$

Hence (B.3) holds.
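The finite-horizon part of this argument can be checked by brute force on a small example. The sketch below computes the expectation of the truncated sum by enumerating every path of a two-state chain and compares it with the sum of $(L^t h)(x)$ over the same horizon. It relies on assumptions not stated in this appendix: that $\beta_0 = 1$, that $\beta_t = b(X_{t-1}, X_t)$ for $t \geq 1$, and that $L(x, x') = b(x, x') P(x, x')$. The matrices `P` and `b` and the vector `h` are hypothetical. Passing from the finite horizon to the infinite one is exactly what the dominated convergence step supplies, and the code does not attempt to verify it.

```python
from itertools import product

# ASSUMPTIONS: beta_0 = 1, beta_t = b(X_{t-1}, X_t) for t >= 1, and
# L(x, x') = b(x, x') * P(x, x'). P, b and h are hypothetical.
P = [[0.9, 0.1],
     [0.2, 0.8]]
b = [[0.95, 0.90],
     [1.02, 0.97]]
h = [1.0, -3.0]             # h may take negative values
L = [[b[i][j] * P[i][j] for j in range(2)] for i in range(2)]


def mat_vec(A, v):
    """Return the matrix-vector product A v for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def expected_partial_sum(x, T):
    """E_x sum_{t=0}^T delta_t h(X_t), enumerating all 2**T paths."""
    total = 0.0
    for tail in product(range(2), repeat=T):
        path = (x,) + tail
        prob, delta, payoff = 1.0, 1.0, h[x]   # delta_0 = beta_0 = 1
        for prev, nxt in zip(path, path[1:]):
            prob *= P[prev][nxt]               # probability of the path so far
            delta *= b[prev][nxt]              # cumulative discount delta_t
            payoff += delta * h[nxt]
        total += prob * payoff
    return total


def sum_of_expectations(x, T):
    """sum_{t=0}^T (L^t h)(x), computed by repeated multiplication."""
    term, acc = h[:], h[x]
    for _ in range(T):
        term = mat_vec(L, term)
        acc += term[x]
    return acc


for x in range(2):
    print(x, expected_partial_sum(x, 8), sum_of_expectations(x, 8))
```

The two columns printed for each initial state should agree up to floating-point error, since for finite horizons the interchange is just linearity of expectation.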

B.3 Chapter 7 Results

B.4 Chapter 9 Results

Let’s now turn to the proof of the core optimality results for ADPs. In what follows, $\aA = (V, \{T_\sigma\})$ is a well-posed ADP with Bellman operator $\tmax$ and $\sigma$-value functions $\{ v_\sigma \}_{\sigma \in \Sigma}$. We start with

We first prove Proposition 9.2.5 and then return to Theorem 9.2.4.

References
  1. Dudley, R. M. (2002). Real analysis and probability (Vol. 74). Cambridge University Press.