Continuous-Time - Dynamic Programming Volume I: Finite States

Earlier chapters treated dynamics in discrete time. Now we switch to continuous time. We restrict ourselves to finite state spaces, where continuous-time processes are pure jump processes. This allows us to provide a rigorous and self-contained treatment, while laying foundations for a treatment of general state problems.

10.1 Continuous-Time Markov Chains

In this section, we introduce continuous-time Markov models. In Section 10.2, we will use them as components of continuous-time Markov decision processes.

10.1.1 Background

In Section 3.1.1 we learned that if $(X_t) = (X_0, X_1, \ldots)$ is $P$ -Markov, then the distributions $(\psi_t)$ of the state process obey $\psi_{t+1} = \psi_t P$ for all $t$ . This update rule is a linear difference equation in distribution space, which in turn suggests that, once we switch to continuous-time, distributions will evolve according to linear differential equations in distribution space.

This idea turns out to be correct. As such, we begin this chapter with some facts about linear differential equations.

10.1.1.1 Scalar Exponentials

Solutions to linear differential equations involve exponential functions. The real-valued exponential function can be defined by the power series

\me^x :=: \exp(x) \coloneq \sum_{k \geq 0} \frac{x^k}{k!} \qquad (x \in \RR).

(10.1)

Example 10.1.1

If $u_t$ is the balance of a savings account that pays a continuously compounded interest rate $r$ , then the balance evolves according to

\dot u_t \coloneq \frac{\diff}{\diff t} u_t = r u_t \quad \text{for all } \; t \geq 0 \quad \text{with initial balance } u_0 \text{ given}.

(10.2)

We understand (10.2) as a functional equation whose solution is an element $t \mapsto u_t$ of $C_1(\RR_+, \RR)$ , the set of continuously differentiable functions from $\RR_+$ to $\RR$ , that satisfies (10.2). We claim that $u_t \coloneq \me^{r t} u_0$ is the only solution to (10.2) in $C_1(\RR_+, \RR)$ . It is easy to check that this choice of $u_t$ obeys (10.2). As for uniqueness, suppose that $t \mapsto y_t$ is another solution in $C_1(\RR_+, \RR)$ , so that $\dot y_t = r y_t$ for all $t \geq 0$ and $y_0 = u_0$ . Then

\frac{\diff}{\diff t} \left( y_t \, \me^{-rt} \right) = \dot y_t \, \me^{-rt} - r y_t \, \me^{-rt} = r y_t \, \me^{-rt} - r y_t \, \me^{-rt} = 0,

so $t \mapsto y_t \, \me^{-rt}$ is constant on $\RR_+$ , implying existence of a $c \in \RR$ such that $y_t = c \, \me^{rt}$ for all $t \geq 0$ . Setting $t=0$ and using the initial condition gives $c=u_0$ . Hence, at any $t$ , we have $y_t = \me^{rt} u_0 = u_t$ .

The continuous-time system in Example 10.1.1 is closely related to the discrete time difference equation $u_{t+1} = \me^{r} u_t$ . Indeed, if we start at $u_0$ , then the $t$ -th iterate is $\me^{r t} u_0$ , so solutions agree at integer times. We can think of the continuous-time system as one that interpolates between points in time of a corresponding discrete time system.
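The relationship between (10.2) and the difference equation can be checked numerically. The following sketch compares the closed form $\me^{rt} u_0$ with iterates of $u \mapsto \me^r u$ and with a crude Euler discretization of $\dot u_t = r u_t$. The values of `r` and `u0` and the helper name `euler` are arbitrary choices made for illustration.

```python
import math

r, u0 = 0.05, 100.0          # illustrative values, not taken from the text

def u(t):
    """Closed-form solution u_t = e^{rt} u_0 of (10.2)."""
    return math.exp(r * t) * u0

# Iterating u_{t+1} = e^r u_t from u_0 reproduces u_t at integer times
iterate = u0
matches = []
for t in range(1, 6):
    iterate = math.exp(r) * iterate
    matches.append(math.isclose(iterate, u(t), rel_tol=1e-12))

def euler(T, n):
    """Approximate u_T with n Euler steps of the ODE du/dt = r u."""
    x, h = u0, T / n
    for _ in range(n):
        x += h * r * x
    return x

# The Euler error shrinks as the step size falls
errors = [abs(euler(5.0, n) - u(5.0)) for n in (10, 100, 1000)]
print(matches, errors)
```

The Euler scheme is only one of many ways to discretize the ODE; it is used here because it makes the "interpolation" idea easy to see.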

The exponential $\me^\lambda$ of $\lambda = a + i b \in \CC$ can also be defined via (10.1). From the identity $\me^{ib} = \cos(b) + i \sin(b)$ , we obtain

\me^{\lambda} = \me^{a + ib} = \me^{a}(\cos(b) + i \sin(b)).

(10.3)

This equation will soon prove useful.

10.1.1.2 The Exponential Distribution

A random variable $W$ is said to be exponentially distributed with rate $\theta$ , and we write $W \eqdist \Exp(\theta)$ , when the counter CDF $G$ satisfies

G(t) \coloneq \PP\{W > t\} = \me^{- \theta t} \qquad (t \geq 0).

Continuous-time Markov chains have a close relationship with the exponential distribution, a fact that stems from its being the only distribution having the memoryless property

\PP \{W > s + t \given W > s \} = \PP \{W > t\} \quad \text{for all } s, t > 0.

(10.4)

The memoryless property is special. For example, the probability that an individual human being lives 70 years from birth is not equal to the probability that he or she lives another 70 years conditional on having reached age 70. In fact, the exponential distribution is the only memoryless distribution supported on the nonnegative reals:

Proof

Exercise 10.1.1 treats (i) $\Rightarrow$ (ii). As for (ii) $\Rightarrow$ (i), suppose (ii) holds. Then $G$ has three properties:

(a) $G$ is decreasing on $\RR_+$ (as is any counter CDF),

(b) $0 < G(t) < 1$ for all $t > 0$, and

(c) $G(s + t) = G(s) G(t)$ for all $s, t > 0$.

From (a)–(c) we will show that

G(t) = G(1)^t \qquad \text{for all } t \geq 0.

(10.5)

This is sufficient to prove (i) because then $\theta \coloneq - \ln G(1)$ is a positive real number (by (b)) and, furthermore,

G(t) = \exp\{ \ln [ G(1)^t ] \} = \exp\{ \ln [ G(1) ] t \} = \exp( - \theta t).

To see that (10.5) holds, fix $m,n \in \NN$ . We can use (c) to obtain both $G(m/n) = G(1/n)^m$ and $G(1) = G(1/n)^n$ . It follows that $G(m/n)^n = G(1/n)^{m n} = G(1)^m$ and, raising to the power of $1/n$ , we get (10.5) when $t=m/n$ .

The discussion so far confirms that (10.5) holds when $t$ is rational. So now take any $t \geq 0$ and rational sequences $(a_n)$ and $(b_n)$ converging to $t$ with $a_n \leq t \leq b_n$ for all $n$. By (a) we have $G(b_n) \leq G(t) \leq G(a_n)$ for all $n$, so $G(1)^{b_n} \leq G(t) \leq G(1)^{a_n}$ for all $n \in \NN$. Taking the limit in $n$ completes the proof. ◻
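As a sanity check on the memoryless property (10.4) and the identity (10.5), here is a small sketch that evaluates both exactly for an exponential counter CDF and then repeats the check for (10.4) by simulation. The rate `theta`, the times `s` and `t`, the seed and the sample size are all arbitrary illustrative choices.

```python
import math
import random

theta = 0.5                      # illustrative rate

def G(t):
    """Counter CDF of Exp(theta): P{W > t} = e^{-theta t}."""
    return math.exp(-theta * t)

s, t = 1.3, 2.1
exact_memoryless = math.isclose(G(s + t) / G(s), G(t))   # (10.4)
exact_power_form = math.isclose(G(t), G(1) ** t)         # (10.5)

# Monte Carlo version of (10.4): condition on survival past s
random.seed(0)
draws = [random.expovariate(theta) for _ in range(200_000)]
survived = [w for w in draws if w > s]
cond_freq = sum(w > s + t for w in survived) / len(survived)
print(exact_memoryless, exact_power_form, cond_freq, G(t))
```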

10.1.1.3 Extension to Matrices

The real exponential formula (10.1) extends to the matrix exponential via

\me^A \coloneq I + A + \frac{A^2}{2!} + \cdots = \sum_{k \geq 0} \frac{A^k}{k!},

(10.6)

where $A$ is any square matrix. As we will see, the matrix exponential plays a key role in the solution of vector-valued linear differential equations.

Lemma 10.1.2 (Properties of the matrix exponential)

Let $A$ and $B$ be square matrices.

(i) If $A$ is diagonalizable with $A = P D P^{-1}$ , then $\me^{A} = P \me^{D} P^{-1}$ .

(ii) If $A$ and $B$ commute (i.e. $AB = BA$ ), then $\me^{A + B} = \me^{A} \me^{B}$ .

(iii) If $m$ is any positive integer, then $\me^{mA} = (\me^{A})^m$ .

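(iv) If $\lambda$ is an eigenvalue of $A$, then $\me^\lambda$ is an eigenvalue of $\me^{A}$. Conversely, every eigenvalue of $\me^{A}$ has the form $\me^\lambda$ for some eigenvalue $\lambda$ of $A$.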

(v) The function $\RR \ni t \mapsto \me^{t A}$ is differentiable in $t$ , with

\frac{\diff}{\diff t} \me^{tA} = A \me^{t A} = \me^{t A} A.

(10.7)

(vi) $\me^{A^\top} = (\me^A)^\top$ .

(vii) The fundamental theorem of calculus holds, in the sense that

\me^{tA} - \me^{sA} = \int_s^t \me^{\tau A} A \diff \tau \quad \text{for all } s \leq t.

(10.8)

In Lemma 10.1.2 and what follows, integration or differentiation of a vector- or matrix-valued function is carried out element by element. For example, to differentiate a matrix $B(t) = (b_{ij}(t))$ that depends on $t$ , we form a new matrix by differentiating each element $b_{ij}(t)$ with respect to $t$ . The integral $\int_a^b B(t) \diff t$ is the matrix of integrals $\int_a^b b_{ij}(t) \diff t$ .

The proof of part (ii) of Lemma 10.1.2 uses the definition of the exponential and the binomial formula. See, for example, Hirsch & Smale (1974). Part (iii) follows directly from part (ii). Part (iv) follows easily from part (i) when $A$ is diagonalizable (and can be proved more generally via the Jordan canonical form).
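Several parts of Lemma 10.1.2 are easy to test numerically. The sketch below does so for parts (i), (ii), (iii) and (vi), and also shows that (ii) can fail without commutativity. The helper `expm` is a simple illustrative routine built from the series (10.6) with scaling and squaring; it is an assumption of this example rather than anything from the text, and in practice one would typically call `scipy.linalg.expm`. The test matrices are arbitrary.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k        # B^k / k!
        E = E + term
    for _ in range(squarings):     # e^A = (e^{A / 2^s})^{2^s}
        E = E @ E
    return E

A = np.array([[0.0, 1.0], [-1.0, 0.0]])
B = 2.0 * A + np.eye(2)                   # commutes with A
C = np.array([[0.0, 1.0], [0.0, 0.0]])    # does not commute with A
S = np.array([[2.0, 1.0], [1.0, 3.0]])    # symmetric, hence diagonalizable

# (i): S = P D P^{-1} with P orthogonal, so P^{-1} = P^T
d, P = np.linalg.eigh(S)
check_i = np.allclose(expm(S), P @ np.diag(np.exp(d)) @ P.T)
check_ii = np.allclose(expm(A + B), expm(A) @ expm(B))
check_iii = np.allclose(expm(3 * A), np.linalg.matrix_power(expm(A), 3))
check_vi = np.allclose(expm(A.T), expm(A).T)
# Without commutativity, the product rule in (ii) can fail
fails_without_commuting = not np.allclose(expm(A + C), expm(A) @ expm(C))
print(check_i, check_ii, check_iii, check_vi, fails_without_commuting)
```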

Solution to Exercise 10.1.4

We use the definition $\me^A = \sum_{k \geq 0} \frac{A^k}{k!}$ for the proof and fix $t \in \RR$ . A common argument for differentiating $\me^{t A}$ with respect to $t$ is to take the derivative through the infinite sum to get

\frac{\diff}{\diff t} \me^{tA} = \left( A + t \frac{A^2}{1!} + t^2 \frac{A^3}{2!} + \cdots \right) = A \me^{t A}.

But this is not fully rigorous, since we have not justified interchange of limits. A better answer is to start with (10.9), which gives

\frac{\diff}{\diff t} \me^{t A} = \me^{t A} \lim_{h \to 0} \frac{\me^{h A} - I}{h}

and note that

\frac{\me^{h A} - I}{h} = A + \frac{1}{2!} h A^2 + \frac{1}{3!} h^2 A^3 + \cdots,

which converges to $A$ as $h \to 0$ .

As for (vii), we are drawing an analogy with the fundamental theorem of calculus for scalar-valued functions, which states that $f(t) - f(s) = \int_s^t f'(\tau) \diff \tau$ for all $s \leq t$ , where $f'$ is the derivative of $f$ .

Solution to Exercise 10.1.6

Fix $i,j$ with $1 \leq i,j \leq n$ , let $e_k$ be the $k$ -th canonical basis vector and let $f$ be the function on $\RR$ defined by $f(t) = \inner{e_i, \me^{tA} e_j}$ . Part (v) of Lemma 10.1.2 tells us that $f'(t) = \inner{e_i, \me^{tA} A e_j}$ . By the fundamental theorem of calculus, we have $f(t) - f(s) = \int_s^t f'(\tau) \diff \tau$ , or

\inner{e_i, \me^{tA} e_j} - \inner{e_i, \me^{sA} e_j} = \int_s^t \inner{e_i, \me^{\tau A} A e_j} \diff \tau.

As this is true for any $i, j$, we have $\me^{tA} - \me^{sA} = \int_s^t \me^{\tau A} A \diff \tau$, which is what we need to prove.
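Parts (v) and (vii) of Lemma 10.1.2 can also be checked numerically. The sketch below approximates the derivative in (10.7) by a central difference and the integral in (10.8) by the trapezoidal rule, with both operations carried out element by element. The matrix `A`, the times `s` and `t`, the step `h` and the grid size are arbitrary, and `expm` is an illustrative series-based helper (an assumption of this example; `scipy.linalg.expm` is the usual choice in practice).

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

A = np.array([[-1.0, 0.5], [0.3, -0.7]])
s, t, h = 0.2, 0.8, 1e-5

# (10.7): central difference approximation of d/dt e^{tA}
deriv = (expm((t + h) * A) - expm((t - h) * A)) / (2 * h)
deriv_error = np.abs(deriv - A @ expm(t * A)).max()

# (10.8): trapezoidal rule for the integral of e^{tau A} A over [s, t]
grid = np.linspace(s, t, 601)
vals = np.array([expm(tau * A) @ A for tau in grid])
step = grid[1] - grid[0]
integral = step * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)
integral_error = np.abs(integral - (expm(t * A) - expm(s * A))).max()
print(deriv_error, integral_error)
```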

10.1.2 Continuous-Time Flows

Next, we study solutions of multivariate differential equations, with a focus on linear systems. These results lay foundations for our study of continuous-time Markov dynamics in Section 10.1.3.

10.1.2.1 Continuous-Time Dynamical Systems

Recall from Section 2.1.1 that a discrete dynamical system is a pair $(U, S)$ , where $U$ is a set and $S$ is a self-map on $U$ . Trajectories are sequences $(S^t u)_{t \geq 0} = (u, Su, S^2 u, \ldots)$ , where $u \in U$ is the initial condition. These ideas can be extended to continuous-time by considering a pair $(U, (S_t)_{t \geq 0})$ where $U$ is any set and $S_t$ is a self-map on $U$ for each $t \in \RR_+$ . The interpretation is that if $u \in U$ is the current state of the system, then $S_t u$ will be the state after $t$ units of time.

In general, to understand $(U, (S_t)_{t \geq 0})$ as a continuous-time dynamical system, we require that (a) $S_0$ is the identity map, so that the state after zero units of time is just the initial condition, and (b) if we start at $u$ , move forward to $u_s \coloneq S_s u$ , and then move again to $S_t u_s$ after another $t$ units of time, the outcome should be the same as moving from $u$ to $S_{s \,+\, t} \, u$ directly. That is,

S_{s \,+\, t} = S_t \circ S_s \quad \text{for all } t, s \geq 0.

This is the semigroup property.

One way that continuous-time dynamical systems arise is via initial value problems. An initial value problem (IVP) in $\RR^n$ consists of a differential equation $\dot u_t = f(u_t)$ paired with an initial condition $u_0 \in \RR^n$ , where $u_t \in \RR^n$ and $f \colon \RR^n \to \RR^n$ . Under suitable conditions on $f$ , the solution $u_t \coloneq F(t, u_0)$ is uniquely defined for all $t \geq 0$ , and, moreover,

F(0, u_0) = u_0 \quad \text{and} \quad F(s + t, u_0) = F(t, F(s, u_0)) \quad \text{for all } s, t \geq 0

(see, e.g., Hirsch & Smale (1974), Section 8.7). Hence $(S_t)_{t \geq 0}$ defined by $S_t u = F(t, u)$ satisfies the semigroup property and $(\RR^n, (S_t)_{t \geq 0})$ is a continuous-time dynamical system. The function $f$ is called the vector field of $(\RR^n, (S_t)_{t \geq 0})$.

10.1.2.2 Linear Initial Value Problems

Given our interest in continuous-time Markov chains and their connection to linear systems (see the comments at the start of Section 10.1.1), we focus primarily on linear differential equations. The next result discusses linear IVPs, illustrating the key role of the matrix exponential. In the statement, $A$ is $n \times n$ and both $\dot u_t$ and $u_t$ are column vectors in $\RR^n$ .

(Here $\dot u_t \coloneq \diff u_t / \diff t$ is defined by differentiating the vector $u_t$ element-by-element, as discussed after Lemma 10.1.2.)

Proposition 10.1.3 motivates us to study flows of the form

t \mapsto u_t, \quad u_t = \me^{t A} u_0 \qquad (t \geq 0),

(10.12)

where $A$ is $n \times n$ , $u_0$ is a vector in $\RR^n$ indicating the initial condition, and $u_t$ is the “state” of the system at time $t$ . Figure 10.1 shows an example when

A = \begin{pmatrix} -2.0 & -0.4 & 0 \\ -1.4 & -1.0 & 2.2 \\ 0.0 & -2.0 & -0.6 \end{pmatrix}.

(10.13)

Figure 10.1: Exponential flow $t \mapsto \me^{tA}u_0$ starting from $u_0 \in \RR^3$
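A flow of the form (10.12) can be computed along the following lines. This sketch uses the matrix in (10.13) and traces out $u_t = \me^{tA} u_0$ on a time grid by repeatedly applying the one-step map $\me^{hA}$, which is valid because $\me^{(t+h)A} = \me^{hA} \me^{tA}$ by part (ii) of Lemma 10.1.2. The initial condition `u0`, the horizon `T` and the grid size are hypothetical (the values behind the figure are not stated here), and `expm` is an illustrative series-based helper rather than a library routine.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

A = np.array([[-2.0, -0.4, 0.0],
              [-1.4, -1.0, 2.2],
              [0.0, -2.0, -0.6]])        # the matrix in (10.13)
u0 = np.array([0.8, 1.0, 0.6])           # hypothetical initial condition

T, n = 20.0, 400
one_step = expm((T / n) * A)             # e^{hA} with h = T / n
path = [u0]
for _ in range(n):
    path.append(one_step @ path[-1])     # u_{t+h} = e^{hA} u_t
path = np.array(path)                    # row k holds u at time k * h

direct = expm(T * A) @ u0                # u_T computed in one shot
print(np.linalg.norm(path[-1]), np.linalg.norm(direct))
```

The rows of `path` can be passed to a 3D plotting routine to produce a picture in the spirit of Figure 10.1.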

10.1.2.3 Stability in the Diagonalizable Case

For an exponential flow such as (10.12), a key question is whether or not $u_t \to 0$ as $t \to \infty$ . (This will matter when we try to evaluate lifetime rewards over an infinite horizon in continuous time.) Rather than analyze these issues at every possible $u_0$ , we directly consider the matrix-valued flow $t \mapsto \me^{ t A}$ and study whether $\me^{tA} \to 0$ .

The case where $A$ is diagonalizable provides a good starting point. Suppose $A = P^{-1} D P$ with $D = \diag_j (\lambda_j)$ containing the eigenvalues of $A$ . Then, by Lemma 10.1.2, for any $t \geq 0$ , we have

\me^{t A} = \me^{t P^{-1} D P} = P^{-1} \me^{t D } P.

(10.14)

Exercise 10.1.8 and equation (10.14) tell us that the long run dynamics of $\me^{tA}$ are determined by the scalar flows $t \mapsto \me^{t \lambda_j}$ . How does $\me^{t \lambda}$ evolve over time when $\lambda \in \CC$ ?

To answer this question we write $\lambda = a + ib$ and apply (10.3) to obtain

\me^{t\lambda} = \me^{ta}(\cos(tb) + i \sin(tb)).

This equation tells us that

\me^{t\lambda} \to 0 \text{ as } t \to \infty \quad \text{if and only if} \quad \real \lambda < 0,

(10.15)

where $\real \lambda$ is the real part of $\lambda$ (i.e., if $\lambda = a + ib$ , then $\real \lambda = a$ ).

From this analysis, we conclude that, when $A$ is diagonalizable, we have $\me^{tA} \to 0$ if and only if $\real \lambda_j < 0$ for all $\lambda_j \in \sigma(A)$ , where $\sigma(A)$ denotes the set of all eigenvalues (the spectrum) of $A$ . Another way to put this is that $\me^{tA} \to 0$ if and only if $s(A) < 0$ , where

s(A) \coloneq \max_{\lambda \in \sigma(A)} \real \lambda,

(10.16)

is the spectral bound of $A$ .

As the preceding analysis suggests, the spectral bound plays a key role in the asymptotics of exponential flows, just as a spectral radius governs asymptotics of trajectories of linear maps (see, e.g., Exercise 1.2.11). Section 10.1.2.4 expands on this analysis, while dropping the assumption that $A$ is diagonalizable.
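To make the role of the spectral bound concrete, the following sketch computes $s(A)$ from (10.16) for the matrix in (10.13) and tracks the operator norm of $\me^{tA}$ at a few times. It also reports $(1/t) \ln \| \me^{tA} \|$ at a large $t$, which should be close to $s(A)$. The time points are arbitrary and `expm` is an illustrative helper, not a library function.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

A = np.array([[-2.0, -0.4, 0.0],
              [-1.4, -1.0, 2.2],
              [0.0, -2.0, -0.6]])        # the matrix in (10.13)

s_A = np.linalg.eigvals(A).real.max()    # spectral bound (10.16)

# Operator (spectral) norm of e^{tA} at several times
norms = {t: np.linalg.norm(expm(t * A), 2) for t in (1.0, 5.0, 10.0, 40.0)}
growth_rate = np.log(norms[40.0]) / 40.0 # (1/t) ln ||e^{tA}|| at t = 40
print(s_A, norms, growth_rate)
```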

10.1.2.4 The General Case

Let $A$ be any square matrix. In the following statement about a spectral bound, $\| \cdot \|$ is the operator norm defined in Section 1.2.1.4.

(The second equality in (10.17) also holds when the limit is taken over $t \in \RR_+$ . See, for example, Engel & Nagel (2006).)

The next theorem is a key stability result for exponential flows. Among other things, it extends to arbitrary $A$ the finding that $s(A) < 0$ is necessary and sufficient for stability.

A full proof of Theorem 10.1.5 in a general setting can be found in §V.II of Engel & Nagel (2006).

Theorem 10.1.5 tells us that the flow $t \mapsto \me^{tA} u_0$ converges to the origin at an exponential rate if and only if $s(A)<0$ . The equivalence of (i) and (ii) was proved for the diagonalizable case in Section 10.1.2.3. It can be viewed as the continuous-time analog of $\|B^k\| \to 0$ if and only if $\rho(B) < 1$ (see Exercise 1.2.11).

Solution to Exercise 10.1.12

Let’s start with (i) $\implies$ (ii), or $s(A)<0$ implies $\| \me^{tA} \| \to 0$ as $t \to \infty$ .

Here is one proof that works for $t \in \NN$ and $t \to \infty$. Observe that, since $(\me^A)^t = \me^{t A}$, the powers $B^t$ of $B \coloneq \me^A$ match the flow $t \mapsto \me^{tA}$ at integer times. We have $B^t \to 0$ if and only if $\rho(B) < 1$. But, by Lemma 10.1.4, $\rho(B) = \rho(\me^A) = \me^{s(A)}$. Hence $\rho(B) < 1$ is equivalent to $s(A) < 0$. Thus, $s(A) < 0$ is the exact condition we need to obtain $B^t = \me^{t A} \to 0$.

We can improve on this proof of (i) $\implies$ (ii) by allowing $t \in \RR$ and $t \to \infty$ as follows. Suppose $s(A) < 0$ . Fix $\epsilon > 0$ such that $s(A)+\epsilon < 0$ and use (10.17) to obtain a $T < \infty$ such that $(1/t) \ln \| \me^{tA} \| \leq s(A) + \epsilon$ for all $t \geq T$ . Equivalently, for $t$ large, we have $\| \me^{t A} \| \leq \me^{t(s(A)+\epsilon)}$ . The claim follows.

That (iii) implies (iv) is immediate: Just substitute the bound in (iii) into the integral.
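The two facts used in the integer-time argument, namely $(\me^A)^t = \me^{tA}$ and $\rho(\me^A) = \me^{s(A)}$, can be checked numerically. The sketch below does this for the matrix in (10.13); the range of integer times is arbitrary and `expm` is an illustrative helper assumed for this example.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

A = np.array([[-2.0, -0.4, 0.0],
              [-1.4, -1.0, 2.2],
              [0.0, -2.0, -0.6]])              # the matrix in (10.13)

B = expm(A)
rho_B = np.abs(np.linalg.eigvals(B)).max()     # spectral radius of e^A
s_A = np.linalg.eigvals(A).real.max()          # spectral bound of A

# Powers of B agree with the flow at integer times
powers_match = all(
    np.allclose(np.linalg.matrix_power(B, t), expm(t * A), atol=1e-12)
    for t in range(1, 8)
)
print(rho_B, np.exp(s_A), powers_match)
```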

10.1.2.5 Semigroup Terminology

Advanced treatments of continuous-time systems often begin with semigroups. Let’s briefly describe these and connect them to things we have studied earlier. (If you prefer to skip this section on first reading, you can move to the next one after noting that, given an $n \times n$ matrix $A$ , the family $(S_t)_{t \geq 0} = (\me^{t A})_{t \geq 0}$ is called an exponential semigroup and that $A$ is called the infinitesimal generator of the semigroup.)

Let $\Xsf$ be a finite set and let $(S_t)_{t \geq 0}$ be a subset of $\lopx$ indexed by $t \in \RR_{+}$ . The family $(S_t)_{t \geq 0}$ is called a strongly continuous semigroup or $C_0$ -semigroup on $\RR^\Xsf$ if

(i) $S_0 = I$ , where $I$ is the identity,

(ii) $S_{t + \, t'} = S_t \circ S_{t'}$ for all $t, t' \geq 0$, and

(iii) $t \mapsto S_t u$ is a continuous map from $\RR_+$ to $\RR^\Xsf$ for every $u \in \RR^\Xsf$ .

In essence, a $C_0$ -semigroup on $\RR^\Xsf$ is a continuous-time dynamical system $(\RR^\Xsf, (S_t)_{\, t \geq 0})$ where each $S_t$ maps an initial state into a time $t$ state.

The semigroup perspective is important because it extends naturally to settings where $\Xsf$ is not finite, in which case we replace the finite-dimensional set $\RR^\Xsf$ with some (typically infinite-dimensional) class of functions $\gG \subset \RR^\Xsf$ , and each $S_t$ becomes a linear operator mapping $\gG$ into itself. At this level of generality, $S_t u$ can be the solution to a partial differential equation, or a stochastic differential equation (see, e.g., Engel & Nagel (2006) or Applebaum (2019)). Operator semigroup theory offers an elegant and powerful framework for handling such systems.

For semigroups in general settings we often have no analytical expressions for $S_t$ . This situation is like the one we encountered in the continuous-time system in Section 10.1.2.1, where $\dot u_t = f(u_t)$ and $f$ is potentially nonlinear. When no analytical solution $u_t$ exists, analyzing the dynamics requires us to try to infer its properties from the vector field $f$ , so that $f$ becomes the primary focus of analysis.

A natural question, then, is, given a semigroup $(S_t)_{\, t \geq 0}$ on $\lopx$ , does there always exist a “vector field” type object that “generates” $(S_t)_{\, t \geq 0}$ ? When $\Xsf$ is finite, the answer is affirmative. This object, to be denoted by $A$ , is called the infinitesimal generator of the semigroup and is defined by

A = \lim_{t \, \downarrow \, 0} \frac{S_t - S_0}{t} = \lim_{t \, \downarrow \, 0} \frac{S_t - I}{t}

(10.19)

At $u \in \RR^\Xsf$, the vector $A u$ indicates the instantaneous change in the state.

More precisely, when $\Xsf$ is finite, we have:

Semigroups of the form described in Proposition 10.1.6 are called exponential semigroups (or “uniformly continuous” semigroups).

A full proof of Proposition 10.1.6 can be found in the discussion of Theorem 2.12 of Engel & Nagel (2006). The results are not surprising, since the main claim is that, in finite dimensions, solutions to linear differential equations have exponential form. The fact that $A$ is the infinitesimal generator of the semigroup $(S_t)_{\, t \geq 0} = (\me^{tA})_{\, t \geq 0}$ follows from Lemma 10.1.2, which gives

\lim_{t \, \downarrow \, 0} \frac{S_t - S_0}{t} = \lim_{t \, \downarrow \, 0} \frac{\me^{t A} - \me^{0}}{t} = \frac{\diff}{\diff t} \me^{t A} \; \Bigr\rvert_{\, t = 0} = A \me^{0 A} = A.

The preceding discussion places our analysis in a wider context. To practice our new terminology, we can restate (i) $\iff$ (ii) from Theorem 10.1.5 by saying that the exponential semigroup $(S_t)_{\, t \geq 0} = (\me^{tA})_{\, t \geq 0}$ converges to zero if and only if the spectral bound of its infinitesimal generator is negative.
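The limit (10.19) that defines the infinitesimal generator can be observed directly. In the sketch below, $S_t = \me^{tA}$ for the matrix in (10.13), and the difference quotient $(S_t - I)/t$ is compared with $A$ for a shrinking sequence of $t$ values. The sequence `ts` is arbitrary and `expm` is an illustrative helper, not a library function.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

A = np.array([[-2.0, -0.4, 0.0],
              [-1.4, -1.0, 2.2],
              [0.0, -2.0, -0.6]])        # the matrix in (10.13)
I = np.eye(3)

ts = (1e-1, 1e-2, 1e-3, 1e-4)
# Largest entrywise gap between (S_t - I)/t and the generator A
errors = [np.abs((expm(t * A) - I) / t - A).max() for t in ts]
print(errors)
```

The gap falls roughly in proportion to $t$, consistent with the expansion $(\me^{tA} - I)/t = A + t A^2 / 2! + \cdots$.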

10.1.3 Markov Semigroups

Having studied multivariate linear dynamics, we are now ready to concentrate on the Markov case, where dynamics evolve in distribution space. For the most part we now switch to operator-theoretic notation, where $\Xsf$ is a finite set with $n$ elements, and an $n \times n$ matrix is identified with a linear operator in $\lopx$. As emphasized in Section 2.3.3.1, this is merely a change in terminology, and all preceding results for matrices extend directly to linear operators.

10.1.3.1 Intensity Matrices

If $(X_t)_{\, t \geq 0}$ is $P$ -Markov on $\Xsf$ for some $P \in \mopx$ , then the marginal distributions of $(X_t)_{t \geq 0}$ evolve according to the linear difference system $\psi_{t+1} = \psi_t P$ (see Section 3.1.1). We now seek a continuous-time analog in the form of a linear differential equation.

To this end we call $Q \in \lopx$ an intensity operator or intensity matrix^[1] when

Q(x, x') \geq 0 \text{ whenever } x \neq x' \quad \text{and} \quad \sum_{x'} Q(x, x') = 0 \text{ for all } x \in \Xsf.

(10.20)

Let

\iopx = \text{ the set of all intensity operators in } \lopx.
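The definition in (10.20) translates directly into a membership test for $\iopx$. The function name `is_intensity_matrix`, the tolerance and the example matrices below are all illustrative assumptions. The last check uses the fact that $P - I$ satisfies (10.20) whenever $P$ is a stochastic matrix.

```python
import numpy as np

def is_intensity_matrix(Q, tol=1e-12):
    """Check (10.20): nonnegative off-diagonal entries and zero row sums."""
    Q = np.asarray(Q, dtype=float)
    if Q.ndim != 2 or Q.shape[0] != Q.shape[1]:
        return False
    off_diagonal = Q[~np.eye(Q.shape[0], dtype=bool)]
    rows_sum_to_zero = np.allclose(Q.sum(axis=1), 0.0, atol=tol)
    return bool((off_diagonal >= 0).all() and rows_sum_to_zero)

Q = np.array([[-0.5, 0.5, 0.0],
              [0.2, -0.7, 0.5],
              [0.0, 0.3, -0.3]])
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])                   # stochastic, rows sum to one
Q_bad = np.array([[0.5, -0.5],
                  [0.3, -0.3]])              # negative off-diagonal entry

results = (is_intensity_matrix(Q), is_intensity_matrix(P),
           is_intensity_matrix(Q_bad), is_intensity_matrix(P - np.eye(2)))
print(results)
```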

Consider the IVP

\dot \psi_t(x') = \sum_{x} Q(x, x') \psi_t(x) \qquad (t \geq 0, \; x' \in \Xsf),

which we can also write as

\dot \psi_t = \psi_t \, Q, \qquad \psi_0 \in \dD(\Xsf) \text{ given}

(10.21)

when $\psi_t$ and $\dot \psi_t$ are understood to be row vectors. We say that $\dD(\Xsf)$ is invariant for the IVP (10.21) if the solution $(\psi_t)_{t \geq 0}$ remains in $\dD(\Xsf)$ for all $t \geq 0$ .

In view of Proposition 10.1.3, we can rephrase this by stating that $\dD(\Xsf)$ is invariant for (10.21) whenever

\psi_0 \in \dD(\Xsf) \quad \implies \quad \psi_0 \, \me^{t Q} \in \dD(\Xsf) \text{ for all } t \geq 0.

(10.22)

Our key result for this section shows the central role of intensity matrices:

Proposition 10.1.7 tells us that the set $\iopx$ coincides with the set of continuous-time (and time-homogeneous) Markov models on $\Xsf$ . Any specification outside this class fails to generate flows in distribution space. The proof is completed in several steps.

For Exercise 10.1.13–Exercise 10.1.15, $Q \in \iopx$ and $P_t \coloneq \me^{tQ}$ .

For the proof of Proposition 10.1.7, we have now shown that (i) implies (ii). Evidently (ii) implies (iii), because if $\psi_0 \in \dD(\Xsf)$ and $\psi_t = \psi_0 P_t$ where $P_t$ is stochastic, then $\psi_t \in \dD(\Xsf)$. Hence it remains only to show that (iii) implies (i).

Solution to Exercise 10.1.17

By Lemma 10.1.2, we have

\frac{\diff}{\diff t} \me^{tQ} = Q \me^{t Q} = \me^{t Q} Q \quad \text{for all} \quad t \geq 0.

(10.23)

Evaluating (10.23) at $t=0$ and recalling that $\me^0 = I$ gives

Q = \lim_{h \, \downarrow \, 0} \; \frac{1}{h} \, ( \me^{h Q} - I ).

(10.24)

Interpreting $\delta_x$ as a row vector and $\delta_{x'}$ as a column vector, while using the fact that $x \neq x'$ combined with (10.24), we obtain

Q(x, x') = \delta_x Q \delta_{x'} = \delta_x \left[ \lim_{h \, \downarrow \, 0} \frac{\me^{h Q} - I}{h} \right] \delta_{x'} = \lim_{h \, \downarrow \, 0} \frac{\delta_x \me^{h Q} \delta_{x'}}{h}.

Hence we need only show that $\delta_x \me^{h Q} \delta_{x'} \geq 0$. By (ii), $\delta_x \me^{h Q}$ is a distribution, so the inequality holds.

Returning to Proposition 10.1.7, the last two exercises confirm that (iii) implies (i). The proof is now complete.
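The link between intensity matrices and flows in distribution space can be illustrated numerically. In the sketch below, $\me^{tQ}$ is stochastic at each tested $t$ when $Q$ satisfies (10.20), while a matrix with zero row sums but a negative off-diagonal entry produces a matrix exponential with a negative entry. The example matrices, the test times, the helper names `expm` and `is_stochastic` and the tolerance are illustrative assumptions.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

def is_stochastic(P, tol=1e-10):
    """Nonnegative entries and unit row sums, up to a small tolerance."""
    return bool((P >= -tol).all() and np.allclose(P.sum(axis=1), 1.0, atol=tol))

Q = np.array([[-0.5, 0.5, 0.0],
              [0.2, -0.7, 0.5],
              [0.0, 0.3, -0.3]])            # satisfies (10.20)
Q_bad = np.array([[0.5, -0.5],
                  [0.3, -0.3]])             # zero row sums, negative off-diagonal

stochastic_flags = [is_stochastic(expm(t * Q)) for t in (0.1, 1.0, 10.0)]
bad_flag = is_stochastic(expm(0.1 * Q_bad))

psi0 = np.array([1.0, 0.0, 0.0])            # point mass on the first state
psi_t = psi0 @ expm(2.0 * Q)                # solution of (10.21) at t = 2
print(stochastic_flags, bad_flag, psi_t)
```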

10.1.3.2 Interpretation

Section 10.1.3.1 covered the mathematical relationship between intensity matrices and Markov operators. Let’s now discuss the connection more informally, in order to build intuition.

To this end, let $(X_t)_{t \geq 0}$ be $P_h$ -Markov in discrete time. Here $h > 0$ is the length of the time step. We write the corresponding distribution sequence $\psi_{t+h} = \psi_t P_h$ in terms of change per unit of time, as in

\frac{\psi_{t+h} - \psi_t}{h} = \psi_t \frac{P_h - I}{h} \quad \text{where} \quad I \text{ is the } n \times n \text{ identity}.

(10.25)

Continuous-time dynamics are obtained by taking the limit as $h \, \downarrow \, 0$ . If we define

Q \coloneq \lim_{h \, \downarrow \, 0} \frac{P_h - I}{h},

(10.26)

and assume that limits exist, then (10.25) becomes (10.21).

What properties does $Q$ have? Inspecting (10.26) shows that

Q(x, x') \approx \frac{P_h(x, x') - \1\{x = x'\}}{h}

(10.27)

when $h$ is small and positive.

Equation (10.28) tells us that $Q(x, x')$ represents the instantaneous rate of flow out of state $x$ and into state $x'$ . The on-diagonal value $P_h(x,x)$ just balances the off-diagonal probabilities.
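The approximation in (10.27) suggests treating $I + hQ$ as a one-step transition matrix over a short interval of length $h$. The sketch below does this for an illustrative intensity matrix: it checks that $(\me^{hQ} - I)/h$ is close to $Q$ and that iterating $\psi \mapsto \psi (I + hQ)$ approximately reproduces $\psi_0 \me^{tQ}$. The matrix `Q`, the step `h`, the horizon and the helper `expm` are assumptions made for this example.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

Q = np.array([[-0.5, 0.5, 0.0],
              [0.2, -0.7, 0.5],
              [0.0, 0.3, -0.3]])
I = np.eye(3)
h = 1e-3

# (10.26)-(10.27): the difference quotient of P_h = e^{hQ} is close to Q
quotient_gap = np.abs((expm(h * Q) - I) / h - Q).max()

# Discrete-time update psi_{t+h} = psi_t (I + hQ), run up to t = 1
P_h_approx = I + h * Q
psi0 = np.array([1.0, 0.0, 0.0])
psi = psi0.copy()
for _ in range(1000):
    psi = psi @ P_h_approx
flow_gap = np.abs(psi - psi0 @ expm(1.0 * Q)).max()
print(quotient_gap, flow_gap)
```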

10.1.3.3 Markov Semigroups

Fix $Q \in \iopx$. In the terminology of Section 10.1.2.5, the family of operators $(P_t)_{t \geq 0} = (\me^{tQ})_{t \geq 0} \subset \mopx$ that solves $\dot \psi_t = \psi_t Q$ (see (10.21)) is an exponential semigroup. Since each $P_t$ is in $\mopx$, it is also called the Markov semigroup generated by $Q$. It satisfies the semigroup property $P_{s \, + \, t} = P_s \, P_t$ for all $s, t \geq 0$, which can be written more explicitly as

P_{s+t} (x, x') = \sum_z P_s(x, z) P_t(z, x') \qquad (s, t \geq 0, \; x, x' \in \Xsf).

(10.29)

In the present setting, (10.29) is called the (continuous-time) Chapman–Kolmogorov equation. It states that the probability of moving from $x$ to $x'$ over $s+t$ units of time equals the probability of moving from $x$ to $z$ over $s$ units of time, and then $z$ to $x'$ over $t$ units of time, summed over all $z$ .
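The Chapman–Kolmogorov equation (10.29) can be verified entry by entry. The sketch below computes the sum over intermediate states $z$ for one pair $(x, x')$ and compares it with $P_{s+t}(x, x')$, and then checks the matrix form $P_{s+t} = P_s P_t$. The intensity matrix, the times and the chosen pair of states are arbitrary, and `expm` is an illustrative helper.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

Q = np.array([[-0.5, 0.5, 0.0],
              [0.2, -0.7, 0.5],
              [0.0, 0.3, -0.3]])
s, t = 0.4, 1.1
P_s, P_t, P_st = expm(s * Q), expm(t * Q), expm((s + t) * Q)

# Right-hand side of (10.29) for one pair of states
x, x_next = 0, 2
ck_sum = sum(P_s[x, z] * P_t[z, x_next] for z in range(3))
print(ck_sum, P_st[x, x_next])
```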

Again following the terminology in Section 10.1.2.5, the intensity matrix $Q$ that defines $(P_t)_{t \geq 0} = (\me^{tQ})_{t \geq 0}$ is also called the infinitesimal generator of $(P_t)_{t \geq 0}$ .

From Lemma 10.1.2, the derivative of $\me^{tQ}$ is $Q\me^{tQ} = \me^{tQ}Q$ . We can write this as

$\dot P_t = QP_t$ , which is called the Kolmogorov backward equation, and
$\dot P_t = P_t Q$ , which is called the Kolmogorov forward equation.

We can work in the other direction as well: If we can establish that a function $t \mapsto P_t$ from $\RR_+$ to $\lopx$ satisfies either one of these equations, then $(P_t)_{t \geq 0}$ is a Markov semigroup with infinitesimal generator $Q$ . The next proposition gives details.

Proposition 10.1.8 is a version of our result for linear IVPs in Proposition 10.1.3, except that the IVP is now defined in operator space, rather than vector space.

10.1.4 Continuous-Time Markov Chains

We have discussed the connection between intensity matrices, Markov semigroups, and distribution flows. Let’s now connect these objects to continuous-time Markov chains. In this section, we will (a) provide a formal definition of a continuous-time Markov chain associated with a given initial condition $\psi$ and intensity matrix $Q$ , and (b) show how to construct such a chain algorithmically. We’ll accomplish (b) in two steps: first showing how to construct a jump chain from certain primitives (Section 10.1.4.2–Section 10.1.4.3) and then showing how to construct those primitives from a given initial condition $\psi$ and intensity matrix $Q$ (Section 10.1.4.4).

10.1.4.1 Definition

Let $C(\RR_+, \Xsf)$ be the set of right-continuous functions from $\RR_+$ to $\Xsf$ and let $(P_t)_{t \geq 0}$ be a Markov semigroup generated by some $Q \in \iopx$ . A continuous-time Markov chain generated by $(P_t)_{t \geq 0}$ is a random function $(X_t)_{t \geq 0}$ that takes values in $C(\RR_+, \Xsf)$ and satisfies

\PP \{ X_{s \,+\, t} = x' \given \fF_s \} = P_t(X_s, x') \qquad \text{for all } s, t \geq 0 \text{ and } x' \in \Xsf,

(10.30)

where $\fF_s \coloneq (X_\tau)_{0 \leq \tau \leq s}$ is the history of the process up to time $s$. To update from time $s$ to time $s + t$ given this history, we simply take the last value $X_s$ and update using $P_t$. Conditioning on $X_s = x$, we get

P_t(x, x') = \PP \{ X_{s \, + \, t} = x' \given X_s = x\} \qquad (s, t \geq 0, \; x, x' \in \Xsf).

Mirroring terminology for discrete chains from Section 3.1.1.1, we will call a continuous-time Markov chain $(X_t)_{t \geq 0}$ $Q$ -Markov when (10.30) holds and $Q$ is the infinitesimal generator of $(P_t)_{t \geq 0}$ .

In what follows, $\PP_x$ and $\EE_x$ denote probabilities and expectations conditional on $X_0 = x$ . Given $h \in \RR^\Xsf$ , we have

\EE_x \, h(X_t) = \sum_{x'} P_t(x, x')h(x') =: (P_t h)(x) .

This expression mirrors the discrete time case discussed in Section 3.2.1.1.
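In matrix terms, the expectation $\EE_x \, h(X_t)$ is one entry of a matrix-vector product. The short sketch below computes it both as the explicit sum over $x'$ and as $(P_t h)(x)$. The intensity matrix `Q`, the function `h`, the time `t` and the state `x` are hypothetical, and `expm` is an illustrative helper.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

Q = np.array([[-0.5, 0.5, 0.0],
              [0.2, -0.7, 0.5],
              [0.0, 0.3, -0.3]])
h = np.array([1.0, 0.0, -2.0])       # a function on the three states
t, x = 1.5, 1

P_t = expm(t * Q)
expectation = sum(P_t[x, y] * h[y] for y in range(3))   # sum over x'
all_states = P_t @ h                 # the whole function x -> E_x h(X_t)
print(expectation, all_states)
```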

10.1.4.2 A Jump Chain Construction

In Section 10.1.4.1 we defined a continuous-time Markov chain. In this section, we describe a standard method for constructing one by using three components:

(i) an initial condition $\psi \in \dD(\Xsf)$ ,

(ii) a jump matrix $\Pi \in \mopx$ , and

(iii) a rate function $\lambda$ mapping $\Xsf$ to $(0, \infty)$ .

The process $(X_t)$ starts at state $x$ , which is drawn from $\psi$ , waits there for an exponential time $W$ with rate $\lambda(x)$ , and then updates to a new state $x'$ drawn from $\Pi(x, \cdot)$ . We take $x'$ as the new state for the process and repeat.

These ideas are restated in Algorithm 10.1. In the algorithm, $(W_k)$ and $(Y_k)$ are drawn independently. The process $(W_k)$ is called the sequence of holding times or wait times, the sums $J_k = \sum_{i=1}^k W_i$ are called the jump times and $(Y_k)$ is called the embedded jump chain. The jumps and the process $(X_t)_{t \geq 0}$ are illustrated in Figure 10.2.
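The verbal description of the construction can be turned into a short simulation. The sketch below is one possible rendering, not a transcription of Algorithm 10.1: the function names `simulate_jump_chain` and `state_at`, the horizon `T`, the seed and the primitives `psi`, `Pi` and `lam` are all assumptions made for illustration.

```python
import numpy as np

def simulate_jump_chain(psi, Pi, lam, T, rng):
    """Simulate jump times J_k and embedded states Y_k up to time T."""
    n = len(psi)
    y = rng.choice(n, p=psi)                 # Y_0 drawn from psi
    t = 0.0
    jump_times, states = [0.0], [y]
    while True:
        t += rng.exponential(1.0 / lam[y])   # holding time W ~ Exp(lam(y))
        if t > T:
            break
        y = rng.choice(n, p=Pi[y])           # next state drawn from Pi(y, .)
        jump_times.append(t)
        states.append(y)
    return np.array(jump_times), np.array(states)

def state_at(t, jump_times, states):
    """X_t = Y_k for J_k <= t < J_{k+1}, giving a right-continuous path."""
    k = np.searchsorted(jump_times, t, side="right") - 1
    return states[k]

psi = np.array([0.2, 0.5, 0.3])              # hypothetical initial condition
Pi = np.array([[0.0, 0.7, 0.3],
               [0.5, 0.0, 0.5],
               [0.9, 0.1, 0.0]])             # hypothetical jump matrix
lam = np.array([1.0, 2.0, 0.5])              # hypothetical rate function

rng = np.random.default_rng(1234)
jump_times, states = simulate_jump_chain(psi, Pi, lam, T=10.0, rng=rng)
print(jump_times, states, state_at(5.0, jump_times, states))
```

Storing only the jump times and the embedded states is enough to recover $X_t$ at any $t \leq T$, since the path is constant between jumps.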

Let $I \in \lopx$ be the identity matrix, so $I(x,x') = \1\{x = x'\}$ , and define $Q \in \lopx$ via

Q(x, x') = \lambda(x)(\Pi(x, x') - I(x,x')) \qquad (x, x' \in \Xsf)

(10.31)

It is easy to verify that $Q$ is an intensity matrix. In fact, $Q$ is the intensity matrix for the Markov semigroup associated with the process generated by Algorithm 10.1. For $x \neq x'$, it tells us that probability flows from $x$ to $x'$ at rate $\lambda(x) \Pi(x, x')$, which is the rate of leaving $x$ times the probability of moving from $x$ to $x'$. The next result formalizes these ideas.
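The claim that the jump chain construction has intensity matrix (10.31) can be explored by simulation. The sketch below builds $Q$ from hypothetical primitives `Pi` and `lam`, simulates many independent copies of $X_t$ started at a fixed state, and compares the empirical distribution with the corresponding row of $\me^{tQ}$. The sampler `sample_X_t`, the sample size, the seed and the helper `expm` are assumptions of this example.

```python
import numpy as np

def expm(A, squarings=10, terms=20):
    """Approximate e^A: truncated series (10.6) plus scaling and squaring."""
    A = np.asarray(A, dtype=float)
    E = term = np.eye(A.shape[0])
    B = A / 2**squarings
    for k in range(1, terms + 1):
        term = term @ B / k
        E = E + term
    for _ in range(squarings):
        E = E @ E
    return E

Pi = np.array([[0.0, 0.7, 0.3],
               [0.5, 0.0, 0.5],
               [0.9, 0.1, 0.0]])             # hypothetical jump matrix
lam = np.array([1.0, 2.0, 0.5])              # hypothetical rate function
Q = lam[:, None] * (Pi - np.eye(3))          # (10.31), row x scaled by lam(x)

def sample_X_t(x, t, rng):
    """Run the jump chain from state x and return the state at time t."""
    clock = rng.exponential(1.0 / lam[x])    # first jump time
    while clock <= t:
        x = rng.choice(3, p=Pi[x])
        clock += rng.exponential(1.0 / lam[x])
    return x

rng = np.random.default_rng(0)
t, x0, n_sims = 1.0, 0, 5000
draws = np.array([sample_X_t(x0, t, rng) for _ in range(n_sims)])
empirical = np.bincount(draws, minlength=3) / n_sims
exact = expm(t * Q)[x0]                      # row x0 of e^{tQ}
print(empirical, exact)
```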

To prove Proposition 10.1.9 we take $(X_t)_{t \geq 0}$ to be as in the statement of the proposition and define $(P_t)_{t \geq 0}$ by $P_t(x, x') = \PP_x \{X_t = x'\}$ for all $x,x' \in \Xsf$ . The proof uses the following steps:

(i) Obtain an integral equation that $(P_t)_{t \geq 0}$ must satisfy.

(ii) Differentiate to obtain the Kolmogorov backward equation $\dot P_t = QP_t$ .

(iii) Solve this differential equation to obtain $P_t = \me^{t Q}$ for all $t$ .

Here is the first step. In the statement, $\Pi P_{t-\tau}$ is the matrix product of $\Pi$ and $P_{t-\tau}$ , while the equation in (10.32) is sometimes called the integrated Kolmogorov backward equation.

Proof

Fixing $x, x' \in \Xsf$ and $t > 0$ , we have

P_t(x, x') \coloneq \PP_x \{X_t = x'\} = \PP_x \{X_t = x', \; J_1 > t \} + \PP_x \{X_t = x', \; J_1 \leq t \}.

(10.33)

Regarding the first term on the right-hand side of (10.33),

\PP_x \{X_t = x', \; J_1 > t \} = I(x, x') \PP_x \{J_1 > t \} = I(x, x') e^{- t \lambda(x)}.

(10.34)

For the second term on the right-hand side of (10.33), we obtain

\PP_x \{X_t = x', \; J_1 \leq t \} = \EE_x \left[ \1\{J_1 \leq t\} \PP_x \{X_t = x' \,|\, W_1, Y_1\} \right] = \EE_x \left[ \1\{J_1 \leq t\} P_{t - J_1} (Y_1, x') \right].

Evaluating the expectation and using the independence of $J_1$ and $Y_1$ , this becomes

\begin{aligned} \PP_x \{X_t = x', \; J_1 \leq t \} & = \int_0^\infty \1\{\tau \leq t\} \sum_z \Pi(x, z) P_{t - \tau} (z, x') \lambda(x) e^{-\tau \lambda(x)} d \tau \\ & = \lambda(x) \int_0^t \sum_z \Pi(x, z) P_{t - \tau} (z, x') e^{-\tau \lambda(x)} d \tau. \end{aligned}

Combining this result with (10.33) and (10.34) gives (10.32). ◻

Differentiating the integrated Kolmogorov backward equation produces the Kolmogorov backward equation:

Proof

The claim that $P_0 = I$ is obvious. For the second claim, one can easily verify that, when $f$ is a differentiable function and $\alpha > 0$ , we have

g(t) = e^{- t \alpha} f(t) \quad \implies \quad g'(t) = e^{- t \alpha} f'(t) - \alpha g(t)

(10.35)

Note also that, with the change of variable $s = t - \tau$ , we can rewrite (10.32) as

P_t(x, x') = e^{-t \lambda(x)} \left\{ I(x, x') + \lambda(x) \int_0^t (\Pi P_s)(x, x') e^{s \lambda(x)} d s \right\}.

(10.36)

Applying (10.35) produces

\dot P_t(x, x') = e^{-t \lambda(x)} \left\{ \lambda(x) (\Pi P_t)(x, x') e^{t \lambda(x)} \right\} - \lambda(x) P_t(x, x').

Rearranging yields $\dot P_t(x, x') = \lambda(x) [ (\Pi - I) P_t](x, x')$ , which is identical to $\dot P_t = Q P_t$ . ◻
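As a numerical sanity check on the backward equation, the sketch below compares a finite-difference approximation of $\dot P_t$ with $Q P_t$ when $P_t = \me^{tQ}$. The three-state rates and jump matrix are hypothetical values chosen for illustration, and the central difference is only an approximation of the derivative.

```julia
using LinearAlgebra

# Hypothetical three-state example (not from the text)
λ = [1.0, 2.0, 0.5]
Π = [0.0 0.7 0.3;
     0.5 0.0 0.5;
     0.2 0.8 0.0]

Q = Diagonal(λ) * (Π - I)            # Q(x, x') = λ(x)(Π(x, x') - I(x, x'))
P(t) = exp(t * Q)                    # candidate solution P_t = e^{tQ}

t, ε = 1.3, 1e-6
dP = (P(t + ε) - P(t - ε)) / (2ε)    # central difference approximation of Ṗ_t

backward_err = maximum(abs.(dP - Q * P(t)))
row_sum_err = maximum(abs.(sum(P(t), dims=2) .- 1))
println("backward equation residual: ", backward_err)
println("row sums deviate from 1 by: ", row_sum_err)
```

Both numbers should be close to zero, the first up to finite-difference error and the second up to rounding.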

10.1.4.3Application: Inventory Dynamics¶

Let $X_t$ be a firm’s inventory at time $t$ . When current stock is $x > 0$ , customers arrive at rate $\lambda(x)$ , so the wait time for the next customer is an independent draw from the $\Exp (\lambda(x))$ distribution; $\lambda$ maps $\Xsf$ to $(0, \infty)$ .

The $k$ -th customer demands $U_k$ units, where each $U_k$ is an independent draw from a fixed distribution $\phi$ on $\NN$ . Purchases are constrained by inventory, however, so inventory falls by $U_k \wedge X_t$ . When inventory hits zero the firm orders $b$ units of new stock. The wait time for new stock is also exponential, being an independent draw from $\Exp (\lambda(0))$ .

Let $Y$ represent the inventory size after the next jump (induced by either a customer purchase or ordering new stock), given current stock $x$ . If $x > 0$ , then $Y$ is a draw from the distribution of $x - U \wedge x$ where $U \sim \phi$ . If $x=0$ , then $Y \equiv b$ . Hence $Y$ is a draw from $\Pi(x, \cdot)$ , where $\Pi(0, y) = \1\{y=b\}$ and, for $0 < x \leq b$ ,

\Pi(x, y) = \begin{cases} 0 & \text{ if } x \leq y \\ \PP\{x - U = y\} & \text{ if } 0 < y < x \\ \PP\{U \geq x\} & \text{ if } y = 0 \end{cases}

(10.37)

We can simulate the inventory process $(X_t)_{t \geq 0}$ via the jump chain algorithm. In this case, the wait time sequence $(W_k)$ is the wait time for customers (and for inventory when $X_t=0$ ) and the jump sequence $(Y_k)$ is the level of inventory immediately after each jump. By Proposition 10.1.9, the inventory process is $Q$ -Markov with $Q$ given by $Q(x, x') = \lambda(x)(\Pi(x, x') - I(x,x'))$ .

Figure 10.3 shows a simulation when orders are geometric, so that

\phi(k) = \PP\{U = k\} = (1-\alpha)^{k-1} \alpha \qquad (k \in \NN, \; \alpha \in (0, 1)).

In the simulation we set $\alpha=0.7$ , $b=10$ and $\lambda(x) \equiv 0.5$ . The figure plots $X_t$ for $t \in [0, 50]$ . Since each wait time $W_i$ is a draw from $\Exp(0.5)$ , the mean wait time is 2.0. The function that produces the map $t \mapsto X_t$ is shown in Program 1.

using Random, Distributions

"""
Generate a path for inventory starting at b, up to time T.

Return the path as a function X(t) constructed from (J_k) and (Y_k).
"""
function sim_path(; T=10, seed=123, λ=0.5, α=0.7, b=10)

    J, Y = 0.0, b
    J_vals, Y_vals = [J], [Y]
    Random.seed!(seed)
    φ = Exponential(1/λ)     # Wait times are exponential
    G = Geometric(α)         # Orders are geometric

    while true
        W = rand(φ)  
        J += W
        push!(J_vals, J)
        if Y == 0
            Y = b
        else
            U = rand(G) + 1   # Geometric on 1, 2,...
            Y = Y - min(Y, U)
        end
        push!(Y_vals, Y)
        if J > T
            break
        end
    end
    
    function X(t)
        # J_vals[k] ≤ t < J_vals[k+1], so the state held on this interval is Y_vals[k]
        k = searchsortedlast(J_vals, t)
        return Y_vals[k]
    end

    return X
end

Program 1: Continuous-time inventory dynamics (inventory_cont_time.jl)
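The matrices $\Pi$ and $Q$ for the inventory model can also be assembled directly from (10.37). The following sketch uses the parameter values from the simulation and the geometric demand distribution; the variable names (`rate`, the index shift that stores state $x$ at position $x+1$) are our own choices rather than anything fixed by the text. For geometric demand, $\PP\{U \geq x\} = (1-\alpha)^{x-1}$.

```julia
using LinearAlgebra

α, b, rate = 0.7, 10, 0.5
φ(k) = (1 - α)^(k - 1) * α           # geometric demand on 1, 2, ...

# States 0, 1, ..., b are stored at indices 1, ..., b + 1
Π = zeros(b + 1, b + 1)
Π[1, b + 1] = 1.0                    # at x = 0 the next state is b
for x in 1:b
    for y in 1:(x - 1)
        Π[x + 1, y + 1] = φ(x - y)   # P{x - U = y} = φ(x - y)
    end
    Π[x + 1, 1] = (1 - α)^(x - 1)    # P{U ≥ x} under geometric demand
end

λ = fill(rate, b + 1)                # constant jump rate λ(x) ≡ 0.5
Q = Diagonal(λ) * (Π - I)            # Q(x, x') = λ(x)(Π(x, x') - I(x, x'))

println("rows of Π sum to one: ", vec(sum(Π, dims=2)) ≈ ones(b + 1))
println("rows of Q sum to zero: ", maximum(abs.(sum(Q, dims=2))) < 1e-12)
```

With $Q$ in hand, transition probabilities over any horizon $t$ can be obtained as `exp(t * Q)`.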

10.1.4.4From Intensity Matrices to Jump Chains¶

If $Q \in \lopx$ is a given intensity matrix, how should we produce a continuous-time $Q$ -Markov chain? If we can construct a jump chain that is $Q$ -Markov, then not only do we obtain existence of a $Q$ -Markov chain but we also provide a way to simulate one (via Algorithm 10.1).

To construct such a jump chain we first fix an intensity matrix $Q \in \lopx$ and, to simplify matters, assume that all rows of $Q$ are nonzero. This means that the process has no absorbing states (since nonzero rows are equivalent to $Q(x,x) < 0$ for all $x$ , which in turn states that there is a nonzero outflow from each state).

Then we set

\lambda(x) \coloneq -Q(x,x) \quad \text{and} \quad \Pi(x,x') \coloneq I(x,x') + \frac{Q(x,x')}{\lambda(x)}.

It is straightforward to confirm that $\Pi \in \mopx$ and that $Q$ satisfies (10.31). Hence, by Proposition 10.1.9, the process $(X_t)_{t \geq 0}$ generated by Algorithm 10.1 is $Q$ -Markov.
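The construction above is easy to carry out numerically. Here is a minimal sketch, with a hypothetical intensity matrix and a helper name (`jump_chain_components`) of our own choosing, that recovers $\lambda$ and $\Pi$ from $Q$ and confirms that $\Pi$ is a stochastic matrix satisfying $Q = \lambda(\Pi - I)$.

```julia
using LinearAlgebra

"Split an intensity matrix Q with nonzero rows into jump rates λ and a jump matrix Π."
function jump_chain_components(Q)
    λ = -diag(Q)
    @assert all(λ .> 0) "every row of Q must be nonzero"
    Π = I + Diagonal(1 ./ λ) * Q     # Π(x, x') = I(x, x') + Q(x, x') / λ(x)
    return λ, Π
end

# Hypothetical intensity matrix: nonnegative off-diagonal entries, rows sum to zero
Q = [-2.0  1.5  0.5;
      1.0 -1.0  0.0;
      0.3  0.7 -1.0]

λ, Π = jump_chain_components(Q)
println("jump rates: ", λ)
println("jump matrix rows sum to one: ", vec(sum(Π, dims=2)) ≈ ones(3))
println("Q recovered: ", Diagonal(λ) * (Π - I) ≈ Q)
```

Note that $\Pi(x, x) = 0$ for every $x$ under this construction, so the jump chain always moves to a new state.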

10.2Continuous-Time Markov Decision Processes¶

We are ready to turn to dynamic programming in continuous time. As in the discrete time case, continuous-time dynamic programs aim to maximize a measure of lifetime value. In Section 10.2.1 we study lifetime valuations. In Section 10.2.2 we learn how to maximize them.

10.2.1Valuation¶

In this section, we consider lifetime valuations associated with continuous reward flows, starting from a general semigroup perspective and then progressing to specific cases (such as expected lifetime value under constant discounting). Throughout, $\Xsf$ is a finite set.

10.2.1.1A Semigroup Perspective¶

For the discrete time problems with state-dependent discounting that we studied in Chapter 6, lifetime valuations take the form $v = \sum_{t \geq 0} K^t h$ for some $h \in \RR^\Xsf$ and a positive linear operator $K$ on $\RR^\Xsf$ . (See Theorem 6.1.1 and (6.18).) For a continuous-time version we fix $h \in \RR^\Xsf$ , take $(K_t)_{t \geq 0}$ to be a positive exponential semigroup in $\lopx$ , where positive means $K_t \geq 0$ for all $t$ , and set

v = \int_0^\infty K_t h \diff t.

(10.38)

Let $A \in \lopx$ be the infinitesimal generator of $(K_t)_{\, t \geq 0}$ . The next result provides a condition for finiteness of $v$ and several characterizations.

A way to understand (10.39) is to view the valuation $v$ as a price that reflects prospective benefits from holding an asset. The asset yields a flow of benefits, where $h(x)$ is the instantaneous reward in state $x$ . Rewards $t$ periods in the future are discounted by the pricing operator $K_t$ . Thus, $(K_t h)(x)$ is the anticipated payoff $t$ periods ahead, discounted for the wait time and possibly also for risk as in (6.31). The value $v(x)$ is then lifetime value, which equals the current price.

In this asset valuation setting, (10.39) is a natural consistency condition. It says that the price of purchasing the asset today is equal to the payouts obtained from holding the asset from now until time $t$ and then selling it for current discounted value $K_t v$ . (This is the continuous-time analog of (6.33).)

The preceding discussion matches the semigroup perspective on asset pricing introduced in Garman (1985) and Duffie & Garman (1986). In addition to shedding light on (10.39), it also leads to the assertion that $v = - A^{-1} h$ in (ii), which is obtained by differentiating (10.39). Details are in the proof:

Proof

Proof of Proposition 10.2.1.

From Proposition 10.1.6, we have $K_t = \me^{tA}$ for all $t \geq 0$ . Since $s(A) < 0$ , Theorem 10.1.5 implies that the integral in (10.38) is finite. For any $t \geq 0$ ,

v = \int_0^\infty K_\tau h \diff \tau = \int_0^t K_\tau h \diff \tau + \int_t^\infty K_\tau h \diff \tau .

Using the semigroup property and linearity of $K_t$ , we can write the last term on the right-hand side as

\int_t^\infty K_\tau h \diff \tau = \int_0^\infty K_{t + \tau} h \diff \tau = \int_0^\infty K_t K_\tau h \diff \tau = K_t \int_0^\infty K_\tau h \diff \tau = K_t v.

Combining this result with the expression for $v$ in the previous display proves (10.39). This proves part (i) of the proposition.

Turning to (ii), if we rearrange (10.39) and divide by $t > 0$ , we get

-\frac{K_t - I}{t} v = \frac{1}{t} \int_0^t K_\tau h \diff \tau .

(10.41)

By the fundamental theorem of calculus,

\lim_{t \to 0} \frac{1}{t} \int_0^t K_\tau h \diff \tau = \frac{\diff}{\diff t} \int_0^t K_\tau h \diff \tau \; \Bigr\rvert_{\, t = 0} = K_0 \, h = I \, h = h.

As a result, taking $t \to 0$ in (10.41) and using the definition of the infinitesimal generator yields $- A v = h$ . Moreover, since $s(A) < 0$ , all eigenvalues of $A$ are nonzero. Hence $A$ has nonzero determinant and is therefore nonsingular (bijective). Combining these facts yields $v = - A^{-1} h$ .

Regarding (iii), fix $g \in \RR^\Xsf$ with $g \geq 0$ . From the preceding results, the function $w = \int_0^\infty K_t g \diff t$ is finite and equals $-A^{-1} g$ . Since $K_t \geq 0$ for all $t$ , we have $w \geq 0$ . Thus, $-A^{-1} g \geq 0$ whenever $g \geq 0$ . Hence $-A^{-1} \geq 0$ , or $A^{-1} \leq 0$ .

For (iv) we use the fact that $v$ obeys $-A v = h$ to obtain $v = h + (I + A) v$ . Hence $v$ is a fixed point of $U$ . Conversely, if $w$ is a fixed point of $U$ , then $-A w = h$ . But $A$ is invertible, so then $w = - A^{-1} h = v$ . Hence $v$ is the only fixed point of $U$ in $\RR^\Xsf$ .

Order stability of $U$ requires upward and downward stability on $\RR^\Xsf$ . For upward stability, suppose that $w \in \RR^\Xsf$ and $Uw \geq w$ . Then $h + A w \geq 0$ , or $- A w \leq h$ . But $-A^{-1} \geq 0$ , so $w \leq - A^{-1} h = v$ , and upward stability holds. The proof of downward stability is similar. ◻
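The claims established in this proof can be illustrated on a small example. The sketch below assumes a hypothetical generator $A$ built as an intensity matrix minus a positive constant (so that $s(A) < 0$) and a hypothetical reward vector $h$. It uses the closed form $\int_0^t \me^{\tau A} \diff \tau = A^{-1}(\me^{tA} - I)$, which is valid because $A$ is nonsingular.

```julia
using LinearAlgebra

# Hypothetical generator with s(A) < 0 and hypothetical reward vector
Q = [-1.0  1.0  0.0;
      0.5 -1.5  1.0;
      0.0  2.0 -2.0]
A = Q - 0.1I
h = [1.0, 0.0, 2.0]

K(t) = exp(t * A)                          # K_t = e^{tA}
v = -(A \ h)                               # v = -A⁻¹ h

t = 0.75
flow = A \ ((K(t) - I) * h)                # ∫₀ᵗ K_τ h dτ = A⁻¹ (K_t - I) h

consistency_gap = norm(v - (flow + K(t) * v))   # v = ∫₀ᵗ K_τ h dτ + K_t v
fixed_point_gap = norm(h + (I + A) * v - v)     # v = U v with U w = h + (I + A) w
neg_inverse_positive = all(-inv(A) .>= 0)       # -A⁻¹ ≥ 0

println(consistency_gap, "  ", fixed_point_gap, "  ", neg_inverse_positive)
```

Both gaps should be zero up to rounding, and the final flag should be `true`.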

10.2.1.2Valuations as Expectations¶

In applications, the expression $v = \int_0^\infty K_t \, h \diff t$ from (10.38) typically arises as a discounted expectation over a flow of rewards. When analyzing $v$ we wish to deploy Proposition 10.2.1, so we need to check that any expectation we propose results in $(K_t)$ being a semigroup. The next proposition provides one result along these lines.

In the proof of Proposition 10.2.2, we will use the fact that $(X_t)_{t \geq 0}$ satisfies the Markov property. In particular, if $H$ is a real-valued function on the path space $C(\RR_+, \Xsf)$ , then

\EE \left[ H( (X_\tau)_{\tau \geq s} ) \,|\, (X_\tau)_{\tau=0}^s \right] = \EE_{X_s} H( (X_\tau)_{\tau \geq 0} ).

(10.43)

For a proof of (10.43), see, for example, Chapter 2 of Liggett (2010).

Proof

Proof of Proposition 10.2.2.

Fix $h \in \RR^\Xsf$ . Evidently $(K_0 \, h)(x) = h(x)$ , so $K_0 = I$ . Regarding the semigroup property, we fix $s, t \geq 0$ and use Exercise 10.2.1 and the law of iterated expectations to obtain

(K_{s + \, t} \, h)(x) = \EE_x \, \eta(0,s + t) \, h(X_{s + t}) = \EE_x \, \left[ \eta(0,s) \, \EE \left[ \eta(s, s+t) \, h(X_{s + t}) \,|\, (X_\tau)_{\tau=0}^s \right] \right].

Using the Markov property (10.43), the inner expectation in the last display can be expressed as

\begin{aligned} \EE & \left[ \exp \left(- \int_s^{s+t} \delta(X_\tau) \diff \tau \right) h(X_{s + \, t}) \,|\, (X_\tau)_{\tau=0}^s \right] \\ & = \EE_{X_s} \left[ \exp \left(- \int_0^t \delta(X_\tau) \diff \tau \right) h(X_t) \right] = (K_t \, h) (X_s). \end{aligned}

Substituting this expression into the preceding display gives

(K_{s \, + \, t} \, h)(x) = \EE_x \eta(0, s) (K_t \, h) (X_s) = \EE_x \exp \left(- \int_0^s \delta(X_\tau) \diff \tau \right) (K_t \, h) (X_s) = (K_s \, K_t \, h) (x).

This argument confirms that $K_{s \, + \, t} = K_s \circ K_t$ .

To see that $K_t$ is a positive operator for all $t$ , observe that if $h \geq 0$ , then the expectation in (10.42) is nonnegative. Hence $K_t \, h \geq 0$ whenever $h \geq 0$ .

To prove continuity of $t \mapsto K_t h$ , it suffices to show that $(K_t h)(x) \to h(x)$ as $t \downarrow 0$ (see, e.g., Engel & Nagel (2006), Proposition 1.3). This holds by right-continuity of $X_t$ , which gives $h(X_t) \to h(x)$ as $t \downarrow 0$ , and hence

\lim_{t \downarrow 0} (K_t h)(x) = \EE_x \lim_{t \downarrow 0} \exp \left(- \int_0^t \delta(X_\tau) \diff \tau \right) h(X_t) = h(x).

(Readers familiar with measure theory can justify the change of limit and expectation via the dominated convergence theorem.) ◻

10.2.1.3Constant Discounting¶

Many studies of continuous-time dynamic programming with discounting use a constant discount rate. In this setting, the lifetime value in (10.38) becomes

v(x) \coloneq \EE_x \int_0^\infty \me^{-t \delta} h(X_t) \diff t

(10.44)

for some $\delta \in \RR$ and $h \in \RR^\Xsf$ . Here $(X_t)_{t \geq 0}$ is a continuous-time Markov chain on the finite state space $\Xsf$ generated by Markov semigroup $(P_t)_{t \geq 0}$ with intensity operator $Q$ . The idea is that $h(X_t)$ is an instantaneous reward at each time $t$ , while $\delta$ is a fixed discount rate.

Equation (10.44) is the continuous-time version of (3.16).

Proof

As a first step, we reverse the order of expectation and integration in (10.44) to get

v(x) = \int_0^\infty (K_t h)(x) \diff t \quad \text{where} \quad (K_t h)(x) \coloneq \me^{-t \delta} \, \EE_x \, h(X_t) = \me^{-t \delta} (P_t \, h)(x).

(This change of order can be justified by Fubini’s theorem, which can be applied when $\EE_x \, \int_0^\infty \me^{-t \delta} \, | h(X_t) | \diff t < \infty$ . Since $\Xsf$ is finite, we have $|h| \leq M < \infty$ for some constant $M$ , and the double integral is dominated by $M \int_0^\infty \me^{-t\delta} \diff t = M / \delta$ .)

Note that $K_t$ is a special case of (10.42). Hence $(K_t)_{t \geq 0}$ is a positive $C_0$ -semigroup. Its infinitesimal generator is $A \coloneq Q - \delta I$ , since $K_t = \me^{-t \delta} P_t = \me^{t(Q - \delta I)}$ . We claim that $s(A) < 0$ . To see this, observe that (using (10.17)),

\me^{s(Q - \delta I)} = \rho(\me^{Q - \delta I}) = \rho(\me^Q \me^{- \delta I}) = \rho(\me^Q \me^{- \delta} I) = \me^{- \delta} \rho(\me^Q ) = \me^{- \delta} \rho(P_1) = \me^{- \delta}.

Taking logs gives $s(Q - \delta I) = -\delta$ . Since $\delta > 0$ , we have $s(Q - \delta I) < 0$ , as claimed.

We can now apply Proposition 10.2.1 with $A = Q - \delta I$ and $K_t = \me^{tA}$ . The proposition tells us that $A$ is bijective, and

v = -A^{-1} h = (- A)^{-1} h = (\delta I - Q)^{-1} h .

It also tells us that $-A^{-1} \geq 0$ , so $(\delta I - Q)^{-1} = (-A)^{-1} = - A^{-1} \geq 0$ . This confirms both claims in (10.45). Finally, the operator $U$ in (10.46) is a special case of $U$ in (10.40), with $A = Q - \delta I$ , so $U$ is order stable with unique fixed point $v$ (by Proposition 10.2.1). All of the claims in Proposition 10.2.3 are now verified. ◻
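To illustrate, the following sketch computes $v = (\delta I - Q)^{-1} h$ for a hypothetical three-state chain and compares it with a direct numerical approximation of $\int_0^\infty \me^{-t\delta} P_t h \diff t$. The intensity matrix, discount rate, reward vector, truncation horizon `T` and grid size `n` are all illustrative assumptions. It also checks that the spectral bound of $Q - \delta I$ equals $-\delta$.

```julia
using LinearAlgebra

# Hypothetical intensity matrix, discount rate and reward vector
Q = [-1.0  1.0  0.0;
      0.5 -1.5  1.0;
      0.0  2.0 -2.0]
δ = 0.5
h = [1.0, 0.0, 2.0]

s = maximum(real.(eigvals(Q - δ * I)))     # spectral bound of A = Q - δI
v = (δ * I - Q) \ h                        # closed-form lifetime value

"Approximate ∫₀ᵀ e^{-tδ} P_t h dt by the trapezoid rule on a uniform grid."
function value_by_quadrature(; T=60.0, n=6000)
    Δ = T / n
    step = exp(Δ * (Q - δ * I))            # K_Δ, so that K_{t+Δ} h = K_Δ K_t h
    Kt_h = copy(h)
    total = zeros(length(h))
    for _ in 1:n
        next = step * Kt_h
        total += Δ * (Kt_h + next) / 2
        Kt_h = next
    end
    return total
end

println("spectral bound: ", s)
println("gap between closed form and quadrature: ", norm(value_by_quadrature() - v))
```

The quadrature relies on the semigroup property to step forward in time, so only one matrix exponential is computed.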

10.2.2Constructing a Decision Process¶

In this section, we define continuous-time Markov decision processes, discuss optimality theory, and provide algorithms and applications.

10.2.2.1Definition¶

Given two finite sets $\Xsf$ and $\Asf$ , called the state and action spaces respectively, we define a continuous-time Markov decision process (or continuous-time MDP) to be a tuple $\cC = (\Gamma, \delta, r, Q)$ consisting of

(i) a nonempty correspondence $\Gamma$ from $\Xsf$ to $\Asf$ , referred to as the feasible correspondence, which in turn defines the feasible state-action pairs

\Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)},

(ii) a constant $\delta > 0$ , referred to as the discount rate,

(iii) a function $r$ from $\Gsf$ to $\RR$ , referred to as the reward function, and

(iv) an intensity kernel $Q$ from $\Gsf$ to $\Xsf$ ; that is, a map $Q$ from $\Gsf \times \Xsf$ to $\RR$ satisfying

\sum_{x'} Q(x, a, x') = 0 \quad \text{ for all } (x,a) \text{ in } \Gsf

and $Q(x, a, x') \geq 0$ whenever $x \neq x'$ .

Informally, at state $x$ with action $a$ over the short interval from $t$ to $t+h$ , the controller receives instantaneous reward $r(x,a)h$ and the state transitions to state $x' \neq x$ with probability $Q(x, a, x') h + o(h)$ .

Paralleling our discussion of the discrete time case in Chapter 5, the set of feasible policies is

\Sigma \coloneq \setntn{\sigma \in \Asf^\Xsf} {\sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf}.

(10.47)

10.2.2.2Lifetime Values¶

Choosing policy $\sigma$ from $\Sigma$ means that we respond to state $X_t$ with action $A_t \coloneq \sigma(X_t)$ at every $t \geq 0$ . The state then evolves according to the intensity operator

Q_\sigma(x, x') \coloneq Q(x, \sigma(x), x') \qquad (x, x' \in \Xsf).

Letting

P^\sigma_t \coloneq \me^{t Q_\sigma} \quad \text{and} \quad r_\sigma(x) \coloneq r(x, \sigma(x)) \qquad (x \in \Xsf)

the lifetime value of following $\sigma$ starting from state $x$ is

v_\sigma (x) \coloneq \EE_x \int_0^\infty \me^{-\delta t} r(X_t, \sigma(X_t)) \diff t = \EE_x \int_0^\infty \me^{-\delta t} r_\sigma(X_t) \diff t,

(10.48)

where $(X_t)_{t \geq 0}$ is $Q_\sigma$ -Markov with initial condition $x$ . We call $v_\sigma$ the $\sigma$ -value function.

Since $\delta > 0$ , we can apply Proposition 10.2.3 to obtain

v_\sigma = (\delta I - Q_\sigma)^{-1} r_\sigma.

(10.49)

Representation (10.49) provides a straightforward method for computing $v_\sigma$ .
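A minimal sketch of this computation follows. The two-state, two-action MDP is hypothetical, as are the array layout (`Q[x, a, x']`, `r[x, a]`) and the function name `policy_value`; every action is assumed feasible in every state.

```julia
using LinearAlgebra

# Hypothetical two-state, two-action MDP; Q[x, a, x'] is the intensity kernel
Q = zeros(2, 2, 2)
Q[1, 1, :] = [-1.0, 1.0];  Q[1, 2, :] = [-3.0, 3.0]
Q[2, 1, :] = [2.0, -2.0];  Q[2, 2, :] = [0.5, -0.5]
r = [1.0 0.0;                # r[x, a]
     3.0 2.5]
δ = 0.1

"Return v_σ = (δI - Q_σ)⁻¹ r_σ for a policy σ given as a vector of action indices."
function policy_value(σ, Q, r, δ)
    n = length(σ)
    Q_σ = [Q[x, σ[x], y] for x in 1:n, y in 1:n]
    r_σ = [r[x, σ[x]] for x in 1:n]
    return (δ * I - Q_σ) \ r_σ
end

v = policy_value([1, 2], Q, r, δ)
println(v)
```

Solving the linear system with `\` avoids forming the inverse explicitly.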

10.2.2.3Greedy Policies¶

A policy $\sigma \in \Sigma$ is called $v$ -greedy for $\cC$ if

\sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{r(x, a) + \sum_{x'} v(x') Q(x, a, x')\right\} \quad \text{for all } x \in \Xsf.

(10.50)

Like the discrete time case, a $v$ -greedy policy chooses actions optimally to trade off high current rewards versus high rate of flow into future states with high values. Unlike the discrete time case, the discount factor does not appear in (10.50) because the trade-off is instantaneous.
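In code, a $v$ -greedy policy can be obtained by maximizing the bracketed term in (10.50) state by state. The sketch below assumes a hypothetical two-state, two-action MDP in which every action is feasible, stores the kernel as `Q[x, a, x']`, and breaks ties by choosing the first maximizer; the function name `greedy` is our own.

```julia
# Hypothetical two-state, two-action MDP; Q[x, a, x'] is the intensity kernel
Q = zeros(2, 2, 2)
Q[1, 1, :] = [-1.0, 1.0];  Q[1, 2, :] = [-3.0, 3.0]
Q[2, 1, :] = [2.0, -2.0];  Q[2, 2, :] = [0.5, -0.5]
r = [1.0 0.0;                # r[x, a]
     3.0 2.5]

"Return a v-greedy policy as a vector of action indices (first maximizer on ties)."
function greedy(v, Q, r)
    n, m = size(r)
    return [argmax([r[x, a] + sum(v[y] * Q[x, a, y] for y in 1:n) for a in 1:m])
            for x in 1:n]
end

σ = greedy([19.375, 20.3125], Q, r)
println(σ)
```

Because each row of $Q(x, a, \cdot)$ sums to zero, adding a constant to $v$ leaves the greedy policy unchanged.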

10.2.2.4Policy Iteration¶

We introduce a continuous-time policy iteration algorithm that parallels discrete time HPI for Markov decision processes, as described in Section 5.1.4.2.

The continuous-time HPI routine is given in Algorithm 10.2, with the intuition being similar to that for the discrete time MDP version. We provide convergence results in Section 10.2.3.
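The sketch below shows one way such a routine might be coded. It assumes the standard Howard scheme (evaluate the current policy via (10.49), switch to a greedy policy, stop when the policy no longer changes), which may differ in detail from the algorithm as stated. The MDP, the array layout and all function names are hypothetical, and every action is assumed feasible in every state.

```julia
using LinearAlgebra

# Hypothetical two-state, two-action MDP; Q[x, a, x'] is the intensity kernel
Q = zeros(2, 2, 2)
Q[1, 1, :] = [-1.0, 1.0];  Q[1, 2, :] = [-3.0, 3.0]
Q[2, 1, :] = [2.0, -2.0];  Q[2, 2, :] = [0.5, -0.5]
r = [1.0 0.0;                # r[x, a]
     3.0 2.5]
δ = 0.1

# Policy evaluation: v_σ = (δI - Q_σ)⁻¹ r_σ
function policy_value(σ, Q, r, δ)
    n = length(σ)
    Q_σ = [Q[x, σ[x], y] for x in 1:n, y in 1:n]
    r_σ = [r[x, σ[x]] for x in 1:n]
    return (δ * I - Q_σ) \ r_σ
end

# Policy improvement: a v-greedy policy, first maximizer on ties
function greedy(v, Q, r)
    n, m = size(r)
    return [argmax([r[x, a] + dot(Q[x, a, :], v) for a in 1:m]) for x in 1:n]
end

"Iterate evaluation and improvement until the policy stops changing."
function howard_policy_iteration(Q, r, δ; max_iter=1_000)
    σ = ones(Int, size(r, 1))            # arbitrary initial policy
    for _ in 1:max_iter
        v = policy_value(σ, Q, r, δ)
        σ_new = greedy(v, Q, r)
        σ_new == σ && return σ, v
        σ = σ_new
    end
    error("policy iteration did not converge")
end

σ_star, v_star = howard_policy_iteration(Q, r, δ)
println("policy: ", σ_star, "  value: ", v_star)
```

The `max_iter` guard is a defensive choice for the sketch; with finitely many policies the loop is expected to stop well before it binds.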

10.2.2.5Policy Operators¶

For each $\sigma \in \Sigma$ , let $T_\sigma$ be the operator defined at $v \in \RR^\Xsf$ by

T_\sigma \, v = r_\sigma + (Q_\sigma + (1 - \delta) I) v.

(10.51)

As shown in Proposition 10.2.3, each $T_\sigma$ is order stable on $\RR^\Xsf$ , with unique fixed point $v_\sigma$ . Hence $\aA \coloneq (\RR^\Xsf, \{T_\sigma\})$ is an order stable ADP.

Solution to Exercise 10.2.2

Fix $v \in \RR^\Xsf$ . Policy $\sigma$ is $v$ -max-greedy for $\aA$ if and only if $T_\sigma \, v \geq T_\tau \, v$ for all $\tau \in \Sigma$ , which in turn holds if and only if

\begin{aligned} r(x, \sigma(x)) & + \sum_{x'} v(x') Q(x, \sigma(x), x') + (1 - \delta) v(x) \\ & = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v(x') Q(x, a, x') \right\} + (1 - \delta) v(x), \end{aligned}

for all $x \in \Xsf$ . Canceling terms, this reduces to

r(x, \sigma(x)) + \sum_{x'} v(x') Q(x, \sigma(x), x') = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v(x') Q(x, a, x') \right\},

for all $x \in \Xsf$ , which is equivalent to the definition of $v$ -greedy for $\cC$ in (10.50).

10.2.3Optimality¶

For a continuous-time MDP $\cC = (\Gamma, \delta, r, Q)$ with $\sigma$ -value functions $\{v_\sigma\}$ , the value function generated by $\cC$ is $v^* \coloneq \bigvee_\sigma v_\sigma$ , and a policy $\sigma \in \Sigma$ is called optimal for $\cC$ if $v_\sigma = v^*$ .

A function $v \in \RR^\Xsf$ is said to satisfy a Hamilton–Jacobi–Bellman (HJB) equation if

\delta v(x) = \max_{a \in \Gamma(x)} \left\{r(x, a) + \sum_{x'} v(x') Q(x, a, x')\right\} \quad \text{for all } x \in \Xsf.

(10.52)

We say that $\cC$ obeys Bellman’s principle of optimality if

\sigma \in \Sigma \text{ is optimal for } \cC \quad \iff \quad \sigma \text{ is } v^* \text{-greedy}.

Here is our main optimality result for continuous-time MDPs.

Proof

Let $\cC = (\Gamma, \delta, r, Q)$ be a fixed continuous-time MDP with lifetime values $\{v_\sigma\}$ and value function $v^*$ . Consider the order stable ADP $\aA \coloneq (\RR^\Xsf, \{T_\sigma\})$ discussed in Section 10.2.2.5. The ADP Bellman max-operator is $T \coloneq \bigvee_\sigma T_\sigma$ , which can be written more explicitly as

(T v)(x) = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v(x') Q(x, a, x') \right\} + (1 - \delta) v(x) .

(10.53)

It is clear from (10.50) and Exercise 10.2.2 that, for each $v \in \RR^\Xsf$ , the set of $v$ -max-greedy policies is nonempty. Since $\Sigma$ is finite, it follows from Proposition 9.2.1 that $\aA$ is max-stable. Hence, by Theorem 9.2.4, an optimal policy always exists and the value function $v^*$ is the unique fixed point of $T$ in $\RR^\Xsf$ . The last statement is equivalent to the assertion that $v^*$ is the unique element of $\RR^\Xsf$ satisfying

v^*(x) = \max_{a \in \Gamma(x)} \left\{ r(x,a) + \sum_{x'} v^*(x') Q(x, a, x') \right\} + (1 - \delta) v^*(x).

Rearranging this expression confirms that $v^*$ is the unique solution to the HJB equation in $\RR^\Xsf$ .

Applying Theorem 9.2.4 again, a policy is optimal for $\aA$ if and only if $T_\sigma \, v^* = T v^*$ . Since the definition of optimality for $\aA$ coincides with the definition of optimality for $\cC$ , we see that $\cC$ obeys Bellman’s principle of optimality.

The continuous-time HPI routine described in Algorithm 10.2 is just ADP max-HPI (see Section 9.2.1.4) specialized to the current setting. Hence, applying Theorem 9.2.4 once more, continuous time HPI converges to an optimal policy in finitely many steps. ◻
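For a small problem, the characterization of $v^*$ can be checked by brute force. The following sketch assumes a hypothetical two-state, two-action MDP with all actions feasible. It enumerates every policy, forms $v^* = \bigvee_\sigma v_\sigma$ as a pointwise maximum, and then checks that $v^*$ satisfies the HJB equation (10.52) and is attained by a single policy.

```julia
using LinearAlgebra

# Hypothetical two-state, two-action MDP; Q[x, a, x'] is the intensity kernel
Q = zeros(2, 2, 2)
Q[1, 1, :] = [-1.0, 1.0];  Q[1, 2, :] = [-3.0, 3.0]
Q[2, 1, :] = [2.0, -2.0];  Q[2, 2, :] = [0.5, -0.5]
r = [1.0 0.0;                # r[x, a]
     3.0 2.5]
δ = 0.1
n, m = size(r)

# v_σ = (δI - Q_σ)⁻¹ r_σ
policy_value(σ) = (δ * I - [Q[x, σ[x], y] for x in 1:n, y in 1:n]) \
                  [r[x, σ[x]] for x in 1:n]

policies = vec([[a1, a2] for a1 in 1:m, a2 in 1:m])          # all of Σ
vals = [policy_value(σ) for σ in policies]
v_star = [maximum(v[x] for v in vals) for x in 1:n]          # pointwise maximum

# Right-hand side of the HJB equation evaluated at v*
hjb_rhs = [maximum(r[x, a] + dot(Q[x, a, :], v_star) for a in 1:m) for x in 1:n]

optimal = [σ for (σ, v) in zip(policies, vals) if v ≈ v_star]
println("HJB holds: ", δ .* v_star ≈ hjb_rhs)
println("optimal policies: ", optimal)
```

Enumeration is only feasible for tiny problems, since the number of policies grows exponentially in the number of states.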

10.2.4Application: Job Search¶

Here we study a continuous-time version of the job search problem with separation considered in Section 3.3.2. As before, a worker can be either unemployed (state 0) or employed (state 1). When the worker is employed, she can be fired at any time. Firing occurs at rate $\alpha > 0$ , meaning that the probability of being fired over the short interval from $t$ to $t+h$ is approximately $\alpha h$ . When unemployed, the worker receives flow unemployment compensation $c$ and job offers at rate $\kappa$ . She can choose either to accept or to reject an offer; she discounts the future at rate $\delta > 0$ .

We assume that job offers are associated with wage offers that take values in finite set $\Wsf$ . Let $P \in \mopw$ give probabilities for new wage draws, so that, conditional on previous draw $w$ , the next offer is drawn from $P(w, \cdot)$ .

For the state space we set $\Xsf = \{0, 1\} \times \Wsf$ , with typical state $x = (s, w)$ . Here $s$ is binary and indicates current employment status, while $w$ is the current wage. Let

\lambda(x) = \lambda(s, w) = \1\{s = 0\} \kappa + \1\{s = 1\} \alpha

denote the state-dependent jump rate, which switches between $\kappa$ and $\alpha$ depending on employment status.

Let $a \in \Asf \coloneq \{0,1\}$ indicate the action, where 0 means reject and 1 means accept. Let $\Pi(x, a, x')$ represent the jump probabilities, with

\begin{aligned} \Pi((0, w), a, (0, w')) & = P(w, w') (1-a) \qquad \text{(unemployed to unemployed)} \\ \Pi((0, w), a, (1, w')) & = P(w, w') a \; \,\qquad \qquad \text{(unemployed to employed)} \\ \Pi((1, w), a, (0, w')) & = P(w, w') \quad \qquad \qquad \text{(employed to unemployed)} \\ \Pi((1, w), a, (1, w')) & = 0. \qquad\qquad\qquad \quad \; \; \, \, \text{(employed to employed)} \end{aligned}

The first two lines consider jump probabilities for the state $(s, w)$ when unemployed and the action is $a$ . The last two consider jump probabilities when employed. The reason that the probability assigned to the last line is zero is that a jump from $s=1$ occurs because the worker is fired, so the value of $s$ after the jump is zero.

Motivated by the jump chain construction of intensity matrices in (10.31), we set

Q(x, a, x') = \lambda(x) (\Pi(x, a, x') - I(x, x')).

It follows that, for any $\sigma \in \Sigma \coloneq \{0,1\}^\Xsf$ , the operator

Q_\sigma(x, x') \coloneq \lambda(x) (\Pi(x, \sigma(x), x') - I(x, x')),

is an intensity matrix for the jump chain under policy $\sigma$ .

If we define

r(x, a) = r((s, w), a) = c \1\{s = 0\} + w \1\{s = 1\},

then lifetime value is given by (10.48), where $(X_t)_{t \geq 0}$ is $Q_\sigma$ -Markov and $X_0 = x$ .

With $\Gamma$ defined by $\Gamma(x) = \Asf$ for all $x \in \Xsf$ , the tuple $\cC = (\Gamma, \delta, r, Q)$ is a continuous-time MDP and Theorem 10.2.4 applies. In particular, an optimal policy exists and can be computed with HPI in a finite number of iterations.

Figure 10.4 shows an optimal policy computed in this way. (Code and parameter values can be found in cont_time_js.jl.) The policy is of threshold type, with a reservation wage of around 12. Figure 10.5 shows how this reservation wage changes with parameters. The reservation wage increases as the separation rate falls, as the offer rate increases, as the discount rate falls, and as unemployment compensation increases.

Figure 10.4: Continuous-time job search policy

Figure 10.5: Continuous-time job search reservation wage
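The job search model can be assembled and solved along the following lines. This is a sketch under stated assumptions rather than a reproduction of cont_time_js.jl: the parameter values, the wage grid, the persistent offer kernel `P` and all function names are hypothetical. Actions $a \in \{0, 1\}$ are stored at indices $a + 1$ , and state $(s, w_j)$ is stored at index $s \cdot n_w + j$ .

```julia
using LinearAlgebra

# Hypothetical parameters and wage grid (not the values used for the figures)
α, κ, δ, c = 0.2, 1.0, 0.05, 14.0
w_vals = collect(range(10.0, 20.0; length=5))
nw = length(w_vals)
# Hypothetical offer kernel: keep the same wage with prob. 0.6, else draw uniformly
P = 0.6 * Matrix(1.0I, nw, nw) + 0.4 * fill(1 / nw, nw, nw)

idx(s, j) = s * nw + j               # index of state (s, w_j), with s ∈ {0, 1}
n = 2nw

Q = zeros(n, 2, n)                   # Q[x, a + 1, x']
r = zeros(n, 2)                      # r[x, a + 1]
for j in 1:nw, a in 0:1
    u, e = idx(0, j), idx(1, j)
    for k in 1:nw
        Q[u, a + 1, idx(0, k)] += κ * P[j, k] * (1 - a)   # unemployed to unemployed
        Q[u, a + 1, idx(1, k)] += κ * P[j, k] * a         # unemployed to employed
        Q[e, a + 1, idx(0, k)] += α * P[j, k]             # employed to unemployed
    end
    Q[u, a + 1, u] -= κ              # subtract λ(x) I(x, x')
    Q[e, a + 1, e] -= α
    r[u, a + 1] = c                  # flow unemployment compensation
    r[e, a + 1] = w_vals[j]          # flow wage
end

policy_value(σ) = (δ * I - [Q[x, σ[x], y] for x in 1:n, y in 1:n]) \
                  [r[x, σ[x]] for x in 1:n]
greedy(v) = [argmax([r[x, a] + dot(Q[x, a, :], v) for a in 1:2]) for x in 1:n]

function solve_by_hpi(; max_iter=1_000)
    σ = ones(Int, n)                 # start from "reject" in every state
    for _ in 1:max_iter
        v = policy_value(σ)
        σ_new = greedy(v)
        σ_new == σ && return σ, v
        σ = σ_new
    end
    error("policy iteration did not converge")
end

σ_star, v_star = solve_by_hpi()
accept = σ_star[1:nw] .== 2          # accept decision in each unemployed state
println(collect(zip(w_vals, accept)))
```

The action has no effect in employed states, so the entries of `σ_star` at those indices carry no economic content.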

10.3Chapter Notes¶

Applebaum (2019) and Engel & Nagel (2006) provide elegant introductions to semigroup theory and its applications in studying partial and stochastic differential equations. The beautiful book by Lasota & Mackey (1994) covers connections among semigroups, Markov processes, and stochastic differential equations. Norris (1998) provides a good introduction to continuous-time Markov chains, while Liggett (2010) is more advanced.

A rigorous treatment of continuous-time MDPs can be found in Hernández-Lerma & Lasserre (2012), which also handles the case where $\Xsf$ is countably infinite. Our approach is somewhat different, since our main optimality results rest on the ADP theory in Chapter 9.

In recent years, continuous-time dynamic programming has become more common in macroeconomic analysis. Influential references include Nuño & Moll (2018), Kaplan et al. (2018), Achdou et al. (2022), and Fernández-Villaverde et al. (2023). For computational aspects, see Duarte (2018), Ráfales & Vázquez (2021), Rendahl (2022), and Eslami & Phelan (2023).

Footnotes¶

Other names for intensity matrices include “ $Q$ -matrices” (which is fine until you need to use another symbol), “Kolmogorov matrices,” and “infinitesimal stochastic matrices.”

References¶

Hirsch, M. W., & Smale, S. (1974). Differential Equations, Dynamical Systems and Linear Algebra. Academic Press.
Engel, K.-J., & Nagel, R. (2006). A short course on operator semigroups. Springer Science & Business Media.
Applebaum, D. (2019). Semigroups of Linear Operators (Vol. 93). Cambridge University Press.
Garman, M. B. (1985). Towards a semigroup pricing theory. The Journal of Finance, 40(3), 847–861.
Duffie, D., & Garman, M. B. (1986). Intertemporal Arbitrage and the Markov Valuation of Securities. Citeseer.
Liggett, T. M. (2010). Continuous Time Markov Processes: An Introduction (Vol. 113). American Mathematical Society.
Lasota, A., & Mackey, M. C. (1994). Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics (2nd ed., Vol. 97). Springer Science & Business Media.
Norris, J. R. (1998). Markov Chains. Cambridge University Press.
Hernández-Lerma, O., & Lasserre, J. B. (2012). Further Topics on Discrete-Time Markov Control Processes (Vol. 42). Springer Science & Business Media.
Nuño, G., & Moll, B. (2018). Social optima in economies with heterogeneous agents. Review of Economic Dynamics, 28, 150–180.
Kaplan, G., Moll, B., & Violante, G. L. (2018). Monetary policy according to HANK. American Economic Review, 108(3), 697–743.
Achdou, Y., Han, J., Lasry, J.-M., Lions, P.-L., & Moll, B. (2022). Income and wealth distribution in macroeconomics: A continuous-time approach. The Review of Economic Studies, 89(1), 45–86.
Fernández-Villaverde, J., Hurtado, S., & Nuño, G. (2023). Financial frictions and the wealth distribution. Econometrica, 91(3), 869–901.
Duarte, V. (2018). Machine learning for continuous-time economics [Techreport]. SSRN, 3012602.
Ráfales, J., & Vázquez, C. (2021). Equilibrium models with heterogeneous agents under rational expectations and its numerical solution. Communications in Nonlinear Science and Numerical Simulation, 96, 105673.
