Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

10 Continuous-Time

Authors
Affiliations
New York University
Australian National University

Earlier chapters treated dynamics in discrete time. Now we switch to continuous time. We restrict ourselves to finite state spaces, where continuous-time processes are pure jump processes. This allows us to provide a rigorous and self-contained treatment, while laying foundations for a treatment of general state problems.

10.1Continuous-Time Markov Chains

In this section, we introduce continuous-time Markov models. In Section 10.2, we will use them as components of continuous-time Markov decision processes.

10.1.1Background

In Section 3.1.1 we learned that if (Xt)=(X0,X1,)(X_t) = (X_0, X_1, \ldots) is PP-Markov, then the distributions (ψt)(\psi_t) of the state process obey ψt+1=ψtP\psi_{t+1} = \psi_t P for all tt. This update rule is a linear difference equation in distribution space, which in turn suggests that, once we switch to continuous-time, distributions will evolve according to linear differential equations in distribution space.

This idea turns out to be correct. As such, we begin this chapter with some facts about linear differential equations.

10.1.1.1Scalar Exponentials

Solutions to linear differential equations involve exponential functions. The real-valued exponential function can be defined by the power series

ex:=:exp(x)k0xkk!(xR).\me^x :=: \exp(x) \coloneq \sum_{k \geq 0} \frac{x^k}{k!} \qquad (x \in \RR).

The continuous-time system in Example 10.1.1 is closely related to the discrete time difference equation ut+1=erutu_{t+1} = \me^{r} u_t. Indeed, if we start at u0u_0, then the tt-th iterate is ertu0\me^{r t} u_0, so solutions agree at integer times. We can think of the continuous-time system as one that interpolates between points in time of a corresponding discrete time system.

The exponential eλ\me^\lambda of λ=a+ibC\lambda = a + i b \in \CC can also be defined via (10.1). From the identity eib=cos(b)+isin(b)\me^{ib} = \cos(b) + i \sin(b), we obtain

eλ=ea+ib=ea(cos(b)+isin(b)).\me^{\lambda} = \me^{a + ib} = \me^{a}(\cos(b) + i \sin(b)).

This equation will soon prove useful.

10.1.1.2The Exponential Distribution

A random variable WW is said to be exponentially distributed with rate θ\theta, and we write W=dExp(θ)W \eqdist \Exp(\theta), when the counter CDF GG satisfies

G(t)P{W>t}=eθt(t0).G(t) \coloneq \PP\{W > t\} = \me^{- \theta t} \qquad (t \geq 0).

Continuous-time Markov chains have a close relationship with the exponential distribution, a fact that stems from its being the only distribution having the memoryless property

P{W>s+tW>s}=P{W>t}for all s,t>0.\PP \{W > s + t \given W > s \} = \PP \{W > t\} \quad \text{for all } s, t > 0.

The memoryless property is special. For example, the probability that an individual human being lives 70 years from birth is not equal to the probability that he or she lives another 70 years conditional on having reached age 70. In fact, the exponential distribution is the only memoryless distribution supported on the nonnegative reals:

10.1.1.3Extension to Matrices

The real exponential formula (10.1) extends to the matrix exponential via

eAI+A+A22!+=k0Akk!,\me^A \coloneq I + A + \frac{A^2}{2!} + \cdots = \sum_{k \geq 0} \frac{A^k}{k!},

where AA is any square matrix. As we will see, the matrix exponential plays a key role in the solution of vector-valued linear differential equations.

In Lemma 10.1.2 and what follows, integration or differentiation of a vector- or matrix-valued function is carried out element by element. For example, to differentiate a matrix B(t)=(bij(t))B(t) = (b_{ij}(t)) that depends on tt, we form a new matrix by differentiating each element bij(t)b_{ij}(t) with respect to tt. The integral abB(t) ⁣dt\int_a^b B(t) \diff t is the matrix of integrals abbij(t) ⁣dt\int_a^b b_{ij}(t) \diff t.

The proof of part (ii) of Lemma 10.1.2 uses the definition of the exponential and the binomial formula. See, for example, Hirsh & Smale (1974). Part (iii) follows directly from part (ii). Part (iv) follows easily from part (i) when AA is diagonalizable (and can be proved more generally via the Jordan canonical form).

As for (vii), we are drawing an analogy with the fundamental theorem of calculus for scalar-valued functions, which states that f(t)f(s)=stf(τ) ⁣dτf(t) - f(s) = \int_s^t f'(\tau) \diff \tau for all sts \leq t, where ff' is the derivative of ff.

10.1.2Continuous-Time Flows

Next, we study solutions of multivariate differential equations, with a focus on linear systems. These results lay foundations for our study of continuous-time Markov dynamics in Section 10.1.3.

10.1.2.1Continuous-Time Dynamical Systems

Recall from Section 2.1.1 that a discrete dynamical system is a pair (U,S)(U, S), where UU is a set and SS is a self-map on UU. Trajectories are sequences (Stu)t0=(u,Su,S2u,)(S^t u)_{t \geq 0} = (u, Su, S^2 u, \ldots), where uUu \in U is the initial condition. These ideas can be extended to continuous-time by considering a pair (U,(St)t0)(U, (S_t)_{t \geq 0}) where UU is any set and StS_t is a self-map on UU for each tR+t \in \RR_+. The interpretation is that if uUu \in U is the current state of the system, then StuS_t u will be the state after tt units of time.

In general, to understand (U,(St)t0)(U, (S_t)_{t \geq 0}) as a continuous-time dynamical system, we require that (a) S0S_0 is the identity map, so that the state after zero units of time is just the initial condition, and (b) if we start at uu, move forward to usSsuu_s \coloneq S_s u, and then move again to StusS_t u_s after another tt units of time, the outcome should be the same as moving from uu to Ss+tuS_{s \,+\, t} \, u directly. That is,

Ss+t=StSsfor all t,s0.S_{s \,+\, t} = S_t \circ S_s \quad \text{for all } t, s \geq 0.

This is the semigroup property.

One way that continuous-time dynamical systems arise is via initial value problems. An initial value problem (IVP) in Rn\RR^n consists of a differential equation u˙t=f(ut)\dot u_t = f(u_t) paired with an initial condition u0Rnu_0 \in \RR^n, where utRnu_t \in \RR^n and f ⁣:RnRnf \colon \RR^n \to \RR^n. Under suitable conditions on ff, the solution utF(t,u0)u_t \coloneq F(t, u_0) is uniquely defined for all t0t \geq 0, and, moreover,

F(0,u0)=u0andF(s+t,u0)=F(t,F(s,u0))for all s,t0F(0, u_0) = u_0 \quad \text{and} \quad F(s + t, u_0) = F(t, F(s, u_0)) \quad \text{for all } s, t \geq 0

(see, e.g., Hirsh & Smale (1974), Section 8.7). Hence (St)t0(S_t)_{t \geq 0} defined by Stu=F(t,u)S_t u = F(t, u) satisfies the semigroup property and (Rn,(St)t0)(\RR^n, (S_t)_{t \geq 0}) is a continuous-time dynamical system. The function ff is called the vector field of (Rn,(St)t0)(\RR^n, (S_t)_{t \geq 0}).

10.1.2.2Linear Initial Value Problems

Given our interest in continuous-time Markov chains and their connection to linear systems (see the comments at the start of Section 10.1.1), we focus primarily on linear differential equations. The next result discusses linear IVPs, illustrating the key role of the matrix exponential. In the statement, AA is n×nn \times n and both u˙t\dot u_t and utu_t are column vectors in Rn\RR^n.

(Here u˙t ⁣dut/ ⁣dt\dot u_t \coloneq \diff u_t / \diff t is defined by differentiating the vector utu_t element-by-element, as discussed after Lemma 10.1.2.)

Proposition 10.1.3 motivates us to study flows of the form

tut,ut=etAu0(t0),t \mapsto u_t, \quad u_t = \me^{t A} u_0 \qquad (t \geq 0),

where AA is n×nn \times n, u0u_0 is a vector in Rn\RR^n indicating the initial condition, and utu_t is the “state” of the system at time tt. Figure Figure 10.1 shows an example when

A=(2.00.401.41.02.20.02.00.6).A = \begin{pmatrix} -2.0 & -0.4 & 0 \\ -1.4 & -1.0 & 2.2 \\ 0.0 & -2.0 & -0.6 \end{pmatrix}.
Exponential flow t \mapsto \me^{tA}u_0 starting from u_0 \in \RR^3

Figure 10.1:Exponential flow tetAu0t \mapsto \me^{tA}u_0 starting from u0R3u_0 \in \RR^3

10.1.2.3Stability in the Diagonalizable Case

For an exponential flow such as (10.24), a key question is whether or not ut0u_t \to 0 as tt \to \infty. (This will matter when we try to evaluate lifetime rewards over an infinite horizon in continuous time.) Rather than analyze these issues at every possible u0u_0, we directly consider the matrix-valued flow tetAt \mapsto \me^{ t A} and study whether etA0\me^{tA} \to 0.

The case where AA is diagonalizable provides a good starting point. Suppose A=P1DPA = P^{-1} D P with D=diagj(λj)D = \diag_j (\lambda_j) containing the eigenvalues of AA. Then, by Lemma 10.1.2, for any t0t \geq 0, we have

etA=etP1DP=P1etDP.\me^{t A} = \me^{t P^{-1} D P} = P^{-1} \me^{t D } P.

Exercise 10.8 and equation (10.26) tell us that the long run dynamics of etA\me^{tA} are determined by the scalar flows tetλjt \mapsto \me^{t \lambda_j}. How does etλ\me^{t \lambda} evolve over time when λC\lambda \in \CC?

To answer this question we write λ=a+ib\lambda = a + ib and apply (10.4) to obtain

etλ=eta(cos(tb)+isin(tb)).\me^{t\lambda} = \me^{ta}(\cos(tb) + i \sin(tb)).

This equation tells us that

etλ0 as tif and only ifReλ<0,\me^{t\lambda} \to 0 \text{ as } t \to \infty \quad \text{if and only if} \quad \real \lambda < 0,

where Reλ\real \lambda is the real part of λ\lambda (i.e., if λ=a+ib\lambda = a + ib, then Reλ=a\real \lambda = a).

From this analysis, we conclude that, when AA is diagonalizable, we have etA0\me^{tA} \to 0 if and only if Reλj<0\real \lambda_j < 0 for all λjσ(A)\lambda_j \in \sigma(A), where σ(A)\sigma(A) denotes the set of all eigenvalues (the spectrum) of AA. Another way to put this is that etA0\me^{tA} \to 0 if and only if s(A)<0s(A) < 0, where

s(A)maxλσ(A)Reλ,s(A) \coloneq \max_{\lambda \in \sigma(A)} \real \lambda,

is the spectral bound of AA.

As the preceding analysis suggests, the spectral bound plays a key role in the asymptotics of exponential flows, just as a spectral radius governs asymptotics of trajectories of linear maps (see, e.g., Exercise 1.14). Section Section 10.1.2.4 expands on this analysis, while dropping the assumption that AA is diagonalizable.

10.1.2.4The General Case

Let AA be any square matrix. In the following statement about a spectral bound, \| \cdot \| is the operator norm defined in Section 1.2.1.4.

(The second equality in (10.30) also holds when the limit is taken over tR+t \in \RR_+. See, for example, Engel & Nagel (2006).)

The next theorem is a key stability result for exponential flows. Among other things, it extends to arbitrary AA the finding that s(A)<0s(A) < 0 is necessary and sufficient for stability.

A full proof of Theorem 10.1.5 in a general setting can be found in §V.II of Engel & Nagel (2006).

Theorem 10.1.5 tells us that the flow tetAu0t \mapsto \me^{tA} u_0 converges to the origin at an exponential rate if and only if s(A)<0s(A)<0. The equivalence of (i) and (ii) was proved for the diagonalizable case in Section 10.1.2.3. It can be viewed as the continuous-time analog of Bk0\|B^k\| \to 0 if and only if ρ(B)<1\rho(B) < 1 (see Exercise 1.14).

10.1.2.5Semigroup Terminology

Advanced treatments of continuous-time systems often begin with semigroups. Let’s briefly describe these and connect them to things we have studied earlier. (If you prefer to skip this section on first reading, you can move to the next one after noting that, given an n×nn \times n matrix AA, the family (St)t0=(etA)t0(S_t)_{t \geq 0} = (\me^{t A})_{t \geq 0} is called an exponential semigroup and that AA is called the infinitesimal generator of the semigroup.)

Let X\Xsf be a finite set and let (St)t0(S_t)_{t \geq 0} be a subset of L(RX)\lopx indexed by tR+t \in \RR_{+}. The family (St)t0(S_t)_{t \geq 0} is called a strongly continuous semigroup or C0C_0-semigroup on RX\RR^\Xsf if

  1. S0=IS_0 = I, where II is the identity,

  2. St+t=StStS_{t + \, t'} = S_t \circ S_{t'}, and

  3. tStut \mapsto S_t u is a continuous map from R+\RR_+ to RX\RR^\Xsf for every uRXu \in \RR^\Xsf.

In essence, a C0C_0-semigroup on RX\RR^\Xsf is a continuous-time dynamical system (RX,(St)t0)(\RR^\Xsf, (S_t)_{\, t \geq 0}) where each StS_t maps an initial state into a time tt state.

The semigroup perspective is important because it extends naturally to settings where X\Xsf is not finite, in which case we replace the finite-dimensional set RX\RR^\Xsf with some (typically infinite-dimensional) class of functions GRX\gG \subset \RR^\Xsf, and each StS_t becomes a linear operator mapping G\gG into itself. At this level of generality, StuS_t u can be the solution to a partial differential equation, or a stochastic differential equation (see, e.g., Engel & Nagel (2006) or Applebaum (2019)). Operator semigroup theory offers an elegant and powerful framework for handling such systems.

For semigroups in general settings we often have no analytical expressions for StS_t. This situation is like the one we encountered in the continuous-time system in Section 10.1.2.1, where u˙t=f(ut)\dot u_t = f(u_t) and ff is potentially nonlinear. When no analytical solution utu_t exists, analyzing the dynamics requires us to try to infer its properties from the vector field ff, so that ff becomes the primary focus of analysis.

A natural question, then, is, given a semigroup (St)t0(S_t)_{\, t \geq 0} on L(RX)\lopx, does there always exist a “vector field” type object that “generates” (St)t0(S_t)_{\, t \geq 0}? When X\Xsf is finite, the answer is affirmative. This object, to be denoted by AA, is called the infinitesimal generator of the semigroup and is defined by

A=limt0StS0t=limt0StItA = \lim_{t \, \downarrow \, 0} \frac{S_t - S_0}{t} = \lim_{t \, \downarrow \, 0} \frac{S_t - I}{t}

At uUu \in U, the vector AuA u indicates the instantaneous change in the state.

More precisely, when X\Xsf is finite, we have:

Semigroups of the form described in Proposition 10.1.6 are called exponential semigroups (or “uniformly continuous” semigroups).

A full proof of Proposition 10.1.6 can be found in the discussion of Theorem 2.12 of Engel & Nagel (2006). The results are not surprising, since the main claim is that, in finite dimensions, solutions to linear differential equations have exponential form. The fact that AA is the infinitesimal generator of the semigroup (St)t0=(etA)t0(S_t)_{\, t \geq 0} = (\me^{tA})_{\, t \geq 0} follows from Lemma 10.1.2, which gives

limt0StS0t=limt0etAe0t= ⁣d ⁣dtetA  t=0=Ae0A=A.\lim_{t \, \downarrow \, 0} \frac{S_t - S_0}{t} = \lim_{t \, \downarrow \, 0} \frac{\me^{t A} - \me^{0}}{t} = \frac{\diff}{\diff t} \me^{t A} \; \Bigr\rvert_{\, t = 0} = A \me^{0 A} = A.

The preceding discussion places our analysis in a wider context. To practice our new terminology, we can restate (i)     \iff (ii) from Theorem 10.1.5 by saying that the exponential semigroup (St)t0=(etA)t0(S_t)_{\, t \geq 0} = (\me^{tA})_{\, t \geq 0} converges to zero if and only if the spectral bound of its infinitesimal generator is negative.

10.1.3Markov Semigroups

Having studied multivariate linear dynamics, we are now ready to concentrate on the Markov case, where dynamics evolve in distribution space. For the most part we now switch to operator-theoretic notation, where X\Xsf is a finite set with nn elements, and an n×nn \times n matrix is identified with a linear operator on L(RX)\lopx. As emphasized in Section 2.3.3.1, this is merely a change in terminology, and all preceding results for matrices extend directly to linear operators.

10.1.3.1Intensity Matrices

If (Xt)t0(X_t)_{\, t \geq 0} is PP-Markov on X\Xsf for some PM(RX)P \in \mopx, then the marginal distributions of (Xt)t0(X_t)_{t \geq 0} evolve according to the linear difference system ψt+1=ψtP\psi_{t+1} = \psi_t P (see Section 3.1.1). We now seek a continuous-time analog in the form of a linear differential equation.

To this end we call QL(RX)Q \in \lopx an intensity operator or intensity matrix[1] when

Q(x,x)0 whenever xxandxQ(x,x)=0 for all xX.Q(x, x') \geq 0 \text{ whenever } x \not= x' \quad \text{and} \quad \sum_{x'} Q(x, x') = 0 \text{ for all } x \in \Xsf.

Let

I(RX)= the set of all intensity operators in L(RX).\iopx = \text{ the set of all intensity operators in } \lopx.

Consider the IVP

ψ˙t(x)=xQ(x,x)ψt(x)(t0,  xX),\dot \psi_t(x') = \sum_{x} Q(x, x') \psi_t(x) \qquad (t \geq 0, \; x' \in \Xsf),

which we can also write as

ψ˙t=ψtQ,ψ0D(X) given.\dot \psi_t = \psi_t \, Q, \qquad \psi_0 \in \dD(\Xsf) \text{ given}.

when ψt\psi_t and ψ˙t\dot \psi_t are understood to be row vectors. We say that D(X)\dD(\Xsf) is invariant for the IVP (10.40) if the solution (ψt)t0(\psi_t)_{t \geq 0} remains in D(X)\dD(\Xsf) for all t0t \geq 0.

In view of Proposition 10.1.3, we can rephrase this by stating that D(X)\dD(\Xsf) is invariant for (10.40) whenever

ψ0D(X)    ψ0etQD(X) for all t0.\psi_0 \in \dD(\Xsf) \quad \implies \quad \psi_0 \, \me^{t Q} \in \dD(\Xsf) \text{ for all } t \geq 0.

Our key result for this section shows the central role of intensity matrices:

Proposition 10.1.7 tells us that the set I(RX)\iopx coincides with the set of continuous-time (and time-homogeneous) Markov models on X\Xsf. Any specification outside this class fails to generate flows in distribution space. The proof is completed in several steps.

For Exercise 10.13--Exercise 10.15, QI(RX)Q \in \iopx and PtetQP_t \coloneq \me^{tQ}.

For the proof of Proposition 10.1.7, we have now shown that (i) implies (ii). Evidently (ii) implies (iii), because if ψ0D\psi_0 \in \dD and ψt=ψ0Pt\psi_t = \psi_0 P_t where PtP_t is stochastic, then ψtD(X)\psi_t \in \dD(\Xsf). Hence it remains only to show that (iii) implies (i).

Returning to Proposition 10.1.7, the last two exercises confirm that (iii) implies (i). The proof is now complete.

10.1.3.2Interpretation

Section Section 10.1.3.1 covered the mathematical relationship between intensity matrices and Markov operators. Let’s now discuss the connection more informally, in order to build intuition.

To this end, let (Xt)t0(X_t)_{t \geq 0} be PhP_h-Markov in discrete time. Here h>0h > 0 is the length of the time step. We write the corresponding distribution sequence ψt+h=ψtPh\psi_{t+h} = \psi_t P_h in terms of change per unit of time, as in

ψt+hψth=ψtPhIhwhereI is the n×n identity.\frac{\psi_{t+h} - \psi_t}{h} = \psi_t \frac{P_h - I}{h} \quad \text{where} \quad I \text{ is the } n \times n \text{ identity}.

Continuous-time dynamics are obtained by taking the limit as h0h \, \downarrow \, 0. If we define

Qlimh0PhIh,Q \coloneq \lim_{h \, \downarrow \, 0} \frac{P_h - I}{h},

and assume that limits exist, then (10.47) becomes (10.40).

What properties does QQ have? Inspecting (10.48) implies

Q(x,x)Ph(x,x)1{x=x}hQ(x, x') \approx \frac{P_h(x, x') - \1\{x = x'\}}{h}

when hh is small and positive.

Equation (10.50) tells us that Q(x,x)Q(x, x') represents the instantaneous rate of flow out of state xx and into state xx'. The on-diagonal value Ph(x,x)P_h(x,x) just balances the off-diagonal probabilities.

10.1.3.3Markov Semigroups

Fix QI(RX)Q \in \iopx. In the terminology of Section 10.1.2.5, the family of operators (Pt)t0=(etQ)t0M(RX)(P_t)_{t \geq 0} = (\me^{tQ})_{t \geq 0} \subset \mopx that solves ψ˙t=ψtQ\dot \psi_t = \psi_t Q (see (10.41)) is an exponential semigroup. Since each PtP_t is in M(RX)\mopx, it is also called the Markov semigroup generated by QQ. It satisfies the semigroup property Ps+t=PsPtP_{s \, + \, t} = P_s \, P_t for all s,t0s, t \geq 0, which can be written more explicitly as

Ps+t(x,x)=zPs(x,z)Pt(z,x)(s,t0,  x,xX).P_{s+t} (x, x') = \sum_z P_s(x, z) P_t(z, x') \qquad (s, t \geq 0, \; x, x' \in \Xsf).

In the present setting, (10.52) is called the (continuous-time) Chapman--Kolmogorov equation. It states that the probability of moving from xx to xx' over s+ts+t units of time equals the probability of moving from xx to zz over ss units of time, and then zz to xx' over tt units of time, summed over all zz.

Again following the terminology in Section 10.1.2.5, the intensity matrix QQ that defines (Pt)t0=(etQ)t0(P_t)_{t \geq 0} = (\me^{tQ})_{t \geq 0} is also called the infinitesimal generator of (Pt)t0(P_t)_{t \geq 0}.

From Lemma 10.1.2, the derivative of etQ\me^{tQ} is QetQ=etQQQ\me^{tQ} = \me^{tQ}Q. We can write this as

We can work in the other direction as well: If we can establish that a function tPtt \mapsto P_t from R+\RR_+ to L(RX)\lopx satisfies either one of these equations, then (Pt)t0(P_t)_{t \geq 0} is a Markov semigroup with infinitesimal generator QQ. The next proposition gives details.

Proposition 10.1.8 is a version of our result for linear IVPs in Proposition 10.1.3, except that the IVP is now defined in operator space, rather than vector space.

10.1.4Continuous-Time Markov Chains

We have discussed the connection between intensity matrices, Markov semigroups, and distribution flows. Let’s now connect these objects to continuous-time Markov chains. In this section, we will (a) provide a formal definition of a continuous-time Markov chain associated with a given initial condition ψ\psi and intensity matrix QQ, and (b) show how to construct such a chain algorithmically. We’ll accomplish (b) in two steps: first showing how to construct a jump chain from certain primitives (Section 10.1.4.2--Section 10.1.4.3) and then showing how to construct those primitives from a given initial condition ψ\psi and intensity matrix QQ (Section 10.1.4.4).

10.1.4.1Definition

Let C(R+,X)C(\RR_+, \Xsf) be the set of right-continuous functions from R+\RR_+ to X\Xsf and let (Pt)t0(P_t)_{t \geq 0} be a Markov semigroup generated by some QI(RX)Q \in \iopx. A continuous-time Markov chain generated by (Pt)t0(P_t)_{t \geq 0} is a random function (Xt)t0(X_t)_{t \geq 0} that takes values in C(R+,X)C(\RR_+, \Xsf) and satisfies

P{Xs+t=xFs}=Pt(Xs,x)for all s,t0 and xX,\PP \{ X_{s \,+\, t} = x' \given \fF_s \} = P_t(X_s, x') \qquad \text{for all } s, t \geq 0 \text{ and } x' \in \Xsf,

where Fs(Xτ)0τs\fF_s \coloneq (X_\tau)_{0 \leq \tau \leq s} is the history of the process up to time ss. To update from time ss to time tt given this history, we simply take the last value XsX_s and update using PtP_t. Conditioning on Xs=xX_s = x, we get

Pt(x,x)=P{Xs+t=xXs=x}(s,t0,  x,xX).P_t(x, x') = \PP \{ X_{s \, + \, t} = x' \given X_s = x\} \qquad (s, t \geq 0, \; x, x' \in \Xsf).

Mirroring terminology for discrete chains from Section 3.1.1.1, we will call a continuous-time Markov chain (Xt)t0(X_t)_{t \geq 0} QQ-Markov when (10.53) holds and QQ is the infinitesimal generator of (Pt)t0(P_t)_{t \geq 0}.

In what follows, Px\PP_x and Ex\EE_x denote probabilities and expectations conditional on X0=xX_0 = x. Given hRXh \in \RR^\Xsf, we have

Exh(Xt)=xPt(x,x)h(x)=:(Pth)(x).\EE_x \, h(X_t) = \sum_{x'} P_t(x, x')h(x') =: (P_t h)(x) .

This expression mirrors the discrete time case discussed in Section 3.2.1.1.

10.1.4.2A Jump Chain Construction

In Section 10.1.4.1 we defined a continuous-time Markov chain. In this section, we describe a standard method for constructing one by using three components:

  1. an initial condition ψD(X)\psi \in \dD(\Xsf),

  2. a jump matrix ΠM(RX)\Pi \in \mopx, and

  3. a rate function λ\lambda mapping X\Xsf to (0,)(0, \infty).

The process (Xt)(X_t) starts at state xx, which is drawn from ψ\psi, waits there for an exponential time WW with rate λ(x)\lambda(x), and then updates to a new state xx' drawn from Π(x,)\Pi(x, \cdot). We take xx' as the new state for the process and repeat.

These ideas are restated in Algorithm 10.1. In the algorithm, (Wk)(W_k) and (Yk)(Y_k) are drawn independently. The process (Wk)(W_k) is called the sequence of holding times or wait times, the sums Jk=i=1kWiJ_k = \sum_{i=1}^k W_i are called the jump times and (Yk)(Y_k) is called the embedded jump chain. The jumps and the process (Xt)t0(X_t)_{t \geq 0} are illustrated in Figure Figure 10.2.

A jump chain sample path

Figure 10.2:A jump chain sample path

Let IL(RX)I \in \lopx be the identity matrix, so I(x,x)=1{x=x}I(x,x') = \1\{x = x'\}, and define QL(RX)Q \in \lopx via

Q(x,x)=λ(x)(Π(x,x)I(x,x))(x,xX)Q(x, x') = \lambda(x)(\Pi(x, x') - I(x,x')) \qquad (x, x' \in \Xsf)

It is easy to verify that QQ is an intensity matrix. In fact, QQ is the intensity matrix for the Markov semigroup associated with the process generated by Algorithm 10.1. For xxx \not= x', it tells us that probability flows from xx to xx' at rate λ(x)Π(x,x)\lambda(x) \Pi(x, x'), which is the rate of leaving xx times the rate of moving from xx to xx'. The next result formalizes these ideas.

To prove Proposition 10.1.9 we take (Xt)t0(X_t)_{t \geq 0} to be as in the statement of the proposition and define (Pt)t0(P_t)_{t \geq 0} by Pt(x,x)=Px{Xt=x}P_t(x, x') = \PP_x \{X_t = x'\} for all x,xXx,x' \in \Xsf. The proof uses the following steps:

  1. Obtain an integral equation that (Pt)t0(P_t)_{t \geq 0} must satisfy.

  2. Differentiate to obtain the Kolmogorov backward equation P˙t=QPt\dot P_t = QP_t.

  3. Solve this differential equation to obtain Pt=etQP_t = \me^{t Q} for all tt.

Here is the first step. In the statement, ΠPtτ\Pi P_{t-\tau} is the matrix product of Π\Pi and PtτP_{t-\tau}, while the equation in (10.57) is sometimes called the integrated Kolmogorov backward equation.

Differentiating the integrated Kolmogorov backward equation produces the Kolmogorov backward equation:

10.1.4.3Application: Inventory Dynamics

Let XtX_t be a firm’s inventory at time tt. When current stock is x>0x > 0, customers arrive at rate λ(x)\lambda(x), so the wait time for the next customer is an independent draw from the Exp(λ(x))\Exp (\lambda(x)) distribution; λ\lambda maps X\Xsf to (0,)(0, \infty).

The kk-th customer demands UkU_k units, where each UkU_k is an independent draw from a fixed distribution ϕ\phi on N\NN. Purchases are constrained by inventory, however, so inventory falls by UkXtU_k \wedge X_t. When inventory hits zero the firm orders bb units of new stock. The wait time for new stock is also exponential, being an independent draw from Exp(λ(0))\Exp (\lambda(0)).

Let YY represent the inventory size after the next jump (induced by either a customer purchase or ordering new stock), given current stock xx. If x>0x > 0, then YY is a draw from the distribution of xUxx - U \wedge x where UϕU \sim \phi. If x=0x=0, then YbY \equiv b. Hence YY is a draw from Π(x,)\Pi(x, \cdot), where Π(0,y)=1{y=b}\Pi(0, y) = \1\{y=b\} and, for 0<xb0 < x \leq b,

Π(x,y)={0 if xyP{xU=y} if 0<y<xP{Ux} if y=0\Pi(x, y) = \begin{cases} 0 & \text{ if } x \leq y \\ \PP\{x - U = y\} & \text{ if } 0 < y < x \\ \PP\{U \geq x\} & \text{ if } y = 0 \end{cases}

We can simulate the inventory process (Xt)t0(X_t)_{t \geq 0} via the jump chain algorithm. In this case, the wait time sequence (Wk)(W_k) is the wait time for customers (and for inventory when Xt=0X_t=0) and the jump sequence (Yk)(Y_k) is the level of inventory immediately after each jump. By Proposition 10.1.9, the inventory process is QQ-Markov with QQ given by Q(x,x)=λ(x)(Π(x,x)I(x,x))Q(x, x') = \lambda(x)(\Pi(x, x') - I(x,x')).

Figure Figure 10.3 shows a simulation when orders are geometric, so that

ϕ(k)=P{U=k}=(1α)k1α(kN,  α(0,1)).\phi(k) = \PP\{U = k\} = (1-\alpha)^{k-1} \alpha \qquad (k \in \NN, \; \alpha \in (0, 1)).

In the simulation we set α=0.7\alpha=0.7, b=10b=10 and λ(x)0.5\lambda(x) \equiv 0.5. The figure plots XtX_t for t[0,50]t \in [0, 50]. Since each wait time WiW_i is a draw from Exp(0.5)\Exp(0.5) the mean wait time is 2.0. The function that produces the map tXtt \mapsto X_t is shown in Listing 1.

Continuous-time inventory dynamics

Figure 10.3:Continuous-time inventory dynamics

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
using Random, Distributions

"""
Generate a path for inventory starting at b, up to time T.

Return the path as a function X(t) constructed from (J_k) and (Y_k).
"""
function sim_path(; T=10, seed=123, λ=0.5, α=0.7, b=10)

    J, Y = 0.0, b
    J_vals, Y_vals = [J], [Y]
    Random.seed!(seed)
    φ = Exponential(1/λ)     # Wait times are exponential
    G = Geometric(α)         # Orders are geometric

    while true
        W = rand(φ)  
        J += W
        push!(J_vals, J)
        if Y == 0
            Y = b
        else
            U = rand(G) + 1   # Geometric on 1, 2,...
            Y = Y - min(Y, U)
        end
        push!(Y_vals, Y)
        if J > T
            break
        end
    end
    
    function X(t)
        k = searchsortedlast(J_vals, t)
        return Y_vals[k+1]
    end

    return X
end

Program 1:Continuous-time inventory dynamics (inventory_cont_time.jl)

10.1.4.4From Intensity Matrices to Jump Chains

If QL(RX)Q \in \lopx is a given intensity matrix, how should we produce a continuous-time QQ-Markov chain? If we can construct a jump chain that is QQ-Markov, then not only do we obtain existence of a QQ-Markov chain but we also provide a way to simulate one (via Algorithm 10.1).

To construct such a jump chain we first fix an intensity matrix QL(RX)Q \in \lopx and, to simply matters, assume that all rows of QQ are nonzero. This means that the process has no absorbing states (since nonzero rows are equivalent to Q(x,x)<0Q(x,x) < 0 for all xx, which in turn states that there is a nonzero outflow from each state).

Then we set

λ(x)Q(x,x)andΠ(x,x)I(x,x)+Q(x,x)λ(x).\lambda(x) \coloneq -Q(x,x) \quad \text{and} \quad \Pi(x,x') \coloneq I(x,x') + \frac{Q(x,x')}{\lambda(x)}.

It is straightforward to confirm that ΠM(RX)\Pi \in \mopx and that QQ satisfies (10.56). Hence, by Proposition 10.1.9, the process (Xt)t0(X_t)_{t \geq 0} generated by Algorithm 10.1 is QQ-Markov.

10.2Continuous-Time Markov Decision Processes

We are ready to turn to dynamic programming in continuous-time. As for the discrete time case, continuous-time dynamic programs aim to maximize a measure of lifetime value. In Section 10.2.1 we study lifetime valuations. In Section 10.2.2 we learn how to maximize them.

10.2.1Valuation

In this section, we consider lifetime valuations associated with continuous reward flows, starting from a general semigroup perspective and then progressing to specific cases (such as expected lifetime value under constant discounting). Throughout, X\Xsf is a finite set.

10.2.1.1A Semigroup Perspective

For the discrete time problems with state-dependent discounting that we studied in Chapter 6, lifetime valuations take the form v=t0Kthv = \sum_{t \geq 0} K^t h for some hRXh \in \RR^\Xsf and a positive linear operator KK on RX\RR^\Xsf. (See Theorem 6.1.1 and (6.31).) For a continuous-time version we fix hRXh \in \RR^\Xsf, take (Kt)t0(K_t)_{t \geq 0} to be a positive exponential semigroup in L(RX)\lopx, where positive means Kt0K_t \geq 0 for all tt, and set

v=0Kth ⁣dt.v = \int_0^\infty K_t h \diff t.

Let AL(RX)A \in \lopx be the infinitesimal generator of (Kt)t0(K_t)_{\, t \geq 0}. The next result provides a condition for finiteness of vv and several characterizations.

A way to understand (10.69) is to view the valuation vv as a price that reflects prospective benefits from holding an asset. The asset yields a flow of benefits, where h(x)h(x) is the instantaneous reward in state xx. Rewards tt periods in the future are discounted by the pricing operator KtK_t. Thus, (Kth)(x)(K_t h)(x) is the anticipated payoff tt periods ahead, discounted for the wait time and possibly also for risk as in (6.54). The value v(x)v(x) is then lifetime value, which equals the current price.

In this asset valuation setting, (10.69) is a natural consistency condition. It says that the price of purchasing the asset today is equal to the payouts obtained from holding the asset from now until time tt and then selling it for current discounted value KtvK_t v. (This is the continuous-time analog of (6.57).)

The preceding discussion matches the semigroup perspective on asset pricing introduced in Garman (1985) and Duffie & Garman (1986). In addition to shedding light on (10.69), it also leads to the assertion that v=A1hv = - A^{-1} h in (ii), which is obtained by differentiating (10.69). Details are in the proof:

10.2.1.2Valuations as Expectations

In applications, the expression v=0Kth ⁣dtv = \int_0^\infty K_t \, h \diff t from (10.68) typically arises as a discounted expectation over a flow of rewards. When analyzing vv we wish to deploy Proposition 10.2.1, so we need to check that any expectation we propose results in (Kt)(K_t) being a semigroup. The next proposition provides one result along these lines.

In the proof of Proposition 10.2.2, we will use the fact that (Xt)t0(X_t)_{t \geq 0} satisfies the Markov property. In particular, if HH is a real-valued function on the path space C(R+,X)C(\RR_+, \Xsf), then

E[H((Xτ)τs)(Xτ)τ=0s]=EXsH((Xτ)τ0).\EE \left[ H( (X_\tau)_{\tau \geq s} ) \,|\, (X_\tau)_{\tau=0}^s \right] = \EE_{X_s} H( (X_\tau)_{\tau \geq 0} ).

For a proof of (10.76), see, for example, Chapter 2 of Liggett (2010).

10.2.1.3Constant Discounting

Many studies of continuous-time dynamic programming with discounting use a constant discount rate. In this setting, the lifetime value in (10.68) becomes

v(x)Ex0etδh(Xt) ⁣dtv(x) \coloneq \EE_x \int_0^\infty \me^{-t \delta} h(X_t) \diff t

for some δR\delta \in \RR and hRXh \in \RR^\Xsf. Here (Xt)t0(X_t)_{t \geq 0} is a continuous-time Markov chain on finite state X\Xsf generated by Markov semigroup (Pt)t0(P_t)_{t \geq 0} with intensity operator QQ. The idea is that h(Xt)h(X_t) is an instantaneous reward at each time tt, while δ\delta is a fixed discount rate.

Equation (10.82) is the continuous-time version of (3.33).

10.2.2Constructing a Decision Process

In this section, we define continuous-time Markov decision processes, discuss optimality theory, and provide algorithms and applications.

10.2.2.1Definition

Given two finite sets A\Asf and X\Xsf, called the state and action spaces respectively, we define a continuous-time Markov decision process (or continuous-time MDP) to be a tuple C=(Γ,δ,r,Q)\cC = (\Gamma, \delta, r, Q) consisting of

  1. a nonempty correspondence Γ\Gamma from X\Xsf to A\Asf, referred to as the feasible correspondence, which in turn defines the feasible state-action pairs

G{(x,a)X×A:aΓ(x)},\Gsf \coloneq \setntn{(x, a) \in \Xsf \times \Asf}{a \in \Gamma(x)},
  1. a constant δ>0\delta > 0, referred to as the discount rate,

  2. a function rr from G\Gsf to R\RR, referred to as the reward function, and

  3. an intensity kernel QQ from G\Gsf to X\Xsf; that is, a map QQ from G×X\Gsf \times \Xsf to R\RR satisfying

xQ(x,a,x)=0 for all (x,a) in G\sum_{x'} Q(x, a, x') = 0 \quad \text{ for all } (x,a) \text{ in } \Gsf

and Q(x,a,x)0Q(x, a, x') \geq 0 whenever xxx \not= x'.

Informally, at state xx with action aa over the short interval from tt to t+ht+h, the controller receives instantaneous reward r(x,a)hr(x,a)h and the state transitions to state xx' with probability Q(x,a,x)h+o(h)Q(x, a, x') h + o(h).

Paralleling our discussion of the discrete time case in Chapter 5, the set of feasible policies is

Σ{σAX:σ(x)Γ(x) for all xX}.\Sigma \coloneq \setntn{\sigma \in \Asf^\Xsf} {\sigma(x) \in \Gamma(x) \text{ for all } x \in \Xsf}.

10.2.2.2Lifetime Values

Choosing policy σ\sigma from Σ\Sigma means that we respond to state XtX_t with action Atσ(Xt)A_t \coloneq \sigma(X_t) at every t0t \geq 0. The state then evolves according to the intensity operator

Qσ(x,x)Q(x,σ(x),x)(x,xX).Q_\sigma(x, x') \coloneq Q(x, \sigma(x), x') \qquad (x, x' \in \Xsf).

Letting

PtσetQσandrσ(x)r(x,σ(x))(xX)P^\sigma_t \coloneq \me^{t Q_\sigma} \quad \text{and} \quad r_\sigma(x) \coloneq r(x, \sigma(x)) \qquad (x \in \Xsf)

the lifetime value of following σ\sigma starting from state xx is

vσ(x)Ex0eδtr(Xt,σ(Xt)) ⁣dt=Ex0eδtrσ(Xt) ⁣dt,v_\sigma (x) \coloneq \EE_x \int_0^\infty \me^{-\delta t} r(X_t, \sigma(X_t)) \diff t = \EE_x \int_0^\infty \me^{-\delta t} r_\sigma(X_t) \diff t,

where (Xt)t0(X_t)_{t \geq 0} is QσQ_\sigma-Markov with initial condition xx. We call vσv_\sigma the σ\sigma-value function.

Since δ>0\delta > 0, we can apply Proposition 10.2.3 to obtain

vσ=(δIQσ)1rσ.v_\sigma = (\delta I - Q_\sigma)^{-1} r_\sigma.

Representation (10.94) provides a straightforward method for computing vσv_\sigma.

10.2.2.3Greedy Policies

A policy σΣ\sigma \in \Sigma is called vv-greedy for C\cC if

σ(x)argmaxaΓ(x){r(x,a)+xv(x)Q(x,a,x)}for all xX.\sigma(x) \in \argmax_{a \in \Gamma(x)} \left\{r(x, a) + \sum_{x'} v(x') Q(x, a, x')\right\} \quad \text{for all } x \in \Xsf.

Like the discrete time case, a vv-greedy policy chooses actions optimally to trade off high current rewards versus high rate of flow into future states with high values. Unlike the discrete time case, the discount factor does not appear in (10.95) because the trade-off is instantaneous.

10.2.2.4Policy Iteration

We introduce a continuous-time policy iteration algorithm that parallels discrete time HPI for Markov decision processes, as described in Section 5.1.4.2.

The continuous-time HPI routine is given in Algorithm 10.2, with the intuition being similar to that for the discrete time MDP version given. We provide convergence results in Section 10.2.3.

10.2.2.5Policy Operators

For each σΣ\sigma \in \Sigma, let TσT_\sigma be the operator defined at vRXv \in \RR^\Xsf by

Tσv=rσ+(Qσ+(1δ)I)v.T_\sigma \, v = r_\sigma + (Q_\sigma + (1 - \delta) I) v.

As shown in Proposition 10.2.3, each TσT_\sigma is order stable on RX\RR^\Xsf, with unique fixed point vσv_\sigma. Hence A(RX,{Tσ})\aA \coloneq (\RR^\Xsf, \{T_\sigma\}) is an order stable ADP.

10.2.3Optimality

For a continuous-time MDP C=(Γ,δ,r,Q)\cC = (\Gamma, \delta, r, Q) with σ\sigma-value functions {vσ}\{v_\sigma\},

A function vRXv \in \RR^\Xsf is said to satisfy a Hamilton--Jacobi--Bellman (HJB) equation if

δv(x)=maxaΓ(x){r(x,a)+xv(x)Q(x,a,x)}for all xX.\delta v(x) = \max_{a \in \Gamma(x)} \left\{r(x, a) + \sum_{x'} v(x') Q(x, a, x')\right\} \quad \text{for all } x \in \Xsf.

We say that C\cC obeys Bellman’s principle of optimality if

σΣ is optimal for C    σ is v-greedy.\sigma \in \Sigma \text{ is optimal for } \cC \quad \iff \quad \sigma \text{ is } v^* \text{-greedy}.

Here is our main optimality result for continuous-time MDPs.

Here we study a continuous-time version of the job search problem with separation considered in Section 3.3.2. As before, a worker can be either unemployed (state 0) or employed (state 1). When the worker is employed, she can be fired at any time. Firing occurs at rate α>0\alpha > 0, meaning that the probability of being fired over the short interval from tt to t+ht+h is approximately αh\alpha h. When unemployed, the worker receives flow unemployment compensation cc and job offers at rate κ\kappa. She can choose either to accept or to reject an offer; she discounts the future at rate δ>0\delta > 0.

We assume that job offers are associated with wage offers that take values in finite set W\Wsf. Let PM(RW)P \in \mopw give probabilities for new wage draws, so that, conditional on previous draw ww, the next offer is drawn from P(w,)P(w, \cdot).

For the state space we set X={0,1}×W\Xsf = \{0, 1\} \times \Wsf, with typical state x=(s,w)x = (s, w). Here ss is binary and indicates current employment status, while ww is the current wage. Let

λ(x)=λ(s,w)=1{s=0}κ+1{s=1}α\lambda(x) = \lambda(s, w) = \1\{s = 0\} \kappa + \1\{s = 1\} \alpha

denote the state-dependent jump rate, which switches between κ\kappa and α\alpha depending on employment status.

Let aA{0,1}a \in \Asf \coloneq \{0,1\} indicate the action, where 0 means reject and 1 means accept. Let Π(x,a,x)\Pi(x, a, x') represent the jump probabilities, with

Π((0,w),a,(0,w))=P(w,w)(1a)(unemployed to unemployed)Π((0,w),a,(1,w))=P(w,w)a  (unemployed to employed)Π((1,w),a,(0,w))=P(w,w)(employed to unemployed)Π((1,w),a,(1,w))=0.    (employed to employed)\begin{aligned} \Pi((0, w), a, (0, w')) & = P(w, w') (1-a) \qquad \text{(unemployed to unemployed)} \\ \Pi((0, w), a, (1, w')) & = P(w, w') a \; \,\qquad \qquad \text{(unemployed to employed)} \\ \Pi((1, w), a, (0, w')) & = P(w, w') \quad \qquad \qquad \text{(employed to unemployed)} \\ \Pi((1, w), a, (1, w')) & = 0. \qquad\qquad\qquad \quad \; \; \, \, \text{(employed to employed)} \end{aligned}

The first two lines consider jump probabilities for the state (s,w)(s, w) when unemployed and the action is aa. The second two consider jump probabilities when employed. The reason that the probability assigned to the last line is zero is that a jump from s=1s=1 occurs because the worker is fired, so the value of ss after the jump is zero.

Motivated by the jump chain construction of intensity matrices in (10.56), we set

Q(x,a,x)=λ(x)(Π(x,a,x)I(x,x)).Q(x, a, x') = \lambda(x) (\Pi(x, a, x') - I(x, x')).

It follows that, for any σΣ{0,1}X\sigma \in \Sigma \coloneq \{0,1\}^\Xsf, the operator

Qσ(x,x)λ(x)(Π(x,σ(x),x)I(x,x)),Q_\sigma(x, x') \coloneq \lambda(x) (\Pi(x, \sigma(x), x') - I(x, x')),

is an intensity matrix for the jump chain under policy σ\sigma.

If we define

r(x,a)=r((s,w),a)=c1{s=0}+w1{s=1},r(x, a) = r((s, w), a) = c \1\{s = 0\} + w \1\{s = 1\},

then lifetime value is given by (10.93), where (Xt)t0(X_t)_{t \geq 0} is QσQ_\sigma-Markov and X0=xX_0 = x.

With Γ\Gamma defined by Γ(x)=A\Gamma(x) = \Asf for all xXx \in \Xsf, the tuple C=(Γ,δ,r,Q)\cC = (\Gamma, \delta, r, Q) is a continuous-time MDP and Theorem 10.2.4 applies. In particular, an optimal policy exists and can be computed with HPI in a finite number of iterations.

Figure Figure 10.4 shows an optimal policy computed in this way. (Code and parameter values can be found in cont_time_js.jl.) The policy is of threshold type, with a reservation wage of around 12. Figure Figure 10.5 shows how this reservation wage changes with parameters. The reservation wage increases as the separation rate falls, as the offer rate increases, as the discount rate falls, and as unemployment compensation increases.

Continuous-time job search policy

Figure 10.4:Continuous-time job search policy

Continuous-time job search reservation wage

Figure 10.5:Continuous-time job search reservation wage

10.3Chapter Notes

Applebaum (2019) and Engel & Nagel (2006) provide elegant introductions to semigroup theory and its applications in studying partial and stochastic differential equations. The beautiful book by Lasota & Mackey (1994) and covers connections among semigroups, Markov processes, and stochastic differential equations. Norris (1998) provides a good introduction to continuous-time Markov chains, while Liggett (2010) is more advanced.

A rigorous treatment of continuous-time MDPs can be found in Hernández-Lerma & Lasserre (2012), which also handles the case where X\Xsf is countably infinite. Our approach is somewhat different, since our main optimality results rest on the ADP theory in Chapter 9.

In recent years, continuous-time dynamic programming has become more common in macroeconomic analysis. Influential references include Nuño & Moll (2018), Kaplan et al. (2018), Achdou et al. (2022), and Fernández-Villaverde et al. (2023). For computational aspects, see Duarte (2018), Ráfales & Vázquez (2021), Rendahl (2022), and Eslami & Phelan (2023).

Footnotes
  1. Other names for intensity matrices include “QQ-matrices” (which is fine until you need to use another symbol), “Kolmogorov matrices,” and “infinitesimal stochastic matrices.”

References
  1. Hirsh, M., & Smale, S. (1974). Differential Equations, Dynamical Systems and Linear Algebra. Academic Press.
  2. Engel, K.-J., & Nagel, R. (2006). A short course on operator semigroups. Springer Science & Business Media.
  3. Applebaum, D. (2019). Semigroups of Linear Operators (Vol. 93). Cambridge University Press.
  4. Garman, M. B. (1985). Towards a semigroup pricing theory. The Journal of Finance, 40(3), 847–861.
  5. Duffie, D., & Garman, M. B. (1986). Intertemporal Arbitrage and the Markov Valuation of Securities. Citeseer.
  6. Liggett, T. M. (2010). Continuous Time Markov Processes: An Introduction (Vol. 113). American Mathematical Society.
  7. Lasota, A., & Mackey, M. C. (1994). Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics (2nd ed., Vol. 97). Springer Science & Business Media.
  8. Norris, J. R. (1998). Markov Chains. Cambridge University Press.
  9. Hernández-Lerma, O., & Lasserre, J. B. (2012). Further Topics on Discrete-Time Markov Control Processes (Vol. 42). Springer Science & Business Media.
  10. Nuño, G., & Moll, B. (2018). Social optima in economies with heterogeneous agents. Review of Economic Dynamics, 28, 150–180.
  11. Kaplan, G., Moll, B., & Violante, G. L. (2018). Monetary policy according to HANK. American Economic Review, 108(3), 697–743.
  12. Achdou, Y., Han, J., Lasry, J.-M., Lions, P.-L., & Moll, B. (2022). Income and wealth distribution in macroeconomics: A continuous-time approach. The Review of Economic Studies, 89(1), 45–86.
  13. Fernández-Villaverde, J., Hurtado, S., & Nuno, G. (2023). Financial frictions and the wealth distribution. Econometrica, 91(3), 869–901.
  14. Duarte, V. (2018). Machine learning for continuous-time economics [Techreport]. SSRN, 3012602.
  15. Ráfales, J., & Vázquez, C. (2021). Equilibrium models with heterogeneous agents under rational expectations and its numerical solution. Communications in Nonlinear Science and Numerical Simulation, 96, 105673.