Stochastic Discounting - Dynamic Programming Volume I: Finite States

In this chapter we describe how to extend the MDP model to handle time-varying discount factors, a specification now widely used in macroeconomics and finance.

6.1 Time-Varying Discount Factors

We introduce formulas for infinite-horizon lifetime valuations under stochastic discounting and provide necessary and sufficient conditions for existence of finite solutions.

6.1.1 Valuation

Our first step is to motivate and understand lifetime valuation when discount factors vary over time.

6.1.1.1 Motivation

In Section 3.2.2.2 we discussed firm valuation in a setting where the interest rate is constant. But data show that interest rates are time-varying, even for safe assets such as US Treasury bills. Figure 6.1 shows the nominal interest rate on one-year Treasury bills since the 1950s, whereas Figure 6.2 shows an estimate of the real interest rate for 10-year T-bills since 2012. Both nominal and real interest rates are evidently time-varying.

Figure 6.1: Nominal US interest rates (`plot_interest_rates_nominal.jl`)

Figure 6.2: Real US interest rates (`plot_interest_rates_real.jl`)

6.1.1.2 Theory

The aim of this section is to understand and evaluate expressions such as (6.1). Throughout,

$\Xsf$ is a finite set, $P \in \mopx$ , and $(X_t)_{t \geq 0}$ is $P$ -Markov.
$h$ is an element of $\RR^\Xsf$ , with $h(X_t)$ typically interpreted as a payoff or reward at time $t$ in state $X_t$ .
$b$ is a map from $\Xsf \times \Xsf$ to $(0, \infty)$ and

\beta_t \coloneq b(X_{t-1}, X_t) \text{ for } t \in \NN \quad \text{with} \quad \beta_0 \coloneq 1.

(6.2)

The sequence $(\beta_t)_{t \geq 0}$ is called a discount factor process and $\prod_{i=0}^t \beta_i$ is the discount factor for period $t$ payoffs evaluated at time zero. We are interested in expected discounted sums of the form

v(x) \coloneq \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t) \qquad (x \in \Xsf).

(6.3)

Theorem 6.1.1 generalizes Lemma 3.2.1. Indeed, if $b \equiv \beta \in (0, 1)$ , then $L = \beta P$ and $\rho(L) = \beta \rho(P) = \beta < 1$ , so the result in Theorem 6.1.1 reduces to Lemma 3.2.1.

Proof

Proof of Theorem 6.1.1.

To verify Theorem 6.1.1, we first prove that

\EE_x \, \left[ \prod_{i=0}^t \beta_i \right] \, h(X_t) = (L^t h)(x) \quad \text{for all } t \in \NN \text{, } h \in \RR^\Xsf \text{ and } x \in \Xsf.

(6.6)

We establish (6.6) using induction on $t$ . It is easy to see that (6.6) holds at $t=1$ . Now suppose it holds at $t$ . We claim it also holds at $t+1$ . To show this we fix $h \in \RR^\Xsf$ and set $\delta_t \coloneq \prod_{i=0}^t \beta_i$ . Applying the law of iterated expectations (see Section 3.2.1.2) yields

\EE_x \, \delta_{t+1} \, h(X_{t+1}) = \EE_x \, \EE_t \, b(X_t, X_{t+1}) \delta_t \, h(X_{t+1}) = \EE_x \, \delta_t \, \EE_t \, b(X_t, X_{t+1}) h(X_{t+1}).

Since

\EE_t \, b(X_t, X_{t+1}) h(X_{t+1}) = \sum_{x'} b(X_t, x') h( x') P(X_t, x') = \sum_{x'} L(X_t, x') h( x') = (Lh)(X_t),

we can now write

\EE_x \, \delta_{t+1} h(X_{t+1}) = \EE_x \, \delta_t f(X_t) \quad \text{where} \quad f(x) \coloneq (L h)(x).

(6.7)

Applying the induction hypothesis to (6.7) yields $\EE_x \, \delta_{t+1} h(X_{t+1}) = (L^t f)(x)$ . But $L^t f = L^t L h = L^{t+1} h$ , so $\EE_x \, \delta_{t+1} h(X_{t+1}) = (L^{t+1} h)(x)$ . This completes the induction step and hence the proof of (6.6).

Now we can complete the proof of Theorem 6.1.1. To this end, we fix $x \in \Xsf$ and use (6.6) to obtain

v(x) = \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] h(X_t) = \sum_{t=0}^\infty \EE_x \, \left[ \prod_{i=0}^t \beta_i \right] h(X_t) = \sum_{t=0}^\infty (L^t h)(x).

(6.8)

Since $x \in \Xsf$ was arbitrary, we have $v = \sum_{t \geq 0} L^t h$ . By the Neumann series lemma and $\rho(L)<1$ , this sum converges and equals $(I-L)^{-1} h$ . ◻

In (6.8) we passed expectations through an infinite sum. This operation is valid under the assumption $\rho(L)<1$ . A complete proof can be found in Section B.2.
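As a numerical illustration, the following Julia sketch builds $L(x, x') = b(x, x') P(x, x')$ as in the proof above, checks $\rho(L) < 1$ , and compares $(I - L)^{-1} h$ with a truncated version of the series in (6.8). The arrays `P`, `b` and `h` and the helper `neumann_sum` are made-up values and names chosen only for illustration.

```julia
using LinearAlgebra

# Hypothetical primitives on a three-point state space (illustrative values only)
P = [0.8 0.2 0.0;
     0.1 0.8 0.1;
     0.0 0.2 0.8]                      # stochastic matrix
b = [0.94 0.97 1.00;
     0.94 0.97 1.00;
     0.94 0.97 1.02]                   # b(x, x′) > 0; one entry exceeds one
h = [1.0, 2.0, 3.0]                    # payoff function

L = b .* P                             # L(x, x′) = b(x, x′) P(x, x′)
ρL = maximum(abs.(eigvals(L)))         # spectral radius of L
@assert ρL < 1 "ρ(L) ≥ 1, so the series need not converge"

v = (I - L) \ h                        # v = (I - L)⁻¹ h

# Truncated series Σ_{t=0}^{T} Lᵗ h, accumulated term by term
function neumann_sum(L, h; T=5_000)
    term, total = copy(h), copy(h)
    for _ in 1:T
        term = L * term
        total += term
    end
    return total
end

v_series = neumann_sum(L, h)
```

The two vectors `v` and `v_series` should agree up to floating point error, even though the discount factor exceeds one on one transition.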

Solution to Exercise 6.1.1

Set $L(x, x') \coloneq \beta(x) P(x, x')$ with $\beta(x) \coloneq 1/(1+r(x))$ . We claim that (6.9) is finite for all $x \in \Xsf$ and satisfies $v = (I - L)^{-1} \pi$ whenever $\rho(L)<1$ . To see this, we apply Theorem 6.1.1 with $b(x, x') = \beta(x)$ and $h = \pi$ .

Incidentally, to understand $v = \pi + Lv$ , suppose we buy the firm now, hold it for one period and then sell it. The expected present value of the payoff is $\pi + Lv$ . If expected benefit equals cost, then the value of (i.e., cost of buying) the firm now should equal $\pi + Lv$ . That is, $v = \pi + Lv$ . We expand on these ideas in Section 6.3.

6.1.2 Testing the Spectral Radius Condition

In Theorem 6.1.1 the condition $\rho(L) < 1$ drives stability. In this section, we develop necessary and sufficient conditions for $\rho(L) < 1$ to hold.

6.1.2.1 Spectral Radii via Expectations

First we develop an alternative representation of the spectral radius based on expectations. The next result is proved via a local spectral radius argument. In the statement, $\beta_t$ is as defined in (6.2) and $L$ is the operator in (6.4).

The expression in (6.10) connects the spectral radius with the long-run properties of the discount factor process. The connection becomes even simpler when $P$ is irreducible, as the next exercise asks you to show.

Exercise 6.1.3 shows that the spectral radius is a long-run (geometric) average of the discount factor process. For the conclusions of Theorem 6.1.1 to hold, we need this long-run average to be less than unity.
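The link between $\rho(L)$ and long-run averages can be checked numerically. By (6.6) with $h \equiv 1$ , we have $\EE_x \prod_{i=0}^t \beta_i = (L^t \mathbb 1)(x)$ , so the expectation can be computed from matrix powers. The sketch below assumes that the representation takes the form $\rho(L) = \lim_{t \to \infty} (\max_x \EE_x \prod_{i=0}^t \beta_i)^{1/t}$ , which is the expression used in the proof of Lemma 6.1.3. The matrices and the helper `geometric_average` are hypothetical.

```julia
using LinearAlgebra

# Hypothetical two-state example (illustrative values only)
P = [0.9 0.1;
     0.2 0.8]
b = [0.95 0.95;
     1.02 1.02]                        # here b(x, x′) depends on x only
L = b .* P

ρ_eig = maximum(abs.(eigvals(L)))      # spectral radius from eigenvalues

# (max_x (Lᵗ 1)(x))^(1/t), with Lᵗ 1 computed by repeated multiplication
function geometric_average(L, t)
    w = ones(size(L, 1))
    for _ in 1:t
        w = L * w
    end
    return maximum(w)^(1 / t)
end

ρ_avg = geometric_average(L, 2_000)
```

For large $t$ the value `ρ_avg` should be close to `ρ_eig`. Convergence in $t$ is slow, so the eigenvalue calculation remains the practical way to compute $\rho(L)$ .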

Figure 6.3 illustrates the condition $\rho(L) < 1$ when $\beta_t = X_t$ and $P$ is a Markov matrix produced by discretization of the AR(1) process

X_{t+1} = \mu (1 - a) + a X_t + s (1 - a^2)^{1/2} \epsilon_{t+1} \qquad (\epsilon_t) \iidsim N(0,1).

(6.12)

The discussion in Section 3.1.3 tells us that the stationary distribution $\psi^*$ of (6.12) is normally distributed with mean $\mu$ and standard deviation $s$ . The parameter $a$ controls autocorrelation. In the figure we set $\mu$ to 0.96, which, since $\beta_t = X_t$ , is the stationary mean of the discount factor process. The parameters $a$ and $s$ are varied in the figure, and the contour plot shows the corresponding value of $\rho(L)$ . The process (6.12) is discretized via the Tauchen method with the size of the state space set to 6 (which avoids negative values for $\beta(x)$ ).

Figure 6.3: $\rho(L)$ for different values of $(a,s)$ (`discount_spec_rad.jl`)

The figure shows that $\rho(L)$ tends to increase with both the volatility and the autocorrelation of the state process. This seems natural given the expression on the right-hand side of (6.11), since sequences of large values of $\beta_i$ compound in the product $\prod_{i=0}^{t} \beta_i$ , pushing up the long-run average value, and such sequences occur more often when autocorrelation and volatility are large.
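The same pattern can be seen in a much smaller model. The sketch below replaces the Tauchen discretization with a symmetric two-state chain, which is an assumption made to keep the example self-contained: the states are $\mu \pm s$ , so the stationary standard deviation is $s$ , and the probability of staying put is $(1+a)/2$ , so the first-order autocorrelation is $a$ . The function name `spec_rad` is hypothetical.

```julia
using LinearAlgebra

# Spectral radius of L(x, x′) = x′ P(x, x′) when β_t = X_t and (X_t) is a
# symmetric two-state chain (a hypothetical stand-in for the discretized AR(1))
function spec_rad(a, s; μ=0.96)
    x = [μ - s, μ + s]                 # state values, which are also discount factors
    p = (1 + a) / 2                    # probability of remaining in the current state
    P = [p (1 - p);
         (1 - p) p]
    L = P .* x'                        # scale column x′ of P by the value x′
    return maximum(abs.(eigvals(L)))
end

for a in (0.0, 0.5, 0.9), s in (0.01, 0.03)
    println("a = $a, s = $s, ρ(L) = ", round(spec_rad(a, s), digits=5))
end
```

In this simplified setting $\rho(L)$ equals $\mu$ when $a = 0$ and rises with both $a$ and $s$ otherwise.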

We finish this section with a lemma that simplifies computation of the spectral radius in settings where the process $(\beta_t)$ depends only on a subset of the state variables – a setting that is common in applications. In the statement of the lemma, the state space $\Xsf$ takes the form $\Xsf = \Ysf \times \Zsf$ . We fix $Q \in \mopz$ and $R \in \mopy$ . The discount operator $L$ is

L(x,x') = b(z, z') Q(z, z') R(y, y') \quad \text{with} \quad b \colon \Zsf \times \Zsf \to \RR_+.

Let $(Z_t)$ and $(Y_t)$ be independent and, respectively, $Q$ -Markov and $R$ -Markov, so that, with $P(x, x') \coloneq Q(z, z') R(y, y')$ , the process $(X_t) \coloneq ((Y_t, Z_t))$ is $P$ -Markov. We set $L_\Zsf(z,z') \coloneq b(z, z') Q(z, z')$ .

Proof

Let $\beta_t = b(Z_{t-1}, Z_t)$ for $t \in \NN$ and $\beta_0 = 1$ , in line with (6.2). Since $(\beta_t)$ depends only on $(Z_t)$ and, in addition, $(Y_t)$ and $(Z_t)$ are independent, for $x = (y,z) \in \Xsf$ we have $\EE_x \, \prod_{i=0}^t \beta_i = \EE_{(y, z)} \, \prod_{i=0}^t \beta_i = \EE_z \, \prod_{i=0}^t \beta_i$ . Hence

\left( \max_{x \in \Xsf} \EE_x \, \prod_{i=0}^t \beta_i \right)^{1/t} = \left(\max_{z \in \Zsf} \EE_z \, \prod_{i=0}^t \beta_i \right)^{1/t}.

Taking the limit and using Lemma 6.1.2 gives $\rho(L) = \rho(L_\Zsf)$ , where the first spectral radius is taken in $\lopx$ and the second is taken in $\lopz$ . ◻
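The conclusion $\rho(L) = \rho(L_\Zsf)$ is easy to confirm numerically. In the sketch below, the matrices `Q`, `b` and `R` are hypothetical, and the product-space operator is assembled with `kron` under the assumption that states are ordered with $z$ varying fastest within each $y$ .

```julia
using LinearAlgebra

spec_rad(A) = maximum(abs.(eigvals(A)))

# Hypothetical primitives (illustrative values only)
Q = [0.9 0.1;
     0.3 0.7]                          # exogenous chain on Z
b = [0.93 0.98;
     1.01 1.03]                        # b(z, z′)
R = [0.5 0.5 0.0;
     0.2 0.6 0.2;
     0.0 0.4 0.6]                      # chain on Y

L_Z = b .* Q                           # L_Z(z, z′) = b(z, z′) Q(z, z′)

# L((y, z), (y′, z′)) = R(y, y′) L_Z(z, z′), a Kronecker product
L = kron(R, L_Z)

println("ρ(L) = ", spec_rad(L), ", ρ(L_Z) = ", spec_rad(L_Z))
```

The practical benefit is that the eigenvalue problem is solved on $\Zsf$ , which is typically far smaller than $\Ysf \times \Zsf$ .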

6.1.2.2 Necessary Conditions

In Section 6.1.1 we studied settings where lifetime value is a function $v$ on a state space $\Xsf$ that satisfies an equation of the form $v = h + L v$ . The unknown is $v \in \RR^\Xsf$ where $\Xsf$ is a finite set, $h \in \RR^\Xsf$ is given and $L$ is a linear operator from $\RR^\Xsf$ to itself. We discussed the fact that $\rho(L) < 1$ is sufficient for $v = h + L v$ to have a unique solution.

In some settings the condition $\rho(L) < 1$ is also necessary. For example, let

$V = (0, \infty)^\Xsf$
$L$ be a positive linear operator on $\RR^\Xsf$ .

In this setting we have the following result:

Proof

Regarding (i) $\implies$ (ii), existence of a unique $v \in \RR^\Xsf$ satisfying $v = h + L v$ follows from the Neumann series lemma. Since $v= \sum_{t \geq 0} L^t h \geq h \gg 0$ , we have $v \in V$ .

For (ii) $\implies$ (i), let $v$ be any solution to $v = h + L v$ in $V$ . By the Perron–Frobenius theorem, we can select a left eigenvector $e$ such that $e \geq 0$ , $e \neq 0$ and $e^\top L= \rho(L) e^\top$ . For this $e$ , we have $e^\top v = e^\top Lv + e^\top h = \rho(L) e^\top v + e^\top h$ . Since $e \geq 0$ , $e \neq 0$ and $v, h \gg 0$ , it must be that $e^\top h > 0$ and $e^\top v > 0$ . Therefore $(1-\rho(L)) \, e^\top v = e^\top h$ with both $e^\top v$ and $e^\top h$ strictly positive. Hence $\rho(L) < 1$ . ◻
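A small example shows what goes wrong when $\rho(L) \geq 1$ . In the sketch below, the matrix `L` and vector `h` are hypothetical. The operator is positive, $h \gg 0$ and $I - L$ is invertible, so $v = h + L v$ has exactly one solution in $\RR^\Xsf$ , but that solution is not in $V$ .

```julia
using LinearAlgebra

L = [0.9 0.3;
     0.3 0.9]                          # positive operator with eigenvalues 1.2 and 0.6
h = [1.0, 1.0]                         # h ≫ 0

ρL = maximum(abs.(eigvals(L)))         # equals 1.2, so ρ(L) < 1 fails
v = (I - L) \ h                        # the only solution of v = h + L v

println("ρ(L) = ", ρL, ", v = ", v)    # v ≈ [-5.0, -5.0]
```

Both components of `v` are negative, so no solution exists in $V = (0, \infty)^\Xsf$ .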

In Section 7.1.3 we will extend Lemma 6.1.4 to handle certain nonlinear equations.

6.1.3 Fixed-Point Results

State-dependent discounting breaks the contractivity properties that we exploited in Chapter 5, when we studied optimality of MDPs (see, e.g., the proof of Proposition 5.1.1). Here we introduce a generalization of Banach’s fixed-point theorem that can deliver global stability under weaker conditions. For the remainder of this section, $\Xsf$ is any finite set.

6.1.3.1 Long-Run Contractions

Fix $U \subset \RR^\Xsf$ . We call a self-map $T$ on $U$ eventually contracting if there exists a $k \in \NN$ and a norm $\| \cdot \|$ on $\RR^\Xsf$ such that $T^k$ is a contraction on $U$ under $\| \cdot \|$ .

The next example illustrates Theorem 6.1.5 by proving a result similar to Exercise 1.2.17.

Example 6.1.2

If $T u = A u + b$ for some $b \in \RR^\Xsf$ and $A \in \lopx$ with $\rho(A) < 1$ , then, under the Euclidean norm,

\| T^k u - T^k v \| = \| A^k u - A^k v \| = \| A^k (u - v) \| \leq \|A^k \| \| u - v \|,

where the final inequality follows from the submultiplicative property of the operator norm. Since $\rho(A) < 1$ , we can choose a $k \in \NN$ such that $\| A^k \| < 1$ (see Exercise 1.2.11). Hence $T$ is eventually contracting and Theorem 6.1.5 yields global stability. The unique fixed point satisfies $u = Au + b$ and, since $\rho(A) < 1$ , we can use the Neumann series lemma to write it as $u = (I - A)^{-1} b$ .

Example 6.1.2 illustrates the connection between Theorem 6.1.5 and the Neumann series lemma. Theorem 6.1.5 is more general because it can be applied in nonlinear settings. But the Neumann series lemma remains important because, when applicable, it provides inverse and power series representations of the fixed point.
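To make Example 6.1.2 concrete, the following sketch uses a hypothetical matrix `A` with $\rho(A) = 0.5$ but $\| A \| > 1$ , so that $T$ is not a contraction under the Euclidean norm while $T^k$ is for large enough $k$ . The helper `iterate_map` is also hypothetical.

```julia
using LinearAlgebra

A = [0.5 2.0;
     0.0 0.5]                          # ρ(A) = 0.5 but ‖A‖ > 1
b = [1.0, 1.0]
T(u) = A * u + b

# Smallest k with ‖A^k‖ < 1 in the operator norm induced by the Euclidean norm
k = findfirst(j -> opnorm(A^j) < 1, 1:100)

# Successive approximation from an arbitrary starting point
function iterate_map(T, u; n=200)
    for _ in 1:n
        u = T(u)
    end
    return u
end

u_iter = iterate_map(T, zeros(2))
u_star = (I - A) \ b                   # fixed point via the Neumann series lemma

println("k = ", k, ", u_iter = ", u_iter, ", u_star = ", u_star)
```

Here `k` equals 5, and the iterates converge to `u_star` even though a single application of $T$ can push two points further apart.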

On one hand, if $T$ is a contraction map on $U \subset \RR^\Xsf$ with respect to a given norm $\| \cdot \|_a$ , we cannot necessarily say that $T$ is a contraction with respect to some other norm $\| \cdot \|_b$ on $\RR^\Xsf$ . On the other hand, if $T$ is an eventual contraction on $U$ with respect to some given norm on $\RR^\Xsf$ , then $T$ is eventually contracting with respect to every norm on $\RR^\Xsf$ . The next exercise asks you to verify this.

6.1.3.2 A Spectral Radius Condition

The following sufficient condition for eventual contractivity will be helpful when we study dynamic programs with state-dependent discounting.

Proof

Fix $v, w \in U$ . Pick any $k \in \NN$ . We have $|T^k v - T^k w| \leq L | T^{k-1} v - T^{k-1} w |$ , or

e_k \leq L e_{k-1} \quad \text{where} \quad e_k \coloneq |T^k v - T^k w| .

(6.14)

Since $L$ is positive, $L$ is order-preserving on $U$ by Exercise 2.3.11. As a result, we can iterate on (6.14) to obtain $e_k \leq L^k e_0$ , or

|T^k v - T^k w| \leq L^k | v - w |.

Let $\| \cdot \|$ be the Euclidean norm. Since $0 \leq a \leq b$ implies $\| a \| \leq \| b \|$ , we get

\| T^k v - T^k w \| \leq \| L^k |v - w |\| \leq \| L^k \| \| v - w \|,

where $\| L^k \|$ is the operator norm (see Section 1.2.1.4). Since $\rho(L) < 1$ , we have $\| L^k \| \to 0$ as $k \to \infty$ (see Exercise 1.2.11). Hence $T$ is eventually contracting on $U$ . ◻

6.1.3.3 A Generalized Blackwell Condition

In Section 2.2.3.3 we studied a sufficient condition for order-preserving self-maps to be contractions. The next proposition provides an analogous result for eventual contractions. In the statement of the proposition, $U$ is a subset of $\RR^\Xsf$ such that $v, c \in U$ and $c \geq 0$ implies $v+c \in U$ .

6.2 Optimality with State-Dependent Discounting

We can now turn to dynamic programs in which the objective is to maximize a lifetime value in the presence of state-dependent discounting. First, we present an extension of the MDP model from Chapter 5 that admits state-dependent discounting. Then we provide weak conditions under which optimal policies exist and Bellman’s principle of optimality holds.

6.2.1 MDPs with State-Dependent Discounting

We are ready to extend the MDP model to include state-dependent discounting. We construct a framework and then provide weak conditions for optimality based on spectral radius methods.

6.2.1.1 Setup

To provide a framework for dynamic programs with state-dependent discounting, we begin with an MDP $(\Gamma, \beta, r, P)$ with state space $\Xsf$ , action space $\Asf$ and feasible state-action pairs $\Gsf$ . We then replace the constant discount factor $\beta$ with a function $\beta$ from $\Gsf \times \Xsf$ to $\RR_+$ . We call the resulting model an MDP with state-dependent discounting. The Bellman equation takes the form

v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \sum_{x'} v(x') \beta(x, a, x') P(x, a, x') \right\},

(6.15)

where $x \in \Xsf$ and $v \in \RR^\Xsf$ . Notice that the discount factor depends on all relevant information: the current action, the current state and the stochastically determined next-period state.

For MDPs with state-dependent discounting, we can obtain standard optimality results by assuming that there exists a $b < 1$ such that $\beta(x, a, x') \leq b$ for all $(x, a, x') \in \Gsf \times \Xsf$ . In this setting it is easy to show that lifetime values are finite, and to extend the optimality results for regular MDPs found in Proposition 5.1.1.

Unfortunately, the assumption discussed in the previous paragraph is too strict for many applications. (We return to this point in Section 6.2.1.6.) We will state an optimality result under weaker conditions.

6.2.1.2 Finite Lifetime Values

Let $\Sigma$ be the set of all feasible policies, defined as for regular MDPs. The policy operator $T_\sigma$ corresponding to $\sigma \in \Sigma$ is represented by

(T_\sigma \, v)(x) = r(x, \sigma(x)) + \sum_{x'} v(x') \beta(x, \sigma(x), x') P(x, \sigma(x), x').

(6.16)

Following Chapter 5, we set $r_\sigma(x) \coloneq r(x,\sigma(x))$ . We define $L_\sigma \in \lopx$ via

L_\sigma(x,x') \coloneq \beta(x, \sigma(x), x') P(x, \sigma(x), x').

(6.17)

Notice that we can now write (6.16) as $T_\sigma \, v = r_\sigma + L_\sigma \, v$ . In line with our discussion of MDPs in Chapter 5, when $T_\sigma$ has a unique fixed point we denote it by $v_\sigma$ and interpret it as lifetime value.

As discussed, the value $v_\sigma(x)$ has the interpretation of lifetime value of policy $\sigma$ conditional on initial state $x$ . We can reinforce this interpretation by connecting Lemma 6.2.1 to Theorem 6.1.1. The next exercise asks you to work through all the steps.

Exercise 6.2.1

Fix $\sigma \in \Sigma$ , set $\beta_t \coloneq \beta(X_{t-1}, \sigma (X_{t-1}), X_{t})$ for $t \geq 1$ and $\beta_0 \coloneq 1$ . Let $(X_t)$ be $P_\sigma$ -Markov with initial condition $x$ . (As before, $P_\sigma(x, x') \coloneq P(x, \sigma(x), x')$ .) Prove that, under Assumption 6.2.1, the function $v_\sigma$ obeys

v_\sigma(x) = \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t \beta_i \right] r_\sigma(X_t) \qquad (x \in \Xsf).

(6.19)

Solution to Exercise 6.2.2

Fix $\sigma \in \Sigma$ and let Assumption 6.2.1 hold. We saw in the proof of Lemma 6.2.1 that $T_\sigma \, v = r_\sigma + L_\sigma \, v$ and $v_\sigma = (I - L_\sigma)^{-1} r_\sigma$ is the unique fixed point of this operator in $\RR^\Xsf$ . Moreover, for fixed $v, w \in \RR^\Xsf$ , we have

|T_\sigma \, v - T_\sigma \, w| = |L_\sigma \, v - L_\sigma \, w| = |L_\sigma \, (v - w)| \leq L_\sigma \, |v - w|.

Hence, by Proposition 6.1.6, $T_\sigma$ is globally stable on $\RR^\Xsf$ .

6.2.1.3 Optimality

The Bellman operator takes the form

(Tv)(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \sum_{x'} v(x') \beta(x, a, x') P(x, a, x') \right\},

(6.21)

where $x \in \Xsf$ and $v \in \RR^\Xsf$ .

Given $v \in \RR^\Xsf$ , a policy $\sigma$ is called $v$ -greedy if $\sigma(x)$ is a maximizer of the right-hand side of (6.21) for all $x$ in $\Xsf$ . Equivalently, $\sigma$ is $v$ -greedy whenever $T_\sigma \, v = Tv$ .

When Assumption 6.2.1 holds and, as a result, $T_\sigma$ has a unique fixed point $v_\sigma$ for each $\sigma \in \Sigma$ , we let $v^*$ denote the value function, which is defined as $v^* \coloneq \vee_{\sigma \in \Sigma} v_\sigma$ . As for the regular MDP case, a policy $\sigma$ is called optimal if $v_\sigma = v^*$ .

We can now state our main optimality result for MDPs with state-dependent discounting.

In Section 8.2.2 we prove a result that includes Proposition 6.2.2 as a special case.

6.2.1.4 Algorithms

Algorithms for solving an MDP with state-dependent discounting include value function iteration (VFI), Howard policy iteration (HPI), and optimistic policy iteration (OPI). The algorithms for VFI and OPI are identical to those given for regular MDPs (see Section 5.1.4), provided that the correct operators $T$ and $T_\sigma$ are used, and that the definition of a $v$ -greedy policy is as given in Section 6.2.1.3. The algorithm for HPI is almost identical, with the only change being that computation of lifetime values involves $L_\sigma$ . Details are given in Algorithm 6.1.

We prove in Chapter 8 that, under the conditions of Assumption 6.2.1, VFI, OPI and HPI are all convergent, and that HPI converges to an exact optimal policy in a finite number of steps.
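The following Julia sketch illustrates HPI in this setting. It is written in the spirit of the description above rather than as a reproduction of Algorithm 6.1. The model (two states, two actions, all actions feasible) and every name in the code are hypothetical. Lifetime values are computed from $v_\sigma = (I - L_\sigma)^{-1} r_\sigma$ with $L_\sigma$ as in (6.17), and greedy policies maximize the right-hand side of (6.21).

```julia
using LinearAlgebra

# Hypothetical model: two states, two actions, every action feasible everywhere
r  = [1.0 0.5;
      0.2 1.5]                                    # r[x, a]
P  = zeros(2, 2, 2)                               # P[x, a, x′]
P[1, 1, :] = [0.9, 0.1];  P[1, 2, :] = [0.3, 0.7]
P[2, 1, :] = [0.5, 0.5];  P[2, 2, :] = [0.1, 0.9]
βx = [1.01, 0.88]                                 # discount factor exceeds one in state 1
β  = [βx[x] for x in 1:2, a in 1:2, x′ in 1:2]    # β[x, a, x′]
n_x, n_a = size(r)

# B(x, a, v): the expression inside the max in (6.21)
B(x, a, v) = r[x, a] + sum(v[x′] * β[x, a, x′] * P[x, a, x′] for x′ in 1:n_x)

T(v) = [maximum(B(x, a, v) for a in 1:n_a) for x in 1:n_x]        # Bellman operator
greedy(v) = [argmax([B(x, a, v) for a in 1:n_a]) for x in 1:n_x]  # v-greedy policy

# Lifetime value v_σ = (I - L_σ)⁻¹ r_σ, with L_σ as in (6.17)
function lifetime_value(σ)
    L_σ = [β[x, σ[x], x′] * P[x, σ[x], x′] for x in 1:n_x, x′ in 1:n_x]
    r_σ = [r[x, σ[x]] for x in 1:n_x]
    @assert maximum(abs.(eigvals(L_σ))) < 1 "ρ(L_σ) ≥ 1"
    return (I - L_σ) \ r_σ
end

# Howard policy iteration: evaluate, take a greedy policy, repeat until no change
function howard_policy_iteration(σ; max_iter=100)
    for _ in 1:max_iter
        v_σ = lifetime_value(σ)
        σ_new = greedy(v_σ)
        σ_new == σ && return v_σ, σ
        σ = σ_new
    end
    error("policy iteration did not converge")
end

v_star, σ_star = howard_policy_iteration(ones(Int, n_x))
```

On exit, `v_star` should satisfy the Bellman equation (6.15) up to floating point error, which can be confirmed by comparing `T(v_star)` with `v_star`.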

6.2.1.5 Exogenous Discounting

Some applications use an exogenous state component to drive a discount factor process. In this section we set up such a model and obtain optimality conditions by applying Proposition 6.2.2.

The first step is to decompose the state $X_t$ into a pair $(Y_t, Z_t)$ , where $(Y_t)_{t \geq 0}$ is endogenous (i.e., affected by the actions of the controller) and $(Z_t)_{t \geq 0}$ is purely exogenous. In particular, the primitives consist of

(i) a nonempty correspondence $\Gamma$ from $\Ysf \times \Zsf$ to $\Asf$ ,

(ii) a function $\beta$ from $\Zsf$ to $\RR_+$ ,

(iii) a function $r$ from $\Gsf \coloneq \setntn{(y, a) \in \Ysf \times \Asf}{a \in \Gamma(y)}$ to $\RR$ ,

(iv) a stochastic matrix $Q$ on $\Zsf$ and

(v) a stochastic kernel $R$ from $\Gsf$ to $\Ysf$ .

The corresponding Bellman equation is

v(y, z) = \max_{a \in \Gamma(y, z)} \left\{ r(y, a) + \beta(z) \sum_{z', \, y'} v(y', z') Q(z, z') R(y, a, y') \right\},

(6.22)

for all $(y, z) \in \Xsf$ . Given $v \in \RR^\Xsf$ , a policy $\sigma \in \Sigma$ is called $v$ -greedy if

\sigma(y, z) \in \argmax_{a \in \Gamma(y, z)} \left\{ r(y, a) + \beta(z) \sum_{z', \, y'} v(y', z') Q(z, z') R(y, a, y') \right\},

(6.23)

for all $(y, z) \in \Xsf$ .

This exogenous discount model is a special case of the general MDP with state-dependent discounting. Indeed, we can write (6.22) as (6.21) by setting $x \coloneq (y,z)$ , $r(x, a) \coloneq r(y, a)$ and $\beta(x, a, x') \coloneq \beta(z)$ , and defining

P(x,a,x') \coloneq P((y, z),a, (y', z')) \coloneq Q(z, z') R(y, a, y').

The following proposition provides a relatively simple sufficient condition for the core optimality results in the setting of the exogenous discount model.

Solution to Exercise 6.2.4

Fix $\sigma \in \Sigma$ . In the present setting, the discount operator $L_\sigma$ from (6.17) becomes

L_\sigma(x, x') = L_\sigma((y,z), (y',z')) = \beta(z) Q(z,z') R(y, \sigma(y), y').

In view of Lemma 6.1.3, the spectral radius of $L_\sigma$ on $\lopx$ is equal to the spectral radius of $L_\Zsf(z, z') = \beta(z) Q(z, z')$ on $\lopz$ . It follows that $\rho(L_\Zsf) < 1$ in $\lopz$ implies $\rho(L_\sigma) < 1$ in $\lopx$ , so Assumption 6.2.1 holds. Hence, under this condition, Proposition 6.2.2 is valid.

6.2.1.6 Comments on the Spectral Radius Condition

In Section 6.2.1.1 we mentioned that requiring $\sup \beta < 1$ is too strict for some applications. For example, the real interest rate $r_t$ shown in Figure 6.2 is sometimes negative. Using long historical records, Farmer et al. (2023) find that the discount rate is negative around 1/3 of the time. This means that the associated discount factor $\beta_t = 1/(1+r_t)$ is sometimes greater than 1 and $\sup \beta < 1$ fails.

In macroeconomics, empirically motivated time-varying discount factor specifications lead to models where $\beta_t > 1$ occurs with positive probability. For example, Hills et al. (2019) study a model that can be embedded in the MDP framework just described. Figure 6.4 shows a simulation of one of the discount factor processes used in their model, prior to discretization. The discount factor process takes the form $\beta_t = b Z_t$ , where $(Z_t)$ is an exogenous state process obeying $Z_{t+1} = 1 - \rho + \rho Z_t + \sigma \epsilon_{t+1}$ with $(\epsilon_t)$ IID and standard normal. Clearly $\sup \beta < 1$ fails for this model too.

Let’s now consider the weaker condition $\rho(L) < 1$ described in Proposition 6.2.3 and check whether it holds. Following Hills et al. (2019), we discretize the dynamics of $(Z_t)$ via a Tauchen approximation, producing a stochastic matrix $Q$ on a finite set $\Zsf$ .^[2] The set of values for $\beta_t$ ranges between 0.95 and 1.04, so that $\beta_t > 1$ remains possible. Nonetheless, with $L(z, z') = \beta(z) Q(z, z')$ we obtain $\rho(L)=0.9996$ . Hence Proposition 6.2.3 applies.
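The same phenomenon appears in very small models. The sketch below uses a hypothetical three-state chain, not the calibration of Hills et al. (2019), in which the largest discount factor exceeds one and yet the spectral radius of $L(z, z') = \beta(z) Q(z, z')$ is below one.

```julia
using LinearAlgebra

# Hypothetical three-state exogenous chain (illustrative values only)
Q = [0.8 0.2 0.0;
     0.1 0.8 0.1;
     0.0 0.2 0.8]
β = [0.95, 0.99, 1.03]                 # β(z); the highest value exceeds one

L = β .* Q                             # L(z, z′) = β(z) Q(z, z′), row z scaled by β(z)
ρL = maximum(abs.(eigvals(L)))

println("max β = ", maximum(β), ", ρ(L) = ", ρL)
```

What matters for $\rho(L) < 1$ is how long the chain dwells in high discount factor states, not whether such states exist.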

Figure 6.4: Discount factor process $(\beta_t)_{t \geq 0}$ in Hills *et al.* (2019)

6.2.2 Inventory Management Revisited

In this section, we modify the inventory management model from Section 5.2.1 to include time-varying interest rates.

Recall that, in the model of Section 5.2.1, the Bellman equation takes the form

v(x) = \max_{a \in \Gamma(x)} \left\{ r(x, a) + \beta \sum_{d \geq 0} v(f(x, a, d)) \phi(d) \right\},

(6.24)

at each $x \in \Xsf$ , where $\Xsf \coloneq \{0, \ldots, K\}$ , $x$ is the current inventory level, $a$ is the current inventory order, $r(x, a)$ is current profits (defined in (5.8)), $f(x,a,d) \coloneq (x - d)\vee 0 + a$ and $d$ is an IID demand shock with distribution $\phi$ . Let’s now add a time-varying discount rate and investigate its impact on optimal choices.

We add time-varying discounting by replacing the constant $\beta$ in (6.24) with a stochastic process $(\beta_t)$ where $\beta_t = 1/(1+r_t)$ . We suppose that the dynamics can be expressed as $\beta_t = \beta(Z_t)$ , where the exogenous process $(Z_t)_{t \geq 0}$ is $Q$ -Markov on $\Zsf$ . After relabeling the endogenous state $X_t$ as $Y_t$ and $x$ as $y$ , in line with the notation in Section 6.2.1.5, the Bellman equation becomes $v(y, z) = \max_{a \in \Gamma(y, z)} B((y, z), a, v)$ where

B((y, z), a, v) = r(y, a) + \beta(z) \sum_{d, \, z'} v(f(y, a, d), z') \phi(d) Q(z, z').

(6.25)

If we set

R(y, a, y') \coloneq \PP\{f(y, a, D) = y'\} \quad \text{when} \quad D \sim \phi,

then $R(y, a, y')$ is the probability of realizing next period inventory level $y'$ when the current level is $y$ and the action is $a$ . Hence we can rewrite (6.25) as

B((y, z), a, v) = r(y, a) + \beta(z) \sum_{y', z'} v(y', z') Q(z, z') R(y, a, y') .

(6.26)

We have now created a version of the MDP with exogenous state-dependent discounting described in Section 6.2.1.5. Letting $L(z, z') \coloneq \beta(z) Q(z, z')$ and applying Proposition 6.2.3, we see that all of the standard optimality results hold whenever $\rho(L)<1$ .

Figure 6.5 shows how inventory evolves under an optimal program when the parameters of the problem are as given in Program 1. (The code preallocates and computes arrays representing $r$ , $R$ , and $Q$ in (6.26) and includes a test for $\rho(L)<1$ .) We set $\beta(z) = z$ and take $(Z_t)$ to be a discretization of an AR(1) process. Figure 6.5 was created by simulating $(Z_t)$ according to $Q$ and inventory $(Y_t)$ according to $Y_{t+1} = (Y_t - D_{t+1} ) \vee 0 + A_t$ , where $A_t$ follows the optimal policy.

The outcome is similar to Figure 5.7, in the sense that inventory falls slowly and then jumps up. As before, fixed costs induce this lumpy behavior. However, a new phenomenon is now present: inventories trend up when interest rates fall and down when they rise. (The interest rate $r_t$ is calculated via $\beta_t = 1/(1+r_t)$ at each $t$ .) Because of positive autocorrelation ( $\rho > 0$ ), high current interest rates foreshadow high future interest rates, which in turn devalue future profits and hence encourage managers to economize on stock.

Figure 6.5: Inventory dynamics with time-varying interest rates

using LinearAlgebra, Random, Distributions, QuantEcon

f(y, a, d) = max(y - d, 0) + a  # Inventory update

function create_sdd_inventory_model(; 
            ρ=0.98, ν=0.002, n_z=20, b=0.97,  # Z state parameters
            K=40, c=0.2, κ=0.8, p=0.6,        # firm and demand parameters
            d_max=100)                        # truncation of demand shock

    ϕ(d) = (1 - p)^d * p                      # demand distribution
    d_vals = collect(0:d_max)
    ϕ_vals = ϕ.(d_vals)
    y_vals = collect(0:K)                     # inventory levels
    n_y = length(y_vals)
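    mc = tauchen(n_z, ρ, ν)                   # discretize the AR(1) state process
    z_vals, Q = mc.state_values .+ b, mc.p    # shift the grid by b; recall β(z) = z
    ρL = maximum(abs.(eigvals(z_vals .* Q)))  # spectral radius of L(z, z′) = β(z) Q(z, z′)
    @assert ρL < 1 "Error: ρ(L) ≥ 1."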

    R = zeros(n_y, n_y, n_y)                  # R[i_y, i_a, i_y′] = P{f(y, a, D) = y′}
    for (i_y, y) in enumerate(y_vals)
        for (i_y′, y′) in enumerate(y_vals)
            for (i_a, a) in enumerate(0:(K - y))
                hits = f.(y, a, d_vals) .== y′
                R[i_y, i_a, i_y′] = dot(hits, ϕ_vals)
            end
        end
    end

    r = fill(-Inf, n_y, n_y)                  # r[i_y, i_a]; infeasible orders stay at -Inf
    for (i_y, y) in enumerate(y_vals)
        for (i_a, a) in enumerate(0:(K - y))
            cost = c * a + κ * (a > 0)        # unit cost plus fixed cost of ordering
            r[i_y, i_a] = dot(min.(y, d_vals), ϕ_vals) - cost
        end
    end

    return (; K, c, κ, p, r, R, y_vals, z_vals, Q)
end

Program 1: Inventory model with time-varying discounting (`inventory_sdd.jl`)

Figure 6.6 shows execution time for VFI and OPI at different choices of $m$ (see Section 6.2.1.3 for the interpretation of $m$ ). As for the optimal savings problem we studied in Chapter 5, OPI is around one order of magnitude faster than VFI when $m$ is close to 50 (cf. Figure 5.8).

Figure 6.6: OPI vs VFI timings for the inventory problem

6.3 Asset Pricing

This section provides a brief introduction to asset pricing in a Markov environment. While the topic of asset pricing is fascinating in its own right, our main aim is to provide additional practice in handling linear valuation problems. (Readers who wish to push ahead with their study of dynamic programming can safely skip to Chapter 7.)

6.3.1 Introduction to Asset Pricing

We first discuss risk-neutral pricing and show why this assumption is typically implausible. Next, we introduce stochastic discount factors and stationary asset pricing.

6.3.1.1 Risk-Neutral Pricing?

Consider the problem of assigning a current price $\Pi_t$ to an asset that confers on its owner the right to payoff $G_{t+1}$ . The payoff is stochastic and realized next period. One simple idea is to use risk-neutral pricing, which implies that

\Pi_t = \EE_t \, \beta \, G_{t+1},

(6.27)

for some constant discount factor $\beta \in (0,1)$ . If the payoff is in $k$ periods, then we modify the price to $\EE_t \, \beta^k \, G_{t+k}$ . In essence, risk-neutral pricing says that cost equals expected reward, discounted to present value by compounding a constant rate of discount. (A rate of discount, say $\rho$ , is linked to a discount factor, say $\beta$ , by $\beta = 1/(1+\rho) \approx \exp(-\rho)$ .)

Although risk neutrality allows for simple pricing, assuming risk neutrality for all investors is not plausible.

To give one example, suppose that we take the asset that pays $G_{t+1}$ in (6.27) and replace it with another asset that pays $H_{t+1} =G_{t+1} + \epsilon_{t+1}$ , where $\epsilon_{t+1}$ is independent of $G_{t+1}$ , $\EE_t \, \epsilon_{t+1}=0$ and $\var \epsilon_{t+1} > 0$ . In effect, we are adding risk to the original payoff without changing its mean.

Under risk neutrality, the price of this new asset is

\Pi_t^H = \EE_t \, \beta \, [G_{t+1} + \epsilon_{t+1}] = \Pi_t + \beta \, \EE_t \, \epsilon_{t+1} = \Pi_t.

Thus, $H_{t+1}$ and $G_{t+1}$ are priced identically. Both payoffs have conditional mean $\EE_t G_{t+1}$ , but their variances satisfy

\var H_{t+1} = \var G_{t+1} + \var \epsilon_{t+1} > \var G_{t+1}.

This outcome contradicts the idea that investors typically want compensation for bearing risk.

A helpful way to think about the same point is to consider the rate of return $r_{t+1} \coloneq (G_{t+1}-\Pi_t) / \Pi_t$ on holding an asset with payoff $G_{t+1}$ . From (6.27) we have $\EE_t \, \beta (1 + r_{t+1}) = 1$ , or

\EE_t \, r_{t+1} = \frac{1-\beta}{\beta}.

Since the right-hand side does not depend on $G_{t+1}$ , risk neutrality implies that all assets have the same expected rate of return. But this contradicts the finding that, on average, riskier assets tend to have higher rates of return that compensate investors for bearing risk.
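A quick calculation confirms this. In the sketch below, the state probabilities `ϕ` and the two payoff vectors are hypothetical. One payoff is far riskier than the other, yet under (6.27) both assets have expected rate of return $(1 - \beta)/\beta$ .

```julia
using LinearAlgebra

β = 0.96                               # constant discount factor
ϕ = [0.25, 0.5, 0.25]                  # hypothetical probabilities of three states
G = [0.9, 1.0, 1.1]                    # low-risk payoff
H = [0.1, 1.0, 3.0]                    # high-risk payoff

price(payoff) = β * dot(ϕ, payoff)     # risk-neutral price, as in (6.27)
expected_return(payoff) = dot(ϕ, payoff) / price(payoff) - 1

println("E r for G: ", expected_return(G))
println("E r for H: ", expected_return(H))
println("(1 - β)/β: ", (1 - β) / β)
```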

6.3.1.2 A Stochastic Discount Factor

To go beyond risk-neutral pricing, let’s start with a model containing one asset and one agent. It is straightforward to price the asset and compare it to the risk-neutral case.

A representative agent takes the price $\Pi_t$ of a risky asset as given and solves

\begin{aligned} & \max_{0 \leq \alpha \leq 1} \{ u(C_t) + \beta \EE_t u(C_{t+1}) \} \\ \text{subject to} \quad & C_t = E_t - \Pi_t \alpha \quad \text{and} \quad C_{t+1} = E_{t+1} + \alpha G_{t+1}. \end{aligned}

Here

$u$ is a flow utility function,
$G_{t+1}$ is the payoff of the asset and $\Pi_t$ is the time- $t$ price,
$\beta$ is a constant discount factor measuring impatience of the agent,
$E_t$ and $E_{t+1}$ are endowments and
$\alpha$ is the share of the asset purchased by the agent.

Rewriting as $\max_\alpha \{ u(E_t - \Pi_t \alpha) + \beta \EE_t u(E_{t+1} + \alpha G_{t+1}) \}$ and differentiating with respect to $\alpha$ leads to the first order condition

u'(E_t - \Pi_t \alpha) \Pi_t = \beta \EE_t u'(E_{t+1} + \alpha G_{t+1}) G_{t+1}.

Rearranging gives us

\Pi_t = \EE_t \left[ \beta \frac{u'(C_{t+1})}{u'(C_t)} G_{t+1} \right].

(6.28)

Comparing (6.28) with (6.27), we see that the payoff is now multiplied by a positive random variable rather than a constant. The random variable

M_{t+1} \coloneq \beta \frac{u'(C_{t+1})}{u'(C_t)}

(6.29)

is called the stochastic discount factor or pricing kernel. We call the particular form of the pricing kernel shown in (6.29) the Lucas stochastic discount factor (Lucas SDF) in honor of Lucas (1978).

In the CRRA case, where $u'(c) = c^{-\gamma}$ for some $\gamma > 0$, the Lucas SDF reduces to $\beta (C_{t+1}/C_t)^{-\gamma}$. It therefore applies heavier discounting to assets that concentrate payoffs in states of the world where the agent is already enjoying strong consumption growth. Conversely, the SDF attaches higher weights to future payoffs that occur when consumption growth is low because such payoffs hedge against the risk of drawing low consumption states.
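To make the hedging intuition concrete, here is a minimal two-state sketch. It assumes CRRA utility with $u'(c) = c^{-\gamma}$, so that the Lucas SDF is $\beta (C_{t+1}/C_t)^{-\gamma}$. The parameter values, growth rates and payoffs are illustrative assumptions.

```julia
# CRRA Lucas SDF in a two-state example. All numbers are illustrative assumptions.
β, γ = 0.96, 2.0
probs  = [0.5, 0.5]                # probabilities of low and high growth
growth = [0.97, 1.05]              # gross consumption growth C_{t+1} / C_t

# With u'(c) = c^(-γ), the Lucas SDF is β (C_{t+1} / C_t)^(-γ)
M = β .* growth .^ (-γ)

price(G) = sum(probs .* M .* G)    # Π = E[M G]

G_hedge    = [2.0, 0.0]            # pays only when consumption growth is low
G_procycle = [0.0, 2.0]            # pays only when consumption growth is high

Π_hedge, Π_procycle = price(G_hedge), price(G_procycle)
```

The two assets have the same expected payoff, but `Π_hedge` exceeds `Π_procycle` because the SDF is larger in the low-growth state.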

6.3.1.3A General Specification¶

The standard neoclassical theory of asset pricing generalizes the Lucas discounting specification by assuming only that there exists a positive random variable $M_{t+1}$ such that the price of an asset with payoff $G_{t+1}$ is

\Pi_t = \EE_t \, M_{t+1} \, G_{t+1} \qquad (t \geq 0).

(6.31)

In line with the preceding discussion, $M_{t+1}$ is called a stochastic discount factor (SDF). Equation (6.31) generalizes (6.28) by refraining from restricting the SDF (apart from assuming positivity).

In fact, it can be shown that, under relatively weak assumptions, there exists an SDF $M_{t+1}$ such that (6.31) holds. Moreover, a single SDF $M_{t+1}$ can be used to price any asset in the market, so if $H_{t+1}$ is another stochastic payoff, then the current price of an asset with this payoff is $\EE_t \, M_{t+1} \, H_{t+1}$.

We do not prove these claims, since our interest is in understanding forward-looking equations in Markov environments. Some relevant references are listed in Section 6.4.

6.3.1.4Markov Pricing¶

A common assumption in quantitative applications is that all underlying randomness is driven by a Markov model. In this spirit, we take $(X_t)$ to be $P$-Markov on a finite set $\Xsf$, where $P \in \mopx$, and suppose further that the SDF and payoff have the forms

M_{t+1} = m(X_t, X_{t+1}) \quad \text{and} \quad G_{t+1} = g(X_t, X_{t+1}),

for fixed functions $m, g$ mapping $\Xsf \times \Xsf$ to $\RR_+$ . Since $m$ is arbitrary at this point, we don’t assume a particular specification for the SDF.

In this setting, conditioning on $X_t = x$ , the standard asset pricing equation $\Pi_t = \EE_t \, M_{t+1} \, G_{t+1}$ becomes

\pi(x) = \sum_{x'} m(x, x') g(x, x') P(x, x') \qquad (x \in \Xsf),

(6.32)

where $\pi(x)$ is the price of the asset conditional on $X_t = x$ (i.e., $\Pi_t = \pi(X_t)$ ).
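Equation (6.32) is a row-wise sum of an elementwise product, which is easy to compute directly. The sketch below uses a hypothetical two-state chain; the matrices `P`, `m` and `g` are illustrative assumptions, and `one_period_price` is a helper name introduced here.

```julia
# One-period Markov pricing as in (6.32). P, m and g are illustrative assumptions.
P = [0.9 0.1;
     0.2 0.8]                      # transition matrix of the state
m = [0.98 0.94;
     1.00 0.96]                    # SDF values m(x, x')
g = [1.0 2.0;
     1.0 2.0]                      # payoffs g(x, x')

"Price vector with entries Σ_{x'} m(x, x') g(x, x') P(x, x')."
one_period_price(m, g, P) = vec(sum(m .* g .* P, dims=2))

π_vec = one_period_price(m, g, P)
```

The name `π_vec` is used instead of `π` to avoid clashing with Julia's built-in constant.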

6.3.1.5Pricing a Stationary Dividend Stream¶

Now we are ready to look at pricing a stationary cash flow over an infinite horizon, a basic problem in asset pricing. We will apply the Markov structure assumed in Section 6.3.1.4. In all that follows, $(X_t)$ is $P$ -Markov on $\Xsf$ and $M_{t+1}$ is defined as in Section 6.3.1.4.

We seek the time $t$ price, denoted by $\Pi_t$, for an ex dividend contract on the dividend stream $(D_t)_{t \geq 0}$. The contract provides the owner with the right to the dividend stream. The “ex dividend” component means that, should the dividend stream be traded at time $t$, the dividend paid at time $t$ goes to the seller rather than the buyer. As a result, purchasing at $t$ and selling at $t+1$ pays $\Pi_{t+1} + D_{t+1}$. Hence, applying the asset pricing rule (6.31), the time $t$ price $\Pi_t$ of the contract must satisfy

\Pi_t = \EE_t \, M_{t+1} (\Pi_{t+1} + D_{t+1}).

(6.33)

We assume the existence of a $d \in \RR_+^\Xsf$ such that $D_t = d(X_t)$ for all $t$. Conditioning on $X_t = x$ as in (6.32), we can write (6.33) as

\pi(x) = \sum_{x'} m(x, x') (\pi(x') + d(x')) P(x, x') \qquad (x \in \Xsf),

(6.34)

or, equivalently,

\pi = A \pi + A d \quad \text{when } A(x,x') \coloneq m(x, x') P(x, x').

(6.35)

By the Neumann series lemma, $\rho(A) < 1$ implies (6.35) has unique solution

\pi^* \coloneq (I - A )^{-1} A d = \sum_{k=1}^\infty A^k d.

The vector $\pi^*$ is called an equilibrium price function.
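A minimal sketch of this computation follows. It builds $A$ from a hypothetical two-state specification, checks the spectral radius condition and then solves the linear system. The matrices `P` and `m` and the dividend vector `d` are illustrative assumptions.

```julia
using LinearAlgebra

# Ex dividend price of a stationary dividend stream.
# P, m and d are illustrative assumptions.
P = [0.9 0.1;
     0.2 0.8]
m = [0.98 0.94;
     1.00 0.96]
d = [1.0, 2.0]                     # dividends d(x)

A = m .* P                         # A(x, x') = m(x, x') P(x, x')
ρ_A = maximum(abs, eigvals(A))     # spectral radius of A
ρ_A < 1 || error("spectral radius condition fails")

π_star = (I - A) \ (A * d)         # solves π = A π + A d
```

Solving the linear system with `\` avoids forming the inverse $(I - A)^{-1}$ explicitly, which is usually preferable numerically.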

6.3.1.6Forward Sum Representation¶

Asset prices can be expressed as infinite sums. Let’s show this for cum dividend contracts (although the case of ex dividend contracts is similar). In Exercise 6.3.4 you found that the state-contingent price vector $\pi$ for a cum dividend contract on the dividend stream $(D_t)_{t \geq 0}$ obeys

\pi = d + A \pi \quad \text{when } A(x,x') \coloneq m(x, x') P(x, x')

(6.37)

and $\rho(A) < 1$ . As before, $D_t = d(X_t)$ and $(X_t)_{t\geq 0}$ is $P$ -Markov on $\Xsf$ . Applying the uniqueness component of the Neumann series lemma and Theorem 6.1.1, we see that the function $\pi$ also obeys

\pi(x) = \EE_x \, \sum_{t=0}^\infty \left[ \prod_{i=0}^t M_i \right] D_t \qquad (x \in \Xsf),

where $M_{t+1} \coloneq m(X_t, X_{t+1})$ for $t \geq 0$ and $M_0 \coloneq 1$ . This expression agrees with our intuition: The price of the contract is the expected present value of the dividend stream, with the time $t$ dividend discounted by the composite factor $M_1 \cdots M_t$ .
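The forward sum can be compared with the direct solution of (6.37). The sketch below relies on the identity $\EE_x \left[ \prod_{i=0}^t M_i \right] D_t = (A^t d)(x)$, so that the expected discounted sum equals $\sum_{t \geq 0} A^t d$. The two-state primitives and the truncation point `T` are illustrative assumptions, and `forward_sum` is a helper name introduced here.

```julia
using LinearAlgebra

# Cum dividend price as a forward sum. P, m and d are illustrative assumptions.
P = [0.9 0.1;
     0.2 0.8]
m = [0.98 0.94;
     1.00 0.96]
d = [1.0, 2.0]
A = m .* P

π_direct = (I - A) \ d             # solves π = d + A π

"Truncated forward sum Σ_{t=0}^{T} A^t d, accumulated term by term."
function forward_sum(A, d; T=2_000)
    total, term = zero(d), copy(d)
    for _ in 0:T
        total += term              # add A^t d
        term = A * term            # advance to A^(t+1) d
    end
    return total
end

π_series = forward_sum(A, d)
```

Since $\rho(A) < 1$ in this example, the truncated sum agrees with `π_direct` up to a small numerical error.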

6.3.2Nonstationary Dividends¶

Until now, our discussion of asset pricing has assumed that dividends are stationary. However, dividends typically grow over time, along with other economic measures such as GDP. In this section, we solve for the price of a dividend stream when dividends exhibit random growth.

6.3.2.1Price-Dividend Ratios¶

A standard model of dividend growth is

\ln \frac{D_{t+1}}{D_t} = \kappa(X_t, \eta_{t+1}) \qquad t = 0, 1, \ldots,

where $\kappa$ is a fixed function, $(X_t)$ is the state process and $(\eta_t)$ is IID. We let $\phi$ be the density of each $\eta_t$ and assume that $(X_t)$ is $P$ -Markov on a finite set $\Xsf$ . Let’s suppose as before that the SDF obeys $M_{t+1} = m(X_t, X_{t+1})$ for some positive function $m$ .

Since dividends grow over time, so will the price of the asset. As such, we should no longer seek a fixed function $\pi$ such that $\Pi_t = \pi(X_t)$ for all $t$ , since the resulting price process $(\Pi_t)$ will fail to grow. Instead, we try to solve for the price-dividend ratio $V_t \coloneq \Pi_t / D_t$ , which we hope will be stationary.

After conditioning on $X_t = x$ , (6.38) leads us to conjecture existence of a function $v$ such that

v(x) = \sum_{x'} m(x, x') \int \exp(\kappa(x, \eta)) \phi(\diff \eta) \left[ 1 + v(x') \right] P(x, x'),

(6.39)

for all $x \in \Xsf$ . We understand (6.39) as an equation to be solved for the unknown object $v \in \RR^\Xsf$ . If we can find a solution $v^*$ to (6.39), then setting $V_t = v^*(X_t)$ yields a process $(V_t)$ that obeys (6.38).

Solution to Exercise 6.3.6

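We seek a $v$ that solves (6.39). Setting $A(x, x') \coloneq m(x, x') \int \exp(\kappa(x, \eta)) \phi(\diff \eta) P(x, x')$, this equation can be written as

v(x) = \sum_{x'} \left[ 1 + v(x') \right] A(x, x') \qquad (x \in \Xsf).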

Treating $A$ as a matrix and $v$ as a column vector, this equation becomes $v = A \1 + A v$ , where $\1$ is a column vector of ones. By the Neumann series lemma, $\rho(A) < 1$ implies that this equation has the unique solution $v^* = (I - A)^{-1} A\1$ . By the same lemma, $v^*$ has the alternative representation $v^* = \sum_{t \geq 0} A^t (A \1) = \sum_{t \geq 1} A^t \1$ .

The price-dividend process $(V^*_t)$ defined by $V^*_t = v^*(X_t)$ solves (6.38). The price can be recovered via $\Pi_t = V^*_t D_t$ .

6.3.2.2Application: Markov Growth with a Lucas SDF¶

As an example, suppose that dividend growth obeys

\kappa(X_t, \eta_{d, t+1}) = \mu_d + X_t + \sigma_d \, \eta_{d, t+1},

where $(\eta_{d,t})_{t \geq 0}$ is IID and standard normal. Consumption growth is given by

\ln \frac{C_{t+1}}{C_t} = \mu_c + X_t + \sigma_c \, \eta_{c, t+1} ,

where $(\eta_{c,t})_{t \geq 0}$ is also IID and standard normal. We use the Lucas SDF in (6.30), implying that

M_{t+1} = \beta \left( \frac{C_{t+1}}{C_t} \right)^{-\gamma} = \beta \exp(-\gamma( \mu_c + X_t + \sigma_c \eta_{c, t+1} )).

```julia
using QuantEcon, LinearAlgebra

"Creates an instance of the asset pricing model with Markov state."
function create_asset_pricing_model(;
        n=200,              # state grid size
        ρ=0.9, ν=0.2,       # state persistence and volatility
        β=0.99, γ=2.5,      # discount and preference parameter
        μ_c=0.01, σ_c=0.02, # consumption growth mean and volatility
        μ_d=0.02, σ_d=0.1)  # dividend growth mean and volatility
    mc = tauchen(n, ρ, ν)                     # Tauchen discretization of the state
    x_vals, P = exp.(mc.state_values), mc.p   # state grid and transition matrix
    return (; x_vals, P, β, γ, μ_c, σ_c, μ_d, σ_d)
end
```

Program 2:Asset pricing model with Lucas SDF (pd_ratio.jl)

Figure 6.7 shows the price-dividend ratio function $v^*$ for the specification given in Program 2, as well as for an alternative mean dividend growth rate $\mu_d$. The state process is a Tauchen discretization of an AR(1) process with positive autocorrelation. An increase in the state predicts higher dividend growth, which tends to increase the price. At the same time, a higher state also predicts higher consumption growth, which lowers the stochastic discount factor and hence acts negatively on the price. For values of $\gamma$ greater than 1, the second effect dominates and the price-dividend ratio slopes down.

Figure 6.7:Price-dividend ratio as a function of the state
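One way to compute $v^*$ for this application is sketched below. It assumes, in addition to what is stated above, that the shocks $\eta_{c,t+1}$ and $\eta_{d,t+1}$ are independent. Under that assumption, integrating out the two normal shocks gives $A(x, x') = \beta \exp\left(\mu_d - \gamma \mu_c + (\gamma^2 \sigma_c^2 + \sigma_d^2)/2 + (1 - \gamma) x\right) P(x, x')$. The helper names `build_discount_matrix` and `pd_ratio` and the three-state `toy_model` are hypothetical and serve only to exercise the code; the functions accept any named tuple with the same fields as the one returned by `create_asset_pricing_model`.

```julia
using LinearAlgebra

"""
Build A(x, x') = β exp(μ_d - γ μ_c + (γ²σ_c² + σ_d²)/2 + (1 - γ) x) P(x, x').
Assumes the shocks η_c and η_d are independent standard normals.
"""
function build_discount_matrix(model)
    (; x_vals, P, β, γ, μ_c, σ_c, μ_d, σ_d) = model
    e = exp.(μ_d - γ * μ_c + (γ^2 * σ_c^2 + σ_d^2) / 2 .+ (1 - γ) .* x_vals)
    return β .* e .* P             # scales row x of P by β e(x)
end

"Compute v* = (I - A)⁻¹ A 1 after checking the spectral radius condition."
function pd_ratio(model)
    A = build_discount_matrix(model)
    maximum(abs, eigvals(A)) < 1 || error("spectral radius of A is not below 1")
    return (I - A) \ (A * ones(length(model.x_vals)))
end

# Hypothetical three-state model, used only to exercise the functions
toy_model = (x_vals = [0.5, 1.0, 1.5],
             P = [0.8 0.2 0.0; 0.1 0.8 0.1; 0.0 0.2 0.8],
             β = 0.96, γ = 2.5, μ_c = 0.01, σ_c = 0.02, μ_d = 0.02, σ_d = 0.1)
v_star = pd_ratio(toy_model)
```

With $\gamma > 1$, the entries of `v_star` decrease along the state grid in this toy example, in line with the downward slope described above.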

6.3.3Incomplete Markets¶

In Section 6.3.1.5 we used the Neumann series lemma to solve for the equilibrium price vector $\pi$ . However, some modifications to the basic model introduce nonlinearities that render the Neumann series lemma inapplicable. For example, Harrison & Kreps (1978) analyze a setting with heterogeneous beliefs and incomplete markets, leading to failure of the standard asset pricing equation. This results in a nonlinear equation for prices.

We treat the Harrison & Kreps (1978) model only briefly. There are two types of agents. Type $i$ believes that the state updates according to stochastic matrix $P_i$ for $i=1,2$ . Agents are risk-neutral, so $m(x,y) \equiv \beta \in (0, 1)$ . Harrison & Kreps (1978) show that, for their model, the equilibrium condition (6.34) becomes

\pi(x) = \max_i \beta \sum_{x'} [\pi(x') + d(x')] P_i(x, x')

(6.42)

for $x \in \Xsf$, where the maximum is taken over $i \in \{1, 2\}$. Setting aside the details that lead to this equation, our objective is simply to obtain a vector of prices $\pi$ that solves (6.42).

As a first step, we introduce an operator $T \colon \RR^\Xsf_+ \to \RR^\Xsf_+$ that maps $\pi$ to $T \pi$ via

(T \pi)(x) = \max_i \beta \sum_{x'} [\pi(x') + d(x')] P_i(x, x') \qquad (x \in \Xsf).

(6.43)

We are assuming $d \geq 0$ , so $T$ is indeed a self-map on $\RR^\Xsf_+$ .

By construction, a vector $\pi \in \RR_+^\Xsf$ is a fixed point of $T$ if and only if it is a vector of prices that solves (6.42). Hence, we have successfully converted our equilibrium problem into a fixed point problem.

We aim to show that $T$ is a contraction. To this end, pick any $p, q \in \RR^\Xsf_+$ . Applying the inequality from Lemma 2.2.2, we obtain

| (Tp)(x) - (Tq)(x) | \leq \beta \max_i \left| \sum_{x'} [p(x') + d(x')] P_i(x, x') - \sum_{x'} [q(x') + d(x')] P_i(x, x') \right|.

Using the triangle inequality and canceling terms leads to

| (Tp)(x) - (Tq)(x) | \leq \beta \max_{i \in \{1, 2\}} \sum_{x'} |p(x') - q(x')| P_i(x, x') \leq \beta \| p - q \|_\infty.

Since this bound holds for all $x$ , we can take the maximum with respect to $x$ and obtain

\| Tp - Tq \|_\infty \leq \beta \| p - q \|_\infty.

Thus, on $\RR^\Xsf_+$ , the map $T$ is a contraction of modulus $\beta$ with respect to the sup norm.

Since $\RR^\Xsf_+$ is a closed subset of $\RR^\Xsf$, it is complete under the sup norm, so Banach's contraction mapping theorem implies that $T$ has a unique fixed point in this set. Hence, the system (6.42) has a unique solution $\pi^*$ in $\RR^\Xsf_+$, representing equilibrium prices. This fixed point can be computed by successive approximation.
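Successive approximation for this problem can be sketched as follows. The discount factor, the dividend vector and the two belief matrices are illustrative assumptions, not values taken from Harrison & Kreps (1978), and `successive_approx` is a helper name introduced here.

```julia
# Successive approximation for the operator T in (6.43).
# β, d and the belief matrices P_1, P_2 are illustrative assumptions.
β = 0.75
d = [0.0, 1.0]
P_1 = [0.5 0.5;
       2/3 1/3]
P_2 = [2/3 1/3;
       0.25 0.75]

"Apply T: (Tπ)(x) = max_i β Σ_{x'} [π(x') + d(x')] P_i(x, x')."
function T(π_vec, β, d, P_1, P_2)
    v_1 = β .* (P_1 * (π_vec .+ d))    # valuation under type 1 beliefs
    v_2 = β .* (P_2 * (π_vec .+ d))    # valuation under type 2 beliefs
    return max.(v_1, v_2)              # pointwise maximum over types
end

"Iterate an operator from π_0 until the sup norm step is below tol."
function successive_approx(op, π_0; tol=1e-10, max_iter=10_000)
    π_vec = π_0
    for _ in 1:max_iter
        π_new = op(π_vec)
        maximum(abs, π_new - π_vec) < tol && return π_new
        π_vec = π_new
    end
    error("no convergence within max_iter iterations")
end

T_op = π_vec -> T(π_vec, β, d, P_1, P_2)
π_star = successive_approx(T_op, zeros(2))
```

Because $T$ is a contraction of modulus $\beta$, the error shrinks by at least the factor $\beta$ at each step, so the loop terminates quickly from any starting point in $\RR^\Xsf_+$.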

6.4Chapter Notes¶

Asset pricing is discussed in many sources, including Hansen & Renault (2010), Ross (2009), Cochrane (2009), Duffie (2010) and Campbell (2017). Asset pricing is part of many applications and extensions in macroeconomics, public finance, international economics, and other fields. Some of these are described in Ljungqvist & Sargent (2018).

Dynamic programming with state-dependent discounting is becoming more common in macroeconomics and finance. Representative examples include Krusell & Smith (1998), Woodford (2011), Christiano et al. (2014), Albuquerque et al. (2016), Saijo (2017), Basu & Bundick (2017), Groot et al. (2018), Schorfheide et al. (2018), Hills et al. (2019), Toda (2019), Fagereng et al. (2019), Hubmer et al. (2020) and Cao (2020). For more on the theory of state-dependent discounting, see Jasso-Fuentes et al. (2020), Toda (2021) or Stachurski & Zhang (2021). An analysis of sovereign default with time-varying interest rates is provided by Bloise & Vailakis (2022).

Another challenge to the standard model with constant discount rates comes from empirical and experimental studies that find evidence of “hyperbolic discounting,” where valuations across time fall rapidly at first and then more slowly. Provocative reviews of hyperbolic and quasi-hyperbolic discounting can be found in Frederick et al. (2002) and Rubinstein (2003). Cao & Werning (2018) provide conditions under which predictions from optimal savings models with quasi-hyperbolic discounting are robust. Balbus et al. (2018) analyze uniqueness of time-consistent stationary Markov policies for quasi-hyperbolic households under uncertainty. Balbus et al. (2022) study equilibria in dynamic models with recursive payoffs and generalized discounting. Noor & Takeoka (2022) address the topic of optimal discounting. Additional references include Diamond & Köszegi (2003), Dasgupta & Maskin (2005), Karp (2005), Amador et al. (2006), Balbus et al. (2018), Fedus et al. (2019), Hens & Schindler (2020), Jaśkiewicz & Nowak (2021), and Drugeon & Wigniolle (2021).

This chapter focused on time additive models with state-dependent discounting. More general preference specifications with this feature include Albuquerque et al. (2016), Schorfheide et al. (2018), Pohl et al. (2018), Gomez-Cram & Yaron (2020), and Groot et al. (2022). In Chapter 8 we consider state-dependent discounting in general settings that accommodate such nonlinearities.

Footnotes¶

We are assuming that randomness in interest rates is a function of the same Markov state that influences profits. There is very little loss of generality in making this assumption. In fact, the two processes can still be statistically independent. For example, if we take $X_t$ to have the form $X_t = (Y_t, Z_t)$ , where $(Y_t)$ and $(Z_t)$ are independent Markov chains, then we can take $\beta_t$ to be a function of $Y_t$ and $\pi_t$ to be a function of $Z_t$ . The resulting interest and profit processes are statistically independent.
The parameters are $\rho = 0.85$, $\sigma = 0.0062$, and $b = 0.99875$. In line with Hills et al. (2019), we discretize the model via `mc = tauchen(n, ρ, σ, 1 - ρ, m)` with $m = 4.5$ and $n = 15$.

References¶

Krusell, P., & Smith, A. A., Jr. (1998). Income and wealth heterogeneity in the macroeconomy. Journal of Political Economy, 106(5), 867–896.
Marimon, R. (1984). General Equilibrium and Growth under Uncertainty: the Turnpike Property.
Marimon, R. (1989). Stochastic turnpike property and stationary equilibrium. Journal of Economic Theory, 47(2), 282–306.
Stachurski, J., & Zhang, J. (2021). Dynamic programming with state-dependent discounting. Journal of Economic Theory, 192, 105190.
Farmer, J. D., Geanakoplos, J., Richiardi, M. G., Montero, M., Perelló, J., & Masoliver, J. (2023). Discounting the distant future: What do historical bond prices imply about the long term discount rate? [Techreport]. arXiv, 2312.17157.
Hills, T. S., Nakata, T., & Schmidt, S. (2019). Effective lower bound risk. European Economic Review, 120, 103321.
Cochrane, J. H. (2009). Asset Pricing: Revised Edition. Princeton University Press.
Lucas, R. E. (1978). Asset prices in an exchange economy. Econometrica, 46(6), 1429–1445.
Harrison, J. M., & Kreps, D. M. (1978). Speculative investor behavior in a stock market with heterogeneous expectations. The Quarterly Journal of Economics, 92(2), 323–336.
Hansen, L. P., & Renault, E. (2010). Pricing kernels. Encyclopedia of Quantitative Finance.
Ross, S. A. (2009). Neoclassical Finance. Princeton University Press.
Duffie, D. (2010). Dynamic Asset Pricing Theory. Princeton University Press.
Campbell, J. Y. (2017). Financial Decisions and Markets: A Course in Asset Pricing. Princeton University Press.
Ljungqvist, L., & Sargent, T. (2018). Recursive Macroeconomic Theory (4th ed.). MIT Press.
Woodford, M. (2011). Simple analytics of the government expenditure multiplier. American Economic Journal: Macroeconomics, 3(1), 1–35.
