
Adaptation in Artificial Economies

This chapter puts adaptive agents into five different environments that have been analyzed in the rational expectations literature, thereby illustrating some of the different structures and possibilities in economic systems composed of such agents.

The first model, due to Bray, nicely illustrates many features of ‘least squares learning’ in a ‘self-referential’ system, including the temporary irrationality of adaptive forecasting rules and the possibility of their eventual rationality. The second model, a version of Samuelson’s overlapping-generations model of money, illustrates how successive generations can adaptively climb their way to more or less complicated rational expectations equilibria, and how the rate of convergence can depend on details of the adaptation algorithm and the intricacy of what must be learned. The third example puts overlapping generations of adaptive agents into an environment with too many equilibria, namely, a multiple currency setting in which the substitution of adaptive for ‘rational’ agents is enough to render the exchange rate determinate, but history dependent. The fourth example is a version of the ‘no-trade’ environment of Jean Tirole, in which the ‘problem’ with the rational expectations equilibrium is its incredible efficiency in eliminating opportunities for trade based on disparate information; I show how replacing rational agents with adapting ones can serve temporarily to restore opportunities for trade and thereby create trading volume. The last example, Marcet and Sargent’s model of investment under uncertainty with learning, is designed to illustrate how much ‘coaxing’ must be done by us and how much ‘theorizing’ must be done by our artificial agents for them to learn when their planning horizon is infinite.

The presentation in this chapter is informal. We spend most of our effort describing and simulating models. The chapter concludes with a brief description of how the machinery of stochastic approximation can be used to obtain analytical results about the limiting behavior of such models.

A model of Bray

Margaret Bray (1982) studied a model that exhibits several features of systems that are adapting their way to a rational expectations equilibrium. These features include:

(a) People use a forecasting scheme that would be optimal if the environment were stationary. But their learning makes the environment non-stationary, thereby rendering their forecasting scheme suboptimal.

(b) Sometimes the system converges to a rational expectations equilibrium.

(c) If the system does not converge to a rational expectations equilibrium, it does not converge.

(d) The dimension of the ‘state’ of the system with learning is larger than that of the corresponding rational expectations equilibrium, because measures of people’s beliefs are needed to describe the position and motion of the system.

In Bray’s model, the environment would be stationary if people knew the distribution of prices. The dynamics in the model all come from the adjustment of people’s expectations, and they vanish if and when people learn the equilibrium distribution of prices.

Bray assumed a ‘cobweb’-like structure in which the equilibrium price $p_t$ for a single commodity is determined by a market-clearing condition of the form

$$p_t = a + b p_{t+1}^e + u_t, \tag{1}$$

where $p_{t+1}^e$ is the price that market participants expect to prevail at time $t + 1$, and $\{u_t\}$ is an independently and identically distributed sequence of random variables with mean zero. To compute a rational expectations equilibrium, we note the absence of dynamics in either the structural equation (1) or the shock $u_t$, and so we guess that $p_{t+1}^e = \beta$ $\forall t$, a constant that is independent of time. Substituting this guess into (1) gives $p_t = a + b\beta + u_t$, which implies $E_{t-1} p_t = a + b\beta$. Evidently, the guess is true if $\beta = a/(1 - b)$. Substituting this value of $\beta = p_{t+1}^e$ back into (1) shows that in a rational expectations equilibrium $p_t = \beta + u_t$.

In backing off rational expectations, Bray assumed that people form the expectation $p_{t+1}^e$ by taking an average of past prices. For convenience, we use the notation $\beta_t = p_{t+1}^e$. In terms of a stochastic approximation algorithm, Bray’s assumption about expectations can be represented as

$$\beta_t = \beta_{t-1} + (1/t)(p_{t-1} - \beta_{t-1}). \tag{2}$$

Notice how this scheme uses only observations on prices through period $t - 1$ to form price expectations at time $t$.

Rewrite equation (1) by substituting $\beta_t$ for $p_{t+1}^e$ to get

$$p_t = a + b \beta_t + u_t. \tag{3}$$

Given an initial condition for $\beta$, equations (2) and (3) determine the evolution of $(p, \beta)$ through time, where $\beta_t$ is interpreted as people’s expectation of what $p_{t+1}$ will be.[1] Bray studied the circumstances under which $\beta_t$ and the distribution of $p_{t+1}$, which evolve interdependently, would converge to a rational expectations equilibrium. That is, she studied the conditions under which $\beta_t$ would converge to the value $\beta = a/(1 - b)$.

For describing people’s learning behavior, we can use a state vector $z_t = [p_t \ 1]'$, whose law of motion is evidently

$$\begin{bmatrix} p_t \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & a + b\beta_t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_{t-1} \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} u_t. \tag{4}$$

This system indicates that, when at time $t$ people estimate the price next period to be $\beta_t$, they act to make the best prediction of next period’s price be $a + b\beta_t$. Notice that in forecasting this way people are acting as if they believe (incorrectly) that the law of motion of the state is not (4) but rather

$$\begin{bmatrix} p_t \\ 1 \end{bmatrix} = \begin{bmatrix} 0 & \beta \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_{t-1} \\ 1 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} v_t, \tag{5}$$

for some serially uncorrelated random process $\{v_t\}$ with mean zero, where $\beta$ is a constant. When people perceive that the law of motion for $z_t$ is governed by (5), their forecasting causes the actual law of motion to be (4).

Margaret Bray showed that, if $b < 1$, system (2), (3) will converge to a rational expectations equilibrium with probability one. She also noted that $b < 1$ is a necessary condition for convergence to a rational expectations equilibrium.[2] Furthermore, she showed that, if the system does not converge to a rational expectations equilibrium, it does not converge at all.[3]

Irrationality of expectations

Although the model’s exogenous ‘fundamentals,’ i.e. the $\{u_t\}$ process and the parameters $a$, $b$, are stationary, the stochastic process for the price $\{p_t\}$ is nonstationary, because it is a piece of the joint process $\{p_t, \beta_t\}$ determined by (2), (3). This means that the expectations formation scheme (2), which is a sensible way to estimate a mean for a stationary process (e.g., for someone already living within the rational expectations equilibrium of this market), is suboptimal so long as expectations are being revised. The fact that $\beta_t$ in (3) is moving through time, as described by the law of motion (2), means that $\beta_t$ is itself a ‘hidden state variable,’ and that the system (4) should be augmented to include it. Substituting (2) into (3) and rearranging gives the following system:

$$\begin{bmatrix} p_t \\ 1 \\ \beta_t \end{bmatrix} = \begin{bmatrix} b/t & a & b(t-1)/t \\ 0 & 1 & 0 \\ 1/t & 0 & (t-1)/t \end{bmatrix} \begin{bmatrix} p_{t-1} \\ 1 \\ \beta_{t-1} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} u_t. \tag{6}$$

This is a stochastic difference equation with time-dependent coefficients, which can be used to forecast prices with smaller mean squared error than is attained by the forecast $\beta_t$ used by the people in the model. In particular, the expectation of $p_t$ conditioned on the entire state $z_t^* = [p_{t-1} \ 1 \ \beta_{t-1}]'$ is equal to

$$E p_t \mid z_t^* = a + (b/t) p_{t-1} + (b(t-1)/t) \beta_{t-1}. \tag{7}$$

Equation (7) gives the ‘rational expectation’ of price conditional on the full state vector $[p_{t-1} \ 1 \ \beta_{t-1}]$. This is the price that would be forecast by an outside observer who knew that the price was determined by (2), (3), and who could observe (or compute) $\beta_{t-1}$ as well as $p_{t-1}$. The failure of this conditional expectation to equal $\beta_t$, except after convergence, indicates the irrationality of the learning scheme.


Figure 1: Simulation of $p_{t+1}^e = \beta_t$ (solid line) and the rational expectation $E p_{t+1} \mid z_{t+1}^*$ (dotted line) in Bray’s model starting from $\beta_0 = 8$. The variance of $u_t$ was set at one. The conditional expectation $E p_{t+1} \mid z_{t+1}^*$ is the best forecast of price that could be made by an outside observer who understood that agents are learning via Bray’s scheme.

Figure 1 and Figure 2 display aspects of a simulation of Bray’s model in which we set $a = 5$, $b = 0.7$, $E u_t^2 = 1$. The random process $\{u_t\}$ was generated with a Gaussian pseudo-random number generator. The rational expectations price is $a/(1 - b) = 16.667$. We started the system at $\beta_0 = 8$, an expected price far below the rational expectations price. Figure 1 shows the gap between the least squares forecast $\beta_t$ and the conditional expectation $E p_{t+1} \mid z_{t+1}^*$, which is large at first, then gradually diminishes over time. Figure 2 and Figure 3 show how the rational expectations forecast $E p_t \mid z_t^*$ on average is closer to the actual price $p_t$ than is $\beta_{t-1}$.
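A minimal Python sketch of this simulation, iterating the belief update (2), the price equation (3), and the outside observer’s forecast (7) under the parameter values above; the variable names are ours.

```python
import numpy as np

# Bray's model: p_t = a + b*beta_t + u_t (eq. 3), with the least squares
# belief update beta_t = beta_{t-1} + (1/t)(p_{t-1} - beta_{t-1}) (eq. 2).
a, b, beta0, T = 5.0, 0.7, 8.0, 500      # REE price: a/(1-b) = 16.667
rng = np.random.default_rng(0)

beta = beta0
p = a + b * beta + rng.normal()          # period-1 price given the initial belief
for t in range(2, T + 1):
    beta_prev, p_prev = beta, p
    beta = beta_prev + (1.0 / t) * (p_prev - beta_prev)          # eq. (2)
    e_p = a + (b / t) * p_prev + (b * (t - 1) / t) * beta_prev   # eq. (7)
    p = a + b * beta + rng.normal()                              # eq. (3)
    # Early on beta lies well below e_p and p, as in Figures 1-3;
    # the gap shrinks as the decreasing gain 1/t damps the revisions.
```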


Figure 2: Simulation of $p_t$ (dotted line) and $\beta_{t-1}$ in Bray’s model. The forecast $\beta_t$ on average underpredicts $p_t$, but the underprediction tends to diminish with time.


Figure 3: Simulation of $p_t$ (dotted line) and $E p_t \mid z_t^*$ in Bray’s model.

Why not repair the irrationality indicated by the discrepancy between $\beta_{t-1}$ and $E p_t \mid z_t^*$ by going back to the original model and replacing $\beta_{t-1}$ by $E p_t \mid z_t^*$? Evidently, using this new theory of price expectations would require us to modify the actual law of motion for prices, which would render our new scheme suboptimal again. But we can use this new actual law of motion as our theory of expectations. This line starts us on a recursion, which has been taken up by Bray and Kreps (1987), who show that in the limit it leads us back to a rational expectations equilibrium in which agents are learning ‘within’ the equilibrium, but not ‘about’ the equilibrium. Bray and Kreps argue against following this recursion to its limit if it is ‘bounded rationality’ that we are after.[4]

Heterogeneity of expectations and size of the state

Bray’s model assumes that all market participants have the same beliefs $\beta_t$. If we permit heterogeneity of beliefs, the effect is to add to the dimension of the true state in the appropriate counterpart to (6). For example, suppose that there are two classes of agents, differentiated only by the initial $\beta$, say $\beta_0^a$ and $\beta_0^b$, which they use in a version of scheme (2), and that each of the two classes accounts for half of the market. Then the counterpart to (6) would include $(\beta_t^a, \beta_t^b)$ as state variables. In this version of Bray’s model, heterogeneity of beliefs would vanish as the system converges to a rational expectations equilibrium.[5]

An economy with Markov deficits

The second example is the overlapping-generations model of a monetary economy introduced by Paul Samuelson, and used extensively by John Bryant, Neil Wallace, and others to study issues of inflationary finance. We use this model to illustrate how:

(a) Learning can be modelled by having overlapping generations of agents adjust their behaviors relative to those of their ancestors in a utility-increasing direction.

(b) The object about which agents are learning can be specified ‘non-parametrically,’ provided that agents are patient enough or lucky enough to be willing to learn how to behave on a state-by-state basis.

(c) Where the state is of high dimension, agents can be modelled as learning by using parametric decision rules.

(d) There is a close connection between algorithms to compute (approximate) equilibria and models of learning.

The economy consists of overlapping generations of two-period lived agents. At each date $t \geq 1$, there are born $N$ identical agents who are endowed with $w_1$ units of a single consumption good when young, and $w_2$ units when old. Each young agent’s preferences over a lifetime consumption profile $(c_1, c_2)$ are ordered by the expected value of $u(c_1) + u(c_2)$, where $u(c) = -c^{-\sigma}/\sigma$ with $\sigma > 0$.

There is a government that prints currency to finance government expenditures that are governed by a Markov chain

$$\pi(i, j) = \text{Prob}\{G_{t+1} = \bar{G}_j \mid G_t = \bar{G}_i\}. \tag{8}$$

We let $G = [\bar{G}_1, \bar{G}_2, \ldots, \bar{G}_n]$ be the possible levels of government expenditures. The government’s budget constraint is

$$G_t = (H_t - H_{t-1})/p_t, \tag{9}$$

where $H_t$ is the stock of currency carried over by the young from $t$ to $t + 1$, and $p_t$ is the time $t$ price level.

Stationary rational expectations equilibrium

The rate of return on currency between $t$ and $t + 1$ is $p_t/p_{t+1} = R_t$. We shall seek a stationary equilibrium in which the rate of return on currency is given by

$$R(i, j) = \text{rate of return on currency from } t \text{ to } t + 1 \text{ when } G_t = \bar{G}_i \text{ and } G_{t+1} = \bar{G}_j. \tag{10}$$

Finding a stationary equilibrium requires solving a set of nonlinear equations in a set of vectors characterizing individual agents’ optimal decisions. In a stationary equilibrium, savings are determined by an $(n \times 1)$ vector of state-dependent saving rates $s = [s_1, \ldots, s_n]$, where $s_i$ is the saving rate when $G_t = \bar{G}_i$.[6] A young agent’s utility is given by

$$V(s_i) = u(w_1 - s_i) + \sum_j u(w_2 + s_i R(i, j)) \, \pi(i, j), \tag{11}$$

for $i = 1, \ldots, n$, where $u$ is an increasing and concave utility function. The first-order conditions with respect to $s_i$ are $V'(s_i) = 0$ $\forall i$, or

$$u'(w_1 - s_i) = \sum_{j=1}^n u'(w_2 + s_i R(i, j)) R(i, j) \, \pi(i, j). \tag{12}$$

In a stationary equilibrium, the government’s budget constraint, namely, $(H_t/p_t) - (H_{t-1}/p_{t-1})(p_{t-1}/p_t) = G_t$, can be written as

$$h_j - h_i R(i, j) = \bar{G}_j, \tag{13}$$

where $h_i = H_t/p_t$ when $G_t = \bar{G}_i$.

Finally, the condition that the supply of currency must equal the demand can be written $H_t/p_t = s_t N$, or

$$h_i = s_i N. \tag{14}$$

To determine a stationary equilibrium, we have to solve equations (12), (13), and (14) for $(n \times 1)$ vectors $s$ and $h$ and an $(n \times n)$ matrix $R$ of rates of return, where the $s_i$’s satisfy $s_i \in (0, w_1)$ and the elements of $R(i, j)$ are all positive. Notice that (13) and (14) imply

$$R(i, j) = \frac{s_j N - \bar{G}_j}{s_i N}. \tag{15}$$

Substituting (15) into (12) gives the following set of $n$ nonlinear equations to be solved for $[s_1, \ldots, s_n]$:

$$u'(w_1 - s_i) = \sum_{j=1}^n u'\left(w_2 + \frac{s_j N - \bar{G}_j}{N}\right) \cdot \frac{s_j N - \bar{G}_j}{s_i N} \, \pi(i, j). \tag{16}$$

Evidently, a stationary equilibrium exists if and only if (16) can be solved for $s = [s_1, \ldots, s_n]$ with $s_i \in (0, w_1)$ for all $i$.

In general, the system of nonlinear equations (16) that determines a vector of stationary equilibrium saving rates $s$ has multiple solutions. In addition to multiple stationary equilibria, there are nonstationary equilibria of the model, with a form resembling the ‘bubble equilibria’ of the models of money described in Chapter 2 and which we shall meet again in Chapter 6. Overlapping-generations models of the type we are using also have stationary ‘sunspot equilibria,’ that is, equilibria in which random variables (called ‘sunspots’ or ‘extrinsic random variables’) influence equilibrium prices and quantities only because they are expected to influence them.[7] The stability of sunspot equilibria under adaptive learning has been studied by Woodford (1990), Evans (1989), and Evans and Honkapohja (1992a).

A learning version

We use this environment as a setting in which successive generations of agents are ‘learning’ or ‘evolving.’ We want to watch how collections of adapting agents cope with the environment, and see whether and when they might eventually learn the rational expectations equilibrium. We can also watch how agents’ learning varies as we alter the complexity of what they learn about, which in this setting is controlled by the number of states and the stochastic structure of the Markov process for government expenditures.

We endow agents with knowledge about their own utility functions, about the previous experiences of agents like themselves, and about the behavior of past and present government expenditures and prices. However, we do not give agents knowledge of the distributions of government expenditures, prices, and rates of return. Instead of knowing these distributions, agents must somehow use their historical observations, which might be arranged in the form of histograms or empirical probability distributions, to make decisions by some principle other than that of ‘expected utility maximization with knowledge of equilibrium probability distributions.’

The economy with learning is identical with the model with rational agents, except that now the households consist of two classes (subsequences) of (adaptive) agents. We include two classes, called ‘odd’ and ‘even,’ because, in order to evaluate a person’s saving decision, we wait until two periods’ worth of consumption data for that person have become known. Odd agents reset a saving rate when $t$ is odd, while even agents reset a saving rate when $t$ is even. Odd agents learn from the past experiences of other odd agents, and even agents from the past experiences of other even agents. Agents of each class will be assumed to update their saving decisions based on the utility experienced by previous people of their type. In particular, they will adapt the decisions of their predecessors using a recursive Newton–Raphson (or stochastic approximation) procedure.[8] The ex post realized utility of a person who observed $G_t = \bar{G}_i$ and set $s_t = s_i$ when young at time $t$ is

$$U(s_i) = u(w_1 - s_i) + u(w_2 + s_i \cdot R_t), \tag{17}$$

where $R_t$ is the realized gross rate of return. The derivatives of realized utility with respect to the saving decision $s_i$ are

$$U'(s_i) = -u'(w_1 - s_i) + u'(w_2 + s_i R_t) \cdot R_t \tag{18}$$

$$U''(s_i) = u''(w_1 - s_i) + u''(w_2 + s_i R_t) R_t^2. \tag{19}$$

Notice that, in a rational expectations equilibrium, $V'(s_{it}) = E_t U'(s_{it})$ and $V''(s_{it}) = E_t U''(s_{it})$, where $E_t(\cdot)$ is the expectation conditional on $G_t = \bar{G}_i$. We want a learning algorithm to apply where people don’t know these conditional expectations.

We assume that people use a Robbins–Monro algorithm, state by state. To set up the Robbins–Monro algorithm, we have to keep track of the number of periods an individual has been in a given state (i.e., observed $G_t = \bar{G}_i$) for each state. We let $t_j = 1, 2, \ldots$ for $j = o, e$ index the cumulative number of odd and even generations, respectively. For each state $i = 1, 2, \ldots, n$, we let $\tau_i^j(t_j)$, $j = o, e$, index the cumulative number of times that $G_t$ has equalled $\bar{G}_i$. That is,

$$\tau_i^j(t_j + 1) = \begin{cases} \tau_i^j(t_j) + 1 & \text{if } G_{t_j} = \bar{G}_i \\ \tau_i^j(t_j) & \text{otherwise.} \end{cases} \tag{20}$$

We define the decreasing gain sequence $\gamma_\tau = 1/\tau$. The learning algorithm is then

$$s^j(i, \tau_i^j + 1) = s^j(i, \tau_i^j) - \gamma_{\tau_i^j} M^j(i, \tau_i^j + 1)^{-1} U'(s^j(i, \tau_i^j)) \tag{21}$$

$$M^j(i, \tau_i^j + 1) = M^j(i, \tau_i^j) + \gamma_{\tau_i^j} \left( U''(s^j(i, \tau_i^j)) - M^j(i, \tau_i^j) \right). \tag{22}$$

This algorithm is set up to promote the possibility that, as $\tau_i^j \to \infty$, we will have $M^j(i, \tau_i^j) \to E_i U''(s_i)$ and that $s^j(i, \tau_i^j)$ will solve the first-order conditions $E_i U'(s_i) = 0$.

We assume the one-period utility function $u(c) = -c^{-\sigma}/\sigma$. In the experiments reported below, we set $\sigma = 1$.

We assume that young agents observe $G_t$ before they make their saving decision. Agents of each type begin each period with an $(n \times 1)$ vector of saving rates $s^j(i, \tau_i^j)$, $i = 1, \ldots, n$, $j = o, e$, ‘learned’ from ancestors of their own type, which they use as a state-contingent saving rule. When $G_t = \bar{G}_i$, the young at date $t$ set savings according to $s_t = s^j(i, \tau_i^j)$.

The price level at time $t$ is determined by the two equations

$$H_t = H_{t-1} + p_t G_t \tag{23}$$

$$H_t / p_t = s_t \cdot N, \tag{24}$$

which imply

$$p_t = H_{t-1} / (N s_t - G_t). \tag{25}$$

We require an initial condition $H_0$ for $H_t$ at $t = 1$, and initial conditions for $(M^j, s^j)$ for each of our two classes of agents.

We assume, as in the rational expectations version of the model, that $G_t$ is a Markov chain with transition matrix $\pi$, where $\pi(i, j) = \text{Prob}\{G_{t+1} = \bar{G}_j \mid G_t = \bar{G}_i\}$.
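The following Python sketch puts these pieces together, using the Robbins–Monro updates (21)–(22) and the price relation (25) under the functional forms above; for brevity it collapses the odd/even classes into one learner, and the initial values are our choices.

```python
import numpy as np

# u(c) = -c**(-sigma)/sigma, so u'(c) = c**(-sigma-1), u''(c) = -(sigma+1)*c**(-sigma-2)
sigma = 1.0
up  = lambda c: c ** (-(sigma + 1.0))
upp = lambda c: -(sigma + 1.0) * c ** (-(sigma + 2.0))

w1, w2, N = 20.0, 10.0, 1.0
Gbar = np.array([0.8, 0.0])                     # possible expenditure levels
Pi   = np.array([[0.75, 0.25], [0.50, 0.50]])   # Markov transition matrix
n = len(Gbar)

s   = np.full(n, 4.0)    # state-contingent saving rates (one per state)
M   = -np.ones(n)        # estimates of E_i U''(s_i), initialized at -1
tau = np.zeros(n)        # per-state clocks tau_i
rng = np.random.default_rng(0)

# We update the saving rate chosen in state i as soon as the return it
# earned is realized (a one-class stand-in for the odd/even device).
i, H = 0, 100.0
p = H / (N * s[i] - Gbar[i])                    # price level, eq. (25)
for t in range(50000):
    j = rng.choice(n, p=Pi[i])                  # draw G_{t+1} from the chain
    p_new = H / (N * s[j] - Gbar[j])            # eq. (25)
    H_new = p_new * N * s[j]                    # currency demand, eq. (24)
    R = p / p_new                               # realized gross return
    tau[i] += 1.0
    g = 1.0 / tau[i]                            # decreasing gain
    Up  = -up(w1 - s[i]) + up(w2 + s[i] * R) * R          # eq. (18)
    Upp = upp(w1 - s[i]) + upp(w2 + s[i] * R) * R ** 2    # eq. (19)
    M[i] += g * (Upp - M[i])                    # eq. (22)
    s[i] -= g * Up / M[i]                       # eq. (21)
    s[i] = np.clip(s[i], 1.0, w1 - 1.0)         # feasibility guard (our choice)
    i, H, p = j, H_new, p_new

print(s)   # drifts toward the REE saving rates [4.211, 4.364] quoted below
```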

Some experiments

Figure 4 and Figure 5 report the results of using our learning algorithm. For each experiment, we set $\gamma_\tau = 1/\tau$ $\forall \tau$, and we set the initial conditions $M_0 = -I$.[9] We studied two economies, identical in all respects except for the stochastic process for government expenditures. In each economy, government expenditures follow a two-state Markov process with transition matrix $\pi$. In each economy, $w_1 = 20$, $w_2 = 10$ for each type of agent. In the first economy, government expenditures are identically zero in both states and $\pi_{ij} = 0.5$ $\forall (i, j)$. This implies that the rational expectations savings rates are 5 for each ‘state’ of government expenditures, and that the equilibrium rate of return on currency is unity always. In the second economy, government expenditures follow a Markov process with

$$\pi = \begin{pmatrix} 0.75 & 0.25 \\ 0.5 & 0.5 \end{pmatrix}, \tag{26}$$

where the two states are $[\bar{G}_1 \ \bar{G}_2] = [0.8 \ 0]$. For the second economy, the equilibrium savings rates are $[4.211 \ 4.364]$, and the equilibrium rates of return are

$$R = \begin{pmatrix} 0.81 & 1.0362 \\ 0.7817 & 1.00 \end{pmatrix}. \tag{27}$$

For comparability, we model each of the economies as being driven by a two-state Markov process for government expenditures, though in the first economy this means that the agents are wastefully overfitting their saving function.

Figure 4 and Figure 5 indicate that both economies are converging to the rational expectations savings rates. The convergence occurs more and more smoothly as time passes, a feature caused by the action of the $\gamma_\tau$ sequence.


Figure 4: Simulation of savings rates of odd agents for the stochastic approximation algorithm when $w_1 = 20$, $w_2 = 10$, $G_t = 0$ $\forall t$. The rational expectations equilibrium savings rates are 5 in each state. The saving rates (dark line for state 1, dotted line for state 2) are converging to the rational expectations saving rates.

In the zero expenditure economy, nothing stochastic is occurring. For this economy, convergence can be accelerated by using a constant-gain algorithm. This algorithm is formed by replacing the $\gamma_\tau$ by a constant. Besides accelerating convergence in constant environments, a potential advantage of constant-gain learning schemes is that they retain their flexibility to respond with the passage of time. A concomitant disadvantage is that their readiness to respond to recent occurrences prevents convergence to a rational expectations equilibrium when there are intrinsic shocks in the system. When intrinsic shocks are present, the most that can be hoped for with a constant-gain algorithm is convergence to a situation in which beliefs eventually spend most of their time within a neighborhood (whose size depends on the gain parameter) of rational expectations beliefs.[10]


Figure 5: Simulation of saving rates of odd agents for the stochastic approximation algorithm when $w_1 = 20$, $w_2 = 10$, $[\bar{G}_1 \ \bar{G}_2] = [0.8 \ 0]$, $\pi = \begin{pmatrix} 0.75 & 0.25 \\ 0.5 & 0.5 \end{pmatrix}$. The rational expectations savings rates are 4.211, 4.364. The rational expectations rates of return on currency are $R = \begin{pmatrix} 0.81 & 1.0362 \\ 0.7817 & 1.00 \end{pmatrix}$. The dark line is the saving rate for state 1, the dotted line the saving rate for state 2. Also plotted are the rational expectations saving rates in states 1 and 2.

We display the results of using the constant-gain learning scheme in Figure 6 and Figure 7. Evidently, convergence with a constant-gain algorithm occurs much faster in the economy with the simpler government policy (the one with government expenditures always zero).

A comparison of the outcomes depicted in Figure 5 and Figure 6 provides an idea of some of the tradeoffs involved between constant-gain and decreasing-gain algorithms.


Figure 6: Simulation of saving rates for the constant-gain algorithm when $w_1 = 20$, $w_2 = 10$, $[\bar{G}_1 \ \bar{G}_2] = [0.8 \ 0]$, $\pi = \begin{pmatrix} 0.75 & 0.25 \\ 0.5 & 0.5 \end{pmatrix}$. The gain $\gamma$ is held constant at 0.05. The rational expectations savings rates are 4.211, 4.364. The algorithm does not converge, but seems to get to the vicinity of the rational expectations saving rates. The savings rate for state 1 is shown in the solid line, that for state 2 in the dotted line.


Figure 7: Simulation of saving rates for the constant-gain algorithm when $\bar{G}_i = 0$, $i = 1, 2$. The gain is held constant at 0.3 for each class of agents. Convergence to the rational expectations savings rates is fast.

Parametric and non-parametric adaptation

In the preceding formulation, people choose one saving rate for each level of government expenditure. This specification was designed potentially to let the system eventually ‘learn’ the rational expectations equilibrium, in which the ‘state’ is described by the $(n \times 1)$ vector $G$ of possible government expenditure levels. By letting agents learn a distinct saving rate $s$ to apply for each $G$, we are in effect letting them use a non-parametric specification to learn about a policy function $s = f(G)$.

There are two potential difficulties with this specification. First, the transition matrix $\pi$ may imply that some states $\bar{G}_i$ are visited very infrequently. Observations from such states will roll in only slowly, making learning occur slowly. Of course, in terms of the unconditional expected utility of the agents, failure to learn the correct thing to do in such infrequently visited states may cost little. Second, when the number of states $n$ is large, the specification of one saving parameter for each state will become burdensome, again because the observations per state will roll in slowly.

An econometrician’s or statistician’s solution to this problem would be to assume a parametric form for the saving function $s = f(G, \theta)$, where $\theta$ is a vector of parameters of small dimension relative to $n$, and then to use all of the observations to estimate $\theta$. A recursive algorithm for estimating the parameters $\theta$ would use the gradient

$$\partial U / \partial \theta = U'(s) \, \partial f / \partial \theta, \tag{28}$$

and the second derivative matrix $\partial^2 U / \partial \theta^2$. A recursive algorithm would be

$$\theta_{\tau+1} = \theta_\tau - \gamma_\tau M_{\tau+1}^{-1} \, \partial U / \partial \theta_\tau \tag{29}$$

$$M_{\tau+1} = M_\tau + \gamma_\tau \left( \partial^2 U / \partial \theta_\tau^2 - M_\tau \right). \tag{30}$$

This algorithm uses each observation to estimate a more or less smooth function $f(G, \theta)$ to be used to determine savings.
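As a concrete instance, suppose (hypothetically) the affine saving rule $f(G, \theta) = \theta_0 + \theta_1 G$, so that $\partial f/\partial\theta = [1 \ G]'$ and, because $\partial^2 f/\partial\theta^2 = 0$, the second derivative matrix is exactly $U''(s)(\partial f/\partial\theta)(\partial f/\partial\theta)'$. The sketch below performs one step of (29)–(30) under that assumption; the names are ours.

```python
import numpy as np

def parametric_step(theta, M, G, R, tau, w1=20.0, w2=10.0, sigma=1.0):
    """One Robbins-Monro step of eqs. (29)-(30) for the (hypothetical)
    affine saving rule f(G, theta) = theta[0] + theta[1]*G."""
    up  = lambda c: c ** (-(sigma + 1.0))                    # u'(c)
    upp = lambda c: -(sigma + 1.0) * c ** (-(sigma + 2.0))   # u''(c)
    s = theta[0] + theta[1] * G
    grad_f = np.array([1.0, G])                              # df/dtheta
    Up  = -up(w1 - s) + up(w2 + s * R) * R                   # U'(s), eq. (18)
    Upp = upp(w1 - s) + upp(w2 + s * R) * R ** 2             # U''(s), eq. (19)
    dU  = Up * grad_f                                        # eq. (28)
    d2U = Upp * np.outer(grad_f, grad_f)   # exact here since f is affine
    gain = 1.0 / tau
    M_new = M + gain * (d2U - M)                             # eq. (30)
    theta_new = theta - gain * np.linalg.solve(M_new, dU)    # eq. (29)
    return theta_new, M_new
```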

Use of a parametric form $f(G, \theta)$ for the saving function raises the issue of approximation. Evidently a learning scheme that uses a parametric specification has a chance eventually of converging to a rational expectations equilibrium only if a rational expectations equilibrium can be supported by a saving function within the class of functions determined by $f(G, \theta)$. For many models, the chosen econometrically convenient function $f(G, \theta)$ will not be compatible with the functions determined by a rational expectations equilibrium. In these situations, a learning scheme based on a parametric specification will, if it converges, converge to an approximate rational expectations equilibrium.[11]

Learning the hard way

In the preceding model, learning passes between non-overlapping generations, from grandparents to grandchildren, with the grandchildren adjusting their grandparents’ saving choice after observing its consequences.[12] This model requires little of agents in the way of ‘theorizing,’ at the cost of rendering their learning dependent on the behaviors and experiences of their predecessors.

When we extend the horizon beyond two periods, it becomes increasingly inconvenient to model learning in this way, because we have to wait longer for the consequences of lifetime savings behavior to be known. If we attribute some ‘theorizing’ to our agents, we can avoid the need to learn only from one’s predecessors’ complete lifetime experiences.

Learning via model formation

To motivate an alternative model of learning in this environment, consider the Euler equation for a young agent’s saving decision within a rational expectations equilibrium:

$$u'(w_1 - s_i) = E_t(u'(w_2 + s_t R_t) R_t), \tag{31}$$

where the conditional expectation $E_t(\cdot)$ is over the equilibrium distribution of the rate of return on currency, $R_t = p_t / p_{t+1}$, conditional on the current value of the deficit $G_t$. As earlier, $s_i$ is the saving rate when $G_t = \bar{G}_i$. One way to formulate the problem of learning is to suppose that there is a representative young agent within each generation who knows the utility function $u$ and how to compute the derivative $u'(\cdot)$, but who does not know the distribution with respect to which the conditional expectation $E_t(\cdot)$ is to be computed in (31). To cope with this situation, the agent forms a model of the probability distribution with respect to which $E_t(\cdot)$ is to be computed, and adopts an algorithm for updating this distribution as new data arrive. At each point in time, the agent uses this estimated distribution as the distribution in (31), and uses (31) to determine $s_i$. Then the price level is determined as above, namely, by

$$p_t = H_{t-1} / (N s_t - G_t). \tag{32}$$

We describe two methods for modelling and updating the required distributions.

Updating histograms

Here the agent’s model is created by simply forming histograms of ex post realized rates of return $R_t$, one for each of the possible realized values of $G_t$. When $G_t = \bar{G}_i$ is observed at time $t$, the young agent forms $s_i$ by using that histogram to represent the conditional expectation in (31). Let $r_j$, $j = 1, \ldots, J(t, i)$, be the population of values of $R_t$ that have been observed prior to $t$ to follow the event $G_t = \bar{G}_i$, where $J(t, i)$ is the number of times the event $G_t = \bar{G}_i$ has occurred prior to time $t$. Then $s_i$ is the value that solves

$$u'(w_1 - s_i) = \sum_{j=1}^{J(t,i)} u'(w_2 + s_i r_j) \, r_j / J(t, i). \tag{33}$$

As time passes, the histograms are updated.
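A small sketch of this step, solving (33) for $s_i$ with a scalar root-finder given the list of returns observed after past visits to the state; the solver choice and names are ours.

```python
import numpy as np
from scipy.optimize import brentq

def saving_from_histogram(r, w1=20.0, w2=10.0, sigma=1.0):
    """Solve eq. (33): u'(w1 - s) = (1/J) * sum_j u'(w2 + s*r_j) * r_j,
    where r holds the returns observed after past visits to this state."""
    r = np.asarray(r, dtype=float)
    up = lambda c: c ** (-(sigma + 1.0))   # u'(c) for u(c) = -c**(-sigma)/sigma
    excess = lambda s: up(w1 - s) - np.mean(up(w2 + s * r) * r)
    # As s -> w1 the left side blows up while the right side stays finite,
    # so for positive returns a root is bracketed inside (0, w1).
    return brentq(excess, 1e-6, w1 - 1e-6)

print(saving_from_histogram([0.95, 1.0, 1.05]))   # close to 5, the zero-deficit REE rate
```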

A parametric model of conditional probabilities

Here the agent adopts a parametric model of the conditional probabilities, namely,

$$\text{Prob}(R_t \leq R \mid G_t = \bar{G}_i) = F(R, \bar{G}_i, \theta_i). \tag{34}$$

At time $t$ the agent has an estimate $\theta_{it}$ of the parameters of the distributions, and uses it to determine behavior via the following approximation to (31):

$$u'(w_1 - s_i) = \int_R u'(w_2 + s_i R) \, R \, dF(R, \bar{G}_i, \theta_{it}). \tag{35}$$

As data on $(R_t, G_t)$ pairs flow in, the agent uses an adaptive algorithm to update his estimates of $\theta_{it}$. For example, for each $i$, let the distribution be a two-parameter distribution determined by its first and second moments. Then the agent would update these parameters using the stochastic approximation algorithm

$$\mu_\tau = \mu_{\tau-1} + (1/\tau)(R_\tau - \mu_{\tau-1}) \tag{36}$$

$$m_\tau = m_{\tau-1} + (1/\tau)(R_\tau^2 - m_{\tau-1}), \tag{37}$$

where there is a different ‘clock’ $\tau(t)$ for each event $G_t = \bar{G}_i$.

Approximate equilibria

When we adopt a learning scheme that restricts agents’ decision rules to too small a class of functions, we cannot expect the economy with adaptively learning agents ever to converge to a rational expectations equilibrium. The most we can hope is that the learning economy might converge to an approximate equilibrium, a concept that is used by applied researchers interested in computing rational expectations equilibria. In this section, I briefly describe some of the connections between algorithms to compute approximate equilibria and economies populated by adaptively learning agents.

Computing an equilibrium of the model becomes more demanding as we expand the dimension of the state space. Suppose that we modify the previous model by assuming that government expenditures are determined by the continuous state Markov process with transition kernel

$$\text{Prob}\{G_{t+1} \leq G' \mid G_t = G\} = F(G', G). \tag{38}$$

All other aspects of the model remain unchanged. We conjecture an equilibrium saving function of the form $s_t = f(G)$, and use the equilibrium conditions to derive restrictions on this function. The household’s first-order conditions evaluated at $s_t = f(G_t)$ can be written

$$u'(w_1 - f(G)) = E_t(u'(w_2 + f(G) R_t) R_t), \tag{39}$$

where $E_t(\cdot) = E(\cdot \mid G_t)$. The equilibrium condition and the government budget constraint imply $Nf(G_t) - Nf(G_{t-1})R_{t-1} = G_t$, which can be solved for $R_{t-1}$:

$$R_{t-1} = (Nf(G_t) - G_t) / Nf(G_{t-1}). \tag{40}$$

Substituting this into the household’s first-order condition gives

$$u'(w_1 - f(G_t)) = E_t \left[ u'\left(w_2 + \frac{Nf(G_{t+1}) - G_{t+1}}{N}\right) \times \frac{Nf(G_{t+1}) - G_{t+1}}{Nf(G_t)} \right], \tag{41}$$

which is a functional equation in $f(G_t)$.

There exist a number of methods for solving a functional equation like (41) numerically. All of these methods replace the function $f(G)$ with a finite-parameter approximation $f(G, \theta)$, then find values of the parameters $\theta$ that come as close as possible to satisfying (41).[13]

Method of parameterized expectations

Here is how Albert Marcet’s method of parameterized expectations can be used approximately to solve the functional equation (41); a code sketch follows the numbered steps. For convenience, write the right side of (41) as $E_t k(G_t, G_{t+1})$, where $k(G_t, G_{t+1}) = u'\left(w_2 + (Nf(G_{t+1}) - G_{t+1})/N\right)(Nf(G_{t+1}) - G_{t+1})/Nf(G_t)$.[14]

  1. Guess that the conditional expectation on the right side of (41) has the form $h(G_t, \theta)$. Pick a starting value of $\theta$, call it $\theta_j$ for $j = 1$. Use this guess and (41) to solve for an initial saving function $s_t = f(G_t, \theta_j)$.

  2. Use a random number generator to draw a realization of length $T$ from the Markov process $F(G', G)$. Use this simulation and $f(G_t, \theta_j)$ to generate a realization of $k(G_t, G_{t+1})$. Then use this realization to compute the non-linear regression coefficients $\theta_{j+1}$ in the regression $E_t k(G_t, G_{t+1}) = h(G_t, \theta_{j+1})$.

  3. Solve the first-order condition $u'(w_1 - s_t) = h(G_t, \theta_{j+1})$ for a new saving function $f(G_t, \theta_{j+1})$.

  4. Iterate on steps 1–3 to convergence.
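The sketch below follows these steps under illustrative assumptions of ours: $\sigma = 1$ so that $u'(c) = c^{-2}$, a truncated AR(1) transition kernel for $G_t$, and an exponential form $h(G, \theta) = \exp(\theta_0 + \theta_1 G)$, for which the first-order condition yields the saving function in closed form.

```python
import numpy as np
from scipy.optimize import least_squares

w1, w2, N, T = 20.0, 10.0, 1.0, 5000
rng = np.random.default_rng(0)

h = lambda G, th: np.exp(th[0] + th[1] * G)     # parameterized expectation
f = lambda G, th: w1 - h(G, th) ** -0.5         # solves u'(w1 - s) = h with sigma = 1

# Step 2 input: one long realization of G (a truncated AR(1), our choice).
G = np.empty(T)
G[0] = 0.4
for t in range(1, T):
    G[t] = np.clip(0.5 * G[t - 1] + 0.2 + 0.05 * rng.normal(), 0.0, 2.0)

theta = np.array([-5.5, 0.0])                   # step 1: guess implying positive savings
for it in range(200):                           # step 4: iterate to convergence
    s = f(G, theta)
    k = ((w2 + (N * s[1:] - G[1:]) / N) ** -2.0
         * (N * s[1:] - G[1:]) / (N * s[:-1]))  # realized k(G_t, G_{t+1})
    theta_new = least_squares(lambda th: h(G[:-1], th) - k, theta).x   # step 2
    if np.max(np.abs(theta_new - theta)) < 1e-7:
        break
    theta += 0.5 * (theta_new - theta)          # step 3, damped for stability (our choice)
```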

There is evidently a close connection between this method for equilibrium computation and the behavior of a system populated by adaptive agents. Indeed, we can reinterpret a recursive or ‘on-line’ version of this algorithm as a system with adaptive agents. Thus, a recursive version of the nonlinear least squares algorithm is

$$\theta_{t+1} = \theta_t + (1/t) R_{t+1}^{-1} \left( k(G_t, G_{t+1}) - h(G_t, \theta_t) \right) \nabla h(G_t, \theta_t) \tag{42}$$

$$R_{t+1} = R_t + (1/t) \left( (\nabla h(G_t, \theta_t))(\nabla h(G_t, \theta_t))' - R_t \right), \tag{43}$$

where $\nabla h(G, \theta)$ is the gradient of $h$ with respect to the parameters $\theta$. Upon noting the resemblance between this algorithm and the learning scheme (29)–(30), it is understandable that Marcet proposed his equilibrium computation scheme as an outgrowth of earlier work on the dynamics of least squares learning systems.

Learning and equilibrium computation

Learning algorithms and equilibrium computation algorithms resemble one another. Equilibrium computation algorithms often have interpretations as centralized learning algorithms, whereby the model builder, acting in the role of ‘social planner,’ gropes for a set of pricing functions for markets and decision rules for agents that will satisfy all of the individual optimum conditions and market-clearing conditions. We have also seen that learning systems with boundedly rational agents sometimes have interpretations as decentralized equilibrium computation algorithms.

Recursive kernel density estimation

A continuous state (for $G_t$) specification in the present model is a convenient context for describing another way to formulate learning nonparametrically, namely, via recursive kernel estimators of a kind studied by Chen and White (1993). To describe their formulation, we first recall the nature of kernel estimators. Suppose that we have $T$ observations $x_t$, $t = 1, \ldots, T$, on the $n$-dimensional random vector $x$ drawn from an unknown joint density $F(x)$. Let $K(x): \mathbb{R}^n \to \mathbb{R}$ be a probability density for $x$, say a multivariate normal density. Then the kernel estimator of the density of $x$ is

$$\hat{F}(x) = \frac{1}{T h^n} \sum_{t=1}^{T} K\left(\frac{x - x_t}{h}\right), \tag{44}$$

where $h > 0$ is a fixed ‘bandwidth’ parameter.

Chen and White study a modified recursive version of such estimators. They let $\{h_t\}_{t=0}^\infty$ be a sequence of bandwidths with $h_t \searrow 0$, and $\hat{F}_0(x)$ be an arbitrary initial density. Then they construct the sequence $\{\hat{F}_t(x)\}$ of densities via the stochastic approximation algorithm

$$\hat{F}_t(x) = \hat{F}_{t-1}(x) + \frac{1}{t} \left[ \frac{1}{h_t^n} K\left(\frac{x - x_t}{h_t}\right) - \hat{F}_{t-1}(x) \right]. \tag{45}$$

For the present example, we could let $x_t = [R_t, G_t]'$. At time $t$, we would let behavior be determined by the solution of a version of (31) in which the conditional expectation on the right side is evaluated with respect to the conditional distribution for $R_t$ given $G_t$ that can be deduced from the joint density $\hat{F}_{t-1}(x)$.[15]
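A minimal sketch of the recursive estimator (45) on a grid, with a Gaussian product kernel; the grid, bandwidth sequence, and names are our choices.

```python
import numpy as np

# Recursive kernel density update, eq. (45), for x_t = [R_t, G_t]' (n = 2),
# tracked on a fixed grid with a Gaussian product kernel (our choices).
def K(z):
    return np.exp(-0.5 * np.sum(z ** 2, axis=-1)) / (2.0 * np.pi)

R_grid = np.linspace(0.5, 1.5, 81)
G_grid = np.linspace(0.0, 1.0, 41)
X = np.stack(np.meshgrid(R_grid, G_grid, indexing="ij"), axis=-1)
cell = (R_grid[1] - R_grid[0]) * (G_grid[1] - G_grid[0])

F = np.ones(X.shape[:2])          # arbitrary initial density F_0 ...
F /= F.sum() * cell               # ... normalized to integrate to one

def update(F, x_t, t, n=2):
    h_t = t ** -0.2               # bandwidths decreasing slowly to zero
    K_t = K((X - x_t) / h_t) / h_t ** n
    return F + (1.0 / t) * (K_t - F)          # eq. (45)

for t, x_t in enumerate([np.array([1.0, 0.4]),
                         np.array([0.95, 0.8])], start=1):
    F = update(F, x_t, t)
# The conditional density of R given G = g is the slice of F at the grid
# row nearest g, renormalized; it supplies the expectation in eq. (31).
```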

Learning in a model of the exchange rate

We now study an environment for which the rational expectations equilibrium exchange rate is indeterminate, but we expel all rational agents and replace them with adaptive agents.[16] We endow these adaptive agents with learning algorithms and initial values for their decisions, which serve to render the exchange rate and all other endogenous variables determinate. We want to study how the exchange rate behaves, and whether a ghost of indeterminacy still lurks.

The economy consists of a sequence of overlapping generations of two-period lived agents. There are two kinds of currency, available in supplies $H_1$ and $H_2$ that are fixed over time. At each date $t \geq 1$, there are born a constant number $N$ of young agents who are endowed with $w_1$ units of a nonstorable consumption good when young, and $w_2$ units when old. A young agent makes two decisions. First, he chooses an amount $s_t$ to save when young. Second, he chooses a fraction $\lambda_t$ to allocate to currency 1, and allocates the remainder to currency 2. At time $t$ the young agent’s realized utility from those decisions will be

$$U(s_t, \lambda_t) = u(w_1 - s_t) + u(w_2 + \lambda_t s_t p_{1t}/p_{1t+1} + (1 - \lambda_t) s_t p_{2t}/p_{2t+1}), \tag{46}$$

where $p_{it}$ is the price level at time $t$ in terms of currency $i$, and $p_{it}/p_{it+1}$ is the gross rate of return on currency $i$. Here $u(\cdot)$ is an increasing and strictly concave function of consumption of the one good. In the examples below, we shall set $u(c) = \ln(c)$. We calculate the gradient

$$\partial U / \partial s_t = -u'(w_1 - s_t) + u'(w_2 + \lambda_t s_t p_{1t}/p_{1t+1} + (1 - \lambda_t) s_t p_{2t}/p_{2t+1}) \times (\lambda_t p_{1t}/p_{1t+1} + (1 - \lambda_t) p_{2t}/p_{2t+1}) \tag{47}$$

$$\partial U / \partial \lambda_t = u'(w_2 + \lambda_t s_t p_{1t}/p_{1t+1} + (1 - \lambda_t) s_t p_{2t}/p_{2t+1}) \, s_t \times (p_{1t}/p_{1t+1} - p_{2t}/p_{2t+1}). \tag{48}$$

We also use the elements of the matrix of second partial derivatives.[17] For the purposes of studying learning, the economy consists of two subsequences of agents. Each subsequence is identified with a class of agents, whom we dub ‘odd’ and ‘even.’ As in the preceding model, we have two clocks $\tau(t)$, one for odd and the other for even agents, that count only the two-period episodes used to evaluate realized utility. Let $\gamma_\tau$ be a sequence of positive numbers satisfying $\lim_{\tau \to \infty} \gamma_\tau = 0$. The values of $(s_\tau, \lambda_\tau)$ for each class of agents evolve according to the recursive algorithm

$$\begin{bmatrix} s_{\tau+1} \\ \lambda_{\tau+1} \end{bmatrix} = \begin{bmatrix} s_\tau \\ \lambda_\tau \end{bmatrix} - \gamma_\tau R_{\tau+1}^{-1} \begin{bmatrix} \partial U / \partial s_\tau \\ \partial U / \partial \lambda_\tau \end{bmatrix} \tag{49}$$

$$R_{\tau+1} = R_\tau + \gamma_\tau \left( \begin{bmatrix} \partial^2 U / \partial s_\tau^2 & \partial^2 U / \partial s_\tau \partial \lambda_\tau \\ \partial^2 U / \partial \lambda_\tau \partial s_\tau & \partial^2 U / \partial \lambda_\tau^2 \end{bmatrix} - R_\tau \right). \tag{50}$$

There are two realizations of this algorithm, one for the odd agents, the other for the even agents. Agents of each class thus learn from the utility experience only of previous agents of their own class.[18] The price level is determined by

$$p_{1t} = H_1 / (\lambda_t s_t), \quad p_{2t} = H_2 / ((1 - \lambda_t) s_t). \tag{51}$$

In odd periods, the $(s_\tau, \lambda_\tau)$ pair for the odd agents is used in (51), while in even periods, the $(s_\tau, \lambda_\tau)$ pair for even agents is used in (51) to determine price levels in terms of the two currencies.

Given initial conditions for $(R_{\tau-1}, s_\tau, \lambda_\tau)$ for each class of agents, equations (46), (49), (51) determine the evolution of the decisions $\lambda_t$, $s_t$ and the prices $p_{it}$ in terms of the two currencies. The exchange rate is just $p_{1t}/p_{2t}$.
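A minimal Python sketch of this system with log utility, alternating odd and even classes and using finite-difference derivatives of realized utility (46) in place of the analytic expressions (47)–(48); the endowments, currency stocks, gain damping, and initial conditions are our choices.

```python
import numpy as np

H1, H2, w1, w2 = 100.0, 100.0, 20.0, 10.0

def prices(x):                        # eq. (51), with N normalized to one
    s, lam = x
    return H1 / (lam * s), H2 / ((1.0 - lam) * s)

def U(x, R1, R2):                     # realized utility, eq. (46)
    s, lam = x
    return np.log(w1 - s) + np.log(w2 + s * (lam * R1 + (1.0 - lam) * R2))

def grad_hess(fun, x, eps=1e-5):
    """Finite-difference stand-ins for the derivatives (47)-(48) and [17]."""
    m = len(x); g = np.zeros(m); H = np.zeros((m, m))
    for a in range(m):
        ea = np.zeros(m); ea[a] = eps
        g[a] = (fun(x + ea) - fun(x - ea)) / (2 * eps)
        for b in range(m):
            eb = np.zeros(m); eb[b] = eps
            H[a, b] = (fun(x + ea + eb) - fun(x + ea - eb)
                       - fun(x - ea + eb) + fun(x - ea - eb)) / (4 * eps ** 2)
    return g, H

x = {0: np.array([5.0, 0.3]), 1: np.array([5.0, 0.6])}   # (s, lambda) by class
R = {0: -np.eye(2), 1: -np.eye(2)}
tau = {0: 0, 1: 0}
for t in range(1, 10001):
    old, new = t % 2, (t + 1) % 2     # class whose currency holdings mature
    p1o, p2o = prices(x[old])         # prices when the maturing class was young
    p1n, p2n = prices(x[new])         # prices now
    R1, R2 = p1o / p1n, p2o / p2n     # realized gross returns
    tau[old] += 1
    gain = 1.0 / (tau[old] + 10.0)    # damped early on to keep Newton steps tame
    g, Hm = grad_hess(lambda y: U(y, R1, R2), x[old])
    R[old] += gain * (Hm - R[old])                          # eq. (50)
    x[old] = x[old] - gain * np.linalg.solve(R[old], g)     # eq. (49)
    x[old][0] = np.clip(x[old][0], 0.1, w1 - 0.1)           # feasibility guards
    x[old][1] = np.clip(x[old][1], 0.01, 0.99)

p1, p2 = prices(x[0])
print(x, p1 / p2)   # the limiting exchange rate depends on the initial lambdas
```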

Figure 8 and Figure 9 report the results of simulating this system starting from two different sets of initial conditions for $(s, \lambda)$, but common initial conditions for the $R$’s.[19] For each experiment, the exchange rate path rapidly converges to a constant value, but the limiting exchange rate values differ between the two economies. Evidently, the exchange rate depends sensitively on the initial conditions that we choose. Figure 10 and Figure 11 show the saving rates for the two classes of agents in experiment 1, while Figure 12 displays the evolution of the odd agents’ portfolio parameter $\lambda$. All of these parameters are gradually converging to values consistent with a rational expectations equilibrium.


Figure 8: Logarithm of exchange rate in experiment 1.


Figure 9: Logarithm of exchange rate for experiment 2. Experiments 1 and 2 share identical parameters, except for the initial conditions on $\lambda$ for odd and even agents.


Figure 10: Saving rate for odd agents in experiment 1.


Figure 11: Saving rate for even agents in experiment 1.


Figure 12: $\lambda$ for odd agents in experiment 1.

Exchange rate initial-condition dependence

For the purpose of understanding the sense in which this system can render the exchange rate determinate, it is useful to consider the limiting properties of algorithm (49). If algorithm (49) comes to rest, it will, as it is designed to do, come to rest at a point $(\lambda, s)$ at which $[\partial U / \partial s \ \ \partial U / \partial \lambda] = [0 \ 0]$. The condition $\partial U / \partial \lambda = 0$ implies $p_{1t}/p_{1t+1} = p_{2t}/p_{2t+1}$, or $R_{1t} = R_{2t}$, where $R_{it}$ is the gross rate of return on currency $i$. This is the same arbitrage condition that leads to exchange rate indeterminacy under rational expectations. The reasoning that underlies the exchange rate indeterminacy under rational expectations also implies that the rest points of algorithm (49) leave the exchange rate unrestricted. Our learning system renders the exchange rate path determinate by having the ‘dead hand of history’ put enough sluggishness into decisions. The exchange rate path can be said to be ‘history-dependent’ because the initial conditions assigned to $(\lambda, s)$ assume an importance that does not vanish as time passes.[20] [21]

The no-trade theorem

Jean Tirole (1982) proved a sharp ‘no-trade’ theorem that characterizes rational expectations equilibria in a class of models of purely speculative trading.[22] [23] The equilibrium market price fully reveals everybody’s private information, and equilibrium trades are zero for all traders. The no-trade theorem overrules the common-sense intuition that differences in information are a source of trading volume.

The remarkable no-trade outcome works the rational expectations hypothesis very hard. This is revealed clearly in Tirole’s proof, which exploits elementary properties of the (commonly believed) probability distribution function determined by a rational expectations equilibrium. In this section, I describe how backing off rationality can (temporarily) undo the no-trade result and produce a model of trading volume. I first describe an environment for which the no-trade theorem holds under rational expectations, then withdraw Tirole’s rational agents and replace them with Robbins–Monro adaptive agents.

The environment

The environment is one analyzed in detail by John Hussman (1992).[24] There is a competitive market for a stock that is a claim on a dividend process $\{d_t\}$ governed by

$$d_t = \theta_{1t} + \theta_{2t} + \epsilon_t \tag{52}$$

$$\theta_{1t} = \rho_1 \theta_{1t-1} + \nu_{1t} \tag{53}$$

$$\theta_{2t} = \rho_2 \theta_{2t-1} + \nu_{2t} \tag{54}$$

where $|\rho_1| < 1$, $|\rho_2| < 1$, and $[\epsilon_t \ \nu_{1t} \ \nu_{2t}]$ is a vector white noise. There are two classes of traders, dubbed $a$ and $b$, present in equal numbers (for convenience we’ll assume one of each), who have different information about dividends. At time $t$, traders of both classes observe the history of the publicly available information $\{p_s, d_s; s < t\}$. In addition, traders of classes $a$ and $b$, respectively, observe the pieces of ‘private information’:

$$s_t^a = \theta_{1t} + \eta_t^a \tag{55}$$

$$s_t^b = \theta_{2t} + \eta_t^b \tag{56}$$

where $(\eta_t^a, \eta_t^b)$ are white noises that are orthogonal to each other and to each component of $[\epsilon_t \ \nu_{1t} \ \nu_{2t}]$. At time $t$, traders of class $j$ observe a history generated by the information vector $z_{jt} = [p_t \ s_t^j \ d_t]'$.

Traders behave myopically, each period maximizing the one-period utility function

$$E\left[-\exp(-W_{t+1}^j / \phi) \mid I_{jt}\right], \quad \phi > 0, \tag{57}$$

subject to

$$W_{t+1}^j = R W_t^j + q_t^j (p_{t+1} + d_{t+1} - R p_t), \tag{58}$$

where $R$ is a constant gross interest rate on a risk-free asset, $I_{jt}$ is agent $j$’s information set, and $q_t^j$ is agent $j$’s purchases. This leads to a demand function for a trader of class $j$ that is linear in the expected ‘excess return’:

$$q_t^j = \phi_j E\left[(p_{t+1} + d_{t+1} - R p_t) \mid I_{jt}\right], \tag{59}$$

where $\phi_j = \phi / \sigma^2_{p_{t+1}+d_{t+1} \mid I_{jt}}$; $\sigma^2_{p_{t+1}+d_{t+1} \mid I_{jt}}$ is the variance of $p_{t+1} + d_{t+1}$ conditional on the information set $I_{jt}$; and $(p_{t+1} + d_{t+1} - R p_t)$ is the excess return of the stock over the risk-free asset.

Following Tirole, we assume that the asset is available in fixed supply $\bar{q}$, which for convenience we assume to be zero.[25] A rational expectations equilibrium is a stochastic process for $\{p_t\}$ that satisfies the market-clearing condition

$$q_t^a + q_t^b = 0. \tag{60}$$

Prices fully revealing with no trade

For ease of exposition, we shall assume that, in forming conditional expectations, agents of both classes condition only on the most recent observation $z_{jt}$, which means that we set $I_{jt} = z_{jt}$. Hussman (1992) and Sargent (1991) describe how to set things up to condition on the (infinite) history of $z_{jt}$. We also replace conditional expectations with linear regressions, a step we can defend by one of two standard justifications.[26] Substituting the demand functions (59) into equilibrium condition (60) gives

$$\phi_a E(y_t \mid z_{at}) + \phi_b E(y_t \mid z_{bt}) = 0 \tag{61}$$

or

$$E(y_t \mid z_{at}) = \frac{-\phi_b}{\phi_a} E(y_t \mid z_{bt}), \tag{62}$$

where the random variable $y_t = (p_{t+1} + d_{t+1} - R p_t)$ measures the excess return of the stock over the risk-free asset. Writing the regressions as $E(y_t \mid z_{jt}) = \delta_j z_{jt}$, (62) implies

$$\delta_a z_{at} = \frac{-\phi_b}{\phi_a} \delta_b z_{bt}. \tag{63}$$

Because we have assumed that the information vectors $z_{at} \neq z_{bt}$, (63) can hold $\forall t$ only if both sides are constant over time. However, (62) with $E(y_t \mid z_{at}) = \alpha_0$, $\alpha_0$ a constant, implies $E(y_t \mid z_{at}) = E(y_t \mid z_{bt}) = 0$.[27] Substituting $E(y_t \mid z_{jt}) = 0$ into the demand functions (59) delivers the no-trade outcome $q_t^a = q_t^b = 0$. The condition that $E(y_t \mid I_{jt}) = 0$ $\forall j$ shows that the market price adjusts to reveal fully all of the private information that is relevant for predicting excess returns.

Models with ‘noise traders’ break the no-trade theorem by replacing (60) with $q_t^a + q_t^b = \zeta_t$, where $\{\zeta_t\}$ is an exogenous stochastic process of supplies by the noise traders. Technically, notice how the presence of a time-varying, random $\{\zeta_t\}$ process disrupts the argument leading to (62).

Computation of the equilibrium

In order both to study the rational expectations equilibrium in more detail and to provide a framework from which we can expel rational agents and resettle adaptive agents, it is useful to have a way of computing the rational expectations equilibrium. We follow Hussman and adapt the apparatus of Marcet and Sargent (1989b) to this purpose. A trader of type $j$ observes the history of $z_{jt} = [p_t \ s_t^j \ d_t]'$, fits the vector autoregression[28]

$$z_{jt} = \beta_j z_{jt-1} + \zeta_{jt}, \tag{64}$$

and uses it to forecast the components of $z_{jt+1}$:

$$E(z_{jt+1} \mid z_{jt}) = \beta_j z_{jt}. \tag{65}$$

Then trader $j$’s estimate of the excess return $(p_{t+1} + d_{t+1} - R p_t) = y_t$ is

$$E(y_t \mid z_{jt}) = \delta_j z_{jt}, \tag{66}$$

where $\delta_j = [1 \ 0 \ 1] \beta_j + [-R \ 0 \ 0]$.

The state vector and the innovation vector for the market are

$$z_t = \begin{bmatrix} p_t \\ s_t^a \\ s_t^b \\ d_t \\ \theta_{1t} \\ \theta_{2t} \end{bmatrix}, \quad u_t = \begin{bmatrix} \eta_t^a \\ \eta_t^b \\ \nu_{1t} \\ \nu_{2t} \\ \epsilon_t \end{bmatrix}. \tag{67}$$

Using (66) and the equilibrium condition (60), we can derive a state transition equation of the form

$$z_t = T(\beta) z_{t-1} + V(\beta) u_t, \tag{68}$$

where T(β)T(\beta), V(β)V(\beta) are matrix functions of β=(βa,βb)\beta = (\beta_a, \beta_b) that are described by Hussman. The form of (68) emphasizes how traders’ perceptions of the laws of motion, as parameterized by the vector autoregressive parameters βj\beta_j, influence the law of motion for the entire state ztz_t.

Since zjtz_{jt} is a subvector of ztz_t, system (68) can be used to deduce the projections

E(zjtzjt1)=Sj(β)zjt1,E(z_{jt} \mid z_{jt-1}) = S_j(\beta) z_{jt-1},

where Sj(β)S_j(\beta) depends on (T(β),V(β))(T(\beta), V(\beta)) and the moments EututE u_t u_t'. Thus, we have a mapping from a pair of perceived laws of motion β=(βa,βb)\beta = (\beta_a, \beta_b) to a pair of matrices (Sa(β),Sb(β))(S_a(\beta), S_b(\beta)) that determine optimal (linear least squares) predictors. A rational expectations equilibrium is a fixed point of this mapping.
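To fix ideas, here is a minimal Python sketch of how such a fixed point might be computed by damped iteration. The map S below is a placeholder for the model-specific construction of (Sa(β),Sb(β))(S_a(\beta), S_b(\beta)) (which the text leaves to Hussman), so its signature is an illustrative assumption rather than his procedure.

```python
import numpy as np

def ree_fixed_point(S, beta0, damping=0.5, tol=1e-10, max_iter=10_000):
    """Damped fixed-point iteration beta <- (1 - damping)*beta + damping*S(beta).

    S : callable, placeholder for the model-specific map from perceived VAR
        coefficients beta = (beta_a, beta_b) to the optimal-predictor
        coefficients (S_a(beta), S_b(beta)), stacked into one NumPy array.
    beta0 : initial guess for the stacked coefficients.
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        beta_next = (1 - damping) * beta + damping * np.asarray(S(beta))
        if np.max(np.abs(beta_next - beta)) < tol:
            return beta_next
        beta = beta_next
    raise RuntimeError("fixed-point iteration did not converge")
```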

Tampering with the no-trade theorem

The no-trade theorem follows directly from the equality

ϕaδazat+ϕbδbzbt=0,\phi_a \delta_a z_{at} + \phi_b \delta_b z_{bt} = 0,

and in particular from the facts that δj\delta_j, j=a,bj = a, b are each constant over time, and that δjzjt\delta_j z_{jt}, j=a,bj = a, b are each a conditional expectation of the same random variable, conditioned on different information sets, but calculated with respect to the same joint probability distribution. We can temporarily disrupt the forces leading to the no-trade theorem by withdrawing from our agents knowledge of the (equilibrium) joint probability distributions required to compute E(ytzjt)E(y_t \mid z_{jt}), and giving them instead initial conditions for βj\beta_j in their vector autoregressions and a recursive algorithm for updating their estimates of βj\beta_j.[29] The effect of this will be to replace (63) with an equilibrium condition of the form

ϕaδatzat+ϕbδbtzbt=0,\phi_a \delta_{at} z_{at} + \phi_b \delta_{bt} z_{bt} = 0,

where δjt=[1 0 1]βjt+[R 0 0]\delta_{jt} = [1 \ 0 \ 1] \beta_{jt} + [-R \ 0 \ 0]. The facts that the δjt\delta_{jt} in (71) are time-dependent and start from arbitrary initial conditions raise the possibility that δatzatδbtzbt\delta_{at} z_{at} \neq \delta_{bt} z_{bt}, so that trade will occur.

The system’s motion is described by

zt=T(βt1)zt1+V(βt1)utz_t = T(\beta_{t-1}) z_{t-1} + V(\beta_{t-1}) u_t
Mjt=Mjt1+(1/t)(zjt1zjt1Mjt1)M_{jt} = M_{jt-1} + (1/t)\left( z_{jt-1} z_{jt-1}' - M_{jt-1} \right)
βjt=βjt1+(1/t)Mjt1zjt1(zjtβjt1zjt1).\beta_{jt}' = \beta_{jt-1}' + (1/t) M_{jt}^{-1} z_{jt-1} (z_{jt} - \beta_{jt-1} z_{jt-1})'.

where βt=(βat,βbt)\beta_t = (\beta_{at}, \beta_{bt}), and the first equation is formed simply by replacing β\beta by βt1\beta_{t-1} in (68). To start the system, we need initial conditions for (βj,Mj)(\beta_j, M_j) for j=a,bj = a, b. We shall start the system from initial conditions in the vicinity of a rational expectations equilibrium.
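For concreteness, here is a minimal Python sketch of one step of the recursive least squares updating above; the names are mine, and each trader type jj would run such an update on its own zjtz_{jt} before the market clears each period.

```python
import numpy as np

def rls_step(beta, M, z_prev, z_curr, t):
    """One step of the recursive least squares recursions above:
    M_t     = M_{t-1} + (1/t)(z_{t-1} z_{t-1}' - M_{t-1})
    beta_t' = beta_{t-1}' + (1/t) M_t^{-1} z_{t-1} (z_t - beta_{t-1} z_{t-1})'."""
    M_new = M + (1.0 / t) * (np.outer(z_prev, z_prev) - M)
    forecast_error = z_curr - beta @ z_prev
    beta_new = beta + (1.0 / t) * np.outer(forecast_error,
                                           np.linalg.solve(M_new, z_prev))
    return beta_new, M_new
```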

Some experiments

Figure 13 through Figure 18 report some simulations of the system with least squares learning.[30] Figure 13 and Figure 14 plot the price and volume from a simulation in which the system has been initiated with beliefs perturbed by very small amounts from the rational expectations beliefs, and with the covariance matrices (the MjM_j’s) set close to those from the asymptotic distribution theory that governs the regression of someone who had lived for 100 periods in the rational expectations equilibrium. These graphs illustrate how the price with least squares learning resembles the rational expectations price, yet diverges enough from it to generate volume. Hussman and Sargent studied the behavior of such systems over much longer horizons, and found that over time the gap between the rational expectations price and the price under least squares learning vanishes, and so does volume. But positive trading volume persists for a very long time.


Figure 13:Rational expectations price (solid line) and price under least squares learning (dotted line) after 1000 periods of learning, with initial beliefs close to those appropriate for a rational expectations equilibrium.


Figure 14:Volume with least squares learning after 1000 periods.

Figure 15 and Figure 16 plot parts of a simulation of the same model with initial beliefs that assign too much weight to the market price in determining expected excess return. In particular, initial beliefs equal the rational expectations beliefs except that the coefficients on the current price in the vector autoregression determining expected excess return are raised in absolute value by 40 percent vis-à-vis their rational expectations values. The first 150 periods of the simulation are plotted in Figure 15 and Figure 16, while Figure 17 and Figure 18 plot observations after 1000 periods. These figures indicate how we can make prices temporarily diverge farther from their rational expectations values by pushing initial beliefs farther from rational expectations. Figure 17 shows how, with the passage of time, least squares beliefs adapt to eliminate differences from rational expectations.[31]


Figure 15:Rational expectations price (solid line) and price under least squares learning (dotted line), starting from beliefs that overweight the market price: first 150 observations.


Figure 16:Volume under least squares learning, starting from beliefs that overweight the market price.


Figure 17:Rational expectations price (solid line) and price under least squares learning (dotted line), starting from beliefs that overweight the market price, after 1000 periods of learning.


Figure 18:Volume under least squares learning, starting from beliefs that overweight the market price, after 1000 periods of learning.

Sustaining volume with a constant gain

We can prevent convergence to rational expectations, and the extinction of volume, by assigning agents constant-gain (i.e., γt=γ>0\gamma_t = \gamma^* > 0) versions of the recursive algorithm. By choosing γ\gamma^*, we can control the size of the neighborhood of a rational expectations equilibrium within which the system eventually fluctuates, and with it the average level of volume that the model sustains.

Using a constant-gain algorithm might be a good idea for agents who take the time invariance of their forecasting model with a grain of salt, and who place a premium on adaptability. Moreover, constant-gain algorithms assign sufficiently more weight to recent observations than ordinary least squares does to defeat the forces that generate consistency of ordinary least squares under classical conditions. The stay-on-your-toes spirit of constant-gain algorithms can be an advantage in situations (like this no-trade model with least squares learning) in which one fits a time-invariant model to data whose law of motion is really time-varying.[32]

In giving up the ability to converge, the constant-gain adapter retains an ability to keep up with the times.[33]
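In code, the change relative to the decreasing-gain sketch above amounts to replacing the gain 1/t1/t with a fixed γ\gamma^*; the value 0.05 below is arbitrary and purely illustrative.

```python
import numpy as np

def rls_step_constant_gain(beta, M, z_prev, z_curr, gamma=0.05):
    """Constant-gain variant of rls_step: because the gain no longer shrinks
    with t, old observations are discounted and the estimator never settles."""
    M_new = M + gamma * (np.outer(z_prev, z_prev) - M)
    forecast_error = z_curr - beta @ z_prev
    beta_new = beta + gamma * np.outer(forecast_error,
                                       np.linalg.solve(M_new, z_prev))
    return beta_new, M_new
```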

Learning with an infinite horizon

We have already encountered a couple of situations in which agents want to set their behavior to satisfy an Euler equation, and where they only need to learn about the distribution with respect to which to compute the expectations that appear in their Euler equation. I described how we could have applied this setup to the Markov deficit example, and we actually did the no-trade example in this way. In those examples, because of the short horizons of the agents, there were alternative ways to model learning, e.g. by letting agents ‘learn the hard way’ by observing the past utility-experiences associated with the dynamic plans of their predecessors. Where agents have infinite planning horizons, we are more restricted in how we can model agents’ learning.

To illustrate learning (with coaxing) in a simple infinite-horizon context, this section describes least squares learning in the context of a linear version of Lucas and Prescott’s equilibrium model of investment under uncertainty, an example studied by Marcet and Sargent (1989a). This example has the following features:

(a) Because the horizon is infinite, agents in the model get a lot of coaxing. The model is set up so that agents know most of what they require to make optimal decisions, and only learn about a limited aspect of the system, namely, the law of motion involving an aggregate endogenous state variable. Firms know enough to form and solve their Euler equation, but don’t know the equilibrium conditional distribution of the future values of the output prices that appear on the right side. Firms use a recursively updated estimate of a vector autoregression to solve their Euler equation.

(b) Verifying the convergence of the system is technically difficult because the firms are learning about a ‘moving target,’ a law of motion that is influenced by their own learning behavior.

Investment under uncertainty

A representative firm chooses its capital stock ktk_t to maximize

Et=0δt(ptfkt(d/2)(ktkt1)2),E \sum_{t=0}^\infty \delta^t (p_t f k_t - (d/2)(k_t - k_{t-1})^2),

where δ(0,1)\delta \in (0, 1), f>0f > 0, d>0d > 0, and where ptp_t is the price of a single commodity. The price of the commodity is determined in a competitive market. The demand for the commodity is governed by

pt=A0A1fKt+ut,p_t = A_0 - A_1 f K_t + u_t,

where KtK_t is the average level of capital used to produce output in this market (so that average output is fKtf K_t), and {ut}\{u_t\} is a serially uncorrelated random process with mean zero. Under rational expectations, the firm is supposed to know the law of motion for average capital, namely,

Kt=β0+β1Kt1+vut,K_t = \beta_0 + \beta_1 K_{t-1} + v u_t,

and to use it in conjunction with (76) to forecast prices. Under the assumption that the firm knows the laws (76), (77) governing prices and the market-wide average capital stock, the firm’s problem can be represented as a dynamic programming or discrete-time calculus of variations problem. The Euler equation associated with this problem is

kt=kt1+(f/d)Etj=0δjpt+j,k_t = k_{t-1} + (f/d) E_t \sum_{j=0}^\infty \delta^j p_{t+j},

where EtE_t is the conditional expectation evaluated with respect to the equilibrium distribution generated by (76), (77).

Rational expectations equilibrium as a fixed point

For given values of β0\beta_0, β1\beta_1 in (77), the prediction problem associated with the right side of the Euler equation (78) can be formulated as follows. Represent (77) as

[Kt+11]=[β1β001][Kt1]+[v0]ut+1,\begin{bmatrix} K_{t+1} \\ 1 \end{bmatrix} = \begin{bmatrix} \beta_1 & \beta_0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} K_t \\ 1 \end{bmatrix} + \begin{bmatrix} v \\ 0 \end{bmatrix} u_{t+1},

or

xt+1=βxt+[v0]ut+1,x_{t+1} = \beta x_t + \begin{bmatrix} v \\ 0 \end{bmatrix} u_{t+1},

where xt=[Kt 1]x_t = [K_t \ 1]'. Using (80) to evaluate the conditional expectation, the Euler equation can be represented as

kt=kt1+fd(1δ)A0+fdut[1 0]A1f2d(Iδβ)1[Kt1].k_t = k_{t-1} + \frac{f}{d(1-\delta)} A_0 + \frac{f}{d} u_t - [1 \ 0] \frac{A_1 f^2}{d} (I - \delta \beta)^{-1} \begin{bmatrix} K_t \\ 1 \end{bmatrix}.

Equation (81) summarizes individual firm behavior under the beliefs (77) about the aggregate state KtK_t. We impose equilibrium by setting kt=Ktk_t = K_t, and solving the resulting equation for the actual law of motion for KtK_t induced by the beliefs (77). We obtain

Kt=T1(β1)Kt1+T0(β)+V(β1)ut,K_t = T_1(\beta_1) K_{t-1} + T_0(\beta) + V(\beta_1) u_t,

where

T1(β1)=(1δβ1)1δβ1+A1f2/d.T_1(\beta_1) = \frac{(1 - \delta \beta_1)}{1 - \delta \beta_1 + A_1 f^2/d}.

This construction induces a mapping from a perceived law of motion for KtK_t into an actual one. When firms believe that the law of motion is (77), they act to make the actual law of motion (82). A rational expectations equilibrium is a fixed point of this mapping, namely, a pair {β0,β1}\{\beta_0, \beta_1\} that satisfies

β0=T0(β),β1=T1(β1).\beta_0 = T_0(\beta), \quad \beta_1 = T_1(\beta_1).
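Because (83) gives T1T_1 in closed form, the β1\beta_1 component of the fixed point can be computed by direct iteration, as in the Python sketch below. The parameter values are illustrative choices of mine, not values from the text, and β0\beta_0 can be handled analogously through T0T_0.

```python
# Illustrative parameters (not from the text): discount factor, technology,
# adjustment-cost, and demand-slope coefficients.
delta, f, d, A1 = 0.95, 1.0, 10.0, 0.5

def T1(b1):
    # Actual autoregressive coefficient induced by perceived b1, as in eq. (83).
    return (1 - delta * b1) / (1 - delta * b1 + A1 * f**2 / d)

b1 = 0.0  # arbitrary initial perception
for _ in range(1_000):
    b1_next = T1(b1)
    if abs(b1_next - b1) < 1e-12:
        break
    b1 = b1_next
print(f"fixed point: beta_1 = {b1:.6f}, T1(beta_1) = {T1(b1):.6f}")
```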

Least squares learning

Marcet and Sargent (1989a) describe a version of this model with adaptive agents. Firms formulate and recursively estimate an autoregression of the form (77), using the stochastic approximation algorithm

βt=βt1(1/t)Rt1(xt2(βt1xt2xt1))\beta_t' = \beta_{t-1}' - (1/t) R_t^{-1} (x_{t-2}(\beta_{t-1} x_{t-2} - x_{t-1})')
Rt=Rt1+(1/t)(xt1xt1Rt1).R_t = R_{t-1} + (1/t)(x_{t-1} x_{t-1}' - R_{t-1}).

Firms’ behavior is determined each period by using the estimated vector autoregression to evaluate the conditional expectation on the right side of the Euler equation (78). This behavior causes the actual evolution of the capital stock to be

Kt=T1(βt)Kt1+T0(βt)+V(βt)ut.K_t = T_1(\beta_t) K_{t-1} + T_0(\beta_t) + V(\beta_t) u_t.
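The following stripped-down scalar simulation conveys the flavor of the self-referential system (85)–(87). To keep it short, I suppress constants (setting A0=0A_0 = 0, so β0=0\beta_0 = 0 at the fixed point) and estimate only β1\beta_1; the innovation loading V(β1)=T1(β1)f/dV(\beta_1) = T_1(\beta_1)f/d is my own derivation from (81), and the parameter values are the illustrative ones used above.

```python
import numpy as np

rng = np.random.default_rng(0)
delta, f, d, A1 = 0.95, 1.0, 10.0, 0.5  # illustrative values, as above

def T1(b1):
    return (1 - delta * b1) / (1 - delta * b1 + A1 * f**2 / d)

def V(b1):
    # Innovation loading implied by eq. (81); a derived guess, not from the text.
    return T1(b1) * f / d

b1, R = 0.5, 1.0            # initial belief and second-moment estimate
K_lag2, K_lag1 = 0.1, 0.1   # K_{t-2} and K_{t-1}
for t in range(2, 100_000):
    # stochastic approximation updates (85)-(86), specialized to the scalar case
    R = R + (1.0 / t) * (K_lag1**2 - R)
    b1 = b1 + (1.0 / t) * (K_lag2 / R) * (K_lag1 - b1 * K_lag2)
    # actual law of motion (87) induced by the current belief
    K = T1(b1) * K_lag1 + V(b1) * rng.normal()
    K_lag2, K_lag1 = K_lag1, K
print(f"belief beta_1 = {b1:.4f}, induced T1(beta_1) = {T1(b1):.4f}")
```

At convergence the two printed numbers approximately coincide: the belief chases, and eventually catches, the moving target.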

Technically, this example has many of the features of Bray’s model, with the additional complication that, even in the rational expectations equilibrium (i.e. the system without learning), there is a state variable, KtK_t, that imparts dynamics to the system.

It is about the law of motion of that state variable, and not a fixed mean, that the firms are learning. Nevertheless, very similar considerations govern the convergence of this system to a rational expectations equilibrium.

Marcet and Sargent (1989a) applied methods developed by Lennart Ljung (1977) to describe the sense in which the limiting behavior of the stochastic difference equations (85), (87) is governed by the associated ordinary differential equations

ddt(β1β0)=R1Mx(β)[T1(β)β1T0(β)β0]\frac{d}{dt} \begin{pmatrix} \beta_1 \\ \beta_0 \end{pmatrix} = R^{-1} M_x(\beta) \begin{bmatrix} T_1(\beta) - \beta_1 \\ T_0(\beta) - \beta_0 \end{bmatrix}
ddtR=Mx(β)R,\frac{d}{dt} R = M_x(\beta) - R,

where Mx(β)=ExtxtM_x(\beta) = E x_t x_t' is computed from system (82), evaluated at the fixed vector β\beta. Notice that the rest points of this system are rational expectations equilibria. Stability of this ordinary differential equation system about the rational expectations equilibrium is a necessary condition for the (almost sure) convergence of the stochastic difference equations (85), (87) to the rational expectations equilibrium. Sufficiency is more tenuous and troublesome.[34] [35] Marcet and Sargent studied the technical complications that learning about the law of motion of an endogenous state variable like KtK_t adds to the sort of system studied by Bray.
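The small o.d.e. of footnote 35 suggests a quick numerical stability check: the learning dynamics are locally stable at the fixed point if the derivative of T1(β1)β1T_1(\beta_1) - \beta_1 is negative there. A self-contained sketch, reusing the illustrative parameters above:

```python
delta, f, d, A1 = 0.95, 1.0, 10.0, 0.5  # same illustrative values as above

def T1(b1):
    return (1 - delta * b1) / (1 - delta * b1 + A1 * f**2 / d)

# locate the fixed point by iteration, then check d/db1 [T1(b1) - b1] < 0 there
b1 = 0.0
for _ in range(1_000):
    b1 = T1(b1)
eps = 1e-6
slope = (T1(b1 + eps) - T1(b1 - eps)) / (2 * eps) - 1.0
print("locally stable" if slope < 0 else "locally unstable", f"(slope = {slope:.3f})")
```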

Convergence theorems

I now briefly describe a method that has been used to analyze the limiting properties of models in which the agents’ behavior is determined by their use of adaptive estimators. Such systems have the property that laws of evolution of the endogenous variables are determined in part by the adaptive estimation process. Because the agents are learning about a system that is being influenced by the learning processes of people like themselves, these systems are sometimes called ‘self-referential.’ That the adaptive estimators are not estimating the parameters of a fixed data-generating mechanism means that standard econometric proofs of convergence of estimators (e.g., their consistency and asymptotic efficiency) cannot usually be applied. Instead, another approach based on stochastic approximation methods has increasingly been used.

I shall illustrate the kind of analysis that can be done with stochastic approximation methods in the context of a particular example, namely our model with a stochastic government deficit.[36] Consider a version of that model in which agents are learning via a parametric model which they fit to the distribution of the return on currency. In particular, assume that agents generate forecasts of the rate of return on currency by fitting the parametric model

EˉtRt=f(Gt;θt),\bar{E}_t R_t = f(G_t; \theta_t),

where θt\theta_t is the time tt estimate of the vector of parameters θ\theta in the probability model, and f(;θ)f(\cdot; \theta) is a possibly nonlinear function mapping the government deficit GG into a forecast of the rate of return on currency between tt and t+1t + 1, which we denote EˉtRt\bar{E}_t R_t. We assume that agents use the following stochastic approximation algorithm for estimating θ\theta:

θt+1=θt+(1/t)Mt1(ft)(Rtf(Gt;θt))\theta_{t+1} = \theta_t + (1/t) M_t^{-1} (\nabla f_t)(R_t - f(G_t; \theta_t))
Mt+1=Mt+(1/t)((ft)(ft)Mt),M_{t+1} = M_t + (1/t)((\nabla f_t)(\nabla f_t)' - M_t),

where ft\nabla f_t is the gradient of ff with respect to θ\theta evaluated at θt\theta_t and GtG_t. Recall that (91) simply implements a recursive nonlinear least squares algorithm.
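Here is a sketch of one step of such an algorithm for a hypothetical quadratic forecast family f(G,θ)=θ0+θ1G+θ2G2f(G, \theta) = \theta_0 + \theta_1 G + \theta_2 G^2; the family and its gradient are my illustrative choices, while the updating order (the θ\theta update uses MtM_t before it is revised) follows (91).

```python
import numpy as np

def f(G, theta):
    # Hypothetical parametric forecast family: quadratic in the deficit G.
    return theta @ np.array([1.0, G, G**2])

def grad_f(G, theta):
    # Gradient of f with respect to theta (exact for this linear-in-theta family).
    return np.array([1.0, G, G**2])

def sa_step(theta, M, G, R_obs, t):
    """One step of algorithm (91): update theta using the current M (which
    should be initialized positive definite, e.g. the identity), then update M
    from the outer product of the gradient."""
    g = grad_f(G, theta)
    theta_new = theta + (1.0 / t) * np.linalg.solve(M, g) * (R_obs - f(G, theta))
    M_new = M + (1.0 / t) * (np.outer(g, g) - M)
    return theta_new, M_new
```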

The model has the self-referential property that, when agents forecast according to this rule and the θ\theta’s are updated according to (91), the optimal forecast takes the form

EtRt=h(Gt),E_t R_t = h(G_t),

where

Rt=h(Gt)+ut,R_t = h(G_t) + u_t,

where EtE_t is the conditional expectation operator, h()h(\cdot) is a function mapping GtG_t into the least squares forecast EtRtE_t R_t, and {ut}\{u_t\} is a random process with the property that utu_t is orthogonal to every (Borel measurable) function of GtG_t.[37] The function h(G)h(G), which depends on (θt,Mt)(\theta_t, M_t), is determined implicitly by the process of solving the model.[38]

Substituting (94) into (91), the recursive learning algorithm can be written

θt+1=θt+(1/t)Mt1(ft)(h(Gt)+utf(Gt;θt))\theta_{t+1} = \theta_t + (1/t) M_t^{-1} (\nabla f_t)(h(G_t) + u_t - f(G_t; \theta_t))
Mt+1=Mt+(1/t)((ft)(ft)Mt).M_{t+1} = M_t + (1/t)((\nabla f_t)(\nabla f_t)' - M_t).

An associated ordinary differential equation

We want to study the behavior of the system formed by the assumed exogenous Markov process for government expenditures GG, which, together with equations (94) and (95), determines the evolution of (Gt,Rt,θt,Mt)(G_t, R_t, \theta_t, M_t). In particular, we want to find conditions under which this system converges to an asymptotically stationary system in which the parameters determining beliefs (θ,M)(\theta, M) stop moving. When convergence does occur, we want to describe how the resulting limit point relates to the concept of a rational expectations equilibrium.

Arguments in the spirit of Ljung and Söderström (1983) and Kushner and Clark (1978) can be used to show that the limiting behavior of the system of stochastic difference equations defined by the Markov process for GG and equations (94) and (95) is determined by an associated system of ordinary differential equations. This associated differential equation is derived by conducting the following mental experiment. Temporarily suspend the operation of system (95), and consider the system operating with a fixed θ\theta forever. Assume that this system converges to a unique invariant distribution,[39] and let h(Gt)=T(θ)(Gt)h^*(G_t) = T(\theta)(G_t) be the conditional expectation of RtR_t evaluated with respect to this invariant distribution. Form the associated differential equation system:

(d/dt)θ=E(M1f(T(θ)(G)f(G,θ)))(d/dt)\theta = E(M^{-1} \nabla f (T(\theta)(G) - f(G, \theta)))
(d/dt)M=E((f)(f))M,(d/dt)M = E((\nabla f)(\nabla f)') - M,

where EE is the unconditional expectation operator evaluated with respect to the asymptotic stationary distribution associated with the fixed parameter vector θ\theta. The system of ordinary differential equations (97) is formed mechanically by taking expectations of the objects ‘to the right of (1/t)(1/t)’ in equation system (95), and using the resulting expectations to estimate the average motion of (θ,M)(\theta, M) over small intervals of time dtdt. The expectations are taken with respect to the stationary distribution associated with a fixed θ\theta.[40]

We also consider the smaller ordinary differential equation system

(d/dt)θ=E((f)(T(θ)(G)f(G,θ))).(d/dt)\theta = E((\nabla f)(T(\theta)(G) - f(G, \theta))).
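The mental experiment of holding θ\theta fixed and averaging can be mimicked by Monte Carlo, as in the sketch below; T_of_theta and draw_G stand in for the model-specific map T(θ)T(\theta) and the invariant distribution of GG, and both are placeholders rather than constructions given in the text.

```python
import numpy as np

def mean_dynamics(theta, T_of_theta, draw_G, f, grad_f, n=100_000):
    """Monte Carlo approximation to the right side of the small o.d.e. (99):
    E[ grad_f(G, theta) * (T(theta)(G) - f(G, theta)) ], with G drawn from the
    invariant distribution that prevails when theta is held fixed."""
    total = np.zeros_like(np.asarray(theta, dtype=float))
    for _ in range(n):
        G = draw_G()
        total += grad_f(G, theta) * (T_of_theta(G) - f(G, theta))
    return total / n
```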

Propositions

Several properties of systems like this have recurred in a variety of contexts, among the important ones being:

(a) The rest points of the ordinary differential equation (o.d.e.) system (97) satisfy

E(f)(T(θ)(G)f(G,θ))=0E(\nabla f)(T(\theta)(G) - f(G, \theta)) = 0
E(f)(f)=M.E(\nabla f)(\nabla f)' = M.

If the family of functions f(G,θ)f(G, \theta) includes a rational expectations equilibrium, the first equation of (100) can be satisfied by T(θ)(G)f(G,θ)=0T(\theta)(G) - f(G, \theta) = 0, in which case we have a rational expectations equilibrium as a rest point of (97) or (99). If the family does not include a rational expectations equilibrium, then the first equation of (100) identifies a set of orthogonality conditions that are the first-order necessary conditions for a special approximation problem. If the function T(θ)T(\theta) were independent of θ\theta (which it usually is not), then these equations would be orthogonality conditions for the problem: find the value of θ\theta that makes the function f(G,θ)f(G, \theta) best approximate the fixed function T(G)T(G), where the approximation criterion is the mean square difference between the functions

ET(G)f(G,θ)2.E|T(G) - f(G, \theta)|^2.

The approximation problem is unusual because θ\theta determines the approximating function ff and also influences the function being approximated h=T(θ)h^* = T(\theta). This aspect of the approximation problem reflects the self-referential property of the system.

(b) If the estimators (θt,Mt)(\theta_t, M_t) converge, they converge to a rest point of the ordinary differential equation (97).

(c) If a fixed point of the ordinary differential equation (97) is locally unstable, then the estimator θt\theta_t cannot converge to that fixed point.

(d) Suppose that the ordinary differential equation (97) is globally stable about a unique rest point. Then there exists a modification of the recursive algorithm for θt\theta_t, MtM_t which converges almost surely to the rest point.[41]

(e) Convergence theorems require that {γt}\{\gamma_t\} look like {1/t}\{1/t\}. Convergence will not occur with ‘constant-gain’ versions of the algorithm.[42]

(f) Few results are available on rates of convergence. However, theorems described by Benveniste, Métivier, and Priouret (1990) can sometimes be used to show that a necessary and sufficient condition for T\sqrt{T}-convergence of θt\theta_t to the fixed point is that the real parts of the eigenvalues of the Jacobian of the linear approximation to the small o.d.e. (99) at the fixed point are all less than 1/2-1/2. Notice that this condition is stronger than the necessary condition for ‘local stability’ of the algorithm at the fixed point, namely that these same eigenvalues have negative real parts. A small numerical check of both conditions is sketched below.
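Concretely, for a Jacobian of the linearized small o.d.e. supplied by the user:

```python
import numpy as np

def convergence_conditions(J):
    """Given the Jacobian J of the small o.d.e. (99) linearized at the fixed
    point, check real parts of eigenvalues: below 0 for local stability,
    below -1/2 for sqrt(T)-convergence of the estimator."""
    re = np.linalg.eigvals(np.asarray(J, dtype=float)).real
    return {"locally_stable": bool(np.all(re < 0.0)),
            "root_T_convergence": bool(np.all(re < -0.5))}
```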

Propositions like (a), (b), (c), and (e) are not difficult to obtain, and can be expected to apply across a wide variety of models, both linear models like those studied by Marcet and Sargent (1989a, 1989b, 1992) and nonlinear ones like the ones studied by Woodford (1990). Propositions like (d) are harder to obtain, and often involve delicate and involved computations to verify assumptions sufficient to assure almost sure convergence. The amount of work to be done depends on the details of the device that is used to force the algorithm infinitely often into the domain of attraction of a fixed point. So far very little formal work has been done along the lines of proposition (f) about rates of convergence.[43] [44]

Conclusions

The examples in this chapter all take an environment that had been studied under rational expectations and add a source of transient dynamics coming from adaptive least squares learning. The dynamics are transient because the ‘fundamentals’ in these environments are time-invariant, and because the adaptive algorithms we have given our boundedly rational agents eventually settle upon good time-invariant decision rules for those environments.[45]

Are transient dynamics created in this way likely to be a useful addition to the list of ways that applied economists have of inducing dynamics? Among the principal mechanisms through which applied economists induce dynamics are:

(a) Capital (physical and human).

(b) Costs of adjustment.

(c) Serially correlated exogenous processes and disturbances.

(d) Information structures that induce agents to solve signal extraction problems or incentive problems.

I suspect that it is too early to add the sort of transient dynamics described in this chapter to this list of workhorses in applied economic dynamics. However, the examples in this chapter can teach us various things.

  1. Beyond exhibiting the structure of a model of market equilibrium with dynamic supply behavior under least squares learning, Bray’s model displays circumstances under which at least one class of plausible adaptive algorithms eventually converges to rational expectations.

  2. The model with a stochastic government deficit sensitizes us to the issue that how fast adaptive agents can be expected to learn to have rational expectations depends on how complicated is the stochastic environment they must learn about, and how much prior information they are endowed with by way of a parametric form to learn about. In particular, how fast adaptive agents learn depends on how complicated is the government policy regime.

  3. Adaptive algorithms are in principle capable of resolving indeterminacies in some rational expectations models like our exchange rate model, but in a very tenuous way. In our exchange rate example, the equations determining the limiting behavior of the system leave the exchange rate indeterminate (because those equations just recover the logic of exchange rate indeterminacy), but the adaptive algorithms assign enough force to initial conditions and to ‘history’ to determine the exchange rate path. By sufficiently tying down the expectations process that was left underdetermined by the rational expectations equilibrium, the adaptive mechanism selects an exchange rate path. This may seem a weak reed on which to base exchange rate determination.[46]

  4. Adaptive learning provides enough friction temporarily to break the logic of the no-trade theorem, and so to provide a model of trading volume. This is one of a class of examples in which incorporating adaptive agents would serve, at least temporarily, to modify or take the edge off very sharp predictions that arise in some rational expectations models. One can imagine similarly shading the sharp rational expectations results in environments that give rise to Ricardian or Modigliani–Miller results for government monetary-fiscal operations.

  5. A comparison of the Marcet-Sargent example with some of the earlier ones shows that there are many choices to be made in endowing our artificial agents with adaptive algorithms. These choices supply differing amounts of ‘coaxing’ to our boundedly rational agents.

In the next two chapters I shall describe more potential uses of models of bounded rationality.

Footnotes
  1. This model has the special feature that, in the rational expectations equilibrium, the unconditional expectation equals the conditional expectation, a consequence of there being no time-varying state variables in the rational expectations equilibrium.

  2. Marcet and Sargent (1989a) show that the limiting behavior of β\beta is governed by the associated differential equation (d/dt)β=a+(b1)β(d/dt)\beta = a + (b - 1)\beta, which is stable for b<1b < 1. The right hand side of this differential equation can be expressed as T(β)βT(\beta) - \beta, where T(β)T(\beta) is the mapping from the perceived forecast of prices β\beta to the optimal forecast of prices a+bβa + b\beta. Stephen DeCanio (1979) and George Evans (1985, 1989) used the operator T(β)T(\beta) to define a notion of expectational stability. Marcet and Sargent (1989a) described a sense in which the operator T(β)βT(\beta) - \beta governs the convergence of least squares learning schemes in a class of models.

  3. Jasmina Arifovic (1991) has studied a version of Bray’s model in which Bray’s representative least squares learner is replaced by a population of heterogeneous agents with heterogeneous beliefs. She applied a genetic algorithm to this environment, and found that the population can sometimes learn its way to a rational expectations equilibrium even when Bray’s necessary condition b<1b < 1 is violated.

  4. See Mark Feldman (1987) for a study of the convergence of a model with a collection of Bayesian agents who start out with divergent priors. Also, see El-Gamal and Sundaram (1993).

  5. Marcet and Sargent (1989b) describe systems with heterogeneous beliefs in which heterogeneity remains in the rational expectations equilibrium because people are assumed to be differentially informed.

  6. Models of this class typically have equilibria outside the class of stationary ‘fundamental’ equilibria that we are focusing on. In addition to a class of non-stationary equilibria that David Gale (1973) studied, Azariadis, Guesnerie, Cass and Shell have studied sunspot equilibria for such models. Michael Woodford (1990) and George Evans (1989) have studied how collections of agents using least squares learning schemes can converge to a sunspot equilibrium. To study this question, Evans used a distinction between ‘strong’ and ‘weak’ expectational stability, which turns on whether convergence is robust to overparameterizing the perceived autoregressive moving average process relative to an equilibrium process. ‘Strong stability’ is the property that convergence to a rational expectations equilibrium occurs even when agents overparameterize the law of motion they are learning about.

  7. See Cass and Shell (1983), Azariadis (1981), and Azariadis and Guesnerie (1986).

  8. In models with agents who have an infinite horizon, it will obviously not work to let agents see and base adaptation of decisions on realizations of the infinite-horizon utility functional. Adaptation in settings with infinite-horizon agents has been modelled by endowing agents with versions of ‘adaptive control’ algorithms in which adaptation is confined to learning about a rule for forecasting state variables that are not controllable by the agent. The agent simply resolves a dynamic programming problem at each point in time with a revised forecasting rule. See Marcet and Sargent (1989a, 1989b) for some examples of such setups.

  9. The initial conditions for the saving rates can be read from the graphs.

  10. See Evans and Honkapohja (1993b) for a discussion of some of the features of constant gain algorithms.

  11. See Marcet and Marshall (1992) and Sargent (1991). Also, see Kenneth Judd (1990, 1992) for descriptions of a variety of numerical methods for computing approximate equilibria.

  12. A byproduct of setting things up in this way is the alternation of turns between odd and even sets of agents. This ‘two-population’ feature of the learning algorithm duplicates or resembles the experimental environments of Marimon and Sunder (1992) and Arifovic (1993), to be discussed in the next chapter.

  13. See Kenneth Judd (1992) for a critical survey of and guide to such methods.

  14. See Marcet and Marshall (1992) for a formal analysis of the algorithm.

  15. Chen and White (1992, 1993) have attained results on rates of convergence of such nonparametric estimators under assumptions permitting less feedback from agents’ behavior to outcomes than the present example admits.

  16. The environment is the one studied by Kareken and Wallace (1981).

  17. These are given by 2U/st2=u(w1st)+u(w2+λtstp1t/p1t+1+(1λt)stp2t/p2t+1)×(λtp1t/p1t+1+(1λt)p2t/p2t+1)2\partial^2 U / \partial s_t^2 = u''(w_1 - s_t) + u''(w_2 + \lambda_t s_t p_{1t}/p_{1t+1} + (1 - \lambda_t) s_t p_{2t}/p_{2t+1}) \times (\lambda_t p_{1t}/p_{1t+1} + (1 - \lambda_t) p_{2t}/p_{2t+1})^2; 2U/λt2=u(w2+λtstp1t/p1t+1+(1λt)stp2t/p2t+1)×st(p1t/p1t+1p2t/p2t+1)2\partial^2 U / \partial \lambda_t^2 = u''(w_2 + \lambda_t s_t p_{1t}/p_{1t+1} + (1 - \lambda_t) s_t p_{2t}/p_{2t+1}) \times s_t (p_{1t}/p_{1t+1} - p_{2t}/p_{2t+1})^2; 2U/(λtst)=u(w2+λtstp1t/p1t+1+(1λt)stp2t/p2t+1)st×(p1t/p1t+1p2t/p2t+1)(λtp1t/p1t+1+(1λt)p2t/p2t+1)+u(w2+λtstp1t/p1t+1+(1λt)stp2t/p2t+1)×(p1t/p1t+1p2t/p2t+1)\partial^2 U / (\partial \lambda_t \partial s_t) = u''(w_2 + \lambda_t s_t p_{1t}/p_{1t+1} + (1 - \lambda_t) s_t p_{2t}/p_{2t+1}) s_t \times (p_{1t}/p_{1t+1} - p_{2t}/p_{2t+1})(\lambda_t p_{1t}/p_{1t+1} + (1 - \lambda_t) p_{2t}/p_{2t+1}) + u'(w_2 + \lambda_t s_t p_{1t}/p_{1t+1} + (1 - \lambda_t) s_t p_{2t}/p_{2t+1}) \times (p_{1t}/p_{1t+1} - p_{2t}/p_{2t+1}).

  18. Algorithm (49) can be modified to incorporate ‘simulated annealing’ by replacing γτ\gamma_\tau by γτa=(1+ζτ)γτ\gamma_\tau^a = (1 + \zeta_\tau)\gamma_\tau everywhere in the algorithm, where ζτ\zeta_\tau is a random variable with mean zero. We can implement a so-called ‘constant-gain’ algorithm by setting γτ=γ0\gamma_\tau = \gamma_0, a constant. A constant-gain algorithm with γτ=1\gamma_\tau = 1 and with the initial value of RR set equal to the Hessian implements Newton–Raphson.

  19. We set the parameters of the model at w1=20w_1 = 20, w2=15w_2 = 15, H1=100H_1 = 100, H2=120H_2 = 120.

  20. It is possible to construct stochastic versions of this model in which the exchange rate is path-dependent in the sense that realizations emanating from identical initial conditions would eventually converge to different exchange rates because of the different realizations of the random processes impinging on the system’s transient dynamics.

  21. Evans and Honkapohja (1993) describe how adaptive learning rules resolve equilibrium indeterminacy problems: ‘models with multiple solutions are converted into models with path dependence in which the trajectory of the economy, and the [rational expectations equilibrium] attained in the limit, are determined through a learning rule by initial forecasts and by the sequence of exogenous shocks.’

  22. ‘Purely speculative trading’ means that all insurance and consumption-smoothing reasons for trading are assumed absent.

  23. Milgrom and Stokey describe a related no-trade theorem.

  24. Hussman’s work is related to work by Jiang Wang (1990). For other work on methods of circumventing the no trade theorem, see Harald Uhlig (1992).

  25. The important thing is that the supply is fixed over time.

  26. Either assume that all distributions are multivariate normal or restrict decision rules to be linear.

  27. Applying the law of iterated expectations to (62), and noting that E(E(ytzat))=E(E(ytzbt))=αE(E(y_t \mid z_{at})) = E(E(y_t \mid z_{bt})) = \alpha, implies (1+ϕb/ϕa)α=0(1 + \phi_b/\phi_a)\alpha = 0, or α=0\alpha = 0.

  28. The vector ζjt\zeta_{jt} is the innovation vector.

  29. The work described in this section is from Hussman and Sargent (1993).

  30. Parameter values were set at ρ1=0.8\rho_1 = 0.8, ρ2=0.4\rho_2 = 0.4, ϕ=1\phi = 1, R=1.1R = 1.1. Constants have been omitted from the dividend process, with the consequence that equilibrium prices fluctuate around zero. By adding constants, we could make prices fluctuate around a positive number.

  31. The price differences show higher-than-normal kurtosis, which decreases toward three as the system converges to a rational expectations equilibrium.

  32. In the model of Sims and Chung to be described in Chapter 7, we shall see a situation in which using a ‘random coefficients’ specification enhances a government policy-maker’s adaptability and sometimes leads to superior outcomes, relative to those implied by ‘decreasing-gain’ specifications.

  33. There are alternative ways to break the no-trade theorem. One class of alternatives would alter the environment to restore non-speculative motives for asset trades, e.g. via endowment heterogeneity coupled with consumption smoothing motives. Another class of explanations would retain the only-speculative motive assumption, but would model trading processes explicitly in such a way that positive volume and lack of full revelation of information would be the outcome of implementing one of the auction mechanisms analyzed, say, by Gresik and Satterthwaite (1989). I don’t know whether the learning route described in the text is more promising than these alternatives.

  34. The sufficient conditions for convergence that have been discovered to date involve adding some side conditions to the least squares algorithm designed to insure that the altered algorithm visits the basin of attraction of the fixed point of the operator T(β)T(\beta) infinitely often. See Ljung (1977), Ljung and Söderström (1983), and Marcet and Sargent (1989a) for a discussion of various ways of modifying the algorithm. What is needed to get the stochastic approximation approach to yield almost sure convergence to a fixed point of the o.d.e. is some device that assures that the algorithm infinitely often visits the domain of attraction of the fixed point of the o.d.e.

  35. Marcet and Sargent also study the sense in which the local stability of the learning scheme is governed by the smaller o.d.e. ddt(β1β0)=[T1(β)β1T0(β)β0]\frac{d}{dt} \begin{pmatrix} \beta_1 \\ \beta_0 \end{pmatrix} = \begin{bmatrix} T_1(\beta) - \beta_1 \\ T_0(\beta) - \beta_0 \end{bmatrix}.

  36. This section is based on Marcet and Sargent (1989a) and Woodford (1990). Bullard and Duffy (1993) study least squares learning in an economy with overlapping generations of nn-period lived agents. For n4n \geq 4, they find that least squares learning fails to converge locally to a rational expectations equilibrium. Also see Bullard (1991) for a discussion of how complicated nonlinear dynamics can sometimes emerge out of least squares learning.

  37. This property of utu_t identifies h()h(\cdot) as the conditional expectation function.

  38. The function hh is understood to embed the dependence of θt+1\theta_{t+1} on θt\theta_t, MtM_t via equation (91). The price level pt+1p_{t+1} depends on θt+1\theta_{t+1} because θt+1\theta_{t+1} influences savings behavior at t+1t + 1.

  39. i.e., a unique asymptotic stationary distribution.

  40. Notice the ‘mean field theory’ flavor of this approach: approximating deterministic dynamics are being used to study aspects of an underlying stochastic process.

  41. The modifications are devices that ‘project’ the estimator back into the intersection of the domain of attraction of the fixed point with the set of values of θ\theta for which the system converges to an asymptotically stationary distribution for RtR_t, GtG_t. See footnote 34 for references on ways of modifying the algorithm so that it visits the domain of attraction of the fixed point of the o.d.e. infinitely often.

  42. The most that can be hoped for with constant-gain versions of the algorithm is convergence in a stochastic sense of visiting a specified neighborhood of a fixed point of the o.d.e. with a relative frequency that depends, among other things, on the gain parameter γ\gamma.

  43. See Marcet and Sargent (1992) for an analysis of rates of convergence in a particular model, with part of the analysis being based on the theorems of Benveniste et al., and another part being based on Monte Carlo methods. Also see Ljung, Pflug, and Walk (1992).

  44. See Chung-Ming Kuan (1989) and Mohr (1990) for useful early contributions. Marcet and Sargent (1993) state a proposition about a rate of convergence.

  45. The exceptions are the constant-gain algorithms.

  46. Put differently, a regime that allows the exchange rate to be history-dependent seems to be an ill-formed mechanism.