Lecture 8: Heterogeneous-agent economies, Part I: Invariant measures

SIMON FRASER UNIVERSITY
BURNABY
BRITISH COLUMBIA
Paul Klein
Office: WMC 3635
Phone: TBA
Email: paul klein [email protected]
URL: http://paulklein.ca/newsite/teaching/809.php
Economics 809
Advanced macroeconomic theory
Spring 2014
Suppose we are in a situation where something (say income or wealth) gets jumbled in a random
but systematic way across individuals in each period. Random, because no one knows exactly
who is going to get rich quick and who is going to end up on skid row. Systematic, because the
same fraction of agents switch from a given level to another given level in each period.
Under what circumstances can we be sure that the economy will converge to a unique distribution of income or wealth? That is the topic of this set of lecture notes. Among economists, the
treatment by Stokey and Lucas (1989) has become the standard introduction. They establish
the following three conditions as sufficient:
1. The Feller property, for existence.
2. The mixing property, for uniqueness.
3. Monotonicity, for convergence.
A bit later on, we will look at other sufficient conditions, including Theorem 11.12 in Stokey
and Lucas (1989) and the result in Hairer (2006).
In any case, we begin by describing formally what it means to jumble in a random but systematic
way.
1 Probability transition functions
Let (S, d) be a metric space. Let τ be the topology generated by d and let B be the Borel
σ–algebra generated by τ . Let Q(s, A) be a mapping from S × B into [0, 1] such that
1. For each fixed s ∈ S, Q(s, ·) is a probability measure and
2. For each fixed A, Q(·, A) is a B–measurable function.
We call such a Q a probability transition function. In what follows, we will essentially be
talking about the following difference equation
µt+1(A) = ∫_S Q(s, A) dµt(s)        (1)
where {µt } is a sequence of probability measures on (S, B). We are interested in the conditions
under which a sequence {µt } satisfying (1) converges to a unique limiting probability measure,
independently of the initial probability measure µ0 .
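In the finite-state case, equation (1) is just multiplication of a row vector by a transition matrix, and the iteration can be watched converging directly. Here is a minimal numerical sketch (the three-state chain below is an arbitrary illustrative example, not taken from these notes):

```python
import numpy as np

# A finite-state probability transition function: Q[i, j] = Pr(next = j | current = i).
Q = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])

def T_star(mu):
    # One step of the difference equation (1): mu_{t+1}(A) = sum_s Q(s, A) mu_t(s).
    return mu @ Q

mu = np.array([1.0, 0.0, 0.0])   # initial measure: all mass on state 0
for _ in range(200):
    mu = T_star(mu)

# mu is now (numerically) invariant: applying T* once more barely moves it.
residual = np.max(np.abs(T_star(mu) - mu))
```

Starting from any other µ0 gives the same limit here, which is exactly the uniqueness-and-convergence property the rest of these notes is after.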
Having defined the transition function Q(s, A), we can define other objects in terms of it. For
example, suppose s is transformed into s′ according to Q and you want to know E[f (s′ )|s]. Or
you want to know the average of f (s) when s is distributed according to µ. Or the average of
f (s′ ) when s is distributed according to µ.
Define the operators T and T* in terms of Q via the following. Let µ : B → [0, 1] be a probability measure and let f : S → R be a bounded or non–negative measurable function.

(T f)(s) = ∫_S f(y) Q(s, dy)

(T*µ)(A) = ∫_S Q(s, A) dµ(s).
Moreover, we sometimes write
(f, µ) = ∫_S f dµ.
Here is an important fact about probability transition functions, stated as Theorem 8.3 in
Stokey and Lucas (1989).
Theorem. Let T and T ∗ be defined as above and let f be a non–negative measurable function
on (S, B) into R. Then
∫_S (T f) dµ = ∫_S f d(T*µ)
for any probability measure µ on B.
Proof. By the Monotone Convergence Theorem, it suffices to show it for a simple function.
We show it here for an indicator function. For the rest, see Stokey and Lucas (1989). Let
A ∈ B and let f = IA . Then
(T f )(s) = Q(s, A)
and the left hand side becomes
∫_S Q(s, A) dµ.
Meanwhile, the right hand side is

∫_S IA(s) d(T*µ) = (T*µ)(A) = ∫_S Q(s, A) dµ.
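On a finite state space the theorem reduces to associativity of matrix multiplication: µ(Qf) = (µQ)f. A quick numerical check (all numbers are arbitrary illustrations):

```python
import numpy as np

Q = np.array([[0.7, 0.3],
              [0.4, 0.6]])        # Q[s, s'] = Pr(s' | s)
f = np.array([1.5, -2.0])         # a bounded measurable function on S = {0, 1}
mu = np.array([0.25, 0.75])       # a probability measure on S

Tf = Q @ f                        # (T f)(s) = sum over s' of f(s') Q(s, s')
T_star_mu = mu @ Q                # (T* mu)(A) = sum over s of Q(s, A) mu(s)

lhs = mu @ Tf                     # integral of T f with respect to mu
rhs = T_star_mu @ f               # integral of f with respect to T* mu
```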
2 Existence of an invariant (stationary) measure
Definition. The function Q is said to have the Feller property if
(T f)(s) = ∫_S f(y) Q(s, dy)
is bounded and continuous whenever f is bounded and continuous.
Proposition. The following properties are equivalent to the Feller property.
1. sn → s implies Q(sn , ·) converges weakly to Q(s, ·).
2. µn converges weakly to µ implies that T*µn converges weakly to T*µ.
3. For each open set A ∈ τ , the function Q(·, A) is lower semicontinuous, i.e. for each
a ∈ [0, 1], the set {s ∈ S : Q(s, A) > a} is open.
We will define the notion of weak convergence of measures in a moment. But before we do
that, we will define the central concept of this Lecture: an invariant measure.
Definition. A Q–invariant measure is a probability measure such that, for all B ∈ B, we have
µ(B) = ∫_S Q(s, B) dµ(s).
In shorthand notation:
T*µ = µ.
We have the following existence result for invariant measures.
Proposition. Let (S, d) be a compact metric space and suppose Q has the Feller property.
Then there is an invariant measure.
Proof. See Aliprantis and Border (1994).
To understand better what this means, let’s consider two cases where there does not exist an
invariant measure.
Example. Let S = N = {1, 2, 3, . . .} and define d via
d(n, m) = |n − m|.
Notice that this metric generates the discrete topology. Let Q be given by
Q(n, A) =
  1 if (n + 1) ∈ A,
  0 otherwise.
This Q has no invariant measure. This is possible since S is not compact. However, we can
make it compact by adding one point and defining a new metric.
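The failure is easy to see numerically: under this Q probability mass marches off to infinity, so no distribution can reproduce itself. A sketch tracking a finite window of states (the window length 50 is an arbitrary choice):

```python
import numpy as np

N = 50
mu = np.zeros(N)
mu[0] = 1.0                      # all mass on the first state

def shift(mu):
    # Q(n, A) = 1 iff n + 1 is in A: the whole distribution moves one step right.
    out = np.zeros_like(mu)
    out[1:] = mu[:-1]
    return out

for _ in range(N):               # after N steps all mass has left the window
    mu = shift(mu)

escaped = 1.0 - mu.sum()         # mass that has "escaped to infinity"
```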
Example. Let S = Z+ = {0, 1, 2, 3, . . .} and define d via
 1
− 1 if n ̸= 0 and

m
n



1
if n = 0 and
m
d(n, m) =
1

if n ̸= 0 and

n


0
if n = 0 and
m ̸= 0
m ̸= 0
m=0
m = 0.
The topology generated by this metric is the one associated with the Alexandrov one–point
compactification of N with 0 as the point at infinity. That is, the open sets are all the subsets
of N and sets of the form
{0} ∪ K^c

where K ⊂ N is a finite set.¹ It follows that (S, d) is a compact metric space. Notice that all the singletons are open sets except {0}, which is not open. No matter how small you choose ε > 0, there is a point n ∈ S such that d(0, n) < ε but n ∉ {0}.
Now let Q be defined as in the previous example and it should be fairly clear that, even though
S is compact, there is still no invariant measure. (For a proof of this, see Aliprantis and Border
(1994).) So it must be that Q is not Feller. We will show this in two ways. First, we show that
there is an open set A ⊂ S such that Q(·, A) is not lower semicontinuous. Choose A = {1}.
This is an open set. Meanwhile, the inverse image of (1/2, 1] under Q(·, A) is the set {0}, which is not open.
On the other hand, consider the bounded and continuous function f : S → R defined via
f(n) =
  1/n if n ≠ 0,
  0   if n = 0.
(The reader should verify that this really is a continuous function.) Meanwhile, applying the
operator T to this function, we get
(T f)(n) = 1/(n + 1).
This is not a continuous function. To see this, consider the inverse image of the open set (3/4, 5/3).
Alternatively, consider the sequence nt = t. Apparently

lim_{t→∞} nt = 0

and

lim_{t→∞} (T f)(nt) = 0.

Yet (T f)(0) = 1 ≠ 0, so T f is not continuous at n = 0.
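The whole counterexample can be checked numerically: in the compactified metric a large n is close to 0, yet T f jumps at 0. A sketch (the choice t = 10^6 is arbitrary):

```python
def f(n):
    # Continuous on the compactified space: f(n) -> 0 = f(0) as n -> "infinity" = 0.
    return 0.0 if n == 0 else 1.0 / n

def Tf(n):
    # Q(n, .) puts all mass on n + 1, so (T f)(n) = f(n + 1); note Tf(0) = f(1) = 1.
    return f(n + 1)

def d(n, m):
    # The one-point-compactification metric from the notes.
    if n != 0 and m != 0:
        return abs(1.0 / m - 1.0 / n)
    if n == 0:
        return 0.0 if m == 0 else 1.0 / m
    return 1.0 / n                     # case n != 0, m = 0

t = 10**6
gap_in_metric = d(0, t)                # tiny: t is close to 0 in this metric
gap_in_values = abs(Tf(t) - Tf(0))     # close to 1: T f is discontinuous at 0
```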
We are not just interested in existence, however. We want uniqueness and convergence, too.
By convergence, we mean that, if we define the probability measure µ0 arbitrarily and define
µt+1 = T*µt
¹ More generally, the Alexandrov one–point compactification of a topological space (X, τ) is the set X ∪ {∞} together with the following topology. A set A is open either if A ∈ τ or if there is a compact set K ⊂ X such that K^c ∈ τ and A = {∞} ∪ K^c. Of course, the requirement that K^c be open is redundant if (X, τ) is Hausdorff.
then µt will converge to a probability measure µ such that T ∗ µ = µ. But what are we to mean
by convergence? After all, we haven’t defined a topology for measures, although we could (see
Aliprantis and Border (1994)). Anyhow, we will discuss two convergence concepts for measures,
and we will call them “weak” and “strong”. This more or less corresponds to the terminology in
the mathematics literature, but the important thing to know is that strong convergence implies
weak convergence.
3 Weak convergence
How might one define convergence of measures? Here’s one possibility.
Silly definition (too weak and too strong at the same time). The sequence of measures
{µn } is said to converge to the limit measure µ if, for each B ∈ B(S)
lim_{n→∞} µn(B) = µ(B).
One problem with this definition is that it is in one sense too strong. Under this definition,
there can be a sequence of random variables {Xn } that converges in probability to the random
variable X but the distribution measures of {Xn } do not converge to the distribution measure
of X. The following definition does not have that problem.
Pretty good (but weak) definition. The sequence of measures {µn } is said to converge
(weakly, vaguely) to the limit measure µ if, for each bounded and continuous function f : S →
R, we have
lim_{n→∞} ∫_S f dµn = ∫_S f dµ.
We now introduce the mixing property, which is there to ensure uniqueness.
Definition. Suppose S = [a1 , b1 ] × [a2 , b2 ] × . . . × [an , bn ] is a rectangle in Rn . We write
S = [a, b]. Then Q is said to exhibit the mixing property if there is a c ∈ S and an N ≥ 1
such that
Q^N(a, [c, b]) > 0 and Q^N(b, [a, c]) > 0
where Q² is defined via

Q²(x, A) = ∫_S Q(y, A) Q(x, dy)

and Q^N is defined recursively in the same way.
As a counter–example, consider a two–state Markov chain with the transition probability matrix
Γ =
[ 1  0 ]
[ 0  1 ]
so that, for sure, you stay in whatever state you started. In this case, any probability measure
is invariant, so there is existence but not uniqueness.
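A two-line check that every measure is a fixed point of T* here (the two test measures are arbitrary):

```python
import numpy as np

Gamma = np.eye(2)                 # "stay put" transition matrix

mu1 = np.array([0.9, 0.1])
mu2 = np.array([0.3, 0.7])

# Both (and indeed any) probability vectors are invariant: uniqueness fails.
check1 = bool(np.allclose(mu1 @ Gamma, mu1))
check2 = bool(np.allclose(mu2 @ Gamma, mu2))
```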
Finally, we introduce a property that will ensure convergence. For this property (monotonicity)
to be defined, we need to equip S with an order.
Definition. An order on a set X is a relation ≥ satisfying, for all x, y, z ∈ X,
• If x ≥ y and y ≥ z then x ≥ z (transitivity),
• if x ≥ y and y ≥ x then x = y (antisymmetry), and
• x ≥ x (reflexivity).
Definition. An ordered pair (X, ≥) where X is a set and ≥ is an order on X is called a
partially ordered set, or just an ordered set.
Definition. A function f : X → R where (X, ≥) is an ordered set is said to be increasing if
f (y) ≥ f (x) whenever y ≥ x.
Definition. A transition function Q is said to be monotone if the associated operator T has
the property that for any bounded, increasing real function f , T f is also increasing.
This definition captures the idea that, if the current value is high, you are more likely to get
a high value in the future as well. Positive autocorrelation, if you like. A counterexample is a
two–state Markov chain with the transition probability matrix
Γ =
[ 0  1 ]
[ 1  0 ]
In this case, there is a unique invariant measure (µ1 = µ2 = 1/2), but unless you start there
you don’t converge to it. You keep switching back and forth.
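Numerically (the starting measure is arbitrary, as long as it is not (1/2, 1/2)):

```python
import numpy as np

Gamma = np.array([[0.0, 1.0],
                  [1.0, 0.0]])    # "always switch" transition matrix

mu = np.array([1.0, 0.0])
history = [mu.copy()]
for _ in range(4):
    mu = mu @ Gamma
    history.append(mu.copy())

# (1/2, 1/2) is invariant, but the iterates from (1, 0) oscillate with period 2.
invariant_ok = bool(np.allclose(np.array([0.5, 0.5]) @ Gamma, [0.5, 0.5]))
oscillates = bool(np.allclose(history[0], history[2])
                  and not np.allclose(history[0], history[1]))
```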
Theorem [12.12 in Stokey and Lucas (1989)]. Let S ⊂ Rn be a compact rectangle. Associate
it with the Euclidean metric and the following order: x ≥ y if x_i ≥ y_i for all i = 1, 2, . . . , n. If
Q is monotone, has the Feller property and exhibits the mixing property then there is a unique
probability measure µ such that
T*µ = µ

and a sequence of measures defined via

µt+1 = T*µt
will converge weakly to µ no matter what the initial probability measure µ0 is.
But what if S is not a rectangle (but nevertheless compact)? Apparently Hopenhayn and
Prescott (1992) discuss this, but that paper is nearly impossible to read. However, more easily
digestible results are available for a stronger notion of convergence, and that is where we will
turn our attention now.
4 Strong convergence
In this section, to avoid cluttering the notation, the operator that we have so far called T* will be called T.
4.1 Review and generalization of Banach’s fixed point theorem
Let (S, d) be a complete metric space and let T : S → S be a contraction, i.e. suppose there is
a 0 ≤ β < 1 such that
d(T r, T s) ≤ βd(r, s)
for all r, s ∈ S. Then Banach’s fixed point theorem says that T has a unique fixed point s∗
and, moreover, for every s ∈ S we have
lim_{n→∞} T^n s = s∗.
Generalization: Suppose there is an m such that T^m is a contraction. Then the result of Banach’s theorem still goes through. The proof comes in two parts. First we show that the unique fixed point p of T^m is also a fixed point of T. Since T^m is a contraction, there is a 0 ≤ β < 1 such that

d(T^m x, T^m y) ≤ βd(x, y)

for all x, y ∈ S. In particular, it is true for x = T p and y = p. Thus

d(T^{m+1} p, T^m p) ≤ βd(T p, p).

But we also have, of course, that

d(T^{m+1} p, T^m p) = d(T p, p)

since p is a fixed point of T^m. But these two things can only be true simultaneously if d(T p, p) = 0. Hence T fixes p. This fixed point of T is unique; if there were another, it would also be fixed under T^m and that is impossible.

The second part of the proof is to show that T^n x converges for all x and that the limit is a fixed point of T. Left for the reader.
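A concrete finite-dimensional illustration of the generalization (the affine map below is my own toy example, not from the notes): T v = Av + b with ∥A∥ = 2, so T is not a contraction in the Euclidean metric, but A² = 0, so T² is, and iteration from any starting point still converges to the unique fixed point.

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [0.0, 0.0]])        # operator norm 2, but A @ A = 0 (nilpotent)
b = np.array([1.0, 1.0])

def T(v):
    return A @ v + b

# T expands some pairs of points at distance 1 to distance 2, so T is not a contraction...
expansion = np.linalg.norm(T(np.array([0.0, 1.0])) - T(np.zeros(2)))

# ...but T o T is a constant map (contraction modulus 0), and iteration converges anyway.
v = np.array([100.0, -50.0])
for _ in range(5):
    v = T(v)

fixed_point = np.linalg.solve(np.eye(2) - A, b)   # the unique v with T v = v
```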
4.2 Introducing the total variation metric
Let (X, F) be a measurable space and consider the set M of probability measures µ on F.
To make M into a metric space, we define the following “total variation” metric.
d_TV(µ, λ) = 2 · sup_{A∈F} |µ(A) − λ(A)|.
The above formulation is the most straightforward way of defining the total variation (though
the 2 is admittedly awkward). Here’s another one.
Proposition 1 Let (X , F) be a measurable space and let M be the set of probability measures
µ : F → [0, 1]. Let µ and λ be arbitrary members of M. Define dTV via
d_TV(µ, λ) = 2 · sup_{A∈F} |µ(A) − λ(A)|.
Let η be an arbitrary positive measure on (X , F) such that µ and λ are absolutely continuous
with respect to it. Define the Radon-Nikodym derivatives fµ and fλ via

fµ = dµ/dη,    fλ = dλ/dη.

Define

d(µ, λ) = ∫_X |fµ − fλ| dη.
Then, regardless of the specific choice of η, d_TV(µ, λ) = d(µ, λ) for all probability measures µ and λ. Thus we can define

d_TV(µ, λ) = ∫_X |fµ − fλ| dη.
Proof. First, notice that
∫_A (fµ − fλ) dη = µ(A) − λ(A).
In particular, let
A = {x ∈ X : fµ (x) ≥ fλ (x)} .
It should be clear that, at this A, µ(A) − λ(A) achieves its supremum and at A^c it achieves its infimum. Because ν(A) = 1 − ν(A^c) for any probability measure ν, the modulus is of course the same at each of these sets. Meanwhile,
∫_X |fµ − fλ| dη = ∫_A |fµ − fλ| dη + ∫_{A^c} |fµ − fλ| dη = 2 · (µ(A) − λ(A)).
From now on, we will denote the total variation distance simply by d.
A corollary of this proposition is that (M, d) is a complete metric space. Proof: Let µn be a sequence of probability measures that is Cauchy in the total variation distance and let η be defined by η = Σ_{n>0} 2^{−n} µn. Then each of the µn is absolutely continuous with respect to η. Since (why?) the total variation distance is equal to the L1(X, F, η) distance between the corresponding Radon-Nikodym derivatives, the result thus follows from the completeness of L1(X, F, η).
Notice also that we can extend M to a vector space (addition and scalar multiplication being
defined in the obvious way) and thus we can think of the metric d as having been generated by
the following norm:
∥µ∥ = 2 · sup_{A∈F} µ(A).
This of course is a pretty boring norm for probability measures (it’s guaranteed to equal two),
but it’s more interesting if we apply it to linear combinations of probability measures which
are not necessarily themselves probability measures.
Lemma 1 Define
(µ ∧ λ)(A) = ∫_A min{ dµ/d(λ+µ), dλ/d(λ+µ) } d(λ + µ).
Then
d(µ, λ) = 2 − 2(µ ∧ λ)(X ).
Proof. For any two non–negative numbers, |x − y| = x + y − 2 min{x, y}. Apply this pointwise to the densities of µ and λ with respect to λ + µ and integrate over X.
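On a finite state space the three characterizations of the total variation distance (the sup formula, the L1 formula of Proposition 1, and the overlap formula of Lemma 1) can be compared directly. A quick check with arbitrary numbers:

```python
import numpy as np

mu  = np.array([0.5, 0.3, 0.2])
lam = np.array([0.2, 0.3, 0.5])

# (i) 2 * sup over sets: the sup is attained at A = {i : mu_i >= lam_i},
#     as in the proof of Proposition 1.
A = mu >= lam
tv_sup = 2.0 * (mu[A].sum() - lam[A].sum())

# (ii) L1 distance between densities (here eta is counting measure).
tv_l1 = np.abs(mu - lam).sum()

# (iii) Lemma 1: d(mu, lam) = 2 - 2 (mu ^ lam)(X).
tv_overlap = 2.0 - 2.0 * np.minimum(mu, lam).sum()
```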
4.3 Application of Banach’s fixed point theorem to Markov processes
Let P (x, A) be a probability transition function and define T : M → M via
(T µ)(A) = ∫_X P(x, A) dµ(x).
We are looking for sufficient conditions under which any sequence of probability measures µn
satisfying
µn+1 = T µn
for all n = 1, 2, . . . converges to the unique fixed point of T. For that it is sufficient that T is a contraction. Indeed, it is sufficient that T^n is a contraction for some n.
Incidentally, note that

(T^n µ)(A) = ∫_X P^n(x, A) dµ(x),

where P^n is defined via

P^{n+1}(x, A) = ∫_X P^n(y, A) P(x, dy).
Now recall what I said in my notes on Markov chains:
A somewhat tighter result on uniqueness and convergence of probability measures is available. For the sequence P^n µ to converge to a unique stationary measure π for any distribution µ it is necessary and sufficient that P^n has a row without zeros for some n. This means that there is at least one state i and at least one n such that state i is reachable from every state in n steps. See Stokey and Lucas (1989).
This result can be generalized to the infinite-state-space case. The following proposition establishes a sufficient condition for existence, uniqueness and convergence that reduces to the
above condition in the finite-state-space case. (Can you verify that reduction?)
Proposition 2 Let P be a probability transition function on (X , G). Let (M, d) be the metric
space of probability measures on (X , G) with the total variation metric. Let T : M → M be
defined via
(T µ)(A) = ∫_X P(x, A) dµ(x).        (2)
Suppose there is an α > 0 and a probability measure η ∈ M such that, for all x ∈ X and all
A ∈ G we have
P (x, A) ≥ αη(A).
Then T is a contraction. Hence, by Banach’s fixed point theorem, we have the following result.
For any µ0 ∈ M we have T^n µ0 → µ∗ strongly, where µ∗ is the unique member of M such that T µ∗ = µ∗.
Proof. Note that the assumption implies that T µ ≥ αη for every probability measure µ ∈ M. We can therefore define probability measures T̄µ via

T µ = αη + (1 − α)T̄µ.        (3)

Now let µ and ν be two arbitrary probability measures on X. Using Lemma 1 we can define the probability measures µ̄ and ν̄ via

µ = µ ∧ ν + (d(µ, ν)/2) · µ̄

and

ν = µ ∧ ν + (d(µ, ν)/2) · ν̄.

One then has

d(T µ, T ν) = (d(µ, ν)/2) · d(T µ̄, T ν̄).

It follows from Equation (3) that

d(T µ̄, T ν̄) = d(αη + (1 − α)T̄µ̄, αη + (1 − α)T̄ν̄) = (1 − α)d(T̄µ̄, T̄ν̄) ≤ 2(1 − α),

where we have used the fact that the distance between two probability measures can never exceed 2. Combining these results, we have

d(T µ, T ν) ≤ (1 − α)d(µ, ν)

so that T is a contraction. The result now follows from Banach’s fixed point theorem.
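For a finite chain the minorization hypothesis of Proposition 2 holds with α equal to the sum of the column-wise minima of the transition matrix (and η the normalized minima), and one step of T then contracts the total variation distance by at least the factor 1 − α. A sketch (the matrix is an arbitrary illustrative example):

```python
import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])   # P[x, y] = Pr(y | x)

col_min = P.min(axis=0)           # P(x, {y}) >= col_min[y] for every x
alpha = col_min.sum()             # here 0.2 + 0.3 + 0.2 = 0.7
eta = col_min / alpha             # the minorizing probability measure

def tv(p, q):
    return np.abs(p - q).sum()

mu = np.array([1.0, 0.0, 0.0])
nu = np.array([0.0, 0.0, 1.0])
contraction_ratio = tv(mu @ P, nu @ P) / tv(mu, nu)   # should be <= 1 - alpha
```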
An even tighter result is stated as Theorem 11.12 in Stokey and Lucas (1989), providing a
necessary and sufficient condition for (strong) convergence. It says the following.
Proposition 3 Let P be a probability transition function on (X , G). Let (M, d) be the metric
space of probability measures on (X , G) with the total variation metric. Let T : M → M be
defined via
(T µ)(A) = ∫_X P(x, A) dµ(x).        (4)
Suppose there is an integer N and a real number α > 0 such that for all A ∈ G we have either

P^N(x, A) ≥ α for all x ∈ X

or

P^N(x, A^c) ≥ α for all x ∈ X.
(Call this condition M.) Then T^N is a contraction. Hence, by Banach’s fixed point theorem, we have the following result. For any µ0 ∈ M we have T^n µ0 → µ∗ strongly, where µ∗ is the unique member of M such that T µ∗ = µ∗. Meanwhile, if T^n µ0 → µ∗ strongly for every µ0 in M, then there is an integer N ≥ 1 such that P^N(x, A) satisfies condition M.
Proof. See Stokey and Lucas (1989).
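The finite-state reduction of condition M mentioned earlier, namely that some state is reachable from every state in N steps, is easy to test mechanically. With the row-stochastic convention P[i, j] = Pr(j | i) this means that P^N has a column with no zeros (a "row without zeros" if the matrix is written transposed). The chain below is an arbitrary illustration:

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])

def everywhere_reachable(P, N):
    # True iff some column of P^N is strictly positive, i.e. some state
    # can be reached from every state in exactly N steps.
    PN = np.linalg.matrix_power(P, N)
    return bool((PN > 0).all(axis=0).any())

# Smallest N that works for this chain.
found_N = next(N for N in range(1, 10) if everywhere_reachable(P, N))
```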
In the next set of lecture notes, we’ll be able to verify that these sufficient conditions apply in
Huggett (1993), a workhorse model for macroeconomists. In that model, there is a point in
the state space such that you reach it with strictly positive probability in finitely many periods
no matter where you start. This means that it is easy to verify either Hairer’s condition or
condition M.
References
Aliprantis, C. and K. Border (1994). Infinite Dimensional Analysis, A Hitchhiker’s Guide.
Springer Verlag.
Hairer, M. (2006). Ergodic properties of Markov processes. http://www.warwick.ac.uk/~masdbd/notes/Markov.pdf.
Hopenhayn, H. and E. C. Prescott (1992). Stochastic monotonicity and stationary distributions
for dynamic economies. Econometrica 60, 1387–1406.
Huggett, M. (1993). The risk-free rate in heterogeneous-agent incomplete-insurance economies.
Journal of Economic Dynamics and Control 17 (5/6), 953–970.
Stokey, N. L. and R. E. Lucas with E. C. Prescott (1989). Recursive Methods in Economic
Dynamics. Harvard University Press.