A Proof of Birkhoff’s Ergodic Theorem
Joseph Horan
September 2, 2015
1 Introduction
In Fall 2013, I was learning the basics of ergodic theory, and I came across this theorem. One of my
supervisors, Anthony Quas, showed me this proof, as communicated to him by a colleague, Mate Wierdl.
It’s hard to find it in print, so here it is.
2 The Theorem and the Proof
Theorem 2.1 (Birkhoff). Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving system on a $\sigma$-finite measure space, and let $f$ be an integrable function. Then
\[
  \lim_{n\to\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k(x))
\]
exists for almost every $x \in X$, and the resulting function $\tilde f$ is in $L^1$, with $\tilde f \circ T = \tilde f$ almost everywhere and $\|\tilde f\|_1 \le \|f\|_1$. If $\mu(X) < \infty$, then
\[
  \int_X \tilde f \, d\mu = \int_X f \, d\mu.
\]
Corollary 2.2. If $(X, \mathcal{B}, \mu, T)$ is ergodic, then $\tilde f$ as obtained above is almost everywhere constant, and if $\mu(X) < \infty$, then
\[
  \tilde f = \frac{1}{\mu(X)} \int_X f \, d\mu
\]
almost everywhere.
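As a numerical sanity check of Corollary 2.2 (not part of the proof), here is a minimal Python sketch. The system is my own choice for illustration: rotation of $[0, 1)$ by the golden-ratio conjugate with Lebesgue measure, which is ergodic, and the observable $f(x) = \cos^2(2\pi x)$, whose space average is $1/2$. The names `rotation`, `observable` and `birkhoff_average` are hypothetical.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # assumed rotation angle (irrational)

def rotation(x):
    """T(x) = x + ALPHA mod 1, which preserves Lebesgue measure on [0, 1)."""
    return (x + ALPHA) % 1.0

def observable(x):
    """f(x) = cos^2(2*pi*x); its integral over [0, 1) is 1/2."""
    return math.cos(2 * math.pi * x) ** 2

def birkhoff_average(f, T, x, n):
    """Return (1/n) * sum_{k=0}^{n-1} f(T^k(x))."""
    total = 0.0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n

# Time averages along one orbit, for increasing n.
for n in (10, 100, 1000, 10000):
    print(n, birkhoff_average(observable, rotation, 0.1, n))
```

The printed averages should settle near the space average $1/2$. For this particular rotation the convergence happens at every starting point, which is stronger than the almost-everywhere statement of the theorem.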
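Proof of Birkhoff's Ergodic Theorem. We split the proof into two parts: first, assuming the almost everywhere existence of the limit of the ergodic averages, we prove that it has the requisite properties. Second, we prove that the limit exists for all $L^1$ functions.
So for now, let $f \in L^1(\mu)$ and assume that the limit
\[
  \tilde f(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x))
\]
exists for almost every $x \in X$. As a limit of sums of measurable functions, $\tilde f$ is measurable, and by Fatou's lemma, we have:
\begin{align*}
  \int_X |\tilde f| \, d\mu
  &= \int_X \Bigl| \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) \Bigr| \, d\mu(x) \\
  &\le \liminf_{n\to\infty} \int_X \Bigl| \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) \Bigr| \, d\mu(x) && \text{(Fatou's Lemma)} \\
  &\le \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \int_X |f(T^i(x))| \, d\mu(x) && \text{(triangle inequality)} \\
  &= \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \int_X |f(x)| \, d\mu(x) && \text{(invariance)} \\
  &= \liminf_{n\to\infty} \frac{n}{n} \|f\|_1 = \|f\|_1.
\end{align*}
Hence $\tilde f$ is integrable, with $\|\tilde f\|_1 \le \|f\|_1$.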
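Precomposing by $T$ gives us:
\begin{align*}
  \tilde f(T(x))
  &= \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^{i+1}(x))
   = \lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} f(T^i(x)) \\
  &= \lim_{n\to\infty} \Bigl( \underbrace{\frac{-f(x)}{n}}_{\to 0} + \underbrace{\frac{n+1}{n}}_{\to 1} \cdot \frac{1}{n+1} \sum_{i=0}^{n} f(T^i(x)) \Bigr) \\
  &= \tilde f(x).
\end{align*}
So we have that $\tilde f$ is $T$-invariant.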
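Lastly for this part, suppose that $\mu(X) < \infty$. We will show that for any bounded $f \in L^1(\mu)$, we have
\[
  \int_X \tilde f \, d\mu = \int_X f \, d\mu,
\]
and bootstrap to show it for unbounded $f$. So let $f \in L^1(\mu)$ be bounded, with $|f| \le C < \infty$. Then we have
\[
  \Bigl| \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) \Bigr| \le \frac{n}{n} C = C,
\]
and
\[
  \int_X C \, d\mu = C \int_X d\mu = C\mu(X) < \infty,
\]
so $g \equiv C$ is an integrable majorant for this sequence, which allows us to apply the Lebesgue Dominated Convergence Theorem:
\begin{align*}
  \int_X \tilde f \, d\mu
  &= \int_X \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) \, d\mu \\
  &= \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \int_X f(T^i(x)) \, d\mu
   = \lim_{n\to\infty} \frac{n}{n} \int_X f \, d\mu
   = \int_X f \, d\mu,
\end{align*}
using the $T$-invariance of $\mu$ in the last steps. Now let $f$ be unbounded, and let $\varepsilon > 0$; we may find $f_B \in L^1(\mu)$ which is bounded, with $\|f - f_B\|_1 < \frac{\varepsilon}{2}$. By the norm estimate above, applied to $f - f_B$ (whose limit function is $\tilde f - \tilde f_B$),
\[
  \|\tilde f - \tilde f_B\|_1 \le \|f - f_B\|_1 < \frac{\varepsilon}{2}.
\]
We then approximate the difference of the integrals, using the bounded case to see that the middle term vanishes:
\begin{align*}
  \Bigl| \int_X \tilde f \, d\mu - \int_X f \, d\mu \Bigr|
  &\le \Bigl| \int_X \tilde f \, d\mu - \int_X \tilde f_B \, d\mu \Bigr|
     + \Bigl| \int_X \tilde f_B \, d\mu - \int_X f_B \, d\mu \Bigr|
     + \Bigl| \int_X f_B \, d\mu - \int_X f \, d\mu \Bigr| \\
  &= \Bigl| \int_X (\tilde f - \tilde f_B) \, d\mu \Bigr| + \Bigl| \int_X (f_B - f) \, d\mu \Bigr| \\
  &\le \int_X |\tilde f - \tilde f_B| \, d\mu + \int_X |f_B - f| \, d\mu \\
  &= \|\tilde f - \tilde f_B\|_1 + \|f - f_B\|_1 < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{align*}
Since $\varepsilon$ was arbitrary, we have
\[
  \int_X \tilde f \, d\mu = \int_X f \, d\mu,
\]
as desired.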
We now only have to prove that for every $f \in L^1(\mu)$, the ergodic averages converge almost everywhere. We say that such an $f$ satisfies the BET (Birkhoff's Ergodic Theorem). There are four main steps of the proof, together with some minor arguments.
1. Prove a maximal ergodic lemma for $\ell^1(\mathbb{Z})$.
2. Use this lemma to prove a maximal ergodic lemma for $L^1(X, \mu)$.
3. Show that the subset of $L^1$ functions satisfying the BET is closed in $L^1$.
4. Find a dense subset of $L^1$ for which the BET holds.
Everything will be done for real-valued functions; once it is proven, the result holds for a complex-valued function by writing the function as a sum of a real function and another real function multiplied by the imaginary unit. As well, we will do the main proof in the case of a probability space, and then show how to extend the result to a $\sigma$-finite space. Or at least, we would, if I knew how to do it. See Halmos's Lectures on Ergodic Theory or similar for a proof that works in the $\sigma$-finite case. Sorry.
Step 1: This lemma is of a more combinatorial nature.
Lemma 2.3 (Maximal Ergodic Lemma for $\ell^1(\mathbb{Z})$). Let $a \in \ell^1(\mathbb{Z})$, and denote
\[
  M_a(n) = \sup_{j \ge 1} \frac{1}{j} \sum_{i=0}^{j-1} |a(n+i)|.
\]
Then we have, for any positive real number $\lambda$,
\[
  |\{n : M_a(n) \ge \lambda\}| \le \frac{\|a\|_1}{\lambda}.
\]
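Proof. First, assume that the statement holds when $\lambda = 1$. Now, let $\lambda > 0$ be fixed, and let $\tilde a = \frac{1}{\lambda} a$. Then we have:
\[
  |\{n : M_a(n) \ge \lambda\}| = |\{n : M_{\tilde a}(n) \ge 1\}| \le \|\tilde a\|_1 = \frac{1}{\lambda} \|a\|_1.
\]
Hence we need only prove the claim for $\lambda = 1$. Note that because $a \in \ell^1(\mathbb{Z})$, $\|a\|_1 = \sum_{i \in \mathbb{Z}} |a(i)| < \infty$, so $M_a(n)$ is actually a maximum, because the averaging factor $\frac{1}{j}$ forces the terms towards 0.
Let $0 < \varepsilon < 1$. Again because $a \in \ell^1(\mathbb{Z})$, we find $M \in \mathbb{R}$ and indices $k_1 \le k_2 \in \mathbb{Z}$ such that $\|a\|_1 \ge M > \|a\|_1 - \varepsilon$, and
\[
  M = \sum_{i=k_1}^{k_2} |a(i)|.
\]
This says that almost all of the mass is contained between two indices. We observe that for $n < k_1 - (M + 1)$, there is a bound on the value of $M_a(n)$. For $j = 1, \dots, k_1 - n$, the window $\{n, \dots, n + j - 1\}$ lies entirely to the left of $k_1$, so we have
\[
  \frac{1}{j} \sum_{i=0}^{j-1} |a(n+i)| < \frac{\varepsilon}{j} < 1,
\]
and for $j > k_1 - n$, we have that $j > k_1 - n > M + 1$, by the choice of $n$, so that
\[
  \frac{1}{j} \sum_{i=0}^{j-1} |a(n+i)| < \frac{1}{M+1} (\varepsilon + M) < \frac{M+1}{M+1} = 1.
\]
Hence $M_a(n) < 1$ for $n < k_1 - (M + 1)$. For $n > k_2$, we clearly have $M_a(n) < 1$, as less than $\varepsilon$ of the weight is after $k_2$. Therefore $|\{n : M_a(n) \ge 1\}|$ is finite. Denote $A = \{n : M_a(n) \ge 1\}$. If $A$ is empty there is nothing to prove, so assume it is not.
Let $n_0 = \min\{n \in A\}$, and define
\[
  j_0 = \max\Bigl\{ j \ge 1 : \frac{1}{j} \sum_{m=0}^{j-1} |a(n_0 + m)| \ge 1 \Bigr\},
\]
which we can do because $M_a(n_0)$ is an honest maximum and the averages tend to 0. Then inductively define $n_i = \min\{n \in A : n \ge n_{i-1} + j_{i-1}\}$, and
\[
  j_i = \max\Bigl\{ j \ge 1 : \frac{1}{j} \sum_{m=0}^{j-1} |a(n_i + m)| \ge 1 \Bigr\},
\]
for $i = 1, \dots, k$; the process stops since $A$ is finite. The sets $\{n_i, \dots, n_i + j_i - 1\}$ are disjoint, which means
\[
  \tilde A = \bigcup_{i=0}^{k} \{n_i, \dots, n_i + j_i - 1\}
\]
is a disjoint union, and we see that $A \subset \tilde A$. This implies:
\begin{align*}
  \|a\|_1
  &\ge \sum_{i=0}^{k} \underbrace{\sum_{m=0}^{j_i - 1} |a(n_i + m)|}_{\ge j_i}
   \ge \sum_{i=0}^{k} j_i
   = \sum_{i=0}^{k} |\{n_i, \dots, n_i + j_i - 1\}| \\
  &= |\tilde A| \ge |A| = |\{n : M_a(n) \ge 1\}|,
\end{align*}
and so we are done.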
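To make the covering argument concrete, here is a small Python sketch that computes $M_a(n)$ exactly for a finitely supported sequence, lists the set $A$ for $\lambda = 1$, and builds the blocks $(n_i, j_i)$ greedily. It assumes $a$ is given as a list of integers supported on $0, \dots, \mathrm{len}(a) - 1$; the function names and the sample sequence are hypothetical.

```python
from fractions import Fraction
from math import ceil, floor

def window_average(a, n, j):
    """(1/j) * sum_{i<j} |a(n+i)| for a sequence supported on 0..len(a)-1."""
    total = sum(abs(a[n + i]) for i in range(j) if 0 <= n + i < len(a))
    return Fraction(total) / j

def maximal(a, n):
    """M_a(n). Windows reaching past the support only add zeros,
    so it is enough to look at j <= len(a) - n."""
    longest = max(len(a) - n, 1)
    return max(window_average(a, n, j) for j in range(1, longest + 1))

def exceedance_set(a, lam=1):
    """Sorted list of all n with M_a(n) >= lam."""
    norm = sum(abs(v) for v in a)
    # For n < -norm/lam every average is at most norm / (1 - n) < lam.
    lowest = -ceil(Fraction(norm) / Fraction(lam))
    return [n for n in range(lowest, len(a)) if maximal(a, n) >= lam]

def greedy_blocks(a):
    """Blocks (n_i, j_i) of the covering argument, for lam = 1."""
    norm = sum(abs(v) for v in a)
    blocks, next_free = [], None
    for n in exceedance_set(a, 1):
        if next_free is not None and n < next_free:
            continue  # n already lies inside the previous block
        # Any window with average >= 1 has length j <= norm.
        j = max(j for j in range(1, floor(norm) + 1)
                if window_average(a, n, j) >= 1)
        blocks.append((n, j))
        next_free = n + j
    return blocks

a = [3, 0, 0, 1, 2]  # sample sequence, ||a||_1 = 6
print(exceedance_set(a, 1), greedy_blocks(a))
```

For this sample the blocks cover $A$ and their total length does not exceed $\|a\|_1$, which is exactly the chain of inequalities at the end of the proof.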
Step 2: This is a so-called 'transference principle', where we obtain the result in a more complicated space by bootstrapping from the $\ell^1(\mathbb{Z})$ case.
Lemma 2.4 (Maximal Ergodic Lemma for $L^1(X, \mu)$). Let $(X, \mathcal{B}, \mu, T)$ be a measure-preserving system, and let $f$ be an integrable function. Define, for each $x \in X$,
\[
  M f(x) = \sup_{j \ge 1} \frac{1}{j} \sum_{i=0}^{j-1} |f(T^i(x))|.
\]
Then for any positive real number $\lambda$, we have
\[
  \mu(\{x : M f(x) \ge \lambda\}) \le \frac{\|f\|_1}{\lambda}.
\]
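Proof. The same proof as in the $\ell^1(\mathbb{Z})$ case applies to show that it suffices to prove the lemma for $\lambda = 1$. So assume $\lambda = 1$. We shall perform two truncations, to allow us to deal with finite summations. Let $K \in \mathbb{N}^+$, and define $f_{x,K} : \mathbb{Z} \to \mathbb{R}$ by
\[
  f_{x,K}(n) =
  \begin{cases}
    f(T^n(x)) & 0 \le n \le K - 1, \\
    0 & n < 0 \text{ or } n > K - 1.
  \end{cases}
\]
Then for $\mu$-almost every $x \in X$, $f_{x,K} \in \ell^1(\mathbb{Z})$ as it has only finitely many nonzero terms (it isn't summable everywhere because $f$ could be infinite on a set $B$ of measure zero), and for a fixed $n$ with $0 \le n \le K - 1$, $f_{\cdot,K}(n) = f(T^n(\cdot))$ is both measurable and integrable (the latter by $T$-invariance of $\mu$). Because $\mathbb{Z}$ is countable, we see that $f_{\cdot,K}(\cdot)$ is a measurable function on $X \times \mathbb{Z}$.
For any function $g \in L^1(\mu)$, and $N \in \mathbb{N}^+$, we define $M_N g : X \to \mathbb{R}$, by
\[
  M_N g(x) = \max_{1 \le j \le N} \frac{1}{j} \sum_{i=0}^{j-1} |g(T^i(x))|.
\]
Note that as a maximum of sums of measurable functions, $M_N f$ is measurable. We also have that
\[
  M f(x) = \lim_{N\to\infty} M_N f(x),
\]
by definition, so that $M f$ is measurable. Now, observe that for $N_1 < N_2$, we have $M_{N_1} f(x) \le M_{N_2} f(x)$, so that for each $t > 0$ we have an increasing chain of subsets of $X$, given by $\{x : M_N f(x) \ge t\}$, whose union contains $\{x : M f(x) > t\}$. Suppose we were able to prove that
\[
  \mu(\{x : M_N g(x) \ge 1\}) \le \|g\|_1
\]
for all $N$ and all integrable $g$. Applying this to $g = f/t$ gives $\mu(\{x : M_N f(x) \ge t\}) \le \|f\|_1 / t$, and then for any $0 < t < 1$, continuity of the measure $\mu$ along chains gives:
\[
  \mu(\{x : M f(x) \ge 1\}) \le \mu(\{x : M f(x) > t\}) \le \lim_{N\to\infty} \mu(\{x : M_N f(x) \ge t\}) \le \frac{\|f\|_1}{t}.
\]
Letting $t \to 1$ then shows the result for $M f$. So it is now our goal to show that the result holds for any $N$.
Fix $N$ and let $K \gg N$. For $0 \le n \le K - N$ and $1 \le j \le N$, every index $n + i$ with $0 \le i \le j - 1$ lies in $\{0, \dots, K - 1\}$, so none of the terms in the sums below are truncated, and the following holds:
\[
  M_N f(T^n(x)) = \max_{1 \le j \le N} \frac{1}{j} \sum_{i=0}^{j-1} |f(T^i(T^n(x)))| = \max_{1 \le j \le N} \frac{1}{j} \sum_{i=0}^{j-1} |f_{x,K}(n + i)| \le M_{f_{x,K}}(n),
\]
where we use the notation from the previous lemma for summable functions. This allows us to apply Lemma 2.3, to obtain (writing $\|\cdot\|_c$ for the norm on $\ell^1(\mathbb{Z})$ and $c$ for counting measure on $\mathbb{Z}$):
\[
  \|f_{x,K}\|_c \ge |\{n : M_{f_{x,K}}(n) \ge 1\}| \ge |\{0 \le n \le K - N : M_N f(T^n(x)) \ge 1\}|.
\]
We may now prove the lemma:
\begin{align*}
  K \|f\|_1
  &= \sum_{n=0}^{K-1} \int_X |f(x)| \, d\mu(x)
   = \sum_{n=0}^{K-1} \int_X |f(T^n(x))| \, d\mu(x) && \text{(by $T$-invariance of $\mu$)} \\
  &= \sum_{n=0}^{K-1} \int_X |f_{x,K}(n)| \, d\mu(x)
   = \iint_{X \times \mathbb{Z}} |f_{x,K}(n)| \, d(\mu \times c)(x, n) \\
  &= \int_X \sum_{n=0}^{K-1} |f_{x,K}(n)| \, d\mu(x)
   = \int_X \|f_{x,K}\|_c \, d\mu(x) && \text{(by Fubini)} \\
  &\ge \int_X |\{0 \le n \le K - N : M_N f(T^n(x)) \ge 1\}| \, d\mu(x) \\
  &= \int_X \sum_{n=0}^{K-N} 1_{\{M_N f \ge 1\}}(T^n(x)) \, d\mu(x) \\
  &= \sum_{n=0}^{K-N} \int_X 1_{\{M_N f \ge 1\}}(T^n(x)) \, d\mu(x) && \text{(by Fubini)} \\
  &= \sum_{n=0}^{K-N} \int_X 1_{\{M_N f \ge 1\}}(x) \, d\mu(x) && \text{($T$-invariance of $\mu$)} \\
  &= (K - N + 1) \, \mu(\{x : M_N f(x) \ge 1\}).
\end{align*}
Hence we obtain $\mu(\{x : M_N f(x) \ge 1\}) \le \frac{K}{K - N + 1} \|f\|_1$ for all $K \ge N$, and letting $K \to \infty$ gives $\mu(\{x : M_N f(x) \ge 1\}) \le \|f\|_1$ for every $N$. Thus we get
\[
  \mu(\{x : M f(x) \ge 1\}) \le \|f\|_1
\]
by the earlier remark.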
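The maximal inequality can be checked exactly on a toy system. The sketch below is an illustration under my own assumptions, not part of the proof: $X = \mathbb{Z}/m\mathbb{Z}$ with normalised counting measure and $T(x) = x + 1 \bmod m$, and $f$ given as a list of integers. On such a cycle the supremum over $j \ge 1$ is attained for some $j \le m$, since an average over $qm + r$ terms is a weighted mean of the full-cycle average and an average over $r$ terms. The names are hypothetical.

```python
from fractions import Fraction

def maximal_on_cycle(f, x):
    """M f(x) for T(x) = x + 1 mod m; the sup over j >= 1 is attained for j <= m."""
    m = len(f)
    best, total = Fraction(0), 0
    for j in range(1, m + 1):
        total += abs(f[(x + j - 1) % m])
        best = max(best, Fraction(total, j))
    return best

def check_maximal_inequality(f, lam):
    """Return the exceedance set, its measure, and the bound ||f||_1 / lam."""
    m = len(f)
    exceed = [x for x in range(m) if maximal_on_cycle(f, x) >= lam]
    measure = Fraction(len(exceed), m)                 # normalised counting measure
    bound = Fraction(sum(abs(v) for v in f), m) / lam  # ||f||_1 / lam
    return exceed, measure, bound

exceed, measure, bound = check_maximal_inequality([3, 0, 0, 1, 2, 0], 2)
print(exceed, measure, bound)
```

Here the set where $M f \ge 2$ has measure $1/3$, below the bound $\|f\|_1 / 2 = 1/2$ given by Lemma 2.4.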
Step 3: We now show that the set of $L^1$ functions satisfying the BET is closed. Suppose that $\{f_k\}_{k=1}^{\infty} \subset L^1(\mu)$ with each $f_k$ satisfying the BET, with the corresponding $T$-invariant function denoted $\tilde f_k$, and $f_k \to f$ in $L^1$ as $k \to \infty$. Denote
\[
  \overline{A} f(x) = \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)), \qquad
  \underline{A} f(x) = \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)).
\]
Lemma 2.5. We have $\mu(\{x : \overline{A} f(x) - \underline{A} f(x) > 0\}) = 0$, and hence the limit function $\tilde f$ exists almost everywhere on $X$.
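Proof. We may bound the difference of $\overline{A} f(x)$ and $\underline{A} f(x)$. For any $k$, we have, by applying inequalities involving limit supremums and infimums:
\begin{align*}
  \overline{A} f(x) - \underline{A} f(x)
  &= \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) - \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) \\
  &= \limsup_{n\to\infty} \Bigl( \frac{1}{n} \sum_{i=0}^{n-1} f_k(T^i(x)) + \frac{1}{n} \sum_{i=0}^{n-1} (f - f_k)(T^i(x)) \Bigr) \\
  &\quad - \liminf_{n\to\infty} \Bigl( \frac{1}{n} \sum_{i=0}^{n-1} f_k(T^i(x)) + \frac{1}{n} \sum_{i=0}^{n-1} (f - f_k)(T^i(x)) \Bigr) \\
  &= \tilde f_k(x) + \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} (f - f_k)(T^i(x)) - \tilde f_k(x) - \liminf_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} (f - f_k)(T^i(x)) \\
  &\le \limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} |2(f - f_k)(T^i(x))| \\
  &\le \sup_{n \ge 1} \frac{1}{n} \sum_{i=0}^{n-1} |2(f - f_k)(T^i(x))|
   = M(2(f - f_k))(x).
\end{align*}
From here, fix $\varepsilon > 0$ and $\delta > 0$. For all sufficiently large $k$, we have $\|f_k - f\|_1 < \frac{\delta\varepsilon}{2}$. We may apply Lemma 2.4 in the case of $2(f - f_k)$, where $k$ is large enough, to obtain:
\[
  \mu(\{x : \overline{A} f(x) - \underline{A} f(x) > \varepsilon\}) \le \mu(\{x : M(2(f - f_k))(x) > \varepsilon\}) \le \frac{\|2(f - f_k)\|_1}{\varepsilon} < \frac{2\delta\varepsilon}{2\varepsilon} = \delta.
\]
We picked $\delta$ arbitrarily, so that
\[
  \mu(\{x : \overline{A} f(x) - \underline{A} f(x) > \varepsilon\}) = 0,
\]
and $\varepsilon$ was also arbitrary, so that
\[
  \mu(\{x : \overline{A} f(x) - \underline{A} f(x) > 0\}) = 0.
\]
Hence $\overline{A} f = \underline{A} f$ $\mu$-almost everywhere, so that
\[
  \tilde f(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x))
\]
exists $\mu$-almost everywhere.
Hence the set of $L^1$ functions which satisfy the BET is closed in the $L^1$ sense.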
Step 4: Finally, we show that there is an $L^1$-dense set of functions which satisfy the BET. We shall do this in two smaller steps: first we show that the square-integrable functions are dense in $L^1(\mu)$, then we explicitly find our desired set.
Lemma 2.6. Supposing that $\mu(X) < \infty$, we have $L^2(\mu) \subset L^1(\mu)$, and moreover $L^2(\mu)$ is dense in $L^1(\mu)$ in the $L^1$ sense.
Proof. We are still assuming that $\mu(X) < \infty$. We first follow the proof of a result I should've remembered from my intro measure theory class from undergrad, but in the special case of $L^2$ and $L^1$. Let $f \in L^2(\mu)$; then $|f| \in L^2(\mu)$ also. We apply Cauchy-Schwarz (a special case of Hölder's Inequality):
\[
  \|f\|_1 = \langle |f|, 1 \rangle = \int_X |f| \cdot 1 \, d\mu \le \|f\|_2 \|1\|_2 = \|f\|_2 \, \mu(X)^{1/2} < \infty.
\]
Thus $f \in L^1(\mu)$, so $L^2(\mu) \subset L^1(\mu)$. In addition, the norm inequality means convergence in the $L^2$ sense implies convergence in the $L^1$ sense.
Furthermore, if $f \in L^1(\mu)$, then for each $n \in \mathbb{N}$ define $f_n : X \to \mathbb{R}$, by
\[
  f_n(x) =
  \begin{cases}
    n & f(x) \ge n, \\
    f(x) & -n < f(x) < n, \\
    -n & f(x) \le -n.
  \end{cases}
\]
Then $|f_n(x)| \le |f(x)|$, so $f_n \in L^1(\mu)$, and moreover $f_n$ is bounded. This means $f_n^2$ is bounded, and hence integrable on $X$, so $f_n \in L^2(\mu)$. Finally, $f_n \to f$ almost everywhere as $n \to \infty$, so $|f_n(x) - f(x)| \to 0$ for almost every $x \in X$. We also have, by inspection of the definition of $f_n$, that
\[
  |f_n(x) - f(x)| =
  \begin{cases}
    |f(x)| - n & |f(x)| \ge n, \\
    0 & |f(x)| < n
  \end{cases}
  \;\le\; |f(x)|,
\]
so that $|f|$ is an integrable majorant for $|f_n - f|$, and so we obtain:
\[
  \lim_{n\to\infty} \|f_n - f\|_1 = \lim_{n\to\infty} \int_X |f_n - f| \, d\mu \overset{\text{LDCT}}{=} \int_X \lim_{n\to\infty} |f_n - f| \, d\mu = \int_X 0 \, d\mu = 0.
\]
Hence $f_n$ converges to $f$ in the $L^1$ sense, so we are done.
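To see the truncation at work, here is a worked example of my own choosing (not from the original argument): take $X = (0, 1]$ with Lebesgue measure and $f(x) = x^{-1/2}$, which is in $L^1$ but not in $L^2$. The truncations are bounded, hence in $L^2$, and the $L^1$ error can be computed in closed form.

```latex
\[
  \int_0^1 x^{-1/2} \, dx = 2 < \infty,
  \qquad
  \int_0^1 \bigl(x^{-1/2}\bigr)^2 \, dx = \int_0^1 \frac{dx}{x} = \infty,
\]
\[
  f_n(x) = \min\{x^{-1/2}, n\}
  \quad\Longrightarrow\quad
  \|f_n - f\|_1 = \int_0^{1/n^2} \bigl(x^{-1/2} - n\bigr) \, dx
  = \frac{2}{n} - \frac{1}{n} = \frac{1}{n} \longrightarrow 0.
\]
```

So each $f_n$ lies in $L^2$, and $f_n \to f$ in $L^1$ at rate $1/n$, even though $f$ itself is not square-integrable.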
The previous lemma serves a very useful purpose: if a sequence of $L^2$ functions $\{f_k\}_{k=1}^{\infty}$ converges in the $L^2$ sense to an $L^2$ function $f$, and BET holds for each $f_k$, then we see that this convergence happens in the $L^1$ sense, and so by Lemma 2.5, BET holds for $f$ also. The final step, then, is to find a subset of $L^1$ on which BET holds, and the $L^1$-closure of which contains $L^2$.
Lemma 2.7. BET holds for the subspace of $L^2$ coboundaries, $C = \{f - f \circ T : f \in L^2\}$, and thus for the $L^2$-closure of $C$, $\overline{C}^{L^2}$.
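Proof. For a coboundary $f - f \circ T$ with $f \in L^2(\mu)$, the sum telescopes:
\[
  \frac{1}{n} \sum_{i=0}^{n-1} (f - f \circ T)(T^i(x)) = \frac{1}{n} (f(x) - f(T^n(x))).
\]
By Markov's Inequality applied to $|f \circ T^n|^2$, together with the $T$-invariance of $\mu$, we obtain:
\[
  \mu\Bigl(\Bigl\{x : \frac{|f(T^n(x))|}{n} \ge \frac{1}{n^{1/3}}\Bigr\}\Bigr)
  = \mu(\{x : |f(T^n(x))|^2 \ge n^{4/3}\})
  \le \frac{\|f \circ T^n\|_2^2}{n^{4/3}} = \frac{\|f\|_2^2}{n^{4/3}}.
\]
The right-hand side is summable in $n$, so by the Borel-Cantelli lemma, for almost every $x$ we have $|f(T^n(x))|/n < n^{-1/3}$ for all sufficiently large $n$, and therefore $f(T^n(x))/n \to 0$. Since $f(x)/n \to 0$ wherever $f(x)$ is finite, for almost every $x \in X$ we have
\[
  \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} (f - f \circ T)(T^i(x)) = 0,
\]
and so BET holds for $C$. By the above convergence result, BET also holds for $\overline{C}^{L^2}$.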
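The telescoping identity is easy to verify exactly in a toy setting. The sketch below assumes, purely for illustration, the cyclic system $T(x) = x + 1 \bmod m$ on $\{0, \dots, m - 1\}$ and an integer-valued $f$ stored as a list; the function names are hypothetical.

```python
from fractions import Fraction

def coboundary_average(f, x, n):
    """(1/n) * sum_{i<n} (f - f∘T)(T^i x) for T(x) = x + 1 mod len(f)."""
    m = len(f)
    total = sum(f[(x + i) % m] - f[(x + i + 1) % m] for i in range(n))
    return Fraction(total, n)

def telescoped(f, x, n):
    """The closed form (f(x) - f(T^n x)) / n."""
    m = len(f)
    return Fraction(f[x % m] - f[(x + n) % m], n)

f = [4, -1, 7, 0, 2]  # sample function on Z/5Z
for n in (1, 3, 10, 100):
    print(n, coboundary_average(f, 0, n), telescoped(f, 0, n))
```

Since $f$ is bounded here, the averages are at most $2 \max |f| / n$ in absolute value and so tend to 0. The Borel-Cantelli step in the proof is what replaces boundedness for a general $f \in L^2$.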
Lemma 2.8. The orthogonal complement of $C$ is
\[
  C^{\perp} = \Bigl\{ g \in L^2(\mu) : \int_X h g \, d\mu = 0, \ \forall h \in C \Bigr\} = \{ g \in L^2(\mu) : g = g \circ T \},
\]
the set of $T$-invariant functions in $L^2$, and BET holds for this subspace.
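Proof. Let $g \in C^{\perp}$. Then $g - g \circ T \in C$, by definition, so $\langle g, g - g \circ T \rangle = 0$, and using the $T$-invariance of $\mu$ we have:
\begin{align*}
  \langle g, g - g \circ T \rangle
  &= \langle g, g \rangle - \langle g, g \circ T \rangle
   = \langle g \circ T, g \circ T \rangle - \langle g, g \circ T \rangle \\
  &= \langle g \circ T - g, g \circ T \rangle
   = -\langle g \circ T, g - g \circ T \rangle
   = \langle -g \circ T, g - g \circ T \rangle = 0.
\end{align*}
Adding, we obtain:
\[
  0 = 0 + 0 = \langle g, g - g \circ T \rangle + \langle -g \circ T, g - g \circ T \rangle = \langle g - g \circ T, g - g \circ T \rangle = \|g - g \circ T\|_2^2,
\]
which implies that $g = g \circ T$ almost everywhere.
Conversely, if $g$ is (almost) $T$-invariant, and $f \in L^2$, then we have:
\[
  \langle g, f - f \circ T \rangle = \langle g, f \rangle - \langle g, f \circ T \rangle = \langle g \circ T, f \circ T \rangle - \langle g \circ T, f \circ T \rangle = 0.
\]
Hence $g \in C^{\perp}$, and we get that $C^{\perp} = \{ g \in L^2(\mu) : g = g \circ T \}$.
To conclude, observe that for $g \in C^{\perp}$,
\[
  \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} g(T^i(x)) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} g(x) = \lim_{n\to\infty} \frac{n g(x)}{n} = g(x),
\]
so BET holds for elements of $C^{\perp}$.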
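Both directions of the lemma can be illustrated in a finite model. The sketch below assumes a five-point space with counting measure and a permutation $T$ with cycles $(0\,1\,2)$ and $(3\,4)$, so that the $T$-invariant functions are exactly those constant on each cycle. The permutation, the test functions and all names are hypothetical choices for illustration.

```python
T = [1, 2, 0, 4, 3]  # T(0)=1, T(1)=2, T(2)=0, T(3)=4, T(4)=3

def inner(g, h):
    """Inner product with respect to counting measure."""
    return sum(u * v for u, v in zip(g, h))

def compose(f, T):
    """The function f∘T."""
    return [f[T[x]] for x in range(len(T))]

def coboundary(f, T):
    """The coboundary f - f∘T."""
    return [u - v for u, v in zip(f, compose(f, T))]

f = [1, 4, 9, 16, 25]
g_invariant = [5, 5, 5, -2, -2]   # constant on each cycle, so g = g∘T
g_other = [1, 0, 0, 0, 0]         # not T-invariant

print(inner(g_invariant, coboundary(f, T)))    # orthogonal to the coboundary
print(inner(g_other, coboundary(g_other, T)))  # nonzero, so g_other is not in C-perp
```

The second inner product is nonzero, consistent with the first half of the proof: a function orthogonal to its own coboundary must be $T$-invariant.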
We now observe that $L^2(\mu) = \overline{C}^{L^2} \oplus C^{\perp}$, so that every element of $L^2(\mu)$ may be written as a sum of a limit of coboundaries and a $T$-invariant function. If BET holds for two functions, it holds for their sum (by limit laws), so that BET now holds for all of $L^2(\mu)$. But we already know that the $L^1$-closure of $L^2(\mu)$ is all of $L^1(\mu)$, so by Lemma 2.5 we see that BET holds for all functions in $L^1$, thus completing the proof in the finite measure case.
We give the proof of the corollary of Birkhoff’s theorem, Corollary 2.2.
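Proof. Let $(\mu, T)$ be ergodic, let $f \in L^1(\mu)$, and let $\tilde f$ be the limit of the ergodic averages. We know that $\tilde f$ is $T$-invariant, so by ergodicity $\tilde f$ is almost everywhere constant. If $\mu(X) = \infty$, then since $\tilde f$ is integrable, $\tilde f \equiv 0$ almost everywhere.
If $\mu(X) < \infty$, then we have:
\[
  \int_X f \, d\mu = \int_X \tilde f \, d\mu = \tilde f \, \mu(X),
\]
so that
\[
  \tilde f(x) = \frac{1}{\mu(X)} \int_X f \, d\mu
\]
for almost every $x$, as desired.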
Finally, we show that $\tilde f$ is the conditional expectation of $f$ with respect to the $\sigma$-algebra of $T$-invariant measurable sets, $\mathcal{T}$.
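Proof. Let $A \in \mathcal{T}$ and let $f \in L^1(\mu)$. Then we have $1_A = 1_{T^{-1}(A)} = 1_A \circ T$, and $1_A f \in L^1(\mu)$, so we have:
\[
  \widetilde{1_A f}(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} 1_A(T^i(x)) f(T^i(x)) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i(x)) 1_A(x) = \tilde f(x) 1_A(x).
\]
Hence, applying the integral identity of Theorem 2.1 to $1_A f$ in the middle step, we obtain:
\[
  \int_A f \, d\mu = \int_X f 1_A \, d\mu = \int_X \widetilde{1_A f} \, d\mu = \int_X \tilde f 1_A \, d\mu = \int_A \tilde f \, d\mu,
\]
and so by a theorem about conditional expectations, $\tilde f$ is the conditional expectation of $f$ with respect to $\mathcal{T}$ (equality of integrals over any set in the $\sigma$-algebra is sufficient and necessary).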
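In a finite model this is very explicit: the limit $\tilde f$ is the average of $f$ over the cycle containing $x$. The sketch below assumes a five-point space with counting measure and a permutation $T$ with cycles $(0\,1\,2)$ and $(3\,4)$; both are hypothetical choices for illustration. Along multiples of 6, the least common multiple of the cycle lengths, the ergodic average equals the cycle mean exactly, and that mean is also the limit.

```python
from fractions import Fraction

T = [1, 2, 0, 4, 3]        # cycles (0 1 2) and (3 4)
f = [1, 4, 9, 16, 25]

def ergodic_average(f, T, x, n):
    """(1/n) * sum_{i<n} f(T^i x), computed exactly."""
    total = 0
    for _ in range(n):
        total += f[x]
        x = T[x]
    return Fraction(total, n)

# n = 6 is a common multiple of the cycle lengths, so this is the cycle mean.
f_tilde = [ergodic_average(f, T, x, 6) for x in range(len(T))]

# The T-invariant sets are unions of cycles; compare integrals over each cycle.
for A in ([0, 1, 2], [3, 4]):
    print(A, sum(f[x] for x in A), sum(f_tilde[x] for x in A))
```

The two sums agree on each invariant set, which is the defining property of the conditional expectation used in the proof.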