Chapter 4
Ergodic theory
4.1 The Poincaré recurrence theorem
Ergodic theory deals with dynamics in a measure space. Let us begin with a definition.
Definition 4.1.1: Let (X, A, µ) and (Y, B, ν) be two measure spaces, and f: X → Y a measurable map. We shall denote by f_*µ the measure on (Y, B) given by f_*µ(B) = µ(f^{-1}(B)) for all B ∈ B. We say that f is measure-preserving if f_*µ = ν. If (X, A, µ) = (Y, B, ν) and f is measure-preserving, we say that µ is f-invariant, or that f is an endomorphism of (X, A, µ).
The first result in ergodic theory is Poincaré's recurrence theorem, which states that in a space endowed with an invariant Borel probability measure almost every point is recurrent. This is a consequence of the following
Proposition 4.1.1: Let f: X → X be an endomorphism of a probability space (X, A, µ). Given A ∈ A, let Ã ⊆ A be the set of points x ∈ A such that f^j(x) ∈ A for infinitely many j ∈ N, that is

$$ \tilde A = A \cap \bigcap_{n=0}^{\infty}\,\bigcup_{j\ge n} f^{-j}(A). $$

Then Ã ∈ A and µ(Ã) = µ(A).
Proof: Let C_n = {x ∈ A | f^j(x) ∉ A for all j ≥ n}. We clearly have

$$ \tilde A = A \setminus \bigcup_{n=1}^{\infty} C_n; $$

therefore it suffices to prove that C_n ∈ A and µ(C_n) = 0 for all n ≥ 1. Now, we have

$$ C_n = A \setminus \bigcup_{j\ge n} f^{-j}(A), \tag{4.1.1} $$

and thus C_n ∈ A. Furthermore, (4.1.1) implies

$$ C_n \subseteq \bigcup_{j\ge 0} f^{-j}(A) \setminus \bigcup_{j\ge n} f^{-j}(A), $$

and thus

$$ \mu(C_n) \le \mu\Bigl(\,\bigcup_{j\ge 0} f^{-j}(A)\Bigr) - \mu\Bigl(\,\bigcup_{j\ge n} f^{-j}(A)\Bigr). $$

But since

$$ \bigcup_{j\ge n} f^{-j}(A) = f^{-n}\Bigl(\,\bigcup_{j\ge 0} f^{-j}(A)\Bigr) $$

and f is measure-preserving, it follows that

$$ \mu\Bigl(\,\bigcup_{j\ge n} f^{-j}(A)\Bigr) = \mu\Bigl(\,\bigcup_{j\ge 0} f^{-j}(A)\Bigr), $$

and we are done.
Sistemi Dinamici Discreti, A.A. 2005/06
As a consequence we have
Theorem 4.1.2: (Poincaré's recurrence theorem) Let X be a separable metric space, f: X → X a Borel-measurable map, and µ an f-invariant probability Borel measure on X. Then µ-almost every point of X is f-recurrent.
Proof: Since X is separable, we can find (exercise) a countable basis of open sets {U_n}_{n∈N} such that

$$ \lim_{n\to+\infty} \operatorname{diam}(U_n) = 0 $$

and

$$ \bigcup_{n\ge m} U_n = X \qquad \forall m\ge 0. $$
Set Ũn = {x ∈ Un | f j (x) ∈ Un for infinitely many j > 0}. Proposition 4.1.1 says that each Ũn is a Borel
set and µ(Un \ Ũn ) = 0. Put
$$ \tilde X = \bigcap_{m=0}^{\infty}\,\bigcup_{n\ge m} \tilde U_n, $$
so that x ∈ X̃ if and only if there are infinitely many n ≥ 0 so that f j (x) ∈ Un for infinitely many j > 0.
We have

$$ \mu(X\setminus\tilde X) = \mu\biggl(\,\bigcup_{m=0}^{\infty}\Bigl(X\setminus\bigcup_{n\ge m}\tilde U_n\Bigr)\biggr) = \mu\biggl(\,\bigcup_{m=0}^{\infty}\Bigl(\,\bigcup_{n\ge m}U_n\setminus\bigcup_{n\ge m}\tilde U_n\Bigr)\biggr) \le \mu\biggl(\,\bigcup_{m=0}^{\infty}\bigcup_{n\ge m}\bigl(U_n\setminus\tilde U_n\bigr)\biggr) = 0. $$
So it suffices to show that every point of X̃ is recurrent. Given ε > 0, choose m ≥ 0 so that diam(U_n) ≤ ε if n ≥ m. If x ∈ X̃, there must exist n ≥ m such that x ∈ Ũ_n, and thus f^j(x) ∈ U_n for infinitely many j ≥ 0. But this implies d(x, f^j(x)) ≤ ε for infinitely many j ≥ 0, and thus (since ε is arbitrary) x is recurrent.
Example 4.1.1. If f: X → X has a periodic point p ∈ X of exact period n, then

$$ \mu = \frac{1}{n}\sum_{j=0}^{n-1} \delta_{f^j(p)} $$

is an f-invariant Borel probability measure, where δ_x is the Dirac measure concentrated at x ∈ X.
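A quick numerical illustration (my own sketch, not part of the text): f permutes the atoms of the orbit, so the pushforward f_*µ puts the same masses on the same points. The assumed example is the rotation x ↦ x + 1/5 (mod 1), for which p = 0 has exact period n = 5.

```python
from fractions import Fraction

def f(x):
    # assumed example map: rotation by 1/5 on the circle [0, 1)
    return (x + Fraction(1, 5)) % 1

p = Fraction(0)
orbit, x = [], p
for _ in range(5):              # the orbit {p, f(p), ..., f^4(p)}
    orbit.append(x)
    x = f(x)

# mu puts mass 1/5 on each atom of the orbit, so f_* mu puts mass 1/5 on each
# image point; since f permutes the orbit, f_* mu = mu.
assert len(set(orbit)) == 5     # exact period 5
assert set(f(y) for y in orbit) == set(orbit)
```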
Example 4.1.2.
The Lebesgue measure of S 1 is Rα -invariant for any α ∈ R.
Exercise 4.1.1. Let (X, A, µ) and (Y, B, ν) be σ-finite measure spaces, and f: X → Y a measurable map. Prove that if there exists a generating sub-algebra B₀ ⊆ B such that µ(f^{-1}(B)) = ν(B) for all B ∈ B₀ then f is measure-preserving.
Example 4.1.3. The Lebesgue measure of S¹ is E_m-invariant for any m ∈ Z with m ≠ 0, because the inverse image of an interval of length ℓ small enough is the union of |m| disjoint intervals of length ℓ/|m|, and we can apply the previous exercise. Poincaré's recurrence theorem applied to E₁₀ then says that there exists a Borel set X ⊂ S¹ of full measure such that for all j ≥ 1 the decimal expansion of each x ∈ X contains the sequence of its first j digits infinitely many times.
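The length computation behind the invariance claim can be checked directly; the following sketch (with assumed sample values of m and of the interval, not from the text) lists the |m| preimage intervals of [a, b] under E_m and verifies that their total length is b − a.

```python
m = 10                   # assumed sample: E_m(x) = m*x (mod 1)
a, b = 0.2, 0.35         # assumed sample interval [a, b], with b - a small

# E_m^{-1}([a, b]) is the disjoint union of the intervals [(a+k)/m, (b+k)/m]
# for k = 0, ..., m-1: m intervals, each of length (b - a)/m.
preimage = [((a + k) / m, (b + k) / m) for k in range(m)]
total_length = sum(hi - lo for lo, hi in preimage)
assert abs(total_length - (b - a)) < 1e-12   # Lebesgue measure is preserved
```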
Example 4.1.4. The Gauss transformation. Let φ: [0, 1] → [0, 1] be given by

$$ \varphi(x) = \begin{cases} \dfrac{1}{x} - \left\lfloor \dfrac{1}{x} \right\rfloor & \text{if } x \ne 0, \\[2mm] 0 & \text{if } x = 0. \end{cases} $$
This map has important connections with the theory of continued fractions. We claim that φ preserves the Borel probability measure µ given by

$$ \mu(A) = \frac{1}{\log 2} \int_A \frac{dx}{1+x} $$

for every Borel set A ⊆ [0, 1]. Using Exercise 4.1.1, it suffices to prove that φ preserves the measures of finite unions of intervals, and hence it suffices to check that µ(φ^{-1}([a, b])) = µ([a, b]) for every interval [a, b] ⊆ [0, 1]. Since µ is absolutely continuous with respect to the Lebesgue measure, it suffices to check that µ(φ^{-1}([0, b])) = µ([0, b]) for every b ∈ [0, 1]. But indeed we have
$$ \varphi^{-1}([0,b]) = \bigcup_{n=1}^{\infty} \left[ \frac{1}{b+n}, \frac{1}{n} \right], $$
and so

$$ \mu\bigl(\varphi^{-1}([0,b])\bigr) = \sum_{n=1}^{\infty} \mu\left(\left[\frac{1}{b+n},\frac{1}{n}\right]\right) = \frac{1}{\log 2}\sum_{n=1}^{\infty} \int_{1/(b+n)}^{1/n} \frac{dt}{1+t} = \frac{1}{\log 2}\sum_{n=1}^{\infty} \log\frac{1+\frac{1}{n}}{1+\frac{1}{b+n}} $$
$$ = \frac{1}{\log 2}\sum_{n=1}^{\infty} \left( \log\frac{n+1}{b+n+1} - \log\frac{n}{b+n} \right) = \frac{1}{\log 2}\log(b+1) = \frac{1}{\log 2}\int_0^b \frac{dt}{1+t} = \mu([0,b]). $$
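The series above can also be checked numerically; a sketch (truncating the union defining φ^{-1}([0, b]) at an assumed large N, with a sample value of b):

```python
import math

def mu(a, b):
    # Gauss measure of [a, b]: (1/log 2) * integral of dx/(1+x) over [a, b]
    return math.log((1 + b) / (1 + a)) / math.log(2)

b = 0.7                  # assumed sample value
N = 200000               # truncation of the union; the neglected tail is O(1/N)
preimage_mass = sum(mu(1 / (b + n), 1 / n) for n in range(1, N + 1))
assert abs(preimage_mass - mu(0, b)) < 1e-4   # mu(phi^{-1}([0,b])) = mu([0,b])
```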
Example 4.1.5. Assume we are given a probability measure on the set {0, ..., N − 1}, that is, a sequence of non-negative numbers p₀, ..., p_{N−1} ≥ 0 such that p₀ + ··· + p_{N−1} = 1. The associated Bernoulli (or product) measure on Ω_N and Ω⁺_N is the Borel probability measure µ defined on the cylinders by

$$ \mu\bigl(C^{m_1\ldots m_r}_{a_1\ldots a_r}\bigr) = p_{a_1}\cdots p_{a_r}. $$

It is clearly (full) left shift-invariant. A slightly more involved construction yields another kind of invariant measure. A stochastic matrix is a matrix S ∈ M_{N,N}(R⁺), with row and column indices running from 0 to N − 1, such that Σ_{i=0}^{N−1} s_{ij} = 1 for j = 0, ..., N − 1. If S is a stochastic matrix, it is possible to prove that there exists a vector p = (p₀, ..., p_{N−1}) ∈ R^N with non-negative coordinates such that Sp = p and p₀ + ··· + p_{N−1} = 1; furthermore, p is unique if S is transitive (in the sense of Definition 1.7.9). Then the Markov measure µ_{S,p} on Ω_N is defined on cylinders by

$$ \mu_{S,p}\bigl(C^{m_1\ldots m_r}_{a_1\ldots a_r}\bigr) = s_{a_1 a_2}\cdots s_{a_{r-1}a_r}\,p_{a_r}. $$

Again, it is easy to check that it is shift-invariant.
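The stationary vector p can be computed by power iteration, as in this sketch (the 3 × 3 column-stochastic matrix S is an assumed example, not from the text; for a transitive S the iterates of any probability vector converge to the unique p):

```python
N = 3
# assumed example matrix; each COLUMN sums to 1, matching the convention
# sum_i s_ij = 1 used in the text, so that S maps probability vectors to
# probability vectors
S = [[0.5, 0.2, 0.3],
     [0.3, 0.6, 0.1],
     [0.2, 0.2, 0.6]]

p = [1.0 / N] * N                    # start from the uniform probability vector
for _ in range(1000):                # power iteration: p <- S p
    p = [sum(S[i][j] * p[j] for j in range(N)) for i in range(N)]

Sp = [sum(S[i][j] * p[j] for j in range(N)) for i in range(N)]
assert abs(sum(p) - 1.0) < 1e-9      # still a probability vector
assert all(abs(Sp[i] - p[i]) < 1e-9 for i in range(N))   # S p = p
```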
Exercise 4.1.2. Let f: X → X be an endomorphism of a measure space (X, A, µ). Given A ∈ A with non-zero measure, put Ã = {x ∈ A | f^j(x) ∈ A for infinitely many j ≥ 1}, and define f̃: Ã → A by setting f̃(x) = f^j(x), where j is the least positive index such that f^j(x) ∈ A (f̃ is sometimes called the first-return map associated to f). Prove that f̃(Ã) ⊆ Ã, that µ(Ã \ f̃(Ã)) = 0, and that µ|_Ã is f̃-invariant.
Exercise 4.1.3. (Hopf) Let X be a locally compact separable metric space and f : X → X a Borel-measurable
map. If µ is a locally finite f -invariant Borel measure, prove that µ-almost every x ∈ X is recurrent or has
empty ω-limit set. (Hint: since µ is locally finite, Lusin’s theorem yields a sequence {Xi } of open sets with
finite measure whose union has full measure. For every Xi let X̃i be defined as in the previous exercise.
Applying Theorem 4.1.2 to f˜: X̃i → X̃i deduce that almost every point of X̃i is recurrent. Conclude that
almost every point of Xi is recurrent or has ω-limit set disjoint from Xj .)
Definition 4.1.2: Let µ be a Borel measure on a topological space X. The support of µ is the closed set

$$ \operatorname{supp}\mu = X \setminus \bigcup \{\, A \mid A \text{ open in } X,\ \mu(A) = 0 \,\}. $$
Lemma 4.1.3: Let µ be a Borel measure on a topological space X. Then:
(i) if X is second countable then µ(X \ supp µ) = 0;
(ii) any set of full measure is dense in supp µ;
(iii) if f: X → X is continuous and µ is f-invariant then supp µ is f-invariant;
(iv) if X is a separable metric space, f: X → X is continuous, and µ is an f-invariant probability measure, then supp µ is contained in the closure of the set of recurrent points, and hence supp µ ⊆ NW(f).
Proof: (i) Indeed, we can write X \ supp µ as a countable union of measure zero open sets.
(ii) Assume that C is not dense in supp µ; this means that there exist x ∈ supp µ and an open neighbourhood A of x disjoint from C. Since A ∩ supp µ ≠ ∅, we must have µ(A) > 0; therefore µ(X \ C) ≥ µ(A) > 0, and so C is not of full measure.
(iii) Since f is continuous and µ is f-invariant, we clearly have f^{-1}(X \ supp µ) ⊆ X \ supp µ. Therefore f^{-1}(supp µ) ⊇ supp µ, and hence supp µ is f-invariant.
(iv) Theorem 4.1.2 says that the set of recurrent points has full measure; hence it is dense in supp µ,
and we are done.
4.2 Existence of invariant measures
A natural question now is whether invariant measures exist. We have a positive answer in the case of
continuous self-maps of compact metric spaces. To prove it let us recall a couple of definitions and theorems.
Definition 4.2.1: Let A be a σ-algebra on a set X. We shall denote by M_A(X) the set of probability measures on A. If X is a topological space and A is the σ-algebra of Borel subsets, we shall write M(X) instead of M_A(X). If f: X → X is an A-measurable map, we shall denote by M^f_A(X) the set of f-invariant probability measures on A, and by M^f(X) the set of f-invariant Borel probability measures.
The Borel measures on compact Hausdorff spaces have a nice relationship with the dual of the space of
continuous functions:
Theorem 4.2.1: (Riesz representation theorem) Let X be a compact Hausdorff space. Then:
(i) for every bounded linear functional T on C⁰(X) there exists a unique pair of mutually singular finite Borel measures µ and ν such that T(ϕ) = ∫_X ϕ dµ − ∫_X ϕ dν for all ϕ ∈ C⁰(X);
(ii) a bounded linear functional T on C⁰(X) is positive (that is, T(ϕ) ≥ 0 if ϕ ≥ 0) if and only if there exists a unique finite Borel measure µ such that T(ϕ) = ∫_X ϕ dµ for all ϕ ∈ C⁰(X).
As a consequence, if X is a compact Hausdorff space we can realize M(X) as the subset of the unit
ball of the dual of C 0 (X) consisting of the positive functionals. Now, the dual of a normed vector space is
endowed with the weak-∗ topology, i.e., the topology of pointwise convergence (a sequence {Tj } of functionals
on the normed space V converges to a functional T if and only if Tj (v) → T (v) for all v ∈ V ), and we have
the famous
Theorem 4.2.2: (Banach-Alaoglu) The unit ball (with respect to the dual norm) of the dual of a normed
vector space is weak-∗ compact.
This is actually a consequence of Tychonoff's theorem on the product of compact spaces. Indeed, if B is the unit ball in the normed vector space V, we can realize the unit ball of the dual as a closed (with respect to the product topology) subset of [−1, 1]^B, which is compact by Tychonoff's theorem. Since the topology of pointwise convergence coincides with the product topology, we are done.
Since the set of positive linear functionals on C 0 (X) is clearly closed with respect to the weak-∗ topology,
as a corollary we have
Corollary 4.2.3: Let X be a compact Hausdorff space. Then M(X) is compact in the weak-∗ topology.
Exercise 4.2.1. Let X be a compact metric space, and take a countable subset {g_j}_{j∈N} ⊂ C⁰(X) dense in the unit ball of C⁰(X). Given µ, ν ∈ M(X) put

$$ d(\mu,\nu) = \sum_{j=0}^{\infty} \frac{1}{2^j} \left| \int_X g_j\,d\mu - \int_X g_j\,d\nu \right|. $$

Prove that d is a metric on M(X) inducing the weak-∗ topology.
We are now able to prove the existence of invariant Borel measures:
Theorem 4.2.4: (Krylov-Bogolubov) Let f : X → X be a continuous self-map of a compact metric space X.
Then there exists an f -invariant probability Borel measure.
Proof: Let f_*: M(X) → M(X) be defined by (f_*µ)(A) = µ(f^{-1}(A)) for any Borel subset A ⊆ X. In particular we have

$$ \int_X (\varphi\circ f)\,d\mu = \int_X \varphi\,d(f_*\mu) \qquad \forall\varphi\in C^0(X), $$

and thus f_* is continuous in the weak-∗ topology. Our aim is to find a fixed point of f_*.

Take µ₀ ∈ M(X), and let µ_n ∈ M(X) be defined by

$$ \mu_n = \frac{1}{n+1}\sum_{m=0}^{n} (f_*)^m\mu_0. $$
Since, by Corollary 4.2.3 and Exercise 4.2.1, M(X) is a compact metric space, we can extract a subsequence {µ_{n_k}} converging to µ ∈ M(X). Now,

$$ f_*\mu_{n_k} = \frac{1}{n_k+1}\sum_{m=0}^{n_k}(f_*)^{m+1}\mu_0 = \frac{1}{n_k+1}\sum_{m=0}^{n_k}(f_*)^m\mu_0 - \frac{1}{n_k+1}\mu_0 + \frac{1}{n_k+1}(f_*)^{n_k+1}\mu_0. $$

The last two terms converge to 0 as k goes to infinity; hence

$$ f_*\mu = \lim_{k\to\infty} f_*\mu_{n_k} = \lim_{k\to\infty}\frac{1}{n_k+1}\sum_{m=0}^{n_k}(f_*)^m\mu_0 = \lim_{k\to\infty}\mu_{n_k} = \mu, $$

and we are done.
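A numerical sketch of the construction in the proof (with the assumed example f(x) = x² on [0, 1], which is not in the text): µ_n is the empirical measure along the orbit of µ₀ = δ_{x₀}; here the orbit tends to the fixed point 0, so ∫ϕ dµ_n → ϕ(0) and the limit is the invariant measure δ₀.

```python
def f(x):
    return x * x          # assumed example map on [0, 1]; 0 is a fixed point

def phi(x):
    return x + 1.0        # a continuous test observable

x, n, acc = 0.9, 5000, 0.0
for _ in range(n):        # integral of phi against mu_n = orbit average of phi
    acc += phi(x)
    x = f(x)

# the orbit of 0.9 tends to 0, so the averages tend to phi(0) = 1: the
# empirical measures mu_n converge weakly-* to the invariant measure delta_0
assert abs(acc / n - phi(0.0)) < 1e-2
```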
Exercise 4.2.2. Let X be a compact Hausdorff space, and f: X → X a continuous map. Prove that a bounded positive linear functional T on C⁰(X) is of the form T(ϕ) = ∫_X ϕ dµ for an f-invariant finite Borel measure µ ∈ M^f(X) if and only if T(ϕ ∘ f) = T(ϕ) for all ϕ ∈ C⁰(X).
Exercise 4.2.3. Let f: [0, 1] → [0, 1] be given by

$$ f(x) = \begin{cases} \tfrac{1}{2}x & \text{if } 0 < x \le 1, \\ 1 & \text{if } x = 0. \end{cases} $$

Prove that f is Borel measurable and that there are no f-invariant Borel probability measures on [0, 1].
4.3 Ergodic measures
In general, there may exist several probability measures invariant under the action of a given continuous map (remember Example 4.1.1). We would like to single out measures related to the whole dynamics of the map, and not only to part of it.
Definition 4.3.1: Let f : X → X be a continuous map of a topological space. An f -invariant Borel measure µ
is ergodic if every completely f -invariant Borel set A ⊆ X is either of zero or of full measure, that is,
either µ(A) = 0 or µ(X \ A) = 0.
There is a particular case in which this happens:
Definition 4.3.2: We shall say that a continuous map f : X → X of a topological space is uniquely ergodic if
there exists one and only one f -invariant Borel probability measure.
The unique invariant measure of a uniquely ergodic map is ergodic. To prove it we need a definition:
Definition 4.3.3: Let (X, A, µ) be a measure space, and A ∈ A with 0 < µ(A) < ∞. Then the conditional measure associated to A is the probability measure µ_A defined by

$$ \mu_A(B) = \frac{\mu(B\cap A)}{\mu(A)} \qquad \forall B\in\mathcal{A}. $$

We shall sometimes write µ(B | A) instead of µ_A(B).
Proposition 4.3.1: Let f: X → X be a uniquely ergodic continuous map. Then the unique f-invariant Borel probability measure µ is ergodic.
Proof : If µ is not ergodic we can find a completely f -invariant Borel set A so that 0 < µ(A), µ(X \ A) < 1.
Then µA and µX\A are f -invariant Borel probability measures, and they are distinct because µA (A) = 1
whereas µX\A (A) = 0.
Lemma 4.3.2: Let µ be a finite f-invariant measure for a continuous self-map f of a topological space X. Then µ is ergodic if and only if every f-invariant ϕ ∈ L¹(X, µ) is constant µ-almost everywhere.
Proof: If µ is not ergodic, then there is a completely f-invariant Borel set A with µ(A), µ(X \ A) ≠ 0. Then the characteristic function χ_A of A is an element of L¹(X, µ) which is f-invariant and not constant µ-almost everywhere.

Conversely, assume µ ergodic, and let ϕ ∈ L¹(X, µ) be f-invariant. Then for every c ∈ R the set A_c = {x ∈ X | ϕ(x) ≤ c} is completely f-invariant; therefore either µ(A_c) = 0 or µ(A_c) = µ(X). Since X = ⋃_{c∈Q} A_c, we must have µ(A_c) = µ(X) for at least some c ∈ R; analogously, since ∅ = ⋂_{c∈Q} A_c, we must have µ(A_c) = 0 for at least some c ∈ R. Let

$$ c_0 = \inf\{c\in\mathbb{R} \mid \mu(A_c) = \mu(X)\} > -\infty. $$

Then

$$ \mu\bigl(\{x\in X \mid \varphi(x) < c_0\}\bigr) = \mu\Bigl(\,\bigcup_{c<c_0,\ c\in\mathbb{Q}} A_c\Bigr) = 0, $$

and thus

$$ \mu\bigl(\{x\in X \mid \varphi(x) = c_0\}\bigr) = \mu(A_{c_0}) - \mu\bigl(\{x\in X \mid \varphi(x) < c_0\}\bigr) = \mu(X), $$

that is, ϕ ≡ c₀ µ-almost everywhere.
The arguments used in the previous section also yield the existence of ergodic measures. To see this, we need the following
Lemma 4.3.3: Let f : X → X be a continuous map of a topological space X. If µ ∈ Mf (X) is not ergodic
then there exist µ1 , µ2 ∈ Mf (X) and 0 < t < 1 so that µ1 6= µ2 and µ = tµ1 + (1 − t)µ2 .
Proof : Let A ⊂ X be a completely f -invariant Borel set such that 0 < µ(A) < 1, and set µ1 = µA , µ2 = µX\A
and t = µ(A). Then µ1 6= µ2 and µ = tµ1 + (1 − t)µ2 .
Definition 4.3.4: Let V be a vector space, and K ⊂ V a closed convex subset. An extreme point of K
is a point v ∈ K such that if we can write v = tv1 + (1 − t)v2 with v1 , v2 ∈ K and 0 < t < 1 then
necessarily v1 = v2 = v.
So extreme points of Mf (X) are ergodic measures (the converse is true too; see Exercise 4.3.1). Thus
to prove the existence of ergodic measures it suffices to prove the existence of extreme points:
Theorem 4.3.4: Let f : X → X be a continuous self-map of a compact metric space X. Then there exists
an ergodic f -invariant probability Borel measure.
Proof: Let {ϕ_j}_{j∈N} be a countable dense set in C⁰(X), and define M_j by setting M₀ = M^f(X) and

$$ M_{j+1} = \left\{ \mu\in M_j \;\middle|\; \int_X \varphi_j\,d\mu = \max_{\nu\in M_j}\int_X \varphi_j\,d\nu \right\}. $$

Since ν ↦ ∫_X ϕ_j dν is continuous (and recalling Theorem 4.2.4), every M_j is a non-empty convex compact set; hence their intersection G is non-empty. To conclude the proof it suffices to show that every µ ∈ G is an extreme point of M^f(X).

Take µ ∈ G, and write µ = tµ₁ + (1 − t)µ₂ with 0 < t < 1 and µ₁, µ₂ ∈ M^f(X). Then

$$ \int_X \varphi\,d\mu = t\int_X \varphi\,d\mu_1 + (1-t)\int_X \varphi\,d\mu_2 \qquad \forall\varphi\in C^0(X). $$

Since µ ∈ M₁, this implies

$$ t\int_X \varphi_0\,d\mu_1 + (1-t)\int_X \varphi_0\,d\mu_2 = \int_X \varphi_0\,d\mu \ge \max\left\{ \int_X \varphi_0\,d\mu_1,\ \int_X \varphi_0\,d\mu_2 \right\}, $$

which is possible if and only if ∫_X ϕ₀ dµ = ∫_X ϕ₀ dµ₁ = ∫_X ϕ₀ dµ₂; in particular, µ₁, µ₂ ∈ M₁. Repeating this argument, by induction we see that

$$ \int_X \varphi_j\,d\mu = \int_X \varphi_j\,d\mu_1 = \int_X \varphi_j\,d\mu_2 $$

for every j ∈ N. But {ϕ_j} is dense in C⁰(X); hence ∫_X ϕ dµ = ∫_X ϕ dµ₁ = ∫_X ϕ dµ₂ for all ϕ ∈ C⁰(X). The uniqueness statement in Theorem 4.2.1.(ii) then implies µ₁ = µ₂ = µ, and so µ is an extreme point of M^f(X), as desired.
Actually, it is possible to use convex analysis (more precisely, Choquet's theorem on extreme points) to prove a sort of decomposition of any invariant measure as an integral of ergodic ones:
Theorem 4.3.5: Every invariant Borel probability measure µ for a continuous self-map f of a compact
metric space X can be decomposed into an integral of ergodic invariant Borel probability measures in the
following sense: there is a partition (modulo null sets) of X into completely f -invariant subsets {Xα }α∈A ,
where A is a measure space endowed with a probability measure dα (actually, A can be taken a Lebesgue
space), so that every Xα carries an ergodic f -invariant measure µα and we have
$$ \int_X \varphi\,d\mu = \int_A \left( \int_{X_\alpha} \varphi\,d\mu_\alpha \right) d\alpha \qquad \forall\varphi\in C^0(X). $$
Two different ergodic measures are related to completely different parts of the dynamics:
Proposition 4.3.6: Two distinct ergodic measures for a continuous self-map f of a topological space X are
mutually singular.
Proof : Let µ and µ1 be two f -invariant ergodic measures, and assume that µ1 is not absolutely continuous
with respect to µ. Then there is a Borel set A such that µ(A) = 0 but µ1 (A) > 0. Let
$$ \tilde A = \bigcap_{n\ge 0}\,\bigcup_{j\ge n} f^{-j}(A) $$
be the set of points whose orbit intersects A infinitely many times. Then à is a completely f -invariant subset
such that µ(Ã) = 0 and µ1 (Ã) > 0 (compare Proposition 4.1.1). By ergodicity, it follows that X \ Ã has full
µ-measure while à has full µ1 -measure, and hence µ and µ1 are mutually singular.
If, on the other hand, µ1 << µ, the Radon-Nikodym derivative dµ1 /dµ is f -invariant, and thus µ-almost
everywhere constant (by Lemma 4.3.2), which implies µ1 = µ.
Remark 4.3.1. Notice that the last argument in the previous proof shows that if µ1 is f -invariant, µ is
ergodic and µ1 << µ then µ1 = µ.
Exercise 4.3.1. Prove that a measure µ ∈ M^f(X) is ergodic if and only if it is an extreme point of M^f(X).
There is a stronger notion of ergodicity:
Definition 4.3.5: An f-invariant measure µ is mixing if

$$ \lim_{n\to\infty} \mu\bigl(f^{-n}(A)\cap B\bigr) = \mu(A)\,\mu(B) \tag{4.3.1} $$

for every pair of measurable sets A and B.
Lemma 4.3.7: Any mixing measure is ergodic.
Proof: Indeed, if A ⊆ X is completely f-invariant we have

$$ \mu(A)\,\mu(X\setminus A) = \lim_{n\to\infty} \mu\bigl(f^{-n}(A)\cap(X\setminus A)\bigr) = \mu\bigl(A\cap(X\setminus A)\bigr) = 0, $$

and hence µ(A) = 0 or µ(X \ A) = 0.
Remark 4.3.2. In the next section we shall show that an f-invariant probability measure µ is ergodic if and only if

$$ \lim_{n\to\infty} \frac{1}{n}\sum_{m=0}^{n-1} \mu\bigl(f^{-m}(A)\cap B\bigr) = \mu(A)\,\mu(B) $$

for every pair of measurable sets A and B.
Ergodicity is the statistical equivalent of topological transitivity, unique ergodicity of minimality, and
mixing of topological mixing. We shall give a better explanation of this in the next section, but meanwhile
we can prove the following:
Proposition 4.3.8: Let f : X → X be a homeomorphism of a locally compact Hausdorff space with a
countable basis of open sets and no isolated points. Assume that there exists an ergodic measure µ
with supp µ = X. Then f is topologically transitive.
Proof: Since supp µ = X, every open set has positive measure. But, since µ is ergodic, there cannot be two non-empty disjoint completely f-invariant open subsets of X and, by Corollary 1.4.5, f is topologically transitive.
Remark 4.3.3. If f is a homeomorphism, it is easy to check that the support of an f -invariant measure is
completely f -invariant. Then if an ergodic measure µ gives no mass to isolated points the previous argument
shows that f is topologically transitive when restricted to the support of µ. In the next section we shall
show that this is true even when f is not a homeomorphism.
Remark 4.3.4. Not every f-invariant measure for a topologically transitive map is ergodic. For instance, if µ₁ and µ₂ are two distinct Bernoulli measures on Ω₂, they are ergodic for the shift (they are mixing; see Example 4.3.5); on the other hand, µ = (µ₁ + µ₂)/2 is still shift-invariant and of full support, but it is not ergodic, because of Proposition 4.3.6.
Proposition 4.3.9: Let f : X → X be a uniquely ergodic continuous self-map of a compact metric space X,
and µ the unique element of Mf (X). Then f is minimal on supp µ.
Proof: Take x ∈ supp µ, and let Λ ⊆ supp µ be the closure of the orbit of x. Since Λ is clearly f-invariant, Theorem 4.2.4 yields an f|_Λ-invariant Borel measure ν. Define then a Borel measure ν̃ on X by setting ν̃(A) = ν(A ∩ Λ) for every Borel set A. Since Λ is f-invariant we have

$$ \tilde\nu\bigl(f^{-1}(A)\bigr) = \nu\bigl(f^{-1}(A)\cap\Lambda\bigr) = \nu\bigl((f|_\Lambda)^{-1}(A\cap\Lambda)\bigr) = \nu(A\cap\Lambda) = \tilde\nu(A), $$

and hence ν̃ is f-invariant. By unique ergodicity, ν̃ = µ; but since supp ν̃ ⊆ Λ, this implies Λ = supp µ, and we are done.
Exercise 4.3.2. Let f : X → X be a continuous map of a topological space X. Assume that there exists a
mixing measure µ. Prove that f is topologically mixing on supp µ.
We shall now describe some examples of ergodic and uniquely ergodic maps. We begin with some
generalities on topological groups.
Definition 4.3.6: A Borel measure on a topological group G invariant under all left (respectively, right) translations is said to be left (respectively, right) invariant.
We have the following important (and deep)
Theorem 4.3.10: (Haar) Let G be a locally compact topological group. Then there exists a unique (up to
multiplicative constants) locally finite left invariant Borel measure on G.
Definition 4.3.7: Let G be a compact topological group. The unique (by the previous theorem) Borel left
invariant probability measure on G is called the Haar measure of G.
Example 4.3.1. The usual Lebesgue measure on the torus Tn (the one induced by the Lebesgue measure
of Rn ), suitably normalized, is the Haar measure of Tn .
Lemma 4.3.11: The Haar measure µ of a compact group G is also right invariant, and invariant under continuous surjective homomorphisms.
Proof: Given x ∈ G let ν = (R_x)_*µ, where R_x is the right translation by x. Then ν is left invariant, because for all y ∈ G and Borel sets A we have

$$ \nu\bigl(L_y^{-1}(A)\bigr) = \mu\bigl(R_x^{-1}L_y^{-1}(A)\bigr) = \mu\bigl(L_y^{-1}R_x^{-1}(A)\bigr) = \mu\bigl(R_x^{-1}(A)\bigr) = \nu(A), $$
where Ly is the left translation by y. Since we clearly have ν(G) = 1, the uniqueness of the Haar measure
yields ν = µ, and thus µ is right invariant.
Analogously, if H: G → G is a continuous surjective homomorphism, setting ν = H_*µ the same argument shows that ν is left invariant with ν(G) = 1, and hence ν = µ, as desired.
Exercise 4.3.3.
Prove that the support of the Haar measure of a compact group G is the whole group G.
Theorem 4.3.12: Let G be a compact abelian group, µ its Haar measure, and Lx : G → G a left translation.
Then the following properties are equivalent:
(i) Lx is topologically transitive;
(ii) Lx is minimal;
(iii) µ is Lx -ergodic;
(iv) Lx is uniquely ergodic;
(v) the orbit of x is dense.
Proof : (i) ⇒ (ii) Lemma 1.4.2.
(ii) ⇒ (v) Obvious.
(v) ⇒ (iv) Take ν ∈ M^{L_x}(G); it suffices to prove that ν is the Haar measure of G. Given y ∈ G, choose a sequence {n_j} ⊂ N* such that L_x^{n_j−1}(x) = n_j x converges to y. We shall prove that

$$ \lim_{j\to\infty} \int_G \varphi\circ L_{n_j x}\,d\nu = \int_G \varphi\circ L_y\,d\nu \qquad \forall\varphi\in C^0(G). \tag{4.3.2} $$

By uniform continuity of ϕ, for every ε > 0 there is a neighbourhood U of the identity element e such that |ϕ(y₁) − ϕ(y₂)| ≤ ε for every y₂ ∈ y₁ + U. Now,

$$ |(\varphi\circ L_{n_j x})(z) - (\varphi\circ L_y)(z)| = |\varphi(z + n_j x) - \varphi(z + y)|; $$

if j is large enough we have n_j x − y ∈ U, and thus ‖ϕ ∘ L_{n_j x} − ϕ ∘ L_y‖_∞ ≤ ε, and (4.3.2) follows. But then

$$ \int_G \varphi\,d(L_y)_*\nu = \int_G \varphi\circ L_y\,d\nu = \lim_{j\to\infty}\int_G \varphi\circ L_{n_j x}\,d\nu = \lim_{j\to\infty}\int_G \varphi\,d(L_{n_j x})_*\nu = \int_G \varphi\,d\nu. $$
But this means that ν is left invariant, and hence it is the Haar measure of G, as claimed.
(iv) ⇒ (iii) Proposition 4.3.1.
(iii) ⇒ (i) If G is a finite group, then the Haar measure is the normalized counting measure. If x has
period strictly less than the cardinality of G, then the orbit of x is a completely Lx -invariant subset with
measure strictly greater than zero and strictly less than one, against the ergodicity assumption. So the orbit
of x must be the whole of G, and Lx is topologically transitive.
If G has an isolated point, then all points are isolated, and thus G (being compact) must be finite. So we can assume that G has no isolated points, and in this case the assertion follows from Proposition 4.3.8, because it is easy to see that supp µ = G (exercise).
Corollary 4.3.13: (i) Irrational rotations of S¹ are uniquely ergodic.
(ii) The Lebesgue measure of T^n is ergodic for a translation T_γ: T^n → T^n if and only if 1, γ₁, ..., γ_n are rationally independent, if and only if T_γ is uniquely ergodic.
Proof: It follows from Propositions 1.4.1, 1.4.7 and 4.3.12.
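Statement (i) can be watched in action numerically (the observable, rotation number and starting point below are assumed samples): by unique ergodicity, the orbit averages of a continuous ϕ converge to ∫ϕ dµ for every starting point, not just almost every one.

```python
import math

alpha = math.sqrt(2) - 1                # assumed irrational rotation number

def phi(x):
    return math.cos(2 * math.pi * x)    # its integral over S^1 is 0

x, n, acc = 0.123, 100000, 0.0          # arbitrary starting point
for _ in range(n):
    acc += phi(x)
    x = (x + alpha) % 1.0               # the rotation R_alpha

assert abs(acc / n) < 1e-3              # time average ≈ space average 0
```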
Example 4.3.2. The Lebesgue measure µ of S¹ is ergodic for the maps E_m: S¹ → S¹ with |m| > 1. Indeed, let ϕ ∈ L¹(S¹, µ) be E_m-invariant, and let

$$ \varphi(x) = \sum_{k=-\infty}^{+\infty} \varphi_k \exp(2\pi i k x) $$

be its expansion in Fourier series. Since ϕ ∘ E_m = ϕ almost everywhere, it follows that ϕ_{mk} = ϕ_k for all k ∈ Z. But |ϕ_k| → 0 as |k| → +∞; hence necessarily ϕ_k = 0 for k ≠ 0, and thus ϕ is constant. The assertion then follows from Lemma 4.3.2.
We end this section with a list of examples of mixing measures, together with an (unproved) proposition
useful in proving that a measure is mixing.
Definition 4.3.8: A collection C of measurable sets in a measure space (X, A, µ) is dense if for every A ∈ A and any ε > 0 there is A′ ∈ C so that µ(A △ A′) < ε, where A △ A′ = (A \ A′) ∪ (A′ \ A) is the symmetric difference. More generally, a collection C ⊂ A is said to be sufficient if the family of finite disjoint unions of elements of C is dense.
Proposition 4.3.14: Let f: X → X be an endomorphism of a measure space (X, A, µ). Then:
(i) if (4.3.1) holds for all A and B belonging to a sufficient collection of measurable sets, then µ is mixing;
(ii) µ is mixing if and only if, given a complete system Φ of functions in L²(X, µ), for any ϕ, ψ ∈ Φ one has

$$ \lim_{n\to+\infty} \int_X (\varphi\circ f^n)\,\psi\,d\mu = \int_X \varphi\,d\mu \cdot \int_X \psi\,d\mu. \tag{4.3.3} $$
Example 4.3.3. The Lebesgue measure of the torus is never mixing with respect to translations Tγ , while
it is mixing for hyperbolic automorphisms of T2 .
Example 4.3.4. The Lebesgue measure of S¹ is mixing with respect to the maps E_m with |m| > 1.

Example 4.3.5. The Bernoulli measure is mixing with respect to the full shift both in Ω_N and in Ω⁺_N.
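For m = 2, Example 4.3.4 can be checked by an exact computation on intervals (the sets A and B below are assumed samples, not from the text): E₂^{-n}(A) is a union of 2ⁿ short intervals, and its overlap with B approaches µ(A)µ(B).

```python
n = 12                       # number of iterations of E_2
m = 2 ** n
A_lo, A_hi = 0.0, 0.5        # assumed A = [0, 1/2),  mu(A) = 1/2
B_lo, B_hi = 0.3, 0.8        # assumed B = [0.3, 0.8), mu(B) = 1/2

# E_2^{-n}(A) is the union over i = 0..2^n - 1 of [(i + A_lo)/m, (i + A_hi)/m);
# sum the lengths of the intersections of these intervals with B.
overlap = 0.0
for i in range(m):
    lo, hi = (i + A_lo) / m, (i + A_hi) / m
    overlap += max(0.0, min(hi, B_hi) - max(lo, B_lo))

assert abs(overlap - 0.25) < 1e-3    # ≈ mu(A) * mu(B) = 1/4
```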
4.4 Birkhoff ’s theorem
The main result of this section is the following
Theorem 4.4.1: (Birkhoff) Let f: X → X be an endomorphism of a probability space (X, A, µ). Then for every ϕ ∈ L¹(X, µ) the limit

$$ \varphi_f(x) = \lim_{n\to+\infty} \frac{1}{n}\sum_{j=0}^{n-1} \varphi\bigl(f^j(x)\bigr) $$

exists µ-almost everywhere. Furthermore, ϕ_f ∈ L¹(X, µ) is f-invariant and

$$ \int_X \varphi_f\,d\mu = \int_X \varphi\,d\mu. \tag{4.4.1} $$
Proof : Let I = {A ∈ A | f −1 (A) = A} be the invariant σ-algebra associated to f . Notice that a function ψ: X → R is measurable with respect to I if and only if it is f -invariant and A-measurable.
Given ψ ∈ L¹(X, µ) set

$$ \Psi_n = \max_{1\le k\le n} \sum_{j=0}^{k-1} \psi\circ f^j \in L^1(X,\mu); $$

since Ψ_{n+1} ≥ Ψ_n, for µ-almost every x ∈ X we have either Ψ_n(x) → +∞ or the sequence {Ψ_n(x)} is bounded. Furthermore,

$$ \Psi_n\circ f = \Psi_{n+1} - \psi + \min\{0,\Psi_n\circ f\}, $$

and hence the set A_ψ = {x ∈ X | Ψ_n(x) → +∞} belongs to I. Furthermore,

$$ \Psi_{n+1} - \Psi_n\circ f = \psi - \min\{0,\Psi_n\circ f\} \searrow \psi \quad\text{on } A_\psi. $$

So, by the Dominated Convergence Theorem,

$$ 0 \le \int_{A_\psi} (\Psi_{n+1} - \Psi_n)\,d\mu = \int_{A_\psi} (\Psi_{n+1} - \Psi_n\circ f)\,d\mu \to \int_{A_\psi} \psi\,d\mu. \tag{4.4.2} $$

Now let ψ_I ∈ L¹(X, µ) be the (invariant) Radon-Nikodym derivative of ψµ|_I with respect to µ|_I, that is the unique f-invariant function in L¹(X, µ|_I) ⊂ L¹(X, µ) such that

$$ \int_A \psi\,d\mu = \int_A \psi_I\,d\mu $$

for all A ∈ I. In particular, (4.4.2) implies that if ψ_I < 0 then necessarily µ(A_ψ) = 0.

Another general remark is that we have

$$ \limsup_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \psi\circ f^k \le \limsup_{n\to\infty} \frac{\Psi_n}{n} = 0 \tag{4.4.3} $$

for µ-almost every x ∉ A_ψ.

Now take ε > 0 and put ψ = ϕ − ϕ_I − ε. Then ψ_I = −ε, which means that µ(A_ψ) = 0. But then (4.4.3) yields

$$ \limsup_{n\to+\infty} \frac{1}{n}\sum_{k=0}^{n-1} \varphi\circ f^k \le \varphi_I + \varepsilon $$

µ-almost everywhere. Replacing ϕ by −ϕ yields

$$ \liminf_{n\to+\infty} \frac{1}{n}\sum_{k=0}^{n-1} \varphi\circ f^k \ge \varphi_I - \varepsilon $$

µ-almost everywhere. It follows that ϕ_f = ϕ_I µ-almost everywhere, and in particular

$$ \int_X \varphi_f\,d\mu = \int_X \varphi_I\,d\mu = \int_X \varphi\,d\mu. $$
Definition 4.4.1: The function ϕ_f is called the time-average (or orbital average) of ϕ. In particular, if ϕ = χ_A is the characteristic function of a subset A, then

$$ (\chi_A)_f(x) = \lim_{n\to+\infty} \frac{1}{n}\,\#\{\,j\in\{0,\ldots,n-1\} \mid f^j(x)\in A\,\} $$

is denoted by τ_A and called the average time spent by x in A.

In particular, Birkhoff's theorem implies that

$$ \int_X \tau_A\,d\mu = \mu(A) $$

for all measurable sets A.
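A numerical sketch of the average time spent in a set (assumed rotation number and interval, not from the text): for an irrational rotation with Lebesgue measure, the fraction of time an orbit spends in A = [0.1, 0.4) approaches µ(A) = 0.3.

```python
import math

alpha = math.sqrt(2) - 1           # assumed irrational rotation number
x, n, hits = 0.0, 200000, 0
for _ in range(n):
    if 0.1 <= x < 0.4:             # A = [0.1, 0.4), Lebesgue measure 0.3
        hits += 1
    x = (x + alpha) % 1.0

assert abs(hits / n - 0.3) < 1e-2  # tau_A(x) ≈ mu(A)
```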
Corollary 4.4.2: Let f: X → X be an endomorphism of a probability space (X, A, µ), and 1 ≤ p ≤ ∞. Then for every ϕ ∈ L^p(X, µ) the time-average ϕ_f belongs to L^p(X, µ) with ‖ϕ_f‖_p ≤ ‖ϕ‖_p, and

$$ \lim_{n\to\infty} \Bigl\| \varphi_f - \frac{1}{n}\sum_{j=0}^{n-1} \varphi\circ f^j \Bigr\|_p = 0 \tag{4.4.4} $$

for 1 ≤ p < ∞.
Proof: First of all, applying Birkhoff's theorem to |ϕ| we get

$$ |\varphi_f(x)| \le \lim_{n\to+\infty} \frac{1}{n}\sum_{j=0}^{n-1} \bigl|\varphi\bigl(f^j(x)\bigr)\bigr| = |\varphi|_f(x) \tag{4.4.5} $$

for µ-almost all x ∈ X. In particular, if ϕ ∈ L^∞(X, µ) then ϕ_f ∈ L^∞(X, µ) and ‖ϕ_f‖_∞ ≤ ‖ϕ‖_∞.
If 1 ≤ p < ∞ we have

$$ \Bigl\| \frac{1}{n}\sum_{j=0}^{n-1} |\varphi\circ f^j| \Bigr\|_p \le \frac{1}{n}\sum_{j=0}^{n-1} \|\varphi\circ f^j\|_p = \frac{1}{n}\sum_{j=0}^{n-1} \|\varphi\|_p = \|\varphi\|_p, $$

where we used the fact that ‖ϕ ∘ f^j‖_p = ‖ϕ‖_p because f is measure-preserving. Then (4.4.5) and Fatou's lemma imply that ϕ_f ∈ L^p(X, µ) and ‖ϕ_f‖_p ≤ ‖ϕ‖_p.
To prove (4.4.4), assume for a moment that ϕ ∈ L^∞(X, µ) ⊂ L¹(X, µ). Then

$$ \Bigl| \varphi_f(x) - \frac{1}{n}\sum_{j=0}^{n-1} \varphi\circ f^j(x) \Bigr|^p \le \Bigl( \|\varphi_f\|_\infty + \frac{1}{n}\sum_{j=0}^{n-1} \|\varphi\circ f^j\|_\infty \Bigr)^p \le 2^p\|\varphi\|_\infty^p, $$

and (4.4.4) follows from the Dominated Convergence Theorem.
Take now ϕ ∈ L^p(X, µ). Given ε > 0, choose ϕ̃ ∈ L^∞(X, µ) so that ‖ϕ − ϕ̃‖_p ≤ ε/3, and choose N > 0 such that

$$ \Bigl\| \tilde\varphi_f - \frac{1}{n}\sum_{j=0}^{n-1} \tilde\varphi\circ f^j \Bigr\|_p \le \varepsilon/3 \qquad \forall n\ge N. $$

Then ‖ϕ_f − ϕ̃_f‖_p = ‖(ϕ − ϕ̃)_f‖_p ≤ ‖ϕ − ϕ̃‖_p ≤ ε/3 and

$$ \Bigl\| \frac{1}{n}\sum_{j=0}^{n-1} (\varphi - \tilde\varphi)\circ f^j \Bigr\|_p \le \|\varphi - \tilde\varphi\|_p \le \varepsilon/3; $$

summing up we get

$$ \Bigl\| \varphi_f - \frac{1}{n}\sum_{j=0}^{n-1} \varphi\circ f^j \Bigr\|_p \le \varepsilon \qquad \forall n\ge N, $$

and we are done.
Corollary 4.4.3: Let f: X → X be an endomorphism of a probability space (X, A, µ). Then for every pair of measurable sets A, B ∈ A the limit

$$ \lim_{n\to+\infty} \frac{1}{n}\sum_{j=0}^{n-1} \mu\bigl(f^{-j}(A)\cap B\bigr) $$

exists.
Proof: We have

$$ \mu\bigl(f^{-j}(A)\cap B\bigr) = \int_X \chi_{f^{-j}(A)}\,\chi_B\,d\mu = \int_X (\chi_A\circ f^j)\,\chi_B\,d\mu. $$

Since χ_A ∈ L¹(X, µ), we can apply the previous corollary to conclude that the sequence

$$ \frac{1}{n}\sum_{j=0}^{n-1} \chi_A\circ f^j $$

converges in L¹(X, µ). Thus

$$ \frac{1}{n}\sum_{j=0}^{n-1} \mu\bigl(f^{-j}(A)\cap B\bigr) = \int_X \Bigl( \frac{1}{n}\sum_{j=0}^{n-1} \chi_A\circ f^j \Bigr)\chi_B\,d\mu $$

also converges.
Corollary 4.4.4: Let f: X → X be an endomorphism of a probability space (X, A, µ). Then the following properties are equivalent:
(i) µ is ergodic;
(ii) for every A, B ∈ A we have

$$ \lim_{n\to+\infty} \frac{1}{n}\sum_{j=0}^{n-1} \mu\bigl(f^{-j}(A)\cap B\bigr) = \mu(A)\cdot\mu(B); $$

(iii) for every ϕ ∈ L¹(X, µ) we have

$$ \varphi_f(x) = \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} \varphi\bigl(f^j(x)\bigr) \equiv \int_X \varphi\,d\mu $$

µ-almost everywhere; in other words, the time-average is almost always equal to the space average;
(iv) τ_A ≡ µ(A) µ-almost everywhere for all A ∈ A.
Proof : (i) ⇒ (iii) Since ϕf is f -invariant, it must be constant µ-almost everywhere (Lemma 4.3.2), and the
assertion follows from (4.4.1).
(iii) ⇒ (iv) Obvious.
(iv) ⇒ (ii) By the Dominated Convergence Theorem
$$\mu(A)\mu(B) = \int_X \tau_A\,\chi_B\,d\mu = \int_X\left(\lim_{n\to+\infty}\frac{1}{n}\sum_{j=0}^{n-1}\chi_A\circ f^j\right)\chi_B\,d\mu = \lim_{n\to+\infty}\frac{1}{n}\sum_{j=0}^{n-1}\int_X(\chi_A\circ f^j)\,\chi_B\,d\mu = \lim_{n\to+\infty}\frac{1}{n}\sum_{j=0}^{n-1}\mu\bigl(f^{-j}(A)\cap B\bigr).$$
(ii) ⇒ (i) If A is completely f -invariant we have
$$\mu(A)\mu(X\setminus A) = \lim_{n\to+\infty}\frac{1}{n}\sum_{j=0}^{n-1}\mu\bigl(f^{-j}(A)\cap(X\setminus A)\bigr) = \lim_{n\to+\infty}\frac{1}{n}\sum_{j=0}^{n-1}\mu\bigl(A\cap(X\setminus A)\bigr) = 0,$$
and µ is ergodic.
Remark 4.4.1. Corollary 4.4.4.(iv) says that if µ is ergodic then the average time spent in A by µ-almost
every point is exactly equal to the measure of A.
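Remark 4.4.1 is easy to probe numerically. The sketch below (a Python illustration; the rotation, the arc and the sample size are ad-hoc choices) uses an irrational rotation of S 1 , which is ergodic with respect to the Lebesgue measure, and checks that the average time spent in [0, 1/2) approaches 1/2:

```python
import math

def orbit_fraction_in(alpha, x0, n, a=0.0, b=0.5):
    """Fraction of the first n orbit points of x0 under the rotation
    R_alpha(x) = x + alpha (mod 1) that land in the arc [a, b)."""
    x, hits = x0, 0
    for _ in range(n):
        if a <= x < b:
            hits += 1
        x = (x + alpha) % 1.0
    return hits / n

# alpha irrational, so R_alpha is ergodic for the Lebesgue measure and
# the time spent in A = [0, 1/2) should approach mu(A) = 1/2.
alpha = math.sqrt(2) - 1
print(orbit_fraction_in(alpha, 0.1234, 100_000))  # close to 0.5
```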
Corollary 4.4.5: Let f : X → X be a self-map of a separable metric space X, and µ an ergodic Borel
probability measure. Then the orbit of µ-almost every x ∈ supp µ is dense in supp µ. In particular, f is
topologically transitive on supp µ.
Proof : Let {Um }m∈N be a countable basis of open sets for the induced topology on supp µ; clearly, µ(Um ) > 0
for all m ∈ N. Applying Corollary 4.4.4.(iv) simultaneously to the characteristic functions of the Um we obtain
a set A ⊆ supp µ of full measure such that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\chi_{U_m}\bigl(f^j(x)\bigr) = \mu(U_m) > 0$$
for all x ∈ A and all m ∈ N. But then the orbit of every x ∈ A must intersect all the Um 's, and hence is dense.
Example 4.4.1. Let f : [0, 1] → R be continuous, and 0 < x < 1. Let 0.a1 a2 a3 . . . be the decimal expansion
of x, and set xn = 0.an an+1 . . .. Then
$$\lim_{n\to+\infty}\frac{f(x_1)+\cdots+f(x_n)}{n} = \int_0^1 f(x)\,dx$$
for almost every x ∈ (0, 1). Indeed, since xn+1 = E10 (xn ), the limit on the left is the time average fE10 of f , and E10 is ergodic with respect to the Lebesgue measure.
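For a Lebesgue-typical point the digits a1 a2 . . . behave like independent uniform digits, so Example 4.4.1 can be simulated with a random digit string (a sketch under that assumption; the truncation to 30 digits is only there to keep floating-point arithmetic accurate, and f (x) = x2 is an arbitrary test function):

```python
import random

def digit_shift_average(f, digits, depth=30):
    """Average of f(x_n), where x_n = 0.d_n d_{n+1}... is built from a
    digit sequence truncated to `depth` digits for float precision."""
    n = len(digits) - depth
    total = 0.0
    for i in range(n):
        x = sum(d / 10 ** (k + 1) for k, d in enumerate(digits[i:i + depth]))
        total += f(x)
    return total / n

random.seed(0)
digits = [random.randrange(10) for _ in range(20_030)]
avg = digit_shift_average(lambda x: x * x, digits)
print(avg)  # close to the integral of x^2 over [0,1], i.e. 1/3
```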
One might expect that if µ is ergodic and ϕ ∈ C 0 (X) then the time average should be constant everywhere,
instead of just constant µ-almost everywhere. This is not the case, as shown by the following
Theorem 4.4.6: Let f : X → X be a continuous self-map of a compact metric space X. Then the following
properties are equivalent:
(i) f is uniquely ergodic;
(ii) for every ϕ ∈ C 0 (X) the limit
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\varphi\bigl(f^j(x)\bigr)$$
exists for every x ∈ X and does not depend on x;
(iii) for every ϕ ∈ C 0 (X) the sequence of functions
$$\frac{1}{n}\sum_{j=0}^{n-1}\varphi\bigl(f^j(x)\bigr)\qquad(4.4.6)$$
converges uniformly to a constant.
Proof : (i) ⇒ (iii) If (iii) does not hold, there is ϕ ∈ C 0 (X) so that the sequence (4.4.6) does not converge
uniformly to a constant; in particular, it does not converge uniformly to $\int_X\varphi\,d\mu$, where µ is the unique
element of Mf (X). Then there exist ε > 0, a sequence nk → +∞, and a sequence {xk } ⊂ X such that
$$\forall k\in\mathbb{N}\qquad \left|\frac{1}{n_k}\sum_{j=0}^{n_k-1}\varphi\bigl(f^j(x_k)\bigr) - \int_X\varphi\,d\mu\right| \ge \varepsilon.$$
Theorem 4.2.1 implies that for every k ∈ N there exists µk ∈ M(X) such that
$$\forall\psi\in C^0(X)\qquad \int_X\psi\,d\mu_k = \frac{1}{n_k}\sum_{j=0}^{n_k-1}\psi\bigl(f^j(x_k)\bigr).$$
Since M(X) is weakly-* compact, we can assume that the sequence {µk } converges to ν ∈ M(X). We claim
that ν ∈ Mf (X). Indeed,
$$\int_X(\psi\circ f)\,d\nu = \lim_{k\to\infty}\int_X(\psi\circ f)\,d\mu_k = \lim_{k\to\infty}\frac{1}{n_k}\sum_{j=0}^{n_k-1}\psi\bigl(f^{j+1}(x_k)\bigr) = \lim_{k\to\infty}\int_X\psi\,d\mu_k + \lim_{k\to\infty}\frac{1}{n_k}\bigl[\psi\bigl(f^{n_k}(x_k)\bigr)-\psi(x_k)\bigr] = \int_X\psi\,d\nu$$
for every ψ ∈ C 0 (X), because ‖ψ ◦ f nk − ψ‖∞ ≤ 2‖ψ‖∞ , and hence ν is f -invariant.
But
$$\left|\int_X\varphi\,d\nu - \int_X\varphi\,d\mu\right| = \lim_{k\to\infty}\left|\int_X\varphi\,d\mu_k - \int_X\varphi\,d\mu\right| = \lim_{k\to+\infty}\left|\frac{1}{n_k}\sum_{j=0}^{n_k-1}\varphi\bigl(f^j(x_k)\bigr) - \int_X\varphi\,d\mu\right| \ge \varepsilon;$$
so ν ≠ µ, and f is not uniquely ergodic.
(iii) ⇒ (ii) Obvious.
(ii) ⇒ (i) Let T : C 0 (X) → R be the functional defined by
$$T(\varphi) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\varphi\bigl(f^j(x)\bigr)$$
for any x ∈ X. Since T is clearly positive, bounded and linear, there is a measure ν ∈ M(X) such that
$$\forall\varphi\in C^0(X)\qquad \int_X\varphi\,d\nu = T(\varphi).$$
Furthermore, ν is f -invariant, because
$$T(\varphi\circ f) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\varphi\bigl(f^{j+1}(x)\bigr) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\varphi\bigl(f^j(x)\bigr) + \lim_{n\to\infty}\frac{1}{n}\bigl[\varphi\bigl(f^n(x)\bigr)-\varphi(x)\bigr] = T(\varphi)$$
for all ϕ ∈ C 0 (X).
Now take any µ ∈ Mf (X). Then the f -invariance of µ and the Dominated Convergence Theorem yield
$$\int_X\varphi\,d\mu = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\int_X(\varphi\circ f^j)\,d\mu = \int_X\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}(\varphi\circ f^j)\,d\mu = \int_X T(\varphi)\,d\mu = T(\varphi) = \int_X\varphi\,d\nu$$
for every ϕ ∈ C 0 (X). So µ = ν, and f is uniquely ergodic.
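The content of (ii) — convergence for every starting point, not just almost every one — can be seen numerically for an irrational rotation, a standard example of a uniquely ergodic map (assumed known here; the observable, the rotation number and the sample sizes are ad-hoc choices):

```python
import math

def birkhoff_avg(phi, x0, alpha, n):
    """Birkhoff average (1/n) * sum of phi(f^j(x0)) for the circle
    rotation f(x) = x + alpha (mod 1)."""
    x, s = x0, 0.0
    for _ in range(n):
        s += phi(x)
        x = (x + alpha) % 1.0
    return s / n

phi = lambda x: math.cos(2 * math.pi * x)  # its integral over [0,1] is 0
alpha = math.sqrt(2) - 1                   # irrational rotation number
# Unique ergodicity predicts the averages approach 0 for EVERY x0:
worst = max(abs(birkhoff_avg(phi, x0 / 7.0, alpha, 50_000)) for x0 in range(7))
print(worst)  # small, uniformly over the starting points tried
```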
4.5 Topological entropy
We now introduce a way to measure the complexity of the orbits of a topological dynamical system. The
idea is to count the number of orbit segments of given length n we can distinguish up to a finite precision ε,
and then let n go to infinity and ε go to zero.
Let us begin with some definitions.
Definition 4.5.1: Let f : X → X be a continuous self-map of a compact metric space. Given n ∈ N∗ ,
let dfn : X × X → R+ be the distance given by
$$\forall x,y\in X\qquad d_n^f(x,y) = \max_{0\le j\le n-1} d\bigl(f^j(x),f^j(y)\bigr).$$
We shall denote by Bf (x, ε, n) the open ball of center x and radius ε for dfn .
Remark 4.5.1. We have dfn+1 ≥ dfn , and hence Bf (x, ε, n + 1) ⊆ Bf (x, ε, n).
Definition 4.5.2: Let f : X → X be a continuous self-map of a compact metric space. A set E ⊆ X is
(n, ε)-spanning if $X = \bigcup_{x\in E} B_f(x,\varepsilon,n)$. We shall denote by Sd (f, ε, n) the minimal cardinality of an (n, ε)-spanning set.
Roughly speaking, Sd (f, ε, n) is the minimal number of initial conditions needed to approximate all
orbit segments of length n up to a precision ε.
Let us now put
$$h_d(f,\varepsilon) = \limsup_{n\to\infty}\frac{1}{n}\log S_d(f,\varepsilon,n)\in[0,+\infty].$$
Since ε1 ≤ ε implies Sd (f, ε1 , n) ≥ Sd (f, ε, n) for all n ∈ N∗ , and hence hd (f, ε1 ) ≥ hd (f, ε), the limit
$$\lim_{\varepsilon\to0^+}h_d(f,\varepsilon)\in[0,+\infty]\qquad(4.5.1)$$
exists.
Lemma 4.5.1: The limit (4.5.1) does not depend on the distance chosen to define the topology of X.
Definition 4.5.3: Let f : X → X be a continuous self-map of a compact metric space. Then the topological
entropy h(f ) ∈ [0, +∞] of f is
$$h(f) = \lim_{\varepsilon\to0^+}h_d(f,\varepsilon) = \lim_{\varepsilon\to0^+}\limsup_{n\to\infty}\frac{1}{n}\log S_d(f,\varepsilon,n),$$
where d is any distance inducing the topology of X. We shall sometimes write htop (f ) instead of h(f ).
Lemma 4.5.2: The topological entropy is invariant under topological conjugacy. More generally, if g is a
factor of f , then h(g) ≤ h(f ).
Proof : Let h: X → Y be a semiconjugation between f : X → X and g: Y → Y , so that h ◦ f = g ◦ h. Fix
a distance dX on X, and a distance dY on Y . Since h is uniformly continuous, for every ε > 0 there is
a δ(ε) > 0 so that dX (x1 , x2 ) < δ(ε) implies $d_Y\bigl(h(x_1),h(x_2)\bigr) < \varepsilon$. Then $h\bigl(B_f(x,\delta(\varepsilon),n)\bigr) \subseteq B_g\bigl(h(x),\varepsilon,n\bigr)$,
and hence
$$S_{d_X}\bigl(f,\delta(\varepsilon),n\bigr) \ge S_{d_Y}(g,\varepsilon,n)$$
for every ε > 0 and n ∈ N∗ , because h is surjective. Taking logarithms and limits we get the assertion.
There are two other ways of defining the topological entropy.
Definition 4.5.4: Let f : X → X be a continuous self-map of a compact metric space. Let Dd (f, ε, n) be the
minimal cardinality of a cover of X composed of sets of dfn -diameter at most ε.
We clearly have
$$S_d(f,\varepsilon,n) \le D_d(f,\varepsilon,n) \le S_d(f,\varepsilon/2,n).\qquad(4.5.2)$$
Lemma 4.5.3: For any ε > 0 the limit
$$\lim_{n\to\infty}\frac{1}{n}\log D_d(f,\varepsilon,n)$$
exists. It follows that
$$\lim_{\varepsilon\to0^+}\lim_{n\to\infty}\frac{1}{n}\log D_d(f,\varepsilon,n) = h(f).$$
Remark 4.5.2. Since (4.5.2) also implies
$$D_d(f,2\varepsilon,n) \le S_d(f,\varepsilon,n) \le D_d(f,\varepsilon,n),$$
the previous lemma yields
$$\lim_{\varepsilon\to0^+}\liminf_{n\to\infty}\frac{1}{n}\log S_d(f,\varepsilon,n) = h(f).$$
Definition 4.5.5: Let f : X → X be a continuous self-map of a compact metric space. We say that a set E ⊂ X
is (n, ε)-separated if dfn (x, y) ≥ ε for all distinct x, y ∈ E. We denote by Nd (f, ε, n) the maximal cardinality of an
(n, ε)-separated set.
Since it is easy to see that
$$S_d(f,\varepsilon,n) \le N_d(f,\varepsilon,n) \le S_d(f,\varepsilon/2,n),$$
it follows that
$$\lim_{\varepsilon\to0^+}\limsup_{n\to\infty}\frac{1}{n}\log N_d(f,\varepsilon,n) = \lim_{\varepsilon\to0^+}\liminf_{n\to\infty}\frac{1}{n}\log N_d(f,\varepsilon,n) = h(f).$$
Definition 4.5.6: If U is an open cover of a compact space X, let N (U) denote the minimal cardinality of a
subcover of U. If V is another open cover, let
U ∨ V = {U ∩ V | U ∈ U, V ∈ V}.
Finally, if f : X → X is continuous, set
f −1 (U) = {f −1 (U ) | U ∈ U}.
Proposition 4.5.4: Let f : X → X be a continuous self-map of a compact metric space X. Then:
(i) for every open cover U of X the limit
$$\lim_{n\to\infty}\frac{1}{n}\log N\bigl(\mathcal{U}\vee f^{-1}(\mathcal{U})\vee\cdots\vee f^{-(n-1)}(\mathcal{U})\bigr)$$
exists;
(ii) we have
$$h(f) = \sup_{\mathcal{U}}\,\lim_{n\to\infty}\frac{1}{n}\log N\bigl(\mathcal{U}\vee f^{-1}(\mathcal{U})\vee\cdots\vee f^{-(n-1)}(\mathcal{U})\bigr),$$
where the supremum is taken over all open covers of X.
We now collect a few properties of the topological entropy:
Proposition 4.5.5: Let f : X → X be a continuous self-map of a compact metric space. Then:
(i) if Λ ⊆ X is a closed f -invariant set then h(f |Λ ) ≤ h(f );
(ii) if X = Λ1 ∪ · · · ∪ Λm , where Λ1 , . . . , Λm are closed f -invariant subsets, then h(f ) = max1≤j≤m h(f |Λj );
(iii) if m ∈ N then h(f m ) = mh(f );
(iv) if f is a homeomorphism then h(f m ) = |m|h(f ) for all m ∈ Z;
(v) if g: Y → Y is another continuous self-map of a compact metric space, then h(f × g) = h(f ) + h(g).
Example 4.5.1. If f : X → X is an isometry of a compact metric space, then dfn = d for all n ∈ N∗ , and
so h(f ) = 0. In particular, the topological entropy of rotations of S 1 or of translations of the torus is zero.
Example 4.5.2. The topological entropy of Em : S 1 → S 1 is log |m|.
Example 4.5.3. The topological entropy of the shift σN : ΩN → ΩN is log N .
Example 4.5.4. If FL : T2 → T2 is given by FL (x, y) = (2x + y, x + y) (mod 1) then h(FL ) = log((3 + √5)/2).
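Example 4.5.2 can be probed numerically for E2 : the maximal cardinality Nd (f, ε, n) of (n, ε)-separated sets should grow like en log 2 . The sketch below builds a separated set greedily on a grid; a greedily built maximal separated set is in general not of maximum cardinality, but it still captures the exponential growth rate (the grid size and ε are ad-hoc choices):

```python
def circle_dist(x, y):
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def dn(x, y, n):
    """Bowen distance d_n for the doubling map f(x) = 2x mod 1."""
    return max(circle_dist((2 ** j * x) % 1.0, (2 ** j * y) % 1.0)
               for j in range(n))

def greedy_separated(n, eps, grid=1024):
    """Greedily collect an (n, eps)-separated set from grid points."""
    chosen = []
    for i in range(grid):
        x = i / grid
        if all(dn(x, y, n) >= eps for y in chosen):
            chosen.append(x)
    return chosen

counts = [len(greedy_separated(n, 0.25)) for n in range(1, 7)]
print(counts)  # [4, 8, 16, 32, 64, 128]: doubles with n, matching log 2
```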
Definition 4.5.7: Let (X, d) be a compact metric space. For ε > 0 let b(ε) be the minimal cardinality of a
cover of X by ε-balls. The ball dimension of X is
$$D(X) = \limsup_{\varepsilon\to0}\frac{\log b(\varepsilon)}{|\log\varepsilon|}\in[0,+\infty].$$
Theorem 4.5.6: Let f : X → X be a Lipschitz-continuous self-map of a compact metric space X. Then
$$h(f) \le D(X)\,\max\{0,\log\operatorname{Lip}(f)\}.$$
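As a sanity check of Definition 4.5.7, the ball dimension of the middle-thirds Cantor set is log 2/ log 3, and this is easy to estimate numerically. The sketch counts occupied bins of width ε as a stand-in for b(ε); this changes b(ε) by at most a bounded factor, which does not affect the limit (the truncation depth and the range of scales are ad-hoc choices):

```python
import math
from itertools import product

# Points of the middle-thirds Cantor set truncated at depth 10:
# x = sum of a_i 3^{-i} with a_i in {0, 2}.
points = [sum(a / 3 ** (i + 1) for i, a in enumerate(digits))
          for digits in product((0, 2), repeat=10)]

def occupied_bins(eps):
    """Number of width-eps bins meeting the point set (proxy for b(eps)).
    The tiny offset guards against points that sit exactly on a bin
    boundary being misrounded by floating-point error."""
    return len({math.floor(x / eps + 1e-9) for x in points})

# The slope of log b(eps) against |log eps| estimates the ball dimension.
k1, k2 = 2, 8
est = (math.log(occupied_bins(3.0 ** -k2)) -
       math.log(occupied_bins(3.0 ** -k1))) / ((k2 - k1) * math.log(3))
print(est)  # close to log 2 / log 3 ≈ 0.6309
```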
4.6 Measure-theoretic entropy
We shall now describe a more quantitative concept of entropy, obtained using a probability measure.
Definition 4.6.1: Let (X, A, µ) be a probability space. A measurable partition of X is a family of measurable
subsets P = {Cα | α ∈ I}, where I is a finite or countable set of indices, such that $\mu\bigl(X\setminus\bigcup_\alpha C_\alpha\bigr)=0$
and µ(Cα1 ∩ Cα2 ) = 0 when α1 ≠ α2 . Elements of P are called atoms of P. Given two measurable
partitions P and Q, we say that P = Q (mod 0) if for every C ∈ P there is D ∈ Q such that µ(C △ D) = 0,
and conversely. If f : X → X is a measurable map, we set f −1 (P) = {f −1 (Cα ) | α ∈ I}.
Definition 4.6.2: Let P be a measurable partition of a probability space (X, µ). For x ∈ X, let P(x) be the
atom of P containing x (this is well-defined outside of a set of zero measure). The information function of
the partition P is the measurable function IP : X → R given by
$$I_{\mathcal{P}}(x) = -\log\mu\bigl(\mathcal{P}(x)\bigr).$$
We can think of a partition of X as grouping together the elements of X we cannot tell apart using some
measuring instrument. If the atom containing x is small, then the information obtained by measuring x is
large; this is the meaning of the information function.
Definition 4.6.3: Let (X, A, µ) be a probability space. The entropy of a measurable partition P = {Cα | α ∈ I}
of X is given by
$$h_\mu(\mathcal{P}) = -\sum_{\substack{\alpha\in I\\ \mu(C_\alpha)>0}}\mu(C_\alpha)\log\mu(C_\alpha) = \int_X I_{\mathcal{P}}\,d\mu \in [0,+\infty].$$
We shall also need a conditional notion of entropy.
We shall also need a conditional notion of entropy.
Definition 4.6.4: Let P = {Cα | α ∈ I} and Q = {Dβ | β ∈ J} be two measurable partitions of a probability
space (X, µ). The conditional information function IP,Q : X → R of P with respect to Q is defined by
$$I_{\mathcal{P},\mathcal{Q}}(x) = -\log\mu\bigl(\mathcal{P}(x)\mid\mathcal{Q}(x)\bigr) = -\log\mu\bigl(\mathcal{P}(x)\cap\mathcal{Q}(x)\bigr) + \log\mu\bigl(\mathcal{Q}(x)\bigr).$$
The conditional entropy of P with respect to Q is given by
$$h_\mu(\mathcal{P}\mid\mathcal{Q}) = \int_X I_{\mathcal{P},\mathcal{Q}}\,d\mu = -\sum_{\beta\in J}\mu(D_\beta)\sum_{\alpha\in I}\mu(C_\alpha\mid D_\beta)\log\mu(C_\alpha\mid D_\beta),$$
where for simplicity we set 0 log 0 = 0.
Definition 4.6.5: Let P = {Cα | α ∈ I} and Q = {Dβ | β ∈ J} be two measurable partitions of a probability
space (X, µ). We say that P is subordinate to Q, or that Q is a refinement of P, and we write P ≤ Q (mod 0),
if for every D ∈ Q there is C ∈ P so that D ⊆ C (mod 0), that is µ(D \ C) = 0. The joint partition of P
and Q is
$$\mathcal{P}\vee\mathcal{Q} = \{C\cap D \mid C\in\mathcal{P},\ D\in\mathcal{Q},\ \mu(C\cap D)>0\};$$
clearly P, Q ≤ P ∨ Q. Finally, we say that P and Q are independent if
$$\forall C\in\mathcal{P},\ D\in\mathcal{Q}\qquad \mu(C\cap D) = \mu(C)\mu(D).$$
The following proposition summarizes the technical properties of the entropy of partitions.
Proposition 4.6.1: Let (X, A, µ) be a probability space, and let P = {Cα | α ∈ I}, Q = {Dβ | β ∈ J}
and R = {Eγ | γ ∈ K} be measurable partitions of X. Then:
(i) we have
$$0 \le -\log\Bigl(\sup_{\alpha\in I}\mu(C_\alpha)\Bigr) \le h_\mu(\mathcal{P}) \le \log\operatorname{card}\mathcal{P};$$
furthermore, if P is finite then hµ (P) = log card P if and only if all the atoms of P have equal measure;
(ii) we have
$$0 \le h_\mu(\mathcal{P}\mid\mathcal{Q}) \le h_\mu(\mathcal{P});$$
furthermore, we have hµ (P | Q) = hµ (P) if and only if P and Q are independent, and hµ (P | Q) = 0
if and only if P ≤ Q (mod 0);
(iii) if Q ≤ R then hµ (P | Q) ≥ hµ (P | R);
(iv) we have
$$h_\mu(\mathcal{P}\vee\mathcal{Q}) = h_\mu(\mathcal{P}) + h_\mu(\mathcal{Q}\mid\mathcal{P}) = h_\mu(\mathcal{Q}) + h_\mu(\mathcal{P}\mid\mathcal{Q})$$
and, more generally,
$$h_\mu(\mathcal{P}\vee\mathcal{Q}\mid\mathcal{R}) = h_\mu(\mathcal{P}\mid\mathcal{R}) + h_\mu(\mathcal{Q}\mid\mathcal{P}\vee\mathcal{R});$$
(v) we have
$$h_\mu(\mathcal{P}\vee\mathcal{Q}) \le h_\mu(\mathcal{P}) + h_\mu(\mathcal{Q})$$
and, more generally,
$$h_\mu(\mathcal{P}\vee\mathcal{Q}\mid\mathcal{R}) \le h_\mu(\mathcal{P}\mid\mathcal{R}) + h_\mu(\mathcal{Q}\mid\mathcal{R});$$
(vi) we have
$$h_\mu(\mathcal{P}\mid\mathcal{R}) \le h_\mu(\mathcal{P}\mid\mathcal{Q}) + h_\mu(\mathcal{Q}\mid\mathcal{R});$$
(vii) if ν is another probability measure on X and P is a partition measurable with respect both to µ and
to ν, then
$$\forall t\in[0,1]\qquad t\,h_\mu(\mathcal{P}) + (1-t)\,h_\nu(\mathcal{P}) \le h_{t\mu+(1-t)\nu}(\mathcal{P}).$$
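For finite partitions the identities of Proposition 4.6.1 reduce to sums over a joint probability table, so the chain rule (iv) can be checked directly; the 2 × 3 table below is an arbitrary example:

```python
import math

def H(ps):
    """Entropy -sum p log p of a probability vector (0 log 0 = 0)."""
    return -sum(p * math.log(p) for p in ps if p > 0)

# Joint measures mu(C_i ∩ D_j) of the atoms of P ∨ Q, for a partition
# P with 2 atoms and Q with 3 atoms (arbitrary example values).
joint = [[0.10, 0.25, 0.15],
         [0.20, 0.05, 0.25]]

p_P = [sum(row) for row in joint]               # marginals mu(C_i) of P
p_Q = [sum(col) for col in zip(*joint)]         # marginals mu(D_j) of Q
h_joint = H([x for row in joint for x in row])  # h(P ∨ Q)
# Conditional entropy h(Q | P) = sum_i mu(C_i) H( mu(D_j | C_i) ):
h_Q_given_P = sum(p * H([x / p for x in row]) for p, row in zip(p_P, joint))

# Chain rule (iv): h(P ∨ Q) = h(P) + h(Q | P)
print(h_joint, H(p_P) + h_Q_given_P)  # the two numbers agree
```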
Corollary 4.6.2: If P and Q are two measurable partitions with finite entropy of a probability space (X, µ),
set
dR (P, Q) = hµ (P | Q) + hµ (Q | P).
Then dR is a distance on the set of (all equivalence classes mod 0 of) measurable partitions with finite
entropy on X.
Definition 4.6.6: The distance dR is called the Rokhlin distance.
To define the metric entropy we need the following
Lemma 4.6.3: Let {an } ⊂ R+ be a sequence of positive real numbers such that an+m ≤ an + am for
all n, m ∈ N. Then
$$\lim_{n\to+\infty}\frac{a_n}{n} = \inf_{n\in\mathbb{N}}\frac{a_n}{n} \ge 0.$$
Proof : Let c = inf n an /n. Given ε > 0, let n0 ∈ N be such that an0 /n0 ≤ c + ε. If n > n0 , write n = n0 p + q
with 0 ≤ q < n0 and p ≥ 1. Then
$$\frac{a_n}{n} = \frac{a_{n_0p+q}}{n_0p+q} \le \frac{p\,a_{n_0}}{n_0p} + \frac{a_q}{n} = \frac{a_{n_0}}{n_0} + \frac{a_q}{n} \le c + \varepsilon + \frac{1}{n}\sup_{0\le j<n_0}a_j.$$
Hence for n large enough we have
$$c \le \frac{a_n}{n} \le c + 2\varepsilon,$$
and we are done.
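Lemma 4.6.3 can be illustrated with a concrete subadditive sequence; an = n + √n is one convenient choice (an illustrative example, not taken from the text):

```python
import math

# A subadditive sequence: a_n = n + sqrt(n) satisfies
# a_{n+m} <= a_n + a_m, because sqrt(n+m) <= sqrt(n) + sqrt(m).
def a(n):
    return n + math.sqrt(n)

ratios = [a(n) / n for n in (1, 10, 100, 10_000, 1_000_000)]
print(ratios)  # strictly decreasing, approaching inf_n a_n/n = 1
```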
Let f : X → X be an endomorphism of the probability space (X, A, µ), and P a measurable partition
with finite entropy. Then we clearly have $h_\mu\bigl(f^{-1}(\mathcal{P})\bigr) = h_\mu(\mathcal{P})$, and so, setting
$$\mathcal{P}_n^f = \mathcal{P}\vee f^{-1}(\mathcal{P})\vee\cdots\vee f^{-(n-1)}(\mathcal{P}),$$
Proposition 4.6.1.(v) yields
$$\forall m,n\in\mathbb{N}\qquad h_\mu\bigl(\mathcal{P}_{n+m}^f\bigr) \le h_\mu\bigl(\mathcal{P}_n^f\bigr) + h_\mu\bigl(\mathcal{P}_m^f\bigr).$$
Therefore the previous lemma allows us to make the following
Definition 4.6.7: Let f : X → X be an endomorphism of the probability space (X, A, µ), and P a measurable
partition with finite entropy. The metric entropy of f relative to P is
$$h_\mu(f,\mathcal{P}) = \lim_{n\to\infty}\frac{1}{n}\,h_\mu\bigl(\mathcal{P}_n^f\bigr).$$
Lemma 4.6.4: Let f : X → X be an endomorphism of the probability space (X, A, µ), and P a measurable
partition with finite entropy. Then $n \mapsto h_\mu\Bigl(\mathcal{P}\Bigm| f^{-1}\Bigl(\bigvee_{j=0}^{n-1}f^{-j}(\mathcal{P})\Bigr)\Bigr)$ is non-increasing, and
$$h_\mu(f,\mathcal{P}) = \lim_{n\to\infty}h_\mu\bigl(\mathcal{P}\mid f^{-1}(\mathcal{P}_n^f)\bigr).$$
Definition 4.6.8: Let f : X → X be an endomorphism of the probability space (X, A, µ). Then the entropy
of f with respect to µ (or of µ with respect to f ) is
hµ (f ) = sup{hµ (f, P) | P is a measurable partition of X with finite entropy} ∈ [0, +∞].
Remark 4.6.1.
It suffices to take the supremum with respect to the finite measurable partitions of X.
In some sense, hµ (f, P) is the average amount of information given by knowing the present state (up to
the approximation P) and an arbitrarily long past. Thus the metric entropy of f measures the maximum amount
of average information we can extract from f if we disregard sets of µ-measure zero.
The definition we just gave is due to Kolmogorov. A slightly different approach is due to Shannon, McMillan and Breiman.
Let f : X → X be an endomorphism of the probability space (X, A, µ). If P is a measurable partition
of X, then it is easy to check that the atom of x ∈ X in the partition Pnf is given by
$$\mathcal{P}_n^f(x) = \bigl\{y\in X \bigm| f^j(y)\in\mathcal{P}\bigl(f^j(x)\bigr)\ \text{for } 0\le j\le n-1\bigr\} = \bigcap_{j=0}^{n-1}f^{-j}\Bigl(\mathcal{P}\bigl(f^j(x)\bigr)\Bigr).$$
In particular, y ∈ Pnf (x) if and only if x and y have the same n-segment of orbit (up to the approximation P).
Then we have the following
Theorem 4.6.5: (Shannon-McMillan-Breiman) Let f : X → X be an endomorphism of the probability
space (X, A, µ), and P a measurable partition of X with finite entropy. Then the sequence of functions IPnf /n converges µ-almost everywhere and in L1 to a function hP (f ) ∈ L1 (X, µ) which is f -invariant.
Since the convergence is in L1 , we can pass the limit under the integral sign and get
$$\int_X h_{\mathcal{P}}(f)\,d\mu = \int_X\lim_{n\to\infty}\frac{1}{n}I_{\mathcal{P}_n^f}\,d\mu = \lim_{n\to\infty}\frac{1}{n}\int_X I_{\mathcal{P}_n^f}\,d\mu = \lim_{n\to\infty}\frac{1}{n}h_\mu\bigl(\mathcal{P}_n^f\bigr) = h_\mu(f,\mathcal{P}),$$
which was the original definition of entropy. In particular, if µ is ergodic we have hP (f ) ≡ hµ (f, P) µ-almost
everywhere.
The third definition is due to Brin and Katok:
Theorem 4.6.6: (Brin-Katok) Let X be a compact metric space, f : X → X continuous, and µ ∈ Mf (X)
an f -invariant Borel probability measure, and set
$$h_\mu^+(f,x,\varepsilon) = -\limsup_{n\to\infty}\frac{1}{n}\log\mu\bigl(B_f(x,\varepsilon,n)\bigr), \qquad h_\mu^-(f,x,\varepsilon) = -\liminf_{n\to\infty}\frac{1}{n}\log\mu\bigl(B_f(x,\varepsilon,n)\bigr).$$
Then the limits
$$h_\mu(f,x) = \lim_{\varepsilon\to0^+}h_\mu^+(f,x,\varepsilon) = \lim_{\varepsilon\to0^+}h_\mu^-(f,x,\varepsilon)$$
exist for µ-almost every x ∈ X, are equal, are f -invariant, and
$$h_\mu(f) = \int_X h_\mu(f,x)\,d\mu.$$
Now let us summarize the properties of the metric entropy.
Proposition 4.6.7: Let f : X → X be an endomorphism of the probability space (X, A, µ), and P, Q
measurable partitions with finite entropy. Then:
(i) we have
$$0 \le -\limsup_{n\to\infty}\frac{1}{n}\log\Bigl(\sup_{C\in\mathcal{P}_n^f}\mu(C)\Bigr) \le h_\mu(f,\mathcal{P}) \le h_\mu(\mathcal{P});$$
(ii) hµ (f, P ∨ Q) ≤ hµ (f, P) + hµ (f, Q);
(iii) hµ (f, Q) ≤ hµ (f, P) + hµ (Q | P); in particular, if P ≤ Q then hµ (f, P) ≤ hµ (f, Q);
(iv) |hµ (f, P) − hµ (f, Q)| ≤ hµ (P | Q) + hµ (Q | P);
(v) $h_\mu\bigl(f, f^{-1}(\mathcal{P})\bigr) = h_\mu(f,\mathcal{P})$ and, if f is invertible, $h_\mu(f,\mathcal{P}) = h_\mu\bigl(f, f(\mathcal{P})\bigr)$;
(vi) hµ (f, P) = hµ (f, Pnf ) for all n ∈ N;
(vii) if ν is another f -invariant probability measure then
$$\forall t\in[0,1]\qquad t\,h_\mu(f,\mathcal{P}) + (1-t)\,h_\nu(f,\mathcal{P}) \le h_{t\mu+(1-t)\nu}(f,\mathcal{P}).$$
Proposition 4.6.8: Let f : X → X be an endomorphism of the probability space (X, A, µ). Then:
(i) if the endomorphism g: Y → Y of the probability space (Y, ν) is a factor of f (that is, there exists a
measure-preserving h: X → Y such that g ◦ h = h ◦ f ), then hν (g) ≤ hµ (f );
(ii) if A is completely f -invariant and µ(A) > 0 then hµ (f ) = µ(A)hµA (f ) + µ(X \ A)hµX\A (f );
(iii) if ν is another f -invariant probability measure then
$$\forall t\in[0,1]\qquad t\,h_\mu(f) + (1-t)\,h_\nu(f) \le h_{t\mu+(1-t)\nu}(f);$$
(iv) hµ (f k ) = khµ (f ) and, if f is invertible, hµ (f −k ) = |k|hµ (f );
(v) if g: Y → Y is an endomorphism of a probability space (Y, ν) then hµ×ν (f × g) = hµ (f ) + hν (g).
Example 4.6.1. The entropy of the rotations Rα : S 1 → S 1 and of the translations Tγ : Tn → Tn of the torus
with respect to the Lebesgue measure is zero.
Example 4.6.2. The entropy of Em : S 1 → S 1 with respect to the Lebesgue measure is log |m|.
Example 4.6.3. The entropy of the shift σN : ΩN → ΩN with respect to the Bernoulli measure associated
to (p0 , . . . , pN −1 ) is −p0 log p0 − · · · − pN −1 log pN −1 .
Example 4.6.4. If FL : T2 → T2 is given by FL (x, y) = (2x + y, x + y) (mod 1) then its entropy with
respect to the Lebesgue measure is log((3 + √5)/2).
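Example 4.6.3 can be checked through the Shannon-McMillan-Breiman theorem: for the shift, µ(Pnf (x)) is the product of the pxj over the first n symbols, so −(1/n) log µ(Pnf (x)) is an empirical average converging, for µ-almost every sequence, to −Σ pi log pi . A Monte-Carlo sketch (the probabilities and sample size are arbitrary choices):

```python
import math
import random

p = [0.5, 0.3, 0.2]                      # Bernoulli probabilities (p_0, p_1, p_2)
h_exact = -sum(q * math.log(q) for q in p)

random.seed(1)
n = 100_000
symbols = random.choices(range(3), weights=p, k=n)
# mu(cylinder of the first n symbols) = product of the p_{x_j}, so
# -(1/n) log mu(P_n(x)) is the average of -log p_{x_j}:
h_emp = -sum(math.log(p[s]) for s in symbols) / n
print(h_emp)  # close to h_exact
```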
4.7 The variational principle
The aim of this section is to prove that the topological entropy is the supremum of the metric entropies. To
do so we need two lemmas.
Lemma 4.7.1: Let X be a compact metric space, and µ ∈ M(X). Then:
(i) for every x ∈ X and δ > 0 there is δ′ ∈ (0, δ) such that µ(∂B(x, δ′ )) = 0;
(ii) given δ > 0 there is a finite measurable partition P of X such that diam(C) < δ and µ(∂C) = 0 for
all C ∈ P.
Proof : (i) The ball B(x, δ) contains the uncountable disjoint union $\bigcup_{0<\delta'<\delta}\partial B(x,\delta')$ and has finite measure, so at most countably many of the boundaries can have positive measure.
(ii) Let {B1 , . . . , Bk } be a cover of X by balls of radius less than δ/2 and with µ(∂Bj ) = 0 for j = 1, . . . , k.
Put C1 = B1 and $C_j = B_j\setminus\bigcup_{l=1}^{j-1}B_l$ for j = 2, . . . , k. Then P = {C1 , . . . , Ck } is as required.
Lemma 4.7.2: Let f : X → X be a continuous self-map of a compact metric space, and ε > 0 given. For
every n ∈ N choose an (n, ε)-separated set En ⊂ X, and put
$$\nu_n = \frac{1}{\operatorname{card}(E_n)}\sum_{x\in E_n}\delta_x \qquad\text{and}\qquad \mu_n = \frac{1}{n}\sum_{j=0}^{n-1}f_*^j\nu_n.$$
Then there is an accumulation point µ (in the weak-* topology) of {µn } in M(X) which is f -invariant and
satisfies
$$\limsup_{n\to\infty}\frac{1}{n}\log\operatorname{card}(E_n) \le h_\mu(f).$$
Proof : Choose a sequence nk so that
$$\lim_{k\to\infty}\frac{1}{n_k}\log\operatorname{card}(E_{n_k}) = \limsup_{n\to\infty}\frac{1}{n}\log\operatorname{card}(E_n).$$
Since M(X) is weak-* compact, we can also assume that µnk → µ ∈ M(X). But we have
$$f_*\mu_n - \mu_n = \frac{1}{n}\bigl(f_*^n\nu_n - \nu_n\bigr),$$
and hence µ is f -invariant, because the νn are probability measures.
Let now P be a finite measurable partition with atoms of diameter less than ε satisfying the properties of Lemma 4.7.1.(ii). Since each C ∈ Pnf contains at most one element of En , there are card(En )
atoms of Pnf with νn -measure 1/card(En ), while the other atoms have vanishing νn -measure; in particular, hνn (Pnf ) = log card(En ).
Now fix 0 < q < n and 0 ≤ k ≤ q − 1. If ak = ⌊(n − k)/q⌋, we have
$$\{0,1,\ldots,n-1\} = \{k + rq + j \mid 0\le r < a_k,\ 0 < j \le q\} \cup S,$$
where
$$S = \{0,1,\ldots,k,\ k + a_kq + 1,\ldots,n-1\};$$
notice that card(S) < k + q < 2q by the definition of ak . Now,
$$\mathcal{P}_n^f = \left(\bigvee_{r=0}^{a_k-1}f^{-(k+rq)}\bigl(\mathcal{P}_q^f\bigr)\right)\vee\bigvee_{j\in S}f^{-j}(\mathcal{P});$$
hence
$$\log\operatorname{card}(E_n) = h_{\nu_n}\bigl(\mathcal{P}_n^f\bigr) \le \sum_{r=0}^{a_k-1}h_{\nu_n}\Bigl(f^{-(k+rq)}\bigl(\mathcal{P}_q^f\bigr)\Bigr) + \sum_{j\in S}h_{\nu_n}\bigl(f^{-j}(\mathcal{P})\bigr) \le \sum_{r=0}^{a_k-1}h_{f_*^{k+rq}\nu_n}\bigl(\mathcal{P}_q^f\bigr) + 2q\log\operatorname{card}(\mathcal{P}),$$
where we used Proposition 4.6.1.(i) and (v). Now recalling Proposition 4.6.1.(vii) we get
$$q\log\operatorname{card}(E_n) = \sum_{k=0}^{q-1}h_{\nu_n}\bigl(\mathcal{P}_n^f\bigr) \le \sum_{k=0}^{q-1}\left(\sum_{r=0}^{a_k-1}h_{f_*^{k+rq}\nu_n}\bigl(\mathcal{P}_q^f\bigr) + 2q\log\operatorname{card}(\mathcal{P})\right) \le n\,h_{\mu_n}\bigl(\mathcal{P}_q^f\bigr) + 2q^2\log\operatorname{card}(\mathcal{P}).$$
Hence
$$\limsup_{n\to\infty}\frac{1}{n}\log\operatorname{card}(E_n) = \lim_{k\to\infty}\frac{1}{n_k}\log\operatorname{card}(E_{n_k}) \le \lim_{k\to\infty}\frac{h_{\mu_{n_k}}\bigl(\mathcal{P}_q^f\bigr)}{q} = \frac{h_\mu\bigl(\mathcal{P}_q^f\bigr)}{q}.$$
Since this is true for all q we can pass to the limit in q and get
$$\limsup_{n\to\infty}\frac{1}{n}\log\operatorname{card}(E_n) \le h_\mu(f,\mathcal{P}) \le h_\mu(f),$$
and we are done.
Then we are able to prove the variational principle:
Theorem 4.7.3: Let f : X → X be a continuous self-map of a compact metric space. Then
htop (f ) = sup{hµ (f ) | µ ∈ Mf (X)}.
Proof : If P = {C1 , . . . , Ck } is a finite measurable partition of X then, since µ is a Borel measure, we have
$$\forall C\in\mathcal{P}\qquad \mu(C) = \sup\{\mu(B) \mid B\subseteq C,\ B \text{ closed}\}.$$
So for j = 1, . . . , k we can choose a compact set Bj ⊆ Cj so that if we take Q = {B0 , B1 , . . . , Bk },
where B0 = X \ (B1 ∪ · · · ∪ Bk ), we have hµ (P | Q) < 1. Proposition 4.6.7.(iii) yields
$$h_\mu(f,\mathcal{P}) \le h_\mu(f,\mathcal{Q}) + h_\mu(\mathcal{P}\mid\mathcal{Q}) \le h_\mu(f,\mathcal{Q}) + 1.$$
Now U = {B0 ∪ B1 , . . . , B0 ∪ Bk } is an open cover of X. Proposition 4.6.1.(i) yields
$$h_\mu\bigl(\mathcal{Q}_n^f\bigr) \le \log\operatorname{card}\,\mathcal{Q}_n^f \le \log\bigl(2^n\operatorname{card}\,\mathcal{U}_n^f\bigr).$$
If δ0 > 0 is a Lebesgue number of U, then it is also a Lebesgue number of Ufn with respect to the
distance dfn . Now, U is a minimal cover; hence each Ufn is minimal as well. This means that every B ∈ Ufn contains a
point xB that does not belong to any other element of Ufn . In particular, the xB form an (n, δ0 )-separated
set. Consequently,
$$h_\mu(f,\mathcal{Q}) \le h_{\mathrm{top}}(f) + \log 2 \qquad\text{and}\qquad h_\mu(f,\mathcal{P}) \le h_{\mathrm{top}}(f) + \log 2 + 1$$
for every finite measurable partition P. Therefore using Propositions 4.6.8.(iv) and 4.5.5.(iii) we get
$$h_\mu(f) = \frac{h_\mu(f^n)}{n} \le \frac{h_{\mathrm{top}}(f^n) + \log 2 + 1}{n} = h_{\mathrm{top}}(f) + \frac{\log 2 + 1}{n}$$
for every n ∈ N∗ , and hence hµ (f ) ≤ htop (f ).
On the other hand, applying Lemma 4.7.2 to maximal (n, ε)-separated sets in X yields
$$\limsup_{n\to\infty}\frac{1}{n}\log N_d(f,\varepsilon,n) \le h_\mu(f),$$
where µ ∈ Mf (X) is the accumulation point provided by the lemma. But then
$$\limsup_{n\to\infty}\frac{1}{n}\log N_d(f,\varepsilon,n) \le \sup_{\mu\in M_f(X)}h_\mu(f),$$
and letting ε → 0 we get the assertion.
In general, the supremum in the variational principle is not achieved.
Definition 4.7.1: Let f : X → X be a continuous self-map of a compact metric space. A measure µ ∈ Mf (X)
satisfying hµ (f ) = htop (f ) is said to be of maximal entropy. If f has one and only one measure of maximal entropy,
we say that f is intrinsically ergodic.
Exercise 4.7.1. Let f : X → X be an intrinsically ergodic continuous self-map of a compact metric space.
Prove that the unique measure of maximal entropy is ergodic.