ERGODIC RAMSEY THEORY: WHERE COMBINATORICS AND NUMBER
THEORY MEET DYNAMICS – FIRST LECTURE
YURI LIMA AND CARLOS MATHEUS
Abstract. This is the first of a series of five lecture notes corresponding to the first part (taught by the second author) of a mini-course on Ergodic Ramsey Theory (by the second author and Vitaly Bergelson) held at Maceió (February 2010). In this introductory lecture, we review some aspects of Ramsey Theory (with special emphasis on its two main principles) and we discuss van der Waerden theorem. After that, we start investigating the relationship between van der Waerden theorem and the recurrence properties of topological systems by introducing the shift dynamics. Finally, we use the end of this first lecture to begin the discussion of Poincaré recurrence theorem and its generalization, namely, Poincaré multiple recurrence theorem.
Generally speaking, Ramsey theory (named in honor of Frank P. Ramsey) concerns the following vague question:
How many elements should a given highly organized structure contain in order to ensure that a certain particularly interesting property holds?
Of course, the answer depends on what we mean by highly organized structure and particularly interesting property. Below, we illustrate with a few examples what a prototypical formalization of the previous question could be.
Example 1 (Dirichlet’s pigeonhole principle). Given n pigeons distributed in m pigeonholes, how big must n be to guarantee that at least one pigeonhole contains at least two pigeons? The answer is provided by Dirichlet’s pigeonhole principle, which says that this is guaranteed whenever n > m (i.e., the number of pigeons is strictly bigger than the number of pigeonholes).
Example 2 (Friendship theorem). What’s the minimum number of persons in a party so that there are always (at least) three of them who are either mutually (pairwise) strangers or mutually (pairwise) acquaintances? The answer is provided by the friendship theorem below.
Theorem 3 (Friendship theorem). The minimum number of persons in a party so that there is
always (at least) 3 of them which are either mutually strangers or mutually acquaintances is exactly
6.
Date: January 27, 2010.
1991 Mathematics Subject Classification. Ergodic Theory, Number Theory, Combinatorics.
Key words and phrases. Ramsey Theory, Ergodic Ramsey Theory, van der Waerden theorem, shift dynamics,
Krylov-Bogolyubov theorem, Poincaré recurrence theorem, Poincaré multiple recurrence theorem.
Proof. The basic idea of the simple proof of this result is to translate the statement into the language of graph theory. Namely, let’s begin by showing that any party of 6 persons satisfies the statement of the friendship theorem. We think of these 6 persons as the 6 vertices of the complete graph¹ K6. We color an edge of K6 blue if the 2 persons represented by its endpoints are mutually acquainted and red if they are mutual strangers. In this context, our task consists in finding a monochromatic triangle inside K6. Keeping this goal in mind, we fix an arbitrary vertex (person) P and look at the 5 edges containing P. By Dirichlet’s pigeonhole principle, at least 3 of these edges have the same color, say blue.
[Figure: the vertex P joined to the other five vertices A, B, C, D, E of K6.]
We denote by A, B and C the three vertices (persons) at the other ends of these 3 edges. We have two possibilities:
• either one of the edges AB, BC or CA is blue, so that the triangle formed by the vertices
of this blue edge and P is monochromatic (blue);
• or all of the 3 edges AB, BC and CA are red, so that the triangle ABC is monochromatic
(red).
In either case, the first part of the proof is complete, i.e., we can always find monochromatic triangles inside K6. Finally, we show that 6 is the smallest number verifying the statement of the friendship theorem with the following example of a blue-red coloring of K5 without monochromatic triangles. Denoting by A, B, C, D, E the 5 vertices of K5, we color the edges AB, BC, CD, DE and EA blue and the remaining edges red.
[Figure: the pentagon ABCDE in K5; its sides are blue and its diagonals are red.]
¹ A graph is simple if it contains no loops (edges connecting a vertex to itself) and at most one edge connects any two vertices. A graph is complete whenever it is simple and every pair of distinct vertices is connected by an edge. The complete graph on n vertices has n(n − 1)/2 edges and is denoted by Kn (K comes from the German word komplett).
The reader can check that the above coloring has the desired property.
Exercise 4. Complete the details of the following alternative proof of theorem 3 (via a double counting argument):
(i) consider the set S of ordered triples (x, y, z) of vertices of our graph such that xy is red and yz is blue;
(ii) by counting how many times a given vertex p appears in the middle of an element of S, we see that #S ≤ 36;
(iii) observe that every non-monochromatic triangle xyz of our graph produces exactly two elements of S, while monochromatic triangles produce none;
(iv) putting the two previous items together, one obtains that the number of non-monochromatic triangles is at most 18;
(v) since the total number of triangles of K6 is 20, we conclude that K6 contains at least 2 monochromatic triangles.
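Both halves of the proof of theorem 3, and the count in Exercise 4, concern finitely many colorings, so they can be double-checked by brute force. The Python sketch below (function names are our own, chosen for illustration) enumerates all 2^15 blue-red colorings of K6 and reports the least number of monochromatic triangles. It also confirms that the pentagon coloring of K5 described above has none.

```python
from itertools import combinations


def mono_triangles(n, color):
    """Count triangles of K_n whose three edges share a color.

    `color` maps each edge (i, j) with i < j to 0 (blue) or 1 (red).
    """
    return sum(
        1
        for a, b, c in combinations(range(n), 3)
        if color[(a, b)] == color[(a, c)] == color[(b, c)]
    )


def min_mono_triangles(n):
    """Minimum number of monochromatic triangles over all 2-colorings of K_n."""
    edges = list(combinations(range(n), 2))
    best = None
    for mask in range(2 ** len(edges)):
        color = {e: (mask >> bit) & 1 for bit, e in enumerate(edges)}
        count = mono_triangles(n, color)
        best = count if best is None else min(best, count)
    return best


# Pentagon coloring of K_5: vertices 0..4 stand for A..E; the sides
# AB, BC, CD, DE, EA are blue (0) and the diagonals are red (1).
pentagon = {
    (i, j): 0 if (j - i) % 5 in (1, 4) else 1
    for i, j in combinations(range(5), 2)
}

print(min_mono_triangles(6))        # 2
print(mono_triangles(5, pentagon))  # 0
```

Exhaustive search is feasible only for very small graphs, since K_n has 2^{n(n−1)/2} colorings.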
Example 5 (Ramsey theorem). The friendship theorem is a particular case of Ramsey theorem
(for 2 colors), whose statement and proof are given below.
Theorem 6 (Ramsey theorem for 2 colors). Given two colors (blue and red) and a pair (r, s) of positive integers, there exists a minimal positive integer R(r, s) such that any complete graph with R(r, s) vertices whose edges are each colored blue or red contains either an entirely blue complete subgraph with r vertices or an entirely red complete subgraph with s vertices.
Remark 7. The friendship theorem corresponds to the statement R(3, 3) = 6.
Proof of theorem 6. We proceed by induction on r + s. Observe that R(n, 1) = R(1, n) = 1 for every n ∈ N, which provides the base of the induction. Suppose that R(r − 1, s) and R(r, s − 1) exist. We claim that R(r, s) also exists. More precisely, we claim that
R(r, s) ≤ R(r − 1, s) + R(r, s − 1).
Indeed, consider the complete graph with R(r − 1, s) + R(r, s − 1) vertices and fix an arbitrary vertex v. Next, decompose the remaining vertices into two subsets:
C := {w : vw is blue}
and
D := {w : vw is red}.
By definition, our graph has R(r − 1, s) + R(r, s − 1) = |C| + |D| + 1 vertices, so that either |C| ≥ R(r − 1, s) or |D| ≥ R(r, s − 1). Since the latter case is entirely analogous to the former one, we can assume that |C| ≥ R(r − 1, s). It follows that C contains either a blue complete subgraph with r − 1 vertices or a red complete subgraph with s vertices. In the first situation, if K ⊂ C spans the blue complete subgraph with r − 1 vertices, then K ∪ {v} spans a blue complete subgraph with r vertices, because every edge vw with w ∈ C is blue. In the second situation, C (and a fortiori our initial graph) contains a red complete subgraph with s vertices.
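Unwinding the recursion R(r, s) ≤ R(r − 1, s) + R(r, s − 1), with the base values R(r, 1) = R(1, s) = 1, gives an explicit upper bound. The Python sketch below (our own illustration, not part of the original argument) computes this bound. It then checks that the bound equals the binomial coefficient C(r + s − 2, r − 1), which is the Erdős-Szekeres bound recalled in Remark 13.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def ramsey_upper(r, s):
    """Upper bound for R(r, s) from the recursion in the proof of Theorem 6.

    Base case: R(r, 1) = R(1, s) = 1.
    Step:      R(r, s) <= R(r - 1, s) + R(r, s - 1).
    """
    if r == 1 or s == 1:
        return 1
    return ramsey_upper(r - 1, s) + ramsey_upper(r, s - 1)


# The recursion is Pascal's rule, so the bound is a binomial coefficient.
for r in range(1, 8):
    for s in range(1, 8):
        assert ramsey_upper(r, s) == comb(r + s - 2, r - 1)

print(ramsey_upper(3, 3))  # 6, which is sharp: R(3, 3) = 6
print(ramsey_upper(4, 4))  # 20, while R(4, 4) = 18
```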
Remark 8. As the reader may suspect, Ramsey theorem for 2 colors and its extensions (for more colors, hypergraphs, infinite sets, etc.) are the origin of Ramsey theory. For the sake of completeness, we include below the proof of Ramsey theorem for an arbitrary number of colors, but none of its other extensions. Instead, we refer the reader to the Wikipedia article Ramsey’s theorem for more details and to Gugu’s article O teorema de Ramsey [5] for an excellent introduction to the subject in Portuguese.
Theorem 9 (Ramsey theorem). Given any number c of colors and any c-tuple $(n_1, \dots, n_c)$ of positive integers, there exists a positive integer $R(n_1, \dots, n_c)$ such that, if the edges of the complete graph with $R(n_1, \dots, n_c)$ vertices are colored with c colors, then it contains an i-colored complete subgraph with $n_i$ vertices for some color 1 ≤ i ≤ c.
Proof. Again our argument proceeds by induction, but this time on the number of colors c. Observe that Ramsey theorem is trivially true for c = 1, while it holds for c = 2 in view of theorem 6. Assuming it is true for c − 1 ≥ 2 colors, we claim that it holds for c colors. More precisely, we claim that
$$R(n_1, \dots, n_c) \le R(n_1, \dots, n_{c-2}, R(n_{c-1}, n_c)).$$
In fact, the idea here is to forget (momentarily) the difference between the colors c − 1 (blue) and c (red), so that our graph with $t := R(n_1, \dots, n_{c-2}, R(n_{c-1}, n_c))$ vertices is colored using c − 1 colors. Using the inductive assumption, we have two possibilities: either our graph contains an i-colored complete subgraph with $n_i$ vertices for some 1 ≤ i ≤ c − 2, or it contains a (blue and red)-colored complete subgraph with $R(n_{c-1}, n_c)$ vertices. In the former case we are done. In the latter case we put back the original colors c − 1 (blue) and c (red) in our subgraph with $R(n_{c-1}, n_c)$ vertices so that, by definition of the number $R(n_{c-1}, n_c)$, we find either a complete sub-subgraph with $n_{c-1}$ vertices colored c − 1 (blue) or a complete sub-subgraph with $n_c$ vertices colored c (red). This ends the argument.
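The inequality in this proof, combined with the two-color bound obtained from the proof of Theorem 6, gives a computable upper bound for multicolor Ramsey numbers. The Python sketch below (a hypothetical helper of our own) repeatedly merges the last two colors. It relies on the fact that Ramsey numbers are nondecreasing in each argument, so an upper bound may be substituted for $R(n_{c-1}, n_c)$. For three colors it gives R(3, 3, 3) ≤ 21, compared with the exact value R(3, 3, 3) = 17.

```python
from math import comb


def r2_upper(r, s):
    """Two-color bound R(r, s) <= C(r + s - 2, r - 1) from the proof of Theorem 6."""
    return comb(r + s - 2, r - 1)


def ramsey_multi_upper(ns):
    """Upper bound for R(n_1, ..., n_c) via the merging step of Theorem 9:

        R(n_1, ..., n_c) <= R(n_1, ..., n_{c-2}, R(n_{c-1}, n_c)).

    Replacing R(n_{c-1}, n_c) by an upper bound is allowed because Ramsey
    numbers are nondecreasing in each argument.
    """
    ns = list(ns)
    if len(ns) == 1:
        return ns[0]  # one color: K_{n_1} itself is monochromatic
    while len(ns) > 2:
        merged = r2_upper(ns[-2], ns[-1])  # treat colors c-1 and c as one
        ns = ns[:-2] + [merged]
    return r2_upper(ns[0], ns[1])


print(ramsey_multi_upper([3, 3, 3]))  # 21
```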
Remark 10. An interesting (and very difficult²) problem is the explicit computation of the values of the Ramsey numbers $R(n_1, \dots, n_c)$. In fact, as far as the authors know, the only explicitly computed Ramsey numbers R(r, s) with 3 ≤ r ≤ s are:

R(r, s)   s=3   s=4   s=5   s=6   s=7   s=8   s=9
r=3         6     9    14    18    23    28    36
r=4         9    18    25     ?     ?     ?     ?
² Indeed, as Joel Spencer points out in his book Ten Lectures on the Probabilistic Method: “Erdős asks us to imagine an alien force, vastly more powerful than us, landing on Earth and demanding the value of R(5, 5) or they will destroy our planet. In that case, he claims, we should marshal all our computers and all our mathematicians and attempt to find the value. But suppose, instead, that they ask for R(6, 6). In that case, he believes, we should attempt to destroy the aliens.” By the way, it is worth mentioning that 43 ≤ R(5, 5) ≤ 49 and 102 ≤ R(6, 6) ≤ 165.
For the missing entries of this table, only the following estimates are available: 35 ≤ R(4, 6) ≤ 41, 49 ≤ R(4, 7) ≤ 61, 56 ≤ R(4, 8) ≤ 84 and 73 ≤ R(4, 9) ≤ 115. Furthermore, the only known multicolor Ramsey number is R(3, 3, 3) = 17.
Before showing further examples of the formalization of the vague question at the origin of Ramsey theory (our initial motivation), let us briefly discuss the nature of the Ramsey numbers R(r, s) and $R(n_1, \dots, n_c)$. Although the previous remark says that we can’t hope to compute R(r, s) exactly (in general), we can give lower and upper bounds on R(r, s), especially in the diagonal case r = s. The lower bound is quite elementary and relies on the powerful probabilistic method of Erdős:
Theorem 11. $R(k, k) > 2^{k/2}$ for any k ≥ 3.
Proof. Consider a complete graph with n vertices and randomly color its edges blue or red, independently and with equal probability 1/2. Given a subset of k vertices, the probability that all of the k(k − 1)/2 edges connecting these points have the same color is
$$2 \cdot \left(\frac{1}{2}\right)^{k(k-1)/2}.$$
On the other hand, the total number of subsets of k vertices is $\binom{n}{k}$. Hence, by the union bound, the probability p that some subset of k vertices has all its connecting edges of the same color satisfies
$$p \le 2\binom{n}{k}\left(\frac{1}{2}\right)^{k(k-1)/2} \le \frac{2n^k}{k!}\left(\frac{1}{2}\right)^{k(k-1)/2}.$$
Now observe that, if $n \le 2^{k/2}$, then
$$p \le \frac{2n^k}{k!}\left(\frac{1}{2}\right)^{k(k-1)/2} \le \frac{2^{1+k/2}}{k!}.$$
Since $2^{1+k/2} < k!$ for every k ≥ 4, we conclude that p < 1, i.e., 1 − p > 0, whenever $n \le 2^{k/2}$ and k ≥ 4. Thus, by the definition of p, there is a positive probability that no subset of k vertices has all its connecting edges of the same color. In particular, when $n \le 2^{k/2}$ and k ≥ 4, some blue-red coloring of the complete graph with n vertices contains no monochromatic complete subgraph with k vertices. In other words, $R(k, k) > 2^{k/2}$ for any k ≥ 4. Therefore, the proof of the theorem is complete because the friendship theorem says that $R(3, 3) = 6 > 2^{3/2}$.
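The two inequalities driving this proof can be checked in exact rational arithmetic for small k. In the Python sketch below (our own check, which of course proves nothing for all k), n is the largest integer with $n \le 2^{k/2}$, obtained as `isqrt(2 ** k)` to avoid square roots of non-squares. The sketch checks that the union bound $2\binom{n}{k}2^{-k(k-1)/2}$ is below 1, and that $2^{1+k/2} < k!$ in the squared form $2^{k+2} < (k!)^2$.

```python
from fractions import Fraction
from math import comb, factorial, isqrt


def union_bound(n, k):
    """2 * C(n, k) * (1/2)^(k(k-1)/2), the upper bound on p from the proof."""
    return Fraction(2 * comb(n, k), 2 ** comb(k, 2))


def check_lower_bound(k):
    """Check the inequalities of the proof for one k; return the n used."""
    n = isqrt(2 ** k)  # the largest integer n with n <= 2^(k/2)
    assert union_bound(n, k) < 1  # p < 1, so a good coloring of K_n exists
    if k >= 4:
        # 2^(1 + k/2) < k!  is equivalent to  2^(k + 2) < (k!)^2
        assert 2 ** (k + 2) < factorial(k) ** 2
    return n


for k in range(3, 41):
    check_lower_bound(k)

print(check_lower_bound(10))  # 32: some 2-coloring of K_32 has no monochromatic K_10
```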
By contrast, upper bounds are more delicate, and the best result to date is:
Theorem 12 (Conlon [3]). For any 0 < ε ≤ 1, there exists a constant C(ε) > 0 such that
$$R(k+1, l+1) \le k^{-C(\varepsilon)\log k/\log\log k}\binom{k+l}{k}$$
for any εk ≤ l ≤ k. In particular, there exists a constant C > 0 such that
$$R(k+1, k+1) \le k^{-C\log k/\log\log k}\binom{2k}{k}.$$
Remark 13. The first upper bound on Ramsey numbers R(r, s) is due to Erdős and Szekeres (1935):
$$R(k+1, l+1) \le \binom{k+l}{k}.$$
Next, Graham and Rödl (1987) showed that
$$R(k+1, l+1) \le \frac{6}{\log\log(k+l)}\binom{k+l}{k},$$
and Rödl³ proved that
$$R(k+1, l+1) \le \frac{A}{\log^{B}(k+l)}\binom{k+l}{k}$$
for some constants A, B > 0. Then, in 1988, Thomason showed that, for k ≤ l,
$$R(k+1, l+1) \le \exp\left(-\frac{l}{2k}\log k + A\sqrt{\log k}\right)\binom{k+l}{k}$$
for some constant A > 0. Note that this is a major improvement of Rödl’s upper bound in the diagonal case k = l, because it yields
$$R(k+1, k+1) \le k^{-1/2 + A/\sqrt{\log k}}\binom{2k}{k},$$
while Conlon’s theorem can be seen as an improvement of Thomason’s result.
After this quick introduction to Ramsey theorem (our prototypical answer to the initial vague
question), we’ll focus on some basic principles of Ramsey theory.
1. Basic principles of Ramsey theory
A closer inspection of the statement of Ramsey theorem reveals that it supports the following
principle.
First principle of Ramsey theory: “A highly organized structure⁴ can’t be destroyed by a partition into finitely many pieces.”
In fact, Ramsey theorem states that the highly organized structure of a (large) complete graph
persists after we make a partition of its edges using the colors blue and red, namely, there is always
a monochromatic complete subgraph.
Now let us turn our attention to N. How to define organization on this set? Well, we can declare that a set E ⊂ N is organized if it contains affine images of every finite subset of N. In other words, E is organized if, for every finite set F ⊂ N, there exist a, r ∈ N for which
a + rF ⊂ E.
³ Also in the mid-eighties, but his proof was not published.
⁴ E.g., complete graphs, semi-groups, vector spaces, etc.
Since every finite set F ⊂ N is contained in {1, . . . , k} for some k, this condition holds if and only if it holds for every set F of the form {1, . . . , k}. In this case, a + rF is the arithmetic progression a + r, a + 2r, . . . , a + kr. We should expect, by the first principle, that whenever N is partitioned into finitely many subsets, one of them contains arbitrarily long arithmetic progressions.
Definition 14. A set E ⊂ N is AP-rich if it contains arbitrarily long arithmetic progressions.
The next theorem confirms our speculations.
Theorem 15 (van der Waerden). For every partition $\mathbb{N} = \bigcup_{i=1}^{r} C_i$, some $C_i$ is AP-rich.
A reformulation of the above result is its finitary version.
Theorem 16 (van der Waerden, finitary version). Given k, r ∈ N, there exists N = N(k, r) such that, for every m ≥ N and every partition $\{1, \dots, m\} = \bigcup_{i=1}^{r} C_i$, some $C_i$ contains an arithmetic progression of length k.
Remark 17. It is easy to check that Theorems 15 and 16 are equivalent. We leave this as an exercise
to the reader.
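For small parameters, the finitary statement can be checked by exhaustive search. The Python sketch below (function names are ours) tests every coloring of {1, . . . , m} with r = 2 colors for a monochromatic arithmetic progression of length k = 3. It finds that m = 8 is not enough while m = 9 is, so the least admissible value is N(3, 2) = 9. Larger m are then covered automatically, because restricting a coloring of {1, . . . , m} to {1, . . . , 9} keeps its progressions.

```python
from itertools import product


def has_mono_ap(coloring, k):
    """True if the coloring (coloring[n - 1] is the color of n, n = 1, ..., m)
    has a monochromatic progression a, a + step, ..., a + (k - 1) * step."""
    m = len(coloring)
    for a in range(1, m + 1):
        for step in range(1, m):
            if a + (k - 1) * step > m:
                break  # larger steps overshoot even further
            if len({coloring[a + j * step - 1] for j in range(k)}) == 1:
                return True
    return False


def all_colorings_have_ap(m, k, r):
    """Brute-force check of the conclusion of Theorem 16 for one value of m."""
    return all(has_mono_ap(c, k) for c in product(range(r), repeat=m))


print(all_colorings_have_ap(8, 3, 2))  # False (e.g. the coloring 00110011)
print(all_colorings_have_ap(9, 3, 2))  # True
```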
In other words, we can’t break the highly organized structure of N by partitioning it into finitely many pieces. A further example of a persistent structure is given by Hindman’s theorem.
Definition 18. A set E ⊂ N is an IP-set⁵ if there exists an infinite sequence (xn) ⊂ N such that E contains the set of finite sums
$$\mathrm{FS}(x_n) = \Big\{\sum_{n\in F} x_n \;;\; F \subset \mathbb{N} \text{ is finite and nonempty}\Big\}.$$
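As an illustration of Definition 18 (our own example), the sequence $x_n = 2^{n-1}$ has FS(xn) = N, since every positive integer is a finite sum of distinct powers of 2. Hence N itself is an IP-set. The Python sketch below checks this for the first six terms. Because an IP-set requires an infinite sequence, a finite computation can only display a truncation of FS(xn).

```python
from itertools import combinations


def finite_sums(xs):
    """All sums of xs[n] over nonempty index sets F (a truncation of FS)."""
    sums = set()
    for size in range(1, len(xs) + 1):
        for F in combinations(range(len(xs)), size):
            sums.add(sum(xs[n] for n in F))
    return sums


powers = [2 ** n for n in range(6)]  # x_1, ..., x_6 = 1, 2, 4, ..., 32
print(sorted(finite_sums(powers)) == list(range(1, 64)))  # True
```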
Theorem 19 (Hindman). If E ⊂ N is an IP-set and $E = \bigcup_{i=1}^{r} C_i$ is a partition (coloring) of E, then some $C_i$ is also an IP-set.
We’ll come back to Hindman’s theorem in the fifth lecture. By now, we hope that the reader is “convinced” of the “validity” of the first principle of Ramsey theory. If so, we pose the following question:
Why does the first principle hold?
In other words, what is responsible for the fact that any finite partition of a highly organized structure has (at least) one piece exhibiting a “replica” of the original structure? The answer relies on the
Second principle of Ramsey theory: There is always a suitable notion of largeness behind the scenes such that any large set contains the highly organized structure, and largeness is not broken by finite partitions (in any finite partition of a large set, at least one piece is still large).
⁵ There seems to be no consensus about the origin of the name “IP-set”: some believe it means “infinite(-dimensional) parallelepiped” (after Furstenberg and Weiss) and others believe that it means “idempotent” (in view of the relation of these sets with idempotent ultrafilters).
We illustrate this second principle by identifying the notion of largeness hidden in van der Waerden theorem. The well-known result of Szemerédi gives the answer. First, we need the following notions of density.
Definition 20. The density of a set E ⊂ N is
$$d(E) := \lim_{N\to\infty} \frac{\#(E \cap \{1, \dots, N\})}{N}$$
if this limit exists. The upper density of E is
$$\overline{d}(E) := \limsup_{N\to\infty} \frac{\#(E \cap \{1, \dots, N\})}{N}$$
and the upper Banach density of E is
$$d^*(E) := \limsup_{N-M\to\infty} \frac{\#(E \cap \{M+1, \dots, N\})}{N-M};$$
unlike the density, these last two quantities always exist.
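These notions can differ a lot. The Python sketch below uses a hypothetical example set of our own choosing, $E = \bigcup_{j \ge 1}\{j^3, \dots, j^3 + j - 1\}$. This set has density 0, yet it contains blocks of arbitrarily many consecutive integers, so $d^*(E) = 1$. Limits cannot be computed from finitely many terms, so the code only inspects {1, . . . , N} for a single N and windows of a single length L.

```python
def indicator(N):
    """ind[n] = 1 iff n is in E = union over j >= 1 of {j^3, ..., j^3 + j - 1},
    recorded for n in {1, ..., N} (index 0 is unused)."""
    ind = [0] * (N + 1)
    j = 1
    while j ** 3 <= N:
        for n in range(j ** 3, min(j ** 3 + j, N + 1)):
            ind[n] = 1
        j += 1
    return ind


def initial_density(ind):
    """#(E ∩ {1, ..., N}) / N with N = len(ind) - 1."""
    N = len(ind) - 1
    return sum(ind[1:]) / N


def max_window_density(ind, L):
    """Largest #(E ∩ {M + 1, ..., M + L}) / L over windows inside {1, ..., N}."""
    N = len(ind) - 1
    count = sum(ind[1:L + 1])  # the window {1, ..., L}
    best = count
    for M in range(1, N - L + 1):
        count += ind[M + L] - ind[M]  # slide to {M + 1, ..., M + L}
        best = max(best, count)
    return best / L


ind = indicator(200_000)
print(initial_density(ind))         # small, consistent with d(E) = 0
print(max_window_density(ind, 50))  # 1.0: E contains 50 consecutive integers
```

Since E contains arbitrarily long intervals, it is in particular AP-rich, in agreement with the theorem stated next.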
Szemerédi theorem says that positive density (of any kind) is the notion of largeness we were
looking for.
Theorem 21 (Szemerédi). Any set A ⊂ N with positive density (of any kind) is AP-rich.
Exercise 22. Prove that Szemerédi theorem implies van der Waerden theorem.
Similarly, we’ll see in the fifth lecture that idempotent ultrafilters of βN (the Stone-Čech compactification of N) provide the notion of largeness behind Hindman’s theorem.
Concerning the proofs of these results, it turns out that Szemerédi’s theorem is harder than Hindman’s theorem. In fact, from the conceptual point of view, Hindman’s result is based on some sort of shift invariance of idempotent ultrafilters (which allows one to use recurrence ideas, e.g., versions of Poincaré recurrence theorem), while Szemerédi’s result has several sophisticated approaches (combinatorial, ergodic, etc.), including Furstenberg’s ergodic-theoretical proof based on his multiple recurrence theorem. Actually, Furstenberg’s seminal work on the ergodic approach to Szemerédi theorem marks the beginning of Ergodic Ramsey Theory, which is the main topic of this mini-course.
2. Ergodic Ramsey Theory
Roughly speaking, Furstenberg’s proof of Szemerédi theorem goes as follows. First, he proved
that there is a profound link between Ramsey and Ergodic theories provided by the next result.
Theorem 23 (Furstenberg correspondence principle). If E ⊂ N satisfies d∗ (E) > 0, there exist
a measure-preserving dynamical system (X, B, T, µ) and a measurable subset A ∈ B such that
µ(A) = d∗ (E) > 0 and
$$d^*\big(E \cap (E - n_1) \cap \dots \cap (E - n_k)\big) \ge \mu\big(A \cap T^{-n_1}A \cap \dots \cap T^{-n_k}A\big)$$
for any n1 , . . . , nk ∈ N.
Since E contains a (k + 1)-term arithmetic progression a, a + n, . . . , a + kn if and only if
$$E \cap (E - n) \cap \dots \cap (E - kn) \ne \emptyset$$
for some n ∈ N, Szemerédi theorem is a consequence of the correspondence principle and the following result.
Theorem 24 (Furstenberg multiple recurrence theorem). Let (X, B, T, µ) be a measure-preserving
dynamical system. For any k ∈ N and for any A ∈ B with µ(A) > 0, there exists n ∈ N such that
$$\mu(A \cap T^{-n}A \cap \dots \cap T^{-kn}A) > 0.$$
Remark 25. Actually, it is also possible to show that Szemerédi theorem implies Furstenberg
multiple recurrence theorem and so they are equivalent.
Of course, the study of Furstenberg’s results relies on ideas and tools from Ergodic Theory (such as recurrence theorems, ergodicity, weak mixing versus compact systems, etc.). Before entering the discussion of this issue, we devote the rest of this lecture to the proof of Poincaré recurrence theorem, the properties of the shift dynamics and the proof of van der Waerden theorem.
Remark 26. See Appendix C for a proof of Furstenberg correspondence principle.
3. Measure-preserving systems
We say that a measurable transformation T : X → X of a measurable space (X, B) equipped with a T-invariant probability measure µ (i.e., $\mu(T^{-1}A) = \mu(A)$ for every A ∈ B) is a measure-preserving system (X, B, T, µ).
We observe that the concept of measure-preserving system (mps for short) is meaningful. Indeed,
the mere existence of T -invariant probabilities is sufficient to impose non-trivial constraints on the
orbits of T (from the statistical point of view). For instance,
Theorem 27 (Poincaré recurrence theorem). Let (X, B, T, µ) be a mps. Consider A ∈ B such
that µ(A) > 0. Then, $\mu(A \cap T^{-n}A) > 0$ for some n ∈ N.
Remark 28. This is the case k = 1 of Furstenberg multiple recurrence theorem.
Proof. Let $A_m := T^{-m}A$. The T-invariance of µ implies that $\mu(A_m) = \mu(A) > 0$ and
$$\mu(A_m \cap A_l) = \mu(A \cap T^{-(l-m)}A)$$
for every l ≥ m. Therefore, our task is reduced to finding a pair of indices m < l such that the intersection of $A_m$ and $A_l$ has positive µ-measure. Suppose that such a pair doesn’t exist. This means that $A_0, A_1, \dots, A_m, \dots$ is a collection of essentially (i.e., mod 0) pairwise disjoint subsets of X of equal positive measure. In particular, for every N ∈ N,
$$N \cdot \mu(A) = \sum_{i=0}^{N-1} \mu(A_i) = \mu\Big(\bigcup_{i=0}^{N-1} A_i\Big) \le \mu(X) = 1.$$
Of course, this is a contradiction for N > 1/µ(A).
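To see Theorem 27 at work, take the circle rotation T(x) = x + α (mod 1) on [0, 1) with Lebesgue measure µ, which is T-invariant. This example system is our own choice and is not taken from the text. For A = [0, a) with a ≤ 1/2, the set $A \cap T^{-n}A$ is an arc of length $\max(0, a - \|n\alpha\|)$, where ‖t‖ denotes the distance from t to the nearest integer. The proof above shows that among $A_0, \dots, A_{N-1}$ with N > 1/µ(A) two must overlap, so for a = 0.1 some return time n ≤ 10 works. The Python sketch below finds the first such n.

```python
import math


def circle_dist(t):
    """Distance from the real number t to the nearest integer."""
    return abs(t - round(t))


def overlap_measure(a, alpha, n):
    """Lebesgue measure of A ∩ T^{-n}A for A = [0, a), T(x) = x + alpha (mod 1).

    Valid for 0 < a <= 1/2, where the arc [0, a) and its translate by
    n * alpha overlap in length max(0, a - ||n * alpha||).
    """
    return max(0.0, a - circle_dist(n * alpha))


def first_return(a, alpha):
    """Smallest n >= 1 with mu(A ∩ T^{-n}A) > 0."""
    n = 1
    while overlap_measure(a, alpha, n) <= 0:
        n += 1
    return n


print(first_return(0.1, math.sqrt(2) - 1))  # 5
```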
Once we know that the existence of an invariant measure is a non-trivial restriction on the dynamics of a system, it is natural to ask whether a given system possesses invariant measures. The following exercise shows that invariant probability measures need not exist in general.
Exercise 29. Show that the dynamical system T : [0, 1] → [0, 1] defined by
$$T(x) = \begin{cases} x/2 & \text{if } x \ne 0,\\ 1 & \text{if } x = 0,\end{cases}$$
doesn’t admit invariant probabilities. (Hint: Poincaré recurrence theorem.)
The next theorem says that the non-existence of invariant measures in the previous exercise is
due to the lack of continuity of the corresponding dynamical system.
Theorem 30 (Krylov-Bogolyubov). Let X be a compact metric space equipped with the Borel sigma-algebra B and let T : X → X be a continuous map. Then, T has an invariant probability measure.
Example 31. Take X = {1, . . . , n}, a finite set equipped with the metric d(i, j) = |i − j|. An invertible dynamical system T on X is a permutation T : {1, . . . , n} → {1, . . . , n}, and the orbits of T are its cycles. Of course, any measure µ on X is completely determined by its values µ({i}) on the singletons {i} ⊂ X. Given any cycle C of T, show that
$$\mu_C(\{i\}) := \begin{cases} \dfrac{1}{\#C} & \text{if } i \in C,\\[4pt] 0 & \text{otherwise,}\end{cases}$$
is a T-invariant probability measure. Furthermore, show that any T-invariant probability measure is a convex combination of such measures.
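One way to experiment with Example 31 is to compute cycles and push-forwards directly. The Python sketch below (helper names are ours; points are numbered 0, . . . , n − 1 rather than 1, . . . , n) uses exact rational arithmetic on a sample permutation. It checks that each $\mu_C$ is invariant, and so is a convex combination of them. It does not prove the converse statement of the exercise.

```python
from fractions import Fraction


def cycles(perm):
    """Cycle decomposition of a permutation of {0, ..., n-1} given as a list."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        result.append(cycle)
    return result


def push_forward(T, mu):
    """(T_* mu)({j}) = mu(T^{-1}{j}): the mass at i moves to T(i)."""
    out = [Fraction(0)] * len(mu)
    for i, mass in enumerate(mu):
        out[T[i]] += mass
    return out


def cycle_measure(n, C):
    """mu_C: mass 1/#C on each point of the cycle C, zero elsewhere."""
    return [Fraction(1, len(C)) if i in C else Fraction(0) for i in range(n)]


perm = [2, 0, 1, 4, 3, 5]  # hypothetical example with cycles (0 2 1), (3 4), (5)
for C in cycles(perm):
    mu_C = cycle_measure(len(perm), C)
    assert push_forward(perm, mu_C) == mu_C  # mu_C is T-invariant
```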
As the reader can see, the invariant measures of the previous example (permutations of finite
sets) are described by certain probabilities µC given by averages over the orbits (cycles) and convex
combinations of these µC . In general, the philosophy of the construction of invariant measures in
Krylov-Bogolyubov theorem is essentially the same – we consider a certain average process along
orbits. Evidently, the general situation is a little bit more technical than our example because the
orbits are typically infinite, so that we will only find our desired measures after a limit process. Of
course, the main point is the formalization of this limit process (via the Banach-Alaoglu theorem
from Functional Analysis). Let’s go to the details.
Proof. We define the push-forward operator T∗ on the space M(X) of probability measures on X as
$$T_*\eta(A) := \eta(T^{-1}A).$$
By definition, T-invariant probabilities are exactly the fixed points of the push-forward operator T∗. In order to find some fixed point of T∗, we begin with an arbitrary probability measure η, e.g., η = δx where δx is the Dirac measure supported on a point x ∈ X, and we consider the following averaging process:
$$\mu_n := \frac{1}{n}\sum_{i=0}^{n-1} T_*^i \eta.$$
A direct calculation reveals that
$$T_*\mu_n = \mu_n + \frac{1}{n}\left(T_*^n\eta - \eta\right).$$
This means that µn is almost T∗-invariant in the sense that µn and T∗µn differ by a measure whose total variation is ≤ 2/n. In particular, it is natural to search for an adequate topology on M(X) so that we can find fixed points of T∗ by taking limits. Fortunately, we learn in Functional Analysis that the Banach-Alaoglu theorem implies that the convex set M(X) is compact with respect to the weak-∗ topology. Recall that, by definition, the weak-∗ topology is the coarsest topology such that all linear functionals $\mu \mapsto \int f\,d\mu$ are continuous (where f : X → R ranges over the continuous functions). Furthermore, since T is continuous and
$$\int_X f\,d(T_*\mu) = \int_X (f\circ T)\,d\mu,$$
it follows that T∗ : M(X) → M(X) is continuous with respect to the weak-∗ topology. At this point, it is very easy to conclude the existence of fixed points of T∗. By weak-∗ compactness of M(X), we can take a subsequence $\mu_{n_k}$ of $\mu_n$ converging to a probability measure µ, and by weak-∗ continuity of T∗,
$$T_*\mu = \lim_{k\to+\infty} T_*\mu_{n_k} = \lim_{k\to+\infty}\left(\mu_{n_k} + \frac{T_*^{n_k}\eta - \eta}{n_k}\right) = \lim_{k\to+\infty}\mu_{n_k} = \mu,$$
that is, µ is T-invariant. The proof of Krylov-Bogolyubov theorem is complete.
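The averaging step can be observed on a finite example, where every map is continuous. In the Python sketch below (our own illustration), X = {0, 1, 2, 3, 4} and the hypothetical non-invertible map T sends 0 → 1 → 2 → 3 → 4 → 2. The averages µn built from η = δ0 satisfy $\|T_*\mu_n - \mu_n\| = 2/n$, where total variation is computed as the sum of the absolute differences of point masses. They approach the invariant measure that gives mass 1/3 to each point of the cycle {2, 3, 4}. In this finite case the whole sequence µn converges, so no subsequence has to be extracted; compactness only matters for infinite X.

```python
from fractions import Fraction


def push_forward(T, eta):
    """(T_* eta)({j}) = eta(T^{-1}{j}): the mass at i moves to T(i)."""
    out = [Fraction(0)] * len(eta)
    for i, mass in enumerate(eta):
        out[T[i]] += mass
    return out


def cesaro_average(T, x, n):
    """mu_n = (1/n) * sum_{i=0}^{n-1} T_*^i delta_x, in exact arithmetic."""
    eta = [Fraction(0)] * len(T)
    eta[x] = Fraction(1)  # eta = delta_x
    total = [Fraction(0)] * len(T)
    for _ in range(n):
        total = [s + e for s, e in zip(total, eta)]
        eta = push_forward(T, eta)
    return [s / n for s in total]


def total_variation(p, q):
    """Total variation of the signed measure p - q on a finite set."""
    return sum(abs(a - b) for a, b in zip(p, q))


T = [1, 2, 3, 4, 2]  # hypothetical map: 0 -> 1 -> 2 -> 3 -> 4 -> 2
for n in (10, 100, 300):
    mu_n = cesaro_average(T, 0, n)
    print(n, total_variation(push_forward(T, mu_n), mu_n))  # equals 2/n here
```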
Remark 32. In the fourth lecture we’ll see a generalization of Krylov-Bogolyubov theorem for the
action of amenable groups.
Closing this lecture, we will introduce the shift dynamics and derive both van der Waerden and Szemerédi theorems from multiple recurrence results. Finally, in the Appendices, we present the proof of one of these multiple recurrence theorems and an alternative (combinatorial) proof of van der Waerden theorem.
4. Shift dynamics
Symbolic spaces are the natural objects to work with partitions of N. In fact, assume
$$\mathbb{N} = \bigcup_{i=1}^{r} C_i.$$
Consider a finite alphabet A = {1, . . . , r} and introduce the space $\Sigma = A^{\mathbb{N}}$ of all (semi-)infinite words formed by the letters of A, i.e.,
$$\Sigma := \{(x_1, \dots, x_n, \dots) \;;\; x_i \in A,\ \forall\, i \in \mathbb{N}\}.$$
The partition above is naturally associated to the element x = (xn ) of Σ by the relation
xn = i ⇐⇒ n ∈ Ci .
In other words, Σ represents the set of all partitions of N into r pieces.
We equip Σ with the metric
$$d(x, y) := 1/l,$$
where x = (x1, . . . ) and y = (y1, . . . ) are distinct elements of Σ and l is the smallest integer such that $x_l \ne y_l$ (and d(x, x) := 0).
Exercise 33. Show that (Σ, d) is a compact metric space.
In this setting, a natural transformation is the shift map σ : Σ → Σ defined as
$$\sigma((x_1, \dots, x_n, \dots)) := (x_2, \dots, x_{n+1}, \dots).$$
Exercise 34. Prove that σ is a continuous map of (Σ, d).
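The metric d and the shift σ are easy to experiment with on finite prefixes of words. The Python sketch below (helper names and the sample partition are hypothetical choices of ours) encodes a partition of N into two pieces as in the relation xn = i ⟺ n ∈ Ci. It then checks the fact used in the next subsection: $d(\sigma^m x, \sigma^l x) < 1$ exactly when $x_{m+1} = x_{l+1}$. Since only a prefix is stored, the returned distance is exact only when the prefixes differ somewhere.

```python
def d(x, y):
    """d(x, y) = 1/l, where l is the smallest (1-based) index with x_l != y_l.

    x and y are finite prefixes (tuples); if they agree on their common
    length we return 0.0, a stand-in for "closer than the prefix can show".
    """
    for l, (a, b) in enumerate(zip(x, y), start=1):
        if a != b:
            return 1 / l
    return 0.0


def shift(x, m=1):
    """sigma^m on a finite prefix: drop the first m letters."""
    return x[m:]


def encode(members, length):
    """Prefix (x_1, ..., x_length) of the word with x_n = i iff n is in C_i.

    members[i - 1] is a membership test for C_i; the tests are assumed to
    describe a partition (exactly one of them holds for each n).
    """
    return tuple(
        next(i for i, member in enumerate(members, start=1) if member(n))
        for n in range(1, length + 1)
    )


# Hypothetical partition: C_1 = multiples of 3, C_2 = all other n.
x = encode([lambda n: n % 3 == 0, lambda n: n % 3 != 0], 60)

# d(sigma^m x, sigma^l x) < 1 exactly when x_{m+1} = x_{l+1}
# (x[m] is x_{m+1} because Python indexes from 0).
for m in range(20):
    for l in range(20):
        assert (d(shift(x, m), shift(x, l)) < 1) == (x[m] == x[l])
```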
4.1. Proof of van der Waerden assuming topological multiple recurrence. As a first
application of the shift dynamics, we prove van der Waerden theorem using the following result,
whose proof is in Appendix A.
Theorem 35 (Furstenberg and Weiss topological multiple recurrence theorem). Let T : X → X be a continuous map of the compact metric space (X, d). Then, for any k ∈ N and ε > 0, there exist n ∈ N and x ∈ X such that
$$d(T^{in}x, x) < \varepsilon, \quad i = 1, 2, \dots, k.$$
Moreover, given any dense subset Z ⊂ X, we can take x ∈ Z.
Consider the shift σ : Σ → Σ and the element x ∈ Σ associated to the given partition. Recall that, by the definitions of the distance d and of the shift map σ, we have
$$d(\sigma^m y, \sigma^l z) < 1 \iff y_{m+1} = z_{l+1}.$$
Therefore, the existence of a monochromatic arithmetic progression m, m + n, . . . , m + kn is equivalent to $x_m = x_{m+n} = \dots = x_{m+kn}$, that is,
$$d(\sigma^{m-1}x, \sigma^{in}\sigma^{m-1}x) < 1, \quad i = 1, 2, \dots, k.$$
We take X to be the closure in Σ of the orbit $\{\sigma^l x \,;\, l \ge 0\}$. Note that X is a compact metric space, σ|X : X → X is continuous and $Z := \{\sigma^l x \,;\, l \ge 0\} \subset X$ is dense. Thus, Theorem 35 (with ε = 1) says that we can fix m ∈ N such that, for some n ∈ N, the element $z := \sigma^{m-1}x \in Z$ satisfies
$$d(z, \sigma^{in}z) = d(\sigma^{m-1}x, \sigma^{in}\sigma^{m-1}x) < 1, \quad i = 1, 2, \dots, k,$$
that is, m, m + n, . . . , m + kn all lie in the same piece of the partition. Since there are only r pieces, one of them receives such progressions for infinitely many values of k, and since long progressions contain shorter ones, this piece is AP-rich. This completes the proof.
4.2. Proof of Szemerédi theorem assuming multiple recurrence. As a second application
of the shift dynamics, we reduce Szemerédi theorem to Furstenberg multiple recurrence theorem
(Theorem 24) via a weak version of Furstenberg correspondence principle.
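More precisely, we fix our subset A ⊂ N of positive upper density and introduce
$$x_n := \chi_A(n),$$
where χA is the characteristic function of A. It follows that $x = (x_1, \dots, x_n, \dots) \in \{0, 1\}^{\mathbb{N}}$. Using the orbit of x under the shift σ, we form the Birkhoff averages
$$\mu_n := \frac{1}{n}\sum_{i=0}^{n-1} \delta_{\sigma^i(x)}$$
and extract a (σ-invariant) limit µ along a subsequence, exactly as we did in the proof of Krylov-Bogolyubov theorem. Here we first choose a subsequence (lj) along which $\#(A \cap \{1, \dots, l_j\})/l_j \to \overline{d}(A)$, and then pass to a further subsequence, still denoted (lj), along which $\mu_{l_j}$ converges to µ.
Define $Y := \{y = (y_n) \in \{0, 1\}^{\mathbb{N}} \,;\, y_1 = 1\}$. Observe that Y is a closed and open subset of $\{0, 1\}^{\mathbb{N}}$, so that χY is a continuous function. Since $\sigma^i x \in Y$ if and only if i + 1 ∈ A, we get
$$\mu(Y) = \lim_{j\to\infty}\mu_{l_j}(Y) = \lim_{j\to\infty}\frac{\#(A \cap \{1, \dots, l_j\})}{l_j} = \overline{d}(A) > 0.$$
Hence, we can apply Furstenberg multiple recurrence theorem to conclude that, for every k ∈ N, there exists N ∈ N with
$$\mu(Y \cap \sigma^{-N}Y \cap \dots \cap \sigma^{-(k-1)N}Y) > 0.$$
Consequently, since this set is also closed and open, for every sufficiently large j we also have
$$\mu_{l_j}(Y \cap \sigma^{-N}Y \cap \dots \cap \sigma^{-(k-1)N}Y) > 0.$$
Thus, we can find 0 ≤ m < lj such that $\sigma^m x \in Y \cap \sigma^{-N}Y \cap \dots \cap \sigma^{-(k-1)N}Y$, that is, m + 1, m + 1 + N, . . . , m + 1 + (k − 1)N ∈ A.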
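The bookkeeping in this argument is easy to get off by one, because $\sigma^i x \in Y$ means that i + 1 ∈ A. The Python sketch below uses a hypothetical set A (the non-multiples of 3) chosen by us. It computes the empirical measures µl on finite prefixes and reads an arithmetic progression in A off a point m with $\sigma^m x$ in the intersection. It does not replace the multiple recurrence theorem, which is what guarantees such an m for every set of positive upper density.

```python
from fractions import Fraction


def in_A(n):
    """Hypothetical example set A = {n : n not divisible by 3}, density 2/3."""
    return n % 3 != 0


def mu(x, l, event):
    """Empirical measure mu_l(event) = #{0 <= i < l : sigma^i x in event} / l."""
    return Fraction(sum(1 for i in range(l) if event(x[i:])), l)


def in_Y(y):
    """Y = {y : y_1 = 1}; y[0] is y_1."""
    return y[0] == 1


def recurrence_set(N, k):
    """Y ∩ sigma^{-N}Y ∩ ... ∩ sigma^{-(k-1)N}Y,
    i.e. y_1 = y_{1+N} = ... = y_{1+(k-1)N} = 1."""
    return lambda y: all(y[j * N] == 1 for j in range(k))


x = tuple(1 if in_A(n) else 0 for n in range(1, 400))  # x_n = chi_A(n)
print(mu(x, 300, in_Y))  # 2/3, which is #(A ∩ {1, ..., 300}) / 300

N, k = 3, 4
m = next(i for i in range(300) if recurrence_set(N, k)(x[i:]))
print([m + 1 + j * N for j in range(k)])  # [1, 4, 7, 10]: a 4-term AP in A
```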
5. Some general comments on Furstenberg multiple recurrence theorem
Closing this lecture, let me point out the plan of the subsequent second and third lectures. As
we are going to see later, Furstenberg multiple recurrence theorem can be proved in two steps:
• first, we verify the result in two special classes of dynamical systems: weak mixing systems
(pseudorandom) and compact systems (structured);
• after that, Furstenberg multiple recurrence theorem is a consequence of Furstenberg structural theorem, which states that any dynamical system is a tower of extensions of compact
systems and weak mixing systems.
In the second lecture, we study the dynamics of weak mixing systems (which include the shift dynamics) and sketch the proof of multiple recurrence theorem for these systems. In the third lecture, we study the dynamics of compact systems (a class including the rotations of the circle) and prove multiple recurrence theorem for this class. In particular, we reduce multiple recurrence theorem to Furstenberg structural theorem⁶. Unfortunately, the technical details of this last theorem are beyond the scope of this mini-course. Instead, the last two lectures deal with ergodic theory of actions of amenable groups and Hindman theorem.
Appendix A. Proof of Furstenberg and Weiss topological multiple recurrence theorem
A preliminary observation is: if, for some k and ε, the first part of the theorem holds for a certain x (with a certain n), then, by continuity of $T^n, \dots, T^{kn}$, the same statement is true for every point of a small neighborhood of x, and, a fortiori, the theorem works with some element of any fixed dense subset Z. Hence, it suffices to show the first part of the theorem to get a full proof of it.
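Next, we notice that Zorn’s lemma implies that one can assume that X is minimal, i.e., X doesn’t possess any nonempty proper closed subset Y such that T(Y) ⊂ Y (indeed, it suffices to prove the first part of the theorem for the restriction of T to a minimal nonempty closed T-invariant subset of X). Observe that, in this situation, the forward orbit $\{T^m(x)\}_{m=1}^{\infty}$ of every x ∈ X is dense in X, so that the theorem is true for k = 1 (since, by denseness, there exists some n ∈ N with $d(T^n(x), x) < \varepsilon$).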
At this stage, the proof proceeds by induction (on k). Suppose that the theorem holds for some k ≥ 1, i.e., for all ε > 0 there exist x ∈ X and n ∈ N such that $d(T^{in}(x), x) < \varepsilon$ for each i = 1, . . . , k. We claim that, for each ε > 0, the set of such points x is actually dense in X.
Indeed, let U ⊂ X be an arbitrary open subset and pick a small ball B ⊂ U of radius strictly less than ε/2. Define $B_m = (T^m)^{-1}(B)$, so that these subsets form an open cover of X (by the minimality assumption). Using the compactness of X, we can extract a finite subcover $\{B_{m_1}, \dots, B_{m_r}\}$. Let δ > 0 be a Lebesgue number of this open subcover, that is, a number such that any ball of radius δ is contained inside some element of the subcover. Take x and n such that $d(T^{in}(x), x) < \delta$ for i = 1, . . . , k (whose existence is assured by the inductive hypothesis) and denote by D the ball of center x and radius δ. Then, by our choice of δ, there exists j such that $D \subset B_{m_j}$. In particular, $T^{m_j}(D) \subset B$, so the point $y := T^{m_j}(x)$ and the points $T^{in}(y) = T^{m_j}(T^{in}(x))$, i = 1, . . . , k, all lie in B, whose diameter is less than ε. Hence y ∈ U and $d(T^{in}(y), y) < \varepsilon$ for i = 1, . . . , k. This proves our denseness claim.
Now, let’s go back to the proof of the theorem. Fix ε > 0. By the inductive hypothesis, there are x0 and n0 such that $d(T^{in_0}x_0, x_0) < \varepsilon/2$ for i = 1, . . . , k. Taking x1 such that $T^{n_0}(x_1) = x_0$ (T is onto, since T(X) is a nonempty closed T-invariant subset of the minimal space X), we have $d(T^{(i+1)n_0}x_1, x_0) < \varepsilon/2$ for i = 1, . . . , k. Hence, $d(T^{in_0}(x_1), x_0) < \varepsilon/2$ for i = 1, . . . , k + 1. By continuity, there exists ε1 < ε such that d(y, x1) < ε1 implies $d(T^{in_0}(y), x_0) < \varepsilon/2$ for i = 1, . . . , k + 1. By our denseness claim, there are y1 and n1 such that d(y1, x1) < ε1/2 and $d(T^{in_1}(y_1), y_1) < \varepsilon_1/2$ for i = 1, . . . , k. By the triangle inequality, $d(T^{(i-1)n_1}(y_1), x_1) < \varepsilon_1$ for i = 1, . . . , k + 1, and therefore
$$d\big(T^{in_0}(T^{(i-1)n_1}(y_1)), x_0\big) < \varepsilon/2 \quad\text{for } i = 1, \dots, k + 1.$$
Footnote 6. This statement is not completely correct. After proving multiple recurrence for compact and for weak mixing systems, one has to define the notions of compact and weak mixing extensions and prove that such extensions preserve the required property. Once this is established, the multiple recurrence theorem is reduced to the Furstenberg structure theorem.
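Proceeding in this way (taking $x_2$ such that $T^{n_1}(x_2) = y_1$, etc., and choosing each new continuity radius $\varepsilon_l$ smaller than ε/2), we find a sequence of points $x_2, x_3, \ldots \in X$ and a sequence of natural numbers $n_2, n_3, \ldots$ such that, for each l ≥ 1, we have:
$$\begin{aligned}
d\big(T^{in_{l-1}}(x_l), x_{l-1}\big) &< \varepsilon/2,\\
d\big(T^{i(n_{l-1}+n_{l-2})}(x_l), x_{l-2}\big) &< \varepsilon/2,\\
&\;\;\vdots\\
d\big(T^{i(n_{l-1}+\cdots+n_0)}(x_l), x_0\big) &< \varepsilon/2,
\end{aligned}$$
for $i = 1, \dots, k+1$.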
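By compactness, the sequence $(x_l)$ has a convergent subsequence, so there are l > m such that $d(x_l, x_m) < \varepsilon/2$. By the triangle inequality (combined with the estimate above involving $x_m$), we have:
$$d\big(T^{i(n_{l-1}+\cdots+n_m)}(x_l), x_l\big) < \varepsilon \quad \text{for } i = 1, \dots, k+1.$$
Therefore, it suffices to take $x = x_l$ and $n = n_{l-1} + \cdots + n_m$ to conclude the proof of the Furstenberg-Weiss theorem.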
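To see the conclusion of the theorem in a concrete case, here is a small Python sketch (not part of the original argument) that searches for a common return time n with $d(T^{in}(x), x) < \varepsilon$ for $i = 1, \dots, k$. The example system is an assumption chosen for illustration: the rotation $T(x) = x + \alpha \pmod 1$ of the circle $\mathbb{R}/\mathbb{Z}$, with α the golden ratio conjugate, which is minimal because α is irrational. The helper names are hypothetical, and the search is plain brute force rather than the inductive construction above.

```python
import math
from typing import Callable, List, Optional

def circle_dist(x: float, y: float) -> float:
    """Distance on the circle R/Z: distance from x - y to the nearest integer."""
    t = (x - y) % 1.0
    return min(t, 1.0 - t)

def rotation(alpha: float) -> Callable[[float], float]:
    """Illustrative example system (an assumption): T(x) = x + alpha mod 1."""
    return lambda x: (x + alpha) % 1.0

def orbit(T: Callable[[float], float], x: float, length: int) -> List[float]:
    """The points x, T(x), ..., T^length(x)."""
    pts = [x]
    for _ in range(length):
        pts.append(T(pts[-1]))
    return pts

def multiple_return_time(T: Callable[[float], float], x: float, k: int,
                         eps: float, n_max: int) -> Optional[int]:
    """Smallest n in [1, n_max] with d(T^{in}(x), x) < eps for i = 1, ..., k, or None."""
    pts = orbit(T, x, k * n_max)  # pts[m] = T^m(x)
    for n in range(1, n_max + 1):
        if all(circle_dist(pts[i * n], x) < eps for i in range(1, k + 1)):
            return n
    return None

alpha = (math.sqrt(5) - 1) / 2
n = multiple_return_time(rotation(alpha), x=0.3, k=3, eps=0.01, n_max=1000)
print(n)
```

For a rotation, $d(T^{in}(x), x)$ does not depend on x, so the choice x = 0.3 is immaterial; for a general minimal system the search would have to be repeated point by point.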
Appendix B. Combinatorial proof of van der Waerden theorem
In this Appendix we prove the van der Waerden theorem via a coloring argument in combinatorics. In order to alleviate the notation, we denote the arithmetic progression $a, a + r, \dots, a + (k-1)r$ by $a + [0, k)r$, and we assume that m colors are available to color the natural numbers from 1 to N.
Definition 36. Let $c : \{1, \dots, N\} \to \{1, \dots, m\}$ be a coloring. Given k ≥ 1, d ≥ 0 and $a \in \{1, \dots, N\}$, a fan of radius k and degree d with base point a is a d-tuple of arithmetic progressions $(a + [0, k)r_1, \dots, a + [0, k)r_d)$, where $r_1, \dots, r_d > 0$. For each 1 ≤ i ≤ d, the progression $a + [1, k)r_i$ is called a spoke of the fan. We say that a fan is polychromatic if each of its spokes is monochromatic and the base point and the spokes have pairwise distinct colors, i.e., there are distinct colors $c_0, c_1, \dots, c_d$ such that $c(a) = c_0$ and $c(a + jr_i) = c_i$ for $j = 1, \dots, k-1$ and $i = 1, \dots, d$.
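Definition 36 can be checked mechanically. The following Python sketch (the function name and the representation of a coloring as a function c on {1, ..., N} are our own choices, and colors are labelled 0, 1, 2, ... here) tests whether positive differences $r_1, \dots, r_d$ give a polychromatic fan of radius k with base point a. For k = 1 the spokes are empty and the definition degenerates, so the sketch only accepts k ≥ 2.

```python
from typing import Callable, Sequence

def is_polychromatic_fan(c: Callable[[int], int], N: int, a: int,
                         radii: Sequence[int], k: int) -> bool:
    """Test Definition 36 for the fan (a + [0,k)r_1, ..., a + [0,k)r_d) inside {1, ..., N}."""
    if k < 2:
        raise ValueError("this sketch assumes radius k >= 2 (nonempty spokes)")
    if not 1 <= a <= N or any(r <= 0 for r in radii):
        return False
    colors = [c(a)]  # c_0, the color of the base point
    for r in radii:
        spoke = [a + j * r for j in range(1, k)]  # the spoke a + [1, k)r
        if spoke[-1] > N:
            return False  # the spoke leaves {1, ..., N}
        spoke_colors = {c(p) for p in spoke}
        if len(spoke_colors) != 1:
            return False  # spoke not monochromatic
        colors.append(spoke_colors.pop())  # c_i
    return len(set(colors)) == len(colors)  # c_0, ..., c_d pairwise distinct

# Usage on a hypothetical 3-coloring of {1, ..., 7}
coloring = {1: 0, 2: 1, 3: 1, 4: 2, 5: 0, 6: 0, 7: 2}
print(is_polychromatic_fan(coloring.__getitem__, 7, 1, [1, 3], 3))  # True
```

Since the last line of the check demands d + 1 pairwise distinct colors, it can never succeed when d ≥ m, in line with Remark 37.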
Remark 37. Since the colors $c_0, c_1, \dots, c_d$ of a polychromatic fan of degree d are pairwise distinct, such a fan uses d + 1 colors; hence, if we have m colors, it is not possible to construct a polychromatic fan whose degree is ≥ m.
Clearly, the van der Waerden theorem is a direct consequence of the following finitary result:
Theorem 38. Let k, m ≥ 1. Then, there exists N such that any coloring of {1, . . . , N } with m
colors contains a monochromatic arithmetic progression of length k.
Proof. The argument consists of a double induction scheme. Firstly, we make an inductive argument on k: observe that the case k = 1 is trivial, so that we can take k ≥ 2 and assume that the theorem holds for k − 1 (with any number of colors). Secondly, we perform an induction on d, i.e., we show the following claim by induction: given d, there exists N such that, for any coloring of $\{1, \dots, N\}$ with m colors, we have either a monochromatic arithmetic progression of length k or a polychromatic fan of radius k and degree d. Note that the case d = 0 is trivial (a fan of degree 0 is just its base point), and once we prove this claim for d = m, one can use Remark 37 in order to obtain the desired monochromatic arithmetic progression of length k (so that the double inductive argument is complete).
Let us take d ≥ 1 and suppose that this claim is true for d − 1. Let $N = 4kN_1N_2$, where $N_1$ and $N_2$ are large integers to be chosen later, and consider $A = \{1, \dots, N\}$. Fix a coloring $c : \{1, \dots, N\} \to \{1, \dots, m\}$ of A. For each $b = 1, \dots, N_2$, the block $\{bkN_1 + 1, \dots, bkN_1 + N_1\}$ is a subset of A with $N_1$ elements. By our inductive hypothesis (the claim for d − 1, applied to each block translated to $\{1, \dots, N_1\}$), if $N_1$ is sufficiently large, we can find in each block either a monochromatic arithmetic progression of length k or a polychromatic fan of radius k and degree d − 1.
Of course, if we find a monochromatic arithmetic progression of length k inside $\{bkN_1 + 1, \dots, bkN_1 + N_1\}$ for some $b = 1, \dots, N_2$, we are done. Thus, one can suppose that we find a polychromatic fan inside $\{bkN_1 + 1, \dots, bkN_1 + N_1\}$ for every $b = 1, \dots, N_2$. In other words, for each $b = 1, \dots, N_2$, we have $a(b), r_1(b), \dots, r_{d-1}(b) \in \{1, \dots, N_1\}$ and distinct colors $c_0(b), c_1(b), \dots, c_{d-1}(b) \in \{1, \dots, m\}$ such that $c(bkN_1 + a(b)) = c_0(b)$ and $c(bkN_1 + a(b) + jr_i(b)) = c_i(b)$ for every $j = 1, \dots, k-1$ and $i = 1, \dots, d-1$. We refer to these as the first and second properties of the fan associated to b. In particular, the map
$$b \mapsto \big(a(b), r_1(b), \dots, r_{d-1}(b), c_0(b), \dots, c_{d-1}(b)\big)$$
is a coloring of the set $\{1, \dots, N_2\}$ with $m^d N_1^d$ colors. Using again our inductive hypothesis on k (the theorem for k − 1, now with $m^d N_1^d$ colors), if $N_2$ is sufficiently large, there exists some arithmetic progression $b + [0, k-1)s$ which is monochromatic with respect to this new coloring; say that its color is $(a, r_1, \dots, r_{d-1}, c_0, c_1, \dots, c_{d-1})$. Up to reversing the direction of the progression, we can suppose that s is negative.
At this point, the idea is to convert this long progression of identical polychromatic fans of degree d − 1 (in the sense that their combinatorial type is fixed by the color $(a, r_1, \dots, r_{d-1}, c_0, c_1, \dots, c_{d-1})$) into a new fan of degree d, in order to close the inductive argument. Let $b_0 = (b - s)kN_1 + a \in \{1, \dots, N\}$ and consider:
$$\big(b_0 + [0, k)\,skN_1,\; b_0 + [0, k)(skN_1 + r_1),\; \dots,\; b_0 + [0, k)(skN_1 + r_{d-1})\big).$$
We claim that this is a fan of radius k and degree d with base point $b_0$, whose spokes are monochromatic.
Indeed, let us verify that the spokes are monochromatic. In the first spoke we have $c(b_0 + jskN_1) = c((b + (j-1)s)kN_1 + a)$ by direct substitution. By the first property of the fan associated to $b + (j-1)s$, it follows that $c((b + (j-1)s)kN_1 + a) = c_0(b + (j-1)s) = c_0$ for 1 ≤ j ≤ k − 1 (since the arithmetic progression $b + [0, k-1)s$ is monochromatic). Similarly, in an arbitrary spoke, using the second property of the fans, we have that, if 1 ≤ j ≤ k − 1 and 1 ≤ t ≤ d − 1, then
$$c\big(b_0 + j(skN_1 + r_t)\big) = c\big((b + (j-1)s)kN_1 + a + jr_t\big) = c_t(b + (j-1)s) = c_t.$$
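The index bookkeeping behind the new fan can be double-checked numerically. In the Python sketch below (the function name and the sample values are hypothetical, and the coloring itself is not modelled), the first spoke is encoded by the difference $skN_1$, i.e., by taking r = 0, and the assertions recompute the identity $b_0 + j(skN_1 + r_t) = (b + (j-1)s)kN_1 + a + jr_t$ used above, which places the j-th point of every spoke in block $b + (j-1)s$.

```python
from typing import List, Sequence, Tuple

def combined_fan(b: int, s: int, a: int, radii: Sequence[int],
                 k: int, N1: int) -> Tuple[int, List[List[int]]]:
    """Base point b0 = (b - s) k N1 + a and the d progressions b0 + [0, k)(s k N1 + r),
    where r runs over 0, r_1, ..., r_{d-1} (r = 0 gives the first spoke)."""
    b0 = (b - s) * k * N1 + a
    diffs = [s * k * N1 + r for r in [0, *radii]]
    return b0, [[b0 + j * q for j in range(k)] for q in diffs]

# Hypothetical sample data: block progression b, b + s, ..., b + (k-2)s with s < 0
b, s, a, radii, k, N1 = 10, -2, 3, [1, 2], 4, 20
b0, progressions = combined_fan(b, s, a, radii, k, N1)
for j in range(1, k):
    block = b + (j - 1) * s  # the block containing the j-th point of each spoke
    for t, r in enumerate([0, *radii]):
        assert progressions[t][j] == block * k * N1 + a + j * r
        assert 1 <= a + j * r <= N1  # the point lies inside that block
print(b0, progressions[0])
```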
Since $c_0, c_1, \dots, c_{d-1}$ are distinct, the d spokes have distinct colors. If the base point $b_0$ has the same color as one of the spokes, then $b_0$ together with that spoke is a monochromatic arithmetic progression of length k. Otherwise, the base point has a color distinct from those of the spokes, so that we have found a polychromatic fan of radius k and degree d. This ends the inductive step and, with it, the proof of the theorem.
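For small parameters, Theorem 38 can be verified by exhaustive search. The Python sketch below is a brute-force illustration unrelated to the inductive proof (helper names are hypothetical). It checks that every 2-coloring of {1, ..., 9} has a monochromatic arithmetic progression of length 3, while the coloring sending 1, 2, 5, 6 to one color and 3, 4, 7, 8 to the other has none; so N = 9 is the smallest admissible value for k = 3 and m = 2 (the classical van der Waerden number W(3; 2) = 9).

```python
from itertools import product
from typing import Sequence

def has_mono_ap(coloring: Sequence[int], k: int) -> bool:
    """coloring[i] is the color of i + 1 in {1, ..., N}; look for a, a + r, ..., a + (k-1)r
    (r >= 1) all of the same color."""
    N = len(coloring)
    if k <= 1:
        return N >= 1
    for a in range(N):
        for r in range(1, N):
            if a + (k - 1) * r >= N:
                break  # larger r only leaves {1, ..., N} further
            if len({coloring[a + j * r] for j in range(k)}) == 1:
                return True
    return False

def every_coloring_has_mono_ap(N: int, m: int, k: int) -> bool:
    """Exhaustive check of the conclusion of Theorem 38 for given N, m, k (m^N colorings)."""
    return all(has_mono_ap(col, k) for col in product(range(m), repeat=N))

print(every_coloring_has_mono_ap(9, 2, 3))        # True
print(has_mono_ap((0, 0, 1, 1, 0, 0, 1, 1), 3))   # False
```

The cost grows like $m^N$, which is why such searches say nothing about the quantitative behavior of N in the theorem.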
Appendix C. Proof of Furstenberg correspondence principle
Consider a sequence of non-negative real numbers $(a_n)_{n\in\mathbb{Z}}$ such that
$$\frac{a_N + a_M + \sum_{n=N}^{M-1} |a_{n+1} - a_n|}{\sum_{n=N}^{M} a_n} \longrightarrow 0 \quad \text{as } M - N \to +\infty. \tag{C.1}$$
For example, every positive constant sequence satisfies this condition. Given a set $E \subset \mathbb{Z}$, define the upper Banach density of E with respect to $(a_n)$ as:
$$d^*(E) = \limsup_{M-N\to+\infty} \frac{a_N\,\delta_N(E) + \cdots + a_M\,\delta_M(E)}{a_N + \cdots + a_M},$$
where $\delta_n$ stands for the usual Dirac measure at n, i.e., $\delta_n(E) = 1$ if n ∈ E and $\delta_n(E) = 0$ otherwise. This density is a well-defined number between 0 and 1. We will prove the following
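Theorem 39. If $E \subset \mathbb{Z}$ satisfies $d^*(E) > 0$, then there exist a measure-preserving system $(X, \mathcal{B}, \mu, T)$ and a set $A \in \mathcal{B}$ such that $\mu(A) = d^*(E)$ and
$$d^*\big((E - n_1) \cap \cdots \cap (E - n_t)\big) \ge \mu\big(T^{-n_1}A \cap \cdots \cap T^{-n_t}A\big) \tag{C.2}$$
for every $n_1, \dots, n_t \in \mathbb{Z}$.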
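Proof. We consider the most natural system: the set of characteristic functions of sets of integers,
$$X = \{0,1\}^{\mathbb{Z}}, \qquad \mathcal{B} = \text{the } \sigma\text{-algebra generated by the cylinders}.$$
The dynamics considered is the bilateral shift $T : X \to X$ defined by
$$T(\dots, x_0; x_1, x_2, \dots) = (\dots, x_1; x_2, \dots),$$
i.e., $(Tx)_m = x_{m+1}$ for all $m \in \mathbb{Z}$ (the semicolon is placed right after the coordinate of index 0).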
We apply a Krylov-Bogolyubov argument to create a T-invariant measure µ. Take $x = \chi_E \in X$, that is, $x = (\dots, x_{-1}, x_0; x_1, \dots)$ with $x_n = \delta_n(E)$.
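Let $(N_k)_{k\in\mathbb{N}}$, $(M_k)_{k\in\mathbb{N}}$ be sequences of integers with $M_k - N_k \to +\infty$ for which
$$d^*(E) = \lim_{k\to+\infty} \frac{a_{N_k}\,\delta_{N_k}(E) + \cdots + a_{M_k}\,\delta_{M_k}(E)}{a_{N_k} + \cdots + a_{M_k}}.$$
Since X is a compact metrizable space, the set of Borel probability measures on X is sequentially compact in the weak-∗ topology. Hence we may assume (restricting $(N_k)$, $(M_k)$ to subsequences, if necessary) that the probabilities $(\mu_k)_{k\in\mathbb{N}}$ defined by
$$\mu_k = \frac{a_{N_k}\,\delta_{T^{N_k}x} + \cdots + a_{M_k}\,\delta_{T^{M_k}x}}{a_{N_k} + \cdots + a_{M_k}}$$
converge in the weak-∗ topology to a probability µ, that is,
$$\int_X f\,d\mu_k \longrightarrow \int_X f\,d\mu \quad \text{for every continuous } f : X \to \mathbb{R}.$$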
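Under the assumption (C.1), µ is T-invariant. In fact,
$$T_*\mu_k = \frac{a_{N_k}\,\delta_{T^{N_k+1}x} + \cdots + a_{M_k}\,\delta_{T^{M_k+1}x}}{a_{N_k} + \cdots + a_{M_k}}$$
and so
$$T_*\mu_k - \mu_k = \frac{1}{\sum_{n=N_k}^{M_k} a_n}\left(a_{M_k}\,\delta_{T^{M_k+1}x} - a_{N_k}\,\delta_{T^{N_k}x} - \sum_{n=N_k}^{M_k-1} (a_{n+1} - a_n)\,\delta_{T^{n+1}x}\right),$$
so that the total variation satisfies
$$|T_*\mu_k - \mu_k| \le \frac{a_{N_k} + a_{M_k} + \sum_{n=N_k}^{M_k-1} |a_{n+1} - a_n|}{\sum_{n=N_k}^{M_k} a_n},$$
which, by hypothesis (C.1), converges to zero as k → +∞. Consequently, for every continuous $f : X \to \mathbb{R}$, we have $\left|\int_X f\circ T\,d\mu_k - \int_X f\,d\mu_k\right| \le \|f\|_\infty\,|T_*\mu_k - \mu_k| \to 0$; since $f \circ T$ is also continuous, letting k → +∞ gives $\int_X f\circ T\,d\mu = \int_X f\,d\mu$, i.e., $T_*\mu = \mu$. Good: we have our measure-preserving system $(X, \mathcal{B}, \mu, T)$!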
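The cancellation in the computation of $T_*\mu_k - \mu_k$ can be reproduced on a finite window. In the Python sketch below (our own illustration; helper names are hypothetical), each atom $\delta_{T^n x}$ is labelled by n, which presumes that the orbit points $T^n x$ are pairwise distinct. Under that assumption the total variation of $T_*\mu_k - \mu_k$ equals the ratio in (C.1); the sample weights $a_n = |n| + 1$ are an assumption, chosen as a non-constant sequence for which the ratio visibly shrinks as the window grows.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Callable, Dict

def pushforward_difference(a: Callable[[int], int], N: int, M: int) -> Dict[int, Fraction]:
    """Coefficients of T_*mu - mu, where mu = sum_{N<=n<=M} a_n delta_{T^n x} / sum a_n
    and the atom delta_{T^n x} is labelled by n (orbit points assumed distinct)."""
    total = sum(a(n) for n in range(N, M + 1))
    diff: Dict[int, Fraction] = defaultdict(Fraction)
    for n in range(N, M + 1):
        w = Fraction(a(n), total)
        diff[n + 1] += w  # T_* moves the atom at T^n x to T^{n+1} x
        diff[n] -= w
    return diff

def c1_ratio(a: Callable[[int], int], N: int, M: int) -> Fraction:
    """(a_N + a_M + sum_{n=N}^{M-1} |a_{n+1} - a_n|) / sum_{n=N}^{M} a_n, as in (C.1)."""
    num = a(N) + a(M) + sum(abs(a(n + 1) - a(n)) for n in range(N, M))
    den = sum(a(n) for n in range(N, M + 1))
    return Fraction(num, den)

a = lambda n: abs(n) + 1  # sample weights (an assumption), non-negative and slowly varying
for L in (10, 100, 1000):
    tv = sum(abs(coef) for coef in pushforward_difference(a, -L, L).values())
    print(L, tv == c1_ratio(a, -L, L), float(tv))
```

When some orbit points coincide (for instance, if E is periodic), atoms merge and can partially cancel, so the total variation is then only bounded by the ratio, which is the inequality used above.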
Continue the construction taking $A = \{(\dots, x_{-1}, x_0; x_1, \dots) \in X : x_0 = 1\}$. Then
$$\delta_{T^n x}(A) = 1 \iff (T^n x)_0 = 1 \iff x_n = 1 \iff n \in E \iff \delta_n(E) = 1,$$
that is,
$$\delta_{T^n x}(A) = \delta_n(E).$$
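This is the connection between A and E we were looking for. It implies that
$$\mu_k(A) = \frac{a_{N_k}\,\delta_{T^{N_k}x}(A) + \cdots + a_{M_k}\,\delta_{T^{M_k}x}(A)}{a_{N_k} + \cdots + a_{M_k}} = \frac{a_{N_k}\,\delta_{N_k}(E) + \cdots + a_{M_k}\,\delta_{M_k}(E)}{a_{N_k} + \cdots + a_{M_k}}$$
and, as A is a clopen set (all cylinders are clopen), its characteristic function is continuous, so that
$$\mu(A) = \lim_{k\to+\infty} \mu_k(A) = d^*(E).$$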
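It remains to verify (C.2), which actually follows from the last argument:
$$\begin{aligned}
\delta_{T^n x}\big(T^{-n_1}A \cap \cdots \cap T^{-n_t}A\big) = 1
&\iff T^n x \in T^{-n_1}A \cap \cdots \cap T^{-n_t}A\\
&\iff T^{n+n_1}x, \dots, T^{n+n_t}x \in A\\
&\iff x_{n+n_1} = \cdots = x_{n+n_t} = 1\\
&\iff n + n_1, \dots, n + n_t \in E\\
&\iff n \in (E - n_1) \cap \cdots \cap (E - n_t)\\
&\iff \delta_n\big((E - n_1) \cap \cdots \cap (E - n_t)\big) = 1.
\end{aligned}$$
Since $T^{-n_1}A \cap \cdots \cap T^{-n_t}A$ is also a clopen cylinder, the same computation as for A gives
$$\mu\big(T^{-n_1}A \cap \cdots \cap T^{-n_t}A\big) = \lim_{k\to+\infty} \frac{\sum_{n=N_k}^{M_k} a_n\,\delta_n\big((E - n_1) \cap \cdots \cap (E - n_t)\big)}{\sum_{n=N_k}^{M_k} a_n},$$
and, because $M_k - N_k \to +\infty$, this limit is at most the lim sup defining $d^*((E - n_1) \cap \cdots \cap (E - n_t))$. Hence
$$d^*\big((E - n_1) \cap \cdots \cap (E - n_t)\big) \ge \mu\big(T^{-n_1}A \cap \cdots \cap T^{-n_t}A\big).$$
This concludes the proof.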
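At any finite stage, the identity between $\delta_{T^n x}(T^{-n_1}A \cap \cdots \cap T^{-n_t}A)$ and $\delta_n((E - n_1) \cap \cdots \cap (E - n_t))$ says that $\mu_k$ of the cylinder and the weighted proportion of $(E - n_1) \cap \cdots \cap (E - n_t)$ in the window $[N_k, M_k]$ are the same number. The Python sketch below checks this on one window; representing points of X as functions on ℤ, the example set E = 3ℤ and the constant weights are our own choices, and the helper names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Sequence

Point = Callable[[int], int]  # a point y of X = {0,1}^Z, given by its coordinates y_m

def shift_power(x: Point, n: int) -> Point:
    """T^n x for the shift (Tx)_m = x_{m+1}, so (T^n x)_m = x_{m+n}."""
    return lambda m: x(m + n)

def in_pullback(y: Point, shifts: Sequence[int]) -> bool:
    """y in T^{-n_1}A ∩ ... ∩ T^{-n_t}A with A = {y : y_0 = 1}, i.e. y_{n_i} = 1 for all i."""
    return all(y(n_i) == 1 for n_i in shifts)

def mu_window(x: Point, a: Callable[[int], int], N: int, M: int,
              shifts: Sequence[int]) -> Fraction:
    """mu_k(T^{-n_1}A ∩ ... ∩ T^{-n_t}A) for mu_k = sum_n a_n delta_{T^n x} / sum_n a_n."""
    num = sum(a(n) for n in range(N, M + 1) if in_pullback(shift_power(x, n), shifts))
    den = sum(a(n) for n in range(N, M + 1))
    return Fraction(num, den)

def density_window(E: Callable[[int], bool], a: Callable[[int], int], N: int, M: int,
                   shifts: Sequence[int]) -> Fraction:
    """Weighted proportion of n in [N, M] lying in (E - n_1) ∩ ... ∩ (E - n_t)."""
    num = sum(a(n) for n in range(N, M + 1) if all(E(n + n_i) for n_i in shifts))
    den = sum(a(n) for n in range(N, M + 1))
    return Fraction(num, den)

E = lambda n: n % 3 == 0          # hypothetical example set: the multiples of 3
x = lambda m: 1 if E(m) else 0    # x = chi_E
a = lambda n: 1                   # constant weights satisfy (C.1)
for shifts in [(0,), (0, 3), (0, 1), (2, 5, 8)]:
    print(shifts, mu_window(x, a, 0, 2999, shifts), density_window(E, a, 0, 2999, shifts))
```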
Remark 40. This exposition of Furstenberg’s correspondence principle was extracted from Yuri
Lima’s post at the blog [4].
Acknowledgments
This first lecture note is based on a survey written by the second author jointly with Alexander Arbieto and Carlos Gustavo Moreira [1]. In particular, we thank them for several discussions on this subject and for allowing us to use the corresponding texts.
References
1. A. Arbieto, C. Matheus and C. G. Moreira, The remarkable effectiveness of ergodic theory in number theory, Ensaios Matemáticos 17 (2009).
2. V. Bergelson, Ergodic Ramsey Theory - an Update, in Ergodic Theory of ℤ^d-actions, London Math. Soc. Lecture Note Series 228 (1996), 1–61.
3. D. Conlon, A new upper bound for diagonal Ramsey numbers, Annals of Mathematics (2) 170 (2009), 941–960.
4. Disquisitiones Mathematicae, http://matheuscmss.wordpress.com/
5. C. G. Moreira, O Teorema de Ramsey, Revista Eureka! 6 (2000), 23–29.
Instituto Nacional de Matemática Pura e Aplicada, Estrada Dona Castorina 110, 22460-320, Rio de Janeiro, Brasil.
College de France, 3 Rue d’Ulm, Paris CEDEX 05, France.