Ergodicity of Cellular Automata

Andrei Toom
E-mail: [email protected]

This course will be delivered on January 14-18, 2013 at Tartu University, Estonia as a part of the graduate program.
Contents

1. Undirected plane percolation: finite and infinite. Duality.
2. Directed plane percolation: finite and infinite. Duality.
3. Stavskaya process has a critical value. Approximations.
4. d-dimensional DCA (Deterministic Cellular Automata).
5. σ is empty ⇐⇒ D is an eroder ⇐⇒ ∃ α > 0 : Rα D is non-ergodic.
6. Measures on the sigma-algebra in Ω = A^G . M is compact.
7. General definition of PCA (Probabilistic Cellular Automata).
8. In dealing with PCA, should we count time as an extra coordinate?
9. Coupling of measures and processes. Order and Monotonicity.
10. The problem of ergodicity for PCA is undecidable.
Main terms and notations.
References.
Foreword
For several decades mathematicians have studied processes with many (or infinitely many) interacting components. Such processes are known under
many names. In this course we call them cellular automata. By choosing this title we dismiss many non-essential technical complications. The
parameter of time is discrete and the set of states of every component is
finite, often has only two elements. In spite of their conceptual simplicity, cellular automata demonstrate a plethora of interesting phenomena
and the title "programmable matter" coined by T. Toffoli and N. Margolus [32] is quite appropriate. In a cellular automaton with discrete time
and positive transition probabilities, any local event can happen with a
positive probability (which is not true for systems with continuous time
where only one change can occur at a time). If, in spite of this, a cellular
automaton is non-ergodic (and we shall present examples of this), it is a
convincing analog of phase transitions, which have many forms in the real
world (freezing, melting, evaporation, condensation etc.) and are among
the most important natural phenomena. We are especially interested in
non-trivial (that is, strictly between zero and one) critical values of parameters, when properties of the process on the opposite sides of a critical
value are qualitatively different, thus imitating natural phase transitions
where the structure of matter changes qualitatively when temperature
continuously changes across freezing or condensation points. We start
with some basic notions of percolation because it is strongly connected
with cellular automata.
The present book consists of ten chapters and almost every chapter contains at least one theorem relevant to our study. Other statements are
called lemmas. Some of them are proved; the other proofs are left to the
reader. Some results included here can be found in the surveys [36] and
[37]. Our list of references certainly does not include all the relevant studies; it includes only those publications to which we explicitly refer.
There are many solved and still more unsolved problems in this area. We
selected a few of them, which are closest to our notions, and placed them at
the end of every chapter along with a few exercises for those students who
want to get some hands-on experience of doing mathematics in this area.
Chapter 1
Undirected plane percolation
1.1. Percolation models.
We call a real function f of n real arguments monotonic if
x_1 ≤ x′_1 , . . . , x_n ≤ x′_n =⇒ f(x_1 , . . . , x_n ) ≤ f(x′_1 , . . . , x′_n ).
Let us call a variable Boolean if it has only two possible values 0 and 1 .
According to the general definition above, we call a Boolean function φ
of n Boolean arguments monotonic if
(v_1 ≤ v′_1 , . . . , v_n ≤ v′_n ) =⇒ φ(v_1 , . . . , v_n ) ≤ φ(v′_1 , . . . , v′_n ).
In this chapter a percolation model is a graph with a finite or countable
set of vertices and a finite or countable set of edges. For every edge there
are two vertices, called its ends. An edge may be directed or undirected.
If an edge is undirected, it is either open in both directions or closed in both
both directions. If an edge is directed, it is open or closed in one direction
and open or closed in the other direction independently.
1.2. Finite undirected percolation.
We say that a graph is drawn on a plane if the following conditions are
fulfilled:
• All vertices of the graph are represented by different points of that plane; every edge is represented by a self-avoiding arc, and the ends of this arc represent the vertices which this edge connects.
• An arc is a set {f (t) : 0 ≤ t ≤ 1} , where f is a continuous function
from [0, 1] to the plane and the relation t ↔ f (t) is one-to-one. The
set {f (t) : 0 < t < 1} is called the inner part of this arc. Elements of the
inner part are called inner points. No inner point of an edge can belong to
another edge. In fact we assume for simplicity that every arc representing
an edge is a broken line. Those values of t , in the vicinity of which this
function is not linear, are called angles.
• Every edge connects two vertices, which are called its endpoints. Inner points of an arc representing an edge have no common points with other edges. The endpoints of an arc may coincide with each other only when the edge has one and the same vertex at both ends. Throughout this
text plane graph means a graph drawn on a plane in the described way.
(Graphs which can be drawn on a plane in this way are called planar, but
we do not speak about them.)
Every plane graph divides the plane into parts which are called faces.
In this subsection we consider only finite graphs, so one of the faces is
unbounded, others are bounded. For short we may use "vertices" instead of "points representing them".
Every plane graph has a dual graph, which is also a plane graph. For any plane graph Γ and its dual plane graph Γ′ :
(a) There is a one-to-one correspondence between faces of Γ and vertices of Γ′ , such that every vertex of Γ′ belongs to the corresponding face of Γ .
(b) There is a one-to-one correspondence between edges of Γ and edges of Γ′ such that the inner parts of the two corresponding edges intersect at exactly one point, which is not an angle of either of them.
(c) There is a one-to-one correspondence between faces of Γ′ and vertices of Γ , such that every vertex of Γ belongs to the corresponding face of Γ′ .
Let us establish the following rule connecting the states of edges of a plane graph and its dual:

Given an undirected plane graph Γ and its dual Γ′ : an edge of Γ′ is open if and only if the corresponding edge of Γ is closed.   (1)
A path in a graph is a finite or infinite sequence
vertex-edge-vertex-edge-. . . ,
where every edge connects those two vertices between which it is placed
in this sequence. A contour is a path in which the first and last vertices
coincide. A path or contour is called open if all the edges included in it
are open. We say that there is percolation from a vertex A to a vertex
B if there is an open path from A to B .
Theorem 1.1. Let Γ be a plane graph and Γ′ its dual. Assume the rule (1). Let A, B be two different vertices of Γ . Then in Γ there is no open path connecting A and B if and only if in Γ′ there is an open contour separating point A from point B .
I cannot say who discovered this theorem; for a long time it was considered
"generally known" and remained unwritten. Now its proof is available in
[26].
Instead of proving this theorem, we illustrate it. The following diagram
shows a graph Γ whose vertices are represented by black circles and edges
are represented by double lines. Vertices of its dual are represented by
white circles and edges are represented by curves.
Figure 1.1
Figure 1.1 shows a finite graph Γ with four vertices A, B, C, D drawn on a plane and its dual graph Γ′ with three vertices X, Y, Z also drawn there. According to our theorem 1.1, there is no open path in Γ connecting A with B if and only if there is an open contour in Γ′ separating A from B .
Figure 1.2. Infinite undirected percolation.
The most interesting and important property of infinite percolation is the possibility of a phase transition, and we go straight to one case of it. Figure 1.2 represents an infinite graph, which we call "checkered paper". In mathematical terms, it is the graph whose set of vertices is Z² , the set of pairs (i, j) , where both i and j are integers. Any vertex (i, j) is connected by undirected edges with (i + 1, j), (i, j + 1), (i − 1, j), (i, j − 1) .
Let us imagine that every edge is a pipe which is either open or closed,
namely it is open with probability ε and closed with probability 1 − ε
independently of the other edges. The origin (0, 0) is the only source of
some liquid, which can pass along open edges, but not along closed ones.
There are no one-way edges: any edge is either open in both directions,
or closed in both directions. Vertices are always open. The vertices which
the liquid can reach are called wet, others are called dry. The source (0, 0)
is always wet by definition. If the edge from (0, 0) to (1, 0) is open, the
vertex (1, 0) is also wet. If, in addition, the edge from (1, 0) to (1, 1) is
also open, the vertex (1, 1) is also wet and so on.
Let us call a path a sequence "vertex-edge-vertex-edge-. . . " in which every edge connects the vertices between which it is placed. If a path is finite, it ends with a vertex and the number of edges in the sequence is called the length of the path. One finite path of length 14 is shown on figure 1.2. A
path is called self-avoiding if all the vertices in its sequence are different.
For example, the path shown on figure 1.2 is self-avoiding. A path is
called open if all its edges are open. A vertex is wet if and only if there
is an open path from the source (0, 0) to this vertex (or from this vertex
to the source, which means the same). We say that there is percolation
from (0, 0) to ∞ if the set of wet vertices is infinite. The most interesting
feature of this kind of percolation is existence of a non-trivial critical value,
which in the present case can be formulated as follows:
Theorem 1.2. Percolation from (0, 0) to ∞ on the checkered paper as
described above, has a critical value ε∗ strictly between zero and one, such
that:
(a) If ε < ε∗ , the probability of percolation from (0, 0) to ∞ is zero.
(b) If ε > ε∗ , the probability of percolation from (0, 0) to ∞ is positive.
In fact we shall prove that
(a’) if ε is small enough, the probability of percolation from (0, 0) to ∞
is zero and
(b’) if ε is large enough, the probability of percolation from (0, 0) to ∞ is
positive.
Items (a’) and (b’) are sufficient to prove our theorem. Indeed, we may
define ε∗ as the supremum of those values of ε , for which the probability
of percolation from (0, 0) to ∞ is zero. According to what will be proved,
the ε∗ defined in this way is strictly between zero and one.
In chapter 9 we shall prove that the probability of percolation in the finite
case is a non-decreasing function of ε . The same is true for percolation
from (0, 0) to ∞ , whence the critical value is unique. It remains to prove
(a’) and (b’). To prove this, we need the following lemma.
Lemma 1.1. For any wet vertex there is an open self-avoiding path from
(0, 0) to this vertex.
Proof. Let us denote by v the vertex in question. Since v is wet, there is
an open path from (0, 0) to v . If this path is self-avoiding, we are done.
Let us assume that this path is not self-avoiding. Then its sequence of
vertices
v_0 = (0, 0), v_1 , . . . , v_n = v

contains two identical vertices v_i = v_j , i < j . Let us exclude from this path all the vertices with numbers in the range i + 1, . . . , j , together with the intervening edges. Thus we obtain another open path from (0, 0) to v , which is shorter than the previous one. Let us repeat this operation, obtaining a sequence of open paths, each shorter than the previous one. We have to stop because the lengths of these paths are non-negative, and the path at which we stop is self-avoiding.
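This removal of loops is effectively an algorithm. The following Python sketch (the list-of-vertices representation of a path is our own choice, not from the text) performs the repeated cutting:

def make_self_avoiding(path):
    # Given a path as a list of vertices, repeatedly cut out the segment
    # between two occurrences of the same vertex. Every cut shortens the
    # path, so the procedure terminates, as in the proof of lemma 1.1.
    changed = True
    while changed:
        changed = False
        seen = {}
        for pos, v in enumerate(path):
            if v in seen:                            # v occurs at i < j
                path = path[:seen[v]] + path[pos:]   # drop vertices i+1..j
                changed = True
                break
            seen[v] = pos
    return path

# Example: a path on Z^2 that visits (1, 0) twice.
print(make_self_avoiding([(0, 0), (1, 0), (1, 1), (1, 0), (2, 0)]))
# [(0, 0), (1, 0), (2, 0)]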
Lemma 1.2. Percolation from (0, 0) to ∞ on the checkered paper is
equivalent to existence of an open self-avoiding infinite path starting at
(0, 0) .
In one direction it is evident: if such a path exists, the set of its vertices is
infinite and all of them are wet, so the set of wet vertices is also infinite.
Now let us prove lemma 1.2 in the opposite direction. We can encode a
path starting at (0, 0) by the sequence of directions of its edges as we
pass them. For example, the path shown on figure 1.2 can be encoded as
a sequence of directions east, north, west, north, west, south, west, south,
south, south, east, south, east, east.
Let the set of wet vertices be infinite. Let us call S the set of open
finite self-avoiding paths starting at (0, 0) . Since for any wet vertex there
is such a path leading there, S is infinite. Let us classify S into four
subsets depending on direction of the first edge in the path:
S = Seast ∪ Snorth ∪ Swest ∪ Ssouth .
Since the union of these four sets is infinite, at least one of them is also
infinite. Let it be Seast (the other cases are analogous). Then we classify
Seast into three classes according to direction of the second edge:
Seast = Seast,east ∪ Seast,north ∪ Seast,south .

Again, at least one of these subsets must be infinite. If it is, say, Seast,north , we classify it again:

Seast,north = Seast,north,east ∪ Seast,north,north ∪ Seast,north,west ,
and again at least one of these subsets must be infinite. Thus we continue
inductively. At the n -th step of our inductive argument we already have
a sequence of n directions such that the set of open self-avoiding finite
paths starting with these directions is infinite. Since we can continue this
argument infinitely, this sequence grows infinitely, thereby defining an
infinite open self-avoiding path starting at (0, 0) . Lemma 1.2 is proved.
Now let us prove the statement (a'). If there is an infinite open self-avoiding path starting at (0, 0) , then, by taking its first n steps, we obtain
a finite open self-avoiding path starting at (0, 0) , whose length is n . Let
us estimate the probability of its existence. For any self-avoiding path of
us estimate the probability of its existence. For any self-avoiding path of length n the probability that it is open is ε^n . The number of self-avoiding paths of length n starting at (0, 0) does not exceed 4 · 3^(n−1) . Thus the event "there is an open self-avoiding path of length n starting at (0, 0) " is a union of at most 4 · 3^(n−1) events, the probability of each being ε^n . Therefore the probability of this event does not exceed the sum of their probabilities, which is

4 · 3^(n−1) · ε^n = (4/3) · (3ε)^n .
If ε < 1/3 , this quantity tends to zero when n → ∞ . But the probability of percolation from (0, 0) to ∞ is not greater than this quantity.
Therefore the probability of percolation is zero for all ε < 1/3 .
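To get some hands-on feeling for theorem 1.2, one can sample the model on a finite box. The sketch below is ours, not part of the text: it declares a sample "percolating" if the wet cluster of the origin reaches the boundary of a (2n+1) × (2n+1) box, which is only a finite-volume stand-in for percolation to ∞ .

import random
from collections import deque

def reaches_boundary(n, eps, rng=random):
    # One sample of bond percolation on the box {-n..n}^2: every
    # undirected edge is open with probability eps, sampled lazily.
    open_edge = {}
    def is_open(a, b):
        key = (min(a, b), max(a, b))
        if key not in open_edge:
            open_edge[key] = rng.random() < eps
        return open_edge[key]
    wet, queue = {(0, 0)}, deque([(0, 0)])
    while queue:
        i, j = queue.popleft()
        if max(abs(i), abs(j)) == n:       # the wet cluster left the box
            return True
        for v in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
            if v not in wet and is_open((i, j), v):
                wet.add(v)
                queue.append(v)
    return False

for eps in (0.2, 0.4, 0.6, 0.8):
    freq = sum(reaches_boundary(30, eps) for _ in range(200)) / 200
    print(eps, freq)

For small ε the estimated frequency is near zero, and for ε close to one it is clearly positive, in agreement with the theorem.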
The proof of (b’) is more difficult. Which sets of closed edges make percolation impossible? Let us call such sets obstacles. It is better to speak
about minimal obstacles, that is, obstacles all of whose proper subsets are not obstacles. One minimal obstacle is shown on figure 1.3. Closed edges
are crossed and wet vertices are circled.
Figure 1.3.
One minimal obstacle, that is a minimal set of closed edges that makes
percolation impossible. Closed edges are crossed, wet vertices are circled.
You can see that closed edges on figure 1.3 form some kind of fence around
the origin. It is better to make the crossing bars longer, so that they form
a continuous contour around (0, 0) as shown on figure 1.4.
This observation can be turned into a rigorous statement. In the previous
section we defined dual graphs for finite plane graphs. In an analogous way we may define dual graphs for infinite plane graphs, including the checkered
paper. On figure 1.4 the dual graph is shown by dotted lines.
There is a one-to-one correspondence between edges of the two graphs,
namely every edge of the dual graph crosses exactly one edge of the original
graph and vice versa, and the relation between their being open is exactly
the same as (1).
Lemma 1.3. If the rule (1) is applied, there is no open path in the
checkered paper from (0, 0) to ∞ if and only if in the dual graph there is
an open contour surrounding (0, 0) .
Proof of this lemma is published in [26]. The method we use here is known as the Peierls contour method because R. Peierls used it first; he applied it to the Ising model.
Figure 1.4.
The crossing bars form a contour surrounding the origin.
Now let us prove assertion (b'). According to lemma 1.3, the probability that there is no percolation in checkered paper from (0, 0) to ∞ equals the probability of existence of an open contour surrounding (0, 0) in the dual graph. This probability does not exceed the sum over all contours surrounding (0, 0) of the probability that a given contour is open. Let
us estimate this sum. All contours have an even number of steps, so this number can be denoted 2n , where n ≥ 2 because the minimal contour has length 4. A contour having 2n steps is open with probability (1 − ε)^(2n) . Thus

Prob(no percolation from (0, 0) to ∞) ≤ Σ_{n=2}^∞ C_n (1 − ε)^(2n) ,
where C_n is the number of different contours having 2n steps and surrounding (0, 0) . It remains to estimate C_n . To determine a contour surrounding (0, 0) and having 2n steps, it is sufficient to:
i) Specify the i coordinate of the leftmost point of intersection of our contour with the positive half of the i axis. This coordinate equals k + 1/2 , where k is an integer between zero and n − 2 . (For example, k = 2 on figures 1.2 and 1.3.) Thus here we have n − 1 cases.
ii) Specify the directions of the 2n edges, starting from the edge which we hit in item i) and going counter-clockwise along the contour. The first edge's direction is north; every other edge's direction has at most three possible values; the last edge's direction is predetermined because the contour must return to its initial point. So the number of cases here does not exceed 3^(2n−2) .
Therefore C_n ≤ (n − 1) · 3^(2n−2) and the probability that there is no percolation from (0, 0) to ∞ does not exceed

Σ_{n=2}^∞ (n − 1) · 3^(2n−2) · (1 − ε)^(2n) .
For (1 − ε) small enough this sum is less than one and this is what we
need. In fact, this sum equals

( x² / (3(1 − x²)) )² ,  where x = 3(1 − ε) ,

and it is less than one if

ε > 1 − 1/(2√3) ≈ 0.71.
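The closed form is easy to check numerically; the following small sketch (our own verification, not from the text) compares partial sums with the formula:

def contour_bound(eps, nmax=2000):
    # Partial sum of (n-1) * 3^(2n-2) * (1-eps)^(2n), rewritten as
    # (n-1) * x^(2n) / 9 with x = 3(1-eps) to stay within float range.
    x = 3 * (1 - eps)
    return sum((n - 1) * x ** (2 * n) / 9 for n in range(2, nmax + 1))

def closed_form(eps):
    x = 3 * (1 - eps)
    return (x * x / (3 * (1 - x * x))) ** 2

for eps in (0.72, 0.75, 0.8, 0.9):
    print(eps, contour_bound(eps), closed_form(eps))

Note that the series only converges for ε > 2/3, and the bound drops below one just past ε = 1 − 1/(2√3).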
Thus the probability of percolation on checkered paper is zero if ε is small
enough and positive if ε is large enough. We can define the critical value
ε∗ as the supremum of those values of ε for which the probability of
percolation is zero. Then
0 < 1/3 ≤ ε∗ ≤ 1 − 1/(2√3) < 1.
In chapter 9 we shall prove an important property of percolation: the
probability of percolation is a non-decreasing function of ε in all the
cases considered in this chapter. Hence in every case there is only one
critical value ε∗ such that probability of percolation is zero for ε < ε∗
and positive for ε > ε∗ . If thew critical value ε∗ equals 0 or 1, we call it
trivial; if it is in (0, 1) , we call it non-trivial.
Thus we have proved our main statement: existence of a non-trivial critical
value, which we define as the supremum of those values of ε for which
the probability of percolation is zero.
12
Of course, our estimates of the critical value are very rough and can be improved. In fact, in the present case the critical value is known exactly: it equals 1/2 . However, this is an exception connected with the fact that the dual graph of checkered paper is isomorphic to it. To prove this exact value is much more difficult. The proof was first obtained by Kesten and published in his book [16]. You can find a later version of this proof in [13]. Generally, dual graphs are not isomorphic to the original graphs and the critical values are difficult even to estimate.
Even computability of some of these critical values was brought to public
attention only recently [14].
We formulated lemma 1.3 only for the checkered paper, but in fact
it can be formulated in more general terms: for any periodic plane
graph. A plane graph is called periodic if it has a 2-dimensional group of
automorphisms.
Let us assume that (0, 0) is the only source of liquid. As before, a vertex is wet if the liquid can reach it, and percolation from (0, 0) to ∞ means
that the set of wet vertices is infinite.
Lemma 1.4. (A general version of Lemma 1.3.) If rule (1) is applied,
there is no percolation from (0, 0) to ∞ in a plane periodic graph, all of
whose faces are bounded, if and only if in its dual graph there is an open
contour surrounding (0, 0) .
You can find the proof in [26]. One of the graphs to which this statement applies, allowing one to prove existence of a critical value, is shown on figure 1.5.
Exercise 1.1. Prove existence of a critical value for the triangular lattice
shown on figure 1.5.
Exercise 1.2. Let us consider one-dimensional percolation, where the set
of vertices is Z and two vertices x and y are connected with an edge
if |x − y| ≤ 100 . As before, suppose that 0 is the only source of liquid,
every edge is open with a probability ε independently of the other edges and
percolation means that the set of wet vertices is infinite. Prove that in this
case the probability of percolation is zero for all ε < 1 . As before, we can
define ε∗ as the supremum of those values of ε for which the probability
of percolation is zero, but now this supremum is trivial: ε∗ = 1 . This
suggests that there is qualitative difference between one-dimensional and
multi-dimensional cases in percolation.
Figure 1.5. Continuous lines show a triangular lattice and dotted lines
show the dual graph. In this case the dual graph is a hexagonal lattice.
Exercise 1.3. Let us take the infinite graph used in theorem 1.2 and take its subgraph, keeping only those vertices (i, j) for which i ≤ j and j ≥ 0
and only those edges, both ends of which are kept. As before, every edge is
open with a probability ε independently of other edges. Let us denote by Π
the probability of percolation from (0, 0) to ∞ . Prove that Π undergoes
a phase transition as ε grows from 0 to 1.
Chapter 2
Directed plane percolation: finite and infinite.
Another kind of percolation, which is still more important in the present
course, is directed percolation. In this case an edge may be closed in one
direction and open in the other direction. Figure 2.1 shows a directed version of checkered paper.
Figure 2.1 is a directed version of checkered paper. In this case a path can
contain only east or north directed steps. One path starting at (0, 0) is
shown by arrows.
Suppose that every edge in figure 2.1 is open in the direction of the arrow with probability ε and closed with probability 1 − ε , independently of the others, and always closed in the opposite direction. As before, we call a path open if all its steps are open (in the direction in which we use them), and a vertex is wet if there is an open path from (0, 0) to this vertex. Percolation means, as before, that the set of wet vertices is infinite or, which is equivalent, that there is an infinite open path starting at (0, 0) .
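Because all steps go east or north, the wet set of one sample can be computed in a single sweep in increasing order of coordinates. Below is a minimal Python sketch on the finite square {0, . . . , n}² (the truncation and all names are our own choices):

import random

def wet_directed(n, eps, rng=random):
    # east[i][j] (north[i][j]) is True if the edge from (i, j) to
    # (i+1, j) (to (i, j+1)) is open; each is open with probability eps.
    east  = [[rng.random() < eps for _ in range(n + 1)] for _ in range(n + 1)]
    north = [[rng.random() < eps for _ in range(n + 1)] for _ in range(n + 1)]
    wet = [[False] * (n + 1) for _ in range(n + 1)]
    wet[0][0] = True                       # the source
    for i in range(n + 1):
        for j in range(n + 1):
            if i > 0 and wet[i - 1][j] and east[i - 1][j]:
                wet[i][j] = True
            if j > 0 and wet[i][j - 1] and north[i][j - 1]:
                wet[i][j] = True
    return wet

w = wet_directed(50, 0.8)
print("liquid reached the opposite corner:", w[50][50])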
Theorem 2.1. Percolation from (0, 0) to ∞ on the directed checkered
paper has a critical value ε∗ strictly between zero and one such that:
(a) If ε < ε∗ , the probability of percolation is zero.
(b) If ε > ε∗ , the probability of percolation is positive.
As before, it is sufficient to prove that
(a’) the probability of percolation is zero for ε small enough and
(b’) the probability of percolation is positive for ε large enough.
The assertion (a’) is easy to prove. However, when proving the assertion
(b’) we meet a new difficulty: the minimal obstacles are more complicated.
Figure 2.2.
One minimal obstacle on the directed checkered paper. Wet vertices are circled. This obstacle can also be presented as a contour surrounding (0, 0) ; see the next figure.
First of all let us formulate how the states of edges of the dual graph depend on the states of edges of the original graph. We must choose one of the two orientations of the plane, and we choose the counterclockwise one. This choice is arbitrary, but it must be made. This is a kind of chirality, similar to the choice which the traffic authorities of any country must make: all vehicles must drive on one and the same side, right or left. A similar choice must be made in electromagnetism, in the screw industry and in other circumstances. Thus to every direction of an edge of the original graph there corresponds a direction of the corresponding edge of the dual graph: the direction from the right hand to the left hand as we advance straight ahead along the original edge in the chosen direction. So we adopt the following rule:
Every edge of the dual graph is open in a certain direction if and only if the corresponding edge of the original graph is closed in the corresponding direction.   (2)
Lemma 2.1. If rule (2) is applied, there is no percolation in a directed graph on a plane from (0, 0) to ∞ if and only if in the dual graph there is an open contour going around the source (0, 0) in the positive direction, that is counterclockwise.

Proof of lemma 2.1 is available in [26]. It is illustrated by figure 2.3, where one contour is shown.
Figure 2.3.
Contour corresponding to the obstacle shown on figure 2.2. Here i = 2 and j = 1 (the coordinates of its beginning and end).
Let us use lemma 2.1 to prove the assertion (b'). As before, the probability that there is no percolation does not exceed

Σ_{ω∈Ω} α^|ω| ,   (3)

where α = 1 − ε , Ω is the set of all minimal obstacles and |ω| is the cardinality of ω . It follows from lemma 2.1 that to every minimal obstacle there corresponds a contour in the dual graph, like the one shown on figure 2.3.
The original graph cuts the plane into infinitely many square faces and
one unbounded face. Every contour in the dual graph starts and ends in
one and the same face, namely in the unbounded one. Let us denote by
i the horizontal coordinate of its beginning and by j the vertical coordinate of its end. For the contour shown on figure 2.3, i = 2 and j = 1 ; generally i and j take any positive integer values. Also let us denote by e, n, w, s the numbers of east, north, west, south steps in a contour. Notice that w = i + e and n = j + s . The number of edges in the corresponding obstacle is n + w = i + j + e + s . The total number of steps in the contour is e + n + w + s = i + j + 2e + 2s . The directions of the first and last steps are determined uniquely and the directions of all the others are chosen from at most three options, so the number of contours with given e, n, w, s does not exceed 3^(e+n+w+s−2) = 3^(i+j+2e+2s−2) . The table below shows the probabilities of the original edges being open in each direction and the resulting probabilities for the dual edges under rule (2).
Original graph                        Dual graph
east:  open with prob. ε              north: open with prob. α = 1 − ε
north: open with prob. ε              west:  open with prob. α = 1 − ε
west:  always closed                  south: always open
south: always closed                  east:  always open

Figure 2.4.
Therefore the probability of an obstacle is α^(n+w) = α^(i+j+e+s) , whence (3) does not exceed

Σ_{i=1}^∞ Σ_{j=1}^∞ Σ_{e=0}^∞ Σ_{s=0}^∞ 3^(i+j+2e+2s−2) α^(i+j+e+s)
= (1/9) · Σ_{i=1}^∞ (3α)^i · Σ_{j=1}^∞ (3α)^j · Σ_{e=0}^∞ (9α)^e · Σ_{s=0}^∞ (9α)^s
= (1/9) · (3α/(1 − 3α)) · (3α/(1 − 3α)) · (1/(1 − 9α)) · (1/(1 − 9α))
= ( α / ((1 − 3α)(1 − 9α)) )² .
If α is small enough, e.g. smaller than 0.09 , this expression is less than 1. Thus we have proved that the probability of percolation is positive as soon as ε > 1 − 0.09 = 0.91 . This is only an estimate: the critical value is less than this, and its exact value is unknown. It is not even proved yet that it is computable. Using the method of [14], it is probably possible to prove that this critical value is computable.
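The last expression is easy to evaluate; a small numerical sketch (ours) showing where the bound drops below one:

def directed_bound(alpha):
    # The closed form (alpha / ((1 - 3 alpha)(1 - 9 alpha)))^2 of the
    # contour sum; meaningful only while 9 * alpha < 1.
    return (alpha / ((1 - 3 * alpha) * (1 - 9 * alpha))) ** 2

for a in (0.05, 0.09, 0.10, 0.11):
    print(a, directed_bound(a))
# At alpha = 0.09 the bound is below one, as used in the text.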
Exercise 2.1. For every natural n let us consider a finite directed graph Γ_n which is a square piece of the graph in theorem 2.1. In other words, its set of vertices is {0, . . . , n}² and from every vertex (i, j) there go two
directed edges: to (i + 1, j) and to (i, j + 1) . Every edge is open in
this direction with a probability ε independently of the other edges and
always closed in the opposite direction. Let us denote by Π the probability
of percolation from (0, 0) to (n, n) . Prove that Π undergoes a phase
transition as n → ∞ .
Chapter 3.
Stavskaya process
You may wonder why we paid so much attention to percolation if the
title of this course promised to speak about cellular automata. The answer
is that among the many applications of percolation, the study of cellular
automata plays an essential part.
So let us go to processes. Every process considered here is based on
a set denoted by G and called the ground space. (Most often G = Z^d .)
Elements of G are called points; we may imagine them as components of
a multicomponent system.
Let us present our first example of a cellular automaton called Stavskaya
process; Olga Stavskaya, advised by I. I. Piatetski-Shapiro, was the first person who wrote a computer program simulating the process named after her and experimentally showed existence of a phase transition in that process. Our notation is somewhat different from that of the article [31],
which initiated our studies of probabilistic cellular automata at Moscow
University, partially summed up in [36, 37]. In the present case the ground
space is G = Z .
Dealing with PCA, we shall use an operator which we call random noise and denote Rαβ , where α and β are parameters with values in [0, 1] . The operator Rαβ substitutes:

every 0 by 1 with probability α and
every 1 by 0 with probability β,   (4)

doing this to all components simultaneously and independently. If one of these parameters equals zero, it may be omitted: Rα means Rαβ with β = 0 , and Rβ means Rαβ with α = 0 .
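For later use, here is a vectorized sketch of the random noise operator on a finite window of a configuration (the NumPy representation is our own choice, not from the text):

import numpy as np

def random_noise(x, alpha, beta, rng=None):
    # The operator R_{alpha beta} of (4): independently at each component,
    # a 0 becomes 1 with probability alpha and a 1 becomes 0 with
    # probability beta.
    rng = rng or np.random.default_rng()
    x = np.asarray(x)
    u = rng.random(x.shape)
    flip_up = (x == 0) & (u < alpha)       # 0 -> 1
    flip_down = (x == 1) & (u < beta)      # 1 -> 0
    return np.where(flip_up, 1, np.where(flip_down, 0, x))

print(random_noise([0, 0, 1, 1, 0, 1], alpha=0.3, beta=0.1))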
Dealing with PCA, we in every case choose a finite or countable ground
space G and a non-empty finite alphabet A . Its elements are called letters.
In the present case A = {0, 1} and the letters are 0 and 1 . We may
imagine that the state 1 means presence of a particle and the state 0 means
an empty site. The set A^G is called the configuration space and denoted by Ω . In the present case Ω = {0, 1}^Z . Its elements are bi-infinite sequences called configurations. Every configuration x is determined by its components x_i ∈ {0, 1} , where i ∈ Z .
We shall consider a sequence of probability measures enumerated by t = 0, 1, 2, . . . , which we call the "Stavskaya process". We assume that initially all
the components are zeros for sure and then, at every step of the discrete
time, two transformations occur, denoted by D (deterministic) and by
Rα (random), where α ∈ [0, 1] is a real parameter. The deterministic
transformation D , when applied to a configuration x turns it into a
configuration y defined as follows:
(
1
if xi = xi+1 = 1,
yi =
0
in all the other cases.
Speaking informally, every particle dies if it finds no protection from its right neighbor.
The transformation Rα is probabilistic: under its action, every component turns into the state 1 (present) with probability α , independently of what happens at other places.
We shall use pseudo-codes to formalize our ideas. We enumerate lines for
convenience of reference. The Stavskaya process with "all zeros" as the initial condition may be expressed by the following pseudo-code:
1  for all i ∈ Z do simultaneously
2      x(i, 0) ← 0
3  for t = 1 to ∞ do
4      for all i ∈ Z do simultaneously
5          x(i, t) ← min(x(i, t − 1), x(i + 1, t − 1))
6      for all i ∈ Z do simultaneously
7          if rnd < α then x(i, t) ← 1
The sign ← in lines 2 and 5 is the assignment operator; x ← a means that the variable x is assigned the value a . Thus lines 1-2 assign the initial configuration "all zeros". Lines 4-7 perform a time step: lines 4-5 correspond to the operator D and lines 6-7 correspond to the operator Rα . This pseudo-code uses a random number rnd , which is uniformly distributed between 0 and 1, newly generated every time it is called, and independent of all the previously generated random numbers.
Let us represent the same idea in mathematical terms. We assume that the process is induced by independent random variables y(i, t) , each of which equals 1 with probability α and 0 with probability 1 − α , by the map defined in the following inductive way:

Base of induction: x(i, 0) = 0 for all i ∈ Z .
Induction step: x(i, t) = max(y(i, t), min(x(i, t − 1), x(i + 1, t − 1))).

Figure 3.1 shows the triangle of points on which x(i, t) depends, for i = 0 and t = 3 . (The axis of time is slanted to make the scheme symmetric.)
Figure 3.1. The triangle of points (j, s) with 0 ≤ s ≤ 3 and 0 ≤ j ≤ 3 − s , on which the state of the point 0 at time 3 depends.
Figure 3.2.
Figure 3.2 shows the part of the percolation model for the Stavskaya process which is relevant for the state of point 0 at time 3 . In this model: edges are always open upward and closed downward; vertices s0 , s1 , s2 , s3 are always open; other vertices are open with probability 1 − α and closed with probability α independently of each other. There is a zero at the point (0, 3) (point 0 at time 3) if and only if there is an open path in this graph from one of the sources si to the target T .
Lemma 3.1. In the Stavskaya process starting from ”all zeros” there is a
zero at a point (v, t) if and only if there is an open path from some initial
vertex to the vertex (v, t) in the percolation graph.
Notice that the state of a point i at time t depends only on what happens
in the triangle
{(j, s) : i ≤ j ≤ i + t − s} .
However, it is better to stretch every vertex, thus turning vertex percolation into bond percolation as shown on figure 3.3.
Figure 3.3.
Stavskaya process as percolation. Presence of a zero at the point (0, 3) amounts to percolation in the graph shown by continuous lines from the source S to the target T. Dotted lines show the dual graph. The sides AB and BC correspond to one face.
Let us imagine that the four vertices denoted s0 , s1 , s2 , s3 on figure 3.2 are sources of liquid and that the arrows are directed pipes which can transmit this liquid upward, but not back. The inclined arrows are always open, but the vertical arrows may be open or closed because they imitate our random operator: each of them is closed with probability α and open with probability 1 − α . Then the probability that there is a zero at point 0 at time 3 in our process equals the probability that there is an open path from at least one of the sources s0 , s1 , s2 , s3 to the target T . Thus we have reduced a problem about our random process to a problem about percolation.
However, it is better to have only one source. For this reason we introduce a special vertex S and connect it by edges with s0 , s1 , s2 , s3 . It is convenient to assume that these edges are always open in both directions; then the dual edges will be always closed in both directions and we don't even need to draw them. For the same reason it is convenient to assume that the vertical edges of the original graph are always open downward; this does not create any unwanted opportunities for percolation because the slanted edges are always closed downward. Then, according to rule (2), the edges of the dual graph (shown by dotted lines) are open as follows: the slanted edges are always open in their directions and always closed in the opposite directions; the horizontal edges directed → are open in this direction with probability 1 − α and always closed in the opposite direction.
First let us prove that α∗ ≤ 1/2 . Indeed, the probability of percolation from the t = 0 level to the point (0, t) is

P(percolation) = P(∃ open path) ≤ Σ_path P(path is open) ≤ Σ_path (1 − α)^t = (2(1 − α))^t .

If α > 1/2 , then the last expression tends to zero as t → ∞ .
Now let us prove that α∗ > 0 . With this purpose we concentrate our attention on the dual graph shown on figure 3.3. According to lemma 2.1, there is no percolation in the original graph if and only if there is an open contour in this graph surrounding T and going in the positive (counterclockwise) direction. We may assume that every contour starts and ends at the topmost point B . The probability that there is such a contour does not exceed

Σ_{k=1}^∞ C_k α^k ,   (5)

where C_k is the number of such contours corresponding to obstacles with k elements, that is, having k horizontal steps. Every contour has equal numbers of steps in each of the three directions, so altogether it has at most 3k steps. Since every step of a contour has only three possible directions, C_k ≤ 3^(3k) , and therefore the probability that there is a one at site 0 at time t does not exceed

Σ_{k=1}^∞ 3^(3k) · α^k = 27α / (1 − 27α) ,   (6)
which is less than one as soon as α < 1/54 . Thus, whenever α < 1/54 , zeros do not die out because their density does not tend to zero. We have proved that 1/54 ≤ α∗ ≤ 1/2 for the Stavskaya process.
Figure 3.4.
The dual graph for the Stavskaya process. Thick continuous arrows show one contour surrounding T . Existence of an open contour surrounding T in the positive direction amounts to presence of a one at the point 0 at time 3 in the process.
Stavskaya process with a finite space
The Stavskaya process has been simulated on a computer using the Monte Carlo method. But a computer cannot deal with infinity! So in fact this simulation was performed with a finite space. To bring our pseudo-code closer to our simulation we need a finite space Zm , the set of remainders modulo m , where m is an arbitrary natural number.
The following pseudo-code shows how to do it, with the lines enumerated for reference:

1  for all i ∈ Zm do simultaneously
2      x(i, 0) ← 0
3  for t = 1 to tmax do
4      for all i ∈ Zm do simultaneously
5          x(i, t) ← min(x(i, t − 1), x(i + 1, t − 1))
6      for all i ∈ Zm do simultaneously
7          if rnd < α then x(i, t) ← 1
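This pseudo-code translates directly into Python. The sketch below (the array representation, the stopping rule and tmax are our own choices) also measures the time T discussed next:

import numpy as np

def stavskaya_time_to_all_ones(m, alpha, tmax=10**5, rng=None):
    # Simulate the Stavskaya process on Z_m from "all zeros" and return
    # the first time at which the configuration becomes "all ones"
    # (None if tmax is reached first). np.roll(x, -1) is x_{i+1 mod m}.
    rng = rng or np.random.default_rng()
    x = np.zeros(m, dtype=np.int8)              # lines 1-2: all zeros
    for t in range(1, tmax + 1):
        x = np.minimum(x, np.roll(x, -1))       # lines 4-5: operator D
        x[rng.random(m) < alpha] = 1            # lines 6-7: operator R_alpha
        if x.all():
            return t
    return None

print(stavskaya_time_to_all_ones(m=50, alpha=0.9))   # typically a few steps
print(stavskaya_time_to_all_ones(m=50, alpha=0.01))  # typically None: the
# expected time is astronomically large for small alpha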
Operations with residues modulo m result in residues modulo m ; in particular, (m − 1) + 1 = 0 . It is easy to prove that for any α > 0 and any finite space the zeros die out in a finite mean time: it is sufficient to notice that all the zeros may turn into ones at once, with probability at least α^m . Hence the expectation of the time when all zeros die out is not greater than α^(−m) .
Why was this not observed in computer experiments? Notice that the number α^(−m) is enormous even for quite moderate values of α and m , much greater than the time which we can afford in an experiment. What we observe in the experiment is not the distinction between ergodicity and non-ergodicity, which takes place in infinite processes, but the distinction between fast and slow convergence, which takes place in finite processes. This distinction can be formulated as follows.
Theorem 3.1. Let us take the Stavskaya process with Zm as the ground space and with initial condition "all zeros". As before, α means the birth rate. Then:
(a) If α > 3/4 , the mathematical expectation of the time when all components turn into ones grows at most as a logarithm of m when m → ∞ .
(b) If α < 1/54 , the mathematical expectation of the time when all components turn into ones grows at least as an exponent of m when m → ∞ .
Our process starts with "all zeros". Let us estimate the expectation of the time T when it reaches "all ones". Since T takes only natural values,

E(T) = P(T ≥ 1) + P(T ≥ 2) + P(T ≥ 3) + . . .   (7)
We may assume that m > 1 and shall consider two cases.
Case 1. Let α ≥ 3/4 .
As in the infinite case, we use percolation on the appropriate graph. The event T > t , that is, "all ones" is not yet reached at time t , takes place if and only if there is an open path from time 0 to time t . Every one of these paths is open with probability (1 − α)^t and there are m · 2^t such paths. So the probability that at least one of these paths is open does not exceed m · (2(1 − α))^t . But this probability is P(T > t) . Therefore

P(T > t) ≤ m · (2(1 − α))^t ≤ m · 2^(−t) .   (8)
Since T takes only natural values, we may rewrite the sum (7) as follows:

E(T) = P(T > 0) + P(T > 1) + P(T > 2) + . . .   (9)

Of course, each of these terms does not exceed 1, but we use this trivial fact only for those terms whose parameter t does not exceed log₂ m . The others, whose parameter t exceeds log₂ m , do not exceed the terms of a geometric progression whose first term does not exceed 1 and whose common ratio does not exceed 1/2, so their sum does not exceed 2. Altogether we get E(T) ≤ 2 + log₂ m . Thus for large enough values of α the expectation E(T) grows at most as a logarithm of m .
Case 2. Let α ≤ 1/54 .

We reach the configuration "all ones" not earlier than at time t if and only if there are i, j ∈ Zm such that there is an open path from (i, 0) to (j, t) . Connecting all the vertices (i, 0) to one pole A and all the vertices (j, t) to another pole B , we get percolation on a finite plane graph. Let us denote this graph by Γ and its dual by Γ′ . There is no percolation from any (i, 0) to any (j, t) if and only if there is no percolation from A to B . This is equivalent to existence of an open contour in the dual graph which separates A from B . Any contour of this sort may be described as follows: we start at the point (0, 0) and let t grow until we meet an edge of Γ′ belonging to this contour; then we move along the contour in the direction in which it surrounds A in the counter-clockwise orientation. The length of this contour is not less than m , and at every step we have to choose among at most three directions. Denoting by k the number of horizontal steps in such a contour, we get

P(T < t) ≤ t · Σ_{k=m}^∞ 3^(3k) · α^k ≤ 2t · (27α)^m

(the last inequality because 27α ≤ 1/2 ). Thus

∀ t : P(T ≥ t) ≥ 1 − 2t · (27α)^m .

Any of these terms is not less than 1/2 if

t ≤ 1 / (4 · (27α)^m).

Remembering that α ≤ 1/54 , we see that every natural t ≤ 2^m / 4 qualifies. Substituting 1/2 for each of these terms in (9) and discarding all the others, we get

E(T) ≥ 2^(m−3) .

Thus we have proved that for α small enough E(T) grows at least exponentially when m → ∞ .
Thus for finite Stavskaya processes there is also some kind of phase transition. We may denote by α_log the infimum of those α for which the mathematical expectation of the time when all zeros die out grows at most as a logarithm of m when m → ∞ . We may also denote by α_exp the supremum of those α for which this expectation grows as an exponent of m when m → ∞ . So 0 < α_exp ≤ α_log < 1 . We guess that α_exp = α_log , but cannot prove it.
Chaos approximation.

We have only very rough estimates of the critical value of the Stavskaya process. This is typical of probabilistic cellular automata and percolation. Considering these difficulties, it makes sense to examine approximations. One of them is called the "chaos approximation" or "mean-field approximation". Let us assume that every time the Stavskaya operator is applied, all components are randomly mixed. In this way we get a product measure after every time step, and one parameter x_t , the density of ones, is sufficient to describe what happens at all times. This parameter is determined by the conditions:

x_0 = 0,  x_{t+1} = α + (1 − α) x_t² .   (10)

Of course, the behavior of x_t is different from the behavior of the density of ones at time t in the Stavskaya process. However, their qualitative similarity is intriguing: in both cases there is a critical value of α . In physics such approximations of complex processes by simple iterations are widely used and called "mean-field approximations" because they can be interpreted as a substitution of individual particles and their interactions by some uniformly distributed mean field described by just one parameter, the density.
We iteratively define a sequence x_0 , x_1 , x_2 , . . . with initial value x_0 = 0 by the rule

∀ n : x_{n+1} = α + (1 − α) · x_n² ,   (11)

where α ∈ [0, 1] is a constant parameter. We are interested in the limit of x_n as n → ∞ . The diagram in figure 3.6 helps us to see that

lim_{n→∞} x_n = 1 if α ≥ 1/2,  and  lim_{n→∞} x_n = α/(1 − α) if α < 1/2.
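The iteration is easy to reproduce numerically; a small sketch (ours) confirms the two regimes:

def chaos_limit(alpha, tol=1e-12, tmax=10**6):
    # Iterate x_{t+1} = alpha + (1 - alpha) * x_t^2 from x_0 = 0 until
    # the increments become negligible; returns the numerical limit.
    x = 0.0
    for _ in range(tmax):
        x_new = alpha + (1 - alpha) * x * x
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

for a in (0.2, 0.4, 0.499, 0.6):
    expected = a / (1 - a) if a < 0.5 else 1.0
    print(a, chaos_limit(a), expected)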
Figure 3.5.
Figure 3.6.
In figure 3.6 the thick curve shows the limit lim_{n→∞} x_n as a function of α . The value α = 1/2 is critical. Thus we see that even such a simple approximation has a critical point; only its value, 1/2, is different from the critical value of the Stavskaya process.
Cayley tree and Bethe lattice.
We can present a graph on which the chaos approximation is exact. It is based on ideas of Bethe and Cayley, adapted to our kind of random processes. A part of this graph is shown on figure 3.7. The central point, marked with the black circle, represents the value of the zero-th component at a time t .
Figure 3.7.
Part of the Bethe lattice, where the chaos approximation for uniform operators with two neighbors is exact because any product measure turns into a product measure.
Exercise 3.1. Prove that the limit lim_{t→∞} x_t of the iterations (11) exists and study its behavior depending on the parameter α . For which values of α does this limit equal 1 and for which values of α is it less than 1?
Solved problem 3.1. Throughout this book the alphabet A is finite.
Let us make an exception just to make the reader aware of this possibility.
Let A = {0, 1, 2, 3, . . .} , G = Z and Ω = A^Z . The initial condition is δ_0 , concentrated in "all zeros". Let a deterministic operator D be defined as follows:

∀ x ∈ Ω, ∀ v ∈ G : (D x)_v = min(x_v , x_{v+1} ).
The random operator Rα increases every component by 1 with probability α independently of all the others. So we get a sequence of measures µ_t = (Rα D)^t δ_0 . Let us denote by E_t the mathematical expectation of the component at 0 with respect to µ_t . Let us say that this model grows if E_t tends to infinity as t → ∞ . Then there is a critical value α∗ ∈ (0, 1) such that this model grows for α > α∗ and does not grow for α < α∗ .
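A simulation sketch of this model (a finite window of Z instead of the infinite line, and one sample of the central component instead of the expectation E_t ; all these choices and names are ours):

import numpy as np

def growth_heights(alpha, width=2001, tmax=500, rng=None):
    # One sample of the model: (Dx)_v = min(x_v, x_{v+1}), then R_alpha
    # adds 1 to each component with probability alpha. The window loses
    # one site per step, so width must exceed tmax comfortably.
    rng = rng or np.random.default_rng()
    x = np.zeros(width, dtype=np.int64)
    heights = []
    for _ in range(tmax):
        x = np.minimum(x[:-1], x[1:])            # operator D
        x = x + (rng.random(x.size) < alpha)     # operator R_alpha
        heights.append(int(x[x.size // 2]))
    return heights

for a in (0.3, 0.8):
    print(a, growth_heights(a)[-1])   # the height after tmax steps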
Chapter 4
Deterministic cellular automata aka DCA
Throughout this chapter our alphabet A will have a special element written as 0 and called "zero". The configuration "all zeros", all of whose components are zeros, is assumed to be invariant under those deterministic operators which we consider and denote by D :

D("all zeros") = "all zeros".   (12)
You may imagine the configuration "all zeros" as an abstract analog of some uniform tissue or crystal. Imagine that there is a small defect in this tissue: a defect in a crystal, a small tumor in a healthy tissue, etc. We want to predict what will happen to this defect if we leave it alone: will it disappear, grow, or remain the same?
The area of processes with local interaction is very large and we try to
restrict it. One means for that is this. In every case we take a set G which we call the ground space. Elements of G are called points. In the most typical case G = Z^d . Also we choose a natural number n and n maps v_i : G → G .
Also we choose a non-empty finite set A called the alphabet and a transition function f : A^n → A . Given all this, we have a deterministic operator D : Ω → Ω defined as follows:

(D x)_i = f(x_{v_1(i)} , . . . , x_{v_n(i)} ).   (13)
For any p ∈ G and i ∈ {1, . . . , n} we call vi (p) the i -th neighbor of p .
Given a set G and maps vi : G → G , we call a map H : G → G an
automorphism if H is one-to-one and
∀ p ∈ G and i ∈ {1, . . . , n} : v_i(H(p)) = H(v_i(p)).
We call this interaction scheme uniform, or a USI, if for all p, q there is an automorphism H such that H(p) = q . All the random processes in this book are based on USIs. The most usual examples of USI, where d is called the dimension, are:
(a) G = Z^d and v_i(p) = p + v_i(0) for all i ∈ {1, . . . , n} ;
(b) G = Z^d_m and v_i(p) = p + v_i(0) (mod m) for all i .
What we call DCA are traditionally called Cellular Automata. DCA have been studied for several decades and you can find a lot of references to them on the internet. In particular, John von Neumann [25] used a concrete two-dimensional DCA with 29 letters in the alphabet, each component having 9 neighbors including itself, to design an abstract "animal" capable of self-reproduction. John Conway designed a concrete two-dimensional DCA with the same 9 neighbors and only two letters in the alphabet, which he called the "Game of Life". Studies of "Gardens of Eden" provided a contribution to the theory of DCA. Stephen Wolfram organized a lot of computer experiments with DCA. Some early results about undecidability in DCA are available in [15]. You can find some modern studies of DCA by searching the internet for Computational Mechanics, Complexity and Emergence. Our results are motivated by the study of ergodicity of probabilistic cellular automata.
Attractors and Eroders.
A configuration x is called a finite deviation, or f.d., of a configuration y if the set {v ∈ G : x_v ≠ y_v} is finite. (This notion is interesting only if G is infinite because, if G is finite, all configurations are finite deviations of each other.) We say that a configuration x is an attractor if D x = x and for any y which is a finite deviation of x there is a natural number t such that D^t y = x . The notion of attractor is a deterministic analog of the notion of ergodicity, which we shall discuss later.
Any map from A^G to A^G will be called a D-operator (where D means deterministic).

We call a configuration x invariant for a D-operator D if D x = x .

Given two configurations x, y ∈ A^G , we say that x attracts y under D if x is invariant for D and there is a natural number t such that D^t y = x .

We call a configuration x an attractor under a D-operator D if x is invariant under D and for any finite deviation y of x there is a natural number t such that D^t y = x .
A configuration is called an island if it is a finite deviation from "all zeros". Due to our assumptions D transforms any island into an island. ("All zeros" is an island because we claim that the empty set is finite.)
In applications zero often represents an empty unit of space while other
letters represent units occupied by various particles.
Let us call a deterministic operator D an eroder if, given it, the configuration "all zeros" is an attractor. Our first theorem is about undecidability of the problem of eroders. In the bulk of mathematics, the greater the class of objects which we study, the more valuable are our results. There is, however, one area where it is the other way round: proofs of undecidability. If we prove non-existence of an algorithm declaring "yes" or "no" for each object of a certain class, our result is the more valuable, the smaller this class is. We are going to declare undecidability of the problem of deciding which deterministic operators are eroders. To make this result as valuable as possible, we consider only DCA with the following additional conditions:

d = 1,   f(0, . . . , 0) = 0,   V(0) = {−1, 0, 1} .   (14)
Theorem 4.1. For DCA satisfying the additional conditions (14) the
problem of recognizing eroders vs. non-eroders is undecidable.
This theorem should not astonish us because DCA are a very rich class of
objects. We do not present a proof of this theorem here; it can be found
in [27]. To give you some taste of proofs of undecidability, we present a
proof of undecidability in chapter 9.
Now we go to theorem 4.2, which asserts decidability of the problem of recognizing eroders vs. non-eroders in some cases. The main difference between theorem 4.1 and theorem 4.2 is that in the latter case we assume that the function f is monotonic. This is what it means. Let us enumerate A to obtain A = {0, . . . , h} with the natural order on this set. We call a function f : A^V → A monotonic if

a_1 ≤ b_1 , . . . , a_W ≤ b_W =⇒ f(a_1 , . . . , a_W ) ≤ f(b_1 , . . . , b_W ).   (15)

        d=1       d=2    d=3    d=4
h=1   trivial     [35]   [35]   [35]
h=2    [10]       [30]
h=3    [10]       [24]
h=4    [10]

Figure 4.1.
Figure 4.1 illustrates our theorem 4.2: it shows that we know necessary
and sufficient criteria for eroders in the case h = 1 [35] and in the case
d = 1 [10]. Also we have some incomplete results about the case d = h = 2
[30, 24]. Monotonicity is essential in all these cases.
Eroders vs. non-eroders for the case h = 1 . Although the space of our operators is discrete, we now need to speak about the continuous real space R^d . A set in R^d is called convex if with any two points it contains the segment with the ends at these points. Given a set in the real space R^d , its convex hull is the intersection of all convex sets containing this set. The following figure gives two examples.
Figure 4.2. Examples of a set and its convex hull. ABCD is a non-convex quadrangle; the triangle ABD is its convex hull. The quadrangle EFGH is the convex hull of the finite set {E, F, G, H, I, J, K, L, M} .
We are mostly interested in convex hulls of finite sets. For example, the convex hull of a point is that point; the convex hull of two points is the segment with the ends at these points; the convex hull of three points is either the triangle with these points as vertices or the segment with two of these points as ends and the third point between them. If you drive nails into a board at several points and stretch a cord around them, the cord takes the shape of the boundary of the convex hull of these points.
Trying to find a solvable case, we restrict ourselves to the case A = {0, 1} and d = 2. More than that, let us consider only uniform operators. This means that we choose a finite list of vectors v1, . . . , vn ∈ Z^d and a Boolean function f : {0, 1}^n → {0, 1} and define D as follows:

( D x)v = f(xv+v1, . . . , xv+vn) for all v ∈ G = Z^d.
However, even after all these restrictions the problem of discerning eroders
remains algorithmically unsolvable. So we need a stronger assumption.
Here it is: we assume that D is monotonic, which means that the function f(·) is monotonic, that is

x1 ≤ y1, . . . , xn ≤ yn =⇒ f(x1, . . . , xn) ≤ f(y1, . . . , yn).

Let us denote by V the list v1, . . . , vn of neighbor vectors. Let us call a subset z of V a zero-set if

xv = 0 for all v ∈ z =⇒ f(xv1, . . . , xvn) = 0.
We call a zero-set minimal if none of its proper subsets is a zero-set. Since V is finite, the set of its subsets is also finite, whence the set of minimal zero-sets is also finite. On the other hand, our assumption that f(0, . . . , 0) = 0 implies that the set {v1, . . . , vn} is a zero-set, whence the family of zero-sets is not empty and we may denote the minimal ones by z1, . . . , zk.
Now let us embed Z^d into the real space R^d and denote by conv(S) the convex hull of any set S ⊂ R^d. We denote by σ the intersection of the convex hulls of all the minimal zero-sets:

σ = conv(z1) ∩ · · · ∩ conv(zk).   (16)
Theorem 4.2 [35]. D is an eroder if and only if σ is empty.
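Theorem 4.2 invites computation. The following Python sketch (a minimal illustration of the criterion, assuming d = 2 and using scipy.optimize.linprog for the convexity part; the function and neighborhood below are inputs of our choosing) enumerates the minimal zero-sets of a Boolean function by brute force and then decides whether σ is empty as a linear feasibility problem: σ is non-empty exactly when there exist a point p and convex weights placing p in every conv(zi).

    import itertools
    from scipy.optimize import linprog

    def minimal_zero_sets(V, f):
        """List the minimal zero-sets of f with neighborhood V: a subset z
        of V is a zero-set if f = 0 whenever all arguments at positions of
        z are 0, no matter what the remaining arguments are."""
        n = len(V)
        def is_zero_set(z):
            free = [i for i in range(n) if i not in z]
            for rest in itertools.product((0, 1), repeat=len(free)):
                x = [0] * n
                for i, bit in zip(free, rest):
                    x[i] = bit
                if f(*x) != 0:
                    return False
            return True
        zsets = [frozenset(c) for r in range(n + 1)
                 for c in itertools.combinations(range(n), r)
                 if is_zero_set(frozenset(c))]
        minimal = [z for z in zsets if not any(w < z for w in zsets)]
        return [[V[i] for i in sorted(z)] for z in minimal]

    def sigma_is_empty(zsets, d):
        """sigma = intersection of conv(z) over the minimal zero-sets z.
        Feasibility LP: find p (free) and weights lam >= 0, one group per
        zero-set, with sum(lam) = 1 and sum(lam_j * q_j) = p in each group."""
        sizes = [len(z) for z in zsets]
        nvar = d + sum(sizes)
        A_eq, b_eq, off = [], [], d
        for z in zsets:
            for k in range(d):
                row = [0.0] * nvar
                row[k] = -1.0
                for j, q in enumerate(z):
                    row[off + j] = float(q[k])
                A_eq.append(row); b_eq.append(0.0)
            row = [0.0] * nvar
            row[off:off + len(z)] = [1.0] * len(z)
            A_eq.append(row); b_eq.append(1.0)
            off += len(z)
        bounds = [(None, None)] * d + [(0, None)] * sum(sizes)
        res = linprog([0.0] * nvar, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        return not res.success

    # The NEC operator of example 4.3 below: three neighbors, median.
    V = [(0, 1), (1, 0), (0, 0)]
    maj3 = lambda a, b, c: 1 if a + b + c >= 2 else 0
    zs = minimal_zero_sets(V, maj3)
    print(zs)                       # the three two-element zero-sets
    print(sigma_is_empty(zs, d=2))  # True, so by theorem 4.2 it is an eroder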
Let us consider several examples.
We define the function median of any odd number of real arguments as follows: given an odd sample (x0, . . . , x2k), we first put it into non-decreasing order (y0 ≤ y1 ≤ · · · ≤ y2k−1 ≤ y2k), and then we define the median as the middle term:

median (x0, . . . , x2k) = yk.

What we call ”median” was called ”voting” in some articles [39].
Example 4.1. Let d = 1, V = {−k, . . . , k} and

( D x)(0) = median (x(−k), . . . , x(k)).

In this case a subset of V is a minimal zero-set if and only if it has k + 1 elements. Therefore σ consists of the single point zero, whence it is non-empty. Accordingly, D is not an eroder. It is sufficient to take the island x such that

x(p) = 1 if 0 ≤ p ≤ k, and x(p) = 0 at all the other points.
Example 4.2. Let d = 2. In this case we denote points, that is, elements of Z^2, by pairs (i, j), where i, j ∈ Z. Then we choose the following neighborhood:

V = {(0, 0), (1, 0), (−1, 0), (0, 1), (0, −1)}

and again take the median over it. In this case σ also consists of the single point zero and therefore is non-empty. Accordingly, D is not an eroder; as an island x which D does not erode, we may take

x(i, j) = 1 if 0 ≤ i ≤ 1 and 0 ≤ j ≤ 1, and x(i, j) = 0 at all the other points.
Example 4.3. The NEC operator, where NEC stands for North-East-Center. In this case the deterministic operator D NEC turns any x ∈ Ω into D x defined as follows:

( D x)i,j = median (xi,j+1, xi+1,j, xi,j) for all i, j.   (17)

For D NEC there are three minimal zero-sets:

{(0, 1), (1, 0)},   {(0, 1), (0, 0)},   {(1, 0), (0, 0)}.

Their convex hulls are segments with these points as ends:

[(0, 1), (1, 0)],   [(0, 1), (0, 0)],   [(0, 0), (1, 0)].

The intersection of these three segments is empty. Thus the set σ for D NEC is empty and the operator is an eroder. You may check this by looking at figure 3.5. In fact, both ”all zeros” and ”all ones” are attractors under this operator. To prove this, it is sufficient to notice that this function is 0-1 symmetric, which means that if all its arguments change their values, the value of the function also changes.
Example 4.4 shows that σ may be non-empty, yet contain no integer point:

( D x)(0, 0) = min ( max(x(0, 0), x(1, 1)), max(x(0, 1), x(1, 0)) ).

This is because D has two minimal zero-sets, namely {(0, 0), (1, 1)} and {(0, 1), (1, 0)}. Their convex hulls are the diagonals of the square with vertices (0, 0), (1, 0), (1, 1), (0, 1). Therefore in this case σ consists of the single point (1/2, 1/2), the center of this square.
Example 4.5, flattening, for which

( D flat x)i,j = min ( max(xi,j, xi,j+1), max(xi+1,j, xi+1,j+1) ).   (18)

This formula already has the min-max form. Both its σ and its analog (with 0 and 1 trading places) are empty. So both ”all zeros” and ”all ones” are attractors under D flat. You can figure this out by looking at figures 4.3 and 4.4.
Theorem 4.3. (A version of Helly’s theorem.) If there is a finite
family of convex sets in Rd such that every d + 1 of them have a common
point, then all the sets in the family have a common point.
We do not need this theorem in full generality; it is sufficient to prove it for the case when all these convex sets are closed half-planes.
We leave it to the reader to prove the following special case of Helly's theorem for d = 1: if there are several closed half-lines in a line, every two of
which have a non-empty intersection, then all of them have a non-empty
intersection. (A closed half-line is one of the two halves into which a
line is cut by a point, including this point.) Also we need the following
statement: if there are two closed convex sets in a plane, which do not
intersect, then there is a line in this plane, which separates them, so that
these sets are on different sides of this line. For our case, when all the sets
in question are intersections of several closed half-planes, this statement
is evident and we also leave its proof to the reader.
Now let us prove by contradiction that if there is a finite family of closed
half-planes in a plane such that every three of them have a common point,
then all of them have a common point.
Let n be the smallest number of closed half-planes in a counter-example,
and let C1 , . . . , Cn be closed half-planes in a plane whose intersection is
empty although every three of them have a non-empty intersection. Since
n is minimal with this property, the intersection I = C1 ∩ · · · ∩ Cn−1 is
non-empty. However, I has no common points with Cn . Then there is a
straight line x in the plane, which separates them, so that I and Cn are
on different sides of it.
Now for all i = 1, . . . , n − 1 we denote by Di the intersection of Ci with this line x. Let us prove that every two of the sets D1, . . . , Dn−1 have a common point. Let us take some sets Ci and Cj. We know that Ci, Cj and Cn have a common point. Therefore the intersection Ci ∩ Cj has a point on one side of x and a point on the other side. Hence, since Ci ∩ Cj is convex, it has a common point with x, and this point belongs to Di ∩ Dj, which therefore is non-empty.
Now we can apply Helly’s theorem for the one-dimensional case to the
sets D1 , . . . , Dn−1 because all of them are closed half-lines (unless they
are empty or coincide with this line, which is easy to handle). Since we
have proved that every two of them have a common point, all of them
must have a common point. Let us call this point y . But every Di is a
subset of Ci , whence y belongs to all C1 , . . . , Cn−1 , whence it belongs to
their intersection I . So the line x has a common point y with I , which
contradicts our choice of the line x. This contradiction proves Helly's theorem in the case in which we need it.
Lemma 4.1. If σ is empty, then there are at most three zero-half-planes whose intersection is empty. (By a zero-half-plane we mean a closed half-plane, that is, one of the two halves into which a line cuts the plane, the line included, which contains a zero-set.)

Proof. Every convex hull of a zero-set can be represented as an intersection of several closed half-planes, all of which are zero-half-planes. Therefore σ can be represented as an intersection of several zero-half-planes. Since σ is empty, Helly's theorem implies that some three (or fewer) of them already have an empty intersection. Lemma 4.1 is proved.
Lemma 4.2. If there are at most three zero-half-planes whose intersection is empty, then D is an eroder.

What is important here is the minimal number of zero-half-planes whose intersection is empty. In the plane this number is either 2 or 3, and these cases should be considered separately. To get the idea in each case it is sufficient to examine our two examples: flattening for the former case and D NEC for the latter.
Figure 4.3. Why flattening is an eroder, that is, why ”all zeros” is an attractor under flattening: the set of points in the state 1 is between two vertical lines until it disappears. The left line does not move; the right line moves in the direction of the arrow as time goes on.
Figure 4.4. Why ”all ones” is an attractor under flattening: the set of points in the state 0 is between two lines until it disappears. The lower line does not move; the upper line moves in the direction of the arrow as time goes on.
For flattening there are two zero-half-planes whose intersection is empty:

{(i, j) : i ≤ 0}   and   {(i, j) : i ≥ 1}.

Based on this, let us show that flattening is an eroder. Given a configuration x with a finite I1(x), let us draw two vertical lines such that this set is between them. Then I1( D^t x) is also between two vertical lines: one of these lines remains the same and the other moves towards it as t grows. Thus at every time step the distance between these lines decreases, and the configuration becomes ”all zeros” in a time equal to the initial distance between the lines.
Figure 4.5. Why D NEC is an eroder: the set of points in the state 1 is in the triangle formed by three lines until it disappears. The horizontal and vertical lines do not move; the inclined line moves in the direction of the arrow as time goes on. Due to its 0-1 symmetry, ”all ones” is also an attractor under D NEC.
For D NEC there are three zero-half-planes whose intersection is empty:

{(i, j) : i ≤ 0},   {(i, j) : j ≤ 0}   and   {(i, j) : i + j ≥ 1}.

Based on this, let us show that D NEC is an eroder. For any configuration x with a finite set I1(x) we draw three lines so as to place this set into a triangle surrounded by these lines: one horizontal, one vertical and one with slope −1. After every application of D NEC the set I1( D NEC^t x) remains between the lines, the horizontal and vertical lines remaining the same, but the line with slope −1 moving towards their intersection. So the set I1( D NEC^t x) shrinks to the empty set and the configuration becomes ”all zeros” in a time proportional to its initial size. Using these ideas you can prove lemma 4.2.
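The shrinking triangle can also be watched numerically. Here is a small Python sketch (our own illustration; it keeps the configuration in a finite array, which is harmless because, as one can check, D NEC never creates ones outside the bounding box of the island) that iterates D NEC on a square island and reports the erosion time, which grows linearly with the size of the island.

    import numpy as np

    def step_nec(x):
        """One application of D NEC: new x[i,j] is the majority of
        x[i,j+1], x[i+1,j] and x[i,j], with zeros beyond the border."""
        padded = np.pad(x, ((0, 1), (0, 1)))   # one extra row and column of zeros
        north = padded[:-1, 1:]                # x[i, j+1]
        east = padded[1:, :-1]                 # x[i+1, j]
        here = padded[:-1, :-1]                # x[i, j]
        return ((north + east + here) >= 2).astype(int)

    def erosion_time(island, max_t=10_000):
        """Number of steps until the island becomes "all zeros"."""
        x = island.copy()
        for t in range(1, max_t + 1):
            x = step_nec(x)
            if not x.any():
                return t
        return None

    for k in (2, 4, 8, 16):
        print(k, erosion_time(np.ones((k, k), dtype=int)))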
Lemma 4.3. If σ is not empty, then D is not an eroder.
We shall prove this for the case when σ contains the origin. In this case
we shall present a configuration x , which is a finite deviation from ”all
zeros”, such that I1( D x) ⊇ I1(x). By monotonicity this implies that I1( D^t x) is non-empty for all t. Any set S + v = {s + v : s ∈ S}, where v is a vector, is called a shift of S. Let us call a set S obtuse for another set Q if any shift of S which intersects conv(Q) also intersects Q.
The following figure illustrates this.
Figure 4.6. The set S is not obtuse for the set Q because its shift S + v intersects the convex hull of Q without intersecting Q.
Theorem 4.4. Carathéodory's convex sets theorem for bounded 2-dimensional sets. Let S be any set in a plane. Then a point p belongs to conv(S) if and only if it belongs to conv({p1, p2, p3}), where p1, p2, p3 are some points of S (some of which may coincide). In other words,

conv(S) = ∪_{p1, p2, p3 ∈ S} conv({p1, p2, p3}).
We leave its proof to the reader.
Lemma 4.4. For any set S in a plane the set −3 conv(S) is obtuse for S.

We leave to the reader the proof of this lemma for the cases when S consists of one, two or three points and, assuming the lemma already proved in these cases, prove it in general. Let us take any shift S + v of S, assume that it does not intersect −3 conv(S), and prove that conv(S) + v also does not intersect −3 conv(S). Let us take any point p ∈ conv(S). Due to theorem 4.4, there are p1, p2, p3 ∈ S such that p ∈ conv({p1, p2, p3}). By assumption, S + v does not intersect −3 conv(S), whence p1 + v, p2 + v, p3 + v do not belong to −3 conv(S). Since our lemma is already proved for sets containing at most three points, we conclude that conv({p1 + v, p2 + v, p3 + v}) does not intersect −3 conv(S), whence p + v does not belong to −3 conv(S). Lemma 4.4 is proved.
Now notice that for any S1, S2, z ⊆ R^2: if S1 is obtuse for z and S2 is non-empty, then S1 + S2 is obtuse for z, where + means the vector sum

S1 + S2 = {s1 + s2 : s1 ∈ S1, s2 ∈ S2}.

The number of minimal zero-sets is finite since all of them are subsets of V. So we can denote them z1, . . . , zn. For every minimal zero-set zi we have a bounded set Si obtuse for it (for example Si = −3 conv(zi), by lemma 4.4). Then their vector sum S1 + · · · + Sn is also bounded and obtuse for all the minimal zero-sets. Let us add a large enough sphere S0 to make sure that the intersection of the total vector sum with Z^2 is non-empty, and define

S = S0 + S1 + · · · + Sn.
Thus we have a bounded set S , which is obtuse for all zero-sets. Let us
prove that the configuration x , whose I1 (x) = S ∩ Z2 , is not eroded by
D . In fact we shall prove that I1 ( D x) ⊇ I1 (x) . Assume the opposite:
there is a point v such that xv = 1 , but ( D x)v = 0 . Since ( D x)v = 0 ,
there must be a minimal zero-set z such that xv+i = 0 for all i ∈ z .
Therefore z + v does not intersect S . Since S is obtuse for z , the
convex hull of z + v also does not intersect S . Since σ contains the
origin, the convex hull of z also contains the origin, so the convex hull of
z + v contains v , whence v does not belong to S , which contradicts our
assumption that xv = 1. Lemma 4.3 is proved.
Example 4.6. Notice that σ may contain no integer points and still be non-empty. Let D : Ω → Ω, where Ω = {0, 1}^G, G = Z^2, be defined as follows:

( D x)i,j = min ( max(xi,j, xi+1,j+1), max(xi+1,j, xi,j+1) ).

a) What is σ in this case?
b) Let us denote by x the configuration such that

I1(x) = {(0, 0), (1, 0), (0, 1), (1, 1)}.

What are the sets I1( D^t x) for t = 1, 2, 3, . . . ?
Exercise 4.1. Prove another version of Helly’s theorem: If there is a
family of bounded convex sets in Rd such that every d + 1 of them have a
common point, then all the sets in the family have a common point. (The
family does not need to be finite.)
You can find a proof of a very general version of Helly’s theorem and many
related facts in [28].
Exercise 4.2. Prove that the following statement is false: ”If there is a
family of convex sets in Rd such that every d + 1 of them have a common
point, then all the sets in the family have a common point.”
Chapter 5
σ is empty ⇐⇒ D is an eroder ⇐⇒ RD may be non-ergodic
In this chapter we do not yet consider cellular automata in general; instead we consider an important class of them. As before, DCA means a
Deterministic Cellular Automaton defined in chapter 4, usually denoted
by D with or without a subscript.
An RD-operator is a composition of a deterministic operator D and a random operator R defined in (4). Following tradition, we apply them in one order (first D, then R), but write them in the opposite order (first R, then D).

Let us define a deterministic operator D as we did in chapter 4. The ground space is Z^d. We choose a natural number n and n different neighbor vectors v1, . . . , vn ∈ Z^d. Also we need a function f : A^n → A.
One class of RD-operators is formed by percolation operators. In this case A = {0, 1}. You may imagine that ”one” means that a component is alive and ”zero” means that it is dead. In this case we define the function f as follows:

f(a1, . . . , an) = 0 if a1 = · · · = an = 0, and 1 in all the other cases.   (19)
The random operator Rα acts as defined in (4). The main purpose of this chapter is to present and partially prove the following theorem about RD-operators.

Theorem 5.1. Let A = {0, 1}. Let D denote a monotonic DCA acting on A^G, where G = Z^d, which turns ”all zeros” into ”all zeros”. Let the set σ ⊂ R^d be defined as in (16). Then:

(a) The following limit exists for all α:

lim_{t→∞} (Rα D)^t δ0.   (20)

(b) If σ is empty, then D is an eroder and the limit (20) is different from δ1 for α small enough; this limit tends to δ0 as α → 0.

(c) If σ is not empty, then D is not an eroder and the limit (20) equals δ1 for all α > 0.
The following scheme shows of which parts the proof consists. By a closed half-plane we mean one of the two halves into which a plane is divided by a line, including this line. A zero-half-plane means a closed half-plane which contains a zero-set.

Figure 5.1. Scheme of the proof of theorem 5.1:
if σ is empty, then there are at most three zero-half-planes whose intersection is empty, which implies both that D is an eroder and that Rα D is non-ergodic for small α;
if σ is not empty, then D is not an eroder, whence Rα D is ergodic for all α > 0.
Every arrow in this scheme represents an argument. But first of all we need the following lemma.

Lemma 5.1. If D is not an eroder, then Rα D is ergodic for all positive α.
Proof. We estimate the probability that the 0-th component is zero after t applications of Rα D to an arbitrary initial configuration and prove that this probability tends to zero as t → ∞. Since D is not an eroder, there is a finite deviation x from ”all zeros” not eroded by it. So I1( D^t x) is not empty for all natural t and we can choose a point p_t in it. For every u ∈ [1, t] let us consider the event:

”At the time u the random operator Rα turns all the components of I1(x) − p_{t−u} into ones”.

This event is sufficient for the point 0 at time t to be in the state 1. But these events are independent of each other (because they pertain to different times) and the probability of each is α^C, where C is the number of elements of I1(x). Therefore each of these events happens with probability α^C and fails to happen with probability 1 − α^C. So the probability that none of these events happens is

(1 − α^C)^t,

which tends to zero as t tends to ∞. Thus the probability to have zero at the origin tends to zero as t → ∞. The same is true for any point, and this is sufficient for the zeros to die out.
Lemma 5.2. If there are at most three zero-half-planes whose intersection is empty, then Rα D is non-ergodic for small enough α > 0.

A general proof needs some sophisticated technique: either branching analogs of contours [35] or renormalization [4]; [21] contains a good explanation of the branching method for the NEC operator.
Flattening: its non-ergodicity

We shall prove lemma 5.2 only for one particular case: P = Rα D flat, where D flat was defined in example 4.5, formula (18). Notice that δ1 is invariant for P, so it is sufficient to prove that P^t δ0 does not tend to δ1 as t → ∞ for α small enough. In fact we shall estimate from above the density of ones in the measures P^t δ0 uniformly in t.
Let us use the triple (i, j, t) when we speak of the point (i, j) at time t.
We shall imagine this triple as a point in a three-dimensional integer space
where the axes i, j are horizontal and the axis t goes upward. We may
also denote a point (i, j, t) by one letter, say A , and in this case we
write i = i(A), j = j(A), t = t(A) . Let x(i, j, t) equal 1 if there is a
particle at (i, j, t) and equal 0 otherwise. Also let us use independent
random variables y(i, j, t) , which equal 1 if the random operator turned
zero into one at this point and equal 0 otherwise. Every y(i, j, t) equals
one with probability α and zero with probability 1 − α independently of
other variables y(·) . Thus we can define the variables x(i, j, t) in the
following inductive way:
Base of induction: x(i, j, 0) = 0 for all i, j .
Induction step: x(i, j, t) = max(y(i, j, t), z(i, j, t)), where the intermediate variable z(i, j, t) is the (i, j)-th component of the result of application of D flat to the configuration at time t − 1, that is

z(i, j, t) = min ( max(x(i, j, t − 1), x(i, j + 1, t − 1)), max(x(i + 1, j, t − 1), x(i + 1, j + 1, t − 1)) ).
This pseudo-code expresses the same idea:

    for all (i, j) ∈ Z^2 do simultaneously
        x(i, j, 0) ← 0
    for t = 1 to ∞ do
        for all (i, j) ∈ Z^2 do simultaneously
            x(i, j, t) ← ( D flat x(t − 1))(i, j)
            if rnd < α then x(i, j, t) ← 1

Here ( D flat x(t − 1))(i, j) is the (i, j)-th component of the result of application of D flat to the configuration x(t − 1), whose components are x(i, j, t − 1), i, j ∈ Z.
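A reader who wants to see this pseudo-code in action may run the following Python version. It replaces the infinite plane by a torus Z_m × Z_m, which is our simplification and not part of the proof; for α small the density of ones settles far from 1, in agreement with lemma 5.2.

    import numpy as np

    rng = np.random.default_rng(0)

    def step_flat(x):
        """D flat of formula (18) on a torus:
        min(max(x[i,j], x[i,j+1]), max(x[i+1,j], x[i+1,j+1]))."""
        right = np.roll(x, -1, axis=1)     # x[i, j+1]
        down = np.roll(x, -1, axis=0)      # x[i+1, j]
        diag = np.roll(down, -1, axis=1)   # x[i+1, j+1]
        return np.minimum(np.maximum(x, right), np.maximum(down, diag))

    def final_density(m=200, alpha=0.05, steps=500):
        """Iterate R_alpha D flat from "all zeros"; return the density of ones."""
        x = np.zeros((m, m), dtype=int)
        for _ in range(steps):
            x = step_flat(x)
            x[rng.random((m, m)) < alpha] = 1   # the random operator R_alpha
        return x.mean()

    for a in (0.02, 0.1, 0.5):
        print(a, final_density(alpha=a))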
Let us estimate from above the probability that there is a particle at
(0, 0) at time T , that is x(0, 0, T ) = 1 . We shall cover this event by
several events and estimate the sum of their probabilities. To every one of
these events there will correspond a special path. If k -th and (k + 1) -th
vertices of this path are (ik , jk , tk ) and (ik+1 , jk+1 , tk+1 ) , then we denote
∆k = (∆ik , ∆jk , ∆tk ) , where
∆ik = ik+1 − ik ,
∆jk = jk+1 − jk ,
∆tk = tk+1 − tk .
Let us take an arbitrary realization of our process and construct the event
to which it belongs along with the corresponding path. We shall proceed
inductively, at every step constructing some event and some path and
proving the following induction assumptions about them:
a) This path leads from (0, 0, T ) to (1, 0, T ) .
b) This path has steps only of the three following types:

down: ∆i = 0 and ∆t = −1;
horizontal: ∆i = 1 and ∆t = 0;
up: ∆i = −1 and ∆t = 1.

In all three cases |∆j| ≤ 1.
c) If a horizontal step starts at (i, j, t) , then x(i, j, t) = 1 .
Base of induction. The event is ” x(0, 0, T ) = 1 ” and the path is
(0, 0, T ) → (1, 0, T ) . It is evident that all the assumptions are fulfilled.
We stop when our path has the following property:
d) for every vertex (i, j, t) , where a horizontal step starts, y(i, j, t) = 1 .
Figure 5.2. An illustration of one induction step, showing the points A = (i, j, t), C = (i, j1, t − 1) and D = (i + 1, j2, t − 1). The axis j is not drawn because it is perpendicular to the paper. If x(i, j, t) = 1 but y(i, j, t) = 0, this value of x must be ”inherited” from the previous time step: there must be points (i, j1, t − 1) and (i + 1, j2, t − 1) where x equals one, j1 and j2 being equal either to j or to j + 1.
Induction step. Suppose that there is a vertex A = (i, j, t) , where a
horizontal step AB starts, but y(i, j, t) = 0 . By induction assumption,
x(i, j, t) = 1 . Since the variable y at this point is zero, the value of x at
this point is inherited from the previous time step. Looking at the figure
5.2 will help you to realize that there must be two other points C and D
such that
t(C) = t(D) = t(A) − 1,
i(C) = i(A) and i(D) = i(A) + 1.
Then we define another vertex E by the rule: the vector from D to E is
the same as the vector from A to B . After that we change our path as
follows: instead of going straight from A to B , we go from A to C , then
to D , then to E and thence to B . In other words, we insert C, D, E
between A and B into the sequence of vertices of our path. It is easy
to prove that the new path also satisfies the induction assumptions a),
b), c). This induction process cannot last forever because x(i, j, 0) ≡ 0 .
When it stops, we have a path satisfying the requirements a), b), c) and
d). but this path may not be self-avoiding yet. We want a self-avoiding
path with all these properties. To obtain it, we use ”delooping” similar to
that which we used when proving lemma 1.1. If the path which we have is
not yet self-avoiding, it visits some vertex twice and makes a loop between
these visits. We eliminate this loop (including only one of these visits)
from our path, thus obtaining a shorter path, which has all the properties
(a), (b), (c) and (d). So we do until we get a self-avoiding path with these
properties.
Our event is that y(i, j, t) = 1 for all those vertices of our path where horizontal steps start. Since the path is self-avoiding, the probability of this event is not greater than α^k, where k is the number of horizontal steps in the path. Notice that the number of down steps is equal to the number of up steps and both equal k − 1. So the length of the path is 3k − 2. The number of possible paths of length 3k − 2 does not exceed C^{3k−3}, where C is the number of different vectors (∆i, ∆j, ∆t) allowed by condition b). Thus the sum of the probabilities of all the events does not exceed

Σ_{k=1}^∞ C^{3k−3} α^k = α / (1 − C^3 α).

For α small enough this sum is less than 1, whence the measures P^t δ0 do not tend to δ1. Lemma 5.2 is proved.
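For a feeling of the numbers: condition b) allows three types of (∆i, ∆t), each with three choices of ∆j, so C = 9, and the bound is easy to evaluate (a small sketch; the particular α below is our arbitrary choice).

    def path_sum_bound(alpha, C=9):
        """alpha / (1 - C**3 * alpha); meaningful only when C**3 * alpha < 1."""
        assert C ** 3 * alpha < 1
        return alpha / (1 - C ** 3 * alpha)

    print(path_sum_bound(1e-3))   # about 0.0037, comfortably below 1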
Exercise 5.1. Let us take percolation operators with the initial configuration ”all ones”, that is ”all alive”.
(a) Let us take percolation operators with n = 1. Prove that all of them are ergodic whenever α > 0.
(b) Let us take percolation operators with n = 2. Prove that these operators behave like the Stavskaya operator: all of them have one and the same critical value.

Exercise 5.2. Let d = 1 and D : Ω → Ω be defined thus:

( D x)i = median (xi−1, xi, xi+1).

Let also Rαβ be the random noise defined in (4). Prove that the composition Rαβ D is ergodic whenever 2/3 < α + β < 4/3.

Exercise 5.3. Prove that the operator Rαβ D NEC is ergodic as soon as 2/3 < α + β < 4/3.
Solved problem 5.1. [10] contains a criterion of eroders for an arbitrary finite set of states, but only for dimension d = 1.

Solved problem 5.2. Theorem 5.1 states a connection between eroders and existence of critical values. This connection does not hold for greater numbers of states of components. This is shown by the following one-dimensional counter-example, where every point has three states: 0, 1 and 2. In this case Ω = {0, 1, 2}^Z and the uniform deterministic operator D is defined by the rule

∀ x ∈ Ω, v ∈ Z : ( D x)v = f(xv−1, xv, xv+1),

the function f being defined as follows:

f(x−1, x0, x1) = 1 if x−1 = 1, x0 = x1 = 2;
f(x−1, x0, x1) = 0 if x−1 = x0 = 1, x1 = 0;
f(x−1, x0, x1) = median (x−1, x0, x1) in all the other cases.
a) Check that the function f(·) is monotonic.
b) Let us call a configuration x a finite deviation from ”all zeros” if the set {v ∈ Z : xv ≠ 0} is finite. Prove that D is an eroder, that is, for any finite deviation x from ”all zeros” there is t such that D^t x = ”all zeros”.
c) Let Rα be a random operator which turns any component into the state 2 with probability α independently of the others. Prove that for any α > 0 the operator Rα D is ergodic [36].
Unsolved problem 5.1. Does uniqueness of an invariant measure imply ergodicity? In other words, is there a non-ergodic cellular automaton which has exactly one invariant measure?

Unsolved problem 5.2. Along with the NEC operator, the article [39] wrote about another operator Rαβ D median, acting on the same space, where D median is defined by

( D median x)(i,j) = median (xi,j, xi+1,j, xi,j+1, xi−1,j, xi,j−1),   (21)

where the Boolean function median(·) of five arguments equals 0 if the majority of its arguments equal 0 and equals 1 if the majority of its arguments equal 1. Since neither ”all zeros” nor ”all ones” is an attractor under D median, theorem 5.2 cannot be used. However, computer modelling suggested that this operator is non-ergodic for small enough α = β. It also seems plausible that this operator is ergodic whenever α ≠ β. Both statements are unproved.
Chapter 6.
Measures

If we want to define a normalized probability distribution on a finite or countable set Ω, it is easy: we just attribute a non-negative probability Prob(ω) to every ω ∈ Ω so that

Σ_{ω∈Ω} Prob(ω) = 1.

Then for any S ⊂ Ω we simply assign

Prob(S) = Σ_{ω∈S} Prob(ω)

and obtain a probability distribution with many good qualities. The situation is more difficult when Ω is uncountable.
Let us recall some notions which we introduced in the previous chapters. Suppose that we have a non-empty finite set denoted by A and called the alphabet. Elements of A are called letters. We call G the ground space and Ω = A^G the configuration space. Elements of G are called points and elements of Ω are called configurations. Every configuration ω is determined by its components ωi for all i ∈ G. For any i ∈ G and any a ∈ A we call the set

{ω : ωi = a}   (22)

a basic set. A finite intersection of basic sets is called a thin cylinder, and a finite union of thin cylinders is called a cylinder. Cylinders form an algebra A because a finite intersection, a finite union and a complement of cylinders are cylinders.
We want to have a measure µ defined on all cylinders in Ω which satisfies the following conditions:

(a) µ(empty set) = 0.
(b) µ(Ω) = 1.   (23)
(c) If S1 ∩ S2 is empty, then µ(S1 ∪ S2) = µ(S1) + µ(S2).
But this is not enough, because many important subsets of Ω are not cylinders. For example, any set consisting of only one configuration is not a cylinder. To include into consideration many important sets which are not cylinders, we define the σ-algebra A′ as the minimal set which includes A and also satisfies the following rule: any countable intersection and any countable union of elements of A′ also belong to A′.
Now instead of the condition (c) we want a stronger condition (c′):

(c′) If every two sets of a sequence S1, S2, S3, . . . of subsets of Ω have an empty intersection, then

µ(S1 ∪ S2 ∪ S3 ∪ . . . ) = µ(S1) + µ(S2) + µ(S3) + . . .
Theorem 6.1. Carathéodory's extension theorem. If we have a measure µ defined on all elements of an algebra A which satisfies the conditions (23), then there is a measure µ′ defined on all the elements of the corresponding σ-algebra A′, which coincides with µ on A and satisfies the condition (c′) on all the elements of A′.

A proof may be found in [1] or [6].
One example of the usefulness of this extension: given any configuration x, we obtain the set which consists of only this configuration as the countable intersection

∩_{i∈G} {ω : ωi = xi}.
Let us denote by M the set of probability measures on Ω , that is on the
σ -algebra generated by the cylinders as described above.
By convergence in M we mean convergence on all thin cylinders (or on
all cylinders, which is the same).
Theorem 6.2. The set M of normalized measures on the sigma-algebra
generated by cylinders in Ω is compact.
Proof. In fact we shall prove that every sequence in M has a convergent subsequence; it is well known that this is equivalent to compactness of M.

Let C1, C2, C3, . . . be a list of all thin cylinders. Now let us take an arbitrary sequence µi ∈ M and prove that it has a converging subsequence νi. Let us consider the sequence µi(C1), i.e. the values of our measures on C1. These values are real numbers between zero and one, so their sequence has a converging subsequence. Thus the sequence (µi) has a subsequence (µ^1_i) whose values on C1 converge. Let us define ν1 = µ^1_1 and consider the sequence of values

µ^1_i(C2), i = 2, 3, 4, . . .   (24)

Again, this sequence has a converging subsequence, whence the sequence (24) has a subsequence (µ^2_i) of measures whose values on C1 and C2 converge. Let us define ν2 = µ^2_1 and consider the values of measures

µ^2_i(C3), i = 2, 3, 4, . . .   (25)

Arguing in the same way, we obtain a subsequence whose values on C3 converge, and so on. Thus we define a sequence (ν1, ν2, ν3, . . . ) whose values on all thin cylinders converge. Thus we have found a subsequence which converges on all thin cylinders. Theorem 6.2 is proved.
An important class of measures is product measures. We call µ a product measure if for any thin cylinder

C = {ω : ω_{i_1} = a_{i_1}, . . . , ω_{i_n} = a_{i_n}}

we have

µ(C) = Π_{j=1}^{n} µ(ω_{i_j} = a_{i_j}).
Uniformity. Here we consider measures on A^Z. Shifts on Z induce shifts on M. We call a measure µ uniform if it is invariant under all shifts.
Existence of non-measurable sets. The idea of the proof is not new: a similar one was used in [17] on p. 18, in [18] on pp. 13-14, and essentially in the same way in [6] on pp. 408-409. All of these arguments are in the spirit of Lebesgue measure; accordingly, the space (the analog of our configuration space) is a circumference, and two sets are congruent if one turns into the other by a rotation. In our case (in the style of our approach) the configuration space is A^Z and two sets are congruent if one turns into the other by a shift.

Now we present our argument. For every v ∈ Z we define a map Tv : Ω → Ω as follows: for any configuration x = (. . . , x−1, x0, x1, . . . ) and all integer i

(Tv(x))i = xi−v.

Let us call such maps translations. We call two sets S1, S2 ⊂ Ω equivalent if one of them may be turned into the other by a translation. This notion is consistent with the general mathematical notion of equivalence and has all its properties, including the representation of Ω as a union of classes of equivalence. Suppose we want to attribute to every S ⊂ Ω a non-negative number µ(S), called its measure, which satisfies the following conditions:

(a) µ(empty set) = 0.
(b) For every configuration x ∈ Ω : µ({x}) = 0.
(c) µ(Ω) = 1.
(d) If S1 ⊂ S2 ⊂ Ω, then µ(S1) ≤ µ(S2).
(e) If S1, S2 ⊂ Ω and S1 and S2 are equivalent, then µ(S1) = µ(S2).
(f) If every two sets of a sequence S1, S2, S3, . . . ⊂ Ω have an empty intersection, then µ(S1 ∪ S2 ∪ S3 ∪ . . . ) = µ(S1) + µ(S2) + µ(S3) + . . .
Theorem 6.3. The conditions (a, b, c, d, e, f) are incompatible.

Notice that these conditions are redundant; for example, (a) follows from (b) and (d). But as they are written, these conditions are convenient enough for us.
Proof. Let us assume all these conditions and come to a contradiction. We have called two configurations x, y equivalent if there is v ∈ Z such that Tv(x) = y. Evidently this ”equivalence” satisfies all the conditions for an equivalence relation in the general sense, so it allows us to partition Ω into classes of equivalence.

Let us call a configuration x periodic if there exists a shift Tv with v ≠ 0 such that x = Tv(x). The set of periodic configurations is countable, therefore it has measure 0. Therefore the set of non-periodic configurations has measure 1. This set is already partitioned into classes.
For every class of non-periodic configurations let us choose just one representative, an element of this class. Putting all these representatives together, we get a set S. Notice that no two different elements of S belong to one class, and therefore no two different elements of S are equivalent.
Now let us consider translations Tv (S) of S indexed by the vector v ∈ Z .
Lemma 6.2. If v ≠ w, then the sets Tv(S) and Tw(S) have an empty intersection.

Proof by contradiction. Let Tv(S) and Tw(S) have a common configuration x:

x ∈ Tv(S) and x ∈ Tw(S),

whence

T−v x ∈ S and T−w x ∈ S.

Then S has two different equivalent elements T−v x and T−w x, which is impossible. Lemma 6.2 is proved.
Now notice that the translations Tv(S) over all vectors v have pairwise empty intersections and their union has measure 1. Therefore

Σ_{v∈Z} µ(Tv(S)) = 1.

Now let us consider two cases:
(a) µ(S) = 0. Then µ(Ω) = 0.
(b) µ(S) > 0. Then µ(Ω) = ∞.

In both cases µ(Ω) is not 1, as it should be. This contradiction proves theorem 6.3.
Exercise 6.1. Let A = {0, 1} and G = Z. Prove that the following sets belong to the sigma-algebra generated by cylinders in Ω = A^G:

(a) { x ∈ Ω : lim_{n→∞} (x1 + · · · + xn)/n = 1/2 }.
(b) { x ∈ Ω : lim_{n→∞} (x1 + · · · + xn)/n < 1/2 }.
(c) { x ∈ Ω : lim_{n→∞} (x1 + · · · + xn)/n exists }.

Exercise 6.2. In chapters 1 and 2 we naively spoke about probabilities of percolation from a vertex to infinity without having checked that these events belong to the sigma-algebra generated by cylinders. But it is better late than never. Prove that these events really belong to that sigma-algebra.
Chapter 7
PCA in general

In this chapter we speak about cellular automata. All of them are linear operators acting on measures belonging to the space M introduced in the previous chapter. So we shall often use the word ”operator” instead of the longer phrase ”cellular automaton”. By a ”process” we mean a sequence of measures µ, Pµ, P^2 µ, . . . resulting from iterative application of an operator P to some initial measure µ. A measure µ is called invariant for P if Pµ = µ. We call an operator P : M → M ergodic if it has an invariant measure µinv and for any measure µ the limit lim_{t→∞} P^t µ exists and is one and the same for all µ.
If P is ergodic, it has only one invariant measure, but the converse is not proved for cellular automata and is not true for operators in general. In this chapter we shall prove that all cellular automata of a large class have at least one invariant measure. Generally, it is important to study ergodicity and the sets of invariant measures of cellular automata.
Ergodic operators correspond to real systems which ”forget” their initial
conditions - this is what we usually want to achieve when we mix a drink.
Non-ergodic operators correspond to real systems which partially remember their initial conditions - this is what we want to achieve when we
keep information in computer memory. The central goal of this course is
to present some examples of ergodic and some examples of non-ergodic
cellular automata.
Let us present a general definition of PCA, aka Probabilistic Cellular Automata, and prove some general statements about them. As usual, the ground space G is either Z^d or Z_m^d. We denote by M the space of normalized measures on the sigma-algebra generated by cylinders in Ω = A^G, where A is the alphabet.

We have n neighbor vectors v1, . . . , vn ∈ Z^d and denote V(p) = {p + v1, . . . , p + vn}. If the ground space is finite, then the configuration space is also finite and our PCA is just a finite Markov chain. We define it as follows. For any delta-measure δ(x), where x is any configuration in Ω, the measure P δ(x) is a product-measure with factors defined as follows. Let us call the distribution of the i-th component according to this measure the transitional distribution and denote it by θi(·|x). In fact, the i-th transitional distribution depends only on the components of x in the neighborhood of i, so we can write it also as θi(·|xV(i)), where xV(i) is the restriction of x to V(i). By θi(y|x) we denote the value of θi(·|x) on y ∈ A, that is, the conditional probability that after application of the operator P the i-th component will be in the state y if before the application the neighborhood of i was in the state xV(i). This probability is called a transitional probability.
Thus we have defined how our operator acts on delta-measures. If our ground space is finite, this is sufficient, because Ω is finite also and any measure is a finite linear combination of delta-measures.

If the ground space is infinite, measures on Ω generally are not finite linear combinations of delta-measures, but as soon as we concentrate on the value of Pµ on a certain thin cylinder with support S, we may restrict µ to V(S). Since S is finite, V(S) is finite also, and this restriction is a linear combination of delta-measures on V(S). Thus we can write an explicit formula for the value of Pµ on an arbitrary thin cylinder:

(Pµ)(yi = bi, i ∈ S) = Σ_a µ(xi = ai, i ∈ V(S)) Π_{i∈S} θi(bi | aV(i))   (26)

for any finite set S ⊂ G and any bi, i ∈ S, where the sum is over all a = (aj, j ∈ V(S)).
Several examples. We call a random operator degenerate if at least one of its transition probabilities equals zero, and non-degenerate if all its transition probabilities are strictly positive.

Any deterministic operator can be considered as a very degenerate PCA, which transforms any delta-measure into a delta-measure. Its transition probabilities are

θi(y | xV(i)) = 1 if y = fi(xV(i)), and 0 otherwise.

Any percolation operator P = Rα D defined in (19) can be represented in the form (26) by taking V(i) = {i + v1, . . . , i + vn} and

θi(1 | x) = 0 if xj = 0 for all j ∈ V(i), and 1 − α otherwise.
Any map from M to M is called an operator. We call a measure µ
invariant for an operator P if P µ = µ .
Theorem 7.1. Any PCA has at least one invariant measure.

Proof. Let us apply our operator P iteratively to an arbitrary initial measure µ. We obtain a sequence of measures µ, Pµ, P^2 µ, P^3 µ, . . . . Let us form another sequence of measures ψ1, ψ2, ψ3, . . . , where

ψk = (1/k) (µ + Pµ + · · · + P^{k−1} µ), k = 1, 2, 3, . . .   (27)

By theorem 6.2 this sequence has a subsequence which converges to some measure φ. Let us prove that φ is invariant for P. Suppose that it is not, which means that there is a thin cylinder

C = {y : yi = bi for all i ∈ S}

on which φ and Pφ have different values:

φ(C) ≠ (Pφ)(C).   (28)

Let us denote H = |φ(C) − (Pφ)(C)| > 0. Let us also denote

Ca = {x : xi = ai for all i ∈ V(S)} for every a = (ai, i ∈ V(S)) ∈ A^{V(S)}.
Using these notations, we can rewrite the formula (26) as

(Pµ)(C) = Σ_a µ(Ca) Π_{i∈S} θi(bi|a).   (29)

Since the sequence (27) has a subsequence which converges to φ, we can take k so large that

|ψk(C) − φ(C)| < H/3   and   |ψk(Ca) − φ(Ca)| < H / (3 |A|^{|V(S)|})   (30)
for all a, where |A|^{|V(S)|} is the number of possible tuples a. Then

P ψk = (1/k) (Pµ + · · · + P^k µ),

whence

ψk(C) − (P ψk)(C) = ( µ(C) − (P^k µ)(C) ) / k,

which is not greater in absolute value than 2/k. This is less than H/3 as soon as we choose k > 6/H.
Also let us prove that |(Pφ)(C) − (Pψk)(C)| < H/3. Indeed,

|(Pφ)(C) − (Pψk)(C)| =
| Σ_a φ(Ca) Π_{i∈S} θi(bi|a) − Σ_a ψk(Ca) Π_{i∈S} θi(bi|a) | =
| Σ_a (φ − ψk)(Ca) Π_{i∈S} θi(bi|a) |.   (31)

Since every transition probability does not exceed one,

Π_{i∈S} θi(bi|a) ≤ 1.

Hence from (30), the quantity (31) does not exceed H/3. Thus

|φ(C) − (Pφ)(C)| ≤ |φ(C) − ψk(C)| + |ψk(C) − (Pψk)(C)| + |(Pψk)(C) − (Pφ)(C)| < H/3 + H/3 + H/3 = H.

This contradiction shows that our assumption (28) was false. Theorem 7.1 is proved.
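When the ground space is finite, a PCA is a finite Markov chain and the averaging (27) can be carried out literally. In the following Python sketch (an arbitrary two-state illustration of ours; measures are row vectors and the operator acts on the right, which is only a notational choice) the measures µP^t oscillate forever, yet the Cesàro averages converge to the invariant measure.

    import numpy as np

    def cesaro(P, mu0, k):
        """psi_k = (mu0 + mu0 P + ... + mu0 P^(k-1)) / k, as in (27)."""
        mu, acc = mu0.copy(), np.zeros_like(mu0)
        for _ in range(k):
            acc += mu
            mu = mu @ P
        return acc / k

    # A deterministic flip: mu0 P^t alternates between (1,0) and (0,1),
    # but the averages tend to the invariant measure (1/2, 1/2).
    P = np.array([[0.0, 1.0], [1.0, 0.0]])
    mu0 = np.array([1.0, 0.0])
    for k in (1, 2, 10, 1001):
        print(k, cesaro(P, mu0, k))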
In fact, we often need a similar but more general theorem, which can be formulated as follows.

Theorem 7.2. Suppose that we have a non-empty convex compact subset C of M on which a cellular automaton P acts. Suppose that there is µ ∈ M such that P^t µ belongs to C for all t. Then P has an invariant measure which belongs to C.

The proof is evident.
The following theorem achieves one of the main goals of this book.

Theorem 7.3. If D is a monotonic deterministic operator defined by (37) and both ”all zeros” and ”all ones” are attractors under it, then Rαβ D has at least two different invariant measures for small enough positive α and β.

Proof. Let C be the set of measures for which the density of zeros does not exceed, say, 1/3. Then, taking the initial measure concentrated in ”all ones” and α such that 27α/(1 − 27α) < 1/3 and using theorem 5.3, we conclude that the operator P has an invariant measure in C. The same is true if instead of 1/3 we take any positive number. Theorem 7.3 is proved.
Do we really need to select a subsequence in (27) to prove theorem 7.1? Perhaps the sequence ψk itself always converges? The following example shows that this hope is false.
Example 7.1, which shows that the sequence ψk defined by (27) may have no limit. Let Ω = {0, 1}^Z, let P be the deterministic operator which simply shifts every configuration by one site to the left, and let µ be concentrated in the configuration a ∈ Ω, where for all i ∈ Z

ai = 1 if i ≥ 1 and the integer part of log3 i is odd, and ai = 0 in all the other cases.

Take C = {x ∈ Ω : x0 = 1}. Then P^k µ (C) has no limit as k → ∞, and neither have the Cesàro averages ψk(C).
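This is easy to check by computation. The Python sketch below (the integer logarithm is computed with integers on purpose, to avoid floating-point edge cases) prints the Cesàro averages ψN(C), i.e. the densities of ones among a1, . . . , aN; along N = 3^n they oscillate between roughly 1/4 and 3/4 and therefore do not converge.

    def ilog3(i):
        """The integer part of log_3(i), i >= 1, computed exactly."""
        e = 0
        while 3 ** (e + 1) <= i:
            e += 1
        return e

    def a(i):
        """The configuration of example 7.1."""
        return 1 if i >= 1 and ilog3(i) % 2 == 1 else 0

    for n in range(1, 9):
        N = 3 ** n
        print(N, sum(a(i) for i in range(1, N + 1)) / N)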
An explanation of a similar difficulty for processes with continuous time was provided by Liggett [23, p. 12 of section 1 of chapter 1].

Exercise 7.1. Show that theorem 7.1 becomes false if A is infinite.
Chapter 8
Ergodicity and dimension

For a long time (since the first studies of the Ising model by Ising, Peierls, Onsager and others in the first half of the XX century) it was a common opinion in statistical physics that phase transitions can occur only in systems whose dimension is greater than one. For example, §152 of Landau and Lifshitz's fundamental monograph [20] was called ”The impossibility of the existence of phases in one-dimensional systems”, and an argument of physical nature was presented in support of this impossibility. Another example: ”In one dimension bosons do not condense, electrons do not superconduct, ferromagnets do not magnetize, and liquids do not freeze” [22, p. vi]. (Our exercise 1.2 also points in this direction.) However, all these ideas were formed in dealing with models of equilibrium, which have no time parameter. Cellular automata, besides space, have time. Should it be counted as an additional dimension? There were many arguments about this in Moscow in the 1960s and 1970s. Dobrushin suggested that time should be counted as a dimension. In this sense cellular automata which we call one-dimensional, because their ground space is one-dimensional, are essentially two-dimensional, and as such should be non-ergodic as soon as their parameters satisfy certain inequalities. Piatetski-Shapiro expressed his doubts about it and, to clarify this dispute, undertook a computer simulation of some cellular automata.
Three processes were chosen for modeling, and both Dobrushin and Piatetski-Shapiro agreed that these processes might be treated as crucial ones. In all three cases the notion of median, borrowed from statistics, was used to define the deterministic operators. Only, instead of ”median”, they spoke about ”voting” [39].

First we define three deterministic operators D 3, D 5, D NEC. In the first case the ground space is G = Z, in the other cases the ground space is G = Z^2. In all three cases the configuration space is A^G. Their definitions:

∀ i ∈ Z, x ∈ Ω : ( D 3 x)i = median (xi−1, xi, xi+1),   (32)
∀ (i, j) ∈ Z^2, x ∈ Ω : ( D 5 x)i,j = median (xi,j+1, xi+1,j, xi,j, xi,j−1, xi−1,j),   (33)
∀ (i, j) ∈ Z^2, x ∈ Ω : ( D NEC x)i,j = median (xi,j+1, xi+1,j, xi,j).   (34)
Now let us remember what we called the random noise Rαβ: it turns any delta-measure δa, where a ∈ Ω is a given configuration, into a product-measure µ, according to which for any point p ∈ G the component xp is distributed as follows:

if ap = 0, then µ(xp = 1) = α and µ(xp = 0) = 1 − α;
if ap = 1, then µ(xp = 1) = 1 − β and µ(xp = 0) = β.
Dobrushin expected that with α = β positive but small enough all these processes would be non-ergodic, that is, their limit behavior as t → ∞ would depend on the initial condition. Especially appropriate were two contrasting initial conditions: a measure concentrated in ”all zeros” and a measure concentrated in ”all ones”. Denoting by D any of these operators, a computer was used to simulate the measures δ0 (Rαβ D)^t and δ1 (Rαβ D)^t as t grows. Especially important was the case α = β and D = D 3. The results of the modeling and their interpretations were published in [39]. They seemed to show that Rαβ D 3 was ergodic for all α = β strictly between 0 and 1/2. At that time these results seemed quite convincing. However, no rigorous proof of this conjecture has been obtained till now. Gray [11] proved a similar statement for continuous time.

As for the other two operators, their results seemed similar to each other, although it can be proved rigorously that in the biased case β = 0 the operators (33) and (34) behave differently: for (34) both ”all zeros” and ”all ones” are attractors, while for (33) neither of them is. After that the positive rates conjecture was proposed by several authors based on various informal considerations. It claimed that all uniform non-degenerate one-dimensional probabilistic cellular automata with a finite alphabet are ergodic (see, for example, [23, chapter 4, section 3], [36, p. 115] or [11]).
Bennett and Grinstein's computer simulation of NEC

Let us mention here the results of a more recent computer modeling of the composition Rαβ D NEC. Since this operator has two parameters α and β, instead of a critical point there is a critical curve, see the diagram. (Due to monotonicity, about which we shall speak below, it was sufficient to consider only the two initial conditions ”all zeros” and ”all ones”, and due to 0-1 symmetry it was sufficient to consider only one of them.) When α = β was close to 1/2, the operator was ergodic, that is, tended to one and the same regime from all initial conditions, but for α = β small enough the operator was non-ergodic, i.e. it ”remembered” the initial condition: if the simulation started from ”all zeros”, zeros prevailed all the time; if it started from ”all ones”, ones prevailed all the time.
Figure 8.1. Sketch borrowed from Bennett and Grinstein's article [2]: computer simulation (using CAM [32]) and the chaos approximation of the NEC operator with asymmetric noise. The horizontal axis is noise, the vertical axis is bias, where

noise = β + α,   bias = (β − α) / (β + α),

α and β being the parameters of the noise: the probabilities of 0 to become 1 and of 1 to become 0. The left curve is the simulation using CAM, the Cellular Automata Machine; the right curve is the mean-field aka chaos approximation. The left part of the rectangle is the two-phase region; the right part is the one-phase region.
Regrettably, no analogous simulation was performed for D 5, so we can only guess for which α and β the operator Rαβ D 5 is ergodic. One guess is that it is non-ergodic only for α = β small enough.

Most of the problems considered here pertain to the infinite case. One of them is to decide for any configuration x and any operator D whether x is an attractor under D.

We have presented some non-degenerate non-ergodic cellular automata for which G = Z^2, so we may call them two-dimensional. Similar constructions and arguments can be presented for all dimensions greater than one.
Although the Stavskaya process shows the possibility of a phase transition in the one-dimensional case, it is not quite satisfactory, since one of its invariant measures is degenerate (concentrated in ”all ones”).

Is it possible to present a one-dimensional cellular automaton which has two non-degenerate invariant measures? The following processes fall well short of answering this question, but deserve some attention. In both of them A = {0, 1} and 0 ≺ 1.
Example 8.1. The GKL (Gacs, Kurdyumov, Levin) process [8]. Its operator is a composition of a deterministic operator D GKL and our standard random noise Rαβ, where D GKL is defined as follows:

∀ x ∈ Ω : ( D GKL x)i = median (xi, xi−1, xi−3) if xi = 0, and median (xi, xi+1, xi+3) if xi = 1.

The deterministic operator D GKL in example 8.1 is uniform. In this example both ”all zeros” and ”all ones” are attractors, but it is a unanimous guess that Rαβ D GKL is ergodic whenever α + β > 0. (See a detailed computer study of D GKL without and with random noise in [29].)
Example 8.2. The 2-lines process D 2lines [37]. In this case the ground space is special: G = Z × {−1, 1}. Elements of G may be written as pairs (i, j), where i ∈ Z and j equals −1 or 1. The alphabet is {0, 1}. Now we define the deterministic operator D 2lines as follows:

∀ x ∈ Ω, ∀ (i, j) ∈ G : ( D 2lines x)(i,j) = min ( max(xi,j, xi,−j), xi−j,j ).

The deterministic operator D 2lines in example 8.2 is uniform, but this is the only example in this book where the group of automorphisms is non-commutative. The operator D 2lines is an eroder, but it has been proved [34] that Rα D 2lines is ergodic whenever α > 0.
Gacs' construction.

In spite of considerable computer simulations, nothing can substitute for a rigorous argument. The systems which mathematicians consider may be much more general than those which arise from physical considerations, and may contradict physical intuition. Now the positive rates conjecture is refuted: P. Gács proposed a non-ergodic non-degenerate uniform one-dimensional system [9]. Gács' system actually is an operator acting on Ω = A^Z, which is a composition of a deterministic and a random operator, the random operator turning any state into any other state with a small probability. The main property of the system is that errors do not accumulate, so that the density of components in ”wrong” states remains small forever. The system is very complicated and some defects were found in its first version, but now Gray [12] has helped to correct all the flaws and a final version of Gács' construction has been published [9]. It takes more than two hundred pages to describe, and it needs a finite but enormous number of elements in the set A of states of a single component and a positive but very small probability of error. I asked Gács to estimate, at least roughly, the parameters of his construction. He was not sure, but suggested 2^100 as a rough estimate of the number of states and one divided by the square of this number as a rough estimate of the probability of error. Although Gács's result is very important theoretically, these numbers make any practical application very unlikely. It would be interesting to find out whether such a large number of states and such a small probability of error are really necessary.
Exercise 8.1. Write the chaos approximation for the operator Rαβ D NEC and study for which values of the parameters α and β it is ergodic.
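As a starting point for this exercise, here is one way to set up the chaos (mean-field) approximation in Python: pretend that at every moment all components are independent with a common density p of ones, push this density through the median and then through the noise, and iterate from the two extreme initial densities. This is our sketch of the standard recipe, not a rigorous statement about Rαβ D NEC; in this approximation the fixed point 1/2 (for α = β) loses stability at noise = α + β = 1/3, in rough agreement with the right-hand curve of figure 8.1.

    def chaos_step(p, alpha, beta):
        """One mean-field step: q is the probability that the median of
        three independent Bernoulli(p) components equals 1; then the noise."""
        q = p ** 3 + 3 * p ** 2 * (1 - p)
        return q * (1 - beta) + (1 - q) * alpha

    def extremes(alpha, beta, iters=5000):
        """Iterate from densities 0 and 1; one common limit suggests the
        one-phase region, two distinct limits the two-phase region."""
        lo, hi = 0.0, 1.0
        for _ in range(iters):
            lo = chaos_step(lo, alpha, beta)
            hi = chaos_step(hi, alpha, beta)
        return lo, hi

    for noise in (0.05, 0.2, 0.4):
        ab = noise / 2              # the unbiased case alpha = beta
        print(noise, extremes(ab, ab))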
Chapter 9
Coupling, Order, Monotonicity

We start with a trivial lemma.

Lemma 9.1.
(a) Given a percolation model on a finite graph, we attribute to the edges e1, . . . , en Boolean variables v^1, . . . , v^n so that each v^i equals 1 if the edge ei is open and equals 0 if this edge is closed. Given two vertices A and B, we denote by Π the Boolean variable which equals 1 if there is an open path from A to B and equals 0 otherwise. Then Π is a monotonic Boolean function of (v^1, . . . , v^n).
(b) Given a percolation model on an infinite graph, we attribute to the edges e1, e2, . . . Boolean variables v^1, v^2, . . . in the same way. Given a vertex A, we denote by Π the Boolean variable which equals 1 if there is an open path from A to ∞ and equals 0 otherwise. Then Π is a monotonic Boolean function of (v^1, v^2, . . . ).
Proof of lemma 9.1 is evident and we omit it. Now we go to the main
business of this chapter. By a coupling of two or more measures we
mean a measure on a product of their spaces, whose marginals are given
measures. Any two given measures have a lot of couplings, most of them
useless. For example, a product of two measures is one of their couplings
(a fairly useless one). The proof of the following theorem is based on a useful coupling.
Theorem 9.1. Suppose that we have a percolation model in which every
edge is open with probability ε and closed with probability 1 − ε independently from all the other edges. Given any two vertices A and B , the
probability of percolation from A to B is a monotonic function of ε .
Proof. We denote by P (ε) the probability of percolation from A to B
if every edge is open with a probability ε and closed with a probability
1 − ε independently of all the other edges. In fact we shall prove that
ε1 < ε2 =⇒ P (ε1 ) ≤ P (ε2 ).
To prove this, we use the idea of coupling. Given ε1 and ε2 such that ε1 < ε2, we introduce for all i ∈ {1, . . . , n} variables w^i, independent from each other, each having three possible states ”both”, ”one” and ”none” and distributed as follows:

w^i = ”both” with probability ε1, ”one” with probability ε2 − ε1, ”none” with probability 1 − ε2.   (35)
For all i ∈ {1, . . . , n}, we define the states of the variables v1^i and v2^i as functions of the states of w^1, . . . , w^n in the following way:

v1^i = 1 if w^i = ”both”, and v1^i = 0 otherwise;
v2^i = 0 if w^i = ”none”, and v2^i = 1 otherwise.   (36)
Then
(a) every v1^i equals 1 with probability ε1 independently from the others, just like in the percolation with parameter ε1;
(b) every v2^i equals 1 with probability ε2 independently from the others, just like in the percolation with parameter ε2;
(c) v1^i ≤ v2^i a.s. for all i.

Evidently, the function Π of lemma 9.1 is monotonic. Hence and from (c),

Π(v1^1, . . . , v1^n) ≤ Π(v2^1, . . . , v2^n).

Hence and from (a) and (b), P(ε1) ≤ P(ε2). Theorem 9.1 is proved.
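The coupling (35)-(36) is easy to sample. A few lines of Python (a demonstration with ε1 = 0.3 and ε2 = 0.6, values of our choosing) confirm that each marginal is the right Bernoulli variable while v1 ≤ v2 holds on every sample:

    import random

    def coupled_pair(eps1, eps2):
        """Sample (v1, v2) through the three-valued variable w of (35)."""
        u = random.random()
        if u < eps1:        # w = "both"
            return 1, 1
        if u < eps2:        # w = "one"
            return 0, 1
        return 0, 0         # w = "none"

    random.seed(1)
    sample = [coupled_pair(0.3, 0.6) for _ in range(100_000)]
    print(sum(v1 for v1, v2 in sample) / len(sample))   # close to 0.3
    print(sum(v2 for v1, v2 in sample) / len(sample))   # close to 0.6
    print(all(v1 <= v2 for v1, v2 in sample))           # True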
Coupling of processes. By a coupling of two or more processes we
mean a process on a product of their spaces, whose marginals are given
processes.
This time we use a general class of deterministic operators. In all of them
A = {0, 1} . To define a deterministic operator D , we take an arbitrary
non-empty finite list of vectors v1 , . . . , vn ∈ Zd and an arbitrary Boolean
function f : {0, 1}n → {0, 1} . Then D is defined as follows:
( D x)v = f(xv+v1, . . . , xv+vn) for all v ∈ Z^d.   (37)
Also we use a random operator Rαβ , defined in (4).
Theorem 9.2. Let D be defined by (37) and let

1 − 1/n < α + β ≤ 1.   (38)

Then the operator Rαβ D is ergodic.
Proof. In the present case we use a coupling of three processes: two
processes generated by our operator Rαβ D with different initial measures
and a percolation process. This coupling is defined by the following
pseudo-code, where x(v, t) , y(v, t) and m(v, t) denote components of the
three marginals at a point v at time t :
 1  for all v ∈ Zd do simultaneously
 2      x(v, 0) ← µ_x^init
 3      y(v, 0) ← µ_y^init
 4      m(v, 0) ← 1
 5  for t = 1 to ∞ do
 6      for all v ∈ Zd do simultaneously
 7          x(v, t) ← f (x(v + v1 , t − 1), . . . , x(v + vn , t − 1))
 8          y(v, t) ← f (y(v + v1 , t − 1), . . . , y(v + vn , t − 1))
 9          m(v, t) ← max(m(v + v1 , t − 1), . . . , m(v + vn , t − 1))
10          r ← rnd
11          if r < α then
12              x(v, t) ← 0
13              y(v, t) ← 0
14              m(v, t) ← 0
15          if r > 1 − β then
16              x(v, t) ← 1
17              y(v, t) ← 1
18              m(v, t) ← 0
Let us first ignore all the lines of this pseudo-code which deal with the
values m(v, t) and concentrate our attention on those lines, which deal
with x(v, t) and y(v, t) . These lines describe two processes with one and
the same operator Rαβ D and arbitrary different initial conditions set by
lines 1-3. These processes function simultaneously, using a common source
of random noise. In both processes every component at every time step does the
following: first, due to lines 7 and 8, it assumes some value which depends
on the states of its neighbors one time step ago, and second, due to lines 10-17
it makes a random change, becoming 0 with probability α and becoming
1 with probability β . Let us call a point (v, t) a break if x(v, t) ≠ y(v, t) .
It follows from percolation arguments that the density of breaks at time
t does not exceed coef^t , where

coef = n(1 − (α + β)).          (39)

Under condition (38) coef < 1 , and therefore coef^t tends to zero
as t → ∞ .
To monitor what happens to breaks, we have special marks m(v, t) , which
may equal 0 or 1. We call a point (v, t) marked if m(v, t) = 1 and
unmarked otherwise. The interaction is arranged in such a way that
m(v, t) = 1 at every break:
∀ v, t : x(v, t) ≠ y(v, t) =⇒ m(v, t) = 1.
(The converse may be false: m(v, t) may equal 1 at other points also.)
Initially all the points are marked (to be on the safe side, because initially
all the points may be breaks) and become unmarked only when lines 12-13
or 16-17 assign equal values to x(v, t) and y(v, t) . Notice that the mark
process is a percolation process with the death rate α + β .
Now let us prove that under condition (38) our operator P = Rαβ D is
ergodic. By theorem 7.1, P has at least one invariant measure µinv .
Let us take an arbitrary measure µ0 and prove that µ0 P t tends to µinv
as t → ∞ .
Remember that by convergence in M we mean convergence on all thin
cylinders. Let us take an arbitrary thin cylinder C and prove that
µ0 P t (C) tends to µinv (C) as t → ∞.
Let us denote µt = µ0 P t for all t . Remember that any thin cylinder is
an intersection of several basic sets (22). In particular
C = C1 ∩ · · · ∩ Ck .
Therefore

|µt (C) − µinv (C)| ≤ |µt (C1 ) − µinv (C1 )| + · · · + |µt (Ck ) − µinv (Ck )|.

Every term on the right side tends to zero as t → ∞ : running the coupled
processes from µ0 and from µinv , the two marginals at time t are µt and
µinv , and they can differ at the support of Ci only at a break, whose
probability does not exceed coef^t . Hence the right side tends to zero,
whence the left side also tends to zero. Theorem 9.2 is proved.
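The coupling used in this proof is also easy to simulate. Below is a minimal Python sketch on a ring of L sites instead of Zd , with an illustrative choice of D ( n = 2 neighbors, f = AND) and α = β = 0.3 , which satisfy (38); all names are ours, and the line numbers in the comments refer to the pseudo-code above.

import random

def step(x, y, m, alpha, beta, rng, f, nbrs):
    # One coupled step of the three marginals on a ring, as in lines 5-18.
    L = len(x)
    nx, ny, nm = [0] * L, [0] * L, [0] * L
    for v in range(L):
        nx[v] = f(*(x[(v + u) % L] for u in nbrs))        # line 7
        ny[v] = f(*(y[(v + u) % L] for u in nbrs))        # line 8
        nm[v] = max(m[(v + u) % L] for u in nbrs)         # line 9
        r = rng.random()                                  # line 10: common noise
        if r < alpha:                                     # lines 11-14
            nx[v], ny[v], nm[v] = 0, 0, 0
        elif r > 1 - beta:                                # lines 15-18
            nx[v], ny[v], nm[v] = 1, 1, 0
    return nx, ny, nm

rng = random.Random(1)
L = 200
x, y, m = [0] * L, [1] * L, [1] * L   # two extreme initial conditions, all marked
for t in range(30):
    x, y, m = step(x, y, m, 0.3, 0.3, rng, lambda a, b: a & b, (0, 1))
print(sum(xi != yi for xi, yi in zip(x, y)), sum(m))   # breaks and marks

Here coef = n(1 − (α + β)) = 0.8 , so after 30 steps the expected density of breaks is at most 0.8^30 , and indeed both printed counts are typically zero.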
Order on configurations.
Let us assume that the alphabet A is ordered (perhaps, partially). For
example, if A = {0, 1, . . . , m} , it may be ordered in the same way as on
the number line:
0 ≺ 1 ≺ · · · ≺ m − 1 ≺ m.
We shall use the signs ≺ and ≻ and the words precedes and succeeds when
speaking about this order. For example, when A = {0, 1} , we typically assume that
0 ≺ 1 , which means that zero precedes one, and (which is the same) 1 ≻ 0 ,
which means that one succeeds zero. These relations are a generalization
of the relations ≤ and ≥ among real numbers, but in the present
case there may be incomparable elements. For any x, y ∈ A we assume
that:
(a) x ≺ y means the same as y ≻ x ,
(b) if x ≺ y and x ≻ y , then x = y .
Let us introduce a partial order on AG by saying that configuration x
precedes configuration y (or, what is the same, y succeeds x ) and writing
x ≺ y or y ≻ x if xi ≺ yi for all i ∈ G .
In chapter 4 we have called a deterministic operator D monotonic if
x ≺ y implies D x ≺ D y . This definition is consistent with what we
said before. Let us say that a measurable set S ⊂ Ω is upper if
(x ∈ S and x ≺ y) =⇒ y ∈ S.
Analogously, a set S is lower if
(y ∈ S and x ≺ y) =⇒ x ∈ S.
It is easy to check that a complement to an upper set is lower and vice
versa.
Order on measures.
We introduce a partial order on M by saying that a normalized measure
µ precedes ν (or ν succeeds µ ) if µ(S) ≤ ν(S) for any upper set S (or,
equivalently, µ(S) ≥ ν(S) for any lower set S ).
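For a feeling of this definition, it can be verified by brute force in a tiny case. The following sketch, assuming A = {0, 1} , three sites and Bernoulli product-measures (the helper names are ours), checks that µp precedes µq for p = 0.3 ≤ q = 0.7 by enumerating all upper sets:

from itertools import combinations, product

configs = list(product([0, 1], repeat=3))      # all configurations on 3 sites

def is_upper(S):
    # S (a set of configurations) is upper if it is closed upward.
    return all(y in S for x in S for y in configs
               if all(a <= b for a, b in zip(x, y)))

def mass(p, S):
    # Mass of the set S under the Bernoulli(p) product-measure.
    total = 0.0
    for x in S:
        w = 1.0
        for xi in x:
            w *= p if xi == 1 else 1 - p
        total += w
    return total

for k in range(len(configs) + 1):
    for S in map(set, combinations(configs, k)):
        if is_upper(S):
            assert mass(0.3, S) <= mass(0.7, S) + 1e-12
print("mu_0.3 precedes mu_0.7 on every upper set")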
Order on operators.
We call an operator P : M → M monotonic if µ ≺ ν implies P µ ≺ P ν .
Also let us say that operator P1 precedes operator P2 (or P2 succeeds P1 )
and write P1 ≺ P2 or P2 ≻ P1 if P1 µ ≺ P2 µ for all measures µ .
Notice that there are incomparable configurations, neither of which precedes
the other, for example (0, 1) and (1, 0) . Similarly there are incomparable
measures and incomparable operators. Notice also that all our definitions
are consistent: if we consider a deterministic operator as a degenerate
random operator, our definitions of monotonicity coincide.
Lemma 9.2. Let us have two product-measures µ, ν ∈ M , where
µ = ∏i µi and ν = ∏i νi . Then µ ≺ ν if and only if µi ≺ νi for all i .
Proof is easy and we omit it.
Of course, a composition of monotonic operators is monotonic, so to prove
that a composition of two operators is monotonic, it is sufficient to check
the monotonicity of each. How can we check monotonicity of an operator?
Lemma 9.3. An operator P defined by (26) is monotonic if and only
if all the transition distributions θi (·|x) are monotonic functions of x ,
that is
x ≺ y =⇒ θi (·|x) ≺ θi (·|y) .
(40)
Proof of lemma 9.3. In one direction: suppose that (40) is false, that is,
there are i and y ≺ z such that θi (·|y) does not precede θi (·|z) . Then
δ(y) ≺ δ(z) serve as those µ ≺ ν for which P µ does not precede P ν ,
because both are product-measures and the i -th factor of P µ does not
precede the i -th factor of P ν . By lemma 9.2 this is sufficient. In the
opposite direction the statement also follows from lemma 9.2.
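In the binary case A = {0, 1} a distribution on A precedes another one if and only if it gives no more mass to 1, so condition (40) can be checked mechanically. A Python sketch (both rules below are illustrative, not from the text):

from itertools import product

def is_monotonic(theta, n):
    # Check (40) for a binary PCA: theta(x) is the probability that a
    # component becomes 1 given the neighborhood word x of length n.
    for x in product([0, 1], repeat=n):
        for y in product([0, 1], repeat=n):
            if all(a <= b for a, b in zip(x, y)) and theta(x) > theta(y):
                return False
    return True

noisy_majority = lambda x: 0.9 if sum(x) >= 2 else 0.1
noisy_parity   = lambda x: 0.9 if sum(x) % 2 else 0.1
print(is_monotonic(noisy_majority, 3))   # True: by lemma 9.3 the operator is monotonic
print(is_monotonic(noisy_parity, 3))     # False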
Lemma 9.4. Given two operators P1 and P2 with one and the same Ω
and transition distributions θi1 (·|x) and θi2 (·|x) respectively. Then P1 ≺
P2 if and only if
θi1 (·|x) ≺ θi2 (·|x)
(41)
for all i ∈ G and x ∈ Ω .
Proof of lemma 9.4. In one direction: let us assume that θi1 (·|y) does
not precede θi2 (·|y) for some i ∈ G and some y ∈ Ω and prove that P1
does not precede P2 , that is, there exists a measure µ such that P1 µ does
not precede P2 µ . Let us take µ = δ(y) . Then both P1 µ and P2 µ are
product-measures, the i -th factors of which violate the condition of lemma
9.2, whence P1 µ does not precede P2 µ . In the opposite direction: now
assume (41) and prove that P1 µ ≺ P2 µ for all µ . It is sufficient to prove
this for delta-measures, for which it follows from lemma 9.2. Lemma 9.4
is proved.
Lemma 9.5. Let A = {0, 1} . If P is monotonic, then the sequences P t δ0
and P t δ1 converge and P is ergodic if and only if their limits coincide.
Proof. It is easy to prove by induction that

δ0 ≺ P δ0 ≺ P 2 δ0 ≺ P 3 δ0 ≺ P 4 δ0 ≺ . . .

and

δ1 ≻ P δ1 ≻ P 2 δ1 ≻ P 3 δ1 ≻ P 4 δ1 ≻ . . . .

Indeed, in each case the first inequality is evident, because δ0 precedes and
δ1 succeeds any measure, and all the other inequalities follow from the first
one by repeated application of the monotonic operator P .
Thus for any upper or lower set C the sequences P t δ1 (C) and P t δ0 (C)
are monotonic and therefore each of them has a limit. Now let us take
any thin cylinder C and denote

C̄ = {x | ∃ y ∈ C : y ≺ x} and C ′ = C̄ \ C.

It is easy to show that C̄ and C ′ are upper sets, so the values of P t δ0
and P t δ1 at these sets have limits. Since µ(C) = µ(C̄) − µ(C ′ ) for every
normalized measure µ , their values at C also have limits, which means
that these measures converge. Lemma 9.5 is proved.
One example of application of lemma 9.5: the sequence P t δ1 for the
Stavskaya operator P has a limit; since from (6)

P t δ1 (x0 = 0) ≤ 27α / (1 − 27α) for all t,

the same inequality is true for the limit measure, whence for small values
of α the Stavskaya operator has at least two different invariant measures.
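This pair of invariant measures is easy to observe in a simulation. The sketch below assumes a common convention for the Stavskaya operator ( (D x)i = max(xi , xi+1 ) , after which every component turns into 0 with probability α independently); the exact definition is the one given in chapter 3, so treat this only as an illustration.

import random

def stavskaya_step(x, alpha, rng):
    # One step on a ring: deterministic OR with the right neighbor,
    # then independent noise writing 0 with probability alpha.
    L = len(x)
    return [0 if rng.random() < alpha else max(x[i], x[(i + 1) % L])
            for i in range(L)]

rng = random.Random(2)
L, T, alpha = 1000, 500, 0.02
x = [1] * L                        # start from delta_1
for _ in range(T):
    x = stavskaya_step(x, alpha, rng)
print(sum(v == 0 for v in x) / L)  # density of zeros stays small

Starting from ”all zeros” instead, the configuration stays ”all zeros” forever, since max(0, 0) = 0 and the noise only writes zeros; so for small α the two limits are different, in agreement with lemma 9.5.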
Exercise 9.1. Given two measures µ and ν on one and the same sigma-algebra
generated by cylinders in Ω such that µ ≺ ν and ν ≺ µ . May we
conclude that µ = ν ?
Exercise 9.2. Prove that the operator Rαβ defined in (4) is monotonic
whenever α + β ≤ 1 .
Chapter 10
Undecidability of the problem of ergodicity for PCA
The main result of this chapter is undecidability of the problem of ergodicity
of cellular automata. Let us think about what we really want to achieve
in dealing with cellular automata, in particular in the study of their
ergodicity. Mathematics is an abstract science and we, mathematicians,
want to prove general theorems. In the present case we want to have
criteria of ergodicity for as large classes of cellular automata as possible.
However, it is known that some general problems in all areas of mathematics
cannot be solved in the algorithmic sense. It is only natural that
in dealing with cellular automata we face such situations very often, because
our object is very general. When we meet an undecidable problem,
it means that we are working close to the boundary of what is possible at all.
This moves us to treat with more respect those partial results which we
have obtained: perhaps they are close to the best that can be achieved.
In this chapter we shall show that the problem of deciding which cellular
automata are ergodic and which are not is algorithmically unsolvable for a
certain class of them. There are several formalizations of the notion of
algorithm. We choose one of them, namely Turing machines, named after
their inventor Alan Turing, because they are the most similar to cellular
automata. In fact we shall use the following class of Turing machines with
one head and one bi-infinite tape. A Turing machine of this class consists
of a head and a tape. The tape is infinite in both directions and consists
of cells enumerated by integer numbers. Every cell can be in one of finitely
many states and the set G of states is one and the same for all cells. The
head also has a finite set H ∪ {stop} of states, where one state, called
stop , plays a special role described below. At every step of the discrete
time the head observes one cell of the tape as shown in figure 10.1.
Figure 10.1. A Turing machine: the head observes one cell of the tape.
At every step of the discrete time the head observes one cell, exchanges
information with it and then moves to another cell.
Also we choose three functions:
Ftape : G × H → G,
Fhead : G × H → H ∪ {stop} ,
Fmove : G × H → {←, →} .
When the machine starts, the tape is ”empty”, which means that all cells
are filled with the initial symbol g1 ∈ G . The head is in the initial state
h1 ∈ H and the head observes the 0 -th cell of the tape. At every step the
head simultaneously writes a new symbol, according to the function Ftape ,
into the cell of the tape which it observes, goes to a new state according
to the function Fhead , and moves one cell left or one cell right along the
tape according to the values ←, → of the function Fmove respectively, the
arguments of all the three functions being the symbol in the presently
observed cell of the tape and the present state of the head. The machine
stops when and if the head reaches the state stop . (That is why we don’t
need to define our functions when the head is in the state stop .) It is
well-known that the problem of deciding which of these machines ever
stop is algorithmically unsolvable, that is, there is no algorithm capable
of predicting, for all Turing machines, which of them ever stop having started
with the empty tape. This famous theorem is described in many books. We
shall use it to prove another undecidability theorem, this time about cellular
automata.
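To make this class concrete, here is a minimal Python simulator of such a machine. The particular transition tables at the end are an illustrative toy, not taken from the text.

def run(F_tape, F_head, F_move, g1, h1, max_steps):
    # Simulate the machine on an initially "empty" tape (all cells hold g1).
    # Returns the number of steps if it stops within max_steps, else None.
    tape = {}                      # absent cells hold the initial symbol g1
    pos, state = 0, h1
    for step in range(1, max_steps + 1):
        g = tape.get(pos, g1)                 # symbol in the observed cell
        tape[pos] = F_tape[g, state]          # write a new symbol
        new_state = F_head[g, state]          # go to a new state
        pos += -1 if F_move[g, state] == "<" else 1   # move left or right
        state = new_state
        if state == "stop":
            return step
    return None        # gave up; in general this question is undecidable

# Toy machine over G = {0, 1}, H = {"a", "b"}: it stops after two steps.
F_tape = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 1}
F_head = {(0, "a"): "b", (0, "b"): "stop", (1, "a"): "b", (1, "b"): "stop"}
F_move = {(0, "a"): ">", (0, "b"): ">", (1, "a"): ">", (1, "b"): ">"}
print(run(F_tape, F_head, F_move, g1=0, h1="a", max_steps=100))   # prints 2

No such simulator can replace max_steps by infinity and always answer correctly: that is exactly the unsolvability of the halting problem which we use below.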
In the greater part of mathematics, a theoretical result is the better, the
more general it is, that is, the larger the class of objects under consideration.
However, when we prove algorithmic unsolvability, it is the
other way round: the result is the more valuable, the smaller the class
of objects for which it is proved. For this reason we minimize our class as
much as possible: the arbitrary number n of states of a single component
is the only source of infiniteness of our class; everything else is minimized:
the dimension is 1, the interaction is only between nearest neighbors and
all the transition probabilities are either 0 or 1/2 or 1.
Our configuration space is {1, . . . , n}Z , where n is a natural parameter. A
cellular automaton is an operator, which is a composition of two operators:
first deterministic D , then random R . Our deterministic operator is
determined by a function f : {1, . . . , n}3 → {1, . . . , n} in the following
way: it transforms any configuration x into D x , where
( D x)i = f (xi−1 , xi , xi+1 ) for all i ∈ Z.
(42)
Our random operator R is very simple: all components of a configuration
which are not in the state n turn into the state 1 with probability
1/2 independently of each other. An operator P is called ergodic if the
distribution P t µ tends to one and the same limit distribution from all
initial conditions µ .
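In Python, one step of such an operator P = R D on a ring of L sites (a finite approximation, since the true process lives on Z ) might look as follows, with f and n being the parameters of the class:

import random

def step_P(x, f, n, rng):
    # One step of P = R D: first the deterministic rule (42) on a ring,
    # then R: every component not in state n becomes 1 with probability 1/2.
    L = len(x)
    y = [f(x[(i - 1) % L], x[i], x[(i + 1) % L]) for i in range(L)]   # D
    return [1 if (v != n and rng.random() < 0.5) else v for v in y]   # R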
Theorem 10.1. There is no algorithm to decide which cellular automata
of the class described above are ergodic and which are not [19, 36].
Our method of proof consists in the following: for every Turing machine
of the class described above we construct a cellular automaton of the
class described above, which is ergodic if and only if that Turing machine
stops having started on an empty tape. This is sufficient to prove that
the problem of deciding which of our cellular automata are ergodic is
unsolvable because if it were solvable, the problem of deciding which of
Turing machines stop would be solvable also, but it is known that it is not.
In more detail: if the ergodicity problem were solvable, then we could take
a Turing machine, construct the corresponding cellular automaton, apply
to it that hypothetical deciding procedure, decide whether it is ergodic or
not, and conclude whether the Turing machine stops or not.
Thus for every Turing machine M of the class described above we shall
construct an operator P which is ergodic if and only if M stops. In
fact, P imitates the functioning of M in the following way: under its
action, components at every time step randomly (each with probability 1/2 )
turn into heads of M in the initial state. Every head marks its territory
with brackets and imitates the functioning of M on its territory. This
functioning may be interrupted by other heads, but measures are taken
to prevent the heads from collaborating: as soon as a head gets any sign
of the presence of another head, it commits suicide. If M never stops, the
process remains in this messy regime forever. If M stops, some components
go to a special final state, which starts an ”epidemic” by turning their
neighbors into the same state, so that the process tends to δ(all final) ,
the measure concentrated in the configuration ”all components are in the
final state”. In fact, this measure is invariant in all cases, but the process
tends to it from all initial conditions only if M stops.
Let us now define this operator P explicitly. We set

S = Sleft × Sright × Stape × Shead ,
where
Sleft = Sright = {0, 1} ,
Stape = G,
Shead = H ∪ {0, stop} .
Accordingly, we write a generic element of S as
x = (left(x), right(x), tape(x), head(x)).
We say that a state x has a left bracket if left(x) = 1 and that it has a
right bracket if right(x) = 1 .
We call x a no-head if head(x) = 0 and a head otherwise. We call x a
stop-head if head(x) = stop .
The state (0, 0, g1 , 0) is called empty , the state (1, 1, g1 , h1 ) is called
newborn and the state (0, 0, g1 , stop) is called final .
For brevity we shall write F∗ (x) = F∗ (tape(x), head(x)) , where ∗ means
‘tape’, ‘head’ or ‘move’.
We say that a head x wants to move left or to move right when Fmove (x)
equals ← or → respectively.
We need all these states to make our process imitate the functioning of
the Turing machine M . The tape component imitates what is written on the
tape; the head component imitates what is in the head, or its absence if it
is zero. The left and right brackets are necessary to exclude interference
of the heads.
Our operator P is a composition P = R D , where R turns any state
except final into the newborn state with probability 1/2 independently.
It remains to define the deterministic operator D , that is to define the
function f (·) in formula (42). Its definition consists of several rules.
Rule 0. If x or y or z is a stop-head, then f (x, y, z) = final .
Formulating all the other rules, we assume that neither x nor y nor z
is a stop-head. We call a triple (x, y, z) ∈ S 3 normal if at most one of
x, y, z is a head.
Rule 1. Whenever the triple (x, y, z) is not normal, f (x, y, z) = empty .
In all the following rules we assume that the triple (x, y, z) is normal.
Rule 2. If all of x, y, z are no-heads, then f (x, y, z) = y .
All the subsequent rules form three groups depending on which of the three
arguments is a head: center, denoted y , or its left neighbor, denoted x ,
or its right neighbor, denoted z . The center-rules:
Rule 1-center. If y is a head which wants to move left, then
f (x, y, z) = (0, right(y), Ftape (y), 0).
Rule 2-center. If y is a head which wants to move right, then
f (x, y, z) = (lef t(y), 0, Ftape (y), 0).
Since the left rules and the right rules are symmetric, we shall omit the
right ones. The left-rules:
Rule 1-left. If x is a head, which wants to move right and has a right
bracket, then
f (x, y, z) = (0, 1, g1 , Fhead (x)).
Rule 2-left. If x is a head, which wants to move right and has no right
bracket and y has no left bracket, then
f (x, y, z) = (0, right(y), tape(y), Fhead (x)).
Rule 3-left. If x is a head, which wants to move right and has no right
bracket, but y has a left bracket, then f (x, y, z) = y .
Rule 4-left. If x is a head, which wants to move left, then f (x, y, z) = y .
The right-rules are analogous to left-rules, only with right and left permuted.
Our operator D is defined. To make the operator R satisfy the promised
condition, it is sufficient to choose n equal to the cardinality of S and
enumerate the states of S so that the newborn state gets number 1 and
the final state gets number n .
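For concreteness, here is our reading of these rules in Python; a state is a 4-tuple (left, right, tape, head) , head state 0 means ‘no head’, and F_tape , F_head , F_move are the tables of the machine M , keyed by (tape symbol, head state) with moves written as ”<” and ”>” as in the simulator sketched earlier. This is a sketch of our interpretation, not a verified implementation.

def make_f(F_tape, F_head, F_move, g1):
    # Build the local rule f of (42) from the tables of the machine M.
    def is_head(s): return s[3] != 0
    def is_stop(s): return s[3] == "stop"
    def wants(s):   return F_move[s[2], s[3]]          # "<" or ">"
    EMPTY = (0, 0, g1, 0)
    FINAL = (0, 0, g1, "stop")

    def f(x, y, z):
        if is_stop(x) or is_stop(y) or is_stop(z):     # rule 0
            return FINAL
        heads = sum(map(is_head, (x, y, z)))
        if heads > 1:                                  # rule 1: not normal
            return EMPTY
        if heads == 0:                                 # rule 2
            return y
        if is_head(y):
            if wants(y) == "<":                        # rule 1-center
                return (0, y[1], F_tape[y[2], y[3]], 0)
            return (y[0], 0, F_tape[y[2], y[3]], 0)    # rule 2-center
        if is_head(x):
            if wants(x) == ">":
                if x[1] == 1:                          # rule 1-left
                    return (0, 1, g1, F_head[x[2], x[3]])
                if y[0] == 0:                          # rule 2-left
                    return (0, y[1], y[2], F_head[x[2], x[3]])
                return y                               # rule 3-left: y intact
            return y                                   # rule 4-left
        # the right rules, mirror images of the left ones:
        if wants(z) == "<":
            if z[0] == 1:
                return (1, 0, g1, F_head[z[2], z[3]])
            if y[1] == 0:
                return (y[0], 0, y[2], F_head[z[2], z[3]])
            return y
        return y
    return f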
Lemma 10.1. Operator P = R D is ergodic if and only if the Turing
machine M stops.
Proof of lemma 10.1. Due to rule 0, the measure δ(all final) , concentrated in the configuration “all components are in the final state”, is
invariant for P . Therefore P is ergodic if and only if our process tends
to δ(all final) from any initial configuration. Now let us argue in two
directions.
One direction: Let us suppose that M stops after T steps and prove that
our process tends to δ(all final) from any initial configuration. Let us
consider a region [s0 − 2T, s0 + 2T ] ⊂ Z , where s0 is any integer number.
If a stop-head is present there, it turns into final , which expands in both
directions due to rule 0. If there is no stop-head there, then the following
scenario has a positive probability: First, at some time t0 births occur in
all sites in the range [s0 − 2T, s0 + 2T ] . At the next time step all of these
sites become empty. At the next time step birth occurs at the middle site
s0 and this is the only birth that occurs in the space-time region
{(s, t) | s0 − 2T + (t − t0 ) ≤ s ≤ s0 + 2T − (t − t0 ), 0 < t − t0 ≤ 2T } .
Under these conditions, we are dealing with configurations imitating the
functioning of M during time long enough for M to stop. As soon as
the head stops, it turns into final , which expands in both directions due
to rule 0. This scenario has a positive probability, so somewhere it happens
almost surely, whence our process tends to δ(all final) .
The other direction: let us assume that M never stops, i.e. continues to
function forever having started with the empty tape. Let us take the initial
measure concentrated in the configuration “all components are in the empty
state” and prove that the resulting distributions cannot contain a stop-head
with a positive probability and therefore cannot tend to δ(all final) .
This would be evident if every head functioned alone, never interacting
with other heads.
Let us show that in our process every head either functions as if it were
alone or disappears. In our construction every head creates its own ”territory”,
marked by a left bracket at its left end and by a right bracket at
its right end. This territory consists of the sites which this head has visited.
However, this territory may be invaded by another head, which changes
the states of the cells it visits, and our head must recognize when this
happens. Every time a head wants to move beyond its territory,
it carries the bracket one step further, perhaps invading another head's
territory. In this case, due to rule 1-left, it changes the symbol on the tape
to the initial one, so its functioning does not differ from the functioning of a
solitary head on a tape which was empty at the beginning. The crucial
question is what happens when some head returns to a place which was
its territory, but was invaded by another head. If our head does not notice
that the site was invaded and uses a symbol written there by another head,
it may eventually stop although it would not stop if it functioned alone. We
must avoid this.
Let us examine the situation in more detail. Since right and left are
symmetric, it is sufficient to examine what happens if some head wants
to move right. If it has a right bracket, it means that it is expanding its
territory; in this case it can do so due to rule 1-left, and in doing so it will
erase the former tape symbol and write the initial symbol, as if it were
alone on the tape. If it has no right bracket and its right neighbor has
no left bracket, it means that it is moving within its own territory and
goes to a place which has never been invaded, so the symbol in the right
neighbor was written by itself - see rule 2-left. However, if it has no right
bracket, but its right neighbor has a left bracket, it means that another
head has visited this site. Having discovered that it is not unique, our
head gets so angry that it commits suicide, that is, turns into a no-head.
In more detail: on one hand, due to rule 2-center, our head is no longer
where it was; on the other hand, this head does not emerge
in its right neighbor cell due to rule 3-left, and the cell which it wanted
to invade remains intact. All this assures that every head either moves
within its own territory, never visited by other heads, expanding it and
imitating the functioning of the original Turing machine with one head,
or disappears. Therefore the probability that a stop-head will ever emerge
is zero and the process does not tend to δ(all final) . Thus lemma 10.1 is
proved, whence theorem 10.1 immediately follows.
Exercise 10.1. Let us consider the class of operators Rαβ D on {0, 1}G ,
where G = Zd and D is any operator defined by formula (37) with only
one restriction: the number of neighbors is n = 1 . Present an algorithm,
which decides for all operators of this class, which of them are ergodic and
which are not.
Exercise 10.2. Let us consider the class of monotonic deterministic operators
D : Ω → Ω defined by (37), where Ω = {0, 1}G and G = Zd .
Present an algorithm to decide which of these operators are ergodic.
(Since every deterministic operator can be interpreted as a random operator,
we can apply the notion of ergodicity to them.)
Exercise 10.3. Prove that the problem of deciding which cellular automata
have only one invariant measure is algorithmically unsolvable.
This statement is similar to theorem 10.1, but it is not identical with it and
needs to be proved separately, because we don't yet know whether uniqueness
of an invariant measure implies ergodicity.
Main terms and notations
Z - the set of integer numbers.
Zd - the d -dimensional integer space.
R - the set of real numbers.
Rd - the d -dimensional real space.
Percolation model - a graph whose edges may be open or closed.
Path in a graph - a finite or infinite sequence ”vertex-edge-vertex-edge...”,
where every edge connects those vertices between which it is placed in this
sequence.
Contour - a path in which the first and last vertices coincide.
G - ground space, a finite or countable set, discrete analog of physical
space.
A - a non-empty finite set called alphabet, the set of states of any
component.
Letter - any element of A .
AG - the configuration space, its elements are called configurations.
Configuration space - product-space Ω = AG .
Configuration - an element x ∈ Ω of the configuration space, determined
by its components xv for all v ∈ G .
Thin cylinder - subset of Ω of the form
{x ∈ Ω : xi1 = ai1 , . . . , xin = ain } .
Support of this thin cylinder is the set {i1 , . . . , in } .
Normalized measure µ on Ω is defined by its values on thin cylinders.
”Normalized” means µ(Ω) = 1 .
M - set of normalized measures on Ω .
Delta-measure δ(x) - measure concentrated on one configuration x .
Product-measure - a measure on a product-space, in which all the
marginals are independent.
P µ - result of application of operator P to measure µ .
Transition distribution θi (·|x) - distribution of the i -th component according to the measure P δ(x) .
Transition probability θi (y|x) - probability that the i -th component equals
y according to the measure P δ(x) .
Degenerate measure - a measure on the sigma-algebra generated by thin cylinders which equals zero on at least one thin cylinder.
Degenerate cellular automaton - a cellular automaton, at least one transition probability of which is zero.
Composition P Q of two operators P and Q - an operator whose action
consists of applying first Q , then P .
Uniform measure - a measure, which is invariant under space shifts.
Uniform operator - an operator which commutes with space shifts.
Invariant measure: a measure µ ∈ M is called invariant for operator P
if P µ = µ .
Ergodicity: Operator P : M → M is called ergodic if the limit
limt→∞ P t µ exists and is one and the same for all µ ∈ M .
A deterministic operator D is monotonic if x ≺ y implies D x ≺ D y .
A random operator P : M → M is monotonic if µ ≺ ν implies P µ ≺
P ν.
Coupling of two or more measures - a measure on a product-space, whose
marginals are given measures.
Coupling of two or more processes - a process on a product-space, whose
marginals are given processes.
Finite deviation - given two configurations x, y ∈ Ω , we call x and y finite
deviations of each other if the set {g ∈ G : xg ≠ yg } is finite.
A configuration x ∈ Ω is called an attractor of a deterministic operator
D if x is invariant for D and for any finite deviation y of x there is a
natural t such that D t y = x .
Shift of a set S in a linear space by a vector v , denoted S + v - the set
{i + v | i ∈ S} .
Vector sum of two sets S1 , S2 in a linear space: S1 + S2 = {i + j | i ∈ S1 , j ∈ S2 } .
Convex set - a set in a linear space which, together with any two points
a, b , contains the segment [a, b] .
Convex hull of a set S in a linear space - intersection of all convex sets
containing S .
Turing machine - an abstract ”machine” proposed by Alan Turing as a
formalization of the notion of algorithm.
References
[1] Robert B. Ash. Measure, Integration, and Functional Analysis. Academic Press,
New York and London (1972).
[2] C. Bennett and G. Grinstein. Role of Irreversibility in Stabilizing Complex and
Nonergodic Behavior in Locally Interacting Discrete Systems. Phys. Rev. Letters, v.
55 (1985), n. 7, pp. 657-660.
[3] P. Berman and J. Simon. Investigations of Fault-Tolerant Networks of Computers.
ACM Symp. on Theory of Computing, v. 20 (1988), pp. 66-77.
[4] M. Bramson and L. Gray. A Useful Renormalization Argument. Festschrift for F.
Spitzer. Birkhäuser, Boston, MA.
[5] S. R. Broadbent and J. M. Hammersley. Percolation processes I. Crystals and mazes.
Proceedings of the Cambridge Philosophical Society, v. 53 (1957), pp. 629-641.
[6] R. Durrett. Probability: Theory and Examples. 4th edition (2010).
[7] R. E. Edwards. Functional Analysis: Theory and Applications. Dover Publications,
Inc., N.Y. (1995).
[8] P. Gacs, G. Kurdyumov and L. Levin. One-dimensional homogeneous media, which
erode finite islands. Problems of Information Transmission, v. 14 (1978), pp. 223-226. (Translated from Russian.)
[9] P. Gács. Reliable cellular automata with self-organization. Journal of Stat. Physics,
v. 103 (2001), n. 1/2, pp. 45-267.
[10] G. Galperin. Homogeneous local monotone operators with memory. Doklady of Soviet Acad. of Sciences, 228 (1976), pp. 277-280. (Translated from Russian.)
[11] L. Gray. The Positive Rates Problem for Attractive Nearest Neighbor Spin Systems
on Z . Z. Wahrscheinlichkeitstheorie verw. Gebiete, v. 61 (1982), pp. 389-404.
[12] L. Gray. A reader’s guide to P. Gács’s ”positive rates” paper: ”Reliable cellular
automata with self-organization” [J. Statist. Phys. 103 (2001), no. 1-2, 45-267].
Journal of Stat. Physics, v. 103 (2001), n. 1-2, pp. 1-44.
[13] G. Grimmett. Percolation. Springer (1999).
[14] O. Häggström. Computability of Percolation Thresholds. In and Out of Equilibrium
2 (2007), Progress in Probability, v. 60 (2008), pp. 321-329, ed. by V. Sidoravicius
and M. Vares.
[15] F. C. Hennie. Iterative Arrays of Logical Circuits. MIT Press Classics (1961).
[16] H. Kesten. Percolation Theory for Mathematicians. Birkhauser, Boston (1982).
[17] A. N. Kolmogorov and S. V. Fomin. Measure, Lebesgue Integrals, and Hilbert Space.
Academic Press, New York and London, 1961. (Translated from Russian.)
[18] A. N. Kolmogorov and S. V. Fomin. Elements of the Theory of Functions and Functional Analysis. Volume 2. Measure. The Lebesgue Integral. Hilbert Space. (Translated from Russian.)
[19] G. I. Kurdyumov. An algorithm-theoretic Method for the Study of Uniform Random
Networks. Multicomponent Random Systems ed. by R. L. Dobrushin and Ya. G.
Sinai. Advances in Probability and Related Topics, v. 6 (1980), pp. 471-503. Contributing editor D. Griffeath, series editor P. Ney. Marcel Dekker, Inc. New York
and Basel. (Translated from Russian.)
[20] L. D. Landau and E. M. Lifshitz. Statistical Physics. (Vol. 5 of Course of Theoretical
Physics.) 2nd edition. Pergamon Press, 1969.
[21] J. L. Lebowitz, C. Maes and E. R. Speer. Statistical mechanics of probabilistic cellular automata. Journal of Stat. Physics, 59 (1990), 1-2, 117-168.
[22] Mathematical Physics in One Dimension. Exactly Soluble Models of Interacting Particles. A Collection of Reprints with Introductory Text by Elliott H. Lieb and Daniel
C. Mattis. N.Y., Academic Press, 1966.
[23] Thomas M. Liggett. Interacting Particle Systems. N.Y., Springer-Verlag, 1985 and
2005.
[24] M. L. Menezes and A. Toom. A non-linear eroder in presence of one-sided noise.
Brazilian Journal of Probability and Statistics, vol. 20, n. 1, June 2006, pp. 1-12.
[25] J. von Neumann and A. W. Burks. Theory of self-reproducing automata. Urbana,
University of Illinois Press, 1966.
[26] P. Lima and A. Toom. Dualities Useful in Bond Percolation. Cubo, vol. 10 (2008),
n. 3, pp. 93-102.
[27] N. Petri. Unsolvability of the recognition problem for annihilating iterative networks.
Selecta Mathematica Sovietica, vol. 6 (1987), pp. 354-363. (Translated from Russian.)
[28] R. T. Rockafellar. Convex analysis. Princeton University Press, 1970.
[29] P. G. de Sá and C. Maes. The Gacs-Kurdyumov-Levin Automaton Revisited. Journal
of Stat. Physics, v. 67 (1992), n.3/4, pp. 507-522.
[30] Luis de Santana. Velocities a là Galperin in Continuous Spaces. Doctoral Thesis
defended on July 26, 2012 at CCEN/UFPE, Brazil.
[31] O. Stavskaya and I. Piatetski-Shapiro. On homogeneous nets of spontaneously active
elements. Systems Theory Res., v. 20 (1971), pp. 75-88. (Translated from Russian.)
[32] T. Toffoli and N. Margolus. Programmable matter, concepts and realization. Physica
D, v. 47 (1991), pp. 263-272.
[33] A. Toom and L. Mityushin. Two Results regarding Non-Computability for Univariate
Cellular Automata. Problems of Information Transmission, v. 12 (1976), n. 2, pp.
135-140. (Translated from Russian.)
[34] A. Toom. Unstable Multicomponent Systems. Problems of Information Transmission,
v. 12 (1976), n. 3, pp. 220-224. (Translated from Russian.)
[35] A. Toom. Stable and attractive trajectories in multicomponent systems. Multicomponent Random Systems, ed. by R. Dobrushin and Ya. Sinai. Advances in Probability
and Related Topics, v. 6 (1980), pp. 549-575. Contributing editor D. Griffeath, series
editor P. Ney. Marcel Dekker, Inc. New York and Basel. (Translated from Russian.)
[36] A. Toom, N. Vasilyev, O. Stavskaya, L. Mityushin, G. Kurdyumov and S. Pirogov.
Discrete Local Markov Systems. Stochastic Cellular Systems : Ergodicity, Memory,
Morphogenesis. Ed. by R. Dobrushin, V. Kryukov and A. Toom. Nonlinear Science:
theory and applications, Manchester University Press, 1990, pp. 1-182. (Translated
from Russian.)
[37] A. Toom. Cellular Automata with Errors: Problems for Students of Probability.
Topics in Contemporary Probability and its Applications. Ed. by J. L. Snell. Series
Probability and Stochastics ed. by R. Durrett and M. Pinsky. CRC Press (1995),
pp. 117-157.
[38] A. Toom. Every continuous operator has an invariant measure. Journal of Stat.
Physics, v. 129 (2007), n. 3, pp. 555-566.
[39] N. Vasilyev, M. Petrovskaya and I. Piatetski-Shapiro. Modelling of voting with random errors. Automation and Remote Control, v.10 (1970), pp. 1639-1642. (Translated from Russian.)