Automata in Number Theory
Boris Adamczewski
Automatic sequences
Automatic sequences
Let k ≥ 2 be an integer. A sequence (an ) is
k-automatic if there exists a finite automaton taking
the base-k expansion of n as input and producing the
term an as output.
Automatic sequences
Let k ≥ 2 be an integer. A sequence (an ) is
k-automatic if there exists a finite automaton taking
the base-k expansion of n as input and producing the
term an as output.
More formal definition. Let Σk denote the alphabet {0, 1, . . . , k − 1}. A
k-automaton is a 6-tuple
A = (Q, Σk , δ, q0 , ∆, τ ) ,
where :
• Q is a finite set of states,
• q0 ∈ Q is the initial state
• δ : Q × Σk → Q is the transition function,
• ∆ is the output alphabet
• τ : Q → ∆ is the output function.
Automatic sequences
Let k ≥ 2 be an integer. A sequence (an ) is
k-automatic if there exists a finite automaton taking
the base-k expansion of n as input and producing the
term an as output.
More formal definition. Let Σk denote the alphabet {0, 1, . . . , k − 1}. A
k-automaton is a 6-tuple
A = (Q, Σk , δ, q0 , ∆, τ ) ,
where :
• Q is a finite set of states,
• q0 ∈ Q is the initial state
• δ : Q × Σk → Q is the transition function,
• ∆ is the output alphabet
• τ : Q → ∆ is the output function.
P
If n = ri=0 wi k i , we set q1 = δ(q0 , wr ), q2 = δ(q1 , wr −1 ), . . . , qr +1 = δ(qr , w0 ) and
an = τ (qr +1 ).
We say that the the sequence (an ) with values in ∆ is produced by the k-automaton
A.
Eventually periodic sequences
Exercise. It is not hard to prove that all eventually periodic sequences are k-automatic
for every integer k ≥ 2.
Example 0. For instance the following 2-automaton produce the periodic sequence
012 012 012 · · ·
0
1
0
1
A/0
C /2
B/1
1
0
The Thue–Morse sequence
Example 1. The Thue–Morse sequence (tn ) can be defined by tn = 0 if the sum of
the binary digits of n is even and by tn = 1 if this sum is odd.
The Thue–Morse sequence
Example 1. The Thue–Morse sequence (tn ) can be defined by tn = 0 if the sum of
the binary digits of n is even and by tn = 1 if this sum is odd.
0
0
1
A/0
B/1
1
We obtain
t0 t1 t2 t3 · · · = 01101001100101 · · · .
The Thue–Morse sequence
Example 1. The Thue–Morse sequence (tn ) can be defined by tn = 0 if the sum of
the binary digits of n is even and by tn = 1 if this sum is odd.
0
0
1
A/0
B/1
1
We obtain
t0 t1 t2 t3 · · · = 01101001100101 · · · .
If we choose n = 22, then n = 1 × 16 + 0 × 8 + 1 × 4 + 1 × 2 + 0 × 1, so that
(22)2 = 10110.
We thus obtain that
q0 = A, q1 = B, q2 = B, q3 = A, q4 = B, q5 = B
and
a22 = τ (B) = 1.
The Baum–Sweet sequence
Example 2. The Baum–Sweet sequence (bn ) is defined by bn = 1 if the binary
expansion of n contains no block of consecutive 0’s of odd length, and by bn = 0
otherwise.
0
0, 1
1
A/1
1
0
C /0
B/1
1
0
We obtain
b0 b1 b2 b3 · · · = 110110010100100110010 · · · .
D/0
The Baum–Sweet sequence
Example 2. The Baum–Sweet sequence (bn ) is defined by bn = 1 if the binary
expansion of n contains no block of consecutive 0’s of odd length, and by bn = 0
otherwise.
0
0, 1
1
A/1
1
0
C /0
B/1
1
D/0
0
We obtain
b0 b1 b2 b3 · · · = 110110010100100110010 · · · .
Exercise. Show that there are arbitrarily large blocks of consecutive 0’s in the
Baum–Sweet sequence.
Two natural questions
Question. We ca ask wether a 2-automatic sequence is also 3-automatic or
4-automatic.
Two natural questions
Question. We ca ask wether a 2-automatic sequence is also 3-automatic or
4-automatic.
Theorem (Cobham). A sequence is k-automatic if and only if it is k r -automatic for
all positive integer r .
Two natural questions
Question. We ca ask wether a 2-automatic sequence is also 3-automatic or
4-automatic.
Theorem (Cobham). A sequence is k-automatic if and only if it is k r -automatic for
all positive integer r .
Theorem (Cobham). A non-eventually periodic sequence cannot be both k-automatic
and l-automatic for two multiplicatively independent positive integers k and l.
Two positive integers k and l are multiplicatively independent if the equation k a = l b
has no nontrivial integer solution (a, b), that is, log k/ log l is irrational.
Two natural questions
Question. We ca ask wether a 2-automatic sequence is also 3-automatic or
4-automatic.
Theorem (Cobham). A sequence is k-automatic if and only if it is k r -automatic for
all positive integer r .
Theorem (Cobham). A non-eventually periodic sequence cannot be both k-automatic
and l-automatic for two multiplicatively independent positive integers k and l.
Two positive integers k and l are multiplicatively independent if the equation k a = l b
has no nontrivial integer solution (a, b), that is, log k/ log l is irrational.
Question. What is the difference if we choose in the definition of automatic sequences
that automata should read the input from the right to the left, that is, when starting
with the least significant digit ?
Two natural questions
Question. We ca ask wether a 2-automatic sequence is also 3-automatic or
4-automatic.
Theorem (Cobham). A sequence is k-automatic if and only if it is k r -automatic for
all positive integer r .
Theorem (Cobham). A non-eventually periodic sequence cannot be both k-automatic
and l-automatic for two multiplicatively independent positive integers k and l.
Two positive integers k and l are multiplicatively independent if the equation k a = l b
has no nontrivial integer solution (a, b), that is, log k/ log l is irrational.
Question. What is the difference if we choose in the definition of automatic sequences
that automata should read the input from the right to the left, that is, when starting
with the least significant digit ?
Theorem. The class of automatic sequences remains unchanged.
Remark. The proof provides a constructive/effective result. As we will see, both type
of automata turn out to be useful !
A first useful characterization
—
Morphisms of free monoids
Morphisms of free monoids
Morphisms. Let A and B be two finite sets. A map σ from A to B ∗ extends uniquely
to a homomorphism between the free monoids A∗ and B ∗ (that is,
σ(w1 w2 · · · wr ) = σ(w1 )σ(w2 ) · · · σ(wr )). Such a homomorphism from A∗ to B ∗ is
called a morphism.
Morphisms of free monoids
Morphisms. Let A and B be two finite sets. A map σ from A to B ∗ extends uniquely
to a homomorphism between the free monoids A∗ and B ∗ (that is,
σ(w1 w2 · · · wr ) = σ(w1 )σ(w2 ) · · · σ(wr )). Such a homomorphism from A∗ to B ∗ is
called a morphism.
Uniform morphisms. If there is a positive integer k such that each element of A is
mapped to a word of length k, then the morphism is called k-uniform. A coding is a
1-uniform morphism.
Morphisms of free monoids
Morphisms. Let A and B be two finite sets. A map σ from A to B ∗ extends uniquely
to a homomorphism between the free monoids A∗ and B ∗ (that is,
σ(w1 w2 · · · wr ) = σ(w1 )σ(w2 ) · · · σ(wr )). Such a homomorphism from A∗ to B ∗ is
called a morphism.
Uniform morphisms. If there is a positive integer k such that each element of A is
mapped to a word of length k, then the morphism is called k-uniform. A coding is a
1-uniform morphism.
Fixed points. A k-uniform morphism σ from A∗ to itself is said to be prolongable if
there exists a letter a such that σ(a) = aw . In that case, the sequence of finite words
(σn (a))n≥0 converges in A∞ = A∗ ∪ AN (endowed with its usual topology) to an
infinite word denoted σ∞ (a).
Morphisms of free monoids
Morphisms. Let A and B be two finite sets. A map σ from A to B ∗ extends uniquely
to a homomorphism between the free monoids A∗ and B ∗ (that is,
σ(w1 w2 · · · wr ) = σ(w1 )σ(w2 ) · · · σ(wr )). Such a homomorphism from A∗ to B ∗ is
called a morphism.
Uniform morphisms. If there is a positive integer k such that each element of A is
mapped to a word of length k, then the morphism is called k-uniform. A coding is a
1-uniform morphism.
Fixed points. A k-uniform morphism σ from A∗ to itself is said to be prolongable if
there exists a letter a such that σ(a) = aw . In that case, the sequence of finite words
(σn (a))n≥0 converges in A∞ = A∗ ∪ AN (endowed with its usual topology) to an
infinite word denoted σ∞ (a).
This infinite word is a fixed point for σ (extended by continuity to infinite words) and
we say that σ∞ (a) is generated by the morphism σ.
Morphisms of free monoids
Morphisms. Let A and B be two finite sets. A map σ from A to B ∗ extends uniquely
to a homomorphism between the free monoids A∗ and B ∗ (that is,
σ(w1 w2 · · · wr ) = σ(w1 )σ(w2 ) · · · σ(wr )). Such a homomorphism from A∗ to B ∗ is
called a morphism.
Uniform morphisms. If there is a positive integer k such that each element of A is
mapped to a word of length k, then the morphism is called k-uniform. A coding is a
1-uniform morphism.
Fixed points. A k-uniform morphism σ from A∗ to itself is said to be prolongable if
there exists a letter a such that σ(a) = aw . In that case, the sequence of finite words
(σn (a))n≥0 converges in A∞ = A∗ ∪ AN (endowed with its usual topology) to an
infinite word denoted σ∞ (a).
This infinite word is a fixed point for σ (extended by continuity to infinite words) and
we say that σ∞ (a) is generated by the morphism σ.
Example. The morphism τ defined over the alphabet {0, 1} by τ (0) = 01 and
τ (1) = 10 is a 2-uniform morphism which generates the Thue–Morse sequence
t = τ ∞ (0) = 01101001100101 · · · .
Automatic sequences and morphisms of free monoids
Uniform morphisms and automatic sequences are strongly connected, as the following
classical result of Cobham shows.
Theorem (Cobham, 1972). An infinite word is k-automatic if and only if it is the
image by a coding of a word that is generated by a k-uniform morphism.
Automatic sequences and morphisms of free monoids
Uniform morphisms and automatic sequences are strongly connected, as the following
classical result of Cobham shows.
Theorem (Cobham, 1972). An infinite word is k-automatic if and only if it is the
image by a coding of a word that is generated by a k-uniform morphism.
Notable consequences.
• Automatic sequences can be produced in linear time (they can be produced by
simple Turing machine).
• Automatic sequences enjoy some “ self-similarity".
Automatic sequences and morphisms of free monoids
Uniform morphisms and automatic sequences are strongly connected, as the following
classical result of Cobham shows.
Theorem (Cobham, 1972). An infinite word is k-automatic if and only if it is the
image by a coding of a word that is generated by a k-uniform morphism.
Notable consequences.
• Automatic sequences can be produced in linear time (they can be produced by
simple Turing machine).
• Automatic sequences enjoy some “ self-similarity".
Remark. The proof provides a constructive/effective result working with automata
that start with the most significant digit.
The Baum–Sweet sequence
The Baum–Sweet sequence (bn ) is defined by bn = 1 if the binary expansion of n
contains no block of consecutive 0’s of odd length, and by bn = 0 otherwise.
0
0, 1
1
A/1
1
0
C /0
B/1
0
1
D/0
The Baum–Sweet sequence
The Baum–Sweet sequence (bn ) is defined by bn = 1 if the binary expansion of n
contains no block of consecutive 0’s of odd length, and by bn = 0 otherwise.
0
0, 1
1
A/1
1
0
C /0
B/1
1
D/0
0
One can show that the Baum-Sweet sequence is equal to the sequence ϕ(σ∞ (A))
where
σ(A) = AB, σ(B) = CB, σ(C ) = BD, σ(D) = DD
and
ϕ(A) = ϕ(B) = 1, ϕ(C ) = ϕ(D) = 0.
A second useful characterization
—
The k-kernel
The k-kernel
An important notion in the study of k-automatic sequences is the notion of k-kernel.
Definition. The k-kernel of a sequence a = (an )n≥0 is defined as the set
˘
¯
Nk (a) = (ak r n+i )n≥0 | i ≥ 0, 0 ≤ i < k r .
The k-kernel
An important notion in the study of k-automatic sequences is the notion of k-kernel.
Definition. The k-kernel of a sequence a = (an )n≥0 is defined as the set
˘
¯
Nk (a) = (ak r n+i )n≥0 | i ≥ 0, 0 ≤ i < k r .
Exercise. Prove that the 2-kernel of the Thue–Morse sequence t has only two
elements (t and the sequence t obtained by exchanging the symbols 0 and 1 in t).
The k-kernel
An important notion in the study of k-automatic sequences is the notion of k-kernel.
Definition. The k-kernel of a sequence a = (an )n≥0 is defined as the set
˘
¯
Nk (a) = (ak r n+i )n≥0 | i ≥ 0, 0 ≤ i < k r .
Exercise. Prove that the 2-kernel of the Thue–Morse sequence t has only two
elements (t and the sequence t obtained by exchanging the symbols 0 and 1 in t).
This notion gives rise to another useful characterization of k-automatic sequences.
Theorem (Eilenberg). A sequence is k-automatic if and only if its k-kernel is finite.
The k-kernel
An important notion in the study of k-automatic sequences is the notion of k-kernel.
Definition. The k-kernel of a sequence a = (an )n≥0 is defined as the set
˘
¯
Nk (a) = (ak r n+i )n≥0 | i ≥ 0, 0 ≤ i < k r .
Exercise. Prove that the 2-kernel of the Thue–Morse sequence t has only two
elements (t and the sequence t obtained by exchanging the symbols 0 and 1 in t).
This notion gives rise to another useful characterization of k-automatic sequences.
Theorem (Eilenberg). A sequence is k-automatic if and only if its k-kernel is finite.
Proof (hint). To prove this result, one uses automata that read the input starting
with the least significant digit. Then, the k-kernel of a given k-automatic sequence
corresponds to the different sequences you can obtain by changing the initial state in
such an automaton. Since there are only a finite number of state, the k-kernel has to
be finite.
The k-kernel
An important notion in the study of k-automatic sequences is the notion of k-kernel.
Definition. The k-kernel of a sequence a = (an )n≥0 is defined as the set
˘
¯
Nk (a) = (ak r n+i )n≥0 | i ≥ 0, 0 ≤ i < k r .
Exercise. Prove that the 2-kernel of the Thue–Morse sequence t has only two
elements (t and the sequence t obtained by exchanging the symbols 0 and 1 in t).
This notion gives rise to another useful characterization of k-automatic sequences.
Theorem (Eilenberg). A sequence is k-automatic if and only if its k-kernel is finite.
Proof (hint). To prove this result, one uses automata that read the input starting
with the least significant digit. Then, the k-kernel of a given k-automatic sequence
corresponds to the different sequences you can obtain by changing the initial state in
such an automaton. Since there are only a finite number of state, the k-kernel has to
be finite.
In the other direction, you can effectively construct a k-automaton (that read the input
starting with the least significant digits) whose states are the elements of the k-kernel.
Connexion with number theory I
—
Algebraic formal power series with coefficients in a finite field
Arithmetic in positive characteristic
Z
Fp [t]
∩
∩
Q
∼
Fp (t)
∩
∩
R
Fp ((t))
One replaces the real number
α :=
+∞
X
n=−k
an
pn
by the Laurent series
f (t) :=
+∞
X
n=−k
an t n .
Formal power series and finite automata
Let us consider the formal power series
X n
f (t) =
t 2 ∈ F2 ((t)).
n≥0
Formal power series and finite automata
Let us consider the formal power series
X n
f (t) =
t 2 ∈ F2 ((t)).
n≥0
Then
f (t)
=
X
n
t2 + t
n≥1
= f (t 2 ) + t = f (t)2 + t
and thus
f (t)2 + f (t) + t = 0.
Formal power series and finite automata
Let us consider the formal power series
X n
f (t) =
t 2 ∈ F2 ((t)).
n≥0
Then
f (t)
=
X
n
t2 + t
n≥1
= f (t 2 ) + t = f (t)2 + t
and thus
f (t)2 + f (t) + t = 0.
Thus, f (t) is a root of the polynomial (whose coefficients are in F2 [t])
X 2 + X + t.
We then say that f (t) is albegraic power series over the field of rational functions
F2 (t). In that case, f (t) is a quadratic power series.
Formal power series and finite automata
Let us consider the formal power series
X n
f (t) =
t 2 ∈ F2 ((t)).
n≥0
Then
f (t)
=
X
n
t2 + t
n≥1
= f (t 2 ) + t = f (t)2 + t
and thus
f (t)2 + f (t) + t = 0.
Thus, f (t) is a root of the polynomial (whose coefficients are in F2 [t])
X 2 + X + t.
We then say that f (t) is albegraic power series over the field of rational functions
F2 (t). In that case, f (t) is a quadratic power series.
Though f (t)P
is not a rational function its Laurent series expansion is very simple.
Indeed, f = n≥0 an t n where an = 1 if n is a power of 2 and an = 0 otherwise.
The sequence an is easily obtained from the base-2 expansion of n. It is a 2-automatic
sequence.
Formal power series and finite automata
Let us consider the formal power series
X
f (t) =
an t n ∈ F2 ((t)),
n≥0
where (an )n≥0 is the Thue–Morse sequence ( an = 1 if the sum of the binary digits of
n is odd and an = 0 otherwise).
Formal power series and finite automata
Let us consider the formal power series
X
f (t) =
an t n ∈ F2 ((t)),
n≥0
where (an )n≥0 is the Thue–Morse sequence ( an = 1 if the sum of the binary digits of
n is odd and an = 0 otherwise).
We easily get that
a2n = an
and
a2n+1 = 1 − an .
Formal power series and finite automata
Let us consider the formal power series
X
f (t) =
an t n ∈ F2 ((t)),
n≥0
where (an )n≥0 is the Thue–Morse sequence ( an = 1 if the sum of the binary digits of
n is odd and an = 0 otherwise).
We easily get that
a2n = an
and
a2n+1 = 1 − an .
Then,
f (t)
=
X
a2n t 2n +
n≥0
=
X
n≥0
=
X
n≥0
X
a2n+1 t 2n+1
n≥0
an t 2n +
X
(an + 1)t 2n+1
n≥0
an t 2n + t
X
(an + 1)t 2n
n≥0
= f (t 2 ) + tf (t 2 ) +
t
·
1 + t2
Formal power series and finite automata
Let us consider the formal power series
X
f (t) =
an t n ∈ F2 ((t)),
n≥0
where (an )n≥0 is the Thue–Morse sequence ( an = 1 if the sum of the binary digits of
n is odd and an = 0 otherwise).
We easily get that
a2n = an
and
a2n+1 = 1 − an .
Then,
f (t)
=
X
a2n t 2n +
n≥0
=
X
X
a2n+1 t 2n+1
n≥0
an t 2n +
X
(an + 1)t 2n+1
n≥0
n≥0
=
X
an t 2n + t
n≥0
X
(an + 1)t 2n
n≥0
= f (t 2 ) + tf (t 2 ) +
t
·
1 + t2
Thus, f (t) is a root of the polynomial
(1 + t)3 X 2 + (1 + t)2 X + t .
Le théorème de Christol
Christol actually proved the following beautiful general result.
Theorem
P (Christol). Let q be a power of a prime number p and let
f (t) = n≥0 an t n a formal power series with coefficients in the finite field Fq . Then,
f is algebraic over the field of rational functions Fp (t) if and only if the sequence
(an )n≥0 is p-automatic.
G. Christol, Ensembles presque périodiques k-reconnaissables, Theoret. Comput.
Sci., 1979.
Le théorème de Christol
Christol actually proved the following beautiful general result.
Theorem
P (Christol). Let q be a power of a prime number p and let
f (t) = n≥0 an t n a formal power series with coefficients in the finite field Fq . Then,
f is algebraic over the field of rational functions Fp (t) if and only if the sequence
(an )n≥0 is p-automatic.
G. Christol, Ensembles presque périodiques k-reconnaissables, Theoret. Comput.
Sci., 1979.
Remark. The proof used the characterization of automatic p-sequences in terms of
their p-kernel.
Le théorème de Christol
Christol actually proved the following beautiful general result.
Theorem
P (Christol). Let q be a power of a prime number p and let
f (t) = n≥0 an t n a formal power series with coefficients in the finite field Fq . Then,
f is algebraic over the field of rational functions Fp (t) if and only if the sequence
(an )n≥0 is p-automatic.
G. Christol, Ensembles presque périodiques k-reconnaissables, Theoret. Comput.
Sci., 1979.
Remark. The proof used the characterization of automatic p-sequences in terms of
their p-kernel.
Exercise. Deduce the following theorem of Fatou. Let f (t) ∈ Z((t)) be an algebraic
power series whose coefficients take only finitely many different values. Then, f (t) is a
rational function.
Automatic sets of integers
Automatic sets of integers
A set N ⊂ N is said to be recognizable by a finite k-automaton, or for short
k-automatic, if the characteristic sequence of N , defined by an = 1 if n ∈ N and
an = 0 otherwise, is a k-automatic sequence.
Automatic sets of integers
A set N ⊂ N is said to be recognizable by a finite k-automaton, or for short
k-automatic, if the characteristic sequence of N , defined by an = 1 if n ∈ N and
an = 0 otherwise, is a k-automatic sequence.
This means that there exists a finite k-automaton that reads as input the base-k
expansion of n and accepts this integer (producing as output the symbol 1) if n
belongs to N , otherwise this automaton rejects the integer n, producing as output the
symbol 0.
Arithmetic progressions
The simplest automatic sets are arithmetic progressions.
Moreover, arithmetic progressions have the very special property of being k-automatic
sets for every integer k ≥ 2.
Arithmetic progressions
The simplest automatic sets are arithmetic progressions.
Moreover, arithmetic progressions have the very special property of being k-automatic
sets for every integer k ≥ 2.
1
A/0
0
1
C /0
B/1
1
0
1
D/0
0
0
1
E /0
0
A 2-automaton recognizing the arithmetic progression 5N + 3.
Powers of 2
The set {1, 2, 4, 8, 16, . . .} formed by the powers of 2 is also a typical example of a
2-automatic set.
Powers of 2
The set {1, 2, 4, 8, 16, . . .} formed by the powers of 2 is also a typical example of a
2-automatic set.
0
0, 1
0
A/0
1
B/1
1
C /0
A 2-automaton recognizing the powers of 2.
Sums of two discinct powers of 3
In the same spirit, the set formed by taking all integers that can be expressed as the
sum of two distinct powers of 3 is 3-automatic.
Sums of two discinct powers of 3
In the same spirit, the set formed by taking all integers that can be expressed as the
sum of two distinct powers of 3 is 3-automatic.
0
0
A/0
2
1
B/0
2
1
0, 1, 2
0
C /0
1, 2
D/1
A 3-automaton recognizing those integers that are the sum of two distinct powers of 3.
Sums of two discinct powers of 3
In the same spirit, the set formed by taking all integers that can be expressed as the
sum of two distinct powers of 3 is 3-automatic.
0
0
A/0
2
1
B/0
2
1
0, 1, 2
0
C /0
1, 2
D/1
A 3-automaton recognizing those integers that are the sum of two distinct powers of 3.
There are also much stranger automatic sets !
Sums of two discinct powers of 3
In the same spirit, the set formed by taking all integers that can be expressed as the
sum of two distinct powers of 3 is 3-automatic.
0
0
A/0
2
1
B/0
2
1
0, 1, 2
0
C /0
1, 2
D/1
A 3-automaton recognizing those integers that are the sum of two distinct powers of 3.
There are also much stranger automatic sets !
For instance, the set of integers whose binary expansion has an odd number of digits,
does not contain three consecutive 1’s, and contains an even number of two
consecutive 0’s is a 2-automatic set.
Sums of two discinct powers of 3
In the same spirit, the set formed by taking all integers that can be expressed as the
sum of two distinct powers of 3 is 3-automatic.
0
0
A/0
2
1
B/0
2
1
0, 1, 2
0
C /0
1, 2
D/1
A 3-automaton recognizing those integers that are the sum of two distinct powers of 3.
There are also much stranger automatic sets !
For instance, the set of integers whose binary expansion has an odd number of digits,
does not contain three consecutive 1’s, and contains an even number of two
consecutive 0’s is a 2-automatic set.
Furthermore, the class of k-automatic sets is closed under various natural operations
such as intersection, union and complement.
Connexion with number theory II
—
A discussion on prime numbers and automata
Prime numbers and randomness
An efficient way to produce conjectures about prime numbers comes from the
so–called Cramér probabilistic model.
K. Soundararajan, The distribution of prime numbers, 2007.
It is based on the principle that the set P of prime numbers behaves roughly like a
random sequence, in which an integer of size about n has a 1 in log n chance of being
prime.
Prime numbers and randomness
An efficient way to produce conjectures about prime numbers comes from the
so–called Cramér probabilistic model.
K. Soundararajan, The distribution of prime numbers, 2007.
It is based on the principle that the set P of prime numbers behaves roughly like a
random sequence, in which an integer of size about n has a 1 in log n chance of being
prime.
Limitations. Prime numbers are all odd with only one exception !
T. Tao, Structure and Randomness, Amer. Math. Soc., 2008.
Prime numbers and randomness
An efficient way to produce conjectures about prime numbers comes from the
so–called Cramér probabilistic model.
K. Soundararajan, The distribution of prime numbers, 2007.
It is based on the principle that the set P of prime numbers behaves roughly like a
random sequence, in which an integer of size about n has a 1 in log n chance of being
prime.
Limitations. Prime numbers are all odd with only one exception !
T. Tao, Structure and Randomness, Amer. Math. Soc., 2008.
Predictions. The Cramér model allows one to predict precise answers concerning
occurrences :
• of large gaps between consecutive prime numbers,
• of small gaps between primes (twin prime conjecture),
• of some special patterns in P such as arithmetic progressions (Hardy–Littlewood
conjectures).
Prime numbers and randomness
An efficient way to produce conjectures about prime numbers comes from the
so–called Cramér probabilistic model.
K. Soundararajan, The distribution of prime numbers, 2007.
It is based on the principle that the set P of prime numbers behaves roughly like a
random sequence, in which an integer of size about n has a 1 in log n chance of being
prime.
Limitations. Prime numbers are all odd with only one exception !
T. Tao, Structure and Randomness, Amer. Math. Soc., 2008.
Predictions. The Cramér model allows one to predict precise answers concerning
occurrences :
• of large gaps between consecutive prime numbers,
• of small gaps between primes (twin prime conjecture),
• of some special patterns in P such as arithmetic progressions (Hardy–Littlewood
conjectures).
Some spectacular breakthrough were made recently in the two latter topics by
Goldston, Pintz and Yildirim, and by Green and Tao.
Prime numbers and automata
A consequence of this probabilistic way of thinking is that the set P should be
sufficiently random that it cannot be recognized by a finite automata.
Prime numbers and automata
A consequence of this probabilistic way of thinking is that the set P should be
sufficiently random that it cannot be recognized by a finite automata.
Theorem (Minsky and Papert, 1966). The set of prime numbers is not automatic.
Prime numbers and automata
A consequence of this probabilistic way of thinking is that the set P should be
sufficiently random that it cannot be recognized by a finite automata.
Theorem (Minsky and Papert, 1966). The set of prime numbers is not automatic.
Actually, Schützenberger (see also Hartmanis and Shank) even proved the following
stronger result.
Theorem (Schützenberger, 1968). No infinite subset of the set of prime numbers
can be recognized by a finite automaton.
Prime numbers and automata
A consequence of this probabilistic way of thinking is that the set P should be
sufficiently random that it cannot be recognized by a finite automata.
Theorem (Minsky and Papert, 1966). The set of prime numbers is not automatic.
Actually, Schützenberger (see also Hartmanis and Shank) even proved the following
stronger result.
Theorem (Schützenberger, 1968). No infinite subset of the set of prime numbers
can be recognized by a finite automaton.
The intriguing question we are now left with is : how can we prove that a set of
integers is not automatic ?
Prime numbers and automata
A consequence of this probabilistic way of thinking is that the set P should be
sufficiently random that it cannot be recognized by a finite automata.
Theorem (Minsky and Papert, 1966). The set of prime numbers is not automatic.
Actually, Schützenberger (see also Hartmanis and Shank) even proved the following
stronger result.
Theorem (Schützenberger, 1968). No infinite subset of the set of prime numbers
can be recognized by a finite automaton.
The intriguing question we are now left with is : how can we prove that a set of
integers is not automatic ?
There are actually different approaches : one can use the k-kernel and show that it is
infinite or one can use some density properties (the logarithmic frequency of an
automatic set exists, also if an automatic set has a positive density then it is rational).
The pumping lemma
The following result turns out to be very useful tool to prove that some sets are not
automatic.
Pumping Lemma. Let N ⊂ N be a k-automatic set. Then for every sufficiently large
integer n in N , there exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that
n = [w1 w2 w3 ]k and [w1 w2j w3 ]k belongs to N for all j ≥ 1.
The pumping lemma
The following result turns out to be very useful tool to prove that some sets are not
automatic.
Pumping Lemma. Let N ⊂ N be a k-automatic set. Then for every sufficiently large
integer n in N , there exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that
n = [w1 w2 w3 ]k and [w1 w2j w3 ]k belongs to N for all j ≥ 1.
Proof. Let n = [ar ar −1 · · · a0 ]k be an element of N and assume that r is larger than
the number of states in the underlying automaton. By the pigeonhole principle, there
is a state that is encountered twice when reading the input a0 a1 · · · ar , say just after
reading ai and aj , i < j. Then setting w1 = ar · · · aj+1 , w2 = aj · · · ai +1 and
w3 = ai · · · a0 , gives the result.
The pumping lemma
The following result turns out to be very useful tool to prove that some sets are not
automatic.
Pumping Lemma. Let N ⊂ N be a k-automatic set. Then for every sufficiently large
integer n in N , there exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that
n = [w1 w2 w3 ]k and [w1 w2j w3 ]k belongs to N for all j ≥ 1.
Proof. Let n = [ar ar −1 · · · a0 ]k be an element of N and assume that r is larger than
the number of states in the underlying automaton. By the pigeonhole principle, there
is a state that is encountered twice when reading the input a0 a1 · · · ar , say just after
reading ai and aj , i < j. Then setting w1 = ar · · · aj+1 , w2 = aj · · · ai +1 and
w3 = ai · · · a0 , gives the result.
Proof of Schützenberger’s theorem. Let us assume that N is an infinite
k-automatic set consisting only of prime numbers. Let p be an element of N that is
sufficiently large to apply the pumping lemma.
The pumping lemma
The following result turns out to be very useful tool to prove that some sets are not
automatic.
Pumping Lemma. Let N ⊂ N be a k-automatic set. Then for every sufficiently large
integer n in N , there exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that
n = [w1 w2 w3 ]k and [w1 w2j w3 ]k belongs to N for all j ≥ 1.
Proof. Let n = [ar ar −1 · · · a0 ]k be an element of N and assume that r is larger than
the number of states in the underlying automaton. By the pigeonhole principle, there
is a state that is encountered twice when reading the input a0 a1 · · · ar , say just after
reading ai and aj , i < j. Then setting w1 = ar · · · aj+1 , w2 = aj · · · ai +1 and
w3 = ai · · · a0 , gives the result.
Proof of Schützenberger’s theorem. Let us assume that N is an infinite
k-automatic set consisting only of prime numbers. Let p be an element of N that is
sufficiently large to apply the pumping lemma.
There thus exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that p = [w1 w2 w3 ]k
and such that all integers of the form [w1 w2j w3 ]k , with j ≥ 1, belong to N .
The pumping lemma
The following result turns out to be very useful tool to prove that some sets are not
automatic.
Pumping Lemma. Let N ⊂ N be a k-automatic set. Then for every sufficiently large
integer n in N , there exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that
n = [w1 w2 w3 ]k and [w1 w2j w3 ]k belongs to N for all j ≥ 1.
Proof. Let n = [ar ar −1 · · · a0 ]k be an element of N and assume that r is larger than
the number of states in the underlying automaton. By the pigeonhole principle, there
is a state that is encountered twice when reading the input a0 a1 · · · ar , say just after
reading ai and aj , i < j. Then setting w1 = ar · · · aj+1 , w2 = aj · · · ai +1 and
w3 = ai · · · a0 , gives the result.
Proof of Schützenberger’s theorem. Let us assume that N is an infinite
k-automatic set consisting only of prime numbers. Let p be an element of N that is
sufficiently large to apply the pumping lemma.
There thus exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that p = [w1 w2 w3 ]k
and such that all integers of the form [w1 w2j w3 ]k , with j ≥ 1, belong to N .
However, it is not difficult to see, by using Fermat’s little theorem, that
[w1 w2p w3 ]k ≡ [w1 w2 w3 ]k mod p and thus [w1 w2p w3 ]k ≡ 0 mod p.
The pumping lemma
The following result turns out to be very useful tool to prove that some sets are not
automatic.
Pumping Lemma. Let N ⊂ N be a k-automatic set. Then for every sufficiently large
integer n in N , there exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that
n = [w1 w2 w3 ]k and [w1 w2j w3 ]k belongs to N for all j ≥ 1.
Proof. Let n = [ar ar −1 · · · a0 ]k be an element of N and assume that r is larger than
the number of states in the underlying automaton. By the pigeonhole principle, there
is a state that is encountered twice when reading the input a0 a1 · · · ar , say just after
reading ai and aj , i < j. Then setting w1 = ar · · · aj+1 , w2 = aj · · · ai +1 and
w3 = ai · · · a0 , gives the result.
Proof of Schützenberger’s theorem. Let us assume that N is an infinite
k-automatic set consisting only of prime numbers. Let p be an element of N that is
sufficiently large to apply the pumping lemma.
There thus exist finite words w1 , w2 and w3 , with |w2 | ≥ 1, such that p = [w1 w2 w3 ]k
and such that all integers of the form [w1 w2j w3 ]k , with j ≥ 1, belong to N .
However, it is not difficult to see, by using Fermat’s little theorem, that
[w1 w2p w3 ]k ≡ [w1 w2 w3 ]k mod p and thus [w1 w2p w3 ]k ≡ 0 mod p.
It follows that the integer [w1 w2p w3 ]k belongs to N but is not a prime number.
Primes in automatic sets
We have just seen that the set of all prime numbers is not automatic. However, it is
believed that many automatic sets should contain infinitely many prime numbers.
Primes in automatic sets
We have just seen that the set of all prime numbers is not automatic. However, it is
believed that many automatic sets should contain infinitely many prime numbers.
Dirichlet’s theorem. Let a and b be two relatively prime positive integers. Then the
arithmetic progression aN + b contains infinitely many primes.
Primes in automatic sets
We have just seen that the set of all prime numbers is not automatic. However, it is
believed that many automatic sets should contain infinitely many prime numbers.
Dirichlet’s theorem. Let a and b be two relatively prime positive integers. Then the
arithmetic progression aN + b contains infinitely many primes.
• Note that the special case of the arithmetic progression 2N + 1 was known by Euclid
and his famous proof that there are infinitely many prime numbers.
Primes in automatic sets
We have just seen that the set of all prime numbers is not automatic. However, it is
believed that many automatic sets should contain infinitely many prime numbers.
Dirichlet’s theorem. Let a and b be two relatively prime positive integers. Then the
arithmetic progression aN + b contains infinitely many primes.
• Note that the special case of the arithmetic progression 2N + 1 was known by Euclid
and his famous proof that there are infinitely many prime numbers.
Beyond Dirichlet’s theorem, the more general result concerning automatic sets and
prime numbers is the following.
Theorem (Fouvry and Mauduit, 1996). Given a non-empty automatic set N ⊂ N
associated with an irreducible automaton, there exists a positive integer r such that N
contains infinitely many r -almost primes.
An automaton is irreducible if for all pairs of states (A, B) there is a path from A to
B. An r -almost prime is the product of at most r prime numbers.
Primes in automatic sets
We have just seen that the set of all prime numbers is not automatic. However, it is
believed that many automatic sets should contain infinitely many prime numbers.
Dirichlet’s theorem. Let a and b be two relatively prime positive integers. Then the
arithmetic progression aN + b contains infinitely many primes.
• Note that the special case of the arithmetic progression 2N + 1 was known by Euclid
and his famous proof that there are infinitely many prime numbers.
Beyond Dirichlet’s theorem, the more general result concerning automatic sets and
prime numbers is the following.
Theorem (Fouvry and Mauduit, 1996). Given a non-empty automatic set N ⊂ N
associated with an irreducible automaton, there exists a positive integer r such that N
contains infinitely many r -almost primes.
An automaton is irreducible if for all pairs of states (A, B) there is a path from A to
B. An r -almost prime is the product of at most r prime numbers.
• In contrast, to prove that there are infinitely many primes in sparse automatic sets
such as {2n − 1, n ≥ 1} and {2n + 1, n ≥ 1} appears to be extremely difficult. This
would solve two long-standing conjectures about the existence of infinitely many
Fermat primes and Mersenne primes.
Primes in the Thue–Morse set
In 1968, Gelfond asked about the collection of prime numbers that belong to the
Thue–Morse set.
• The previous result of Fouvry and Mauduit implies that such a set contains infinitely
r -almost primes for some r , but until recently it was still not known whether it
contains infinitely many primes.
Primes in the Thue–Morse set
In 1968, Gelfond asked about the collection of prime numbers that belong to the
Thue–Morse set.
• The previous result of Fouvry and Mauduit implies that such a set contains infinitely
r -almost primes for some r , but until recently it was still not known whether it
contains infinitely many primes.
Remarkably, Mauduit and Rivat proved a much stronger result that gives the exact
proportion of primes that belong to this automatic set, solving Gelfond’s conjecture.
Theorem (Mauduit and Rivat). One has
lim
N→∞
{0 ≤ n ≤ N, n ∈ P and s2 (n) ≡ 1
{0 ≤ n ≤ N, n ∈ P}
mod 2}
=
1
·
2
C. Mauduit et J. Rivat, Sur un problème de Gelfond : la somme des chiffres des
nombres premiers, Annals of Math., 2010.
Thus, half of prime numbers belong to the Thue–Morse set {1, 2, 4, 7, 8, 11, 13, . . .}.
Primes in the Thue–Morse set
In 1968, Gelfond asked about the collection of prime numbers that belong to the
Thue–Morse set.
• The previous result of Fouvry and Mauduit implies that such a set contains infinitely
r -almost primes for some r , but until recently it was still not known whether it
contains infinitely many primes.
Remarkably, Mauduit and Rivat proved a much stronger result that gives the exact
proportion of primes that belong to this automatic set, solving Gelfond’s conjecture.
Theorem (Mauduit and Rivat). One has
lim
N→∞
{0 ≤ n ≤ N, n ∈ P and s2 (n) ≡ 1
{0 ≤ n ≤ N, n ∈ P}
mod 2}
=
1
·
2
C. Mauduit et J. Rivat, Sur un problème de Gelfond : la somme des chiffres des
nombres premiers, Annals of Math., 2010.
Thus, half of prime numbers belong to the Thue–Morse set {1, 2, 4, 7, 8, 11, 13, . . .}.
As usual with analytic number theory, the proof of their result (which relies on strong
estimates of exponential sums) is long and difficult.
© Copyright 2026 Paperzz