The Expected Number of Repetitions in Random Words

Arseny M. Shur
Ural Federal University, Ekaterinburg, Russia
Shanghai, April 25, 2015
Combinatorics on Words
A discipline that studies properties of sequences of symbols
Born: 1906
A. Thue. Über unendliche Zeichenreihen. Norske vid. Selsk. Skr.
Mat. Nat. Kl. 7, 1–22 (1906)
Named: 1983
M. Lothaire. Combinatorics on Words. Vol. 17 of Encyclopedia of
Mathematics and Its Applications (1983)
Sources:
Algebra (terms)
Symbolic dynamics (trajectories)
Grammars and rewriting systems
Algorithms for sequential data
Biological strings
...
Palindromes and Squares
A palindrome is a word which is equal to its reversal, like
a i b o h p h o b i a
Palindromes are among the simplest and most common repetitions in
words, along with squares, which are words consisting of two equal
parts, like
c o u s c o u s
Palindromes are in some sense counterparts of squares:
in a sequence of states of some finite-state machine, a square
indicates repeated behaviour, while a palindrome shows that the
machine reversed back to front;
among the basic data structures, palindromes correspond to
stacks, while squares correspond to queues; as a consequence,
the language of all palindromes is context-free, while the language
of all squares is not.
Counting Factors
We consider finite words over finite (k-letter) alphabets; we write
w = w[1..n] for a word of length n; words of the form w[i..j] are factors
of w. Normally, n is assumed to be large, and k is fixed.
A lot of results on the possible number of distinct palindromic factors
and square factors in a word:
- max number of palindromes is n (Droubay, Pirillo, 2001)
- max number of squares is between n − O(√n) and 2n − O(log n) (Ilie, 2007)
- min number of palindromes is k for k ≥ 3 and 8 for k = 2, n ≥ 9
- min number of squares is 0 for k ≥ 3 (Thue, 1912) and 3 for k = 2 (Fraenkel, Simpson, 1995)
- any number of palindromes between min and max is available
- for k ≥ 4, a word can contain k palindromes and 0 squares simultaneously
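The quantities counted above can be checked by brute force on short words. The following is a minimal sketch, not an efficient algorithm: the helper names (`is_palindrome`, `is_square`, `distinct_factors`) are illustrative assumptions, and Python's 0-based slices stand in for the 1-based factors w[i..j].

```python
def is_palindrome(w: str) -> bool:
    # A palindrome equals its own reversal.
    return w == w[::-1]


def is_square(w: str) -> bool:
    # A square is a non-empty word consisting of two equal halves.
    half, rem = divmod(len(w), 2)
    return rem == 0 and half > 0 and w[:half] == w[half:]


def distinct_factors(w: str, predicate) -> set:
    # Collect all distinct non-empty factors of w that satisfy the predicate.
    n = len(w)
    return {
        w[i:j]
        for i in range(n)
        for j in range(i + 1, n + 1)
        if predicate(w[i:j])
    }


palindromes = distinct_factors("aibohphobia", is_palindrome)
squares = distinct_factors("couscous", is_square)
print(len(palindromes), sorted(palindromes, key=len))
print(len(squares), sorted(squares))
```

On the two example words of the talk, this enumeration is cubic in the word length, so it is only meant for sanity checks on small inputs.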
Problems and Simple Answers
Problems
Find the expected number of
- distinct palindromes
- distinct squares
occurring as factors in a random k-ary word.
Theorem
The expected number of distinct palindromic factors in a random word
of length n over a fixed nontrivial alphabet is Θ(√n).
As a by-product of the technique used, we also get
Theorem
The expected number of distinct square factors in a random word of
length n over a fixed nontrivial alphabet is Θ(√n).
Some Explanations
Let k (alphabet size) be fixed; E(n) is the expectation studied.
The expected number E_m(n) of distinct palindromic factors
of length m in a random word of length n is not greater than
⋆ the total number of k-ary palindromes of length m (blue curves in the figure);
⋆ the expected number of occurrences of palindromic factors of
length m in a random word of length n (red curves in the figure).
[Figure: two panels plotted against m. Left, "Length 2m": the curves k^m and (n−2m+1)/k^m, intersecting at m = p_e. Right, "Length 2m+1": the curves k^(m+1) and (n−2m)/k^m, intersecting at m = p_o.]
Some Explanations (Ctd)
[Figure: two panels plotted against m. Left, "Length 2m": the curves k^m and (n−2m+1)/k^m, intersecting at m = p_e. Right, "Length 2m+1": the curves k^(m+1) and (n−2m)/k^m, intersecting at m = p_o.]
- E(n) = Σ_m E_m(n) is bounded by the total area under the graphs;
- since all graphs are those of exponential functions, the area under each pair
of graphs equals the height of the highest point up to a constant
multiple; thus, E(n) = O(√n);
- some additional considerations show that the upper bound is
sharp up to a constant multiple, implying E(n) = Θ(√n).
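The area argument can be checked numerically. The sketch below sums, for each half-length m, the smaller of the two bounds from the figure (the number of palindromes and the expected number of occurrences) and divides by √n. The function name `area_bound` and the early-exit threshold are assumptions made for illustration only.

```python
import math


def area_bound(n: int, k: int) -> float:
    """Sum over m of min(number of palindromes, expected occurrences)."""
    total = 0.0
    m = 1
    while 2 * m <= n:  # even length 2m, m >= 1
        term = min(k ** m, (n - 2 * m + 1) / k ** m)
        if term < 1e-9:  # the tail decays geometrically, so stop early
            break
        total += term
        m += 1
    m = 0
    while 2 * m + 1 <= n:  # odd length 2m + 1, m >= 0
        term = min(k ** (m + 1), (n - 2 * m) / k ** m)
        if term < 1e-9:
            break
        total += term
        m += 1
    return total


for n in (10 ** 4, 10 ** 5, 10 ** 6):
    # The ratio stays bounded, in line with E(n) = O(sqrt(n)).
    print(n, area_bound(n, 2) / math.sqrt(n))
```

This only illustrates the upper bound; it says nothing about the matching lower bound mentioned in the last item.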
Dependence on k
Refinement of the obtained result: consider E(n, k) instead of E(n)
and find the dependence of the constant in the Θ(√n) expression on k.
- intuition: more letters, more luck needed to get a palindrome;
- broken by the picture: the peak on the right graph is ≈ √(kn);
- is E(n, k) = Θ(√(kn))? Not so easy.
Dependence on k (Ctd)
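[Figure: two panels plotted against m. Left, "Length 2m": the curves k^m and (n−2m+1)/k^m, intersecting at m = p_e. Right, "Length 2m+1": the curves k^(m+1) and (n−2m)/k^m, intersecting at m = p_o.]
- If p_o is an integer [half-integer], we get √(kn) [2√n]
- similar for p_e, but the values are √k times smaller
- note that p_e ≈ p_o + 1/2 (in fact, p_e = p_o + 1/2!)
◮ the upper bound oscillates between the orders of √n and √(kn)
◮ what’s next?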
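The oscillation of the peak can be seen on concrete numbers. The sketch below evaluates the height of the right-hand (odd-length) panel at integer m only, for n = k^(2j+1) (where p_o is essentially an integer) and for n = k^(2j) (where p_o is essentially a half-integer). The function name `odd_peak` and the choice k = 10 are illustrative assumptions.

```python
import math


def odd_peak(n: int, k: int) -> float:
    """Largest value of min(k^(m+1), (n - 2m)/k^m) over integers m >= 0."""
    best, m = 0.0, 0
    while 2 * m + 1 <= n:
        value = min(k ** (m + 1), (n - 2 * m) / k ** m)
        if value < best:  # past the peak: the minimum only decreases from here
            break
        best = value
        m += 1
    return best


k = 10
# n = k^7: p_o is (almost) the integer 3, the peak is close to sqrt(k n).
print(odd_peak(k ** 7, k) / math.sqrt(k * k ** 7))
# n = k^6: p_o is (almost) a half-integer, the peak is close to sqrt(n).
print(odd_peak(k ** 6, k) / math.sqrt(k ** 6))
```

In the half-integer case the two integers next to p_o each contribute about √n, which matches the bracketed value 2√n above.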
Balls and Bins
To get more intuition, suppose (even if this is not true) that for a
random k-ary word of length n all events of type “to contain a given
palindrome of length m” are independent and equiprobable.
Balls: palindromic factors of length m of our random word
w[i_1..j_1], w[i_2..j_2], w[i_3..j_3], w[i_4..j_4], w[i_5..j_5], ···, w[i_s..j_s]
Bins: distinct palindromes of length m
aaaaa, aabaa, ababa, ···, bbbbb
Folklore Proposition
For N bins and CN balls, the expected number of empty bins is ≈ N e^(−C).
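The folklore proposition is easy to check under the idealised model of independent, uniformly thrown balls. In that model the exact expectation is N(1 − 1/N)^(CN), which is close to N e^(−C) for large N. The sketch below compares the exact value, the approximation, and a seeded simulation; all function names and parameter values are illustrative assumptions.

```python
import math
import random


def expected_empty_exact(bins: int, balls: int) -> float:
    # Each bin stays empty with probability (1 - 1/bins)^balls.
    return bins * (1 - 1 / bins) ** balls


def simulate_empty(bins: int, balls: int, trials: int, seed: int = 0) -> float:
    # Average number of empty bins over several independent experiments.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        hit = {rng.randrange(bins) for _ in range(balls)}
        total += bins - len(hit)
    return total / trials


N, C = 1000, 2
print(expected_empty_exact(N, C * N))  # exact expectation in the model
print(N * math.exp(-C))                # the approximation N e^(-C)
print(simulate_empty(N, C * N, trials=200))
```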
Testing The Model
Conjecture
The function E(n, k) oscillates between its maximums
$$E(n,k) = \Bigl(1 - \frac{1}{e} + \frac{4}{k-1} - \frac{k+1}{2(k^3-1)} - O\Bigl(\frac{1}{k e^k}\Bigr)\Bigr)\sqrt{kn} + O\Bigl(\frac{k \log n}{\sqrt{n}}\Bigr)$$
for integer p_o and minimums
$$E(n,k) = \Bigl(3 - \frac{1}{e} + \frac{4}{k-1} - \frac{k^2+1}{2(k^3-1)} - O\Bigl(\frac{1}{e^k}\Bigr)\Bigr)\sqrt{n} + O\Bigl(\frac{k \log n}{\sqrt{n}}\Bigr)$$
for integer p_e.
A large amount of experimental data for C(k) = E(n, k)/√n for different
k and n between 10^6 and 10^8 was obtained by M. Rubinchik, with the use of a
novel data structure (palindromic tree). His data
- supports the conjecture in general
- cannot definitely tell whether the coefficients are correct
One of the problems: for small alphabets, the suggested maximums
and minimums are almost indistinguishable
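For readers who want to reproduce such experiments on a small scale, here is a minimal sketch of a palindromic tree that counts the distinct palindromic factors of a word in one left-to-right pass. It follows the commonly described construction (two roots, suffix links) and is not taken from the talk; the function names and the tiny Monte Carlo driver are assumptions, and the sample sizes are far below the n between 10^6 and 10^8 quoted above.

```python
import math
import random
import string


def count_distinct_palindromes(s: str) -> int:
    # Node 0: imaginary root (length -1). Node 1: empty palindrome (length 0).
    length = [-1, 0]
    link = [0, 0]
    edges = [{}, {}]
    last = 1  # node of the longest palindromic suffix of the processed prefix
    for i, c in enumerate(s):
        cur = last
        # Find the longest palindromic suffix X such that cXc ends at position i.
        while True:
            j = i - length[cur] - 1
            if j >= 0 and s[j] == c:
                break
            cur = link[cur]
        if c in edges[cur]:
            last = edges[cur][c]  # this palindrome was seen before
            continue
        new = len(length)
        length.append(length[cur] + 2)
        edges.append({})
        if length[new] == 1:
            link.append(1)  # single letters link to the empty palindrome
        else:
            p = link[cur]
            while True:
                j = i - length[p] - 1
                if j >= 0 and s[j] == c:
                    break
                p = link[p]
            link.append(edges[p][c])
        edges[cur][c] = new
        last = new
    return len(length) - 2  # every non-root node is one distinct palindrome


def estimate_ratio(n: int, k: int, trials: int, seed: int = 0) -> float:
    # Monte Carlo estimate of E(n, k) / sqrt(n); requires k <= 26.
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase[:k]
    total = 0
    for _ in range(trials):
        word = "".join(rng.choice(alphabet) for _ in range(n))
        total += count_distinct_palindromes(word)
    return total / (trials * math.sqrt(n))


print(count_distinct_palindromes("aibohphobia"))
print(estimate_ratio(2000, 2, trials=5))
```

Each new letter creates at most one node, which is why the number of distinct palindromes in a word of length n never exceeds n.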
Analysis
Bad news: our assumption was totally wrong, because the events “to
contain a given palindrome of length m” are dependent and have
different probabilities.
aaa···aaa is less probable than baa···aab,
and each of them “suppresses” the other.
Why were the predictions with balls and bins good?
Good news: the probabilities for all words of length m are quite close
(any word of length m is more probable as a factor than any word of
length m+1); moreover, most of the palindromes have almost the
same (or even exactly the same) probability. There is also a way to
avoid considering dependencies.
Approach: the theory of factor avoidance.
Forget about dependence
A word u avoids a word w if w is not a factor of u.
Lemma
$$E(n,k,m) = \sum_{|w|=m,\; w\in P} \Bigl(1 - \frac{A_w(n)}{k^n}\Bigr),$$ where
- E(n, k, m) is the expected number of distinct palindromes of length m
in a random word of length n
- P is the set of all palindromes
- A_w(n) is the number of words of length n avoiding the word w (over a
fixed k-letter alphabet)
Since $E(n,k) = \sum_{m=1}^{n/2} E(n,k,m)$,
all we need is good asymptotics for A_w(n).
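The lemma is linearity of expectation: each palindrome w of length m occurs in a random word with probability 1 − A_w(n)/k^n, whatever the dependencies between these events. The sketch below checks this exactly on tiny inputs, computing A_w(n) by dynamic programming over a prefix-matching automaton and comparing with full enumeration of all k^n words. All function names are illustrative assumptions, and the automaton construction is a naive one chosen for brevity.

```python
from fractions import Fraction
from itertools import product


def avoid_count(w: str, n: int, alphabet: str) -> int:
    """A_w(n): number of words of length n over alphabet that avoid w."""
    m = len(w)

    def step(state: int, c: str) -> int:
        # Longest prefix of w that is a suffix of (matched prefix + c).
        text = w[:state] + c
        return next(l for l in range(len(text), -1, -1) if text.endswith(w[:l]))

    delta = [{c: step(s, c) for c in alphabet} for s in range(m)]
    counts = [1] + [0] * (m - 1)  # counts[s]: words currently in state s
    for _ in range(n):
        nxt = [0] * m
        for s, cnt in enumerate(counts):
            for c in alphabet:
                t = delta[s][c]
                if t < m:  # state m would mean that w has occurred
                    nxt[t] += cnt
        counts = nxt
    return sum(counts)


def palindromes(m: int, alphabet: str):
    # Every palindrome of length m is determined by its first ceil(m/2) letters.
    for left in product(alphabet, repeat=(m + 1) // 2):
        half = "".join(left)
        yield half + half[: m // 2][::-1]


def expected_by_lemma(n: int, m: int, alphabet: str) -> Fraction:
    total_words = len(alphabet) ** n
    return sum(
        1 - Fraction(avoid_count(w, n, alphabet), total_words)
        for w in palindromes(m, alphabet)
    )


def expected_by_enumeration(n: int, m: int, alphabet: str) -> Fraction:
    total = 0
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        found = {word[i:i + m] for i in range(n - m + 1)}
        total += sum(1 for f in found if f == f[::-1])
    return Fraction(total, len(alphabet) ** n)


print(expected_by_lemma(8, 3, "ab"), expected_by_enumeration(8, 3, "ab"))
```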
Number of words avoiding a single factor
- A word u is a border of a word w if u is both a prefix and a suffix of
w (including the case u = w)
- With a word w of length m we associate its border array, which is
a word ŵ[1..m] over {0, 1} such that ŵ[i] = 1 if and only if w has a
border of length m−i+1
- The border array can be interpreted as the array of coefficients of
a real-valued border polynomial f_w(x) such that ŵ[i] is the
coefficient of x^(m−i). Since ŵ[1] = 1, this polynomial has degree
m−1
Example
The word w = aabaabaa has non-empty borders w , aabaa, aa, and a.
- Its border array ŵ equals 10010011.
- Its border polynomial is f_w(x) = x^7 + x^4 + x + 1.
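The example can be reproduced mechanically. The sketch below builds the border array straight from the definition and evaluates the border polynomial by Horner's rule, reading the array as the digit string of f_w(x) in base x. The function names are illustrative assumptions, and the quadratic-time border test is chosen for clarity rather than speed.

```python
def border_array(w: str) -> str:
    # Position i (1-based) holds 1 iff w has a border of length m - i + 1,
    # i.e. iff the prefix and the suffix of that length coincide.
    m = len(w)
    return "".join(
        "1" if w[: m - i + 1] == w[i - 1:] else "0" for i in range(1, m + 1)
    )


def border_polynomial(w: str, x: int) -> int:
    # Horner evaluation: the border array is f_w(x) written in base x.
    value = 0
    for bit in border_array(w):
        value = value * x + int(bit)
    return value


print(border_array("aabaabaa"))
print(border_polynomial("aabaabaa", 2))
```

For x = 2 the value of the polynomial is simply the border array read as a binary number.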
Number of words avoiding a single factor (2)
Recall that f_w is the border polynomial of w.
Theorem (Guibas, Odlyzko, 1978, 1981)
1) The number A_w(n) of words of length n avoiding a given word w of
length m > 3 is
$$A_w(n) = C_w \theta_w^n + O(1.7^n),$$
where
$$\theta_w = k - \frac{1}{f_w(k)} - \frac{f_w'(k)}{f_w^3(k)} - O\Bigl(\frac{m^2}{k^{3m}}\Bigr), \qquad C_w = \frac{1}{1 - (k-\theta_w)^2 f_w'(\theta_w)}.$$
2) The condition f_u(k) < f_w(k) implies A_u(n) ≤ A_w(n) for all n ≥ 0 and,
in particular, θ_u ≤ θ_w.
- f_u(k) < f_w(k) iff û < ŵ as integers written in binary
- W.h.p., if w is a palindrome, then ŵ = 10···0z, where
|z| = O(log |w|)
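Part 1 of the theorem can be probed numerically: the ratio A_w(n+1)/A_w(n) tends to θ_w, so it should be close to k − 1/f_w(k) − f_w'(k)/f_w(k)^3. The sketch below does this for w = aabaabaa over a binary alphabet, counting avoiding words with a naive prefix-matching automaton. The function names, the choice n = 200, and the automaton construction are illustrative assumptions, not the method of the cited papers.

```python
def border_coefficients(w: str) -> list:
    # Entry i is the coefficient of x^(m-1-i): 1 iff w has a border of length m - i.
    m = len(w)
    return [1 if w[: m - i] == w[i:] else 0 for i in range(m)]


def poly_and_derivative(w: str, k: int) -> tuple:
    m = len(w)
    coeffs = border_coefficients(w)
    f = sum(c * k ** (m - 1 - i) for i, c in enumerate(coeffs))
    df = sum(
        c * (m - 1 - i) * k ** (m - 2 - i)
        for i, c in enumerate(coeffs)
        if m - 1 - i >= 1
    )
    return f, df


def avoid_counts(w: str, alphabet: str, n_max: int) -> list:
    """Return [A_w(0), ..., A_w(n_max)] by dynamic programming."""
    m = len(w)

    def step(state: int, c: str) -> int:
        text = w[:state] + c
        return next(l for l in range(len(text), -1, -1) if text.endswith(w[:l]))

    delta = [{c: step(s, c) for c in alphabet} for s in range(m)]
    counts = [1] + [0] * (m - 1)
    result = [1]
    for _ in range(n_max):
        nxt = [0] * m
        for s, cnt in enumerate(counts):
            for c in alphabet:
                t = delta[s][c]
                if t < m:  # reaching state m would complete an occurrence of w
                    nxt[t] += cnt
        counts = nxt
        result.append(sum(counts))
    return result


w, alphabet = "aabaabaa", "ab"
k = len(alphabet)
f, df = poly_and_derivative(w, k)
a = avoid_counts(w, alphabet, 201)
growth = a[201] / a[200]          # empirical growth rate
first_order = k - 1 / f           # k - 1/f_w(k)
second_order = first_order - df / f ** 3
print(growth, first_order, second_order)
```

Adding the term with the derivative brings the prediction noticeably closer to the observed growth rate than the first-order estimate alone.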
Estimating E(n, k)
Since almost all palindromes of length m have “almost equal” border
polynomials, we can derive the following formula:
$$E(n,k,m) = \begin{cases} k^{\varepsilon}\bigl(1 - e^{-1/k^{2\varepsilon}}\bigr)\sqrt{n} + O\Bigl(\frac{\log n}{\sqrt{n}}\Bigr), & m \text{ is even},\\[4pt] k^{\varepsilon}\bigl(1 - e^{-1/k^{2\varepsilon}}\bigr)\sqrt{kn} + O\Bigl(\frac{\log n}{\sqrt{n}}\Bigr), & m \text{ is odd},\end{cases}$$
where m = 2(p_e + ε) = 2(p_o + ε) + 1.
The function g(x) = x(1 − e^(−1/x²)), appearing as the coefficient (for
x = k^ε), has a tricky behaviour. In particular,
- g(1) = 1 − 1/e ≈ 0.6321, but this is not the maximum value!
- max_{x>0} g(x) ≈ 0.6382 is reached at x ≈ 0.8921
- thus, the coefficients suggested by the balls-and-bins approach
were slightly incorrect
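The two numerical constants can be recovered with a few lines of code. The sketch below locates the maximum of g by ternary search, assuming that g is unimodal on the search interval [0.1, 5] (it increases up to a single critical point and decreases afterwards). The helper name `argmax_unimodal` and the interval are illustrative assumptions.

```python
import math


def g(x: float) -> float:
    return x * (1.0 - math.exp(-1.0 / (x * x)))


def argmax_unimodal(f, lo: float, hi: float, iterations: int = 200) -> float:
    # Ternary search: discard the third of the interval that cannot hold the maximum.
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2


x_star = argmax_unimodal(g, 0.1, 5.0)
print(x_star, g(x_star))  # location and value of the maximum
print(g(1.0))             # the balls-and-bins value 1 - 1/e
```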
The final result
Theorem
Let k ≥ 2.
(1) The expected palindromic richness E(n, k) of a random k-ary word
of length n is Θ(√n) as n → ∞ with k fixed.
(2) The ratio E(n, k)/√n has no limit as n → ∞ with k fixed.
(3) The function $\underline{C}(k) = \liminf_{n\to\infty} E(n,k)/\sqrt{n}$ is Θ(1) as k → ∞.
(4) The function $\overline{C}(k) = \limsup_{n\to\infty} E(n,k)/\sqrt{n}$ is Θ(√k) as k → ∞.
(5) $\lim_{k\to\infty} \underline{C}(k) = 3 - 1/e$.
(6) $\lim_{k\to\infty} \overline{C}(k)/\sqrt{k} = \chi$, where χ ≈ 0.6382 is the maximum of the
function f(x) = x(1 − e^(−1/x²)) in the interval (0, ∞).
Some particular values:
$\underline{C}(2) \approx 6.17315$, $\overline{C}(2) = 6.17368$
$\underline{C}(3) \approx 4.40121$, $\overline{C}(3) = 4.41410$
$\underline{C}(10) \approx 3.02693$, $\overline{C}(10) = 3.41133$
$\underline{C}(50) \approx 2.70152$, $\overline{C}(50) = 5.09183$
Squares
Forget about squares. They are much like even-length palindromes.
The only slight difference is in borders, but still, almost all squares
have “almost equal” border polynomials. The rest is easy, because
there are no analogs of odd-length palindromes.
Thank you for your attention!