Properties of infinite words : Recent results

J. Berstel
L.I.T.P.
Université Pierre et Marie Curie
Paris, France
Abstract. In this survey, two problems are considered. The first is to determine conditions on an infinite word such that the set of (finite) words which
are not prefixes is context-free. The second is to describe how the number of
factors of length n in an infinite word grows as a function of n. An application
to a problem concerning semigroups motivates the second problem.
1 Introduction
Recently, there has been an increasing interest in combinatorial properties of words, and
several papers in this volume consider topics in this field. In this paper, I address two
problems on infinite words. The first is concerned with the relation between context-free
languages and infinite words. It is well known that the set of initial factors of an infinite
word is context-free if and only if the word is ultimately periodic. A more interesting
question is to ask when the complement of the set of initial factors is context-free. We call such
an infinite word co-context-free. There are several significant results in this direction, which
are reported in the next section. However, a characterization of these infinite words is still
lacking, as well as a suitable device to construct or to recognize them.
The second subject concerns, paraphrasing Ed. Lucas, "the apparition and repetition
of factors" in an infinite word. For infinite words generated by uniform morphisms, the
classical work of Cobham [9] gives an explicit relationship with number systems. For
nonuniform morphisms, the connections are more subtle. It is quite remarkable that new
investigations on this topic were motivated by a problem in semigroup theory, initiated
by a paper on the Burnside problem for semigroups ([35]).
2 Co-Context-Free Languages
In [27], M. G. Main proved that there exists an infinite language containing only square-free words such that its complement is context-free. This answers negatively one of the
conjectures of [1]. Shortly after, Bucher, Haussler and Main [28] disproved several
other conjectures of [1] and of other authors by considering variations of this language.
A more systematic investigation of the techniques which may help to construct special
languages with context-free complement has been started. In order to report on it, let us
recall some definitions.
Given an alphabet $A$, a function $\alpha : A^* \to A^*$, and a word $w \in A^*$, consider the
sequence
$$w, \alpha(w), \alpha^2(w), \ldots, \alpha^n(w), \ldots$$
Let $\alpha^*(w)$ be the set of these words. The adherence of $\alpha^*(w)$ (or simply of the function $\alpha$
for $w$) is the set of infinite words $x$ such that any initial factor of $x$ is a left factor of
some word in $\alpha^*(w)$. (This is precisely the adherence of $\alpha^*(w)$ in the set of infinite words
equipped with the product topology induced by the discrete topology on the alphabet.)
It has been shown in [10,38] that for a morphism $\alpha$, the adherence of $\alpha^*(w)$ is finite. Of
particular interest is the case where $\alpha$ is a nonerasing function (whenever $x$ is nonempty,
so is $\alpha(x)$), and where $w$ is a proper prefix of $\alpha(w)$, in which case $\alpha$ is called prolongeable
in $w$. If $\alpha$ is prefix-preserving (for all $u, v \neq 1$, $\alpha(uv) = \alpha(u)x$ for some nonempty $x$)
then each $\alpha^n(w)$ is a proper prefix of $\alpha^{n+1}(w)$. In this case, the adherence is just a single
infinite word $x$ which is the limit of the sequence and which also is a fixed point of $\alpha$,
i.e. $x = \alpha(x)$. It is called the infinite word obtained by iterating $\alpha$ on $w$. The condition
on $\alpha$ to be prefix-preserving is of course fulfilled by morphisms, but also by the functions
computed by generalized sequential machines or gsm's (in the sense of Ginsburg [19] or
Eilenberg [15]).
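The iteration just described can be tried out on a small case. The following is a minimal sketch, not taken from the survey: it assumes the usual Fibonacci morphism a -> ab, b -> a (which is prolongeable in the letter a), and the helper names `apply_morphism` and `iterate` are hypothetical.

```python
def apply_morphism(images, word):
    """Apply a morphism, given as a dict letter -> image, to a finite word."""
    return "".join(images[letter] for letter in word)


def iterate(images, w, steps):
    """Return the list [w, alpha(w), alpha^2(w), ..., alpha^steps(w)]."""
    words = [w]
    for _ in range(steps):
        words.append(apply_morphism(images, words[-1]))
    return words


# Assumption: the Fibonacci morphism, used here only as an illustration.
fib = {"a": "ab", "b": "a"}
iterates = iterate(fib, "a", 6)

# Each iterate is a proper prefix of the next one, so the sequence
# converges to a single infinite word, the fixed point of the morphism.
chain_ok = all(
    nxt.startswith(cur) and len(cur) < len(nxt)
    for cur, nxt in zip(iterates, iterates[1:])
)
print(iterates[5], chain_ok)
```

Only finite iterates are computed; the infinite word itself is the limit of the prefix chain.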
Given any infinite word $x$, we consider the set $\mathrm{Pref}(x)$ of (finite) words which are
prefixes of $x$ and the set $\mathrm{Copref}(x)$ which is just the complement of $\mathrm{Pref}(x)$. An infinite word $x$
is called co-context-free if the set $\mathrm{Copref}(x)$ is context-free.
Example. Consider, over the two letters $a$ and $b$, the infinite word
$$x = aba^2ba^3b \cdots a^nb \cdots$$
The set $\mathrm{Copref}(x)$ is closely related to the classical Goldstine language. More precisely,
$$\mathrm{Copref}(x) \cap \{a,b\}^*b = \{a^{n_1}ba^{n_2}b \cdots a^{n_p}b \mid \exists k,\ n_k \neq k\}$$
which is context-free. We observe for later reference that the infinite word $x$ equals, up to
its first letter, the word
$$x' = cba^2ba^3b \cdots a^nb \cdots$$
which is obtained by iterating the morphism $c \mapsto cb$, $a \mapsto a$, $b \mapsto ab$ on $c$. The following
result was proved independently in [6] and [28].
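To make the sets Pref(x) and Copref(x) concrete for the word x of the example (the blocks a^n b for n = 1, 2, 3, ... concatenated), here is a small membership sketch. The function names are hypothetical, and the code only decides membership for a given finite word over {a, b}; it says nothing about context-freeness.

```python
from itertools import count


def prefix_of_x(length):
    """First `length` letters of x = a b a^2 b a^3 b ... (blocks a^n b)."""
    pieces = []
    total = 0
    for n in count(1):
        if total >= length:
            break
        pieces.append("a" * n + "b")
        total += n + 1
    return "".join(pieces)[:length]


def in_pref(word):
    """True if `word` is a prefix of x."""
    return word == prefix_of_x(len(word))


def in_copref(word):
    """True if `word` (over the letters a, b) is not a prefix of x."""
    return not in_pref(word)


# "abab" reads a^1 b a^1 b: its second block has exponent 1 instead of 2,
# so it belongs to Copref(x).
print(in_pref("abaabaaab"), in_copref("abab"))
```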
Theorem. Let $x$ be the infinite word generated by a morphism $\alpha$ which is prolongeable
in some word $w$. Then $x$ is co-context-free.
This result can be strengthened in several directions. First, we observe that the morphism need not be prolongeable.
Theorem ([38]). Any infinite word in the adherence of a morphism for some word $w$ is
co-context-free.
It has also been shown by Terlutte that the intersection of context-free $\mathrm{Copref}$ sets
remains context-free. Call center of a language the set of left factors of its adherence.
Then, using this observation, he gets
Theorem ([38]). The center of a D0L-language is co-context-free.
Of course, it is interesting to get information about the nature of these languages. It
can be shown that all these complements of languages are restricted one-counter languages,
i.e. are recognizable by a pda with a counter and without reset. Thus they are rather low
in the hierarchy of context-free languages. Another question concerns ambiguity. There is
a general result for languages which are coprefixes.
Theorem (Autebert, Flajolet, Gabarró [2]). Let $x$ be an infinite word. Then one, and
only one, of the following properties holds for the language $\mathrm{Copref}(x)$:
(i) it is not context-free;
(ii) it is an inherently ambiguous context-free language;
(iii) it is regular.
This result gives the following corollary for iterated morphisms:
Corollary. Let $x$ be an infinite word defined by iterating a morphism $\alpha$. Then either
$\mathrm{Copref}(x)$ is an inherently ambiguous context-free language, or it is regular. Moreover, it
is decidable, for a given $\alpha$, which of the two cases holds.
It is indeed easily seen that a language $\mathrm{Copref}(x)$ is regular iff $x$ is ultimately periodic.
It has been shown by Harju and Linna [21] and independently by Pansiot [32] that it
is decidable whether an infinite word generated by an iterated morphism is ultimately
periodic. In our example, the language $\mathrm{Copref}(x')$ is inherently ambiguous, whence also
the Goldstine language. This was shown directly by Flajolet in [17,18].
It is rather difficult to prove that a given language $\mathrm{Copref}(x)$ is not context-free.
Indeed, it has been observed by A. Grazon [20] that all these languages satisfy Ogden's
iteration lemma, and by M. G. Main (personal communication) that they also satisfy the
interchange lemma of Ogden, Ross, Winklmann [31]. In some cases, the iteration lemma
of Bader and Moura [4] may help, but a more efficient criterion has been proved by Grazon:
Theorem [20]. Let $x$ be an infinite co-context-free word of the form
$$x = a^{s_1}ba^{s_2}b \cdots a^{s_n}b \cdots$$
for integers $s_n > 0$. Then there exists an integer $K$ such that for all integers $n > 1$
$$s_n < (K+1)! + Kn + K\sum_{i=1}^{n-1} s_i$$
As an example, the word $x = aba^2ba^6b \cdots a^{n!}b \cdots$ is not co-context-free. However, taking
$s_n = 2^n$ gives a co-context-free word.
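The criterion can be explored numerically. The sketch below reads the inequality of the theorem as s_n < (K+1)! + Kn + K(s_1 + ... + s_{n-1}); this reading, and the helper name `first_violation`, are assumptions of the sketch. A finite search over a few values of K is only an illustration, not a proof that no suitable K exists.

```python
from math import factorial


def first_violation(s, K, n_max):
    """Smallest n in 2..n_max where s(n) < (K+1)! + K*n + K*sum_{i<n} s(i)
    fails, or None if the inequality holds for every n up to n_max."""
    partial = s(1)  # running value of s(1) + ... + s(n-1)
    for n in range(2, n_max + 1):
        bound = factorial(K + 1) + K * n + K * partial
        if s(n) >= bound:
            return n
        partial += s(n)
    return None


# Exponents s_n = n!: for each tested K the inequality eventually fails.
violations = {K: first_violation(factorial, K, 40) for K in range(1, 11)}

# Exponents s_n = 2^n: with K = 2 no violation is found up to n = 200.
powers_of_two = first_violation(lambda n: 2 ** n, 2, 200)
print(violations, powers_of_two)
```

For s_n = 2^n and K = 2 the bound equals 2^(n+1) + 2n + 2, which exceeds 2^n for every n, so the search finds no violation, in agreement with the statement that this word is co-context-free.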
Figure 1: Kolakoski's gsm
Iterated morphisms are too restrictive a device to characterize co-context-free infinite
words. We therefore consider gsm's (identifying the machine with the mapping it produces). Consider for example the gsm in figure 1 which generates, by iteration starting
with the letter 2, the infinite word of Kolakoski [26]
kol = 2211212212211...
It is easily seen that kol cannot be generated by a morphism; however, it is co-context-free
in view of the following result:
Theorem [39]. Let $x$ be an infinite word obtained by iterating a gsm on a word where
it is prolongeable. Then $x$ is co-context-free.
The restriction that x is generated starting with a word on which the gsm is prolongeable is essential.
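For the word kol, iteration of a gsm can be simulated directly. Since the figure is not legible here, the transition table below is a hypothetical reconstruction: a two-state machine that alternates between writing 2's and writing 1's, the input letter giving the length of the run written. It is offered as a sketch under that assumption, not as the machine of the figure.

```python
# Assumed transition outputs: (state, input letter) -> output word.
# State 0 writes 2's, state 1 writes 1's; the state flips after each letter.
OUTPUT = {
    (0, "1"): "2",
    (0, "2"): "22",
    (1, "1"): "1",
    (1, "2"): "11",
}


def gsm(word):
    """Image of a finite word under the assumed two-state gsm."""
    state, out = 0, []
    for letter in word:
        out.append(OUTPUT[(state, letter)])
        state = 1 - state
    return "".join(out)


def iterate_gsm(w, steps):
    """Return [w, gsm(w), gsm(gsm(w)), ...] with `steps` applications."""
    words = [w]
    for _ in range(steps):
        words.append(gsm(words[-1]))
    return words


words = iterate_gsm("2", 5)
print(words)
```

Starting from the letter 2, each iterate is a proper prefix of the next, so this assumed machine is prolongeable on 2, and the fifth iterate begins with the prefix of kol displayed above.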
Figure 2: A gsm generating a non co-context-free word
Example [39]. Consider the gsm $\alpha$ given in figure 2. Starting with the word w = c d c ~ ,
one gets $\alpha(w)$ = a d c ~ # , showing that $\alpha$ is not prolongeable on $w$. In fact, none of the
$\alpha^n(w)$ is a prefix of $\alpha^{n+1}(w)$. However, the adherence of $\alpha^*(w)$ is a single infinite word
$$x = aba^2ba^4ba^{16}b \cdots a^{2^{2^{\cdot^{\cdot^{\cdot^{2}}}}}}b \cdots$$
where there are $n$ powers of 2 at the $n$th step. In view of A. Grazon's criterion, this word
is not co-context-free.
On the other hand, there are still co-context-free words which are not generated by this
device. Such an example has recently been given by Autebert and Gabarró.
Theorem [3]. The infinite word
$$x = abca^2ba^2bc \cdots (a^{2^n}b)^{2^n}c \cdots$$
is co-context-free and cannot be obtained by iterating a gsm on a word where it is prolongeable.
However, Terlutte [39] has given a gsm producing x, starting on a word where the gsm
is not prolongeable. Thus, a characterization of co-context-free words is still lacking. For
further discussion and some conjectures, see [3].
3 Weakly permutable monoids
The origin of the theory of the so-called weakly permutable monoids, and the use of
properties of infinite words in this context, is a paper of Restivo and Reutenauer [35]
concerning the Burnside Problem for semigroups. This problem is the following: Given
a finitely generated semigroup $S$, each element of which generates a finite subsemigroup
(i.e., $S$ is torsion), is $S$ finite?
This problem has a negative answer in general, as was shown by Morse and Hedlund [30].
The argument is the following: Consider the set $X$ of cubefree words over a two-letter
alphabet. The set $M = X \cup 0$, with 0 a new element, is made a semigroup by setting
$x \cdot x' = xx'$ if $xx'$ is cubefree, $= 0$ otherwise. Clearly, $M$ is a finitely generated, torsion
semigroup ($x^3 = 0$). Since there are infinitely many cubefree words over a two-letter
alphabet, $M$ is infinite.
In fact, the problem was first raised for groups, for which the answer is also negative
(Golod, Shafarevitch, see [23]). For commutative semigroups, the answer is trivially positive. Restivo and Reutenauer introduce a property which generalizes commutativity, and
they call it the permutation property: A semigroup $S$ is called $n$-permutable if, for any
$s_1, \ldots, s_n$ in $S$, there exists some permutation $\sigma$ of $\{1, \ldots, n\}$, $\sigma \neq \mathrm{id}$, such that
$$s_1 \cdots s_n = s_{\sigma(1)} \cdots s_{\sigma(n)}$$
A semigroup is permutable if it is $n$-permutable for some $n \geq 2$. Note that for $n = 2$, this
is commutativity.
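On a finite semigroup the permutation property can be checked by exhaustive search. The sketch below is only an illustration of the definition: the function name `is_n_permutable` and the two toy semigroups (addition modulo 3, and a two-element left-zero semigroup with product xy = x) are choices made here, not examples from the survey.

```python
from itertools import permutations, product


def is_n_permutable(elements, mul, n):
    """Brute-force test of the n-permutation property on a finite semigroup
    given by its element list and an associative product `mul`."""
    def multiply_all(seq):
        result = seq[0]
        for s in seq[1:]:
            result = mul(result, s)
        return result

    identity = tuple(range(n))
    others = [p for p in permutations(range(n)) if p != identity]
    for tup in product(elements, repeat=n):
        target = multiply_all(tup)
        # Some non-identity rearrangement must give the same product.
        if not any(multiply_all([tup[i] for i in p]) == target for p in others):
            return False
    return True


def add_mod3(x, y):
    return (x + y) % 3


def left_zero(x, y):
    return x


print(is_n_permutable([0, 1, 2], add_mod3, 2))   # commutative case
print(is_n_permutable(["p", "q"], left_zero, 2))
print(is_n_permutable(["p", "q"], left_zero, 3))
```

The left-zero semigroup is not commutative, yet it is 3-permutable: exchanging the last two factors never changes a product, since the product is always the first factor.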
Theorem [35]. A finitely generated, permutable, torsion semigroup is finite.
For permutable groups, the structure is rather well known:
Theorem [11,12]. A group is permutable iff it is finite-by-abelian-by-finite (i.e. it has
a normal subgroup of finite index such that its derived subgroup is finite).
In [7], Blyth introduces an interesting extension of permutability. A semigroup $S$ is
called $n$-weakly permutable (Blyth says $n$-rewritable) if, for any elements $s_1, \ldots, s_n$ in $S$,
there exist two permutations $\sigma$ and $\tau$ of $\{1, \ldots, n\}$, $\sigma \neq \tau$, such that
$$s_{\sigma(1)} \cdots s_{\sigma(n)} = s_{\tau(1)} \cdots s_{\tau(n)}$$
A semigroup is weakly permutable if it is $n$-weakly permutable for some $n \geq 2$. Note that
for $n = 2$, this is the same as permutable. Blyth proves the following result.
Theorem [7]. A group is permutable iff it is weakly permutable.
The question whether this equivalence also holds for semigroups was considered by
Restivo. He gave the following counter-example.
Theorem [34]. The Fibonacci semigroup is weakly permutable and infinite.
The construction is nice, and goes as follows. For any infinite word $x$, let $F(x)$ be the set
of factors of $x$, and consider the set $M(x) = F(x) \cup \{0\}$, where 0 is a new element. As
above, we turn $M(x)$ into a semigroup by setting, for $u, u' \in M(x)$,
$$u \cdot u' = \begin{cases} uu' & \text{if } uu' \in F(x) \\ 0 & \text{otherwise} \end{cases}$$
Observe that $M(x)$ is just the Rees quotient of the free semigroup over the alphabet of $x$
by the two-sided ideal of those words that are not in $F(x)$.
Restivo considers the Fibonacci semigroup $M(f)$ obtained by this construction for the
infinite Fibonacci word $f = abaababaabaab \cdots$. This semigroup is torsion since Karhumäki
[25] showed that no factor of $f$ contains a fourth power. One basic argument in Restivo's
proof is to show that for any $u \in F(f)$, the number of words $v$ such that $|u| = |v|$ and
$uv \in F(f)$ is bounded by a constant which is independent of the length of $u$. This means
that if you consider the tree of factors of $f$, then the number of nodes of outdegree 2 in a
slab of length $n$ and with root at distance $n$ is uniformly bounded.
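The product of M(f) can be experimented with on short words. The sketch below assumes that f is the fixed point of the morphism a -> ab, b -> a, and it tests factor membership against a finite prefix of f, which is reliable only for words much shorter than that prefix. The names `ZERO`, `is_factor` and `mul` are hypothetical.

```python
ZERO = 0  # the adjoined zero element of M(f)


def fibonacci_prefix(steps):
    """Prefix of f obtained by applying a -> ab, b -> a `steps` times to a."""
    w = "a"
    for _ in range(steps):
        w = "".join("ab" if c == "a" else "a" for c in w)
    return w


F_PREFIX = fibonacci_prefix(15)  # 1597 letters


def is_factor(u):
    """Approximate test for u in F(f); trustworthy for short words only."""
    return u in F_PREFIX


def mul(u, v):
    """Product in M(f): concatenation if it is a factor of f, else zero."""
    if u == ZERO or v == ZERO:
        return ZERO
    uv = u + v
    return uv if is_factor(uv) else ZERO


print(mul("aba", "b"), mul("b", "b"), mul(mul("a", "a"), "a"))
```

The last product illustrates torsion on one element: a times a is the factor aa, but the third power is 0 because aaa does not occur in f.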
In fact, the proof of Restivo's theorem shows that the Fibonacci semigroup is 5-weakly
permutable. One may look for the smallest value of n for which there is an infinite torsion
n-weakly permutable semigroup. Restivo's bound was improved to 4 by A. de Luca and
S. Varricchio :
Theorem [13]. The Thue-Morse semigroup $M(t)$ is 4-weakly permutable and infinite.
The final value was contributed by J. Justin and G. Pirillo. They prove the following:
Theorem [24]. The semigroup $M(s)$, where $s$ is the infinite word generated by the
morphism $a \mapsto aab$, $b \mapsto abb$, is 3-weakly permutable and infinite.
A systematic study of weakly permutable semigroups is ongoing. In the case of semigroups
defined by an infinite word, one general result is due to P. Mignosi. He proves the following:
Theorem [29]. Let $f_x(n)$ be the number of factors of length $n$ in the infinite word $x$.
If $f_x$ grows at most linearly, then $M(x)$ is weakly permutable.
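The function counting factors of length n can be estimated from a long prefix. The sketch below does this for the Fibonacci word, assumed here to be the fixed point of a -> ab, b -> a; the helper names are hypothetical, and counting in a finite prefix gives the exact value only when the prefix is long enough to contain every factor of the lengths considered.

```python
def fixed_point_prefix(images, letter, min_length):
    """Iterate a prolongeable morphism on `letter` until the word has at
    least `min_length` letters."""
    w = letter
    while len(w) < min_length:
        w = "".join(images[c] for c in w)
    return w


def factor_count(word, n):
    """Number of distinct factors of length n occurring in a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


fib = fixed_point_prefix({"a": "ab", "b": "a"}, "a", 2000)
counts = [factor_count(fib, n) for n in range(1, 11)]
print(counts)
```

For the Fibonacci word the count is n + 1 for each n, a classical instance of linear growth.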
This has been slightly extended by de Luca and Varricchio [14]. They show that the
same conclusion holds for factorial languages whose growth function (i.e. the number of
words of length at most n) is quadratically upper bounded.
The proofs of both Restivo's and de Luca and Varricchio's theorems, which give explicit examples
of weakly permutable infinite semigroups, use the concept of what has been called a special
factor in [5]. This is a factor of an infinite word which can be extended to the right in at
least two different ways into a factor of the word. For instance, $aba$ is a special factor
of $f$ since $abab$ and $abaa$ are factors of $f$. By contrast, $b$ is not a special factor, since
$bb$ is not a factor of $f$. Special factors of $f$ are characterized in [5]. In [13], de Luca and
Varricchio give a complete characterization of the special factors of the Thue-Morse word.
They show that for each integer $n$, there are either 2 or 4 special factors of length $n$, and
they give the form of these factors. We just quote the following result:
Theorem [13]. Let $\varphi_t(n)$ be the number of special factors of length $n$ in the Thue-Morse
word $t$. Then $\varphi_t(n) \in \{2,4\}$, and
$$\varphi_t(n) = \begin{cases} 2 & \text{if } n \in [2^k + 2^{k-1} + 1, \ldots, 2^{k+1}] \\ 4 & \text{if } n \in [2^k + 1, \ldots, 2^k + 2^{k-1}] \end{cases}$$
where $k = \lfloor \log_2(n - 1) \rfloor$.
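The count of special factors can be compared with the stated formula on a finite prefix. The sketch below assumes that t is generated by the morphism a -> ab, b -> ba starting from a, reads the logarithm in the formula as base 2, and restricts itself to lengths n from 3 to 16, where both intervals of the formula are non-empty. All function names are hypothetical.

```python
def thue_morse_prefix(k):
    """Prefix of length 2^k of the Thue-Morse word over {a, b}."""
    w = "a"
    for _ in range(k):
        w = "".join("ab" if c == "a" else "ba" for c in w)
    return w


def factors(word, n):
    """Set of factors of length n of a finite word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def special_count(word, n):
    """Number of factors u of length n such that both ua and ub occur."""
    longer = factors(word, n + 1)
    return sum(1 for u in factors(word, n)
               if u + "a" in longer and u + "b" in longer)


def formula(n):
    """Value given by the theorem, with k = floor(log2(n - 1)), n >= 3."""
    k = (n - 1).bit_length() - 1
    return 4 if 2 ** k + 1 <= n <= 2 ** k + 2 ** (k - 1) else 2


t = thue_morse_prefix(12)
observed = [special_count(t, n) for n in range(3, 17)]
predicted = [formula(n) for n in range(3, 17)]
print(observed == predicted, observed)
```

The prefix of length 4096 is assumed long enough to contain every factor of length at most 17, which is what makes the finite count meaningful for this range.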
This explicit formula also follows from the computations of S. Brlek [8]. It is a special
case of a nice theorem of Tapsoba [37] who gives an explicit description of the number of
special factors for a large class of words generated by morphisms. Observe that the set
of integers $n$ such that, say, $\varphi_t(n) = 2$ in the previous theorem, when written in binary
expansion, is a recognizable language. The same holds of course for the set of integers $n$
with $\varphi_t(n) = 4$. Tapsoba observed that this situation is generic in the following sense.
Theorem [37]. Let $x$ be an infinite word generated by a uniform morphism $\alpha$ of modulus
$k$. Assume that $\alpha$ is injective and that $x$ is minimal. Denote by $\varphi_x(n)$ the number of special
factors of length $n$ in $x$. Then $\varphi_x$ takes only a finite number of distinct values, and for
each integer $p$, the set of $k$-ary expansions of numbers in $\varphi_x^{-1}(p)$ is a recognizable language.
A word $x$ is called minimal if the associated symbolic system is minimal, or equivalently,
if it is almost periodic, i.e. if for each factor $u$ of $x$, there is an integer $l$ such that $u$ is a
factor of any factor of length $l$ of $x$. Tapsoba's proof is effective.
The converse of the theorem is of course not true, since for instance for Sturmian words
(see e.g. [22]), the function $\varphi$ always has value 1. However, there are strong relations
with number systems also for non-uniform morphisms, as shown by Shallit ([36] and this
volume) and Rauzy [33]. Concerning the finiteness of the function $\varphi$, there is some partial
information available, depending on the nature of the infinite word.
Theorem [13]. If $x$ is a $p$-powerfree infinite word for some $p > 1$, and if $f_x$ grows at
most linearly, then $\varphi_x$ takes only a finite number of values.
Recall that a word is $p$-powerfree if it contains no factor of the form $u^p$ for some
nonempty word $u$. There is an interesting special case of the previous result.
Theorem [13]. Let $x$ be an infinite $p$-powerfree almost periodic word generated by
iterating some morphism. Then $\varphi_x$ takes only a finite number of values.
This results from a theorem of Ehrenfeucht and Rozenberg [16] stating that for an
infinite $p$-powerfree almost periodic word $x$ generated by some morphism, the function $f_x$
grows at most linearly.
References
[1] J. M. Autebert, J. Beauquier, L. Boasson, M. Nivat, Quelques problèmes ouverts en théorie des langages, RAIRO Informatique théorique 13 (1979), 363-379.
[2] J. M. Autebert, P. Flajolet, J. Gabarró, Prefixes of infinite words and ambiguous context-free languages, Inform. Proc. Letters 25 (1987), 211-216.
[3] J. M. Autebert, J. Gabarró, Iterated gsm's and co-cfl, Acta Informatica, to appear.
[4] C. Bader, A. Moura, A generalization of Ogden's lemma, J. Assoc. Comput. Mach. 29 (1982), 404-407.
[5] J. Berstel, Mots de Fibonacci, Séminaire d'informatique théorique, LITP, Paris, Année 1980/81, 57-78.
[6] J. Berstel, Every iterated morphism yields a co-cfl, Inform. Proc. Letters 22 (1986), 7-9.
[7] R. D. Blyth, Rewriting products of group elements I, J. Algebra 116 (1988), 506-521.
[8] S. Brlek, Enumeration of factors in the Thue-Morse word, Discr. Appl. Math., to appear.
[9] A. Cobham, Uniform tag sequences, Math. Systems Theory 6 (1972), 164-192.
[10] K. Culik II, A. Salomaa, On infinite words obtained by iterating morphisms, Theoret. Comput. Sci. 19 (1982), 29-38.
[11] M. Curzio, P. Longobardi, M. Maj, Su di un problema combinatorio in teoria dei gruppi, Atti Acc. Lincei Rend. Fis. VIII 74 (1983), 136-142.
[12] M. Curzio, P. Longobardi, M. Maj, D. J. S. Robinson, A permutational property of groups, Arch. Math. 44 (1985), 385-389.
[13] A. de Luca, S. Varricchio, Some combinatorial properties of the Thue-Morse sequence and a problem in semigroups, Theoret. Comput. Sci., to appear.
[14] A. de Luca, S. Varricchio, Factorial languages whose growth function is quadratically upper bounded, Inform. Proc. Letters, to appear.
[15] S. Eilenberg, Automata, Languages, and Machines, Vol. A, Academic Press, 1974.
[16] A. Ehrenfeucht, G. Rozenberg, On the subword complexity of D0L languages with a constant distribution, Inform. Proc. Letters 16 (1981), 25-32.
[17] P. Flajolet, Ambiguity and transcendence, in: Proc. 12th ICALP, Lecture Notes Comput. Sci. 194 (1985), 179-188.
[18] P. Flajolet, Analytic models and ambiguity of context-free languages, Theoret. Comput. Sci. 49 (1987), 283-309.
[19] S. Ginsburg, The Mathematical Theory of Context-Free Languages, McGraw Hill, 1966.
[20] A. Grazon, An infinite word language which is not co-CFL, Inform. Proc. Letters 24 (1987), 81-86.
[21] T. Harju, M. Linna, On the periodicity of morphisms in free monoids, Theoret. Inform. Appl. 20 (1986), 47-54.
[22] G. A. Hedlund, M. Morse, Symbolic dynamics I, II, Amer. J. Math. 60 (1938), 815-866, 62 (1940), 1-42.
[23] N. Herstein, Noncommutative Rings, Carus Math. Monographs, Math. Assoc. Amer., 1969.
[24] J. Justin, G. Pirillo, Infinite words and permutation properties, Semigroup Forum, to appear.
[25] J. Karhumäki, On cube-free ω-words generated by binary morphisms, Discr. Appl. Math. 5 (1983), 279-297.
[26] W. Kolakoski, Self generating runs, problem 5304, American Math. Monthly 71 (1965), solution by N. Ucoluk, same journal 73 (1966), 681-682.
[27] M. G. Main, An infinite square-free co-CFL, Inform. Proc. Letters 20 (1985), 105-107.
[28] M. G. Main, W. Bucher, D. Haussler, Application of an infinite square-free co-CFL, in: Proc. 12th ICALP, Lecture Notes Comput. Sci. 194 (1985), 404-412, also Theoret. Comput. Sci. 49 (1987), 113-119.
[29] F. Mignosi, Infinite words with linear subword complexity, Theoret. Comput. Sci., to appear.
[30] M. Morse, G. Hedlund, Unending chess, symbolic dynamics, and a problem in semigroups, Duke Math. J. 11 (1944), 1-7.
[31] W. Ogden, R. Ross, K. Winklmann, An "interchange lemma" for context-free languages, SIAM J. Comput. 14 (1985), 410-415.
[32] J.-J. Pansiot, Decidability of periodicity for infinite words, Theoret. Inform. Appl. 20 (1986), 43-46.
[33] G. Rauzy, Sequences defined by iterated morphisms, in: Workshop on Sequences (R. Capocelli ed.), Lecture Notes Comput. Sci., to appear.
[34] A. Restivo, Permutation properties and the Fibonacci semigroup, Semigroup Forum, to appear.
[35] A. Restivo, C. Reutenauer, On the Burnside problem for semigroups, J. Algebra 89 (1984), 102-104.
[36] J. Shallit, A generalization of automatic sequences, Theoret. Comput. Sci. 61 (1988), 1-16.
[37] T. Tapsoba, Complexité de suites automatiques, Thèse 3e cycle, Université d'Aix-Marseille II, 1987.
[38] A. Terlutte, Sur le centre de D0L-langages, Theoret. Inform. Appl. 21 (1987), 137-146.
[39] A. Terlutte, Contribution à l'étude des langages engendrés par des morphismes itérés, Thèse, Université de Lille I, 1988.