
DOI: 10.2478/s12175-013-0198-y
Math. Slovaca 64 (2014), No. 1, 229–246
SOME CONSIDERATIONS ON HYDRA GROUPS
AND A NEW BOUND
FOR THE LENGTH OF WORDS
Daniele Ettore Otera* — Francesco G. Russo**
— Vincenzo Russo***
(Communicated by Anatolij Dvurečenskij )
ABSTRACT. After a survey of some recent results of Riley and others on Ackermann functions and Hydra groups, we draw an analogy between DNA sequences,
whose growth is the same as that of Hydra groups, and a musical piece written with the same algorithmic criterion. This is mainly an aesthetic observation,
which emphasizes the importance of the combinatorics of words in two different
contexts. A result of specific mathematical interest is placed at the end, where
we sharpen some previous bounds on deterministic finite automata accepting
languages with hairpins.
© 2014 Mathematical Institute, Slovak Academy of Sciences
1. Hydra groups
The combinatorics of words has grown significantly in recent years, since
it is a powerful tool for the description of several processes in pure and applied
sciences. A classic reference on the topic is [24], which will be used for the basic
notions. We recall that a nonempty set A (which will always be assumed to be
finite) is said to be an alphabet and its elements are called letters. By $A^+$ we
denote the free semigroup generated by A, i.e. the set of all nonempty finite sequences of
letters with the operation of concatenation. Elements of $A^+$ will be called words.
It is well known that each word can be written in a unique way as a sequence
of letters, so it is possible to define for each word w its length |w|:
$$w = a_1 a_2 \cdots a_k, \quad a_i \in A \implies |w| = k \in \mathbb{N}.$$
It is useful (though not necessary) to introduce the formal empty word ε, that
is, a word of length 0 such that
$$\forall\, w \in A^+ \cup \{\varepsilon\}: \quad w\varepsilon = \varepsilon w = w.$$
2010 Mathematics Subject Classification: Primary 57M50; Secondary 68R05, 68Q15.
Keywords: isoperimetric functions, finitely presented groups, lengths of words, counterpoint,
Myhill-Nerode's theorem.
Unauthenticated
Download Date | 6/18/17 4:15 PM
D. E. OTERA — F. G. RUSSO — V. RUSSO
The set $A^* = A^+ \cup \{\varepsilon\}$ is the free monoid generated by A and any of its subsets
is a language. For $k \in \mathbb{N}$, $A^k$, $A^{<k}$, $A^{\le k}$ denote, respectively, the set of all
words of length k, of length strictly smaller than k, and of length at most k. It
is clear that
$$A^{\le k} = A^{<k} \cup A^k = \{\varepsilon\} \cup A \cup A^2 \cup \cdots \cup A^{k-1} \cup A^k.$$
The following notion is crucial in several areas of the theory of semigroups and
of geometric group theory.
Definition 1.1 (See [24: §1.3.1]) A deterministic finite automaton (DFA) is a
tuple $\mathcal{A} = (S, A, s, F, P)$, where:
(i) A is an alphabet,
(ii) S ⊆ A is a nonempty subset of A, called the set of states of $\mathcal{A}$,
(iii) s ∈ S is called the start state of $\mathcal{A}$,
(iv) F ⊆ S is the (nonempty) set of final states of $\mathcal{A}$,
(v) P ⊆ S × A × S is the set of productions of $\mathcal{A}$, with the property that for
each s in S and a in A, the cardinality of ({s} × {a} × S) ∩ P is at most 1,
(vi) for each (s, a, t) ∈ P, s is called the start state of the production, t the final
state and a its label.
A path on the automaton A is a sequence of productions such that the start
state of each one is the final state of the preceding one. Given a path p = p1 . . . pk
we say that p goes from s to t, where s is the start state of the first production
and t is the final state of the last one. A path is said to be accepted by the
automaton A if it goes from the start state to a final state of A. Given an
accepted path, the associated sequence of labels is called a word accepted by A.
As a general fact, note that, if one wants to define explicitly a function from
A∗ to A∗ which respects concatenation (or to study explicitly a DFA), it is enough to give the definition just
on the letters of A (see [24: §1.3.1]).
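The notions of production, path and accepted word can be illustrated by a minimal Python sketch. The class name `DFA`, the encoding of P as a dictionary and the toy automaton below (whose states are named freely, rather than being letters) are assumptions made only for illustration and are not taken from [24].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    states: frozenset
    alphabet: frozenset
    start: str
    finals: frozenset
    # Productions (s, a, t) stored as {(s, a): t}: at most one t for each pair (s, a).
    productions: dict

    def accepts(self, word):
        """Follow the unique path labelled by `word` from the start state."""
        state = self.start
        for letter in word:
            key = (state, letter)
            if key not in self.productions:
                return False  # no production with this start state and label
            state = self.productions[key]
        return state in self.finals


# Hypothetical example: the words over {a, b} whose last letter is b.
ends_with_b = DFA(
    states=frozenset({"s0", "s1"}),
    alphabet=frozenset({"a", "b"}),
    start="s0",
    finals=frozenset({"s1"}),
    productions={("s0", "a"): "s0", ("s0", "b"): "s1",
                 ("s1", "a"): "s0", ("s1", "b"): "s1"},
)
```

Storing P as a dictionary enforces condition (v) by construction, since a key (s, a) can carry only one target state.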
Now, we fix an alphabet on 2 letters and we introduce a very special function,
which will mimic the celebrated battle of Hercules against Hydra.¹ A typical
battle between Hydra and Hercules may be summarized by the following steps:
¹ In Greek mythology Hydra was a monster with several snake heads (from 3 to
9, according to the various sources), which became more numerous whenever they were cut.
Hydra was the offspring of Typhon and Echidna, and was raised by Gea, who put it in Argolid.
Its death was the second mythological labour of Hercules and there are several representations
of this battle in pictures and figurative arts. Today we know that the origin of the myth
was due to some evidence, unknown to the scientists of those years. For instance, zoology
knows small freshwater polyps (named Hydras), belonging to the phylum
Cnidaria, that have a high ability to regenerate their tentacles whenever they are cut.
HYDRA GROUPS AND A NEW BOUND FOR THE LENGTH OF WORDS
(i) Hercules cuts one of the heads of Hydra;
(ii) Hydra regenerates its head with a prescribed rule.
The combinatorics of words allows us to formalize (i) and (ii) as follows.
Definition 1.2 (See [6,12]) Let A = {a, b} be an alphabet on 2 letters, $w \in A^*$
a word of length > 1 and $\rho_1$ the function, from $A^*$ to $A^*$, defined by
$$\rho_1 : \quad a \mapsto ab, \qquad b \mapsto b.$$
The word $\rho_1(w)$ is called a 1-Hydra word on $A^*$, or briefly 1-Hydra. One says
that Hercules cuts w (or fights against w), if A admits a DFA $\mathcal{A}$, whose
initial state is w and final state is the word v equal to w, except for its first
letter (occurring on the left), which is omitted. After finitely many steps in which
Hercules cuts w, $\mathcal{A}$ might produce the empty word ε, and if this happens, one
says that Hercules wins w.
Loosely speaking, Hercules may be compared with an
algorithm which operates on words, and Hydra with a word as in Definition 1.2.
The following example illustrates this (see [6, 12]).
Example 1.3. Assume that Hercules cuts $a^4 = aaaa$. Firstly, the prefix a is
omitted, getting aaa, which regenerates and becomes ababab via $\rho_1$. In the
second step the prefix a is omitted, and one gets from babab the word babbabb
via $\rho_1$ and so on. The following diagram summarizes the battle we are describing:
aaaa
ababab
babbabb
abbbabbb
bbbabbbb
bbabbbbb
babbbbbb
abbbbbbb
bbbbbbb
bbbbbb
bbbbb
bbbb
bbb
bb
b
After $2^4 - 1 = 15$ steps, Hercules wins $a^4$ (i.e., $a^4$ is reduced to the empty word).
Remark 1.4 It is not hard to see, as done in Example 1.3, that Hercules wins
$a^n$ after $2^n - 1$ steps, assuming that the regeneration is always given by $\rho_1$.
For the case of words in two letters, the situation is illustrated below.
Example 1.5. Assume that Hercules fights against the 1-Hydra ab. It is clear
that Hercules wins after only two steps. For the word $a^n b$, the situation is
different and Hercules wins after $2^n$ steps. More generally, Hercules wins the
word $a^n b^m$ after $2^n + m - 1$ steps.
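The counts in Examples 1.3 and 1.5 can be reproduced by a direct simulation. The following Python sketch assumes that one step consists of removing the first letter and then applying $\rho_1$ to what is left, as in Example 1.3; the function names are ours.

```python
RHO_1 = {"a": "ab", "b": "b"}  # regeneration rule of Definition 1.2


def rho1(word):
    """Apply rho_1 letter by letter."""
    return "".join(RHO_1[letter] for letter in word)


def battle(word):
    """Return the successive words of the battle, ending with the empty word."""
    history = [word]
    while word:
        word = rho1(word[1:])  # Hercules cuts the first letter, then Hydra regenerates
        history.append(word)
    return history


def steps_to_win(word):
    """Number of cuts needed to reduce `word` to the empty word."""
    return len(battle(word)) - 1
```

For instance, `battle("aaaa")` starts with aaaa, ababab, babbabb and `steps_to_win("aaaa")` returns 15.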
We have anticipated the following result.
1.6 (See [6, 12]) Given $r_1, s_1, \dots, r_k, s_k, k \in \mathbb{N}$, after finitely many
cuts of Hercules, a 1-Hydra word $a_1^{r_1} b_1^{s_1} a_2^{r_2} b_2^{s_2} \cdots a_k^{r_k} b_k^{s_k}$ becomes the empty word ε. In particular,
the DFA in Definition 1.2 reduces a 1-Hydra word to the empty word (after
finitely many steps).
Sketch of the proof. We give just an idea of the proof with a computational argument on a special case, since the general case follows analogously (for the full proof see [6, 12]). Assume $r_1 = r_2 = \dots = r_k = s_1 = s_2 = \dots = s_k = 2$. We get:
$$\begin{array}{l}
a_1^2 b_1^2 a_2^2 b_2^2 \cdots a_k^2 b_k^2 \\
a_1 a_1 b_1 b_1 a_2 a_2 b_2 b_2 \cdots a_k a_k b_k b_k \\
a_1 b_1 b_1 b_1 a_2 b_2 a_2 b_2 b_2 b_2 \cdots a_k b_k a_k b_k b_k b_k \\
\cdots \\
b_1 b_1 \cdots b_2 b_2 \cdots b_k b_k \cdots b_k \\
\cdots \\
b_2 b_2 \cdots b_k b_k \cdots b_k \\
\cdots \\
b_k b_k \cdots b_k \\
\cdots \\
b_k
\end{array}$$
Now one can generalize 1-Hydra words, considering alphabets with more letters.
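Definition 1.7 Let $A = \{a_0, a_1, \dots, a_k\}$ be an alphabet on k + 1 letters and
define from $A^*$ to $A^*$ the function
$$\rho_k : \quad a_0 \mapsto a_0, \quad a_1 \mapsto a_1 a_0, \quad a_2 \mapsto a_2 a_1, \quad \dots, \quad a_k \mapsto a_k a_{k-1}.$$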
Given $w \in A^*$ (of length > 1), $\rho_k(w)$ is called k-Hydra and $H_k : A^* \to \mathbb{N}$ denotes
the number of steps after which Hercules wins a k-Hydra word. For convenience,
we write $H_k(n) = H_k(a_k^n)$.
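For very small parameters the numbers $H_k(n)$ can be obtained by brute force. In the sketch below a word is encoded as a tuple of indices (i stands for $a_i$), and one step is assumed to be a cut of the first letter followed by $\rho_k$, as in Example 1.3. The names `rho`, `hydra_steps`, `H` and the safety parameter `limit` are our own choices.

```python
def rho(word):
    """rho_k on tuples of indices: a_0 -> a_0 and a_i -> a_i a_{i-1} for i > 0."""
    out = []
    for i in word:
        out.append(i)
        if i > 0:
            out.append(i - 1)
    return tuple(out)


def hydra_steps(word, limit=10 ** 6):
    """Count the cuts needed to reach the empty word, giving up after `limit` steps."""
    steps = 0
    while word:
        if steps >= limit:
            raise RuntimeError("step limit reached")
        word = rho(word[1:])  # cut the first letter, then regenerate
        steps += 1
    return steps


def H(k, n, limit=10 ** 6):
    """H_k(n) = H_k(a_k^n) in the notation of Definition 1.7."""
    return hydra_steps((k,) * n, limit)
```

With this encoding `H(1, n)` reproduces the values $2^n - 1$ of Remark 1.4, while `H(2, 2)` and `H(2, 3)` return 4 and 46. The growth for k = 2 is so fast that larger values of n appear to be out of reach of direct simulation, which is why the step limit is included.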
T. Riley has recently proved the following result.
Proposition 1.8 (See [6, 12]) With the notations of Definition 1.7,
$$H_2(a_1^n) \ge \left. 2^{2^{\cdot^{\cdot^{\cdot^{2}}}}} \right\} n .$$
Proposition 1.8 gives a further proof of what we have seen by computational
arguments in Examples 1.3 and 1.5. We will repeat the proof of the following
result from [12], for an explicit application of the bound of Proposition 1.8.
Theorem 1.9 (See [6, 12]) Hercules always wins a k-Hydra word (on a finite
alphabet) for all k > 0.
Sketch of the proof. Given k, n > 0, we may define inductively
$$A_1(n) := 2n, \qquad A_{k+1}(n) := A_k^{(n)}(1),$$
where $A_k^{(n)}$ denotes the n-fold iterate of $A_k$; this is known in number theory as the Ackermann function (see [6, 12]). Among
its properties, we have that $H_k(n) \ge A_k(n)$, for all k ≥ 3 and n > 0. On the
other hand, for any n, $H_k(n) \le A_k(n + k)$, and the result follows.
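The inductive definition of the functions $A_k$ translates directly into code. The following Python sketch reads $A_k^{(n)}(1)$ as the n-fold iterate of $A_k$ applied to 1, which is our reading of the notation; the function name is ours.

```python
def ackermann(k, n):
    """A_1(n) = 2n and A_{k+1}(n) = A_k applied n times to the value 1."""
    if k == 1:
        return 2 * n
    value = 1
    for _ in range(n):
        value = ackermann(k - 1, value)
    return value
```

Under this reading $A_2(n) = 2^n$ and $A_3(n)$ is a tower of n twos, so `ackermann(3, 4)` is already 65536 and only very small arguments are feasible for k ≥ 4.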
We now have all that is necessary to introduce the following group.
Definition 1.10 (See [6, 12]) The presentation
$$\Gamma_k = \langle a_0, \dots, a_k, p, q, t \mid t^{-1} a_i t = a_i a_{i-1}\ (0 < i \le k),\ t^{-1} a_0 t = a_0,\ [p, a_i t] = 1\ (0 \le i \le k),\ [q, t] = 1 \rangle$$
defines a group called a k-Hydra group (or briefly a Hydra group).
Hydra groups have been the subject of several investigations in recent years, because of
their role in the famous decision problems of M. Dehn (see [26]). The following
result illustrates this: the proof is omitted, because it involves specific techniques
of combinatorial group theory (van Kampen diagrams and generalizations of the
Seifert-van Kampen Theorem, see [17] for background and definitions).
1.11 (See [12]) With the notations of Definition 1.10, the elements
$[p^{a_k^n}, q]$ and $[p, q]$ are conjugate in $\Gamma_k$.
Furthermore, each word w such that
$[p^{a_k^n}, q]\, w = w\, [p, q]$ in $\Gamma_k$ has length $\ge H_k(n)$.
Another interesting point is related to the so-called quadratic Dehn functions
(see [14, 15]), which are very special for Hydra groups. The following questions,
due to Olshanskij, Rips and Sapir, are currently open.²
1.12 What is the structure of (finitely presented) groups with quadratic
Dehn function? Are there (finitely presented) groups with quadratic Dehn
function without decidable conjugacy problem?
In [6, 12, 22] it has been proved that $\Gamma_k$ has Dehn function Dehn(n) such that
$$\lim_{n \to \infty} \frac{\mathrm{Dehn}(n)}{n^2 \log n} = 0$$
and has decidable conjugacy problem. Unfortunately, the general case is hard
to investigate.
Theorem 1.13 (See [28]) There exists a group with $\mathrm{Dehn}(n) = n^2 \log n$ and
undecidable conjugacy problem.
Theorem 1.13 shows some very special examples. Most of the groups which we
know from geometry and topology actually have quadratic Dehn function.
It is instructive to recall here that the infinite dihedral group
$$D_\infty = \langle x, y \mid y^2 = 1,\ y^{-1} x y = x^{-1} \rangle = \langle y \rangle \ltimes \langle x \rangle = C_2 \ltimes C_\infty$$
may be generalized in various ways. For instance, the Baumslag–Solitar groups
are classical generalizations (see [3, 12] but also [29, 32] for similar semidirect
products). A way to construct them is the following. Let $m \neq \pm 1$ be a non-zero
integer and write π for the set of prime divisors of m. Consider the group
$$G = \langle x, a \mid a^m = a^x \rangle = \langle x \rangle \ltimes \mathbb{Q}_\pi,$$
² Given a finitely presented group $\Gamma = \langle X \mid R \rangle$ and the free group F(X) on X, if $w \in F(X)$ is
a freely reduced word such that w = 1 in Γ, then w can be written as
$$w = u_1 r_1 u_1^{-1} \cdots u_m r_m u_m^{-1} \in F(X),$$
where m ≥ 0 and $r_i \in R \cup R^{-1}$ for all i = 1, . . . , m. In this situation, we may define the area
of w with respect to X and R as
$$\mathrm{Area}(w) = \min\{m \ge 0 \mid w \text{ is a product in } F(X) \text{ of } m \text{ conjugates of elements of } R \cup R^{-1}\}.$$
It is not hard to see that Area(w) is the smallest number of 2-cells in a van Kampen diagram (over the given presentation) with boundary cycle labelled by w. We recall that
an isoperimetric function (for a finite presentation) is a monotone non-decreasing function
$f : \mathbb{N} \to [0, \infty[$ such that whenever $w \in F(X)$ is a freely reduced word satisfying w = 1 in Γ,
then Area(w) ≤ f(|w|), where |w| is the length of the word w. The Dehn function (of a finite
presentation) is defined as
$$\mathrm{Dehn}(n) = \max\{\mathrm{Area}(w) \mid w = 1 \text{ in } \Gamma,\ |w| \le n,\ w \text{ is freely reduced}\}.$$
Equivalently, one can see that Dehn(n) is the smallest isoperimetric function (for the given
presentation). Further details can be found in [14, 15, 17, 30].
where the normal closure $a^G \simeq \mathbb{Q}_\pi$ is the additive group of π-adic rationals and
x has infinite order, inducing in $\mathbb{Q}_\pi$ the automorphism $b \mapsto mb$. Roughly speaking, this construction has been done by replacing the role of the normal subgroup $C_\infty$
with $\mathbb{Q}_\pi$ and that of $C_2$ with a finite group acting by scalar multiplication on
$\mathbb{Q}_\pi$. More details can be found in [3, 4, 7, 17, 22].
Further generalizations, which are very interesting in geometric group
theory, are the so-called synchronously bicombable groups, automatic groups, biautomatic groups, and the very large class of hyperbolic groups. These groups
are defined in terms of generators and relations and the words which we obtain
are on alphabets admitting DFAs with specific properties (synchronous bicombability, automaticity, biautomaticity or hyperbolicity). We refer to [7,8,13,22,28]
for precise definitions. Here we only want to emphasize the importance of Hydra
groups by the next remark.
Remark 1.14 Hyperbolic groups, automatic and biautomatic groups have quadratic Dehn function and decidable conjugacy problem, while Hydra groups are
neither hyperbolic, nor (bi)-automatic, but have quadratic Dehn function too.
In view of all we have seen up to now, Riley and others think that the
particular role of Hydra groups, in the general theory of finitely presented groups,
could help us to shed light on the following conjecture, which is still open.
Conjecture 1.15 There are no automatic groups which are not bi-automatic.
2. An application between sequences
of DNA and BACH
In the present section we will deal with the free monoid on the alphabet
{B, A, C, H} and with the free monoid on the alphabet {A, T, G, C}. Here we
want to emphasize that computability is a very powerful tool in the theory of
languages for interactions among different branches of the natural sciences (see
[5, 7, 8, 36]). In molecular biology, for example, the structure of DNA is described by
means of sequences of nucleobases (Adenine, Thymine, Cytosine, Guanine) and
it is possible to look at words on a 4-letter alphabet, for instance {A, T, G, C},
in which prescribed rules are assigned among the links of the 4 nucleobases.
We know for example that Adenine and Thymine can link, the same is true for
Cytosine and Guanine, but this fails for Adenine and Cytosine or for Thymine
and Guanine. Details can be found in [23], but more properly in [1,10,11,18–21],
where the techniques of computer science allow one to measure the influence that
some DNA sequences may have on biological systems.
It is also interesting to note that the theory of languages has applications in
an area of music called harmony, which has been known for more than 400 years and was treated already by J. S. Bach in terms of counterpoint and of musical language. The books
[16, 25, 27, 31, 33] describe some methods and algorithmic criteria to compose a
piece. We only recall the following famous example, related to some combinatorial aspects of an alphabet on 4 letters, in analogy with the case of the DNA.
The theme “BACH” can be read as a sequence of 4 sounds, once we fix the
frequencies for the letters B, A, C and H (Saxon notation) in the central octave
of a piano. Several authors wrote canons, fugues and other musical pieces, after
its introduction by J. S. Bach himself as a theme in “Die Kunst der Fuge”.
He used this theme also in other pieces such as “Variationen über Vom Himmel
hoch”. More details can be found in [2: BWV 769] and [35]. Just as information,
note that similar criteria were applied by F. Schubert and A. Schoenberg in some
of their compositions.
In what follows, we will see that some analogies are possible between the
context of DNA and that of the (musical) counterpoint.
From [23], we know that the following restrictions hold:
• A can link T
• G can link C
• A can not link G
• T can not link C.
Once these elementary rules are satisfied, sophisticated models can be constructed, as shown in [1, 10, 11, 18–21]. We do not give details on this delicate point,
referring to the above papers for a rigorous treatment of the topic.
Today equable (i.e. equal) temperament has been adopted everywhere in Western
Harmony and constitutes the basis of Classical Counterpoint (see [25,
27, 35]). In order to explain what equable temperament is, the notions of
scale and pitch have to be introduced. A sound is characterized in Acoustics
by three features, which are height, intensity and timbre (see [31, 33]). The height
corresponds to the frequency of the sound, so it can be measured in Hertz
via a suitable device (see [25, 31, 33]). The difference of height between two
sounds is said to be a pitch. Since a musical instrument cannot cover infinitely many
frequencies, restrictions on the height of a sound obviously have to be made. This
is a common problem known to any musician. However, the introduction of a
unit of measure for the pitches avoids problems of height. Such a unit is called
a semipitch. A progressive sequence of finitely many semipitches is said to be
a scale. Once the scale is divided into twelve semipitches, where each semipitch
has equal value, we have an equable temperament. Further details can be
found in [25, 27, 31, 33].
The importance of equable temperament has been the subject of investigations by several authors in the History of Music, Harmony and in many
branches of Music (see [25, 27]). It represents a decisive step in order to confirm
the actual rules in Composition, Counterpoint, classification of musical forms,
Acoustics, Aesthetics of Music, Philology of Music, and briefly in the whole conception of contemporary Music. J. S. Bach introduced equable temperament, as testified in [25, 31, 33].
In equable temperament the quavers B, A, C, H have the following restrictions:
• B is distant a pitch from A
• C is distant a pitch from H
• B is not distant a pitch from C
• A is not distant a pitch from H.
Then a treatment as in DNA can be done for the free monoid on the alphabet
{B, A, C, H}. We will see this later on.
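The division of the octave into twelve equal steps and the four restrictions above can be checked numerically. The Python sketch below assumes the reference A = 440 Hz, places the theme around that note, and reads "distant a pitch" as "one equal step (semitone) apart"; all three choices, as well as the names used, are assumptions of this illustration. In the German note names used for the theme, B denotes B flat and H denotes B natural.

```python
def equal_tempered_frequency(semitones_from_a, a_hz=440.0):
    """Frequency in Hz of the note lying the given number of equal steps above A."""
    return a_hz * 2 ** (semitones_from_a / 12)


# Assumed placement of the theme, in equal steps above the reference A.
BACH_STEPS = {"B": 1, "A": 0, "C": 3, "H": 2}


def step_distance(x, y):
    """Number of equal steps between two letters of the theme."""
    return abs(BACH_STEPS[x] - BACH_STEPS[y])
```

Each step multiplies the frequency by $2^{1/12}$, so twelve steps double it. With the placement above, B and A as well as C and H are one step apart, while B and C as well as A and H are two steps apart.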
It is interesting to note what happens when composing in a suitable way with
translations and reflections of the usual Euclidean plane (we are summarizing
some time-frequency diagrams, largely used in [31, 33]). It is in fact possible to
discover a logic of this kind in many pieces of J. S. Bach and in the entire Musical
Offer [2], if we formalize some musical rules in a mathematical language. Introducing a monometric orthogonal reference frame in the usual Euclidean plane,
we can visualize the temporal evolution of a hand which plays the ascending
scale of C major in the central octave of the piano (see [31,33]). Having fixed a unit of
time, for example the quaver, we grade the X axis (duration axis) with integer
multiples of the fixed unit. On the Y axis (height axis) we grade according to the
frequencies of the white keys of the central octave of the piano. Roughly speaking, each sound is characterized by height, intensity and timbre; in particular the
height is an acoustic quantity which is measured in Hertz. The range which corresponds
to the white keys of the central octave of the piano goes from 264 to 520 Hertz
in the sequence C = 264 Hz, D = 297 Hz, E = 330 Hz, F = 352 Hz, G = 396 Hz,
A = 440 Hz, B = 495 Hz, C = 520 Hz. We have just summarized what has been
largely studied in [31, 33]. Then we will have a time-frequency diagram which
has the form of a scale, once we want to simulate a hand which plays the ascending scale of C major in the central octave of the piano. Of course, several
different configurations can be obtained, when the melody is more complicated.
In particular, we can see this with 4 sounds, corresponding to “B A C H” (for
instance in the central octave of the piano).
Lemma 2.1 The free monoids {A, T, G, C}∗ and {B, A, C, H}∗ are isomorphic.
Proof. The map φ : {A, T, G, C}∗ → {B, A, C, H}∗, defined on the letters by
A ↦ φ(A) = B,
T ↦ φ(T) = A,
G ↦ φ(G) = C,
C ↦ φ(C) = H,
can be extended to an isomorphism from the monoid {A, T, G, C}∗ to the monoid
{B, A, C, H}∗. Furthermore, the relations in {A, T, G, C}∗ are respected under
φ in {B, A, C, H}∗. Then the result follows.
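The extension of φ from letters to words can be made explicit in a few lines of Python. The dictionary `PHI` encodes the letter map of the proof; the function names are ours.

```python
PHI = {"A": "B", "T": "A", "G": "C", "C": "H"}
PHI_INVERSE = {image: letter for letter, image in PHI.items()}


def phi(word):
    """Extend the letter map to words over {A, T, G, C} by concatenation."""
    return "".join(PHI[letter] for letter in word)


def phi_inverse(word):
    """Inverse map on words over {B, A, C, H}."""
    return "".join(PHI_INVERSE[letter] for letter in word)
```

Since `phi` acts letter by letter, it sends a concatenation uv to `phi(u) + phi(v)`, and `phi_inverse` undoes it; in particular `phi("ATGC")` is the theme BACH.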
Corollary 2.2 Hercules wins a 1-Hydra word w on the alphabet {A, T, G, C}.
In particular, Hercules cuts w ∈ {A, T, G, C}∗ and, after finitely many steps, w
is equal to only one letter (namely, the last letter which appears on the right in
the initial expression of w). The same is true for any word in {B, A, C, H}∗.
Proof. It suffices to specialize Theorem 1.9 to an alphabet with 4 letters. The rest
is clear from Lemma 2.1.
Now the following consequences are interesting and useful, but we do not give
many details, since we do not have the instruments, which are necessary to make
an appropriate analysis.
Remark 2.3 If the growth of a cancer cell (see for instance the retinoblastoma
in [34]) is the same as that of ρk, then a sequence of DNA is nothing else than a suitable
word w over {A, T, G, C} and ρk determines the growth of w. Now a molecular
attack, as fast as the Ackermann function, will reduce w to a single nucleobase.
This means, in principle, that a cancer growth of type ρk might be stopped,
once we act with this precise velocity on the cells. Unfortunately, computational
evidence is hard to provide in general contexts and we have only data for the
retinoblastoma, which has been recently studied in [34] from this point of view.
Now we will proceed to read Corollary 2.2 from the musical point of view,
that is, replacing {A, T, G, C} with the alphabet {B, A, C, H}.
Remark 2.4 In a time-frequency diagram in which the theme BACH is assigned, it is possible to compose a musical piece, assigning a word on {B, A, C, H}
and ρk as algorithmic criterion.
With the help of Csound [9], we have just given an algorithmic criterion which
allows us to compose a piece of music with the same mathematical model of a
sequence of DNA, by Corollary 2.2.
3. A new bound for words
over alphabets with hairpins
We are going to recall some well known notions, which can be found in [24,36].
A factor of a word $w \in A^*$ is any $w' \in A^*$ such that there exist $v, v' \in A^*$ for
which $w = v w' v'$. If v = ε, w′ is called a prefix of w, while, if v′ = ε, w′ is called
a suffix of w; Fact(w) denotes the set of all factors of w, $\mathrm{Fact}_k(w)$ the set of all
factors of w of length k. As one can expect we may define the following sets:
$$\mathrm{Fact}_{\le k}(w) = \mathrm{Fact}(w) \cap A^{\le k}, \qquad \mathrm{Fact}_{<k}(w) = \mathrm{Fact}(w) \cap A^{<k}.$$
Similarly, we may define Suff(w) for suffixes and Pref(w) for prefixes.
Finally, $\tilde{w}$ denotes the reversal of a word w, i.e.:
$$w = a_1 a_2 \cdots a_{k-1} a_k \implies \tilde{w} = a_k a_{k-1} \cdots a_2 a_1.$$
It is well known that prefix codes are of extreme importance in many branches
of combinatorics. A prefix code is a language different from {ε} which does not
contain any (proper) prefix of its elements, i.e. C is a prefix code if and only if
$$C \setminus CA^+ = C.$$
The height of a prefix code C is the smallest natural k such that $C \subset A^{\le k}$.
From now on, a key role will be played by the following function, which counts
the number of prefix codes of height at most k, namely
$$\varphi : k \in \mathbb{N} \mapsto \varphi(k) = |\{C : C = C \setminus CA^+ \ \&\ C \subset A^{\le k}\}| \in \mathbb{N}. \tag{3.1}$$
3.1 Given an alphabet A of cardinality d, (3.1) can be recursively
defined by $\varphi(0) = 1$ and $\varphi(k) = (\varphi(k-1) + 1)^d$.
Proof. Obviously the only prefix code of height at most 0 is the empty code.
Let now C be a prefix code of height at most k. For each letter $a_i \in A$ we define
the set $C_{a_i} = C \cap a_i A^*$. It is clear that each $C_{a_i}$ is either empty or can be
written as $a_i X$, X being a language. It is easy to prove that in this case X is either {ε} or a nonempty
prefix code of height at most k − 1. Hence there are φ(k − 1) + 1 possibilities for each of the d sets $C_{a_i}$, and this leads to the statement.
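The recursion for φ can be compared with an exhaustive count on very small alphabets. In the following Python sketch the function names and the use of the letters a, b, c, ... for the alphabet are our own choices; the brute-force count excludes the empty word, since a prefix code different from {ε} cannot contain it.

```python
from itertools import combinations, product


def varphi(k, d):
    """phi(0) = 1 and phi(k) = (phi(k - 1) + 1) ** d."""
    value = 1
    for _ in range(k):
        value = (value + 1) ** d
    return value


def count_prefix_codes(k, d):
    """Count by exhaustion the prefix-free sets of nonempty words of length <= k."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:d]
    words = ["".join(t) for n in range(1, k + 1) for t in product(letters, repeat=n)]
    count = 0
    for size in range(len(words) + 1):
        for code in combinations(words, size):
            # A set is prefix-free if no word is a proper prefix of another one.
            if not any(u != v and v.startswith(u) for u in code for v in code):
                count += 1
    return count
```

For d = 2 the recursion gives 1, 4, 25, 676 for k = 0, 1, 2, 3. The exhaustive count is only feasible for such small values, because the number of candidate subsets is doubly exponential in k.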
We are now able to formalize the notion of hairpin. Let θ be an involutory
morphism over $A^*$, i.e. an automorphism of $A^*$ such that $\theta^2 = \mathrm{id}$.
Definition 3.2 For each $k \in \mathbb{N}$, a word $w \in A^*$ is called a (θ, k)-hairpin if
there exist $v, v' \in A^*$ such that $|v| \ge k$ and $w = v v' \theta(\tilde{v})$.
With the above notations we call v (resp. $\theta(\tilde{v})$) the left (resp. right) stem of
the hairpin w and v′ its loop. We shall denote the set of all words containing
a hairpin among their factors by (θ, k)hp, and by (θ, k)hpf its complement in
$A^*$. The following definition is then meaningful.
Definition 3.3
$$(\theta, k)\mathrm{hp} = \{w \in A^* : \exists\, v \in \mathrm{Fact}(w),\ v \text{ is a } (\theta, k)\text{-hairpin}\}$$
$$(\theta, k)\mathrm{hpf} = \{w \in A^* : \mathrm{Fact}(w) \cap (\theta, k)\mathrm{hp} = \emptyset\}$$
An element of (θ, k)hpf is called a (θ, k)-hairpin-free word. It is worth noting
that hairpins are a generalization of palindromes: in fact if w is a palindrome,
then it is a $(\mathrm{id}, \lfloor |w|/2 \rfloor)$-hairpin and its loop is either a letter or the empty word
(actually hairpins are a generalization of an even wider class of words: the so
called pseudo-palindromes, see [24: §2.2.1]).
Our aim is to give an estimate of the number $N_{(\theta,k)}$ of states of the minimal
DFA accepting the language of (θ, k)-hairpin-free words over a finite alphabet
A for any k and θ. As a special case of this result, we will obtain that both
(θ, k)hp and (θ, k)hpf are regular languages.
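Membership in (θ, k)hp is easy to test by machine. The Python sketch below represents θ by a dictionary on letters and uses the observation that a hairpin with stems of length at least k contains, after moving the extra stem letters into the loop, a hairpin with stems of length exactly k. The function names and the Watson-Crick example are assumptions of this illustration.

```python
def theta_of_reversal(word, theta):
    """Apply theta letterwise to the reversal of `word`."""
    return "".join(theta[letter] for letter in reversed(word))


def contains_hairpin(word, theta, k):
    """True if some factor of `word` is v x theta(reversal of v) with |v| >= k."""
    if k == 0:
        return True  # every word is a hairpin with empty stems
    for i in range(len(word) - 2 * k + 1):
        right_stem = theta_of_reversal(word[i:i + k], theta)
        # The right stem must start after the end of the left stem.
        if word.find(right_stem, i + k) != -1:
            return True
    return False


# Hypothetical choice of theta: the Watson-Crick complement on {A, T, G, C}.
WATSON_CRICK = {"A": "T", "T": "A", "C": "G", "G": "C"}
```

For example, with the Watson-Crick complement the word ACGT is a (θ, 2)-hairpin with empty loop, while AAAA contains no (θ, 1)-hairpin.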
Let us fix θ, k and A and let Card(A) = d. We set for all $w \in A^*$
$$C_w = \{v \in A^* \mid wv \in (\theta, k)\mathrm{hp}\}.$$
Definition 3.4 (See [24]) The Nerode equivalence ∼ is defined as follows:
$$w \sim w' \iff C_w = C_{w'}.$$
It is not hard to check that ∼ is actually reflexive, symmetric and transitive.
Since the number of states of the minimal DFA accepting (θ, k)hp equals $N_{(\theta,k)}$
(actually the minimal DFA accepting (θ, k)hpf is the same except for the set of
terminal states), Myhill-Nerode's Theorem (see [5, 24, 36]) implies
$$N_{(\theta,k)} = \mathrm{Card}\left(\frac{A^*}{\sim}\right).$$
Let X be defined as $X = \{C_w\}_{w \in A^*}$. Since the function
$$f : C_w \mapsto [w]_\sim$$
can easily be shown to be bijective, we have that $N_{(\theta,k)} = \mathrm{Card}(X)$.
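For small parameters the number $N_{(\theta,k)}$ can be computed directly. The following Python sketch builds a (not necessarily minimal) complete DFA for (θ, k)hp and then merges equivalent states by Moore's partition refinement; this is a standard technique, swapped in here for illustration, and not the argument used in this paper. The encoding of the states, the function name and the restriction k ≥ 1 are assumptions; θ is given as a dictionary on letters.

```python
def minimal_dfa_size(alphabet, theta, k):
    """Number of Nerode classes of the language (theta, k)hp, for k >= 1."""

    def target(q):
        # theta applied to the reversal of q
        return "".join(theta[c] for c in reversed(q))

    def step(state, c):
        old, recent, found = state
        if found:
            return state  # absorbing accepting state
        window = recent + c  # at most 2k letters
        if len(window) >= 2 * k:
            # this factor of length k now ends k letters before the end
            old = old | {window[-2 * k:-k]}
        if len(window) >= k and target(window[-k:]) in old:
            return (frozenset(), "", True)
        return (frozenset(old), window[-(2 * k - 1):], False)

    start = (frozenset(), "", False)
    states, frontier = {start}, [start]
    while frontier:  # collect the reachable states
        s = frontier.pop()
        for c in alphabet:
            t = step(s, c)
            if t not in states:
                states.add(t)
                frontier.append(t)

    block = {s: int(s[2]) for s in states}  # accepting versus non-accepting
    while True:  # Moore refinement until the partition is stable
        labels = {}
        refined = {}
        for s in states:
            sig = (block[s],) + tuple(block[step(s, c)] for c in alphabet)
            refined[s] = labels.setdefault(sig, len(labels))
        if len(labels) == len(set(block.values())):
            return len(labels)
        block = refined
```

For instance, over {a, b} with θ = id and k = 1 the hairpin-free words are ε, a, b, ab, ba, and the sketch returns 5: the classes of ε, a, b, ab (which coincides with that of ba) and the class of all words containing a hairpin.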
We define the following sets:
X1 = {Cw | w ∈ (θ, k)hp}
X2 = {Cw | w ∈ (θ, k)hpf and |w| < k}
X3 = {Cw | w ∈ (θ, k)hpf and |w| ≥ k}.
Since X1 ∪ X2 ∪ X3 = X, we have that
Card(X) ≤ Card(X1 ) + Card(X2 ) + Card(X3 ),
(3.2)
and we will show, in fact, that the three sets are disjoint, so the equality holds.
3.5 Let w ∈ A∗ . Then
w ∈ (θ, k)hp ⇐⇒ Cw = A∗ .
Proof. It follows clearly from the above definitions.
So we have
$$X_1 = \{A^*\} \implies \mathrm{Card}(X_1) = 1. \tag{3.3}$$
Lemma 3.6 Let $w \in (\theta, k)\mathrm{hpf}$. Then
$$|w| < k \iff \min_{v \in C_w} |v| = 2k - |w| > k.$$
Proof. It is enough to consider w to be a (θ, k)-hairpin. Then |w| ≥ 2k and the
rest follows easily.
We can thus prove the following result.
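Theorem 3.7 Let $w \in (\theta, k)\mathrm{hpf}$ be such that |w| < k. Then
$$[w]_\sim = \{w\}.$$
Proof. Let $v \in [w]_\sim$. Then $C_w = C_v$ and, by Lemma 3.6, |w| = |v|. Let
$x \in C_w \cap A^{2k-|w|}$. Then there exists $y \in A^{k-|w|}$ such that
$$x = y\,\theta(\tilde{y})\,\theta(\tilde{w}) = y\,\theta(\widetilde{wy}),$$
as $wx \in (\theta, k)\mathrm{hp}$ and |wx| = 2k. On the other hand, we have $vx \in (\theta, k)\mathrm{hp}$ and
|vx| = 2k, so
$$vy\,\theta(\widetilde{vy}) = vx = vy\,\theta(\widetilde{wy}) \implies \theta(\widetilde{vy}) = \theta(\widetilde{wy}) \implies v = w.$$
The other inclusion is trivial.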
Observing that, by definition,
$$C_w = C_{w'} \iff [w]_\sim = [w']_\sim,$$
we deduce the next consequence of Theorem 3.7.
Corollary 3.8 Let $C_w, C_{w'} \in X_2$. Then
$$C_w = C_{w'} \iff w = w'.$$
Corollary 3.8 allows us to conclude that
$$\mathrm{Card}(X_2) = \mathrm{Card}(A^{<k}) = \sum_{i=0}^{k-1} d^i = \frac{d^k - 1}{d - 1}. \tag{3.4}$$
We are now going to estimate Card(X3 ). First of all we need an easy lemma.
Lemma 3.9 If $v \in C_w$, then $vv' \in C_w$ for all $v' \in A^*$, that is,
$$X \subseteq C_w \implies XA^* \subseteq C_w.$$
Proof. It is clear that if wv contains a (θ, k)-hairpin, each word containing wv
as a factor also contains a (θ, k)-hairpin.
Theorem 3.10 Let w, w′ be in (θ, k)hpf such that
(i) $\mathrm{Suff}_k(w) = \mathrm{Suff}_k(w')$
(ii) $\mathrm{Fact}_k(w) = \mathrm{Fact}_k(w')$
(iii) $|w|, |w'| \ge k$.
Then
$$C_w = C_{w'} \iff C_w \cap A^{\le k} = C_{w'} \cap A^{\le k}.$$
Proof. One implication is trivial. Conversely, let $v \in C_w$. If |v| = k, then
$v \in C_{w'}$. Proceeding inductively on |v|, let
$$C_w \cap A^{\le n-1} = C_{w'} \cap A^{\le n-1} \quad \text{for } n - 1 > k.$$
We claim that $C_w \cap A^{\le n} = C_{w'} \cap A^{\le n}$.
Let $v \in C_w$ be such that |v| = n. If there exists a $v' \in \mathrm{Pref}_{<n}(v) \cap C_w$, since
$|v'| \le n - 1$, it must also be $v' \in C_{w'}$, and from Lemma 3.9 we have that $v \in C_{w'}$.
Let thus $v \in C_w$ be such that
$$\mathrm{Pref}_{\le n-1}(v) \cap C_w = \emptyset. \tag{3.5}$$
This yields that wv contains exactly one (θ, k)-hairpin: let us call p and q respectively the left and right arm of such (θ, k)-hairpin. Again from (3.5) we have
that |p| = |q| = k and that q is a suffix of v. If p is a factor of w, then it must be,
by hypothesis, a factor of w′, and in this case $v \in C_{w'}$. The last case to consider
is when p is neither a factor of w nor of v (otherwise it must overlap with q). In
this case p can be written as
$$p = \hat{w}\hat{v},$$
$\hat{w}$ being a suffix of w and $\hat{v}$ a prefix of v. Since $|\hat{w}| < k$, by hypothesis $\hat{w}$ must
also be a suffix of w′, and we have that p is a factor of w′v not overlapping with
the suffix q and thus
$$w'v \in (\theta, k)\mathrm{hp} \implies v \in C_{w'}.$$
We have then proved that
$$\forall n \in \mathbb{N} \quad C_w \cap A^{\le n} \subseteq C_{w'} \cap A^{\le n} \implies C_w \subseteq C_{w'}.$$
The other inclusion is analogous.
From Lemma 3.9 we obtain that the function
$$\Phi : C_w \mapsto C_w \setminus C_w A^+$$
is an injection. From this fact, and from Theorem 3.10, we have that given $S = \mathrm{Suff}_k(w)$ and $T = \mathrm{Fact}_k(w) \supseteq S$, each element $C_v \in X_3$ such that $\mathrm{Suff}_k(v) = S$
and $\mathrm{Fact}_k(v) = T$ is uniquely determined by a prefix code of height at most k.
This means that for each of the $d^k$ choices of S and for each choice of T we have
at most φ(k) possible choices of $C_v$, i.e.:
$$\mathrm{Card}(X_3) \le d^k\, 2^{d^k - 1}\, \varphi(k). \tag{3.6}$$
Putting together (3.3), (3.4) and (3.6) we may rewrite (3.2) as
$$N_{(\theta,k)} \le 1 + \frac{d^k - 1}{d - 1} + d^k\, 2^{d^k - 1}\, \varphi(k). \tag{3.7}$$
Now, Theorem 3.10 can be extended to elements of X1 and X2 in the following
weaker way: fix v as the longest element of $\mathrm{Suff}_{\le k}(w)$; then either $v \in \mathrm{Fact}_k(w)$
or $\mathrm{Fact}_k(w) = \emptyset$. If |v| < k, then $C_w$ is uniquely determined by a prefix code of
height smaller than k (the empty code characterizes elements of X1). We obtain
then a simpler, though weaker, inequality:
$$N_{(\theta,k)} \le \frac{d^{k+1} - 1}{d - 1}\, 2^{d^k - 1}\, \varphi(k). \tag{3.8}$$
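To get a feeling for the size of these bounds, the right-hand sides of (3.7) and (3.8) can be evaluated with exact integer arithmetic. The Python sketch below assumes d ≥ 2 and uses the recursion φ(0) = 1, φ(k) = (φ(k − 1) + 1)^d; the function names are ours.

```python
def varphi(k, d):
    """phi(0) = 1 and phi(k) = (phi(k - 1) + 1) ** d."""
    value = 1
    for _ in range(k):
        value = (value + 1) ** d
    return value


def bound_3_7(d, k):
    """Right-hand side of (3.7); (d**k - 1) // (d - 1) is an exact division."""
    return 1 + (d ** k - 1) // (d - 1) + d ** k * 2 ** (d ** k - 1) * varphi(k, d)


def bound_3_8(d, k):
    """Right-hand side of (3.8)."""
    return (d ** (k + 1) - 1) // (d - 1) * 2 ** (d ** k - 1) * varphi(k, d)
```

For d = 2 and k = 1 the two bounds evaluate to 18 and 24, and for d = 4 and k = 1 to 514 and 640.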
Notice that (3.7) and (3.8) seem to be new in the literature and are given here
for the first time, to the best of our knowledge. Minimal automata accepting
hairpin (free) languages have been inspected in several papers, see for instance
[1, 10, 11, 18–21], but, as far as we know, it has always been shown that the best
estimate for $N_{(\theta,k)}$ is $2^{3d^k}$, for non-deterministic automata. Now we will see
that it is actually possible to sharpen it in our case.
Theorem 3.11 $N_{(\theta,k)} \le 2^{3d^k}$ for all d ≥ 3.
Proof. It is clear from the definition of φ that
$$\varphi(k) \le 2^{\frac{d^{k+1} - 1}{d - 1}}.$$
From (3.8) we obtain
$$N_{(\theta,k)} \le 2^{\log_2 \frac{d^{k+1} - 1}{d - 1}} \cdot 2^{d^k - 1} \cdot 2^{\frac{d^{k+1} - 1}{d - 1}}.$$
With simple computations, and defining the function
$$f(d, k) = \frac{(d - 1) \log_2 \frac{d^{k+1} - 1}{d - 1} + 2d^{k+1} - d^k - d}{d - 1},$$
the statement of the theorem is true whenever
$$2^{f(d,k)} \le 2^{3d^k},$$
which is true if and only if
\[ 3d^k(d-1) - 2d^{k+1} + d^k + d - (d-1)\log_2 \frac{d^{k+1}-1}{d-1} \ge 0, \]
that is:
\[ d^{k+1} - 2d^k + d - (d-1)\log_2 \frac{d^{k+1}-1}{d-1} = d(d^k+1) - 2(d^k+1) + 2 - (d-1)\log_2 \frac{d^{k+1}-1}{d-1} \ge 0. \]
The last inequality is clearly true for all d ≥ 3.
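The chain of estimates in the proof can be sanity-checked numerically for small parameters. The following is a minimal sketch, not part of the original argument: it replaces $\phi(k)$ in (3.8) by the upper estimate $2^{(d^{k+1}-1)/(d-1)}$ used in the proof and compares the result with $2^{3d^k}$ in exact integer arithmetic. The function names are hypothetical and chosen only for this illustration.

```python
def geometric_sum(d: int, k: int) -> int:
    """Return 1 + d + ... + d**k, i.e. (d**(k+1) - 1) / (d - 1) as an integer."""
    return sum(d**i for i in range(k + 1))


def rhs_3_8_upper(d: int, k: int) -> int:
    """Right-hand side of (3.8) with phi(k) replaced by its estimate 2**Q.

    Assumption: phi(k) <= 2**Q with Q = (d**(k+1) - 1) / (d - 1), as in the proof.
    """
    q = geometric_sum(d, k)
    return q * 2 ** (d**k - 1) * 2**q


def theorem_bound(d: int, k: int) -> int:
    """The bound 2**(3 * d**k) claimed in Theorem 3.11."""
    return 2 ** (3 * d**k)


def bound_holds(d: int, k: int) -> bool:
    """Check whether the estimate derived from (3.8) stays below 2**(3 * d**k)."""
    return rhs_3_8_upper(d, k) <= theorem_bound(d, k)


if __name__ == "__main__":
    # Small grid of alphabet sizes d and heights k; Python integers are exact.
    for d in range(2, 7):
        for k in range(0, 5):
            print(f"d={d}, k={k}: estimate below 2^(3 d^k)? {bound_holds(d, k)}")
```

On this small grid the check succeeds for every $d$ from 3 to 6 and $k$ from 0 to 4. For $d = 2$ and $k = 2$ the estimate obtained this way is 7168, which exceeds $2^{12} = 4096$; this only says that the present chain of estimates does not give the bound for $d = 2$, in line with the restriction $d \ge 3$ in the statement.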
We note that Theorems 3.10 and 3.11 are far from being optimal, since computational evidence shows that improvements are still possible. In any case, the general arguments we have used are consistent with most of the evidence we have found for alphabets of 2, 3 and 4 letters.
REFERENCES
[1] ADLEMAN, L.—KARI, J.—KARI, L.—REISHUS, D.—SOSIK, P.: The undecidability
of the infinite Ribbon problem: Implications for computing by self-assembly, SIAM J.
Comput. 38 (2009), 2356–2381.
[2] BACH, J. S.: Opera Omnia, Urtext Verlag, Berlin, 2000.
[3] BAUMSLAG, G.: A non-cyclic one-relator group all of whose finite quotients are cyclic,
J. Austral. Math. Soc. 10 (1969), 497–498.
[4] BERNASCONI, A. A.: On HNN-extensions and the complexity of the word problem for
one-relator groups. PhD Thesis, University of Utah, 1994.
[5] BÖRGER, E.: Computability, Complexity, Logic. Stud. Logic Found. Math. 128,
North-Holland Publishing Co., Amsterdam, 1989.
[6] BRADY, N.—DISON, W.—RILEY, T.: Hyperbolic hydra. Preprint,
http://arxiv.org/abs/1105.1535, Cornell University Library, 2011.
[7] CANNON, J. W.—EPSTEIN, D. A.—HOLT, D. F.—LEVY, S. V.—PATERSON, M. S.
—THURSTON, W. P.: Word Processing in Groups, Jones and Bartlett, London, 1992.
[8] COHEN, D. E.: The mathematician who had little wisdom: a story and some mathematics. In: Combinatorial and Geometric Group Theory (Edinburgh, 1993), LMS Lecture
Notes 204, Cambridge University Press, Cambridge, 1995, pp. 56–62.
[9] CSOUNDS, Program of musical synthesis and signal processing system,
http://www.csounds.com, Rome, 2011.
[10] CZEIZLER, EL.—CZEIZLER, EU.—KARI, L.—SALOMAA, K.: On the descriptional
complexity of Watson-Crick automata, Theoret. Comput. Sci. 410 (2009), 3250–3260.
[11] CZEIZLER, EL.—CZEIZLER, EU.—KARI, L.—SEKI, S.: An extension of the
Lyndon–Schützenberger result to pseudoperiodic words, Inform. and Comput. 209 (2011),
717–730.
[12] DISON, W.—RILEY, T.: Hydra groups. Preprint, http://arxiv.org/abs/1002.1945, Cornell University Library, 2010.
[13] FRIEDMAN, H. M.: Long finite sequences, J. Combin. Theory Ser. A 95 (2001),
102–144.
[14] GERSTEN, S. M.: Isodiametric and isoperimetric inequalities in group extensions.
Preprint, University of Utah, 1991.
[15] GERSTEN, S. M.: Isoperimetric and isodiametric functions. In: Geometric Group Theory I (G. Niblo, M. Roller, eds.), LMS Lecture Notes 181, Cambridge University Press,
Cambridge, 1993, pp. 79–96.
[16] HOFSTADTER, D. R.: Gödel, Escher, Bach: an Eternal Golden Braid, Penguin Books,
Harmondsworth, 1981.
[17] KARRASS, A.—MAGNUS, W.—SOLITAR, D.: Combinatorial Group Theory (2nd
ed.), Dover Publications, Mineola, NY, 2004.
[18] KARI, L.—KONSTANTINIDIS, S.—LOSSEVA, E.—SOSIK, P.—THIERRIN, G.:
A formal language analysis of hairpin structures, Fund. Inform. 71 (2006), 453–475.
[19] KARI, L.—KONSTANTINIDIS, S.—SOSIK, P.: On properties of bond-free DNA languages, Theoret. Comput. Sci. 334 (2005), 131–159.
[20] KARI, L.—MAHALINGAM, K.: Watson–Crick bordered words and their syntactic
monoid, Internat. J. Found. Comput. Sci. 19 (2008), 1163–1179.
[21] KARI, L.—SEKI, S.: An improved bound for an extension of Fine and Wilf’s theorem,
and its optimality, Fund. Inform. 101 (2010), 215–236.
[22] KASSABOV, M.—RILEY, T.: The Dehn function of Baumslag’s Metabelian Group.
Preprint, http://arxiv.org/abs/1008.1966, Cornell University Library, 2010.
[23] LEHNINGER, A.: Biochemistry, Worth Publishers, New York, 1976.
[24] LOTHAIRE, M.: Algebraic Combinatorics on Words. Encyclopedia Math. Appl. 90,
Cambridge University Press, Cambridge, 2011.
[25] MAZZOLA, G.: The Topos of Music, Birkhäuser Verlag, Berlin, 2002.
[26] MILLER, C. F.: Decision problems for groups — survey and reflections. In: Algorithms
and Classification in Combinatorial Group Theory, Lect. Workshop, Berkeley/CA (USA)
1989. Math. Sci. Res. Inst. Publ. 23, Cambridge Univ. Press, Cambridge, 1992, pp. 1–59.
[27] NIERHAUS, G.: Algorithmic Composition. Paradigms of Automated Music Generation,
Springer Verlag, Wien, 2009.
[28] OLSHANSKIJ, A. YU.—SAPIR, M. V.: Groups with small Dehn functions and bipartite
chord diagrams, Geom. Funct. Anal. 16 (2006), 1324–1376.
[29] OTERA, D. E.—RUSSO, F. G.: On the wgsc property in some classes of groups,
Mediterr. J. Math. 6 (2009), 501–508.
[30] PLATONOV, A. N.: An isoperimetric function of the Baumslag-Gersten group, Vestnik
Moskov. Univ. Ser. I Mat. Mekh. 3 (2004), 12–17.
[31] RUSSO, F. G.: La Musica Algoritmica e l’Offerta Musicale di J.S.Bach, Delta 3, Grottaminarda (Avellino), 2004.
[32] RUSSO, F. G.: On compact Just-Non-Lie groups, J. Lie Theory 17 (2007), 625–632.
[33] RUSSO, F. G.—SCHETTINO, C.: A simulation of the madrigal n. 1 of the book III of
Carlo Gesualdo da Venosa, Far East J. Math. Sci. (FJMS) 47 (2010), 51–61.
[34] RUSSO, V.: I Prodotti del gene RIZ (Retinoblastoma Interacting Zinc-finger protein)
nel controllo della proliferazione cellulare. MSc Thesis, Second University of Naples,
2010.
[35] SCHWEITZER, A.: G. S. Bach, Il Musicista-Poeta, Suvini Zerboni, Milano, 1952.
[36] TOURLAKIS, G. J.: Computability, Reston Publ. Co. Inc., Reston (Vancouver), 1984.
Received 17. 9. 2011
Accepted 30. 10. 2011
* Laboratoire de Mathématiques
Bâtiment 425
Faculté de Science d’Orsay
Université Paris-Sud 11
F–91405, Orsay Cedex
FRANCE
and
Institute of Mathematics and Informatics
Vilnius University
Akademijos str. 4
LT–08663, Vilnius
LITHUANIA
E-mail : [email protected]
** DEIM
Università degli Studi di Palermo
Viale Delle Scienze, Edificio 8
I–90128, Palermo
ITALY
and
Instituto de Matemática
Universidade Federal do Rio de Janeiro
Ilha do Fundão
21945–970, Rio de Janeiro
BRAZIL
E-mail : [email protected]
*** Dipartimento di Ematologia
Università degli Studi di Napoli Federico II
Via Sergio Pansini 5
I–80131, Naples
ITALY
E-mail : vincenzo russo [email protected]