Applied Mathematical Sciences, Vol. 8, 2014, no. 127, 6325 - 6339
HIKARI Ltd, www.m-hikari.com
http://dx.doi.org/10.12988/ams.2014.48660
Prefixes, Suffixes and Substrings in a Multilanguage:
A Perspective
Dasharath Singh
(Former Professor: Indian Institute of Technology Bombay)
Corresponding author
Mathematics Department, Ahmadu Bello University Zaria, Nigeria
Ahmed Ibrahim Isah
Mathematics Department, Ahmadu Bello University Zaria, Nigeria
Copyright © 2014 Dasharath Singh and Ahmed Ibrahim Isah. This is an open access article distributed
under the Creative Commons Attribution License, which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Abstract
In this paper, prefix, suffix and substring relations are described in a Multilanguage. It is
shown that prefix and suffix relations give rise to a relational structure of a linear
multiset order, and substring relations determine only a partial multiset order. The
distinctive features of these relations in multilanguage vs. standard formal language are
identified by using their graphical and Hasse diagram representations. Finally, certain
monoids of these relations and their homomorphisms are described.
Mathematics Subject Classification: 68Q45, 94A45, 03E02
Keywords: String, Prefix, Suffix, Substring, Monoid, Multilanguage
1 Introduction
Gottlob Fregeβs formulation of the first-order logic in 1879 is known to be the first
description of the structure of a formal language. It was immensely augmented by
6326
Dasharath Singh and Ahmed Ibrahim Isah
Hilbert, Alan Turing and Emil Post. In the mid-1950s, Noam Chomskyβs endeavor to
formulate a general theory of the syntax of natural languages made a great impact on
the development of formal language theory. In the 1960s and 1970s, particularly due to
the advent of electronic computers, much of the foundations for the theory of formal
languages were erected in order to describe the process linked with the use of computers
and communication devices. All forms of information (whether numbers, names,
pictures, or sound waves) could be represented as strings and a collection of strings
could be structured to represent a language became central to diverse applications of
formal languages in computer science and linguistics, and more recently, in soft
sciences such as biology and economics. In fact, the study of stringology has come to
occupy the core of the formal language theory.
On the other hand, the fact that many query languages and database languages do
require multiset-based semantics, the study of stringology needs to be extended to
multilanguages.
This paper attempts to describe prefix, suffix and substring relations and their properties
in a multilanguage. In particular, monoids of these relations and their homomorphisms
are described.
2 Some Basics of Standard Formal Language Theory
Most of the definitions given here are standard and mimicked from various sources,
particularly from [1] and those listed in the references.
Basically, a language πΏ is defined as a set of strings over a fixed alphabet Ξ£. An
alphabet is a nonempty set of distinguishable symbols, called terminal symbols,
sometimes denoted π΄π . A language deployed to generate strings of another language
over an alphabet Ξ£ is called metalanguage often denoted π΄π , and it is supposed to
consist of a set of syntactic classes or variables called nonterminal symbols. An
element of an alphabet is called a letter or a character. An alphabet need not be finite or
even countably infinite. However, for our purposes here, we shall assume it to be finite.
A string is a finite ordered set of symbols from the alphabet. A string is also called a
(finite) sequence, a word, or a sentence depending on its nature. A string consisting of π
symbols (π > 0) is said to be a string of length π. In the event, an empty string (i.e., a
string of length 0), denoted ο₯, ο¬, ο, or 1, is admitted, the system can have strings of
length π β₯ 0.
Strings can also be defined as functions [1]. For any integer π β₯ 1, let [π] =
{1, 2, β¦ , π}, and for π = 0, let [0] = β
.
A string π’ of length π over a fixed Ξ£ is a function
Prefixes, suffixes and substrings in a Multilanguage
6327
π’: [π] βΆ Ξ£.
When π = 0, the special string π’: [0] βΆ Ξ£, of length 0 (having no symbol), is called
the empty or null string.
Given a string π’: [π] βΆ Ξ£ of length π β₯ 1, let π’(π) denote the π π‘β letter in the string π’.
A string π’ of length π can also be represented as
π’ = π’1 π’2 β¦ π’π , π’π β Ξ£, π = 1, π.
[5]
For example, let Ξ£ = {π, π} and π’:
βΆ Ξ£ be defined such that π’(1) = π, π’(2) = π,
π’(3) = π, π’(4) = π and π’(5) = π, then
π’ = πππππ.
Let Ξ£ οͺ denote the set of all strings and Ξ£ + = Ξ£ οͺ β {ο₯}, denote the set of all nonempty
strings over Ξ£. A formal language (or simply a language) πΏ is a subset of Ξ£ οͺ i.e., πΏ β
Ξ£οͺ .
A language can also be considered as a subset of the free monoid on an alphabet;
however, as the language consisting of a free monoid is too large, it is not found
suitable for practical applications.
Following [1], we describe below two basic operations which are central to both
standard formal languages and multilanguages.
Concatenation
Let π’: [π] βΆ Ξ£ and π£: [π] βΆ Ξ£ be two strings over Ξ£, then concatenation of π’ and π£,
denoted π’. π£ or π’π£, is the string π’π£: [π + π] βΆ Ξ£, defined as
π’(π)
if 1 β€ π β€ π,
u v (i) =
π£(π β π) if π + 1 β€ π β€ π + π.
It is immediate to see that the following holds:
π’π = ππ’ = π’ and π’(π£π€) = (π’π£)π€.
In other words, concatenation is a binary operation on Ξ£ οͺ which is associative and has π
as an identity. Thus, (Ξ£ οͺ , ., π) or (Ξ£ οͺ , . ) is a monoid, and (Ξ£ + , . ) is a (free)
semigroup generated by Ξ£.
Note that concatenation, in general, is not commutative i.e., π’π£ β π£π’. For example,
let π’ = πππ, π£ = ππ over Ζ© = {π, π}, then π’π£ β π£π’.
Powers of strings
Given a string π’ β Ξ£ οͺ and π β₯ 0, π’π is defined as follows:
6328
Dasharath Singh and Ahmed Ibrahim Isah
π
if π = 0,
π’πβ1 π’
if π β₯ 1.
un =
Clearly, π’1 = π’, π π = π, and π’π π’ = π’π’π , for all π β₯ 0.
For example, let Ζ© = {π, π, π}, π’ = ππππ, π£ = πππ, π€ = ππππ, π₯ = π, π¦ = π; then
(π’π£)π€ = π’(π£π€) = πππππππππππ, π’2 = ππππππππ, π₯ 6 = ππππππ, π₯π¦ = π₯ = π, etc.
It is known that, in general, uncountably infinite amount of languages exist over any
alphabet. However, in the case of a formal language theory, as it is defined by a
grammar or by an automaton, and as such operating in a natural language, at most
countably infinite amount of languages can exist.
In view of enormous applications of formal languages, it is not surprising that a vast
amount of literature is available on the topic (see 1, 2, 3, 4, 5, for details).
3 Multilanguages
Knuth, in the summer of 1964, while writing the book entitled The Art of Computer
Programming, drafted a chapter on The theory of languages where he emphasized the
need to keep a track of the multiplicity of strings in the language so that a string would
appear several times if there were several ways to parse it. This was recognized quite
natural from a programmerβs point of view because transformations on context-free
grammar (its terminal strings form multisets if it is non-circular) were found most
useful in practice, especially when they yielded isomorphisms between parse trees. In
fact, Knuth [6] is the first place where an elaborate description of multilanguages
appeared. Knuth defines a multilanguage simply as a multiset of strings.
3.1 Basics of a Multilanguage
In the following, we briefly describe multisets, multiset relations, and lexicographic
ordering on multisets. Further, characterization of prefix, suffix and substring relations
in a multilanguage is described which would help studying stringology in a multisetbased environment.
Prefixes, suffixes and substrings in a Multilanguage
6329
Multisets
A Multiset (mset, for short) is an unordered collection of objects in which, unlike an
ordinary set, duplicates or multiples of objects are admitted. For example, a string
stripped of its ordering is a multiset.
An mset containing one occurrence of π, two occurrences of π, and three occurrences
of π, is variably represented as [[π, π, π, π, π, π]] or [π, π, π, π, π, π] or [π, π, π]1,2,3 or
[π, 2π, 3π] or [π. 1, π. 2, π. 3] or [1/π, 2/π, 3/π] or [π1 , π 2 , π 3 ] or [π1 π 2 π 3 ]. For
convenience, the curly brackets are also used in place of the square brackets. In fact the
last form of representation as string, even without using any bracket, turn out to be the
most compact one, especially in computational parlance. In general, if π₯1 appears π1
times, π₯2 appears π2 times, β¦ , π₯π appears ππ times in an mset π΄, then π΄ is expressed
as π΄ = {π1 /π₯1 , π2 /π₯2 , β¦ , ππ /π₯π }.
Formally, a multiset over Ξ£ is a mapping π΄ βΆ Ξ£ βΆ ο, where ο is the set of natural
numbers, including zero. Let π΄(π) or πΆπ΄ (π) denote the number or count of copies of
the symbol π in the multiset π΄. The empty multiset, denoted ο₯, is defined as ο₯ (π) = 0,
for all π β Ξ£. Follows that π΄ is a set, whenever π΄(π) β€ 1.
Let π΄ and π΅ be msets over Ξ£. π΄ is said to be a submultiset (msubset or submset, for
short) of π΅, written π΄ β π΅ or π΅ β π΄, if π΄(π) β€ π΅(π), βπ β Ξ£. Thus, π΄ = π΅ iff π΄ β π΅
and π΅ β π΄.
Note that an mset need not be finite, in general. However, for our purposes here, we
shall consider it finite.
Multiset relations
Let π π be a cardinality bounded mset space defined on a set π and π΄ β π π . Any subset
π
of π΄xπ΄ is said to be an mset relation on π΄, denoted π
: π΄ βΆ π΄, where every member
of π
has a count πΆ1 (π₯, π¦) and πΆ2 (π₯, π¦). Note that π/π₯ π
β related to π/π¦ is
symbolized as (π/π₯, π/π¦) β π
or π/π₯ π
π/π¦. In other words, π
= {(π/π₯, π/π¦)/
ππ: (π/π₯, π/π¦) βππ π
}, where (π/π₯, π/π¦) βππ π
symbolizes that the ordered pair
(π₯, π¦) occurs ππ times in π
. The domain of the relation π
on π΄ (π·ππ π
, for short) =
{π₯ βπ π΄: βπ¦ βπ‘ π΄ such that π /π₯ π
π‘/π¦} where πΆπππ π
(π₯) = sup{πΆ1 (π₯, π¦): π₯ βπ π΄}, and
the range of π
on π΄ (π
ππ π
, for short) = {π¦ βπ‘ π΄: βπ₯ βπ π΄ such that π /π₯ π
π‘/π¦}
where πΆπππ π
(π₯) = sup{πΆ2 (π₯, π¦): π¦ βπ‘ π΄}. Note that these Sups always exist for a
cardinality bounded msets.
For example, let π΄ = [5/π₯, 7/π¦, 9/π§] be an mset. Then, π
= {(2/π₯, 3/π¦)/6, (4/π₯, 3/
π§)/12, (4/π¦, 5/π§)/20, (6/π¦, 5/π₯)/30, (4/π§, 4/π§)/16, (5/π§, 3/π₯)/15, (3/π₯, 6/π¦)/18}
is an mset relation on π΄ with π·ππ π
= [4/π₯, 6/π¦, 5/π§], and πππ π
= [5/π₯, 6/π¦, 5/
π§].
6330
Dasharath Singh and Ahmed Ibrahim Isah
An mset relation π
on an mset π΄ is called reflexive iff β π/π₯ in π΄, π/π₯ π
π/π₯;
irreflexive iο¬ π/π₯ π
π/π₯ never holds; symmetric iff π/π₯ π
π/π¦ imply π/π¦ π
π/π₯,
antisymmetric iο¬ β π/π₯, π/π¦ in π΄, π/π₯ π
π/π¦ and π/π¦ π
π/π₯ imply π/π₯ = π/π¦;
transitive iff β π/π₯, π/π¦, π/π§ in π΄, π/π₯ π
π/π¦ and π/π¦ π
π/π§ imply π/π₯ π
π/π§.
An mset relation π
on an mset π΄ is said to be an mset order relation (mset order, for
short) or a partial mset order relation, if it is reflexive, antisymmetric and transitive.
The pair (π΄, π
) is called an ordered mset or a partially ordered mset (pomset) structure.
A partial mset order is called a linear mset order (or a complete mset order) on π΄, if for
every two elements π/π₯ β π/π¦ of π΄, either π/π₯ π
π/π¦ or π/π¦ π
π/π₯ (see [7], [8], for
further details).
Over an alphabet Ξ£, let Ξ£ β denote the multiset of all strings, including the empty string
which is unique, and Ξ£ β¨ , the multiset of all non-empty strings. A multilanguage,
denoted ππΏ , is an msubset of Ξ£ β .
Let for each π β Ξ£, π’ β Ξ£ β , |π’|π denote the number of occurrences of the symbol π in
the string π’.
Observe that if Ξ£ = β
then β
β = {π}, and if Ξ£ β β
then Ξ£ β is countably infinite.
Interestingly, even for a tiny alphabet that consists of a single letter e.g., let Ξ£ = {π},
Ξ£ β contains infinitely many strings viz., π0 = ο₯, π1 = π, π2 = ππ, etc., and, in
general, for every π > 0, the string ππ is in Ξ£ β . Therefore, given any alphabet Ξ£, there
are infinitely many multilanguages over Ξ£.
Orderings on strings in a Multilanguage
Let Ξ£ = {π1 , β¦ , ππ } such that π1 < π2 < β― ππ , π’, π£ β Ξ£ β , then the lexicographic
ordering, denoted βΌ, is defined as:
if π£ = π’, or
uβΌ v
if π£ = π’π¦ for some π¦ β Ξ£ β , or
if π’ = π₯ππ π¦, π£ = π₯ππ π§, and ππ < ππ for some π₯, π¦, π§ β Ξ£β .
In fact, lexicographic ordering of Ξ£ β is a monotonic and linear mset order.
For example, the lexicographic ordering of the strings ππππ, π, π, π, ππ, ππ, πππ, ππ
over Ξ£ = {π, π} is π, π, π, ππ, ππ, ππ, πππ, ππππ and also, lexicographic ordering of Ξ£ β
itself is [ π, π, π, ππ, ππ, ππ, ππ, πππ, πππ, β¦ ], etc.
Prefix, Suffix and Substring in a Multilanguage
Let Ξ£ be an alphabet, Ξ£ β be the mset of all strings over Ξ£, and π’, π£ β Ξ£ β .
Prefixes, suffixes and substrings in a Multilanguage
6331
π’ is a substring of π£ iο¬ there exist π₯, π¦ β Ξ£ β such that π£ = π₯π’π¦. In other words, a
substring (also, called segment, subword or factor) of a string is any sequence of
consecutive symbols that appear in the string.
The term substrings of a string represents the set of all substrings of a string.
π’ is a prefix of π£ iο¬ there is some π¦ β Ξ£ β such that π£ = π’π¦. That is, a prefix of a string
π’ is a substring of π’ that occurs at the beginning of π’.
The term prefixes of a string represents the set of all prefixes of a string.
π’ is a suffix of π£ iο¬ there is some π₯ β Ξ£ β such that π£ = π₯π’. That is, a suffix of a string
π’ is a substring that occurs at the end of π’.
The term suffixes of a string represents the set of all suffixes of a string.
Observe that a prefix is any sequence of leading symbols of the string, and a suffix is
any sequence of trailing symbols of the string. Moreover, a substring of a string is a
prefix of a suffix of the string, and also, a suffix of a prefix.
π’ is a proper prefix (suffix or substring) of π£ iο¬ π’ is a prefix (respectively, suffix or
substring) of π£ and π’ β π£.
Note that prefixes and suffixes both are substrings, but the converse need not hold.
For example, let π€ = ππππ be a string over Ξ£ = {π, π}. Its prefixes, suffixes and
substrings are as follows:
Prefixes: ο₯, π, ππ, πππ, ππππ.
Suffixes: ο₯, π, ππ, πππ, ππππ.
Substrings: ο₯, π, π, ππ, π, ππ, πππ, π, ππ, πππ, ππππ.
The border of a string is both its prefix and suffix, e.g., πππ is a border of πππππ.
Remark 3.1.1
It needs to be emphasized that prefixes, suffixes and substrings are multisets. The
prefixes and suffixes are defined in the same way both in standard formal language and
multilanguage. For example, prefixes and suffixes of the string π€ considered above are
the same in both. However, for substrings, this is not the case. The substrings of π€ in a
multilanguage are as given above, where as in a standard formal language, substrings of
π€ are ο₯, π, π, ππ, ππ, πππ, ππ, πππ, ππππ.
Prefix, Suffix and Substring Closures in a Multilanguage
Let ππΏ be a multilanguage. In line with the description of these notions for a standard
formal language, we define them for ππΏ as follows:
Prefix closure of ππΏ :
ππππππΏ (π£) = {π’| π£ = π’π¦; π£ β ππΏ , π’, π¦ β Ξ£ β }, and
ππππ(ππΏ ) = βπ£βππΏ ππππππΏ (π£) = {π’| π£ = π’π¦; π£ β ππΏ , π’, π¦ β Ξ£ β }.
Suffix closure of ππΏ :
6332
Dasharath Singh and Ahmed Ibrahim Isah
π π’ππππΏ (π£) = {π’| π£ = π₯π’; π£ β ππΏ , π₯, π’ β Ξ£ β }, and
π π’ππ(ππΏ ) = βπ£βππΏ π π’ππππΏ (π£) = {π’| π£ = π₯π’; π£ β ππΏ , π₯, π’ β Ξ£ β }.
Substring closure of ππΏ :
π π’ππ ππΏ (π£) = {π’| π£ = π₯π’π¦; π£ β ππΏ , π₯, π’, π¦ β Ξ£ β }, and
π π’ππ (ππΏ ) = βπ£βππΏ π π’ππ ππΏ (π£) = {π’| π£ = π₯π’π¦; π£ β ππΏ , π₯, π’, π¦ β Ξ£ β }.
Example
Let ππΏ = {ππ, ππ, πππ} over Ξ£ = {π, π}. Then
ππππ(ππΏ ) = {π, π, ππ, π, ππ, π, ππ, πππ},
π π’ππ(ππΏ ) = {π, π, ππ, π, ππ, π, ππ, πππ}, and
π π’ππ (ππΏ ) = {π, π, π, ππ, π, π, ππ, π, π, ππ, π, ππ, πππ}.
As in a standard formal language πΏ, the following results hold in ππΏ as well:
A multilanguage is said to be prefix, suffix or substring closed if ππππ(ππΏ ) = ππΏ ,
π π’ππ(ππΏ ) = ππΏ or π π’ππ (ππΏ ) = ππΏ , respectively.
For example, ππΏ = [π, π, π, π] is both prefix, suffix and substring closed.
The prefix, suffix and substring closure operators in a multilanguage ππΏ are
idempotent:
ππππ(ππππ(ππΏ )) = ππππ(ππΏ ), π π’ππ(π π’ππ(ππΏ )) =
π π’ππ(ππΏ ), π π’ππ (π π’ππ (ππΏ )) = π π’ππ (ππΏ ).
Remark 3.1.2
It may, however, be noted that the prefix, suffix and substring closures of a
multilanguage and that of a standard formal language differ. For example, in the case of
a multilanguage ππΏ = {ππ, πππ}, π π’ππ(ππΏ ) = {π, π, ππ, π, ππ, πππ}, and in a standard
formal language πΏ = {ππ, πππ}, π π’ππ(πΏ) = {π, π, ππ, πππ}.
3.2
Prefix, Suffix and Substring Relations and Their Characteristics in a
Multilanguage
Let π consist of prefixes (suffixes or substrings) of a string π£ β Ξ£ β and π
, be the
respective binary relation on π. Hence forth, π
shall stand for π
(prefix, suffix or
substring), unless otherwise stated.
i.
ii.
π
is reflexive: β π/π₯ β π, π/π₯ π
π/π₯. Since every prefix (suffix or substring)
π’ is a prefix (respectively, suffix or substring) of itself.
π
(prefix or suffix) is complete: β π/π₯ β π/π¦ β π, either π/π₯ π
π/π¦ or π/
π¦ π
π/π₯. Since for every pair of prefixes (or suffixes) of a string π£, one is a
prefix (respectively, suffix) of the other. However, for substrings, this is not licit.
For example, let π£ = ππππ and π/π₯ = 1/π, π/π¦ = 1/π then both π/π₯, π/π¦ β
π, but neither π/π₯ π
π/π¦ nor π/π¦ π
π/π₯.
Prefixes, suffixes and substrings in a Multilanguage
6333
π
is antisymmetric: β π/π₯, π/π¦ β π, if π/π₯ π
π/π¦ and π/π¦ π
π/π₯, then
π/π₯ = π/π¦. Since whenever π’ is a prefix (suffix or substring) of π£ and π£ is a
prefix (respectively, suffix or substring) of π’, then π’ must be the same as π£ as
duplicates are indistinguishable.
iv.
π
is transitive: β π/π₯, π/π¦, π/π§ β π, if π/π₯ π
π/π¦ anπ π/π¦ π
π/π§, then
π/π₯ π
π/π§. Since whenever π’ is a prefix (suffix or substring) of π£ and π£ is a
prefix (respectively, suffix or substring) oπ π€, clearly π’ is a prefix (respectively,
suffix or substring) oπ π€.
Clearly, π
(prefix or suffix) is a linear mset order, where as π
(substring) is only a
partial mset order.
iii.
Summarily, π
on π gives rise to a multiset relational structure (π, π
), hence forth
notated π = (π, π
), on π. In general, π is an mset and π
is an mset relation. In
particular, π may be a list of attributes defined on a domain and π
, the relation name
(see [9], for related descriptions). In the following, we describe some typical features of
such an mset structure, which would be exactly the features possessed by its relations.
Maximal, Minimal and Isolated Elements of π = (πΏ, πΉ)
Let π = (π, π
), where π consists of all nonempty prefixes (suffixes or substrings) of a
string π£ β Ξ£ β¨ and π
, its respective relation on π. Then,
π/π₯ is a maximal element of π iff β π/π¦ β π, π/π¦ π
π/π₯;
for π
(prefix or suffix), π/π₯ is a minimal element of π iff β π/π¦ β π, π/π₯ π
π/π¦;
for π
(substring), π/π₯ is a minimal element of π iff β π/π¦ β π, π/π₯ π
π/π¦ or else
they are unrelated; and
π/π₯ is an isolated element of π iff π/π₯ is both a maximal and minimal element of π.
For example, Let Ξ£ = {π, π}, π£ = ππππ β Ξ£ β¨ , then the non-empty prefixes and
substrings of π£ are π, ππ, πππ, ππππ and π, π, ππ, π, ππ, πππ, π, ππ, πππ,
ππππ, respectively.
Now for π
(prefix), 1/π and 1/ππππ are the minimal and maximal elements,
respectively, and for π
(substring), 2/π and 2/π are the minimal elements and 1/ππππ
is the maximal element.
Also, let π’ = π β Ξ£ β¨ , then 1/π is the isolated element for π
.
Proposition 3.2.1
π = (π, π
) always has a unique maximal element and, only under π
(prefix or suffix), it
has also a unique minimal element.
Proof
Let π/π§ β π‘/π β π, and let π/π§, π‘/π both be maximal elements of π. Since π/π§ is a
maximal element, π‘/π π
π/π§. Similarly, since π‘/π is a maximal element, π/π§ π
π‘/π .
6334
Dasharath Singh and Ahmed Ibrahim Isah
Now, by antisymmetry of these relations, π/π§ = π‘/π . This contradicts the assumption
that π/π§ β π‘/π . Hence π has a unique maximal element. Similarly, the other parts can
be proved.
Chains of π = (πΏ, πΉ)
A submset structure β = (πΆ β π, π
) of the prefix (suffix or substring) mset relational
structure π is a chain in π, if distinct points of πΆ are pair wise comparable in π
i.e., β π/π₯ β π/π¦ β π, either π/π₯ π
π/π¦ or π/π¦ π
π/π₯ in π. Also, π itself is said to
be a chain if the distinct points on π are pair wise comparable. It is easy to see that
every (πΆ, π
) with π
(prefix or suffix) is a chain, and the mset of all chains in a prefix
(suffix or substring) relational structure is partially ordered by mset inclusion relation.
The maximal elements in this pomset relational structure are the maximal chains.
For example, let π£ = πππ β Ξ£ β¨ , and π = [2/π, 1/π, 1/ππ, 1/ππ, 1/πππ] be the mset
of all nonempty substrings of π£. Then, submsets πΆ1 = [2/π, 1/ππ], πΆ2 = [2/π, 1/ππ],
πΆ3 = [2/π, 1/ππ, 1/πππ], πΆ4 = [1/π, 1/ππ], πΆ5 = [1/π, 1/ππ, 1/πππ]
are
some
chains in π, and πΆ3 , πΆ5 are the maximal chains among these.
Graphical Representation of πΉ on πΏ
Let π consist of all prefixes (suffixes or substrings) of a string π£ β Ξ£ β¨ and π
, its
respective relation on π. This relation can be represented by a directed graph πΊ = (π,
πΈ), where π (the vertex mset) is defined by π β π, and πΈ (the directed edges mset) by
(π/π₯, π/ π¦) β πΈ iff (π/π₯ β π/ π¦) β π
, for π β₯ 2.
For example, Let Ξ£ = {π, π}, π£ = ππππ β Ξ£ β¨ , and let the non-empty prefixes and
substrings of π£ be given as:
Prefixes: π = π, π’ = ππ, π€ = πππ, π₯ = ππππ, and
Substrings: π = π, π = π, π‘ = ππ, π’ = ππ, π€ = ππ, π₯ = πππ, π¦ = πππ, π§ = ππππ.
Then the directed graph for the prefix relation π
on π = [1/π , 1/π’, 1/π€, 1/π₯] is
presented in fig A, while that of the substring relation π
on π = [3/π, 1/π , 1/π‘, 1/π’, 1/
π€, 1/π₯, 1/π¦, 1/π§] is presented in fig B below.
Prefixes, suffixes and substrings in a Multilanguage
s
u
w
x
Fig A
Graph of the prefix relation
6335
r
r
r
t
w
s
u
y
x
z
Fig B
Graph of the substring relation
Proposition 3.2.2
A prefix (suffix or substring) relation graph is a directed acyclic graph for π β₯ 2.
Proof
Let πΊ = (π, πΈ) denote the graph of a prefix (suffix or substring) relation and let
(π/π₯, π/π¦) β πΈ be represented as π/π₯ πΈ π/π¦.
Let π β₯ 2, and the graph is cyclic. That is, there exist distinct elements π1 /π₯1 , β¦ , ππ /
π₯π β π such that π1 /π₯1 πΈ β¦ πΈ ππ /π₯π πΈ π1 /π₯1 .
Since these relations are transitive, π1 /π₯1 πΈ π2 /π₯2 and π2 /π₯2 πΈ π3 /π₯3 imply π1 /
π₯1 πΈ π3 /π₯3 and, in turn, by repeated application of transitivity, we have π1 /π₯1 πΈ ππ /
π₯π . Hence, π1 /π₯1 = ππ /π₯π , by antisymmetry of these relations. This contradicts the
assumption that π1 /π₯1 , β¦ , ππ /π₯π are distinct and π β₯ 2.
Hasse diagram of πΉ on πΏ
Let π = (π, π
), where π consists of all nonempty prefixes (suffixes or substrings) of a
string π£ β Ξ£ β¨ and π
, its respective relation on π. In order to represent the relations by
6336
Dasharath Singh and Ahmed Ibrahim Isah
Hasse diagram, we assume that βπ/π₯, π/π¦ β π, π/π₯ π
π/π¦ only if β π/π§ β π such
that π/π₯ π
π/π§ and π/π§ π
π/π¦.
For each ordered pair for which this relation holds, draw π/π₯ in a vertical plane below
π/π¦ (or π/π¦ below π/π₯) and connect them with a straight line, i.e., a Hasse diagram
for prefix (suffix or substring) relation is a directed graph with vertex mset π and edges
set πΈ minus self-loops and edges implied by transitivity.
As an example, the Hasse diagrams for the prefix relation in fig A and, that of the
substring relation in fig B, are presented in Fig C and D, respectively.
x
z
w
x
u
s
Fig C
Hasse diagram for the prefix relation
y
t
s
w
u
r
r
r
Fig D
Hasse diagram for the substring relation
Remark 3.2.3
Akin to set relations, two or more different prefix (suffix or substring) mset relations
may have similar Hasse diagrams. For example, the Hasse diagram of the suffix relation
π
on π = [π, π, π, π], for the nonempty suffixes of the string ππππ with π = π, π =
ππ, π = πππ, π = ππππ, is similar to that of the prefix relation presented in fig C.
4 Monoids of prefixes, suffixes and substrings of a nonempty string in
a multilanguage
In view of the fact that homomorphisms of semigroups and monoids have useful
applications in the economic design of sequential machines and in formal languages,
this section introduces these notions in a multilanguage.
Prefixes, suffixes and substrings in a Multilanguage
6337
Let Ξ£ be an alphabet and Ξ£ β¨ , the mset of all nonempty strings over Ξ£. For π£ β Ξ£ β¨ , let
π£ β denote the multiset of all prefixes (suffixes or substrings) of π£, and π
its respective
relation. We define a binary operation β on π£ β as follows:
i.
βπ/π₯, π/π¦ β π£ β , if π/π₯ π
π/π¦ and β π/π§ β π£ β such that π/π₯ π
π/π§ β§
π/π§ π
π/π¦, then π/π₯ β π/π¦ = π/π¦,
ii.
βπ/π₯, π/π¦ β π£ β , if π/π₯ π
π/π¦ and β π1 /π§1 , π2 /π§2 , β¦ , ππ /π§π β π£ β , such
that π/π₯ π
π1 /π§1 π
π2 /π§2 π
β¦ π
ππ /π§π π
π/π¦,
then π/π₯ β π/π¦ = π/π₯ β
π1 /π§1 β π2 /π§2 β β¦ β ππ /π§π β π/π¦.
It is easy to see that β is associative and π is the identity element. Hence, (π£ β , β, π),
usually written as (π£ β , β), is a monoid. Also, every element π/π₯ β π£ β is idempotent
with respect to the operation β, since π/π₯ β π/π₯ = π/π₯, βπ/π₯ β π£ β .
However, β is not commutative as π/π₯ β π/π¦ β π/π¦ β π/π₯, βπ/π₯, π/π¦ β π£ β .
For example, let Ξ£ = {π, π}, and let π£ = ππππ β Ξ£ β . Consider the substrings
π, π, π, ππ, π, ππ, πππ, π, ππ, πππ, ππππ of π£ and let π = π, π = π, π = π, π‘ = ππ, π’ =
ππ, π€ = ππ, π = πππ, π = πππ, and π = ππππ. Then, π£ β = [1/π, 3/π, 1/π , 1/π‘, 1/
π’, 1/π, 1/π€, 1/π, 1/π].
It is straightforward to see that β is associative, since βπ/π₯, π/π¦, π/π§ β π£ β ,
(π/π₯ β π/π¦) β π/π§ = π/π₯ β (π/π¦ β π/π§), and 1/π is the identity element. Therefore,
(π£ β , β) is a monoid.
Functions between Monoids of Prefixes, Suffixes and Substrings
Let π = (π£ β , β) and π = (π’β , β) be two monoids of prefixes (suffixes or substrings)
of π£, π’ β Ξ£ β . A homomorphism between π and π is a function π: π£ β βΆ π’β such that
for every element π/π₯ β π£ β there exists exactly one element π/π¦ β π’β such that (π/
π₯, π/π¦) β π, the pair (π₯, π¦) occurs only πΆ1 (π₯, π¦) times and, for every pair of
elements π/π₯, π/π¦ β π£ β , if π/π₯ β π/π¦, then
(i)
π(π/π₯ β π/π¦) = π(π/π₯) β π(π/π¦); and
(ii)
π(ππ£β ) = ππ’β , where ππ£β and ππ’β are the same identity on π£ β and π’β both,
as mentioned earlier.
For example, let Ξ£ = {π, π} and π£ = πππ, π’ = ππππ β Ξ£ β ; then their substrings are
π, π, π, ππ, π, ππ, πππ and π, π, π, ππ, π, ππ, πππ, π, ππ, πππ, ππππ, respectively. For the
substrings
of π£, let
π = π, π = π, π = π, π = ππ, π = ππ, π = πππ, π£ β =
[1/π, 2/π, 1/π, 1/π, 1/π, 1/π] and, for that of π’, let π = π, π = π, π = π, π‘ = ππ,
β = ππ, π€ = ππ, π₯ = πππ, π = πππ, π§ = ππππ, π’β = [1/π, 3/π, 1/π , 1/π‘, 1/β, 1/
π€, 1/π₯, 1/π, 1/π§]. Then,
π: (π£ β , β) βΆ (π’β , β), defined by π{(1/π, 1/π)/1, (2/π, 2/π)/2, (1/π, 1/π€)/1, (1/
π, 1/π)/1, (1/π, 1/π)/1, (1/π, 1/π§)/1}, is a monoid homomorphism of substrings,
6338
Dasharath Singh and Ahmed Ibrahim Isah
while π: (π£ β , β) βΆ (π’β , β), defined by π{(1/π, 1/π)/1, (2/π, 2/π)/2, (1/π, 1/β)/
1, (1/π, 1/π€)/1, (1/π, 1/π₯)/1, (1/π, 1/π§)/1} is not a homomorphism, since π(1/π β
1/π) does not imply π(1/π) β π(1/π).
Let π: π = (π£ β , β) βΆ π = (π’β , β) be a monoid homomorphism. We define
(i) π: π βΆ π is an injection iff π: π£ β βΆ π’β is an injection, and
βπ₯(π₯ β π£ β βΉ |π₯| β€ |π(π₯)|),
(ii) π: π βΆ π is a surjection iff π: π£ β βΆ π’β is a surjection, and
βπ₯(π₯ β π£ β βΉ |π₯| β₯ |π(π₯)|), and
(iii) π: π βΆ π is a bijection iff π: π£ β βΆ π’β is a bijection, and
βπ₯(π₯ β π£ β βΉ |π₯| = |π(π₯)|).
Note that |π’| denotes the length of the string π’.
For example, consider the strings π = πππ, π = πππ β Ξ£ β . Then, π, π, π, ππ, π, ππ, πππ
and π, π, π, ππ, π, ππ, πππ, respectively are their substrings.
For π, let π₯ = π, π = π, π = π, π = ππ, π = ππ, π = πππ, π β = [1/π₯, 2/π, 1/π, 1/
π, 1/π, 1/π ], and for π, let π¦ = π, π‘ = π, π’ = π, π£ = ππ, π€ = ππ, π§ = πππ, πβ =
[1/π¦, 2/π‘, 1/π’, 1/π£, 1/π€, 1/π§], then π: (π β , β) βΆ (πβ , β), defined by π{(1/π₯, 1/
π¦)/1, (2/π, 2/π‘)/2, (1/π, 1/π’)/1, (1/π, 1/π£)/1, (1/π, 1/π€)/1, (1/π , 1/π§)/1}, is an
isomorphism.
Concluding Remarks
In view of the fact that strings are known to be relatively (in comparison with integers
or even booleans, etc.) suitable basic data types in a (querry) language, and as many
(database) languages and systems do require multiset-based semantics, formulation of
prefix, suffix and substring relational structures in this paper may be found useful in
construction of formal power series developed in [10], for example. Monoids and their
homomorphisms described in this paper may find useful applications in the economical
design of sequential machines and formal languages.
Acknowledgement: We are beholden to Professor D. E. Knuth for making paper [6]
available to the first Author of this paper in a personal correspondence.
References
[1] J. Gallier, Introduction to the Theory of Computation, Formal Languages and
Automata Models of Computation, Lecture Notes, (2010) 1- 60.
[2] J. Kari, Automata and Formal Languages, University of Turku, Lecture Notes,
(2011) 1 β 150.
Prefixes, suffixes and substrings in a Multilanguage
6339
[3] M. Kudlek, C. Martin-Vide, and G. Pãun, Toward a Formal Macroset Theory, In:
Multiset Processing, C.S. Calude et al. (Eds.), LNCS 2235, Springer β Verlag,
(2001) 123-133.
[4] E. CsuhajβVarjú, C. MartínβVide and V. Mitrana, Multiset Automata*, In:
Multiset Processing, C.S. Calude et al. (Eds.), LNCS 2235, Springer β Verlag,
(2001) 69 β 83.
[5] K. Ruohonen, Formal Languages, Lecture Notes, (2009) 1-94.
[6] D. E. Knuth, Context-free Multilanguages, In: Theoretical Studies in Computer
Science, Jeffrey D. Ullman, Symour Ginsburg (Eds), Academic Press, (1992) 1-13.
[7] D. Singh, A. M. Ibrahim, Y. Tella, and J. N. Singh, An Overview of the
Applications of Multisets, NOVI Sad J. Math, 37 (2) (2007) 73 β 92.
[8] K. P. Girish and J. J. Sunil, General relations between partially ordered multisets
and their chains and antichains, Mathematical Communications, 14 (2) (2009) 193205.
[9] P. W. P. J. Grefen and R. A. de By, A Multiset Extended Relational Algebra: A
Formal Approach to a Practical Issue, In: 10th International Conference on Data
Engineering (ICDE), Houston, TX, USA, (1994) 80 - 88.
[10] A. Salomaa, Formal Languages, Academic press, New York San Francisco, 1973.
Received: August 17, 2014
© Copyright 2026 Paperzz