Partial Word DFAs⋆

E. Balkanski¹, F. Blanchet-Sadri², M. Kilgore³, and B. J. Wyatt²

¹ Department of Mathematical Sciences, Carnegie Mellon University,
Wean Hall 6113, Pittsburgh, PA 15213, USA, [email protected]
² Department of Computer Science, University of North Carolina,
P.O. Box 26170, Greensboro, NC 27402–6170, USA,
[email protected], [email protected]
³ Department of Mathematics, Lehigh University, Christmas-Saucon Hall,
14 East Packer Avenue, Bethlehem, PA 18015, USA, [email protected]
Abstract. Recently, Dassow et al. connected partial words and regular languages. Partial words are sequences in which some positions may be undefined, represented with a "hole" symbol ⋄. If we restrict what the ⋄ symbol can represent, we can use partial words to compress the representation of regular languages. Doing so allows the creation of so-called ⋄-DFAs, which recognize the compressed language and which are smaller than the DFAs recognizing the original language L. However, the ⋄-DFAs may be larger than the NFAs recognizing L. In this paper, we investigate a question of Dassow et al. as to how these sizes are related.
1 Introduction
The study of regular languages dates back to McCulloch and Pitts’ investigation
of neuron nets (1943) and has been extensively developing since (for a survey see,
e.g., [7]). Regular languages can be represented by deterministic finite automata,
DFAs, by non-deterministic finite automata, NFAs, and by regular expressions.
They have found a number of important applications such as compiler design.
There are well-known algorithms to convert a given NFA to an equivalent DFA
and to minimize a given DFA, i.e., find an equivalent DFA with as few states as
possible (see, e.g., [6]). It turns out that there are languages accepted by DFAs that have 2^n states while their equivalent minimal NFAs only have n states.
Recently, Dassow et al. [4] connected regular languages and partial words.
Partial words first appeared in 1974 and are also known under the name of
strings with don’t cares [5]. In 1999, Berstel and Boasson [2] initiated their
combinatorics under the name of partial words. Since then, many combinatorial
properties and algorithms have been developed (see, e.g., [3]). One of Dassow et
al.'s motivations was to compress DFAs into smaller machines, called ⋄-DFAs.
⋆ This material is based upon work supported by the National Science Foundation under Grant No. DMS–1060775.
More precisely, let Σ be a finite alphabet of letters. A (full) word over Σ is
a sequence of letters from Σ. We denote by Σ ∗ the set of all words over Σ, the
free monoid generated by Σ under the concatenation of words where the empty
word ε serves as the identity. A language L over Σ is a subset of Σ ∗ . It is regular
if it is recognized by a DFA or an NFA. A DFA is a 5-tuple M = (Q, Σ, δ, q0 , F ),
where Q is a set of states, δ : Q × Σ → Q is the transition function, q0 ∈ Q is
the start state, and F ⊆ Q is the set of final or accepting states. In an NFA, δ maps Q × Σ to 2^Q. We call |Q| the state complexity of the automaton. Many languages are classified by this property.
Setting Σ⋄ = Σ ∪ {⋄}, where ⋄ ∉ Σ represents undefined positions or holes, a partial word over Σ is a sequence of symbols from Σ⋄. Denoting the set of all partial words over Σ by Σ⋄*, a partial language L′ over Σ is a subset of Σ⋄*. It is regular if it is regular when being considered over Σ⋄. In other words, we define languages of partial words, or partial languages, by treating ⋄ as a letter. They can be transformed to languages by using ⋄-substitutions over Σ. A ⋄-substitution σ : Σ⋄* → 2^{Σ*} satisfies σ(a) = {a} for all a ∈ Σ, σ(⋄) ⊆ Σ, and σ(uv) = σ(u)σ(v) for u, v ∈ Σ⋄*. As a result, σ is fully defined by σ(⋄), e.g., if σ(⋄) = {a, b} and L′ = {⋄b, ⋄c} then σ(L′) = {ab, bb, ac, bc}. If we consider this process in reverse, we can "compress" languages into partial languages.
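The ⋄-substitution example above can be sketched in a few lines of Python. This is an illustration, not code from the paper; the function names are ours and "?" stands in for the hole symbol ⋄:

```python
from itertools import product

HOLE = "?"  # stands in for the paper's hole symbol

def substitute(word, hole_image):
    """sigma(word): each hole expands to every letter of sigma(hole)."""
    choices = [hole_image if c == HOLE else [c] for c in word]
    return {"".join(w) for w in product(*choices)}

def substitute_language(lang, hole_image):
    """sigma(L'): the union of sigma(w) over all w in L'."""
    return set().union(*(substitute(w, hole_image) for w in lang))

# The example from the text: sigma(hole) = {a, b} and L' = {?b, ?c}
full = substitute_language({"?b", "?c"}, ["a", "b"])  # the words ab, bb, ac, bc
```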
We consider the following question from Dassow et al. [4]: Are there regular languages L ⊆ Σ*, L′ ⊆ Σ⋄* and a ⋄-substitution σ with σ(L′) = L such that the minimal state complexity of a DFA accepting L′, i.e., the minimal state complexity of a ⋄-DFA accepting L, denoted by min⋄-DFA(L), is (strictly) less than the minimal state complexity of a DFA accepting L, denoted by minDFA(L)? Reference [4, Theorem 4] states that for every regular language L, we have minDFA(L) ≥ min⋄-DFA(L) ≥ minNFA(L), where minNFA(L) denotes the minimal state complexity of an NFA accepting L, and there exist regular languages L such that minDFA(L) > min⋄-DFA(L) > minNFA(L). On the other hand, [4, Theorem 5] states that if n ≥ 3 is an integer, regular languages L and L′ exist such that min⋄-DFA(L) ≤ n + 1, minDFA(L) = 2^n − 2^{n−2}, minNFA(L′) ≤ 2n + 1, and min⋄-DFA(L′) ≥ 2^n − 2^{n−2}. This was the first step towards analyzing the sets:

Dn = {m | there exists L such that min⋄-DFA(L) = n and minDFA(L) = m},
Nn = {m | there exists L such that min⋄-DFA(L) = n and minNFA(L) = m}.
Our paper, whose focus is the analysis of Dn and Nn, is organized as follows. We obtain in Section 2 values belonging to Dn by looking at specific types of regular languages, followed by values belonging to Nn in Section 3. Due to the nature of NFAs, generating a sequence of minimal NFAs from a ⋄-DFA is difficult. However, in the case minDFA(L) > min⋄-DFA(L) = minNFA(L), we show how to use concatenation of languages to create an L′ with systematic differences between min⋄-DFA(L′) and minNFA(L′). We also develop a way of applying integer partitions to obtain such values. We conclude with some remarks in Section 4.
2 Constructs for Dn
This section provides some values for Dn by analyzing several classes of regular languages. In the description of the transition function of our DFAs and ⋄-DFAs, all the transitions lead to the error state (a sink non-final state) unless otherwise stated. Also, in our figures, the error state and the transitions leading to it have been removed for clarity. We will often refer to the following algorithm.
Given a ⋄-DFA M′ = (Q′, Σ⋄, δ′, q′0, F′) and a ⋄-substitution σ, Algorithm 1 gives a minimal DFA that accepts σ(L(M′)):

– Build an NFA N = (Q′, Σ, δ, q′0, F′) that accepts σ(L(M′)), where δ(q, a) = {δ′(q, a)} if a ∈ Σ \ σ(⋄) and δ(q, a) = {δ′(q, a), δ′(q, ⋄)} if a ∈ σ(⋄).
– Convert N to an equivalent minimal DFA.
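The two steps of Algorithm 1 can be sketched in Python. This is an illustrative reconstruction, not code from the paper: the transition-map encoding, the names build_nfa and determinize, and the use of "?" for ⋄ are ours, and the final DFA-minimization half of step two is noted but omitted:

```python
from itertools import chain

def build_nfa(delta, alphabet, hole_image, hole="?"):
    """Step 1: from a partial ?-DFA transition map delta[(state, symbol)] -> state,
    build an NFA over the full alphabet.  For letters in sigma(?), the NFA
    may also follow the hole transition."""
    states = {q for (q, _) in delta} | set(delta.values())
    nfa = {}
    for q in states:
        for a in alphabet:
            targets = set()
            if (q, a) in delta:
                targets.add(delta[(q, a)])
            if a in hole_image and (q, hole) in delta:
                targets.add(delta[(q, hole)])
            nfa[(q, a)] = targets
    return nfa

def determinize(nfa, start, alphabet):
    """Step 2 (first half): reachable-subset construction; a complete
    implementation would follow this with standard DFA minimization."""
    start_set = frozenset([start])
    seen, todo, dfa = {start_set}, [start_set], {}
    while todo:
        cur = todo.pop()
        for a in alphabet:
            nxt = frozenset(chain.from_iterable(nfa[(q, a)] for q in cur))
            if nxt:  # omit the empty subset (the error state)
                dfa[(cur, a)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return dfa, seen

# Toy ?-DFA: delta'(0, ?) = 1, delta'(1, a) = 2, with sigma(?) = {a, b}
nfa = build_nfa({(0, "?"): 1, (1, "a"): 2}, "ab", {"a", "b"})
dfa, subsets = determinize(nfa, 0, "ab")
```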
First, we look at languages of words of equal length. We give three constructs.
The first two both use an alphabet of variable size, while our third one restricts
this to a constant k. We prove the second construct, which is illustrated in Fig. 1.
Theorem 1. For n ≥ 1, 2^{⌊(n−1)/3⌋} + ⌊(n−1)/3⌋ + 2 + (n − 1) mod 3 ∈ Dn.
Theorem 2. For n ≥ 1, if x = ⌊(√(8(n−1)+1) − 1)/2⌋ then 2^x + n − 1 − x(x+1)/2 ∈ Dn for languages of words of equal length.
Proof. We start by writing n as n = r + ∑_{i=1}^{x} i such that 1 ≤ r ≤ x + 1 (from the On-Line Encyclopedia of Integer Sequences, x is as stated). Let M = (Q, Σ, δ, q0, F) be the DFA defined as follows:

– Q = {(i, j) | 0 ≤ i < x, 0 ≤ j < 2^i, (i, j) ≠ (x − 1, 0)} ∪ {(i, 0) | x ≤ i ≤ x + r}, q0 = (0, 0), F = {(x + r − 1, 0)}, and (x + r, 0) is the error state;
– Σ = {a0 , a1 , c} ∪ {bi | 1 ≤ i < x};
– δ is defined as follows:
• δ((i, j), ak) = (i + 1, 2j + k) for all (i, j), (i + 1, 2j + k) ∈ Q, ak ∈ Σ, i ≠ x − 1, with the exception of δ((x − 2, 0), a0) = (x + r, 0),
• δ((x − 1, i), bj ) = (x, 0) for all (x − 1, i) ∈ Q, bj ∈ Σ where the jth digit
from the right in the binary representation of i is a 1,
• δ((i, 0), c) = (i + 1, 0) for x ≤ i < x + r.
Each word accepted by M can be written in the form w = u bi c^{r−1}, where u is a word of length x − 1 over {a0, a1} other than a0^{x−1}, and bi belongs to some subset of Σ unique for each u. This implies that M is minimal with 2^x + n − 1 − x(x+1)/2 states. We can build the minimal equivalent ⋄-DFA for σ(⋄) = {a0, a1}, giving M′ = (Q′, Σ⋄, δ′, q′0, F′) with n states as follows:
– Q′ = {(i, j) | 0 ≤ i < x, 0 ≤ j ≤ i, (i, j) ≠ (x − 1, 0)} ∪ {(i, 0) | x ≤ i ≤ x + r}, q′0 = (0, 0), F′ = {(x + r − 1, 0)}, and (x + r, 0) is the error state;
– δ′ is defined as follows:
• δ′((i, 0), a1) = (i + 1, i + 1) for 0 ≤ i < x − 1,
• δ′((i, j), ⋄) = (i + 1, j) for all (i, j) ∈ Q′ \ {(x − 2, 0)} where i < x − 1,
• δ′((x − 1, i), b_{x−i}) = (x, 0) for 1 ≤ i < x,
• δ′((x + i, 0), c) = (x + i + 1, 0) for 0 ≤ i < r − 1.
Observe that L(M′) = {⋄^{x−i−1} a1 ⋄^{i−1} bi c^{r−1} | 1 ≤ i < x}, so σ(L(M′)) = L(M). Each accepted word consists of a unique prefix of length x − 1 paired with a unique bi ∈ Σ, and r states are needed for the suffix c^{r−1}, which implies that M′ is minimal over all ⋄-substitutions. Note that |Q′| = (∑_{i=1}^{x} i) + r = n. ⊓⊔
Fig. 1. M (left) and M′ (right) from Theorem 2, n = 11, x = 4
Theorem 3. For k > 1 and l, r ≥ 0, let n = k(k + 2l + 3)/2 + r + 2. Then

2^{k+1} + l(2^k − 1) + r ∈ Dn,

for languages of words of equal length.
Next, we look at languages of words of bounded length. The following theorem
is illustrated in Fig. 2.
Theorem 4. For n ≥ 3, [n, n + (n−2)(n−3)/2] ⊆ Dn.

Proof. Write m = n + r + ∑_{i=l}^{n−3} i for the lowest value of l ≥ 1 such that r ≥ 0. Let M = (Q, Σ, δ, q0, F) be defined as follows:
– Σ = {a0, ar} ∪ {ai | l ≤ i ≤ n − 3};
– Q = {(i, 0) | 0 ≤ i < n} ∪ {(i, j) | aj ∈ Σ and 1 ≤ i ≤ j}, q0 = (0, 0), F = {(n − 2, 0)} ∪ {(i, i) | i ≠ 0, (i, i) ∈ Q}, and (n − 1, 0) is the error state;
– δ is defined by δ((0, 0), ai) = (1, i) for all ai ∈ Σ where i > 0, δ((i, j), a0) = (i + 1, j) for all (i, j) ∈ Q, i ≠ j, and δ((i, i), a0) = (i + 1, 0) for all (i, i) ∈ Q.
Then L(M) = {ai a0^{n−3} | ai ∈ Σ} ∪ {ai a0^{i−1} | ai ∈ Σ, i ≠ 0}. For each ai, i ≠ 0, M requires i states. These are added to the error state and the n − 1 states needed for a0^{n−2}. Thus, M is minimal with m states. Let M′ = (Q′, Σ⋄, δ′, q′0, F′), where Q′ = {i | 0 ≤ i < n}, q′0 = 0, F′ = {n − 2}, and n − 1 is the error state; δ′ is defined by δ′(0, ⋄) = 1, δ′(0, ai) = n − 1 − i for all ai ∈ Σ, i > 0, and δ′(i, a0) = i + 1 for 1 ≤ i < n − 1. For σ(⋄) = Σ, we have σ(L(M′)) = L(M). Furthermore, M′ needs n − 1 states to accept a0^{n−3} ∈ L(M′), so M′ is minimal with n states. ⊓⊔
Fig. 2. M (top) and M′ (bottom) from Theorem 4, n = 7 and m = 15 (l = 3, r = 1)
Theorem 4 gives elements of Dn close to its lower bound. To find an upper bound, we look at a specific class of machines. Let n ≥ 2 and let

Rn = ({0, . . . , n − 1}, ({a0} ∪ {(αi)j | 2 ≤ i + 2 ≤ j ≤ n − 2})⋄, δ′, 0, {n − 2})  (1)

be the ⋄-DFA where n − 1 is the error state, and δ′ is defined by δ′(i, ⋄) = i + 1 for 0 ≤ i < n − 2 and δ′(i, (αi)j) = j for all (αi)j. Fig. 3 gives an example when n = 7. Set Ln = σ(L(Rn)), where σ is the ⋄-substitution that maps ⋄ to the whole alphabet. Note that Rn is minimal for L(Rn), since we need at least n − 1 states to accept words of length n − 2 without accepting longer strings. Furthermore, Rn is minimal for σ, as each letter (αi)j encodes a transition between a unique pair of states (i, j). This also implies that Rn is minimal for any ⋄-substitution. The next two theorems look at the minimal DFA that accepts Ln. We refer the reader to Fig. 3 to visualize the ideas behind the proofs.
Referring to Fig. 3, in the DFA, each explicitly labelled transition is for the indicated letters. From each state, there is one transition that is not labelled; it represents the transition for each letter not explicitly labelled on a different transition from that state. (For example, from state 0, a3 transitions to {1, 3}, a2 transitions to {1, 2}, a4 transitions to {1, 4}, a5 transitions to {1, 5}, and all other letters a0, b3, b4, b5, c4, c5, d5 transition to {1}.) The idea behind the proof of Theorem 6 is that we start with this DFA. We introduce a new letter, "e", into the alphabet and add a new state, {2, 3, 4, 5}, along with a transition from {1, 3} to {2, 3, 4, 5} for e. We want to alter the ⋄-DFA to accommodate this, so we add a transition for e from 1 to 3 and from 3 to 5 (represented by dashed edges). All other states transition to the error state for e. Now consider the string a3e. We get four strings that correspond to some partial word that produces a3e after substitution: a3e, a3⋄, ⋄e, and ⋄⋄. When the ⋄-DFA reads the first, it halts in state 5; on the second, it halts in 4; on the third, it halts in 3; and on the fourth, it halts in 2, which matches the added state {2, 3, 4, 5}. Finally, we need to consider the effect of adding e and the described transitions to the ⋄-DFA: does it change the corresponding minimal DFA in other ways? To show that it does not, all transitions with dashed edges in the DFA represent the transitions for e. For example, from state {2, 3}, an e transitions to {3, 4, 5}.
Fig. 3. ⋄-DFA R7 (top, if the dashed edges are seen as solid) and minimal DFA for L7 = σ(L(R7)) (bottom, if the dotted element is ignored and the dashed edges are seen as solid), where α0 = a, α1 = b, α2 = c, α3 = d and σ(⋄) = {a0, a2, a3, a4, a5, b3, b4, b5, c4, c5, d5}.
Theorem 5. Let Fib be the Fibonacci sequence defined by Fib(1) = Fib(2) = 1
and for n ≥ 2, Fib(n+1) = Fib(n)+ Fib(n−1). Then for n ≥ 1, Fib(n+1) ∈ Dn .
Proof. For n ≥ 2, applying Algorithm 1, convert M′ = Rn to a minimal DFA M = (Q, Σ, δ, q0, F) that accepts Ln, where Q ⊆ 2^{{0,...,n−1}}. For each state {i} ∈ Q for 0 ≤ i ≤ n − 2, M requires additional states to represent each possible subset of one or more states of {i + 1, . . . , n − 2} that M′ could reach in i transitions. Thus M is minimal with number of states

1 + ∑_{i=0}^{n−2} ∑_{j=0}^{min{i, n−2−i}} C(n − 2 − i, j) = Fib(n + 1),

where the 1 refers to the error state and where the inside sum refers to the number of states with minimal element i. ⊓⊔
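The counting identity in the proof of Theorem 5 is easy to check numerically. The sketch below is ours (not from the paper), with the binomial C(·,·) computed by math.comb:

```python
from math import comb

def fib(n):
    """Fib(1) = Fib(2) = 1 and Fib(n+1) = Fib(n) + Fib(n-1), as in Theorem 5."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def dfa_size(n):
    """The state count from the proof: 1 (error state) plus, for each
    minimal element i, the subsets counted by the inner binomial sum."""
    return 1 + sum(comb(n - 2 - i, j)
                   for i in range(n - 1)
                   for j in range(min(i, n - 2 - i) + 1))

# Sanity check of the identity dfa_size(n) = Fib(n + 1) for small n
assert all(dfa_size(n) == fib(n + 1) for n in range(2, 16))
```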
Theorem 6. For n ≥ 3, the following is the least upper bound for m ∈ Dn in the case of languages of words of bounded length:

∑_{i=0}^{n−1} C(n − 1 − ⌈log2 i⌉, i).
Our next result restricts the alphabet size to two.
Theorem 7. For n ≥ 1, (⌊n/2⌋(⌊n/2⌋ + 1) + ⌊(n−1)/2⌋(⌊(n−1)/2⌋ + 1))/2 + 1 ∈ Dn.
Finally, we look at languages with some arbitrarily long words.
Theorem 8. For n ≥ 1, 2^n − 1 is the least upper bound for m ∈ Dn.
Proof. First, let M′ be a minimal ⋄-DFA with ⋄-substitution σ. If we convert this to a minimal DFA accepting σ(L(M′)) using Algorithm 1, the resulting DFA has at most 2^n − 1 states, one for each non-empty subset of the set of states in M′. Thus an upper bound for m ∈ Dn is 2^n − 1.

Now we show that there exists a regular language L such that min⋄-DFA(L) = n and minDFA(L) = 2^n − 1. Let M′ = (Q′, Σ⋄, δ′, q′0, F′) with Q′ = {0, . . . , n − 1}, Σ = {a, b}, q′0 = 0, F′ = {n − 1}, and δ′ defined by δ′(i, α) = i + 1 for 0 ≤ i < n − 1, α ∈ {⋄, a}; δ′(n − 1, α) = 0 for α ∈ {⋄, a}; and δ′(i, b) = 0 for 0 ≤ i < n. Then M′ is minimal, since ⋄^{n−1} ∈ L(M′) but ⋄^i ∉ L(M′) for 0 ≤ i < n − 1.
After constructing the minimal DFA M = (Q, Σ, δ, q0, F) using Algorithm 1 for σ(⋄) = {a, b}, we claim that all non-empty subsets of Q′ are states in Q. To show this, we construct a word that ends in any non-empty subset P of Q′. Let P = {p0, . . . , px} with p0 < · · · < px. We start with a^{px}. Then create the word w by replacing the a in each position px − pi − 1, 0 ≤ i < x, with b.

We show that w ends in state P by first showing that for each pi ∈ P, some partial word w′ exists such that w ∈ σ(w′) and M′ halts in pi when reading w′. First, suppose pi = px. Since |w| = px, let w′ = ⋄^{px}. For w′, M′ halts in px. Now, suppose pi ≠ px. Let w′ = ⋄^{px−pi−1} b ⋄^{pi}. After reading ⋄^{px−pi−1}, M′ is in state px − pi − 1, then in state 0 for b, and then in state pi after reading ⋄^{pi}.
Now suppose a partial word w′ exists such that w ∈ σ(w′) where M′ halts in p for p ∉ P. Suppose p > px. Each state i ∈ Q′ is only reachable after i transitions and |w′| = px, so M′ cannot reach p after reading w′. Now suppose p < px. Then M′ needs to be in state 0 after reading px − p symbols to end in p, so we must have w′[px − p − 1] = b. However, w[px − p − 1] = a, a contradiction.

Furthermore, no two states of Q are equivalent, as each such word w ends in a unique state of Q. Therefore, M has 2^n − 1 states, and 2^n − 1 ∈ Dn. ⊓⊔
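The reachability claim in the proof of Theorem 8 can be checked mechanically. The sketch below (ours, with "?" playing the role of ⋄) runs the subset construction on the ⋄-DFA M′ of the proof and counts the reachable subsets; the proof shows none of them are equivalent:

```python
def theorem8_dfa_size(n):
    """Subset construction for the ?-DFA of Theorem 8 with sigma(?) = {a, b}.
    The hole and a both advance cyclically; b additionally resets to state 0."""
    def step(states, letter):
        out = set()
        for q in states:
            out.add(0 if q == n - 1 else q + 1)  # delta'(q, ?) = delta'(q, a)
            if letter == "b":
                out.add(0)                       # delta'(q, b) = 0
        return frozenset(out)

    start = frozenset([0])
    seen, todo = {start}, [start]
    while todo:
        cur = todo.pop()
        for letter in "ab":
            nxt = step(cur, letter)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)  # no two reachable subsets are equivalent, per the proof

# Every non-empty subset of {0, ..., n-1} is reached
assert [theorem8_dfa_size(n) for n in range(2, 7)] == [2 ** n - 1 for n in range(2, 7)]
```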
To further study intervals in Dn, we look at the following class of ⋄-DFAs. For n ≥ 2 and 0 ≤ r < n, let

Rn,r{s1, . . . , sk} = ({0, . . . , n − 1}, {a0, a1, . . . , ak}⋄, δ′, 0, {n − 1})  (2)

be the ⋄-DFA where {s1, . . . , sk} is a set of tuples whose first member is a letter ai, distinct from a0, followed by one or more states in ascending order, and where δ′(q, ai) = 0 for all (q, ai) that occur in the same tuple, δ′(i, ⋄) = i + 1 for 0 ≤ i ≤ n − 2, δ′(n − 1, ⋄) = r, and δ′(q, ai) = δ′(q, ⋄) for all other (q, ai). Since Rn,r{} is minimal for any ⋄-substitution, and since ⋄ and non-⋄ transitions from any state end in the same state, Algorithm 1 converts Rn,r{} to a minimal DFA with exactly n states. The next result looks at ⋄-DFAs of the form Rn,r{(a1, 0)}.
Theorem 9. For n ≥ 2 and 0 ≤ i < n, n + (n − 1)i ∈ Dn .
Proof. Let a0 = a and a1 = b, let r = n − i − 1, let σ(⋄) = Σ = {a, b}, and let M′ = Rn,r{(b, 0)}. Using Algorithm 1, let M = (Q, Σ, δ, {0}, F) be the minimal DFA accepting σ(L(M′)). For all words over Σ of length less than n, M must halt in some state P ∈ Q, a subset of consecutive states of {0, . . . , n − 1}. Moreover, any state P ∈ Q of consecutive states of {0, . . . , n − 1}, with minimal element p, is reached by M when reading b^q a^p for some q ≥ 0. Also, any accepting states in Q that are subsets of {0, . . . , n − 1} of size n − r or greater are equivalent, as are any non-accepting states that are subsets of size n − r or greater such that the n − r greatest values in each set are identical. This implies that M requires ∑_{j=n−i}^{n} j states for words of length less than n.
For words of length n or greater, M may halt in a state P ∈ Q that is not a subset of consecutive states of {0, . . . , n − 1}, as for some r < p < n − 1, it is possible to have r, n − 1 ∈ P but p ∉ P. This only occurs when a transition from a state P with n − 1 ∈ P occurs, in which case M moves to a state P′ containing r, corresponding to δ′(n − 1, α) for all α ∈ Σ⋄. Thus, all states can be considered subsets of consecutive values if we consider r consecutive to n − 1 or, in other words, if we allow values from n − 1 to r to "wrap" around to each other. This means that M requires ∑_{j=1}^{i−1} j states for words of length n or greater. Therefore, ∑_{j=n−i}^{n} j + ∑_{j=1}^{i−1} j = n + (n − 1)i ∈ Dn. ⊓⊔
3 Constructs for Nn
Let Σ be an alphabet, and let Σi = {ai | a ∈ Σ} for all integers i > 0. Let σi : Σ → Σi be such that a ↦ ai, and let #j be a symbol in no Σi, for all i and j. Given a language L over Σ, the ith product of L and the ith #-product of L are, respectively, the languages

πi(L) = ∏_{j=1}^{i} σj(L),    π′i(L) = σ1(L) ∏_{j=2}^{i} {#_{j−1}} σj(L).

In general, we call any construct of this form, languages over different alphabets concatenated with # symbols, a #-concatenation. With these definitions in hand, we obtain our first bound for Nn.
Theorem 10. For n > 0, [n − ⌊(n−1)/3⌋, n] ⊆ Nn.
Proof. Let L = {aa, ba, b} be a language over Σ = {a, b}. A minimal NFA recognizing πi(L) is defined as having 2i + 1 states, q0, . . . , q_{2i}, with accepting state q_{2i}, starting state q0, and transition function δ defined by δ(q_{2j}, b_{j+1}) = {q_{2j+1}, q_{2(j+1)}}, δ(q_{2j}, a_{j+1}) = {q_{2j+1}}, and δ(q_{2j+1}, a_{j+1}) = {q_{2(j+1)}} for j < i. It is easy to see this is minimal: the number of states is equal to the maximal length of the words plus one. A minimal ⋄-DFA recognizing πi(L) is defined as having 3i + 1 states, q0, . . . , q_{3i−1} and q_err, with accepting states q_{3i−1} and q_{3i−2}, starting state q0, and transition function δ defined as follows:

– δ(q0, b1) = q2, δ(q0, ⋄) = q1, and δ(q1, a1) = q2;
– δ(q_{3j−1}, a_{j+1}) = q_{3j}, δ(q_{3j−1}, b_{j+1}) = q_{3j+1}, δ(q_{3j}, a_{j+1}) = q_{3(j+1)−1}, and δ(q_{3j+1}, a_{j+1}) = q_{3(j+1)−1} for 0 < j < i;
– δ(q_{3j+1}, a_{j+2}) = δ(q_{3(j+1)−1}, a_{j+2}) and δ(q_{3j+1}, b_{j+2}) = δ(q_{3(j+1)−1}, b_{j+2}) for 0 < j < i − 1.

The ⋄-substitution corresponds to σ(⋄) = Σ1 = {a1, b1} here. This ⋄-DFA is minimal.
Now, fix n; take any i ≤ ⌊(n−1)/3⌋. We can write n = 3i + r + 1, for some r ≥ 0. Let {αj}_{0 ≤ j ≤ r} be a set of symbols not in the alphabet of πi(L). Minimal NFA and ⋄-DFA recognizing πi(L) ∪ {α0 · · · αr} can clearly be obtained by adding to each a series of states q′0 = q0, q′1, . . . , q′r, and q′_{r+1} = q_{2i} and q′_{r+1} = q_{3i−1} respectively, with δ(q′j, αj) = q′_{j+1} for 0 ≤ j ≤ r. Hence, for i ≤ ⌊(n−1)/3⌋, we can produce a ⋄-DFA of size n = 3i + r + 1 which reduces to an NFA of size 2i + r + 1 = n − i. ⊓⊔
Our general interval is based on π′i(L), where no ⋄-substitutions exist over multiple Σi's. We need the following lemma.

Lemma 1. Let L, L′ be languages recognized by minimal NFAs N = (Q, Σ, δ, q0, F) and N′ = (Q′, Σ′, δ′, q′0, F′), where Σ ∩ Σ′ = ∅. Moreover, let # ∉ Σ, Σ′. Then L′′ = L{#}L′ is recognized by the minimal NFA N′′ = (Q ∪ Q′, Σ ∪ Σ′ ∪ {#}, δ′′, q0, F′), where δ′′(q, a) = δ(q, a) if q ∈ Q and a ∈ Σ; δ′′(q, a) = δ′(q, a) if q ∈ Q′ and a ∈ Σ′; δ′′(q, #) = {q′0} if q ∈ F; and δ′′(q, a) = ∅ otherwise. Consequently, the following hold:

1. For any L, minNFA(π′i(L)) = i · minNFA(L);
2. Let L1, . . . , Ln be languages whose minimal DFAs have no error states and whose alphabets are pairwise disjoint, and without loss of generality, let minDFA(L1) − min⋄-DFA(L1) ≥ · · · ≥ minDFA(Ln) − min⋄-DFA(Ln). Then

min⋄-DFA(L1{#1}L2{#2} · · · Ln) = 1 + min⋄-DFA(L1) + ∑_{i=2}^{n} minDFA(Li).
Theorem 11. Let L be a language whose minimal DFA has no error state. Moreover, assume min⋄-DFA(L) = minNFA(L). Fix some n and j, 0 < j ≤ ⌊(n − min⋄-DFA(L) − 1)/minDFA(L)⌋. Then n − j(minDFA(L) − min⋄-DFA(L)) − 1 ∈ Nn.

Proof. Since 0 < j ≤ ⌊(n − min⋄-DFA(L) − 1)/minDFA(L)⌋, we can write n = 1 + min⋄-DFA(L) + j · minDFA(L) + r for some r ≥ 0. Then, by Lemma 1(2), this corresponds to n = min⋄-DFA(π′_{j+1}(L) ∪ {w}), where w is a word corresponding to an r-length chain of states, as we used in the proof of Theorem 10. We also have minNFA(π′_{j+1}(L) ∪ {w}) = (j + 1) · min⋄-DFA(L) + r using Lemma 1(1) and our assumption that min⋄-DFA(L) = minNFA(L). Alternatively,

minNFA(π′_{j+1}(L) ∪ {w}) = n − j(minDFA(L) − min⋄-DFA(L)) − 1.

Our result follows. ⊓⊔
The above linear bounds can be improved, albeit with a loss of clarity in the overall construct. Consider the interval of values obtained in Theorem 4. Fix an integer x. The minimal integer y such that x ≤ y + (y−2)(y−3)/2 is clearly nx = ⌈(3 + √(8x − 15))/2⌉, for x ≥ 4. Associate with x and nx the corresponding DFAs and ⋄-DFAs used in the proof of Theorem 4, i.e., let Ln,m be the language in the proof with minimal ⋄-DFA size n and minimal DFA size m. If we replace each ⋄-transition in the minimal ⋄-DFA (with transitions for the letters of σ(⋄)) and remove the error state, we get a minimal NFA of size n − 1 accepting Ln,m (this NFA must be minimal since the maximal length of a word in Ln,m is n − 2). Noting that all deterministic automata in question have error states, we get, using Lemma 1(1), that min⋄-DFA(π′i(L_{nx,x})) = nx + (i − 1)(x − 1) and minNFA(π′i(L_{nx,x})) = i(nx − 1). This allows us to obtain the following linear bound.
Theorem 12. For n > nx ≥ 4, [n − (x − nx)⌊(n − nx)/(x − 1)⌋ − 1, n] ⊆ Nn.
Proof. For any n and fixed x, write n = nx + (i − 1)(x − 1) + r, for some 0 ≤ r < x − 1, which is realizable as a minimal ⋄-DFA by appending to the minimal ⋄-DFA accepting π′i(L_{nx,x}) an arbitrary chain of states of length r, using letters not in the alphabet of π′i(L_{nx,x}), similar to what we did in the proof of Theorem 10. This leads to a minimal NFA of size i(nx − 1) + r, giving the lower bound n − (x − nx)⌊(n − nx)/(x − 1)⌋ − 1 if we solve for i. The remaining values up to the upper bound can be obtained by decreasing i or replacing occurrences of L_{nx,x} with L_{nx,x−j} (for some j) and in turn adding additional chains of states of length r, to maintain the size of the ⋄-DFA. ⊓⊔
We can obtain even lower bounds by considering the sequence of DFAs defined in Theorem 8. Recall that for any n ≥ 1, we have a minimal DFA, which we call Mn, of size 2^n − 1; the equivalent minimal ⋄-DFA, M′n, has size n. Applying Algorithm 1 to M′n, the resulting NFA of size n is also minimal. Let n0 ≥ n1 ≥ · · · ≥ nk be a sequence of integers and consider

min⋄-DFA(L(M_{n0}){#1}L(M_{n1}) · · · {#k}L(M_{nk})) = 1 + n0 + ∑_{i=1}^{k} (2^{ni} − 1),  (3)

where the equality comes from Lemma 1(2). Iteratively applying Lemma 1 gives

minNFA(L(M_{n0}){#1}L(M_{n1}) · · · {#k}L(M_{nk})) = ∑_{i=0}^{k} ni.  (4)
To understand the difference between (3) and (4) in greater depth, let us view (n1, . . . , nk) as an integer partition, λ, or as a Young diagram, and assign each cell a value (see, e.g., [1]). In this case, the ith column of λ has each cell valued at 2^{i−1}. Transposing about y = −x gives the diagram corresponding to the transpose of λ, λ^T = (m1, . . . , m_{n1}), in which the ith row has each cell valued at 2^{i−1}. Note that m1 = k and there are, for each i, mi terms of 2^{i−1}. Fig. 4 gives an example of an integer partition and its transpose. Define Π(λ^T) = ∑_{i=1}^{n1} 2^{i−1} mi = ∑_{i=1}^{k} (2^{ni} − 1) and Σ(λ) = ∑_{i=1}^{k} ni.

Fig. 4. λ = (6, 4, 1, 1) (left) and λ^T = (4, 2, 2, 2, 1, 1) (right)
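These definitions translate directly into code. The sketch below is ours (the names transpose, big_pi, and big_sigma are not from the paper); it checks the definitions on the partition of Fig. 4:

```python
def transpose(lam):
    """lambda^T: the ith part counts the parts of lambda that are >= i."""
    return [sum(1 for p in lam if p >= i) for i in range(1, lam[0] + 1)] if lam else []

def big_pi(mu):
    """Pi(mu): each cell of the ith part (1-indexed) is worth 2^(i-1)."""
    return sum(m * 2 ** (i - 1) for i, m in enumerate(mu, start=1))

def big_sigma(lam):
    """Sigma(lambda): the total number of cells weighted by 1, i.e. the sum."""
    return sum(lam)

lam = [6, 4, 1, 1]                       # the partition of Fig. 4
assert transpose(lam) == [4, 2, 2, 2, 1, 1]
assert big_pi(transpose(lam)) == sum(2 ** p - 1 for p in lam)  # both sides equal 80
```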
Given this, we can view the language L described in (3) and (4), i.e., L = L(M_{n0}){#1}L(M_{n1}) · · · {#k}L(M_{nk}), as being defined by the integer n0 and the integer partition λ = (n1, . . . , nk) with n0 ≥ n1. This gives

min⋄-DFA(L) = 1 + n0 + Π(λ^T)  and  minNFA(L) = n0 + Σ(λ).
To further understand this, we must consider the following sub-problem: let Π(λ) = n. What are the possible values of Σ(λ)? To proceed here, we define the sequence pn recursively as follows: if n = 2^k − 1 for some k, then pn = k; otherwise, letting n = m + (2^k − 1) for k maximal, pn = k + pm. This serves as the minimal bound for the possible values of Σ(λ).
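The recursion for pn is straightforward to implement. A sketch with our naming; int.bit_length finds the maximal k with 2^k − 1 ≤ n:

```python
def p(n):
    """p_n: if n = 2^k - 1 then k; otherwise k + p_m where
    n = m + (2^k - 1) for the maximal such k."""
    k = (n + 1).bit_length() - 1   # largest k with 2^k - 1 <= n
    if n == 2 ** k - 1:
        return k
    return k + p(n - (2 ** k - 1))

first_values = [p(n) for n in range(1, 11)]  # 1, 2, 2, 3, 4, 4, 3, 4, 5, 5
```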
Theorem 13. If Π(λ) = n, then Σ(λ) ≥ pn. Consequently, for all n and k = ⌊log2(n + 1)⌋, k + pn ∈ N_{1+k+n}.

Proof. To show that pn is attainable, we prove that the following partition, λn, satisfies Σ(λn) = pn: if n = 2^k − 1 for some k, λn = (1^k); otherwise, letting n = m + (2^k − 1) for k maximal, λn = λ_{2^k−1} + λm. Here, the sum of two partitions is the partition obtained by adding the summands term by term; (1^k) is the k-tuple of ones. Clearly, for partitions λ and λ′, Π(λ + λ′) = Π(λ) + Π(λ′) and Σ(λ + λ′) = Σ(λ) + Σ(λ′). By construction, Π(λn) = n and Σ(λn) = pn. To see this, if n = 2^k − 1 for some k, Π(λn) = Π((1^k)) = Π((k)^T) = 2^k − 1 = n and Σ(λn) = Σ((1^k)) = ∑_{i=1}^{k} 1 = k = pn. Otherwise,

Π(λn) = Π(λ_{2^k−1}) + Π(λm) = Π((1^k)) + Π(λm) = 2^k − 1 + m = n,
Σ(λn) = Σ(λ_{2^k−1}) + Σ(λm) = Σ((1^k)) + Σ(λm) = k + pm = pn.

To show that pn, or λn, is minimal, we can proceed inductively.

From the above, each pn is attainable by a partition of size k, where k is the maximal integer with n ≥ 2^k − 1. Alternatively, k = ⌊log2(n + 1)⌋. Fixing n, we get k + pn ∈ N_{1+k+n}. ⊓⊔
4 Conclusion
For languages of words of equal length, Theorem 2 gives the maximum element in Dn found so far and Theorem 3 gives that maximum element when we restrict to a constant alphabet size. For languages of words of bounded length, Theorem 6 gives the least upper bound for elements in Dn based on minimal ⋄-DFAs of the form (1) and Theorem 7 gives the maximum element found so far when we restrict to a binary alphabet. For languages with some arbitrarily long words, Theorem 8 gives the least upper bound of 2^n − 1 for elements in Dn, a bound that can be achieved over a binary alphabet. We conjecture that for n ≥ 1, [n, 2^n − 1] ⊆ Dn. This conjecture has been verified for all 1 ≤ n ≤ 7 based on all our constructs from Section 2.

In Section 3, via products, Theorem 10 gives an interval for Nn. If we replace products with #-concatenations, Theorem 12 increases the interval further. Theorem 13 does not give an interval, but an isolated point not previously achieved. With the exception of this latter result, all of our bounds are linear. Some of our constructs satisfy min⋄-DFA(L) = minNFA(L), ignoring error states. As noted earlier, this is a requirement for #-concatenations to produce meaningful bounds. Constructs without this restriction are often too large to be useful.
References
1. Andrews, G.E., Eriksson, K.: Integer Partitions. Cambridge University Press (2004)
2. Berstel, J., Boasson, L.: Partial words and a theorem of Fine and Wilf. Theoretical
Computer Science 218, 135–141 (1999)
3. Blanchet-Sadri, F.: Algorithmic Combinatorics on Partial Words. Chapman &
Hall/CRC Press, Boca Raton, FL (2008)
4. Dassow, J., Manea, F., Mercaş, R.: Connecting partial words and regular languages.
In: Cooper, S.B., Dawar, A., Löwe, B. (eds.) CiE 2012, Computability in Europe.
Lecture Notes in Computer Science, vol. 7318, pp. 151–161. Springer-Verlag, Berlin,
Heidelberg (2012)
5. Fischer, M., Paterson, M.: String matching and other products. In: Karp, R. (ed.)
7th SIAM-AMS Complexity of Computation. pp. 113–125 (1974)
6. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to Automata Theory, Languages, and Computation, international edition (2nd ed.). Addison-Wesley (2003)
7. Yu, S.: Regular languages. In: Rozenberg, G., Salomaa, A. (eds.) Handbook of Formal Languages, vol. 1, chap. 2, pp. 41–110. Springer-Verlag, Berlin (1997)