Ambiguity, Nondeterminism and State Complexity of Finite Automata

Acta Cybernetica 23 (2017) 141–157.
Ambiguity, Nondeterminism and State Complexity
of Finite Automata
Yo-Sub Hana, Arto Salomaab, and Kai Salomaac
Abstract
The degree of ambiguity counts the number of accepting computations of
a nondeterministic finite automaton (NFA) on a given input. Alternatively,
the nondeterminism of an NFA can be measured by counting the amount of
guessing in a single computation or the number of leaves of the computation
tree on a given input. This paper surveys work on the degree of ambiguity
and on various nondeterminism measures for finite automata. In particular, we focus on state complexity comparisons between NFAs with quantified
ambiguity or nondeterminism.
Keywords: finite automata, nondeterminism, degree of ambiguity, state
complexity
Dedicated to the memory of Zoltán Ésik (1951–2016).
1
Introduction
Finite automata are a fundamental model of computation that has been systematically studied since the 1950’s. At the same time many important questions on
finite automata and regular languages remain open [7, 18, 52]. The last decades
have seen much work on the descriptional complexity, or state complexity, of regular
languages [10, 13, 15, 16, 17, 28]. The state complexity (respectively, nondeterministic state complexity) of a regular language L is the optimal size of a deterministic
finite automaton (DFA) (respectively, a nondeterministic finite automaton (NFA))
recognizing L. The effect of a regularity preserving operation on the minimal DFA
(or alternatively a minimal NFA) is called the state complexity of the operation.
The state complexity of basic operations on regular languages was considered first
by Maslov [34] and further references can be found in the survey by Gao et al. [9].
a Department of Computer Science, Yonsei University, 50, Yonsei-Ro, Seodaemum-Gu, Seoul
120-749, Republic of Korea, E-mail: [email protected]
b Department of Mathematics and Statistics and Turku Centre for Computer Science, University
of Turku, 20014 Turku, Finland, E-mail: [email protected]
c School of Computing, Queen’s University, Kingston, Ontario K7L 2N8, Canada, E-mail:
[email protected]
DOI: 10.14232/actacyb.23.1.2017.9
142
Y.-S. Han, A. Salomaa, K. Salomaa
Yu and co-authors have considered also the state complexity of combined operations and in a sequence of papers culminating with [6] have determined the precise
worst-case state complexity of all combinations of two basic language operations.
Establishing the precise state complexity of combined language operations is often
quite involved, and for general combinations of operations that include marked concatenation and intersection the question is even undecidable [45]. Ésik et al. [8] have
introduced techniques to estimate the state complexity of combined operations.
Ambiguity is a fundamental concept in grammar derivations. The ambiguity
of regular expressions and finite state machines was first systematically considered
by Book et al. [3]. A regular expression is unambiguous if it denotes each string
in at most one way. A nondeterministic finite automaton (NFA) is unambiguous if
each string has at most one accepting computation. Book et al. [3] show that the
Glushkov automaton construction preserves ambiguities of a regular expression. A
more restrictive notion of one-unambiguity was introduced by Brüggemann-Klein
and Wood [4]: every regular language can be denoted by an unambiguous regular
expression but not, in general, by a one-unambiguous regular expression.
The degree of ambiguity of an NFA A on a string w is the number of accepting
computations of A on w. The degree of ambiguity of A is the maximal degree of
ambiguity of A on any input string, if the maximum exists, and in this case A is said
to be finitely ambiguous. Otherwise the degree of ambiguity of A can be measured
as a function of the length of the inputs. Ravikumar and Ibarra [42] have first studied systematically the size trade-offs between the unambiguous, finitely ambiguous,
polynomially ambiguous and exponentially ambiguous NFAs. The celebrated separation result of Leung [30] establishes that there exist (exponentially ambiguous)
n-state NFAs such that any equivalent polynomially ambiguous NFA needs 2n − 1
states. Hromkovič et al. [19, 20] have used powerful techniques from communications complexity for state complexity separations for NFAs with different degree of
ambiguity.
The degree of ambiguity is defined in terms of the number of accepting computations, and does not directly limit the amount of nondeterminism, or the amount
of guessing, used by an automaton. In an unambiguous NFA, even though an accepting computation is unique, the computation may include any number of nondeterministic steps – unambiguity implies just that at any nondeterministic step at
most one choice can lead to acceptance. In order to develop a quantitative understanding of the power of nondeterminism, one can directly measure the number of
nondeterministic steps used by an NFA.
Nondeterminism measures for Turing machine computations were originally considered by Kintala and Fischer [25]. Kintala and Wotschke [26] first quantified the
amount of nondeterminism in a finite automaton computation and showed, roughly
speaking, that there is a significant difference in the determinization size blow-up
between NFAs allowing different finite numbers of nondeterministic choices in a
computation (where the number of nondetermistic steps is at most the logarithm
of the number of states). The hierarchy result has been refined in the spectrum
result of Goldstine et al. [11] that will be discussed in section 4.3.
Commonly used nondeterminism measures count the number of nondeterminis-
Ambiguity, Nondeterminism and State Complexity
143
tic steps (or the amount of guessing in bits of information) on a best accepting computation [11], or the number of leaves of the entire computation tree [19, 38]. Further variants limit the amount of nondeterminism on a worst computation [19, 39].
Some interesting relationships between the degree of ambiguity and various nondeterminism measures have been established by Goldstine et al. [12] and Hromkovič
et al. [19].
This paper surveys work on the growth rates of the degree of ambiguity and
the various nondeterminism measures, and on algorithms to determine the growth
rate for a given NFA. In particular, we focus on state completity comparisons between NFAs having different degrees of ambiguity or allowing different amounts of
nondeterminism. Strong separation results are known for succinctness comparisons
between NFAs of different ambiguity growth rates (finite, polynomial or exponential). However, in the case of limited nondeterminism, practically all existing work
on state complexity is restricted to comparisons between different finite amounts
of nondeterminism, that is, the amount of nondeterminism on any input is at most
a given constant. State complexity of NFAs with limited nondeterminism that is
measured as a function of input length is a topic for future study.
First we fix some notation in section 2. Work on the degree of ambiguity is
described in section 3 and section 4 deals with the various nondeterminism measures
for NFAs.
2
Definitions
Here we recall and introduce some basic notation and definitions. More information
on finite automata and regular languages can be found e.g. in [44, 47, 51]. General
background on degrees of ambiguity and limited nondeterminism for finite automata
can be found in [10, 13, 16, 41].
The set of positive integers is N and the cardinality of a finite set F is |F |. The
set of strings over a finite alphabet Σ is Σ∗ and ε is the empty string. A bounded
language is a subset of a∗1 a∗2 · · · a∗k , where ai , 1 ≤ i ≤ k, are (not necessarily distinct)
elements of the alphabet Σ.
A nondeterministic finite automaton (NFA) is a tuple A = (Q, Σ, δ, q0 , F ) where
Q is the finite set of states, Σ is the input alphabet, δ : Q×Σ → 2Q is the transition
function, q0 ∈ Q is the initial state and F ⊆ Q is the set of final states. The
transition function δ is in the usual way extended as a function Q × Σ∗ → 2Q and
the language recognized by A is L(A) = {w ∈ Σ∗ | δ(q0 , w) ∩ F 6= ∅}. If |δ(q, b)| ≤ 1
for all q ∈ Q and b ∈ Σ, the automaton A is a deterministic finite automaton
(DFA). Note that we allow DFAs to have undefined transitions.
It is well known that DFA’s and NFA’s both recognize the class of regular
languages. For a regular language L, the state complexity of L (respectively, the
nondeterministic state complexity of L) is the number of states of the state minimal
DFA (respectively, of a state minimal NFA1 ) recognizing L.
1 An
NFA with the smallest number of states recognizing a language L need not be unique.
144
Y.-S. Han, A. Salomaa, K. Salomaa
Consider an NFA A = (Q, Σ, δ, q0 , F ). The branching of a transition from
state q ∈ Q on input symbol b ∈ Σ is |δ(q, b)|. A computation of A on a string
w = b1 b2 · · · bk , bi ∈ Σ, i = 1, . . . , k, k ≥ 0, is a sequence of states (p1 , . . . , p` ),
where p1 ∈ δ(q0 , b1 ), pj+1 ∈ δ(pj , bj+1 ), j = 1, . . . ` − 1, and either ` = k, or, ` < k
and δ(p` , b`+1 ) = ∅.
The sequence of states (p1 , . . . , p` ) is a complete computation on b1 b2 · · · bk if ` =
k and an accepting computation is a complete computation that ends in an accepting
state of F . The set of all computations (respectively, all accepting computations)
of A on the string w is denoted compA (w) (respectively, compacc
A (w)).
Intuivively, a computation of A on a string w is a sequence of states that A
reaches when started with the initial state and the symbols of w are read one by
one. A complete computation ends with a state reached after consuming all symbols
of w. A computation may also end with a state where the transition on the next
symbol of w is undefined.
3
Degree of ambiguity
Book et al. [3] first considered systematically the ambiguity of regular expressions
and NFAs, and the relationship between these notions. A regular expression is
unambiguous if it denotes each string in at most one way. A more restrictive
notion of 1-unambiguity, or 1-determinism, was introduced by Brüggemann-Klein
and Wood [4]. A regular expression is 1-unambiguous if its position automaton is
deterministic. Every regular language has an unambiguous regular expression but
the 1-unambiguous expressions define a strict subclass of regular languages [4, 14].
An NFA is unambiguous if any string has at most one accepting computation.
Formally, the degree of ambiguity of an NFA A on a string w, daA (w), is the number
of accepting computations of A on w. The degree of ambiguity of A on strings of
length m is defined as
daA (m) = max{daA (w) | w ∈ Σm }.
Strictly speaking, we use the symbol daA to denote two different functions: it
denotes a function Σ∗ → N and a function N → N.
The degree of ambiguity of A is said to be finite (or bounded) if the values
daA (m), m ∈ N are bounded, and in this case we denote
dasup
A = sup daA (m).
m∈N
The NFA A is unambiguous if dasup
A = 1. Clearly every DFA is unambiguous.
Following Ravikumar and Ibarra [42], with respect to the degree of ambiguity we consider five different classes of NFAs: DFAs, unambiguous NFAs (UFA),
finitely ambiguous NFAs (FNFA), polynomially ambiguous NFAs (PNFA) and general (potentially exponentially ambiguous) NFAs. An NFA A is strictly polynomially ambiguous if A is not finitely ambiguous and there is a polynomial p(·) such
that daA (m) ≤ p(m) for all m ∈ N. The polynomial degree of growth of A is the
Ambiguity, Nondeterminism and State Complexity
145
minimal degree of a polynomial p0 (m) that upper bounds the function daA (m). An
NFA A is strictly exponentially ambiguous if it is not polynomially ambiguous.
It is known that for a fixed k ∈ N the equivalence of FNFAs with degree of
ambiguity k can be tested in polynomial time [27, 49]. This is significant because,
as we will see, the determinization of even UFAs can cause an exponential size
blow-up. Also, different variants of the minimization problem for UFAs remain
intractable, see [16] for references.
The syntactic definition of an NFA A does not directly tell us what is the degree
of ambiguity of A. It was shown by Mandel and Simon [33] and Reutenauer [43],
and by others independently, that it is decidable whether a given NFA is finitely
ambiguous or polynomially ambiguous. Reutenauer [43] also gave an algorithm to
compute the polynomial degree of growth of an NFA.
Building on charaterizations by Ibarra and Ravikumar [21] and Reutenauer [43],
Weber and Seidl [50] gave a simpler structural characterization of finitely ambiguous and polynomial ambiguous NFAs that yields a polynomial time algorithm for
the corresponding decision problems. The characterization implies also that, for
an NFA with unbounded ambiguity, the degree of ambiguity must grow at least
linearly.
Theorem 1 (Weber and Seidl [50]). It can be decided in polynomial time whether
a given NFA A is finitely ambiguous, strictly polynomially ambiguous or strictly
exponentially ambiguous. Furthermore, the polynomial degree of growth of A can
be computed in polynomial time.
For the question of determining the exact finite degree of ambiguity, the complexity depends essentially on whether the finite degree of ambiguity is a constant
or considered part of the input. For a fixed k, it can be tested in polynomial time
whether the degree of ambiguity of an NFA is greater than k [49], but when k is
part of the input the complexity is essentially worse.
Theorem 2 (Chan and Ibarra [5]). For a given NFA A and k ∈ N, testing whether
the degree of ambiguity of A is at least k is PSPACE-complete.
A relevant question is also how large can be the degree of ambiguity of an nstate FNFA. A double exponential upper bound was given already by Mandel and
3
Simon [33] and this was impoved to 2Θ(n ) by Reutenauer [43]. The bound was
further improved by Weber and Seidl [50] who also show that for some subclasses
of NFAs the maximal finite degree of ambiguity is exactly 2Θ(n) .
Theorem 3 (Weber and Seidl [50]). The degree of ambiguity of an n-state FNFA
n
is at most 5 2 · nn .
3.1
Ambiguity and state complexity
Clearly every regular language can be recognized by an unambiguous NFA, but the
succinctness of the description depends significantly on the degree of ambiguity.
Schmidt [46] first developed methods to prove lower bounds for the size of UFAs
146
Y.-S. Han, A. Salomaa, K. Salomaa
and also showed that the determinization of UFAs causes, in the worst case, an
exponential size blow-up. The lower bound was improved by different authors
and the precise worst case size blow-up was determined by Leung [32]. Leiss [29]
constructed n-state UFAs with multiple initial states where any equivalent DFA
needs 2n − 1 states and Leung [32] gave a construction for the same exponential
size blow-up using UFAs with only one initial state.
Theorem 4 (Leiss [29], Leung [32]). For each n ∈ N, there exists an UFA with n
states such that the minimal equivalent DFA has 2n − 1 states. For each n ∈ N,
there exists an FNFA with n states such that any equivalent UFA has 2n − 1 states.
Note that because our definition allows DFAs to be incomplete the above bound
for the UFA determinization differs by one from the bound stated in [32]. Inputdriven pushdown automata (IDPDA) define a subclass of deterministic context-free
languages that retains many of the desirable properties of the regular languages.
In particular, an n-state nondeterministic IDPDA has an equivalent deterministic
2
machine with 2Θ(n ) states [1]. Recently, Okhotin and Salomaa [36] have shown
that, analogously with Theorem 4, determinizing an unambiguous IDPDA and
converting a general nondeterministic IDPDA to an unambiguous one both cause,
2
in the worst case, the same 2Θ(n ) size blow-up.
For state complexity comparisons between NFAs with different growth rates
of ambiguity we use the following terminology. Consider classes X and Y of devices (the classes we consider are DFAs, UFAs, FNFAs, PNFAs and general NFAs,
possibly with additional restictions). We say that class Y is (super-polynomially)
separated from class X, if there exists a collection of languages Ln , n ∈ N, such that
Ln is recognized by a device from class Y having n states, but for any polynomial
p(n) and for sufficiently large values of n, a device from class X for Ln must have
more than p(n) states. This means, roughly speaking, that simulation of devices
of class Y by devices of class X, in the worst case, causes a super-polynomial size
blow-up.
Ravikumar and Ibarra [42] first considered systematically succinctness comparisons between FNFAs, PNFAs and general NFAs. In particular, they established
the following result for NFAs accepting bounded languages.
Theorem 5 (Ravikumar and Ibarra [42]). Any NFA accepting a bounded language can be converted to an FNFA with at most polynomial size blow-up. The
class of FNFAs (respectively, the class of UFAs) recognizing a bounded language is
super-polynomially separated from the corresponding class of UFAs (respectively, of
DFAs).
The descriptional complexity comparison between the classes FNFA, PNFA and
NFA recognizing general regular languages was left open in [42]. Although in the
case of bounded languages, NFAs of exponential ambiguity can be simulated by
PNFAs and FNFAs of polynomial size, it was conjectured that for general regular
languages the classes are super-polynomially separated. Leung [30] and Hromkovič
et al. [19] have established that general NFAs can be super-polynomially more
succinct than PNFAs.
Ambiguity, Nondeterminism and State Complexity
147
Theorem 6 (Leung [30], Hromkovič et al. [19]). The class of NFAs is superpolynomially separated from the class of PNFAs.
The communication complexity techniques used by Hromkovič et al. [19] to
prove Theorem 6 yield a substantially simplified proof. However, their proof does
not give the optimal size blow-up 2n −1 for the NFA–to–PNFA transformation that
is obtained in the original ad hoc proof where Leung [30] shows that any PNFA for
the family of languages Ln = (0 + (01∗ )n−1 0)∗ , n ≥ 1, cannot be smaller than an
incomplete DFA. It is easy to give an n-state NFA of exponential ambiguity that
recognizes Ln .
Ravikumar and Ibarra [42] also conjectured that polynomially ambiguous NFAs
can be significantly more succinct than finitely ambiguous NFAs. This question
remained open for over 20 years. After about 10 years Hromkovič et al. [19] gave a
partial result showing that there exist (n + 2)-state PNFAs (with linear degree of
ambiguity) such that any equivalent FNFA with degree of ambiguity k must have
n−2
at least 2 k − 2 states. The question was solved affirmatively by Hromkovič and
Schnitger [20] using the powerful communication complexity techniques.
Theorem 7 (Hromkovič and Schnitger [20]). For n ∈ N there exists PNFA A with
number of states polynomial in n such that any FNFA recognizing the language
1
L(A) has at least 2Ω(n 3 ) states.
Theorem 7 is obtained as a special case of the more technical statement given
next in Theorem 8 by setting there the parameter k to be one and, in fact, the
degree of ambiguity of the PNFA A is only linear. The general result by Hromkovič
and Schnitger [20] gives a super-polynomial succinctness separation between NFAs
with degree of ambiguity O(mk ) and O(mk−1 ), k ∈ N.
1
Theorem 8 (Hromkovič and Schnitger [20]). Let r and t = (r/k 2 ) 3 be positive
integers. There exist languages Lr,k having an NFA with degree of ambiguity O(mk )
and k · poly(r) states such that any NFA for Lr,k with degree of ambiguity o(mk )
has at least 2Ω(r
(1
3
5
/k 3 ) )
states.
Theorem 7 and Theorem 8 give a super-polynomial separation, respectively, between PNFAs and FNFAs and between NFAs having different polynomial degree of
growth for ambiguity. The statement of Theorem 8 defines the languages Lr,k only
for restricted values of the subindices, but for the separation result it is sufficient
that Lr,k exists for infinitely many values of r and k. However, the lower bounds
are not of the order 2Θ(n) as is known in the separation of general NFAs and PNFAs [30]. In fact, Hromkovič and Schnitger [20] suspect that the lower bound of
Theorem 8 may not be optimal even for the languages used in the lower bound
construction.
To conclude this section, we mention that Okhotin [35] has studied the state
complexity of determinization of unary UFAs and Jirasek et al. [22] recently studied
the state complexity of operations on UFAs.
148
4
Y.-S. Han, A. Salomaa, K. Salomaa
Limited nondeterminism
Nondeterminism measures can be based on the amount of nondeterminism used in
a best accepting computation of an NFA A on a given string w, on the amount of
nondeterminism in a worst computation of A on w or on the size of the computation
tree of A on w [10, 11, 19, 38].
In the following, A = (Q, Σ, δ, q0 , F ) is always an NFA. Consider a string w =
b1 b2 · · · bk , bi ∈ Σ, i = 1, . . . , k, and a computation of A on w,
C = (p1 , . . . , p` ), pi ∈ Q, 1 ≤ i ≤ ` ≤ k.
Recall that ` < k is possible only if δ(p` , b`+1 ) = ∅, that is, a computation reads
the entire string w unless it encounters an undefined transition.
The guessing of the computation C, γA (C) [11], is
γA (C) = log2 |δ(q0 , b1 )| +
`−1
X
log2 |δ(pi , bi+1 )|.
i=1
The branching of the first step of the computation C is |δ(q0 , b1 )|, and after the
first step the state is p1 . The branching of the second step is then |δ(p1 , b2 )|, and
the branching of the ith step is |δ(pi−1 , bi )|, 3 ≤ i ≤ ` − 1. Thus, intuitively, γA (C)
represents the amount of guessing, in bits of information, that occurs during the
computation C. If A is a DFA, the amount of guessing in any computation of A is
zero.
The branching of the computation C, βA (C) [11], is defined as the product of the
branchings of the individual transitions of C, or in other words, βA (C) = 2γA (C) .
The amount of guessing an NFA uses on a string can be defined either as a best
case or a worst case measure. The guessing of a string w ∈ L(A) [11] is the amount
of guessing of the best accepting computation:
γA (w) = min{ γA (C) | C ∈ compacc
A (w) },
and the maximum guessing of A on a string w ∈ Σ∗ [39] is
max
γA
(w) = max{ γA (C) | C ∈ compA (w) }.
Note that the best case measure is defined as the amount of guessing on the best
accepting computation while the maximum guessing considers all, not necessarily
complete, computations. Instead of counting the amount of guessing in bits of
information, Hromkovič et al. [19] use the advice measure that counts the number
of nondeterministic steps on the worst computation on a given input and Leung [31]
uses a corresponding best case measure. These measures are within a multiplicative
max
constant (depending only on the NFA A) of the γA
and γA measures, respectively.
The branching (respectively, the trace) of A on the string w is then βA (w) =
max
2γA (w) (respectively, τA (w) = 2γA (w) [39, 41]).
The total amount of nondeterminism used by A in all computations on a string
w is represented by the number of leaves of the computation tree of A on w. The
Ambiguity, Nondeterminism and State Complexity
149
number of leaves is the same as the number of computations of A on w, |compA (w)|,
and this value is called the tree width of A on w, twA (w). The tree width measure
is called ‘leaf size’ in [19].
Similarly as we did with the degree of ambiguity, the tree width, the (maximum)
guessing, the branching and the trace of an NFA A defines a function on naturals
by taking the maximum value of the measure on strings of length m (m ∈ N). If χ
is any of tw, γ, γ max , β, or τ then χA : N → N is defined as
χA (m) = max{ χA (w) | w ∈ Σm }, m ∈ N.
We say that the χ-function of A is finite (or bounded ) if the value χsup
=def
A
supm∈N χA (m) is finite.
Hromkovič et al. [19] have characterized the possible growth rates of the tree
width of an NFA. As for degree of ambiguity, the tree width of an NFA cannot be
unbounded and sublinear.
Theorem 9 (Hromkovič et al. [19]). For any NFA A, the function twA (m) is
either bounded by a constant, or between linear and polynomial in m, or otherwise
in 2Θ(m) .
The above characterization can be effectively decided. An NFA A has unbounded tree width if and only if some cycle of A contains a nondeterministic
transition and this observation yields a simple polynomial time algorithm to test
whether twA (m) is bounded [38]. On the other hand, there is no efficient algorithm
to determine whether the guessing of an NFA is bounded.
Theorem 10 (Leung [31]). For a given NFA A, it is PSPACE-complete to decide
whether γA (m) is bounded.
Interestingly it is known that the guessing of an NFA may be unbounded and
grow sublinearly.
Theorem 11 (Simon [48],
√ Goldstine et al. [12]). For each k ∈ N, there is an NFA
A such that γA (m) = Θ( k m).
Due to the exponential correspondence between the guessing and branching
measures, Theorem√ 11 implies that, for each k ∈ N, there exists an NFA A such
k
that βA (m) = 2Θ( m) . It is not known whether the branching of an NFA can be
polynomially bounded but infinite [39].
Open 1. If NFA A has unbounded branching does this imply that the growth rate
of βA (m) must be superpolynomial?
It is known that, for a unary NFA A, βA (m) is always either bounded or in
2Θ(m) [41] and the possible growth rates of a variant of the branching measure
considered in [37] are similarly restricted. For the worst-case branching measure
trace, Palioudakis et al. [39] have shown that, for an n-state NFA A, τA (m) is either
m
bounded or τA (m) ≥ 2b n c .
150
4.1
Y.-S. Han, A. Salomaa, K. Salomaa
NFAs with large finite nondeterminism
Similarly as in the case of degree of ambiguity [50], for an n-state NFA A with
bounded guessing (respectively, bounded tree width) we can ask how large can
the guessing (respectively, the tree width) of A be. Leung [31] has shown that an
n-state NFA with limited nondeterminism in any computation can make at most
2n − 2 nondeterministic transitions and has constructed a family of NFAs with
bounded nondeterminism that is considerably larger than the number of states.
Theorem 12 (Leung [31]). If A is an n-state NFA with bounded guessing, then
n
sup
γA (m) = O(2n ). There exist n-state NFAs Bn , n ∈ N such that γB
= 2 3 − 2.
n
A limitation of the above result is that in the NFAs Bn are defined over a
growing alphabet and a large number of the nondeterministic moves are redundant.
It remains open whether there exist n-state NFAs A with bounded guessing that
is larger than n and where the language L(A) cannot be recognized by an NFA
of same size and less nondeterminism [31]. The notion of “less nondeterminism”
could be formalized analogously as is done below with the notion of optimality in
the case of tree width.
Hromkovič et al. [19] observed that the tree width of an n-state NFA, if bounded,
is at most nn . Palioudakis et al. [38] improved this bound and, furthermore, gave
a construction of n-state NFAs with all possible values of bounded tree width that
do not have “redundant” nondeterminism.
The notion of avoiding redundant nondeterminism is formalized as follows. A
finite tree width NFA A with n states is said to have optimal tree width if L(A)
sup
cannot be recognized by any NFA B with n1 states where n1 ≤ n and twsup
B ≤ twA
and at least one of the inequalities is strict.
Theorem 13 (Palioudakis et al. [38]). The tree width of an n-state finite tree width
NFA is at most 2n−2 . For every n ≥ 2 and 1 ≤ k ≤ 2n−2 there exists an n-state
NFA over a binary alphabet having optimal tree width k.
Note that the above bound is less than the upper bound for the finite ambiguity
of an n-state NFA (from Theorem 3). Naturally, for any NFA A and string w, the
degree of ambiguity of A on w is at most the tree width of A on w (and usually
much smaller than the tree width). However, an upper bound for the finite tree
width of an n-state NFA does not imply a corresponding bound for the degree of
ambiguity because an NFA may have finite ambiguity and unbounded tree width.
4.2
Comparing nondeterminism measures and ambiguity
Directly from the definitions it follows that if an NFA A has finite tree width, then
the guessing (and branching) of A is also finite, but the converse implication does
not need to hold. The tree width of A is finite if and only if the trace of A is finite.
Proposition 1 (Palioudakis et al. [39]). If A is an NFA with finite tree width, then
sup
sup
twA
twsup
A ≤ τA ≤ 2
−1
.
Ambiguity, Nondeterminism and State Complexity
151
It is known that the above inequalities cannot be improved in general, that is,
there are NFAs for which either of the inequalities of Proposition 1 becomes and
equality [39].
Hromkovič et al. [19] have established relationships between the tree width,
maximum guessing and degree of ambiguity in a minimal NFA. They use the name
‘leaf size’ for tree width and instead of maximum guessing they use an “advice”
measure that is within a constant factor of maximum guessing. The advice of an
NFA A on a string w counts the largest number of nondeterministic steps in any
computation of A on w.
Theorem 14 (Hromkovič et al. [19]). If A is a minimal NFA, then for all m ∈ N,
max
max
max(γA
(m), daA (m)) ≤ twA (m) = O(daA (m) · γA
(m)).
Goldstine et al. [12] have established a subtle relationship between ambiguity
and guessing for NFAs where all states are final. They define the ambiguity of a
string w as the number of complete computations on w. To avoid confusion, we
call the number of complete computations of an NFA A on a string w the complete
ambiguity 2 of A on w. Note that if all states of A are final, the complete ambiguity
of A coincides with the degree of ambiguity as defined in section 3 and if A has
no undefined transitions then the complete ambiguity of A coincides with the tree
width of A.
By definition, the guessing function γA (m) of an NFA grows at most linearly.
If the guessing is bounded or grows linearly, then the complete ambiguity may be
either bounded or unbounded but, in the intermediate case, where the guessing is
unbounded but sublinear, then ambiguity must always be unbounded. Recall from
Theorem 11 that there exist NFAs with unbounded and sublinear growth rate of
the guessing function.
Theorem 15 (Goldstine et al. [12]). Let A be an NFA. If γA (m) is non-constant
and sublinear, then the complete ambiguity of A must be unbounded. On the other
hand, if γA (m) is in O(1) or in Θ(m), then the complete ambiguity may be either
bounded or unbounded.
4.3
Limited nondeterminism and state complexity
An important descriptional complexity question is the succinctness comparison
of NFAs employing different amounts of nondeterminism and, in particular, the
determinization size blow-up of NFAs with limited nondeterminism. Goldstine et
al. [11] have shown that converting a general NFA to an NFA with finite branching
involves, in the worst case, an exponential size blow-up.
Theorem 16 (Goldstine et al. [11]). For each n ∈ N, there exists an n-state NFA
A such that any finite branching NFA recognizing the language L(A) needs at least
2n−1 states.
2 Keeler
[24] calls this the string path width of A on w.
152
Y.-S. Han, A. Salomaa, K. Salomaa
Also, Goldstine et al. [11] have shown that there exist regular languages for
which different finite amounts of nondeterminism yield incremental savings in the
number of states. The following two theorems give the “spectrum” result of [11]
stated in a slightly simplified form.
Theorem 17 (Goldstine et al. [11]). Let A be a minimal DFA of size 2n −1, n ≥ 2.
Then
(i) for 2 ≤ k ≤ logn n , the optimal size of an NFA with branching k for L(A) is
2
n
at least 2 k , and,
(ii) for k ≥
n
log2 n ,
an NFA with branching k for L(A) has size at least n.
Furthermore, they show that the bounds of Theorem 17 are close to best possible:
Theorem 18 (Goldstine et al. [11]). For n ≥ 2 there exists a minimal NFA An
with n + 1 states such that if we denote by σn [k] the optimal size of an NFA with
branching k recognizing L(An ) then the following relations hold:
n
n
(i) σn [1] = 2n , and, 2 k ≤ σn [k] < 2k · 2 k when 2 ≤ k <
n
(ii) n + 1 ≤ σn [k] < 2k · 2 k when
n
log2 n
n
log2 n ,
≤ k < n,
(iii) n + 1 ≤ σn [k] < 4n, when k ≥ n.
Recall that tree width is more restrictive than branching in the sense that
an NFA with finite tree width necessarily has finite branching, but the converse
implication does not hold, in general. Contrasting the result of Theorem 16, every
finite tree width NFA has an equivalent DFA of polynomial size.
Theorem 19 (Palioudakis et al. [38]). For an NFA A with n states having tree
Pk
width at most k ≤ n − 1, the language L(A) has a DFA of size 1 + j=1 n−1
.
j
Furthermore, for every 1 ≤ k ≤ n − 1, there exists an n state NFA An,k with
tree width k over a binary alphabet such that the minimal DFA for L(An,k ) has
Pk
1 + j=1 n−1
states.
j
Pk−`+1 n−1
Palioudakis et al. [38] gives also an upper bound 1+ i=1
for converting
i
an n state NFA with tree width k to an NFA with tree width 2 ≤ ` < k, but a
corresponding lower bound is missing. Also no spectrum result for tree width
analogous to the spectrum result for branching (Theorem 18) is known. That is,
there is no result that yields good bounds, for a given sequence of languages, for
the succinctness of NFAs over a range of different tree width values.
Deterministic finite automata with multiple initial states (MDFA) can be viewed
as a restricted type of automata with limited nondeterminism: the only nondeterminism consists of the choice of the initial state. With an elegant construction
based on modular arithmetic, Kappes [23] has given an efficient simulation of an
NFA with finite branching by an MDFA.
Ambiguity, Nondeterminism and State Complexity
153
Theorem 20 (Kappes [23]). An NFA with n states and branching k (k ∈ N) can
be simulated by an MDFA with k · n states and k initial states.
The bound as stated in [23] is k · n + 1 and the construction produces an MDFA
with a dead state. Above in Theorem 20 we allow the possibility that an MDFA can
have undefined transitions (following the definition of [40]). Palioudakis et al. [40]
have established an almost matching lower bound by showing that for infinitely
many values n, k ∈ N there exists an n-state NFA with branching k such that any
k
equivalent MDFA needs at least 1+log
k · n states.
To conclude we mention that limiting the nondeterminism of an NFA is not
sufficient to make the minimization problem tractable. It is well known that minimization of general NFAs is PSPACE-complete. Björklund and Martens [2] have
shown that minimization remains NP-hard, roughly speaking, for all finite automaton models that extend the class of DFAs. The hardness result is for the class of
δNFAs which are a very restricted subclass of tree width two NFAs [2].
5
Conclusion and open problems
Descriptional complexity comparison of nondeterministic finite automata of different degrees of ambiguity and employing different amounts of nondeterminism is
a foundational question in automata theory. The spectrum result of Goldstine et
al. [11] (Theorems 17 and 18) establishes the existence of a sequence of languages
for which different finite amounts of branching allow incremental savings in the
number of states, and the succinctness comparisons are approximately the best
possible.
On the other hand, very little is known about the state complexity of NFAs
where the amount of nondeterminism is unbounded and measured as a function
of input length. While there is a super-polynomial separation between the size of
finitely ambiguous, polynomially ambiguous, and general NFAs, no succinctness
comparisons between NFAs of different unbounded branching or unbounded tree
width are known. This can be a topic for future research. In particular, it would
be interesting to know whether the powerful communication complexity techniques
used by Hromkovič et al. [19, 20] for succinctness comparisons of NFAs with different degrees of ambiguity can be used to establish good lower bounds and separation
results for the state complexity of NFAs where the branching (or the tree width) is
measured as a function of input length.
A further topic of interest could be the succinctness comparison of NFAs of
given degree of ambiguity and NFAs of given branching (or tree width). Only a
few tentative results are known [38] in this direction.
Acknowledgements
Han was supported by the Basic Science Research Program through NRF funded
by MEST (2015R1D1A1A01060097) and the Yonsei University Future-leading Re-
154
Y.-S. Han, A. Salomaa, K. Salomaa
search Initiative of 2016. K. Salomaa was supported by the Natural Sciences and
Engineering Research Council of Canada Grant OGP0147224.
References
[1] Alur, R. and Madhusudan, P. Adding nesting structure to words. Journal of
the ACM, 56(3), 2009.
[2] Björklund, H. and Martens, W. The tractability frontier for NFA minimization.
J. Comput. System Sci., 78: 198–210, 2012.
[3] Book, R., Even, S., Greibach, S. and Ott, G. Ambiguity in graphs and expressions. IEEE Transactions on Computers, c-20(2):149–153, 1971.
[4] Brüggemann-Klein, A. and Wood, D. One-unambiguous regular languages.
Information and Computation, 140(2):229–253, 1998.
[5] Chan, T.-H. and Ibarra, O. On the finite-valuedness problem for sequential
machines. Theoret. Comput. Sci., 23: 95–101, 1983.
[6] Cui, B., Gao, Y., Kari, L., Yu, S. State complexity of combined operations
with two basic operations. Theoret. Comput. Sci., 437: 82–102, 2012.
[7] Ellul, K., Krawetz, B., Shallit, J. and Wang, M.-W. Regular expressions: New
results and open problems. Journal of Automata, Languages and Combinatorics, 10: 407–437, 2005.
[8] Ésik, Z., Gao, Y, Liu, G., Yu, S. Estimation of state complexity of combined
operations. Theoret. Comput. Sci., 410: 3272–3280, 2009.
[9] Gao, Y., Moreira, N., Reis, R., Yu, S. A review on state complexity of individual operations. To appear in Computer Science Review. (Available at
www.dcc.fc.up.pt/dcc/Pubs/TReports/TR11/dcc-2011-08.pdf)
[10] Goldstine, J., Kappes, M., Kintala, C.M.R., Leung, H., Malcher, A. and
Wotschke, D. Descriptional complexity of machines with limited resources.
J. Universal Computer Science, 8(2): 193–234, 2002.
[11] Goldstine, J., Kintala, C.M.R. and Wotschke, D. On measuring nondeterminism in regular languages. Inform. Comput., 86: 179–194, 1990.
[12] Goldstine, J., Leung, H. and Wotschke, D. On the relation between ambiguity
and nondeterminism in finite automata. Inform. Comput., 100: 261–270, 1992.
[13] Gruber, H., Holzer, M. From finite automata to regular expressions and back
— A summary of descriptional complexity. Intern. J. Foundations Comput.
Sci., 26: 1009–1040, 2015.
Ambiguity, Nondeterminism and State Complexity
155
[14] Han, Y.-S. and Wood, D. Generalizations of 1-deterministic regular languages.
Information and Computation, 206(9-1): 1117–1125, 2008.
[15] Holzer, M. and Kutrib, M. Nondeterministic finite automata – Recent results
on the descriptional and computational complexity. Intern. J. Foundations
Comput. Sci., 20(4): 563–580, 2009.
[16] Holzer, M. and Kutrib M. Descriptional complexity of (un)ambiguous finite
state machines and pushdown automata. In: Reachability Problems 2010, Lect.
Notes Comput. Sci. 6227, pp. 1–23, 2010
[17] Holzer, M. and Kutrib, M. Descriptional and computational complexity of
finite automata — A survey. Information and Computation, 209: 456–470,
2011.
[18] Hromkovič, J. Descriptional complexity of finite automata: Concepts and open
problems. Journal of Automata, Languages and Combinatorics, 7(4): 519–531,
2002.
[19] Hromkovič, J., Seibert, S., Karhumäki, J., Klauck, H. and Schnitger, G. Communication complexity method for measuring nondeterminism in finite automata. Inform. Comput., 172: 202–217, 2002.
[20] Hromkovič, J. and Schnitger, G. Ambiguity and communication. Theory Comput. Syst., 48: 517–534, 2011.
[21] Ibarra, O. and Ravikumar, B. On sparseness, ambiguity and other decision
problems for acceptors and transducers. In Proc. of STACS’86, Lect. Notes
Comput. Sci. 210, Springer, pp. 171–179, 1986.
[22] Jirásek, J., Jirásková, G., Šebej, J. Operations on unambiguous finite automata. In Developments in Language Theory, DLT, Lect. Notes Comput. Sci.
9840, Springer, pp. 243–255, 2016.
[23] Kappes, M. Descriptional complexity of deterministic finite automata with
multiple intial states. J. Automata, Languages and Combinatorics, 5: 269–
278, 2000.
[24] Keeler, C. New measures for finite automaton complexity and subregular language hierarchies. MSc thesis, Queen’s University, 2016.
[25] Kintala, C.M.R. and Fischer, P.C. Refining nondeterminism in relativized
polynomial time computations. SIAM J. Comput., 9(1): 46–53, 1980.
[26] Kintala, C.M.R and Wotschke, D. Amounts of nondeterminism in finite automata. Acta Informatica, 13(2): 199–204, 1980.
[27] Kuich, W. Finite automata and ambiguity. Rept. 253 of the IIG. Techinische
Universität Graz, 1988.
156
Y.-S. Han, A. Salomaa, K. Salomaa
[28] Kutrib, M. and Pighizzini, G. Recent trends in descriptional complexity of
formal languages. Bulletin of the EATCS, 111: 70–86, 2013.
[29] Leiss, E. Succinct representation of regular languages by Boolean automata.
Theoret. Comput. Sci., 13: 323–330, 1981.
[30] Leung, H. Separating exponentially ambiguous finite automata from polynomially ambiguous finite automata. SIAM Journal of Computing, 27(4):
1073–1082, 1998.
[31] Leung, H. On finite automata with limited nondeterminism. Acta Informatica,
35: 595–624, 1998.
[32] Leung, H. Descriptional complexity of NFA of different ambiguity. Intern. J.
Foundations Comput. Sci., 16(5): 975–984, 2005.
[33] Mandel, A. and Simon, I. On finite semi-groups of matrices. Theoret. Comput.
Sci., 5: 183–204, 1977.
[34] Maslov, A.N. Estimates of the number of states of finite automata. Soviet
Math. Dokl., 11: 1373–1375, 1970.
[35] Okhotin, A. Unambiguous finite automata over a unary alphabet. Inform.
Comput., 212: 15–36, 2012.
[36] Okhotin, A., Salomaa, K. Descriptional complexity of unambiguous inputdriven pushdown automata. Theoret. Comput. Sci., 566: 1–11, 2015.
[37] Palioudakis, A., Han, Y.-S. and Salomaa, K. Growth rate of minimum branching. Submitted for publication, February 2017.
[38] Palioudakis, A., Salomaa, K. and Akl, S.G. State complexity of finite tree
width NFAs. J. Automata, Languages and Combinatorics, 17(2–4): 245–264,
2012.
[39] Palioudakis, A., Salomaa, K. and Akl, S.G.: Comparisons between measures of
nondeterminism for finite automata. In Proc. of DCFS’13, Lect. Notes Comput. Sci. 8031, Springer, pp. 217–228, 2013.
[40] Palioudakis, A., Salomaa, K. and Akl, S.G. Lower bound for converting an
NFA with finite nondeterminism into an MDFA. J. Automata, Languages and
Combinatorics, 19: 251–264, 2014.
[41] Palioudakis, A., Salomaa, K. and Akl, S.G. Quantifying nondeterminism in
finite automata. Annals of the University of Bucharest, No. 2, pp. 89–100,
2015.
[42] Ravikumar, B. and Ibarra, O.H. Relating the type of ambiguity of finite
automata to the succinctness of their representation. SIAM Journal of Computing, 18(6): 1263–1282, 1989.
Ambiguity, Nondeterminism and State Complexity
157
[43] Reutenauer, C. Propriétés arithmétiques et topologiques de séries rationnelles
en variables non commutatives. Thése troisiéme cycle, Université Paris VI,
1977.
[44] Rozenberg, G., Salomaa, A. (Eds.) Handbook of Formal Languages, Vol. I–III,
Springer-Verlag, 1997.
[45] Salomaa, A., Salomaa, K., and Yu, S. Undecidability of state complexity.
Internat. J. Comput. Math., 90: 1310–1320, 2013.
[46] Schmidt, E.M. Succinctness of descriptions of context-free, regular and finite
languages. PhD thesis, Cornell University, Ithaca, NY, 1978
[47] Shallit, J. A Second Course in Formal Languages and Automata Theory, Cambridge University Press, 2009.
[48] Simon, I. The nondeterministic complexity of a finite automaton. In: Lothaire,
M. (ed.) Mots - mélanges offerts a M.P. Schützenberger, Paris, Hermes, pp.
384–400, 1990.
[49] Stearns, R. and Hunt III, H. On the equivalence and containment problems
for unambiguous regular expressions, regular grammars and finite automata.
SIAM J. Comput., 14: 598–611, 1985.
[50] Weber, A. and Seidl, H. On the degree of ambiguity of finite automata. Theoret.
Comput. Sci., 88: 325–349, 1991.
[51] Yu, S. Regular languages, in [44], Vol. I, pp. 41–110, 1997.
[52] Yu, S. State complexity of regular languages. Journal of Automata, Languages
and Combinatorics, 6(2): 221–234, 2001.