Three New Algorithms for Regular Language Enumeration

Margareta Ackerman, University of Waterloo, Waterloo, ON
Erkki Makinen, University of Tampere, Tampere, Finland
[Figure: example NFA with states A, B, C, D, E and transitions labeled 0 and 1]
What kind of words does this NFA accept?
ε 0 00 11 000 110 0000 1100 1110 00000 ....
Cross-section problem: enumerate all words of length n
accepted by the NFA in lexicographic order.
Enumeration problem: enumerate the first m words
accepted by the NFA in length-lexicographic order.
Min-word problem: find the lexicographically first word
of length n accepted by the NFA.
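The three problems can be pinned down with a brute-force reference implementation. The sketch below is only a specification aid, not one of the algorithms in this talk: it tries every word of length n, so it is exponential in n. The NFA encoding (a dict from state to a list of (symbol, target) pairs), the function names, the `max_len` cap, and the small example automaton are all assumptions made for illustration.

```python
from itertools import product

def accepts(delta, start, finals, word):
    """Simulate the NFA on `word` by tracking the set of reachable states."""
    current = {start}
    for symbol in word:
        current = {t for q in current for (a, t) in delta.get(q, []) if a == symbol}
    return bool(current & finals)

def cross_section(delta, start, finals, alphabet, n):
    """Cross-section problem: all accepted words of length n, in lexicographic order."""
    return ["".join(w) for w in product(sorted(alphabet), repeat=n)
            if accepts(delta, start, finals, w)]

def min_word(delta, start, finals, alphabet, n):
    """Min-word problem: the first accepted word of length n, or None."""
    words = cross_section(delta, start, finals, alphabet, n)
    return words[0] if words else None

def enumerate_words(delta, start, finals, alphabet, m, max_len=20):
    """Enumeration problem: the first m accepted words in length-lexicographic order.
    `max_len` is an illustrative cap so the loop stops on finite languages."""
    out = []
    for n in range(max_len + 1):
        out.extend(cross_section(delta, start, finals, alphabet, n))
        if len(out) >= m:
            break
    return out[:m]

# Hypothetical automaton (not the one on the slide): words with an even number of 1s.
delta = {"even": [("0", "even"), ("1", "odd")],
         "odd": [("0", "odd"), ("1", "even")]}
print(cross_section(delta, "even", {"even"}, "01", 2))
print(min_word(delta, "even", {"even"}, "01", 3))
print(enumerate_words(delta, "even", {"even"}, "01", 5))
```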
Applications
• Correctness testing: an enumeration provides evidence that an NFA
generates the expected language.
• An enumeration algorithm can be used to verify
whether two NFAs accept the same language (Conway,
1971).
• A cross-section algorithm can be used to determine
whether every word accepted by a given NFA is a
power, that is, a string of the form w^n for n > 1, |w| > 0
(Anderson, Rampersad, Santean, and Shallit, 2007).
• A cross-section algorithm can be used to solve the “k-subset
of an n-set” problem: enumerate all k-subsets of
a set in alphabetical order (Ackerman & Shallit, 2007).
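The k-subset application can be illustrated with one possible encoding, which is an assumption here and not necessarily the construction used by Ackerman & Shallit: an automaton whose states count how many elements have been taken, reading "+" (take the next element) or "-" (skip it). Because "+" sorts before "-", the cross-section of length n in lexicographic order lists the k-subsets in alphabetical order. The cross-section routine is a brute-force stand-in, and all names are hypothetical.

```python
from itertools import product

def k_subset_nfa(k):
    """States 0..k count the '+' symbols read so far; only state k is final."""
    delta = {}
    for q in range(k + 1):
        delta[q] = [("-", q)]                # skip the next element
        if q < k:
            delta[q].append(("+", q + 1))    # take the next element
    return delta, 0, {k}

def cross_section(delta, start, finals, n):
    """Brute-force cross-section: accepted words of length n in lexicographic order."""
    alphabet = sorted({a for edges in delta.values() for (a, _) in edges})
    words = []
    for w in product(alphabet, repeat=n):
        current = {start}
        for symbol in w:
            current = {t for q in current for (a, t) in delta[q] if a == symbol}
        if current & finals:
            words.append("".join(w))
    return words

def k_subsets(items, k):
    """Decode each accepted word into the subset it selects (items must be sorted)."""
    delta, start, finals = k_subset_nfa(k)
    return [[x for x, mark in zip(items, w) if mark == "+"]
            for w in cross_section(delta, start, finals, len(items))]

print(k_subsets(["a", "b", "c", "d"], 2))
```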
Objectives
Find algorithms for the three problems that are
• Asymptotically efficient in
– Size of the NFA (s states and d transitions)
– Output size (t)
– The length of the words in the cross-section (n)
• Efficient in practice
Previous Work
• A cross-section algorithm, where finding each consecutive
word is super-exponential in the size of the cross-section
(Domosi, 1998).
• A cross-section algorithm that is exponential in n (the length of
words in the cross-section) is found in the Grail
computation package.
– “Breadth-First-Search” approach
– Trace all paths of length n in the NFA, storing the paths that end
at a final state.
– O(dσ^(n+1)), where d is the number of transitions in the NFA and σ
is the alphabet size.
Previous Polynomial Algorithms:
Makinen, 1997
• Dynamic programming solution
– Min-word: O(dn^2)
– Cross-section: O(dn^2 + dt)
– Enumeration: O(d(e + t))
Quadratic in n.
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output
Previous Polynomial Algorithms:
Ackerman and Shallit, 2007
• Linear in the length of words in the cross-section
– Min-word: O(s^2.376 n)
– Cross-section: O(s^2.376 n + dt)
– Enumeration: O(s^2.376 c + dt)
Linear in n.
c: the number of cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output
Previous Polynomial Algorithms:
Ackerman and Shallit, 2007
• The algorithm uses “smart breadth first search,”
following only those paths that lead to a final state.
• Main idea: compute a look-ahead matrix, used to
determine whether there is a path of length i starting
at state s and ending at a final state.
• In practice, Makinen’s algorithm (slightly modified) is
usually more efficient, except on some boundary
cases.
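One way to picture a look-ahead table is through Boolean powers of the NFA's adjacency matrix: state q has a path of length i to a final state exactly when row q of the i-th power has a true entry in a final column. The sketch below illustrates that idea only. It is not the authors' implementation, it uses naive cubic multiplication rather than fast matrix multiplication, and the function names and the three-state example NFA are assumptions.

```python
def bool_matmul(x, y):
    """Boolean product of two square matrices given as lists of lists."""
    size = len(x)
    return [[any(x[i][k] and y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def lookahead(states, delta, finals, n):
    """complete[i][q] is True iff some path of length i leads from q to a final state."""
    index = {q: k for k, q in enumerate(states)}
    size = len(states)
    adj = [[False] * size for _ in range(size)]
    for q, edges in delta.items():
        for _, t in edges:
            adj[index[q]][index[t]] = True
    power = [[i == j for j in range(size)] for i in range(size)]  # adjacency^0
    complete = []
    for _ in range(n + 1):
        complete.append({q: any(power[index[q]][index[f]] for f in finals)
                         for q in states})
        power = bool_matmul(power, adj)
    return complete

# Hypothetical three-state NFA used only for illustration; A is the final state.
delta = {"A": [("3", "C")],
         "B": [("0", "A"), ("2", "B")],
         "C": [("1", "A"), ("1", "B")]}
table = lookahead(["A", "B", "C"], delta, {"A"}, 2)
print(table[1])
```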
Contributions
Present 3 algorithms for each of the
enumeration problems, including:
• O(dn) algorithm for min-word
• O(dn+dt) algorithm for cross-section
• Algorithms with improved practical
performance for each of the enumeration
problems
Contributions: Detailed
• We present three sets of algorithms:
1. AMSorted:
- An efficient min-word algorithm, based on Makinen’s original algorithm.
- Cross-section and enumeration algorithms based on this min-word
algorithm.
2. AMBoolean:
- A more efficient min-word algorithm, based on minWordAMSorted.
- Cross-section and enumeration algorithms based on this min-word
algorithm.
3. Intersection-based:
- An elegant min-word algorithm.
- A cross-section algorithm based on this min-word algorithm.
Key ideas behind our first two
algorithms
- Makinen’s algorithm uses simple dynamic
programming, which is efficient in practice on
most NFAs.
- The algorithm by Ackerman & Shallit uses
“smart breadth first search,” following only
those paths that lead to a final state.
- We build on these ideas to yield algorithms
that are more efficient both asymptotically
and in practice.
Makinen’s original min-word algorithm
      1     2       3
A     -     (3,C)   (3,C)
B     0     (2,B)   (0,A)
C     1     (1,B)   (1,B)
[Figure: example NFA with states A, B, C and transitions labeled 0, 1, 1, 2, 3]
S[i] stores a representation of the minimal word w of
length i that appears on a path from S to a final state.
The minimal word of length n can be found by tracing
back from the last column of the start state.
• Initialize the first column.
• For columns i = 2...n:
– For each state S, find S[i] by comparing all words of length i
appearing on paths from S to a final state (i operations per comparison).
Theorem: Makinen’s original min-word
algorithm is O(dn^2).
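The column-by-column computation can be sketched as follows. This is a minimal illustration of the dynamic programming idea, not Makinen's code: each candidate word is rebuilt by tracing back through the table, which is where the extra factor of i per comparison comes from. The NFA encoding, the function name, and the example automaton are assumptions; the transitions were chosen to be consistent with the table entries shown above, and A is assumed to be both the start state and the only final state.

```python
def minword_original(delta, start, finals, n):
    """Minimal word of length n accepted from `start`, or None.
    Assumes every state appears as a key of `delta`."""
    if n == 0:
        return "" if start in finals else None
    # table[i][q] = (symbol, next_state) beginning a minimal word of length i from q
    table = [dict() for _ in range(n + 1)]

    def word(q, i):
        """Trace back the word of length i stored for state q (costs O(i))."""
        out = []
        while i > 0:
            a, q = table[i][q]
            out.append(a)
            i -= 1
        return "".join(out)

    for i in range(1, n + 1):
        for q in delta:
            best = None
            for a, t in delta[q]:
                usable = (t in finals) if i == 1 else (t in table[i - 1])
                if not usable:
                    continue
                candidate = a + (word(t, i - 1) if i > 1 else "")
                if best is None or candidate < best[0]:   # O(i) string comparison
                    best = (candidate, (a, t))
            if best is not None:
                table[i][q] = best[1]
    return word(start, n) if start in table[n] else None

# Hypothetical NFA consistent with the example table; final state A is an assumption.
delta = {"A": [("3", "C")],
         "B": [("0", "A"), ("2", "B")],
         "C": [("1", "A"), ("1", "B")]}
print(minword_original(delta, "A", {"A"}, 3))
```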
New min-word algorithm:
MinWordAMSorted
Idea: Sort every column by the words that the
entries represent.
[Figure: the example NFA and its table, with each column sorted by the words its entries represent]
• We define an order on {S[i] : S a state in N}.
• If A[1]=a and B[1]=b, where a<b, then
A[1]<B[1].
• For i > 1, A[i] = (a, A’) and B[i] = (b, B’)
– If a<b, then A[i] < B[i].
– If a = b, and A’[i-1] < B’[i-1], then A[i] < B[i].
• If A[i] is defined, and B[i] is undefined, then
A[i] > B[i].
• Initialize the first column.
• For columns i = 2...n:
– For each state S, find S[i] using only column i-1 and the
edges leaving S (d operations in total).
– Sort column i (s log s operations).
Theorem: The algorithm
minWordAMSorted is O((s log s + d) n).
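A minimal sketch of the sorted variant, under the same assumed NFA encoding: after each column is filled it is sorted once, and every state receives its rank. A candidate (symbol, next state) is then compared through the pair (symbol, rank of next state in the previous column) instead of through the full word. The function name and example automaton are hypothetical; states with no entry in a column are simply left out of the ranking, which is a simplification of the order defined above.

```python
def minword_sorted(delta, start, finals, n):
    """Minimal word of length n accepted from `start`, or None."""
    if n == 0:
        return "" if start in finals else None
    # Column 1: smallest symbol on an edge that enters a final state.
    column = {}
    for q, edges in delta.items():
        hits = [(a, t) for a, t in edges if t in finals]
        if hits:
            column[q] = min(hits, key=lambda e: e[0])
    table = [None, column]                      # table[i][q] = (symbol, next_state)
    rank = {q: r for r, q in
            enumerate(sorted(column, key=lambda q: column[q][0]))}
    for i in range(2, n + 1):
        column, keys = {}, {}
        for q, edges in delta.items():          # d candidate checks per column
            cands = [((a, rank[t]), (a, t)) for a, t in edges if t in rank]
            if cands:
                keys[q], column[q] = min(cands, key=lambda c: c[0])
        table.append(column)
        # Sorting the column costs O(s log s) and yields the ranks for column i.
        rank = {q: r for r, q in enumerate(sorted(keys, key=keys.get))}
    if start not in table[n]:
        return None
    word, q = [], start
    for i in range(n, 0, -1):                   # trace back from the last column
        a, q = table[i][q]
        word.append(a)
    return "".join(word)

# Hypothetical NFA consistent with the example table; final state A is an assumption.
delta = {"A": [("3", "C")],
         "B": [("0", "A"), ("2", "B")],
         "C": [("1", "A"), ("1", "B")]}
print(minword_sorted(delta, "A", {"A"}, 3))
```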
New cross-section algorithm:
crossSectionAMSorted
• A state S is i-complete if there exists a path of
length i from state S to a final state.
• To enumerate all words of length n:
1. Call minWordAMSorted (create a table) O((s log s +d) n).
2. Perform a “smart BFS”: O(dt)
- Begin at the start state.
- Follow only those paths of length n that end at a final state,
by using the table to identify i-complete states.
Theorem: The algorithm crossSectionAMSorted
is O(n (s log s + d) + dt).
New enumeration algorithm:
enumAMSorted
Run the cross-section algorithm for successive lengths
until the required number of words is listed, while
reusing the table.
Theorem: The algorithm enumAMSorted
is O(c (s log s + d) + dt).
c: the number of cross-sections encountered
d: the number of transitions in the NFA
t: the number of characters in the output
What have we got so far?
                 New Algorithms            Previous Algorithms
                 Sorted                    Makinen         Ackerman & Shallit
min-word         O((s log s + d)n)         O(dn^2)         O(s^2.376 n)
cross-section    O(n(s log s + d) + dt)    O(dn^2 + dt)    O(s^2.376 n + dt)
enumeration      O(c(s log s + d) + dt)    O(de + dt)      O(s^2.376 c + dt)
c: the number of cross-sections encountered
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output
New min-word algorithm:
minWordAMBoolean
Idea: instead of using a table to find the
minimal word, construct a table whose only
purpose is to determine i-complete states.
Can be done using a similar algorithm to
minWordAMSorted, but more efficiently, since
there is no need to sort.
      1   2   3
A     F   T   T
B     T   T   T
C     T   T   F
[Figure: example NFA with states A, B, C]
• Fill in the first column.
• For i = 2...n:
– For every state S, determine whether S is i-complete using only
the transitions leaving S and column i-1 (d operations in total).
• Starting at the start state, follow minimal transitions along paths
that can complete a word of length n (using the table).
Theorem: The algorithm minWordAMBoolean is
O(dn).
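The steps above can be sketched as follows, under the same assumed NFA encoding. One small deviation from the slide: the table here also has a column 0 (the final states), which makes the loop uniform. Because the NFA is nondeterministic, the walk keeps the set of all states reached by the minimal prefix so far. The function names and the example automaton are hypothetical.

```python
def complete_table(delta, finals, n):
    """complete[i] is the set of i-complete states, for i = 0..n."""
    complete = [set(finals)]
    for _ in range(n):
        prev = complete[-1]
        # A state is i-complete iff one of its edges enters an (i-1)-complete state.
        complete.append({q for q, edges in delta.items()
                         if any(t in prev for _, t in edges)})
    return complete

def minword_boolean(delta, start, finals, n):
    """Minimal word of length n accepted from `start`, or None."""
    complete = complete_table(delta, finals, n)
    if start not in complete[n]:
        return None
    word, current = [], {start}
    for i in range(n, 0, -1):
        # Only edges into (i-1)-complete states can still finish a word of length n.
        options = [(a, t) for q in current for a, t in delta.get(q, [])
                   if t in complete[i - 1]]
        smallest = min(sym for sym, _ in options)
        word.append(smallest)
        current = {t for sym, t in options if sym == smallest}
    return "".join(word)

# Hypothetical three-state NFA used only for illustration; A is start and final.
delta = {"A": [("3", "C")],
         "B": [("0", "A"), ("2", "B")],
         "C": [("1", "A"), ("1", "B")]}
print(minword_boolean(delta, "A", {"A"}, 3))
```

Each column needs one pass over the d transitions and no sorting, which is where the O(dn) bound comes from.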
New cross-section algorithm:
crossSectionAMBoolean
• Extend to a cross-section algorithm using the
same approach as the Sorted algorithm.
• To enumerate all words of length n:
– Call minWordAMBoolean (create a table) O(dn).
– Perform a “smart BFS”: O(dt)
- Begin at the start state.
- Follow only those paths of length n that end at a final state,
by using the table to identify i-complete states.
Theorem: The algorithm crossSectionAMBoolean
is O(dn+dt).
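A minimal sketch of the table-guided search, under the same assumed NFA encoding. It is written depth-first here, with symbols tried in sorted order, so that words come out in lexicographic order directly; the pruning rule (follow an edge only if its target is complete for the remaining length) is the one described above. Grouping targets by symbol avoids listing a word twice when several paths spell it. Names and the example automaton are hypothetical.

```python
def cross_section_boolean(delta, start, finals, n):
    """All accepted words of length n, in lexicographic order."""
    complete = [set(finals)]                     # complete[i] = i-complete states
    for _ in range(n):
        prev = complete[-1]
        complete.append({q for q, edges in delta.items()
                         if any(t in prev for _, t in edges)})
    words = []

    def extend(prefix, current, remaining):
        if remaining == 0:
            words.append(prefix)
            return
        by_symbol = {}
        for q in current:
            for a, t in delta.get(q, []):
                if t in complete[remaining - 1]:  # prune dead-end paths
                    by_symbol.setdefault(a, set()).add(t)
        for a in sorted(by_symbol):
            extend(prefix + a, by_symbol[a], remaining - 1)

    if start in complete[n]:
        extend("", {start}, n)
    return words

# Hypothetical three-state NFA used only for illustration; A is the final state.
delta = {"A": [("3", "C")],
         "B": [("0", "A"), ("2", "B")],
         "C": [("1", "A"), ("1", "B")]}
print(cross_section_boolean(delta, "B", {"A"}, 3))
```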
New enumeration algorithm:
enumAMBoolean
Run the cross-section algorithm for successive lengths
until the required number of words is listed, while
reusing the table.
Theorem: The algorithm enumAMBoolean
is O(de + dt).
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output
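Reusing the table across lengths can be sketched like this: the list of i-complete sets grows by one column per length, and each cross-section is produced with the columns already computed. The `max_len` cap is an illustrative assumption so that the loop stops when the language has fewer than m words; the function name, the encoding, and the example automaton are also hypothetical.

```python
def enumerate_boolean(delta, start, finals, m, max_len=50):
    """First m accepted words in length-lexicographic order."""
    complete = [set(finals)]      # complete[i] = i-complete states, grown lazily
    words = []

    def extend(prefix, current, remaining):
        if remaining == 0:
            words.append(prefix)
            return
        by_symbol = {}
        for q in current:
            for a, t in delta.get(q, []):
                if t in complete[remaining - 1]:
                    by_symbol.setdefault(a, set()).add(t)
        for a in sorted(by_symbol):
            if len(words) < m:    # stop early once m words are listed
                extend(prefix + a, by_symbol[a], remaining - 1)

    n = 0
    while len(words) < m and n <= max_len:
        if start in complete[n]:  # otherwise this cross-section is empty
            extend("", {start}, n)
        prev = complete[-1]       # add one column; earlier columns are reused
        complete.append({q for q, edges in delta.items()
                         if any(t in prev for _, t in edges)})
        n += 1
    return words[:m]

# Hypothetical three-state NFA used only for illustration; A is start and final.
delta = {"A": [("3", "C")],
         "B": [("0", "A"), ("2", "B")],
         "C": [("1", "A"), ("1", "B")]}
print(enumerate_boolean(delta, "A", {"A"}, 5))
```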
What have we got so far?
                 New Algorithms                           Previous Algorithms
                 Sorted                    Boolean        Makinen         Ackerman & Shallit
min-word         O((s log s + d)n)         O(dn)          O(dn^2)         O(s^2.376 n)
cross-section    O(n(s log s + d) + dt)    O(dn + dt)     O(dn^2 + dt)    O(s^2.376 n + dt)
enumeration      O(c(s log s + d) + dt)    O(de + dt)     O(de + dt)      O(s^2.376 c + dt)
c: the number of cross-sections encountered
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output
Intersection-Based Algorithms
• We present surprisingly elegant min-word and
cross-section algorithms that have the
asymptotic efficiency of the Boolean-based
algorithms.
• However, these algorithms are not as efficient
in practice as the Boolean-based and Sorted-based algorithms.
New min-word algorithm:
minWordIntersection
Let N be the input NFA, and A be the NFA that accepts the language of all
words of length n.
1. Let C = N x A
2. Remove all states of C that cannot be
reached from the final states of C using
reversed transitions.
3. Starting at the start state, follow the
minimal n consecutive transitions to a final
state.
Example: let n = 2.
[Figures: automaton N; automaton A, which accepts all words of length 2; the product automaton C = N x A, before and after removing the states that cannot reach a final state]
Thus the minimal word of length 2 accepted by N is “11”.
Asymptotic running time of
minWordIntersection
1. Let C = N x A (concatenate n copies of N).
2. Remove all states of C that cannot be
reached from the final states of C using
reverse transitions.
3. Starting at the start state, follow the
minimal n consecutive transitions to a final state.
Each step is proportional to the size of C, which is O(nd).
Theorem: The algorithm minWordIntersection
is O(dn).
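The three steps can be sketched with an explicit product construction. Here A is taken to be a chain of n + 1 states that reads any symbol, so the states of C are pairs (state of N, position), an assumption about how A is represented. The NFA encoding, the function name, and the example automaton are hypothetical, and the example is not the automaton N from the slide.

```python
def minword_intersection(delta, start, finals, n):
    """Minimal word of length n accepted from `start`, or None."""
    # Step 1: C = N x A, with transitions (q, i) -a-> (t, i + 1) for i < n.
    product = {}
    for i in range(n):
        for q, edges in delta.items():
            product[(q, i)] = [(a, (t, i + 1)) for a, t in edges]
    final_c = {(f, n) for f in finals}
    # Step 2: keep only the states of C that reach a final state,
    # found by a search over reversed transitions.
    reverse = {}
    for src, edges in product.items():
        for _, dst in edges:
            reverse.setdefault(dst, []).append(src)
    alive, stack = set(final_c), list(final_c)
    while stack:
        node = stack.pop()
        for src in reverse.get(node, []):
            if src not in alive:
                alive.add(src)
                stack.append(src)
    if (start, 0) not in alive:
        return None
    # Step 3: follow the minimal symbol n times; every surviving path ends in a final state.
    word, current = [], {(start, 0)}
    for _ in range(n):
        options = [(a, dst) for node in current for a, dst in product[node]
                   if dst in alive]
        smallest = min(sym for sym, _ in options)
        word.append(smallest)
        current = {dst for sym, dst in options if sym == smallest}
    return "".join(word)

# Hypothetical three-state NFA used only for illustration; A is start and final.
delta = {"A": [("3", "C")],
         "B": [("0", "A"), ("2", "B")],
         "C": [("1", "A"), ("1", "B")]}
print(minword_intersection(delta, "A", {"A"}, 2))
```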
New cross-section algorithm:
crossSectionIntersection
• To enumerate all words of length n, perform
BFS on C = N x A after removing all states from which
no final state can be reached (found using
reverse transitions).
• Since all remaining paths of length n starting at the start
state lead to a final state, there is no need to
check for i-completeness.
Theorem: The algorithm crossSectionIntersection
is O(dn+dt).
Practical Performance
• We compared Makinen’s, Ackerman-Shallit, AMSorted,
AMBoolean, and Intersection-based algorithms.
• We tested the algorithms on a variety of NFAs: dense, sparse,
few and many final states, different alphabet sizes, the worst
case for Makinen’s algorithm, etc.
• Here are the best performing algorithms:
– Min-word: AMSorted
– Cross-section: AMBoolean
– Enumeration: AMBoolean
Summary
                 New Algorithms                                              Previous Algorithms
                 Sorted                      Boolean         Intersection    Makinen         Ackerman & Shallit
min-word         O((s log s + d)n) *         O(dn)           O(dn)           O(dn^2)         O(s^2.376 n)
cross-section    O(n(s log s + d) + dt)      O(dn + dt) *    O(dn + dt)      O(dn^2 + dt)    O(s^2.376 n + dt)
enumeration      O(c(s log s + d) + dt)      O(de + dt) *    -               O(de + dt)      O(s^2.376 c + dt)
c: the number of cross-sections encountered
e: the number of empty cross-sections encountered
d: the number of transitions in the NFA
n: the length of words in the cross-section
t: the number of characters in the output
*: most efficient in practice
Open problems
• Extending the intersection-based cross-section
algorithm to an enumeration algorithm.
• Lower bounds.
• Can better results be obtained using a
different order?
• Restricting attention to a smaller family of
NFAs.