Definition

Reduction of the number of states in finite automata
Representation of an automaton for the purpose of
computation requires space proportional to the number of
states.
For storage efficiency, it is desirable to reduce the number of
states as far as possible.
One method for reducing the states of a DFA is based on
finding and combining indistinguishable states.
Definition
Two states p and q of a DFA are called indistinguishable if
δ∗(p, w) ∈ F implies δ∗(q, w) ∈ F, and
δ∗(p, w) ∉ F implies δ∗(q, w) ∉ F, for all w ∈ Σ∗.
If, on the other hand, there exists some string w ∈ Σ∗ such
that δ∗(p, w) ∈ F and δ∗(q, w) ∉ F, or vice versa, then the
states p and q are said to be distinguishable by the string w.
1. Finding pairs of distinguishable states
Procedure: MARK
1. Partition the state set into final and non-final states to get
two equivalence classes. Every pair (p, q) with p ∈ F and q ∉ F
is marked as distinguishable.
2. Repeat the following steps until no new pair is marked:
   - For all pairs (p, q) and all a ∈ Σ, compute δ(p, a) = p_a and
     δ(q, a) = q_a.
   - If the pair (p_a, q_a) is marked as distinguishable, mark (p, q)
     as distinguishable.
Once the indistinguishable (equivalence) classes are found, the
construction of the minimal DFA is straightforward.
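The pair-marking idea behind MARK can be sketched in Python as follows. This is only an illustration under stated assumptions: the DFA is complete, every state is reachable, and the transition function is stored as a dictionary keyed by (state, symbol). The function names `mark` and `equivalence_classes` and the three-state automaton A, B, C are hypothetical and do not come from the slides.

```python
from itertools import combinations

def mark(states, alphabet, delta, finals):
    """Return the set of distinguishable pairs, each stored as frozenset({p, q})."""
    # Step 1: a final and a non-final state are distinguished by the empty string.
    marked = {frozenset((p, q)) for p, q in combinations(states, 2)
              if (p in finals) != (q in finals)}
    # Step 2: repeat until a full pass over all pairs marks nothing new.
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                successors = frozenset((delta[p, a], delta[q, a]))
                # A singleton (both states move to the same state) is never marked.
                if successors in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

def equivalence_classes(states, marked):
    """Group states so that no pair inside a group is marked."""
    classes = []
    for s in states:
        for c in classes:
            # Comparing with one representative suffices once marking is complete.
            if frozenset((s, c[0])) not in marked:
                c.append(s)
                break
        else:
            classes.append([s])
    return classes

# Hypothetical DFA: A and B behave identically, C is the only final state.
states = ["A", "B", "C"]
delta = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "C"}
marked = mark(states, "01", delta, {"C"})
print(equivalence_classes(states, marked))  # [['A', 'B'], ['C']]
```

Storing each pair as a frozenset makes (p, q) and (q, p) the same entry, so each unordered pair is handled once.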
2. Construction of the minimal DFA
Procedure: REDUCE
Given a DFA M = (Q, Σ, δ, q0, F), we construct a reduced DFA
M̂ = (Q̂, Σ, δ̂, q̂0, F̂) as follows.
1. Procedure MARK generates the equivalence classes, say
{qi, qj, . . . , qk}.
2. For each set {qi, qj, . . . , qk} of such indistinguishable states,
create a state labeled ij . . . k for M̂.
3. For each transition rule of M of the form δ(qr, a) = qp, find
the sets to which qr and qp belong. If qr ∈ {qi, qj, . . . , qk} and
qp ∈ {ql, qm, . . . , qn}, add to δ̂ a rule δ̂(ij . . . k, a) = lm . . . n.
4. The initial state q̂0 is that state of M̂ whose label includes
the 0.
5. F̂ is the set of all the states whose label contains i such that
qi ∈ F.
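Steps 2 to 5 of REDUCE can be sketched in Python, assuming the equivalence classes are already known and the transition function is a dictionary keyed by (state, symbol). The name `reduce_dfa` and the three-state automaton are hypothetical. Instead of labels such as 13, this sketch labels each new state with the sorted tuple of its members.

```python
def reduce_dfa(classes, delta, start, finals):
    """Build the reduced DFA from a list of equivalence classes."""
    # Step 2: one new state per class, labeled by the sorted tuple of its members.
    label_of = {s: tuple(sorted(c)) for c in classes for s in c}
    new_states = set(label_of.values())
    # Step 3: translate every rule delta(p, a) = q to the labels of p and q.
    new_delta = {(label_of[p], a): label_of[q] for (p, a), q in delta.items()}
    # Step 4: the new initial state is the class containing the old one.
    new_start = label_of[start]
    # Step 5: a new state is final if its class contains a final state.
    new_finals = {lab for lab in new_states if any(s in finals for s in lab)}
    return new_states, new_delta, new_start, new_finals

# Hypothetical DFA in which A and B are indistinguishable.
delta = {("A", "0"): "B", ("A", "1"): "C",
         ("B", "0"): "A", ("B", "1"): "C",
         ("C", "0"): "C", ("C", "1"): "C"}
states_hat, delta_hat, start_hat, finals_hat = reduce_dfa(
    [["A", "B"], ["C"]], delta, "A", {"C"})
print(start_hat)                   # ('A', 'B')
print(delta_hat[("A", "B"), "1"])  # ('C',)
```

Several rules of M may map to the same rule of M̂. When the classes really are indistinguishability classes, these rules agree, so overwriting a dictionary entry is harmless.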
Minimal DFA
Theorem
Given any DFA M , application of the procedure REDUCE
yields another DFA M̂ such that L(M ) = L(M̂ ).
Furthermore, M̂ is minimal in the sense that there is no other
DFA with a smaller number of states that also accepts L(M ).
Example: 1. Finding pairs of distinguishable states
[Figure: transition graph of the example DFA over Σ = {0, 1} with states q0 (start), q1, q2, q3, and q4.]
We partition the state set into final and non-final states to get two
equivalence classes: {q0, q1, q3} and {q2, q4}. We compute
δ(q0, 0) = q1 and δ(q1, 0) = q2 and recognize that q0 and q1 are
distinguishable, since q1 is non-final and q2 is final, so we put q0
and q1 into different sets {q0} and {q1, q3}. We compute δ(q2, 0) = q1
and δ(q4, 0) = q4 and recognize that q2 and q4 are distinguishable,
since q1 is non-final and q4 is final, so we put them into different
sets {q2} and {q4}. The equivalence classes are now {q0}, {q1, q3},
{q2} and {q4}. The rest of the computations show that no further
splitting is needed.
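The splitting described in this example can be reproduced with a short Python sketch. It uses partition refinement (splitting classes whose members move to different classes) rather than the pair-marking formulation of MARK; both lead to the same classes. The transition table is partly an assumption: only δ(q0, 0) = q1, δ(q1, 0) = q2, δ(q2, 0) = q1 and δ(q4, 0) = q4 are stated in the text, and the remaining entries were chosen to be consistent with the equivalence classes reported above. The name `refine` is hypothetical.

```python
def refine(partition, alphabet, delta):
    """Split classes until all states of a class move, symbol by symbol, to the same class."""
    while True:
        class_of = {s: i for i, block in enumerate(partition) for s in block}
        new_partition = []
        for block in partition:
            groups = {}
            for s in block:
                # Signature: index of the class reached on each input symbol.
                signature = tuple(class_of[delta[s, a]] for a in alphabet)
                groups.setdefault(signature, []).append(s)
            new_partition.extend(groups.values())
        # Refinement only splits, so an unchanged count means nothing changed.
        if len(new_partition) == len(partition):
            return new_partition
        partition = new_partition

states = ["q0", "q1", "q2", "q3", "q4"]
finals = {"q2", "q4"}
# ASSUMPTION: entries not quoted in the text are chosen to match the stated classes.
delta = {("q0", "0"): "q1", ("q0", "1"): "q3",
         ("q1", "0"): "q2", ("q1", "1"): "q4",
         ("q2", "0"): "q1", ("q2", "1"): "q4",
         ("q3", "0"): "q2", ("q3", "1"): "q4",
         ("q4", "0"): "q4", ("q4", "1"): "q4"}
initial = [[s for s in states if s not in finals],
           [s for s in states if s in finals]]
print(refine(initial, "01", delta))
# [['q0'], ['q1', 'q3'], ['q2'], ['q4']]
```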
Example continued: 2. Construction of the minimal DFA
Equivalence classes: {q0 } {q1 , q3 } {q2 } {q4 }
1. Create for each equivalence class (e.g. {q1, q3}) a
corresponding labeled state (e.g. 13).
2. For each transition δ(p, a) = q find the equivalence classes of p
and q. If p ∈ {qi, qj, . . . , qk} and q ∈ {ql, qm, . . . , qn}, add
δ̂(ij . . . k, a) = lm . . . n.
⇒ Minimal automaton
[Figure: transition graph of the minimal DFA over Σ = {0, 1} with states 0 (start), 13, 2, and 4.]
Regular Expressions
One way of describing regular languages is via the notation
of regular expressions.
This notation involves a combination of strings of symbols
from some alphabet Σ, parentheses (), and the operators +, ·,
and ∗.
We construct regular expressions from primitive constituents
by repeatedly applying certain recursive rules.
Definition
Let Σ be a given alphabet. Then
1. ∅, λ, and a ∈ Σ are all regular expressions. These are
called primitive regular expressions.
2. If r1 and r2 are regular expressions, so are r1 + r2,
r1 · r2, r1∗, and (r1).
3. A string is a regular expression if and only if it can be
derived from the primitive regular expressions by a
finite number of applications of the rules in (2).
Example:
For Σ = {a, b, c}, the string (a + b · c)∗ · (c + ∅) is a regular
expression.
But (a + b+) is not a regular expression.
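The recursive definition can be turned into a small well-formedness check. The following Python sketch is a recursive-descent recognizer written for illustration; the function name `is_regular_expression` is hypothetical. It assumes single-character alphabet symbols, requires the explicit operator · for concatenation as in the definition (plain juxtaposition is not accepted), and accepts both `*` and `∗` for star-closure.

```python
def is_regular_expression(text, alphabet):
    """Return True if text is a well-formed regular expression over alphabet."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expr():                      # expr := term ('+' term)*
        nonlocal pos
        if not term():
            return False
        while peek() == "+":
            pos += 1
            if not term():
                return False
        return True

    def term():                      # term := factor ('·' factor)*
        nonlocal pos
        if not factor():
            return False
        while peek() == "·":
            pos += 1
            if not factor():
                return False
        return True

    def factor():                    # factor := atom followed by any number of stars
        nonlocal pos
        if not atom():
            return False
        while peek() in ("*", "∗"):
            pos += 1
        return True

    def atom():                      # atom := '(' expr ')' | primitive expression
        nonlocal pos
        tok = peek()
        if tok == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        if tok is not None and (tok in alphabet or tok in "∅λ"):
            pos += 1
            return True
        return False

    return expr() and pos == len(tokens)

sigma = {"a", "b", "c"}
print(is_regular_expression("(a + b · c)∗ · (c + ∅)", sigma))  # True
print(is_regular_expression("(a + b+)", sigma))                # False
```

The second expression is rejected because the final + is not followed by a regular expression, so rule (2) cannot have produced it.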
Definition
The concatenation of two languages L1 and L2 is the set of
all strings obtained by concatenating any element of L1 with
any element of L2 , L1 L2 = {xy | x ∈ L1 , y ∈ L2 }.
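For finite languages, the definition translates directly into a set comprehension. A minimal Python sketch, with languages represented as sets of strings and λ as the empty string `""` (the name `concat` is chosen here for illustration):

```python
def concat(L1, L2):
    """Concatenation L1 L2 = {xy | x in L1, y in L2} of two finite languages."""
    return {x + y for x in L1 for y in L2}

print(sorted(concat({"a", "ab"}, {"b", "bb"})))  # ['ab', 'abb', 'abbb']
print(concat({""}, {"a", "b"}) == {"a", "b"})    # True: {λ} is the identity
print(concat(set(), {"a", "b"}))                 # set(): ∅ annihilates
```

The first result has three elements rather than four because a·bb and ab·b give the same string abb.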
Languages associated with regular expressions
If r is a regular expression, we let L(r) denote the language
associated with r.
Definition
The language L(r) denoted by any regular expression r is
defined by the following rules.
1. ∅ is a regular expression denoting the empty set,
2. λ is a regular expression denoting {λ},
3. For every a ∈ Σ, a is a regular expression denoting {a}.
If r1 and r2 are regular expressions, then
4. L(r1 + r2) = L(r1) ∪ L(r2),
5. L(r1 · r2) = L(r1)L(r2),
6. L((r1)) = L(r1),
7. L(r1∗) = (L(r1))∗.
To see what language a given regular expression denotes, we apply
these rules repeatedly.
With a little practice we can see quickly what language a
particular regular expression denotes.
Example:
Exhibit the language L(a∗ · (a + b)) in set notation.
L(a∗ · (a + b)) = L(a∗)L(a + b)
                = (L(a))∗ (L(a) ∪ L(b))
                = {λ, a, aa, aaa, . . .}{a, b}
                = {a, aa, aaa, . . . , b, ab, aab, . . .}.
We establish a set of precedence rules for evaluation in which
star-closure precedes concatenation and concatenation
precedes union.
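The rules for L(r) can be applied mechanically. The Python sketch below evaluates them on an expression given as nested tuples and returns only the strings up to a chosen length, since L(r) is usually infinite. The tuple encoding and the name `language` are assumptions made for this illustration. Parentheses (rule 6) and the precedence rules are represented by the nesting itself; for example, a + b · c∗ would be written as ("union", a, ("concat", b, ("star", c))).

```python
def language(r, max_len):
    """Strings of length <= max_len in L(r), where r is a nested tuple."""
    kind = r[0]
    if kind == "empty":              # rule 1: the empty set
        return set()
    if kind == "lambda":             # rule 2: {λ}, with λ as the empty string
        return {""}
    if kind == "sym":                # rule 3: a single symbol
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":              # rule 4
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":             # rule 5
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":               # rule 7: add one more factor until nothing new appears
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node kind: {kind}")

a, b = ("sym", "a"), ("sym", "b")
r = ("concat", ("star", a), ("union", a, b))     # a∗ · (a + b)
print(sorted(language(r, 3), key=lambda w: (len(w), w)))
# ['a', 'b', 'aa', 'ab', 'aaa', 'aab']
```

The output agrees with the beginning of the set {a, aa, aaa, . . . , b, ab, aab, . . .} obtained by hand.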
Example:
For Σ = {a, b}, the expression
r = (a + b)∗ (a + bb)
is regular.
It denotes the language L(r) = {a, bb, aa, abb, ba, bbb, . . .}.
The first part, (a + b)∗ stands for any string of a’s and b’s. The
second part, (a + bb) represents either an a or a double b. L(r) is
the set of all strings on {a, b}, terminated by either an a or a bb.
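The shortest members of L(r) can be listed with Python's `re` module, as a quick check of the description. Note the difference in notation, which is an adaptation rather than part of the text: Python writes union as `|` instead of +, and concatenation is plain juxtaposition.

```python
import re
from itertools import product

# (a + b)∗(a + bb) in Python's regex syntax.
pattern = re.compile(r"(a|b)*(a|bb)")

# All strings over {a, b} of length 0 to 3, shortest first.
words = ["".join(t) for n in range(4) for t in product("ab", repeat=n)]
matches = [w for w in words if pattern.fullmatch(w)]
print(matches)
# ['a', 'aa', 'ba', 'bb', 'aaa', 'aba', 'abb', 'baa', 'bba', 'bbb']
```

`fullmatch` is used so that the whole string must belong to the language, not just a prefix or substring.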
Example:
The expression
r = (aa)∗ (bb)∗ b
denotes the set of all strings with an even number of a’s followed
by an odd number of b’s; that is,
L(r) = {a^(2n) b^(2m+1) | n ≥ 0, m ≥ 0}.
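The agreement between the expression and the set description can be tested by brute force on short strings. In this Python sketch the helper `in_described_set` is a hypothetical name; it checks the condition "even number of a's followed by an odd number of b's" directly, and the result is compared with the regular expression on every string over {a, b} up to length 7. Such a test is a sanity check, not a proof.

```python
import re
from itertools import product

pattern = re.compile(r"(aa)*(bb)*b")

def in_described_set(w):
    """True if w = a^(2n) b^(2m+1) for some n, m >= 0."""
    n_a = len(w) - len(w.lstrip("a"))    # length of the leading run of a's
    rest = w[n_a:]
    return n_a % 2 == 0 and rest == "b" * len(rest) and len(rest) % 2 == 1

words = ("".join(t) for n in range(8) for t in product("ab", repeat=n))
mismatches = [w for w in words
              if (pattern.fullmatch(w) is not None) != in_described_set(w)]
print(mismatches)  # []
```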
References
Linz, P. An Introduction to Formal Languages and Automata.
Jones and Bartlett Learning, 2012.