CS504-Theory of Computation - Lecture 3: Regular Languages

CS504-Theory of Computation
Lecture 3: Regular Languages
Waheed Noor
Computer Science and Information Technology,
University of Balochistan,
Quetta, Pakistan
Waheed Noor (CS&IT, UoB, Quetta)
CS504-Theory of Computation
May 2014
1 / 34
Outline
1. Regular Expressions
2. Regular Languages
3. Nonregular Languages
Regular Expressions I
Definition
A regular expression is a formal notation (a finite representation)
that describes a language over some alphabet by applying the
language operations union, concatenation, and Kleene star, as
shown below.
Regular Expressions II
Regular expressions are built recursively from smaller regular
expressions.
Each regular expression denotes a language, which is also
defined recursively from the languages denoted by the
subexpressions of that regular expression.
For example, let r be a regular expression; the language it denotes
is written L(r).
Σ denotes the alphabet, that is, the set of symbols, e.g., Σ = {0, 1}.
Regular Expressions III
Rules
1. ε is a regular expression denoting the language L(ε) = {ε}, i.e., the
   only member is the empty string.
2. For each a in Σ, a is a regular expression and L(a) = {a}, which is
   the language with one string of length one.
3. If r and s are regular expressions, then so is (rs).
4. If r and s are regular expressions, then so is (r ∪ s), also written (r | s).
5. If r and s are regular expressions, then so is r∗.
Induction
Definition
Larger regular expressions are built from smaller ones inductively,
i.e., by applying the operators to regular expressions that have
already been constructed.
For example, let r and s be regular expressions denoting the languages
L(r) and L(s), respectively. Then:
(r)|(s) or r|s is a regular expression denoting the language L(r) ∪ L(s).
(r)(s) or rs is a regular expression denoting the language L(r)L(s).
(r∗) or r∗ is a regular expression denoting the language (L(r))∗.
(r) is a regular expression denoting the language L(r).
An Example
Let Σ = {a, b} be an alphabet; then the following regular expressions
denote the languages given below.
a|b denotes {a, b}.
(a|b)(a|b) denotes the language of all strings of length two,
{aa, ab, ba, bb}.
(a|b)* denotes the language of all strings over Σ, {ε, a, b, aaab, baab, . . .}.
a|a*b denotes the language ???
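The examples above can be explored mechanically. The following is a minimal sketch, assuming Python's `re` module as a stand-in for the formal notation (its `|` and `*` behave like union and Kleene star on these small expressions). The helper name `language_up_to` is hypothetical and only introduced for illustration; it lists every string over the alphabet, up to a chosen length, that an expression matches in full.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet, max_len):
    """Return all strings over `alphabet` of length <= max_len that `pattern` matches in full."""
    compiled = re.compile(pattern)
    found = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if compiled.fullmatch(w):  # the whole string must match, not just a prefix
                found.append(w)
    return found

print(language_up_to("a|b", "ab", 3))          # ['a', 'b']
print(language_up_to("(a|b)(a|b)", "ab", 3))   # ['aa', 'ab', 'ba', 'bb']
print(len(language_up_to("(a|b)*", "ab", 3)))  # 15 strings of length 0..3
```

Calling the same helper with `"a|a*b"` is one way to experiment with the open question above before describing the language in words.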
Class Activity
Given the alphabet Σ above, write regular expressions for the following
languages:
All strings that start and end with the symbol a and in which the
symbol b, if it appears, appears an even number of times.
All strings with an even number of a's and an odd number of b's.
More Examples
Describe the languages denoted by the following regular expressions:
a(a|b)*b
c*(a|(bc*))*
Discussion
There may be different regular expressions describing the same
language.
Not all languages can be described by regular expressions.
Regular Languages
Definition
The class of languages over some alphabet Σ that can be described by
regular expressions is called the class of regular languages. Equivalently,
it is the closure of the basic languages ∅ and {a}, for each a ∈ Σ,
with respect to the operations of union, concatenation, and Kleene star.
From the previous two sections, we know that the class of languages
accepted by finite automata remains the same irrespective of
determinism or nondeterminism.
This shows that the class of languages accepted by finite automata is stable.
Furthermore, the class of languages accepted by finite automata
is the same as the class of regular languages.
Later in this lecture, we prove that the class of languages accepted
by finite automata has the same closure properties as the class of
regular languages.
Finite Automata and Regular Expressions
Let us first see how we can construct an NFA from the regular expression
(ab|aab)*.
Finite Automata and Regular Expressions I
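For ε, construct the NFA with a single ε-move from the start state to the accepting state.
For any symbol a in Σ, construct the NFA with a single move on a from the start state to the accepting state.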
Finite Automata and Regular Expressions II
For the regular expression s|t, suppose we have NFAs N(s) and N(t); then construct the NFA N(s|t) from them.
For the regular expression st, construct the NFA N(st) from N(s) and N(t).
Finite Automata and Regular Expressions III
For the regular expression s*, construct the NFA N(s*) from N(s).
Class Activity
Construct an NFA accepting the language L(r), where r = (a|b)*abb.
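The step-by-step construction on these slides can be sketched in code. The block below is one possible Thompson-style rendering, not necessarily the exact diagrams used in the lecture: every fragment has one start and one accepting state, and ε-moves are stored with the label `None`. All names (`Frag`, `symbol`, `union`, `concat`, `star`, `word`, `accepts`) are hypothetical helpers introduced for illustration.

```python
from itertools import count

_ids = count()  # source of fresh state names

class Frag:
    """NFA fragment: one start state, one accepting state, a list of moves (p, label, r)."""
    def __init__(self, start, accept, moves):
        self.start, self.accept, self.moves = start, accept, moves

def symbol(a):
    """NFA for the single symbol a; pass None to get the NFA for the empty string."""
    s, f = next(_ids), next(_ids)
    return Frag(s, f, [(s, a, f)])

def union(n1, n2):
    """NFA for s|t: new start and accepting states joined to both fragments by ε-moves."""
    s, f = next(_ids), next(_ids)
    extra = [(s, None, n1.start), (s, None, n2.start),
             (n1.accept, None, f), (n2.accept, None, f)]
    return Frag(s, f, n1.moves + n2.moves + extra)

def concat(n1, n2):
    """NFA for st: an ε-move links the accepting state of n1 to the start of n2."""
    return Frag(n1.start, n2.accept,
                n1.moves + n2.moves + [(n1.accept, None, n2.start)])

def star(n):
    """NFA for s*: ε-moves allow skipping the fragment or repeating it."""
    s, f = next(_ids), next(_ids)
    extra = [(s, None, n.start), (s, None, f),
             (n.accept, None, n.start), (n.accept, None, f)]
    return Frag(s, f, n.moves + extra)

def closure(states, moves):
    """All states reachable from `states` using ε-moves only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for (p, a, r) in moves:
            if p == q and a is None and r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def accepts(n, w):
    """Simulate the NFA on w by tracking the set of currently possible states."""
    current = closure({n.start}, n.moves)
    for ch in w:
        step = {r for (p, a, r) in n.moves if p in current and a == ch}
        current = closure(step, n.moves)
    return n.accept in current

def word(s):
    """Concatenation of the single-symbol NFAs for the characters of a nonempty string s."""
    frag = symbol(s[0])
    for ch in s[1:]:
        frag = concat(frag, symbol(ch))
    return frag

nfa = star(union(word("ab"), word("aab")))  # the expression (ab|aab)*
print(accepts(nfa, "abaab"), accepts(nfa, "aba"))  # True False
```

Each call creates fresh states, so a fragment should be used only once when composing larger expressions. The class activity expression (a|b)*abb can be assembled from the same four helpers.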
The Closure Property
Theorem
The class of languages accepted by finite automata is closed under
a) Union
b) Concatenation
c) Kleene star
d) Complementation
e) Intersection
This can be regarded as a collection of theorems, one for each operation.
Proof.
For each operation, we will show the construction of an automaton M
that accepts the respective language given two automata M1 and M2 ,
except for Kleene star and complementation where we consider M1
only.
The Closure Property
Union
Proof.
Let M1 = (Q1, Σ, ∆1, s1, F1) and M2 = (Q2, Σ, ∆2, s2, F2) be NFAs; we
shall construct another NFA M = (Q, Σ, ∆, s, F) such that
L(M) = L(M1) ∪ L(M2).
Figure: Intuitive illustration of the construction of an NFA M accepting L(M1) ∪ L(M2)
The Closure Property
Union
Proof. (Cont.)
Without loss of generality, assume that Q1 ∩ Q2 = ∅. Then
s is a new state such that s ∉ Q1 and s ∉ Q2,
Q = Q1 ∪ Q2 ∪ {s},
F = F1 ∪ F2,
∆ = ∆1 ∪ ∆2 ∪ {(s, ε, s1), (s, ε, s2)}.
Now, if w ∈ Σ∗, then (s, w) ⊢∗_M (q, ε) for some q ∈ F if and only if
either (s1, w) ⊢∗_M1 (q, ε) for some q ∈ F1 or (s2, w) ⊢∗_M2 (q, ε) for some
q ∈ F2.
Hence, M accepts w if and only if M1 accepts w or M2 accepts w, and
L(M) = L(M1) ∪ L(M2).
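The construction in this proof translates almost line by line into code. The sketch below assumes an NFA is stored as a tuple (Q, Σ, ∆, s, F), with ∆ a set of triples and the empty-string label written as the Python empty string `""`. The function names `union_nfa` and `accepts`, and the two small machines, are illustrative assumptions rather than part of the lecture.

```python
def union_nfa(m1, m2, new_start="s"):
    """Build M with L(M) = L(M1) ∪ L(M2), following the construction in the proof."""
    q1, sigma, d1, s1, f1 = m1
    q2, _, d2, s2, f2 = m2
    assert not (q1 & q2), "state sets must be disjoint"
    assert new_start not in (q1 | q2), "the new start state must be fresh"
    q = q1 | q2 | {new_start}
    delta = d1 | d2 | {(new_start, "", s1), (new_start, "", s2)}
    return (q, sigma, delta, new_start, f1 | f2)

def accepts(m, w):
    """Simulate an NFA with empty-string moves on the input w."""
    _, _, delta, s, f = m

    def closure(states):
        # add every state reachable through moves labelled with the empty string
        todo, seen = list(states), set(states)
        while todo:
            p = todo.pop()
            for (a, x, b) in delta:
                if a == p and x == "" and b not in seen:
                    seen.add(b)
                    todo.append(b)
        return seen

    current = closure({s})
    for ch in w:
        current = closure({b for (a, x, b) in delta if a in current and x == ch})
    return bool(current & f)

# M1 accepts a*, M2 accepts exactly the string "b" (both are toy examples)
m1 = ({"p0"}, {"a", "b"}, {("p0", "a", "p0")}, "p0", {"p0"})
m2 = ({"r0", "r1"}, {"a", "b"}, {("r0", "b", "r1")}, "r0", {"r1"})
m = union_nfa(m1, m2)
print([w for w in ["", "aaa", "b", "ab", "bb"] if accepts(m, w)])  # ['', 'aaa', 'b']
```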
The Closure Property
Concatenation
Proof.
Let M1 = (Q1, Σ, ∆1, s1, F1) and M2 = (Q2, Σ, ∆2, s2, F2) be NFAs; we
shall construct an NFA M = (Q, Σ, ∆, s, F) such that L(M) = L(M1)L(M2).
Figure: Intuitive illustration of the construction of an NFA M accepting L(M1)L(M2)
The remainder of the formal proof is left as a class activity.
The Closure Property
Kleene Star
Proof.
Let M1 = (Q1, Σ, ∆1, s1, F1) be an NFA; we shall construct another NFA
M = (Q, Σ, ∆, s, F) such that L(M) = L(M1)∗.
Figure: Intuitive illustration of the construction of an NFA M accepting L(M1)∗
The remainder of the formal proof is your informal quiz.
The Closure Property
Complementation
Proof.
Let M1 = (Q1, Σ, ∆1, s1, F1) be a DFA; we construct a DFA
M = (Q, Σ, ∆, s, F) such that L(M) is the complement of L(M1). That is, M
accepts the complementary language Σ∗ − L(M1).
Since the complementary language consists of exactly the strings that are
not in L(M1), and a DFA reaches exactly one state on every input string,
it is straightforward to construct M by interchanging the final and
non-final states:
Q = Q1,
∆ = ∆1,
s = s1,
F = Q1 − F1.
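As a small illustration of this construction, the sketch below stores a DFA as a tuple (states, alphabet, delta, start, finals) with a total transition table `delta`. The function names `run` and `complement` and the example machine are assumptions made for this example only.

```python
def run(dfa, w):
    """Return True if the DFA accepts w; delta must be defined for every (state, symbol)."""
    states, alphabet, delta, start, finals = dfa
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in finals

def complement(dfa):
    """Same states, transitions and start state; final and non-final states are swapped."""
    states, alphabet, delta, start, finals = dfa
    return (states, alphabet, delta, start, states - finals)

# Example DFA over {a, b}: accepts the strings that end in b
ends_in_b = (
    {0, 1},
    {"a", "b"},
    {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1},
    0,
    {1},
)
not_ends_in_b = complement(ends_in_b)
for w in ["", "a", "b", "ab", "ba"]:
    # a string is accepted by exactly one of the two machines
    assert run(ends_in_b, w) != run(not_ends_in_b, w)
```

The swap relies on determinism: applied to an NFA, interchanging final and non-final states does not in general yield the complement.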
The Closure Property
Intersection
We prove this property directly.
Proof.
Let L1 and L2 be two languages over some alphabet Σ accepted by
two finite automata M1 and M2, respectively. Then
we can write L1 ∩ L2 = Σ∗ − ((Σ∗ − L1) ∪ (Σ∗ − L2)).
Since the class is closed under union and complementation,
we conclude that L1 ∩ L2 is also accepted by some finite automaton
M.
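The set identity used in this proof can be sanity-checked on a finite stand-in for Σ∗. The sketch below uses all strings over {a, b} up to length 4 as the universe; the two languages (strings starting with a, strings ending with b) are arbitrary choices made for illustration, and a finite check of this kind is not a proof.

```python
from itertools import product

def all_strings(alphabet, max_len):
    """All strings over `alphabet` of length 0..max_len, a finite stand-in for Σ*."""
    return {"".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n)}

universe = all_strings("ab", 4)
l1 = {w for w in universe if w.startswith("a")}
l2 = {w for w in universe if w.endswith("b")}

lhs = l1 & l2                                         # L1 ∩ L2
rhs = universe - ((universe - l1) | (universe - l2))  # Σ* − ((Σ* − L1) ∪ (Σ* − L2))
assert lhs == rhs
```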
The main result of the previous section (proof of previous theorem) is
the following theorem
Theorem
A language is regular if and only if it is accepted by a finite automaton.
Nonregular Languages
Until now, we have learned that regular languages are closed
under a variety of operations.
We also now know that regular languages can be specified by
regular expressions or by finite automata (DFA or NFA).
Using these powerful techniques, singly or in combination, we can
show languages to be regular in different ways.
However, we have no technique yet for showing that a particular
language is not regular.
We only know, on fundamental grounds, that nonregular languages
do exist,
since the set of all languages over an alphabet is uncountable, while
the set of regular expressions (or finite automata) is countable.
Properties of Regular Languages I
There are two key (intuitive) properties of regular languages that are
worth considering before we study a tool for showing formally that
a particular language is not regular.
1. The amount of memory needed to determine whether an input string is in
   the language is finite: it is independent of the length of the input
   string, but depends on the language.
Example
{a^n b^n : n ≥ 0} seems not to be regular, since it is unlikely that a
finite automaton can correctly remember how many a's it has seen
upon reaching the border between the a's and the b's.
Properties of Regular Languages II
2. Regular languages with an infinite number of strings must have
   infinite subsets with a fairly simple repetitive structure (periodicity),
   encoded by the Kleene star in a regular expression or by a cycle in a
   finite automaton.
Example
{a^n : n is prime} also seems not to be regular, since the prime numbers
do not have a simple repetitive structure.
The Pumping Lemma
The above two intuitions are correct but not sufficient, so we need to
be more rigorous and establish a property that every regular language
must possess; a language that lacks this property is not regular. We
therefore prove a theorem shortly.
Pumping Lemma for regular languages: Informally
Any sufficiently long string in a regular language can be pumped. That is,
there is a section of the string that can be repeated any number of
times such that the resulting strings are also in the language.
The Pumping Lemma
Theorem
Let L be a regular language. There is an integer n ≥ 1, called the pumping
length, such that any string w ∈ L with |w| ≥ n can be rewritten as
w = xyz such that y ≠ ε, |xy| ≤ n, and xy^i z ∈ L for each i ≥ 0.
Proof.
Since L is regular, there exists a DFA M that accepts L. Let n be the
number of states of M.
Let w be a string in L such that |w| ≥ n, and consider the first n steps
of the computation of M on w:
(q_0, w_1 w_2 ⋯ w_n) ⊢_M (q_1, w_2 ⋯ w_n) ⊢_M ⋯ ⊢_M (q_n, ε),
where q_0 is the initial state of M and w_1, w_2, ..., w_n are the first n symbols
of w.
Now observe that M has only n states, while the sequence
q_0, q_1, ..., q_n contains n + 1 states, so
The Pumping Lemma
Cont.
by the pigeonhole principle there exist i, j with 0 ≤ i < j ≤ n such that
q_i = q_j, which means that some state appears twice in the
sequence.
Now we can define x = w_1 ⋯ w_i, y = w_{i+1} ⋯ w_j, and
z = w_{j+1} ⋯ w_m, where m = |w|. The string y drives M from state q_i back
to state q_i, and y ≠ ε since i < j.
Because the loop on y can be skipped or repeated any number of times
without changing the state that M reaches, xy^i z ∈ L for each i ≥ 0.
Finally, |xy| = j, and j ≤ n by the choice of i and j.
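The proof is constructive, and the decomposition it describes can be computed. The following sketch assumes a DFA given as a tuple (states, delta, start, finals) with a total transition table; the names `pump_split` and `accepts` and the example machine (an even number of a's) are hypothetical and serve only to illustrate the argument.

```python
def accepts(dfa, w):
    """Run the DFA on w and report whether it stops in a final state."""
    states, delta, start, finals = dfa
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in finals

def pump_split(dfa, w):
    """Split w = x y z at the first repeated state among q_0, ..., q_n, as in the proof."""
    states, delta, start, finals = dfa
    n = len(states)
    assert len(w) >= n, "the string must be at least as long as the pumping length"
    first_seen = {start: 0}      # state -> number of symbols read when first visited
    q = start
    for k in range(1, n + 1):    # k = number of symbols read so far
        q = delta[(q, w[k - 1])]
        if q in first_seen:      # q_i = q_j with i < j <= n
            i, j = first_seen[q], k
            return w[:i], w[i:j], w[j:]
        first_seen[q] = k
    raise AssertionError("unreachable by the pigeonhole principle")

# Example DFA over {a, b}: accepts strings with an even number of a's
even_a = (
    {0, 1},
    {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    0,
    {0},
)
x, y, z = pump_split(even_a, "abab")
print((x, y, z))  # ('a', 'b', 'ab')
assert all(accepts(even_a, x + y * i + z) for i in range(6))
```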
Pumping Lemma: Examples
Example (1)
Show that the language L = {a^i b^i : i ≥ 0} is not regular.
Solution
Suppose L is regular, and let n ≥ 1 be the pumping length (recall that
this can be taken to be the number of states of a DFA accepting L).
Now let w = a^n b^n ∈ L, where |w| = 2n ≥ n. By the pumping
lemma, w can be rewritten as w = xyz such that y ≠ ε and |xy| ≤ n.
Observe that xy consists entirely of a's, since |xy| ≤ n; hence
y = a^j for some 0 < j ≤ n, so y contains at least one a.
But then none of the strings xy^0 z, xy^2 z, ⋯ belongs to L; for example,
xz = a^(n−j) b^n ∉ L, since n − j < n. A contradiction.
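The argument can be replayed by brute force for small candidate pumping lengths. The sketch below tries every split w = xyz of a^n b^n with y nonempty and |xy| ≤ n and checks whether the pumped strings stay in L. The names `in_L` and `surviving_splits` are hypothetical, and checking a few values of n only illustrates the proof, it does not replace it.

```python
def in_L(w):
    """Membership test for L = {a^i b^i : i >= 0}."""
    k = len(w) // 2
    return len(w) % 2 == 0 and w == "a" * k + "b" * k

def surviving_splits(w, n, member, max_i=3):
    """Splits w = xyz (y nonempty, |xy| <= n) whose pumped strings x y^i z all stay
    in the language for i = 0..max_i."""
    good = []
    for end_x in range(0, n + 1):
        for end_y in range(end_x + 1, n + 1):
            x, y, z = w[:end_x], w[end_x:end_y], w[end_y:]
            if all(member(x + y * i + z) for i in range(max_i + 1)):
                good.append((x, y, z))
    return good

for n in range(1, 7):
    w = "a" * n + "b" * n
    # no admissible split survives pumping, matching the contradiction in the solution
    assert surviving_splits(w, n, in_L) == []
```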
Pumping Lemma: Examples
Example (2)
Show that the language L = {a^i : i is prime} is not regular.
Solution
Suppose L is regular, and let n ≥ 1 be the pumping length.
Now let i ≥ n be a prime number and consider the string w = a^i. By the
pumping lemma, w can be rewritten as w = xyz such that y ≠ ε and |xy| ≤ n.
Then x = a^p, y = a^q, and z = a^r, where p, r ≥ 0 and q > 0. By the
pumping lemma, xy^j z ∈ L for each j ≥ 0; that is, p + jq + r is prime for
each j ≥ 0. But this is impossible: let j = p + 2q + r + 2; then
p + jq + r = (q + 1)(p + 2q + r), which is the product of two
integers, each greater than 1. Hence a contradiction.
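The key arithmetic step, p + jq + r = (q + 1)(p + 2q + r) for j = p + 2q + r + 2, can be checked numerically. The sketch below verifies the factorisation and the resulting compositeness for small values of p, q, r; the function names `is_prime` and `check_factorisation` and the bound 8 are arbitrary choices made for this illustration.

```python
def is_prime(m):
    """Trial-division primality test, adequate for the small numbers used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

def check_factorisation(limit):
    """Check the identity from the solution for all 0 <= p, r < limit and 1 <= q < limit."""
    checked = 0
    for p in range(limit):
        for q in range(1, limit):
            for r in range(limit):
                j = p + 2 * q + r + 2
                length = p + j * q + r            # length of x y^j z
                assert length == (q + 1) * (p + 2 * q + r)
                assert q + 1 > 1 and p + 2 * q + r > 1
                assert not is_prime(length)       # so x y^j z is not in L
                checked += 1
    return checked

print(check_factorisation(8))  # 448 combinations checked
```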
Pumping Lemma: Class Activity
Example (3)
Show that the language L = {a^j b^i : j > i ≥ 0} is not regular.
Homework: Show that the following languages are not regular.
L1 = {ww : w ∈ {a, b}∗}.
L2 = {w ∈ {a, b}∗ : the number of occurrences of ab in w is equal to
the number of occurrences of ba in w}.
State Minimization of DFA
There may be many DFAs accepting the same language.
There is always a unique (up to renaming of states) minimum-state DFA
for any regular language.
We can construct this minimum-state DFA from any DFA for
the same language by grouping equivalent states.
Definition (Equivalent States)
Two states are equivalent if they are of the same type (final/non-final)
and, on every symbol, they move to equivalent states.
The state minimization algorithm starts by splitting the states
into two sets: final states and non-final states.
We split a set into smaller sets if its states are distinguishable
on some symbol, i.e., if that symbol leads them into different sets.
We repeat this for each symbol on the new sets of states, and
repeat the whole procedure until no set contains states that are
distinguishable on any symbol.
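The splitting procedure described above can be sketched as a partition refinement loop. The block assumes a complete DFA in which every state is reachable; the function name `equivalent_groups` and the five-state example machine (one DFA for (a|b)*abb, chosen for illustration) are assumptions, not taken from the lecture.

```python
def equivalent_groups(states, alphabet, delta, finals):
    """Group the states of a complete DFA into sets of equivalent states."""
    # initial split: final states versus non-final states
    partition = [g for g in (set(finals), set(states) - set(finals)) if g]
    while True:
        index = {q: i for i, g in enumerate(partition) for q in g}
        refined = []
        for group in partition:
            buckets = {}
            for q in group:
                # which group each symbol leads to; states that differ are distinguishable
                signature = tuple(index[delta[(q, a)]] for a in alphabet)
                buckets.setdefault(signature, set()).add(q)
            refined.extend(buckets.values())
        if len(refined) == len(partition):  # no group was split: the partition is stable
            return refined
        partition = refined

# Example DFA for (a|b)*abb with start state A and final state E
delta = {
    ("A", "a"): "B", ("A", "b"): "C",
    ("B", "a"): "B", ("B", "b"): "D",
    ("C", "a"): "B", ("C", "b"): "C",
    ("D", "a"): "B", ("D", "b"): "E",
    ("E", "a"): "B", ("E", "b"): "C",
}
groups = equivalent_groups("ABCDE", "ab", delta, {"E"})
print(sorted(sorted(g) for g in groups))  # [['A', 'C'], ['B'], ['D'], ['E']]
```

Each resulting group becomes one state of the minimum-state DFA; here A and C merge, which reduces five states to four.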
Class Activity: Minimum State DFA