Languages and Finite Automata

Regular Expressions
1
Regular Expressions
Regular expressions describe regular languages.
Example: (a + b·c)* describes the language
{a, bc}* = {λ, a, bc, aa, abc, bca, ...}
2
Recursive Definition
Primitive regular expressions: ∅, λ, and a (for each alphabet symbol a)
Given regular expressions r1 and r2, the expressions
r1 + r2
r1 · r2
r1*
(r1)
are regular expressions.
3
Examples
A regular expression: (a + b·c)*·(c + ∅)
Not a regular expression: (a + b +)
4
Languages of Regular Expressions
L(r): language of regular expression r
Example:
L((a + b·c)*) = {λ, a, bc, aa, abc, bca, ...}
5
Definition
For primitive regular expressions:
L(∅) = ∅
L(λ) = {λ}
L(a) = {a}
6
Definition (continued)
For regular expressions r1 and r2:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)
7
Example
Regular expression: (a + b)·a*
L((a + b)·a*) = L((a + b)) L(a*)
= L(a + b) L(a*)
= (L(a) ∪ L(b)) (L(a))*
= ({a} ∪ {b}) ({a})*
= {a, b} {λ, a, aa, aaa, ...}
= {a, aa, aaa, ..., b, ba, baa, ...}
8
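The example above can be checked by brute force. This is a minimal sketch, assuming Python's `re` module as a stand-in for the slide notation (+ is written |); the helper name `language_up_to` is made up for illustration.

```python
import re
from itertools import product

def language_up_to(pattern, alphabet, max_len):
    """Return all strings over `alphabet` of length <= max_len that fully match `pattern`."""
    matched = []
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if re.fullmatch(pattern, w):
                matched.append(w)
    return matched

# (a + b)·a* in the slide notation becomes (a|b)a* in Python's `re` syntax.
print(language_up_to(r"(a|b)a*", "ab", 3))
# -> ['a', 'b', 'aa', 'ba', 'aaa', 'baa']
```

Only a finite prefix of the language is listed, so this is a sanity check rather than a proof.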
Example
Regular expression
r = (a + b)*(a + bb)
L(r) = {a, bb, aa, abb, ba, bbb, ...}
9
Example
Regular expression
r = (aa)*(bb)*b
L(r) = {a^(2n) b^(2m) b : n, m ≥ 0}
10
Example
Regular expression
r = (0 + 1)* 00 (0 + 1)*
L(r) = { all strings with at least two consecutive 0s }
11
Example
Regular expression
r = (1 + 01)*(0 + λ)
L(r) = { all strings without two consecutive 0s }
12
Equivalent Regular Expressions
Definition:
Regular expressions r1 and r2 are equivalent if
L(r1) = L(r2)
13
Example
L = { all strings without two consecutive 0s }
r1 = (1 + 01)*(0 + λ)
r2 = (1*011*)*(0 + λ) + 1*(0 + λ)
L(r1) = L(r2) = L
r1 and r2 are equivalent regular expressions.
14
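Equivalence of r1 and r2 can be sanity-checked on all short strings. The sketch below assumes Python's `re` syntax (| for +, an empty alternative for λ) and compares both expressions against the direct test "contains no 00"; the function names are illustrative only, and a bounded check is evidence, not a proof.

```python
import re
from itertools import product

R1 = r"(1|01)*(0|)"              # (1 + 01)*(0 + λ)
R2 = r"(1*011*)*(0|)|1*(0|)"     # (1*011*)*(0 + λ) + 1*(0 + λ)

def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def disagreements(max_len):
    """Strings on which r1, r2, and the test "no 00" do not all agree."""
    bad = []
    for w in strings("01", max_len):
        in_r1 = re.fullmatch(R1, w) is not None
        in_r2 = re.fullmatch(R2, w) is not None
        in_l = "00" not in w
        if not (in_r1 == in_r2 == in_l):
            bad.append(w)
    return bad

print(disagreements(10))   # an empty list means no disagreement up to length 10
```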
Example
Let L be the language that consists of all
strings over {a, b} that begin with aa or
end with bb.
aa(a+b)* + (a+b)*bb
15
Example
L is the set of strings over {a, b} that
contain exactly two b’s.
a*ba*ba*
16
Example
L is the set of strings over {a, b} that do not end
in aaa.
(a+b)*(b+ba+baa)+a+aa+λ
17
Example
L is the set of strings over {a, b, c} that do
not contain the substring bc.
c*(b+ac*)*
18
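The four example expressions above can be compared with their English descriptions on all short strings. This is a rough sketch assuming Python's `re` syntax (| for +, an empty alternative for λ); the table `EXAMPLES` and the helper names are invented for illustration.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

# (description, alphabet, expression in `re` syntax, direct membership test)
EXAMPLES = [
    ("begin with aa or end with bb", "ab", r"aa(a|b)*|(a|b)*bb",
     lambda w: w.startswith("aa") or w.endswith("bb")),
    ("exactly two b's", "ab", r"a*ba*ba*",
     lambda w: w.count("b") == 2),
    ("do not end in aaa", "ab", r"(a|b)*(b|ba|baa)|a|aa|",
     lambda w: not w.endswith("aaa")),
    ("no substring bc", "abc", r"c*(b|ac*)*",
     lambda w: "bc" not in w),
]

def first_counterexample(alphabet, pattern, predicate, max_len=7):
    """Return a string where the expression and the description disagree, or None."""
    for w in strings(alphabet, max_len):
        if (re.fullmatch(pattern, w) is not None) != predicate(w):
            return w
    return None

for description, alphabet, pattern, predicate in EXAMPLES:
    print(description, "->", first_counterexample(alphabet, pattern, predicate))
```

A result of None for every row means the expression and the description agree on all strings up to length 7.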
Regular Expressions
and
Regular Languages
19
Theorem
Languages generated by regular expressions = regular languages
20
We will show:
Languages generated by regular expressions ⊆ regular languages
Languages generated by regular expressions ⊇ regular languages
21
Proof - Part 1
Languages generated by regular expressions ⊆ regular languages
For any regular expression r, the language L(r) is regular.
Proof by induction on the size of r.
22
Induction Basis
Primitive regular expressions: ∅, λ, a
Each is accepted by an NFA, so each denotes a regular language:
L(M1) = ∅ = L(∅)
L(M2) = {λ} = L(λ)
L(M3) = {a} = L(a)
23
Inductive Hypothesis
Assume that, for regular expressions r1 and r2,
L(r1) and L(r2) are regular languages.
24
Inductive Step
We will prove that
L(r1 + r2)
L(r1 · r2)
L(r1*)
L((r1))
are regular languages.
25
By definition of regular expressions:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)
26
By inductive hypothesis we know:
L(r1) and L(r2) are regular languages
We also know:
Regular languages are closed under:
Union: L(r1) ∪ L(r2)
Concatenation: L(r1) L(r2)
Star: (L(r1))*
27
Therefore:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
are regular languages.
28
And trivially:
L((r1)) is a regular language
29
Proof - Part 2
Languages generated by regular expressions ⊇ regular languages
For any regular language L there is a regular expression r with L(r) = L.
Proof by construction of regular expression
30
Since L is regular, take an NFA M that accepts it:
L(M) = L
M can be taken to have a single final state.
31
From M construct the equivalent
Generalized Transition Graph
in which transition labels are regular expressions
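Example:
[Figure: an NFA M with transition labels a, c, and “a, b”, and its generalized transition graph, in which the label “a, b” becomes a + b]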
32
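Example
[Figure: transition graph on nodes 1 to 5 with edge labels a, a, b, b, a, b]
Delete node 2
[Figure: the resulting graph on nodes 1, 3, 4, 5 with edge labels b, ab, ab, a, bb]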
33
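[Figure: the graph on nodes 1, 3, 4, 5 with edge labels b, ab, ab, a, bb]
Delete node 3
[Figure: the resulting graph on nodes 1, 4, 5 with edge labels bab, ab, a, bb]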
34
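[Figure: the graph on nodes 1, 4, 5: edge 1 → 4 labeled ab, loop on 4 labeled bab + bb, edge 4 → 5 labeled a]
Delete node 4
[Figure: the resulting graph has a single edge 1 → 5 labeled ab(bab + bb)*a]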
35
Another Example:
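[Figure: an NFA with states q0, q1, q2 and transition labels a, b, b, “a, b”, b, followed by its generalized transition graph, in which the label “a, b” becomes a + b]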
36
Reducing the states:
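[Figure: the generalized transition graph on q0, q1, q2 with labels a, b, b, a + b, b]
After removing q1:
[Figure: loop on q0 labeled bb*a, edge q0 → q2 labeled bb*(a + b), loop on q2 labeled b]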
37
Resulting Regular Expression:
[Figure: loop on q0 labeled bb*a, edge q0 → q2 labeled bb*(a + b), loop on q2 labeled b]
r = (bb*a)* bb*(a + b) b*
L(r) = L(M) = L
38
In General
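Removing states:
[Figure: state q with a self-loop labeled e; edges qi → q labeled a, q → qj labeled b, qj → q labeled c, q → qi labeled d]
After removing q:
loop on qi: ae*d
loop on qj: ce*b
edge qi → qj: ae*b
edge qj → qi: ce*d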
39
The final transition graph:
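[Figure: two states q0 and qf; loop on q0 labeled r1, edge q0 → qf labeled r2, edge qf → q0 labeled r3, loop on qf labeled r4]
The resulting regular expression:
r = r1* r2 (r4 + r3 r1* r2)*
L(r) = L(M) = L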
40
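The state-removal rule and the final formula can be sketched in code. This is a hedged illustration, not the slides' own implementation: the graph is a dictionary from (source, target) to a label in Python `re` syntax, a missing key stands for ∅, and the three-state example assumes an edge layout (q0 → q1 on b, loop on q1 on b, q1 → q0 on a, q1 → q2 on a + b, loop on q2 on b) reconstructed from the labels in the slides.

```python
import re
from itertools import product

def eliminate(edges, q):
    """Remove state q from a generalized transition graph."""
    loop = edges.get((q, q))
    star = f"({loop})*" if loop is not None else ""
    incoming = {s: r for (s, t), r in edges.items() if t == q and s != q}
    outgoing = {t: r for (s, t), r in edges.items() if s == q and t != q}
    result = {k: r for k, r in edges.items() if q not in k}
    for s, r_in in incoming.items():
        for t, r_out in outgoing.items():
            path = f"({r_in}){star}({r_out})"               # the rule a e* b
            if (s, t) in result:
                result[(s, t)] = f"{result[(s, t)]}|{path}"  # unite parallel edges
            else:
                result[(s, t)] = path
    return result

def final_expression(edges, q0, qf):
    """Apply r = r1* r2 (r4 + r3 r1* r2)* to a two-state graph."""
    r1, r2 = edges.get((q0, q0)), edges[(q0, qf)]
    r3, r4 = edges.get((qf, q0)), edges.get((qf, qf))
    forward = (f"({r1})*" if r1 is not None else "") + f"({r2})"
    back = []
    if r4 is not None:
        back.append(f"({r4})")
    if r3 is not None:
        back.append(f"({r3}){forward}")
    return forward + (f"({'|'.join(back)})*" if back else "")

# Assumed layout of the q0, q1, q2 example (see the lead-in).
graph = {("q0", "q1"): "b", ("q1", "q1"): "b", ("q1", "q0"): "a",
         ("q1", "q2"): "a|b", ("q2", "q2"): "b"}
r = final_expression(eliminate(graph, "q1"), "q0", "q2")
print(r)
```

The printed expression carries extra parentheses, but it should describe the same language as (bb*a)* bb*(a + b) b*.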
Applications of RE
Two common applications of RE:
Lexical analysis in compilers
Finding patterns in text
41
Lexical Analyzer
• Recognize “tokens” in a program source
code.
• The tokens can be variable names, reserved
words, operators, numbers, … etc.
• Each kind of token can be specified as an
RE, e.g., a variable name is of the form [A-Za-z][A-Za-z0-9]*. We can then construct
an NFA to recognize it automatically.
42
• By putting all these NFAs together, we
obtain one that can recognize different
kinds of tokens in the input string.
• We can convert this NFA to a DFA, and
implement this DFA as a deterministic
program - the lexical analyzer.
43
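The lexical-analyzer idea can be illustrated with a small tokenizer. This is only a sketch: Python's `re` engine stands in for the NFA-to-DFA pipeline described above, and the token names (`NUMBER`, `IDENT`, `OP`, `SKIP`) and the operator set are assumptions chosen for the example. Reserved words are not handled.

```python
import re

# One regular expression per kind of token (names are illustrative).
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT",  r"[A-Za-z][A-Za-z0-9]*"),   # variable names, as in the slide
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),                    # whitespace, discarded
]

# "Putting all these together": one alternation with a named group per token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(source):
    """Split `source` into (kind, text) pairs, scanning left to right."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("x1 = y + 42"))
# -> [('IDENT', 'x1'), ('OP', '='), ('IDENT', 'y'), ('OP', '+'), ('NUMBER', '42')]
```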
Text Search
• “grep” in Unix stands for “Global (search
for) Regular Expression and Print”.
• Unix has its own notations for regular
expressions:
– Dot “.” stands for “any character”.
– [a1a2…ak] stands for {a1, a2, …, ak}, e.g., [bcd12] stands for the set {b, c, d, 1, 2}.
– [x-y] stands for all characters from x to y in the ASCII sequence.
44
– | means “or”, i.e., + in our normal notation.
– * means “Kleene star”, as in our normal notation.
– ? means “zero or one”.
– + means “one or more”, e.g., R+ is RR*.
– {n} means “n copies of”, e.g., R{5} is RRRRR.
(You can find out more with “man grep” and “man regex”.)
45
• We can use these notations to search for
string patterns in text.
• For example, credit card numbers:
– [0-9]{16} | [0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}
46
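The credit card pattern can be tried out as follows. This is a minimal sketch using Python's `re` module rather than grep; the sample text is made up, and the spaces around | in the slide are dropped because a space inside a pattern would be matched literally.

```python
import re

# 16 digits in a row, or four groups of 4 digits separated by hyphens.
CARD = re.compile(r"[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")

text = "Paid with 1234-5678-9012-3456, then with 1111222233334444."
print(CARD.findall(text))
# -> ['1234-5678-9012-3456', '1111222233334444']
```

The pattern matches any 16-digit run, so it finds candidates only; it does not validate that a number is a real card number.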
Grammar
Languages can also be represented
by grammars.
47
Grammars
Grammars express languages
Example:
the English language
sentence → noun_phrase predicate
noun_phrase → article noun
predicate → verb
48
article → a
article → the
noun → cat
noun → dog
verb → runs
verb → walks
49
A derivation of “the dog walks”:
sentence ⇒ noun_phrase predicate
 ⇒ noun_phrase verb
 ⇒ article noun verb
 ⇒ the noun verb
 ⇒ the dog verb
 ⇒ the dog walks
50
A derivation of “a cat runs”:
sentence ⇒ noun_phrase predicate
 ⇒ noun_phrase verb
 ⇒ article noun verb
 ⇒ a noun verb
 ⇒ a cat verb
 ⇒ a cat runs
51
Language of the grammar:
L = { “a cat runs”,
“a cat walks”,
“the cat runs”,
“the cat walks”,
“a dog runs”,
“a dog walks”,
“the dog runs”,
“the dog walks” }
52
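Because every sentence of this grammar has the shape article noun verb, its language can be listed mechanically. A small sketch; the variable names are chosen for the example.

```python
from itertools import product

ARTICLES = ["a", "the"]      # article → a | the
NOUNS = ["cat", "dog"]       # noun → cat | dog
VERBS = ["runs", "walks"]    # verb → runs | walks

# sentence → noun_phrase predicate → article noun verb
sentences = [" ".join(words) for words in product(ARTICLES, NOUNS, VERBS)]

for s in sentences:
    print(s)
```

This prints the same eight sentences as the slide, in a different order.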
Notation
Production rules:
noun → cat
noun → dog
In each rule the left-hand side (noun) is a variable and the right-hand side (cat, dog) is a terminal.
53
Another Example
Grammar:
S → aSb
S → λ
Derivation of sentence ab:
S ⇒ aSb ⇒ ab
(apply S → aSb, then S → λ)
54
Grammar:
S → aSb
S → λ
Derivation of sentence aabb:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
(apply S → aSb twice, then S → λ)
55
Other derivations:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb
 ⇒ aaaaSbbbb ⇒ aaaabbbb
56
Language of the grammar
S → aSb
S → λ
L = {a^n b^n : n ≥ 0}
57
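The derivations of this grammar follow one pattern: apply S → aSb n times, then S → λ. A short sketch (the function name `derive` is illustrative) that produces the chain of sentential forms:

```python
def derive(n):
    """Return the derivation S => aSb => ... => a^n b^n as a list of strings."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aSb"))   # apply S → aSb
    forms.append(forms[-1].replace("S", ""))          # apply S → λ
    return forms

print(" => ".join(derive(2)))
# -> S => aSb => aaSbb => aabb
```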
More Notation
Grammar
G = (V, T, S, P)
V: Set of variables
T: Set of terminal symbols
S: Start variable
P: Set of production rules
58
Example
Grammar G:
S → aSb
S → λ
G = (V, T, S, P)
V = {S}
T = {a, b}
P = {S → aSb, S → λ}
59
More Notation
Sentential form:
A string that contains variables and terminals
Example:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
Here aSb, aaSbb, and aaaSbbb are sentential forms; aaabbb is a sentence.
60
We write:
S ⇒* aaabbb
Instead of:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
61
In general we write:
w1 ⇒* wn
If:
w1 ⇒ w2 ⇒ w3 ⇒ ... ⇒ wn
62
By default:
w ⇒* w
63
Example
Grammar
S → aSb
S → λ
Derivations
S ⇒* λ
S ⇒* ab
S ⇒* aabb
S ⇒* aaabbb
64
Example
Grammar
S → aSb
S → λ
Derivations
S ⇒* aaSbb
aaSbb ⇒* aaaaaSbbbbb
65
Another Grammar Example
Grammar G:
S → Ab
A → aAb
A → λ
Derivations:
S ⇒ Ab ⇒ b
S ⇒ Ab ⇒ aAbb ⇒ abb
S ⇒ Ab ⇒ aAbb ⇒ aaAbbb ⇒ aabbb
66
More Derivations
S ⇒ Ab ⇒ aAbb ⇒ aaAbbb ⇒ aaaAbbbb
 ⇒ aaaaAbbbbb ⇒ aaaabbbbb
S ⇒* aaaabbbbb
S ⇒* aaaaaabbbbbbb
S ⇒* a^n b^n b
67
Language of a Grammar
For a grammar G with start variable S:
L(G) = {w : S ⇒* w}
where w is a string of terminals
68
Example
For grammar G:
S → Ab
A → aAb
A → λ
L(G) = {a^n b^n b : n ≥ 0}
Since:
S ⇒* a^n b^n b
69
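The definition L(G) = {w : S ⇒* w} suggests a direct way to list short members of L(G): explore sentential forms breadth-first and keep those made only of terminals. The sketch below makes several assumptions that are not in the slides: variables are single uppercase letters, terminals are lowercase letters, λ is the empty string, and the search prunes forms with more than `max_len` terminals (which terminates for grammars like this one, but not for every grammar).

```python
from collections import deque

def generate(productions, start, max_len):
    """Return the terminal strings of length <= max_len derivable from `start`."""
    seen = {start}
    queue = deque([start])
    language = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is a sentence.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            language.add(form)
            continue
        for rhs in productions[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for ch in new_form if ch.islower())
            if terminals <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return language

G = {"S": ["Ab"], "A": ["aAb", ""]}      # S → Ab, A → aAb | λ
print(sorted(generate(G, "S", 7), key=len))
# -> ['b', 'abb', 'aabbb', 'aaabbbb']
```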
A Convenient Notation
A → aAb, A → λ can be written as A → aAb | λ
article → a, article → the can be written as article → a | the
70
Regular Grammars
A grammar G = (V, T, S, P) is right-linear if all productions
are of the form
A → xB, A → x
where A, B ∈ V, x ∈ T*
A grammar G = (V, T, S, P) is left-linear if all productions are
of the form
A → Bx, A → x
where A, B ∈ V, x ∈ T*
A regular grammar is one that is either right-linear or left-linear.
71
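The two definitions translate into a simple syntactic check. A minimal sketch, assuming productions are given as (left-hand side, right-hand side) pairs of strings, variables are single characters listed in `variables`, and λ is the empty string; the function names are made up for the example.

```python
def is_right_linear(productions, variables):
    """True if every production has the form A → xB or A → x with x in T*."""
    for lhs, rhs in productions:
        if lhs not in variables:
            return False
        # A variable may appear only as the last symbol of the right-hand side.
        body = rhs[:-1] if rhs and rhs[-1] in variables else rhs
        if any(symbol in variables for symbol in body):
            return False
    return True

def is_left_linear(productions, variables):
    """True if every production has the form A → Bx or A → x with x in T*."""
    for lhs, rhs in productions:
        if lhs not in variables:
            return False
        # A variable may appear only as the first symbol of the right-hand side.
        body = rhs[1:] if rhs and rhs[0] in variables else rhs
        if any(symbol in variables for symbol in body):
            return False
    return True

def is_regular_grammar(productions, variables):
    return is_right_linear(productions, variables) or is_left_linear(productions, variables)

print(is_regular_grammar([("S", "abS"), ("S", "a")], {"S"}))   # right-linear
print(is_regular_grammar([("S", "aSb"), ("S", "")], {"S"}))    # neither form
```

A grammar that mixes right-linear and left-linear productions is rejected, since the definition requires all productions to be of one kind.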
Assignment#1
Read Textbook 3.3
Then finish Exercise 1, 2, 5, 6 on Page 96-97
72
Standard Representations
of Regular Languages
[Figure: the standard representations of regular languages: DFAs, NFAs, regular grammars, regular expressions]
73
Equivalence
[Figure: diagram of the equivalences among regular expressions, NFAs, DFAs, and regular grammars. Edge labels: Theorem 3.1, Theorem 3.2, Theorem 2.2, “Any DFA is trivially an NFA”, Theorem 2.3 & Theorem 2.4, Theorem 3.3, Theorem 3.4]
74
Elementary Questions
about
Regular Languages
75
When we say: We are given a regular language L
We mean: Language L is in a standard representation
76
Membership Question
Question: Given regular language L and string w,
how can we check if w ∈ L?
Answer: Take the DFA that accepts L
and check if w is accepted
77
[Figure: the DFA run on input w; w ∈ L if the run ends in a final state, and w ∉ L otherwise]
78
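The membership test amounts to running the DFA on w. A minimal sketch; the dictionary layout and the sample DFA (strings over {0, 1} without two consecutive 0s, with state names q0, q1, dead) are assumptions made for the example.

```python
# Hypothetical DFA over {0, 1}: accepts strings without two consecutive 0s.
DFA = {
    "start": "q0",
    "finals": {"q0", "q1"},
    "delta": {
        ("q0", "1"): "q0", ("q0", "0"): "q1",
        ("q1", "1"): "q0", ("q1", "0"): "dead",
        ("dead", "0"): "dead", ("dead", "1"): "dead",
    },
}

def accepts(dfa, w):
    """Follow one transition per input symbol and test the last state."""
    state = dfa["start"]
    for symbol in w:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["finals"]

print(accepts(DFA, "0101"))   # True
print(accepts(DFA, "1001"))   # False
```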
Question: Given regular language L,
how can we check if L is empty (L = ∅)?
Answer: Take the DFA that accepts L.
Check if there is any path from
the initial state to a final state
79
[Figure: two DFAs, one with L ≠ ∅ and one with L = ∅]
80
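The emptiness test is a reachability search. A minimal sketch, assuming the transition function is a dictionary from (state, symbol) to state; the one-letter sample DFA and the function names are made up for illustration.

```python
from collections import deque

def reachable_states(start, delta):
    """Breadth-first search over the transition graph of the DFA."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for (source, _symbol), target in delta.items():
            if source == state and target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def is_empty(start, finals, delta):
    """L is empty exactly when no final state is reachable from the start."""
    return not (reachable_states(start, delta) & set(finals))

# Hypothetical DFA over {a}: s -a-> f -a-> dead -a-> dead
delta = {("s", "a"): "f", ("f", "a"): "dead", ("dead", "a"): "dead"}
print(is_empty("s", {"f"}, delta))             # False: the string a is accepted
print(is_empty("s", {"unreachable"}, delta))   # True
```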
Question: Given regular language L,
how can we check if L is finite?
Answer: Take the DFA that accepts L.
Check if there is a walk with a cycle
from the initial state to a final state.
L is infinite exactly when such a walk exists.
81
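[Figure: a DFA with a cycle on a walk from the initial state to a final state (L is infinite), and a DFA with no such cycle (L is finite)]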
82
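The finiteness test can be sketched as follows: keep only the states that lie on some walk from the initial state to a final state, then look for a cycle among them. The representation (a dictionary from (state, symbol) to state) and the function names are assumptions; the cycle check repeatedly deletes states with no remaining successor, which leaves something behind only if a cycle exists.

```python
def closure(seeds, edges):
    """States reachable from `seeds` along `edges`, a set of (source, target) pairs."""
    seen = set(seeds)
    frontier = list(seeds)
    while frontier:
        state = frontier.pop()
        for source, target in edges:
            if source == state and target not in seen:
                seen.add(target)
                frontier.append(target)
    return seen

def is_finite(start, finals, delta):
    edges = {(source, target) for (source, _symbol), target in delta.items()}
    forward = closure({start}, edges)                             # reachable from start
    backward = closure(set(finals), {(t, s) for s, t in edges})   # can reach a final state
    remaining = forward & backward                                # states on accepting walks
    changed = True
    while changed:
        changed = False
        for state in list(remaining):
            if not any(s == state and t in remaining for s, t in edges):
                remaining.discard(state)
                changed = True
    return not remaining       # something left over means a cycle, so L is infinite

# Hypothetical DFA over {a} accepting only the string a.
only_a = {("s", "a"): "f", ("f", "a"): "dead", ("dead", "a"): "dead"}
# Hypothetical DFA over {a} accepting strings with an even number of a's.
even_a = {("e", "a"): "o", ("o", "a"): "e"}
print(is_finite("s", {"f"}, only_a))   # True
print(is_finite("e", {"e"}, even_a))   # False
```

The cycle among the dead state in the first DFA is ignored because no final state can be reached from it.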
Question: Given regular languages L1 and L2,
how can we check if L1 = L2?
Answer: Find if
(L1 ∩ L2ᶜ) ∪ (L1ᶜ ∩ L2) = ∅
where Lᶜ denotes the complement of L
83
(L1 ∩ L2ᶜ) ∪ (L1ᶜ ∩ L2) = ∅ (Lᶜ denotes the complement of L)
⟹ L1 ∩ L2ᶜ = ∅ and L1ᶜ ∩ L2 = ∅
⟹ L1 ⊆ L2 and L2 ⊆ L1
⟹ L1 = L2
84
(L1 ∩ L2ᶜ) ∪ (L1ᶜ ∩ L2) ≠ ∅ (Lᶜ denotes the complement of L)
⟹ L1 ∩ L2ᶜ ≠ ∅ or L1ᶜ ∩ L2 ≠ ∅
⟹ L1 ⊈ L2 or L2 ⊈ L1
⟹ L1 ≠ L2
85
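The equality test can be sketched with a product construction: run both DFAs in lockstep and look for a reachable pair of states in which exactly one is final. Such a pair exists exactly when the symmetric difference of the two languages is non-empty. The dictionary layout, the function name, and the sample DFAs are assumptions made for the example.

```python
from collections import deque

def equivalent(d1, d2, alphabet):
    """True if the two DFAs accept the same language over `alphabet`."""
    start = (d1["start"], d2["start"])
    seen = {start}
    queue = deque([start])
    while queue:
        p, q = queue.popleft()
        if (p in d1["finals"]) != (q in d2["finals"]):
            return False            # some string is in exactly one of L1, L2
        for symbol in alphabet:
            pair = (d1["delta"][(p, symbol)], d2["delta"][(q, symbol)])
            if pair not in seen:
                seen.add(pair)
                queue.append(pair)
    return True

# Hypothetical DFAs over {a}: both accept strings with an even number of a's.
EVEN2 = {"start": "e", "finals": {"e"},
         "delta": {("e", "a"): "o", ("o", "a"): "e"}}
EVEN4 = {"start": 0, "finals": {0, 2},
         "delta": {(i, "a"): (i + 1) % 4 for i in range(4)}}
# Same transitions as EVEN4, but accepts only lengths divisible by 4.
MULT4 = {"start": 0, "finals": {0},
         "delta": {(i, "a"): (i + 1) % 4 for i in range(4)}}

print(equivalent(EVEN2, EVEN4, "a"))   # True
print(equivalent(EVEN2, MULT4, "a"))   # False: aa is accepted only by EVEN2
```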