Regular Expressions
Regular Expressions
Regular expressions describe regular languages
Example:
(a + b·c)*
describes the language
{a, bc}* = {λ, a, bc, aa, abc, bca, ...}
Recursive Definition
Primitive regular expressions: ∅, λ, a
Given regular expressions r1 and r2,
r1 + r2
r1 · r2
r1*
(r1)
are regular expressions
Examples
A regular expression:
(a + b·c)*·(c + ∅)
Not a regular expression:
(a + b +)
Languages of Regular Expressions
L(r): language of regular expression r
Example
L((a + b·c)*) = {λ, a, bc, aa, abc, bca, ...}
Definition
For primitive regular expressions:
L(∅) = ∅
L(λ) = {λ}
L(a) = {a}
Definition (continued)
For regular expressions r1 and r2
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)
Example
Regular expression: (a + b)·a*
L((a + b)·a*) = L((a + b)) L(a*)
= L(a + b) L(a*)
= (L(a) ∪ L(b)) (L(a))*
= ({a} ∪ {b}) ({a})*
= {a, b} {λ, a, aa, aaa, ...}
= {a, aa, aaa, ..., b, ba, baa, ...}
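The recursive definition of L(r) can be followed mechanically. The sketch below is only an illustration: the tuple encoding of regular expressions and the function name `language` are assumptions made here, and the language is truncated to strings of length at most `max_len` so that the star stays finite.

```python
# Hypothetical encoding (not from the slides):
#   ("empty",), ("lambda",), ("sym", "a"),
#   ("union", r1, r2), ("concat", r1, r2), ("star", r1)

def language(r, max_len):
    """Return the strings of L(r) with length <= max_len, case by case as in the definition."""
    kind = r[0]
    if kind == "empty":
        return set()
    if kind == "lambda":
        return {""}
    if kind == "sym":
        return {r[1]} if max_len >= 1 else set()
    if kind == "union":
        return language(r[1], max_len) | language(r[2], max_len)
    if kind == "concat":
        left, right = language(r[1], max_len), language(r[2], max_len)
        return {u + v for u in left for v in right if len(u + v) <= max_len}
    if kind == "star":
        base = language(r[1], max_len)
        result, frontier = {""}, {""}
        while frontier:  # keep appending base strings until nothing new fits
            frontier = {u + v for u in frontier for v in base
                        if len(u + v) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

# (a + b) . a*
r = ("concat", ("union", ("sym", "a"), ("sym", "b")), ("star", ("sym", "a")))
print(sorted(language(r, 3)))  # ['a', 'aa', 'aaa', 'b', 'ba', 'baa']
```

The printed strings are the members of {a, aa, aaa, ..., b, ba, baa, ...} of length at most 3.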
Example
Regular expression
r = (a + b)*(a + bb)
L(r) = {a, bb, aa, abb, ba, bbb, ...}
Example
Regular expression
r = (aa)*(bb)*b
L(r) = {a^(2n) b^(2m) b : n, m ≥ 0}
Example
Regular expression
r = (0 + 1)* 00 (0 + 1)*
L(r) = { all strings with at least
two consecutive 0s }
Example
Regular expression
r = (1 + 01)*(0 + λ)
L(r) = { all strings without
two consecutive 0s }
Equivalent Regular Expressions
Definition:
Regular expressions r1 and r2
are equivalent if
L(r1) = L(r2)
Example
L = { all strings without
two consecutive 0s }
r1 = (1 + 01)*(0 + λ)
r2 = (1*011*)*(0 + λ) + 1*(0 + λ)
L(r1) = L(r2) = L
r1 and r2 are equivalent regular expressions
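Equivalence of two regular expressions can be spot-checked by brute force. The sketch below translates r1 and r2 into Python's `re` syntax (an assumption of this example: `|` plays the role of `+`, and an empty alternative stands for λ) and compares them with the defining property on every binary string up to a chosen length. It is a bounded test, not a proof.

```python
import re
from itertools import product

R1 = re.compile(r"(1|01)*(0|)")
R2 = re.compile(r"(1*011*)*(0|)|1*(0|)")

def binary_strings(max_len):
    """Yield every string over {0, 1} of length <= max_len."""
    for n in range(max_len + 1):
        for chars in product("01", repeat=n):
            yield "".join(chars)

def agree_up_to(max_len):
    """True if r1, r2 and 'no two consecutive 0s' agree on all short strings."""
    return all(
        bool(R1.fullmatch(w)) == bool(R2.fullmatch(w)) == ("00" not in w)
        for w in binary_strings(max_len)
    )

print(agree_up_to(8))  # True
```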
Example
Let L be the language that consists of all
strings over {a, b} that begin with aa or
end with bb.
aa(a+b)* + (a+b)*bb
Example
L is the set of strings over {a, b} that
contain exactly two b’s.
a*ba*ba*
Example
L is the set of strings over {a, b} that do not end
in aaa.
(a+b)*(b+ba+baa) + λ + a + aa
Example
L is the set of strings over {a, b, c} that do
not contain the substring bc.
c*(b+ac*)*
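The four preceding examples can be sanity-checked by comparing each expression with its stated property on all short strings. This is a sketch under assumptions: the patterns are hand translations into Python `re` syntax (`|` for `+`, `()` for λ), the "do not end in aaa" expression is taken to be (a+b)*(b+ba+baa) + λ + a + aa, and the names `EXAMPLES` and `check` are made up for this illustration.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """Yield every string over the alphabet with length <= max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

# name -> (pattern in Python re syntax, alphabet, property stated on the slide)
EXAMPLES = {
    "begin with aa or end with bb": (
        r"aa(a|b)*|(a|b)*bb", "ab",
        lambda w: w.startswith("aa") or w.endswith("bb")),
    "exactly two b's": (
        r"a*ba*ba*", "ab", lambda w: w.count("b") == 2),
    "do not end in aaa": (
        r"(a|b)*(b|ba|baa)|()|a|aa", "ab",
        lambda w: not w.endswith("aaa")),
    "no substring bc": (
        r"c*(b|ac*)*", "abc", lambda w: "bc" not in w),
}

def check(max_len=6):
    """Map each example name to True if pattern and property agree on all short strings."""
    return {
        name: all(bool(re.fullmatch(pat, w)) == prop(w)
                  for w in strings(alpha, max_len))
        for name, (pat, alpha, prop) in EXAMPLES.items()
    }

print(check())
```

Agreement up to a length bound does not prove the expressions correct, but a mismatch would expose a wrong expression quickly.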
Regular Expressions and Regular Languages
Theorem
Languages generated by regular expressions = Regular languages
We will show:
Languages generated by regular expressions ⊆ Regular languages
Languages generated by regular expressions ⊇ Regular languages
Proof - Part 1
Languages generated by regular expressions ⊆ Regular languages
For any regular expression r, the language L(r) is regular
Proof by induction on the size of r
Induction Basis
Primitive regular expressions: ∅, λ, a
NFAs M1, M2, M3 accept the corresponding regular languages:
L(M1) = ∅ = L(∅)
L(M2) = {λ} = L(λ)
L(M3) = {a} = L(a)
Inductive Hypothesis
Assume, for regular expressions r1 and r2,
that L(r1) and L(r2) are regular languages
Inductive Step
We will prove that
L(r1 + r2)
L(r1 · r2)
L(r1*)
L((r1))
are regular languages
By definition of regular expressions:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
L((r1)) = L(r1)
By inductive hypothesis we know:
L(r1) and L(r2) are regular languages
We also know:
Regular languages are closed under:
Union: L(r1) ∪ L(r2)
Concatenation: L(r1) L(r2)
Star: (L(r1))*
Therefore:
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 · r2) = L(r1) L(r2)
L(r1*) = (L(r1))*
are regular languages
And trivially:
L((r1)) is a regular language
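The induction can be turned into a construction: build an NFA for each primitive expression, then combine NFAs for union, concatenation, and star. The sketch below uses the standard Thompson-style wiring with λ-transitions, which is one common way to realize the closure properties and is swapped in here for illustration. The tuple encoding and the names `build`, `closure`, and `accepts` are assumptions of this example.

```python
from itertools import count

_ids = count()  # fresh state numbers

def build(r):
    """Return (start, final, edges) of an NFA for L(r); label None is a lambda-transition.

    Hypothetical encoding: ("empty",), ("lambda",), ("sym", a),
    ("union", r1, r2), ("concat", r1, r2), ("star", r1).
    """
    kind = r[0]
    if kind not in ("empty", "lambda", "sym", "union", "concat", "star"):
        raise ValueError(f"unknown node: {kind}")
    s, f = next(_ids), next(_ids)
    if kind == "empty":
        return s, f, []                       # no path from s to f
    if kind == "lambda":
        return s, f, [(s, None, f)]
    if kind == "sym":
        return s, f, [(s, r[1], f)]
    s1, f1, e1 = build(r[1])
    if kind == "star":                        # skip, or loop back for another round
        return s, f, e1 + [(s, None, s1), (f1, None, f),
                           (s, None, f), (f1, None, s1)]
    s2, f2, e2 = build(r[2])
    if kind == "union":                       # branch into either machine
        return s, f, e1 + e2 + [(s, None, s1), (s, None, s2),
                                (f1, None, f), (f2, None, f)]
    # concat: run the first machine, then the second
    return s, f, e1 + e2 + [(s, None, s1), (f1, None, s2), (f2, None, f)]

def closure(states, edges):
    """All states reachable from `states` using lambda-transitions only."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p, label, t in edges:
            if p == q and label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(r, w):
    """Simulate the NFA built from r on input w."""
    start, final, edges = build(r)
    current = closure({start}, edges)
    for ch in w:
        moved = {t for p, label, t in edges if p in current and label == ch}
        current = closure(moved, edges)
    return final in current

r = ("concat", ("union", ("sym", "a"), ("sym", "b")), ("star", ("sym", "a")))
print(accepts(r, "baa"), accepts(r, "ab"))  # True False
```

Every machine built this way has exactly one final state, which is also the form assumed in Part 2 of the proof.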
Proof - Part 2
Languages generated by regular expressions ⊇ Regular languages
For any regular language L there is
a regular expression r with L(r) = L
Proof by construction of the regular expression
Since L is regular, take the
NFA M that accepts it:
L(M) = L
Single final state
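From M construct the equivalent
Generalized Transition Graph,
in which transition labels are regular expressions
Example:
[Figure: NFA M with transition labels a, c, and "a, b"; in the equivalent generalized transition graph the labels are a, c, and a + b]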
Example
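[Figure: transition graph with nodes 1 to 5 and edges 1 -a-> 2, 3 -a-> 2, 2 -b-> 4, 4 -b-> 2, 4 -b-> 3, 4 -a-> 5]
Delete node 2:
1 -ab-> 4, 3 -ab-> 4, 4 -bb-> 4, 4 -b-> 3, 4 -a-> 5
Delete node 3:
1 -ab-> 4, 4 -bab-> 4, 4 -bb-> 4, 4 -a-> 5
Merge the two loops on node 4:
1 -ab-> 4, 4 -(bab + bb)-> 4, 4 -a-> 5
Delete node 4:
1 -ab(bab + bb)*a-> 5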
Another Example:
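[Figure: NFA with states q0, q1, q2 and edges q0 -b-> q1, q1 -b-> q1, q1 -a-> q0, q1 -a,b-> q2, q2 -b-> q2. In the generalized transition graph the label "a, b" becomes a + b]
Reducing the states (removing q1):
q0 -> q0: bb*a
q0 -> q2: bb*(a + b)
q2 -> q2: b
Resulting regular expression:
r = (bb*a)* bb*(a + b) b*
L(r) = L(M) = L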
In General
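Removing states:
[Figure: state q has a self-loop e, incoming edges qi -a-> q and qj -c-> q, and outgoing edges q -d-> qi and q -b-> qj]
After removing q:
qi -> qi: ae*d
qi -> qj: ae*b
qj -> qi: ce*d
qj -> qj: ce*b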
The final transition graph:
[Figure: q0 has a self-loop r1 and an edge r2 to qf; qf has a self-loop r4 and an edge r3 back to q0]
The resulting regular expression:
r = r1* r2 (r4 + r3 r1* r2)*
L(r) = L(M) = L
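The state-removal rule and the final formula r = r1* r2 (r4 + r3 r1* r2)* can be sketched in a few lines. The code below is an illustration under assumptions: edge labels are strings in Python `re` syntax (`|` for `+`), a missing edge stands for ∅, the start and final states are distinct, and the helper names are invented here. The edge list at the bottom is the q0, q1, q2 graph as read from the example figure.

```python
import re

def union(x, y):
    if x is None:
        return y
    if y is None:
        return x
    return f"(?:{x}|{y})"

def concat(*parts):
    if any(p is None for p in parts):         # concatenating with the empty set
        return None
    return "".join(f"(?:{p})" for p in parts)

def star(x):
    return "" if x is None else f"(?:{x})*"   # star of the empty set is {lambda}

def eliminate(edges, states, start, final):
    """Remove every state except start and final, then apply the final formula.

    edges maps (p, q) to a regex string; returns a regex string, or None if L is empty.
    """
    edges = dict(edges)
    for q in states:
        if q in (start, final):
            continue
        loop = star(edges.get((q, q)))
        ins = [(p, x) for (p, t), x in edges.items() if t == q and p != q]
        outs = [(t, y) for (p, t), y in edges.items() if p == q and t != q]
        for p, x in ins:                      # p -x-> q -loop-> q -y-> t
            for t, y in outs:
                edges[(p, t)] = union(edges.get((p, t)), concat(x, loop, y))
        edges = {k: v for k, v in edges.items() if q not in k}
    r1, r2 = edges.get((start, start)), edges.get((start, final))
    r3, r4 = edges.get((final, start)), edges.get((final, final))
    back = concat(r3, star(r1), r2)           # None when final has no edge to start
    return concat(star(r1), r2, star(union(r4, back)))

gtg = {("q0", "q1"): "b", ("q1", "q1"): "b", ("q1", "q0"): "a",
       ("q1", "q2"): "a|b", ("q2", "q2"): "b"}
r = eliminate(gtg, ["q0", "q1", "q2"], "q0", "q2")
print(bool(re.fullmatch(r, "bbabab")), bool(re.fullmatch(r, "ab")))  # True False
```

The generated string is heavily parenthesized, but it denotes the same language as (bb*a)* bb*(a + b) b*.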
Applications of RE
Two common applications of REs:
Lexical analysis in compilers
Finding patterns in text
Lexical Analyzer
• Recognize “tokens” in a program's source code.
• The tokens can be variable names, reserved words, operators, numbers, etc.
• Each kind of token can be specified as an RE, e.g., a variable name is of the form [A-Za-z][A-Za-z0-9]*. We can then construct an NFA to recognize it automatically.
• By putting all these NFAs together, we obtain one that can recognize different kinds of tokens in the input string.
• We can convert this NFA to a DFA, and implement this DFA as a deterministic program: the lexical analyzer.
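A small tokenizer shows the idea of one regular expression per token kind. This is a sketch only: apart from the identifier pattern, the token classes (`NUMBER`, `KEYWORD`, `OP`, `SKIP`) and the keyword list are hypothetical, and Python's `re` module is a backtracking matcher rather than the NFA-to-DFA pipeline described above.

```python
import re

# Hypothetical token classes; only the IDENT pattern comes from the slide.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("KEYWORD", r"(?:if|else|while)\b"),
    ("IDENT", r"[A-Za-z][A-Za-z0-9]*"),
    ("OP", r"[+\-*/=<>]"),
    ("SKIP", r"\s+"),
]
# One combined pattern: each alternative is a named group for one token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Split source into (kind, text) pairs, dropping whitespace."""
    pos, tokens = 0, []
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize("while x1 < 42"))
# [('KEYWORD', 'while'), ('IDENT', 'x1'), ('OP', '<'), ('NUMBER', '42')]
```

Listing `KEYWORD` before `IDENT` resolves the overlap between reserved words and variable names; the `\b` keeps `whilex` an identifier.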
Text Search
• “grep” in Unix stands for “Global (search for) Regular Expression and Print”.
• Unix has its own notation for regular expressions:
– Dot “.” stands for “any character”.
– [a1a2…ak] stands for {a1, a2, …, ak}, e.g., [bcd12] stands for the set {b, c, d, 1, 2}.
– [x-y] stands for all characters from x to y in the ASCII sequence.
– | means “or”, i.e., + in our normal notation.
– * means “Kleene star”, as in our normal notation.
– ? means “zero or one”.
– + means “one or more”, e.g., R+ is RR*.
– {n} means “n copies of”, e.g., R{5} is RRRRR.
(You can find out more with “man grep” and “man regex”.)
• We can use this notation to search for string patterns in text.
• For example, credit card numbers:
– [0-9]{16} | [0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}
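The credit card pattern can be tried directly with Python's `re` module, whose syntax for these operators matches the Unix notation listed above. In this sketch the spaces around `|` are dropped, because inside a real pattern they would be matched literally; the sample text and card numbers are made up.

```python
import re

CARD = re.compile(r"[0-9]{16}|[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{4}")

text = "Order 17 paid with 1234-5678-9012-3456; backup card 1111222233334444."
print(CARD.findall(text))
# ['1234-5678-9012-3456', '1111222233334444']
```

As written, the pattern would also match the first 16 digits of a longer digit run, so a practical search would usually add boundary checks around it.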
Grammar
Languages can also be represented
by grammars.
Grammars
Grammars express languages
Example: the English language
sentence → noun_phrase predicate
noun_phrase → article noun
predicate → verb
article → a
article → the
noun → cat
noun → dog
verb → runs
verb → walks
A derivation of “the dog walks”:
sentence ⇒ noun_phrase predicate
⇒ noun_phrase verb
⇒ article noun verb
⇒ the noun verb
⇒ the dog verb
⇒ the dog walks
A derivation of “a cat runs”:
sentence ⇒ noun_phrase predicate
⇒ noun_phrase verb
⇒ article noun verb
⇒ a noun verb
⇒ a cat verb
⇒ a cat runs
Language of the grammar:
L = { “a cat runs”,
“a cat walks”,
“the cat runs”,
“the cat walks”,
“a dog runs”,
“a dog walks”,
“the dog runs”,
“the dog walks” }
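Because this grammar has no recursion, its whole language can be enumerated by expanding the leftmost variable in every possible way. The sketch below is an illustration; the dictionary encoding `GRAMMAR` and the function name `sentences` are assumptions of this example.

```python
# Each variable maps to its list of right-hand sides (lists of symbols).
GRAMMAR = {
    "sentence": [["noun_phrase", "predicate"]],
    "noun_phrase": [["article", "noun"]],
    "predicate": [["verb"]],
    "article": [["a"], ["the"]],
    "noun": [["cat"], ["dog"]],
    "verb": [["runs"], ["walks"]],
}

def sentences(form):
    """Yield every terminal string derivable from the sentential form (a list of symbols)."""
    for i, sym in enumerate(form):
        if sym in GRAMMAR:                    # leftmost variable
            for rhs in GRAMMAR[sym]:
                yield from sentences(form[:i] + rhs + form[i + 1:])
            return
    yield " ".join(form)                      # no variables left: a sentence

language = set(sentences(["sentence"]))
print(len(language))  # 8
```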
Notation
Production Rules
noun → cat
noun → dog
Variable: noun
Terminals: cat, dog
Another Example
Grammar:
S → aSb
S → λ
Derivation of sentence ab:
S ⇒ aSb ⇒ ab
(using S → aSb, then S → λ)
Grammar:
S → aSb
S → λ
Derivation of sentence aabb:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
(using S → aSb twice, then S → λ)
Other derivations:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb
⇒ aaaaSbbbb ⇒ aaaabbbb
Language of the grammar
S → aSb
S → λ
L = {a^n b^n : n ≥ 0}
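The derivations for this grammar all follow one pattern: apply S → aSb n times, then S → λ. A minimal sketch (the function name `derive` is made up here) lists the sentential forms along the way:

```python
def derive(n):
    """Sentential forms of the derivation of a^n b^n using S -> aSb (n times), then S -> lambda."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aSb"))
    forms.append(forms[-1].replace("S", ""))
    return forms

print(" => ".join(f or "λ" for f in derive(2)))
# S => aSb => aaSbb => aabb
```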
More Notation
Grammar G = (V, T, S, P)
V: Set of variables
T: Set of terminal symbols
S: Start variable
P: Set of production rules
Example
Grammar G:
S → aSb
S → λ
G = (V, T, S, P)
V = {S}
T = {a, b}
P = {S → aSb, S → λ}
More Notation
Sentential form: a string that contains variables and terminals
Example:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
Sentential forms: S, aSb, aaSbb, aaaSbbb
Sentence: aaabbb
We write:
S ⇒* aaabbb
Instead of:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaabbb
In general we write:
w1 ⇒* wn
If:
w1 ⇒ w2 ⇒ w3 ⇒ ... ⇒ wn
By default:
w ⇒* w
Example
Grammar
S → aSb
S → λ
Derivations
S ⇒* λ
S ⇒* ab
S ⇒* aabb
S ⇒* aaabbb
Example
Grammar
S → aSb
S → λ
Derivations
S ⇒* aaSbb
aaSbb ⇒* aaaaaSbbbbb
Another Grammar Example
Grammar G:
S → Ab
A → aAb
A → λ
Derivations:
S ⇒ Ab ⇒ b
S ⇒ Ab ⇒ aAbb ⇒ abb
S ⇒ Ab ⇒ aAbb ⇒ aaAbbb ⇒ aabbb
More Derivations
S ⇒ Ab ⇒ aAbb ⇒ aaAbbb ⇒ aaaAbbbb
⇒ aaaaAbbbbb ⇒ aaaabbbbb
S ⇒* aaaabbbbb
S ⇒* aaaaaabbbbbbb
S ⇒* a^n b^n b
Language of a Grammar
For a grammar G with start variable S:
L(G) = {w : S ⇒* w}
where w is a string of terminals
Example
For grammar G:
S → Ab
A → aAb
A → λ
L(G) = {a^n b^n b : n ≥ 0}
Since:
S ⇒* a^n b^n b
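The definition L(G) = {w : S ⇒* w} suggests a direct search: start from S, keep applying productions, and collect the forms that contain only terminals. The sketch below does this breadth-first for the grammar above, with a length bound so the search terminates. The encoding `PRODUCTIONS` (one character per symbol, "" for λ) and the name `generate` are assumptions of this example.

```python
from collections import deque

PRODUCTIONS = {"S": ["Ab"], "A": ["aAb", ""]}   # "" stands for lambda

def generate(max_len):
    """Collect every terminal string w with S =>* w and len(w) <= max_len."""
    seen, found = {"S"}, set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, if any
        i = next((k for k, c in enumerate(form) if c in PRODUCTIONS), None)
        if i is None:
            found.add(form)                      # only terminals: a sentence
            continue
        for rhs in PRODUCTIONS[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(c not in PRODUCTIONS for c in new)
            if terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(generate(7), key=len))  # ['b', 'abb', 'aabbb', 'aaabbbb']
```

The pruning step relies on the fact that no production of this grammar removes terminals, so a form with too many terminals can never shrink back under the bound.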
A Convenient Notation
A → aAb
A → λ
can be written as
A → aAb | λ
Similarly,
article → a
article → the
can be written as
article → a | the
Regular Grammars
A grammar G = (V, T, S, P) is right-linear if all productions
are of the form
A → xB, A → x
where A, B ∈ V, x ∈ T*
A grammar G = (V, T, S, P) is left-linear if all productions are
of the form
A → Bx, A → x
where A, B ∈ V, x ∈ T*
A regular grammar is one that is either right-linear or left-linear
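The two definitions translate into a simple syntactic check: in a right-linear production a variable may appear only as the last symbol of the body, and in a left-linear production only as the first. The sketch below assumes one character per symbol, an empty string for λ, and a single variable as each head; the function name `classify` is invented for illustration.

```python
def classify(productions, variables):
    """Classify a grammar given as (head, body) pairs of strings."""
    def terminals_only(s):
        return all(c not in variables for c in s)

    right = all(terminals_only(body[:-1]) for _, body in productions)
    left = all(terminals_only(body[1:]) for _, body in productions)
    if right and left:
        return "right-linear and left-linear"
    if right:
        return "right-linear"
    if left:
        return "left-linear"
    return "not a regular grammar"

print(classify([("S", "abS"), ("S", "a")], {"S"}))   # right-linear
print(classify([("S", "aSb"), ("S", "")], {"S"}))    # not a regular grammar
```

The second grammar is the S → aSb | λ example from earlier slides: the variable sits in the middle of the body, so it fits neither form.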
Assignment#1
Read Section 3.3 of the textbook
Then finish Exercises 1, 2, 5, and 6 on pages 96-97
Standard Representations of Regular Languages
Regular languages can be represented by:
FAs
NFAs
Regular grammars
Regular expressions
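Equivalence
[Figure: diagram of the equivalences among regular expressions, NFAs, DFAs, and regular grammars. Labels, in order of appearance: Theorem 3.1 and Theorem 3.2 (next to regular expressions and NFA); Theorem 2.2 and "Any DFA is trivially an NFA" (next to NFA and DFA); Theorem 2.3 & Theorem 2.4 (next to DFA); Theorem 3.3 and Theorem 3.4 (next to regular grammars)]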
Elementary Questions about Regular Languages
When we say: We are given a regular language L
We mean: Language L is in a standard representation
Membership Question
Question: Given regular language L and string w, how can we check if w ∈ L?
Answer: Take the DFA that accepts L and check if w is accepted
[Figure: the DFA is run on input w. If it ends in a final state, w ∈ L; otherwise w ∉ L]
Question: Given regular language L, how can we check if L is empty (L = ∅)?
Answer: Take the DFA that accepts L and check if there is any path from the initial state to a final state
[Figure: two DFAs. With a path from the initial state to a final state, L ≠ ∅; with no such path, L = ∅]
Question: Given regular language L, how can we check if L is finite?
Answer: Take the DFA that accepts L and check if there is a walk with a cycle from the initial state to a final state
[Figure: two DFAs. With such a walk, L is infinite; without one, L is finite]
Question: Given regular languages L1 and L2, how can we check if L1 = L2?
Answer: Find if (L1 ∩ L2^c) ∪ (L1^c ∩ L2) = ∅
(L^c denotes the complement of L)
(L1 ∩ L2^c) ∪ (L1^c ∩ L2) = ∅ (L^c is the complement of L)
means L1 ∩ L2^c = ∅ and L1^c ∩ L2 = ∅
so L1 ⊆ L2 and L2 ⊆ L1
so L1 = L2
(L1 ∩ L2^c) ∪ (L1^c ∩ L2) ≠ ∅ (L^c is the complement of L)
means L1 ∩ L2^c ≠ ∅ or L1^c ∩ L2 ≠ ∅
so L1 ⊄ L2 or L2 ⊄ L1
so L1 ≠ L2
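The equality test can be carried out on DFAs without building the complement automata explicitly: run both DFAs in lockstep (a product construction) and look for a reachable pair of states where exactly one is final. Such a pair exists exactly when the symmetric difference of L1 and L2 is nonempty. The encoding and the sample automata below are assumptions of this sketch, and both DFAs are taken to be complete over the same alphabet.

```python
from collections import deque

def equivalent(d1, d2, alphabet):
    """True if the two DFAs accept the same language."""
    start = (d1["start"], d2["start"])
    seen, queue = {start}, deque([start])
    while queue:
        p, q = queue.popleft()
        if (p in d1["finals"]) != (q in d2["finals"]):
            return False                      # some string is in exactly one language
        for ch in alphabet:
            nxt = (d1["delta"][(p, ch)], d2["delta"][(q, ch)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True

# Even number of a's, written with 2 states and with 4 states.
EVEN_2 = {"start": 0, "finals": {0},
          "delta": {(0, "a"): 1, (1, "a"): 0}}
EVEN_4 = {"start": 0, "finals": {0, 2},
          "delta": {(i, "a"): (i + 1) % 4 for i in range(4)}}
# Number of a's divisible by 4: a different language.
MULT_4 = {"start": 0, "finals": {0},
          "delta": {(i, "a"): (i + 1) % 4 for i in range(4)}}

print(equivalent(EVEN_2, EVEN_4, "a"), equivalent(EVEN_2, MULT_4, "a"))  # True False
```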