Chapter1.pdf

Theory of Languages
and Automata
Chapter 1- Regular Languages
& Finite State Automaton
Sharif University of Technology
Finite State Automaton
O We begin with the simplest model of Computation,
called finite state machine or finite automaton.
O are good models for computers with an extremely
limited amount of memory.
➜ Embedded Systems
O
Markov Chains are the probabilistic counterpart of
Finite Automata
Theory of Languages and Automata
Prof. Movaghar
2
Simple Example
O Automatic door
Door
Theory of Languages and Automata
Prof. Movaghar
3
Simple Example (cont.)
O State Diagram
O State Transition Table
Neither
Front
Rear
Both
Closed
Closed
Open
Closed
Closed
Open
Closed
Open
Open
Open
Theory of Languages and Automata
Prof. Movaghar
4
Formal Definition
O A finite automaton is a 5-tuple (Q ,Σ
,δ ,q0, F),
where
1. Q is a finite set called states,
2. Σ is a finite set called the alphabet,
3. δ : Q ×Σ ⟶ Q is the transition function,
4. q0 ∈ Q is the start state, and
5. F ⊆ Q is the set of accept states.
Theory of Languages and Automata
Prof. Movaghar
5
Example
M1 = (Q, Σ, δ, q0, F) , where
1. Q = {q1, q2, q3},
2. Σ = {0,1},
3. δ is described as
O
0
1
q1 q1 q2
q2 q3 q2
q3 q2 q2
1. q1 is the start state, and
2. F = {q2}.
Theory of Languages and Automata
Prof. Movaghar
6
Language of a Finite machine
O If A is the set of all strings that machine M accepts,
we say that A is the language of machine M and
write: L(M) = A.
✓ We say that M recognizes A
or that M accepts A.
Theory of Languages and Automata
Prof. Movaghar
7
Example
O
L(M1) = {w | w contains at least one 1 and
even number of 0s follow the last 1}.
Theory of Languages and Automata
Prof. Movaghar
8
Example
O
M4 accepts all strings that start and end with a or with
b.
Theory of Languages and Automata
Prof. Movaghar
9
Formal Definition
M = (Q, Σ, δ, q0, F)
O w = w1w2 … wn
O
O
∀i, wi ∈ Σ
M accepts w ⇔ ∃ r0, r1, …, rn ∀i, ri ∈ Q
1. r0 = q0 ,
2. δ(ri, wi+1) = ri+1, for i = 0, … , n-1,
3. rn ∈ F.
Theory of Languages and Automata
Prof. Movaghar
10
Regular Language
O
A language is called a regular language if
some finite automaton recognizes it.
Theory of Languages and Automata
Prof. Movaghar
11
Example
O
L (M5) = {w | the sum of the symbols in w is 0 modulo
3,
except that <RESET> resets the count to 0}.
✓ As M5 recognizes this language, it is a regular language.
Theory of Languages and Automata
Prof. Movaghar
12
Designing Finite Automata
O Put yourself in the place of the machine and then
see how you would go about performing the
machine’s task.
O Design a finite automaton to recognize the regular
language of all strings that contain the string 001 as
a substring.
Theory of Languages and Automata
Prof. Movaghar
13
Designing Finite Automata (cont.)
O There are four possibilities: You
haven’t just seen any symbols of the pattern,
2. have just seen a 0,
3. have just seen 00, or
4. have seen the entire pattern 001.
1.
Theory of Languages and Automata
Prof. Movaghar
14
The Regular Operations
O Let A and B be languages. We define the regular
operations union, concatenation, and star as
follows.
O Union: A∪B = {x | x∈A or x∈B}.
O Concatenation: A∘B = {xy | x∈A and y∈B }.
O Star: A* = {x1x2…xk | k ≥ 0 and each xi∈A }.
Theory of Languages and Automata
Prof. Movaghar
15
Closure Under Union
O THEOREM
The class of regular languages is closed under the
union operation.
Theory of Languages and Automata
Prof. Movaghar
16
Proof
Let M1 = (Q1, Σ1, δ1, q1, F1) recognize A1, and
M2 = (Q2, Σ2, δ2, q2, F2) recognize A2.
O Construct M = (Q, Σ, δ, q0, F) to recognize A1∪A2.
1. Q = Q1 × Q2
2. Σ = Σ1 ∪ Σ2
3. δ((r1,r2),a) = (δ1(r1,a), δ2 (r2,a)).
4. q0 is the pair (q1, q2).
5. F is the set of pair in which either members in an
accept state of M1 or M2.
F = (F1× Q2 ) ∪ (Q1 × F2)
F ≠ F1× F2
O
Theory of Languages and Automata
Prof. Movaghar
17
Closure under Concatenation
O THEOREM
The class of regular languages is closed under the
concatenation operation.
O To prove this theorem we introduce a new technique
called nondeterminism.
Theory of Languages and Automata
Prof. Movaghar
18
Nondeterminism
O In a nondeterministic machine, several choices
may exit for the next state at any point.
O Nondeterminism is a generalization of determinism,
so every deterministic finite automaton is
automatically a nondeterministic finite automaton.
Theory of Languages and Automata
Prof. Movaghar
19
Differences between DFA & NFA
O
First, very state of a DFA always has exactly one exiting
transition arrow for each symbol in the alphabet.
In an NFA a state may have zero, one, or more exiting
arrows for each alphabet symbol.
O Second, in a DFA, labels on the transition arrows are
symbols from the alphabet.
An NFA may have arrows labeled with members of the
alphabet or ε. Zero, one, or many arrows may exit from
each state with the label ε.
Theory of Languages and Automata
Prof. Movaghar
20
Deterministic vs. Nondeterministic
Computation
Theory of Languages and Automata
Prof. Movaghar
21
Example
O Consider the computation of N1 on input 010110.
Theory of Languages and Automata
Prof. Movaghar
22
Example (cont.)
Theory of Languages and Automata
Prof. Movaghar
23
Formal Definition
O
A nondeterministic finite automaton is a 5-tuple
(Q ,Σ ,δ ,q0, F), where
1. Q is a finite set of states,
2. Σ is a finite alphabet,
3. δ : Q ×Σε ⟶ P(Q) is the transition function,
4. q0 ∈ Q is the start state, and
5. F ⊆ Q is the set of accept states.
Theory of Languages and Automata
Prof. Movaghar
24
Example
N1 = (Q, Σ, δ, q0, F) , where
1. Q = {q1, q2, q3, q4},
2. Σ = {0,1},
3. δ is given as
O
0
q1
q2
q3
q4
1
ε
{q1} {q1,q2} ∅
{q3}
∅
{q4}
∅
{q4}
∅
{q4} {q4}
∅
1. q1 is the start state, and
2. F = {q4}.
Theory of Languages and Automata
Prof. Movaghar
25
Equivalence of NFAs & DFAs
O THEOREM
Every nondeterministic finite automaton has an
equivalent deterministic finite automaton.
O PROOF IDEA convert the NFA into an equivalent
DFA that simulates the NFA.
If k is the number of states of the NFA, so the DFA
simulating the NFA will have 2 k states.
Theory of Languages and Automata
Prof. Movaghar
26
Proof
O Let N = (Q ,Σ
,δ ,q0, F) be the NFA recognizing A.
We construct a DFA M =(Q' ,Σ' ,δ' ,q0', F’) recognizing
A.
O let's first consider the easier case wherein N has no ε
arrows.
1. Q' = P(Q).
2.
3.
q0’ = q0 .
4.
F' = {R∈ Q’ | R contains an accept state of N}.
Theory of Languages and Automata
Prof. Movaghar
27
Proof (cont.)
O Now we need to consider the ε arrows.
O for R ⊆ Q let
O
E(R) = {q | q can be reached from R by traveling
along 0 or more ε arrows}.
1. Q' = P(Q).
2.
δ' (R,a) ={q∈ Q | q∈ E(δ(r,a)) for some r∈ R}.
3.
q0’ = E({q0}) .
4.
F' = {R∈ Q’ | R contains an accept state of N}.
Theory of Languages and Automata
Prof. Movaghar
28
Corollary
O A language is regular if and only if some
nondeterministic finite automaton recognizes
it.
Theory of Languages and Automata
Prof. Movaghar
29
Example
O D’s state set is
{∅,{1},{2},{3},{1,2},{1,3},{2,3},{1,2,3}}.
O The start state is E({1}) = {1,3}.
O The accept states are
{{1},{1,2},{1,3},{1,2,3}}.
Theory of Languages and Automata
Prof. Movaghar
30
Example (cont.)
After
removing
unnecessary
states
Theory of Languages and Automata
Prof. Movaghar
31
CLOSURE UNDER THE
REGULAR OPERATIONS
[Using NFA]
Theory of Languages and Automata
Prof. Movaghar
32
Closure Under Union
O The class of regular languages is closed
under the Union operation.
Let NFA1 recognize A1 and NFA2 recognize A2.
Construct NFA3 to recognize A1 U A2.
Theory of Languages and Automata
Prof. Movaghar
33
Proof (cont.)
Theory of Languages and Automata
Prof. Movaghar
34
Closure Under Concatenation Operation
O The class of regular languages is closed
under the concatenation operation.
Theory of Languages and Automata
Prof. Movaghar
35
Proof (cont.)
Theory of Languages and Automata
Prof. Movaghar
36
Closure Under Star operation
O The class of regular languages is closed
under the star operation.
O We represent another NFA to recognize A*.
Theory of Languages and Automata
Prof. Movaghar
37
Proof (cont.)
1.
2.
3.
4.
5.
𝑄 = 𝑄0 𝑈𝑄1 The states of N are the states of N1 plus a
new start state.
The state q0 is the new start state.
𝐹 = 𝑞0 𝑈 𝐹1
The accept states are the old accept states plus the new
start state.
Define 𝛿 so that for any 𝑞 ∈ 𝑄 𝑎𝑛𝑑 𝑎 ∈ 𝜀 :
Theory of Languages and Automata
Prof. Movaghar
38
Regular Expression
O Say that R is a regular expression if R is:
1. a for some a in the alphabet Σ
2. 𝜀
3. ∅
4.
5.
6.
𝑅1 𝑈𝑅2 , 𝑤𝑕𝑒𝑟𝑒 𝑅1 𝑎𝑛𝑑 𝑅2 𝑎𝑟𝑒 𝑟𝑒𝑔𝑢𝑙𝑎𝑟 𝑒𝑥𝑝.
𝑅1 𝑜 𝑅2 , 𝑤𝑕𝑒𝑟𝑒 𝑅1 𝑎𝑛𝑑 𝑅2 𝑎𝑟𝑒 𝑟𝑒𝑔𝑢𝑙𝑎𝑟 𝑒𝑥𝑝
𝑅1∗ , 𝑤𝑕𝑒𝑟𝑒 𝑅1 𝑎𝑛𝑑 𝑅2 𝑎𝑟𝑒 𝑟𝑒𝑔𝑢𝑙𝑎𝑟 𝑒𝑥𝑝
Recursive Definition?
Theory of Languages and Automata
Prof. Movaghar
39
Regular Expression Language
O Let R be a regular expression. L( R ) is the language
that is defined by R:
1. 𝑖𝑓 𝑅 = 𝑎 𝑓𝑜𝑟 𝑎 ∈ Σ then L( R )={a}
2. 𝑖𝑓 𝑅 = 𝜀 𝑡𝑕𝑒𝑛 𝐿 𝑅 = *𝜀+
3. 𝑖𝑓 𝑅 = ∅ 𝑡𝑕𝑒𝑛 𝐿 𝑅 = ∅
4. 𝑖𝑓 𝑅 = 𝑅1 𝑈𝑅2 𝑡𝑕𝑒𝑛 𝐿 𝑅 = 𝐿 𝑅1 𝑈𝐿 𝑅2
5. 𝑖𝑓 𝑅 = 𝑅1 ∩ 𝑅2 𝑡𝑕𝑒𝑛 𝐿 𝑅 = 𝐿 𝑅1 ∩ 𝐿 𝑅2
6.
𝑖𝑓 𝑅 =
𝑅1∗
𝑡𝑕𝑒𝑛 𝐿 𝑅 = 𝐿 𝑅1
Theory of Languages and Automata
∗
Prof. Movaghar
40
Examples(cont.)
1. 0 ∪ 𝜀 1∗ = 01∗ ∪ 1∗
2. Σ∗ 1Σ∗ = *𝑤|𝑤 𝑐𝑜𝑛𝑡𝑎𝑖𝑛𝑠 𝑎𝑡 𝑙𝑒𝑎𝑠𝑡 𝑜𝑛𝑒 1+
3. 0∗ 10∗ = *𝑤|𝑤 𝑐𝑜𝑛𝑡𝑎𝑖𝑛𝑠 𝑎 𝑠𝑖𝑛𝑔𝑙𝑒 1+
4. Σ∗ 001Σ∗ = *𝑤|𝑤 𝑐𝑜𝑛𝑡𝑎𝑖𝑛𝑠 001 𝑎𝑠 𝑎 𝑠𝑢𝑏𝑠𝑡𝑟𝑖𝑛𝑔+
5. 01𝑈10 = *01,10+
6. (ΣΣ)∗ = *𝑤|𝑤 𝑖𝑠 𝑎 𝑠𝑡𝑟𝑖𝑛𝑔 𝑜𝑓 𝑒𝑣𝑒𝑛 𝑙𝑒𝑛𝑔𝑡𝑕+
7. (ΣΣΣ)∗ = *𝑤|𝑡𝑕𝑒 𝑙𝑒𝑛𝑡𝑔𝑕 𝑜𝑓 𝑤 𝑖𝑠 𝑎 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑒 𝑜𝑓 3+
8(ΣΣΣΣ)∗ = *𝑤|𝑡𝑕𝑒 𝑙𝑒𝑛𝑡𝑔𝑕 𝑜𝑓 𝑤 𝑖𝑠 𝑎 𝑚𝑢𝑙𝑡𝑖𝑝𝑙𝑒 𝑜𝑓 4+
9. 0 ∪ 𝜀 1∗ = 01∗ ∪ 1∗
10. 0 ∪ 𝜀 (1 ∪ 𝜀) = *𝜀, 0,1,01+
11. 1∗ ∅ = ∅
12. ∅∗ = *𝜀+
Theory of Languages and Automata
Prof. Movaghar
41
Equivalence of DFA and Regular Expression
O A language is regular if and only if some regular
expression describes it.
Lemma:
O If a language is described by a regular expression, then
it is regular.
O If a language is regular, then it is described by a regular
expression.
Theory of Languages and Automata
Prof. Movaghar
42
Building an NFA from the Regular Expression
O We consider the six cases in the formal definition of
regular expressions
Theory of Languages and Automata
Prof. Movaghar
43
Examples
Theory of Languages and Automata
Prof. Movaghar
44
Other direction of the proof
O We need to show that, if a language A is regular, a
regular expression describes it!
O First we show how to convert DFAs into GNFAs, and
then GNFAs into regular expressions.
O We can easily convert a DFA into a GNFA in the special
form.
Theory of Languages and Automata
Prof. Movaghar
45
Formal Definition
A generalized nondeterministic finite automaton is a
5-tuple, a 5-tuple (Q ,Σ ,δ ,qstart , qaccept), where
1. Q is a finite set called states,
2. Σ is a the input alphabet,
3. δ : (𝑄 − *𝑞𝑎𝑐𝑐𝑒𝑝𝑡 +) × (Q − *𝑄𝑠𝑡𝑎𝑟𝑡 + ⟶ 𝑅 is
the transition function,
4. qstart is the start state, and
5. qaccept is the accept state.
Theory of Languages and Automata
Prof. Movaghar
46
Assumptions
For convenience we require that GNFAs always have a
special form that meets the following conditions:
1. The start state has transition arrows going to every other
state but no arrows coming in from any other state.
2. There is only a single accept state, and it has arrows
coming in from every other state but no arrows going to
any other state. Furthermore, the accept state is not the
same as the start state.
3. Except for the start and accept states, one arrow goes
from every state to every other state and also from each
state to itself.
Theory of Languages and Automata
Prof. Movaghar
47
Acceptance of Languages for
GNFA
O A GNFA accepts a string w in Σ* if w = w1 w2 … wk,
where each wi is in Σ* is in Σ* and a sequence of q0,
q1, …, qk exists such that
1. q0 = qstart is the start state,
2. qk = qaccept is the accept state, and
3. For each i, we have wi L(Ri) where Ri = δ(qi-1, qi); in
other words Ri is the expression on the arrow from
qi-1 to qi.
How to Eliminate a State?
Theory of Languages and Automata
Prof. Movaghar
49
Example
Theory of Languages and Automata
Prof. Movaghar
50
Example
Theory of Languages and Automata
Prof. Movaghar
51
Grammar
O A grammar G is a 4-tuple
G = (V, Σ, R, S)
where:
1. V is a finite set of variables,
2. Σ is a finite, disjoint from V, of terminals,
3. R is a finite set of rules,
4. S is the start variable.
Theory of Languages and Automata
Prof. Movaghar
52
Rule
A rule is of the form
𝑥→𝑦
+
where 𝑥 ∈ (𝑉 ∪ Σ) and 𝑦 ∈ (𝑉 ∪ Σ)
∗
The rules are applied in the following manner: given a string w of the
form
w = uxv,
We say that the rule x → y is applicable to this string, and we may use
it to replace x with y, thereby obtaining a new string
z = uyv,
This is written as
𝑤 ⇒ 𝑧.
Theory of Languages and Automata
Prof. Movaghar
53
Derivation
O If
𝑊1 ⇒ 𝑊2 ⇒ … ⇒ 𝑊𝑛
we say that W1 derives Wn and write
𝑊1 ⇒∗ 𝑊𝑛
Thus, we always have
𝑊 ⇒∗ 𝑊
Theory of Languages and Automata
Prof. Movaghar
54
Language of a Grammar
Let G = (V, Σ, R, S) be a grammar. Then, the set
𝑳 𝑮 = *𝑾 ∈ 𝑻∗ : 𝑺 ⇒∗ 𝑾+
is the language generated by G.
Theory of Languages and Automata
Prof. Movaghar
55
Example
Consider the grammar
G = ({S}, {a,b}, P, S}
with P given by
S → aSb
S→ε
Then
𝑆 → 𝑎𝑆𝑏 → 𝑎𝑎𝑆𝑏𝑏 → 𝑎𝑎𝑏𝑏
So we can write
𝑆 ⇒∗ 𝑎𝑎𝑏𝑏
Then,
L(G) = {an bn: n ≥ 0}
Theory of Languages and Automata
Prof. Movaghar
56
A Notation for Grammars
Consider the grammar
G = ({S}, {a,b}, P, S}
with P given by
S → aSb
S→ε
The above grammar is usually written as:
G: S → aSb | ε
Theory of Languages and Automata
Prof. Movaghar
57
Regular Grammar
A grammar G = (V, Σ, R, S) is said to be right-linear if all
rules are of the form
A → xB
A→x
Where A, B  V, and X Σ*. A grammar is said to be leftlinear if all rules are of the form
A → Bx
A→x
A regular grammar is one that is either right-linear or
left-linear.
Theory of Languages and Automata
Prof. Movaghar
58
Theorem
Let G = (V, Σ, R, S) be a right-linear grammar. Then:
L(G) is a regular language.
Theory of Languages and Automata
Prof. Movaghar
59
Example
Construct a NFA that accepts the language
generated by the grammar
V0 → aV1
V1 → abV0 | b
a
V1
V0
b
Vf
a
b
V2
Theory of Languages and Automata
Prof. Movaghar
60
Theorem
Let L be a regular language on the alphabet Σ.
Then:
There exists a right-linear grammar G = (V, Σ, R, S)
Such that L = L(G).
Theory of Languages and Automata
Prof. Movaghar
61
Theorem
Theorem A language is regular if and only if there exists
a left-linear grammar G such that L = L(G).
Outline of the proof:
Given any left-linear grammar with rules of the form
A → Bx
A→x
We can construct a right-linear Ĝ by replacing every such rule of G
with
A → xRB
A → xR
We have L(G) = L(Ĝ)R .
Theory of Languages and Automata
Prof. Movaghar
62
Theorem
A language L is regular if and only if there exists a
regular grammar G such that L = L(G).
𝐿 𝑖𝑠 𝒓𝒆𝒈𝒖𝒍𝒂𝒓 ↔ ∃ 𝐺: 𝐿 = 𝐿 𝐺 ; 𝐺 𝑖𝑠 𝒓𝒆𝒈𝒖𝒍𝒂𝒓
Theory of Languages and Automata
Prof. Movaghar
63