unit production

Lecture 09:
Theory of Automata:08
Context Free Grammars
The Total Language Tree
• It is possible to depict the generation of all the words in
the language of a CFG simultaneously in one big
(possibly infinite) tree.
Definition:
• For a given CFG, we define a tree with the start symbol
S as its root and whose nodes are working strings of
terminals and nonterminals. The descendants of each
node are all the possible results of applying every
applicable production to the working string, one at a
time. A string of all terminals is a terminal node in the
tree. The resultant tree is called the total language tree
of the CFG.
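The definition above can be sketched as a breadth-first walk over working strings. This is only an illustrative sketch: the dictionary encoding of the grammar, the function names `children` and `terminal_nodes`, and the `max_nodes` safety cap are assumptions made for the example, and the grammar used is the one from the example that follows.

```python
from collections import Counter, deque

# Hypothetical encoding: each nonterminal (an uppercase letter) maps to its
# list of right-hand sides; lowercase letters are terminals.
GRAMMAR = {"S": ["aa", "bX", "aXX"], "X": ["ab", "b"]}


def children(working, grammar):
    """All strings obtained by applying one production at one position."""
    result = []
    for i, symbol in enumerate(working):
        for rhs in grammar.get(symbol, []):
            result.append(working[:i] + rhs + working[i + 1:])
    return result


def terminal_nodes(grammar, start="S", max_nodes=10_000):
    """Breadth-first walk of the total language tree.

    Returns a Counter mapping each word to the number of terminal nodes
    that carry it. max_nodes is a safety cap, needed for infinite trees.
    """
    words = Counter()
    queue = deque([start])
    visited = 0
    while queue and visited < max_nodes:
        working = queue.popleft()
        visited += 1
        if not any(symbol in grammar for symbol in working):
            words[working] += 1          # a string of all terminals: a leaf
        else:
            queue.extend(children(working, grammar))
    return words


words = terminal_nodes(GRAMMAR)
print(sorted(words))                                   # the distinct words
print({w: n for w, n in words.items() if n > 1})       # words at several leaves
```

For a grammar with an infinite language the walk never finishes on its own, which is why the sketch stops after `max_nodes` nodes.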
Example
• Consider the CFG:
S → aa | bX | aXX
X → ab | b
• The total language tree is
• The above total language tree has only 7 different words.
• Four of its words (abb, aabb, abab, aabab) have two different
derivations because they appear as terminal nodes in two different
places.
• However, these words are NOT generated by two different
derivation trees. Hence, the CFG is unambiguous. For example,
abb is reached both as S ⇒ aXX ⇒ abX ⇒ abb and as
S ⇒ aXX ⇒ aXb ⇒ abb, and both derivations have the same
derivation tree.
Example
• Consider the CFG:
S → aSb | bS | a
• The language of this CFG is infinite, and so is the
total language tree: the tree may get arbitrarily
wide as well as infinitely long.
• Can you draw the beginning part of this total
language tree?
Semi Word
• For a given CFG, a semiword is a string of
terminals (possibly none) concatenated with
exactly one non-terminal (on the right).
• In general, a semiword has the shape
(terminal) (terminal) … (terminal) (Non-Terminal)
e.g. aaaX
abcY
bbY
• A word is a string of terminals only (zero or more terminals).
Regular Grammar
Given an FA, there is a CFG that generates exactly the
language accepted by the FA.
– In other words, all regular languages are CFLs
[Figure: the regular languages drawn as a region inside the CFLs]
Creating a CFG from an FA
Step-1 The non-terminals in the CFG will be all the names of the
states in the FA, with the start state renamed S.
Step-2 For every a-edge from state X to state Y, create the
production X → aY; for every a-loop at state X, create the
production X → aX. Do the same for b-edges.
Step-3 For every final state X, create the production
X → Λ
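The three steps can be sketched in a few lines. This is a minimal sketch under stated assumptions: the FA is given as a dictionary from (state, letter) pairs to states, Λ is written as the empty string, and the function name `fa_to_cfg` and the state name `q0` are made up for the illustration. The sample FA has the edges that the productions of the following example describe.

```python
def fa_to_cfg(transitions, start, finals):
    """Build productions from a deterministic FA.

    transitions: dict mapping (state, letter) -> next state (all strings).
    The start state is renamed S; this sketch assumes no other state is
    already called S. Λ is written as the empty string "".
    """
    def name(state):
        return "S" if state == start else state

    productions = {}
    # Step 2: one production X -> aY per edge (a loop gives X -> aX)
    for (state, letter), target in transitions.items():
        productions.setdefault(name(state), []).append(letter + name(target))
    # Step 3: one production X -> Λ per final state
    for state in finals:
        productions.setdefault(name(state), []).append("")
    return productions


# Hypothetical FA: q0 is the start state, F the only final state.
fa = {
    ("q0", "a"): "M", ("q0", "b"): "q0",
    ("M", "a"): "F", ("M", "b"): "q0",
    ("F", "a"): "F", ("F", "b"): "F",
}
cfg = fa_to_cfg(fa, start="q0", finals=["F"])
for head, bodies in cfg.items():
    print(head, "→", " | ".join(body or "Λ" for body in bodies))
```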
Example
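[Figure: FA with start state S (-), a state M and a final state
(called F in the productions); its edges are those listed by the
productions below]
S → aM
S → bS
M → aF
M → bS
F → aF
F → bF
F → Λ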
Note: It is not necessary that each CFG
has a corresponding FA. But each FA
has an equivalent CFG.
Regular Grammar
Theorem 22:
If all the productions in a given CFG fit one of the
two forms: Non-terminal → semiword
or Non-terminal → word
(where the word may be Λ or a string of terminals),
then the language generated by the CFG is
regular.
Proof:
We show that the language is regular by constructing a TG
from the given CFG.
Proof contd.
• Let us consider a general CFG in this form
N1 → w1N2
N1 → w2N3
N2 → w3N4
N7 → w10
N18 → w23
…
where the N’s are non-terminals, the w’s are strings of
terminals, and the parts wyNz are semiwords.
• Let N1 = S. Draw a small circle for each N and one extra
circle labelled +; the circle for S we label (-).
[Figure: circles labelled -S, N2, N3, …, N13, …, Nx, … and +]
Proof contd.
• For each production of the form Nx → wyNz, draw a directed edge
from state Nx to Nz with label wy.
[Figure: edge labelled wy from Nx to Nz]
• If Nx = Nz, the path is a loop.
• For every production of the form Np → wq, draw a directed edge
from Np to + and label it with wq, even if wq = Λ.
[Figure: edge labelled wq from Np to +]
• Any path in the TG from - to + corresponds to a word in the language
of the TG (by concatenating labels) and simultaneously corresponds to
a sequence of productions in the CFG generating that word.
• Conversely, every derivation of a word in the CFG:
S ⇒ wN ⇒ wwN ⇒ wwwN ⇒ … ⇒ wwwww
corresponds to a path in this TG.
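The construction in the proof can be sketched as follows. This is an illustrative sketch, not part of the proof: it assumes each non-terminal is a single uppercase letter, writes Λ as the empty string, and the function names `grammar_to_tg` and `accepts` are invented for the example. The sample grammar is the EVEN-EVEN grammar S → aaS | bbS | abX | baX | Λ, X → aaX | bbX | abS | baS.

```python
def grammar_to_tg(productions):
    """Turn productions of the forms N -> semiword and N -> word into TG edges.

    productions: dict non-terminal -> list of right-hand sides.
    Returns a list of edges (from_state, label, to_state); "+" is the
    extra final state. The regular form of the grammar is assumed,
    not checked.
    """
    edges = []
    for head, bodies in productions.items():
        for body in bodies:
            if body and body[-1].isupper():      # semiword: w followed by N
                edges.append((head, body[:-1], body[-1]))
            else:                                # word (possibly Λ)
                edges.append((head, body, "+"))
    return edges


def accepts(edges, word, start="S"):
    """Search for a path from the start state to + whose labels spell word."""
    seen = set()
    stack = [(start, 0)]             # (current state, letters consumed)
    while stack:
        state, pos = stack.pop()
        if (state, pos) in seen:     # also guards against Λ-labelled loops
            continue
        seen.add((state, pos))
        if state == "+" and pos == len(word):
            return True
        for src, label, dst in edges:
            if src == state and word.startswith(label, pos):
                stack.append((dst, pos + len(label)))
    return False


even_even = {
    "S": ["aaS", "bbS", "abX", "baX", ""],
    "X": ["aaX", "bbX", "abS", "baS"],
}
tg = grammar_to_tg(even_even)
print(accepts(tg, "abab"), accepts(tg, "aab"))
```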
Example
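• Consider the CFG S → aaS | bbS | Λ
[Figure: TG with a single state - carrying loops labelled aa and bb,
and a Λ-edge from - to +]
• The regular expression is given by (aa + bb)*.
• Consider the CFG
S → aaS | bbS | abX | baX | Λ
X → aaX | bbX | abS | baS
[Figure: TG with states -, X and +; loops labelled aa, bb at - and at X;
edges labelled ab, ba from - to X and from X to -; a Λ-edge from - to +]
• Language accepted? EVEN-EVEN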
Killing Λ-Productions
Λ-Productions:
In a given CFG, we call a non-terminal N nullable
if there is a production N → Λ, or there is a
derivation that starts at N and leads to Λ:
N ⇒ ……… ⇒ Λ
• Λ-Productions are undesirable.
• We can replace Λ-production with appropriate
non-Λ productions.
Theorem 23
• If L is a CFL generated by a CFG having Λ-productions,
then there is a different CFG that has no Λ-productions
and still generates either the whole language L (if L
does not include Λ) or else the language of all
the words in L other than Λ.
• Replacement Rule.
1. Delete all Λ-productions.
2. Add the following productions:
For every production of the form X → old string,
add new productions of the form X → …, where the right side
accounts for every modification of the old string that can be
formed by deleting all possible subsets of nullable
non-terminals, except that we do not allow X → Λ to be
formed, even if all the characters in the old string are nullable.
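The replacement rule can be sketched as below. The encoding is an assumption made for this example (non-terminals are dictionary keys, Λ is the empty string), and the names `nullable_nonterminals` and `kill_lambda` are hypothetical. The sample grammar is S → Xa, X → aX | bX | Λ.

```python
from itertools import product


def nullable_nonterminals(grammar):
    """Non-terminals that can derive Λ (written as the empty string)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # an empty body counts: all() over no symbols is True
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def kill_lambda(grammar):
    """Apply the replacement rule; the result has no Λ-productions."""
    nullable = nullable_nonterminals(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            # each nullable symbol may be kept or deleted
            choices = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for pick in product(*choices):
                candidate = "".join(pick)
                # never create X -> Λ, never add a duplicate
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[head] = new_bodies
    return result


print(kill_lambda({"S": ["Xa"], "X": ["aX", "bX", ""]}))
```

Deleting "all possible subsets" is what `itertools.product` enumerates here: every nullable symbol contributes the two options "keep" and "delete".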
Example
Consider the CFG
S → a | Xb | aYa
X → Y | Λ
Y → b | X
Old nullable production    New production
X → Y                      nothing
X → Λ                      nothing
Y → X                      nothing
S → Xb                     S → b
S → aYa                    S → aa
So the new CFG is
S → a | Xb | aa | aYa | b
X → Y
Y → b | X
Example
Consider the CFG
S → Xa
X → aX | bX | Λ
Old nullable production    New production
S → Xa                     S → a
X → aX                     X → a
X → bX                     X → b
So the new CFG is
S → a | Xa
X → aX | bX | a | b
Example
S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ
• Nullable non-terminals are?
• A, B, Z and W
Example Contd.
Old nullable production    New production
X → Zb                     X → b
Y → bW                     Y → b
Z → AB                     Z → A and Z → B
W → Z                      nothing new
A → aA                     A → a
A → bA                     A → b
B → Ba                     B → a
B → Bb                     B → b
So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b
Killing unit-productions
• Definition: A production of the form
Nonterminal → one Nonterminal
is called a unit production.
• The following theorem allows us to get rid of unit
productions:
Theorem 24:
If there is a CFG for the language L that has no
Λ-productions, then there is also a CFG for L with
no Λ-productions and no unit productions.
Proof of Theorem 24
• This is another proof by constructive algorithm.
• Algorithm: For every pair of nonterminals A and B, if the
CFG has a unit production A → B, or if there is a chain
A → X1 → X2 → … → B
where X1, X2, ... are nonterminals, create new productions
as follows:
• If the non-unit productions from B are
B → s1 | s2 | …
where s1, s2, ... are strings, we create the productions
A → s1 | s2 | …
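The algorithm can be sketched as follows. This is a sketch under assumptions: every nonterminal appears as a key of the grammar dictionary, a unit production is a right-hand side that is exactly one such key, and the function name `kill_units` is hypothetical. The sample grammar is S → A | bb, A → B | b, B → S | a.

```python
def kill_units(grammar):
    """Replace unit productions by the non-unit productions they lead to.

    grammar: dict nonterminal -> list of right-hand sides.
    Assumes the grammar has no Λ-productions, as in Theorem 24.
    """
    def is_unit(body):
        return body in grammar       # exactly one nonterminal

    result = {}
    for head in grammar:
        # all nonterminals B with a chain head -> X1 -> ... -> B
        reachable, stack = set(), [head]
        while stack:
            current = stack.pop()
            for body in grammar[current]:
                if is_unit(body) and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # keep the head's own non-unit productions ...
        bodies = [b for b in grammar[head] if not is_unit(b)]
        # ... and copy the non-unit productions of every chained B
        for other in sorted(reachable):
            for body in grammar[other]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        result[head] = bodies
    return result


print(kill_units({"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}))
```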
Example
• Consider the CFG
S → A | bb
A → B | b
B → S | a
• The non-unit productions are
S → bb, A → b, B → a
• And unit productions are
S → A
A → B
B → S
Example contd.
• Let’s list all unit productions and their sequences and create new
productions:
S → A        gives    S → b
S → A → B    gives    S → a
A → B        gives    A → a
A → B → S    gives    A → bb
B → S        gives    B → bb
B → S → A    gives    B → b
• Eliminating all unit productions, the new CFG is
S → bb | b | a
A → b | a | bb
B → a | bb | b
• This CFG generates a finite language since there are no
nonterminals in any strings produced from S.
Useless Symbols
• Let G be a CFG. A symbol x ∈ (V ∪ ∑) is useful if there is a derivation
S ⇒* uxv ⇒* w (both steps in G)
where u, v ∈ (V ∪ ∑)* and w ∈ ∑*. A symbol that is not useful is useless.
• A terminal is useful if it occurs in a string of the language of G.
• A variable is useful if it occurs in a derivation that begins from S and
generates a terminal string.
• For a variable to be useful two conditions must be satisfied.
1. The variable must occur in a sentential form of the grammar.
2. There must be a derivation of a terminal string from the variable.
• A variable that occurs in a sentential form is said to be reachable from
S.
• A two part procedure is presented to eliminate useless symbols.
Useless Productions
S → aSb
S → Λ
S → A
A → aA    (useless production)
Some derivations never terminate...
S ⇒ A ⇒ aA ⇒ aaA ⇒ … ⇒ aa…aA ⇒ …
Another grammar:
S → A
A → aA
A → Λ
B → bA    (useless production: not reachable from S)
In general:
if
S ⇒ … ⇒ xAy ⇒ … ⇒ w, where w ∈ L(G) contains only terminals,
then variable A is useful;
otherwise, variable A is useless.
A production A → x is useless
if any of its variables is useless
Production    Variable (left side)    Production
S → aSb
S → Λ
S → A                                 useless
A → aA        useless                 useless
B → C         useless                 useless
C → D         useless                 useless
Removing Useless Productions
Example Grammar:
S → aS | A | C
A → a
B → aa
C → aCb
First: find all variables that can produce
strings with only terminals
S → aS | A | C
A → a
B → aa
C → aCb
Round 1: {A, B} (from A → a and B → aa)
Round 2: {A, B, S} (from S → A)
Keep only the variables
that produce terminal symbols:
{A, B, S}
(the remaining variables are useless)
Before:
S → aS | A | C
A → a
B → aa
C → aCb
After removing useless productions:
S → aS | A
A → a
B → aa
Second: Find all variables
reachable from S
Use a Dependency Graph
S → aS | A
A → a
B → aa
[Figure: dependency graph with an edge from S to A; B is not reachable]
Keep only the variables
reachable from S
(the remaining variables are useless)
Before:
S → aS | A
A → a
B → aa
After removing useless productions, the final grammar is:
S → aS | A
A → a
Set of variables that Derive terminal symbols
• Input = CFG (V, ∑, P, S)
• TERM = { A | there is a rule A → w ∈ P with w ∈ ∑* }
• repeat
  – PREV = TERM
  – For each variable A ∈ V do
      If there is a rule A → w and w ∈ (PREV ∪ ∑)* then
      TERM = TERM ∪ {A}
  until PREV = TERM
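The TERM computation above can be sketched in Python. The encoding (a dictionary from variables to right-hand sides, with every symbol outside V treated as a terminal) and the names `term_variables` and `keep_only` are assumptions made for this illustration. The sample grammar is the one from the example that follows.

```python
def term_variables(grammar, variables):
    """Variables that derive a string of terminals.

    grammar: dict variable -> list of right-hand sides (strings of symbols).
    variables: the set V; every other symbol is treated as a terminal.
    """
    def only(body, allowed):
        # True when every variable occurring in body belongs to allowed
        return all(sym not in variables or sym in allowed for sym in body)

    # TERM starts with the variables that have a rule A -> w, w in ∑*
    term = {a for a, bodies in grammar.items()
            if any(only(body, set()) for body in bodies)}
    while True:
        prev = set(term)
        for a, bodies in grammar.items():
            if any(only(body, prev) for body in bodies):
                term.add(a)
        if prev == term:
            return term


def keep_only(grammar, variables, keep):
    """Drop every rule that mentions a variable outside keep."""
    return {a: [b for b in bodies
                if all(s not in variables or s in keep for s in b)]
            for a, bodies in grammar.items() if a in keep}


G = {
    "S": ["AC", "BS", "B"], "A": ["aA", "aF"], "B": ["CF", "b"],
    "C": ["cC", "D"], "D": ["aD", "BD", "C"],
    "E": ["aA", "BSA"], "F": ["bB", "b"],
}
V = set(G)
TERM = term_variables(G, V)
GT = keep_only(G, V, TERM)
print(sorted(TERM))
print(GT)
```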
Example
• Consider following CFG
G:
S → AC | BS | B
A → aA | aF
B → CF | b
C → cC | D
D → aD | BD | C
E → aA | BSA
F → bB | b
• The original grammar G:
S → AC | BS | B
A → aA | aF
B → CF | b
C → cC | D
D → aD | BD | C
E → aA | BSA
F → bB | b
Iteration    TERM               PREV
0            {B, F}             {}
1            {B, F, A, S}       {B, F}
2            {B, F, A, S, E}    {B, F, A, S}
3            {B, F, A, S, E}    {B, F, A, S, E}
• New Grammar from TERM will be
GT:
S → BS | B
A → aA | aF
B → b
E → aA | BSA
F → bB | b
Construction of set of reachable Variables
• Input = CFG (V, ∑, P, S)
1. REACH = {S}
2. PREV = null
3. repeat
   i. NEW = REACH – PREV
   ii. PREV = REACH
   iii. For each variable A in NEW do
        For each rule A → w do add all variables in w to REACH
   until REACH = PREV
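The REACH construction can be sketched the same way. The encoding and the function name `reachable_variables` are assumptions made for this illustration, and the `while` loop plays the role of the repeat/until loop, which is equivalent here because REACH and PREV differ on entry. The sample grammar is GT from the example that follows.

```python
def reachable_variables(grammar, variables, start="S"):
    """Variables that occur in some sentential form derived from start.

    grammar: dict variable -> list of right-hand sides (strings of symbols).
    variables: the set V; every other symbol is treated as a terminal.
    """
    reach = {start}
    prev = set()
    while reach != prev:
        new = reach - prev           # variables not yet expanded
        prev = set(reach)
        for a in new:
            for body in grammar.get(a, []):
                reach.update(sym for sym in body if sym in variables)
    return reach


GT = {
    "S": ["BS", "B"], "A": ["aA", "aF"], "B": ["b"],
    "E": ["aA", "BSA"], "F": ["bB", "b"],
}
REACH = reachable_variables(GT, set(GT))
print(sorted(REACH))
```

Only the variables in NEW are expanded in each round, so every variable's rules are scanned once.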
GT:
S → BS | B
A → aA | aF
B → b
E → aA | BSA
F → bB | b
Iteration    REACH     PREV      NEW
0            {S}       {}
1            {S, B}    {S}       {S}
2            {S, B}    {S, B}    {B}
3            {S, B}    {S, B}
Removing All
• Step 1: Remove Nullable Variables
• Step 2: Remove Unit-Productions
• Step 3: Remove Useless Variables