Simplifications of Context-Free Grammars
1
Assumption
The languages discussed here do not contain the empty string λ.
Let L be any context-free language, and let G = (V, T, S, P) be a
context-free grammar for L − {λ}.
Then the grammar obtained by adding to V the new variable S0,
making it the start variable, and adding to P the productions
S0 → S | λ
generates L.
2
A Substitution Rule
Original grammar:
S → aB
A → aaA
A → abBc
B → aA
B → b
Substitute B → b
Equivalent grammar:
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
3
A Substitution Rule (continued)
S → aB | ab
A → aaA
A → abBc | abbc
B → aA
Substitute B → aA
Equivalent grammar:
S → aB | ab | aaA
A → aaA
A → abBc | abbc | abaAc
4
In general:
A → xBz
B → y1
Substitute B → y1
Equivalent grammar:
A → xBz | xy1z
5
Nullable Variables
λ-production:
A → λ
Nullable variable (a variable that can derive λ in one or more steps):
A ⇒ ... ⇒ λ
6
Removing Nullable Variables
Example Grammar:
S → aMb
M → aMb
M → λ
Nullable variable: M
7
Why?
S → SaB | aB
B → bB | λ
Language generated by the grammar? (ab*)(ab*)*
How many derivation steps for the string aaaa? (8 steps)
S ⇒ SaB ⇒ SaBaB ⇒ SaBaBaB ⇒ aBaBaBaB
⇒ aaBaBaB ⇒ aaaBaB ⇒ aaaaB ⇒ aaaa
8
After removing the λ-production:
S → SaB | Sa | aB | a
B → bB | b
Language generated by the grammar? (ab*)(ab*)*
How many derivation steps for the string aaaa? (4 steps)
S ⇒ Sa ⇒ Saa ⇒ Saaa ⇒ aaaa
9
S → aMb
M → aMb
M → λ
Substitute M → λ
Final Grammar:
S → aMb
S → ab
M → aMb
M → ab
10
Algorithm to construct the set of nullable variables
Input: context-free grammar G = (V, T, S, P)
1. NULL := { A | A → λ ∈ P }
2. Repeat
2.1. PREV := NULL
2.2. for each variable A ∈ V do
       if there is an A rule A → w and w ∈ PREV*, then
         NULL := NULL ∪ {A}
   until NULL = PREV
11
Example
S → ABaC
A → BC
B → b | λ
C → D | λ
D → d

Iteration   NULL         PREV
0           {B, C}
1           {A, B, C}    {B, C}
2           {A, B, C}    {A, B, C}
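
The fixed-point loop above can be sketched in Python as follows. This is a minimal sketch, not a definitive implementation: the dictionary encoding of the grammar (variable mapped to a list of right-hand sides, with the empty tuple standing for λ) and the function name nullable_variables are assumptions made for illustration.

```python
def nullable_variables(productions):
    """Return the set of nullable variables.

    productions: dict mapping each variable to a list of right-hand sides,
    where each right-hand side is a tuple of symbols and () stands for lambda.
    (This encoding is an assumption of the sketch.)
    """
    # Step 1: variables with a direct lambda-production.
    null = {a for a, rhss in productions.items() if () in rhss}
    # Step 2: repeat until no new variable is added.
    while True:
        prev = set(null)
        for a, rhss in productions.items():
            # A is nullable if some right-hand side consists only of symbols in PREV.
            if any(all(sym in prev for sym in rhs) for rhs in rhss):
                null.add(a)
        if null == prev:
            return null


# The example grammar from the slide.
grammar = {
    "S": [("A", "B", "a", "C")],
    "A": [("B", "C")],
    "B": [("b",), ()],
    "C": [("D",), ()],
    "D": [("d",)],
}
print(sorted(nullable_variables(grammar)))  # ['A', 'B', 'C']
```

The loop stops after the second pass, matching the iteration table: the first pass adds A, the second adds nothing.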
12
Once the set of nullable variables has been found,
a new set of production rules P’ is constructed.
Construction:
For each production A → x1 x2 ... xm in the grammar G:
- the production is put into P’
- so are all productions generated by replacing
  nullable variables with λ in all possible combinations
- if all xi are nullable, the production A → λ
  is not added
13
Example
S → ABaC
A → BC
B → b | λ
C → D | λ
D → d
Nullable variables: A, B, C
New Productions:
S → ABaC | BaC | AaC | ABa | aC | Aa | Ba | a
A → B | C | BC
B → b
C → D
D → d
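
The construction of P’ can be sketched in Python as below. This is an illustrative sketch under the same assumed encoding as before (right-hand sides as tuples, () for λ); the function name remove_lambda_productions is hypothetical, and the set of nullable variables is passed in rather than computed.

```python
from itertools import product


def remove_lambda_productions(productions, nullable):
    """Build P' by dropping nullable variables in all possible combinations.

    productions: dict variable -> list of right-hand sides (tuples of symbols).
    nullable: set of nullable variables, assumed to be computed beforehand.
    """
    new = {}
    for a, rhss in productions.items():
        out = []
        for rhs in rhss:
            # A nullable symbol may be kept or dropped; any other symbol is kept.
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                candidate = tuple(s for part in choice for s in part)
                # Skip A -> lambda and avoid duplicates.
                if candidate and candidate not in out:
                    out.append(candidate)
        new[a] = out
    return new


grammar = {
    "S": [("A", "B", "a", "C")],
    "A": [("B", "C")],
    "B": [("b",), ()],
    "C": [("D",), ()],
    "D": [("d",)],
}
new_grammar = remove_lambda_productions(grammar, {"A", "B", "C"})
for variable, rhss in new_grammar.items():
    print(variable, "->", " | ".join("".join(rhs) for rhs in rhss))
```

For the example grammar this yields the same eight S productions and three A productions listed above, though possibly in a different order.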
14
Unit-Productions
Unit Production:
A → B
(a single variable on both sides)
15
Removing Unit Productions
Observation:
A → A
is removed immediately
16
How about:
A → B
B → A
17
Example Grammar:
S → aA
A → a
A → B
B → A
B → bb
18
S → aA
A → a
A → B
B → A
B → bb
Substitute A → B
S → aA | aB
A → a
B → A | B
B → bb
19
S → aA | aB
A → a
B → A | B
B → bb
Remove B → B
S → aA | aB
A → a
B → A
B → bb
20
S → aA | aB
A → a
B → A
B → bb
Substitute B → A
S → aA | aB | aA
A → a
B → bb
21
Remove repeated productions
S → aA | aB | aA
A → a
B → bb
Final grammar
S → aA | aB
A → a
B → bb
22
Algorithm to remove unit productions
Draw a dependency graph with an edge (A, B)
whenever the grammar has a unit production
A → B.
The new grammar G’ is generated by first
putting into P’ all non-unit productions of P.
Then for all A and B satisfying A ⇒* B, we add
to P’
A → y1 | y2 | ... | yn
where B → y1 | y2 | ... | yn is the set of all rules in
P’ with B on the left.
23
Another Example
S → Aa | B
B → A | bb
A → a | bc | B
Dependency graph (unit productions): S → B, B → A, A → B
Add original non-unit productions:
S → Aa
B → bb
A → a | bc
Use A->a|bc to substitute:
S → Aa | B
B → a | bc | bb
A → a | bc
Use B->bb to substitute:
S → Aa | a | bc | bb
B → a | bc | bb
A → a | bc | bb
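
One way to sketch the unit-production algorithm in Python is shown below. It is an illustration rather than the slides' own code: the grammar encoding (dict of variable to list of tuples) and the name remove_unit_productions are assumptions, and the dependency graph is explored with a simple stack-based search.

```python
def remove_unit_productions(productions):
    """Return an equivalent grammar without unit productions.

    productions: dict variable -> list of right-hand sides (tuples of symbols).
    Every variable is assumed to appear as a key of the dict.
    """
    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in productions

    # reach[A] = all B with A =>* B using unit productions only
    # (reachability in the dependency graph).
    reach = {}
    for a in productions:
        seen, stack = set(), [a]
        while stack:
            v = stack.pop()
            for rhs in productions[v]:
                if is_unit(rhs) and rhs[0] not in seen:
                    seen.add(rhs[0])
                    stack.append(rhs[0])
        reach[a] = seen

    # Start P' with the non-unit productions of P.
    non_unit = {a: [r for r in rhss if not is_unit(r)]
                for a, rhss in productions.items()}

    # For every A =>* B, give A the non-unit right-hand sides of B.
    new = {}
    for a in productions:
        out = list(non_unit[a])
        for b in reach[a]:
            for rhs in non_unit[b]:
                if rhs not in out:
                    out.append(rhs)
        new[a] = out
    return new


# The grammar of "Another Example" (each terminal is one symbol).
grammar = {
    "S": [("A", "a"), ("B",)],
    "B": [("A",), ("b", "b")],
    "A": [("a",), ("b", "c"), ("B",)],
}
result = remove_unit_productions(grammar)
for variable, rhss in result.items():
    print(variable, "->", " | ".join("".join(rhs) for rhs in rhss))
```

The result contains the same productions as the final grammar of the example; the order of alternatives may differ from the slide.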
24
Useless Productions
S → aSb
S → λ
S → A
A → aA     (useless production)
Some derivations never terminate...
S ⇒ A ⇒ aA ⇒ aaA ⇒ aaaA ⇒ ...
25
Another grammar:
S → A
A → aA
A → λ
B → bA     (useless production: not reachable from S)
26
In general:
if S ⇒ ... ⇒ xAy ⇒ ... ⇒ w
with w ∈ L(G) (w contains only terminals),
then variable A is useful;
otherwise, variable A is useless.
27
A production A → x is useless
if any of its variables is useless
S → aSb
S → λ
S → A      (useless production)
A → aA     (useless production; variable A is useless)
B → C      (useless production; variable B is useless)
C → D      (useless production; variable C is useless)
28
Removing Useless Productions
Example Grammar:
S → aS | A | C
A → a
B → aa
C → aCb
29
First: find all variables that can produce
strings with only terminals
S → aS | A | C
A → a
B → aa
C → aCb
Round 1: {A, B}       (from A → a and B → aa)
Round 2: {A, B, S}    (from S → A)
30
Keep only the variables
that produce terminal strings:
{A, B, S}
(the remaining variables are useless)
S → aS | A | C
A → a
B → aa
C → aCb
Remove useless productions
S → aS | A
A → a
B → aa
31
Second: find all variables reachable from S
Use a Dependency Graph
S → aS | A
A → a
B → aa
Dependency graph: S → A; B is not reachable
32
Keep only the variables
reachable from S
(the remaining variables are useless)
S → aS | A
A → a
B → aa
Remove useless productions
Final Grammar
S → aS | A
A → a
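
The two passes above can be sketched in Python as follows. This is a hedged sketch: the encoding (dict of variable to list of tuples), the name remove_useless, and the assumption that every variable appears as a key of the dict are choices made for illustration only.

```python
def remove_useless(productions, start):
    """Remove useless variables and productions in two passes.

    productions: dict variable -> list of right-hand sides (tuples of symbols).
    Assumption: every variable appears as a key; any other symbol is a terminal.
    """
    def all_generating(rhs, generating):
        return all(s not in productions or s in generating for s in rhs)

    # First: variables that can produce strings with only terminals.
    generating = set()
    changed = True
    while changed:
        changed = False
        for a, rhss in productions.items():
            if a not in generating and any(all_generating(r, generating) for r in rhss):
                generating.add(a)
                changed = True
    step1 = {a: [r for r in rhss if all_generating(r, generating)]
             for a, rhss in productions.items() if a in generating}

    # Second: variables reachable from the start variable.
    reachable, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for rhs in step1.get(v, []):
            for s in rhs:
                if s in step1 and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {a: rhss for a, rhss in step1.items() if a in reachable}


grammar = {
    "S": [("a", "S"), ("A",), ("C",)],
    "A": [("a",)],
    "B": [("a", "a")],
    "C": [("a", "C", "b")],
}
final = remove_useless(grammar, "S")
print(final)  # {'S': [('a', 'S'), ('A',)], 'A': [('a',)]}
```

The order of the passes matters: removing non-generating variables first can make further variables unreachable, which the second pass then catches.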
33
Removing All
Step 1: Remove Nullable Variables
Step 2: Remove Unit-Productions
Step 3: Remove Useless Variables
34
Normal Forms for Context-Free Grammars
35
Chomsky Normal Form
Each production has the form:
A → BC     (B and C are variables)
or
A → a      (a is a terminal)
36
Examples:
Chomsky Normal Form:
S → AS
S → a
A → SA
A → b
Not Chomsky Normal Form:
S → AS
S → AAS
A → SA
A → aa
37
Conversion to Chomsky Normal Form
Example (not in Chomsky Normal Form):
S → ABa
A → aab
B → Ac
38
Introduce variables for terminals: Ta, Tb, Tc
Before:
S → ABa
A → aab
B → Ac
After:
S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
39
Introduce intermediate variable: V1
Before:
S → ABTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
After:
S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
40
Introduce intermediate variable: V2
Before:
S → AV1
V1 → BTa
A → TaTaTb
B → ATc
Ta → a
Tb → b
Tc → c
After:
S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
41
Initial grammar:
S → ABa
A → aab
B → Ac
Final grammar in Chomsky Normal Form:
S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
42
In general:
From any context-free grammar
(which doesn’t produce λ)
not in Chomsky Normal Form
we can obtain
an equivalent grammar
in Chomsky Normal Form
43
The Procedure
First remove:
Nullable variables
Unit productions
Useless productions
44
Then, for every terminal symbol a:
New variable: Ta
Add production: Ta → a
In productions: replace a with Ta
45
Replace any production
A → C1C2...Cn
with
A → C1V1
V1 → C2V2
...
Vn-2 → Cn-1Cn
New intermediate variables:
V1, V2, ..., Vn-2
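
The last two steps of the procedure (a variable for each terminal, then intermediate variables for long right-hand sides) can be sketched in Python as below. This is a sketch under stated assumptions: the input is assumed to already be free of λ-productions, unit productions and useless productions; the encoding and the name to_cnf are hypothetical; and the generated names Ta, V1, ... are assumed not to clash with existing variables.

```python
def to_cnf(productions):
    """Convert a cleaned-up grammar to Chomsky Normal Form.

    productions: dict variable -> list of right-hand sides (tuples of symbols).
    Assumptions: no lambda-productions, no unit productions, and the names
    "T" + terminal and "V" + number are not already used as variables.
    """
    new = {}
    term_var = {}   # terminal -> variable that produces it
    counter = 0     # numbering of intermediate variables V1, V2, ...
    for a, rhss in productions.items():
        for rhs in rhss:
            if len(rhs) == 1:
                # Already of the form A -> a.
                new.setdefault(a, []).append(rhs)
                continue
            # Replace every terminal a by its variable Ta.
            symbols = []
            for s in rhs:
                if s in productions:
                    symbols.append(s)
                else:
                    term_var.setdefault(s, "T" + s)
                    symbols.append(term_var[s])
            # Break A -> C1 C2 ... Cn into a chain of two-variable productions.
            head = a
            while len(symbols) > 2:
                counter += 1
                v = "V" + str(counter)
                new.setdefault(head, []).append((symbols[0], v))
                head, symbols = v, symbols[1:]
            new.setdefault(head, []).append(tuple(symbols))
    # Add the productions Ta -> a.
    for t, tv in term_var.items():
        new[tv] = [(t,)]
    return new


grammar = {
    "S": [("A", "B", "a")],
    "A": [("a", "a", "b")],
    "B": [("A", "c")],
}
cnf = to_cnf(grammar)
for variable, rhss in cnf.items():
    print(variable, "->", " | ".join(" ".join(rhs) for rhs in rhss))
```

On the example grammar S → ABa, A → aab, B → Ac this reproduces the final grammar of the worked example, including the variable names V1 and V2.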
46
Theorem:
For any context-free grammar
(which doesn’t produce λ)
there is an equivalent grammar
in Chomsky Normal Form
47
Observations
• Chomsky normal forms are good
for parsing and proving theorems
• It is very easy to find the Chomsky normal
form for any context-free grammar
48
Greibach Normal Form
All productions have the form:
A → a V1V2...Vk,   k ≥ 0
where a is a terminal symbol and V1, ..., Vk are variables
49
Examples:
Greibach Normal Form:
S → cAB
A → aA | bB | b
B → b
Not Greibach Normal Form:
S → abSb
S → aa
50
Conversion to Greibach Normal Form:
S → abSb
S → aa
becomes, in Greibach Normal Form:
S → aTbSTb
S → aTa
Ta → a
Tb → b
51
Theorem:
For any context-free grammar
(which doesn’t produce λ)
there is an equivalent grammar
in Greibach Normal Form
52
Observations
• Greibach normal forms are very good
for parsing
• It is hard to find the Greibach normal
form of any context-free grammar
53
Compilers
54
Program:
v = 5;
if (v>5)
x = 12 + v;
while (x !=3) {
x = x - 3;
v = 10;
}
......
The compiler translates the program into
Machine Code:
Add v,v,0
cmp v,5
jmplt ELSE
THEN:
add x, 12,v
ELSE:
WHILE:
cmp x,3
...
55
Compiler
input program → [ lexical analyzer → parser ] → output machine code
56
A parser knows the grammar
of the programming language
57
Parser
PROGRAM → STMT_LIST
STMT_LIST → STMT; STMT_LIST | STMT;
STMT → EXPR | IF_STMT | WHILE_STMT
| { STMT_LIST }
EXPR → EXPR + EXPR | EXPR - EXPR | ID
IF_STMT → if (EXPR) then STMT
| if (EXPR) then STMT else STMT
WHILE_STMT → while (EXPR) do STMT
58
The parser finds the derivation
of a particular input
input:
10 + 2 * 5
Parser grammar:
E -> E + E
| E * E
| INT
derivation:
E => E + E
=> E + E * E
=> 10 + E * E
=> 10 + 2 * E
=> 10 + 2 * 5
59
derivation:
E => E + E
=> E + E * E
=> 10 + E * E
=> 10 + 2 * E
=> 10 + 2 * 5
derivation tree:
          E
        / | \
       E  +  E
       |   / | \
      10  E  *  E
          |     |
          2     5
60
derivation tree:
          E
        / | \
       E  +  E
       |   / | \
      10  E  *  E
          |     |
          2     5
machine code:
mult a, 2, 5
add b, 10, a
61
Parsing
62
input string → Parser (uses the grammar) → derivation
63
Example:
input: aabb
Parser grammar:
S → SS
S → aSb
S → bSa
S → λ
derivation: ?
64
Exhaustive Search
S → SS | aSb | bSa | λ
Find derivation of aabb
Phase 1: all possible derivations of length 1
S ⇒ SS
S ⇒ aSb
S ⇒ bSa
S ⇒ λ
65
Target: aabb
S ⇒ SS
S ⇒ aSb
S ⇒ bSa     (cannot lead to aabb)
S ⇒ λ       (cannot lead to aabb)
66
S → SS | aSb | bSa | λ
Target: aabb
Phase 1 (remaining derivations):
S ⇒ SS
S ⇒ aSb
Phase 2:
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ bSaS
S ⇒ SS ⇒ S
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb
S ⇒ aSb ⇒ abSab
S ⇒ aSb ⇒ ab
67
S → SS | aSb | bSa | λ
Target: aabb
Phase 2 (remaining derivations):
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒ S
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb
Phase 3:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
68
Final result of exhaustive search
(top-down parsing)
input: aabb
Parser grammar:
S → SS
S → aSb
S → bSa
S → λ
derivation:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
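
The phase-by-phase search can be sketched in Python as below. This is a minimal illustration, not the slides' algorithm verbatim: it expands only the leftmost variable in each phase, discards sentential forms whose terminal prefix does not match the input or that contain too many terminals, and stops after a caller-supplied number of phases. The single-character symbol encoding and the name exhaustive_parse are assumptions.

```python
def exhaustive_parse(productions, start, w, max_phases):
    """Breadth-first search for a leftmost derivation of w.

    productions: dict variable -> list of right-hand sides given as strings,
    one character per symbol ("" stands for lambda). This encoding and the
    phase limit max_phases are assumptions of the sketch.
    Returns the derivation as a list of sentential forms, or None.
    """
    frontier = [[start]]
    for _ in range(max_phases):
        next_frontier = []
        for derivation in frontier:
            form = derivation[-1]
            # Position of the leftmost variable (every kept form has one).
            i = next(k for k, c in enumerate(form) if c in productions)
            for rhs in productions[form[i]]:
                new_form = form[:i] + rhs + form[i + 1:]
                # Prune: too many terminals to ever become w.
                if sum(c not in productions for c in new_form) > len(w):
                    continue
                # Prune: the terminal prefix must be a prefix of w.
                end = next((k for k, c in enumerate(new_form) if c in productions),
                           len(new_form))
                if not w.startswith(new_form[:end]):
                    continue
                if new_form == w:
                    return derivation + [new_form]
                if end < len(new_form):   # still contains a variable
                    next_frontier.append(derivation + [new_form])
        frontier = next_frontier
    return None


grammar = {"S": ["SS", "aSb", "bSa", ""]}
print(exhaustive_parse(grammar, "S", "aabb", max_phases=3))
# ['S', 'aSb', 'aaSbb', 'aabb']
```

Because this grammar contains a λ-production, the search needs an explicit phase limit; without one, forms such as SSS... could grow forever.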
69
Time complexity of exhaustive search
Suppose there are no productions of the form
A → λ
A → B
Number of phases for string w: 2|w|
70
For a grammar with k rules
Time for phase 1: k
(k possible derivations)
71
Time for phase 2: k^2
(k^2 possible derivations)
72
Time for phase 2|w|: k^{2|w|}
(k^{2|w|} possible derivations)
73
Total time needed for string w:
k + k^2 + ... + k^{2|w|}
(phase 1 + phase 2 + ... + phase 2|w|)
Extremely bad!!!
74
There exist faster algorithms
for specialized grammars
S-grammar: every production has the form
A → ax
where a is a symbol and x is a string of variables,
and each pair (A, a) appears once
75
S-grammar example:
S → aS
S → bSS
S → c
Each string has a unique derivation
S ⇒ aS ⇒ abSS ⇒ abcS ⇒ abcc
76
For S-grammars:
In the exhaustive search parsing
there is only one choice in each phase
Time for a phase: 1
Total time for parsing string w: |w|
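
Because each pair (A, a) appears at most once, parsing with an S-grammar reduces to one table lookup per input symbol. A small sketch follows, assuming a hypothetical encoding in which the rules are stored as a dictionary from (variable, terminal) to the string of variables x, with one character per variable.

```python
def s_grammar_parse(rules, start, w):
    """Parse w with an S-grammar; return the number of steps, or None.

    rules: dict mapping (variable, terminal) -> string of variables x,
    encoding the production variable -> terminal x. (Assumed encoding.)
    """
    stack = [start]          # pending variables, leftmost on top
    steps = 0
    for c in w:
        if not stack:
            return None      # input left over but nothing to expand
        a = stack.pop()
        x = rules.get((a, c))
        if x is None:
            return None      # no production A -> c x exists
        stack.extend(reversed(x))
        steps += 1
    return steps if not stack else None


# S -> aS | bSS | c
rules = {("S", "a"): "S", ("S", "b"): "SS", ("S", "c"): ""}
print(s_grammar_parse(rules, "S", "abcc"))  # 4
```

The step count equals |w| for every accepted string, which is the linear bound stated above.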
77
For general context-free grammars:
There exists a parsing algorithm
that parses a string w
in time |w|^3
(we will show it in the next class)
78
The CYK Parser
79
The CYK Membership Algorithm
Input:
• Grammar G in Chomsky Normal Form
• String w
Output:
find if w ∈ L(G)
80
The Algorithm
Input example:
• Grammar G:
S → AB
A → BB
A → a
B → AB
B → b
• String w: aabbb
81
All substrings of aabbb, by length:
length 1:  a      a      b      b      b
length 2:  aa     ab     bb     bb
length 3:  aab    abb    bbb
length 4:  aabb   abbb
length 5:  aabbb
82
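S → AB
A → BB
A → a
B → AB
B → b
Variables that derive each substring of length 1:
length 1:  a: A    a: A    b: B    b: B    b: B
length 2:  aa      ab      bb      bb
length 3:  aab     abb     bbb
length 4:  aabb    abbb
length 5:  aabbb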
83
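S → AB
A → BB
A → a
B → AB
B → b
Variables that derive each substring of length 2:
length 1:  a: A        a: A        b: B     b: B     b: B
length 2:  aa: (none)  ab: S,B     bb: A    bb: A
length 3:  aab         abb         bbb
length 4:  aabb        abbb
length 5:  aabbb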
84
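S → AB
A → BB
A → a
B → AB
B → b
Completed table:
length 1:  a: A        a: A        b: B        b: B     b: B
length 2:  aa: (none)  ab: S,B     bb: A       bb: A
length 3:  aab: S,B    abb: A      bbb: S,B
length 4:  aabb: A     abbb: S,B
length 5:  aabbb: S,B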
85
Therefore:
aabbb ∈ L(G)
Time Complexity: |w|^3
Observation:
The CYK algorithm can be
easily converted to a parser
(bottom up parser)
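
The table-filling procedure can be sketched in Python as below. This is an illustrative sketch only: the way the grammar is passed in (one list of terminal rules and one list of two-variable rules) and the name cyk_table are assumptions, and the input string is assumed to be non-empty.

```python
def cyk_table(terminal_rules, binary_rules, w):
    """Fill the CYK table for a grammar in Chomsky Normal Form.

    terminal_rules: list of (A, a) pairs for productions A -> a.
    binary_rules: list of (A, (B, C)) pairs for productions A -> BC.
    Returns a dict: table[(i, length)] = set of variables deriving
    the substring w[i:i+length]. Assumes w is non-empty.
    """
    n = len(w)
    table = {}
    # Substrings of length 1.
    for i, symbol in enumerate(w):
        table[(i, 1)] = {a for a, t in terminal_rules if t == symbol}
    # Longer substrings, built from every split into two shorter ones.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            cell = set()
            for split in range(1, length):
                left = table[(i, split)]
                right = table[(i + split, length - split)]
                for head, (first, second) in binary_rules:
                    if first in left and second in right:
                        cell.add(head)
            table[(i, length)] = cell
    return table


terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", ("A", "B")), ("A", ("B", "B")), ("B", ("A", "B"))]
w = "aabbb"
table = cyk_table(terminal_rules, binary_rules, w)
print(sorted(table[(0, len(w))]))   # ['B', 'S']
print("S" in table[(0, len(w))])    # True, so aabbb is in L(G)
```

The three nested loops over length, start position and split point give the |w|^3 running time, with the grammar size as a constant factor.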
86