NORMAL FORMS
FDP ON THEORY OF COMPUTING
By
G Sudha Sadasivam
Assistant Professor, CSE
CONTEXT FREE GRAMMAR
• Formal languages are described by grammars and recognised by automata such as NFA and DFA
• English sentence rules
<Sentence> → <Noun Phrase> <Verb Phrase>
<Noun Phrase> → <Article> <Noun> | <Noun>
<Verb Phrase> → <Verb> | <Verb> <Noun Phrase>
<Article> → a | the
<Noun> → Sita | boy | girl | ball | dog | ...
<Verb> → caught | saw | took | ...
CONTENTS
• CFG
• Chomsky hierarchy
• Normal forms
• Unit productions
• λ productions
• Useless productions
• CNF
• Applications
(Figure: parse tree of "Sita took the ball": Sentence → NP VP; NP → Noun (Sita); VP → Verb (took) NP; NP → Article (the) Noun (ball))
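None of the sentence rules above is recursive, so the language they define is finite and can be listed exhaustively. The sketch below assumes a Python dictionary encoding of the rules (rule names written without spaces or angle brackets) and leaves out the "..." items of the noun and verb lists; it confirms that "Sita took the ball" is derivable.

```python
from itertools import product

# Sentence rules from the slide; the "..." items of <Noun> and <Verb> are omitted.
GRAMMAR = {
    "Sentence": [["NounPhrase", "VerbPhrase"]],
    "NounPhrase": [["Article", "Noun"], ["Noun"]],
    "VerbPhrase": [["Verb"], ["Verb", "NounPhrase"]],
    "Article": [["a"], ["the"]],
    "Noun": [["Sita"], ["boy"], ["girl"], ["ball"], ["dog"]],
    "Verb": [["caught"], ["saw"], ["took"]],
}


def expand(symbol):
    """Return every word sequence derivable from symbol."""
    if symbol not in GRAMMAR:              # a terminal word derives itself
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # combine one expansion of each RHS symbol, left to right
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(word for part in parts for word in part))
    return results


sentences = {" ".join(words) for words in expand("Sentence")}
print("Sita took the ball" in sentences)   # True
print(len(sentences))                       # 720
```

The count follows from the rules: 15 noun phrases (10 with an article, 5 without) and 48 verb phrases (3 + 3 × 15) give 15 × 48 = 720 sentences.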
Chomsky Hierarchy - Avram Noam Chomsky
• Grammar G = (VN, VT, P, S), where VN is the set of nonterminals (variables), VT is the alphabet of terminals (Σ), P is the set of productions and S is the start symbol
• In a CFG, rules are of the form A → w, with A ∈ VN and w ∈ (VT ∪ VN)*
Type | Name | Productions (P)
0 | Unrestricted | α → β with α ∈ (VT ∪ VN)+ and β ∈ (VT ∪ VN)*
1 | Context-sensitive | α1Aα2 → α1βα2 with A ∈ VN, α1, α2 ∈ (VT ∪ VN)* and β ∈ (VT ∪ VN)+
2 | Context-free | A → β with A ∈ VN and β ∈ (VT ∪ VN)*
3 | Regular (finite-state) | A → bB or A → b with A, B ∈ VN and b ∈ VT*
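The table can be read as a test on a single production. The following is a sketch under assumed conventions (every symbol is one character, the variables are passed as a set, and the empty string stands for λ); it reports the most restrictive type in the table that the production fits.

```python
def is_right_linear(rhs, variables):
    """Type 3 body: b or bB, where b is a (possibly empty) terminal string."""
    body = rhs[:-1] if rhs and rhs[-1] in variables else rhs
    return all(s not in variables for s in body)


def is_context_sensitive(lhs, rhs, variables):
    """Type 1: lhs = a1 A a2 and rhs = a1 b a2, A a variable, b non-empty."""
    for i, s in enumerate(lhs):
        if s in variables:
            a1, a2 = lhs[:i], lhs[i + 1:]
            if (rhs.startswith(a1) and rhs.endswith(a2)
                    and len(rhs) >= len(a1) + len(a2) + 1):
                return True
    return False


def chomsky_type(lhs, rhs, variables):
    """Most restrictive type (3, 2, 1 or 0) that the production lhs -> rhs fits."""
    if len(lhs) == 1 and lhs in variables:
        return 3 if is_right_linear(rhs, variables) else 2
    if is_context_sensitive(lhs, rhs, variables):
        return 1
    return 0


V = {"S", "A", "B"}
print(chomsky_type("A", "aB", V))      # 3
print(chomsky_type("S", "aSb", V))     # 2
print(chomsky_type("aAb", "aabb", V))  # 1
print(chomsky_type("AB", "a", V))      # 0
```

A grammar as a whole belongs to the type of its least restrictive production.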
CFG
• Context-free means a variable A can be replaced by w regardless of the symbols around A (its context).
• CFGs are powerful enough to describe programming languages.
• CFGs are simple enough to allow efficient parsing algorithms (LR and LL parsers).
• BNF (Backus-Naur form) is used to represent CFGs.
• Eg: S → aSb | λ generates {aⁿbⁿ | n ≥ 0}
• Panini described Sanskrit using a CFG
• Venpa is governed by a CFG
• Text mining in biomedicine
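For the example S → aSb | λ, a minimal sketch (sentential forms as Python strings, λ written as the empty string) lists the derivation of aⁿbⁿ: apply S → aSb n times, then finish with S → λ.

```python
def derivation(n):
    """Sentential forms of S => aSb => ... => a^n b^n using S -> aSb | λ."""
    forms = ["S"]
    for _ in range(n):
        forms.append(forms[-1].replace("S", "aSb"))   # apply S -> aSb
    forms.append(forms[-1].replace("S", ""))          # finish with S -> λ
    return forms


print(" => ".join(derivation(2)))   # S => aSb => aaSbb => aabb
```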
Normal Forms
• A normal form (NF) for a grammar imposes additional conditions on its productions, while the grammar in normal form remains equivalent to (generates the same language as) the given grammar.
• Two types
– Chomsky NF (CNF)
– Greibach NF (GNF)
• Simple form of productions.
• Rules in CNF have both theoretical and practical implications.
(Figure: Chomsky hierarchy T0, T1, T2, T3, with the normal forms CNF and GNF)
Examples
– CNF: used by the CYK membership algorithm to decide whether a string is in the language represented by the grammar.
– GNF: used for conversion from a CFG to an NPDA and vice versa.
CNF
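• In a CFG, the RHS can be any combination of variables (V) and terminals (T)
• Eg
NP → the N is reduced to
DET → the
NP → DET N
• A λ-free CFG is said to be in CNF if all productions are of the form
A → a
B → CD, with A, B, C, D ∈ VN and a ∈ VT.
• If the language is not λ-free, then also include S → λ
• As the 2nd form has exactly two variables on the RHS, CNF grammars are binary grammars (their parse trees are binary)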
GNF
• Productions are of the form
A → ax, where a ∈ VT and x ∈ VN*.
• GNF grammars tend to be long (many productions)
1. λ-free languages
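• Let L be any CF language. If λ ∈ L, a grammar for L (with λ-productions) is obtained from a λ-free grammar for L − {λ} with start symbol S by adding a new start symbol S0 with S0 → S | λ
• λ-productions are of the form A → λ
• A variable A with A ⇒* λ is nullable
• If λ ∉ L(G), the λ-productions can be removed: there is an equivalent grammar G1 having no λ-productions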
STEPS
1. Find the set VN of all nullable variables
  1. For each A → λ, put A in VN
  2. Repeat to add variables to VN:
     for a production B → A1 A2 … An,
     if A1, A2, …, An are all in VN, add B to VN
2. For each A → x1 x2 … xm with m >= 1, put into P1 that production and the productions generated by replacing nullable variables with λ in all combinations (but never add A → λ)
S → ABaC
A → BC
B → b | λ
C → D | λ
D → d
Nullable is {A, B, C}
S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a
A → BC | C | B
B → b
C → D
D → d
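The two steps can be sketched in Python. The encoding is an assumption made for illustration: a grammar is a dictionary from each variable to a list of right-hand sides, each RHS is a tuple of symbols, λ is the empty tuple, and any symbol that is not a key counts as a terminal. Run on the example above, it reproduces the nullable set {A, B, C} and the eight S-productions.

```python
from itertools import product


def nullable_variables(grammar):
    """Step 1: A is nullable if some A-body consists only of nullable symbols."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            if lhs not in nullable and any(all(s in nullable for s in rhs)
                                           for rhs in bodies):
                nullable.add(lhs)      # covers A -> λ, since all(()) is True
                changed = True
    return nullable


def remove_lambda(grammar):
    """Step 2: keep or drop each nullable symbol, in all combinations."""
    nullable = nullable_variables(grammar)
    new = {}
    for lhs, bodies in grammar.items():
        out = []
        for rhs in bodies:
            options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
            for choice in product(*options):
                body = tuple(sym for part in choice for sym in part)
                if body and body not in out:   # never add A -> λ
                    out.append(body)
        new[lhs] = out
    return new


G = {"S": [("A", "B", "a", "C")], "A": [("B", "C")],
     "B": [("b",), ()], "C": [("D",), ()], "D": [("d",)]}
print(sorted(nullable_variables(G)))   # ['A', 'B', 'C']
print(len(remove_lambda(G)["S"]))      # 8
```

As on the slide, this assumes λ ∉ L(G); if S itself were nullable, λ would be lost and S0 → S | λ would have to be added back.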
2. Substitution Rule
G: A → x1 B x2 (A ≠ B) and B → y1 | y2 | … | yn
G1: A → x1 y1 x2 | x1 y2 x2 | x1 y3 x2 | … | x1 yn x2
B → y1 | y2 | … | yn
Then L(G) = L(G1)
Example:
A → a | aaA | abBc
B → abbA | b
Then
A → a | aaA | ababbAc | abbc
B → abbA | b
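A sketch of the substitution rule, assuming a dictionary encoding (each variable maps to a list of tuples of symbols). It replaces the first occurrence of B in each A-body, which is exactly the single x1 B x2 case of the rule; a body with several B's would need the rule applied again.

```python
def substitute(grammar, a, b):
    """Apply the substitution rule: A -> x1 B x2 becomes A -> x1 y x2 for each B -> y."""
    assert a != b, "the rule assumes A and B are different variables"
    new_bodies = []
    for rhs in grammar[a]:
        if b in rhs:
            i = rhs.index(b)                       # first occurrence of B
            for y in grammar[b]:
                new_bodies.append(rhs[:i] + y + rhs[i + 1:])
        else:
            new_bodies.append(rhs)
    new = dict(grammar)                            # the B-productions are kept
    new[a] = new_bodies
    return new


G = {"A": [("a",), ("a", "a", "A"), ("a", "b", "B", "c")],
     "B": [("a", "b", "b", "A"), ("b",)]}
for rhs in substitute(G, "A", "B")["A"]:
    print("A ->", "".join(rhs))
# A -> a, A -> aaA, A -> ababbAc, A -> abbc (one per line)
```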
3. Removing Useless Productions
Productions that do not take part in any derivation of a terminal string are useless
A variable A is useful if S ⇒* xAy ⇒* w for some terminal string w
Useless variables
1. Cannot be reached from S
S → A; A → aA | λ;
B → bA
2. Cannot derive a terminal string
S → A | b
A → aA
1. Identify the variables that can lead to a terminal string
  1. Set V1 (of G1 = (V1, T1, P1, S1)) to ∅
  2. Repeat until no new variable is added:
     for A → x1x2x3…xn with every xi in V1 ∪ T,
     add A to V1
  3. Add to P1 all productions in P whose symbols are all in T ∪ V1
2. Eliminate variables that cannot be reached from S
  1. Draw the dependency graph
  2. A useful variable is reachable from the start symbol (S).
Eg1:
S → aSb | λ | A
A → aA
A is useless
Eg2:
S → A
A → aA | λ
B → bA
1) A can derive a terminal string (A ⇒ λ, and A ⇒* aⁿ), so A is useful
2) B is not reachable from S, so B is useless
Eg3:
1: S → aS | A | C
2: A → a
3: B → aa
4: C → aCb
1) C is useless since it does not derive a terminal string
2) Reachability graph: B is not reachable from S
(Figure: reachability graph over S, A, B)
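Both steps can be sketched in Python under an assumed encoding (each variable maps to a list of tuples of symbols, λ is the empty tuple, and any symbol that is not a key is a terminal). The order matters: non-generating variables are removed first, then reachability is computed on what remains. On Eg3 it removes C and then B; on Eg2 it removes only B.

```python
def remove_useless(grammar, start):
    """Drop non-generating variables, then variables unreachable from start."""
    def ok(symbol, good):
        return symbol not in grammar or symbol in good   # terminal or good var

    # Step 1: variables that can derive a terminal string
    generating = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            if lhs not in generating and any(
                    all(ok(s, generating) for s in rhs) for rhs in bodies):
                generating.add(lhs)
                changed = True
    g1 = {lhs: [rhs for rhs in bodies if all(ok(s, generating) for s in rhs)]
          for lhs, bodies in grammar.items() if lhs in generating}

    # Step 2: variables reachable from the start symbol (dependency graph)
    reachable, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for rhs in g1.get(v, []):
            for s in rhs:
                if s in g1 and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {lhs: bodies for lhs, bodies in g1.items() if lhs in reachable}


eg3 = {"S": [("a", "S"), ("A",), ("C",)], "A": [("a",)],
       "B": [("a", "a")], "C": [("a", "C", "b")]}
print(remove_useless(eg3, "S"))
# {'S': [('a', 'S'), ('A',)], 'A': [('a',)]}
```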
4. Removing Unit Productions
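Productions of the form A → B, with A, B ∈ VN, are unit productions
1. Find all A ⇒* B from a dependency graph
2. Add to P1 all non-unit productions from P
3. If A ⇒* B and B → y1 | y2 | … | yn are the non-unit B-productions, then add
A → y1 | y2 | … | yn
Eg:
S → Aa | B
A → a | bc | B
B → A | bb
(Figure: dependency graph of the unit productions S → B, A → B and B → A)
1. Non-unit rules
S → Aa
A → a | bc
B → bb
2. As S ⇒* B, S → bb is added
As S ⇒* A, S → a | bc is added
As A ⇒* B, A → bb is added
As B ⇒* A, B → a | bc is added
Result:
S → Aa | bb | a | bc
A → a | bc | bb
B → bb | a | bc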
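The three steps can be sketched in Python, assuming the same dictionary-of-tuples encoding (keys are variables, other symbols are terminals). Step 1 follows unit productions as edges of the dependency graph; steps 2 and 3 keep the non-unit rules and copy the non-unit rules of every B with A ⇒* B.

```python
def remove_unit(grammar):
    """Remove unit productions A -> B, where B is a variable."""
    variables = set(grammar)

    def is_unit(rhs):
        return len(rhs) == 1 and rhs[0] in variables

    # Step 1: for each A, every B with A =>* B through unit productions only
    derives = {}
    for a in grammar:
        seen, stack = set(), [a]
        while stack:
            v = stack.pop()
            for rhs in grammar[v]:
                if is_unit(rhs) and rhs[0] != a and rhs[0] not in seen:
                    seen.add(rhs[0])
                    stack.append(rhs[0])
        derives[a] = seen

    # Steps 2 and 3: keep the non-unit A-rules, then copy the non-unit B-rules
    new = {}
    for a in grammar:
        out = [rhs for rhs in grammar[a] if not is_unit(rhs)]
        for b in sorted(derives[a]):
            for rhs in grammar[b]:
                if not is_unit(rhs) and rhs not in out:
                    out.append(rhs)
        new[a] = out
    return new


G = {"S": [("A", "a"), ("B",)], "A": [("a",), ("b", "c"), ("B",)],
     "B": [("A",), ("b", "b")]}
for lhs, bodies in remove_unit(G).items():
    print(lhs, "->", " | ".join("".join(rhs) for rhs in bodies))
# S -> Aa | a | bc | bb
# A -> a | bc | bb
# B -> bb | a | bc
```

Note that S picks up a | bc as well as bb, because S ⇒ B ⇒ A.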
Altogether
1. Remove λ productions
2. Remove Unit Productions.
3. Remove useless productions and nonterminals.
CNF
CNF -- Rules are of the form A → BC or A → a
Eg: S → AS | a; A → SA | b
Steps
1. Eliminate λ, unit and useless productions
2. Add productions of the form A → a or A → BC to P1
3. Consider A → x1x2x3…xn
  1. If n = 1 then x1 should be a terminal (T)
  2. If n >= 2, introduce Ba for each a ∈ T and
    1. Convert A → x1x2…xn to A → B1 B2 … Bn, where Bi = xi if xi is a variable and Bi = Ba if xi = a is a terminal, and add Ba → a to P1
  3. For n > 2 introduce new variables D1, D2, …
    1. Eg. A → BCDE is written as
    2. A → BD1, D1 → CD2, D2 → DE
G:
S → ABa
A → aab
B → Ac
Result:
Na → a
Nb → b
Nc → c
S → AX1
X1 → BNa
A → NaX2
X2 → NaNb
B → ANc
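Steps 2 and 3 can be sketched in Python for a grammar that is already free of λ-, unit and useless productions (dictionary of tuples; keys are variables, other symbols are terminals). The new names Na and X1, X2, … follow the example above and are assumed not to clash with existing variables; on G the sketch yields exactly the result listed.

```python
def to_cnf(grammar):
    """CNF steps 2-3 for a grammar without λ-, unit and useless productions."""
    variables = set(grammar)
    result = {}
    counter = 0

    def add(lhs, rhs):
        bodies = result.setdefault(lhs, [])
        if rhs not in bodies:
            bodies.append(rhs)

    def var_for(symbol):
        # variables stay; each terminal a is replaced by a new variable Na -> a
        if symbol in variables:
            return symbol
        add("N" + symbol, (symbol,))
        return "N" + symbol

    for lhs, bodies in grammar.items():
        for rhs in bodies:
            if len(rhs) == 1:                  # already of the form A -> a
                add(lhs, rhs)
                continue
            syms = [var_for(s) for s in rhs]
            current = lhs
            while len(syms) > 2:               # split off one variable at a time
                counter += 1
                new_var = "X" + str(counter)
                add(current, (syms[0], new_var))
                current, syms = new_var, syms[1:]
            add(current, tuple(syms))
    return result


G = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
for lhs, bodies in to_cnf(G).items():
    print(lhs, "->", " | ".join(" ".join(rhs) for rhs in bodies))
```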
Eg1:
Remove useless symbols
S → AA | CD | bB
A → aA | a
B → bB | bC
C → cB
D → dD | d
Eg 2:
Convert to CNF
A → BD
A → a
S → λ
Eg3:
S → A | ABa | AbA
A → Aa | λ
B → Bb | BC
C → CB | CA | bB
THANK YOU
GNF
• Productions are of the form
A → ax, where a ∈ VT and x ∈ VN*.
• GNF grammars tend to be long (many productions)
• Can be used to construct a PDA that recognises the language of a CFG
• Both GNF and s-grammars require that rules have the form A → ax.
• s-grammars require that the first VT (the leading terminal) of all the A-rules be distinct. GNF does not impose such a restriction.
1. Remove λ productions
2. Remove unit productions.
3. Remove useless productions and nonterminals.
4. Convert to CNF
5. Convert to GNF
Eg1:
S → AB
A → aA | bB | b
B → b
Eg2:
S → abSb | aa
GNF (Eg1)
S → aAB | bBB | bB
A → aA | bB | b
B → b
GNF2 (Eg2)
S → aYSY | aX
X → a
Y → b
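A sketch of two helpers for these examples, assuming the dictionary-of-tuples encoding (keys are variables, other symbols are terminals). `is_gnf` checks the form A → ax; `replace_trailing_terminals` is a hypothetical helper for the Eg2 step, where every terminal after the first position is replaced by a new variable whose name the caller chooses (here X and Y, as in GNF2).

```python
def is_gnf(grammar):
    """True if every production is A -> a x, a terminal, x a string of variables."""
    variables = set(grammar)
    return all(len(rhs) >= 1 and rhs[0] not in variables
               and all(s in variables for s in rhs[1:])
               for bodies in grammar.values() for rhs in bodies)


def replace_trailing_terminals(grammar, names):
    """Replace each terminal after position 0 by names[terminal], adding V -> t."""
    new = {lhs: [rhs[:1] + tuple(s if s in grammar else names[s]
                                 for s in rhs[1:])
                 for rhs in bodies]
           for lhs, bodies in grammar.items()}
    for terminal, var in names.items():
        new[var] = [(terminal,)]
    return new


eg2 = {"S": [("a", "b", "S", "b"), ("a", "a")]}
gnf2 = replace_trailing_terminals(eg2, {"a": "X", "b": "Y"})
print(gnf2)    # {'S': [('a', 'Y', 'S', 'Y'), ('a', 'X')], 'X': [('a',)], 'Y': [('b',)]}
print(is_gnf(eg2), is_gnf(gnf2))   # False True
```

The leading-variable substitutions needed for Eg1 (S → AB becoming S → aAB | bBB | bB) are the substitution rule from earlier and are not repeated here.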
Removing direct left recursions
A → Aα | β, the equivalent is
A → βA’ ; A’ → αA’ | λ
Eg:
S → Sa | b is equivalent to
S → bS’ ; S’ → aS’ | λ
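A sketch of this transformation in Python, assuming the dictionary-of-tuples encoding with λ as the empty tuple; the new variable is named by appending a prime (written as an ASCII apostrophe). It also handles several α and β alternatives at once, and assumes no unit rule A → A.

```python
def remove_direct_left_recursion(grammar, a):
    """A -> A α1 | ... | A αm | β1 | ... | βn becomes
    A -> β1 A' | ... | βn A'  and  A' -> α1 A' | ... | αm A' | λ."""
    alphas = [rhs[1:] for rhs in grammar[a] if rhs[:1] == (a,)]
    betas = [rhs for rhs in grammar[a] if rhs[:1] != (a,)]
    if not alphas:                       # no direct left recursion on A
        return dict(grammar)
    a_prime = a + "'"
    new = dict(grammar)
    new[a] = [beta + (a_prime,) for beta in betas]
    new[a_prime] = [alpha + (a_prime,) for alpha in alphas] + [()]   # () is λ
    return new


print(remove_direct_left_recursion({"S": [("S", "a"), ("b",)]}, "S"))
# {'S': [('b', "S'")], "S'": [('a', "S'"), ()]}
```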
Answer:
S → A | C
A → BA’ | aA’
A’ → aBA’ | aCA’
B → CbB’
B’ → bB’
C → cC | c
1. Grammar G:
2. Removing left recursion:
3. B → AA is out of order
4. Take each rule that does not have a terminal at the start and follow the derivation until a terminal is produced
Answer
Construct GNF for
Answer
CYK algorithm
• Cocke-Younger-Kasami algorithm: recognises context-free languages given by a grammar in CNF
• To prove membership of strings in a CFL
• To construct a possible parse tree.
• Can also process stochastic CFGs, where probabilities are stored in the table.
• Asymptotic time complexity is Θ(n³), where n is the string length
Input string a1 ... an of length n. G is in CNF and has r nonterminals R1 ... Rr; R1 is the start symbol.
P[n, n, r] is an array of booleans initialised to false; P[j, i, A] becomes true iff RA ⇒* the substring of length i starting at aj.
For each i = 1 to n
  For each terminal production Rj → ai, set P[i, 1, j] = true.
For each i = 2 to n -- Length of span
  For each j = 1 to n-i+1 -- Start of span
    For each k = 1 to i-1 -- Partition of span
      For each production RA → RB RC
        if P[j, k, B] and P[j+k, i-k, C]
          then set P[j, i, A] = true
if P[1, n, 1] is true
  then the string is a member of the language
  else the string is not a member of the language
CYK Parsers
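Input string: w = w1w2w3…wn;
wij = wi wi+1 … wj and Vij = { A ∈ V | A ⇒* wij }
w belongs to L iff S ∈ V1n.
Vii is found by examining the RHS of the rules: Vii = { A | A → wi }
For i < j: Vij = { A | A → BC, B ∈ Vik and C ∈ V(k+1)j for some k with i ≤ k < j }
O(n³), where n is the string length
S → AB
A → BB | a
B → AB | b
Check whether G generates the string aabbb
String: aaabbb
String: baaa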
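A minimal Python sketch of the Vij formulation above, assuming the grammar is in CNF and encoded as a dictionary whose bodies are either (terminal,) or (B, C). Indices are 0-based, so V[0][n-1] plays the role of V1n. Applied to the grammar and strings of this slide:

```python
def cyk_table(grammar, w):
    """V[i][j] = set of variables that derive w[i..j] (0-based, inclusive)."""
    n = len(w)
    V = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(w):                        # V_ii from rules A -> a
        V[i][i] = {A for A, bodies in grammar.items() if (a,) in bodies}
    for length in range(2, n + 1):                   # length of span
        for i in range(n - length + 1):              # start of span
            j = i + length - 1
            for k in range(i, j):                    # split w[i..k] | w[k+1..j]
                for A, bodies in grammar.items():
                    for rhs in bodies:
                        if (len(rhs) == 2 and rhs[0] in V[i][k]
                                and rhs[1] in V[k + 1][j]):
                            V[i][j].add(A)
    return V


def member(grammar, start, w):
    """w is in L(G) iff the start symbol is in V_1n (here V[0][n-1])."""
    return len(w) > 0 and start in cyk_table(grammar, w)[0][len(w) - 1]


G = {"S": [("A", "B")], "A": [("B", "B"), ("a",)], "B": [("A", "B"), ("b",)]}
for w in ["aabbb", "aaabbb", "baaa"]:
    print(w, member(G, "S", w))
# aabbb True
# aaabbb True
# baaa False
```

The three nested loops over span length, start and split point give the O(n³) bound (multiplied by the grammar size). The empty string is rejected here, since a CNF grammar without S → λ cannot derive it.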