1. Automata - Temple CIS

7. Properties of Context-Free Languages
CIS 5513 - Automata and Formal Languages – Pei Wang
Chomsky normal form
A CFL can be generated by many CFGs
Every CFL (without ɛ) can be generated by a
CFG in Chomsky normal form (CNF), where
each rule is in the form of either A → BC or
A → a, i.e., every variable becomes either
two variables or one terminal
Every CFG can be converted into CNF in
several steps
Removing ɛ-productions
A symbol A is nullable if A ⇒* ɛ, i.e., there is a
production A → ɛ, or A → B1B2 … Bk where
B1, B2, …, Bk are all nullable
If A is nullable, then B → CAD should also produce the
variant B → CD, so that A no longer needs to derive ɛ
in B → CAD
All ɛ-productions can be eliminated by
treating every nullable variable this way
Removing ɛ-productions: example
S → AB
A → aAA | ɛ
B → bBB | ɛ
S, A, and B are all nullable. New grammar:
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
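The two steps above (find the nullable variables, then add the variants with nullable symbols dropped) can be sketched in Python. This is a minimal illustration, not part of the original slides; it assumes every symbol is a single character, uppercase letters are variables, and the empty string "" stands for an ɛ-body. The function names are hypothetical.

```python
from itertools import product

def nullable_symbols(grammar):
    """Return the set of variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in nullable:
                continue
            # A body made only of nullable symbols (including the empty
            # body "") makes its head nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable

def remove_epsilon(grammar):
    """Return a grammar without ɛ-productions (the language loses only ɛ)."""
    nullable = nullable_symbols(grammar)
    result = {}
    for head, bodies in grammar.items():
        new_bodies = set()
        for body in bodies:
            # Each nullable symbol may be kept or dropped; others are kept.
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                variant = "".join(choice)
                if variant:  # never emit an empty body
                    new_bodies.add(variant)
        result[head] = sorted(new_bodies)
    return result

# The grammar from the example slide.
grammar = {"S": ["AB"], "A": ["aAA", ""], "B": ["bBB", ""]}
new_grammar = remove_epsilon(grammar)
print(new_grammar)
```

On the example grammar this yields S → A | AB | B, A → a | aA | aAA, B → b | bB | bBB, the same productions as on the slide up to ordering.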
Removing unit productions
A unit production has the form A → B, where B is a
variable, and (A, B) is a unit pair if A ⇒* B using
only unit productions
Unit productions can be removed by finding all
unit pairs (A, B) and giving A every production of
B that is not a unit production
Removing unit productions: example
I → a | b | Ia | Ib | I0 | I1
F → I | (E)
T → F | T * F
E → T | E + T
changes to
I → a | b | Ia | Ib | I0 | I1
F → a | b | Ia | Ib | I0 | I1 | (E)
T → a | b | Ia | Ib | I0 | I1 | (E) | T * F
E → a | b | Ia | Ib | I0 | I1 | (E) | T * F | E + T
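The expansion used in this example can be sketched as follows. It is an illustrative sketch only, under the assumption that every symbol is one character and that the variables are exactly the keys of the dictionary; `unit_pairs` and `remove_unit` are hypothetical names.

```python
def unit_pairs(grammar):
    """Return all pairs (A, B) with A =>* B using unit productions only."""
    pairs = {(a, a) for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for body in grammar[b]:
                # A body that is a single variable is a unit production.
                if body in grammar and (a, body) not in pairs:
                    pairs.add((a, body))
                    changed = True
    return pairs

def remove_unit(grammar):
    """Give each A every non-unit production of each B with (A, B) a unit pair."""
    result = {a: [] for a in grammar}
    for a, b in sorted(unit_pairs(grammar)):
        for body in grammar[b]:
            if body not in grammar and body not in result[a]:
                result[a].append(body)
    return result

# The expression grammar from the example slide.
grammar = {
    "I": ["a", "b", "Ia", "Ib", "I0", "I1"],
    "F": ["I", "(E)"],
    "T": ["F", "T*F"],
    "E": ["T", "E+T"],
}
new_grammar = remove_unit(grammar)
for head, bodies in new_grammar.items():
    print(head, "->", " | ".join(bodies))
```

The productions printed for each variable match the slide, although their order within a line may differ.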
Removing useless symbols
A symbol X is useful if it is both reachable and
generating, i.e., S ⇒* αXβ ⇒* w for some terminal string w
Removing a useless symbol in a grammar will
not change the language it generates
1. Eliminate nongenerating symbols and all
productions involving such symbols
2. Eliminate unreachable symbols
The order of the above steps matters
Useless symbols: example
Given CFG:
S → AB | a
A → b
B is not generating, so the grammar is
S → a
A → b
Now A is not reachable, so the grammar is
S → a
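The two elimination steps, applied in the stated order, can be sketched as below. This is a hedged illustration with hypothetical function names; it assumes single-character symbols and that a variable with no productions simply does not appear as a key.

```python
def generating_symbols(grammar, terminals):
    """Return all symbols that derive some terminal string."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in gen and any(all(s in gen for s in body) for body in bodies):
                gen.add(head)
                changed = True
    return gen

def reachable_symbols(grammar, start):
    """Return all symbols that appear in some sentential form derived from start."""
    seen = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in grammar.get(head, []):
            for s in body:
                if s not in seen:
                    seen.add(s)
                    stack.append(s)
    return seen

def remove_useless(grammar, terminals, start):
    # Step 1: drop nongenerating symbols and productions that mention them.
    gen = generating_symbols(grammar, terminals)
    step1 = {h: [b for b in bodies if all(s in gen for s in b)]
             for h, bodies in grammar.items() if h in gen}
    # Step 2: drop symbols that are unreachable in what remains.
    reach = reachable_symbols(step1, start)
    return {h: bodies for h, bodies in step1.items() if h in reach}

# The grammar from the example slide; B has no productions at all.
grammar = {"S": ["AB", "a"], "A": ["b"]}
cleaned = remove_useless(grammar, set("ab"), "S")
print(cleaned)
```

Checking reachability first would keep A, because A is still reachable through S → AB before that production is removed; this is why the order matters.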
CFG to Chomsky normal form
Convert a CFG into CNF (not unique):
1. Eliminate ɛ-productions
2. Eliminate unit productions
3. Eliminate useless symbols
4. Change non-CNF productions into
CNF productions, i.e.,
A → BCD becomes A → BE, E → CD
A → Fg becomes A → FG, G → g
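Step 4 can be sketched in Python as follows. This is only an illustration: bodies are tuples of symbol names, the fresh variable names `X1`, `X2`, … and `T_g` are hypothetical and assumed not to clash with existing variables, and steps 1 to 3 are assumed to have been done already.

```python
def to_cnf_bodies(grammar, variables):
    """Rewrite every production as A -> BC or A -> a.

    Assumes no ɛ-productions, unit productions, or useless symbols remain.
    """
    result = {head: [] for head in grammar}
    term_var = {}   # terminal -> fresh variable producing only that terminal
    counter = 0     # used to number the fresh variables X1, X2, ...

    def var_for(terminal):
        if terminal not in term_var:
            name = f"T_{terminal}"          # hypothetical naming scheme
            term_var[terminal] = name
            result[name] = [(terminal,)]
        return term_var[terminal]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                # Already of the form A -> a.
                result[head].append(tuple(body))
                continue
            # In longer bodies, replace each terminal by its own variable.
            symbols = [s if s in variables else var_for(s) for s in body]
            # Break bodies longer than two into a chain of binary rules.
            current = head
            while len(symbols) > 2:
                counter += 1
                fresh = f"X{counter}"
                result.setdefault(current, []).append((symbols[0], fresh))
                current = fresh
                symbols = symbols[1:]
            result.setdefault(current, []).append(tuple(symbols))
    return result

# The two productions used as illustrations on the slide.
variables = {"A", "B", "C", "D", "F"}
grammar = {"A": [("B", "C", "D"), ("F", "g")]}
cnf = to_cnf_bodies(grammar, variables)
print(cnf)
```

Here `X1` plays the role of E on the slide (A → B X1, X1 → C D) and `T_g` plays the role of G (A → F T_g, T_g → g).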
Greibach normal form
Every nonempty CFL without ɛ can be
generated by a grammar in which every
production has the form
A → aα
where a is a terminal, and α is a string of
zero or more variables
This form can be obtained from a PDA with a
single state that accepts by empty stack
Pumping lemma for CFL
A sufficiently long string must be derived by
using the same variable repeatedly in a path of
the parse tree
Pumping lemma for CFL (cont)
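A part of the parse tree can be repeated:
S ⇒* uAy
A ⇒* vAx
A ⇒* w
so z = uvwxy, and u v^i w x^i y is in the language for every i ≥ 0
For every CFL there is a constant n such that any z in it with |z| ≥ n
can be split this way with |vwx| ≤ n and |vx| ≥ 1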
Languages that are not CFL
The pumping lemma can be used to show that
some languages are not CFLs:
• {0^m 1^m 2^m | m ≥ 1}: for the n in the pumping lemma,
pick the word z = 0^n 1^n 2^n = uvwxy. Since |vwx| ≤ n and there
are n 1's in the middle, vwx cannot contain
both 0 and 2, so pumping v and x changes the count of at
most two of the three symbols and produces a word
not in the language
• To prove {ww | w is in {0,1}*} is not a CFL, pump
the word 0^n 1^n 0^n 1^n, then discuss the cases
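The case analysis for 0^n 1^n 2^n can be checked by brute force for one small n. The sketch below is an illustration, not a proof: it tries every split z = uvwxy with |vwx| ≤ n and |vx| ≥ 1 and tests only i = 0 and i = 2. The function names and the choice n = 4 are assumptions made for the example.

```python
def in_012(s):
    """Membership in {0^m 1^m 2^m | m >= 1}."""
    m = len(s) // 3
    return m >= 1 and s == "0" * m + "1" * m + "2" * m

def in_01(s):
    """Membership in {0^m 1^m | m >= 1}, a CFL used for comparison."""
    m = len(s) // 2
    return m >= 1 and s == "0" * m + "1" * m

def pumpable_split_exists(z, n, member):
    """Return True if some split z = uvwxy with |vwx| <= n and |vx| >= 1
    keeps u v^i w x^i y in the language for both i = 0 and i = 2."""
    length = len(z)
    for start in range(length + 1):                              # u = z[:start]
        for end in range(start + 1, min(start + n, length) + 1): # vwx = z[start:end]
            for a in range(start, end + 1):                      # v = z[start:a]
                for b in range(a, end + 1):                      # w = z[a:b], x = z[b:end]
                    u, v, w = z[:start], z[start:a], z[a:b]
                    x, y = z[b:end], z[end:]
                    if not (v or x):
                        continue                                 # need |vx| >= 1
                    if all(member(u + v * i + w + x * i + y) for i in (0, 2)):
                        return True
    return False

n = 4
no_split = pumpable_split_exists("0" * n + "1" * n + "2" * n, n, in_012)
has_split = pumpable_split_exists("0" * n + "1" * n, n, in_01)
print(no_split, has_split)
```

For 0^4 1^4 2^4 no split survives, in line with the argument above, while 0^4 1^4 has one (v = 0 and x = 1 around the middle), as expected for a CFL.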
Closure properties of CFL
CFLs are closed under the operations of
• Substitution (replace a terminal by a CFL)
• Union
• Concatenation
• Closure (* and +)
• Reversal
• Homomorphism
• Inverse homomorphism
Closure properties of CFL (cont.)
CFLs are not closed under complement,
intersection, or difference
Example: {0^n 1^n 2^i | n ≥ 1, i ≥ 1} and
{0^i 1^n 2^n | n ≥ 1, i ≥ 1} are both
CFLs, but their intersection {0^n 1^n 2^n | n ≥ 1} is not
Example: {0,1}* − {ww} is a CFL, but its complement {ww} is not
The intersection or difference of a CFL and a
regular language is a CFL
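The first example can be sanity-checked on short strings. The sketch below enumerates all strings over {0, 1, 2} up to an arbitrarily chosen length of 8 and compares membership in the intersection with membership in {0^n 1^n 2^n | n ≥ 1}. It only illustrates the claim about the intersection; it says nothing about context-freeness. All names are hypothetical.

```python
import re
from itertools import product

BLOCKS = re.compile(r"(0+)(1+)(2+)")

def in_l1(s):
    """Membership in {0^n 1^n 2^i | n >= 1, i >= 1}."""
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m.group(1)) == len(m.group(2))

def in_l2(s):
    """Membership in {0^i 1^n 2^n | n >= 1, i >= 1}."""
    m = BLOCKS.fullmatch(s)
    return bool(m) and len(m.group(2)) == len(m.group(3))

def in_target(s):
    """Membership in {0^n 1^n 2^n | n >= 1}."""
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n

def all_strings(max_len, alphabet="012"):
    """Yield every string over the alphabet of length 0..max_len."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

# Strings on which the intersection and the target language disagree.
mismatches = [s for s in all_strings(8)
              if (in_l1(s) and in_l2(s)) != in_target(s)]
print(mismatches)
```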
Decision properties of CFL’s
[Complexity-related topics will not be covered]
Whether a CFL is empty can be decided by
checking whether the start symbol of its
grammar is generating
Whether a string belongs to a CFL can be
decided using dynamic programming to
incrementally build up the string
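The emptiness test can be sketched as a fixed-point computation of the generating symbols. This is a minimal illustration under the assumption that symbols are single characters and that variables without productions are absent from the dictionary; `is_empty` is a hypothetical name.

```python
def is_empty(grammar, terminals, start):
    """Return True if the grammar generates no terminal string at all."""
    generating = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in generating:
                continue
            # A head is generating if some body uses only generating symbols.
            if any(all(s in generating for s in body) for body in bodies):
                generating.add(head)
                changed = True
    # The language is empty exactly when the start symbol is not generating.
    return start not in generating

terminals = set("ab")
print(is_empty({"S": ["AB", "a"], "A": ["b"]}, terminals, "S"))  # S -> a works
print(is_empty({"S": ["AB"], "A": ["b"]}, terminals, "S"))       # B has no rules
print(is_empty({"S": ["aS"]}, terminals, "S"))                   # never terminates
```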
Testing membership in a CFL
The CYK algorithm: use a CFG in CNF to
incrementally find all variables that generate
the substrings
The triangular table is filled
bottom-up, where X_ij, the set of variables
generating a_i … a_j, comes from X_ik X_(k+1)j for
all k with i ≤ k < j,
according to the grammar
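The table-filling rule can be sketched in Python as below. This is an illustrative sketch rather than the course's own implementation: the grammar is assumed to be in CNF already, bodies are tuples (a 1-tuple for A → a, a 2-tuple for A → BC), indices are 0-based, and the example grammar is a hypothetical one chosen for the demonstration.

```python
def cyk(grammar, start, word):
    """Return True if the CNF grammar derives word from start."""
    n = len(word)
    if n == 0:
        return False  # a CNF grammar as defined here never generates ɛ
    # table[i][j] holds the variables that derive word[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    # Bottom row: substrings of length 1 come from rules A -> a.
    for i, ch in enumerate(word):
        for head, bodies in grammar.items():
            if (ch,) in bodies:
                table[i][i].add(head)
    # Longer substrings: combine table[i][k] and table[k+1][j] for i <= k < j.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for head, bodies in grammar.items():
                    for body in bodies:
                        if (len(body) == 2
                                and body[0] in table[i][k]
                                and body[1] in table[k + 1][j]):
                            table[i][j].add(head)
    return start in table[0][n - 1]

# Hypothetical CNF grammar used only for this demonstration.
grammar = {
    "S": [("A", "B"), ("B", "C")],
    "A": [("B", "A"), ("a",)],
    "B": [("C", "C"), ("b",)],
    "C": [("A", "B"), ("a",)],
}
print(cyk(grammar, "S", "baaba"))
print(cyk(grammar, "S", "bb"))
```

The three nested loops over length, start position, and split point give a running time cubic in the length of the word, times a factor for the size of the grammar.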
Membership decision for CFL