THEORY OF COMPUTATION

UNIT II
CONTEXT FREE LANGUAGES AND PUSH DOWN AUTOMATA
CONTEXT FREE LANGUAGES
Overview
This chapter gives an overview of context-free grammars, derivation trees, ambiguity, and
various normal forms.
Objectives
To learn about CFG
To learn about the derivation trees and their ambiguity
To learn about Chomsky and Greibach Normal forms
To solve problems in CFG
Introduction
In this chapter we present context-free grammars, a more powerful method of describing
languages. Such grammars can describe certain features that have a recursive structure, which
makes them useful in a variety of applications. Context-free grammars were first used in the study
of human languages.
One way of understanding the relationship of terms such as noun, verb, and preposition
and their respective phrases leads to a natural recursion because noun phrases may appear inside
verb phrases and vice versa. Context-free grammars can capture important aspects of these
relationships.
An important application of context-free grammars occurs in the specification and
compilation of programming languages. A grammar for a programming language often appears as
a reference for people trying to learn the language syntax. Designers of compilers and interpreters
for programming languages often start by obtaining a grammar for the language. Most compilers
and interpreters contain a component called a parser that extracts the meaning of a program prior
to generating the compiled code or performing the interpreted execution.
A number of methodologies facilitate the construction of a parser once a context-free
grammar is available. Some tools even automatically generate the parser from the grammar. The
collection of languages associated with context-free grammars is called the context-free languages.
They include all the regular languages and many additional languages.
In this chapter, we give a formal definition of context-free grammars and study the
properties of context-free languages. We also introduce pushdown automata, a class of machines
recognizing the context-free languages. Pushdown automata are useful because they allow us to
gain additional insight into the power of context-free grammars.
Definition of CFG
A context-free grammar is a 4-tuple (V, T, S, P) where
(i) V is a finite set called the variables
(ii) T is a finite set, disjoint from V, called the terminals
(iii) P is a finite set of rules, with each rule being a variable and a string
of variables and terminals, and
(iv) S ∈ V is the start variable.
If u, v and w are strings of variables and terminals, and A → w is a rule of the grammar, we say that
uAv yields uwv, written uAv ⇒ uwv.
Example of CFG
Consider the grammar G = ({S}, {a, b}, R, S).
The set of rules R is
S → aSb | SS | λ
This grammar generates strings such as
abab, aaabbb, and aababb
If we think of a as a left parenthesis ‘(’ and b as a right parenthesis ‘)’, then
L(G) is the language of all strings of properly nested parentheses.
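As a minimal sketch, the grammar can be written down as data and its short strings enumerated. The rule set S → aSb | SS | λ is an assumption here (it is the standard grammar for properly nested parentheses), and the names RULES, language_up_to and is_nested are illustrative only.

```python
from itertools import product

# Assumed rule set for G. Bodies are strings of one-character symbols;
# the empty string "" stands for lambda.
RULES = {"S": ["aSb", "SS", ""]}

def language_up_to(rules, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    lang = {var: set() for var in rules}
    changed = True
    while changed:                      # iterate until nothing new is found
        changed = False
        for var, bodies in rules.items():
            for body in bodies:
                partial = {""}          # strings derivable from the body so far
                for sym in body:
                    choices = lang[sym] if sym in rules else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[var]:
                    lang[var] |= partial
                    changed = True
    return lang[start]

def is_nested(word):
    """True if word is properly nested, reading a as '(' and b as ')'."""
    depth = 0
    for ch in word:
        depth += 1 if ch == "a" else -1
        if depth < 0:
            return False
    return depth == 0

generated = language_up_to(RULES, "S", 6)
expected = {"".join(p) for n in range(7) for p in product("ab", repeat=n)
            if is_nested("".join(p))}
print(generated == expected)
```

The enumeration works bottom-up on string length rather than by expanding sentential forms, so the rules S → SS and S → λ cannot cause it to loop.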
Right-Linear Grammar
In general, productions have the form:
(V ∪ T)+ → (V ∪ T)*
In a right-linear grammar, all productions have one of the two forms:
V → T*V or V → T*
i.e., the left-hand side must be a single variable and the right-hand side consists of any number
of terminals (members of T) optionally followed by a single variable.
Right-Linear Grammars and NFAs
There is a simple connection between right-linear grammars and NFAs, as shown in the
following illustration.
As an example of the correspondence between an NFA and a right-linear grammar, the
following automaton and grammar both describe the set of strings consisting of an even
number of 0’s and an even number of 1’s.
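The correspondence can be sketched in code. The grammar below is an assumed right-linear grammar for this language (the original figure is not reproduced here): S means "even 0's, even 1's", A "odd 0's", B "odd 1's", and C "both odd". The sketch handles only productions of the form X → aY and X → λ; the function names are illustrative.

```python
# Assumed right-linear grammar; "" stands for lambda.
GRAMMAR = {
    "S": ["0A", "1B", ""],
    "A": ["0S", "1C"],
    "B": ["1S", "0C"],
    "C": ["1A", "0B"],
}

def grammar_to_nfa(grammar):
    """Each variable becomes a state; X -> aY becomes a transition X --a--> Y;
    X -> lambda makes X an accepting state."""
    delta = {}
    accepting = set()
    for var, bodies in grammar.items():
        for body in bodies:
            if body == "":
                accepting.add(var)
            else:
                terminal, target = body[0], body[1]
                delta.setdefault((var, terminal), set()).add(target)
    return delta, accepting

def accepts(delta, accepting, start, word):
    """Simulate the NFA by tracking the set of reachable states."""
    current = {start}
    for ch in word:
        current = {t for s in current for t in delta.get((s, ch), set())}
    return bool(current & accepting)

delta, accepting = grammar_to_nfa(GRAMMAR)
print(accepts(delta, accepting, "S", "0110"))
print(accepts(delta, accepting, "S", "010"))
```

A production with several leading terminals, such as X → abY, would need intermediate states; that case is left out to keep the sketch short.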
Left-Linear Grammar
In a left-linear grammar, all productions have one of the two forms:
V → VT*
or V → T*
i.e., the left-hand side must consist of a single variable, and the right-hand side consists of an
optional single variable followed by any number of terminals.
Conversion of Right-Linear Grammar to Left-Linear Grammar
Example:
Construct right- and left-linear grammars for the language L = {a^n b^m : n ≥ 2, m ≥ 3}.
Solution:
Right-Linear Grammar:
Left-Linear Grammar:
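The grammars of this solution are not reproduced here. One possible pair, offered as an assumption rather than as the original answer, is
Right-linear: S → aaA, A → aA | bbbB, B → bB | λ
Left-linear: S → Sb | Abbb, A → Aa | aa
The sketch below checks, for all strings up to length 8, that both grammars generate exactly the strings a^n b^m with n ≥ 2 and m ≥ 3. The helper language_up_to is illustrative.

```python
# Candidate grammars (assumptions); "" stands for lambda.
RIGHT_LINEAR = {"S": ["aaA"], "A": ["aA", "bbbB"], "B": ["bB", ""]}
LEFT_LINEAR = {"S": ["Sb", "Abbb"], "A": ["Aa", "aa"]}

def language_up_to(rules, start, max_len):
    """Return every terminal string of length <= max_len derivable from start."""
    lang = {var: set() for var in rules}
    changed = True
    while changed:
        changed = False
        for var, bodies in rules.items():
            for body in bodies:
                partial = {""}
                for sym in body:
                    choices = lang[sym] if sym in rules else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[var]:
                    lang[var] |= partial
                    changed = True
    return lang[start]

MAX_LEN = 8
target = {"a" * n + "b" * m
          for n in range(2, MAX_LEN + 1)
          for m in range(3, MAX_LEN + 1)
          if n + m <= MAX_LEN}

right_lang = language_up_to(RIGHT_LINEAR, "S", MAX_LEN)
left_lang = language_up_to(LEFT_LINEAR, "S", MAX_LEN)
print(right_lang == target, left_lang == target)
```

Agreement on short strings is evidence, not a proof, that the two grammars are equivalent.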
Derivation Trees:
A ‘derivation tree’ is an ordered tree in which the nodes are labeled with the left sides of
productions and in which the children of a node represent its corresponding right sides.
Definition of a Derivation Tree
Let G = (V, T, S, P) be a CFG. An ordered tree is a derivation tree for G iff it has the
following properties:
Sentential Form
Partial Derivation Tree
If, in the definition of a derivation tree given above, every leaf instead has a label from
V ∪ T ∪ {λ}, the tree is said to be a “partial derivation tree”.
Right Most/Left Most/Mixed Derivation
Example:
A grammar G which is context-free has the productions
Ambiguity:
Sometimes a grammar can generate the same string in several different ways. Such a string
will have several different parse trees and thus several different meanings. This result may be
undesirable for certain applications, such as programming languages, where a given program
should have a unique interpretation. If a grammar generates the same string in several different
ways, we say that the string is derived ambiguously in that grammar.
If a grammar generates some string ambiguously we say that the grammar is ambiguous.
The grammar given by
generates strings having an equal number of a’s and b’s. The string “abab” can be generated from
this grammar in two distinct ways, as shown in the following derivation trees:
Each of the above derivation trees can be turned into a unique rightmost derivation, or into
a unique leftmost derivation. Each leftmost or rightmost derivation can be turned into a unique
derivation tree. These representations are largely interchangeable.
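The passage from a derivation tree to its leftmost derivation can be sketched as follows. The grammar S → aSbS | bSaS | λ is an assumption (the grammar of this example is not reproduced here), and the two trees are one way of deriving “abab” in it. A tree is written as (label, children), terminals are plain strings, and "" stands for λ.

```python
def leaf_yield(tree):
    """Concatenate the leaf labels of a derivation tree from left to right."""
    if isinstance(tree, str):
        return tree
    _, children = tree
    return "".join(leaf_yield(child) for child in children)

def render(form):
    """Show a sentential form; interior nodes are shown by their variable."""
    return "".join(t if isinstance(t, str) else t[0] for t in form)

def leftmost_derivation(tree):
    """List the sentential forms of the leftmost derivation encoded by tree."""
    form = [tree]
    steps = [render(form)]
    while True:
        # index of the leftmost unexpanded variable, if any
        idx = next((i for i, t in enumerate(form) if not isinstance(t, str)), None)
        if idx is None:
            return steps
        _, children = form[idx]
        form[idx:idx + 1] = children
        steps.append(render(form))

EMPTY = ("S", [""])                                  # S -> lambda
TREE_1 = ("S", ["a", EMPTY, "b", ("S", ["a", EMPTY, "b", EMPTY])])
TREE_2 = ("S", ["a", ("S", ["b", EMPTY, "a", EMPTY]), "b", EMPTY])

for tree in (TREE_1, TREE_2):
    print(leaf_yield(tree), " ⇒ ".join(leftmost_derivation(tree)))
```

Both trees yield the same string, but their leftmost derivations differ in the third sentential form, which is what makes the derivation ambiguous.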
Ambiguous Grammars/Ambiguous Languages
Since derivation trees, leftmost derivations, and rightmost derivations are equivalent
notations, the following definitions are equivalent:
Definition: Let G = (N, T, P, S) be a CFG.
A string w ∈ L(G) is said to be “ambiguously derivable” if there are two or more different
derivation trees for that string in G.
Definition: A CFG given by G = (N, T, P, S) is said to be “ambiguous” if there exists at least one
string in L(G) which is ambiguously derivable. Otherwise it is unambiguous. Ambiguity is a
property of a grammar, and it is usually, but not always, possible to find an equivalent
unambiguous grammar. An “inherently ambiguous language” is a language for which no
unambiguous grammar exists.
Example
Show that the grammar S → SbS, S → a is ambiguous.
Solution
In order to show that G is ambiguous, we need to find a string w ∈ L(G) that is ambiguously derivable.
Assume w = abababa.
The two derivation trees for w = abababa are shown below in Fig. (a) and (b).
Therefore, the grammar G is ambiguous.
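The number of derivation trees for a string can be counted directly. The sketch below assumes the grammar is S → SbS | a; the function name count_trees is illustrative. A tree for w either is the single production S → a, or uses S → SbS with some occurrence of b as the root-level b, splitting w into a left part and a right part.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count_trees(w):
    """Number of derivation trees for w under the assumed grammar S -> SbS | a."""
    total = 1 if w == "a" else 0
    for i, ch in enumerate(w):
        if ch == "b":
            # treat this b as the one introduced at the root by S -> SbS
            total += count_trees(w[:i]) * count_trees(w[i + 1:])
    return total

print(count_trees("abababa"))
```

Any count greater than one for a single string is enough to show that the grammar is ambiguous.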
SIMPLIFICATION OF CFG
Simplification of CFG-Introduction
In a Context Free Grammar (CFG), it may not be necessary to use all the symbols in V ∪T,
or all the production rules in P while deriving sentences. Let us try to eliminate symbols and
productions in G which are not useful in deriving sentences. Let G = (V,T, S, P) be a context-free
grammar. Suppose that P contains a production of the form
A → x1Bx2.
Substitution Rule
A production A → x1Bx2 can be eliminated from a grammar if we put in its place the set
of productions in which B is replaced by all strings it derives in one step. For this result, it is
necessary that A and B are different variables.
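A minimal sketch of the substitution rule follows. The example grammar (A → a | aaA | abBc, B → abbA | b) and the function name substitute are assumptions made for illustration; bodies are strings of one-character symbols.

```python
def substitute(rules, a, body, position):
    """Replace the production a -> body, where body[position] is a variable B
    different from a, by one production per B-body."""
    b = body[position]
    if b not in rules or b == a:
        raise ValueError("the symbol at position must be a variable other than a")
    new_rules = {var: list(bodies) for var, bodies in rules.items()}
    new_rules[a].remove(body)
    for b_body in rules[b]:
        candidate = body[:position] + b_body + body[position + 1:]
        if candidate not in new_rules[a]:
            new_rules[a].append(candidate)
    return new_rules

RULES = {"A": ["a", "aaA", "abBc"], "B": ["abbA", "b"]}
SUBSTITUTED = substitute(RULES, "A", "abBc", 2)
print(SUBSTITUTED["A"])
```

After the substitution B no longer occurs in any A-production, although its own productions are still present in the grammar.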
1. Eliminating Useless Productions
2. Empty Production Removal
The productions of context-free grammars can be coerced into a variety of forms without
affecting the expressive power of the grammars. If the empty string does not belong to a language,
then there is a way to eliminate the productions of the form A→ λ from the grammar.
If the empty string belongs to a language, then we can eliminate λ from all productions
save for the single production S → λ. In this case we can also eliminate any occurrences of S from
the right-hand side of productions.
Procedure to find CFG without empty Productions
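The steps of the procedure are not reproduced here. A common formulation, sketched below as an assumption rather than as the original text, first finds the nullable variables (those that derive λ) and then, for each production, adds every variant obtained by dropping some of the nullable variables from its right-hand side, leaving out λ itself. The example grammar and function names are illustrative.

```python
from itertools import product

def nullable_variables(rules):
    """Variables that can derive lambda; "" stands for lambda."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in rules.items():
            if var not in nullable and any(
                    all(sym in nullable for sym in body) for body in bodies):
                nullable.add(var)
                changed = True
    return nullable

def remove_lambda(rules):
    """Return productions without lambda-bodies (lambda itself is dropped
    from the language if the start variable was nullable)."""
    nullable = nullable_variables(rules)
    result = {}
    for var, bodies in rules.items():
        new_bodies = set()
        for body in bodies:
            # each nullable symbol may be kept or dropped
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:
                    new_bodies.add(candidate)
        result[var] = sorted(new_bodies)
    return result

RULES = {"S": ["ABaC"], "A": ["BC"], "B": ["b", ""], "C": ["D", ""], "D": ["d"]}
CLEANED = remove_lambda(RULES)
print(CLEANED)
```

In this example A, B and C are nullable, so the single S-production expands into eight productions.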
3. Unit production removal
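As a hedged sketch of one common way to remove unit productions (the example grammar and the name remove_unit are assumptions): for each variable, collect every variable reachable through unit productions alone, then give the original variable all of their non-unit bodies. Variables are single characters, so a body is a unit production exactly when it is itself a variable name.

```python
def remove_unit(rules):
    """Return an equivalent set of productions with no unit productions."""
    result = {}
    for var in rules:
        # variables reachable from var using unit productions only
        reachable, stack = {var}, [var]
        while stack:
            current = stack.pop()
            for body in rules[current]:
                if body in rules and body not in reachable:
                    reachable.add(body)
                    stack.append(body)
        # keep every non-unit body of every reachable variable
        result[var] = sorted({body for r in reachable
                              for body in rules[r] if body not in rules})
    return result

RULES = {"S": ["Aa", "B"], "B": ["A", "bb"], "A": ["a", "bc", "B"]}
NO_UNITS = remove_unit(RULES)
print(NO_UNITS)
```

In the result, B can no longer be reached from S and could be dropped by a later removal of useless productions.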
Left Recursion Removal
NORMAL FORMS
Two kinds of normal forms, namely Chomsky Normal Form (CNF) and Greibach Normal Form (GNF), are
considered here.
Chomsky Normal Form (CNF)
Any context-free language L that does not contain λ is generated by a grammar in
which all productions are of the form A → BC or A → a, where A, B, C ∈ VN and a ∈ VT.
Procedure to find Equivalent Grammar in CNF
(i) Eliminate the unit productions, and λ-productions if any,
(ii) Eliminate the terminals on the right hand side of length two or more.
(iii) Restrict the number of variables on the right hand side of productions to two.
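Steps (ii) and (iii) can be sketched as follows, assuming step (i) has already been carried out. Bodies are tuples of symbols, every variable appears as a key of the rule table, and the fresh variable names T_a and D1, D2, … are hypothetical and assumed not to clash with existing names.

```python
def to_cnf(rules):
    """Apply steps (ii) and (iii) to a grammar without lambda- or unit productions."""
    cnf = {}
    terminal_var = {}          # terminal -> name of its fresh variable
    counter = 0

    def add(var, body):
        cnf.setdefault(var, []).append(body)

    for var, bodies in rules.items():
        for body in bodies:
            if len(body) == 1:             # A -> a is already in CNF
                add(var, body)
                continue
            # step (ii): replace each terminal a by a variable T_a
            symbols = []
            for sym in body:
                if sym in rules:
                    symbols.append(sym)
                else:
                    terminal_var.setdefault(sym, "T_" + sym)
                    symbols.append(terminal_var[sym])
            # step (iii): split A -> X1 X2 ... Xn into a chain of binary rules
            head = var
            while len(symbols) > 2:
                counter += 1
                fresh = "D%d" % counter
                add(head, (symbols[0], fresh))
                head, symbols = fresh, symbols[1:]
            add(head, tuple(symbols))
    for terminal, name in terminal_var.items():
        add(name, (terminal,))             # T_a -> a
    return cnf

RULES = {"S": [("A", "B", "a")], "A": [("a", "a", "b")], "B": [("A", "c")]}
CNF = to_cnf(RULES)
for var, bodies in CNF.items():
    print(var, "->", " | ".join(" ".join(body) for body in bodies))
```

The resulting grammar has only productions with two variables or a single terminal on the right-hand side.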
Proof:
For Step (i): Apply the following theorem: “Every context free language can be generated by a
grammar with no useless symbols and no unit productions”.
At the end of this step the RHS of any production has a single terminal or two or more symbols.
Let the resulting equivalent grammar be G = (VN, VT, P, S).
For Step (ii): Consider any production of the form
Example
Obtain a grammar in Chomsky Normal Form (CNF) equivalent to the grammar G with
productions P given
Solution
Greibach Normal Form (GNF)
In Chomsky’s Normal Form (CNF), restrictions are put on the length of right sides of a
production, whereas in Greibach Normal Form (GNF), restrictions are put on the positions in
which terminals and variables can appear.
GNF is useful in simplifying some proofs and in constructions such as that of a pushdown
automaton (PDA) accepting the language of a CFG.
Definition: A context-free grammar is said to be in Greibach Normal Form (GNF) if all
productions have the form
A → ax
where a ∈ T and x ∈ V*.
For a grammar in GNF, the RHS of every production has a single terminal followed by a string of
variables.
Every context-free language L without λ can be generated by a grammar in which
every production is of the form A → aα, where A is a variable, a is a terminal, and α is a
(possibly empty) string of variables.
Converting to GNF is complex, even if we start with a grammar in CNF. Roughly, we
expand the first variable of each production until we get a terminal. But there can be cycles,
where we never reach a terminal.
We “short circuit” the process, creating a production that introduces a terminal as the
first symbol of the body and has variables following it to generate all the sequences of variables
that might have been generated on the way to generation of that terminal.
Since each use of a rule introduces exactly one terminal, a string of length n has a
derivation of exactly n steps.
Lemma: Define an A-production to be a production with variable A on the left.
Let G = (V, T, P, S) be a CFG.
Let A → α1Bα2 be a production in P and
B → β1 | β2 | … | βr be the set of B-productions.
Let G1 = (V, T, P1, S) be obtained from G by
deleting the production A → α1Bα2 from P
and adding the productions A → α1β1α2 | α1β2α2 | … | α1βrα2. Then L(G) = L(G1).
Lemma: Let G = (V, T, P, S) be a CFG.
Let A → Aα1 | Aα2 | … | Aαr be the set of A-productions
for which A is the leftmost symbol of the right-hand side.
Let A → β1 | β2 | … | βs be the remaining A-productions.
Let G1 = (V ∪ {B}, T, P1, S) be the CFG formed by adding the variable B to V
and replacing all the A-productions by the productions:
1) A → βi, A → βiB, 1 ≤ i ≤ s
2) B → αi, B → αiB, 1 ≤ i ≤ r
Then L(G1) = L(G).
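The construction of this lemma can be sketched in a few lines. The expression grammar E → E+T | T and the fresh variable name Z are assumptions chosen for illustration; bodies are strings of one-character symbols, and each αi is assumed to be non-empty.

```python
def remove_left_recursion(rules, a, fresh):
    """Replace the left-recursive a-productions, using fresh as the new variable B."""
    alphas = [body[1:] for body in rules[a] if body.startswith(a)]
    betas = [body for body in rules[a] if not body.startswith(a)]
    new_rules = {var: list(bodies) for var, bodies in rules.items()}
    if not alphas:                     # nothing to do: a is not left-recursive
        return new_rules
    # 1) A -> beta_i | beta_i B
    new_rules[a] = betas + [beta + fresh for beta in betas]
    # 2) B -> alpha_i | alpha_i B
    new_rules[fresh] = alphas + [alpha + fresh for alpha in alphas]
    return new_rules

RULES = {"E": ["E+T", "T"], "T": ["a"]}
RESULT = remove_left_recursion(RULES, "E", "Z")
print(RESULT)
```

Here α1 is +T and β1 is T, so the left-recursive production E → E+T is traded for right recursion through Z.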
GNF forms may be obtained directly in some simple cases:
1) P = {S → aSb, S → ab} can be easily changed to P’ = {S → aSB, S → aB,
B → b}. This is done by introducing a new non-terminal in the place of a
terminal, as explained in the conversion to CNF. We note that this
principle is not applied if the terminal occurs as the first symbol on the
right side of a rule. The rules in P’ are in GNF and generate the
language L = {a^n b^n : n ≥ 1}.
2) P = {S → aSa, S → bSb, S → aa, S → bb} may be changed to P’ = {
S → aSA, S → bSB, S → aA, S → bB, A → a, B → b}. Here again, we use the
same principle as in (1). The rules in P’ are in GNF and generate the
language {ww^R : w is in {a,b}+}.
3) Let P = {S → S1S, S → S1, S1 → aS1b, S1 → ab}. Then P’ = {S → aS1BS,
S → aS1B, S1 → aS1B, S1 → aB, S → aBS, S → aB, B → b} is in GNF and
generates the language L = {a^n b^n : n ≥ 1}+, i.e., one or more blocks of the form a^n b^n.
4) Let P = {S → SS, S → aSb, S → ab}. Then P’ = {S → aSBS, S → aSB, S → aBS,
S → aB, B → b} is in GNF and generates the λ-free Dyck set over the alphabet
{a,b}.
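The GNF property and the one-terminal-per-step behaviour can be checked with a short sketch. It uses the grammar P’ of case 1 (S → aSB | aB, B → b); the function names is_gnf and derives are illustrative, and bodies are strings of one-character symbols.

```python
def is_gnf(rules):
    """True if every body is one terminal followed by zero or more variables."""
    return all(
        len(body) >= 1
        and body[0] not in rules
        and all(sym in rules for sym in body[1:])
        for bodies in rules.values() for body in bodies
    )

def derives(rules, start, word):
    """Membership test for a GNF grammar by simulating leftmost derivations.

    Each step rewrites the leftmost variable and consumes exactly one
    terminal, so a word of length n is accepted after exactly n steps.
    """
    stacks = {(start,)}                 # pending variables, leftmost first
    for ch in word:
        next_stacks = set()
        for stack in stacks:
            if not stack:
                continue                # nothing left to expand, input remains
            top, rest = stack[0], stack[1:]
            for body in rules[top]:
                # prune: every pending variable still needs at least one terminal
                if body[0] == ch and len(body) - 1 + len(rest) <= len(word):
                    next_stacks.add(tuple(body[1:]) + rest)
        stacks = next_stacks
    return () in stacks

GNF_RULES = {"S": ["aSB", "aB"], "B": ["b"]}
print(is_gnf(GNF_RULES), derives(GNF_RULES, "S", "aaabbb"), derives(GNF_RULES, "S", "abab"))
```

The tuple of pending variables plays the role of the stack of a pushdown automaton, which is why GNF is convenient for constructing one.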
Example
Convert to Greibach normal form, the grammar
Summary
This chapter covers the concept of context-free languages and the simplification of CFGs. The
ambiguity of CFGs and various normal forms have also been discussed.
Key Terms:
CFG: Context-Free Grammar.
Left-Linear Grammar: All productions are either of the form V → VT* or V → T*
Right-Linear Grammar: All productions are either of the form V → T*V or V → T*
Parsing: Finding a derivation of the string.
Topdown Parsing: Sequence of rules applied in the leftmost derivation
Bottomup Parsing: Sequence of rules applied in a rightmost derivation.
Ambiguous Grammar: A CFG is said to be “ambiguous” if there exists at least one string in the
language of the CFG which is ambiguously derivable. Otherwise it is unambiguous.
Useless Production: A production rule not affecting the language
Unit Production: Any production of a CFG of the form A → B where A, B ∈ V is called a unit production.
Chomsky Normal Form: A CFG is in Chomsky Normal Form if all its productions are of the form
A → BC or A → a, where A, B, C ∈ VN and a ∈ VT. Every context-free language without λ is generated by such a grammar.
Review Questions:
1. Define the term: Context-Free Grammar (CFG).
2. Give an example of a CFG.
3. What do you mean by a right linear grammar?
4. Show the relationship existing between right-linear grammars and NFAs. Give an example.
5. What is a left-linear grammar?
6. Compare right-linear grammar with left-linear grammar.
7. Give some examples of Context-free languages.
8. What are derivation Trees?
9. Define ‘derivation tree’.
10. When is an ordered tree said to be a derivation tree?
11. What do you mean by sentential form?
12. What is a partial derivation tree?
13. Explain (a) Rightmost (b) Leftmost and (c) Mixed derivation.
14. What do you mean by the terms (a) Parsing (b) Ambiguity.
15. What do you mean by exhaustive search parsing?
16. Distinguish between top-down and bottom-up parsing.
17. Define the terms: (a) Ambiguous Grammar (b) Ambiguous Language.
18. What do you mean by inherently ambiguous language?
19. Explain the method of simplifying a CFG.
20. State the substitution rule.
21. How will you eliminate useless productions in a CFG?
22. What do you mean by empty production removal? Explain with an example.
23. State the procedure to find a CFG without λ-productions.
24. What do you mean by unit production removal?
25. What do you mean by left recursion removal?
26. What are the kinds of Normal Forms?
27. What do you mean by Chomsky Normal Form (CNF)?
28. State the procedure to find equivalent grammar in CNF.
29. What do you mean by Greibach Normal Form (GNF)?
30. When is a CFG said to be in GNF?
Exercise