Context-Free Grammars

Chapter 3
Context-Free Grammars
By
Dr Zalmiyah Zakaria
• Context-Free Grammars and Languages
• Regular Grammars
Formal Definition of
Context-Free Grammars (CFG)
A CFG can be formally defined by a quadruple
(V, Σ, P, S) where:
– V is a finite set of variables (non-terminals)
– Σ (the alphabet) is a finite set of terminal
symbols, where V ∩ Σ = ∅
– P is a finite set of rules (production rules) written
as A → α for A ∈ V, α ∈ (V ∪ Σ)*.
– S is the start symbol, S ∈ V
Formal Definition of CFG
• We can give a formal description to a
particular CFG by specifying each of its
four components, for example,
G = ({S, A}, {0, 1}, P, S) where P
consists of three rules:
S → 0S1
S → A
A → λ
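The four components above can be written down as plain data. The sketch below is one possible encoding, not a standard one: the names V, SIGMA, P, START and is_well_formed are illustrative, every symbol is assumed to be a single character, and the empty string "" stands for λ.

```python
# Hypothetical encoding of G = ({S, A}, {0, 1}, P, S); "" stands for λ.
V = {"S", "A"}
SIGMA = {"0", "1"}
P = {"S": ["0S1", "A"], "A": [""]}
START = "S"


def is_well_formed(variables, sigma, rules, start):
    """Check the conditions of the formal definition of a CFG."""
    # V and Σ must be disjoint, and the start symbol must be a variable.
    if variables & sigma or start not in variables:
        return False
    for lhs, bodies in rules.items():
        # The left-hand side of every rule is a single variable.
        if lhs not in variables:
            return False
        for body in bodies:
            # The right-hand side is a string over V ∪ Σ (possibly empty).
            if any(sym not in variables | sigma for sym in body):
                return False
    return True


print(is_well_formed(V, SIGMA, P, START))  # True
```

Adding "S" to the alphabet, or using a symbol outside V ∪ Σ on a right-hand side, makes the check fail.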
Sept2011
Theory of Computer Science
Context-Free Grammars
• Terminal symbols – elements of the
alphabet
• Variables or non-terminals – additional
symbols used in production rules
• Variable S (start symbol) initiates the
process of generating acceptable
strings.
Terminal or Variable ?
• S → (S) | S + S | S × S | A
• A → 1 | 2 | 3
• The terminal symbols are { (, ), +,
×, 1, 2, 3}
• The variable symbols are S and A
Context-Free Grammars
• A rule is an element of the set
V × (V ∪ Σ)*.
• An A rule:
[A, w] or A → w
• A null rule or lambda rule:
A → λ
Context-Free Grammars
• Grammars are used to generate
strings of a language.
• An A rule can be applied to the
variable A whenever and wherever
it occurs.
• No limitation on applicability of a
rule – it is context free
Context-Free Grammars
• CFGs have no restrictions on the right-hand side of
production rules. All of the following are valid CFG
production rules:
S → aSbS
S → λ
S → ABcDEF
S → c
S → xyz
S → MZK
• Notice that there can be a λ, one or more terminals,
one or more non-terminals and any combination of
terminals and non-terminals on the right-hand side
of a production rule.
Generating Strings with CFG
• Generate a string by applying rules
– Start with the initial symbol
– Repeat:
• Pick any non-terminal in the string
• Replace that non-terminal with the right-hand side of
some rule that has that non-terminal as a left-hand
side
• Repeat until all elements in the string are terminals
• E.g.:
P:
S → uAv
A → w
We can derive the string uwv as:
S ⇒ uAv ⇒ uwv
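The replacement procedure can be sketched in a few lines. This is only an illustration under simple assumptions: each symbol is one character, a step names the variable to replace and the right-hand side to use, and the leftmost occurrence of that variable is the one rewritten. The helper names apply_rule and derive are made up for this sketch.

```python
def apply_rule(form, lhs, rhs):
    """Replace the leftmost occurrence of variable lhs in form with rhs."""
    i = form.find(lhs)
    if i < 0:
        raise ValueError(f"{lhs} does not occur in {form!r}")
    return form[:i] + rhs + form[i + 1:]


def derive(start, steps):
    """Return every sentential form produced by applying steps in order."""
    forms = [start]
    for lhs, rhs in steps:
        forms.append(apply_rule(forms[-1], lhs, rhs))
    return forms


# Rules S → uAv and A → w, applied one after the other.
print(" ⇒ ".join(derive("S", [("S", "uAv"), ("A", "w")])))  # S ⇒ uAv ⇒ uwv
```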
Generating Strings with CFG
P:
S → aS
S → Bb
B → cB
B → λ
• Generating a string:
S
aS       replace S with aS
aBb      replace S with Bb
acBb     replace B with cB
acb      replace B with λ (final string)
Generating Strings with CFG
P:
S → aS
B → cB
S → Bb
B → λ
• Generating a string:
S
aS       replace S with aS
aaS      replace S with aS
aaBb     replace S with Bb
aacBb    replace B with cB
aaccBb   replace B with cB
aaccb    replace B with λ (final string)
Generating Strings with CFG
P:
S → aS
B → cB
S → Bb
B → λ
• Regular expression equivalent to
this CFG: a*c*b
• Shortest string: S ⇒ Bb ⇒ b.
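The claimed equivalence with a*c*b can be spot-checked on short strings. The sketch below enumerates all terminal strings of length at most 4 that the grammar derives and compares them with the strings matched by the regular expression. The encoding (uppercase variables, "" for λ) and the function name language are assumptions of this sketch. The search always expands the leftmost variable and is only guaranteed to stop for grammars like this one, where sentential forms cannot grow without adding terminals.

```python
import re
from collections import deque
from itertools import product

# "" stands for λ; the dictionary keys are the variables.
RULES = {"S": ["aS", "Bb"], "B": ["cB", ""]}


def language(rules, start, max_len):
    """Breadth-first enumeration of derivable terminal strings up to max_len."""
    seen, found = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is terminal.
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:
            found.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            n_terminals = sum(1 for s in new if s not in rules)
            if n_terminals <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


generated = language(RULES, "S", 4)
by_regex = {
    w
    for n in range(5)
    for w in map("".join, product("abc", repeat=n))
    if re.fullmatch(r"a*c*b", w)
}
print(generated == by_regex)  # True
```

Agreement on strings up to length 4 is evidence, not a proof, of the equivalence.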
Derivation
• A derivation is a listing of how a string is generated,
showing what the string looks like after every
replacement.
S → AB
A → aA | λ
B → bB | λ
S ⇒ AB
⇒ aAB
⇒ aAbB
⇒ abB
⇒ abbB
⇒ abb
Derivation
• A terminal string is in the language of
the grammar if it can be derived from
the start symbol S using the rules of
the grammar:
• Example: S → AASB | AAB
A → a
B → bbb
Derivation
Derivation                     Rule Applied
S ⇒ AASB                       S → AASB
⇒ AAAASBB                      S → AASB
⇒ AAAAAASBBB                   S → AASB
⇒ AAAAAAAABBBB                 S → AAB
⇒ aaaaaaaaBBBB                 A → a (applied 8 times)
⇒ aaaaaaaabbbbbbbbbbbb         B → bbb (applied 4 times)
Derivation
• Let G be the grammar:
S → aS | aA
A → bA | λ
• The derivation of aabb is as shown:
S ⇒ aS
⇒ aaA
⇒ aabA
⇒ aabbA
⇒ aabbλ
= aabb
Example
• Let G be the grammar:
S → aSb | ab
• The derivation of aaaabbbb is as shown:
S ⇒ aSb
⇒ aaSbb
⇒ aaaSbbb
⇒ aaaabbbb
• Set notation equivalent to this CFG is
L = {a^n b^n | n > 0}
CFG – Generating strings
• E.g.: Formal description of G = ({S}, {a, b}, P, S)
P:
S → aS | bS | λ
• The following strings can be derived:
S ⇒ λ
S ⇒ aS ⇒ aλ = a
S ⇒ bS ⇒ bλ = b
S ⇒ aS ⇒ aaS ⇒ aaλ = aa
S ⇒ aS ⇒ abS ⇒ abλ = ab
S ⇒ bS ⇒ baS ⇒ baλ = ba
S ⇒ aS ⇒ abS ⇒ abbS ⇒ abbλ = abb
CFG – Generating strings
• E.g. : G = ({S}, {a,b}, P, S)
P: S → aS | bS | λ
• The language above can also be
defined using regular expression:
L(G) = (a + b)*
CFG – Generating strings
• E.g. :
G = ({S, A}, {a, b}, P, S)
P:
S → AA
A → AAA | bA | Ab | a
• The following string can be derived:
S ⇒ AA
⇒ aA          rule [A → a]
⇒ aAAA        rule [A → AAA]
⇒ abAAA       rule [A → bA]
⇒ abaAA       rule [A → a]
⇒ abaAbA      rule [A → Ab]
⇒ abaabA      rule [A → a]
⇒ abaaba      rule [A → a]
• Shortest string?
CFG – Generating strings
G = (V, Σ, P, S), V = {S, A}, Σ = {a, b},
P:
S → AA
A → AAA | bA | Ab | a
• Four distinct derivations of ababaa in G:
(a) S ⇒ AA        (b) S ⇒ AA        (c) S ⇒ AA        (d) S ⇒ AA
      ⇒ aA              ⇒ AAAA            ⇒ Aa              ⇒ aA
      ⇒ aAAA            ⇒ aAAA            ⇒ AAAa            ⇒ aAAA
      ⇒ abAAA           ⇒ abAAA           ⇒ AAbAa           ⇒ aAAa
      ⇒ abaAA           ⇒ abaAA           ⇒ AAbaa           ⇒ abAAa
      ⇒ ababAA          ⇒ ababAA          ⇒ AbAbaa          ⇒ abAbAa
      ⇒ ababaA          ⇒ ababaA          ⇒ Ababaa          ⇒ ababAa
      ⇒ ababaa          ⇒ ababaa          ⇒ ababaa          ⇒ ababaa
(a) & (b) leftmost, (c) rightmost, (d) arbitrary
Context-Free Grammars
S ⇒ aS ⇒ aSa ⇒ aba
• Sentential forms are the strings derivable
from the start symbol of the grammar.
• Sentences are sentential forms that contain only
terminal symbols.
• A set of strings over Σ is a context-free
language if there is a context-free grammar
that generates it.
Derivation Tree, DT
G = ({S, A}, {a, b}, P, S)
P: S → AA
A → AAA | bA | Ab | a
The derivation tree for
S ⇒ AA ⇒ aA ⇒ abA ⇒ abAb ⇒ abab
Derivation Tree
S ⇒ aS ⇒ aSa ⇒ aba
S ⇒ Sa ⇒ aSa ⇒ aba
Derivation Tree
• A CFG G is ambiguous if there exists more
than one derivation tree (DT) for some string w, where w ∈ L(G).
• Example: G = ({S}, {a, b}, P, S)
P: S → aS | Sa | b
The string aba can be derived as:
S ⇒ aS ⇒ aSa ⇒ aba or S ⇒ Sa ⇒ aSa ⇒ aba
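For this particular grammar the number of derivation trees of a string can be counted directly, since a tree either starts with S → aS (consuming a leading a), with S → Sa (consuming a trailing a), or is the single rule S → b. The function count_trees below is a sketch written for this grammar only, not a general ambiguity test.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_trees(w):
    """Number of derivation trees of w under S → aS | Sa | b."""
    if w == "b":
        return 1                         # S → b
    total = 0
    if w.startswith("a"):
        total += count_trees(w[1:])      # S → aS
    if w.endswith("a"):
        total += count_trees(w[:-1])     # S → Sa
    return total


print(count_trees("aba"))  # 2, so the grammar is ambiguous
print(count_trees("ab"))   # 1
```

A count above 1 for any single string is enough to show ambiguity.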
Example
• Let G be the grammar given by
the productions
S → aSa | aBa
B → bB | b
• Then L(G) = {a^n b^m a^n | n > 0, m > 0}
Example
• Let L(G) = {a^n b^m c^m d^(2n) | n ≥ 0, m > 0}
• Then the production rules for this
grammar are:
S → aSdd | A
A → bAc | bc
Example
• A string w is a palindrome if w = w^R
• The set of palindromes over {a, b}
can be derived using rules:
S → a | b | λ
S → aSa | bSb
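The palindrome rules translate almost line for line into a recursive check. The sketch below (the name derivable is illustrative) mirrors the rules and then compares the result with the definition w = w^R on every string over {a, b} of length at most 6.

```python
from itertools import product


def derivable(w):
    """True if w can be derived from S → a | b | λ | aSa | bSb."""
    if len(w) <= 1:
        return w in ("", "a", "b")       # S → a | b | λ
    # S → aSa | bSb: first and last symbols agree, the middle is derived from S.
    return w[0] == w[-1] and w[0] in "ab" and derivable(w[1:-1])


words = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
print(all(derivable(w) == (w == w[::-1]) for w in words))  # True
```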
More Examples
• What is the language of this grammar?
• S → aA
A → aA | bA | b
Language: a(a + b)*b
• S → aA
A → aA | bB
B → bB | λ
Language: aa*bb*
• S → aS | bA
A → bB
B → aB | λ
Language: a*bba*
How to convert RE to CFG
• General rules:
• a + b
S → a | b
• ab
S → aA
A → b
• a*
S → aS | λ
Regular Grammars
• Regular grammars are an important
subclass of context-free grammars that
play an important role in the lexical
analysis and parsing of programming
languages.
• Regular grammars are a subset of CFGs.
• Regular grammars are obtained by
placing restrictions on the form of the
right hand side of the rules.
Definition
• A regular grammar is a CFG in which each rule
has one of the following forms:
1. A → a
2. A → aB
3. A → λ
where A, B ∈ V are non-terminals and a ∈ Σ is a terminal (a may also stand for one or more terminals)
Regular Grammars
• There is at most ONE variable in a
sentential form – the rightmost
symbol in the string.
• Each rule application adds ONE
terminal to the derived string.
Derivation is terminated by rules:
A → a OR
A → λ
Example
• Consider the grammar:
G:
S → abSA | λ
A → Aa | λ
• The equivalent regular grammar:
Gr:
S → aB | λ
B → bS | bA
A → aA | λ
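Whether a grammar is in regular form can be checked mechanically. The sketch below tests the three strict rule shapes A → a, A → aB and A → λ with a single terminal a. The function name is_regular and the string encoding ("" for λ, one character per symbol) are assumptions of this sketch.

```python
def is_regular(rules, variables, terminals):
    """True if every rule has the form A → a, A → aB or A → λ."""
    for bodies in rules.values():
        for rhs in bodies:
            ok = (
                rhs == ""                                        # A → λ
                or (len(rhs) == 1 and rhs in terminals)          # A → a
                or (len(rhs) == 2 and rhs[0] in terminals
                    and rhs[1] in variables)                     # A → aB
            )
            if not ok:
                return False
    return True


G = {"S": ["abSA", ""], "A": ["Aa", ""]}
GR = {"S": ["aB", ""], "B": ["bS", "bA"], "A": ["aA", ""]}

print(is_regular(G, {"S", "A"}, {"a", "b"}))        # False
print(is_regular(GR, {"S", "A", "B"}, {"a", "b"}))  # True
```

G fails because of S → abSA and A → Aa, while every rule of Gr fits one of the three shapes.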
Example
Syntax of Pascal in Backus-Naur Form
<assign> → <var> := <exp>
<var> → A | B | C
<exp> → <var> + <exp>
      | <var> - <exp>
      | (<exp>)
      | <var> * <exp>
      | <var>
Example
Is A := B*(A+C) Syntactically correct?
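<assign> ⇒ <var> := <exp>
⇒ A := <exp>
⇒ A := <var>*<exp>
⇒ A := B*<exp>
⇒ A := B*(<exp>)
⇒ A := B*(<var>+<exp>)
⇒ A := B*(A+<exp>)
⇒ A := B*(A+<var>)
⇒ A := B*(A+C)
The statement can be derived from <assign>, so it is syntactically correct.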
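The same question can be answered by a small recursive-descent recognizer that follows the BNF rules one by one. This is a sketch for this grammar only. The function names parse_exp and is_assign are made up here, and spaces are simply ignored, which is an assumption of the sketch rather than part of the grammar.

```python
def parse_exp(s, i):
    """Return the index just past an <exp> starting at s[i], or -1."""
    if i < len(s) and s[i] == "(":
        # <exp> → (<exp>)
        j = parse_exp(s, i + 1)
        if j != -1 and j < len(s) and s[j] == ")":
            return j + 1
        return -1
    if i < len(s) and s[i] in "ABC":
        # <exp> → <var> + <exp> | <var> - <exp> | <var> * <exp>
        if i + 1 < len(s) and s[i + 1] in "+-*":
            return parse_exp(s, i + 2)
        # <exp> → <var>
        return i + 1
    return -1


def is_assign(text):
    """True if text can be derived from <assign> → <var> := <exp>."""
    s = text.replace(" ", "")
    if len(s) < 4 or s[0] not in "ABC" or s[1:3] != ":=":
        return False
    return parse_exp(s, 3) == len(s)


print(is_assign("A := B*(A+C)"))  # True
print(is_assign("A := (A+B)*C"))  # False
```

The second statement is rejected because the grammar has no rule that lets an operator follow a parenthesized expression.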
How to construct RG from a RE
• Use a new non-terminal for every new
character
• Each loop state turns into a recursive
definition on a non-terminal
• Example: RE ab*ab
RG:
S → aA
A → bA
A → aB
B → b
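A regular grammar can be run like a finite automaton: each variable acts as a state, and a rule A → aB reads a and moves to B. The sketch below uses this idea to compare the grammar with the expression ab*ab on all strings over {a, b} of length at most 6. The function name accepts and the use of None for "derivation finished" are choices made for this sketch.

```python
import re
from itertools import product

# Regular grammar for ab*ab; every right-hand side is "a" or "aB" shaped.
RG = {"S": ["aA"], "A": ["bA", "aB"], "B": ["b"]}


def accepts(rules, start, w):
    """Simulate the grammar on w, tracking which variables may remain."""
    current = {start}                  # None marks a finished derivation
    for ch in w:
        nxt = set()
        for var in current:
            if var is None:
                continue               # nothing left to rewrite
            for rhs in rules[var]:
                if rhs and rhs[0] == ch:
                    nxt.add(rhs[1] if len(rhs) == 2 else None)
        current = nxt
    # Accept if the derivation finished, or a remaining variable has a λ rule.
    return any(v is None or "" in rules[v] for v in current)


words = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
print(all(accepts(RG, "S", w) == bool(re.fullmatch(r"ab*ab", w)) for w in words))  # True
```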
Reg Expression to Reg Grammar
• RE a(a + b)*b
RG:
S → aA
A → aA
A → bA
A → b
• RE ab*a
RG:
S → aA
A → bA
A → a
How to construct RG from a RE (cont.)
• E.g.:
RE: a(a + b)*b
Regular Grammar:
S → aA
A → aA
A → bA
A → b
RE ⇒ RG
• Consider the regular expression:
a⁺b*
• The regular grammar is:
S → aS | aR
R → bR | λ
RE ⇒ RG ⇒ RL
• The regular language L = a⁺b*
can be generated by the grammar:
G = (V, Σ, P, S)
where: V = {S, R}
Σ = {a, b}
P: S → aS | aR
R → bR | λ
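Non-Regular Grammar
• The language L = a⁺b*
can also be generated by the grammar:
G = (V, Σ, P, S)
where: V = {S, A, B}
Σ = {a, b}
P: S → AB
A → aA | a
B → bB | λ
• This is a non-regular grammar (S → AB is not in regular form),
although the language it generates is regular.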
Non-RG to RG
• Since every RG defines a RL and every RL has a
RE, it is often easier to work out the RE first
and then convert it to a RG.
• E.g.: CFG that is not regular:
S → XYX
X → aX | bX | λ
Y → aa | bb
• The equivalent RE for it is: (a + b)*(aa + bb)(a + b)*
• Converted to RG:
S → aS | bS | X
X → aaY | bbY
Y → aY | bY | λ
Example
• Grammar for even-length strings over
{a, b}:
S → aE | bE | λ
E → aS | bS
Design example
L = {0^n 1^n | n ≥ 0}
These strings have recursive structure:
000000111111
0000011111
00001111
000111
0011
01
λ
S → 0S1 | λ
Design examples
L = {0^n 1^n 0^m 1^m | n ≥ 0, m ≥ 0}
These strings have two parts:
L = L1L2
L1 = {0^n 1^n | n ≥ 0}
L2 = {0^m 1^m | m ≥ 0}
rules for L1: S1 → 0 S1 1 | λ
L2 is the same as L1
Allowed strings
010011
00110011
000111
S → AA
A → 0A1 | λ
Design examples
L = {0^n 1^m 0^m 1^n | n ≥ 0, m ≥ 0}
Allowed strings
011001
0011
1100
00110011
These strings have nested structure:
outer part: 0^n 1^n
inner part: 1^m 0^m
S → 0S1 | A
A → 1A0 | λ
Example of RG
• G = ({S, A, B, C}, {a, b}, P, S)
P:
S → aS | aB
B → bC
C → aC | a
• L(G) = {a^m b a^n | m, n ≥ 1}
Example of CFG
• G = ({S, A, B, C}, {a, b}, P, S)
P:
S → aSbb | abb
• L(G) = {a^n b^(2n) | n ≥ 1}