Solution - GUC MET - German University in Cairo

German University in Cairo
Department of Computer Science
Dr. Haythem O. Ismail
CSEN 1003 Compiler, Spring Term 2017
Practice Assignment 3
Discussion: 18.02.17 - 23.02.17
Exercise 3-1
CFG’s
Give a context-free grammar (CFG) for each of the following languages:
a) L = {am bn ck | k = m + n and m, n, k ≥ 0} over the alphabet Σ = {a, b, c}.
Solution:
S → aSc | T
T → bT c | ε
b) L = {am bn | n 6= m} over the alphabet Σ = {a, b}.
Solution:
S → P |T
P → aP b | aP | a
T → aT b | T b | b
Alternative solution:
S
X
A
B
→
→
→
→
AX | XB
aXb | ε
aA | a
bB | b
Note: This language does not accept the empty string because it would imply m = n = 0.
c) L = {w | w is a palindrome } over the alphabet Σ = {a, b, c}. (Note: A palindrome is a
string that reads the same backwards as forwards.)
Solution:
S
→
ε | a | b | c | aSa | bSb | cSc
Exercise 3-2
Parse trees
Cosider the grammar:
0
Some exercises are due to Dr. Carmen Gervet
1
S
A
B
→
→
→
A1B
1A | 0
0B | ε
Give a parse tree for each of the following strings:
a) 11101
Solution:
S
A
B
1
A
1
ε
A
1
A
1
0
b) 1010
Solution:
S
A
1
B
1
A
0
B
ε
0
c) 0100
Solution:
S
A
B
1
0
B
0
0
B
ε
Exercise 3-3
Ambiguous grammars
For each of the following grammars, first show that the grammar is ambiguous, then provide an
equivalent unambiguous grammar.
a)
S
→
1S0 | 1S | ε
2
Solution:
We show that the grammar is ambiguous by providing two different parse trees for the
string:110
S
S
S
1
1
S
1
S
1
0
S
0
ε
ε
An equivalent unambiguous grammar:
b)
S
T
→
→
1S0 | T
1T | ε
S
→
aSbS | aS | ε
Solution:
We show that the grammar is ambiguous by providing two different parse trees for the
string:aab
S
S
a
a
S
b
S
S
a
S
a
ε
S
ε
ε
b
S
ε
An equivalent unambiguous grammar:
S
T
→
→
aSbS | T
aT | ε
Exercise 3-4
Leftmost and rightmost derivations
Consider the following context-free grammar:
S
→
SS+ | SS* | a
and the string: aa+a*
a) Give a leftmost derivation for the string. Show the sequence of derivation rules applied.
Solution:
S ⇒ SS* ⇒ (SS+)S* ⇒ aS+S* ⇒ aa+S* ⇒ aa+a*
b) Give a rightmost derivation for the string. Show the sequence of derivation rules applied.
Solution:
S ⇒ SS* ⇒ Sa* ⇒ (SS+)a* ⇒ Sa+a* ⇒ aa+a*
3
c) Give a parse tree for the string.
Solution:
S
S
S
S
a
a
S
*
a
+
d) Is this grammar ambiguous? Justify your answer.
Solution:
This grammar describes the language of strings in postfix notation with the operand ‘a’.
It is not ambiguous because postfix notation implies a single interpretation of strings.
NOTE: it does not mean that it is suitable for top-down parsing!!
Exercise 3-5
Unambiguous grammars
The following context-free grammar generates prefix expressions with operands 0 and 1 and
binary operators +, -, and *:
S
→
+SS | -SS | *SS | 0 | 1
a) Find leftmost and rightmost derivations together with a parse tree for the string *+-0101.
Solution:
Derivations:
Leftmost
S
⇒*SS
⇒*(+SS)S
⇒*+(-SS)SS
⇒*+-0SSS
⇒*+-01SS
⇒*+-010S
⇒*+-0101
Rightmost
S
⇒*SS
⇒*S1
⇒*(+SS)1
⇒*+S01
⇒*+(-SS)01
⇒*+-S101
⇒*+-0101
Parse tree:
S
S
*
S
1
S
+
-
4
S
S
S
0
1
0
b) Prove that this grammar is unambiguous.
Solution:
This grammar denotes a prefix notation of strings with operands 0, 1 and operation symbols
+, - and *.
In this grammar, the application of each rule generates a string starting with a unique
terminal symbol (*, +, -, 0 or 1). For any string w that belongs to the CFL, when we
consider a leftmost variable E in the leftmost derivation of the string, there is only one
rule that can be used to continue the derivation. This rule is uniquely determined by the
next symbol in w to be derived. So there is only one leftmost derivation for w, hence the
non-ambiguity of the grammar.
Exercise 3-6
Grammar Correctness
a) Consider the CFG G1 :
S −→ 0S11 | 0S111 | ε
Prove that L(G1 ) =
{0m 1n
| 2m ≤ n ≤ 3m and n, m ≥ 0}
Solution:
Proof. We divide the proof into two parts.
Soundness (L(G1 ) ⊆ L1 ). We prove the statement by induction on the length n of Sderivations.
Basis (n = 1). The only S-derivation of length 1 is the derivation s ⇒ ε and ε ∈ L1 .
j
Induction Hypothesis. For some k ∈ N, if s ⇒ w, with j ≤ k, w ∈ L1 .
k+1
Induction Step. Suppose that S ⇒ w. Hence either,
k
S ⇒ 0S11 ⇒ 0u11 = w
or
k
S ⇒ 0S111 ⇒ 0v111 = w
k
k
Thus, S ⇒ u and S ⇒ v. By the induction hypothesis, u ∈ L1 and v ∈ L1 .
Hence, u, v = 0x 1y , for some x, y ∈ N and 2x ≤ y ≤ 3x. It follows that either
w = 0u11 = 0x+1 1y+2 ∈ L1 , or w = 0u111 = 0x+1 1y+3 ∈ L1 .
Completeness (L1 ⊆ L(G1 )). We prove the statement by induction on the length n of
strings in L1 .
Basis (n = 0). The only string of length 0 is ε, and S ⇒ ε. Hence, ε ∈ L1 .
Induction Hypothesis. For some k ∈ N, if w ∈ Σ∗ and |w| ≤ k, then w ∈ L1 implies
that w ∈ LA .
Induction Step. Let w ∈ L1 with |w| = k + 1. By definition of L1 , w = 0m 1p ,
for some m, p ∈ N and 2m ≤ p ≤ 3m. Hence, either w = 0u11, where u =
0m−1 1p−2 Or w = 0u111, where u = 0m−1 1p−3 Thus, u, v ∈ L1 . Moreover, since
|u| = |0m−1 1p−2 | ≤ k + 1and|u| = | 0m−1 1p−3 | ≤ k + 1. Hence, by the induction
∗
∗
hypothesis, u, v ∈ L1 . By definition of L1 , S ⇒ u and S ⇒ v . Thus, the following
is a valid S-derivations:
∗
S ⇒ 0S11 ⇒ 0u11 = w
5
or
∗
S ⇒ 0S111 ⇒ 0v11 = w
2
Thus, w ∈ L1 .
b) Consider the CFG G2 :
S −→ AC
A −→ aAb | ε
C −→ cC | ε
Prove that L(G2 ) = {am bm cn | m, n ≥ 0}
Solution:
First, it should be noted that, since the only S-rule is the rule S → AC, every derivation of a
string w ∈ L(G2 ) is of the form
+
S ⇒ AC ⇒ uv = w
+
+
+
+
where A ⇒ u ∈ Σ∗ and C ⇒ v ∈ Σ∗ . Hence, L(G2 ) = L(GA ) ◦ L(GC ) = {u | A ⇒ u} ◦ {v | C ⇒
v}. To prove that L(G2 ) = {am bm cn | m, n ≥ 0}, it suffices to show that
a) L(GA ) = L1 = {am bm | m ≥ 0} and
b) L(GC ) = L2 = {cn | n ≥ 0}.
Claim 1. L(GA ) = L1
Proof. We divide the proof into two parts.
L(GA ) ⊆ L1 . We prove the statement by induction on the length n of A-derivations.
Basis (n = 1). The only A-derivation of length 1 is the derivation A ⇒ ε and ε ∈ L1 .
j
Induction Hypothesis. For some k ∈ N, if A ⇒ w, with j ≤ k, w ∈ L1 .
k+1
Induction Step. Suppose that A ⇒ w. Hence,
k
A ⇒ aAb ⇒ aub = w
k
Thus, A ⇒ u. By the induction hypothesis, u ∈ L1 . Hence, u = am bm , for some
m ≥ 0. It follows that w = aub = am+1 bm+1 ∈ L1 .
L1 ⊆ L(GA ). We prove the statement by induction on the length n of strings in L1 .
Basis (n = 0). The only string of length 0 is ε, and A ⇒ ε. Hence, ε ∈ LA .
Induction Hypothesis. For some k ∈ N, if w ∈ Σ∗ and |w| ≤ k, then w ∈ L1 implies
that w ∈ LA .
Induction Step. Let w ∈ L1 with |w| = k + 1. By definition of L1 , w = am bm , for some
m ≥ 0. Moreover, since |w| = k + 1, it follows that m ≥ 1. Hence, w = aub, where
u = am−1 bm−1 for some m − 1 ≥ 0. Thus, u ∈ L1 . Moreover, since 2m = k + 1, it
follows that |u| = 2m − 2 = k − 1. Hence, by the induction hypothesis, u ∈ LA . By
∗
definition of LA , A ⇒ u. Thus, the following is a valid A-derivation:
∗
A ⇒ aAb ⇒ aub = w
2
Thus, w ∈ LA .
6
Claim 2. L(GC ) = L2
Proof. We divide the proof into two parts.
L(GC ) ⊆ L2 . We prove the statement by induction on the length n of C-derivations.
Basis (n = 1). The only C-derivation of length 1 is the derivation C ⇒ ε and ε ∈ L2 .
j
Induction Hypothesis. For some k ∈ N, if C ⇒ w, with j ≤ k, w ∈ L2 .
k+1
Induction Step. Suppose that C ⇒ w. Hence,
k
C ⇒ cC ⇒ cu = w
k
Thus, C ⇒ u. By the induction hypothesis, u ∈ L2 . Hence, u = cm , for some m ≥ 0.
It follows that w = cu = cm+1 ∈ L2 .
L2 ⊆ L(GC ). We prove the statement by induction on the length n of strings in L2 .
Basis (n = 0). The only string of length 0 is ε, and C ⇒ ε. Hence, ε ∈ LC .
Induction Hypothesis. For some k ∈ N, if w ∈ Σ∗ and |w| ≤ k, then w ∈ L2 implies
that w ∈ LC .
Induction Step. Let w ∈ L2 with |w| = k + 1. By definition of L2 , w = cm , for some
m ≥ 0. Moreover, since |w| = k + 1, it follows that m ≥ 1. Hence, w = cu, where
u = cm−1 for some m − 1 ≥ 0. Thus, u ∈ L2 . Moreover, since m = k + 1, it follows
that |u| = m − 1 = k. Hence, by the induction hypothesis, u ∈ LC . By definition of
∗
LC , C ⇒ u. Thus, the following is a valid C-derivation:
∗
C ⇒ cC ⇒ cu = w
2
Thus, w ∈ LC .
7