unit -iv formal languages

UNIT IV FORMAL LANGUAGES
Languages and grammars, phrase-structure grammars, four classes of grammars,
pumping lemma for regular languages, context-free languages
4.1 INTRODUCTION
We study computation to learn about the fundamental principles that serve as a basis for practical
applications of computing. Three important structures used in models of computation are grammars,
finite-state machines and Turing machines. Grammars are used to generate the elements, also called
words, of a language. Formal languages, which are generated by grammars, provide models for natural
languages such as English and for programming languages. Grammars are very important in the theory and
construction of compilers. We introduce various types of grammars and study certain relations between
them.
4.2 LEARNING OBJECTIVE

To be capable of writing grammars for a given language and of recognizing the language
generated by a given grammar

To be capable of checking whether a particular word is derivable in a given grammar

To be capable of recognizing the type of a given language, in particular of checking whether a
language is not regular or not context-free.
4.3 BASIC CONCEPTS OF AUTOMATA THEORY
We shall discuss the basic definitions of alphabet, string and language with a few illustrations.
ALPHABET
An alphabet is a finite, nonempty set of symbols (or characters), denoted by $\Sigma$ (read as sigma) or $V$.
Examples:
1. $\Sigma = \{a, b, c, \dots, z\}$ is an alphabet (of all lower-case English letters) with symbols $a, b, c, \dots, z$.
2. $\Sigma = \{0, 1\}$ denotes the binary alphabet with symbols 0 and 1.
3. $\Sigma = \{0, 1, a, b, c\}$ is also an alphabet.
STRINGS:
A finite sequence of symbols (possibly repeated) chosen from some alphabet $\Sigma$ is
called a string (or word) over $\Sigma$.
Examples.
1. Let $\Sigma = \{0, 1\}$. We can form the strings 010, 0110, 111, 01, etc.
2. Let $\Sigma = \{a, e, b, c, t\}$. The strings over $\Sigma$ include bat, cat, bet, eat, bee, beat and so on. But "car" is not a
string over $\Sigma$, since the symbol r does not belong to $\Sigma$.
EMPTY STRING:
A string having no symbols is called the empty string, usually denoted by
$\varepsilon$ (epsilon), $\phi$ (phi) or $\lambda$ (lambda).
Notation :
Lower-case letters near the end of the English alphabet, such as u, v, w, x, y, z, are usually used to
represent strings.
LENGTH OF A STRING
Let $\Sigma$ be an alphabet and let $u$ be a string over $\Sigma$. The length of $u$ is the number of symbols
in $u$ (not necessarily distinct) and is denoted by $|u|$.
For instance, if $\Sigma = \{0, 1\}$, then $u = 0101101$ is a string over $\Sigma$:
u = 0 1 0 1 1 0 1
Here seven symbols, with repetitions, make up the word $u$, so the length of $u$ is
$|u| = |0101101| = 7$.
If $\Sigma = \{a, b, c\}$, then the strings of length 2 are aa, bb, cc, ab, ba, bc, cb, ca and ac.
Clearly, the length of the empty string is 0:
$|\varepsilon| = 0$ (no symbol to count).
NOTATION:
$\Sigma^*$ - the set of all strings over the alphabet $\Sigma$
$\Sigma^+ = \Sigma^* - \{\varepsilon\}$ - the set of all strings over $\Sigma$ excluding the empty string $\varepsilon$
$\Sigma^n$ - the set of strings of length $n$ formed by the symbols in $\Sigma$, with $\Sigma^0 = \{\varepsilon\}$
It is clear that $\Sigma^* = \bigcup_{k \in N} \Sigma^k$, where $N = \{0, 1, 2, \dots\}$, is the collection of all strings of
length 0, length 1, length 2, etc., and $\Sigma^+ = \bigcup_{k \in N,\, k \neq 0} \Sigma^k$.
Example:
Let $\Sigma = \{a, b\}$. The set of all strings of length two formed by the symbols in $\Sigma$ is
$\Sigma^2 = \{aa, bb, ab, ba\}$.
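The sets $\Sigma^k$ can be listed mechanically. The following is a minimal Python sketch; the helper names `strings_of_length` and `strings_up_to` are hypothetical, not part of the text. Since $\Sigma^*$ is infinite, only the finite part $\Sigma^0 \cup \Sigma^1 \cup \dots \cup \Sigma^n$ can actually be built.

```python
from itertools import product


def strings_of_length(sigma, k):
    """Sigma^k: all strings of length k over the alphabet sigma."""
    return {"".join(t) for t in product(sorted(sigma), repeat=k)}


def strings_up_to(sigma, n):
    """Sigma^0 | Sigma^1 | ... | Sigma^n, a finite slice of the infinite Sigma*."""
    result = set()
    for k in range(n + 1):
        result |= strings_of_length(sigma, k)
    return result


sigma = {"a", "b"}
print(sorted(strings_of_length(sigma, 2)))  # ['aa', 'ab', 'ba', 'bb']
print(strings_of_length(sigma, 0))          # {''}: Sigma^0 = {epsilon}
print(len(strings_up_to(sigma, 3)))         # 1 + 2 + 4 + 8 = 15
```

Removing `""` from the result of `strings_up_to` gives the corresponding slice of $\Sigma^+$.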
4.3.1 CONCATENATION OF STRINGS
The main operation on strings is concatenation. Let $\Sigma$ be an alphabet and let $x, y$ be two strings over
$\Sigma$. The concatenation of $x$ and $y$ is the new string $xy \in \Sigma^*$ obtained by putting them together end to end.
Clearly
$\varepsilon x = x\varepsilon = x$.
It is obvious that $|xy| = |x| + |y|$.
Example:
Let $\Sigma = \{0, 1\}$.
Let $x = 00101 \in \Sigma^*$, $|x| = 5$, and
$y = 110101 \in \Sigma^*$, $|y| = 6$.
Then $z = xy = 00101110101$ is a string belonging to $\Sigma^*$, and
$|z| = 11 = 5 + 6 = |x| + |y|$.
SUBSTRING, SUFFIX AND PREFIX OF A STRING:
If $w = xuy$ for some strings $x$ and $y$, then $u$ is called a substring of $w$. A string $u$ is a suffix of $v$ if
$v = xu$ for some string $x$. A string $u$ is a prefix of $v$ if $v = uy$ for some string $y$.
Example:
If $w = aaabbbccd$ is a string over $\Sigma = \{a, b, c, d\}$, then
$u = abb$ is a substring of $w$, whereas $acd$ is not a substring of $w$.
If $u = abb$, then $bb$ is a suffix of $u$ and $ab$ is a prefix of $u$.
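Python's built-in string operations map directly onto these definitions. A minimal sketch follows; the three helper names are hypothetical, and each is a one-line wrapper around a standard `str` method or operator.

```python
def is_prefix(u, v):
    """u is a prefix of v if v = u y for some string y."""
    return v.startswith(u)


def is_suffix(u, v):
    """u is a suffix of v if v = x u for some string x."""
    return v.endswith(u)


def is_substring(u, w):
    """u is a substring of w if w = x u y for some strings x and y."""
    return u in w


# Concatenation is string "+", and lengths add: |xy| = |x| + |y|.
x, y = "00101", "110101"
z = x + y
print(z, len(z) == len(x) + len(y))                     # 00101110101 True

w = "aaabbbccd"
print(is_substring("abb", w), is_substring("acd", w))  # True False
print(is_suffix("bb", "abb"), is_prefix("ab", "abb"))  # True True
```

Note that the empty string is a prefix, a suffix and a substring of every string, which these helpers also report.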
REVERSAL OF A STRING
If $x = a_1 a_2 \dots a_n$ is a string over an alphabet $\Sigma$, then the reversal of $x$ is
$x^R = a_n a_{n-1} \dots a_2 a_1$. Clearly $|x^R| = |x|$. If $x = x^R$, then $x$ is called a palindrome.
Example:
1. $x = abcda$, then $x^R = adcba$ and $|x| = |x^R|$. But $x \neq x^R$, so $x$ is not a palindrome.
2. $x = aba$, then $x^R = aba$, so $x$ is a palindrome of length 3.
3. Concatenation: $x = abd$, $y = hlm$, then $xy = abdhlm$ and $yx = hlmabd$.
Reversal: $x = abd$, $x^R = dba$.
Exponentiation: let $x = abd$; then $x^0 = \varepsilon$, $x^2 = xx = abdabd$, $x^3 = xxx = abdabdabd$.
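Reversal, the palindrome test and exponentiation translate into a few lines of Python. This is a minimal sketch with hypothetical helper names; `power` follows the recursive definition $x^0 = \varepsilon$, $x^n = x\,x^{n-1}$ iteratively.

```python
def reverse(x):
    """x^R: the symbols of x in the opposite order."""
    return x[::-1]


def is_palindrome(x):
    """x is a palindrome when x = x^R."""
    return x == reverse(x)


def power(x, n):
    """x^0 = epsilon (the empty string) and x^n = x x^(n-1) for n > 0."""
    result = ""
    for _ in range(n):
        result = x + result
    return result


print(reverse("abcda"), is_palindrome("abcda"))  # adcba False
print(is_palindrome("aba"))                       # True
print(repr(power("abd", 0)), power("abd", 3))     # '' abdabdabd
```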
4.3.2 LANGUAGE:
A language $L$ over an alphabet $\Sigma$ is a subset of $\Sigma^*$, i.e., $L \subseteq \Sigma^*$. A language may be empty.
Note that $L = \{\varepsilon\}$, a language containing only the empty string, is not the empty language $\emptyset$.
Examples.
Let $\Sigma = \{0, 1\}$.
1. $L_1 = \{x \in \{0,1\}^* : x \text{ ends with a single } 0\}$
2. $L_2 = \{x \in \{a,b\}^* : x = x^R, |x| = 3\}$
3. $L_3 = \{x \in \{0,1\}^* : |x| = 3\} = \{000, 001, 010, 011, 100, 101, 110, 111\}$
4. $L_4$ = the set of all strings of length 4 over $\{a, b, c\}$ that begin with $a$ and end with $b$
$= \{aaab, aabb, aacb, abab, abbb, abcb, acab, acbb, accb\}$
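Finite languages such as these can be enumerated by filtering $\Sigma^k$. A minimal sketch, reading $L_2$ as the palindromes of length 3 over $\{a, b\}$; the helper `all_strings` is hypothetical. It confirms that $L_4$ has exactly nine words, one for each choice of the two middle symbols.

```python
from itertools import product


def all_strings(alphabet, k):
    """All strings of length k over the given alphabet."""
    return ["".join(t) for t in product(alphabet, repeat=k)]


L2 = {x for x in all_strings("ab", 3) if x == x[::-1]}   # x = x^R, |x| = 3
L3 = set(all_strings("01", 3))                            # |x| = 3
L4 = {x for x in all_strings("abc", 4) if x.startswith("a") and x.endswith("b")}

print(sorted(L2))  # ['aaa', 'aba', 'bab', 'bbb']
print(len(L3))     # 8
print(sorted(L4))  # ['aaab', 'aabb', 'aacb', 'abab', 'abbb', 'abcb', 'acab', 'acbb', 'accb']
```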
OPERATION ON LANGUAGES:
Since languages are subsets of $\Sigma^*$, the usual set operations such as union, intersection, difference
and complement can be performed on them.
Consider two languages $L_1$ and $L_2$ over the alphabet $\Sigma$. Then:
1. Union: $L_1 \cup L_2 = \{u : u \in L_1 \text{ or } u \in L_2\}$
2. Intersection: $L_1 \cap L_2 = \{u : u \in L_1 \text{ and } u \in L_2\}$
3. Complementation: $\overline{L} = \Sigma^* - L$, for a language $L$
4. Concatenation: another interesting operation is concatenation.
$L_1 L_2 = \{uv : u \in L_1, v \in L_2\}$, the language consisting of all concatenations of strings in $L_1$ with
strings in $L_2$.
Note:
1. Concatenation is not commutative.
Let $\Sigma = \{a, b\}$, $L_1 = \{a, ab\}$, $L_2 = \{b, abb\}$. Then $L_1 L_2 = \{ab, aabb, abb, ababb\}$ and
$L_2 L_1 = \{ba, bab, abba, abbab\}$. Hence $L_1 L_2 \neq L_2 L_1$.
2. However, concatenation is associative.
For languages $L_1$, $L_2$ and $L_3$ over the alphabet $\Sigma$,
$L_1(L_2 L_3) = (L_1 L_2) L_3$, which implies that products can be written without parentheses.
Powers of languages: given a language $L$, let $L^0 = \{\varepsilon\}$ and $L^n = L L^{n-1}$ for $n > 0$.
Example:
If $\Sigma = \{a, b\}$ and $L = \{a, bb\}$, then $L^0 = \{\varepsilon\}$, $L^1 = L$, $L^2 = LL = \{aa, abb, bba, bbbb\}$.
5. The Kleene closure (or Kleene star) of a language $L$ over $\Sigma$ is defined as $L^* = \bigcup_{i=0}^{\infty} L^i$,
and the positive closure of $L$ (also called Kleene plus) is the language $L^+ = \bigcup_{i=1}^{\infty} L^i$.
Note: $L^* - \{\varepsilon\} \subseteq L^+ \subseteq L^*$. However, $L^+$ need not be equal to $L^* - \{\varepsilon\}$. Consider $\Sigma = \{a\}$ and
$L = \{\varepsilon, a\}$; then $L^+ = L^*$.
One can show that
$L^+ = LL^*$
$L^+ = L^* - \{\varepsilon\}$ if $\varepsilon \notin L$
$L^+ = L^*$ if $\varepsilon \in L$.
PROPERTIES OF CLOSURE OPERATION:
Let $L$, $M$ be two languages over $\Sigma$. Then
1. $(L^* M^*)^* = (L^* \cup M^*)^* = (L \cup M)^*$
2. $L(ML)^* = (LM)^* L$
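The operations on languages can be tried out on small finite languages. A minimal sketch with hypothetical helper names: `concat` and `power` follow the definitions directly, while `star_upto` builds only the words of $L^*$ up to a length bound, since $L^*$ is usually infinite. The last two checks compare both sides of property 2 and the identity $L^+ = LL^* = L^*$ for $\varepsilon \in L$, but only on words up to a fixed length, so they illustrate the statements rather than prove them.

```python
def concat(A, B):
    """A B = {uv : u in A, v in B}."""
    return {u + v for u in A for v in B}


def power(L, n):
    """L^0 = {epsilon}, L^n = L L^(n-1)."""
    result = {""}
    for _ in range(n):
        result = concat(L, result)
    return result


def star_upto(L, max_len):
    """The words of L* with length <= max_len."""
    result, frontier = {""}, {""}
    while frontier:
        longer = {w for w in concat(frontier, L) if len(w) <= max_len}
        frontier = longer - result      # keep only words not seen before
        result |= frontier
    return result


def cut(S, max_len):
    """The words of S with length <= max_len."""
    return {w for w in S if len(w) <= max_len}


L1, L2 = {"a", "ab"}, {"b", "abb"}
print(concat(L1, L2) == concat(L2, L1))    # False: not commutative
print(sorted(power({"a", "bb"}, 2)))       # ['aa', 'abb', 'bba', 'bbbb']

# Property 2, L(ML)* = (LM)*L, compared on all words of length <= 8.
L, M, n = {"a", "ab"}, {"b", "ba"}, 8
lhs = cut(concat(L, star_upto(concat(M, L), n)), n)
rhs = cut(concat(star_upto(concat(L, M), n), L), n)
print(lhs == rhs)                          # True

# With epsilon in E, E+ = E E* coincides with E*.
E = {"", "a"}
print(cut(concat(E, star_upto(E, 5)), 5) == star_upto(E, 5))  # True
```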
HAVE YOU UNDERSTOOD THE CONCEPTS ?
ANSWER THE FOLLOWING:
1. Is $\Sigma = \{0, 1, a, b\}$ an alphabet?
Ans: Yes
2. What is the length of the string abcdefda?
Ans: 8
3. If $x = abcba$ and $y = bcaba$:
i.
What is the concatenation of $y$ with $x$?
Ans: $yx = bcabaabcba$
ii.
Is $x = y$?
Ans: No
iii.
Is $x$ a palindrome?
Ans: Yes
iv.
Is $|y| = |y^R|$? Is $y$ a palindrome?
Ans: $|y| = |y^R| = 5$; $y$ is not a palindrome, since $y^R = abacb \neq y$.
4. What is the language whose strings are $a^2, a^3, a^5, a^7, a^{11}, \dots$?
Ans: $L = \{a^n : n \text{ is a prime number}\}$
4.4 PHRASE-STRUCTURE GRAMMARS (TYPE-0 GRAMMARS)
In this section we learn the notions and definitions of a phrase-structure grammar and a phrase-structure
language, with some examples. From this section on, we will denote the alphabet by $V$.
4.4.1 PHRASE-STRUCTURE GRAMMAR
A phrase-structure grammar (PSG), denoted by $G$, consists of four components,
$G = (V, T, P, S)$ with $V \cap T = \emptyset$, where: $V$ is a finite set of symbols called nonterminals;
$T$ is a finite set of terminal symbols; $P$ is a finite set of productions (rules) of the form $u \to v$, where $u$ and
$v$ are strings over the alphabet $V \cup T$ with the restriction that $u$ is not the empty string; and $S$, a specific
nonterminal, is called the start symbol. Such a grammar is also referred to as an arbitrary grammar or
type-0 grammar.
A string made up of terminals and/or nonterminals is called a sentential form. By a derivation step we mean the
following: if $x$ and $y$ are sentential forms and $u \to v$ is a production, then $u$ is replaced by $v$ in $xuy$; this is
denoted $xuy \Rightarrow xvy$.
Notation: $\Rightarrow$ derives in one step
$\Rightarrow^+$ derives in one or more steps
$\Rightarrow^*$ derives in zero or more steps.
Examples:
1. Consider the grammar $G = (\{S\}, \{a, b, c\}, P, S)$, where $P$ is given by
$S \to \varepsilon$, $S \to aS$, $S \to bS$, $S \to cS$. These productions are also written in the short form
$S \to \varepsilon \mid aS \mid bS \mid cS$.
2. Consider the grammar $G = (\{S, A, B\}, \{a, b\}, P, S)$, where $P$ is given by
$S \to AB$, $A \to \varepsilon$, $A \to aA$, $B \to \varepsilon$, $B \to bB$. These productions are also written in the short form
$S \to AB$, $A \to \varepsilon \mid aA$, $B \to \varepsilon \mid bB$.
Consider
$S \Rightarrow AB \Rightarrow aAB \Rightarrow aaAB \Rightarrow aaB \Rightarrow aabB \Rightarrow aab$. This is written as
$S \Rightarrow^* aab$.
Consider also $S \Rightarrow AB \Rightarrow AbB \Rightarrow Ab \Rightarrow aAb \Rightarrow aaAb \Rightarrow aab$. This shows that the string $aab$ is obtained
by two different derivations.
Note:
When a sentential form contains more than one nonterminal, it may be possible to find different derivations of the
same string.
4.4.2 PHRASE-STRUCTURE LANGUAGE:
A language generated by a phrase-structure grammar is called a phrase-structure language, i.e.,
if $G = (V, T, P, S)$ is a PSG, then
$L(G) = \{w \in T^* : S \Rightarrow^* w\}$ is the phrase-structure language generated by $G$.
Example 1:
Consider a PSG $G = (V, T, P, S)$ with $V = \{S, A\}$, $T = \{a, b\}$,
$P = \{S \to aS, S \to aA, A \to bA, A \to b\}$, i.e., $S \to aS \mid aA$, $A \to bA \mid b$,
consisting of four production rules, with $S$ the start symbol. What is $L(G)$?
Solution:
The grammar works as follows.
Starting with $S$ and applying the rule $S \to aS$ $(n-1)$ times, we get $a^{n-1}S$.
Applying the rule $S \to aA$, we get $a^n A$.
Applying the rule $A \to bA$ $(m-1)$ times, we get the string $a^n b^{m-1} A$.
Finally, applying the rule $A \to b$, we get $a^n b^m$.
Hence the language generated by $G$ is the collection of words $a^n b^m$: at least one $a$ followed by at least
one $b$.
$S \Rightarrow^* a^{n-1} S \Rightarrow a^n A \Rightarrow^* a^n b^{m-1} A \Rightarrow a^n b^m$
$L(G) = \{ab, a^2b, ab^2, a^3b, a^2b^2, ab^3, \dots\}$
$L(G) = \{a^n b^m : n, m \geq 1\}$
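A grammar can be explored mechanically by breadth-first search over sentential forms. The sketch below is a hypothetical helper `generate`, not a procedure from the text; it assumes single-character symbols, uppercase letters as nonterminals and `""` for $\varepsilon$. Forms longer than `max_len + slack` are discarded; for this grammar no rule shortens a form, so `slack = 0` loses nothing and the output is exactly the words of $L(G)$ of length at most 4.

```python
from collections import deque


def generate(rules, start="S", max_len=6, slack=0):
    """Terminal words of length <= max_len derivable from start (breadth-first).

    rules: list of (lhs, rhs) pairs. Uppercase letters are nonterminals, any
    other character is a terminal, and "" stands for the empty string.
    Sentential forms longer than max_len + slack are discarded, so the result
    is complete only if no short word needs a longer intermediate form.
    """
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):    # no nonterminal left
            if len(form) <= max_len:
                words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                          # rewrite each occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len + slack and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


# Example 1: S -> aS | aA, A -> bA | b.
rules = [("S", "aS"), ("S", "aA"), ("A", "bA"), ("A", "b")]
words = generate(rules, max_len=4)
print(sorted(words, key=lambda w: (len(w), w)))
# ['ab', 'aab', 'abb', 'aaab', 'aabb', 'abbb']
```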
Example 2:
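Consider the grammar $G = (V, T, P, S)$, where
$V = \{S, A\}$
$T = \{a, b\}$
$P = \{S \to aAb, A \to aAb, A \to \varepsilon\}$
$S$ is the start symbol.
Here $S \Rightarrow aAb \Rightarrow^* a^n A b^n \Rightarrow a^n b^n$, since every use of $A \to aAb$ adds one $a$ and one $b$.
It can be shown that $L(G) = \{a^n b^n : n \geq 1\}$.
Note:
A language can have more than one grammar. For instance, the grammar $S \to aSb \mid ab$ constructed below also
generates $\{a^n b^n : n \geq 1\}$.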
Example 3:
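Find the language generated by the grammar:
$S \to 0SBA \mid 01A$, $AB \to BA$, $1B \to 11$, $1A \to 10$, $0A \to 00$
Solution:
Label the productions
$P_1: S \to 0SBA$, $P_2: S \to 01A$, $P_3: AB \to BA$, $P_4: 1B \to 11$, $P_5: 1A \to 10$,
$P_6: 0A \to 00$. Starting with $S$:
$S \Rightarrow 01A \Rightarrow 010$ (by $P_2$, then $P_5$), so $010 \in L(G)$.
For $n \geq 2$:
$S \Rightarrow^* 0^{n-1} S (BA)^{n-1}$ (by $P_1$, $n-1$ times)
$\Rightarrow 0^n 1 A (BA)^{n-1}$ (by $P_2$)
$\Rightarrow^* 0^n 1 B^{n-1} A^n$ (by $P_3$, moving every $B$ to the left of every $A$)
$\Rightarrow^* 0^n 1^n A^n$ (by $P_4$, $n-1$ times)
$\Rightarrow 0^n 1^n 0 A^{n-1}$ (by $P_5$)
$\Rightarrow^* 0^n 1^n 0^n$ (by $P_6$, $n-1$ times)
$\therefore L(G) = \{0^n 1^n 0^n : n \geq 1\}$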
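Every production of this grammar is non-contracting ($|u| \leq |v|$), so lengths never decrease along a derivation and a search that discards forms longer than the target length still finds every short word. The sketch below reuses the same hypothetical `generate` helper (single-character symbols, uppercase nonterminals) to list all words of length at most 9; the result agrees with $\{0^n 1^n 0^n : n \geq 1\}$ on that range, which is evidence for the claim rather than a proof.

```python
from collections import deque


def generate(rules, start="S", max_len=6, slack=0):
    """Terminal words of length <= max_len derivable from start (breadth-first).

    Uppercase letters are nonterminals; forms longer than max_len + slack are
    discarded, which is exact for non-contracting grammars with slack = 0.
    """
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            if len(form) <= max_len:
                words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                          # rewrite each occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len + slack and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


# P1 .. P6 of Example 3; note that the left-hand sides AB, 1B, 1A, 0A have length 2.
rules = [("S", "0SBA"), ("S", "01A"), ("AB", "BA"),
         ("1B", "11"), ("1A", "10"), ("0A", "00")]
print(sorted(generate(rules, max_len=9), key=len))
# ['010', '001100', '000111000']
```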
Note:
Given a grammar $G$, we have seen how to find $L(G)$.
The natural question that arises is: given a (phrase-structure) language $L$, how do we construct a
corresponding grammar $G$ with $L(G) = L$?
Example 1:
Let $L = \{a^n b^n : n \geq 1\}$ be a language over $\Sigma = \{a, b\}$. Construct a grammar
$G = (V, T, P, S)$ such that $L = L(G) = \{a^n b^n : n \geq 1\}$.
Solution:
Define $G = (\{S\}, \{a, b\}, P, S)$, where $P$ consists of $S \to aSb$, $S \to ab$. Then $S \Rightarrow ab$, which implies
$ab \in L(G)$. Also $S \Rightarrow^* a^{n-1} S b^{n-1} \Rightarrow a^{n-1} ab\, b^{n-1}$, i.e., $S \Rightarrow^* a^n b^n$. Therefore $a^n b^n \in L(G)$.
Conversely, let $w \in L(G)$ and assume that $S \Rightarrow w$ in $n$ steps. If $n = 1$, then $w = ab$. If $n > 1$, then the
production $S \to aSb$ is used in the first $(n-1)$ steps, while the production $S \to ab$ is used in the final ($n$th)
step. It follows that $w = a^n b^n$. Thus, if $w \in L(G)$, then $w = a^n b^n$ for some $n \geq 1$. Hence
$L = L(G) = \{a^n b^n : n \geq 1\}$.
In particular, the derivation of $aabb$ $(= a^2 b^2)$ is $S \Rightarrow aSb \Rightarrow aabb$.
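The two halves of this proof can be mirrored in code: the derivation $S \Rightarrow^* a^{n-1} S b^{n-1} \Rightarrow a^n b^n$ as a sequence of sentential forms, and membership as the check that $w = a^k b^k$ with $k \geq 1$. A minimal sketch; both function names are hypothetical.

```python
def derivation_anbn(n):
    """Sentential forms of S => aSb => ... => a^n b^n for S -> aSb | ab (n >= 1)."""
    forms = ["S"]
    for _ in range(n - 1):                          # S -> aSb, (n - 1) times
        forms.append(forms[-1].replace("S", "aSb"))
    forms.append(forms[-1].replace("S", "ab"))      # S -> ab in the final step
    return forms


def in_language(w):
    """Membership in {a^n b^n : n >= 1}: w = a^k b^k with k >= 1."""
    k = len(w) // 2
    return k >= 1 and w == "a" * k + "b" * k


print(" => ".join(derivation_anbn(2)))   # S => aSb => aabb
print(" => ".join(derivation_anbn(3)))   # S => aSb => aaSbb => aaabbb
print(in_language("aabb"), in_language("abab"), in_language("aab"))  # True False False
```

The derivation of $a^n b^n$ has $n + 1$ sentential forms, i.e., exactly $n$ steps, as in the proof.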
Example 2:
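Construct a suitable grammar for $L = \{a^n b^{2n} : n \geq 1\}$.
Solution:
Let $G = (V, T, P, S)$ generate $L$.
Here $L = \{abb, aabbbb, \dots\}$: in every word, the number of $b$'s is twice the number of $a$'s. Take
$G = (\{S\}, \{a, b\}, \{S \to abb, S \to aSbb\}, S)$.
Each use of $S \to aSbb$ adds one $a$ and two $b$'s, and $S \to abb$ ends the derivation, so
$S \Rightarrow^* a^{n-1} S b^{2(n-1)} \Rightarrow a^n b^{2n}$.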
4.4.3 COMBINATION OF GRAMMARS:
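To write a grammar for the language $L = \{\varepsilon, a, b, aa, bb, aaa, bbb, \dots, a^n, b^n, \dots\}$:
It is clear that $L$ is the union of the languages $M = \{\varepsilon, a, aa, aaa, \dots, a^n, \dots\}$ and
$N = \{\varepsilon, b, bb, bbb, \dots, b^n, \dots\}$.
Grammar for $M$: $A \to \varepsilon \mid aA$
Grammar for $N$: $B \to \varepsilon \mid bB$
The grammar for $L = M \cup N$ is $S \to A \mid B$, $A \to \varepsilon \mid aA$, $B \to \varepsilon \mid bB$.
UNION RULE:
In general, let $M$ and $N$ be two languages whose grammars have disjoint sets of nonterminals (rename
them if necessary), and let $A$ and $B$ be the start symbols of the grammars for $M$ and $N$. Then a grammar for
$M \cup N$ is obtained by taking all their productions together with a new start symbol $S$ and the two
productions $S \to A \mid B$.
PRODUCT RULE:
Similarly, a grammar for the language $MN$ is obtained with the new start production $S \to AB$.
To write a grammar for the language $L = \{a^m b^n : m, n \text{ are non-negative integers}\}$:
It is clear that $L$ is the product of the two languages $M = \{\varepsilon, a, aa, aaa, \dots, a^n, \dots\}$ and
$N = \{\varepsilon, b, bb, bbb, \dots, b^n, \dots\}$.
Grammar for $M$: $A \to \varepsilon \mid aA$
Grammar for $N$: $B \to \varepsilon \mid bB$
The grammar for $L = MN$ is $S \to AB$, $A \to \varepsilon \mid aA$, $B \to \varepsilon \mid bB$.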
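The union and product rules are purely syntactic, so they can be written as operations on grammars. In this minimal sketch a grammar is a hypothetical pair `(start, rules)`; `union` and `product` assume the two grammars already use disjoint nonterminals, as the rules require. The same hypothetical `generate` helper lists the short words of each result; in these grammars a form holds at most two nonterminals and terminals are never erased, so `slack = 2` loses nothing even though $A \to \varepsilon$ and $B \to \varepsilon$ shorten forms.

```python
from collections import deque


def generate(rules, start="S", max_len=6, slack=0):
    """Terminal words of length <= max_len derivable from start (breadth-first).

    Uppercase letters are nonterminals, "" is the empty string, and forms
    longer than max_len + slack are discarded.
    """
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            if len(form) <= max_len:
                words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len + slack and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


M = ("A", [("A", ""), ("A", "aA")])     # M = {a^n : n >= 0}, start symbol A
N = ("B", [("B", ""), ("B", "bB")])     # N = {b^n : n >= 0}, start symbol B


def union(g1, g2, start="S"):
    """Union rule: new start symbol with start -> A | B."""
    (a, r1), (b, r2) = g1, g2
    return start, [(start, a), (start, b)] + r1 + r2


def product(g1, g2, start="S"):
    """Product rule: new start symbol with start -> AB."""
    (a, r1), (b, r2) = g1, g2
    return start, [(start, a + b)] + r1 + r2


s, rules = union(M, N)
print(sorted(generate(rules, start=s, max_len=3, slack=2)))
# ['', 'a', 'aa', 'aaa', 'b', 'bb', 'bbb']
s, rules = product(M, N)
print(sorted(generate(rules, start=s, max_len=3, slack=2)))
# ['', 'a', 'aa', 'aaa', 'aab', 'ab', 'abb', 'b', 'bb', 'bbb']
```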
Examples:
1. Let $\Sigma = \{a, b\}$ and let $L$ be the set of all palindromes over $\{a, b\}$. Find a grammar $G$ such
that $L = L(G)$.
Solution:
Define $G = (\{S\}, \{a, b\}, P, S)$, where $P$ consists of the productions
$S \to aSa$, $S \to bSb$, $S \to a$, $S \to b$, $S \to \varepsilon$. A palindrome of even length can be derived using
$S \to aSa$, $S \to bSb$ and $S \to \varepsilon$. If $w$ is a palindrome of odd length with centre symbol $a$, then the
productions $S \to aSa \mid bSb$ followed by $S \to a$ derive $w$. When the centre symbol is $b$, we use
$S \to b$ as the final production instead. Thus $G$ is the required grammar.
2. Find a grammar generating $L = \{a^n b^n c^m : n \geq 1, m \geq 0\}$.
Solution:
Define $G = (\{S, A\}, \{a, b, c\}, P, S)$, where $P$ consists of $S \to A$, $A \to ab$, $A \to aAb$, $S \to Sc$.
$S \Rightarrow A \Rightarrow^* a^{n-1} A b^{n-1} \Rightarrow a^{n-1} ab\, b^{n-1}$, i.e., $a^n b^n \in L(G)$. Also
$S \Rightarrow^* S c^m \Rightarrow A c^m \Rightarrow^* a^n b^n c^m$. So $a^n b^n c^m \in L(G)$.
Hence $L \subseteq L(G)$.
Conversely, let $w \in L(G)$ and let $S \Rightarrow w$ in $k$ steps. In the first step either the production $S \to A$ or the
production $S \to Sc$ is used. As $w \in \{a, b, c\}^*$, the production $A \to ab$ must be used in the last step. It
follows that $k \geq 2$. The production $S \to Sc$ cannot be used in all the steps, since $S$ can be eliminated only by
using $S \to A$. If the production $S \to A$ is used in the $(m+1)$th step for some $0 \leq m \leq k-2$, then
$S \to Sc$ has been used in the first $m$ steps, the nonterminal $S$ does not appear in subsequent steps, and only the
production $A \to aAb$ is used in steps $m+2, m+3, \dots, k-1$. Hence $w = a^n b^n c^m$ with $n = k - m - 1 \geq 1$.
Thus $L(G) \subseteq L$, and hence $L = L(G)$.
3. Construct a grammar $G$ for the language $L = \{a^n b a^m : n, m \geq 1\}$.
Solution:
Define $G = (\{S, A\}, \{a, b\}, P, S)$, where $P$ consists of $S \to aS$, $S \to abA$, $A \to aA$, $A \to a$. We first show
that $S \Rightarrow^* a^n b a^m$ for all $n, m \geq 1$.
$S \Rightarrow abA \Rightarrow aba$. Hence $S \Rightarrow^* aba$.
For $n \geq 2$:
$S \Rightarrow aS$ (by applying $S \to aS$ once)
$\Rightarrow^* a^{n-1} S$ (by applying $S \to aS$ $(n-2)$ more times)
$\Rightarrow a^n b A$ (by $S \to abA$)
$\Rightarrow^* a^n b a^{m-1} A$ (by applying $A \to aA$ $(m-1)$ times)
$\Rightarrow a^n b a^{m-1} a = a^n b a^m$ (by applying $A \to a$ finally).
Hence $S \Rightarrow^* a^n b a^m$ for $m, n \geq 1$.
Conversely, let $w \in L(G)$ and let $S \Rightarrow w$ in $k$ steps. As $w \in \{a, b\}^*$, it follows that $k \geq 2$. The production
$S \to abA$ must be used in some step; otherwise $S$, a variable, would remain till the end. When the production
$S \to abA$ is used, $S$ is eliminated, and only the productions $A \to aA$ and $A \to a$ can be used in
subsequent steps. As the production $A \to a$ eliminates the variable $A$, it can be used only at the last step.
Thus, in deriving the word $w$:
the production $S \to abA$ is used in the $n$th step for some $n$, $1 \leq n \leq k-1$;
the production $S \to aS$ is used in the first $n-1$ steps;
the production $A \to aA$ is used in the $(n+1)$th to $(k-1)$th steps;
the production $A \to a$ is used in the last step.
Thus
$S \Rightarrow^* a^{n-1} S$ in the $(n-1)$th step
$\Rightarrow a^n b A$ in the $n$th step, using $S \to abA$
$\Rightarrow^* a^n b a^{k-n-1} A$ in the $(k-1)$th step
$\Rightarrow a^n b a^{k-n}$ in the $k$th step.
So $w = a^n b a^m$ with $m = k - n \geq 1$.
Thus $L(G) = \{a^n b a^m : n, m \geq 1\}$.
4. Construct a grammar for the language $L = \{a^i b^j : i > j > 0\}$.
Solution:
Define $G = (\{S, A\}, \{a, b\}, P, S)$, where $P$ consists of $S \to aS$, $S \to aA$, $A \to aAb$, $A \to ab$. The
construction becomes clear once we note that $a^i b^j = a^{i-j} a^j b^j$. The productions $A \to aAb$ and $A \to ab$ can be used to
generate $a^j b^j$, and the productions $S \to aS$, $S \to aA$ generate the prefix $a^{i-j}$. Now
$S \Rightarrow^* a^{i-j-1} S$ (by applying $S \to aS$ $(i-j-1)$ times)
$\Rightarrow a^{i-j-1} a A$ (by $S \to aA$)
$\Rightarrow^* a^{i-j} a^{j-1} A b^{j-1}$ (by applying $A \to aAb$ $(j-1)$ times)
$\Rightarrow a^{i-j} a^j b^j$ (by applying $A \to ab$ finally).
Hence $a^i b^j \in L(G)$ when $i > j > 0$. The reverse inclusion can be proved as in the earlier examples.
5. Obtain a grammar for the language $L = \{a^m b^n : m \neq n, n > 0\}$.
Solution:
Before constructing the grammar, we note that $L = L_1 \cup L_2 \cup L_3$, where $L_1 = \{a^m b^n : m > n \geq 1\}$,
$L_2 = \{a^m b^n : n > m \geq 1\}$ and $L_3 = \{b^n : n \geq 1\}$. $L_1$ is the language of the previous problem. Now let
us define a grammar $G$ generating the given language $L$:
$G = (\{S, S_1, S_2, S_3, A\}, \{a, b\}, P, S)$, where $P$ consists of
$S \to S_1$, $S \to S_2$, $S \to S_3$ ………… (1)
$S_1 \to aS_1$, $S_1 \to aA$, $A \to aAb$, $A \to ab$ ………… (2)
$S_2 \to S_2 b$, $S_2 \to Ab$ ………… (3)
$S_3 \to bS_3$, $S_3 \to b$ ………… (4)
By the previous example, $S_1 \Rightarrow^* w$ if and only if $w \in L_1$. The productions in (3), together with
$A \to aAb$ and $A \to ab$, generate exactly the words $a^m b^m b^k$ with $m, k \geq 1$, i.e., the words of $L_2$. Hence
$S_2 \Rightarrow^* w$ if and only if $w \in L_2$. Obviously $S_3 \Rightarrow^* w$ if and only if $w \in L_3$. By (1), $S \Rightarrow S_1$, $S \Rightarrow S_2$, $S \Rightarrow S_3$. Hence
$L(G) = L_1 \cup L_2 \cup L_3 = L$.
6. Find the language generated by the grammar $G = (V, T, P, S)$, where $V = \{S, A\}$, $T = \{a, b\}$,
$P = \{S \to aS, S \to aAb, A \to bAa, A \to ba\}$.
Solution:
We show that $L(G) = \{a^m a b^n a^n b : m \geq 0, n \geq 1\}$.
$S \Rightarrow aAb \Rightarrow^* ab^{n-1} A a^{n-1} b \Rightarrow ab^n a^n b$, using $S \to aAb$ once, $A \to bAa$ $(n-1)$ times and finally
$A \to ba$ once. Hence $ab^n a^n b \in L(G)$. Also $S \Rightarrow^* a^m S$ by applying $S \to aS$ $m$ times, where
$m \geq 0$. So $S \Rightarrow^* a^m S \Rightarrow^* a^m a b^n a^n b$.
Hence $\{a^m a b^n a^n b : m \geq 0, n \geq 1\} \subseteq L(G)$.
The reverse inclusion can be proved similarly.
7. Construct a grammar $G$ such that $L(G) = \{w \in \{a, b\}^+ : w \text{ has an equal number of } a\text{'s and } b\text{'s}\}$.
Solution:
Define $G = (\{S, A, B\}, \{a, b\}, P, S)$, where $P$ consists of
$S \to aB \mid bA$, $A \to aS \mid bAA \mid a$, $B \to bS \mid aBB \mid b$.
To prove that $G$ generates the given language, we prove the following by induction on $|w|$, for nonempty $w$:
i. $S \Rightarrow^* w$ if and only if $w$ consists of an equal number of $a$'s and $b$'s.
ii. $A \Rightarrow^* w$ if and only if $w$ consists of one more $a$ than the number of $b$'s.
iii. $B \Rightarrow^* w$ if and only if $w$ consists of one more $b$ than the number of $a$'s.
(i), (ii) and (iii) are true when $|w| = 1$: $A \to a$, and $a$ is the only string of length one derivable from $A$.
Similarly, $b$ is the only string of length one that can be derived from $B$. Also, no string of length one
is derivable from $S$. Thus there is a basis for the induction.
Assume (i), (ii) and (iii) to be true for all strings of length less than $k$, and let $|w| = k$.
We prove the 'only if' part of (i). Let $S \Rightarrow^* w$. Then the first production has to be either $S \to aB$ or
$S \to bA$. If the first production is $S \to aB$, then $S \Rightarrow aB \Rightarrow^* w$. Hence $w = aw_1$, $B \Rightarrow^* w_1$ and
$|w_1| = k - 1$. By the induction hypothesis, (iii) is true for $w_1$. This means $w_1$ has one more $b$ than the
number of $a$'s. Hence $w = aw_1$ has an equal number of $a$'s and $b$'s. The proof is similar if the first
production is $S \to bA$.
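The seven constructions above can be spot-checked by brute force. The sketch below uses a hypothetical breadth-first helper `generate` (not from the text) and compares, for every word of length at most 6, what each grammar derives with the language claimed for it. Assumptions: single-character symbols, uppercase letters as nonterminals and `""` for $\varepsilon$; Example 4 is read as $i > j > 0$, Example 5 as $\{a^m b^n : m \neq n, n > 0\}$ with $S_1 \to aA$, and $S_1, S_2, S_3$ are renamed X, Y, Z. Only the palindrome grammar has a shortening rule ($S \to \varepsilon$), which is why it alone gets `slack = 1`. A finite check like this is evidence for, not a proof of, $L(G) = L$.

```python
from collections import deque
from itertools import product


def generate(rules, start="S", max_len=6, slack=0):
    """Terminal words of length <= max_len derivable from start (breadth-first)."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            if len(form) <= max_len:
                words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len + slack and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


N = 6  # compare all words of length <= N
CHECKS = {  # name: (rules, slack, claimed language restricted to length <= N)
    "1. palindromes": (
        [("S", "aSa"), ("S", "bSb"), ("S", "a"), ("S", "b"), ("S", "")], 1,
        {"".join(t) for k in range(N + 1) for t in product("ab", repeat=k) if t == t[::-1]}),
    "2. a^n b^n c^m": (
        [("S", "A"), ("S", "Sc"), ("A", "ab"), ("A", "aAb")], 0,
        {"a" * n + "b" * n + "c" * m for n in range(1, N) for m in range(N) if 2 * n + m <= N}),
    "3. a^n b a^m": (
        [("S", "aS"), ("S", "abA"), ("A", "aA"), ("A", "a")], 0,
        {"a" * n + "b" + "a" * m for n in range(1, N) for m in range(1, N) if n + m + 1 <= N}),
    "4. a^i b^j, i > j > 0": (
        [("S", "aS"), ("S", "aA"), ("A", "aAb"), ("A", "ab")], 0,
        {"a" * i + "b" * j for i in range(1, N + 1) for j in range(1, i) if i + j <= N}),
    "5. a^m b^n, m != n, n > 0": (
        [("S", "X"), ("S", "Y"), ("S", "Z"), ("X", "aX"), ("X", "aA"),
         ("A", "aAb"), ("A", "ab"), ("Y", "Yb"), ("Y", "Ab"), ("Z", "bZ"), ("Z", "b")], 0,
        {"a" * m + "b" * n for m in range(N + 1) for n in range(1, N + 1) if m != n and m + n <= N}),
    "6. a^m a b^n a^n b": (
        [("S", "aS"), ("S", "aAb"), ("A", "bAa"), ("A", "ba")], 0,
        {"a" * (m + 1) + "b" * n + "a" * n + "b"
         for m in range(N) for n in range(1, N) if m + 2 * n + 2 <= N}),
    "7. equal numbers of a's and b's": (
        [("S", "aB"), ("S", "bA"), ("A", "aS"), ("A", "bAA"), ("A", "a"),
         ("B", "bS"), ("B", "aBB"), ("B", "b")], 0,
        {"".join(t) for k in range(1, N + 1) for t in product("ab", repeat=k)
         if t.count("a") == t.count("b")}),
}

for name, (rules, slack, expected) in CHECKS.items():
    print(name, generate(rules, max_len=N, slack=slack) == expected)   # each line ends in True
```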
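We list some languages along with a grammar for each language.
1. Language: $\{w \in \{a, b\}^+ : w \text{ has an equal number of } a\text{'s and } b\text{'s}\}$
Grammar: $(\{S, A, B\}, \{a, b\}, P, S)$ with $S \to aB \mid bA$, $A \to aS \mid bAA \mid a$, $B \to bS \mid aBB \mid b$
2. Language: $\{a^n b^m : n, m \geq 1\}$
Grammar: $(\{S, A\}, \{a, b\}, P, S)$ with $S \to aS \mid aA$, $A \to bA \mid b$
3. Language: $\{a^n b^n : n \geq 1\}$
Grammar: $(\{S, A\}, \{a, b\}, P, S)$ with $P = \{S \to aAb, A \to aAb, A \to \varepsilon\}$
4. Language: the set of all palindromes over the alphabet $\{a, b\}$
Grammar: $(\{S\}, \{a, b\}, P, S)$, where $P$ consists of the productions $S \to aSa$, $S \to bSb$, $S \to a$, $S \to b$, $S \to \varepsilon$
5. Language: $\{a, ab, aab, aaab\}$
Grammar: $(\{S\}, \{a, b\}, P, S)$ with $S \to a \mid ab \mid aab \mid aaab$
6. Language: $\{ac, abc, ab^2c, \dots, ab^nc, \dots\}$
Grammar: $(\{S, B\}, \{a, b, c\}, P, S)$ with $S \to aBc$, $B \to \varepsilon \mid bB$
7. Language: $\{\varepsilon, a, aa, aaa, \dots\}$
Grammar: $(\{S\}, \{a\}, P, S)$ with $S \to \varepsilon \mid aS$
8. Language: $\{b, abc, aabcc, \dots, a^n b c^n, \dots\}$
Grammar: $(\{S\}, \{a, b, c\}, P, S)$ with $S \to b \mid aSc$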
HAVE YOU UNDERSTOOD THE CONCEPTS?
ANSWER THE FOLLOWING:
1. The PSG (or simply grammar) $G = (\{S\}, \{a, b\}, \{S \to aSbb, S \to abb\}, S)$ is given. What is the language
generated by $G$?
2. If $L = \{(ad)^n a^m d^m : n \geq 2, m \geq 1\}$, then $L$ is a language over the alphabet
a. $\Sigma = \{\{a, d\}, a, d\}$
b. $\Sigma = \{a, b, c, d\}$
c. $\Sigma = \{a, d\}$
d. $\Sigma = \{\{a, d\}, n, m\}$
3. If $\Sigma_1 = \{a, b, c\}$ and $\Sigma_2 = \{b, c, d\}$, then $L = \{b^i c^j : i, j \geq 1\}$ is a language over the
alphabet
a. $\Sigma_1 \cap \Sigma_2$
b. $\Sigma_1 \cup \Sigma_2$
c. $\Sigma_1^*$
d. $\Sigma_1$
4.
If $\Sigma = \emptyset$, then what is $\Sigma^*$?
5.
If $\Sigma = \{0\}$, then $\Sigma^* = \bigcup_{i=0}^{\infty} \Sigma^i$ is
a.
$\{\varepsilon, 0, 00, 000, \dots\}$
b.
$\{0, 00, 000, \dots\}$
c.
$\{0\}$
d.
$\{\varepsilon\}$
6. $G = (\{S\}, \{a\}, \{S \to aS, S \to a\}, S)$ generates the language $\{a^n : n \geq 1\}$. True or
False?
Ans: 1. $L(G) = \{a^n b^{2n} : n \geq 1\}$ 2. (c) 3. (a) 4. $\{\varepsilon\}$, the set containing only the empty string 5. (a) 6. True
4.5 CLASSIFICATION OF GRAMMARS
In this section we focus on the four main classes of grammars, namely
type 0, type 1, type 2 and type 3 grammars.
TYPE 0 GRAMMAR
A general phrase structure grammar is called a type 0 grammar or an arbitrary grammar.
TYPE 1 GRAMMAR (CONTEXT-SENSITIVE GRAMMAR)
A PSG $G = (V, T, P, S)$ is called a context-sensitive grammar (CSG) if each production $(u, v) \in P$ is such that
$u = u_1 A u_2$, $v = u_1 x u_2$, where $A \in V$, $u_1, u_2 \in (V \cup T)^*$ and $x \in (V \cup T)^+$. A CSG is also called a type-1
grammar.
$A$ is replaced by $x$ in the context of $u_1$ and $u_2$, hence the name context-sensitive:
$u_1 A u_2 \to u_1 x u_2$
The phrase-structure language generated by a context-sensitive grammar is called a context-sensitive
language (CSL).
If every rule $(u, v) \in P$ satisfies $|u| \leq |v|$, the PSG is said to be length-increasing (non-contracting). Since $x$ is
nonempty, every CSG is length-increasing; conversely, every length-increasing grammar can be converted into a CSG
generating the same language. For this reason a grammar is often called context-sensitive when the right-hand
side of every production is at least as long as its left-hand side.
Example:
Consider the grammar $G = (V, T, P, S)$, where $V = \{S, B, C\}$, $T = \{a, b, c\}$ and
$P$ consists of the following productions (rules):
$S \to aSBC$, $S \to aBC$,
$CB \to BC$,
$aB \to ab$, $bB \to bb$,
$bC \to bc$, $cC \to cc$.
Then $L(G) = \{a^n b^n c^n : n \geq 1\}$ is a CSL. We also observe that the rules in $P$ obey
$|u| \leq |v|$ for all $(u, v) \in P$.
Why is this grammar context-sensitive?
Observe the rule $CB \to BC$: $CB$ is rewritten as $BC$ only where the context $CB$ occurs in the sentential form.
Likewise, by the rule $bB \to bb$, $B$ is replaced by $b$ only when $b$ is its left neighbour; otherwise the rule cannot be
applied.
TYPE 2 GRAMMAR (CONTEXT-FREE GRAMMAR):
A context-free grammar consists of four components, $G = (V, T, P, S)$, where
$V$ is a finite non-empty set of symbols, called the nonterminals,
$T$ is the terminal alphabet, $V \cap T = \emptyset$, and $S \in V$ is the start symbol,
$P$ is a finite non-empty set of rules of the form $u \to v$, where $u \in V$ and $v \in (V \cup T)^*$.
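TYPE 2 LANGUAGE (CONTEXT-FREE LANGUAGE)
The language $L(G)$ generated by a context-free grammar $G$ is called a context-free language (CFL).
Note:
Every CFL is also a CSL. (A CFG may contain productions $A \to \varepsilon$, which a CSG does not allow; these can be
eliminated without changing the language apart from $\varepsilon$ itself, which a CSL usually admits through a single rule $S \to \varepsilon$.)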
Examples:
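1. Let $G = (\{S, A, B, C\}, \{0, 1\}, P, S)$, where $P$: $S \to A \mid B \mid C$, $A \to 0C \mid \varepsilon$,
$B \to 1B \mid 1$, $C \to BA$.
Find $L(G)$. Is $G$ a CFG?
To find $L(G)$:
$S \Rightarrow C \Rightarrow BA \Rightarrow B0C \Rightarrow^* (B0)^n C \Rightarrow (B0)^n BA \Rightarrow (B0)^n B$, for some $n \geq 0$
$S \Rightarrow B \Rightarrow 1B \Rightarrow 11B \Rightarrow^* 1^n B \Rightarrow 1^{n+1}$
$S \Rightarrow A \Rightarrow \varepsilon$
$S \Rightarrow A \Rightarrow 0C \Rightarrow 0BA \Rightarrow 0B0C \Rightarrow^* 0(B0)^n C \Rightarrow 0(B0)^n BA \Rightarrow 0(B0)^n B = (0B)^{n+1}$
Since $B \Rightarrow^* 1^k$ for every $k \geq 1$, each $B$ above becomes a block of one or more 1's. Hence we get
$L(G) = \{w \in \{0,1\}^* : \text{each } 0 \text{ in } w \text{ is followed immediately by } 1\}$.
As the left-hand side of each production is a single variable, $G$ is a context-free grammar.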
2. Let $G = (V, T, P, S)$ with
$V = \{S\}$, $T = \{a, b\}$,
$P = \{S \to SaSaSbS, S \to SaSbSaS, S \to SbSaSaS, S \to \varepsilon\}$.
Then
$L(G) = \{w : w \in \{a, b\}^* \text{ and } w \text{ has twice as many } a\text{'s as } b\text{'s}\}$.

3. Let $G = (V, T, P, S)$, $V = \{S\}$, $T = \{a, b\}$,
$P = \{S \to aS, S \to bS, S \to a, S \to b\}$; then
$L(G) = \{w : w \in \{a, b\}^+\}$, the set of all nonempty strings over $\{a, b\}$.
4. Show that $G = (\{E, T, F\}, \{a, +, *, (, )\}, P, E)$, with
$P = \{E \to E + T \mid T,\ T \to T * F \mid F,\ F \to (E) \mid a\}$, is a CFG.
Proof:
As the left-hand side of each production is a single variable, $G$ is a context-free grammar.
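Examples 1 and 3 can be checked on short words with the same hypothetical breadth-first helper `generate` (single-character symbols, uppercase nonterminals, `""` for $\varepsilon$). Example 1 is read with the production $A \to 0C \mid \varepsilon$, which is what makes the last derivation above end in $(0B)^{n+1}$. Because $A$ can vanish, the length cut-off needs some slack: every form contains at most one $A$, and every other nonterminal eventually yields at least one terminal, so a slack of 2 is more than enough. This compares languages only up to length 6.

```python
from collections import deque
from itertools import product


def generate(rules, start="S", max_len=6, slack=0):
    """Terminal words of length <= max_len derivable from start (breadth-first)."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            if len(form) <= max_len:
                words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len + slack and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


# Example 1, reading the production for A as A -> 0C | epsilon.
ex1 = [("S", "A"), ("S", "B"), ("S", "C"), ("A", "0C"), ("A", ""),
       ("B", "1B"), ("B", "1"), ("C", "BA")]
# Example 3: S -> aS | bS | a | b.
ex3 = [("S", "aS"), ("S", "bS"), ("S", "a"), ("S", "b")]
N = 6


def every_0_followed_by_1(w):
    return all(w[i + 1:i + 2] == "1" for i, ch in enumerate(w) if ch == "0")


binary = {"".join(t) for k in range(N + 1) for t in product("01", repeat=k)}
ab_plus = {"".join(t) for k in range(1, N + 1) for t in product("ab", repeat=k)}

print(generate(ex1, max_len=N, slack=2) == {w for w in binary if every_0_followed_by_1(w)})  # True
print(generate(ex3, max_len=N) == ab_plus)                                                  # True
```

Note that the empty string belongs to the language of Example 1 (via $S \Rightarrow A \Rightarrow \varepsilon$) but not to that of Example 3.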
TYPE 3 GRAMMAR (REGULAR GRAMMAR)
A phrase-structure grammar $G = (V, T, P, S)$ is called a regular grammar if each rule $u \to v$ in $P$
is such that $u \in V$ and $v \in T^*V \cup T^*$.
Regular grammars are also called type 3 grammars or right-linear grammars.
TYPE 3 LANGUAGE (REGULAR LANGUAGE)
A language generated by a regular grammar $G$ is called a regular language or a regular set (REG
or RL).
Note:
Regular languages are CFLs, but the converse need not be true.
Examples:
1. Any finite language is regular. This implies that the class of finite languages is a subset of the regular
languages.
Proof:
Let $L = \{a_1, a_2, a_3, \dots, a_n\}$ be a language over $\Sigma$ with $n$ elements (strings).
Consider the grammar $G = (V, T, P, S)$, where $V = \{S\}$, $T = \Sigma$ and
$P = \{S \to a_1, S \to a_2, \dots, S \to a_n\}$.
Each rule has the form $S \to v$ with $v \in T^*$, so $G$ is a regular grammar, and clearly $L(G) = L$. Hence $L$ is a
regular language.
2. $G = (V, T, P, S)$, $V = \{S, A\}$, $T = \{a, b\}$,
$P = \{S \to aS, S \to aA, A \to bA, A \to b\}$. Then $L(G) = \{a^n b^m : n, m \geq 1\}$, which is a regular
language.
3. $G = (\{S, A, B, C, D, E\}, \{a\}, P, S)$, where $P = \{S \to ACaB, CB \to DB, aD \to Da, aE \to Ea, Ca \to aaC,
CB \to E, AD \to AC, AE \to \varepsilon\}$.
As $CB \to E$ is a production in $G$ whose right-hand side is shorter than its left-hand side, $G$ is not a CSG, and hence
not a CFG or a regular grammar. Hence $G$ is a type 0 grammar.
4. The language $L = \{a^n b^n : n \geq 0\}$ is not a regular language. A grammar for this language (which is not a
regular grammar) is $S \to \varepsilon \mid aSb$.
This language is a CFL but not regular.
Hence the set of regular languages is a proper subset of the set of all CFLs.
5. The language $L = \{a^n b^n c^n : n \geq 1\}$ is a CSL but not a CFL.
A grammar generating $L$ is given by:
$S \to aSBC \mid aBC$, $CB \to BC$, $aB \to ab$, $bB \to bb$, $bC \to bc$, $cC \to cc$.
Observe that the left-hand sides of the productions are not all single nonterminals.
Hence the set of CFLs is a proper subset of the set of all CSLs.
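Every production of the grammar in Example 5 is non-contracting, so a breadth-first search that never keeps forms longer than the target length finds all short words. The sketch below uses the same hypothetical `generate` helper (single-character symbols, uppercase nonterminals) and lists every word of length at most 9; the result agrees with $\{a^n b^n c^n : n \geq 1\}$ on that range. Dead ends such as $aabcBC$, where $B$ follows $c$ and no rule applies, are simply never completed.

```python
from collections import deque


def generate(rules, start="S", max_len=6, slack=0):
    """Terminal words of length <= max_len derivable from start (breadth-first)."""
    seen, queue, words = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            if len(form) <= max_len:
                words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                          # rewrite each occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len + slack and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


rules = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
         ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
words = generate(rules, max_len=9)
print(sorted(words, key=len))   # ['abc', 'aabbcc', 'aaabbbccc']
```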
4.5.1 CHOMSKY HIERARCHY:
Every regular grammar is a context-free grammar, every context-free grammar without $\varepsilon$-productions is a
context-sensitive grammar, and every context-sensitive grammar is a phrase-structure grammar.
Hence we have the following hierarchy, given by Noam Chomsky:
$REG \subset CFL \subset CSL \subset PSL$
where REG stands for the family of regular languages, CFL for the family of context-free languages, CSL for
the family of context-sensitive languages and PSL for the family of phrase-structure languages.
$L = \{a^n b^n : n \geq 1\} \subseteq a^* b^*$ is context-free but not regular.
$L = \{a^n b^n c^n : n \geq 1\} \subseteq a^* b^* c^*$ is context-sensitive but not context-free.
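The four definitions of Section 4.5 can be turned into a syntactic test on the productions. The sketch below is a hypothetical helper `grammar_type`, not from the text; it assumes single-character symbols and returns the most restrictive type whose conditions every rule satisfies, checking type 3 ($u \in V$, $v \in T^*V \cup T^*$), then type 2 ($u \in V$), then type 1 read as non-contracting ($|u| \leq |v|$), as in the text. A grammar with an $\varepsilon$-rule such as $S \to \varepsilon$ is reported as type 2 but would fail the type-1 test; this mirrors the definitions as stated rather than the language hierarchy. It classifies grammars, not languages: a language of type 2 may also have a type-3 grammar.

```python
def grammar_type(rules, nonterminals):
    """Most restrictive type (3, 2, 1 or 0) whose conditions every rule meets."""
    def single_var(u):
        return len(u) == 1 and u in nonterminals

    def right_linear(u, v):
        # v in T*V or T*: drop one trailing nonterminal, the rest must be terminals
        body = v[:-1] if v and v[-1] in nonterminals else v
        return single_var(u) and not any(ch in nonterminals for ch in body)

    if all(right_linear(u, v) for u, v in rules):
        return 3
    if all(single_var(u) for u, _ in rules):
        return 2
    if all(len(u) <= len(v) for u, v in rules):
        return 1
    return 0


regular = [("S", "aS"), ("S", "aA"), ("A", "bA"), ("A", "b")]
palindromes = [("S", "aSa"), ("S", "bSb"), ("S", "a"), ("S", "b"), ("S", "")]
anbncn = [("S", "aSBC"), ("S", "aBC"), ("CB", "BC"), ("aB", "ab"),
          ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
example3 = [("S", "ACaB"), ("Ca", "aaC"), ("CB", "DB"), ("CB", "E"),
            ("aD", "Da"), ("AD", "AC"), ("aE", "Ea"), ("AE", "")]

print(grammar_type(regular, {"S", "A"}))        # 3
print(grammar_type(palindromes, {"S"}))         # 2
print(grammar_type(anbncn, {"S", "B", "C"}))    # 1
print(grammar_type(example3, set("SABCDE")))    # 0
```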
HAVE YOU UNDERSTOOD THE CONCEPTS ?
EXERCISES:
PART A
1. Define a phrase-structure grammar.
2. Given $G = (\{S, A, B, C, D\}, \{0\}, P, S)$, where
$P$: $S \to AB$, $A \to C$, $B \to D$, $C \to 0$, $D \to 0$, show that $L(G) = \{00\}$.
3. Consider $G = (V, T, P, S)$, where $V = \{S, A, B\}$, $T = \{a, b, c\}$,
$P = \{S \to AB, A \to ab, A \to aAb, B \to c, B \to Bc\}$, and $S$ is the start symbol.
Determine whether each of the strings aabb, aaabbc, ababcc, aaabbbccc is in
the language $L(G)$.
4. Show that the grammar with productions $S \to aAb \mid abSb \mid a$, $A \to bS \mid aAAb$ is
ambiguous by constructing two derivation trees for the word abab.
EXERCISES:
PART B
1. Frame a grammar that generates the language
$L = \{x : x \in \{a, b\}^* \text{ and } x \text{ does not contain two consecutive } a\text{'s}\}$.
2. Give a grammar that generates the language of strings with an equal number of $a$'s and $b$'s.
3. Given the grammar $G_1 = (\{S, A\}, \{a, b\}, \{S \to aS, S \to bA, A \to aA, A \to a\}, S)$, what
is $L(G_1)$?
4. Given $G_2 = (V, T, P, S)$, $V = \{S, A, B\}$, $T = \{a, b, c\}$,
$P = \{S \to AB, AB \to BA, A \to a, B \to b\}$, where $S$ is the start symbol. Find the set of
strings generated by $G_2$. Give $L(G_2)$ in set-theoretic notation.
5. Explain the four classes of grammars with an example.
6. Construct a type-2 grammar generating the language $L$ consisting of strings of 0's
and 1's with more 0's than 1's.
7. If $L(G) = \{x : x \in \{a, b\}^* \text{ and } x \text{ does not contain consecutive } a\text{'s}\}$, frame $G$ as a
type-3 grammar.
8. Frame a grammar $G$ that can generate the language $\{(0, 1)^n : n \in N\}$, where $N$ is the set of
natural numbers. Is $G$ a regular grammar?
