CS 4203 Compilers - 2014/2015
Handout: Assignment#2
1- Explain the function of the lexical analyzer ( scanner ) .
The job of the lexical analyzer, or scanner, is to transform a stream of characters into a stream of
tokens.
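As an illustration of this character-stream to token-stream step, here is a minimal sketch of a scanner built on Python's `re` module. The token classes (`NUMBER`, `IDENT`, `OP`, `SKIP`) and the `tokenize` helper are hypothetical names chosen for this example; they are not part of the assignment.

```python
import re

# Hypothetical token classes for a tiny expression language (an assumption,
# not taken from the handout). Order matters: earlier entries win.
TOKEN_SPEC = [
    ("NUMBER", r"[0-9]+"),
    ("IDENT",  r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Turn a string of characters into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":   # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("x1 = 42 + y"))
```

Each rule is itself a regular expression, which is why the next questions turn to regular expressions and the automata that recognize them.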
2- What is meant by regular expressions?
Regular expressions are a language (notation) for writing rules that describe tokens.
3- Define DFA and NFA.
Finite automata are a technique for writing programs that recognize tokens that match the rules.
DFA (deterministic finite automaton): an automaton in which the next state is uniquely determined by the current state and the current input character.
NFA (non-deterministic finite automaton): an NFA M consists of
– an alphabet Σ and a set of states S,
– a transition function T: S × (Σ ∪ {ε}) → ℘(S),
– a start state s0 from S, and a set of accepting states A ⊆ S.
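The NFA definition can be made concrete with a short simulation. The sketch below assumes a small hypothetical NFA for a*b* (two states, one ε-move); the names `T`, `EPS`, `eps_closure` and `accepts` are illustrative only. Because T returns a set of states, the simulation tracks the set of all states the NFA could be in.

```python
EPS = ""  # stands for ε

# Hypothetical NFA for a*b* (an assumption for illustration).
# T maps (state, symbol) to a SET of next states, as in the definition.
T = {
    (0, "a"): {0},
    (0, EPS): {1},
    (1, "b"): {1},
}
START = 0
ACCEPTING = {1}

def eps_closure(states):
    """All states reachable from `states` using only ε-moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        for t in T.get((s, EPS), set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return closure

def accepts(word):
    """True if some path through the NFA consumes `word` and ends in A."""
    current = eps_closure({START})
    for ch in word:
        moved = set()
        for s in current:
            moved |= T.get((s, ch), set())
        current = eps_closure(moved)
    return bool(current & ACCEPTING)

print(accepts("aabb"), accepts("ba"))  # True False
```

A DFA is the special case in which every set has exactly one element and there are no ε-moves.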
4- Write regular expressions for the following character sets, or give reasons why no regular
expression exists.
a- all strings of lowercase letters that begin and end in a
a [a-z]*a | a
b- all strings of lowercase letters that either begin or end with a ( or both )
a[a-z]* | [a-z]*a | a [a-z]*a
c- all strings of digits that contain no leading zeros
[1-9]digit* | 0
d- all strings of digits that represent even numbers
digit*(0|2|4|6|8)
e- all strings of digits such that all 2’s occur before all 9’s
(noNine)* (noTwo)*, where noNine = [0-8] and noTwo = [0-1] | [3-9]
f- all strings of a’s and b’s that contain no 3 consecutive b’s
Basically, what must happen here is that either no b, or one b, or two b’s must separate every
consecutive pair of a’s. Taking into account the boundary conditions, we get
((ε|b|bb)a)*(ε|b|bb)
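An answer like this can be sanity-checked by brute force: enumerate every string over { a, b } up to some length and compare the regular expression with the plain-English definition. The sketch below does that for the expression above, writing ε as an empty alternative; `all_strings` and `mismatches` are hypothetical helper names, and the length bound of 8 is an arbitrary choice.

```python
import re
from itertools import product

# The expression from the answer above, with ε written as an empty alternative.
PATTERN = re.compile(r"((|b|bb)a)*(|b|bb)")

def all_strings(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)

def mismatches(max_len=8):
    """Strings on which the regex and the direct definition disagree."""
    bad = []
    for s in all_strings("ab", max_len):
        by_regex = PATTERN.fullmatch(s) is not None
        by_definition = "bbb" not in s
        if by_regex != by_definition:
            bad.append(s)
    return bad

print(mismatches())  # [] means no disagreement up to the chosen length
```

An empty result is evidence rather than proof, since only finitely many strings are tried, but a non-empty result pinpoints a counterexample immediately.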
g- all strings of a’s and b’s that contain an odd number of a’s or an odd number of b’s or
both
b*ab*(ab*ab*)* | a*ba*(ba*ba*)*
h- all strings of a’s and b’s that contain an even number of a’s and an even number of b’s
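( aa | bb | ( ab | ba )( aa | bb )*( ab | ba ) )*
This language is regular: a DFA with four states that tracks the parity of the a’s and the parity of the b’s recognizes it. The simpler expression ( aa | bb )* covers only some of these strings; for example, it misses abab.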
i- all strings of a’s and b’s that contain exactly as many a’s as b’s
No regular expression exists: recognizing this language requires counting without bound, which a regular expression (equivalently, a finite automaton) cannot do.
5- Write an English description for the following regular expressions.
a- ( a | b )* a ( a | b | ε )
all strings of a’s and b’s whose last or second-to-last character is a (that is, strings ending in a, aa, or ab)
b- ( A | B | . . . | Z ) ( a | b | . . . | z) *
all alphabetic strings that begin with one capital letter followed by zero or more lowercase letters
c- ( aa | b )* ( a | bb )*
d- ( 0 | 1 | . . . | 9 | A | B | C | D | E | F | ) + ( x | X )
strings consisting of a sequence of hexadecimal digits (0-9 and capital A-F) followed by a lowercase x or a capital X
Regular Expressions – Extra examples
1. Write down regular expressions for the following descriptions:
a) an integral number is a non-empty sequence of digits, optionally followed by a letter denoting the base
class (b for binary and o for octal).
digit = [0-9]
base = b|o
integral_number = digit+ base?
b) a fixed-point number is an (optional) sequence of digits followed by a dot (’.’) followed by a
sequence of digits.
digit = [0-9]
dot = [.]
fixed_point_number = digit* dot digit+
c) an identifier is a sequence of letters and digits; the first character must be a letter. The
underscore _ counts as a letter, but may not be used as the first or last character.
letter = [a-zA-Z]
digit = [0-9]
underscore = [_]
letter_or_digit = letter | digit
letter_or_digit_or_und = letter_or_digit | underscore
identifier = letter (letter_or_digit_or_und* letter_or_digit)?
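The named definitions above translate almost line for line into Python's `re` syntax, which makes them easy to try out. This is a minimal sketch: the variable names mirror the handout's definitions, while the helper `is_identifier` is a hypothetical convenience added here.

```python
import re

# Named sub-expressions mirroring the definitions above.
letter = r"[a-zA-Z]"
digit = r"[0-9]"
underscore = r"_"
letter_or_digit = rf"(?:{letter}|{digit})"
letter_or_digit_or_und = rf"(?:{letter_or_digit}|{underscore})"

integral_number = re.compile(rf"{digit}+[bo]?")
fixed_point_number = re.compile(rf"{digit}*\.{digit}+")
identifier = re.compile(rf"{letter}(?:{letter_or_digit_or_und}*{letter_or_digit})?")

def is_identifier(s):
    """True if the WHOLE string is an identifier (fullmatch, not search)."""
    return identifier.fullmatch(s) is not None

for candidate in ["x", "a_1", "a_", "_a", "9a"]:
    print(candidate, is_identifier(candidate))
```

Using `fullmatch` matters: the definitions describe complete tokens, whereas `search` or `match` would accept strings that merely contain or start with a valid token.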
2. Find a regular expression corresponding to the language of all strings over the
alphabet { a, b } that contain exactly two a's.
Solution: A string in this language must have exactly two a's. Since any string of b's can be placed in front of
the first a, behind the second a and between the two a's, and since an arbitrary string of b's can be
represented by the regular expression b*,
b*a b*a b* is a regular expression for this language.
3. Find a regular expression corresponding to the language of all strings over the
alphabet { a, b } that do not end with ab.
Solution: Any non-empty string over { a, b } must end in a or b. Hence if a string does not end with ab,
then either it ends with a, or it ends with b and that b is preceded by another b or is the whole string. The empty string does not end with ab either. Since any string can come
in front of the last a or bb,
( a | b )*( a | bb ) | b | ε is a regular expression for the language.
4. Represent the following set of strings by a regular expression.
{( ), (( )), ((( ))),…}
There is no regular expression for this set. A regular expression has no memory in which to keep an unbounded count of the open parentheses it
has seen, so it cannot match the open ones with the closed ones.
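To see what "remembering the count" means in practice, here is a minimal sketch of a recognizer for this set written as ordinary code. The function name `is_nested` is hypothetical, and the strings are assumed to be written without spaces, as in "(())". The integer `depth` can grow without limit, which is exactly the resource a finite automaton does not have.

```python
def is_nested(s):
    """Accept (), (()), ((())), ...: n opens followed by exactly n closes, n >= 1."""
    depth = 0           # unbounded counter: number of unmatched "("
    seen_close = False  # once a ")" appears, no further "(" is allowed
    for ch in s:
        if ch == "(":
            if seen_close:
                return False
            depth += 1
        elif ch == ")":
            seen_close = True
            depth -= 1
            if depth < 0:
                return False
        else:
            return False
    return seen_close and depth == 0

print([w for w in ["()", "(())", "((()))", "(()", "())", "()()", ""] if is_nested(w)])
```

Any fixed number of states can only distinguish a fixed number of depths, so a sufficiently deep string would always be misjudged.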
DFA–examples
1. Draw a DFA that recognizes the regular expression (a|b)*bab and give its transition table.
The transition table of the DFA
State   Input a   Input b   Accepting
S0      S0        S1        no
S1      S2        S1        no
S2      S0        S3        no
S3      S2        S1        yes
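A transition table like this one can be executed directly. The sketch below encodes the table as a dictionary and compares the DFA with the regular expression (a|b)*bab on every string up to a chosen length; `TABLE`, `dfa_accepts` and `disagreements` are hypothetical names, and the bound of 8 is arbitrary.

```python
import re
from itertools import product

# Transition table for (a|b)*bab: state -> (next state on 'a', next state on 'b').
TABLE = {
    "S0": ("S0", "S1"),
    "S1": ("S2", "S1"),
    "S2": ("S0", "S3"),
    "S3": ("S2", "S1"),
}
ACCEPTING = {"S3"}

def dfa_accepts(word, start="S0"):
    """Run the DFA over `word`; exactly one next state per input character."""
    state = start
    for ch in word:
        state = TABLE[state]["ab".index(ch)]
    return state in ACCEPTING

def disagreements(max_len=8):
    """Strings on which the DFA and the regular expression differ."""
    pattern = re.compile(r"(a|b)*bab")
    bad = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            if dfa_accepts(w) != (pattern.fullmatch(w) is not None):
                bad.append(w)
    return bad

print(dfa_accepts("abbab"), dfa_accepts("baba"), disagreements())
```

Informally, S1, S2 and S3 record that the input read so far ends in b, ba and bab respectively.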
2. Draw a DFA that accepts all strings consisting only of the symbol a over the alphabet { a, b }, and
give its transition table.
The transition table of the DFA
State   Input a   Input b   Accepting
0       0         1         yes
1       1         1         no
3. What is the regular expression accepted by the following DFA, given the alphabet { a, b }.
This DFA has a cycle: 1 - 2 - 1 and it can go through this cycle any number of times by reading substring
ab repeatedly.
To find the language it accepts, first from the initial state go to state 1 by reading one a. Then from state 1
go through the cycle 1 - 2 - 1 any number of times by reading substring ab any number of times to come
back to state 1. This is represented by (ab)*. Then from state 1 go to state 2 and then to state 3 by reading
aa. Thus a string that is accepted by this DFA can be represented by a(ab)*aa .
The transition table of the DFA
State   Input a   Input b   Accepting
0       1         4         no
1       2         4         no
2       3         1         no
3       4         4         yes
4       4         4         no
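The derivation of a(ab)*aa can be checked by listing the strings the DFA actually accepts. The sketch below encodes the transitions described above (state 4 acting as a dead state) and enumerates accepted strings up to a chosen length; `DELTA` and `accepted_words` are hypothetical names.

```python
from itertools import product

# state -> {symbol: next state}; state 4 is the dead state.
DELTA = {
    0: {"a": 1, "b": 4},
    1: {"a": 2, "b": 4},
    2: {"a": 3, "b": 1},
    3: {"a": 4, "b": 4},
    4: {"a": 4, "b": 4},
}
START, ACCEPTING = 0, {3}

def accepted_words(max_len):
    """All strings over {a, b} of length <= max_len that end in an accepting state."""
    words = []
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            state = START
            for ch in chars:
                state = DELTA[state][ch]
            if state in ACCEPTING:
                words.append("".join(chars))
    return words

print(accepted_words(7))  # ['aaa', 'aabaa', 'aababaa']
```

The three strings are a(ab)^k aa for k = 0, 1, 2, which matches the expression obtained by reading the cycle 1 - 2 - 1 as ab.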
4. What is the regular expression accepted by the following DFA, given the alphabet { a, b }.
This DFA has two independent cycles, 0 - 1 - 0 and 0 - 2 - 0, and it can move through these cycles any
number of times in any order to return to the accepting state, which is also the initial state, for example 0 - 1 - 0 - 2 - 0 - 2 - 0. Reading ab takes it around the first cycle and bb around the second. Thus a string that is accepted by this DFA can be represented by ( ab | bb )*.
The transition table of the DFA
State   Input a   Input b   Accepting
S0      S1        S2        yes
S1      S3        S0        no
S2      S3        S0        no
S3      S3        S3        no
5. What is the regular expression accepted by the following DFA, given the alphabet { a, b }.
This DFA has two accepting states: 0 and 1. Thus the language accepted by this DFA is the union of the
language accepted at state 0 and the one accepted at state 1. The language accepted at state 0 is b*. To find the
language accepted at state 1, first read any number of b's at state 0. Then go to state 1 by reading one a. At this point
(b*a) will have been read. At state 1 go through the cycle 1 - 2 - 1 any number of times by reading substring ba
repeatedly. Thus the language accepted at state 1 is b*a(ba)*, and the DFA as a whole accepts b* | b*a(ba)*.
The transition table of the DFA
State   Input a   Input b   Accepting
S0      S1        S0        yes
S1      S3        S2        yes
S2      S1        S3        no
S3      S3        S3        no
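The "one language per accepting state" argument can be checked state by state. The sketch below runs the DFA and, for each string up to a chosen length, confirms that it ends in S0 exactly when it matches b* and in S1 exactly when it matches b*a(ba)*. The names `DELTA`, `final_state` and `first_counterexample` are hypothetical, and the bound of 8 is arbitrary.

```python
import re
from itertools import product

# state -> {symbol: next state}; S3 is the dead state.
DELTA = {
    "S0": {"a": "S1", "b": "S0"},
    "S1": {"a": "S3", "b": "S2"},
    "S2": {"a": "S1", "b": "S3"},
    "S3": {"a": "S3", "b": "S3"},
}

AT_S0 = re.compile(r"b*")         # language accepted at state 0
AT_S1 = re.compile(r"b*a(ba)*")   # language accepted at state 1

def final_state(word, start="S0"):
    state = start
    for ch in word:
        state = DELTA[state][ch]
    return state

def first_counterexample(max_len=8):
    """Return a string where the per-state claims fail, or None if there is none."""
    for n in range(max_len + 1):
        for chars in product("ab", repeat=n):
            w = "".join(chars)
            end = final_state(w)
            if (end == "S0") != (AT_S0.fullmatch(w) is not None):
                return w
            if (end == "S1") != (AT_S1.fullmatch(w) is not None):
                return w
    return None

print(first_counterexample())  # None
```

Because a DFA ends in exactly one state for each input, the two per-state languages are disjoint, and their union is the whole accepted language.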
6. Produce a deterministic finite automaton (DFA) that recognizes even length strings of digits over
the alphabet {0,1}, where the third digit is a 1
Note that L cannot contain strings of length less than 4.
The transition table of the DFA
State   Input 0   Input 1   Accepting
S1      S2        S2        no
S2      S3        S3        no
S3      S4        S5        no
S4      S4        S4        no
S5      S6        S6        no
S6      S5        S5        yes