Fall 2015-2016 Compiler Principles
Exercise Set 1: Lexical Analysis
Roman Manevich
Ben-Gurion University
Exercise 1
• Assume the following lexical specification:
R = (1-9)(0-9)*.(0-9)*(1-9) | 0.(0-9)*(1-9)
D = (0-9)(0-9).(0-9)(0-9).(0-9)(0-9)(0-9)(0-9)
• Construct the scanner automaton and label
the accepting states by the appropriate tokens
• Run it on the following inputs using the
maximal munch algorithm
1.23.2
1.230.2
01.11.2015
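To check a hand-run of the automaton, the maximal munch policy can be sketched in Python as follows. This is only a minimal sketch: the regular expressions are an assumed translation of R and D into Python `re` syntax, the names `SPEC` and `maximal_munch` are hypothetical, and the scanner tries every candidate lexeme length instead of running the scanner automaton the exercise asks you to construct.

```python
import re

# Assumed translation of the slide's R and D into Python regex syntax.
# Earlier entries win ties between tokens that match the same lexeme.
SPEC = [
    ("R", re.compile(r"[1-9][0-9]*\.[0-9]*[1-9]|0\.[0-9]*[1-9]")),
    ("D", re.compile(r"[0-9]{2}\.[0-9]{2}\.[0-9]{4}")),
]

def maximal_munch(text, spec=SPEC):
    """Split text into (token, lexeme) pairs, always taking the longest lexeme.

    Returns (tokens, error_position); error_position is None on success,
    otherwise it is the index at which no token could be matched.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        match = None
        # Try the longest candidate lexeme first, then shorter ones.
        for end in range(len(text), pos, -1):
            for name, pattern in spec:
                if pattern.fullmatch(text, pos, end):
                    match = (name, text[pos:end])
                    break
            if match is not None:
                break
        if match is None:
            return tokens, pos  # stuck: no token starts at this position
        tokens.append(match)
        pos += len(match[1])
    return tokens, None
```

Calling `maximal_munch` on each of the three inputs above returns the tokens found so far together with the position of the first error, if any, which can be compared against the result of running the automaton by hand.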
Exercise 2 (2015 midterm)
Exercise 3
a) Give an example of a lexical specification
R1,…, Rk and a word w in the language
(R1 | … | Rk)* for which the maximal munch
policy does not yield “Success”
b) [harder] Can you think of an algorithm that
satisfies both properties (successfully splits
any word in the language of R1 | … | Rk and
produces the longest tokens)?
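When testing a candidate answer for part a), it helps to confirm independently that the word really belongs to (R1 | … | Rk)*. The sketch below does this with a simple dynamic program over split points. It assumes the specification is given as Python `re` pattern strings, and the name `in_star_language` is hypothetical. It only decides membership and makes no attempt to produce the longest tokens.

```python
import re

def in_star_language(patterns, word):
    """Return True if word can be split into pieces that each match some pattern."""
    compiled = [re.compile(p) for p in patterns]
    n = len(word)
    # reachable[i] is True when word[:i] can be split into matching pieces.
    reachable = [False] * (n + 1)
    reachable[0] = True  # the empty word is always in the starred language
    for start in range(n):
        if not reachable[start]:
            continue
        for end in range(start + 1, n + 1):
            if any(p.fullmatch(word, start, end) for p in compiled):
                reachable[end] = True
    return reachable[n]
```

A candidate for part a) is a word for which this function returns True while a maximal munch scanner for the same specification reports an error.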
Exercise 4
• Which of the following conditions yield worst-case
linear running time for the naïve maximal munch
algorithm?
1. All tokens have constant length
2. The languages of different tokens do not intersect
3. The languages of different tokens do not intersect
4. Each token starts with a unique letter and ends with a
unique letter
5. The language of each token is prefix-closed (i.e., if w is
accepted then each prefix of w is also accepted)
• Find conditions as weak as you can for which the
worst-case running time of the naïve maximal munch
algorithm is linear
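To experiment with these conditions, the naïve algorithm can be instrumented to count how many automaton transitions it takes. The sketch below assumes the scanner is given as a DFA in dictionary form. The tokens A = a and B = a*b and the names `DELTA`, `START`, `ACCEPTING` and `naive_maximal_munch` are hypothetical and are not taken from the slides.

```python
# Hypothetical DFA for the two tokens A = a and B = a*b.
# State 1 accepts A, state 3 accepts B, state 2 means "two or more a's seen".
DELTA = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,
}
START = 0
ACCEPTING = {1: "A", 3: "B"}

def naive_maximal_munch(delta, start, accepting, text):
    """Return (tokens, steps), where steps counts the DFA transitions taken."""
    tokens = []
    steps = 0
    pos = 0
    while pos < len(text):
        state, i = start, pos
        last_end = last_token = None
        # Scan as far as the DFA allows, remembering the last accepting point.
        while i < len(text) and (state, text[i]) in delta:
            state = delta[(state, text[i])]
            i += 1
            steps += 1
            if state in accepting:
                last_end, last_token = i, accepting[state]
        if last_end is None:
            raise ValueError(f"no token starts at position {pos}")
        # Back up to the last accepting point and emit the token.
        tokens.append((last_token, text[pos:last_end]))
        pos = last_end
    return tokens, steps
```

On an input consisting only of a's, every scan runs to the end of the input looking for a b and then backs up to emit a single a, so the step count grows quadratically with the input length. Replacing the DFA with one that satisfies a given condition shows how the count changes.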
Exercise 5 – research challenge
• The worst-case running time of Tom Reps’s algorithm is
O(nk), where n is the length of the input and k
is the number of automaton states
• Can you find an algorithm that reduces the
factor k to a constant?
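As a starting point, the memoization idea behind the O(nk) bound can be sketched as follows: remember every (state, position) pair from which a scan went on without reaching another accepting state, and stop any later scan that arrives at such a pair. This is a sketch under stated assumptions, not a transcription of the published algorithm. The DFA for the tokens A = a and B = a*b and the names `DELTA`, `START`, `ACCEPTING` and `memoized_maximal_munch` are hypothetical.

```python
# Hypothetical DFA for the two tokens A = a and B = a*b (not from the slides).
DELTA = {
    (0, "a"): 1, (0, "b"): 3,
    (1, "a"): 2, (1, "b"): 3,
    (2, "a"): 2, (2, "b"): 3,
}
START = 0
ACCEPTING = {1: "A", 3: "B"}

def memoized_maximal_munch(delta, start, accepting, text):
    """Maximal munch with a table of (state, position) pairs known to fail.

    Returns (tokens, steps), where steps counts the DFA transitions taken.
    """
    failed = set()  # pairs from which no later accepting state is reached
    tokens, steps, pos, n = [], 0, 0, len(text)
    while pos < n:
        state, i = start, pos
        last_end = last_token = None
        trail = []  # pairs visited since the last accepting state
        while i < n and (state, text[i]) in delta and (state, i) not in failed:
            trail.append((state, i))
            state = delta[(state, text[i])]
            i += 1
            steps += 1
            if state in accepting:
                last_end, last_token = i, accepting[state]
                trail.clear()  # these pairs did lead to an accepting state
        # Everything still on the trail led nowhere; never rescan from there.
        failed.update(trail)
        if last_end is None:
            raise ValueError(f"no token starts at position {pos}")
        tokens.append((last_token, text[pos:last_end]))
        pos = last_end
    return tokens, steps
```

Each (state, position) pair is added to the table at most once and is never scanned past again, so the transitions that do not end up inside a token are bounded by nk. The table itself has up to nk entries, which is the factor the exercise asks you to remove.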
Exercise 6 – research challenge
• Develop an asymptotically efficient parallel
algorithm for maximal munch-based scanning