Static Enforcement of Security with Types

3.2 Language and Grammar
3.2.7 Left Factoring
Unclear productions
A → αβ1 | αβ2
Left factoring the grammar
A → αA'
A' → β1 | β2
Example: Dangling Else
stmt → if expr then stmt else stmt
     | if expr then stmt
     | other
Left factoring
stmt → if expr then stmt optional_else_part
     | other
optional_else_part → else stmt
                   | ε
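The left-factoring step above can be sketched in code. This is a minimal single-pass sketch, not a complete algorithm: the function names `common_prefix` and `left_factor` and the tuple-based grammar encoding are assumptions made for illustration, and the new nonterminal is named by appending a prime to the head. A newly created nonterminal may itself need another pass.

```python
def common_prefix(sequences):
    """Return the longest prefix shared by all symbol sequences."""
    prefix = []
    for symbols in zip(*sequences):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return tuple(prefix)


def left_factor(head, alternatives):
    """Left-factor the alternatives of one nonterminal (single pass).

    alternatives: list of tuples of symbols; the empty tuple stands for epsilon.
    Returns a dict mapping each nonterminal to its list of alternatives.
    """
    # Group alternatives by their first symbol (None for epsilon).
    groups = {}
    for alt in alternatives:
        key = alt[0] if alt else None
        groups.setdefault(key, []).append(alt)

    result = {head: []}
    counter = 0
    for key, group in groups.items():
        if key is None or len(group) == 1:
            result[head].extend(group)      # nothing to factor
            continue
        prefix = common_prefix(group)
        counter += 1
        new_head = head + "'" * counter     # hypothetical naming scheme
        result[head].append(prefix + (new_head,))
        result[new_head] = [alt[len(prefix):] for alt in group]
    return result


factored = left_factor("stmt", [
    ("if", "expr", "then", "stmt", "else", "stmt"),
    ("if", "expr", "then", "stmt"),
    ("other",),
])
for nonterminal, alts in factored.items():
    print(nonterminal, "->", " | ".join(" ".join(a) or "epsilon" for a in alts))
```

Here `stmt'` plays the role of optional_else_part in the slide.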
3.2.8 Non-Context-Free Language Constructs
L1 = {wcw | w is in (a | b)*}
    Checking that identifiers are declared before they are used
L2 = {a^n b^m c^n d^m | n ≥ 0, m ≥ 0}
    Checking that the number of formal parameters agrees with the number of actual parameters
L3 = {a^n b^n c^n | n ≥ 0}
    Requirements for typesetting
L1= {wcwR | w(a|b)*}
S  aSa | bSb | c
L2 = {a nbmcmdn | n  1, m  1 }
S  aSd | aAd
A  bAc | bc
L 2 = {a nbncmdm | n  1,m  1 }
S  AB
A  aAb | ab
B  cBd | cd
L3 ={a nb n | n  1 }
S  aSb | ab
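L3' = {a^n b^n | n ≥ 1}, generated by S → aSb | ab
L3' cannot be described by a regular expression
Suppose there exists a DFA D with k states that accepts L3'
Suppose D reaches states s0, s1, …, sk after reading ε, a, aa, …, a^k, respectively
Since D has only k states, for an input with more than k a's some state must be entered twice: si = sj for some i < j
[Figure: a path labeled a^i from s0 to si, a loop labeled a^(j-i) from si back to si, and a path labeled b^i from si to the accepting state f]
D therefore also accepts a^j b^i with j ≠ i, which is not in L3': a contradiction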
3.2.9 Formal Language
Grammar G = (VT, VN, S, P)
Type 0 (unrestricted): α → β, where α, β ∈ (VN ∪ VT)*, |α| ≥ 1
Type 1 (context-sensitive): |α| ≤ |β|, except S → ε
Type 2 (context-free): A → β, where A ∈ VN, β ∈ (VN ∪ VT)*
Type 3 (regular): A → aB or A → a, where A, B ∈ VN, a ∈ VT
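Example: L3 = {a^n b^n c^n | n ≥ 1}
Productions:
S → aSBC
S → aBC
CB → BC
aB → ab
bB → bb
bC → bc
cC → cc
Derivation of a^n b^n c^n:
S ⇒* a^(n-1) S (BC)^(n-1)      using S → aSBC
S ⇒+ a^n (BC)^n                using S → aBC
S ⇒+ a^n B^n C^n               using CB → BC
S ⇒+ a^n b B^(n-1) C^n         using aB → ab
S ⇒+ a^n b^n C^n               using bB → bb
S ⇒+ a^n b^n c C^(n-1)         using bC → bc
S ⇒+ a^n b^n c^n               using cC → cc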
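To make the derivation concrete, the sketch below applies the seven productions as plain string rewrites, in the order the derivation uses them. The helper names `rewrite` and `derive` are assumptions for illustration; each grammar symbol is encoded as a single character.

```python
def rewrite(form, lhs, rhs):
    """Rewrite the leftmost occurrence of lhs in the sentential form."""
    index = form.find(lhs)
    if index < 0:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    return form[:index] + rhs + form[index + len(lhs):]


def derive(n):
    """Return the sentential forms of one derivation of a^n b^n c^n (n >= 1)."""
    form = "S"
    steps = [form]

    def step(lhs, rhs):
        nonlocal form
        form = rewrite(form, lhs, rhs)
        steps.append(form)

    for _ in range(n - 1):
        step("S", "aSBC")          # S -> aSBC, applied n-1 times
    step("S", "aBC")               # S -> aBC
    while "CB" in form:
        step("CB", "BC")           # CB -> BC sorts all B's before all C's
    step("aB", "ab")               # aB -> ab
    while "bB" in form:
        step("bB", "bb")           # bB -> bb
    step("bC", "bc")               # bC -> bc
    while "cC" in form:
        step("cC", "cc")           # cC -> cc
    return steps


print(" => ".join(derive(2)))
```

For n = 2 the sentential forms are S, aSBC, aaBCBC, aaBBCC, aabBCC, aabbCC, aabbcC, aabbcc.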
Review
Regular expressions
Context-free grammar
    Definition
    Derivation: leftmost, rightmost
    Parse tree
    Ambiguity and elimination of ambiguity
Restrictions
    Left factoring: A → αβ1 | αβ2
    Eliminating left recursion: A ⇒+ Aα
Formal languages: Type 0, Type 1, Type 2, Type 3
3.3 Top-Down Parsing
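3.3.1 Common Approaches
Example: S → aCb   C → cd | c
Given input w = acb
[Figure: parse-tree snapshots for w = acb. S is expanded to a C b; C is first expanded to c d, which does not match the remaining input; the parser backtracks and expands C to c.]
Not suitable for grammars with left recursion or common left factors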
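The backtracking behaviour on this example can be sketched with generators. This is only an illustrative sketch: the function names are assumptions, terminals are single characters, and the code would loop forever on a left-recursive grammar, which is exactly the limitation noted above.

```python
GRAMMAR = {
    "S": [["a", "C", "b"]],
    "C": [["c", "d"], ["c"]],   # alternatives are tried in this order
}


def parse_symbol(symbol, text, pos):
    """Yield every input position reachable after deriving symbol from text[pos:]."""
    if symbol not in GRAMMAR:                 # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:       # nonterminal: try each alternative
        yield from parse_sequence(alternative, text, pos)


def parse_sequence(symbols, text, pos):
    """Yield every position reachable after deriving the whole symbol sequence."""
    if not symbols:
        yield pos
        return
    for middle in parse_symbol(symbols[0], text, pos):
        # If the rest fails, the loop resumes with the next choice: backtracking.
        yield from parse_sequence(symbols[1:], text, middle)


def accepts(text):
    return any(end == len(text) for end in parse_symbol("S", text, 0))


print(accepts("acb"))   # C -> cd fails on 'b', so the parser falls back to C -> c
```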
3.3.2 LL(1) Grammar
How can the grammar be restricted to avoid backtracking?
Define two functions, FIRST and FOLLOW
FIRST(α) = {a | α ⇒* a…, a ∈ VT}
1. In particular, if α ⇒* ε, then ε ∈ FIRST(α)
2. If A → B…, then FIRST(B) should be added to FIRST(A)
Given two alternatives αi and αj, require
FIRST(αi) ∩ FIRST(αj) = ∅
provided that ε is not in FIRST(αi) or FIRST(αj)
FOLLOW(A) = {a | S ⇒* …Aa…, a ∈ VT}
1. If A is the rightmost symbol of a sentential form, place $ in FOLLOW(A)
2. If there is a production A → αB, or A → αBβ in which β ⇒* ε, then everything in FOLLOW(A) is in FOLLOW(B)
LL(1) Grammar
For any production A → α | β, the following must hold:
FIRST(α) ∩ FIRST(β) = ∅
If β ⇒* ε, then FIRST(α) ∩ FOLLOW(A) = ∅
Characteristics of LL(1)
No common left-factor
Not ambiguous
No left recursion
Example
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
FIRST(E) = FIRST(T) = FIRST(F) = { (, id }
FIRST(E') = { +, ε }
FIRST(T') = { *, ε }
FOLLOW(E) = FOLLOW(E') = { ), $ }
FOLLOW(T) = FOLLOW(T') = { +, ), $ }
FOLLOW(F) = { +, *, ), $ }
7. Computing FIRST and FOLLOW
FIRST
If X → a…, place terminal a in FIRST(X)
If X → ε, place ε in FIRST(X)
If X → Y…, where Y is a nonterminal, add everything in FIRST(Y)\{ε} to FIRST(X)
If X → Y1Y2…Yk, where Y1, Y2, …, Yi-1 are nonterminals whose FIRST sets all contain ε, place FIRST(Yj)\{ε} in FIRST(X) for j = 1, 2, …, i. In particular, if the FIRST sets of all of Y1 … Yk contain ε, place ε in FIRST(X)
Example:
S → BA
A → BS | d
B → aA | bS | c
FIRST(B) = {a, b, c}
FIRST(A) = {a, b, c, d}
FIRST(S) = {a, b, c}
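The FIRST rules can be run as a fixed-point iteration: keep applying them until no set grows. The sketch below assumes a dict-based grammar encoding and uses the string "eps" as a stand-in for ε; the function names are illustrative, not taken from the slides.

```python
EPS = "eps"   # ASCII stand-in for the empty string


def first_of_string(symbols, first, grammar):
    """FIRST of a symbol sequence, given the current FIRST sets of nonterminals."""
    result = set()
    for symbol in symbols:
        if symbol not in grammar:             # terminal: it starts the string
            result.add(symbol)
            return result
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:          # symbol cannot vanish: stop here
            return result
    result.add(EPS)                           # every symbol can derive epsilon
    return result


def compute_first(grammar):
    """grammar: nonterminal -> list of alternatives (tuples); () is epsilon."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:                            # iterate until nothing grows
        changed = False
        for head, alternatives in grammar.items():
            for alternative in alternatives:
                new = first_of_string(alternative, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


GRAMMAR = {
    "S": [("B", "A")],
    "A": [("B", "S"), ("d",)],
    "B": [("a", "A"), ("b", "S"), ("c",)],
}
for nonterminal, symbols in compute_first(GRAMMAR).items():
    print(f"FIRST({nonterminal}) = {sorted(symbols)}")
```

On the example grammar this reproduces the three FIRST sets listed above.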
FOLLOW
For the start symbol S, place $ in FOLLOW(S)
If there exists A → αBβ, place FIRST(β)\{ε} in FOLLOW(B). Note that α could be empty
If A → αB, or A → αBβ with β ⇒* ε (ε is in FIRST(β)), place FOLLOW(A) in FOLLOW(B) (α could be empty)
Exercise:
S → BA
A → BS | d
B → aA | bS | c
FIRST(B) = {a, b, c}   FIRST(A) = {a, b, c, d}   FIRST(S) = {a, b, c}
FOLLOW(S) = ?
FOLLOW(A) = ?
FOLLOW(B) = ?
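The FOLLOW rules can be iterated to a fixed point in the same way as FIRST. The sketch below is checked against the expression grammar whose FOLLOW sets are given earlier, so the exercise above is left open; the same function can be applied to the exercise grammar. The encoding ("eps" for ε, tuples for alternatives) and the function names are assumptions for illustration.

```python
EPS, END = "eps", "$"   # ASCII stand-ins for the empty string and the end marker


def first_of_string(symbols, first, grammar):
    """FIRST of a symbol sequence, given the FIRST sets of nonterminals."""
    result = set()
    for symbol in symbols:
        if symbol not in grammar:
            result.add(symbol)
            return result
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alternative in alternatives:
                new = first_of_string(alternative, first, grammar)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


def compute_follow(grammar, start):
    first = compute_first(grammar)
    follow = {nonterminal: set() for nonterminal in grammar}
    follow[start].add(END)                        # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            for alternative in alternatives:
                for i, symbol in enumerate(alternative):
                    if symbol not in grammar:     # only nonterminals have FOLLOW
                        continue
                    tail = first_of_string(alternative[i + 1:], first, grammar)
                    new = tail - {EPS}            # rule 2: FIRST(beta) without epsilon
                    if EPS in tail:               # rule 3: beta is empty or nullable
                        new |= follow[head]
                    if not new <= follow[symbol]:
                        follow[symbol] |= new
                        changed = True
    return follow


EXPR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
for nonterminal, symbols in compute_follow(EXPR, "E").items():
    print(f"FOLLOW({nonterminal}) = {sorted(symbols)}")
```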
3.3.3 Recursive Descent Parsing
A set of procedures, one for each nonterminal
The procedures may be recursive
Example:
type → simple
     | ↑ id
     | array [ simple ] of type
simple → integer
       | char
       | num dotdot num
An auxiliary procedure
procedure match (t : token);
begin
    if lookahead = t then
        lookahead := nexttoken( )
    else error( )
end;
procedure type;
begin
    if lookahead in {integer, char, num} then
        simple( )
    else if lookahead = '↑' then begin
        match ('↑'); match (id)
    end
    else if lookahead = array then begin
        match (array); match ('['); simple( );
        match (']'); match (of); type( )
    end
    else error( )
end;
procedure simple;
begin
    if lookahead = integer then
        match (integer)
    else if lookahead = char then
        match (char)
    else if lookahead = num then begin
        match (num); match (dotdot); match (num)
    end
    else error( )
end;
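For readers who want to run the procedures above, here is a rough Python transcription. It is a sketch under stated assumptions: the input is already a list of token names, the token "^" stands in for the pointer arrow, "$" is an end marker added by the parser, and the class and method names are chosen for illustration.

```python
class ParseError(Exception):
    """Raised when the lookahead token does not fit the grammar."""


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        """Consume the lookahead if it equals the expected token."""
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def parse_type(self):
        # type -> simple | ^ id | array [ simple ] of type
        if self.lookahead in ("integer", "char", "num"):
            self.parse_simple()
        elif self.lookahead == "^":
            self.match("^")
            self.match("id")
        elif self.lookahead == "array":
            self.match("array")
            self.match("[")
            self.parse_simple()
            self.match("]")
            self.match("of")
            self.parse_type()
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")

    def parse_simple(self):
        # simple -> integer | char | num dotdot num
        if self.lookahead == "integer":
            self.match("integer")
        elif self.lookahead == "char":
            self.match("char")
        elif self.lookahead == "num":
            self.match("num")
            self.match("dotdot")
            self.match("num")
        else:
            raise ParseError(f"unexpected token {self.lookahead!r}")


def accepts(tokens):
    """Return True if the token list is a complete 'type'."""
    parser = Parser(tokens)
    try:
        parser.parse_type()
        parser.match("$")
    except ParseError:
        return False
    return True


print(accepts("array [ num dotdot num ] of integer".split()))
```

Each branch is selected by looking at a single lookahead token, so no backtracking is needed.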
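3.3.4 Non-recursive predictive parsing
[Figure: model of a table-driven predictive parser. The predictive parsing program reads the input buffer (a + b $) and a stack (X Y Z $, top first), consults the parsing table M, and produces the output.]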
Parsing table M (partial)
Nonterminal   Input Symbol
              id           +              *              ...
E             E → TE'
E'                         E' → +TE'
T             T → FT'
T'                         T' → ε         T' → *FT'
F             F → id
Moves made by the parser to accept id * id + id
Stack        Input             Output
$E           id * id + id$
$E'T         id * id + id$     E → TE'
$E'T'F       id * id + id$     T → FT'
$E'T'id      id * id + id$     F → id
$E'T'        * id + id$
$E'T'F*      * id + id$        T' → *FT'
$E'T'F       id + id$
$E'T'id      id + id$          F → id
Grammar:
E → TE'
E' → +TE' | ε
T → FT'
T' → *FT' | ε
F → (E) | id
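The moves above follow a simple loop: pop the top of the stack, match it if it is a terminal, otherwise look up M[top, lookahead] and push the production body in reverse. The sketch below hard-codes the standard LL(1) table for this expression grammar; the dict encoding, the "eps" spelling of ε, and the function name are assumptions for illustration.

```python
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

# Parsing table M: (nonterminal, lookahead) -> production body ([] is epsilon).
TABLE = {
    ("E", "id"): ["T", "E'"], ("E", "("): ["T", "E'"],
    ("E'", "+"): ["+", "T", "E'"], ("E'", ")"): [], ("E'", "$"): [],
    ("T", "id"): ["F", "T'"], ("T", "("): ["F", "T'"],
    ("T'", "+"): [], ("T'", "*"): ["*", "F", "T'"],
    ("T'", ")"): [], ("T'", "$"): [],
    ("F", "id"): ["id"], ("F", "("): ["(", "E", ")"],
}


def predictive_parse(tokens, start="E"):
    """Return the productions used, in order; raise SyntaxError on bad input."""
    tokens = list(tokens) + ["$"]
    stack = ["$", start]                      # top of the stack is the list's end
    pos = 0
    output = []
    while stack:
        top = stack.pop()
        lookahead = tokens[pos]
        if top not in NONTERMINALS:           # terminal or $: must match the input
            if top != lookahead:
                raise SyntaxError(f"expected {top!r}, found {lookahead!r}")
            pos += 1
        else:
            body = TABLE.get((top, lookahead))
            if body is None:                  # undefined entry of M means error
                raise SyntaxError(f"no entry M[{top}, {lookahead}]")
            output.append(f"{top} -> {' '.join(body) or 'eps'}")
            stack.extend(reversed(body))      # leftmost body symbol ends up on top
    return output


for production in predictive_parse(["id", "*", "id", "+", "id"]):
    print(production)
```

The first five productions printed correspond to the Output column of the trace above.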
6. LL(1) Parsing
Input: id * id
[Figure: parse tree for id * id, built depth first as the parser runs]
Stack        Input         Output
$E           id * id $
$E'T         id * id $     E → TE'
$E'T'F       id * id $     T → FT'
$E'T'id      id * id $     F → id
$E'T'        * id $
$E'T'F*      * id $        T' → *FT'
$E'T'F       id $
$E'T'id      id $          F → id
$E'T'        $
$E'          $             T' → ε
$            $             E' → ε
3.3.5 Constructing predictive parsing table
(1) For each production A → α, execute steps (2) and (3)
(2) For each terminal a in FIRST(α), add A → α to M[A, a]
(3) If ε is in FIRST(α), then for each terminal b (including $) in FOLLOW(A), add A → α to M[A, b]
(4) Label undefined entries of M as error
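Steps (1) to (4) translate almost directly into code. The sketch below takes the FIRST and FOLLOW sets of the expression grammar as given rather than computing them; the dict encoding, the "eps" spelling of ε, and the function names are assumptions for illustration. Each table entry is kept as a list so that a multiply defined entry would show up as a list with more than one production.

```python
EPS, END = "eps", "$"   # ASCII stand-ins for the empty string and the end marker

GRAMMAR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F": [("(", "E", ")"), ("id",)],
}
FIRST = {
    "E": {"(", "id"}, "T": {"(", "id"}, "F": {"(", "id"},
    "E'": {"+", EPS}, "T'": {"*", EPS},
}
FOLLOW = {
    "E": {")", END}, "E'": {")", END},
    "T": {"+", ")", END}, "T'": {"+", ")", END},
    "F": {"+", "*", ")", END},
}


def first_of_string(symbols, first):
    """FIRST of a production body; symbols missing from `first` are terminals."""
    result = set()
    for symbol in symbols:
        if symbol not in first:
            result.add(symbol)
            return result
        result |= first[symbol] - {EPS}
        if EPS not in first[symbol]:
            return result
    result.add(EPS)
    return result


def build_table(grammar, first, follow):
    """Return M as a dict: (nonterminal, terminal) -> list of production bodies."""
    table = {}
    for head, alternatives in grammar.items():        # step (1)
        for body in alternatives:
            firsts = first_of_string(body, first)
            targets = firsts - {EPS}                  # step (2)
            if EPS in firsts:
                targets |= follow[head]               # step (3)
            for terminal in targets:
                table.setdefault((head, terminal), []).append(body)
    return table                                      # step (4): missing key = error


M = build_table(GRAMMAR, FIRST, FOLLOW)
conflicts = {key: bodies for key, bodies in M.items() if len(bodies) > 1}
print(len(M), "entries,", len(conflicts), "multiply defined")
```

A grammar is LL(1) exactly when no entry of the table built this way holds more than one production.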
Multiply defined entry
Nonterminal   Input Symbol
              other            b            else                    ...
stmt          stmt → other
e_part                                      e_part → else stmt
                                            e_part → ε
expr                           expr → b
Multiply defined entry (with e_part → ε removed from M[e_part, else])
Nonterminal   Input Symbol
              other            b            else                    ...
stmt          stmt → other
e_part                                      e_part → else stmt
expr                           expr → b
Exercises
3.4(b)(c), 3.6(a)(b), 3.8