Plan for Today Regular Expressions: repetition and choice Context

Plan for Today
Regular Expressions: repetition and choice
Context Free Grammars
–  models for specifying programming languages
syntax
semantics
–  example grammars
–  derivations
let : “a” | “b” | “c”
word : let+
 
What regular expressions cannot express:
 
 
Parse trees
 
 
 
Syntax-directed translation
–  using syntax-directed translation to interpret MiniSVG
 
 
 
Top-down Predictive Parsing
 
nesting, e.g. matching parentheses: ( ) | (( )) | ((( ))) | …
to any depth
Why? A DFA has only a finite # states and thus cannot
encode that it has seen N “(“-s and thus now must
see N “)”-s for the parentheses to match (for any N).
For that we need some form of recursion:
S : “(“ S “)” | ε
 
CS453 Lecture
Context Free Grammar Intro
1
Context Free Grammars
 
 
 
 
Context Free Grammar Intro
2
Syntax and Semantics
CFG: set of productions of the form
 
Non-terminal ! phrase | phrase | phrase …
phrase: string of terminals and non-terminals
 
 
 
CS453 Lecture
 
Regular Expressions define what correct tokens are
Context Free Grammars define what correctly formed programs are
But… are all correctly formed programs meaningful?
terminals: tokens of the language
non-terminals represent sets of strings of tokens of the language
 
 
 
 
 
Example:
stmt ! ifStmt | whileStmt
ifStmt ! IF OPEN boolExpr CLOSE Stmt
whileStmt ! WHILE OPEN boolExpr CLOSE Stmt
 
 
CS453 Lecture
Context Free Grammar Intro
3
CS453 Lecture
Context Free Grammar Intro
4
Our Next Class of Languages
Syntax and Semantics
 
 
 
 
 
 
 
Regular Expressions define what correct tokens are
Context-Free Languages
Context Free Grammars define what correctly formed programs are
But… are all correctly formed programs meaningful?
{ww R }
n n
{a b }
NO: the program can have semantic errors
some can be detected by the compiler: type errors, undefined errors
some cannot: run-time errors,
program does not compute what it is supposed to
Regular Languages
a *b *
 
The semantics of a program defines its meaning.
Here, we do syntax directed translation / interpretation
(a | b)*
 
CS453 Lecture
Context Free Grammar Intro
5
Context-Free Languages
Example
A context-free grammar
Context-Free
Grammars
Recursive definitions
Pushdown
Automata
FSA
+
S →aSb
S →ε
A derivation:
stack
We will start here
G:
S ⇒ aSb ⇒ aaSbb ⇒ aabb
Another derivation:
€ ⇒ aaaSbbb ⇒ aaabbb
S ⇒ aSb ⇒ aaSbb
An Application of this Language
Context-Free Languages
S → aSb
S →ε
Gave a
grammar
for:
n n
L(G ) = {a b : n ≥ 0}
(((( ))))
Describes parentheses:
Example
A context-free grammar
A derivation:
G : S →aSa
S →bSb
S →ε
S ⇒ aSa ⇒ abSba ⇒ abba
Another derivation: €
Deriving another grammar
S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaba
Can we derive a
Grammar for:
{a nb n }
{ww R }
Regular Languages
Representing All Properly Nested
Parentheses
S → aSb
S →ε
L(G ) = {a nb n : n ≥ 0}
Describes parentheses:
(((( ))))
Can we build a grammar to include any valid
combination of ()? For example ( () (()) )
A Possible Grammar
A context-free grammar
Context-Free Grammars
G : S →(S)
Grammar
S →SS
S →ε
A derivation:
Nonterminals
S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()
Terminals Start
symbol
Productions of the form:
Another derivation:
A→ x
S ⇒ SS ⇒ (S)S ⇒€()S ⇒ ()(S) ⇒ ()()
€
G = (V , T , S , P )
Nonterminal String of symbols,
Nonterminals and terminals
€
Derivation Order
Derivation, Language
Given a grammar with rules:
 
 
 
 
 
 
 
 
Grammar: G=(V,T,S,P)
1. S → AB
Derivation:
Start with start symbol S
Keep replacing non-terminals A by their RHS x,
until no non-terminals are left
The resulting string (sentence) is part of the language L(G)
Context Free Grammar Intro
4. B →Bb
5. B →ε
Always expand the leftmost non-terminal
The Language L(G) defined by the CFG G:
L(G) = the set of all strings of terminals that can be derived this way
CS453 Lecture
2. A →aaA
3. A →ε
15
Leftmost derivation:
€
1
2
3
€ 4
5
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Derivation Order
Given a grammar with rules:
2. A →aaA
3. A →ε
1. S → AB
4. B →Bb
5. B →ε
Always expand the rightmost non-terminal
€
1
4
5
String
Stm --> id := Exp
Exp --> num
Exp --> ( Stm, Exp )
a := ( b := ( c := 3, 2 ), 1 )
Leftmost derivation:
Stm ==> a := Exp ==> a := ( Stm, Exp ) ==> a := ( b := Exp, Exp )
==> a := ( b := ( Stm, Exp ), Exp ) ==> a := ( b := ( c := Exp, Exp ), Exp )
==> a := ( b := ( c := 3, Exp ), Exp ) ==> a := ( b := ( c := 3, 2), Exp )
==> a := ( b := ( c := 3, 2), 1)
Rightmost derivation:
€
Rightmost derivation:
Grammar
2
3
Stm ==> a := Exp ==> a := ( Stm, Exp ) ==> a := ( Stm, 1)
==>
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
Parse Trees
Parse Trees
S → AB
A →aaA | ε
B →Bb | ε
S → AB
S
A
B →Bb | ε
S ⇒ AB ⇒ aaAB
S ⇒ AB
€
A →aaA | ε
S
€
€
€
A
B
a
a
A
B
Parse Trees
Parse Trees
S → AB
A →aaA | ε
B →Bb | ε
S
a
€
A
a
A
B →Bb | ε
S
€
B
B
A →aaA | ε
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb
S ⇒ AB ⇒ aaAB ⇒ aaABb
€
S → AB
a
b
€
A
a
A
B
B
b
ε
Parse Trees
€
S → AB
Sentential forms
S → AB
A →aaA | ε
B →Bb | ε
A →aaA | ε
B →Bb | ε
S ⇒ AB ⇒ aaAB ⇒ aaABb ⇒ aaBb ⇒ aab
S
€
a
€
A
a
B
A
B
ε
ε
yield
b
€
€ S ⇒ AB
€
aaεεb
= aab
Partial parse tree
S
A
B
Sometimes, derivation order doesn t matter
S ⇒ AB ⇒ aaAB
Partial parse tree
sentential
form
S
A
Leftmost:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
 
Rightmost:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
B
S
Same parse tree
A
a
a
A
a
a
€
Does it matter here?
B
A
B
ε
ε
b
€
How about here?
Parse Tree
Grammar
Stm --> id := Exp
Exp --> num
Exp --> ( Stm, Exp )
Grammar
(1) exp --> exp * exp
(2) exp --> exp + exp
(3) exp --> NUM
String
String
a := ( b := ( c := 3, 2 ), 1 )
42 + 7 * 6
What does this question mean?
CS453 Lecture
Context Free Grammar Intro
27
CS453 Lecture
Context Free Grammar Intro
28
Syntax Directed Translation or Interpretation
Interpret this program
Use parse tree to …
–  Translate one language to another
–  Create a data structure of the program
–  Interpret, or evaluate, the program
 
Parse Tree
Grammar
Stm --> id := Exp
Exp --> num
Exp --> ( Stm, Exp )
Works conceptually by…
–  Parser discovers the parse tree
–  Parser executes certain actions while traversing the parse tree using a
depth-first, post-order traversal
 
CS453 Lecture
Context Free Grammar Intro
String
a := ( b := ( c := 3, 2 ), 1 )
29
How about here?
CS453 Lecture
Context Free Grammar Intro
30
Semantic Rules for Expression Example
Grammar
(1) exp --> exp * exp
(2) exp --> exp + exp
(3) exp --> NUM
String
42 + 7 * 6
CS453 Lecture
Context Free Grammar Intro
31
CS453 Lecture
Context Free Grammar Intro
32
Example:
input
string
Parser
Parser
derivation
grammar
input
aabb
S →SS
S →aSb
S →bSa
derivation
?
S →ε
€
Exhaustive Search
S →SS | aSb | bSa | ε
Phase 1:
€
S ⇒ SS
S ⇒ aSb
Find derivation of
S ⇒ SS
S ⇒ aSb
aabb
S ⇒ bSa
S ⇒ε
S ⇒ bSa
S ⇒ε
All possible derivations of length 1
€
€
aabb
Phase 2
S →SS | aSb | bSa | ε
S ⇒ SS ⇒ SSS
S ⇒ SS ⇒ aSbS
Phase 1 € S ⇒ SS ⇒ bSaS
S ⇒ SS
S ⇒ SS ⇒ S
S ⇒ aSb S ⇒ aSb ⇒ aSSb
S ⇒ SS ⇒ SSS
aabb
S ⇒ aSb ⇒ aaSbb
S ⇒ aSb ⇒ abSab
S ⇒ aSb ⇒ ab
Final result of exhaustive search
(top-down parsing)
Parser
input
aabb
aabb
S ⇒ SS ⇒ aSbS
S ⇒ SS ⇒€
S
S ⇒ aSb ⇒ aSSb
S ⇒ aSb ⇒ aaSbb
Phase 3
S ⇒ aSb ⇒ aaSbb ⇒ aabb
For general context-free grammars:
The exhaustive search approach is extremely
costly: O(|P||w|)
S ⇒ SS
S ⇒ aSb
S ⇒ bSa
S ⇒ε
There exists a parsing algorithm
3
that parses a string w in time | w |
derivation
€
S →SS | aSb | bSa | ε
Phase 2
S ⇒ aSb ⇒ aaSbb ⇒ aabb
For LL(1) grammars, a simple type of
CFGs that we will meet soon, we can
use Predictive parsing and parse in | w |
time
Example Predictive Parser: Recursive Descent
start
stmts
stmt
ifStmt
whileStmt
->
->
->
->
->
Recursive Descent Parsing
stmts EOF
stmt | stmt stmts
ifStmt | whileStmt IF id { stmts }
WHILE id { stmts }
 
void start() { switch(m_lookahead) {
case IF, WHILE: stmts(); match(Token.Tag.EOF); break;
default:
throw new ParseException(…);
}}
void stmts() { switch(this.m_lookahead) {
case IF,WHILE: stmt(); stmts(); break;
case EOF:
break; default:
throw new ParseException(…);
}}
void stmt() { switch(m_lookahead) {
case IF:
ifStmt();break;
case WHILE: whileStmt(); break;
default:
throw new ParseException(…);
}}
void ifStmt() {switch(m_lookahead) {
case IF: Check_next_Token(id); Check_next_Token(OPENBRACE); stmts(); Check_next_Token(CLOSEBRACE); break;
default: throw new ParseException(…); }}
CS453 Lecture
Context Free Grammar Intro
 
 
 
 
 
41
Each non-terminal becomes a method
that mimics the RHSs of the productions associated with it
and choses a particular RHS:
an alternative based on a look-ahead symbol
and throws an exception if no alternative applies
When does this work?
CS453 Lecture
Context Free Grammar Intro
42