CMPE 152: Compiler Design
May 11 Class Meeting
Department of Computer Engineering
San Jose State University
Spring 2017
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
Bottom-Up Parsers
A popular type of bottom-up parser is the
shift-reduce parser.
A bottom-up parser starts with the
input tokens from the source program.
A shift-reduce parser uses a parse stack.
The stack starts out empty.
The parser shifts (pushes)
each input token (terminal symbol)
from the scanner onto the stack.
Computer Engineering Dept.
Spring 2017: May 11
CMPE 152: Compiler Design Lab
© R. Mak
Bottom-Up Parsers, cont’d
When what’s on top of the parse stack matches
the longest right-hand side of a production rule:
The parser pops off the matching symbols and
reduces (replaces) them with the nonterminal symbol
at the left-hand side of the matching rule.
Example: <term> ::= <factor> * <factor>
Pop off <factor> * <factor> and replace it with <term>.
Bottom-Up Parsers, cont’d
Repeat until the parse stack is reduced
to the topmost nonterminal symbol.
Example: <PROGRAM>
The parser accepts the input source
as being syntactically correct.
The parse was successful.
Example: Shift-Reduce Parsing
Parse the expression a + b*c given the production rules:
    <expression> ::= <simple expression>
    <simple expression> ::= <term> + <term>
    <term> ::= <factor> | <factor> * <factor>
    <factor> ::= <variable>
    <variable> ::= <identifier>
    <identifier> ::= a | b | c
In this grammar, the topmost nonterminal symbol is <expression>.

    Parse stack (top at right)         Input      Action
    (empty)                            a + b*c    shift
    a                                  + b*c      reduce
    <identifier>                       + b*c      reduce
    <variable>                         + b*c      reduce
    <factor>                           + b*c      reduce
    <term>                             + b*c      shift
    <term> +                           b*c        shift
    <term> + b                         *c         reduce
    <term> + <identifier>              *c         reduce
    <term> + <variable>                *c         reduce
    <term> + <factor>                  *c         shift
    <term> + <factor> *                c          shift
    <term> + <factor> * c                         reduce
    <term> + <factor> * <identifier>              reduce
    <term> + <factor> * <variable>                reduce
    <term> + <factor> * <factor>                  reduce
    <term> + <term>                               reduce
    <simple expression>                           reduce
    <expression>                                  accept
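The trace above can be reproduced with a small hand-written recognizer. This is only a sketch for this one toy grammar, not how Yacc does it: the names sr_parse, try_reduce and peek_token are made up for illustration, and the shift-or-reduce decisions are hard-coded with one token of lookahead (for example, <factor> is not reduced to <term> while the next token is *).

```c
/* Terminals are stored as their characters; nonterminals get codes
   above the char range. All names here are illustrative only. */
enum { IDENT = 256, VAR, FACTOR, TERM, SIMPLE, EXPR };

#define MAX_STACK 64

/* Skip blanks and return the next input character without consuming it
   (0 at end of input). */
static int peek_token(const char **p)
{
    while (**p == ' ' || **p == '\t') (*p)++;
    return (unsigned char)**p;
}

/* Try one reduction on the top of the stack, given lookahead la.
   Returns 1 if a reduction was made, 0 otherwise. */
static int try_reduce(int *st, int *n, int la)
{
    int top = (*n > 0) ? st[*n - 1] : 0;

    if (top == 'a' || top == 'b' || top == 'c') { st[*n - 1] = IDENT;  return 1; }
    if (top == IDENT) { st[*n - 1] = VAR;    return 1; }
    if (top == VAR)   { st[*n - 1] = FACTOR; return 1; }
    /* <term> ::= <factor> * <factor>  (checked before the shorter rule) */
    if (*n >= 3 && top == FACTOR && st[*n - 2] == '*' && st[*n - 3] == FACTOR) {
        *n -= 2; st[*n - 1] = TERM; return 1;
    }
    /* <term> ::= <factor>, but only if no '*' is coming next */
    if (top == FACTOR && la != '*') { st[*n - 1] = TERM; return 1; }
    /* <simple expression> ::= <term> + <term>, at end of input */
    if (*n >= 3 && la == 0 && top == TERM && st[*n - 2] == '+' && st[*n - 3] == TERM) {
        *n -= 2; st[*n - 1] = SIMPLE; return 1;
    }
    /* <expression> ::= <simple expression> */
    if (*n == 1 && la == 0 && top == SIMPLE) { st[0] = EXPR; return 1; }
    return 0;
}

/* Returns 1 if src is accepted, 0 otherwise; counts the actions taken. */
int sr_parse(const char *src, int *shifts, int *reduces)
{
    int st[MAX_STACK];
    int n = 0;
    *shifts = *reduces = 0;

    for (;;) {
        int la = peek_token(&src);
        if (try_reduce(st, &n, la)) { (*reduces)++; continue; }
        if (n == 1 && st[0] == EXPR && la == 0) return 1;   /* accept */
        if (la == 0 || n == MAX_STACK) return 0;            /* stuck: reject */
        if (la != 'a' && la != 'b' && la != 'c' && la != '+' && la != '*') return 0;
        st[n++] = la;                                       /* shift */
        src++;
        (*shifts)++;
    }
}
```

Calling sr_parse("a + b*c", &shifts, &reduces) should accept with 5 shifts and 13 reduces, the same counts as in the trace.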
Why Bottom-Up Parsing?
The shift-reduce actions can be
driven by a table.
The table is based on the production rules.
It is almost always generated
by a compiler-compiler.
Like a table-driven scanner,
a table-driven parser can be
very compact and extremely fast.
However, for a significant grammar,
the table can be nearly impossible
for a human to follow.
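To give a feel for what "driven by a table" means, here is a minimal sketch of a table-driven shift-reduce loop. The grammar (E ::= E + n | n), the five-state table, and the names action, goto_E, rule_len and lr_parse are all assumptions made up for this illustration; the table was worked out by hand, whereas a real one would come from a compiler-compiler and be far larger.

```c
/* Token codes for the hypothetical grammar  E ::= E + n | n  */
enum { T_NUM, T_PLUS, T_END, N_TOKENS };

/* Table encoding: k > 0 means "shift and go to state k",
   -r means "reduce by rule r", ACC means accept, ERR means syntax error. */
enum { ERR = 0, ACC = 99 };

static const int action[5][N_TOKENS] = {
    /*            n    +    end */
    /* state 0 */ {  2, ERR, ERR },
    /* state 1 */ { ERR,  3, ACC },
    /* state 2 */ { ERR, -1,  -1 },   /* rule 1: E ::= n     */
    /* state 3 */ {  4, ERR, ERR },
    /* state 4 */ { ERR, -2,  -2 },   /* rule 2: E ::= E + n */
};

/* Number of right-hand-side symbols per rule (index 0 unused). */
static const int rule_len[3] = { 0, 1, 3 };

/* Goto table for the only nonterminal, E. */
static const int goto_E[5] = { 1, ERR, ERR, ERR, ERR };

/* tokens must be terminated by T_END. Returns 1 on accept, 0 on error. */
int lr_parse(const int *tokens)
{
    int stack[64];          /* stack of states */
    int sp = 0;
    stack[0] = 0;

    for (;;) {
        int act = action[stack[sp]][*tokens];
        if (act == ACC) return 1;
        if (act == ERR) return 0;
        if (act > 0) {                      /* shift */
            if (sp == 63) return 0;
            stack[++sp] = act;
            tokens++;
        } else {                            /* reduce */
            int next;
            sp -= rule_len[-act];           /* pop the right-hand side */
            next = goto_E[stack[sp]];
            if (next == ERR) return 0;
            stack[++sp] = next;             /* push the goto state */
        }
    }
}
```

The loop itself never mentions the grammar; swapping in different tables changes the language it parses, which is why the approach is compact and also why the tables are hard to read by eye.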
Why Bottom-Up Parsing?, cont’d
Error recovery can be especially tricky.
It can be very hard to debug the parser
if something goes wrong.
It’s usually an error in the grammar (of course!).
Lex and Yacc
The “standard” compiler-compiler for Unix and Linux systems.
Lex automatically generates a scanner written in C.
Flex: free, open-source version
Yacc (“Yet another compiler-compiler”)
automatically generates a parser written in C.
Bison: free GNU version
Generates a bottom-up shift-reduce parser.
Example: Simple Interpretive Calculator
Yacc file (production rules): calc.y
We’ll need to define the NUMBER token.

    ...
    %token NUMBER
    %left '+' '-'   /* left associative, same precedence */
    %left '*' '/'   /* left associative, higher precedence */
    %%
    exprlist: /* empty list */
            | exprlist '\n'
            | exprlist expr '\n'   {printf("\t%lf\n", $2);}
            ;
    expr:     NUMBER               {$$ = $1;}
            | expr '+' expr        {$$ = $1 + $3;}
            | expr '-' expr        {$$ = $1 - $3;}
            | expr '*' expr        {$$ = $1 * $3;}
            | expr '/' expr        {$$ = $1 / $3;}
            | '(' expr ')'         {$$ = $2;}
            ;
    %%
    #include <stdio.h>
    #include <ctype.h>
    int main(int argc, char *argv[])
    {
        progname = argv[0];
        yyparse();
    }
Example: Simple Calculator, cont’d
Lex file (token definitions): calc.l

    %{
    #include "calc.tab.h"
    extern int lineno;
    %}
    %option noyywrap
    %%
    [ \t]                       {;}   /* skip blanks and tabs */
    [0-9]+\.?|[0-9]*\.[0-9]+    {sscanf(yytext, "%lf", &yylval); return NUMBER;}
    \n                          {lineno++; return '\n';}
    .                           {return yytext[0];}   /* everything else */

Commands:

    bison -d calc.y
    flex calc.l
    cc -c *.c
    cc -o calc *.o
    ./calc

(bison -d writes calc.tab.c and calc.tab.h, the header that calc.l includes.
Traditional yacc -d writes y.tab.c and y.tab.h instead.)
Demo
LL and LR Parsers
Parsers are classified LL or LR according to
the way they operate while parsing.
The first L stands for left-to-right,
the order a parser reads the source program.
If the second letter is also L, it means that
whenever the parser is processing a production
rule, it “expands” the leftmost nonterminal
symbol first.
LL and LR Parsers, cont’d
Is the parser generated by JavaCC LL or LR?
In a production rule like
term() addOp() term()
it calls the leftmost parsing method term() first.
Therefore, the parser is LL.
By default, it’s LL(1) for one-token lookahead.
In general, the parser is LL(k) since we can add
LOOKAHEAD(k) to the production rules.
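For comparison, the same top-down idea can be sketched by hand in C. This is not JavaCC output: the function names (ll_parse, term, add_op, factor, peek), the single-letter variables, and the rule term ::= factor ('*' factor)* are assumptions chosen for illustration. Each parsing function looks at only one character of lookahead, so the sketch is LL(1).

```c
/* Hypothetical input cursor shared by the parsing functions. */
static const char *cur;

/* One-character lookahead: skip blanks, return the next character
   without consuming it (0 at end of input). */
static int peek(void)
{
    while (*cur == ' ' || *cur == '\t') cur++;
    return (unsigned char)*cur;
}

/* factor ::= a single lowercase letter (an assumed stand-in for a variable) */
static int factor(void)
{
    int c = peek();
    if (c >= 'a' && c <= 'z') { cur++; return 1; }
    return 0;
}

/* term ::= factor ( '*' factor )* */
static int term(void)
{
    if (!factor()) return 0;
    while (peek() == '*') {
        cur++;
        if (!factor()) return 0;
    }
    return 1;
}

/* addOp ::= '+' | '-' */
static int add_op(void)
{
    int c = peek();
    if (c == '+' || c == '-') { cur++; return 1; }
    return 0;
}

/* simple expression ::= term addOp term
   The && operator evaluates left to right, so the leftmost
   parsing function term() is always called first. */
int ll_parse(const char *src)
{
    cur = src;
    return term() && add_op() && term() && peek() == 0;
}
```

Here the call order of the functions mirrors the order of symbols in the production rule, which is what makes the parser LL.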
LL and LR Parsers, cont’d
What about a bottom-up parser?
Read the successive contents of the shift-reduce parse stack
from the last row of the trace up to the first.
Read in that order, it’s the rightmost nonterminal symbol
that’s expanded first: the parser traces a rightmost derivation in reverse.
The shift-reduce parser is LR(k).

    Parse stack (top at right)         Input      Action
    (empty)                            a + b*c    shift
    a                                  + b*c      reduce
    <identifier>                       + b*c      reduce
    <variable>                         + b*c      reduce
    <factor>                           + b*c      reduce
    <term>                             + b*c      shift
    <term> +                           b*c        shift
    <term> + b                         *c         reduce
    <term> + <identifier>              *c         reduce
    <term> + <variable>                *c         reduce
    <term> + <factor>                  *c         shift
    <term> + <factor> *                c          shift
    <term> + <factor> * c                         reduce
    <term> + <factor> * <identifier>              reduce
    <term> + <factor> * <variable>                reduce
    <term> + <factor> * <factor>                  reduce
    <term> + <term>                               reduce
    <simple expression>                           reduce
    <expression>                                  accept
LL and LR Parsers, cont’d
Specific types of LR parsers:
SLR : Simple LR
LALR : Lookahead LR
See http://en.wikipedia.org/wiki/LR_parser