CMPE 152: Compiler Design
May 11 Class Meeting
Department of Computer Engineering
San Jose State University
Spring 2017
Instructor: Ron Mak
www.cs.sjsu.edu/~mak
Bottom-Up Parsers

A popular type of bottom-up parser is the
shift-reduce parser.


A bottom-up parser starts with the
input tokens from the source program.
A shift-reduce parser uses a parse stack.


The stack starts out empty.
The parser shifts (pushes)
each input token (terminal symbol)
from the scanner onto the stack.
Computer Engineering Dept.
Spring 2017: May 11
CMPE 152: Compiler Design Lab
© R. Mak
Bottom-Up Parsers, cont’d

When what’s on top of the parse stack matches
the longest right-hand side of a production rule:

The parser pops off the matching symbols and
reduces (replaces) them with the nonterminal
symbol at the left-hand side of the matching rule.

Example: <term> ::= <factor> * <factor>

Pop off <factor> * <factor> and replace by <term>
Bottom-Up Parsers, cont’d

Repeat until all the input has been consumed and
the parse stack is reduced to the topmost nonterminal symbol.

Example: <PROGRAM>

The parser then accepts the input source
as being syntactically correct.

The parse was successful.
Example: Shift-Reduce Parsing

Parse the expression a + b*c given the production rules:

<expression> ::= <simple expression>
<simple expression> ::= <term> + <term>
<term> ::= <factor> | <factor> * <factor>
<factor> ::= <variable>
<variable> ::= <identifier>
<identifier> ::= a | b | c

In this grammar, the topmost nonterminal symbol is <expression>.

Parse stack (top at right)        Input    Action
--------------------------------  -------  ------
(empty)                           a + b*c  shift
a                                 + b*c    reduce
<identifier>                      + b*c    reduce
<variable>                        + b*c    reduce
<factor>                          + b*c    reduce
<term>                            + b*c    shift
<term> +                          b*c      shift
<term> + b                        *c       reduce
<term> + <identifier>             *c       reduce
<term> + <variable>               *c       reduce
<term> + <factor>                 *c       shift
<term> + <factor> *               c        shift
<term> + <factor> * c             (empty)  reduce
<term> + <factor> * <identifier>  (empty)  reduce
<term> + <factor> * <variable>    (empty)  reduce
<term> + <factor> * <factor>      (empty)  reduce
<term> + <term>                   (empty)  reduce
<simple expression>               (empty)  reduce
<expression>                      (empty)  accept
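The trace for a + b*c can be reproduced with a small hand-written sketch. This is only an illustration under stated assumptions, not the parser that Yacc would generate: the names RULES, find_reduction, and parse are hypothetical, the rules are tried longest right-hand side first, and a single hard-coded lookahead check keeps <factor> on the stack when the next token is *.

```python
# Production rules for the example grammar, longest right-hand sides first.
RULES = [
    ("<simple expression>", ["<term>", "+", "<term>"]),
    ("<term>", ["<factor>", "*", "<factor>"]),
    ("<expression>", ["<simple expression>"]),
    ("<term>", ["<factor>"]),
    ("<factor>", ["<variable>"]),
    ("<variable>", ["<identifier>"]),
    ("<identifier>", ["a"]),
    ("<identifier>", ["b"]),
    ("<identifier>", ["c"]),
]


def find_reduction(stack, lookahead):
    """Return the first rule whose right-hand side is on top of the stack."""
    for lhs, rhs in RULES:
        if stack[-len(rhs):] == rhs:
            # Assumption of this sketch: with '*' coming next, do not reduce
            # <factor> to <term> yet, so <factor> * <factor> can match later.
            if rhs == ["<factor>"] and lookahead == "*":
                continue
            return lhs, rhs
    return None


def parse(tokens):
    """Return the list of (stack contents, action) pairs, or raise SyntaxError."""
    stack, pos, trace = [], 0, []
    while True:
        lookahead = tokens[pos] if pos < len(tokens) else None
        snapshot = " ".join(stack)
        if stack == ["<expression>"] and lookahead is None:
            trace.append((snapshot, "accept"))
            return trace
        reduction = find_reduction(stack, lookahead)
        if reduction:
            lhs, rhs = reduction
            del stack[-len(rhs):]      # pop the matching symbols
            stack.append(lhs)          # replace them with the left-hand side
            trace.append((snapshot, "reduce"))
        elif lookahead is not None:
            stack.append(lookahead)    # shift the next token
            pos += 1
            trace.append((snapshot, "shift"))
        else:
            raise SyntaxError("cannot shift or reduce: " + snapshot)


if __name__ == "__main__":
    for stack_contents, action in parse(list("a+b*c")):
        print(stack_contents.ljust(34), action)
```

Each printed row shows the stack before the action is taken, in the same order as the table above.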
Why Bottom-Up Parsing?

The shift-reduce actions can be
driven by a table.


The table is based on the production rules.
It is almost always generated
by a compiler-compiler.

Like a table-driven scanner,
a table-driven parser can be
very compact and extremely fast.

However, for a significant grammar,
the table can be nearly impossible
for a human to follow.
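A minimal sketch of a table-driven shift-reduce parser, to show how little code the driver needs once the table exists. Everything here is an assumption for illustration: the toy grammar (E -> E + n | n), the hand-built ACTION and GOTO tables, and the name accepts are not from the lecture, and a real table would come from a compiler-compiler.

```python
# Toy grammar (hypothetical):  1: E -> E + n    2: E -> n
# Each production: (left-hand side, number of symbols on the right-hand side)
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1)}

# ACTION[(state, token)] says what to do; "$" marks the end of the input.
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2),
    (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1),
    (4, "$"): ("reduce", 1),
}

# GOTO[(state, nonterminal)] gives the state to enter after a reduction.
GOTO = {(0, "E"): 1}


def accepts(tokens):
    """Return True if the token sequence is a sentence of the toy grammar."""
    tokens = list(tokens) + ["$"]
    states = [0]                        # the parse stack holds state numbers
    pos = 0
    while True:
        entry = ACTION.get((states[-1], tokens[pos]))
        if entry is None:               # empty table cell: syntax error
            return False
        kind, arg = entry
        if kind == "shift":
            states.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = PRODUCTIONS[arg]
            del states[-length:]        # pop one state per right-hand-side symbol
            states.append(GOTO[(states[-1], lhs)])
        else:                           # accept
            return True


if __name__ == "__main__":
    print(accepts(["n", "+", "n"]))
    print(accepts(["n", "+"]))
```

The driver loop never mentions the grammar. Changing the language means replacing the tables only, which is also why a large table is hard for a human to follow.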
Why Bottom-Up Parsing?, cont’d

Error recovery can be especially tricky.

It can be very hard to debug the parser
if something goes wrong.

It’s usually an error in the grammar (of course!).
Lex and Yacc

Lex and Yacc are the “standard” compiler-compiler tools
for Unix and Linux systems.

Lex automatically generates a scanner written in C.

Flex: a free, open-source version (not itself a GNU project)

Yacc (“Yet Another Compiler-Compiler”)
automatically generates a parser written in C.

Bison: free GNU version

The generated parser is a bottom-up shift-reduce parser.
Example: Simple Interpretive Calculator

Yacc file (production rules): calc.y

We’ll need to define the NUMBER token.

...
%token NUMBER
%left '+' '-' /* left associative, same precedence */
%left '*' '/' /* left associative, higher precedence */
%%
exprlist: /* empty list */
        | exprlist '\n'
        | exprlist expr '\n' {printf("\t%lf\n", $2);}
        ;
expr:     NUMBER        {$$ = $1;}
        | expr '+' expr {$$ = $1 + $3;}
        | expr '-' expr {$$ = $1 - $3;}
        | expr '*' expr {$$ = $1 * $3;}
        | expr '/' expr {$$ = $1 / $3;}
        | '(' expr ')'  {$$ = $2;}
        ;
%%
#include <stdio.h>
#include <ctype.h>
int main(int argc, char *argv[])
{
    progname = argv[0];
    yyparse();
}
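The two %left lines decide when the calculator's parser shifts an operator and when it reduces first. A rough sketch of that decision, assuming an operator-precedence evaluator with two stacks rather than the tables Yacc actually generates. The names PREC, APPLY, tokenize, and evaluate are hypothetical, and the sketch does no error checking.

```python
import operator
import re

# Mirrors the two %left declarations: same level = same precedence.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def tokenize(text):
    """Split text into numbers, operators, and parentheses (blanks are skipped)."""
    return re.findall(r"[0-9]*\.[0-9]+|[0-9]+\.?|[-+*/()]", text)


def evaluate(text):
    values, ops = [], []

    def reduce_top():
        # Reduce "expr op expr" on top of the stacks to a single value.
        op = ops.pop()
        right = values.pop()
        left = values.pop()
        values.append(APPLY[op](left, right))

    for tok in tokenize(text):
        if tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                reduce_top()
            ops.pop()                   # discard the matching "("
        elif tok in PREC:
            # Left associativity: reduce while the operator already on the
            # stack has the same or higher precedence; otherwise shift.
            while ops and ops[-1] != "(" and PREC[ops[-1]] >= PREC[tok]:
                reduce_top()
            ops.append(tok)
        else:
            values.append(float(tok))   # NUMBER
    while ops:
        reduce_top()
    return values[0]


if __name__ == "__main__":
    print(evaluate("1 + 2 * 3"))
    print(evaluate("8 - 3 - 2"))
```

With the input 1 + 2 * 3 the evaluator shifts * because it outranks +, and with 8 - 3 - 2 it reduces 8 - 3 before shifting the second -.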
Example: Simple Calculator, cont’d

Lex file (token definitions): calc.l

%{
#include "calc.tab.h"
extern int lineno;
%}
%option noyywrap
%%
[ \t]                      {;} /* skip blanks and tabs */
[0-9]+\.?|[0-9]*\.[0-9]+   {sscanf(yytext, "%lf", &yylval); return NUMBER;}
\n                         {lineno++; return '\n';}
.                          {return yytext[0];} /* everything else */

Commands:

yacc -d calc.y
lex calc.l
cc -c *.c
cc -o calc *.o
./calc

Note: the #include of calc.tab.h matches Bison's output naming
(bison -d calc.y writes calc.tab.c and calc.tab.h).
A traditional yacc writes y.tab.c and y.tab.h instead.
LL and LR Parsers

Parsers are classified LL or LR according to
the way they operate while parsing.

The first L stands for left-to-right,
the order a parser reads the source program.

If the second letter is also L, it means that
whenever the parser is processing a production
rule, it “expands” the leftmost nonterminal
symbol first.
LL and LR Parsers, cont’d

Is the parser generated by JavaCC LL or LR?

In a production rule like
term() addOp() term()
it calls the leftmost parsing method term() first.

Therefore, the parser is LL.


By default, it’s LL(1) for one-token lookahead.
In general, the parser is LL(k) since we can add
LOOKAHEAD(k) to the production rules.
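A minimal recursive-descent sketch of the same idea. It is written by hand in Python rather than generated by JavaCC, and the class Parser, its method names, and the calls log are assumptions for illustration. Each method stands in for one production, and the log records the order in which the methods are entered.

```python
class Parser:
    """Top-down parser for: term addOp term, where term is factor [* factor]."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of the input
        self.pos = 0
        self.calls = []                      # order in which methods are entered

    def peek(self):
        return self.tokens[self.pos]         # the one token of lookahead

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError("expected %r, found %r" % (expected, self.peek()))
        self.pos += 1

    def simple_expression(self):
        self.calls.append("simple_expression")
        self.term()                          # leftmost symbol is expanded first
        self.add_op()
        self.term()

    def add_op(self):
        self.calls.append("add_op")
        self.match("+")

    def term(self):
        self.calls.append("term")
        self.factor()
        if self.peek() == "*":               # LL(1): one token decides the choice
            self.match("*")
            self.factor()

    def factor(self):
        self.calls.append("factor")
        if self.peek() not in ("a", "b", "c"):
            raise SyntaxError("expected an identifier, found %r" % self.peek())
        self.pos += 1


def parse(tokens):
    parser = Parser(tokens)
    parser.simple_expression()
    parser.match("$")
    return parser.calls


if __name__ == "__main__":
    print(parse("a+b*c"))
```

For a+b*c the log starts with simple_expression, term, factor: the parser works its way down the leftmost branch before it looks at anything to the right.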
LL and LR Parsers, cont’d


What about a bottom-up parser?

Read the successive contents of the parse stack in the
shift-reduce trace for a + b*c from the last row up to the first.

It’s the rightmost nonterminal symbol that’s expanded first.

The R stands for a rightmost derivation,
which the parser constructs in reverse.

The shift-reduce parser is LR(k).
LL and LR Parsers, cont’d

Specific types of LR parsers:

SLR: Simple LR
LALR: Look-Ahead LR
See http://en.wikipedia.org/wiki/LR_parser