Yacc Actions

Compiler Structures
241-437, Semester 1, 2011-2012
7. Yacc
• Objective
  – describe yacc (actually bison)
  – give simple examples of its use
241-437 Compilers: Yacc/7
Overview
1. What is Yacc?
2. Format of a yacc/bison File
3. Expressions Compiler
4. Bottom-up Parsing Reminder
5. Expression Conflicts
6. Precedence/Associativity in yacc
7. Dangling Else Conflict
8. Left and Right Recursion
9. Error Recovery
10. Embedded Actions
11. More Information
1. What is Yacc?
• Yacc (Yet Another Compiler Compiler) is a tool for translating a context free grammar into a bottom-up LALR parser
  – it creates a parse table like that described in the last chapter
• Yacc is used with lex to create compilers.
• Most people use bison, a much improved version of yacc
  – on most modern Unixes, when you call yacc, you're really using bison
• bison works with flex (the fast version of lex).
Bison and Flex
[Figure: flex turns foo.l (a flex file) into lex.yy.c; bison turns foo.y (a bison file) into foo.tab.c, which #includes lex.yy.c; the C compiler turns foo.tab.c into foo, a C executable; foo reads a source program and writes parsed output.]
$ flex foo.l
$ bison foo.y
$ gcc foo.tab.c -o foo
$ ./foo < program.txt
Compiler Components (in foo)
[Figure: Source Program -> Lexical Analyzer (lex.yy.c, using chars) -> Syntax Analyzer (foo.tab.c, using tokens) -> parsed output]
1. The syntax analyzer gets the next token by calling yylex()
2. The lexical analyzer gets chars to make a token
3. The lexical analyzer returns the token, token value, and token type
– the lexical analyzer reports lexical errors; the syntax analyzer reports syntax errors
Inside foo.tab.c
[Figure: the LALR parser reads the input tokens a1 a2 … ai … an $, keeps a stack of pairs Xm sm, Xm-1 sm-1, …, X0 s0, consults the parse table (actions and gotos), and writes parsed output.]
– X is a terminal or non-terminal, s is a state
– bison creates the parse table based on your grammar
2. Format of a yacc/bison File
declarations: C data and yacc definitions (or nothing)
%%
Grammar rules (with actions)
%%
#include "lex.yy.c"
C functions, including main()
Declarations
• C data is put between %{ and %}
• The yacc definitions list the tokens (terminals) used in the grammar
    %token terminal1 terminal2 ...
• Other yacc definitions:
  – %left and %right for associativity
  – %prec for precedence (written inside a grammar rule, to override that rule's precedence)
• Precedence
  example: 2 + 3 * 5
  – does it mean (2 + 3) * 5
    or 2 + (3 * 5) ?
• Associativity
  example: 1 - 1 - 1
  – does it mean (1 - 1) - 1      // left
    or 1 - (1 - 1) ?              // right
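The two questions above can be checked against a language that has already answered them. The following is a small sketch in C (not yacc), with function names made up for this illustration; it spells out both groupings of each expression and then lets the C compiler pick for the unparenthesised form.

```c
/* Both possible groupings of 2 + 3 * 5. */
int times_first(void) { return 2 + (3 * 5); }   /* '*' binds tighter */
int plus_first(void)  { return (2 + 3) * 5; }   /* '+' binds tighter */

/* Both possible groupings of 1 - 1 - 1. */
int left_assoc(void)  { return (1 - 1) - 1; }   /* left associative  */
int right_assoc(void) { return 1 - (1 - 1); }   /* right associative */

/* What C itself does when no parentheses are given. */
int c_precedence(void)    { return 2 + 3 * 5; }
int c_associativity(void) { return 1 - 1 - 1; }
```

C gives '*' higher precedence than '+' and makes '-' left associative, so c_precedence() agrees with times_first() and c_associativity() agrees with left_assoc().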
Rules
• The grammar part is the same as:
    nonterminal → body1 | body2 | ... | bodyN
• Rule format:
    nonterminal : body 1    {action 1}
                | body 2    {action 2}
                  . . .
                | body n    {action n}
                ;
• Actions are optional; they are C code.
• Actions are usually at the end of a body, but can be placed anywhere.
3. Expressions Compiler
[Figure: flex turns expr.l (a flex file) into lex.yy.c; bison turns expr.y (a bison file) into expr.tab.c, which #includes lex.yy.c; gcc turns expr.tab.c into exprEval, a C executable.]
$ flex expr.l
$ bison expr.y
$ gcc expr.tab.c -o exprEval
Usage
$ ./exprEval
2 + 3              <- I typed this line
Value = 5
2 - (5 * 2)        <- I typed this line
Value = -8
1 / 3              <- I typed this line
Value = 0
$                  <- I typed ctrl-D
expr.l
%%
[-+*/()\n]   { return *yytext; }
[0-9]+       { yylval = atoi(yytext);
               return(NUMBER);
             }
[ \t]        ;    /* skip whitespace */
%%
int yywrap(void)
{ return 1; }

– RE actions usually end with a return. The token is passed to the syntax analyser.
– No main() function
Lex File Format Reminder
• A lex program has three sections:
    REs and/or C code
    %%
    RE/action rules
    %%
    C functions
expr.y
%{
#include <stdio.h>        /* for printf(); declarations needed by modern C compilers */
int yylex(void);
int yyerror(char *s);
%}
%token NUMBER                                  /* declarations */
%%
                                               /* rules, with attributes in the actions */
exprs: expr '\n'          { printf("Value = %d\n", $1); }
     | exprs expr '\n'    { printf("Value = %d\n", $2); }
     ;
expr: expr '+' term       { $$ = $1 + $3; }
    | expr '-' term       { $$ = $1 - $3; }
    | term                { $$ = $1; }
    ;
term: term '*' factor     { $$ = $1 * $3; }
    | term '/' factor     { $$ = $1 / $3; }    /* integer division */
    | factor
    ;
factor: '(' expr ')'      { $$ = $2; }
      | NUMBER
      ;
%%
#include "lex.yy.c"
                                               /* C code */
int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{ yyparse();     // the syntax analyzer
  return 0;
}
Yacc Actions
• yacc actions (the C code) can use attributes (variables).
• Each body terminal/non-terminal has an attribute, which contains its return value.
Attributes
• An attribute is $n, where n is the position of the terminal/non-terminal in the body, starting at 1
  – $1 = first terminal/non-terminal of the body
  – $2 = second one
  – etc.
  – $$ = return value for the rule
• the default value for $$ is the $1 value
Evaluation in yacc
Input: 3 * 5 + 4\n

Stack      val     Input       Action               Rule
$          _       3*5+4\n$    shift
$ 3        3       *5+4\n$     reduce F → num       $$ = $1 (implicit)
$ F        3       *5+4\n$     reduce T → F         $$ = $1 (implicit)
$ T        3       *5+4\n$     shift
$ T *      3       5+4\n$      shift
$ T * 5    3 5     +4\n$       reduce F → num       $$ = $1 (implicit)
$ T * F    3 5     +4\n$       reduce T → T * F     $$ = $1 * $3
$ T        15      +4\n$       reduce E → T         $$ = $1 (implicit)
$ E        15      +4\n$       shift
$ E +      15      4\n$        shift
$ E + 4    15 4    \n$         reduce F → num       $$ = $1 (implicit)
$ E + F    15 4    \n$         reduce T → F         $$ = $1 (implicit)
$ E + T    15 4    \n$         reduce E → E + T     $$ = $1 + $3
$ E        19      \n$         shift
$ E \n     19      $           reduce Es → E \n     printf $1
$ Es       19      $           accept
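The val column can be mimicked with an ordinary array used as a stack. The following is a minimal sketch in C, not bison's generated code: the names vals, shift_val, reduce_binary and eval_example are invented for this illustration, and it assumes that every symbol, including an operator token, occupies one slot, so that $3 is the third symbol of the body.

```c
#include <stddef.h>

#define MAX_VALS 64

/* Hypothetical value stack: one int slot per grammar symbol on the parse stack. */
static int vals[MAX_VALS];
static size_t top = 0;              /* number of slots in use */

/* Shift: push a token's attribute (0 for tokens that carry no value). */
void shift_val(int v) { vals[top++] = v; }

/* Reduce a three-symbol body "left op right": read $1 and $3,
   pop the body, and push the computed $$ in its place. */
int reduce_binary(char op)
{
    int d1 = vals[top - 3];         /* $1 */
    int d3 = vals[top - 1];         /* $3 */
    int dd = (op == '*') ? d1 * d3 : d1 + d3;   /* $$ */
    top -= 3;
    vals[top++] = dd;
    return dd;
}

/* Replays the value-stack effects of parsing 3 * 5 + 4.
   Single-symbol reductions (F -> num, T -> F, E -> T) copy $1 to $$,
   which leaves the top slot unchanged, so they need no code here. */
int eval_example(void)
{
    top = 0;
    shift_val(3); shift_val(0); shift_val(5);   /* 3 * 5 */
    reduce_binary('*');                         /* T -> T * F */
    shift_val(0); shift_val(4);                 /* + 4 */
    return reduce_binary('+');                  /* E -> E + T */
}
```

Calling eval_example() walks the same reductions as the table and returns the value that the final printf would show.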
4. Bottom-up Parsing Reminder
• Simple expressions grammar:
    E => E '+' E    // rule r1
    E => E '*' E    // rule r2
    E => id         // rule r3
Parsing "x + y * z"
 1.  . x + y * z        // shift
 2.  x . + y * z        // reduce(r3)
 3.  E . + y * z        // shift
 4.  E + . y * z        // shift
 5.  E + y . * z        // reduce(r3)
 6.  E + E . * z        // shift
 7.  E + E * . z        // shift
 8.  E + E * z .        // reduce(r3)
 9.  E + E * E .        // reduce(r2)
10.  E + E .            // reduce(r1)
11.  E .                // accept
Shift/Reduce Conflict
• At step 6, a shift or a reduce is possible.
    6.  E + E . * z
    7.  E . * z          // reduce (r1)
    :
• What should be done?
  – by default, yacc (bison) shifts
Reduce/Reduce Conflict
• Modify the grammar to include:
    E => T     // new rule r3
    E => id    // rule r4
    T => id    // rule r5
• Consider step 2:
    x . + y * z
• There are two ways to reduce:
    E . + y * z    // reduce (r4)
  or
    T . + y * z    // reduce (r5)
• What should be done?
  – by default, yacc (bison) reduces using the first possible rule (i.e. rule r4)
Common Conflicts
• The two most common shift/reduce problems in programming languages are:
  – expression precedence
  – dangling else
• yacc has features for fixing both of these
• Reduce/reduce problems are usually due to errors in your grammar.
Debugging Conflicts
• bison can generate extra conflict information, which can help you debug your grammar.
  – use the -v option
5. Expression Conflicts
in shiftE.y
%token NUMBER
%%
expr: expr '+' expr     /* shift/reduce here, as in previous example */
    | expr '*' expr
    | '(' expr ')'
    | NUMBER
    ;
%%
#include "lex.yy.c"

int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{ yyparse();
  return 0;
}
Example
• When the parsing state is:
    expr '+' expr . '*' z
  should bison shift:
    expr '+' expr '*' . z
  or reduce?:
    expr . '*' z          // using rule 1
Using -v
$ bison shiftE.y
shiftE.y: conflicts: 4 shift/reduce
$ bison -v shiftE.y
shiftE.y: conflicts: 4 shift/reduce
– creates a shiftE.output file with extra conflict information
Inside shiftE.output
State 9 conflicts: 2 shift/reduce       <- states 9 and 10 are the problems
State 10 conflicts: 2 shift/reduce

Grammar                                 <- the rules are numbered
    0 $accept: expr $end
    1 expr: expr '+' expr
    2     | expr '*' expr
    3     | '(' expr ')'
    4     | NUMBER
    :
    // many state blocks
state 9         <- when bison is looking at these kinds of parsing states
    1 expr: expr . '+' expr
    1     | expr '+' expr .
    2     | expr . '*' expr

    '+'  shift, and go to state 6             <- bison does this
    '*'  shift, and go to state 7

    '+'       [reduce using rule 1 (expr)]    <- but it could do this
    '*'       [reduce using rule 1 (expr)]
    $default  reduce using rule 1 (expr)
state 10        <- when bison is looking at these kinds of parsing states
    1 expr: expr . '+' expr
    2     | expr . '*' expr
    2     | expr '*' expr .

    '+'  shift, and go to state 6             <- bison does this
    '*'  shift, and go to state 7

    '+'       [reduce using rule 2 (expr)]    <- but it could do this
    '*'       [reduce using rule 2 (expr)]
    $default  reduce using rule 2 (expr)
What causes Expression Conflicts?
• The problems are the precedence and associativity of the operators:
  – does 2 + 3 * 5 mean (2 + 3) * 5 or 2 + (3 * 5) ?    // should be 2nd
  – does 1 - 1 - 1 mean (1 - 1) - 1 or 1 - (1 - 1) ?    // should be 1st
• * should have higher precedence than +, and - should be left associative.
6. Precedence/Associativity in yacc
• The declarations section can contain associativity and precedence settings for tokens:
  – %left, %right, %nonassoc
  – precedence is given by the order of the lines (later lines have higher precedence)
• Example:
    %left '+' '-'
    %left '*' '/'
  All left associative, with '*' and '/' higher precedence than '+' and '-'.
Expressions Variables Compiler
[Figure: flex turns exprVars.l (a flex file) into lex.yy.c; bison turns exprVars.y (a bison file) into exprVars.tab.c, which #includes lex.yy.c; gcc turns exprVars.tab.c into exprVarsEval, a C executable.]
$ flex exprVars.l
$ bison exprVars.y
$ gcc exprVars.tab.c -o exprVarsEval
Usage
$ ./exprVarsEval
2 + 5 * 3          <- I typed this line
Value = 17
1 - 1 - 1          <- I typed this line
Value = -1
a = 3 * 4          <- I typed this line
a                  <- I typed this line
Value = 12
b = (3 - 6) * a    <- I typed this line
b                  <- I typed this line
Value = -36
$                  <- I typed ctrl-D
exprVars.l
/* Added: RE vars, token names, VAR token,
   assignment, error msgs */
digits    [0-9]+
letter    [a-z]
%%
\n        return('\n');
\=        return(ASSIGN);
\+        return(PLUS);
\-        return(MINUS);
\*        return(TIMES);
\/        return(DIV);
\(        return(LPAREN);
\)        return(RPAREN);
{letter}  { yylval = *yytext - 'a';
            return(VAR);
          }
{digits}  { yylval = atoi(yytext);
            return(NUMBER);
          }
[ \t]     ;    /* skip whitespace */
.         yyerror("Invalid char");    /* reject everything else */
%%
int yywrap(void)
{ return 1; }

– the token names are defined in the yacc file
exprVars.y
/* Added: token names, assoc/precedence ops,
   changed grammar rules, vars and assignment. */
%token VAR NUMBER ASSIGN PLUS MINUS TIMES
       DIV LPAREN RPAREN
%left PLUS MINUS
%left TIMES DIV
%{
#include <stdio.h>       /* for printf(); declarations needed by modern C compilers */
int yylex(void);
int yyerror(char *s);
int symbol[26];          // stores var's values
%}
%%
program: program statement '\n'
       |
       ;
statement: expr               { printf("Value = %d\n", $1); }
         | VAR ASSIGN expr    { symbol[$1] = $3; }
         ;
expr: NUMBER
    | VAR                     { $$ = symbol[$1]; }
    | expr PLUS expr          { $$ = $1 + $3; }
    | expr MINUS expr         { $$ = $1 - $3; }
    | expr TIMES expr         { $$ = $1 * $3; }
    | expr DIV expr           { $$ = $1 / $3; }    /* integer division */
    | LPAREN expr RPAREN      { $$ = $2; }
    ;
%%
#include "lex.yy.c"

int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
}

int main(void)
{ yyparse();
  return 0;
}
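The variable handling rests on one trick: the lexer turns a lowercase letter into an index with *yytext - 'a', and the parser uses that index into symbol[26]. A minimal stand-alone sketch of the same idea in C follows; the helper names var_index, assign_var and lookup_var are invented for this illustration and do not appear in exprVars.l or exprVars.y.

```c
/* One slot per variable name 'a'..'z'; file-scope ints start at 0. */
static int symbol[26];

/* Same mapping as the lexer action: 'a' -> 0, 'b' -> 1, ..., 'z' -> 25. */
int var_index(char name) { return name - 'a'; }

/* What the action for "VAR ASSIGN expr" does with $1 and $3. */
void assign_var(char name, int value) { symbol[var_index(name)] = value; }

/* What the action for "expr: VAR" does with $1. */
int lookup_var(char name) { return symbol[var_index(name)]; }
```

With these helpers, assign_var('a', 3 * 4) followed by assign_var('b', (3 - 6) * lookup_var('a')) reproduces the session in the Usage slide. The sketch assumes names are lowercase letters; anything else would index outside the array.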
7. Dangling Else Conflict
in iffy.y
%token IF ELSE variable
%%
stmt: expr
    | if_stmt
    ;
if_stmt: IF expr stmt
       | IF expr stmt ELSE stmt
       ;
expr: variable
    ;

$ bison -v iffy.y
iffy.y: conflicts: 1 shift/reduce
Shift or Reduce?
    if (x < 5)
      if (x < 3)
        y = a - b;
      else y = b - a;
• Current state:
  – IF expr IF expr stmt . ELSE stmt
• Shift choice:
  – IF expr IF expr stmt . ELSE stmt
  – IF expr IF expr stmt ELSE . stmt
  – IF expr IF expr stmt ELSE stmt .
  – IF expr stmt .
  the ELSE is paired with the second (nearest) IF
    if (x < 5)
      if (x < 3)
        y = a - b;
      else y = b - a;
• Reduce option:
  – IF expr IF expr stmt . ELSE stmt
  – IF expr stmt . ELSE stmt
  – IF expr stmt ELSE . stmt
  – IF expr stmt ELSE stmt .
  the ELSE is paired with the first IF
Inside iffy.output
State 8 conflicts: 1 shift/reduce

Grammar
    0 $accept: stmt $end
    1 stmt: expr
    2     | if_stmt
    3 if_stmt: IF expr stmt
    4        | IF expr stmt ELSE stmt
    5 expr: variable
    :
    // many state blocks
state 8         <- when bison is looking at these kinds of parsing states
    3 if_stmt: IF expr stmt .
    4        | IF expr stmt . ELSE stmt

    ELSE  shift, and go to state 9               <- bison does this

    ELSE      [reduce using rule 3 (if_stmt)]    <- but it could do this
    $default  reduce using rule 3 (if_stmt)
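Bison's default shift pairs the ELSE with the nearest IF, which is also how C resolves its own dangling else. The following sketch in C uses a function name invented for this illustration; the braces are left out on purpose, so some compilers may warn about the ambiguous else.

```c
/* Returns 1 if the inner "then" branch ran, 2 if the else branch ran,
   and 0 if neither ran. The indentation shows the pairing C chooses. */
int which_branch(int x)
{
    int y = 0;
    if (x < 5)
        if (x < 3)
            y = 1;
        else
            y = 2;      /* belongs to "if (x < 3)", the nearest if */
    return y;
}
```

If the else were paired with the first if instead (the reduce option), an x of 5 or more would reach the else branch; with the nearest-if pairing it reaches neither branch.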
8. Left and Right Recursion
• A left recursive rule:
    list: item
        | list ',' item
        ;
• A right recursive rule:
    list: item
        | item ',' list
        ;
• Left recursion keeps the parser stack smaller, so may be a better choice
  – this is the opposite of top-down
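The difference in stack use can be counted by hand. The following is a rough sketch in C that replays the shifts and reductions a bottom-up parser would make for n comma-separated items under each rule, counting only grammar symbols on the stack; the function names are invented for this illustration and this is not bison's code.

```c
/* Larger of two ints, used to track the peak depth. */
static int max_int(int a, int b) { return a > b ? a : b; }

/* Peak stack depth for n >= 1 items with   list: item | list ',' item ;
   Each item can be reduced into the list as soon as it is shifted. */
int peak_depth_left(int n)
{
    int depth = 0, peak = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            depth++;                 /* shift ',' */
        depth++;                     /* shift item */
        peak = max_int(peak, depth);
        depth -= (i == 0) ? 1 : 3;   /* pop the body: item, or list ',' item */
        depth++;                     /* push list */
    }
    return peak;
}

/* Peak stack depth with   list: item | item ',' list ;
   No reduction is possible until the last item has been shifted. */
int peak_depth_right(int n)
{
    int depth = 0, peak = 0;
    for (int i = 0; i < n; i++) {
        if (i > 0)
            depth++;                 /* shift ',' */
        depth++;                     /* shift item */
        peak = max_int(peak, depth);
    }
    /* The reductions that follow only shrink the stack, so peak is final. */
    return peak;
}
```

In this model the left recursive rule never needs more than three symbols on the stack, while the right recursive rule needs 2n - 1, which grows with the length of the list.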
9. Error Recovery
• When a syntax error occurs, yacc/bison calls yyerror() and, by default, stops parsing.
• A better approach is to call yyerror(), then try to continue
  – this can be done by using the keyword error in the grammar rules
Example
• If there's an error in the stmt rule, then skip the rest of the input tokens until ';' or '}' is seen, then continue as before:
    stmt: ';'
        | expr ';'
        | VAR '=' expr ';'
        | '{' stmt_list '}'
        | error ';'
        | error '}'
        ;
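The "skip until ';' or '}'" step is the familiar panic-mode idea. The following is a hand-written analogue in C, assuming a hypothetical token stream stored as one char per token; skip_to_sync is a name invented for this illustration. Bison's real recovery also pops parser states before it starts discarding tokens, which this sketch does not model.

```c
#include <stddef.h>

/* Starting at tokens[pos], discard tokens until a synchronizing
   ';' or '}' is found. Returns the index of the token after it,
   or n if the input ends first. */
size_t skip_to_sync(const char *tokens, size_t pos, size_t n)
{
    while (pos < n && tokens[pos] != ';' && tokens[pos] != '}')
        pos++;                          /* discard this token */
    return (pos < n) ? pos + 1 : n;     /* resume after the ';' or '}' */
}
```

For the token string "x=@3;y=2;" with the error detected at index 2, parsing would resume at index 5, the start of the next statement.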
10. Embedded Actions
• Actions can be placed anywhere in a rule, not just at the end:
    listPair: item1 { do_item1($1); }
              item2 { do_item2($3); }
            ;
  – the action variable in the second action block is $3 since the first action block is counted as part of the rule
11. More Information
• Lex and Yacc (in our library)
  by Levine, Mason, and Brown
  O'Reilly; 2nd edition
• On UNIX:
  – man yacc
  – info yacc
• A Compact Guide to Lex & Yacc
  by Tom Niemann
  http://epaperpress.com/lexandyacc/
  – with several yacc calculator examples, which I'll be discussing in the next few chapters
• The Lex & Yacc Page
  – documentation and tools
  http://dinosaur.compilertools.net/
• Compiler Construction using Flex and Bison
  by Anthony A. Aaby
  – in the "Useful Info" subdirectory of the course website