YACC

• Syntax error handling
– Errors can occur at many levels
• lexical: unknown operator
• syntactic: unbalanced parentheses
• semantic: variable never declared
• runtime: reference a NULL pointer
– Goals of error-handling in a parser
• To detect and report the presence of errors
• To recover from an error and detect subsequent
errors
• To not slow down the processing of correct
programs
Error recovery strategies
• Panic mode recovery
– On discovering an error, discard input symbols
one at a time until one of a designated set of
synchronizing tokens is found.
• Phrase-level recovery
– On discovering an error, perform a local fix to
allow the parser to continue.
• Error recovery in predictive parsing
– Recovery in a non-recursive predictive parser is
easier than in a recursive descent parser.
– Panic mode recovery
• If a terminal is on top of the stack and does not match the
input, pop the terminal.
• If a non-terminal is on top of the stack, skip input symbols
until one is found on which the non-terminal can be expanded.
– Phrase-level recovery
• Carefully fill in the blank entries of the parsing table
with what to do when that error occurs.
– Error recovery in LR parsing
• Canonical LR parsers never make extra reductions
when recognizing an error.
• SLR and LALR may make extra reductions, but will
never shift an erroneous input symbol on the stack.
• Panic mode recovery
– Scan down the stack until a state representing a major
program construct (a state with a goto on some nonterminal A)
is found. Input symbols are then discarded until one is found
that is in FOLLOW(A). The parser resumes as if A had been
recognized; the aim is to isolate the phrase containing the error.
• Phrase level recovery
– Implement an error recovery routine for each error entry in
the table.
– Writing a parser with YACC (Yet Another
Compiler Compiler).
• Generates LALR parsers
• Works with lex. YACC calls yylex() to get the next token.
– YACC and lex must agree on the values for each token.
• Produces the file y.tab.c via “yacc yaccfile”; it contains the
routine yyparse().
• yyparse() returns 0 if the program is ok, non-zero otherwise
• YACC file format:
declarations
%%
translation rules
%%
supporting C-routines
• The declarations part specifies tokens, non-terminal
symbols, and other C constructs.
– To specify token AAA BBB
• %token AAA BBB
– To assign a token number to a token (needed when using lex), put a
nonnegative integer immediately after the first appearance
of the token
• %token EOFnumber 0
• %token SEMInumber 101
– Non-terminals do not need to be declared unless you want to
associate them with a type (discussed later).
• Translation rules specify the grammar productions
exp : exp PLUSnumber exp
| exp MINUSnumber exp
| exp TIMESnumber exp
| exp DIVIDEnumber exp
| LPARENnumber exp RPARENnumber
| ICONSTnumber
;
exp : exp PLUSnumber exp
;
exp : exp MINUSnumber exp
;
• Yacc environment
– Yacc processes the specification file and produces a y.tab.c file.
– An integer function yyparse() is produced by Yacc.
• Calls yylex() to get tokens.
• Return non-zero when an error is found.
• Return 0 if the program is accepted.
– Need main() and yyerror() functions.
– Example:
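#include <stdio.h>
extern int yyline;   /* current line number; assumed to be kept by the scanner */
int yyparse(void);   /* generated by yacc in y.tab.c */

/* Called by yyparse() when a syntax error is detected. */
int yyerror(const char *str)
{
    printf("yyerror: %s at line %d\n", str, yyline);
    return 0;
}

int main(void)
{
    if (!yyparse()) printf("accept\n");   /* 0 means the input was accepted */
    else printf("reject\n");
    return 0;
}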
– YACC builds an LALR parser for the grammar.
• May have shift/reduce and reduce/reduce conflicts if there are
problems with the grammar.
• Default conflict resolution:
– shift/reduce --> shift
– reduce/reduce --> the production listed first in the grammar
– should always avoid reduce/reduce conflicts
• ‘yacc -v *.y’ will generate a report in file ‘y.output’.
• See example1.y
• The programmer MUST resolve all conflicts (unless you really
know what you are doing).
– modify the grammar. See example2.y
– Use precedence and associativity of operators.
• Use precedence and associativity of
operators.
– Using keywords %left, %right, %nonassoc in
the declarations section.
• All tokens on the same line have the same precedence
level and associativity.
• The lines are listed in order of increasing
precedence.
%left PLUSnumber, MINUSnumber
%left TIMESnumber, DIVIDEnumber
– See example3.y
• Symbol attributes
– Each symbol can be associated with some
attributes.
• The data structure of the attributes can be specified in a %union in
the declarations section (see example4.y).
%union {
int semantic_value;
}
%token <semantic_value> ICONSTnumber
%type <semantic_value> exp
%type <semantic_value> term
%type <semantic_value> item
• Semantic actions
– Semantic actions associated with productions can be
specified.
item : LPARENnumber exp RPARENnumber
{$$ = $2;}
| ICONSTnumber
{$$ = $1;}
;
• $$ is the attribute associated with the left-hand side of the
production
• $1 is the attribute associated with the first symbol on the
right-hand side, $2 with the second symbol, …
– An action can appear anywhere in a production; an action in the
middle of a production is also counted as a symbol when numbering $n.
– See example5.y for examples with multiple
types associated with different symbols.