Syntax tree
Parse tree: interior nodes are non-terminals, leaves are terminals.
Syntax tree: interior nodes are “operators”, leaves are operands.
Parse tree: rarely constructed as a data structure.
Syntax tree: when a program is represented as a tree structure, a syntax tree is the usual choice.
Parse tree: represents the concrete syntax of a program.
Syntax tree: represents the abstract syntax of a program (closer to the semantics).
A syntax tree is often called an abstract syntax tree, or AST.
Syntax and parse tree examples
Grammar:
E → E * E
  | E + E
  | id
Program: a + b * c
Parse tree:
        E
      / | \
     E  +  E
     |   / | \
    id  E  *  E
        |     |
        id    id
Syntax tree:
      +
     / \
   id   *
       / \
     id   id
Why syntax tree
We want an intermediate representation because:
● It makes it possible to divide the compilation task into several passes.
We want several passes because:
● Some tasks would otherwise be hard to perform, especially in a bottom-up parser.
● In general we prefer modularity.
We want a syntax tree as the intermediate representation because:
● It corresponds more closely to the meaning of the various program constructs (to the semantics).
● It is more compact.
Creating a syntax tree
The top-down and bottom-up approaches differ a bit. We will focus on bottom-up construction, since we will implement a syntax tree in a bottom-up parser.
Grammar:
E → E + E
  | E * E
  | id
Grammar extended with semantic actions:
E1 → E2 + E3 { E1.t = create(E2.t, '+', E3.t) }
E1 → E2 * E3 { E1.t = create(E2.t, '*', E3.t) }
E → id { E.t = create(id.name) }
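The create(...) calls in the semantic actions could be sketched in C roughly as follows. This is a minimal sketch under assumptions that are not in the slides: the Node layout and the names create_leaf and create_op are hypothetical (C has no overloading, so the two forms of create get separate names), and allocation failure is simply asserted away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node layout: a leaf stores an identifier name,
   an interior node stores an operator and two subtrees. */
typedef struct Node {
    char op;              /* '+' or '*' for interior nodes, 0 for a leaf */
    char *name;           /* identifier name for a leaf, NULL otherwise */
    struct Node *left;
    struct Node *right;
} Node;

/* Corresponds to create(id.name): build a leaf for an identifier. */
Node *create_leaf(const char *name)
{
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->name = malloc(strlen(name) + 1);
    assert(n->name != NULL);
    strcpy(n->name, name);
    return n;
}

/* Corresponds to create(E2.t, op, E3.t): build an operator node
   whose children are two already constructed subtrees. */
Node *create_op(Node *left, char op, Node *right)
{
    Node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}
```

With these helpers, the syntax tree for a + b * c would be built bottom-up as create_op(create_leaf("a"), '+', create_op(create_leaf("b"), '*', create_leaf("c"))).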
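Syntax tree construction example
A: E1 → E2 + E3 { E1.t = create(E2.t, '+', E3.t) }
B: E1 → E2 * E3 { E1.t = create(E2.t, '*', E3.t) }
C: E → id { E.t = create(id.name) }
Parse table (a number means shift or goto to that state, a letter means reduce by that rule):
State | id | +  | *  | $   | E
0     | 2  |    |    |     | 1
1     |    | 3  | 4  | acc |
2     |    | C  | C  | C   |
3     | 2  |    |    |     | 5
4     | 2  |    |    |     | 6
5     |    | A  | 4  | A   |
6     |    | B  | B  | B   |
Input: id + id * id
Stack               | Remaining input | Action
$0                  | id + id * id $  | shift
$0 id2              | + id * id $     | C: create(id.name)
$0 E1               | + id * id $     | shift
$0 E1 +3            | id * id $       | shift
$0 E1 +3 id2        | * id $          | C: create(id.name)
$0 E1 +3 E5         | * id $          | shift
$0 E1 +3 E5 *4      | id $            | shift
$0 E1 +3 E5 *4 id2  | $               | C: create(id.name)
$0 E1 +3 E5 *4 E6   | $               | B: create(E.t,'*',E.t)
$0 E1 +3 E5         | $               | A: create(E.t,'+',E.t)
$0 E1               | $               | accept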
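A table-driven run like this one can be sketched in C. The sketch below makes several assumptions of its own: it uses a seven-state LR table for E → E + E | E * E | id in which the state reached after E + E shifts on * but reduces on + and $, the names RA, RB, RC, ACC, goto_E and parse are hypothetical, and for brevity the "tree" kept on the value stack is a prefix-notation string rather than a linked node structure.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { T_ID, T_PLUS, T_STAR, T_END };          /* table columns: id + * $ */
enum { RA = -1, RB = -2, RC = -3, ACC = 99 };  /* reduce by A, B, C; accept */

/* ACTION table: n > 0 means shift to state n, 0 means syntax error. */
static const int action[7][4] = {
    /*        id   +    *    $   */
    /* 0 */ {  2,   0,   0,   0  },
    /* 1 */ {  0,   3,   4,  ACC },
    /* 2 */ {  0,  RC,  RC,  RC  },
    /* 3 */ {  2,   0,   0,   0  },
    /* 4 */ {  2,   0,   0,   0  },
    /* 5 */ {  0,  RA,   4,  RA  },
    /* 6 */ {  0,  RB,  RB,  RB  },
};
/* GOTO column for E, indexed by the state exposed after popping a handle. */
static const int goto_E[7] = { 1, 0, 0, 5, 6, 0, 0 };

#define MAXDEPTH 32
#define MAXTEXT 128

/* Parse a token sequence ending in T_END. names[i] holds the identifier
   name for each T_ID token. On success the tree is written to out in
   prefix form and 1 is returned; on a syntax error 0 is returned. */
int parse(const int *toks, const char *names[], char *out)
{
    int states[MAXDEPTH];            /* state stack */
    char values[MAXDEPTH][MAXTEXT];  /* value stack, parallel to states */
    int top = 0, i = 0;

    states[0] = 0;
    values[0][0] = '\0';
    for (;;) {
        int act = action[states[top]][toks[i]];
        if (act == ACC) {
            strcpy(out, values[top]);
            return 1;
        }
        if (act == 0)
            return 0;
        if (act > 0) {               /* shift: push state and token value */
            assert(top + 1 < MAXDEPTH);
            top++;
            states[top] = act;
            snprintf(values[top], MAXTEXT, "%s",
                     toks[i] == T_ID ? names[i] : "");
            i++;
        } else if (act == RC) {      /* C: pop id, push E; the value stays */
            states[top] = goto_E[states[top - 1]];
        } else {                     /* A or B: pop E op E, push E */
            char node[MAXTEXT];
            snprintf(node, MAXTEXT, "(%c %s %s)",
                     act == RA ? '+' : '*', values[top - 2], values[top]);
            top -= 2;
            strcpy(values[top], node);
            states[top] = goto_E[states[top - 1]];
        }
    }
}
```

Calling parse on the tokens for a + b * c goes through the same sequence of shifts and reductions (C, C, C, B, A) as the example and yields the string "(+ a (* b c))".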
Bison (Yacc)
YACC: Yet Another Compiler Compiler. A program that generates a parser from a CFG.
Bison: An open source alternative. Fairly compatible input format and program options.
Basically:
● Specify your parser by writing the grammar of your language in a text file.
● Feed bison with that file and you get a C-file and (with the -d option) an H-file as output. The H-file contains e.g. token definitions.
● Compile the C-file and you have a parser!
The input file to Bison
There are three sections in the specification file to bison:
Definitions
%%
Grammar rules
%%
C-code
%% is a marker to delimit the sections
- As you can see it follows the same pattern as the flex input.
Definition section
Example of definitions:
● Define tokens
● Define operator precedence
● Define operator associativity
● Define the types of grammar symbols
● Re-define yylval
● Write C-code (analogous to flex)
● Issue certain commands to Bison
Token definition
Normal case
%token IDENTIFIER
%token WHILE
Token, precedence and associativity – all in one (lowest precedence
at the top)
%left <yyOperator> RELOP
%left <yyOperator> MINUSOP PLUSOP
%right <yyOperator> NOTOP
Defining types
Just enter the type within <> before the list of tokens
%left <Operator> RELOP
%left <Operator> MULOP
%right <Operator> NOTOP UNOP
%token <String> ID STRING
Or the same for non-terminals
%type <Node> stmnt expr actuals exprs
What is a type in this context?
Re-definition of yylval
yylval is the lexical analyzer's variable for specifying token attributes. By default it is an int. Usually that is not sufficient, and usually we want more than one kind of attribute (remember the name of an ID and the value of an integer constant from lab 1). We redefine it with a Bison declaration named %union. Note that this is not C code; Bison generates C code from it, normally a C union.
%union {
int Operator;
char *String;
NODE_TYPE Node;
}
Note: the member name is used as the “type” specification, e.g.:
%token <String> ID STRING
Re-definition of yylval (cont.)
When the parser shifts, yylval is pushed onto a stack parallel to the stack containing the states. This stack is named the value stack.
The re-definition of yylval also changes the type of the value stack elements, to keep the two consistent.
Assigning types to tokens and grammar symbols using the <> syntax has two advantages:
● You don't need to write the member names.
● Bison will perform type checking.
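The relationship between the state stack, the value stack and the union members can be sketched in plain C. This is only an illustration under stated assumptions, not the code Bison generates: the names Value, push, pop and dollar are hypothetical, and struct node stands in for NODE_TYPE.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for NODE_TYPE (assumption). */
struct node { char op; struct node *left, *right; };

/* Mirrors the %union declaration: one member per "type" name. */
typedef union {
    int Operator;
    char *String;
    struct node *Node;
} Value;

#define STACK_MAX 64
static int   state_stack[STACK_MAX];  /* parser states */
static Value value_stack[STACK_MAX];  /* semantic values, same indices */
static int   top = -1;                /* index of the top entry, -1 if empty */

/* Shift: the new state and the current yylval are pushed together. */
void push(int state, Value v)
{
    assert(top + 1 < STACK_MAX);
    top++;
    state_stack[top] = state;
    value_stack[top] = v;
}

/* Reduce: the n right-hand-side symbols are popped from both stacks. */
void pop(int n)
{
    assert(n <= top + 1);
    top -= n;
}

/* For a rule with n right-hand-side symbols, $k lives in slot top - n + k.
   Declaring a symbol as <String> corresponds to always selecting the
   .String member of that slot. */
Value *dollar(int k, int n)
{
    assert(k >= 1 && k <= n && n <= top + 1);
    return &value_stack[top - n + k];
}
```

For a rule with three right-hand-side symbols, dollar(1, 3) is the deepest of the three slots and dollar(3, 3) is the top of the stack. Writing the member name by hand at every use (for example dollar(1, 3)->Node) is the bookkeeping that the <> declarations take over.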
Grammar rules section
Grammar rule syntax (example):
decl   : BASIC_TYPE idents ';'
       ;
idents : idents ',' ident
       | ident
       ;
ident  : ID
       ;
Integrating semantic actions
Grammar rule syntax (example):
decl : BASIC_TYPE idents ';' { $$ = function($1, $2); }
     ;
Bison translates $$, $1, $2 etc. to C variable names.
When this grammar rule is reduced, the C code corresponding to “$$ = function($1, $2);” is executed.
Any C code may appear there; this example gives an idea of how to connect subtrees to a parent tree in a bottom-up fashion.
$1 means “BASIC_TYPE.attribute” (the yylval that was previously shifted onto the value stack)
$2 means “idents.attribute” (the value pushed on the stack when idents was reduced)
$$ means “decl.attribute”
Note that after this reduction BASIC_TYPE, idents and ';' have been popped and decl has been pushed.
Semantic actions (cont.)
We had this before, when we built a tree while executing a parser “virtually”:
A: E1 → E2 + E3 { E1.t = create(E2.t, '+', E3.t) }
B: E1 → E2 * E3 { E1.t = create(E2.t, '*', E3.t) }
C: E → id { E.t = create(id.name) }
Note the similarity with the Bison specification file:
E : E '+' E { $$ = create($1, '+', $3); }
...
Code section
Like in a flex specification file, the Bison specification file has a last
section where C-code can be written.
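To see how the three sections fit together, a minimal specification file might look like the sketch below. It is an illustration only: the grammar is the small expression grammar used earlier, yylex is assumed to be supplied by a flex-generated scanner that sets yylval.String for ID tokens, and the actions just print which rule is being reduced instead of building a tree.

```yacc
%{
/* C declarations copied verbatim into the generated parser. */
#include <stdio.h>
int yylex(void);                 /* assumed to come from the flex scanner */
void yyerror(const char *msg);
%}

%union {
    char *String;
}

%token <String> ID
%left '+'                        /* lowest precedence first */
%left '*'

%%

expr : expr '+' expr   { printf("reduce by E -> E + E\n"); }
     | expr '*' expr   { printf("reduce by E -> E * E\n"); }
     | ID              { printf("reduce by E -> id (%s)\n", $1); }
     ;

%%

/* Code section: ordinary C, here only the error callback. */
void yyerror(const char *msg)
{
    fprintf(stderr, "%s\n", msg);
}
```

A main function that calls yyparse() could also be placed in the code section, or kept in a separate C file.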