Syntax and Semantics
• The Purpose of Syntax
• Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Derivations and Parse Trees
Sebesta Chapter 3
What is Syntax and Semantics
• Syntax and Semantics define a PL
• Syntax
– form or structure of program units
• expressions, statements, declarations, etc.
• Semantics
– meaning of program units
• expressions, statements, declarations, etc.
• Why do we need language definitions?
– to design a language
– to implement a compiler/interpreter
– to write a program (use the language)
Syntax Elements
• A sentence is
– a string of characters over some alphabet
• A language is
– a set of sentences
• A lexeme is
– the lowest level syntactic unit of a language
• e.g., *, public, totalCount
• A token is
– a category of lexemes
• e.g., identifier
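The lexeme/token distinction can be sketched with a small scanner. This is an illustration only: the token names and patterns below (keyword, identifier, int_literal, mult_op) are assumptions chosen for the example, not the definition of any particular language.

```python
import re

# Hypothetical token categories; a real language defines many more.
TOKEN_SPEC = [
    ("keyword", r"\b(?:public|while|begin|end)\b"),
    ("identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("int_literal", r"\d+"),
    ("mult_op", r"\*"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token, lexeme) pairs: the category and the matched characters."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":  # whitespace separates lexemes but is not one
            pairs.append((match.lastgroup, match.group()))
        pos = match.end()
    return pairs
```

Each pair keeps both pieces of information: the lexeme is the text as written, and the token is the category that a grammar rule would refer to.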
Describing Syntax
• Recognizers
– read an input string in the alphabet of
the language (a sentence) and decide
whether it belongs to the language
• used in compilers
– see Chapter 4 for details
• Generators
– produce sentences in a language
• a sentence is syntactically correct if it can be
generated by the generator
Backus-Naur Form (BNF)
• BNF is a meta-language
– i.e. a language used to describe another language
– invented by John Backus to describe ALGOL 58
– used by Peter Naur to describe ALGOL 60
• BNF is equivalent to context-free grammars
• a BNF grammar is defined by
– a set of terminal symbols
– a set of nonterminal symbols
– a set of rules
– a start symbol (one of the nonterminal symbols)
BNF Elements
• terminal symbols
– are the lexemes of the target PL
• e.g., while, ( , )
• nonterminal symbols
– represent classes of syntactic structures
• they act like syntactic variables
• e.g., <statement>
• rules
– define how a nonterminal symbol can be
developed into a sequence of nonterminal
and terminal symbols
• e.g., <while_stmt> → while ( <logic_expr> ) <stmt>
BNF Rules
• A rule has
– a left-hand side (LHS)
– then the arrow →
– a right-hand side (RHS)
• There can be several rules for one LHS
<stmt> → <assignment>
<stmt> → begin <stmt_list> end
• Syntactic lists are described using recursion
<ident_list> → ident
<ident_list> → ident , <ident_list>
• A grammar is
– a finite nonempty set of rules
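The recursive <ident_list> rules translate almost literally into recursive functions. The sketch below assumes the input has already been reduced to a list of token strings, with every identifier written as the token "ident"; the function names are made up for this example.

```python
def is_ident_list(tokens):
    """Recognizer for <ident_list> -> ident | ident , <ident_list>."""
    if not tokens or tokens[0] != "ident":
        return False
    if len(tokens) == 1:
        return True  # first rule: a single ident
    # second rule: ident , <ident_list> (the recursion mirrors the rule)
    return tokens[1] == "," and is_ident_list(tokens[2:])


def gen_ident_list(n):
    """Generator: build the sentence with n identifiers (n >= 1)."""
    if n < 1:
        raise ValueError("an <ident_list> has at least one ident")
    if n == 1:
        return ["ident"]  # first rule ends the recursion
    return ["ident", ","] + gen_ident_list(n - 1)  # second rule
```

Every list produced by gen_ident_list is accepted by is_ident_list, which is the generator/recognizer relationship in miniature.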
EBNF
• Extended BNF (EBNF)
– is most often used
– avoids having numerous rules for the same LHS
• Extra meta-symbols (in addition to →)
– […]
• enclosed symbols are optional (0 or 1 times)
– e.g., <if_stmt> → if ( <exp> ) <stmt> [ else <stmt> ]
– {…}
• enclosed symbols can be repeated (0 to n times)
– e.g., <ident_list> → ident { , ident }
– …|…
• choice of one of the symbol sequences separated by |
– e.g., <stmt> → <assignment> | begin <stmt_list> end
– (…)
• groups enclosed symbols
BNF vs. EBNF
BNF
<expr> → <expr> + <term>
<expr> → <expr> - <term>
<expr> → <term>
<term> → <term> * <factor>
<term> → <term> / <factor>
<term> → <factor>
<factor> → <exp> ** <factor>
<factor> → <exp>
<exp> → ( <expr> )
<exp> → id
EBNF
<expr> → <term> { ( + | - ) <term> }
<term> → <factor> { ( * | / ) <factor> }
<factor> → <exp> [ ** <factor> ]
<exp> → ( <expr> ) | id
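One reason the EBNF form is convenient is that it maps directly onto a hand-written recursive-descent parser: { … } becomes a loop and [ … ] becomes an if. The following is a minimal sketch under stated assumptions: the input is a pre-split list of token strings in which every identifier is the token "id", and the nested-tuple tree shape is a choice made for this example.

```python
def parse_expr(tokens, pos=0):
    """<expr> -> <term> { ( + | - ) <term> }"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("+", "-"):  # { ... } as a loop
        op = tokens[pos]
        right, pos = parse_term(tokens, pos + 1)
        node = (op, node, right)  # the loop builds a left-associative tree
    return node, pos


def parse_term(tokens, pos):
    """<term> -> <factor> { ( * | / ) <factor> }"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] in ("*", "/"):
        op = tokens[pos]
        right, pos = parse_factor(tokens, pos + 1)
        node = (op, node, right)
    return node, pos


def parse_factor(tokens, pos):
    """<factor> -> <exp> [ ** <factor> ]"""
    node, pos = parse_exp(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "**":  # [ ... ] as an if
        right, pos = parse_factor(tokens, pos + 1)  # recursion: right-associative
        node = ("**", node, right)
    return node, pos


def parse_exp(tokens, pos):
    """<exp> -> ( <expr> ) | id"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "(":
        node, pos = parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if tokens[pos] == "id":
        return "id", pos + 1
    raise SyntaxError(f"unexpected token {tokens[pos]!r}")


def parse(tokens):
    """Parse a whole token list; reject trailing input."""
    node, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return node
```

For example, parse(["id", "+", "id", "*", "id"]) nests the multiplication under the addition, because <term> sits below <expr> in the grammar.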
Augmented EBNF
• another meta-symbol
– = (equals sign) instead of →
• meta-symbols for repetition
– + means one or more times
– * means zero or more times
– e.g., <ident> = <letter>+ ( <letter> | <digit> )*
• rules can use iteration instead of recursion
– e.g.:
• <stmt_list> → <stmt> | <stmt> ; <stmt_list>
– can be formulated as
• <stmt_list> = <stmt> ( ; <stmt> )*
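The + and * meta-symbols have the same meaning as the corresponding regular-expression operators, so the <ident> rule can be checked with a regular expression. This sketch assumes <letter> means an ASCII letter and <digit> means 0 to 9, which the rule itself does not specify.

```python
import re

# <ident> = <letter>+ ( <letter> | <digit> )*
# Assumption: <letter> is A-Z or a-z, <digit> is 0-9.
IDENT = re.compile(r"[A-Za-z]+[A-Za-z0-9]*")


def is_ident(text):
    """True if the whole string matches the <ident> rule."""
    return IDENT.fullmatch(text) is not None
```

Using fullmatch rather than match matters here: the rule describes the entire lexeme, not just a prefix of it.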
Context-Free Grammar
• Context-Free Grammars (CFG)
– defined by Noam Chomsky
– meant to describe the syntax of natural languages
• Context-Free Grammar G = (S, T, N, P)
– S = start symbol
– T = set of terminal symbols (lexemes and tokens)
– N = set of nonterminal symbols (abstractions)
– P = production rules, each defining a LHS abstraction using its RHS
• A sentence is
– a sequence of terminal symbols
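The four-tuple G = (S, T, N, P) can be written down as a plain data structure. The class below is a hypothetical sketch: the field names, the dictionary layout for P, and the two helper methods are choices made for this example. Note that is_sentence only checks that every symbol is a terminal; it does not check that the sequence can actually be derived from S.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    start: str               # S
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    productions: dict        # P: LHS nonterminal -> list of RHS symbol lists

    def is_well_formed(self):
        """Check the structural constraints on (S, T, N, P)."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and self.terminals.isdisjoint(self.nonterminals)
            and all(lhs in self.nonterminals for lhs in self.productions)
            and all(
                sym in symbols
                for alternatives in self.productions.values()
                for rhs in alternatives
                for sym in rhs
            )
        )

    def is_sentence(self, symbols):
        """True if the sequence consists of terminal symbols only."""
        return all(sym in self.terminals for sym in symbols)


# The identifier-list grammar as a four-tuple.
IDENT_LIST = Grammar(
    start="<ident_list>",
    terminals=frozenset({"ident", ","}),
    nonterminals=frozenset({"<ident_list>"}),
    productions={"<ident_list>": [["ident"], ["ident", ",", "<ident_list>"]]},
)
```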
A Small Language in EBNF
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c
Derivation
• A derivation is
– a repeated application of rules
• starting with the start symbol
• substitution of a nonterminal LHS by the RHS of a rule
• ending with a sentence (all terminal symbols)
• Every string of symbols in the derivation is
– a sentential form
• A sentence is
– a sentential form with only terminal symbols
Derivation Types
• A leftmost derivation
– leftmost nonterminal in each sentential form is
expanded first
• A rightmost derivation
– rightmost nonterminal is expanded first
• A mixed derivation
– an arbitrary nonterminal is expanded
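The difference between leftmost and rightmost derivations can be made concrete by mechanically applying rules to a sentential form. The sketch below encodes the small begin/end language as a dictionary (an encoding chosen for this example) and takes a list of alternative indices, one per derivation step; the function name derive and its parameters are assumptions.

```python
# Each nonterminal maps to its list of alternatives (RHS symbol lists).
GRAMMAR = {
    "<program>": [["begin", "<stmt_list>", "end"]],
    "<stmt_list>": [["<stmt>"], ["<stmt>", ";", "<stmt_list>"]],
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<expr>": [["<term>", "+", "<term>"], ["<term>", "-", "<term>"]],
    "<term>": [["<var>"], ["const"]],
    "<var>": [["a"], ["b"], ["c"]],
}


def derive(choices, leftmost=True, start="<program>"):
    """Return every sentential form of the derivation as a string.

    choices[i] is the index of the alternative applied at step i to the
    leftmost (or rightmost) nonterminal of the current sentential form.
    """
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [k for k, sym in enumerate(form) if sym in GRAMMAR]
        if not positions:
            raise ValueError("no nonterminal left to expand")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(" ".join(form))
    return steps


# Two derivations of the same sentence: begin c = a - const end
left = derive([0, 0, 0, 2, 1, 0, 0, 1], leftmost=True)
right = derive([0, 0, 0, 1, 1, 0, 0, 2], leftmost=False)
```

Both derivations end in the same sentence, but the intermediate sentential forms differ: the leftmost one fixes the variable c before touching <expr>, while the rightmost one expands <expr> first.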
Derivation Example
<program> → begin <stmt_list> end
<stmt_list> → <stmt> | <stmt> ; <stmt_list>
<stmt> → <var> = <expr>
<expr> → <term> + <term> | <term> - <term>
<term> → <var> | const
<var> → a | b | c
<program> => begin <stmt_list> end
          => begin <stmt> end
          => begin <var> = <expr> end
          => begin a = <expr> end
          => begin a = <term> + <term> end
          => begin a = <var> + <term> end
          => begin a = b + <term> end
          => begin a = b + const end
Questions
In the preceding slide (Derivation Example):
1. Is the derivation a leftmost or a rightmost derivation?
2. State the "opposite" derivation,
i.e., if it is a leftmost derivation, give the rightmost one, or vice versa.
3. What are the terminal symbols of the language,
what are the nonterminal symbols, and what is the
start symbol?
4. Change a rule so that
begin a = - b + const end
is a legal sentence.
Parse Tree
• A parse tree is
– a hierarchical representation of a derivation
Parse tree of the sentence: begin a = b + const end
<program>
  begin
  <stmt_list>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
  end
Simple Assignment Language
EBNF Grammar
<assign> → <id> = <expr>
<expr>   → <id> + <expr>
         | <id> * <expr>
         | ( <expr> )
         | <id>
<id>     → a | b | c
Parse tree of the sentence: a = b * (a + c)
<assign>
  <id>
    a
  =
  <expr>
    <id>
      b
    *
    <expr>
      (
      <expr>
        <id>
          a
        +
        <expr>
          <id>
            c
      )
Ambiguous Grammars
• A grammar is ambiguous
– if and only if it generates a sentential
form that has two or more distinct parse
trees
– e.g.
<assign> → <id> = <expr>
<expr>   → <expr> + <expr>
         | <expr> * <expr>
         | ( <expr> )
         | <id>
<id>     → a | b | c
Two Distinct Parse Trees
add-first parse tree of a = b + c * d (the + subtree sits lower, so it is evaluated first):
<assign>
  <id>
    a
  =
  <expr>
    <expr>
      <expr>
        <id>
          b
      +
      <expr>
        <id>
          c
    *
    <expr>
      <id>
        d
multiply-first parse tree of a = b + c * d (the * subtree sits lower, so it is evaluated first):
<assign>
  <id>
    a
  =
  <expr>
    <expr>
      <id>
        b
    +
    <expr>
      <expr>
        <id>
          c
      *
      <expr>
        <id>
          d
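Ambiguity can be observed by enumerating every parse tree of a sentence. The sketch below does this by brute force for the <expr> part of the ambiguous grammar, trying each operator as the root and each matching pair of parentheses. It is an illustration, not an efficient parser. The set of identifiers is an assumption: d is included so that the sentence b + c * d can be used, and trees are written as nested tuples.

```python
from functools import lru_cache

IDS = {"a", "b", "c", "d"}  # assumption: d added to match the example sentence


def parse_trees(tokens):
    """Return all parse trees of tokens under
    <expr> -> <expr> + <expr> | <expr> * <expr> | ( <expr> ) | <id>."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(lo, hi):
        found = []
        if hi - lo == 1 and tokens[lo] in IDS:
            found.append(tokens[lo])  # <expr> -> <id>
        if hi - lo >= 3 and tokens[lo] == "(" and tokens[hi - 1] == ")":
            # <expr> -> ( <expr> )
            found.extend(("()", inner) for inner in trees(lo + 1, hi - 1))
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                # <expr> -> <expr> op <expr>, with the operator at k as root
                for left in trees(lo, k):
                    for right in trees(k + 1, hi):
                        found.append((tokens[k], left, right))
        return tuple(found)

    return list(trees(0, len(tokens)))


# The right-hand side of a = b + c * d has two distinct trees.
both = parse_trees("b + c * d".split())
```

Here both contains one tree with + at the root and one with * at the root, which is exactly the pair of trees shown above.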
An Unambiguous Expression Grammar
• The same language can be defined with
an unambiguous grammar!
<assign> → <id> = <expr>
<expr>   → <expr> + <term>
         | <term>
<term>   → <term> * <factor>
         | <factor>
<factor> → ( <expr> )
         | <id>
<id>     → a | b | c
Precedence Through Grammar
• A grammar can enforce the precedence of
operators
– The parse tree shows how
• (lower levels of the tree are evaluated first)
– e.g.,
<expr> → <expr> + <term> | <term>
<term> → <term> * const | const
Parse tree of the sentence: const + const * const
<expr>
  <expr>
    <term>
      const
  +
  <term>
    <term>
      const
    *
    const
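That lower levels are evaluated first follows from evaluating a tree bottom-up: an operator can only be applied once both of its subtrees have produced values. A minimal sketch, assuming the three const leaves stand for the numbers 2, 3 and 4 and that trees are written as nested tuples (operator, left, right):

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree):
    """Evaluate a tree bottom-up: subtrees first, then the operator above them."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left), evaluate(right))
    return tree  # a leaf is a constant


# Shape enforced by the grammar above for 2 + 3 * 4: the * is lower than the +.
grammar_tree = ("+", 2, ("*", 3, 4))
# The other shape, which this grammar cannot produce for that sentence.
other_tree = ("*", ("+", 2, 3), 4)
```

evaluate(grammar_tree) multiplies before adding, while evaluate(other_tree) adds first; the two results differ, which is why the shape of the parse tree matters.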