Chapter 3: Describing Syntax and Semantics
Lectures # 6
Chapter 3 Topics
Definitions
Tokens and Lexemes
Formal Definition of Languages
Formal Methods of Describing Syntax
Context-Free Grammar (CFG)
Backus-Naur Form (BNF)
Derivation
Parse Trees
An Ambiguous Expression Grammar
Precedence and associativity of grammars
Syntax Graphs
Definitions
Syntax: the form or structure of the expressions, statements,
and program units.
Semantics: the meaning of the expressions, statements, and
program units.
Syntax and semantics provide a language’s definition.
The syntax of a programming language can be specified by a
context-free grammar (CFG), which is discussed later.
Tokens and Lexemes
A lexeme is the lowest level syntactic unit of a language
(meaningful units which compose a sentence).
A token is a category of lexemes (e.g., identifier, constant, …).
Example: A sentence if index > 2 then count := 17;
Tokens                          Lexemes
key word                        if , then
identifier (id)                 index , count
constant (const)                2 , 17
relational operator (relop)     >
assignment operator             :=
end of statement                ;
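The token/lexeme split above can be sketched as a small scanner. This is a minimal illustration only: the regular expressions and the names TOKEN_SPEC, MASTER, and tokenize are assumptions made for this example, not part of any particular language definition.

```python
import re

# Token categories from the slide, each paired with a regular expression.
# The patterns are illustrative assumptions, not a full language definition.
TOKEN_SPEC = [
    ("keyword", r"\b(?:if|then)\b"),
    ("id", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("const", r"\d+"),
    ("assign", r":="),
    ("relop", r"[<>]=?|="),
    ("end_of_stmt", r";"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (token, lexeme) pairs, skipping whitespace."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "skip":
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs


pairs = tokenize("if index > 2 then count := 17;")
```

Each pair holds the token (the category) first and the lexeme (the actual text) second, so both `if` and `then` come back under the single token `keyword`.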
Formal Definition of Languages
Recognizers
A recognizer either accepts or rejects an input string.
Given a string, a recognizer for a language L tells whether or
not the string is in L.
The syntax analysis part of a compiler is a recognizer for the
language the compiler translates.
Generators
A device that generates sentences of a language.
A generator for L will produce an arbitrary string in L on
demand (e.g., a grammar, BNF).
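The two views of a language can be contrasted in a few lines. The toy language used here, L = { aⁿbⁿ : n ≥ 1 }, is an assumption chosen for this sketch and does not come from the slides. The function names recognize and generate are likewise hypothetical.

```python
from itertools import count, islice


def recognize(s):
    """Recognizer: accept s iff it has the form a^n b^n with n >= 1."""
    n = len(s) // 2
    return n >= 1 and s == "a" * n + "b" * n


def generate():
    """Generator: yield the sentences of L one by one, shortest first."""
    for n in count(1):
        yield "a" * n + "b" * n


first_three = list(islice(generate(), 3))  # ['ab', 'aabb', 'aaabbb']
```

The recognizer answers a yes/no question about a given string, while the generator produces strings without being given any input.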
Formal Methods of Describing Syntax
Backus-Naur Form (BNF) and Context-Free Grammars (CFG):
Most widely known method for describing programming
language syntax.
Extended BNF (EBNF)
Improves readability and writability of BNF.
Context-Free Grammars (CFG)
Context-Free Grammars (CFGs):
Developed by Noam Chomsky in the mid-1950s.
Language generators, meant to describe the syntax of
natural languages.
Context-free grammars are used to describe the syntax of
modern programming languages.
Context-Free Grammars (cont.)
A CFG is a 4-tuple (N, T, S, P), where:
N is a set of nonterminal symbols.
T is a set of terminal symbols (N ∩ T = ∅).
S ∈ N is the start symbol.
P is a set of productions (also called rules).
Context-Free Grammars (cont.)
Example:
P = { S → bA, S → aA, A → aA, A → b }
Nonterminals = {S, A}
Terminals = {a, b}
Start symbol = S
A sentence consists entirely of terminal symbols (in this
grammar, a’s and b’s).
The productions can be used to generate sentences, starting
with a rule for the start symbol (S).
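As a rough sketch, the 4-tuple (N, T, S, P) can be written down as a small data structure that checks the two side conditions N ∩ T = ∅ and S ∈ N. The class name Grammar, its field names, and the helper is_terminal_string are assumptions for this example. Every symbol is assumed to be a single character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: str               # S
    productions: tuple       # P, as (lhs, rhs) pairs

    def __post_init__(self):
        # N and T must be disjoint, and S must belong to N.
        if self.nonterminals & self.terminals:
            raise ValueError("N and T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("start symbol must be a nonterminal")
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            if not set(rhs) <= self.nonterminals | self.terminals:
                raise ValueError(f"unknown symbol in right-hand side {rhs!r}")


# The example grammar from this slide.
G = Grammar(
    nonterminals=frozenset("SA"),
    terminals=frozenset("ab"),
    start="S",
    productions=(("S", "bA"), ("S", "aA"), ("A", "aA"), ("A", "b")),
)


def is_terminal_string(g, s):
    """True if s consists entirely of terminal symbols of g."""
    return all(ch in g.terminals for ch in s)
```

Note that is_terminal_string only checks the alphabet. It does not check that the string can actually be derived from S.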
Using production rules
To generate a sentence, we start with a rule whose left-hand
side is the start symbol (S). In the preceding grammar we
start with:
S → bA or S → aA
We then systematically replace each nonterminal in the resulting
string with the right-hand side of one of its rules.
This is called expanding the nonterminal. In the preceding
grammar, we might replace an A with either aA or b.
Generating a sentence
We demonstrate the generation of a sentence baab using the
productions:
P = { S → bA, S → aA, A → aA, A → b }
S ⇒ bA      (start with start symbol)
  ⇒ baA     (replace A with aA)
  ⇒ baaA    (replace A with aA)
  ⇒ baab    (replace A with b)
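The derivation of baab can be replayed mechanically. The sketch below applies a chosen sequence of rules, each to the leftmost occurrence of its left-hand side. The function name derive and the (lhs, rhs) pair representation are assumptions for this example, with one character per symbol.

```python
def derive(start, steps):
    """Apply (lhs, rhs) rules in order, each to the leftmost occurrence of lhs.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    forms = [start]
    current = start
    for lhs, rhs in steps:
        if lhs not in current:
            raise ValueError(f"{lhs!r} does not occur in {current!r}")
        current = current.replace(lhs, rhs, 1)  # expand one occurrence only
        forms.append(current)
    return forms


steps = [("S", "bA"), ("A", "aA"), ("A", "aA"), ("A", "b")]
forms = derive("S", steps)
print(" => ".join(forms))  # S => bA => baA => baaA => baab
```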
More sentences
P = { S → bA, S → aA, A → aA, A → b }
S ⇒ bA
  ⇒ bb
S ⇒ bA
  ⇒ baA
  ⇒ bab
S ⇒ aA
  ⇒ aaA
  ⇒ aaaA
  ⇒ aaab
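Rather than deriving sentences one at a time, all sentences up to a given length can be enumerated. The following is a minimal sketch using a breadth-first search over sentential forms for this particular grammar. The names RULES and sentences are assumptions for the example.

```python
from collections import deque

# Productions of the example grammar, grouped by left-hand side.
RULES = {"S": ["bA", "aA"], "A": ["aA", "b"]}


def sentences(max_len):
    """Return all sentences of length <= max_len, in breadth-first order."""
    found = []
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Find the leftmost nonterminal, if any.
        index = next((i for i, ch in enumerate(form) if ch in RULES), None)
        if index is None:
            found.append(form)  # only terminals remain: a sentence
            continue
        for rhs in RULES[form[index]]:
            expanded = form[:index] + rhs + form[index + 1:]
            if len(expanded) <= max_len:  # no rule shrinks a form, so prune
                queue.append(expanded)
    return found


short = sentences(3)  # ['bb', 'ab', 'bab', 'aab']
```

The length check is safe here because every right-hand side in this grammar has at least one symbol, so a sentential form never gets shorter.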
CFG Conventions
Nonterminals are often distinguished by using:
Upper case or italicized letters (S, A, Stmt, …)
Angle brackets (e.g., <while_stmt>)
Terminals are distinguished by using:
Lower case letters (a, b, …).
The first production rule is a rule for the start symbol.
Using these conventions, we can define the grammar by
listing only the rules.
Backus-Naur Form (BNF)
Backus-Naur Form (BNF) (1959):
Invented by John Backus to describe Algol 58.
BNF is equivalent to context-free grammars (CFG).
In BNF, rules are used to describe the syntax of a language.
In BNF, there is at least one rule for each language
abstraction (nonterminal).
BNF Fundamentals
Nonterminals: BNF abstractions.
Terminals: lexemes and tokens.
Grammar: a collection of rules.
Examples of BNF rules:
<stmt> → <while_stmt> | <if_stmt>
<while_stmt> → while <logic_expr> do <stmt>
<if_stmt> → if <logic_expr> then <stmt>
BNF Rules
A rule has a left-hand side (LHS), which is a single nonterminal,
and a right-hand side (RHS), which is a string of terminal
and/or nonterminal symbols.
A grammar is a finite nonempty set of rules.
An abstraction (or nonterminal symbol) can have more than
one RHS.
<stmt> → <single_stmt>
       | begin <stmt_list> end
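A rule with several right-hand sides can be held as one LHS plus a list of alternatives. The sketch below splits a rule written as text into that form. The ASCII arrow "->" and the names parse_rule and is_nonterminal are assumptions for this example. Symbols are assumed to be separated by spaces.

```python
def parse_rule(line):
    """Split one BNF rule into (lhs, list of alternatives).

    Each alternative is a list of symbols; the arrow is written "->" here.
    """
    lhs, rhs = line.split("->", 1)
    alternatives = [alt.split() for alt in rhs.split("|")]
    return lhs.strip(), alternatives


def is_nonterminal(symbol):
    """Angle brackets mark a nonterminal (abstraction)."""
    return symbol.startswith("<") and symbol.endswith(">")


lhs, alts = parse_rule("<stmt> -> <single_stmt> | begin <stmt_list> end")
# lhs  is '<stmt>'
# alts is [['<single_stmt>'], ['begin', '<stmt_list>', 'end']]
```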
Recursive BNF rules
Syntactic lists are described in BNF using recursion:
<id_list> → ident | ident , <id_list>
<expr> → num | <expr> + num
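The recursion in the <id_list> rule maps directly onto a recursive function. This is a minimal sketch of a recognizer for that one rule. It assumes the input has already been turned into a list of tokens, each either "ident" or ",". The function name is_id_list is hypothetical.

```python
def is_id_list(tokens):
    """Recognize <id_list> -> ident | ident , <id_list> over a token list.

    Each element of tokens is assumed to be either "ident" or ",".
    """
    if not tokens or tokens[0] != "ident":
        return False          # both alternatives begin with ident
    if len(tokens) == 1:
        return True           # first alternative: ident
    # Second alternative: ident , <id_list> (the recursive case).
    return tokens[1] == "," and is_id_list(tokens[2:])


ok = is_id_list(["ident", ",", "ident", ",", "ident"])   # True
bad = is_id_list(["ident", ","])                          # False
```

The function calls itself exactly where the rule mentions <id_list> on its right-hand side, which is how a list of any length is covered by a finite rule.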