Chapter 3: Describing Syntax and Semantics Lectures # 6

Chapter 3 Topics

Definitions
Tokens and Lexemes
Formal Definition of Languages
Formal Methods of Describing Syntax
  o Context-Free Grammar (CFG)
  o Backus-Naur Form (BNF)
  o Derivation
  o Parse Trees
An Ambiguous Expression Grammar
Precedence and associativity in grammars
Syntax Graphs
Definitions

Syntax: the form or structure of the expressions, statements,
and program units.

Semantics: the meaning of the expressions, statements, and
program units.

Syntax and semantics provide a language’s definition.

The syntax of a programming language can be specified by a
context-free grammar (CFG), discussed later in this chapter.
Tokens and Lexemes

A lexeme is the lowest level syntactic unit of a language
(meaningful units which compose a sentence).

A token is a category of lexemes (e.g., identifier, constant, …).

Example: the sentence

    if index > 2 then count := 17;

contains the following tokens and lexemes:

Tokens                         Lexemes
keyword                        if, then
identifier (id)                index, count
constant (const)               2, 17
relational operator (relop)    >
assignment operator            :=
end of statement               ;
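The grouping of lexemes into tokens can be sketched as a small scanner. The Python below is not part of the lecture material: the regular expressions, the TOKEN_SPEC table, and the tokenize function are assumptions chosen only to cover the example sentence.

```python
import re

# Token categories from the example, each paired with a pattern for its
# lexemes. The patterns are illustrative assumptions, not a language standard.
TOKEN_SPEC = [
    ("keyword", r"\b(?:if|then)\b"),
    ("relop", r"[<>]=?"),
    ("assignment operator", r":="),
    ("end of statement", r";"),
    ("const", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
]

# One alternation with a named group per category; group g<i> maps back to
# TOKEN_SPEC[i]. Keywords come first so that "if" is not scanned as an id.
TOKEN_RE = re.compile(
    "|".join(f"(?P<g{i}>{pat})" for i, (_, pat) in enumerate(TOKEN_SPEC))
)


def tokenize(text):
    """Return a list of (token, lexeme) pairs for the input text."""
    pairs = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1  # whitespace separates lexemes but is not a lexeme
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        token = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        pairs.append((token, match.group()))
        pos = match.end()
    return pairs


for token, lexeme in tokenize("if index > 2 then count := 17;"):
    print(f"{lexeme:6} {token}")
```

Each lexeme is reported together with the token (category) it belongs to, which mirrors the two columns of the example.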
Formal Definition of Languages

Recognizers
o Either accepts or rejects an input string.
o Given a string, a recognizer for a language L tells whether or
not the string is in L.
o The syntax analysis part of a compiler is a recognizer for the
language.

Generators
o A device that generates sentences of a language.
o A generator for L produces an arbitrary string in L on demand
(examples: a grammar, BNF).
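As a rough illustration of a recognizer, the Python sketch below accepts or rejects strings for a toy language L. The language is an assumption made for this example: strings over {a, b} that start with a or b, continue with zero or more a's, and end with b. The function name recognize is also hypothetical.

```python
def recognize(s):
    """Return True if s is in the toy language L, otherwise False.

    L (assumed for illustration): one 'a' or 'b', then any number of
    'a' characters, then a final 'b'.
    """
    return (
        len(s) >= 2
        and s[0] in "ab"
        and s[-1] == "b"
        and all(c == "a" for c in s[1:-1])
    )


for candidate in ["baab", "bb", "ba", "abab"]:
    verdict = "accepted" if recognize(candidate) else "rejected"
    print(candidate, verdict)
```

The recognizer only answers yes or no for a given string; it never produces strings of L itself, which is the job of a generator.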
Formal Methods of Describing Syntax

Backus-Naur Form (BNF) and Context-Free Grammars (CFG):
o The most widely known method for describing programming
language syntax.

Extended BNF (EBNF):
o Improves the readability and writability of BNF.
Context-Free Grammars (CFG)

Context-Free Grammars (CFGs):

Developed by Noam Chomsky in the mid-1950s.

Language generators, meant to describe the syntax of
natural languages.

Context-free grammars are used to describe the syntax of
modern programming languages.
Context-Free Grammars (cont.)

A CFG is a 4-tuple (N, T, S, P), where:

N is a set of nonterminal symbols.

T is a set of terminal symbols (N ∩ T = ∅).

S ∈ N is the start symbol.

P is a set of productions (also called rules).
Context-Free Grammars (cont.)

Example:
P = { S → bA, S → aA, A → aA, A → b }

Nonterminals = {S, A}
Terminals = {a, b}
Start symbol = S

A sentence consists entirely of terminal symbols (in this
grammar, a’s and b’s).

The productions can be used to generate sentences, starting
with a rule for the start symbol (S).
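The 4-tuple view of the example grammar can be written down directly as data. The Python sketch below is an illustration only; the Grammar class and the helper names is_well_formed and is_terminal_string are hypothetical, and symbols are assumed to be single characters.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    """A CFG as the 4-tuple (N, T, S, P); symbols are single characters."""
    nonterminals: frozenset
    terminals: frozenset
    start: str
    productions: tuple  # (lhs, rhs) pairs, e.g. ("S", "bA")


# The example grammar: P = { S -> bA, S -> aA, A -> aA, A -> b }
G = Grammar(
    nonterminals=frozenset("SA"),
    terminals=frozenset("ab"),
    start="S",
    productions=(("S", "bA"), ("S", "aA"), ("A", "aA"), ("A", "b")),
)


def is_well_formed(g):
    """Check that N and T are disjoint, S is in N, and P uses known symbols."""
    symbols = g.nonterminals | g.terminals
    return (
        g.nonterminals.isdisjoint(g.terminals)
        and g.start in g.nonterminals
        and all(
            lhs in g.nonterminals and set(rhs) <= symbols
            for lhs, rhs in g.productions
        )
    )


def is_terminal_string(g, s):
    """True if s is nonempty and made up of terminal symbols only."""
    return len(s) > 0 and set(s) <= g.terminals


print(is_well_formed(G))
print(is_terminal_string(G, "baab"), is_terminal_string(G, "baA"))
```

Note that is_terminal_string checks only the "entirely terminal symbols" condition; whether a string is actually a sentence also depends on it being derivable from S.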
Using production rules

To generate a sentence, we start with a rule whose left-hand
side is the start symbol (S). In the preceding grammar we
start with:
S → bA or S → aA

We systematically replace nonterminals in the resulting
string with the right-hand sides of rules for those
nonterminals.

This is called expanding the nonterminal. In the preceding
grammar, we might replace an A with either aA or b.
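A single expansion step can be sketched in Python as follows. This is an illustrative assumption, not code from the lecture: RULES encodes the example grammar with one character per symbol, and expand_leftmost is a hypothetical helper that always expands the leftmost nonterminal.

```python
# Example grammar: S -> bA | aA, A -> aA | b (one character per symbol).
RULES = {"S": ["bA", "aA"], "A": ["aA", "b"]}


def expand_leftmost(form):
    """Return every string obtained by expanding the leftmost nonterminal.

    The result is empty when the string has no nonterminal left to expand.
    """
    for i, symbol in enumerate(form):
        if symbol in RULES:
            return [form[:i] + rhs + form[i + 1:] for rhs in RULES[symbol]]
    return []


print(expand_leftmost("S"))   # the two rules for the start symbol
print(expand_leftmost("bA"))  # A replaced by aA or by b
```

Each returned string corresponds to one choice of rule for the nonterminal being expanded.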
Generating a sentence

We demonstrate the generation of the sentence baab using the
productions:
P = { S → bA, S → aA, A → aA, A → b }

S ⇒ bA      (start with start symbol)
  ⇒ baA     (replace A with aA)
  ⇒ baaA    (replace A with aA)
  ⇒ baab    (replace A with b)
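The derivation of baab can be replayed mechanically. The Python sketch below is an assumption-laden illustration: PRODUCTIONS encodes the example grammar, and derive is a hypothetical helper that applies a given sequence of rule choices, always rewriting the leftmost occurrence of the chosen nonterminal.

```python
# Example grammar: P = { S -> bA, S -> aA, A -> aA, A -> b }
PRODUCTIONS = {("S", "bA"), ("S", "aA"), ("A", "aA"), ("A", "b")}


def derive(start, steps):
    """Apply (lhs, rhs) rule choices in order and return every string produced."""
    forms = [start]
    for lhs, rhs in steps:
        if (lhs, rhs) not in PRODUCTIONS:
            raise ValueError(f"{lhs} -> {rhs} is not a production")
        current = forms[-1]
        if lhs not in current:
            raise ValueError(f"no {lhs} to expand in {current!r}")
        # Replace only the leftmost occurrence of the nonterminal.
        forms.append(current.replace(lhs, rhs, 1))
    return forms


steps = [("S", "bA"), ("A", "aA"), ("A", "aA"), ("A", "b")]
print(" => ".join(derive("S", steps)))
```

The list returned by derive holds the start symbol followed by the result of each replacement, ending in the generated sentence.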
More sentences
P = { S → bA, S → aA, A → aA, A → b }

S ⇒ bA
  ⇒ bb

S ⇒ bA
  ⇒ baA
  ⇒ bab

S ⇒ aA
  ⇒ aaA
  ⇒ aaaA
  ⇒ aaab
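Rather than deriving sentences one at a time, all sentences up to a given length can be enumerated by trying every rule at every step. The Python sketch below is illustrative only: RULES encodes the example grammar, and sentences is a hypothetical helper that does a breadth-first search. It relies on the fact that no rule of this grammar shortens a string, so strings longer than the bound can be discarded.

```python
from collections import deque

# Example grammar: S -> bA | aA, A -> aA | b (one character per symbol).
RULES = {"S": ["bA", "aA"], "A": ["aA", "b"]}


def sentences(max_len):
    """Return all sentences of length <= max_len, shortest first."""
    found = set()
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        # Position of the leftmost nonterminal, or None if there is none.
        idx = next((i for i, c in enumerate(form) if c in RULES), None)
        if idx is None:
            found.add(form)  # only terminals remain: a sentence
            continue
        for rhs in RULES[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len:  # rules never shrink a string
                queue.append(new_form)
    return sorted(found, key=lambda s: (len(s), s))


print(sentences(4))
```

The output includes the sentences derived by hand (bb, bab, aaab) along with the others of the same lengths.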
CFG Conventions


Nonterminals are often distinguished by using:
o Uppercase or italicized letters (S, A, Stmt, …)
o Angle brackets (e.g., <while_stmt>)

Terminals are distinguished by using:
o Lowercase letters (a, b, …).

The first production rule is a rule for the start symbol.

Using these conventions, we can define the grammar by
listing only the rules.
Backus-Naur Form (BNF)

Backus-Naur Form (BNF) (1959):

Invented by John Backus to describe Algol 58.

BNF is equivalent to context-free grammars (CFG).

In BNF, rules are used to describe the syntax of a language.

In BNF, there is at least one rule for each language
abstraction (nonterminal).
BNF Fundamentals

Nonterminals: BNF abstractions.

Terminals: lexemes and tokens.

Grammar: a collection of rules.

Examples of BNF rules:
<stmt> → <while_stmt> | <if_stmt>
<while_stmt> → while <logic_expr> do <stmt>
<if_stmt> → if <logic_expr> then <stmt>
BNF Rules

A rule has a left-hand side (LHS), which is a single nonterminal,
and a right-hand side (RHS), which is a string of terminal and
nonterminal symbols.

A grammar is a finite nonempty set of rules.

An abstraction (or nonterminal symbol) can have more than
one RHS.
<stmt> → <single_stmt>
       | begin <stmt_list> end
Recursive BNF rules

Syntactic lists are described in BNF using recursion:
<id_list> → ident | ident , <id_list>
<expr> → num | <expr> + num
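The recursion in these two rules can be mirrored in code. The Python sketch below is an illustration under stated assumptions: input is a list of token strings ("ident", ",", "num", "+"), and parse_id_list, is_id_list, and is_expr are hypothetical names. The rule for <id_list> is right recursive and maps directly to a recursive function. The rule for <expr> is left recursive, so a direct recursive call would recurse forever without consuming input; the sketch uses the equivalent loop num (+ num)* instead.

```python
def parse_id_list(tokens, pos=0):
    """Match <id_list> -> ident | ident , <id_list> starting at pos.

    Returns the position just past the longest match, or None if no
    <id_list> starts at pos.
    """
    if pos >= len(tokens) or tokens[pos] != "ident":
        return None
    if pos + 1 < len(tokens) and tokens[pos + 1] == ",":
        # Second alternative: ident , <id_list> (the recursive case).
        rest = parse_id_list(tokens, pos + 2)
        if rest is not None:
            return rest
    # First alternative: a single ident.
    return pos + 1


def is_id_list(tokens):
    """True if the whole token list is an <id_list>."""
    return parse_id_list(tokens) == len(tokens)


def is_expr(tokens):
    """True if the whole token list matches <expr> -> num | <expr> + num."""
    if not tokens or tokens[0] != "num":
        return False
    pos = 1
    while pos < len(tokens):
        # Every further step must be exactly "+ num".
        if tokens[pos] != "+" or pos + 1 >= len(tokens) or tokens[pos + 1] != "num":
            return False
        pos += 2
    return True


print(is_id_list(["ident", ",", "ident", ",", "ident"]))
print(is_expr(["num", "+", "num", "+", "num"]))
```

Both functions accept lists of any length, which is exactly what the recursive rules express.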