Syntax, Semantics, Pragmatics

Chapter 3
Describing Syntax and
Semantics
ISBN 0-321-19362-8
Chapter 3 Topics
• Introduction
• The General Problem of Describing Syntax
• Formal Methods of Describing Syntax
• Attribute Grammars
• Describing the Meanings of Programs:
Dynamic Semantics
Copyright © 2004 Pearson Addison-Wesley. All rights reserved.
Introduction
• Who must use language definitions?
– Other language designers
– Implementors
– Programmers (the users of the language)
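Language
• A language is a set of sentences, built from a set of symbols.
• Chinese: the set of all sentences that conform to Chinese grammar
• English: the set of all sentences that conform to English grammar
• A programming language: the set of all programs in that language
• Studying a language means studying:
– the rules by which each sentence is formed
– the meaning of each sentence
– the relationship between each sentence and its users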
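Studying a programming language means studying:
– the rules by which each program is formed
– the meaning of each program
– the relationship between each program and its users
Three aspects of programming language study:
– Syntax
– Semantics
– Pragmatics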
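Syntax -- the rules for combining the symbols that make up the sentences of a language.
Semantics -- the specific meaning of each symbol under the notation in use (the relationship between symbols and the objects they denote).
Pragmatics -- the origin, use, and effect of symbols in the behavior in which they occur.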
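Formal Languages
• If semantics and pragmatics are set aside and a language is viewed only from the side of syntax, the language in this sense is called a formal language.
• A formal language is defined abstractly as a mathematical system. "Formal" refers to the fact that all rules of the language are stated only in terms of which strings of symbols may occur.
• Formal language theory studies the representation, structure, and properties of sets of symbol strings; it is the foundation for syntax analysis of programming languages.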
Introduction
• Syntax - the form or structure of the expressions, statements, and program units
• Semantics - the meaning of the expressions, statements, and program units
Terms: alphabet, lexeme, token, sentence, recognizer, generator
Describing Syntax
• Formal approaches to describing syntax:
– Recognizers
A recognizer is a machine that takes a string and determines whether it belongs to the language described by the grammar.
Recognizers are used in compilers (covered in Chapter 4).
– Generators
A generator is a machine that produces only legal sentences of the language; it may produce them at random.
Generators are what we study in this chapter.
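The contrast between the two machines can be sketched in Python. This is a minimal illustration, not a method from the slides: the toy grammar (a comma-separated list of the terminal identifier) and the names GRAMMAR, generate, and recognize are assumptions chosen for the example.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to a list of alternative
# bodies; any symbol that is not a key of GRAMMAR is treated as a terminal.
GRAMMAR = {
    "<ident_list>": [["identifier"], ["identifier", ",", "<ident_list>"]],
}

def generate(symbol, rng):
    """Generator: return a list of terminals derived from `symbol`
    by choosing one rule body at random for every nonterminal."""
    if symbol not in GRAMMAR:
        return [symbol]
    sentence = []
    for part in rng.choice(GRAMMAR[symbol]):
        sentence.extend(generate(part, rng))
    return sentence

def recognize(tokens):
    """Recognizer: return True if `tokens` is a sentence of <ident_list>."""
    if not tokens or tokens[0] != "identifier":
        return False
    rest = tokens[1:]
    if not rest:
        return True
    return rest[0] == "," and recognize(rest[1:])

# Every generated sentence should be accepted by the recognizer.
print(" ".join(generate("<ident_list>", random.Random())))
```

The generator only ever emits legal sentences, while the recognizer answers yes or no for an arbitrary string of tokens.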
Describing Syntax
• A sentence is a string of characters over some alphabet
• A language is a set of sentences
• A lexeme is the lowest-level syntactic unit of a language (e.g., *, sum, begin)
• A token is a category of lexemes (e.g., identifier)
Example: index = 2 * count + 17;
Lexemes    Tokens
index      identifier
=          equal_sign
2          int_literal
*          mult_op
count      identifier
+          plus_op
17         int_literal
;          semicolon
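The lexeme/token pairing above can be reproduced with a small lexer. This is a sketch under stated assumptions: the token names come from the slide, but the regular expressions and the names TOKEN_SPEC, MASTER, and tokenize are hypothetical choices for illustration.

```python
import re

# Token names follow the slide; the patterns are assumptions.
TOKEN_SPEC = [
    ("int_literal", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("equal_sign", r"="),
    ("mult_op", r"\*"),
    ("plus_op", r"\+"),
    ("semicolon", r";"),
    ("skip", r"\s+"),  # whitespace separates lexemes but is not a token
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (lexeme, token) pairs; raise ValueError on
    a character that starts no known lexeme."""
    pairs = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "skip":
            pairs.append((match.group(), match.lastgroup))
        pos = match.end()
    return pairs

for lexeme, token in tokenize("index = 2 * count + 17;"):
    print(f"{lexeme:<8}{token}")
```

Note that two different lexemes (index and count) map to the same token, identifier, which is what makes a token a category of lexemes.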
Synonym; near-synonym
Logical sum (OR)
• Work by Turing and Chomsky in the 1940s and 1950s identified four categories of languages of increasing power and complexity:
– regular
– context-free
– context-sensitive
– recursively enumerable
• The syntax of most programming languages can, for the most part, be described by context-free grammars.
NOAM CHOMSKY,
MIT Institute Professor;
Professor of Linguistics,
Linguistic Theory,
Syntax, Semantics,
Philosophy of Language
Six participants in the 1960 Algol conference in Paris. The
picture was taken at the 1974 ACM conference on the
history of programming languages. Top row: John
McCarthy, Fritz Bauer, Joe Wegstein. Bottom row: John
Backus, Peter Naur, Alan Perlis.
Formal Methods of
Describing Syntax
• Context-Free Grammars
– Developed by Noam Chomsky in the mid-1950s
– Language generators, meant to describe the syntax of
natural languages
– Define a class of languages called context-free
languages
Lexical analysis
Syntax analysis
Formal Methods of
Describing Syntax
• Backus-Naur Form (1959)
– Invented by John Backus to describe Algol 58
– BNF is equivalent to context-free grammars
– A metalanguage is a language used to describe another language.
– In BNF, abstractions are used to represent classes of
syntactic structures--they act like syntactic variables
(also called nonterminal symbols)
• The first programming language to have a formally
specified grammar was ALGOL 60
• The formal description was in a metalanguage called Backus-Naur Form (BNF).
• A metalanguage is a language used to describe other
languages.
The components of BNF include
• A start symbol - By default the start symbol is the nonterminal
on the LHS of the first rule.
• Terminals - These are the tokens.
Terminals will be represented as the name of the token, e.g.,
begin means the token corresponding to the reserved word begin.
• Nonterminals - These are represented in angle brackets, e.g.,
<stmt> means the nonterminal statement.
• Rules or productions of the form
nonterminal -> body
• The body of the rule consists of a list of
– terminals,
– nonterminals,
– | (meaning "or"),
– a pair of brackets, [ ], enclosing an optional clause,
– a pair of braces, { }, enclosing a repeating clause and followed by * (meaning zero or more) or + (meaning one or more), or
– ε, representing the empty string.
The components of BNF include
The interpretation of a rule is that the syntax of the nonterminal, sometimes called the head or left-hand side (LHS), is described by the body, sometimes called the right-hand side (RHS). For example, the following rule describes the syntax of an if statement.
<if_stmt> -> if <predicate> then <stmt>
| if <predicate> then <stmt> else <stmt>
Note that a nonterminal may be the LHS of several rules. The rule given above is the same as the pair of rules given below.
<if_stmt> -> if <predicate> then <stmt>
<if_stmt> -> if <predicate> then <stmt> else <stmt>
Another equivalent formulation is as an optional clause.
<if_stmt> -> if <predicate> then <stmt> [ else <stmt> ]
Backus-Naur Form (1959)
<assign> -> <var> = <expression>
The abstractions <var> and <expression> must themselves be defined before the <assign> definition becomes useful.
• This is a rule; it describes the structure of an assignment statement
Formal Methods of Describing Syntax
• A rule has a left-hand side (LHS) and a right-hand
side (RHS), and consists of terminal and
nonterminal symbols
• A grammar is a collection of rules
• An abstraction (or nonterminal symbol) can have more than one definition
<if_stmt> -> if <logic_expr> then <stmt>
| if <logic_expr> then <stmt> else <stmt>
• Multiple definitions can be written as a single rule, with the different definitions separated by the symbol |
Formal Methods of Describing Syntax
• Syntactic lists are described using recursion
<ident_list> -> identifier
| identifier, <ident_list>
• A derivation is a repeated application of rules, starting with the start symbol and ending with a sentence (all terminal symbols)
Derivation
• Every string of symbols in the derivation is a
sentential form
• A sentence is a sentential form that has only
terminal symbols
• A leftmost derivation is one in which the
leftmost nonterminal in each sentential form is
the one that is expanded
• A derivation may be neither leftmost nor
rightmost
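A leftmost derivation can be traced mechanically. The sketch below is an illustration only: the small grammar and the names GRAMMAR and leftmost_derivation are assumptions, and the caller supplies the rule body to apply at each step.

```python
# Hypothetical grammar: nonterminals map to their alternative bodies.
GRAMMAR = {
    "<stmt>": [["<var>", "=", "<expr>"]],
    "<var>": [["a"], ["b"]],
    "<expr>": [["<term>", "+", "<term>"]],
    "<term>": [["<var>"], ["const"]],
}

def leftmost_derivation(start, choices):
    """Apply each chosen body to the leftmost nonterminal of the current
    sentential form and return the list of all sentential forms."""
    form = [start]
    forms = [form]
    for body in choices:
        # Index of the leftmost nonterminal in the current form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        if body not in GRAMMAR[form[i]]:
            raise ValueError(f"{form[i]} -> {' '.join(body)} is not a rule")
        form = form[:i] + body + form[i + 1:]
        forms.append(form)
    return forms

steps = leftmost_derivation("<stmt>", [
    ["<var>", "=", "<expr>"],
    ["a"],
    ["<term>", "+", "<term>"],
    ["<var>"],
    ["b"],
    ["const"],
])
for k, form in enumerate(steps):
    print("  " if k == 0 else "=>", " ".join(form))
```

Each printed line is a sentential form; the last one contains only terminals, so it is a sentence.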
Each successive string in the sequence is derived from the previous string
Parsing
• Let's look at how a grammar can be used to recognize whether a string is a sentence in the language. The process is called parsing. The idea is that if we can derive the sentence from the start symbol, then the sentence belongs to the language described by the grammar. Parsing proceeds by replacing a nonterminal with the body of one of its rules.
• The parse creates a parse tree.
- The root of the tree is the start symbol.
- Every interior node is a nonterminal.
- Every leaf is a terminal.
There is one child for each nonterminal or terminal in
the body of a production that is used in a parse.
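The properties listed above can be checked on a tree built by a small parser. This sketch uses recursive descent, one common top-down technique and not necessarily the one the slides intend; the grammar in the comments and the names VARS, parse_stmt, and leaves are assumptions for illustration.

```python
VARS = {"a", "b", "c", "d"}  # assumed terminals for <var>

def parse_stmt(tokens):
    """Parse <stmt> -> <var> = <expr> and return a parse tree as nested
    tuples (nonterminal, child, ...); raise SyntaxError on failure."""
    pos = 0

    def expect(pred, what):
        nonlocal pos
        if pos < len(tokens) and pred(tokens[pos]):
            pos += 1
            return tokens[pos - 1]
        raise SyntaxError(f"expected {what} at position {pos}")

    def var():
        # <var> -> a | b | c | d
        return ("<var>", expect(lambda t: t in VARS, "a variable"))

    def term():
        # <term> -> <var> | const
        if pos < len(tokens) and tokens[pos] == "const":
            return ("<term>", expect(lambda t: t == "const", "const"))
        return ("<term>", var())

    def expr():
        # <expr> -> <term> + <term>
        left = term()
        op = expect(lambda t: t == "+", "'+'")
        return ("<expr>", left, op, term())

    tree = ("<stmt>", var(), expect(lambda t: t == "=", "'='"), expr())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

def leaves(tree):
    """Return the terminals at the leaves, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]

tree = parse_stmt("a = b + const".split())
print(tree)
print(leaves(tree))
```

Every interior node of the returned tuple is labeled with a nonterminal, and reading the leaves from left to right gives back the input sentence.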
Parse Tree
• A hierarchical representation of a derivation
<program>
  <stmts>
    <stmt>
      <var>
        a
      =
      <expr>
        <term>
          <var>
            b
        +
        <term>
          const
Figure 3.1 A parse tree
for the simple statement A = B * (A + C)
Figure 3.3 The unique parse tree for
A = B + (A * C) using an unambiguous grammar
Figure 3.4 A parse tree for A = B +(A + C)
illustrating the associativity of addition
Figure 3.5 Two distinct parse trees for the same sentential form
• Extended BNF (EBNF) consists of abbreviations only; it adds no descriptive power to BNF:
– Optional parts are placed in brackets ([ ]), e.g. C
<proc_call> -> ident [ ( <expr_list>)]
– Put alternative parts of RHSs in parentheses and
separate them with vertical bars
<term> -> <term> (*|/|%) <factor>
– Put repetitions (0 or more) in braces ({ })
<ident_list> -> identifier {, identifier}
BNF and EBNF
• BNF:
<expr> -> <expr> + <term>
| <expr> - <term>
| <term>
<term> -> <term> * <factor>
| <term> / <factor>
| <factor>
• EBNF:
<expr> -> <term> {(+ | -) <term>}
<term> -> <factor> {(* | /) <factor>}
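One practical consequence of the EBNF form is that each pair of braces maps directly onto a loop. The following sketch assumes that <factor> is a single operand token, since the slide leaves it undefined; parse_expr and its helpers are hypothetical names.

```python
OPERATORS = ("+", "-", "*", "/")

def parse_expr(tokens):
    """Parse a token list using the two EBNF rules and return a nested
    tuple (op, left, right); repetition is handled by while loops."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        # Assumption: <factor> is any single non-operator token.
        nonlocal pos
        tok = peek()
        if tok is None or tok in OPERATORS:
            raise SyntaxError(f"expected an operand at position {pos}")
        pos += 1
        return tok

    def term():
        # <term> -> <factor> {(* | /) <factor>}
        nonlocal pos
        node = factor()
        while peek() in ("*", "/"):
            op = tokens[pos]
            pos += 1
            node = (op, node, factor())
        return node

    def expr():
        # <expr> -> <term> {(+ | -) <term>}
        nonlocal pos
        node = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            pos += 1
            node = (op, node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token at position {pos}")
    return tree

print(parse_expr("a - b - c".split()))
print(parse_expr("a + b * c".split()))
```

Because each loop folds the new operand into the tree built so far, the result is left-associative, matching the left-recursive BNF rules, and * and / bind tighter than + and -.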