CS344-232 Mathematical Foundation for Computer Science II
Formal Language
Alphabets and Languages
An alphabet is a finite set of symbols and is denoted by Σ. In fact, any object can be in an
alphabet; from a formal point of view, an alphabet is simply a finite set of any sort.
A string over an alphabet is a finite sequence of symbols from the alphabet. A string may
have no symbols at all; in this case it is called the empty string and is denoted by ε.
The length of a string is the number of symbols in the string. We denote the length of a string w
by |w|; thus |101| = 3 and |ε| = 0.
Two strings over the same alphabet can be combined to form a third by the operation of
concatenation. The concatenation of strings x and y, written x ∘ y or simply xy, is the string x
followed by the string y.
A string v is a substring of a string w if and only if there are strings x and y such that w =
xvy. Both x and y could be ε, so every string is a substring of itself; and taking x = w and v = y = ε,
we see that ε is a substring of every string. If w = xv for some x, then v is a suffix of w; if w = vy for
some y, then v is a prefix of w.
For each string w and each natural number i, the string w^i is defined by w^0 = ε, the empty
string, and w^(i+1) = w^i ∘ w for each i ≥ 0.
The reversal of a string w, denoted by w^R, is the string “spelled backwards”: for example,
reverse^R = esrever.
The set of all strings - including the empty string - over an alphabet Σ is denoted by Σ*.
Any set of strings over an alphabet Σ - that is, any subset of Σ* - will be called a language. Thus
Σ*, ∅, and Σ are languages.
Since a language is simply a special kind of set, we can specify a finite language by listing
all its strings. For example, { aba, cde, fg} is a language over {a,b, …, z}. However, most languages
of interest are infinite, so that listing all the strings is not possible. Thus we can specify infinite
languages by the scheme
L = { w ∈ Σ* : w has property P },
e.g. { w ∈ {0,1}* : w has an equal number of 0’s and 1’s } and { w ∈ Σ* : w = w^R }, the set of
palindromes (for example, madam^R = madam).
If L1 and L2 are languages over Σ, their concatenation is L = L1 ∘ L2, or simply L1L2, where
L = { w ∈ Σ* : w = x ∘ y for some x ∈ L1 and y ∈ L2 }.
(Project for the Establishment of the Department of Computer Science, Faculty of Science, Prince of Songkla University)
Example. Let Σ = { 0, 1 }, L1 = { w ∈ Σ* : w has an even number of 0’s (and any number of 1’s) }, and
L2 = { w ∈ Σ* : w starts with a 0 and the rest of the symbols are 1’s }. Then L1 ∘ L2 = { w ∈ Σ* : w has
an odd number of 0’s (and any number of 1’s) }.
Another language operation is the closure or Kleene star of a single language L, denoted
by L*. L* is the set of all strings obtained by concatenating zero or more strings from L. (The
concatenation of zero strings is ε, and the concatenation of one string is the string itself.) Thus,
L* = { w ∈ Σ* : w = w1 ∘ w2 ∘ … ∘ wk for some k ≥ 0 and some w1, …, wk ∈ L }, where k = 0 gives w = ε.
Example. If L = {01, 1, 100}, then 110001110011 ∈ L*, since it is equal to 1 ∘ 100 ∘ 01 ∘ 1 ∘ 100 ∘ 1 ∘ 1.
Note! ε ∈ L* for every language L, and ∅* = {ε}.
We write L+ for the set LL*. Equivalently,
L+ = { w ∈ Σ* : w = w1 ∘ w2 ∘ … ∘ wk for some k ≥ 1 and some w1, …, wk ∈ L }.
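The definitions of concatenation and Kleene star can be tried out on small finite languages. The following is a minimal Python sketch and not part of the original notes: the function names concatenate and in_kleene_star are made up for illustration, and the membership test assumes that L is a finite set of strings.

```python
def concatenate(l1: set[str], l2: set[str]) -> set[str]:
    """Return the language { xy : x in l1 and y in l2 } for finite l1, l2."""
    return {x + y for x in l1 for y in l2}


def in_kleene_star(w: str, language: set[str]) -> bool:
    """Return True if w is a concatenation of zero or more strings of language."""
    # reachable[i] is True when the prefix w[:i] belongs to L*.
    reachable = [False] * (len(w) + 1)
    reachable[0] = True  # the empty string is in L* for every L
    for i in range(len(w)):
        if not reachable[i]:
            continue
        for piece in language:
            # The empty piece never extends a prefix, so it is skipped.
            if piece and w.startswith(piece, i):
                reachable[i + len(piece)] = True
    return reachable[len(w)]
```

With L = {"01", "1", "100"}, the call in_kleene_star("110001110011", L) succeeds, matching the decomposition given in the example, while in_kleene_star("0", L) fails.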
Regular Expressions
Here are the rules that define the regular expressions over an alphabet Σ. Associated with
each rule is a specification of the language denoted by the regular expression being defined.
1. ε is a regular expression that denotes {ε}, that is, the set containing the empty string.
2. If a is a symbol in Σ, then a is a regular expression that denotes {a}, i.e., the set containing the
string a.
3. Suppose r and s are regular expressions denoting the languages L(r) and L(s). Then,
a) r | s is a regular expression denoting L(r) ∪ L(s).
b) r s is a regular expression denoting L(r)L(s).
c) r* is a regular expression denoting (L(r))*.
d) (r) is a regular expression denoting L(r).
A language denoted by a regular expression is said to be a regular set or regular language.
Unnecessary parentheses can be avoided in regular expressions if we adopt the conventions that:
1. the unary operator * has the highest precedence and is left associative,
2. concatenation has the second highest precedence and is left associative,
3. | has the lowest precedence and is left associative.
Under these conventions, (a) | ((b)*(c)) is equivalent to a|b*c. Both expressions denote the set of
strings that are either a single a or zero or more b’s followed by one c.
Example Let Σ = { a, b }.
1. The regular expression a | b denotes the set { a, b }.
2. The regular expression (a|b)(a|b) denotes { aa, ab, ba, bb }. Another regular expression for this
same set is aa | ab | ba | bb.
3. The regular expression a* denotes the set of all strings of zero or more a’s, i.e. { ε, a, aa, aaa, … }.
4. The regular expression (a|b)* denotes the set of all strings containing zero or more instances of
an a or b, that is, the set of all strings of a’s and b’s.
5. The regular expression a | a*b denotes the set containing the string a and all strings consisting
of zero or more a’s followed by a b.
Example Let Σ = { A, B, …, Z, 0, 1, …, 9 }.
1. The regular expression
AND | ARRAY | BEGIN | CASE | CONST | DIV | DO | DOWNTO | ELSE | END | FILE | FOR |
FUNCTION | GOTO | IF | IN | LABEL | MOD | NIL | NOT | OF | OR | PACKED | PROCEDURE |
PROGRAM | RECORD | REPEAT | SET | THEN | TO | TYPE | UNTIL | VAR | WHILE | WITH
denotes the set of keywords of the Standard Pascal language.
2. The regular expression (A|B|…|Z)(A|B|…|Z|0|1|…|9)* denotes the set of identifiers of the Standard
Pascal language.
3. The regular expression (0|1|…|9)(0|1|…|9)*, or equivalently (0|1|…|9)+, denotes the set of unsigned
integers of the Standard Pascal language.
Note! a+ = aa*
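The identifier and unsigned-integer expressions can be tried with Python's re module. This is a small sketch rather than part of the notes: the character classes [A-Z] and [0-9] stand in for (A|B|…|Z) and (0|1|…|9), and the helper names are hypothetical.

```python
import re

# Character classes abbreviate the long alternations of single symbols.
IDENTIFIER = re.compile(r"[A-Z][A-Z0-9]*")
UNSIGNED_INTEGER = re.compile(r"[0-9]+")


def is_identifier(s: str) -> bool:
    """True if the whole string s matches (A|...|Z)(A|...|Z|0|...|9)*."""
    return IDENTIFIER.fullmatch(s) is not None


def is_unsigned_integer(s: str) -> bool:
    """True if the whole string s matches (0|...|9)+."""
    return UNSIGNED_INTEGER.fullmatch(s) is not None
```

fullmatch is used instead of match so that the whole string, not just a prefix, must belong to the language.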
If two regular expressions r and s denote the same language, we say r and s are equivalent and
write r = s. For example, (a|b) = (b|a).
Algebraic properties of regular expressions.
Axiom                        Description
r | s = s | r                | is commutative
r | (s | t) = (r | s) | t    | is associative
(rs)t = r(st)                concatenation is associative
r(s | t) = rs | rt
(s | t)r = sr | tr           concatenation distributes over |
εr = r and rε = r            ε is the identity element for concatenation
r* = (r | ε)*                relation between * and ε
r** = r*                     * is idempotent
Finite Automata
A recognizer for a language is a program that takes as input a string x and answers “yes” if
x is a sentence of the language and “no” otherwise. We compile a regular expression into a
recognizer by constructing a generalized transition diagram called a finite automaton. A finite
automaton can be deterministic or nondeterministic, where “nondeterministic” means that more than
one transition out of a state may be possible on the same input symbol.
Nondeterministic Finite Automata
A nondeterministic finite automaton (NFA, for short) is a mathematical model that consists of
1. a finite set of states, K
2. a set of input symbols Σ (the input symbol alphabet)
3. a transition function move that maps state-symbol pairs to sets of states,
4. a state that is distinguished as the start (or initial) state, s
5. a set of states F distinguished as accepting (or final) states
Definition A nondeterministic finite automaton is a quintuple M = (K, Σ, δ, s, F),
where
K is a finite set of states,
Σ is an alphabet,
s ∈ K is the initial state,
F ⊆ K is the set of final states,
and δ, the transition function, maps K × (Σ ∪ {ε}) to 2^K, the set of all subsets of K.
An NFA can be represented diagrammatically by a labeled directed graph, called a transition
graph or state diagram, in which the nodes are the states and the labeled edges represent the
transition function. A final state is drawn as a double circle, every other state as a single circle.
Example The state diagram for an NFA that recognizes the language (a|b)*abb is shown below:
[Figure: state diagram of the NFA. The start state 0 has loops labeled a and b; an edge labeled a leads from 0 to 1, an edge labeled b from 1 to 2, and an edge labeled b from 2 to the accepting state 3.]
set of states, K = { 0, 1, 2, 3 }
input symbol alphabet, Σ = { a, b }
start state, s = 0
set of accepting states, F = { 3 }
The transition function δ : { 0, 1, 2, 3 } × (Σ ∪ {ε}) → 2^K, where 2^K = { ∅, {0}, {1}, {2}, {3}, {0,1}, {0,2},
{0,3}, {1,2}, {1,3}, {2,3}, {0,1,2}, {0,1,3}, {0,2,3}, {1,2,3}, {0,1,2,3} }, is given below:
k    t    δ(k,t)
0    a    {0,1}
0    b    {0}
0    ε    ∅
1    a    ∅
1    b    {2}
1    ε    ∅
2    a    ∅
2    b    {3}
2    ε    ∅
3    a    ∅
3    b    ∅
3    ε    ∅
or, as a transition table:
             Input symbol (t)
State (k)    a        b      ε
>0           {0,1}    {0}    -
1            -        {2}    -
2            -        {3}    -
3*           -        -      -
Note! > marks the start state, * marks a final state, and - stands for the empty set. Exercise! Try the inputs ab and abb.
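The exercise can be checked mechanically by tracking the set of states the NFA may be in after each symbol. The sketch below is an illustration only: the names NFA_DELTA and nfa_accepts are hypothetical, missing table entries are read as the empty set, and ε-transitions are ignored because this particular NFA has none.

```python
# Transition table of the NFA for (a|b)*abb; absent pairs mean the empty set.
NFA_DELTA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
FINAL = {3}


def nfa_accepts(x: str) -> bool:
    """Return True if some sequence of moves on x ends in a final state."""
    current = {START}
    for c in x:
        # Follow every possible transition on c from every current state.
        current = {q for p in current for q in NFA_DELTA.get((p, c), set())}
    return bool(current & FINAL)
```

On input ab the reachable set ends as {0, 2}, so the string is rejected; on abb it ends as {0, 3}, which contains the final state 3.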
Deterministic Finite Automata
A deterministic finite automaton (DFA, for short) is a special case of a nondeterministic finite
automaton in which
1. no state has an ε-transition, i.e. a transition on input ε, and
2. for each state s and input symbol a, there is at most one edge labeled a leaving s.
Definition A deterministic finite automaton is a quintuple M = (K, Σ, δ, s, F),
where
K is a finite set of states,
Σ is an alphabet,
s ∈ K is the initial state,
F ⊆ K is the set of final states,
and δ, the transition function, is a function from K × Σ to K.
Example
(a) Consider the deterministic finite automaton M = (K, Σ, δ, s, F), where
K = { q0, q1 }
Σ = { 0, 1 }
s = q0
F = { q0 }
and δ is the transition function tabulated below:
k     t    δ(k,t)
q0    0    q0
q0    1    q1
q1    0    q1
q1    1    q0
or, as a transition table:
          Input symbol
State     0     1
>q0*      q0    q1
q1        q1    q0
Then L(M) is the set of all strings in {0, 1}* that have an even number of 1’s: M is in the accepting
state q0 exactly when the number of 1’s read so far is even.
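(b) Standard Pascal identifier
[Figure: DFA with start state 0 and state 1; an edge labeled A|B|…|Z from 0 to 1 and a loop labeled A|B|…|Z|0|1|…|9 on state 1.]
(c) Standard Pascal unsigned integer
[Figure: DFA with start state 0 and state 1; an edge labeled 0|1|…|9 from 0 to 1 and a loop labeled 0|1|…|9 on state 1.]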
Algorithm Simulate a DFA.
Input. An input string x terminated by an end-of-file character eof. A DFA D with start state s0 and set
of accepting states F.
Output. The answer “yes” if D accepts x; “no” otherwise.
s := s0
c := nextchar
while c ≠ eof do
    s := move(s,c)
    c := nextchar
end
if s is in F then
    return “yes”
else return “no”
Note! move(s,c) gives the state to which there is a transition from state s on input character c.
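The simulation algorithm translates almost line for line into Python. In this sketch, which is an illustration and not the notes' own code, the end of the Python string plays the role of eof, move is represented as a dictionary, and the names simulate_dfa and EXAMPLE_A (the transition table of example (a)) are chosen here.

```python
def simulate_dfa(move: dict, start, accepting: set, x: str) -> str:
    """Run the DFA on x and answer "yes" or "no" as in the algorithm."""
    s = start
    for c in x:              # the loop ends where the pseudocode reads eof
        s = move[(s, c)]     # move(s, c): the unique next state
    return "yes" if s in accepting else "no"


# Transition table of example (a), with states q0 and q1.
EXAMPLE_A = {
    ("q0", "0"): "q0",
    ("q0", "1"): "q1",
    ("q1", "0"): "q1",
    ("q1", "1"): "q0",
}
```

For instance, simulate_dfa(EXAMPLE_A, "q0", {"q0"}, "0110") answers "yes" (two 1's are read), while the input "010" gives "no".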
Construction of an NFA from a Regular Expression
Algorithm. (Thompson’s construction)
Input. A regular expression r over an alphabet Σ.
Output. An NFA N accepting L(r).
1. For ε, construct the NFA
[Figure: start state i with a single edge labeled ε to the accepting state f.]
2. For a in Σ, construct the NFA
[Figure: start state i with a single edge labeled a to the accepting state f.]
3. Suppose N(s) and N(t) are NFAs for regular expressions s and t.
a) For the regular expression s | t, construct the following composite NFA N(s | t):
[Figure: a new start state i and a new accepting state f, with N(s) and N(t) placed in parallel between them.]
b) For the regular expression st, construct the following composite NFA N(st):
[Figure: N(s) followed by N(t); the start state i is that of N(s) and the accepting state f is that of N(t).]
c) For the regular expression s*, construct the following composite NFA N(s*):
[Figure: a new start state i and a new accepting state f around N(s).]
d) For the parenthesized regular expression (s), use N(s) itself as the NFA.
Example. Let us construct an NFA from the regular expression (a | b)*abb.
For the first a, we construct the NFA
[Figure: start state 2 with an edge labeled a to state 3.]
For the first b, we construct the NFA
[Figure: start state 4 with an edge labeled b to state 5.]
For (a | b), we construct the NFA
[Figure: new start state 1 and new accepting state 6 around the two NFAs above (states 2, 3 and 4, 5).]
For (a | b)*, we construct the NFA
[Figure: new start state 0 and new accepting state 7 around the NFA for (a | b) (states 1 to 6).]
For the second a, we construct the NFA
[Figure: start state 7’ with an edge labeled a to state 8.]
For (a | b)*a, we construct the NFA
[Figure: the NFA for (a | b)* followed by the NFA for the second a, with states 7 and 7’ merged.]
Continuing in this fashion we obtain the NFA for (a | b)*abb
[Figure: NFA with states 0 to 10, start state 0 and accepting state 10; edges labeled a lead from 2 to 3 and from 7 to 8, edges labeled b lead from 4 to 5, from 8 to 9, and from 9 to 10; all remaining edges are labeled ε.]
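The construction can be sketched in Python under a few assumptions that are not in the notes: the regular expression is already given as a nested tuple (a hand-built syntax tree), ε is represented by None, states are numbered by a counter, and concatenation links the two machines with an ε-edge instead of merging states as the figures do. The ε-edges for s | t and s* follow the usual textbook form of Thompson's construction. All names are hypothetical.

```python
from itertools import count

EPS = None  # label of an ε-edge (an assumption of this sketch)


def thompson(node, edges, fresh):
    """Add the NFA for node to edges and return its (start, accept) states."""
    kind = node[0]
    if kind == "cat":
        si, sf = thompson(node[1], edges, fresh)
        ti, tf = thompson(node[2], edges, fresh)
        edges.append((sf, EPS, ti))  # simplification: ε-link instead of merging
        return si, tf
    i, f = next(fresh), next(fresh)  # new start and accepting states
    if kind == "eps":
        edges.append((i, EPS, f))
    elif kind == "sym":
        edges.append((i, node[1], f))
    elif kind == "alt":
        si, sf = thompson(node[1], edges, fresh)
        ti, tf = thompson(node[2], edges, fresh)
        edges += [(i, EPS, si), (i, EPS, ti), (sf, EPS, f), (tf, EPS, f)]
    elif kind == "star":
        si, sf = thompson(node[1], edges, fresh)
        edges += [(i, EPS, si), (sf, EPS, f), (sf, EPS, si), (i, EPS, f)]
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return i, f


def closure(states, edges):
    """All states reachable from states using ε-edges only."""
    seen, stack = set(states), list(states)
    while stack:
        p = stack.pop()
        for src, label, dst in edges:
            if src == p and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(regex, x):
    """Build the NFA for regex and simulate it on the string x."""
    edges = []
    start, accept = thompson(regex, edges, count())
    current = closure({start}, edges)
    for c in x:
        moved = {dst for src, label, dst in edges if src in current and label == c}
        current = closure(moved, edges)
    return accept in current


A, B = ("sym", "a"), ("sym", "b")
# Syntax tree of (a | b)*abb.
REGEX = ("cat", ("cat", ("cat", ("star", ("alt", A, B)), A), B), B)
```

Because of the ε-link used for concatenation, the resulting NFA has a few more states than the eleven-state machine of the example, but it accepts the same language.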
Conversion of an NFA into a DFA
Algorithm. Constructing a DFA from an NFA.
Input. An NFA N.
Output. A DFA D accepting the same language.
Initially, ε-closure({s0}) is the only state in Dstates and it is unmarked;
while there is an unmarked state T in Dstates do begin
    mark T;
    for each input symbol a do begin
        U := ε-closure(move(T,a));
        if U is not in Dstates then
            add U as an unmarked state to Dstates;
        Dtran[T,a] := U
    end
end
A state of the DFA (a member of Dstates) is a final state if it contains at least one final state of the NFA.
Note! move(T,a) is the set of NFA states to which there is a transition on input symbol a from some NFA
state s in T. Dtran[T,a] := U records a transition of the DFA on input symbol a from state T to state U.
Computation of ε-closure(T)
push all states in T onto stack;
initialize ε-closure(T) to T;
while stack is not empty do begin
    pop t, the top element, off of stack;
    for each state u with an edge from t to u labeled ε do
        if u is not in ε-closure(T) then begin
            add u to ε-closure(T);
            push u onto stack
        end
end
Example. The figure below shows an NFA accepting the language aa* | bb*.
[Figure: NFA with start state 0; ε-edges from 0 to 1 and from 0 to 3; an edge labeled a from 1 to 2 and a loop labeled a on 2; an edge labeled b from 3 to 4 and a loop labeled b on 4. States 2 and 4 are accepting.]
Step 1: Find the start state of the DFA by computing ε-closure({0}).
ε-closure({0}) = {0,1,3}
[Figure: DFA so far: start state {0,1,3}.]
Step 2: Mark {0,1,3} (a marked state is written with a *) and, for every input symbol, a or b, find
a) move({0,1,3}, a) = {2}, then ε-closure(move({0,1,3}, a)) = ε-closure({2}) = {2}; this is a new, unmarked state of the DFA.
[Figure: DFA so far: {0,1,3}* with an edge labeled a to {2}.]
b) move({0,1,3}, b) = {4}, then ε-closure(move({0,1,3}, b)) = ε-closure({4}) = {4}; this is a new, unmarked state of the DFA.
[Figure: DFA so far: {0,1,3}* with an edge labeled a to {2} and an edge labeled b to {4}.]
Step 3: Mark {2} and, for every input symbol, a or b, find
a) move({2}, a) = {2}, then ε-closure(move({2}, a)) = ε-closure({2}) = {2}; this is an existing state of the DFA.
[Figure: DFA so far: as before, plus a loop labeled a on {2}*.]
b) move({2}, b) = ∅, then ε-closure(move({2}, b)) = ε-closure(∅) = ∅, so no transition on b is added from {2}.
Step 4: Mark {4} and, for every input symbol, a or b, find
a) move({4}, a) = ∅, then ε-closure(move({4}, a)) = ε-closure(∅) = ∅, so no transition on a is added from {4}.
b) move({4}, b) = {4}, then ε-closure(move({4}, b)) = ε-closure({4}) = {4}; this is an existing state of the DFA.
[Figure: DFA so far: as before, plus a loop labeled b on {4}*.]
Step 5: Every state of the DFA is marked and there is no new unmarked state, so stop.
Step 6: States {2} and {4} are final states, since they contain the NFA final states 2 and 4.
[Figure: final DFA: start state {0,1,3} with an edge labeled a to the final state {2} and an edge labeled b to the final state {4}; {2} has a loop labeled a and {4} has a loop labeled b.]
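Example. The figure below shows the NFA accepting the language (a|b)*abb.
[Figure: NFA with states 0 to 10, start state 0 and accepting state 10; edges labeled a lead from 2 to 3 and from 7 to 8, edges labeled b lead from 4 to 5, from 8 to 9, and from 9 to 10; all remaining edges are labeled ε.]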
The start state of the equivalent DFA is ε-closure({0}), which is A = {0,1,2,4,7}, since these are exactly
the states reachable from state 0 via a path in which every edge is labeled ε. Note that a path can
have no edges, so 0 is reached from itself by such a path.
The input symbol alphabet here is {a, b}. The algorithm tells us to mark A and then to
compute ε-closure(move(A,a)). We first compute move(A,a), the set of states of N having transitions
on a from members of A. Among the states 0, 1, 2, 4 and 7, only 2 and 7 have such transitions, to 3
and 8, so
ε-closure(move({0,1,2,4,7}, a)) = ε-closure({3,8}) = {1, 2, 3, 4, 6, 7, 8}.
Let us call this set B. Thus, Dtran[A,a] = B.
Among the states in A, only 4 has a transition on b, to 5, so the DFA has a transition on b
from A to C = ε-closure({5}) = {1, 2, 4, 5, 6, 7}. Thus, Dtran[A,b] = C.
If we continue this process with the still unmarked sets B and C, we eventually reach the
point where all sets that are states of the DFA are marked. This is certain since there are “only” 2^11
different subsets of a set of eleven states, and a set, once marked, is marked forever. The five
different sets of states we actually construct are:
A = {0, 1, 2, 4, 7}
B = {1, 2, 3, 4, 6, 7, 8}
C = {1, 2, 4, 5, 6, 7}
D = {1, 2, 4, 5, 6, 7, 9}
E = {1, 2, 4, 5, 6, 7, 10}
State A is the start state, and state E is the only accepting state. The complete transition table Dtran
is shown below:
          input symbol
state     a     b
>A        B     C
B         B     D
C         B     C
D         B     E
E*        B     C
Minimizing the number of states of a DFA
There is a simple method for finding the minimum-state deterministic finite automaton equivalent
to a given one. Rather than give a formal algorithm for computing it, we work through an example.
First some terminology is needed. Write δ(p, x) for the state reached from state p after reading the
string x. Let ≡ be the equivalence relation on the states of a deterministic finite automaton such that
p ≡ q if and only if, for each input string x, δ(p, x) is an accepting state if and only if δ(q, x) is an
accepting state; we then say that p is equivalent to q. We say that p is distinguishable from q if there
exists an x such that δ(p, x) is in F and δ(q, x) is not, or vice versa.
Next, for each pair of states p and q that are not already known to be distinguishable, we
consider the pairs of states r = δ(p, a) and s = δ(q, a) for each input symbol a. If states r and s have
been shown to be distinguishable by some string x, then p and q are distinguishable by the string ax.
Thus if the entry (r, s) in the table has an X, an X is also placed at the entry (p, q). If the entry (r, s)
does not yet have an X, then the pair (p, q) is placed on a list associated with the (r, s)-entry. At
some future time, if the (r, s)-entry receives an X, then each pair on the list associated with the
(r, s)-entry also receives an X.
Example Let M be the finite automaton:
[Figure: transition diagram of M with states a to h over the input alphabet {0, 1}; a is the start state and c is the only final state.]
We construct a table with an entry for each pair of states. An X is placed in the table each
time we discover a pair of states that cannot be equivalent. Initially an X is placed in each entry
corresponding to one final state and one nonfinal state. In our example, we place an X in the entries
(a, c), (b, c), (c, d), (c, e), (c, f), (c, g), and (c, h).
Continuing with the example, we place an X in the entry (a, b), since the entry (δ(b, 1), δ(a, 1)) =
(c, f) already has an X. Similarly, the (a, d)-entry receives an X since the entry (δ(a, 0), δ(d, 0)) =
(b, c) has an X. Consideration of the (a, e)-entry on input 0 results in the pair (a, e) being placed on
the list associated with (b, h). Observe that on input 1, both a and e go to the same state f and hence
no string starting with a 1 can distinguish a from e. Because of the 0-input, the pair (a, g) is placed
on the list associated with (b, g). When the (b, g)-entry is considered, it receives an X on account of a
1-input, and hence the pair (a, g) receives an X since it was on the list for (b, g). The string 01
distinguishes a from g. On completion of the table:
b    X
c    X    X
d    X    X    X
e         X    X    X
f    X    X    X         X
g    X    X    X    X    X    X
h    X         X    X    X    X    X
     a    b    c    d    e    f    g
We conclude that the equivalent states are a ≡ e, b ≡ h, and d ≡ f. The minimum-state finite
automaton is given below:
[Figure: minimum-state automaton with the five states {a, e} (start state), {b, h}, {c}, {d, f}, and {g}, with edges labeled 0 and 1.]
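The table-filling idea can be sketched in Python. Two assumptions are made here and are not taken from the notes: the transition table DELTA is a reconstruction of the lost figure (it agrees with every transition quoted in the text), and instead of keeping a list for each table entry, the sketch simply repeats passes over all pairs until no new X appears, which marks the same pairs. All names are hypothetical.

```python
from itertools import combinations

# Assumed transition table of M (reconstructed; consistent with the text).
DELTA = {
    "a": {"0": "b", "1": "f"},
    "b": {"0": "g", "1": "c"},
    "c": {"0": "a", "1": "c"},
    "d": {"0": "c", "1": "g"},
    "e": {"0": "h", "1": "f"},
    "f": {"0": "c", "1": "g"},
    "g": {"0": "g", "1": "e"},
    "h": {"0": "g", "1": "c"},
}
FINAL = {"c"}


def distinguishable_pairs(delta, finals):
    """Return the set of pairs (as frozensets) that receive an X."""
    states = sorted(delta)
    # Initially mark every pair of one final and one nonfinal state.
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    changed = True
    while changed:                      # repeat until a pass adds no X
        changed = False
        for p, q in combinations(states, 2):
            if frozenset((p, q)) in marked:
                continue
            for a in delta[p]:
                succ = frozenset((delta[p][a], delta[q][a]))
                if len(succ) == 2 and succ in marked:
                    marked.add(frozenset((p, q)))
                    changed = True
                    break
    return marked


def equivalent_pairs(delta, finals):
    """Return the unmarked pairs, i.e. the pairs of equivalent states."""
    marked = distinguishable_pairs(delta, finals)
    return [(p, q) for p, q in combinations(sorted(delta), 2)
            if frozenset((p, q)) not in marked]
```

Under the assumed table, equivalent_pairs(DELTA, FINAL) yields the pairs (a, e), (b, h), and (d, f), in agreement with the completed table.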
Finite Automata with output
One limitation of the finite automaton as we have defined it is that output is limited to a
binary signal: “accept” | “don’t accept”. Models in which the output is chosen from some other
alphabet have been considered. There are two distinct approaches; the output may be associated
with the state (called a Moore machine) or with the transition (called a Mealy machine). We notice
that the two machine types produce the same input-output mappings.
Moore machines
A Moore machine is a six-tuple (K, Σ, Δ, δ, λ, s), where K, Σ, δ, and s are as in the DFA. Δ is the
output alphabet and λ is a mapping from K to Δ giving the output associated with each state. The
output of M in response to input a1a2 … an, n ≥ 0, is λ(q0) λ(q1) … λ(qn), where q0, q1, …, qn is the
sequence of states such that q0 = s and δ(q(i-1), ai) = qi for 1 ≤ i ≤ n. Note that any Moore machine
gives output λ(q0) in response to input ε. The DFA may be viewed as a special case of a Moore machine
where the output alphabet is {0, 1} and state q is “accepting” if and only if λ(q) = 1.
Example, Suppose we wish to determine the residue mod 3 for each binary string treated as a binary
integer. To begin, observe that if i written in binary is followed by a 0, the resulting string has value
2*i, and if i in binary is followed by a 1, the resulting string has value 2*i + 1. If the remainder of i/3 is
p, then the remainder of 2*i/3 is 2*p mod 3. If p = 0, 1, or 2, then 2*p mod 3 is 0, 2, or 1, respectively.
Similarly, the remainder of (2*i + 1)/3 is 1, 0, or 2, respectively.
It suffices therefore to design a Moore machine with three states, q0, q1, and q2, where qj is
entered if and only if the input seen so far has residue j. We define λ(qj) = j for j = 0, 1, and 2. The
following figure shows the transition diagram, where outputs label the states.
[Figure: transition diagram with states q0/0 (start), q1/1, and q2/2. On input 0: q0 to q0, q1 to q2, q2 to q1. On input 1: q0 to q1, q1 to q0, q2 to q2.]
Note! We write q/a in a state to indicate that λ(q) = a.
On input 1010 the sequence of states entered is q0, q1, q2, q2, q1, giving output sequence
01221. That is, ε (which has “value” 0) has residue 0, 1 has residue 1, 2 (in decimal) has residue 2,
5 has residue 2, and 10 (in decimal) has residue 1.
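The residue machine is small enough to run directly. In the following sketch (an illustration, with hypothetical names moore_output, MOD3_DELTA, and MOD3_OUT) the states q0, q1, q2 are represented by the integers 0, 1, 2, and the transition table is generated from the rule that appending bit b to a number with residue p gives residue (2p + b) mod 3.

```python
def moore_output(delta: dict, out: dict, start, x: str) -> list:
    """Return the outputs of the states entered, starting with the start state."""
    q = start
    outputs = [out[q]]          # a Moore machine emits λ(q0) even on input ε
    for a in x:
        q = delta[(q, a)]
        outputs.append(out[q])
    return outputs


# State p stands for qp: appending bit b maps residue p to (2p + b) mod 3.
MOD3_DELTA = {(p, b): (2 * p + int(b)) % 3 for p in range(3) for b in "01"}
MOD3_OUT = {p: p for p in range(3)}   # λ(qj) = j
```

Running moore_output(MOD3_DELTA, MOD3_OUT, 0, "1010") reproduces the output sequence 0, 1, 2, 2, 1 of the example.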
Mealy machines
A Mealy machine is a six-tuple (K, Σ, Δ, δ, λ, s), where all is as in the Moore machine, except that
λ maps K × Σ to Δ. That is, λ(q, a) gives the output associated with the transition from state q on
input a. The output of M in response to input a1a2 … an, n ≥ 0, is λ(q0, a1) λ(q1, a2) … λ(q(n-1), an),
where q0, q1, …, qn is the sequence of states such that q0 = s and δ(q(i-1), ai) = qi for 1 ≤ i ≤ n. Note
that this sequence has length n rather than length n + 1 as for the Moore machine, and on input ε a
Mealy machine gives output ε.
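Example,
[Figure: Mealy machine with start state q0 and states p0 and p1. Edges: q0 to p0 labeled 0/n, q0 to p1 labeled 1/n, p0 to p0 labeled 0/y, p0 to p1 labeled 1/n, p1 to p0 labeled 0/n, p1 to p1 labeled 1/y.]
We use the label a/b on an arc from state p to state q to indicate that δ(p, a) = q and λ(p, a) = b.
The response of M to input 01100 is nnyny, with the sequence of states entered being q0p0p1p1p0p0.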
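The response to 01100 can be reproduced with a few lines of Python. This is a sketch only: the edge list is the one read off the arc labels of the example, and the names mealy_output, MEALY_DELTA, and MEALY_OUT are made up for illustration.

```python
def mealy_output(delta: dict, out: dict, start, x: str) -> str:
    """Return the concatenated outputs of the transitions taken on x."""
    q, outputs = start, []
    for a in x:
        outputs.append(out[(q, a)])   # output belongs to the transition
        q = delta[(q, a)]
    return "".join(outputs)


# Arcs of the example machine: (state, input) -> next state / output.
MEALY_DELTA = {
    ("q0", "0"): "p0", ("q0", "1"): "p1",
    ("p0", "0"): "p0", ("p0", "1"): "p1",
    ("p1", "0"): "p0", ("p1", "1"): "p1",
}
MEALY_OUT = {
    ("q0", "0"): "n", ("q0", "1"): "n",
    ("p0", "0"): "y", ("p0", "1"): "n",
    ("p1", "0"): "n", ("p1", "1"): "y",
}
```

The machine answers y exactly when the last two input symbols are equal, which is why the output has one symbol per input symbol and is empty on the empty input.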
Theorem: If M1 is a Moore machine, then there is a Mealy machine M2 equivalent to M1.
Theorem: If M1 is a Mealy machine, then there is a Moore machine M2 equivalent to M1.
Example. Designing an automatic soft-drink vending machine. Suppose the machine accepts only 1-baht
and 5-baht coins, and a cup of drink costs 3 baht. There are two kinds of drink, green and red.
Once the buyer has inserted coins worth 3 baht, the buyer may press either the green button or the
red button to receive a cup of the green drink or the red drink, respectively. If the coins inserted
are worth more than 3 baht, the machine also returns the change.
[Figure: transition diagram of the vending machine with start state s0 and states s1, s2, and s3; each edge is labeled input/output.]
Alternatively, the machine can be represented by the following transition table (a coin output is the change returned, in baht; - means no output):
          Next state on input                      Output on input
State     1     5     green button  red button     1    5    green button  red button
s0        s1    s3    s0            s0             -    2    -             -
s1        s2    s3    s1            s1             -    3    -             -
s2        s3    s3    s2            s2             -    4    -             -
s3        s3    s3    s0            s0             1    5    green drink   red drink
Grammars and the Chomsky Hierarchy of Languages
Definition A grammar G is a quadruple (V, Σ, R, S), where
V is an alphabet or vocabulary,
Σ (the set of terminal symbols or terminal alphabet) is a subset of V,
V - Σ is the set of nonterminal symbols or nonterminal alphabet,
R (the set of rules or productions) is a finite set of ordered pairs (α, β) such that α and β
are in V* and α contains at least one symbol from V - Σ, and
S (the start symbol) is an element of V - Σ which is distinguished from the other nonterminal
symbols.
For any α, β ∈ V*, we write α → β whenever (α, β) ∈ R.
For any strings u, v ∈ V*, we say that v is derivable from u in one step, written u ⇒ v, if and
only if there are strings x, y, α, β ∈ V* such that u = xαy, v = xβy, and α → β. We call any
sequence of the form w0 ⇒ w1 ⇒ … ⇒ wn a derivation of wn from w0. Here w0, …, wn may be
any strings in V*, and n, the length of the derivation or the number of steps, may be any natural
number, including zero.
We write w0 ⇒* wn if the derivation involves zero or more steps and w0 ⇒+ wn if the
derivation involves one or more steps.
Definition A sentential form is any string that is derivable in zero or more steps from the start
symbol. The language L(G) generated by a grammar G = (V, Σ, R, S) is the set of all sentential forms
whose symbols are all terminal, i.e., L(G) = { w ∈ Σ* : S ⇒* w }. An element of the language is called
a sentence.
Definition A grammar G = (V, Σ, R, S) is said to be of type i if it satisfies the corresponding
restrictions in this list:
i = 0: No restrictions.
i = 1: Every rule in R has the form αAβ → αγβ, where α, β, and γ are in V*, A ∈ V - Σ, and γ ≠ ε,
except possibly for the rule S → ε, which may occur in R, in which case S does not occur on the
right-hand side of any rule.
i = 2: Every rule in R has the form A → α, where A ∈ V - Σ and α ∈ V*.
i = 3: Every rule in R has the form
a) (right-linear) either A → aB or A → a, or
b) (left-linear) either A → Ba or A → a,
where A, B ∈ V - Σ and a ∈ Σ or a = ε.
Type 0 grammars are often called phrase structure grammars. Type 1 grammars are called
context-sensitive grammars. Type 2 grammars are called context-free grammars. Type 3 grammars
are called regular grammars.
A language is said to be of type i if it is generated by a type i grammar. It is obvious from the
definition that every type 3 language is also type 2 and every type 1 language is also type 0. It is also
trivial that they are all type 0 at the same time. This means that
type 3 languages ⊆ type 2 languages ⊆ type 1 languages ⊆ type 0 languages.
Example (regular grammar) G3 = (V = {A, B, C, a, b}, Σ = { a, b }, R = { A → aA | bB, B → aB | aC,
C → b }, A) generates the language { a^m b a^n b : m and n are integers, m ≥ 0, n ≥ 1 }.
Example (context-free grammar) G2 = ( V = { S, a, b }, Σ = { a, b }, R = { S → aSb, S → ε }, S )
generates the language { a^n b^n : n = 0, 1, … }.
Example (context-sensitive grammar) G1 = ( V = { S, X, Y, a, b, c }, Σ = { a, b, c }, R = { S → abc,
S → aXbc, Xb → bX, Xc → Ybcc, bY → Yb, aY → aaX, aY → aa }, S ) generates the
language { a^n b^n c^n : n = 1, 2, … }.
Example (phrase structure grammar) G0 = ( V = { S, A, B, C, a, b }, Σ = { a, b }, R = { S → BSa |
ASb | C, C → ε, BC → aC, Ba → aB, Bb → bB, AC → bC, Aa → aA, Ab → bA }, S )
generates the language { ww : w ∈ {a, b}* }.
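Derivations can be explored mechanically by breadth-first rewriting. The sketch below is an illustration under assumptions of its own: every grammar symbol is a single character, rules are (left side, right side) pairs of strings with ε written as the empty string, and sentential forms longer than max_form_len are discarded, so the result is only complete for sentences whose derivations stay within that bound. The names sentences, G2_RULES, and G1_RULES are hypothetical.

```python
from collections import deque


def sentences(rules, start, terminals, max_form_len):
    """Collect terminal strings derivable from start by breadth-first rewriting."""
    seen, queue, found = {start}, deque([start]), set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            found.add(form)  # no nonterminal left: the form is a sentence
            continue
        for lhs, rhs in rules:
            pos = form.find(lhs)
            while pos != -1:
                # One derivation step: replace this occurrence of lhs by rhs.
                new = form[:pos] + rhs + form[pos + len(lhs):]
                if len(new) <= max_form_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                pos = form.find(lhs, pos + 1)
    return found


# Rules of the example grammars G2 and G1 (ε is the empty string).
G2_RULES = [("S", "aSb"), ("S", "")]
G1_RULES = [("S", "abc"), ("S", "aXbc"), ("Xb", "bX"), ("Xc", "Ybcc"),
            ("bY", "Yb"), ("aY", "aaX"), ("aY", "aa")]
```

With a bound of 5, G2 yields the sentences ε, ab, and aabb; with a bound of 6, G1 yields abc and aabbcc, in line with the languages stated for the two grammars.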
How to write a context-free grammar by recursive definition?
Example, Language EQUAL,
A string is in EQUAL if it has an equal number of a’s and b’s.
e.g. { ab, ba, aabb, abab, abba, baba, … }
Recursive Definition of EQUAL
(An a-type string has exactly one more a than b’s; a b-type string has exactly one more b than a’s.)
A string is in EQUAL if it is
- an a followed by a b-type string, or
- a b followed by an a-type string
A string is a-type if it is
- an a, or
- an a followed by a string in EQUAL, or
- a b followed by two a-type strings
A string is b-type if it is
- a b, or
- a b followed by a string in EQUAL, or
- an a followed by two b-type strings
Rules for EQUAL
A string is in EQUAL if it is
- an a followed by a b-type string             S → aB
- a b followed by an a-type string             S → bA
A string is a-type if it is
- an a, or                                     A → a
- an a followed by a string in EQUAL, or       A → aS
- a b followed by two a-type strings           A → bAA
A string is b-type if it is
- a b, or                                      B → b
- a b followed by a string in EQUAL, or        B → bS
- an a followed by two b-type strings          B → aBB
Therefore EQUAL = ( V = { S, A, B, a, b }, Σ = { a, b }, R = { S → aB, S → bA, A → a, A →
aS, A → bAA, B → b, B → bS, B → aBB }, S ).
Note! EQUAL is not equal to { a^n b^n : n = 1, 2, … }, but { a^n b^n : n = 1, 2, … } is a subset of
EQUAL.
Example, Write a context-free grammar for list structures. A list structure may be defined as follows:
i) ε is the (null) list structure.
ii) a (an atom) is a list structure.
iii) If l1, l2, …, lk are list structures, k ≥ 1, then (l1, l2, …, lk) is a list structure.
Solution
Rules for list structure
ε is the (null) list structure                          S → ε
a (an atom) is a list structure                         S → a
If l1, l2, …, lk are list structures, k ≥ 1,            L → S | L , S
then (l1, l2, …, lk) is a list structure                S → ( L )
Therefore
list structure = ( V = { S, L, a, “,”, “(”, “)” },
Σ = { a, “,”, “(”, “)” },
R = { S → ε | a | ( L ),
L → S | L , S },
S )
Note: there are several possible answers, for example
<list-structure> → ( <list> ) | <obj>
<list> → <obj> | <list> , <list> | ( <list> )
<obj> → ε | a
or
ListStruc → ( Lists ) | Atom
Lists → ListStruc | ListStruc , Lists
Atom → ε | a
Exercise: Give 8 strings that are list structures.
Parse Trees and Derivations
A parse tree may be viewed as a graphical representation of a derivation that filters out the
choice regarding replacement order. Each interior node of a parse tree is labeled by some
nonterminal A, and the children of the node are labeled, from left to right, by the symbols in the right
side of the production by which this A was replaced in the derivation. The leaves of the parse tree
are labeled by nonterminals or terminals and, read from left to right, they constitute a sentential form,
called the yield or frontier of the tree.
For example, consider the following grammar for arithmetic expressions, with the
nonterminal E representing an expression.
E → E + E | E * E | ( E ) | - E | id        (g4.1)
The string -(id+id) is a sentence of grammar (g4.1) because there is the derivation
E ⇒ - E ⇒ - ( E ) ⇒ - ( E + E ) ⇒ - ( id + E ) ⇒ - ( id + id )
The sequence of parse trees constructed from this derivation is shown below:
[Figure: the sequence of parse trees for the sentential forms E, - E, - ( E ), - ( E + E ), - ( id + E ), and - ( id + id ).]
There are two types of parse tree construction:
Top-Down Construction: construct from the start symbol down to the yield or frontier.
Example,
[Figure: four parse trees, a) to d), built top-down for ( id + id ): a) E with children ( E ); b) the inner E expanded to E + E; c) the left E expanded to id; d) the right E expanded to id.]
Bottom-Up Construction: construct from the yield or frontier up to the start symbol.
Example,
[Figure: four parse trees, a) to d), built bottom-up from the yield ( id + id ) up to the start symbol E.]
There is often a degree of arbitrariness in the choice of replacement made in a derivation. If
the leftmost nonterminal in a sentential form is replaced at each step, such derivations are termed
leftmost, written ⇒lm. Analogously, rightmost derivations, written ⇒rm, are those in which the
rightmost nonterminal is replaced at each step.
For example, from grammar (g4.1)
leftmost                      rightmost
E ⇒lm E + E                   E ⇒rm E * E
  ⇒lm id + E                    ⇒rm E * id
  ⇒lm id + E * E                ⇒rm E + E * id
  ⇒lm id + id * E               ⇒rm E + id * id
  ⇒lm id + id * id              ⇒rm id + id * id
BNF Notation
Backus-Naur Form (BNF) has been used extensively in the formal definition of many
programming languages. It is equivalent in expressive power to context-free grammars. A popular
language described using BNF is ALGOL. For example, the definition of an identifier in BNF is given as
<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>
<letter> ::= a | b | c | … | y | z
<digit> ::= 0 | 1 | … | 8 | 9
Note that the symbol ::= replaces the symbol → in the grammar notation, and | is used to separate
different right-hand sides of productions corresponding to the same left-hand side. The symbols ::=
and | are interpreted as “is defined as” and “or”, respectively.
Summary
From the above, the models used to represent languages can be grouped as follows:
expressions            automata                  grammars
regular expression     finite automata           regular grammar
                       pushdown automata         context-free grammar
                       two-pushdown automata     context-sensitive grammar
                       Turing machine            phrase structure grammar
Models on the same row are equivalent: if one model can represent a language, every model
equivalent to it can always represent that language as well. This is not proved here (see the references).
Languages are divided into groups according to the models that represent them:
- Languages represented by a regular expression, finite automaton, or regular grammar are called type 3
or regular languages.
- Languages represented by a pushdown automaton or context-free grammar are called type 2 or
context-free languages.
- Languages represented by a two-pushdown automaton or context-sensitive grammar are called type 1
or context-sensitive languages.
- Languages represented by a Turing machine or phrase structure grammar are called type 0 or phrase
structure languages.
Note: there are still many languages that cannot be represented by any of the models described above.
The models that represent type 1 languages, two-pushdown automata and context-sensitive grammars,
are of little practical use and are therefore rarely discussed further. The models that represent type 3
and type 2 languages are used in courses on compiler construction, for example to separate tokens and
to check the syntax of a language, respectively. Turing machines are used in the theory of computation,
which is discussed in the next chapter.