COMPUTER GO-MOKU
ABSTRACT
Go-Moku, a simple version of the ancient Oriental game of Go, is a two-person board game with an enormous number of possible moves at every turn. This thesis details the design of a technique to adequately search the game tree for Go-Moku. This technique has been implemented in C and Pascal. The performances of this program against other programs and against human opponents have been documented. The rules of Go-Moku and some basic search strategies for two-person game trees are presented for background purposes.
RÉSUMÉ
The game of Go-Moku is played by two people on the board of the ancient Oriental game of Go. At each turn, a player has an enormous number of possible moves. In this thesis we study a method for searching the game tree. We have coded the algorithm in C and in Pascal, and compared the program with other programs and with human opponents. We also present, in general terms, the rules of Go-Moku and various game tree search strategies.
ACKNOWLEDGEMENTS
I would like to express my gratitude to my supervisor, Dr. Luc Devroye, for his friendship and his guidance throughout my years at McGill. I am grateful to the Natural Sciences and Engineering Research Council of Canada for their generous financial support. Most of all, I would like to thank my husband, Herbert, my daughter, Justine, and my family for their love and support.
TABLE OF CONTENTS
Abstract
Résumé
Acknowledgements
List of Figures
List of Listings
List of Tables
1. Introduction
  1.1. Description of Go-Moku
2. Review of Search Procedures for Two-Person Games
  2.1. The Two-Person Game Tree
  2.2. Static Evaluation Function
  2.3. Minimax Search
  2.4. Alpha-Beta Search
  2.5. Move Ordering
  2.6. The Horizon Effect
3. Review of Go-Moku Programs
  3.1. A Learning Component in a Go-Moku Playing Program
  3.2. Automatic Description and Pattern Recognition
  3.3. A Look-Up Table
  3.4. Pattern Matching
  3.5. Heuristics
4. The Design
  4.1. Chains and States
  4.2. Evaluation Function
  4.3. Data Structures
  4.4. Move Selection
5. Implementation
  5.1. Updating the Data Structures
  5.2. Timing and Tree Pruning
  5.3. Fanout and Depth
6. Performance
  6.1. Performance Against Turbo's Program
  6.2. Performance Against Karlsson's Program
  6.3. Performance Against Human Opponents
7. Conclusions
References
LIST OF FIGURES
Figure 1.1.1 A win follows from an open four
Figure 1.1.2 Example of an open four
Figure 1.1.3 Examples of simultaneous threats
Figure 2.3.1 Minimax procedure
Figure 2.4.1 Alpha-Beta procedure
Figure 3.2.1 A local pattern
Figure 4.1.1 The 20 chains affected by one stone
Figure 4.1.2 The state transition diagram
Figure 4.3.1 The data structures
Figure 4.4.1 Procedures involved in move selection
Figure 6.1.1 A sample game with Turbo
Figure 6.2.1 A sample game with Luff
Figure 6.3.1 A sample game with a human opponent
LIST OF LISTINGS
Listing 2.3.1 Minimax procedure
Listing 2.4.1 Alpha-beta procedure
Listing 4.4.1 Function VALUE
LIST OF TABLES
Table 2.4.1 The effects of the Alpha-Beta algorithm
Table 4.1.1 List of states and its pattern
Table 4.2.1 Scores for each state
Table 4.2.2 Values of each score
Table 4.2.3 Scores for killer moves
Table 5.2.1 Statistics of sample games
1. INTRODUCTION
Computer programs that play games of skill have been a favorite area of research in artificial intelligence since the introduction of electronic computers. The appeals of game playing for artificial intelligence researchers have been described by Feigenbaum and Feldman [1]:

"Effectively, it provides a direct contest between man's wit and machine's wit. On a more serious level, game situations provide problem environments which are relatively highly regular and well defined, but which afford sufficient complexity in solution generation so that intelligence and symbolic reasoning skills play a crucial role. In short, game environments are very useful task environments for studying the nature and structure of complex problem-solving processes."

This thesis describes an evaluation strategy that has been designed and implemented to play the board game Go-Moku. It is a simple but not a trivial game with the intellectual content of checkers. The object of the game is quite simple: the first player able to place five of his pieces in a row wins. The size of the board allows for a huge number of moves at every turn. At the start of the game, there are 361 possible moves. The number of possible moves is reduced by one after each turn. This results in a game tree with an enormous branching factor. Designing a technique to search the game tree effectively within a minimal time span is the objective of this research.

The remainder of this chapter describes the rules and strategies of playing Go-Moku. Chapter two contains general techniques used in game playing programs. A review of previous research with Go-Moku programs is presented in chapter three. Chapter four contains a description of the design and chapter five discusses the implementation. The performance of the implementation and sample games are given in chapter six and finally, the conclusion includes a summary.
1.1. Description of Go-Moku
Go-Moku is a two-person game played on a Go board (a 19 x 19 grid) using Go stones (a supply of indistinguishable stones, black ones for one player and white ones for the other) [2]. Black always starts, and the players move alternately, each placing a stone on any vacant point of intersection (x,y). The winner is the first player to complete a (horizontal, vertical or diagonal) line of exactly five adjacent stones of his colour. Hence, the game is called Go-Moku since, in Japanese, Go means five and Moku means stones [3].

As soon as one player forms an open four, four black (white) stones in a row with the two adjacent points of intersection (at each end of this row) vacant, it is evident that he wins the game on his next move (unless his opponent can do so immediately): the opponent can stop him from forming a five on only one side, and the player can then complete the five on the other. Thus, in Figure 1.1.1, Black wins by playing (12,11). If White stops the five on (9,8), Black completes it on (14,13), and vice versa.

Consequently, a player threatens to win a game whenever he gets an open three, three black (white) stones in a row in such a way that he can form an open four on his next move. Thus, an open three, an open four, and a four all result in a must-block situation for the opponent. In Figure 1.1.2, an open four can be formed by playing (7,5), (11,9), or (8,13) and can be blocked by playing (7,5), (11,9), (8,13), (6,13), or (11,13).

If only one four or one five is threatened, the opponent can always block it. Thus, the player can win only if two threats are made simultaneously. Figure 1.1.3 shows examples of simultaneous threats. Two fours are formed simultaneously if Black plays (7,6). By playing (16,11), two threes which both lead to open fours are formed. A four and an open three are formed when (10,17) is played.
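The winning condition described above can be sketched in a few lines of Python. This is only an illustration, not the thesis's implementation: the dictionary board representation and the names `run_length` and `wins` are assumptions made for the example.

```python
# Hypothetical board representation: dict mapping (x, y) -> 'B' or 'W',
# with coordinates running from 1 to 19 as in the figures.
SIZE = 19
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]  # the four line orientations


def run_length(board, x, y, dx, dy, colour):
    """Count consecutive stones of `colour` starting one step away from (x, y)."""
    n = 0
    x, y = x + dx, y + dy
    while 1 <= x <= SIZE and 1 <= y <= SIZE and board.get((x, y)) == colour:
        n += 1
        x, y = x + dx, y + dy
    return n


def wins(board, x, y):
    """True if the stone at (x, y) lies in a line of exactly five of its colour."""
    colour = board[(x, y)]
    for dx, dy in DIRECTIONS:
        total = (1
                 + run_length(board, x, y, dx, dy, colour)
                 + run_length(board, x, y, -dx, -dy, colour))
        if total == 5:  # exactly five: a longer line does not count
            return True
    return False
```

Only the lines through the most recently played stone need to be checked, since no other line can have changed.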
Figure 1.1.1 A win follows from an open four.
Figure 1.1.2 Example of an open four.
Figure 1.1.3 Examples of simultaneous threats.
2. REVIEW OF SEARCH PROCEDURES FOR TWO-PERSON GAMES
2.1. The Two-Person Game Tree
Each node in a game tree represents a board situation in the game, the root being the current one. A move consists of a single action by one player. The term ply is used to denote the levels in the tree; thus the root level is ply 0. Each node at ply 1 represents a board situation resulting from a move made by player 1. Each response by player 2 yields a new board situation represented by a node at ply 2. Consequently, all nodes at odd ply or even ply are situations directly resulting from a move made by player 1 or player 2 respectively.

If the game tree could be generated fully, then an optimal move could be selected by searching backward from the terminal nodes. However, in almost all board games, this would lead to a tree with a combinatorial explosion of unthinkable numbers. An alternative is to generate a reasonable portion of the tree and to compare terminal situations to yield a basis for move selection. A "reasonable portion of the tree" might be taken to mean all legal moves within a fixed limit of depth, time, or storage, or it might be refined in various ways [4] [5].
2.2. Static Evaluation Function
To effectively compare two terminal situations, all judgements about board position must be converted into a single, overall quality number. The procedure of determining the quality number is called static evaluation. The static evaluation function is usually a polynomial whose terms represent various features of the position, high values being given for features favourable to one player, and low ones for those favouring his opponent. A somewhat complex function is required to obtain reasonably accurate static evaluation. However, there is a limit on the complexity that is feasible since complexity and time are directly proportional [6].

If the static evaluation function were perfect, a game-playing program could select its move at each turn by generating all legal moves, evaluating each of the resulting positions, and choosing the move leading to the best value. Since the evaluation is only an estimate, errors in the function may be compensated for by looking farther ahead. Since the static evaluation has a predictive aspect, the assumption is that there will be less room for a bad prediction if a deeper tree is generated before the evaluation function is applied.
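As a minimal sketch of a polynomial evaluation, the simplest case is a weighted sum of feature counts. The feature names and weights below are invented for illustration and are not taken from the thesis; each feature value is assumed to be the maximizer's count minus the opponent's count.

```python
def static_evaluation(features, weights):
    """Weighted sum of feature values; positive totals favour the maximizer."""
    return sum(weights[name] * value for name, value in features.items())


# Hypothetical weights and feature differences (own count minus opponent's).
weights = {"open_three": 10, "four": 50, "open_four": 1000}
features = {"open_three": 2, "four": 0, "open_four": -1}

score = static_evaluation(features, weights)  # 2*10 + 0*50 - 1*1000 = -980
```

Here the opponent's single open four outweighs the two open threes, so the position is scored as strongly unfavourable.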
2.3. Minimax Search
The player hoping for large positive numbers is called the maximizing player and his opponent is called the minimizing player. If the player to move is the maximizing player, he is looking for a path that leads to a large positive number, and he will assume that his opponent will try to force the play towards the situations with negative static evaluations [7].

A miniature game tree is shown in Figure 2.3.1. The maximizer may hope to reach the situation with the evaluation of 9. But the maximizer must realize that the minimizer would not permit that, since the minimizer can choose a move that yields the score 3. Thus, in general the maximizer must take into account the attitude of the minimizer at the next level down. Similarly, the minimizer chooses a move in accordance with the choices of the maximizer at the next level down. This continues until the limit of the search is reached, and the evaluation function provides a direct basis for selecting among the alternatives. In the example, the minimizer may choose moves which minimize the scores to 4 or 3. Clearly the maximizer will select the move such that the minimizer can do no better than to hold the expected score to 4.

The procedure by which the scoring information passes up the game tree is called the MINIMAX procedure. Winston's version [8] of this procedure is given in Listing 2.3.1.
Figure 2.3.1 Minimax procedure. (A game tree with a maximizing level at the root, a minimizing level below it, and terminal values 4, 7, 3 and 9.)
MINIMAX:
1   Determine if the limit of search has been reached, or if the level is a minimizing level, or if the level is a maximizing level:
    1.1 If the limit of search has been reached, compute the static value of the current position relative to the appropriate player. Report the result.
    1.2 If the level is a minimizing level, use MINIMAX on the children of the current position. Report the minimum of the results.
    1.3 Otherwise, the level is a maximizing level. Use MINIMAX on the children of the current position. Report the maximum of the results.
Listing 2.3.1 Minimax procedure.
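The procedure in Listing 2.3.1 can be sketched in Python as follows. The nested-list tree encoding is an assumption made for the example (a number stands for a position at the search limit together with its static value), and the grouping of the terminal values 4, 7, 3 and 9 is inferred from the discussion of Figure 2.3.1.

```python
def minimax(node, maximizing):
    """Return the backed-up value of `node`.

    A node is either a number (the static value of a position at the
    search limit) or a list of child nodes.
    """
    if not isinstance(node, list):       # step 1.1: limit of search reached
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)  # steps 1.3 / 1.2


# Assumed layout of the miniature tree: two minimizing nodes under the root.
tree = [[4, 7], [3, 9]]
best = minimax(tree, True)  # max(min(4, 7), min(3, 9))
```

The maximizer obtains 4: the tempting 9 is unreachable because the minimizer would answer with the 3 in the same subtree.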
Remarkable reductions in storage requirements may be made by performing evaluations and calculating backed-up values simultaneously with tree generation. This can be accomplished by generating the tree with a depth-first search approach. It is necessary to keep only one position at each level of look-ahead, together with a certain amount of information about the moves from each of these positions [9].
2.4. Alpha-Beta Search
Most interesting two-person games have trees with more than one million terminal positions in an average 4-ply search. To cope with such gigantic combinatorial growth, a refinement of the minimax method known as alpha-beta pruning must be introduced. The underlying concept of the alpha-beta algorithm is that if a player can choose from a number of moves, once he finds one move which serves his purpose he need not examine the remainder of the moves in that group [4] [9].

Consider the situation in Figure 2.4.1 where the first two terminal nodes have already been evaluated. The Minimax procedure guarantees a score of 4 for the minimizer. If the maximizer takes the left branch at the root node then the maximizer could do no worse than 4. Now looking down the right branch of the root node, the evaluation of the first terminal node yields a score of 3 for the minimizer. Clearly, the minimizer can now be guaranteed a score of at most 3 and consequently the maximizer can do no better than a 3 going down the right branch at the root node.

Figure 2.4.1 Alpha-Beta procedure.

Suppose the terminal node with score 3 has a thousand siblings. None of these siblings need be evaluated, since the maximizer is guaranteed a score of 4 along the left branch and does not need to know more about the right branch other than that he can do no better than a 3 there.

The move that leads to the guaranteed score of 4 is said to refute the moves that need not be evaluated.

Listing 2.4.1 shows the recursive version of Minimax with alpha-beta by Pearl [6]. Procedure $V(J; \alpha, \beta)$ receives two parameters, $\alpha < \beta$, and evaluates $V(J)$, the minimax value of node $J$, if the value lies between $\alpha$ and $\beta$. Otherwise, it returns either $\alpha$ (if $V(J) \le \alpha$) or $\beta$ (if $V(J) \ge \beta$). Thus, if $J$ is the root of a game tree, its minimax value will be obtained by $V(J; -\infty, +\infty)$.

The substantial savings achieved by the alpha-beta algorithm make it an almost essential segment in any program that searches two-person game trees. The algorithm always chooses the same move that would be selected by the minimax algorithm, but usually in a fraction of the time. How powerful is the alpha-beta algorithm? Newborn's investigation [10] of the power of the alpha-beta algorithm has produced the following results. Table 2.4.1 shows, for various branching factors (b) and in searches of 2- and 3-ply, the number of terminal nodes the program would examine using alpha-beta when the nodes are randomly ordered and optimally ordered.

One can see that as the branching factor increases, so does the proportion of nodes that can be ignored due to alpha-beta. And as the depth of search increases, the effect of the algorithm is again increased. So the bigger the tree becomes, the greater the savings.
FUNCTION V(J; α, β)
{ constraints: α < β, b ≥ 1 }
BEGIN
  IF J is terminal THEN RETURN V(J) = evaluation(J).
  ELSE
    Let J1, ..., Jb be the successors of J
    IF J is a MAX node
    THEN
      FOR k := 1 to b DO
        α ← max[ α, V(Jk; α, β) ]
        IF α ≥ β, RETURN β.
      RETURN α.
    ELSE
      FOR k := 1 to b DO
        β ← min[ β, V(Jk; α, β) ]
        IF β ≤ α, RETURN α.
      RETURN β.
END
Listing 2.4.1 Alpha-beta procedure.
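A runnable Python sketch of the procedure in Listing 2.4.1 is given below. The nested-list tree encoding and the `visited` list, which records the terminal nodes actually evaluated, are assumptions added for illustration; the tree reuses the terminal values 4, 7, 3 and 9 from the discussion above.

```python
import math


def alphabeta(node, alpha, beta, maximizing, visited):
    """Minimax value of `node` clipped to the window [alpha, beta].

    A node is a number (terminal value) or a list of child nodes.
    Every evaluated terminal value is appended to `visited`.
    """
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        for child in node:
            alpha = max(alpha, alphabeta(child, alpha, beta, False, visited))
            if alpha >= beta:      # cutoff: remaining children are refuted
                return beta
        return alpha
    for child in node:
        beta = min(beta, alphabeta(child, alpha, beta, True, visited))
        if beta <= alpha:          # cutoff: remaining children are refuted
            return alpha
    return beta


tree = [[4, 7], [3, 9]]
visited = []
value = alphabeta(tree, -math.inf, math.inf, True, visited)
```

After the call, `value` is 4 and `visited` is `[4, 7, 3]`: the terminal node 9 is never evaluated, because the 3 already shows that the right branch cannot beat the 4 guaranteed on the left.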
          2-ply search                        3-ply search
 b    total terminal  random   optimal    total terminal  random    optimal
      nodes                               nodes
 2         4            3.67       3            8            6.84      566
 4        16           12.14       7           64           40.11       15
 8        64           38.65      15          512          220.37   44.248
16       256          122.11      31         4096         1214.45      127
Table 2.4.1 The effects of the Alpha-Beta algorithm
If the moves are examined in their optimal order, the tree search results in $2\sqrt{N} - 1$ terminal nodes being examined, where $N$ is the total number of terminal nodes in the tree.

The branching factor is typically 36 for a game tree in chess. Thus the number of terminal nodes is $36^4$ for a 4-ply search. If the tree is optimally ordered, then by using the alpha-beta algorithm, we need examine only about $2 \times 36^2$ terminal nodes to find the best move. That is a saving of over 99% when compared with the simple minimax method.

The most important implication of these results is that the moves at a given position must be ordered in some way to take maximum advantage of the savings that can be achieved with the alpha-beta algorithm.
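The arithmetic behind the chess example can be checked with a short Python sketch. The function `optimal_leaves` is an assumption added here: it uses the standard count for a uniform, perfectly ordered tree, which reduces to the $2\sqrt{N} - 1$ figure quoted above when the depth is even; the odd-depth case is not stated in the text.

```python
def optimal_leaves(b, d):
    """Terminal nodes examined by alpha-beta on a perfectly ordered uniform
    tree with branching factor b and depth d (standard minimal-tree count)."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


total = 36 ** 4                    # terminal nodes in a full 4-ply chess tree
examined = optimal_leaves(36, 4)   # 2 * 36**2 - 1
saving = 1 - examined / total      # fraction of terminal nodes never examined
```

With these numbers `total` is 1,679,616 and `examined` is 2,591, so fewer than 0.2% of the terminal nodes are evaluated. For 2-ply searches the same function reproduces the optimal column of Table 2.4.1 (3, 7, 15, 31).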
2.5. Move Ordering
One technique for move ordering is the killer heuristic. A move that is found to refute other moves is called a 'killer' and is placed on the killer list. When a newly generated node is found to be on the list, it is re-ordered to the front. That is, this move should be considered first at this position so as to increase the chance of refuting other moves.

There are many ways to implement the killer heuristic, none of which is difficult, but they all require the use of extra RAM. The simplest implementation is to store the move which produced the last cutoff, for each level of the tree [11]. This move is looked at first when examining the next group of positions at the same level. If a different move provides a refutation, then this new killer move replaces the original one. Another approach is to keep more than one killer move on the list. Assume that a given program stores four killer moves at each level and keeps count of how often each killer was used as a refutation. The four killer moves can be ordered so that the next time the program reaches this level of look-ahead, it will examine the killer move most frequently used first, followed by the second most frequently used move, and so on.

The logic behind the use of the killer heuristic is that if a move refutes other moves in one position, it will probably refute other moves in a similar position [7].
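The counted killer list described above might be sketched as follows. The class name, its methods, and the policy of discarding the least-used killer when the list is full are all assumptions made for this example; the text only specifies four killers per level ordered by how often each was used as a refutation.

```python
from collections import defaultdict


class KillerTable:
    """Keeps up to `capacity` killer moves per ply, ranked by refutation count."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.counts = defaultdict(dict)  # ply -> {move: times used as refutation}

    def record_cutoff(self, ply, move):
        """Note that `move` produced a cutoff at `ply`."""
        killers = self.counts[ply]
        if move not in killers and len(killers) >= self.capacity:
            # Hypothetical replacement policy: drop the least-used killer.
            del killers[min(killers, key=killers.get)]
        killers[move] = killers.get(move, 0) + 1

    def order(self, ply, moves):
        """Killers first (most frequently used first), then the other moves
        in their original order (Python's sort is stable)."""
        killers = self.counts[ply]
        return sorted(moves, key=lambda m: -killers.get(m, 0))
```

A search routine would call `record_cutoff` whenever a move causes an alpha-beta cutoff and `order` before expanding the moves at a node.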
When a program has finished its search of the game tree, and has selected its move, it has in its memory the principal continuation, the path through the tree which it considers to represent the best play by both sides. The first move on the principal continuation represents the program's selected (best) move. This is followed by the move which it expects its opponent to make in reply, then the move which it thinks is the most likely reply to its opponent's expected move, and so on.

We can now take advantage of this information and use it to order the moves. When the program next begins to compute a move, it simply uses the 3rd ply move from the current search as the first move to be examined. (The term ply is used to denote a single move by one player.) The 4th ply move and the 5th ply move can be considered as 'killers' at ply-2 and ply-3 respectively in the next search. Similarly, all moves on the path may serve as 'killers' in the next search. This method, which requires very little computation time and no more memory than what is needed for the killer heuristic, increases the possibility of refuting other moves [4].
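The reuse of the principal continuation amounts to shifting it by two plies. The sketch below illustrates this under the assumption that the opponent makes the expected reply; the function name and the sample moves are hypothetical.

```python
def killers_from_principal_continuation(pv):
    """Map each ply of the next search to a suggested first move.

    pv[0] is the program's chosen move and pv[1] the expected reply; both
    will already have been played when the next search starts, so pv[2]
    (the 3rd ply move) becomes the suggestion at ply 1, pv[3] at ply 2, etc.
    """
    return {ply: move for ply, move in enumerate(pv[2:], start=1)}


# Hypothetical principal continuation from the current search.
pv = [(10, 10), (10, 11), (9, 9), (8, 8), (11, 11)]
hints = killers_from_principal_continuation(pv)
```

Here `hints` maps ply 1 to (9, 9), ply 2 to (8, 8) and ply 3 to (11, 11). If the opponent plays something other than the expected reply, the hints may no longer be relevant and can simply be discarded.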
2.6. The Horizon Effect
Searching to an arbitrarily limited depth creates a phenomenon that Berliner [12] [13] has called the horizon effect. Berliner observed that whenever search is terminated (short of the end of the game) and a static evaluation function is applied, the program's "reality exists in terms of the output of the static evaluation function, and anything that is not detectable at evaluation time does not exist as far as the program is concerned" [12]. The first of two kinds of errors that may result is called the negative horizon effect: the program may conclude that it has avoided some undesirable effect when in fact the undesirable situation has only been delayed to a point beyond the horizon. A second kind of error, the positive horizon effect, involves reaching for a desirable consequence: the program wrongly concludes that the consequence is achievable or that the same consequence may be achieved later in the game in a more effective form.
3. REVIEW OF GO-MOKU PROGRAMS
3.1. A Learning Component in a Go-Moku Playing Program
Elcock and Murray [14] have written a program which learns to play the board game Go-Moku using a learning mechanism called backtrack analysis. The object of their research was to isolate and investigate particular aspects of this learning process which might be valid over a range of ill-structured problems.

The learning program (LP) stores a current list of subgoals which governs move selection in a game. These subgoals are descriptions of moves from which a win is expected to be inevitable. The subgoals are ordered according to the number of moves away from a win. This allows for the most economical of a number of alternative winning moves to be selected. It also makes the choice between attacking and defensive moves straightforward; if black is to play and white has a higher level subgoal than black, then black must play (defensively) on the intersection point which gives white this higher subgoal. Each move on the list of possible moves is processed to find its description. Scanning the list of subgoals by level, a comparison is made between the description of the subgoal and the described possible moves. The move which achieves the highest level subgoal is selected. Finally, a default move, selected using a random number generator, is played if no move achieves a subgoal.

The current list of subgoals is generated by a backtrack analysis component (BAC). A game between the LP and an opponent is played to a win. The BAC is only called if the LP has lost the game. Since the moves were selected on the basis of the LP's current list of subgoals, it follows that if the LP won the game, the LP selected the best moves and therefore the list of subgoals needs no alterations. When the BAC is called, it examines each of the played moves in reverse order, i.e. winning move at head of list. It attempts to find the critical board situation from which it sees the win as inevitable. The subgoals are then modified to prevent a recurrence of this board situation.

The LP reached a stage at which it can learn nothing more from backtrack analysis of lost games. The opponent wins by a move which achieved a described, but inadequately described, subgoal on the list of subgoals: the backtrack analysis cannot even recognise that there is a critical board situation, let alone resolve it.
3.2. Automatic Description and Pattern Recognition
In their follow-up article [15], Elcock and Murray dealt extensively with automatic description and recognition of board patterns in Go-Moku. They designed a method to automatically generate, from a description of subgoals, a segment of control instructions which directed the processing of the board required to recognize realizations of the described subgoal. It should be noted that their description was not a representation, in the sense that the particular pattern could not be reconstructed. The description only implied that a local pattern has satisfied certain constraints.

Figure 3.2.1 is an example of a local pattern. The description (in words) of the essential content of this kind of board situation is:

there exists a node (7,10) which is a constituent of two possible 5-patterns on each of two lines through the node: on one of the lines through the node the patterns have two pieces played; on the other line the patterns have one piece played and a common implicit node (10,10) of these patterns is itself a constituent of two possible 5-patterns, with two pieces played, on some other line through it.

The description makes explicit reference to two nodes. Nodes such as (7,12) and (10,13) in the realization of Figure 3.2.1 are implied by the description.

It is the existence of a continuation which leads inevitably to the creation of a 5-pattern by a player that the description is designed to capture. Descriptions with this property are what make up the list of subgoals that guides the program's play.

Figure 3.2.1 A local pattern.
3.3. A Look-Up Table
Allwork's program [16], written in extended BASIC for a NOVA 2, relies on a look-up table and some exception conditions to determine the priority of a move. The table consists of 81 patterns and a value associated with each pattern. Each pattern has a length of 5, where the first point is vacant and the rest are vacant, black or white.
The last move by black (B1) and the last move by white (W1) are scrutinized. For each vacant point (maximum of 8) that is adjacent to B1, a pattern is formed by the vacant point and the next four points in a straight line in any of the eight directions. Similarly, eight patterns are formed for each vacant point that is adjacent to W1. The score for the vacant point is calculated by looking up the patterns in the table and assigning the value of the pattern to the vacant point if the new score is greater than the one previously calculated. The score may be adjusted if certain exception conditions have been satisfied. Finally, the entire board is scanned for the vacant point with the highest score and the move is made at that point.
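The table size of 81 follows from the pattern shape: the first point is fixed as vacant and each of the remaining four points takes one of three values, giving 3^4 = 81 patterns. The sketch below shows one way such a pattern could be turned into a table index. The encoding (0 = vacant, 1 = black, 2 = white) and the name pattern_index are assumptions made for illustration; they are not taken from Allwork's program.

```python
VACANT, BLACK, WHITE = 0, 1, 2  # assumed encoding, not from the original program


def pattern_index(points):
    """Map a 5-point pattern whose first point is vacant to an index in 0..80."""
    if len(points) != 5 or points[0] != VACANT:
        raise ValueError("pattern must have 5 points and start with a vacant point")
    index = 0
    for p in points[1:]:       # the four points after the vacant one
        index = index * 3 + p  # read them as the digits of a base-3 number
    return index


# Enumerate every admissible pattern and collect the indices that occur.
all_indices = {
    pattern_index([VACANT, a, b, c, d])
    for a in range(3) for b in range(3) for c in range(3) for d in range(3)
}
```

Because the mapping is one-to-one, a flat array of 81 values is enough to hold the whole table.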
3.4. Pattern Matching
A more sophisticated program called LUFF, which plays on the VAX, was written by Karlsson [17] in 1982. Similar to the table look-up technique, Karlsson matches strings of stones on the board with patterns in his database. This database is in the form of an automaton, which allows him a flexible length for each pattern. Given the pattern, a score is given to each empty point; a second scoring system is also added. This second score is concerned with the threat point, which is similar to recognizing a killer move. A move is chosen based on the combined score. This program plays a strong but conservative game since the strategy of the program is to defend threats.
3.5. Heuristics
A version of Go-Moku implemented on the micro-computer has been developed by Turbo [18]. Although the game it plays is quite weak, since there is no look-ahead and the evaluation is simplistic, this version is worth mentioning since it makes use of incremental updating. Realizing that only a small portion of the board is affected by each move, its evaluation function updates only that portion of the board. This decreases the calculation time considerably.
Its scoring function is based on counting the number of pieces in a chain. It gives the greatest score to building a five, followed by blocking a five, then creating a four, followed by blocking a four, and so forth. It then searches through the whole board to select the empty point with the highest score.
4. THE DESIGN
Due to the size of the Go-Moku board, the number of possible moves at every turn is enormous. This also leads to a game tree that grows at a combinatorial rate of 361^d, where d is the depth of the tree. The tree at a depth of 4, only two moves by each player, results in over 10 billion terminal nodes. Even with the use of Minimax and optimal Alpha-Beta pruning, it is estimated that over 1/4 million terminal nodes still have to be evaluated. Thus, the objective is to design a technique to selectively search the game tree in such a way as to obtain the desired result in as little time as possible. The design consists of a sophisticated evaluation function, which uses the concepts of states and incremental updating, and a dual data structure which minimizes data access time.
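The arithmetic behind these estimates can be checked directly. The sketch below assumes a uniform tree with 361 moves at every node (it ignores that occupied points reduce the choice) and uses the standard best-case leaf count for alpha-beta on a perfectly ordered uniform tree, b^ceil(d/2) + b^floor(d/2) - 1. That formula is a well-known result and is not stated in this thesis; the function name is hypothetical.

```python
BOARD_POINTS = 19 * 19  # 361 points on the board
DEPTH = 4               # two moves by each player

# Terminal nodes of the full tree under the uniform-branching assumption.
full_tree_leaves = BOARD_POINTS ** DEPTH


def minimal_alpha_beta_leaves(b, d):
    """Leaves examined by alpha-beta with perfect move ordering on a uniform tree."""
    return b ** ((d + 1) // 2) + b ** (d // 2) - 1


best_case_leaves = minimal_alpha_beta_leaves(BOARD_POINTS, DEPTH)
```

Even the best case leaves hundreds of thousands of evaluations per move at depth 4, which is what motivates restricting the fanout rather than relying on pruning alone.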
4.1. Chains and States
Let us define a chain to be 5 adjacent points in a line. For example, a chain of 5 black stones represents a win for black. Each point in a chain may have three possibilities: vacant, black or white. On a 19x19 board, there are 15x19 horizontal, 19x15 vertical and 2x15x15 diagonal chains, giving a total of 1020 possible five-position chains on the board. Each time a stone is placed on the board, at most 20 of the 1020 chains are affected by this move. (See Figure 4.1.1).
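The chain counts can be verified by enumeration. The following is a minimal sketch, assuming 0-based (x, y) coordinates and a chain represented as a tuple of five points; these representation choices and the function names are illustrative only.

```python
SIZE, LENGTH = 19, 5
# Horizontal, vertical and the two diagonal directions.
DIRECTIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]


def all_chains(size=SIZE, length=LENGTH):
    """Return every chain on the board as a tuple of (x, y) points."""
    chains = []
    for x in range(size):
        for y in range(size):
            for dx, dy in DIRECTIONS:
                end_x = x + (length - 1) * dx
                end_y = y + (length - 1) * dy
                # Keep the chain only if its last point is still on the board.
                if 0 <= end_x < size and 0 <= end_y < size:
                    chains.append(tuple((x + i * dx, y + i * dy) for i in range(length)))
    return chains


CHAINS = all_chains()


def chains_through(point):
    """Chains containing the point, i.e. the chains affected by a stone placed there."""
    return [chain for chain in CHAINS if point in chain]
```

A point in the middle of the board lies on 20 chains (5 per direction), while a corner point lies on only 3, which is why 20 is an upper bound.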
Suppose that we only consider chains that contain stones of one colour, a black chain or a white chain. (A mixed chain contains both black and white stones.) A chain is said to be inactive if it is a mixed chain; otherwise the chain is active. Since a win cannot arise from a mixed chain, those chains that were inactive before the move are ignored. Thus, of the 20 chains possibly affected by one stone, only those chains that are active are examined. This is a form of incremental updating, examining and updating only that part of the board that is affected.
Figure 4.1.1 The 20 chains affected by one stone.
Since each point in a black (white) chain may be vacant or black (white), there are (2^5 - 1) = 31 patterns of black (white) chains and 1 pattern of the empty chain. The state of a chain is a number representative of the pattern of the chain. Table 4.1.1 lists each pattern and its associated state number.
Let us look at how the state of a chain can be easily determined. All chains have state 1, the empty chain, at the start of the game when the board is empty. All state 1 chains can become a state 2, 3, 4, 5, or 6 chain. Similarly, a state 4 chain can become a state 8, 11, 14, or 15 chain. To determine the state of a chain and its possible transitions, see Figure 4.1.2. Each time a stone is added to a chain, traverse down one level to the appropriate node; traverse up one level when a stone is removed.
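The state transitions can be sketched with a pair of look-up tables. The numbering used below is an assumption inferred from the states that appear in Figure 4.1.2: patterns are ordered first by number of stones and then lexicographically by the occupied positions (1 to 5). It reproduces every state shown in that figure, but it is not confirmed for the states the figure omits. The names STATE_OF, PATTERN_OF, add_stone and remove_stone are hypothetical.

```python
from itertools import combinations

# Assumed numbering: by number of stones, then lexicographic order of positions.
STATE_OF = {}    # frozenset of occupied positions -> state number
PATTERN_OF = {}  # state number -> frozenset of occupied positions
_next_state = 0
for stones in range(6):
    for occupied in combinations(range(1, 6), stones):
        _next_state += 1
        STATE_OF[frozenset(occupied)] = _next_state
        PATTERN_OF[_next_state] = frozenset(occupied)


def add_stone(state, position):
    """State reached when a stone of the chain's colour is played at position 1..5."""
    occupied = PATTERN_OF[state]
    if position in occupied:
        raise ValueError("position already occupied")
    return STATE_OF[occupied | {position}]


def remove_stone(state, position):
    """State reached when the stone at position 1..5 is taken back."""
    occupied = PATTERN_OF[state]
    if position not in occupied:
        raise ValueError("no stone at this position")
    return STATE_OF[occupied - {position}]
```

With both tables precomputed, moving down or up one level of the transition diagram is a constant-time dictionary look-up, for instance add_stone(4, 1) for a stone added at position 1 of a state 4 chain.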
4.2. Evaluation Function
A score for each player is assigned to each vacant point on an active chain. These scores are based on the state and the colour of the chain. If the chain is black, then an offensive score is given to black and a defensive score is given to white; otherwise, white receives the offensive score while black receives the defensive score. Experimental games with numerous opponents fine-tuned the scores given in Tables 4.2.1 and 4.2.2, which were initially based on the strategy (">" means "has greater value than"):
create five > block possible five > create open four > block possible open four > create crossing threes > block possible crossing threes > block possible four > create four > block three > create three.
Each time a new score is given to a vacant point, the old score for the given point in the same chain is deleted. Each vacant point may be part of at most 20 active chains. The blackvalue of a point is the sum of all scores given by each active chain to black for this point. Similarly, the whitevalue is the sum of all scores for white at this point. Since a point near the edge of the board participates in fewer chains, this cumulative technique of scoring discourages the playing of a stone near the edge of the board.
Table 4.1.1 List of states and their patterns (columns: STATE 1 to 32; POSITION 1 to 5).
1: (+ + + + +)
2: (0 + + + +)   3: (+ 0 + + +)   4: (+ + 0 + +)   5: (+ + + 0 +)   6: (+ + + + 0)
8: (0 + 0 + +)   11: (+ 0 0 + +)   14: (+ + 0 0 +)   15: (+ + 0 + 0)
17: (0 0 0 + +)   23: (+ 0 0 0 +)   24: (+ 0 0 + 0)
27: (0 0 0 0 +)   31: (+ 0 0 0 0)
32: (0 0 0 0 0)
Figure 4.1.2 The partial state transition diagram.
Table 4.2.1 Scores for each state (columns: STATE 1 to 32; POSITION 1 to 5; entries e0 to e10).
Table 4.2.2 Values of each score (columns: SCORE e0 to e10; VALUE, defensive and offensive).
It is easy to count the number of black chains with four stones by keeping track of the number of active black chains with states 27 through 31. To count the number of black chains with three stones, keep count of the number of active black chains with states 17 through 26. Similarly, the number of white chains with four stones and three stones can be counted. Let blackthree and blackfour be the counters for the number of black chains with three stones and four stones respectively. Similarly, define the counters whitethree and whitefour. The function EVALUATION(Move) for a move made by black is defined as:
blackvalue - (whitevalue / 2) + 10 x (blackfour - whitefour + blackthree - whitethree)
For the evaluation of a move by white, interchange black and white in the function definition. Another set of scores, E1, ..., E5, are returned as the evaluation when killer moves are encountered. These scores are given in Table 4.2.3.
SCORE    VALUE
E1       20000
E2       16000
E3        1500
E4      -20000
E5       -8000
Table 4.2.3 Scores for killer moves.
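The EVALUATION function can be written down almost verbatim. The sketch below takes the point values and the four counters as plain arguments; how they are obtained from the board is outside its scope, and the mover parameter is an assumption introduced here to express "interchange black and white".

```python
def evaluation(blackvalue, whitevalue, blackfour, whitefour,
               blackthree, whitethree, mover="black"):
    """EVALUATION(Move) as defined in the text; colours are swapped for a white move."""
    if mover == "white":
        blackvalue, whitevalue = whitevalue, blackvalue
        blackfour, whitefour = whitefour, blackfour
        blackthree, whitethree = whitethree, blackthree
    # Own point value, minus half the opponent's, plus a bonus of 10 per net three or four.
    return (blackvalue - whitevalue / 2
            + 10 * (blackfour - whitefour + blackthree - whitethree))
```

The opponent's value is halved, so the function favours building the mover's own position over denying the opponent an equally valuable point.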
4.3. Data Structures
The choice of a data structure is influenced by the desire to minimize calculations by storing as much as possible but without storing redundant data. Also, data access time may be reduced by the clever use of dual data structures. Figure 4.3.1 shows the data structures used.
The structure with the most data is the BOARD, a two-dimensional (19x19) array of points. Each point contains:
piece - the stone on the point: black, white, or vacant;
whitevalue - the value of the point for player white;
whitepos - a pointer to a location in the WHITEHEAP (defined below);
blackvalue - the value of the point for player black;
blackpos - a pointer to a location in the BLACKHEAP (defined below);
Hstate - the state number for the chain starting at the point and going horizontally to the right (an inactive chain is represented by a negative state number);
Hcolour - the colour of the horizontal chain;
Vstate - the state number for the chain starting at the point and going vertically down;
Vcolour - the colour of the vertical chain;
Rstate - the state number for the chain starting at the point and going diagonally down to the right;
Rcolour - the colour of the right diagonal chain;
Lstate - the state number for the chain starting at the point and going diagonally down to the left;
Lcolour - the colour of the left diagonal chain.
Figure 4.3.1 The data structures (BOARD, WHITEHEAP, BLACKHEAP, MOVELIST and PC).
BLACKHEAP and WHITEHEAP are array implementations of a priority queue. The priority queues contain all possible moves and are ordered such that the move with the highest blackvalue (whitevalue) is at the top of the BLACKHEAP (WHITEHEAP). The priority queues contain only the moves; the values of the moves are stored on the board.
MOVELIST is a two-dimensional (depth x fanout) array of moves. It is used to store the moves generated by the SUCCESSOR routine (described in Section 4.4). PC is a two-dimensional (depth x depth+1) array which stores the moves on the principal continuation.
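As an illustration of the BOARD record, the sketch below mirrors the fields listed above in a small data class. The field names follow the text (in lower case); the integer encodings, the default values and the helper names make_board and is_active are assumptions made for this example, not details of the original C or Pascal code.

```python
from dataclasses import dataclass

VACANT, BLACK, WHITE = 0, 1, 2  # assumed encoding of piece and colour
EMPTY_STATE = 1                 # state 1 is the empty chain


@dataclass
class Point:
    piece: int = VACANT
    whitevalue: int = 0
    whitepos: int = -1   # index into WHITEHEAP; -1 means "not placed yet" (assumption)
    blackvalue: int = 0
    blackpos: int = -1   # index into BLACKHEAP
    hstate: int = EMPTY_STATE  # chain starting here, going right
    hcolour: int = VACANT
    vstate: int = EMPTY_STATE  # chain starting here, going down
    vcolour: int = VACANT
    rstate: int = EMPTY_STATE  # chain starting here, going diagonally down to the right
    rcolour: int = VACANT
    lstate: int = EMPTY_STATE  # chain starting here, going diagonally down to the left
    lcolour: int = VACANT


def make_board(size=19):
    """Create a size x size board of independent Point records."""
    return [[Point() for _ in range(size)] for _ in range(size)]


def is_active(state):
    """An inactive (mixed) chain is represented by a negative state number."""
    return state > 0
```

Storing the heap positions on the board and only moves in the heaps is what makes the structure "dual": a point can be located in either heap without searching for it.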
4.4. Move Selection
Figure 4.4.1 shows the procedures involved in the selection of a move. When it is the program's turn to play, it calls the routine FINDMOVE to select a move. In the opening game, FINDMOVE returns a move that is randomly selected such that 6 <= x, y <= 14, that is, a move more than 5 positions from any edge of the board. After 4 moves have been played, FINDMOVE randomly selects one of the top three moves from the heap. If the number of moves exceeds 8 then a move is selected by searching the game tree using the recursive function VALUE.
VALUE performs a minimax search with alpha-beta pruning on the game tree and returns the minimax value and the principal continuation. VALUE uses the routine SUCCESSOR to generate the successors of a node. The routines MAKEMOVE and UNDOMOVE are used by function VALUE to traverse down the tree and to traverse back up the tree respectively.
Figure 4.4.1 Procedures involved in move selection.
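The three phases of FINDMOVE can be sketched as a simple dispatch on the number of moves played. The exact boundaries (fewer than 4 moves, up to 8 moves, more than 8 moves) are one reading of the text, and the parameters occupied, top_moves and search are hypothetical stand-ins for the board, the heap and the call to VALUE.

```python
import random


def find_move(moves_played, occupied, top_moves, search, rng=random):
    """Pick a move according to the three phases described in the text."""
    if moves_played < 4:
        # Opening: a random vacant point with 6 <= x, y <= 14.
        while True:
            move = (rng.randint(6, 14), rng.randint(6, 14))
            if move not in occupied:
                return move
    if moves_played <= 8:
        # Early game: one of the three best moves according to the heap.
        return rng.choice(top_moves[:3])
    # Afterwards: a full game-tree search.
    return search()
```

The random element in the first two phases gives some variety to the opening without requiring any search.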
The routine SUCCESSOR not only generates the successors of a node but also plays a role in determining the fanout. SUCCESSOR first determines whether the prediction of the opponent's response is correct. If so, the first move generated at each ply is obtained from the principal continuation (PC). Given that it is black's turn to move, SUCCESSOR considers only the top 4 moves (3 moves if one has been obtained from the PC) in black's heap. If the difference in the value of the 4th move and the 5th move is less than a certain threshold, then the 5th move is added to the game tree. Similarly, the 6th (7th) move is included if the difference between the 4th move and the 6th (7th) move is below the threshold. Thus, the fanout of the game tree varies from 4 to 7.
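The variable fanout can be sketched as follows. The function assumes the moves arrive already ranked best-first, as the heap would deliver them, and it leaves out the principal-continuation move for brevity. The threshold value is not given in the text, so it is a parameter here, and all names are hypothetical.

```python
def select_successors(ranked_moves, value_of, threshold, base=4, limit=7):
    """Return the top `base` moves plus up to `limit - base` close runners-up."""
    chosen = list(ranked_moves[:base])
    if len(chosen) < base:
        return chosen  # fewer candidates than the basic fanout
    fourth_value = value_of(chosen[-1])
    for move in ranked_moves[base:limit]:
        # Keep the 5th, 6th and 7th moves only while they stay close to the 4th.
        if fourth_value - value_of(move) < threshold:
            chosen.append(move)
        else:
            break  # moves are ranked, so later ones are no closer
    return chosen
```

Because the moves are ranked, stopping at the first move that falls outside the threshold gives the same result as testing the 5th, 6th and 7th moves independently.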
The routine MAKEMOVE updates all data structures and examines the chains affected by a given move. Function FINDSTATE is used to determine the new state of a chain. With this state, MAKEMOVE can determine if a move is a killer move. When a killer move creates a five, the boolean variable GameWon is set to true. If a move blocks a possible five, the boolean variable MustBlock is given the value true. OpenFour is set to true if a move creates an open four, and Force is set to true if a move forces the opponent to block on his next turn. Function SCORE is then used to update the scores of all vacant points in the affected chains. When traversing back up the tree, routines UNDOMOVE, UNDOSTATE and SCORE restore the data structures, the states and the scores.
The first time, the recursive function VALUE (see Listing 4.4.1) is called with the parameters J = root, α = -∞, and β = +∞. The successors of the root node are generated and VALUE is called with each of the b successors. To traverse down the tree and to update all data structures, MAKEMOVE is called. Note that since the moves are generated from the heap, they are ordered in the following manner:
1 moves that immediately win the game;
2 moves that block the opponent from winning immediately;
3 moves that create an open four; and
4 moves that force the opponent to block a win.
If the move results in a game winning situation, there is no need to search deeper in the tree and it is not necessary to search the siblings of this node either, since this is the best move. The score E1 is returned if the move is made by the maximizing player; otherwise (E4 + ply) is returned. The motive for adding ply to E4 is that if two moves result in a loss to the program, the move that occurs later (i.e. ply is greater) is better, that is, it postpones the loss.
If the move blocks a game winning situation at the first ply, then the program must make this move, so there is no need to search deeper nor to search its siblings. If the move creates an open four, again there is no need to search deeper or to search its siblings. Thus, the killer moves further help reduce the number of nodes that need to be searched. Ply is subtracted from E3 so that if two moves result in a win for the program, the move that occurs sooner (i.e. ply is smaller) is better.
To minimize the horizon effect, the function allows for a varying search depth from a minimum of 6 to a maximum of 13. When the minimum search depth has been reached, a decision to continue searching deeper is made on the basis of whether the move was a MustBlock or a Force. These killer moves at the search limit result in a deeper search. If the search is discontinued, EVALUATION(J) is returned.
Whenever a must-block move is encountered, its siblings need not be examined since any other move would result in a loss. A continuation to search deeper in the tree is required if this does not occur at ply one.
Each time MAKEMOVE(J) is called, the move is put on the diagonal of the PC. When a good move is found, that is when α or β is updated, push that move and its continuation, the moves on that row, up one row. When the search is finished, the principal continuation is on the top row of PC.
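The bookkeeping of the PC array can be sketched as a small triangular table. This is one interpretation of the description: row r holds the best line found so far below the node at ply r, the move made at ply p sits in column p, and row 0 belongs to the root. The class uses (depth+1) rows for simplicity, which differs slightly from the (depth x depth+1) array in the text, and the method names are hypothetical.

```python
class PrincipalContinuation:
    def __init__(self, depth):
        # Column 0 is unused so that the move at ply p is stored in column p.
        self.rows = [[None] * (depth + 1) for _ in range(depth + 1)]

    def on_make_move(self, ply, move):
        """Put the move on the diagonal and forget any stale continuation."""
        row = self.rows[ply]
        row[ply] = move
        for column in range(ply + 1, len(row)):
            row[column] = None

    def on_good_move(self, ply):
        """The move at `ply` improved alpha or beta: push its row up one row."""
        self.rows[ply - 1][ply:] = self.rows[ply][ply:]

    def line(self):
        """The principal continuation, read from the top row."""
        return [move for move in self.rows[0][1:] if move is not None]
```

For example, if move A at ply 1 is answered best by B at ply 2, and a later root move D turns out not to improve on A, the top row still reads A, B when the search ends.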
function VALUE(J; α, β):
  if (J is root) then begin
    generate successors of J: J1, ..., Jb
    while (there are successors) do
      temp <- VALUE(Jk; α, β)
      if (temp > α) then begin
        α <- temp
        update principal continuation
      end
      if (α >= β) then return (β)
    endwhile
    return (α)
  end
  else begin
    call MAKEMOVE(J)
    if (GameWon) then begin
      label J as terminal
      delete the remaining successors for the parent of J
      call UNDOMOVE(J)
      return (E1)
    end
    else if (MustBlock at ply 1) then begin
      label J as terminal
      delete the remaining successors for the parent of J
      call UNDOMOVE(J)
      return (E2)
    end
    else if (OpenFour) then begin
      label J as terminal
      delete the remaining successors for the parent of J
      call UNDOMOVE(J)
      if (J is MIN node) then
        return (E3 - ply)
      else return (E5)
    end
    else if ((ply = maximum search depth) or
             ((ply >= minimum search depth) and (J is a MIN node) and
              (not MustBlock) and (not Force))) then begin
      E <- EVALUATION(J)
      call UNDOMOVE(J)
      return (E)
    end
    else begin
      if (MustBlock and ply > 1) then
        delete the remaining successors for the parent of J
      generate successors of J: J1, ..., Jb
      if (J is MAX node) then begin
        while (there are successors) do
          temp <- VALUE(Jk; α, β)
          if (temp > α) then begin
            α <- temp
            update principal continuation
          end
          if (α >= β) then begin
            call UNDOMOVE(J)
            return (β)
          end
        endwhile
        call UNDOMOVE(J)
        return (α)
      end
      else begin
        while (there are successors) do
          temp <- VALUE(Jk; α, β)
          if (temp < β) then begin
            β <- temp
            update principal continuation
          end
          if (β <= α) then begin
            call UNDOMOVE(J)
            return (α)
          end
        endwhile
        call UNDOMOVE(J)
        return (β)
      end
    end
  end
Listing 4.4.1 Function VALUE.
5. IMPLEMENTATION
5.1. Updating the Data Structures
Making a move affects only 20 of the 1020 possible chains. Of the 32 points that belong to one of the 20 chains affected by the move, only those that are vacant have their blackvalue and whitevalue updated. As the game progresses, there are far fewer than 32 vacant points in the 20 chains. When all 20 chains have been processed, the values of the affected points are updated. Each update requires a heap update.
If the update decreases the value of the point, then the procedure PUSHDOWN is applied on that point in the heap. (Procedure PUSHDOWN is given by Aho, Hopcroft and Ullman [19].) The point is pushed down in the heap by swapping it with the larger of its two children. This continues until this point is larger than both its children.
Procedure SIFTUP is used when the update has increased the value of the point. (Procedure SIFTUP is given by Standish [20].) SIFTUP allows the point to move up the heap by swapping the point with its parent if the value of the parent is less. A maximum of 32 updates are required for each heap. However, as the game progresses the true number of updates required is much less. The value of each point is not stored in the heap but rather on the board so as to avoid moving the scores around when the heap is updated.
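The interplay between the heap and the board can be sketched with an indexed max-heap. In this minimal example a dictionary of values stands in for the board, the heap array holds moves only, and a second dictionary plays the role of the blackpos or whitepos pointers. The class and method names are hypothetical; the procedures follow the textbook sift-up and push-down operations rather than the thesis's own code.

```python
class MoveHeap:
    """Max-heap of moves whose values are stored outside the heap."""

    def __init__(self, values):
        self.values = values      # move -> value (stands in for the board)
        self.heap = list(values)  # moves only
        self.pos = {move: i for i, move in enumerate(self.heap)}
        for i in reversed(range(len(self.heap) // 2)):
            self.push_down(i)     # build the heap bottom-up

    def _swap(self, i, j):
        heap = self.heap
        heap[i], heap[j] = heap[j], heap[i]
        self.pos[heap[i]] = i
        self.pos[heap[j]] = j

    def sift_up(self, i):
        """Move the entry up while its parent has a smaller value."""
        while i > 0:
            parent = (i - 1) // 2
            if self.values[self.heap[parent]] >= self.values[self.heap[i]]:
                break
            self._swap(i, parent)
            i = parent

    def push_down(self, i):
        """Move the entry down, swapping with the larger child each time."""
        n = len(self.heap)
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.values[self.heap[child]] > self.values[self.heap[largest]]:
                    largest = child
            if largest == i:
                break
            self._swap(i, largest)
            i = largest

    def update(self, move, new_value):
        """Change a move's value and restore the heap order around it."""
        old_value = self.values[move]
        self.values[move] = new_value
        if new_value > old_value:
            self.sift_up(self.pos[move])
        elif new_value < old_value:
            self.push_down(self.pos[move])

    def top(self):
        return self.heap[0]
```

Since each entry knows its heap position, an update costs O(log n) swaps of moves and no values are ever copied between heap slots.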
5.2. Timing and Tree Pruning
The implementation has been installed on a PC-AT, a SUN and a VAX. An average move requires 20 to 40 seconds on the VAX and the SUN (a little longer on the PC). C and Pascal were chosen for the implementation so that the program may be easily ported to any machine. An increase in speed may be obtained if the design is implemented directly in assembler.
On average, 20% of the time the program can make use of the principal continuation. For example, in an average game of 60 moves, 30 by each player, the program can correctly predict 6 of the opponent's moves. This success may seem low, but in many instances a win may be accomplished by placing a stone at either end of an open four, which implies two different moves will yield the same result. Similarly, blocking an open three may be accomplished by more than one move; thus, it is difficult to predict exactly with which move the opponent will respond.
Table 5.2.1 shows some statistics of sample games that the program played against itself. Only about one third of the nodes examined are terminal nodes. The number of nodes examined in a game is not directly proportional to the number of moves played. In fact, fewer nodes need to be examined as the game progresses since more killer moves are encountered.
number      total number of     total number of     average time (sec.)
of moves    nodes examined      terminal nodes      per move
18          1018                580                 39
22          1426                486                 29
25          1462                490                 27
27          2080                748                 32
35          1755                580                 27
42          1716                577                 26
58          1241                379                 20
59          1609                522                 24
71          1549                526                 19
77          1922                594                 26
Table 5.2.1 Statistics of sample games.
5.3. Fanout and Depth
The fanout and the depth of the search tree were determined via experimentation. Different versions of the program played hundreds of games with each other. The program using a fanout of 3 to 6 won only 43% of the games against the program with a fanout of 4 to 7. As the fanout increased, the margin of wins and losses narrowed. The program with a fanout of 4 to 7 won 48% of the games against one with a fanout of 5 to 8. Although the program with the greater fanout was more successful, it required considerably more search time.
It seems logical to think that if one searches deeper in the tree, one would obtain better results. The program that searches to a minimum depth of 6 won 64% of the games against programs that search to a minimum depth of 4. However, the program that searches to a minimum depth of 8 won only 47% of the games against the program searching to a depth of 6. This surprising result might be explained by the fact that the programs extend their search beyond the minimum depth whenever a critical situation arises. Consequently, both programs will search to the same depth in a critical situation.
Another possible reason why searching deeper does not improve the quality of the game is that the game is very volatile. One move can greatly affect the strength of a player. For example, in one move a player can block his opponent's four and create for himself a crossing four and open three.
6. PERFORMANCE
6.1. Performance Against Turbo's Program
There is little competition from Turbo's program since no look-ahead is used by Turbo. This lack of look-ahead misled Turbo into making moves that superficially looked good. For example, it always makes a four if possible, without realizing that the four can be easily blocked on the next move. As well, this prevents the program from building a strong base of many twos and threes. Also, its evaluation function is very simple and does not make enough discriminations among the possible moves. Turbo lost all of its 50 games against our program.
Figure 6.1.1 gives a sample game against Turbo (Turbo plays White). Our program won by creating two crossing open threes at (9,11).
6.2. Performance Against Karlsson's Program
Karlsson's program, LUFF, is a much more challenging competitor even though it, too, has no look-ahead. Karlsson's game tends to be more conservative and thus even twos are blocked early in the game. Of approximately two thousand games played between Karlsson's program and ours, our program won 84%. Most losses experienced by our program were due to an increasing number of threats. (Many games resulted in over 120 moves.) Since only the top 4 to 7 moves are considered, the number of immediate threats becomes so great that some of them are never considered. Thus, the program is blind to these threats.
Figure 6.2.1 shows a sample game with LUFF (LUFF plays White). Our program won by creating a four crossing an open three at (12,7).
Figure 6.1.1 A sample game with Turbo (White).
Figure 6.2.1 A sample game with LUFF (White).
6.3. Performance Against Human Opponents
Our program has not succeeded as well against humans. Naturally, the Go-Moku skill of each human is different. The program has won very few games against very good players.
Figure 6.3.1 shows a game lost to a human opponent (White). At the start of the game, our program allowed the human to play two stones (move 3 and move 5) far away from the initial action. These stones gave a local edge to the human later on. Move 25 created an open three for the human. Instead of blocking this open three at (14,9), the program postponed the blocking move by creating a four at (10,9); it then blocked the open three and a possible open three by playing (1...). This error in judgement allowed the human to take position (14,9). Once the human played (14,7), two crossing threes were formed and the game was lost.
Figure 6.3.1 A sample game with a human opponent (White).
7. CONCLUSIONS
By scoring each possible move throughout the game, we have been able to adequately order the moves so that only very few (4 to 7) of the 300 or so possible moves need to be examined with the minimax procedure with alpha-beta pruning. We were able to further prune the game tree by recognizing killer moves that refute other moves. The scoring function and the recognition of killer moves were facilitated by the concept of states. The state of a chain determined the score that is to be assigned to a given move on that chain.
The use of the heap to order the moves was an attempt to minimize the time required to find the top few moves. Updating only those vacant points in the affected active chains prevented redundant work.
One way of speeding up this implementation would be to translate the program into assembly language. The decision to program in C and Pascal was to make this implementation portable.
A possible feature that may be introduced is to restrict the program from exceeding a given time limit during the search. This can be accomplished by making the width and the depth of the search time dependent. That is, each time the search backs up to the root, check if a time threshold has been exceeded; if so, discontinue the search. This method will result in narrowing the fanout, but hopefully the scores have accurately ordered the best moves to the front. However, limiting the search depth, so as to not exceed a time limit, greatly increases the chance of poor results due to the horizon effect.
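The proposed time check at the root could look roughly like the sketch below. It is not part of the implemented program: the function and parameter names are hypothetical, evaluate_subtree stands in for a call to VALUE on one root successor, and the clock is injectable so that the behaviour can be exercised without real delays.

```python
import time


def search_root(successors, evaluate_subtree, time_limit, clock=time.monotonic):
    """Search root successors best-first and stop once the time limit has passed."""
    start = clock()
    best_move, best_score = None, float("-inf")
    for move in successors:  # assumed to be ordered best-first by the heap
        score = evaluate_subtree(move)
        if score > best_score:
            best_move, best_score = move, score
        # The search is back at the root here, so this is where the clock is checked.
        if clock() - start > time_limit:
            break
    return best_move, best_score
```

At least one successor is always searched, so a move is returned even when the limit is very small; the cost is that the remaining, lower-ranked successors are never examined.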
The technique that has been designed and implemented surely limits the fanout of the Go-Moku game tree. However, this limitation of the fanout is costly; on occasions when too many threats occur simultaneously (e.g. in long games), the program is blind to some of them.
The cumulative scoring of a vacant point provides good discrimination. A point that plays an important role in more than one chain, such as creating crossing threes, will receive a good score from each of the chains. By summing these scores, the value of a point is well reflected. Further fine-tuning of the scores to improve discrimination is indeed difficult since it can be accomplished only by having the program play an enormous number of games with many different opponents.
During the development of this design we attempted to break down our evaluation strategy. We viewed the program as being made up of many players, each with a different point of view. The librarian finds situations in the library that have winning patterns. The virus killer checks the opponent's local strength, while the virus maker checks the program's local strength. The machoman plays near the opponent's last move. The bodybuilder seeks to maximize basic strength in terms of creating twos. The optimist suggests good moves for the program based on the overall score while the pessimist finds the opponent's harmful moves. The enforcer investigates moves that force other moves. The random guy offers a random move. A move among the many suggested moves is then chosen by the strategist with the aid of the statistician, who keeps track of how often a suggested move by a given player is chosen.
At first we believed that by separating the strategies we could discover which strategy was more effective and thus we could improve the performance of the program. Secondly, we hoped that we could change strategies during different stages of the game, as we believe humans do. However, the greatest obstacle we encountered with this approach was how to coordinate the different strategies. How should the strategist decipher the different suggestions?
Further study is required to solve this problem of the coordination of many strategies. Also, a method for changing strategies during different stages of the game should improve performance.
REFERENCES
1. Feigenbaum, E.A. and Feldman, J., Computers and Thought, McGraw-Hill (1963).
2. Levy, D., "Go-Moku and Renju," pp. 212-224 in Computer Gamesmanship, Century Publishing, London (1983).
3. Lasker, E., "The Game of Go-Moku," pp. 205-212 in GO and GO-MOKU the Oriental Board Games, Dover Publications (1960).
4. Levy, D., "Two-Person Games," pp. 30-76 in Computer Gamesmanship, Century Publishing, London (1983).
5. Charniak, E. and McDermott, D., "Game Trees," pp. 281-293 in Introduction to Artificial Intelligence, Addison-Wesley Publishing (1985).
6. Pearl, J., "Strategies and Models for Game-Playing Programs," pp. 221-251 in Heuristics, Addison-Wesley Publishing (1984).
7. Nilsson, N.J., "Searching Game Trees," pp. 112-130 in Principles of Artificial Intelligence, Tioga Publishing Co. (1980).
8. Winston, P.H., "Exploring Alternatives," pp. 87-135 in Artificial Intelligence, Addison-Wesley Publishing (1984).
9. Barr, A. and Feigenbaum, E.A. (eds.), "Search," pp. 46-108 in The Handbook of Artificial Intelligence, William Kaufmann Inc. (1981).
10. Newborn, M.M., "The Efficiency of the Alpha-Beta Search on Trees with Branch-dependent Terminal Node Scores," Artificial Intelligence 8 pp. 137-153 (1977).
11. Newborn, M.M., Computer Chess, Academic Press (1975).
12. Berliner, Hans J., "Chess as Problem Solving: The Development of a Tactics Analyzer," Dept. of Computer Science, Carnegie-Mellon University (1974).
13. Berliner, Hans J., "A Chronology of Computer Chess and Its Literature," Artificial Intelligence 10(2) (1978).
14. Elcock, E.W. and Murray, A.M., "Experiments with a Learning Component in a Go-Moku Playing Program," Machine Intelligence 1 pp. 87-103, Edinburgh University Press (1967).
15. Elcock, E.W. and Murray, A.M., "Automatic Description and Recognition of Board Patterns," Machine Intelligence 2 pp. 75-88, Edinburgh University Press (1968).
16. Allwork, J., "BASIC Game: GOBANG," BYTE, pp. 56-62 (Nov. 1979).
17. Karlsson, "Go-Moku," Technical Report for Royal Institute of Technology, Stockholm, Sweden (1982).
18. Turbo, "Go-Moku Program Design," pp. 35-46 in GameWorks, Borland International (1985).
19. Aho, A.V., Hopcroft, J.E., Ullman, J.D., "Heapsort," pp. 271-274 in Data Structures and Algorithms, Addison-Wesley Publishing (1983).
20. Standish, T.A., "Additional Tree Structures and their Applications," pp. 84-123 in Data Structure Techniques, Addison-Wesley Publishing (1980).
