CONTENTS
1. Introduction
2. The Basic Checker-playing Program
3. Rote Learning and Its Variants
4. Learning Procedure Involving Generalizations
5. Rote Learning vs. Generalization
INTRODUCTION
• General Methods of Approach
• Choice of Problem : ‘Checkers’
Heuristic procedures
A definite goal (final goal)
at least one intermediate goal (criterion)
Definite rules of activity
The learning process can be tested
Familiar & understandable
The Basic Checker-playing Program
• General method from ‘Shannon, 1950’ as applied to chess
1. Alternatives
Which alternative moves are to be considered?
2. Analysis
a. Which continuations are to be explored and to what depth?
b. How are positions to be evaluated in terms of their patterns?
c. How are the evaluations to be integrated into a single value
for an alternative?
3. Final choice procedure
What procedure is to be used to select the final preferred move?
The Basic Checker-playing Program (Cont’d)
[Figure: look-ahead tree with leaf scores and backed-up scores, arranged by ply number. Ply 1: proposed move by machine; ply 2: anticipated reply by opponent; ply 3: proposed move by machine.]
• Exploration to ply level 3
• Evaluation with scoring polynomial
• Selection of alternative by ‘minimax’ procedure
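The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program's actual code: the tree is a hand-built nested list with made-up leaf scores, and `minimax` and `best_move` are hypothetical helper names.

```python
from typing import Union

Tree = Union[int, list]  # a leaf score, or a list of child subtrees

def minimax(node: Tree, machine_to_move: bool) -> int:
    """Back up leaf scores: the machine maximizes, the opponent minimizes."""
    if isinstance(node, int):  # leaf: score supplied by the scoring polynomial
        return node
    scores = [minimax(child, not machine_to_move) for child in node]
    return max(scores) if machine_to_move else min(scores)

def best_move(root: list) -> int:
    """Index of the ply-1 move with the highest backed-up score."""
    backed_up = [minimax(child, machine_to_move=False) for child in root]
    return max(range(len(root)), key=backed_up.__getitem__)

# Hypothetical 3-ply tree: machine move -> opponent reply -> machine move.
tree = [
    [[3, -10], [15, 7]],    # ply-1 move 0
    [[-70, -20], [4, -3]],  # ply-1 move 1
]

root_score = minimax(tree, machine_to_move=True)
choice = best_move(tree)
```

Here move 0 backs up to min(max(3, -10), max(15, 7)) = 3 and move 1 backs up to min(-20, 4) = -20, so the machine would choose move 0.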
The Basic Checker-playing Program (Cont’d)
• Ply Limitations
Look-ahead depth depends on the board conditions
a. Always look ahead a set minimum distance
b. Beyond that distance, the program continues looking ahead when
the next move is a jump,
the last move was a jump, or
an exchange offer is possible.
desired results
The Basic Checker-playing Program (Cont’d)
• Other Modes of Play
Have program play both sides of the game
Follow book games
evaluation of book move and proposed
move by machine (correlation coefficient)
Have program play several simultaneous games
against different opponents
The Basic Checker-playing Program (Cont’d)
• Scoring polynomial
a. Measure of intermediate goals
b. Linear polynomial:
sum of terms multiplied by coefficients
f(x,c) =c1g1(x)+c2g2(x)+…+cjgj(x)
g(x): terms selected from a list of 38 parameters
c: coefficients which multiply these parameters
The Basic Checker-playing Program (Cont’d)
• Scoring polynomial (Cont’d)
c. Each term relates to the relative standings of
the two sides, with respect to the parameter in
question; difference between the ratings for the
individual sides.
d. Dominant parameters:
inability to move, relative piece advantage
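As a rough illustration of the linear scoring polynomial, the sketch below evaluates f(x,c) as a sum of coefficients times terms, with each term computed as a difference between the two sides. The board representation, the `mobility` term, and all numeric values are assumptions made for the example; only the piece-advantage idea comes from the slide.

```python
from typing import Callable, Dict, Sequence

Board = Dict[str, int]  # hypothetical summary of a position: counts per side

def piece_advantage(b: Board) -> float:
    # Each term is the difference between the ratings of the two sides.
    return b["own_pieces"] - b["opp_pieces"]

def mobility(b: Board) -> float:
    # Hypothetical second term, also a difference between the two sides.
    return b["own_moves"] - b["opp_moves"]

def score(board: Board,
          terms: Sequence[Callable[[Board], float]],
          coeffs: Sequence[float]) -> float:
    """f(x,c) = c1*g1(x) + c2*g2(x) + ... + cj*gj(x)."""
    return sum(c * g(board) for c, g in zip(coeffs, terms))

board = {"own_pieces": 8, "opp_pieces": 7, "own_moves": 5, "opp_moves": 9}
terms = [piece_advantage, mobility]
coeffs = [10.0, 1.0]  # made-up coefficients; piece advantage dominates

value = score(board, terms, coeffs)  # 10*(8-7) + 1*(5-9)
```

Giving the piece-advantage term a much larger coefficient is one simple way to let a dominant parameter outweigh the others.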
The Basic Checker-playing Program (Cont’d)
[Figure: look-ahead tree over ply numbers 1 to 3, with a score of +20 backed up to the chosen move.]
• Selection of the best next move depends on the
evaluation process.
• Learning involves improving the evaluation
as a result of ‘experience’.
Rote Learning and Its Variants
• Storage scheme
Simply save all of the board positions
encountered during play, together with
their computed scores.
Reference is made to this memory record
• Improvement
Reduce computing time
Looking much farther in advance
• Sense of direction
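A minimal sketch of the storage scheme, assuming positions can be used as dictionary keys: when a position reached during look-ahead is already in memory, its stored score is used instead of the static evaluation. Because the stored score was itself backed up from an earlier look-ahead, this effectively looks farther ahead at no extra cost. `RoteMemory` and its method names are hypothetical.

```python
from typing import Callable, Dict, Hashable

class RoteMemory:
    """Remembers board positions together with their computed scores."""

    def __init__(self) -> None:
        self._scores: Dict[Hashable, float] = {}

    def store(self, position: Hashable, backed_up_score: float) -> None:
        self._scores[position] = backed_up_score

    def evaluate(self, position: Hashable,
                 static_eval: Callable[[Hashable], float]) -> float:
        # Prefer a remembered score; otherwise compute the static evaluation.
        if position in self._scores:
            return self._scores[position]
        return static_eval(position)

memory = RoteMemory()
memory.store("pos-A", 20.0)  # hypothetical position key and score

def static_eval(position: Hashable) -> float:
    return 0.0  # placeholder for the scoring polynomial

seen = memory.evaluate("pos-A", static_eval)
unseen = memory.evaluate("pos-B", static_eval)
```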
Rote Learning and Its Variants (Cont’d)
[Figure: stored table of board positions and their scores (+20, +15, …); looking up a stored score extends the effective look-ahead to ply level 6 (learning, improvement).]
Rote Learning and Its Variants (Cont’d)
• Cataloging & Culling Stored Information
Problems: a limit on the number of boards that can be saved,
and long search times
a. catalog boards that are saved
Standardizing & Grouping
b. delete redundancies
c. discard board positions
Method based on frequency of use:
Refreshing & Forgetting
Method based on ply:
cull lowest-ply board positions
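The frequency-of-use method can be sketched as an age counter per stored board. The specific rule used here (halve the age on each use, add one on each culling pass, drop boards above a maximum age) is an assumption chosen for illustration, and `AgedMemory` is a hypothetical name.

```python
from typing import Dict, Hashable, Optional

class AgedMemory:
    """Stored boards age over time; use refreshes them, neglect forgets them."""

    def __init__(self, max_age: int) -> None:
        self.max_age = max_age
        self.scores: Dict[Hashable, float] = {}
        self.ages: Dict[Hashable, int] = {}

    def store(self, position: Hashable, score: float) -> None:
        self.scores[position] = score
        self.ages[position] = 0

    def lookup(self, position: Hashable) -> Optional[float]:
        if position not in self.scores:
            return None
        self.ages[position] //= 2  # refreshing: each use makes the board younger
        return self.scores[position]

    def age_and_cull(self) -> None:
        # Forgetting: every board ages by one; boards that are too old are dropped.
        for position in list(self.ages):
            self.ages[position] += 1
            if self.ages[position] > self.max_age:
                del self.ages[position]
                del self.scores[position]

mem = AgedMemory(max_age=2)
mem.store("used", 20.0)
mem.store("unused", 15.0)
for _ in range(3):
    mem.lookup("used")
    mem.age_and_cull()
```

After three passes the board that was never referenced has exceeded the maximum age and is gone, while the frequently used board remains.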
Rote Learning and Its Variants (Cont’d)
• Rote-learning Tests
Conclusions:
a. Effective rote learning needs a sense of direction and
a refined system for cataloging and storing information
b. Efficiency depends on the data handling
capacity of computer
c. More information must be stored to
improve midgame play
d. A game is a suitable vehicle for use during
the development of learning techniques
Learning Procedure Involving Generalizations
• An obvious way to decrease the amount of
storage needed to utilize the past experience
is to generalize on the basis of experience
and to save only the generalizations.
• Generalize on experience after each move
by adjusting the coefficients in the evaluation
polynomial and by replacing terms which
appear to be unimportant by new parameters
drawn from a reserve list.
Learning Procedure Involving Generalizations (Cont’d)
A scoring system: Y = f(X)
X: current board position
Y: an estimate of the backed-up score
[Figure: learning loop in which the backed-up score (+20, ply level 6) is used to evaluate and improve the scoring system.]
Learning Procedure Involving Generalizations (Cont’d)
Backed-up score from ply level 3
[Figure: board positions paired with their backed-up scores (+20, +15, …), which the scoring function is adjusted to match.]
Function (scoring system): f(x,c), a linear polynomial
Learning Procedure Involving Generalizations (Cont’d)
• Scoring Polynomial for generalization:
f(x,c) =c1g1(x)+c2g2(x)+…+cjgj(x)
g(x): terms selected from a list of 38 parameters
c: coefficients which multiply these parameters
• Learning procedure involves, after each move,
adjusting the coefficients
replacing terms which appear to be unimportant
by new parameters
Learning Procedure Involving Generalizations (Cont’d)
• Training
Alpha (with learning)
& Beta program (without learning)
determine relative ability of Alpha
manual intervention
(arbitrary change in scoring polynomial)
Learning Procedure Involving Generalizations (Cont’d)
• Polynomial Modification Procedure
Initial scoring polynomial
f(x,c) =c1g1(x)+c2g2(x)+…+cjgj(x)
At a given board position(xk),
a. compute the value of the scoring polynomial, f(xk,c),
and save this value.
b. compute the backed-up score(yk),
using the look-ahead procedure
Learning Procedure Involving Generalizations (Cont’d)
• Polynomial Modification Procedure (Cont’d)
• Delta = yk - f(xk,c)
an indicator of change,
used to check the scoring polynomial
and to adjust the weight (coefficient) of each term
in the polynomial
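A small worked example of delta, with made-up term values, coefficients, and backed-up score. The function names are hypothetical; the formula is the one stated above.

```python
from typing import Sequence

def polynomial(term_values: Sequence[float], coeffs: Sequence[float]) -> float:
    """f(xk,c): sum of each term value gi(xk) times its coefficient ci."""
    return sum(c * g for c, g in zip(coeffs, term_values))

def delta(backed_up: float, term_values: Sequence[float],
          coeffs: Sequence[float]) -> float:
    """Delta = yk - f(xk,c)."""
    return backed_up - polynomial(term_values, coeffs)

term_values = [1.0, -4.0]  # made-up g1(xk), g2(xk)
coeffs = [10.0, 1.0]       # made-up c1, c2
d = delta(20.0, term_values, coeffs)  # 20 - (10*1 + 1*(-4))
```

A positive delta means the look-ahead found the position to be better than the polynomial estimated; a negative delta means the polynomial was too optimistic.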
Learning Procedure Involving Generalizations (Cont’d)
• Polynomial Modification Procedure (Cont’d)
Adjustment in the values of coefficient
a. Correlation between the sign of each term's contribution
in the initial polynomial and the sign of delta
b. Adjustment takes into account the number of times that
each term has been used and has had a nonzero value.
c. The coefficient for the term with the largest correlation
coefficient is set at a prescribed maximum value, with
proportionate values determined for all of the remaining
coefficients.
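One way to picture steps a to c is the sketch below. It is a simplification under stated assumptions, not the program's exact procedure: it correlates the sign of each raw term value (rather than the full term contribution) with the sign of delta, counts a term only when it is nonzero, and then scales coefficients linearly so that the best-correlated term receives the prescribed maximum. `TermStats` and `rescale` are hypothetical names.

```python
from typing import List, Sequence

def sign(v: float) -> int:
    return (v > 0) - (v < 0)

class TermStats:
    """Running sign correlation between each term and delta."""

    def __init__(self, n_terms: int) -> None:
        self.agree: List[int] = [0] * n_terms  # sum of sign(term) * sign(delta)
        self.used: List[int] = [0] * n_terms   # times the term was nonzero

    def update(self, term_values: Sequence[float], delta: float) -> None:
        for i, g in enumerate(term_values):
            if g != 0 and delta != 0:
                self.used[i] += 1
                self.agree[i] += sign(g) * sign(delta)

    def correlations(self) -> List[float]:
        return [a / u if u else 0.0 for a, u in zip(self.agree, self.used)]

def rescale(correlations: Sequence[float], max_coeff: float) -> List[float]:
    """Best-correlated term gets max_coeff; the rest are proportionate."""
    largest = max(abs(r) for r in correlations)
    if largest == 0:
        return [0.0] * len(correlations)
    return [max_coeff * r / largest for r in correlations]

stats = TermStats(2)
stats.update([1.0, -4.0], 14.0)  # term 1 agrees with delta, term 2 disagrees
stats.update([2.0, 3.0], 5.0)    # both agree
stats.update([1.0, 0.0], -3.0)   # term 1 disagrees, term 2 is zero (skipped)
new_coeffs = rescale(stats.correlations(), max_coeff=16.0)
```

In this run term 1 ends with correlation 1/3 and term 2 with 0, so term 1 receives the maximum coefficient and term 2 receives zero.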
Learning Procedure Involving Generalizations (Cont’d)
• Instabilities
Stabilizing against minor variations in the delta values:
set an arbitrary minimum value for delta,
fixed at the average value of the coefficients of
the terms currently in the evaluation polynomial.
Stabilizing violent fluctuations,
when a new term is introduced
replace the times-used number by an arbitrary number,
until the usage does, in fact, equal this number.
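Both stabilizing measures can be sketched as small guards. Two assumptions are made here that the slide does not state: a delta below the minimum is treated as zero, and the average is taken over absolute coefficient values. The function names and numbers are hypothetical.

```python
from typing import Sequence

def min_delta(coeffs: Sequence[float]) -> float:
    """Minimum delta: average magnitude of the current coefficients (assumed)."""
    return sum(abs(c) for c in coeffs) / len(coeffs)

def effective_delta(delta: float, coeffs: Sequence[float]) -> float:
    # Assumption: a delta smaller than the minimum is ignored (treated as 0).
    return delta if abs(delta) >= min_delta(coeffs) else 0.0

def effective_times_used(actual: int, floor: int) -> int:
    # A new term reports the arbitrary floor until its real usage reaches it.
    return max(actual, floor)

coeffs = [10.0, 2.0]                     # made-up coefficients, average 6.0
large = effective_delta(14.0, coeffs)    # above the minimum: kept
small = effective_delta(-3.0, coeffs)    # below the minimum: ignored
new_term = effective_times_used(2, 16)   # new term: floor applies
old_term = effective_times_used(20, 16)  # established term: real count
```

Inflating the times-used count of a new term damps its early correlation estimate, so a few lucky or unlucky moves cannot swing its coefficient violently.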
Learning Procedure Involving Generalizations (Cont’d)
• Term Replacement
A low-term tally is recorded against the term with
the lowest correlation coefficient
Is it a satisfactory scheme to select terms for
the evaluation polynomial?
• Binary Connective Terms
Combinational, nonlinear terms
Learning Procedure Involving Generalizations (Cont’d)
• Preliminary Learning-by-generalization Tests
Learning procedure did work and learning rate
was high.
Learning was quite erratic and none too stable.
Learning Procedure Involving Generalizations (Cont’d)
• Second Series of Tests
Four Modifications for improving stability
Conclusions
a. effective learning device for problems
amenable to tree-searching procedures.
b. modest memory requirements & reasonable
operating time
c. instability can be dealt with by straight-forward
procedures.
d. machine can learn to play a better-than-average
game of checkers
Rote Learning vs. Generalization
• Rote learning:
Improvement is made by increasing data storage
Good opening play and end-game play
poor middle game
• Learning-by-generalization:
Generalization on the experience by adjusting
a scoring system
Good middle game
poor opening play and end-game play
[Figure: checkerboard with playing squares numbered 1 to 35 (9, 18, and 27 unused).]