2. Rote Learning
Prof. Gheorghe Tecuci
Learning Agents Laboratory
Computer Science Department
George Mason University
© 2003, G. Tecuci, Learning Agents Laboratory
Overview
Rote learning issues
Game playing as a performance task
Rote learning in game playing
Learning a static evaluation function
Recommended reading
Rote Learning
Rote learning consists of memorizing the solutions of solved problems so that the system need not solve them again: once f(X1, ..., Xn) = (Y1, ..., Yp) has been computed and stored, the performance element can, during subsequent computations of f(X1, ..., Xn), simply retrieve (Y1, ..., Yp) from memory rather than recompute it.
Issues in the design of rote learning systems
Memory organization
Rote learning requires a useful organization of the memory so that retrieval of the desired information is very fast.
Stability of the environment
The information stored at one time should still be valid
later.
Store-versus-compute trade-off
The cost of storing and retrieving the memorized
information should be smaller than the cost of
recomputing it.
Game playing as a performance task: Checkers
[Figure: checkers board; the 32 black (playable) squares are numbered 1-32.]
There are two players (Grey and White),
each having 12 men. They alternately
move one of their men. A man could be
moved forward diagonally from one black
square to another, or it could jump over an
opponent's man, if the square behind it is
vacant. In such a case the opponent's man
is captured. Any number of men could be
jumped (and captured) if the square behind
each is vacant. If a man reaches the
opponent's last row, it is transformed into a
king by placing another man on top of it.
The king could move both forward and
backward (as opposed to the men which
could move only forward).
The winning player is the one who succeeds
in blocking all the men of its opponent (so
that they cannot move) or succeeds in
capturing all of them.
Game tree search
All the possible plays of a game could be represented as a tree. The root node is the initial state, in which it is the first player's turn to move. The successors of the initial state are the states the first player can reach in one move, their successors are the states resulting from the other player's possible replies, and so on. Terminal states are those representing a win for the Grey player, a loss for the Grey player, or a draw.
[Figure: the initial checkers board (squares numbered 1-32) at the root of the game tree.]
Each path from the root node to a terminal node gives a different complete play of the
game. For instance, Grey has seven possible moves at the start of the game, namely:
9-13, 9-14, 10-14, 10-15, 11-15, 11-16, and 12-16.
White has seven possible responses:
21-17, 22-17, 22-18, 23-18, 23-19, 24-19, 24-20.
Some of these responses are better, while others are worse. For instance, if Grey
opens 9-14 and White plays 21-17, then Grey can jump over White's man and capture
it.
The minimax procedure
Minimax is a procedure for assigning values to the nodes in a game tree. The value
of a node expresses how good that node is for the first player (called the Max
player) and how bad it is for the second player (called the Min player). Therefore,
the Max player will always choose to move to the node that has the maximum
value among the possible successors of the current node. Similarly, the Min player
will always choose to move to the node that has the minimum value among the
possible successors of the current node.
In the case of checkers, we consider that Grey is the Max player and White is the
Min player.
Given the values of the terminal nodes, the values of the nonterminal nodes are
computed as follows:
- the value of a node where it is the Grey player's turn to move is the maximum of
the values of its successors (because Grey tries to maximize its outcome);
- the value of a node where it is the White player's turn to move is the minimum of
the values of its successors (because White tries to minimize the outcome of Grey).
Problem
Consider the following game tree in which the numbers associated with the leaves
represent how good they are from the point of view of the Maximizing player:
What move should be chosen by the Max player, and what should be the response
of the Min player, assuming that both are using the minimax procedure?
Solution
Max will move to c,
Min will respond by moving to f,
and Max will move to m.
Searching a partial game tree
Size of the search space
A complete game tree for checkers has been estimated as having 10^40 nonterminal nodes. If one assumes that these nodes could be generated at a rate of 3 billion per second, the generation of the whole tree would still require around 10^21 centuries!
Checkers is far simpler than chess which, in turn, is generally
far simpler than business competitions or military games.
The tree of possibilities is far too large to be fully generated and
searched backward from the terminal nodes, for an optimal move.
Searching a partial game tree
1. Generate a partial game tree rooted at the node corresponding to the current board situation.
2. Estimate the values of the leaf nodes by using a static evaluation function.
3. Back propagate the estimated values.
Heuristic function for board position evaluation: w1·f1 + w2·f2 + w3·f3 + …
where the wi are real-valued weights and the fi are numeric board features
(e.g. the number of white pieces, the number of white kings).
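The following Python sketch puts the three steps together as a depth-limited minimax that applies a linear static evaluation function at the frontier of the partial tree; the move generator and the feature functions are passed in as parameters because they are not specified here, and the whole sketch is illustrative rather than Samuel's actual implementation:

```python
from typing import Callable, Sequence

def make_static_eval(weights: Sequence[float],
                     features: Sequence[Callable[[object], float]]) -> Callable[[object], float]:
    """Linear static evaluation function: value = w1*f1(board) + w2*f2(board) + ..."""
    def static_eval(board) -> float:
        return sum(w * f(board) for w, f in zip(weights, features))
    return static_eval

def search(board, depth: int, successors: Callable, static_eval: Callable,
           max_to_move: bool = True) -> float:
    """Step 1: generate a partial game tree down to the given depth.
    Step 2: estimate the frontier (leaf) nodes with the static evaluation function.
    Step 3: back propagate the estimated values with the minimax rule."""
    children = successors(board, max_to_move)
    if depth == 0 or not children:
        return static_eval(board)
    values = [search(child, depth - 1, successors, static_eval, not max_to_move)
              for child in children]
    return max(values) if max_to_move else min(values)
```

A caller would supply, for example, features such as the number of white pieces and the number of white kings, together with their weights.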
What is the justification for this approach?
The idea is that the static evaluation function produces
more accurate results when the evaluated nodes are
closer to a goal node.
An illustration of rote learning in game playing
Samuel's checkers player
[Figure: the program searches a partial game tree below position A, backs up the leaf estimates to obtain the value 8, and memorizes the pair (A, 8).]
Improving the performance of the checkers player
[Figure: from the current position E, the look-ahead search reaches position A, for which the pair (A, 8) is stored in memory.]
Question
Why does using the memorized value (A, 8) improve the performance?
Improving the look-ahead power by rote learning
Answer: This improves the program in two ways:
• it does not have to recompute the value of A with the static evaluation function;
• the memorized value of A is more accurate than the static value of A, because it is based on a look-ahead search.
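A minimal sketch of how such memorized pairs could be consulted during later searches; the dictionary-based memory and the function names are illustrative and say nothing about how Samuel actually organized the memory for fast retrieval:

```python
memory = {}   # rote memory: maps a (hashable) board position to its memorized value

def memorize(position, backed_up_value):
    # Store the value obtained for this position by a look-ahead search.
    memory[position] = backed_up_value

def evaluate(position, static_eval):
    # Prefer the memorized value: it needs no recomputation and, because it was
    # backed up from a deeper search, it is more accurate than a fresh static
    # evaluation of the same position.
    if position in memory:
        return memory[position]
    return static_eval(position)
```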
Samuel’s results and conclusion
The program developed by Samuel was trained by playing against itself, by
playing against people and by following book games. After training, the
memory contained roughly 53,000 positions, and the program became a
"rather better-than-average novice, but definitely not ... an expert" (Samuel, 1959).
Samuel estimated that his program would need to memorize about one
million positions to approximate a master level of checkers play.
Samuel's experiments demonstrated that significant and measurable
learning can result from rote learning alone.
By retrieving the stored results of extensive computations, the
program can proceed deeper in its reasoning. The price is storage
space, access time, and effort in organizing the stored knowledge.
Learning a polynomial evaluation function
value = Σi wi·fi
What are the main problems to be solved?
a) Discovering which features fi to use in the function
b) Learning the weights of the features to obtain an
accurate value for the board position
Learning the weights of the features
Reinforcement learning
The learning procedure is to compare at each move the
value of the static evaluation function corresponding to the
current board position with a performance standard that
provides a more accurate estimate of that value. The
difference between these two estimates controls the
adjustment of the weights in the evaluation function so as
to better approximate the performance standard.
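One standard way to realize this adjustment (a hedged sketch of a delta-style rule, not necessarily Samuel's exact procedure) is to change each weight in proportion to the error and to the corresponding feature value:

```latex
% eta is a small learning rate; standard(B) is the performance standard for position B
\Delta w_i = \eta \,\bigl(\mathrm{standard}(B) - f(B)\bigr)\, f_i(B),
\qquad \text{where } f(B) = \sum_i w_i f_i(B)
```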
Performance standards
What performance standards could be used?
One performance standard could be obtained by
conducting a deeper minimax search into future board
positions, applying the evaluation function to tip board
positions and backing up these values. The idea is that the
static evaluation function produces more accurate results
when the evaluated nodes are closer to a goal node.
Performance standards: using “f” itself
How could this be implemented?
One considers an iterative procedure of updating “f.”
The performance standard for a certain position B is
f(successor(B)).
That is, one adjusts the weights so as to reduce the
difference between f(successor(B)) and f(B).
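A minimal Python sketch of one iteration of this procedure, with f(successor(B)) as the performance standard; the learning rate eta and the list-based interface are assumptions made for the example:

```python
def update_weights(weights, features_B, value_successor_B, eta=0.01):
    """Adjust the weights so that f(B) moves toward f(successor(B)).

    weights            -- current weights w_i
    features_B         -- feature values f_i(B) of the current position B
    value_successor_B  -- performance standard f(successor(B))
    eta                -- small learning rate (assumed)
    """
    value_B = sum(w * f for w, f in zip(weights, features_B))   # f(B)
    error = value_successor_B - value_B                         # difference to reduce
    return [w + eta * error * f for w, f in zip(weights, features_B)]
```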
Performance standards
What other performance standards could be used?
Another possible performance standard could be obtained
from "book games" played between two human experts. In
such a case, the static evaluation function should be modified
so that the value of the board position corresponding to the
move indicated by the book is higher than the values of the
positions corresponding to the other possible moves.
Discovering features to use in evaluation function
The problem of new terms: How could a learning system
discover the appropriate terms for representing the knowledge
to be learned?
A partial solution is term selection: provide a list of terms
from which the most relevant terms are to be chosen.
Samuel started with 38 terms, out of which only 16 are used in
the static evaluation function at any one time. The remaining 22
features are kept on a standby feature list. Periodically, the
feature with the lowest weight among the 16 features currently in
use in the evaluation function is replaced with the first feature
from the standby list, and the replaced feature is placed at the
end of the standby list.
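A hedged Python sketch of the rotation policy described above; reading "lowest weight" as smallest weight magnitude and resetting the weight of the newly promoted feature are assumptions made for the example:

```python
def rotate_features(in_use, weights, standby):
    """Replace the lowest-weight in-use feature with the first standby feature.

    in_use  -- features currently in the evaluation function (e.g. 16 of them)
    weights -- their current weights, in the same order
    standby -- the remaining features, kept in queue order
    """
    lowest = min(range(len(in_use)), key=lambda i: abs(weights[i]))  # assumption: magnitude
    dropped = in_use[lowest]
    in_use[lowest] = standby.pop(0)   # promote the first standby feature
    weights[lowest] = 0.0             # assumption: the new feature starts with a neutral weight
    standby.append(dropped)           # the replaced feature goes to the end of the standby list
    return in_use, weights, standby
```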
Other types of static evaluation functions
Signature table (an explicit representation of a function which gives the
value of the function for each possible combination of argument values).
Because such a table may be very large, one may reduce it by
considering only special combinations of argument values.
Learning the signature table means determining the values of
the function for particular combinations of the arguments.
The signature table is a more general representation than a
linear polynomial function.
Neural network
The inputs are the features and the output is the value of
the function.
Results of Samuel’s experiments
•Learning based on signature tables was much more efficient
than learning based on a linear polynomial function.
•Learning a signature table from book moves was more
efficient than rote learning.
Recommended reading
Mitchell T.M., Machine Learning, Chapter 1: Introduction, pp. 5-14, McGraw Hill,
1997.
Samuel A.L., Some Studies in Machine Learning Using the Game of Checkers, in
Readings in Machine Learning, pp. 535-554.
The Handbook of Artificial Intelligence, vol. III, pp. 335-344 and pp. 457-464.