The Implementation of Machine Learning in the Game of Checkers

Billy Melicher
Computer Systems lab 08
10/29/08
Abstract
• Machine learning uses past information
to predict future states
• Can be used in any situation where the
past will predict the future
• Adapts as new situations are encountered
Introduction
• Checkers is used to
explore machine
learning
• Checkers has many
tactical aspects that
make it a good
subject of study
Background
• Minimax
• Heuristics
• Learning
Minimax
• Method of adversarial search
• Every pattern (board) can be given a fitness
value (heuristic)
• Each player chooses the outcome that is best
for them from the choices they have
Minimax
• Has exponential growth rate
• Can only search a limited number of moves
into the future (the search depth, measured in plies)
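A minimal Python sketch of depth-limited minimax, added to illustrate the two bullets above. It is not the author's program: the names `minimax`, `toy_moves`, `toy_heuristic`, and `TREE` are hypothetical, and the game is a tiny hand-made tree rather than checkers. Nested lists stand for boards with further moves, and numbers stand for boards whose fitness value is known.

```python
def minimax(state, depth, maximizing, moves, heuristic):
    """Return the minimax value of `state`, looking `depth` plies ahead."""
    children = moves(state)
    # Stop at the ply limit or when no moves remain, and fall back on the heuristic.
    if depth == 0 or not children:
        return heuristic(state)
    values = [minimax(child, depth - 1, not maximizing, moves, heuristic)
              for child in children]
    # Each player picks the outcome that is best for them.
    return max(values) if maximizing else min(values)


# Hypothetical toy tree: the maximizer moves first, then the minimizer.
TREE = [[3, 5], [2, 9]]


def toy_moves(node):
    """Children of a node; numbers are leaves and have no children."""
    return node if isinstance(node, list) else []


def toy_heuristic(node):
    """Leaves carry their own value; an unexpanded inner node is scored 0 (assumption)."""
    return node if not isinstance(node, list) else 0


full_search = minimax(TREE, 2, True, toy_moves, toy_heuristic)     # sees every leaf
shallow_search = minimax(TREE, 1, True, toy_moves, toy_heuristic)  # cut off after 1 ply
```

With two plies the search reaches the leaves and returns max(min(3, 5), min(2, 9)) = 3. With one ply it never sees the leaves and has to trust the heuristic, which is why the quality of the heuristic matters once the ply limit is reached.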
Heuristic
• Heuristics predict the outcome of a board
• Give a fitness value for the board: the higher the value,
the better the outcome
• Not perfect
• Requires expertise in the situation to create
Heuristics
• H(s) = c_0 F_0(s) + c_1 F_1(s) + … + c_n F_n(s)
• H(s) = heuristic value of board s
• Has many different terms
• In checkers, terms could be:
  • Number of checkers
  • Number of kings
  • Number of checkers on an edge
  • How far checkers have advanced on the board
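The weighted sum H(s) can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's code: the board encoding (8 strings of 8 characters), the function names `features` and `heuristic`, the choice to score each term as Black minus White, and the example weights are all hypothetical.

```python
from typing import List, Sequence

# Assumed encoding: 'b'/'B' = black checker/king, 'w'/'W' = white checker/king, '.' = empty.
Board = List[str]


def features(board: Board) -> List[int]:
    """Feature terms F(s), each counted as Black minus White."""
    pieces = kings = edge = advance = 0
    for row, line in enumerate(board):
        for col, piece in enumerate(line):
            if piece == ".":
                continue
            sign = 1 if piece in "bB" else -1
            pieces += sign                      # number of checkers (kings included)
            if piece in "BW":
                kings += sign                   # number of kings
            if col in (0, 7):
                edge += sign                    # assumption: "edge" = left/right column
            if piece == "b":
                advance += row                  # assumption: Black advances toward row 7
            elif piece == "w":
                advance -= 7 - row              # assumption: White advances toward row 0
    return [pieces, kings, edge, advance]


def heuristic(board: Board, weights: Sequence[float]) -> float:
    """H(s) = sum of coefficient * feature over all terms."""
    return sum(c * f for c, f in zip(weights, features(board)))


example_board = [
    ".b.b....",
    "b.......",
    "........",
    "........",
    "........",
    "........",
    ".......w",
    "W.......",
]
example_weights = [1.0, 1.5, 0.25, 0.1]  # hypothetical coefficients
score = heuristic(example_board, example_weights)
```

Keeping the features separate from the coefficients means a learning method only has to change the list of weights, not the feature code.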
Learning by Rote
• Stores every game played
• Connects the moves made for each board
• Relates the moves made from a particular
board to the outcome of the board
• More likely to make moves that result in a
win, less likely to make moves resulting in a
loss
• Good in end game, not as good in mid game
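One way the rote-learning bullets could be realized is a table of win and loss counts per (board, move) pair. The Python sketch below is an assumption-laden illustration, not the author's implementation: the class `RoteMemory`, its methods, and the smoothed win rate used as a selection weight are all hypothetical choices, and boards and moves are represented by any hashable values.

```python
import random
from collections import defaultdict


class RoteMemory:
    """Remembers how often each move from each board led to a win or a loss."""

    def __init__(self):
        # (board, move) -> [wins, losses]
        self.stats = defaultdict(lambda: [0, 0])

    def record_game(self, history, won):
        """Store one finished game; `history` is a list of (board, move) pairs."""
        for board, move in history:
            self.stats[(board, move)][0 if won else 1] += 1

    def weight(self, board, move):
        """Smoothed win rate; unseen moves get 0.5 so they can still be tried."""
        wins, losses = self.stats.get((board, move), (0, 0))
        return (wins + 1) / (wins + losses + 2)

    def choose(self, board, legal_moves, rng=random):
        """Pick a move at random, favoring moves with a better record."""
        weights = [self.weight(board, m) for m in legal_moves]
        return rng.choices(legal_moves, weights=weights, k=1)[0]


memory = RoteMemory()
memory.record_game([("start", "a")], won=True)
memory.record_game([("start", "b")], won=False)
picked = memory.choose("start", ["a", "b", "c"], rng=random.Random(0))
```

Because the table only helps on boards that have been seen before, it is most useful where positions repeat often, which fits the slide's remark that the method works best in the end game.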
Learning by Generalization
• Uses a heuristic function to guide moves
• Changes the heuristic function after games
based on the outcome
• Good in mid game but not as good in early
and end games
• Requires identifying the features that affect
the game
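The slide does not say how the heuristic function is changed after a game. As one possible illustration, the Python sketch below nudges the coefficients of a linear heuristic so that its predictions for the boards seen during the game move toward the final outcome (a simple delta rule). The rule, the function name `update_weights`, the learning rate `lr`, and the outcome encoding (+1 win, -1 loss) are assumptions, not the author's method.

```python
from typing import List, Sequence


def update_weights(weights: Sequence[float],
                   visited_features: Sequence[Sequence[float]],
                   outcome: float,
                   lr: float = 0.01) -> List[float]:
    """Return new coefficients after one game.

    visited_features: the feature values of each board seen during the game.
    outcome: assumed encoding, +1 for a win and -1 for a loss.
    """
    new = list(weights)
    for feats in visited_features:
        predicted = sum(w * f for w, f in zip(new, feats))
        error = outcome - predicted
        # Features that were large on this board receive most of the correction.
        new = [w + lr * error * f for w, f in zip(new, feats)]
    return new


# Hypothetical game: two boards described by (checker difference, king difference).
start = [0.0, 0.0]
after_win = update_weights(start, [[1, 0], [2, 1]], outcome=1.0, lr=0.1)
```

Only features that are actually present on the visited boards get adjusted, which is one reason the choice of features matters so much for this style of learning.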
Development
• Use of minimax algorithm with alpha-beta
pruning
• Use of both learning by Rote and
Generalization
• Temporal difference learning
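Alpha-beta pruning returns the same value as plain minimax while skipping branches that cannot change the result. The Python sketch below shows the idea on a tiny hand-made tree; it is not the author's code, and `alphabeta`, `toy_moves`, `toy_heuristic`, `TREE`, and `visited` are hypothetical names used only for this illustration.

```python
import math


def alphabeta(state, depth, alpha, beta, maximizing, moves, heuristic):
    """Minimax value of `state` with alpha-beta cutoffs."""
    children = moves(state)
    if depth == 0 or not children:
        return heuristic(state)
    if maximizing:
        value = -math.inf
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, moves, heuristic))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # the minimizer above will never allow this line
        return value
    value = math.inf
    for child in children:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, moves, heuristic))
        beta = min(beta, value)
        if alpha >= beta:
            break  # the maximizer above already has something better
    return value


TREE = [[3, 5], [2, 9]]  # nested lists are boards with moves, numbers are leaf values
visited = []             # records which leaves were actually evaluated


def toy_moves(node):
    return node if isinstance(node, list) else []


def toy_heuristic(node):
    visited.append(node)
    return node


best = alphabeta(TREE, 2, -math.inf, math.inf, True, toy_moves, toy_heuristic)
```

In this example the leaf 9 is never evaluated: once the second branch is known to be worth at most 2, it cannot beat the 3 already guaranteed by the first branch.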
Temporal Difference Learning
• In temporal difference learning, the heuristic is
adjusted based on the difference between its value
at one time step and its value at the next
• Over many updates, the heuristic moves toward the ideal function
• U(s) ← U(s) + α(R(s) + γU(s') − U(s))
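In the update rule, α is the learning rate, γ is the discount factor, R(s) is the reward received in state s, and s' is the state that follows s. A minimal Python sketch of a single update is given below. It stores U in a dictionary for clarity, which is an assumption: the slides apply the idea to a heuristic function, and the function name `td_update` and the default parameter values are hypothetical.

```python
from typing import Dict, Hashable


def td_update(U: Dict[Hashable, float], s: Hashable, s_next: Hashable,
              reward: float, alpha: float = 0.1, gamma: float = 1.0) -> float:
    """Apply U(s) <- U(s) + alpha * (R(s) + gamma * U(s') - U(s)) in place."""
    u_s = U.get(s, 0.0)          # states not seen before start at 0 (assumption)
    u_next = U.get(s_next, 0.0)
    U[s] = u_s + alpha * (reward + gamma * u_next - u_s)
    return U[s]


# Hypothetical two-state example: "b" is already known to be worth 1.0.
values = {"a": 0.0, "b": 1.0}
first = td_update(values, "a", "b", reward=0.0, alpha=0.5)
second = td_update(values, "a", "b", reward=0.0, alpha=0.5)
```

Each update closes part of the gap between U(s) and R(s) + γU(s'): here the value of "a" goes from 0 to 0.5 and then to 0.75, approaching the value of the state it leads to.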