Learning Othello - University of St. Thomas

Learning Othello
The quest for general strategy
building.
Objectives and Goals
• Develop a program that is capable of
learning to play Othello.
• More interested in how the program learns.
• Learn which techniques are more effective
than others.
• NOT building Deep Blue!
The Big Picture
• Just a small piece in a much bigger puzzle.
• Using games to explore building a general
strategy ontology.
• Is it possible to find patterns in different
games to form similar, working strategies?
What is Othello?
• Two player grid-based game.
• Black and white discs used.
• Must move if you can, skip turn if you
can’t.
• Object is to “outflank” opponent’s discs and
convert them to yours.
• Game over when no more moves possible.
• Winner is player with most discs.
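The rules above can be sketched in a few functions. This is a minimal illustration, not the project's code: the 8x8 board, the four-disc starting position, and all names (`outflanked`, `legal_moves`, `play`, and so on) are assumptions made for the example.

```python
# Sketch of the rules on an 8x8 board (size and starting position are assumptions).
EMPTY, BLACK, WHITE = ".", "B", "W"
SIZE = 8
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def initial_board():
    """Four discs in the centre, the usual Othello starting position."""
    board = [[EMPTY] * SIZE for _ in range(SIZE)]
    board[3][3] = board[4][4] = WHITE
    board[3][4] = board[4][3] = BLACK
    return board

def outflanked(board, row, col, player):
    """Return the opponent discs that a disc placed at (row, col) would convert."""
    if board[row][col] != EMPTY:
        return []
    opponent = WHITE if player == BLACK else BLACK
    captured = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        # Walk over a run of opponent discs in this direction.
        while 0 <= r < SIZE and 0 <= c < SIZE and board[r][c] == opponent:
            line.append((r, c))
            r, c = r + dr, c + dc
        # The run only counts if it is closed off by one of the player's own discs.
        if line and 0 <= r < SIZE and 0 <= c < SIZE and board[r][c] == player:
            captured.extend(line)
    return captured

def legal_moves(board, player):
    """A move is legal only if it outflanks at least one disc."""
    return [(r, c) for r in range(SIZE) for c in range(SIZE)
            if outflanked(board, r, c, player)]

def play(board, row, col, player):
    """Place a disc and convert the outflanked discs in place."""
    captured = outflanked(board, row, col, player)
    if not captured:
        raise ValueError("illegal move")
    board[row][col] = player
    for r, c in captured:
        board[r][c] = player

def game_over(board):
    """The game ends when neither player has a move."""
    return not legal_moves(board, BLACK) and not legal_moves(board, WHITE)

def count(board, player):
    """Number of discs a player has; the higher count wins."""
    return sum(row.count(player) for row in board)
```

A player with an empty `legal_moves` list skips the turn; the game ends once both lists are empty.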
The Board
System Design
[Diagram: GUI Display, Text Display, Game Controller, Learning Agent (GA), Knowledge Base, and two Player Agents, each with a Strategy]
System Developed Iteratively
• Iteration 1
– Knowledge base will contain moves and
genetic algorithm information.
• Iteration 2
– Game logic contained in controller moved to
knowledge base.
• Iteration 3
– All player logic stored in knowledge base.
Game Controller
• Initialize all components of the system.
• Set up the default board.
• Prepare the display.
• Use learning agent to prepare player
strategies.
• Control game flow.
• Control training sessions.
Game Controller Cont.
• Iteration 1
– Present player agent with list of possible
moves.
– Take move from player agent and control
board.
• In iteration 2, a lot of this logic will be
contained in the KB.
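One way to picture the iteration 1 controller is as a loop that computes the possible moves, hands them to the current agent, and applies the answer. The sketch below is an assumption-laden illustration: the class layout, the callable-based agents, and the tiny "take 1 or 2 from a pile" game used as a stand-in for Othello are all hypothetical.

```python
class GameController:
    """Iteration-1 style controller: it finds the moves, the agents only pick one."""

    def __init__(self, legal_moves, apply_move, agents):
        self.legal_moves = legal_moves  # (state, player) -> list of moves
        self.apply_move = apply_move    # (state, player, move) -> new state
        self.agents = agents            # two callables: list of moves -> chosen move

    def play(self, state):
        player, passes = 0, 0
        while passes < 2:               # two passes in a row: nobody can move
            moves = self.legal_moves(state, player)
            if moves:
                move = self.agents[player](moves)
                if move not in moves:
                    raise ValueError("agent returned a move that was not offered")
                state = self.apply_move(state, player, move)
                passes = 0
            else:
                passes += 1             # no move available, so the turn is skipped
            player = 1 - player
        return state

# Stand-in game (hypothetical): state is (remaining, taken_by_0, taken_by_1).
def pile_moves(state, player):
    return [n for n in (1, 2) if n <= state[0]]

def pile_apply(state, player, move):
    taken = list(state[1:])
    taken[player] += move
    return (state[0] - move, *taken)

controller = GameController(
    pile_moves,
    pile_apply,
    agents=[lambda moves: moves[0], lambda moves: moves[-1]],
)
final_state = controller.play((5, 0, 0))
```

Passing the rules in as functions is a design choice for the sketch; it keeps the loop unchanged if the game logic later moves into the knowledge base.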
Player Agent
• Consult strategy to return a move to the
Game Controller.
• Iteration 1
– Game Controller will give agent all possible
moves to choose from.
• Iteration 2
– Game Controller will give agent a copy of the
board, and the agent will find and choose a move.
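A rough sketch of an iteration 1 player agent follows. It assumes, purely for illustration, that a strategy is an ordered list of move kinds and that each possible move arrives as a `(kind, square)` pair; neither detail is specified in the slides, and `PlayerAgent` and `choose` are hypothetical names.

```python
class PlayerAgent:
    """Picks a move by consulting a strategy: an ordered preference of move kinds."""

    def __init__(self, strategy):
        self.strategy = list(strategy)

    def choose(self, possible_moves):
        """Return the offered move whose kind ranks earliest in the strategy.

        possible_moves is a non-empty list of (kind, square) pairs (an assumption).
        Kinds missing from the strategy rank last; ties keep the offered order.
        """
        rank = {kind: i for i, kind in enumerate(self.strategy)}
        return min(possible_moves, key=lambda move: rank.get(move[0], len(rank)))

agent = PlayerAgent(["Diagonal_jump", "Horizontal_skip", "Vertical_jump"])
offered = [("Horizontal_skip", (2, 3)), ("Diagonal_jump", (5, 5))]
chosen = agent.choose(offered)
```

In iteration 2 the same `choose` method could take a copy of the board instead and compute the candidate moves itself.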
Learning Agent
• Majority of the focus for the project and
iteration 1.
• Uses genetic algorithm approach for
building strategy.
• Moderate risk because genetic algorithms are not
directly applicable to the problem domain.
Steps for GA
• Encode problem in terms of chromosomes.
• Define fitness function for chromosomes.
• Generate initial population.
• Measure fitness and select fittest to “mate.”
– Cross over
– Mutation
• Repeat
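The steps above can be sketched as a generic loop. This is a minimal illustration under stated assumptions: keeping the fitter half as parents, the `mutation_rate` parameter, and the bit-string chromosomes in the demo are choices made for the example, not the project's design.

```python
import random

def evolve(population, fitness, crossover, mutate, generations, rng, mutation_rate=0.1):
    """Run a simple generational GA and return the final population, fittest first."""
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, len(ranked) // 2)]  # select the fittest to "mate"
        children = []
        while len(parents) + len(children) < len(population):
            mother, father = rng.sample(parents, 2)
            child = crossover(mother, father, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = parents + children               # repeat with the new generation
    return sorted(population, key=fitness, reverse=True)

# Demo on stand-in chromosomes: tuples of bits, fitness = number of ones.
def one_point(mother, father, rng):
    cut = rng.randrange(1, len(mother))
    return mother[:cut] + father[cut:]

def flip_bit(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

rng = random.Random(0)
start = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(10)]
final = evolve(start, sum, one_point, flip_bit, generations=20, rng=rng)
```

Because the parents survive into the next generation, the best fitness in this sketch never decreases from one generation to the next.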
The Genetics of Othello
• Moves will be the genes.
– Horizontal_skip, Horizontal_jump
– Vertical_skip, Vertical_jump
– Diagonal_skip, Diagonal_jump
– Hybrid (?)
• In iteration 1, a chromosome will contain all
possible moves.
• In iteration 2, a chromosome may not contain all
moves.
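A possible iteration 1 encoding treats a chromosome as an ordering of all the move genes. The sketch below assumes exactly that; the tuple representation and the helper names are hypothetical, and the Hybrid gene is left out because the slides mark it as an open question.

```python
import random

# The six move genes named on the slide.
GENES = (
    "Horizontal_skip", "Horizontal_jump",
    "Vertical_skip", "Vertical_jump",
    "Diagonal_skip", "Diagonal_jump",
)

def random_chromosome(rng):
    """Iteration 1: every gene appears exactly once, in a random order."""
    genes = list(GENES)
    rng.shuffle(genes)
    return tuple(genes)

def initial_population(size, rng):
    """Generate the starting chromosomes for the learning agent."""
    return [random_chromosome(rng) for _ in range(size)]

def is_complete(chromosome):
    """True if the chromosome contains all genes with no duplicates."""
    return sorted(chromosome) == sorted(GENES)

population = initial_population(20, random.Random(1))
```

An iteration 2 chromosome that omits some moves would fail `is_complete` but could still be required to be duplicate-free.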
Fitness
• Simplest measure: you are fit if you win.
• More complicated, and possibly more
accurate, is a measure of the number of discs
taken.
– Scoring system from online game.
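The two measures can be compared side by side. In this sketch the disc-based measure is approximated by the share of discs held at the end of a game, which is an assumption; the online scoring system mentioned above is not specified, so it is not reproduced here.

```python
def win_fitness(own_discs, opponent_discs):
    """Simplest measure: 1.0 for a win, 0.0 otherwise."""
    return 1.0 if own_discs > opponent_discs else 0.0

def disc_fitness(own_discs, opponent_discs):
    """Finer measure (assumed form): fraction of the final discs that are yours."""
    total = own_discs + opponent_discs
    return own_discs / total if total else 0.0

def average_fitness(results, measure):
    """Average a measure over several games; results holds (own, opponent) pairs."""
    return sum(measure(own, opp) for own, opp in results) / len(results)

narrow_win = (33, 31)
clear_win = (40, 24)
```

Both games count the same under `win_fitness`, while `disc_fitness` separates them (33/64 versus 40/64), which is the extra information the second bullet is after.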
The Birds and Bees of Othello
• Crossover
– Swapping the order of the moves in a
chromosome.
– Need to avoid duplicates.
• Mutation
– Change the jump/skip value.
– True mutation because of the coupling?
– Defined differently in hybrid moves?
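One way to meet the "avoid duplicates" requirement is order crossover, a standard permutation operator swapped in here as an assumption: the child keeps a slice of one parent and takes the remaining genes in the other parent's order. For mutation, the sketch flips a gene's jump/skip value and, to keep the chromosome duplicate-free, flips its counterpart too, which is one possible reading of the "coupling" question above. All function names are hypothetical.

```python
import random

def order_crossover(mother, father, rng):
    """Keep a slice of mother in place; fill the rest in father's order."""
    n = len(mother)
    start, end = sorted(rng.sample(range(n + 1), 2))
    kept = mother[start:end]
    rest = [gene for gene in father if gene not in kept]
    return tuple(rest[:start]) + tuple(kept) + tuple(rest[start:])

def counterpart(gene):
    """Horizontal_skip <-> Horizontal_jump, and likewise for the other directions."""
    direction, kind = gene.rsplit("_", 1)
    return f"{direction}_{'jump' if kind == 'skip' else 'skip'}"

def mutate(chromosome, rng):
    """Flip one gene's jump/skip value; its counterpart flips too if present."""
    gene = rng.choice(chromosome)
    other = counterpart(gene)
    swap = {gene: other, other: gene}
    return tuple(swap.get(g, g) for g in chromosome)

MOTHER = ("Horizontal_skip", "Horizontal_jump", "Vertical_skip",
          "Vertical_jump", "Diagonal_skip", "Diagonal_jump")
FATHER = tuple(reversed(MOTHER))

rng = random.Random(2)
child = order_crossover(MOTHER, FATHER, rng)
mutant = mutate(MOTHER, rng)
```

When a chromosome holds all moves, this mutation only changes the order of a skip/jump pair; whether that counts as a true mutation is exactly the question raised in the bullets.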
Knowledge Base
• Iteration 1
– Contain the definition of the genetic operators
(crossover and mutation)
– Contain the moves and their relation to the genetic
operators
• Iteration 2
– More game logic stored so fitness functions can be
defined and tied closer to the genetic algorithm
information
• Iteration 3
– All game logic contained in KB.
Overall Application Flow
• Game controller started and components
initialized.
• Learning agent populates initial
chromosome base.
• Game controller directs games.
• Learning agent “mates” chromosomes
based on game outcomes.
Issues
• Defining a strategy simply by moves may be too
simplistic
• How exactly to mate the chromosomes
– One chromosome that is operated on
• Might work if it contains all moves
– Multiple chromosomes to mix and match
• How to play multiple simultaneous games
• Encoding the game knowledge to make it
“computational”
Summary
• Good first step towards a very complex
problem.
• Allows exploring different areas
of AI with the same infrastructure.
– Swappable learning agents