Learning to Play the Game of GO
Lei Li
Computer Science Department
May 3, 2007
Outline
• Part I: Computer GO, a brief taste
• Part II: Learning to Predict Moves
Part I
Computer GO: a brief taste
The Game of GO
• 19 × 19 grid
• Two players (black and white)
• Stones are placed at intersections of lines
• Object: maximize surrounded territory, i.e., control a larger part of the board than your opponent
Why is GO special?
• Very simple rules
• Winning is decided globally (by total territory)
• No two games are the same
• Handicap system
• Current best program, Handtalk, is unable to beat experienced amateurs
Game       Computer vs. Human
Checkers   Chinook > H
Othello    Logistello > H
Chess      Deep Blue >= H
Go         Handtalk << H
Complexity results
• Polynomial-space (PSPACE) hard [Lichtenstein & Sipser 80]
• Exponential-time complete [Robson 83]
Major approaches
• Tree Search based (minimax, alpha-beta)
– Handtalk, GO++, GNU GO
• Monte-Carlo methods
– Select the best move from random play (see the sketch after this list)
• Learning based
– Neural network [Enzenberger 96]
– SVM
– Graphical model [Stern et al 06]
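To make the Monte-Carlo idea concrete, here is a minimal sketch of selecting the best move from random playouts. The engine interface (legal_moves, play, playout_winner) is hypothetical and illustrative, not taken from any of the programs above.

def monte_carlo_move(board, player, legal_moves, play, playout_winner, n_playouts=100):
    """Pick the move whose uniformly random playouts win most often.

    legal_moves(board, player) lists candidate moves, play(board, move,
    player) returns the resulting board, and playout_winner(board) plays
    the game out with random moves and returns the winner; all three are
    hypothetical placeholders for a real Go engine.
    """
    best_move, best_rate = None, -1.0
    for move in legal_moves(board, player):
        wins = sum(playout_winner(play(board, move, player)) == player
                   for _ in range(n_playouts))
        if wins / n_playouts > best_rate:
            best_move, best_rate = move, wins / n_playouts
    return best_move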
Search in Computer GO
• Tree search
– Pattern matching
– Heuristics, expert rules
– Local search
– Early stop
– Alpha-beta pruning, which is successful for chess (sketched after this list)
– High-level abstract strategies
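A minimal sketch of the alpha-beta pruning mentioned above, assuming hypothetical evaluate(state) and children(state, maximizing) functions standing in for a position evaluator and a move generator:

def alphabeta(state, depth, alpha, beta, maximizing, evaluate, children):
    """Depth-limited minimax with alpha-beta cutoffs.

    evaluate(state) scores a position for the maximizing player;
    children(state, maximizing) yields successor positions. Both are
    hypothetical placeholders for a real engine.
    """
    successors = list(children(state, maximizing)) if depth > 0 else []
    if not successors:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in successors:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, evaluate, children))
            alpha = max(alpha, value)
            if alpha >= beta:  # cutoff: the minimizing player avoids this line
                break
    else:
        value = float("inf")
        for child in successors:
            value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                         True, evaluate, children))
            beta = min(beta, value)
            if beta <= alpha:  # cutoff: the maximizing player avoids this line
                break
    return value

With Go's 361-way branching factor, even aggressive pruning reaches only shallow depths, which is why the tree-search programs above also lean on patterns, heuristics, and expert rules.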
Major challenges
• Too many possible moves (up to 361)
• How to evaluate a move (subtlety)
• Implicit control vs. explicit control
• Connectivity viewpoint
• Local state vs. global state
Local vs Global (board examples)
Part II
Learning to Predict Moves
Move Prediction in the Learning Setting
• Given:
– a database of professional games
• Goal:
– learn the distribution of a move given the
current board state
– rank the moves
• Assumption:
– Experts always make the best moves
Learning features
• State explosion with the full board
– Full-board state: configuration (c)
– 2^361 possibilities
• Local state (t):
– Local pattern (within a region of size 64); see the pattern-key sketch below
– Plus 8 extra features on liveness (situation)
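One plausible way to turn a local pattern into a countable key is to canonicalize it over the 8 board symmetries. This is only a sketch of the idea with a made-up region encoding ('B', 'W', '.', '#' for off-board); the exact encoding in the system above may differ.

def pattern_key(region):
    """Canonical string key for a square local pattern.

    region is a list of lists over {'B', 'W', '.', '#'}. Taking the
    lexicographic minimum over the 4 rotations and their mirrors makes
    symmetric patterns share one key.
    """
    def rotate(r):  # rotate 90 degrees clockwise
        return [list(row) for row in zip(*r[::-1])]

    variants, current = [], region
    for _ in range(4):
        variants.append(current)
        variants.append([row[::-1] for row in current])  # mirror image
        current = rotate(current)
    return min("".join("".join(row) for row in v) for v in variants)

Any rotation or reflection of the same pattern then maps to the same key, so counts from the game database accumulate per equivalence class rather than per orientation.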
Local pattern region
Local Liveness Features
• Liberties of the new chain: 1, 2, 3, >3
• Liberties of the opponent chain: 1, 2, 3, >3
• Is there an active Ko?
• Is the new chain captured immediately?
• Distance to the board edge: <3, 4, 5, >5
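The liberty features above reduce to counting the liberties of a chain, which a short flood fill handles; a sketch, assuming the board is a dict from (row, col) to 'B', 'W', or None:

def chain_liberties(board, start, size=19):
    """Count liberties of the chain containing the stone at start.

    board maps (row, col) to 'B', 'W', or None (empty). A liberty is an
    empty point adjacent to any stone in the chain; flood-fill the chain
    and collect adjacent empty points in a set.
    """
    color = board.get(start)
    assert color in ("B", "W"), "start must be an occupied point"
    stack, chain, liberties = [start], {start}, set()
    while stack:
        r, c = stack.pop()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < size and 0 <= nc < size):
                continue
            neighbor = board.get((nr, nc))
            if neighbor is None:
                liberties.add((nr, nc))
            elif neighbor == color and (nr, nc) not in chain:
                chain.add((nr, nc))
                stack.append((nr, nc))
    return len(liberties)

The buckets on the slide (1, 2, 3, >3) would then be min(chain_liberties(board, p), 4).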
Move Distribution Model
• Move v, given configuration c:
– P(v | c) = ∫ P(v | c, u) p(u) du
– u is the prior value of the pattern: u ~ Normal(μ, σ)
– Latent value of a move: x | u ~ Normal(u, β)
– Pick the move with the largest latent value:
  P(v | c, u) = P( argmax_{v′} x(v′, c) = v )
• Learn the posterior with the sum-product algorithm
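A small Monte-Carlo sketch of the model above: sample each candidate move's pattern value u from Normal(μ, σ), add Normal(0, β) noise to get the latent value x, and count how often each move has the largest x. The μ/σ numbers below are made up for illustration; the real system infers the posterior with the sum-product algorithm rather than by sampling.

import random

def move_probabilities(patterns, beta=1.0, n_samples=10000):
    """Estimate P(v | c) = P(argmax_v' x(v', c) = v) by sampling.

    patterns maps each candidate move v to the (mu, sigma) of its
    pattern-value belief; x(v) = u(v) + Normal(0, beta) noise.
    """
    wins = {v: 0 for v in patterns}
    for _ in range(n_samples):
        x = {v: random.gauss(mu, sigma) + random.gauss(0.0, beta)
             for v, (mu, sigma) in patterns.items()}
        wins[max(x, key=x.get)] += 1
    return {v: w / n_samples for v, w in wins.items()}

# Made-up pattern beliefs for three candidate moves:
print(move_probabilities({"D4": (1.2, 0.4), "Q16": (1.0, 0.8), "K10": (0.3, 0.5)}))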
Results
• Real data:
– 181,000 game records
– 600 million patterns (after pruning)
• 34% of expert moves ranked first
– 86% ranked in the top 20
– the scores can be used to rank moves during search
Test in real games
• Opening: rather good
• Weaker in later stages
– missing pattern details
– global state is needed
Some ideas for discussion
• Iterative search and learning
– Learning for move ranking/prediction
– Use the ranking to score nodes in the search tree
– Feed search results back as new data for learning
• Learn local regions / global strategy
– learn abstract strategies (e.g., fighting, defense)
– group move sequences together?
Other questions
• Can a computer learn from non-expert humans?
• Can a computer learn to play by playing against itself?
References
• Graepel et al., Learning on graphs in the game of Go, 2001
• Stern et al., Modeling uncertainty in the game of Go, 2004
• Stern et al., Bayesian pattern ranking for move prediction in the game of Go, 2006
• Bouzy et al., Computer Go: an AI oriented survey, 2001
Thanks!