The Role of Knowledge Extraction in Computer Go

Simon Viennot
Japan Advanced Institute of Science and Technology
Kanazawa, Japan
December 15th, 2014

(Kanazawa: on the Sea of Japan; snow during winter)
Japan Advanced Institute of Science and Technology (JAIST)
Master's and doctoral courses:
  Information Science
  Knowledge Science
  Materials Science
Information Science > Artificial Intelligence > Games
Assistant professor (since 2013)
Past and current work
PhD thesis, 2008-2011 (Lille University):
  exact solution of games (Sprouts, Cram, Dots-and-Boxes)
Game of Go, since April 2012 (JAIST):
  existing program, Nomitan, as a base
Section 1
Game of Go
The success of Computer Chess
1997: Deep Blue defeats Kasparov (but it was close)
  αβ search algorithm
  state evaluation function (piece values)
Situation in 2014:
  Magnus Carlsen rated 2881 Elo
  top program rated 3270 Elo on a standard 4-core PC (Houdini)
⇒ Winning probability = 91%
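As a quick check, under the standard Elo winning-probability model (the slide states only the result), the two ratings above give

    P(\text{win}) = \frac{1}{1 + 10^{(2881 - 3270)/400}} \approx 0.91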
Comparison of Chess and Go

            Game of Chess            | Game of Go
Board       8 × 8                    | 19 × 19
Moves       pieces are moved         | stones are added one by one
Origin      300-500 (India)          | 400 BC (China)
Players     600 million (worldwide)  | 60 million (China, Japan, Korea)
Rules of the game of Go (13 × 13 example)
Black and White place one stone alternately.
Goal: surround the widest possible area of the board.
End of the game:
  25 Black points
  26 White points
  ⇒ White wins
Rules of the game of Go: Capture
Capture of one stone
Capture of 5 stones
Capture of stones with one eye
Two eyes? Impossible to capture ⇒ the Black group is alive
Why is Computer Go difficult?
Very large state space:
  19 × 19 board = 361 possible first moves
  the game is long (280 moves on average), with long-term goals
Designing an evaluation function is difficult:
  all the stones look the same (no piece values as in Chess)
  alive vs. dead stones
⇒ αβ search does not work well
Section 2
Current level
Computer Go history
Figure: Strength of Computer Go programs, 2003-2014 (y-axis from 9 kyu up to 9 dan; curves for the top program and for Nomitan (JAIST))
UEC-cup
University of Electro-Communications (Tokyo)
Currently the biggest Computer Go competition, with around 20 participants each year
UEC-cup 2013 and 2014 results

Rank | 2013                 | Strength | 2014        | Strength
1    | Crazy Stone          | ≈ 6d     | Zen         | ≈ 6d
2    | Zen                  | ≈ 6d     | Crazy Stone | ≈ 6d
3    | Aya                  | ≈ 4d     | Aya         | ≈ 4d
4    | Pachi                | ≈ 2d     | Hirabot     | ≈ 2d
5    | MP-Fuego             | ?        | Nomitan     | ≈ 3d
6    | Nomitan              | ≈ 2d     | Gogataki    | < 1d
7    | The Many Faces of Go | ≈ 3d     | LeafQuest   | < 1d

Nomitan's rank: 13th in 2012, 6th in 2013, 5th in 2014.
Computer power
Computer power in the UEC-cup: no limit; 16 to 64 cores are usual.
Cluster of the lab for Computer Go: 19 machines, 204 cores
⇒ parallelization on a cluster
Resulting strength: 16-core machine: 2 dan; 92-core cluster: 3 dan
Section 3
Monte-Carlo Tree Search
Monte-Carlo Tree Search Breakthrough
1. Selection (tree policy)
2. Expansion
3. Simulation (simulation policy) ⇒ Win/Loss
4. Backpropagation
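To make the four phases concrete, here is a minimal, game-agnostic sketch in Python. The `game` interface (`legal_moves`, `play`, `is_terminal`, `winner`, `player_to_move`) is hypothetical glue, not from the talk, and the tree policy uses the UCB formula defined on the next slides.

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # move -> Node
        self.wins = 0        # w_i: wins for the player who moved into this node
        self.visits = 0      # n_i

def ucb(node, C):
    # tree policy value of a child node (UCB, defined on the next slides)
    if node.visits == 0:
        return float("inf")
    return (node.wins / node.visits
            + C * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root, game, iterations=10000, C=0.7):
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the tree policy down to a leaf of the tree
        while node.children:
            node = max(node.children.values(), key=lambda c: ucb(c, C))
        # 2. Expansion: create the children of the leaf
        if not game.is_terminal(node.state):
            for move in game.legal_moves(node.state):
                node.children[move] = Node(game.play(node.state, move), node)
            node = random.choice(list(node.children.values()))
        # 3. Simulation: random playout until the end of the game
        state = node.state
        while not game.is_terminal(state):
            state = game.play(state, random.choice(game.legal_moves(state)))
        winner = game.winner(state)
        # 4. Backpropagation: update the win/visit counts up to the root;
        # a node counts a win when the player who moved into it wins
        while node is not None:
            node.visits += 1
            if node.parent and winner == game.player_to_move(node.parent.state):
                node.wins += 1
            node = node.parent
    # final answer: the most visited move at the root
    return max(root.children, key=lambda m: root.children[m].visits)
```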
Upper Confidence Tree: Definition
For which candidate move should we run the next simulation?
2002, Auer et al.: Upper Confidence Bound (UCB) formula
2006, Kocsis, Szepesvári: UCB formula as the tree policy

UCB formula:

    \mu_i = \frac{w_i}{n_i} + C \cdot \sqrt{\frac{\ln n}{n_i}}

where
  w_i : number of wins of the node
  n_i : number of visits of the node
  n   : number of visits of the parent node

At each step of the selection phase in Monte-Carlo Tree Search, choose the node that maximizes μ_i.
Upper Confidence Tree: Analysis

UCB formula:

    \mu_i = \frac{w_i}{n_i} + C \cdot \sqrt{\frac{\ln n}{n_i}}

Exploitation term w_i / n_i: choose more frequently the nodes with good results.
Exploration term \sqrt{\ln n / n_i}: choose more frequently the nodes not well explored.
C is a parameter to balance the two terms.
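A small numerical illustration (with a hypothetical C = 0.7; the talk does not give Nomitan's value). Under a parent with n = 100 visits, a node with 30 wins in 50 visits and a node with only 2 wins in 6 visits get

    \mu_1 = \frac{30}{50} + 0.7\sqrt{\frac{\ln 100}{50}} \approx 0.81, \qquad
    \mu_2 = \frac{2}{6} + 0.7\sqrt{\frac{\ln 100}{6}} \approx 0.95

so the under-explored node is simulated next, despite its worse winning ratio.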
Asymmetric Tree Growth
Asymmetric growth of the tree developed by MCTS
(Finnsson, PhD thesis, 2012)
Parallelization on a single machine
Monte-Carlo is easy to parallelize (compared to αβ).
Efficient if memory is shared (single machine): threads 1, 2, 3, ... grow the same tree in parallel.
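A minimal sketch of this shared-memory parallelization, continuing the earlier sketch: the costly playouts run in parallel while tree operations are serialized by a lock. `select_and_expand`, `simulate` and `backpropagate` are hypothetical refactorings of phases 1-2, 3 and 4; note that in CPython the GIL limits the real speedup, so actual programs do this in C/C++/Java, often with finer-grained locks or "virtual loss".

```python
import threading

tree_lock = threading.Lock()

def worker(root, game, iterations):
    for _ in range(iterations):
        with tree_lock:
            leaf = select_and_expand(root, game)   # phases 1-2, under the lock
        result = simulate(game, leaf.state)        # phase 3, outside the lock
        with tree_lock:
            backpropagate(leaf, result)            # phase 4, under the lock

def parallel_mcts(root, game, n_threads=3, iterations=10000):
    threads = [threading.Thread(target=worker, args=(root, game, iterations))
               for _ in range(n_threads)]          # threads 1, 2, 3 as in the figure
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max(root.children, key=lambda m: root.children[m].visits)
```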
Section 4
Offline Knowledge
Machine learning: Principle
Offline knowledge
Knowledge is useful in Chess as an "evaluation function".
There is no such evaluation function for Go. But can we use some knowledge?
There are about 800 professional Go players.
⇒ Machine learning from professional game records
Machine learning input
Machine learning input: a collection of game positions and the moves played in those positions.
Local pattern around a candidate move
Red cell = candidate move.
A 3x3 pattern, or a bigger pattern, is extracted around the candidate move.
The same patterns are extracted around each different candidate move.
Machine learning idea
Compare the local patterns and learn evaluation weights: the local patterns of played moves should score better than the local patterns of not-played moves (a sketch follows).
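A minimal sketch of this idea as a softmax model trained by gradient ascent, so that moves actually played get higher probability. The feature extractor `features(position, move)` is hypothetical (the next slide lists the kinds of features used), and the talk's cited method is Coulom's Bradley-Terry model, which is related but not identical.

```python
import math
from collections import defaultdict

weights = defaultdict(float)   # feature id -> learned weight

def score(position, move):
    # exponential of the summed weights of the active features
    return math.exp(sum(weights[f] for f in features(position, move)))

def train_step(position, played_move, legal_moves, lr=0.01):
    total = sum(score(position, m) for m in legal_moves)
    for m in legal_moves:
        p = score(position, m) / total          # model probability of move m
        target = 1.0 if m == played_move else 0.0
        for f in features(position, m):
            weights[f] += lr * (target - p)     # push played moves up, others down

def move_probability(position, move, legal_moves):
    # P(m_j): the selection probability used later by the progressive bias
    total = sum(score(position, m) for m in legal_moves)
    return score(position, move) / total
```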
Machine learning features
Local patterns are not sufficient.
Generalizations of patterns are called "features".
Examples of features:
  pattern shape
  distance to the previous move
  escape from immediate capture
  ... (secret?)
Rémi Coulom, Computing Elo Ratings of Move Patterns in the Game of Go, ICGA Journal, 2007
Usage of Learned Knowledge
How can we include the learned knowledge?
  Replace "random" simulations with "realistic" simulations
  Progressive Widening: limit the number of searched moves
  Progressive Bias: search some moves more
Realistic simulation
Black stones that can be captured ⇒ should be captured in all simulations.
Random simulations: captured only in 50% of the simulations.
Realistic simulations: almost always captured.
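A minimal sketch of a "realistic" simulation policy: instead of a uniformly random move, the playout draws moves with probability proportional to a lightweight knowledge score. `playout_score(state, move)` is a hypothetical, cheaper variant of the learned evaluation above; captures and escapes would get large scores, so they are played almost every time.

```python
import random

def realistic_playout_move(game, state):
    moves = game.legal_moves(state)
    scores = [playout_score(state, m) for m in moves]
    # weighted random choice: high-scoring moves (captures, escapes, good
    # patterns) are selected far more often than in a uniform playout
    return random.choices(moves, weights=scores, k=1)[0]
```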
Progressive Widening
Search only the good candidate moves according to the learned knowledge (a sketch follows).
Typically on 13 × 13, only 15 moves are searched.
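A sketch of one common progressive-widening schedule, under the assumption that the number of candidates grows slowly with the visit count of the node; the exact schedule in Nomitan is not given in the talk, but the µ parameter of the later figure plausibly controls this growth (larger µ = fewer candidates). `move_probability` is the hypothetical learned evaluation from the earlier sketch.

```python
def candidate_moves(node, game, mu=3.0):
    moves = game.legal_moves(node.state)
    # rank the moves once by the learned knowledge, best first
    ranked = sorted(moves,
                    key=lambda m: move_probability(node.state, m, moves),
                    reverse=True)
    k = max(1, int(node.visits ** (1.0 / mu)))   # widen the list as visits grow
    return ranked[:k]                            # only the top k are searched
```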
Progressive bias
New term in the UCB formula:

    \mu_j = \frac{w_j}{n_j} + C \cdot \sqrt{\frac{\ln n}{n_j}} + C_{BT} \cdot \sqrt{\frac{K}{n + K}} \cdot P(m_j)

P(m_j) is the evaluation (selection probability) of move m_j from the machine learning.
⇒ Moves frequently played by professionals will be searched more.
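A direct transcription of this formula into code; the constants C, C_BT and K are tuning parameters whose values the talk does not give, so the defaults below are placeholders.

```python
import math

def biased_ucb(node, prob, C=0.7, C_BT=1.0, K=100.0):
    # prob is P(m_j), the learned probability of the move leading to this node
    if node.visits == 0:
        return float("inf")
    n = node.parent.visits
    exploitation = node.wins / node.visits                  # w_j / n_j
    exploration = C * math.sqrt(math.log(n) / node.visits)  # C * sqrt(ln n / n_j)
    bias = C_BT * math.sqrt(K / (n + K)) * prob             # fades as n grows
    return exploitation + exploration + bias
```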
Progressive bias example

Move | Win. ratio | Visits | BT Selection | Visits with bias
1    | 54.9%      | 4281   | 0.1%         | 1221
2    | 53.5%      | 2834   | 79.7%        | 15683
3    | 53.4%      | 2802   | 14.6%        | 1449
4    | 53.0%      | 2437   | 2.1%         | 365

Move 2, strongly preferred by the learned knowledge (BT selection 79.7%), receives far more visits once the bias is used.
Effect of progressive widening and progressive bias

Figure: Winning rate against Fuego, with and without BT bias (y-axis: winning percentage, 0 to 70; x-axis: µ parameter, 1 to 5; left to right = smaller number of candidate moves)

K. Ikeda, S. Viennot, Efficiency of Static Knowledge Bias in Monte-Carlo Tree Search, Computers and Games, 2013
Section 5
Online Knowledge
Information extraction from the simulations
In MCTS, we collect the "winning ratio" information from the random simulations.
Simulations contain more information, and extracting it can improve the search.
Examples of other information: Ownership, Criticality
Ownership
Ownership = probability of "controlling an area".
The ownership boundary ≈ the territory boundary.
(Possible) usage: search the moves on the boundary more (a sketch of the computation follows).
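A minimal sketch of how ownership can be collected from the playouts: at the end of each simulation, record which player controls each point, and average over all simulations. `game.all_points()` and `game.owner(terminal_state, point)` are hypothetical helpers, not from the talk.

```python
from collections import defaultdict

black_count = defaultdict(int)   # point -> number of playouts where Black owns it
playouts = 0

def record_ownership(game, terminal_state):
    global playouts
    playouts += 1
    for point in game.all_points():
        if game.owner(terminal_state, point) == "black":
            black_count[point] += 1

def ownership(point):
    # near 1.0: Black controls the point; near 0.0: White; near 0.5: boundary
    return black_count[point] / playouts if playouts else 0.5
```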
Criticality
Criticality = correlation between "controlling an area" and "winning".
Usage 1: add moves from the critical area to the search candidates.
Usage 2: search the critical moves more (a sketch follows).
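Continuing the ownership sketch, here is criticality computed as the covariance between "Black owns the point" and "Black wins" over the playouts; this is one common formulation (in the spirit of Coulom 2009), not necessarily the exact formula used in Nomitan.

```python
own_and_win = defaultdict(int)   # point -> playouts where Black owns it AND Black wins
black_wins = 0                   # playouts won by Black

def record_criticality(game, terminal_state, black_won):
    global black_wins
    if black_won:
        black_wins += 1
        for point in game.all_points():
            if game.owner(terminal_state, point) == "black":
                own_and_win[point] += 1

def criticality(point):
    # covariance between owning the point and winning the game; high values
    # mark the points that decide the game (same value seen from White's side)
    p_win = black_wins / playouts
    p_both = own_and_win[point] / playouts
    return p_both - ownership(point) * p_win
```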
Section 6
Examples of current problems
Problem 1, Problem 2, Problem 3 (example board positions shown as figures)
Section 7
Conclusion
Knowledge extraction:
  from professional game records with machine learning
  from the simulations
  included both in the simulations and in the search algorithm
⇒ strong programs

How far are we from professionals?
Densei-sen
Computer-Human match: the top 2 programs of the UEC-cup play against top professionals, with handicap.
4-stone handicap games, 2013 and 2014: Crazy Stone wins, Zen loses.
Article in Wired: "The Mystery of Go"
Currently top amateur level (6 dan).
But if the difference is 3 stones (optimistic), the winning probability against professionals without handicap is... less than 2%.
⇒ Computer Go is still a challenge.
Thank you for listening.
Any questions?
References
1993, Brügmann, Monte-Carlo Go
2003, Bouzy, Helmstetter, Monte-Carlo Go Developments
2006, Kocsis, Szepesvári, Bandit Based Monte-Carlo Planning
2006, Gelly, Teytaud et al., Modification of UCT with Patterns in Monte-Carlo Go
2007, Coulom, Computing Elo Ratings of Move Patterns in the Game of Go
2007, Gelly, Silver, Combining Online and Offline Knowledge in UCT
2009, Coulom, Criticality: a Monte-Carlo Heuristic for Go Programs
2011, Baudiš, Master's thesis, MCTS with Information Sharing
2012, Browne et al., A Survey of Monte Carlo Tree Search Methods