The Role of Knowledge Extraction in Computer Go
Simon Viennot
Japan Advanced Institute of Science and Technology (JAIST)
December 15th, 2014

JAIST
Located in Kanazawa, Japan, on the Sea of Japan coast (snow during winter).
Master's and doctoral courses in three schools: Information Science, Knowledge Science, Materials Science.
My area: Information Science > Artificial Intelligence > Games.
Assistant professor (since 2013).

Past and current work
PhD thesis, 2008-2011 (Lille University): exact solution of games (Sprouts, Cram, Dots-and-Boxes).
Game of Go since April 2012 (JAIST), with the existing program Nomitan as a base.

Section 1: Game of Go

The success of Computer Chess
1997: Deep Blue defeats Kasparov, but the match is close.
Ingredients: the αβ search algorithm and a state evaluation function (piece values).
Situation in 2014: Magnus Carlsen is rated 2881 Elo, while the top program, Houdini, is rated 3270 Elo on a standard 4-core PC.
⇒ Winning probability of the program ≈ 91%.
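That figure is consistent with the standard Elo expected-score formula. A quick check (my reconstruction; the slides state only the result, and the exact value depends on how draws are counted):

```latex
% Elo expected score of player A against player B:
%   E_A = 1 / (1 + 10^{(R_B - R_A)/400})
% Houdini (3270) vs. Carlsen (2881): a 389-point gap.
\[
E_{\mathrm{Houdini}}
  = \frac{1}{1 + 10^{(2881 - 3270)/400}}
  = \frac{1}{1 + 10^{-0.97}}
  \approx 0.90
\]
```

Counting draws as half-wins, this expected score of roughly 0.90-0.91 is what the slide reports as the winning probability.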
Comparison of Chess and Go

                Chess                       Go
Board           8 × 8                       19 × 19
Moves           pieces are moved            stones are added one by one
Origin          300-500 AD (India)          around 400 BC (China)
Players         600 million (worldwide)     60 million (mostly China, Japan, Korea)

Rules of the game of Go (13 × 13 example)
Black and White put one stone alternately.
Goal: surround the widest possible area of the board.
At the end of the game, the surrounded points are counted: for example, 25 Black points against 26 White points ⇒ White wins.

Rules of the game of Go: Capture
A stone or a group of stones is captured when it has no liberties left: one stone, 5 stones, or even a group with one eye can be captured.
Two eyes? Impossible to capture ⇒ the Black group is alive. (A liberty-counting sketch follows.)
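The capture rule is easy to state as an algorithm: a group is captured when it has no liberties (adjacent empty points). A minimal sketch in Python, not taken from the talk (the board representation and function names are my own):

```python
# Board: dict mapping (x, y) -> 'B' or 'W'; empty points are absent.
SIZE = 13  # 13 x 13 example, as in the slides

def neighbors(p):
    """The 2 to 4 on-board points adjacent to p."""
    x, y = p
    return [(x + dx, y + dy)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < SIZE and 0 <= y + dy < SIZE]

def group_and_liberties(board, start):
    """Flood fill: the connected group containing `start` and its liberties."""
    color = board[start]
    group, liberties, frontier = {start}, set(), [start]
    while frontier:
        for q in neighbors(frontier.pop()):
            if q not in board:
                liberties.add(q)                 # empty point = liberty
            elif board[q] == color and q not in group:
                group.add(q)
                frontier.append(q)
    return group, liberties

def remove_if_captured(board, point):
    """Remove the group containing `point` if it has no liberties.
    Returns the number of captured stones (0 if the group is alive)."""
    group, liberties = group_and_liberties(board, point)
    if liberties:
        return 0
    for p in group:
        del board[p]
    return len(group)
```

After White plays a stone, `remove_if_captured` would be called on each adjacent Black point. A group with two eyes always keeps two liberties that the opponent cannot legally fill, which is why it is alive.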
Why is Computer Go difficult?
Very large state space: a 19 × 19 board means 361 possible first moves, and the game is long (280 moves on average), with long-term goals.
Designing an evaluation function is difficult: all the stones look the same (no piece values as in chess), and telling alive from dead stones is hard.
⇒ αβ search does not work well.

Section 2: Current level

[Figure: Strength of Computer Go programs, 2003-2014. The y-axis ranges from 9 kyu to 9 dan; curves show the top program and Nomitan (JAIST).]

UEC-cup
Organized by the University of Electro-Communications (Tokyo); currently the biggest Computer Go competition, with around 20 participants each year.

UEC-cup results, 2013 and 2014

Rank   2013                         2014
1      Crazy Stone (≈6d)            Zen (≈6d)
2      Zen (≈6d)                    Crazy Stone (≈6d)
3      Aya (≈4d)                    Aya (≈4d)
4      Pachi (≈2d)                  Hirabot (≈2d)
5      MP-Fuego (?)                 Nomitan (≈3d)
6      Nomitan (≈2d)                Gogataki (<1d)
7      The Many Faces of Go (≈3d)   LeafQuest (<1d)

Nomitan's rank: 13th in 2012, 6th in 2013, 5th in 2014.

Computer power
No limit in the UEC-cup; 16 to 64 cores are usual.
Cluster of the lab for Computer Go: 19 machines, 204 cores ⇒ parallelization on a cluster.
16-core machine: 2 dan. 92-core cluster: 3 dan.

Section 3: Monte-Carlo Tree Search

Monte-Carlo Tree Search: the breakthrough
Four steps, repeated many times: Selection (tree policy), Expansion, Simulation (simulation policy, ending in a win or a loss), and Backpropagation of the result.

Upper Confidence Tree: Definition
For which candidate move should we run the next simulation?
2002, Auer et al.: Upper Confidence Bound (UCB) formula.
2006, Kocsis, Szepesvári: UCB formula as the tree policy.
With w_i the number of wins of the node, n_i the number of visits of the node, and n the number of visits of the parent node, the UCB formula is
\[
\mu_i = \frac{w_i}{n_i} + C \sqrt{\frac{\ln n}{n_i}}
\]
At each step of the selection phase of Monte-Carlo Tree Search, choose the node that maximizes µ_i (a code sketch follows).
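A minimal sketch of the selection and backpropagation steps built around this formula (illustrative only: the class and function names are mine, the value of C is a placeholder, and a real program flips the point of view of the result at each tree level):

```python
import math

C = 0.7  # exploration constant; the actual value is tuned per program

class Node:
    def __init__(self, move=None, parent=None):
        self.move, self.parent = move, parent
        self.children = []
        self.wins = 0    # w_i
        self.visits = 0  # n_i

def ucb_score(node):
    """mu_i = w_i / n_i + C * sqrt(ln n / n_i)."""
    if node.visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = node.wins / node.visits
    exploration = C * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploitation + exploration

def select(root):
    """Selection step: descend the tree, maximizing mu_i at each level."""
    node = root
    while node.children:
        node = max(node.children, key=ucb_score)
    return node

def backpropagate(node, won):
    """Backpropagation step: update win/visit counts up to the root."""
    while node is not None:
        node.visits += 1
        node.wins += won  # 1 for a win, 0 for a loss, from the node's side
        node = node.parent
```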
Upper Confidence Tree: Analysis
Exploitation term w_i / n_i: choose more frequently the nodes with good results.
Exploration term sqrt(ln n / n_i): choose more frequently the nodes that are not well explored.
C is a parameter that balances the two terms.

Asymmetric Tree Growth
MCTS develops the tree asymmetrically, spending more simulations on the promising branches (Finnsson, PhD thesis, 2012).

Parallelization on a single machine
Monte-Carlo search is easy to parallelize (compared to αβ), and it is efficient when memory is shared (single machine): several threads work on the same tree, as in the sketch below.
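A sketch of this shared-memory parallelization (my illustration, not Nomitan's code: it reuses `Node` and `select` from the sketch above, protects the tree with one lock, and adds the common "virtual loss" trick so that threads do not all pile onto the same node; in CPython the global interpreter lock would prevent a real speedup, so strong programs implement this in C++ with finer-grained updates):

```python
import threading

tree_lock = threading.Lock()
VIRTUAL_LOSS = 1  # visits added in advance to make a busy path look worse

def worker(root, n_simulations, simulate):
    """One search thread; all threads share the same tree."""
    for _ in range(n_simulations):
        with tree_lock:
            leaf = select(root)
            node = leaf
            while node is not None:        # apply virtual loss along the path
                node.visits += VIRTUAL_LOSS
                node = node.parent
        won = simulate(leaf)               # playout runs outside the lock
        with tree_lock:
            node = leaf
            while node is not None:        # real update replaces virtual loss
                node.visits += 1 - VIRTUAL_LOSS
                node.wins += won
                node = node.parent

def parallel_search(root, simulate, n_threads=4, sims_per_thread=1000):
    threads = [threading.Thread(target=worker,
                                args=(root, sims_per_thread, simulate))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```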
Section 4: Offline Knowledge

Machine learning: Principle
Knowledge is useful in Chess in the form of an evaluation function, but no such evaluation function exists for Go. Can we still use some knowledge?
There are about 800 professional Go players ⇒ machine learning from professional game records.

Machine learning: Input
The input is a collection of game positions together with the move played in each position.
For each candidate move (shown as a red cell on the slides), a local pattern is extracted around it: a 3×3 pattern, or a bigger one. Different candidate moves give different patterns.

Machine learning: Idea
Compare the local patterns and learn evaluation weights, so that the local patterns of played moves score better than the local patterns of not-played moves.

Machine learning: Features
Local patterns alone are not sufficient; their generalization is called "features".
Examples of features: pattern shape, distance to the previous move, escape from an immediate capture... (the full list is often kept secret). A sketch of how such feature weights can be trained follows.
Rémi Coulom, Computing Elo ratings of move patterns in the game of Go, ICGA Journal, 2007.
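The sketch below shows one simple way such weights can be trained: gradient ascent on the log-likelihood of the professional move under a softmax over feature scores. This is a stand-in for illustration, not Coulom's actual algorithm (he fits a generalized Bradley-Terry model by minorization-maximization), and all names are hypothetical:

```python
import math
from collections import defaultdict

weights = defaultdict(float)  # feature name -> learned weight

def score(move_features):
    """A move's score is the sum of the weights of its features."""
    return sum(weights[f] for f in move_features)

def train_position(candidates, played_index, lr=0.01):
    """One training step on one position.
    candidates: list of feature lists, one per legal move.
    played_index: index of the move the professional actually played."""
    exps = [math.exp(score(fs)) for fs in candidates]
    total = sum(exps)
    for i, fs in enumerate(candidates):
        p = exps[i] / total                  # model probability of move i
        target = 1.0 if i == played_index else 0.0
        for f in fs:
            weights[f] += lr * (target - p)  # softmax log-loss gradient

# Hypothetical features of one candidate move:
# ["pattern3x3:0x1F2A", "dist_prev:2", "escapes_capture"]
```

After training, the normalized scores play the role of the selection probabilities P(m_j) used below.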
Usage of the Learned Knowledge
How can we include the learned knowledge?
Replace random simulations by "realistic" simulations.
Progressive widening: limit the number of searched moves.
Progressive bias: search some moves more.

Realistic simulations
Example: a group of Black stones is captured on the board, so it should be captured in all simulations. With purely random simulations it is captured in only 50% of the simulations; with realistic simulations it is almost always captured.

Progressive Widening
Search only the good candidates according to the learned knowledge. Typically on 13 × 13, only 15 moves are searched.

Progressive Bias
A new term is added to the UCB formula:
\[
\mu_j = \frac{w_j}{n_j} + C \sqrt{\frac{\ln n}{n_j}} + C_{BT} \cdot \sqrt{\frac{K}{n+K}} \cdot P(m_j)
\]
where P(m_j) is the evaluation (selection probability) of move m_j from the machine learning.
⇒ Moves frequently played by professionals will be searched more. (A code sketch of this biased formula follows the example below.)

Progressive bias example

Move   Win ratio   Visits   BT Selection   Visits with bias
1      54.9%       4281     0.1%           1221
2      53.5%       2834     79.7%          15683
3      53.4%       2802     14.6%          1449
4      53.0%       2437     2.1%           365

[Figure: Effect of progressive widening and progressive bias. Winning rate against Fuego (y-axis, 0-70%) as a function of the µ parameter (x-axis, 1 to 5; left to right means fewer candidate moves), with and without the BT bias.]

K. Ikeda, S. Viennot, Efficiency of Static Knowledge Bias in Monte-Carlo Tree Search, Computers and Games, 2013.
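In code, the bias is one extra term in the selection score. A sketch (my illustration, reusing `Node` from the earlier sketch; the constants C, C_BT, and K are placeholders, not the values used in Nomitan):

```python
import math

C = 0.7      # exploration constant
C_BT = 0.35  # weight of the learned bias
K = 1000     # controls how fast the bias fades as the parent is visited

def biased_ucb(node, prior):
    """mu_j = w_j/n_j + C*sqrt(ln n / n_j) + C_BT*sqrt(K/(n+K)) * P(m_j).
    `prior` is P(m_j), the move's probability under the learned model."""
    n_parent = node.parent.visits
    if node.visits == 0:
        return float("inf")
    value = node.wins / node.visits
    explore = C * math.sqrt(math.log(n_parent) / node.visits)
    bias = C_BT * math.sqrt(K / (n_parent + K)) * prior
    return value + explore + bias
```

Because the bias term shrinks like sqrt(K/(n+K)) as the parent accumulates visits, the learned knowledge dominates early and the simulation statistics take over later.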
Section 5: Online Knowledge

Information extraction from the simulations
In MCTS, we collect the "winning ratio" information from the random simulations, but the simulations contain more information, and extracting it can improve the search.
Examples of other information: ownership and criticality.

Ownership
Ownership = probability of controlling an area.
The ownership boundary approximately matches the territory boundary.
Possible usage: search more the moves on the boundary.

Criticality
Criticality = correlation between "controlling an area" and "winning".
Usage 1: add moves from the critical area to the search candidates.
Usage 2: search the critical moves more.
A sketch of how both statistics can be accumulated from playouts follows.
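A sketch of the bookkeeping (my formulation: ownership as the probability that Black controls a point, and criticality in a simple covariance form; Coulom (2009) gives a slightly different but related definition):

```python
from collections import defaultdict

class SimulationStats:
    """Accumulates ownership and criticality over finished playouts.
    Each playout reports, for every point, who controls it at the end
    (+1 Black, -1 White) and whether Black won (1 or 0)."""
    def __init__(self):
        self.n = 0
        self.black_wins = 0
        self.owned_by_black = defaultdict(int)
        self.owned_and_won = defaultdict(int)

    def record(self, ownership, black_won):
        self.n += 1
        self.black_wins += black_won
        for point, owner in ownership.items():
            if owner == +1:
                self.owned_by_black[point] += 1
                self.owned_and_won[point] += black_won

    def ownership(self, point):
        """Probability that Black controls the point."""
        return self.owned_by_black[point] / self.n

    def criticality(self, point):
        """Covariance between 'Black owns the point' and 'Black wins'."""
        p_own = self.owned_by_black[point] / self.n
        p_win = self.black_wins / self.n
        p_both = self.owned_and_won[point] / self.n
        return p_both - p_own * p_win
```

Points whose ownership is close to 0.5 lie near the territory boundary; points with high criticality are those whose control correlates most with winning, so the search can be steered toward them.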
Section 6: Examples of current problems
(Problem 1, Problem 2, Problem 3: board positions shown on the slides.)

Section 7: Conclusion

Conclusion
Knowledge extraction, both from professional game records with machine learning and from the simulations themselves, is included in the simulations and in the search algorithm ⇒ strong programs.

How far are we from professionals?

Densei-sen (Computer-Human match)
The top 2 programs of the UEC-cup play against top professionals, with a handicap.
In the 4-stone handicap games of 2013 and 2014, Crazy Stone won and Zen lost (see the article "The Mystery of Go" in Wired).

Programs are currently at top amateur level (6 dan). But even if the difference is only 3 stones (an optimistic estimate), the winning probability against professionals without handicap is less than 2%.
⇒ Computer Go is still a challenge.

Thank you for listening. Any questions?

References
1993, Brügmann, Monte-Carlo Go.
2003, Bouzy, Helmstetter, Monte-Carlo Go developments.
2006, Kocsis, Szepesvári, Bandit based Monte-Carlo planning.
2006, Gelly, Teytaud et al., Modification of UCT with patterns in Monte-Carlo Go.
2007, Coulom, Computing Elo ratings of move patterns in the game of Go.
2007, Gelly, Silver, Combining online and offline knowledge in UCT.
2009, Coulom, Criticality: a Monte-Carlo heuristic for Go programs.
2011, Baudiš, Master thesis, MCTS with information sharing.
2012, Browne et al., A Survey of Monte Carlo Tree Search Methods.