Parallel Programming in Chess Simulations, Part 2
Tyler Patton
Discussion:
Chess Engine Basics
Everything Parallel
What Next?
Background: Scope
First estimate of the number of positions:
64! / (32! × (8!)^2 × (2!)^6) ≈ 10^43 (Shannon)
Tighter upper bound:
≈ 10^50 (Dr. Allis)
Number of possible game variations:
10^120 (the Shannon number)
Given ~10^3 possibilities per move pair and an average game of 40 move pairs
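As a sanity check on Shannon's arithmetic, the position estimate can be computed directly (a quick sketch of the combinatorial formula above, not anything engine-specific):

```python
from math import factorial

# Shannon's first estimate: arrange the 32 starting pieces on 64 squares,
# dividing out the permutations of the 32 empty squares, the two sets of
# 8 identical pawns, and the six interchangeable piece pairs (2! each).
positions = factorial(64) // (factorial(32) * factorial(8) ** 2 * factorial(2) ** 6)

print(f"{positions:.2e}")   # on the order of 10^43
```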
Basics:
Stockfish implementation
• Minimax
• Alpha-Beta pruning
• Bitboards
• Late Move Reductions
• Large transposition table
Basics:
Minimax search
• Concept: maximize the evaluation of your own moves while minimizing the evaluation of your opponent's moves
• Reviews every possible move sequence
• In chess, minimax has a high time cost, since every move sequence is evaluated regardless of the quality of the moves
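A minimal minimax sketch over a toy game tree (nested lists stand in for positions, with leaves as static evaluations; this is illustrative, not Stockfish code):

```python
def minimax(node, maximizing):
    """Plain minimax: every move sequence is evaluated, no pruning."""
    if not isinstance(node, list):          # leaf: static evaluation
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# Toy tree, two plies deep: the maximizer picks the branch whose
# worst case (the minimizer's best reply) is highest.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))   # -> 3
```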
Basics:
Alpha-Beta Pruning
• Allows branches of the search tree to be eliminated
• Alpha: our best score so far along the current path
• Beta: our opponent's best score so far along the current path
• If the current node scores at least beta, the branch is pruned: beta is kept as a minimum, so the opponent already has a line at least this good and will avoid this branch
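The cutoff rule can be sketched on the same kind of toy tree (a sketch, not engine code):

```python
from math import inf

def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta cutoffs: stop expanding a node's children
    once alpha >= beta, because the opponent will avoid this line."""
    if not isinstance(node, list):           # leaf
        return node
    best = -inf if maximizing else inf
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)         # raise our guarantee
        else:
            best = min(best, score)
            beta = min(beta, best)           # lower the opponent's guarantee
        if alpha >= beta:
            break                            # cutoff: prune remaining children
    return best

tree = [[3, 5], [2, 9]]
print(alphabeta(tree, -inf, inf, True))      # -> 3, same answer as minimax;
                                             # the 9 in [2, 9] is never visited
```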
Basics:
Time complexity
• Minimax search: O(b^m)
  where b is the branching factor and m is the search depth
  For chess, b ≈ 35 and m ≈ 100
• Alpha-beta pruning: O(b^(m/2)) with good move ordering, effectively doubling the search depth reachable relative to minimax
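The gap between b^m and b^(m/2) can be seen by counting leaf evaluations on a random implicit tree (an illustrative sketch with b = 4 and m = 6, not chess's values):

```python
import random
from math import inf

B, D = 4, 6                                   # branching factor, depth
visits = {"minimax": 0, "alphabeta": 0}

def leaf(path):
    # Deterministic pseudo-random leaf score keyed by the path to the leaf.
    return random.Random(hash(path)).uniform(-1.0, 1.0)

def minimax(path, depth, maxing):
    if depth == 0:
        visits["minimax"] += 1
        return leaf(path)
    scores = [minimax(path + (i,), depth - 1, not maxing) for i in range(B)]
    return max(scores) if maxing else min(scores)

def alphabeta(path, depth, alpha, beta, maxing):
    if depth == 0:
        visits["alphabeta"] += 1
        return leaf(path)
    best = -inf if maxing else inf
    for i in range(B):
        score = alphabeta(path + (i,), depth - 1, alpha, beta, not maxing)
        if maxing:
            best = max(best, score); alpha = max(alpha, best)
        else:
            best = min(best, score); beta = min(beta, best)
        if alpha >= beta:
            break
    return best

# Both searches agree on the value; alpha-beta just visits fewer leaves.
assert minimax((), D, True) == alphabeta((), D, -inf, inf, True)
print(visits)   # minimax evaluates all 4**6 = 4096 leaves
```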
Basics:
Late Move Reduction (LMR)
• Move ordering gives alpha-beta an ordered list of effective moves to search
• Moves toward the end of the list are unlikely to produce values that raise alpha
• LMR performs a reduced-depth search on a late move and checks the result against alpha
  • If the score is greater than or equal to alpha, the reduction tells us nothing, and a full-depth search is performed
  • If the score is less than alpha, the node is pruned
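The reduce-then-verify step can be sketched inside an alpha-beta loop. This is a negamax-style sketch: `FULL_MOVES` and `REDUCTION` are illustrative constants, and `moves`, `make_move`, and `evaluate` are assumed helpers, not any real engine's API:

```python
from math import inf
import random

FULL_MOVES = 2      # illustrative: first moves in the ordered list get full depth
REDUCTION = 2       # illustrative: depth reduction applied to late moves

def lmr_search(pos, depth, alpha, beta, moves, make_move, evaluate):
    """Negamax alpha-beta with Late Move Reductions (sketch)."""
    if depth == 0:
        return evaluate(pos)
    for i, move in enumerate(moves(pos)):
        child = make_move(pos, move)
        if i >= FULL_MOVES and depth > REDUCTION:
            # Late move: cheap reduced-depth search first.
            score = -lmr_search(child, depth - 1 - REDUCTION, -beta, -alpha,
                                moves, make_move, evaluate)
            if score < alpha:
                continue             # failed low: trust the reduction, prune
            # Scored >= alpha: the reduction proved nothing; fall through
            # to the full-depth re-search below.
        score = -lmr_search(child, depth - 1, -beta, -alpha,
                            moves, make_move, evaluate)
        alpha = max(alpha, score)
        if alpha >= beta:
            break
    return alpha

# Toy usage: positions are move tuples, leaves get deterministic scores.
moves = lambda pos: range(3)
make_move = lambda pos, m: pos + (m,)
evaluate = lambda pos: random.Random(hash(pos)).uniform(-1.0, 1.0)
score = lmr_search((), 4, -inf, inf, moves, make_move, evaluate)
print(round(score, 3))
```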
Basics:
Transposition Table
• Stores the history of search evaluations
• Positions that have been searched are likely to be reached again
• Before a branch is searched, the transposition table is checked and returns the stored result when it can
• Implemented as a hash table
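A toy demonstration of why caching pays off: model a position as the set of moves played, so different move orders transpose to the same position (an illustrative sketch keyed on the position itself, not an engine's Zobrist-hashed table):

```python
import random

MOVES = frozenset(range(6))     # toy game: each of six moves can be played once
D = 4                           # search depth
visits = {"plain": 0, "cached": 0}

def evaluate(pos):
    # Deterministic score for a leaf position (order-independent).
    return random.Random(hash(pos)).uniform(-1.0, 1.0)

def plain(pos, depth):
    visits["plain"] += 1
    if depth == 0:
        return evaluate(pos)
    scores = [plain(pos | {m}, depth - 1) for m in MOVES - pos]
    return max(scores) if len(pos) % 2 == 0 else min(scores)

table = {}                      # transposition table: (position, depth) -> score

def cached(pos, depth):
    key = (pos, depth)
    if key in table:
        return table[key]       # position reached before via another move order
    visits["cached"] += 1
    if depth == 0:
        score = evaluate(pos)
    else:
        scores = [cached(pos | {m}, depth - 1) for m in MOVES - pos]
        score = max(scores) if len(pos) % 2 == 0 else min(scores)
    table[key] = score
    return score

# Same value, far fewer nodes: 517 ordered paths collapse to 57 positions.
assert plain(frozenset(), D) == cached(frozenset(), D)
print(visits)
```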
Parallelization:
Parallelizing Alpha-Beta pruning
• Goal: use multiple processors to simultaneously search different branches of the game tree
• Drawback: dependency on the alpha value
  • Parallel algorithms tend to be less efficient, since each processor searches with a weaker alpha bound
  • Only if the best alpha value is found in the first branch searched does the parallel algorithm perform the same number of iterations as the sequential algorithm
  • Processors depend on each other for updated alpha values, which causes communication locks
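The inefficiency can be made concrete with naive "root splitting": each worker searches one root branch from the original (-inf, inf) window, so none benefits from its siblings' alpha improvements. (CPython threads are shown for structure only; a real engine would use genuinely parallel workers.)

```python
from concurrent.futures import ThreadPoolExecutor
from math import inf

def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):                     # leaf
        return node
    best = -inf if maximizing else inf
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score); alpha = max(alpha, best)
        else:
            best = min(best, score); beta = min(beta, best)
        if alpha >= beta:
            break
    return best

def naive_parallel_root(root):
    # Every worker starts from the weakest possible alpha, so branches that
    # a sequential search would have pruned get searched in full.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(
            lambda child: alphabeta(child, -inf, inf, False), root))
    return max(scores)

tree = [[3, 5], [2, 9], [6, 4]]
print(naive_parallel_root(tree))                       # -> 4
```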
Parallelization:
Principal Variation Splitting (PVS)
• Early technique for parallelizing alpha-beta
• Assumptions:
  • The game tree is well ordered
  • The leftmost path is the best
• Updates alpha after each branch is searched
• Processors work under the same node
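A root-level sketch of PVS under its ordering assumption: the leftmost child (the principal variation) is searched sequentially to establish alpha, and the remaining children are then searched in parallel from that shared bound (illustrative names, not a real engine's):

```python
from concurrent.futures import ThreadPoolExecutor
from math import inf

def alphabeta(node, alpha, beta, maximizing):
    if not isinstance(node, list):                     # leaf
        return node
    best = -inf if maximizing else inf
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing)
        if maximizing:
            best = max(best, score); alpha = max(alpha, best)
        else:
            best = min(best, score); beta = min(beta, best)
        if alpha >= beta:
            break
    return best

def pvs_root(root):
    # 1. Search the principal variation (leftmost child) sequentially.
    alpha = alphabeta(root[0], -inf, inf, False)
    # 2. Search the remaining children in parallel, all sharing the alpha
    #    established by the principal variation, so they can cut off early.
    with ThreadPoolExecutor() as pool:
        rest = pool.map(lambda c: alphabeta(c, alpha, inf, False), root[1:])
        return max([alpha, *rest])

tree = [[5, 7], [2, 9], [6, 4]]   # well ordered: the best branch is leftmost
print(pvs_root(tree))             # -> 5
```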
Parallelization:
Enhanced Principal Variation Splitting (EPVS)
• Simple improvement of PVS
• When a processor runs out of work:
  • Stop all processors at ply P
  • Evaluate the branches 2 ply deeper
  • Split the processors among the tree as in PVS
• Interaction with the transposition table keeps the extra calculations from being redundant
• Increased communication overhead
Parallelization:
Dynamic Tree Splitting (DTS)
• Assumptions:
  • Shared memory
  • Communication cost ≈ 0 (Cray C916/1024 computer)
• Steps:
  • One processor searches from ply 1 to N
  • Every other processor begins processing nodes as in PVS
  • If a processor has no work to do, it broadcasts an offer to help and joins another processor with work to share
  • A split position is chosen and the processors divide the node
  • Evaluation and splitting are looped until the node is complete
Parallelization:
Speedup Comparisons
• PVS:
• EPVS:
• DTS:
[speedup figures were shown in a chart that did not survive this text export]
What Next?
Some things we didn’t have time for
• Bitboards
  • Data structure for storing chess positions
• Younger Brother Wait Concept
  • Master-slave approach, similar to DTS
• GPU implementations
  • e.g. the CPU generates the tree, then the GPU evaluates it
• Neural networks
  • Simulation technique which may allow more processors
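For the bitboard idea, a minimal sketch: one 64-bit integer per piece set, with a single shift acting on every piece at once (a1 = bit 0 ... h8 = bit 63 here; Python ints stand in for uint64):

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def square(file, rank):            # file, rank in 0..7
    return rank * 8 + file

# White pawns on their starting rank (rank index 1).
white_pawns = 0
for f in range(8):
    white_pawns |= 1 << square(f, 1)

def north(bb):                     # advance every pawn one rank in one operation
    return (bb << 8) & MASK64

single_pushes = north(white_pawns)
print(bin(white_pawns).count("1"))        # 8 pawns
print(hex(single_pushes))                 # 0xff0000: all eight pawns on rank 3
```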
What Next?
Looking to the future
• The best algorithm for large numbers of processors and indefinite tree size is unknown
• Optimizations to existing algorithms and techniques are still possible
  • e.g. making a new alpha available to every processor as soon as it is found, rather than when a processor finishes a search
• Explore new algorithms that avoid the communication pitfalls and the reliance on tree structure
Questions?
Sources:
http://ijsetr.org/wp-content/uploads/2015/05/IJSETR-VOL-4-ISSUE-4-1138-1141.pdf
https://www.cis.uab.edu/hyatt/search.html
http://www.top-5000.nl/ps/Parallel%20Alpha-Beta%20Search%20on%20Shared%20Memory%20Multiprocessors.pdf
http://arirang.snu.ac.kr/courses/pp2006/Chapter16.pdf
http://iacoma.cs.uiuc.edu/~greskamp/pdfs/412.pdf
https://www.fide.com/component/handbook/?id=174&view=article
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=42858
http://supertech.csail.mit.edu/papers/dimacs94.pdf
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4031890
http://www.sciencedirect.com/science/article/pii/S1875952111000450
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3135&rep=rep1&type=pdf
http://podelise.ru/tw_files/25875/d-25874275/7z-docs/1.pdf