Parallel Programming in Chess Simulations, Part 2
Tyler Patton

Discussion: Chess Engine Basics, Everything Parallel, What Next?

Background: Scope
• First estimate of the number of positions: 64! / (32! · (8!)^2 · (2!)^6) ≈ 10^43 (Shannon)
• Tighter upper bound: ≈ 10^50 (Dr. Allis)
• Number of possible game variations: 10^120 (the Shannon number), assuming roughly 10^3 possibilities per pair of moves and an average game of 40 move pairs: (10^3)^40 = 10^120

Basics: Stockfish implementation
• Minimax
• Alpha-beta pruning
• Bitboards
• Late move reductions
• Large transposition table

Basics: Minimax search
• Concept: maximize the evaluation of your own move while minimizing the evaluation of your opponent's best reply
• Reviews every possible move sequence
• In chess, minimax has a high time cost since every move sequence is evaluated regardless of how unpromising the move is

Basics: Alpha-beta pruning
• Allows entire branches to be eliminated
• Alpha: the best score we are guaranteed so far on the current path
• Beta: the best score our opponent is guaranteed so far on the current path
• The opponent seeks the minimum, so beta only decreases; once a node's value reaches beta (beta ≤ alpha), the opponent will never allow this line, and we prune the remaining branches

Basics: Time complexity
• Minimax search: O(b^m), where b is the branching factor and m is the search depth
• For chess, b ≈ 35 and m ≈ 100
• Alpha-beta pruning: O(b^(m/2)) with good move ordering, which roughly doubles the reachable search depth compared to minimax

Basics: Late Move Reductions (LMR)
• Alpha-beta searches moves from a list ordered by expected strength, best candidates first
• Moves toward the end of the list are unlikely to produce values that raise alpha
• LMR searches a late move at reduced depth and compares the result against alpha
• If the reduced-depth score is greater than or equal to alpha, we have learned nothing and complete a full-depth search
• If the score is less than alpha, we prune this node

Basics: Transposition table
• Stores the results of previous search evaluations
• Positions that have been searched are likely to be reached again via a different move order
• Before a branch is searched, the transposition table is checked and returns the stored result if one is available
• Implemented as a hash table

Parallelization: Parallelizing alpha-beta pruning
• Goal: use multiple processors to search different branches of the game tree simultaneously
• Drawback: dependency on the alpha value
• Parallel algorithms tend to be less efficient since the shared alpha value is not as tight
• If the best alpha value comes from the first branch searched, the parallel algorithm performs the same number of iterations as the sequential algorithm
• Processors depend on each other for updated alpha values, which causes communication locks

Parallelization: Principal Variation Splitting (PVS)
• Early technique for parallelizing alpha-beta
• Assumptions:
  • The game tree is well ordered
  • The leftmost path is the best
• Updates alpha after each branch is searched
• Processors work under the same node

Parallelization: Enhanced Principal Variation Splitting (EPVS)
• Simple improvement of PVS
• When a processor runs out of work:
  • Stop all processors at ply P
  • Evaluate the branches two ply deeper
  • Split the processors among the tree as in PVS
• Interacts with the transposition table so that further calculations are not redundant
• Increased communication overhead

Parallelization: Dynamic Tree Splitting (DTS)
• Assumptions:
  • Shared memory
  • Communication cost ≈ 0 (Cray C916/1024 computer)
• Steps:
  • One processor searches from ply 1 to N
  • Each other processor begins processing nodes as in PVS
  • If a processor has no work to do, it broadcasts a request for work and joins another processor that has work to share
  • A split position is chosen and the processors divide the node
  • Evaluation and splitting are looped until the node is complete

Parallelization: Speedup comparisons
• Speedup figures for PVS, EPVS, and DTS were presented as a chart in the original slides (numbers not recoverable from this text)

What Next? Some things we didn't have time for
• Bitboards: a data structure for storing chess positions
• Younger Brother Wait Concept: master-slave approach, similar to DTS
• GPU implementations: e.g. the CPU generates the tree, then the GPU evaluates it
• Neural networks: a simulation technique which may allow more processors

What Next? Looking to the future
• The best algorithm for large numbers of processors and indefinite tree size is unknown
• Optimizations to existing algorithms and techniques are still possible, e.g. making a new alpha value available to every processor as soon as it is found rather than when a processor finishes its search
• Explore new algorithms that do not rely on heavy communication or on a particular tree structure

Questions?

Sources:
http://ijsetr.org/wp-content/uploads/2015/05/IJSETR-VOL-4-ISSUE-4-1138-1141.pdf
https://www.cis.uab.edu/hyatt/search.html
http://www.top-5000.nl/ps/Parallel%20Alpha-Beta%20Search%20on%20Shared%20Memory%20Multiprocessors.pdf
http://arirang.snu.ac.kr/courses/pp2006/Chapter16.pdf
http://iacoma.cs.uiuc.edu/~greskamp/pdfs/412.pdf
https://www.fide.com/component/handbook/?id=174&view=article
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=42858
http://supertech.csail.mit.edu/papers/dimacs94.pdf
http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4031890
http://www.sciencedirect.com/science/article/pii/S1875952111000450
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.47.3135&rep=rep1&type=pdf
http://podelise.ru/tw_files/25875/d-25874275/7z-docs/1.pdf
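Appendix: a code sketch of the core ideas. This is a minimal, illustrative Python sketch of two things the slides describe: sequential alpha-beta pruning with a transposition table, and a naive parallel split at the root where every worker starts with the full window, which demonstrates why parallel alpha-beta loses efficiency when the shared alpha is not available. The function names, the toy nested-list tree, and the evaluation values are all invented for illustration; none of this is Stockfish code.

```python
# Toy illustration of two ideas from the slides:
# 1) sequential alpha-beta pruning with a transposition table,
# 2) a naive parallel split at the root, where each worker begins with
#    the full (-inf, inf) window and so cannot benefit from alpha values
#    found by its siblings -- the inefficiency the slides point out.
# All names and the toy tree are illustrative, not from Stockfish.

from concurrent.futures import ThreadPoolExecutor
from math import inf

def alphabeta(node, alpha, beta, maximizing, table=None):
    """Return the minimax value of `node`; leaves are plain ints."""
    if not isinstance(node, list):
        return node                        # leaf: static evaluation
    if table is None:
        table = {}
    key = (repr(node), maximizing)
    if key in table:                       # transposition-table hit
        return table[key]
    value = -inf if maximizing else inf
    for child in node:
        score = alphabeta(child, alpha, beta, not maximizing, table)
        if maximizing:
            value = max(value, score)
            alpha = max(alpha, value)
        else:
            value = min(value, score)
            beta = min(beta, value)
        if beta <= alpha:                  # opponent avoids this line
            break                          # prune remaining siblings
    table[key] = value                     # store result for reuse
    return value

def parallel_root_split(tree, workers=3):
    """Search each root branch concurrently. Every worker starts with
    the full window, so no worker sees alpha values found by the
    others; each also keeps a private transposition table."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(
            lambda child: alphabeta(child, -inf, inf, False), tree)
    return max(scores)

tree = [[3, 5], [6, 9], [1, 2]]
print(alphabeta(tree, -inf, inf, True))   # -> 6
print(parallel_root_split(tree))          # -> 6
```

Both searches return the same value, but the sequential version prunes the last branch after one leaf (alpha is already 6 when it reaches [1, 2]), while each parallel worker searches its branch in full. Schemes like PVS, EPVS, and DTS exist precisely to share tighter bounds and redistribute work between processors.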