Adversarial search Artificial Intelligence Adversarial search Chapter 6, AIMA • At least two agents and a competitive environment: Games, economies. • Games and AI: – Generally considered to require intelligence (to win) – Have to evolve in real-time – Well-defined and limited environment This presentation owes a lot to V. Pavlovic @ Rutgers, who borrowed from J. D. Skrentny, who in turn borrowed from C. Dyer,... Board games © Thierry Dichtenmuller Games & AI perfect info Deterministic Chance Checkers, Chess, Go, Othello Backgammon, Monopoly imperfect info Games and search Traditional search: single agent, searches for its well-being, unobstructed Games: search against an opponent Example: two player board game (chess, checkers, tic-tac-toe,…) Board configuration: unique arrangement of "pieces“ Representing board games as goal-directed search problem (states = board configurations): – – – – Initial state: Current board configuration Successor function: Legal moves Goal state: Winning/terminal board configuration Utility function: Current board configuration Bridge, Poker, Scrabble Example: Tic-tac-toe • Initial state: 3×3 empty table. • Successor function: Players take turns marking ± or | in the table cells. • Goal state: When all the table cells are filled or when either player has three symbols in a row. • Utility function: +1 for three in a row, -1 if the opponent has three in a row, 0 if the table is filled and no-one has three symbols in a row. Initial state ± ± ± ± ± ±| | ±± ±| | ± ± |± ± ± |±± ±| |± | | ± || ± ± Goal state | ± ± ± =0 |±± Utility |± ± Goal state Utility = +1 | ± ± | | Goal state | Utility = -1 1 The minimax principle Example: Tic-tac-toe Assume the opponent plays to win and always makes the best possible move. ± | Your (MAX) move (±) The minimax value for a node = the utility for you of being in that state, assuming that both players (you and the opponent) play optimally from there on to the end. | ± ± | Assignment: Expand this tree to the end of the game. Terminology: MAX = you, MIN = the opponent. Example: Tic-tac-toe Example: Tic-tac-toe ± | Your (MAX) move ± | Your (MAX) move | ± ± | ± ± | Opponent (MIN) move | ± ± | ± | ± | | ± ± | ± ± | ± ± | ± | ± ± | Opponent (MIN) move | ± Your | ± ± (MAX) move | | ± ± | | ± | ± | | ± | ± | | ± ± | ± ± | ± ± | ± ± | ± ± ± | ± | | | ± | | ± ± ± | ± ± | | ± | ± ± | | ± | ± ± | | | | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± ± ± | ± | ± | | ± ± | ± ± | ± ± | ± | ± ± | Your | ± ± (MAX) move | | Minimax +1 value ± ± | | ± | ± | | ± | ± | | ± ± | ± ± | ± ± | ± ± | ± ± ± | ± | | | ± | | ± | | 0 ± ± | ± ± | | ± ± | ± | ± ± 0 0 | ± | ± ± | | ± ± | ± ± 0 +1 | ± | ± ± | | ± ± | ± ± | | ± ± | | ± | ± ± | | ± | ± | | ± | | ± ± | | ± | ± ± | | ± | ± | | ± Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1 Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1 Example: Tic-tac-toe Example: Tic-tac-toe ± | Your (MAX) move ± | Your (MAX) move | ± ± | ± ± | | Minimax value 0 Opponent (MIN) move ± ± | ± | ± | | ± ± | ± ± | ± ± Minimax value ± ± | Your | ± ± (MAX) move | | Minimax +1 value | ± | 0 Opponent (MIN) move | ± 0 | ± | ± | | ± | ± | | ± ± | ± ± | ± ± | ± ± | ± ± ± | ± | | | ± | | ± 0 0 0 0 ± | ± | | ± ± | ± ± | ± ± Minimax value 0 ± ± | | | ± ± | +1 ± ± | Your | ± ± (MAX) move | | Minimax +1 value | ± | 0 | ± 0 0 ± ± | | ± | ± | | ± | ± | | ± ± | ± ± | ± ± | ± ± | ± ± ± | ± | | | ± | | ± | | 0 0 0 0 +1 ± ± | ± ± | | ± | ± ± | | ± | ± ± | ± ± | ± ± | | ± | ± ± | | ± | ± ± | | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± | ± ± | | ± ± | | ± | ± ± | | ± | ± | | ± | | ± ± | | ± | ± ± | | ± | ± | | ± Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1 Utility = +1 Utility = 0 Utility = 0 Utility = 0 Utility = 0 Utility = +1 2 The minimax value The minimax algorithm 1. Start with utilities of terminal nodes 2. Propagate them back to root node by choosing the minimax strategy Minimax value for node n = Utility(n) If n is a terminal node Max(Minimax-values of successors) If n is a MAX node Min(Minimax-values of successors) If n is a MIN node A A 1 B max D D 0 C B -5 C -6 E E 1 min F G -7 7 -5 H 3 I 9 J -6 K L 0 2 M 1 N 3 O 2 High utility favours you (MAX), therefore choose move with highest utility Low utility favours the opponent (MIN), therefore choose move with lowest utility The minimax algorithm The minimax algorithm 1. Start with utilities of terminal nodes 2. Propagate them back to root node by choosing the minimax strategy A A 1 B 1. Start with utilities of terminal nodes 2. Propagate them back to root node by choosing the minimax strategy A A 1 max D D 0 C B -5 Figure borrowed from V. Pavlovic C -6 E E 1 B max D D 0 C B -5 C -6 E E 1 min F -7 7 G -5 H 3 I 9 J -6 K 0 L 2 M 1 N 3 O 2 Figure borrowed from V. Pavlovic Complexity of minimax algorithm • A depth-first search – Time complexity O(bd) – Space complexity O(bd) • Time complexity impossible in real games (with time constraints) except in very simple games (e.g. tic-tac-toe) min F -7 7 G -5 H 3 I 9 J -6 K 0 L 2 M 1 N 3 O 2 Figure borrowed from V. Pavlovic Strategies to improve minimax 1. Remove redundant search paths - symmetries 2. Remove uninteresting search paths - alpha-beta pruning 3. Cut the search short before goal - Evaluation functions 4. Book moves 3 First three levels of the tic-tac-toe state space reduced by symmetry 1. Remove redundant paths 3 states (instead of 9) Tic-tac-toe has mirror symmetries and rotational symmetries 12 states (instead of 8·9 = 72) |± | ± | = | = | | | ± | ± = Image from G. F. Luger, ”Artificial Intelligence”, 2002 2. Remove uninteresting paths Alpha-Beta Example If the player has a better choice m at n’s parent node, or at any node further up, then node n will never be reached. minimax(A,0,4) minimax(node, level, depth limit) Prune the entire path below node m’s parent node (except for the path that m belongs to, and paths that are equal to this path). A α=? B Minimax is depth-first → keep track of highest (α) and lowest (β) values so far. -5 N W Call Stack A N -5 W -3 H -10 I 8 P O 4 D C G F 9 E 0 J Q -6 R S 0 T 3 M 2 U 5 V -7 -9 A Slide adapted from V. Pavlovic Q -6 R 0 S 3 T 5 U -7 V B F G F α=? -5 W -3 H -10 I 8 P O 4 D C β=? N -9 B A Slide adapted from V. Pavlovic α=? max M 2 Call Stack A max min L K X -5 L K minimax(F,2,4) α=? B 9 J Alpha-Beta Example minimax(B,1,4) B β=? P -5 Alpha-Beta Example min I 8 E 0 X -3 max H -10 O 4 D C G F Called alpha-beta pruning. Call Stack A max 9 J Q -6 L K R 0 X -5 E 0 S 3 M 2 T 5 U -7 V -9 F B A Slide adapted from V. Pavlovic 4 Alpha-Beta Example Alpha-Beta Example minimax(F,2,4) is returned to minimax(N,3,4) alpha = 4, maximum seen so far max Call Stack A α=? B min F max G α=? N -5 H -10 W 9 X -3 I 8 P O 4 D C β=? -5 E 0 J Q -6 R S 0 M 2 T 3 U 5 α=? B min L K V -7 F -9 gold: terminal state G α=4 α= N -5 W 9 -5 Alpha-Beta Example B F max min Call Stack G α=4 N -5 H -10 O W 9 X -3 I 8 P O β=? 4 D C -5 J L K Q R S 0 Q -6 R S 0 T 3 M 2 U 5 V -7 -9 F B A Slide adapted from V. Pavlovic T 3 M 2 U 5 α=? B V -7 F -9 gold: terminal state min G α=4 N -5 O W 9 -5 Alpha-Beta Example I 8 P X -3 Slide adapted from V. Pavlovic H -10 β=? 4 D C β=? max O F B A Call Stack A max min E 0 -6 L K minimax(W,4,4) A β=? J Alpha-Beta Example α=? min E 0 gold: terminal state minimax(O,3,4) max I 8 P X -3 Slide adapted from V. Pavlovic H -10 O 4 D C β=? max N F B A Call Stack A max E 0 J L K Q -6 R S 0 T 3 M 2 U 5 V -7 -9 gold: terminal state (depth limit) W O F B A Slide adapted from V. Pavlovic Alpha-Beta Example minimax(O,3,4) is returned to O's beta (-3) < F's alpha (4): Stop expanding O (alpha cut-off) beta = -3, minimum seen so far A max α=? B min F G α=4 N -5 O W -3 H -10 I 8 P β=-3 β= 4 D C β=? max min Call Stack 9 X -5 E 0 J Q -6 R 0 S 3 5 gold: terminal state (depth limit) Slide adapted from V. Pavlovic M 2 T α=? B min L K U -7 F V -9 min G α=4 N -5 O W -3 H -10 I 8 P β=-3 4 D C β=? max O F B A Call Stack A max 9 X -5 E 0 J Q -6 L K R 0 S 3 M 2 T 5 gold: terminal state (depth limit) U -7 V -9 O F B A Slide adapted from V. Pavlovic 5 Alpha-Beta Example Alpha-Beta Example Why? Smart opponent selects W or worse → O's upper bound is –3 So MAX shouldn't select O:-3 since N:4 is better minimax(F,2,4) is returned to alpha not changed (maximizing) α=? B min F G α=4 N -5 H -10 O W 9 X -3 I 8 P β=-3 4 D C β=? max min Call Stack A max -5 E 0 J Q -6 R S 0 M 2 T 3 U 5 α=? B min L K V -7 F -9 gold: terminal state (depth limit) min G α=4 N -5 O W 9 -5 Alpha-Beta Example I 8 P X -3 Slide adapted from V. Pavlovic H -10 β=-3 4 D C β=? max O F B A Call Stack A max E 0 J L K Q -6 R S 0 T 3 M 2 U 5 V -7 -9 gold: terminal state (depth limit) F B A Slide adapted from V. Pavlovic Alpha-Beta Example minimax(B,1,4) is returned to minimax(G,2,4) beta = 4, minimum seen so far α=? B min F G α=4 N -5 H -10 O W 9 X -3 I 8 P β=-3 4 D C β=4 β= max min Call Stack A max -5 E 0 J Q -6 R S 0 M 2 T 3 U 5 α=? B min L K F min -9 N H -10 O W I 8 P 9 X -3 Slide adapted from V. Pavlovic -5 E 0 J L K Q -6 R S 0 T 3 M 2 U 5 V -7 -9 gold: terminal state (depth limit) Alpha-Beta Example minimax(B,1,4) is returned to minimax(A,0,4) is returned to alpha = -5, maximum seen so far Call Stack A max α=? B min F max G α=4 N -5 O W -3 H -10 I 8 P β=-3 4 D C β=-5 β=4 β= 9 X -5 E 0 J Q -6 0 S 3 T 5 gold: terminal state (depth limit) Slide adapted from V. Pavlovic M 2 U -7 α=-5 α=? B F V min B A G α=4 N -5 O W -3 H -10 I 8 P β=-3 4 D C β=-5 β=4 β= max -9 Call Stack A max min L K R G B A Slide adapted from V. Pavlovic Alpha-Beta Example beta = -5, minimum seen so far min -5 β=-3 4 B A gold: terminal state (depth limit) G α=4 D C β=4 β= max V -7 Call Stack A max 9 X -5 E 0 J Q -6 L K R 0 S 3 M 2 T 5 gold: terminal state (depth limit) U -7 V -9 A Slide adapted from V. Pavlovic 6 Alpha-Beta Example Alpha-Beta Example minimax(C,1,4) Call Stack A max α=-5 α=? B min C F G α=4 N -5 H -10 O W 9 X -3 I 8 P β=-3 4 D C β=? β=-5 β=4 β= max min minimax(H,2,4) -5 E 0 J Q -6 R S 0 M 2 T 3 U 5 α=-5 α=? B min L K min -9 F G α=4 N -5 O W Alpha-Beta Example I 8 P 9 X -3 Slide adapted from V. Pavlovic H -10 β=-3 4 C A gold: terminal state (depth limit) C D C β=? β=-5 β=4 β= max V -7 Call Stack A max -5 E 0 J L K Q -6 R S 0 T 3 M 2 U 5 V -7 -9 gold: terminal state (depth limit) H C A Slide adapted from V. Pavlovic Alpha-Beta Example minimax(C,1,4) is returned to beta = -10, minimum seen so far Call Stack A max α=-5 α=? B min C F G α=4 N -5 H -10 O W 9 X -3 I 8 P β=-3 4 D C β=-10 β=? β=-5 β=4 β= max min C's beta (-10) < A's alpha (-5): Stop expanding C (alpha cut-off) -5 E 0 J Q -6 R S 0 M 2 T 3 U 5 α=-5 α=? B min L K min -9 F N W F max min Call Stack C G α=4 N -5 O W -3 I 8 P β=-3 4 H -10 9 D X -5 E 0 J Q -6 0 S 3 M 2 T 5 gold: terminal state (depth limit) Slide adapted from V. Pavlovic -5 J L K Q -6 R S 0 T 3 M 2 U 5 V -7 -9 C A gold: terminal state (depth limit) Slide adapted from V. Pavlovic U -7 α=-5 α=0 α=? B V min D A C F G α=4 N -5 O W -3 H -10 I 8 P β=-3 4 D C β=-10 β=? β=-5 β=4 β= max -9 Call Stack A max min L K R 9 E 0 minimax(D,1,4) is returned to A C β=-10 β=? D Alpha-Beta Example α=-5 α=? B I 8 P X -3 minimax(D,1,4) β=-5 β=4 β= H -10 O Alpha-Beta Example min -5 β=-3 4 Slide adapted from V. Pavlovic max G α=4 C A gold: terminal state (depth limit) C C β=-10 β=? β=-5 β=4 β= max V -7 Call Stack A max 9 X -5 E 0 J Q -6 L K R 0 S 3 M 2 T 5 gold: terminal state (depth limit) U -7 V -9 A Slide adapted from V. Pavlovic 7 Alpha-Beta Example Alpha-Beta Example minimax(D,1,4) is returned to α=-5 α=0 α=? B min C F G α=4 N -5 H -10 O W 9 X -3 I 8 P β=-3 4 D C β=-10 β=? β=-5 β=4 β= max min Which nodes will be expanded? A max -5 E 0 J Q R S 0 T 3 M 2 U 5 α=-5 α=0 α=? B min -9 F G α=4 N -5 O W Alpha-Beta Example D I K L K α=5 Q 9 -5 E J 8 -6 Call Stack E β=-7 0 P X -3 Slide adapted from V. Pavlovic H -10 β=-3 4 A gold: terminal state (depth limit) C C β=-10 β=? β=-5 β=4 β= max V -7 Which nodes will be expanded? A max min L K -6 Call Stack R S 0 T 3 M M α=-7 2 U 5 V -7 -9 A All gold: terminal state (depth limit) Slide adapted from V. Pavlovic Alpha-Beta Example minimax(D,1,4) is returned to α=-5 α=0 α=? B min C F G α=4 N -5 O W -3 H -10 I 8 P β=-3 4 D C β=-10 β=? β=-5 β=4 β= max min What if we expand Call from right to left? Stack A max 9 X -5 E 0 J Q -6 R 0 S 3 M 2 T 5 α=-5 α=0 α=? B min L K U -7 V min gold: terminal state (depth limit) Slide adapted from V. Pavlovic Alpha-Beta pruning rule Stop expanding max node n if α(n) > β higher in the tree min node n if β(n) < α higher in the tree A C F G α=4 N -5 O W -3 H -10 I J 8 Q 9 X -5 E E β=-7 0 P β=-3 4 D C β=-10 β=? β=-5 β=4 β= max -9 What if we expand Call from right to left? Stack A max -6 L K R S 0 3 M M α=-7 2 T 5 gold: terminal state (depth limit) U -7 V -9 Only 4 A Slide adapted from V. Pavlovic Alpha-Beta pruning rule Stop expanding max node n if α(n) > β higher in the tree min node n if β(n) < α higher in the tree α= 4 ββ==84 α= α=24 β=2 β=3 β=3 α= α=48 β=2 α= 3 β=3 β=2 Which nodes will not be expanded when expanding from left to right? 8 Alpha-Beta pruning rule Stop expanding max node n if α(n) > β higher in the tree min node n if β(n) < α higher in the tree Alpha-Beta pruning rule Stop expanding max node n if α(n) > β higher in the tree min node n if β(n) < α higher in the tree α= 3 ββ==78 α= 4 ββ==92 Which nodes will not be expanded when expanding from left to right? Alpha-Beta pruning rule Stop expanding max node n if α(n) > β higher in the tree min node n if β(n) < α higher in the tree β=9 ββ==34 β=3 α= 8 β=2 α=32 α= β=3 α= 9 β=2 Which nodes will not be expanded when expanding from right to left? 3. Cut the search short • Use depth-limit and estimate utility for non-terminal nodes (evaluation function) – Static board evaluation (SBE) – Must be easy to compute Example, chess: SBE = α " Material Balance"+ β " Center Control"+γ ... Material balance = value of white pieces – value of black pieces, where pawn = +1, knight & bishop = +3, rook = +5, queen = +9, king = ? Which nodes will not be expanded when expanding from right to left? http://en.wikipedia.org/wiki/Computer_chess Leaf evaluation For most chess positions, computers cannot look ahead to all final possible positions. Instead, they must look ahead a few plies and then evaluate the final board position. The algorithm that evaluates final board positions is termed the "evaluation function", and these algorithms are often vastly different between different chess programs. Nearly all evaluation functions evaluate positions in units and at the least consider material value. Thus, they will count up the amount of material on the board for each side (where a pawn is worth exactly 1 point, a knight is worth 3 points, a bishop is worth 3 points, a rook is worth 5 points and a queen is worth 9 points). The king is impossible to value since its loss causes the loss of the game. For the purposes of programming chess computers, however, it is often assigned a value of appr. 200 points. The parameters (α,β,γ,...) can be learned (adjusted) from experience. 4. Book moves • Build a database (look-up table) of endgames, openings, etc. • Use this instead of minimax when possible. Evaluation functions take many other factors into account, however, such as pawn structure, the fact that doubled bishops are usually worth more, centralized pieces are worth more, and so on. The protection of kings is usually considered, as well as the phase of the game (opening, middle or endgame). 9 http://en.wikipedia.org/wiki/Computer_chess Games with chance Using endgame databases … Nalimov Endgame Tablebases, which use state-of-the-art compression techniques, require 7.05 GB of hard disk space for all five-piece endings. It is estimated that to cover all the six-piece endings will require at least 1 terabyte. Seven-piece tablebases are currently a long way off. While Nalimov Endgame Tablebases handle en passant positions, they assume that castling is not possible. This is a minor flaw that is probably of interest only to endgame specialists. More importantly, they do not take into account the fifty move rule. Nalimov Endgame Tablebases only store the shortest possible mate (by ply) for each position. However in certain rare positions, the stronger side cannot force win before running into the fifty move rule. A chess program that searches the database and obtains a value of mate in 85 will not be able to know in advance if such a position is actually a draw according to the fifty move rule, or if it is a win, because there will be a piece exchange or pawn move along the way. Various solutions including the addition of a "distance to zero" (DTZ) counter have been proposed to handle this problem, but none have been implemented yet. • Dice games, card games,... • Extend the minimax tree with chance layers. A max α=4 α= Compute the expected value over outcomes. 50/50 50/50 4 .5 Select move with the highest expected value. B .5 D β=6 2 9 chance -2 .5 C β=2 7 .5 50/50 50/50 E β=0 6 5 min β=-4 0 8 -4 Animation adapted from V. Pavlovic 10
© Copyright 2026 Paperzz