DCP 1172 Introduction to Artificial Intelligence
Lecture notes for Chap. 6 [AIMA]
Chang-Sheng Chen

This time: Outline
• Adversarial search - game playing
• The minimax algorithm
• Resource limitations
• Alpha-beta pruning
• Elements of chance

Game Playing Search
• Why study games?
• Why is search a good idea?

Why Study Games? (1)
• Game playing was one of the first tasks undertaken in AI.
• By 1950, chess had been studied by many forerunners of AI (e.g., Claude Shannon, Alan Turing).
• For AI researchers, the abstract nature of games makes them appealing objects of study:
• the state of a game is easy to represent,
• agents are usually restricted to a small number of actions,
• and the outcomes of actions are defined by precise rules.

Why Study Games? (2)
• Games are interesting because they are too hard to solve exactly.
• Games require the ability to make some decision even when calculating the optimal decision is infeasible.
• Games also penalize inefficiency severely.
• Game-playing research has therefore spawned a number of interesting ideas on how to make the best possible use of time.

Why is search a good idea?
• Ignoring computational complexity, games are a perfect application for complete search.
• Some major assumptions we have been making:
• Only an agent's actions change the world.
• The world is deterministic and fully observable.
• These assumptions hold in many games.
• Of course, ignoring complexity is a bad idea, so games are a good place to study resource-bounded search.

What kind of games?
• Abstraction: to describe a game we must capture every relevant aspect of the game.
Such as:
• Chess
• Tic-tac-toe
• …
• Fully observable environments: such games are characterized by perfect information.
• Search: game playing then consists of a search through possible game positions.
• Unpredictable opponent: the opponent introduces uncertainty, so game playing must deal with contingency problems.

Searching for the next move
• Complexity: many games have a huge search space.
• Chess: b ≈ 35, m ≈ 100, so the game tree has about 35^100 nodes. If each node takes about 1 ns to explore, then each move would take about 10^50 millennia to calculate.
• Resource limits (e.g., time, memory) mean the optimal solution is not feasible, so we must approximate:
1. Pruning: makes the search more efficient by discarding portions of the search tree that cannot improve the result.
2. Evaluation functions: heuristics to estimate the utility of a state without an exhaustive search.

Two-player games
• A game formulated as a search problem:
• Initial state: board position and whose turn it is
• Successor function: definition of the legal moves
• Terminal test: conditions for when the game is over
• Utility function: a numeric value that describes the outcome of the game, e.g., -1, 0, +1 for loss, draw, win (a.k.a. payoff function)

Game vs. search problem
[figure]

Example: Tic-Tac-Toe
[figure]

Type of games
[figures, two slides]

Generate Game Tree
[figures, four slides: the tic-tac-toe game tree grown level by level; 1 ply = 1 move]

A subtree
[figure: a subtree of the tic-tac-toe game tree, with terminal positions labeled win / lose / draw]

What is a good move?
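To make the "game as a search problem" formulation above concrete, here is a hypothetical tic-tac-toe encoding of its four components (initial state, successor function, terminal test, utility function). All names and the flat-tuple board encoding are illustrative choices of mine, not code from the course:

```python
# Hypothetical tic-tac-toe encoding of a game as a search problem.
# Winning lines: rows, columns, diagonals (board is a flat 9-tuple).
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def initial_state():
    """Initial state: empty board; 'x' (MAX) moves first."""
    return (None,) * 9, 'x'

def successors(state):
    """Successor function: place the current player's mark on any empty cell."""
    board, player = state
    nxt = 'o' if player == 'x' else 'x'
    for i, cell in enumerate(board):
        if cell is None:
            yield i, (board[:i] + (player,) + board[i + 1:], nxt)

def winner(board):
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def is_terminal(board):
    """Terminal test: someone has won, or the board is full."""
    return winner(board) is not None or all(c is not None for c in board)

def utility(board):
    """Utility: +1 for an 'x' (MAX) win, -1 for an 'o' (MIN) win, 0 for a draw."""
    return {'x': 1, 'o': -1, None: 0}[winner(board)]
```

With these four pieces, the game tree sketched on the slides is exactly what a recursive search over `successors` generates.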
[the same subtree figure, repeated]

MiniMax
• Perfect play for deterministic environments with perfect information.
• From among the moves available to you, take the best one,
• where the best one is determined by a search using the minimax strategy.

The minimax algorithm
• Basic idea: choose the move with the highest minimax value = the best achievable payoff against best play.
• Algorithm:
1. Generate the game tree completely.
2. Determine the utility of each terminal state.
3. Propagate the utility values upward in the tree by applying the MIN and MAX operators to the nodes at each level.
4. At the root node, use the minimax decision to select the move with the max (of the min) utility value.
• Steps 2 and 3 of the algorithm assume that the opponent will play perfectly.

Minimax
[figures, four slides: a two-ply tree with leaf utilities 3, 12, 8 / 2, 4, 6 / 14, 5, 2; the MIN nodes take the values 3, 2, 2, and the MAX root selects the move with value 3]
• Minimize the opponent's chance; maximize your chance.

MiniMax = maximum of the minimum
• I'll choose the best move for me (max).
• You'll choose the best move for you (min).
[figure: 1st ply = MAX, 2nd ply = MIN]

Minimax: Recursive implementation
• Complete: yes, for a finite state space
• Optimal: yes
• Time complexity: O(b^m)
• Space complexity: O(bm) (like DFS, it does not keep all nodes in memory)

Do We Have To Do All That Work?
[figures, two slides: first the bare tree with leaves 3, 12, 8 under the first MIN node; then with the values filled in — that MIN node is worth 3, so the MAX root is worth at least 3]

Do We Have To Do All That Work? (cont.)
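The recursive minimax implementation from the slides above can be sketched as follows, instrumented with a leaf counter to show how much work full search does. The leaf values are taken from the slides' example; the nested-list tree encoding is my own assumption:

```python
# Recursive minimax over an explicit game tree (nested lists, numeric leaves).
# The counter records how many leaves are evaluated: plain minimax visits all
# of them, which is what alpha-beta pruning (next slides) avoids.

def minimax(node, maximizing, counter):
    if isinstance(node, (int, float)):   # terminal state: return its utility
        counter[0] += 1
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

# Leaf utilities from the slides: the MIN nodes evaluate to 3, 2, 2,
# and the MAX root picks 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
count = [0]
print(minimax(tree, True, count))   # 3
print(count[0])                     # 9 -- every leaf was evaluated
```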
[figures, two slides: in the minimax example, the second MIN node's first leaf is 2; since 2 is smaller than 3, there is no need for further search below that node, and its remaining leaves are crossed out. More on this next time: α-β pruning]

Ideal Case
• Search all the way to the leaves (end-game positions).
• Return the leaf (leaves) that lead to a win (for me).
• Anything wrong with that?

More Realistic
• Search ahead to a non-leaf (non-goal) state and evaluate it somehow.
• Chess:
• 4 ply is a novice,
• 8 ply is a master,
• 12 ply can compete at the highest level.
• In no sense can 12 ply be likened to a search of the whole space.

1. Move evaluation without complete search
• Complete search is too complex and impractical.
• Evaluation function: estimates the value of a state using heuristics, and the search is cut off.
• New MINIMAX:
• CUTOFF-TEST: a cutoff test replaces the terminal-test condition (e.g., deadline, depth limit).
• EVAL: an evaluation function replaces the utility function (e.g., number of chess pieces taken).

Evaluation Functions
• Need a numerical function that assigns a value to a non-goal state.
• Has to capture the notion of a position being good for one player.
• Has to be fast.
• Typically a linear combination of simple metrics.

Evaluation functions
• Weighted linear evaluation function, combining n heuristics:
f = w1·f1 + w2·f2 + … + wn·fn
• E.g., the w's could be the values of the pieces (1 for a pawn, 3 for a bishop, etc.), and the f's the number of pieces of each type on the board.

Note: exact values do not matter
[figure: an order-preserving transformation of the leaf values leaves the minimax decision unchanged]

Minimax with cutoff: a viable algorithm?
• Assume we have 100 seconds and can evaluate 10^4 nodes/s; then we can evaluate 10^6 nodes per move.

2. α-β pruning: search cutoff
• Pruning: eliminating a branch of the search tree from consideration without exhaustively examining each of its nodes.
• α-β pruning: the basic idea is to prune portions of the search tree that cannot improve the utility value of the MAX or MIN node, by considering only the values of the nodes seen so far.
• Does it work? Yes: it roughly cuts the branching factor from b to √b, allowing about twice as deep a lookahead as pure minimax.

α-β pruning: example
[figures, four slides: a two-ply tree; the first MIN node's leaves 6, 12, 8 give it the value 6; the second MIN node is abandoned after its first leaf 2 ≤ 6; the third after its leaf 5 ≤ 6; MAX selects the first move, with value 6]

Properties of α-β
[figure]

α-β pruning: general principle
[figure: Player chooses between move m and a subtree under Opponent node n with value v]
• If α > v, then MAX will choose m, so prune the tree under n.
• Similarly for MIN.

Remember: Minimax — recursive implementation
[figure: minimax pseudo-code]

Alpha-beta Pruning Algorithm
[figure: α-β pseudo-code]

More on the α-β algorithm
• Same basic idea as minimax, but prune (cut away) branches of the tree that we know will not contain the solution.
• Because minimax is depth-first, consider the nodes along a given path in the tree. As we go along this path, we keep track of:
• α: the best choice so far for MAX
• β: the best choice so far for MIN

More on the α-β algorithm: start from Minimax
• Note: α and β are both local variables. At the start of the algorithm, we initialize them to α = −∞ and β = +∞.

More on the α-β algorithm
[walkthrough figures, slides 50–51: starting from α = −∞, β = +∞ at the MAX root, MAX-VALUE and MIN-VALUE alternately loop over successors, passing (α, β) down; the first MIN node's leaves 5, 10, … give it the value 5, so α = 5 at the root]
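The MAX-VALUE / MIN-VALUE pair that this walkthrough traces can be sketched as below. This is the standard alpha-beta formulation (with nested lists as an assumed node encoding, as before); on the earlier minimax example it returns the same value, 3, while skipping the pruned leaves:

```python
import math

# Alpha-beta pruning: alpha = best value MAX can already guarantee on this
# path, beta = best value MIN can already guarantee.  Prune when they cross.

def max_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:              # MIN above will never let play reach here:
            return v               # end loop and return (prune)
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:             # MAX above will never let play reach here:
            return v               # end loop and return (prune)
        beta = min(beta, v)
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(max_value(tree, -math.inf, math.inf))   # 3, same as full minimax
```

Note that α and β are plain parameters, matching the slides' point that they are local variables initialized to −∞ and +∞ at the root.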
[walkthrough figures continued, slides 52–53: in the next MIN node, as soon as a leaf value makes β ≤ α = 5, the loop ends and the node's value is returned early, without examining its remaining successors]

Operation of α-β pruning algorithm
[figure: end the loop and return as soon as the current value crosses the α/β bound]

Example
[figure]

α-β algorithm:
[figure]

Solution
[trace table, flattened in the original: for each node visited (A, B, C, D, E, …, Q, R, S, T, V, W) it lists the node type (Max/Min), the current α and β bounds (initially −∞ and +∞), and the score returned where a node completes; the root A finishes with α = 10 and score 10]

State-of-the-art for deterministic games
[figure]

Stochastic games
[figure]

Algorithm for stochastic games
[figure]

Remember: Minimax algorithm
[figure]

Stochastic games: the element of chance
• Expectimax and expectimin: expected values over all possible outcomes.
[figure: a tree with CHANCE nodes whose branches have probability 0.5 each, over leaves 3, 8, 17, 8, …]

Stochastic games: the element of chance
[figure: the same tree with the values filled in]
• Expectimax example: 4 = 0.5·3 + 0.5·5.

Evaluation functions: exact values DO matter
• Order-preserving transformations do not necessarily behave the same!
[figure]

State-of-the-art for stochastic games
[figure]

Summary
[figure]
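The expectimax computation on the chance slide (4 = 0.5·3 + 0.5·5) generalizes to expectiminimax, where CHANCE nodes return probability-weighted averages of their children. A sketch, with a tagged-tuple node encoding that is my own assumption:

```python
# Expectiminimax: MAX and MIN nodes behave as in minimax; a CHANCE node's
# value is the expected value over its (probability, subtree) branches.
# The ('max' | 'min' | 'chance', children) encoding is illustrative.

def expectiminimax(node):
    if isinstance(node, (int, float)):            # terminal utility
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: children are (probability, subtree) pairs
    return sum(p * expectiminimax(c) for p, c in children)

# The slide's example: a chance node over 3 and 5 with probability 0.5 each.
print(expectiminimax(('chance', [(0.5, 3), (0.5, 5)])))   # 4.0
```

Because chance nodes average values rather than just compare them, an order-preserving transformation of the leaf utilities can change which move is preferred — which is exactly why the slides stress that exact values DO matter in stochastic games.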