GAME PLAYING 2 THIS LECTURE Alpha-beta pruning Games with chance Partially observable games NONDETERMINISM Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX) MAX’s play MIN’s play MAX cannot tell what move will be played NONDETERMINISM Uncertainty is caused by the actions of another agent (MIN), who competes with our agent (MAX) MAX’s play Instead of a single path, the agent must construct an entire plan MIN’s play MAX must decide what to play for BOTH these outcomes MINIMAX BACKUP MAX’s turn +1 0 MIN’s turn -1 +1 0 +1 MAX’s turn 0 -1 0 +1 0 DEPTH-FIRST MINIMAX ALGORITHM MAX-Value(S) 1. 2. MIN-Value(S) 1. 2. If Terminal?(S) return Result(S) Return maxS’SUCC(S) MIN-Value(S’) If Terminal?(S) return Result(S) Return minS’SUCC(S) MAX-Value(S’) MINIMAX-Decision(S) Return action leading to state S’SUCC(S) that maximizes MIN-Value(S’) REAL-TIME GAME PLAYING WITH EVALUATION FUNCTION e(s): function indicating estimated favorability of a state to MAX Keep track of depth, and add line: If(depth(s) = cutoff) return e(s) After terminal test CAN WE DO BETTER? Yes ! Much better ! 3 3 -1 Pruning -1 This part of the tree can’t have any effect on the value that will be backed up to the root EXAMPLE EXAMPLE b=2 2 The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase EXAMPLE The beta value of a MIN node is an upper bound on the final backed-up value. It can never increase b=1 2 1 EXAMPLE a=1 The alpha value of a MAX node is a lower bound on the final backed-up value. It can never decrease b=1 2 1 EXAMPLE a=1 b = -1 b=1 2 1 -1 EXAMPLE a=1 b = -1 b=1 Search can be discontinued below any MIN node whose beta value is less than or equal to the alpha value of one of its MAX ancestors 2 1 -1 ALPHA-BETA PRUNING Explore the game tree to depth h in depth-first manner Back up alpha and beta values whenever possible Prune branches that can’t lead to changing the final decision ALPHA-BETA ALGORITHM Update the alpha/beta value of the parent of a node N when the search below N has been completed or discontinued Discontinue the search below a MAX node N if its alpha value is the beta value of a MIN ancestor of N Discontinue the search below a MIN node N if its beta value is the alpha value of a MAX ancestor of N EXAMPLE MAX MIN MAX MIN MAX MIN 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN MAX MIN MAX MIN 0 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN MAX MIN MAX MIN 0 0 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN MAX MIN MAX MIN 0 0 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN MAX MIN MAX MIN 0 0 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN MAX MIN 0 MAX MIN 0 0 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN MAX MIN 0 MAX MIN 0 0 -3 3 3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN MAX MIN 0 MAX MIN 0 0 -3 3 3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 3 3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 3 3 5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE 0 0 0 0 0 -3 3 3 2 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 3 3 2 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 3 3 2 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 3 3 2 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 3 3 2 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 3 3 2 2 5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 3 3 2 2 1 1 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 3 3 2 2 1 1 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 3 3 2 2 1 1 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 1 3 3 1 2 2 1 1 -3 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 1 3 3 1 2 2 1 1 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 1 3 3 1 2 2 1 1 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 0 0 -3 2 2 1 3 3 1 2 2 1 -5 1 -5 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE 0 0 0 0 0 0 -3 2 2 1 3 3 1 2 2 1 -5 1 -5 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 0 MIN 0 MAX 0 MIN 0 MAX MIN 1 0 0 -3 2 2 1 3 3 1 2 2 1 -5 1 -5 -3 -5 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 1 MIN 0 MAX 0 MIN 0 MAX MIN 1 0 0 -3 2 2 2 1 3 3 1 2 2 1 -5 2 1 -5 2 -3 -5 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 EXAMPLE MAX 1 MIN 0 MAX 0 MIN 0 MAX MIN 1 0 0 -3 2 2 2 1 3 3 1 2 2 1 -5 2 1 -5 2 -3 -5 2 0 5 -3 3 3 -3 0 2 -2 3 5 2 5 -5 0 1 5 1 -3 0 -5 5 -3 3 2 HOW MUCH DO WE GAIN? Consider these two cases: a=3 3 3 b=-1 -1 a=3 (4) b=4 4 -1 HOW MUCH DO WE GAIN? Assume a game tree of uniform branching factor b Minimax examines O(bh) nodes, so does alpha-beta in the worst-case The gain for alpha-beta is maximum when: The children of a MAX node are ordered in decreasing backed up values The children of a MIN node are ordered in increasing backed up values Then alpha-beta examines O(bh/2) nodes [Knuth and Moore, 1975] But this requires an oracle (if we knew how to order nodes perfectly, we would not need to search the game tree) If nodes are ordered at random, then the average number of nodes examined by alpha-beta is ~O(b3h/4) ALPHA-BETA IMPLEMENTATION MAX-Value(S,a,b) 1. 2. 3. 4. 5. MIN-Value(S,a,b) 1. 2. 3. 4. 5. If Terminal?(S) return Result(S) For all S’SUCC(S) a max(a,MIN-Value(S’,a,b)) If a b, then return a Return a If Terminal?(S) return Result(S) For all S’SUCC(S) b min(b,MAX-Value(S’,a,b)) If a b, then return b Return b Alpha-Beta-Decision(S) Return action leading to state S’SUCC(S) that maximizes MIN-Value(S’,-,+) HEURISTIC ORDERING OF NODES Order the nodes below the root according to the values backed-up at the previous iteration OTHER IMPROVEMENTS Adaptive horizon + iterative deepening Extended search: Retain k>1 best paths, instead of just one, and extend the tree at greater depth below their leaf nodes (to help dealing with the “horizon effect”) Singular extension: If a move is obviously better than the others in a node at horizon h, then expand this node along this move Use transposition tables to deal with repeated states Null-move search GAMES OF CHANCE GAMES OF CHANCE Dice games: backgammon, Yahtzee, craps, … Card games: poker, blackjack, … Is there a fundamental difference between the nondeterminism in chess-playing vs. the nondeterminism in a dice roll? MAX CHANCE MIN CHANCE MAX EXPECTED VALUES The utility of a MAX/MIN node in the game tree is the max/min of the utility values of its successors The expected utility of a CHANCE node in the game tree is the average of the utility values of its successors ExpectedValue(s) = s’SUCC(s) ExpectedValue(s’) P(s’) CHANCE nodes Compare to MinimaxValue(s) = max s’SUCC(s) MinimaxValue(s’) MAX nodes MinimaxValue(s) = min s’SUCC(s) MinimaxValue(s’) MIN nodes ADVERSARIAL GAMES OF CHANCE E.g., Backgammon MAX nodes, MIN nodes, CHANCE nodes Expectiminimax search Backup step: MAX = maximum of children CHANCE = average of children MIN = minimum of children CHANCE = average of children 4 levels of the game tree separate each of MAX’s turns! Evaluation function? Pruning? GENERALIZING MINIMAX VALUES Utilities can be continuous numerical values, rather than +1,0,-1 Allows maximizing the amount of “points” (e.g., $) rewarded instead of just achieving a win Rewards associated with terminal states Costs can be associated with certain decisions at non-terminal states (e.g., placing a bet) ROULETTE No bet “Game tree” only has depth 2 Place a bet Observe the roulette wheel Bet: Red, $5 Chance node Probabilities 18/38 20/38 Red Not red +10 0 CHANCE NODE BACKUP Expected value: For k children, with backed up values v1,…,vk Chance node value = p1 * v1 + p2 * v2 + … + pk * vk Bet: Red, $5 Value: 18/38 * 10 + 20/38 * 0 = 4.74 Chance node Probabilities 18/38 20/38 Red Not red +10 0 MAX/CHANCE NODES Max should pick the action leading to the node with the highest value MAX Bet: Red, $5 Bet: 17, $5 3.95 = 150/38 4.74 18/38 20/38 1/38 Red Not red 17 +10 0 +150 37/38 Not 17 0 Chance A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT for TT HT Done HT Flip T 1/2 1/2 HT HH Flip T 1/2 1/2 HT HH Flip H 1/2 HT Flip H HT 1/2 TT TT Done TT 1/2 1/2 Flip T 1/2 1/2 HT TT A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT 1-1=0 for TT 1/2 HT 1-2=-1 HT Done HT 1 Flip T 1/2 HT Flip T 1/2 1/2 Flip H 1/2 HH 5-1=4 Flip H HT HH HT 5-2=3 -1 TT Done TT -1 1/2 1/2 Flip T 1/2 1/2 1/2 TT -2 HT -1 TT -2 A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT 0 for TT 1/2 HT -1 HT Done HT 1 Flip T 1/2 HT Flip T 1 1/2 HH 3 1/2 Flip H 1/2 HH 4 Flip H HT HT -1 TT Done TT -3/2 -1 1/2 1/2 Flip T 1/2 1/2 1/2 TT -2 HT -1 TT -2 A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT 0 for TT HT Done HT 1 Flip T 1/2 HT 1/2 1 Flip T 1 1/2 1/2 HT -1 HH 3 Flip H 1/2 HH 4 Flip H HT HT -1 1/2 TT -2 TT Done TT -3/2 -1 1/2 1/2 Flip T 1/2 1/2 HT -1 TT -2 A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT 0 for TT HT Done HT 1 Flip T 3 1/2 HT 1/2 2 Flip T 2 1/2 1/2 HT -1 HH 3 Flip H 1/2 HH 4 Flip H HT HT -1 1/2 TT -2 TT Done TT -3/2 -1 1/2 1/2 Flip T 1/2 1/2 HT -1 TT -2 A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT 0 for TT HT Done HT 1 Flip T 3 1/2 HT 1/2 2 Flip T 2 1/2 1/2 HT -1 HH 3 Flip H 1/2 HH 4 Flip H HT HT -1 1/2 TT -2 TT Done TT -3/2 -1 1/2 1/2 Flip T -3/2 1/2 1/2 HT -1 TT -2 A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT 0 for TT HT Done HT 1 Flip T 3 1/2 HT 1/2 1/2 HT -1 HH 3 1/2 1/2 2 Flip T 2 Flip H 1/2 1/2 HH 4 Flip H HT TT 2 -1 Done Flip T TT -3/2 -1 -3/2 1/2 HT -1 1/2 TT -2 1/2 1/2 HT -1 TT -2 A SLIGHTLY MORE COMPLEX EXAMPLE Two fair coins Pay $1 to start, at which point both are flipped Can flip up to two coins again, at a cost of $1 each Done Payout: $5 for HH, $1 for HT or TH, $0 HT 0 for TT HT 3 Done HT 1 Flip T 3 1/2 HT 1/2 1/2 HT -1 HH 3 1/2 1/2 2 Flip T 2 Flip H 1/2 1/2 HH 4 Flip H HT TT 2 -1 Done Flip T TT -3/2 -1 -3/2 1/2 HT -1 1/2 TT -2 1/2 1/2 HT -1 TT -2 CARD GAMES Blackjack (6-deck), video poker: similar to coinflipping game But in many card games, need to keep track of history of dealt cards in state because it affects future probabilities One-deck blackjack Bridge Poker PARTIALLY OBSERVABLE GAMES Partial observability Don’t see entire state (e.g., other players’ hands) “Fog of war” Examples: Kriegspiel (see R&N) Battleship Stratego OBSERVATION OF THE REAL WORLD Real world in some state Percepts Interpretation of the percepts in the representation language On(A,B) On(B,Table) Handempty Percepts can be user’s inputs, sensory data (e.g., image68 pixels), information received from other agents, ... SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD Observation of the world can be: Partial, e.g., a vision sensor can’t see through obstacles (lack of percepts) R1 R2 The robot may not know whether there is dust in room R2 69 SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD Observation of the world can be: Partial, e.g., a vision sensor can’t see through obstacles Ambiguous, e.g., percepts have multiple possible interpretations A C B On(A,B) On(A,C) 70 SECOND SOURCE OF UNCERTAINTY: IMPERFECT OBSERVATION OF THE WORLD Observation of the world can be: Partial, e.g., a vision sensor can’t see through obstacles Ambiguous, e.g., percepts have multiple possible interpretations Incorrect 71 PARTIALLY-OBSERVABLE CARD GAMES One possible strategy: Consider all possible deals given observed information Solve each deal as a fully-observable problem Choose the move that has the best average minimax value “Averaging over clairvoyance” [Why doesn’t this always work?] BELIEF STATE A belief state is the set of all states that an agent think are possible at any given time or at any stage of planning a course of actions, e.g.: To plan a course of actions, the agent searches a space of belief states, instead of a space of states SENSOR MODEL State space S The sensor model is a function SENSE: S 2S that maps each state s S to a belief state (the set of all states that the agent would think possible if it were actually observing state s) Example: Assume our vacuum robot can perfectly sense the room it is in and if there is dust in it. But it can’t sense if there is dust in the other room SENSE( ) = SENSE( ) = VACUUM ROBOT ACTION MODEL Right either moves the robot right, or does nothing Suck picks up the dirt in the room, if any, and always does the right thing Left always moves the robot to the left, but it may occasionally deposit dust in the right room • The robot perfectly senses the room it is in and whether there is dust in it • But it can’t sense if there is dust in the other room TRANSITION BETWEEN BELIEF STATES Suppose the robot is initially in state: After sensing this state, its belief state is: Just after executing Left, its belief state will be: After sensing the new state, its belief state will be: or if there is no dust in R1 if there is dust in R1 TRANSITION BETWEEN BELIEF STATES Playing a “game against nature” Left Clean(R1) Clean(R1) After receiving an observation, the robot will have one of these two belief states AND/OR TREE OF BELIEF STATES Left Right loop Suck goal Suck goal An action is applicable to a belief state B if its preconditions are achieved in all states in B A goal belief state is one in which all states are goal states RECAP Alpha-beta pruning: reduce complexity of minimax to O(bh/2) ideally, O(b3h/4) typically Games with chance Expected values: averaging over probabilities A 2nd source of uncertainty: partial observability Reason about sets of states: belief state Much more on latter 2 topics later PROJECT PROPOSAL (OPTIONAL) Mandatory: instructor’s advance approval Project title, team members 1/2 to 1 page description Specific topic (problem you are trying to solve, topic of survey, etc) Why did you choose this topic? Methods (researched in advance, sources of references) Expected results Email to me by 9/20 HOMEWORK Reading: R&N 6
© Copyright 2026 Paperzz