Introduction - Peer Instruction for Computer Science

Game Playing
Introduction to Artificial Intelligence
CSE 150
Lecture on Chapter 5
CSE 150, Fall 2012
Adapted from slides adapted from slides of Russell’s by Kriegman
CSP Summary
• CSPs are a special kind of search problem:
States defined by values of a fixed set of variables
Goal test defined by constraints on variable values
• Backtracking = depth-first search with
1) fixed variable order
2) only legal successors
3) Stop search path when constraints are violated.
• Forward checking prevents assignments that guarantee a
later failure
• Heuristics:
– Variable selection: most constrained variable
– Value selection: least constraining value
• Iterative min-conflicts is usually effective in practice
READING
• Read Chapter 6! (Next stop, Chapter 13!).
Reading Clicker Quiz
What type of adversarial search is best for chess?
A) Minimax
B) Expectimax
C) Expectiminimax
Reading Clicker Quiz
What type of adversarial search is best for backgammon?
A) Minimax
B) Expectimax
C) Expectiminimax
Reading Clicker Quiz
An evaluation function gives an estimate of the expected
utility of the game for a given, non-terminal game
position.
A) True
B) False
This lecture
• Introduction to Games
• Perfect play (Min-Max)
– Tic-Tac-Toe
– Checkers
– Chess
– Go
• Resource limits
• α-β pruning
• Games with chance
– Backgammon
Types of Games
• Perfect information, deterministic: Chess, Checkers, Go, Othello
• Perfect information, chance: Backgammon, Monopoly
• Imperfect information, deterministic: Battleship
• Imperfect information, chance: Bridge, Poker, Scrabble, Nuclear war
Games vs. search problems
Similar to state-space search but:
1. There’s an Opponent whose moves must be considered
2. Because of the combinatorial explosion, we can’t search the
entire tree:
Have to commit to a move based on an estimate of its
goodness.
Heuristic mechanisms we used to decide which node to open
next will be used to decide which MOVE to make next
Games vs. search problems
We will follow some of the history:
– algorithm for perfect play [Von Neumann, 1944]
– finite horizon, approximate evaluation [Zuse,
1945; Shannon, 1950; Samuel, 1952-57]
– pruning to reduce costs [McCarthy, 1956]
Deterministic Two-person Games
• Two players move in turn
• Each Player knows what the other has done
and can do
• Either one of the players wins (and the other
loses) or there is a draw.
Games as Search: Grundy’s Game
[Nilsson, Chapter 3]
• Initially a stack of pennies stands between two players
• Each player divides one of the current stacks into two
unequal stacks.
• The game ends when every stack contains one or two
pennies
• The first player who cannot play loses
[Figure: the two players, MAX and MINNIE]
Grundy’s Game: States
Can Minnie formulate a strategy to always win?
[Figure: game tree starting from a single stack of 7 pennies, one level per move]
• Max’s turn: 7
• Minnie’s turn: (6, 1), (5, 2), (4, 3)
• Max’s turn: (5, 1, 1), (4, 2, 1), (3, 2, 2), (3, 3, 1)
• Minnie’s turn: (4, 1, 1, 1), (3, 2, 1, 1), (2, 2, 2, 1); MINNIE loses at (2, 2, 2, 1)
• Max’s turn: (3, 1, 1, 1, 1), (2, 2, 1, 1, 1); MAX loses at (2, 2, 1, 1, 1)
• Minnie’s turn: (2, 1, 1, 1, 1, 1); MINNIE loses
When is there a
winning strategy for Max?
• When it is MIN’s turn to play, a win must be
obtainable for MAX from all the results of the
move (sometimes called an AND node)
• When it is MAX’s turn, a win must be obtainable
from at least one of the results of the move
(sometimes called an OR node).
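The AND/OR rule above can be checked mechanically on Grundy's game. What follows is a minimal Python sketch, not taken from the slides: the function names `successors` and `max_wins`, and the encoding of a state as a tuple of stack sizes, are assumptions made for illustration.

```python
from functools import lru_cache

def successors(stacks):
    """States reachable by splitting one stack into two unequal stacks."""
    result = set()
    for i, n in enumerate(stacks):
        rest = stacks[:i] + stacks[i + 1:]
        for a in range(1, (n - 1) // 2 + 1):  # a < n - a, so the split is unequal
            result.add(tuple(sorted(rest + (a, n - a), reverse=True)))
    return result

@lru_cache(maxsize=None)
def max_wins(stacks, max_to_move):
    """True if MAX can force a win from this state."""
    children = successors(stacks)
    if max_to_move:
        # OR node: one winning move is enough (no legal move means MAX loses).
        return any(max_wins(s, False) for s in children)
    # AND node: MAX must win after every MIN reply (no legal move means MIN loses).
    return all(max_wins(s, True) for s in children)

print(max_wins((7,), True))   # False: MAX moving first from 7 cannot force a win
print(max_wins((7,), False))  # True: if MINNIE moves first, MAX can force a win
```

Under these assumptions, whoever moves first from a stack of 7 cannot force a win, so the second player can.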
Minimax
• Initial state: Board position and whose turn it is.
• Operators: Legal moves that the player can make
• Terminal test: A test of the state to determine if game is
over; set of states satisfying terminal test are terminal
states.
• Utility function: numeric value for outcome of game (leaf
nodes).
– E.g. +1 win, -1 loss, 0 draw.
– E.g. # of points scored if tallying points as in
backgammon.
• Assumption: Opponent will always choose the operator (move) that
maximizes its own utility.
Minimax
Perfect play for deterministic, perfect-information games
Max’s strategy: choose the action with the highest minimax value
= best achievable payoff against best play by Min.
Consider a 2-ply (two step) game:
Max wants largest outcome --- Minnie wants smallest
Minimax Example
Game tree for a fictitious 2-ply game
[Figure: 2-ply game tree]
MAX root: 3 = max(3, 2, 2), with moves A1, A2, A3
MIN node under A1: 3 = min(3, 12, 8)
MIN node under A2: 2 = min(2, 4, 6)
MIN node under A3: 2 = min(14, 5, 2)
Leaf nodes: value of utility function
Minimax Algorithm
Function MINIMAX-DECISION (game) returns an operator
  For each op in OPERATORS[game] do
    Value[op] ← MINIMAX-VALUE(Apply(op, game), game)
  End
  Return the op with the highest Value[op]
Function MINIMAX-VALUE (state, game) returns a utility value
  If TERMINAL-TEST[game](state) then
    return UTILITY[game](state)
  else if MAX is to move in state then
    return the highest MINIMAX-VALUE of Successors[state]
  else return the lowest MINIMAX-VALUE of Successors[state]
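The pseudocode above can be sketched in Python as follows. This is an illustrative sketch, not the course's implementation: it assumes the game tree is given explicitly as nested lists whose leaves are utility values, and the names `minimax_value` and `minimax_decision` are chosen here for illustration. The tree is the fictitious 2-ply game from the Minimax Example slide.

```python
def minimax_value(node, max_to_move):
    """Backed-up minimax value of a node in an explicit game tree."""
    if isinstance(node, (int, float)):  # terminal test: a leaf holds its utility
        return node
    values = [minimax_value(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

def minimax_decision(tree):
    """Return (index of MAX's best move at the root, its minimax value)."""
    values = [minimax_value(child, False) for child in tree]  # MIN moves next
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]

# Leaves under moves A1, A2, A3 of the 2-ply example.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_decision(tree))  # (0, 3): move A1 with value 3
```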
Properties of Minimax
• Complete: Yes, if tree is finite
• Optimal: Yes, against an optimal opponent. Otherwise??
• Time complexity: O(b^m)
• Space complexity: O(bm)
Note: For chess, b ≈ 35, m ≈ 100 for a “reasonable game.”
⇒ Solution is completely infeasible
Actually only see roughly 10^40 board positions, not 35^100: Checking for
previously visited states becomes important! (“transposition table”)
Clicker Question
A minimax search
tree is shown to the
left. What value
should we give to
the root node?
A) 8
B) 10
C) 9
D) 100
Resource limits
Suppose we have 100 seconds, explore 10^4 nodes/second
⇒ 10^6 nodes per move
Standard approach:
• cutoff test
e.g., depth limit
• evaluation function = estimated desirability of position
(an approximation to the utility used in minimax).
This is similar to the heuristic evaluation function - but its
units are arbitrary - it is not the distance to a goal
Evaluation Function
For chess, typically linear weighted
sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s)
e.g., w1 = 9 with
f1(s) = (number of white queens) - (number of black queens)
w2 = 5 with
f2(s) = (number of white rooks) - (number of black rooks)
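The weighted sum above can be sketched in Python. This is a minimal illustration only: the dictionary-of-piece-counts board representation and the names `WEIGHTS` and `evaluate` are assumptions, and only the two features named on the slide (queens and rooks) are included.

```python
# Feature weights from the slide: queen difference weighted 9, rook difference 5.
WEIGHTS = {"queen": 9, "rook": 5}

def evaluate(white_counts, black_counts, weights=WEIGHTS):
    """Eval(s) = sum over features of w_i * (white count - black count)."""
    return sum(
        w * (white_counts.get(piece, 0) - black_counts.get(piece, 0))
        for piece, w in weights.items()
    )

# White has one queen and two rooks; Black has one queen and one rook.
print(evaluate({"queen": 1, "rook": 2}, {"queen": 1, "rook": 1}))  # 5
```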
Clicker Question
Assume we have an evaluation function f1 which returns a value
between 0 and 100.
Now assume we define another evaluation function f2, which is
defined as follows:
f2(s) = f1(s)^2, for all states s.
Which of the following is/are true?
A) Minimax search using f2 is guaranteed to return the same
action as minimax using f1
B) Expectimax search using f2 is guaranteed to return the
same action as expectimax using f1
C) A and B
D) Neither A nor B
Digression: Exact Values Don't Matter
[Figure: example game tree with MAX and MIN levels]
• Behavior is preserved under any monotonic transformation of Eval
• Only the order matters:
payoff in deterministic games acts as an ordinal utility function
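A small check of the ordinal-utility claim, as a hedged sketch: the helper name `minimax_choice`, the nested-list tree, and the choice of squaring as the monotonic transformation (monotonic here because all leaf values are non-negative) are assumptions for illustration.

```python
def minimax_choice(tree, transform=lambda v: v):
    """Index of MAX's best move in a 2-ply tree (MAX root, MIN children)."""
    backed_up = [min(transform(v) for v in leaves) for leaves in tree]
    return max(range(len(backed_up)), key=backed_up.__getitem__)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

plain = minimax_choice(tree)
squared = minimax_choice(tree, lambda v: v * v)  # monotonic on values >= 0
print(plain, squared)  # 0 0: the chosen move does not change
```

Only the ordering of the leaf values enters the max and min operations, so any order-preserving transformation leaves the decision unchanged.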
Cutting off search
• Replace MinimaxValue with MinimaxCutoff .
• MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
b^m = 10^6, b = 35 ⇒ m = 4
No: 4-ply lookahead is a hopeless chess player!
4-ply ≈ human novice
8-ply ≈ typical PC, human master
12-ply ≈ Deep Blue, Kasparov
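The two substitutions (Cutoff? for Terminal?, Eval for Utility) can be sketched as below. This is an illustrative sketch under stated assumptions: the tree is an explicit nested list, the cutoff test is a simple depth limit, and `eval_estimate` (the average of the leaves below a node) is a hypothetical stand-in for a real evaluation function.

```python
def leaves(node):
    """All leaf utilities below a node."""
    if isinstance(node, (int, float)):
        return [node]
    return [v for child in node for v in leaves(child)]

def eval_estimate(node):
    """Hypothetical Eval: average utility of the leaves below the node."""
    vals = leaves(node)
    return sum(vals) / len(vals)

def minimax_cutoff(node, depth, max_to_move):
    if isinstance(node, (int, float)):  # true terminal state: exact utility
        return node
    if depth == 0:                      # Cutoff? replaces Terminal?
        return eval_estimate(node)      # Eval replaces Utility
    values = [minimax_cutoff(c, depth - 1, not max_to_move) for c in node]
    return max(values) if max_to_move else min(values)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_cutoff(tree, 2, True))  # 3: full-depth search gives the exact minimax value
print(minimax_cutoff(tree, 1, True))  # an estimate based on Eval at depth 1
```

With the depth limit at 1 the root value is only as good as the estimate, which is the sense in which a shallow lookahead depends on the quality of Eval.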
Reading Quiz
A variable in a CSP is arc-consistent if every
value in its domain satisfies the variable’s binary
constraints.
A) True
B) False
Reading Quiz
If a CSP problem is arc consistent then it is also path
consistent.
A) True
B) False
Reading Quiz
A sudoku puzzle is an example of a constraint
satisfaction problem.
A) True
B) False
Reading Quiz
Local search can be applied to CSPs.
A) True
B) False
Reading Quiz
The AC3 algorithm (for arc-consistency) is an
example of backtracking search.
A) True
B) False
Another idea
• α-β Pruning
• Essential idea is to stop searching down a branch
of tree when you can determine that it’s a dead
end.
• Guaranteed to give the same answer as minimax!
α-β Pruning Example
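[Figure: α-β pruning on the 2-ply example tree]
MAX root: bound ≥ 3 after the first MIN node, final value = 3
First MIN node: = 3, from leaves 3, 12, 8
Second MIN node: ≤ 2 after leaf 2; its remaining two leaves (X X) are pruned
Third MIN node: ≤ 14, then ≤ 5, then = 2, from leaves 14, 5, 2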
The α-β Algorithm
Basically MINIMAX and keep track of α, β and prune
• Initialize v for search under this node
• If there is a better value for MIN, then PRUNE
• Otherwise, update alpha
• This returns the best value for min
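A minimal Python sketch of the procedure these annotations describe, offered as an illustration rather than the slide's own code. It assumes an explicit nested-list tree with utility values at the leaves; the function name `alphabeta` and the `visited` list (used only to show which leaves get examined) are assumptions.

```python
import math

def alphabeta(node, alpha, beta, max_to_move, visited):
    """Minimax value of node, skipping branches that cannot affect the result."""
    if isinstance(node, (int, float)):
        visited.append(node)  # record every leaf actually examined
        return node
    if max_to_move:
        v = -math.inf         # initialize v for the search under this node
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:     # MIN already has a better option elsewhere: prune
                return v
            alpha = max(alpha, v)  # otherwise, update alpha
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:        # MAX already has a better option elsewhere: prune
            return v
        beta = min(beta, v)   # otherwise, update beta
    return v

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = alphabeta(tree, -math.inf, math.inf, True, visited)
print(value)    # 3, the same answer as plain minimax
print(visited)  # [3, 12, 8, 2, 14, 5, 2]: leaves 4 and 6 were never examined
```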
How is the search tree pruned?
• α is the best value (to
MAX) found so far.
• If V ≤ α, then the subtree
containing V will have utility
no greater than V
⇒ Prune that branch
• β can be defined similarly
from MIN’s perspective
α-β Pruning
• For the entire search tree, keep parameter α:
– A LOWER BOUND on MAX’s backed-up value
from MINIMAX search
– I.E., if we did a complete MINIMAX search,
MAX’s final backed-up value would be no lower
than α.
• Why a LOWER BOUND?
– It is based on what MINNIE can hold him to on
some line of play.
– MAX will always be able to CHOOSE that line of
play -- MAX will choose a BETTER line of play if
he can.
α-β Pruning
• For each MIN node, keep a parameter β:
– An UPPER BOUND on how well MINNIE can do
I.e., if we did a complete MINIMAX search,
MINNIE’s value will be no higher than β
• Why?
– MINNIE is looking for LOW values of the static
evaluation function.
– This is based on what MAX can hold her to. She
will choose a BETTER line of play (LOWER
VALUE) if she can.
α-β Pruning
• While searching the tree:
– The α value of the tree is set to the
HIGHEST backed-up value found so far
– The β value of the tree is set to the
LOWEST backed-up value found so far
• STOP SEARCHING WHEN:
– The value of a MIN node is less than the α
value found so far
– The value of a MAX node is greater than
the β value found so far
Properties of α-β pruning
Pruning does not affect final result.
Good move ordering improves effectiveness of pruning.
With “perfect ordering,” time complexity = O(b^(m/2))
– doubles depth of search
– can easily reach depth 8 and play good chess
A simple example of the value of reasoning about which
computations are relevant (a form of metareasoning).
A reminder
• “Standard” Minimax and alpha-beta assume play to the
end of the game - which is often not possible!
• Instead, we use a cut-off test in place of a terminal test
• Use a heuristic evaluation function instead of actual value.
• The results of cutting off search using alpha-beta are going
to depend on the quality of your evaluation function.
Deeper searches
Problem with depth-limited cut-off of search: the value of a
position can change radically in the next move or two.
One fix is quiescence search:
If a piece is about to be taken, the board is not “quiet” - keep
searching beyond that point. Deep Blue used this.
Deeper searches
The horizon problem:
Sometimes disaster (or Shangri-la) is just beyond the depth bound:
In the example at the right,
in three moves (6 ply), white
can capture the bishop at a7
But black can stall this by
moving its pawns aggressively
against the white king.
If this pushes the taking of
the bishop over the
horizon, black will “think”
it has saved the day, when
all it has done is wasted
its pawns.
Deeper searches
One way to mitigate the horizon problem (there is no
“cure”):
The singular extension heuristic:
If one move from this position is clearly better than the
others, and is stored in memory, extend the search
using that move.
This requires that the situation has been encountered in the
search before.
Deeper searches
Another useful heuristic:
the null move heuristic: Let the opponent take two
moves and then search under the node.
Stored knowledge
• The transposition table: Store results of searches, using a hash
table - if you come to a node again, you don’t need to search under
it!
• There is a big fan-out at the beginning of the game:
– Deep Blue stored thousands of opening moves
• And thousands of solved end games (all games with five pieces or
fewer)
• And 700,000 grandmaster games
• Nowadays, a PC with good heuristics is as good as the best player.
Deterministic games in practice
Checkers: In 1994 Chinook ended the 40-year reign of human world champion Marion
Tinsley. It used an endgame database defining perfect play for all positions
involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
Checkers is now solved. The answer is boring: the game can always be played to a
draw.
Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game
match in 1997. Deep Blue searches 200 million positions per second, uses very
sophisticated evaluation, and undisclosed methods for extending some lines of
search up to 40 ply.
Othello: Human champions refuse to compete against computers, which are too good.
Go: The program Zen played against Takemiya Masaki (ranked at 9p - high
professional) on March 17, 2012. Zen beat Takemiya with a 5 stone handicap by
eleven points followed by a stunning twenty point win at a 4 stone handicap.
Takemiya remarked “I had no idea that computer go had come this far.”
Nondeterministic games
• E.g., in backgammon, the dice rolls determine the legal moves
• Consider simpler example with coin-flipping instead of dice-rolling:
1. For min node, compute min of children
2. For chance node, compute weighted average of children
3. For max node, compute max of children
Algorithm for Nondeterministic Games
Expectiminimax maximizes the expected value (average) of utility
against a perfect opponent.
Pseudocode is just like Minimax, except we must also handle
chance nodes:
...
if state is a chance node then
return the probability-weighted average of
EXPECTIMINIMAX-VALUE of Successors(state)
...
A version of α-β pruning is possible but only if the leaf values are
bounded.
Function MINIMAX-VALUE (state, game) returns a utility value
  If TERMINAL-TEST[game](state) then
    return UTILITY[game](state)
  else if MAX is to move in state then
    return the highest MINIMAX-VALUE of Successors[state]
  else return the lowest MINIMAX-VALUE of Successors[state]
Function EXPECTIMINIMAX-VALUE (state, game) returns a utility value
  If TERMINAL-TEST[game](state) then
    return UTILITY[game](state)
  else if state is a chance node then
    return the probability-weighted average of EXPECTIMINIMAX-VALUE of Successors[state]
  else if MAX is to move in state then
    return the highest EXPECTIMINIMAX-VALUE of Successors[state]
  else return the lowest EXPECTIMINIMAX-VALUE of Successors[state]
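The three node types can be handled in a short Python sketch. This is illustrative only: the `(kind, children)` tuple encoding, the function name `expectiminimax`, and the specific coin-flip tree with its leaf values are assumptions, not data recovered from the slides.

```python
def expectiminimax(node):
    """Value of a node in a tree with max, min, and chance nodes."""
    if isinstance(node, (int, float)):  # terminal: utility value
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":                # children are (probability, subtree) pairs
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError(f"unknown node kind: {kind}")

# MAX chooses between two fair coin flips; MIN moves after each flip.
tree = ("max", [
    ("chance", [(0.5, ("min", [2, 4])), (0.5, ("min", [7, 4]))]),
    ("chance", [(0.5, ("min", [6, 0])), (0.5, ("min", [5, -2]))]),
])
print(expectiminimax(tree))  # 3.0: left flip is worth 0.5*2 + 0.5*4, right is worth -1
```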
Nondeterministic Games in Practice
• Dice rolls increase branching factor b:
– 21 possible rolls with 2 dice
– Backgammon ≈ 20 legal moves
• depth 4 ⇒ 20 × (21 × 20)^3 ≈ 1.5 × 10^9
• As depth increases, probability of reaching a given node shrinks
⇒ value of lookahead is diminished
• α-β pruning is much less effective
• TDGammon uses depth-2 search + very good Eval
Plays at world-champion level
Clicker Questions
An expectiminimax
search tree is shown to
the left. Edge weights
represent probability.
What value should we
give to the root node?
A) -1
B) 1
C) 3
D) 4
E) 5.5
Digression: Exact Values DO Matter
• Behavior is preserved only by positive linear transformation
of Eval.
• Hence Eval should be proportional to the expected payoff.
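A small numeric illustration of why exact values matter once chance nodes are involved. This is a hedged sketch: the helper `expectimax_choice` and the two example lotteries are assumptions chosen so that squaring (monotonic on non-negative values, but not linear) changes the decision while a positive linear transformation does not.

```python
def expectimax_choice(options, transform=lambda v: v):
    """Index of the move with the highest expected (transformed) value.

    Each option is a list of (probability, value) outcomes.
    """
    expected = [sum(p * transform(v) for p, v in lottery) for lottery in options]
    return max(range(len(expected)), key=expected.__getitem__)

options = [
    [(1.0, 50)],            # move 0: a certain 50
    [(0.5, 0), (0.5, 90)],  # move 1: expected value 45
]

print(expectimax_choice(options))                        # 0: 50 beats 45
print(expectimax_choice(options, lambda v: v * v))       # 1: 2500 loses to 4050
print(expectimax_choice(options, lambda v: 2 * v + 10))  # 0: 110 beats 100
```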
Games Summary
• Games are fun to work on!
• They illustrate several important points about AI
– perfection is unattainable ⇒ must approximate
– good idea to think about what to think about
– uncertainty constrains the assignment of values to states
• “Games are to AI as grand prix racing is to automobile
design”