Game Search

9-16-2013
David A. Walsh ’67 Arts & Sciences Seminar Series
The Co-Evolution of Humans with Machines
Paul Horn ’68
Monday, September 16, 2013
12 pm – 1 pm
Student Center Multipurpose Rooms
A number of well-known scientists and technologists are
suggesting that we are on the verge of a “singularity”, a
transition in evolution from life as we know it to a postbiological future. The “human era” will be ended. The
boldest of the “singularitarians” predict that the epoch
will arrive in less than 30 years. What can we believe?
 Game/Adversary Search
 MiniMax
HW#3 due today
Reading Assignment: AIMA, Chapter 5
Game (adversary) Search
 What are games?
 Optimal decisions in games
 Which strategy leads to success?
 α-β pruning
 Games of imperfect information
 Games that include an element of chance
 Search – no adversary
 Solution is (heuristic) method for finding goal
 Heuristics can find optimal solution
 Evaluation function: estimate of cost from start to
goal through given node
 Examples: path planning, scheduling activities
 Games – adversary
 Solution is strategy (strategy specifies move for
every possible opponent reply).
 Time limits force an approximate solution
 Evaluation function: evaluate “goodness” of
game position
 Examples: chess, checkers, Othello, backgammon
 "Unpredictable" opponent ⇒ must specify a move for every
possible opponent reply
 Time limits ⇒ unlikely to find goal, must approximate
Plan of attack:
 Computer considers possible lines of play (Babbage, 1846)
 Algorithm for perfect play (Zermelo, 1912; Von Neumann, 1944)
 Finite horizon, approximate evaluation (Zuse, 1945; Wiener, 1948;
Shannon, 1950)
 First chess program (Turing, 1951)
 Machine learning to improve evaluation accuracy (Samuel, 1952-57)
 Pruning to allow deeper search (McCarthy, 1956)
 A game formulated as a search problem:
 Initial state:
● board position and turn
 Operators:
● definition of legal moves
 Terminal state:
● conditions for when the game is over
 Utility function:
● a numeric value that describes the outcome of the
game, e.g. -1, 0, 1 for loss, draw, win (AKA
payoff function)
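As a concrete illustration, here is a minimal sketch of this formulation as a Python interface (the class and method names are illustrative assumptions; a result method is added so that moves can be applied to states):

```python
from abc import ABC, abstractmethod
from typing import Iterable

class Game(ABC):
    """Hypothetical interface mirroring the four-part formulation above."""

    @abstractmethod
    def initial_state(self):
        """Initial state: board position and whose turn it is."""

    @abstractmethod
    def legal_moves(self, state) -> Iterable:
        """Operators: the legal moves available in `state`."""

    @abstractmethod
    def result(self, state, move):
        """The state reached by applying `move` in `state` (assumed helper)."""

    @abstractmethod
    def is_terminal(self, state) -> bool:
        """Terminal state: is the game over in `state`?"""

    @abstractmethod
    def utility(self, state) -> float:
        """Payoff of a terminal state, e.g. -1, 0, 1 for loss, draw, win."""
```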
                         deterministic                    chance
complete information     chess, checkers, go, othello     backgammon, monopoly
imperfect information    ?                                bridge, scrabble, poker, nuclear war(?)
[Figure: a nondeterministic problem as an AND/OR tree, "divide and conquer": break a complex problem into subproblems. OR nodes choose among alternatives; AND nodes require every subproblem to be handled. A game tree has the same shape: our moves are OR nodes, the opponent's replies are AND nodes.]
The root node of the game tree represents the current state, with
Player 1 (X) to move. Edges represent all possible legal moves
from that state. The nodes at level two represent the states with
Player 2 (O) to move, and edges are all the counter moves, and so
on. Each level is called a "ply", so the following tree is 3-ply.
[Figure: a 3-ply game tree; edges are labeled with moves m1 through m4.]
Suppose that X is the player and O is the opponent. X is trying
to maximize the value and O wants to minimize it. Should X
choose move m2 to get the score of 100? No: X has to consider
the opponent.
 Consider state O2. Move m1 results in a next state with value
12 and m4 results in an 8. If player O is playing well, which move
should be chosen?

[Figure: minimax on the example tree.
1. Generate the tree: root X1 (MAX) has MIN children O2 and O3, reached by m1 and m2.
2. Evaluate possible future states: leaves include 12 (via m1) and 8 (via m4) under O2, and 5 and 100 under O3.
3. Back up values: O2 backs up min(12, 8) = 8; O3 backs up min(5, 100) = 5.
4. Choose the move: at X1, MAX picks m1, the move with the larger backed-up value (8).]
 Perfect play for deterministic environments with perfect
information
 Basic idea: choose move with highest minimax value
= best achievable payoff against best play
 Algorithm:
1. Generate game tree completely
2. Determine utility of each terminal state
3. Propagate the utility values upward in the tree by
applying MIN and MAX operators on the nodes in
the current level
4. At the root node use minimax decision to select
the move with the max (of the min) utility value
 Steps 2 and 3 assume that the opponent plays perfectly.
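A minimal sketch of these four steps in Python, assuming the hypothetical Game interface from the earlier sketch (the recursion alternates MAX and MIN as it backs values up):

```python
def minimax_value(game, state, maximizing: bool) -> float:
    """Minimax value of `state`, assuming both sides play perfectly."""
    if game.is_terminal(state):
        return game.utility(state)                     # step 2: terminal utility
    values = (minimax_value(game, game.result(state, m), not maximizing)
              for m in game.legal_moves(state))
    return max(values) if maximizing else min(values)  # step 3: back values up

def minimax_decision(game, state):
    """Step 4: at the root (MAX), pick the move whose MIN successor is best."""
    return max(game.legal_moves(state),
               key=lambda m: minimax_value(game, game.result(state, m),
                                           maximizing=False))
```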
 Complete?
Yes (if tree is finite)
 Optimal?
Yes (against an optimal opponent)
 Time complexity?
O(b^m)
 Space complexity?
O(bm) (depth-first exploration)
 For chess, b ≈ 35, m ≈100 for "reasonable" games
⇒ exact solution completely infeasible
Suppose we have 100 secs and explore 10^4 nodes/sec
⇒ 10^6 nodes per move
Standard approach:
 cutoff test:
e.g., depth limit (perhaps add quiescence search)
 (“static”) evaluation function
= estimated desirability of position
 Types of evaluation functions:
- Weighted linear function: sum of weights times
features
- Non-linear: e.g. neural nets for backgammon
 Characteristics of a good evaluation function
- should agree with the real utility function
- should be quick to compute
- should reflect the ability to win
 Often a linear weighted sum of features
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
 Weighted linear evaluation function: combines n heuristic features
f = w1 f1 + w2 f2 + … + wn fn
e.g., the w's could be the values of pieces (1 for a pawn, 3 for a bishop, etc.)
and the f's could be the numbers of pieces of each type on the board
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
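As an illustration, a sketch of such a material-count evaluation in Python (the weights follow the piece-value example above; the function name and board representation are assumptions):

```python
# Illustrative material-count evaluation: the w's are piece values,
# the f's are (my piece count - opponent piece count) per piece type.
PIECE_WEIGHTS = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def eval_material(my_counts: dict, opp_counts: dict) -> float:
    """Eval(s) = sum over piece types of w_i * f_i(s)."""
    return sum(w * (my_counts.get(p, 0) - opp_counts.get(p, 0))
               for p, w in PIECE_WEIGHTS.items())
```

For instance, being one pawn up with everything else equal gives Eval(s) = +1.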
MinimaxCutoff is identical to MinimaxValue except
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
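A sketch of those two substitutions applied to the earlier minimax recursion (a depth counter plays the role of Cutoff?, and eval_fn is the static evaluation; names are illustrative):

```python
def minimax_cutoff(game, state, depth: int, maximizing: bool, eval_fn) -> float:
    """Minimax with Terminal? replaced by Cutoff? and Utility by Eval."""
    if game.is_terminal(state):
        return game.utility(state)
    if depth == 0:                        # Cutoff?: here a fixed depth limit
        return eval_fn(state)             # Eval: estimated desirability
    values = (minimax_cutoff(game, game.result(state, m),
                             depth - 1, not maximizing, eval_fn)
              for m in game.legal_moves(state))
    return max(values) if maximizing else min(values)
```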
Does it work in practice?
b^m = 10^6, b = 35 ⇒ m = 4
4-ply lookahead is a hopeless chess player!
 4-ply ≈ human novice
 8-ply ≈ typical PC, human master
 12-ply ≈ Deep Blue, Kasparov
[Figure: an example tic-tac-toe board s3; eval(s3) = ?]
Static evaluation function for state s:
eval(s) = +∞ if s is a win for X
        = -∞ if s is a win for O
        = (# of rows, cols & diags open for X) -
          (# of rows, cols & diags open for O) otherwise
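A sketch of this evaluation in Python, assuming a 3x3 board stored as a list of nine cells holding 'X', 'O', or None (a line is "open" for a player if it contains none of the opponent's marks):

```python
import math

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def eval_ttt(board) -> float:
    """(# lines open for X) - (# lines open for O); +/-inf on a win."""
    for a, b, c in LINES:
        if board[a] == board[b] == board[c] is not None:
            return math.inf if board[a] == "X" else -math.inf
    open_x = sum(all(board[i] != "O" for i in line) for line in LINES)
    open_o = sum(all(board[i] != "X" for i in line) for line in LINES)
    return open_x - open_o
```

For instance, with a lone X in the center, all 8 lines are open for X and only the 4 lines avoiding the center are open for O, so eval = 8 - 4 = 4.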
1. Fixed depth limit (static or dynamic): cutoff at depth d
problem: horizon effect; may cut off before
some really good move for the opponent
solution: add quiescence (see the sketch after this list)
another variation: add secondary search
2. Iterative deepening
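A minimal sketch of a quiescence-aware cutoff test, assuming the hypothetical Game interface from the earlier sketches (is_quiescent is an assumed predicate, e.g. "no capture is pending"):

```python
def cutoff_test(game, state, depth: int, is_quiescent) -> bool:
    """Decide whether to stop searching and apply the static evaluation.

    Cutting off only in quiescent (quiet) positions avoids evaluating a
    state mid-exchange, which softens the horizon effect.
    """
    if game.is_terminal(state):
        return True                  # game actually over: always stop
    if depth > 0:
        return False                 # still within the depth limit
    return is_quiescent(state)       # past the limit: stop only if quiet
```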
 Definition of optimal play for MAX assumes
MIN plays optimally: maximizes worst-case
outcome for MAX.
 But if MIN does not play optimally, MAX will
do even better. [proven.]
 Minimax-Value is complete and optimal, but is
intractable for all but very small problems
 Minimax-Cutoff (Minimax) uses a static evaluation
function which estimates the “goodness” of the state
from the player agent’s perspective
 Algorithm:
1. Generate game tree to the desired ply
2. Evaluate each leaf state
3. Propagate the values upward in the tree by
applying MIN and MAX operators on the nodes in
the current level
4. At the root node use minimax decision to select
the move with the max (of the min) evaluation