Adversarial Search

Adversarial Search
• The ghosts are trying to make Pacman lose
• We cannot write one giant program that plans all the way to the end,
because the ghosts choose their own actions
• Goal: eat lots of dots and lots of ghosts (but not the most hideous
ones), and do not die
• How do we come up with an action by reasoning ahead about
what both we and our adversary might do?
Adversarial Search Problems: Games
• Many different kinds of games!
– Deterministic or stochastic?
Are there dice? Does any action have multiple random
outcomes?
– One, two, or more players?
Multiple agents may cooperate, compete, or act somewhere in between.
– Zero sum?
A single utility function: you want to make it big and the adversary
wants to make it small.
– Perfect information (can you see the full state)?
• We need algorithms for calculating a strategy (policy) which
recommends a move in each state.
Deterministic Games
• Many possible formalizations, one is:
– States: S (start at s0)
– Players: P = {1 … N} (usually take turns)
– Actions: A (may depend on player/state)
– Transition function (also called the model): S × A → S
– Terminal test: S → {t, f}
– (Terminal) utility function: S × P → R
• A solution for a player is a policy: S → A
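The formalization above maps naturally onto a small Python interface. This is a sketch under the slide's definitions; the class name `Game`, the method names, and the toy `TakeAway` game are all illustrative assumptions, not from the slides.

```python
# Sketch of the deterministic-game formalization above.
# All names here are illustrative assumptions.

class Game:
    def initial_state(self):            # s0 in S
        raise NotImplementedError
    def player(self, state):            # whose turn, in P = {1..N}
        raise NotImplementedError
    def actions(self, state):           # legal actions A (depend on state)
        raise NotImplementedError
    def result(self, state, action):    # transition function S x A -> S
        raise NotImplementedError
    def is_terminal(self, state):       # terminal test S -> {t, f}
        raise NotImplementedError
    def utility(self, state, player):   # terminal utility S x P -> R
        raise NotImplementedError

class TakeAway(Game):
    """Toy game: players alternately remove 1 or 2 stones from a pile;
    whoever takes the last stone wins. State = (stones, player_to_move)."""
    def initial_state(self):
        return (3, 1)
    def player(self, state):
        return state[1]
    def actions(self, state):
        return [n for n in (1, 2) if n <= state[0]]
    def result(self, state, action):
        return (state[0] - action, 2 if state[1] == 1 else 1)
    def is_terminal(self, state):
        return state[0] == 0
    def utility(self, state, player):
        # The player who just emptied the pile (not the one to move) wins.
        winner = 2 if state[1] == 1 else 1
        return 1 if player == winner else -1
```

A policy for this game would then be any function mapping each state to one of `actions(state)`.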
Deterministic Single-Player?
• Deterministic, single player, perfect information:
– Know the rules
– Know what actions do
– Know when you win
– E.g. 8-Puzzle, Rubik's Cube
• … it's just a search!
Deterministic Single-Player?
• Like a search tree, but the game has an end
• Each leaf (terminal node) has an outcome (+1 win, −1 lose,
or real values specifying that some end states are better than others)
• In general search:
– costs are incremental
– we try to minimize cost
• Games are like general AI:
– we want to maximize utility
– all rewards come at the end; there is no cost on each move
• At each node, select the branch (child) that will finally achieve the win
(Figure: search tree with leaf utilities −1, +1, −1)
Deterministic Single-Player?
• … it's just a search!
• Slight reinterpretation:
– Each node stores a value: the best outcome it can reach under
optimal play
– This is the maximal value of its children (the maxValue)
– Note that we don't have path sums as before (utilities are at the leaves)
– Here, the value at the root is +1, since you can force the win
• After the search, we can pick the move that leads to the best node
(Figure: the same tree with leaf utilities −1, +1, −1; the root's value is +1)
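The rule on this slide, each node's value is the max of its children's values, can be written as a short recursion. The nested-list tree encoding (numeric leaves, lists for internal nodes) is an assumption made for the example.

```python
# Single-player game tree: a node is either a terminal utility (a number)
# or a list of child nodes. The tree below mirrors the slide's figure,
# with leaf utilities -1, +1, -1.

def node_value(node):
    """Best outcome reachable from this node under optimal play."""
    if isinstance(node, (int, float)):               # leaf: utility given
        return node
    return max(node_value(child) for child in node)  # maxValue of children

tree = [[-1, +1], [-1]]
```

With this tree, `node_value(tree)` is +1: the player can force the win by choosing the left branch.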
Adversarial Games
• Deterministic, zero-sum games:
– Tic-tac-toe, chess, checkers
– One player maximizes the result
– The other minimizes the result
• Minimax search:
– A state-space search tree
– Players alternate turns
– Each node has a minimax value: the best achievable utility
against a rational (optimal) adversary
Computing Minimax Values
• Two recursive functions (plus a dispatcher):
– max-value takes the max of the successors' values
– min-value takes the min of the successors' values

def value(state):
    if the state is a terminal state: return the state's utility
    if the agent is MAX: return max-value(state)
    if the agent is MIN: return min-value(state)

def max-value(state):
    initialize v = −∞
    for each successor of state:
        v = max(v, value(successor))
    return v

def min-value(state):
    initialize v = +∞
    for each successor of state:
        v = min(v, value(successor))
    return v
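The pseudocode above translates almost line for line into Python. A minimal sketch, assuming a nested-list tree encoding (numeric leaves are terminal utilities, lists are internal nodes) with strictly alternating MAX/MIN levels:

```python
def minimax_value(node, is_max):
    """Minimax value of a node: utilities live at the leaves,
    and MAX/MIN levels strictly alternate."""
    if isinstance(node, (int, float)):      # terminal: return its utility
        return node
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# MAX to move at the root, three MIN nodes below
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
```

Here `minimax_value(tree, True)` returns 3: the MIN children have values 3, 2, and 2, and MAX picks the best of them.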
Minimax Example
Minimax search performs a complete DFS exploration of the game tree.
(Figure: node values being filled in during the DFS; each node starts
at ? or ±∞ and settles to its minimax value, here 3 at the root.)
Minimax Example
(Figure: the completed minimax tree; the interior node values shown are 3, 3, 2, 2.)
Tic-Tac-Toe Game Tree
• Fewer than 9! terminal nodes
• The minimax value at the root is 0: with optimal play, tic-tac-toe is a draw
Minimax Properties
• Optimal against a perfect player. Otherwise?
– If MIN does not play optimally, MAX can do even better (other
strategies exist that exploit the mistakes)
• Time complexity?
– O(b^m): exponential time, totally impractical
• Space complexity?
– O(bm)
• For chess, b ≈ 35, m ≈ 100
– An exact solution is completely infeasible
– But do we need to explore the whole tree?
Resource Limits
Two real-time approaches when
we cannot search all the way to the leaves (terminal states):
• Truncate subtrees
– Depth-limited search
– Iterative deepening
• Prune parts of the tree that are
provably not useful
– Alpha-beta pruning
Depth-limited Search
Depending on how much time you have, search to an appropriate depth.
• Depth-limited search
– search only to a limited depth of the tree
– replace terminal utilities with a
heuristic evaluation function Eval
for non-terminal positions
– the guarantee of optimal play is gone
– more plies make a BIG difference
• Example:
– suppose we have 100 seconds for each move, and
can explore 10K nodes/sec
– so we can check 1M nodes per move
– this reaches about depth 8: a decent chess program
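Depth-limited search only changes the recursion's base case: at the cutoff, a heuristic Eval replaces the true utility. A sketch, assuming the same nested-list tree encoding and a caller-supplied `evaluate` function:

```python
def depth_limited_value(node, is_max, depth, evaluate):
    """Minimax down to a fixed depth; positions at the cutoff are
    scored with a heuristic evaluation function instead of searched."""
    if isinstance(node, (int, float)):       # true terminal: exact utility
        return node
    if depth == 0:                           # cutoff: estimate, don't search
        return evaluate(node)
    values = [depth_limited_value(child, not is_max, depth - 1, evaluate)
              for child in node]
    return max(values) if is_max else min(values)
```

With enough depth this matches plain minimax; at depth 0 it just returns the heuristic estimate, so play degrades gracefully as the limit shrinks.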
Iterative Deepening
If you are not sure how much time you have, search depth by depth.
• Iterative deepening: use DFS to do BFS
− Do a DFS which only searches paths of
length 1 (DFS gives up on any longer path)
− If there is still time, do a DFS which only
searches paths of length 2 or less
− If there is still time, do a DFS which only
searches paths of length 3 or less
… and so on
Note: the inaccuracy of the Eval function matters less and
less the deeper the search goes!
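The schedule above is just a loop that reruns a depth-limited search with a growing limit until time runs out, keeping the deepest completed answer. A sketch; the `search_to_depth` callback and the simple deadline check are assumptions:

```python
import time

def iterative_deepening(search_to_depth, time_budget_s):
    """Run depth-limited searches at depths 1, 2, 3, ... until the time
    budget is spent; return the result of the deepest completed search.
    `search_to_depth(d)` is assumed to return the best move at depth d."""
    deadline = time.monotonic() + time_budget_s
    best, depth = None, 0
    while time.monotonic() < deadline:
        depth += 1
        best = search_to_depth(depth)    # keep the deepest finished answer
    return best, depth
```

In practice the clock should also be checked inside the depth-limited search itself, since the final iteration can overrun the deadline.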
Evaluation Functions
• A function which scores non-terminal states (estimated achievable
utility)
• The ideal function returns the exact utility of the state
• In practice: typically a weighted linear sum of features of the state:

Eval(s) = w₁f₁(s) + w₂f₂(s) + ⋯ + wₙfₙ(s)

• e.g. f₁(s) = (numWhiteQueens − numBlackQueens), etc.
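The weighted linear sum is just a dot product of feature values and weights. A sketch; the feature functions, the toy state representation, and the weights below are invented for illustration:

```python
def linear_eval(state, features, weights):
    """Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)."""
    return sum(w * f(state) for f, w in zip(features, weights))

# Hypothetical chess-like features over a toy piece-count state
features = [
    lambda s: s["white_queens"] - s["black_queens"],   # queen advantage
    lambda s: s["white_pawns"] - s["black_pawns"],     # pawn advantage
]
weights = [9.0, 1.0]   # assumed: a queen worth roughly nine pawns
```

Learning good weights (or a better, nonlinear Eval) is what separates decent game-playing programs from strong ones.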
Pruning in Minimax Search
(Figure: running value bounds during the search. The root MAX node's bound
rises −∞ → 3 → 3 → 3; the first MIN child's bound falls +∞ → 3 → 3 → 3
(value 3); the last MIN child's bound falls +∞ → 14 → 5 → 2 (value 2).)
Alpha-Beta Pruning
• General configuration
– We are computing the MIN-VALUE at some node n
– We are looping over n's children
– n's value estimate is dropping
– α is the best value that MAX
can get at any choice point
along the current path
– If n's value estimate becomes
worse than α,
MAX will avoid n, so we can stop
considering n's other children
– Define β similarly for MIN
Alpha-Beta Pruning Example

α is MAX's best choice (highest) found so far along the current path;
β is MIN's best choice (lowest) found so far along the current path.

At a MAX node, with value v:
    v = max(v, MinV(child, α, β))
    if v ≥ β then return v
    else α = max(α, v)

At a MIN node, with value u:
    u = min(u, MaxV(child, α, β))
    if u ≤ α then return u
    else β = min(β, u)

(Figure: example trace. Starting from α = −∞, β = +∞ at the root, the first
MIN child evaluates leaves 3, 12, 8, returns 3, and raises α to 3. In the
second MIN child, the first leaf 2 lowers β to 2 ≤ α, so its remaining
children are pruned. In the third MIN child, leaves 14 and 5 lower β to 5;
the final leaf 2 gives a value ≤ α. The root's minimax value is 3.)
Alpha-Beta Pseudocode
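The pseudocode figure on this slide did not survive extraction, so here is a sketch that follows the α/β update rules from the example trace above, again assuming the nested-list tree encoding (numeric leaves, lists of children):

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning.
    alpha: best value MAX can already guarantee on the current path;
    beta:  best value MIN can already guarantee on the current path."""
    if isinstance(node, (int, float)):
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, False, alpha, beta))
            if v >= beta:              # MIN above would never allow this
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = math.inf
        for child in node:
            v = min(v, alphabeta(child, True, alpha, beta))
            if v <= alpha:             # MAX above would never allow this
                return v
            beta = min(beta, v)
        return v
```

On the example tree `[[3, 12, 8], [2, 4, 6], [14, 5, 2]]` this returns the same root value as plain minimax, 3, while skipping the pruned leaves.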
Alpha-Beta Pruning Properties
• This pruning has no effect on the final result at the root
• Values of intermediate nodes might be wrong!
– Important: children of the root may have the wrong value
• Good child ordering improves the effectiveness of pruning
• With "perfect ordering":
– time complexity drops to O(b^(m/2))
– this doubles the solvable depth!
– a full search of, e.g., chess is still hopeless…
• This is a simple example of metareasoning (computing
about what to compute)