
CS 188: Artificial Intelligence
Fall 2006
Lecture 7: Adversarial Search
9/19/2006
Dan Klein – UC Berkeley
Many slides over the course adapted from either Stuart Russell or Andrew Moore
Announcements
• Project 1.2 is up (Single-Agent Pacman)
• Critical update: make sure you have the most recent version!
• Reminder: you are allowed to work with a partner!
• Change to John's section: now M 3-4pm in 4 Evans
Motion as Search
• Motion planning as a path-finding problem
• Problem: configuration space is continuous
• Problem: under-constrained motion
• Problem: configuration space can be complex
[Figure: why are there two paths from 1 to 2?]
Decomposition Methods
• Break c-space into discrete regions
• Solve as a discrete problem
Approximate Decomposition
• Break c-space into a grid
• Search (A*, etc.) — sketch below
• What can go wrong?
• If no path is found, can subdivide and repeat
• Problems?
  • Still scales poorly
  • Incomplete*
  • Wiggly paths
[Figure: grid over c-space with start S and goal G]
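A minimal sketch of the grid approach (all names here are illustrative; cells are (x, y) tuples and blocked is a set of cells occupied by obstacles):

    import heapq

    def astar_grid(start, goal, blocked, width, height):
        # A* over a 4-connected grid with the Manhattan-distance heuristic
        def h(cell):
            return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])
        frontier = [(h(start), 0, start, [start])]
        best_g = {start: 0}
        while frontier:
            f, g, cell, path = heapq.heappop(frontier)
            if cell == goal:
                return path
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nxt = (cell[0] + dx, cell[1] + dy)
                if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                        and nxt not in blocked
                        and g + 1 < best_g.get(nxt, float('inf'))):
                    best_g[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt, path + [nxt]))
        return None  # no path at this resolution: the grid may just be too coarse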
Hierarchical Decomposition
• Actually used in practical systems
• But:
  • Not optimal
  • Not complete
  • Still hopeless above a small number of dimensions
Skeletonization Methods
• Decomposition methods turn configuration space into a grid
• Skeletonization methods turn it into a set of points, with preset linear paths between them
Visibility Graphs
• Shortest paths:
  • No obstacles: straight line
  • Otherwise: will go from vertex to vertex
  • Fairly obvious, but somewhat awkward to prove
• Visibility methods:
  • All free vertex-to-vertex lines (visibility graph)
  • Search using, e.g., A* (sketch below)
  • Can be done in O(n^3) easily, O(n^2 log n) less easily
• Problems?
  • Bang, screech!
  • Not robust to control errors
  • Wrong kind of optimality?
[Figure: visibility graph between q_start and q_goal]
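A rough sketch of the easy O(n^3) construction, under simplifying assumptions: nodes are (x, y) tuples (the polygon vertices plus q_start and q_goal), obstacle edges are vertex pairs, and only proper crossings are tested, so collinear grazing cases would need extra care:

    import itertools, heapq, math

    def segments_cross(p, q, a, b):
        # True iff open segments pq and ab properly cross (shared endpoints allowed)
        def ccw(u, v, w):
            return (v[0]-u[0]) * (w[1]-u[1]) - (v[1]-u[1]) * (w[0]-u[0])
        return ccw(a, b, p) * ccw(a, b, q) < 0 and ccw(p, q, a) * ccw(p, q, b) < 0

    def visibility_graph(nodes, obstacle_edges):
        # connect every pair of nodes whose straight line crosses no obstacle edge
        graph = {v: [] for v in nodes}
        for p, q in itertools.combinations(nodes, 2):
            if not any(segments_cross(p, q, a, b) for a, b in obstacle_edges):
                d = math.dist(p, q)
                graph[p].append((q, d))
                graph[q].append((p, d))
        return graph  # O(n^2) pairs x O(n) edge tests: the "easy" O(n^3)

    def shortest_path(graph, start, goal):
        # uniform-cost search (Dijkstra) over the visibility graph
        frontier, seen = [(0.0, start, [start])], set()
        while frontier:
            cost, node, path = heapq.heappop(frontier)
            if node == goal:
                return cost, path
            if node in seen:
                continue
            seen.add(node)
            for nbr, d in graph[node]:
                if nbr not in seen:
                    heapq.heappush(frontier, (cost + d, nbr, path + [nbr]))
        return None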
Voronoi Decomposition
• Voronoi regions: points colored by closest obstacle
[Figure: regions colored Y, R, G, B by closest obstacle]
• Voronoi diagram: borders between regions
• Can be calculated efficiently for points (and polygons) in 2D
• In higher dimensions, some approximation methods
Voronoi Decomposition
• Algorithm (sketch below):
  • Compute the Voronoi diagram of the configuration space
  • Compute the shortest path (line) from the start to the closest point on the Voronoi diagram
  • Compute the shortest path (line) from the goal to the closest point on the Voronoi diagram
  • Compute the shortest path from start to goal along the Voronoi diagram
• Problems:
  • Hard above 2D, hard with complex obstacles
  • Can do weird things (see figure):
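A rough sketch using SciPy, assuming point obstacles; snapping the start and goal to the nearest Voronoi vertex is a simplification standing in for the shortest-line steps above:

    import math, heapq
    import numpy as np
    from scipy.spatial import Voronoi

    def voronoi_path(obstacle_points, start, goal):
        # plan along the Voronoi diagram of point obstacles (stay maximally clear)
        vor = Voronoi(np.asarray(obstacle_points))
        verts = [tuple(v) for v in vor.vertices]
        graph = {v: [] for v in verts}
        for i, j in vor.ridge_vertices:
            if i >= 0 and j >= 0:            # ignore ridges that run off to infinity
                a, b = verts[i], verts[j]
                d = math.dist(a, b)
                graph[a].append((b, d))
                graph[b].append((a, d))
        # simplification: snap start and goal to their nearest Voronoi vertices
        s = min(verts, key=lambda v: math.dist(v, start))
        g = min(verts, key=lambda v: math.dist(v, goal))
        frontier, seen = [(0.0, s, [s])], set()
        while frontier:
            cost, node, path = heapq.heappop(frontier)
            if node == g:
                return [start] + path + [goal]
            if node in seen:
                continue
            seen.add(node)
            for nbr, d in graph[node]:
                if nbr not in seen:
                    heapq.heappush(frontier, (cost + d, nbr, path + [nbr]))
        return None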
Probabilistic Roadmaps
• Idea: just pick random points as nodes in a visibility graph
• This gives probabilistic roadmaps (sketch below)
• Very successful in practice
• Lets you add points where you need them
• If insufficient points: incomplete, or weird paths
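A minimal sketch; sample_free and segment_is_free are hypothetical stand-ins for sampling a collision-free configuration and checking a straight-line connection:

    import math

    def build_prm(n_samples, k, sample_free, segment_is_free):
        # probabilistic roadmap: sample free configurations,
        # link each to its k nearest neighbors with a free straight line
        nodes = [sample_free() for _ in range(n_samples)]
        graph = {v: [] for v in nodes}
        for v in nodes:
            nearest = sorted((u for u in nodes if u != v),
                             key=lambda u: math.dist(u, v))[:k]
            for u in nearest:
                if segment_is_free(v, u):
                    d = math.dist(u, v)   # duplicate edges are harmless for search
                    graph[v].append((u, d))
                    graph[u].append((v, d))
        return graph

To plan, add the start and goal as two extra samples and search the roadmap with A* or Dijkstra; too few samples gives exactly the incompleteness and weird paths noted above.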
Roadmap Example
Potential Field Methods
• So far: an implicit preference for short paths
• A rational agent should balance distance with risk!
• Idea: introduce a cost for being close to an obstacle
• Can do this with discrete methods (how?)
• Usually most natural with continuous methods
Potential Fields
• Cost for:
  • Being far from the goal
  • Being near an obstacle
• Go downhill (sketch below)
• What could go wrong?
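A toy sketch of going downhill: quadratic attraction to the goal plus inverse-square repulsion from point obstacles, with made-up constants:

    import math

    def potential_step(pos, goal, obstacles, step=0.05, k_att=1.0, k_rep=0.3):
        # one downhill step on U = k_att*|pos-goal|^2 + sum_i k_rep/|pos-obs_i|^2
        gx = 2 * k_att * (pos[0] - goal[0])
        gy = 2 * k_att * (pos[1] - goal[1])
        for ox, oy in obstacles:
            dx, dy = pos[0] - ox, pos[1] - oy
            d2 = dx * dx + dy * dy
            gx -= 2 * k_rep * dx / (d2 * d2)   # gradient of k_rep / d^2
            gy -= 2 * k_rep * dy / (d2 * d2)
        return (pos[0] - step * gx, pos[1] - step * gy)

Iterating potential_step until the position is near the goal traces a path; getting trapped in a local minimum of U is the classic answer to "what could go wrong?"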
Adversarial Search
[DEMO 1]
Game Playing State-of-the-Art
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used an endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 443,748,401,247 positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov in a six-game match in 1997. Deep Blue examined 200 million positions per second and used very sophisticated evaluation and undisclosed methods for extending some lines of search up to 40 ply.
• Othello: human champions refuse to compete against computers, which are too good.
• Go: human champions refuse to compete against computers, which are too bad. In Go, b > 300, so most programs use pattern knowledge bases to suggest plausible moves.
• Pacman: unknown
Game Playing
• Axes:
  • Deterministic or stochastic?
  • One, two, or more players?
  • Perfect information (can you see the state)?
• Want algorithms for calculating a strategy (policy) which recommends a move in each state
Deterministic Single-Player?
• Deterministic, single player, perfect information:
  • Know the rules
  • Know what actions do
  • Know when you win
  • E.g. Freecell, 8-Puzzle, Rubik's cube
• … it's just search!
• Slight reinterpretation:
  • Each node stores the best outcome it can reach
  • This is the maximal outcome of its children
  • Note that we don't store path sums as before
  • After search, can pick the move that leads to the best node
[Figure: single-agent search tree with terminal outcomes lose, win, lose]
Deterministic Two-Player
• E.g. tic-tac-toe, chess, checkers
• Minimax search:
  • A state-space search tree
  • Players alternate
  • Each layer, or ply, consists of a round of moves
  • Choose the move to the position with the highest minimax value = best achievable utility against best play
• Zero-sum games:
  • One player maximizes the result
  • The other minimizes the result
[Figure: two-ply tree, max layer over min layer, leaf values 8, 2, 5, 6]
Tic-tac-toe Game Tree
Minimax Example
Minimax Search
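A minimal recursive sketch (the state interface here, is_terminal / utility / successors / legal_moves / result, is hypothetical, not the course's project API):

    def minimax_value(state, max_turn):
        # value of a state under optimal play in a two-player zero-sum game
        if state.is_terminal():
            return state.utility()    # utility from the maximizer's point of view
        children = [minimax_value(c, not max_turn) for c in state.successors()]
        return max(children) if max_turn else min(children)

    def minimax_move(state, legal_moves, result):
        # root decision: pick the move whose successor state has the highest value
        return max(legal_moves(state),
                   key=lambda m: minimax_value(result(state, m), max_turn=False))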
Minimax Properties
• Optimal against a perfect player. Otherwise?
• Time complexity?
  • O(b^m)
• Space complexity?
  • O(bm)
• For chess, b ≈ 35, m ≈ 100
  • Exact solution is completely infeasible
  • But, do we need to explore the whole tree?
[Figure: minimax tree (max over min) with leaf values 10, 10, 9, 100]
[DEMO 2: minVsExp]
Resource Limits
• Cannot search to the leaves
• Limited search:
  • Instead, search a limited depth of the tree
  • Replace terminal utilities with an evaluation function for non-terminal positions
• Guarantee of optimal play is gone
• More plies make a BIG difference [DEMO 3: limitedDepth]
• Example:
  • Suppose we have 100 seconds and can explore 10K nodes/sec
  • So we can check 1M nodes per move
  • α-β reaches about depth 8: a decent chess program
[Figure: depth-limited tree, max over min layers; estimated root value 4, min children -2 and 4, unexplored branches marked ?]
Evaluation Functions
• A function which scores non-terminals
• Ideal function: returns the utility of the position
• In practice: typically a weighted linear sum of features:
  Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
• E.g. f1(s) = (num white queens – num black queens), etc.
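A tiny illustration of the weighted-feature form, with made-up features and weights:

    def evaluate(state, features, weights):
        # weighted linear evaluation: Eval(s) = sum_i w_i * f_i(s)
        return sum(w * f(state) for f, w in zip(features, weights))

    # hypothetical chess-flavored usage:
    # features = [lambda s: s.num_queens('white') - s.num_queens('black'),
    #             lambda s: s.num_pawns('white')  - s.num_pawns('black')]
    # weights  = [9.0, 1.0]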
Evaluation for Pacman
[DEMO 4: evalFunction]
Iterative Deepening
Iterative deepening uses DFS as a subroutine:
1. Do a DFS which only searches for paths of length 1 or less. (DFS gives up on any path of length 2)
2. If "1" failed, do a DFS which only searches paths of length 2 or less.
3. If "2" failed, do a DFS which only searches paths of length 3 or less.
… and so on.
This works for single-agent search as well!
Why do we want to do this for multiplayer games? (sketch below)
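A minimal game-playing sketch (value_at_depth is a hypothetical depth-limited minimax returning a (value, move) pair). Because each completed depth yields a usable move, the search can be stopped whenever time runs out, which is exactly what a move clock demands:

    import time

    def iterative_deepening_move(state, value_at_depth, time_limit=1.0):
        # search depth 1, then 2, 3, ..., keeping the move from the last completed depth
        deadline = time.time() + time_limit
        best_move, depth = None, 1
        while time.time() < deadline:
            _, move = value_at_depth(state, depth)   # may overshoot the deadline;
            best_move, depth = move, depth + 1       # a real engine aborts mid-search
        return best_move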
α-β Pruning Example
α-β Pruning
• General configuration:
  • α is the best value the Player can get at any choice point along the current path
  • If n is worse than α, MAX will avoid it, so prune n's branch
  • Define β similarly for MIN
[Figure: alternating Player and Opponent layers; a max node with bound α above a min node n with running value v and bound β]

α-β Pruning Pseudocode
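A standard sketch matching the minimax interface above (hypothetical is_terminal / utility / successors); call it as alphabeta(root, float('-inf'), float('inf'), True):

    def alphabeta(state, alpha, beta, max_turn):
        # minimax with alpha-beta pruning; returns the same value as plain minimax
        if state.is_terminal():
            return state.utility()
        if max_turn:
            v = float('-inf')
            for child in state.successors():
                v = max(v, alphabeta(child, alpha, beta, False))
                if v >= beta:       # MIN above will never allow this; prune
                    return v
                alpha = max(alpha, v)
            return v
        else:
            v = float('inf')
            for child in state.successors():
                v = min(v, alphabeta(child, alpha, beta, True))
                if v <= alpha:      # MAX above will never allow this; prune
                    return v
                beta = min(beta, v)
            return v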
α-β Pruning Properties
• Pruning has no effect on the final result
• Good move ordering improves the effectiveness of pruning
• With "perfect ordering":
  • Time complexity drops to O(b^(m/2))
  • Doubles solvable depth
• Full search of, e.g., chess is still hopeless!
• A simple example of metareasoning: here, reasoning about which computations are relevant
Non-Zero-Sum Games
• Similar to minimax:
  • Utilities are now tuples
  • Each player maximizes their own entry at each node
  • Propagate (or back up) values from children
[Figure: three-player game tree with utility triples, e.g. (1,2,6), (4,3,2), (6,1,2), (7,4,1), (5,1,1), (1,5,2), (7,7,1), (5,4,5)]
Stochastic Single-Player
• What if we don't know what the result of an action will be? E.g.,
  • In solitaire, the shuffle is unknown
  • In minesweeper, we don't know where the mines are
• Can do expectimax search (sketch below)
  • Chance nodes, like actions, except the environment controls the action chosen
  • Calculate the utility for each node
  • Max nodes as in search
  • Chance nodes take the average (expectation) of the values of their children
• Later, we'll learn how to formalize this as a Markov Decision Process
[Figure: tree with a max node over chance (average) nodes; leaf values 8, 2, 5, 6]
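A minimal sketch, again with a hypothetical state interface; the only change from minimax is that chance nodes average instead of minimize (outcomes() is assumed to yield (probability, successor) pairs):

    def expectimax(state, max_turn):
        # max nodes take the best child; chance nodes take the weighted average
        if state.is_terminal():
            return state.utility()
        if max_turn:
            return max(expectimax(child, False) for child in state.successors())
        return sum(p * expectimax(child, True) for p, child in state.outcomes())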
Stochastic Two-Player
• E.g. backgammon
• Expectiminimax (!)
  • Environment is an extra player that moves after each agent
  • Chance nodes take expectations; otherwise like minimax
Stochastic Two-Player
• Dice rolls increase b: 21 possible rolls with 2 dice
  • Backgammon ≈ 20 legal moves
  • Depth 4 = 20 × (21 × 20)^3 ≈ 1.2 × 10^9
• As depth increases, the probability of reaching a given node shrinks
  • So the value of lookahead is diminished
  • So limiting depth is less damaging
  • But pruning is less possible…
• TD-Gammon uses depth-2 search + a very good evaluation function + reinforcement learning: world-champion level play
What’s Next?
• Make sure you know what:
  • Probabilities are
  • Expectations are
• Next:
  • How to learn evaluation functions
  • Markov Decision Processes