CS 416
Artificial Intelligence
Lecture 8
Adversarial Search
Chapter 6
TA Office Hours
Chris White Office Hours
• Olsson 238
• Tuesday after class
• Friday 11:30 – 1:00
Buddha on the Brain (Wired Issue 14.02)
The neuroscience of meditation
• Tibetan monk with more
than 10,000 hours of
meditation time
EEG
• 30x greater gamma waves
(focused thought)
• Larger active area of
prefrontal cortex
Think “kindness and compassion”
“The world’s fastest runners stopped
racing cars years ago”
2003 Kasparov v. Deep Junior ends in a 3-3 Draw
Chess Article
Garry Kasparov reflects on computerized chess
• IBM should have released the contents of Deep Blue to chess
community to advance research of computation as it relates to chess
• Kudos to Deep Junior for putting information in public domain so state of
the art can advance
• Deep Blue made one good move that surprised Kasparov (though he
thinks a person was in the loop)
• Deep Junior made a fantastic sacrifice that reflects a new
accomplishment for computerized chess
http://www.opinionjournal.com/extra/?id=110003081
Where we’ve been
Search
• Find optimal sequence of actions
– Tree searching
• Find optimal input to a function
– Simulated annealing
– Genetic algorithms
– Gradient descent
Adversarial Search
Problems involving
• Multiple agents
• Competitive environments
• Agents have conflicting goals
Also called games
Since the dawn of time?
Oldest known written fair-division problem
Talmud – Jewish Oral Law dating to first century
• A Bankruptcy Case
– A man married three wives and in each marriage contract he promised
each of them different amounts of money upon his death:
 ▪ one of them gets $100
 ▪ another gets $200
 ▪ the third gets $300
– When he died, he had less than $600
• What do you do?
Bankruptcy law
• Modern bankruptcy law gives each creditor a share of the
estate proportional to their individual claim, whatever the size of
the estate
– A receives 100/600 * estate_holdings
– B receives 200/600 * estate_holdings
– C receives 300/600 * estate_holdings
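The proportional rule above can be sketched in a few lines of Python. The function name `proportional_shares` is a hypothetical helper, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def proportional_shares(claims, estate):
    """Split `estate` in proportion to each claim (hypothetical helper)."""
    total = sum(claims)
    return [Fraction(claim, total) * estate for claim in claims]

# Claims of 100, 200 and 300 against an estate of 300.
shares = proportional_shares([100, 200, 300], 300)
print([float(s) for s in shares])  # [50.0, 100.0, 150.0]
```

Whatever the estate size, the shares always sum to the estate and keep the 1:2:3 ratio of the claims.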
Bankruptcy law
Rabbi Nathan in Mishnah section of Talmud
Claims↓ \ Estate→    100     200     300
100                  33.3    50      50
200                  33.3    75      100
300                  33.3    75      150
This allocation not understood until recently
Unexplained until 1984
Aumann and Maschler (Israeli Mathematicians)
• Realistically, when you die, people could come out of the
woodwork saying you owe them money. Some could
coalesce into deceptive groups. How can we reduce the
incentives (rewards) of forming such groups?
• Minimize largest dissatisfaction among all possible
coalitions
• A common fair-division problem
– http://www.math.gatech.edu/~hill/publications/cv.dir/madevice.pdf
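Rabbi Nathan's allocations can be reproduced with a short sketch. It assumes the usual statement of the rule identified by Aumann and Maschler: work with half of each claim, hand out equal awards capped at the half-claims while the estate is at most half the total claims, and otherwise hand out equal losses capped at the half-claims. The function names `constrained_equal` and `talmud_rule` are hypothetical.

```python
def constrained_equal(caps, amount):
    """Share `amount` equally, but never give anyone more than their cap."""
    awards = [0.0] * len(caps)
    remaining = amount
    # Serve the smallest caps first so leftover money flows to larger ones.
    order = sorted(range(len(caps)), key=lambda i: caps[i])
    for served, i in enumerate(order):
        equal_share = remaining / (len(caps) - served)
        awards[i] = min(caps[i], equal_share)
        remaining -= awards[i]
    return awards

def talmud_rule(claims, estate):
    """Divide `estate` among `claims` (assumes estate <= sum of claims)."""
    half_claims = [c / 2 for c in claims]
    if estate <= sum(half_claims):
        return constrained_equal(half_claims, estate)
    # Otherwise share the shortfall: equal losses capped at half the claim.
    losses = constrained_equal(half_claims, sum(claims) - estate)
    return [c - loss for c, loss in zip(claims, losses)]

for estate in (100, 200, 300):
    print(estate, [round(x, 1) for x in talmud_rule([100, 200, 300], estate)])
# 100 [33.3, 33.3, 33.3]
# 200 [50.0, 75.0, 75.0]
# 300 [50.0, 100.0, 150.0]
```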
Garment Principle
Two people claim a garment worth $100
• One claims the entire garment belongs to him
• The other claims half the garment is his
The one claiming the full garment gets $75
The one claiming half gets $25
Why?
Minimizing maximum dissatisfaction
• The one who wants the entire garment cedes nothing to the
other and thus wants $100.
• The one who wants half the garment would be perfectly
happy to cede $50 to the other.
– But a split of 50/50 would make one person unhappy and
the other perfectly happy
 ▪ How to make them equally unhappy?
A $100 Garment
                          Person 1   Person 2
Requested amount          100        50
Ceded from competitor     50         0
Split what remains        25         25
Sum of ceded and split    75         25
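The garment computation can be written as a small function: each person first receives whatever the other does not claim, and the contested remainder is split evenly. The name `contested_garment` is hypothetical, and the sketch assumes neither claim exceeds the garment's worth.

```python
def contested_garment(claim_1, claim_2, worth):
    """Return (share_1, share_2) under the garment principle."""
    ceded_to_1 = max(worth - claim_2, 0)   # part person 2 does not claim
    ceded_to_2 = max(worth - claim_1, 0)   # part person 1 does not claim
    contested = worth - ceded_to_1 - ceded_to_2
    return ceded_to_1 + contested / 2, ceded_to_2 + contested / 2

print(contested_garment(100, 50, 100))  # (75.0, 25.0)
```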
Game Theory
Studied in mathematics, economics, and finance
In this part of AI we limit games to:
• deterministic
• turn-taking
• two-player
• zero-sum
• perfect information
Games
“Shall we play a game?”
Let’s play tic-tac-toe
Tic-Tac-Toe game tree
MAX’s first move
MIN’s first move
Each layer is a ply
What data do we need to play?
Initial State
• How does the game start?
Successor Function
• A list of legal (move, state) pairs for each state
Terminal Test
• Determines when game is over
Utility Function
• Provides numeric value for all terminal states
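For tic-tac-toe, the four pieces of data could be sketched as follows. The state representation (a 9-tuple board plus the player to move) and the utilities +1 / 0 / -1 from X's point of view are assumptions made for illustration.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def initial_state():
    """Empty board, X (MAX) moves first."""
    return (" ",) * 9, "X"

def successors(state):
    """List of legal (move, state) pairs; a move is the square index."""
    board, player = state
    other = "O" if player == "X" else "X"
    return [(i, (board[:i] + (player,) + board[i + 1:], other))
            for i, cell in enumerate(board) if cell == " "]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def terminal_test(state):
    """The game is over on a win or when the board is full."""
    board, _ = state
    return winner(board) is not None or " " not in board

def utility(state):
    """Numeric value of a terminal state from X's point of view."""
    return {"X": 1, "O": -1, None: 0}[winner(state[0])]

print(len(successors(initial_state())))  # 9
```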
Minimax strategy
Optimal Strategy
• Leads to outcomes at least as good as any other strategy
when playing an infallible opponent
• Pick the option that minimizes the maximum damage your
opponent can do
– minimize the worst-case outcome
– because your skillful opponent will certainly find the most
damaging move
Minimax
Algorithm
• MinimaxValue(n) =
– Utility(n), if n is a terminal state
– max of MinimaxValue(s) over all successors s, if n is a MAX node
– min of MinimaxValue(s) over all successors s, if n is a MIN node
This is optimal strategy assuming both players
play optimally from there until end of game
A two-ply example
MIN considers minimizing how much it loses…
A two-ply example
MAX considers maximizing how much it wins…
Minimax Algorithm
We wish to identify minimax decision at the root
• Recursive evaluation of all nodes in game tree
• Time complexity = O(b^m), for branching factor b and maximum depth m
Feasibility of minimax?
How about a nice game of chess?
• Avg branching = 35 and avg # moves = 50 for each player
– O(35^100) time complexity ≈ 10^154 nodes
 ▪ 10^40 distinct nodes
10^81 atoms in universe!
Minimax is impractical if directly applied to chess
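The size estimate is easy to check with logarithms. The sketch below uses only the figures quoted on the slide (branching factor 35, 50 moves per player).

```python
import math

branching = 35       # average legal moves per position
plies = 2 * 50       # 50 moves by each of the two players

exponent = plies * math.log10(branching)
print(f"35^100 is about 10^{exponent:.0f}")  # 35^100 is about 10^154
```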
Pruning minimax tree
Are there times when you know you need not
explore a particular move?
• When the move is poor?
• Poor compared to what?
• Poor compared to what you have explored so far
Alpha-beta pruning
• α
– the value of the best (highest) choice so far in search of
MAX
• β
– the value of the best (lowest) choice so far in search of
MIN
• Order of considering successors matters
– If possible, consider best successors first
Notation on tree
Max wants to maximize α
Min wants to minimize β
Alpha-beta pruning
• MIN knows it will lose at most 3.
• MAX worries that –inf is still possible
• MIN knows player MAX has an option of going to node B with a min payoff of 3. MAX will never take action C and culling is possible.
• MAX knows that 3 is worst case for this node.
• MAX knows that it can accomplish a score of at least 3. Discovery could find a higher value
Alpha-beta pruning
• Without pruning
– O(b^d) nodes to explore
• With a good heuristic pruner (consider part (f) of figure)
– O(b^(d/2))
 ▪ Chess can drop from O(35^100) to O(6^100)
• With a random heuristic (you don’t try the best thing first)
– O(b^(3d/4))
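One way to read these bounds is as a smaller effective branching factor at the same depth: b^(d/2) = (b^(1/2))^d and b^(3d/4) = (b^(3/4))^d. A quick check for chess, taking b = 35 from the earlier slide:

```python
b = 35

best_case = b ** 0.5      # effective branching with ideal move ordering
random_case = b ** 0.75   # effective branching with random move ordering

print(round(best_case, 1))    # about 5.9, which the slide rounds to 6
print(round(random_case, 1))  # about 14.4
```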
Real-time decisions
What if you don’t have enough time to explore
entire search tree?
• We cannot search all the way down to terminal state for all
decision sequences
• Use a heuristic to approximate (guess) eventual terminal
state
• Replace non-terminal states with output of heuristic and treat
as if they were terminal
Evaluation Function (Estimator)
The heuristic that estimates expected utility
• Cannot take too long (otherwise continue w/o it)
• It should preserve the ordering among terminal states
– otherwise it can cause bad decision making
• Define features of game state that assist in evaluation
– what are features of chess?
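One familiar chess feature is material balance. The sketch below is an assumption for illustration, not the evaluation function of any particular program: it uses the conventional piece values (pawn 1, knight 3, bishop 3, rook 5, queen 9) and a made-up encoding in which uppercase letters are MAX's pieces and lowercase letters are MIN's.

```python
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}  # conventional values

def material_eval(pieces):
    """Material balance from MAX's point of view.

    `pieces` is a string of piece letters (hypothetical encoding):
    uppercase for MAX, lowercase for MIN. Kings score 0.
    """
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

# MAX: queen, two rooks, three pawns (22). MIN: queen, rook, four pawns (18).
print(material_eval("QRRPPPqrpppp"))  # 4
```

A fuller evaluator would typically combine several such features as a weighted sum.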
Truncating minimax search
When do you recurse or use evaluation function?
• Cutoff-Test (state, depth) returns 1 or 0
– When 1 is returned, use evaluation function
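A depth-limited search with a cutoff test might look like the sketch below. The dictionary-based tree, the stored heuristic estimates, and the names `leaf`, `inner`, and `h_minimax` are all hypothetical; the cutoff test here simply stops at a fixed depth or at a terminal node.

```python
def leaf(utility):
    return {"value": utility, "children": []}

def inner(estimate, *children):
    """Non-terminal node carrying a heuristic estimate of its value."""
    return {"value": estimate, "children": list(children)}

def cutoff_test(node, depth, limit):
    """True when the search should stop and evaluate this node."""
    return depth >= limit or not node["children"]

def h_minimax(node, limit, depth=0, is_max=True):
    if cutoff_test(node, depth, limit):
        # Utility at a terminal node, heuristic estimate otherwise.
        return node["value"]
    values = [h_minimax(child, limit, depth + 1, not is_max)
              for child in node["children"]]
    return max(values) if is_max else min(values)

tree = inner(0,
             inner(4, leaf(3), leaf(12), leaf(8)),
             inner(5, leaf(2), leaf(4), leaf(6)))
print(h_minimax(tree, limit=1))  # 5, trusting the estimates at depth 1
print(h_minimax(tree, limit=2))  # 3, searching to the terminal states
```

With these made-up estimates the shallow search prefers the second move while the full search prefers the first, which is the kind of bad decision an inaccurate evaluation function can cause.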
When do you cut off?
• Cutoff if state is stable or quiescent (more predictable)
When do you cut off?
• When exploring beyond a certain depth
– The horizon effect
When do you cut off?
Use of a good heuristic as cutoff will expedite, but
not invalidate search (same result w/o heuristic)
Also…
• Cutoff moves you know are bad (forward pruning)
• Risk losing good states down the road
Benefits of truncation
Comparing Chess
Number of plies that can be considered per unit time
• Using minimax: 5 ply
• Average human: 6-8 ply
• Using alpha-beta: 10 ply
• Intelligent pruning: 14 ply
Games with chance
How to include chance in game tree?
• Add chance nodes
Expectiminimax
Expectiminimax(n) =
• Utility(n), if n is a terminal state
• max_{s ∈ Successors(n)} EXPECTIMINIMAX(s), if n is a MAX node
• min_{s ∈ Successors(n)} EXPECTIMINIMAX(s), if n is a MIN node
• Σ_{s ∈ Successors(n)} P(s) * EXPECTIMINIMAX(s), if n is a chance node
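The four cases can be sketched on a small explicit tree. The tuple encoding of nodes (`"leaf"`, `"max"`, `"min"`, `"chance"`) and the numbers in the example are assumptions for illustration.

```python
def expectiminimax(node):
    """Value of a node in a game tree that includes chance nodes."""
    kind, payload = node
    if kind == "leaf":
        return payload                                    # terminal utility
    if kind == "max":
        return max(expectiminimax(child) for child in payload)
    if kind == "min":
        return min(expectiminimax(child) for child in payload)
    # Chance node: payload is a list of (probability, subtree) pairs.
    return sum(p * expectiminimax(child) for p, child in payload)

tree = ("max", [
    ("chance", [(0.5, ("min", [("leaf", 2), ("leaf", 3)])),
                (0.5, ("leaf", 4))]),
    ("chance", [(0.5, ("leaf", 0)),
                (0.5, ("leaf", 5))]),
])
print(expectiminimax(tree))  # 3.0
```

The first chance node is worth 0.5 * 2 + 0.5 * 4 = 3.0 and the second 0.5 * 0 + 0.5 * 5 = 2.5, so MAX picks the first.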
Pruning
Can we prune search in games of chance?
• Think about alpha-beta pruning
– With alpha-beta, we don’t explore nodes that we know are
worse than what we know we can accomplish
– With randomness, we never really know what we will accomplish
 ▪ chance node values are average of successors
Thus it is hard to use alpha-beta
• Best case is to bound max/min outcomes
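If utilities are known to lie in a fixed range, a chance node's value can be bounded before all of its successors are examined. The sketch below, with the hypothetical helper `chance_upper_bound`, assumes every unexplored outcome takes the largest possible utility.

```python
def chance_upper_bound(evaluated, remaining_prob, max_utility):
    """Upper bound on a chance node's value.

    `evaluated` holds (probability, value) pairs for outcomes already
    searched; unexplored outcomes are assumed to reach `max_utility`.
    """
    explored = sum(p * value for p, value in evaluated)
    return explored + remaining_prob * max_utility

# Utilities assumed to lie in [-2, 2]; one of two equally likely
# outcomes has been searched and turned out to be worth -2.
bound = chance_upper_bound([(0.5, -2.0)], remaining_prob=0.5, max_utility=2.0)
print(bound)  # 0.0

alpha = 1.0           # MAX can already guarantee 1.0 elsewhere
print(bound < alpha)  # True, so the rest of this chance node can be skipped
```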
History of Games
Chess, Deep Blue
• IBM: 30 RS/6000 comps with 480 custom VLSI chess chips
• Deep Thought design came from Campbell and Hsu at CMU
• 126 mil nodes / s
• 30 bil positions per move
• routinely reaching depth of 14
• iterative deepening alpha-beta search
Deep Blue
• evaluation function had 8000 features
• 4000 opening moves in memory
• 700,000 grandmaster games from which recommendations
extracted
• many endgames solved for all five piece combos
Deep Junior (Israeli Co.) – 8-processor
1.6 GHz Intel w/ 8 GB RAM (2003)
Checkers
Arthur Samuel of IBM, 1952
• program learned by playing against itself
• beat champion 1962 (but human clearly made error)
• 19 KB of memory
• 0.000001 GHz processor
Checkers
Chinook, Jonathan Schaeffer, 1990
• Alpha-beta search on regular PCs
• database of all 444 billion endgame positions with 8 pieces
• Played against Marion Tinsley
– world champion for over 40 years
– lost only 3 games in 40 years
– Chinook won two games, but lost match
• Rematch with Tinsley was incomplete for health reasons
– Chinook became world champion
Othello
Smaller search space (5 to 15 legal moves)
Humans are no match for computers
Backgammon
Gerald Tesauro, TD-Gammon, 1992
• Reliably ranked in top-three players of world
• Learned to play through playing against itself
– Reinforcement Learning
Go
Most popular board game in Asia
• Branching factor of 361
• Few competent computer players
– Weak amateur at best
Discussion
How reasonable is minimax?
• perfectly performing opponent
• perfect knowledge of leaf node evaluations
• strong assumptions
Metareasoning
Reasoning about reasoning
• alpha-beta is one example
– think before you think
– think about utility of thinking about something before you
think about it
– don’t think about choices you don’t have to think about