Chapter 6 slides

COMP-4640: Intelligent & Interactive Systems
Game Playing
A game can be formally defined as a search problem with:
- an initial state
- a set of operators (actions or moves)
- a terminal test
- a utility function (payoff)
1. Multi-agent environment
– Multi-player games involve planning and acting in environments populated by other active agents.
– Agents use a sense/plan/act architecture that does not plan too far into the unpredictable future.
– With proper information, an agent can construct plans that consider the effects of the other agents' actions.
2. Zero-sum games
– In AI we consider the special case of deterministic, turn-taking, two-player, zero-sum games of perfect information.
– Either one player wins (and the other loses), or a draw results: +1 win, −1 loss, 0 draw.
– The agents' utility functions make the games adversarial.
Multi-agent environment example: Robot Soccer
Game tree (2-player,
deterministic, turns)
The Minimax Algorithm
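The slides present minimax as a diagram; a minimal runnable sketch, assuming the game tree is given explicitly as nested lists (leaves are utility values for MAX — real implementations recurse over states and operators instead, but the value computation is the same):

```python
def minimax(node, maximizing):
    """Minimax value of a game tree given as nested lists.
    A leaf is an int (MAX's utility); an internal node is a list of children."""
    if isinstance(node, int):      # terminal test
        return node                # utility (payoff)
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX root with three MIN children; the MIN values are 3, 2, 2, so MAX picks 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, True))   # prints 3
```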
• The evaluation function:
– must agree with the utility function on terminal (goal) states
– must be of reasonable complexity so that it can be computed quickly (a trade-off between accuracy and time)
– should be accurate
• The performance of the game-playing system depends on the accuracy ("goodness") of the evaluation function.
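As a concrete illustration, a toy evaluation function for chess based on material count; the piece weights are the conventional ones, and the board representation (a dict mapping squares to piece codes like 'wQ') is purely an assumption of this sketch:

```python
# Illustrative evaluation function: weighted material count for chess.
# Conventional piece values; kings are not counted.
PIECE_VALUE = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}

def evaluate(board):
    """Material balance from White's perspective.
    `board` maps squares to piece codes like 'wQ' (white queen) or 'bP'."""
    score = 0
    for piece in board.values():
        value = PIECE_VALUE.get(piece[1], 0)   # kings contribute 0
        score += value if piece[0] == 'w' else -value
    return score

board = {'e1': 'wK', 'd1': 'wQ', 'e8': 'bK', 'a8': 'bR', 'h7': 'bP'}
print(evaluate(board))   # 9 - (5 + 1) = 3
```

Such a function is cheap to compute but crude — exactly the accuracy/time trade-off the slide describes.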
• One problem with minimax is that it may not be feasible to search the whole game tree for a minimax decision (move or action).
• Depth-limited search can speed up the minimax decision process, but instead of the utility function one needs to construct an evaluation function.
• The evaluation function provides an estimate of the expected utility of a game position.
Properties of minimax
• Complete? Yes (if the tree is finite)
• Optimal? Yes (against an optimal opponent)
• Time complexity? O(b^m)
• Space complexity? O(bm) (depth-first exploration)
• For chess, b ≈ 35, m ≈ 100 for "reasonable" games → exact solution completely infeasible
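The infeasibility claim can be checked directly with the slide's figures:

```python
# b ≈ 35, m ≈ 100 (figures from the slide)
b, m = 35, 100
print(f"{float(b ** m):.2e}")   # on the order of 1e154 nodes to examine
```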
Once we have developed a good evaluation
function, we must also consider:
• The depth-limit
• The Horizon Problem
– Difficult to eliminate
– Arises when the program faces a move by the opponent that causes serious damage and is ultimately unavoidable
– Stalling moves push the damaging move over the search horizon, to a depth where it cannot be detected
• Once we have an evaluation function and a
depth-limit we can then re-apply minimax
search.
• However, for depth-limited search minimax may
still be inefficient.
• Minimax will expand nodes that need not be
searched.
• By making our search method more efficient, we
will be able to search at deeper levels of our
game tree.
Game Playing: Alpha-Beta Pruning
1. Search below a MIN node may be alpha-pruned if its beta value is less than or equal to the alpha value of some MAX ancestor.
2. Search below a MAX node may be beta-pruned if its alpha value is greater than or equal to the beta value of some MIN ancestor.
Alpha-Beta Pruning (αβ prune)
• Rules of Thumb
– α is the highest Max value found so far
– β is the lowest Min value found so far
– If Min is on top, alpha-prune
– If Max is on top, beta-prune
– Alpha prunes occur only at Min levels
– Beta prunes occur only at Max levels
– See the detailed algorithm on p. 167
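The two pruning rules can be sketched as a depth-first procedure over the same nested-list trees used earlier; this is a minimal version (the textbook's algorithm threads α and β through game states rather than an explicit tree):

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning on a nested-list game tree.
    alpha: best value MAX can guarantee so far; beta: best for MIN."""
    if isinstance(node, int):                  # leaf: utility value
        return node
    if maximizing:
        value = float('-inf')
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                  # beta prune: a MIN ancestor won't allow this
                break
        return value
    else:
        value = float('inf')
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:                  # alpha prune: a MAX ancestor won't allow this
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float('-inf'), float('inf'), True))   # prints 3, same as plain minimax
```

The answer matches plain minimax; pruning only skips subtrees that cannot affect the root value (here, the second subtree is cut off after its first leaf 2 ≤ α = 3).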
[Worked example: the following slides step through alpha-beta pruning on a sample game tree, updating α and β at each node; the diagrams survive only as scattered node values and are omitted here.]
[Second worked example: alpha-beta pruning applied to another game tree; only the diagram's node values and α/β labels survive, so it is omitted here.]
Game of Chance: Expecti-minimax
• Initial values at the leaves indicate the board state.
• A chance node's value weights each outcome by the percentage chance of the corresponding roll.
• Min nodes then select the minimum of their chance-node values.
• The second roll uses different assigned percentage chances.
• Max nodes select the maximum of their chance-node values.
[Worked example: the following slides build up an expectiminimax tree step by step; the chance nodes evaluate to 3 (3×1.0) and 2 (0×0.67 + 6×0.33), but the diagrams survive only as scattered node values and are omitted here.]
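A runnable sketch of expectiminimax, using a small hand-built tree that reuses the slides' chance computations (3×1.0 and 0×0.67 + 6×0.33); the tuple-based tree encoding is an assumption of this sketch, not the slides' exact example:

```python
def expectiminimax(node):
    """Expectiminimax value of a tree.
    A leaf is a number; an internal node is a (kind, children) pair where
    kind is 'max', 'min', or 'chance'. Chance children are
    (probability, subtree) pairs; max/min children are plain subtrees."""
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == 'max':
        return max(expectiminimax(c) for c in children)
    if kind == 'min':
        return min(expectiminimax(c) for c in children)
    # chance node: probability-weighted average over outcomes
    return sum(p * expectiminimax(c) for p, c in children)

# MAX over two chance nodes echoing the slides' numbers
tree = ('max', [('chance', [(1.0, 3)]),
                ('chance', [(0.67, 0), (0.33, 6)])])
print(expectiminimax(tree))   # max(3.0, 1.98) = 3.0
```

Note the second chance node's exact value is 1.98; the slides round it to 2.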
Cutting off search
MinimaxCutoff is identical to MinimaxValue except:
1. Terminal? is replaced by Cutoff?
2. Utility is replaced by Eval
Does it work in practice?
b^m = 10^6, b = 35 → m ≈ 4
4-ply lookahead is a hopeless chess player!
– 4-ply ≈ human novice
– 8-ply ≈ typical PC, human master
– 12-ply ≈ Deep Blue, Kasparov
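MinimaxCutoff as described above can be sketched directly; the heuristic used here (average of the reachable leaf values) is purely illustrative, standing in for a real evaluation function:

```python
def minimax_cutoff(node, depth, maximizing, evaluate):
    """Depth-limited minimax on a nested-list tree: the terminal test
    becomes a depth cutoff, and utility is replaced by an evaluation."""
    if isinstance(node, int):                 # true terminal state
        return node
    if depth == 0:                            # Cutoff? — stop and estimate
        return evaluate(node)                 # Eval in place of Utility
    values = [minimax_cutoff(c, depth - 1, not maximizing, evaluate)
              for c in node]
    return max(values) if maximizing else min(values)

# Crude stand-in heuristic: average of all leaf values below the node.
def avg_leaves(node):
    leaves, stack = [], [node]
    while stack:
        n = stack.pop()
        if isinstance(n, int):
            leaves.append(n)
        else:
            stack.extend(n)
    return sum(leaves) / len(leaves)

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_cutoff(tree, 1, True, avg_leaves))   # estimates from depth 1
```

With a deep enough limit the cutoff search reaches the true leaves and agrees with exact minimax; with a shallow limit its quality depends entirely on the evaluation function, as the slides emphasize.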
Deterministic games in practice
• Checkers: Chinook ended the 40-year reign of human world champion Marion Tinsley in 1994. It used a precomputed endgame database defining perfect play for all positions involving 8 or fewer pieces on the board, a total of 444 billion positions.
• Chess: Deep Blue defeated human world champion Garry Kasparov
in a six-game match in 1997. Deep Blue searches 200 million
positions per second, uses very sophisticated evaluation, and
undisclosed methods for extending some lines of search up to 40
ply.
• Othello: human champions refuse to compete against computers,
who are too good.
• Go: human champions refuse to compete against computers, who
are too bad. In go, b > 300, so most programs use pattern
knowledge bases to suggest plausible moves.
http://www.research.ibm.com/deepblue/