DCP 1172: Introduction to Artificial Intelligence
Lecture notes for Chap. 6 [AIMA]
Chang-Sheng Chen
This time: Outline
• Adversarial search
  - Game playing
• The minimax algorithm
• Resource limitations
• Alpha-beta pruning
• Elements of chance
Game Playing Search
• Why study games?
• Why is search a good idea?
Why Study Games ? (1)
• Game playing was one of the first tasks undertaken in AI.
• By 1950, chess had already been studied by AI forerunners such as Claude Shannon and Alan Turing.
• For AI researchers, the abstract nature of games makes them appealing objects of study:
  • the state of a game is easy to represent,
  • agents are usually restricted to a small number of actions,
  • and the outcomes of actions are defined by precise rules.
Why Study Games ? (2)
• Games are interesting because they are too hard to solve exactly.
• Games require the ability to make some decision even when calculating the optimal decision is infeasible.
• Games also penalize inefficiency severely.
• Game-playing research has therefore spawned a number of interesting ideas on how to make the best possible use of time.
Why is search a good idea?
• Ignoring computational complexity, games are a perfect application for a complete search.
• Some major assumptions we have been making:
  • Only an agent's actions change the world
  • The world is deterministic and fully observable
  • These assumptions are pretty much true in many games
• Of course, ignoring complexity is a bad idea, so games are a good place to study resource-bounded search.
What kind of games?
• Abstraction: to describe a game we must capture every relevant aspect of the game (e.g., chess, tic-tac-toe, …)
• Fully observable environments: such games are characterized by perfect information
• Search: game playing then consists of a search through possible game positions
• Unpredictable opponent: introduces uncertainty, so game playing must deal with contingency problems
Searching for the next move
• Complexity: many games have a huge search space
  • Chess: b ≈ 35, m ≈ 100, so the full game tree has about 35^100 nodes;
    if each node takes about 1 ns to explore, a complete search would take
    on the order of 10^135 millennia (a back-of-the-envelope check follows
    at the end of this slide).
• Resource limits (e.g., time, memory): the optimal solution is not feasible, so we must approximate:
  1. Pruning: makes the search more efficient by discarding portions of the search tree that cannot improve the quality of the result.
  2. Evaluation functions: heuristics that estimate the utility of a state without an exhaustive search.
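
A quick back-of-the-envelope check of the chess numbers above, written as a throwaway Python snippet (the constants are the rough figures quoted on this slide, not exact values):

    # Back-of-the-envelope: size and cost of a full chess game-tree search.
    b, m = 35, 100                          # rough branching factor and game length
    nodes = b ** m                          # ~2.5e154 positions in the full tree
    seconds = nodes * 1e-9                  # assuming 1 ns per node
    millennia = seconds / (3.15e7 * 1000)   # ~3.15e7 seconds per year
    print(f"{nodes:.1e} nodes  ->  ~{millennia:.1e} millennia")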
Two-player games
• A game can be formulated as a search problem with the following elements (a minimal interface sketch follows):
  • Initial state: the board position and whose turn it is
  • Successor function: the definition of the legal moves
  • Terminal test: the conditions for when the game is over
  • Utility function: a numeric value that describes the outcome of the game, e.g., -1, 0, +1 for loss, draw, win (AKA payoff function)
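
Below is a minimal sketch of this formulation as a Python interface (the class and method names are illustrative assumptions used by the later sketches in these notes, not code from the textbook):

    # A bare-bones "game as search problem" interface.
    class Game:
        def initial_state(self):
            # Starting board position and the player to move.
            raise NotImplementedError

        def successors(self, state):
            # List of (move, resulting_state) pairs for all legal moves.
            raise NotImplementedError

        def terminal_test(self, state):
            # True when the game is over in this state.
            raise NotImplementedError

        def utility(self, state, player):
            # Numeric outcome for `player`, e.g. -1 (loss), 0 (draw), +1 (win).
            raise NotImplementedError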
Game vs. search problem
Example: Tic-Tac-Toe
Type of games
Generate Game Tree
[Figure, built up over four slides: the tic-tac-toe game tree grown from an initial position, one level at a time. Each level of the tree is one ply, i.e., one move by a single player; x's moves and o's replies alternate.]
A subtree
[Figure: a subtree of the tic-tac-toe game tree rooted at a mid-game position, expanded all the way to the terminal positions, each labeled win, lose, or draw from x's point of view.]
What is a good move?
[Figure: the same subtree as on the previous slide. The question is which move from the root position leads to the best guaranteed outcome against any reply.]
MiniMax
• Perfect play for deterministic environments with perfect
information
• From among the moves available to you, take the
best one
• Where the best one is determined by a search using
the MiniMax strategy
The minimax algorithm
• Basic idea: choose the move with the highest minimax value = the best achievable payoff against best play
• Algorithm:
  1. Generate the game tree completely
  2. Determine the utility of each terminal state
  3. Propagate the utility values upward in the tree by applying the MIN and MAX operators to the nodes at each level
  4. At the root node, use the minimax decision to select the move with the max (of the min) utility value
• The propagation step assumes that the opponent will play perfectly.
(The value being propagated is defined below.)
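
The value being propagated is the standard minimax value from AIMA, written here in plain notation:

    MINIMAX-VALUE(n) =
        UTILITY(n)                                    if n is a terminal state
        max over successors s of MINIMAX-VALUE(s)     if n is a MAX node
        min over successors s of MINIMAX-VALUE(s)     if n is a MIN node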
Minimax
[Figure, built up over four slides: the standard two-ply example tree. The three MIN nodes see leaf utilities (3, 12, 8), (2, 4, 6) and (14, 5, 2), so their backed-up values are 3, 2 and 2; the MAX root then takes the maximum of these, 3.]
• Minimize the opponent's chance
• Maximize your chance
MiniMax = maximum of the minimum
• I’ll choose the best move for me (max)
• You’ll choose the best move for you (min)
[Figure: a two-ply tree; the 1st ply is MAX's move, the 2nd ply is MIN's reply.]
Minimax: Recursive implementation
• Complete: Yes, for a finite state space
• Optimal: Yes (against an optimal opponent)
• Time complexity: O(b^m)
• Space complexity: O(bm) (depth-first: does not keep all nodes in memory)
(A sketch of the recursion follows.)
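
A minimal sketch of the recursive implementation in Python, assuming the hypothetical Game interface sketched earlier (successors, terminal_test, utility); it is illustrative rather than the textbook's pseudocode:

    # Recursive minimax: MAX picks the move with the highest backed-up value.
    def minimax_decision(game, state, player):
        best_move, best_value = None, float("-inf")
        for move, next_state in game.successors(state):
            value = min_value(game, next_state, player)
            if value > best_value:
                best_move, best_value = move, value
        return best_move

    def max_value(game, state, player):
        if game.terminal_test(state):
            return game.utility(state, player)
        return max(min_value(game, s, player) for _, s in game.successors(state))

    def min_value(game, state, player):
        if game.terminal_test(state):
            return game.utility(state, player)
        return min(max_value(game, s, player) for _, s in game.successors(state))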
Do We Have To Do All That Work?
[Figure, built up over four slides: the same example tree, explored left to right. The first MIN node's leaves 3, 12, 8 give it the value 3, so MAX is already guaranteed 3 at the root. At the second MIN node the first leaf seen is 2: since 2 is smaller than 3, that MIN node can never be worth more than 2 to MAX, so there is no need for further search below it and its remaining leaves (marked X) are skipped. The third MIN node (leaves 14, 5, 2) is then examined.]
More on this next time: α-β pruning
Ideal Case
• Search all the way to the leaves (end game positions)
• Return the leaf (or leaves) that lead to a win (for me)
• Anything wrong with that?
More Realistic
• Search ahead to a non-leaf (non-goal) state and
evaluate it somehow
• Chess
• 4 ply is a novice
• 8 ply is a master
• 12 ply can compete at the highest level
• In no sense can 12 ply be likened to a search of the
whole space
1. Move evaluation without complete search
• A complete search is too complex and impractical
• Evaluation function: estimates the value of a state using heuristics and cuts off the search early
• New MINIMAX (a depth-limited sketch follows):
  • CUTOFF-TEST: a cutoff test replaces the terminal-test condition (e.g., a deadline or depth limit)
  • EVAL: an evaluation function replaces the utility function (e.g., the number of chess pieces taken)
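
A hedged sketch of the depth-limited variant, again written against the hypothetical Game interface from earlier; eval_fn and the depth parameter are illustrative stand-ins for EVAL and CUTOFF-TEST:

    # Depth-limited minimax: CUTOFF-TEST is a depth limit, EVAL replaces UTILITY.
    def h_minimax(game, state, player, depth, eval_fn, maximizing=True):
        if game.terminal_test(state):
            return game.utility(state, player)
        if depth == 0:                       # CUTOFF-TEST
            return eval_fn(state, player)    # EVAL
        values = [h_minimax(game, s, player, depth - 1, eval_fn, not maximizing)
                  for _, s in game.successors(state)]
        return max(values) if maximizing else min(values)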
Evaluation Functions
• Need a numerical function that assigns a value to a non-goal state
• Has to capture the notion of a position being good for
one player
• Has to be fast
• Typically a linear combination of simple metrics
Evaluation functions
• Weighted linear evaluation function: combines n heuristic features
    f = w1*f1 + w2*f2 + … + wn*fn
  E.g., the w's could be the values of the pieces (1 for a pawn, 3 for a bishop, etc.) and the f's the numbers of each kind of piece on the board (a toy example follows).
Note: exact values do not matter
For deterministic minimax, only the order of the evaluation values matters: any order-preserving (monotonic) transformation of the leaf values leads to the same choice of move.
Minimax with cutoff: viable algorithm?
Assume we have 100 seconds per move and can evaluate 10^4 nodes/second: that is a budget of about 10^6 nodes per move (a worked check of what this buys follows).
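
A quick worked check, assuming the chess branching factor b ≈ 35 quoted earlier:

    10^4 nodes/s × 100 s = 10^6 nodes per move
    35^4 ≈ 1.5 × 10^6

so the budget reaches roughly 4-ply lookahead, which the earlier slide rates as novice level.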
2. α-β pruning: search cutoff
• Pruning: eliminating a branch of the search tree from consideration without exhaustively examining each of its nodes
• α-β pruning: the basic idea is to prune portions of the search tree that cannot improve the utility value of the MAX or MIN node, by considering only the values of the nodes seen so far.
• Does it work? Yes: it roughly cuts the branching factor from b to √b, giving about twice the lookahead of pure minimax for the same effort (see the note below).
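
A short note on why the √b figure gives twice the lookahead (the standard best-case analysis, assuming perfect move ordering):

    With perfect ordering, α-β examines O(b^(m/2)) nodes instead of O(b^m).
    Since b^(m/2) = (√b)^m, this is the same as searching with an effective
    branching factor of √b, so a fixed node budget reaches about twice the depth.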
- pruning: example
6
MAX
MIN
6
6
12
8
DCP 1172, Ch. 6
40
- pruning: example
6
MAX
MIN
2
6
6
12
8
2
DCP 1172, Ch. 6
41
- pruning: example
6
MAX
MIN
6
12
5
2
6
8
2
DCP 1172, Ch. 6
5
42
- pruning: example
6
MAX
Selected move
MIN
6
12
5
2
6
8
2
DCP 1172, Ch. 6
5
43
Properties of α-β
• Pruning does not affect the final result
• Good move ordering improves the effectiveness of pruning
• With perfect ordering, time complexity is O(b^(m/2)), which doubles the depth of lookahead that can be afforded
- pruning: general principle
Player
Opponent
m

If  > v then MAX will chose m so
prune tree under n
Similar for  for MIN
Player
n
Opponent
DCP 1172, Ch. 6
v
45
Remember: Minimax: Recursive implementation
Alpha-beta Pruning Algorithm
More on the α-β algorithm
• Same basic idea as minimax, but prune (cut away) branches of the tree that we know cannot contain the solution.
• Because minimax is depth-first, consider the nodes along a given path in the tree. As we go along this path, we keep track of:
  • α: the best choice so far for MAX
  • β: the best choice so far for MIN
(A sketch of the resulting algorithm follows.)
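
A minimal sketch of the resulting algorithm in Python, again assuming the hypothetical Game interface from earlier; it follows the standard scheme but is not the textbook's pseudocode verbatim:

    # Alpha-beta search: minimax with pruning based on alpha (best for MAX so far)
    # and beta (best for MIN so far) along the current path.
    def alphabeta_decision(game, state, player):
        best_move, alpha, beta = None, float("-inf"), float("inf")
        for move, next_state in game.successors(state):
            value = ab_min_value(game, next_state, player, alpha, beta)
            if value > alpha:
                alpha, best_move = value, move
        return best_move

    def ab_max_value(game, state, player, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state, player)
        value = float("-inf")
        for _, s in game.successors(state):
            value = max(value, ab_min_value(game, s, player, alpha, beta))
            if value >= beta:          # MIN above will never allow this: prune
                return value
            alpha = max(alpha, value)
        return value

    def ab_min_value(game, state, player, alpha, beta):
        if game.terminal_test(state):
            return game.utility(state, player)
        value = float("inf")
        for _, s in game.successors(state):
            value = min(value, ab_max_value(game, s, player, alpha, beta))
            if value <= alpha:         # MAX above already has something better: prune
                return value
            beta = min(beta, value)
        return value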
More on the α-β algorithm: start from Minimax
Note: α and β are both local variables. At the start of the algorithm, we initialize them to α = -∞ and β = +∞.
More on the α-β algorithm
[Figure, stepped through over four slides: a small MAX/MIN/MAX tree with leaves such as 5, 10, 6, 2, 8, 7. The trace follows the Max-Value and Min-Value calls, showing which successors each call loops over and how α (initialized to -∞) and β (initialized to +∞) are passed down the current path and updated as leaf values come back. Whenever β < α at a node, its loop ends early and the node's current value is returned (in this example, 5).]
Operation of the α-β pruning algorithm
[Figure: the same trace; when β < α, the loop over successors ends and the value is returned.]
Example
- algorithm:
Solution
[Table: a step-by-step trace of the α-β search on the example tree: for each node visited (A, B, C, D, E, …), its type (Max or Min), the α and β bounds in effect (initially -∞ and +∞), and the backed-up score where the node returns one.]
State-of-the-art for deterministic games
Stochastic games
Algorithm for stochastic games
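
The pseudocode figure for this slide is not reproduced here; below is a minimal sketch of expectiminimax in Python. It assumes the hypothetical Game interface from earlier plus two further assumed helpers: game.to_move(state), which returns the player to move or "chance", and game.outcomes(state), which yields (probability, resulting_state) pairs at chance nodes.

    # Expectiminimax: chance nodes back up probability-weighted averages
    # (expectations) instead of a max or a min.
    def expectiminimax(game, state, player):
        if game.terminal_test(state):
            return game.utility(state, player)
        mover = game.to_move(state)                       # assumed helper
        if mover == "chance":
            return sum(p * expectiminimax(game, s, player)
                       for p, s in game.outcomes(state))  # assumed helper
        values = [expectiminimax(game, s, player) for _, s in game.successors(state)]
        return max(values) if mover == player else min(values)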
Remember: Minimax algorithm
Stochastic games: the element of chance
Expectimax and expectimin: expected values over all possible outcomes, backed up at CHANCE nodes.
[Figure: a tree with a CHANCE node whose two branches each have probability 0.5; the values visible below include 3, 8, 17 and 8, and the chance node's value is still marked '?'.]
Stochastic games: the element of chance
[Figure: the same tree with the values filled in. The two outcome subtrees below the CHANCE node have values 3 and 5, so its expectimax value is 0.5*3 + 0.5*5 = 4 (the expectimin case is symmetric).]
Evaluation functions: Exact values DO matter
With chance nodes, order-preserving transformations of the evaluation values do not necessarily behave the same: to preserve the choice of move, the evaluation must be a positive linear transformation of the expected utility.
State-of-the-art for stochastic games
Summary