Tic-tac-toe

Adversarial search
Artificial Intelligence
Adversarial search
Chapter 6, AIMA
• At least two agents and a
competitive environment: Games,
economies.
• Games and AI:
– Generally considered to require
intelligence (to win)
– Have to evolve in real-time
– Well-defined and limited environment
This presentation owes a lot to V. Pavlovic @ Rutgers, who borrowed from J. D. Skrentny, who in turn borrowed from C. Dyer,...
Board games
© Thierry Dichtenmuller
Games & AI
perfect info
Deterministic
Chance
Checkers,
Chess, Go,
Othello
Backgammon,
Monopoly
imperfect info
Games and search
Traditional search: single agent, searches for its
well-being, unobstructed
Games: search against an opponent
Example: two player board game (chess, checkers,
tic-tac-toe,…)
Board configuration: unique arrangement of "pieces“
Representing board games as goal-directed search
problem (states = board configurations):
–
–
–
–
Initial state: Current board configuration
Successor function: Legal moves
Goal state: Winning/terminal board configuration
Utility function: Current board configuration
Bridge, Poker,
Scrabble
Example: Tic-tac-toe
• Initial state: 3×3 empty
table.
• Successor function:
Players take turns marking
± or | in the table cells.
• Goal state: When all the
table cells are filled or
when either player has
three symbols in a row.
• Utility function: +1 for
three in a row, -1 if the
opponent has three in a
row, 0 if the table is filled
and no-one has three
symbols in a row.
Initial state
± ±
± ±
±
±|
|
±± ±|
|
±
± |± ±
±
|±±
±| |±
|
|
±
||
±
±
Goal state |
± ±
± =0 |±±
Utility
|±
±
Goal state
Utility = +1 | ± ±
|
|
Goal state
|
Utility = -1
1
The minimax principle
Example: Tic-tac-toe
Assume the opponent plays to win and
always makes the best possible move.
± |
Your (MAX) move
(±)
The minimax value for a node = the utility
for you of being in that state, assuming
that both players (you and the opponent)
play optimally from there on to the end.
| ± ±
|
Assignment: Expand this tree to the end of the game.
Terminology:
MAX = you, MIN = the opponent.
Example: Tic-tac-toe
Example: Tic-tac-toe
± |
Your (MAX) move
± |
Your (MAX) move
| ± ±
| ± ±
|
Opponent
(MIN) move
|
± ± |
± |
± |
| ± ±
| ± ±
| ± ±
|
± |
± ± |
Opponent
(MIN) move
| ±
Your
| ± ±
(MAX)
move | |
± ± |
| ± |
± |
| ± |
± |
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
± |
± | |
| ±
| | ±
± ± |
± ± |
| ± |
± ± |
| ± |
± ± |
| |
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
± ± |
± |
± |
| ± ±
| ± ±
| ± ±
|
± |
± ± |
Your
| ± ±
(MAX)
move | |
Minimax
+1
value
± ± |
| ± |
± |
| ± |
± |
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
± |
± | |
| ±
| | ±
| |
0
± ± |
± ± |
| ± ±
| ±
| ± ±
0
0
| ± |
± ± |
| ± ±
| ± ±
0
+1
| ± |
± ± |
| ± ±
| ± ±
| | ±
± | |
± | ±
± | |
± | ±
| | ±
| | ±
± | |
± | ±
± | |
± | ±
| | ±
Utility = +1
Utility = 0
Utility = 0
Utility = 0
Utility = 0
Utility = +1
Utility = +1
Utility = 0
Utility = 0
Utility = 0
Utility = 0
Utility = +1
Example: Tic-tac-toe
Example: Tic-tac-toe
± |
Your (MAX) move
± |
Your (MAX) move
| ± ±
| ± ±
|
|
Minimax
value
0
Opponent
(MIN) move
± ± |
± |
± |
| ± ±
| ± ±
| ± ±
Minimax
value
± ± |
Your
| ± ±
(MAX)
move | |
Minimax
+1
value
|
± |
0
Opponent
(MIN) move
| ±
0
| ± |
± |
| ± |
± |
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
± |
± | |
| ±
| | ±
0
0
0
0
± |
± |
| ± ±
| ± ±
| ± ±
Minimax
value
0
± ± |
| |
± ± |
+1
± ± |
Your
| ± ±
(MAX)
move | |
Minimax
+1
value
|
± |
0
| ±
0
0
± ± |
| ± |
± |
| ± |
± |
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
± |
± | |
| ±
| | ±
| |
0
0
0
0
+1
± ± |
± ± |
| ± |
± ± |
| ± |
± ± |
± ± |
± ± |
| ± |
± ± |
| ± |
± ± |
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| ± ±
| | ±
± | |
± | ±
± | |
± | ±
| | ±
| | ±
± | |
± | ±
± | |
± | ±
| | ±
Utility = +1
Utility = 0
Utility = 0
Utility = 0
Utility = 0
Utility = +1
Utility = +1
Utility = 0
Utility = 0
Utility = 0
Utility = 0
Utility = +1
2
The minimax value
The minimax algorithm
1. Start with utilities of terminal nodes
2. Propagate them back to root node by choosing the
minimax strategy
Minimax value for node n =
Utility(n)
If n is a terminal node
Max(Minimax-values of successors)
If n is a MAX node
Min(Minimax-values of successors)
If n is a MIN node
A
A
1
B
max
D
D
0
C
B
-5
C
-6
E
E
1
min
F
G
-7
7
-5
H
3
I
9
J
-6
K
L
0
2
M
1
N
3
O
2
High utility favours you (MAX), therefore choose move with highest utility
Low utility favours the opponent (MIN), therefore choose move with
lowest utility
The minimax algorithm
The minimax algorithm
1. Start with utilities of terminal nodes
2. Propagate them back to root node by choosing the
minimax strategy
A
A
1
B
1. Start with utilities of terminal nodes
2. Propagate them back to root node by choosing the
minimax strategy
A
A
1
max
D
D
0
C
B
-5
Figure borrowed from V. Pavlovic
C
-6
E
E
1
B
max
D
D
0
C
B
-5
C
-6
E
E
1
min
F
-7
7
G
-5
H
3
I
9
J
-6
K
0
L
2
M
1
N
3
O
2
Figure borrowed from V. Pavlovic
Complexity of minimax algorithm
• A depth-first search
– Time complexity O(bd)
– Space complexity O(bd)
• Time complexity impossible in real games (with
time constraints) except in very simple games
(e.g. tic-tac-toe)
min
F
-7
7
G
-5
H
3
I
9
J
-6
K
0
L
2
M
1
N
3
O
2
Figure borrowed from V. Pavlovic
Strategies to improve minimax
1. Remove redundant search paths
- symmetries
2. Remove uninteresting search paths
- alpha-beta pruning
3. Cut the search short before goal
- Evaluation functions
4. Book moves
3
First three levels of the tic-tac-toe state space reduced by symmetry
1. Remove redundant paths
3 states
(instead of 9)
Tic-tac-toe has mirror symmetries
and rotational symmetries
12 states
(instead of
8·9 = 72)
|±
|
±
|
=
|
=
|
|
|
±
|
±
=
Image from G. F. Luger, ”Artificial Intelligence”, 2002
2. Remove uninteresting paths
Alpha-Beta Example
If the player has a better choice
m at n’s parent node, or at
any node further up, then
node n will never be reached.
minimax(A,0,4)
minimax(node, level, depth limit)
Prune the entire path below node
m’s parent node (except for
the path that m belongs to,
and paths that are equal to
this path).
A
α=?
B
Minimax is depth-first → keep
track of highest (α) and
lowest (β) values so far.
-5
N
W
Call
Stack
A
N
-5
W
-3
H
-10
I
8
P
O
4
D
C
G
F
9
E
0
J
Q
-6
R
S
0
T
3
M
2
U
5
V
-7
-9
A
Slide adapted from V. Pavlovic
Q
-6
R
0
S
3
T
5
U
-7
V
B
F
G
F
α=?
-5
W
-3
H
-10
I
8
P
O
4
D
C
β=?
N
-9
B
A
Slide adapted from V. Pavlovic
α=?
max
M
2
Call
Stack
A
max
min
L
K
X
-5
L
K
minimax(F,2,4)
α=?
B
9
J
Alpha-Beta Example
minimax(B,1,4)
B
β=?
P
-5
Alpha-Beta Example
min
I
8
E
0
X
-3
max
H
-10
O
4
D
C
G
F
Called alpha-beta pruning.
Call
Stack
A
max
9
J
Q
-6
L
K
R
0
X
-5
E
0
S
3
M
2
T
5
U
-7
V
-9
F
B
A
Slide adapted from V. Pavlovic
4
Alpha-Beta Example
Alpha-Beta Example
minimax(F,2,4) is returned to
minimax(N,3,4)
alpha = 4, maximum seen so far
max
Call
Stack
A
α=?
B
min
F
max
G
α=?
N
-5
H
-10
W
9
X
-3
I
8
P
O
4
D
C
β=?
-5
E
0
J
Q
-6
R
S
0
M
2
T
3
U
5
α=?
B
min
L
K
V
-7
F
-9
gold: terminal state
G
α=4
α=
N
-5
W
9
-5
Alpha-Beta Example
B
F
max
min
Call
Stack
G
α=4
N
-5
H
-10
O
W
9
X
-3
I
8
P
O
β=?
4
D
C
-5
J
L
K
Q
R
S
0
Q
-6
R
S
0
T
3
M
2
U
5
V
-7
-9
F
B
A
Slide adapted from V. Pavlovic
T
3
M
2
U
5
α=?
B
V
-7
F
-9
gold: terminal state
min
G
α=4
N
-5
O
W
9
-5
Alpha-Beta Example
I
8
P
X
-3
Slide adapted from V. Pavlovic
H
-10
β=?
4
D
C
β=?
max
O
F
B
A
Call
Stack
A
max
min
E
0
-6
L
K
minimax(W,4,4)
A
β=?
J
Alpha-Beta Example
α=?
min
E
0
gold: terminal state
minimax(O,3,4)
max
I
8
P
X
-3
Slide adapted from V. Pavlovic
H
-10
O
4
D
C
β=?
max
N
F
B
A
Call
Stack
A
max
E
0
J
L
K
Q
-6
R
S
0
T
3
M
2
U
5
V
-7
-9
gold: terminal state (depth limit)
W
O
F
B
A
Slide adapted from V. Pavlovic
Alpha-Beta Example
minimax(O,3,4) is returned to
O's beta (-3) < F's alpha (4): Stop expanding O (alpha cut-off)
beta = -3, minimum seen so far
A
max
α=?
B
min
F
G
α=4
N
-5
O
W
-3
H
-10
I
8
P
β=-3
β=
4
D
C
β=?
max
min
Call
Stack
9
X
-5
E
0
J
Q
-6
R
0
S
3
5
gold: terminal state (depth limit)
Slide adapted from V. Pavlovic
M
2
T
α=?
B
min
L
K
U
-7
F
V
-9
min
G
α=4
N
-5
O
W
-3
H
-10
I
8
P
β=-3
4
D
C
β=?
max
O
F
B
A
Call
Stack
A
max
9
X
-5
E
0
J
Q
-6
L
K
R
0
S
3
M
2
T
5
gold: terminal state (depth limit)
U
-7
V
-9
O
F
B
A
Slide adapted from V. Pavlovic
5
Alpha-Beta Example
Alpha-Beta Example
Why?
Smart opponent selects W or worse → O's upper bound is –3
So MAX shouldn't select O:-3 since N:4 is better
minimax(F,2,4) is returned to
alpha not changed (maximizing)
α=?
B
min
F
G
α=4
N
-5
H
-10
O
W
9
X
-3
I
8
P
β=-3
4
D
C
β=?
max
min
Call
Stack
A
max
-5
E
0
J
Q
-6
R
S
0
M
2
T
3
U
5
α=?
B
min
L
K
V
-7
F
-9
gold: terminal state (depth limit)
min
G
α=4
N
-5
O
W
9
-5
Alpha-Beta Example
I
8
P
X
-3
Slide adapted from V. Pavlovic
H
-10
β=-3
4
D
C
β=?
max
O
F
B
A
Call
Stack
A
max
E
0
J
L
K
Q
-6
R
S
0
T
3
M
2
U
5
V
-7
-9
gold: terminal state (depth limit)
F
B
A
Slide adapted from V. Pavlovic
Alpha-Beta Example
minimax(B,1,4) is returned to
minimax(G,2,4)
beta = 4, minimum seen so far
α=?
B
min
F
G
α=4
N
-5
H
-10
O
W
9
X
-3
I
8
P
β=-3
4
D
C
β=4
β=
max
min
Call
Stack
A
max
-5
E
0
J
Q
-6
R
S
0
M
2
T
3
U
5
α=?
B
min
L
K
F
min
-9
N
H
-10
O
W
I
8
P
9
X
-3
Slide adapted from V. Pavlovic
-5
E
0
J
L
K
Q
-6
R
S
0
T
3
M
2
U
5
V
-7
-9
gold: terminal state (depth limit)
Alpha-Beta Example
minimax(B,1,4) is returned to
minimax(A,0,4) is returned to
alpha = -5, maximum seen so far
Call
Stack
A
max
α=?
B
min
F
max
G
α=4
N
-5
O
W
-3
H
-10
I
8
P
β=-3
4
D
C
β=-5
β=4
β=
9
X
-5
E
0
J
Q
-6
0
S
3
T
5
gold: terminal state (depth limit)
Slide adapted from V. Pavlovic
M
2
U
-7
α=-5
α=?
B
F
V
min
B
A
G
α=4
N
-5
O
W
-3
H
-10
I
8
P
β=-3
4
D
C
β=-5
β=4
β=
max
-9
Call
Stack
A
max
min
L
K
R
G
B
A
Slide adapted from V. Pavlovic
Alpha-Beta Example
beta = -5, minimum seen so far
min
-5
β=-3
4
B
A
gold: terminal state (depth limit)
G
α=4
D
C
β=4
β=
max
V
-7
Call
Stack
A
max
9
X
-5
E
0
J
Q
-6
L
K
R
0
S
3
M
2
T
5
gold: terminal state (depth limit)
U
-7
V
-9
A
Slide adapted from V. Pavlovic
6
Alpha-Beta Example
Alpha-Beta Example
minimax(C,1,4)
Call
Stack
A
max
α=-5
α=?
B
min
C
F
G
α=4
N
-5
H
-10
O
W
9
X
-3
I
8
P
β=-3
4
D
C
β=?
β=-5
β=4
β=
max
min
minimax(H,2,4)
-5
E
0
J
Q
-6
R
S
0
M
2
T
3
U
5
α=-5
α=?
B
min
L
K
min
-9
F
G
α=4
N
-5
O
W
Alpha-Beta Example
I
8
P
9
X
-3
Slide adapted from V. Pavlovic
H
-10
β=-3
4
C
A
gold: terminal state (depth limit)
C
D
C
β=?
β=-5
β=4
β=
max
V
-7
Call
Stack
A
max
-5
E
0
J
L
K
Q
-6
R
S
0
T
3
M
2
U
5
V
-7
-9
gold: terminal state (depth limit)
H
C
A
Slide adapted from V. Pavlovic
Alpha-Beta Example
minimax(C,1,4) is returned to
beta = -10, minimum seen so far
Call
Stack
A
max
α=-5
α=?
B
min
C
F
G
α=4
N
-5
H
-10
O
W
9
X
-3
I
8
P
β=-3
4
D
C
β=-10
β=?
β=-5
β=4
β=
max
min
C's beta (-10) < A's alpha (-5): Stop expanding C (alpha cut-off)
-5
E
0
J
Q
-6
R
S
0
M
2
T
3
U
5
α=-5
α=?
B
min
L
K
min
-9
F
N
W
F
max
min
Call
Stack
C
G
α=4
N
-5
O
W
-3
I
8
P
β=-3
4
H
-10
9
D
X
-5
E
0
J
Q
-6
0
S
3
M
2
T
5
gold: terminal state (depth limit)
Slide adapted from V. Pavlovic
-5
J
L
K
Q
-6
R
S
0
T
3
M
2
U
5
V
-7
-9
C
A
gold: terminal state (depth limit)
Slide adapted from V. Pavlovic
U
-7
α=-5
α=0
α=?
B
V
min
D
A
C
F
G
α=4
N
-5
O
W
-3
H
-10
I
8
P
β=-3
4
D
C
β=-10
β=?
β=-5
β=4
β=
max
-9
Call
Stack
A
max
min
L
K
R
9
E
0
minimax(D,1,4) is returned to
A
C
β=-10
β=?
D
Alpha-Beta Example
α=-5
α=?
B
I
8
P
X
-3
minimax(D,1,4)
β=-5
β=4
β=
H
-10
O
Alpha-Beta Example
min
-5
β=-3
4
Slide adapted from V. Pavlovic
max
G
α=4
C
A
gold: terminal state (depth limit)
C
C
β=-10
β=?
β=-5
β=4
β=
max
V
-7
Call
Stack
A
max
9
X
-5
E
0
J
Q
-6
L
K
R
0
S
3
M
2
T
5
gold: terminal state (depth limit)
U
-7
V
-9
A
Slide adapted from V. Pavlovic
7
Alpha-Beta Example
Alpha-Beta Example
minimax(D,1,4) is returned to
α=-5
α=0
α=?
B
min
C
F
G
α=4
N
-5
H
-10
O
W
9
X
-3
I
8
P
β=-3
4
D
C
β=-10
β=?
β=-5
β=4
β=
max
min
Which nodes will
be expanded?
A
max
-5
E
0
J
Q
R
S
0
T
3
M
2
U
5
α=-5
α=0
α=?
B
min
-9
F
G
α=4
N
-5
O
W
Alpha-Beta Example
D
I
K
L
K
α=5
Q
9
-5
E
J
8
-6
Call
Stack
E
β=-7
0
P
X
-3
Slide adapted from V. Pavlovic
H
-10
β=-3
4
A
gold: terminal state (depth limit)
C
C
β=-10
β=?
β=-5
β=4
β=
max
V
-7
Which nodes will
be expanded?
A
max
min
L
K
-6
Call
Stack
R
S
0
T
3
M
M
α=-7
2
U
5
V
-7
-9
A
All
gold: terminal state (depth limit)
Slide adapted from V. Pavlovic
Alpha-Beta Example
minimax(D,1,4) is returned to
α=-5
α=0
α=?
B
min
C
F
G
α=4
N
-5
O
W
-3
H
-10
I
8
P
β=-3
4
D
C
β=-10
β=?
β=-5
β=4
β=
max
min
What if we expand
Call
from right to left?
Stack
A
max
9
X
-5
E
0
J
Q
-6
R
0
S
3
M
2
T
5
α=-5
α=0
α=?
B
min
L
K
U
-7
V
min
gold: terminal state (depth limit)
Slide adapted from V. Pavlovic
Alpha-Beta pruning rule
Stop expanding
max node n if α(n) > β higher in the tree
min node n if β(n) < α higher in the tree
A
C
F
G
α=4
N
-5
O
W
-3
H
-10
I
J
8
Q
9
X
-5
E
E
β=-7
0
P
β=-3
4
D
C
β=-10
β=?
β=-5
β=4
β=
max
-9
What if we expand
Call
from right to left?
Stack
A
max
-6
L
K
R
S
0
3
M
M
α=-7
2
T
5
gold: terminal state (depth limit)
U
-7
V
-9
Only 4
A
Slide adapted from V. Pavlovic
Alpha-Beta pruning rule
Stop expanding
max node n if α(n) > β higher in the tree
min node n if β(n) < α higher in the tree
α= 4
ββ==84
α=
α=24
β=2
β=3
β=3
α=
α=48
β=2
α= 3
β=3
β=2
Which nodes will not be expanded when expanding from left to right?
8
Alpha-Beta pruning rule
Stop expanding
max node n if α(n) > β higher in the tree
min node n if β(n) < α higher in the tree
Alpha-Beta pruning rule
Stop expanding
max node n if α(n) > β higher in the tree
min node n if β(n) < α higher in the tree
α= 3
ββ==78
α= 4
ββ==92
Which nodes will not be expanded when expanding from left to right?
Alpha-Beta pruning rule
Stop expanding
max node n if α(n) > β higher in the tree
min node n if β(n) < α higher in the tree
β=9
ββ==34
β=3
α= 8
β=2
α=32
α=
β=3
α= 9
β=2
Which nodes will not be expanded when expanding from right to left?
3. Cut the search short
• Use depth-limit and estimate utility for
non-terminal nodes (evaluation function)
– Static board evaluation (SBE)
– Must be easy to compute
Example, chess:
SBE = α " Material Balance"+ β " Center Control"+γ ...
Material balance = value of white pieces – value of black pieces, where
pawn = +1, knight & bishop = +3, rook = +5, queen = +9, king = ?
Which nodes will not be expanded when expanding from right to left?
http://en.wikipedia.org/wiki/Computer_chess
Leaf evaluation
For most chess positions, computers cannot look ahead to all final possible
positions. Instead, they must look ahead a few plies and then evaluate the
final board position. The algorithm that evaluates final board positions is
termed the "evaluation function", and these algorithms are often vastly
different between different chess programs.
Nearly all evaluation functions evaluate positions in units and at the least
consider material value. Thus, they will count up the amount of material on
the board for each side (where a pawn is worth exactly 1 point, a knight is
worth 3 points, a bishop is worth 3 points, a rook is worth 5 points and a
queen is worth 9 points). The king is impossible to value since its loss causes
the loss of the game. For the purposes of programming chess computers,
however, it is often assigned a value of appr. 200 points.
The parameters (α,β,γ,...) can be learned (adjusted) from
experience.
4. Book moves
• Build a database (look-up table) of
endgames, openings, etc.
• Use this instead of minimax when
possible.
Evaluation functions take many other factors into account, however, such as
pawn structure, the fact that doubled bishops are usually worth more,
centralized pieces are worth more, and so on. The protection of kings is
usually considered, as well as the phase of the game (opening, middle or
endgame).
9
http://en.wikipedia.org/wiki/Computer_chess
Games with chance
Using endgame databases
…
Nalimov Endgame Tablebases, which use state-of-the-art compression
techniques, require 7.05 GB of hard disk space for all five-piece endings. It is
estimated that to cover all the six-piece endings will require at least 1 terabyte.
Seven-piece tablebases are currently a long way off.
While Nalimov Endgame Tablebases handle en passant positions, they assume
that castling is not possible. This is a minor flaw that is probably of interest only
to endgame specialists.
More importantly, they do not take into account the fifty move rule. Nalimov
Endgame Tablebases only store the shortest possible mate (by ply) for each
position. However in certain rare positions, the stronger side cannot force win
before running into the fifty move rule. A chess program that searches the
database and obtains a value of mate in 85 will not be able to know in advance if
such a position is actually a draw according to the fifty move rule, or if it is a win,
because there will be a piece exchange or pawn move along the way. Various
solutions including the addition of a "distance to zero" (DTZ) counter have been
proposed to handle this problem, but none have been implemented yet.
• Dice games, card games,...
• Extend the minimax tree with chance
layers.
A
max
α=4
α=
Compute the expected
value over outcomes.
50/50
50/50
4
.5
Select move with
the highest
expected value.
B
.5
D
β=6
2 9
chance
-2
.5
C
β=2
7
.5
50/50
50/50
E
β=0
6
5
min
β=-4
0 8
-4
Animation adapted from V. Pavlovic
10