CSc-180 (Gordon)
Week 6A notes
GAME TREES / ADVERSARIAL SEARCH
Consider a program to play a board game.
Basic game play loop:
[Flowchart: prompt for user move → accept user input → legal? (if no, prompt again) → update internal board state → display board → game over? (if yes, announce winner) → make computer move → update internal board state → display board → game over? (if yes, announce winner; if no, loop back to prompting for the user's next move)]
The hard part is “make computer move”, because that is where intelligence is required.
Key issues:
• huge search space. In chess, the branching factor ≈ 35 and games last about 80 moves, so the search space is ≈ 35^80 ≈ 10^123 positions.
• cannot search entire tree. Must use shorter depth, or narrow width.
• need a heuristic to evaluate who is winning. But sometimes the heuristic is wrong.
• complex heuristics are slower, further reducing the number of nodes that can be searched.
Example of a very simple heuristic for a board state “P” in a chess-like game, considering only material balance:
eval(P) = piececount(computer) – piececount(human)
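A minimal sketch of this material-count heuristic in Python, assuming a hypothetical board representation (not from the notes) in which the position is a flat list of square contents, positive codes for the computer's pieces and negative codes for the human's:

def eval_material(position):
    # eval(P) = piececount(computer) - piececount(human)
    # Hypothetical representation: positive entries are computer pieces,
    # negative entries are human pieces, 0 is an empty square.
    computer_pieces = sum(1 for sq in position if sq > 0)
    human_pieces = sum(1 for sq in position if sq < 0)
    return computer_pieces - human_pieces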
We could apply this heuristic immediately:
current board position
(all possible computer moves)
A
B
C
D
E
F
eval(A)
= 10
eval(B)
=2
eval(C)
= -5
eval(D)
=6
eval(E)
= 10
eval(F)
= -2
Computer would pick either move A or move E, and play that move. But the program will be very weak!
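A sketch of that one-ply strategy in Python. The helpers legal_moves(board), make_move(board, m), and evaluate(position) are hypothetical and simply passed in; ties (such as A vs. E) are broken arbitrarily by max():

def greedy_move(board, legal_moves, make_move, evaluate):
    # Look only one ply ahead: score the position after each legal computer
    # move and play the best one, ignoring the human's possible replies.
    return max(legal_moves(board), key=lambda m: evaluate(make_move(board, m)))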
Shannon (1950) proposed instead the following:
• build a tree of possible moves, alternating at each level between the possible computer and human moves.
• apply the heuristic only at the leaf nodes.
• use the MINIMAX algorithm to “back up” the leaf node heuristic evaluations.
Properties of the heuristic function:
• positive or negative; more positive means better for the computer, more negative means better for the human.
• often called “terminal evaluation” or “static evaluation”
• usually a weighted sum of various factors (a small sketch follows this list). For example:
eval(P) = W1(material) + W2(center control) + W3(mobility) + ... etc.
• Samuel's (1959) checkers program learned by automatically tuning the weights
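A sketch of such a weighted sum, assuming hypothetical feature functions (material, center control, mobility) that each return a signed score from the computer's perspective; the weights below are made up for illustration:

# Illustrative weights only; Samuel's program adjusted weights like these automatically.
WEIGHTS = {"material": 1.0, "center_control": 0.25, "mobility": 0.1}

def eval_weighted(position, features):
    # features maps each factor name to a function that scores the position,
    # mirroring eval(P) = W1(material) + W2(center control) + W3(mobility) + ...
    return sum(w * features[name](position) for name, w in WEIGHTS.items())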
MINIMAX example:
[Figure: a game tree rooted at the current board position. The first level (computer moves, MAX) contains nodes A and B; the second level (human moves, MIN) contains nodes C through F; the third level (computer moves, MAX) contains nodes G through K; their children L through Z are the leaves, where the terminal evaluation (heuristic) is applied. Backing the leaf values up the tree gives move A a value of -3 and move B a value of 5, so the root's backed-up value is 5.]
The computer chooses move B (backed-up value 5).
Note that if the computer chose move A (hoping to reach node M, with its higher score of 9), the human would probably not respond with D. Instead, the human would move to F, resulting in a poor score for the computer.
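The backing-up step on its own, as a sketch: here the tree is just nested Python lists (an interior node is a list of children, a leaf is its terminal evaluation). This is a made-up two-ply tree, not the one from the figure:

def minimax_value(node, maximizing):
    # Back up leaf evaluations: MAX levels keep the largest child value,
    # MIN levels keep the smallest.
    if not isinstance(node, list):            # a leaf: its terminal evaluation
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Two computer moves; under each, two possible human replies (the leaves).
tree = [[3, -2], [5, 9]]
print(minimax_value(tree, maximizing=True))   # MIN gives -2 and 5; MAX picks 5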
How deep to search?
• can use full-width, fixed depth
• can use variable depth – the minimax algorithm works the same!
• each level in the tree is called a “PLY”
If using variable-depth search, how do we know when to stop and evaluate? (one way to combine these tests is sketched after the list)
• if game over, return +999 or -999 (depending on who won) – don’t search deeper!
• is there time to search deeper?
• heuristic: is this a promising path?
• heuristic: is the position “stable”? If so, maybe can apply static eval (terminal eval) here. (“quiescence”)
• if a predetermined max depth is reached
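A sketch of such a cutoff test, with game_over, out_of_time, and is_quiescent standing in as hypothetical helpers for the checks above:

def should_stop(board, depth, max_depth, game_over, out_of_time, is_quiescent):
    # Return True when the search should stop at this node and apply the
    # static (terminal) evaluation, or the +999/-999 score if the game is over.
    if game_over(board):              # game over: don't search deeper
        return True
    if depth >= max_depth:            # predetermined maximum depth reached
        return True
    if out_of_time():                 # no time left to search deeper
        return True
    return is_quiescent(board)        # stable ("quiescent") position: evaluate here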
Important optimizations (one way to implement both is sketched after this list):
• computer should pick the fastest win:
o give more credit to wins found at higher plies (closer to the root)
• computer should pick the slowest loss:
o give more credit to losses found at lower plies (deeper in the tree)
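One common implementation, assuming ±999 ending scores as above and a depth measured in plies from the root (the function name is just for this sketch):

def ending_score(winner, depth):
    # Shade the ending score by depth: shallow wins score higher than deep
    # wins (fastest win), and deep losses score higher than shallow losses
    # (slowest loss).
    if winner == "computer":
        return 999 - depth
    if winner == "human":
        return -999 + depth
    return 0                          # draw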
Iterative Deepening algorithm:
for searchDepth = 1 to maxDepth
{ minimax(searchDepth)
if out of time, exit
}
This solves the problem of not knowing in advance how deep you will have time to search!
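A sketch of that loop in Python, assuming a hypothetical minimax(search_depth) that returns the best move found with a full search to that depth, and a simple wall-clock time budget:

import time

def iterative_deepening(minimax, max_depth, time_budget):
    # Search to depth 1, 2, 3, ... keeping the best move from the last
    # completed depth, and stop when the time budget is used up.
    deadline = time.time() + time_budget
    best_move = None
    for search_depth in range(1, max_depth + 1):
        best_move = minimax(search_depth)
        if time.time() >= deadline:   # out of time: play what we have
            break
    return best_move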
Minimax Pseudocode
MiniMax(Board)
best.mv = [not yet defined]
best.score = -9999
for each legal computer move m
{ make move m.mv on Board
  m.score = MIN(1)
  if (m.score > best.score) then best = m
  retract move m.mv on Board
}
make move best.mv on Board

MAX(depth)
if (game over) return EVAL-ENDING
else if (depth = max depth) return EVAL
else
  best.score = -9999
  for each legal computer move m
  { make move m.mv on Board
    m.score = MIN(depth+1)
    if (m.score > best.score) then best = m
    retract move m.mv on Board
  }
  return best.score

MIN(depth)
if (game over) return EVAL-ENDING
else if (depth = max depth) return EVAL
else
  best.score = 9999
  for each legal human move m
  { make move m.mv on Board
    m.score = MAX(depth+1)
    if (m.score < best.score) then best = m
    retract move m.mv on Board
  }
  return best.score
Assumptions:
• EVAL applies heuristics to evaluate the current position in Board, from the computer's perspective (more positive = computer winning, more negative = human winning).
• EVAL-ENDING = +999 (computer won), -999 (human won), 0 if drawn.
• Board (game state) is global. All other variables are local.
• Variables “best” and “m” represent legal moves, where:
o “.mv” represents the move itself
o “.score” represents the evaluation for the move
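For reference, a runnable Python version of this pseudocode, written against a hypothetical game object that supplies legal_moves(), make(m), retract(m), is_over(), winner(), and evaluate(); none of these names come from the notes, they just stand in for the global Board and the EVAL / EVAL-ENDING routines:

WIN, LOSS = 999, -999

def eval_ending(game):
    # EVAL-ENDING: +999 if the computer won, -999 if the human won, 0 for a draw.
    w = game.winner()
    return WIN if w == "computer" else LOSS if w == "human" else 0

def max_value(game, depth, max_depth):
    # MAX: the computer is to move; keep the largest backed-up score.
    if game.is_over():
        return eval_ending(game)
    if depth == max_depth:
        return game.evaluate()            # static (terminal) evaluation
    best = -9999
    for m in game.legal_moves():
        game.make(m)
        best = max(best, min_value(game, depth + 1, max_depth))
        game.retract(m)
    return best

def min_value(game, depth, max_depth):
    # MIN: the human is to move; keep the smallest backed-up score.
    if game.is_over():
        return eval_ending(game)
    if depth == max_depth:
        return game.evaluate()
    best = 9999
    for m in game.legal_moves():
        game.make(m)
        best = min(best, max_value(game, depth + 1, max_depth))
        game.retract(m)
    return best

def minimax_move(game, max_depth):
    # Top level: try each computer move and keep the one whose MIN reply
    # scores best; returns the move to play.
    best_move, best_score = None, -9999
    for m in game.legal_moves():
        game.make(m)
        score = min_value(game, 1, max_depth)
        game.retract(m)
        if score > best_score:
            best_move, best_score = m, score
    return best_move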