Game Playing
Chapter 5
Game playing
 Search applied to a problem against an adversary
- some actions are not under the control of the problem-solver
- there is an opponent (hostile agent)
 Since it is a search problem, we must specify states & operations/actions
- initial state = current board; operators = legal moves; goal state = game over; utility function = value for the outcome of the game
- usually, (board) games have well-defined rules & the entire state is accessible
Basic idea
 Consider all possible moves for yourself
 Consider all possible moves for your
opponent
 Continue this process until a point is reached
where we know the outcome of the game
 From this point, propagate the best move back
- choose the best move for yourself at every turn
- assume your opponent will make the optimal move on their turn
Example
 Tic-tac-toe (Nilsson’s book)
Problem
 For interesting games, it is simply not computationally possible to look at all possible moves
- in chess, there are on average 35 choices per turn
- on average, a game lasts about 50 moves per player (about 100 plies)
- thus, the number of possibilities to consider is 35^100 (roughly 10^154)
Solution
 Given that we can only look ahead k moves and can’t see all the way to the end of the game, we need a heuristic function that substitutes for looking to the end of the game
- this is usually called a static board evaluator (SBE)
- a perfect static board evaluator would tell us which moves lead to a win, loss, or draw
- possible for tic-tac-toe, but not for chess
Creating an SBE approximation
 Typically, made up of rules of thumb
- for example, in most chess books each piece is given a value
• pawn = 1; rook = 5; queen = 9; etc.
- further, there are other important characteristics of a position
• e.g., center control
- we put all of these factors into one function, potentially weighting each aspect differently, to determine the value of a position
• board_value = w1 * material_balance + w2 * center_control + … [the coefficients might change as the game goes on]
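
To make the weighted-sum idea concrete, here is a minimal Python sketch of an SBE. The board encoding, the two feature functions, and the weights w1/w2 are illustrative assumptions, not part of the slides or of any real engine.

# A minimal sketch of an SBE as a weighted sum of features (toy encoding).
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}   # pawn, knight, bishop, rook, queen

def material_balance(board):
    # board: dict mapping squares to piece letters; uppercase = mine,
    # lowercase = opponent's (illustrative encoding)
    score = 0
    for piece in board.values():
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

def center_control(board):
    # toy measure: who occupies the four central squares
    return sum(1 if board[sq].isupper() else -1
               for sq in ('d4', 'd5', 'e4', 'e5') if sq in board)

def static_board_eval(board, w1=1.0, w2=0.1):
    # weighted sum of features; the weights might change as the game goes on
    return w1 * material_balance(board) + w2 * center_control(board)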
Compromise
 If we could search to the end of the game, then choosing a move would be relatively easy
- just use minimax
 Or, if we had a perfect scoring function (SBE),
we wouldn’t have to do any search (just
choose best move from current state -- one
step look ahead)
 Since neither is feasible for interesting games,
we combine the two ideas
Basic idea
 Build the game tree as deep as possible
given the time constraints
 apply an approximate SBE to the leaves
 propagate scores back up to the root & use
this information to choose a move
 example
Score percolation: MINIMAX
 When it is my turn, I will choose the move
that maximizes the (approximate) SBE score
 When it is my opponent’s turn, they will choose the move that minimizes the SBE score
- because we are dealing with competitive games, what is good for me is bad for my opponent & what is bad for me is good for my opponent
- assume the opponent plays optimally [worst-case assumption]
MINIMAX algorithm
 Start at the leaves of the tree and apply the SBE
 If it is my turn, choose the maximum SBE
score for each sub-tree
 If it is my opponent’s turn, choose the
minimum score for each sub-tree
 The score at a leaf indicates how good the board appears from that point
 Example
Example
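
A minimal Python sketch of the minimax procedure above; game_over(), legal_moves(), apply_move(), and sbe() are hypothetical hooks into a particular game's rules and evaluator (assumptions, not part of the slides).

def minimax(state, depth, maximizing):
    # at a terminal position or the depth limit, score the board with the SBE
    if depth == 0 or game_over(state):
        return sbe(state)
    if maximizing:
        # my turn: choose the maximum backed-up score over all moves
        return max(minimax(apply_move(state, m), depth - 1, False)
                   for m in legal_moves(state))
    # opponent's turn: assume they choose the minimum score (for me)
    return min(minimax(apply_move(state, m), depth - 1, True)
               for m in legal_moves(state))

def best_move(state, depth):
    # pick the move whose sub-tree has the highest minimax value
    return max(legal_moves(state),
               key=lambda m: minimax(apply_move(state, m), depth - 1, False))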
Alpha-beta pruning
 While minimax is an effective algorithm, it can be inefficient
- one reason for this is that it does unnecessary work
- it evaluates sub-trees whose values are irrelevant to the final decision
- alpha-beta pruning gets the same answer as minimax but eliminates some useless work
Example
Alpha-Beta Algorithm
•Traverse the search tree in depth-first order
•Assuming we stop the search at ply d, then at each node
we generate at that depth, we apply the static evaluation function and
return this value to the node's parent
•At each non-leaf node, store a value indicating the
best backed-up value found so far.
At MAX nodes we'll call this alpha,
at MIN nodes we'll call the value beta.
•alpha = best (maximum) value found so far at a MAX node
(based on its descendants' values).
•beta = best (i.e., minimum) value found so far at a MIN node
(based on its descendants' values).
•The alpha value (of a MAX node) is monotonically non-decreasing
•The beta value (of a MIN node) is monotonically non-increasing
•Given a node n, cut off the search below n
(i.e., don't generate any more of n's children) if:
•beta cutoff
n is a MAX node and
alpha(n) >= beta(i) for some MIN node ancestor i of n.
•alpha cutoff
n is a MIN node and
beta(n) <= alpha(i) for some MAX node ancestor i of n.
In the example shown above, an alpha cutoff occurs.
An example of a beta cutoff at node B is shown below
(because alpha(B) = 25 > beta(S) = 20):
[tree diagram omitted: MIN node S has MAX children A and B; A evaluates
to 20, so beta(S) = 20; B's first leaf scores 25, so alpha(B) = 25 >=
beta(S) = 20 and B's remaining children D and E are cut off]
•To avoid searching for the ancestor nodes in order to make
the above tests, we can carry down the tree the best values
found so far at the ancestors.
•at a MAX node n:
beta = min of all the beta values at MIN node
ancestors of n.
•at a MIN node n:
alpha = max of all the alpha values at MAX node
ancestors of n.
•at each non-leaf node we'll store
both an alpha and a beta value.
Initially, root values of alpha = -inf and beta = +inf
See the text for the Alpha-Beta algorithm
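
A minimal Python sketch of depth-limited minimax with alpha-beta pruning, using the same hypothetical hooks (game_over, legal_moves, apply_move, sbe) as the minimax sketch above; it is an illustration under those assumptions, not the text's algorithm verbatim.

import math

def alphabeta(state, depth, alpha, beta, maximizing):
    if depth == 0 or game_over(state):
        return sbe(state)
    if maximizing:                        # MAX node: raise alpha
        value = -math.inf
        for m in legal_moves(state):
            value = max(value, alphabeta(apply_move(state, m),
                                         depth - 1, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:             # beta cutoff: a MIN ancestor already
                break                     # has a better (smaller) option
        return value
    else:                                 # MIN node: lower beta
        value = math.inf
        for m in legal_moves(state):
            value = min(value, alphabeta(apply_move(state, m),
                                         depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:             # alpha cutoff: a MAX ancestor already
                break                     # has a better (larger) option
        return value

# Initially called at the root with alpha = -math.inf and beta = +math.inf.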
Use
 We project ahead k moves, but then we make only one move (the best one)
 After our opponent moves, we again project ahead k moves, so we possibly repeat some work
 However, since most of the work is at the
leaves anyway, the amount of work we redo
isn’t significant (think of iterative
deepening)
Alpha-beta performance
 Best-case: can search to twice the depth during a fixed amount of time [O(b^(d/2)) vs. O(b^d)]
 Worst-case: no savings
- alpha-beta pruning & minimax always return the same answer
- the difference is the amount of work they do
- effectiveness depends on the order in which successors are examined
• want to examine the best first
Refinements
 Waiting for quiescence
- avoids the horizon effect
• disaster is lurking just beyond our search depth
• on the nth move (the maximum depth I can see) I take your rook, but on the (n+1)th move (a depth to which I don’t look) you checkmate me
- solution (see the sketch below)
• when predicted values are changing frequently, search deeper in that part of the tree (quiescence search)
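
A minimal Python sketch of a quiescence extension, assuming a hypothetical noisy_moves(state) that returns only "non-quiet" moves (e.g., captures), plus the same assumed hooks as the earlier sketches; it is an illustration of the idea, not the slides' algorithm.

def quiescence(state, maximizing):
    score = sbe(state)                     # value if we stop searching here
    moves = noisy_moves(state)
    if not moves:                          # quiet position: trust the SBE
        return score
    if maximizing:
        return max(score, max(quiescence(apply_move(state, m), False)
                              for m in moves))
    else:
        return min(score, min(quiescence(apply_move(state, m), True)
                              for m in moves))

# Used in place of "return sbe(state)" at the depth cutoff of the
# minimax / alpha-beta sketches, so unstable positions are searched deeper.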
Secondary search
 Find the best move by looking to depth d
 Look k steps beyond this best move to see if
it still looks good
 No? Look further at the second best move, etc.
- in general, do a deeper search at parts of the tree that look “interesting”
 Picture
Book moves
 Build a database of opening moves, end
games, tough examples, etc.
 If the current state is in the database, use the
knowledge in the database to determine the
quality of a state
 If it’s not in the database, just do alpha-beta
pruning
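
A minimal Python sketch of the book-then-search idea; opening_book and position_key() are hypothetical (a real program would build the book offline), and the fallback reuses the alpha-beta sketch above.

import math

opening_book = {}                        # e.g. {position key: stored best move}

def choose_move(state, depth):
    key = position_key(state)            # hashable encoding of the board (assumed)
    if key in opening_book:              # state found in the database:
        return opening_book[key]         # use the stored knowledge directly
    # otherwise fall back to ordinary alpha-beta search
    return max(legal_moves(state),
               key=lambda m: alphabeta(apply_move(state, m), depth - 1,
                                       -math.inf, math.inf, False))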
AI & games
 Initially felt to be a great AI testbed
 It turned out, however, that brute-force search is better than a lot of knowledge engineering
- scaling up by dumbing down
• perhaps then intelligence doesn’t have to be humanlike
- more high-speed hardware issues than AI issues
- however, still good test-beds for learning