Game Playing and AI: Computers Playing Chess
Chapter 5.1–5.3, 5.5

Game Playing and AI
Game playing as a problem for AI research:
–  game playing is non-trivial
   •  players need "human-like" intelligence
   •  games can be very complex (e.g., Chess, Go)
   •  requires decision making within limited time
–  games usually are:
   •  well-defined and repeatable
   •  fully observable and limited environments
–  can directly compare humans and computers
Types of Games
Definitions:
•  Zero-sum: one player's gain is the other player's loss. Does not mean fair.
•  Discrete: states and decisions have discrete values
•  Finite: finite number of states and decisions
•  Deterministic: no coin flips, die rolls – no chance
•  Perfect information: each player can see the complete game state. No simultaneous decisions.

Types of Games
                                      | Deterministic                | Stochastic (chance)
Fully Observable (perfect info)       | Checkers, Chess, Go, Othello | Backgammon, Monopoly
Partially Observable (imperfect info) | Stratego, Battleship         | Bridge, Poker, Scrabble

All are also multi-agent, adversarial, static tasks

Game Playing as Search
•  Consider two-player, perfect information, deterministic, zero-sum board games:
   –  e.g., chess, checkers, tic-tac-toe
   –  Board configuration: a specific arrangement of "pieces"
•  Representing board games as a search problem:
   –  states: board configurations
   –  actions: legal moves
   –  initial state: starting board configuration
   –  goal state: game over / terminal board configuration
Game Tree Representation
What's the new aspect to the search problem?
There's an opponent we cannot control!
[Figure: partial tic-tac-toe game tree showing X's possible moves and O's possible replies]
How can we handle this?

•  A Utility function is used to map each terminal state of the board (i.e., states where the game is over) to a score indicating the value of that outcome to the computer
•  We'll use:
   –  positive for winning; large + means better for computer
   –  negative for losing; large − means better for opponent
   –  0 for a draw
   –  typical values (loss to win):
      •  −∞ to +∞
      •  −1.0 to +1.0

Greedy Search using an Evaluation Function
•  Expand the search tree to the terminal states on each branch
•  Evaluate the Utility of each terminal board configuration
•  Make the initial move that results in the board configuration with the maximum value
[Figure: two-level game tree; root A, the computer's possible moves lead to B = −5, C = 9, D = 2, E = 3, and the opponent's possible moves lead to terminal states F through O; board evaluations are from the computer's perspective]

•  Assuming a reasonable search space, what's the problem?
   This ignores what the opponent might do!
   Computer chooses C
   Opponent chooses J and defeats the computer
Minimax Principle
Assume both players play optimally
–  assuming there are two moves until the terminal states,
–  high Utility values favor the computer
   •  the computer should choose maximizing moves
–  low Utility values favor the opponent
   •  a smart opponent chooses minimizing moves

•  The computer assumes after it moves the opponent will choose the minimizing move
•  The computer chooses the best move considering both its move and the opponent's optimal move
[Figure: the same game tree with minimax values propagated up from the terminal states: B = −7, C = −6, D = 0, E = 1, so the root A = 1]
Propagating Minimax Values Up the Game Tree
•  Explore the tree to the terminal states
•  Evaluate the Utility of the resulting board configurations
•  The computer makes a move to put the board in the best configuration for it, assuming the opponent makes her best moves on her turn(s)

Deeper Game Trees
•  Minimax can be generalized to more than 2 moves
•  Propagate values up the tree
   –  start at the leaves
   –  assign a value to the parent node as follows
      •  use the minimum when the node is the opponent's move
      •  use the maximum when the node is the computer's move
[Figure: a deeper game tree with values propagated up alternating levels, e.g., F = 4, B = −5, root A = 3]
General Minimax Algorithm
For each move by the computer:
1. Perform depth-first search, stopping at terminal states
2. Evaluate each terminal state
3. Propagate upwards the minimax values:
   if opponent's move, propagate up the minimum value of its children
   if computer's move, propagate up the maximum value of its children
4. Choose the move at the root with the maximum of the minimax values of its children

Search algorithm independently invented by Claude Shannon (1950) and Alan Turing (1951)

[Figure: four-level game tree with alternating computer (max) and opponent (min) levels; minimax values such as C = 3, J = 9, K = 5, E = −7, M = −7, O = −5 are propagated up from terminal states P through X]
Complexity of Minimax Algorithm
Assume all terminal states are at depth d and there are b possible moves at each step
•  Space complexity
   Depth-first search, so O(bd)
•  Time complexity
   Branching factor b, so O(b^d)
•  Time complexity is a major problem since the computer typically only has a limited amount of time to make a move

Complexity of Game Playing
•  Assume the opponent's moves can be predicted given the computer's moves
•  How complex would search be in this case?
   –  worst case: O(b^d) with branching factor b and depth d
   –  Tic-Tac-Toe: ~5 legal moves, 9 moves max per game
      •  5^9 = 1,953,125 states
   –  Chess: ~35 legal moves, ~100 moves per game
      •  b^d ~ 35^100 ~ 10^154 states, but only ~10^40 legal states
   –  Go: ~250 legal moves, ~150 moves per game
•  Common games produce enormous search trees

Complexity of Minimax Algorithm
•  The Minimax algorithm applied to complete game trees is impractical in practice
   –  instead do depth-limited search to ply (depth) m
   –  but the Utility function is defined only for terminal states
   –  we need to know a value for non-terminal states
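The search-space sizes quoted above are easy to verify directly; a quick sketch in Python:

```python
# Back-of-envelope check of the search-space sizes quoted above.
tic_tac_toe_states = 5 ** 9        # ~5 legal moves, 9 moves max per game
chess_states = 35 ** 100           # ~35 legal moves, ~100 moves per game

print(tic_tac_toe_states)          # 1953125
print(len(str(chess_states)))      # 155 digits, i.e. about 10^154
```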
Static Board Evaluation
•  A Static Board Evaluation (SBE) function is used to estimate how good the current board configuration is for the computer
   –  it reflects the computer's chances of winning from that node
   –  it must be easy to calculate from a board configuration
•  Static Evaluation functions use heuristics to estimate the value of non-terminal states
•  For example, for Chess:
   SBE = α * materialBalance + β * centerControl + γ * …
   where materialBalance = value of white pieces − value of black pieces; pawn = 1, rook = 5, queen = 9, etc.

Static Board Evaluation
•  Typically, one subtracts how good the board is for the opponent from how good it is for the computer
•  If the SBE gives X for a player, then it gives −X for the opponent
•  The SBE should agree with the Utility function when calculated at terminal nodes
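A minimal sketch of the chess SBE above, assuming a board summarized as piece counts per side; the knight/bishop values and the weights α and β are illustrative choices, not from the slides:

```python
# Toy static board evaluation for chess (piece values per the slides,
# plus the conventional 3 for knight/bishop).
PIECE_VALUE = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_balance(white, black):
    """Value of white pieces minus value of black pieces."""
    return (sum(PIECE_VALUE[p] * n for p, n in white.items())
            - sum(PIECE_VALUE[p] * n for p, n in black.items()))

def sbe(white, black, center_control, alpha=1.0, beta=0.5):
    # SBE = alpha * materialBalance + beta * centerControl + ...
    return alpha * material_balance(white, black) + beta * center_control
```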
Minimax with Evaluation Functions
•  The same as general Minimax, except
   –  only go to depth m
   –  estimate the value at the leaves using the SBE function
•  How would this algorithm perform at Chess?
   –  if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
   –  if it could look ahead ~8 pairs, it is as good as a human master

Tic-Tac-Toe Example
Evaluation function = (# 3-lengths open for me) – (# 3-lengths open for opponent)
Minimax Algorithm
function Max-Value(s)
  inputs:
    s: current state in game, Max about to play
  output: best score (for Max) available from s
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  else
    v = –∞    // v is current best minimax value at s
    foreach s' in Successors(s)
      v = max( v, Min-Value(s') )
    return v

function Min-Value(s)
  output: best score (for Min) available from s
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  else
    v = +∞    // v is current best minimax value at s
    foreach s' in Successors(s)
      v = min( v, Max-Value(s') )
    return v
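The pseudocode above translates almost line-for-line into Python. This sketch assumes a small Game interface (is_terminal, sbe, successors) standing in for a real game implementation:

```python
# Depth-limited Minimax, following the slide's Max-Value/Min-Value pseudocode.
# `game` is an assumed interface with is_terminal(s), sbe(s), successors(s).

def max_value(game, state, depth):
    """Best score Max can guarantee from state, searching `depth` more plies."""
    if game.is_terminal(state) or depth == 0:
        return game.sbe(state)            # static board evaluation at the frontier
    v = float("-inf")
    for s in game.successors(state):
        v = max(v, min_value(game, s, depth - 1))
    return v

def min_value(game, state, depth):
    """Best score Min can force from state (Min minimizes Max's score)."""
    if game.is_terminal(state) or depth == 0:
        return game.sbe(state)
    v = float("+inf")
    for s in game.successors(state):
        v = min(v, max_value(game, s, depth - 1))
    return v
```

On a tiny tree where one move offers a big payoff the opponent can deny, minimax correctly prefers the safer move, unlike greedy search.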
Summary So Far
•  Can't use Minimax search to the end of the game
   –  if we could, then choosing the optimal move would be easy
•  The SBE isn't perfect at estimating/scoring
   –  if it was, we could just choose the best move without searching
•  Since neither is feasible for interesting games, combine the Minimax and SBE concepts:
   –  use Minimax to cut off search at depth m
   –  use the SBE to estimate/score board configurations
Minimax Example
[Figure: five-level game tree (root A at max, then alternating min/max levels) with terminal values at the leaves; minimax values are propagated up from the terminal states through nodes B through X]
Alpha-Beta Idea
•  Some of the branches of the game tree won't be taken if playing against an intelligent opponent
•  "If you have an idea that is surely bad, don't take the time to see how truly awful it is."
   – Pat Winston
•  Pruning can be used to ignore some branches
•  While doing DFS of the game tree, keep track of:
   –  At maximizing levels:
      •  the highest SBE value, v, seen so far in the subtree below each node
      •  a lower bound on the node's final minimax value
   –  At minimizing levels:
      •  the lowest SBE value, v, seen so far in the subtree below each node
      •  an upper bound on the node's final minimax value
Alpha-Beta Idea: Alpha Cutoff
[Figure: max node S over min nodes A and B; A's children C = 200, D = 100, E = 120 give A = 100, so α = 100 at S; B's first child F = 20 gives v = 20 at B, and α ≥ v, so child G is pruned]
•  Depth-first traversal order
•  After returning from A, can get at least 100 at S
•  After returning from F, can get at most 20 at B
•  At this point, no matter what minimax value is computed at G, S will prefer A over B. So S loses interest in B
•  There is no need to visit G. The subtree at G is pruned. Saves time. Called "Alpha cutoff" (at MIN node B)
Alpha Cutoff
•  At each MIN node, keep track of the minimum value returned so far from its visited children
•  Store this value as v
•  Each time v is updated (at a MIN node), check its value against the α value of all its MAX node ancestors
•  If α ≥ v for some MAX node ancestor, don't visit any more of the current MIN node's children; i.e., prune (cut off) all subtrees rooted at the remaining children of the MIN node

Beta Cutoff Example
[Figure: min node A over max nodes B and C; B's children D = 20, E = −10, F = −20 give B = 20, so β = 20 at A; C's first child G = 25 gives v = 25 at C, and v ≥ β, so child H is pruned]
•  After returning from B, can get at most 20 at MIN node A
•  After returning from G, can get at least 25 at MAX node C
•  No matter what minimax value is found at H, A will NEVER choose C over B, so don't visit node H
•  Called "Beta Cutoff" (at MAX node C)
Beta Cutoff
•  At each MAX node, keep track of the maximum value returned so far from its visited children
•  Store this value as v
•  Each time v is updated (at a MAX node), check its value against the β value of all its MIN node ancestors
•  If v ≥ β for some MIN node ancestor, don't visit any more of the current MAX node's children; i.e., prune (cut off) all subtrees rooted at the remaining children of the MAX node

Implementation of Cutoffs
At each node, keep both α and β values, where α = largest (i.e., best) value from its MAX node ancestors in the search tree, and β = smallest (i.e., best) value from its MIN node ancestors in the search tree. Pass these down the tree during traversal.
–  At a MAX node, v = largest value from its children visited so far; cutoff if v ≥ β
   •  the v value at a MAX node comes from its descendants
   •  the β value at a MAX node comes from its MIN node ancestors
Implementation of Cutoffs
–  At a MIN node, v = smallest value from its children visited so far; cutoff if α ≥ v
   •  the α value at a MIN node comes from its MAX node ancestors
   •  the v value at a MIN node comes from its descendants

•  At each node, keep two bounds (based on all ancestors visited so far on the path back to the root):
   §  α: the best (largest) MAX can do at any ancestor
   §  β: the best (smallest) MIN can do at any ancestor
   §  v: the best value returned by the current node's visited children
•  If at any time α ≥ v at a MIN node, the remaining children are pruned (i.e., not visited)

Implementation of Alpha Cutoff
[Figure: the alpha-cutoff example tree again (max node S over min nodes A and B; A's children C = 200, D = 100, E = 120; B's children F = 20 and G); the root's values are initialized to α = −∞, β = +∞]
Alpha Cutoff Example
[Figure sequence: the same tree traversed step by step, depth-first:
–  Initialize the root S with α = −∞, β = +∞ and pass these down to A
–  After visiting C = 200: v(A) = 200, so β(A) = 200
–  After visiting A's remaining children D = 100 and E = 120: v(A) = 100, and A returns 100
–  Back at S: α(S) = 100, passed down to B
–  After visiting F = 20: v(B) = 20; now α = 100 ≥ v = 20, so G is pruned]
Notes:
•  An alpha cutoff means not visiting some of a MIN node's children
•  The v values at a MIN node come from its descendants
•  The alpha value at a MIN node comes from its MAX node ancestors
Alpha-Beta Algorithm
Starting from the root: Max-Value(root, −∞, +∞)

function Max-Value(s, α, β)
  inputs:
    s: current state in game, Max about to play
    α: best score (highest) for Max along path from s to root
    β: best score (lowest) for Min along path from s to root
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  v = −∞    // v = best minimax value found so far at s
  foreach s' in Successors(s)
    v = max( v, Min-Value(s', α, β) )
    if ( v ≥ β ) then return v    // prune remaining children
    α = max( α, v )
  return v    // return value of best child

function Min-Value(s, α, β)
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  v = +∞    // v = best minimax value found so far at s
  foreach s' in Successors(s)
    v = min( v, Max-Value(s', α, β) )
    if ( α ≥ v ) then return v    // prune remaining children
    β = min( β, v )
  return v    // return value of best child
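This pseudocode is also a near-direct transcription into Python; as with the plain Minimax sketch, the Game interface (is_terminal, sbe, successors) is an assumed stand-in:

```python
# Alpha-beta search, following the slide's Max-Value/Min-Value pseudocode.
# `game` is an assumed interface with is_terminal(s), sbe(s), successors(s).

def max_value(game, s, alpha, beta, depth):
    if game.is_terminal(s) or depth == 0:
        return game.sbe(s)
    v = float("-inf")
    for s2 in game.successors(s):
        v = max(v, min_value(game, s2, alpha, beta, depth - 1))
        if v >= beta:
            return v                 # beta cutoff: prune remaining children
        alpha = max(alpha, v)
    return v

def min_value(game, s, alpha, beta, depth):
    if game.is_terminal(s) or depth == 0:
        return game.sbe(s)
    v = float("+inf")
    for s2 in game.successors(s):
        v = min(v, max_value(game, s2, alpha, beta, depth - 1))
        if alpha >= v:
            return v                 # alpha cutoff: prune remaining children
        beta = min(beta, v)
    return v
```

On the alpha-cutoff example tree from the earlier slides, this returns 100 at the root without ever evaluating the pruned child G.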
Alpha-Beta Example
[Figure sequence: a four-level game tree (root A at max, then alternating min/max levels, terminal states in brown) searched depth-first with alpha-beta; each snapshot shows the α, β, and v values at the nodes on the current call stack. Key steps in the traversal:]
•  v(F) = α(F) = 4, the maximum seen so far below F
•  v(O) = −3, the minimum seen so far below O
•  α(O) ≥ v(O): stop expanding O (alpha cutoff)
   Why? The smart opponent will choose W or worse, thus O's upper bound is −3. So at F the computer shouldn't choose O: −3, since N: 4 is better.
•  α(F) is not changed (F is a maximizing node)
•  v(B) = β(B) = 4, the minimum seen so far; later updated to v(B) = β(B) = −5
•  v(A) = α(A) = −5, the maximum seen so far; copy the α and β values from A down to C
•  v(C) = β(C) = 3, the minimum seen so far; v(C) (= 3) > α(C) (= −5), so no cutoff
•  β(C) is not changed (C is a minimizing node)
•  v(J) = 9; v(J) (= 9) ≥ β(J) (= 3), so stop expanding J (beta cutoff)
   Why? The computer would choose P or better at J, so J's lower bound is 9. But the smart opponent at C won't take J: 9, since H: 3 is better for the opponent.
•  v(C) and β(C) are not changed (minimizing)
•  v(A) = α(A) = 3, updated to the maximum seen so far
•  α(A) and v(A) are not updated after returning from D (because A is a maximizing node)
•  After visiting K, S, T, L and returning to E, α(E) (= 3) ≥ v(E) (= 2), so stop expanding E and don't visit M (alpha cutoff)
   Why? The smart opponent will choose L or worse, thus E's upper bound is 2. So the computer at A shouldn't choose E: 2, since C: 3 is a better move.
•  Final result: the computer chooses move C

Another step-by-step example (from the AI course at UC Berkeley) is given at
https://www.youtube.com/watch?v=xBXHtz4Gbdo
Effectiveness of Alpha-Beta Search
•  Effectiveness depends on the order in which successors are examined; more effective (i.e., more cutoffs) if the best successors are examined first
•  Worst Case:
   –  ordered so that no pruning takes place
   –  no improvement over exhaustive search
•  Best Case:
   –  each player's best move is visited first
•  In practice, performance is closer to the best, rather than the worst, case
•  In practice often get O(b^(d/2)) rather than O(b^d)
   –  same as having a branching factor of √b, since (√b)^d = b^(d/2)
•  Example: Chess
   –  Deep Blue went from b ~ 35 to b ~ 6, visiting 1 billionth the number of nodes visited by the Minimax algorithm
   –  permits much deeper search for the same time
   –  makes computer chess competitive with humans

Dealing with Limited Time
•  In real games, there is usually a time limit, T, on making a move
•  How do we take this into account?
   –  we cannot stop the alpha-beta algorithm midway and expect to use its results with any confidence
   –  so, we could set a conservative depth limit that guarantees we will find a move in time < T
   –  but then the search may finish early and the opportunity to do more search is wasted

Dealing with Limited Time
In practice, use iterative deepening search (IDS)
–  run alpha-beta search with depth-first search and an increasing depth limit
–  when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
–  an "anytime algorithm"
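The "anytime" pattern above can be sketched as follows; `search_to_depth` is an assumed stand-in for a depth-limited alpha-beta search that reports whether it finished within its time budget:

```python
# Iterative deepening under a move clock: run deeper and deeper searches,
# keep the move from the last search that completed before the deadline.
import time

def ids_move(search_to_depth, time_limit, max_depth=64):
    deadline = time.monotonic() + time_limit
    best = None
    for depth in range(1, max_depth + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        move, finished = search_to_depth(depth, remaining)
        if finished:
            best = move              # only trust fully completed searches
        else:
            break                    # ran out of time mid-search; discard it
    return best
```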
The Horizon Effect
•  Sometimes disaster lurks just beyond the search depth
   –  the computer captures a queen, but a few moves later the opponent checkmates (i.e., wins)
•  The computer has a limited horizon; it cannot see that this significant event could happen
•  How do you avoid catastrophic losses due to "short-sightedness"?
   –  quiescence search
   –  secondary search

The Horizon Effect
•  Quiescence Search
   –  when the SBE value is frequently changing, look deeper than the depth limit
   –  look for a point when the game "quiets down"
   –  e.g., always expand any forced sequences
•  Secondary Search
   1.  find the best move looking to depth d
   2.  look k steps beyond to verify that it still looks good
   3.  if it doesn't, repeat step 2 for the next best move

Book Moves
•  Build a database of opening moves, end games, and studied configurations
•  If the current state is in the database, use the database:
   –  to determine the next move
   –  to evaluate the board
•  Otherwise, do Alpha-Beta search
More on Evaluation Functions
The board evaluation function estimates how good the current board configuration is for the computer
–  it is a heuristic function of the board's features
   •  i.e., function(f1, f2, f3, …, fn)
–  the features are numeric characteristics
   •  feature 1, f1, is the number of white pieces
   •  feature 2, f2, is the number of black pieces
   •  feature 3, f3, is f1/f2
   •  feature 4, f4, is an estimate of the "threat" to the white king
   •  etc.

Linear Evaluation Functions
•  A linear evaluation function of the features is a weighted sum of f1, f2, f3, ...
   w1 * f1 + w2 * f2 + w3 * f3 + … + wn * fn
   –  where f1, f2, …, fn are the features
   –  and w1, w2, …, wn are the weights
•  More important features get more weight
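The weighted sum above, as a tiny function (the particular weights and feature values are whatever the designer, or a learner, supplies):

```python
# Linear evaluation: w1*f1 + w2*f2 + ... + wn*fn
def linear_eval(weights, features):
    return sum(w * f for w, f in zip(weights, features))
```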
Linear Evaluation Functions
•  The quality of play depends directly on the quality of the evaluation function
•  To build an evaluation function we have to:
   1.  construct good features using expert domain knowledge
   2.  pick or learn good weights

Examples of Algorithms that Learn to Play Well
Checkers
A. L. Samuel, "Some Studies in Machine Learning using the Game of Checkers," IBM Journal of Research and Development, 11(6):601-617, 1959
•  Learned by playing thousands of times against a copy of itself
•  Used an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
•  Successful enough to compete well at human tournaments

Examples of Algorithms that Learn to Play Well
Backgammon
G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3), 357-390, 1989
•  Also learned by playing against itself
•  Used a non-linear evaluation function – a neural network
•  Rated one of the top three players in the world
Non-Deterministic Games
[Figure: backgammon board with points numbered 0–25]
•  Some games involve chance, for example:
   –  roll of dice
   –  spin of a game wheel
   –  deal of cards from a shuffled deck
•  How can we handle games with random elements?
•  The game tree representation is extended to include "chance nodes":
   1.  computer moves
   2.  chance nodes (representing random events)
   3.  opponent moves
Non-Deterministic Games
•  Weight the score by the probability that the move occurs
•  Use the expected value for a move: instead of using max or min, compute the average, weighted by the probabilities of each child
Extended game tree representation:
[Figure: game tree with a max level for the computer (root A), a chance level of 50/50 nodes (B, C, D, E), and a min level for the opponent below, with terminal values such as 7, 2, 9, 6, 0, 5, 8, −4]
Non-Deterministic Games
Choose the move with the highest expected value
[Figure: the same tree with expected values computed at the chance nodes (e.g., 4 and −2); the max node A chooses the child with expected value 4]
Expectiminimax
Expectiminimax(n) =
  SBE(n)                                      if n is a terminal state or a state at the cutoff depth
  max_{s ∈ Succ(n)} expectiminimax(s)         if n is a Max node
  min_{s ∈ Succ(n)} expectiminimax(s)         if n is a Min node
  Σ_{s ∈ Succ(n)} P(s) · expectiminimax(s)    if n is a Chance node
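The definition above translates directly into code; this sketch assumes a small Game interface (kind, successors, prob, is_terminal, sbe) standing in for a real game implementation:

```python
# Expectiminimax: minimax extended with probability-weighted chance nodes.
def expectiminimax(game, n, depth):
    if game.is_terminal(n) or depth == 0:
        return game.sbe(n)
    kind = game.kind(n)                  # "max", "min", or "chance"
    children = game.successors(n)
    if kind == "max":
        return max(expectiminimax(game, s, depth - 1) for s in children)
    if kind == "min":
        return min(expectiminimax(game, s, depth - 1) for s in children)
    # chance node: average of children, weighted by their probabilities
    return sum(game.prob(s) * expectiminimax(game, s, depth - 1)
               for s in children)
```

On a root with two 50/50 chance children, this reproduces the "pick the move with the highest expected value" behavior from the previous slide.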
Non-Deterministic Games
•  Non-determinism increases the branching factor
   –  21 possible distinct rolls with 2 dice (since 6-5 is the same as 5-6)
•  The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases
•  Alpha-Beta pruning is less effective
History of Search Innovations
•  Shannon, Turing      Minimax search          1950
•  Kotok/McCarthy       Alpha-Beta pruning      1966
•  MacHack              Transposition tables    1967
•  Chess 3.0+           Iterative deepening     1975
•  Belle                Special hardware        1978
•  Cray Blitz           Parallel search         1983
•  Hitech               Parallel evaluation     1985
•  Deep Blue            ALL OF THE ABOVE        1997
Computers Play Grand Master Chess
"Deep Blue" (IBM)
•  Parallel processor, 32 "nodes"
•  Each node had 8 dedicated VLSI "chess chips"
•  Searched 200 million configurations/second
•  Used minimax, alpha-beta, sophisticated heuristics
•  Average branching factor ~6 instead of ~40
•  In 2001 searched to 14 ply (i.e., 7 pairs of moves)
•  Avoided the horizon effect by searching as deep as 40 ply
•  Used book moves

Computers Can Play Grand Master Chess
Kasparov vs. Deep Blue, May 1997
•  6-game full-regulation chess match sponsored by ACM
•  Kasparov lost the match 2 wins to 3 wins and 1 tie
•  Historic achievement for computer chess; the first time a computer became the best chess player on the planet
•  Deep Blue played by "brute force" (i.e., raw power from computer speed and memory); it used relatively little that is similar to human intuition and cleverness
Status of Computers in Other Deterministic Games
("Game Over: Kasparov and the Machine," 2003)
•  Checkers
   –  First computer world champion: Chinook
   –  Beat all humans (beat Marion Tinsley in 1994)
   –  Used Alpha-Beta search and book moves
•  Othello
   –  Computers easily beat world experts
•  Go
   –  Branching factor b ~ 360, very large!
   –  Beat Lee Sedol, one of the top players in the world, in 2016, 4 games to 1

Game Playing: Go
Google's AlphaGo beat Korean grandmaster Lee Sedol 4 games to 1 in 2016
How to Improve Performance?
•  Reduce the depth of search
   –  Better SBEs
      •  Use deep learning to learn good features rather than using "manually-defined" features
•  Reduce the breadth of search
   –  Explore a subset of the possible moves instead of exploring all of them
•  Use randomized exploration of the search space

Monte Carlo Tree Search (MCTS)
•  Concentrate search on the most promising moves
•  Best-first search based on random sampling of the search space
•  Monte Carlo methods are a broad class of algorithms that rely on repeated random sampling to obtain numerical results. They can be used to solve problems having a probabilistic interpretation.
Pure Monte Carlo Tree Search
•  For each possible legal move of the current player, simulate k random games by selecting moves at random for both players until the game is over (called playouts); count how many were wins out of each k playouts; the move with the most wins is selected
•  Stochastic simulation of the game
•  The game must have a finite number of possible moves, and the game length must be finite

Exploitation vs. Exploration
•  Rather than selecting a child at random, how should we select the best child node during tree descent?
   –  Exploitation: Keep track of the average win rate for each child from previous searches; prefer the child that has previously led to more wins
   –  Exploration: Allow for exploration of relatively unvisited children (moves) too
•  Combine these factors to compute a "score" for each child; pick the child with the highest score at each successive node in the search

MCTS Algorithm
Recursively build the search tree, where each round consists of:
1.  Starting at the root, successively select the best child nodes using the scoring method until a leaf node L is reached
2.  Create and add the best new child node, C, of L
3.  Perform a random playout from C
4.  Update the score at C and all of C's ancestors in the search tree
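The slides don't name a specific scoring rule, but a common choice for the exploitation-vs-exploration trade-off described above is UCB1; this sketch uses it to pick a child during tree descent (the constant c is a tunable assumption):

```python
# UCB1 score: average win rate (exploitation) plus a bonus for
# rarely visited children (exploration).
import math

def ucb1(wins, visits, parent_visits, c=1.41):
    if visits == 0:
        return float("inf")           # always try unvisited children first
    exploit = wins / visits           # average win rate from previous playouts
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children, parent_visits):
    """Pick the (wins, visits, move) tuple with the highest UCB1 score."""
    return max(children, key=lambda ch: ucb1(ch[0], ch[1], parent_visits))
```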
Monte Carlo Tree Search (MCTS)
[Figure: one MCTS round — descend to leaf node L, expand new child C, run a random Playout from C, then Update Scores at C and all of its ancestors; key: number of games won / number of playouts played at each node]

State-of-the-Art Go Programs
•  Google's AlphaGo
•  Facebook's Darkforest
•  MCTS implemented using multiple threads and GPUs, and up to 110K playouts
•  Also used a deep neural network to compute the SBE
Summary
•  Game playing is modeled as a search problem
•  Search trees for games represent both computer and opponent moves
•  Evaluation functions estimate the quality of a given board configuration for each player
•  The Minimax algorithm determines the "optimal" moves by assuming that both players always choose their best move

Summary
•  The Alpha-Beta algorithm can avoid large parts of the search tree, thus enabling the search to go deeper
•  For many well-known games, computer algorithms using heuristic search can match or out-perform human experts