Game Playing and AI
Chapter 5.1–5.3, 5.5

Game Playing as a Problem for AI Research
– game playing is non-trivial
• players need "human-like" intelligence
• games can be very complex (e.g., Chess, Go)
• it requires decision making within limited time
– games usually are:
• well-defined and repeatable
• fully observable and limited environments
– we can directly compare humans and computers

Computers Playing Chess

Types of Games
Definitions:
• Zero-sum: one player's gain is the other player's loss. Does not mean fair.
• Discrete: states and decisions have discrete values
• Finite: finite number of states and decisions
• Deterministic: no coin flips, die rolls; no chance
• Perfect information: each player can see the complete game state. No simultaneous decisions.

                                           Deterministic                   Stochastic (chance)
  Fully Observable (perfect info)          Checkers, Chess, Go, Othello    Backgammon, Monopoly
  Partially Observable (imperfect info)    Stratego, Battleship            Bridge, Poker, Scrabble

All are also multi-agent, adversarial, static tasks.

Game Playing as Search
• Consider two-player, perfect-information, deterministic, zero-sum board games:
– e.g., chess, checkers, tic-tac-toe
– Board configuration: a specific arrangement of "pieces"
• Representing board games as a search problem:
– states: board configurations
– actions: legal moves
– initial state: starting board configuration
– goal states: game-over/terminal board configurations

Game Tree Representation
What's the new aspect to the search problem? There's an opponent we cannot control! How can we handle this?
[game tree: from the empty tic-tac-toe board, X's possible moves, then O's replies to each, and so on; diagram omitted]

Greedy Search using an Evaluation Function
• A Utility function is used to map each terminal state of the board (i.e., states where the game is over) to a score indicating the value of that outcome to the computer
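A utility function of this kind can be sketched concretely; a minimal tic-tac-toe version, where the board encoding (a tuple of 9 cells holding "X", "O", or None) is an assumption:

```python
# A sketch of a utility function for tic-tac-toe terminal states;
# the board encoding (tuple of 9 cells, "X"/"O"/None) is an assumption.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def utility(board, computer="X"):
    """Score a terminal board from the computer's perspective."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return +1 if board[a] == computer else -1
    return 0  # a draw (assumes the game is actually over)

# X has completed the top row, so the outcome is worth +1 to the computer:
print(utility(("X", "X", "X", "O", "O", None, None, None, None)))  # -> 1
```

The +1/−1/0 scale matches the win/loss/draw convention used below.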
• We'll use:
– positive for winning; a large + means better for the computer
– negative for losing; a large − means better for the opponent
– 0 for a draw
– typical ranges (loss to win):
• −∞ to +∞
• −1.0 to +1.0

Greedy Search using an Evaluation Function (cont.)
• Expand the search tree to the terminal states on each branch
• Evaluate the utility of each terminal board configuration
• Make the initial move that results in the board configuration with the maximum value
• Assuming a reasonable search space, what's the problem?
This ignores what the opponent might do!
[example tree: the computer's possible moves A–E lead through the opponent's possible moves to terminal states, with board evaluations from the computer's perspective. The computer greedily chooses C (value 9); the opponent then chooses J (value −6) and defeats the computer.]

Minimax Principle
Assume both players play optimally:
– assuming there are two moves until the terminal states,
– high utility values favor the computer, so the computer should choose maximizing moves
– low utility values favor the opponent, and a smart opponent chooses minimizing moves
• The computer assumes that after it moves, the opponent will choose the minimizing move
• The computer therefore chooses its best move considering both its move and the opponent's optimal reply
[same example tree under minimax: B = −7, C = −6, D = 0, E = 1, so the computer at A chooses E with value 1]

Propagating Minimax Values Up the Game Tree
• Explore the tree to the terminal states
• Evaluate the utility of the resulting board configurations
• The computer makes a move to put the board in the best configuration for it, assuming the opponent makes her best moves on her turn(s)

Deeper Game Trees
• Minimax can be generalized to more than 2 moves
• Propagate values up the tree:
– start at the leaves
– assign a value to the parent node as follows:
• use the minimum when the node is the opponent's move
• use the maximum when the node is the computer's move
[example: a four-level tree in which min and max values alternate up from the terminal states, giving root value A = 3; diagram omitted]

General Minimax Algorithm
For each move by the computer:
1. Perform depth-first search, stopping at terminal states
2. Evaluate each terminal state
3. Propagate the minimax values upwards:
– if it is the opponent's move, propagate up the minimum value of its children
– if it is the computer's move, propagate up the maximum value of its children
4. Choose the move at the root with the maximum of the minimax values of its children

The search algorithm was independently invented by Claude Shannon (1950) and Alan Turing (1951).

Complexity of the Minimax Algorithm
Assume all terminal states are at depth d and there are b possible moves at each step.
• Space complexity: depth-first search, so O(bd)
• Time complexity: branching factor b to depth d, so O(b^d)
• Time complexity is a major problem, since the computer typically has only a limited amount of time to make a move

Complexity of Game Playing
• Assume the opponent's moves can be predicted given the computer's moves
• How complex would search be in this case?
– worst case: O(b^d) with branching factor b and depth d
– Tic-Tac-Toe: ~5 legal moves per turn, at most 9 moves per game, so 5^9 = 1,953,125 states
– Chess: ~35 legal moves, ~100 moves per game, so b^d ≈ 35^100 ≈ 10^154 states (only ~10^40 of them legal)
– Go: ~250 legal moves, ~150 moves per game
• Common games produce enormous search trees

• The minimax algorithm applied to complete game trees is impractical
– instead, do depth-limited search to ply (depth) m
– but the utility function is defined only for terminal states
– so we need to know a value for non-terminal states

Static Board Evaluation
• A Static Board Evaluation (SBE) function is used to estimate how good the current board configuration is for the computer
– it reflects the computer's chances of winning from that node
– it must be easy to calculate from a board configuration
• For example, for Chess:
SBE = α * materialBalance + β * centerControl + γ * …
where materialBalance = value of white pieces − value of black pieces, with pawn = 1, rook = 5, queen = 9, etc.
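A minimal sketch of the material-balance term of such an SBE; the piece encoding, the minor-piece value of 3, and the weights α and β are illustrative assumptions:

```python
# Piece values as on the slide: pawn = 1, rook = 5, queen = 9;
# knight/bishop = 3 is a conventional choice, not given on the slide.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_balance(pieces):
    """pieces: iterable of piece codes; uppercase = white, lowercase = black."""
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)  # kings etc. score 0 here
        score += value if piece.isupper() else -value
    return score

def sbe(pieces, center_control, alpha=1.0, beta=0.5):
    # Weighted sum of features: SBE = alpha*materialBalance + beta*centerControl + ...
    return alpha * material_balance(pieces) + beta * center_control

# White is up a rook; the queens and pawns cancel out:
print(material_balance(["P", "p", "R", "q", "Q"]))  # -> 5
```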
• Static evaluation functions use heuristics to estimate the value of non-terminal states

Static Board Evaluation (cont.)
• Typically, one subtracts how good the position is for the opponent from how good it is for the computer
• If the SBE gives X for one player, then it gives −X for the opponent
• The SBE should agree with the utility function when calculated at terminal nodes

Minimax with Evaluation Functions
• The same as general minimax, except:
– only go to depth m
– estimate the value at the leaves using the SBE function
• How would this algorithm perform at Chess?
– if it could look ahead ~4 pairs of moves (i.e., 8 ply), it would be consistently beaten by average players
– if it could look ahead ~8 pairs, it would be as good as a human master

Tic-Tac-Toe Example
Evaluation function = (# 3-lengths open for me) − (# 3-lengths open for opponent)

Minimax Algorithm
function Max-Value(s)
  inputs: s: current state in game, Max about to play
  output: best score (for Max) available from s
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  else
    v = −∞   // v is current best minimax value at s
    foreach s' in Successors(s)
      v = max( v, Min-Value(s') )
    return v

function Min-Value(s)
  output: best score (for Min) available from s
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  else
    v = +∞   // v is current best minimax value at s
    foreach s' in Successors(s)
      v = min( v, Max-Value(s') )
    return v

Summary So Far
• We can't use minimax search to the end of the game
– if we could, then choosing the optimal move would be easy
• The SBE isn't perfect at estimating/scoring
– if it were, we could just choose the best move without searching
• Since neither is feasible for interesting games, combine the minimax and SBE concepts:
– use minimax to cut off search at depth m
– use the SBE to estimate/score board configurations

Minimax Example
[example tree with MAX root A, MIN nodes B–E, MAX nodes F–M, and terminal states N–X; this tree is reused in the alpha-beta walkthrough that follows]

Alpha-Beta Idea
• Some of the branches of the game tree won't be taken if playing against an intelligent opponent
• "If you have an idea that is surely bad, don't take the time to see how truly awful it is." -- Pat Winston
• Pruning can be used to ignore some branches
• While doing DFS of the game tree, keep track of:
– At maximizing levels, keep track of:
• the highest SBE value, v, seen so far in the subtree below each node
• this is a lower bound on the node's final minimax value
– At minimizing levels, keep track of:
• the lowest SBE value, v, seen so far in the subtree below each node
• this is an upper bound on the node's final minimax value

Alpha-Beta Idea: Alpha Cutoff
[example: MAX node S with children A and B (MIN nodes); A's children are C = 200, D = 100, E = 120, so A = 100; B's first child is F = 20; depth-first traversal order]
• After returning from A, the computer can get at least 100 at S, so α = 100
• After returning from F, the computer can get at most v = 20 at B
• At this point, no matter what minimax value is computed at G, S will prefer A over B; so S loses interest in B
• There is no need to visit G: since α ≥ v, the subtree at G is pruned, which saves time. This is called an "Alpha cutoff" (at MIN node B)

Alpha Cutoff
• At each MIN node, keep track of the minimum value returned so far from its visited children
• Store this value as v
• Each time v is updated (at a MIN node), check its value against the α values of all its MAX node ancestors
• If α ≥ v for some MAX node ancestor, don't visit any more of the current MIN node's children; i.e., prune (cut off) all subtrees rooted at the remaining children of that MIN node

Beta Cutoff
[example: MIN node A with children B and C (MAX nodes); B = 20 from its children D = 20, E = −10, F = −20, so β = 20; C's first child G returns 25, so v = 25 ≥ β]
• After returning from B, the opponent can get at most 20 at MIN node A
• After returning from G, the computer can get at least 25 at MAX node C
• No matter what minimax value is found at H, A will never choose C over B, so don't visit node H
• This is called a "Beta cutoff" (at MAX node C)
• At each MAX node, keep track of the maximum value returned so far from its visited children
• Store this value as v
• Each time v is updated (at a MAX node), check its value against the β values of all its MIN node ancestors
• If v ≥ β for some MIN node ancestor, don't visit any more of the current MAX node's children; i.e., prune (cut off) all subtrees rooted at the remaining children of that MAX node

Implementation of Cutoffs
At each node, keep both α and β values and pass them down the tree during traversal, where α = the largest (i.e., best) value from the node's MAX node ancestors in the search tree, and β = the smallest (i.e., best) value from its MIN node ancestors. In other words, at each node keep two bounds (based on all ancestors on the path back to the root) plus a running value:
§ α: the best (largest) MAX can do at any ancestor
§ β: the best (smallest) MIN can do at any ancestor
§ v: the best value returned by the current node's visited children
– At a MAX node, v = the largest value from its children visited so far; cut off if v ≥ β
• the v value at MAX comes from its descendants
• the β value at MAX comes from its MIN node ancestors
– At a MIN node, v = the smallest value from its children visited so far; cut off if α ≥ v
• the α value at MIN comes from its MAX node ancestors
• the v value at MIN comes from its descendants visited so far

Implementation of Alpha Cutoff
[worked example: initialize the root's values to α = −∞, β = +∞ and pass them down. On the example tree, MIN node A returns min(200, 100, 120) = 100, so α = 100 at S; inside B, after F returns 20 we have v = 20 and α = 100 ≥ v, so the remaining child G is pruned (not visited)]
Notes:
• An alpha cutoff means not visiting some of a MIN node's children
• v values at MIN come from its descendants
• α values at MIN come from its MAX node ancestors

Alpha-Beta Algorithm
Starting from the root: Max-Value(root, −∞, +∞)

function Max-Value(s, α, β)
  inputs: s: current state in game, Max about to play
          α: best score (highest) for Max along path from s to root
          β: best score (lowest) for Min along path from s to root
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  v = −∞   // v = best minimax value found so far at s
  foreach s' in Successors(s)
    v = max( v, Min-Value(s', α, β) )
    if ( v ≥ β ) then return v   // prune remaining children
    α = max( α, v )
  return v   // return value of best child

function Min-Value(s, α, β)
  if ( s is a terminal state or at depth limit )
    then return ( SBE value of s )
  v = +∞   // v = best minimax value found so far at s
  foreach s' in Successors(s)
    v = min( v, Max-Value(s', α, β) )
    if ( α ≥ v ) then return v   // prune remaining children
    β = min( β, v )
  return v   // return value of best child

Alpha-Beta Example
[step-by-step trace on the minimax example tree, showing the call stack and the α, β, v updates at each node. First key step: below MIN node O, W returns −3, so v(O) = −3 and α(O) = 4 ≥ v(O): stop expanding O (alpha cutoff). Why? A smart opponent will choose W or worse, so O's upper bound is −3; at F the computer shouldn't choose O (−3), since N (4) is better.]
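The Max-Value/Min-Value pseudocode translates directly into runnable code. A minimal sketch, where the game tree is a nested list and a bare number stands for the SBE value at a terminal state or the depth limit (this tree representation is an assumption):

```python
import math

def max_value(s, alpha, beta):
    if isinstance(s, (int, float)):          # terminal state or depth limit
        return s                             # return the (SBE) value of s
    v = -math.inf                            # best minimax value found so far at s
    for child in s:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                        # prune remaining children (beta cutoff)
            return v
        alpha = max(alpha, v)
    return v

def min_value(s, alpha, beta):
    if isinstance(s, (int, float)):
        return s
    v = math.inf                             # best minimax value found so far at s
    for child in s:
        v = min(v, max_value(child, alpha, beta))
        if alpha >= v:                       # prune remaining children (alpha cutoff)
            return v
        beta = min(beta, v)
    return v

# The alpha-cutoff situation from the slides: at MAX root S, child A is worth 100;
# inside MIN node B, F returns 20 <= alpha, so sibling G (999) is never examined.
print(max_value([100, [20, 999]], -math.inf, math.inf))  # -> 100
```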
[The trace continues:
• α(F) is not changed by O's return value (F is maximizing), and F returns 4
• v(B) = β(B) = 4, the minimum seen so far; after G returns −5, v(B) = β(B) = −5
• v(A) = α(A) = −5, the maximum seen so far; A's α and β values are copied down to C
• after H, v(C) = β(C) = 3; v(C) = 3 > α(C) = −5, so there is no cutoff
• below J, v(J) = 9 ≥ β(J) = 3: stop expanding J (beta cutoff). Why? The computer could choose P or better at J, so J's lower bound is 9; but the smart opponent at C won't take J (9), since H (3) is better for the opponent
• v(C) and β(C) are not changed (C is minimizing), so C returns 3, and v(A) = α(A) = 3, updated to the maximum seen so far
• α(A) and v(A) are not updated after returning from D (because A is a maximizing node)
How does the algorithm finish the search tree? One more cutoff happens below E: a smart opponent will choose L or worse, so E's upper bound is 2, and the computer at A shouldn't choose E (2), since C (3) is a better move.]
[After visiting K, S, T, and L and returning to E, α(E) = 3 ≥ v(E) = 2, so stop expanding E and don't visit M (alpha cutoff).]

Final result: the computer chooses move C.

Another step-by-step example (from the AI course at UC Berkeley) is given at https://www.youtube.com/watch?v=xBXHtz4Gbdo

Effectiveness of Alpha-Beta Search
• Effectiveness depends on the order in which successors are examined; the search is more effective (i.e., makes more cutoffs) if the best successors are examined first
• Worst case:
– successors are ordered so that no pruning takes place
– no improvement over exhaustive search
• Best case:
– each player's best move is visited first
• In practice, one often gets O(b^(d/2)) rather than O(b^d)
– the same as having a branching factor of √b, since (√b)^d = b^(d/2)
• Example: Chess
– Deep Blue went from b ≈ 35 to b ≈ 6, visiting one billionth the number of nodes visited by the plain minimax algorithm
– this permits much deeper search for the same time
– and makes computer chess competitive with humans
• In practice, performance is closer to the best case than to the worst case

Dealing with Limited Time
• In real games, there is usually a time limit, T, on making a move
• How do we take this into account?
– we cannot stop the alpha-beta algorithm midway and expect to use its results with any confidence
– so we could set a conservative depth limit that guarantees we will find a move in time < T
– but then the search may finish early, and the opportunity to do more search is wasted
• In practice, use iterative deepening search (IDS):
– run alpha-beta search with depth-first search and an increasing depth limit
– when the clock runs out, use the solution found by the last completed alpha-beta search (i.e., the deepest search that was completed)
– this makes it an "anytime algorithm"

The Horizon Effect
• Sometimes disaster lurks just beyond the search depth
– e.g., the computer captures a queen, but a few moves later the opponent checkmates (i.e., wins)
• The computer has a limited horizon: it cannot see that this significant event could happen
• How do you avoid catastrophic losses due to such "short-sightedness"?
– quiescence search
– secondary search
• Quiescence search:
– when the SBE value is changing frequently, look deeper than the depth limit
– look for the point when the game "quiets down"
– e.g., always expand any forced sequences
• Secondary search:
1. find the best move looking to depth d
2. look k steps beyond to verify that it still looks good
3. if it doesn't, repeat step 2 for the next best move

Book Moves
• Build a database of opening moves, end games, and studied configurations
• If the current state is in the database, use the database:
– to determine the next move
– to evaluate the board
• Otherwise, do alpha-beta search

More on Evaluation Functions
The board evaluation function estimates how good the current board configuration is for the computer
– it is a heuristic function of the board's features, i.e., function(f1, f2, f3, …, fn)
– the features are numeric characteristics:
• feature 1, f1, is the number of white pieces
• feature 2, f2, is the number of black pieces
• feature 3, f3, is f1/f2
• feature 4, f4, is an estimate of the "threat" to the white king
• etc.
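Features like these are typically combined in a weighted sum; a minimal sketch, where the feature values and weights are illustrative placeholders:

```python
def linear_sbe(features, weights):
    """Linear evaluation: w1*f1 + w2*f2 + ... + wn*fn."""
    assert len(features) == len(weights)
    return sum(w * f for w, f in zip(weights, features))

# e.g., f1 = number of white pieces, f2 = number of black pieces, f3 = f1/f2;
# more important features get larger weights.
f1, f2 = 12, 10
features = [f1, f2, f1 / f2]
weights = [1.0, -1.0, 0.5]
print(linear_sbe(features, weights))  # -> 2.6
```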
Linear Evaluation Functions
• A linear evaluation function of the features is a weighted sum of f1, f2, f3, …:
w1*f1 + w2*f2 + w3*f3 + … + wn*fn
– where f1, f2, …, fn are the features
– and w1, w2, …, wn are the weights
• More important features get more weight
• The quality of play depends directly on the quality of the evaluation function
• To build an evaluation function, we have to:
1. construct good features using expert domain knowledge
2. pick or learn good weights

Examples of Algorithms that Learn to Play Well
Checkers:
A. L. Samuel, "Some Studies in Machine Learning using the Game of Checkers," IBM Journal of Research and Development, 11(6):601-617, 1959.
• Learned by playing thousands of times against a copy of itself
• Used an IBM 704 with 10,000 words of RAM, magnetic tape, and a clock speed of 1 kHz
• Successful enough to compete well at human tournaments

Backgammon:
G. Tesauro and T. J. Sejnowski, "A Parallel Network that Learns to Play Backgammon," Artificial Intelligence, 39(3):357-390, 1989.
• Also learned by playing against itself
• Used a non-linear evaluation function: a neural network
• Rated one of the top three players in the world

Non-Deterministic Games
[backgammon board diagram omitted]
• Some games involve chance, for example:
– roll of dice
– spin of a game wheel
– deal of cards from a shuffled deck
• How can we handle games with random elements?
• The game tree representation is extended to include "chance nodes":
1. computer moves
2. chance nodes (representing random events)
3. opponent moves
• Weight each score by the probability that the move occurs
• Use the expected value for a move: instead of using max or min, compute the average, weighted by the probabilities of each child
[extended game tree example: below MAX root A, 50/50 chance nodes (probability .5 per branch) sit between the players' moves; the two moves have expected values 4 and −2, so the computer chooses the move with the highest expected value, 4]

Expectiminimax
Expectiminimax(n) =
  SBE(n)                                     if n is a terminal state or a state at the cutoff depth
  max_{s ∈ Succ(n)} Expectiminimax(s)        if n is a Max node
  min_{s ∈ Succ(n)} Expectiminimax(s)        if n is a Min node
  Σ_{s ∈ Succ(n)} P(s) * Expectiminimax(s)   if n is a Chance node

Non-Deterministic Games (cont.)
• Non-determinism increases the branching factor
– e.g., 21 possible distinct rolls with 2 dice (since 6-5 is the same as 5-6)
• The value of lookahead diminishes: as depth increases, the probability of reaching a given node decreases
• Alpha-beta pruning is less effective

History of Search Innovations
• Shannon, Turing: minimax search, 1950
• Kotok/McCarthy: alpha-beta pruning, 1966
• MacHack: transposition tables, 1967
• Chess 3.0+: iterative deepening, 1975
• Belle: special hardware, 1978
• Cray Blitz: parallel search, 1983
• Hitech: parallel evaluation, 1985
• Deep Blue: all of the above, 1997

Computers Play Grand Master Chess: "Deep Blue" (IBM)
• Parallel processor, 32 "nodes"
• Each node had 8 dedicated VLSI "chess chips"
• Searched 200 million configurations/second
• Used minimax, alpha-beta, and sophisticated heuristics
• Average branching factor ~6 instead of ~40
• In 2001, searched to 14 ply (i.e., 7 pairs of moves)
• Avoided the horizon effect by searching as deep as 40 ply
• Used book moves

Kasparov vs. Deep Blue, May 1997
• 6-game full-regulation chess match sponsored by ACM
• Kasparov lost the match, 2 wins to 3 wins and 1 tie
• A historic achievement for computer chess: the first time a computer became the best chess player on the planet
• Deep Blue played by "brute force" (i.e., raw power
from computer speed and memory); it used relatively little that is similar to human intuition and cleverness

See also the documentary "Game Over: Kasparov and the Machine" (2003).

Status of Computers in Other Deterministic Games
• Checkers
– First computer world champion: Chinook
– Beat all humans (beat Marion Tinsley in 1994)
– Used alpha-beta search and book moves
• Othello
– Computers easily beat world experts
• Go
– Branching factor b ≈ 360: very large!
– A computer beat Lee Sedol, one of the top players in the world, in 2016, 4 games to 1

Game Playing: Go
Google's AlphaGo beat Korean grandmaster Lee Sedol 4 games to 1 in 2016.

Quiz: Name These Robots

How to Improve Performance?
• Reduce the depth of search
– Better SBEs
• Use deep learning to learn good features rather than "manually-defined" features
• Reduce the breadth of search
– Explore a subset of the possible moves instead of exploring all of them
• Use randomized exploration of the search space

Monte Carlo Tree Search (MCTS)
• Concentrates search on the most promising moves
• Best-first search based on random sampling of the search space
• Monte Carlo methods are a broad class of algorithms that rely on repeated random sampling to obtain numerical results. They can be used to solve problems having a probabilistic interpretation.

Pure Monte Carlo Tree Search
• For each possible legal move of the current player, simulate k random games by selecting moves at random for both players until the game is over (these simulations are called playouts); count how many of the k playouts were wins; the move with the most wins is selected
• A stochastic simulation of the game
• The game must have a finite number of possible moves, and the game length must be finite

Exploitation vs. Exploration
• Rather than selecting a child at random, how should the best child node be selected during the tree descent?

MCTS Algorithm
Recursively build the search tree, where each round consists of:
1. Starting at the root, successively select the best child nodes using a scoring method until a leaf node L is reached
2. Create and add the best new child node, C, of L
3. Perform a random playout from C
4. Update the score at C and at all of C's ancestors in the search tree
– Exploitation: keep track of the average win rate for each child from previous searches; prefer the child that has previously led to more wins
– Exploration: allow for exploration of relatively unvisited children (moves) too
• Combine these factors to compute a "score" for each child; pick the child with the highest score at each successive node in the descent
[diagram: selection down to leaf L, expansion of child C, playout from C, and score updates propagated back up; key: number of games won / number of playouts played]

State-of-the-Art Go Programs
• Google's AlphaGo
• Facebook's Darkforest
• MCTS implemented using multiple threads and GPUs, and up to 110K playouts
• Also use a deep neural network to compute the SBE

Summary
• Game playing is modeled as a search problem
• Search trees for games represent both computer and opponent moves
• Evaluation functions estimate the quality of a given board configuration for each player (− good for opponent, 0 neutral, + good for computer)
• The minimax algorithm determines the "optimal" moves by assuming that both players always choose their best move
• The alpha-beta algorithm can avoid large parts of the search tree, thus enabling the search to go deeper
• For many well-known games, computer algorithms using heuristic search can match or out-perform human experts
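The "score" that combines exploitation and exploration in the MCTS algorithm is left unspecified in the slides; a common choice is the UCB1 rule, sketched here under that assumption:

```python
import math

def ucb1(wins, visits, parent_visits, c=math.sqrt(2)):
    """Average win rate (exploitation) plus a bonus that shrinks
    as the child is visited more often (exploration)."""
    if visits == 0:
        return math.inf                  # always try unvisited children first
    return wins / visits + c * math.sqrt(math.log(parent_visits) / visits)

def best_child(children, parent_visits):
    """children: list of (games won, playouts played) statistics per child."""
    return max(range(len(children)),
               key=lambda i: ucb1(*children[i], parent_visits))

# Child 0 has the best win rate, but child 2 is unvisited, so it is explored first:
print(best_child([(7, 10), (3, 8), (0, 0)], parent_visits=18))  # -> 2
```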