Chapter 5 Adversaria..

Adversarial Search 1
(Game Playing)
Outline
• Motivation
• Optimal decisions
• Minimax algorithm
• α-β pruning
2
Motivation
• Games are a form of multi-agent environment
– Any given agent must consider the actions of
other agents and how they affect its own success
– The unpredictability of other agents can introduce
many possible contingencies into the agent’s problem
solving process
– Cooperative vs. competitive multi-agent environments
3
Background on Multi-agent Environments
Environments:
Single agent vs. multiagent
• The distinction is drawn as follows
• Let A be an agent (say, a taxi driver)
• How should A treat another object B?
– As an agent?
– Or as a stochastically behaving object?
• The key distinction is whether B’s behavior is
best described as maximizing a performance
measure whose value depends on agent A’s
behavior
• Examples?
5
Environments:
Single agent vs. multiagent
• Example1:
• Chess
– The opponent B is trying to maximize its
performance measure
– Which minimizes agent A’s performance measure
• Chess is a competitive multiagent environment
Environments:
Single agent vs. multiagent
• Example2:
• Taxi driving is a partially cooperative multiagent
environment
• Avoiding collisions maximizes the performance measure of
all agents
• It is also partially competitive, as only one car can
occupy a parking space
7
Game theory
• Mathematical game theory, a branch of
economics, views any multiagent
environment as a game provided that the
– Impact of each agent on the others is “significant”
– Regardless of whether the agents are competitive or cooperative
– (Game theory in Ch 17)
– Note: Environments with a large number of agents are often viewed as
economies rather than games.
8
Games in AI
• In AI most common games are
• Turn taking
• Two player
• Zero-sum: one player’s loss is another’s gain
• Perfect Information: each player knows the entire
game state (No information is hidden from either
player)
• Deterministic: no element of chance
• What does this mean?
• Deterministic, fully observable environments in which
there are two agents
• Whose actions alternate and in which
• The utility values at the end of the game are
always equal and opposite
9
What is adversarial?
• The opposition between the agents’ utility
functions is what makes the situation adversarial
– Ex: if one player wins a game of chess (+1)
– The other player necessarily loses (-1)
10
Motivation Contd …
• Games are a form of multi-agent environment
– What do other agents do and how do they affect our
success?
– Competitive multi-agent environments in which the
agents’ goals are in conflict give rise to adversarial
search problems known as games
• Why study games?
– Games are fun!
– Historical role in AI
– Studying games teaches us how to deal with other
agents trying to foil our plans
11
Motivation Contd …
• Huge state spaces
• The state of a game is easy to represent
• Agents are restricted to a small number of actions
• Outcomes are defined by precise rules
• Clear set of legal moves
• Well-defined outcomes (e.g. win, lose, draw)
• Nice, clean environment with clear criteria for
success
12
Motivation Contd …
• Physical games
• Physical games like tennis, croquet, ice hockey,
etc.
• have complicated descriptions
• Larger range of possible actions
• Imprecise rules to define legal actions
• With the exception of robot soccer, physical games
have not attracted much interest in the AI community (more
on http://www.robocup.org)
13
Motivation Contd …
• Game playing was one of the first tasks undertaken in AI
• Chess
• Checkers
• Othello
• Backgammon
• Games, unlike toy problems, are interesting
• because they are too hard to solve
• Ex: chess has an average branching factor of about 35
• If there are 50 moves by each player (100 plies), then
• the search tree has about 35^100 (roughly 10^154) nodes
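The size estimate above can be checked with a few lines of arithmetic. This is only a sketch: the branching factor of 35 and the 50 moves per player come from the slide, and the variable names are illustrative.

```python
import math

branching_factor = 35   # average number of legal moves per position (from the slide)
plies = 2 * 50          # 50 moves by each player = 100 plies

# log10(35^100) = 100 * log10(35), so this is the power of ten of the tree size
exponent = plies * math.log10(branching_factor)

# Python integers are unbounded, so the exact value can also be inspected
digits = len(str(branching_factor ** plies))

print(f"35^{plies} is about 10^{exponent:.0f} ({digits} decimal digits)")
```

Even at a billion nodes per second, a tree of this size is far beyond exhaustive search, which is why the later slides cut the search off early.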
14
Motivation Contd …
• Too hard to solve
• Games, like the real world, require the ability to make
some decision
• even when calculating the optimal decision is
infeasible
• Games penalize inefficiency severely
• An inefficient game-playing implementation costs more
and takes more time
• So game-playing research has produced ideas on
how to make the best possible use of time
15
Drawing game trees
• Even for simple tic-tac-toe the entire game tree is too
large to draw, as shown next
16
Game tree (2-player, deterministic, turns)
17
More complicated games
• Most card games (e.g. Hearts, Bridge, etc.) and
Scrabble
– non-deterministic
– lacking in perfect information
• Cooperative games
• Real-time strategy games (lack alternating
moves). e.g. Warcraft
18
Types of Games
Note: No chance (e.g., using dice) involved
19
Types of Games
20
Optimal decisions in games
Game setup – Two player
• Two players: A and B known as MAX and MIN
• MAX represents the player trying to win, i.e., to MAXimize
performance. MIN is the opponent who attempts to
MINimize MAX’s score
• High values are assumed to be good for MAX and bad for
MIN
• Assume that MIN uses the same information and always
attempts to move to a state that is worst for MAX
• MAX moves first and they take turns until the game is
over
• The winner gets a reward, the loser gets a penalty
Note: We consider zero sum games in this chapter
22
Two-Player Games
• A game formulated as a search problem:
– Initial state: ?
– Actions: ?
– Terminal state: ?
– Utility function: ?
– Transition model: ?
23
Game setup – Two player
• The initial state S0, which specifies how the game is set up at
the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of
a move.
• TERMINAL-TEST(s): A terminal test, which is true when the
game is over and false otherwise. States where the game has
ended are called terminal states.
• UTILITY(s, p): A utility function (also called an objective
function or payoff function), which defines the final numeric value
for a game that ends in terminal state s for a player p.
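As an illustration, the six elements of the formal definition can be written out for tic-tac-toe. This is a minimal sketch, not a reference implementation: the board encoding (a 9-tuple in row-major order), the function names and the `winner` helper are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# A state is a tuple of 9 cells in row-major order: 'X', 'O' or None (assumed encoding)
State = Tuple[Optional[str], ...]

# The 8 winning lines: 3 rows, 3 columns, 2 diagonals
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

INITIAL_STATE: State = (None,) * 9          # S0: the empty board

def player(s: State) -> str:
    """PLAYER(s): X (MAX) moves first, so X is to move when the counts are equal."""
    return 'X' if s.count('X') == s.count('O') else 'O'

def actions(s: State) -> List[int]:
    """ACTIONS(s): indices of the vacant squares."""
    return [i for i, cell in enumerate(s) if cell is None]

def result(s: State, a: int) -> State:
    """RESULT(s, a): the board after the player to move marks square a."""
    return s[:a] + (player(s),) + s[a + 1:]

def winner(s: State) -> Optional[str]:
    """Hypothetical helper: the mark holding three in a row, if any."""
    for i, j, k in LINES:
        if s[i] is not None and s[i] == s[j] == s[k]:
            return s[i]
    return None

def terminal_test(s: State) -> bool:
    """TERMINAL-TEST(s): someone has won or the board is full."""
    return winner(s) is not None or all(cell is not None for cell in s)

def utility(s: State, p: str) -> int:
    """UTILITY(s, p): +1 for a win, -1 for a loss, 0 for a draw, from p's viewpoint."""
    w = winner(s)
    if w is None:
        return 0
    return 1 if w == p else -1
```

For example, `result(INITIAL_STATE, 4)` places an X in the centre, after which `player` reports that O is to move and `actions` returns the 8 remaining squares.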
24
Game setup – Two player
• A game can be defined as a search problem:
– Initial state: the board position, together with the player to
move
– e.g. the board configuration in chess
– Terminal test: Is the game finished? States at which
the game has ended are called terminal states
– Utility function: Gives a numerical value for terminal
states. E.g. win (+1), lose (-1) and draw (0) in tic-tac-toe and chess. Backgammon has +192 to -192
25
Game Trees
• Represent the problem space for a game by a game tree
• Nodes represent ‘board positions’; edges represent legal
moves.
• Root node is the position in which a decision must be
made.
• Evaluation function f assigns real-number scores to `board
positions.’
• Terminal nodes represent ways the game could end,
labeled with the desirability of that ending (e.g.
win/lose/draw or a numerical score)
26
Game Trees VS. Search Trees
• Game trees:
– For tic-tac-toe the game tree is relatively small: fewer than 9! = 362,880
terminal nodes.
– But for chess there are over 10^40 nodes
• So the game tree is best thought of as a theoretical construct
that we cannot realize in the physical world.
• Search tree:
– But regardless of the size of the game tree, it is MAX's job to search for
a good move.
– We use the term search tree for a tree that is superimposed on the full
game tree, and examines enough nodes to allow a player to determine
what move to make
27
Game Tree for Tic Tac Toe
• MAX has 9 possible first moves and places ‘x’
• MIN places ‘o’. They play alternately until a terminal
state is reached: a state where one player has three in a row or all the
squares are filled.
• It is MAX’s job to use the search tree to determine the best
move.
• Terminal states are assigned with utility value according
to the rules of the game
28
Optimal strategies
• In a normal search problem what is the optimal solution?
– A sequence of steps leading to a goal state
• What about games?
– MIN has some decisions too
– So MAX must find a contingent strategy, which
– Specifies MAX’s move in the initial state
– Then MAX’s moves in the states resulting from every possible
response by MIN
– Then MAX’s moves in the states resulting from every possible
response by MIN to those moves, and so on
• We will see how to find optimal strategy (minimax procedure)
30
Utility function
• How many functions for two players MAX and MIN?
• The zero-sum assumption allows us to use a single evaluation
function to describe the goodness of a board with respect
to both players.
• One of the players just has to negate the value returned by the
function.
• Positive numbers indicate favor to MAX player
• Negative numbers indicate favor to MIN player
f(n) > 0: position n good for MAX and bad for MIN.
f(n) < 0: position n bad for MAX and good for MIN
f(n) near 0: position n is a neutral position.
f(n) >> 0: win for MAX
f(n) << 0: win for MIN
31
An example (partial) game tree for Tic-Tac-Toe
• f(n) = +1 if the position is a
win for X.
• f(n) = -1 if the position is a
win for O.
• f(n) = 0 if the position is a
draw.
32
Generate game tree
[Figure sequence: the tic-tac-toe game tree generated level by level, with x and o placed alternately; levels are labeled "1 ply" and "1 move".]
Drawing game trees
• So we adopt the game tree representation shown next
• Game trees are searched by level, or ply
• Each move by a player defines a new ply of the game tree
• Each level in the game tree is labeled according to the player
who moves at that point in the game, MIN or MAX
• MAX node
– nodes at even-numbered depths correspond to positions
in which it is MAX’s move next
• MIN node
– nodes at odd-numbered depths correspond to positions in
which it is MIN’s move next
37
Applying Minimax to Tic Tac Toe
38
Tic-Tac-Toe
[Figure: a board with one X and one O, evaluated as f(n) = 6 - 5 = 1]
• Initial state: the board position, a 3x3 matrix of O’s and X’s.
• Actions (operators): placing O’s or X’s in vacant positions
alternately
• Terminal test: determines whether the game is over
• Utility function:
f(n) = (no. of complete rows, columns or diagonals that are
still open for the player) - (no. of complete rows, columns or
diagonals that are still open for the opponent)
Example : Tic-Tac-Toe
• MAX marks crosses and MIN marks circles and it is
MAX’s turn to play first.
– With a depth bound of 2, conduct a breadth-first search
– evaluation function f(n) of a position n
• If n is not a win for either player,
f(n) = (no. of complete rows, columns, or diagonals that are still
open for MAX) - (no. of complete rows, columns, or diagonals that
are still open for MIN)
• If n is a win for MAX,
f(n) = +∞
• If n is a win for MIN,
f(n) = -∞
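The evaluation function described above can be sketched in a few lines. The board encoding (a 9-character string, row-major, with '.' for an empty square) and the helper names `open_lines` and `has_won` are assumptions made for this example, not part of the slides.

```python
import math

# The 8 lines of the board: 3 rows, 3 columns, 2 diagonals
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def open_lines(board: str, mark: str) -> int:
    """Count the lines still open for `mark`, i.e. containing no opponent mark."""
    opponent = 'O' if mark == 'X' else 'X'
    return sum(1 for line in LINES if all(board[i] != opponent for i in line))

def has_won(board: str, mark: str) -> bool:
    """True if `mark` occupies all three squares of some line."""
    return any(all(board[i] == mark for i in line) for line in LINES)

def evaluate(board: str) -> float:
    """f(n) from MAX's viewpoint, where MAX plays X and MIN plays O."""
    if has_won(board, 'X'):
        return math.inf
    if has_won(board, 'O'):
        return -math.inf
    return open_lines(board, 'X') - open_lines(board, 'O')

# X in a corner, O on the adjacent edge: 6 lines open for X, 5 for O
print(evaluate("XO......."))   # 1
```

The printed value matches the f(n) = 6 - 5 = 1 computation: O's edge square blocks 2 of X's 8 lines, while X's corner square blocks 3 of O's.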
40
Example : Tic-Tac-Toe (2)
• First move
41
Example : Tic-Tac-Toe (3)
42
Example : Tic-Tac-Toe (4)
43
Problems
44
Compute Two-ply minimax for tic-tac-toe at the following state
Building Minimax Procedure
47
A 2-ply Game tree - Hypothetical
• The possible moves for MAX at the root node are labeled A1,
A2, and A3.
• The possible replies to A1 for MIN are A11, A12, A13
• Assume game ends after one move each by MAX and MIN.
• In game parlance, we say that this tree is one move deep,
consisting of two half-moves, each of which is called a ply.
• Assume the utilities of the terminal states in this game range
from 2 to 14.
• Given a game tree, the optimal strategy can be determined
from the minimax value of each node, MINIMAX(s).
48
A 2-ply Game tree - Hypothetical
MAX (root, 1st ply): moves A1, A2, A3
MIN (2nd ply): replies and terminal utilities
A1: A11 = 3, A12 = 12, A13 = 8
A2: A21 = 2, A22 = 4, A23 = 6
A3: A31 = 14, A32 = 5, A33 = 2
• Note: An action by one player is called a ply; two plies (an action and
a counter action) are called a move.
• MAX nodes are drawn as upward-pointing triangles
and MIN nodes as inverted triangles.
A 2-ply Game tree
• What is MAX’s best move at the root?
• What is MIN’s best reply?
• Compute the minimax value
• Label the nodes with their minimax values
• Apply the minimax definition
50
Definition MINIMAX (s):
• Given a game tree, the optimal strategy can be determined by
using the minimax value of each node which is denoted as
MINIMAX (s):
– If the parent state is a MAX node, give it the maximum value among its
children
– If the parent state is a MIN node, give it the minimum value among its
children
– The minimax value of a terminal state is just its utility.
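\[
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\text{-}TEST}(s) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s, a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
\]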
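The definition translates almost directly into a recursive function. The sketch below applies it to the hypothetical 2-ply tree, with the leaf utilities as read from that figure (A11 = 3, A12 = 12, A13 = 8, A21 = 2, A22 = 4, A23 = 6, A31 = 14, A32 = 5, A33 = 2). Representing the tree as nested dictionaries is an assumption made for the example.

```python
# Interior nodes are dicts mapping a move label to a child; leaves are utilities.
TREE = {
    "A1": {"A11": 3, "A12": 12, "A13": 8},
    "A2": {"A21": 2, "A22": 4, "A23": 6},
    "A3": {"A31": 14, "A32": 5, "A33": 2},
}

def minimax(node, is_max: bool):
    """Return the minimax value of `node`; `is_max` says whose turn it is."""
    if not isinstance(node, dict):            # terminal state: its utility
        return node
    values = [minimax(child, not is_max) for child in node.values()]
    return max(values) if is_max else min(values)

# Backed-up values of the three MIN nodes, then of the MAX root
min_values = {move: minimax(child, False) for move, child in TREE.items()}
root_value = minimax(TREE, True)
best_move = max(min_values, key=min_values.get)

print(min_values)   # {'A1': 3, 'A2': 2, 'A3': 2}
print(root_value)   # 3
print(best_move)    # A1
```

The MIN nodes back up 3, 2 and 2, so the root value is 3 and MAX's best move is A1, with A11 as MIN's best reply.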
51
A 2-ply Game tree
• Apply minimax definition to the hypothetical
game tree
• What is the MAX’s best move at the root?
• MAX’s best move at the root is A1
• As it leads to the successor with the highest
minimax value
• What is the MIN’s best reply ?
• A11 because it leads to the successor with the
lowest minimax value
52
Minimax Rule
• Goal of game tree search: to determine the one move for MAX
that maximizes MAX’s guaranteed payoff in a given
game tree,
regardless of the moves MIN will take
• The value of each node (MAX and MIN) is determined by
(backed up from) the values of its children
• MAX plans for the worst-case scenario:
always assume MIN takes the moves that maximize his own payoff (i.e., minimize the payoff of MAX)
• For a MAX node, the backed up value is the maximum of
the values associated with its children
• For a MIN node, the backed up value is the minimum of
the values associated with its children
54
Minimax Tree
[Figure: minimax tree showing MAX nodes, MIN nodes and f values; A1 is selected as the next move.]
55
Minimax procedure
• Create start node as a MAX node with current board
configuration
• Expand nodes down to some depth (i.e., ply) of lookahead
in the game.
• Apply the evaluation function at each of the leaf nodes
• Obtain the “back up" values for each of the non-leaf
nodes from its children by Minimax rule until a value is
computed for the root node.
• Pick the operator associated with the child node whose
backed up value determined the value at the root as the
move for MAX
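The five steps of the procedure can be sketched as a depth-limited search. Everything specific here is hypothetical: the function name `minimax_decision`, the callback signatures, and the toy tree with its static values exist only to make the example runnable.

```python
def minimax_decision(state, depth, successors, evaluate):
    """Return (move, backed_up_value) for MAX at `state`, looking `depth` plies ahead.

    successors(s) -> list of (move, child_state); evaluate(s) -> static score.
    Assumes depth >= 1 and that `state` has at least one successor.
    """
    def backed_up(s, d, is_max):
        children = successors(s)
        if d == 0 or not children:            # lookahead limit or end of game
            return evaluate(s)
        values = [backed_up(child, d - 1, not is_max) for _, child in children]
        return max(values) if is_max else min(values)

    # The root is a MAX node, so each child is evaluated as a MIN node.
    scored = [(backed_up(child, depth - 1, False), move)
              for move, child in successors(state)]
    best_value, best_move = max(scored, key=lambda pair: pair[0])
    return best_move, best_value

# Hypothetical toy game: a fixed tree with a static score for every node.
TREE = {"root": ["L", "R"], "L": ["LL", "LR"], "R": ["RL", "RR"]}
STATIC = {"L": 4, "R": 5, "LL": 2, "LR": 7, "RL": 1, "RR": 8}

def successors(s):
    return [(child, child) for child in TREE.get(s, [])]

def evaluate(s):
    return STATIC[s]

shallow = minimax_decision("root", 1, successors, evaluate)
deep = minimax_decision("root", 2, successors, evaluate)
print(shallow)   # ('R', 5)
print(deep)      # ('L', 2)
```

With one ply of lookahead the static scores favour R, but two plies reveal that MIN can hold R to 1, so the backed-up values favour L. This shows how deeper lookahead can change the chosen move.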
56
Applying Minimax Definition
The minimax decision
57
Minimax Search
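[Figure: minimax search on a two-level tree. The static evaluator values at the leaves are 2, 7, 1 and 8; the MIN nodes back up 2 and 1; the MAX root backs up 2, so the move to the MIN node with value 2 is the one selected by minimax.]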
Minimax algorithm
• Algorithm:
1. Generate game tree completely
2. Determine utility of each terminal state
3. Propagate the utility values upward in the tree by applying the MIN and
MAX operators to the nodes at each level
4. At the root node, use the minimax decision to select the move with the
max (of the min) utility value
• Steps 2 and 3 in the algorithm assume that the opponent will
play perfectly.
59
Minimax Algorithm
60
Explanation
• The algorithm calculates minimax decisions
• It returns the action corresponding to the best possible move,
• that is, the move that leads to the outcome with the best
utility, under the assumption that the opponent plays to
minimize utility.
• The functions MAX-VALUE and MIN-VALUE go through the
whole game tree, all the way to the leaves,
• to determine the backed-up value of a state.
• The notation argmax_{a ∈ S} f(a) denotes the element a of set
S that has the maximum value of f(a).
61
Minimax Assumption
• Finds the contingent strategy for MAX assuming an
infallible MIN opponent.
• Minimax Assumption: Both players play optimally !!
• Definition of optimal play for MAX assumes MIN plays
optimally: maximizes worst-case outcome for MAX.
• But if MIN does not play optimally, MAX will do at least as
well, and usually better [proven]
62
MINIMAX Code
function MINIMAX(N)
begin
    if N is a leaf then
        return the estimated score of this leaf
    else
        Let N1, N2, ..., Nm be the successors of N;
        if N is a MIN node then
            return min{MINIMAX(N1), ..., MINIMAX(Nm)}
        else
            return max{MINIMAX(N1), ..., MINIMAX(Nm)}
end MINIMAX;
63
Minimax Properties
• Minimax applies to deterministic, fully observable games
• i.e., perfect-information games played in deterministic
environments
64
Applying minimax to complicated games
• How to apply minimax to complicated games?
– It is not possible to expand the game tree till the leaf
nodes (complete tree is infeasible ex: as in chess)
– Instead, the state space is searched to a predefined
number of levels (determined by available resources of
time and memory)
• This strategy is an n-ply lookahead, where n is the number
of levels explored
– Leaves of this sub graph are not the terminal states of the
game
– So, it is not possible to give them values that reflect a win
or a loss
65
Applying minimax to complicated games
• How to apply minimax to complicated games?
• Each node is given a value according to some heuristic
evaluation function
• The value that is propagated back to the root node is not an
indication of whether or not a win can be achieved
• But is the heuristic value of the best state that can be reached
in n moves from the start node
• Backed up value are based on “looking ahead” in the game
tree
• Look ahead increases the power of a heuristic by allowing it to
apply over a greater area of the search space
• So, minimax consolidates these evaluations into a single value
of an ancestor state
66
Heuristic vs. Brute force
• Zero-sum games: one player’s loss is another player's gain.
• A winning strategy for this type of game is to minimize the
maximum potential gain of the opponent and
• Assume your opponent is following the same strategy.
• Better than brute force lookahead:
– Consider all possible moves to the end
– Pick the move that leads to a win, if possible
– Why not program Computer Chess that way?
67
Heuristics in games
• Heuristics in chess: the difference in the number of pieces
belonging to MAX and MIN
68
Minimax: properties
• Complete: ?
• Optimal: ?
• Time complexity: ?
• Space complexity: ?
69
Minimax: properties
• The minimax algorithm is a depth-first search
• Complete: ? Yes, for a finite state space (finite
tree)
• Optimal: ? Yes (against an optimal opponent)
• Time complexity: ? O(b^m), for branching factor b and maximum depth m
• Space complexity: ? O(bm) if all successors
are generated at once
• O(m) if successors are generated one at a
time
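To get a feel for the O(b^m) bound, one can count the nodes of a uniform tree directly. This sketch assumes every node has exactly b successors down to depth m, which real games only approximate; the function names are illustrative.

```python
def leaves(b: int, m: int) -> int:
    """Number of terminal nodes in a uniform tree: b^m."""
    return b ** m

def nodes_visited(b: int, m: int) -> int:
    """All nodes examined by a full-depth minimax: 1 + b + b^2 + ... + b^m."""
    return sum(b ** depth for depth in range(m + 1))

print(leaves(3, 4), nodes_visited(3, 4))   # 81 121

# Growth with depth for a chess-like branching factor of 35
for m in (2, 4, 6, 8):
    print(m, nodes_visited(35, m))
```

The total is dominated by the last level, which is why the time bound is stated as O(b^m). The depth-first traversal only needs to remember the current path, which gives the O(bm) or O(m) space bounds.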
70
State space search vs. minimax search
• Performance depends on
– Quality of evaluation functions (domain knowledge)
– Depth of the search (computer power and search algorithm)
• Different from ordinary state space search
– Not to search for a complete solution but for one move only
– No cost is associated with each arc
– MAX does not know how MIN is going to counter each of his
moves
• Time complexity is impractical for real games
– But minimax rule is a basis for other game tree search
algorithms
71
Multiplayer Games
• Many popular games allow more than two
players.
• How to extend the minimax idea to
multiplayer games
• This is straightforward from the technical
viewpoint
• but raises some interesting new conceptual
issues.
72
Multiplayer Games
• Many games allow more than two players
• Replace the single value for each node with a
vector of values
• In two-player zero-sum games the two-element
vector is reduced to a single value because
the two values are always opposite
• Let the utility function return a vector of
values
• Ex: for 3 players A, B, C a vector (vA, vB, vC) is
associated with each node
73
Multiplayer Games
• Computing minimax values
• Consider node X where player C chooses what
to do
• There are 2 choices leading to 2 terminal
states (1, 2, 6) and (4, 2, 3)
• C should choose (1, 2, 6) as 6 > 3. So backed
up value of node X is (1, 2, 6)
• In general, backed up value of a node n is the
utility vector of that successor which has the
highest value for the player choosing at n
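The vector-valued backup rule can be sketched as follows. The representation (tuples for terminal utility vectors, lists for interior nodes) and the assumption that players move in a fixed rotation are choices made for this example only.

```python
from typing import List, Tuple, Union

Vector = Tuple[int, ...]
Node = Union[Vector, List["Node"]]   # a leaf utility vector, or a list of children

A, B, C = 0, 1, 2                    # index of each player in the utility vector

def backed_up(node: Node, player: int, num_players: int = 3) -> Vector:
    """Return the utility vector backed up to `node`, where `player` moves."""
    if isinstance(node, tuple):      # terminal state: its utility vector
        return node
    next_player = (player + 1) % num_players   # assumed fixed turn order
    child_vectors = [backed_up(child, next_player, num_players) for child in node]
    # The player choosing here picks the successor that is best for itself.
    return max(child_vectors, key=lambda v: v[player])

# Node X from the slide: player C chooses between two terminal states
X = [(1, 2, 6), (4, 2, 3)]
print(backed_up(X, C))   # (1, 2, 6)
```

If player A were choosing at the same node instead, the result would be (4, 2, 3), since 4 > 1 in A's component.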
74
Extending Minimax to Multiplayer games
Note: optimal strategies for multiplayer games, such as alliances, are not covered here
75
Alpha Beta Pruning
