Computational Game Theory

Computational aspects of two-player zero-sum games
Course notes for "Computational Game Theory"
Section 3
Fall 2010
Peter Bro Miltersen
November 10, 2010
Version 1.3
3 Extensive form games (Game Trees, Kuhn Trees)
The strategic form of a game is a very general and clean way of representing
a game, but it is not a very convenient one. Suppose, for instance, that we
want to represent the game tic-tac-toe in strategic form. We need a
matrix with a row for every possible way of playing tic-tac-toe as X and a
column for every possible way of playing tic-tac-toe as O. It may not be very
obvious how to enumerate the possible ways of playing tic-tac-toe. We shall
define an alternative representation of a game, the extensive form or game
tree or Kuhn tree representation, which makes it more explicit what those
possible ways of playing a game are. The extensive form of a game thereby
gives a better visualization of the game. Also, as we shall see, it provides a
much more compact representation of games such as, say, tic-tac-toe, than
the strategic form. This compactness is clearly important for computational
purposes.
A game in extensive form is a rooted tree. Each node of the tree is also called
a position. Each position belongs to exactly one player, or to nature. If a
position belongs to nature, a fixed probability distribution is associated with its
outgoing arcs. For each player, a partition (i.e., an equivalence relation) is
given on the set of positions belonging to that player. The equivalence classes
are called information sets. Intuitively, the player cannot distinguish between
the nodes in an information set. In each position of the game belonging to
a player, the outgoing arcs are also known as the actions of that position.
Intuitively, a player must choose one of these actions whenever he
finds himself in that position. Each action has a name. If two positions are
in the same information set, the sets of action names in those positions should
coincide. On the other hand, if two positions are not in the same information
set, we require that the sets of action names in those positions are disjoint
(this will be convenient later on).
Example: Basic Endgame in Poker The rules of the game are:
(1) Both players put $1 into the pot.
(2) Player 1 is dealt a card (he inspects it, but keeps it hidden from Player
2). A heart is a winning card for Player 1. So he holds a winning card with
probability 1/4 and a losing card with probability 3/4.
(3) Player 1 either bets or he checks. If he bets, he puts $2 more into the
pot.
If player 1 bets:
(4) Player 2 either calls or folds. If he folds, he loses $1, no matter what
card player 1 has. If he calls, he adds $2 to the pot.
(5) If player 1 has hearts, he wins the pot. He also wins if he bets and
player 2 folds. Otherwise player 2 wins.
We draw a Kuhn tree for this game (Figure 1). To each vertex of the game
tree, we attach a label indicating which player is to move from that position. The random card dealt in the beginning is generally referred to as a
move by nature, and we use the label N. At each terminal vertex, we write the
numerical value of Player 1's winnings (= Player 2's losses, because we are in
a zero-sum game). Player 2 does not know Player 1's card; that is, when it
is his turn to move, he does not know at which of his two possible positions
he is. To indicate this on the diagram, we encircle the two positions with
a closed curve to indicate that these two vertices constitute an information
set. The two vertices at which Player 1 is to move constitute two separate
information sets, since he has inspected the card and knows at which position
he is.
In general, information sets could describe situations in which one player has
forgotten a move he made earlier in the game or some information he
once knew (see Player 1's information set in Figure 2). However, we do not
allow this in our course. We only deal with games of perfect recall, which
are games in which players remember all past information they once knew
and all past moves they made. Formally, a game tree satisfies the perfect
recall condition if, for all nodes x and y belonging to the same information
set h belonging to Player j, the following is true: The sequence of action names
performed by Player j on the path from the root of the tree to x is identical
to the sequence of action names performed by Player j on the path from the root of the
tree to y. Note that this also implies that the sequences of information sets
belonging to that player encountered on those two paths coincide, as actions
in different information sets are required to have different names. There is a
conceptual reason for demanding the perfect recall condition: Arguably, if a
player forgets information, this should not be part of a model of the game;
it should be part of a model of the player. There is also a computational
Figure 1: Game tree
reason: Most computational problems associated with games without perfect
recall, such as value computation, turn out to be NP-hard.
3.1 Converting from the extensive form to the strategic form
We can convert a game from extensive form to strategic form. This conversion
procedure can be regarded as the definition of the “semantics” of the notion
of an extensive form game.
Definition 1 The strategic form game corresponding to an extensive form
game is the following. Let Ki be the set of information sets h belonging to
player i in the extensive form game. Then we let the strategy space for player
i in the strategic form game be the set

Si = ×h∈Ki (set of action names in h)
We also need to define the payoff functions. Note that a strategy profile
(x1, x2, . . . , xl) with xi ∈ Si can be viewed as a set of selected actions, one for
each information set of the game. Given a strategy profile, we now consider
the following random process: We put a pebble in the root of the game tree. If the
Figure 2: Game tree, not perfect recall
pebble is in a position belonging to nature, we take a random sample from
the probability distribution on the outgoing arcs indicated at the position,
and move the pebble along the randomly chosen arc. If the pebble is in a
position belonging to a player, we take the outgoing arc corresponding to the
action chosen by the strategy profile. The payoff ui (x1 , x2 , . . . , xl ) for Player
i is defined to be the expected value of the payoff of Player i found in the
leaf of the tree where the pebble ends up.
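The pebble process can be sketched directly in code. The tuple-based tree encoding and the information-set names below are hypothetical conveniences, not part of the definition:

```python
import random

# A sketch of the "pebble" process defined above, on a hypothetical tree
# encoding: ("nature", [(prob, child), ...]),
# ("player", info_set_name, {action_name: child}), or ("leaf", payoff).

def play(node, profile, rng=random):
    """Run the pebble once; `profile` maps information-set names to actions."""
    if node[0] == "leaf":
        return node[1]                         # Player 1's payoff at this leaf
    if node[0] == "nature":
        probs = [p for p, _ in node[1]]
        children = [c for _, c in node[1]]
        return play(rng.choices(children, weights=probs)[0], profile, rng)
    _, info_set, actions = node
    return play(actions[profile[info_set]], profile, rng)

def expected_payoff(node, profile):
    """Expected payoff for Player 1: average over nature's moves exactly."""
    if node[0] == "leaf":
        return node[1]
    if node[0] == "nature":
        return sum(p * expected_payoff(c, profile) for p, c in node[1])
    _, info_set, actions = node
    return expected_payoff(actions[profile[info_set]], profile)

# Basic Endgame in Poker, with hypothetical information-set names.
poker = ("nature", [
    (0.25, ("player", "I_win", {
        "b'": ("player", "II", {"C": ("leaf", 3), "F": ("leaf", 1)}),
        "c'": ("leaf", 1)})),
    (0.75, ("player", "I_lose", {
        "b": ("player", "II", {"C": ("leaf", -3), "F": ("leaf", 1)}),
        "c": ("leaf", -1)})),
])
print(expected_payoff(poker, {"I_win": "b'", "I_lose": "b", "II": "C"}))  # -1.5
```

Running it on the pure profile "always bet" against "call" reproduces the strategic-form entry u1((b′,b), C) = −3/2 computed in this section.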
We can convert our Basic Endgame in Poker from extensive form to strategic
form. Player 1 has two information sets; in each he must make a choice
between two options. He therefore has 2 · 2 = 4 pure strategies. We may
denote them by
(b′,b): bet with a winning card or a losing card.
(b′,c): bet with a winning card, check with a losing card.
(c′,b): check with a winning card, bet with a losing card.
(c′,c): check with a winning card or a losing card.
Therefore, S1 = {(b′,b), (b′,c), (c′,b), (c′,c)}. Player 2 has only one information set.
C: if player 1 bets, call.
F: if player 1 bets, fold.
Therefore, S2 = {C, F}. The payoff function on two strategies is the expected payoff when the strategies are played against each other in the tree, as
explained formally in the definition. Suppose Player 1 uses (b′,b) and Player
2 uses C. Then the expected payoff is

u1((b′,b), C) = (1/4)(3) + (3/4)(−3) = −3/2.

This gives the upper left entry in the following matrix. The other entries
may be computed similarly.
          C      F
(b′,b)  −3/2     1
(b′,c)    0    −1/2
(c′,b)   −2      1
(c′,c)  −1/2   −1/2
In this example the payoff matrix is manageable. But in general, the blowup
in size when going from extensive form to strategic form is exponential. Suppose,
say, that Player 1 has 100 information sets, each with a choice between two
actions. Then the number of rows in the matrix of the corresponding matrix
game is 2^100. So, we often prefer to represent a game in extensive form.
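The strategy space Si of Definition 1 is just a Cartesian product, which makes the blowup easy to see in code; the dictionary of information sets below is a hypothetical encoding of the poker example:

```python
from itertools import product

# Sketch: Player i's pure strategies are the Cartesian product over his
# information sets of the action-name sets (Definition 1). For the poker
# example this gives 2 * 2 = 4 strategies; with 100 binary information
# sets it would be 2**100 rows.
info_sets_p1 = {"winning card": ["b'", "c'"], "losing card": ["b", "c"]}
S1 = list(product(*info_sets_p1.values()))
print(S1)        # the four pure strategies (b',b), (b',c), (c',b), (c',c)
print(len(S1))   # 4
print(2 ** 100)  # number of rows with 100 binary information sets
```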
3.2 Converting Extensive Form Games into Strategic Form
As an example of such a conversion, we consider an example from last lecture,
namely the basic endgame of poker. Figure 3 shows the game tree constructed
last lecture.
If we want to solve this game (finding the value and a maximin strategy for
both players), one way to do so is to convert the game into strategic form
and then solve it using linear programming. The corresponding strategic
form (constructed last lecture) is given by the matrix
        C      F
b′b   −3/2     1
b′c     0    −1/2
c′b    −2      1
c′c   −1/2   −1/2
When solving this game, one might first want to reduce the matrix by using
the notion of dominance. We say that one row r1 weakly dominates another
Figure 3: Extensive form of basic endgame of poker.
non-identical row r2 if each entry of r1 is larger than or equal to the corresponding entry of r2. Intuitively, any probability mass put on r2 by a
strategy can be moved to r1 instead, since each entry gives at least the same
payoff. It is therefore safe to remove the dominated row, since an optimal
strategy not using the dominated row exists. For our matrix game we see
that row 3 is weakly dominated by row 1 (a payoff of −3/2 is always better than
−2, while the payoff of 1 does not change anything). We therefore remove
row 3. Similarly, row 4 is weakly dominated by row 2. We end up with
        C      F
b′b   −3/2     1
b′c     0    −1/2
This game is easily solved using linear programming, and gives us (1/6, 5/6, 0, 0)
(matching the four rows in the original matrix) as the optimal mixed strategy
for Player 1 and (1/2, 1/2) as the optimal mixed strategy for Player 2. The value
of the game is −1/4. Intuitively, b′c seems like the best strategy (bet when
holding a heart, check otherwise), and not surprisingly we therefore use this
strategy 5 out of 6 times. However, it would not make sense to use it every
time, since Player 2 would then change his strategy to always fold when
Player 1 bets, causing Player 1 to lose more money. Player 1 therefore needs
to bluff occasionally, which will not surprise poker players.
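For the reduced 2×2 game, the LP can be bypassed with the standard closed-form solution for 2×2 zero-sum games without a saddle point; this is a sketch of a shortcut, not the method used in the lecture:

```python
from fractions import Fraction as F

# Closed-form solution of a 2x2 zero-sum game with NO saddle point:
# equalize Player 1's expected payoff across Player 2's two columns
# (and symmetrically for Player 2).
def solve_2x2(a, b, c, d):
    """Matrix [[a, b], [c, d]]; returns (value, prob. of row 1, prob. of col 1)."""
    denom = a - b - c + d
    p = (d - c) / denom              # Player 1's probability on row 1
    q = (d - b) / denom              # Player 2's probability on column 1
    value = (a * d - b * c) / denom
    return value, p, q

# Reduced poker matrix: rows b'b, b'c; columns C, F.
value, p, q = solve_2x2(F(-3, 2), F(1), F(0), F(-1, 2))
print(value, p, q)  # -1/4 1/6 1/2
```

This confirms the value −1/4, the row mix (1/6, 5/6), and the column mix (1/2, 1/2).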
Since this example did not illustrate very well the fact that such a conversion
gives an exponential blowup in the number of nodes in the tree, we consider
another example. This time two players each roll a die, and Player 2 tries to get
Figure 4: A dice game in extensive form.
a higher number than Player 1, who starts. The extensive form (or at least
some of it) of the game is seen in Figure 4. Each state has six outgoing
actions corresponding to each possible roll. Player 1 tells a number to Player
2 after studying his die (possibly lying), and then Player 2 decides what to
do. A corresponding matrix game would have a row for each possible pure
strategy, thus giving 6^6 = 46656 rows in the matrix, as indicated below.
(1′, 1′′, 1(3), 1(4), 1(5), 1(6))
(1′, 1′′, 1(3), 1(4), 1(5), 2(6))
...
(6′, 6′′, 6(3), 6(4), 6(5), 6(6))
3.3 Representing and Finding Solutions
As seen in the previous subsection, converting from extensive form to strategic form gives an exponential blowup, possibly resulting in an LP that is practically infeasible to solve. Another related problem is the representation of
the result. For an n × m game matrix, the optimal solution is given as an
n-tuple of probabilities, one for each pure strategy (summing to 1), specifying
the mixed strategy. This representation also suffers from the exponential blowup.
We therefore seek other ways both to represent solutions and to find them.
First we address the problem of giving the solution in a more compact way.
Definition 2 A behavior strategy is
• a map from information sets of a player to probability distributions on
actions of those information sets, or, stated differently, it is
• an assignment of probabilities to the actions belonging to a player (where
they sum to 1 for each information set).
This strategy corresponds to "delaying" the decision of which action to take
until the involved information set is reached when traversing the game tree.
See the red numbers in Figure 3 for a specific behavior strategy. Mixed
strategies, in contrast, force us to fix all choices from the beginning, which
gives far more possibilities to assign probabilities to.
Playing the game according to the behavior strategy is done by traversing the
game tree and letting each player take an action when reaching their information set according to the probability distribution on the actions belonging
to the information set.
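As a sketch, the same traversal can compute the expected payoff exactly by weighting each action by its behavior probability; the tuple encoding and the information-set names are hypothetical conveniences:

```python
# A sketch of playing by a behavior strategy: instead of sampling, push the
# probability mass down the tree. Encoding: ("nature", [(prob, child), ...]),
# ("player", info_set_name, {action_name: child}), or ("leaf", payoff).
def expected_payoff_behavior(node, behavior):
    if node[0] == "leaf":
        return node[1]
    if node[0] == "nature":
        return sum(p * expected_payoff_behavior(c, behavior) for p, c in node[1])
    _, info_set, actions = node
    dist = behavior[info_set]                    # {action: probability}
    return sum(dist[a] * expected_payoff_behavior(c, behavior)
               for a, c in actions.items())

# Basic endgame with the optimal behavior strategies found in Section 3.2:
poker = ("nature", [
    (0.25, ("player", "I_win", {
        "b'": ("player", "II", {"C": ("leaf", 3), "F": ("leaf", 1)}),
        "c'": ("leaf", 1)})),
    (0.75, ("player", "I_lose", {
        "b": ("player", "II", {"C": ("leaf", -3), "F": ("leaf", 1)}),
        "c": ("leaf", -1)})),
])
behavior = {"I_win": {"b'": 1.0, "c'": 0.0},     # always bet a heart
            "I_lose": {"b": 1/6, "c": 5/6},      # bluff with probability 1/6
            "II": {"C": 0.5, "F": 0.5}}          # call half the time
print(expected_payoff_behavior(poker, behavior))  # close to -0.25, the value
```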
The following theorem by Kuhn tells us that for games of perfect recall (no
forgetful players), mixed and behavior strategies can express precisely the
same strategies.
Theorem 3 (Kuhn 1953) For an extensive form game of perfect recall of
an arbitrary number of players, mixed strategies and behavior strategies are
behaviorally equivalent.
Here behaviorally equivalent means that playing the mixed strategy and playing
the behavior strategy cannot be distinguished by somebody viewing the play
from the outside. They simulate each other perfectly.
Since the size of a behavior strategy is bounded by the number of edges in
the tree, such strategies are preferred when dealing with games of extensive
form.
We have now represented the solution in a more compact way and move on
to consider the following problem.
Algorithmic problem Given two-player, zero-sum games in extensive
form, compute value and maximin/minimax behavior strategies.
We present here three possible algorithms, where only the last one is a polynomial time algorithm.
Algorithm 1:
1. Convert to strategic form (exponential time).
2. Compute maximin/minimax mixed strategies (exponential time, since
the size of the matrix is already exponential).
3. Convert to behaviorally equivalent maximin/minimax behavior strategies
(as given in the constructive proof of Theorem 3).
Algorithm 1 uses the theory already known, but has exponential running
time in the size of the game tree.
Algorithm 2:
1. Write the Nash equilibrium conditions (for each information set) as a mathematical program of size roughly that of the tree.
2. Solve the program.
This algorithm is somewhat better than Algorithm 1, since the program does
not suffer from the exponential blowup. However, solving it can still
be hard, since the resulting program is not linear: Variables of the program
(the probabilities used for the behavior strategy) often get multiplied by each
other, as seen in the toy example in Figure 5, where Player 1 has more than
one choice along the path to γ and β. The equilibrium conditions will therefore
involve terms like "pD · pd", where pD is the behavior probability of D and pd is the
behavior probability of d. Such terms can consist of an arbitrary number of
multiplications, corresponding to the number of choices along the path.
3.4 A Polynomial Time Algorithm
The last algorithm is due to Koller, Megiddo and von Stengel. In order for
this algorithm to work, we need two new helpful constructions, the
sequence form and the realization plan.
Definition 4 The sequence form of a two-player, zero-sum extensive form game is
given by the following two items.
• Sets Si of sequences for each player, i = 1, 2. Formally, the set of sequences
for Player i is obtained by taking, for each node of the tree, the path from
the root to that node and reading off the actions belonging to Player i along it.
Figure 5: A toy example.
• A payoff matrix A with a row for every σ ∈ S1 and a column for every
τ ∈ S2. The entry aστ of the matrix is given by

aστ = Σ_{leaves l consistent with σ and τ} weight(l),

where for a leaf l

weight(l) = payoff(l) · Π_{chance edges e on the path from the root to l} pe.

Here pe is the probability of the chance edge e.
This definition is best viewed through an example or two. Let us again
consider the basic endgame of poker (Figure 3) and the toy game from Figure
5. For the poker game, we have

S1 = {ε, b′, c′, b, c},
S2 = {ε, C, F}

as the sets of sequences, where ε denotes the empty sequence. For the game in Figure 5 we get

S1 = {ε, D, U, Dd, Du},
S2 = {ε, L, R}.
For the basic endgame of poker we get the following payoff matrix.

        ε      C      F
ε       0      0      0
b′      0     3/4    1/4
c′     1/4     0      0
b       0    −9/4    3/4
c     −3/4     0      0

In this example, each pair of sequences leads to at most one leaf, so the sum
consists of only one term for each entry. The pairs not leading to a leaf have
the entry 0.
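A minimal sketch of computing this matrix for the poker example, assuming a hand-listed table of leaves (payoff, chance probability on the root-to-leaf path, and the consistent pair of sequences):

```python
from fractions import Fraction as F

# Sketch of Definition 4 for the poker example. Each leaf contributes
# weight(l) = payoff(l) * (product of chance probabilities on its path)
# to the entry indexed by the pair of player sequences consistent with it.
leaves = [
    # (payoff, chance prob, Player 1 sequence, Player 2 sequence)
    (3,  F(1, 4), "b'", "C"),
    (1,  F(1, 4), "b'", "F"),
    (1,  F(1, 4), "c'", "e"),   # "e" stands for the empty sequence
    (-3, F(3, 4), "b",  "C"),
    (1,  F(3, 4), "b",  "F"),
    (-1, F(3, 4), "c",  "e"),
]
S1 = ["e", "b'", "c'", "b", "c"]
S2 = ["e", "C", "F"]
A = {(s, t): F(0) for s in S1 for t in S2}
for payoff, prob, s, t in leaves:
    A[(s, t)] += payoff * prob          # weight(l) summed into the entry
print(A[("b", "C")], A[("b'", "C")], A[("c", "e")])  # -9/4 3/4 -3/4
```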
Definition 5 A realisation plan for Player i is an assignment of a real number
to each of his sequences, ri : Si → R. This number is called the realisation
weight of the sequence. The realisation plan corresponding to a behavior
strategy assigns to each sequence the product of the behavior probabilities of
the actions in that sequence.
One way to view realization weights is that they simply correspond to a
change of variables (from the behavior strategy probabilities) that makes the
non-linear program of Algorithm 2 into a linear one!
As an example we again consult the game from Figure 5, and find the realisation weights of the two sequences Dd and D. The red numbers in the
figure are the behavior probabilities.
r(Dd) = 0.2 · 0.9 = 0.18,
r(D) = 0.2,
p(d) = r(Dd)/r(D) = 0.18/0.2 = 0.9,
where p(d) denotes the probability given to the action d in the behavior
strategy. Note that we can go back and forth between behavior strategies
and realisation plans by simple multiplication and division (unless we divide
by 0, but that will never be an issue, since the path containing this action
will never be taken).
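The conversion in the example above can be checked mechanically; exact fractions are used below to avoid floating-point noise:

```python
from fractions import Fraction as F

# Sketch: behavior strategy -> realisation weights and back, for the
# Figure 5 example (red numbers 0.2/0.8 on D/U and 0.9/0.1 on d/u).
behavior = {"D": F(1, 5), "U": F(4, 5), "d": F(9, 10), "u": F(1, 10)}

r = {"e": F(1)}                       # "e" is the empty sequence
r["D"] = r["e"] * behavior["D"]       # multiply probabilities along the sequence
r["Dd"] = r["D"] * behavior["d"]
print(r["Dd"])                        # 9/50, i.e. 0.18
print(r["Dd"] / r["D"])               # 9/10, i.e. p(d) = 0.9 recovered by division
```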
The next lemma connects realization plans and behavior strategies.
Lemma 6 For a two-player, zero-sum game in extensive form the following
holds.
1. The set of realisation plans of Player 1 corresponding to some behavior
strategy is a bounded non-empty polytope
X = {x | Ex = e, x ≥ 0}.
2. The set of realisation plans of Player 2 corresponding to some behavior
strategy is a bounded non-empty polytope
Y = {y | F y = f, y ≥ 0}.
3. The expected payoff to Player 1 when he plays by x and Player 2 plays
by y is xT Ay, where A is the sequence form payoff matrix.
The matrices E and F and the vectors e and f are constructed using the fact
that the probability mass entering a node must be equal to the probability
mass leaving the node. For our game from Figure 5 we therefore have the
following equations for Player 1.
xε = 1,
xD + xU = xε,
xDd + xDu = xD,
xε, xD, xU, xDd, xDu ≥ 0.

The first three equations correspond to Ex = e.
A formal proof of Lemma 6 is omitted since all three items are straightforward. It is, however, a very good exercise to go through the details of the
proof and also verify a few examples!
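As one such example, here is a sketch verifying item 1 of Lemma 6 for the Figure 5 game: the realisation plan induced by the behavior probabilities 0.2/0.8 and 0.9/0.1 satisfies the flow constraints:

```python
from fractions import Fraction as F

# The realisation plan induced by the behavior strategy of Figure 5
# (D/U with prob. 1/5 and 4/5, d/u with prob. 9/10 and 1/10).
x = {"e": F(1), "D": F(1, 5), "U": F(4, 5), "Dd": F(9, 50), "Du": F(1, 50)}

# The flow equations x_e = 1, x_D + x_U = x_e, x_Dd + x_Du = x_D,
# together with non-negativity, define the polytope X = {x | Ex = e, x >= 0}.
checks = [
    x["e"] == 1,
    x["D"] + x["U"] == x["e"],
    x["Dd"] + x["Du"] == x["D"],
    all(v >= 0 for v in x.values()),
]
print(all(checks))  # True: this plan lies in the polytope X
```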
The next theorem follows naturally.
Theorem 7 For a two-player, zero-sum game in extensive form with payoff
matrix A (from the sequence form), the maximin realisation plan, r, is given
by
r = arg max_{x∈X} min_{y∈Y} xT Ay.
Finally, we are ready to give Algorithm 3.
Algorithm 3: (Koller, Megiddo, von Stengel 1996)
1. Convert the game to sequence form. In particular, compute the payoff
matrix A and the matrices and vectors E, e, F, f defining the valid
realization plans.
2. Compute the maximin expression of Theorem 7 using linear programming (possible due to the proof of the generalised maximin theorem).
Since the number of sequences is linear in the number of nodes, we avoid
the exponential blowup when constructing the payoff matrix. This gives us a
polynomial time algorithm in the number of nodes, which is useful for solving
games of extensive form. The existence of such an algorithm was an open
problem for quite a while.
Using the algorithm, we find the linear programs for our two running examples; see Table 1.
Basic endgame of poker.
Variables: xε, xb′, xc′, xb, xc (the realisation weights); q0 (the value); qh
(representing the contribution to the value from plays through the information set h owned by Player 2).

max q0
subject to
ε: q0 ≤ qh + (1/4)xc′ − (3/4)xc
C: qh ≤ (3/4)xb′ − (9/4)xb
F: qh ≤ (1/4)xb′ + (3/4)xb
xε = 1
xb′ + xc′ = xε
xb + xc = xε
xε, xb′, xc′, xb, xc ≥ 0

Game from Figure 5.
Variables: xε, xD, xU, xDd, xDu (the realisation weights); q0 (the value); qh
(representing the contribution to the value from plays through the information set h owned by Player 2).

max q0
subject to
ε: q0 ≤ qh + α · xU
L: qh ≤ γ · xDd + β · xDu
R: qh ≤ δ · xD
xε = 1
xD + xU = xε
xDd + xDu = xD
xε, xD, xU, xDd, xDu ≥ 0

Table 1: Using Algorithm 3 on two examples.
The intuition behind the first constraint in the poker game is that the value
is bounded by the contribution to the value through h plus the contribution
from not going through h (Player 1 taking actions c′ and c). The same applies
to the other example.
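As a sanity check on the poker program, we can plug in the optimal realisation plan found in Section 3.2 (bet a heart always, bet a losing card with probability 1/6) and confirm that it is feasible and achieves value −1/4; this is only a feasibility check, not an LP solver:

```python
from fractions import Fraction as F

# Player 1's claimed optimal realisation plan for the poker program.
x = {"e": F(1), "b'": F(1), "c'": F(0), "b": F(1, 6), "c": F(5, 6)}

# Make the two qh-constraints (columns C and F) as tight as possible,
# then the epsilon-constraint gives the achievable q0.
qh = min(F(3, 4) * x["b'"] - F(9, 4) * x["b"],    # constraint C
         F(1, 4) * x["b'"] + F(3, 4) * x["b"])    # constraint F
q0 = qh + F(1, 4) * x["c'"] - F(3, 4) * x["c"]    # constraint for the root

feasible = (x["e"] == 1 and x["b'"] + x["c'"] == x["e"]
            and x["b"] + x["c"] == x["e"] and all(v >= 0 for v in x.values()))
print(q0, feasible)  # -1/4 True
```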
In general, the linear programs arising are quite intuitive. The reader is
invited to try some more examples, in particular examples with more information sets belonging to Player 2.
3.5 Finding pure minimax behavior strategies
Below is the game we considered in the last section.
Figure 6: Basic endgame in poker
For this game we found the unique maximin/minimax strategies for Player 1/Player 2.
These are shown as probabilities on the actions. In general we would like to
find pure maximin/minimax strategies if they exist. As an example we now
look at a game which is a slight modification of the above. In fact, it is a
more detailed model of the same real-life game, where we assume that Player
1 gets a random card out of a 24-card deck (9 up to Ace of each suit), and
that any heart is good for Player 1. The tree is too big to draw, but the
top of it is shown in Figure 7.
This game has several maximin behavior strategies. Some of them are pure.
For instance, a maximin strategy for Player 1 is:
• If ♥, bet.
• If ¬♥:
  – If ace, bet.
  – If ¬ace, check.
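A quick sanity check of the arithmetic behind this pure strategy, using the deck composition described above (24 cards, 6 per suit, so 18 non-hearts of which 3 are aces):

```python
from fractions import Fraction as F

# Among the 18 non-heart cards, exactly the 3 non-heart aces trigger a bet,
# so the conditional betting probability with a losing card is 3/18.
non_hearts = 18
betting_cards = 3
print(F(betting_cards, non_hearts))  # 1/6, exactly the bluffing frequency needed
```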
Figure 7: Modified poker game
In this way we let the card provide the "randomness" we needed before, where
Player 1 in the last case should bet with probability 1/6 and check with probability 5/6. In general, we have the following:
Computational Problem 1 Given a two-player zero-sum extensive form
game with perfect recall, does it have a pure maximin behavior strategy?¹
We are going to show that the problem is NP-hard. In fact, it is strongly
NP-hard. We recall what strongly NP-hard means:
Definition 8 (Strongly NP-hard) A problem is strongly NP-hard if it is
NP-hard when numbers in the instances are represented using unary notation
rather than binary or decimal. (For example, unary(8)= 11111111)
Proposition 9 Computational Problem 1 is strongly NP-hard, even if chance
nodes are restricted to uniform distributions.
Proof The proof is done by a reduction from Exact Binpacking. Recall
the definition of this problem:

Exact Binpacking: Given positive integers a1, . . . , an and an integer K ≥ 2,
can we partition {1, 2, . . . , n} into K parts I0, I1, . . . , IK−1 such that Σ_{i∈Ij} ai is
the same for all j?
¹We note here that in this context it does not make a big difference whether we are talking
about a plan or a strategy. The only difference is whether we specify behavior in every
information set or not. In a plan we don't specify behavior in nodes that are not reachable
with the given choices.
We know that this problem is NP-hard (in fact strongly NP-hard), so a
reduction from it yields the result. In the reduction we use a gadget
M(K), which is the K × K matrix game with payoff 0 everywhere except on
the diagonal, where the payoff is −1. M(K) describes a game where Player 1
thinks of a number between 0 and K − 1 and Player 2 makes a guess about
which number it is. If he guesses correctly, Player 1 gives him a dollar. This
game has value −1/K (so −1/3 for K = 3), and the maximin as well as the minimax strategy is the
uniform distribution on the strategy space (note, by the way, that this game
is not symmetric: the matrix M is symmetric, but for a game to be symmetric, its
matrix must be skew-symmetric). We can also make an extensive form game
strategically equivalent to M(K), illustrated here for K = 3.
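A quick sketch of the gadget and the claim about the uniform strategy:

```python
from fractions import Fraction as F

# The gadget M(K): a K x K matrix game with -1 on the diagonal, 0 elsewhere.
def M(K):
    return [[F(-1) if i == j else F(0) for j in range(K)] for i in range(K)]

# Under the uniform strategy for Player 1, every pure guess by Player 2 hits
# with probability 1/K, so the expected payoff is -1/K against every column.
K = 3
A = M(K)
uniform = [F(1, K)] * K
column_payoffs = [sum(uniform[i] * A[i][j] for i in range(K)) for j in range(K)]
print(column_payoffs)  # -1/3 against each of the three columns
```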
Figure 8: Extensive game equivalent to M(3)
The reduction goes as follows: Given an instance A = (a1, . . . , an, K) of Exact
Binpacking, we construct the following game G(A), illustrated in Figure 9
for the case K = 3.
Lemma 10 G(A) has a pure maximin strategy ⇔ A is a yes instance.
Proof "⇐" Assuming A is a yes instance, i.e., it can indeed be divided into
three equally large parts, we want to convince ourselves that G(A) has a pure
maximin strategy. The argument is the same as for the modified poker game
example: Player 1 gets the randomness he needs from the bin corresponding
to the item the chance node informs him of.
"⇒" Now we assume that G(A) has a pure maximin strategy, and we have
to construct a partition of A. We do this by putting items into bins matching
Player 1's choices in the strategy. For example, if in the node I whose edge
from the parent has probability aj / Σ_{i=1}^n ai the choice is 0, we put item aj into
bin I0, and so on. The claim is that this is a correct partition. Assume for
Figure 9: Game tree for G(A)
contradiction that it is not, that is, ¬(Σ_{i∈I0} ai = Σ_{i∈I1} ai = Σ_{i∈I2} ai). Then
there exists j, wlog j = 0, such that Σ_{i∈I0} ai > (Σ_{i=1}^n ai)/3. But then Player 2
can always choose 0, which makes Player 1's payoff < −1/3. But this
means that it is not a pure maximin strategy, and we have the contradiction!
With the proof of Lemma 10 we have completed the reduction and thereby
the proof of the proposition.
The following fact is not very difficult to show and is left as an exercise:
Fact 11 (Hansen, Miltersen, Sørensen, COCOON’07) For two-player
zero-sum games of perfect recall without chance nodes, existence of pure maximin strategies can be determined in linear time by a tree traversal.