Games in extensive form with imperfect information

Wieslaw Zielonka
www.liafa.univ-paris-diderot.fr/~zielonka
mail: [email protected]
LIAFA, Université Denis Diderot
1 Extensive games with imperfect information
An extensive game with imperfect information has the same ingredients as a game
with perfect information, but now we assume that each set Vi, i ∈ N, is partitioned
into ki nonempty sets, Vi = Vi1 ∪ . . . ∪ Viki, the so-called information sets.
We assume that for any two positions v, w that are in the same information set we
have A(v) = A(w), i.e. v and w have the same sets of available actions. In the
sequel, for each information set Vim , A(Vim ) will denote the set of actions available
at Vim .
The intuition behind the information sets is that if the play arrives at any position
belonging to information set Vij then player i knows only that he is at some position
in Vij but he does not know which one.
If all information sets are singletons then we obtain the perfect-information games
examined in the preceding section.
An example of an extensive game with imperfect information is given in Figure 1.
Example 1. Figure 1 presents a game with partial information.

Figure 1: An extensive game with imperfect information.

At state r there is a chance move: with probability 0.4 we go to the state x and
with probability 0.6 to the state y. Both these states belong to player 1, actions
a and d lead to terminal states and actions b and c lead to states w and v
controlled by player 2. Player 2 cannot distinguish between these two states; he
can observe neither chance moves nor player 1's moves.
1.1 Pure, mixed and behavioral strategies
Let Γ be a game in extensive form. Let Vi1 , . . . , Viki be information sets of player i.
A pure strategy for player i is a mapping σi that for each information set Vij gives
an action σi (Vij ) ∈ A(Vij ) available in Vij .
A mixed strategy for player i is a probability distribution over pure strategies of
player i.
A behavioral strategy σi for player i is a mapping that for each information set Vij
gives a probability distribution over the actions available at Vij , σi (Vij ) ∈ ∆(A(Vij )).
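To fix intuition, the three kinds of strategies can be written down concretely. The sketch below uses Python dictionaries with hypothetical information-set and action names ("V1", "V2", "a", "b", "y", "z" are illustrative only, not part of the formal model):

```python
from fractions import Fraction

# A pure strategy: one action per information set.
pure = {"V1": "a", "V2": "y"}

# A mixed strategy: a probability distribution over pure strategies,
# here a pure strategy is frozen as a tuple of actions, one per
# information set in a fixed order.
mixed = {("a", "y"): Fraction(1, 2), ("b", "z"): Fraction(1, 2)}

# A behavioral strategy: an independent probability distribution over
# the available actions at EACH information set.
behavioral = {
    "V1": {"a": Fraction(2, 5), "b": Fraction(3, 5)},
    "V2": {"y": Fraction(1, 4), "z": Fraction(3, 4)},
}

# Sanity checks: every distribution must sum to 1.
assert sum(mixed.values()) == 1
assert all(sum(dist.values()) == 1 for dist in behavioral.values())
```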
Example 2. Let us consider the game in Figure 2.
Figure 2: An extensive game with imperfect information. There is one chance
position – the root. The information sets of player 1 are singletons. There are two
information sets for player 2.
Player 1 has two information sets consisting of single vertices and he has 4 pure
strategies: [ac], [ad], [bc], [bd]. Player 2 also has two information sets and four
pure strategies: [AC], [AD], [BC], [BD]. Any profile of pure strategies yields a
probability distribution over terminal positions. For example, if the players use
the strategies ([ac], [AD]) then the terminal vertices with payoffs (0, 2) and (4, 1)
are obtained with probabilities 2/3 and 1/3 respectively, and all other terminal
positions have probability 0. Thus we can associate with the profile ([ac], [AD])
the payoffs 2/3·(0, 2) + 1/3·(4, 1) = (4/3, 5/3). The same method can be applied
to any pure strategy profile to calculate the corresponding payoffs.
This gives a translation from extensive games to normal form games, and the Nash
theorem implies the existence of equilibria in mixed strategies.
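The payoff computation for the profile ([ac], [AD]) can be checked mechanically; the sketch below simply takes the probability-weighted sum of the payoff vectors read off from Figure 2:

```python
from fractions import Fraction

# Terminal positions reached with positive probability under the
# profile ([ac], [AD]): (probability, payoff vector) pairs.
outcomes = [
    (Fraction(2, 3), (0, 2)),
    (Fraction(1, 3), (4, 1)),
]

# Expected payoff vector: the probability-weighted sum of payoffs.
payoff = tuple(sum(p * u[i] for p, u in outcomes) for i in range(2))
assert payoff == (Fraction(4, 3), Fraction(5, 3))
```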
However, we are more interested in equilibria in behavioral strategies.
Example 3. Consider the game presented in Figure 3.
Figure 3: An extensive game with imperfect information. There are two information
sets for player 1, {x1} and {s1, y1}, and one information set {u2, z2} for player 2.
An example of a behavioral strategy for player 1 is σ1(x1) = 2/5 a + 3/5 b and
σ1({s1, y1}) = 1/4 y + 3/4 z, i.e. in the information set {x1} actions a and b are
taken with probability 2/5 and 3/5 respectively, and in the information set {s1, y1}
player 1 takes action y with probability 1/4 and action z with probability 3/4.
Again each profile of behavioral strategies yields a probability distribution over
terminal vertices and taking the expectation we get the payoff of each player. But
does there exist an equilibrium in behavioral strategies? The answer is positive for
games with perfect recall.
1.2 Games with perfect recall
Let Γ be an extensive game (with imperfect information). For each path p =
v0 a0 v1 . . . an−1 vn from the initial position v0 to a position vn we can define the view
viewi (p) of player i in the following way:
• first remove from p all states that are not in Vi and all actions that are not
executed by player i,
• in the next step replace the states of Vi by the corresponding information sets.
Definition 1. A game is a game with perfect recall iff for each player i, for each
information set Vij of player i and for all positions v, w ∈ Vij if pv and pw are paths
from the initial position to v and w respectively then viewi (pv ) = viewi (pw ). In
other words, player i cannot distinguish v from w even if he has a perfect memory
of all events that he could observe during the play.
The games in Figures 1, 2 and 3 are games with perfect recall.
Figure 4 provides an example of a game without perfect recall. Let pz and py be the
paths from the initial position to z and y respectively. We have view1(pz) = {z, y}
and view1(py) = {x}c{z, y}. Thus if player 1 confuses z and y then this means that
he "has forgotten" whether he had visited x and whether he had taken action c.
Figure 4: There are two information sets for player 1, {x} and {z, y}.
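The computation of views can be sketched in code. The encoding below is hypothetical (a path alternates positions and actions, each position has an owner, and each position of player 1 is mapped to its information set, written as a string); it reproduces the two views of Figure 4 discussed above:

```python
def view(path, player, owner, info_set):
    """Compute view_i(p): keep only player i's positions and actions,
    then replace each kept position by its information set."""
    result = []
    for kind, item in path:  # ("pos", state) or ("act", (who, action))
        if kind == "pos" and owner[item] == player:
            result.append(info_set[item])
        elif kind == "act" and item[0] == player:
            result.append(item[1])
    return result

# The game of Figure 4: r is assumed to be a chance position,
# x, z and y belong to player 1.
owner = {"r": "chance", "x": 1, "z": 1, "y": 1}
info_set = {"x": "{x}", "z": "{z,y}", "y": "{z,y}"}

# Hypothetical paths to z and to y (chance actions named u, v).
p_z = [("pos", "r"), ("act", ("chance", "u")), ("pos", "z")]
p_y = [("pos", "r"), ("act", ("chance", "v")), ("pos", "x"),
       ("act", (1, "c")), ("pos", "y")]

assert view(p_z, 1, owner, info_set) == ["{z,y}"]
assert view(p_y, 1, owner, info_set) == ["{x}", "c", "{z,y}"]
```

The two views differ, which is exactly why Figure 4 is not a perfect-recall game.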
Definition 2. Two strategies σi and σi′ (mixed or behavioral) of player i are said
to be outcome-equivalent if for any profile σ−i of pure strategies of the other
players and for every position s, P(s | (σi, σ−i)) = P(s | (σi′, σ−i)), i.e. the
probability of reaching s under (σi, σ−i) is the same as the probability of reaching
s under (σi′, σ−i).
Theorem 3 (Kuhn). For finite games with perfect recall, mixed and behavioral strategies are outcome-equivalent.
Conclusion: games with perfect recall have Nash equilibria in behavioral strategies.
We briefly sketch the proof of Theorem 3.
The translation from behavioral to outcome-equivalent mixed strategies is easy.
Consider again the game of Figure 3. The behavioral strategy
{x1 } 7→ αa + (1 − α)b,
{s1 , y1 } 7→ βy + (1 − β)z
is outcome-equivalent to the mixed strategy
αβ[ay] + α(1 − β)[az] + (1 − α)β[by] + (1 − α)(1 − β)[bz].
I hope that you get the pattern: if Vi1, . . . , Viki are the information sets of
player i, Ai1, . . . , Aiki are the sets of actions available at these information
sets, and σ is a behavioral strategy that maps each Vij to an element of ∆(Aij),
then the corresponding outcome-equivalent mixed strategy σ∗ is obtained in the
following way: the probability of a pure strategy [a1, . . . , aki], where
a1 ∈ Ai1, . . . , aki ∈ Aiki, is simply
σ∗([a1, . . . , aki]) = σ(Vi1)(a1) · · · σ(Viki)(aki).
The proof that σ ∗ is outcome-equivalent to σ is easy and can be left as an exercise.
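This product construction is mechanical. The sketch below (with a hypothetical dictionary encoding of behavioral strategies) computes the mixed strategy of the example above with α = 2/5 and β = 1/4:

```python
from fractions import Fraction
from itertools import product

def behavioral_to_mixed(behavioral):
    """Given a behavioral strategy (info set -> action distribution),
    return the outcome-equivalent mixed strategy: each pure strategy
    (a tuple of actions, one per info set in a fixed order) gets the
    product of the probabilities of its actions."""
    info_sets = list(behavioral)
    mixed = {}
    for actions in product(*(behavioral[V] for V in info_sets)):
        prob = Fraction(1)
        for V, a in zip(info_sets, actions):
            prob *= behavioral[V][a]
        mixed[actions] = prob
    return mixed

# The behavioral strategy of Figure 3 with alpha = 2/5, beta = 1/4.
sigma = {
    "{x1}": {"a": Fraction(2, 5), "b": Fraction(3, 5)},
    "{s1,y1}": {"y": Fraction(1, 4), "z": Fraction(3, 4)},
}
mixed = behavioral_to_mixed(sigma)
assert mixed[("a", "y")] == Fraction(1, 10)  # alpha * beta
assert sum(mixed.values()) == 1
```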
The translation from a mixed to an outcome-equivalent behavioral strategy needs
more care as the following example shows:
Consider the mixed strategy 1/2[ay] + 1/2[bz] of player 1 in the game of Figure 3.
What is the equivalent behavioral strategy? In particular, what is the probability
distribution over actions y and z at the information set {s1, y1}? One could think
naively that y and z are executed with respective probabilities 0.5 and 0.5. Note
however that player 1 using the mixed strategy 1/2[ay] + 1/2[bz] will never play z:
if he chooses b at x1 then the information set {s1, y1} is never attained, thus the
probability of choosing z in an equivalent behavioral strategy is 0 and not 0.5.
To construct a behavioral strategy equivalent to a given mixed strategy we proceed
in the following way.
A history is a path h = s1 a1 s2 a2 . . . an−1 sn in the game tree starting at the root s1
of the tree and ending at some position sn .
Let β be a pure strategy for player i. We say that h = s1 a1 s2 a2 . . . an−1 sn is consistent with β if for each vertex sm in h which is controlled by player i, β(Vism ) = am ,
where Vism is the information set containing sm .
Let h and h′ be two histories ending at vertices (positions) which are in the same
information set Vij of player i. The crucial observation is that for a perfect recall
game and any pure strategy β of player i, h is consistent with β if and only if h′ is
consistent with β (and this property does not hold for games without perfect recall).
Let Bi be the set of all pure strategies of player i.
For each information set Vij of player i, let Bi(Vij) be the set of all pure strategies
β ∈ Bi of player i such that there exists a history h ending at Vij which is consistent
with β (by our previous remark concerning perfect-recall games, this happens if and
only if all histories ending at Vij are consistent with β).
For an action a ∈ A(Vij) available at Vij, let Bi(Vij, a) be the set of all pure
strategies β ∈ Bi of player i such that β ∈ Bi(Vij) and β(Vij) = a.
Let σ = Σβ∈Bi bβ · β be a mixed strategy of player i, where bβ is the probability
of choosing the pure strategy β and Σβ∈Bi bβ = 1.
Then a behavioral strategy σ∗ outcome-equivalent to σ is defined in the following
way: for each information set Vij of player i and each action a ∈ A(Vij) we set

σ∗(Vij)(a) = (Σβ∈Bi(Vij,a) bβ) / (Σβ∈Bi(Vij) bβ)

if Σβ∈Bi(Vij) bβ ≠ 0. If Σβ∈Bi(Vij) bβ = 0 then σ∗(Vij)(a) can be set to any constant
from the interval [0, 1] (in such a way that Σa∈A(Vij) σ∗(Vij)(a) = 1).
Again the proof that σ and σ∗ are outcome-equivalent is left as an exercise.
Let us consider the game of Example 3. Player 1 has four pure strategies: [ay], [az],
[by] and [bz]. Let

σ = b1[ay] + b2[az] + b3[by] + b4[bz]

be a mixed strategy of player 1.
Take the information set {s1, y1} of player 1. Then B1({s1, y1}) = {[ay], [az]}, i.e.
histories ending at {s1, y1} are consistent either with [ay] or with [az], and they are
consistent neither with [by] nor with [bz]. We have also B1({s1, y1}, y) = {[ay]}, i.e.
[ay] is the only pure strategy such that histories ending at {s1, y1} are consistent
with it and this strategy selects action y. Thus a behavioral strategy σ∗ outcome-equivalent to σ selects action y at {s1, y1} with probability

σ∗({s1, y1})(y) = b1/(b1 + b2)

if b1 + b2 ≠ 0. If b1 + b2 = 0 then we can set σ∗({s1, y1})(y) arbitrarily.
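The whole mixed-to-behavioral construction can be sketched in code. The encoding below is hypothetical: a pure strategy is a dictionary from information sets to actions, and reach[V] records the (information set, action) choices of player i with which every history reaching V must be consistent, read off the game tree by hand (this is well defined precisely because of perfect recall). With (b1, b2, b3, b4) = (1/5, 3/10, 1/10, 2/5) it recovers σ∗({s1, y1})(y) = b1/(b1 + b2) = 2/5:

```python
from fractions import Fraction

def mixed_to_behavioral(mixed, reach):
    """mixed: list of (probability, pure strategy) pairs.
    reach: {info set V: list of (info set, action) pairs that a pure
    strategy must select so that some history reaches V}."""
    behavioral = {}
    for V, constraints in reach.items():
        # Pure strategies with which histories ending at V are consistent.
        consistent = [(p, beta) for p, beta in mixed
                      if all(beta[U] == a for U, a in constraints)]
        denom = sum(p for p, _ in consistent)
        if denom == 0:
            behavioral[V] = None  # V is never reached: choose arbitrarily
            continue
        dist = {}
        for p, beta in consistent:
            dist[beta[V]] = dist.get(beta[V], Fraction(0)) + p
        behavioral[V] = {a: q / denom for a, q in dist.items()}
    return behavioral

# The mixed strategy b1[ay] + b2[az] + b3[by] + b4[bz] of Example 3.
b = [Fraction(1, 5), Fraction(3, 10), Fraction(1, 10), Fraction(2, 5)]
pures = [{"{x1}": "a", "{s1,y1}": "y"},
         {"{x1}": "a", "{s1,y1}": "z"},
         {"{x1}": "b", "{s1,y1}": "y"},
         {"{x1}": "b", "{s1,y1}": "z"}]
# Reaching {s1,y1} requires playing a at {x1}; {x1} is always reached.
reach = {"{x1}": [], "{s1,y1}": [("{x1}", "a")]}

sigma_star = mixed_to_behavioral(list(zip(b, pures)), reach)
assert sigma_star["{x1}"]["a"] == Fraction(1, 2)        # b1 + b2
assert sigma_star["{s1,y1}"]["y"] == Fraction(2, 5)     # b1 / (b1 + b2)
```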
1.3 Subgame perfect equilibria for extensive games with imperfect information
Subgame perfect equilibria for imperfect-information games are defined in the same
way as for perfect-information games if we adopt the following definition of a
subgame: a subgame is a subtree starting at some non-terminal position and such
that for each player i and for each information set Vij of player i, either Vij is
contained in the subtree or it is disjoint from the subtree. For example, the game
in Figure 3 has no proper subgames. On the other hand, the game of Figure 2 has two
proper subgames rooted at w and x.
There are many other notions of equilibria for imperfect-information games: sequential
equilibria, trembling-hand equilibria, etc.
Exercise 1 (from David Williams, Weighing the odds, Cambridge University Press).
At the end of the Monty Hall game show a contestant is shown three closed doors.
Behind one of the doors is a car; behind each of the other two is a goat. The
contestant chooses one of the three doors. The show’s host, who knows which
door conceals the car, opens one of the remaining two doors which he knows will
definitely reveal a goat. He then asks the contestant whether or not she wishes to
switch her choice to the remaining closed door. Should she switch or stick to the
original choice?
Model this as a game with imperfect information. Answer the question given above.
Note that there are a number of hidden assumptions. First of all we suppose that the
contestant prefers a car to a goat (maybe this is obvious?).
Secondly, all of this description is common knowledge; in particular, the contestant
knows that the host knows where the car is and that the host will always open a
door with a goat behind it.
The problem can be solved by elementary probability, but I want you to model it
as a game and solve it on the basis of the game presentation.
Now suppose that the host does not know where the car is; he opens a door and
it turns out that there is a goat behind it. In this case should the contestant
switch or stick to her original choice?
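As a sanity check on the first question (this is not the game model the exercise asks for, just elementary enumeration under the stated assumptions, with the host choosing uniformly when both unpicked doors hide goats):

```python
from fractions import Fraction

# By symmetry we may fix the contestant's initial pick to door 1.
switch_wins = Fraction(0)
stick_wins = Fraction(0)
for car in (1, 2, 3):              # the car is behind each door w.p. 1/3
    p_car = Fraction(1, 3)
    # The host opens a door that is neither the pick nor the car,
    # choosing uniformly when two such doors are available.
    host_options = [d for d in (1, 2, 3) if d != 1 and d != car]
    for host in host_options:
        p = p_car / len(host_options)
        remaining = next(d for d in (1, 2, 3) if d not in (1, host))
        if remaining == car:
            switch_wins += p       # switching to the remaining door wins
        else:
            stick_wins += p        # sticking with door 1 wins

assert switch_wins == Fraction(2, 3)
assert stick_wins == Fraction(1, 3)
```

The enumeration confirms the well-known answer for the standard assumptions; the variant where the host does not know the car's location behaves differently, which the game model should make apparent.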
Exercise 2. A driver starts driving at S (see the figure below). At X he can either
continue (C) or exit (E) and go to A with payoff 0. At Y he can either continue and
get to C with payoff 1, or exit and go to B with payoff 4. However, he cannot
distinguish between the intersections X and Y.
What is his optimal behavioral strategy? And his optimal mixed strategy?
[Figure: the road network. From S the driver reaches X; at X, exit E leads to A
(payoff 0) and continue C leads to Y; at Y, exit E leads to B (payoff 4) and
continue C leads to C (payoff 1).]
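As a hint for evaluating candidate strategies in Exercise 2: since the driver cannot distinguish X from Y, a behavioral strategy is a single probability p of continuing, applied at both intersections. The sketch below evaluates the resulting expected payoff and sweeps p numerically (the payoffs 0, 4, 1 are those of the exercise; the sweep is only a numerical check, not a proof):

```python
def payoff(p):
    """Expected payoff when the driver continues with probability p
    at the single information set {X, Y}."""
    # exit at X (prob 1 - p): payoff 0
    # continue at X, exit at Y (prob p * (1 - p)): payoff 4
    # continue at both X and Y (prob p * p): payoff 1
    return (1 - p) * 0 + p * (1 - p) * 4 + p * p * 1

# Coarse numerical sweep over candidate continuation probabilities;
# compare the best value found with the payoffs of the pure strategies
# (always exit: 0, always continue: 1) to answer the mixed-strategy part.
best_p = max((i / 1000 for i in range(1001)), key=payoff)
```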