Games in extensive form with imperfect information

Wieslaw Zielonka
www.liafa.univ-paris-diderot.fr/~zielonka
mail: [email protected]
LIAFA, Université Denis Diderot

1 Extensive games with imperfect information

An extensive game with imperfect information has the same ingredients as a game with perfect information, but now we assume that each set Vi, i ∈ N, is partitioned into ki nonempty sets,

Vi = Vi1 ∪ ... ∪ Viki,

the so-called information sets. We assume that for any two positions v, w in the same information set we have A(v) = A(w), i.e. v and w have the same sets of available actions. In the sequel, for each information set Vim, A(Vim) denotes the set of actions available at Vim.

The intuition behind information sets is the following: if the play arrives at any position belonging to the information set Vij, then player i knows only that he is at some position of Vij, but he does not know at which one. If all information sets are singletons then we have a perfect information game as examined in the preceding section. An example of an extensive game with imperfect information is given in Figure 1.

Example 1. Figure 1 presents a game with partial information. At state r there is a chance move: with probability 0.4 we go to state x and with probability 0.6 to state y. Both these states belong to player 1; actions a and d lead to terminal states, while actions b and c lead to states w and v controlled by player 2. Player 2 cannot distinguish between these two states; he can observe neither chance moves nor player 1's moves.

Figure 1: An extensive game with imperfect information.

1.1 Pure, mixed and behavioral strategies

Let Γ be a game in extensive form. Let Vi1, ..., Viki be the information sets of player i. A pure strategy for player i is a mapping σi that for each information set Vij gives an action σi(Vij) ∈ A(Vij) available at Vij. A mixed strategy for player i is a probability distribution over the pure strategies of player i.
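Since a pure strategy fixes one action at each information set, player i has exactly the Cartesian product of his action sets as pure strategies. A minimal Python sketch of this enumeration (the information-set names and actions below are illustrative, not taken from any figure):

```python
from itertools import product

# Hypothetical information sets of player i with their action sets A(Vij).
info_sets = {
    "Vi1": ["a", "b"],
    "Vi2": ["c", "d"],
}

# A pure strategy picks one action per information set, so enumerating all
# pure strategies is the Cartesian product of the action sets.
names = list(info_sets)
pure_strategies = [dict(zip(names, combo))
                   for combo in product(*(info_sets[v] for v in names))]

print(len(pure_strategies))  # 4 strategies: [ac], [ad], [bc], [bd]
```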
A behavioral strategy σi for player i is a mapping that for each information set Vij gives a probability distribution over the actions available at Vij, σi(Vij) ∈ ∆(A(Vij)).

Example 2. Let us consider the game in Figure 2.

Figure 2: An extensive game with imperfect information.

There is one chance position, the root. Player 1 has two information sets, each consisting of a single vertex, and he has 4 pure strategies: [ac], [ad], [bc], [bd]. Player 2 also has two information sets and four pure strategies: [AC], [AD], [BC], [BD].

Any profile of pure strategies yields a probability distribution over terminal positions. For example, if the players use the strategies ([ac], [AD]) then the terminal vertices with payoffs (0, 2) and (4, 1) are reached with probabilities 2/3 and 1/3 respectively, and all other terminal positions have probability 0. Thus we can associate with the profile ([ac], [AD]) the payoffs (2/3)(0, 2) + (1/3)(4, 1) = (4/3, 5/3). The same method can be applied to any pure strategy profile to calculate the corresponding payoffs. This gives a translation from extensive games to normal form games, and the Nash theorem implies the existence of equilibria in mixed strategies. However, we are more interested in equilibria in behavioral strategies.

Example 3. Consider the game presented in Figure 3.

Figure 3: An extensive game with imperfect information.

There are two information sets for player 1, {x1} and {s1, y1}, and one information set {u2, z2} for player 2. An example of a behavioral strategy for player 1 is σ1(x1) = (2/5)a + (3/5)b and σ1({s1, y1}) = (1/4)y + (3/4)z, i.e. in the information set {x1} actions a and b are taken with probabilities 2/5 and 3/5 respectively, and in the information set {s1, y1} player 1 takes action y with probability 1/4 and action z with probability 3/4.
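In either case, the payoff of a profile is the expectation of the terminal payoff vectors under the probability distribution the profile induces. The small sketch below redoes the computation for the profile ([ac], [AD]) of Example 2 in exact arithmetic:

```python
from fractions import Fraction

# Chance distribution over terminal payoff vectors induced by the pure
# profile ([ac], [AD]) of Example 2 (all other terminals get probability 0).
outcomes = [(Fraction(2, 3), (0, 2)),
            (Fraction(1, 3), (4, 1))]

def expected_payoff(outcomes):
    """Componentwise expectation of the payoff vectors."""
    n = len(outcomes[0][1])
    return tuple(sum(p * u[k] for p, u in outcomes) for k in range(n))

print(expected_payoff(outcomes))  # → (Fraction(4, 3), Fraction(5, 3))
```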
Again, each profile of behavioral strategies yields a probability distribution over terminal vertices, and taking the expectation we get the payoff of each player. But does there exist an equilibrium in behavioral strategies? The answer is positive for games with perfect recall.

1.2 Games with perfect recall

Let Γ be an extensive game (with imperfect information). For each path p = v0 a0 v1 ... an−1 vn from the initial position v0 to a position vn we define the view viewi(p) of player i in the following way:

• first remove from p all states that are not in Vi and all actions that are not executed by player i,
• next, replace the states of Vi by the corresponding information sets.

Definition 1. A game is a game with perfect recall iff for each player i, for each information set Vij of player i and for all positions v, w ∈ Vij, if pv and pw are paths from the initial position to v and w respectively, then viewi(pv) = viewi(pw).

In other words, player i cannot distinguish v from w even if he has a perfect memory of all events that he could observe during the play. The games in Figures 1, 2 and 3 have perfect recall. Figure 4 provides an example of a game without perfect recall. Let pz and py be the paths from the initial position to z and y respectively. We have view1(pz) = {z, y} and view1(py) = {x} c {z, y}. Thus if player 1 confuses z and y then this means that he "has forgotten" whether he had visited x or not and whether he had taken action c.

Figure 4: There are two information sets for player 1, {x} and {z, y}.

Definition 2. Two strategies σi and σi′ (mixed or behavioral) of player i are said to be outcome-equivalent if for any profile σ−i of pure strategies of the other players and for each position s of the game, P(s | (σi, σ−i)) = P(s | (σi′, σ−i)), i.e. the probability of reaching s under (σi, σ−i) is the same as the probability of reaching s under (σi′, σ−i).

Theorem 3 (Kuhn).
For finite games with perfect recall, mixed and behavioral strategies are outcome-equivalent: every mixed strategy has an outcome-equivalent behavioral strategy and vice versa.

Conclusion: games with perfect recall have Nash equilibria in behavioral strategies.

We briefly sketch the proof of Theorem 3. The translation from behavioral to outcome-equivalent mixed strategies is easy. Consider again the game of Figure 3. The behavioral strategy {x1} ↦ αa + (1 − α)b, {s1, y1} ↦ βy + (1 − β)z is outcome-equivalent to the mixed strategy αβ[ay] + α(1 − β)[az] + (1 − α)β[by] + (1 − α)(1 − β)[bz].

I hope that you get the pattern: if Vi1, ..., Viki are the information sets of player i, Ai1, ..., Aiki are the sets of actions available at these information sets, and σ is a behavioral strategy that maps each Vij to an element of ∆(Aij), then the corresponding outcome-equivalent mixed strategy σ∗ is obtained in the following way: the probability of a pure strategy [a1, ..., aki], where a1 ∈ Ai1, ..., aki ∈ Aiki, is simply

σ∗([a1, ..., aki]) = σ(Vi1)(a1) · · · σ(Viki)(aki).

The proof that σ∗ is outcome-equivalent to σ is easy and left as an exercise.

The translation from a mixed to an outcome-equivalent behavioral strategy needs more care, as the following example shows. Consider the mixed strategy (1/2)[ay] + (1/2)[bz] of player 1 in the game of Figure 3. What is the equivalent behavioral strategy? In particular, what is the probability distribution over actions y and z at the information set {s1, y1}? One could think naively that y and z are executed with probabilities 0.5 and 0.5. Note however that player 1 using the mixed strategy (1/2)[ay] + (1/2)[bz] will never play z: if he chooses b at x1 then the information set {s1, y1} is never reached. Thus the probability of choosing z in an equivalent behavioral strategy is 0 and not 0.5.

To construct a behavioral strategy equivalent to a given mixed strategy we proceed in the following way. A history is a path h = s1 a1 s2 a2 ...
an−1 sn in the game tree, starting at the root s1 of the tree and ending at some position sn. Let β be a pure strategy for player i. We say that h = s1 a1 s2 a2 ... an−1 sn is consistent with β if for each vertex sm in h which is controlled by player i we have β(Vism) = am, where Vism is the information set containing sm.

Let h and h′ be two histories ending at vertices (positions) which are in the same information set Vij of player i. The crucial observation is that, for a perfect recall game and any pure strategy β of player i, h is consistent with β if and only if h′ is consistent with β (this property does not hold for games without perfect recall).

Let Bi be the set of all pure strategies of player i. For each information set Vij of player i, let Bi(Vij) be the set of all pure strategies β ∈ Bi such that there exists a history h ending at Vij which is consistent with β (by the previous remark concerning perfect recall games, this happens iff all histories ending at Vij are consistent with β). For an action a ∈ A(Vij) available at Vij, let Bi(Vij, a) be the set of all pure strategies β ∈ Bi such that β ∈ Bi(Vij) and β(Vij) = a.

Let σ = Σ_{β∈Bi} bβ · β be a mixed strategy of player i, where bβ is the probability of choosing the pure strategy β, Σ_{β∈Bi} bβ = 1. Then a behavioral strategy σ∗ outcome-equivalent to σ is defined in the following way: for each information set Vij of player i and each action a ∈ A(Vij) we set

σ∗(Vij)(a) = ( Σ_{β∈Bi(Vij,a)} bβ ) / ( Σ_{β∈Bi(Vij)} bβ )

if Σ_{β∈Bi(Vij)} bβ ≠ 0. If Σ_{β∈Bi(Vij)} bβ = 0 then σ∗(Vij)(a) can be set to any constant from the interval [0, 1], in such a way that Σ_{a∈A(Vij)} σ∗(Vij)(a) = 1. Again, the proof that σ and σ∗ are outcome-equivalent is left as an exercise.

Let us return to the game of Example 3. Player 1 has four pure strategies: [ay], [az], [by] and [bz]. Let σ = b1[ay] + b2[az] + b3[by] + b4[bz] be a mixed strategy of player 1.
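The construction above can be turned into a short Python sketch. The encoding of pure strategies and the consistency test below are mine; they instantiate the formula for player 1 of Figure 3 with the mixed strategy (1/2)[ay] + (1/2)[bz] discussed earlier:

```python
from fractions import Fraction

def mixed_to_behavioral(b, consistent, choice, actions):
    """Sketch of the construction above (encoding and names are mine).

    b          -- dict: pure strategy -> probability b_beta
    consistent -- consistent(beta, V): does some history ending at the
                  information set V exist that is consistent with beta?
    choice     -- choice(beta, V): the action beta prescribes at V
    actions    -- dict: information set V -> list of actions A(V)
    """
    sigma = {}
    for V, A in actions.items():
        denom = sum(p for beta, p in b.items() if consistent(beta, V))
        if denom == 0:
            # denominator 0: the distribution at V may be chosen arbitrarily,
            # here uniform over A(V)
            sigma[V] = {a: Fraction(1, len(A)) for a in A}
        else:
            sigma[V] = {a: sum(p for beta, p in b.items()
                               if consistent(beta, V) and choice(beta, V) == a)
                           / denom
                        for a in A}
    return sigma

# Player 1 in Figure 3, pure strategies encoded as (action at {x1},
# action at {s1, y1}), mixed strategy 1/2 [ay] + 1/2 [bz]:
b = {("a", "y"): Fraction(1, 2), ("a", "z"): Fraction(0),
     ("b", "y"): Fraction(0),    ("b", "z"): Fraction(1, 2)}
actions = {"x1": ["a", "b"], "s1y1": ["y", "z"]}
# {s1, y1} is reached only after action a at {x1}.
consistent = lambda beta, V: V == "x1" or beta[0] == "a"
choice = lambda beta, V: beta[0] if V == "x1" else beta[1]

sigma = mixed_to_behavioral(b, consistent, choice, actions)
print(sigma["s1y1"]["y"], sigma["s1y1"]["z"])  # 1 0
```

As expected, σ∗ plays y with probability 1 and z with probability 0 at {s1, y1}, not 0.5 and 0.5.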
Take the information set {s1, y1} of player 1. Then B1({s1, y1}) = {[ay], [az]}, i.e. histories ending at {s1, y1} are consistent with [ay] and with [az], but neither with [by] nor with [bz]. We also have B1({s1, y1}, y) = {[ay]}: [ay] is the only pure strategy that is consistent with the histories ending at {s1, y1} and selects action y there. Thus a behavioral strategy σ∗ outcome-equivalent to σ selects action y at {s1, y1} with probability

σ∗({s1, y1})(y) = b1 / (b1 + b2)

if b1 + b2 ≠ 0. If b1 + b2 = 0 then σ∗({s1, y1})(y) can be set arbitrarily.

1.3 Subgame perfect equilibria for extensive games with imperfect information

Subgame perfect equilibria for imperfect information games are defined in the same way as for perfect information games if we adopt the following definition of a subgame: a subgame is a subtree starting at some non-terminal position and such that for each player i and for each information set Vij of player i, either Vij is contained in the subtree or it is disjoint from the subtree. For example, the game in Figure 3 has no proper subgames. On the other hand, the game of Figure 2 has two proper subgames, rooted at w and x. There are many other notions of equilibria for imperfect information games: sequential equilibria, trembling hand equilibria, etc.

Exercise 1 (from David Williams, Weighing the Odds, Cambridge University Press). At the end of the Monty Hall game show a contestant is shown three closed doors. Behind one of the doors is a car; behind each of the other two is a goat. The contestant chooses one of the three doors. The show's host, who knows which door conceals the car, opens one of the remaining two doors, one which he knows will reveal a goat. He then asks the contestant whether or not she wishes to switch her choice to the remaining closed door. Should she switch or stick to the original choice? Model this as a game with imperfect information.
Answer the question given above. Note that there are several hidden assumptions. First of all, we suppose that the contestant prefers a car to a goat (maybe this is obvious?). Secondly, the whole description is common knowledge; in particular the contestant knows that the host knows where the car is and that the host will always open a door with a goat behind it. The problem can be solved by elementary probability, but I want you to model it as a game and to solve it on the basis of the game presentation.

Now suppose that the host does not know where the car is; he opens a door and it turns out that there is a goat behind it. In this case should the contestant switch or stick to the original choice?

Exercise 2. A driver starts driving at S (see the figure below). At X he can either continue (C) or exit (E) and go to A with payoff 0. At Y he can either continue and get to C with payoff 1, or exit and go to B with payoff 4. However, he cannot distinguish between the intersections X and Y. What is his optimal behavioral strategy? And his optimal mixed strategy?

[Figure: the road S → X → Y → C, with exit E at X leading to A (payoff 0) and exit E at Y leading to B (payoff 4); continuing at Y reaches C (payoff 1).]
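The game models are left to the reader, but the probabilities behind Exercise 1 can be sanity-checked numerically. The Monte Carlo sketch below is mine (it is not the requested game-theoretic model); it covers both variants of the host, and in the ignorant-host variant it conditions on a goat having been revealed:

```python
import random

def monty(trials, host_knows, switch, seed=0):
    """Monte Carlo estimate of the contestant's winning probability.
    When host_knows is False, runs where the host accidentally opens the
    car door are discarded, i.e. we condition on a goat being revealed."""
    rng = random.Random(seed)
    wins = valid = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        if host_knows:
            opened = rng.choice([d for d in range(3) if d not in (pick, car)])
        else:
            opened = rng.choice([d for d in range(3) if d != pick])
            if opened == car:
                continue  # the host revealed the car: outcome discarded
        valid += 1
        if switch:
            pick = next(d for d in range(3) if d not in (pick, opened))
        wins += (pick == car)
    return wins / valid

print(monty(100_000, host_knows=True, switch=True))    # ≈ 2/3
print(monty(100_000, host_knows=True, switch=False))   # ≈ 1/3
print(monty(100_000, host_knows=False, switch=True))   # ≈ 1/2
```

Note how the knowing and the ignorant host give different conditional probabilities, which is why the hidden assumptions above matter.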