Games and Economic Behavior 28, 310–324 (1999)
Article ID game.1998.0701, available online at http://www.idealibrary.com

Nash Equilibria of Repeated Games with Observable Payoff Vectors*

Tristan Tomala

Cermsem, Université Paris-I Panthéon-Sorbonne, and Ceremade, Université Paris Dauphine, Place de Lattre de Tassigny, 75016 Paris, France
E-mail: [email protected]

Received October 25, 1995

We study a model of repeated games with imperfect monitoring where the payoff vector is observable. In this situation, any profitable deviation is detectable by all the players, but the identity of the deviator may be unknown. We design collective punishments directed against the set of potential deviators. A particular class of signals is studied for which a characterization of the set of equilibrium payoffs is obtained. Journal of Economic Literature Classification Numbers: C73. © 1999 Academic Press

* I thank Professor J. Abdou for motivating this work and for constant help, support, and fruitful discussions, Professors E. Lehrer and S. Sorin for helpful remarks and comments, and also G. Giraud, O. Gossner, and an anonymous referee for remarks improving the exposition of the paper.

0899-8256/99 $30.00
Copyright © 1999 by Academic Press
All rights of reproduction in any form reserved.

1. INTRODUCTION

This paper deals with Nash equilibria of undiscounted repeated games with imperfect monitoring. The players are involved in a repeated strategic interaction and their aim is to maximize their long-run payoff. They rely on the information collected along the play to choose their current actions. Here, this information is given by a public signal which depends on the joint action. Models of imperfect monitoring give a quite realistic representation of long strategic interactions, and the assumption of public observation is often found in real-life or economic situations. The public signal can be thought of as a radio broadcast, as official statistics, or, in the case of industrial competition, as the total supply in the market or the price system of the preceding period.

The well-known Folk theorem (for a survey see Sorin, 1992) deals with perfect monitoring, i.e., situations where players are informed of the action profile chosen at each stage, and states that any feasible individually rational payoff vector is an equilibrium payoff of the repeated game. This is proved by prescribing strategies playing pure actions on the equilibrium path. Thanks to perfect monitoring, any deviation is detected by all the players, and the deviator is identified and punished to his minmax level forever. With imperfect monitoring, new problems arise:

(a) A player may deviate without affecting the observations of his opponents.
(b) A deviation may be observed by a strict subset of players only. A collective punishment cannot start unless the players communicate to coordinate their play.
(c) An observed deviation may be compatible with several deviators.

Considering point (a), two leading papers on the subject are Fudenberg et al. (1994) for the discounted case and Lehrer (1990) for the undiscounted case. The main contribution of Fudenberg et al. (1994) is to consider public random signals and to prove that a Folk theorem holds under some informational assumptions. Their main assumptions (individual full rank and pairwise identifiability) imply that a unilateral deviation must change the distribution of public signals (i.e., must be statistically detectable) and that the deviations of two different players can be distinguished (again statistically) from each other. These assumptions are generic provided that the number of possible values of the public signal is large enough. For the undiscounted case, in Lehrer (1990), each player's action set is endowed with a partition, and when player $i$ chooses the action $s^i$, the element of the partition to which $s^i$ belongs is publicly revealed.
This assumption guarantees that deviations from different players can be distinguished from each other. Then, Lehrer (1990) studies the impact of profitable and undetectable deviations.

Point (b) is studied in Fudenberg and Levine (1991) and in Ben-Porath and Kahneman (1996). Players have to communicate through the signals to agree on a punishment plan.

In this paper we deal with point (c) only. We consider public signals and observable payoff vectors, so that:
• any deviation that changes the payoff for one player is detectable;
• all players detect deviations simultaneously.

When a signal that was not expected at equilibrium is observed, the identity of the deviator may not be revealed. It may occur that, when two players $i, j$ can induce this signal, punishing player $i$ rewards player $j$. A profitable deviation for player $j$ would then be to induce such "bad" signals and gain from the punishment of player $i$. Such players must be punished simultaneously. We thus design collective punishments against coalitions. Such considerations are meaningless in the two-player case, where the identity of the deviator is always revealed.

2. THE MODEL

2.1. The infinitely repeated game

A repeated game with public signals is given by:
• a set of players $N = \{1, \dots, n\}$ with $n \ge 3$;
• for each $i$ in $N$, a finite nonempty set $S^i$ of actions, and a one-shot payoff function $g^i$ from $\prod_{j\in N} S^j$ to $\mathbb{R}$ (w.l.o.g. we assume all payoffs to be non-negative);
• a public signal function $\ell$ from $\prod_{j\in N} S^j$ to some finite set $A$.

The repeated game $\Gamma_\infty$ is described as follows: at each stage $t = 1, 2, \dots$, player $i$ chooses $s^i_t$ in $S^i$ and, if $s_t$ is the profile of actions chosen, the public signal $a_t = \ell(s_t)$ is announced to all the players. A total history of length $t$ is an element of $H_t = (\prod_{j\in N} S^j)^t$. The set of private histories of length $t$ for player $i$ is $H^i_t = (S^i \times A)^t$. A pure strategy $\sigma^i$ for player $i$ is a sequence of functions $(\sigma^i_t)_{t\ge 1}$, where $\sigma^i_t$ maps $H^i_{t-1}$ into $S^i$. A mixed strategy for player $i$ is a probability distribution over the set of player $i$'s pure strategies. A behavior strategy $\sigma^i$ for player $i$ is a sequence of functions $(\sigma^i_t)_{t\ge 1}$, where $\sigma^i_t$ maps $H^i_{t-1}$ into $\Delta(S^i)$, the set of probability distributions over $S^i$. Perfect recall is assumed and Kuhn's theorem allows one to restrict the study to behavior strategies. Let $\Sigma^i$ be the set of behavior strategies for player $i$.

A joint behavior strategy $\sigma = (\sigma^1, \dots, \sigma^n)$ induces a probability distribution over total histories in a natural way. Let $P_\sigma$ be this probability and $E_\sigma$ the induced expectation operator. If $g^i_t$ is the payoff for player $i$ at stage $t$, denote $\gamma^i_T(\sigma) = E_\sigma[\frac{1}{T}\sum_{t=1}^T g^i_t]$. If $(E^i)_{i\in N}$ is a collection of sets indexed on $N$, an element $(e^1, \dots, e^n)$ of $E = \prod_{i\in N} E^i$ will simply be denoted by $e$. We will denote by $e^{-i}$ the current element of $E^{-i} = \prod_{j\ne i} E^j$, and we will write $e = (e^i, e^{-i})$ when the $i$th component is stressed. We deal with uniform equilibria, which are, for all $\varepsilon > 0$, $\varepsilon$-Nash equilibria of the $T$-fold repeated game for large enough $T$.

Definition 2.1. A joint behavior strategy $\sigma$ is a uniform equilibrium if:
(i) for each player $i$, $\lim_{T\to\infty} \gamma^i_T(\sigma)$ exists;
(ii) for all $\varepsilon > 0$, there is a stage $T_0$ such that
$$\forall T \ge T_0,\ \forall i \in N,\ \forall \tau^i \in \Sigma^i,\quad \gamma^i_T(\tau^i, \sigma^{-i}) \le \gamma^i_T(\sigma) + \varepsilon.$$

If $\sigma$ is a uniform equilibrium, we put for each $i$, $\gamma^i(\sigma) = \lim_{T\to\infty} \gamma^i_T(\sigma)$, and the vector $(\gamma^1(\sigma), \dots, \gamma^n(\sigma))$ is an equilibrium payoff. Let $E_\infty$ be the set of equilibrium payoffs. The main issue of this paper is to give a description of $E_\infty$. Let $g(S)$ be the set of payoff vectors associated to pure joint actions in the one-shot game, and let the set of feasible payoffs be its convex hull $\operatorname{co} g(S)$. We define now the individual rationality levels in the repeated game.

Notation 2.2.
(i) The independent minmax for player $i$ is
$$v^i = \min_{x^{-i} \in \prod_{j\in N\setminus i} \Delta(S^j)}\ \max_{x^i \in \Delta(S^i)} g^i(x^i, x^{-i}).$$
(ii) The correlated minmax for player $i$ is
$$w^i = \min_{x^{-i} \in \Delta(S^{-i})}\ \max_{x^i \in \Delta(S^i)} g^i(x^i, x^{-i}).$$

In repeated games with imperfect monitoring, a player's payoff can be decreased to his independent minmax, whatever the observation structure is: his opponents just have to play the $x^{-i}$ that achieves the minimum in 2.2(i) at each stage. However, the signals can be used by the players to generate correlated distributions on their sets of actions. This issue is studied in detail in Lehrer (1991). Thus, a player can guarantee himself his correlated minmax by playing at each stage a best response to the expected distribution on the moves of his opponents. Considering optimal punishments, a new bound is obtained. The payoffs associated to a player's punishment are defined as follows.

Definition 2.3.
(i) Players $-i$ can force player $i$ to the payoff $z^i \in \mathbb{R}$ if
$$\forall \varepsilon > 0,\ \exists \sigma^{-i} \in \Sigma^{-i},\ \exists T_0 \text{ s.t. } \forall \tau^i \in \Sigma^i,\ \forall T \ge T_0,\quad \gamma^i_T(\tau^i, \sigma^{-i}) \le z^i + \varepsilon.$$
(ii) The repeated game minmax of player $i$ is defined as
$$v^i_\infty = \inf\{z^i \in \mathbb{R} \text{ s.t. players } -i \text{ can force } i \text{ to } z^i\}.$$

Remark 2.4. For each player $i$, $v^i \ge v^i_\infty \ge w^i$.

Note, as in Fudenberg and Levine (1991), that in repeated games of complete information, having $\varepsilon$-equilibria for all $\varepsilon$ allows one to construct equilibria by concatenation. Players $-i$ can actually play, on finite blocks of length $T(\varepsilon)$, a strategy adapted to forcing player $i$ to $v^i_\infty + \varepsilon$, forcing him closer and closer to $v^i_\infty$ as $\varepsilon$ goes to zero. Thus, the following definition makes sense.

Definition 2.5. A punishing strategy against player $i$ is a strategy $\sigma^{-i} \in \Sigma^{-i}$ such that
$$\forall \varepsilon > 0,\ \exists T_0 \text{ s.t. } \forall \tau^i \in \Sigma^i,\ \forall T \ge T_0,\quad \gamma^i_T(\tau^i, \sigma^{-i}) \le v^i_\infty + \varepsilon.$$

Denote by $IR_\infty$ the set of individually rational payoffs of the repeated game,
$$IR_\infty = \{(u^1, \dots, u^n) \in \mathbb{R}^N \text{ s.t. } \forall i \in N,\ u^i \ge v^i_\infty\}.$$

Lemma 2.6. $E_\infty \subset IR_\infty$.

This lemma is a direct consequence of the definitions.
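The inequalities of Remark 2.4 can be checked numerically on Example 2.7 below. The sketch assumes the block structure of player 3's payoffs there (his first action pays $-1$ exactly when players 1 and 2 land in one $2\times 2$ block, his second action in the complementary block); this reading of the garbled payoff tables is the one forced by the stated values $v^3 = -1/4$ and $w^3 = -1/2$, and the function name is ours.

```python
# Player 3's best-response payoff when his first action pays -1 with
# probability p_tl (players 1-2 in the top-left block) and his second
# action pays -1 with probability p_br (bottom-right block) -- an assumed
# reading of Example 2.7's payoff tables.
def best_response_payoff(p_tl, p_br):
    return max(-p_tl, -p_br)

# Independent minmax v3: players 1 and 2 mix independently; p (resp. q) is
# the probability that player 1 (resp. 2) plays in the top half (resp.
# left half) of his own action set.
grid = [k / 200 for k in range(201)]
v3 = min(best_response_payoff(p * q, (1 - p) * (1 - q))
         for p in grid for q in grid)          # -> -1/4, at p = q = 1/2

# Correlated minmax w3: the correlation matrix M of Example 2.7 puts
# probability 1/2 on each block, so both actions of player 3 yield -1/2.
w3 = best_response_payoff(0.5, 0.5)            # -> -1/2
```

The grid search exhibits the gap $v^3 = -1/4 > w^3 = -1/2$: independent mixing cannot make both of player 3's actions bad at once, while correlation can.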
The following example shows that $v^i_\infty$ can actually be equal to $w^i$.

Example 2.7. Consider the three-player repeated game described below. Player 1 chooses the row, player 2 the column, and player 3 the matrix. The signal is given by

First action of player 3:        Second action of player 3:
      x  y  z  t                       x  y  z  t
   e  a  b  o  o                    e  α  β  o  o
   f  b  a  o  o                    f  β  α  o  o
   g  o  o  c  d                    g  o  o  γ  δ
   h  o  o  d  c                    h  o  o  δ  γ

and the payoffs for player 3 are

First action of player 3:        Second action of player 3:
      x   y   z   t                    x   y   z   t
   e  −1  −1  0   0                 e  0   0   0   0
   f  −1  −1  0   0                 f  0   0   0   0
   g  0   0   0   0                 g  0   0  −1  −1
   h  0   0   0   0                 h  0   0  −1  −1

In this game we have $v^3 = -1/4$ and $w^3 = -1/2$. The correlated minmax for player 3 is achieved by the correlation matrix $M$ on $S^1 \times S^2$,

      x    y    z    t
   e  1/8  1/8  0    0
   f  1/8  1/8  0    0
   g  0    0    1/8  1/8
   h  0    0    1/8  1/8

and a best response of player 3 is $(1/2, 1/2)$.

How do players 1 and 2 force player 3 to $w^3$? They play in such a way that at each stage, the expected distribution on $S^1 \times S^2$ conditional on the public history and on the past moves of player 3 is exactly $M$. Then the expected payoff for player 3 at each stage is at most $w^3$. Suppose that player 2 chooses $x$ or $y$ with probability $1/2$ each. Since player 1 knows his own move, the structure of the signal allows him to know whether player 2 played $x$ or $y$. Player 3, observing the resulting signal, still attributes a probability of $1/2$ to each of $x$ and $y$. We will thus condition the strategies of players 1 and 2 on the past moves of player 2. We represent $S^1 \times S^2$ as a $4 \times 4$ matrix and split it into four $2 \times 2$ submatrices: top-left, top-right, bottom-left, and bottom-right. The correlation procedure is as follows.

• At stage 1, both play $(1/2, 1/2)$ in the top-left submatrix.
  – If the move of player 2 was $x$ at stage 1, then at stage 2 they remain in the top-left square and play $(1/2, 1/2)$ again.
  – If the move of player 2 was $y$, then at stage 2 they switch to the bottom-right submatrix, where they both play $(1/2, 1/2)$.
• If, at stage $T$, they play in the top-left square, they apply the above procedure to determine their strategies at stage $T + 1$.
• If they play in the bottom-right square at stage $T$, then
  – if the move of player 2 is $z$, they remain in the bottom-right square;
  – if player 2 plays $t$, they switch to the top-left square.

When both adhere to this strategy, the correlation matrix $M$ is generated at each stage $T \ge 2$. The key argument here is that the move of player 2 is private information of players 1 and 2.

Note that any information known by players 2 and 3 is also known by player 1. Thus, given the public history and the past moves of player 1, the future moves of players 2 and 3 are independent. Therefore $v^1_\infty = v^1$ and, by a similar argument, $v^2_\infty = v^2$.

2.2. The information structure

We introduce now the signaling structure and its implications in terms of payoffs. We focus our attention on games with observable payoffs.

Definition 2.8. The payoff vector is observable if
$$\forall (s, s') \in S \times S,\quad g(s) \ne g(s') \Rightarrow \ell(s) \ne \ell(s').$$

Given this assumption, any profitable deviation is observable by all the players. However, the signal may not reveal the identity of a single deviator, but rather a set of potential deviators. It turns out that this whole set has to be punished simultaneously. We shall define new levels of punishment using vector payoffs. Only some subsets of $N$ deserve such an analysis, i.e., the subsets whose members can be suspected at the same time of having deviated. We describe these subsets now.

Definition 2.9. For all players $i$, $j$,
$$i \sim j \Leftrightarrow \exists (t^i, t^j) \in S^i \times S^j,\ \forall s \in S,\quad \ell(t^i, s^{-i}) = \ell(t^j, s^{-j}).$$

Players $i$ and $j$ are equivalent when each of them has an action ($t^i$ and $t^j$) that induces the same public signal whatever the joint action $s$ is. Two equivalent players cannot be differentiated by the others through the signal. We justify now that this is an equivalence relation. Reflexivity and symmetry being clear, we prove transitivity only. Take $i, j, k$ such that $\forall s \in S,\ \ell(t^i, s^{-i}) = \ell(t^j, s^{-j})$ and $\forall s \in S,\ \ell(r^j, s^{-j}) = \ell(r^k, s^{-k})$. Then
$$\ell(t^i, s^j, s^k, s^{-i-j-k}) = \ell(t^i, r^j, s^k, s^{-i-j-k}) = \ell(t^i, s^j, r^k, s^{-i-j-k})$$
and
$$\ell(s^i, s^j, r^k, s^{-i-j-k}) = \ell(s^i, t^j, r^k, s^{-i-j-k}) = \ell(t^i, s^j, r^k, s^{-i-j-k}).$$
Hence $\forall s \in S$, $\ell(t^i, s^{-i}) = \ell(r^k, s^{-k})$, so that $i \sim k$.
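The relation of Definition 2.9 is finite and can be checked mechanically. Below is a small Python sketch (the function names are ours) that tests equivalence of two players and computes the induced partition from a signal function given as a table; the signal of Example 3.2 below serves as an illustration.

```python
from itertools import product

def equivalent(i, j, actions, signal):
    """Definition 2.9: i ~ j iff some pair (t_i, t_j) induces the same
    public signal whatever the rest of the joint action is."""
    profiles = list(product(*actions))
    def repl(s, k, a):            # replace player k's component of s by a
        s = list(s); s[k] = a
        return tuple(s)
    return any(
        all(signal[repl(s, i, ti)] == signal[repl(s, j, tj)] for s in profiles)
        for ti in actions[i] for tj in actions[j])

def partition(actions, signal):
    """The partition of the player set induced by ~."""
    classes = []
    for i in range(len(actions)):
        for c in classes:
            if equivalent(i, c[0], actions, signal):
                c.append(i)
                break
        else:
            classes.append([i])
    return classes

# The signal of Example 3.2 below: a only at (e, x, L); b elsewhere under L;
# c everywhere under M; d everywhere under R.
actions = [['e', 'f'], ['x', 'y'], ['L', 'M', 'R']]
signal = {s: ('a' if s == ('e', 'x', 'L') else
              {'L': 'b', 'M': 'c', 'R': 'd'}[s[2]])
          for s in product(*actions)}
```

Here `partition(actions, signal)` returns `[[0, 1], [2]]`: players 1 and 2 are equivalent (via the pair $(f, y)$, which induces $b$, $c$, or $d$ regardless of the rest of the profile), while player 3 forms his own class.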
We denote by $\bar{N}$ the associated partition of $N$.

Definition 2.10. For all $M$ in $\bar{N}$ such that $|M| \ge 2$ and for all $i$ in $M$, put
$$T^i = \{t^i \in S^i : \forall j \in M\setminus i,\ \exists t^j \in S^j,\ \forall s \in S,\ \ell(t^i, s^{-i}) = \ell(t^j, s^{-j})\}.$$

$T^i$ is the set of actions of player $i$ such that the property of Definition 2.9 holds for each player $j$ equivalent to $i$. Remark that as soon as a member $i$ of $M$ chooses an action in $T^i$, the value of the signal does not depend on the action of any other member of $M$. Moreover, this value is the same for any $t^i \in T^i$. This is summarized by the following lemma.

Lemma 2.11. On $\bigcup_{i\in M} T^i \times S^{-i}$, the function $\ell$ depends only on $s^{-M} = (s^k)_{k\notin M}$.

Proof. We prove this lemma in two steps. First, take a player $i \in M$ and $t^i \in T^i$. For any $j$ in $M$, the signal does not depend on player $j$'s action as soon as $i$ plays $t^i$: since there is $t^j$ such that $\forall s \in S,\ \ell(t^i, s^{-i}) = \ell(t^j, s^{-j})$, we have $\forall s \in S,\ \ell(t^i, s^j, s^{-i-j}) = \ell(t^i, t^j, s^{-i-j})$. Hence the signal depends only on $t^i$ and on $s^{-M}$.

Second, we prove that the value of the signal is the same for all actions in $T^i$. Let $t^i, r^i \in T^i$, with actions $t^j, r^j$ of an equivalent player $j$ such that $\forall s \in S,\ \ell(t^i, s^{-i}) = \ell(t^j, s^{-j})$ and $\forall s \in S,\ \ell(r^i, s^{-i}) = \ell(r^j, s^{-j})$. Since $\ell(t^i, s^{-i})$ does not depend on $s^j$ and $\ell(r^j, s^{-j})$ does not depend on $s^i$,
$$\ell(t^i, s^{-i}) = \ell(t^i, r^j, s^{-i-j}) = \ell(r^i, s^j, s^{-i-j}) = \ell(r^i, s^{-i}).$$

A direct consequence of this lemma is that $\forall s^{-M} \in S^{-M}$, $\ell(\cdot, s^{-M})$ is constant on $\prod_{i\in M} T^i$. Moreover, if the payoff vector is observable, $g(\cdot, s^{-M})$ is also constant on $\prod_{i\in M} T^i$. We choose and fix $t^M \in \prod_{i\in M} T^i$. We can now define the generalized minmax levels for the coalitions $M$ in $\bar{N}$.

Definition 2.12.
For all $M$ in $\bar{N}$:
• if $M = \{i\}$,
$$V(M) = \{u \in \mathbb{R}^N : u^i \ge v^i_\infty\};$$
• if $|M| \ge 2$,
$$V(M) = \operatorname{co} g(\{t^M\} \times S^{-M}) + \mathbb{R}^M_+ \times \mathbb{R}^{N\setminus M}.$$

When $M$ contains at least two players, $V(M)$ is a generalized minmax level for the coalition $M$. The similarities with the usual minmax level are:
• first, a player $i$ in $M$ can guarantee that the payoff vector for $M$ will be in this set by playing any action in $T^i$;
• second, the payoff vector for $M$ can be held down to this set whatever the strategy of a player $i$ in $M$: as soon as another player $j$ in $M$ plays in $T^j$, player $i$ cannot control the payoff.

We shall consider in the sequel a condition on the signal for which the analysis is easier. Along the play, when the players observe a deviation, they can compute the set of players that possess an action compatible with the observed signal. If this set is an equivalence class for our relation, it can be punished to the generalized minmax previously defined. Take $s \in S$ and $a \in A$ and define $N(s, a)$ as the set of players who have an action inducing the signal $a$ against $s$. Formally,
$$N(s, a) = \{i \in N : \exists t^i \in S^i,\ \ell(t^i, s^{-i}) = a\}.$$

Condition C. $|N(s, a)| \ge 2$ and $\ell(s) \ne a\ \Rightarrow\ N(s, a) \in \bar{N}$.

The meaning of this condition is the following. The main idea in defining equilibrium strategies in this paper is that when a deviation occurs, each player computes a set of suspects [$N(s, a)$ if $s$ was to be played and $a$ was observed]. In full generality, this set may be any coalition. Our condition only allows the set of suspects to belong to a certain family of coalitions. This requirement is rather strong, but relaxing it makes it more difficult to assign a deviation to a particular subset of players. Namely, the set of suspects would then evolve during the play, making the punishing strategy more intricate to define. This problem can, however, be dealt with; see Tomala (1998) for a study in the pure strategy case. However, with general signals the very definition of the set of suspects is unclear. This is mainly due to mixed strategies and to the fact that players cannot predict their opponents' moves with certainty. We are here in the special case of an observable payoff vector, where we can always find an equilibrium strategy which is pure on the equilibrium path (see the proof of the theorem). Nevertheless, Examples 2.13, 2.14, and 3.2 exhibit natural signaling functions satisfying C.

Example 2.13. We say that $\ell$ is rectangular if for all $a$ in $A$, the inverse image $\ell^{-1}(a)$ is a direct product. This implies that for all players $i$ and $j$,
$$\ell(t^i, s^{-i}) = \ell(t^j, s^{-j}) = a\ \Rightarrow\ \ell(s) = a.$$
In this case, our equivalence relation has very strong properties. If $i \sim j$, then either $i = j$ or, if $i \ne j$, neither $i$ nor $j$ can influence the signal. Furthermore $T^i = S^i$ and $T^j = S^j$. Thus, if $M$ contains at least two players,
$$V(M) = \operatorname{co} g(S) + \mathbb{R}^M_+ \times \mathbb{R}^{N\setminus M}.$$
Condition C holds for a rectangular signal, since a situation where $\ell(t^i, s^{-i}) = \ell(t^j, s^{-j}) \ne \ell(s)$ is impossible. The interest of rectangular signals is the following. Consider a repeated game where the one-shot game is in extensive form and where the signal associated to a joint action (i.e., a joint strategy in the extensive form game) is the unique terminal node of the underlying tree. It is proven in Abdou (1994) that this mapping is rectangular. A repeated extensive game with observation of the terminal node is a natural case to analyze. Furthermore, in this setup the payoff vector is observable.

Example 2.14. A signal for which C is verified and for which our equivalence notion is not trivial is the following. We endow the set of players with a partition $\bar{N}$ and divide each player's action set into two subsets: for each $i$, $S^i = T^i \cup R^i$ with $T^i$ nonempty. The public signal $\ell$ is as follows: for each $M \in \bar{N}$, we are given a map $\ell_M$ on $S^M$, and $\ell(s) = (\ell_M(s^M))_{M\in\bar{N}}$. The definition of $\ell_M(s^M)$ is the following.
• If for all $i \in M$, $s^i \in R^i$: $\ell_M(s^M) = s^M$;
• If there is $i \in M$ such that $s^i \in T^i$: $\ell_M(s^M) = 0_M$, where $0_M$ is a blank signal.

For $i \in M$, an action $r^i \in R^i$ is called revealing, since if all players in $M$ play a revealing action, the signal is the joint action. Otherwise, if a player $i$ in $M$ plays a hiding action $t^i \in T^i$, the signal reveals nothing. Remark that the equivalence relation of Definition 2.9 leads to the same partition and to the same sets $T^i$, and that this signal function satisfies C.

3. THE MAIN THEOREM

We are now ready to state the leading result of this paper.

Theorem 3.1.
(i) In a repeated game with observable payoff vector,
$$E_\infty \subset \operatorname{co} g(S) \cap IR_\infty \cap \bigcap_{M\in\bar{N}} V(M).$$
(ii) Under condition C,
$$E_\infty = \operatorname{co} g(S) \cap IR_\infty \cap \bigcap_{M\in\bar{N}} V(M).$$

Remark that the theorem does not give a completely computable expression of the set of equilibrium payoffs, since the repeated game minmax levels are not characterized in terms of the one-shot game. However, since $w^i \le v^i_\infty \le v^i$, we know that a feasible and individually rational (in the sense of the $v^i$'s) payoff is an equilibrium payoff if and only if it belongs to all the $V(M)$'s.

Example 3.2. Consider a three-player repeated game where players 1 and 2 have two actions and player 3 has three actions. Player 1 chooses the row, player 2 the column, and player 3 the matrix. The signaling function is given by

      x  y        x  y        x  y
   e  a  b     e  c  c     e  d  d
   f  b  b     f  c  c     f  d  d
      L           M           R

and the payoff function by

      x      y             x      y             x      y
   e  1,1,1  4,4,0      e  0,3,0  0,3,0      e  3,0,0  3,0,0
   f  4,4,0  4,4,0      f  0,3,0  0,3,0      f  3,0,0  3,0,0
      L                    M                    R

Condition C is verified. The vector $(1, 1, 1)$ is feasible and individually rational, since for each player $i$, $v^i = w^i = v^i_\infty = 0$. At any stage where the signal $a$ is supposed to be observed, players 1 and 2 can profitably deviate by inducing the signal $b$. In this case, player 3 will not know who deviated. Hence, whom should he punish: player 1, by playing $M$ and thereby rewarding player 2, or player 2, by playing $R$ and thereby rewarding player 1? Player 3 should punish both simultaneously. We have here:
• $\bar{N} = \{\{1, 2\}, \{3\}\}$;
• $V(\{1, 2\}) = \operatorname{co}\{(4, 4, 0), (0, 3, 0), (3, 0, 0)\} + \mathbb{R}^2_+ \times \mathbb{R} = \{u \in \mathbb{R}^2_+ \times \mathbb{R} : u^1 + u^2 \ge 3\}$;
• $E_\infty = \operatorname{co} g(S) \cap IR_\infty \cap \{u \in \mathbb{R}^2_+ \times \mathbb{R} : u^1 + u^2 \ge 3\}$.

We turn now to the proof of Theorem 3.1.

[Figure 1]

Proof of (i). It is enough to prove that for $M \in \bar{N}$ with $|M| \ge 2$, $E_\infty \subset V(M)$. Let $u \in \operatorname{co} g(S)$ and $\sigma \in \Sigma$ such that $\gamma(\sigma)$ exists and equals $u$. We prove that $u \notin V(M) \Rightarrow u \notin E_\infty$. The proof is divided into two arguments. We first prove that every player $i$ in $M$ has a deviation $\tau^i$ inducing against $\sigma^{-i}$ a payoff in $V(M)$. Second, we deduce from this fact that $\sigma$ is not a uniform equilibrium.

For any $i \in M$, define $\tau^i$ as the pure strategy of player $i$ which plays at each stage $t^M(i)$, the action of player $i$ in the $M$-tuple $t^M$, regardless of the history. Since, for all $i, j \in M$ and $s$ in $S$, $\ell(t^M(i), s^{-i}) = \ell(t^M(j), s^{-j})$, the profiles $(\tau^i, \sigma^{-i})$ and $(\tau^j, \sigma^{-j})$ induce the same distributions of public signals. For $\tau$ a joint behavior strategy, let $Q_\tau$ be the probability induced by $\tau$ on $A^\infty$, the set of infinite public histories. For all $i, j \in M$, $Q_{(\tau^i, \sigma^{-i})} = Q_{(\tau^j, \sigma^{-j})}$. The payoff vector, being observable, depends on the signal only, and therefore
$$\gamma^i_T(\tau^i, \sigma^{-i}) = \int \bar{g}^i_T \, dQ_{(\tau^i, \sigma^{-i})} = \int \bar{g}^i_T \, dQ_{(\tau^j, \sigma^{-j})} = \gamma^i_T(\tau^j, \sigma^{-j}).$$

Denote by $g_t$ the random payoff vector at stage $t$ and by $\bar{g}_T$ the average of $(g_t)_{t\le T}$. Put $\gamma^i_T = \gamma^i_T(\tau^i, \sigma^{-i}) = \gamma^i_T(\tau^j, \sigma^{-j})$ and $\gamma_T = (\gamma^1_T, \dots, \gamma^n_T)$. For all $i$ in $M$, under $(\tau^i, \sigma^{-i})$, $g_t \in g(\{t^M\} \times S^{-M})$ at each stage $t$. Hence $\bar{g}_T \in \operatorname{co} g(\{t^M\} \times S^{-M})$. Because the latter set is convex, we take the expectation and find $\gamma_T \in \operatorname{co} g(\{t^M\} \times S^{-M}) \subset V(M)$.

It is then impossible for $\sigma$ to be a uniform equilibrium. Since $u \notin V(M)$, which is closed and convex, there is $\varepsilon > 0$ such that for all $u' \in V(M)$ there is $i \in M$ with $u^i < u'^i - \varepsilon$. Hence, for every stage $T$, there is $i \in M$ who has an $\varepsilon$-profitable deviation, that is, $\gamma^i(\sigma) < \gamma^i_T - \varepsilon$.

Proof of (ii). We prove that, under condition C, $\operatorname{co} g(S) \cap IR_\infty \cap \bigcap_{M\in\bar{N}} V(M) \subset E_\infty$. Let $u \in \operatorname{co} g(S) \cap IR_\infty \cap \bigcap_{M\in\bar{N}} V(M)$.
We will construct a uniform equilibrium $\sigma$ with payoff $u$. As usual in repeated games with complete information, this strategy will consist of a main path to be followed by all players and of punishments in case of deviation. If player $i$ is identified as the deviator, he is punished to his minmax; if the set of possible deviators is a coalition $M$, its members are punished simultaneously.

Fix $h^* = (s^*_t)_{t=1}^\infty$, an infinite play which leads to $u$, i.e.,
$$\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T g(s^*_t) = u.$$
This play will be referred to as the main path. Let $\alpha^* = (\ell(s^*_t))_{t=1}^\infty$ be the public history associated with $h^*$, and denote by $\alpha^*_T = (\ell(s^*_t))_{t=1}^T$ the public history of length $T$. Let us describe now the punishments.

• For each pair of players $i, j$, let $\sigma^i(j)$ be the component for player $i$ of the punishing strategy against player $j$ given by Definition 2.5.
• For each $M \in \bar{N}$ with $|M| \ge 2$, there is $\pi(M) \in \operatorname{co} g(\{t^M\} \times S^{-M})$ such that for all $i \in M$, $u^i \ge \pi^i(M)$. Fix a play $h(M) = (s_t(M))_{t\ge 1}$ leading to $\pi(M)$, i.e., $\lim_{T\to\infty} \frac{1}{T}\sum_{t=1}^T g(s_t(M)) = \pi(M)$, and such that for all $t$, $s_t(M) \in \{t^M\} \times S^{-M}$.

We are ready now to define the strategy $\sigma^i$ for player $i$. Let $h^i_T = (s^i_t, a_t)_{t\le T}$ be a history of length $T$ for player $i$ and $\alpha_T = (a_t)_{t\le T}$ the public part of this history. We define $\sigma^i_{T+1}(h^i_T)$ for all $T$.

• If $\alpha_T = \alpha^*_T$: $\sigma^i_{T+1}(h^i_T) = s^{*i}_{T+1}$ (player $i$ follows the main path).
• If $\alpha_T \ne \alpha^*_T$, put $p = \inf\{t : \alpha_t \ne \alpha^*_t\}$; there is $a \in A$ such that $\alpha_p = (\alpha^*_{p-1}, a)$. Compute $N(s^*_p, a)$.
  – If $N(s^*_p, a) = \{j\}$, then $\sigma^i_{p+T+1}(h^i_{p+T}) = \sigma^i_T(j)((h^i_{p+T})_T)$, where $(h^i_{p+T})_T$ is the $i$-history of length $T$ obtained from $h^i_{p+T}$ by suppressing the $p$ first observations. (Player $i$ punishes forever the only possible deviator, and his punishing strategy starts at time $p + 1$.)
  – If $|N(s^*_p, a)| \ge 2$, then from condition C, $N(s^*_p, a) = M \in \bar{N}$, and $\sigma^i_{p+T+1}(h^i_{p+T}) = s^i_{T+1}(M)$. (If the deviator is a member of $M$, the whole subset $M$ is held down to $\pi(M)$ forever.)

We prove now that $\sigma$ is a uniform equilibrium with payoff $u$. If all players adhere to this strategy, it is clear that $\gamma(\sigma)$ is well defined and equals $u$. It remains to show that $\sigma$ is an $\varepsilon$-equilibrium in the $T$-fold repeated game for $T$ greater than some $T_0$. Let $\varepsilon > 0$; we first construct the associated $T_0$. From the properties of convergence of the payoff on the equilibrium path and in the punishment phases, there is an integer $K$ such that for $T \ge K$, for each player $i$ and subset $M \in \bar{N}$:
• the payoff received by following the equilibrium path for at least $T$ stages is within $\varepsilon$ of $u^i$;
• being punished by the strategy $\sigma^{-i}$ for at least $T$ stages yields a maximal payoff less than $v^i_\infty + \varepsilon$;
• if $i$ is a member of $M$, being punished as such for at least $T$ stages yields a payoff less than $\pi^i(M) + \varepsilon$.

$C$ being an upper bound for all payoffs appearing in the one-shot game, choose $T_0 \ge \max\{(K+1)/\varepsilon,\ C/\varepsilon,\ 2K\}$ and $T \ge T_0$. Take a player $i$ and let $\tau^i \in \Sigma^i$ be a pure strategy. If the deviation never changes the signal, it never changes the payoff either, and it is not profitable. Otherwise, let $p$ be the first stage where a deviation is observed.

• If at stage $p$ player $i$ is identified as the only possible deviator, then his payoff is at most
$$\frac{p-1}{T} u^i_{p-1} + \frac{C}{T} + \frac{T-p}{T} v^i_{T-p},$$
where $u^i_{p-1}$ is the average payoff for player $i$ when the equilibrium path is followed until stage $p - 1$ and $v^i_{T-p}$ is the maximal payoff for player $i$ when he is punished from stage $p + 1$ to stage $T$.
  – If $p - 1 \le K$, then $(p-1)/T \le \varepsilon$ and $T - p \ge T - K \ge K + 1$. Then the payoff for player $i$ is at most $2C\varepsilon + v^i_\infty + \varepsilon \le u^i + \varepsilon(1 + 2C)$.
  – If $p - 1 \ge T - K$, then $(T-p)/T \le \varepsilon$ and $p - 1 \ge T - K \ge K$. Again the payoff for $i$ is at most $u^i + \varepsilon(1 + 2C)$.
  – If $K < p - 1 < T - K$, then the equilibrium path and the punishment phase are each followed for at least $K$ stages. Thus $u^i_{p-1}$ and $v^i_{T-p}$ are less than $u^i + \varepsilon$, and therefore the average payoff is less than $u^i + \varepsilon(1 + C)$.
• If at stage $p$ the set of potential deviators is $M$ with $i \in M$, the maximal average payoff for player $i$ is
$$\frac{p-1}{T} u^i_{p-1} + \frac{C}{T} + \frac{T-p}{T} \pi^i_{T-p},$$
where $\pi^i_{T-p}$ is the payoff for player $i$ when he is punished from stage $p + 1$ to stage $T$ as a member of $M$. Then the same calculations as above can be made.

We find that for all $\varepsilon > 0$ there is $T_0$ such that for $T \ge T_0$, for each player $i$ and each pure strategy $\tau^i$, $\gamma^i_T(\tau^i, \sigma^{-i}) \le u^i + \varepsilon(1 + 2C)$. If $\mu^i$ is a mixed strategy for player $i$, taking the expectation of this inequality with respect to $\mu^i$ shows that it also holds for any mixed deviation. Hence, $\sigma$ is a uniform equilibrium.

Corollary 3.3. In a repeated extensive form game where the terminal node is publicly observable,
$$E_\infty = \operatorname{co} g(S) \cap IR_\infty.$$
This is a direct consequence of Example 2.13 and Theorem 3.1.

4. CONCLUDING REMARKS

4.1. Discounted Games

Our results extend easily to Nash equilibria of discounted games with a low discount factor. The same arguments as in the proof of Theorem 3.1(i) show that any Nash payoff of the discounted game lies in all the $V(M)$'s. Each strictly individually rational feasible payoff vector that lies in the intersection of the interiors of the $V(M)$'s is a Nash payoff for a low enough discount factor, provided that such a vector exists. The construction of the equilibrium strategy is basically the same as in the proof of 3.1(ii).

4.2. Extensions

Further Questions on This Model. It would be natural to extend the study to finitely repeated games, but the key to the folk theorem for finitely repeated games (Benoit and Krishna, 1987) is that for each player there is a Nash equilibrium of the one-shot game at which he receives strictly more than his minmax level. Ending the play with such Nash equilibria then induces a negative profit in case of deviation. The difficulty here is that the payoff received by a subset of players $M$ when it is punished depends on the equilibrium payoff.
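The strategy just constructed is a trigger scheme driven entirely by the public history. Its decision layer can be sketched as follows (a minimal Python sketch; the names are ours, and `suspects` stands for the map $(p, a) \mapsto N(s^*_p, a)$ of the construction above):

```python
def next_mode(public_history, main_signals, suspects):
    """Decide the current phase of sigma^i: follow the main path until the
    first stage p where the observed signal differs from alpha*, then switch
    forever to the punishment indexed by the set of suspects N(s*_p, a)."""
    for p, (a, a_star) in enumerate(zip(public_history, main_signals)):
        if a != a_star:
            M = suspects[(p, a)]
            # |M| == 1: minmax the identified deviator (Definition 2.5);
            # |M| >= 2: condition C gives M in bar-N, so play h(M) forever.
            return ('punish', frozenset(M), p)
    return ('main', None, None)
```

In Example 3.2, with the main path $(e, x, L)$ repeated, observing $b$ at some stage $p$ yields `('punish', frozenset({1, 2}), p)`, after which player 3 holds players 1 and 2 down to $\pi(\{1, 2\})$.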
Therefore, a generalization of the condition given in Benoit and Krishna (1987) would be that for each $M \in \bar{N}$, there is a one-shot Nash equilibrium payoff whose coordinates in $M$ are strictly greater than those of any payoff in $\operatorname{co} g(\{t^M\} \times S^{-M})$. It is easy to check on Example 3.2 that this is much too strong a requirement. Some new strategic ideas have to be developed here.

Relaxing Our Assumptions. Considering games with an unobservable payoff vector implies that there will be some undetectable and profitable static deviations. This issue has been studied in detail in Lehrer (1990), where in particular it is shown that the equilibrium strategies may not be pure on the equilibrium path. Then some statistical inference on the set of suspects has to be performed; up to now this problem is not solved.

Relaxing condition C implies that after a first deviation, the set of suspects may not be a class of our equivalence relation. This has two consequences. First, the definition of punishing strategies against a subset of players becomes much more intricate. Second, some additional information on the identity of the deviator may be revealed along the play, and the set of suspects evolves with time. We have given in Tomala (1998) a solution to this problem for a general public signal in the case of pure strategies (with compact action spaces).

REFERENCES

Abdou, J. (1994). "Rectangularity and Tightness: A Normal Form Characterization of Perfect Information Game Forms," mimeo.
Benoit, J.-P., and Krishna, V. (1987). "Nash Equilibria of Finitely Repeated Games," Int. J. Game Theory 16, 197–204.
Ben-Porath, E., and Kahneman, M. (1996). "Communication in Repeated Games with Private Monitoring," J. Econ. Theory 70, 281–298.
Fudenberg, D., and Levine, D. (1991). "An Approximate Folk Theorem with Imperfect Private Information," J. Econ. Theory 54, 26–47.
Fudenberg, D., Levine, D., and Maskin, E. (1994). "The Folk Theorem with Imperfect Public Information," Econometrica 62, 997–1039.
Lehrer, E. (1990). "Nash Equilibria of n-Player Repeated Games with Semi-Standard Information," Int. J. Game Theory 19, 191–217.
Lehrer, E. (1991). "Internal Correlation in Repeated Games," Int. J. Game Theory 19, 431–456.
Sorin, S. (1992). "Repeated Games with Complete Information," Chap. 4 of Handbook of Game Theory with Economic Applications (R. Aumann and S. Hart, Eds.), Vol. 1, pp. 71–107. Amsterdam: North-Holland.
Tomala, T. (1998). "Pure Equilibria of Repeated Games with Public Observation," Int. J. Game Theory 27, 93–109.