MA300.2 Game Theory II, LSE
Summary of Lecture 4
More on Collusion and Punishments in Repeated Games

1. Punishments More Severe Than Nash Reversion

In the previous section we provided a detailed example of a "real-life" game — the Cournot oligopoly — in which the one-shot Nash equilibrium fails to push firms down to their security level. This raises the question of whether more severe credible punishments are available.

Example. [Osborne, p. 456.] Consider the following game, in which player 1 chooses columns, player 2 chooses rows, and payoffs are listed as (player 1, player 2):

          A1       B1       C1
A2      4, 4     0, 3     0, 1
B2      3, 0     2, 2     0, 1
C2      1, 0     1, 0     0, 0

The unique Nash equilibrium of the above game involves both parties playing A. But it is easy to check that each player's security level is 1. So are there equilibrium payoffs "in between"? Consider the following strategy profile, described in two phases:

The Ongoing Path. (Phase O) Play (B1, B2) at every date.
The Punishment Phase. (Phase P) Play (C1, C2) for two periods; then return to Phase O.

Start with Phase O. If there is any deviation, start up Phase P. If there is any deviation from that, start Phase P again.

To check whether this strategy profile forms a SGPE, it suffices to check one-shot deviations. Phase O yields a (normalized) lifetime payoff of 2. A deviation gets her the payoff

(1 − β)[3 + β·0 + β²·0] + β³·2.

Noting that 1 − β³ = (1 − β)(1 + β + β²), we see that a deviation in Phase O is not worthwhile if

2 + 2β + 2β² ≥ 3,

or β ≥ (√3 − 1)/2.

What about the first date of Phase P? Lifetime utility in this phase is 2β² (why?). If she deviates she can get 1 today, and then the phase is started up again. So deviation is not worthwhile if

2β² ≥ (1 − β)·1 + β·2β²,     (1)

or if β ≥ √2/2. This is a stronger restriction than the one for Phase O, so hold on to this one.

Finally, notice without doing the calculations that it is harder to gain by deviating at date 2 of Phase P (why?). So these strategies form a SGPE if β ≥ √2/2.

Several remarks are of interest here.

1. The equilibrium payoff from this strategy profile is 2. But in fact, the equilibrium bootstraps off another equilibrium: the one that actually starts at Phase P. The return to that equilibrium is even lower: it is 2β².

2. Indeed, at the lowest value of β for which this second equilibrium is sustainable, the equilibrium exactly attains the minimax value for each player! And so everything that can conceivably be sustained in this example can be done with this punishment equilibrium, at least at this threshold discount factor.

3. Notice that the ability to sustain this security value as an equilibrium payoff is not exactly "monotonic" in the discount factor. In fact, if the discount factor rises a bit above the minimum threshold you cannot find an equilibrium with security payoffs. But this is essentially an integer problem — you can punish for two periods, but the discount factor may not be "good enough" for a three-period punishment. Ultimately, as the discount factor becomes close to 1 we can edge arbitrarily close to the security payoff and stay in that close zone; this insight will form the basis of the celebrated folk theorem.
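These thresholds are easy to verify numerically. The following minimal sketch (my own illustration, not part of the original notes) hard-codes the payoffs of the game above and checks the two one-shot deviation conditions on a few discount factors, confirming the cutoffs (√3 − 1)/2 ≈ 0.366 and √2/2 ≈ 0.707.

```python
# Sanity check (illustration only) of the one-shot deviation conditions
# for the two-phase strategy in the Osborne example.
import math

def phase_O_ok(beta):
    # Stay on the path: normalized payoff 2. Deviate: 3 today, two
    # punishment periods of 0, then back to the path worth 2.
    stay = 2.0
    deviate = (1 - beta) * (3 + beta * 0 + beta**2 * 0) + beta**3 * 2
    return stay >= deviate

def phase_P_date1_ok(beta):
    # Comply: two periods of 0, then 2 forever => normalized value 2*beta^2.
    # Deviate: 1 today, then the two-period punishment restarts.
    comply = 2 * beta**2
    deviate = (1 - beta) * 1 + beta * (2 * beta**2)
    return comply >= deviate

for beta in [0.30, 0.40, 0.70, 0.75]:
    print(beta, phase_O_ok(beta), phase_P_date1_ok(beta))

# The analytical thresholds from the text, for comparison:
print((math.sqrt(3) - 1) / 2, math.sqrt(2) / 2)  # ~0.366, ~0.707
```

Running this shows Phase O surviving deviations already at β = 0.40 while the Phase P condition only kicks in between 0.70 and 0.75, exactly as the algebra says.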
Example. [Abreu.] Here is a simple, stripped-down version of the Cournot example in which we can essentially try out the same sort of ideas. The nice feature of this example (in contrast to the previous one, the role of which was purely pedagogical) is that it has some collusive outcome better than the Nash which players are trying to sustain. As before, player 1 chooses columns, player 2 chooses rows, and payoffs are listed as (player 1, player 2):

           L1         M1          H1
L2      10, 10      15, 3       7, 0
M2       3, 15       7, 7       5, −4
H2       0, 7       −4, 5     −15, −15

Think of L, M and H as low, medium and high outputs respectively. Now try and interpret the payoffs to your satisfaction. Notice that each player's maximin payoff is 0, but of course, no one-shot Nash equilibrium achieves this payoff.

1. You can support the collusive outcome using Nash reversion. To check when this works, notice that sticking to collusion gives 10, while the best deviation followed by Nash reversion yields (1 − β)15 + β7. It is easy to see that this strategy profile forms an equilibrium if and only if β ≥ 5/8. For lower values of β, Nash reversion will not work.

2. But here is another one that works for somewhat lower values of β. Start with (L1, L2). If there is any deviation, play (H1, H2) once and then revert to (L1, L2). If there is any deviation from that, start the punishment up again. Check this out. The punishment value is

−15(1 − β) + 10β ≡ p,     (2)

and so the no-deviation constraint in the punishment phase is p ≥ (1 − β)0 + βp, or p ≥ 0. This yields the condition β ≥ 3/5. What about the collusive phase? In that phase, the no-deviation condition tells us that

10 ≥ (1 − β)15 + βp,

but (2) assures us that this restriction is always satisfied (why?). So the collusive phase is not an issue, and our restriction is indeed β ≥ 3/5, the one that's needed to support the punishment phase.

3. For even lower values of β, the symmetric punishment described above will not work. But here is something else that will: punishments tailored to the deviator! Think of two punishment phases, one for player 1 and one for player 2. The punishment phase for player i (where i is either 1 or 2, and j is the other player) looks like this:

(Mi, Hj); (Li, Mj), (Li, Mj), (Li, Mj), . . .

Now we have to be more careful in checking the conditions on the discount factor. First write down the payoffs to players i and j from punishment phase Pi, the one that punishes i. The "punishee" player i gets −4 in stage 1 and 3 in each stage thereafter, for a value of

p ≡ −4(1 − β) + 3β,

while the "punisher" player j gets 5 in stage 1 and 15 in each stage thereafter, for a value of 5(1 − β) + 15β.

Now, if i deviates in the first stage of his punishment he gets 0 and then is punished again. So the no-deviation condition is p ≥ (1 − β)0 + βp, or just plain p ≥ 0, which yields the restriction β ≥ 4/7.

What if i deviates in some future stage of his punishment? The condition there is

3 ≥ (1 − β)7 + βp = (1 − β)7 − β(1 − β)4 + β²·3,

but it is easy to see that this is taken care of by the β ≥ 4/7 restriction.

Now we must check j's deviation from i's punishment! In the second and later stages there is nothing to check (why?). In stage 1, the condition is

5(1 − β) + 15β ≥ (1 − β)7 + βp.

[Notice how j's punishment is started off if she deviates from i's punishment!] Compared with the previous inequality, this one is easier to satisfy.

Finally, we must see that no deviation is profitable from the original cooperative path. This condition is just

10 ≥ (1 − β)15 + βp,

and reviewing the definition of p we see that no further restrictions on β are called for.

4. Can we do still better? We can! The following punishment exactly attains the minimax value for each agent for all β ≥ 8/15. To punish player i, simply play the path

(Li, Hj); (Li, Hj), . . .

forever. Notice that this pushes player i down to minimax. Moreover, player i cannot profitably deviate from this punishment. But player j can! The point is, however, that in that case we will start punishing player j with the corresponding path (Lj, Hi); (Lj, Hi), . . ., which gives her zero. All we need to do now is to check that a one-shot deviation by j is unprofitable. Given the description above, this is simply the condition that

7 ≥ (1 − β)15 + β·(punishment payoff) = (1 − β)15.

This condition is satisfied for all β ≥ 8/15.
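As a quick sanity check on all four schemes, here is a small numerical sketch (mine, not part of the notes) that evaluates each set of no-deviation constraints at a few discount factors straddling the claimed thresholds 8/15 ≈ 0.533, 4/7 ≈ 0.571, 3/5 = 0.6 and 5/8 = 0.625.

```python
# Each function returns True when no one-shot deviation is profitable
# at discount factor beta, for the corresponding punishment scheme.

def nash_reversion_ok(beta):
    # Collude at (L,L) for 10; best deviation earns 15 once, then
    # Nash reversion to (M,M) worth 7 forever.
    return 10 >= (1 - beta) * 15 + beta * 7                  # beta >= 5/8

def symmetric_stick_ok(beta):
    # One period of (H,H) at -15, then back to (L,L): value p.
    p = -15 * (1 - beta) + 10 * beta
    punish_ok = p >= (1 - beta) * 0 + beta * p               # beta >= 3/5
    collude_ok = 10 >= (1 - beta) * 15 + beta * p
    return punish_ok and collude_ok

def tailored_ok(beta):
    # Punish i with (M_i, H_j) once, then (L_i, M_j) forever.
    p = -4 * (1 - beta) + 3 * beta                           # punishee's value
    stage1_i = p >= (1 - beta) * 0 + beta * p                # beta >= 4/7
    later_i = 3 >= (1 - beta) * 7 + beta * p
    stage1_j = 5 * (1 - beta) + 15 * beta >= (1 - beta) * 7 + beta * p
    collude = 10 >= (1 - beta) * 15 + beta * p
    return stage1_i and later_i and stage1_j and collude

def minimax_ok(beta):
    # Punish i with (L_i, H_j) forever; only j's deviation binds.
    return 7 >= (1 - beta) * 15                              # beta >= 8/15

for beta in [0.52, 0.55, 0.58, 0.61, 0.63]:
    print(beta, nash_reversion_ok(beta), symmetric_stick_ok(beta),
          tailored_ok(beta), minimax_ok(beta))
```

The printout shows each scheme switching on in the order the text derives: the player-specific minimax punishment works first, then the tailored punishment, then the symmetric carrot-and-stick, and Nash reversion last.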
So you see that in general, we can punish more strongly than Nash reversion, and what is more, there is a variety of such punishments, all involving either a nonstationary time structure ("carrot-and-stick", as in part 2) or a family of player-specific punishments (as in part 4) or both (as in part 3). This leads to the Pandora's Box of too many equilibria. The repeated game, in its quest to explain why players cooperate, also ends up "explaining" why they might fare even worse than one-shot Nash!

2. The Folk Theorem

It turns out that the above observations can be generalized substantially, provided players are patient enough. Recall the one-shot game G, and look at the set of all payoff vectors p ∈ ℝⁿ such that p = f(a) for some action profile a. This is the set of all feasible payoffs; call it F. Define F* to be the convex hull of F. Notice that normalized payoffs in the infinitely repeated game generate values in F*. [Explain this by drawing the set of feasible payoffs for the PD or for a coordination game.]

2.1. The Two-Player Folk Theorem.

First assume that n = 2. Then the following result is true.

Theorem 1. [Two-Player Folk Theorem.] Consider any payoff vector p ∈ F* such that p ≫ m (that is, pᵢ > mᵢ for each i), where m is the vector of security levels. Then for any ε > 0 there exists a threshold discount factor β* such that if all players are more patient than β*, then a payoff vector ε-close to p is sustainable as a subgame perfect equilibrium payoff outcome.

Proof. The proof involves some subtleties which are best absorbed step by step. So in the first, formal part of the proof I am going to assume that there is an action profile a with payoff f(a) exactly equal to p. Then I indicate later how to extend the argument.

For player i, let âᵢ be an action that succeeds in minimaxing his opponent, player j. That is, âᵢ minimizes dⱼ(aᵢ) over different choices of aᵢ. Denote by â the vector of these two actions (â₁, â₂). Also, denote by M the largest possible payoff in the one-shot game.

The following claim is subtle and crucial.

Claim. There exists β* and a length of time T (an integer) such that for all β ≥ β*,

pᵢ > (1 − β)M + βp̃ᵢ,     (3)

where

p̃ᵢ = (1 − β^T)fᵢ(â) + β^T pᵢ,     (4)

with

p̃ᵢ > mᵢ for i = 1, 2.     (5)

To see why such a β* and T must exist, first substitute (4) into the right-hand side of (3) to get the expression

h(β) ≡ (1 − β)M + β(1 − β^T)fᵢ(â) + β^(T+1) pᵢ,

and notice that this equals pᵢ when β exactly equals 1. I want the derivative of this expression with respect to β to be positive when evaluated at β = 1. (You'll see why in a minute.) Take the derivative first:

h′(β) = −M + [1 − (T + 1)β^T]fᵢ(â) + (T + 1)β^T pᵢ,

so that

h′(1) = −M + fᵢ(â) + (T + 1)[pᵢ − fᵢ(â)],

which can be guaranteed to be positive for T chosen large enough. Now observe that for β close enough to 1, (3)–(5) all hold (the last because pᵢ > mᵢ, and so p̃ᵢ > mᵢ for β sufficiently close to 1).
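To make the Claim concrete, here is a small numerical sketch (my own; the particular numbers are borrowed from the Abreu example above purely for illustration, with M = 15, fᵢ(â) = −15 at the mutual minimax (H, H), target pᵢ = 10 and security level mᵢ = 0). For a given horizon T it scans for the threshold discount factor above which (3) and (5) hold.

```python
# Illustration of the Claim with payoff numbers taken from the Abreu
# example above (an assumption for illustration, not part of the proof).
M, f_hat, p, m = 15.0, -15.0, 10.0, 0.0

def conditions_hold(beta, T):
    p_tilde = (1 - beta**T) * f_hat + beta**T * p      # equation (4)
    cond3 = p > (1 - beta) * M + beta * p_tilde        # equation (3)
    cond5 = p_tilde > m                                # equation (5)
    return cond3 and cond5

# For each fixed T, find the smallest beta on a grid at which (3)-(5) hold.
for T in [1, 2, 5]:
    betas = [b / 1000 for b in range(1, 1000)]
    ok = [b for b in betas if conditions_hold(b, T)]
    print(T, min(ok) if ok else "no beta works")
```

With these numbers even T = 1 works, because pᵢ − fᵢ(â) = 25 is large, so h′(1) = 25T − 5 > 0 already at T = 1; longer punishments only push the required β* upward, which echoes the "integer problem" remark from Section 1.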
Now consider the following strategy profile. Begin by playing some action profile a such that f(a) = p. If there is any unilateral deviation, play the mutual minimaxing action profile â for T periods, and then start up a again. If there is any deviation from that, simply restart the T-period "punishment phase" all over again.

To see that this is SGP, first consider the initial phase. If player i deviates, she gets

(1 − β)dᵢ(a) + β·(punishment payoff) ≤ (1 − β)M + βp̃ᵢ,

and this inequality follows because dᵢ(a) ≤ M (the maximum payoff) and p̃ᵢ is precisely i's payoff from punishment (examine (4)). But (3) tells us that the right-hand side of the above inequality is itself no larger than pᵢ, and so we must conclude that there is no profitable one-shot deviation for i during the initial phase.

What about the punishment phase? Well, notice that it suffices to check deviations at the very first date of the punishment phase (why?). But at that date all that i can get is mᵢ, because player j is minimaxing him! Consequently, the required no-deviation constraint is

p̃ᵢ ≥ (1 − β)mᵢ + βp̃ᵢ,

where the second term on the right-hand side simply follows from the fact that we start the punishment up again. But this inequality follows right away from (5), and by the one-shot deviation principle, we are done.

What follows is a precise but somewhat informal description of how the argument is extended to the case in which there is no action profile that exactly hits p. In this case, there certainly is a finite number of action profiles (no more than three, actually, for 2 players) for which some convex combination of their payoffs equals p. Call these action profiles a¹, a² and a³, with associated payoffs p¹, p² and p³, so that for some nonnegative weights (λ₁, λ₂, λ₃),

λ₁p¹ + λ₂p² + λ₃p³ = p.

Now we will have to choose β* a little more tightly. Certainly, (3)–(5) will have to be satisfied as before, but now we need some more properties. We are going to play (along the "initial phase") a¹, a² and a³ in rotation, with relative time spent in each roughly proportional to (λ₁, λ₂, λ₃). If we then take β very close to one, the overall payoff generated will be very close to p. Indeed, the overall lifetime payoff for each player, no matter which part of the "rotation" we are in, will be very close to p. Now we replace the initial phase in the formal part of the proof above by the rotated play of these three action profiles (as also in the second part of each punishment phase). The same arguments then go through.

Notice the carrot-and-stick structure of punishments, very standard in repeated games. The good stuff typically comes later; the bad stuff comes first. It is the promise of the rewards later that makes players stick to their punishments (at least for symmetric punishments of the kind considered here).

Also, do notice that we are not considering the strongest punishments possible. But it does not matter, because the statement of the folk theorem simply cannot be strengthened. You cannot drive players below their minimax values.

2.2. Remarks on Three or More Players.

With three or more players, the folk theorem runs into some problems. Indeed, the theorem is generally false in this case. Consider the following

Example. Player 1 chooses rows (U or D), player 2 chooses columns (L or R), and player 3 chooses matrices (matrix 1 or matrix 2):

          matrix 1                      matrix 2
        L          R                  L          R
U    1, 1, 1   0, 0, 0     U     0, 0, 0   0, 0, 0
D    0, 0, 0   0, 0, 0     D     0, 0, 0   1, 1, 1

Each player's minmax value is 0, but notice that there is no action combination that simultaneously minmaxes all three players. E.g., to minimax player 3, players 1 and 2 play (U, R). To minmax player 2, players 1 and 3 play U and matrix 2. To minmax player 1, players 2 and 3 play L and matrix 2. Nothing works to simultaneously minmax all three players.
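The impossibility of simultaneous minimaxing can be verified by brute force. The sketch below (my own, not from the notes) encodes each action set as {0, 1} — 0 for U/L/matrix 1, 1 for D/R/matrix 2 — enumerates all eight pure profiles, and reports, for each, which players are held to their minmax payoff of 0.

```python
# Brute-force check: no pure profile holds all three players to minmax 0.
from itertools import product

def payoff(r, c, m):
    # Everyone gets 1 at (U, L, matrix 1) and at (D, R, matrix 2), else 0.
    return 1 if (r, c, m) in {(0, 0, 0), (1, 1, 1)} else 0

def best_response_payoff(i, prof):
    # Best payoff player i can get, holding the other two actions fixed.
    return max(payoff(*(prof[:i] + (x,) + prof[i + 1:])) for x in range(2))

for prof in product(range(2), repeat=3):
    held_down = [best_response_payoff(i, prof) == 0 for i in range(3)]
    print(prof, held_down)
# Every printed line contains at least one False: at any profile, some
# player's best response earns 1, so that player is not being minmaxed.
```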
Let α be the lowest subgame perfect equilibrium payoff for any player. Note that

α ≥ (1 − β)D + βα,

where D is the largest deviation payoff to some player under any first-period action supporting α; the continuation payoff after such a deviation is itself an equilibrium payoff, and so can be no less than α. It can be shown (even with the use of observable mixed strategies) that D ≥ 1/4. Rearranging the inequality above, α ≥ D, so α ≥ 1/4. No folk theorem.

The problem is that we cannot separately minmax each deviator and provide incentives to the other players to carry out the minmaxing, because all payoffs are common. If there is enough "wiggle-room" to separately reward the players for going into the various punishment phases, then we can get around this problem. A sufficient condition for this is that F* has full dimensionality. We omit the details.
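Where does D ≥ 1/4 come from? The following grid search (my illustration; the parametrization x, y, z for the probabilities that players 1, 2, 3 put on U, L and matrix 1 is my own notation) approximates the minimum, over mixed first-period play, of the best deviation payoff available to some player.

```python
# Grid search behind D >= 1/4: whatever (observable) mixed actions are
# played in the first period, some player's best response earns >= 1/4.

def best_deviation(x, y, z):
    # Player 1: U pays 1 against (L, matrix 1), D pays 1 against (R, matrix 2).
    p1 = max(y * z, (1 - y) * (1 - z))
    p2 = max(x * z, (1 - x) * (1 - z))   # player 2, symmetrically
    p3 = max(x * y, (1 - x) * (1 - y))   # player 3, symmetrically
    return max(p1, p2, p3)

# A coarse grid suffices; a finer grid gives the same answer.
grid = [k / 20 for k in range(21)]
lowest = min(best_deviation(x, y, z)
             for x in grid for y in grid for z in grid)
print(lowest)   # 0.25, attained at x = y = z = 1/2
```

Intuitively, mixing fifty-fifty everywhere is the hardest the other two players can make life for a deviator, and even then the deviator matches the profitable cell with probability 1/4.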