Game Theory/Econ 5300
University of Wyoming

Course Note 5:
Infinitely Repeated Games & the Folk Theorems
(Multi-stage games with observed actions)
Repeated games are multi-stage games in which players play the same game (called the
“stage game”) repeatedly over an infinite number of periods (T = ∞) in a grand “supergame”. They
cannot stop playing the game at will, as they can in a war of attrition or preemption game
(see e.g. course note 4). The fact that you are playing the same stage game repeatedly
tends to create additional Nash, subgame perfect Nash, renegotiation-proof, and other
equilibriums. The reason is that, when the game is repeated, the players can condition
their current actions on the history of the game. For example, the other players can
reward or punish player 1 in the future depending on how she behaves in the present.
They do so by playing the game more or less favorably in terms of what player 1 would
prefer. In the language of dynamic programming, players can promise each other
different ‘continuation values’ for the remainder of the game depending on past events.
Repeated games are often used to describe trust situations: under which conditions will
two or more players trust each other - when will the short term incentive to cheat the
other players be less than the long term loss? One answer is that the short term gain to
cheating must be less than the punishment. However, this is not quite enough: we also
have to ensure that a second player who is supposed to punish a first player for cheating
is willing to carry out the punishment. Otherwise, the punishment is not a credible threat
and the first player will not be deterred from cheating. For an SPNE, therefore, players must
play a Nash equilibrium in the cooperation phase, in the punishment phase, and in any
other subgame that might occur.
Payoff Normalization
It is convenient in infinitely repeated games to normalize a player’s payoff x in a given
period to (1 − δ)x, where δ ∈ (0,1) is the discount factor. That way her lifetime payoff is
simply

  Σ_{t=0}^∞ δ^t (1 − δ)x = (1 − δ)x / (1 − δ) = x.

This makes some graphical representations easier.1
Subgame Perfect Equilibriums in Infinite Horizon Games:
First, notice that simply playing a Nash equilibrium of the stage game in every period is a
subgame perfect equilibrium in the infinitely repeated game. This is because if (a) you
are playing a Nash equilibrium in each period and (b) the only subgames are the
continuation games beginning in every period, then you must be playing a Nash
equilibrium in every subgame. For example “Confess, Confess” in the prisoner’s
dilemma in every period is a subgame perfect equilibrium in the repeated prisoner’s
dilemma. Likewise, in the ranked coordination game
1 We are computing the payoffs in the repeated game as the present value of future payoffs (using the
discount factor δ = 1/(1 + r), where r is the discount rate). Fudenberg and Tirole (Chapter 5) discuss other
ways to compute the value of the payoff stream, like the overtaking and time-average criteria. Since the
present value approach is used widely in economics and game theory, we will use it also.
                      Player 2
                      Up       Down
Player 1   Up         2,2      0,0
           Down       0,0      1,1
playing the stage game strategies {Up, Up} in periods 1, 2, 17, and 28, {Down, Down} in
periods 10 and 30, and the mixed strategies {Up with probability 1/3, Up with probability
1/3} in all other periods is a subgame perfect equilibrium in the repeated game. In
Chicken, playing {Continue, Veer off} in every period; or interchangeably with {Veer
off, Continue}; or interchangeably with the mixed equilibrium in the stage game... all these
are subgame perfect equilibriums in the repeated game.
However, repetition also allows for other outcomes in every period than simply Nash
equilibriums of the stage game. A series of results called the folk theorems (see below)
state that sufficiently patient players - players with a high enough discount factor δ - may
choose, in every period of the repeated game, any of their stage game strategies, as long
as their infinite horizon payoff is above a critical minimum called the minmax payoff. In
other words, a large variety of technically feasible behavior - and not just behavior that is
a Nash Equilibrium in the stage game - can happen in SPNE in the repeated game.
Minmax Payoffs
The minmax payoff for each player plays a major role in repeated game theory. A
player’s minmax payoff is simply the lowest payoff other players can technically force
the player to accept in a single play of the stage game. In other words, it is her payoff in
the stage game when (a) the other players do their best to minimize what she gets,
knowing that (b) she will choose a best response to their strategies. Formally, the minmax
payoff for player i is

  v̲_i = min_{α_{-i}} max_{α_i} g_i(α_i, α_{-i}),

where α_i and α_{-i} denote the pure or mixed strategies of player i and the other
players in a single play of the game.
In some games the strategy profile which minmaxes player 1 is the same as the profile
that minmaxes the other players (e.g., “confess, confess” in the prisoner’s dilemma), but
this is not true generally. In Chicken, for instance, “Veer off, Continue” minmaxes player
1, while the opposite “Continue, Veer off” minmaxes player 2. Moreover, since the goal of
minmaxing is simply to punish a player as much as possible, the minmax profile for a
player may hurt the other players as well and not be a Nash equilibrium. For example, a
bank may prefer to write off a bad loan rather than take the debtor to court, but only the
court option will minmax the debtor. In the following game, similarly, the minmax
profiles for players 1 and 2 are, respectively, (D,L) (which forces player 1 to accept -5)
and (U,L) (which forces player 2 to accept 0), but only (D,R) is a Nash equilibrium.
                      Player 2
                      L        R
Player 1   U          -6,0     -1,-1
           D          -5,-1    2,2
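Pure-strategy minmax payoffs like these can be computed mechanically. A small sketch (our own code) for the game above:

```python
# Payoff dictionaries for the 2x2 game above; rows U,D belong to player 1,
# columns L,R to player 2; entries are (player 1, player 2) payoffs.
g1 = {('U', 'L'): -6, ('U', 'R'): -1, ('D', 'L'): -5, ('D', 'R'): 2}
g2 = {('U', 'L'):  0, ('U', 'R'): -1, ('D', 'L'): -1, ('D', 'R'): 2}
rows, cols = ['U', 'D'], ['L', 'R']

# Player 1's minmax: player 2 picks the column minimizing player 1's best response.
minmax1 = min(max(g1[(r, c)] for r in rows) for c in cols)
# Player 2's minmax: player 1 picks the row minimizing player 2's best response.
minmax2 = min(max(g2[(r, c)] for c in cols) for r in rows)
print(minmax1, minmax2)  # -> -5 0
```

This reproduces the minmax payoffs -5 and 0 claimed in the text (over pure strategies only; mixing is treated next).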
A minmax profile can also be mixed. In the matching pennies variant below, holding the
other player to his lowest payoff, when he does his best to defend against it, requires you
to randomize with probability 0.5. This leaves the opponent with expected payoff 1, while
against any pure strategy the opponent could secure a payoff of 2.
                      Player 2
                      Heads    Tails
Player 1   Heads      2,0      0,2
           Tails      0,2      2,0
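The mixed minmax can be found numerically; the sketch below (our own code) grid-searches player 1's randomization probability:

```python
# Player 2's payoffs in the matching pennies variant above.
g2 = {('H', 'H'): 0, ('H', 'T'): 2, ('T', 'H'): 2, ('T', 'T'): 0}

def br_payoff2(p):
    """Player 2's best-response payoff when player 1 plays Heads w.p. p."""
    heads = p * g2[('H', 'H')] + (1 - p) * g2[('T', 'H')]
    tails = p * g2[('H', 'T')] + (1 - p) * g2[('T', 'T')]
    return max(heads, tails)

grid = [i / 1000 for i in range(1001)]
p_star = min(grid, key=br_payoff2)  # the mixture that minimizes the best response
print(p_star, br_payoff2(p_star))   # -> 0.5 1.0
```

The minimizing mixture is p = 0.5, holding player 2 to expected payoff 1, as claimed.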
The Set of Feasible and Strictly Individually Rational (FIR) payoffs
Define (1): a strictly individually rational payoff for player i is any payoff above her
minmax payoff, that is, any payoff v_i > v̲_i.
Define (2): the set of feasible payoffs is the convex hull of the set of all possible stage
game payoffs (Nash equilibrium or not): V = convex hull {v | ∃ a ∈ A with g(a) = v}, where
A is the set of action profiles in the stage game. The convex hull of a set is the smallest
convex set containing the set. The set of feasible payoffs for the prisoner’s dilemma with
payoffs
                      Player 2
                      Confess    Deny
Player 1   Confess    0,0        1+G,-L
           Deny       -L,1+G     1,1
is shown below. Because each player’s minmax payoff is zero (which happens with the
profile “confess, confess”) the set of feasible and strictly individually rational (FIR)
payoffs is the subset of the feasible payoffs where both players get above zero.
[Figure: the set of feasible payoffs is the quadrilateral with vertices (0,0), (1+G,-L),
(1,1) and (-L,1+G); the FIR payoffs are the subset where both players get more than zero.]
Below is the set of FIR payoffs for a ranked coordination game. Notice that in this game,
the lowest both players can hold each other to occurs with the minmax strategies {Up
with probability 1/3, Up with probability 1/3}, which give expected payoff 2/3
((1/3)(2) + (2/3)(0) = 2/3). The set of FIR payoffs is all feasible payoff combinations where
each player gets more than 2/3.
                      Player 2
                      Up       Down
Player 1   Up         2,2      0,0
           Down       0,0      1,1
[Figure: the feasible payoffs form the line segment from (0,0) through (1,1) to (2,2); the
FIR payoffs are the part of the segment above (2/3, 2/3).]
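The 2/3 minmax payoff claimed above can be confirmed numerically (a small sketch of our own):

```python
# If player 2 plays Up with probability q, player 1's best response earns
# max(2q, 1 - q). Grid-search for the q that minimizes this (the mixed minmax).
def br_payoff1(q):
    return max(2 * q, 1 - q)

grid = [i / 300 for i in range(301)]
q_star = min(grid, key=br_payoff1)
print(round(q_star, 4), round(br_payoff1(q_star), 4))  # -> 0.3333 0.6667
```

At q = 1/3 the two pure responses are equalized (2q = 1 - q), which is exactly why this mixture minimizes the best-response payoff.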
The set of FIR payoffs for chicken is also shown below, where we suppose that chicken
has payoff matrix:
                             Player 2
                             Continue (C)   Veer off (V)
Player 1   Continue (C)      -10,-10        10,-5
           Veer off (V)      -5,10          0,0
In Chicken, the minmax strategy for player 1 is for player 2 to continue and player 1 to
veer off; the minmax strategy for player 2 is for player 1 to continue and player 2 to veer
off. Each player’s minmax payoff is therefore v̲_i = -5.
[Figure: the set of feasible payoffs is the triangle with vertices (-10,-10), (10,-5) and
(-5,10); the FIR payoffs are the subset where both players get more than -5.]
There is a small issue we skipped when drawing the sets of feasible payoffs before.
Consider for instance the Chicken example. It is clear that we can achieve discounted
per-period average payoffs at or close to (-5,10). For example, we can specify that the
players play (V,C) for the first T periods and then (C,V) thereafter. The payoff to player 1
of (V,C) until T = 999 and (C,V) from period 1000 onwards, for instance, would be

  (1 − δ)[ (1 − δ^1000)/(1 − δ) · (−5) + δ^1000/(1 − δ) · 10 ] = (1 − δ^1000)(−5) + δ^1000 · 10 ≈ −5

unless δ is very close to one, so that she is extremely patient. However, we also claimed
that the payoffs (2.5, 2.5) are feasible. The problem is now that if players are very
impatient, δ ≈ 0, we cannot persuade either of them to receive 2.5 by playing “V” today
(get -5) and “C” tomorrow (get 10) for an average of 2.5, because they do not care about
tomorrow (so (1 − δ)(−5) + δ(1 − δ)10 + ... ≈ −5 ≠ 2.5). However, we can
solve the problem if players use a public randomization device. For example, if they flip
a coin over who should play C and who should play V then the expected payoff is
precisely (10)(1/2)+(-5)(1/2) = 2.5. To simplify, we assume from now on that players do
have a public randomization device available, so they can in fact achieve any payoff in
the feasible set.
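The computation above can be checked numerically; this sketch (our own, with illustrative δ values) evaluates player 1's normalized payoff from the scheme:

```python
# Player 1's normalized payoff from playing (V,C) in periods 0..T-1 (payoff -5)
# and (C,V) from period T on (payoff 10), with T = 1000 as in the text.
def payoff1(delta, T=1000):
    # (1 - delta) * [ sum_{t<T} delta^t * (-5) + sum_{t>=T} delta^t * 10 ]
    return (1 - delta**T) * (-5) + delta**T * 10

print(payoff1(0.99))    # still essentially -5: not patient enough
print(payoff1(0.9999))  # very patient: the tail of 10's dominates
```

For δ = 0.99 the result is still within a rounding error of -5, while δ = 0.9999 pushes it well above 5, illustrating "unless δ is very close to one".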
The Fudenberg and Maskin (1986) perfect folk theorem
Assume that the dimension of the set of feasible payoffs equals the number of players.
Then, for any v ∈ V with v_i > v̲_i for all players i, there is a discount factor δ̲ < 1 such
that for all δ ∈ (δ̲, 1) there is a subgame perfect equilibrium of G(δ) with payoffs v.
Note: The assumption of “full dimensionality” of the set of feasible payoffs is satisfied
by the 2-player, 2-dimensional sets of feasible payoffs in the prisoner’s dilemma and
chicken drawn above - although it fails for the ranked coordination game above, where
the 2 players have only a 1-dimensional set of feasible payoffs. The Fudenberg-Maskin
theorem needs to assume full dimensionality since otherwise it can be difficult to reward
some players with high payoffs while punishing others with low payoffs. If players
cannot be rewarded enough for punishing deviators from an SPNE, they may be unwilling
to carry out the punishment, making the punishment threat non-credible and meaning the
SPNE fails. But we will assume full dimensionality in all examples of the Fudenberg-Maskin theorem we study.
Below is first the proof (from Fudenberg and Tirole Chapter 5) and then examples of
applying the Fudenberg-Maskin Perfect Folk Theorem.
Proof:
(i) Assume there is a pure action profile a with payoffs g(a) = v. (If v can only be obtained
using a public randomization device, the proof is almost the same.) Also assume that the
minmax profile played against player i by the other players, m^i_{-i}, involves pure strategies
(otherwise, again, the proof is similar). Here, m^i_{-i} denotes the action profile played in the
stage game by the other players when seeking to minimize i’s maximum achievable
stage game payoff. If punishment requires mixing, we just need a small extension to the
proof (part (ii) below). Choose a payoff vector v' in the interior of the set of feasible payoffs V and an
ε > 0 such that for each i the vector

  v'(i) = (v_1' + ε, v_2' + ε, ..., v_{i-1}' + ε, v_i', v_{i+1}' + ε, v_{i+2}' + ε, ...)

is feasible, and

  v̲_i < v_i' < v_i.
v'(i) is the payoff vector we apply in the post-punishment phase if player i failed to carry out
her punishment. If instead all players carried out the punishment they were supposed to,
or some other player deviated in the punishment phase, then the post-punishment phase
will give payoffs v'(j) for some other player j, and i gets v_i' + ε > v_i'. Thus, player i will
get an extra ε per period. This way player i gets rewarded for punishing (she gets an
extra ε post-punishment in every period). Also assume, to again avoid dealing with
public randomizations, that there is a pure strategy profile a(i) implementing these
payoffs, that is, g(a(i)) = v'(i) for all i. Also, choose a number of punishment periods N
such that for all players i
  max_{a_i} g_i(a_i, a_{-i}) + N v̲_i < g_i(a) + N v_i',

which says that for perfectly patient players (δ = 1) it is not worth deviating once and
then being minmaxed for N periods instead of conforming when the post-punishment
payoff is v_i'.
Now consider the following strategy profile:
Play begins in phase I. In phase I, play the action profile a, where g(a) = v. Play remains
in phase I as long as in each past period either the realized actions were a or differed
from a in two or more components. If a single player j deviates from profile a, play
switches to phase II_j.
Phase II_j. In phase II_j (the punishment phase for player j; the phase which follows j’s
deviation from the equilibrium path), play m^j [the minmax profile for player j]. Continue
in phase II_j for N periods as long as in each past period in that phase either the realized
actions were m^j or differed from m^j in two or more components. Switch to phase III_j
after N successive periods of phase II_j. If during phase II_j a single player i deviated
from m^j, then begin phase II_i (the punishment phase for player i). [Notice that if i
deviates in the punishment phase for j, then j is effectively forgiven.]
Phase III_j. In phase III_j (the post-punishment phase which follows the punishment phase
for player j), play a(j). [This profile implements the payoff vector v'(j).] Continue in
this phase unless in some period a single player i deviates from a( j ) , in which case
begin phase II i .
To check that these strategies are a subgame perfect equilibrium for a high enough discount
factor δ ∈ (δ̲, 1), we use the one-stage deviation principle: the strategies are subgame
perfect if and only if no player can benefit by deviating from her proposed strategy in a
single stage and abiding by her strategy thereafter. Thus, we must check that no player
wants to deviate once, and do what the strategies specify after that, in phases I, II or III.
Phase I deviations? In phase I, player i receives v_i from conforming and at most

  (1 − δ) max_{a_i} g_i(a_i, a_{-i}) + δ(1 − δ^N) v̲_i + δ^{N+1} v_i'

from deviating once. Since v_i' < v_i, the deviation payoff is less than the equilibrium payoff
for δ sufficiently large.
Phase III_j, j ≠ i deviations? In phase III_j, j ≠ i, player i receives v_i' + ε, that is, ε
more than she gets in her own post-punishment phase. Her payoff to deviating is at most

  (1 − δ) max_{a_i} g_i(a_i, a_{-i}(j)) + δ(1 − δ^N) v̲_i + δ^{N+1} v_i',

which is less than v_i' + ε for δ sufficiently large. The key to this result is not ε but
simply that v_i' > v̲_i, though of course ε helps enforce conformity.
Phase III_i deviations? In phase III_i, player i receives v_i' for conforming and at most

  (1 − δ) max_{a_i} g_i(a_i, a_{-i}(i)) + δ(1 − δ^N) v̲_i + δ^{N+1} v_i'

from deviating. As for phase III_j, j ≠ i, the deviation payoff is less than the conforming
payoff for δ sufficiently large because v_i' > v̲_i.
Phase II_j, j ≠ i deviations? This is the critical phase: in phase II_j, player i’s payoff
from punishing player j may be very low, perhaps much smaller than i’s own minmax
payoff. This is where the extra ε you get after the punishment phase by being an obedient
punisher becomes important. In phase II_j, j ≠ i, player i’s payoff to conforming when
there are N' ≤ N periods of punishment remaining is

  (1 − δ^{N'}) w_i^j + δ^{N'}(v_i' + ε),

where w_i^j denotes player i’s payoff when she minmaxes player j, that is, she
participates in the punishment. By deviating with N' ≤ N periods (including the current
period) of punishment left, player i gets at most

  (1 − δ) max_{a_i} g_i(a_i, m^j_{-i}) + δ(1 − δ^N) v̲_i + δ^{N+1} v_i',

and for δ large enough the deviation is not worth it because of the extra ε player i
receives in the post-punishment phase for player j, phase III_j, compared to her own
post-punishment phase, III_i.
Phase II_i deviations? Finally, in phase II_i player i has no incentive to deviate because in
every period in this phase a minmax profile for player i is being played. This, by
definition, means that the other players -i seek to minimize i’s payoff and she plays a
best response. If she were to deviate, she could do no better in the current period and her
punishment phase would start over. Formally, conforming in her own punishment
phase with N' ≤ N punishment periods left pays player i

  (1 − δ^{N'}) v̲_i + δ^{N'} v_i',

and deviating pays her at most

  (1 − δ) v̲_i + δ(1 − δ^N) v̲_i + δ^{N+1} v_i',

which is strictly less for all N' ≤ N.
(ii) when punishment requires mixing: see Fudenberg and Maskin (1986) for the proof in
this case. When the minmax profile for a player involves mixing by the other players, we
run into the problem that, to be willing to mix, another player must receive the same
payoff in the remaining part of the repeated game from each of the pure strategies she
mixes over in the stage game. But consider now the following example. Suppose that
player 1 minmaxing player 2 means that player 1 randomizes over actions U and D in the
stage game with probabilities 0.5 of each action. However, assume player 1 gets payoff 2
from playing U in the stage game and payoff 0 from playing D. Now, because she is
supposed to mix in the punishment phase, unless other players can observe her mixing
probabilities, player 1 may want to “pretend” that she is mixing but really play U every
time on purpose; others cannot see that she is deviating from the punishment she is
supposed to carry out. To ensure that player 1 is truly willing to mix, the continuation
payoffs she gets in the rest of the game must depend on the realized outcomes of her
mixing. In particular, if she plays U in every one of the N punishment periods she gets
2(1 − δ^N) out of the punishment phase, and if she plays D she gets 0(1 − δ^N) = 0.
Therefore, player 1’s continuation value (her payoff in the post-punishment phase) if her
realized actions in the punishment phase are U every period must be v_1'(2) − 2(1 − δ^N),
and if she plays D in every period her post-punishment phase payoff is v_1'(2). For any
combination of U’s and D’s in the N punishment periods, the continuation (post-punishment) payoffs can be chosen this way to make the punishing player 1 indifferent.
Example 1 of Fudenberg and Maskin’s perfect folk theorem
Suppose now that we want to enforce the payoffs (0.8, 0.8) in a perfect equilibrium of the
infinitely repeated prisoner’s dilemma. Since (0.8, 0.8) does not result from any
combination of pure strategies, we need to use public randomization. For example, we
can flip a rigged coin in every period which shows Heads with probability 0.8 and let
Heads result in (D,D) and Tails result in (C,C) with payoffs (1,1) and (0,0), respectively.
Now players get (0.8,0.8) in expectation in the repeated game with the standard payoff
normalization. Also, suppose we pick the vectors

  v'(1) = (0.5, 0.5 + ε)
  v'(2) = (0.5 + ε, 0.5)

to be played in the post-punishment phases III_1 and III_2, respectively. Notice that these
payoffs are in the interior of the set of feasible payoffs V and satisfy

  v̲_i < v_i' < v_i,  i = 1, 2.
This can again be achieved by public randomization if we flip a normal coin with
probability 0.5 of Heads and specify that Heads means (D,D) and Tails means (C,C) -
except if Heads occurs in, say, periods 18 and 36, then players play (D,C) if the post-punishment phase is III_1 and (C,D) if the post-punishment phase is III_2. This gives the
desired payoffs v'(1) or v'(2) in expectation in phases III_1 and III_2, respectively.
We only need to check deviations for player 1 since the proposed SPNE is symmetric
(otherwise we must check players 1 and 2 separately in each phase).
In phase I, player 1 receives 0.8 on average by conforming and is tempted to deviate
only when the public randomization implies (deny, deny). In such periods she gets
(1 − δ)1 + δ(0.8) by conforming. The most she can get by deviating is

  (1 − δ)(1 + G) + δ(1 − δ^N)0 + δ^{N+1}(0.5) < (1 − δ)(1) + δ(0.8)

for δ < 1 sufficiently large.
In phase III_2, player 1 receives 0.5 + ε on average by conforming and is tempted to
deviate only when the public randomization implies (deny, deny). Conforming gives
(1 − δ)(1) + δ(0.5 + ε). Her payoff to deviating is at most

  (1 − δ)(1 + G) + δ(1 − δ^N)0 + δ^{N+1}(0.5) < (1 − δ)(1) + δ(0.5 + ε)

for δ < 1 sufficiently large.
In phase III_1, player 1 receives 0.5 on average by conforming and is tempted to deviate
only when the public randomization implies (deny, deny) or (deny, confess). In the first
case, we need

  (1 − δ)(1 + G) + δ(1 − δ^N)0 + δ^{N+1}(0.5) < (1 − δ)(1) + δ(0.5),

and in the second case we need

  (1 − δ)(0) + δ(1 − δ^N)0 + δ^{N+1}(0.5) < (1 − δ)(−L) + δ(0.5)

(deviating from (deny, confess) means confessing, which pays 0 in the current period
instead of −L). Both inequalities hold for δ < 1 sufficiently large.
In phase II_2, player 1’s payoff to conforming when there are N' periods of punishment
remaining is

  (1 − δ^{N'})0 + δ^{N'}(0.5 + ε),

and by deviating with N' periods of punishment left she gets at most

  (1 − δ)0 + δ(1 − δ^N)0 + δ^{N+1}(0.5) < (1 − δ^{N'})0 + δ^{N'}(0.5 + ε)
for δ large enough (in fact, it holds for any δ in this prisoner’s dilemma example. This
is because the minmax strategies for each player happen to be Nash equilibriums in the
stage game, so a punisher never has an incentive to deviate in the punishment phase.
However, in games other than the prisoner’s dilemma there may be such an incentive; see
example 2 below).
In phase II_1, conforming in her own punishment phase with N' ≤ N punishment periods
left pays player 1

  (1 − δ^{N'})0 + δ^{N'}(0.5),

and deviating pays her at most

  (1 − δ)0 + δ(1 − δ^N)0 + δ^{N+1}(0.5) < (1 − δ^{N'})0 + δ^{N'}(0.5)

for any δ.
(We really do not even have to check this phase, see the proof above: when player 1 is
being minmaxed, by definition of minmax play for player 1 she has no current gain from
deviating, and in the future her payoff is worse since the other players restart the punishment.)
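These inequalities can be spot-checked numerically. The following sketch uses hypothetical parameter values G = L = 1, ε = 0.05, N = 10 and δ = 0.95 (our choices for illustration, not the note's):

```python
# Spot-check Example 1's one-stage deviation inequalities at assumed parameters.
G, L, eps, N, delta = 1.0, 1.0, 0.05, 10, 0.95
punish = delta * (1 - delta**N) * 0  # N periods at the minmax payoff 0

# Phase I: deviate from (deny, deny) for 1 + G today, be punished, end at 0.5.
assert (1 - delta)*(1 + G) + punish + delta**(N + 1)*0.5 < (1 - delta)*1 + delta*0.8

# Phase III_2: post-punishment phase for player 2 (player 1 gets 0.5 + eps).
assert (1 - delta)*(1 + G) + punish + delta**(N + 1)*0.5 < (1 - delta)*1 + delta*(0.5 + eps)

# Phase III_1: (deny, deny) case, then (deny, confess) case.
assert (1 - delta)*(1 + G) + punish + delta**(N + 1)*0.5 < (1 - delta)*1 + delta*0.5
assert (1 - delta)*0 + punish + delta**(N + 1)*0.5 < (1 - delta)*(-L) + delta*0.5

# Phase II_2: punishing player 2 with N' = N periods left.
conform = delta**N * (0.5 + eps)
deviate = (1 - delta)*0 + punish + delta**(N + 1)*0.5
assert deviate < conform
print("all Example 1 deviation checks pass")
```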
Example 2 of Fudenberg and Maskin’s perfect folk theorem
Consider the following game between two players. The set of strictly individually
rational feasible payoffs is shown below.
                      Player 2
                      U        M        D
Player 1   U          4,4      1,6      -1,-1
           M          6,1      1,1      0,0
           D          -1,-1    0,0      -1,-1
This stage game has three Nash equilibriums: {U,M}, {M,U} and {M,M}. Player 1 has
the minmax profile {M,D} and player 2 the minmax profile {D,M}, with minmax payoffs
v̲_1 = v̲_2 = 0. Notice, however, that the minmax strategies are not a Nash equilibrium:
when player 2 is supposed to minmax player 1 in the punishment phase II_1, she will
prefer to play U or M instead of D. Similarly, when player 1 is supposed to minmax
player 2 in the punishment phase II_2, she will prefer to play U or M instead of D.
[Figure: the set of feasible payoffs is the quadrilateral with vertices (-1,-1), (1,6), (4,4)
and (6,1); the feasible, strictly individually rational payoffs are the subset where both
players get more than 0.]
Suppose now that we want to enforce the payoffs (4,4) in a perfect equilibrium of the
infinitely repeated game. This can be implemented in every stage by the profile {U,U}.
Also, suppose we pick the vectors

  v'(1) = (3, 3 + ε)
  v'(2) = (3 + ε, 3)
to be played in the post-punishment phases III1 and III 2 , respectively. This can again be
achieved with public randomization, for example in phase III1 we can allow the public
variable to sometimes tell players to play {M,M}, sometimes {U,M} and sometimes
{U,U}. Likewise in phase III 2 we can use public randomization over {M,M} and {M,U}
and {U,U} to achieve v'(2). Notice that again the post-punishment phase (phase III)
payoffs v' are in the interior of the set of feasible payoffs V and satisfy

  v̲_i < v_i' < v_i,  i = 1, 2.
Again we can just check deviations for player 1 since the game is symmetric.
In phase I, player 1 receives 4 by conforming. The most she can get by deviating is

  (1 − δ)(6) + δ(1 − δ^N)0 + δ^{N+1}(3) < 4

for δ < 1 sufficiently large.
In phase III_2, player 1 receives 3 + ε on average by conforming and is tempted to deviate
only when the public randomization implies (U,U), since {M,M} and {M,U} are Nash
equilibriums in the stage game. Her payoff to conforming in state (U,U) is
(1 − δ)4 + δ(3 + ε). Her payoff to deviating is at most

  (1 − δ)6 + δ(1 − δ^N)0 + δ^{N+1}(3) < (1 − δ)4 + δ(3 + ε)

for δ < 1 sufficiently large.
In phase III_1, player 1 receives 3 on average by conforming and is tempted to deviate
only when the public randomization implies (U,U), since {M,M} and {U,M} are Nash
equilibriums in the stage game. Her payoff to conforming in state (U,U) is
(1 − δ)4 + δ(3). Her payoff to deviating is at most

  (1 − δ)6 + δ(1 − δ^N)0 + δ^{N+1}(3) < (1 − δ)4 + δ(3)

for δ < 1 sufficiently large.
In phase II_2, player 1’s payoff to conforming when there are N' periods of punishment
remaining is (1 − δ^{N'})0 + δ^{N'}(3 + ε). By deviating with N' periods of punishment left,
player 1 gets at most

  (1 − δ)1 + δ(1 − δ^N)0 + δ^{N+1}(3) < (1 − δ^{N'})0 + δ^{N'}(3 + ε)

for δ large enough. Do notice, however, that in this game (unlike the previous, prisoner’s
dilemma example) a player can gain from deviating in the punishment phase: her stage
game payoff is 1 when she fails to punish and 0 when she does punish.
In phase II_1, conforming in her own punishment phase with N' ≤ N punishment periods
left pays player 1

  (1 − δ^{N'})0 + δ^{N'}(3),

and deviating pays her at most

  (1 − δ)0 + δ(1 − δ^N)0 + δ^{N+1}(3) < (1 − δ^{N'})0 + δ^{N'}(3)

for any δ. (Again, we did not really need to check this.)
Generally, allowing stricter punishments for deviations will make deviation less tempting
and therefore increases the set of perfect equilibrium outcomes of the repeated game. In
fact, whenever you want to construct a perfect equilibrium, it makes good sense to look
for the strongest punishments possible: if an outcome can be enforced at all, it can be
enforced with the strongest possible punishments. This also means that using the
strongest possible punishments will give you the most inclusive set of perfect
equilibriums and makes it easy to construct an equilibrium (section 5.1.3 in Fudenberg
and Tirole; Abreu 1988). However, if players can make mistakes, or if it can look like
they cheated because of noisy information, then punishments may actually have to be
carried out in equilibrium (Green and Porter 1984, Econometrica 52, 1). In this case, the
harshest possible punishments may not be optimal.
Renegotiation-Proof Equilibriums (RPE) in Infinitely Repeated Games
As discussed in course note 2, a Renegotiation-proof equilibrium (RPE) is an SPNE in
which the players cannot agree to switch to a different SPNE in any continuation
subgame. That is, regardless of which subgame they reach, one player cannot approach
the other and say “hey, instead of playing the rest of the game as specified by our SPNE
strategies, let us switch to this other SPNE from today onwards.”
The idea is that an SPNE where in some subgame the players can agree to jump to a new
SPNE for the rest of the game is unlikely to be observed: both players know if they ever
get to that subgame they would deviate by mutual agreement from their SPNE strategies.
Despite a handful of different definitions in the literature, a good example is Farrell, J. &
Maskin, E. (1989), “Renegotiation in Repeated Games”, Games and Economic Behavior 1,
327-60.
In Farrell & Maskin (1989) an SPNE is
Weakly renegotiation-proof (WRPE) if no continuation payoff allowed by the strategies
is Pareto-dominated by another continuation payoff allowed by the strategies.
Strongly renegotiation-proof (SRPE) if it is WRPE and no continuation payoff allowed
by the strategies is Pareto-dominated by a continuation payoff in another WRPE.
WRPE says the players cannot all benefit from (and therefore agree to) “jumping” to
another SPNE allowed by their strategies.
SRPE says they cannot gain from jumping either to another SPNE allowed by their
strategies OR to some SPNE not allowed by their strategies, but which they know is (i)
Pareto-improving and (ii) “stable” in the sense it is weakly renegotiation–proof.
It may seem you should always require SRPE: why only allow players to consider
jumping to better payoffs within the existing SPNE? One reason to focus on WRPE may
be that no SRPE exists. Alternatively, the players may be biased toward the strategic
possibilities they were already considering or the way they have played the game historically.
Farrell-Maskin RPE Example
Suppose in the infinitely repeated prisoner’s dilemma
                        P2
                        Deny (D)    Confess (C)
P1    Deny (D)          1,1         -1,2
      Confess (C)       2,-1        0,0
WRPE?: {DD; if anybody plays C then switch to CC in all future periods}
The candidate WRPE? is an SPNE, but it is not weakly - and therefore not strongly - renegotiation-proof.
The problem is the continuation payoffs in any subgame after Confess are (0,0), which is
Pareto-dominated by (1,1). Both players would prefer returning to Deny, Deny each
period instead of Confess, Confess.
Let us try instead:
WRPE 1: {Start with DC, then CD, then DC etc. If player 1 alone deviates then play DC
for N periods. If player 2 alone deviates then CD for N periods. If both deviate then
ignore it. If the cheater cheats during her punishment, restart the punishment. If the
punisher cheats, start her punishment. After N periods return to DC, CD, DC, etc.,
starting with the cheater Denying}
We should first check it is an SPNE for δ large enough (just check for player 1 since
player 2 is symmetric). Then we check for WRPE.
I: In the cooperation phase you have the strongest incentive to cheat when the profile is
Deny, Confess and you are tempted to play Confess, Confess. You do not cheat if

  v^DC = (1 − δ) Σ_{t=0}^∞ δ^{2t}(−1 + 2δ)
      [lifetime payoff to honesty, so DC, CD, DC, CD, DC, etc.]
  ≥ (1 − δ)0 + δ(1 − δ^N)(−1) + δ^{N+1} v^DC
      [lifetime payoff to cheating (so deviate to CC); note you return to the
      continuation payoff v^DC after N + 1 periods]
  ⇔ (1 − δ^{N+1}) v^DC ≥ (1 − δ)0 + δ(1 − δ^N)(−1).

The right hand side is negative, and the left hand side satisfies
(1 − δ^{N+1}) v^DC = (1 − δ^{N+1})(1 − δ)(−1 + 2δ)/(1 − δ²) ≥ 0 ⇔ δ ≥ 0.5. So a
sufficient condition to avoid cheating in the cooperation phase
is δ ≥ 0.5. Generally you cooperate always for δ → 1.
IIa: In your own punishment phase with N' ≤ N punishment periods remaining:

  (1 − δ^{N'})(−1) + δ^{N'} v^DC
      [lifetime payoff to accepting your punishment; note you return to v^DC when
      the punishment ends]
  ≥ (1 − δ)0 + δ(1 − δ^N)(−1) + δ^{N+1} v^DC
      [lifetime payoff to deviating once in your punishment (the one-stage deviation
      principle means we do not have to check multiple-period deviations)]
  ⇔ (δ^{N'} − δ^{N+1}) v^DC ≥ 1 − δ^{N'} − δ(1 − δ^N).

The binding case is N' = N, where the condition becomes

  (δ^N − δ^{N+1}) v^DC ≥ 1 − δ^N − δ(1 − δ^N) ⇔ δ^N v^DC ≥ 1 − δ^N,

which is true for δ → 1.
IIb: When you punish: you play Confess, Deny, so switching to Deny, Deny would
decrease your current payoff from 2 to 1 and throw you into a punishment phase. No
incentive not to punish.
So WRPE1 is an SPNE for large δ. Now, to check whether it is also WRPE, we write the
continuation payoffs in the cooperation-phase even periods, the cooperation-phase odd
periods, and the punishment phases for the two players:
  v^DC = (v_1^DC, v_2^DC) = ((−1 + 2δ)(1 − δ)/(1 − δ²), (2 − δ)(1 − δ)/(1 − δ²));

  v^CD = (v_1^CD, v_2^CD) = ((2 − δ)(1 − δ)/(1 − δ²), (−1 + 2δ)(1 − δ)/(1 − δ²));

  v¹ = (−(1 − δ^{N'}) + δ^{N'} v_1^DC, (1 − δ^{N'})(2) + δ^{N'} v_2^DC)
      [player 1’s punishment phase with N' periods remaining];

  v² = ((1 − δ^{N'})(2) + δ^{N'} v_1^CD, −(1 − δ^{N'}) + δ^{N'} v_2^CD)
      [player 2’s punishment phase with N' periods remaining].
It is clear you cannot Pareto-rank any two of these payoff vectors (v^DC just swaps the
components of v^CD, so one player wins and one loses; v¹ hurts player 1 compared with
v^DC while helping player 2, etc.). Therefore WRPE1 is weakly renegotiation-proof for δ large.
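These continuation payoff vectors can be computed and compared directly (our own sketch; δ = 0.8 and N' = 3 are illustrative choices):

```python
# The four continuation payoff vectors of WRPE1 at sample parameter values.
delta, Np = 0.8, 3

a = (-1 + 2*delta) * (1 - delta) / (1 - delta**2)   # worse side of the alternation
b = (2 - delta) * (1 - delta) / (1 - delta**2)      # better side of the alternation

v_DC = (a, b)
v_CD = (b, a)
v1 = (-(1 - delta**Np) + delta**Np * a, (1 - delta**Np) * 2 + delta**Np * b)
v2 = ((1 - delta**Np) * 2 + delta**Np * b, -(1 - delta**Np) + delta**Np * a)

def pareto_dominates(x, y):
    return all(xi >= yi for xi, yi in zip(x, y)) and any(xi > yi for xi, yi in zip(x, y))

vs = [v_DC, v_CD, v1, v2]
assert not any(pareto_dominates(x, y) for x in vs for y in vs if x != y)
print("no continuation payoff in WRPE1 Pareto-dominates another")
```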
However, suppose δ is large enough that (2 − δ)(1 − δ)/(1 − δ²) < 1, i.e. δ > 0.5. Then in
any cooperation subgame specifying CD this period, both players are better off switching
to a continuation SPNE with Deny, Deny in the cooperation phase and N periods of
punishment before returning to DD. Both players would be better off since the payoffs
to Deny, Deny,

  v^DD = (1, 1),

Pareto-dominate

  v^CD = ((2 − δ)(1 − δ)/(1 − δ²), (−1 + 2δ)(1 − δ)/(1 − δ²))

(symmetrically, v^DD Pareto-dominates v^DC in subgames specifying DC).
Consequently, provided Deny, Deny can be achieved in a WRPE, our WRPE1 is not
strongly renegotiation-proof.
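A numerical check of this dominance claim (our own sketch, with illustrative δ values):

```python
# Once delta > 0.5, the (1,1) continuation Pareto-dominates v_CD, so a
# CD-period subgame of WRPE1 could be renegotiated to the DD path.
def v_cd(delta):
    """(player 1, player 2) continuation payoffs from the path CD, DC, CD, ..."""
    a = (-1 + 2*delta) * (1 - delta) / (1 - delta**2)
    b = (2 - delta) * (1 - delta) / (1 - delta**2)
    return (b, a)

for delta in (0.6, 0.9):
    p1, p2 = v_cd(delta)
    assert p1 < 1 and p2 < 1   # (1,1) is strictly better for both players
print("for delta > 0.5, v_DD = (1,1) Pareto-dominates v_CD")
```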
It turns out that the Deny, Deny alternative:
WRPE2: {DD in every period. If player 1 alone deviates then play DC for N periods. If
player 2 alone deviates then CD for N periods. If both deviate then ignore it. If the
cheater cheats during her punishment, restart the punishment. If the punisher cheats, start
her punishment. After N periods return to DD in every period}
is in fact another WRPE. This is easy to prove for δ large (proof omitted): first we
would prove it is an SPNE using the same steps as for WRPE1. Then we prove that no two
continuation payoffs within WRPE2 Pareto-dominate each other. The three continuation
payoffs are

  v̂^DD = (1, 1);

  v̂¹ = ((1 − δ^{N'})(−1) + δ^{N'} v̂_1^DD, (1 − δ^{N'})(2) + δ^{N'} v̂_2^DD);

  v̂² = ((1 − δ^{N'})(2) + δ^{N'} v̂_1^DD, (1 − δ^{N'})(−1) + δ^{N'} v̂_2^DD),

none of which Pareto-dominates the others.
(Note: you can show WRPE2 is an SPNE for any δ, N that works for WRPE1. Generally
the required δ, N may change, but you can adjust them accordingly. The reason WRPE2
is easier to enforce here than WRPE1 is that the one-shot gain to cheating in the
cooperation phase is the same in both (the normal form above shows cheating from
DD to CD in WRPE2 gives a short-term benefit of one; so does cheating from DC to CC in
WRPE1). Since the one-shot cheating gain in phase I is the same, but the payoff foregone
by cheating is better in WRPE2 (v̂^DD > v^DC), the players are less tempted to cheat in either
the cooperation or the punishment phase.)
Now, we can finally ask: is WRPE2 strongly renegotiation-proof? The answer is
yes, because all continuation payoffs in WRPE2 are on the Pareto-frontier of the set of
feasible payoffs. This was not the case in WRPE1 (see the graph below). This means that in
WRPE2 it is never possible to move to another continuation payoff that helps one player
without hurting another. All continuation payoffs in an SPNE being on the efficient
frontier is a sufficient condition for SRPE (but may not be necessary). Constructing
SPNEs where all continuation payoffs are on the efficient frontier is therefore an easy
way to construct SRPEs. On the downside, since we cannot use “mutual destruction”
punishments like “Confess, Confess” - they are often ruled out by the SRPE condition -
we cannot threaten the players as harshly as when we cannot renegotiate to mutual
advantage. So renegotiation, while making the SPNE “solid”, often weakens the range of
behaviors you can enforce in the cooperation phase.
Graphically (see below), not all the continuation payoffs in WRPE1 are on the Pareto-frontier of the set of feasible payoffs (v^CD and v^DC are below the frontier; so are the
punishment payoffs (not drawn), since they return players to v^CD, v^DC). In WRPE2 all
continuation payoffs are on the frontier, and it is therefore SRPE:
[Figure: the feasible set of this prisoner’s dilemma has vertices (0,0), (2,-1), (1,1) and
(-1,2); the WRPE2 continuation payoffs v̂^DD = (1,1), v̂¹ and v̂² lie on the Pareto-frontier
between (-1,2) and (2,-1), while v^DC and v^CD lie below it.]