Supplementary appendix

Bounding Equilibrium Payoffs in Repeated Games
with Private Monitoring: Online Appendix
Takuo Sugaya and Alexander Wolitzky
Stanford Graduate School of Business and MIT
April 25, 2016
Proof of Proposition 2
We prove that $E^{\mathrm{talk}}(\delta; p) = E^{\mathrm{med}}(\delta)$. In our construction, players ignore the private signals $y_{i,t}$ observed in periods $t = 1, 2, \ldots$. That is, only the signal $y_{i,0}$ observed in period 0 is used. Hence we can see $p$ as an ex ante correlation device. Since we consider two-player games, whenever we say players $i$ and $j$, we assume that they are different players: $i \ne j$.
The structure of the proof is as follows: take any strategy of the mediator, $\tilde\sigma$, that satisfies inequality (3) in the text (perfect monitoring incentive compatibility), and let $\tilde v$ be the value when the players follow $\tilde\sigma$. Since each $\hat v \in E^{\mathrm{med}}(\delta)$ has a corresponding $\hat\sigma$ that satisfies perfect monitoring incentive compatibility, it suffices to show that, for each $\varepsilon > 0$, there exists a sequential equilibrium whose equilibrium payoff $v$ satisfies $\|v - \tilde v\| < \varepsilon$ in the following environment:
1. At the beginning of the game, each player $i$ receives a message $m_i^{\mathrm{mediator}}$ from the mediator.
2. In each period $t$, the stage game proceeds as follows:
(a) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m^{\mathrm{1st}}_\tau, a_\tau, m^{\mathrm{2nd}}_\tau)_{\tau=1}^{t-1})$, each player $i$ sends the first message $m^{\mathrm{1st}}_{i,t}$ simultaneously.
(b) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m^{\mathrm{1st}}_\tau, a_\tau, m^{\mathrm{2nd}}_\tau)_{\tau=1}^{t-1}, m^{\mathrm{1st}}_t)$, each player $i$ takes action $a_{i,t}$ simultaneously.
(c) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m^{\mathrm{1st}}_\tau, a_\tau, m^{\mathrm{2nd}}_\tau)_{\tau=1}^{t-1}, m^{\mathrm{1st}}_t, a_t)$, each player $i$ sends the second message $m^{\mathrm{2nd}}_{i,t}$ simultaneously.
We call this environment “perfect monitoring with cheap talk.”
To this end, from $\tilde\sigma$, we first create a strict full-support equilibrium $\sigma$ with mediated perfect monitoring that yields payoffs close to $\tilde v$. We then move from $\sigma$ to a similar equilibrium $\sigma^*$, which will be easier to transform into an equilibrium with perfect monitoring with cheap talk. Finally, from $\sigma^*$, we create an equilibrium with perfect monitoring with cheap talk with the same on-path action distribution.
Construction and Properties of $\sigma$
In this subsection, we consider mediated perfect monitoring throughout. Since $W \ne \emptyset$, by Lemma 2 in the text, there exists a strict full-support equilibrium $\sigma^{\mathrm{strict}}$ with mediated perfect monitoring. As in the proof of that lemma, consider the following strategy of the mediator: In period 1, the mediator draws one of two states, $R^{\tilde v}$ and $R^{\mathrm{perturb}}$, with probabilities $1 - \phi$ and $\phi$, respectively. In state $R^{\tilde v}$, the mediator's recommendation is determined as follows: If no player has deviated up to period $t$, the mediator recommends $r_t$ according to $\tilde\sigma(h_m^t)$. If only player $i$ has deviated, the mediator recommends $r_{-i,t}$ to player $j$ according to $\alpha_j$, and recommends some best response to $\alpha_j$ to player $i$. Multiple deviations are treated as in the proof of Lemma 1 in the text. On the other hand, in state $R^{\mathrm{perturb}}$, the mediator follows the equilibrium $\sigma^{\mathrm{strict}}$. Let $\sigma$ denote this strategy of the mediator. From now on, we fix $\phi > 0$ arbitrarily.
With mediated perfect monitoring, since $\sigma^{\mathrm{strict}}$ has full support, player $i$ believes that the mediator's state is $R^{\mathrm{perturb}}$ with positive probability after any history. Therefore, by perfect monitoring incentive compatibility and the fact that $\sigma^{\mathrm{strict}}$ is a strict equilibrium, it is always strictly optimal for each player $i$ to follow her recommendation. This means that, for each period $t$, there exist $\varepsilon_t > 0$ and $T_t < \infty$ such that, for each player $i$ and on-path history $h_m^{t+1}$, we have
\[
(1-\delta)\, E_\sigma\!\left[ u_i(r_t) \mid h_m^t, r_{i,t} \right] + E_\sigma\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(\sigma(h_m^\tau)) \,\Big|\, h_m^t, r_{i,t} \right]
\]
\[
> \max_{a_i \in A_i} (1-\delta)\, E_\sigma\!\left[ u_i(a_i, r_{-i,t}) \mid h_m^t, r_{i,t} \right] + \left( \delta - \delta^{T_t} \right)\!\left[ (1-\varepsilon_t) \max_{\hat a_i} u_i(\hat a_i, \alpha_j^{\varepsilon_t}) + \varepsilon_t \max_{a \in A} u_i(a) \right] + \delta^{T_t} \max_{a \in A} u_i(a). \tag{1}
\]
That is, suppose that if player $i$ unilaterally deviates from an on-path history, then player $j$ virtually minmaxes player $i$ for $T_t - 1$ periods with probability $1 - \varepsilon_t$. (Recall that $\alpha_j$ is the minmax strategy and $\alpha_j^{\varepsilon}$ is a full-support perturbation of $\alpha_j$.) Then player $i$ has a strict incentive not to deviate from any recommendation in period $t$ on the equilibrium path. Equivalently, since $\sigma$ is a full-support recommendation strategy, player $i$ has a strict incentive not to deviate unless she herself has deviated.
Moreover, for sufficiently small $\varepsilon_t > 0$, we have
\[
(1-\delta)\, E_\sigma\!\left[ u_i(r_t) \mid h_m^t, r_{i,t} \right] + E_\sigma\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(\sigma(h_m^\tau)) \,\Big|\, h_m^t \right]
\]
\[
> \left( 1 - \delta^{T_t} \right)\!\left[ (1-\varepsilon_t) \max_{\hat a_i} u_i(\hat a_i, \alpha_j^{\varepsilon_t}) + \varepsilon_t \max_{a \in A} u_i(a) \right] + \delta^{T_t} \max_{a \in A} u_i(a). \tag{2}
\]
That is, if a deviation is punished with probability $1 - \varepsilon_t$ for $T_t$ periods including the current period, then player $i$ believes that the deviation is strictly unprofitable.$^1$
$^1$ If the current on-path recommendation schedule $\Pr(r_{j,t} \mid h_m^t, r_{i,t})$ is very close to $\alpha_j$, then (2) may be more restrictive than (1).
For each $t$, we fix $\varepsilon_t > 0$ and $T_t < \infty$ satisfying (1) and (2). Without loss of generality, we can take $\varepsilon_t$ decreasing: $\varepsilon_t \ge \varepsilon_{t+1}$ for each $t$.
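To see what (2) requires concretely, the punishment bound on its right-hand side can be evaluated numerically. The following Python sketch is purely illustrative: the payoff normalization (maximum stage payoff 1, perturbed-minmax payoff 0) and all parameter values are our assumptions, not part of the model.

```python
# Illustrative evaluation of the right-hand side of (2).
# Assumed payoff levels: max stage payoff 1, perturbed-minmax payoff 0.

def punishment_bound(delta, T, eps, minmax_payoff=0.0, max_payoff=1.0):
    """Upper bound on a deviator's normalized payoff when a deviation is
    punished for T periods (including the current one) with prob. 1 - eps."""
    punish_weight = 1.0 - delta ** T   # discounted weight of periods t, ..., t + T - 1
    tail_weight = delta ** T           # weight of all periods after the punishment
    punish_value = (1.0 - eps) * minmax_payoff + eps * max_payoff
    return punish_weight * punish_value + tail_weight * max_payoff

# With delta close to 1 and T large, the bound approaches the minmax payoff,
# so (2) holds whenever the equilibrium value strictly exceeds it.
print(punishment_bound(delta=0.99, T=500, eps=0.01))  # ~ 0.0165
```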
Construction and Properties of $\sigma^*$
In this subsection, we again consider mediated perfect monitoring. We further modify $\sigma$ and create the following mediator's strategy $\sigma^*$: At the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r^{\mathrm{punish}}_{i,t}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, for each $i$ and $t$, she draws $\omega_{i,t} \in \{R, P\}$ such that $\omega_{i,t} = R$ (regular) and $P$ (punish) with probabilities $1 - p_t$ and $p_t$, respectively, independently across $i$ and $t$. We will pin down $p_t > 0$ in Lemma 1. Moreover, given $\omega_t = (\omega_{1,t}, \omega_{2,t})$, the mediator chooses $r_t(a^t)$ for each $a^t$ as follows: If $\omega_{1,t} = \omega_{2,t} = R$, then she draws $r_t(a^t)$ according to $\sigma(a^t)(r)$. If $\omega_{i,t} = R$ and $\omega_{j,t} = P$, then she draws $r_{i,t}(a^t)$ from $\Pr_\sigma(r_i \mid r^{\mathrm{punish}}_{j,t}(a^t))$, while she draws $r_{j,t}(a^t)$ uniformly at random from $A_j$, that is, according to $\sum_{a_j \in A_j} \frac{a_j}{|A_j|}$.$^2$ Finally, if $\omega_{1,t} = \omega_{2,t} = P$, then she draws $r_{i,t}(a^t)$ uniformly at random from $A_i$, according to $\sum_{a_i \in A_i} \frac{a_i}{|A_i|}$, for each $i$ independently. Since $\sigma$ has full support, $\sigma^*$ is well defined.
$^2$ As will be seen below, if $\omega_{j,t} = P$, then player $j$ is supposed to take $r^{\mathrm{punish}}_{j,t}(a^t)$. Hence, $r_{j,t}(a^t)$ does not affect the equilibrium action. We define $r_{j,t}(a^t)$ so that, when the mediator sends a message only at the beginning of the game (in the game with perfect monitoring with cheap talk), she sends a "dummy recommendation" $r_{j,t}(a^t)$, so that player $j$ does not realize that $\omega_{j,t} = P$ until period $t$.
As will be seen, we will take $p_t$ sufficiently small. In addition, recall that $\phi > 0$ (the perturbation of $\tilde\sigma$ to $\sigma$) is arbitrary. In the next subsection and onward, we construct an equilibrium with perfect monitoring with cheap talk that has the same equilibrium action distribution as $\sigma^*$. Since $p_t$ is small and $\phi > 0$ is arbitrary, constructing such an equilibrium suffices to prove Proposition 2.
At the start of the game, the mediator draws $\omega_t$, $r^{\mathrm{punish}}_{i,t}(a^t)$, and $r_t(a^t)$ for each $i$, $t$, and $a^t$. Given them, the mediator sends messages to the players as follows:
1. At the start of the game, the mediator sends $\left( \left( r^{\mathrm{punish}}_{i,t}(a^t) \right)_{a^t \in A^{t-1}} \right)_{t=1}^{\infty}$ to player $i$.
2. In each period $t$, the stage game proceeds as follows:
(a) The mediator decides $\omega_t(a^t) \in \{R, P\}^2$ as follows: if there is no unilateral deviator (defined below), then the mediator sets $\omega_t(a^t) = \omega_t$. If instead player $i$ is a unilateral deviator, then the mediator sets $\omega_{i,t}(a^t) = R$ and $\omega_{j,t}(a^t) = P$. (A sketch of this rule follows the definition below.)
(b) Given $\omega_{i,t}(a^t)$, the mediator sends $\omega_{i,t}(a^t)$ to player $i$. In addition, if $\omega_{i,t}(a^t) = R$, then the mediator sends $r_{i,t}(a^t)$ to player $i$ as well.
(c) Given these messages, player $i$ takes an action. In equilibrium, if player $i$ has not yet deviated, then player $i$ takes $r_{i,t}(a^t)$ if $\omega_{i,t}(a^t) = R$ and takes $r^{\mathrm{punish}}_{i,t}(a^t)$ if $\omega_{i,t}(a^t) = P$. For notational convenience, let
\[
r_{i,t} = \begin{cases} r_{i,t}(a^t) & \text{if } \omega_{i,t}(a^t) = R, \\ r^{\mathrm{punish}}_{i,t}(a^t) & \text{if } \omega_{i,t}(a^t) = P \end{cases}
\]
be the action that player $i$ is supposed to take if she has not yet deviated. Her strategy after her own deviation is not specified.
We say that player $i$ has unilaterally deviated if there exist $\tau \le t - 1$ and a unique $i$ such that (i) for each $\tau' < \tau$, we have $a_{n,\tau'} = r_{n,\tau'}$ for each $n \in \{1, 2\}$ (no deviation happened until period $\tau - 1$), and (ii) $a_{i,\tau} \ne r_{i,\tau}$ and $a_{j,\tau} = r_{j,\tau}$ (player $i$ deviates in period $\tau$ and player $j$ does not).
Note that $\sigma^*$ is close to $\sigma$ on the equilibrium path for sufficiently small $p_t$. Hence, on-path strict incentive compatibility for player $i$ follows from (1). Moreover, the incentive compatibility condition analogous to (2) also holds.
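As a minimal sketch of the randomization just described, the state draw and the rule converting $\omega_t$ into $\omega_t(a^t)$ in step 2(a) can be written as follows. The function names are ours and the recommendation draws themselves are left abstract; this is an illustration of the rule, not the paper's implementation.

```python
import random

def draw_state(p_t):
    """Draw omega_t = (omega_{1,t}, omega_{2,t}): each component is 'R' with
    probability 1 - p_t and 'P' with probability p_t, independently."""
    return tuple('R' if random.random() > p_t else 'P' for _ in range(2))

def realized_state(omega_t, unilateral_deviator):
    """omega_t(a^t): equal to omega_t if there is no unilateral deviator;
    if player i is the unilateral deviator, player i's component is set to
    'R' and the opponent j's to 'P', so that j (unknowingly) minmaxes i.
    Players are indexed 0 and 1 here."""
    if unilateral_deviator is None:
        return omega_t
    state = ['R', 'R']
    state[1 - unilateral_deviator] = 'P'   # the deviator's opponent punishes
    return tuple(state)
```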
Lemma 1 There exists $\{p_t\}_{t=1}^{\infty}$ with $p_t > 0$ for each $t$ such that it is strictly optimal for each player $i$ to follow her recommendation: For each player $i$ and history
\[
h_i^t = \left( \left( \left( r^{\mathrm{punish}}_{i,\tau}(a^\tau) \right)_{a^\tau \in A^{\tau-1}} \right)_{\tau=1}^{\infty},\; a^t,\; (\omega_\tau(a^\tau))_{\tau=1}^{t-1},\; \omega_{i,t}(a^t),\; (r_{i,\tau})_{\tau=1}^{t} \right),
\]
if player $i$ herself has not yet deviated, we have the following two inequalities:
1. If a deviation is punished by $\alpha_j^{\varepsilon_t}$ for the next $T_t$ periods with probability $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$, then it is strictly unprofitable:
\[
(1-\delta)\, E_{\sigma^*}\!\left[ u_i(r_{i,t}, a_{j,t}) \mid h_i^t \right] + E_{\sigma^*}\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, h_i^t, a_{i,t} = r_{i,t} \right]
\]
\[
> \max_{a_i \in A_i} (1-\delta)\, E_{\sigma^*}\!\left[ u_i(a_i, a_{j,t}) \mid h_i^t \right] + \left( \delta - \delta^{T_t} \right)\!\left[ \left( 1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau \right) \max_{\hat a_i} u_i(\hat a_i, \alpha_j^{\varepsilon_t}) + \left( \varepsilon_t + \sum_{\tau=t}^{t+T_t-1} p_\tau \right) \max_{a \in A} u_i(a) \right] + \delta^{T_t} \max_{a \in A} u_i(a). \tag{3}
\]
2. If a deviation is punished by $\alpha_j^{\varepsilon_t}$ from the current period with probability $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$, then it is strictly unprofitable:
\[
(1-\delta)\, E_{\sigma^*}\!\left[ u_i(r_{i,t}, a_{j,t}) \mid h_i^t \right] + E_{\sigma^*}\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, h_i^t, a_{i,t} = r_{i,t} \right]
\]
\[
> \left( 1 - \delta^{T_t} \right)\!\left[ \left( 1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau \right) \max_{\hat a_i} u_i(\hat a_i, \alpha_j^{\varepsilon_t}) + \left( \varepsilon_t + \sum_{\tau=t}^{t+T_t-1} p_\tau \right) \max_{a \in A} u_i(a) \right] + \delta^{T_t} \max_{a \in A} u_i(a). \tag{4}
\]
Moreover, the expectation $E_{\sigma^*}$ does not depend on the specification of player $j$'s strategy after player $j$'s own deviation, for each history $h_i^t$ such that player $i$ has not deviated.
Proof. Since $\sigma^*$ has full support on the equilibrium path, a player $i$ who has not yet deviated always believes that player $j$ has not deviated. Hence, $E_{\sigma^*}$ is well defined without specifying player $j$'s strategy after player $j$'s own deviation.
Moreover, since $p_t$ is small and $\omega_{j,t}$ is independent of $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{i,t}$, given $(\omega_\tau(a^\tau))_{\tau=1}^{t-1}$ and $\omega_{i,t}(a^t)$ (which are equal to $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{i,t}$ on path), player $i$ believes that $\omega_{j,t}(a^t)$ is equal to $\omega_{j,t}$ and that $\omega_{j,t}$ is equal to $R$ with a high probability, unless player $i$ has deviated. Since
\[
\Pr_{\sigma^*}\!\left( r_{j,t} \mid \omega_{i,t}(a^t), \omega_{j,t}(a^t) = R, h_i^t \right) = \Pr_{\sigma}\!\left( r_{j,t} \mid a^t, r_{i,t} \right),
\]
we have that the difference
\[
E_{\sigma^*}\!\left[ u_i(r_{i,t}, a_{j,t}) \mid h_i^t \right] - E_{\sigma}\!\left[ u_i(r_{i,t}, a_{j,t}) \mid r_i^t, a^t, r_{i,t} \right]
\]
is small for small $p_t$.
Further, if $p_\tau$ is small for each $\tau \ge t+1$, then since $\omega_\tau$ is independent of $\omega_t$ for $\tau \ne t$, regardless of $(\omega_\tau(a^\tau))_{\tau=1}^{t}$, player $i$ believes that $\omega_{i,\tau}(a^\tau) = \omega_{j,\tau}(a^\tau) = R$ with high probability for $\tau \ge t+1$ on the equilibrium path. Since the distribution of the recommendation given $\sigma$ is the same as that given $\sigma^*$ conditional on $a^\tau$ and $\omega_{i,\tau}(a^\tau) = \omega_{j,\tau}(a^\tau) = R$, we have that
\[
E_{\sigma^*}\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, h_i^t, a_{i,t} = r_{i,t} \right] - E_{\sigma}\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(r_{i,\tau}, a_{j,\tau}) \,\Big|\, r_i^t, a^t, r_{i,t} \right]
\]
is small for small $p_\tau$ with $\tau \ge t+1$.
Hence, (1) and (2) imply that there exists $\bar p_t > 0$ such that, if $p_\tau \le \bar p_t$ for each $\tau \ge t$, then the claims of the lemma hold. Hence, if we take $p_t \le \min_{\tau \le t} \bar p_\tau$, then the claims hold.
We fix $\{p_t\}_{t=1}^{\infty}$ so that Lemma 1 holds. This fully pins down $\sigma^*$ with mediated perfect monitoring.
Construction with Perfect Monitoring with Cheap Talk
Given $\sigma^*$ with mediated perfect monitoring, we define the equilibrium strategy with perfect monitoring with cheap talk such that the equilibrium action distribution is the same as under $\sigma^*$. We must pin down the following four objects: at the beginning of the game, what message $m_i^{\mathrm{mediator}}$ player $i$ receives from the mediator; what message $m^{\mathrm{1st}}_{i,t}$ player $i$ sends at the beginning of period $t$; what action $a_{i,t}$ player $i$ takes in period $t$; and what message $m^{\mathrm{2nd}}_{i,t}$ player $i$ sends at the end of period $t$.
Intuitive Argument
As in $\sigma^*$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r^{\mathrm{punish}}_{i,t}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, with $p_t > 0$ pinned down in Lemma 1, she draws $\omega_t \in \{R, P\}^2$ and $r_t(a^t)$ as in $\sigma^*$ for each $t$ and $a^t$. She then defines $\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\sigma^*$. Intuitively, the mediator sends all the information about
\[
\left( \omega_t(a^t),\; r_t(a^t),\; r^{\mathrm{punish}}_{1,t}(a^t),\; r^{\mathrm{punish}}_{2,t}(a^t) \right)_{a^t \in A^{t-1}}, \quad t = 1, 2, \ldots,
\]
through the initial messages $(m_1^{\mathrm{mediator}}, m_2^{\mathrm{mediator}})$. In particular, the mediator directly sends $((r^{\mathrm{punish}}_{i,t}(a^t))_{a^t \in A^{t-1}})_{t=1}^{\infty}$ to player $i$ as a part of $m_i^{\mathrm{mediator}}$. Hence, we focus on how we replicate the role of the mediator in $\sigma^*$ of sending $(\omega_t(a^t), r_t(a^t))$ in each period, depending on the realized history $a^t$.
The key features to establish are (i) player $i$ does not know the instructions for the other player; (ii) before player $i$ reaches period $t$, player $i$ does not know her own recommendations for periods $\tau \ge t$ (otherwise, player $i$ would obtain more information than in the original equilibrium $\sigma^*$ and thus might want to deviate); and (iii) no player wants to deviate (in particular, if player $i$ deviates in actions or cheap talk, then the strategy of player $j$ is as if the state were $\omega_{j,t} = P$ in $\sigma^*$, for a sufficiently long time with a sufficiently high probability).
Properties (i) and (ii) are achieved by the same mechanism as in Theorem 9 of Heller, Solan, and Tomala (2012, henceforth HST). In particular, without loss of generality, let $A_i = \{1_i, \ldots, n_i\}$ be player $i$'s action set. We can view $r_{i,t}(a^t)$ as an element of $\{1, \ldots, n_i\}$. At the beginning of the game, the mediator draws $r_t(a^t)$ for each $a^t$.
Instead of sending $r_{i,t}(a^t)$ directly to player $i$, the mediator encodes $r_{i,t}(a^t)$ as follows: For a sufficiently large $N^t \in \mathbb{Z}$ to be determined, we define $p^t = N^t n_i n_j$. This $p^t$ corresponds to $p_h$ in HST. Let $Z_{p^t} \equiv \{1, \ldots, p^t\}$. The mediator draws $x^j_{i,t}(a^t)$ uniformly and independently from $Z_{p^t}$ for each $i$, $t$, and $a^t$. Given them, she defines
\[
y^i_{i,t}(a^t) \equiv x^j_{i,t}(a^t) + r_{i,t}(a^t) \pmod{n_i}. \tag{5}
\]
Intuitively, $y^i_{i,t}(a^t)$ is the "encoded instruction" of $r_{i,t}(a^t)$, and to obtain $r_{i,t}(a^t)$ from $y^i_{i,t}(a^t)$, player $i$ needs to know $x^j_{i,t}(a^t)$. The mediator gives $((y^i_{i,t}(a^t))_{a^t \in A^{t-1}})_{t=1}^{\infty}$ to player $i$ as a part of $m_i^{\mathrm{mediator}}$. At the same time, she gives $((x^j_{i,t}(a^t))_{a^t \in A^{t-1}})_{t=1}^{\infty}$ to player $j$ as a part of $m_j^{\mathrm{mediator}}$. At the beginning of period $t$, player $j$ sends $x^j_{i,t}(a^t)$ by cheap talk as a part of $m^{\mathrm{1st}}_{j,t}$, based on the realized action history $a^t$, so that player $i$ does not know $r_{i,t}(a^t)$ until period $t$. (Throughout the proof, the superscript of a variable represents who is informed about the variable, and the subscript represents whose recommendation the variable is about.)
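A minimal sketch of the one-time-pad step (5) follows, coding actions as residues $0, \ldots, n_i - 1$ rather than $1, \ldots, n_i$ for convenience; the numerical values of $N^t$, $n_i$, and $n_j$ are assumptions for illustration only.

```python
import random

N_t, n_i, n_j = 1000, 3, 3
p_t = N_t * n_i * n_j            # the modulus defining Z_{p^t}, as in the text

r_it = random.randrange(n_i)     # recommendation r_{i,t}(a^t)
x_j = random.randrange(p_t)      # key x^j_{i,t}(a^t), given to player j
y_i = (x_j + r_it) % n_i         # encoded instruction y^i_{i,t}(a^t), given to player i

# Since n_i divides p_t, y_i is uniform on its own and reveals nothing about
# r_it; player i decodes only after player j reveals x_j by cheap talk.
assert (y_i - x_j) % n_i == r_it
```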
In order to incentivize player $j$ to tell the truth, the equilibrium should embed a mechanism that punishes player $j$ if she tells a lie. In HST, this is done as follows: The mediator draws $\lambda^i_{i,t}(a^t)$ and $\kappa^i_{i,t}(a^t)$ uniformly and independently from $Z_{p^t}$, and defines
\[
u^j_{i,t}(a^t) \equiv \lambda^i_{i,t}(a^t)\, x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t}. \tag{6}
\]
The mediator gives $x^j_{i,t}(a^t)$ and $u^j_{i,t}(a^t)$ to player $j$, while she gives $\lambda^i_{i,t}(a^t)$ and $\kappa^i_{i,t}(a^t)$ to player $i$. In period $t$, player $j$ is supposed to send $x^j_{i,t}(a^t)$ and $u^j_{i,t}(a^t)$ to player $i$. If player $i$ receives $\hat x^j_{i,t}(a^t)$ and $\hat u^j_{i,t}(a^t)$ with
\[
\hat u^j_{i,t}(a^t) \ne \lambda^i_{i,t}(a^t)\, \hat x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t}, \tag{7}
\]
then player $i$ interprets that player $j$ has deviated. For sufficiently large $N^t$, since player $j$ does not know $\lambda^i_{i,t}(a^t)$ and $\kappa^i_{i,t}(a^t)$, if player $j$ tells a lie about $x^j_{i,t}(a^t)$, then with a high probability, player $j$ creates a situation where (7) holds.
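The pair (6)-(7) is an affine one-time authentication tag over $Z_{p^t}$. A minimal sketch, again using residues $0, \ldots, p^t - 1$ and an illustrative modulus; $\lambda$ and $\kappa$ follow the notation chosen above.

```python
import random

p_t = 9000
lam = random.randrange(p_t)      # lambda^i_{i,t}(a^t), known only to player i
kap = random.randrange(p_t)      # kappa^i_{i,t}(a^t), known only to player i
x_j = random.randrange(p_t)      # x^j_{i,t}(a^t), given to player j
u_j = (lam * x_j + kap) % p_t    # tag u^j_{i,t}(a^t), given to player j

def check(x_hat, u_hat):
    """Player i's test: a reported pair passes iff (7) does NOT hold."""
    return u_hat == (lam * x_hat + kap) % p_t

assert check(x_j, u_j)           # a truthful report always passes
# A lie x_hat != x_j passes only if player j guesses lam * (x_hat - x_j) mod p_t
# without knowing lam, which succeeds only with small probability for large N^t.
```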
Since HST consider Nash equilibrium, they let player $i$ minmax player $j$ forever after (7) holds. On the other hand, since we consider sequential equilibrium, as in the proof of Lemma 2 in the text, we will create a coordination mechanism such that, if player $j$ tells a lie, then with high probability player $i$ minmaxes player $j$ for a long time while assigning probability zero to the event that player $j$ has deviated (so that player $i$ punishes without noticing the deviation).
To this end, we consider the following coordination: First, if and only if $\omega_{i,t}(a^t) = R$, the mediator defines $u^j_{i,t}(a^t)$ as in (6). Otherwise, $u^j_{i,t}(a^t)$ is randomly drawn. That is,
\[
u^j_{i,t}(a^t) \equiv \begin{cases} \lambda^i_{i,t}(a^t)\, x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t} & \text{if } \omega_{i,t}(a^t) = R, \\ \text{uniformly distributed over } Z_{p^t} & \text{if } \omega_{i,t}(a^t) = P. \end{cases} \tag{8}
\]
Since both $\omega_{i,t}(a^t) = R$ and $\omega_{i,t}(a^t) = P$ happen with positive probability, player $i$, after receiving $u^j_{i,t}(a^t)$ with $u^j_{i,t}(a^t) \ne \lambda^i_{i,t}(a^t)\, x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t}$, interprets that $\omega_{i,t}(a^t) = P$. For notational convenience, let $\hat\omega_{i,t}(a^t) \in \{R, P\}$ be player $i$'s interpretation of $\omega_{i,t}(a^t)$. After $\hat\omega_{i,t}(a^t) = P$, she takes the period-$t$ action according to $r^{\mathrm{punish}}_{i,t}(a^t)$. Given this inference, if player $j$ tells a lie about $u^j_{i,t}(a^t)$ when $\omega_{i,t}(a^t) = R$, then with a high probability she induces a situation with $u^j_{i,t}(a^t) \ne \lambda^i_{i,t}(a^t)\, x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t}$, and player $i$ punishes player $j$ in period $t$ (without noticing player $j$'s deviation).
Second, switching to $r^{\mathrm{punish}}_{i,t}(a^t)$ for period $t$ only may not suffice if player $j$ believes that player $i$'s action distribution given $\omega_{i,t}(a^t) = R$ is close to the minmax strategy. Hence, we ensure that, once player $j$ deviates, player $i$ takes $r^{\mathrm{punish}}_{i,\tau}(a^\tau)$ for a sufficiently long time.
To this end, we change the mechanism so that player $j$ does not always know $u^j_{i,t}(a^t)$. Instead, the mediator draws $p^t$ independent random variables $v^j_{i,t}(n, a^t)$, $n = 1, \ldots, p^t$, uniformly from $Z_{p^t}$. In addition, she draws $n^i_{i,t}(a^t)$ uniformly from $Z_{p^t}$. The mediator defines $u^j_{i,t}(n, a^t)$ for each $n = 1, \ldots, p^t$ as follows:
\[
u^j_{i,t}(n, a^t) = \begin{cases} u^j_{i,t}(a^t) & \text{if } n = n^i_{i,t}(a^t), \\ v^j_{i,t}(n, a^t) & \text{otherwise;} \end{cases}
\]
that is, $u^j_{i,t}(n, a^t)$ corresponds to $u^j_{i,t}(a^t)$ with (8) only if $n = n^i_{i,t}(a^t)$. For other $n$, $u^j_{i,t}(n, a^t)$ is completely random.
The mediator sends $n^i_{i,t}(a^t)$ to player $i$, and sends $\{u^j_{i,t}(n, a^t)\}_{n \in Z_{p^t}}$ to player $j$. In addition, the mediator sends $n^j_{i,t}(a^t)$ to player $j$, where
\[
n^j_{i,t}(a^t) = \begin{cases} n^i_{i,t}(a^t) & \text{if } \omega_{i,t-1}(a^{t-1}) = P, \\ \text{uniformly distributed over } Z_{p^t} & \text{if } \omega_{i,t-1}(a^{t-1}) = R \end{cases}
\]
is equal to $n^i_{i,t}(a^t)$ if and only if last period's state $\omega_{i,t-1}(a^{t-1})$ is equal to $P$.
In period $t$, player $j$ is asked to send $x^j_{i,t}(a^t)$ and $u^j_{i,t}(n, a^t)$ with $n = n^i_{i,t}(a^t)$; that is, to send $x^j_{i,t}(a^t)$ and $u^j_{i,t}(a^t)$. If and only if player $j$'s messages $\hat x^j_{i,t}(a^t)$ and $\hat u^j_{i,t}(a^t)$ satisfy
\[
\hat u^j_{i,t}(a^t) = \lambda^i_{i,t}(a^t)\, \hat x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t},
\]
player $i$ interprets $\hat\omega_{i,t}(a^t) = R$. If player $i$ has $\hat\omega_{i,t}(a^t) = R$, then player $i$ knows that player $j$ needs to know $n^i_{i,t+1}(a^{t+1})$ in order to send the correct $u^j_{i,t+1}(n, a^{t+1})$ in the next period. Hence, she sends $n^i_{i,t+1}(a^{t+1})$ to player $j$. If player $i$ has $\hat\omega_{i,t}(a^t) = P$, then she believes that player $j$ already knows $n^i_{i,t+1}(a^{t+1})$ and does not send $n^i_{i,t+1}(a^{t+1})$.
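A minimal sketch of the indexed-tag table just defined. All draws stand in for pieces of the mediator's initial message; the point is only that player $j$'s list is uninformative about which entry is the genuine tag until player $i$ reveals the index.

```python
import random

p_t = 9000
n_index = random.randrange(p_t)  # n^i_{i,t}(a^t), known to player i
u_true = random.randrange(p_t)   # the genuine tag u^j_{i,t}(a^t) from (8)
table = [random.randrange(p_t) for _ in range(p_t)]   # dummies v^j_{i,t}(n, a^t)
table[n_index] = u_true          # genuine tag hidden at the secret index

# If player i revealed n_index at the end of period t - 1, player j can send
# table[n_index] = u_true; after a babble, player j can only guess:
guess = random.randrange(p_t)
print(table[guess] == u_true)    # True only with probability on the order of 1/p_t
```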
Given this coordination, once player $j$ creates a situation with $\omega_{i,t}(a^t) = R$ but $\hat\omega_{i,t}(a^t) = P$, player $j$ does not receive $n^i_{i,t+1}(a^{t+1})$. Without knowing $n^i_{i,t+1}(a^{t+1})$, with a large $N^{t+1}$, with a high probability player $j$ cannot know which $u^j_{i,t+1}(n, a^{t+1})$ she should send. Then, again, she will create a situation with
\[
\hat u^j_{i,t+1}(a^{t+1}) \ne \lambda^i_{i,t+1}(a^{t+1})\, \hat x^j_{i,t+1}(a^{t+1}) + \kappa^i_{i,t+1}(a^{t+1}) \pmod{p^{t+1}},
\]
that is, $\hat\omega_{i,t+1}(a^{t+1}) = P$. Recursively, player $i$ has $\hat\omega_{i,\tau}(a^\tau) = P$ for a long time with a high probability if player $j$ tells a lie.
Finally, if player $j$ takes a deviant action in period $t$, then the mediator has drawn $\omega_{i,\tau}(a^\tau) = P$ for each $\tau \ge t+1$ and each $a^\tau$ corresponding to the realized history. With $\omega_{i,\tau}(a^\tau) = P$, in order to avoid $\hat\omega_{i,\tau}(a^\tau) = P$, player $j$ needs to create a situation with
\[
\hat u^j_{i,\tau}(a^\tau) = \lambda^i_{i,\tau}(a^\tau)\, \hat x^j_{i,\tau}(a^\tau) + \kappa^i_{i,\tau}(a^\tau) \pmod{p^\tau}
\]
without knowing $\lambda^i_{i,\tau}(a^\tau)$ and $\kappa^i_{i,\tau}(a^\tau)$, while by (8) the mediator's message does not tell her what $\lambda^i_{i,\tau}(a^\tau)\, x^j_{i,\tau}(a^\tau) + \kappa^i_{i,\tau}(a^\tau) \pmod{p^\tau}$ is. Hence, for sufficiently large $N^\tau$, player $j$ can avoid $\hat\omega_{i,\tau}(a^\tau) = P$ only with negligible probability. Hence, player $j$ will be minmaxed from the next period with a high probability.
In total, the above argument shows that, if player $j$ deviates, whether in communication or in action, then she will be minmaxed for a sufficiently long time. Lemma 1 ensures that player $j$ does not want to tell a lie or take a deviant action.
Formal Construction
Let us formalize the above construction: As in $\sigma^*$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r^{\mathrm{punish}}_{i,t}(a^t)$ according to $\alpha_i^{\varepsilon_t}$; then she draws $\omega_t \in \{R, P\}^2$ and $r_t(a^t)$ for each $t$ and $a^t$; and then she defines $\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\sigma^*$. For each $t$ and $a^t$, she draws $x^j_{i,t}(a^t)$ uniformly and independently from $Z_{p^t}$. Given them, she defines
\[
y^i_{i,t}(a^t) \equiv x^j_{i,t}(a^t) + r_{i,t}(a^t) \pmod{n_i},
\]
so that (5) holds.
The mediator draws $\lambda^i_{i,t}(a^t)$, $\kappa^i_{i,t}(a^t)$, $\tilde u^j_{i,t}(a^t)$, $v^j_{i,t}(n, a^t)$ for each $n \in Z_{p^t}$, $n^i_{i,t}(a^t)$, and $\tilde n^j_{i,t}(a^t)$ from the uniform distribution over $Z_{p^t}$, independently for each player $i$, each period $t$, and each $a^t$.
As in (8), the mediator defines
\[
u^j_{i,t}(a^t) \equiv \begin{cases} \lambda^i_{i,t}(a^t)\, x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t} & \text{if } \omega_{i,t}(a^t) = R, \\ \tilde u^j_{i,t}(a^t) & \text{if } \omega_{i,t}(a^t) = P. \end{cases}
\]
In addition, the mediator defines
\[
u^j_{i,t}(n, a^t) = \begin{cases} u^j_{i,t}(a^t) & \text{if } n = n^i_{i,t}(a^t), \\ v^j_{i,t}(n, a^t) & \text{otherwise} \end{cases}
\]
and
\[
n^j_{i,t}(a^t) = \begin{cases} n^i_{i,t}(a^t) & \text{if } t = 1 \text{ or } \omega_{i,t-1}(a^{t-1}) = P, \\ \tilde n^j_{i,t}(a^t) & \text{if } t \ne 1 \text{ and } \omega_{i,t-1}(a^{t-1}) = R, \end{cases}
\]
as explained above.
Let us now define the equilibrium:
1. At the beginning of the game, the mediator sends
\[
m_i^{\mathrm{mediator}} = \left( \left( y^i_{i,t}(a^t),\; \lambda^i_{i,t}(a^t),\; \kappa^i_{i,t}(a^t),\; r^{\mathrm{punish}}_{i,t}(a^t),\; n^i_{i,t}(a^t),\; n^i_{j,t}(a^t),\; \left( u^i_{j,t}(n, a^t) \right)_{n \in Z_{p^t}},\; x^i_{j,t}(a^t) \right)_{a^t \in A^{t-1}} \right)_{t=1}^{\infty}
\]
to each player $i$.
2. In each period $t$, the stage game proceeds as follows: In equilibrium,
\[
m^{\mathrm{1st}}_{j,t} = \begin{cases} \left( u^j_{i,t}(m^{\mathrm{2nd}}_{i,t-1}, a^t),\; x^j_{i,t}(a^t) \right) & \text{if } t \ne 1 \text{ and } m^{\mathrm{2nd}}_{i,t-1} \ne \{\text{babble}\}, \\ \left( u^j_{i,t}(n^j_{i,t}(a^t), a^t),\; x^j_{i,t}(a^t) \right) & \text{if } t = 1 \text{ or } m^{\mathrm{2nd}}_{i,t-1} = \{\text{babble}\} \end{cases} \tag{9}
\]
and
\[
m^{\mathrm{2nd}}_{j,t} = \begin{cases} n^j_{j,t+1}(a^{t+1}) & \text{if } \hat\omega_{j,t}(a^t) = R, \\ \{\text{babble}\} & \text{if } \hat\omega_{j,t}(a^t) = P. \end{cases}
\]
Note that, since $m^{\mathrm{2nd}}_{j,t}$ is sent at the end of period $t$, the players know $a^{t+1} = (a_1, \ldots, a_t)$.
(a) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m^{\mathrm{1st}}_\tau, a_\tau, m^{\mathrm{2nd}}_\tau)_{\tau=1}^{t-1})$, each player $i$ sends the first message $m^{\mathrm{1st}}_{i,t}$ simultaneously. If player $i$ herself has not yet deviated, then
\[
m^{\mathrm{1st}}_{i,t} = \begin{cases} \left( u^i_{j,t}(m^{\mathrm{2nd}}_{j,t-1}, a^t),\; x^i_{j,t}(a^t) \right) & \text{if } t \ne 1 \text{ and } m^{\mathrm{2nd}}_{j,t-1} \ne \{\text{babble}\}, \\ \left( u^i_{j,t}(n^i_{j,t}(a^t), a^t),\; x^i_{j,t}(a^t) \right) & \text{if } t = 1 \text{ or } m^{\mathrm{2nd}}_{j,t-1} = \{\text{babble}\}. \end{cases}
\]
Let $m^{\mathrm{1st}}_{i,t}(u)$ be the first element of $m^{\mathrm{1st}}_{i,t}$ (that is, either $u^i_{j,t}(m^{\mathrm{2nd}}_{j,t-1}, a^t)$ or $u^i_{j,t}(n^i_{j,t}(a^t), a^t)$ on the equilibrium path), and let $m^{\mathrm{1st}}_{i,t}(x)$ be the second element ($x^i_{j,t}(a^t)$ on the equilibrium path). As a result, the profile of messages $m^{\mathrm{1st}}_t$ becomes common knowledge. If
\[
m^{\mathrm{1st}}_{j,t}(u) \ne \lambda^i_{i,t}(a^t)\, m^{\mathrm{1st}}_{j,t}(x) + \kappa^i_{i,t}(a^t) \pmod{p^t}, \tag{10}
\]
then player $i$ interprets $\hat\omega_{i,t}(a^t) = P$. Otherwise, $\hat\omega_{i,t}(a^t) = R$.
(b) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m^{\mathrm{1st}}_\tau, a_\tau, m^{\mathrm{2nd}}_\tau)_{\tau=1}^{t-1}, m^{\mathrm{1st}}_t)$, each player $i$ takes action $a_{i,t}$ simultaneously. If player $i$ herself has not yet deviated, then player $i$ takes $a_{i,t} = r_{i,t}$ with
\[
r_{i,t} = \begin{cases} y^i_{i,t}(a^t) - m^{\mathrm{1st}}_{j,t}(x) \pmod{n_i} & \text{if } \hat\omega_{i,t}(a^t) = R, \\ r^{\mathrm{punish}}_{i,t}(a^t) & \text{if } \hat\omega_{i,t}(a^t) = P. \end{cases} \tag{11}
\]
Recall that $y^i_{i,t}(a^t) \equiv x^j_{i,t}(a^t) + r_{i,t}(a^t) \pmod{n_i}$ by (5). By (9), therefore, player $i$ takes $r_{i,t}(a^t)$ if $\omega_{i,t}(a^t) = R$ and $r^{\mathrm{punish}}_{i,t}(a^t)$ if $\omega_{i,t}(a^t) = P$ on the equilibrium path, as in $\sigma^*$.
(c) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m^{\mathrm{1st}}_\tau, a_\tau, m^{\mathrm{2nd}}_\tau)_{\tau=1}^{t-1}, m^{\mathrm{1st}}_t, a_t)$, each player $i$ sends the second message $m^{\mathrm{2nd}}_{i,t}$ simultaneously. If player $i$ herself has not yet deviated, then
\[
m^{\mathrm{2nd}}_{i,t} = \begin{cases} n^i_{i,t+1}(a^{t+1}) & \text{if } \hat\omega_{i,t}(a^t) = R, \\ \{\text{babble}\} & \text{if } \hat\omega_{i,t}(a^t) = P. \end{cases}
\]
As a result, the profile of messages $m^{\mathrm{2nd}}_t$ becomes common knowledge. Note that $\omega_t(a^t)$ becomes common knowledge as well on the equilibrium path.
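The following sketch wires together (5), (8), (10), and (11) and traces stages (a) and (b) of one on-path period from player $i$'s side. It is a simplification under the notational assumptions above (the action history $a^t$, the second messages, and the index table are suppressed), not the full protocol.

```python
import random

n_i = 3
p_t = 1000 * n_i * n_i

# Draws contained in the mediator's initial messages:
r_rec = random.randrange(n_i)            # r_{i,t}(a^t)
r_punish = random.randrange(n_i)         # r^punish_{i,t}(a^t)
lam = random.randrange(p_t)              # lambda^i_{i,t}(a^t), player i's key
kap = random.randrange(p_t)              # kappa^i_{i,t}(a^t), player i's key
x_j = random.randrange(p_t)              # x^j_{i,t}(a^t), given to player j
y_i = (x_j + r_rec) % n_i                # y^i_{i,t}(a^t), given to player i
omega_R = True                           # suppose omega_{i,t}(a^t) = R
u_j = (lam * x_j + kap) % p_t if omega_R else random.randrange(p_t)  # as in (8)

# Stage (a): player j reports (u, x) and player i applies the test (10):
u_hat, x_hat = u_j, x_j                  # truthful report
omega_hat_R = (u_hat == (lam * x_hat + kap) % p_t)

# Stage (b): player i's action rule (11):
a_it = (y_i - x_hat) % n_i if omega_hat_R else r_punish
assert omega_hat_R and a_it == r_rec     # on path, player i plays r_{i,t}(a^t)
```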
Incentive Compatibility
The above equilibrium has full support: Since $\omega_t(a^t)$ and $r_t(a^t)$ have full support, $(m_1^{\mathrm{mediator}}, m_2^{\mathrm{mediator}})$ have full support as well. Hence, we are left to verify player $i$'s incentive not to deviate from the equilibrium strategy, given that player $i$ believes that player $j$ has not yet deviated after any history of player $i$.
Suppose that player $i$ has followed the equilibrium strategy until the end of period $t-1$. First, consider player $i$'s incentive to tell the truth about $m^{\mathrm{1st}}_{i,t}$. In equilibrium, player $i$ sends
\[
m^{\mathrm{1st}}_{i,t} = \begin{cases} \left( u^i_{j,t}(m^{\mathrm{2nd}}_{j,t-1}, a^t),\; x^i_{j,t}(a^t) \right) & \text{if } m^{\mathrm{2nd}}_{j,t-1} \ne \{\text{babble}\}, \\ \left( u^i_{j,t}(n^i_{j,t}(a^t), a^t),\; x^i_{j,t}(a^t) \right) & \text{if } m^{\mathrm{2nd}}_{j,t-1} = \{\text{babble}\}. \end{cases}
\]
The random variables possessed by player $i$ are independent of those possessed by player $j$ given $(m^{\mathrm{1st}}_\tau, a_\tau, m^{\mathrm{2nd}}_\tau)_{\tau=1}^{t-1}$, except that (i) $u^j_{i,t}(a^t) = \lambda^i_{i,t}(a^t)\, x^j_{i,t}(a^t) + \kappa^i_{i,t}(a^t) \pmod{p^t}$ if $\omega_{i,t}(a^t) = R$; (ii) $u^i_{j,t}(a^t) = \lambda^j_{j,t}(a^t)\, x^i_{j,t}(a^t) + \kappa^j_{j,t}(a^t) \pmod{p^t}$ if $\omega_{j,t}(a^t) = R$; (iii) $n^j_{i,\tau}(a^\tau) = n^i_{i,\tau}(a^\tau)$ if $\omega_{i,\tau-1}(a^{\tau-1}) = P$, while $n^j_{i,\tau}(a^\tau) = \tilde n^j_{i,\tau}(a^\tau)$ if $\omega_{i,\tau-1}(a^{\tau-1}) = R$; and (iv) $n^i_{j,\tau}(a^\tau) = n^j_{j,\tau}(a^\tau)$ if $\omega_{j,\tau-1}(a^{\tau-1}) = P$, while $n^i_{j,\tau}(a^\tau) = \tilde n^i_{j,\tau}(a^\tau)$ if $\omega_{j,\tau-1}(a^{\tau-1}) = R$. Since $\lambda^i_{i,t}(a^t)$, $\kappa^i_{i,t}(a^t)$, $\tilde u^j_{i,t}(a^t)$, $v^j_{i,t}(n, a^t)$, $n^i_{i,t}(a^t)$, and $\tilde n^j_{i,t}(a^t)$ are uniform and independent, player $i$ cannot learn $\omega_{i,\tau}(a^\tau)$, $r_{i,\tau}(a^\tau)$, or $r_{j,\tau}(a^\tau)$ with $\tau \ge t$. Hence, player $i$ believes, at the time when she sends $m^{\mathrm{1st}}_{i,t}$, that her equilibrium value is equal to
\[
(1-\delta)\, E_{\sigma^*}\!\left[ u_i(a_t) \mid h_i^t \right] + E_{\sigma^*}\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(a_\tau) \,\Big|\, h_i^t \right],
\]
where $h_i^t$ is as if player $i$ observed $((r^{\mathrm{punish}}_{i,\tau}(a^\tau))_{a^\tau \in A^{\tau-1}})_{\tau=1}^{\infty}$, $a^t$, $(\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, and $(r_{i,\tau})_{\tau=1}^{t-1}$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \ldots, t-1$, under $\sigma^*$ with mediated perfect monitoring.
On the other hand, for each $e > 0$ and sufficiently large $N^t$, if player $i$ tells a lie in at least one element of $m^{\mathrm{1st}}_{i,t}$, then with probability $1 - e$ player $i$ creates a situation with
\[
m^{\mathrm{1st}}_{i,t}(u) \ne \lambda^j_{j,t}(a^t)\, m^{\mathrm{1st}}_{i,t}(x) + \kappa^j_{j,t}(a^t) \pmod{p^t}.
\]
Hence, (10) (with indices $i$ and $j$ reversed) implies that $\hat\omega_{j,t}(a^t) = P$.
Moreover, if player $i$ creates a situation with $\hat\omega_{j,t}(a^t) = P$, then player $j$ will send $m^{\mathrm{2nd}}_{j,t} = \{\text{babble}\}$ instead of $n^j_{j,t+1}(a^{t+1})$. Unless $\omega_{j,t}(a^t) = P$, since $n^j_{j,t+1}(a^{t+1})$ is independent of player $i$'s variables, player $i$ believes that $n^j_{j,t+1}(a^{t+1})$ is distributed uniformly over $Z_{p^{t+1}}$. Hence, for each $e > 0$ and sufficiently large $N^{t+1}$, if $\omega_{j,t+1}(a^{t+1}) = R$, then player $i$ will send $m^{\mathrm{1st}}_{i,t+1}$ with
\[
m^{\mathrm{1st}}_{i,t+1}(u) \ne \lambda^j_{j,t+1}(a^{t+1})\, m^{\mathrm{1st}}_{i,t+1}(x) + \kappa^j_{j,t+1}(a^{t+1}) \pmod{p^{t+1}}
\]
with probability $1 - e$. Then, by (10) (with indices $i$ and $j$ reversed), player $j$ will have $\hat\omega_{j,t+1}(a^{t+1}) = P$.
Recursively, if $\omega_{j,\tau}(a^\tau) = R$ for each $\tau = t, \ldots, t+T_t-1$, then player $i$ will induce $\hat\omega_{j,\tau}(a^\tau) = P$ for each $\tau = t, \ldots, t+T_t-1$ with a high probability. Hence, for $\varepsilon_t > 0$ and $T_t$ fixed in (1) and (2), for sufficiently large $N^t$, if $N^\tau \ge N^t$ for each $\tau \ge t$, then player $i$ will be punished for the subsequent $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$. ($\sum_{\tau=t}^{t+T_t-1} p_\tau$ is the maximum probability of having $\omega_{i,\tau}(a^\tau) = P$ for some $\tau$ in the subsequent $T_t$ periods.) Then (4) implies that telling a lie gives a strictly lower payoff than the equilibrium payoff. Therefore, it is optimal to tell the truth about $m^{\mathrm{1st}}_{i,t}$. (In (4), we have shown interim incentive compatibility after $\omega_{i,t}(a^t)$ and $r_{i,t}$ are known, while here we consider $h_i^t$ before $\omega_{i,t}(a^t)$ and $r_{i,t}$ are known. Taking the expectation with respect to $\omega_{i,t}(a^t)$ and $r_{i,t}$, (4) ensures incentive compatibility before knowing $\omega_{i,t}(a^t)$ and $r_{i,t}$.)
Second, consider player $i$'s incentive to take the action $a_{i,t} = r_{i,t}$ according to (11), given that player $i$ has followed the equilibrium strategy until she sends $m^{\mathrm{1st}}_{i,t}$. If she follows the equilibrium strategy, then player $i$ believes, at the time when she takes an action, that her equilibrium value is equal to
\[
(1-\delta)\, E_{\sigma^*}\!\left[ u_i(a_t) \mid h_i^t \right] + E_{\sigma^*}\!\left[ (1-\delta) \sum_{\tau=t+1}^{\infty} \delta^{\tau-t} u_i(a_\tau) \,\Big|\, h_i^t \right],
\]
where $h_i^t$ is as if player $i$ observed $((r^{\mathrm{punish}}_{i,\tau}(a^\tau))_{a^\tau \in A^{\tau-1}})_{\tau=1}^{\infty}$, $a^t$, $(\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, $\omega_{i,t}(a^t)$, and $r_{i,t}$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \ldots, t-1$, under $\sigma^*$ with mediated perfect monitoring. (Compared to the time when player $i$ sends $m^{\mathrm{1st}}_{i,t}$, player $i$ now knows $\omega_{i,t}(a^t)$ and $r_{i,t}$ on the equilibrium path.)
If player $i$ deviates from $a_{i,t}$, then $\omega_{j,\tau}(a^\tau) = P$ by definition for each $\tau \ge t+1$ and each $a^\tau$ that is compatible with $a^t$ (that is, $a^\tau = (a^t, a_t, \ldots, a_{\tau-1})$ for some $a_t, \ldots, a_{\tau-1}$). To avoid being minmaxed in period $\tau$, player $i$ needs to induce $\hat\omega_{j,\tau}(a^\tau) = R$ although $\omega_{j,\tau}(a^\tau) = P$. Given $\omega_{j,\tau}(a^\tau) = P$, since $\lambda^j_{j,\tau}(a^\tau)$, $\kappa^j_{j,\tau}(a^\tau)$, $\tilde u^i_{j,\tau}(a^\tau)$, $v^i_{j,\tau}(n, a^\tau)$, $n^j_{j,\tau}(a^\tau)$, and $\tilde n^i_{j,\tau}(a^\tau)$ are uniform and independent (conditional on the other variables), regardless of player $i$'s continuation strategy, by (10) (with indices $i$ and $j$ reversed), player $i$ will send $m^{\mathrm{1st}}_{i,\tau}$ with
\[
m^{\mathrm{1st}}_{i,\tau}(u) \ne \lambda^j_{j,\tau}(a^\tau)\, m^{\mathrm{1st}}_{i,\tau}(x) + \kappa^j_{j,\tau}(a^\tau) \pmod{p^\tau}
\]
with a high probability.
Hence, for sufficiently large $N^t$, if $N^\tau \ge N^t$ for each $\tau \ge t$, then player $i$ will be punished for the next $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1 - \varepsilon_t$. By (3), deviating from $r_{i,t}$ gives a strictly lower payoff than her equilibrium payoff. Therefore, it is optimal to take $a_{i,t} = r_{i,t}$.
Finally, consider player $i$'s incentive to tell the truth about $m^{\mathrm{2nd}}_{i,t}$. Regardless of $m^{\mathrm{2nd}}_{i,t}$, player $j$'s actions do not change. Hence, we are left to show that telling a lie does not improve player $i$'s deviation gain by giving player $i$ more information.
On the equilibrium path, player $i$ knows $\omega_{i,t}(a^t)$. If player $i$ tells the truth, then $m^{\mathrm{2nd}}_{i,t} = n^i_{i,t+1}(a^{t+1}) \ne \{\text{babble}\}$ if and only if $\omega_{i,t}(a^t) = R$. Moreover, player $j$ sends
\[
m^{\mathrm{1st}}_{j,t+1} = \begin{cases} \left( u^j_{i,t+1}(m^{\mathrm{2nd}}_{i,t}, a^{t+1}),\; x^j_{i,t+1}(a^{t+1}) \right) & \text{if } \omega_{i,t}(a^t) = R, \\ \left( u^j_{i,t+1}(n^j_{i,t+1}(a^{t+1}), a^{t+1}),\; x^j_{i,t+1}(a^{t+1}) \right) & \text{if } \omega_{i,t}(a^t) = P. \end{cases}
\]
Since $n^j_{i,t+1}(a^{t+1}) = n^i_{i,t+1}(a^{t+1})$ if $\omega_{i,t}(a^t) = P$, in total, if player $i$ tells the truth, then player $i$ knows $u^j_{i,t+1}(n^i_{i,t+1}(a^{t+1}), a^{t+1})$ and $x^j_{i,t+1}(a^{t+1})$. This is sufficient information to infer $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$ correctly.
If she tells a lie, then player $j$'s messages are changed to
\[
m^{\mathrm{1st}}_{j,t+1} = \begin{cases} \left( u^j_{i,t+1}(m^{\mathrm{2nd}}_{i,t}, a^{t+1}),\; x^j_{i,t+1}(a^{t+1}) \right) & \text{if } m^{\mathrm{2nd}}_{i,t} \ne \{\text{babble}\}, \\ \left( u^j_{i,t+1}(n^j_{i,t+1}(a^{t+1}), a^{t+1}),\; x^j_{i,t+1}(a^{t+1}) \right) & \text{if } m^{\mathrm{2nd}}_{i,t} = \{\text{babble}\}. \end{cases}
\]
Since $\lambda^i_{i,t+1}(a^{t+1})$, $\kappa^i_{i,t+1}(a^{t+1})$, $\tilde u^j_{i,t+1}(a^{t+1})$, $v^j_{i,t+1}(n, a^{t+1})$, $n^i_{i,t+1}(a^{t+1})$, and $\tilde n^j_{i,t+1}(a^{t+1})$ are uniform and independent conditional on $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$, the messages $u^j_{i,t+1}(n, a^{t+1})$ and $x^j_{i,t+1}(a^{t+1})$ are not informative about player $j$'s recommendations from period $t+1$ on or about player $i$'s recommendations from period $t+2$ on, given that player $i$ knows $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$. Since telling the truth informs player $i$ of $\omega_{i,t+1}(a^{t+1})$ and $r_{i,t+1}(a^{t+1})$, there is no gain from telling a lie.