
Supplementary Material
Supplement to “Bounding equilibrium payoffs in repeated
games with private monitoring”
(Theoretical Economics, Vol. 12, No. 2, May 2017, 691–729)
Takuo Sugaya
Graduate School of Business, Stanford University
Alexander Wolitzky
Department of Economics, MIT
Proof of Proposition 2
We prove that $E^{\text{talk}}(\delta, p) = E^{\text{med}}(\delta)$. In our construction, players ignore the private signals $y_{it}$ observed in periods $t = 1, 2, \ldots$; that is, only the signal $y_{i0}$ observed in period $0$ is used. Hence, we can view $p$ as an ex ante correlation device. Since we consider two-player games, whenever we refer to players $i$ and $j$, we assume that they are different players: $i \neq j$.

The structure of the proof is as follows: take any strategy of the mediator, $\tilde\mu$, that satisfies inequality (3) in the text (perfect monitoring incentive compatibility), and let $\tilde v$ be the value when the players follow $\tilde\mu$. Since each $\hat v \in E^{\text{med}}(\delta)$ has a corresponding $\hat\mu$ that satisfies perfect monitoring incentive compatibility, it suffices to show that, for each $\varepsilon > 0$, there exists a sequential equilibrium whose equilibrium payoff $v$ satisfies $\|v - \tilde v\| < \varepsilon$ in the following environment:

(i) At the beginning of the game, each player $i$ receives a message $m_i^{\text{mediator}}$ from the mediator.
(ii) In each period $t$, the stage game proceeds as follows:

(a) Given player $i$'s history $(m_i^{\text{mediator}}, (m_\tau^{\text{1st}}, a_\tau, m_\tau^{\text{2nd}})_{\tau=1}^{t-1})$, each player $i$ sends the first message $m_{it}^{\text{1st}}$ simultaneously.

(b) Given player $i$'s history $(m_i^{\text{mediator}}, (m_\tau^{\text{1st}}, a_\tau, m_\tau^{\text{2nd}})_{\tau=1}^{t-1}, m_t^{\text{1st}})$, each player $i$ takes action $a_{it}$ simultaneously.

(c) Given player $i$'s history $(m_i^{\text{mediator}}, (m_\tau^{\text{1st}}, a_\tau, m_\tau^{\text{2nd}})_{\tau=1}^{t-1}, m_t^{\text{1st}}, a_t)$, each player $i$ sends the second message $m_{it}^{\text{2nd}}$ simultaneously.
We call this environment perfect monitoring with cheap talk.
Takuo Sugaya: [email protected]
Alexander Wolitzky: [email protected]
Copyright © 2017 The Authors. Theoretical Economics. The Econometric Society. Licensed under the
Creative Commons Attribution-NonCommercial License 3.0. Available at http://econtheory.org.
DOI: 10.3982/TE2270
To this end, from $\tilde\mu$, we first create a strict full-support equilibrium $\mu$ with mediated perfect monitoring that yields payoffs close to $\tilde v$. We then move from $\mu$ to a similar equilibrium $\mu^*$, which will be easier to transform into an equilibrium with perfect monitoring with cheap talk. Finally, from $\mu^*$, we create an equilibrium with perfect monitoring with cheap talk with the same on-path action distribution.
Construction and properties of $\mu$
In this subsection, we consider mediated perfect monitoring throughout. Since $\mathring W^* \neq \emptyset$, by Lemma 2 in the main text, there exists a strict full-support equilibrium $\mu^{\text{strict}}$ with mediated perfect monitoring. As in the proof of that lemma, consider the following strategy of the mediator: In period 1, the mediator draws one of two states, $R^{\tilde v}$ and $R^{\text{perturb}}$, with probabilities $1-\eta$ and $\eta$, respectively. In state $R^{\tilde v}$, the mediator's recommendation is determined as follows: If no player has deviated up to period $t$, the mediator recommends $r_t$ according to $\tilde\mu(h_m^t)$; if only player $i$ has deviated, the mediator recommends $r_{jt}$ to player $j$ according to $\alpha_j^*$ and recommends some best response to $\alpha_j^*$ to player $i$. Multiple deviations are treated as in the proof of Lemma 1. In contrast, in state $R^{\text{perturb}}$, the mediator follows the equilibrium $\mu^{\text{strict}}$. Let $\mu$ denote this strategy of the mediator. From now on, we fix $\eta > 0$ arbitrarily.
With mediated perfect monitoring, since $\mu^{\text{strict}}$ has full support, player $i$ believes that the mediator's state is $R^{\text{perturb}}$ with positive probability after any history. Therefore, by perfect monitoring incentive compatibility and the fact that $\mu^{\text{strict}}$ is a strict equilibrium, it is always strictly optimal for each player $i$ to follow her recommendation. This means that, for each period $t$, there exist $\varepsilon_t > 0$ and $T_t < \infty$ such that, for each player $i$ and on-path history $h_m^{t+1}$, we have

$$\begin{aligned}
&(1-\delta)\,E^{\mu}\bigl[u_i(r_t) \mid h_m^t, r_{it}\bigr] + \delta\,E^{\mu}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i\bigl(\mu(h_m^\tau)\bigr) \Bigm| h_m^t, r_{it}\Biggr] \\
&\quad > \max_{a_i\in A_i}(1-\delta)\,E^{\mu}\bigl[u_i(a_i, r_{jt}) \mid h_m^t, r_{it}\bigr] \\
&\qquad + \bigl(\delta-\delta^{T_t}\bigr)\Bigl[(1-\varepsilon_t)\max_{\hat a_i}u_i\bigl(\hat a_i, \alpha_j^{\varepsilon_t}\bigr) + \varepsilon_t\max_{a\in A}u_i(a)\Bigr] + \delta^{T_t}\max_{a\in A}u_i(a).
\end{aligned} \tag{S1}$$
That is, suppose that if player $i$ unilaterally deviates from an on-path history, then player $j$ virtually minmaxes player $i$ for $T_t - 1$ periods with probability $1-\varepsilon_t$. (Recall that $\alpha_j^*$ is the minmax strategy and $\alpha_j^{\varepsilon}$ is a full-support perturbation of $\alpha_j^*$.) Then player $i$ has a strict incentive not to deviate from any recommendation in period $t$ on the equilibrium path. Equivalently, since $\mu$ is a full-support recommendation, player $i$ has a strict incentive not to deviate unless she herself has deviated.
Moreover, for sufficiently small $\varepsilon_t > 0$, we have

$$\begin{aligned}
&(1-\delta)\,E^{\mu}\bigl[u_i(r_t) \mid h_m^t, r_{it}\bigr] + \delta\,E^{\mu}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i\bigl(\mu(h_m^\tau)\bigr) \Bigm| h_m^t, r_{it}\Biggr] \\
&\quad > \bigl(1-\delta^{T_t}\bigr)\Bigl[(1-\varepsilon_t)\max_{\hat a_i}u_i\bigl(\hat a_i, \alpha_j^{\varepsilon_t}\bigr) + \varepsilon_t\max_{a\in A}u_i(a)\Bigr] + \delta^{T_t}\max_{a\in A}u_i(a).
\end{aligned} \tag{S2}$$
That is, if a deviation is punished with probability $1-\varepsilon_t$ for $T_t$ periods including the current period, then player $i$ believes that the deviation is strictly unprofitable.¹

¹If the current on-path recommendation schedule $\Pr^{\mu}(r_{jt} \mid h_m^t, r_{it})$ is very close to $\alpha_j^*$, then (S2) may be more restrictive than (S1).

For each $t$, we fix $\varepsilon_t > 0$ and $T_t < \infty$ satisfying (S1) and (S2). Without loss of generality, we can take $\varepsilon_t$ decreasing: $\varepsilon_t \geq \varepsilon_{t+1}$ for each $t$.
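To see concretely how $\varepsilon_t$ and $T_t$ interact in a bound like (S2), the following minimal Python sketch checks an (S2)-style inequality in a standard prisoner's dilemma. The payoffs, discount factor, and $\varepsilon_t$ below are our own hypothetical choices for illustration, not values from the paper.

```python
# Numeric check of an (S2)-style bound in a prisoner's dilemma:
# u_i(C, C) = 2, u_i(D, C) = 3, u_i(C, D) = 0, u_i(D, D) = 1.
# All parameter values below are hypothetical illustrations.

delta, eps = 0.9, 0.05

u_max = 3.0   # max_{a in A} u_i(a)
v_eq = 2.0    # equilibrium value (LHS of (S2)) under permanent cooperation
# best payoff against the perturbed minmax alpha_j^eps = (1 - eps) D + eps C:
u_vs_minmax = max((1 - eps) * 1 + eps * 3,   # respond with D
                  (1 - eps) * 0 + eps * 2)   # respond with C

def rhs(T):
    """RHS of (S2): punished for T periods w.p. 1 - eps, lucky otherwise."""
    per_period = (1 - eps) * u_vs_minmax + eps * u_max
    return (1 - delta**T) * per_period + delta**T * u_max

T_t = next(T for T in range(1, 500) if v_eq > rhs(T))
print(f"(S2) holds with T_t = {T_t}: {v_eq:.3f} > {rhs(T_t):.3f}")
```

With these numbers, the smallest punishment length that works is $T_t = 8$; any longer $T_t$ only slackens the bound.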
Construction and properties of $\mu^*$
In this subsection, we again consider mediated perfect monitoring. We further modify $\mu$ to create the following mediator's strategy $\mu^*$: Fix a fully mixed $\mathring\mu \in \Delta(A)$ with $u(\mathring\mu) \in \mathring W^*$. At the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{it}^{\text{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, for each $i$ and $t$, she draws $\omega_{it} \in \{R, P\}$ such that $\omega_{it} = R$ (regular) and $P$ (punish) with probabilities $1-p_t$ and $p_t$, respectively, independently across $i$ and $t$. We will pin down $p_t > 0$ in Lemma S1. Moreover, given $\omega_t = (\omega_{1t}, \omega_{2t})$, the mediator chooses $r_t(a^t)$ for each $a^t$ as follows: If $\omega_{1t} = \omega_{2t} = R$, then she draws $r_t(a^t)$ according to $\mu(a^t)(r)$ if $\omega_{1\tau} = \omega_{2\tau} = R$ for each $\tau \leq t-1$, and draws $r_t(a^t)$ according to $\mathring\mu(r)$ if there exists $\tau \leq t-1$ with $\omega_{1\tau} = P$ or $\omega_{2\tau} = P$. If $\omega_{it} = R$ and $\omega_{jt} = P$, then she draws $r_{it}(a^t)$ from $\Pr^{\mu}(r_i \mid r_{jt}^{\text{punish}}(a^t))$, while she draws $r_{jt}(a^t)$ uniformly at random from $A_j$.² Finally, if $\omega_{1t} = \omega_{2t} = P$, then she draws $r_{it}(a^t)$ uniformly at random from $A_i$ for each $i$ independently. Since $\mu$ has full support, $\mu^*$ is well defined.

²As will be seen below, if $\omega_{jt} = P$, then player $j$ is supposed to take $r_{jt}^{\text{punish}}(a^t)$. Hence, $r_{jt}(a^t)$ does not affect the equilibrium action. We define $r_{jt}(a^t)$ so that, when the mediator sends a message only at the beginning of the game (in the game with perfect monitoring with cheap talk), she sends a "dummy recommendation" $r_{jt}(a^t)$, so that player $j$ does not realize that $\omega_{jt} = P$ until period $t$.
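For readers who find the case analysis easier to follow as pseudocode, here is a minimal Python sketch of the mediator's period-$t$ draw under $\mu^*$. The sampler arguments stand in for $\mu(a^t)$, $\mathring\mu$, and the conditional draw $\Pr^{\mu}(r_i \mid r_{jt}^{\text{punish}}(a^t))$; they, the function name, and the action sets are our own hypothetical stand-ins, not the paper's notation.

```python
import random

# Sketch of the mediator's period-t draw under mu*. The three samplers are
# placeholders for mu(a^t), mu_ring, and Pr^mu(r_i | r_j^punish); the action
# sets are hypothetical two-action sets.
A1, A2 = ["C", "D"], ["C", "D"]

def draw_period_t(p_t, all_R_so_far, draw_mu, draw_mu_ring, draw_given_punish):
    omega = tuple("P" if random.random() < p_t else "R" for _ in range(2))
    if omega == ("R", "R"):
        # follow mu while (R, R) has always realized, mu_ring afterwards
        rec = draw_mu() if all_R_so_far else draw_mu_ring()
    elif omega == ("R", "P"):
        # player 1: conditional draw given player 2's punish action;
        # player 2: a uniform "dummy" recommendation (she plays r_2^punish)
        rec = (draw_given_punish(1), random.choice(A2))
    elif omega == ("P", "R"):
        rec = (random.choice(A1), draw_given_punish(2))
    else:
        rec = (random.choice(A1), random.choice(A2))  # both uniform dummies
    return omega, rec
```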
As will be seen, we will take $p_t$ sufficiently small. In addition, recall that $\eta > 0$ (the perturbation from $\tilde\mu$ to $\mu$) is arbitrary. In the next subsection and onward, we construct an equilibrium with perfect monitoring with cheap talk that has the same equilibrium action distribution as $\mu^*$. Since $p_t$ is small and $\eta > 0$ is arbitrary, constructing such an equilibrium suffices to prove Proposition 2.
At the start of the game, the mediator draws $\omega_t$, $r_{it}^{\text{punish}}(a^t)$, and $r_t(a^t)$ for each $i$, $t$, and $a^t$. Given them, the mediator sends messages to the players as follows:

(i) At the start of the game, the mediator sends $((r_{it}^{\text{punish}}(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$ to player $i$.

(ii) In each period $t$, the stage game proceeds as follows:

(a) The mediator decides $\bar\omega_t(a^t) \in \{R, P\}^2$ as follows: if there is no unilateral deviator (defined below), then the mediator sets $\bar\omega_t(a^t) = \omega_t$. If instead player $i$ is a unilateral deviator, then the mediator sets $\bar\omega_{it}(a^t) = R$ and $\bar\omega_{jt}(a^t) = P$.

(b) Given $\bar\omega_{it}(a^t)$, the mediator sends $\bar\omega_{it}(a^t)$ to player $i$. In addition, if $\bar\omega_{it}(a^t) = R$, then the mediator sends $r_{it}(a^t)$ to player $i$ as well.

(c) Given these messages, player $i$ takes an action. In equilibrium, if player $i$ has not yet deviated, then player $i$ takes $r_{it}(a^t)$ if $\bar\omega_{it}(a^t) = R$ and takes $r_{it}^{\text{punish}}(a^t)$ if $\bar\omega_{it}(a^t) = P$. For notational convenience, let

$$r_{it} = \begin{cases} r_{it}\bigl(a^t\bigr) & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = R \\ r_{it}^{\text{punish}}\bigl(a^t\bigr) & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = P \end{cases}$$

be the action that player $i$ is supposed to take if she has not yet deviated. Her strategy after her own deviation is not specified.
We say that player $i$ has unilaterally deviated if there exist $\tau \leq t-1$ and a unique $i$ such that (i) for each $\tau' < \tau$, we have $a_{n\tau'} = r_{n\tau'}$ for each $n \in \{1, 2\}$ (no deviation happened up to period $\tau-1$) and (ii) $a_{i\tau} \neq r_{i\tau}$ and $a_{j\tau} = r_{j\tau}$ (player $i$ deviates in period $\tau$ and player $j$ does not deviate).
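For concreteness, the mediator's test for a unilateral deviator can be written as a short Python sketch of the definition above (the function and argument names are ours):

```python
def unilateral_deviator(actions, recs):
    """Scan the history; return 1 or 2 for a unique unilateral deviator, or
    None. 'actions' and 'recs' are equal-length lists of profiles (pairs);
    a sketch of the definition above, with our own function/argument names."""
    for a, r in zip(actions, recs):
        deviators = [n for n in (0, 1) if a[n] != r[n]]
        if len(deviators) == 1:
            return deviators[0] + 1   # exactly one player deviated first
        if len(deviators) == 2:
            return None               # simultaneous deviation: not unilateral
    return None                       # no deviation so far
```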
Note that $\mu^*$ is close to $\mu$ on the equilibrium path for sufficiently small $p_t$. Hence, on-path strict incentive compatibility for player $i$ follows from (S1). Moreover, the incentive compatibility condition analogous to (S2) also holds.
Lemma S1. There exists $\{p_t\}_{t=1}^{\infty}$ with $p_t > 0$ for each $t$ such that it is strictly optimal for each player $i$ to follow her recommendation: For each player $i$ and history

$$h_i^t \equiv \Bigl(\bigl(\bigl(r_{it}^{\text{punish}}\bigl(a^t\bigr)\bigr)_{a^t\in A^{t-1}}\bigr)_{t=1}^{\infty},\ a^t,\ \bigl(\bar\omega_\tau\bigl(a^\tau\bigr)\bigr)_{\tau=1}^{t-1},\ \bar\omega_{it}\bigl(a^t\bigr),\ (r_{i\tau})_{\tau=1}^{t}\Bigr),$$

if player $i$ herself has not yet deviated, we have the following two inequalities:

(i) If a deviation is punished by $\alpha_j^{\varepsilon_t}$ for the next $T_t$ periods with probability $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$, then it is strictly unprofitable:

$$\begin{aligned}
&(1-\delta)\,E^{\mu^*}\bigl[u_i(r_{it}, a_{jt}) \mid h_i^t\bigr] + \delta\,E^{\mu^*}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau}, a_{j\tau}) \Bigm| h_i^t, a_{it} = r_{it}\Biggr] \\
&\quad > \max_{a_i\in A_i}(1-\delta)\,E^{\mu^*}\bigl[u_i(a_i, a_{jt}) \mid h_i^t\bigr] \\
&\qquad + \bigl(\delta-\delta^{T_t}\bigr)\Biggl[\Biggl(1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1}p_\tau\Biggr)\max_{\hat a_i}u_i\bigl(\hat a_i, \alpha_j^{\varepsilon_t}\bigr) + \Biggl(\varepsilon_t+\sum_{\tau=t}^{t+T_t-1}p_\tau\Biggr)\max_{a\in A}u_i(a)\Biggr] \\
&\qquad + \delta^{T_t}\max_{a\in A}u_i(a).
\end{aligned} \tag{S3}$$

(ii) If a deviation is punished by $\alpha_j^{\varepsilon_t}$ from the current period with probability $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$, then it is strictly unprofitable:

$$\begin{aligned}
&(1-\delta)\,E^{\mu^*}\bigl[u_i(r_{it}, a_{jt}) \mid h_i^t\bigr] + \delta\,E^{\mu^*}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau}, a_{j\tau}) \Bigm| h_i^t, a_{it} = r_{it}\Biggr] \\
&\quad > \bigl(1-\delta^{T_t}\bigr)\Biggl[\Biggl(1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1}p_\tau\Biggr)\max_{\hat a_i}u_i\bigl(\hat a_i, \alpha_j^{\varepsilon_t}\bigr) + \Biggl(\varepsilon_t+\sum_{\tau=t}^{t+T_t-1}p_\tau\Biggr)\max_{a\in A}u_i(a)\Biggr] \\
&\qquad + \delta^{T_t}\max_{a\in A}u_i(a).
\end{aligned} \tag{S4}$$

Moreover, $E^{\mu^*}$ does not depend on the specification of player $j$'s strategy after player $j$'s own deviation, for each history $h_i^t$ such that player $i$ has not deviated.
Proof. Given $\mathring\mu$, since $u(\mathring\mu) \in \mathring W^*$, for sufficiently small $\varepsilon_t > 0$, we have

$$\begin{aligned}
&(1-\delta)\,E^{\mathring\mu}\bigl[u_i(r_t) \mid h_m^t, r_{it}\bigr] + \delta u_i(\mathring\mu) \\
&\quad > \max_{a_i\in A_i}(1-\delta)\,E\bigl[u_i(a_i, r_{-it}) \mid h_m^t, r_{it}\bigr] \\
&\qquad + \bigl(\delta-\delta^{T_t}\bigr)\Bigl[(1-\varepsilon_t)\max_{\hat a_i}u_i\bigl(\hat a_i, \alpha_j^{\varepsilon_t}\bigr) + \varepsilon_t\max_{a\in A}u_i(a)\Bigr] + \delta^{T_t}\max_{a\in A}u_i(a)
\end{aligned}$$

and

$$(1-\delta)\,E^{\mathring\mu}\bigl[u_i(r_t) \mid h_m^t, r_{it}\bigr] + \delta u_i(\mathring\mu) > \bigl(1-\delta^{T_t}\bigr)\Bigl[(1-\varepsilon_t)\max_{\hat a_i}u_i\bigl(\hat a_i, \alpha_j^{\varepsilon_t}\bigr) + \varepsilon_t\max_{a\in A}u_i(a)\Bigr] + \delta^{T_t}\max_{a\in A}u_i(a).$$

Hence (S1) and (S2) hold with $\mu$ replaced by $\mathring\mu$.

Since $\mu^*$ has full support on the equilibrium path, a player $i$ who has not yet deviated always believes that player $j$ has not deviated. Hence, $E^{\mu^*}$ is well defined without specifying player $j$'s strategy after player $j$'s own deviation.
Moreover, since $p_t$ is small and $\omega_{jt}$ is independent of $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{it}$, given $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t-1}$ and $\bar\omega_{it}(a^t)$ (which are equal to $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{it}$ on path), player $i$ believes that $\bar\omega_{jt}(a^t)$ is equal to $\omega_{jt}$ and $\omega_{jt}$ is equal to $R$ with a high probability, unless player $i$ has deviated. Since

$$\Pr^{\mu^*}\bigl(r_{jt} \mid \bar\omega_{it}\bigl(a^t\bigr), \bar\omega_{jt}\bigl(a^t\bigr) = R, h_i^t\bigr) = \Pr^{\mu^*}\bigl(r_{jt} \mid a^t, r_{it}\bigr),$$

we have that the difference

$$E^{\mu^*}\bigl[u_i(r_{it}, a_{jt}) \mid h_i^t\bigr] - E^{\mu}\bigl[u_i(r_{it}, a_{jt}) \mid a^t, r_{it}\bigr]$$

is small for small $p_t$.
Further, if $p_\tau$ is small for each $\tau \geq t+1$, then since $\omega_\tau$ is independent of $\omega_{t'}$ with $t' \leq \tau-1$, regardless of $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t}$, player $i$ believes that $\bar\omega_{i\tau}(a^\tau) = \bar\omega_{j\tau}(a^\tau) = R$ with high probability for $\tau \geq t+1$ on the equilibrium path. Since the distribution of the recommendation given $\mu^*$ is the same as that of $\mu$ (or $\mathring\mu$) given $a^\tau$ and $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) = (R, R)$ for each $\tau \leq t-1$ (or $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) \neq (R, R)$ for some $\tau \leq t-1$, respectively), we have that

1. for each $h_i^t$ with $\bar\omega_{it}(a^t) = R$ and $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) = (R, R)$ for each $\tau \leq t-1$,

$$E^{\mu^*}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau}, a_{j\tau}) \Bigm| h_i^t, a_{it} = r_{it}\Biggr] - E^{\mu}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau}, a_{j\tau}) \Bigm| a^t, r_{it}\Biggr]$$

is small for small $p_\tau$ with $\tau \geq t+1$; and

2. for each $h_i^t$ with $\bar\omega_{it}(a^t) = R$ and $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) \neq (R, R)$ for some $\tau \leq t-1$,

$$E^{\mu^*}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau}, a_{j\tau}) \Bigm| h_i^t, a_{it} = r_{it}\Biggr] - E^{\mathring\mu}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau}, a_{j\tau}) \Bigm| a^t, r_{it}\Biggr]$$

is small for small $p_\tau$ with $\tau \geq t+1$.
Hence, (S1) and (S2) (and the same inequalities with $\mu$ replaced by $\mathring\mu$) imply that there exists $\bar p_t > 0$ such that, if $p_\tau \leq \bar p_t$ for each $\tau \geq t$, then the claims of the lemma hold. Hence, if we take $p_t \leq \min_{\tau\leq t}\bar p_\tau$, then the claims hold.

We fix $\{p_t\}_{t=1}^{\infty}$ so that Lemma S1 holds. This fully pins down $\mu^*$ with mediated perfect monitoring.
Construction with perfect monitoring with cheap talk
Given $\mu^*$ with mediated perfect monitoring, we define the equilibrium strategy with perfect monitoring with cheap talk such that the equilibrium action distribution is the same as that of $\mu^*$. We must pin down the following four objects: at the beginning of the game, what message $m_i^{\text{mediator}}$ player $i$ receives from the mediator; what message $m_{it}^{\text{1st}}$ player $i$ sends at the beginning of period $t$; what action $a_{it}$ player $i$ takes in period $t$; and what message $m_{it}^{\text{2nd}}$ player $i$ sends at the end of period $t$.
Intuitive argument. As in $\mu^*$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{it}^{\text{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, with $p_t > 0$ pinned down in Lemma S1, she draws $\omega_t \in \{R, P\}^2$ and $r_t(a^t)$ as in $\mu^*$ for each $t$ and $a^t$. She then defines $\bar\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\mu^*$.
Intuitively, the mediator sends all the information about

$$\Bigl(\bar\omega_t\bigl(a^t\bigr),\ r_t\bigl(a^t\bigr),\ r_{1t}^{\text{punish}}\bigl(a^t\bigr),\ r_{2t}^{\text{punish}}\bigl(a^t\bigr)\Bigr)_{a^t\in A^{t-1},\ t=1}^{\infty}$$

through the initial messages $(m_1^{\text{mediator}}, m_2^{\text{mediator}})$. In particular, the mediator directly sends $((r_{it}^{\text{punish}}(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$ to player $i$ as a part of $m_i^{\text{mediator}}$. Hence, we focus on how we replicate the role of the mediator in $\mu^*$ of sending $(\bar\omega_t(a^t), r_t(a^t))$ in each period, depending on the realized history $a^t$.
The key features to establish are that (i) player $i$ does not know the instructions for the other player, (ii) before player $i$ reaches period $t$, player $i$ does not know her own recommendations for periods $\tau \geq t$ (otherwise, player $i$ would obtain more information than in the original equilibrium $\mu^*$ and thus might want to deviate), and (iii) no player wants to deviate (in particular, if player $i$ deviates in actions or cheap talk, then the strategy of player $j$ is as if the state were $\bar\omega_{jt} = P$ in $\mu^*$, for a sufficiently long time with a sufficiently high probability).
The properties (i) and (ii) are achieved by the same mechanism as in Theorem 9 of Heller et al. (2012, henceforth HST). In particular, without loss of generality, let $A_i = \{1_i, \ldots, n_i\}$ be player $i$'s action set. We can view $r_{it}(a^t)$ as an element of $\{1, \ldots, n_i\}$. The mediator at the beginning of the game draws $r_t(a^t)$ for each $a^t$.

Instead of sending $r_{it}(a^t)$ directly to player $i$, the mediator encodes $r_{it}(a^t)$ as follows: For a sufficiently large $N^t \in \mathbb{Z}$ to be determined, we define $p^t = N^t n_i n_j$. This $p^t$ corresponds to $p_h$ in HST. Let $\mathbb{Z}_{p^t} \equiv \{1, \ldots, p^t\}$. The mediator draws $x_{it}^j(a^t)$ uniformly and independently from $\mathbb{Z}_{p^t}$ for each $i$, $t$, and $a^t$. Given them, she defines

$$y_{it}^i\bigl(a^t\bigr) \equiv x_{it}^j\bigl(a^t\bigr) + r_{it}\bigl(a^t\bigr) \pmod{n_i}. \tag{S5}$$
Intuitively, $y_{it}^i(a^t)$ is the "encoded instruction" for $r_{it}(a^t)$, and to obtain $r_{it}(a^t)$ from $y_{it}^i(a^t)$, player $i$ needs to know $x_{it}^j(a^t)$. The mediator gives $((y_{it}^i(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$ to player $i$ as a part of $m_i^{\text{mediator}}$. At the same time, she gives $((x_{it}^j(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$ to player $j$ as a part of $m_j^{\text{mediator}}$. At the beginning of period $t$, player $j$ sends $x_{it}^j(a^t)$ by cheap talk as a part of $m_{jt}^{\text{1st}}$, based on the realized action history $a^t$, so that player $i$ does not know $r_{it}(a^t)$ until period $t$. (Throughout the proof, the superscript of a variable represents who is informed about the variable, and the subscript represents whose recommendation the variable is about.)
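Concretely, (S5) is a one-time pad over $\mathbb{Z}_{n_i}$: because $n_i$ divides $p^t$, the pad $x_{it}^j(a^t)$ reduced mod $n_i$ is uniform, so $y_{it}^i(a^t)$ alone is uninformative about $r_{it}(a^t)$. A minimal Python sketch (ours, with hypothetical sizes):

```python
import random

# One-time-pad reading of (S5): y = x + r (mod n_i). Player i holds y,
# player j holds the key x. Sizes are hypothetical: n_i = 3, n_j = 2, N^t = 10.
n_i = 3
p_t = 10 * n_i * 2            # p^t = N^t * n_i * n_j, a multiple of n_i

r = random.randrange(n_i)     # recommendation r_it(a^t), an element of A_i
x = random.randrange(p_t)     # pad x_it^j(a^t), known only to player j
y = (x + r) % n_i             # encoded instruction y_it^i(a^t), sent to player i

# Because n_i divides p^t, x mod n_i is uniform, so y alone reveals nothing
# about r. Once player j reveals x by cheap talk, player i decodes as in (S11):
assert (y - x) % n_i == r
```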
To incentivize player $j$ to tell the truth, the equilibrium should embed a mechanism that punishes player $j$ if she tells a lie. In HST, this is done as follows: The mediator draws $\alpha_{it}^i(a^t)$ and $\beta_{it}^i(a^t)$ uniformly and independently from $\mathbb{Z}_{p^t}$, and defines

$$u_{it}^j\bigl(a^t\bigr) \equiv \alpha_{it}^i\bigl(a^t\bigr) \times x_{it}^j\bigl(a^t\bigr) + \beta_{it}^i\bigl(a^t\bigr) \pmod{p^t}. \tag{S6}$$

The mediator gives $x_{it}^j(a^t)$ and $u_{it}^j(a^t)$ to player $j$, while she gives $\alpha_{it}^i(a^t)$ and $\beta_{it}^i(a^t)$ to player $i$. In period $t$, player $j$ is supposed to send $x_{it}^j(a^t)$ and $u_{it}^j(a^t)$ to player $i$. If player $i$ receives $\hat x_{it}^j(a^t)$ and $\hat u_{it}^j(a^t)$ with

$$\hat u_{it}^j\bigl(a^t\bigr) \neq \alpha_{it}^i\bigl(a^t\bigr) \times \hat x_{it}^j\bigl(a^t\bigr) + \beta_{it}^i\bigl(a^t\bigr) \pmod{p^t}, \tag{S7}$$

then player $i$ interprets that player $j$ has deviated. For sufficiently large $N^t$, since player $j$ does not know $\alpha_{it}^i(a^t)$ and $\beta_{it}^i(a^t)$, if player $j$ tells a lie about $x_{it}^j(a^t)$, then with a high probability, player $j$ creates a situation where (S7) holds.
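In modern terms, (S6) is a one-time affine authentication tag over $\mathbb{Z}_{p^t}$. The following Python sketch (ours, with hypothetical sizes) illustrates why a misreported pad triggers (S7) with probability close to one:

```python
import random

# Affine tag (S6): u = alpha * x + beta (mod p^t). Player i holds the keys
# (alpha, beta); player j must report the pair (x, u). Sizes are hypothetical.
p_t = 60_000

alpha, beta = random.randrange(p_t), random.randrange(p_t)  # player i's keys
x = random.randrange(p_t)                                   # player j's pad
u = (alpha * x + beta) % p_t                                # player j's tag

def passes(x_hat, u_hat):
    # player i's test; the event that it fails is exactly (S7)
    return u_hat == (alpha * x_hat + beta) % p_t

assert passes(x, u)  # the truthful report always passes

# A lie x_hat != x, sent with the only tag player j knows, passes only if
# alpha * (x_hat - x) = 0 (mod p^t), here an event of probability 1/p^t:
print("lie detected:", not passes((x + 1) % p_t, u))
```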
Since HST consider Nash equilibrium, they let player $i$ minmax player $j$ forever after (S7) holds. However, since we consider sequential equilibrium, as in the proof of Lemma 2, we will create a coordination mechanism such that, if player $j$ tells a lie, then with high probability player $i$ minmaxes player $j$ for a long time while player $i$ assigns probability 0 to the event that she is punishing player $j$.
To this end, we consider the following coordination: First, if and only if $\bar\omega_{it}(a^t) = R$, the mediator defines $u_{it}^j(a^t)$ by (S6). Otherwise, $u_{it}^j(a^t)$ is randomly drawn. That is,

$$u_{it}^j\bigl(a^t\bigr) \equiv \begin{cases} \alpha_{it}^i\bigl(a^t\bigr) \times x_{it}^j\bigl(a^t\bigr) + \beta_{it}^i\bigl(a^t\bigr) \pmod{p^t} & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = R \\ \text{uniformly distributed over } \mathbb{Z}_{p^t} & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = P. \end{cases} \tag{S8}$$

Since both $\bar\omega_{it}(a^t) = R$ and $\bar\omega_{it}(a^t) = P$ happen with a positive probability, player $i$, after receiving $\hat u_{it}^j(a^t)$ with $\hat u_{it}^j(a^t) \neq \alpha_{it}^i(a^t) \times \hat x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t}$, interprets that $\bar\omega_{it}(a^t) = P$. For notational convenience, let $\hat\omega_{it}(a^t) \in \{R, P\}$ be player $i$'s interpretation of $\bar\omega_{it}(a^t)$. After $\hat\omega_{it}(a^t) = P$, she takes her period-$t$ action according to $r_{it}^{\text{punish}}(a^t)$. Given this inference, if player $j$ tells a lie about $u_{it}^j(a^t)$ when $\bar\omega_{it}(a^t) = R$, then with a high probability she induces a situation with $\hat u_{it}^j(a^t) \neq \alpha_{it}^i(a^t) \times \hat x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t}$, and player $i$ punishes player $j$ in period $t$ (without noticing player $j$'s deviation).
Second, switching to $r_{it}^{\text{punish}}(a^t)$ for period $t$ only may not suffice if player $j$ believes that player $i$'s action distribution given $\bar\omega_{it}(a^t) = R$ is close to the minmax strategy. Hence, we ensure that, once player $j$ deviates, player $i$ takes $r_{i\tau}^{\text{punish}}(a^\tau)$ for a sufficiently long time.
To this end, we change the mechanism so that player $j$ does not always know $u_{it}^j(a^t)$. Instead, the mediator draws $p^t$ independent random variables $v_{it}^j(n, a^t)$ with $n = 1, \ldots, p^t$ uniformly from $\mathbb{Z}_{p^t}$. In addition, she draws $n_{it}^i(a^t)$ uniformly from $\mathbb{Z}_{p^t}$. The mediator defines $u_{it}^j(n, a^t)$ for each $n = 1, \ldots, p^t$ as

$$u_{it}^j\bigl(n, a^t\bigr) = \begin{cases} u_{it}^j\bigl(a^t\bigr) & \text{if } n = n_{it}^i\bigl(a^t\bigr) \\ v_{it}^j\bigl(n, a^t\bigr) & \text{otherwise;} \end{cases}$$

that is, $u_{it}^j(n, a^t)$ corresponds to $u_{it}^j(a^t)$ with (S8) only if $n = n_{it}^i(a^t)$. For other $n$, $u_{it}^j(n, a^t)$ is completely random.
The mediator sends $n_{it}^i(a^t)$ to player $i$, and sends $\{u_{it}^j(n, a^t)\}_{n\in\mathbb{Z}_{p^t}}$ to player $j$. In addition, the mediator sends $n_{it}^j(a^t)$ to player $j$, where

$$n_{it}^j\bigl(a^t\bigr) = \begin{cases} n_{it}^i\bigl(a^t\bigr) & \text{if } \bar\omega_{it-1}\bigl(a^{t-1}\bigr) = P \\ \text{uniformly distributed over } \mathbb{Z}_{p^t} & \text{if } \bar\omega_{it-1}\bigl(a^{t-1}\bigr) = R \end{cases}$$

is equal to $n_{it}^i(a^t)$ if and only if last period's $\bar\omega_{it-1}(a^{t-1})$ is equal to $P$.
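The indexed table can be sketched the same way (ours, hypothetical sizes): player $j$ holds the entire list, but without the secret index a guessed entry passes player $i$'s check only with probability on the order of $1/p^t$:

```python
import random

# Player j holds the full table; only the entry at the secret slot n_it^i
# is the genuine tag from (S8). Sizes are hypothetical.
p = 1_000
alpha, beta, x = (random.randrange(p) for _ in range(3))
true_tag = (alpha * x + beta) % p

n_secret = random.randrange(p)                   # n_it^i, given to player i
table = [random.randrange(p) for _ in range(p)]  # v_it^j(n, a^t) entries
table[n_secret] = true_tag                       # genuine tag at the secret slot

# Knowing the index (from player i's m^2nd or the mediator's n_it^j),
# player j reports a tag that passes player i's check:
assert table[n_secret] == (alpha * x + beta) % p

# Without the index, guessing a slot uniformly passes only rarely:
trials = 100_000
hits = sum(table[random.randrange(p)] == true_tag for _ in range(trials))
print(f"pass rate without the index: {hits / trials:.4f} (order 1/p = {1 / p})")
```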
In period $t$, player $j$ is asked to send $x_{it}^j(a^t)$ and $u_{it}^j(n, a^t)$ with $n = n_{it}^i(a^t)$, that is, to send $x_{it}^j(a^t)$ and $u_{it}^j(a^t)$. If and only if player $j$'s messages $\hat x_{it}^j(a^t)$ and $\hat u_{it}^j(a^t)$ satisfy

$$\hat u_{it}^j\bigl(a^t\bigr) = \alpha_{it}^i\bigl(a^t\bigr) \times \hat x_{it}^j\bigl(a^t\bigr) + \beta_{it}^i\bigl(a^t\bigr) \pmod{p^t},$$

player $i$ interprets $\hat\omega_{it}(a^t) = R$. If player $i$ has $\hat\omega_{it}(a^t) = R$, then player $i$ knows that player $j$ needs to know $n_{it+1}^i(a^{t+1})$ to send the correct $u_{it+1}^j(n, a^{t+1})$ in the next period. Hence, she sends $n_{it+1}^i(a^{t+1})$ to player $j$. If player $i$ has $\hat\omega_{it}(a^t) = P$, then she believes that player $j$ knows $n_{it+1}^i(a^{t+1})$ (from the mediator's $n_{it+1}^j(a^{t+1})$) and does not send $n_{it+1}^i(a^{t+1})$.
Given this coordination, once player $j$ creates a situation with $\bar\omega_{it}(a^t) = R$ but $\hat\omega_{it}(a^t) = P$, player $j$ cannot receive $n_{it+1}^i(a^{t+1})$. Without knowing $n_{it+1}^i(a^{t+1})$, with a large $N^{t+1}$, with a high probability, player $j$ cannot know which $u_{it+1}^j(n, a^{t+1})$ she should send. Then, again, she will create a situation with

$$\hat u_{it+1}^j\bigl(a^{t+1}\bigr) \neq \alpha_{it+1}^i\bigl(a^{t+1}\bigr) \times \hat x_{it+1}^j\bigl(a^{t+1}\bigr) + \beta_{it+1}^i\bigl(a^{t+1}\bigr) \pmod{p^{t+1}},$$

that is, $\hat\omega_{it+1}(a^{t+1}) = P$. Recursively, player $i$ has $\hat\omega_{i\tau}(a^\tau) = P$ for a long time with a high probability if player $j$ tells a lie.
Finally, if player $j$ takes a deviant action in period $t$, then the mediator has drawn $\bar\omega_{i\tau}(a^\tau) = P$ for each $\tau \geq t+1$ for the $a^\tau$ corresponding to the realized history. With $\bar\omega_{i\tau}(a^\tau) = P$, so as to avoid $\hat\omega_{i\tau}(a^\tau) = P$, player $j$ needs to create a situation

$$\hat u_{i\tau}^j\bigl(a^\tau\bigr) = \alpha_{i\tau}^i\bigl(a^\tau\bigr) \times \hat x_{i\tau}^j\bigl(a^\tau\bigr) + \beta_{i\tau}^i\bigl(a^\tau\bigr) \pmod{p^\tau}$$

without knowing $\alpha_{i\tau}^i(a^\tau)$ and $\beta_{i\tau}^i(a^\tau)$, while the mediator's message does not tell her what $\alpha_{i\tau}^i(a^\tau) \times x_{i\tau}^j(a^\tau) + \beta_{i\tau}^i(a^\tau) \pmod{p^\tau}$ is, by (S8). Hence, for sufficiently large $N^\tau$, player $j$ cannot avoid $\hat\omega_{i\tau}(a^\tau) = P$ with a nonnegligible probability. Hence, player $j$ will be minmaxed from the next period with a high probability.

In total, the above argument shows that if player $j$ deviates, whether in communication or in actions, then she will be minmaxed for a sufficiently long time. Lemma S1 ensures that player $j$ does not want to tell a lie or take a deviant action.
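A toy simulation (ours, with hypothetical parameters) traces this recursion: after one detected lie, player $j$ never relearns the secret index, so she fails each subsequent check with probability close to one:

```python
import random

# Toy run: player j lies in period 0; afterwards she never relearns the
# secret index, so she keeps triggering omega_hat = P. Parameters hypothetical.
p, T = 10_000, 5

def failed_checks():
    knows_index, out = True, []
    for t in range(T):
        alpha, beta, x = (random.randrange(p) for _ in range(3))
        true_tag = (alpha * x + beta) % p
        n_secret = random.randrange(p)
        table = [random.randrange(p) for _ in range(p)]
        table[n_secret] = true_tag
        if t == 0:
            sent = random.randrange(p)         # the initial lie about the tag
        else:
            sent = table[n_secret] if knows_index else table[random.randrange(p)]
        if sent != true_tag:                   # player i's check fails: omega_hat = P
            knows_index = False                # player i babbles: no index handoff
        out.append(sent != true_tag)
    return out

print(failed_checks())   # typically [True, True, True, True, True]
```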
Formal construction. Let us formalize the above construction: As in $\mu^*$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{it}^{\text{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$; then she draws $\omega_t \in \{R, P\}^2$ and $r_t(a^t)$ for each $t$ and $a^t$; and then she defines $\bar\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\mu^*$. For each $t$ and $a^t$, she draws $x_{it}^j(a^t)$ uniformly and independently from $\mathbb{Z}_{p^t}$. Given them, she defines

$$y_{it}^i\bigl(a^t\bigr) \equiv x_{it}^j\bigl(a^t\bigr) + r_{it}\bigl(a^t\bigr) \pmod{n_i}$$

so that (S5) holds.

The mediator draws $\alpha_{it}^i(a^t)$, $\beta_{it}^i(a^t)$, $\tilde u_{it}^j(a^t)$, $v_{it}^j(n, a^t)$ for each $n \in \mathbb{Z}_{p^t}$, $n_{it}^i(a^t)$, and $\tilde n_{it}^j(a^t)$ from the uniform distribution over $\mathbb{Z}_{p^t}$, independently for each player $i$, each period $t$, and each $a^t$.
As in (S8), the mediator defines

$$u_{it}^j\bigl(a^t\bigr) \equiv \begin{cases} \alpha_{it}^i\bigl(a^t\bigr) \times x_{it}^j\bigl(a^t\bigr) + \beta_{it}^i\bigl(a^t\bigr) \pmod{p^t} & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = R \\ \tilde u_{it}^j\bigl(a^t\bigr) & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = P. \end{cases}$$

In addition, the mediator defines

$$u_{it}^j\bigl(n, a^t\bigr) = \begin{cases} u_{it}^j\bigl(a^t\bigr) & \text{if } n = n_{it}^i\bigl(a^t\bigr) \\ v_{it}^j\bigl(n, a^t\bigr) & \text{otherwise} \end{cases}$$

and

$$n_{it}^j\bigl(a^t\bigr) = \begin{cases} n_{it}^i\bigl(a^t\bigr) & \text{if } t = 1 \text{ or } \bar\omega_{it-1}\bigl(a^{t-1}\bigr) = P \\ \tilde n_{it}^j\bigl(a^t\bigr) & \text{if } t \neq 1 \text{ and } \bar\omega_{it-1}\bigl(a^{t-1}\bigr) = R, \end{cases}$$

as explained above.
Let us now define the equilibrium:

(i) At the beginning of the game, the mediator sends

$$m_i^{\text{mediator}} = \Bigl(y_{it}^i\bigl(a^t\bigr),\ \alpha_{it}^i\bigl(a^t\bigr),\ \beta_{it}^i\bigl(a^t\bigr),\ r_{it}^{\text{punish}}\bigl(a^t\bigr),\ n_{it}^i\bigl(a^t\bigr),\ n_{jt}^i\bigl(a^t\bigr),\ \bigl(u_{jt}^i\bigl(n, a^t\bigr)\bigr)_{n\in\mathbb{Z}_{p^t}},\ x_{jt}^i\bigl(a^t\bigr)\Bigr)_{a^t\in A^{t-1},\ t=1}^{\infty}$$

to each player $i$.
(ii) In each period $t$, the stage game proceeds as follows: In equilibrium,

$$m_{jt}^{\text{1st}} = \begin{cases} \bigl(u_{it}^j\bigl(m_{it-1}^{\text{2nd}}, a^t\bigr),\ x_{it}^j\bigl(a^t\bigr)\bigr) & \text{if } t \neq 1 \text{ and } m_{it-1}^{\text{2nd}} \neq \{\text{babble}\} \\ \bigl(u_{it}^j\bigl(n_{it}^j\bigl(a^t\bigr), a^t\bigr),\ x_{it}^j\bigl(a^t\bigr)\bigr) & \text{if } t = 1 \text{ or } m_{it-1}^{\text{2nd}} = \{\text{babble}\} \end{cases} \tag{S9}$$

and

$$m_{jt}^{\text{2nd}} = \begin{cases} n_{jt+1}^j\bigl(a^{t+1}\bigr) & \text{if } \hat\omega_{jt}\bigl(a^t\bigr) = R \\ \{\text{babble}\} & \text{if } \hat\omega_{jt}\bigl(a^t\bigr) = P. \end{cases}$$

Note that, since $m_{jt}^{\text{2nd}}$ is sent at the end of period $t$, the players know $a^{t+1} = (a_1, \ldots, a_t)$.
(a) Given player $i$'s history $(m_i^{\text{mediator}}, (m_\tau^{\text{1st}}, a_\tau, m_\tau^{\text{2nd}})_{\tau=1}^{t-1})$, each player $i$ sends the first message $m_{it}^{\text{1st}}$ simultaneously. If player $i$ herself has not yet deviated, then

$$m_{it}^{\text{1st}} = \begin{cases} \bigl(u_{jt}^i\bigl(m_{jt-1}^{\text{2nd}}, a^t\bigr),\ x_{jt}^i\bigl(a^t\bigr)\bigr) & \text{if } t \neq 1 \text{ and } m_{jt-1}^{\text{2nd}} \neq \{\text{babble}\} \\ \bigl(u_{jt}^i\bigl(n_{jt}^i\bigl(a^t\bigr), a^t\bigr),\ x_{jt}^i\bigl(a^t\bigr)\bigr) & \text{if } t = 1 \text{ or } m_{jt-1}^{\text{2nd}} = \{\text{babble}\}. \end{cases}$$

Let $m_{it}^{\text{1st}}(u)$ be the first element of $m_{it}^{\text{1st}}$ (that is, either $u_{jt}^i(m_{jt-1}^{\text{2nd}}, a^t)$ or $u_{jt}^i(n_{jt}^i(a^t), a^t)$ in equilibrium), and let $m_{it}^{\text{1st}}(x)$ be the second element ($x_{jt}^i(a^t)$ in equilibrium). As a result, the profile of the messages $m_t^{\text{1st}}$ becomes common knowledge.

If

$$m_{jt}^{\text{1st}}(u) \neq \alpha_{it}^i\bigl(a^t\bigr) \times m_{jt}^{\text{1st}}(x) + \beta_{it}^i\bigl(a^t\bigr) \pmod{p^t}, \tag{S10}$$

then player $i$ interprets $\hat\omega_{it}(a^t) = P$. Otherwise, $\hat\omega_{it}(a^t) = R$.
(b) Given player $i$'s history $(m_i^{\text{mediator}}, (m_\tau^{\text{1st}}, a_\tau, m_\tau^{\text{2nd}})_{\tau=1}^{t-1}, m_t^{\text{1st}})$, each player $i$ takes action $a_{it}$ simultaneously. If player $i$ herself has not yet deviated, then player $i$ takes $a_{it} = r_{it}$ with

$$r_{it} = \begin{cases} y_{it}^i\bigl(a^t\bigr) - m_{jt}^{\text{1st}}(x) \pmod{n_i} & \text{if } \hat\omega_{it}\bigl(a^t\bigr) = R \\ r_{it}^{\text{punish}}\bigl(a^t\bigr) & \text{if } \hat\omega_{it}\bigl(a^t\bigr) = P. \end{cases} \tag{S11}$$

Recall that $y_{it}^i(a^t) \equiv x_{it}^j(a^t) + r_{it}(a^t) \pmod{n_i}$ by (S5). By (S9), therefore, player $i$ takes $r_{it}(a^t)$ if $\bar\omega_{it}(a^t) = R$ and $r_{it}^{\text{punish}}(a^t)$ if $\bar\omega_{it}(a^t) = P$ on the equilibrium path, as in $\mu^*$.
(c) Given player $i$'s history $(m_i^{\text{mediator}}, (m_\tau^{\text{1st}}, a_\tau, m_\tau^{\text{2nd}})_{\tau=1}^{t-1}, m_t^{\text{1st}}, a_t)$, each player $i$ sends the second message $m_{it}^{\text{2nd}}$ simultaneously. If player $i$ herself has not yet deviated, then

$$m_{it}^{\text{2nd}} = \begin{cases} n_{it+1}^i\bigl(a^{t+1}\bigr) & \text{if } \hat\omega_{it}\bigl(a^t\bigr) = R \\ \{\text{babble}\} & \text{if } \hat\omega_{it}\bigl(a^t\bigr) = P. \end{cases}$$

As a result, the profile of the messages $m_t^{\text{2nd}}$ becomes common knowledge. Note that $\bar\omega_t(a^t)$ becomes common knowledge as well on the equilibrium path.
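As a consistency check, one on-path period of this protocol, viewed from player $i$'s side in the case $\hat\omega_{it}(a^t) = \bar\omega_{it}(a^t) = R$, can be traced end to end in Python (a sketch with hypothetical sizes; the variable names are ours):

```python
import random

# One on-path period with omega_bar_it = R: mediator's draws, player j's
# first message (S9), player i's check (S10), and decoding (S11).
n_i, p = 2, 1_000

# mediator's draws at the start of the game (for the realized a^t):
r_true = random.randrange(n_i)                          # r_it(a^t)
x = random.randrange(p)                                 # x_it^j, given to player j
y = (x + r_true) % n_i                                  # y_it^i, given to player i
alpha, beta = random.randrange(p), random.randrange(p)  # given to player i
u = (alpha * x + beta) % p                              # genuine tag, reaches player j

# (a) player j reports truthfully; (S10) check; (b) action choice (S11):
m_u, m_x = u, x
if m_u == (alpha * m_x + beta) % p:     # check passes
    omega_hat = "R"
    a_it = (y - m_x) % n_i              # decode the recommended action
else:
    omega_hat = "P"                     # would switch to r_it^punish instead
assert omega_hat == "R" and a_it == r_true
```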
Incentive compatibility
The above equilibrium has full support: Since $\bar\omega_t(a^t)$ and $r_t(a^t)$ have full support, $(m_1^{\text{mediator}}, m_2^{\text{mediator}})$ have full support as well. Hence, we are left to verify player $i$'s incentive not to deviate from the equilibrium strategy, given that player $i$ believes that player $j$ has not yet deviated after any history of player $i$.
Suppose that player $i$ followed the equilibrium strategy until the end of period $t-1$. First, consider player $i$'s incentive to tell the truth in $m_{it}^{\text{1st}}$. In equilibrium, player $i$ sends

$$m_{it}^{\text{1st}} = \begin{cases} \bigl(u_{jt}^i\bigl(m_{jt-1}^{\text{2nd}}, a^t\bigr),\ x_{jt}^i\bigl(a^t\bigr)\bigr) & \text{if } m_{jt-1}^{\text{2nd}} \neq \{\text{babble}\} \\ \bigl(u_{jt}^i\bigl(n_{jt}^i\bigl(a^t\bigr), a^t\bigr),\ x_{jt}^i\bigl(a^t\bigr)\bigr) & \text{if } m_{jt-1}^{\text{2nd}} = \{\text{babble}\}. \end{cases}$$
The random variables possessed by player $i$ are independent of those possessed by player $j$ given $(m_\tau^{\text{1st}}, a_\tau, m_\tau^{\text{2nd}})_{\tau=1}^{t-1}$, except that (i) $u_{it}^j(a^t) = \alpha_{it}^i(a^t) \times x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t}$ if $\bar\omega_{it}(a^t) = R$, (ii) $u_{jt}^i(a^t) = \alpha_{jt}^j(a^t) \times x_{jt}^i(a^t) + \beta_{jt}^j(a^t) \pmod{p^t}$ if $\bar\omega_{jt}(a^t) = R$, (iii) $n_{i\tau}^j(a^\tau) = n_{i\tau}^i(a^\tau)$ if $\bar\omega_{i\tau-1}(a^{\tau-1}) = P$ while $n_{i\tau}^j(a^\tau) = \tilde n_{i\tau}^j(a^\tau)$ if $\bar\omega_{i\tau-1}(a^{\tau-1}) = R$, and (iv) $n_{j\tau}^i(a^\tau) = n_{j\tau}^j(a^\tau)$ if $\bar\omega_{j\tau-1}(a^{\tau-1}) = P$ while $n_{j\tau}^i(a^\tau) = \tilde n_{j\tau}^i(a^\tau)$ if $\bar\omega_{j\tau-1}(a^{\tau-1}) = R$.
Since $\alpha_{it}^i(a^t)$, $\beta_{it}^i(a^t)$, $\tilde u_{it}^j(a^t)$, $v_{it}^j(n, a^t)$, $n_{it}^i(a^t)$, and $\tilde n_{it}^j(a^t)$ are uniform and independent, player $i$ cannot learn $\bar\omega_{i\tau}(a^\tau)$, $r_{i\tau}(a^\tau)$, or $r_{j\tau}(a^\tau)$ with $\tau \geq t$. Hence, player $i$ believes at the time when she sends $m_{it}^{\text{1st}}$ that her equilibrium value is equal to

$$(1-\delta)\,E^{\mu^*}\bigl[u_i(a_t) \mid h_i^t\bigr] + \delta\,E^{\mu^*}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(a_\tau) \Bigm| h_i^t\Biggr],$$

where $h_i^t$ is as if player $i$ observed $((r_{it}^{\text{punish}}(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$, $a^t$, $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, and $(r_{i\tau})_{\tau=1}^{t-1}$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \ldots, t-1$, with $\mu^*$ with mediated perfect monitoring.
Alternatively, for each $e > 0$, for a sufficiently large $N^t$, if player $i$ tells a lie in at least one element of $m_{it}^{\text{1st}}$, then with probability $1-e$, player $i$ creates a situation

$$m_{it}^{\text{1st}}(u) \neq \alpha_{jt}^j\bigl(a^t\bigr) \times m_{it}^{\text{1st}}(x) + \beta_{jt}^j\bigl(a^t\bigr) \pmod{p^t}.$$

Hence, (S10) (with indices $i$ and $j$ reversed) implies that $\hat\omega_{jt}(a^t) = P$.
Moreover, if player $i$ creates a situation with $\hat\omega_{jt}(a^t) = P$, then player $j$ will send $m_{jt}^{\text{2nd}} = \{\text{babble}\}$ instead of $n_{jt+1}^j(a^{t+1})$. Unless $\bar\omega_{jt}(a^t) = P$, since $n_{jt+1}^j(a^{t+1})$ is independent of player $i$'s variables, player $i$ believes that $n_{jt+1}^j(a^{t+1})$ is distributed uniformly over $\mathbb{Z}_{p^{t+1}}$. Hence, for each $e > 0$, for sufficiently large $N^{t+1}$, if $\bar\omega_{jt}(a^t) = R$, then player $i$ will send $m_{it+1}^{\text{1st}}$ with

$$m_{it+1}^{\text{1st}}(u) \neq \alpha_{jt+1}^j\bigl(a^{t+1}\bigr) \times m_{it+1}^{\text{1st}}(x) + \beta_{jt+1}^j\bigl(a^{t+1}\bigr) \pmod{p^{t+1}}$$

with probability $1-e$. Then, by (S10) (with indices $i$ and $j$ reversed), player $j$ will have $\hat\omega_{jt+1}(a^{t+1}) = P$.
Recursively, if $\bar\omega_{j\tau}(a^\tau) = R$ for each $\tau = t, \ldots, t+T_t-1$, then player $i$ will induce $\hat\omega_{j\tau}(a^\tau) = P$ for each $\tau = t, \ldots, t+T_t-1$ with a high probability. Hence, for $\varepsilon_t > 0$ and $T_t$ fixed in (S1) and (S2), for sufficiently large $\bar N^t$, if $N^\tau \geq \bar N^t$ for each $\tau \geq t$, then player $i$ will be punished for the subsequent $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1 - \varepsilon_t - \sum_{\tau=t}^{t+T_t-1} p_\tau$. (Here $\sum_{\tau=t}^{t+T_t-1} p_\tau$ represents the maximum probability of having $\bar\omega_{i\tau}(a^\tau) = P$ for some $\tau$ in the subsequent $T_t$ periods.) Inequality (S4) implies that telling a lie gives a strictly lower payoff than the equilibrium payoff. Therefore, it is optimal to tell the truth in $m_{it}^{\text{1st}}$. (In (S4), we have shown interim incentive compatibility after knowing $\bar\omega_{it}(a^t)$ and $r_{it}$, while here we consider $h_i^t$ before $\bar\omega_{it}(a^t)$ and $r_{it}$ are known. Taking the expectation with respect to $\bar\omega_{it}(a^t)$ and $r_{it}$, (S4) ensures incentive compatibility before knowing $\bar\omega_{it}(a^t)$ and $r_{it}$.)
Second, consider player $i$'s incentive to take the action $a_{it} = r_{it}$ according to (S11) if player $i$ has followed the equilibrium strategy until she sends $m_{it}^{\text{1st}}$. If she follows the equilibrium strategy, then player $i$ believes at the time when she takes an action that her equilibrium value is equal to

$$(1-\delta)\,E^{\mu^*}\bigl[u_i(a_t) \mid h_i^t\bigr] + \delta\,E^{\mu^*}\Biggl[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(a_\tau) \Bigm| h_i^t\Biggr],$$
where $h_i^t$ is as if player $i$ observed $((r_{it}^{\text{punish}}(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$, $a^t$, $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, $\bar\omega_{it}(a^t)$, and $r_{it}$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \ldots, t-1$, with $\mu^*$ with mediated perfect monitoring. (Compared to the time when player $i$ sends $m_{it}^{\text{1st}}$, player $i$ now knows $\bar\omega_{it}(a^t)$ and $r_{it}$ on the equilibrium path.)
If player $i$ deviates from $a_{it}$, then $\bar\omega_{j\tau}(a^\tau) = P$ by definition for each $\tau \geq t+1$ and each $a^\tau$ that is compatible with $a^t$ (that is, $a^\tau = (a^t, a_t, \ldots, a_{\tau-1})$ for some $a_t, \ldots, a_{\tau-1}$). To avoid being minmaxed in period $\tau$, player $i$ needs to induce $\hat\omega_{j\tau}(a^\tau) = R$ although $\bar\omega_{j\tau}(a^\tau) = P$. Given $\bar\omega_{j\tau}(a^\tau) = P$, since $\alpha_{j\tau}^j(a^\tau)$, $\beta_{j\tau}^j(a^\tau)$, $\tilde u_{j\tau}^i(a^\tau)$, $v_{j\tau}^i(n, a^\tau)$, $n_{j\tau}^j(a^\tau)$, and $\tilde n_{j\tau}^i(a^\tau)$ are uniform and independent (conditional on the other variables), regardless of player $i$'s continuation strategy, by (S10) (with indices $i$ and $j$ reversed), player $i$ will send $m_{i\tau}^{\text{1st}}$ with

$$m_{i\tau}^{\text{1st}}(u) \neq \alpha_{j\tau}^j\bigl(a^\tau\bigr) \times m_{i\tau}^{\text{1st}}(x) + \beta_{j\tau}^j\bigl(a^\tau\bigr) \pmod{p^\tau}$$

with a high probability.
Hence, for sufficiently large $\bar N^t$, if $N^\tau \geq \bar N^t$ for each $\tau \geq t$, then player $i$ will be punished for the next $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1-\varepsilon_t$. By (S3), deviating from $r_{it}$ gives a strictly lower payoff than her equilibrium payoff. Therefore, it is optimal to take $a_{it} = r_{it}$.
Finally, consider player $i$'s incentive to tell the truth in $m_{it}^{\text{2nd}}$. Regardless of $m_{it}^{\text{2nd}}$, player $j$'s actions do not change. Hence, we are left to show that telling a lie does not improve player $i$'s deviation gain by giving player $i$ more information.
On the equilibrium path, player $i$ knows $\bar\omega_{it}(a^t)$. If player $i$ tells the truth, then $m_{it}^{\text{2nd}} = n_{it+1}^i(a^{t+1}) \neq \{\text{babble}\}$ if and only if $\bar\omega_{it}(a^t) = R$. Moreover, player $j$ sends

$$m_{jt+1}^{\text{1st}} = \begin{cases} \bigl(u_{it+1}^j\bigl(m_{it}^{\text{2nd}}, a^{t+1}\bigr),\ x_{it+1}^j\bigl(a^{t+1}\bigr)\bigr) & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = R \\ \bigl(u_{it+1}^j\bigl(n_{it+1}^j\bigl(a^{t+1}\bigr), a^{t+1}\bigr),\ x_{it+1}^j\bigl(a^{t+1}\bigr)\bigr) & \text{if } \bar\omega_{it}\bigl(a^t\bigr) = P. \end{cases}$$

Since $n_{it+1}^j(a^{t+1}) = n_{it+1}^i(a^{t+1})$ if $\bar\omega_{it}(a^t) = P$, in total, if player $i$ tells the truth, then player $i$ knows $u_{it+1}^j(n_{it+1}^i(a^{t+1}), a^{t+1})$ and $x_{it+1}^j(a^{t+1})$. This is sufficient information to infer $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$ correctly.
If she tells a lie, then player $j$'s messages are changed to

$$m_{jt+1}^{\text{1st}} = \begin{cases} \bigl(u_{it+1}^j\bigl(m_{it}^{\text{2nd}}, a^{t+1}\bigr),\ x_{it+1}^j\bigl(a^{t+1}\bigr)\bigr) & \text{if } m_{it}^{\text{2nd}} \neq \{\text{babble}\} \\ \bigl(u_{it+1}^j\bigl(n_{it+1}^j\bigl(a^{t+1}\bigr), a^{t+1}\bigr),\ x_{it+1}^j\bigl(a^{t+1}\bigr)\bigr) & \text{if } m_{it}^{\text{2nd}} = \{\text{babble}\}. \end{cases}$$

Since $\alpha_{it+1}^i(a^{t+1})$, $\beta_{it+1}^i(a^{t+1})$, $\tilde u_{it+1}^j(a^{t+1})$, $v_{it+1}^j(n, a^{t+1})$, $n_{it+1}^i(a^{t+1})$, and $\tilde n_{it+1}^j(a^{t+1})$ are uniform and independent conditional on $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$, the variables $u_{it+1}^j(n, a^{t+1})$ and $x_{it+1}^j(a^{t+1})$ are not informative about player $j$'s recommendation from period $t+1$ on or player $i$'s recommendation from period $t+2$ on, given that player $i$ knows $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$. Since telling the truth informs player $i$ of $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$, there is no gain from telling a lie.
References

Heller, Y., E. Solan, and T. Tomala (2012), “Communication, correlation and cheap-talk in games with public information.” Games and Economic Behavior, 74, 222–234.
Co-editor George J. Mailath handled this manuscript.
Manuscript received 26 August, 2015; final version accepted 25 April, 2016; available online 25
April, 2016.