Folk Theorem in Repeated Games with Private Monitoring

Folk Theorem in Repeated Games with Private Monitoring
Takuo Sugayay
Stanford Graduate School of Business
April 29, 2015
Abstract
We show that the folk theorem generically holds for the repeated two-player game with private monitoring if the support of each player’s signal distribution is su¢ ciently large. Neither
cheap talk communication nor public randomization is necessary.
Journal of Economic Literature Classi…cation Numbers: C72, C73, D82
Keywords: repeated game, folk theorem, private monitoring
[email protected]
I thank Stephen Morris for invaluable advice, Yuliy Sannikov for helpful discussion, Andrzej Skrzypacz for
carefully reading the paper, Yuichi Yamamoto for his suggestions to improve the exposition of the paper, and Dilip
Abreu, Eduardo Faingold, Drew Fudenberg, Edoardo Grillo, Johannes Hörner, Yuhta Ishii, Michihiro Kandori,
Vijay Krishna, George Mailath, Mihai Manea, Larry Samuelson, Tadashi Sekiguchi, Satoru Takahashi, and seminar
participants at 10th SAET conference, 10th World Congress of the Econometric Society, 22nd Stony Brook Game
Theory Festival, and Summer Workshop on Economic Theory for useful comments. I also thank Piotr Dworczak
and Sergey Vorontsov for careful proofreading, and Jenna Abel for professional editorial service. The remaining
errors are my own responsibility.
y
1
1
Introduction
A possibility of tacit collusion when a party observes only private signals about the other parties’
actions has been a long-standing open problem. Stigler (1964) considers a case in which a …rm can
cut the price secretly and conjectures that, since the coordination on the punishment is di¢ cult with
private signals, policing an implicit collusive agreement (punishing a deviator from the agreement)
does not work e¤ectively. Harrington and Skrzypacz (2011) examine a recent cartel agreement, and
construct a model to replicate how the …rms police the agreement. Since their equilibrium requires
cheat-talk communication and monetary transfer, they show the possibility of explicit collusion,
rather than tacit collusion.
A theoretical possibility of tacit collusion is, therefore, yet to be
proven.1
The literature on in…nitely repeated games o¤ers a theoretical framework to study tacit collusion.
One of the key …ndings is the folk theorem: Any feasible and individually rational payo¤ pro…le
can be sustained in equilibrium when players are su¢ ciently patient. This implies that su¢ ciently
patient …rms can sustain tacit collusion to obtain any feasible and individually rational payo¤.
Fudenberg and Maskin (1986) establish the folk theorem under perfect monitoring, in which players
can directly observe the action pro…le.
Fudenberg, Levine, and Maskin (1994) extend the folk
theorem to imperfect public monitoring, in which players can observe only public noisy signals
about the action pro…le.
It has been an open question whether the folk theorem holds in general private monitoring,
in which players observe private noisy signals about other players’ actions.
Matsushima (2004),
Hörner and Olszewski (2006), Hörner and Olszewski (2009), and Yamamoto (2012) show the folk
theorem in restricted classes of private monitoring (see below for details), but little is known about
general private monitoring.
The objective of this paper is to solve this open problem: In general private monitoring with
discounting,2 the folk theorem holds in the two-player game.
Speci…cally, we identify su¢ cient
conditions with which the folk theorem holds, and show that these su¢ cient conditions generically
hold in private monitoring.
In our equilibrium, we use neither cheap-talk communication nor
1
Another area where private signals are important is team-production. If each agent in a team subjectively
evaluates how the other agents contribute to the production, then seeing subjective evaluation as a private signal,
the situation is modeled as repeated games with private monitoring.
2
See Lehrer (1990) for the case with no discounting.
2
public randomization.3
Furthermore, in the companion paper Sugaya (2012), we generalize our
equilibrium construction to a general game with more than two players.
We now discuss papers about the folk theorem4 and then illustrate how we prove the folk
theorem, built upon the pre-existing work.
The driving force of the folk theorem is reciprocity:
If a player deviates today, she will be punished in the future (Stigler (1964) call this reciprocity
“policing the agreement”). For this mechanism to work, each player needs to coordinate her action
with the other players’histories. Whether the players achieve coordination depends on the players’
information and thus on the monitoring structure.
Fudenberg and Maskin (1986) establish the folk theorem under perfect monitoring. Because the
histories are common knowledge in perfect monitoring, each player can coordinate her continuation
strategy with the other players’ histories.
Fudenberg, Levine, and Maskin (1994) extend the
folk theorem to imperfect public monitoring by focusing on the equilibrium in which each player’s
continuation strategy depends only on past public signals. Because the histories of public signals
are common knowledge, players can coordinate their continuation play through public signals.
On the other hand, in general private monitoring, each player’s actions and signals are her
private information, so players do not share common information about the histories. Hence, the
coordination becomes complicated as periods proceed. This is why Stigler (1964) conjectures that
collusion is impossible with general private monitoring.
If monitoring is almost public, then players believe that every player observes the same signal
with a high probability. Hence, almost common knowledge about …nite-past histories exists. This
almost common knowledge enables Hörner and Olszewski (2009) to show the folk theorem in almost
public monitoring.
Without almost public monitoring, almost common knowledge may not exist, and coordination
is di¢ cult.
To deal with this di¢ culty, Piccione (2002) and Ely and Välimäki (2002) focus on
a tractable class of equilibria, a “belief-free equilibrium.” A strategy pro…le is belief free if, after
each history pro…le, the continuation strategy of each player is optimal, conditional on the histories
of the opponents.
Hence, coordination is not an issue.
Piccione (2002) and Ely and Välimäki
(2002) show that the belief-free equilibrium sustains the set of feasible and individually rational
3
There are papers that prove folk theorems in private monitoring with cheap-talk or mediated communication
(explicit collusion). See, for example, Compte (1998), Kandori and Matsushima (1998), Aoyagi (2002), Aoyagi
(2005), Fudenberg and Levine (2007), Obara (2009), Rahman (2014), and Renault and Tomala (2004).
4
See also Kandori (2002) and Mailath and Samuelson (2006) for a survey.
3
payo¤s in the two-player prisoners’ dilemma with almost perfect monitoring.5;6 However, with
general monitoring or in a general game beyond prisoners’ dilemma, the belief-free equilibrium
cannot sustain the folk theorem. See Ely, Hörner, and Olszewski (2005) and Yamamoto (2009) for
the formal proof.
Hence, there are two ways to generalize the belief-free equilibrium: One is to
keep prisoners’dilemma payo¤ structure and to consider noisy monitoring; and the other is to keep
almost perfect monitoring and to consider a more general stage game.
The former approach by Matsushima (2004) and Yamamoto (2012) recovers the precision of the
monitoring, assuming that monitoring is not perfect but conditionally independent: Conditional
on action pro…les, the players observe statistically independent signals. The idea is to replace one
period in the equilibrium of Ely and Välimäki (2002) with a long T -period review phase. When
we aggregate information over a long phase and rely on the law of large numbers, we can recover
the precision of the monitoring.
To see the obstacle to further generalize their result to general monitoring with correlated
signals, let us explain their equilibrium construction in the two-player prisoners’dilemma in which
each player has two signals, good and bad. Suppose that with player i’s cooperation, player j 6= i
observes a good signal gj with 60%, while with player i’s defection, player j observes a bad signal
bj with 60%. To achieve approximately e¢ cient equilibrium, player j should not punish player i if
she observes gj for more than (0:6 + ") T times during a review phase with a small ". Note that
by the central limit theorem, 0:6T is what player j can expect from player i’s cooperation.
Suppose now we are close to the end of a review phase. If player i’s signals are correlated with
player j’s, then after rare histories, player i may believe that, given her own history and correlation,
player j has already observed gj for more than (0:6 + ") T times.
may want to switch to defection.
After such a history, player i
(If signals are conditionally independent, then player i after
each history always believes that player j observes gj at most (0:6 + ") T times with a very high
probability. Hence, switching does not happen.)
Once player i switches her action based on her history, player j wants to learn player i’s switch
of actions via player j’s private signals.
Note that player i reviews player j’s actions by player
5
See Yamamoto (2007) for the N -player prisoners’dilemma.
Kandori and Obara (2006) use a similar concept to analyze a private strategy in public monitoring. Kandori
(2011) considers “weakly belief-free equilibria,” which is a generalization of belief-free equilibria. Apart from a
typical repeated-game setting, Takahashi (2010) and Deb (2011) consider the community enforcement, and Miyagawa,
Miyahara, and Sekiguchi (2008) consider the situation in which a player can improve the precision of monitoring by
paying a cost.
6
4
i’s history and that player j wants to know how player i has reviewed player j so far.
Player
i’s switch of actions is informative about her history and so about how she has reviewed player j.
Recursively, if a player switches actions, then we have to deal with a high order belief about each
player’s history.
To deal with this problem, in our equilibrium, we divide a review phase into multiple review
rounds. If the division is …ne enough, we can make sure that the switch of actions only happens
at the beginning of the review round. In addition, before each review round, player j tells player i
whether player i should switch the actions. (Apart from the issue of player j’s incentive to tell the
truth,) if this communication is successful, then player i does not need to learn player j’s switches.
Since we do not assume cheap talk, this communication is done by player j taking di¤erent actions
to send di¤erent messages. Since player i infers player j’s actions (messages) from noisy private
signals, player i may make a mistake and player j may not realize player i’s mistake. In Section
10, we construct a module for a player to send the other player a message by taking actions with
noisy private signals, so that, if player i suspects that she made a mistake, then she believes that
player j has “realized”the mistake.
The latter approach is to generalize Ely and Välimäki (2002) to a general stage game (here we
focus on two-player games), keeping monitoring almost perfect. In the belief-free equilibrium in
Ely and Välimäki (2002), in each period, each player i picks a state xi that can be G (good) or B
(bad). In each period, given state xi 2 fG; Bg, player i takes a mixed action
i (xi ).
Together
with the state transition, they make sure that, for each state of the opponent xj 2 fG; Bg, both
i (G)
and
i (B)
are optimal conditional on xj .
A reason why the belief-free equilibrium in Ely and Välimäki (2002) cannot sustain the folk
theorem in a general game is that it is hard for player j with
j (B)
to punish player i severely
enough after a signal which statistically indicates her deviation, at the same time keeping both
j (G)
and
j (B)
optimal against both xi = G and xi = B and keeping the equilibrium payo¤ with
xi = xj = G su¢ ciently high.
Hörner and Olszewski (2006) overcome this di¢ culty as follows: They divide the repeated game
into L-period phases.7
In each phase, each player i picks a state xi 2 fG; Bg.
given state xi 2 fG; Bg, player i takes an L-period dynamic strategy
7
i (xi ).
In each phase,
Again, for each
They use the term T -period blocks, but we use L and phases instead, in order to make the terminology consistent
within the paper.
5
state of the opponent xj 2 fG; Bg, both
i (G)
and
i (B)
are optimal conditional on xj at the
beginning of review phase. However, within a phase, since the players coordinate on x by taking
actions at the beginning of the phase, it is optimal to adhere to
corresponding to
i (xi )
i (xi )
once player i takes an action
at the beginning of the phase. Having L periods in the phase, they can
create a severe punishment by letting player j with
j (B)
switch to a minimax strategy after a
signal which statistically indicates the opponent’s deviation. Moreover, since the switch happens
only after a rare signal which indicates a deviation, we can keep the payo¤ of
j (G)
j (B)
(and that of
in order to keep belief-free property at the beginning of the phase) su¢ ciently high given
xi = G.
There are two di¢ culties to generalize their construction to general monitoring: With general
monitoring, the coordination on x by taking actions becomes harder since the actions are less
precisely observed. Second, as seen in the case with Matsushima (2004), since each player switches
actions based on past signals, each player may start to learn the opponent’s history by observing
each other’s switches via noisy signals.
This learning becomes complicated with noisy signals.
Again, by using the module to send a message by taking actions with noisy private signals, we
make sure that the players can coordinate on x and learning does not change the player’s optimal
strategy.
In total, combining Matsushima (2004) and Hörner and Olszewski (2006), we construct a following equilibrium: We divide the repeated game into long review phases. At the beginning of the
phase, the players coordinate on x by using the module to send messages via taking actions. Then,
we have L review rounds. These review rounds serve two roles: One is to allow player i to switch
actions when she believes that the opponent has observed a lot of good signals in order to deal with
conditionally dependent signals. The other is to allow player j to switch to a minimaxing action
when she observes a lot of signals statistically indicating the opponent’s deviation as in Hörner and
Olszewski (2006). Relatedly, player i also switches her own actions to take a best response against
player j’s minimaxing strategy, when player j switches to the minimaxing actions. To coordinate
these switches, before each review rounds, the players communicate about the continuation play
again by the module to send messages via taking actions.
The module and communication are
designed carefully so that learning from the opponent’s actions in the review round does not change
the optimal action within the review round.
The rest of the paper is organized as follows. In Section 2, we introduce the model, and in
6
Section 3, we state assumptions and result (folk theorem). The rest of the paper proves the folk
theorem.
(After giving a short roadmap of the proof in Section 4), in Section 5, we derive the su¢ cient
conditions in a …nitely repeated game without discounting such that, once we prove the su¢ cient
conditions, then it implies that the folk theorem holds in the in…nitely repeated game with discounting. From there on, we focus on proving these su¢ cient conditions.
Section 6, we de…ne multiple variables, with which Section 7 pins down the structure (of the
strategy) of the …nitely repeated game.
(After giving another short roadmap in Section 8,) since the rest of the proof is long and
complicated, we …rst o¤er the overview in Section 9. Then, we prove two modules which may be of
their own interests. Section 10 de…nes the module for player j to send a binary message m 2 fG; Bg
to player i. Section 11 de…nes the module that will be used for the equilibrium construction of the
review round. Given these modules, we explain how to use these modules to prove the su¢ cient
conditions in Section 12.
Sections 13-16 actually prove the su¢ cient conditions. (See Section 12 for the structure of these
sections.) Since the entire proof is long, in Sections 13-16, we summarize each step as lemmas and offer the intuitive explanation of the proof, relegating the technical proof to Appendix A (online appendix). In addition, we o¤er the table of notation in Appendix B. Appendices A and B are provided
as supplemental materials for the submission. In case the referees are not provided the supplemental
materials, the online appendix is also available at https://sites.google.com/site/takuosugaya/home/research.
2
2.1
Model
Stage Game
We consider a two-player game with private monitoring.
The stage game is given by fI, fAi ,
Yi gi2I , qg. Here, I = f1; 2g is the set of players, Ai is the …nite set of player i’s pure actions, and
Yi is the …nite set of player i’s private signals.
In every stage game, player i chooses an action ai 2 Ai , which induces an action pro…le a
Q
Q
(a1 ; a2 ) 2 A
(y1 ; y2 ) 2 Y
i2I Ai . Then, a signal pro…le y
i2I Yi is realized according to a
joint conditional probability function q (y j a). Summing up these probabilities with respect to yj ,
7
let qi (yi j a)
P
yj 2Yj
q(y j a) denote the marginal distribution of player i’s signals. Throughout
the paper, when we say players i and j, players i and j are di¤erent: i 6= j.
Following the convention in the literature, we assume that player i’s ex post utility is a deterministic function of player i’s action ai and her private signal yi . This implies that observing her
own ex post utility does not give player i any further information than (ai ; yi ).8 Let u~i (ai ; yi ) be
player i’s ex post payo¤.
Taking the expectation of the ex post payo¤, we can derive player i’s
P
expected payo¤ from a 2 A: ui (a)
ui (ai ; yi ).
yi 2Yi qi (yi j a)~
Let F be the set of feasible and individually rational payo¤s:
F
min
j
Let
2.2
v 2 co(fu(a)ga2A ) : vi
min
j2
max ui (ai ;
(Aj ) ai 2Ai
j)
for all i 2 I :
(1)
be the minimax strategy.
Repeated Game
Consider the in…nitely repeated game with the (common) discount factor
2 (0; 1).
Let hti
(ai; ; yi; )t =11 with h1i = f;g be player i’s history in period t; and let Hit be the set of all the possible
histories of player i. In each period t, player i takes an action ai;t according to her strategy
S1
t
(Ai ). Let i be the set of all strategies of player i. Finally, let E( ) be the set
i :
t=1 Hi !
of sequential equilibrium payo¤s with a common discount factor .
3
Assumptions and Result
In this section, we state our four assumptions and main result. First, we assume that the distribution of private signal pro…les has full support:
Assumption 1 For each a 2 A and y 2 Y , we have q(y j a) > 0.
Let us de…ne
"support
min q(y j a) > 0
y2Y;a2A
8
(2)
Otherwise, we see the realization of player i’s ex post utilities as a part of player i’s private signals. Let Ui be
the …nite set of realizations of player i’s ex post utilities, and let u
~i 2 Ui be a generic element of Ui . We see a vector
(yi ; u
~i ) as player i’s private signal. Hence, the set of player i’s signals is now Yi Ui .
8
be the lower bound of the probability. Note that this also implies the full support of the marginal
distribution: mini2I;yi 2Yi ;a2A qi (yi j a)
"support .
Assumption 1 excludes public monitoring, where yi = yj with probability one. One may …nd
allowing public monitoring of special interest: Our other Assumptions 2, 3, and 4 de…ned below
generically hold if jYi j
jAj j for each i and j, while pairwise full rank in Fudenberg, Levine, and
Maskin (1994) requires jYi j
jA1 j + jA2 j
1. (See also Radner, Myerson, and Maskin (1986).) We
can extend the result to allow public monitoring and interested readers are referred to the working
paper.9
Second, we assume that player i’s signal statistically identi…es player j’s action. Let
qi (ai ; aj )
(3)
(qi (yi j ai ; aj ))yi 2Yi
be the vector expression of the marginal distribution of player i’s signals.
Assumption 2 For each i 2 I and ai 2 Ai , the collection of jYi j-dimensional vectors (qi (ai ; aj ))aj 2Aj
is linearly independent.
Suppose that player i takes ai . When player j changes her actions, the change gives a di¤erent
distribution of player i’s signals, so that player i can statistically identify player j’s deviation.
Third, we assume that each signal of player i happens with di¤erent probabilities after player
j’s di¤erent actions.
Assumption 3 For each i 2 I, ai 2 Ai , and aj ; a0j 2 Aj satisfying aj 6= a0j , we have qi (yi j ai ; aj ) 6=
qi (yi j ai ; a0j ) for all yi 2 Yi .
Fourth, suppose that in the repeated game, player j takes a mixed strategy
j
2
(Aj ).
If
player i’s history is (ai ; yi ), then player i believes that player j’s history (aj ; yj ) is distributed
according to Pr((aj ; yj ) j
j ; ai ; yi ).
Here, Pr is de…ned such that player j takes aj according to
and signal pro…le y is drawn from q(y j a). The following assumption about Pr((aj ; yj ) j
j
j ; ai ; yi )
ensures that there exists a mixed strategy of player j such that di¤erent histories of player i give
di¤erent beliefs about player j’s history:
9
We use private strategies.
Kandori and Obara (2006).
The possibility of private strategies to improve e¢ ciency is …rst pointed out by
9
Assumption 4 For each i 2 I, there exists
j
with (ai ; yi ) 6= (a0i ; yi0 ), there exists (aj ; yj ) 2 Aj
(Aj ) such that, for all (ai ; yi ); (a0i ; yi0 ) 2 Ai
2
Yj such that Pr((aj ; yj ) j
j ; ai ; yi )
Yi
6= Pr((aj ; yj ) j
0
0 10
j ; ai ; yi ).
With these four assumptions, we can show the following theorem.
Theorem 1 If Assumptions 1, 2, 3, and 4 are satis…ed, then in a two-player repeated game, for
each payo¤ pro…le v 2 int(F ), there exists
4
< 1 such that, for all
> , we have v 2 E ( ).
Short Road Map
Before giving the details of the proof, we …rst display a short road map to prove Theorem 1. First,
in Section 5, we derive the su¢ cient conditions in a …nitely repeated game without discounting such
that, once we prove the su¢ cient conditions, then it implies that Theorem 1 holds in the in…nitely
repeated game with discounting.
Second, in Section 6, we de…ne multiple variables, with which
Section 7 pins down the structure (of the strategy) of the …nitely repeated game. The rest of the
roadmap, which is easier to understand once we see the structure in Section 7, is postponed until
Section 8.
5
Reduction to a Finitely Repeated Game without Discounting
To prove Theorems 1, we …x a payo¤ v 2 int(F ) arbitrarily. First, we derive su¢ cient conditions
such that once we prove these su¢ cient conditions hold for given v, then v 2 E ( ). These su¢ cient
conditions are stated as conditions in a “…nitely repeated game without discounting.” This reduction
to the …nitely repeated game is standard (see Hörner and Olszewski (2006), for example), except
that we successfully reduce the in…nitely repeated game with discounting to the …nitely repeated
game without discounting. Since there is no discounting, we can treat each period identically. This
identical treatment will simplify the equilibrium construction.
To this end, we see the in…nitely repeated game as repetitions of TP -period review phases. We
make sure that each review phase is recursive. (Speci…cally, our equilibrium is a TP -period block
10
This conditional probability is well de…ned for each
j
10
and (ai ; yi ) given Assumption 1.
equilibrium in the language of Hörner and Olszewski (2006).) We decompose each player’s payo¤
from the repeated game as the summation of instantaneous utilities from the current phase and
continuation payo¤ from the future phases.
Instead of analyzing the in…nitely repeated game, we concentrate on a TP -period …nitely repeated
game, in which payo¤s are augmented by a terminal reward function that depends on a history of
the …nitely repeated game. The reward function corresponds to the continuation payo¤ from the
future phases.
Speci…cally, in the …nitely repeated game, in each period t = 1; :::; TP , each player i takes an
(ai; ; yi; )t =11 . Let
action based on her private history hti
i
be the set of player i’s strategies. In
equilibrium, each player is in one of the two possible states fG; Bg and, given state xi 2 fG; Bg,
she takes a strategy
i (xi )
2
i.
On the other hand, player i’s reward function is determined by
player j’s state xj and player j’s history of the …nitely repeated game hTj P +1 :
TP +1
).
i (xj ; hj
Note
that player i’s reward function is the statistic that player j calculates.
We will show that if we …nd a natural number TP 2 N, a strategy
TP +1
),
i (xj ; hj
a reward function
i (xi ),
and values fvi (xj )gxj 2fG;Bg such that the following su¢ cient conditions hold, then we
have v 2 E( ): For each i 2 I,
1. [Incentive Compatibility] For all x 2 fG; Bg2 ,
i
(xi ) 2 arg max E
i2
i
"T
P
X
ui (at ) +
TP +1
)
i (xj ; hj
t=1
j
i;
#
j (xj )
:
(4)
In words, incentive compatibility requires that, given each state of the opponent, xj 2 fG; Bg,
both
i (G)
and
i (B)
are optimal for player i to maximize the summation of instantaneous
utilities and reward function. Since
TP +1
)
i (xj ; hj
is the movement of the continuation payo¤
P P t 1
in the context of the in…nitely repeated game and Tt=1
ui (at ) + TP i (xj ; hTj P +1 ) is close
P P
to Tt=1
ui (at ) + i (xj ; hTj P +1 ) for a su¢ ciently large , this condition implies both i (G) and
i (B)
are optimal in each review phase regardless of the opponent’s state.
2. [Promise Keeping] For all x 2 fG; Bg2 ,
"T
P
X
1
E
ui (at ) +
TP
t=1
TP +1
)
i (xj ; hj
11
#
j (x) = vi (xj ):
(5)
Here, for notational convenience, we write (x)
( 1 (x1 );
2 (x2 )).
In words, (5) says that the
time average of the instantaneous utilities and the change in continuation payo¤,
TP +1
),
i (xj ; hj
is equal to vi (xj ). This implies that, for a su¢ ciently high discount factor, vi (xj ) is approximately player i’s value from the in…nitely repeated game if player j’s current state is xj .
3. [Full Dimensionality] The values vi (B) and vi (G) contain vi between them:
(6)
vi (B) < vi < vi (G):
Since this condition implies vi (B) < vi (G), when player j switches from xj = G to xj = B,
the switch strictly decreases player i’s payo¤. Hence, player j can punish player i by historycontingent state transitions. In addition, since vi 2 [vi (B); vi (G)], player j can mix the initial
state properly so that player i’s initial equilibrium payo¤ is vi .
4. [Self Generation] The sign of
0 and
TP +1
)
i (B; hj
TP +1
)
i (xj ; hj
satis…es a proper condition: For all hTj P +1 ,
TP +1
)
i (G; hj
0. If we de…ne
sign(xj )
8
<
1 if xj = G;
: 1
(7)
if xj = B;
then this condition is equivalent to the following:
sign(xj ) i (xj ; hTj P +1 )
0:
(8)
We call the condition (8) “self generation.”
This corresponds to the condition that Hörner and Olszewski (2006) impose: The reward
function when the opponent’s state is G (or B) is nonpositive (or nonnegative).
As will
be seen in the proof, in the in…nitely repeated game, self generation ensures that player i’s
continuation payo¤ speci…ed by the reward function at the end of a review phase is included
in [vi (B); vi (G)].
Since vi (xj ) is player i’s value when player j’s state is xj , this inclusion
ensures that, by mixing xj = G and B properly in the next phase, player j implements the
12
continuation payo¤ speci…ed by the reward function.11
The following lemma proves that the above conditions are su¢ cient.
Lemma 1 For any v 2 R2 , if there exist TP 2 N, ff
i
(xi )gxi 2fG;Bg gi2I , ff i (xj ; hTj P +1 )gxj 2fG;Bg gi2I ,
and fvi (xj )gi2I;xj 2fG;Bg such that the conditions (5)–(8) are satis…ed, then we have v 2 E( ) for a
su¢ ciently large
2 (0; 1).
Proof. See Appendix A.2. Appendices A and B are provided as supplemental materials for the
submission. In case the referees are not provided the supplemental materials, the online appendix
is also available at https://sites.google.com/site/takuosugaya/home/research.
Let us comment on why we can ignore discounting.
Recall that Assumption 2 ensures that
player j can statistically infer player i’s action. Hence, if she infers that player i incurs a loss in
earlier periods of the …nitely repeated game rather than later, then player j can compensate player i
slightly (of order 1
) so that player i is indi¤erent about when to incur the loss. For a su¢ ciently
large , such compensation is arbitrarily small.
Therefore, we focus on, for any v 2 int(F ), …nding TP 2 N, ff
xj 2fG;Bg gi2I ,
6
i
(xi )gxi 2fG;Bg gi2I , ff i (xj ; hTj P +1 )g
and fvi (xj )gi2I;xj 2fG;Bg to satisfy (5)–(8).
Basic Variables
To construct the variables to satisfy (5)–(8), it will be useful to de…ne some functions/variables.
In Section 6.1, we …x
(A1 )
…x (a(x);
L 2 N,
11
i[
],
xj
i ,
x
u > 0, and ui j
i2I;xj 2fG;Bg
.
Here,
2
(A) with
(A)
(A2 ). Note that we do not allow correlation between players’actions; in Section 6.2, we
i (x);
;
i
(x))x2fG;Bg2 for each ,
> 0, ai (G), ai (B), and
mix
i ;
min;
i
for each ,
payo
and in Section 6.3, we …x S,
One may wonder whether we need the condition that
(1
)
TP +1
)
i (xj ;hj
TP
> 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg ,
S(t)
i ,
j,
qG , q B ,
c:i:
i ,
and "strict .
is su¢ ciently small. This condition
is automatically satis…ed for su¢ ciently large for the following reason: We …rst …x TP . Then, i (xj ; hTj P +1 ) is
bounded since the number of histories given TP is …nite. Finally, by taking su¢ ciently close to 1, we can make
(1
)
TP +1
)
i (xj ;hj
TP
arbitrarily small.
13
6.1
i[
Basic Reward Functions
First, for each
2
(A), we create
xj
i
] and
] (aj ; yj ) that cancels out the di¤erences in the instantaneous
i[
utilities for di¤erent a’s. Since Assumption 2 implies that player j can statistically infer player i’s
action from her signals, for each aj 2 Aj , there exists
i (aj ;
) : Yj ! R such that
ui (ai ; aj ) + E [ i (aj ; yj ) j ai ; aj ] = 0
for each ai 2 Ai . For each , we de…ne
i[
](aj ; yj ) =
i (aj ; yj )
(9)
+ ui ( ) so that
ui (a) + E [ i [ ](aj ; yj ) j a] = ui ( )
for each a 2 A. This equality also implies that the expected value of
E [ i [ ](aj ; yj ) j ] = 0. In addition,
i[
(10)
i[
](aj ; yj ) given
is zero:
](aj ; yj ) is continuous in .
Second, by adding/subtracting a constant depending on xj to/from
i (aj ; yj ),
we create
xj
i (aj ; yj )
that makes player i indi¤erent between any action pro…le and that satis…es self generation: ui (a) +
E
xj
i (aj ; yj )
j a does not depend on a 2 A, and we have sign(xj )
xj
i (aj ; yj )
0 for each xj 2
fG; Bg and (aj ; yj ). In summary,
x
Lemma 2 There exist u > 0 and ui j
i2I;xj 2fG;Bg
such that, for each i 2 I, the following three
properties hold:
1. For each
2
(A), there exists
i[
Yj !
] : Aj
u u
;
4 4
that makes any action optimal for
player i: ui (a) + E [ i [ ](aj ; yj ) j a] = ui ( ) for all a 2 A. This implies E [ i [ ](aj ; yj ) j ] =
0. Moreover, for each (aj ; yj ),
i[
2. For each xj 2 fG; Bg, there exists
x
ui j for all a 2 A and sign(xj )
] (aj ; yj ) is continuous in .
xj
i
: Aj Yj !
xj
i (aj ; yj )
u u
;
4 4
satisfying ui (a)+E
xj
i (aj ; yj )
0 for each xj 2 fG; Bg and (aj ; yj ) 2 Aj
x
ja =
Yj .
x
3. u is su¢ ciently large: maxi2I;a2A jui (a)j+maxi2I;xj 2fG;Bg ui j < u and maxi2I;xj 2fG;Bg ui j <
u
.
4
Proof. See Appendix A.3.
We …x
i[
],
xj
i ,
x
u > 0, and ui j
i2I;xj 2fG;Bg
so that Lemma 2 holds.
14
Figure 1: How to take a(x)
6.2
Basic Actions
We de…ne an action pro…le a(x) for each x 2 fG; Bg2 . As will be seen, a(x) is the action pro…le
which is taken with a high probability on equilibrium path when the players take (x). Speci…cally,
we …x (a(x))x2fG;Bg2 such that, for each i 2 I and xj 2 fG; Bg, we have sign(xj )(vi ui (a(x))) > 0.12
See Figure 1 for the illustration of a(x). This de…nition of a(x) is the same as Hörner and Olszewski
(2006).
Given (a(x))x2fG;Bg2 , we perturb ai (x) to
probability no less than
i (x)
so that player i takes each action with
> 0:
i (x)
(1
(jAi j
1) ) ai (x) +
= (1
(11)
ai :
ai 6=ai (x)
In addition, given player i’s minimaxing strategy
min;
i
X
jAi j )
12
min
i ,
min
i
+
we also perturb
X
ai :
min
i :
(12)
ai 2Ai
Action pro…les that satisfy the desired inequalities may not exist. However, if dim (F ) 2 (otherwise, int (F ) =
; and Theorem 1 is vacuously true), then there always exist an integer z and 2z …nite sequences fa1 (x), : : :,
az (x)gx2fG;Bg2 such that each vector wi (x), the average payo¤ vector over the sequence fa1 (x), : : :, az (x)gx2fG;BgN ,
satis…es the appropriate inequalities. The construction that follows must then be modi…ed by replacing each action
pro…le a(x) by the …nite sequence of action pro…les fa1 (x), : : :, az (x)gx2fG;BgN . Details are omitted since this is the
same as Hörner and Olszewski (2006).
15
Let vi ;
maxai 2Ai ui (ai ;
min;
j
min;
i
) be the perturbed minimax value. Given
;
i
(x) =
8
<
:
i (x)
if xi = G;
min;
i
if xi = B:
, we de…ne
(13)
By de…nition of a(x) and v 2 int (F ), we have
max max ui (ai ;
ai 2Ai
If we take
by
i (x)
min
j );
< vi < min ui (a(x)) for all i 2 I:
max ui (a(x))
x:xj =B
x:xj =G
su¢ ciently small, then the targeted payo¤ vi is in the interval of the payo¤s induced
and
min;
i
: For a su¢ ciently small
max vi ; ; max ui (
x:xj =B
(x))
> 0, for each
payo
< vi < min ui (
x:xj =G
Hence, there exist vi (xj ) and ui (xj ) such that, for each
max vi ; ; max ui (
x:xj =B
(x))
<
<
: ui (xj )
, we have
(x)) for all i 2 I:
payo
, we have
< ui (B) < vi (B) < vi < vi (G) < ui (G) < min ui (
x:xj =G
Given u …xed in Section 6.1, there exists L such that, for each
8
<
payo
u
ui (xj ) ui ( (x))
L
n
max ui ( (x)); maxai 2Ai ui (ai ;
Fix such L. Given u and L, we …x su¢ ciently small
(15 + 8L)
min;
i
o
)
<
payo
(x)) for all i 2 I:
, we have
if xj = G;
u
L
if xj = B:
> 0 such that, for each xj 2 fG; Bg, we have
x
jui (xj )j + Lu + ui j
< jui (xj )
vi (xj )j :
In summary, we have proven the following lemma:
Lemma 3 Given v 2 int (F ) and u 2 R, there exist (a(x))x2fG;Bg2 ,
L 2 N, and
> 0 such that, for each i 2 I and
max vi ; ; max ui (
x:xj =B
(x))
<
payo
payo
> 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg ,
, we have
< ui (B) < vi (B) < vi < vi (G) < ui (G) < min ui (
x:xj =G
16
(x));
(14)
8
<
u
ui (xj ) ui ( (x))
L
n
max ui ( (x)); maxai 2Ai ui (ai ;
: ui (xj )
and
x
jui (xj )j + Lu + ui j
(15 + 8L)
We …x (a(x);
L 2 N, and
i (x);
;
i
< jui (xj )
(x))x2fG;Bg2 for each ,
min;
i
min;
i
if xj = G;
o
)
u
L
(15)
if xj = B;
vi (xj )j for each xj 2 fG; Bg:
for each ,
(16)
> 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg ,
payo
> 0 so that Lemma 3 holds.
In addition, we also …x two di¤erent actions
P
= jA1i j ai 2Ai ai be the
ai (G); ai (B) 2 Ai arbitrarily with ai (G) 6= ai (B). Moreover, let mix
i
random strategy of player i.
6.3
Conditional Independence
For the reason to be explained in Section 15, we want to construct S,
S(t)
i ,
j,
qG , qB ,
c:i:
i ,
"strict with the following properties: In some period t 2 N, player j takes a random strategy
and
mix
j
and player i takes some action ai;t and observes yi;t . After period t, there are S periods assigned
to period t, denoted by S(t) 2 NS with jS(t)j = S. Player j takes
S
S(t)
(ai;t ; yi;t ) [ (ai; ; yi; ) 2S(t); s
i takes some pure strategy i :
s2S(t)
mix
j
1
in each period, while player
! Ai in S(t), which depends
on her history in periods t and S(t).
Based on player j’s history in t and S(t), player j calculates a function
The realization of
1
2
with qG
E
h
j
=
1
2
j
i
: (Aj
Yj )S+1 ! [0; 1].
statistically infers what action player i takes in period t: For some qG and qB
qB > 0, we have
(aj;t ; yj;t ) [ (aj; ; yj; )
2S(t)
Importantly, the expected value of
j
j
mix
j ; ai;t ; yi;t ;
S(t)
i
i
=
8
>
>
q
>
< G
1
2
>
>
>
: q
B
if ai;t = ai (G);
if ai;t 6= ai (G); ai (B);
if ai;t = ai (B):
is conditionally independent of yi;t : In period t, from player
i’s perspective, if she (rationally) expects that she will take
S(t)
i ,
then the expected value of
j
does
not depend on yi;t .
To incentivize player i to take
S(t)
i ,
player j gives the reward function
c:i:
i
(aj;t ; yj;t ) [ (aj; ; yj; )
(c:i: stands for conditional independence) such that, for each (ai;t ; yi;t ), if a pure strategy
17
i
2S(t)
is dif-
ferent from
S(t)
i
on equilibrium path, then
E
h
c:i:
i
E
h
(aj;t ; yj;t ) [ (aj; ; yj; )
c:i:
i
j
2S(t)
(aj;t ; yj;t ) [ (aj; ; yj; )
S(t)
i
mix
j ; ai;t ; yi;t ;
mix
j ; ai;t ; yi;t ;
j
2S(t)
i
i
i
(17)
"strict
for some "strict > 0. (Here, we ignore the instantaneous utility.)
S(t)
i ,
Moreover, we make sure that, given the equilibrium strategy
at the timing of taking ai;t ,
the value does not depend on ai;t : For each ai;t 2 Ai ,
E
h
c:i:
i
(aj;t ; yj;t ) [ (aj; ; yj; )
2S(t)
S(t)
i ,
The following lemma ensures that we can …nd S,
S(t)
i
mix
j ; ai;t ;
j
j,
i
= 0:
c:i:
i ,
qG , qB ,
and "strict to satisfy the
conditions above:
Lemma 4 There exist S 2 N, "strict > 0, uc:i: > 0, and 1 > qG > qB > 0 with qG 21 = 12 qB > 0
S
S(t)
(ai;t ; yi;t ) [ (ai; ; yi; ) 2S(t); s 1 !
such that, for each i 2 I, t 2 N, and S(t), there exist i :
s2S(t)
Ai ,
j
: (Aj
Yj )
S+1
! [0; 1], and
c:i:
i
: (Aj
Yj )
S+1
uc:i: ; uc:i: such that the following
!
properties hold:
1. Take any pure strategy of player i, denoted by
i
S
:
(ai;t ; yi;t ) [ (ai; ; yi; )
s2S(t)
For each (ai;t ; yi;t ), if there exists hi = (ai;t ; yi;t ) [ (ai; ; yi; )
S(t)
i
such that (a) hi is reached by the equilibrium strategy
(b)
i (hi )
6=
S(t)
i (hi )
(
i
2S(t);
s 1
2S(t);
s 1
! Ai .
for some s 2 S(t)
with a positive probability, and
is an on-path deviation), then the continuation payo¤ from hi is
decreased by at least "strict :
E
h
E
c:i:
i
h
c:i:
i
(aj;t ; yj;t ) [ (aj; ; yj; )
2S(t)
(aj;t ; yj;t ) [ (aj; ; yj; )
mix
j ; ai;t ; yi;t ;
j
2S(t)
j
S(t)
i ; hi
mix
j ; ai;t ; yi;t ;
i ; hi
i
i
"strict :
2. The conditional independence property holds: For each ai;t 2 A and yi;t 2 Yi ,
E
h
j
(aj;t ; yj;t ) [ (aj; ; yj; )
2S(t)
j
mix
j ; ai;t ; yi;t ;
18
S(t)
i
i
=
8
>
>
q
>
< G
1
2
>
>
>
: q
B
if ai;t = ai (G)
if ai;t 6= ai (G); ai (B)
if ai;t = ai (B):
3. The expected value does not depend on ai;t : For all ai;t 2 Ai .
h
E
c:i:
i
(aj;t ; yj;t ) [ (aj; ; yj; )
2S(t)
mix
j ; ai;t ;
j
S(t)
i
i
= 0:
Proof. See Appendix A.4.
Let us provide the intuition of the proof. Given player i’s history in period t, (ai;t ; yi;t ), player
i is asked to “report” (ai;t ; yi;t ) to player j in periods S(t).
For a moment, imagine that player
i sends the message via cheap talk. Given player i’s message (^
ai;t ; y^i;t ), suppose player j gives a
reward
1aj;t ;yj;t
E 1aj;t ;yj;t j
2
mix
^i;t ; y^i;t
j ;a
:
Here, 1aj;t ;yj;t is jAj j jYj j-dimensional vector whose element corresponding to (aj;t ; yj;t ) is one and the
other elements are zero. On the other hand, E 1aj;t ;yj;t j
of 1aj;t ;yj;t given that player j takes
mix
j
mix
^i;t ; y^i;t
j ;a
is the conditional expectation
and player i observes (^
ai;t ; y^i;t ). Throughout the paper,
we use Euclidean norm.
From player i’s perspective, ignoring the instantaneous utility as in (17), she wants to minimize
E
h
1aj;t ;yj;t
E 1aj;t ;yj;t j
mix
^i;t ; y^i;t
j ;a
2
j
mix
j ; ai;t ; yi;t
i
(18)
by taking (^
ai;t ; y^i;t ) optimally. As will be seen in Lemma 16, given Assumption 4, we can show that
(^
ai;t ; y^i;t ) = (ai;t ; yi;t ) (telling the truth) is the unique optimal strategy.13 Since a
^i;t = ai;t , player j
has enough information to create
j
to satisfy Claim 2 of Lemma 4.
Without cheap talk, player i takes a message by taking an action sequence in S(t) and player j
infers what action sequence/message player i sends from player j’s history in S(t). Player i wants to
minimize (18) with (^
ai;t ; y^i;t ) replaced with player j’s inference of the message. By taking S = jS(t)j
su¢ ciently long, ex ante (after period t but before S(t)), we can make sure that given player i’s
equilibrium strategy
S(t)
i
to maximize the reward, player j infers (ai;t ; yi;t ) correctly with a high
probability. Since Claim 2 of Lemma 4 requires conditional independence at the end of period t,
this ex ante high probability is su¢ cient to create
j
(aj;t ; yj;t ) [ (aj; ; yj; )
2S(t)
to satisfy Claim
2.
The strictness (Claim 1) can be achieved as follows: In the last period in S(t), if there is player
13
The objective function is called “the scoring rule” in statistics.
19
i’s history with which player i is indi¤erent, then player j can break the tie by giving a small reward
for player i’s speci…c action based on (aj; ; yj; ) with
being the last period in S(t). We can make
sure that this tie-breaking reward is small enough not to a¤ect the strictness of the incentives for
player i’s histories after which player i had the strict incentive originally and not to a¤ect player i’s
incentive to minimize (18) so much (that is, the ex ante probability of player j infering (aj;t ; yj;t ) is
still high).
Then, we can proceed by backward induction. Whenever player j breaks a tie for some period
, it does not a¤ect the strict incentives in the later periods
in a certain period
0
>
since the reward to break a tie
based on (aj; ; yj; ) will be sunk in the later periods
0
> .
Finally, since player j can statistically identify player i’s action ai;t from player j’s signal yj;t ,
by creating a reward function solely based on yj;t in order to reward (or punish) ai;t which gives a
low value (or high value) in S(t), we can make sure that the expected value does not depend on ai;t
to satisfy Claim 3. Since the reward based solely on yj;t is sunk in S(t), this reward does not a¤ect
player i’s incentive in S(t).
We now …x S,
7
S(t)
i ,
j,
qG , qB ,
c:i:
i ,
and "strict so that Lemma 4 holds.
Structure
Given the variables …xed in Section 6, we pin down the structure of (the strategy in) each review
phase/…nitely repeated game with T 2 N being a parameter.
The review phase is divided into
blocks, and the block is divided into rounds. See Figure 2 for illustration.
First, given each player i’s state xi , the players coordinate on x in order to take a(x) depending
on x with a high probability. To this end, at the beginning of the …nitely repeated game, they play
the coordination block.
In particular, players …rst coordinate on x1 . First, player 1 sends x1 2 fG; Bg to player 2 by
1
1
taking actions, spending T 2 periods with a large T . We call these T 2 periods “round (x1 ; 1),”and
let T (x1 ; 1) be the set of periods in round (x1 ; 1). (In this step, we focus on the basic structure
and postpone the explanation of the formal strategy in each round until Section 13.)
Throughout the paper, we ignore the integer problem since it is easily dealt with by replacing
each variable n with the smallest integer no less than n.
2
2
Second, player 1 sends x1 to player 2, spending T 3 periods. We call these T 3 periods “round
20
Figure 2: Structure of the review phase
(x1 ; 2),” and let T (x1 ; 2) be the set of periods in round (x1 ; 2). (We will explain in Section 13.2
why player 1 sends x1 twice and why round (x1 ; 2) is longer.)
Based on rounds (x1 ; 1) and (x1 ; 2), player 2 creates the inference of x1 , denoted by x1 (2) 2
fG; Bg. (In general, we use subscript to denote the original owner of the variable and index in the
parenthesis to denote a player who makes the inference of the variable. For example, variablej (i)
means that player j knows variablej , and that variablej (i) is player i’s inference of variablej .)
1
1
Third, player 2 sends x1 (2) to player 1, spending T 2 periods. We call these T 2 periods “round
(x1 ; 3),” and let T (x1 ; 3) be the set of periods in round (x1 ; 3). Based on rounds (x1 ; 1), (x1 ; 2),
and (x1 ; 3), player 1 creates the inference of x1 , denoted by x1 (1) 2 fG; Bg. (This inference x1 (1)
may be di¤erent from her original state x1 .)
Once players are done with coordinating on x1 , they coordinate on x2 . The way to coordinate
on x2 is the same as the way to coordinate on x1 , with players’indices reversed.
After the coordination on x is done, the players play L “main blocks.” In each main block,
…rst, the players play a T -period review round. With a large T , we can recover the precision of the
monitoring by the law of large numbers. After each review round, the players coordinate on what
strategy they should take in the next review round, depending on the history (as explained in the
21
Introduction). This coordination is done by each player i sequentially sending a binary message
i (l
+ 1) 2 fG; Bg, which summarizes the history at the end of the review round. (The precise
meaning of
i (l
+ 1) 2 fG; Bg will be explained in Section 13.4.)
In total, for l = 1; :::; L
1, the players …rst play review round l for T periods. Let T (l) be
the set of review round l. Then, player 1 sends
1
these T 2 periods “the supplemental round for
(l + 1),”and let T (
1
(l + 1)) be the set of periods
periods. “The supplemental
(l + 1)) are similarly de…ned.
We call these three rounds
2 (l
round for
2
(l + 1)” and the set T (
1
+ 1) 2 fG; Bg, spending T 2 periods. We call
1
2
in it. After player 1, player 2 sends
2
1
1 (l
+ 1) 2 fG; Bg, spending T
“main block l.” Once main block l is over, the players play review round l + 1, recursively.
After the last review block L, since the players do not need to coordinate on the future strategy
any more, main block L consists only of review round L.
Given this structure, we can chronologically order all the rounds in the coordination and main
blocks, and name them round 1, round 2, ..., and round R. Here,
R
6 + 3 (L
(19)
1) + 1
is the total number of rounds in the coordination and main blocks.
For example, round 1 is
equivalent to round (x1 ; 1), round 2 is equivalent to round (x1 ; 2), and so on.
chronological order, when we say r
Given such a
l, this means that round r is review round l or a round
chronologically before review round l. Similarly, r < l means that round r is a round chronologically
before (but not equal to) review round l. In addition, let T(r) be the set of periods in round r; for
example, T(r) = T (x1 ; 1), and t(r) + 1 is the …rst period of round r.
Finally, the players play the report block, where the players send the summary statistics of the
history in the coordination and main blocks. As will be seen in Section 15, we use this block to
22
…ne-tune the reward function. This round lasts for Treport periods with
1
2
Treport
1 + (S + 1) T + (S + 1) T
1
2
R
X
r=1
R
X
1
2
+ (S + 1)T + 1
r=1
+(S + 1)T
1
2
1
+(S + 1)T 4
R
X
r=1
R
X
r=1
1
2
jT(r)j 3 log2 1 + jT(r)j 3 jA1 jjY1 j
1
3
log2 jT(r)j + (S + 1)T
1
3
jT(r)j log2 jT(r)j
2
jA2 jjY2 j
3
1
4
R
X
r=1
(20)
2
jT(r)j 3 (1 + log2 jA1 j jY1 j)
1
2
+ (S + 1)T + 1
R
X
r=1
1
log2 jT(r)j 3
2
jT(r)j 3 log2 jA2 j jY2 j :
11
Note that Treport is of order T 12 .
We have now pinned down the structure of the review phase, with T being a parameter. For
a su¢ ciently large T , the payo¤s from the review round determine the equilibrium payo¤ from the
review phase since the length of the review rounds is much longer than the other rounds/blocks:
Lemma 5 Let
1
2
2 + 2T 3
TP (T ) = 4T
| {z
} +
coordination block
LT
|{z}
review round
+ (L
|
supplemental round
be the length of the review phase. We have limT !1
T(r)
Given player i’s history hi
1
1) 2T 2 +
{z
}
Treport
| {z }
11
report block, O(T 12 )
length of the review rounds
length of the review phase
limT !1
LT
TP (T )
= 1.
(ai;t ; yi;t )t2T(r) in round r in the coordination and main blocks,
let
T(r)
fi [hi
]
T(r)
fi [hi
](ai ; yi )
(ai ;yi )2Ai Yi
# ft 2 T (r) : (ai;t ; yi;t ) = (ai ; yi )g
jT (r)j
(21)
(ai ;yi )2Ai Yi
be the vector expression of the frequency of periods in round r where player i has (ai;t ; yi;t ) = (ai ; yi ).
It will be useful to consider the following randomization of player i: For each round r, player i
picks a period texclude
(r) 2 T (r) randomly: texclude
(r) = t with probability
i
i
independently of player i’s history. Let Tinclude
(r)
i
23
1
jT(r)j
for each t 2 T (r),
T (r) n texclude
(r) be the periods other than
i
texclude
(r), and let
i
T(r)
fiinclude [hi
T(r)
fiinclude [hi
]
# t2
(r).
be the frequency in Tinclude
i
](ai ; yi )
Tinclude
i
(ai ;yi )2Ai Yi
(r) : (ai;t ; yi;t ) = (ai ; yi )
jT (r)j 1
!
(22)
(ai ;yi )2Ai Yi
As will be seen in Section 13, player i decides the continuation
T(r)
play in the subsequent rounds based only on fiinclude [hi
]. This implies that player j never learns
(ai;texclude
(r) ; yi;texclude
(r) ) by observing her own private signals informative about player i’s continuation
i
i
play. This property will be important in Section 15.
T(r)
Given fi [hi
], let
0
B
@
fi [hi l ]
T(x1 ;1)
fi [hi
T(x1 ;2)
]; fi [hi
T(x1 ;3)
]; fi [hi
l)
T(
T(~
fi [hi ]; fi [hi
~
)
1 (l+1)
T(x2 ;1)
]; fi [hi
T(
]; fi [hi
T(x2 ;2)
]; fi [hi
~
)
2 (l+1)
T(x2 ;3)
]; fi [hi
];
l 1
]
~
l=1
T(l)
; fi [hi ]
1
C
A
(23)
be player i’s frequency of the history at the end of review round l; and let fi [h<l
i ] be the frequency
T(l)
which exclude fi [hi
] from fi [hi l ] (that is, the frequency at the beginning of review round l).
l
<l
fiinclude [hi l ] and fiinclude [h<l
i ] are similarly de…ned. On the other hand, let hi and hi be player i’s
histories at the end and beginning of review round l, respectively. Similarly, let hi
r
and h<r
be
i
player i’s histories at the end and beginning of round r, respectively.
be player j’s history in the report block.
Finally, let hreport
j
8
Road Map to Prove (5)–(8)
We have …xed
each ,
qB ,
c:i:
i ,
payo
i[
],
xj
i ,
x
u > 0, ui j
i2I;xj 2fG;Bg
, (a(x);
> 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg , L 2 N,
i (x);
;
i
(x))x2fG;Bg2 for each ,
> 0, ai (G), ai (B),
mix
i ,
S,
min;
i
for
j,
qG ,
S(t)
i ,
and "strict in Section 6, and then de…ned the structure in the …nitely repeated game and
TP (T ) in Section 7. Given this structure, we are left to pin down T 2 N, ff
i
(xi )gxi 2fG;Bg gi2I ,
and ff i (xj ; hTj P +1 )gxj 2fG;Bg gi2I to satisfy (5)–(8) with …xed fvi (xj )gi2I;xj 2fG;Bg and TP (T ).
Since the proof is long and complicated, we …rst o¤er the overview in Section 9.
Then, we
prove two modules which may be of their own interests. Section 10 de…nes the module for player
24
j to send a binary message m 2 fG; Bg to player i. This module will be used in the equilibrium
construction so that the players can coordinate on the continuation play by sending messages via
actions. Section 11 de…nes the module that will be used for the equilibrium construction of the
review round. The proof for each module is relegated to the online appendix. The road map of
how to use these modules in the equilibrium construction will be postponed after Section 11.
9
Overview of the Equilibrium Construction
As Lemma 5 ensures, the equilibrium payo¤ is determined by the payo¤s in the review rounds.
Hence, in this section, we focus on how to de…ne the strategy and reward function in the review
rounds.
9.1
Heuristic Reward Function
For a moment, suppose that the players can coordinate on the true x: x(i) = x(j) = x.
<
payo
. Suppose that the players take (
i (x(i));
j (x(j)))
=
Take
(x) in each review round and
that player i’s reward function is
LT fvi (xj )
ui (
(x))g +
L X
X
i[
(x)](aj;t ; yj;t ):
(24)
l=1 t2T (l)
Claim 1 of Lemma 2 ensures that player i’s incentive is satis…ed and that her equilibrium value is
vi (xj ).
PTP
t=1
Moreover, since E [ i [
i[
(x)](aj;t ; yj;t ) j
(x)] = 0, the law of large numbers ensures that
(x)](aj;t ; yj;t ) is near zero with a high probability. Together with (14), we conclude that
self generation is satis…ed with a high probability. See Figure 3 for illustration.
However, self generation needs to be satis…ed after each possible realization of player j’s history,
and after an erroneous realization of player j’s history, self generation would be violated if we used
(24). For example, consider prisoners’dilemma
C
C
2; 2
D 3; 1
25
D
1; 3
0; 0
Figure 3: Illustration of the heuristic reward with xj = G
where, for each j 2 I, signal structure is Yj 2 f0j ; 1j g, qj (1j j Ci ; aj ) = :6, and qj (1j j Di ; aj ) = :4
for each aj 2 fC; Dg. Then,
i[
(x)](aj ; yj ) = 5
1fyj =1j g
5 + ui (
(x)), where 1fyj =1j g = 1 if
yj = 1j and 1fyj =1j g = 0 otherwise. If x = (G; G) and v is close to the mutual cooperation payo¤
(2; 2), then both vi (xj ) and ui (
(x)) are near 2 by Figure 1. Hence, if player j observes yj = 1j
excessively often (say, almost TP periods), then (24) is equal to
LT fvi (xj ) ui (
|
{z
(x))g + 5
}
both of them are near 2
# ft : yj;t = 1j g
|
{z
}
near TP
5LT + ui ( (x))LT > 0:
| {z }
near 2
If signals were conditionally independent, then the …x suggested by Matsushima (2004) would
work: If (24) violates self generation, then player j gives player i the reward of zero. Since signals
are conditionally independent, player i after each history believes that such an event happens very
rarely. (Since player i puts a small yet positive belief on (24) violating self generation, Matsushima
uses pure strategy a(x) and gives a strict incentive to follow ai (x), so that player i wants to take
ai (x) even though the reward is zero with a small probability.)
However, with conditionally dependent signals, this …x does not work. For example, if player i’s
signals and player j’s signals are positively correlated, after player i observes a lot of 1i , she starts
to believe that (24) violates self generation with a high probability. After such an event, incentive
compatibility would not be satis…ed if we gave the reward of zero.
26
9.2
Type-1, Type-2, and Type-3 Reward Functions
To deal with this problem, we de…ne the overall reward is the summation of the reward for each
review round l (we need more modi…cation for the formal proof):
TP +1
) = LT fui (xj )
i (xj ; hj
where
T(l)
review
(xj ; hj ; l)
i
ui ( (x))g +
L
X
T(l)
review
(xj ; hj ; l);
i
l=1
is player i’s reward for review round l. We use ui (xj ) instead of vi (xj ) here
to keep some slack in self generation for the later modi…cations (see (14) for why ui (xj ) gives us more
T(l)
review
(xj ; hj ; l)
i
slack). Now, player i’s value without further adjustment is ui (xj ). The shape of
can be either type-1, type-2, or type-3.
The type-1 reward function is the same as heuristic reward:
T(l)
review
(xj ; hj ; l)
i
=
P
t2T(l)
i[
(x)]
(aj;t ; yj;t ). By Lemma 2, the maximum realization of the absolute value of the type-1 reward per
round is u4 T . Hence, as long as the realized absolute value of the type-1 reward per round is no
more than
u
T
4L
until review round l, no matter what realization happens in review round l + 1, the
total absolute value at the end of review round l + 1 is
l+1 X
X
i[
u
T
4L
(x)](aj;t ; yj;t )
^
l=1 t2T(^
l)
u
l+ T
4
u
T:
2
Together with (15), this means that self generation is not an issue.
P
u
Therefore, as long as
(x)](aj;t ; yj;t )
T for each ^l = 1; :::; l, player j uses the
t2T(^
l) i [
4L
type-1 reward in review round l + 1. Let j (l + 1) 2 fG; Bg such that j (l + 1) = G if and only
P
u
if
(x)](aj;t ; yj;t )
T for each ^l = 1; :::; l. Note that j (l + 1) depends on player j’s
t2T(^
l) i [
4L
history hj l . (Presicely speaking, we should write
On the other hand, if
j (l
j (l
+ 1) = B, then if player j used the type-1 reward function for review
round l + 1, then self generation might be violated.
equal to
;
j
+ 1)(hj l ), but we omit hj l for simplicity.)
Hence, player j takes
indi¤erent between “player j taking
;
(x(j)) (which is
(x) if x(j) = x) and gives a constant reward. (15) ensures that there exists a constant
reward such that (i) if coordination goes well and player i takes BRi (
j
;
j
j (x(j))
;
j
(x(j))), then player i is
and using the type-1 reward” and “player j taking
(x(j)) and using the constant reward”(and so the switch of the reward function does not a¤ect
player i’s incentive in the previous rounds) and that (ii) self generation is satis…ed. We call this
27
constant reward “the type-2 reward function.”
In the equilibrium construction, player i creates an inference of
j (l
+ 1),
j (l
+ 1)(i) 2 fG; Bg,
which depends on player i’s history h<l+1
at the beginning of review round l + 1.
i
j (l + 1)(i)
j (l+1)
= G means player i believes that
= B. Player i takes
since player j with
j (l
if
i (x(i))
+ 1) takes
;
j
j (l + 1)
j (l+1)(i)
= G and
j (l + 1)(i)
= G and takes BRi (
(x(j)) (which is equal to
;
;
j
Intuitively,
= B means she believes
(x(i))) if
j (l+1)(i)
=B
(x(i)) if the coordination goes
j
well: x(i) = x(j)) and the reward is type-2 and constant. (Here, we ignore player i’s role to control
player j’s payo¤. As player j takes an action based on
player i also takes her action based on not only
j (l
j (l + 1)
in order to control player i’s payo¤,
+ 1)(i) but also
i (l
+ 1) to control player j’s
payo¤. We ignore this complication in this section.)
We say that the coordination goes well if x(i) = x(j) = x and “ j (l + 1)(i) = B whenever
j (l
+ 1) = B”.
The above discussion means that, if the coordination goes well, then player
i’s strategy is optimal.
j (l + 1)
(One may wonder why her strategy is optimal if
= G. Note that player j with
j (l + 1)
j (l
+ 1)(i) = B and
= G uses the type-1 reward, and Lemma 2 ensures
that any strategy of player i maximizes the summation of the instantaneous utility and reward
function. Hence, any strategy of player i is optimal as long as
j (l
+ 1) = G.)
Finally, player j sometimes uses the following “type-3”reward:
T(l)
review
(xj ; hj ; l)
i
=
X
xj
i (aj;t ; yj;t ):
(25)
t2T(l)
By Claim 2 of Lemma 2, this reward always makes player i indi¤erent between any action pro…le
x
(player i’s incentive is irrelevant) and satis…es self generation, but the value ui j may be very di¤erent
from the targeted value vi (xj ) (and so promise keeping may become an issue). We will make sure
that the type-3 reward is used with a small probability so that we will not violate promise keeping.
Moreover, we will also make sure that player i cannot deviate to a¤ect player j’s decision of using
x
the type-3 reward function (since otherwise, depending on whether ui j is larger or smaller than the
value with the type-1 or type-2 reward, player i may want to deviate to a¤ect the decision). That
is, the distribution of the event that player j uses the type-3 reward does not depend on player i’s
strategy.
Further, whenever player j has
i (l)(j)
(that is, player j believes that her reward function is
constant and takes a static best response to player i’s action), player j has determined to use the
28
type-3 reward for player i, so that player i can ignore the possibility of
9.3
i (l)(j)
= B.
Miscoordination of the Continuation Strategy
Since players coordinate on the continuation strategy based on private signals, it is possible that
coordination does not go well. There are following two possibilities. First, players coordinate on
the same x(i) = x(j), but this is di¤erent from true x. Recall that xj controls player i’s payo¤.
Hence, as long as players coordinate on the same x(i) = x(j) and xj (i) = xj (j) = xj , player i’s
payo¤ is equal to ui (xj ). We will make sure that player j with xj (j) 6= xj uses the type-3 reward
so that, as long as players coordinate on the same x(i) = x(j), either player i’s payo¤ is equal to
ui (xj ) or player i’s incentive is irrelevant. Since the type-3 reward is not used often, by adjusting
the reward slightly, we can make sure that player i’s ex ante equilibrium payo¤ is vi (xj ).
Second, player i has x(i) 6= x(j) or “ j (l + 1)(i) = G if
player i with history hi
l+1
j (l + 1)
= B”. We will make sure that,
(at the end of review round l + 1) believes
9
08
< fx(i) 6= x(j) _ f j (l + 1)(i) = G ^ j (l + 1) = Bgg =
o
n
j xj ; hi
Pr @
: ^ review (xj ; hT(l+1) ; l + 1) is not type-3 reward ;
i
j
1
l+1 A
1
exp( T 3 ):
(26)
Since the expected di¤erence of player i’s payo¤ for di¤erent actions is zero with the type-3 reward
function, this means that player i’s conditional expectation of the gain of changing her action to deal
with the miscoordination is very small. (Note that (26) holds, conditioning on hi
l+1
, which includes
player i’s history in review round l +1.) That is, without further adjustment, the equilibrium would
be "-sequential equilibrium, where deviation gain after each history is no more than " being of order
1
exp( T 3 ).
9.4
Adjustment and Report Block
To make the equilibrium strategy exactly optimal, we further modify the reward function as follows:
The above discussion implies that, if we changed player i’s reward function for review round l so
29
that
target
(xj ; hi l ; hj l ; l)
i
= 1nfx(i)6=x(j)_f
+ 1
j (l)(i)=G^ j (l)=Bgg^ftype-3
1nfx(i)6=x(j)_f
o
j (l)(i)=G^ j (l)=Bgg^ftype-3
type-1 reward function
review (x ;hT(l) ;l)g
j j
i
reward is not used in
then player i’s equilibrium strategy would be optimal.
T(l)
review
(xj ; hj ; l)
i
o
review (x ;hT(l) ;l)g
j j
i
reward is not used in
(Recall that Lemma 2 ensures that the
type-1 reward function makes each strategy of player i optimal. Hence, player i does not need to
worry about miscoordination.) Note that such a reward would depend on player i’s history hi l as
well, since x(i) and
E
h
j (l)(i)
depends on hi l . Moreover, (26) ensures that
target
(xj ; hi l ; hj l ; l)
i
j xj ; hi
l
i
E
h
T(l)
review
(xj ; hj ; l)
i
j xj ; hi
l
i
(27)
1
is of order exp( T 3 ) with l + 1 replaced with l.
adjust
(xj ; hreport
; l),
i
j
If player j can construct a reward function
which depends on player j’s
history in the report block hreport
, such that, from player i’s perspective at the end of review round
j
l, the expected value of
adjust
(xj ; hreport
; l)
i
j
given (xj ; hi l ) is equal to (27), then by the law of iterated
expectation, player i in review round l wants to maximize
E
h
T(l)
review
(xj ; hj ; l)
i
+
adjust
(xj ; hreport
; l)
i
j
j xj ; hi
l
i
=E
h
target
(xj ; hi l ; hj l ; l)
i
j xj ; hi
l
i
and so player i’s equilibrium strategy is incentive compatible.
To this end, we de…ne
adjust
(xj ; hj l ; hreport
; l)
i
j
so that
h h
E E adjust
(xj ; hj l ; hreport
; l) j xj ; hi L ;
i
j
h
i
h
target
l
l
l
= E i
(xj ; hi ; hj ; l) j xj ; hi
E
Here,
report
jh
i
i
L
report
jh
i
i
L
i
j xj ; hi
T(l)
review
(xj ; hj ; l)
i
l
i
j xj ; hi
l
i
:
(28)
is player i’s equilibrium strategy in the report block given her history at the end
of the main block, hi L .
As will be seen in Section 15, we construct player i’s reward function
in the report block so that player i’s optimal strategy
report
jh
i
i
L
depends on her history in the
coordination and main blocks, in particular, on hi l . Then, player j’s history in the report block is
30
correlated with hi l . Therefore, we can construct a function
adjust
(xj ; hj l ; hreport
; l)
i
j
such that (28)
holds.
From now on, before de…ning the strategy and reward function formally, we introduce two
modules that will be useful for the equilibrium construction.
10
Module to Send a Binary Message
We explain how player j sends a binary message m 2 fG; Bg. Speci…cally, we …x m 2 fG; Bg and
de…ne how player j sends m to player i in T, where T
N is the set of periods in which player j
sends m. For example, if we take T = T (xj ; 1) and m = xj , then the following module is the one
with which player j sends xj in round (xj ; 1).
Recall that, for each i 2 I, ai (G); ai (B) 2 Ai with ai (G) 6= ai (B), and
6. Given m 2 fG; Bg, player j (sender) takes actions as follows: With
> 0 are …xed in Section
send
to be determined in
Lemma 6, player j picks one of the following mixtures of her actions at the beginning:
send
1. [Send: Regular] With probability 1 , player j picks j (m)
1 (jAj j 1) send aj (m)+
P
send
That is, player i takes aj (m) (action corresponding to the message m)
aj 6=aj (m) aj .
with a high probability.
send
2. [Send: Opposite] With probability 2 , player j picks j (m)
^
1 (jAj j 1) send aj (m)
^ +
P
send
^ = fG; Bg n fmg. That is, player j takes an action as if the true
aj 6=aj (m)
^ aj with m
message were m
^ 6= m.
3. [Send: Mixture] With probability 2 , player j picks
send
j
(M )
1
2
send
j
(G) + 12
send
j
(B). That
is, player j takes an action as if she mixed two messages G and B.
Let
j (m)
2f
send
j
(G);
send
j
(B);
send
j
(M )g be the realization of the mixture. Given
player j takes aj;t i.i.d. across periods according to
send
j
from
(B), and
j (m)
send
j
j (m).
(That is, the mixture over
(M ) happens only once at the beginning.
Given
j (m),
j (m),
send
j
(G),
player j draws aj;t
every period.) On the other hand, player i (receiver) takes ai according to
mix
i
(fully
mixed strategy) i.i.d. across periods. See Figure 4. In general, thin lines in the …gure represent
the events that happen with small probabilities.
31
Figure 4: How to send message m
In periods T, each player n 2 I observes a history hTn = fan;t ; yn;t gt2T . Given hTn , let
fn hTn (an ; yn )
#ft 2 T : (an;t ; yn;t ) = (an ; yn )g
jTj
be the frequency of periods with action an and signal yn ; and let fn hTn
(fn hTn (an ; yn ))an ;yn be
the vector expression.
2 T randomly:
In addition, as in texclude
(r) in Section 7, each player n picks one period texclude
i
n
texclude
= t with probability
n
…ne Tinclude
n
1
jTj
for each t 2 T, independently of player n’s history.
.
g as the set of periods in T except for texclude
ft 2 T : t 6= texclude
n
n
We deWe de…ne
. That is, these are the
fninclude [hTn ](an ; yn ) and fninclude [hTn ] as above, with T replaced with Tinclude
n
frequencies in the periods except for texclude
.
n
Player i creates an inference of m based on fiinclude [hTi ], denoted by m(fiinclude [hTi ]) 2 fG; Bg.
Since fiinclude [hTi ] is in
(Ai
Yi ), we can see m(fiinclude [hTi ]) as a function m :
On the other hand, player j creates a variable
is,
j (m;
):
(Aj
include T
[hj ])
j (m; fj
(Ai
Yi ) ! fG; Bg.
2 fR; Eg based on fjinclude [hTj ], that
Yj ) ! fR; Eg.
As will be seen in the proof of Lemma 11, once
include T
[hj ])
j (m; fj
= E happens in a round where
player j sends a message, then player j uses the type-3 reward for player i (that is, player j makes
player i indi¤erent between any actions, as seen in (25)) in the subsequent review rounds. In order
to satisfy (26) in the overview, we make sure that, given the true message m and player i’s history
hTi , if m(fiinclude [hTi ]) 6= m (if player i makes a mistake to infer the message), then the conditional
probability of
include T
[hj ])
j (m; fj
= E is su¢ ciently high (see Claim 2 of Lemma 6 for the formal
argument). That is, if player i realizes her mistake to infer player j’s message, then player i believes
32
that any strategy is optimal with a high probability.
In addition, as mentioned in Section 9.2, we want to make sure that the type-3 reward is used
with a small probability so that we will not violate promise keeping.
include T
[hj ])
j (m; fj
include T
[hj ])
j (m; fj
so that
6). We use R and E for the realization of
To this end, we de…ne
= E with a small probability (see Claim 3 of Lemma
j,
meaning that R stands for “regular” and E stands
for “erroneous.”
Moreover, again as mentioned in Section 9.2, we want to make sure that the distribution of the
event that player j uses the type-3 reward does not depend on player i’s strategy. To this end, we
make sure that the distribution of
include T
[hj ])
j (m; fj
does not depend on player i’s strategy (again
see Claim 3 of Lemma 6).
In addition to m(fiinclude [hTi ]), player i also creates a variable
that is,
i (receive;
):
(Ai
include T
[hj ]),
j (m; fj
Yi ) ! fR; Eg. As
include T
[hi ])
i (receive; fi
once
2 fR; Eg,
include T
[hi ])
i (receive; fi
=E
happens in a round where player j sends a message, then player i uses the type-3 reward for player
include T
[hi ])
i (receive; fi
j in the subsequent review rounds. Again, we make sure that
include T
[hi ])
i (receive; fi
with a small probability and that the distribution of
= E happens
does not depend on
player j’s strategy (see Claim 4 of Lemma 6).
Further, we make sure that player j with
include T
[hj ])
j (m; fj
= R (that is, if player j does not
use the type-3 reward for player i) believes that mi (fiinclude [hTi ]) = m or
include T
[hi ])
i (receive; fi
=E
(that is, player i infers the message correctly or player i uses the type-3 reward for player j) with
a high probability (see Claim 5 of Lemma 6).
In particular, with m = xj , this means that as long as player j is not using the type-3 reward for
player i, she believes that player i received xj correctly or player i uses the type-3 reward. Since
player j is indi¤erent between any action in the latter case, in total, this belief incentivizes her to
adhere to her own state xj , as mentioned in Section 9.3 and will be shown in Claim 5 of Lemma 11.
Finally, since Assumption 1 assumes that signal pro…les have full support, each player cannot
…gure out the other player’s actions or inferences perfectly (see Claims 6 and 7 of Lemma 6).
In total, we can construct three functions m,
Lemma 6 There exist
i (receive;
) :
(Ai
send
i,
and
j,
so that the following lemma holds:
> 0, "message > 0, Kmessage < 1, m :
Yi ) ! fR; Eg, and
j (m;
) :
(Aj
(Ai
Yj ) ! fR; Eg such that, for a suf-
…ciently large jTj, for each i 2 I and m 2 fG; Bg, the following claims hold:
33
Yi ) ! fG; Bg,
1. Since texclude
is random, the frequency is the su¢ cient statistic to infer the other players’
n
Yj )jTj , m; m
^ 2 fG; Bg, and ^i 2 fR; Eg, we have
variables: For each hTj 2 (Aj
Prijj
= Prijj
n
o
m(fiinclude [hTi ]) = m
^ i ; i (receive; fiinclude [hTi ]) = ^i j m; hTj
o
n
^ i ; i (receive; fiinclude [hTi ]) = ^i j m; fj [hTj ] :
m(fiinclude [hTi ]) = m
In addition, for each hTi 2 (Ai
Prjji
= Prjji
Here, Prijj
j m; hTj and Prjji
Yi )jTj and ^j 2 fR; Eg, we have
n
n
= ^j
include T
[hj ])
j (m; fj
include T
[hj ]) = ^j
j (m; fj
o
o
j m; hTi
j m; fi [hTi ] :
j m; hTi are induced by the following assumptions: In Prijj
j m; hTj ,
player j’s action sequence faj;t gt2T is given by hTj . On the other hand, player i takes ai;t i.i.d.
across periods according to
mix
i .
Given fat gt2T , the signal pro…le yt is drawn from the condi-
tional joint distribution function q(yt j at ) for each t, and player j observes her signal fyj;t gt2T .
randomly. Prjji
Finally, player i draws texclude
i
and j reversed, and
mix
i
replaced with
2. For all m 2 fG; Bg and hTi 2 (Ai
j m; hTi is de…ned in the same way, with i
given m.
j (m)
Yi )jTj , if m(fiinclude [hTi ]) 6= m (player i misinfers the
message), then we have
Prjji
include T
[hj ])
j (m; fj
= E j m; hTi
3. For each m and for each strategy of player i denoted by
probability of
1
j
= R does not depend on m or
2 for each m 2 fG; Bg and
1
:
SjTj
s=0
1
(Ai
Yi )s !
Prjji
include T
[hj ])
j (m; fj
j
:
SjTj
s=0
1
Yj )s !
(Aj
= R does not depend on player j’s strategy: Prijj
2 does not depend on
i
(29)
(Ai ), the
= R j m;
i
=
i.
4. For each strategy of player j denoted by
i
i:
exp( "message jTj):
1
(Aj ), the probability of
include T
[hi ])
i (receive; fi
=R j
j
=
14
j.
14
Here, we omit m from Prijj j m; hTj since player j’s strategy
determine the distribution of actions and signals.
34
j
and player i’s equilibrium strategy
mix
i
fully
Yj )jTj , if
5. For all hTj 2 (Aj
include T
[hi ])
i (receive; fi
Prijj
1
include T
[hj ])
j (m; fj
= R, then we have
= E _ m(fiinclude [hTi ]) = m j m; hTj
exp( "message jTj):
Yj )jTj , any inference of player i is possible:
6. For all m; m
^ 2 fG; Bg and hTj 2 (Aj
Prijj
m(fiinclude [hTi ]) = m
^ j m; hTj
7. For all m 2 fG; Bg and hTi 2 (Ai
exp( Kmessage jTj):
Yi )jTj , any history of player i is possible: Pr hTi j m
exp( Kmessage jTj):
Proof. See Appendix A.6.
Claim 1 holds once we de…ne m(fiinclude [hTi ])
Let us intuitively explain why Lemma 6 holds.
and
include T
[hi ])
i (receive; fi
so that they depend only on fiinclude [hTi ], and de…ne
include T
[hj ])
j (m; fj
so
that it depends only on fjinclude [hTj ]. Hence, we concentrate on the other claims.
First, we de…ne
include T
[hj ])
j (m; fj
such that player j has
include T
[hj ])
j (m; fj
= R only if (and if,
except for a small adjustment in the formal proof) all of the following conditions are satis…ed:
1. [Regular Mixture Send] Player j picks
j (m)
2. [Regular Action Send] Let fjinclude [hTj ](aj )
send
=
j
P
(m).
yj 2Yj
fjinclude [hTj ](aj ; yj ) be the frequency of
player j’s actions. We say that player j’s action frequency is regular if, for a small "message > 0,
we have fjinclude [hTj ](aj )
j (m)(aj )
"message for each aj 2 Aj .
3. [Regular Signal Send] Let fjinclude [hTj ](Yj j aj )
fjinclude [hTj ](yj
j aj )
fjinclude [hTj ](yj j aj )
yj 2Yj
with
fjinclude [hTj ](aj ; yj )
fjinclude [hTj ](aj )
be the vector-expression of the conditional frequency of player j’s signals given player j’s
action aj . (We de…ne fjinclude [hTj ](yj j aj ) = 0 if fjinclude [hTj ](aj ) = 0.) On the other hand, let
a
fqj (aj ; ai )gai 2Ai be the a¢ ne hull of player j’s signal distributions with respect to player
35
i’s actions. We say that player j’s signal frequency is regular if, for a small "message > 0,
d fjinclude [hTj ](Yj j aj (m)); a
fqj (aj (m); ai )gai 2Ai
(30)
"message :
Here and in what follows, we use Euclidean norm k k and Hausdor¤ metric d.
Given this de…nition, we verify Claim 3 of Lemma 6.
[Regular Mixture Send] and [Regular
Action Send] are solely determined by player j’s mixture, which player i cannot a¤ect. Moreover,
player i cannot a¤ect the probability of [Regular Signal Send] since a
fqj (aj (m); ai )gai 2Ai is the
a¢ ne hull with respect to player i’s actions. (Precisely speaking, taking a¢ ne hull ensures that
player i cannot change the expected distance
h
E d
fjinclude [hTj ](Yj
j aj ); a
fqj (aj ; ai )gai 2Ai
i
(31)
;
but does not guarantee that player i cannot change the distribution of the distance
Pr d fjinclude [hTj ](Yj j aj ); a
fqj (aj ; ai )gai 2Ai
(32)
:
We take care of player i’s incentive to change the distribution in the formal proof.)
probability of
Moreover,
include T
[hj ])
j (m; fj
j (m)
=
send
j
Hence, the
= R does not depend on player i’s strategy.
(m) with a high probability, as seen in Figure 3, and by the law of
large numbers, [Regular Action Send] and [Regular Signal Send] happen with a high probability
given
j (m)
=
send
j
(m). Hence,
include T
[hj ])
j (m; fj
= R with a high probability. Hence, Claim 3
of Lemma 6 holds.
On the other hand, we de…ne i (receive; fiinclude [hTi ]) as follows: Let fiinclude [hTi ](ai )
P
yi 2Yi
fiinclude
[hTi ](ai ; yi ) be the frequency of player i’s actions. We say that player i’s action frequency is regular
if, for a small "message > 0, we have fiinclude [hTi ](ai )
mix
i (ai )
[Regular Action Receive] denote this event. Player i has
"message for each ai 2 Ai .
include T
[hi ])
i (receive; fi
Let
= R only if (and if,
except for a small adjustment in the formal proof) [Regular Action Receive] holds. Again, we can
make sure that the probability of
include T
[hi ])
i (receive; fi
= R does not depend on player j’s strategy
and is high by the law of large numbers. Hence, Claim 4 of Lemma 6 holds.
We now de…ne player i’s inference m(fiinclude [hTi ]), so that Claim 2 of Lemma 6 holds.
36
She
calculates the log likelihood ratio between
prior of
j (m)
=
send
j
(G) and
j (m)
=
send
j
(B), ignoring the
j (m):
log
o
(G)
; fai;t gt2T
j
n
o
send
j
(B) ; fai;t gt2T
j (m) = j
Pr fyi;t gt2T j
n
send
j (m) =
Pr fyi;t gt2T
(
X
= jTj
fi [hTi ](ai ; yi ) log qi (yi j ai ;
send
j
(G))
ai ;yi
X
(33)
fi [hTi ](ai ; yi ) log qi (yi
ai ;yi
j ai ;
send
j
)
(B)) :
Related to (33), let
X
Li (fi [hTi ]; G)
ai ;yi
be the log likelihood of
j (m)
=
send
j
fi [hTi ](ai ; yi ) log qi (yi j ai ;
send
j
(G))
(G). Li (fi [hTi ]; B) and Li (fi [hTi ]; M ) are de…ned in the same
way, with G replaced with B and M , respectively.
Given these likelihoods, player i creates m(fiinclude [hTi ]) as follows: For some Lreceive
belief > 0,
1. [Case G] If Li (fiinclude [hTi ]; G) > Li (fiinclude [hTi ]; B) + 2Lreceive
belief , then player i infers that the
message is G: m(fiinclude [hTi ]) = G.
2. [Case B] If Li (fiinclude [hTi ]; B) > Li (fiinclude [hTi ]; G) + 2Lreceive
belief , then player i infers that the
message is B: m(fiinclude [hTi ]) = B.
3. [Case M ] If neither of the above two conditions is satis…ed, then player i infers the message
randomly: m(fiinclude [hTi ]) = G with probability 12 ; and m(fiinclude [hTi ]) = B with probability 12 .
See Figure 5 for the illustration.
Let us …x Lreceive
belief > 0.
Since the maximum likelihood estimator is consistent, there exists
37
Figure 5: How to infer m
send
> 0 such that, for su¢ ciently small Lreceive
belief > 0, for each
send
<
send
, we have
8
>
Li (fiinclude [hTi ]; G) Li (fiinclude [hTi ]; B) + 3Lreceive
>
belief
>
>
>
send
>
>
if fiinclude [hTi ] (ai ; yi ) = jA1i j qi yi j ai ; j (G) for each ai ; yi
>
>
>
>
>
>
>
(that is, if the frequency of player i’s history is close to the ex ante mean given
>
>
>
send
send
>
<
then the likelihood of j (G) is higher than that of j (B)),
>
>
Li (fiinclude [hTi ]; B) Li (fiinclude [hTi ]; G) + 3Lreceive
>
belief
>
>
send
>
>
>
if fiinclude [hTi ] (ai ; yi ) = jA1i j qi yi j ai ; j (B) for each ai ; yi
>
>
>
>
>
>
(that is, the frequency of player i’s history is close to the ex ante mean given
>
>
>
>
send
send
:
then the likelihood of j (B) is higher than that of j (G)).
Moreover, since the log-likelihood is strictly concave and
taking
send
=
send
j
j
(M ) =
1
2
send
j
(G) +
send
j
1
2
> 0 su¢ ciently small if necessary, for su¢ ciently small Lreceive
belief > 0, for each
if neither [Case G] nor [Case B] is the case, then
j (m)
send
send
j
(G) and
j (m)
Li (fiinclude [hTi ]; M )
=
send
j
j (m)
=
send
j
(G),
(B),
(34)
send
j
send
(B), re<
send
,
(M ) is more likely than both
(B): If neither [Case G] or [Case B] holds, then
max Li (fiinclude [hTi ]; G); Li (fiinclude [hTi ]; B) + 2Lreceive
belief :
We …x Lreceive
belief > 0 so that (34) and (35) hold.
38
(35)
Figure 6: A¢ ne hull of player j’s signal observation
Note that with this inference, Claim 2 of Lemma 6 holds. To see why, suppose m = G (the
explanation with m = B is the same and so is omitted). If m(fiinclude [hTi ]) = B 6= G = m, then
[Case B] or [Case M ] is the case. We consider these two cases in the sequel.
[Case B] implies that
send
=
j
prior. Since player j takes each of
j
2
send
(B) is more likely than
(G),
send
j
(B), and
send
j
j (m)
=
send
j
(G), except for the
(M ) with probability no less than
by Figure 4, given m = G and taking the prior into account, the law of large numbers ensures
that
j (m)
6=
send
j
Send] ensures that
j (m)
1
j (m)
6=
send
j
(G) with probability of order 1
j (m)
6=
send
j
(G) implies that
(G) implies that player j has
exp( Lreceive
belief jTj).
include T
[hj ])
j (m; fj
include T
[hj ])
j (m; fj
Since [Regular Mixture
= E, this high probability on
= E with probability no less than
exp( Lreceive
belief jTj). Taking "message > 0 su¢ ciently small, this is su¢ cient for (29).
If [Case M ] is the case, then (35) implies that at least one of
likely than
send
j
send
j
(B) and
send
j
(M ) is more
(G), except for the prior. The same proof as [Case B] establishes the result.
We now prove Claim 5 of Lemma 6. Recall that we have already …xed Lreceive
belief so that (34) and
(35) hold.
Note that we assume
include T
[hj ])
j (m; fj
= R, which implies [Regular Mixture Send],
[Regular Action Send], and [Regular Signal Send].
For a moment, suppose that
send
= 0 and "message = 0 for simplicity. Then, [Regular Mixture
Send] and [Regular Action Send] imply that player j takes
equality holds only with
send
j (m)
=
send
j
(m) = aj (m) (the last
= 0). Since she takes aj (m) for sure, we can see fjinclude [hTj ](Yj j aj (m))
as her entire history. Moreover, [Regular Signal Send] implies that this frequency is equal to the
a¢ ne hull a
fqj (aj (m); ai )gai 2Ai
(with "message = 0). See Figure 6.
If fjinclude [hTj ] is close to the ex ante distribution given
send
j
(m), then by the law of iterated
expectation, player j believes that player i’s history fiinclude [hTi ] is close to the ex ante distribu39
tion given
send
j
(m).
(34) ensures that player j believes that player i has L(m; fiinclude [hTi ])
L(m;
^ fiinclude [hTi ]) + 2Lreceive
^ 2 fG; Bg n fmg and so m(fiinclude [hTi ]) = m with a high probabelief with m
bility.
In particular, there exists "1 > 0 such that, if
fjinclude [hTj ](Yj j aj (m))
qj (aj (m);
mix
i )
"1 (distance from the ex ante mean is small),
(36)
then by the law of large numbers, player j believes that L(m; fiinclude [hTi ])
L(m;
^ fiinclude [hTi ]) +
jTj),
^ 2 fG; Bgnfmg (and so m(fiinclude [hTi ]) = m) with probability 1 exp( ""1 ;Lreceive
2Lreceive
belief with m
belief
where the coe¢ cient ""1 ;Lreceive
> 0 depends on "1 and Lreceive
belief . We …x such "1 > 0. In Figure 7,
belief
"1 > 0 determines the distance between point A and point D.
On the other hand, suppose (36) does not hold. This means that her signal frequency fjinclude [hTj ](Yj j
aj (m)) is not close to qj (aj (m);
mix
i ).
Since fjinclude [hTj ](Yj j aj (m)) is in a
fqj (aj (m); ai )gai 2Ai ,
this means that her signal frequency is skewed toward qj (aj (m); ai ) for some ai 2 Ai compared to
qj (aj (m);
probability
mix
i ).
Player j believes that such ai happens signi…cantly more often than the ex ante
mix
i (ai )
=
1
.
jAi j
Hence, given "1 > 0, there exists su¢ ciently small "2 > 0 such that, if (36) does not hold, then
there exists ai 2 Ai such that the conditional expectation of the frequency of ai is not close to
mix
i (ai )
=
1
:
jAi j
For some ai 2 Ai ,
h
E fiinclude [hTi ](ai ) j m;
i
1
jAi j
> "2 (the frequency of player i’s actions is irregular).
send
j
(m);
mix
T
i ; hj
(37)
Fix such "2 > 0. In Figure 7, if we take "2 > 0 su¢ ciently smaller than "1 > 0, then points B and
C will be included in the interval [A; D].
The log likelihood is continuous in perturbation
send
and frequency fiinclude [hTi ]. In addition,
the conditional expectation of fiinclude [hTi ] is continuous in fjinclude [hTj ]. Hence, there exist su¢ ciently
small
send
> 0 and "message > 0 such that the following claims hold: Given
40
include T
[hj ])
j (m; fj
= R,
Figure 7: Player j’s inference of player i’s history
1. If
fjinclude [hTj ](Yj j aj (m))
mix
i )
qj (aj (m);
"1 (distance from the ex ante mean is small),
then player j believes that m(fiinclude [hTi ]) = m with probability 1
exp(
1
" receive
2 "1 ;Lbelief
jTj). For
su¢ ciently small "message > 0 compared to ""1 ;Lreceive
> 0, we can say that player j believes
belief
that m(fiinclude [hTi ]) = m with probability 1
exp( "message jTj).
2. Otherwise, there exists ai 2 Ai such that
h
E fiinclude [hTi ](ai ) j m;
send
j
(m);
mix
T
i ; hj
i
1
1
> "2 :
jAi j
2
For su¢ ciently small "message > 0 compared to "2 > 0, by the law of large numbers, player j
believes that fiinclude [hTi ](ai )
E with probability 1
1
jAi j
> "message for some ai 2 Ai and so i (receive; fiinclude [hTi ]) =
exp( "message jTj).
In both cases, Claim 5 of Lemma 6 holds.
Finally, since each player takes a fully mixed strategy and Assumption 1 ensures that the distribution of the private signal pro…le has full support, we have Claims 6 and 7 of Lemma 6. In
the working paper, we also consider public monitoring, where both players observe the same signal
with probability one.
There, we make sure that the same claims hold, only using the fact that
players are taking private strategies.
41
11
Module for the Review Round
In this section, we consider the following jTj-period …nitely repeated game, in which T
N is the
(arbitrary) set of periods in this …nitely repeated game. In our equilibrium construction later, T
corresponds to the set of periods in the review round. In this section, we …x x(i) 2 fG; Bg2 and
x(j) 2 fG; Bg2 . As will be seen, x(i) is player i’s inference of state pro…le x 2 fG; Bg2 .
Recall that Section 6 de…nes (
i (x))i2I;x2fG;Bg2
for each . Given x(i) 2 fG; Bg2 , with
determined in Lemma 7, player i takes ai;t i.i.d. across periods according to
to be
i (x(i)).
As in Section 10, let hTi = fai;t ; yi;t gt2T be the history; fi hTi be the frequency; texclude
be the
i
period excluded from Tinclude
i
ft 2 T : t 6= texclude
g; and fiinclude [hTi ] be the frequency in Tinclude
.
i
i
As in the case with the binary message protocol, each player i creates a variable i (x(i); fiinclude [hTi ]) 2
fR; Eg based on fiinclude [hTi ], that is, i (x(i); ) :
(Ai Yi ) ! fR; Eg. Again, once i (x(i); fiinclude [hTi ]) =
E happens in a review round, player i uses the type-3 reward function for player j (makes player j indi¤erent between any action) in the subsequent review rounds. We make sure that i (x(i); fiinclude [hTi ]) =
include T
[hi ])
i (x(i); fi
E with a small probability and the distribution of
does not depend on player
j’s strategy (see Claim 2 of Lemma 7 with indices i and j reversed for the formal argument).
In addition, related to the type-1 reward in Section 9.2, each player j calculates
X
include T
[hj ])
i (x(j); fj
i[
(38)
(x(j))](aj;t ; yj;t )
t2Tinclude
j
=
X
i[
(x(j))](aj ; yj )
fjinclude [hTj ](aj ; yj );
(aj ;yj )2Aj Yj
using the reward function de…ned in Lemma 2. Except for the fact that it does not include the
reward in period texclude
and that it uses player j’s inference x(j) rather than true state pro…le x,
j
this function is the same as the type-1 realization of
T(l)
review
(xj ; hj ; l)
i
with T = T (l). Note that
this depends only on fjinclude [hTj ] and x(j) once we …x , since we have …xed ( i [ ](aj ; yj ))(aj ;yj )2Aj
in Lemma 2 for each
2
(A).
As in Section 9.2, player j will have
happens in review round l.
include T
[hi ])
i (x(i); fi
that
Yj
j (l
+ 1) = B only if
include T
[hj ])
i (x(j); fj
>
u
4L
jTj
We prove that if x(i) = x(j) (coordination on x goes well) and
= R (player i does not use the type-3 reward for player j), then player i believes
include T
[hj ])
i (x(j); fj
u
4L
jTj (and so
j (l+1)
42
= G) or
include T
[hj ])
j (x(j); fj
= E (and so player
j uses the type-3 reward function for player i) with a high probability (see Claim 3 of Lemma 7).
This high probability implies the following: As will be seen in Lemma 8, if the coordination
does not go well, then as a result of the message exchange about x, player j uses the type-3 reward
with a high probability.
include T
[hi ])
i (x(i); fi
Together with the argument in the previous paragraph, player i with
= R believes that
j (l
+ 1) = G or player j uses the type-3 reward with a high
probability. Hence, if we de…ne player i’s strategy such that she switches to
after
include T
[hi ])
i (x(i); fi
j (l
+ 1), whenever
j (l
+ 1)(i) = B, player i uses the type-3 reward
With indices i and j reversed, as mentioned in Section 9.2, player i can
condition that whenever
i (l
+ 1)(j) = B, player j uses the type-3 reward.
Finally, we prove that if x(i) = x(j),
u
4L
+ 1)(i) = B only
= E, then (26) holds.
Given such a transition of
function for player j.
j (l
jTj, then player i believes that
review
(x(i); fiinclude [hTi ])
i
include T
[hj ])
j (x(j); fj
include T
[hi ])
j (x(i); fi
= R, and
>
= E with a high probability (see Claim 4 of
Lemma 7). This high probability implies the following: As will be seen in Section 13.4.1, if xi = B
(player i wants to keep player j’s equilibrium payo¤ low),
not yet made player j indi¤erent), and
review
(x(i); fiinclude [hTi ])
i
include T
[hi ])
j (x(i); fi
>
u
4L
= R (player i has
jTj (self generation is an issue)
in a review round, then player i will minimax player j to keep her payo¤ low in all the subsequent
review rounds. In such a case, player i believes that player j uses the type-3 reward for player i
and any strategy (including the minimaxing one) is optimal.
In total, we can construct
Lemma 7 Given
(Ai
payo
i (x(i);
) so that the following lemma holds:
, u, and L …xed in Section 6.2, there exist
2 (0;
payo
),
i (x(i);
) :
Yi ) ! fR; Eg, and "review > 0 such that, for a su¢ ciently large jTj, the following four
properties hold: For each i 2 I,
1. Since texclude
is random, the frequency is the su¢ cient statistic to infer the other players’
j
variables: For each ^j 2 fR; Eg and ^ i 2 R, we have
Prjji
= Prjji
Here, Prjji
n
include T
[hj ])
j (x(j); fj
= ^j ;
include T
[hj ])
i (x(j); fj
o
= ^ i j x(j); hTi
n
o
include T
include T
^
[hj ]) = j ; i (x(j); fj
[hj ]) = ^ i j x(j); fi [hTi ] :
j (x(j); fj
j x(j); hTi is induced by the following assumptions: Player i’s action sequence
fai;t gt2T is given by hTi . On the other hand, player j takes aj;t i.i.d. across periods according to
43
j (x(j)).
Given fat gt2T , the signal pro…le yt is drawn from the conditional joint distribution
function q(yt j at ) for each t, and player i observes her signal fyi;t gt2T .
Finally, player j
draws texclude
randomly.
j
2. For each x(j) 2 fG; Bg2 and for each strategy of player i denoted by
(Ai ), the probability of
j x(j);
i)
=1
j
= R does not depend on x(j) or
2 for each x(j) 2 fG; Bg2 and
3. For each x(j) 2 fG; Bg2 and hTi 2 (Ai
R, then player i believes that either
i:
Prjji (
i
:
SjTj
s=0
1
(Ai
Yi ) s !
include T
[hj ])
j (x(j); fj
=R
i.
Yi )jTj , conditional on x(i) = x(j), if
include T
[hj ])
i (x(j); fj
is near zero or
review
(x(i); fiinclude [hTi ])
i
include T
[hj ])
j (x(j); fj
=
=
E with a high probability:
9
1
=
jTj
j x(j); fx(i) = x(j)g ; hTi A
;
include T
[hj ]) = E
_ j (x(j); fj
08
<
@
Prjji
:
u
4L
include T
[hj ])
i (x(j); fj
4. For each x(i) 2 fG; Bg2 and hTi 2 (Ai
R and
include T
[hi ])
j (x(i); fi
>
u
4L
1
Yi )jTj , conditional on x(i) = x(j), if
jTj, then player i believes that
exp( "review jTj):
review
(x(i); fiinclude [hTi ])
i
include T
[hj ])
j (x(j); fj
= E with
a high probability:
Prjji
include T
[hj ])
j (x(j); fj
= E j x(j); fx(i) = x(j)g ; hTi
1
exp( "review jTj):
Proof. See Appendix A.7.
Let us explain why Lemma 7 holds intuitively. First, we de…ne
include T
[hj ])
j (x(j); fj
as follows:
Let fjinclude [hTj ](aj ) be the frequency of player j’s actions; and let fjinclude [hTj ](Yj j aj ) be the conditional frequency of player j’s signals given aj , as in Section 10. Player j has
include T
[hj ])
j (x(j); fj
=R
only if (and if, except for a small adjustment in the formal proof) both of the following conditions
are satis…ed:
1. [Regular Action Review] We say that player j’s action frequency is regular if, for a small
"review > 0, we have fjinclude [hTj ](aj )
j (x(j))(aj )
"review for each aj 2 Aj .
2. [Regular Signal Review] We say that player j’s signal frequency is regular if, for a small
44
=
"review > 0, we have
d fjinclude [hTj ](Yj j aj (x(j))); a
fqj (aj (x(j)); ai )gai 2Ai
"review :
Given this de…nition, Claim 1 holds since texclude
is random and other variables depend only on
j
fjinclude [hTj ].
In addition, Claim 2 holds for the following reasons.
[Regular Action Review] is solely de-
termined by player j’s mixture, which player i cannot a¤ect.
the probability of [Regular Signal Review] since a
Moreover, player i cannot a¤ect
fqj (aj (x(j)); ai )gai 2Ai
is the a¢ ne hull with
respect to player i’s actions. (The same caution as (31) and (32) is applicable here.)
By the law of large numbers, [Regular Action Review] and [Regular Signal Review] happen with
a high probability. Hence,
Given this de…nition of
include T
[hj ])
j (x(j); fj
include T
[hi ])
i (x(i); fi
= R with a high probability.
include T
[hj ]),
j (x(j); fj
and
we prove Claim 3 of Lemma
7. Since we condition x(j) = x(i), let x(j) = x(i) = x. As in Claim 5 of Lemma 6, with i and j
reversed, aj (m) replaced with ai (x),
send
replaced with ,
replaced with "review , we have the following: Given
send
j
(m) replaced with
include T
[hi ])
i (x; fi
i (x),
and "message
= R, there are following two
cases:
1. fiinclude [hTi ](Yi j ai (x)) is close to qi (ai (x);
j (x)).
This means that fiinclude [hTi ](Yi j ai (x))
is close to the ex ante mean given (ai (x);
j (x)).
For a small perturbation , this means
that fiinclude [hTi ] is close to the ex ante mean given by (
i (x);
j (x)).
By the law of iterated
expectation, the conditional expectation of fjinclude [hTj ] is close to the ex ante mean given
(
i (x);
j (x)).
By the law of large numbers, player i believes that
to the ex ante mean given (
i (x);
i (x);
j (x))
include T
[hj ])
i (x; fj
is close
j (x)).
By Claim 1 of Lemma 2, the ex ante mean of
given (
include T
[hj ])
i (x; fj
include T
[hj ])
i (x; fj
=
P
t2Tinclude
j
i[
(x)](aj;t ; yj;t )
is zero. Therefore, for su¢ ciently small "review > 0, player i believes that
u
4L
jTj with probability 1
exp( "review jTj).
2. Otherwise, fiinclude [hTi ](Yi j ai (x)) is skewed toward qi (ai (x); aj ) for some aj 2 Aj compared
to qi (ai (x);
and
j (x)
j (x)).
Again, for su¢ ciently small , since player i takes ai (x) often and ai (x)
are close to each other, this implies that player i believes that fjinclude [hTj ](aj ) is
45
not close to
j (x(j))(aj )
for some aj 2 Aj (and so
include T
[hj ])
j (x; fj
= E) with probability
exp( "review jTj) for a su¢ ciently small "review .
1
In both cases, Claim 3 of Lemma 7 holds.
Similarly, we prove Claim 4 of Lemma 7. Since we assume x(j) = x(i) and i (x; fiinclude [hTi ]) = R,
again, there are following two cases: The …rst is that fiinclude [hTi ] is close to the ex ante mean given
by (
i (x);
zero,
j (x))
include T
[hi ])
j (x; fi
include T
[hj ])
j (x; fj
12
with x = x(j) = x(i).
>
u
4L
is
jTj is not the case. The remaining case is that player i believes that
= E.
Road Map to Prove (5)–(8) given the Modules
Again, recall that we have …xed
each ,
S,
include T
[hi ])
j (x; fi
Then, since the ex ante mean of
S(t)
i ,
min;
i
j,
for each ,
qG , qB ,
c:i:
i ,
i[
],
xj
i ,
x
u > 0, ui j
i2I;xj 2fG;Bg
, (a(x);
> 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg , L 2 N,
payo
;
i (x);
i
(x))x2fG;Bg2 for
> 0, ai (G), ai (B),
mix
i ,
and "strict in Section 6. Then, we …x the structure of the …nitely repeated
game in Section 7. The length of the …nitely repeated game is TP (T ) with T being a parameter.
send
Given these variables, we …x
1, m :
Yi ) ! fG; Bg,
(Ai
j
send
(G),
i (receive;
j
):
fR; Eg so that Lemma 6 holds; and we …x
send
(B),
send
(M ),
> 0, "message > 0, Kmessage <
Yi ) ! fR; Eg, and
(Ai
2 (0;
j
payo
),
i (x(i);
):
j (m;
):
(Aj
Yj ) !
Yi ) ! fR; Eg, and
(Ai
"review > 0 such that Lemma 7 holds.
Given these variables/functions, in Sections 13-16, we de…ne the strategy and reward, and then
verify (5)–(8).
Speci…cally, we de…ne player i’s strategy in the coordination and main blocks for each T in
report
jh
i
i
Section 13. The de…nition of the strategy
L
in the report block will be postponed until
Section 15.
In Section 14, we de…ne player i’s reward function
TP +1
)
i (xj ; hj
is the summation of
i (xj ),
xj
i (aj;t ; yj;t ),
TP +1
)
i (xj ; hj
review
,
i
for each T . As will be seen,
adjust
,
i
…rst three elements in Section 14 and will postpone the de…nition of
and
adjust
i
report
.
i
and
We de…ne the
report
i
until Section
15.
In Section 15, Lemma 13 de…nes
de…nition of the strategy
i (xi )
report
jh
i
i
L
,
and the reward
adjust
,
i
and
TP +1
)
i (xj ; hj
46
report
i
for each T , which completes the
for each T .
Finally, in Section 16, we prove that, for a su¢ ciently large T , the de…ned strategy and reward
satisfy (5)–(8).
The technical proofs for the claims in Sections 13-16 are further relegated to Appendix A (online
appendix), and Appendix B contains the list of the frequetly used notation.
13
Strategy in the Coordination and Main Blocks
Sections 13.1 and 13.3 de…ne the strategy in the coordination blocks for xj and xi , respectively. In
particular, player i creates xj (i) and xi (i). Section 13.2 derives properties about xj (i).
Given x(i) = (xi (i); xj (i)), in Section 13.4, we de…ne player i’s strategy in main block l. As will
be seen, her strategy in main block l depends on two variables
i (l)
2 fG; Bg and
j (l)(i)
2 fG; Bg.
Sections 13.4.1, 13.4.2, and 13.4.3 de…ne her strategy in review round l, the supplemental round for
j (l
+ 1), and that for
and
j (l)(i)
13.1
+ 1), given
i (l
i (l)
and
Finally, we de…ne the transition of
i (l)(j).
i (l)
in Section 13.4.4.
Strategy in the Coordination Block for xj
In order to coordinate on xj , player j sends xj 2 fG; Bg in rounds (xj ; 1) and (xj ; 2). Given these
two rounds, player i creates her inference xj (i) 2 fG; Bg. Given xj (i), player i sends xj (i) in round
for (xj ; 3).
1
In particular, in round (xj ; 1), player j sends xj 2 fG; Bg spending T 2 periods as explained
1
in Section 10 (with m replaced with xj , and T = T(xj ; 1) with jTj = T 2 ).
structs
include T(xj ;1)
[hj
])
j (xj ; fj
include T(xj ;1)
[hi
])
i (receive; fi
T(xj ;1)
2 fR; Eg, and player i constructs xj (fiinclude [hi
Player j con]) 2 fG; Bg and
2 fR; Eg. In addition, for each n 2 fi; jg, let texclude
(xj ; 1) be the pen
riod such that player n does not use her history in period texclude
(xj ; 1) when she determines the
n
continuation play.
2
Then, in round (xj ; 2), player j re-sends xj spending T 3 periods as explained in Section 10 (with
2
m replaced with xj , and T = T(xj ; 2) with jTj = T 3 ). Player j constructs
T(xj ;2)
fR; Eg, and player i constructs xj (fiinclude [hi
]) 2 fG; Bg and
include T(xj ;2)
[hj
])
j (xj ; fj
include T(xj ;2)
[hi
])
i (receive; fi
2
2
fR; Eg. For each n 2 fi; jg, texclude
(xj ; 2) is randomly selected.
n
T(xj ;1)
Given xj (fiinclude [hi
T(xj ;2)
]) 2 fG; Bg and xj (fiinclude [hi
47
]) 2 fG; Bg, player i creates xj (i) 2
Figure 8: How to create xj (i)
fG; Bg as follows:
T(xj ;2)
, player i uses xj (fiinclude [hi
1. [Round (xj ; 2) xj (i)] With probability 1
T(xj ;2)
to infer xj : xj (i) = xj (fiinclude [hi
]);
T(xj ;1)
2. [Round (xj ; 1) xj (i)] With probability , player i uses xj (fiinclude [hi
T(xj ;1)
infer xj : xj (i) = xj (fiinclude [hi
]) in round (xj ; 2)
]) in round (xj ; 1) to
]).
We summarize player i’s inference in Figure 8.
1
Finally, in round (xj ; 3), player i sends xj (i) 2 fG; Bg spending T 2 periods as explained
in Section 10 (with indices j and i reversed, m replaced with xj (i), and T = T(xj ; 3) with
1
jTj = T 2 ).
include T(xj ;3)
[hi
])
i (xj (i); fi
Player i constructs
T(xj ;3)
xj (i)(fjinclude [hj
]) 2 fG; Bg and
2 fR; Eg, and player j constructs
include T(xj ;3)
[hj
])
j (receive; fj
2 fR; Eg.
For each n 2 fi; jg,
texclude
(xj ; 3) is randomly selected.
n
Given
include T(xj ;2)
[hj
])
j (xj ; fj
T(xj ;3)
2 fR; Eg and xj (i)(fjinclude [hj
]) 2 fG; Bg, player j creates
xj (j) 2 fG; Bg as follows:
1. [Adhere xj (j)] If player j has
include T(xj ;2)
[hj
])
j (xj ; fj
= R in round (xj ; 2), then player j mixes
the following cases:
(a) [Round (xj ; 2) xj (j)] With probability 1
, player j adheres to her own state: xj (j) = xj ;
(b) [Round (xj ; 3) xj (j)] With probability , player j listens to player i’s message in round
T(xj ;3)
(xj ; 3): xj (j) = xj (i)(fjinclude [hj
2. [Not Adhere xj (j)] If player j has
]).
include T(xj ;2)
[hj
])
j (xj ; fj
= E in round (xj ; 2), then player j
T(xj ;3)
always listens to player i’s message in round (xj ; 3): xj (j) = xj (i)(fjinclude [hj
We summarize player j’s inference in Figure 9.
48
]).
Figure 9: How to create xj (j)
13.2
Properties of xj (i) and xj (j)
It will be useful to derive the following properties about the inferences.
Lemma 8 Given the above construction of xj (i) and xj (j), we have the following: For a su¢ ciently
large T , for each i; j 2 I and xj ; xj (i); xj (j) 2 fG; Bg,
T(xj ;1)
1. For each hi
xj (j)],
T(xj ;2)
, hi
T(xj ;3)
, and hi
include T(xj ;1)
[hj
])
j (xj ; fj
, if xj (i) 6= xj (j), then player i believes that [Round (xj ; 3)
= E, or
08
>
>
[Round (xj ; 3) xj (j)]
<
B>
B
j ;1)
Pr B _ j (xj ; fjinclude [hT(x
]) = E
j
>
@>
>
: _ (x ; f include [hT(xj ;2) ]) = E
j
j
j
j
T(xj ;1)
2. For each hj
(xj ; 1) xj (i)] or
T(xj ;2)
, hj
9
>
>
>
=
>
>
>
;
T(xj ;3)
, and hj
include T(xj ;2)
[hj
])
j (xj ; fj
j
= E with a high probability:
1
C
T(x ;1)
T(x ;2)
T(x ;3) C
xj ; xj (j); hi j ; hi j ; hi j C
A
1
1 exp( 2T 3 ):
(39)
, if xj (j) 6= xj (i), then player j believes that [Round
include T(xj ;2)
[hi
])
i (receive; fi
08
>
>
[Round (xj ; 1) xj (i)]
B>
<
B
j ;2)
Pr B _ i (receive; fiinclude [hT(x
]) = E
i
@>
>
>
T(x
;3)
: _ (x (i); f include [h j ]) = E
i j
i
i
= E with a high probability:
9
>
>
>
=
>
>
>
;
T(xj ;1)
j xj ; xj (i); hj
Proof. See Appendix A.8.
49
T(xj ;2)
; hj
1
C
C
A
T(xj ;3) C
; hj
1
1 exp( 2T 3 ):
(40)
include T(xj ;1)
[hj
])
j (xj ; fj
As will be seen, if [Round (xj ; 3) xj (j)],
= E, or
include T(xj ;2)
[hj
])
j (xj ; fj
=
E is the case, then player j uses the type-3 reward (makes any strategy of player i optimal). Hence,
(39) ensures that, whenever player i realizes that player i’s inference is miscoordinated with player
j’s inference, any strategy is optimal for player i with a high probability, as required by (26).
Similarly, (40) ensures that, whenever player j realizes that player j’s inference is miscoordinated
with player i’s inference, any strategy is optimal for player j with a high probability.
We explain why (39) holds. First, Figure 9 implies that if xj (j) 6= xj , then we have [Round
include T(xj ;2)
[hj
])
j (xj ; fj
(xj ; 3) xj (j)] or
= E. Hence, we concentrate on xj (j) = xj .
T(xj ;1)
Look at Figure 8 for how player i infers xj (i). Suppose player i has xj (i) = xj (fiinclude [hi
1
]) 6=
xj (j) = xj . Since the round (xj ; 1) lasts for T 2 periods, Claim 2 in Lemma 6 with m = xj implies
Pr
n
T(xj ;1)
include
[hj
j (xj ; fj
]) = E
o
T(xj ;1)
j xj ; hi
1
1
"message T 2 :
exp
Given xj , by Figure 9, player j’s strategy in rounds (xj ; 2) and (xj ; 3) and her inference xj (j) are
independent of her history in (xj ; 1). Hence, this inequality is su¢ cient for (39).
T(xj ;2)
Suppose next player i has xj (i) = xj (fiinclude [hi
2
]) 6= xj (j) = xj .
Since the round (xj ; 2)
lasts for T 3 periods, Claim 2 in Lemma 6 with m = xj implies
n
Pr
include T(xj ;2)
[hj
])
j (xj ; fj
=E
o
T(xj ;2)
j xj ; hi
1
exp
2
(41)
"message T 3 :
Since player j uses her history in the round (xj ; 2) to determine xj (j) by Figure 9, this inequality
does not immediately imply (39). Nonetheless, we can still prove (39) as follows.
Observe that by Figure 9, with probability at least , regardless of player j’s history in (xj ; 2),
T(xj ;3)
player j uses the inference in (xj ; 3): xj (j) = xj (i)(fjinclude [hj
]).
Since the length of round
1
2
(xj ; 3) is T , Claim 6 of Lemma 6 (with i and j reversed and m = xj (i)) implies that player i with
T(xj ;3)
hi
T(xj ;3)
believes that any xj (i)(fjinclude [hj
1
]) happens with at least probability exp( Kmessage T 2 ).
In total, player i believes that any xj (j) is possible with at least probability
T(xj ;2)
given (hi
T(xj ;3)
; hi
Hence, the belief update about
).
1
xj (j) is of order exp( T 2 ).
50
include T(xj ;2)
[hj
])
j (xj ; fj
1
exp( Kmessage T 2 )
T(xj ;3)
from hi
and
2
1
Since T 3
T 2 (here is where we use the assumption that round (xj ; 2) is much longer than
round (xj ; 3)), (41) implies that
n
Pr
include T(xj ;2)
[hj
])
j (xj ; fj
=E
o
T(xj ;2)
j xj ; xj (j); hi
T(xj ;3)
(42)
; hi
2
is of order exp( T 3 ). Conditional on xj and xj (j), player j’s strategy in round (xj ; 1) is independent
of her history in (xj ; 2) and (xj ; 3). Hence, (42) is su¢ cient for (39).
We second prove (40). Look at Figure 9 for how player j infers xj (j). Suppose player j has
T(xj ;3)
xj (j) = xj (i)(fjinclude [hj
1
]). Then, since round (xj ; 3) lasts for T 2 periods, Claim 2 of Lemma 6
(with i and j reversed and m = xj (i)) implies
Pr
n
T(xj ;3)
include
[hi
i (xj (i); fi
]) = E
o
T(xj ;3)
j xj (i); hj
1
1
"message T 2 :
exp
Given xj (i), player i’s strategy in rounds (xj ; 1) and (xj ; 2) is independent of her history in (xj ; 3).
Hence, this inequality is su¢ cient for (40).
Suppose next that player j adheres to xj . In this case, by Figure 9, player j has
include T(xj ;2)
[hj
])
j (xj ; fj
R. Hence, by Claim 5 of Lemma 6, we have
08
<
Pr @
:
9
1
=
=E
T(xj ;2) A
j
x
;
h
j
j
T(x ;2)
;
_xj (fiinclude [hi j ]) = xj
include T(xj ;2)
[hi
])
i (receive; fi
1
2
"message T 3 :
exp
T(xj ;1)
To see why this is su¢ cient for (40), we deal with player j’s learning from xj (i), hj
T(x ;3)
hj j
(43)
, and
as follows. By Figure 8, with probability at least , player i uses the inference in round
T(xj ;1)
(xj ; 1): xj (i) = xj (fiinclude [hi
1
]). Since the length of round (xj ; 1) is T 2 , Claim 6 of Lemma 6
T(xj ;1)
(with m = xj ) implies player j with hj
T(xj ;1)
believes that any inference xj (fiinclude [hi
]) happens
1
2
with probability at least exp( Kmessage T ). In total, player j believes that any xj (i) is possible
with probability at least
Since T
2
3
T
1
2
T(xj ;1)
1
exp( Kmessage T 2 ) given (hj
T(xj ;2)
; hj
).
(here is where we use the assumption that round (xj ; 2) is much longer than
round (xj ; 1)), (43) implies
08
<
Pr @
:
9
1
=
=E
T(x ;1)
T(x ;2)
j xj ; xj (i); hj j ; hj j A
;
include T(xj ;2)
_xj (fi
[hi
]) = xj
include T(xj ;2)
[hi
])
i (receive; fi
51
=
2
is still of order 1
exp( T 3 ). Moreover, given xj (i), player i’s strategy in round (xj ; 3) is inde-
pendent of her history in (xj ; 1) and (xj ; 2). Hence, (43) is su¢ cient for (40).
Strategy in the Coordination Block for xi
13.3
The strategy in the coordination block for xi is de…ned in the same way as Section 13.1, with indices
i and j reversed. Player i creates
T(x ;3)
xi (j)(fiinclude [hi i ])
2 fG; Bg,
include T(xi ;1)
[hi
])
i (xi ; fi
include T(xi ;3)
[hi
])
i (receive; xi ; fi
T(xi ;1)
the other hand, player j creates xi (fjinclude [hj
T(xi ;2)
xi (fjinclude [hj
T(xi ;3)
fjinclude [hj
13.4
]) 2 fG; Bg,
2 fR; Eg,
]) 2 fG; Bg,
include T(xi ;2)
[hj
])
j (receive; fj
include T(xi ;2)
[hi
])
i (xi ; fi
2 fR; Eg,
2 fR; Eg, and xi (i) 2 fG; Bg. On
include T(xi ;1)
[hj
])
j (receive; fj
2 fR; Eg, xi (j) 2 fG; Bg, and
2 fR; Eg,
j (xi (j);
]) 2 fR; Eg.
Strategy in the Main Block
Given the coordination block, each player i has the inference of the state pro…le x(i) = (xi (i); xj (i)) 2
fG; Bg2 . Player i with x(i) takes an action in main block l given
fG; Bg. We …rst de…ne the strategy given
13.4.1
Review Round l given
i (l)
i (l)
and
i (l)
2 fG; Bg and
j (l)(i)
2
j (l)(i):
j (l)(i)
and
In review round l, player i takes actions, depending on (i) her inference of the state pro…le
x(i) 2 fG; Bg2 , (ii) the summary statistic of the realization of player j’s reward (calculated by player
i)
i (l)
2 fG; Bg, and (iii) her inference of player j’s statistic
As will be seen, both
i (l)
and
j (l)(i)
j (l),
denoted by
j (l)(i)
are functions of the frequency of player i’s history at the be-
ginning of review round l, fiinclude [h<l
i ]. Hence, precisely speaking, we should write
and
include <l
[hi ]).
j (l)(i)(fi
Player i picks
i (l)
2 fG; Bg.
include <l
[hi ])
i (l)(fi
For notational convenience, we omit fiinclude [h<l
i ].
2
(Ai ) based on
i (l)
follows (see Figure 10 for illustration). Given
and
i (l),
j (l)(i)
at the beginning of review round l as
player i takes ai;t according to
periods for T periods in review round l. (That is, the mixture to decide
once at the beginning of review round l. Given
i (l),
i (l)
player i draws ai;t from
i (l)
i.i.d. across
is conducted only
i (l)
every period in
the round.)
1. If player i has
optimal. (If
j (l)(i)
j (l)
= G, then player i believes that with a high probability, any action is
= G (coordination goes well), then the reward is not type-2. Whether the
52
reward is type-1 or type-3, both rewards make any strategy of player i optimal by Lemma 2.)
In this case, she picks the strategy to control player j’s payo¤s.
(a) If
= G, then player j’s reward function is not type-2. In this case, player i takes
i (l)
i (l)
(b) If
=
i (l)
i (x(i))
with probability 1
, and
i (l)
=
;
i
(x(i)) with probability .
= B, then player j’s reward function is type-2.
with probability , and
i (x(i))
i (l)
;
=
i
That is, with a high probability, player i takes
Then, player i takes
(x(i)) with probability 1
i (x(i))
if
i (l)
= G and
;
i
i (l)
=
.
(x(i)) if
i (l)
= B.
As will be seen in Section 16, this strategy makes sure that player j’s equilibrium payo¤ is
vj (xi ) at the same time of satisfying self generation.
In addition, we make sure that the support of
given
j (l)
about
j (l)
is the same regardless of
= G. With indices i and j reversed, player i cannot learn
j (l)(i)
given
i (l)
i (l)(j)
j (l)
i (l)
2 fG; Bg
by observing
= G. This will be important when we calculate player i’s belief update
in Lemma 9.
2. If player i has
player j takes
j (l)(i)
;
j
= B, then player i believes that her reward function is type-2 and
(x(i)) if x(i) = x(j). (She believes that, if x(i) 6= x(j), then player j uses
the type-3 reward and any strategy is optimal with a high probability by Lemma 8.) Hence,
player i takes the static best response to
;
j
(x(i)):
i (l)
= BRi (
;
j
(x(i))).
For notational convenience, let actioni (l) = R denote the event that player i takes
i (l)
If
= G and takes
j (l)(i)
;
i
(x(i)) if
if
= B; and let actioni (l) = E denote the complementary event.
= G, then actioni (l) = R denotes the event that player i takes the strategy which is
taken with a high probability 1
reversed), if
T(l)
Let hi
i has
i (l)
i (x(i))
j (l)(i)
j (l)(i)
. As will be seen in Claim 4 of Lemma 11 (with indices i and j
= G and actioni (l) = E, then player i will use the type-3 reward for player j.
T(l)
(ai;t ; yi;t )t in review round l be player i’s history in review round l. Given hi
=
i (l)
= G and takes
T(l)
review
(x(i); fiinclude [hi ])
i
i (l)
=
i (x(i)),
, if player
then she creates the random variables
2 fR; Eg as in Section 11 with T = T(l) (T periods in review round l).
Let texclude
(l) be the period randomly excluded by player i.
i
53
Figure 10: Determinaction of
i (l)
On the other hand, player i also calculates player j’s type-1 reward
X
include T(l)
[hi ])
j (x(i); fi
j[
(44)
(x(i))](ai;t ; yi;t ):
t2T(l);t6=texclude
(l)
i
The de…nitions are the same as (38) in Section 11 with indices i and j reversed and T = T(l).15
Given
is
i
i
include T(l)
[hi ]),
j (x(i); fi
(l) and
(1) = G. Given
1. [Case
i
i
player i de…nes
(l), player i de…nes
G] If player i has
i (l)
i (l
i
(l + 1) as follows: The initial condition
+ 1) as follows (See Figure 11 for illustration):
= G, then player i has
i (l
+ 1) = B at the end of review
round l if and only if player j’s score in review round l is excessive:
(a) [Case
i
Not Excessive] If player i has
include T(l)
[hi ])
j (x(i); fi
u
T,
4L
then
i (l + 1)
=G
at the end of review round l.
(b) [Case
i
Excessive] If player i has
include T(l)
[hi ])
j (x(i); fi
>
u
T,
4L
then
i (l
+ 1) = B at
the end of review round l.
2. [Case
i
B] If player i has
round l. That is,
i (l)
i (l)
= B, then player i has
Note that the function
+ 1) = B at the end of review
= B is absorbing.
With indices i and j reversed, player j also creates
15
i (l
include T(l)
[hi ])
j (x(i); fi
j (l
+ 1) 2 fG; Bg.
is well de…ned even if player i takes
54
i (l)
6=
i (x(i)).
Figure 11: Transition of
13.4.2
The Supplemental Round for
Player j sends
j
j
(l + 1)
(l + 1) 2 fG; Bg in the supplemental round for
ods as explained in Section 10 (with m replaced with
1
2
jTj = T ).
j
i (l)
Player j creates
T(
(l + 1) (fiinclude [hi
j (l+1))
j ( j (l
+
T(
1); fjinclude [hj
j
(l + 1), and T = T(
j (l+1))
j
(l + 1)) with
]) 2 fR; Eg and player i creates
include T(
[hi
i (receive; fi
]) 2 fG; Bg and
1
+ 1) spending T 2 peri-
j (l
j (l+1))
]) 2 fR; Eg.
For each
n 2 fi; jg, texclude
( j (l + 1)) is randomly excluded.
n
13.4.3
The Supplemental Round for
i
(l + 1)
The strategy is the same as Section 13.4.2 with indices i and j reversed. Player i creates
T(
1); fiinclude [hi
i (l+1))
]) 2 fR; Eg and player j creates
include T(
[hj
j (receive; fj
i (l+1))
that we have de…ned the transition of
i (l).
The initial condition is
player i determines
1. [Case
j (i)
j (l
i (l+1))
]) 2 fG; Bg and
i (l)
and
j (1)(i)
j
j (l)(i)
and
i (l)
for all l 2 f1; :::; Lg and
Hence, to complete the de…nition of the strategy in
the main block, we are left to de…ne the transition of
Transitions of
T(
(l + 1) (fjinclude [hj
+
( i (l +1)) is randomly excluded.
]) 2 fR; Eg. For each n 2 fi; jg, texclude
n
Now that we have de…ned player i’s strategy given
13.4.4
i
i ( i (l
j (l)(i).
(l) (i)
= G. Inductively, for l 2 f1; :::; L
1g, given
j (l)(i)
2 fG; Bg,
+ 1)(i) 2 fG; Bg as follows (see Figure 12 for illustration):
G] If player i has
j (l)(i)
= G, then player i has
j (l + 1)(i)
= B in the next block
l + 1 if and only if player i decides to “listen to”player j’s message in the supplemental round
for
j (l
+ 1) and “infers”that player j’s message is B. Speci…cally,
55
(a) [Regular
^l
j (i)]
^ =
If player i has
i (l)
i (x(i))
l)
include T(^
[hi ])
i (x(i); fi
and
= R for each
^ = G, then player i mixes the following two cases:
l with
i (l)
i. [Case
j (i)
Ignore] With probability 1
Player i has
about
j (l
j (l
, player i “ignores” player j’s message:
+ 1)(i) = G, regardless of player i’s inference of player j’s message
+ 1), denoted by
j
T(
(l + 1) (fiinclude [hi
j (l+1))
]).
As will be seen in Lemma 9, if player i ignores player j’s message, then player i
believes that
j (l +1)
= G or player j uses the type-3 reward with a high probability.
In other words, (26) holds when player i ignores the message.
To see why this is true with this transition, assume here that player j has
i (l)(j)
= G, and
j (l)
not yet switched to
=
j (l)
= G,
The …rst condition assumes that player j had
j (x(j)).
= B by the beginning of review round l. In addition, as
will be seen in Lemma 11, if
j (l)
= G but either
i (l)(j)
6= B or
j (l)
then player j uses the type-3 reward (and so (26) holds trivially).
conditions, if and only if
j (l)
include T(l)
[hj ])
i (x(j); fj
>
u
,
4L
player j has
6=
j (x(j)),
Given these
j (l + 1)
= B by
Figure 11 (with indices i and j reversed).
^ = G for each ^l
Imagine …rst that player i has
j (i)]
implies
include T(l)
[hi ])
i (x(i); fi
include T(l)
[hi ])
i (x(i); fi
u
4L
i (l)
l.
Then, [Regular
By Claim 3 of Lemma 7, player i with
= R.
= R believes that, if x(i) = x(j), then
include T(l)
[hj ])
i (x(j); fj
or player j uses the type-3 reward. Moreover, if x(i) 6= x(j), then player i be-
lieves that player j uses the type-3 reward by Lemma 8. In total, player i believes
that
j (l
+ 1) = G or player j uses the type-3 reward.
Imagine second that player i switched
This means
l)
include T(^
[hi ])
j (x(i); fi
that player i with
u
4L
l)
include T(^
[hi ])
i (x(i); fi
>
^ = G to
i (l)
^+ 1) = B for some ^l
i (l
l
1.
by Figure 11. Claim 3 of Lemma 7 ensures
= R believes that player j uses the type-3
reward if x(i) = x(j). Again, if x(i) 6= x(j), then player i believes that player j uses
the type-3 reward.
In both cases, player i believes that
j (l + 1)
= G or player j uses the type-3 reward.
Note that the above argument establishes player i believing “ j (l + 1) = G or player
j using the type-3 reward” at the end of review round l (or in the second case, at
the end of review round ^l). Recall that (26) holds conditional on player i’s history
56
Figure 12: Transition of
j (l)(i)
at the end of review round l + 1. Hence, we still have to deal with player i’s learning
about
j (l + 1)
from her history between the end of review round l (or that of review
round ^l) and the end of review round l + 1. See the explanation after Lemma 9 for
the details.
ii. [Case
Listen] With probability , player i “listens to” player j’s message:
j (i)
Player i has
j (l
+ 1)(i) =
j
T(
(l + 1) (fiinclude [hi
j (l+1))
]).
Claim 2 of Lemma 6 ensures that player i believes that either
j (l +1)(i)
=
j
(l + 1)
or player j uses the type-3 reward.
(b) [Irregular
j (i)]
j (l
j
+ 1)(i) =
Otherwise, player i always listens to the message:
T(
(l + 1) (fiinclude [hi
Note that, after each history (unless
j (l+1))
j (l)(i)
Player i has
]).
= B), player i listens to player j’s message
with probability at least . See footnote 17 for why this lower bound of the probability is
important.
2. [Case
j (i)
B] If player i has
block l + 1. That is,
j (l)(i)
j (l)(i)
= B, then player i has
= B is absorbing.
57
j (l
+ 1)(i) = B in the next
13.4.5
Property of
j
(l) (i)
Now that we have …nished de…ning player i’s strategy in the coordination and main blocks, it will
be useful to summarize the property of the inference
j (l)(i).
Suppose that x(i) = x(j) (otherwise,
Lemma 8 ensures that player i believes that player j uses the type-3 reward function) and that
player j has
i (l)(j)
= G (otherwise, as mentioned in the explanation of Claim 4 of Lemma 7,
player j uses the type-3 reward).
The following lemma ensures that, if x(i) = x(j), player j has
has
j (l)(i)
i (l)(j)
= G, and player i
= G, then player i believes that, with a high probability, one of the following four
events happens:
j (l)
= G;
include T(
[hj
j ( j (l); fj
j (l))
supplemental round; actionj (^l) = E for some ^l
some review round ^l
l
]) = E happens when player j sends
l
1; or
l)
include T(^
[hj ])
j (x(j); fj
j (l)
in the
= E happens for
1.16 As will be seen in Lemma 11, the cases except for
j (l)
= G imply
that player j uses the type-3 reward. Hence, the following lemma is the basis for (26):
Lemma 9 For a su¢ ciently large T , for each i 2 I, xj 2 fG; Bg, and each hi l , if player i has
j (l)(i)
= G, then we have
0 8
include T( j (l))
<
[hj
]) = R
j (l) = B ^ j ( j (l); fj
B
^
B :
T(l)
Pr B
^actionj (^l) = j (x(j); fjinclude [hj ]) = R for each ^l
@
j xj ; fx(j) = x(i)g ; f i (l)(j) = Gg ; hi l
l
9 1
=
C
C
;
C
1
A
1
exp( T 3 ):
(45)
Proof. See Appendix A.9.
Let us give an intuitive explanation. Recall that, given
R with ^l
n
l
^
^
^
)
i (l)(j) = G ^ actionj (l) = R with l
n
o
^
)
j (l) = j (x(j)) :
l
See Figure 12 for how player i updates
16
l)
include T(^
[hj ])
j (x(j); fj
j (x(j)),
= G, if player j has actionj (^l) =
l, then we have
^
^
i (l)(j) = G ^ actionj (l) = R with l
n
i (l)(j)
then neither
o
o
j (l)(i).
since
= R nor
If player i listens to
^
j (l) =
l)
include T(^
[hj ])
j (x(j); fj
58
= B is absorbing by Figure 12
(46)
is not de…ned if player j does not take
l)
include T(^
[hj ])
j (x(j); fj
i (l)(j)
j (x(j)).
j (l)
in the supplemental
We de…ne that if
= E is the case.
^ 6=
j (l)
round for
j (l),
then by Claim 2 of Lemma 6, if
n
Pr
j(
include T(
[hj
j (l); fj
j (l))
]) = E
o
j (l)(i)
j
6=
T(
j (l); hi
then
j (l),
j (l))
1
1
T(
exp( "message T 2 ):
j (l))
Since player j’s continuation strategy does not depend on hj
given
j (l),
this inequality is
su¢ cient for (45) (for su¢ ciently large T ).
Hence, we focus on the case where player i does not listen to
j (l),
0 8
<
j (l) = B
B
B :
T(^
l)
Pr B
^actionj (^l) = j (x(j); fjinclude [hj ]) = R for each ^l
@
j xj ; fx(j) = x(i)g ; f i (l)(j) = Gg ; hi l
Since player i does not listen to
and
l)
include T(^
[hi ])
i (x(i); fi
j (l),
for each ^l
^
=
9 1
=
C
C
;
C
1
A
l
1
exp( T 3 ):
^ = G, we have
1 with
i (l)
^ =
i (l)
i (x(i))
= R.
First consider the case where player i has
i (l)(j)
l
and will prove
^ = G for each ^l
^ = G. Unless actionj (^l) = E, player j takes
j (l)
we have just veri…ed that player i with
l 1. For each ^l
i (l)
^ = G has
i (l)
^ =
i (l)
^ =
j (l)
i (x(i))
j (x(j)).
and
l 1, suppose
On the other hand,
l)
include T(^
[hi ])
i (x(i); fi
=R
in the case we are focusing on. Hence, unless actionj (^l) = E, we have
0
B
B
Pr B
@
0
B
B
= Pr B
@
1
Since
8
<
9
jTj =
: _ (x(j); f include [hT(^l) ]) = E ;
j
j
j
n
o n
o
T(^
l)
j xj ; fx(j) = x(i)g ; i (^l)(j) = j (^l) = G ; actionj (^l) = R ; fi [hi ]
8
9
1
u
< i (x(j); f include [hT(^l) ])
=
jTj
j
j
4L
C
C
^
T(
l)
: _ (x(j); f include [h ]) = E ;
C by (46)
j
j
j
A
T(^
l)
j xj ; fx(j) = x(i)g ; i (x(i)); j (x(j)); fi [hi ]
T(^
l)
include
[hj ])
i (x(j); fj
u
4L
1
C
C
C
A
exp( "review T ) by Claim 3 of Lemma 7.
^ = G and
j (l)
l)
include T(^
[hj ])
i (x(j); fj
u
4L
jTj imply
^ + 1) = G by Figure 11, together
j (l
with the case of actionj (^l) = E, the above inequality implies
0 n
Pr @
l)
include T(^
^
^
[hj ]) = R
j (l + 1) = B ^ actionj (l) = j (x(j); fj
n
o
^
^
^
j xj ; fx(j) = x(i)g ; i (l)(j) = j (l) = G ; hi l
59
o 1
A
exp( "T )
(47)
with a su¢ ciently small ".
In order to prove (45), we are left to prove
0 n
l)
include T(^
^
^
[hj ]) = R
j (l + 1) = B ^ actionj (l) = j (x(j); fj
n
o
j xj ; fx(j) = x(i)g ; i (l)(j) = j (^l) = G ; hi l
Pr @
o 1
A
(48)
is of order exp( T ) from (47). (The di¤erence from (47) is that, in (48), we condition on
^
G and hi l , rather than
since
j (l)
i (l)(j)
^
= B is absorbing, we have
If (48) holds for each ^l
To this end, comparing (47) and (48), we need to deal with learning
after review round ^l and conditioning on not only
= G (and so
~
i (l)(j)
= G for each ~l
with probability at least
~ 2f
j ( j (l))
(G);
send
j
(B);
player i cannot update the belief about
i (l)(j)
~ =
j (l)
send
j
Given
= G.
j (x(j))
and
~ =
j (l)
;
j
(x(j)) happen
(M )g happens with probability at least 2 . Hence,
^ + 1) so much by observing
~ or
j (l
i (l)(j)
j (l)
~
j ( j (l))
given
= G does not change player i’s belief so much for the following
reasons: Figure 12 with indices i and j reversed, as long as
message about
17
i (l)(j)
= G.
Moreover, conditioning on
m=
= G but also
regardless of player j’s history. Further, Figure 4 implies that, given
send
j
^
i (l)(j)
l), learning after review round ^l is small for the
following reasons: Figure 10 implies that both
~ any
9 1
=
C
C
;
C
1
A
1, then this inequality implies (45), as desired.
l
We now prove (48).
j (l),
=
= G and hi l .) To see why (48) is su¢ cient for (45), note that,
0 8
<
j (l) = B
B
^
B
l)
^ l
Pr B : ^actionj (^l) = j (x(j); fjinclude [hT(
j ]) = R for each l
@
j xj ; fx(j) = x(i)g ; f i (l)(j) = Gg ; hi l
9 1
0 8
=
<
^
(
l
+
1)
=
B
j
l
1
C
B
X B
C
T(^
l)
;
:
include
^
Pr B
C:
^actionj (l) = j (x(j); fj
[hj ]) = R
A
@
n
o
^
l=1
j xj ; fx(j) = x(i)g ; i (l)(j) = j (^l) = G ; hi l
i (l)(j)
i (l)(j)
~
i (l+1)
= G, player j listens to player i’s
with probability at least .17 Claim 6 of Lemma 6 (with i and j reversed and
~ + 1)) implies that player i with hT(
i
i (l
~
i (l)(j)
~
i (l+1))
This is where we use the mixture in the inference of
believes that any
j (l).
60
~ + 1)(f include [hT(
i (l
j
j
~
i (l+1))
])
1
happens with probability exp( Kmessage T 2 ).
possible with probability at least
1
Hence, player i believes that
1
exp( Kmessage T 2 ).
T for a su¢ ciently large T , (47) implies (48), as desired.
Since T 2
Second we consider the case where there exists ^l
to
~ + 1)(j) = G is
i (l
^ + 1) = B, that is,
i (l
l)
include T(^
[hi ])
j (x(i); fi
>
l 2 such that player i switches from
u
4L
by Figure 11. Since player i has
until review round ^l, the same proof as above implies that she believes that, given
with a high probability,
~l
^ = G unless actionj (~l) = E or
j (l)
l)
include T(~
[hj ])
j (x(j); fj
^ =G
i (l)
~ =G
i (l)
^
i (l)(j)
= G,
= E for some
^l.18
Moreover, if
^ = G, then since
j (l)
^ = G and
i (l)
^
i (l+1)
= B imply
l)
include T(^
[hi ])
j (x(i); fi
>
u
4L
by Figure 11, unless actionj (^l) = E, we have
0
n
o
1
=E
A
n
o n
o
Pr @
T(^
l)
^
^
^
j xj ; fx(j) = x(i)g ; j (l) = i (l)(j) = G ; actionj (l) = R ; fi [hi ]
0
1
o
n
l)
include T(^
])
=
E
(x(j);
f
[h
j
j
j
A
= Pr @
T(^
l)
j xj ; fx(j) = x(i)g ; i (x(i)); j (x(j)); fi [hi ]
since player j takes
1
by (46)
exp( "review T ) by Claim 4 of Lemma 7.
In total, player i believes that
some ~l
j (x(j))
l)
include T(^
[hj ])
j (x(j); fj
(49)
^ = G unless actionj (~l) = E or
j (l)
^l. Player i also believes that if
l)
include T(~
[hj ])
j (x(j); fj
^ = G, then actionj (^l) = E or
j (l)
= E for
l)
include T(^
[hj ])
j (x(j); fj
=
E. Hence, player i at the end of review round ^l believes that
n
Pr actionj (~l) =
l)
include T(~
[hj ])
j (x(j); fj
= R for each ~l
o
n
o
^l j xj ; i (^l)(j) = G ; h
i
is of order exp( T ). We can deal with learning after review round ^l and conditioning on
^
l
i (l)(j)
=G
as above.
^l, she has i (~l) = G for each
The precise argument goes as follows: Since player i has i (~l) = G for each ~l
~l
^l 1. Replacing l with ^l in the above argument, player j believes that j (^l) = G unless actionj (~l) = E or
l)
include T(~
^l 1. Since “actionj (~l) = E or j (x(j); f include [hT(~l) ]) = E for some
[hj ]) = E for some ~l
j (x(j); fj
j
j
~l ^l 1” implies “actionj (~l) = E or j (x(j); f include [hT(~l) ]) = E for some ~l ^l”, the statement holds.
18
j
j
61
13.5
Full Support of texclude
(r)
j
As will be seen, it will be useful (in the proof of Lemma 13) that player i believes that any texclude
(r)
j
is possible in each round r, as long as player j’s state
i (l)(j)
has not switched to
i (l)(j)
= B:
Lemma 10 For a su¢ ciently large T , for each i 2 I, xj 2 fG; Bg, round r, and player i’s history
hi L , the following claim holds: Let li be the …rst review round with
r satis…es r < li .
(If
i (l)(j)
= B and suppose round
i (l)(j)
= G for each l = 1; :::; L, then we assume that each r = 1; :::; R
satis…es r < li .) For each t 2 T(r), we have
texclude
(r) = t j xj ; f i (l)(j) = G for all l
j
Pr
Proof. Let
send
j
(B),
j (r)
send
j
rg ; hi
be player j’s strategy in round r. Since r < li ,
(M ),
mix
j ,
j (x(j)),
"action
support =
x(j)2fG;Bg2 ;
j (r)2
n
send
j
;
or
(G);
j
L
T
j (r)
2
.
send
can be either
j
(G),
(x(j)). Let
send
j
min
(B);
send
j
(M );
;
mix ;
j
j (x(j)); j
o
(x(j)) ;aj 2Aj
j (r)(aj )
>0
be the lower bound of the probability with which player j takes each action.
(r) from T(r), there exists one s 2 T(r) with
Since player j picks texclude
j
Pr
texclude
(r) = s j
j
j (r); hi
L
jT(r)j
1
T
1
:
By Assumption 1, the conditional probability is always well de…ned. Hence, we are left to show
that the likelihood between texclude
(r) = t and texclude
(r) = s is bounded by T
j
j
Pr
Pr
texclude
(r) = t j
j
texclude
(r)
j
=s j
j (r); hi
L
j (r); hi
L
T
1
1
for each t 2 T(r):
(50)
:
Suppose that texclude
(r) = s, (aj;s ; yj;s ) = (aj ; yj ), and (aj;t ; yj;t ) = (~
aj ; y~j ) is the case. Imagine
j
that player j changes texclude
(r) from s to t: texclude
(r) = t. At the same time, suppose that Nature
j
j
changes (aj;s ; yi;s ) from (aj ; yj ) to (~
aj ; y~j ), and changes (aj;t ; yj;t ) from (~
aj ; y~j ) to (aj ; yj ).
T(r)
fjinclude [hj
] is unchanged.
each round.
Conditional on
i (l)(j)
Then,
= G, player j takes fully mixed strategy in
Hence, the likelihood between “(aj;s ; yj;s ) = (aj ; yj ) and (aj;t ; yj;t ) = (~
aj ; y~j )” and
2
“(aj;s ; yj;s ) = (~
aj ; y~j ) and (aj;t ; yj;t ) = (aj ; yj )” is uniformly bounded by "action
support "support . Hence,
62
we have
texclude
(r) = t j
j
Pr
texclude
(r)
j
Pr
=s j
j (r); hi
L
j (r); hi
L
"action
support "support
2
;
which implies (50) for a su¢ ciently large T , as desired.
14
Reward Function in the Coordination and Main Blocks
Before de…ning the strategy in the report block, we de…ne player i’s reward function. In Section
14.1, we de…ne the variable
j (l)
2 fG; Bg, on which the reward function depends, so that
satis…es certain properties. Given the de…nition of
TP +1
),
i (xj ; hj
review
i
in (52); and
adjust
i
and
14.1
which is the summation of
report
i
is de…ned in (57). (
xj
i
i (xj ),
De…nition of
in review round l.
review
,
i
adjust
,
i
and
j (l)
convenience, we write
j (l)
j (l)
= B means that player j uses the type-3 reward for player i
j (l)
include <l
[hj ])
j (l)(fj
precisely speaking, for notational
j (l).
j (l)
is relegated to Appendix A.10.1. In Appendix A.10.1, we will de…ne
= B if (and only if, except for a small adjustment) at least one of the following
happened; when player j received a message in a round r < l,
happened; in some review round ^l
T(^
l)
review
(x(j); fjinclude [hj ])
j
~ for ~l
j (l)
l
1, player j had
^
i (l)(j)
=
include T(r)
[hj ])
j (m; fj
include T(r)
[hj ])
j (receive; fj
^ = G, took
j (l)
= E happened; or when player j inferred xi , xj , or
^ =
j (l)
= E
j (x(j)),
~ with ~l
i (l)
=E
l
1
l,19 the result of player j’s mixture was a rare event (for example, when she
inferred xj , [Round (xj ; 3) xj (j)] happened.
probability
is de…ned
2 fG; Bg
four cases happens: When player j sent a message m in a round r < l,
or took
i (xj )
depends on the frequency of player j’s history at the beginning of review
The formal de…nition of
and
report
.
i
has been de…ned in Lemma 2.) The remaining rewards
round l, fjinclude [h<l
j ]. Although we should write
such that
Section 14.2 de…nes the reward function
will be de…ned in Lemma 13.
As will be seen in (53) and (54),
j (l)
j (l),
xj
i ,
j (l)
As seen in Figure 9, this event happens only with
(rare). After [Round (xj ; 3) xj (j)], player j has
19
j (l)
= B).
Precisely speaking, since the mixture to determine j (l) happens at the beginning of review round l, j (l) depends
on player j’s history at the beginning of review round l, h<l
j , and her own mixture at the beginning of review round
l.
63
There are following six implications of this de…nition. First,
implies
j (l
j (l)
= B is absorbing:
j (l)
=B
+ 1) = B.
Second, the distribution of
does not depend on player i’s strategy. Since we will de…ne
j (l)
that player j uses the type-3 reward if and only if
j (l)
= B, this ensures that player i cannot
control whether player j uses the type-3 reward or not, as mentioned in Section 9.2.
To see why this property is true, recall that Claims 3 and 4 of Lemma 6 ensure that the distribution of
include T(r)
[hj ])
j (m; fj
include T(r)
[hj ])
j (receive; fj
and
does not depend on player i’s strategy.
T(^
l)
review
(x(j); fjinclude [hj ])
j
Moreover, Claim 2 of Lemma 7 ensures that the distribution of
does not
depend on player i’s strategy.20 Finally, player j’s own mixture is out of player i’s control. Hence,
in total, the distribution of
does not depend on player i’s strategy.
j (l)
Moreover, recalling that E stands for erroneous (and so it is rare that the realization of a random
variable is E), it is rare to have
j
This ensures that player j does not use the type-3
(l) = B.
reward often, as mentioned in Section 9.2.
Third, player j with
player j has
i (l)(j)
To see why
~
i (l)(j)
= G to
i (l)(j)
= B has
j (l)
= B. This ensures that player i can condition that
= G since otherwise the type-3 reward makes all the strategies optimal.
i (l)(j)
= B implies
j (l)
= B, take ~l
1 such that player j switched from
l
~ + 1)(j) = B. As seen in Figure 12 with indices i and j reversed, this means
i (l
that player j listened to player i’s message
~ + 1). If [Regular
i (l
i (j)]
was the case (see Section
13.4.4 with indices i and j reversed), then listening to player i’s message is the rare realization of
player j’s own mixture and so
If [Regular
or had
^
i (l)(j)
i (j)]
j (l)
= B.
is not the case, then player j with
l)
include T(^
[hj ])
j (x(j); fj
= E for some ^l
~l. Since player j had
= G. Hence, if she did not take
her own mixture for
^
j (l).
In total, player j has
Fourth, given
j (l)
If
j (l)
j (x(j)) in
l)
include T(^
[hj ])
j (x(j); fj
In other words, if actionj (l) = E, then
l, then
j (l)
j (x(j))
= G, she also had
i (l)(j)
= E, then player j has
j (x(j))
j (l)
~
^ =
j (l)
review round ^l, then this is a rare realization of
= B whenever player j switched from
= G, player j takes
actionj (^l) = E for some ^l
^ = G did not take
j (l)
= B.
if
j (l)
~
i (l)(j)
j (l)
= G to
= G and takes
Moreover, since
= B by de…nition.
j (l)
;
j
~
i (l+1)(j)
(x(j)) if
j (l)
= B.
= B.
= B is absorbing, if
= B.
To see why, note that the third property implies that given
j (l)
= G, player j has
i (l)(j)
= G.
T(^
l)
Since review
(x(j); fjinclude [hj ]) is de…ned only if player j takes j (^l) = j (x(j)), we need some adjustment so
j
that player i does not want to control the distribution of j (l) by changing the distribution of j (^l).
20
64
Given
takes
i (l)(j)
;
j
= G, Figure 10 with indices i and j reversed ensures that the event that player j
(x(j)) with
j (l)
= G or takes
no more than ). Since we have
j (l)
j (x(j))
with
j (l)
= B is rare (happens with probability
= B after a rare realization, the result follows.
Fifth, if player j has xj (j) 6= xj , then we have
j (l)
= B. Recall that player j’s state xj controls
player i’s payo¤ (see (5) for example). This property means that when player j does not adhere to
her own state (“gives up”controlling player i’s payo¤), player i is indi¤erent between any action.
Figure 9 ensures that, if xj (j) 6= xj , then either
include T(xj ;2)
[hj
])
j (xj ; fj
= E or [Round (xj ; 3)
xj (j)] (rare realization of player j’s mixture) happens. Hence, xj (j) 6= xj implies
Finally, suppose that player i conditions on
conditioning is optimal).
j (l)
= G or
j (l)
j (l)
= B.
= G (see the third property for why this
i (l)(j)
Player i believes that, if she has x(i) 6= x(j) or
= G, then
j (l)(i)
= B with a high probability. That is, as seen in (26), player i believes that, if
the coordination does not go well, then player j uses the type-3 reward with a high probability.
To see why player i holds such a belief, we …rst explain that, if player i has x(i) 6= x(j), then she
believes that
j (l)
= B with a high probability. x(i) 6= x(j) happens only if one of the following
two cases happens: xj (i) 6= xj (j) or xi (i) 6= xi (j). We consider these two cases in the sequel.
Given (39) in Lemma 8, if player i has xj (i) 6= xj (j), then given xj and xj (j), player i believes
include T(xj ;1)
[hj
])
j (xj ; fj
that [Round (xj ; 3) xj (j)] (a rare realization of player j’s mixture),
include T(xj ;2)
[hj
])
j (xj ; fj
= E, or
= E at the end of coordination block for xj . Since player j’s continuation
strategy does not depend on her history in the coordination block for xj conditional on xj (j), this
belief means that, in review round l, player i believes that
j (l)
= B.
Again, given (40) in Lemma 8 with indices i and j reversed, if player i has xi (i) 6= xi (j), then
given xi (j), player i believes that [Round (xi ; 1) xi (j)] (a rare realization of player j’s mixture),
include T(xi ;2)
[hj
])
j (receive; fj
= E, or
include T(xi ;3)
[hj
])
j (xi (j); fj
= E at the end of coordination block
for xi . Since player j’s continuation strategy does not depend on her history in the coordination
block for xi given xi (j), this belief means that, in review round l, player i believes that
Hence, we are left to show that, given x(i) = x(j), if player i has
that
j (l)
= G or
j (l)
with x(i) = x(j) and
j (l)(i)
= G, then she believes
include T(
[hj
j (l) = G, j ( j (l); fj
^
include T(l)
[hj ]) = E for some ^l
l
j (x(j); fj
= G believes that player j has
l
1, or
seen in the fourth property, actionj (^l) = E for some ^l
j (l))
= B.
= B with a high probability. Recall that Lemma 9 ensures that player i
E, actionj (^l) = E for some ^l
include T(
[hj
j ( j (l); fj
j (l)(i)
j (l)
]) = E and
l)
include T(^
[hj ])
j (x(j); fj
65
l
1 means
= E imply
j (l)
j (l)
= B.
j (l))
1.
]) =
As
In addition,
= B by de…nition. Hence,
Lemma 9 implies that player i believes that player j has
j (l)
= G or
j (l)
= B with a high
probability.
In total, we prove the following lemma:
Lemma 11 For each j 2 I and l 2 f1; :::; Lg, we can de…ne a random variable
j (l)
2 fG; Bg
whose distribution is determined by player j’s history h<l
j such that the following properties hold:
1.
j (l)
= B is absorbing:
j (l)
2. For each xj 2 fG; Bg and
j (xj );
Pr (f
j
i ),
i
=B)
2
i,
j (l
+ 1) = B.
the distribution of
does not depend on player i’s strategy
(l) = Bg j
3. For each hj l with
j (xj );
i (l)(j)
i)
i
= G and takes
;
j
2
denoted by Pr(
i.
Moreover,
j
(l) j
j (l)
= B is rare:
(15 + 8L) .
= B, we have
j (l)
= B.
4. For each j 2 I and l 2 f1; :::; Lg, for each hj l with
j (l)
j (l),
(x(j)) if
j (l)
j (l)
= G, player j takes
j (x(j))
if
= B.
5. For each j 2 I and l 2 f1; :::; Lg, for each hj l with xj (j) 6= xj , we have
j (l)
= B.
6. For a su¢ ciently large T , for each i 2 I, xj 2 fG; Bg, l 2 f1; :::; Lg, and hi l , if x(i) 6= x(j)
or
j (l)(i)
j (l)
= G with a positive probability with hi l , then player i believes that
= B with a high probability conditional on xj , x(j), and
Pr fx(i) 6= x(j) _
j (l)(i)
i (l)(j)
= Gg j xj ; x(j); hi
l
j (l)
= G or
= G: If hi l satis…es
> 0;
then we have
Pr f j (l) = G _
j (l)
= Bg j xj ; f i (l)(j) = Gg ; hi
l
1
1
exp( T 3 ):
Proof. See Appendix A.10.
14.2
De…nition of the Reward Function
We are now ready to de…ne player i’s reward function, given the above de…nition of
reward is the summation of (i) a constant
i (xj ),
j (l).
The total
(ii) the reward for rounds other than review rounds,
66
PR
r=1:round r is not a review round
and
adjust
,
i
P
t2T(r)
xj
i (aj;t ; yj;t ),
(iii) the reward for review round l = 1; :::; L,
report
:
i
and (iv) the reward for the report block
TP +1
)
i (xj ; hj
=
i (xj )
R
X
+
review
i
X
xj
i (aj;t ; yj;t )
r=1
t2T(r)
round r is not a review round
+
+
L n
X
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l)
i
+
adjust
(xj ; hj l ; hreport
; l)
i
j
l=1
report
(xj ; hj L ; hreport
):
i
j
o
(51)
We de…ne and explain each term in the sequel.
First, we de…ne the constant
i (xj )
= TP vi (xj )
i (xj )
8 hP
L
< E
l=1 1f
:
such that
x
T ui (xj ) + 1f j (l)=Bg sign (xj ) LT u + T ui j
j (l)=Gg
o
n
2
1
x
2
3
ui j
+ (2 + 2L) T + 2T
i 9
j xj =
;
:
(52)
We de…ne this constant so that player i’s average value from the review phase is equal to vi (xj ) in
order to satisfy promise keeping. As will be seen in (85), the value in the biggest parenthesis is the
value (without being divided by TP ) from the coordination, main, and report block.
Second, the term
R
X
X
xj
i (aj;t ; yj;t )
r=1
t2T(r)
round r is not a review round
cancels out the e¤ect of the instantaneous utilities on the incentives for the rounds which are not
review rounds. This reward incentivizes the players to take actions in order to exchange messages
in the coordination block and supplemental rounds.
Third, we de…ne
if
i (l)(j)
review
i
= B (recall that
such that
i (l)(j)
review
i
satis…es the properties mentioned in Section 9. Firstly,
= B implies
sign (xj ) LT u +
j (l)
X
= B), then player j uses the type-3 reward
xj
i (aj;t ; yj;t ):
(53)
t2T(l)
Compared to (25) in Section 9, we add sign(xj ) LT u. Since sign(xj )
sign (xj ) LT u = LT u, this
term gives us enough slack in self generation once the type-3 reward is used.
67
Secondly, if
i (l)(j)
= G, then if
j (l)
= B, then again player j uses the type-3 reward
sign (xj ) LT u +
X
xj
i (aj;t ; yj;t ):
(54)
t2T(l)
If
i (l)(j)
=
j (l)
issue or not). If
= G, then the reward function depends on
j (l)
(whether self generation is an
= G (self generation is not an issue), then player j uses the type-1 reward
T fui (xj )
ui (
(x(j)))g +
X
(x(j))](aj;t ; yj;t ) is ui (
i[
(55)
(x(j))](aj;t ; yj;t ):
t2T(l)
By Lemma 2, any strategy of player i is optimal.
ui (at )+ i [
j (l)
In addition, since the expected payo¤ from
(x(j))), the value from the review round (without being divided
by T or TP ) is T ui (xj ). Here, we ignore the e¤ect of her strategy in review round l on the payo¤s
from the subsequent rounds. See Section 16 for the formal proof of why it is optimal for player i
to ignore this e¤ect.
On the other hand, if
j (l)
= B (self generation is an issue), then player j uses the type-2 reward
T fui (xj )
By Claim 4 of Lemma 11, player j with
ui (BRi (
j (l)
;
j
;
(x(j)));
= G and
j
j (l)
(56)
(x(j)))g:
;
= B takes
(x(j)). Since the reward
j
is constant, as long as the coodination goes well and player i has x(i) = x(j) and
;
player i’s equilibrium strategy, which takes a static best response to
addition, since the expected payo¤ from ui (at ) is ui (BRi (
;
j
(x(j)));
review round (without being divided by T or TP ) is T ui (xj ).
68
j
;
j
j (l)(i)
= B,
(x(j)), is optimal.
In
(x(j))), the value from the
review
i
In total, we de…ne the reward
as follows:
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l)
i
8
<
X
= 1f i (l)(j)=Bg sign (xj ) LT u +
:
t2T(l)
|
{z
+1f
i (l)(j)=Gg
8
>
>
>
>
>
>
>
>
>
>
>
>
1f
>
>
>
>
<
j (l)=Gg
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
:
8
>
>
>
>
>
>
>
>
<
(53)
1f
(57)
9
=
xj
i (aj;t ; yj;t )
;
}
j (l)=Gg
>
>
>
>
>
+1f
>
>
>
:
8
<
: +
|
T fui (xj )
P
ui (
(x(j)))g
(x(j))](aj;t ; yj;t ) ;
{z
}
i[
t2T(l)
9
>
>
>
>
>
>
>
>
=
9
=
9
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
=
>
>
>
>
>
T
fu
(x
)
u
(BR
(
(x(j)));
(x(j)))g
> :
i j
i
i
j
j
j (l)=Bg
>
|
{z
} >
; >
>
(56)
>
9
8
>
>
>
=
<
>
X x
>
j
>
>
(a
;
y
)
+1f j (l)=Bg sign (xj ) LT u +
j;t j;t
>
i
>
;
:
>
>
t2T(l)
>
>
|
{z
}
;
(55)
;
;
(54)
Note that x(j), i (l)(j), j (l), and j (l) are determined by xj , fjinclude [hj l ], and player j’s own mixP
P
x
T(l)
ture, and t2T(l) i j (aj;t ; yj;t ) and t2T(l) i [ (x(j))](aj;t ; yj;t ) are determined by fj [hj ]. Hence,
review
i
T(l)
is a function of xj , fjinclude [hj l ], fj [hj ], and l.
Given this de…nition of
review
,
i
Player i has x(i) 6= x(j) or
j (l)(i)
by Claim 3 of Lemma 11) and
j (l)(i)
= B but player j has
of player i is optimal given
player i’s strategy is optimal unless the coordination goes wrong:
= G even though player j has
j (l)
j (l)
j (l)
j (l)
= G (this implies
i (l)(j)
=G
= B. One may wonder what if player i has x(i) 6= x(j) or
=
j (l)
= G. This case is not a problem since any strategy
= G. To see why, note that Lemma 2 ensures that, with type-1
reward (55), any strategy is optimal.
Let
x(i) and
i (x(j); l)
j (l)(i)
<l
be the set of player i’s histories fi [h<l
i ] such that player i with fi [hi ] has x(j) =
= B with probability one:
i (x(j); l)
8
9
exclude
<
=
For any realization of ti
(r) for r < l,
fi [h<l
]
:
:
i
:
;
player i with fi [h<l
]
has
x(j)
=
x(i)
and
(l)(i)
=
B
j
i
69
The discussion above means that, if and only if
i (l)(j)
=G^
j (l)
=G^
j (l)
= B ^ fi [h<l
i ] 62
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l).
i
player i’s strategy would be suboptimal if the reward for review round l were
target
i
If player j used the following reward function
would be optimal after each history: Again, if
the type-3 reward (53) or (54), respectively.
uses the type-1 reward (55) as before.
fi [h<l
i ] 62
i (x(j); l)”,
If
review
,
i
rather than
= B or
i (l)(j)
i (l)(j)
In addition, if
=
j (l)
i (l)(j)
j (l)
i (x(j); l)”,
then player i’s strategy
= B, then player j uses
= G and
=
j (l)
then player j uses the type-1 reward as well.
“ j (l) = B and fi [h<l
i ] 2
(58)
i (x(j); l);
If
= G, player j
j (l)
= G and “ j (l) = B and
i (l)(j)
=
j (l)
= G and
player j uses the type-2 reward (56).
Intuitively, whenever player i’s history satis…es (58), player j uses the type-1 reward rather than
type-2.
Since Lemma 2 ensures that the type-1 reward makes any strategy of player i optimal,
with such a reward function, player i’s strategy would be optimal after each history.
In total,
target
i
is de…ned as
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
= 1f
Since
i (l)(j)=Bg
8
<
:
sign (xj ) LT u +
X
t2T(l)
(59)
9
=
xj
i (aj;t ; yj;t )
;
+1f i (l)(j)=Gg
8
8
>
>
1
>
>
<l
>
>
f j (l)=G_f j (l) = B ^ fi [hi ] 62 i (x(j); l)gg
>
>
>
>
|
{z
}
>
>
>
>
>
>
th e c a se w ith w h ich
>
>
>
review
>
is su b o p tim a l
<
>
i
8
9
>
>
< 1f j (l)=Gg
< T fu (x ) u ( (x(j)))g =
i j
i
>
>
>
P
>
>
: +
>
>
>
(x(j))](aj;t ; yj;t ) ;
>
t2T(l) i [
>
>
>
>
>
>
>
>
: +1
>
fui (xj ) ui (BRi ( j ; (x(j)));
>
>
f j (l)=B^fi [h<l
i ]2 i (x(j);l)g
>
n
o
>
P
>
x
:
+1f j (l)=Bg sign (xj ) LT u + t2T(l) i j (aj;t ; yj;t )
target
i
is a function of whether fi [h<l
i ] is included in
i (x(j); l)
9
>
>
>
>
>
>
>
>
>
>
=
;
j
>
>
>
>
>
>
>
>
>
>
(x(j)))g ;
9
>
>
>
>
>
>
>
>
>
>
>
>
>
=
>
>
>
>
>
>
>
>
>
>
>
>
>
;
:
or not, this reward is a function
of fi [h<l
i ] as well.
Instead of replacing
to
review
:
i
The reward
review
i
adjust
,
i
with
target
,
i
we add the expected di¤erence between
which is added to
review
i
70
review
i
and
target
i
in (51), is de…ned so that the expected value
of
adjust
i
adjust
(xj ; hj l ; hreport
; l)
i
j
13, we will de…ne
E
= E
if
i (l)(j)
review
i
review
i
is equal to the expected di¤erence between
= G and
and
target
i
h
h
report
jh
i
i
and
adjust
(xj ; hj l ; hreport
; l)
i
j
L
target
.
i
and
In particular, in Lemma
so that
j xj ; f i (l)(j) = Gg ;
report
jh
i
i
l
; hi
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
l
i
j xj ; f i (l)(j) = Gg ; hi
i
h
T(l)
l
l
include
E review
(x
;
f
[h
];
f
[h
];
l)
j
x
;
f
(l)(j)
=
Gg
;
h
j
j j
j
i
i
j
j
i
adjust
(xj ; hj l ; hreport
; l)
i
j
is zero if
is identically equal to zero if
i (l)(j)
i (l)(j)
l
i
(60)
= B. (Note that the di¤erence between
report
jh
i
i
= B.) Here,
l
distribution of player i’s strategy in the report block given hi l . This property of
is the conditional
adjust
i
corresponds
to (28) in Section 9.4.
We postpone the de…nition of
14.3
adjust
i
and
report
i
to Lemma 13.
target
i
Small Expected Di¤erence between
and
review
i
Recall that Claim 6 of Lemma 11 ensures that player i who has x(i) 6= x(j) or
that it is rare for player j to have
the di¤erence between
target
i
and
j (l)
= B and
review
i
adjust
)
i
= G given
i (l)(j)
= G believes
= G. On the other hand,
is not zero only if player i has x(i) 6= x(j) or
with a positive probability and player j has
(and so the expected value of
j (l)
j (l)(i)
j (l)
= B and
j (l)
j (l)(i)
=G
= G. Hence, the expected di¤erence
is close to zero, conditional on
i (l)(j)
= G. This closeness
will play an important role when we de…ne player i’s strategy in the report block in Lemma 13.
Lemma 12 For a su¢ ciently large T , for each i 2 I, l 2 f1; :::; Lg, xj 2 fG; Bg, and hi l , the
di¤erence (60) depends only on the frequency of player i’s history:
i
j xj ; f i (l)(j) = Gg ; hi l
h
i
T(l)
l
l
include
(x
;
f
[h
];
f
[h
E review
];
l)
j
x
;
f
(l)(j)
=
Gg
;
h
j
j j
j
i
i
j
j
i
h
i
T(l)
target
l
l
l
include
= E i
(xj ; fi [hi ]; fj
[hj ]; fj [hj ]; l) j xj ; f i (l)(j) = Gg ; fi [hi ]
h
i
T(l)
l
l
review
include
E i
(xj ; fj
[hj ]; fj [hj ]; l) j xj ; f i (l)(j) = Gg ; fi [hi ]
E
h
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
71
Moreover, this di¤erence is su¢ ciently small:
E
h
i
j xj ; f i (l)(j) = Gg ; fi [hi ]
h
i
T(l)
l
l
review
include
E i
(xj ; fj
[hj ]; fj [hj ]; l) j xj ; f i (l)(j) = Gg ; fi [hi ]
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
Proof. Both
i (l)(j)
review
i
target
i
and
l
1
exp( T 4 ):
T(l)
depend only on xj , fi [hi l ], fjinclude [hj l ], and fj [hj ]. In addition,
(r) is random, fi [hi l ] is a su¢ cient statistic.
depends only on fjinclude [hj l ]. Since texclude
j
Recall that fi [h<l
i ] 62
i (x(j); l)
target
i
and
review
i
di¤er only if
Since both
review
i
and
Comparing (57) and (59),
B ^ fi [h<l
i ] 62
i (x(j); l).
conditional on xj , x(j),
player i believes that
i (l)(j)
j (l)
implies that player i has either x(i) 6= x(j) or
j (l)
j (l)
j (l)
= G^
= G.
j (l)
=
are of order T , it su¢ ces to show that,
= G, and fi [hi l ], if player i has x(i) 6= x(j) or
= B or
Pr f j (l) = B _
target
i
= G^
i (l)(j)
j (l)(i)
j (l)(i)
= G, then
= G with a high probability:
= Gg j xj ; x(j); f i (l)(j) = Gg ; hi
l
1
1
exp( T 3 ):
Claim 6 of Lemma 11 establishes the result.
15
Report Block
We now de…ne
i (xi )
report
jh
i
i
L
,
and reward function
adjust
,
i
and
report
i
in Lemma 13.
This …nishes de…ning the strategy
TP +1
):
i (xj ; hj
Lemma 13 There exists Kreport 2 N such that there exist
adjust
(xj ; hj l ; hreport
; l)
i
j
2
h
report
jh
i
i
L
,
adjust
(xj ; hj l ; hreport
; l)
i
j
i
exp( T ); exp( T ) ;
1
5
1
5
11
11
and
report
(xj ; hj L ; hreport
)
i
j
2 [ Kreport T 12 ; Kreport T 12 ]
such that, for su¢ ciently large T , for each xj 2 fG; Bg and hi L ,
72
with
1.
report
jh
i
i
E
"
L
maximizes
X
ui (at ) +
report
(xj ; hj L ; hreport
)
i
j
+
t:report block
L
X
adjust
(xj ; hj l ; hreport
; l)
i
j
l=1
j xj ; hi
L
#
:
(61)
2. From player i’s perspective at the end of review round l, the expected adjustment is equal to
(60):
E
= E
if
i (l)(j)
h
h
adjust
(xj ; hj l ; hreport
; l)
i
j
j f i (l)(j) = Gg ;
report
jh
i
i
l
; hi
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
l
i
j xj ; f i (l)(j) = Gg ; hi
h
i
T(l)
l
l
include
E review
(x
;
f
[h
];
f
[h
];
l)
j
x
;
f
(l)(j)
=
Gg
;
h
j
j j
j
i
i
j
j
i
= G, and the adjustment is equal to zero if
i (l)(j)
l
i
(62)
= B.
3. The equilibrium value satis…es
E
"
X
ui (at ) +
report
(xj ; hj L ; hreport
)
i
j
t:report block
j xj ;
report
jh
i
i
L
; hi
L
#
= 0:
(63)
Proof. See Appendix A.11.
Let us intuitively explain why this lemma is true. To this end, we …rst assume that the players
have access to the public randomization device and can communicate via cheap talk in Section 15.1.
Then, we explain how to dispense with cheap talk, keeping public randomization device in Section
15.2. Finally, we explain how to dispense with public randomization device in Section 15.3. The
formal proof in Appendix A.11 does not use public randomization device or cheap talk.
15.1
Report Block with Cheap Talk
T(r)
Intuitively, player i sends fi [hi L ] = (fi [hi
])R
r=1 to player j so that player j can calculate
adjust
i
1
to satisfy (62). To this end, it will be useful to divide each round r into jT(r)j 3 subrounds with
2
equal length, that is, each subround lasts for jT(r)j 3 periods: Let t(r) + 1 be the …rst period of
round r: T(r) = ft(r) + 1; :::; t(r) + jT(r)jg.
ft(r) + (k(r)
2
3
Subround k(r) of round r consists of T (r; k(r))
2
T(r;k(r))
1) jT(r)j + 1; :::; t(r) + k(r) jT(r)j 3 g. Let fi [hi
73
] be the frequency of player i’s
Figure 13: Rounds and subrounds
history in subround k(r) of round r:
T(r;k(r))
fi [hi
# ft 2 T (r; k(r)) : ai;t = ai ; yi;t = yi g
] (ai ; yi )
2
:
jT(r)j 3
See Figure 13 for illustration:
15.1.1
Player i’s Strategy
report
jh
i
i
L
At the beginning of the report block, the players draw a public randomization device, which determines whether player 1 or 2 sends the message fi [hi L ]. Each player is picked with probability
1
2
and only one player is picked at the same time (that is, with public randomization, only one player
sends the history). Suppose player i is picked.
T(r;k(r))
For each round r, player i sends the frequency of each subround (fi [hi
1
jT(r)j 3
])k(r)=1 . Then, for
each round r, the players draw a public randomization device, which picks one subround k(r) with
probability jT(r)j
1
3
1
randomly. Suppose that ki (r) 2 f1; :::; jT(r)j 3 g is picked for round r. Then,
player i sends the entire history of the picked subround ki (r): (ai;t ; yi;t )t2T(r;ki (r)) .
(The reason
why player i sends the message this way rather than simply sending (ai;t ; yi;t )t2T(r) is to reduce the
cardinality of the message, looking ahead to dispensing with cheap talk.)
15.1.2
Reward Functions
report
i
and
adjust
i
Player j incentivizes player i to tell the truth by
report
i
as follows: First, player j incentivizes player
i to tell the truth about (ai;t ; yi;t )t2T(r;ki (r)) by giving the reward equal to
T
11
X
t2T(r;ki (r))
1ft=texclude (r)g 1fr<l g 1aj;t ;yj;t
j
i
74
E 1aj;t ;yj;t j
^i;t ; y^i;t
j (r); a
2
:
(64)
In general, 1fXg is equal to one if statement X is true and zero otherwise. Here, 1ft=texclude (r)g = 1
j
exclude
(r); and 1fr<l g is equal to one if and only if round r is before (and not
if and only if t = tj
i
equal to) review round li (recall that li is the …rst review round with
i (l)(j)
= B). We de…ne
1fr<l g = 1 for each r = 1; :::; R if player j has i (l)(j) = G for each l = 1; :::; L. In addition,
i
1aj;t ;yj;t is jAi j jYi j-dimensional vector whose element corresponding to (aj;t ; yj;t ) is one and other
elements are equal to zero. Moreover,
j (r)
is player j’s strategy in round r and (^
ai;t ; y^i;t ) is player
i’s message.
1aj;t ;yj;t
E 1aj;t ;yj;t j
^i;t ; y^i;t
j (r); a
2
is called the scoring rule in statistics, and incentivizes
player i to tell the truth about (^
ai;t ; y^i;t ) if player i believes that (aj;t ; yj;t ) is distributed according
to Pr( j
j (r); ai;t ; yi;t ).
Moreover, given Assumption 4, the incentive is strict if
j (r)
is a fully
mixing strategy. (See Lemma 46 for the formal proof.)
(r). Recall that player j does not
Here, player j punishes player i based on the period texclude
j
use (aj;t ; yj;t ) with t = texclude
(r) to determine her continuation strategy. Hence, player i cannot
j
learn (aj;t ; yj;t ) and believes that (aj;t ; yj;t ) is distributed according to Pr( j
j (r); ai;t ; yi;t ).
By
(r). Together with the
Lemma 10, with Assumption 1, player i cannot learn what period is texclude
j
fact that
j (r)
is fully mixing given
i (l)(j)
= G, player i has the strict incentive to tell the truth
about (ai;t ; yi;t ) for each t 2 T (r; ki (r)) if r < li .
Second, we make sure that (64) does not a¤ect player i’s incentive in the coordination and main
blocks. At the timing when player i takes ai;t , the expected reward given that player i will tell the
truth in the report block is
T
11
1ft=texclude (r)g 1fr<l g E
j
i
given texclude
(r), li , and
j
j (r).
h
1aj;t ;yj;t
E 1aj;t ;yj;t j
2
j (r); ai;t ; yi;t
j
j (r); ai;t
Note that the conditional expectation E 1aj;t ;yj;t j
i
;
j (r); ai;t ; yi;t
depends on (ai;t ; yi;t ), and the distribution of yi;t depends on ( j (r); ai;t ). Hence, the conditional
expectation given ( j (r); ai;t ) (but before observing yi;t ) depends on ( j (r); ai;t ). Since player j’s
signal yj;t statistically infers player i’s action ai;t , there exists
cancel
[ j (r)](aj;t ; yj;t )
i
=E
h
1aj;t ;yj;t
E 1aj;t ;yj;t j
75
cancel
[ j (r)](aj;t ; yj;t )
i
j (r); ai;t ; yi;t
2
j
such that
j (r); ai;t
i
:
To cancel out the e¤ect of (64) on player i’s incentive to take actions, player j gives the reward
T
11
1ft=texclude (r)g 1fr<l g
j
i
cancel
[ j (r)](aj;t ; yj;t ):
i
(65)
T(r;ki (r))
Third, player j punishes player i if player i’s message about the frequency fi [hi
] about
the picked subround is not compatible with player i’s message about the history in this subround:
T
15
1fr<l g 1
i
^ T(r;ki (r)) ]6=jT(r)j
f i [h
i
2
3
P
t2T(r;ki (r))
1a^i;t ;^
yi;t
(66)
:
2
The variable with hat denotes player i’s message. Since jT (r)j 3 is the length of the subround,
^ T(r;ki (r)) ] and jT (r)j 23 P
f i [h
^i;t ;^
yi;t should be equal to each other if player i tells the
i
t2T(r;ki (r)) 1a
truth.
In total, we have
report
(xj ; hj L ; hreport
)
i
j
=
R
X
8
>
>
>
>
<
T
11
P
t2T(r;ki (r))
+T
1fr<l g
i
>
>
r=1
>
>
:
P
11
T
E 1aj;t ;yj;t j
1ft=texclude (r)g 1aj;t ;yj;t
j
t2T(r;ki (r))
15
1
1ft=texclude (r)g
j
^ T(r;ki (r)) ]6=jT(r)j
fi [h
i
2
3
P
^i;t ; y^i;t
j (r); a
cancel
[ j (r)](aj;t ; yj;t )
i
t2T(r;ki (r))
1a^i;t ;^
yi;t
1
2
9
>
>
>
>
=
>
>
>
>
;
:
^ T(r;k(r)) ])jT(r)j 3 , player j
Finally, given player i’s message about the frequency of subrounds, (fi [h
i
k(r)=1
calculates the frequency of the round:
1
^ T(r) ] = jT (r)j
f i [h
i
1
3
jT(r)j 3
X
^ T(r;k(r)) ]:
f i [h
i
(67)
k(r)=1
^ T(1) ]; :::; fi [h
^ T(r) ] with round r being review round l, player j de…nes
For each l, from fi [h
i
i
adjust
(xj ; hj l ; hreport
; l)
i
j
= 2
1f
i (l)(j)=Gg
8 h
< E
:
i 9
^ l] =
j xj ; f i (l)(j) = Gg ; fi [h
i
h
i
;
l
l
;
include
l
^ ]
E review
(x
;
f
[h
];
f
[h
];
l)
j
x
;
f
(l)(j)
=
Gg
;
f
[
h
j
j
j
i
i
i
j
j
j
i
target
(xj ; fjinclude [hj l ]; fj [hlj ]; fi [hi l ]; l)
i
76
so that (62) holds with truthtelling. Lemma 12 ensures that the expected di¤erence between
and
review
i
target
i
depends only on the frequency of player i’s history. Here, 2 cancels out the probability
that player i is selected by the public randomization (we de…ne that the adjustment is zero for
player j who is not selected). Note that we have
1
adjust
(xj ; hj l ; hreport
; l)
i
j
(68)
exp( T 4 )
for each xj , hj l , hreport
, and l by Lemma 12.
j
15.1.3
Incentive Compatibility
Consider player i’s incentive to tell the truth about the history in round r. If 1fr<l g = 0, that
i
report
report
L
) does not
(xj ; hj ; hj
is, if round r is in review round li with i (li )(j) = B or after, then i
depend on player i’s message about round r. Moreover, the message about round r does not a¤ect
adjust
i
either. To see why, note that, for each l < li ,
adjust
; l)
(xj ; hj l ; hreport
j
i
= 2
1f
i (l)(j)=Gg
8 h
< E
:
i 9
l
^
j xj ; f i (l)(j) = Gg ; fi [hi ] =
i
h
l
l
;
l
include
review
^
[hj ]; fj [hj ]; l) j xj ; f i (l)(j) = Gg ; fi [hi ]
(xj ; fj
E i
target
(xj ; fjinclude [hj l ]; fj [hlj ]; fi [hi l ]; l)
i
does not depend on player i’s message about round r since round r is after review round l. For
each l
adjust
(xj ; hj l ; hreport
; l)
i
j
li , we have
does not a¤ect
report
i
or
= 0 by de…nition since
i (l)(j)
= B. Since the message
adjust
,
i
any message is optimal about round r with 1fr<l g = 0. Therefore,
i
we will concentrate on the case with 1fr<l g = 1 and i (l)(j) = G for each review round l which is
i
equal to or before round r.
When player i sends (ai;t ; yi;t )t2T(r;ki (r)) , player i believes that the expected reward from (64)
given
j (r)
is
T
E
First, Pr
11
h
Pr
t = texclude
(r) j
j
1aj;t ;yj;t
E 1aj;t ;yj;t j
t = texclude
(r) j
j
j (r); hi
L
j (r); hi
L
^i;t ; y^i;t
j (r); a
T
2
77
2
j
j (r); hi
L
; t = texclude
(r)
j
i
for each t 2 T(r) by Lemma 10.
:
(69)
Second,
since player j does not use information in period texclude
(r) to determine the continuation play,
j
(r) = t, player j’s history (aj;t ; yj;t ) is distributed according to
player i believes that given texclude
j
Pr (aj;t ; yj;t j
j (r); ai;t ; yi;t ).
Hence, (69) is equal to
"E
h
for some " > 0 of order T
1aj;t ;yj;t
13
E 1aj;t ;yj;t j
^i;t ; y^i;t
j (r); a
2
j
j (r); ai;t ; yi;t
i
(70)
for each t 2 T (r; ki (r)).
Finally, since we are considering the case with 1fr<l g , player j’s strategy j (r) has full support.
i
Then, by Assumption 4, we can make sure that the expected loss in (70) from telling a lie about
13
(ai;t ; yi;t ) is of order T
of the other reward in
relevant reward in
13
. (See Lemma 46 for the details.) Since T
report
i
report
i
(since
cancel
[ j (r)](aj;t ; yj;t )
i
is greater than the magnitude
is sunk in the report block, the other
adjust
(xj ; hj l ; hreport
; l)
i
j
is the one de…ned in (66)) and the adjustment
is
1
4
of order exp( T ) by (68), it is strictly optimal to tell the truth about (ai;t ; yi;t )t2T(r;ki (r)) regardless
of the past messages.
Given this truthtelling incentive about (ai;t ; yi;t )t2T(r;ki (r)) , consider player i’s incentive to tell the
truth about the frequency in subrounds.
T(r;k(r))
While she sends the message fi [hi
that public randomization will pick any k(r) with probability jT(r)j
1
3
T
T(r;k(r))
public randomization ki (r) is drawn after she …nishes sending (fi [hi
i tells a lie about subround k(r), then the expected loss from (66) is T
than the magnitude of the adjustment
T(r;k(r))
the truth about (fi [hi
adjust
(xj ; hj l ; hreport
; l).
i
j
1
3
], she believes
. (Recall that the
1
jT(r)j 3
])k(r)=1 .) Hence, if player
1
3
T
15
, which is greater
Hence, it is strictly optimal to tell
1
jT(r)j 3
])k(r)=1 regardless of the past messages.
In summary, we construct an incentive compatible strategy and reward such that player i tells
the truth about the history and from that report, player j calculates
15.2
adjust
i
to satisfy (62).
Dispensing with Cheap Talk
We now explain how player i sends the message by taking actions.
Again, the players draw a
public randomization device to decide who to report the history. Suppose that player i is selected.
Player j takes
mix
j
i.i.d. across periods, and
that there exists player j’s reward function
adjust
(xi ; hi l ; hreport
; l)
j
i
report
j
to incentivize her to take
equilibrium value in the report block equal to zero.
78
= 0. Assumption 2 ensures
mix
j
and to keep her
Hence, we focus on player i’s strategy and
rewards.
15.2.1
Player i’s Strategy
report
jh
i
i
L
T(r;k(r))
As in the case with cheap talk, player i sends (fi [hi
1
jT(r)j 3
])k(r)=1 and (ai;t ; yi;t )t2T(r;ki (r)) . Let mi be
T(r;k(r))
a generic message that player i wants to send in the report block, that is, mi can be fi [hi
]
for some r and k(r), or (ai;t ; yi;t ) for some r and t 2 T (r; ki (r)); and let Mi be the set of possible
2
T(r;k(r))
fi [hi
2
jT (r)j 3 jAi jjYi j
messages. We have jMi j
T(r;k(r))
T 3 jAi jjYi j for fi [hi
] since the frequency of subround
2
] can be expressed by “how many times out of jT (r)j 3 periods player i observes each
2
(ai ; yi ).” (Recall that jT (r)j 3 is the length of the subround.) For (ai;t ; yi;t ), we have jMi j
since (ai ; yi ) is included in Ai
jAi j jYi j
Yi .
We now explain how player i sends mi 2 Mi . Given ai (G) and ai (B) …xed in Section 6.2, there
exists a one-to-one mapping ~ai : Mi ! fai (G); ai (B)glog2 jMi j between message mi and sequence
of binary actions since fai (G); ai (B)glog2 jMi j = jMi j.
explicitly.)
(See Appendix A.4 for how to de…ne ~ai
2
T(r;k(r))
Note that the length of ~ai (mi ) is bounded by log2 T 3 jAi jjYi j for fi [hi
] and by
log2 jAi j jYi j for (ai;t ; yi;t ).
When player i sends the message mi , player i takes an action sequence ~ai (mi ).
Moreover,
player i repeats each element of ~ai (mi ) for multiple periods, in order to increase the precision of
T(r;k(r))
the message. In particular, for mi corresponding to fi [hi
1
], player i repeats the action for T 2
1
periods, and for mi corresponding to (ai;t ; yi;t ), she repeats it for T 4 periods. Let T (mi ) be the
T(r;k(r))
number of repetitions: T (fi [hi
1
1
]) = T 2 and T (ai;t ; yi;t ) = T 4 .
11
Given this strategy, the communication takes periods of order T 12 :
0
1
2
T
|{z}
B
R B
repetition
X
B
B
1
B
r=1 @ + T 4
|{z}
repetition
11
which is of order T 12
1
3
jT (r)j
| {z }
log2 jT (r)j
|
{z
number of subrounds
length of ~ai (mi ) for
2
3
jT (r)j
| {z }
number of periods per subround
1
2
jAi jjYi j
3
}
T(r;k(r))
]
fi [hi
log jA j jY j
| 2 {zi i}
length of ~ai (mi ) for (ai ;yi )
C
C
C
C;
C
A
(71)
T , as seen in (20).
11
Let treport + 1 be the …rst period of the report block; and let treport + 1; :::; treport + T 12 be the
11
periods in which player i takes ~ai (mi ) for some mi . (Precisely speaking, T 12 should be of order
11
11
11
T 12 by (71). For simple notation, we just write T 12 rather than of order T 12 in the text.) After
79
Figure 14: Structure of the report block with public randomization
player i …nishes sending each ~ai (mi ), the players sequentially assign S periods for each of periods
11
11
treport + 1; :::; treport + T 12 , where S is determined in Section 6.3. That is, periods treport + T 12 +
11
11
1; :::; treport + T 12 + S are assigned to period treport + 1, periods treport + T 12 + S + 1; :::; treport +
11
T 12 + 2S are assigned to period treport + 2, and so on. In general, let S(t) be the set of periods
11
treport + T 12 + (k
11
1)S + 1; :::; treport + T 12 + kS that are assigned to period t with k = t
In periods S(t), given player i’s history in period t, player i takes
The periods treport + 1; :::; treport + T
11
12
S(t)
i
treport .
determined in Section 6.3.
are called “the round for sending the history” and the
other periods are called “the round for conditional independence.” Figure 14 summarizes the entire
structure.
15.2.2
Player j’s Inference of Player i’s Message
For each period t in which player i sends an element of an action sequence ~ai (mi ) assigned to a
message mi , player j given her history in periods t and S(t) calculates a function
[ (aj; ; yj; )
2S(t) )
j ((aj;t ; yj;t )
determined in Section 6.3. By Lemma 4, since the expected realization of
j
is
high (or low) if player i takes ai (G) (or ai (B)), player j infers that the element of ~ai (mi ) is ai (G)
80
(or ai (B)) if there are a lot of high (or low) realization of
j.
Since player i repeats the element
of ~ai (mi ) for T (mi ) periods, by the law of large numbers, player j can infer the element correctly
with probability of order
1
(72)
exp( T (mi )):
S(t)
i
Importantly, Lemma 4 ensures that (if player i expects that she will take
in the round for
conditional independence) player i in the round for sending the history cannot update player j’s
inference from player i’s signal observation (conditional independence property).
Given this inference of each element of ~ai (mi ), player j infers each message using the inverse of
T(r;k(r))
~ai (mi ). Let (fi [hi
T(r)
player j infers fi [hi
15.2.3
1
jT(r)j 3
](j))k(r)=1 and (ai;t (j); yi;t (j))t2T(r;ki (r)) be player j’s inference. As in (67),
T(r;k(r))
](j) and fi [hi l ](j) from ((fi [hi
Reward Functions
We modify
adjust
i
and
report
i
adjust
i
and
1
jT(r)j 3
](j))k(r)=1 )R
r=1 .
report
i
in order to deal with the possibility of errors. First, we consider an
error when player j calculates
adjust
(xj ; hj l ; hreport
; l)
i
j
= 2
1f
i (l)(j)=Gg
8 h
< E
:
(73)
i 9
target
(xj ; fjinclude [hj l ]; fj [hlj ]; fi [hi l ]; l) j xj ; f i (l)(j) = Gg ; fi [hi l ](j) =
i
h
i
l
l
;
include
l
E review
(x
;
f
[h
];
f
[h
];
l)
j
x
;
f
(l)(j)
=
Gg
;
f
[h
](j)
j
j j
j
i
i i
i
j
j
T(r;k(r))
from fi [hi l ](j). Since the cardinality of fi [hi
T(r;k(r))
fi [hi
2
] is log2 jT (r)j 3 jAi jjYi j
2
] is jT (r)j 3 jAi jjYi j , the length of ~ai (mi ) to send
1
2
log2 T 3 jAi jjYi j . Since each round has jT (r)j 3
1
T 3 subrounds and
there are R rounds, the total length of ~ai (mi )’s that are used to calculate fi [hi l ](j) is no more than
1
2
1
T(r;k(r))
RT 3 log2 jT (r)j 3 jAi jjYi j . By (72), recalling that T (mi ) = T 2 for mi corresponding to fi [hi
],
player j infers fi [hi l ](j) correctly with probability no less than
1
1
2
RT 3 log2 jT (r)j 3 jAi jjYi j
1
T(r;k(r))
On the other hand, the cardinality of the messages ((fi [hi
81
(74)
exp( T 2 ):
1
jT(r)j 3
](j))k(r)=1 )R
r=1 used to calculate
fi [hi l ](j) is no more than
YR
cardinality of
r=1
1
number of subrounds in round r
T(r;k(r))
fi [hi
]
jT (r)j
2
jAi jjYi j
3
RT 3
;
1
which is of order exp(T 3 ).
As will be seen in Appendix A.11.3.2, if we have shown that the probability of the correct
inference (74), to the power of the cardinality of the messages used in (73), converges to one, then
we can create
adjust
(xj ; hj l ; hreport
; l)
i
j
such that, given that player i follows the equilibrium strategy
in the report block, (i) this adjustment is still small and (ii) (62) holds from the perspective of player
1
2
1
i in review round l. Since we have (jT (r)j 3 jAi jjYi j )RT 3
1
exp(T 2 ) for a su¢ ciently large
exp(T 3 )
T , the probability of the correct inference, to the power of the cardinality of the messages, satis…es
2
1
1
2
1
1
RT 3 log2 jT (r)j 3 jAi jjYi j
1
3
RT log2 jT (r)j
2
jAi jjYi j
3
exp( T 2 )
jT(r)j 3 jAi jjYi j
1
RT 3
1
1
2
jT (r)j
exp( T )
2
jAi jjYi j
3
RT 3
! 1 as T ! 1;
(75)
as desired.
report
(xj ; hj L ; hreport
)
i
j
We also modify
taking errors into account.
Consider the reward to
incentivize player i to tell the truth about (ai;t ; yi;t )t2T(r;ki (r)) :
1ft=texclude (r)g 1fr<l g 1aj;t ;yj;t
j
i
E 1aj;t ;yj;t j
^i;t ; y^i;t
j (r); a
2
:
(76)
Without cheap talk, player j can infer each element of ~ai (mi ) to send (ai;t ; yi;t ) correctly with
probability of order 1
1
1
exp( T 4 ) since T (mi ) = T 4 for mi corresponding to (ai ; yi ). Since the
number of elements of ~ai (mi ) to send (ai;t ; yi;t ) is log2 jAi j jYi j, the probability that player j infers
(ai;t ; yi;t ) correctly is
1
log2 jAi j jYi j
1
exp( T 4 ):
On the other hand, the cardinality of the message (ai;t ; yi;t ) in (76) is jAi j jYi j.
Hence, the probability of the correct inference, to the power of the cardinality of the messages
82
in (76), converges to one as T goes to in…nity:
1
log2 jAi j jYi j
1
exp( T 4 )
jAi jjYi j
1
jAi j jYi j log2 jAi j jYi j
1
exp( T 4 ) !T !1 1:
As will be seen in Lemma 48 formally, this means that we can slightly modify the reward taking
errors into account, so that it is strictly optimal for player i to tell the truth about (ai;t ; yi;t ).
cancel
[ j (r)](aj;t ; yj;t )
i
Similarly, we can modify
cancel
i
so that
cancels out the e¤ect of ai;t in
period t of the coordination or main block on the modi…ed version of (76).
T(r;k(r))
Next, consider the reward to incentivize player i to tell the truth about fi [hi
1
T(r;ki (r))
fi [hi
](j)6=jT(r)j
2
3
P
t2T(r;ki (r))
T(r;ki (r))
Without cheap talk, since the cardinality of fi [hi
T(r;ki (r))
to send fi [hi
2
jT (r)j
2
3
jT (r)j
T
P
2
3
(77)
:
2
] is jT (r)j 3 jAi jjYi j , the length of ~ai (mi )
] is log2 jT (r)j 3 jAi jjYi j . In addition, since the cardinality of (ai;t ; yi;t ) is jAi j jYi j,
the length of ~ai (mi ) to send (ai;t ; yi;t ) is log2 jAi j jYi j.
2
3
1ai;t (j);yi;t (j)
]:
Since each subround has jT (r; ki (r))j =
T(r;ki (r))
periods, the total length of ~ai (mi )’s that are used to calculate fi [hi
t2T(r;ki (r))
](j) and
1ai;t (j);yi;t (j) is
2
2
log2 jT (r)j 3 jAi jjYi j + jT (r)j 3 log2 jAi j jYi j :
Hence, all the messages transmit correctly with probability
1
2
2
1
1
(log2 jT (r)j 3 jAi jjYi j exp( |{z}
T 2 ) + jT (r)j 3 log2 jAi j jYi j exp( |{z}
T 4 )):
T(r;k(r))
# of repetitions for fi [hi
(78)
# of repetitions for (ai;t ;yi;t )
]
On the other hand, the cardinality of the messages used in (77) is calculated as follows: The
2
2 P
T(r;k (r))
cardinality of fi [hi i ](j) is jT (r)j 3 jAi jjYi j . As for jT (r)j 3 t2T(r;ki (r)) 1ai;t (j);yi;t (j) , when player i
sends the message as if (~
ai;t ; y~i;t )t2T(r;ki (r)) were the true message, the distribution of this summation
P
ai;t ; y~i;t )t2T(r;ki (r)) . Hence,
t2T(r;ki (r)) 1ai;t (j);yi;t (j) depends only on the frequency of each (ai ; yi ) in (~
the relevant cardinality of (ai;t (j); yi;t (j))t2T(r;ki (r)) for (77) is equal to that of the frequency, that is,
2
jT (r)j 3 jAi jjYi j .
83
Hence, (78) to the power of the relevant cardinality is
2
log2 jT (r)j
1
2
jAi jjYi j
3
2
3
1
2
2
jT(r)j 3 jAi jjYi j jT(r)j 3 jAi jjYi j
1
4
exp( T ) + jT (r)j log2 jAi j jYi j exp( T )
!T !1 1:
Therefore, we can slightly modify the reward function so that player i wants to tell the truth about
T(r;k(r))
fi [hi
].
In addition, we add
c:i:
i
de…ned in Section 6.3 in order to incentivize player i to take
S(t)
i
in the
round for conditional independence.
Finally, we add the reward
i (xj ; yj )
de…ned in (9), so that it cancels out the e¤ect of the
instantaneous utilities.
In total, we have
report
(xj ; hj L ; hreport
)
i
j
X
=
t:report block
+
R
X
r=1
+
i (xj;t ; yj;t )
8
>
>
>
>
>
>
>
>
<
1fr<l g
i
>
>
>
>
>
>
>
>
:
T
11
P
t2T(r;ki (r))
T
X
T
+T
(
15
11
1ft=texclude (r)g
j
:
1 c:i:
i
modi…cation of
E 1aj;t ;yj;t j
1aj;t ;yj;t
1ft=texclude (r)g modi…cation of
j
modi…cation of 1
t:round for sending the history
15.2.4
8
<
T(r;ki (r))
fi [hi
(aj;t ; yj;t ) [ (aj; ; yj; )
cancel
[ j (r)](aj;t ; yj;t )
i
](j)6=jT(r)j
2S(t)
:
j (r); ai;t (j); yi;t (j)
2
3
P
t2T(r;ki (r))
1ai;t (j);yi;t (j)
)
Incentive Compatibility
Let us verify player i’s incentive. By
i (xj;t ; yj;t ),
(9) ensures that we can neglect the instantaneous
utility. In the round for conditional independence, Claim 1 of Lemma 4 implies that, once player
i deviates, then she incurs a loss of T
other terms in
report
i
are of order T
11
1
"strict from T
1 c:i:
i .
Since the modi…cation is small, the
. Hence, it is optimal for player i to take
S(t)
i
in the round
for conditional independence.
Given this incentive, Claim 2 of Lemma 4 ensures that player i in the round for sending the
history cannot update player j’s inference of player i’s message.
Moreover, (9) and Claim 3 of
Lemma 4 ensures that player i is indi¤erent between any action in the round for sending the history
84
2
9 9
= >
>
>
>
>
; >
>
>
=
>
>
>
>
>
>
>
>
;
in terms of the expected value of
ui (at ) +
i (aj;t ; yj;t )
+
X
(ui (a ) +
i (aj;
; yj; )) +
c:i:
i
2S(t)
(aj;t ; yj;t ) [ (aj; ; yj; )
2S(t)
:
Since the incentive with cheap talk is strict, the message transmit correctly with a high probability,
and the modi…cation is small, it is incentive compatible for player i to follow the equilibrium strategy.
In summary, we construct an incentive compatible strategy and reward without cheap talk. The
key idea is to use the round for conditional independence to keep conditional independence property
in the round for sending the history.
15.3
Dispensing with Public Randomization
Now we explain how to dispense with the public randomization. Recall that the public randomization plays two roles in the report block. The …rst is to pick who reports the history. The second
is to pick a subround k(r).
Let us focus on the …rst one. (The second one is dispensed with by a similar procedure. See
Appendix A.11 for the details.)
The purpose of this …rst role is to establish the following two
properties: (i) ex ante (before the report block), every player has a positive probability to report
the history, and (ii) ex post (after the realization of the public randomization), there is only one
player who reports the history.
The property (i) is important since, without adjusting the reward function by
adjust
i
based
on player i’s report in the report block, player i’s equilibrium strategy would not be optimal.
The property (ii) is important to incentivize player i to tell the truth. Remember that player j
incentivizes player i to tell the truth by punishing player i according to (64). As seen in (70), the
establishment of truthtelling incentive uses the fact that player i cannot update her belief about
the realization of player j’s history in period texclude
(r). If both players sent messages by taking
j
actions and player i could observe a part of player j’s messages before …nishing reporting her own
history, then player i might be able to learn player j’s history in period texclude
(r) and would want
j
to tell a lie.
In order to establish these two properties without public randomization, we consider the following
procedure. For simplicity, let us assume that the signals are not conditionally independent (see the
85
end of this subsection for the case with conditionally independent signals): There exists ap:r: 2 A
such that, given ap:r: , there exist y1p:r: ; y1p:r: 2 Y1 and y2p:r: 2 Y2 such that
Pr (y2p:r: j ap:r: ; y1p:r: ) 6= Pr (y2p:r: j ap:r: ; y1p:r: ) :
Fix those ap:r: , y1p:r: ; y1p:r: 2 Y1 , and y2p:r: 2 Y2 . Without loss, we assume that
Pr (y2p:r: j ap:r: ; y1p:r: ) > Pr (y2p:r: j ap:r: ; y1p:r: ) :
(79)
At the beginning of the report block, the players take a pure strategy ap:r: .21
Let tp:r: be the
period when the players take ap:r: . Each player i observes her own signal yi;tp:r: . By Assumption
2, since player j can statistically identify if player i takes aip:r: , there exists
p:r:
i
: Aj
that it is optimal for player i to take ap:r:
i :
E ui (ai ; aj ) +
p:r:
i (aj ; yj )
j ai ; ajp:r:
8
< 0 if a = ap:r: ,
i
i
=
p:r:
: 1 if a 6= a .
i
i
Yj ! R such
(80)
Intuitively, player 2 asks player 1 to guess whether player 2 observed y2;tp:r: = y2p:r: or y2;tp:r: 6=
y2p:r: . On the other hand, since players’signals are conditionally dependent, player 1’s conditional
likelihood of y2;tp:r: = y2p:r: against y2;tp:r: 6= y2p:r: depends on y1;tp:r: 2 Y1 . In particular, (79) ensures
that there exists pp:r:
such that
1
Pr
Pr
y2;tp:r: = y2p:r: j ap:r: ; y1p:r:
Pr
p:r:
>
p:r:
p:r: > p1
p:r:
y2;tp:r: 6= y2
j a ; y1
Pr
y2;tp:r: = y2p:r: j ap:r: ; y1p:r:
:
y2;tp:r: 6= y2p:r: j ap:r: ; y1p:r:
Perturbing pp:r:
if necessary, we make sure that there is no y1 2 Y1 such that
1
Pr
Pr
21
y2;tp:r: = y2p:r: j ap:r: ; y1
= p1p:r: :
y2;tp:r: 6= y2p:r: j ap:r: ; y1
The superscript p:r: stands for “public randomization.”
86
Given this de…nition, we partition the set of player 1’s signals into Y1report and Y1not
(
Pr
y 1 2 Y1 :
Pr
(
Pr
y 1 2 Y1 :
Pr
Y1report
Y1not
with Y1report [ Y1not
Y1not
report
report
report
y2;tp:r: = y2p:r: j ap:r: ; y1
> p1p:r:
p:r:
p:r:
y2;tp:r: 6= y2
j a ; y1
y2;tp:r: = y2p:r: j ap:r: ; y1
< p1p:r:
p:r:
p:r:
y2;tp:r: 6= y2
j a ; y1
p:r:
about
3 y1p:r: ;
3 y1p:r:
.” (In this explanation, we assume that cheap talk is available. Cheap talk is dispensable
= 2 denote the event that y1;tp:r: 2 Y1non
p:r:
)
:
= Y1 . Player 1 is asked to report whether “y1;tp:r: 2 Y1report ” or “y1;tp:r: 2
as in Section 15.2.) For notational convenience, let
and
)
report
p:r:
report
= 1 denote the event that y1;tp:r: 2 Y1report ;
.
Let ^p:r: 2 f1; 2g be player 1’s message
.
To incentivize player 1 to tell the truth, player 2 rewards player 1 if either “player 2 observes
y2p:r: and player 1 reports ^p:r: = 1”or “player 2 does not observe y2p:r: and player 1 reports ^p:r: = 2”:
T
4
1fy2;tp:r: =yp:r: ^^p:r: =1g + p1p:r: 1fy2;tp:r: 6=yp:r: ^^p:r: =2g :
2
2
After player 1 sends ^p:r: , the players send the message about hi
L
(81)
sequentially: Player 1 sends
the messages …rst, and then player 2 sends the messages. The players coordinate on how to send
the messages based on ^p:r: as follows.
If player 1 reports ^p:r: = 1, then player 2 adjusts player 1’s reward function based on player 1’s
messages, and incentivizes player 1 to tell the truth according to (64), (65), and (66) with i = 1.
On the other hand, if player 1 reports ^p:r: = 2, then player 2 incentivizes player 1 to send an
uninformative message (formally, we extend the message space to fgarbageg [ M1 and player 2
gives the positive reward only if player 1 sends fgarbageg if ^p:r: = 2.
She does not adjust the
reward function for player 1 if ^p:r: = 2). As in the case with public randomization, given player
1’s message ^p:r: , player 1 wants to send the true message mi after ^p:r: = 1; and she wants to send
fgarbageg after ^p:r: = 2.
On the other hand, player 1 adjusts the reward and incentivizes player 2 to tell the truth
according to (64), (65), and (66) with i = 2, only after player 1 sends fgarbageg. (If ^p:r: = 1, then
player 1 makes player 2 indi¤erent between any messages and
87
adjust
2
is identically equal to zero.)
Figure 15: How to coordinate without public randomization
See Figures 15 and 16 for illustration.
Since Y1report and Y1non
report
are nonempty, with truthtelling, both ^p:r: = 1 (that is, y1;tp:r: 2
Y1report ) and ^p:r: = 2 (that is, y1;tp:r: 2 Y1not
report
) happen with a positive probability.
Hence,
each player has the positive probability to get her reward adjusted (that is, the property (i) is
established).
Moreover, the second sender (player 2) can condition that player 1’s message was
fgarbageg, and so there is no learning about h1 L (that is, the property (ii) is established).
We are left to verify that it is optimal for the players to take ap:r: and player 1 has the incentive
to tell the truth about
p:r:
. Consider player i’s incentive to take a pure strategy ap:r: . ap:r: a¤ects
player i’s payo¤ through (80), (81), adjustment of the reward
adjust
,
i
the rewards for truthtelling
(64), (65), and (66). Since the e¤ects other than (80) are su¢ ciently small compared to (80), it is
optimal to take ap:r:
for each i 2 f1; 2g.
i
Given that the players take ap:r: , the report ^p:r: a¤ects player 1’s payo¤ through (81), adjustment
of the reward
adjust
,
1
the rewards for truthtelling (64), (65), and (66), since (80) is sunk at the point
of sending ^p:r: . Except for (81), the e¤ect is of order T
T
4
h
i
E 1fy2;tp:r: =yp:r: g j ap:r: ; y1;tp:r:
2
where O(T
5
O(T
5
)>T
) is a random variable of order T
Pr
Pr
5
5
4 p:r:
p1 E
. Hence, it is optimal to send ^p:r: = 1 if
h
i
1fy2;tp:r: 6=yp:r: g j ap:r: ; y1;tp:r: + O(T
2
. That is,
y2;tp:r: = y2p:r: j ap:r: ; y1;tp:r:
> p1p:r: + O(T
p:r:
p:r:
y2;tp:r: 6= y2
j a ; y1;tp:r:
88
1
):
5
);
Figure 16: Player 2 can condition that she did not learn player 1’s history
For a su¢ ciently large T , therefore, it is optimal to send ^p:r: = 1 if y1;tp:r: 2 Y1report . Similarly, we
can show that it is optimal to send ^p:r: = 2 if y1;tp:r: 2 Y1not
truth about
p:r:
report
. That is, it is optimal to tell the
.
In summary, we construct an incentive compatible strategy and reward such that the players
coordinate on whether player 1 sends the history truthfully, by asking player 1 to guess whether
player 2 observed y2;tp:r: = y2p:r: or y2;tp:r: 6= y2p:r: .
Moreover, player 2’s report matters only after
player 1 sends the garbage, which incentivizes player 2 to tell the truth. Since the players’signals
are correlated, player 1’s guess di¤ers after di¤erent observations of player 1’s signals. Hence, both
players have positive probabilities of getting their reward adjusted based on the report block.
Finally, if the players’signals are conditionally independent, player 2 takes a mixed strategy and
asks player 1 to guess which action she takes, rather than asks player 1 to guess whether player 2
observed y2;tp:r: = y2p:r: or y2;tp:r: 6= y2p:r: .
The proof is the same as above, except that, since player 2 takes a mixed strategy, player 2 has
to be indi¤erent between multiple actions in period tp:r: . If player 2 had di¤erent probabilities of
getting her reward adjusted after di¤erent actions of hers, then depending on player 2’s expectation
89
of the adjustment, player 2 would have di¤erent incentives to take di¤erent actions.
To avoid
such complication, as will be seen in Appendix A.11.7.3 formally, we make sure that the magnitude
of getting her reward adjusted does not depend on player 2’s action.
Then, player 2 becomes
indi¤erent between actions.
16
Veri…cation of (5)–(8)
Note that Section 7 de…nes the structure of the …nitely repeated game with T being a parameter.
For each T , Section 13 de…nes player i’s strategy in the coordination and main blocks, and Lemma
13 de…nes her strategy in the report block
report
jh
i
i
L
. Hence, we have …nished de…ning the strategy.
In addition, for each T , Section 14.2 de…nes the reward function as
TP +1
)
i (xj ; hj
=
i (xj )
R
X
+
X
xj
i (aj;t ; yj;t )
r=1:round r is not a review round t2T(r)
+
+
Section 14.2 pins down
L n
X
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l)
i
+
o
adjust
; l)
(xj ; hj l ; hreport
j
i
l=1
report
(xj ; hj L ; hreport
):
i
j
i (xj ),
xj
i ,
and
review
,
i
and Lemma 13 pins down
adjust
i
and
report
.
i
(82)
Hence,
we have …nished de…ning the reward function. Therefore, we are left to verify (5)–(8).
It is useful to recall that Claim 2 of Lemma 13 de…nes
adjust
(xj ; hj l ; hreport
; l)
i
j
so that, given
player i’s equilibrium strategy in the report block, we have
h
review
(xj ; hj l ; l)
i
adjust
(xj ; hj l ; hreport
; l)
i
j
j xj ; hi l
h
i
T(l)
target
l
l
l
include
= E i
(xj ; fi [hi ]; fj
[hj ]; fj [hj ]; l) j xj ; hi
E
+
90
i
(83)
since
i (l)(j)
= B implies
review
i
target
i
=
by (57) and (59). Moroever, (59) ensures that
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
= 1f
i (l)(j)=Bg
8
<
:
sign (xj ) LT u +
f
16.1
>
>
>
>
>
>
:
X
t2T(l)
+1f i (l)(j)=Gg
8
>
>
>
>
>
>
< 1
(84)
9
=
xj
i (aj;t ; yj;t )
;
8
9
8
<
>
T fui (xj ) ui ( (x(j)))g =
>
>
< 1
P
f j (l)=G_f j (l)=B^fi [h<l
i ]62 i (x(j);l)gg :
+ t2T(l) i [ (x(j))](aj;t ; yj;t ) ;
j (l)=Gg
>
>
>
: +1
fui (xj ) ui (BRi ( j ; (x(j))); j ; (x(j)))g
f j (l)=B^fi [h<l
i ]2 i (x(j);l)g
n
o
P
xj
+1f j (l)=Bg sign (xj ) LT u + t2T(l) i (aj;t ; yj;t )
9
>
>
>
=
9
>
>
>
>
>
>
=
>
>
:
>
; >
>
>
>
>
>
;
Incentive Compatibility, Promise Keeping, and Full Dimensionality
We prove the optimality of player i’s equilibrium strategy by backward induction: In the report
block, player i maximizes
E
"
X
ui (ai;t ) +
t:report block
L
X
adjust
(xj ; hj l ; hreport
; l)
i
j
+
report
(xj ; hj L ; hreport
)
i
j
l=1
j xj ; hi
L
#
since the other rewards in (82) are sunk in the report block. Hence, by Lemma 13, it is optimal to
take
report
jh
i
i
L
for each xj 2 fG; Bg.
The equilibrium payo¤ (without taking the average) in the report block satis…es
E
"
X
ui (ai;t ) +
report
(xj ; hj L ; hreport
)
i
j
t:report block
by Claim 3 of Lemma 13 (we include
j xj ;
adjust
(xj ; hj l ; hreport
; l)
i
j
report
jh
i
i
L
; hi
L
#
=0
to the equilibrium payo¤ in review
round l, rather than in the report block).
In the last review round L, since the continuation payo¤ from the report block is zero, player i
P
ignores t:report block ui (ai;t ) + report
(xj ; hj L ; hreport
). In addition, by (62), the expected value of
i
j
X
adjust
(xj ; hj l ; hreport
; l)
i
j
l L 1
91
does not depend on player i’s strategy after review round L 1. Since the rewards
PR
P
xj
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l) for l
r=1:round r is not a review round
t2T(r) i (aj;t ; yj;t ), and i
i (xj ),
L 1
are sunk in review round L, player i wants to maximize, with l = L,
2
E4
X
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l)
i
ui (at ) +
adjust
(xj ; hj l ; hreport
; l)
i
j
+
t2T(l)
j xj ;
3
<l 5
:
i (xi ); hi
Given (83), by the law of iterated expectation, player i wants to maximize, with l = L,
2
X
E4
ui (at ) +
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
t2T(l)
Hence, by Lemma 2 and (59), if
2
E4
X
j (l)
3
<l 5
:
i (xi ); hi
j xj ;
= B, then player i wants to maximize
X
ui (at ) + sign (xj ) LT u +
t2T(l)
xj
i (aj;t ; yj;t )
t2T(l)
j xj ;
3
<l 5
:
i (xi ); hi
Claim 2 of Lemma 2 ensures that any strategy is optimal and the equilibrium value (without taking
x
the average) is equal to sign(xj ) LT u + T ui j .
On the other hand, if
if
j (l)
j (l)
= G, then we have
= G or “ j (l) = B and h<l
i 62
2
E4
X
t2T(l)
ui (at ) + T fui (xj )
ui (
i (x(j); l)”,
i (l)(j)
= G by Claim 3 of Lemma 11. Hence,
then player i wants to maximize
X
(x(j)))g +
(x(j))](aj;t ; yj;t ) j xj ;
i[
t2T(l)
3
<l 5
:
i (xi ); hi
By Claim 1 of Lemma 2, any action is optimal and player i’s equilibrium value is T ui (xj ).
addition,
j (l)
= B and h<l
i 2
2
E4
X
i (x(j); l),
ui (at ) + ui (xj )
In
then player i wants to maximize
ui (BRi (
;
j
(x(j)));
t2T(l)
;
j
(x(j))) j xj ;
3
<l 5
:
i (xi ); hi
Since the reward is constant, player i wants to take a static best response to player j’s strategy.
Moreover, Claim 4 of Lemma 11 ensures that player j with
Hence, BRi
;
j
(x(j)) is optimal. Since
i (x(j); l)
92
j (l)
= G and
j (l)
= B takes
;
j
(x(j)).
is the set of player i’s history in which player
i has x(i) = x(j) and
i (x(j); l)
j (l)(i)
;
takes BRi
j
= B with probability one.
Figure 10 ensures that player i with
(x(j)) with probability one.
Hence, player i’s strategy is optimal.
Moreover, her equilibrium value is T ui (xj ).
In total, with l = L, the equilibrium payo¤ is
2
E4
=
8
<
X
T(l)
target
(xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l)
i
ui (at ) +
t2T(l)
if
T ui (xj )
: sign (x ) LT u + T uxj if
j
i
Since the distribution of
j (l)
j (l)
= G;
j (l)
= B:
3
5
j xj ; h<l
i
does not depend on player i’s strategy by Claim 2 of Lemma 11,
player i before review round L ignores the continuation payo¤ from review round L on.
Again,
(62) ensures that the expected value of
X
adjust
; l)
(xj ; hj l ; hreport
j
i
l L 1
does not depend on player i’s strategy after review round L
1. Ignoring the rewards in (82) that
have been sunk, player i in the supplemental rounds for
and
2
E4
t2T(
X
ui (at ) +
1 (l))[T(
t2T(
2 (l))
X
1 (l))[T(
1 (l)
2 (l)
xj
i (aj;t ; yj;t )
2 (l))
with l = L maximizes
j xj ; hi
1
3
l 15
:
x
Therefore, any strategy is optimal and the equilibrium value is equal to 2T 2 ui j since jT( 1 (l))j =
1
jT( 2 (l))j = T 2 .
Again, this value does not depend on the previous history.
L
Hence, player i in review round
1 ignores the continuation payo¤ from the supplemental round for
1 (L)
on.
Therefore, by
the same proof as review round L, we can establish the optimality of the equilibrium strategy.
Recursively, we can show that the equilibrium strategy is optimal in the main and report blocks,
and the equilibrium payo¤ from the main and report blocks is equal to
2 (L
1) T
1
2
x
ui j
+
L
X
l=1
1f
j (l)=Gg
T ui (xj ) + 1f
93
x
j (l)=Bg
sign (xj ) LT u + T ui j
:
Since this value does not depend on player i’s strategy in the coordination block, player i in the
coordination block maximizes
E
"
X
ui (at ) +
xj
i (aj;t ; yj;t )
t:coordination block
#
j xj :
1
2
x
Hence, any strategy is optimal and equilibrium value is equal to 4T 2 + 2T 3 ui j , recalling that
1
2
4T 2 + 2T 3 is the length of the coordination block.
Therefore, in total, player i’s value from the review phase except for
1
2
2
3
4T + 2T + 2 (L
1) T
1
2
x
ui j
+
L
X
l=1
Hence, by de…nition of
i (xj )
1f
j (l)=Gg
T ui (xj ) + 1f
i (xj )
is equal to
x
j (l)=Bg
sign (xj ) LT u + T ui j
:
(85)
in (52), the total equilibrium value (without taking the average) is
equal to TP vi (xj ), as desired. In total, we have veri…ed incentive compatibility, promise keeping,
and full dimensionality.
16.2
Self Generation
PL
We …rst show that we can ignore the terms except for
" > 0 such that, for each xj 2 fG; Bg, we have
(15 + 7L)
x
jui (xj )j + Lu + ui j
(16) ensures that there exists such " > 0.
P
Note that the terms other than Ll=1
review
(xj ; hj l ; l).
i
l=1
+ " < jui (xj )
review
(xj ; hj l ; l)
i
in
To see this, de…ne
(86)
vi (xj )j :
TP +1
)
i (xj ; hj
satisfy, for each hj L and
hreport
,
j
8
>
>
>
>
>
>
>
>
>
>
<
>
>
>
>
>
>
>
>
>
>
:
i (xj )
= TP vi (xj )
PR
E
hP
L
l=1
1f j (l)=Gg T ui (xj ) + 1f j (l)=Bg T sign (xj ) LT u +
o
n
1
2
x
(2 + 2L) T 2 + 2T 3 ui j ;
r=1:round r is not a review round
xj
i (aj;t ; yj;t )
adjust
(xj ; hj l ; hreport
; l)
i
j
exp
(TP
1
T5
11
report
(xj ; hj L ; hreport
)
i
j
LT ) u by Lemma 2,
by Lemma 13,
Kreport T 12 by Lemma 13.
94
x
T ui j
i
By Lemma 5, TP and LT are similar to each other for a large T . Hence, for a su¢ ciently large T ,
for each hj L and hreport
, we have
j
8
9
PR
xj
<
=
1
i (xj ) +
r=1: round r is not a review round i (aj;t ; yj;t )
TP : + PL adjust (xj ; h l ; hreport ; l) + report (xj ; h L ; hreport ) ;
j
j
i
j
j
l=1 i
2
hP
xj
L
1
1
vi (xj ) E
l=1 1f j (l)=Gg L ui (xj ) + 1f j (l)=Bg L sign (xj ) Lu + ui
hP
2 4
xj
L
1
1
vi (xj ) E
l=1 1f j (l)=Gg L ui (xj ) + 1f j (l)=Bg L sign (xj ) Lu + ui
Moreover, by Claim 2 of Lemma 11, the probability of
each l. Taking l = L, this means that
Since
1
j (l)
j (L)
j (l)
i
i
j (l)
+"
5:
= B is no more than (15 + 8L)
= G with probability no less than 1
= B is absorbing, this also means that
";
3
for
(15 + 8L) .
= G for each l with probability no less than
(15 + 8L) . Hence, if xj = G, we have
9
8
PR
xj
<
=
1
i (xj ) +
r=1: round r is not a review round i (aj;t ; yj;t )
TP : + PL adjust (xj ; h l ; hreport ; l) + report (xj ; h L ; hreport ) ;
j
j
j
j
i
l=1 i
vi (xj )
ui (xj ) + (15 + 8L)
x
jui (xj )j + Lu + ui j
+ ";
which is non-positive by (86). Similarly, if xj = B, we have
9
8
PR
xj
<
=
1
i (xj ) +
r=1: round r is not a review round i (aj;t ; yj;t )
lim
P
T !1 TP :
) ;
; l) + report
(xj ; hj L ; hreport
(xj ; hj l ; hreport
+ Ll=1 adjust
j
j
i
i
vi (xj )
ui (xj )
(15 + 8L)
x
jui (xj )j + Lu + ui j
"
0:
n
PL
Hence, for a su¢ ciently large T , we have sign(xj ) i (xj ; hj TP +1 )
l=1
PL review
l
Therefore, it su¢ ces to show that sign(xj ) l=1 i
(xj ; hj ; l) 0.
P
Moreover, if j (l) = B for some l = 1; :::; L, then we have sign(xj ) Ll=1
95
o
0.
review
(xj ; hj l ; l)
i
0.
review
(xj ; hj l ; l)
i
To see this, recall that, for each history of player j, we have
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l)
i
= 1f
i (l)(j)=Bg
+1f
8
<
sign (xj ) LT u +
:
8
>
>
>
>
>
>
< 1f
i (l)(j)=Gg
X
j (l)=Gg
>
>
>
>
>
>
:
8
>
>
>
<
xj
i (aj;t ; yj;t )
t2T(l)
>
>
>
: +1
f
1f
j (l)=Gg
8
<
9
=
;
: +
T fui (xj )
P
Claim 5 of Lemma 11, x(j) = xj if
: sign (x ) 1
j
f
j (l)=Gg
j (l)=Gg
T fui (xj )
fui (xj )
ui (
sign (xj )
j (l)=Gg
1f
Therefore, if
j (l)
L
X
i
ui (BRi (
;
j
(x(j)))g
;
(x(j)));
Hence, together with Claim 3 of Lemma 11 ( i (l)(j) = B )
1f
;
xj
i (aj;t ; yj;t )
t2T(l)
9
>
>
>
>
>
>
=
0. Moreover, by
= G. Hence, (15) ensures that
j (l)
sign (xj ) 1f
P
9
>
>
>
=
>
>
> :
>
(x(j))); j (x(j)))g ; >
>
>
o
>
>
xj
;
(aj;t ; yj;t )
;
T fui (xj ) ui (BRi ( j
n
P
+1f j (l)=Bg sign (xj ) LT u + t2T(l)
j (l)=Bg
9
=
(x(j)))g
(x(j))](aj;t ; yj;t ) ;
i[
t2T(l)
For each history of player j, by Lemma 2, we have sign(xj )
8
<
ui (
j
j (l)
0;
(x(j)))g
0:
= B), we have
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l)
i
sign (xj )
X
i[
(x(j))](aj;t ; yj;t ) + 1f
t2T(l)
j (l)=Gg
T u + 1f
j (l)=Bg
j (l)=Bg
LT u
LT u by Lemma 2.
= B for some l = 1; :::; L, then we have
sign (xj )
T(l)
review
(xj ; fjinclude [hj l ]; fj [hj ]; l)
i
(L
1) T u + LT u
0;
l=1
as desired.
Therefore, we focus on the case with
j (l)
= G for each l = 1; :::; L. By Claim 5 of Lemma 11,
we have xj (j) = xj . Let l be the last review round with
each l = 1; :::; L). If xj = G, then recalling that
96
;
j
j (l)
(x(j)) =
= G (de…ne l = L if
j (x(j))
j (l)
= G for
with xj (j) = xj = G, we
have
L
X
l=1
=
T(l)
review
(xj ; hj ; l)
i
8
l <
X
l=1
:
T fui (xj )
L
X
+
ui (
(x(j)))g +
X
i[
t2T(l)
T ui (xj )
ui (BRi (
;
j
;
(x(j)));
j
9
=
(x(j))](aj;t ; yj;t )
;
(x(j)))
l=l+1
lT
max
xi (j)2fG;Bg
fui (xj )
ui (
(xj ; xi (j)))g + (L
l)T f
ui (xj )
|
ui (BRi (
|
j (x(j)));
maxxi (j)2fG;Bg fui (xj ) ui (
+
P
j
LT
l 1
X
u
T
4L
l=1
| {z }
since
t2T(l)
i[
since
j (l)=G,
(x(j))](aj;t ;yj;t )j
xi (j)2fG;Bg
fui (xj )
u
T
4
u
T
4L
ui (
](aj ;yj ):Aj Yj ![
i[
u u
;
4 4
(x(j)))
g
j (x(j)))
}
}
(xj ;xi (j)))g since xj (j)=xj
] by Lemma 2,
is the maximum realization
P
(x(j))](aj;t ;yj;t )
t2T(l) i [
for
for l=1;:::;l 1
max
u
T
4
|{z}
+
ui (
{z
{z
u
u
(xj ; xi (j)))g + T + T
4
4
0
by (15). Similarly, if xj = B, we have
L
X
T(l)
review
(xj ; hj ; l)
i
l=1
=
8
l <
X
l=1
+
:
T fui (xj )
L
X
T ui (xj )
ui (
(x(j)))g +
X
i[
t2T(l)
ui (BRi (
;
j
;
(x(j)));
j
9
=
(x(j))](aj;t ; yj;t )
;
(x(j)))
l=l+1
lT
min
xi (j)2fG;Bg
+(L
l)T f
l 1
X
u
T
4L
l=1
LT
max
fui (xj )
ui (
(xj ; xi (j)))g
ui (xj ) ui (BRi ( j ; (x(j)));
|
{z
minxi (j)2fG;Bg fui (xj ) ui (BRi ( j ; (x(j))); j ;
xi (j)2fG;Bg
u
T
4
ui (xj )
max ui (BRi (
;
j
(x(j)))
g
}
(x(j)))g since xj (j)=xj
(x(j)));
97
;
j
;
j
(x(j))); ui (
(xj ; xi (j)))
u
u
+ T+ T
4
4
0
by (15). Therefore, self generation is satis…ed.
Since we have proven (5)–(8), Lemma 1 ensures that Theorem 1 holds.
References
Aoyagi, M. (2002): “Collusion in dynamic Bertrand oligopoly with correlated private signals and
communication,”Journal of Economic Theory, 102(1), 229–248.
Aoyagi, M. (2005): “Collusion through Mediated Communication in Repeated Games with Imperfect Private Monitoring,”Economic Theory, 25(2), pp. 455–475.
Boucheron, S., G. Lugosi, and P. Massart (2013): Concentration Inequalities: A Nonasymptotic Theory of Independence. OUP Oxford.
Compte, O. (1998): “Communication in repeated games with imperfect private monitoring,”
Econometrica, 66(3), 597–626.
Deb, J. (2011): “Cooperation and community responsibility: A folk theorem for repeated random
matching games,”mimeo.
Ely, J., J. Hörner, and W. Olszewski (2005): “Belief-free equilibria in repeated games,”
Econometrica, 73(2), 377–415.
Ely, J., and J. Välimäki (2002): “A robust folk theorem for the prisoner’s dilemma,”Journal of
Economic Theory, 102(1), 84–105.
Fudenberg, D., and D. Levine (2007): “The Nash-threats folk theorem with communication
and approximate common knowledge in two player games,”Journal of Economic Theory, 132(1),
461–473.
Fudenberg, D., D. Levine, and E. Maskin (1994): “The folk theorem with imperfect public
information,”Econometrica, 62(5), 997–1039.
Fudenberg, D., and E. Maskin (1986): “The folk theorem in repeated games with discounting
or with incomplete information,”Econometrica, 53(3), 533–554.
98
Harrington, J., and A. Skrzypacz (2011): “Private monitoring and communication in cartels:
Explaining recent collusive practices,”American Economic Review, 101(6), 2425–2449.
Hörner, J., and W. Olszewski (2006): “The folk theorem for games with private almost-perfect
monitoring,”Econometrica, 74(6), 1499–1544.
(2009): “How robust is the folk theorem?,” The Quarterly Journal of Economics, 124(4),
1773–1814.
Kandori, M. (2002): “Introduction to repeated games with private monitoring,”Journal of Economic Theory, 102(1), 1–15.
(2011): “Weakly belief-free equilibria in repeated games with private monitoring,”Econometrica, 79(3), 877–892.
Kandori, M., and H. Matsushima (1998): “Private observation, communication and collusion,”
Econometrica, 66(3), 627–652.
Kandori, M., and I. Obara (2006): “E¢ ciency in repeated games revisited: The role of private
strategies,”Econometrica, 74(2), 499–519.
Lehrer, E. (1990): “Nash equilibria of n-player repeated games with semi-standard information,”
International Journal of Game Theory, 19(2), 191–217.
Mailath, G., and L. Samuelson (2006): Repeated games and reputations: long-run relationships.
Oxford University Press.
Matsushima, H. (2004): “Repeated games with private monitoring: Two players,”Econometrica,
72(3), 823–852.
Miyagawa, E., Y. Miyahara, and T. Sekiguchi (2008): “The folk theorem for repeated games
with observation costs,”Journal of Economic Theory, 139(1), 192–221.
Obara, I. (2009): “Folk theorem with communication,” Journal of Economic Theory, 144(1),
120–134.
Piccione, M. (2002): “The repeated prisoner’s dilemma with imperfect private monitoring,”Journal of Economic Theory, 102(1), 70–83.
99
Radner, R., R. Myerson, and E. Maskin (1986): “An example of a repeated partnership game
with discounting and with uniformly ine¢ cient equilibria,” Review of Economic Studies, 53(1),
59–69.
Rahman, D. (2014): “The Power of Communication,”American Economic Review, 104(11), 3737–
51.
Renault, J., and T. Tomala (2004): “Communication equilibrium payo¤s in repeated games
with imperfect monitoring,”Games and Economic Behavior, 49(2), 313–344.
Sekiguchi, T. (1997): “E¢ ciency in repeated prisoner’s dilemma with private monitoring,”Journal of Economic Theory, 76(2), 345–361.
Stigler, G. (1964): “A theory of oligopoly,”The Journal of Political Economy, 72(1), 44–61.
Sugaya, T. (2012): “Folk theorem in repeated games with private monitoring: multiple players,”
mimeo.
Takahashi, S. (2010): “Community enforcement when players observe partners’past play,”Journal of Economic Theory, 145(1), 42–62.
Yamamoto, Y. (2007): “E¢ ciency results in N player games with imperfect private monitoring,”
Journal of Economic Theory, 135(1), 382–413.
(2009): “A limit characterization of belief-free equilibrium payo¤s in repeated games,”
Journal of Economic Theory, 144(2), 802–824.
(2012): “Characterizing belief-free review-strategy equilibrium payo¤s under conditional
independence,”mimeo.
100