Folk Theorem in Repeated Games with Private Monitoring Takuo Sugayay Stanford Graduate School of Business April 29, 2015 Abstract We show that the folk theorem generically holds for the repeated two-player game with private monitoring if the support of each player’s signal distribution is su¢ ciently large. Neither cheap talk communication nor public randomization is necessary. Journal of Economic Literature Classi…cation Numbers: C72, C73, D82 Keywords: repeated game, folk theorem, private monitoring [email protected] I thank Stephen Morris for invaluable advice, Yuliy Sannikov for helpful discussion, Andrzej Skrzypacz for carefully reading the paper, Yuichi Yamamoto for his suggestions to improve the exposition of the paper, and Dilip Abreu, Eduardo Faingold, Drew Fudenberg, Edoardo Grillo, Johannes Hörner, Yuhta Ishii, Michihiro Kandori, Vijay Krishna, George Mailath, Mihai Manea, Larry Samuelson, Tadashi Sekiguchi, Satoru Takahashi, and seminar participants at 10th SAET conference, 10th World Congress of the Econometric Society, 22nd Stony Brook Game Theory Festival, and Summer Workshop on Economic Theory for useful comments. I also thank Piotr Dworczak and Sergey Vorontsov for careful proofreading, and Jenna Abel for professional editorial service. The remaining errors are my own responsibility. y 1 1 Introduction A possibility of tacit collusion when a party observes only private signals about the other parties’ actions has been a long-standing open problem. Stigler (1964) considers a case in which a …rm can cut the price secretly and conjectures that, since the coordination on the punishment is di¢ cult with private signals, policing an implicit collusive agreement (punishing a deviator from the agreement) does not work e¤ectively. Harrington and Skrzypacz (2011) examine a recent cartel agreement, and construct a model to replicate how the …rms police the agreement. Since their equilibrium requires cheat-talk communication and monetary transfer, they show the possibility of explicit collusion, rather than tacit collusion. A theoretical possibility of tacit collusion is, therefore, yet to be proven.1 The literature on in…nitely repeated games o¤ers a theoretical framework to study tacit collusion. One of the key …ndings is the folk theorem: Any feasible and individually rational payo¤ pro…le can be sustained in equilibrium when players are su¢ ciently patient. This implies that su¢ ciently patient …rms can sustain tacit collusion to obtain any feasible and individually rational payo¤. Fudenberg and Maskin (1986) establish the folk theorem under perfect monitoring, in which players can directly observe the action pro…le. Fudenberg, Levine, and Maskin (1994) extend the folk theorem to imperfect public monitoring, in which players can observe only public noisy signals about the action pro…le. It has been an open question whether the folk theorem holds in general private monitoring, in which players observe private noisy signals about other players’ actions. Matsushima (2004), Hörner and Olszewski (2006), Hörner and Olszewski (2009), and Yamamoto (2012) show the folk theorem in restricted classes of private monitoring (see below for details), but little is known about general private monitoring. The objective of this paper is to solve this open problem: In general private monitoring with discounting,2 the folk theorem holds in the two-player game. 
Speci…cally, we identify su¢ cient conditions with which the folk theorem holds, and show that these su¢ cient conditions generically hold in private monitoring. In our equilibrium, we use neither cheap-talk communication nor 1 Another area where private signals are important is team-production. If each agent in a team subjectively evaluates how the other agents contribute to the production, then seeing subjective evaluation as a private signal, the situation is modeled as repeated games with private monitoring. 2 See Lehrer (1990) for the case with no discounting. 2 public randomization.3 Furthermore, in the companion paper Sugaya (2012), we generalize our equilibrium construction to a general game with more than two players. We now discuss papers about the folk theorem4 and then illustrate how we prove the folk theorem, built upon the pre-existing work. The driving force of the folk theorem is reciprocity: If a player deviates today, she will be punished in the future (Stigler (1964) call this reciprocity “policing the agreement”). For this mechanism to work, each player needs to coordinate her action with the other players’histories. Whether the players achieve coordination depends on the players’ information and thus on the monitoring structure. Fudenberg and Maskin (1986) establish the folk theorem under perfect monitoring. Because the histories are common knowledge in perfect monitoring, each player can coordinate her continuation strategy with the other players’ histories. Fudenberg, Levine, and Maskin (1994) extend the folk theorem to imperfect public monitoring by focusing on the equilibrium in which each player’s continuation strategy depends only on past public signals. Because the histories of public signals are common knowledge, players can coordinate their continuation play through public signals. On the other hand, in general private monitoring, each player’s actions and signals are her private information, so players do not share common information about the histories. Hence, the coordination becomes complicated as periods proceed. This is why Stigler (1964) conjectures that collusion is impossible with general private monitoring. If monitoring is almost public, then players believe that every player observes the same signal with a high probability. Hence, almost common knowledge about …nite-past histories exists. This almost common knowledge enables Hörner and Olszewski (2009) to show the folk theorem in almost public monitoring. Without almost public monitoring, almost common knowledge may not exist, and coordination is di¢ cult. To deal with this di¢ culty, Piccione (2002) and Ely and Välimäki (2002) focus on a tractable class of equilibria, a “belief-free equilibrium.” A strategy pro…le is belief free if, after each history pro…le, the continuation strategy of each player is optimal, conditional on the histories of the opponents. Hence, coordination is not an issue. Piccione (2002) and Ely and Välimäki (2002) show that the belief-free equilibrium sustains the set of feasible and individually rational 3 There are papers that prove folk theorems in private monitoring with cheap-talk or mediated communication (explicit collusion). See, for example, Compte (1998), Kandori and Matsushima (1998), Aoyagi (2002), Aoyagi (2005), Fudenberg and Levine (2007), Obara (2009), Rahman (2014), and Renault and Tomala (2004). 4 See also Kandori (2002) and Mailath and Samuelson (2006) for a survey. 
3 payo¤s in the two-player prisoners’ dilemma with almost perfect monitoring.5;6 However, with general monitoring or in a general game beyond prisoners’ dilemma, the belief-free equilibrium cannot sustain the folk theorem. See Ely, Hörner, and Olszewski (2005) and Yamamoto (2009) for the formal proof. Hence, there are two ways to generalize the belief-free equilibrium: One is to keep prisoners’dilemma payo¤ structure and to consider noisy monitoring; and the other is to keep almost perfect monitoring and to consider a more general stage game. The former approach by Matsushima (2004) and Yamamoto (2012) recovers the precision of the monitoring, assuming that monitoring is not perfect but conditionally independent: Conditional on action pro…les, the players observe statistically independent signals. The idea is to replace one period in the equilibrium of Ely and Välimäki (2002) with a long T -period review phase. When we aggregate information over a long phase and rely on the law of large numbers, we can recover the precision of the monitoring. To see the obstacle to further generalize their result to general monitoring with correlated signals, let us explain their equilibrium construction in the two-player prisoners’dilemma in which each player has two signals, good and bad. Suppose that with player i’s cooperation, player j 6= i observes a good signal gj with 60%, while with player i’s defection, player j observes a bad signal bj with 60%. To achieve approximately e¢ cient equilibrium, player j should not punish player i if she observes gj for more than (0:6 + ") T times during a review phase with a small ". Note that by the central limit theorem, 0:6T is what player j can expect from player i’s cooperation. Suppose now we are close to the end of a review phase. If player i’s signals are correlated with player j’s, then after rare histories, player i may believe that, given her own history and correlation, player j has already observed gj for more than (0:6 + ") T times. may want to switch to defection. After such a history, player i (If signals are conditionally independent, then player i after each history always believes that player j observes gj at most (0:6 + ") T times with a very high probability. Hence, switching does not happen.) Once player i switches her action based on her history, player j wants to learn player i’s switch of actions via player j’s private signals. Note that player i reviews player j’s actions by player 5 See Yamamoto (2007) for the N -player prisoners’dilemma. Kandori and Obara (2006) use a similar concept to analyze a private strategy in public monitoring. Kandori (2011) considers “weakly belief-free equilibria,” which is a generalization of belief-free equilibria. Apart from a typical repeated-game setting, Takahashi (2010) and Deb (2011) consider the community enforcement, and Miyagawa, Miyahara, and Sekiguchi (2008) consider the situation in which a player can improve the precision of monitoring by paying a cost. 6 4 i’s history and that player j wants to know how player i has reviewed player j so far. Player i’s switch of actions is informative about her history and so about how she has reviewed player j. Recursively, if a player switches actions, then we have to deal with a high order belief about each player’s history. To deal with this problem, in our equilibrium, we divide a review phase into multiple review rounds. If the division is …ne enough, we can make sure that the switch of actions only happens at the beginning of the review round. 
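To see concretely why a long review phase recovers the precision of monitoring, the following sketch computes exact binomial tail probabilities. It is illustrative only: the 0.6/0.4 signal probabilities are taken from the example above, while the round length T and the 50% cutoff are arbitrary choices rather than values from the paper. The point is that a single threshold separates cooperation from defection with error probabilities vanishing as T grows.

```python
# Illustrative only: the 0.6 / 0.4 probabilities come from the example in the text;
# the round length T and the 50% cutoff are arbitrary choices, not the paper's values.
from math import comb

def tail_prob(T, p, k):
    """P[Binomial(T, p) >= k]: probability of at least k good signals in T periods."""
    return sum(comb(T, n) * p ** n * (1 - p) ** (T - n) for n in range(k, T + 1))

T = 200
cutoff = T // 2   # any cutoff strictly between 0.4*T and 0.6*T works for large T

# With cooperation, player j sees the good signal w.p. 0.6; after a deviation, w.p. 0.4.
print("pass the review | cooperation:", round(tail_prob(T, 0.6, cutoff), 4))  # close to 1
print("pass the review | deviation  :", round(tail_prob(T, 0.4, cutoff), 4))  # close to 0
```

As T increases, both error probabilities decay exponentially, which is the law-of-large-numbers effect the review phase exploits.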
In addition, before each review round, player j tells player i whether player i should switch the actions. (Apart from the issue of player j’s incentive to tell the truth,) if this communication is successful, then player i does not need to learn player j’s switches. Since we do not assume cheap talk, this communication is done by player j taking di¤erent actions to send di¤erent messages. Since player i infers player j’s actions (messages) from noisy private signals, player i may make a mistake and player j may not realize player i’s mistake. In Section 10, we construct a module for a player to send the other player a message by taking actions with noisy private signals, so that, if player i suspects that she made a mistake, then she believes that player j has “realized”the mistake. The latter approach is to generalize Ely and Välimäki (2002) to a general stage game (here we focus on two-player games), keeping monitoring almost perfect. In the belief-free equilibrium in Ely and Välimäki (2002), in each period, each player i picks a state xi that can be G (good) or B (bad). In each period, given state xi 2 fG; Bg, player i takes a mixed action i (xi ). Together with the state transition, they make sure that, for each state of the opponent xj 2 fG; Bg, both i (G) and i (B) are optimal conditional on xj . A reason why the belief-free equilibrium in Ely and Välimäki (2002) cannot sustain the folk theorem in a general game is that it is hard for player j with j (B) to punish player i severely enough after a signal which statistically indicates her deviation, at the same time keeping both j (G) and j (B) optimal against both xi = G and xi = B and keeping the equilibrium payo¤ with xi = xj = G su¢ ciently high. Hörner and Olszewski (2006) overcome this di¢ culty as follows: They divide the repeated game into L-period phases.7 In each phase, each player i picks a state xi 2 fG; Bg. given state xi 2 fG; Bg, player i takes an L-period dynamic strategy 7 i (xi ). In each phase, Again, for each They use the term T -period blocks, but we use L and phases instead, in order to make the terminology consistent within the paper. 5 state of the opponent xj 2 fG; Bg, both i (G) and i (B) are optimal conditional on xj at the beginning of review phase. However, within a phase, since the players coordinate on x by taking actions at the beginning of the phase, it is optimal to adhere to corresponding to i (xi ) i (xi ) once player i takes an action at the beginning of the phase. Having L periods in the phase, they can create a severe punishment by letting player j with j (B) switch to a minimax strategy after a signal which statistically indicates the opponent’s deviation. Moreover, since the switch happens only after a rare signal which indicates a deviation, we can keep the payo¤ of j (G) j (B) (and that of in order to keep belief-free property at the beginning of the phase) su¢ ciently high given xi = G. There are two di¢ culties to generalize their construction to general monitoring: With general monitoring, the coordination on x by taking actions becomes harder since the actions are less precisely observed. Second, as seen in the case with Matsushima (2004), since each player switches actions based on past signals, each player may start to learn the opponent’s history by observing each other’s switches via noisy signals. This learning becomes complicated with noisy signals. 
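The block structure described above can be caricatured as a two-state automaton per player. The sketch below is a schematic illustration in the spirit of Ely and Välimäki (2002) and Hörner and Olszewski (2006), not the construction of this paper: the action labels, the phase length, and the "too many bad signals" transition rule are all invented for illustration.

```python
# A schematic two-state caricature in the spirit of Ely-Valimaki (2002) and
# Horner-Olszewski (2006). The action labels, the phase length L, and the
# "too many bad signals" transition rule are invented for illustration only.
import random

class BlockAutomaton:
    def __init__(self, L, bad_cutoff):
        self.L = L                 # length of one phase
        self.bad_cutoff = bad_cutoff
        self.state = "G"           # x_i in {G, B}, fixed for the whole phase

    def phase_strategy(self):
        # sigma_i(x_i): reward-like play in state G, punishment-like play in state B.
        return ["reward_action" if self.state == "G" else "punish_action"] * self.L

    def update_state(self, bad_signals_in_phase):
        # The next phase's state depends on the whole phase's private signals.
        self.state = "B" if bad_signals_in_phase >= self.bad_cutoff else "G"

player = BlockAutomaton(L=10, bad_cutoff=6)
print(player.phase_strategy())
player.update_state(bad_signals_in_phase=random.randint(0, 10))
print("next phase state:", player.state)
```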
Again, by using the module to send a message by taking actions with noisy private signals, we make sure that the players can coordinate on x and learning does not change the player’s optimal strategy. In total, combining Matsushima (2004) and Hörner and Olszewski (2006), we construct a following equilibrium: We divide the repeated game into long review phases. At the beginning of the phase, the players coordinate on x by using the module to send messages via taking actions. Then, we have L review rounds. These review rounds serve two roles: One is to allow player i to switch actions when she believes that the opponent has observed a lot of good signals in order to deal with conditionally dependent signals. The other is to allow player j to switch to a minimaxing action when she observes a lot of signals statistically indicating the opponent’s deviation as in Hörner and Olszewski (2006). Relatedly, player i also switches her own actions to take a best response against player j’s minimaxing strategy, when player j switches to the minimaxing actions. To coordinate these switches, before each review rounds, the players communicate about the continuation play again by the module to send messages via taking actions. The module and communication are designed carefully so that learning from the opponent’s actions in the review round does not change the optimal action within the review round. The rest of the paper is organized as follows. In Section 2, we introduce the model, and in 6 Section 3, we state assumptions and result (folk theorem). The rest of the paper proves the folk theorem. (After giving a short roadmap of the proof in Section 4), in Section 5, we derive the su¢ cient conditions in a …nitely repeated game without discounting such that, once we prove the su¢ cient conditions, then it implies that the folk theorem holds in the in…nitely repeated game with discounting. From there on, we focus on proving these su¢ cient conditions. Section 6, we de…ne multiple variables, with which Section 7 pins down the structure (of the strategy) of the …nitely repeated game. (After giving another short roadmap in Section 8,) since the rest of the proof is long and complicated, we …rst o¤er the overview in Section 9. Then, we prove two modules which may be of their own interests. Section 10 de…nes the module for player j to send a binary message m 2 fG; Bg to player i. Section 11 de…nes the module that will be used for the equilibrium construction of the review round. Given these modules, we explain how to use these modules to prove the su¢ cient conditions in Section 12. Sections 13-16 actually prove the su¢ cient conditions. (See Section 12 for the structure of these sections.) Since the entire proof is long, in Sections 13-16, we summarize each step as lemmas and offer the intuitive explanation of the proof, relegating the technical proof to Appendix A (online appendix). In addition, we o¤er the table of notation in Appendix B. Appendices A and B are provided as supplemental materials for the submission. In case the referees are not provided the supplemental materials, the online appendix is also available at https://sites.google.com/site/takuosugaya/home/research. 2 2.1 Model Stage Game We consider a two-player game with private monitoring. The stage game is given by fI, fAi , Yi gi2I , qg. Here, I = f1; 2g is the set of players, Ai is the …nite set of player i’s pure actions, and Yi is the …nite set of player i’s private signals. 
In every stage game, player i chooses an action a_i ∈ A_i, which induces an action profile a ≡ (a_1, a_2) ∈ A ≡ ∏_{i∈I} A_i. Then, a signal profile y ≡ (y_1, y_2) ∈ Y ≡ ∏_{i∈I} Y_i is realized according to a joint conditional probability function q(y | a). Summing up these probabilities with respect to y_j, let q_i(y_i | a) ≡ ∑_{y_j∈Y_j} q(y | a) denote the marginal distribution of player i's signals. Throughout the paper, when we say players i and j, players i and j are different: i ≠ j.

Following the convention in the literature, we assume that player i's ex post utility is a deterministic function of player i's action a_i and her private signal y_i. This implies that observing her own ex post utility does not give player i any further information than (a_i, y_i).^8 Let ũ_i(a_i, y_i) be player i's ex post payoff. Taking the expectation of the ex post payoff, we can derive player i's expected payoff from a ∈ A: u_i(a) ≡ ∑_{y_i∈Y_i} q_i(y_i | a) ũ_i(a_i, y_i).

Let F be the set of feasible and individually rational payoffs:

F ≡ { v ∈ co({u(a)}_{a∈A}) : v_i ≥ min_{α_j∈Δ(A_j)} max_{a_i∈A_i} u_i(a_i, α_j) for all i ∈ I }.   (1)

Let α_j^{min,i} be the minimax strategy.

2.2 Repeated Game

Consider the infinitely repeated game with the (common) discount factor δ ∈ (0, 1). Let h_i^t ≡ (a_{i,τ}, y_{i,τ})_{τ=1}^{t−1} with h_i^1 = {∅} be player i's history in period t, and let H_i^t be the set of all the possible histories of player i. In each period t, player i takes an action a_{i,t} according to her strategy σ_i : ∪_{t=1}^{∞} H_i^t → Δ(A_i). Let Σ_i be the set of all strategies of player i. Finally, let E(δ) be the set of sequential equilibrium payoffs with a common discount factor δ.

3 Assumptions and Result

In this section, we state our four assumptions and main result. First, we assume that the distribution of private signal profiles has full support:

Assumption 1 For each a ∈ A and y ∈ Y, we have q(y | a) > 0.

Let us define

ε_support ≡ min_{y∈Y, a∈A} q(y | a) > 0   (2)

to be the lower bound of the probability. Note that this also implies the full support of the marginal distribution: min_{i∈I, y_i∈Y_i, a∈A} q_i(y_i | a) ≥ ε_support.

[Footnote 8: Otherwise, we see the realization of player i's ex post utilities as a part of player i's private signals. Let U_i be the finite set of realizations of player i's ex post utilities, and let ũ_i ∈ U_i be a generic element of U_i. We see a vector (y_i, ũ_i) as player i's private signal. Hence, the set of player i's signals is now Y_i × U_i.]

Assumption 1 excludes public monitoring, where y_i = y_j with probability one. One may find allowing public monitoring of special interest: our other Assumptions 2, 3, and 4 defined below generically hold if |Y_i| ≥ |A_j| for each i and j, while pairwise full rank in Fudenberg, Levine, and Maskin (1994) requires |Y_i| ≥ |A_1| + |A_2| − 1. (See also Radner, Myerson, and Maskin (1986).) We can extend the result to allow public monitoring; interested readers are referred to the working paper.^9

Second, we assume that player i's signal statistically identifies player j's action. Let

q_i(a_i, a_j) ≡ (q_i(y_i | a_i, a_j))_{y_i∈Y_i}   (3)

be the vector expression of the marginal distribution of player i's signals.

Assumption 2 For each i ∈ I and a_i ∈ A_i, the collection of |Y_i|-dimensional vectors (q_i(a_i, a_j))_{a_j∈A_j} is linearly independent.

Suppose that player i takes a_i. When player j changes her actions, the change induces a different distribution of player i's signals, so that player i can statistically identify player j's deviation.
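To make the primitives concrete, the following sketch encodes a hypothetical 2×2 monitoring structure and checks Assumptions 1 and 2 numerically. The game, the 0.6/0.4 signal probabilities, and the independence of the two players' signals are all invented for illustration and are not part of the paper's assumptions; with correlated monitoring, only the definition of q changes and the same rank check applies to the marginals.

```python
# A minimal sketch of the stage-game primitives {I, {A_i, Y_i}_{i in I}, q} and of how
# Assumptions 1 and 2 can be checked for a given monitoring structure. The 2x2 game
# and the 0.6/0.4 signal probabilities below are invented placeholders.
from itertools import product
import numpy as np

A1, A2 = ["C", "D"], ["C", "D"]
Y1, Y2 = ["g", "b"], ["g", "b"]

def q(y, a):
    """Joint signal distribution q(y | a); independent signals here, correlation is allowed."""
    (y1, y2), (a1, a2) = y, a
    p1 = 0.6 if a2 == "C" else 0.4      # player 1's signal tracks player 2's action
    p2 = 0.6 if a1 == "C" else 0.4
    return (p1 if y1 == "g" else 1 - p1) * (p2 if y2 == "g" else 1 - p2)

def q1(y1, a):                           # marginal q_1(y_1 | a), summing out y_2
    return sum(q((y1, y2), a) for y2 in Y2)

# Assumption 1 (full support) and the bound epsilon_support in (2).
all_q = [q(y, a) for a in product(A1, A2) for y in product(Y1, Y2)]
assert min(all_q) > 0
print("epsilon_support =", min(all_q))

# Assumption 2 for player 1: the vectors (q_1(. | a_1, a_2))_{a_2 in A_2} are
# linearly independent, i.e. the |Y_1| x |A_2| matrix has full column rank.
for a1 in A1:
    M = np.array([[q1(y1, (a1, a2)) for a2 in A2] for y1 in Y1])
    assert np.linalg.matrix_rank(M) == len(A2)
print("Assumption 2 holds for player 1 in this example.")
```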
Third, we assume that each signal of player i occurs with different probabilities after player j's different actions.

Assumption 3 For each i ∈ I, a_i ∈ A_i, and a_j, a_j′ ∈ A_j satisfying a_j ≠ a_j′, we have q_i(y_i | a_i, a_j) ≠ q_i(y_i | a_i, a_j′) for all y_i ∈ Y_i.

Fourth, suppose that in the repeated game, player j takes a mixed strategy α_j ∈ Δ(A_j). If player i's history is (a_i, y_i), then player i believes that player j's history (a_j, y_j) is distributed according to Pr((a_j, y_j) | α_j, a_i, y_i). Here, Pr is defined such that player j takes a_j according to α_j and the signal profile y is drawn from q(y | a). The following assumption about Pr((a_j, y_j) | α_j, a_i, y_i) ensures that there exists a mixed strategy of player j such that different histories of player i give different beliefs about player j's history:

[Footnote 9: We use private strategies. The possibility of private strategies to improve efficiency is first pointed out by Kandori and Obara (2006).]

Assumption 4 For each i ∈ I, there exists α_j ∈ Δ(A_j) such that, for all (a_i, y_i), (a_i′, y_i′) ∈ A_i × Y_i with (a_i, y_i) ≠ (a_i′, y_i′), there exists (a_j, y_j) ∈ A_j × Y_j such that Pr((a_j, y_j) | α_j, a_i, y_i) ≠ Pr((a_j, y_j) | α_j, a_i′, y_i′).^10

[Footnote 10: This conditional probability is well defined for each α_j and (a_i, y_i) given Assumption 1.]

With these four assumptions, we can show the following theorem.

Theorem 1 If Assumptions 1, 2, 3, and 4 are satisfied, then in a two-player repeated game, for each payoff profile v ∈ int(F), there exists δ̄ < 1 such that, for all δ > δ̄, we have v ∈ E(δ).

4 Short Road Map

Before giving the details of the proof, we first display a short road map to prove Theorem 1. First, in Section 5, we derive sufficient conditions in a finitely repeated game without discounting such that, once we prove these sufficient conditions, it follows that Theorem 1 holds in the infinitely repeated game with discounting. Second, in Section 6, we define multiple variables, with which Section 7 pins down the structure (of the strategy) of the finitely repeated game. The rest of the road map, which is easier to understand once we see the structure in Section 7, is postponed until Section 8.

5 Reduction to a Finitely Repeated Game without Discounting

To prove Theorem 1, we fix a payoff v ∈ int(F) arbitrarily. First, we derive sufficient conditions such that, once we prove these sufficient conditions hold for the given v, then v ∈ E(δ). These sufficient conditions are stated as conditions in a "finitely repeated game without discounting." This reduction to the finitely repeated game is standard (see Hörner and Olszewski (2006), for example), except that we successfully reduce the infinitely repeated game with discounting to the finitely repeated game without discounting. Since there is no discounting, we can treat each period identically. This identical treatment will simplify the equilibrium construction.

To this end, we see the infinitely repeated game as repetitions of T_P-period review phases. We make sure that each review phase is recursive. (Specifically, our equilibrium is a T_P-period block equilibrium in the language of Hörner and Olszewski (2006).) We decompose each player's payoff from the repeated game as the summation of instantaneous utilities from the current phase and the continuation payoff from the future phases. Instead of analyzing the infinitely repeated game, we concentrate on a T_P-period finitely repeated game, in which payoffs are augmented by a terminal reward function that depends on a history of the finitely repeated game. The reward function corresponds to the continuation payoff from the future phases.
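A minimal numerical sketch of this decomposition, assuming placeholder values for T_P, δ, and the per-period utility (none of which are taken from the paper): one T_P-period phase is evaluated as a finitely repeated game plus a terminal reward standing in for the continuation payoff, and the normalized value comes out as the per-period utility, exactly as the decomposition requires.

```python
# A schematic check of the phase decomposition: discounted within-phase utilities plus a
# terminal "reward" equal to the continuation value. DELTA, T_P, and u are placeholders.
DELTA, T_P = 0.99, 50

def phase_payoff(stage_utils, continuation_value):
    """Discounted payoff of one T_P-period phase plus the continuation after it."""
    assert len(stage_utils) == T_P
    total = sum(DELTA ** (t - 1) * u for t, u in enumerate(stage_utils, start=1))
    return total + DELTA ** T_P * continuation_value

# If every phase delivers the same per-period utility u and the same continuation,
# the normalized (times 1 - delta) value of the infinite game is just u.
u = 2.0
v = phase_payoff([u] * T_P, continuation_value=u / (1 - DELTA))
print(round((1 - DELTA) * v, 10))   # equals u
```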
Speci…cally, in the …nitely repeated game, in each period t = 1; :::; TP , each player i takes an (ai; ; yi; )t =11 . Let action based on her private history hti i be the set of player i’s strategies. In equilibrium, each player is in one of the two possible states fG; Bg and, given state xi 2 fG; Bg, she takes a strategy i (xi ) 2 i. On the other hand, player i’s reward function is determined by player j’s state xj and player j’s history of the …nitely repeated game hTj P +1 : TP +1 ). i (xj ; hj Note that player i’s reward function is the statistic that player j calculates. We will show that if we …nd a natural number TP 2 N, a strategy TP +1 ), i (xj ; hj a reward function i (xi ), and values fvi (xj )gxj 2fG;Bg such that the following su¢ cient conditions hold, then we have v 2 E( ): For each i 2 I, 1. [Incentive Compatibility] For all x 2 fG; Bg2 , i (xi ) 2 arg max E i2 i "T P X ui (at ) + TP +1 ) i (xj ; hj t=1 j i; # j (xj ) : (4) In words, incentive compatibility requires that, given each state of the opponent, xj 2 fG; Bg, both i (G) and i (B) are optimal for player i to maximize the summation of instantaneous utilities and reward function. Since TP +1 ) i (xj ; hj is the movement of the continuation payo¤ P P t 1 in the context of the in…nitely repeated game and Tt=1 ui (at ) + TP i (xj ; hTj P +1 ) is close P P to Tt=1 ui (at ) + i (xj ; hTj P +1 ) for a su¢ ciently large , this condition implies both i (G) and i (B) are optimal in each review phase regardless of the opponent’s state. 2. [Promise Keeping] For all x 2 fG; Bg2 , "T P X 1 E ui (at ) + TP t=1 TP +1 ) i (xj ; hj 11 # j (x) = vi (xj ): (5) Here, for notational convenience, we write (x) ( 1 (x1 ); 2 (x2 )). In words, (5) says that the time average of the instantaneous utilities and the change in continuation payo¤, TP +1 ), i (xj ; hj is equal to vi (xj ). This implies that, for a su¢ ciently high discount factor, vi (xj ) is approximately player i’s value from the in…nitely repeated game if player j’s current state is xj . 3. [Full Dimensionality] The values vi (B) and vi (G) contain vi between them: (6) vi (B) < vi < vi (G): Since this condition implies vi (B) < vi (G), when player j switches from xj = G to xj = B, the switch strictly decreases player i’s payo¤. Hence, player j can punish player i by historycontingent state transitions. In addition, since vi 2 [vi (B); vi (G)], player j can mix the initial state properly so that player i’s initial equilibrium payo¤ is vi . 4. [Self Generation] The sign of 0 and TP +1 ) i (B; hj TP +1 ) i (xj ; hj satis…es a proper condition: For all hTj P +1 , TP +1 ) i (G; hj 0. If we de…ne sign(xj ) 8 < 1 if xj = G; : 1 (7) if xj = B; then this condition is equivalent to the following: sign(xj ) i (xj ; hTj P +1 ) 0: (8) We call the condition (8) “self generation.” This corresponds to the condition that Hörner and Olszewski (2006) impose: The reward function when the opponent’s state is G (or B) is nonpositive (or nonnegative). As will be seen in the proof, in the in…nitely repeated game, self generation ensures that player i’s continuation payo¤ speci…ed by the reward function at the end of a review phase is included in [vi (B); vi (G)]. Since vi (xj ) is player i’s value when player j’s state is xj , this inclusion ensures that, by mixing xj = G and B properly in the next phase, player j implements the 12 continuation payo¤ speci…ed by the reward function.11 The following lemma proves that the above conditions are su¢ cient. 
Lemma 1 For any v 2 R2 , if there exist TP 2 N, ff i (xi )gxi 2fG;Bg gi2I , ff i (xj ; hTj P +1 )gxj 2fG;Bg gi2I , and fvi (xj )gi2I;xj 2fG;Bg such that the conditions (5)–(8) are satis…ed, then we have v 2 E( ) for a su¢ ciently large 2 (0; 1). Proof. See Appendix A.2. Appendices A and B are provided as supplemental materials for the submission. In case the referees are not provided the supplemental materials, the online appendix is also available at https://sites.google.com/site/takuosugaya/home/research. Let us comment on why we can ignore discounting. Recall that Assumption 2 ensures that player j can statistically infer player i’s action. Hence, if she infers that player i incurs a loss in earlier periods of the …nitely repeated game rather than later, then player j can compensate player i slightly (of order 1 ) so that player i is indi¤erent about when to incur the loss. For a su¢ ciently large , such compensation is arbitrarily small. Therefore, we focus on, for any v 2 int(F ), …nding TP 2 N, ff xj 2fG;Bg gi2I , 6 i (xi )gxi 2fG;Bg gi2I , ff i (xj ; hTj P +1 )g and fvi (xj )gi2I;xj 2fG;Bg to satisfy (5)–(8). Basic Variables To construct the variables to satisfy (5)–(8), it will be useful to de…ne some functions/variables. In Section 6.1, we …x (A1 ) …x (a(x); L 2 N, 11 i[ ], xj i , x u > 0, and ui j i2I;xj 2fG;Bg . Here, 2 (A) with (A) (A2 ). Note that we do not allow correlation between players’actions; in Section 6.2, we i (x); ; i (x))x2fG;Bg2 for each , > 0, ai (G), ai (B), and mix i ; min; i for each , payo and in Section 6.3, we …x S, One may wonder whether we need the condition that (1 ) TP +1 ) i (xj ;hj TP > 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg , S(t) i , j, qG , q B , c:i: i , and "strict . is su¢ ciently small. This condition is automatically satis…ed for su¢ ciently large for the following reason: We …rst …x TP . Then, i (xj ; hTj P +1 ) is bounded since the number of histories given TP is …nite. Finally, by taking su¢ ciently close to 1, we can make (1 ) TP +1 ) i (xj ;hj TP arbitrarily small. 13 6.1 i[ Basic Reward Functions First, for each 2 (A), we create xj i ] and ] (aj ; yj ) that cancels out the di¤erences in the instantaneous i[ utilities for di¤erent a’s. Since Assumption 2 implies that player j can statistically infer player i’s action from her signals, for each aj 2 Aj , there exists i (aj ; ) : Yj ! R such that ui (ai ; aj ) + E [ i (aj ; yj ) j ai ; aj ] = 0 for each ai 2 Ai . For each , we de…ne i[ ](aj ; yj ) = i (aj ; yj ) (9) + ui ( ) so that ui (a) + E [ i [ ](aj ; yj ) j a] = ui ( ) for each a 2 A. This equality also implies that the expected value of E [ i [ ](aj ; yj ) j ] = 0. In addition, i[ (10) i[ ](aj ; yj ) given is zero: ](aj ; yj ) is continuous in . Second, by adding/subtracting a constant depending on xj to/from i (aj ; yj ), we create xj i (aj ; yj ) that makes player i indi¤erent between any action pro…le and that satis…es self generation: ui (a) + E xj i (aj ; yj ) j a does not depend on a 2 A, and we have sign(xj ) xj i (aj ; yj ) 0 for each xj 2 fG; Bg and (aj ; yj ). In summary, x Lemma 2 There exist u > 0 and ui j i2I;xj 2fG;Bg such that, for each i 2 I, the following three properties hold: 1. For each 2 (A), there exists i[ Yj ! ] : Aj u u ; 4 4 that makes any action optimal for player i: ui (a) + E [ i [ ](aj ; yj ) j a] = ui ( ) for all a 2 A. This implies E [ i [ ](aj ; yj ) j ] = 0. Moreover, for each (aj ; yj ), i[ 2. For each xj 2 fG; Bg, there exists x ui j for all a 2 A and sign(xj ) ] (aj ; yj ) is continuous in . 
xj i : Aj Yj ! xj i (aj ; yj ) u u ; 4 4 satisfying ui (a)+E xj i (aj ; yj ) 0 for each xj 2 fG; Bg and (aj ; yj ) 2 Aj x ja = Yj . x 3. u is su¢ ciently large: maxi2I;a2A jui (a)j+maxi2I;xj 2fG;Bg ui j < u and maxi2I;xj 2fG;Bg ui j < u . 4 Proof. See Appendix A.3. We …x i[ ], xj i , x u > 0, and ui j i2I;xj 2fG;Bg so that Lemma 2 holds. 14 Figure 1: How to take a(x) 6.2 Basic Actions We de…ne an action pro…le a(x) for each x 2 fG; Bg2 . As will be seen, a(x) is the action pro…le which is taken with a high probability on equilibrium path when the players take (x). Speci…cally, we …x (a(x))x2fG;Bg2 such that, for each i 2 I and xj 2 fG; Bg, we have sign(xj )(vi ui (a(x))) > 0.12 See Figure 1 for the illustration of a(x). This de…nition of a(x) is the same as Hörner and Olszewski (2006). Given (a(x))x2fG;Bg2 , we perturb ai (x) to probability no less than i (x) so that player i takes each action with > 0: i (x) (1 (jAi j 1) ) ai (x) + = (1 (11) ai : ai 6=ai (x) In addition, given player i’s minimaxing strategy min; i X jAi j ) 12 min i , min i + we also perturb X ai : min i : (12) ai 2Ai Action pro…les that satisfy the desired inequalities may not exist. However, if dim (F ) 2 (otherwise, int (F ) = ; and Theorem 1 is vacuously true), then there always exist an integer z and 2z …nite sequences fa1 (x), : : :, az (x)gx2fG;Bg2 such that each vector wi (x), the average payo¤ vector over the sequence fa1 (x), : : :, az (x)gx2fG;BgN , satis…es the appropriate inequalities. The construction that follows must then be modi…ed by replacing each action pro…le a(x) by the …nite sequence of action pro…les fa1 (x), : : :, az (x)gx2fG;BgN . Details are omitted since this is the same as Hörner and Olszewski (2006). 15 Let vi ; maxai 2Ai ui (ai ; min; j min; i ) be the perturbed minimax value. Given ; i (x) = 8 < : i (x) if xi = G; min; i if xi = B: , we de…ne (13) By de…nition of a(x) and v 2 int (F ), we have max max ui (ai ; ai 2Ai If we take by i (x) min j ); < vi < min ui (a(x)) for all i 2 I: max ui (a(x)) x:xj =B x:xj =G su¢ ciently small, then the targeted payo¤ vi is in the interval of the payo¤s induced and min; i : For a su¢ ciently small max vi ; ; max ui ( x:xj =B (x)) > 0, for each payo < vi < min ui ( x:xj =G Hence, there exist vi (xj ) and ui (xj ) such that, for each max vi ; ; max ui ( x:xj =B (x)) < < : ui (xj ) , we have (x)) for all i 2 I: payo , we have < ui (B) < vi (B) < vi < vi (G) < ui (G) < min ui ( x:xj =G Given u …xed in Section 6.1, there exists L such that, for each 8 < payo u ui (xj ) ui ( (x)) L n max ui ( (x)); maxai 2Ai ui (ai ; Fix such L. Given u and L, we …x su¢ ciently small (15 + 8L) min; i o ) < payo (x)) for all i 2 I: , we have if xj = G; u L if xj = B: > 0 such that, for each xj 2 fG; Bg, we have x jui (xj )j + Lu + ui j < jui (xj ) vi (xj )j : In summary, we have proven the following lemma: Lemma 3 Given v 2 int (F ) and u 2 R, there exist (a(x))x2fG;Bg2 , L 2 N, and > 0 such that, for each i 2 I and max vi ; ; max ui ( x:xj =B (x)) < payo payo > 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg , , we have < ui (B) < vi (B) < vi < vi (G) < ui (G) < min ui ( x:xj =G 16 (x)); (14) 8 < u ui (xj ) ui ( (x)) L n max ui ( (x)); maxai 2Ai ui (ai ; : ui (xj ) and x jui (xj )j + Lu + ui j (15 + 8L) We …x (a(x); L 2 N, and i (x); ; i < jui (xj ) (x))x2fG;Bg2 for each , min; i min; i if xj = G; o ) u L (15) if xj = B; vi (xj )j for each xj 2 fG; Bg: for each , (16) > 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg , payo > 0 so that Lemma 3 holds. 
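Stepping back to Section 6.1, the existence of θ_i[α] is a linear-algebra fact: by Assumption 2 (with the players' roles exchanged), for each a_j the system u_i(a_i, a_j) + E[θ_i(a_j, y_j) | a_i, a_j] = 0 in the unknowns θ_i(a_j, ·) has a solution. The sketch below solves it for a hypothetical 2×2 payoff and monitoring specification (the numbers are invented) and verifies the indifference property asserted in Lemma 2.

```python
# A minimal sketch of the construction behind theta_i[alpha] in Section 6.1:
# for each a_j, solve for theta_i(a_j, .) with
#     u_i(a_i, a_j) + E[theta_i(a_j, y_j) | a_i, a_j] = 0   for every a_i,
# using a hypothetical 2x2 payoff/monitoring example (all numbers are invented).
import numpy as np

A_i, A_j, Y_j = ["C", "D"], ["C", "D"], ["g", "b"]
u_i = {("C", "C"): 2.0, ("C", "D"): -1.0, ("D", "C"): 3.0, ("D", "D"): 0.0}

def qj(y_j, a_i, a_j):
    p = 0.6 if a_i == "C" else 0.4      # player j's signal identifies a_i (Assumption 2)
    return p if y_j == "g" else 1 - p

theta = {}
for a_j in A_j:
    Q = np.array([[qj(y, a, a_j) for y in Y_j] for a in A_i])   # |A_i| x |Y_j|
    rhs = np.array([-u_i[(a, a_j)] for a in A_i])
    sol, *_ = np.linalg.lstsq(Q, rhs, rcond=None)               # exact by linear independence
    theta[a_j] = dict(zip(Y_j, sol))

# Check: instantaneous utility plus expected reward is 0 for every action profile, so
# adding the constant u_i(alpha) on top makes every profile yield exactly u_i(alpha).
for a_j in A_j:
    for a in A_i:
        total = u_i[(a, a_j)] + sum(qj(y, a, a_j) * theta[a_j][y] for y in Y_j)
        assert abs(total) < 1e-9
print({aj: {y: round(v, 2) for y, v in th.items()} for aj, th in theta.items()})
```

In this example the solution against a_j = C is θ(g) = 0, θ(b) = −5, which matches the shape of the reward used in the prisoners'-dilemma illustration later in the paper.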
In addition, we also …x two di¤erent actions P = jA1i j ai 2Ai ai be the ai (G); ai (B) 2 Ai arbitrarily with ai (G) 6= ai (B). Moreover, let mix i random strategy of player i. 6.3 Conditional Independence For the reason to be explained in Section 15, we want to construct S, S(t) i , j, qG , qB , c:i: i , "strict with the following properties: In some period t 2 N, player j takes a random strategy and mix j and player i takes some action ai;t and observes yi;t . After period t, there are S periods assigned to period t, denoted by S(t) 2 NS with jS(t)j = S. Player j takes S S(t) (ai;t ; yi;t ) [ (ai; ; yi; ) 2S(t); s i takes some pure strategy i : s2S(t) mix j 1 in each period, while player ! Ai in S(t), which depends on her history in periods t and S(t). Based on player j’s history in t and S(t), player j calculates a function The realization of 1 2 with qG E h j = 1 2 j i : (Aj Yj )S+1 ! [0; 1]. statistically infers what action player i takes in period t: For some qG and qB qB > 0, we have (aj;t ; yj;t ) [ (aj; ; yj; ) 2S(t) Importantly, the expected value of j j mix j ; ai;t ; yi;t ; S(t) i i = 8 > > q > < G 1 2 > > > : q B if ai;t = ai (G); if ai;t 6= ai (G); ai (B); if ai;t = ai (B): is conditionally independent of yi;t : In period t, from player i’s perspective, if she (rationally) expects that she will take S(t) i , then the expected value of j does not depend on yi;t . To incentivize player i to take S(t) i , player j gives the reward function c:i: i (aj;t ; yj;t ) [ (aj; ; yj; ) (c:i: stands for conditional independence) such that, for each (ai;t ; yi;t ), if a pure strategy 17 i 2S(t) is dif- ferent from S(t) i on equilibrium path, then E h c:i: i E h (aj;t ; yj;t ) [ (aj; ; yj; ) c:i: i j 2S(t) (aj;t ; yj;t ) [ (aj; ; yj; ) S(t) i mix j ; ai;t ; yi;t ; mix j ; ai;t ; yi;t ; j 2S(t) i i i (17) "strict for some "strict > 0. (Here, we ignore the instantaneous utility.) S(t) i , Moreover, we make sure that, given the equilibrium strategy at the timing of taking ai;t , the value does not depend on ai;t : For each ai;t 2 Ai , E h c:i: i (aj;t ; yj;t ) [ (aj; ; yj; ) 2S(t) S(t) i , The following lemma ensures that we can …nd S, S(t) i mix j ; ai;t ; j j, i = 0: c:i: i , qG , qB , and "strict to satisfy the conditions above: Lemma 4 There exist S 2 N, "strict > 0, uc:i: > 0, and 1 > qG > qB > 0 with qG 21 = 12 qB > 0 S S(t) (ai;t ; yi;t ) [ (ai; ; yi; ) 2S(t); s 1 ! such that, for each i 2 I, t 2 N, and S(t), there exist i : s2S(t) Ai , j : (Aj Yj ) S+1 ! [0; 1], and c:i: i : (Aj Yj ) S+1 uc:i: ; uc:i: such that the following ! properties hold: 1. Take any pure strategy of player i, denoted by i S : (ai;t ; yi;t ) [ (ai; ; yi; ) s2S(t) For each (ai;t ; yi;t ), if there exists hi = (ai;t ; yi;t ) [ (ai; ; yi; ) S(t) i such that (a) hi is reached by the equilibrium strategy (b) i (hi ) 6= S(t) i (hi ) ( i 2S(t); s 1 2S(t); s 1 ! Ai . for some s 2 S(t) with a positive probability, and is an on-path deviation), then the continuation payo¤ from hi is decreased by at least "strict : E h E c:i: i h c:i: i (aj;t ; yj;t ) [ (aj; ; yj; ) 2S(t) (aj;t ; yj;t ) [ (aj; ; yj; ) mix j ; ai;t ; yi;t ; j 2S(t) j S(t) i ; hi mix j ; ai;t ; yi;t ; i ; hi i i "strict : 2. The conditional independence property holds: For each ai;t 2 A and yi;t 2 Yi , E h j (aj;t ; yj;t ) [ (aj; ; yj; ) 2S(t) j mix j ; ai;t ; yi;t ; 18 S(t) i i = 8 > > q > < G 1 2 > > > : q B if ai;t = ai (G) if ai;t 6= ai (G); ai (B) if ai;t = ai (B): 3. The expected value does not depend on ai;t : For all ai;t 2 Ai . 
h E c:i: i (aj;t ; yj;t ) [ (aj; ; yj; ) 2S(t) mix j ; ai;t ; j S(t) i i = 0: Proof. See Appendix A.4. Let us provide the intuition of the proof. Given player i’s history in period t, (ai;t ; yi;t ), player i is asked to “report” (ai;t ; yi;t ) to player j in periods S(t). For a moment, imagine that player i sends the message via cheap talk. Given player i’s message (^ ai;t ; y^i;t ), suppose player j gives a reward 1aj;t ;yj;t E 1aj;t ;yj;t j 2 mix ^i;t ; y^i;t j ;a : Here, 1aj;t ;yj;t is jAj j jYj j-dimensional vector whose element corresponding to (aj;t ; yj;t ) is one and the other elements are zero. On the other hand, E 1aj;t ;yj;t j of 1aj;t ;yj;t given that player j takes mix j mix ^i;t ; y^i;t j ;a is the conditional expectation and player i observes (^ ai;t ; y^i;t ). Throughout the paper, we use Euclidean norm. From player i’s perspective, ignoring the instantaneous utility as in (17), she wants to minimize E h 1aj;t ;yj;t E 1aj;t ;yj;t j mix ^i;t ; y^i;t j ;a 2 j mix j ; ai;t ; yi;t i (18) by taking (^ ai;t ; y^i;t ) optimally. As will be seen in Lemma 16, given Assumption 4, we can show that (^ ai;t ; y^i;t ) = (ai;t ; yi;t ) (telling the truth) is the unique optimal strategy.13 Since a ^i;t = ai;t , player j has enough information to create j to satisfy Claim 2 of Lemma 4. Without cheap talk, player i takes a message by taking an action sequence in S(t) and player j infers what action sequence/message player i sends from player j’s history in S(t). Player i wants to minimize (18) with (^ ai;t ; y^i;t ) replaced with player j’s inference of the message. By taking S = jS(t)j su¢ ciently long, ex ante (after period t but before S(t)), we can make sure that given player i’s equilibrium strategy S(t) i to maximize the reward, player j infers (ai;t ; yi;t ) correctly with a high probability. Since Claim 2 of Lemma 4 requires conditional independence at the end of period t, this ex ante high probability is su¢ cient to create j (aj;t ; yj;t ) [ (aj; ; yj; ) 2S(t) to satisfy Claim 2. The strictness (Claim 1) can be achieved as follows: In the last period in S(t), if there is player 13 The objective function is called “the scoring rule” in statistics. 19 i’s history with which player i is indi¤erent, then player j can break the tie by giving a small reward for player i’s speci…c action based on (aj; ; yj; ) with being the last period in S(t). We can make sure that this tie-breaking reward is small enough not to a¤ect the strictness of the incentives for player i’s histories after which player i had the strict incentive originally and not to a¤ect player i’s incentive to minimize (18) so much (that is, the ex ante probability of player j infering (aj;t ; yj;t ) is still high). Then, we can proceed by backward induction. Whenever player j breaks a tie for some period , it does not a¤ect the strict incentives in the later periods in a certain period 0 > since the reward to break a tie based on (aj; ; yj; ) will be sunk in the later periods 0 > . Finally, since player j can statistically identify player i’s action ai;t from player j’s signal yj;t , by creating a reward function solely based on yj;t in order to reward (or punish) ai;t which gives a low value (or high value) in S(t), we can make sure that the expected value does not depend on ai;t to satisfy Claim 3. Since the reward based solely on yj;t is sunk in S(t), this reward does not a¤ect player i’s incentive in S(t). We now …x S, 7 S(t) i , j, qG , qB , c:i: i , and "strict so that Lemma 4 holds. 
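The scoring-rule logic in the intuition for Lemma 4 can be checked numerically. In the sketch below, the correlated monitoring structure and the correlation parameter RHO are invented placeholders; the point is only that, when different histories (a_i, y_i) induce different beliefs about (a_j, y_j) (Assumption 4), reporting the true history uniquely minimizes the expected squared distance in (18).

```python
# A hedged numerical illustration of the scoring-rule argument behind Lemma 4:
# reporting the true (a_i, y_i) uniquely minimizes the expected squared distance (18).
# The correlated monitoring structure and RHO below are invented for this illustration.
from itertools import product
import numpy as np

A_i, A_j, Y = ["C", "D"], ["C", "D"], ["g", "b"]
RHO = 0.05  # hypothetical correlation between the two players' signals

def joint(y_i, y_j, a_i, a_j):
    p_i = 0.6 if a_j == "C" else 0.4
    p_j = 0.6 if a_i == "C" else 0.4
    base = (p_i if y_i == "g" else 1 - p_i) * (p_j if y_j == "g" else 1 - p_j)
    return base + (RHO if y_i == y_j else -RHO)

STATES = list(product(A_j, Y))          # player j's period-t history (a_j, y_j)
HISTS = list(product(A_i, Y))           # player i's period-t history (a_i, y_i)

def cond_dist(a_i, y_i):
    """Pr((a_j, y_j) | sigma_j^mix, a_i, y_i), with sigma_j^mix uniform over A_j."""
    w = np.array([0.5 * joint(y_i, y_j, a_i, a_j) for (a_j, y_j) in STATES])
    return w / w.sum()

def expected_loss(truth, report):
    p_true, m_rep = cond_dist(*truth), cond_dist(*report)
    loss = 0.0
    for k in range(len(STATES)):
        e = np.zeros(len(STATES)); e[k] = 1.0
        loss += p_true[k] * np.sum((e - m_rep) ** 2)
    return loss

for truth in HISTS:
    best = min(HISTS, key=lambda r: expected_loss(truth, r))
    assert best == truth                # truth-telling is the unique minimizer here
print("Truth-telling minimizes the scoring rule for every (a_i, y_i).")
```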
Structure Given the variables …xed in Section 6, we pin down the structure of (the strategy in) each review phase/…nitely repeated game with T 2 N being a parameter. The review phase is divided into blocks, and the block is divided into rounds. See Figure 2 for illustration. First, given each player i’s state xi , the players coordinate on x in order to take a(x) depending on x with a high probability. To this end, at the beginning of the …nitely repeated game, they play the coordination block. In particular, players …rst coordinate on x1 . First, player 1 sends x1 2 fG; Bg to player 2 by 1 1 taking actions, spending T 2 periods with a large T . We call these T 2 periods “round (x1 ; 1),”and let T (x1 ; 1) be the set of periods in round (x1 ; 1). (In this step, we focus on the basic structure and postpone the explanation of the formal strategy in each round until Section 13.) Throughout the paper, we ignore the integer problem since it is easily dealt with by replacing each variable n with the smallest integer no less than n. 2 2 Second, player 1 sends x1 to player 2, spending T 3 periods. We call these T 3 periods “round 20 Figure 2: Structure of the review phase (x1 ; 2),” and let T (x1 ; 2) be the set of periods in round (x1 ; 2). (We will explain in Section 13.2 why player 1 sends x1 twice and why round (x1 ; 2) is longer.) Based on rounds (x1 ; 1) and (x1 ; 2), player 2 creates the inference of x1 , denoted by x1 (2) 2 fG; Bg. (In general, we use subscript to denote the original owner of the variable and index in the parenthesis to denote a player who makes the inference of the variable. For example, variablej (i) means that player j knows variablej , and that variablej (i) is player i’s inference of variablej .) 1 1 Third, player 2 sends x1 (2) to player 1, spending T 2 periods. We call these T 2 periods “round (x1 ; 3),” and let T (x1 ; 3) be the set of periods in round (x1 ; 3). Based on rounds (x1 ; 1), (x1 ; 2), and (x1 ; 3), player 1 creates the inference of x1 , denoted by x1 (1) 2 fG; Bg. (This inference x1 (1) may be di¤erent from her original state x1 .) Once players are done with coordinating on x1 , they coordinate on x2 . The way to coordinate on x2 is the same as the way to coordinate on x1 , with players’indices reversed. After the coordination on x is done, the players play L “main blocks.” In each main block, …rst, the players play a T -period review round. With a large T , we can recover the precision of the monitoring by the law of large numbers. After each review round, the players coordinate on what strategy they should take in the next review round, depending on the history (as explained in the 21 Introduction). This coordination is done by each player i sequentially sending a binary message i (l + 1) 2 fG; Bg, which summarizes the history at the end of the review round. (The precise meaning of i (l + 1) 2 fG; Bg will be explained in Section 13.4.) In total, for l = 1; :::; L 1, the players …rst play review round l for T periods. Let T (l) be the set of review round l. Then, player 1 sends 1 these T 2 periods “the supplemental round for (l + 1),”and let T ( 1 (l + 1)) be the set of periods periods. “The supplemental (l + 1)) are similarly de…ned. We call these three rounds 2 (l round for 2 (l + 1)” and the set T ( 1 + 1) 2 fG; Bg, spending T 2 periods. We call 1 2 in it. After player 1, player 2 sends 2 1 1 (l + 1) 2 fG; Bg, spending T “main block l.” Once main block l is over, the players play review round l + 1, recursively. 
After the last review block L, since the players do not need to coordinate on the future strategy any more, main block L consists only of review round L. Given this structure, we can chronologically order all the rounds in the coordination and main blocks, and name them round 1, round 2, ..., and round R. Here, R 6 + 3 (L (19) 1) + 1 is the total number of rounds in the coordination and main blocks. For example, round 1 is equivalent to round (x1 ; 1), round 2 is equivalent to round (x1 ; 2), and so on. chronological order, when we say r Given such a l, this means that round r is review round l or a round chronologically before review round l. Similarly, r < l means that round r is a round chronologically before (but not equal to) review round l. In addition, let T(r) be the set of periods in round r; for example, T(r) = T (x1 ; 1), and t(r) + 1 is the …rst period of round r. Finally, the players play the report block, where the players send the summary statistics of the history in the coordination and main blocks. As will be seen in Section 15, we use this block to 22 …ne-tune the reward function. This round lasts for Treport periods with 1 2 Treport 1 + (S + 1) T + (S + 1) T 1 2 R X r=1 R X 1 2 + (S + 1)T + 1 r=1 +(S + 1)T 1 2 1 +(S + 1)T 4 R X r=1 R X r=1 1 2 jT(r)j 3 log2 1 + jT(r)j 3 jA1 jjY1 j 1 3 log2 jT(r)j + (S + 1)T 1 3 jT(r)j log2 jT(r)j 2 jA2 jjY2 j 3 1 4 R X r=1 (20) 2 jT(r)j 3 (1 + log2 jA1 j jY1 j) 1 2 + (S + 1)T + 1 R X r=1 1 log2 jT(r)j 3 2 jT(r)j 3 log2 jA2 j jY2 j : 11 Note that Treport is of order T 12 . We have now pinned down the structure of the review phase, with T being a parameter. For a su¢ ciently large T , the payo¤s from the review round determine the equilibrium payo¤ from the review phase since the length of the review rounds is much longer than the other rounds/blocks: Lemma 5 Let 1 2 2 + 2T 3 TP (T ) = 4T | {z } + coordination block LT |{z} review round + (L | supplemental round be the length of the review phase. We have limT !1 T(r) Given player i’s history hi 1 1) 2T 2 + {z } Treport | {z } 11 report block, O(T 12 ) length of the review rounds length of the review phase limT !1 LT TP (T ) = 1. (ai;t ; yi;t )t2T(r) in round r in the coordination and main blocks, let T(r) fi [hi ] T(r) fi [hi ](ai ; yi ) (ai ;yi )2Ai Yi # ft 2 T (r) : (ai;t ; yi;t ) = (ai ; yi )g jT (r)j (21) (ai ;yi )2Ai Yi be the vector expression of the frequency of periods in round r where player i has (ai;t ; yi;t ) = (ai ; yi ). It will be useful to consider the following randomization of player i: For each round r, player i picks a period texclude (r) 2 T (r) randomly: texclude (r) = t with probability i i independently of player i’s history. Let Tinclude (r) i 23 1 jT(r)j for each t 2 T (r), T (r) n texclude (r) be the periods other than i texclude (r), and let i T(r) fiinclude [hi T(r) fiinclude [hi ] # t2 (r). be the frequency in Tinclude i ](ai ; yi ) Tinclude i (ai ;yi )2Ai Yi (r) : (ai;t ; yi;t ) = (ai ; yi ) jT (r)j 1 ! (22) (ai ;yi )2Ai Yi As will be seen in Section 13, player i decides the continuation T(r) play in the subsequent rounds based only on fiinclude [hi ]. This implies that player j never learns (ai;texclude (r) ; yi;texclude (r) ) by observing her own private signals informative about player i’s continuation i i play. This property will be important in Section 15. 
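A quick numerical check of Lemma 5, assuming an arbitrary L and replacing T_report by its stated order T^{11/12} rather than the exact expression in (20): the ratio LT / T_P(T) approaches one as T grows, so the review rounds indeed dominate the phase.

```python
# A small numerical check of Lemma 5: the review rounds dominate the phase length,
# so LT / T_P(T) -> 1. T_report is replaced by its order T^(11/12) (a stand-in for
# the exact expression in (20)); L is an arbitrary illustrative value.
L = 5

def phase_length(T):
    coordination = 4 * T ** 0.5 + 2 * T ** (2 / 3)
    review       = L * T
    supplemental = (L - 1) * 2 * T ** 0.5
    report       = T ** (11 / 12)        # order of T_report, not the exact formula
    return coordination + review + supplemental + report

for T in [10 ** 3, 10 ** 6, 10 ** 9, 10 ** 12]:
    print(T, round(L * T / phase_length(T), 6))   # approaches 1 as T grows
```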
T(r) Given fi [hi ], let 0 B @ fi [hi l ] T(x1 ;1) fi [hi T(x1 ;2) ]; fi [hi T(x1 ;3) ]; fi [hi l) T( T(~ fi [hi ]; fi [hi ~ ) 1 (l+1) T(x2 ;1) ]; fi [hi T( ]; fi [hi T(x2 ;2) ]; fi [hi ~ ) 2 (l+1) T(x2 ;3) ]; fi [hi ]; l 1 ] ~ l=1 T(l) ; fi [hi ] 1 C A (23) be player i’s frequency of the history at the end of review round l; and let fi [h<l i ] be the frequency T(l) which exclude fi [hi ] from fi [hi l ] (that is, the frequency at the beginning of review round l). l <l fiinclude [hi l ] and fiinclude [h<l i ] are similarly de…ned. On the other hand, let hi and hi be player i’s histories at the end and beginning of review round l, respectively. Similarly, let hi r and h<r be i player i’s histories at the end and beginning of round r, respectively. be player j’s history in the report block. Finally, let hreport j 8 Road Map to Prove (5)–(8) We have …xed each , qB , c:i: i , payo i[ ], xj i , x u > 0, ui j i2I;xj 2fG;Bg , (a(x); > 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg , L 2 N, i (x); ; i (x))x2fG;Bg2 for each , > 0, ai (G), ai (B), mix i , S, min; i for j, qG , S(t) i , and "strict in Section 6, and then de…ned the structure in the …nitely repeated game and TP (T ) in Section 7. Given this structure, we are left to pin down T 2 N, ff i (xi )gxi 2fG;Bg gi2I , and ff i (xj ; hTj P +1 )gxj 2fG;Bg gi2I to satisfy (5)–(8) with …xed fvi (xj )gi2I;xj 2fG;Bg and TP (T ). Since the proof is long and complicated, we …rst o¤er the overview in Section 9. Then, we prove two modules which may be of their own interests. Section 10 de…nes the module for player 24 j to send a binary message m 2 fG; Bg to player i. This module will be used in the equilibrium construction so that the players can coordinate on the continuation play by sending messages via actions. Section 11 de…nes the module that will be used for the equilibrium construction of the review round. The proof for each module is relegated to the online appendix. The road map of how to use these modules in the equilibrium construction will be postponed after Section 11. 9 Overview of the Equilibrium Construction As Lemma 5 ensures, the equilibrium payo¤ is determined by the payo¤s in the review rounds. Hence, in this section, we focus on how to de…ne the strategy and reward function in the review rounds. 9.1 Heuristic Reward Function For a moment, suppose that the players can coordinate on the true x: x(i) = x(j) = x. < payo . Suppose that the players take ( i (x(i)); j (x(j))) = Take (x) in each review round and that player i’s reward function is LT fvi (xj ) ui ( (x))g + L X X i[ (x)](aj;t ; yj;t ): (24) l=1 t2T (l) Claim 1 of Lemma 2 ensures that player i’s incentive is satis…ed and that her equilibrium value is vi (xj ). PTP t=1 Moreover, since E [ i [ i[ (x)](aj;t ; yj;t ) j (x)] = 0, the law of large numbers ensures that (x)](aj;t ; yj;t ) is near zero with a high probability. Together with (14), we conclude that self generation is satis…ed with a high probability. See Figure 3 for illustration. However, self generation needs to be satis…ed after each possible realization of player j’s history, and after an erroneous realization of player j’s history, self generation would be violated if we used (24). For example, consider prisoners’dilemma C C 2; 2 D 3; 1 25 D 1; 3 0; 0 Figure 3: Illustration of the heuristic reward with xj = G where, for each j 2 I, signal structure is Yj 2 f0j ; 1j g, qj (1j j Ci ; aj ) = :6, and qj (1j j Di ; aj ) = :4 for each aj 2 fC; Dg. 
Then, i[ (x)](aj ; yj ) = 5 1fyj =1j g 5 + ui ( (x)), where 1fyj =1j g = 1 if yj = 1j and 1fyj =1j g = 0 otherwise. If x = (G; G) and v is close to the mutual cooperation payo¤ (2; 2), then both vi (xj ) and ui ( (x)) are near 2 by Figure 1. Hence, if player j observes yj = 1j excessively often (say, almost TP periods), then (24) is equal to LT fvi (xj ) ui ( | {z (x))g + 5 } both of them are near 2 # ft : yj;t = 1j g | {z } near TP 5LT + ui ( (x))LT > 0: | {z } near 2 If signals were conditionally independent, then the …x suggested by Matsushima (2004) would work: If (24) violates self generation, then player j gives player i the reward of zero. Since signals are conditionally independent, player i after each history believes that such an event happens very rarely. (Since player i puts a small yet positive belief on (24) violating self generation, Matsushima uses pure strategy a(x) and gives a strict incentive to follow ai (x), so that player i wants to take ai (x) even though the reward is zero with a small probability.) However, with conditionally dependent signals, this …x does not work. For example, if player i’s signals and player j’s signals are positively correlated, after player i observes a lot of 1i , she starts to believe that (24) violates self generation with a high probability. After such an event, incentive compatibility would not be satis…ed if we gave the reward of zero. 26 9.2 Type-1, Type-2, and Type-3 Reward Functions To deal with this problem, we de…ne the overall reward is the summation of the reward for each review round l (we need more modi…cation for the formal proof): TP +1 ) = LT fui (xj ) i (xj ; hj where T(l) review (xj ; hj ; l) i ui ( (x))g + L X T(l) review (xj ; hj ; l); i l=1 is player i’s reward for review round l. We use ui (xj ) instead of vi (xj ) here to keep some slack in self generation for the later modi…cations (see (14) for why ui (xj ) gives us more T(l) review (xj ; hj ; l) i slack). Now, player i’s value without further adjustment is ui (xj ). The shape of can be either type-1, type-2, or type-3. The type-1 reward function is the same as heuristic reward: T(l) review (xj ; hj ; l) i = P t2T(l) i[ (x)] (aj;t ; yj;t ). By Lemma 2, the maximum realization of the absolute value of the type-1 reward per round is u4 T . Hence, as long as the realized absolute value of the type-1 reward per round is no more than u T 4L until review round l, no matter what realization happens in review round l + 1, the total absolute value at the end of review round l + 1 is l+1 X X i[ u T 4L (x)](aj;t ; yj;t ) ^ l=1 t2T(^ l) u l+ T 4 u T: 2 Together with (15), this means that self generation is not an issue. P u Therefore, as long as (x)](aj;t ; yj;t ) T for each ^l = 1; :::; l, player j uses the t2T(^ l) i [ 4L type-1 reward in review round l + 1. Let j (l + 1) 2 fG; Bg such that j (l + 1) = G if and only P u if (x)](aj;t ; yj;t ) T for each ^l = 1; :::; l. Note that j (l + 1) depends on player j’s t2T(^ l) i [ 4L history hj l . (Presicely speaking, we should write On the other hand, if j (l j (l + 1) = B, then if player j used the type-1 reward function for review round l + 1, then self generation might be violated. equal to ; j + 1)(hj l ), but we omit hj l for simplicity.) Hence, player j takes indi¤erent between “player j taking ; (x(j)) (which is (x) if x(j) = x) and gives a constant reward. 
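The following simulation is a simplified caricature of this module, not the construction in the text: the weights ETA_SEND and ETA, the 0.6 signal accuracy, and the majority-count inference rule standing in for m(f_i^include[h_i^T]) are all invented placeholders. It only illustrates the qualitative property that, over a long block, player i's inference matches the intended message except when the sender draws the "opposite" or "mixture" contingency.

```python
# A simplified simulation of the message module of Section 10: player j sends m in {G, B}
# by mostly playing the action corresponding to m for |T| periods; player i infers m from
# the empirical frequency of her private signals. The probabilities and the inference
# rule below are invented simplifications of the construction in the text.
import random

ETA_SEND = 0.05      # weight on the "opposite" / "mixture" contingencies (hypothetical)
ETA = 0.1            # trembling weight inside sigma_j^send(m) (hypothetical)

def send_and_infer(m, T):
    u = random.random()
    if u < 1 - 2 * ETA_SEND:
        mode = m                                   # [Send: Regular]
    elif u < 1 - ETA_SEND:
        mode = "G" if m == "B" else "B"            # [Send: Opposite]
    else:
        mode = "M"                                 # [Send: Mixture]
    g_count = 0
    for _ in range(T):
        target = random.choice(["G", "B"]) if mode == "M" else mode
        a_j = target if random.random() > ETA else random.choice(["G", "B"])
        # Player i's private signal matches a_j with accuracy 0.6 (invented number).
        signal = a_j if random.random() < 0.6 else ("G" if a_j == "B" else "B")
        g_count += (signal == "G")
    return "G" if g_count >= T / 2 else "B"        # player i's inference of m

trials = 2000
hits = sum(send_and_infer("G", T=400) == "G" for _ in range(trials))
print("Pr[inference = m] ~", hits / trials)        # close to 1 - O(ETA_SEND) for long blocks
```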
(15) ensures that there exists a constant reward such that (i) if coordination goes well and player i takes BRi ( j ; j j (x(j)) ; j (x(j))), then player i is and using the type-1 reward” and “player j taking (x(j)) and using the constant reward”(and so the switch of the reward function does not a¤ect player i’s incentive in the previous rounds) and that (ii) self generation is satis…ed. We call this 27 constant reward “the type-2 reward function.” In the equilibrium construction, player i creates an inference of j (l + 1), j (l + 1)(i) 2 fG; Bg, which depends on player i’s history h<l+1 at the beginning of review round l + 1. i j (l + 1)(i) j (l+1) = G means player i believes that = B. Player i takes since player j with j (l if i (x(i)) + 1) takes ; j j (l + 1) j (l+1)(i) = G and j (l + 1)(i) = G and takes BRi ( (x(j)) (which is equal to ; ; j Intuitively, = B means she believes (x(i))) if j (l+1)(i) =B (x(i)) if the coordination goes j well: x(i) = x(j)) and the reward is type-2 and constant. (Here, we ignore player i’s role to control player j’s payo¤. As player j takes an action based on player i also takes her action based on not only j (l j (l + 1) in order to control player i’s payo¤, + 1)(i) but also i (l + 1) to control player j’s payo¤. We ignore this complication in this section.) We say that the coordination goes well if x(i) = x(j) = x and “ j (l + 1)(i) = B whenever j (l + 1) = B”. The above discussion means that, if the coordination goes well, then player i’s strategy is optimal. j (l + 1) (One may wonder why her strategy is optimal if = G. Note that player j with j (l + 1) j (l + 1)(i) = B and = G uses the type-1 reward, and Lemma 2 ensures that any strategy of player i maximizes the summation of the instantaneous utility and reward function. Hence, any strategy of player i is optimal as long as j (l + 1) = G.) Finally, player j sometimes uses the following “type-3”reward: T(l) review (xj ; hj ; l) i = X xj i (aj;t ; yj;t ): (25) t2T(l) By Claim 2 of Lemma 2, this reward always makes player i indi¤erent between any action pro…le x (player i’s incentive is irrelevant) and satis…es self generation, but the value ui j may be very di¤erent from the targeted value vi (xj ) (and so promise keeping may become an issue). We will make sure that the type-3 reward is used with a small probability so that we will not violate promise keeping. Moreover, we will also make sure that player i cannot deviate to a¤ect player j’s decision of using x the type-3 reward function (since otherwise, depending on whether ui j is larger or smaller than the value with the type-1 or type-2 reward, player i may want to deviate to a¤ect the decision). That is, the distribution of the event that player j uses the type-3 reward does not depend on player i’s strategy. Further, whenever player j has i (l)(j) (that is, player j believes that her reward function is constant and takes a static best response to player i’s action), player j has determined to use the 28 type-3 reward for player i, so that player i can ignore the possibility of 9.3 i (l)(j) = B. Miscoordination of the Continuation Strategy Since players coordinate on the continuation strategy based on private signals, it is possible that coordination does not go well. There are following two possibilities. First, players coordinate on the same x(i) = x(j), but this is di¤erent from true x. Recall that xj controls player i’s payo¤. 
Hence, as long as players coordinate on the same x(i) = x(j) and xj (i) = xj (j) = xj , player i’s payo¤ is equal to ui (xj ). We will make sure that player j with xj (j) 6= xj uses the type-3 reward so that, as long as players coordinate on the same x(i) = x(j), either player i’s payo¤ is equal to ui (xj ) or player i’s incentive is irrelevant. Since the type-3 reward is not used often, by adjusting the reward slightly, we can make sure that player i’s ex ante equilibrium payo¤ is vi (xj ). Second, player i has x(i) 6= x(j) or “ j (l + 1)(i) = G if player i with history hi l+1 j (l + 1) = B”. We will make sure that, (at the end of review round l + 1) believes 9 08 < fx(i) 6= x(j) _ f j (l + 1)(i) = G ^ j (l + 1) = Bgg = o n j xj ; hi Pr @ : ^ review (xj ; hT(l+1) ; l + 1) is not type-3 reward ; i j 1 l+1 A 1 exp( T 3 ): (26) Since the expected di¤erence of player i’s payo¤ for di¤erent actions is zero with the type-3 reward function, this means that player i’s conditional expectation of the gain of changing her action to deal with the miscoordination is very small. (Note that (26) holds, conditioning on hi l+1 , which includes player i’s history in review round l +1.) That is, without further adjustment, the equilibrium would be "-sequential equilibrium, where deviation gain after each history is no more than " being of order 1 exp( T 3 ). 9.4 Adjustment and Report Block To make the equilibrium strategy exactly optimal, we further modify the reward function as follows: The above discussion implies that, if we changed player i’s reward function for review round l so 29 that target (xj ; hi l ; hj l ; l) i = 1nfx(i)6=x(j)_f + 1 j (l)(i)=G^ j (l)=Bgg^ftype-3 1nfx(i)6=x(j)_f o j (l)(i)=G^ j (l)=Bgg^ftype-3 type-1 reward function review (x ;hT(l) ;l)g j j i reward is not used in then player i’s equilibrium strategy would be optimal. T(l) review (xj ; hj ; l) i o review (x ;hT(l) ;l)g j j i reward is not used in (Recall that Lemma 2 ensures that the type-1 reward function makes each strategy of player i optimal. Hence, player i does not need to worry about miscoordination.) Note that such a reward would depend on player i’s history hi l as well, since x(i) and E h j (l)(i) depends on hi l . Moreover, (26) ensures that target (xj ; hi l ; hj l ; l) i j xj ; hi l i E h T(l) review (xj ; hj ; l) i j xj ; hi l i (27) 1 is of order exp( T 3 ) with l + 1 replaced with l. adjust (xj ; hreport ; l), i j If player j can construct a reward function which depends on player j’s history in the report block hreport , such that, from player i’s perspective at the end of review round j l, the expected value of adjust (xj ; hreport ; l) i j given (xj ; hi l ) is equal to (27), then by the law of iterated expectation, player i in review round l wants to maximize E h T(l) review (xj ; hj ; l) i + adjust (xj ; hreport ; l) i j j xj ; hi l i =E h target (xj ; hi l ; hj l ; l) i j xj ; hi l i and so player i’s equilibrium strategy is incentive compatible. To this end, we de…ne adjust (xj ; hj l ; hreport ; l) i j so that h h E E adjust (xj ; hj l ; hreport ; l) j xj ; hi L ; i j h i h target l l l = E i (xj ; hi ; hj ; l) j xj ; hi E Here, report jh i i L report jh i i L i j xj ; hi T(l) review (xj ; hj ; l) i l i j xj ; hi l i : (28) is player i’s equilibrium strategy in the report block given her history at the end of the main block, hi L . 
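To summarize how the per-round reward types of Section 9.2 fit together across review rounds, the following is a minimal sketch, not the paper's construction: all names are mine, `u_bar` stands in for the bound (written u above) under which Lemma 2 caps the realized type-1 reward per round at uT/4, `round_scores` for the realized type-1 rewards of the completed rounds, and the type-3 override triggered by the R/E statistics of Sections 10 and 11 is left out.

```python
# Sketch (illustrative names only): which reward type player j applies to player i
# in each review round, following the cap test of Section 9.2.

def choose_reward_types(round_scores, u_bar, T, L):
    """round_scores[l-1] = realized type-1 reward for review round l
    (the sum over t in T(l) of the Lemma-2 per-period reward).
    Returns 'type-1' or 'type-2' for each round; the type-3 override is ignored."""
    cap = u_bar * T / (4 * L)          # stands in for the per-round bound uT/4L in the text
    types = []
    flag = 'G'                         # the G/B flag governing round l in the text
    for score in round_scores:
        types.append('type-1' if flag == 'G' else 'type-2')
        # Once some earlier round's realized type-1 reward exceeds the cap, the flag
        # switches to B and stays there (B is absorbing), so all later rounds are paid
        # with the constant (type-2) reward and self generation is preserved.
        if abs(score) > cap:
            flag = 'B'
    return types
```

For instance, if the cap uT/4L is first exceeded in review round 3, the flag switches to B at the end of that round, so rounds 1 through 3 are paid with the type-1 reward and every later round with the constant type-2 reward.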
As will be seen in Section 15, we construct player i’s reward function in the report block so that player i’s optimal strategy report jh i i L depends on her history in the coordination and main blocks, in particular, on hi l . Then, player j’s history in the report block is 30 correlated with hi l . Therefore, we can construct a function adjust (xj ; hj l ; hreport ; l) i j such that (28) holds. From now on, before de…ning the strategy and reward function formally, we introduce two modules that will be useful for the equilibrium construction. 10 Module to Send a Binary Message We explain how player j sends a binary message m 2 fG; Bg. Speci…cally, we …x m 2 fG; Bg and de…ne how player j sends m to player i in T, where T N is the set of periods in which player j sends m. For example, if we take T = T (xj ; 1) and m = xj , then the following module is the one with which player j sends xj in round (xj ; 1). Recall that, for each i 2 I, ai (G); ai (B) 2 Ai with ai (G) 6= ai (B), and 6. Given m 2 fG; Bg, player j (sender) takes actions as follows: With > 0 are …xed in Section send to be determined in Lemma 6, player j picks one of the following mixtures of her actions at the beginning: send 1. [Send: Regular] With probability 1 , player j picks j (m) 1 (jAj j 1) send aj (m)+ P send That is, player i takes aj (m) (action corresponding to the message m) aj 6=aj (m) aj . with a high probability. send 2. [Send: Opposite] With probability 2 , player j picks j (m) ^ 1 (jAj j 1) send aj (m) ^ + P send ^ = fG; Bg n fmg. That is, player j takes an action as if the true aj 6=aj (m) ^ aj with m message were m ^ 6= m. 3. [Send: Mixture] With probability 2 , player j picks send j (M ) 1 2 send j (G) + 12 send j (B). That is, player j takes an action as if she mixed two messages G and B. Let j (m) 2f send j (G); send j (B); send j (M )g be the realization of the mixture. Given player j takes aj;t i.i.d. across periods according to send j from (B), and j (m) send j j (m). (That is, the mixture over (M ) happens only once at the beginning. Given j (m), j (m), send j (G), player j draws aj;t every period.) On the other hand, player i (receiver) takes ai according to mix i (fully mixed strategy) i.i.d. across periods. See Figure 4. In general, thin lines in the …gure represent the events that happen with small probabilities. 31 Figure 4: How to send message m In periods T, each player n 2 I observes a history hTn = fan;t ; yn;t gt2T . Given hTn , let fn hTn (an ; yn ) #ft 2 T : (an;t ; yn;t ) = (an ; yn )g jTj be the frequency of periods with action an and signal yn ; and let fn hTn (fn hTn (an ; yn ))an ;yn be the vector expression. 2 T randomly: In addition, as in texclude (r) in Section 7, each player n picks one period texclude i n texclude = t with probability n …ne Tinclude n 1 jTj for each t 2 T, independently of player n’s history. . g as the set of periods in T except for texclude ft 2 T : t 6= texclude n n We deWe de…ne . That is, these are the fninclude [hTn ](an ; yn ) and fninclude [hTn ] as above, with T replaced with Tinclude n frequencies in the periods except for texclude . n Player i creates an inference of m based on fiinclude [hTi ], denoted by m(fiinclude [hTi ]) 2 fG; Bg. Since fiinclude [hTi ] is in (Ai Yi ), we can see m(fiinclude [hTi ]) as a function m : On the other hand, player j creates a variable is, j (m; ): (Aj include T [hj ]) j (m; fj (Ai Yi ) ! fG; Bg. 2 fR; Eg based on fjinclude [hTj ], that Yj ) ! fR; Eg. 
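In this module the sender's behavior is pinned down by a single draw at the beginning of T, after which actions are i.i.d. within the block. The following sketch (illustrative names only, not the paper's code) mirrors the three cases [Send: Regular], [Send: Opposite], and [Send: Mixture]: `eps_send` stands in for the small probability of the two rare branches, which I split evenly here following the display above, `weight` for the small weight the regular mixture puts on each action other than a_j(m), and `a_of` for the map from the message m to the action a_j(m) fixed in Section 6.

```python
import random

# Sketch (illustrative names): player j's randomization when sending m in the block T.
def sender_mixture(m, actions_j, a_of, weight, eps_send):
    """Pick the mixture player j uses for the whole block; a_of[m] is the action a_j(m)."""
    def around(msg):
        # mixture concentrated on a_j(msg), with weight `weight` on every other action
        return {a: (1 - (len(actions_j) - 1) * weight) if a == a_of[msg] else weight
                for a in actions_j}
    other = 'G' if m == 'B' else 'B'
    r = random.random()
    if r < 1 - eps_send:                       # [Send: Regular]
        return around(m)
    elif r < 1 - eps_send / 2:                 # [Send: Opposite]: act as if the message were the other one
        return around(other)
    else:                                      # [Send: Mixture]: act as if the two messages were mixed
        return {a: 0.5 * around('G')[a] + 0.5 * around('B')[a] for a in actions_j}

def draw_action(mixture):
    # Given the realized mixture, actions are drawn i.i.d. across the periods of T;
    # the receiver meanwhile plays her fully mixed strategy i.i.d. every period.
    acts, probs = zip(*mixture.items())
    return random.choices(acts, weights=probs, k=1)[0]
```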
As will be seen in the proof of Lemma 11, once include T [hj ]) j (m; fj = E happens in a round where player j sends a message, then player j uses the type-3 reward for player i (that is, player j makes player i indi¤erent between any actions, as seen in (25)) in the subsequent review rounds. In order to satisfy (26) in the overview, we make sure that, given the true message m and player i’s history hTi , if m(fiinclude [hTi ]) 6= m (if player i makes a mistake to infer the message), then the conditional probability of include T [hj ]) j (m; fj = E is su¢ ciently high (see Claim 2 of Lemma 6 for the formal argument). That is, if player i realizes her mistake to infer player j’s message, then player i believes 32 that any strategy is optimal with a high probability. In addition, as mentioned in Section 9.2, we want to make sure that the type-3 reward is used with a small probability so that we will not violate promise keeping. include T [hj ]) j (m; fj include T [hj ]) j (m; fj so that 6). We use R and E for the realization of To this end, we de…ne = E with a small probability (see Claim 3 of Lemma j, meaning that R stands for “regular” and E stands for “erroneous.” Moreover, again as mentioned in Section 9.2, we want to make sure that the distribution of the event that player j uses the type-3 reward does not depend on player i’s strategy. To this end, we make sure that the distribution of include T [hj ]) j (m; fj does not depend on player i’s strategy (again see Claim 3 of Lemma 6). In addition to m(fiinclude [hTi ]), player i also creates a variable that is, i (receive; ): (Ai include T [hj ]), j (m; fj Yi ) ! fR; Eg. As include T [hi ]) i (receive; fi once 2 fR; Eg, include T [hi ]) i (receive; fi =E happens in a round where player j sends a message, then player i uses the type-3 reward for player include T [hi ]) i (receive; fi j in the subsequent review rounds. Again, we make sure that include T [hi ]) i (receive; fi with a small probability and that the distribution of = E happens does not depend on player j’s strategy (see Claim 4 of Lemma 6). Further, we make sure that player j with include T [hj ]) j (m; fj = R (that is, if player j does not use the type-3 reward for player i) believes that mi (fiinclude [hTi ]) = m or include T [hi ]) i (receive; fi =E (that is, player i infers the message correctly or player i uses the type-3 reward for player j) with a high probability (see Claim 5 of Lemma 6). In particular, with m = xj , this means that as long as player j is not using the type-3 reward for player i, she believes that player i received xj correctly or player i uses the type-3 reward. Since player j is indi¤erent between any action in the latter case, in total, this belief incentivizes her to adhere to her own state xj , as mentioned in Section 9.3 and will be shown in Claim 5 of Lemma 11. Finally, since Assumption 1 assumes that signal pro…les have full support, each player cannot …gure out the other player’s actions or inferences perfectly (see Claims 6 and 7 of Lemma 6). In total, we can construct three functions m, Lemma 6 There exist i (receive; ) : (Ai send i, and j, so that the following lemma holds: > 0, "message > 0, Kmessage < 1, m : Yi ) ! fR; Eg, and j (m; ) : (Aj (Ai Yj ) ! fR; Eg such that, for a suf- …ciently large jTj, for each i 2 I and m 2 fG; Bg, the following claims hold: 33 Yi ) ! fG; Bg, 1. 
Since texclude is random, the frequency is the su¢ cient statistic to infer the other players’ n Yj )jTj , m; m ^ 2 fG; Bg, and ^i 2 fR; Eg, we have variables: For each hTj 2 (Aj Prijj = Prijj n o m(fiinclude [hTi ]) = m ^ i ; i (receive; fiinclude [hTi ]) = ^i j m; hTj o n ^ i ; i (receive; fiinclude [hTi ]) = ^i j m; fj [hTj ] : m(fiinclude [hTi ]) = m In addition, for each hTi 2 (Ai Prjji = Prjji Here, Prijj j m; hTj and Prjji Yi )jTj and ^j 2 fR; Eg, we have n n = ^j include T [hj ]) j (m; fj include T [hj ]) = ^j j (m; fj o o j m; hTi j m; fi [hTi ] : j m; hTi are induced by the following assumptions: In Prijj j m; hTj , player j’s action sequence faj;t gt2T is given by hTj . On the other hand, player i takes ai;t i.i.d. across periods according to mix i . Given fat gt2T , the signal pro…le yt is drawn from the condi- tional joint distribution function q(yt j at ) for each t, and player j observes her signal fyj;t gt2T . randomly. Prjji Finally, player i draws texclude i and j reversed, and mix i replaced with 2. For all m 2 fG; Bg and hTi 2 (Ai j m; hTi is de…ned in the same way, with i given m. j (m) Yi )jTj , if m(fiinclude [hTi ]) 6= m (player i misinfers the message), then we have Prjji include T [hj ]) j (m; fj = E j m; hTi 3. For each m and for each strategy of player i denoted by probability of 1 j = R does not depend on m or 2 for each m 2 fG; Bg and 1 : SjTj s=0 1 (Ai Yi )s ! Prjji include T [hj ]) j (m; fj j : SjTj s=0 1 Yj )s ! (Aj = R does not depend on player j’s strategy: Prijj 2 does not depend on i (29) (Ai ), the = R j m; i = i. 4. For each strategy of player j denoted by i i: exp( "message jTj): 1 (Aj ), the probability of include T [hi ]) i (receive; fi =R j j = 14 j. 14 Here, we omit m from Prijj j m; hTj since player j’s strategy determine the distribution of actions and signals. 34 j and player i’s equilibrium strategy mix i fully Yj )jTj , if 5. For all hTj 2 (Aj include T [hi ]) i (receive; fi Prijj 1 include T [hj ]) j (m; fj = R, then we have = E _ m(fiinclude [hTi ]) = m j m; hTj exp( "message jTj): Yj )jTj , any inference of player i is possible: 6. For all m; m ^ 2 fG; Bg and hTj 2 (Aj Prijj m(fiinclude [hTi ]) = m ^ j m; hTj 7. For all m 2 fG; Bg and hTi 2 (Ai exp( Kmessage jTj): Yi )jTj , any history of player i is possible: Pr hTi j m exp( Kmessage jTj): Proof. See Appendix A.6. Claim 1 holds once we de…ne m(fiinclude [hTi ]) Let us intuitively explain why Lemma 6 holds. and include T [hi ]) i (receive; fi so that they depend only on fiinclude [hTi ], and de…ne include T [hj ]) j (m; fj so that it depends only on fjinclude [hTj ]. Hence, we concentrate on the other claims. First, we de…ne include T [hj ]) j (m; fj such that player j has include T [hj ]) j (m; fj = R only if (and if, except for a small adjustment in the formal proof) all of the following conditions are satis…ed: 1. [Regular Mixture Send] Player j picks j (m) 2. [Regular Action Send] Let fjinclude [hTj ](aj ) send = j P (m). yj 2Yj fjinclude [hTj ](aj ; yj ) be the frequency of player j’s actions. We say that player j’s action frequency is regular if, for a small "message > 0, we have fjinclude [hTj ](aj ) j (m)(aj ) "message for each aj 2 Aj . 3. [Regular Signal Send] Let fjinclude [hTj ](Yj j aj ) fjinclude [hTj ](yj j aj ) fjinclude [hTj ](yj j aj ) yj 2Yj with fjinclude [hTj ](aj ; yj ) fjinclude [hTj ](aj ) be the vector-expression of the conditional frequency of player j’s signals given player j’s action aj . (We de…ne fjinclude [hTj ](yj j aj ) = 0 if fjinclude [hTj ](aj ) = 0.) 
On the other hand, let a fqj (aj ; ai )gai 2Ai be the a¢ ne hull of player j’s signal distributions with respect to player 35 i’s actions. We say that player j’s signal frequency is regular if, for a small "message > 0, d fjinclude [hTj ](Yj j aj (m)); a fqj (aj (m); ai )gai 2Ai (30) "message : Here and in what follows, we use Euclidean norm k k and Hausdor¤ metric d. Given this de…nition, we verify Claim 3 of Lemma 6. [Regular Mixture Send] and [Regular Action Send] are solely determined by player j’s mixture, which player i cannot a¤ect. Moreover, player i cannot a¤ect the probability of [Regular Signal Send] since a fqj (aj (m); ai )gai 2Ai is the a¢ ne hull with respect to player i’s actions. (Precisely speaking, taking a¢ ne hull ensures that player i cannot change the expected distance h E d fjinclude [hTj ](Yj j aj ); a fqj (aj ; ai )gai 2Ai i (31) ; but does not guarantee that player i cannot change the distribution of the distance Pr d fjinclude [hTj ](Yj j aj ); a fqj (aj ; ai )gai 2Ai (32) : We take care of player i’s incentive to change the distribution in the formal proof.) probability of Moreover, include T [hj ]) j (m; fj j (m) = send j Hence, the = R does not depend on player i’s strategy. (m) with a high probability, as seen in Figure 3, and by the law of large numbers, [Regular Action Send] and [Regular Signal Send] happen with a high probability given j (m) = send j (m). Hence, include T [hj ]) j (m; fj = R with a high probability. Hence, Claim 3 of Lemma 6 holds. On the other hand, we de…ne i (receive; fiinclude [hTi ]) as follows: Let fiinclude [hTi ](ai ) P yi 2Yi fiinclude [hTi ](ai ; yi ) be the frequency of player i’s actions. We say that player i’s action frequency is regular if, for a small "message > 0, we have fiinclude [hTi ](ai ) mix i (ai ) [Regular Action Receive] denote this event. Player i has "message for each ai 2 Ai . include T [hi ]) i (receive; fi Let = R only if (and if, except for a small adjustment in the formal proof) [Regular Action Receive] holds. Again, we can make sure that the probability of include T [hi ]) i (receive; fi = R does not depend on player j’s strategy and is high by the law of large numbers. Hence, Claim 4 of Lemma 6 holds. We now de…ne player i’s inference m(fiinclude [hTi ]), so that Claim 2 of Lemma 6 holds. 36 She calculates the log likelihood ratio between prior of j (m) = send j (G) and j (m) = send j (B), ignoring the j (m): log o (G) ; fai;t gt2T j n o send j (B) ; fai;t gt2T j (m) = j Pr fyi;t gt2T j n send j (m) = Pr fyi;t gt2T ( X = jTj fi [hTi ](ai ; yi ) log qi (yi j ai ; send j (G)) ai ;yi X (33) fi [hTi ](ai ; yi ) log qi (yi ai ;yi j ai ; send j ) (B)) : Related to (33), let X Li (fi [hTi ]; G) ai ;yi be the log likelihood of j (m) = send j fi [hTi ](ai ; yi ) log qi (yi j ai ; send j (G)) (G). Li (fi [hTi ]; B) and Li (fi [hTi ]; M ) are de…ned in the same way, with G replaced with B and M , respectively. Given these likelihoods, player i creates m(fiinclude [hTi ]) as follows: For some Lreceive belief > 0, 1. [Case G] If Li (fiinclude [hTi ]; G) > Li (fiinclude [hTi ]; B) + 2Lreceive belief , then player i infers that the message is G: m(fiinclude [hTi ]) = G. 2. [Case B] If Li (fiinclude [hTi ]; B) > Li (fiinclude [hTi ]; G) + 2Lreceive belief , then player i infers that the message is B: m(fiinclude [hTi ]) = B. 3. 
[Case M ] If neither of the above two conditions is satis…ed, then player i infers the message randomly: m(fiinclude [hTi ]) = G with probability 12 ; and m(fiinclude [hTi ]) = B with probability 12 . See Figure 5 for the illustration. Let us …x Lreceive belief > 0. Since the maximum likelihood estimator is consistent, there exists 37 Figure 5: How to infer m send > 0 such that, for su¢ ciently small Lreceive belief > 0, for each send < send , we have 8 > Li (fiinclude [hTi ]; G) Li (fiinclude [hTi ]; B) + 3Lreceive > belief > > > send > > if fiinclude [hTi ] (ai ; yi ) = jA1i j qi yi j ai ; j (G) for each ai ; yi > > > > > > > (that is, if the frequency of player i’s history is close to the ex ante mean given > > > send send > < then the likelihood of j (G) is higher than that of j (B)), > > Li (fiinclude [hTi ]; B) Li (fiinclude [hTi ]; G) + 3Lreceive > belief > > send > > > if fiinclude [hTi ] (ai ; yi ) = jA1i j qi yi j ai ; j (B) for each ai ; yi > > > > > > (that is, the frequency of player i’s history is close to the ex ante mean given > > > > send send : then the likelihood of j (B) is higher than that of j (G)). Moreover, since the log-likelihood is strictly concave and taking send = send j j (M ) = 1 2 send j (G) + send j 1 2 > 0 su¢ ciently small if necessary, for su¢ ciently small Lreceive belief > 0, for each if neither [Case G] nor [Case B] is the case, then j (m) send send j (G) and j (m) Li (fiinclude [hTi ]; M ) = send j j (m) = send j (G), (B), (34) send j send (B), re< send , (M ) is more likely than both (B): If neither [Case G] or [Case B] holds, then max Li (fiinclude [hTi ]; G); Li (fiinclude [hTi ]; B) + 2Lreceive belief : We …x Lreceive belief > 0 so that (34) and (35) hold. 38 (35) Figure 6: A¢ ne hull of player j’s signal observation Note that with this inference, Claim 2 of Lemma 6 holds. To see why, suppose m = G (the explanation with m = B is the same and so is omitted). If m(fiinclude [hTi ]) = B 6= G = m, then [Case B] or [Case M ] is the case. We consider these two cases in the sequel. [Case B] implies that send = j prior. Since player j takes each of j 2 send (B) is more likely than (G), send j (B), and send j j (m) = send j (G), except for the (M ) with probability no less than by Figure 4, given m = G and taking the prior into account, the law of large numbers ensures that j (m) 6= send j Send] ensures that j (m) 1 j (m) 6= send j (G) with probability of order 1 j (m) 6= send j (G) implies that (G) implies that player j has exp( Lreceive belief jTj). include T [hj ]) j (m; fj include T [hj ]) j (m; fj Since [Regular Mixture = E, this high probability on = E with probability no less than exp( Lreceive belief jTj). Taking "message > 0 su¢ ciently small, this is su¢ cient for (29). If [Case M ] is the case, then (35) implies that at least one of likely than send j send j (B) and send j (M ) is more (G), except for the prior. The same proof as [Case B] establishes the result. We now prove Claim 5 of Lemma 6. Recall that we have already …xed Lreceive belief so that (34) and (35) hold. Note that we assume include T [hj ]) j (m; fj = R, which implies [Regular Mixture Send], [Regular Action Send], and [Regular Signal Send]. For a moment, suppose that send = 0 and "message = 0 for simplicity. Then, [Regular Mixture Send] and [Regular Action Send] imply that player j takes equality holds only with send j (m) = send j (m) = aj (m) (the last = 0). Since she takes aj (m) for sure, we can see fjinclude [hTj ](Yj j aj (m)) as her entire history. 
Moreover, [Regular Signal Send] implies that this frequency is equal to the a¢ ne hull a fqj (aj (m); ai )gai 2Ai (with "message = 0). See Figure 6. If fjinclude [hTj ] is close to the ex ante distribution given send j (m), then by the law of iterated expectation, player j believes that player i’s history fiinclude [hTi ] is close to the ex ante distribu39 tion given send j (m). (34) ensures that player j believes that player i has L(m; fiinclude [hTi ]) L(m; ^ fiinclude [hTi ]) + 2Lreceive ^ 2 fG; Bg n fmg and so m(fiinclude [hTi ]) = m with a high probabelief with m bility. In particular, there exists "1 > 0 such that, if fjinclude [hTj ](Yj j aj (m)) qj (aj (m); mix i ) "1 (distance from the ex ante mean is small), (36) then by the law of large numbers, player j believes that L(m; fiinclude [hTi ]) L(m; ^ fiinclude [hTi ]) + jTj), ^ 2 fG; Bgnfmg (and so m(fiinclude [hTi ]) = m) with probability 1 exp( ""1 ;Lreceive 2Lreceive belief with m belief where the coe¢ cient ""1 ;Lreceive > 0 depends on "1 and Lreceive belief . We …x such "1 > 0. In Figure 7, belief "1 > 0 determines the distance between point A and point D. On the other hand, suppose (36) does not hold. This means that her signal frequency fjinclude [hTj ](Yj j aj (m)) is not close to qj (aj (m); mix i ). Since fjinclude [hTj ](Yj j aj (m)) is in a fqj (aj (m); ai )gai 2Ai , this means that her signal frequency is skewed toward qj (aj (m); ai ) for some ai 2 Ai compared to qj (aj (m); probability mix i ). Player j believes that such ai happens signi…cantly more often than the ex ante mix i (ai ) = 1 . jAi j Hence, given "1 > 0, there exists su¢ ciently small "2 > 0 such that, if (36) does not hold, then there exists ai 2 Ai such that the conditional expectation of the frequency of ai is not close to mix i (ai ) = 1 : jAi j For some ai 2 Ai , h E fiinclude [hTi ](ai ) j m; i 1 jAi j > "2 (the frequency of player i’s actions is irregular). send j (m); mix T i ; hj (37) Fix such "2 > 0. In Figure 7, if we take "2 > 0 su¢ ciently smaller than "1 > 0, then points B and C will be included in the interval [A; D]. The log likelihood is continuous in perturbation send and frequency fiinclude [hTi ]. In addition, the conditional expectation of fiinclude [hTi ] is continuous in fjinclude [hTj ]. Hence, there exist su¢ ciently small send > 0 and "message > 0 such that the following claims hold: Given 40 include T [hj ]) j (m; fj = R, Figure 7: Player j’s inference of player i’s history 1. If fjinclude [hTj ](Yj j aj (m)) mix i ) qj (aj (m); "1 (distance from the ex ante mean is small), then player j believes that m(fiinclude [hTi ]) = m with probability 1 exp( 1 " receive 2 "1 ;Lbelief jTj). For su¢ ciently small "message > 0 compared to ""1 ;Lreceive > 0, we can say that player j believes belief that m(fiinclude [hTi ]) = m with probability 1 exp( "message jTj). 2. Otherwise, there exists ai 2 Ai such that h E fiinclude [hTi ](ai ) j m; send j (m); mix T i ; hj i 1 1 > "2 : jAi j 2 For su¢ ciently small "message > 0 compared to "2 > 0, by the law of large numbers, player j believes that fiinclude [hTi ](ai ) E with probability 1 1 jAi j > "message for some ai 2 Ai and so i (receive; fiinclude [hTi ]) = exp( "message jTj). In both cases, Claim 5 of Lemma 6 holds. Finally, since each player takes a fully mixed strategy and Assumption 1 ensures that the distribution of the private signal pro…le has full support, we have Claims 6 and 7 of Lemma 6. 
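The receiver's rule ([Case G], [Case B], [Case M]) is a likelihood-ratio test on her own action-signal frequency. The following is a minimal sketch under my own naming: `q_G` and `q_B` stand in for the conditional signal distributions player i faces when the sender uses the regular mixture for message G and for message B, and `L_belief` for the threshold written L^receive_belief above.

```python
import math, random

# Sketch (illustrative names): player i's inference of m from her frequency,
# following [Case G], [Case B], and [Case M].
def log_likelihood(freq, q_given):
    """freq[(a_i, y_i)] = empirical frequency over the included periods;
    q_given[(a_i, y_i)] = probability of y_i given a_i under the sender's regular mixture.
    Full support (Assumption 1) keeps q_given positive."""
    return sum(p * math.log(q_given[key]) for key, p in freq.items() if p > 0)

def infer_message(freq, q_G, q_B, L_belief):
    L_G = log_likelihood(freq, q_G)
    L_B = log_likelihood(freq, q_B)
    if L_G > L_B + 2 * L_belief:        # [Case G]: G is sufficiently more likely
        return 'G'
    if L_B > L_G + 2 * L_belief:        # [Case B]: B is sufficiently more likely
        return 'B'
    return random.choice(['G', 'B'])    # [Case M]: inconclusive, infer uniformly at random
```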
In the working paper, we also consider public monitoring, where both players observe the same signal with probability one. There, we make sure that the same claims hold, only using the fact that players are taking private strategies. 41 11 Module for the Review Round In this section, we consider the following jTj-period …nitely repeated game, in which T N is the (arbitrary) set of periods in this …nitely repeated game. In our equilibrium construction later, T corresponds to the set of periods in the review round. In this section, we …x x(i) 2 fG; Bg2 and x(j) 2 fG; Bg2 . As will be seen, x(i) is player i’s inference of state pro…le x 2 fG; Bg2 . Recall that Section 6 de…nes ( i (x))i2I;x2fG;Bg2 for each . Given x(i) 2 fG; Bg2 , with determined in Lemma 7, player i takes ai;t i.i.d. across periods according to to be i (x(i)). As in Section 10, let hTi = fai;t ; yi;t gt2T be the history; fi hTi be the frequency; texclude be the i period excluded from Tinclude i ft 2 T : t 6= texclude g; and fiinclude [hTi ] be the frequency in Tinclude . i i As in the case with the binary message protocol, each player i creates a variable i (x(i); fiinclude [hTi ]) 2 fR; Eg based on fiinclude [hTi ], that is, i (x(i); ) : (Ai Yi ) ! fR; Eg. Again, once i (x(i); fiinclude [hTi ]) = E happens in a review round, player i uses the type-3 reward function for player j (makes player j indi¤erent between any action) in the subsequent review rounds. We make sure that i (x(i); fiinclude [hTi ]) = include T [hi ]) i (x(i); fi E with a small probability and the distribution of does not depend on player j’s strategy (see Claim 2 of Lemma 7 with indices i and j reversed for the formal argument). In addition, related to the type-1 reward in Section 9.2, each player j calculates X include T [hj ]) i (x(j); fj i[ (38) (x(j))](aj;t ; yj;t ) t2Tinclude j = X i[ (x(j))](aj ; yj ) fjinclude [hTj ](aj ; yj ); (aj ;yj )2Aj Yj using the reward function de…ned in Lemma 2. Except for the fact that it does not include the reward in period texclude and that it uses player j’s inference x(j) rather than true state pro…le x, j this function is the same as the type-1 realization of T(l) review (xj ; hj ; l) i with T = T (l). Note that this depends only on fjinclude [hTj ] and x(j) once we …x , since we have …xed ( i [ ](aj ; yj ))(aj ;yj )2Aj in Lemma 2 for each 2 (A). As in Section 9.2, player j will have happens in review round l. include T [hi ]) i (x(i); fi that Yj j (l + 1) = B only if include T [hj ]) i (x(j); fj > u 4L jTj We prove that if x(i) = x(j) (coordination on x goes well) and = R (player i does not use the type-3 reward for player j), then player i believes include T [hj ]) i (x(j); fj u 4L jTj (and so j (l+1) 42 = G) or include T [hj ]) j (x(j); fj = E (and so player j uses the type-3 reward function for player i) with a high probability (see Claim 3 of Lemma 7). This high probability implies the following: As will be seen in Lemma 8, if the coordination does not go well, then as a result of the message exchange about x, player j uses the type-3 reward with a high probability. include T [hi ]) i (x(i); fi Together with the argument in the previous paragraph, player i with = R believes that j (l + 1) = G or player j uses the type-3 reward with a high probability. 
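The second expression in (38) is just the first rewritten through the empirical frequency over the included periods; the bookkeeping is made explicit in the sketch below (my notation, with `reward` standing in for the Lemma-2 reward function evaluated at the relevant mixed action profile).

```python
from collections import Counter

# Sketch: the score in (38) can be computed period by period or from the
# empirical frequency over the included periods; the two coincide.
def score_per_period(history, reward):
    """history: list of (a_j, y_j) pairs over the included periods."""
    return sum(reward[(a, y)] for a, y in history)

def score_from_frequency(history, reward):
    n = len(history)
    freq = Counter(history)                                  # counts of each (a_j, y_j)
    return n * sum(reward[key] * (cnt / n) for key, cnt in freq.items())
```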
Hence, if we de…ne player i’s strategy such that she switches to after include T [hi ]) i (x(i); fi j (l + 1), whenever j (l + 1)(i) = B, player i uses the type-3 reward With indices i and j reversed, as mentioned in Section 9.2, player i can condition that whenever i (l + 1)(j) = B, player j uses the type-3 reward. Finally, we prove that if x(i) = x(j), u 4L + 1)(i) = B only = E, then (26) holds. Given such a transition of function for player j. j (l jTj, then player i believes that review (x(i); fiinclude [hTi ]) i include T [hj ]) j (x(j); fj include T [hi ]) j (x(i); fi = R, and > = E with a high probability (see Claim 4 of Lemma 7). This high probability implies the following: As will be seen in Section 13.4.1, if xi = B (player i wants to keep player j’s equilibrium payo¤ low), not yet made player j indi¤erent), and review (x(i); fiinclude [hTi ]) i include T [hi ]) j (x(i); fi > u 4L = R (player i has jTj (self generation is an issue) in a review round, then player i will minimax player j to keep her payo¤ low in all the subsequent review rounds. In such a case, player i believes that player j uses the type-3 reward for player i and any strategy (including the minimaxing one) is optimal. In total, we can construct Lemma 7 Given (Ai payo i (x(i); ) so that the following lemma holds: , u, and L …xed in Section 6.2, there exist 2 (0; payo ), i (x(i); ) : Yi ) ! fR; Eg, and "review > 0 such that, for a su¢ ciently large jTj, the following four properties hold: For each i 2 I, 1. Since texclude is random, the frequency is the su¢ cient statistic to infer the other players’ j variables: For each ^j 2 fR; Eg and ^ i 2 R, we have Prjji = Prjji Here, Prjji n include T [hj ]) j (x(j); fj = ^j ; include T [hj ]) i (x(j); fj o = ^ i j x(j); hTi n o include T include T ^ [hj ]) = j ; i (x(j); fj [hj ]) = ^ i j x(j); fi [hTi ] : j (x(j); fj j x(j); hTi is induced by the following assumptions: Player i’s action sequence fai;t gt2T is given by hTi . On the other hand, player j takes aj;t i.i.d. across periods according to 43 j (x(j)). Given fat gt2T , the signal pro…le yt is drawn from the conditional joint distribution function q(yt j at ) for each t, and player i observes her signal fyi;t gt2T . Finally, player j draws texclude randomly. j 2. For each x(j) 2 fG; Bg2 and for each strategy of player i denoted by (Ai ), the probability of j x(j); i) =1 j = R does not depend on x(j) or 2 for each x(j) 2 fG; Bg2 and 3. For each x(j) 2 fG; Bg2 and hTi 2 (Ai R, then player i believes that either i: Prjji ( i : SjTj s=0 1 (Ai Yi ) s ! include T [hj ]) j (x(j); fj =R i. Yi )jTj , conditional on x(i) = x(j), if include T [hj ]) i (x(j); fj is near zero or review (x(i); fiinclude [hTi ]) i include T [hj ]) j (x(j); fj = = E with a high probability: 9 1 = jTj j x(j); fx(i) = x(j)g ; hTi A ; include T [hj ]) = E _ j (x(j); fj 08 < @ Prjji : u 4L include T [hj ]) i (x(j); fj 4. For each x(i) 2 fG; Bg2 and hTi 2 (Ai R and include T [hi ]) j (x(i); fi > u 4L 1 Yi )jTj , conditional on x(i) = x(j), if jTj, then player i believes that exp( "review jTj): review (x(i); fiinclude [hTi ]) i include T [hj ]) j (x(j); fj = E with a high probability: Prjji include T [hj ]) j (x(j); fj = E j x(j); fx(i) = x(j)g ; hTi 1 exp( "review jTj): Proof. See Appendix A.7. Let us explain why Lemma 7 holds intuitively. 
First, we de…ne include T [hj ]) j (x(j); fj as follows: Let fjinclude [hTj ](aj ) be the frequency of player j’s actions; and let fjinclude [hTj ](Yj j aj ) be the conditional frequency of player j’s signals given aj , as in Section 10. Player j has include T [hj ]) j (x(j); fj =R only if (and if, except for a small adjustment in the formal proof) both of the following conditions are satis…ed: 1. [Regular Action Review] We say that player j’s action frequency is regular if, for a small "review > 0, we have fjinclude [hTj ](aj ) j (x(j))(aj ) "review for each aj 2 Aj . 2. [Regular Signal Review] We say that player j’s signal frequency is regular if, for a small 44 = "review > 0, we have d fjinclude [hTj ](Yj j aj (x(j))); a fqj (aj (x(j)); ai )gai 2Ai "review : Given this de…nition, Claim 1 holds since texclude is random and other variables depend only on j fjinclude [hTj ]. In addition, Claim 2 holds for the following reasons. [Regular Action Review] is solely de- termined by player j’s mixture, which player i cannot a¤ect. the probability of [Regular Signal Review] since a Moreover, player i cannot a¤ect fqj (aj (x(j)); ai )gai 2Ai is the a¢ ne hull with respect to player i’s actions. (The same caution as (31) and (32) is applicable here.) By the law of large numbers, [Regular Action Review] and [Regular Signal Review] happen with a high probability. Hence, Given this de…nition of include T [hj ]) j (x(j); fj include T [hi ]) i (x(i); fi = R with a high probability. include T [hj ]), j (x(j); fj and we prove Claim 3 of Lemma 7. Since we condition x(j) = x(i), let x(j) = x(i) = x. As in Claim 5 of Lemma 6, with i and j reversed, aj (m) replaced with ai (x), send replaced with , replaced with "review , we have the following: Given send j (m) replaced with include T [hi ]) i (x; fi i (x), and "message = R, there are following two cases: 1. fiinclude [hTi ](Yi j ai (x)) is close to qi (ai (x); j (x)). This means that fiinclude [hTi ](Yi j ai (x)) is close to the ex ante mean given (ai (x); j (x)). For a small perturbation , this means that fiinclude [hTi ] is close to the ex ante mean given by ( i (x); j (x)). By the law of iterated expectation, the conditional expectation of fjinclude [hTj ] is close to the ex ante mean given ( i (x); j (x)). By the law of large numbers, player i believes that to the ex ante mean given ( i (x); i (x); j (x)) include T [hj ]) i (x; fj is close j (x)). By Claim 1 of Lemma 2, the ex ante mean of given ( include T [hj ]) i (x; fj include T [hj ]) i (x; fj = P t2Tinclude j i[ (x)](aj;t ; yj;t ) is zero. Therefore, for su¢ ciently small "review > 0, player i believes that u 4L jTj with probability 1 exp( "review jTj). 2. Otherwise, fiinclude [hTi ](Yi j ai (x)) is skewed toward qi (ai (x); aj ) for some aj 2 Aj compared to qi (ai (x); and j (x) j (x)). Again, for su¢ ciently small , since player i takes ai (x) often and ai (x) are close to each other, this implies that player i believes that fjinclude [hTj ](aj ) is 45 not close to j (x(j))(aj ) for some aj 2 Aj (and so include T [hj ]) j (x; fj = E) with probability exp( "review jTj) for a su¢ ciently small "review . 1 In both cases, Claim 3 of Lemma 7 holds. Similarly, we prove Claim 4 of Lemma 7. Since we assume x(j) = x(i) and i (x; fiinclude [hTi ]) = R, again, there are following two cases: The …rst is that fiinclude [hTi ] is close to the ex ante mean given by ( i (x); zero, j (x)) include T [hi ]) j (x; fi include T [hj ]) j (x; fj 12 with x = x(j) = x(i). > u 4L is jTj is not the case. 
The remaining case is that player i believes that = E. Road Map to Prove (5)–(8) given the Modules Again, recall that we have …xed each , S, include T [hi ]) j (x; fi Then, since the ex ante mean of S(t) i , min; i j, for each , qG , qB , c:i: i , i[ ], xj i , x u > 0, ui j i2I;xj 2fG;Bg , (a(x); > 0, (vi (xj ); ui (xj ))i2I;xj 2fG;Bg , L 2 N, payo ; i (x); i (x))x2fG;Bg2 for > 0, ai (G), ai (B), mix i , and "strict in Section 6. Then, we …x the structure of the …nitely repeated game in Section 7. The length of the …nitely repeated game is TP (T ) with T being a parameter. send Given these variables, we …x 1, m : Yi ) ! fG; Bg, (Ai j send (G), i (receive; j ): fR; Eg so that Lemma 6 holds; and we …x send (B), send (M ), > 0, "message > 0, Kmessage < Yi ) ! fR; Eg, and (Ai 2 (0; j payo ), i (x(i); ): j (m; ): (Aj Yj ) ! Yi ) ! fR; Eg, and (Ai "review > 0 such that Lemma 7 holds. Given these variables/functions, in Sections 13-16, we de…ne the strategy and reward, and then verify (5)–(8). Speci…cally, we de…ne player i’s strategy in the coordination and main blocks for each T in report jh i i Section 13. The de…nition of the strategy L in the report block will be postponed until Section 15. In Section 14, we de…ne player i’s reward function TP +1 ) i (xj ; hj is the summation of i (xj ), xj i (aj;t ; yj;t ), TP +1 ) i (xj ; hj review , i for each T . As will be seen, adjust , i …rst three elements in Section 14 and will postpone the de…nition of and adjust i report . i and We de…ne the report i until Section 15. In Section 15, Lemma 13 de…nes de…nition of the strategy i (xi ) report jh i i L , and the reward adjust , i and TP +1 ) i (xj ; hj 46 report i for each T , which completes the for each T . Finally, in Section 16, we prove that, for a su¢ ciently large T , the de…ned strategy and reward satisfy (5)–(8). The technical proofs for the claims in Sections 13-16 are further relegated to Appendix A (online appendix), and Appendix B contains the list of the frequetly used notation. 13 Strategy in the Coordination and Main Blocks Sections 13.1 and 13.3 de…ne the strategy in the coordination blocks for xj and xi , respectively. In particular, player i creates xj (i) and xi (i). Section 13.2 derives properties about xj (i). Given x(i) = (xi (i); xj (i)), in Section 13.4, we de…ne player i’s strategy in main block l. As will be seen, her strategy in main block l depends on two variables i (l) 2 fG; Bg and j (l)(i) 2 fG; Bg. Sections 13.4.1, 13.4.2, and 13.4.3 de…ne her strategy in review round l, the supplemental round for j (l + 1), and that for and j (l)(i) 13.1 + 1), given i (l i (l) and Finally, we de…ne the transition of i (l)(j). i (l) in Section 13.4.4. Strategy in the Coordination Block for xj In order to coordinate on xj , player j sends xj 2 fG; Bg in rounds (xj ; 1) and (xj ; 2). Given these two rounds, player i creates her inference xj (i) 2 fG; Bg. Given xj (i), player i sends xj (i) in round for (xj ; 3). 1 In particular, in round (xj ; 1), player j sends xj 2 fG; Bg spending T 2 periods as explained 1 in Section 10 (with m replaced with xj , and T = T(xj ; 1) with jTj = T 2 ). structs include T(xj ;1) [hj ]) j (xj ; fj include T(xj ;1) [hi ]) i (receive; fi T(xj ;1) 2 fR; Eg, and player i constructs xj (fiinclude [hi Player j con]) 2 fG; Bg and 2 fR; Eg. In addition, for each n 2 fi; jg, let texclude (xj ; 1) be the pen riod such that player n does not use her history in period texclude (xj ; 1) when she determines the n continuation play. 
2 Then, in round (xj ; 2), player j re-sends xj spending T 3 periods as explained in Section 10 (with 2 m replaced with xj , and T = T(xj ; 2) with jTj = T 3 ). Player j constructs T(xj ;2) fR; Eg, and player i constructs xj (fiinclude [hi ]) 2 fG; Bg and include T(xj ;2) [hj ]) j (xj ; fj include T(xj ;2) [hi ]) i (receive; fi 2 2 fR; Eg. For each n 2 fi; jg, texclude (xj ; 2) is randomly selected. n T(xj ;1) Given xj (fiinclude [hi T(xj ;2) ]) 2 fG; Bg and xj (fiinclude [hi 47 ]) 2 fG; Bg, player i creates xj (i) 2 Figure 8: How to create xj (i) fG; Bg as follows: T(xj ;2) , player i uses xj (fiinclude [hi 1. [Round (xj ; 2) xj (i)] With probability 1 T(xj ;2) to infer xj : xj (i) = xj (fiinclude [hi ]); T(xj ;1) 2. [Round (xj ; 1) xj (i)] With probability , player i uses xj (fiinclude [hi T(xj ;1) infer xj : xj (i) = xj (fiinclude [hi ]) in round (xj ; 2) ]) in round (xj ; 1) to ]). We summarize player i’s inference in Figure 8. 1 Finally, in round (xj ; 3), player i sends xj (i) 2 fG; Bg spending T 2 periods as explained in Section 10 (with indices j and i reversed, m replaced with xj (i), and T = T(xj ; 3) with 1 jTj = T 2 ). include T(xj ;3) [hi ]) i (xj (i); fi Player i constructs T(xj ;3) xj (i)(fjinclude [hj ]) 2 fG; Bg and 2 fR; Eg, and player j constructs include T(xj ;3) [hj ]) j (receive; fj 2 fR; Eg. For each n 2 fi; jg, texclude (xj ; 3) is randomly selected. n Given include T(xj ;2) [hj ]) j (xj ; fj T(xj ;3) 2 fR; Eg and xj (i)(fjinclude [hj ]) 2 fG; Bg, player j creates xj (j) 2 fG; Bg as follows: 1. [Adhere xj (j)] If player j has include T(xj ;2) [hj ]) j (xj ; fj = R in round (xj ; 2), then player j mixes the following cases: (a) [Round (xj ; 2) xj (j)] With probability 1 , player j adheres to her own state: xj (j) = xj ; (b) [Round (xj ; 3) xj (j)] With probability , player j listens to player i’s message in round T(xj ;3) (xj ; 3): xj (j) = xj (i)(fjinclude [hj 2. [Not Adhere xj (j)] If player j has ]). include T(xj ;2) [hj ]) j (xj ; fj = E in round (xj ; 2), then player j T(xj ;3) always listens to player i’s message in round (xj ; 3): xj (j) = xj (i)(fjinclude [hj We summarize player j’s inference in Figure 9. 48 ]). Figure 9: How to create xj (j) 13.2 Properties of xj (i) and xj (j) It will be useful to derive the following properties about the inferences. Lemma 8 Given the above construction of xj (i) and xj (j), we have the following: For a su¢ ciently large T , for each i; j 2 I and xj ; xj (i); xj (j) 2 fG; Bg, T(xj ;1) 1. For each hi xj (j)], T(xj ;2) , hi T(xj ;3) , and hi include T(xj ;1) [hj ]) j (xj ; fj , if xj (i) 6= xj (j), then player i believes that [Round (xj ; 3) = E, or 08 > > [Round (xj ; 3) xj (j)] < B> B j ;1) Pr B _ j (xj ; fjinclude [hT(x ]) = E j > @> > : _ (x ; f include [hT(xj ;2) ]) = E j j j j T(xj ;1) 2. For each hj (xj ; 1) xj (i)] or T(xj ;2) , hj 9 > > > = > > > ; T(xj ;3) , and hj include T(xj ;2) [hj ]) j (xj ; fj j = E with a high probability: 1 C T(x ;1) T(x ;2) T(x ;3) C xj ; xj (j); hi j ; hi j ; hi j C A 1 1 exp( 2T 3 ): (39) , if xj (j) 6= xj (i), then player j believes that [Round include T(xj ;2) [hi ]) i (receive; fi 08 > > [Round (xj ; 1) xj (i)] B> < B j ;2) Pr B _ i (receive; fiinclude [hT(x ]) = E i @> > > T(x ;3) : _ (x (i); f include [h j ]) = E i j i i = E with a high probability: 9 > > > = > > > ; T(xj ;1) j xj ; xj (i); hj Proof. See Appendix A.8. 
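The inference rules summarized in Figures 8 and 9 reduce to two short decision rules. The sketch below uses illustrative names only: `eps` stands in for the small mixing probability in the two figures, `flag_round2` for player j's R/E statistic from round (x_j, 2), and the `inference_*` arguments for the message inferences produced by the module of Section 10.

```python
import random

# Sketch (illustrative names) of Figures 8 and 9: how x_j(i) and x_j(j) are formed.
def infer_xj_for_player_i(inference_round1, inference_round2, eps):
    """Player i: with prob. 1-eps use the (longer) round (x_j,2) inference,
    with prob. eps use the round (x_j,1) inference."""
    return inference_round2 if random.random() < 1 - eps else inference_round1

def infer_xj_for_player_j(x_j, flag_round2, inference_round3, eps):
    """Player j: if her round-(x_j,2) statistic is R, she adheres to her own x_j with
    prob. 1-eps and listens to player i's round-(x_j,3) report with prob. eps;
    if the statistic is E, she always listens to the round-(x_j,3) report."""
    if flag_round2 == 'R':
        return x_j if random.random() < 1 - eps else inference_round3
    return inference_round3
```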
49 T(xj ;2) ; hj 1 C C A T(xj ;3) C ; hj 1 1 exp( 2T 3 ): (40) include T(xj ;1) [hj ]) j (xj ; fj As will be seen, if [Round (xj ; 3) xj (j)], = E, or include T(xj ;2) [hj ]) j (xj ; fj = E is the case, then player j uses the type-3 reward (makes any strategy of player i optimal). Hence, (39) ensures that, whenever player i realizes that player i’s inference is miscoordinated with player j’s inference, any strategy is optimal for player i with a high probability, as required by (26). Similarly, (40) ensures that, whenever player j realizes that player j’s inference is miscoordinated with player i’s inference, any strategy is optimal for player j with a high probability. We explain why (39) holds. First, Figure 9 implies that if xj (j) 6= xj , then we have [Round include T(xj ;2) [hj ]) j (xj ; fj (xj ; 3) xj (j)] or = E. Hence, we concentrate on xj (j) = xj . T(xj ;1) Look at Figure 8 for how player i infers xj (i). Suppose player i has xj (i) = xj (fiinclude [hi 1 ]) 6= xj (j) = xj . Since the round (xj ; 1) lasts for T 2 periods, Claim 2 in Lemma 6 with m = xj implies Pr n T(xj ;1) include [hj j (xj ; fj ]) = E o T(xj ;1) j xj ; hi 1 1 "message T 2 : exp Given xj , by Figure 9, player j’s strategy in rounds (xj ; 2) and (xj ; 3) and her inference xj (j) are independent of her history in (xj ; 1). Hence, this inequality is su¢ cient for (39). T(xj ;2) Suppose next player i has xj (i) = xj (fiinclude [hi 2 ]) 6= xj (j) = xj . Since the round (xj ; 2) lasts for T 3 periods, Claim 2 in Lemma 6 with m = xj implies n Pr include T(xj ;2) [hj ]) j (xj ; fj =E o T(xj ;2) j xj ; hi 1 exp 2 (41) "message T 3 : Since player j uses her history in the round (xj ; 2) to determine xj (j) by Figure 9, this inequality does not immediately imply (39). Nonetheless, we can still prove (39) as follows. Observe that by Figure 9, with probability at least , regardless of player j’s history in (xj ; 2), T(xj ;3) player j uses the inference in (xj ; 3): xj (j) = xj (i)(fjinclude [hj ]). Since the length of round 1 2 (xj ; 3) is T , Claim 6 of Lemma 6 (with i and j reversed and m = xj (i)) implies that player i with T(xj ;3) hi T(xj ;3) believes that any xj (i)(fjinclude [hj 1 ]) happens with at least probability exp( Kmessage T 2 ). In total, player i believes that any xj (j) is possible with at least probability T(xj ;2) given (hi T(xj ;3) ; hi Hence, the belief update about ). 1 xj (j) is of order exp( T 2 ). 50 include T(xj ;2) [hj ]) j (xj ; fj 1 exp( Kmessage T 2 ) T(xj ;3) from hi and 2 1 Since T 3 T 2 (here is where we use the assumption that round (xj ; 2) is much longer than round (xj ; 3)), (41) implies that n Pr include T(xj ;2) [hj ]) j (xj ; fj =E o T(xj ;2) j xj ; xj (j); hi T(xj ;3) (42) ; hi 2 is of order exp( T 3 ). Conditional on xj and xj (j), player j’s strategy in round (xj ; 1) is independent of her history in (xj ; 2) and (xj ; 3). Hence, (42) is su¢ cient for (39). We second prove (40). Look at Figure 9 for how player j infers xj (j). Suppose player j has T(xj ;3) xj (j) = xj (i)(fjinclude [hj 1 ]). Then, since round (xj ; 3) lasts for T 2 periods, Claim 2 of Lemma 6 (with i and j reversed and m = xj (i)) implies Pr n T(xj ;3) include [hi i (xj (i); fi ]) = E o T(xj ;3) j xj (i); hj 1 1 "message T 2 : exp Given xj (i), player i’s strategy in rounds (xj ; 1) and (xj ; 2) is independent of her history in (xj ; 3). Hence, this inequality is su¢ cient for (40). Suppose next that player j adheres to xj . In this case, by Figure 9, player j has include T(xj ;2) [hj ]) j (xj ; fj R. 
Hence, by Claim 5 of Lemma 6, we have 08 < Pr @ : 9 1 = =E T(xj ;2) A j x ; h j j T(x ;2) ; _xj (fiinclude [hi j ]) = xj include T(xj ;2) [hi ]) i (receive; fi 1 2 "message T 3 : exp T(xj ;1) To see why this is su¢ cient for (40), we deal with player j’s learning from xj (i), hj T(x ;3) hj j (43) , and as follows. By Figure 8, with probability at least , player i uses the inference in round T(xj ;1) (xj ; 1): xj (i) = xj (fiinclude [hi 1 ]). Since the length of round (xj ; 1) is T 2 , Claim 6 of Lemma 6 T(xj ;1) (with m = xj ) implies player j with hj T(xj ;1) believes that any inference xj (fiinclude [hi ]) happens 1 2 with probability at least exp( Kmessage T ). In total, player j believes that any xj (i) is possible with probability at least Since T 2 3 T 1 2 T(xj ;1) 1 exp( Kmessage T 2 ) given (hj T(xj ;2) ; hj ). (here is where we use the assumption that round (xj ; 2) is much longer than round (xj ; 1)), (43) implies 08 < Pr @ : 9 1 = =E T(x ;1) T(x ;2) j xj ; xj (i); hj j ; hj j A ; include T(xj ;2) _xj (fi [hi ]) = xj include T(xj ;2) [hi ]) i (receive; fi 51 = 2 is still of order 1 exp( T 3 ). Moreover, given xj (i), player i’s strategy in round (xj ; 3) is inde- pendent of her history in (xj ; 1) and (xj ; 2). Hence, (43) is su¢ cient for (40). Strategy in the Coordination Block for xi 13.3 The strategy in the coordination block for xi is de…ned in the same way as Section 13.1, with indices i and j reversed. Player i creates T(x ;3) xi (j)(fiinclude [hi i ]) 2 fG; Bg, include T(xi ;1) [hi ]) i (xi ; fi include T(xi ;3) [hi ]) i (receive; xi ; fi T(xi ;1) the other hand, player j creates xi (fjinclude [hj T(xi ;2) xi (fjinclude [hj T(xi ;3) fjinclude [hj 13.4 ]) 2 fG; Bg, 2 fR; Eg, ]) 2 fG; Bg, include T(xi ;2) [hj ]) j (receive; fj include T(xi ;2) [hi ]) i (xi ; fi 2 fR; Eg, 2 fR; Eg, and xi (i) 2 fG; Bg. On include T(xi ;1) [hj ]) j (receive; fj 2 fR; Eg, xi (j) 2 fG; Bg, and 2 fR; Eg, j (xi (j); ]) 2 fR; Eg. Strategy in the Main Block Given the coordination block, each player i has the inference of the state pro…le x(i) = (xi (i); xj (i)) 2 fG; Bg2 . Player i with x(i) takes an action in main block l given fG; Bg. We …rst de…ne the strategy given 13.4.1 Review Round l given i (l) i (l) and i (l) 2 fG; Bg and j (l)(i) 2 j (l)(i): j (l)(i) and In review round l, player i takes actions, depending on (i) her inference of the state pro…le x(i) 2 fG; Bg2 , (ii) the summary statistic of the realization of player j’s reward (calculated by player i) i (l) 2 fG; Bg, and (iii) her inference of player j’s statistic As will be seen, both i (l) and j (l)(i) j (l), denoted by j (l)(i) are functions of the frequency of player i’s history at the be- ginning of review round l, fiinclude [h<l i ]. Hence, precisely speaking, we should write and include <l [hi ]). j (l)(i)(fi Player i picks i (l) 2 fG; Bg. include <l [hi ]) i (l)(fi For notational convenience, we omit fiinclude [h<l i ]. 2 (Ai ) based on i (l) follows (see Figure 10 for illustration). Given and i (l), j (l)(i) at the beginning of review round l as player i takes ai;t according to periods for T periods in review round l. (That is, the mixture to decide once at the beginning of review round l. Given i (l), i (l) player i draws ai;t from i (l) i.i.d. across is conducted only i (l) every period in the round.) 1. If player i has optimal. (If j (l)(i) j (l) = G, then player i believes that with a high probability, any action is = G (coordination goes well), then the reward is not type-2. 
Whether the 52 reward is type-1 or type-3, both rewards make any strategy of player i optimal by Lemma 2.) In this case, she picks the strategy to control player j’s payo¤s. (a) If = G, then player j’s reward function is not type-2. In this case, player i takes i (l) i (l) (b) If = i (l) i (x(i)) with probability 1 , and i (l) = ; i (x(i)) with probability . = B, then player j’s reward function is type-2. with probability , and i (x(i)) i (l) ; = i That is, with a high probability, player i takes Then, player i takes (x(i)) with probability 1 i (x(i)) if i (l) = G and ; i i (l) = . (x(i)) if i (l) = B. As will be seen in Section 16, this strategy makes sure that player j’s equilibrium payo¤ is vj (xi ) at the same time of satisfying self generation. In addition, we make sure that the support of given j (l) about j (l) is the same regardless of = G. With indices i and j reversed, player i cannot learn j (l)(i) given i (l) i (l)(j) j (l) i (l) 2 fG; Bg by observing = G. This will be important when we calculate player i’s belief update in Lemma 9. 2. If player i has player j takes j (l)(i) ; j = B, then player i believes that her reward function is type-2 and (x(i)) if x(i) = x(j). (She believes that, if x(i) 6= x(j), then player j uses the type-3 reward and any strategy is optimal with a high probability by Lemma 8.) Hence, player i takes the static best response to ; j (x(i)): i (l) = BRi ( ; j (x(i))). For notational convenience, let actioni (l) = R denote the event that player i takes i (l) If = G and takes j (l)(i) ; i (x(i)) if if = B; and let actioni (l) = E denote the complementary event. = G, then actioni (l) = R denotes the event that player i takes the strategy which is taken with a high probability 1 reversed), if T(l) Let hi i has i (l) i (x(i)) j (l)(i) j (l)(i) . As will be seen in Claim 4 of Lemma 11 (with indices i and j = G and actioni (l) = E, then player i will use the type-3 reward for player j. T(l) (ai;t ; yi;t )t in review round l be player i’s history in review round l. Given hi = i (l) = G and takes T(l) review (x(i); fiinclude [hi ]) i i (l) = i (x(i)), , if player then she creates the random variables 2 fR; Eg as in Section 11 with T = T(l) (T periods in review round l). Let texclude (l) be the period randomly excluded by player i. i 53 Figure 10: Determinaction of i (l) On the other hand, player i also calculates player j’s type-1 reward X include T(l) [hi ]) j (x(i); fi j[ (44) (x(i))](ai;t ; yi;t ): t2T(l);t6=texclude (l) i The de…nitions are the same as (38) in Section 11 with indices i and j reversed and T = T(l).15 Given is i i include T(l) [hi ]), j (x(i); fi (l) and (1) = G. Given 1. [Case i i player i de…nes (l), player i de…nes G] If player i has i (l) i (l i (l + 1) as follows: The initial condition + 1) as follows (See Figure 11 for illustration): = G, then player i has i (l + 1) = B at the end of review round l if and only if player j’s score in review round l is excessive: (a) [Case i Not Excessive] If player i has include T(l) [hi ]) j (x(i); fi u T, 4L then i (l + 1) =G at the end of review round l. (b) [Case i Excessive] If player i has include T(l) [hi ]) j (x(i); fi > u T, 4L then i (l + 1) = B at the end of review round l. 2. [Case i B] If player i has round l. That is, i (l) i (l) = B, then player i has Note that the function + 1) = B at the end of review = B is absorbing. With indices i and j reversed, player j also creates 15 i (l include T(l) [hi ]) j (x(i); fi j (l + 1) 2 fG; Bg. is well de…ned even if player i takes 54 i (l) 6= i (x(i)). 
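Operationally, the choices in Section 13.4.1 (Figures 10 and 11) amount to the following sketch, again with illustrative names: `alpha`, `alpha_null`, and `best_reply` stand in for the three mixtures player i chooses among, `psi_j_hat` for her inference of player j's flag, `opponent_score` for the type-1 score (44) she computes for player j, and `eps` for the small mixing probability.

```python
import random

# Sketch (illustrative names) of Figures 10 and 11: player i's review-round mixture
# and the end-of-round update of her own G/B flag.
def pick_round_mixture(lam_i, psi_j_hat, alpha, alpha_null, best_reply, eps):
    if psi_j_hat == 'B':
        # Player i believes her own reward is constant (type-2): play the static best reply.
        return best_reply
    if lam_i == 'G':
        return alpha if random.random() < 1 - eps else alpha_null
    # lam_i == 'B': with high probability play the mixture supporting player j's type-2 reward.
    return alpha_null if random.random() < 1 - eps else alpha

def update_flag(lam_i, opponent_score, u_bar, T, L):
    """Figure 11: the flag switches to B, and stays there, once the realized type-1
    score computed for the opponent exceeds the per-round cap (as in Section 9.2)."""
    if lam_i == 'B':
        return 'B'                      # B is absorbing
    cap = u_bar * T / (4 * L)
    return 'B' if abs(opponent_score) > cap else 'G'
```

The absorbing B flag is what keeps the cumulative type-1 reward within the self-generation slack left in Section 9.2.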
Figure 11: Transition of 13.4.2 The Supplemental Round for Player j sends j j (l + 1) (l + 1) 2 fG; Bg in the supplemental round for ods as explained in Section 10 (with m replaced with 1 2 jTj = T ). j i (l) Player j creates T( (l + 1) (fiinclude [hi j (l+1)) j ( j (l + T( 1); fjinclude [hj j (l + 1), and T = T( j (l+1)) j (l + 1)) with ]) 2 fR; Eg and player i creates include T( [hi i (receive; fi ]) 2 fG; Bg and 1 + 1) spending T 2 peri- j (l j (l+1)) ]) 2 fR; Eg. For each n 2 fi; jg, texclude ( j (l + 1)) is randomly excluded. n 13.4.3 The Supplemental Round for i (l + 1) The strategy is the same as Section 13.4.2 with indices i and j reversed. Player i creates T( 1); fiinclude [hi i (l+1)) ]) 2 fR; Eg and player j creates include T( [hj j (receive; fj i (l+1)) that we have de…ned the transition of i (l). The initial condition is player i determines 1. [Case j (i) j (l i (l+1)) ]) 2 fG; Bg and i (l) and j (1)(i) j j (l)(i) and i (l) for all l 2 f1; :::; Lg and Hence, to complete the de…nition of the strategy in the main block, we are left to de…ne the transition of Transitions of T( (l + 1) (fjinclude [hj + ( i (l +1)) is randomly excluded. ]) 2 fR; Eg. For each n 2 fi; jg, texclude n Now that we have de…ned player i’s strategy given 13.4.4 i i ( i (l j (l)(i). (l) (i) = G. Inductively, for l 2 f1; :::; L 1g, given j (l)(i) 2 fG; Bg, + 1)(i) 2 fG; Bg as follows (see Figure 12 for illustration): G] If player i has j (l)(i) = G, then player i has j (l + 1)(i) = B in the next block l + 1 if and only if player i decides to “listen to”player j’s message in the supplemental round for j (l + 1) and “infers”that player j’s message is B. Speci…cally, 55 (a) [Regular ^l j (i)] ^ = If player i has i (l) i (x(i)) l) include T(^ [hi ]) i (x(i); fi and = R for each ^ = G, then player i mixes the following two cases: l with i (l) i. [Case j (i) Ignore] With probability 1 Player i has about j (l j (l , player i “ignores” player j’s message: + 1)(i) = G, regardless of player i’s inference of player j’s message + 1), denoted by j T( (l + 1) (fiinclude [hi j (l+1)) ]). As will be seen in Lemma 9, if player i ignores player j’s message, then player i believes that j (l +1) = G or player j uses the type-3 reward with a high probability. In other words, (26) holds when player i ignores the message. To see why this is true with this transition, assume here that player j has i (l)(j) = G, and j (l) not yet switched to = j (l) = G, The …rst condition assumes that player j had j (x(j)). = B by the beginning of review round l. In addition, as will be seen in Lemma 11, if j (l) = G but either i (l)(j) 6= B or j (l) then player j uses the type-3 reward (and so (26) holds trivially). conditions, if and only if j (l) include T(l) [hj ]) i (x(j); fj > u , 4L player j has 6= j (x(j)), Given these j (l + 1) = B by Figure 11 (with indices i and j reversed). ^ = G for each ^l Imagine …rst that player i has j (i)] implies include T(l) [hi ]) i (x(i); fi include T(l) [hi ]) i (x(i); fi u 4L i (l) l. Then, [Regular By Claim 3 of Lemma 7, player i with = R. = R believes that, if x(i) = x(j), then include T(l) [hj ]) i (x(j); fj or player j uses the type-3 reward. Moreover, if x(i) 6= x(j), then player i be- lieves that player j uses the type-3 reward by Lemma 8. In total, player i believes that j (l + 1) = G or player j uses the type-3 reward. 
Imagine second that player i switched This means l) include T(^ [hi ]) j (x(i); fi that player i with u 4L l) include T(^ [hi ]) i (x(i); fi > ^ = G to i (l) ^+ 1) = B for some ^l i (l l 1. by Figure 11. Claim 3 of Lemma 7 ensures = R believes that player j uses the type-3 reward if x(i) = x(j). Again, if x(i) 6= x(j), then player i believes that player j uses the type-3 reward. In both cases, player i believes that j (l + 1) = G or player j uses the type-3 reward. Note that the above argument establishes player i believing “ j (l + 1) = G or player j using the type-3 reward” at the end of review round l (or in the second case, at the end of review round ^l). Recall that (26) holds conditional on player i’s history 56 Figure 12: Transition of j (l)(i) at the end of review round l + 1. Hence, we still have to deal with player i’s learning about j (l + 1) from her history between the end of review round l (or that of review round ^l) and the end of review round l + 1. See the explanation after Lemma 9 for the details. ii. [Case Listen] With probability , player i “listens to” player j’s message: j (i) Player i has j (l + 1)(i) = j T( (l + 1) (fiinclude [hi j (l+1)) ]). Claim 2 of Lemma 6 ensures that player i believes that either j (l +1)(i) = j (l + 1) or player j uses the type-3 reward. (b) [Irregular j (i)] j (l j + 1)(i) = Otherwise, player i always listens to the message: T( (l + 1) (fiinclude [hi Note that, after each history (unless j (l+1)) j (l)(i) Player i has ]). = B), player i listens to player j’s message with probability at least . See footnote 17 for why this lower bound of the probability is important. 2. [Case j (i) B] If player i has block l + 1. That is, j (l)(i) j (l)(i) = B, then player i has = B is absorbing. 57 j (l + 1)(i) = B in the next 13.4.5 Property of j (l) (i) Now that we have …nished de…ning player i’s strategy in the coordination and main blocks, it will be useful to summarize the property of the inference j (l)(i). Suppose that x(i) = x(j) (otherwise, Lemma 8 ensures that player i believes that player j uses the type-3 reward function) and that player j has i (l)(j) = G (otherwise, as mentioned in the explanation of Claim 4 of Lemma 7, player j uses the type-3 reward). The following lemma ensures that, if x(i) = x(j), player j has has j (l)(i) i (l)(j) = G, and player i = G, then player i believes that, with a high probability, one of the following four events happens: j (l) = G; include T( [hj j ( j (l); fj j (l)) supplemental round; actionj (^l) = E for some ^l some review round ^l l ]) = E happens when player j sends l 1; or l) include T(^ [hj ]) j (x(j); fj j (l) in the = E happens for 1.16 As will be seen in Lemma 11, the cases except for j (l) = G imply that player j uses the type-3 reward. Hence, the following lemma is the basis for (26): Lemma 9 For a su¢ ciently large T , for each i 2 I, xj 2 fG; Bg, and each hi l , if player i has j (l)(i) = G, then we have 0 8 include T( j (l)) < [hj ]) = R j (l) = B ^ j ( j (l); fj B ^ B : T(l) Pr B ^actionj (^l) = j (x(j); fjinclude [hj ]) = R for each ^l @ j xj ; fx(j) = x(i)g ; f i (l)(j) = Gg ; hi l l 9 1 = C C ; C 1 A 1 exp( T 3 ): (45) Proof. See Appendix A.9. Let us give an intuitive explanation. 
Recall that, given R with ^l n l ^ ^ ^ ) i (l)(j) = G ^ actionj (l) = R with l n o ^ ) j (l) = j (x(j)) : l See Figure 12 for how player i updates 16 l) include T(^ [hj ]) j (x(j); fj j (x(j)), = G, if player j has actionj (^l) = l, then we have ^ ^ i (l)(j) = G ^ actionj (l) = R with l n i (l)(j) then neither o o j (l)(i). since = R nor If player i listens to ^ j (l) = l) include T(^ [hj ]) j (x(j); fj 58 = B is absorbing by Figure 12 (46) is not de…ned if player j does not take l) include T(^ [hj ]) j (x(j); fj i (l)(j) j (x(j)). j (l) in the supplemental We de…ne that if = E is the case. ^ 6= j (l) round for j (l), then by Claim 2 of Lemma 6, if n Pr j( include T( [hj j (l); fj j (l)) ]) = E o j (l)(i) j 6= T( j (l); hi then j (l), j (l)) 1 1 T( exp( "message T 2 ): j (l)) Since player j’s continuation strategy does not depend on hj given j (l), this inequality is su¢ cient for (45) (for su¢ ciently large T ). Hence, we focus on the case where player i does not listen to j (l), 0 8 < j (l) = B B B : T(^ l) Pr B ^actionj (^l) = j (x(j); fjinclude [hj ]) = R for each ^l @ j xj ; fx(j) = x(i)g ; f i (l)(j) = Gg ; hi l Since player i does not listen to and l) include T(^ [hi ]) i (x(i); fi j (l), for each ^l ^ = 9 1 = C C ; C 1 A l 1 exp( T 3 ): ^ = G, we have 1 with i (l) ^ = i (l) i (x(i)) = R. First consider the case where player i has i (l)(j) l and will prove ^ = G for each ^l ^ = G. Unless actionj (^l) = E, player j takes j (l) we have just veri…ed that player i with l 1. For each ^l i (l) ^ = G has i (l) ^ = i (l) ^ = j (l) i (x(i)) j (x(j)). and l 1, suppose On the other hand, l) include T(^ [hi ]) i (x(i); fi =R in the case we are focusing on. Hence, unless actionj (^l) = E, we have 0 B B Pr B @ 0 B B = Pr B @ 1 Since 8 < 9 jTj = : _ (x(j); f include [hT(^l) ]) = E ; j j j n o n o T(^ l) j xj ; fx(j) = x(i)g ; i (^l)(j) = j (^l) = G ; actionj (^l) = R ; fi [hi ] 8 9 1 u < i (x(j); f include [hT(^l) ]) = jTj j j 4L C C ^ T( l) : _ (x(j); f include [h ]) = E ; C by (46) j j j A T(^ l) j xj ; fx(j) = x(i)g ; i (x(i)); j (x(j)); fi [hi ] T(^ l) include [hj ]) i (x(j); fj u 4L 1 C C C A exp( "review T ) by Claim 3 of Lemma 7. ^ = G and j (l) l) include T(^ [hj ]) i (x(j); fj u 4L jTj imply ^ + 1) = G by Figure 11, together j (l with the case of actionj (^l) = E, the above inequality implies 0 n Pr @ l) include T(^ ^ ^ [hj ]) = R j (l + 1) = B ^ actionj (l) = j (x(j); fj n o ^ ^ ^ j xj ; fx(j) = x(i)g ; i (l)(j) = j (l) = G ; hi l 59 o 1 A exp( "T ) (47) with a su¢ ciently small ". In order to prove (45), we are left to prove 0 n l) include T(^ ^ ^ [hj ]) = R j (l + 1) = B ^ actionj (l) = j (x(j); fj n o j xj ; fx(j) = x(i)g ; i (l)(j) = j (^l) = G ; hi l Pr @ o 1 A (48) is of order exp( T ) from (47). (The di¤erence from (47) is that, in (48), we condition on ^ G and hi l , rather than since j (l) i (l)(j) ^ = B is absorbing, we have If (48) holds for each ^l To this end, comparing (47) and (48), we need to deal with learning after review round ^l and conditioning on not only = G (and so ~ i (l)(j) = G for each ~l with probability at least ~ 2f j ( j (l)) (G); send j (B); player i cannot update the belief about i (l)(j) ~ = j (l) send j Given = G. j (x(j)) and ~ = j (l) ; j (x(j)) happen (M )g happens with probability at least 2 . Hence, ^ + 1) so much by observing ~ or j (l i (l)(j) j (l) ~ j ( j (l)) given = G does not change player i’s belief so much for the following reasons: Figure 12 with indices i and j reversed, as long as message about 17 i (l)(j) = G. 
Moreover, conditioning on m= = G but also regardless of player j’s history. Further, Figure 4 implies that, given send j ^ i (l)(j) l), learning after review round ^l is small for the following reasons: Figure 10 implies that both ~ any 9 1 = C C ; C 1 A 1, then this inequality implies (45), as desired. l We now prove (48). j (l), = = G and hi l .) To see why (48) is su¢ cient for (45), note that, 0 8 < j (l) = B B ^ B l) ^ l Pr B : ^actionj (^l) = j (x(j); fjinclude [hT( j ]) = R for each l @ j xj ; fx(j) = x(i)g ; f i (l)(j) = Gg ; hi l 9 1 0 8 = < ^ ( l + 1) = B j l 1 C B X B C T(^ l) ; : include ^ Pr B C: ^actionj (l) = j (x(j); fj [hj ]) = R A @ n o ^ l=1 j xj ; fx(j) = x(i)g ; i (l)(j) = j (^l) = G ; hi l i (l)(j) i (l)(j) ~ i (l+1) = G, player j listens to player i’s with probability at least .17 Claim 6 of Lemma 6 (with i and j reversed and ~ + 1)) implies that player i with hT( i i (l ~ i (l)(j) ~ i (l+1)) This is where we use the mixture in the inference of believes that any j (l). 60 ~ + 1)(f include [hT( i (l j j ~ i (l+1)) ]) 1 happens with probability exp( Kmessage T 2 ). possible with probability at least 1 Hence, player i believes that 1 exp( Kmessage T 2 ). T for a su¢ ciently large T , (47) implies (48), as desired. Since T 2 Second we consider the case where there exists ^l to ~ + 1)(j) = G is i (l ^ + 1) = B, that is, i (l l) include T(^ [hi ]) j (x(i); fi > l 2 such that player i switches from u 4L by Figure 11. Since player i has until review round ^l, the same proof as above implies that she believes that, given with a high probability, ~l ^ = G unless actionj (~l) = E or j (l) l) include T(~ [hj ]) j (x(j); fj ^ =G i (l) ~ =G i (l) ^ i (l)(j) = G, = E for some ^l.18 Moreover, if ^ = G, then since j (l) ^ = G and i (l) ^ i (l+1) = B imply l) include T(^ [hi ]) j (x(i); fi > u 4L by Figure 11, unless actionj (^l) = E, we have 0 n o 1 =E A n o n o Pr @ T(^ l) ^ ^ ^ j xj ; fx(j) = x(i)g ; j (l) = i (l)(j) = G ; actionj (l) = R ; fi [hi ] 0 1 o n l) include T(^ ]) = E (x(j); f [h j j j A = Pr @ T(^ l) j xj ; fx(j) = x(i)g ; i (x(i)); j (x(j)); fi [hi ] since player j takes 1 by (46) exp( "review T ) by Claim 4 of Lemma 7. In total, player i believes that some ~l j (x(j)) l) include T(^ [hj ]) j (x(j); fj (49) ^ = G unless actionj (~l) = E or j (l) ^l. Player i also believes that if l) include T(~ [hj ]) j (x(j); fj ^ = G, then actionj (^l) = E or j (l) = E for l) include T(^ [hj ]) j (x(j); fj = E. Hence, player i at the end of review round ^l believes that n Pr actionj (~l) = l) include T(~ [hj ]) j (x(j); fj = R for each ~l o n o ^l j xj ; i (^l)(j) = G ; h i is of order exp( T ). We can deal with learning after review round ^l and conditioning on ^ l i (l)(j) =G as above. ^l, she has i (~l) = G for each The precise argument goes as follows: Since player i has i (~l) = G for each ~l ~l ^l 1. Replacing l with ^l in the above argument, player j believes that j (^l) = G unless actionj (~l) = E or l) include T(~ ^l 1. Since “actionj (~l) = E or j (x(j); f include [hT(~l) ]) = E for some [hj ]) = E for some ~l j (x(j); fj j j ~l ^l 1” implies “actionj (~l) = E or j (x(j); f include [hT(~l) ]) = E for some ~l ^l”, the statement holds. 
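To summarize the transition rule of Section 13.4.4 in one place, the following minimal Python sketch mimics how player $i$ updates her inference of player $j$'s state from one review block to the next (cf. Figure 12). It is only illustrative: `listen_prob` is a placeholder for the small mixing probability used above, `regular` stands for the condition in case [Regular], and the formal definition in the text is the authoritative one.

```python
import random

def update_inference(current, regular, inferred_message, listen_prob):
    """Schematic transition of player i's inference of player j's state
    across review blocks (cf. Section 13.4.4 / Figure 12).

    current          -- player i's current inference, 'G' or 'B'
    regular          -- True if player i's own history so far looks 'regular'
                        (her state stayed G and every message inference was R)
    inferred_message -- player i's inference of player j's message, 'G' or 'B'
    listen_prob      -- placeholder for the small mixing probability with
                        which a 'regular' player listens to the message
    """
    if current == 'B':
        return 'B'                      # B is absorbing
    if regular:
        # mix: ignore with prob 1 - listen_prob, listen with prob listen_prob
        if random.random() < listen_prob:
            return inferred_message     # "listen": adopt the inferred message
        return 'G'                      # "ignore": keep G regardless of the message
    # irregular histories always listen
    return inferred_message
```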
18 j j 61 13.5 Full Support of texclude (r) j As will be seen, it will be useful (in the proof of Lemma 13) that player i believes that any texclude (r) j is possible in each round r, as long as player j’s state i (l)(j) has not switched to i (l)(j) = B: Lemma 10 For a su¢ ciently large T , for each i 2 I, xj 2 fG; Bg, round r, and player i’s history hi L , the following claim holds: Let li be the …rst review round with r satis…es r < li . (If i (l)(j) = B and suppose round i (l)(j) = G for each l = 1; :::; L, then we assume that each r = 1; :::; R satis…es r < li .) For each t 2 T(r), we have texclude (r) = t j xj ; f i (l)(j) = G for all l j Pr Proof. Let send j (B), j (r) send j rg ; hi be player j’s strategy in round r. Since r < li , (M ), mix j , j (x(j)), "action support = x(j)2fG;Bg2 ; j (r)2 n send j ; or (G); j L T j (r) 2 . send can be either j (G), (x(j)). Let send j min (B); send j (M ); ; mix ; j j (x(j)); j o (x(j)) ;aj 2Aj j (r)(aj ) >0 be the lower bound of the probability with which player j takes each action. (r) from T(r), there exists one s 2 T(r) with Since player j picks texclude j Pr texclude (r) = s j j j (r); hi L jT(r)j 1 T 1 : By Assumption 1, the conditional probability is always well de…ned. Hence, we are left to show that the likelihood between texclude (r) = t and texclude (r) = s is bounded by T j j Pr Pr texclude (r) = t j j texclude (r) j =s j j (r); hi L j (r); hi L T 1 1 for each t 2 T(r): (50) : Suppose that texclude (r) = s, (aj;s ; yj;s ) = (aj ; yj ), and (aj;t ; yj;t ) = (~ aj ; y~j ) is the case. Imagine j that player j changes texclude (r) from s to t: texclude (r) = t. At the same time, suppose that Nature j j changes (aj;s ; yi;s ) from (aj ; yj ) to (~ aj ; y~j ), and changes (aj;t ; yj;t ) from (~ aj ; y~j ) to (aj ; yj ). T(r) fjinclude [hj ] is unchanged. each round. Conditional on i (l)(j) Then, = G, player j takes fully mixed strategy in Hence, the likelihood between “(aj;s ; yj;s ) = (aj ; yj ) and (aj;t ; yj;t ) = (~ aj ; y~j )” and 2 “(aj;s ; yj;s ) = (~ aj ; y~j ) and (aj;t ; yj;t ) = (aj ; yj )” is uniformly bounded by "action support "support . Hence, 62 we have texclude (r) = t j j Pr texclude (r) j Pr =s j j (r); hi L j (r); hi L "action support "support 2 ; which implies (50) for a su¢ ciently large T , as desired. 14 Reward Function in the Coordination and Main Blocks Before de…ning the strategy in the report block, we de…ne player i’s reward function. In Section 14.1, we de…ne the variable j (l) 2 fG; Bg, on which the reward function depends, so that satis…es certain properties. Given the de…nition of TP +1 ), i (xj ; hj review i in (52); and adjust i and 14.1 which is the summation of report i is de…ned in (57). ( xj i i (xj ), De…nition of in review round l. review , i adjust , i and j (l) convenience, we write j (l) j (l) = B means that player j uses the type-3 reward for player i j (l) include <l [hj ]) j (l)(fj precisely speaking, for notational j (l). j (l) is relegated to Appendix A.10.1. 
In Appendix A.10.1, we will de…ne = B if (and only if, except for a small adjustment) at least one of the following happened; when player j received a message in a round r < l, happened; in some review round ^l T(^ l) review (x(j); fjinclude [hj ]) j ~ for ~l j (l) l 1, player j had ^ i (l)(j) = include T(r) [hj ]) j (m; fj include T(r) [hj ]) j (receive; fj ^ = G, took j (l) = E happened; or when player j inferred xi , xj , or ^ = j (l) = E j (x(j)), ~ with ~l i (l) =E l 1 l,19 the result of player j’s mixture was a rare event (for example, when she inferred xj , [Round (xj ; 3) xj (j)] happened. probability is de…ned 2 fG; Bg four cases happens: When player j sent a message m in a round r < l, or took i (xj ) depends on the frequency of player j’s history at the beginning of review The formal de…nition of and report . i has been de…ned in Lemma 2.) The remaining rewards round l, fjinclude [h<l j ]. Although we should write such that Section 14.2 de…nes the reward function will be de…ned in Lemma 13. As will be seen in (53) and (54), j (l) j (l), xj i , j (l) As seen in Figure 9, this event happens only with (rare). After [Round (xj ; 3) xj (j)], player j has 19 j (l) = B). Precisely speaking, since the mixture to determine j (l) happens at the beginning of review round l, j (l) depends on player j’s history at the beginning of review round l, h<l j , and her own mixture at the beginning of review round l. 63 There are following six implications of this de…nition. First, implies j (l j (l) = B is absorbing: j (l) =B + 1) = B. Second, the distribution of does not depend on player i’s strategy. Since we will de…ne j (l) that player j uses the type-3 reward if and only if j (l) = B, this ensures that player i cannot control whether player j uses the type-3 reward or not, as mentioned in Section 9.2. To see why this property is true, recall that Claims 3 and 4 of Lemma 6 ensure that the distribution of include T(r) [hj ]) j (m; fj include T(r) [hj ]) j (receive; fj and does not depend on player i’s strategy. T(^ l) review (x(j); fjinclude [hj ]) j Moreover, Claim 2 of Lemma 7 ensures that the distribution of does not depend on player i’s strategy.20 Finally, player j’s own mixture is out of player i’s control. Hence, in total, the distribution of does not depend on player i’s strategy. j (l) Moreover, recalling that E stands for erroneous (and so it is rare that the realization of a random variable is E), it is rare to have j This ensures that player j does not use the type-3 (l) = B. reward often, as mentioned in Section 9.2. Third, player j with player j has i (l)(j) To see why ~ i (l)(j) = G to i (l)(j) = B has j (l) = B. This ensures that player i can condition that = G since otherwise the type-3 reward makes all the strategies optimal. i (l)(j) = B implies j (l) = B, take ~l 1 such that player j switched from l ~ + 1)(j) = B. As seen in Figure 12 with indices i and j reversed, this means i (l that player j listened to player i’s message ~ + 1). If [Regular i (l i (j)] was the case (see Section 13.4.4 with indices i and j reversed), then listening to player i’s message is the rare realization of player j’s own mixture and so If [Regular or had ^ i (l)(j) i (j)] j (l) = B. is not the case, then player j with l) include T(^ [hj ]) j (x(j); fj = E for some ^l ~l. Since player j had = G. Hence, if she did not take her own mixture for ^ j (l). 
In total, player j has Fourth, given j (l) If j (l) j (x(j)) in l) include T(^ [hj ]) j (x(j); fj In other words, if actionj (l) = E, then l, then j (l) j (x(j)) = G, she also had i (l)(j) = E, then player j has j (x(j)) j (l) ~ ^ = j (l) review round ^l, then this is a rare realization of = B whenever player j switched from = G, player j takes actionj (^l) = E for some ^l ^ = G did not take j (l) = B. if j (l) ~ i (l)(j) j (l) = G to = G and takes Moreover, since = B by de…nition. j (l) ; j ~ i (l+1)(j) (x(j)) if j (l) = B. = B. = B is absorbing, if = B. To see why, note that the third property implies that given j (l) = G, player j has i (l)(j) = G. T(^ l) Since review (x(j); fjinclude [hj ]) is de…ned only if player j takes j (^l) = j (x(j)), we need some adjustment so j that player i does not want to control the distribution of j (l) by changing the distribution of j (^l). 20 64 Given takes i (l)(j) ; j = G, Figure 10 with indices i and j reversed ensures that the event that player j (x(j)) with j (l) = G or takes no more than ). Since we have j (l) j (x(j)) with j (l) = B is rare (happens with probability = B after a rare realization, the result follows. Fifth, if player j has xj (j) 6= xj , then we have j (l) = B. Recall that player j’s state xj controls player i’s payo¤ (see (5) for example). This property means that when player j does not adhere to her own state (“gives up”controlling player i’s payo¤), player i is indi¤erent between any action. Figure 9 ensures that, if xj (j) 6= xj , then either include T(xj ;2) [hj ]) j (xj ; fj = E or [Round (xj ; 3) xj (j)] (rare realization of player j’s mixture) happens. Hence, xj (j) 6= xj implies Finally, suppose that player i conditions on conditioning is optimal). j (l) = G or j (l) j (l) = B. = G (see the third property for why this i (l)(j) Player i believes that, if she has x(i) 6= x(j) or = G, then j (l)(i) = B with a high probability. That is, as seen in (26), player i believes that, if the coordination does not go well, then player j uses the type-3 reward with a high probability. To see why player i holds such a belief, we …rst explain that, if player i has x(i) 6= x(j), then she believes that j (l) = B with a high probability. x(i) 6= x(j) happens only if one of the following two cases happens: xj (i) 6= xj (j) or xi (i) 6= xi (j). We consider these two cases in the sequel. Given (39) in Lemma 8, if player i has xj (i) 6= xj (j), then given xj and xj (j), player i believes include T(xj ;1) [hj ]) j (xj ; fj that [Round (xj ; 3) xj (j)] (a rare realization of player j’s mixture), include T(xj ;2) [hj ]) j (xj ; fj = E, or = E at the end of coordination block for xj . Since player j’s continuation strategy does not depend on her history in the coordination block for xj conditional on xj (j), this belief means that, in review round l, player i believes that j (l) = B. Again, given (40) in Lemma 8 with indices i and j reversed, if player i has xi (i) 6= xi (j), then given xi (j), player i believes that [Round (xi ; 1) xi (j)] (a rare realization of player j’s mixture), include T(xi ;2) [hj ]) j (receive; fj = E, or include T(xi ;3) [hj ]) j (xi (j); fj = E at the end of coordination block for xi . 
Since player j’s continuation strategy does not depend on her history in the coordination block for xi given xi (j), this belief means that, in review round l, player i believes that Hence, we are left to show that, given x(i) = x(j), if player i has that j (l) = G or j (l) with x(i) = x(j) and j (l)(i) = G, then she believes include T( [hj j (l) = G, j ( j (l); fj ^ include T(l) [hj ]) = E for some ^l l j (x(j); fj = G believes that player j has l 1, or seen in the fourth property, actionj (^l) = E for some ^l j (l)) = B. = B with a high probability. Recall that Lemma 9 ensures that player i E, actionj (^l) = E for some ^l include T( [hj j ( j (l); fj j (l)(i) j (l) ]) = E and l) include T(^ [hj ]) j (x(j); fj 65 l 1 means = E imply j (l) j (l) = B. j (l)) 1. ]) = As In addition, = B by de…nition. Hence, Lemma 9 implies that player i believes that player j has j (l) = G or j (l) = B with a high probability. In total, we prove the following lemma: Lemma 11 For each j 2 I and l 2 f1; :::; Lg, we can de…ne a random variable j (l) 2 fG; Bg whose distribution is determined by player j’s history h<l j such that the following properties hold: 1. j (l) = B is absorbing: j (l) 2. For each xj 2 fG; Bg and j (xj ); Pr (f j i ), i =B) 2 i, j (l + 1) = B. the distribution of does not depend on player i’s strategy (l) = Bg j 3. For each hj l with j (xj ); i (l)(j) i) i = G and takes ; j 2 denoted by Pr( i. Moreover, j (l) j j (l) = B is rare: (15 + 8L) . = B, we have j (l) = B. 4. For each j 2 I and l 2 f1; :::; Lg, for each hj l with j (l) j (l), (x(j)) if j (l) j (l) = G, player j takes j (x(j)) if = B. 5. For each j 2 I and l 2 f1; :::; Lg, for each hj l with xj (j) 6= xj , we have j (l) = B. 6. For a su¢ ciently large T , for each i 2 I, xj 2 fG; Bg, l 2 f1; :::; Lg, and hi l , if x(i) 6= x(j) or j (l)(i) j (l) = G with a positive probability with hi l , then player i believes that = B with a high probability conditional on xj , x(j), and Pr fx(i) 6= x(j) _ j (l)(i) i (l)(j) = Gg j xj ; x(j); hi l j (l) = G or = G: If hi l satis…es > 0; then we have Pr f j (l) = G _ j (l) = Bg j xj ; f i (l)(j) = Gg ; hi l 1 1 exp( T 3 ): Proof. See Appendix A.10. 14.2 De…nition of the Reward Function We are now ready to de…ne player i’s reward function, given the above de…nition of reward is the summation of (i) a constant i (xj ), j (l). The total (ii) the reward for rounds other than review rounds, 66 PR r=1:round r is not a review round and adjust , i P t2T(r) xj i (aj;t ; yj;t ), (iii) the reward for review round l = 1; :::; L, report : i and (iv) the reward for the report block TP +1 ) i (xj ; hj = i (xj ) R X + review i X xj i (aj;t ; yj;t ) r=1 t2T(r) round r is not a review round + + L n X T(l) review (xj ; fjinclude [hj l ]; fj [hj ]; l) i + adjust (xj ; hj l ; hreport ; l) i j l=1 report (xj ; hj L ; hreport ): i j o (51) We de…ne and explain each term in the sequel. First, we de…ne the constant i (xj ) = TP vi (xj ) i (xj ) 8 hP L < E l=1 1f : such that x T ui (xj ) + 1f j (l)=Bg sign (xj ) LT u + T ui j j (l)=Gg o n 2 1 x 2 3 ui j + (2 + 2L) T + 2T i 9 j xj = ; : (52) We de…ne this constant so that player i’s average value from the review phase is equal to vi (xj ) in order to satisfy promise keeping. As will be seen in (85), the value in the biggest parenthesis is the value (without being divided by TP ) from the coordination, main, and report block. 
Second, the term R X X xj i (aj;t ; yj;t ) r=1 t2T(r) round r is not a review round cancels out the e¤ect of the instantaneous utilities on the incentives for the rounds which are not review rounds. This reward incentivizes the players to take actions in order to exchange messages in the coordination block and supplemental rounds. Third, we de…ne if i (l)(j) review i = B (recall that such that i (l)(j) review i satis…es the properties mentioned in Section 9. Firstly, = B implies sign (xj ) LT u + j (l) X = B), then player j uses the type-3 reward xj i (aj;t ; yj;t ): (53) t2T(l) Compared to (25) in Section 9, we add sign(xj ) LT u. Since sign(xj ) sign (xj ) LT u = LT u, this term gives us enough slack in self generation once the type-3 reward is used. 67 Secondly, if i (l)(j) = G, then if j (l) = B, then again player j uses the type-3 reward sign (xj ) LT u + X xj i (aj;t ; yj;t ): (54) t2T(l) If i (l)(j) = j (l) issue or not). If = G, then the reward function depends on j (l) (whether self generation is an = G (self generation is not an issue), then player j uses the type-1 reward T fui (xj ) ui ( (x(j)))g + X (x(j))](aj;t ; yj;t ) is ui ( i[ (55) (x(j))](aj;t ; yj;t ): t2T(l) By Lemma 2, any strategy of player i is optimal. ui (at )+ i [ j (l) In addition, since the expected payo¤ from (x(j))), the value from the review round (without being divided by T or TP ) is T ui (xj ). Here, we ignore the e¤ect of her strategy in review round l on the payo¤s from the subsequent rounds. See Section 16 for the formal proof of why it is optimal for player i to ignore this e¤ect. On the other hand, if j (l) = B (self generation is an issue), then player j uses the type-2 reward T fui (xj ) By Claim 4 of Lemma 11, player j with ui (BRi ( j (l) ; j ; (x(j))); = G and j j (l) (56) (x(j)))g: ; = B takes (x(j)). Since the reward j is constant, as long as the coodination goes well and player i has x(i) = x(j) and ; player i’s equilibrium strategy, which takes a static best response to addition, since the expected payo¤ from ui (at ) is ui (BRi ( ; j (x(j))); review round (without being divided by T or TP ) is T ui (xj ). 68 j ; j j (l)(i) = B, (x(j)), is optimal. In (x(j))), the value from the review i In total, we de…ne the reward as follows: T(l) review (xj ; fjinclude [hj l ]; fj [hj ]; l) i 8 < X = 1f i (l)(j)=Bg sign (xj ) LT u + : t2T(l) | {z +1f i (l)(j)=Gg 8 > > > > > > > > > > > > 1f > > > > < j (l)=Gg > > > > > > > > > > > > > > > > : 8 > > > > > > > > < (53) 1f (57) 9 = xj i (aj;t ; yj;t ) ; } j (l)=Gg > > > > > +1f > > > : 8 < : + | T fui (xj ) P ui ( (x(j)))g (x(j))](aj;t ; yj;t ) ; {z } i[ t2T(l) 9 > > > > > > > > = 9 = 9 > > > > > > > > > > > > > > > > = > > > > > T fu (x ) u (BR ( (x(j))); (x(j)))g > : i j i i j j j (l)=Bg > | {z } > ; > > (56) > 9 8 > > > = < > X x > j > > (a ; y ) +1f j (l)=Bg sign (xj ) LT u + j;t j;t > i > ; : > > t2T(l) > > | {z } ; (55) ; ; (54) Note that x(j), i (l)(j), j (l), and j (l) are determined by xj , fjinclude [hj l ], and player j’s own mixP P x T(l) ture, and t2T(l) i j (aj;t ; yj;t ) and t2T(l) i [ (x(j))](aj;t ; yj;t ) are determined by fj [hj ]. Hence, review i T(l) is a function of xj , fjinclude [hj l ], fj [hj ], and l. Given this de…nition of review , i Player i has x(i) 6= x(j) or j (l)(i) by Claim 3 of Lemma 11) and j (l)(i) = B but player j has of player i is optimal given player i’s strategy is optimal unless the coordination goes wrong: = G even though player j has j (l) j (l) j (l) j (l) = G (this implies i (l)(j) =G = B. 
One may wonder what if player i has x(i) 6= x(j) or = j (l) = G. This case is not a problem since any strategy = G. To see why, note that Lemma 2 ensures that, with type-1 reward (55), any strategy is optimal. Let x(i) and i (x(j); l) j (l)(i) <l be the set of player i’s histories fi [h<l i ] such that player i with fi [hi ] has x(j) = = B with probability one: i (x(j); l) 8 9 exclude < = For any realization of ti (r) for r < l, fi [h<l ] : : i : ; player i with fi [h<l ] has x(j) = x(i) and (l)(i) = B j i 69 The discussion above means that, if and only if i (l)(j) =G^ j (l) =G^ j (l) = B ^ fi [h<l i ] 62 T(l) review (xj ; fjinclude [hj l ]; fj [hj ]; l). i player i’s strategy would be suboptimal if the reward for review round l were target i If player j used the following reward function would be optimal after each history: Again, if the type-3 reward (53) or (54), respectively. uses the type-1 reward (55) as before. fi [h<l i ] 62 i (x(j); l)”, If review , i rather than = B or i (l)(j) i (l)(j) In addition, if = j (l) i (l)(j) j (l) i (x(j); l)”, then player i’s strategy = B, then player j uses = G and = j (l) then player j uses the type-1 reward as well. “ j (l) = B and fi [h<l i ] 2 (58) i (x(j); l); If = G, player j j (l) = G and “ j (l) = B and i (l)(j) = j (l) = G and player j uses the type-2 reward (56). Intuitively, whenever player i’s history satis…es (58), player j uses the type-1 reward rather than type-2. Since Lemma 2 ensures that the type-1 reward makes any strategy of player i optimal, with such a reward function, player i’s strategy would be optimal after each history. In total, target i is de…ned as T(l) target (xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l) i = 1f Since i (l)(j)=Bg 8 < : sign (xj ) LT u + X t2T(l) (59) 9 = xj i (aj;t ; yj;t ) ; +1f i (l)(j)=Gg 8 8 > > 1 > > <l > > f j (l)=G_f j (l) = B ^ fi [hi ] 62 i (x(j); l)gg > > > > | {z } > > > > > > th e c a se w ith w h ich > > > review > is su b o p tim a l < > i 8 9 > > < 1f j (l)=Gg < T fu (x ) u ( (x(j)))g = i j i > > > P > > : + > > > (x(j))](aj;t ; yj;t ) ; > t2T(l) i [ > > > > > > > > : +1 > fui (xj ) ui (BRi ( j ; (x(j))); > > f j (l)=B^fi [h<l i ]2 i (x(j);l)g > n o > P > x : +1f j (l)=Bg sign (xj ) LT u + t2T(l) i j (aj;t ; yj;t ) target i is a function of whether fi [h<l i ] is included in i (x(j); l) 9 > > > > > > > > > > = ; j > > > > > > > > > > (x(j)))g ; 9 > > > > > > > > > > > > > = > > > > > > > > > > > > > ; : or not, this reward is a function of fi [h<l i ] as well. Instead of replacing to review : i The reward review i adjust , i with target , i we add the expected di¤erence between which is added to review i 70 review i and target i in (51), is de…ned so that the expected value of adjust i adjust (xj ; hj l ; hreport ; l) i j 13, we will de…ne E = E if i (l)(j) review i review i is equal to the expected di¤erence between = G and and target i h h report jh i i and adjust (xj ; hj l ; hreport ; l) i j L target . i and In particular, in Lemma so that j xj ; f i (l)(j) = Gg ; report jh i i l ; hi T(l) target (xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l) i l i j xj ; f i (l)(j) = Gg ; hi i h T(l) l l include E review (x ; f [h ]; f [h ]; l) j x ; f (l)(j) = Gg ; h j j j j i i j j i adjust (xj ; hj l ; hreport ; l) i j is zero if is identically equal to zero if i (l)(j) i (l)(j) l i (60) = B. (Note that the di¤erence between report jh i i = B.) Here, l distribution of player i’s strategy in the report block given hi l . 
This property of is the conditional adjust i corresponds to (28) in Section 9.4. We postpone the de…nition of 14.3 adjust i and report i to Lemma 13. target i Small Expected Di¤erence between and review i Recall that Claim 6 of Lemma 11 ensures that player i who has x(i) 6= x(j) or that it is rare for player j to have the di¤erence between target i and j (l) = B and review i adjust ) i = G given i (l)(j) = G believes = G. On the other hand, is not zero only if player i has x(i) 6= x(j) or with a positive probability and player j has (and so the expected value of j (l) j (l)(i) j (l) = B and j (l) j (l)(i) =G = G. Hence, the expected di¤erence is close to zero, conditional on i (l)(j) = G. This closeness will play an important role when we de…ne player i’s strategy in the report block in Lemma 13. Lemma 12 For a su¢ ciently large T , for each i 2 I, l 2 f1; :::; Lg, xj 2 fG; Bg, and hi l , the di¤erence (60) depends only on the frequency of player i’s history: i j xj ; f i (l)(j) = Gg ; hi l h i T(l) l l include (x ; f [h ]; f [h E review ]; l) j x ; f (l)(j) = Gg ; h j j j j i i j j i h i T(l) target l l l include = E i (xj ; fi [hi ]; fj [hj ]; fj [hj ]; l) j xj ; f i (l)(j) = Gg ; fi [hi ] h i T(l) l l review include E i (xj ; fj [hj ]; fj [hj ]; l) j xj ; f i (l)(j) = Gg ; fi [hi ] E h T(l) target (xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l) i 71 Moreover, this di¤erence is su¢ ciently small: E h i j xj ; f i (l)(j) = Gg ; fi [hi ] h i T(l) l l review include E i (xj ; fj [hj ]; fj [hj ]; l) j xj ; f i (l)(j) = Gg ; fi [hi ] T(l) target (xj ; fi [hi l ]; fjinclude [hj l ]; fj [hj ]; l) i Proof. Both i (l)(j) review i target i and l 1 exp( T 4 ): T(l) depend only on xj , fi [hi l ], fjinclude [hj l ], and fj [hj ]. In addition, (r) is random, fi [hi l ] is a su¢ cient statistic. depends only on fjinclude [hj l ]. Since texclude j Recall that fi [h<l i ] 62 i (x(j); l) target i and review i di¤er only if Since both review i and Comparing (57) and (59), B ^ fi [h<l i ] 62 i (x(j); l). conditional on xj , x(j), player i believes that i (l)(j) j (l) implies that player i has either x(i) 6= x(j) or j (l) j (l) j (l) = G^ = G. j (l) = are of order T , it su¢ ces to show that, = G, and fi [hi l ], if player i has x(i) 6= x(j) or = B or Pr f j (l) = B _ target i = G^ i (l)(j) j (l)(i) j (l)(i) = G, then = G with a high probability: = Gg j xj ; x(j); f i (l)(j) = Gg ; hi l 1 1 exp( T 3 ): Claim 6 of Lemma 11 establishes the result. 15 Report Block We now de…ne i (xi ) report jh i i L , and reward function adjust , i and report i in Lemma 13. This …nishes de…ning the strategy TP +1 ): i (xj ; hj Lemma 13 There exists Kreport 2 N such that there exist adjust (xj ; hj l ; hreport ; l) i j 2 h report jh i i L , adjust (xj ; hj l ; hreport ; l) i j i exp( T ); exp( T ) ; 1 5 1 5 11 11 and report (xj ; hj L ; hreport ) i j 2 [ Kreport T 12 ; Kreport T 12 ] such that, for su¢ ciently large T , for each xj 2 fG; Bg and hi L , 72 with 1. report jh i i E " L maximizes X ui (at ) + report (xj ; hj L ; hreport ) i j + t:report block L X adjust (xj ; hj l ; hreport ; l) i j l=1 j xj ; hi L # : (61) 2. 
From player $i$'s perspective at the end of review round $l$, the expected adjustment is equal to (60):
$$E\Bigl[\pi_i^{\mathrm{adjust}}(x_j,h_j^{\,l},h_j^{\mathrm{report}},l)\Bigm|\sigma_i^{\mathrm{report}}|h_i^{\,l},\ h_i^{\,l}\Bigr]=E\Bigl[\pi_i^{\mathrm{target}}(x_j,f_i[h_i^{\,l}],f_j^{\mathrm{include}}[h_j^{\,l}],f_j[h_j^{T(l)}],l)\Bigm|x_j,\ h_i^{\,l}\Bigr]-E\Bigl[\pi_i^{\mathrm{review}}(x_j,f_j^{\mathrm{include}}[h_j^{\,l}],f_j[h_j^{T(l)}],l)\Bigm|x_j,\ h_i^{\,l}\Bigr]\qquad(62)$$
if player $j$'s inference of player $i$'s state in review round $l$ is $G$ (each expectation above also conditions on this event), and the adjustment is equal to zero if that inference is $B$.

3. The equilibrium value satisfies
$$E\Bigl[\,\sum_{t:\ \mathrm{report\ block}}u_i(a_t)+\pi_i^{\mathrm{report}}(x_j,h_j^{\,L},h_j^{\mathrm{report}})\Bigm|x_j,\ \sigma_i^{\mathrm{report}}|h_i^{\,L},\ h_i^{\,L}\Bigr]=0.\qquad(63)$$

Proof. See Appendix A.11.

Let us intuitively explain why this lemma is true. To this end, we first assume that the players have access to a public randomization device and can communicate via cheap talk in Section 15.1. Then we explain how to dispense with cheap talk, keeping the public randomization device, in Section 15.2. Finally, we explain how to dispense with the public randomization device in Section 15.3. The formal proof in Appendix A.11 uses neither a public randomization device nor cheap talk.

15.1 Report Block with Cheap Talk

Intuitively, player $i$ sends $f_i[h_i^{\,L}]=(f_i[h_i^{T(r)}])_{r=1}^{R}$ to player $j$ so that player $j$ can calculate $\pi_i^{\mathrm{adjust}}$ to satisfy (62). To this end, it will be useful to divide each round $r$ into $|T(r)|^{\frac13}$ subrounds of equal length; that is, each subround lasts for $|T(r)|^{\frac23}$ periods. Let $t(r)+1$ be the first period of round $r$: $T(r)=\{t(r)+1,\dots,t(r)+|T(r)|\}$. Subround $k(r)$ of round $r$ consists of $T(r,k(r))\equiv\{t(r)+(k(r)-1)|T(r)|^{\frac23}+1,\dots,t(r)+k(r)|T(r)|^{\frac23}\}$. Let $f_i[h_i^{T(r,k(r))}]$ be the frequency of player $i$'s history in subround $k(r)$ of round $r$:
$$f_i[h_i^{T(r,k(r))}]\equiv\Biggl(\frac{\#\{t\in T(r,k(r)):a_{i,t}=a_i,\ y_{i,t}=y_i\}}{|T(r)|^{\frac23}}\Biggr)_{(a_i,y_i)\in A_i\times Y_i}.$$

Figure 13: Rounds and subrounds

See Figure 13 for illustration.

15.1.1 Player $i$'s Strategy $\sigma_i^{\mathrm{report}}|h_i^{\,L}$

At the beginning of the report block, the players draw a public randomization device, which determines whether player 1 or player 2 sends the message $f_i[h_i^{\,L}]$. Each player is picked with probability $\frac12$ and only one player is picked at a time (that is, with public randomization, only one player sends the history). Suppose player $i$ is picked.

For each round $r$, player $i$ sends the frequency of each subround, $(f_i[h_i^{T(r,k(r))}])_{k(r)=1}^{|T(r)|^{1/3}}$. Then, for each round $r$, the players draw a public randomization device, which picks one subround $k(r)$ randomly with probability $|T(r)|^{-\frac13}$ each. Suppose that $k_i(r)\in\{1,\dots,|T(r)|^{\frac13}\}$ is picked for round $r$. Then player $i$ sends the entire history of the picked subround $k_i(r)$: $(a_{i,t},y_{i,t})_{t\in T(r,k_i(r))}$. (The reason why player $i$ sends the message this way, rather than simply sending $(a_{i,t},y_{i,t})_{t\in T(r)}$, is to reduce the cardinality of the message, looking ahead to dispensing with cheap talk.)

15.1.2 Reward Functions $\pi_i^{\mathrm{report}}$ and $\pi_i^{\mathrm{adjust}}$

Player $j$ incentivizes player $i$ to tell the truth by $\pi_i^{\mathrm{report}}$ as follows. First, player $j$ incentivizes player $i$ to tell the truth about $(a_{i,t},y_{i,t})_{t\in T(r,k_i(r))}$ by giving the reward equal to
$$-T^{-11}\sum_{t\in T(r,k_i(r))}\mathbf{1}_{\{t=t_j^{\mathrm{exclude}}(r)\}}\,\mathbf{1}_{\{r<l_i\}}\,\bigl\|\mathbf{1}_{a_{j,t},y_{j,t}}-E\bigl[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),\hat a_{i,t},\hat y_{i,t}\bigr]\bigr\|^2.\qquad(64)$$
In general, $\mathbf{1}_{\{X\}}$ is equal to one if statement $X$ is true and zero otherwise. Here, $\mathbf{1}_{\{t=t_j^{\mathrm{exclude}}(r)\}}=1$ if and only if $t=t_j^{\mathrm{exclude}}(r)$; and $\mathbf{1}_{\{r<l_i\}}$ is equal to one if and only if round $r$ is before (and not equal to) review round $l_i$ (recall that $l_i$ is the first review round in which player $j$'s inference of player $i$'s state is $B$).
We define $\mathbf{1}_{\{r<l_i\}}=1$ for each $r=1,\dots,R$ if player $j$'s inference of player $i$'s state is $G$ for each $l=1,\dots,L$. In addition, $\mathbf{1}_{a_{j,t},y_{j,t}}$ is the $|A_j|\,|Y_j|$-dimensional vector whose element corresponding to $(a_{j,t},y_{j,t})$ is one and whose other elements are equal to zero. Moreover, $\sigma_j(r)$ is player $j$'s strategy in round $r$ and $(\hat a_{i,t},\hat y_{i,t})$ is player $i$'s message. The expression $-\bigl\|\mathbf{1}_{a_{j,t},y_{j,t}}-E[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),\hat a_{i,t},\hat y_{i,t}]\bigr\|^2$ is called a scoring rule in statistics, and it incentivizes player $i$ to tell the truth about $(\hat a_{i,t},\hat y_{i,t})$ if player $i$ believes that $(a_{j,t},y_{j,t})$ is distributed according to $\Pr(\,\cdot\mid\sigma_j(r),a_{i,t},y_{i,t})$. Moreover, given Assumption 4, the incentive is strict if $\sigma_j(r)$ is a fully mixing strategy. (See Lemma 46 for the formal proof.)

Here, player $j$ punishes player $i$ based on the period $t_j^{\mathrm{exclude}}(r)$. Recall that player $j$ does not use $(a_{j,t},y_{j,t})$ with $t=t_j^{\mathrm{exclude}}(r)$ to determine her continuation strategy. Hence, player $i$ cannot learn $(a_{j,t},y_{j,t})$ and believes that $(a_{j,t},y_{j,t})$ is distributed according to $\Pr(\,\cdot\mid\sigma_j(r),a_{i,t},y_{i,t})$. By Lemma 10, with Assumption 1, player $i$ cannot learn which period is $t_j^{\mathrm{exclude}}(r)$. Together with the fact that $\sigma_j(r)$ is fully mixing as long as player $j$'s inference of player $i$'s state is $G$, player $i$ has a strict incentive to tell the truth about $(a_{i,t},y_{i,t})$ for each $t\in T(r,k_i(r))$ if $r<l_i$.

Second, we make sure that (64) does not affect player $i$'s incentive in the coordination and main blocks. At the timing when player $i$ takes $a_{i,t}$, the expected reward, given that player $i$ will tell the truth in the report block, is
$$-T^{-11}\,\mathbf{1}_{\{t=t_j^{\mathrm{exclude}}(r)\}}\,\mathbf{1}_{\{r<l_i\}}\,E\Bigl[\bigl\|\mathbf{1}_{a_{j,t},y_{j,t}}-E\bigl[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),a_{i,t},y_{i,t}\bigr]\bigr\|^2\Bigm|\sigma_j(r),a_{i,t}\Bigr]$$
given $t_j^{\mathrm{exclude}}(r)$, $l_i$, and $\sigma_j(r)$. Note that the conditional expectation $E[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),a_{i,t},y_{i,t}]$ depends on $(a_{i,t},y_{i,t})$, and the distribution of $y_{i,t}$ depends on $(\sigma_j(r),a_{i,t})$. Hence, the conditional expectation given $(\sigma_j(r),a_{i,t})$ (but before observing $y_{i,t}$) depends on $(\sigma_j(r),a_{i,t})$. Since player $j$'s signal $y_{j,t}$ statistically identifies player $i$'s action $a_{i,t}$, there exists $\pi_i^{\mathrm{cancel}}[\sigma_j(r)](a_{j,t},y_{j,t})$ such that
$$E\bigl[\pi_i^{\mathrm{cancel}}[\sigma_j(r)](a_{j,t},y_{j,t})\mid\sigma_j(r),a_{i,t}\bigr]=E\Bigl[\bigl\|\mathbf{1}_{a_{j,t},y_{j,t}}-E\bigl[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),a_{i,t},y_{i,t}\bigr]\bigr\|^2\Bigm|\sigma_j(r),a_{i,t}\Bigr].$$
To cancel out the effect of (64) on player $i$'s incentive to take actions, player $j$ gives the reward
$$T^{-11}\,\mathbf{1}_{\{t=t_j^{\mathrm{exclude}}(r)\}}\,\mathbf{1}_{\{r<l_i\}}\,\pi_i^{\mathrm{cancel}}[\sigma_j(r)](a_{j,t},y_{j,t}).\qquad(65)$$
Third, player $j$ punishes player $i$ if player $i$'s message about the frequency $f_i[h_i^{T(r,k_i(r))}]$ of the picked subround is not compatible with player $i$'s message about the history in this subround:
$$-T^{-15}\,\mathbf{1}_{\{r<l_i\}}\,\mathbf{1}_{\bigl\{\hat f_i[h_i^{T(r,k_i(r))}]\neq|T(r)|^{-\frac{2}{3}}\sum_{t\in T(r,k_i(r))}\mathbf{1}_{\hat a_{i,t},\hat y_{i,t}}\bigr\}}.\qquad(66)$$
A variable with a hat denotes player $i$'s message. Since $|T(r)|^{\frac{2}{3}}$ is the length of the subround, $\hat f_i[h_i^{T(r,k_i(r))}]$ and $|T(r)|^{-\frac{2}{3}}\sum_{t\in T(r,k_i(r))}\mathbf{1}_{\hat a_{i,t},\hat y_{i,t}}$ should be equal to each other if player $i$ tells the truth.
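The quadratic penalty in (64) is a proper ("Brier") scoring rule: its expectation is uniquely maximized by reporting the pair $(\hat a_{i,t},\hat y_{i,t})$ that induces the true conditional distribution of $(a_{j,t},y_{j,t})$. The following Python snippet illustrates this with made-up distributions; the numbers are purely illustrative and are not taken from the model.

```python
import numpy as np

def expected_brier_score(p, q):
    """Expected value of -||1_X - q||^2 when X ~ p.
    This is the quadratic scoring rule used in (64): over q it is
    uniquely maximized at q = p."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return -(1.0 - 2.0 * p @ q + q @ q)

# Illustrative (made-up) conditional distributions of (a_j, y_j):
# p_truth = Pr(. | sigma_j(r), a_i, y_i) under player i's true (a_i, y_i)
# p_lie   = the conditional distribution induced by a false report
p_truth = np.array([0.5, 0.3, 0.2])
p_lie   = np.array([0.2, 0.5, 0.3])

print(expected_brier_score(p_truth, p_truth))  # truthful report
print(expected_brier_score(p_truth, p_lie))    # lie: strictly lower
```

Reporting a pair whose induced conditional distribution differs from the true one therefore lowers the expected score, which is the strict incentive invoked above.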
In total, we have
$$\pi_i^{\mathrm{report}}(x_j,h_j^{\,L},h_j^{\mathrm{report}})=\sum_{r=1}^{R}\mathbf{1}_{\{r<l_i\}}\left\{\begin{aligned}&-T^{-11}\sum_{t\in T(r,k_i(r))}\mathbf{1}_{\{t=t_j^{\mathrm{exclude}}(r)\}}\bigl\|\mathbf{1}_{a_{j,t},y_{j,t}}-E\bigl[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),\hat a_{i,t},\hat y_{i,t}\bigr]\bigr\|^2\\&+T^{-11}\sum_{t\in T(r,k_i(r))}\mathbf{1}_{\{t=t_j^{\mathrm{exclude}}(r)\}}\,\pi_i^{\mathrm{cancel}}[\sigma_j(r)](a_{j,t},y_{j,t})\\&-T^{-15}\,\mathbf{1}_{\bigl\{\hat f_i[h_i^{T(r,k_i(r))}]\neq|T(r)|^{-\frac23}\sum_{t\in T(r,k_i(r))}\mathbf{1}_{\hat a_{i,t},\hat y_{i,t}}\bigr\}}\end{aligned}\right\}.$$
Finally, given player $i$'s messages about the frequencies of the subrounds, $(\hat f_i[h_i^{T(r,k(r))}])_{k(r)=1}^{|T(r)|^{1/3}}$, player $j$ calculates the frequency of the round:
$$\hat f_i[h_i^{T(r)}]=|T(r)|^{-\frac13}\sum_{k(r)=1}^{|T(r)|^{1/3}}\hat f_i[h_i^{T(r,k(r))}].\qquad(67)$$
For each $l$, from $\hat f_i[h_i^{T(1)}],\dots,\hat f_i[h_i^{T(r)}]$ with round $r$ being review round $l$, player $j$ defines
$$\pi_i^{\mathrm{adjust}}(x_j,h_j^{\,l},h_j^{\mathrm{report}},l)=2\cdot\mathbf{1}_{\{\text{player }j\text{'s inference of player }i\text{'s state is }G\}}\Bigl\{E\bigl[\pi_i^{\mathrm{target}}\bigm|x_j,\hat f_i[h_i^{\,l}]\bigr]-E\bigl[\pi_i^{\mathrm{review}}\bigm|x_j,\hat f_i[h_i^{\,l}]\bigr]\Bigr\}$$
so that (62) holds with truthtelling. Lemma 12 ensures that the expected difference between $\pi_i^{\mathrm{target}}$ and $\pi_i^{\mathrm{review}}$ depends only on the frequency of player $i$'s history. Here, the factor $2$ cancels out the probability with which player $i$ is selected by the public randomization (we define the adjustment to be zero for the player who is not selected). Note that we have
$$\bigl|\pi_i^{\mathrm{adjust}}(x_j,h_j^{\,l},h_j^{\mathrm{report}},l)\bigr|\le\exp(-T^{\frac14})\qquad(68)$$
for each $x_j$, $h_j^{\,l}$, $h_j^{\mathrm{report}}$, and $l$ by Lemma 12.

15.1.3 Incentive Compatibility

Consider player $i$'s incentive to tell the truth about the history in round $r$. If $\mathbf{1}_{\{r<l_i\}}=0$, that is, if round $r$ is in review round $l_i$ (where player $j$'s inference of player $i$'s state is $B$) or after, then $\pi_i^{\mathrm{report}}(x_j,h_j^{\,L},h_j^{\mathrm{report}})$ does not depend on player $i$'s message about round $r$. Moreover, the message about round $r$ does not affect $\pi_i^{\mathrm{adjust}}$ either. To see why, note that, for each $l<l_i$, $\pi_i^{\mathrm{adjust}}(x_j,h_j^{\,l},h_j^{\mathrm{report}},l)$ does not depend on player $i$'s message about round $r$ since round $r$ is after review round $l$. For each $l\ge l_i$, we have $\pi_i^{\mathrm{adjust}}(x_j,h_j^{\,l},h_j^{\mathrm{report}},l)=0$ by definition since player $j$'s inference of player $i$'s state is $B$. Since the message about round $r$ with $\mathbf{1}_{\{r<l_i\}}=0$ affects neither $\pi_i^{\mathrm{report}}$ nor $\pi_i^{\mathrm{adjust}}$, any message is optimal. Therefore, we concentrate on the case with $\mathbf{1}_{\{r<l_i\}}=1$, so that player $j$'s inference of player $i$'s state is $G$ in each review round $l$ that is equal to or before round $r$.

When player $i$ sends $(a_{i,t},y_{i,t})_{t\in T(r,k_i(r))}$, player $i$ believes that the expected reward from (64) given $\sigma_j(r)$ is
$$-T^{-11}\sum_{t\in T(r,k_i(r))}\Pr\bigl(t=t_j^{\mathrm{exclude}}(r)\mid\sigma_j(r),h_i^{\,L}\bigr)\,E\Bigl[\bigl\|\mathbf{1}_{a_{j,t},y_{j,t}}-E\bigl[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),\hat a_{i,t},\hat y_{i,t}\bigr]\bigr\|^2\Bigm|\sigma_j(r),h_i^{\,L},t=t_j^{\mathrm{exclude}}(r)\Bigr].\qquad(69)$$
First, $\Pr\bigl(t=t_j^{\mathrm{exclude}}(r)\mid\sigma_j(r),h_i^{\,L}\bigr)\ge T^{-2}$ for each $t\in T(r)$ by Lemma 10. Second, since player $j$ does not use information in period $t_j^{\mathrm{exclude}}(r)$ to determine the continuation play, player $i$ believes that, given $t_j^{\mathrm{exclude}}(r)=t$, player $j$'s history $(a_{j,t},y_{j,t})$ is distributed according to $\Pr\bigl(a_{j,t},y_{j,t}\mid\sigma_j(r),a_{i,t},y_{i,t}\bigr)$. Hence, (69) is equal to
$$-\varepsilon\,E\Bigl[\bigl\|\mathbf{1}_{a_{j,t},y_{j,t}}-E\bigl[\mathbf{1}_{a_{j,t},y_{j,t}}\mid\sigma_j(r),\hat a_{i,t},\hat y_{i,t}\bigr]\bigr\|^2\Bigm|\sigma_j(r),a_{i,t},y_{i,t}\Bigr]\qquad(70)$$
for some $\varepsilon>0$ of order $T^{-13}$ for each $t\in T(r,k_i(r))$. Finally, since we are considering the case with $\mathbf{1}_{\{r<l_i\}}=1$, player $j$'s strategy $\sigma_j(r)$ has full support.
Then, by Assumption 4, we can make sure that the expected loss in (70) from telling a lie about $(a_{i,t},y_{i,t})$ is of order $T^{-13}$. (See Lemma 46 for the details.) Since $T^{-13}$ is greater than the magnitude of the other relevant reward in $\pi_i^{\mathrm{report}}$ (because $\pi_i^{\mathrm{cancel}}[\sigma_j(r)](a_{j,t},y_{j,t})$ is sunk in the report block, the only other relevant reward in $\pi_i^{\mathrm{report}}$ is the one defined in (66)), and the adjustment $\pi_i^{\mathrm{adjust}}(x_j,h_j^{\,l},h_j^{\mathrm{report}},l)$ is of order $\exp(-T^{\frac14})$ by (68), it is strictly optimal to tell the truth about $(a_{i,t},y_{i,t})_{t\in T(r,k_i(r))}$ regardless of the past messages.

Given this truthtelling incentive about $(a_{i,t},y_{i,t})_{t\in T(r,k_i(r))}$, consider player $i$'s incentive to tell the truth about the frequencies in the subrounds. While she sends the message $f_i[h_i^{T(r,k(r))}]$, she believes that the public randomization will pick any $k(r)$ with probability $|T(r)|^{-\frac13}$. (Recall that the public randomization $k_i(r)$ is drawn after she finishes sending $(f_i[h_i^{T(r,k(r))}])_{k(r)=1}^{|T(r)|^{1/3}}$.) Hence, if player $i$ tells a lie about subround $k(r)$, then the expected loss from (66) is $T^{-\frac13}\,T^{-15}$, which is greater than the magnitude of the adjustment $\pi_i^{\mathrm{adjust}}(x_j,h_j^{\,l},h_j^{\mathrm{report}},l)$. Hence, it is strictly optimal to tell the truth about $(f_i[h_i^{T(r,k(r))}])_{k(r)=1}^{|T(r)|^{1/3}}$ regardless of the past messages.

In summary, we construct an incentive compatible strategy and reward such that player $i$ tells the truth about her history and, from that report, player $j$ calculates $\pi_i^{\mathrm{adjust}}$ to satisfy (62).

15.2 Dispensing with Cheap Talk

We now explain how player $i$ sends the message by taking actions. Again, the players draw a public randomization device to decide who reports the history. Suppose that player $i$ is selected. Player $j$ takes $\sigma_j^{\mathrm{mix}}$ i.i.d. across periods, and $\pi_j^{\mathrm{adjust}}(x_i,h_i^{\,l},h_i^{\mathrm{report}},l)=0$. Assumption 2 ensures that there exists a reward function $\pi_j^{\mathrm{report}}$ for player $j$ that incentivizes her to take $\sigma_j^{\mathrm{mix}}$ and keeps her equilibrium value in the report block equal to zero. Hence, we focus on player $i$'s strategy and rewards.

15.2.1 Player $i$'s Strategy $\sigma_i^{\mathrm{report}}|h_i^{\,L}$

As in the case with cheap talk, player $i$ sends $(f_i[h_i^{T(r,k(r))}])_{k(r)=1}^{|T(r)|^{1/3}}$ and $(a_{i,t},y_{i,t})_{t\in T(r,k_i(r))}$. Let $m_i$ be a generic message that player $i$ wants to send in the report block; that is, $m_i$ can be $f_i[h_i^{T(r,k(r))}]$ for some $r$ and $k(r)$, or $(a_{i,t},y_{i,t})$ for some $r$ and $t\in T(r,k_i(r))$; and let $M_i$ be the set of possible messages. We have $|M_i|\le T^{\frac23|A_i||Y_i|}$ for $f_i[h_i^{T(r,k(r))}]$, since the frequency of a subround $f_i[h_i^{T(r,k(r))}]$ can be expressed by "how many times out of $|T(r)|^{\frac23}$ periods player $i$ observes each $(a_i,y_i)$." (Recall that $|T(r)|^{\frac23}$ is the length of the subround.) For $(a_{i,t},y_{i,t})$, we have $|M_i|\le|A_i|\,|Y_i|$ since $(a_i,y_i)$ is included in $A_i\times Y_i$.

We now explain how player $i$ sends $m_i\in M_i$. Given $a_i(G)$ and $a_i(B)$ fixed in Section 6.2, there exists a one-to-one mapping $\tilde a_i:M_i\to\{a_i(G),a_i(B)\}^{\log_2|M_i|}$ between messages $m_i$ and sequences of binary actions since $|\{a_i(G),a_i(B)\}^{\log_2|M_i|}|=|M_i|$. (See Appendix A.4 for how to define $\tilde a_i$ explicitly.) Note that the length of $\tilde a_i(m_i)$ is bounded by $\log_2 T^{\frac23|A_i||Y_i|}$ for $f_i[h_i^{T(r,k(r))}]$ and by $\log_2|A_i|\,|Y_i|$ for $(a_{i,t},y_{i,t})$.

When player $i$ sends the message $m_i$, player $i$ takes the action sequence $\tilde a_i(m_i)$. Moreover, player $i$ repeats each element of $\tilde a_i(m_i)$ for multiple periods, in order to increase the precision of the message. In particular, for $m_i$ corresponding to $f_i[h_i^{T(r,k(r))}]$, player $i$ repeats the action for $T^{\frac12}$ periods, and for $m_i$ corresponding to $(a_{i,t},y_{i,t})$, she repeats it for $T^{\frac14}$ periods. Let $T(m_i)$ be the number of repetitions: $T(f_i[h_i^{T(r,k(r))}])=T^{\frac12}$ and $T(a_{i,t},y_{i,t})=T^{\frac14}$.
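As a rough illustration of this encoding, the sketch below (Python) maps a message index to a sequence over the two letter actions $a_i(G)$ and $a_i(B)$, repeats each letter, and decodes by the majority rule that Section 15.2.2 describes next. The bit convention, function names, and toy parameters are ours; the actual mapping $\tilde a_i$ is the one constructed in Appendix A.4.

```python
from math import ceil, log2

def encode_message(m_index, n_messages, a_G, a_B, reps):
    """Schematic version of the map  ~a_i : M_i -> {a_i(G), a_i(B)}^{log2|M_i|}.
    m_index    -- index of the message in {0, ..., n_messages - 1}
    a_G, a_B   -- the two fixed actions used as binary letters
    reps       -- T(m_i): how many periods each letter is repeated
    Returns the full action sequence player i plays to send the message."""
    n_bits = max(1, ceil(log2(n_messages)))
    bits = [(m_index >> k) & 1 for k in range(n_bits)]
    letters = [a_G if b == 0 else a_B for b in bits]
    return [a for letter in letters for a in [letter] * reps]

def decode_message(inferred_letters, n_messages, reps, a_G):
    """Majority-vote decoding of each repeated letter (cf. Section 15.2.2)."""
    n_bits = max(1, ceil(log2(n_messages)))
    m_index = 0
    for k in range(n_bits):
        block = inferred_letters[k * reps:(k + 1) * reps]
        if block.count(a_G) < reps / 2:   # majority says a_i(B) -> bit 1
            m_index |= (1 << k)
    return m_index

# toy usage: 16 possible messages, letters 'aG'/'aB', each repeated 9 times
seq = encode_message(11, 16, 'aG', 'aB', 9)
print(decode_message(seq, 16, 9, 'aG'))   # -> 11 when no letter is misread
```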
In particular, for mi corresponding to fi [hi 1 ], player i repeats the action for T 2 1 periods, and for mi corresponding to (ai;t ; yi;t ), she repeats it for T 4 periods. Let T (mi ) be the T(r;k(r)) number of repetitions: T (fi [hi 1 1 ]) = T 2 and T (ai;t ; yi;t ) = T 4 . 11 Given this strategy, the communication takes periods of order T 12 : 0 1 2 T |{z} B R B repetition X B B 1 B r=1 @ + T 4 |{z} repetition 11 which is of order T 12 1 3 jT (r)j | {z } log2 jT (r)j | {z number of subrounds length of ~ai (mi ) for 2 3 jT (r)j | {z } number of periods per subround 1 2 jAi jjYi j 3 } T(r;k(r)) ] fi [hi log jA j jY j | 2 {zi i} length of ~ai (mi ) for (ai ;yi ) C C C C; C A (71) T , as seen in (20). 11 Let treport + 1 be the …rst period of the report block; and let treport + 1; :::; treport + T 12 be the 11 periods in which player i takes ~ai (mi ) for some mi . (Precisely speaking, T 12 should be of order 11 11 11 T 12 by (71). For simple notation, we just write T 12 rather than of order T 12 in the text.) After 79 Figure 14: Structure of the report block with public randomization player i …nishes sending each ~ai (mi ), the players sequentially assign S periods for each of periods 11 11 treport + 1; :::; treport + T 12 , where S is determined in Section 6.3. That is, periods treport + T 12 + 11 11 1; :::; treport + T 12 + S are assigned to period treport + 1, periods treport + T 12 + S + 1; :::; treport + 11 T 12 + 2S are assigned to period treport + 2, and so on. In general, let S(t) be the set of periods 11 treport + T 12 + (k 11 1)S + 1; :::; treport + T 12 + kS that are assigned to period t with k = t In periods S(t), given player i’s history in period t, player i takes The periods treport + 1; :::; treport + T 11 12 S(t) i treport . determined in Section 6.3. are called “the round for sending the history” and the other periods are called “the round for conditional independence.” Figure 14 summarizes the entire structure. 15.2.2 Player j’s Inference of Player i’s Message For each period t in which player i sends an element of an action sequence ~ai (mi ) assigned to a message mi , player j given her history in periods t and S(t) calculates a function [ (aj; ; yj; ) 2S(t) ) j ((aj;t ; yj;t ) determined in Section 6.3. By Lemma 4, since the expected realization of j is high (or low) if player i takes ai (G) (or ai (B)), player j infers that the element of ~ai (mi ) is ai (G) 80 (or ai (B)) if there are a lot of high (or low) realization of j. Since player i repeats the element of ~ai (mi ) for T (mi ) periods, by the law of large numbers, player j can infer the element correctly with probability of order 1 (72) exp( T (mi )): S(t) i Importantly, Lemma 4 ensures that (if player i expects that she will take in the round for conditional independence) player i in the round for sending the history cannot update player j’s inference from player i’s signal observation (conditional independence property). Given this inference of each element of ~ai (mi ), player j infers each message using the inverse of T(r;k(r)) ~ai (mi ). Let (fi [hi T(r) player j infers fi [hi 15.2.3 1 jT(r)j 3 ](j))k(r)=1 and (ai;t (j); yi;t (j))t2T(r;ki (r)) be player j’s inference. As in (67), T(r;k(r)) ](j) and fi [hi l ](j) from ((fi [hi Reward Functions We modify adjust i and report i adjust i and 1 jT(r)j 3 ](j))k(r)=1 )R r=1 . report i in order to deal with the possibility of errors. 
First, we consider an error when player j calculates adjust (xj ; hj l ; hreport ; l) i j = 2 1f i (l)(j)=Gg 8 h < E : (73) i 9 target (xj ; fjinclude [hj l ]; fj [hlj ]; fi [hi l ]; l) j xj ; f i (l)(j) = Gg ; fi [hi l ](j) = i h i l l ; include l E review (x ; f [h ]; f [h ]; l) j x ; f (l)(j) = Gg ; f [h ](j) j j j j i i i i j j T(r;k(r)) from fi [hi l ](j). Since the cardinality of fi [hi T(r;k(r)) fi [hi 2 ] is log2 jT (r)j 3 jAi jjYi j 2 ] is jT (r)j 3 jAi jjYi j , the length of ~ai (mi ) to send 1 2 log2 T 3 jAi jjYi j . Since each round has jT (r)j 3 1 T 3 subrounds and there are R rounds, the total length of ~ai (mi )’s that are used to calculate fi [hi l ](j) is no more than 1 2 1 T(r;k(r)) RT 3 log2 jT (r)j 3 jAi jjYi j . By (72), recalling that T (mi ) = T 2 for mi corresponding to fi [hi ], player j infers fi [hi l ](j) correctly with probability no less than 1 1 2 RT 3 log2 jT (r)j 3 jAi jjYi j 1 T(r;k(r)) On the other hand, the cardinality of the messages ((fi [hi 81 (74) exp( T 2 ): 1 jT(r)j 3 ](j))k(r)=1 )R r=1 used to calculate fi [hi l ](j) is no more than YR cardinality of r=1 1 number of subrounds in round r T(r;k(r)) fi [hi ] jT (r)j 2 jAi jjYi j 3 RT 3 ; 1 which is of order exp(T 3 ). As will be seen in Appendix A.11.3.2, if we have shown that the probability of the correct inference (74), to the power of the cardinality of the messages used in (73), converges to one, then we can create adjust (xj ; hj l ; hreport ; l) i j such that, given that player i follows the equilibrium strategy in the report block, (i) this adjustment is still small and (ii) (62) holds from the perspective of player 1 2 1 i in review round l. Since we have (jT (r)j 3 jAi jjYi j )RT 3 1 exp(T 2 ) for a su¢ ciently large exp(T 3 ) T , the probability of the correct inference, to the power of the cardinality of the messages, satis…es 2 1 1 2 1 1 RT 3 log2 jT (r)j 3 jAi jjYi j 1 3 RT log2 jT (r)j 2 jAi jjYi j 3 exp( T 2 ) jT(r)j 3 jAi jjYi j 1 RT 3 1 1 2 jT (r)j exp( T ) 2 jAi jjYi j 3 RT 3 ! 1 as T ! 1; (75) as desired. report (xj ; hj L ; hreport ) i j We also modify taking errors into account. Consider the reward to incentivize player i to tell the truth about (ai;t ; yi;t )t2T(r;ki (r)) : 1ft=texclude (r)g 1fr<l g 1aj;t ;yj;t j i E 1aj;t ;yj;t j ^i;t ; y^i;t j (r); a 2 : (76) Without cheap talk, player j can infer each element of ~ai (mi ) to send (ai;t ; yi;t ) correctly with probability of order 1 1 1 exp( T 4 ) since T (mi ) = T 4 for mi corresponding to (ai ; yi ). Since the number of elements of ~ai (mi ) to send (ai;t ; yi;t ) is log2 jAi j jYi j, the probability that player j infers (ai;t ; yi;t ) correctly is 1 log2 jAi j jYi j 1 exp( T 4 ): On the other hand, the cardinality of the message (ai;t ; yi;t ) in (76) is jAi j jYi j. Hence, the probability of the correct inference, to the power of the cardinality of the messages 82 in (76), converges to one as T goes to in…nity: 1 log2 jAi j jYi j 1 exp( T 4 ) jAi jjYi j 1 jAi j jYi j log2 jAi j jYi j 1 exp( T 4 ) !T !1 1: As will be seen in Lemma 48 formally, this means that we can slightly modify the reward taking errors into account, so that it is strictly optimal for player i to tell the truth about (ai;t ; yi;t ). cancel [ j (r)](aj;t ; yj;t ) i Similarly, we can modify cancel i so that cancels out the e¤ect of ai;t in period t of the coordination or main block on the modi…ed version of (76). 
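The limits used above can be checked numerically. The following back-of-the-envelope Python computation illustrates why the probability of correct inference in (74), raised to the power of the cardinality of the messages, converges to one as in (75); the constants `R` and `AY` are placeholders, and only the orders of magnitude matter.

```python
import numpy as np

# Rough numerical illustration of the limit in (75): the probability of
# decoding every subround-frequency message correctly, raised to the power of
# the number of possible message profiles, tends to one because
#   N * p  ~  exp(c T^{1/3} log T) * exp(-T^{1/2})  ->  0.
R, AY = 2, 4   # placeholder values for R and |A_i||Y_i|

for T in [1e8, 1e11, 1e14, 1e17]:
    # p: chance that some letter of some frequency message is misread,
    #    of order R T^{1/3} log(|T(r)|^{(2/3)|A_i||Y_i|}) exp(-T^{1/2})
    log_p = np.log(R * T ** (1 / 3) * (2 / 3) * AY * np.log(T)) - np.sqrt(T)
    # N: number of possible profiles of frequency messages,
    #    of order (|T(r)|^{(2/3)|A_i||Y_i|})^{R T^{1/3}}
    log_N = R * T ** (1 / 3) * (2 / 3) * AY * np.log(T)
    print(f"T = {T:.0e}:  log(N*p) = {log_N + log_p:.3g}")
# Once log(N*p) < 0, (1-p)^N >= 1 - N*p is close to one, as claimed in (75).
```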
T(r;k(r)) Next, consider the reward to incentivize player i to tell the truth about fi [hi 1 T(r;ki (r)) fi [hi ](j)6=jT(r)j 2 3 P t2T(r;ki (r)) T(r;ki (r)) Without cheap talk, since the cardinality of fi [hi T(r;ki (r)) to send fi [hi 2 jT (r)j 2 3 jT (r)j T P 2 3 (77) : 2 ] is jT (r)j 3 jAi jjYi j , the length of ~ai (mi ) ] is log2 jT (r)j 3 jAi jjYi j . In addition, since the cardinality of (ai;t ; yi;t ) is jAi j jYi j, the length of ~ai (mi ) to send (ai;t ; yi;t ) is log2 jAi j jYi j. 2 3 1ai;t (j);yi;t (j) ]: Since each subround has jT (r; ki (r))j = T(r;ki (r)) periods, the total length of ~ai (mi )’s that are used to calculate fi [hi t2T(r;ki (r)) ](j) and 1ai;t (j);yi;t (j) is 2 2 log2 jT (r)j 3 jAi jjYi j + jT (r)j 3 log2 jAi j jYi j : Hence, all the messages transmit correctly with probability 1 2 2 1 1 (log2 jT (r)j 3 jAi jjYi j exp( |{z} T 2 ) + jT (r)j 3 log2 jAi j jYi j exp( |{z} T 4 )): T(r;k(r)) # of repetitions for fi [hi (78) # of repetitions for (ai;t ;yi;t ) ] On the other hand, the cardinality of the messages used in (77) is calculated as follows: The 2 2 P T(r;k (r)) cardinality of fi [hi i ](j) is jT (r)j 3 jAi jjYi j . As for jT (r)j 3 t2T(r;ki (r)) 1ai;t (j);yi;t (j) , when player i sends the message as if (~ ai;t ; y~i;t )t2T(r;ki (r)) were the true message, the distribution of this summation P ai;t ; y~i;t )t2T(r;ki (r)) . Hence, t2T(r;ki (r)) 1ai;t (j);yi;t (j) depends only on the frequency of each (ai ; yi ) in (~ the relevant cardinality of (ai;t (j); yi;t (j))t2T(r;ki (r)) for (77) is equal to that of the frequency, that is, 2 jT (r)j 3 jAi jjYi j . 83 Hence, (78) to the power of the relevant cardinality is 2 log2 jT (r)j 1 2 jAi jjYi j 3 2 3 1 2 2 jT(r)j 3 jAi jjYi j jT(r)j 3 jAi jjYi j 1 4 exp( T ) + jT (r)j log2 jAi j jYi j exp( T ) !T !1 1: Therefore, we can slightly modify the reward function so that player i wants to tell the truth about T(r;k(r)) fi [hi ]. In addition, we add c:i: i de…ned in Section 6.3 in order to incentivize player i to take S(t) i in the round for conditional independence. Finally, we add the reward i (xj ; yj ) de…ned in (9), so that it cancels out the e¤ect of the instantaneous utilities. In total, we have report (xj ; hj L ; hreport ) i j X = t:report block + R X r=1 + i (xj;t ; yj;t ) 8 > > > > > > > > < 1fr<l g i > > > > > > > > : T 11 P t2T(r;ki (r)) T X T +T ( 15 11 1ft=texclude (r)g j : 1 c:i: i modi…cation of E 1aj;t ;yj;t j 1aj;t ;yj;t 1ft=texclude (r)g modi…cation of j modi…cation of 1 t:round for sending the history 15.2.4 8 < T(r;ki (r)) fi [hi (aj;t ; yj;t ) [ (aj; ; yj; ) cancel [ j (r)](aj;t ; yj;t ) i ](j)6=jT(r)j 2S(t) : j (r); ai;t (j); yi;t (j) 2 3 P t2T(r;ki (r)) 1ai;t (j);yi;t (j) ) Incentive Compatibility Let us verify player i’s incentive. By i (xj;t ; yj;t ), (9) ensures that we can neglect the instantaneous utility. In the round for conditional independence, Claim 1 of Lemma 4 implies that, once player i deviates, then she incurs a loss of T other terms in report i are of order T 11 1 "strict from T 1 c:i: i . Since the modi…cation is small, the . Hence, it is optimal for player i to take S(t) i in the round for conditional independence. Given this incentive, Claim 2 of Lemma 4 ensures that player i in the round for sending the history cannot update player j’s inference of player i’s message. 
Moreover, (9) and Claim 3 of Lemma 4 ensures that player i is indi¤erent between any action in the round for sending the history 84 2 9 9 = > > > > > ; > > > = > > > > > > > > ; in terms of the expected value of ui (at ) + i (aj;t ; yj;t ) + X (ui (a ) + i (aj; ; yj; )) + c:i: i 2S(t) (aj;t ; yj;t ) [ (aj; ; yj; ) 2S(t) : Since the incentive with cheap talk is strict, the message transmit correctly with a high probability, and the modi…cation is small, it is incentive compatible for player i to follow the equilibrium strategy. In summary, we construct an incentive compatible strategy and reward without cheap talk. The key idea is to use the round for conditional independence to keep conditional independence property in the round for sending the history. 15.3 Dispensing with Public Randomization Now we explain how to dispense with the public randomization. Recall that the public randomization plays two roles in the report block. The …rst is to pick who reports the history. The second is to pick a subround k(r). Let us focus on the …rst one. (The second one is dispensed with by a similar procedure. See Appendix A.11 for the details.) The purpose of this …rst role is to establish the following two properties: (i) ex ante (before the report block), every player has a positive probability to report the history, and (ii) ex post (after the realization of the public randomization), there is only one player who reports the history. The property (i) is important since, without adjusting the reward function by adjust i based on player i’s report in the report block, player i’s equilibrium strategy would not be optimal. The property (ii) is important to incentivize player i to tell the truth. Remember that player j incentivizes player i to tell the truth by punishing player i according to (64). As seen in (70), the establishment of truthtelling incentive uses the fact that player i cannot update her belief about the realization of player j’s history in period texclude (r). If both players sent messages by taking j actions and player i could observe a part of player j’s messages before …nishing reporting her own history, then player i might be able to learn player j’s history in period texclude (r) and would want j to tell a lie. In order to establish these two properties without public randomization, we consider the following procedure. For simplicity, let us assume that the signals are not conditionally independent (see the 85 end of this subsection for the case with conditionally independent signals): There exists ap:r: 2 A such that, given ap:r: , there exist y1p:r: ; y1p:r: 2 Y1 and y2p:r: 2 Y2 such that Pr (y2p:r: j ap:r: ; y1p:r: ) 6= Pr (y2p:r: j ap:r: ; y1p:r: ) : Fix those ap:r: , y1p:r: ; y1p:r: 2 Y1 , and y2p:r: 2 Y2 . Without loss, we assume that Pr (y2p:r: j ap:r: ; y1p:r: ) > Pr (y2p:r: j ap:r: ; y1p:r: ) : (79) At the beginning of the report block, the players take a pure strategy ap:r: .21 Let tp:r: be the period when the players take ap:r: . Each player i observes her own signal yi;tp:r: . By Assumption 2, since player j can statistically identify if player i takes aip:r: , there exists p:r: i : Aj that it is optimal for player i to take ap:r: i : E ui (ai ; aj ) + p:r: i (aj ; yj ) j ai ; ajp:r: 8 < 0 if a = ap:r: , i i = p:r: : 1 if a 6= a . i i Yj ! R such (80) Intuitively, player 2 asks player 1 to guess whether player 2 observed y2;tp:r: = y2p:r: or y2;tp:r: 6= y2p:r: . 
On the other hand, since players’signals are conditionally dependent, player 1’s conditional likelihood of y2;tp:r: = y2p:r: against y2;tp:r: 6= y2p:r: depends on y1;tp:r: 2 Y1 . In particular, (79) ensures that there exists pp:r: such that 1 Pr Pr y2;tp:r: = y2p:r: j ap:r: ; y1p:r: Pr p:r: > p:r: p:r: > p1 p:r: y2;tp:r: 6= y2 j a ; y1 Pr y2;tp:r: = y2p:r: j ap:r: ; y1p:r: : y2;tp:r: 6= y2p:r: j ap:r: ; y1p:r: Perturbing pp:r: if necessary, we make sure that there is no y1 2 Y1 such that 1 Pr Pr 21 y2;tp:r: = y2p:r: j ap:r: ; y1 = p1p:r: : y2;tp:r: 6= y2p:r: j ap:r: ; y1 The superscript p:r: stands for “public randomization.” 86 Given this de…nition, we partition the set of player 1’s signals into Y1report and Y1not ( Pr y 1 2 Y1 : Pr ( Pr y 1 2 Y1 : Pr Y1report Y1not with Y1report [ Y1not Y1not report report report y2;tp:r: = y2p:r: j ap:r: ; y1 > p1p:r: p:r: p:r: y2;tp:r: 6= y2 j a ; y1 y2;tp:r: = y2p:r: j ap:r: ; y1 < p1p:r: p:r: p:r: y2;tp:r: 6= y2 j a ; y1 p:r: about 3 y1p:r: ; 3 y1p:r: .” (In this explanation, we assume that cheap talk is available. Cheap talk is dispensable = 2 denote the event that y1;tp:r: 2 Y1non p:r: ) : = Y1 . Player 1 is asked to report whether “y1;tp:r: 2 Y1report ” or “y1;tp:r: 2 as in Section 15.2.) For notational convenience, let and ) report p:r: report = 1 denote the event that y1;tp:r: 2 Y1report ; . Let ^p:r: 2 f1; 2g be player 1’s message . To incentivize player 1 to tell the truth, player 2 rewards player 1 if either “player 2 observes y2p:r: and player 1 reports ^p:r: = 1”or “player 2 does not observe y2p:r: and player 1 reports ^p:r: = 2”: T 4 1fy2;tp:r: =yp:r: ^^p:r: =1g + p1p:r: 1fy2;tp:r: 6=yp:r: ^^p:r: =2g : 2 2 After player 1 sends ^p:r: , the players send the message about hi L (81) sequentially: Player 1 sends the messages …rst, and then player 2 sends the messages. The players coordinate on how to send the messages based on ^p:r: as follows. If player 1 reports ^p:r: = 1, then player 2 adjusts player 1’s reward function based on player 1’s messages, and incentivizes player 1 to tell the truth according to (64), (65), and (66) with i = 1. On the other hand, if player 1 reports ^p:r: = 2, then player 2 incentivizes player 1 to send an uninformative message (formally, we extend the message space to fgarbageg [ M1 and player 2 gives the positive reward only if player 1 sends fgarbageg if ^p:r: = 2. She does not adjust the reward function for player 1 if ^p:r: = 2). As in the case with public randomization, given player 1’s message ^p:r: , player 1 wants to send the true message mi after ^p:r: = 1; and she wants to send fgarbageg after ^p:r: = 2. On the other hand, player 1 adjusts the reward and incentivizes player 2 to tell the truth according to (64), (65), and (66) with i = 2, only after player 1 sends fgarbageg. (If ^p:r: = 1, then player 1 makes player 2 indi¤erent between any messages and 87 adjust 2 is identically equal to zero.) Figure 15: How to coordinate without public randomization See Figures 15 and 16 for illustration. Since Y1report and Y1non report are nonempty, with truthtelling, both ^p:r: = 1 (that is, y1;tp:r: 2 Y1report ) and ^p:r: = 2 (that is, y1;tp:r: 2 Y1not report ) happen with a positive probability. Hence, each player has the positive probability to get her reward adjusted (that is, the property (i) is established). Moreover, the second sender (player 2) can condition that player 1’s message was fgarbageg, and so there is no learning about h1 L (that is, the property (ii) is established). 
We are left to verify that it is optimal for the players to take $a^{p.r.}$ and that player 1 has the incentive to tell the truth about $m^{p.r.}$.

Consider player i's incentive to take the pure action $a_i^{p.r.}$. The action profile $a^{p.r.}$ affects player i's payoff through (80), (81), the adjustment of the reward $\pi_i^{\text{adjust}}$, and the rewards for truthtelling (64), (65), and (66). Since the effects other than (80) are sufficiently small compared to (80), it is optimal to take $a_i^{p.r.}$ for each $i\in\{1,2\}$.

Given that the players take $a^{p.r.}$, the report $\hat m^{p.r.}$ affects player 1's payoff through (81), the adjustment of the reward $\pi_1^{\text{adjust}}$, and the rewards for truthtelling (64), (65), and (66), since (80) is sunk at the point of sending $\hat m^{p.r.}$. Except for (81), the effect is of order $T^{-5}$. Hence, it is optimal to send $\hat m^{p.r.}=1$ if
$$T^{-4}\,\mathbb{E}\bigl[1_{\{y_{2,t^{p.r.}}=y_2^{p.r.}\}}\mid a^{p.r.},y_{1,t^{p.r.}}\bigr]>T^{-4}\,p_1^{p.r.}\,\mathbb{E}\bigl[1_{\{y_{2,t^{p.r.}}\neq y_2^{p.r.}\}}\mid a^{p.r.},y_{1,t^{p.r.}}\bigr]+O(T^{-5}),$$
where $O(T^{-5})$ is a random variable of order $T^{-5}$. That is,
$$\frac{\Pr\bigl(y_{2,t^{p.r.}}=y_2^{p.r.}\mid a^{p.r.},y_{1,t^{p.r.}}\bigr)}{\Pr\bigl(y_{2,t^{p.r.}}\neq y_2^{p.r.}\mid a^{p.r.},y_{1,t^{p.r.}}\bigr)}>p_1^{p.r.}+O(T^{-1}).$$

Figure 16: Player 2 can condition that she did not learn player 1's history

For a sufficiently large T, therefore, it is optimal to send $\hat m^{p.r.}=1$ if $y_{1,t^{p.r.}}\in Y_1^{\text{report}}$. Similarly, we can show that it is optimal to send $\hat m^{p.r.}=2$ if $y_{1,t^{p.r.}}\in Y_1^{\text{not report}}$. That is, it is optimal to tell the truth about $m^{p.r.}$.

In summary, we construct an incentive compatible strategy and reward such that the players coordinate on whether player 1 sends the history truthfully, by asking player 1 to guess whether player 2 observed $y_{2,t^{p.r.}}=y_2^{p.r.}$ or $y_{2,t^{p.r.}}\neq y_2^{p.r.}$. Moreover, player 2's report matters only after player 1 sends the garbage message, which incentivizes player 2 to tell the truth. Since the players' signals are correlated, player 1's guess differs after different realizations of her own signal. Hence, both players have positive probabilities of getting their rewards adjusted based on the report block.

Finally, if the players' signals are conditionally independent, player 2 takes a mixed strategy and asks player 1 to guess which action she took, rather than whether player 2 observed $y_{2,t^{p.r.}}=y_2^{p.r.}$ or $y_{2,t^{p.r.}}\neq y_2^{p.r.}$. The proof is the same as above, except that, since player 2 takes a mixed strategy, player 2 has to be indifferent between multiple actions in period $t^{p.r.}$. If player 2 had different probabilities of getting her reward adjusted after different actions of hers, then, depending on her expectation of the adjustment, she would have different incentives to take different actions. To avoid such a complication, as will be seen formally in Appendix A.11.7.3, we make sure that the magnitude of getting her reward adjusted does not depend on player 2's action. Then, player 2 is indifferent between actions.

16 Verification of (5)–(8)

Note that Section 7 defines the structure of the finitely repeated game with T being a parameter. For each T, Section 13 defines player i's strategy in the coordination and main blocks, and Lemma 13 defines her strategy in the report block $\sigma_i^{\text{report}}(\cdot\mid h_i^L)$. Hence, we have finished defining the strategy.
In addition, for each T, Section 14.2 defines the reward function as
$$\pi_i(x_j,h_j^{TP+1})=\pi_i(x_j)+\sum_{\substack{r=1:\text{ round }r\text{ is not}\\ \text{a review round}}}^{R}\sum_{t\in T(r)}\pi_i^{x_j}(a_{j,t},y_{j,t})+\sum_{l=1}^{L}\Bigl\{\pi_i^{\text{review}}\bigl(x_j,f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)+\pi_i^{\text{adjust}}\bigl(x_j,h_j^l,h_j^{\text{report}},l\bigr)\Bigr\}+\pi_i^{\text{report}}\bigl(x_j,h_j^L,h_j^{\text{report}}\bigr). \qquad (82)$$
Section 14.2 pins down $\pi_i(x_j)$, $\pi_i^{x_j}$, and $\pi_i^{\text{review}}$, and Lemma 13 pins down $\pi_i^{\text{adjust}}$ and $\pi_i^{\text{report}}$. Hence, we have finished defining the reward function.

Therefore, we are left to verify (5)–(8). It is useful to recall that Claim 2 of Lemma 13 defines $\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)$ so that, given player i's equilibrium strategy in the report block, we have
$$\mathbb{E}\bigl[\pi_i^{\text{review}}(x_j,h_j^l,l)+\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)\mid x_j,h_i^l\bigr]=\mathbb{E}\bigl[\pi_i^{\text{target}}\bigl(x_j,f_i[h_i^l],f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)\mid x_j,h_i^l\bigr] \qquad (83)$$
since $\Lambda_i(l)(j)=B$ implies $\pi_i^{\text{review}}=\pi_i^{\text{target}}$ by (57) and (59). Moreover, (59) ensures that $\pi_i^{\text{target}}\bigl(x_j,f_i[h_i^l],f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)$ equals, depending on $\Lambda_i(l)(j)$, $\lambda_j(l)$, and whether $f_i[h_i^{<l}]$ lies in $\Theta_i(x(j),l)$, one of the following three expressions:
$$\mathrm{sign}(x_j)\,LT\bar u+\sum_{t\in T(l)}\pi_i^{x_j}(a_{j,t},y_{j,t}),\qquad
T\bigl\{u_i(x_j)-u_i(\alpha(x(j)))\bigr\}+\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t}),\qquad
T\bigl\{u_i(x_j)-u_i\bigl(BR_i(\alpha_j(x(j))),\alpha_j(x(j))\bigr)\bigr\}. \qquad (84)$$

16.1 Incentive Compatibility, Promise Keeping, and Full Dimensionality

We prove the optimality of player i's equilibrium strategy by backward induction: In the report block, player i maximizes
$$\mathbb{E}\Bigl[\sum_{t:\,\text{report block}}u_i(a_{i,t})+\sum_{l=1}^{L}\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)+\pi_i^{\text{report}}(x_j,h_j^L,h_j^{\text{report}})\ \Big|\ x_j,h_i^L\Bigr]$$
since the other rewards in (82) are sunk in the report block. Hence, by Lemma 13, it is optimal to take $\sigma_i^{\text{report}}(\cdot\mid h_i^L)$ for each $x_j\in\{G,B\}$. The equilibrium payoff (without taking the average) in the report block satisfies
$$\mathbb{E}\Bigl[\sum_{t:\,\text{report block}}u_i(a_{i,t})+\pi_i^{\text{report}}(x_j,h_j^L,h_j^{\text{report}})\ \Big|\ x_j,\sigma_i^{\text{report}}(\cdot\mid h_i^L),h_i^L\Bigr]=0$$
by Claim 3 of Lemma 13 (we include $\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)$ in the equilibrium payoff in review round l, rather than in the report block).

In the last review round L, since the continuation payoff from the report block is zero, player i ignores $\sum_{t:\,\text{report block}}u_i(a_{i,t})+\pi_i^{\text{report}}(x_j,h_j^L,h_j^{\text{report}})$. In addition, by (62), the expected value of
$$\sum_{l\leq L-1}\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)$$
does not depend on player i's strategy after review round L-1. Since the rewards $\sum_{r=1:\text{ round }r\text{ is not a review round}}^{R}\sum_{t\in T(r)}\pi_i^{x_j}(a_{j,t},y_{j,t})$, $\pi_i^{\text{review}}(x_j,f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l)$ for $l\leq L-1$, and $\pi_i(x_j)$ are sunk in review round L, player i wants to maximize, with l = L,
$$\mathbb{E}\Bigl[\sum_{t\in T(l)}u_i(a_t)+\pi_i^{\text{review}}\bigl(x_j,f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)+\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)\ \Big|\ x_j,\sigma_i(x_i),h_i^{<l}\Bigr].$$
Given (83), by the law of iterated expectations, player i wants to maximize, with l = L,
$$\mathbb{E}\Bigl[\sum_{t\in T(l)}u_i(a_t)+\pi_i^{\text{target}}\bigl(x_j,f_i[h_i^l],f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)\ \Big|\ x_j,\sigma_i(x_i),h_i^{<l}\Bigr].$$
Hence, by Lemma 2 and (59), if $\lambda_j(l)=B$, then player i wants to maximize
$$\mathbb{E}\Bigl[\sum_{t\in T(l)}u_i(a_t)+\mathrm{sign}(x_j)\,LT\bar u+\sum_{t\in T(l)}\pi_i^{x_j}(a_{j,t},y_{j,t})\ \Big|\ x_j,\sigma_i(x_i),h_i^{<l}\Bigr].$$
Claim 2 of Lemma 2 ensures that any strategy is optimal and the equilibrium value (without taking the average) is equal to $\mathrm{sign}(x_j)\,LT\bar u+Tu_i^{x_j}$.
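The indifference property invoked here via Claim 2 of Lemma 2 can be illustrated with a small sketch (ours; all primitives are hypothetical): a per-period reward $\pi_i^{x_j}(a_j,y_j)$ is chosen so that $u_i$ plus the expected reward is the same constant for every action of player i, in which case every strategy in the review round yields the same expected total and is therefore optimal.

import numpy as np

A_i, A_j, Y_j = ["C", "D"], ["C", "D"], ["g", "b"]
alpha_j = {"C": 0.5, "D": 0.5}                       # player j's mixed action (hypothetical)
u_i = {("C","C"): 2.0, ("C","D"): -1.0, ("D","C"): 3.0, ("D","D"): 0.0}
p = {("C","C"): [0.8, 0.2], ("C","D"): [0.6, 0.4],   # Pr(y_j | a_i, a_j), hypothetical
     ("D","C"): [0.3, 0.7], ("D","D"): [0.2, 0.8]}
target = 1.0                                         # desired constant for u_i + E[reward]

# One equation per a_i, one unknown per (a_j, y_j); least squares gives an exact
# solution because there are more unknowns than equations and the rows are independent.
cols = [(aj, k) for aj in A_j for k in range(len(Y_j))]
M = np.array([[alpha_j[aj] * p[(ai, aj)][k] for (aj, k) in cols] for ai in A_i])
b = np.array([target - sum(alpha_j[aj] * u_i[(ai, aj)] for aj in A_j) for ai in A_i])
pi, *_ = np.linalg.lstsq(M, b, rcond=None)

for ai in A_i:
    total = sum(alpha_j[aj] * (u_i[(ai, aj)]
                + sum(p[(ai, aj)][k] * pi[cols.index((aj, k))] for k in range(len(Y_j))))
                for aj in A_j)
    print(ai, round(total, 6))                        # the same number for C and D

With the per-period total made independent of player i's action, it is also independent of her T-period strategy in the round, which mirrors how the round value above is pinned down regardless of play.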
On the other hand, if $\lambda_j(l)=G$ or "$\lambda_j(l)=B$ and $h_i^{<l}\not\in\Theta_i(x(j),l)$", then $\Lambda_i(l)(j)=G$ by Claim 3 of Lemma 11. Hence, if $\lambda_j(l)=G$, then player i wants to maximize
$$\mathbb{E}\Bigl[\sum_{t\in T(l)}u_i(a_t)+T\{u_i(x_j)-u_i(\alpha(x(j)))\}+\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t})\ \Big|\ x_j,\sigma_i(x_i),h_i^{<l}\Bigr].$$
By Claim 1 of Lemma 2, any action is optimal and player i's equilibrium value is $Tu_i(x_j)$. In addition, if $\lambda_j(l)=B$ and $h_i^{<l}\in\Theta_i(x(j),l)$, then player i wants to maximize
$$\mathbb{E}\Bigl[\sum_{t\in T(l)}u_i(a_t)+T\bigl\{u_i(x_j)-u_i\bigl(BR_i(\alpha_j(x(j))),\alpha_j(x(j))\bigr)\bigr\}\ \Big|\ x_j,\sigma_i(x_i),h_i^{<l}\Bigr].$$
Since the reward is constant, player i wants to take a static best response to player j's strategy. Moreover, Claim 4 of Lemma 11 ensures that player j with $\lambda_j(l)=G$ and $\Lambda_j(l)(i)=B$ takes $\alpha_j(x(j))$. Hence, $BR_i(\alpha_j(x(j)))$ is optimal. Since $\Theta_i(x(j),l)$ is the set of player i's histories with which player i has $x(i)=x(j)$ and $\Lambda_j(l)(i)=B$ with probability one, Figure 10 ensures that player i with $h_i^{<l}\in\Theta_i(x(j),l)$ takes $BR_i(\alpha_j(x(j)))$ with probability one. Hence, player i's strategy is optimal. Moreover, her equilibrium value is $Tu_i(x_j)$.

In total, with l = L, the equilibrium payoff is
$$\mathbb{E}\Bigl[\sum_{t\in T(l)}u_i(a_t)+\pi_i^{\text{target}}\bigl(x_j,f_i[h_i^l],f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)\ \Big|\ x_j,h_i^{<l}\Bigr]=\begin{cases}Tu_i(x_j) & \text{if }\lambda_j(l)=G,\\[2pt] \mathrm{sign}(x_j)\,LT\bar u+Tu_i^{x_j} & \text{if }\lambda_j(l)=B.\end{cases}$$
Since the distribution of $\lambda_j(l)$ does not depend on player i's strategy by Claim 2 of Lemma 11, player i before review round L ignores the continuation payoff from review round L on. Again, (62) ensures that the expected value of
$$\sum_{l\leq L-1}\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)$$
does not depend on player i's strategy after review round L-1. Ignoring the rewards in (82) that have been sunk, player i in the supplemental rounds for $\lambda_1(l)$ and $\lambda_2(l)$ with l = L maximizes
$$\mathbb{E}\Bigl[\sum_{t\in T(\lambda_1(l))\cup T(\lambda_2(l))}u_i(a_t)+\pi_i^{x_j}(a_{j,t},y_{j,t})\ \Big|\ x_j,h_i^{l-1}\Bigr].$$
Therefore, any strategy is optimal and the equilibrium value is equal to $2T^{\frac12}u_i^{x_j}$ since $|T(\lambda_1(l))|=|T(\lambda_2(l))|=T^{\frac12}$. Again, this value does not depend on the previous history. Hence, player i in review round L-1 ignores the continuation payoff from the supplemental round for $\lambda_1(L)$ on. Therefore, by the same proof as for review round L, we can establish the optimality of the equilibrium strategy. Recursively, we can show that the equilibrium strategy is optimal in the main and report blocks, and the equilibrium payoff from the main and report blocks is equal to
$$2(L-1)T^{\frac12}u_i^{x_j}+\sum_{l=1}^{L}\Bigl[1_{\{\lambda_j(l)=G\}}Tu_i(x_j)+1_{\{\lambda_j(l)=B\}}\bigl(\mathrm{sign}(x_j)\,LT\bar u+Tu_i^{x_j}\bigr)\Bigr].$$
Since this value does not depend on player i's strategy in the coordination block, player i in the coordination block maximizes
$$\mathbb{E}\Bigl[\sum_{t:\,\text{coordination block}}u_i(a_t)+\pi_i^{x_j}(a_{j,t},y_{j,t})\ \Big|\ x_j\Bigr].$$
Hence, any strategy is optimal and the equilibrium value is equal to $\bigl(4T^{\frac12}+2T^{\frac13}\bigr)u_i^{x_j}$, recalling that $4T^{\frac12}+2T^{\frac13}$ is the length of the coordination block. Therefore, in total, player i's value from the review phase except for $\pi_i(x_j)$ is equal to
$$\Bigl(4T^{\frac12}+2T^{\frac13}+2(L-1)T^{\frac12}\Bigr)u_i^{x_j}+\sum_{l=1}^{L}\Bigl[1_{\{\lambda_j(l)=G\}}Tu_i(x_j)+1_{\{\lambda_j(l)=B\}}\bigl(\mathrm{sign}(x_j)\,LT\bar u+Tu_i^{x_j}\bigr)\Bigr]. \qquad (85)$$
Hence, by the definition of $\pi_i(x_j)$ in (52), the total equilibrium value (without taking the average) is equal to $TP\cdot v_i(x_j)$, as desired. In total, we have verified incentive compatibility, promise keeping, and full dimensionality.
16.2 Self Generation

We first show that we can ignore the terms except for $\sum_{l=1}^{L}\pi_i^{\text{review}}(x_j,h_j^l,l)$. To see this, define $\varepsilon>0$ such that, for each $x_j\in\{G,B\}$, we have
$$(15+7L)\bigl(|u_i(x_j)|+L\bar u+u_i^{x_j}\bigr)\bar\varepsilon+\varepsilon<|u_i(x_j)-v_i(x_j)|. \qquad (86)$$
(16) ensures that there exists such $\varepsilon>0$.

Note that the terms other than $\sum_{l=1}^{L}\pi_i^{\text{review}}(x_j,h_j^l,l)$ in $\pi_i(x_j,h_j^{TP+1})$ satisfy, for each $h_j^L$ and $h_j^{\text{report}}$, the following: $\pi_i(x_j)$ is equal to
$$TP\cdot v_i(x_j)-\mathbb{E}\Bigl[\sum_{l=1}^{L}\Bigl(1_{\{\lambda_j(l)=G\}}Tu_i(x_j)+1_{\{\lambda_j(l)=B\}}\bigl(\mathrm{sign}(x_j)\,LT\bar u+Tu_i^{x_j}\bigr)\Bigr)\Bigr]-\bigl((2+2L)T^{\frac12}+2T^{\frac13}\bigr)u_i^{x_j};$$
the per-period rewards $\sum_{r=1:\text{ round }r\text{ is not a review round}}^{R}\sum_{t\in T(r)}\pi_i^{x_j}(a_{j,t},y_{j,t})$ are bounded in absolute value by $(TP-LT)\bar u$ by Lemma 2; and $\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)$ and $\pi_i^{\text{report}}(x_j,h_j^L,h_j^{\text{report}})$ are bounded by Lemma 13.

By Lemma 5, TP and LT are similar to each other for a large T. Hence, for a sufficiently large T, for each $h_j^L$ and $h_j^{\text{report}}$, the quantity
$$\frac{1}{TP}\Bigl\{\pi_i(x_j)+\sum_{r=1:\text{ round }r\text{ is not a review round}}^{R}\sum_{t\in T(r)}\pi_i^{x_j}(a_{j,t},y_{j,t})+\sum_{l=1}^{L}\pi_i^{\text{adjust}}(x_j,h_j^l,h_j^{\text{report}},l)+\pi_i^{\text{report}}(x_j,h_j^L,h_j^{\text{report}})\Bigr\}$$
lies within $\varepsilon$ of
$$v_i(x_j)-\mathbb{E}\Bigl[\sum_{l=1}^{L}\Bigl(1_{\{\lambda_j(l)=G\}}\tfrac{1}{L}u_i(x_j)+1_{\{\lambda_j(l)=B\}}\tfrac{1}{L}\bigl(\mathrm{sign}(x_j)\,L\bar u+u_i^{x_j}\bigr)\Bigr)\Bigr].$$
Moreover, by Claim 2 of Lemma 11, the probability of $\lambda_j(l)=B$ is no more than $(15+8L)\bar\varepsilon$ for each l. Taking l = L, this means that $\lambda_j(L)=G$ with probability no less than $1-(15+8L)\bar\varepsilon$. Since $\lambda_j(l)=B$ is absorbing, this also means that $\lambda_j(l)=G$ for each l with probability no less than $1-(15+8L)\bar\varepsilon$. Hence, if $x_j=G$, the quantity above is no more than
$$v_i(x_j)-u_i(x_j)+(15+8L)\bigl(|u_i(x_j)|+L\bar u+u_i^{x_j}\bigr)\bar\varepsilon+\varepsilon,$$
which is non-positive by (86). Similarly, if $x_j=B$, the quantity above is no less than
$$v_i(x_j)-u_i(x_j)-(15+8L)\bigl(|u_i(x_j)|+L\bar u+u_i^{x_j}\bigr)\bar\varepsilon-\varepsilon\geq 0.$$
Hence, for a sufficiently large T, we have $\mathrm{sign}(x_j)\bigl\{\pi_i(x_j,h_j^{TP+1})-\sum_{l=1}^{L}\pi_i^{\text{review}}(x_j,h_j^l,l)\bigr\}\leq 0$. Therefore, it suffices to show that $\mathrm{sign}(x_j)\sum_{l=1}^{L}\pi_i^{\text{review}}(x_j,h_j^l,l)\leq 0$.

Moreover, if $\lambda_j(l)=B$ for some l = 1, ..., L, then we have $\mathrm{sign}(x_j)\sum_{l=1}^{L}\pi_i^{\text{review}}(x_j,h_j^l,l)\leq 0$. To see this, recall that, for each history of player j, $\pi_i^{\text{review}}\bigl(x_j,f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)$ takes one of the three forms in (84), with $\pi_i^{\text{review}}$ in place of $\pi_i^{\text{target}}$, by (57) and (59). For each history of player j, by Lemma 2, we have
$$\mathrm{sign}(x_j)\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t})\leq 0.$$
Moreover, by Claim 5 of Lemma 11, $x(j)=x_j$ if $\lambda_j(l)=G$.
Hence, (15) ensures that
$$\mathrm{sign}(x_j)\,1_{\{\lambda_j(l)=G\}}\,T\bigl\{u_i(x_j)-u_i(\alpha(x(j)))\bigr\}\leq 0,\qquad
\mathrm{sign}(x_j)\,1_{\{\lambda_j(l)=G\}}\,T\bigl\{u_i(x_j)-u_i\bigl(BR_i(\alpha_j(x(j))),\alpha_j(x(j))\bigr)\bigr\}\leq 0.$$
Hence, together with Claim 3 of Lemma 11 ($\Lambda_i(l)(j)=B$ implies $\lambda_j(l)=B$), we have
$$\mathrm{sign}(x_j)\,\pi_i^{\text{review}}\bigl(x_j,f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)\leq\mathrm{sign}(x_j)\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t})+1_{\{\lambda_j(l)=G\}}T\bar u-1_{\{\lambda_j(l)=B\}}LT\bar u$$
by Lemma 2. Therefore, if $\lambda_j(l)=B$ for some l = 1, ..., L, then we have
$$\mathrm{sign}(x_j)\sum_{l=1}^{L}\pi_i^{\text{review}}\bigl(x_j,f_j^{\text{include}}[h_j^l],f_j^{T(l)}[h_j],l\bigr)\leq(L-1)T\bar u-LT\bar u\leq 0,$$
as desired.

Therefore, we focus on the case with $\lambda_j(l)=G$ for each l = 1, ..., L. By Claim 5 of Lemma 11, we have $x_j(j)=x_j$. Let $\bar l$ be the last review round with $\Lambda_j(l)(i)=G$ (define $\bar l=L$ if $\Lambda_j(l)(i)=G$ for each l = 1, ..., L). If $x_j=G$, then, recalling that $\alpha_j(x(j))=\alpha_j(x(j))$ with $x_j(j)=x_j=G$, we have
$$\sum_{l=1}^{L}\pi_i^{\text{review}}(x_j,h_j,l)=\sum_{l=1}^{\bar l}\Bigl\{T\{u_i(x_j)-u_i(\alpha(x(j)))\}+\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t})\Bigr\}+\sum_{l=\bar l+1}^{L}T\bigl\{u_i(x_j)-u_i\bigl(BR_i(\alpha_j(x(j))),\alpha_j(x(j))\bigr)\bigr\}$$
$$\leq LT\max_{x_i(j)\in\{G,B\}}\bigl\{u_i(x_j)-u_i(\alpha(x_j,x_i(j)))\bigr\}+\frac{\bar u}{4}T+\frac{\bar u}{4}T\leq 0$$
by (15), where we use $x_j(j)=x_j$, $\lambda_j(l)=G$ for each l, and Lemma 2 to bound $\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t})$ by $\bar uT/(4L)$ for l = 1, ..., $\bar l-1$ and by its maximum realization $\bar uT/4$ for l = $\bar l$. Similarly, if $x_j=B$, we have
$$\sum_{l=1}^{L}\pi_i^{\text{review}}(x_j,h_j,l)=\sum_{l=1}^{\bar l}\Bigl\{T\{u_i(x_j)-u_i(\alpha(x(j)))\}+\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t})\Bigr\}+\sum_{l=\bar l+1}^{L}T\bigl\{u_i(x_j)-u_i\bigl(BR_i(\alpha_j(x(j))),\alpha_j(x(j))\bigr)\bigr\}$$
$$\geq LT\min_{x_i(j)\in\{G,B\}}\Bigl\{u_i(x_j)-u_i(\alpha(x_j,x_i(j))),\,u_i(x_j)-u_i\bigl(BR_i(\alpha_j(x(j))),\alpha_j(x(j))\bigr)\Bigr\}-\frac{\bar u}{4}T-\frac{\bar u}{4}T\geq 0$$
by (15), with the analogous bounds on $\sum_{t\in T(l)}\pi_i[\alpha(x(j))](a_{j,t},y_{j,t})$. Therefore, self generation is satisfied.

Since we have proven (5)–(8), Lemma 1 ensures that Theorem 1 holds.

References

Aoyagi, M. (2002): "Collusion in dynamic Bertrand oligopoly with correlated private signals and communication," Journal of Economic Theory, 102(1), 229–248.

Aoyagi, M. (2005): "Collusion through Mediated Communication in Repeated Games with Imperfect Private Monitoring," Economic Theory, 25(2), 455–475.

Boucheron, S., G. Lugosi, and P. Massart (2013): Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press.

Compte, O. (1998): "Communication in repeated games with imperfect private monitoring," Econometrica, 66(3), 597–626.

Deb, J. (2011): "Cooperation and community responsibility: A folk theorem for repeated random matching games," mimeo.

Ely, J., J. Hörner, and W. Olszewski (2005): "Belief-free equilibria in repeated games," Econometrica, 73(2), 377–415.

Ely, J., and J. Välimäki (2002): "A robust folk theorem for the prisoner's dilemma," Journal of Economic Theory, 102(1), 84–105.

Fudenberg, D., and D. Levine (2007): "The Nash-threats folk theorem with communication and approximate common knowledge in two player games," Journal of Economic Theory, 132(1), 461–473.

Fudenberg, D., D. Levine, and E. Maskin (1994): "The folk theorem with imperfect public information," Econometrica, 62(5), 997–1039.

Fudenberg, D., and E. Maskin (1986): "The folk theorem in repeated games with discounting or with incomplete information," Econometrica, 53(3), 533–554.
Harrington, J., and A. Skrzypacz (2011): "Private monitoring and communication in cartels: Explaining recent collusive practices," American Economic Review, 101(6), 2425–2449.

Hörner, J., and W. Olszewski (2006): "The folk theorem for games with private almost-perfect monitoring," Econometrica, 74(6), 1499–1544.

Hörner, J., and W. Olszewski (2009): "How robust is the folk theorem?," The Quarterly Journal of Economics, 124(4), 1773–1814.

Kandori, M. (2002): "Introduction to repeated games with private monitoring," Journal of Economic Theory, 102(1), 1–15.

Kandori, M. (2011): "Weakly belief-free equilibria in repeated games with private monitoring," Econometrica, 79(3), 877–892.

Kandori, M., and H. Matsushima (1998): "Private observation, communication and collusion," Econometrica, 66(3), 627–652.

Kandori, M., and I. Obara (2006): "Efficiency in repeated games revisited: The role of private strategies," Econometrica, 74(2), 499–519.

Lehrer, E. (1990): "Nash equilibria of n-player repeated games with semi-standard information," International Journal of Game Theory, 19(2), 191–217.

Mailath, G., and L. Samuelson (2006): Repeated Games and Reputations: Long-Run Relationships. Oxford University Press.

Matsushima, H. (2004): "Repeated games with private monitoring: Two players," Econometrica, 72(3), 823–852.

Miyagawa, E., Y. Miyahara, and T. Sekiguchi (2008): "The folk theorem for repeated games with observation costs," Journal of Economic Theory, 139(1), 192–221.

Obara, I. (2009): "Folk theorem with communication," Journal of Economic Theory, 144(1), 120–134.

Piccione, M. (2002): "The repeated prisoner's dilemma with imperfect private monitoring," Journal of Economic Theory, 102(1), 70–83.

Radner, R., R. Myerson, and E. Maskin (1986): "An example of a repeated partnership game with discounting and with uniformly inefficient equilibria," Review of Economic Studies, 53(1), 59–69.

Rahman, D. (2014): "The Power of Communication," American Economic Review, 104(11), 3737–51.

Renault, J., and T. Tomala (2004): "Communication equilibrium payoffs in repeated games with imperfect monitoring," Games and Economic Behavior, 49(2), 313–344.

Sekiguchi, T. (1997): "Efficiency in repeated prisoner's dilemma with private monitoring," Journal of Economic Theory, 76(2), 345–361.

Stigler, G. (1964): "A theory of oligopoly," The Journal of Political Economy, 72(1), 44–61.

Sugaya, T. (2012): "Folk theorem in repeated games with private monitoring: multiple players," mimeo.

Takahashi, S. (2010): "Community enforcement when players observe partners' past play," Journal of Economic Theory, 145(1), 42–62.

Yamamoto, Y. (2007): "Efficiency results in N player games with imperfect private monitoring," Journal of Economic Theory, 135(1), 382–413.

Yamamoto, Y. (2009): "A limit characterization of belief-free equilibrium payoffs in repeated games," Journal of Economic Theory, 144(2), 802–824.

Yamamoto, Y. (2012): "Characterizing belief-free review-strategy equilibrium payoffs under conditional independence," mimeo.