Supplementary Material

Supplement to "Bounding equilibrium payoffs in repeated games with private monitoring"
(Theoretical Economics, Vol. 12, No. 2, May 2017, 691–729)

Takuo Sugaya
Graduate School of Business, Stanford University
[email protected]

Alexander Wolitzky
Department of Economics, MIT
[email protected]

Copyright © 2017 The Authors. Theoretical Economics. The Econometric Society. Licensed under the Creative Commons Attribution-NonCommercial License 3.0. Available at http://econtheory.org. DOI: 10.3982/TE2270.

Proof of Proposition 2

We prove that $E^{\mathrm{talk}}(\delta, p) = E^{\mathrm{med}}(\delta)$. In our construction, players ignore the private signals $y_{it}$ observed in periods $t = 1, 2, \dots$. That is, only the signal $y_{i0}$ observed in period 0 is used. Hence we can see $p$ as an ex ante correlation device.

Since we consider two-player games, whenever we say players $i$ and $j$, we assume that they are different players: $i \neq j$.

The structure of the proof is as follows: take any strategy of the mediator, $\tilde{\mu}$, that satisfies inequality (3) in the text (perfect monitoring incentive compatibility), and let $\tilde{v}$ be the value when the players follow $\tilde{\mu}$. Since each $\hat{v} \in E^{\mathrm{med}}(\delta)$ has a corresponding $\hat{\mu}$ that satisfies perfect monitoring incentive compatibility, it suffices to show that, for each $\varepsilon > 0$, there exists a sequential equilibrium whose equilibrium payoff $v$ satisfies $\|v - \tilde{v}\| < \varepsilon$ in the following environment:

(i) At the beginning of the game, each player $i$ receives a message $m_i^{\mathrm{mediator}}$ from the mediator.

(ii) In each period $t$, the stage game proceeds as follows:

(a) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1})$, each player $i$ sends the first message $m_{it}^{\mathrm{1st}}$ simultaneously.

(b) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}})$, each player $i$ takes action $a_{it}$ simultaneously.

(c) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}}, a_t)$, each player $i$ sends the second message $m_{it}^{\mathrm{2nd}}$ simultaneously.

We call this environment perfect monitoring with cheap talk.

To this end, from $\tilde{\mu}$, we first create a strict full-support equilibrium $\mu$ with mediated perfect monitoring that yields payoffs close to $\tilde{v}$. We then move from $\mu$ to a similar equilibrium $\mu^*$, which will be easier to transform into an equilibrium with perfect monitoring with cheap talk. Finally, from $\mu^*$, we create an equilibrium with perfect monitoring with cheap talk with the same on-path action distribution.

Construction and properties of $\mu$

In this subsection, we consider mediated perfect monitoring throughout. Since $\mathring{W}^* \neq \emptyset$, by Lemma 2 in the main text, there exists a strict full-support equilibrium $\mu^{\mathrm{strict}}$ with mediated perfect monitoring. As in the proof of that lemma, consider the following strategy of the mediator: In period 1, the mediator draws one of two states, $R^{\tilde{v}}$ and $R^{\mathrm{perturb}}$, with probabilities $1-\eta$ and $\eta$, respectively. In state $R^{\tilde{v}}$, the mediator's recommendation is determined as follows: If no player has deviated up to period $t$, the mediator recommends $r_t$ according to $\tilde{\mu}(h_m^t)$; if only player $i$ has deviated, the mediator recommends $r_{jt}$ to player $j$ according to $\alpha_j^*$, and recommends some best response to $\alpha_j^*$ to player $i$. Multiple deviations are treated as in the proof of Lemma 1. In contrast, in state $R^{\mathrm{perturb}}$, the mediator follows the equilibrium $\mu^{\mathrm{strict}}$. Let $\mu$ denote this strategy of the mediator. From now on, we fix $\eta > 0$ arbitrarily.
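To fix ideas, here is a minimal Python sketch of the mediator's randomization in $\mu$ under the two-state construction just described. It is only an illustration: every interface name (`draw_mediator_state`, `recommend`, `mu_tilde`, `mu_strict`, `alpha_star`, `best_reply`, `ETA`) is a hypothetical placeholder, not an object defined in the paper.

```python
import random

ETA = 0.1  # hypothetical value of the perturbation weight eta

def draw_mediator_state():
    """Period-1 draw: R_vtilde with probability 1 - eta, R_perturb with probability eta."""
    return "R_vtilde" if random.random() > ETA else "R_perturb"

def recommend(state, history, deviator, mu_tilde, mu_strict, alpha_star, best_reply):
    """Period-t recommendation profile under mu (illustrative interfaces only).

    mu_tilde(history) and mu_strict(history) draw from the target strategy and
    the strict full-support equilibrium; alpha_star[j]() draws from player j's
    minmax strategy; best_reply[i] is some best response to alpha_star[j].
    Players are indexed 0 and 1; deviator is None or the unilateral deviator.
    """
    if state == "R_perturb":
        return mu_strict(history)       # follow the strict full-support equilibrium
    if deviator is None:
        return mu_tilde(history)        # no deviation so far: follow mu_tilde
    i, j = deviator, 1 - deviator
    rec = [None, None]
    rec[j] = alpha_star[j]()            # player j (virtually) minmaxes the deviator
    rec[i] = best_reply[i]              # the deviator is told a best reply
    return tuple(rec)
```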
With mediated perfect monitoring, since $\mu^{\mathrm{strict}}$ has full support, player $i$ believes that the mediator's state is $R^{\mathrm{perturb}}$ with positive probability after any history. Therefore, by perfect monitoring incentive compatibility and the fact that $\mu^{\mathrm{strict}}$ is a strict equilibrium, it is always strictly optimal for each player $i$ to follow her recommendation. This means that, for each period $t$, there exist $\varepsilon_t > 0$ and $T_t < \infty$ such that, for each player $i$ and on-path history $(h_m^t, r_{it})$, we have

$$(1-\delta)E^{\mu}[u_i(r_t) \mid h_m^t, r_{it}] + \delta E^{\mu}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(\mu(h_m^\tau)) \,\Big|\, h_m^t, r_{it}\Big] > \max_{a_i\in A_i}(1-\delta)E^{\mu}[u_i(a_i, r_{jt}) \mid h_m^t, r_{it}] + (\delta-\delta^{T_t})\Big[(1-\varepsilon_t)\max_{\hat a_i}u_i(\hat a_i, \alpha_j^{\varepsilon_t}) + \varepsilon_t\max_{a\in A}u_i(a)\Big] + \delta^{T_t}\max_{a\in A}u_i(a). \tag{S1}$$

That is, suppose that if player $i$ unilaterally deviates from an on-path history, then player $j$ virtually minmaxes player $i$ for $T_t - 1$ periods with probability $1-\varepsilon_t$. (Recall that $\alpha_j^*$ is the minmax strategy and $\alpha_j^{\varepsilon_t}$ is a full-support perturbation of $\alpha_j^*$.) Then player $i$ has a strict incentive not to deviate from any recommendation in period $t$ on the equilibrium path. Equivalently, since $\mu$ is a full-support recommendation, player $i$ has a strict incentive not to deviate unless she herself has deviated.

Moreover, for sufficiently small $\varepsilon_t > 0$, we have

$$(1-\delta)E^{\mu}[u_i(r_t)\mid h_m^t, r_{it}] + \delta E^{\mu}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(\mu(h_m^\tau)) \,\Big|\, h_m^t, r_{it}\Big] > (1-\delta^{T_t})\Big[(1-\varepsilon_t)\max_{\hat a_i}u_i(\hat a_i,\alpha_j^{\varepsilon_t}) + \varepsilon_t\max_{a\in A}u_i(a)\Big] + \delta^{T_t}\max_{a\in A}u_i(a). \tag{S2}$$

That is, if a deviation is punished with probability $1-\varepsilon_t$ for $T_t$ periods including the current period, then player $i$ believes that the deviation is strictly unprofitable.¹

¹ If the current on-path recommendation schedule $\Pr^{\mu}(r_{jt}\mid h_m^t, r_{it})$ is very close to $\alpha_j^*$, then (S2) may be more restrictive than (S1).

For each $t$, we fix $\varepsilon_t > 0$ and $T_t < \infty$ satisfying (S1) and (S2). Without loss, we can take $\varepsilon_t$ decreasing: $\varepsilon_t \geq \varepsilon_{t+1}$ for each $t$.

Construction and properties of $\mu^*$

In this subsection, we again consider mediated perfect monitoring. We further modify $\mu$ and create the following mediator's strategy $\mu^*$: Fix a fully mixed $\mathring{\mu} \in \Delta(A)$ with $u(\mathring{\mu}) \in \mathring{W}^*$. At the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{it}^{\mathrm{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, for each $i$ and $t$, she draws $\omega_{it} \in \{R, P\}$ such that $\omega_{it} = R$ (regular) and $P$ (punish) with probability $1-p_t$ and $p_t$, respectively, independently across $i$ and $t$. We will pin down $p_t > 0$ in Lemma S1. Moreover, given $\omega_t = (\omega_{1t}, \omega_{2t})$, the mediator chooses $r_t(a^t)$ for each $a^t$ as follows: If $\omega_{1t} = \omega_{2t} = R$, then she draws $r_t(a^t)$ according to $\mu(a^t)(r)$ if $\omega_{1\tau} = \omega_{2\tau} = R$ for each $\tau \leq t-1$, and draws $r_t(a^t)$ according to $\mathring{\mu}(r)$ if there exists $\tau \leq t-1$ with $\omega_{1\tau} = P$ or $\omega_{2\tau} = P$. If $\omega_{it} = R$ and $\omega_{jt} = P$, then she draws $r_{it}(a^t)$ from $\Pr^{\mu}(r_i \mid r_{jt}^{\mathrm{punish}}(a^t))$, while she draws $r_{jt}(a^t)$ uniformly at random from $A_j$.² Finally, if $\omega_{1t} = \omega_{2t} = P$, then she draws $r_{it}(a^t)$ uniformly at random from $A_i$ for each $i$ independently. Since $\mu$ has full support, $\mu^*$ is well defined.

² As will be seen below, if $\omega_{jt} = P$, then player $j$ is supposed to take $r_{jt}^{\mathrm{punish}}(a^t)$. Hence, $r_{jt}(a^t)$ does not affect the equilibrium action. We define $r_{jt}(a^t)$ so that, when the mediator sends a message only at the beginning of the game (in the game with perfect monitoring with cheap talk), she sends a "dummy recommendation" $r_{jt}(a^t)$, so that player $j$ does not realize that $\omega_{jt} = P$ until period $t$.

As will be seen, we will take $p_t$ sufficiently small. In addition, recall that $\eta > 0$ (the perturbation of $\tilde{\mu}$ to $\mu$) is arbitrary. In the next subsection and onward, we construct an equilibrium with perfect monitoring with cheap talk that has the same equilibrium action distribution as $\mu^*$. Since $p_t$ is small and $\eta > 0$ is arbitrary, constructing such an equilibrium suffices to prove Proposition 2.
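Before turning to the message protocol, the following Python sketch illustrates the draws that define $\mu^*$. The sampling interfaces (`mu`, the keyword `given_other`, `mu_ring`, `r_punish`, `actions`) are hypothetical stand-ins for the conditional distributions in the text, not part of the paper's formalism.

```python
import random

def draw_states(p, horizon):
    """Draw omega_{it} in {R, P} independently across players i and periods t,
    with omega_{it} = P having probability p[t] (pinned down in Lemma S1)."""
    return [["P" if random.random() < p[t] else "R" for _ in range(2)]
            for t in range(horizon)]

def draw_recommendation(omega, t, a_hist, mu, mu_ring, r_punish, actions):
    """Draw the recommendation profile r_t(a^t) under mu* (illustrative).

    mu(a_hist) and mu(a_hist, given_other=...) draw from mu's (conditional)
    recommendation distribution; mu_ring() draws from the fully mixed profile
    with u(mu_ring) in the interior of W*; r_punish[j] is player j's punishment
    draw from alpha_j^{eps_t}; actions[i] is player i's action set A_i.
    """
    if omega[t] == ["R", "R"]:
        past_punish = any("P" in omega[s] for s in range(t))
        return mu_ring() if past_punish else mu(a_hist)
    r = [None, None]
    for i in range(2):
        j = 1 - i
        if omega[t][i] == "R":
            # omega_it = R, omega_jt = P: draw r_it given r_jt^punish.
            r[i] = mu(a_hist, given_other=r_punish[j])
        else:
            # omega_it = P: a uniform "dummy recommendation" on A_i.
            r[i] = random.choice(actions[i])
    return tuple(r)
```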
At the start of the game, the mediator draws $\omega_t$, $r_{it}^{\mathrm{punish}}(a^t)$, and $r_t(a^t)$ for each $i$, $t$, and $a^t$. Given them, the mediator sends messages to the players as follows:

(i) At the start of the game, the mediator sends $((r_{it}^{\mathrm{punish}}(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$ to player $i$.

(ii) In each period $t$, the stage game proceeds as follows:

(a) The mediator decides $\bar\omega_t(a^t) \in \{R,P\}^2$ as follows: if there is no unilateral deviator (defined below), then the mediator sets $\bar\omega_t(a^t) = \omega_t$. If instead player $i$ is a unilateral deviator, then the mediator sets $\bar\omega_{it}(a^t) = R$ and $\bar\omega_{jt}(a^t) = P$.

(b) Given $\bar\omega_{it}(a^t)$, the mediator sends $\bar\omega_{it}(a^t)$ to player $i$. In addition, if $\bar\omega_{it}(a^t) = R$, then the mediator sends $r_{it}(a^t)$ to player $i$ as well.

(c) Given these messages, player $i$ takes an action. In equilibrium, if player $i$ has not yet deviated, then player $i$ takes $r_{it}(a^t)$ if $\bar\omega_{it}(a^t) = R$ and takes $r_{it}^{\mathrm{punish}}(a^t)$ if $\bar\omega_{it}(a^t) = P$. For notational convenience, let

$$r_{it} = \begin{cases} r_{it}(a^t) & \text{if } \bar\omega_{it}(a^t) = R, \\ r_{it}^{\mathrm{punish}}(a^t) & \text{if } \bar\omega_{it}(a^t) = P \end{cases}$$

be the action that player $i$ is supposed to take if she has not yet deviated. Her strategy after her own deviation is not specified.

We say that player $i$ has unilaterally deviated if there exist $\tau \leq t-1$ and a unique $i$ such that (i) for each $\tau' < \tau$, we have $a_{n\tau'} = r_{n\tau'}$ for each $n \in \{1,2\}$ (no deviation happened until period $\tau-1$), and (ii) $a_{i\tau} \neq r_{i\tau}$ and $a_{j\tau} = r_{j\tau}$ (player $i$ deviates in period $\tau$ and player $j$ does not deviate).

Note that $\mu^*$ is close to $\mu$ on the equilibrium path for sufficiently small $p_t$. Hence, on-path strict incentive compatibility for player $i$ follows from (S1). Moreover, the incentive compatibility condition analogous to (S2) also holds.

Lemma S1. There exists $\{p_t\}_{t=1}^{\infty}$ with $p_t > 0$ for each $t$ such that it is strictly optimal for each player $i$ to follow her recommendation: For each player $i$ and history

$$h_i^t \equiv \Big(\big(\big(r_{it}^{\mathrm{punish}}(a^t)\big)_{a^t\in A^{t-1}}\big)_{t=1}^{\infty},\ a^t,\ \big(\bar\omega_\tau(a^\tau)\big)_{\tau=1}^{t-1},\ \bar\omega_{it}(a^t),\ (r_{i\tau})_{\tau=1}^{t}\Big)$$

such that player $i$ herself has not yet deviated, we have the following two inequalities:

(i) If a deviation is punished by $\alpha_j^{\varepsilon_t}$ for the next $T_t$ periods with probability $1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1}p_\tau$, then it is strictly unprofitable:

$$(1-\delta)E^{\mu^*}[u_i(r_{it},a_{jt}) \mid h_i^t] + \delta E^{\mu^*}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau},a_{j\tau}) \,\Big|\, h_i^t, a_{it}=r_{it}\Big] > \max_{a_i\in A_i}(1-\delta)E^{\mu^*}[u_i(a_i,a_{jt}) \mid h_i^t] + (\delta-\delta^{T_t})\Big[\Big(1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1}p_\tau\Big)\max_{\hat a_i}u_i(\hat a_i,\alpha_j^{\varepsilon_t}) + \Big(\varepsilon_t+\sum_{\tau=t}^{t+T_t-1}p_\tau\Big)\max_{a\in A}u_i(a)\Big] + \delta^{T_t}\max_{a\in A}u_i(a). \tag{S3}$$

(ii) If a deviation is punished by $\alpha_j^{\varepsilon_t}$ from the current period with probability $1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1}p_\tau$, then it is strictly unprofitable:

$$(1-\delta)E^{\mu^*}[u_i(r_{it},a_{jt}) \mid h_i^t] + \delta E^{\mu^*}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau},a_{j\tau}) \,\Big|\, h_i^t, a_{it}=r_{it}\Big] > (1-\delta^{T_t})\Big[\Big(1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1}p_\tau\Big)\max_{\hat a_i}u_i(\hat a_i,\alpha_j^{\varepsilon_t}) + \Big(\varepsilon_t+\sum_{\tau=t}^{t+T_t-1}p_\tau\Big)\max_{a\in A}u_i(a)\Big] + \delta^{T_t}\max_{a\in A}u_i(a). \tag{S4}$$

Moreover, $E^{\mu^*}$ does not depend on the specification of player $j$'s strategy after player $j$'s own deviation, for each history $h_i^t$ such that player $i$ has not deviated.

Proof.
Given $\mathring{\mu}$, since $u(\mathring{\mu}) \in \mathring{W}^*$, for sufficiently small $\varepsilon_t > 0$, we have

$$(1-\delta)E^{\mathring{\mu}}[u_i(r_t)\mid h_m^t, r_{it}] + \delta u_i(\mathring{\mu}) > \max_{a_i\in A_i}(1-\delta)E[u_i(a_i, r_{-it}) \mid h_m^t, r_{it}] + (\delta-\delta^{T_t})\Big[(1-\varepsilon_t)\max_{\hat a_i}u_i(\hat a_i,\alpha_j^{\varepsilon_t}) + \varepsilon_t\max_{a\in A}u_i(a)\Big] + \delta^{T_t}\max_{a\in A}u_i(a)$$

and

$$(1-\delta)E^{\mathring{\mu}}[u_i(r_t)\mid h_m^t, r_{it}] + \delta u_i(\mathring{\mu}) > (1-\delta^{T_t})\Big[(1-\varepsilon_t)\max_{\hat a_i}u_i(\hat a_i,\alpha_j^{\varepsilon_t}) + \varepsilon_t\max_{a\in A}u_i(a)\Big] + \delta^{T_t}\max_{a\in A}u_i(a).$$

Hence (S1) and (S2) hold with $\mu$ replaced by $\mathring{\mu}$.

Since $\mu^*$ has full support on the equilibrium path, a player $i$ who has not yet deviated always believes that player $j$ has not deviated. Hence, $E^{\mu^*}$ is well defined without specifying player $j$'s strategy after player $j$'s own deviation.

Moreover, since $p_t$ is small and $\omega_{jt}$ is independent of $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{it}$, given $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t-1}$ and $\bar\omega_{it}(a^t)$ (which are equal to $(\omega_\tau)_{\tau=1}^{t-1}$ and $\omega_{it}$ on path), player $i$ believes that $\bar\omega_{jt}(a^t)$ is equal to $\omega_{jt}$ and $\omega_{jt}$ is equal to $R$ with high probability, unless player $i$ has deviated. Since

$$\Pr{}^{\mu^*}(r_{jt} \mid \bar\omega_{it}(a^t),\ \bar\omega_{jt}(a^t)=R,\ h_i^t) = \Pr{}^{\mu^*}(r_{jt} \mid a^t, r_{it}),$$

we have that the difference

$$E^{\mu^*}[u_i(r_{it},a_{jt}) \mid h_i^t] - E^{\mu}[u_i(r_{it},a_{jt}) \mid a^t, r_{it}]$$

is small for small $p_t$. Further, if $p_\tau$ is small for each $\tau \geq t+1$, then since $\omega_\tau$ is independent of $\omega_{t'}$ with $t' \leq \tau-1$, regardless of $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t}$, player $i$ believes that $\bar\omega_{i\tau}(a^\tau) = \bar\omega_{j\tau}(a^\tau) = R$ with high probability for $\tau \geq t+1$ on the equilibrium path. Since the distribution of the recommendation given $\mu^*$ is the same as that of $\mu$ (or $\mathring{\mu}$) given $a^\tau$ and $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) = (R,R)$ for each $\tau \leq t-1$ (or $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) \neq (R,R)$ for some $\tau \leq t-1$, respectively), we have that

1. for each $h_i^t$ with $\bar\omega_{it}(a^t) = R$ and $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) = (R,R)$ for each $\tau \leq t-1$,

$$E^{\mu^*}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau},a_{j\tau}) \,\Big|\, h_i^t, a_{it}=r_{it}\Big] - E^{\mu}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau},a_{j\tau}) \,\Big|\, a^t, r_{it}, a_{it}=r_{it}\Big]$$

is small for small $p_\tau$ with $\tau \geq t+1$; and

2. for each $h_i^t$ with $\bar\omega_{it}(a^t) = R$ and $(\bar\omega_{i\tau}(a^\tau), \bar\omega_{j\tau}(a^\tau)) \neq (R,R)$ for some $\tau \leq t-1$,

$$E^{\mu^*}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau},a_{j\tau}) \,\Big|\, h_i^t, a_{it}=r_{it}\Big] - E^{\mathring{\mu}}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(r_{i\tau},a_{j\tau}) \,\Big|\, a^t, r_{it}, a_{it}=r_{it}\Big]$$

is small for small $p_\tau$ with $\tau \geq t+1$.

Hence, (S1) and (S2) (and the same inequalities with $\mu$ replaced with $\mathring{\mu}$) imply that there exists $\bar p_t > 0$ such that, if $p_\tau \leq \bar p_t$ for each $\tau \geq t$, then the claims of the lemma hold. Hence, if we take $p_t \leq \min_{\tau\leq t}\bar p_\tau$, then the claims hold.

We fix $\{p_t\}_{t=1}^{\infty}$ so that Lemma S1 holds. This fully pins down $\mu^*$ with mediated perfect monitoring.

Construction with perfect monitoring with cheap talk

Given $\mu^*$ with mediated perfect monitoring, we define the equilibrium strategy with perfect monitoring with cheap talk such that the equilibrium action distribution is the same as under $\mu^*$. We must pin down the following four objects: what message $m_i^{\mathrm{mediator}}$ player $i$ receives from the mediator at the beginning of the game; what message $m_{it}^{\mathrm{1st}}$ player $i$ sends at the beginning of period $t$; what action $a_{it}$ player $i$ takes in period $t$; and what message $m_{it}^{\mathrm{2nd}}$ player $i$ sends at the end of period $t$.

Intuitive argument

As in $\mu^*$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{it}^{\mathrm{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$. In addition, with $p_t > 0$ pinned down in Lemma S1, she draws $\omega_t \in \{R,P\}^2$ and $r_t(a^t)$ as in $\mu^*$ for each $t$ and $a^t$. She then defines $\bar\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\mu^*$. Intuitively, the mediator sends all the information about

$$\Big(\big(\bar\omega_t(a^t),\ r_t(a^t),\ r_{1t}^{\mathrm{punish}}(a^t),\ r_{2t}^{\mathrm{punish}}(a^t)\big)_{a^t\in A^{t-1}}\Big)_{t=1}^{\infty}$$

through the initial messages $(m_1^{\mathrm{mediator}}, m_2^{\mathrm{mediator}})$.
In particular, the mediator directly 1 2 punish sends ((rit mediator . Hence, we focus on how (at ))at ∈At−1 )∞ t=1 to player i as a part of mi Supplementary Material Bounding equilibrium payoffs 7 we replicate the role of the mediator in μ∗ of sending (ω̄t (at ) rt (at )) in each period, depending on realized history at . The key features to establish are (i) player i does not know the instructions for the other player, (ii) before player i reaches period t, player i does not know her own recommendations for periods τ ≥ t (otherwise, player i would obtain more information than the original equilibrium μ∗ and thus might want to deviate), and (iii) no player wants to deviate (in particular, if player i deviates in actions or cheap talk, then the strategy of player j is as if the state were ω̄jt = P in μ∗ , for a sufficiently long time with a sufficiently high probability). The properties (i) and (ii) are achieved by the same mechanism as in Theorem 9 of Heller et al. (2012, henceforth HST). In particular, without loss, let Ai = {1i ni } be player i’s action set. We can view rit (at ) as an element of {1 ni }. The mediator at the beginning of the game draws rt (at ) for each at . Instead of sending rit (at ) directly to player i, the mediator encodes rit (at ) as follows: For a sufficiently large N t ∈ Z to be determined, we define pt = N t ni nj . This pt j corresponds to ph in HST. Let Zpt ≡ {1 pt }. The mediator draws xit (at ) uniformly and independently from Zpt for each i, t, and at . Given them, she defines t j i yit a ≡ xit at + rit at (mod ni ) (S5) i (at ) is the “encoded instruction” of r (at ), and to obtain r (at ) from Intuitively, yit it it j i (at ), player i needs to know x (at ). The mediator gives ((y i (at )) ∞ yit at ∈At−1 )t=1 to player i i it j as a part of mmediator . At the same time, she gives ((xit (at ))at ∈At−1 )∞ i t=1 to player j as a part j of mmediator . At the beginning of period t, player j sends xit (at ) by cheap talk as a part j t t of m1st jt , based on the realized action a , so that player i does not know rit (a ) until period t. (Throughout the proof, the superscript of a variable represents who is informed about the variable, and the subscript represents whose recommendation the variable is about.) To incentivize player j to tell the truth, the equilibrium should embed a mechanism that punishes player i if she tells a lie. In HST, this is done as follows: The mediator draws αiit (at ) and βiit (at ) uniformly and independently from Zpt , and defines j j uit at ≡ αiit at × xit at + βiit at mod pt j (S6) j The mediator gives xit (at ) and uit (at ) to player j while she gives αiit (at ) and βiit (at ) to j j player i. In period t, player j is supposed to send xit (at ) and uit (at ) to player i. If player j j i receives xit (at ) and uit (at ) with j j uit at = αiit at × xit at + βiit at mod pt (S7) then player i interprets that player j has deviated. For sufficiently large N t , since player j j does not know αiit (at ) and βiit (at ), if player j tells a lie about xit (at ), then with a high probability, player j creates a situation where (S7) holds. 8 Sugaya and Wolitzky Supplementary Material Since HST considers Nash equilibrium, they let player i minimax player j forever after (S7) holds. 
However, since we consider sequential equilibrium, as in the proof of Lemma 2, we will create a coordination mechanism such that, if player $j$ tells a lie, then with high probability player $i$ minmaxes player $j$ for a long time, and player $i$ assigns probability 0 to the event that she is punishing player $j$ (she interprets the punishment as on-path play).

To this end, we consider the following coordination: First, if and only if $\bar\omega_{it}(a^t) = R$, the mediator defines $u_{it}^j(a^t)$ as in (S6). Otherwise, $u_{it}^j(a^t)$ is randomly drawn. That is,

$$u_{it}^j(a^t) \equiv \begin{cases} \alpha_{it}^i(a^t) \times x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t} & \text{if } \bar\omega_{it}(a^t) = R, \\ \text{uniformly distributed over } \mathbb{Z}_{p^t} & \text{if } \bar\omega_{it}(a^t) = P. \end{cases} \tag{S8}$$

Since both $\bar\omega_{it}(a^t) = R$ and $\bar\omega_{it}(a^t) = P$ happen with positive probability, player $i$, after receiving $u_{it}^j(a^t)$ with $u_{it}^j(a^t) \neq \alpha_{it}^i(a^t) \times x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t}$, interprets that $\bar\omega_{it}(a^t) = P$. For notational convenience, let $\hat\omega_{it}(a^t) \in \{R,P\}$ be player $i$'s interpretation of $\bar\omega_{it}(a^t)$. After $\hat\omega_{it}(a^t) = P$, she takes her period-$t$ action according to $r_{it}^{\mathrm{punish}}(a^t)$. Given this inference, if player $j$ tells a lie about $u_{it}^j(a^t)$ when $\bar\omega_{it}(a^t) = R$, then with high probability she induces a situation with $u_{it}^j(a^t) \neq \alpha_{it}^i(a^t) \times x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t}$, and player $i$ punishes player $j$ in period $t$ (without noticing player $j$'s deviation).

Second, switching to $r_{it}^{\mathrm{punish}}(a^t)$ for period $t$ only may not suffice if player $j$ believes that player $i$'s action distribution given $\bar\omega_{it}(a^t) = R$ is close to the minmax strategy. Hence, we ensure that once player $j$ deviates, player $i$ takes $r_{i\tau}^{\mathrm{punish}}(a^\tau)$ for a sufficiently long time. To this end, we change the mechanism so that player $j$ does not always know $u_{it}^j(a^t)$. Instead, the mediator draws $p^t$ independent random variables $v_{it}^j(n, a^t)$, $n = 1, \dots, p^t$, uniformly from $\mathbb{Z}_{p^t}$. In addition, she draws $n_{it}^i(a^t)$ uniformly from $\mathbb{Z}_{p^t}$. The mediator defines $u_{it}^j(n, a^t)$ for each $n = 1, \dots, p^t$ as

$$u_{it}^j(n, a^t) = \begin{cases} u_{it}^j(a^t) & \text{if } n = n_{it}^i(a^t), \\ v_{it}^j(n, a^t) & \text{otherwise,} \end{cases}$$

that is, $u_{it}^j(n, a^t)$ corresponds to $u_{it}^j(a^t)$ with (S8) only if $n = n_{it}^i(a^t)$. For other $n$, $u_{it}^j(n, a^t)$ is completely random.

The mediator sends $n_{it}^i(a^t)$ to player $i$, and sends $\{u_{it}^j(n, a^t)\}_{n\in\mathbb{Z}_{p^t}}$ to player $j$. In addition, the mediator sends $n_{it}^j(a^t)$ to player $j$, where

$$n_{it}^j(a^t) = \begin{cases} n_{it}^i(a^t) & \text{if } \bar\omega_{it-1}(a^{t-1}) = P, \\ \text{uniformly distributed over } \mathbb{Z}_{p^t} & \text{if } \bar\omega_{it-1}(a^{t-1}) = R, \end{cases}$$

that is, $n_{it}^j(a^t)$ is equal to $n_{it}^i(a^t)$ if and only if the last-period state $\bar\omega_{it-1}(a^{t-1})$ is equal to $P$.

In period $t$, player $j$ is asked to send $x_{it}^j(a^t)$ and $u_{it}^j(n, a^t)$ with $n = n_{it}^i(a^t)$, that is, to send $x_{it}^j(a^t)$ and $u_{it}^j(a^t)$. If and only if player $j$'s messages $\hat x_{it}^j(a^t)$ and $\hat u_{it}^j(a^t)$ satisfy

$$\hat u_{it}^j(a^t) = \alpha_{it}^i(a^t) \times \hat x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t},$$

player $i$ interprets $\hat\omega_{it}(a^t) = R$. If player $i$ has $\hat\omega_{it}(a^t) = R$, then player $i$ knows that player $j$ needs to know $n_{it+1}^i(a^{t+1})$ to send the correct $u_{it+1}^j(n, a^{t+1})$ in the next period. Hence, she sends $n_{it+1}^i(a^{t+1})$ to player $j$. If player $i$ has $\hat\omega_{it}(a^t) = P$, then she believes that player $j$ knows $n_{it+1}^i(a^{t+1})$ and does not send $n_{it+1}^i(a^{t+1})$.

Given this coordination, once player $j$ creates a situation with $\bar\omega_{it}(a^t) = R$ but $\hat\omega_{it}(a^t) = P$, player $j$ cannot receive $n_{it+1}^i(a^{t+1})$. Without knowing $n_{it+1}^i(a^{t+1})$, with a large $N^{t+1}$, with high probability player $j$ does not know which $u_{it+1}^j(n, a^{t+1})$ she should send. Then, again, she will create a situation with

$$\hat u_{it+1}^j(a^{t+1}) \neq \alpha_{it+1}^i(a^{t+1}) \times \hat x_{it+1}^j(a^{t+1}) + \beta_{it+1}^i(a^{t+1}) \pmod{p^{t+1}},$$

that is, $\hat\omega_{it+1}(a^{t+1}) = P$.
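The index-hiding layer just described can be sketched as follows, again in illustrative Python with a toy modulus (the actual $p^t$ is astronomically large, so the explicit table below is purely conceptual); all names are hypothetical.

```python
import random

def hide_check_value(u_true, p_t):
    """Bury the true check value u_{it}^j(a^t) in a table of p_t uniform draws.

    Returns the table {n: u_{it}^j(n, a^t)} given to player j and the secret
    index n_{it}^i(a^t) given to player i.  (Use a toy p_t only.)
    """
    table = [random.randrange(p_t) for _ in range(p_t)]   # the v_{it}^j(n, a^t)
    n_secret = random.randrange(p_t)                      # n_{it}^i(a^t)
    table[n_secret] = u_true                              # true value only at n_secret
    return table, n_secret

def index_sent_to_j(n_secret, last_state_was_P, p_t):
    """n_{it}^j(a^t): equals the secret index iff last period's state was P
    (or t = 1); otherwise an independent uniform dummy, so that player j must
    rely on player i's second message to learn which table entry is real."""
    return n_secret if last_state_was_P else random.randrange(p_t)
```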
Recursively, player $i$ has $\hat\omega_{i\tau}(a^\tau) = P$ for a long time with high probability if player $j$ tells a lie.

Finally, if player $j$ takes a deviant action in period $t$, then the mediator has drawn $\bar\omega_{i\tau}(a^\tau) = P$ for each $\tau \geq t+1$ for the $a^\tau$ corresponding to the realized history. With $\bar\omega_{i\tau}(a^\tau) = P$, so as to avoid $\hat\omega_{i\tau}(a^\tau) = P$, player $j$ needs to create a situation with

$$\hat u_{i\tau}^j(a^\tau) = \alpha_{i\tau}^i(a^\tau) \times \hat x_{i\tau}^j(a^\tau) + \beta_{i\tau}^i(a^\tau) \pmod{p^\tau}$$

without knowing $\alpha_{i\tau}^i(a^\tau)$ and $\beta_{i\tau}^i(a^\tau)$, while by (S8) the mediator's message does not tell her what $\alpha_{i\tau}^i(a^\tau) \times x_{i\tau}^j(a^\tau) + \beta_{i\tau}^i(a^\tau) \pmod{p^\tau}$ is. Hence, for sufficiently large $N^\tau$, player $j$ cannot avoid $\hat\omega_{i\tau}(a^\tau) = P$ with a nonnegligible probability. Hence, player $j$ will be minmaxed from the next period with high probability.

The above argument in total shows that if player $j$ deviates, whether in communication or in action, then she will be minmaxed for a sufficiently long time. Lemma S1 ensures that player $j$ does not want to tell a lie or take a deviant action.

Formal construction

Let us formalize the above construction: As in $\mu^*$, at the beginning of the game, for each $i$, $t$, and $a^t$, the mediator draws $r_{it}^{\mathrm{punish}}(a^t)$ according to $\alpha_i^{\varepsilon_t}$; then she draws $\omega_t \in \{R,P\}^2$ and $r_t(a^t)$ for each $t$ and $a^t$; and then she defines $\bar\omega_t(a^t)$ from $a^t$, $r_t(a^t)$, and $\omega_t$ as in $\mu^*$. For each $t$ and $a^t$, she draws $x_{it}^j(a^t)$ uniformly and independently from $\mathbb{Z}_{p^t}$. Given them, she defines

$$y_{it}^i(a^t) \equiv x_{it}^j(a^t) + r_{it}(a^t) \pmod{n_i}$$

so that (S5) holds.

The mediator draws $\alpha_{it}^i(a^t)$, $\beta_{it}^i(a^t)$, $\tilde u_{it}^j(a^t)$, $v_{it}^j(n, a^t)$ for each $n \in \mathbb{Z}_{p^t}$, $n_{it}^i(a^t)$, and $\tilde n_{it}^j(a^t)$ from the uniform distribution over $\mathbb{Z}_{p^t}$, independently for each player $i$, each period $t$, and each $a^t$.

As in (S8), the mediator defines

$$u_{it}^j(a^t) \equiv \begin{cases} \alpha_{it}^i(a^t) \times x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t} & \text{if } \bar\omega_{it}(a^t) = R, \\ \tilde u_{it}^j(a^t) & \text{if } \bar\omega_{it}(a^t) = P. \end{cases}$$

In addition, the mediator defines

$$u_{it}^j(n, a^t) = \begin{cases} u_{it}^j(a^t) & \text{if } n = n_{it}^i(a^t), \\ v_{it}^j(n, a^t) & \text{otherwise} \end{cases}$$

and

$$n_{it}^j(a^t) = \begin{cases} n_{it}^i(a^t) & \text{if } t = 1 \text{ or } \bar\omega_{it-1}(a^{t-1}) = P, \\ \tilde n_{it}^j(a^t) & \text{if } t \neq 1 \text{ and } \bar\omega_{it-1}(a^{t-1}) = R, \end{cases}$$

as explained above.

Let us now define the equilibrium:

(i) At the beginning of the game, the mediator sends

$$m_i^{\mathrm{mediator}} = \Big(\big(y_{it}^i(a^t),\ \alpha_{it}^i(a^t),\ \beta_{it}^i(a^t),\ r_{it}^{\mathrm{punish}}(a^t),\ n_{it}^i(a^t),\ n_{jt}^i(a^t),\ \{u_{jt}^i(n, a^t)\}_{n\in\mathbb{Z}_{p^t}},\ x_{jt}^i(a^t)\big)_{a^t\in A^{t-1}}\Big)_{t=1}^{\infty}$$

to each player $i$.

(ii) In each period $t$, the stage game proceeds as follows: In equilibrium,

$$m_{jt}^{\mathrm{1st}} = \begin{cases} \big(u_{it}^j(m_{it-1}^{\mathrm{2nd}}, a^t),\ x_{it}^j(a^t)\big) & \text{if } t \neq 1 \text{ and } m_{it-1}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\ \big(u_{it}^j(n_{it}^j(a^t), a^t),\ x_{it}^j(a^t)\big) & \text{if } t = 1 \text{ or } m_{it-1}^{\mathrm{2nd}} = \{\text{babble}\}, \end{cases} \tag{S9}$$

and

$$m_{jt}^{\mathrm{2nd}} = \begin{cases} n_{jt+1}^j(a^{t+1}) & \text{if } \hat\omega_{jt}(a^t) = R, \\ \{\text{babble}\} & \text{if } \hat\omega_{jt}(a^t) = P. \end{cases}$$

Note that, since $m_{jt}^{\mathrm{2nd}}$ is sent at the end of period $t$, the players know $a^{t+1} = (a_1, \dots, a_t)$.

(a) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1})$, each player $i$ sends the first message $m_{it}^{\mathrm{1st}}$ simultaneously. If player $i$ herself has not yet deviated, then

$$m_{it}^{\mathrm{1st}} = \begin{cases} \big(u_{jt}^i(m_{jt-1}^{\mathrm{2nd}}, a^t),\ x_{jt}^i(a^t)\big) & \text{if } t \neq 1 \text{ and } m_{jt-1}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\ \big(u_{jt}^i(n_{jt}^i(a^t), a^t),\ x_{jt}^i(a^t)\big) & \text{if } t = 1 \text{ or } m_{jt-1}^{\mathrm{2nd}} = \{\text{babble}\}. \end{cases}$$

Let $m_{it}^{\mathrm{1st}}(u)$ be the first element of $m_{it}^{\mathrm{1st}}$ (that is, either $u_{jt}^i(m_{jt-1}^{\mathrm{2nd}}, a^t)$ or $u_{jt}^i(n_{jt}^i(a^t), a^t)$ in equilibrium), and let $m_{it}^{\mathrm{1st}}(x)$ be the second element ($x_{jt}^i(a^t)$ in equilibrium). As a result, the profile of the messages $m_t^{\mathrm{1st}}$ becomes common knowledge. If

$$m_{jt}^{\mathrm{1st}}(u) \neq \alpha_{it}^i(a^t) \times m_{jt}^{\mathrm{1st}}(x) + \beta_{it}^i(a^t) \pmod{p^t}, \tag{S10}$$

then player $i$ interprets $\hat\omega_{it}(a^t) = P$. Otherwise, $\hat\omega_{it}(a^t) = R$.
(b) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}})$, each player $i$ takes action $a_{it}$ simultaneously. If player $i$ herself has not yet deviated, then player $i$ takes $a_{it} = r_{it}$ with

$$r_{it} = \begin{cases} y_{it}^i(a^t) - m_{jt}^{\mathrm{1st}}(x) \pmod{n_i} & \text{if } \hat\omega_{it}(a^t) = R, \\ r_{it}^{\mathrm{punish}}(a^t) & \text{if } \hat\omega_{it}(a^t) = P. \end{cases} \tag{S11}$$

Recall that $y_{it}^i(a^t) \equiv x_{it}^j(a^t) + r_{it}(a^t) \pmod{n_i}$ by (S5). By (S9), therefore, player $i$ takes $r_{it}(a^t)$ if $\bar\omega_{it}(a^t) = R$ and $r_{it}^{\mathrm{punish}}(a^t)$ if $\bar\omega_{it}(a^t) = P$ on the equilibrium path, as in $\mu^*$.

(c) Given player $i$'s history $(m_i^{\mathrm{mediator}}, (m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}, m_t^{\mathrm{1st}}, a_t)$, each player $i$ sends the second message $m_{it}^{\mathrm{2nd}}$ simultaneously. If player $i$ herself has not yet deviated, then

$$m_{it}^{\mathrm{2nd}} = \begin{cases} n_{it+1}^i(a^{t+1}) & \text{if } \hat\omega_{it}(a^t) = R, \\ \{\text{babble}\} & \text{if } \hat\omega_{it}(a^t) = P. \end{cases}$$

As a result, the profile of the messages $m_t^{\mathrm{2nd}}$ becomes common knowledge. Note that $\bar\omega_t(a^t)$ becomes common knowledge as well on the equilibrium path.
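Putting the pieces together, player $i$'s on-path behavior in period $t$ (the interpretation rule (S10), the action rule (S11), and the second message) can be sketched as follows. `MedMsg` and its fields are a hypothetical packaging of the relevant components of $m_i^{\mathrm{mediator}}$, each indexed by the realized history $a^t$.

```python
from collections import namedtuple

# Hypothetical container for the period-t fields of m_i^mediator, each a dict
# keyed by the realized history a^t (represented as a tuple of action profiles).
MedMsg = namedtuple("MedMsg", ["y", "alpha", "beta", "r_punish", "n_next"])

def period_t_for_player_i(med, m1st_from_j, a_hist, p_t, n_i):
    """Player i's on-path period-t behavior (illustrative sketch)."""
    u_hat, x_hat = m1st_from_j                   # m_jt^1st(u) and m_jt^1st(x)
    alpha, beta = med.alpha[a_hist], med.beta[a_hist]

    # (S10): if the check fails, interpret the state as P; otherwise as R.
    omega_hat = "R" if u_hat == (alpha * x_hat + beta) % p_t else "P"

    # (S11): decode the regular recommendation, or switch to the punishment action.
    if omega_hat == "R":
        a_it = (med.y[a_hist] - x_hat) % n_i     # recover r_it(a^t) from the pad
    else:
        a_it = med.r_punish[a_hist]              # r_it^punish(a^t)

    # Second message: reveal next period's secret index only after omega_hat = R.
    m2nd = med.n_next[a_hist] if omega_hat == "R" else "babble"
    return a_it, m2nd
```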
Incentive compatibility

The above equilibrium has full support: Since $\bar\omega_t(a^t)$ and $r_t(a^t)$ have full support, $(m_1^{\mathrm{mediator}}, m_2^{\mathrm{mediator}})$ have full support as well. Hence, we are left to verify player $i$'s incentive not to deviate from the equilibrium strategy, given that player $i$ believes that player $j$ has not yet deviated after any history of player $i$.

Suppose that player $i$ followed the equilibrium strategy until the end of period $t-1$.

First, consider player $i$'s incentive to tell the truth about $m_{it}^{\mathrm{1st}}$. In equilibrium, player $i$ sends

$$m_{it}^{\mathrm{1st}} = \begin{cases} \big(u_{jt}^i(m_{jt-1}^{\mathrm{2nd}}, a^t),\ x_{jt}^i(a^t)\big) & \text{if } m_{jt-1}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\ \big(u_{jt}^i(n_{jt}^i(a^t), a^t),\ x_{jt}^i(a^t)\big) & \text{if } m_{jt-1}^{\mathrm{2nd}} = \{\text{babble}\}. \end{cases}$$

The random variables possessed by player $i$ are independent of those possessed by player $j$ given $(m_\tau^{\mathrm{1st}}, a_\tau, m_\tau^{\mathrm{2nd}})_{\tau=1}^{t-1}$, except that (i) $u_{it}^j(a^t) = \alpha_{it}^i(a^t) \times x_{it}^j(a^t) + \beta_{it}^i(a^t) \pmod{p^t}$ if $\bar\omega_{it}(a^t) = R$, (ii) $u_{jt}^i(a^t) = \alpha_{jt}^j(a^t) \times x_{jt}^i(a^t) + \beta_{jt}^j(a^t) \pmod{p^t}$ if $\bar\omega_{jt}(a^t) = R$, (iii) $n_{i\tau}^j(a^\tau) = n_{i\tau}^i(a^\tau)$ if $\bar\omega_{i\tau-1}(a^{\tau-1}) = P$ while $n_{i\tau}^j(a^\tau) = \tilde n_{i\tau}^j(a^\tau)$ if $\bar\omega_{i\tau-1}(a^{\tau-1}) = R$, and (iv) $n_{j\tau}^i(a^\tau) = n_{j\tau}^j(a^\tau)$ if $\bar\omega_{j\tau-1}(a^{\tau-1}) = P$ while $n_{j\tau}^i(a^\tau) = \tilde n_{j\tau}^i(a^\tau)$ if $\bar\omega_{j\tau-1}(a^{\tau-1}) = R$.

Since $\alpha_{it}^i(a^t)$, $\beta_{it}^i(a^t)$, $\tilde u_{it}^j(a^t)$, $v_{it}^j(n, a^t)$, $n_{it}^i(a^t)$, and $\tilde n_{it}^j(a^t)$ are uniform and independent, player $i$ cannot learn $\bar\omega_{i\tau}(a^\tau)$, $r_{i\tau}(a^\tau)$, or $r_{j\tau}(a^\tau)$ with $\tau \geq t$. Hence, player $i$ believes at the time when she sends $m_{it}^{\mathrm{1st}}$ that her equilibrium value is equal to

$$(1-\delta)E^{\mu^*}[u_i(a_t) \mid h_i^t] + \delta E^{\mu^*}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(a_\tau) \,\Big|\, h_i^t\Big],$$

where $h_i^t$ is as if player $i$ observed $((r_{it}^{\mathrm{punish}}(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$, $a^t$, $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, and $(r_{i\tau})_{\tau=1}^{t-1}$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \dots, t-1$, with $\mu^*$ with mediated perfect monitoring.

Alternatively, for each $e > 0$, for sufficiently large $N^t$, if player $i$ tells a lie in at least one element of $m_{it}^{\mathrm{1st}}$, then with probability $1-e$, player $i$ creates a situation with

$$m_{it}^{\mathrm{1st}}(u) \neq \alpha_{jt}^j(a^t) \times m_{it}^{\mathrm{1st}}(x) + \beta_{jt}^j(a^t) \pmod{p^t}.$$

Hence, (S10) (with indices $i$ and $j$ reversed) implies that $\hat\omega_{jt}(a^t) = P$.

Moreover, if player $i$ creates a situation with $\hat\omega_{jt}(a^t) = P$, then player $j$ will send $m_{jt}^{\mathrm{2nd}} = \{\text{babble}\}$ instead of $n_{jt+1}^j(a^{t+1})$. Unless $\bar\omega_{jt}(a^t) = P$, since $n_{jt+1}^j(a^{t+1})$ is independent of player $i$'s variables, player $i$ believes that $n_{jt+1}^j(a^{t+1})$ is distributed uniformly over $\mathbb{Z}_{p^{t+1}}$. Hence, for each $e > 0$, for sufficiently large $N^{t+1}$, if $\bar\omega_{jt}(a^t) = R$, then player $i$ will send $m_{it+1}^{\mathrm{1st}}$ with

$$m_{it+1}^{\mathrm{1st}}(u) \neq \alpha_{jt+1}^j(a^{t+1}) \times m_{it+1}^{\mathrm{1st}}(x) + \beta_{jt+1}^j(a^{t+1}) \pmod{p^{t+1}}$$

with probability $1-e$. Then, by (S10) (with indices $i$ and $j$ reversed), player $j$ will have $\hat\omega_{jt+1}(a^{t+1}) = P$. Recursively, if $\bar\omega_{j\tau}(a^\tau) = R$ for each $\tau = t, \dots, t+T_t-1$, then player $i$ will induce $\hat\omega_{j\tau}(a^\tau) = P$ for each $\tau = t, \dots, t+T_t-1$ with high probability. Hence, for $\varepsilon_t > 0$ and $T_t$ fixed in (S1) and (S2), there is $\bar N^t$ sufficiently large that, if $N^\tau \geq \bar N^t$ for each $\tau \geq t$, then player $i$ will be punished for the subsequent $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1-\varepsilon_t-\sum_{\tau=t}^{t+T_t-1}p_\tau$. (Here $\sum_{\tau=t}^{t+T_t-1}p_\tau$ represents the maximum probability of having $\bar\omega_{i\tau}(a^\tau) = P$ for some $\tau$ in the subsequent $T_t$ periods.) Inequality (S4) implies that telling a lie gives a strictly lower payoff than the equilibrium payoff. Therefore, it is optimal to tell the truth about $m_{it}^{\mathrm{1st}}$. (In (S4), we have shown interim incentive compatibility after knowing $\bar\omega_{it}(a^t)$ and $r_{it}$, while here we consider $h_i^t$ before knowing $\bar\omega_{it}(a^t)$ and $r_{it}$. Taking the expectation with respect to $\bar\omega_{it}(a^t)$ and $r_{it}$, (S4) ensures incentive compatibility before knowing $\bar\omega_{it}(a^t)$ and $r_{it}$.)

Second, consider player $i$'s incentive to take the action $a_{it} = r_{it}$ according to (S11) if player $i$ follows the equilibrium strategy until she sends $m_{it}^{\mathrm{1st}}$. If she follows the equilibrium strategy, then player $i$ believes at the time when she takes an action that her equilibrium value is equal to

$$(1-\delta)E^{\mu^*}[u_i(a_t) \mid h_i^t] + \delta E^{\mu^*}\Big[(1-\delta)\sum_{\tau=t+1}^{\infty}\delta^{\tau-t-1}u_i(a_\tau) \,\Big|\, h_i^t\Big],$$

where $h_i^t$ is as if player $i$ observed $((r_{it}^{\mathrm{punish}}(a^t))_{a^t\in A^{t-1}})_{t=1}^{\infty}$, $a^t$, $(\bar\omega_\tau(a^\tau))_{\tau=1}^{t-1}$, $\bar\omega_{it}(a^t)$, and $r_{it}$, and believed that $r_\tau(a^\tau) = a_\tau$ for each $\tau = 1, \dots, t-1$, with $\mu^*$ with mediated perfect monitoring. (Compared to the time when player $i$ sends $m_{it}^{\mathrm{1st}}$, player $i$ now knows $\bar\omega_{it}(a^t)$ and $r_{it}$ on the equilibrium path.)

If player $i$ deviates from $a_{it}$, then $\bar\omega_{j\tau}(a^\tau) = P$ by definition for each $\tau \geq t+1$ and $a^\tau$ that is compatible with $a^t$ (that is, $a^\tau = (a^t, a_t, \dots, a_{\tau-1})$ for some $a_t, \dots, a_{\tau-1}$). To avoid being minmaxed in period $\tau$, player $i$ needs to induce $\hat\omega_{j\tau}(a^\tau) = R$ although $\bar\omega_{j\tau}(a^\tau) = P$. Given $\bar\omega_{j\tau}(a^\tau) = P$, since $\alpha_{j\tau}^j(a^\tau)$, $\beta_{j\tau}^j(a^\tau)$, $\tilde u_{j\tau}^i(a^\tau)$, $v_{j\tau}^i(n, a^\tau)$, $n_{j\tau}^j(a^\tau)$, and $\tilde n_{j\tau}^i(a^\tau)$ are uniform and independent (conditional on the other variables), regardless of player $i$'s continuation strategy, by (S10) (with indices $i$ and $j$ reversed), player $i$ will send $m_{i\tau}^{\mathrm{1st}}$ with

$$m_{i\tau}^{\mathrm{1st}}(u) \neq \alpha_{j\tau}^j(a^\tau) \times m_{i\tau}^{\mathrm{1st}}(x) + \beta_{j\tau}^j(a^\tau) \pmod{p^\tau}$$

with high probability. Hence, for sufficiently large $\bar N^t$, if $N^\tau \geq \bar N^t$ for each $\tau \geq t$, then player $i$ will be punished for the next $T_t$ periods, regardless of player $i$'s continuation strategy, with probability no less than $1-\varepsilon_t$. By (S3), deviating from $r_{it}$ gives a strictly lower payoff than her equilibrium payoff. Therefore, it is optimal to take $a_{it} = r_{it}$.

Finally, consider player $i$'s incentive to tell the truth about $m_{it}^{\mathrm{2nd}}$. Regardless of $m_{it}^{\mathrm{2nd}}$, player $j$'s actions do not change. Hence, we are left to show that telling a lie does not improve player $i$'s deviation gain by giving player $i$ more information.

On the equilibrium path, player $i$ knows $\bar\omega_{it}(a^t)$. If player $i$ tells the truth, then $m_{it}^{\mathrm{2nd}} = n_{it+1}^i(a^{t+1}) \neq \{\text{babble}\}$ if and only if $\bar\omega_{it}(a^t) = R$. Moreover, player $j$ sends

$$m_{jt+1}^{\mathrm{1st}} = \begin{cases} \big(u_{it+1}^j(m_{it}^{\mathrm{2nd}}, a^{t+1}),\ x_{it+1}^j(a^{t+1})\big) & \text{if } \bar\omega_{it}(a^t) = R, \\ \big(u_{it+1}^j(n_{it+1}^j(a^{t+1}), a^{t+1}),\ x_{it+1}^j(a^{t+1})\big) & \text{if } \bar\omega_{it}(a^t) = P. \end{cases}$$

Since $n_{it+1}^j(a^{t+1}) = n_{it+1}^i(a^{t+1})$ if $\bar\omega_{it}(a^t) = P$, in total, if player $i$ tells the truth, then player $i$ knows $u_{it+1}^j(n_{it+1}^i(a^{t+1}), a^{t+1})$ and $x_{it+1}^j(a^{t+1})$.
This is sufficient information to infer $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$ correctly. If she tells a lie, then player $j$'s messages are changed to

$$m_{jt+1}^{\mathrm{1st}} = \begin{cases} \big(u_{it+1}^j(m_{it}^{\mathrm{2nd}}, a^{t+1}),\ x_{it+1}^j(a^{t+1})\big) & \text{if } m_{it}^{\mathrm{2nd}} \neq \{\text{babble}\}, \\ \big(u_{it+1}^j(n_{it+1}^j(a^{t+1}), a^{t+1}),\ x_{it+1}^j(a^{t+1})\big) & \text{if } m_{it}^{\mathrm{2nd}} = \{\text{babble}\}. \end{cases}$$

Since $\alpha_{it+1}^i(a^{t+1})$, $\beta_{it+1}^i(a^{t+1})$, $\tilde u_{it+1}^j(a^{t+1})$, $v_{it+1}^j(n, a^{t+1})$, $n_{it+1}^i(a^{t+1})$, and $\tilde n_{it+1}^j(a^{t+1})$ are uniform and independent conditional on $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$, the variables $u_{it+1}^j(n, a^{t+1})$ and $x_{it+1}^j(a^{t+1})$ are not informative about player $j$'s recommendations from period $t+1$ on or player $i$'s recommendations from period $t+2$ on, given that player $i$ knows $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$. Since telling the truth informs player $i$ of $\bar\omega_{it+1}(a^{t+1})$ and $r_{it+1}(a^{t+1})$, there is no gain from telling a lie.

References

Heller, Y., E. Solan, and T. Tomala (2012), "Communication, correlation and cheap-talk in games with public information." Games and Economic Behavior, 74, 222–234.

Co-editor George J. Mailath handled this manuscript.

Manuscript received 26 August, 2015; final version accepted 25 April, 2016; available online 25 April, 2016.