Games and Economic Behavior 28, 310–324 (1999)
Article ID game.1998.0701, available online at http://www.idealibrary.com on
Nash Equilibria of Repeated Games with
Observable Payoff Vectors*
Tristan Tomala
Cermsem, Université Paris-I Panthéon-Sorbonne, Ceremade, Université Paris
Dauphine, Place de Lattre de Tassigny, 75016 Paris, France
E-mail: [email protected].
Received October 25, 1995
We study a model of repeated games with imperfect monitoring where the payoff vector is observable. In this situation, any profitable deviation is detectable by
all the players but the identity of the deviator may be unknown. We design collective punishments directed against the set of potential deviators. A particular
class of signals is studied for which a characterization of the set of equilibrium
payoffs is obtained. Journal of Economic Literature Classification Numbers: C73.
© 1999 Academic Press
1. INTRODUCTION
This paper deals with Nash equilibria of undiscounted repeated games
with imperfect monitoring. The players are involved in a repeated strategic
interaction and their aim is to maximize their long-run payoff. They rely
on the information collected along the play to choose their current actions.
Here, this information is given by a public signal which depends on the joint
action. Models of imperfect monitoring give a quite realistic representation
of long strategic interaction and the assumption of public observation is
often found in real life or economic situations. It can be thought of as
radio broadcast, official statistics and in the case of industrial competition,
as the total supply in the market or as the price system of the preceding
period.
The well-known Folk theorem (for a survey see Sorin, 1992) deals with
perfect monitoring, i.e., situations where players are informed of the
action profile chosen at each stage, and states that any feasible
individually rational payoff vector is an equilibrium payoff of the
repeated game. This is proved by prescribing strategies playing pure
actions on the equilibrium path. Thanks to perfect monitoring, any
deviation is detected by all the players and the deviator is identified
and punished to his minmax level forever. With imperfect monitoring, new
problems arise:

(a) A player may deviate without affecting the observations of his
opponents.

(b) A deviation may be observed by a strict subset of players only.
A collective punishment cannot start unless the players communicate to
coordinate their play.

(c) An observed deviation may be compatible with several deviators.

* I thank Professor J. Abdou for motivating this work and for constant
help, support and fruitful discussions, Professors E. Lehrer and S. Sorin
for helpful remarks and comments and also G. Giraud, O. Gossner, and an
anonymous referee for remarks improving the exposition of the paper.
Considering point (a), two leading papers on the subject are Fudenberg
et al. (1994) for the discounted case and Lehrer (1990) for the undiscounted
case. The main issue of Fudenberg et al. (1994) is to consider public random signals and to prove that a Folk theorem holds under some informational assumptions. Their main assumptions (individual full rank and pairwise identifiability) imply that a unilateral deviation must change the distribution on public signals (i.e., must be statistically detectable) and that the
deviations of two different players can be distinguished (again statistically)
from each other. These assumptions are generic provided that the number
of possible values of the public signal is large enough. For the undiscounted
case, in Lehrer (1990), each player’s action set is endowed with a partition
and when player i chooses the action si , the element of the partition to
which si belongs is publicly revealed. This assumption guarantees that deviations from different players can be distinguished from each other. Then,
Lehrer (1990) studies the impact of profitable and undetectable deviations.
Point (b) is studied in Fudenberg and Levine (1991) and in Ben-Porath
and Kahneman (1996). Players have to communicate through the signals to
agree on a punishment plan.
In this paper we deal with point (c) only. We consider public signals and
observable payoff vectors so that:
• any deviation that changes the payoff for one player is detectable;
• all players detect deviations simultaneously.
When a signal that was not expected at equilibrium is observed, the identity of the deviator may not be revealed. It may occur that when two players
i, j can induce this signal, punishing player i rewards player j. A profitable
deviation for player j would be then to induce such “bad” signals and gain
from the punishment of player i. Such players must be punished simultaneously. We thus design collective punishments against coalitions. Such
considerations are meaningless in the two-player case where the identity of
the deviator is always revealed.
2. THE MODEL
2.1. The infinitely repeated game
A repeated game with public signals is given by:
• a set of players N = {1, ..., n} with n ≥ 3;
• for each i in N, a finite nonempty set S^i of actions, and a one-shot
payoff function g^i from ∏_{j∈N} S^j to ℝ (w.l.o.g. we assume all payoffs
to be non-negative);
• a public signal function ℓ from ∏_{j∈N} S^j to some finite set A.
The repeated game Γ_∞ is described as follows: at each stage t = 1, 2, ...,
player i chooses s_t^i in S^i and, if s_t is the profile of actions chosen,
the public signal a_t = ℓ(s_t) is announced to all the players. A total
history of length t is an element of H_t = (∏_{j∈N} S^j)^t. The set of
private histories of length t for player i is H_t^i = (S^i × A)^t. A pure
strategy σ^i for player i is a sequence of functions (σ_t^i)_{t≥1}, where
σ_t^i maps H_{t-1}^i into S^i. A mixed strategy for player i is a
probability distribution over the set of player i's pure strategies. A
behavior strategy σ^i for player i is a sequence of functions (σ_t^i)_{t≥1},
where σ_t^i maps H_{t-1}^i into Δ(S^i), the set of probability distributions
over S^i. Perfect recall is assumed and Kuhn's theorem allows one to
restrict the study to behavior strategies. Let Σ^i be the set of behavior
strategies for player i.
A joint behavior strategy σ = (σ^1, ..., σ^n) induces a probability
distribution over total histories in a natural way. Let P_σ be this
probability and E_σ be the induced expectation operator. If g_t^i is the
payoff for player i at stage t, denote γ_T^i(σ) = E_σ[(1/T) Σ_{t=1}^T g_t^i].
If (E^i)_{i∈N} is a collection of sets indexed on N, an element
(e^1, ..., e^n) of E = ∏_{i∈N} E^i will simply be denoted by e. We will
denote by e^{-i} the current element of E^{-i} = ∏_{j≠i} E^j, and we will
write e = (e^i, e^{-i}) when the ith component is stressed.
We deal with uniform equilibria, which are, for all ε > 0, ε-Nash equilibria
of the T-fold repeated game for large enough T.
Definition 2.1. A joint behavior strategy σ is a uniform equilibrium if:
(i) for each player i, lim_{T→∞} γ_T^i(σ) exists,
(ii) for all ε > 0, there is a stage T_0 such that
    ∀T ≥ T_0, ∀i ∈ N, ∀τ^i ∈ Σ^i:  γ_T^i(τ^i, σ^{-i}) ≤ γ_T^i(σ) + ε.
If σ is a uniform equilibrium, we put for each i, γ^i(σ) = lim_{T→∞} γ_T^i(σ),
and the vector (γ^1(σ), ..., γ^n(σ)) is an equilibrium payoff. Let E_∞ be
the set of equilibrium payoffs. The main issue of this paper is to give a
description of E∞ .
Let gS‘ be the set of payoff vectors associated to pure joint actions in
the one-shot game and co gS‘ the set of feasible payoffs be its convex hull.
We define now the individual rationality levels in the repeated game.
Notation 2.2.
(i) The independent minmax for player i is
    v^i = min_{x^{-i} ∈ ∏_{j∈N\{i}} Δ(S^j)}  max_{x^i ∈ Δ(S^i)}  g^i(x^i, x^{-i}).
(ii) The correlated minmax for player i is
    w^i = min_{x^{-i} ∈ Δ(S^{-i})}  max_{x^i ∈ Δ(S^i)}  g^i(x^i, x^{-i}).
In repeated games with imperfect monitoring, a player’s payoff can be
decreased to his independent minmax, whatever the observation structure
is: his opponents just have to play the x^{-i} that achieves the minimum
in 2.2(i) at each stage. However, the signals can be used by the players to
generate correlated distributions on their set of actions. This issue is
studied in detail in Lehrer (1991). Thus, a player can guarantee himself
his correlated minmax by playing at each stage a best response to the
expected distribution on the moves of his opponents. Considering optimal
punishments, a
new bound is obtained. The payoffs associated to a player’s punishment are
defined as follows.
Definition 2.3.
(i) Players −i can force player i to the payoff z^i ∈ ℝ if
    ∀ε > 0, ∃σ^{-i} ∈ Σ^{-i}, ∃T_0 s.t. ∀τ^i ∈ Σ^i, ∀T ≥ T_0: γ_T^i(τ^i, σ^{-i}) ≤ z^i + ε.
(ii) The repeated game minmax of player i is defined as
    v_∞^i = inf{z^i ∈ ℝ s.t. players −i can force i to z^i}.
Remark 2.4. For each player i, v^i ≥ v_∞^i ≥ w^i.
Note, as in Fudenberg and Levine (1991), that in repeated games of
complete information, having ε-equilibria for all ε allows one to construct
equilibria by concatenation. Players −i can actually play a strategy
forcing player i to v_∞^i + ε on finite blocks of length T(ε), forcing
player i closer and closer to v_∞^i as ε goes to zero. Thus, the following
definition makes sense.
Definition 2.5. A punishing strategy against player i is a strategy
σ^{-i} ∈ Σ^{-i} such that
    ∀ε > 0, ∃T_0 s.t. ∀τ^i ∈ Σ^i, ∀T ≥ T_0: γ_T^i(τ^i, σ^{-i}) ≤ v_∞^i + ε.
Denote by IR_∞ the set of individually rational payoffs of the repeated
game,
    IR_∞ = {(u^1, ..., u^n) ∈ ℝ^N s.t. ∀i ∈ N, u^i ≥ v_∞^i}.
Lemma 2.6. E_∞ ⊂ IR_∞. This lemma is a direct consequence of the
definitions.
The following example shows that v_∞^i can actually be equal to w^i.
Example 2.7. Consider the three-player repeated game described below.
Player 1 chooses the row, player 2 the column, and player 3 the matrix.
The signal is given by

        x  y  z  t            x  y  z  t
    e   a  b  o  o        e   α  β  o  o
    f   b  a  o  o        f   β  α  o  o
    g   o  o  c  d        g   o  o  γ  δ
    h   o  o  d  c        h   o  o  δ  γ

and the payoffs for player 3 are

        x   y   z   t            x   y   z   t
    e  −1  −1   0   0        e   0   0   0   0
    f  −1  −1   0   0        f   0   0   0   0
    g   0   0   0   0        g   0   0  −1  −1
    h   0   0   0   0        h   0   0  −1  −1

In this game we have v^3 = −1/4 and w^3 = −1/2. The correlated minmax
for player 3 is achieved by the correlation matrix M,

        x    y    z    t
    e  1/8  1/8   0    0
    f  1/8  1/8   0    0
    g   0    0   1/8  1/8
    h   0    0   1/8  1/8

and a best response of player 3 is (1/2, 1/2).
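The values v^3 = −1/4 and w^3 = −1/2 can be checked numerically. A small
sketch, using the observation (visible in the payoff tables above) that only
the probability mass on the two 2 × 2 blocks {e, f} × {x, y} and
{g, h} × {z, t} matters for player 3's payoff:

```python
# Numerical check of v^3 = -1/4 and w^3 = -1/2 in Example 2.7. Player 3
# loses 1 on the block {e,f} x {x,y} in the left matrix and on the block
# {g,h} x {z,t} in the right one, so only the masses of these two blocks
# matter for his payoff.

def payoff3(tl, br):
    # player 3 best-responds by choosing the matrix with the smaller loss
    return -min(tl, br)

grid = [k / 100 for k in range(101)]

# Independent minmax: players 1 and 2 mix independently, so the block
# masses are a*b and (1-a)*(1-b) with a = P(e or f), b = P(x or y).
v3 = min(payoff3(a * b, (1 - a) * (1 - b)) for a in grid for b in grid)

# Correlated minmax: a joint distribution may put any mass m on the
# top-left block and 1 - m on the bottom-right one (M uses m = 1/2).
w3 = min(payoff3(m, 1 - m) for m in grid)

assert abs(v3 + 0.25) < 1e-12 and abs(w3 + 0.5) < 1e-12
```

The optimum of the independent case is a = b = 1/2, confirming that
correlation strictly improves the punishment of player 3.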
How do players 1 and 2 force player 3 to w3 ? They play in such a
way that at each stage, the expected distribution on S 1 × S 2 conditional on
the public history and on past moves of player 3 will be exactly M. Then,
the expected payoff for player 3 at each stage will be at most w3 . Suppose
that player 2 chooses x or y with probability 1/2. Since player 1 knows his
own move, the structure of the signal allows him to know whether player
2 played x or y. Player 3, observing the resulting signal, still attributes
a probability of 1/2 to x and y. We will thus condition the strategies for players
1 and 2 on past moves of player 2. We represent S 1 × S 2 as a 4 × 4 matrix,
and split it into four 2 × 2 submatrices, top-left, top-right, bottom-left, and
bottom-right.
The correlation procedure is as follows.
• At stage 1, both play (1/2, 1/2) in the top-left submatrix.
– If the move of player 2 was x at stage 1, then at stage 2 they
remain in the top-left square and play again (1/2, 1/2).
– If the move of player 2 was y, then at stage 2 they switch to the
bottom-right submatrix where they both play (1/2, 1/2).
• If, at stage T, they play in the top-left square, they apply the above
procedure to determine their strategies at T + 1.
• If they play in the bottom-right square at stage T, then
– if the move of player 2 is z, they remain in the bottom-right square;
– if 2 plays t, they switch to the top-left square.
When both adhere to this strategy, the correlation matrix M is generated
at each stage T ≥ 2. The key argument here is that the move of player 2 is
private information for players 1 and 2. Note that any information known
by players 2 and 3 is also known by player 1. Thus, given the public history
and the past moves of player 1, the future moves of players 2 and 3 are
independent. Therefore, v_∞^1 = v^1 and, by a similar argument, v_∞^2 = v^2.
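The claim that M is generated at every stage T ≥ 2 can be verified exactly.
A sketch under the simplification that the only relevant state is the active
2 × 2 submatrix, which player 2's move switches with probability 1/2 from
either state:

```python
from fractions import Fraction

# Exact check that the switching procedure above generates M at every
# stage T >= 2, modelling only which 2x2 submatrix is active.
half = Fraction(1, 2)
other = {"TL": "BR", "BR": "TL"}

dist = {"TL": Fraction(1), "BR": Fraction(0)}   # stage 1: top-left
for stage in range(2, 12):
    dist = {s: half * dist[s] + half * dist[other[s]] for s in dist}
    assert dist == {"TL": half, "BR": half}     # equal from stage 2 on

# Both players mix (1/2, 1/2) inside the active submatrix, so each of its
# four cells carries 1/4 of the state mass: every nonzero cell of M gets
# 1/2 * 1/4 = 1/8.
cell_prob = dist["TL"] * Fraction(1, 4)
assert cell_prob == Fraction(1, 8)
```

Exact rational arithmetic makes the point that M is reproduced exactly, not
only asymptotically.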
2.2. The information structure
We introduce now the signaling structure and its implications in terms of
payoffs. We focus our attention on games with observable payoffs.
Definition 2.8. The payoff vector is observable if
    ∀(s, s') ∈ S × S:  g(s) ≠ g(s') ⇒ ℓ(s) ≠ ℓ(s').
Given this assumption, any profitable deviation is observable by all the
players. However, the signal may not reveal the identity of a single deviator
but a set of potential deviators. It turns out that this whole set has to be
simultaneously punished. We shall define new punishment levels using
vector payoffs.
Only some subsets of N deserve such an analysis, i.e., the subsets whose
members can be suspected at the same time to have deviated. We describe
these subsets now.
Definition 2.9. For all players i, j,
    i ∼ j ⇔ ∃(t^i, t^j) ∈ S^i × S^j, ∀s ∈ S: ℓ(t^i, s^{-i}) = ℓ(t^j, s^{-j}).
Players i and j are equivalent when both of them have an action (t^i
and t^j) that induces the same public signal whatever the joint action s
is. Two equivalent players cannot be differentiated by the others through
the signal. We justify now that this is an equivalence relation. Reflexivity
and symmetry being clear, we prove transitivity only.
Take i, j, k such that ∀s ∈ S, ℓ(t^i, s^{-i}) = ℓ(t^j, s^{-j}) and
∀s ∈ S, ℓ(r^j, s^{-j}) = ℓ(r^k, s^{-k}). Then
    ℓ(t^i, s^j, s^k, s^{-i-j-k}) = ℓ(t^i, r^j, s^k, s^{-i-j-k}) = ℓ(t^i, s^j, r^k, s^{-i-j-k})
and
    ℓ(s^i, s^j, r^k, s^{-i-j-k}) = ℓ(s^i, t^j, r^k, s^{-i-j-k}) = ℓ(t^i, s^j, r^k, s^{-i-j-k}).
Hence, ∀s ∈ S: ℓ(t^i, s^{-i}) = ℓ(r^k, s^{-k}).
We denote by N the associated partition of N.
Definition 2.10. For all M in N such that |M| ≥ 2 and for all i in M, put
    T^i = {t^i ∈ S^i | ∀j ∈ M\{i}, ∃t^j ∈ S^j, ∀s ∈ S: ℓ(t^i, s^{-i}) = ℓ(t^j, s^{-j})}.
T^i is the set of actions of player i such that the property of Definition
2.9 holds for each player j equivalent to i. Remark that as soon as a member
i of M chooses an action in T^i, the value of the signal does not depend on
the action of any other member of M. Moreover, this value is the same for
any t^i ∈ T^i. This is summarized by the following lemma.
Lemma 2.11. On ∪_{i∈M} T^i × S^{-i}, the function ℓ depends only on
s^{-M} = (s^k)_{k∉M}.
Proof. We will prove this lemma in two steps.
Take first a player i ∈ M and t^i ∈ T^i. For any j in M, the signal does
not depend on player j's action as soon as i plays t^i. Since there is t^j
such that ∀s ∈ S, ℓ(t^i, s^{-i}) = ℓ(t^j, s^{-j}), we have ∀s ∈ S,
ℓ(t^i, s^j, s^{-i-j}) = ℓ(t^i, t^j, s^{-i-j}). Hence, the signal depends
only on t^i and on s^{-M}.
Second, we prove that the value of the signal is the same for all t^i ∈ T^i.
If ∀s ∈ S, ℓ(t^i, s^{-i}) = ℓ(t^j, s^{-j}) and ∀s ∈ S, ℓ(r^i, s^{-i}) =
ℓ(r^j, s^{-j}), then ∀s ∈ S, ℓ(t^i, s^{-i}) = ℓ(t^j, s^{-j}) = ℓ(r^i, s^{-i})
= ℓ(r^j, s^{-j}). This is because ℓ(t^i, s^{-i}) does not depend on s^j and
ℓ(r^j, s^{-j}) does not depend on s^i; hence ℓ(t^i, r^j, s^{-i-j}) =
ℓ(r^j, s^{-j}).
A direct consequence of this lemma is that ∀s^{-M} ∈ S^{-M}, ℓ(·, s^{-M})
is constant on ∏_{i∈M} T^i. Moreover, if the payoff vector is observable,
g(·, s^{-M}) is also constant on ∏_{i∈M} T^i.
We choose and we fix t^M ∈ ∏_{i∈M} T^i. We can now define the generalized
minmax levels for the coalitions M in N.
Definition 2.12. For all M in N,
• if M = {i},
    V(M) = {u ∈ ℝ^N | u^i ≥ v_∞^i};
• if |M| ≥ 2,
    V(M) = co g({t^M} × S^{-M}) + ℝ_+^M × ℝ^{N\M}.
When M contains at least two players, V(M) is a generalized minmax
for coalition M. The similarities with the usual minmax level are:
• first, that a player i in M can guarantee that the payoff vector for
M will be in this set by playing any action in T^i;
• second, the payoff vector for M can be held down to this set for any
strategy of a player i in M: as soon as another player j in M plays in T^j,
player i cannot control the payoff.
We shall consider in the sequel a condition on the signal for which the
analysis is easier. Along the play, when the players observe a deviation, they
can compute the set of players that possess an action compatible with the
observed signal. If this set is an equivalence class for our relation, it can
be punished to the generalized minmax previously defined. Take s ∈ S and
a ∈ A and define N(s, a) as the set of players who have an action inducing
the signal a against s. Formally,
    N(s, a) = {i ∈ N | ∃t^i, ℓ(t^i, s^{-i}) = a}.
Condition C. (|N(s, a)| ≥ 2 and ℓ(s) ≠ a) ⇒ N(s, a) ∈ N.
The meaning of this condition is the following. The main idea in defining
equilibrium strategies in this paper is that when a deviation occurs, each
player computes a set of suspects [N(s, a) if s was to be played and if a
was observed].
In full generality, this set may be any coalition. Our condition only allows the set of suspects to belong to a certain family of coalitions. This
requirement is rather strong, but relaxing it makes it more difficult to assign a deviation to a particular subset of players. Namely, the set of suspects should evolve during the play, making the punishing strategy more
intricate to define. This problem can be, however, dealt with, see Tomala
(1998) for a study in the pure strategy case. However, with general signals
the very definition of the set of suspects is unclear. This is mainly due to
mixed strategies and to the fact that players cannot predict their opponent’s
moves with certainty. We are here in the special case of an observable
payoff vector, where we can always find an equilibrium strategy which is
pure on the equilibrium path (see the proof of the theorem). Nevertheless,
Examples 2.13, 2.14, and 3.2 exhibit natural signaling functions satisfying C.
Example 2.13. We say that ℓ is rectangular if for all a in A, the inverse
image of a by ℓ, ℓ^{-1}(a), is a direct product. This implies that for all
players i and j,
    ℓ(t^i, s^{-i}) = ℓ(t^j, s^{-j}) = a ⇒ ℓ(s^i, s^{-i}) = ℓ(s^j, s^{-j}) = a.
In this case, our equivalence relation has very strong properties. If i ∼ j,
then either i = j or, if i ≠ j, neither i nor j can influence the signal.
Furthermore T^i = S^i and T^j = S^j. Thus, if M contains at least two
players,
    V(M) = co g(S) + ℝ_+^M × ℝ^{N\M}.
C holds for a rectangular signal since a situation where ℓ(t^i, s^{-i}) =
ℓ(t^j, s^{-j}) ≠ ℓ(s) is impossible. The interest of rectangular signals is
the following. Consider a repeated game where the one-shot game is in extensive
form and where the signal associated to a joint action (i.e., a joint strategy
in the extensive form game) is the unique terminal node of the underlying
tree. It is proven in Abdou (1994) that this mapping is rectangular. A repeated extensive game with observation of the terminal node is a natural
case to analyze. Furthermore, in this setup the payoff vector is observable.
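Rectangularity is a finite condition and can be checked by enumeration. A
minimal sketch, with a hypothetical helper `is_rectangular` (not from the
paper): an inverse image is a direct product exactly when it coincides with
the smallest product box containing it.

```python
from itertools import product

def is_rectangular(ell, action_sets):
    # Hypothetical helper: ell is rectangular iff every inverse image
    # ell^{-1}(a) equals the smallest product box containing it.
    profiles = list(product(*action_sets))
    for a in {ell(s) for s in profiles}:
        cell = [s for s in profiles if ell(s) == a]
        box = [{s[i] for s in cell} for i in range(len(action_sets))]
        if len(cell) != len(list(product(*box))):
            return False
    return True

S = [("T", "B"), ("L", "R")]

# Observing player 1's action only: inverse images are {a} x S^2, products.
assert is_rectangular(lambda s: s[0], S)

# A matching-pennies-like signal (constant exactly on {(T,L), (B,R)}) is
# not rectangular: the box spanned by that inverse image also contains (T,R).
assert not is_rectangular(lambda s: (s[0] == "T") == (s[1] == "L"), S)
```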
Example 2.14. A signal for which C is verified and for which our equivalence
notion is not trivial is the following. We endow the set of players with
a partition N and divide each player's action set into two subsets, namely,
for each i, S^i = T^i ∪ R^i with T^i nonempty. The public signal ℓ is as
follows: for each M ∈ N, we are given a map ℓ_M on S^M, and ℓ(s) =
(ℓ_M(s^M))_{M∈N}. The definition of ℓ_M(s^M) is the following:
• if for all i ∈ M, s^i ∈ R^i: ℓ_M(s^M) = s^M;
• if there is i ∈ M such that s^i ∈ T^i: ℓ_M(s^M) = 0_M, where 0_M is a
blank signal.
For i ∈ M, an action r^i ∈ R^i is called revealing since, if all players in
M play a revealing action, the signal is the joint action. Otherwise, if a
player i in M plays a hiding action t^i ∈ T^i, the signal reveals nothing.
Remark that the equivalence relation of Definition 2.9 leads to the same
partition and to the same sets T^i, and that this function satisfies C.
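Definition 2.9 can be tested by brute force on small games. The following
sketch (hypothetical helpers `equivalent` and `replace_action`, players
0-indexed) encodes an instance of Example 2.14 with the partition
({1, 2}, {3}) and one revealing and one hiding action per player, and checks
that the relation recovers that partition.

```python
from itertools import product

def replace_action(s, i, a):
    # the profile s with player i's action replaced by a
    s = list(s)
    s[i] = a
    return tuple(s)

def equivalent(ell, action_sets, i, j):
    # Definition 2.9 by brute force: i ~ j iff some pair (t^i, t^j)
    # induces the same signal against every joint action s.
    profiles = list(product(*action_sets))
    return any(all(ell(replace_action(s, i, ti)) == ell(replace_action(s, j, tj))
                   for s in profiles)
               for ti in action_sets[i] for tj in action_sets[j])

# Instance of Example 2.14, partition ({1, 2}, {3}) (players 0-indexed):
# each player has one revealing action "r" and one hiding action "t".
S = [("r", "t")] * 3

def ell(s):
    block12 = (s[0], s[1]) if s[0] == "r" and s[1] == "r" else "blank"
    block3 = s[2] if s[2] == "r" else "blank"
    return (block12, block3)

assert equivalent(ell, S, 0, 1)      # players 1 and 2 are equivalent
assert not equivalent(ell, S, 0, 2)  # player 3 is alone in his class
```

The witnessing pair for players 1 and 2 is the pair of hiding actions, as
the definition of T^i suggests.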
3. THE MAIN THEOREM
We are now ready to state the leading result of this paper.
Theorem 3.1.
(i) In a repeated game with observable payoff vector,
    E_∞ ⊂ co g(S) ∩ IR_∞ ∩ ⋂_{M∈N} V(M).
(ii) Under condition C,
    E_∞ = co g(S) ∩ IR_∞ ∩ ⋂_{M∈N} V(M).
Remark that the theorem does not give a completely computable expression
of the set of equilibrium payoffs, since the repeated game minmax levels
are not characterized according to the one-shot game. However, since
w^i ≤ v_∞^i ≤ v^i, we know that a feasible and individually rational (in
the sense of the v^i's) payoff is an equilibrium payoff if and only if it
belongs to all the V(M)'s.
Example 3.2. Consider a three-player repeated game where players 1
and 2 have two actions and player 3 has three actions. Player 1 chooses
the row, 2 chooses the column, and 3 chooses the matrix. The signaling
function is given by

        x  y        x  y        x  y
    e   a  b    e   c  c    e   d  d
    f   b  b    f   c  c    f   d  d
        L           M           R

and the payoff function

          x      y              x      y              x      y
    e   1,1,1  4,4,0      e   0,3,0  0,3,0      e   3,0,0  3,0,0
    f   4,4,0  4,4,0      f   0,3,0  0,3,0      f   3,0,0  3,0,0
            L                     M                     R
Condition C is verified. The vector (1, 1, 1) is feasible and individually
rational since for each player i, v^i = w^i = v_∞^i = 0. At any stage where
the signal a is supposed to be observed, 1 and 2 can profitably deviate by
inducing the signal b. In this case, 3 will not know who deviated. Hence,
whom to punish: player 1 by playing M, rewarding player 2, or player 2 by
playing R, rewarding player 1? Player 3 should punish both simultaneously.
We have here:
• N = ({1, 2}, {3});
• V({1, 2}) = co {(4, 4, 0), (0, 3, 0), (3, 0, 0)} + ℝ_+^2 × ℝ
            = {u ∈ ℝ_+^2 × ℝ | u^1 + u^2 ≥ 3};
• E_∞ = co g(S) ∩ IR_∞ ∩ {u ∈ ℝ_+^2 × ℝ | u^1 + u^2 ≥ 3}.
We turn now to the proof of Theorem 3.1.
Proof of (i). It is enough to prove that for M ∈ N with |M| ≥ 2, E_∞ ⊂
V(M). Let u ∈ co g(S) and σ ∈ Σ such that γ(σ) exists and equals u. We
FIGURE 1
prove that u ∉ V(M) ⇒ u ∉ E_∞. The proof is divided into two arguments.
We first prove that every player i in M has a deviation τ^i inducing against
σ^{-i} a payoff in V(M). Second, we deduce from this fact that σ is not a
uniform equilibrium.
For any i ∈ M, define τ^i as the pure strategy of player i which plays at
each stage t^M(i), the action of player i in the M-tuple t^M, regardless of
the history. Since, for all i, j ∈ M and s in S, ℓ(t^M(i), s^{-i}) =
ℓ(t^M(j), s^{-j}), (τ^i, σ^{-i}) and (τ^j, σ^{-j}) induce the same
distributions of public signals. For τ a joint behavior strategy, let Q_τ
be the probability induced by τ on A^∞, the set of public infinite
histories. For all i, j ∈ M, Q_(τ^i, σ^{-i}) = Q_(τ^j, σ^{-j}).
The payoff vector, being observable, depends on the signal only, and
therefore
    γ_T^i(τ^i, σ^{-i}) = ∫ g_T^i dQ_(τ^i, σ^{-i}) = ∫ g_T^i dQ_(τ^j, σ^{-j}) = γ_T^i(τ^j, σ^{-j}).
Denote by g_t the random payoff vector at stage t and by ḡ_T the average of
(g_t)_{t≤T}. Put γ_T^i = γ_T^i(τ^i, σ^{-i}) = γ_T^i(τ^j, σ^{-j}) and
γ_T = (γ_T^1, ..., γ_T^n). For all i in M, under (τ^i, σ^{-i}),
g_t ∈ g({t^M} × S^{-M}) at each stage t. Hence, ḡ_T ∈ co g({t^M} × S^{-M}).
Because the latter set is convex, we take the expectation and find
γ_T ∈ co g({t^M} × S^{-M}) ⊂ V(M).
It is then impossible for σ to be a uniform equilibrium. Since u ∉ V(M),
which is closed and convex, there is ε > 0 such that, for all u' ∈ V(M),
there is i ∈ M such that u^i < u'^i − ε. Hence, for every stage T, there is
i ∈ M who has an ε-profitable deviation, that is, γ^i(σ) < γ_T^i − ε.
Proof of (ii). We prove, under condition C, that co g(S) ∩ IR_∞ ∩
⋂_{M∈N} V(M) ⊂ E_∞. Let u ∈ co g(S) ∩ IR_∞ ∩ ⋂_{M∈N} V(M). We will
construct a uniform equilibrium σ with payoff u. As usual in repeated games
with complete information, this strategy will consist in a main path to be
followed by all players and punishments in case of deviation. If player i is
identified as the deviator, he should be punished to his minmax; if the set
of possible deviators is a coalition M, they will be simultaneously punished.
Fix h* = (s_t*)_{t=1}^∞, an infinite play which leads to u, i.e.,
    lim_{T→∞} (1/T) Σ_{t=1}^T g(s_t*) = u.
This play will be referred to as the main path. Let α* = (ℓ(s_t*))_{t=1}^∞
be the public history associated to h*. Denote α_T* = (ℓ(s_t*))_{t=1}^T,
the public history of length T.
Let us describe now the punishments.
• For each player i, let σ^i(j) be the punishing strategy for player i
against player j given by Definition 2.5.
• For each M ∈ N with |M| ≥ 2, there is π(M) ∈ co g({t^M} × S^{-M})
such that for i ∈ M, u^i ≥ π^i(M). Fix a play h_M = (s_t(M))_{t≥1} leading
to π(M), i.e., lim (1/T) Σ_{t=1}^T g(s_t(M)) = π(M), and such that for all
t, s_t(M) ∈ {t^M} × S^{-M}.
We are ready now to define the strategy σ^i for player i. Let h_T^i =
(s_t^i, a_t)_{t≤T} be a history of length T for player i and α_T = (a_t)_{t≤T}
the public part of this history. We will define σ_{T+1}^i(h_T^i) for all T.
• σ_{T+1}^i(h_T^{*i}) = s_{T+1}^{*i} (player i follows the main path).
• If α_T ≠ α_T*, put p = inf{t : α_t ≠ α_t*}; there is a ∈ A s.t.
α_p = (α_{p-1}*, a). Compute N(s_p*, a).
– If N(s_p*, a) = {j}, then σ_{p+T+1}^i(h_{p+T}^i) = σ_T^i(j)(h_{p+T}^i(T)),
where h_{p+T}^i(T) is the i-history of length T obtained from h_{p+T}^i by
suppressing the p first observations. (Player i punishes forever the only
possible deviator, and his punishing strategy starts at time p + 1.)
– If |N(s_p*, a)| ≥ 2, from condition C, N(s_p*, a) = M ∈ N and
σ_{p+T+1}^i(h_{p+T}^i) = s_{p+T+1}^i(M) [if the deviator is a member of M,
the whole subset M is held down to π(M) forever].
We prove now that σ is a uniform equilibrium with payoff u. If all players
adhere to this strategy, it is clear that γ(σ) is well defined and equals u.
It remains to show that σ is an ε-equilibrium in the T-fold repeated game
for T greater than some T_0. Let ε > 0; we first construct the associated T_0.
From the properties of convergence of the payoff on the equilibrium path
and in punishment phases, there is an integer K such that for T ≥ K, for
each player i and subset M ∈ N:
• the payoff received by following the equilibrium path for at least T
stages is within ε of u^i;
• being punished by the strategy σ^{-i} for at least T stages yields a
maximal payoff less than v_∞^i + ε;
• if i is a member of M, being punished as such for at least T stages
yields a payoff less than π^i(M) + ε.
C being an upper bound for all payoffs appearing in the one-shot game,
choose T_0 ≥ max{(K + 1)/ε, C/ε, 2K} and T ≥ T_0. Take a player i and let
τ^i ∈ Σ^i be a pure strategy.
If the deviation never changes the signal, it also never changes the payoff
and it is not profitable. Otherwise, let p be the first stage where a
deviation is observed.
• If at stage p, player i is identified as the only possible deviator, then
his payoff is at most [(p − 1)/T]u_{p-1}^i + C/T + [(T − p)/T]v_{T-p}^i,
where u_{p-1}^i is the average payoff for player i when the equilibrium
path is followed until stage p − 1 and v_{T-p}^i is the maximal payoff for
player i when he is punished from stage p + 1 to stage T.
– If p − 1 ≤ K, then (p − 1)/T ≤ ε and T − p ≥ T − K ≥ K + 1.
Then the payoff for player i is at most 2Cε + v_∞^i + ε ≤ u^i + ε(1 + 2C).
– If p − 1 ≥ T − K, then (T − p)/T ≤ ε and p − 1 ≥ T − K ≥ K.
Again the payoff for i is at most u^i + ε(1 + 2C).
– If K < p − 1 < T − K, then the equilibrium path and the punishment
phase are followed for at least K stages. Thus u_{p-1}^i and v_{T-p}^i are
less than u^i + ε; therefore the average payoff is less than u^i + ε(1 + C).
• If at stage p, the set of potential deviators is M with i ∈ M, the
maximal average payoff for player i is [(p − 1)/T]u_{p-1}^i + C/T +
[(T − p)/T]π_{T-p}^i, where π_{T-p}^i is the payoff for player i when he is
punished from stage p + 1 to stage T as a member of M. Then, the same
calculations as above can be made.
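The three cases above can be audited numerically. A sketch with illustrative
values C = 1, K = 10, ε = 0.1 (assumptions chosen for the check, not fixed
by the argument) and worst-case stage averages consistent with the three
properties defining K:

```python
# Numeric audit of the deviation bound: path / punishment averages are at
# most u + eps (resp. v + eps) once at least K stages are played, and at
# most C (the payoff bound) otherwise.
C, K, eps = 1.0, 10, 0.1
T0 = max((K + 1) / eps, C / eps, 2 * K)
T = int(T0)                     # T = 110 for these values
u, v = 0.5, 0.2                 # equilibrium payoff and minmax, v <= u

for p in range(1, T + 1):       # p = first stage where a deviation shows
    path_avg = u + eps if p - 1 >= K else C
    punish_avg = v + eps if T - p >= K else C
    payoff = ((p - 1) / T) * path_avg + C / T + ((T - p) / T) * punish_avg
    assert payoff <= u + eps * (1 + 2 * C) + 1e-12
```

Every deviation stage p respects the uniform bound u + ε(1 + 2C), which is
what makes σ an ε-equilibrium for all T ≥ T_0.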
We find then that for all ε > 0, there is T_0 such that for T ≥ T_0, for
each player i and each pure strategy τ^i, γ_T^i(τ^i, σ^{-i}) ≤ u^i + ε(1 + 2C).
If µ^i is a mixed strategy for player i, then taking the expectation in this
inequality with respect to µ^i implies that it holds for any mixed deviation.
Hence, σ is a uniform equilibrium.
Corollary 3.3. In a repeated extensive form game where the terminal
node is publicly observable,
    E_∞ = co g(S) ∩ IR_∞.
This is a direct consequence of Example 2.13 and Theorem 3.1.
4. CONCLUDING REMARKS
4.1. Discounted Games
Our results easily extend to Nash equilibria of discounted games for low
discount factor. The same arguments as in the proof of Theorem 3.1(i) show
that any Nash payoff of the discounted game lies in all the V(M)'s. Each
strictly individually rational feasible payoff vector that lies in the
intersection of the interiors of the V(M)'s is a Nash payoff for low
discount factor, provided that such a vector exists. The construction of
the equilibrium strategy is basically the same as in the proof of 3.1(ii).
4.2. Extensions
Further Questions on This Model
It would be natural to extend the study to finitely repeated games, but the
key of the folk theorem for finitely repeated games (Benoit and Krishna,
1987) is that for each player, there is a Nash equilibrium of the one-shot
game at which he receives strictly more than his minmax level. Then, ending
the play by such Nash equilibria will induce a negative profit in case of
deviation. The difficulty here is that the payoff received by a subset of
players M when it is punished depends on the equilibrium payoff. Therefore
a generalization of the condition given in Benoit and Krishna (1987) would
be that for each M ∈ N, there is a one-shot Nash equilibrium payoff whose
coordinates in M are strictly greater than those of any payoff in
co g({t^M} × S^{-M}). It is easy to check on Example 3.2 that this is much
too strong a requirement. Some new strategic ideas have to be developed here.
Relaxing Our Assumptions
Considering games with an unobservable payoff vector implies that there
will be some undetectable and profitable static deviations. This issue has
been studied in detail in Lehrer (1990), where in particular it is shown
that the equilibrium strategies may not be pure on the equilibrium path.
Then some statistical inference on the set of suspects has to be performed.
Up to now this problem has not been solved.
Relaxing condition C implies that after a first deviation, the set of
suspects may not be a class for our equivalence relation. This has two
consequences. First, the definition of punishing strategies against a
subset of players becomes much more intricate. Second, some additional
information on the identity of the deviator may be revealed along the play,
and the set of suspects evolves with time. We have given in Tomala (1998)
a solution to this problem for a general public signal in the case of pure
strategies (with compact action spaces).
REFERENCES
Abdou, J. (1994). “Rectangularity and Tightness: A Normal Form Characterization of Perfect
Information Game Forms,” mimeo.
Benoit, J.-P., and Krishna, V. (1987). “Nash Equilibria of Finitely Repeated Games,” Int.
J. Game Theory 16, 197–204.
Ben-Porath, E., and Kahneman, M. (1996). “Communication in Repeated Games with Private
Monitoring,” J. Econ. Theory 70, 281–298.
Fudenberg, D., and Levine, D. (1991). “An Approximate Folk Theorem with Imperfect Private
Information,” J. Econ. Theory 54, 26–47.
Fudenberg, D., Levine, D., and Maskin, E. (1994). “The Folk Theorem With Imperfect Public
Information,” Econometrica 62, 997–1039.
Lehrer, E. (1990). “Nash Equilibria of n-Player Repeated Games with Semi-Standard Information,” Int. J. Game Theory 19, 191–217.
Lehrer, E. (1991). “Internal Correlation in Repeated Games,” Int. J. Game Theory 19, 431–
456.
Sorin, S. (1992). “Repeated Games With Complete Information,” Chap. 4 of Handbook of
Game Theory with Economic Applications, Vol. 1, pp. 71–107 (R. Aumann and S. Hart,
Eds.). Amsterdam: North-Holland.
Tomala, T. (1998). “Pure Equilibria of Repeated Games with Public Observation,” Int. J. Game
Theory 27, 93–109.