Efficiency in Repeated Prisoner's Dilemma with Private Monitoring

Journal of Economic Theory 76, 345-361 (1997)
Article No. ET972313
Tadashi Sekiguchi*
Graduate School of Economics, University of Tokyo,
7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan
Received June 27, 1996; revised March 10, 1997
This paper analyzes repeated games with private monitoring, where in each
period each player receives a signal of the other player's action in the previous
period, and that signal is private information. Previous literature on discounted
repeated games with private monitoring has not shown whether or not (nearly)
efficient equilibria exist. For a repeated prisoner's dilemma satisfying a certain
assumption regarding stage game payoffs, we show that there exists a nearly
efficient sequential equilibrium, provided that imperfectness of signals is small and
players are patient. Journal of Economic Literature Classification number: C73.
© 1997 Academic Press
1. INTRODUCTION
This paper analyzes repeated games with private monitoring. Much of the
literature on repeated games assumes that players can observe public
(possibly imperfect) information regarding other players' past actions.
However, this paper assumes that, at the end of every period, each player
observes an imperfect signal regarding his/her opponents' actions for that
period, and that this signal is private information.
Situations where information regarding the behavior of other agents
remains private would appear to be important in a number of economic
settings. The "secret price cutting" in Stigler [20] is an excellent example.
In each period, each firm in an oligopolistic market with differentiated
goods chooses a price for its product. Since each firm has a chance to offer
its customers secretly a lower price than the announced one, the firms are
unsure about the actual prices their rivals have chosen. Each firm's sales
during the period, which are privately observed by the firm and are determined by actual prices and unobservable stochastic shocks to demand,
provide information on pricing behavior of the other firms. In this example,
sales can be seen as privately observed imperfect signals of the other firms'
actions because sales depend partly on unobservable stochastic shocks.
* I am grateful to Michihiro Kandori, Hitoshi Matsushima and an associate editor for their
helpful comments. Of course, any remaining errors are my own.
Alternatively, suppose that two people have made an agreement to
exchange their goods produced in each period. In each period's production,
each person has an opportunity to make a costly investment to improve
the quality of his goods. Realized quality depends on the investment decision
and on unobservable stochastic shocks. The players can observe only the
realized quality of the goods they receive. In this example, quality is an
imperfect private signal of the opponent's investment decision and the
situation is somewhat like a prisoner's dilemma; the two people have no
incentive to make the costly investments when the trade is one-shot.
This paper analyzes a repeated prisoner's dilemma with private monitoring
such as that described above. The main result of this paper is that there
exists a sequential equilibrium with (average) payoffs arbitrarily close to
those of cooperation, provided that the stage game payoffs satisfy some
conditions, signals convey correct information in most cases, and players
are patient. In other words, near efficiency can be achieved in these cases.
In previous literature, there are other attempts to obtain near efficiency
in private monitoring situations. Radner [18] shows an efficiency result,
and Lehrer [14-16] provides further characterization of the equilibrium
payoff set, both assuming no discounting on players' payoff functions.
Fudenberg and Levine [9] prove a Folk Theorem and therefore near
efficiency, using a notion of approximate equilibrium as a solution concept,
which requires only $\varepsilon$-rationality and is thus weaker than Nash equilibrium
or its refinements. However, the assumption of no discounting is
implausible in that players' payoffs are not affected by any change in outcomes within finite periods, however long these may be. Likewise,
$\varepsilon$-rationality is implausible because deviations within quite long periods are
ignored when players are very patient, as such deviations have little effect
on players' average payoffs. Therefore the assumptions of no discounting or
$\varepsilon$-rationality are not a good approximation to the case of discounting and
perfect rationality. As for the repeated prisoner's dilemma with private
monitoring and discounting, Compte [6] obtains near efficiency, too.
However, Compte [6] assumes that defection is irreversible: once a player
defects, he must defect in all subsequent periods, and thus Compte does not
analyze usual repeated games. Therefore this paper is the first to provide
a class of repeated games with private monitoring that have a nearly
efficient equilibrium without those restrictive assumptions.
Now we briefly survey the previous literature on repeated games with
private monitoring and discounting. On the one hand, Matsushima [17]
attempts to derive an anti-Folk Theorem for such games. He shows that
when strategies available to players are confined to those that satisfy a
certain requirement of informational efficiency, any Nash equilibrium of
repeated games with private monitoring must be a repetition of Nash equilibria of the stage game. On the other hand, Kandori [12] and Bhaskar
[4] show the existence of Nash equilibria other than repetition of the Nash
equilibrium of the stage game. They assume, however, that the stage game
has a mixed strategy equilibrium (Kandori [12]) or multiple equilibria
(Bhaskar [4]), which prevents us from applying their results to more
general stage games, including the prisoner's dilemma. In addition, the
equilibria constructed there are not fully efficient: the average payoffs of the
equilibria neither lie on nor are close to the Pareto frontier of the feasible
sets of the stage games.
Therefore the previous literature on repeated games with private
monitoring and discounting does not show whether (nearly) efficient outcomes can be achieved as Nash or sequential equilibria, while Folk
Theorems are obtained in repeated games with public information on other
players' past actions.$^1$ Now we briefly explain why the cases with private
monitoring are hard to analyze. In repeated games with perfect monitoring,
where players directly observe other players' past actions, subgame-perfectness requires that the continuation strategy profile after any history be a
subgame-perfect equilibrium. Similarly, in repeated games with imperfect
monitoring, where all players observe the same signal on the actions
chosen by them, if players use strategies that depend only on these public
signals, then sequential rationality requires that the continuation strategies
after any history of public signals constitute a sequential equilibrium. Thus
such games have a recursive structure in that the same solution concept
applies to all continuation games, and so one can apply the dynamic
programming method developed by Abreu [1] and Abreu et al. [2] to
these games.
In contrast, repeated games with private monitoring do not have such a
recursive structure. To see this, suppose a strategy profile is an equilibrium,
and consider a history in which player 1 deviated from the strategy. After
that history, player 1 computes his belief about other players' continuation
strategies taking his own deviation into account, but the other players
compute their beliefs presuming that their opponents follow the equilibrium strategies without deviation. Thus the continuation strategies do
not constitute any kind of equilibrium in general when a deviation has
occurred. This is the lack of recursive structure that prevents us from
applying the dynamic programming method to repeated games with private
monitoring.
$^1$ See Fudenberg and Maskin [11] and Fudenberg et al. [10].
Our result, existence of a nearly efficient equilibrium for some kinds of
prisoner's dilemma with small imperfectness of signals and patient players,
has two implications. First, we can demonstrate a kind of robustness of
cooperation in repeated games. Theories on repeated games have created
the idea that economic agents who are engaged in long-run relationships
can achieve efficient outcomes. However, as is described above, previous
literature has not shown the possibility that efficiency might still be
achieved in private monitoring cases. In that sense, we were previously
unsure as to whether the possibility of cooperation is robust with respect
to changes in the monitoring structure. Though restricted to certain kinds
of prisoner's dilemma, this paper shows that such cooperation is still
possible, despite the change.
Second, we can now consider the significance of communication in
repeated games with private monitoring. Compte [5] and Kandori and
Matsushima [13] show that a Folk Theorem can be obtained when
players are able to communicate with each other about their privately
observed signals in each period.$^2$ While they show that communication is
sufficient to sustain cooperation, this paper shows that it is not always
necessary. Moreover, economic agents often confront the situation where
communication is prohibited; in the secret price cutting model, the
antitrust law can be considered as such an example. Therefore the analysis
of the case without communication is important not only theoretically but
also practically, and this paper shows that efficiency can be obtained in
such cases.
The rest of this paper is organized as follows. Section 2 introduces the
model. Section 3 shows existence of a nearly efficient sequential equilibrium
for certain types of prisoner's dilemma, given small imperfectness of signals
and patient players. Section 4 concludes.
2. THE MODEL
The stage game is as follows. Each player $i$ ($i = 1, 2$) simultaneously
chooses an action $a_i \in A_i = \{C, D\}$ and, after choosing the action, observes a signal
$\omega_i \in \Omega_i = \{c, d\}$, which is a random variable. Neither the chosen
action nor the observed signal can be observed by the other player;
namely, the signal is private information.
Let $\pi(\omega \mid a)$ be the probability that a signal profile $\omega \in \Omega = \Omega_1 \times \Omega_2$ is
realized, given that an action profile $a \in A = A_1 \times A_2$ is chosen. We hereafter
call the event in which a player observes $c$ (or $d$, respectively) even though the
other player chose $D$ (or $C$) an "error". We assume that, for any action
profile, the probability that an error occurs to one particular player but not
to the other is $p_1$ and that the probability that an error occurs to both
players is $p_2$, where $p_1 \in (0, 1)$, $p_2 \in (0, 1)$ and $1 - 2p_1 - p_2 > 0$. For example,
when $a = (C, C)$, we have
\[ \pi((c, c) \mid (C, C)) = 1 - 2p_1 - p_2, \]
\[ \pi((c, d) \mid (C, C)) = \pi((d, c) \mid (C, C)) = p_1, \]
and
\[ \pi((d, d) \mid (C, C)) = p_2, \]
and so on. Notice that the situation is close to perfect monitoring when $p_1$
and $p_2$ are close to zero. We will concentrate on such situations later.
$^2$ Ben-Porath and Kahneman [3] also introduce communication and obtain a Folk Theorem in similar private monitoring cases.
The above assumptions are quite general except that the probability of
each type of error is independent of the action profile and is symmetric, in that
the probability that an error occurs only to player 1 is always equal to the
probability that an error occurs only to player 2.$^3$ In particular, they
include both the case with independent signals ($p_1$ and $p_2$ are written as
$p_1 = z(1 - z)$ and $p_2 = z^2$ for some $z \in (0, 1)$) and the case with highly
correlated signals.
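The monitoring structure above can be sketched in a few lines of Python. This is only an illustration under the stated assumptions; the function name `signal_distribution` and the tuple encoding of profiles are hypothetical choices, not part of the paper.

```python
def signal_distribution(a, p1, p2):
    """Return {(w1, w2): probability} for the action profile a = (a1, a2).

    Player 1's signal w1 is about a2 and player 2's signal w2 is about a1.
    p1: probability that an error occurs to one particular player only.
    p2: probability that an error occurs to both players.
    """
    correct = {"C": "c", "D": "d"}   # error-free signal for each action
    flip = {"c": "d", "d": "c"}      # what an error turns a signal into
    true1, true2 = correct[a[1]], correct[a[0]]
    return {
        (true1, true2): 1 - 2 * p1 - p2,     # no error
        (flip[true1], true2): p1,            # error to player 1 only
        (true1, flip[true2]): p1,            # error to player 2 only
        (flip[true1], flip[true2]): p2,      # error to both players
    }


# Independent signals with per-player error rate z (illustrative value).
z = 0.1
dist = signal_distribution(("C", "C"), z * (1 - z), z * z)
```

With independent signals, each player's marginal error probability is $p_1 + p_2 = z$, which the dictionary reproduces when summed over the other player's signal.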
Each player's payoff depends on the action she chose and the signal she
observed and is written as $u_i(a_i, \omega_i)$ ($i = 1, 2$). We assume that the players'
payoff functions are identical and ordered as in a prisoner's dilemma.
Therefore, after normalization we have
\[ u_1(C, c) = u_2(C, c) = 1, \]
\[ u_1(C, d) = u_2(C, d) = -L, \]
\[ u_1(D, c) = u_2(D, c) = 1 + G, \]
and
\[ u_1(D, d) = u_2(D, d) = 0, \]
where $G > 0$, $L > 0$, and $1 + G - L < 2$.
Given $a \in A$, player $i$'s expected payoff when players choose an action
profile $a$ is
\[ f_i(a) = \sum_{\omega \in \Omega} u_i(a_i, \omega_i)\, \pi(\omega \mid a). \]
$^3$ These assumptions are partly for the sake of expositional simplicity. See Remark 2 of the Theorem.
When $p_1$ and $p_2$ are small, $f_1(\cdot)$ and $f_2(\cdot)$ are ordered as in a prisoner's
dilemma. That is,
\[ f_1(D, C) = f_2(C, D) > f_1(C, C) = f_2(C, C) > f_1(D, D) = f_2(D, D) > f_1(C, D) = f_2(D, C). \]
In these cases, we can normalize $f_1(\cdot)$ and $f_2(\cdot)$ as depicted in Fig. 1.
Formally,
\[ g = \frac{G + (L - G)(p_1 + p_2)}{1 - (2 + G + L)(p_1 + p_2)} \]
and
\[ l = \frac{L + (G - L)(p_1 + p_2)}{1 - (2 + G + L)(p_1 + p_2)}. \]
When possible, we will mainly use the normalized payoffs in Fig. 1 as stage
game payoffs.
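As a numerical cross-check of the normalization, the following sketch computes the expected payoffs $f_i$ directly from $u_i$ and the error probabilities, then rescales them so that mutual cooperation yields 1 and mutual defection yields 0. The helper names `expected_payoffs` and `normalized_g_l` and the parameter values are illustrative assumptions.

```python
def expected_payoffs(G, L, p1, p2):
    """Expected stage payoff f[(own action, opponent's action)] for one player."""
    P = p1 + p2  # probability that this player's own signal is an error
    u = {("C", "c"): 1.0, ("C", "d"): -L, ("D", "c"): 1.0 + G, ("D", "d"): 0.0}
    f = {}
    for own in "CD":
        for other in "CD":
            right = "c" if other == "C" else "d"   # error-free signal
            wrong = "d" if other == "C" else "c"   # erroneous signal
            f[(own, other)] = (1 - P) * u[(own, right)] + P * u[(own, wrong)]
    return f


def normalized_g_l(G, L, p1, p2):
    """Affinely rescale f so that f(C,C) -> 1 and f(D,D) -> 0; return (g, l)."""
    f = expected_payoffs(G, L, p1, p2)
    scale = f[("C", "C")] - f[("D", "D")]
    g = (f[("D", "C")] - f[("D", "D")]) / scale - 1.0
    l = -(f[("C", "D")] - f[("D", "D")]) / scale
    return g, l


g, l = normalized_g_l(G=0.5, L=1.0, p1=0.02, p2=0.01)
```

The values returned agree with the closed-form expressions for $g$ and $l$ given above, and they reduce to $G$ and $L$ when $p_1 = p_2 = 0$.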
Figure 1
We consider infinitely repeated games with discounting where the stage
game described above is played in each period $t = 1, 2, 3, \ldots$. Signals observed
in each period are private information and there is no way for the players
to communicate with each other about the signals. We assume that
occurrence of signals is independent over time. We denote the repeated
game with private monitoring by $G(p, \delta)$, where $p = (p_1, p_2)$ and $\delta \in (0, 1)$
is the players' common discount factor. In this game, a history for player
$i$ at period $t$ ($t \geq 2$), denoted $h_i^t$, is a sequence of player $i$'s past actions and
signals. That is, $h_i^t = [(a_i(\tau), \omega_i(\tau))]_{\tau=1}^{t-1}$. Let $H_i^t$ ($i = 1, 2$, $t \geq 2$) be the set
of histories for player $i$ at period $t$. For notational convenience, we define
$H_i^1$ ($i = 1, 2$) as an arbitrary singleton set. A behavior strategy for player $i$
in $G(p, \delta)$ is a mapping from $\bigcup_{t=1}^{\infty} H_i^t$ to a probability distribution on $A_i$.
We denote the set of strategies for player $i$ in $G(p, \delta)$ by $\Sigma_i$ ($i = 1, 2$).
Given a strategy profile $\sigma = (\sigma_1, \sigma_2) \in \Sigma_1 \times \Sigma_2$, player $i$'s average expected
payoff is
\[ (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} E[f_i(a(t)) \mid \sigma], \]
where $f_i(\cdot)$ are employed as stage game payoffs. $E[\,\cdot \mid \sigma]$ means expected
value with respect to the probability distribution of action profiles induced
by $\sigma$, and $a(t)$ is the realized action profile in period $t$.
3. EXISTENCE OF A NEARLY EFFICIENT EQUILIBRIUM
In this section, we prove our main result, existence of a nearly efficient
sequential equilibrium for some kinds of prisoner's dilemma with small
imperfectness of signals and with patient players. Before stating the result,
we briefly sketch the idea of the proof.
To begin with, we define some strategies. Let $\sigma_C \in \Sigma_i$ ($i = 1, 2$) be the
strategy such that $\sigma_C(h_i^1) = C$ and, for $t \geq 2$, $\sigma_C(h_i^t) = C$ iff $h_i^t = [(C, c),
(C, c), \ldots, (C, c)]$, and let $\sigma_D \in \Sigma_i$ ($i = 1, 2$) be the strategy in which player
$i$ plays $D$ regardless of histories. In other words, $\sigma_C$ is the grim trigger
strategy. And let $\sigma^*$ be the strategy that mixes $\sigma_C$ and $\sigma_D$ so that the opponent is indifferent between $\sigma_C$ and $\sigma_D$. Such $\sigma^*$ exists when the players are
not too myopic and when signals are accurate to some extent.
Since the signals are noisy, the player playing $\sigma^*$ eventually starts to play
$D$ forever, which may make strategy profile $(\sigma^*, \sigma^*)$ inefficient. However,
when $\delta$ is relatively small, it may be a nearly efficient outcome because the
payoffs in the distant future are unimportant. As is proved later, this is
indeed true because the probability of $\sigma_D$ in $\sigma^*$ is small for such $\delta$. Then
we can also obtain a nearly efficient outcome in the case with $\delta$ close to
one, by modifying $(\sigma^*, \sigma^*)$ as follows: we divide the original game into $N$
distinct games, the first of which is played in periods $1, N+1, 2N+1, \ldots$, the
second of which is played in periods $2, N+2, 2N+2, \ldots$, and so on. Then
the effective discount factor of the divided games, $\delta^N$, is so small that
playing $(\sigma^*, \sigma^*)$ in each divided game achieves near efficiency.$^4$
$^4$ This idea of dividing the original game is introduced by Ellison [7] in the context of random matching models.
Thus it suffices to show that $(\sigma^*, \sigma^*)$ is a Nash equilibrium for small $\delta$,
and it is shown through the following method, which we call path
dominance. Given a history for a player at some period on the path induced
by a given strategy profile, we check whether it is optimal for the player to
play the action in that period prescribed by the strategy. In other words, we
consider optimality of the current action given the history, instead of
optimality of the whole continuation strategy. To this end, we compare the
payoff obtained by conforming to the given strategy with an upper bound
of payoffs obtained by deviating at least in that period. If the former
is larger than the latter, then the continuation strategy dominates all
continuation strategies induced by a current deviation, which proves
the optimality of the current action. Note that it is easier to compute an
upper bound than the maximum of the payoffs. And optimality of current
actions at all histories on the path implies that the given strategy is a Nash
equilibrium.
Of course, a Nash equilibrium does not satisfy sequential rationality in
general. In our model, however, any Nash equilibrium has a sequential
equilibrium with the same path. This is because any player's information
set off the equilibrium path must follow the player's own deviation. A formal
proof is given in Proposition 3.
To sum up, we first try to find a nearly efficient Nash equilibrium using path dominance and then to find a sequential equilibrium with the same
path, instead of finding such a sequential equilibrium directly. As we have
seen in the Introduction, it is difficult to find sequential equilibria in our
model because the model lacks a recursive structure. As time passes, the
players accumulate a lot of privately observed signals, by which they form
beliefs on continuation strategies of the others given a strategy profile. As
a result, the players would have quite complex belief systems, and one can
hardly expect that the original strategy is in fact a best response to such
complex belief systems. In contrast, it is relatively easy to find a simple
Nash equilibrium strategy profile because it requires best response property
only on the equilibrium path, which can be proved by a path-dominance
argument. This is the reason our method works better.
Now we begin with some preliminary results. The following proposition
is a key to our path-dominance argument. Since we are interested in the
case with small $p_1$ and $p_2$, we consider $G(p, \delta)$ with stage game payoffs
given by Fig. 1.
Proposition 1. Suppose $l > g^2$ and $\delta \in \left( \frac{g}{g+1}, \frac{g+l}{2g+l+1} \right)$
in $G(p, \delta)$.$^5$ Moreover, suppose that player $i$ believes that the probability
that player $j$'s continuation strategy is $\sigma_D$ is $q$ at some history at period $t$.
Then, given the history, it is not optimal for player $i$ to play $C$ in period $t$
if $q \geq 1/2$.
$^5$ Notice that $l > g^2$ is equivalent to $\frac{g}{g+1} < \frac{g+l}{2g+l+1}$.
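Proof. Given the belief, playing $C$ in period $t$ gives player $i$ at most
\[ (1-\delta)(-ql + 1 - q) + \delta(1-q)(1+g), \]
while playing $D$ from period $t$ on gives him at least
\[ (1-\delta)(1-q)(1+g). \]
Since $\delta \in \left( \frac{g}{g+1}, \frac{g+l}{2g+l+1} \right)$ and $q \geq 1/2$, the difference is
\[ \begin{aligned}
& (1-\delta)(1-q)(1+g) - (1-\delta)(-ql + 1 - q) - \delta(1-q)(1+g) \\
&\quad = q[(1-\delta)l + (2g+1)\delta - g] + g - (2g+1)\delta \\
&\quad \geq \tfrac{1}{2}[(1-\delta)l + (2g+1)\delta - g] + g - (2g+1)\delta \\
&\quad = \tfrac{1}{2}[l + g - (l + 2g + 1)\delta] > 0,
\end{aligned} \]
because $\delta > \frac{g}{g+1}$ implies that $(1-\delta)l + (2g+1)\delta - g > 0$.
Q.E.D.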
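The algebra in the proof of Proposition 1 can be checked numerically. The sketch below evaluates the two payoff bounds directly and compares their difference with the simplified expression; the function names and the sample parameters ($g = 0.5$, $l = 1$) are illustrative assumptions chosen so that $l > g^2$.

```python
def gain_from_defecting(g, l, delta, q):
    """Lower bound on 'D forever' minus upper bound on 'C today' (Proposition 1)."""
    upper_if_C = (1 - delta) * (-q * l + 1 - q) + delta * (1 - q) * (1 + g)
    lower_if_D = (1 - delta) * (1 - q) * (1 + g)
    return lower_if_D - upper_if_C


def gain_closed_form(g, l, delta, q):
    """Simplified form of the same difference, as written in the proof."""
    return q * ((1 - delta) * l + (2 * g + 1) * delta - g) + g - (2 * g + 1) * delta


g, l = 0.5, 1.0
lo, hi = g / (g + 1), (g + l) / (2 * g + l + 1)   # admissible range for delta
delta = 0.5 * (lo + hi)
gain = gain_from_defecting(g, l, delta, q=0.5)    # positive, as the proof claims
```

For any $\delta$ strictly inside the interval and any $q \geq 1/2$, the gain is positive, so playing $C$ in period $t$ cannot be optimal.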
Although Proposition 1 does not cover the case with $ close to 1, the
next Proposition, which is due to Ellison [7] and is reproduced here for
the reader's convenience, ensures that we can restrict attention to the case
considered in Proposition 1.
Proposition 2. Fix $p$, $\delta_0$ and $\delta_1$, where $0 < \delta_0 < \delta_1 < 1$. If, for any
$\delta \in [\delta_0, \delta_1]$, $G(p, \delta)$ has a sequential equilibrium whose average payoffs are
more than $v$, then there exists $\underline{\delta} \in (0, 1)$ such that $G(p, \delta)$ has a sequential
equilibrium whose average payoffs are more than $v$ if $\delta \geq \underline{\delta}$.
Proof. Set $\underline{\delta} = \delta_0 / \delta_1$. For any $\delta \geq \underline{\delta}$, there exists an integer $N(\delta)$ such
that $\delta^{N(\delta)} \in [\delta_0, \delta_1]$. Now we divide the repeated game $G(p, \delta)$, where
$\delta \geq \underline{\delta}$, into $N(\delta)$ distinct repeated games, the first of which is played in
periods $1, N+1, 2N+1, \ldots$, the second of which is played in periods
$2, N+2, 2N+2, \ldots$, and so on. Since each repeated game can be regarded
as one with discount factor $\delta^{N(\delta)}$, playing the sequential equilibrium of
$G(p, \delta^{N(\delta)})$ which gives more than $v$ for both players in each divided
repeated game constitutes a sequential equilibrium of $G(p, \delta)$ with average
payoffs more than $v$.
Q.E.D.
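The construction in the proof of Proposition 2 can be illustrated with a short sketch that searches for the number of interleaved games $N(\delta)$ and assigns each period to its component game. The function names `split_count` and `component_game`, and the interval endpoints used in the example, are hypothetical.

```python
def split_count(delta, delta0, delta1):
    """Smallest N >= 1 with delta0 <= delta**N <= delta1, or None if no such N exists.

    Assumes 0 < delta0 < delta1 < 1 and 0 < delta < 1.
    """
    n = 1
    while delta ** n > delta1:   # delta**n decreases in n, so this loop terminates
        n += 1
    return n if delta ** n >= delta0 else None


def component_game(t, n):
    """Index (1..n) of the divided game that is played in period t (t = 1, 2, ...)."""
    return (t - 1) % n + 1


delta0, delta1 = 0.35, 0.45            # illustrative target interval
n = split_count(0.99, delta0, delta1)  # a patient player: delta = 0.99
effective = 0.99 ** n                  # discount factor within each divided game
```

The proof guarantees that the search succeeds whenever $\delta \geq \delta_0 / \delta_1$; for smaller $\delta$ it may or may not succeed, which is why the function can return `None`.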
The next proposition describes the relationship between Nash equilibrium and sequential equilibrium.
Proposition 3. Let $\sigma = (\sigma_1, \sigma_2) \in \Sigma_1 \times \Sigma_2$ be a Nash equilibrium of
$G(p, \delta)$. Then there exists a sequential equilibrium of $G(p, \delta)$ which has the
same path as $\sigma$.
Proof. Define a function $\tau_i : \bigcup_{t=1}^{\infty} H_i^t \to A_i$ as follows. Choose $t$ and
$h_i^t \in H_i^t$. Given $h_i^t$ and $\sigma_j$, we can compute the belief about player $j$'s
continuation strategies by Bayes' rule. Finiteness of the stage game and
discounting imply that there exists an optimal pure continuation strategy
given the belief (see Fudenberg and Levine [8] for the details). Choose
such a continuation strategy and define $\tau_i(h_i^t)$ as the action it assigns in
period $t$.
Let $H_i$ ($i = 1, 2$) be the set of histories for player $i$ of $G(p, \delta)$ which can
be observed with positive probability given $\sigma$. Now we define $\hat{\sigma} = (\hat{\sigma}_1, \hat{\sigma}_2) \in
\Sigma_1 \times \Sigma_2$ as (a) $\hat{\sigma}_i(h_i^t) = \sigma_i(h_i^t)$ if $h_i^t \in H_i$ and (b) $\hat{\sigma}_i(h_i^t) = \tau_i(h_i^t)$ otherwise. It
is easy to show that $\hat{\sigma}_i$ is a best response to $\sigma_j$.
Suppose player $j$ plays $\hat{\sigma}_j$. Player $i$'s system of beliefs given $\hat{\sigma}_j$ is the same
as that given $\sigma_j$. Thus $\hat{\sigma}_i$ is sequentially rational given $\hat{\sigma}_j$ and the system
of beliefs, and the system of beliefs is of course consistent. Since it is easy
to show that $\sigma$ and $\hat{\sigma}$ have the same path, $\hat{\sigma}$ is the desired sequential
equilibrium.
Q.E.D.
Using these results, we can now prove our main result, existence of a
nearly efficient sequential equilibrium for a repeated prisoner's dilemma
with a certain assumption on stage game payoffs, small imperfectness of
signals and patient players.
Theorem. Suppose $L > G^2$. Then for any $\varepsilon > 0$, there exist $\bar{p} > 0$ and
$\underline{\delta} \in (0, 1)$ such that $G(p, \delta)$, with stage game payoffs given in Fig. 1, has a
sequential equilibrium whose average payoffs are more than $1 - \varepsilon$ if
$p_1 + p_2 \leq \bar{p}$ and $\delta \geq \underline{\delta}$.
Proof. See Appendix.
Remark 1. The assumption $L > G^2$ is needed for the following reason. Consider the strategy profile $(\sigma^*, \sigma^*)$. For it to be a Nash
equilibrium for small $\delta$ and small probabilities of errors, $\sigma_D$ must be
an optimal continuation strategy when $h_i^t = [(C, c), (C, c), \ldots, (C, c),
(C, d)]$. If we also assume that occurrence of signals is independent among
players, the likelihood of an event mainly depends on the number of errors
it includes. Given $h_i^t$, there are two distinct events which include exactly
one error. The first is that the $d$ observed in period $t-1$ is an error
and player $j$'s continuation strategy is $\sigma_C$. The second is that player $j$
observed $d$, an error, in period $t-2$ and switched to $\sigma_D$ in period $t-1$.
Notice that any other event compatible with that history must include
more than one error. This implies that player $i$ believes that player $j$ plays
$\sigma_D$ with probability as large as $1/2$. For $\sigma_D$ to be optimal in this case, the
short-run losses of playing $C$ in period $t$ must be large, and therefore $\delta$
must not be too large. On the other hand, $\delta$ must be large for players to
cooperate. The assumption $L > G^2$ ensures the existence of such a moderate $\delta$.
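The belief computation behind Remark 1 can be sketched as a Bayesian filter. The code below tracks player $i$'s probability that player $j$'s continuation strategy is $\sigma_D$, assuming $j$ initially mixes between $\sigma_C$ and $\sigma_D$ and that player $i$ has played $C$ throughout. The function names, the initial mixing probability `q`, and the error rate `z` are illustrative assumptions, not values from the paper.

```python
def update_after_Cc(mu, p1, p2):
    """Posterior that j is on sigma_D after i plays C and observes c."""
    stays_C = (1 - mu) * (1 - 2 * p1 - p2)       # j on sigma_C, no error at all
    now_D = (1 - mu) * p1 + mu * (p1 + p2)       # j alone erred, or j already on sigma_D and i erred
    return now_D / (stays_C + now_D)


def update_after_Cd(mu, p1, p2):
    """Posterior that j is on sigma_D after i plays C and observes d."""
    stays_C = (1 - mu) * p1                      # i alone erred; j saw c and keeps cooperating
    now_D = (1 - mu) * p2 + mu * (1 - p1 - p2)   # both erred, or j already on sigma_D
    return now_D / (stays_C + now_D)


def belief_after_first_d(q, p1, p2, clean_periods):
    """Belief after `clean_periods` rounds of (C, c) followed by one (C, d)."""
    mu = q
    for _ in range(clean_periods):
        mu = update_after_Cc(mu, p1, p2)
    return update_after_Cd(mu, p1, p2)


z = 0.01                                         # independent signals, error rate z
mu = belief_after_first_d(q=0.02, p1=z * (1 - z), p2=z * z, clean_periods=50)
```

With small independent errors, the resulting belief is close to $1/2$, which is the situation Remark 1 describes: the two single-error explanations of the history are about equally likely.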
Remark 2. We also assume that probabilities of errors are independent
of action profiles and are symmetric. Without these assumptions, errors to
player $i$ can be much more likely than those to player $j$. Then, in the case
described above, player $i$ may conclude that the $d$ observed in period $t-1$
is the only error and that player $j$ plays $\sigma_C$ from now on, and therefore
$(\sigma^*, \sigma^*)$ may fail to be an equilibrium. Thus the assumptions are important,
but we can obtain efficiency even in cases which violate but are close to
the assumptions. In this respect, the theorem is robust to changes in
monitoring structure.
4. CONCLUSION
This paper shows that a nearly efficient outcome can be achieved as
an equilibrium for some kinds of prisoner's dilemma even when players'
information about their opponents' past behavior is private. In this section,
we briefly discuss possible extensions and limitations of our result.
One drawback of the result of this paper is that it holds only for a
limited class of prisoner's dilemma. One can, however, obtain a similar
efficiency result for general repeated prisoner's dilemma by imposing additional assumptions on the structure of private monitoring. Suppose that,
because of, for example, improvement in monitoring technology,
probabilities of errors are decreasing over time. That is, any type of error
in period $t$ is more likely to happen than that in period $t'$ if $t < t'$. Under
this monitoring structure, suppose $(\sigma^*, \sigma^*)$ is played and player $i$ confronts
history $h_i^t = [(C, c), (C, c), \ldots, (C, c), (C, d)]$. As was described in Remark 1
of the Theorem, there are two events which can be equally likely from
player $i$'s viewpoint if the monitoring structure in each period remains the
same. Since probabilities of errors are decreasing, however, player $i$ tends
to put more weight on the event that an error occurred to player $j$ in
period $t-2$ and player $j$'s continuation strategy is $\sigma_D$, which makes it
likely that $(\sigma^*, \sigma^*)$ is a Nash equilibrium for a broader class of repeated
prisoner's dilemma. Indeed, in Sekiguchi [19], it is shown that near
efficiency is obtained for any given repeated prisoner's dilemma when
players are patient and probabilities of errors decrease exponentially.
Our result depends on the following special feature of the prisoner's
dilemma: the action corresponding to a profitable deviation from the
cooperative phase is the same as the one used to punish deviators. Therefore we can apply our result to stage games where the profitable deviation
from "cooperative" behavior is unique for each player and the actions
corresponding to the deviations constitute a Nash equilibrium which is
Pareto inferior to the cooperative behavior; for example, to the n-person
prisoner's dilemma with monitoring structure that is a natural extension of
the one presented in this paper. However, the following example shows that
it does not apply to other stage games.
Figure 2
Suppose $A_1 = A_2 = \{C, D, E\}$ and $\Omega_1 = \Omega_2 = \{c, d, e\}$. Expected stage
game payoffs given an action profile are shown in Fig. 2 and signals are
independent among players. According to our result, one natural candidate
for an equilibrium strategy is to mix the grim trigger strategy and the
strategy where $D$ is always played. Suppose player 2 plays the mixed
strategy and $h_1^t = [(C, c), (C, c), \ldots, (C, c), (C, e)]$. The only event which
contains only one error and is compatible with $h_1^t$ is that player 2 chooses
the grim trigger strategy and the $e$ observed in period $t-1$ is an error.
Therefore, given $h_1^t$, player 2 continues playing $C$ with high probability
when probabilities of errors are small, thereby inducing player 1 to play $C$
in period $t$. Thus player 1 does not have an incentive to follow the grim
trigger strategy given player 2's strategy. In this stage game, while the only
profitable deviation from a cooperative behavior $(C, C)$ is $E$ for each
player, $(E, E)$ is not a Nash equilibrium of the stage game, and so we cannot apply our result to this case.
So we cannot say whether it is possible to support a nearly efficient outcome as an equilibrium in general stage games. In addition, we are not sure
what will happen when the probabilities of error are larger. We hope
further research will be undertaken in these directions.
APPENDIX
Proof of the Theorem. First, notice that when $p = 0$, $g$ and $l$ defined in
Section 2 are equal to $G$ and $L$. Therefore, by continuity, the assumptions
$G > 0$, $L > 0$, $1 + G - L < 2$, and $L > G^2$ imply that there exists $\hat{p} > 0$ such
that $g > 0$, $l > 0$, $1 + g - l < 2$, and $l > g^2$ if $p_1 + p_2 \leq \hat{p}$.
In view of Propositions 2 and 3, it suffices to show that there exist $\bar{p} > 0$,
$\delta_0$, and $\delta_1$ such that $0 < \delta_0 < \delta_1 < 1$ and that $G(p, \delta)$, where $p_1 + p_2 \leq \bar{p}$
and $\delta \in [\delta_0, \delta_1]$, has a Nash equilibrium whose average payoffs are more
than $1 - \varepsilon$.
We define $v_{\alpha\beta}(p, \delta)$ ($\alpha = C, D$, $\beta = C, D$) as a player's average payoff in
$G(p, \delta)$ when he plays $\sigma_\alpha$ and his opponent plays $\sigma_\beta$. Then we have
\[ v_{DD}(p, \delta) = 0, \]
\[ v_{CD}(p, \delta) = \frac{-(1-\delta)\, l}{1 - \delta(p_1 + p_2)}, \]
\[ v_{DC}(p, \delta) = \frac{(1-\delta)(1+g)}{1 - \delta(p_1 + p_2)}, \]
and
\[ v_{CC}(p, \delta) = \frac{1 - \delta + \delta p_1 [v_{CD}(p, \delta) + v_{DC}(p, \delta)]}{1 - \delta(1 - 2p_1 - p_2)}. \]
Now we define a function $q$ as
\[ q(p, \delta) = \frac{v_{CC}(p, \delta) - v_{DC}(p, \delta)}{v_{CC}(p, \delta) - v_{DC}(p, \delta) - v_{CD}(p, \delta)}. \]
$q(p, \delta)$ is defined so that player $i$ is indifferent between $\sigma_C$ and $\sigma_D$ when
player $j$ plays $\sigma_D$ with probability $q(p, \delta)$ and $\sigma_C$ with probability
$1 - q(p, \delta)$.
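The payoff formulas and the indifference condition can be verified numerically. The sketch below computes the four average payoffs and the mixing probability, then checks that a player facing the mixture earns the same from $\sigma_C$ and from $\sigma_D$. The function names and parameter values are illustrative assumptions.

```python
def average_payoffs(g, l, p1, p2, delta):
    """Average payoffs v[ab]: the player uses sigma_a, the opponent uses sigma_b."""
    P = p1 + p2
    v_DD = 0.0
    v_CD = -(1 - delta) * l / (1 - delta * P)
    v_DC = (1 - delta) * (1 + g) / (1 - delta * P)
    v_CC = (1 - delta + delta * p1 * (v_CD + v_DC)) / (1 - delta * (1 - 2 * p1 - p2))
    return {"CC": v_CC, "CD": v_CD, "DC": v_DC, "DD": v_DD}


def mixing_probability(v):
    """Probability q of sigma_D that makes the opponent indifferent."""
    gap = v["CC"] - v["DC"]
    return gap / (gap - v["CD"])


v = average_payoffs(g=0.5, l=1.0, p1=0.01, p2=0.001, delta=0.4)
q = mixing_probability(v)
payoff_from_C = (1 - q) * v["CC"] + q * v["CD"]   # play sigma_C against the mixture
payoff_from_D = (1 - q) * v["DC"] + q * v["DD"]   # play sigma_D against the mixture
```

The two payoffs coincide up to rounding, and their common value is the equilibrium average payoff $(1 - q)\, v_{DC}$ that the proof bounds below by $1 - \varepsilon$.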
Fix $\varepsilon > 0$. Since $q((0, 0), g/(g+1)) = 0$ and $v_{DC}((0, 0), g/(g+1)) = 1$, we
can choose $\delta_0$ and $\delta_1$, where $g/(g+1) < \delta_0 < \delta_1 < (g+l)/(2g+l+1)$, such
that
\[ q((0, 0), \delta) = \frac{1 - (1-\delta)(1+g)}{1 - (1-\delta)(1+g) + (1-\delta)\, l} \in \left(0,\ 1 - (1-\varepsilon)^{1/2}\right) \]
and
\[ v_{DC}((0, 0), \delta) > (1-\varepsilon)^{1/2} \]
for any $\delta \in [\delta_0, \delta_1]$ by continuity. Since $q$ and $v_{DC}$ are continuous at
$p = (0, 0)$ and $\delta \in [\delta_0, \delta_1]$, there exists $p' \in (0, \hat{p}]$ such that $p_1 + p_2 \leq p'$ and
$\delta \in [\delta_0, \delta_1]$ imply
\[ q(p, \delta) \in \left(0,\ 1 - (1-\varepsilon)^{1/2}\right) \tag{1} \]
and
\[ v_{DC}(p, \delta) > (1-\varepsilon)^{1/2}. \tag{2} \]
Let $\sigma^* \in \Sigma_i$ ($i = 1, 2$) be the strategy where player $i$ plays $\sigma_D$ with probability $q(p, \delta)$ and $\sigma_C$ with probability $1 - q(p, \delta)$. Associated with $\sigma^*$, we
define $\mu_i(h_i^t)$ as follows. Suppose that player $j$ plays $\sigma^*$. Then, for any
player $i$'s history $h_i^t$, player $i$ can compute the belief on player $j$'s continuation strategies given the history. Let $\mu_i(h_i^t)$ be the probability that the belief
assigns to $\sigma_D$.
Next, we define functions $r$ and $s$ as
\[ r(p,\delta)=\frac{q(p,\delta)(1-p_1-p_2)}{[1-q(p,\delta)](p_1+p_2)+q(p,\delta)(1-p_1-p_2)} \]
and
\[ s(p,\delta)=1-\frac{1-q(p,\delta)}{[1-q(p,\delta)]\left(1+\dfrac{p_1}{1-3p_1-2p_2}\right)+q(p,\delta)\,\dfrac{p_1+p_2}{1-2p_1-p_2}}. \]
$r(p,\delta)$ is a lower bound of $\mu_i([(\cdot,d)])$ and $s(p,\delta)$ is an upper bound of $\mu_i(h_i^t)$ when $h_i^t$ has the form $h_i^t=[(C,c),(C,c),\dots,(C,c)]$ (these facts will be proved later). When $\delta\in[\delta_0,\delta_1]$, $r((0,0),\delta)=1$ and $s((0,0),\delta)=0<q((0,0),\delta)$. Therefore, continuity implies that there exists $p''>0$ such that
\[ r(p,\delta)\ge\frac{1}{2} \tag{3} \]
and
\[ s(p,\delta)<\frac{v_{CC}(p,\delta)-\left\{1+\dfrac{\delta}{1-\delta}(p_1+p_2)\right\}v_{DC}(p,\delta)}{v_{CC}(p,\delta)-\left\{1+\dfrac{\delta}{1-\delta}(p_1+p_2)\right\}v_{DC}(p,\delta)-v_{CD}(p,\delta)}, \tag{4} \]
when $p_1+p_2\le p''$ and $\delta\in[\delta_0,\delta_1]$. We define $\bar{p}=\min\{p',p'',1/3\}$.
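Conditions (3) and (4) can be evaluated numerically for a particular noise level. The sketch below is only an illustration under assumed parameter values; the helper names are hypothetical, and the payoff formulas are restated from earlier in the proof so that the block runs on its own.

```python
def payoffs(p1, p2, delta, g, l):
    """(v_CC, v_CD, v_DC) from the closed forms stated earlier in the proof."""
    err = p1 + p2
    v_cd = -(1 - delta) * l / (1 - delta * err)
    v_dc = (1 - delta) * (1 + g) / (1 - delta * err)
    v_cc = (1 - delta + delta * p1 * (v_cd + v_dc)) / (1 - delta * (1 - 2 * p1 - p2))
    return v_cc, v_cd, v_dc


def r_bound(q, p1, p2):
    """r(p, delta), written as a function of q = q(p, delta)."""
    hit = q * (1 - p1 - p2)
    return hit / ((1 - q) * (p1 + p2) + hit)


def s_bound(q, p1, p2):
    """s(p, delta), written as a function of q = q(p, delta)."""
    denom = (1 - q) * (1 + p1 / (1 - 3 * p1 - 2 * p2)) + q * (p1 + p2) / (1 - 2 * p1 - p2)
    return 1 - (1 - q) / denom


def rhs_of_4(p1, p2, delta, g, l):
    """Right-hand side of inequality (4)."""
    v_cc, v_cd, v_dc = payoffs(p1, p2, delta, g, l)
    a = v_cc - (1 + delta / (1 - delta) * (p1 + p2)) * v_dc
    return a / (a - v_cd)


if __name__ == "__main__":
    g, l, delta, p1, p2 = 0.5, 1.0, 0.4, 0.01, 0.005  # assumed values
    v_cc, v_cd, v_dc = payoffs(p1, p2, delta, g, l)
    q = (v_cc - v_dc) / (v_cc - v_dc - v_cd)
    print(r_bound(q, p1, p2) >= 0.5)                            # condition (3)
    print(s_bound(q, p1, p2) < rhs_of_4(p1, p2, delta, g, l))   # condition (4)
```

At $p=(0,0)$ the same functions return $r=1$, $s=0$, and a right-hand side of (4) equal to $q((0,0),\delta)$, which is the limiting case used in the continuity argument.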
Now we show that the strategy profile $(\sigma^*,\sigma^*)$ is a Nash equilibrium of $G(p,\delta)$ when $p_1+p_2\le\bar{p}$ and $\delta\in[\delta_0,\delta_1]$. By definition of $q(p,\delta)$, each player is indifferent between $\sigma_C$ and $\sigma_D$ given that the other plays $\sigma^*$. Therefore, by Proposition 1, it suffices to show that (A) $\mu_i(h_i^2)\ge 1/2$ unless $h_i^2=[(C,c)]$, (B) $\mu_i(h_i^t)\ge 1/2$ when $h_i^t=[(C,c),(C,c),\dots,(C,c),(C,d)]$ or when $h_i^t=[h_i^{t-1},(D,\cdot)]$, where $\mu_i(h_i^{t-1})\ge 1/2$, and (C) when $h_i^t=[(C,c),(C,c),\dots,(C,c)]$, it is optimal for player $i$ not to play $D$ in period $t$.
Using Bayes' rule, we have
\[ \mu_i([(D,c)])=\frac{[1-q(p,\delta)](1-2p_1-p_2)+q(p,\delta)(p_1+p_2)}{[1-q(p,\delta)](1-p_1-p_2)+q(p,\delta)(p_1+p_2)}>1-\frac{p_1}{1-p_1-p_2}, \]
\[ \mu_i([(C,d)])=\frac{[1-q(p,\delta)]\,p_2+q(p,\delta)(1-p_1-p_2)}{[1-q(p,\delta)](p_1+p_2)+q(p,\delta)(1-p_1-p_2)}\ge r(p,\delta), \]
and
\[ \mu_i([(D,d)])=\frac{[1-q(p,\delta)]\,p_1+q(p,\delta)(1-p_1-p_2)}{[1-q(p,\delta)](p_1+p_2)+q(p,\delta)(1-p_1-p_2)}\ge r(p,\delta). \]
Then $p_1+p_2\le 1/3$ and (3) prove (A).
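The three posteriors above can be reproduced by enumerating the period-1 signal errors directly. The Python sketch below does this under a signal model inferred from the formulas in the text, namely that both signals are correct with probability $1-2p_1-p_2$, exactly one given player's signal is wrong with probability $p_1$, and both are wrong with probability $p_2$. That reading, the function name, and the numerical values are assumptions made for illustration.

```python
def posterior_sigma_d(q, p1, p2, own_action, own_signal):
    """P(opponent j continues with sigma_D | i's period-1 pair (own_action, own_signal)).

    Assumed signal model (inferred from the formulas in the text): both signals
    correct w.p. 1 - 2*p1 - p2, only i's wrong w.p. p1, only j's wrong w.p. p1,
    both wrong w.p. p2.
    """
    flip = {"C": "D", "D": "C"}
    errors = {(False, False): 1 - 2 * p1 - p2, (True, False): p1,
              (False, True): p1, (True, True): p2}  # keys: (i wrong, j wrong)
    num = den = 0.0
    for opp_start, prior in (("D", q), ("C", 1 - q)):
        # In period 1, sigma_D plays D and sigma_C plays C.
        for (i_wrong, j_wrong), prob in errors.items():
            seen_by_i = flip[opp_start] if i_wrong else opp_start
            if seen_by_i.lower() != own_signal:
                continue  # inconsistent with what player i observed
            seen_by_j = flip[own_action] if j_wrong else own_action
            weight = prior * prob
            den += weight
            if opp_start == "D" or seen_by_j == "D":
                num += weight  # j started with, or has switched to, sigma_D
    return num / den


if __name__ == "__main__":
    q, p1, p2 = 0.1, 0.01, 0.005  # assumed illustrative values
    for hist in (("D", "c"), ("C", "d"), ("D", "d")):
        print(hist, posterior_sigma_d(q, p1, p2, *hist))
```

Under these assumptions the enumeration agrees with the closed forms for $\mu_i([(D,c)])$, $\mu_i([(C,d)])$, and $\mu_i([(D,d)])$, and each value exceeds $1/2$ for the sample parameters.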
If $h_i^t=[(C,c),(C,c),\dots,(C,c),(C,d)]$, the probability that $h_i^t$ occurs and player $j$'s continuation strategy is $\sigma_C$ is
\[ X=(1-q(p,\delta))(1-2p_1-p_2)^{t-2}\,p_1, \]
because player $j$ must choose $\sigma_C$ and the $d$ observed in period $t-1$ must be the only error. The probability that $h_i^t$ occurs and player $j$'s continuation strategy is $\sigma_D$ is at least
\[ Y=(1-q(p,\delta))(1-2p_1-p_2)^{t-3}\,p_1(1-p_1-p_2). \]
This is because $Y$ denotes the probability of the event where $h_i^t$ occurs and player $j$ played $C$ until period $t-2$, observed $d$ in period $t-2$, and switched to $\sigma_D$ in period $t-1$. Since there are other events compatible with $h_i^t$ where player $j$'s continuation strategy is $\sigma_D$,
\[ \mu_i(h_i^t)\ge\frac{Y}{X+Y}=\frac{1-p_1-p_2}{(1-2p_1-p_2)+(1-p_1-p_2)}\ge\frac{1}{2}. \]
When $h_i^t=[h_i^{t-1},(D,d)]$, where $\mu_i(h_i^{t-1})\ge 1/2$, we have
\[ \mu_i(h_i^t)=\frac{(1-\mu_i(h_i^{t-1}))\,p_1+\mu_i(h_i^{t-1})(1-p_1-p_2)}{(1-\mu_i(h_i^{t-1}))(p_1+p_2)+\mu_i(h_i^{t-1})(1-p_1-p_2)}>\frac{1}{2}, \]
because $\mu_i(h_i^{t-1})\ge 1/2$ and $p_1+p_2\le 1/3$. When $h_i^t=[h_i^{t-1},(D,c)]$, where $\mu_i(h_i^{t-1})\ge 1/2$, we can proceed as in the case where $h_i^2=[(D,c)]$. Therefore, (B) is proved.
To prove (C), suppose $h_i^t=[(C,c),(C,c),\dots,(C,c)]$. The probability that $h_i^t$ occurs and that player $j$'s continuation strategy is $\sigma_C$ is
\[ (1-q(p,\delta))(1-2p_1-p_2)^{t-1}, \]
while the probability that $h_i^t$ occurs and that player $j$ switched to $\sigma_D$ in period $k$ ($1\le k\le t$) is
\[ (1-q(p,\delta))(1-2p_1-p_2)^{k-2}\,p_1\,(p_1+p_2)^{t-k} \quad\text{if } k\ge 2, \]
and
\[ q(p,\delta)(p_1+p_2)^{t-1} \quad\text{if } k=1. \]
Therefore, we have
\[
\begin{aligned}
\mu_i(h_i^t)
&=1-\frac{1-q(p,\delta)}{[1-q(p,\delta)]\left\{1+p_1\displaystyle\sum_{k=2}^{t}\frac{(p_1+p_2)^{t-k}}{(1-2p_1-p_2)^{t-k+1}}\right\}+q(p,\delta)\left(\dfrac{p_1+p_2}{1-2p_1-p_2}\right)^{t-1}}\\
&=1-\frac{1-q(p,\delta)}{[1-q(p,\delta)]\left\{1+p_1\displaystyle\sum_{k=1}^{t-1}\frac{(p_1+p_2)^{k-1}}{(1-2p_1-p_2)^{k}}\right\}+q(p,\delta)\left(\dfrac{p_1+p_2}{1-2p_1-p_2}\right)^{t-1}}\\
&\le 1-\frac{1-q(p,\delta)}{[1-q(p,\delta)]\left\{1+p_1\displaystyle\sum_{k=1}^{\infty}\frac{(p_1+p_2)^{k-1}}{(1-2p_1-p_2)^{k}}\right\}+q(p,\delta)\,\dfrac{p_1+p_2}{1-2p_1-p_2}}\\
&=s(p,\delta),
\end{aligned}
\]
because $p_1+p_2\le 1/3$ implies that $(p_1+p_2)/(1-2p_1-p_2)<1$. Given $h_i^t$, playing $D$ in period $t$ yields at most
\[ (1-\delta)[1-\mu_i(h_i^t)](1+g)+\delta[1-\mu_i(h_i^t)](p_1+p_2)(1+g), \]
because player $j$ playing $\sigma_C$ observes $d$ with probability $1-p_1-p_2$ in period $t$ and switches to $\sigma_D$. On the other hand, conforming to $\sigma_C$ yields
\[
\begin{aligned}
[1-\mu_i(h_i^t)]\,v_{CC}(p,\delta)+\mu_i(h_i^t)\,v_{CD}(p,\delta)
&>[1-\mu_i(h_i^t)]\left\{1+\frac{\delta}{1-\delta}(p_1+p_2)\right\}v_{DC}(p,\delta)\\
&>(1-\delta)[1-\mu_i(h_i^t)](1+g)+\delta[1-\mu_i(h_i^t)](p_1+p_2)(1+g),
\end{aligned}
\]
where the first inequality follows from (4). Therefore, it is not optimal to play $D$ in period $t$. This proves (C). Thus $(\sigma^*,\sigma^*)$ is a Nash equilibrium of $G(p,\delta)$ when $p_1+p_2\le\bar{p}$ and $\delta\in[\delta_0,\delta_1]$, and its payoffs are at least
\[ (1-q(p,\delta))\,v_{DC}(p,\delta)>1-\varepsilon, \]
by (1) and (2), which completes the proof.
Q.E.D.
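The bound $\mu_i(h_i^t)\le s(p,\delta)$ used in step (C) can also be checked numerically. The following Python sketch computes the belief after a history consisting only of $(C,c)$ from the event probabilities listed in the proof and compares it with $s$ for a range of $t$; the function names and the sample values of $q$, $p_1$, and $p_2$ are assumptions made for illustration, and the check is not a substitute for the argument above.

```python
def belief_after_clean_history(t, q, p1, p2):
    """mu_i(h_i^t) for h_i^t = [(C, c), ..., (C, c)], using the event weights in the proof."""
    x, y = p1 + p2, 1 - 2 * p1 - p2
    still_c = (1 - q) * y ** (t - 1)        # j still follows sigma_C
    switched = q * x ** (t - 1)             # k = 1: j started with sigma_D
    for k in range(2, t + 1):               # j switched to sigma_D in period k
        switched += (1 - q) * y ** (k - 2) * p1 * x ** (t - k)
    return switched / (still_c + switched)


def s_bound(q, p1, p2):
    """s(p, delta), written as a function of q = q(p, delta)."""
    denom = (1 - q) * (1 + p1 / (1 - 3 * p1 - 2 * p2)) + q * (p1 + p2) / (1 - 2 * p1 - p2)
    return 1 - (1 - q) / denom


if __name__ == "__main__":
    q, p1, p2 = 0.1, 0.01, 0.005            # assumed illustrative values
    s = s_bound(q, p1, p2)
    print(all(belief_after_clean_history(t, q, p1, p2) <= s for t in range(2, 60)))
```

For the sample values the belief stays below $s$ for every $t$ tested, consistent with replacing the finite sum by the infinite geometric series in the derivation.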
REFERENCES
1. D. Abreu, On the theory of infinitely repeated games with discounting, Econometrica 56 (1988), 383-396.
2. D. Abreu, D. Pearce, and E. Stacchetti, Toward a theory of discounted repeated games with imperfect monitoring, Econometrica 56 (1990), 1041-1064.
3. E. Ben-Porath and M. Kahneman, Communication in repeated games with private monitoring, J. Econ. Theory 70 (1996), 281-297.
4. V. Bhaskar, Repeated games with almost perfect monitoring by privately observed signals, 1994. [Mimeo]
5. O. Compte, Communication in repeated games with imperfect monitoring, 1994. [Mimeo]
6. O. Compte, Sustaining cooperation without public information, 1995. [Mimeo]
7. G. Ellison, Cooperation in the prisoner's dilemma with anonymous random matching, Rev. Econ. Stud. 61 (1994), 567-588.
8. D. Fudenberg and D. Levine, Subgame-perfect equilibria of finite- and infinite-horizon games, J. Econ. Theory 31 (1983), 251-268.
9. D. Fudenberg and D. Levine, An approximate folk theorem with imperfect private information, J. Econ. Theory 54 (1991), 26-47.
10. D. Fudenberg, D. Levine, and E. Maskin, The folk theorem with imperfect public information, Econometrica 62 (1994), 997-1039.
11. D. Fudenberg and E. Maskin, The folk theorem in repeated games with discounting or with incomplete information, Econometrica 54 (1986), 533-554.
12. M. Kandori, Cooperation in finitely repeated games with imperfect private information, 1991. [Mimeo]
13. M. Kandori and H. Matsushima, Private observation, communication, and collusion, Econometrica. [In press]
14. E. Lehrer, Lower equilibrium payoffs in repeated games with non-observable actions, Int. J. Game Theory 18 (1989), 57-89.
15. E. Lehrer, Nash equilibria of n-player repeated games with semi-standard information, Int. J. Game Theory 19 (1990), 191-217.
16. E. Lehrer, Two player repeated games with non-observable actions and observable payoffs, Math. Oper. Res. 17 (1992), 200-224.
17. H. Matsushima, On the theory of repeated games with private information. I: Anti-folk theorem without communication, Econ. Lett. 35 (1991), 253-256.
18. R. Radner, Repeated partnership games with imperfect monitoring and no discounting, Rev. Econ. Stud. 53 (1986), 43-58.
19. T. Sekiguchi, Efficiency in repeated prisoner's dilemma with private monitoring, 1996. [Mimeo]
20. G. Stigler, A theory of oligopoly, J. Polit. Econ. 72 (1964), 44-61.