Unique and Optimal Perfect Equilibrium

Unique and Optimal Perfect Equilibrium
Prajit K. Dutta
Paolo Siconol…
Columbia University
January 8, 2016
Abstract
This paper proves a First Welfare Theorem for Games - it shows that
asynchronous dynamic games with voluntary one period ahead transfers
have a unique optimal equilibrium. The equilibrium coincides with the
Utilitarian Pareto Optimum and hence can be computed from a (simpler)
programming problem (rather than as a …xed point). Whilst it is commonly thought that Folk Theorems are endemic in dynamic games, this
paper shows that two simple changes in the standard model give us the
other extreme - a unique optimal equilibrium. More broadly, this paper is
a contribution towards uniting Mechanism Design and Repeated Games.
1
Introduction
This paper proves a First Welfare Theorem for Games - it shows that asynchronous dynamic games with voluntary one period ahead transfers have a unique
optimal equilibrium. Whilst it is commonly thought that Folk Theorems are
endemic in dynamic games, this paper shows that two simple changes in the
standard model give us the other extreme - a unique optimal equilibrium. Put
another way, Folk Theorems rely critically on two assumptions - simultaneous
moves and no transfers.
Yet neither assumption is sacrosanct. An assumption of simultaneous moves
may be no more "natural" than one of alternating moves. In an oligopoly
context, say, it may be more unnatural to assume that …rms literally move at
the same time or that …rms’moves do not become known to their competitors
before the latter has to move. (See, for example, Maskin-Tirole (1988a, b).)
Similarly, in many applications, a player can promise a conditional transfer to
another player. This would be true between countries (foreign aid, investment,
trade - see, for example, Klimenko, Ramey, Watson (.)), between …rms (licensing
or franchising fees - see, for example, Harrington-Skrypcaz (.)), between family
members (gifts, alimony), etc.
Forcing moves to be simultaneous introduces an additional "coordination
problem" (beyond that present in the payo¤s). For example, multiple equilibria
arise in, say, the Common Interest game only if players are forced to move
1
simultaneously. Allowing players to alternate moves might reduce the number
of equilibria; this is a …rst insight we explore. A second insight is that transfers
can be a more e¢ cient way to provide intertemporal incentives compared to the
Abreu-Pearce-Stachetti method of variations in continuation payo¤s (which is
what is used in Repeated Games in the absence of transfers).
When we began research our hope was that, given alternating moves, we
should have fewer equilibria and, given transfers, we should have more e¢ cient
equilibria. Our eventual result is that indeed there are fewer equilbria - being
one! - and it is fully e¢ cient.
Just having one of those two features is not enough - there is a Folk Theorem
when moves are alternating but there are no transfers. Such a game is an
example of a stochastic game and Dutta (1995) applies. There is also a Folk
Theorem when transfers are allowed but players move simultaneously. For
instance, see Dutta and Siconol… (2016) or Goodlucke-Krantz (2012). Hence,
we …nd minimally necessary conditions for our result.
More broadly, we view this paper as part of a research agenda which marries Mechanism Design to the theory of Repeated Games. Both …elds have
been hugely in‡uential and yet they have kept their distance from each other.
Mechanism Design rarely ventures into dynamic considerations. For its part,
Repeated Games takes the basic model as given and analyzes equilibrium possibilities within that model. What this paper suggests is that there is a healthy
and productive research agenda driven by the following question: is there way
to set up the dynamic model such that all equilibria have desirable properties?
Before describing our model and result, it might be useful to review the reach
and logic of Folk Theorems in Repeated Games, clearly the most celebrated result in that …eld. Folk Theorems show up in the canonical complete information
model of in…nitely and …nitely repeated games (respectively, Fudenberg-Tirole
(1986) and Benoit-Krishna (1985)) as also in variants of the canonical model
- in the more general set-up of stochastic games, (Dutta (1995), FudenbergYamamoto (2013)), and with overlapping generations of players (Smith (1997)).
They also show up in informational variants - with imperfect public monitoring
(Fudenberg-Levine-Maskin (1994)), and even with private monitoring (HornerOlszewski (2006, 2009) and Sugaya (2012, 2013)).1
The driving force behind the Folk Theorem is the logic of reciprocity - that
"players will do unto others what they expect others to do unto them". That
logic delivers the result; every individually rational payo¤ is also an equilibrium
payo¤ (for su¢ ciently patient players) or, colloquially, "anything can happen"
in equilibrium. This conclusion is disappointing since it robs the theory of
Repeated Games of any predictive power.
The model that we study in this paper is a variant on a repeated game. As
in the latter, there is a …xed stage game that, say, two players play over and
over again (over a …nite or in…nite horizon). In each period, the players get
1 The literature on Folk Theorems is truly vast.
Sugaya (2013) is a fairly up-to-date
reference on informational variants (and contains about …fty citations). Pesci and Wiseman
(2015) is a good reference for models that keep the assumption of (possibly imperfect) public
monitoring but allow state variables (and this paper has another …fty citations).
2
payo¤s and lifetime payo¤s are simply the (discounted) sum of period payo¤s.
So far this is as in a repeated game.
The departures are two-fold. First, players take turns moving. So, at any
period t, the payo¤s are derived from the action taken by the mover in period t
as well as the (…xed) action taken by his opponent in t 1. Second, the mover
also picks a non-negative transfer that she pays her opponent in the next period
depending on her opponent’s action.
To …x ideas, consider a stage game such as the Prisoners Dilemma:
1n2
conf ess
not conf ess
conf ess
0; 0
b; c
not conf ess
c; b
d; d
with conf ess as the dominant strategy and not conf ess; not conf ess the
Pareto dominant tuple (c > d > 0 > b, 2d > max(b + c; 0)).
To understand period t payo¤s, suppose the t 1 period action was not
conf ess and the promised transfer schedule was (0; ) implying a payment > 0
would be made only if the t period mover also picked not conf ess.
_ Then, if he
did indeed do so, his payo¤ would be d + while the non-mover would get d
in that period. Alternatively, if he picked conf ess then his payo¤ would be c
while hers would be b.
As is well-known, in the standard repeated game model with simultaneous
moves and no transfers, "conf ess, conf ess always and after all histories" is
a subgame perfect equilibrium (SPE) strategy (with 0,0 payo¤s, the minmax
payo¤s). In turn, via the well-known grim trigger strategy, that SPE forms
the punishment norm that supports any strictly individually rational (positive)
payo¤ as a SPE payo¤. This is the Folk Theorem construction for the (in…nitely)
repeated Prisoners Dilemma game.
Let us ask whether in our model "conf ess, conf ess always and after all
histories with no transfers" is a SPE? Without loss, we check for the viability
of one-shot deviations. Suppose, again, that the …xed t 1 period action was
not conf ess and the promised transfer schedule was (0; ). It is easy to see
that the transfer needs to be large enough that the t period mover is (at least)
indi¤erent between playing conf ess and not conf ess, i.e., with as the discount
factor,
d+ + b=c
where the LHS is the lifetime payo¤ under this strategy from playing not
conf ess in period t: (d + ) in that period plus the payo¤ in period t + 1 (b)
when the then mover will switch and play conf ess, followed by 0 payo¤s every
period thereafter. The RHS is the lifetime payo¤ under this strategy from
playing conf ess in period t; c followed by 0 every period after.
Should the period t 1 mover o¤er this transfer? She will if her lifetime
3
payo¤ starting in period t is higher after paying this transfer, i.e. if,2
d
+ c>b
which, after substituting the value of , simpli…es and strengthens to
2d
(1
)(b + c) > 0
and that holds for all under the Pareto optimality hypothesis that 2d >
max(b + c; 0). In other words, the "bad equilibrium" of the standard model is
not an equilibrium here. Whereas, in the standard model, one of the player
switching to not conf ess is simply treated as a mistake here its logical implication is that she will thereafter provide a transfer that has the other player
switch as well (at least for a period).
Of course, we still have a long way to go. Knowing that the (bad) always
conf ess strategy is not an SPE simply rules out 0 as a SPE payo¤. We do not
yet know what positive payo¤s can arise in equilibrium. What we will show as
our main result is that there is a unique SPE and it coincides with the
utilitarian Pareto optimum solution (UPO), i.e, the sum of SPE payo¤s
(across players) must equal the maximum sum of player payo¤s. One can intuit
that result by the following re-write of the last inequality:
2d + (b + c) > b + c
Notice that, once the t 1 period mover has played not conf ess, the LHS
is the sum of player payo¤s to a "one-shot" deviation from the always conf ess
strategy while the RHS is the sum of payo¤s from continuing with that strategy.
Evidently, maximizing the lifetime sum of payo¤s is a Dynamic Programming
problem. By a standard "Unimprovability" result in Dynamic Programming,
a strategy is optimal i¤ it cannot be improved by a one-shot deviation. Since
the always conf ess strategy can be improved, it is not UPO. Furthermore, the
same deviation proves that the strategy is neither an SPE under transfers nor
is it UPO which hints at a link between the two solutions.
In what follows, we generalize all this in two signi…cant ways: we show
that any strategy that is not UPO is subject to a pro…table unilateral one-shot
transfer deviation, i.e., is not an SPE. This proof requires a generalized version
of the Abreu-Pearce-Stachetti (1986) construction of SPE in repeated games.
We further show all that in the context of any two-player stage game (and not
just the Prisoners Dilemma).
In this paper it is assumed that one-period ahead moves are contractible - or,
equivalently, can be committed to. For actions, that assumption is embedded
in the timing of alternating moves; a player is not allowed to change her action
in the next period. Similarly, when she picks a transfer schedule it is assumed
2 Note that these are the lifetime payo¤s starting in period t, for the mover in period t
1.
Hence, one way of thinking about the timing is that she has already decided to play not
conf ess in period t 1 and is asking herself what transfer should she o¤er for the subsequent
period t.
4
that she is committed to make the appropriate payment in the next period.
This is a natural assumption in many contexts; for example, when money can
be placed in an escrow account. Or when a country delivers foreign aid into
the World Bank’s CDMA account (to encourage adoption of clean technology).
Or when …rms agree to pay licensing fees. Note that the commitment is only
for one period. Yet it implies an in…nite period optimality. In Section 6 we
discuss what happens when this assumption is not made.
The paper is organized as follows. In Section 2 we detail the model and
in Section 3 prove the result for the in…nite horizon case. Section 4 proves
the analogous result for the …nite horizon. Section 5 contains an example that
illustrates some issues that arise in the presence of in…nite transfer histories. In
Section 6 we discuss extensions and generalizations while Section 7 concludes.
2
Model
In this section, we present a general model. Let G denote a two player stage
game (in strategic form). (In Section 6 we extend the analysis to N players.)
Denote player i’s strategy set Ai and her payo¤ function i , where, as usual,
A1 A2 is the set of strategy tuples for the players.
i : A ! R, and A
Suppose that Ai is …nite for every i.3 Let i denote the generic player and let j
denote the "other" player, j 6= i.
The timing structure is that of alternating moves with players moving
sequentially. Hence, player i gets to move in period t followed by player j in
period t+1 and so on. Whenever it is player i0 s turn to move, she can choose an
action ait from the set of feasible actions, Ai - and this action then remains …xed
till the next time i can change her action. Payo¤s are ongoing. If aj denotes
the (…xed) action of the other player, then the period t stage game payo¤s are
4
We
i (ait ; aj ); j (ait ; aj ). There is an initial action state a for the game.
will consider all possible initial action states, i.e., all possible …xed actions and
both possible …rst movers.
Additionally, suppose that a player can also make a conditional transfer to
the other player - conditional in that it can be tied to the next period action
of the other player. For example, if in the previous period the then mover
(player j) had played aj and promised the current mover that she would be
paid according to the schedule, j (ai jaj ) then the total payo¤s of player i inclusive of transfers will be i (ait ; aj ) + j (ai jaj ) while that of player j would be
j (ait ; aj )
j (ai jaj ).
One interpretation of these transfers is that they are escrow commitments; if
player j promises according to the transfer schedule j (:jaj ) then an amount to
cover the commitment is posted to an escrow account from which funds equal to
0
0
j (ai jaj ) are paid out upon play of the action ai . How much information about
the transfer schedules and the actual transfers paid out is publicly known will
3 The
analysis extends to the case of Ai compact.
say, player i is the mover in the initial period 0 , then the two players’s payo¤s in that
initial period depend on the pair of actions (ai0 ; a).
4 If,
5
depend on the escrow mechanism. We make a couple of di¤erent assumptions
- detailed below - on how much is publicly known and publicly veri…able of past
escrow commitments and actual transfers.
The horizon can be in…nite or …nite. In the latter case, lifetime payo¤s
will be the undiscounted sum. In the discounted case, the lifetime payo¤s will
be evaluated according to the discounted average; player i0 s evaluation of the
payo¤ will be given by
P1
(1
) t=0 t f[ i (ait ; ajt 1 ) + j (ai jajt 1 ) k i mover]+
(1)
+[ i (ait 1 ; ajt )
i (aj jait 1 ) k j mover]g
2.1
Equilibrium
Let us start with the assumption on what we take to be publicly observable/veri…able for all parties:
Informational Assumption - 1. All stage game actions are observable. Hence,
an action history hat at time t includes all the past actions - the actual history
of play of the stage game, hat = (a; aj0 ; ai1 ; :::ajt 1 ).
2. Transfer/escrow commitments going back M periods are observable, where
1 6 M < 1. Hence, a transfer history ht at time t includes M past escrow
commitments - ht = ( jt M (:); it M +1 (:); ::: jt 1 (:)).
Note that the transfer history includes information about the entire transfer
schedule and not just the actual transfer that was made in a previous period.
Any results that we prove will continue to hold if we replaced that requirement
with one that said only the actual transfers are observable. Of course, the
entire schedule needs to be known for the immediately preceding period, i.e.,
for jt 1 (:), since the t th period action choice of player i will depend on that
(payo¤-relevant) schedule. Put another way, M must be at least > 1. The
restriction that in…nitely long transfer commitments are not publicly veri…able
seems plausible. Typically, banks or other …nancial institutions, to which such
an escrow commitment is posted, wipe out their records when a period of time
has elapsed. We will have some results on the case M = 1 as well.
To summarize, a history ht at time t is given by ht = (hat ; ht ). Our main
theorem will apply to equilibria that condition on all histories ht . (The buildup to that result will be one that allows only the currently outstanding transfer
commitments to be observable, i.e., that preliminary result will assume that
M = 1.)
Note also the following about the payo¤ -relevant history. For the t th
period action choice ait , the payo¤ relevant parts of the history are only the
preceding period’s choices: ajt 1 ; jt 1 (:), the …xed action of the other player j
and the transfer schedule that j has picked. For the t th period transfer choice
it (:), which is only paid in period t + 1, there is no payo¤ relevant part of the
history ht since the t + 1 period payo¤s will depend on ait and ajt+1 neither of
which are in ht . (The t th period payo¤s will also, of course, depend on the yet
to be made choice of it (:)). We will exploit this di¤erence in payo¤-relevance
between ait and jt 1 in the analysis that follows.
6
Let us though formally de…ne strategies and equilibria …rst.
A t th period strategy it for player i is an action choice ait that maps from
a history ht and a transfer choice it (:) that maps from (ht ; ait ). A strategy for
player i in the game, i , is a speci…cation of a strategy it for that player in
every period that she is the mover. A strategy vector - one strategy for every
player - de…nes in the usual way a (possibly probabilistic) history ht . Denote
the lifetime payo¤ of the mover at time t, vi :
vi (ht ) = maxf(1
)[ i (ai ; ajt
ai
1)
+
jt 1 (ai )]
+ wi (ht ; ai )g
(2)
where wi is the continuation for player i that follows from the optimal choice
of a transfer schedule it , given ait , and anticipating the "Stackelberg follower"
player j 0 s best response ajt+1 to that transfer, i.e.,
wi (ht ; ait ) = maxf(1
)[ i (ait ; aj ( i ))
i (:)
i (aj )]
+ vi (ht+2 (ait ;
it ))g
(3)
where ht+2 (ait ; it ) is the history at period t + 2 caused by player i0 s period t actions. Hence, ht+2 (ait ; it ) = (hat ; ait ; ajt+1 ( it ); jt M +2 (:); ::: it (:),
jt+1 (:)), the entire action history and the last M periods of transfer schedules.
In that history, the follower player j 0 s best response action in period t + 1,
aj ( it ), comes from the t + 1 period analog of Eq. 2:5
vj (ht+1 (ait ;
it ))
=
(1
)[
=
maxf(1
j (ait ; aj ( it ))
aj
)[
and his optimal transfer schedule
problem
wj (ht+1 ; aj ) = maxf(1
j (:)
)[
+
j (ait ; aj )
jt+1
it (aj ( it ))]
+
it (aj )]
+ wj (ht+1 ; aj ( it(4)
))
+ wj (ht+1 ; aj )g
must be a solution to the optimization
j (ai ( j )); aj )
j (ai )]
+ vj (ht+3 ))g
A Subgame Perfect Equilibria (SPE) is a pair of strategies that are best
responses to each other.
Markov Strategies are choices that depend only on the payo¤-relevant histories. A t th period Markov strategy for player i is hence an action choice ait
that maps from ajt 1 ; jt 1 (:) and a constant function i (:). A Markov Perfect
Equilibria (MPE) is a pair of Markov strategies that are best responses to each
other. As is well-known, MPE are also SPE.
3
Main Result: Equilibrium is Unique and Optimal
Let us start with a benchmark - the Utilitarian Pareto Optimum:
5 Note
that ht+1 (ait ;
it )
= (ha
t ; ait ;
jt M +1 (:);...
7
it (:)).
De…nition 1 The Utilitarian Pareto optimum (UPO) problem is the maximization of the discounted sum of players payo¤ s subject to an initial condition specifying a …xed action of the …rst period non-mover:
max
fait ;ajt gt
s:t: ai0
(1
)
0
= ai , ait = ait
1
X
t
[ i (ait ; ajt ) +
j (ait ; ajt )]
(5)
t=0
1,
t > 1; t odd, and ajt = ajt
1,
t > 0; t even
It is well known that the solution is Markovian, i.e., there is b i (aj ) - respectively, b j (ai ) - such that the Pareto optimal solution is for player i to play
according to b i (aj ) - respectively for player j to play b j (ai ) - whenever it is her
turn to move.
The main theorem that ties equilibrium behavior to the UPO is:
Theorem 2 There is a unique SPE to the game. It coincides with the Pareto
optimum solution in terms of actions - the equilibrium is for player i to play
according to b i (aj ) - respectively for player j to play b j (ai ) - whenever it is her
turn to move. The transfers that are required are also unique.
Since the UPO is a Markov strategy, it follows from the theorem that not
only is there a unique SPE but it is, in fact, a MPE.
3.1
The Proof When Only the Last Transfer is Observed
It is easier to break the proof into two parts, …rst when M = 1 and then the
general case. The proof will proceed by way of a modi…ed Bellman-AbreuPearce-Stachetti (BAPS) argument. There will be a set of steps.
3.1.1
De…nition of Two Operators and an Equivalence
Let Vi and Wj be correspondences with domain Aj (and range R). These
are to be thought of as potential SPE continuation payo¤s starting at a period
when player i is the mover, and aj is …xed, with continuation payo¤ vi (:) and
wj (:) respectively for players i and j (and the arguments in the continuations
could be all or part of the history of play ht ). Respectively, let Vj and Wi be
correspondences with domain Ai (and range R) that are to be thought of as
potential SPE payo¤s starting at a period when player j is the mover.
De…ne the following operator that determines player i0 s decision on transfer
8
i (aj jai )
if she were to, simultaneously, play the action ai .
(Vi ; Wj )(ai )
=
aj
2
vj
=
(1
8aj ; i (:); aj
2
arg maxf(1
wi
=
(1
0
0
0
>
fwi ; vj : 9aj ;
i (aj
arg maxf(1
)[
aj
)[
j (ai ; aj )
)[
aj
:
j ai ); vi (aj j ai ); wj (aj j ai ) 2 Vi ; Wj (6)
j (ai ; aj ) +
+
i (aj
0
0
)[ i (ai ; aj )
0
j ai )] + wj (aj j ai )g
j ai )] + wj (aj j ai )
j (ai ; aj )
)[ i (ai ; aj )
(1
i (aj
i (aj
+
0
i (aj )]
+ wj (aj j ai )g
j ai )] + vi (aj j ai g
i (aj )]
0
+ vi (aj j ai )g
Put another way, the operator speci…es equilibrium payo¤s to a family
of "one-shot" Stackelberg games with player i as the …rst mover choosing a
transfer schedule i (:jai ) and player j as the follower picking action aj . The
payo¤s in these one-shot games are de…ned by two parts - a) a stage game payo¤
augmented by the transfer determined by the schedule (respectively, i (ai ; aj )
i (aj ) and j (ai ; aj ) + i (aj )), and b) a "termination" payo¤ that needs to be
drawn from the given correspondences (Vi ; Wj ). Subject to that feasibility
constraint, these termination payo¤s can depend on the action chosen by player
j, aj (and hence are written, respectively, vi (: j ai ); wj (: j ai ).6
Consider a related - and seemingly simpler - operator T de…ned on the same
domain as :
T (Vi ; Wj )(ai )
=
b
aj
2
fwi ; vj : 9b
aj ; vi (aj j ai ); wj (aj j ai ) 2 Vi ; Wj (aj ) :
arg maxf(1
aj
vj
=
(1
)
wi
=
maxf(1
s.t. vj
=
(1
i ;aj
)[
)
aj )
j (ai ; b
j (ai ; aj )
+ wj (aj j ai )g
+ wj (b
aj j ai )
)[ i (ai ; aj )
j (ai ; aj )
+
(7)
i (aj )]
i (aj )]
+ vi (aj j ai )g
+ wj (aj j ai ), aj 2 Aj
As with the operator ; the simpler operator T (Vi ; Wj ) also de…nes "continuation payo¤s" starting at a period when player i is the mover. Note that in T ,
the transfer schedule i (:) is implicitly de…ned given the continuation selections
vi (: j :); wj (: j :) whereas in the transfer is seemingly determined alongside the
continuations. As with all Bellman and APS constructions, the idea of these
operators is to reduce the in…nite horizon problem to a sequence of one period
problems and use the Unimprovability Principle to show an equivalence between
the solution to the one period problem and the (original) in…nite horizon one.
We …rst show that two operators are equivalent:
Lemma 3 Given correspondences, Vi and Wj , (Vi ; Wj )(ai ) = T (Vi ; Wj )(ai ),
for all ai . Similarly, (Vj ; Wi )(aj ) = T (Vj ; Wi )(aj ), for all aj .
6 In the usual way, the entire action history ha gets incorporated since the termination
t
payo¤s can be drawn di¤erently for every ha
t.
9
Proof. We …rst show that T (Vi ; Wj )(ai ) v (Vi ; Wj )(ai ). Given wi ; vj 2
T (Vi ; Wj )(ai ), we have continuations vi (: j :); wj (: j :) and an associated "no
transfer" most preferred action b
aj that satisfy Eq. 7. From the last line of that
equation, we can derive the transfer schedule i (: j :) as:
(1
) i (aj j ai ) = (1
)[
aj )
j (ai ; b
j (ai ; aj )]
+ [wj (b
aj j ai )
wj (aj j ai )]
Note that it follows that i (b
aj j ai ) = 0. It further follows from the last line
of Eq. 7 that all actions aj yield the same "one-shot" payo¤ (1
)[ j (ai ; aj )
+ i (aj )] + wj (aj j ai ) to player j. De…ne aj as
aj 2 arg maxf(1
)[ i (ai ; aj )
aj
0
Consider an alternative transfer schedule
0
suppose that aj 2 arg maxaj f(1
particular, then
(1
)[
0
j (ai ; aj )
+
0
0
i (aj )]
)[
0
+ wj (aj
i (aj )]
i (:)
j (ai ; aj )
j
(8)
0
and a best action choice aj , i.e.,
+
ai ) > (1
> (1
=
+ vi (aj j ai )g
)[
(1
)[
0
i (aj )]
)[
+ wj (aj j ai )g.
aj )
j (ai ; b
aj )
j (ai ; b
0
j (ai ; aj )
+
+
+
aj )]
i (b
0
i (aj )]
0
aj )]
i (b
In
+ wj (b
aj j ai )
+ wj (b
aj j ai )
0
+ wj (aj j ai )
Proof. the second-last inequality following from the fact that i (b
aj j ai ) = 0 (as
established above) and the last equality from the fact that player j is indi¤ erent
across actions for the transfer schedule i . Looking at the two outer terms in
0
0
0
the full inequality we can then conclude that i (aj ) > i (aj ). From Eq. 8 above
we know that
wi
=
(1
)[ i (ai ; aj )
i (aj )]
0
0
>
(1
)[ i (ai ; aj )
>
(1
)[ i (ai ; aj )
i (aj )]
0
0
0
i (aj )]
0
+ vi (aj j ai )
0
+ vi (aj j ai )
0
+ vi (aj j ai )
0
0
and the last inequality follows from i (aj ) > i (aj ). Hence, all the requirements of Eq. 6 have been satis…ed, i.e., we have established that wi ; vj 2
T (Vi ; Wj )(ai ) imply that wi ; vj 2 (Vi ; Wj )(ai ).
For the opposite inclusion, suppose that wi ; vj 2 (Vi ; Wj )(ai ). Given
wi ; vj 2 (Vi ; Wj )(ai ), we know that we have continuations vi (: j :); wj (: j :)
and we can de…ne an associated "no transfer" most preferred action b
aj that
satis…es Eq. 7. Let us …rst show that i (b
aj j ai ) = 0. Suppose instead that
aj j ai ) = > 0. We are also given a transfer inclusive most preferred
i (b
action aj . It follows that i (b
aj j ai ) > . Consider an alternative transfer
schedule
0
i (:)
that reduces the transfers at b
aj and aj by
10
and reduces transfers
at other actions such that the lifetime payo¤ s everywhere is the same as at b
aj ,
i.e.,
(1
0
) i (aj j ai ) = (1
)[
aj )
j (ai ; b
j (ai ; aj )]+
[wj (b
aj j ai ) wj (aj j ai )] (9)
It is not di¢ cult to see that in this transfer scheme aj continues to be the
most preferred action and yet it costs the giver, player i, strictly less. Hence the
contradiction proves that i (b
aj j ai ) = 0. The same argument also establishes
that the transfer at aj must be such that player j is just indi¤ erent between aj
and b
aj . Hence the …rst two lines of Eq. 7 have been established. That aj must
be derived from a maximization problem follows, for example, by considering a
transfer scheme that satis…es Eq. 9. The fact that, from Eq. 6 we know that
0
0
0
wi = (1
)[ i (ai ; aj )
)[ i (ai ; aj )
i (aj j ai )] + vi (aj j ai ) > (1
i (aj )]
0
+ vi (aj j ai ) is just a re-statement of the penultimate equation in Eq. 7. The
lemma is proved.
The operator T above can be simpli…ed further. It is clear from the de…nition
of vj above that there is an equivalent way of writing the transfer constraint the last line in Eq. 3:
(1
) i (aj ) = (1
)[
aj )
j (ai ; b
j (ai ; aj )]
+ [wj (b
aj )
wj (aj )]
Substituting that into the de…nition of wi , the second-last line in Eq. 3, we
have
wi = maxf(1
)[ i (ai ; aj ) +
aj
j (ai ; aj )]
+ [vi (aj ) + wj (aj )]g
vj
This allows a more compact de…nition of the operator T with the transfer
terms completely removed:
T (Vi ; Wj )(ai )
=
b
aj
2
fwi ; vj : 9b
aj ; vi (aj ); wj (aj ) 2 Vi ; Wj (aj ) :
arg maxf(1
aj
vj
=
(1
wi
=
maxf(1
aj
)
)
aj )
j (ai ; b
j (ai ; aj )
(10)
+ wj (aj )g
+ wj (b
aj )
)[ i (ai ; aj ) +
j (ai ; aj )]
+ [vi (aj ) + wj (aj )]g
vj
Similarly, de…ne the operator T (Vj ; Wi )(aj ) by transposing the indices i and
j. We will now establish some properties of the operator T .
Lemma 4 If Vi , Wj , Vj , Wi are non-empty compact-valued correspondences
then so are T (Vi ; Wj ) and T (Vj ; Wi ). Furthermore, they are monotone, i.e.,
0
0
0
0
T (Vi ; Wj ) T (Vi ; Wj ) if (Vi ; Wj ) (Vi ; Wj ). Similarly for T (Vj ; Wi ).
Proof: Follows from the de…nitions.
11
3.1.2
Fixed Point of Operator and a Uniqueness Property
Let a sequence of correspondences be de…ned as follows:
De…nition 5 Consider the set of all feasible payo¤ s in the stage game G inclusive of transfers. Call that set of pairs, one payo¤ for each player, F . De…ne
Vi0 ; Wj0 = F = Vj0 ; Wi0
Then, inductively de…ne
Vin+1 ; Wjn+1
= T (Vjn ; Win )
Vjn+1 ; Win+1
T (Vin ; Wjn )
=
(11)
We now show, by a standard argument that uses the lemma above, that the
sequence has a …xed point:
Lemma 6 The pairs Vjn ; Win and Vin ; Wjn converge to a non-empty compactvalued correspondence as n ! 1. Call the limit correspondence Vj1 ; Wi1 and
Vi1 ; Wj1 .
Proof. From the fact that the operator preserves compact-valuedness and nonemptiness it follows that each correspondence has that property along the sequence. From the monotonicity property of the operator it follows that Vjn ; Win (aj )
and Vin ; Wjn (ai ) are monotonically decreasing non-empty compact sets at each
ai and aj and hence that the limit correspondence Vj1 ; Wi1 and Vi1 ; Wj1 are
non-empty compact-valued correspondences.
The next Lemma shows that Vj1 ; Wi1 (ai ) and Vi1 ; Wj1 (aj ) are singlentons,
for all (ai ; aj ).
Lemma 7 The correspondences V21 ; W11 and V11 ; W21 are single valued for
all a1 2 A1 and a2 2 A2 .
Proof By the de…nition of the operator T and of the correspondences
Vj1 ; Wi1 , i 6= j, i = 1; 2, a vector
(v20 ; w10 ; v10 ; w20 )
f[v20 (a1 ); w10 (a1 )]a1 2A1 ; [v10 (a2 ); w20 (a2 )]a2 2A2 g
is an element of the product correspondence
1
1
a1 2A1 V2 ; W1 (a1 )
f
1
1
a22A2 V1 ; W2 (a2 )g
if there exists a sequence
fv2t ; w1t ; v1t ; w2t gt
f[v2t (a1 ); w1t (a1 )]a1 2A1 ; [v1t (a2 ); w2t (a2 )]a2 2A2 gt
0
0
with (v20 ; w10 ; v10 ; w20 ) = (v20 ; w10 ; v10 ; w20 ) such that vit (aj ); wjt (aj ) 2 (Vi1 ; Wj1 )(aj ),
for all t and aj and i and such that the following system of equations holds true
for all i = 1; 2 and i 6= j:
vit (aj ) = maxf(1
ai
) i (ai ; aj ) + wit+1 (ai )g, all aj
12
(12)
(wit +vjt )(ai ) = maxf(1
aj
)[ i (ai ; aj )+
j (ai ; aj )]+
(wit+1 +vjt+1 )(aj )g, all ai
The last equations imply that (wit +vjt )(ai ) is the value of the Pareto problem 5
at ai which we denote by r(ai ). Therefore, the sequences f[v1t (a1 ); w2t (a1 )]a1 2A1 ; [v2t (a2 ); w1t (a2 )]a2 2A2 gt
must be a solution to equations 12 and to
vit (aj ) + wjt (sj ) = r(s2 ), all aj and i.
(13)
We rewrite equations 12 and 13 in a more convenient form. Solving equations
13 for w2t (a2 ) when i = 1 and for v2t (a1 ) when i = 2 and them substitute those
values into equations 12 when i = 2 obtaining:
r(a1 )
w1t (a1 ) = maxf(1
)
a2
2 (a1 ; a2 )
+ [r(a2 )
v1t+1 (a1 )g, all a1 ,
or equivalently
wit (ai ) = minfr(ai ; aj ) + vit+1 (aj )g, all ai .
aj
(14)
for r(a1 ; a2 ) = r(a1 ) (1 ) 2 (a1 ; a2 ) r(a2 ).Therefore in order to prove that
the correspondences V21 ; W11 and V11 ; W21 are single valued, it is su¢ cient to
show that there exists a unique and time invariant sequence fv1t ; w1t gt 0 , with
(v1t (a1 ); v1t (a2 )) 2 V11 (a2 ) W11 (a1 ), for all t and (a1 ; a2 ) solving equations
12 and 14. This is shown in the next claim.
Claim 8 : The solution to equations 12 and 14 is unique and time invariant.
Proof First we show that fv1t ; w1t gt 0 is unique. Then that uniqueness
implies time invariance.
(Proof 1 - Contraction): De…ne the following two operators:
U1 w(aj ) = maxf(1
ai
) i (ai ; aj ) + w(ai )g
(15)
and
U2 v(ai ) = minfr(ai ; aj ) + v(aj )g
aj
(16)
wherer w : Ai ! R and v : Aj ! R (and hence U1 w : Aj ! R while
U2 v : Ai ! R). We shall show that both operators are contractions and hence
have a unique …xed point - and that will prove the claim. The proof will follow
Blackwell (1955) and is repeated here only because the minimization in Eq. 16
may appear non-standard to some readers who are familiar with Blackwell’s
proof. It is straightforward to see that
U1 (w + k)(aj ) = U1 w(aj ) + k; U2 (v + k)(ai ) = U2 v(ai ) + k
(17)
for any constant k. Equally clearly, if w0 > w then U1 w0 > U1 w while
if v 0 6 v then U2 v 0 6 U2 v. For any two functions w and w0 , denote the
supnorm k w w0 k = supai j w(ai ) w0 (ai ) j (and likewise k v v 0 k = supaj
13
0
j v(aj ) v 0 (aj ) j. For any two pairs of functions (w; v) and (w0 ; v 0 ) denote the
supnorm as: k (v; w) (v 0 ; w0 ) k= maxfk w w0 k, k v v 0 kg.
We will now show that
k (U2 v; U1 w)
(U2 v 0 ; U1 w0 ) k6
(v 0 ; w0 ) k
k (v; w)
(18)
Since it is always the case that w0 + k w w0 k> w (and similarly, w+ k
w w0 k> w0 ), from the monotonicity of the operator U1 and Eq. 17 it follows
that
k w w0 k>j U1 w(aj ) U1 w0 (aj ) j 8aj
and hence, k w w0 k>k U1 w
case that v 0 k v v 0 k6 v (and v
operator U2 implies that
U1 w0 k. Similarly, since it is always the
k v v 0 k6 v 0 ), the monotonicity of the
v 0 k>j U2 v(ai )
kv
U2 v 0 (ai ) j 8ai
and hence, k v v 0 k>k U2 v U2 v 0 k. The last two norm inequalities
clearly prove Eq. 18. Hence, the operators are contractions and consequently
have a unique …xed point. The claim is proved.
m
m
(Proof 2) By contradiction let fv1t
; w1t
gt 0 , m = 1; 2, be two distinct
m
m0
sequences solving equations 12 and 14. Therefore, the sequence fv1t
v1t
gt 0 ,
for m 6= m0 , solves
0
m
v1t
(a2 )
m
v1t
(a2 )
=
m
) (a1 ; a2 ) + w1t+1
(a1 )g
maxf(1
a1
0
m
) (a1 ; a2 ) + w1t+1
(a1 )gg, all a2
maxf(1
a1
Since for any two functions f and g, it is maxx f (x)
g(x), it is:
0
m
v1t
(a2 )
m
v1t
(a2 )
m
fw1t+1
(a1 )
m
where a1 2 argfmaxa1 fw1t+1
(a1 )
ities 19 can be rewritten as:
0
m
m
v1t
(a2 ) v1t
(a2 )
maxx g(x)
maxx f (x)
0
m
w1t+1
(a1 )g, all a2 ,
(19)
0
m
w1t+1
(a1 )g. Then, by equation 14, inequal-
0
m
m
fmin
fr(a1 ; a2 )+ v1t+2
(a02 )g min
fr(a1 ; a02 )+ v1t+2
(a02 )gg, all a2 ,
0
0
a2
a2
0
m
0
0
or, for am
2;t+2 2 arg mina02 fr(a1 ; a2 ) + v1t+2 (a2 )gg, m = m ; m , as
0
m
m
v1t
(a2 ) v1t
(a2 )
0
m
m
As by de…nition, r(a1 ; am
2;t+2 ) + vt+2 (a2;t+2 )
last inequalities imply the following:
m
v1t
(a2 )
0
0
m
m
m
m
m
fr(a1 ; am
2;t+2 )+ v1t+2 (a2;t+2 ) r(a1 ; a2;t+2 )+ v1t+2 (a2;t+m )g, all a2
0
m
v1t
(a2 )
2
0
m
fv1t+2
(am
2;t+2 )
14
0
0
m
m
r(a1 ; am
2;t+2 ) + vt+2 (a2;t+2 ), the
0
0
m
v1t+2
(am
2;t+2 )g, all a2 .
Then arguing recursively, for all a2 :
m
v1t
(a2 )
0
m
v1t
(a2 )
2
0
m
fv1t+2
(am
2;t+2 )
lim
n!1
2n
0
0
m
v1t+2
(am
2;t+2 )g
0
m
fv1t+2n
(am
2;t+2n )
0
0
m
v1t+2n
(am
2;t+2n )g all a2 .
Since V11 is a compact valued correspondence, the values vtm (a2 ) vtm (a2 )
m
are uniformly bounded. Then, < 1 implies that limn!1 2n fv1t+2n
(a2 )
m0
0
1
v1t+2n (a2 )g = 0, all a2 . As m and m are arbitrary, the latter implies fv1t (a2 )gt 0 =
2
fv1t
(a2 )gt 0 , for all a2 , establishing uniqueness of the solution. Moreover, if time
invariance would fail there would be t0 > t such that (v1t )(a2 ) 6= (v1t0 )(a2 )
for some a2 2 A2 . Then the two distinct sequences fw1t+t ; v1t+t gt 0 and
fw1t+t0 ; v1t+t0 gt 0 would be solutions to equations 12 and 14 contradicting the
…rst part of the argument.
By Lemma 8, SPE value sequences are time invariant, that is, they depend
on the state (or action ) ai or aj . If the solutions to the planner problem 5 are
unique for all initial states ai or aj , SPE action and transfer pro…les are unique
with SPE actions identical to the planners optimal actions. However, if for some
ai or aj , there are multiple solutions to the planner problem, there are multiple
action and transfer SPE pro…les (with actions identical to the planner’s actions)
all of them obviously payo¤ equivalent.
3.1.3
Equivalence Between Fixed Point and the SPE Value Set
Let Vj ; Wi denote the SPE equilibrium value correspondence when the initial
mover is player j (with associated SPE payo¤s vj ; wi 2 Vj ; Wi ). Similarly,
let Vi ; Wj denote the SPE equilibrium value correspondence when the initial
mover is player i. In compact notation, suppressing player subscripts, let us
write that quartet as V ; W .
The argument that this SPE value correspondence is nothing but the …xed
point V 1 ; W 1 computed above will draw on the argument in Abreu-PearceStachetti for discounted Repeated Games (see Abreu-Pearce-Stachetti (1986))
which in turn is a strategic version of the Bellman Principle for Dynamic programming. The …rst step will show that the SPE value correspondence V ; W
is "self-generated" (in APS terminology) in that V ; W
T (V ; W ). (This
is the game-theoretic analog to the (Necessity Part of the) Optimality Principle in Dynamic Programming, the assertion that the value function must be
bounded above by the Bellman operator applied to itself.) The second step
will be an appeal to the monotonicity property of the operator to show that
V ;W
V n ; W n for all n and hence that V ; W
V 1 ; W 1 . The third
step - and the …nal one - will follow from the fact that we have already shown
- see Lemma 6 above - that V 1 ; W 1 is a singleton; hence, it must be the case
that V ; W = V 1 ; W 1 . That will evidently complete the proof.
The …rst step hence is:
15
Lemma 9 For the SPE equilibrium value correspondence, it is the case that
Vi ; Wj
T (Vj ; Wi )
Vj ; Wi
T (Vi ; Wj )
Proof. Suppose that vj ; wi 2 Vj ; Wi (ai ); i.e., when player i contemplates an
action ai ; these are SPE payo¤s that arise based on equilibrium strategies - say,
7
In particular, the pair of best response strategies imply
i (ai ) and j (ai ).
a concurrent transfer schedule i (:) (picked by i) and an action aj (picked by
j) and continuation payo¤s wj (aj j ai ): Since this is a SPE, the continuation
payo¤s do not depend on the t period transfer scheme i (:) (since that is not
part of the public history at time t + 1, the initial period for the continuation
wj (aj j ai )):
Let us detail each player’s incentives more carefully starting with player j:
(IC - Player j): The action aj maximizes player j 0 s lifetime payo¤s
aj 2 arg maxf(1
)[
j (ai ; aj )
+
i (aj )]
+ wj (aj j ai )g
(22)
(IC - Player i): There must not exist an alternative strategy for player
0
0
i, i (ai ), with, say, an immediate transfer schedule i (:); and a best response
0
0
public strategy of player j, j (ai ; i (:)), with continuation payo¤s wj (aj j ai ) 2
0
Wj ; so that an action aj is incentive compatible for player j, i.e.,
0
0
a) aj solves Eq. 22 for i (:) and wj (aj j ai );and that
b) player i prefers it because it gives her higher lifetime payo¤s, i.e.,
(1
0
)[ i (ai ; aj )
0
0
i (aj )]+
0
vi (aj j ai ) > (1
vi (aj j ai )
(23)
Clearly, from the de…nition it follows that vj ; wi 2 (Vj ; Wi )(ai ). However,
we have already seen that (Vj ; Wi )(ai ) = T (Vj ; Wi )(ai ). Hence, the lemma
has been proved. A swap of player subscripts proves similarly that vj ; wi 2
T (Vi ; Wj ).
From Lemma 4 above, it follows that we have V ; W
T (V n ; W n ) =
n+1
n+1
1
1
V
;W
for all n and hence that V ; W
V ; W . Hence, Step 2 is
proved. Finally, from Lemma 8 above we have the unique valuedness of the
correspondence V 1 ; W 1 . That then implies that one of two things must be
true: a) the SPE value correspondence V ; W ! R is empty-valued. Or, b)
that it is non-empty-valued and is in fact identical to the function V 1 ; W 1 .
That the former is not possible, that there always exists a SPE, follows - for
example - from Harris (1985) who showed that SPE always exist in perfect
information games with compact histories and continuous payo¤s.
All three steps have been proved and we have shown that the SPE value
correspondence V ; W agrees with the …xed point of the operator V 1 ; W 1 .
7 Note
)[
j (ai ; aj )
i (aj )]+
that if this is play commencing after a history, say ht , then the strategies i (ai ) and
might be indexed by the history and should be written more completely as i (ai ; ht )
and j (ai ; ht ). Since the set of SPE is independent of history, and we are investigating a
generic SPE, it saves notation to drop ht in what follows.
j (ai )
16
3.2
Completing the Proof for Finite Transfer Histories
In this section we show that the result extends to the case of any M < 1.
Lemma 10 The conclusion that there is a unique SPE which coincides with the
Utilitarian maximum holds for all 1 < M < 1.
Proof. Recall that a history ht at time t is given by ht = (hat ; ht ) where the
…nite transfer history is ht = ( jt M (:); it M +1 (:); ::: jt 1 (:)). Recall also the
"Stackelberg leader" player i0 s optimal transfer problem, i.e.,
wi (ht ; ait ) = maxf(1
)[ i (ait ; aj ( i ))
i (:)
i (aj )]
+ vi (ht+2 (ait ;
it ))g
(24)
where ht+2 (ait ; it ) is the history at period t + 2 caused by player i0 s period
t actions, ht+2 (ait ; it ) = (hat ; ait ; ajt+1 ( it ); jt M +2 (:); ::: it (:), jt+1 (:)). In
that history, the follower player j 0 s best response action in period t + 1, aj ( it ),
comes from:8
vj (ht+1 (ait ;
it ))
= maxf[
aj
j (ait ; aj )
+
it (aj )]
+ wj (ht+1 ; aj )g
(25)
where, similarly, ht+1 is the history at period t + 1 caused by player i0 s
period t actions, ht+1 (ait ; it ) = (hat ; ait ; jt M +1 (:); ::: it (:)). Hence, from
Eq. 25 it is clear that aj ( it ) does not contain any reference to jt M (:) since,
by period t + 1 when that action is chosen, jt M (:) is no longer in the …nite
transfer memory. In particular, wi (ht ; ait ) is then independent of jt M (:).
Since wi (ht ; ait ) is independent of jt M (:) it then follows, from the de…nition
of player i0 s lifetime payo¤s starting at time t;
vi (ht ) = maxf(1
ai
)[ i (ai ; ajt
1)
+
jt 1 (ai )]
+ wi (ht ; ai )g
(26)
that vi (ht ) is also independent of jt M (:). By the same logic, the payo¤s
for player j starting in the subsequent period t + 1, vj (ht+1 (ait ; it )), must be
independent of it M +1 (:). Then, from Eq. 25, it follows that player j 0 s best
response action in period t + 1, aj ( it ), which is the solution of Eq. 25, must
itself be independent of it M +1 (:). But, invoking Eqs. 24 and 26, that implies
that wi (ht ; ait ) and vi (ht ) must both be independent of it M +1 (:).
At this point the argument in the previous paragraph can be repeated to
show that if wi (ht ; ait ) and vi (ht ) are independent of it M +1 (:), then vj (ht+1 (ait ;
and aj ( it ) must be independent of it M +2 (:) thereby implying that wi (ht ; ait )
and vi (ht ) are in fact independent of it M +2 (:). And so on. Leading …nally
to the conclusion that no part of the transfer history enters wi , i.e., it is a function of ait alone and that only jt 1 (:) enters vi - and only through the payo¤
relevant e¤ect in the …rst term of Eq. 26. Put another way, the general case
of any M has been reduced to M = 1. The proof of the previous sub-section
then applies.
8 Note
that ht+1 (ait ;
it )
= (ha
t ; ait ;
jt M +1 (:);...
17
it (:)).
it ))
4
A General Result for the Finite Horizon
The theorem that we proved in the previous section applies to a …nite horizon
game as well.9 Indeed, in many ways, the argument is simpler because one can
rely on the stick of backwards induction. Also, the result does not require a
restriction to …nite transfer histories.
We now prove the relevant result. For purely notational reasons, suppose
that the last player to move is player j. For any ai ; de…ne the one-period
utilitarian maximum:
b1 (ai ) = arg max[ i (ai ; aj ) +
aj
j (ai ; aj )]
and let the associated one-period utilitarian maximum value be denoted
i (ai ; b 1 (ai ))
U1 (ai ) =
j (ai ; b 1 (ai ))
+
Inductively, whenever player j is the mover, de…ne the T +1 period utilitarian
maximum
bT +1 (ai ) = arg maxf[ i (ai ; aj ) +
aj
j (ai ; aj )]
+ UT (ai )g
and let the associated T + 1-period utilitarian maximum value be denoted
UT +1 (ai ) =
i (ai ; b T +1 (ai ))
+
j (ai ; b T +1 (ai ))
+ UT (bT +1 (ai ))
with analogous optimal action, bT +1 (aj ) and utilitarian value, UT +1 (aj )
de…ned for periods in which player i is the mover.
De…ne now individual incentives in the following manner. For any ai de…ne
the one-period unilateral maximum:
1 (ai )
= arg max
aj
j (ai ; aj )
and let the associated one-period unilateral maximum be denoted
v1 (ai ) =
j (ai ;
1 (ai ))
For every action ai de…ne the transfer required to induce one-period play of
aj as
(27)
1 (aj j ai ) = v1 (ai )
j (ai ; aj )
Clearly, 1 (aj j ai ) > 0 for all aj : De…ne the non-mover’s, player i0 s, payo¤
in the last period as
W1 (ai ) = max[ i (ai ; aj )
aj
1 (aj
j ai )]
9 The result applies both to the discounted as well as the undiscounted model. For notational simplicity, we restrict ourselves to the undiscounted case.
18
Substituting for
is equivalent to
1 (a
0
) from Eq. 27 it follows that the maximization above
W1 (ai ) = max[ i (ai ; aj ) +
aj
j (ai ; aj )]
v1 (ai ) = U1 (ai )
v1 (ai )
so that we have the following claim:
Claim 11 When there is one period left in the game, regardless of the action
ai played in the immediately preceding period by (the non-mover) player i, the
unique equilibrium continuation is for i to o¤ er a transfer 1 (b1 (ai ) j ai ) to
induce the play of the utilitarian maximum b1 (ai ) in the last period. The associated payo¤ s for the mover, player j, and the non-mover are then, respectively,
the unilateral maximum value v1 (ai ) and the residual from the utilitarian maximum value U1 (ai ) v1 (ai ).
Similarly when there are 2 periods left, for any aj de…ne the two-period
unilateral maximum:
2 (aj )
= arg max[ i (ai ; aj ) + W1 (ai )]
ai
and let the associated two-period unilateral maximum be denoted
v2 (aj ) =
i ( 2 (aj ); aj )
+ W1 (
2 (aj ))
For every action ai de…ne the transfer required to induce penultimate-period
play of ai as
[ i (ai ; aj ) + W1 (ai )]
(28)
2 (ai ) = v2 (aj )
Clearly, 2 (ai ) > 0 for all ai : De…ne the non-mover’s, player j 0 s, continuation
payo¤ starting in the penultimate period as
W2 (aj ) = max[
ai
Substituting for
is equivalent to
0
1 (a )
W2 (aj ) = max[ i (ai ; aj )+
ai
j (ai ; aj )
2 (ai )
+ v1 (ai )]
from Eq. 28 it follows that the maximization above
j (ai ; aj )+W1 (ai )+v1 (ai )]
v2 (aj ) = U2 (aj ) v2 (aj )
so that we have the following claim:
Claim 12 When there are two periods left in the game, regardless of the action aj played in the third to last period by current non-mover j, the unique
equilibrium continuation in the penultimate period is for to o¤ er a transfer
2 (b 2 (aj ) j aj ) to induce the play by i of the utilitarian maximum b 2 (aj ) in
the second to last period. The associated payo¤ s in the last two periods for
the mover and non-mover are then, respectively, the unilateral maximum value
v2 (aj ) and the residual from the utilitarian maximum value U2 (aj ) v2 (aj ).
19
Inductively de…ne, for periods when player i is the mover, optimal action,
bT +1 (aj ) and utilitarian value, UT +1 (aj ). Similarly, for periods when player j
is the mover, de…ne an optimal action, bT +1 (ai ) and utilitarian value, UT +1 (ai ).
The same proof as above applies to give us the main result which is:
Theorem 13 Consider a T period game. The utilitarian solution {bt (a) : t =
1; :::T g is the unique SPE action pro…le in the game. That is supported by
transfers that induce the mover to pick that action by way of a compensating
transfer that makes her as well o¤ as she would be from her most preferred
transfer.
5
An Example with In…nite Transfer History
If in…nite transfer histories are allowed, whilst equilibria are still Pareto optimal,
uniqueness of the transfer scheme need no longer hold. The example below
establishes that.
Consider the following stage game
1n2
*
A
(0,0)
B
(1,0)
An action is a 2 fA; Bg. We use the following convention: a) at T even, T =
0; 2; : : :, player 2 selects an action aT given the history hT = ( 0 ; a1 ; : : : ; aT 1 ; T )t T ;
at T odd player 2 selects a transfer T +1 given the history hT = ( 0 ; a1 ; : : : ; T 1 ; aT ).
The set of histories is H.
We limit attention to strategies where when player 2 is not a mover, he sets
= 0, and we never record this entry in the histories. It should be clear from
the argument that this is without loss of generality. Similarly we never record
the single action of player 1. When player 2 is a mover at hT , the last entry
of hT is T . When player 2 is not a mover at hT , the last entry is aT , the last
action taken by 2. When player 1 is either not a mover or a mover at hT , the
last entry is aT , as 1 does not choose actions.
For given strategies of the other players, the lifetime payo¤s for the players
are:
W1 (hT )
=
max(1
V1 (hT )
=
(1
)
)(
1 (aT +1 (hT ;
1 (aT )
))
(aT +1 (hT ; )) + V1 (hT ; ; aT +1 (hT ; ))
+ W1 (hT )
and
W2 (hT )
=
V2 (hT )
=
V2 (hT )
max(1
a
)(
20
T (a))
+ W2 (hT ; a)
Our equilibrium strategies depend on two subsets of H and two parameters,
and 0 , that we de…ne below.
Let H+ and H , H
follows:
H+ , be two subsets of the set of histories de…ned as
De…nition of H+ : hT 2 H+ if there exists t such that a) either
0
< or at 6= B; and b) t0 (B)
, all t t0 T .
t0 (A) <
t (B)
t (A)
De…nition of H : hT 2 H if hT 2 H+ and there exists t < T such that
hT = (ht ; hT t ) with ht 2 H+ and at+1 = B.
Let
be a positive number to be speci…ed below and let
0
Later, we de…ne
=
2
1+
:
2
(29)
so that the following are equilibrium strategies:
Equilibrium strategies:
Player 2: a(hT ) = B, if hT 2 HnH+ or hT 2 H and T (A)
While a(hT ) = A, otherwise (that is, hT 2 H+ nH , or hT 2 H and
T (B)).
T (B).
T (A)
>
Player 1: (hT ) = ( (A); (B)) = (0; ), if hT 2 HnH+ ; (hT ) = (0; 0 ),
hT 2 H+ nH , (hT ) = (0; 0), hT 2 H .
5.1
5.1.1
Equilibrium payo¤s
Player 1
Histories in HnH+ :
Let hT 2 HnH+ be such that aT = B. By de…nition of the equilibrium
strategies, (hT ) =
= (0; ), while, as (hT ; ) 2 HnH+, a(hT ; ) = B.
Hence W1 (hT ) = W1 (hT ), and V1 (hT ; (0; ); B) = V1 (hT ; (0; ); B), for all histories hT and hT 2 HnH+ with aT = aT = B. We denote by W1 (+) and V1 (B)
such values.
Then:
W1 (+)
=
(1
)(1
V1 (B)
=
(1
) + W1 (+)
and therefore
W1 (+) =
1
21
) + V1 (B),
+
1+
(30)
1 + (1
1+
V1 (B) =
)
(31)
Histories in H+ nH
Let hT 2 H+ nH . Then (hT ) = ^ = (0; 0 ) while, as hT ; ^ 2 HnH+ , it is
aT (hT ; ^) = B and (hT ; ^) 2 HnH+ . Thus, the payo¤ W1 (hT ) is invariant for
all histories hT 2 H+ nH and it is denoted by W1 ( ). Then:
W1 ( ) = (1
0
)(1
) + V1 (B)
and then, by equation 31,
W1 ( ) = (1
0
)(1
1 + (1
1+
)+
)
(32)
Histories in H
Let hT 2 H , then (hT ) = = (0; 0) and a(hT ; ) = B. If hT 2 H and
player 1 adopts the equilibrium strategies, all the followers of hT 2 H so generated stay in the set H . We denote by W1 (0) and V1 (0) the payo¤ generated
by the equilibrium strategies for histories hT 2 H . It is
W1 (0)
=
(1
) + V1 (0)
V1 (0)
=
(1
) + W1 (0)
thus
W1 (0) = V1 (0) = 1
5.1.2
(33)
Player 2
Histories in HnH+ :
There are three di¤erent types of these histories indexed by
(0; ); b) T = (0; 0 ); c) T = ( T (A); T (B)); with either T (B)
0
or with T (B)
:
T (B)
T
: a) T =
T (B)
a) By de…nition of equilibrium strategies, a(hT ) = B; for all hT 2 HnH+ .
Let V2 ( ) denote the value of 2 which is invariant for all hT 2 HnH+ with
T = (0; ). The value for player 2 at (hT ; B) 2 HnH+ is invariant for all
histories hT +1 2 HnH+ with aT +1 = B and is denoted by W2 (B). Then
V2 ( )
=
W2 (B)
=
(1
) + W2 (B)
V2 ( )
that is
V2 ( ) =
1+
W2 (B) =
22
1+
;
(34)
(35)
b) By de…nition of equilibrium strategies, a(hT ) = B; for all hT 2 HnH+ .
Let V2 ( 0 ) denote the value of 2 which is invariant for all hT 2 HnH+ with
0
T = (0; ). Then
V2 ( 0 ) = (1
) 0 + W2 (B) = (1
) 0+
2
(36)
1+
c) By de…nition of the equilibrium strategies, a(hT ) = B, while, as (hT ; B) 2
HnH+ , it is (hT ; B) = (0; ): Then by a) and b), it is W2 (hT ; B; (0; 0 ) =W2 (B).
Then we get
V2 (hT ) = (1
)
T (B)
+ W2 (B) = (1
)
T (B)
+
2
1+
(37)
Histories in H+ nH :
For such histories a(hT ) = A and, since (hT ; A) 2 H+ nH , (hT ; A) =
(0; 0 ) thereby implying that (hT ; A; (0; 0 )) 2 HnH+ . Thus, by b), W2 (hT +1 ) is
invariant for all hT +1 2 H+ nH with aT +1 = A and it is denoted W2 (A): It is
W2 (A) = V2 ( 0 ) = (1
) 0+
3
(38)
1+
and thus by 36
V2 (hT )
=
=
(1
(1
)
)
T (A)
+ W2 (A) = (1
T (B)
+
2
(1
0
) +
)
T (B)
+
2
V2 ( 0 )
(39)
4
1+
Histories in H :
We distinguish three such histories parametrized by T : i) T = (0; 0); ii)
T = ( T (A); T (B)); with T (A) > T (B); and iii) T = ( T (A); T (B)); with
T (A)
T (B):
i) Then, a(hT ) = B. For all histories following (hT ; B) generated by equilibrium strategies, player 1 transfer is = (0; 0), while player 2 action is B. We
call V2 (0) and W2 (0), the payo¤ so generated. Thus,
V2 (0) = W2 (0) and W2 (0) = V2 (0)
and hence
V2 (0) = W2 (0) = 0
(40)
ii) Then a(hT ) = A and hence
V2 (hT ) = (1
)
T (A)
+ W2 (0) = (1
)
T (A):
+ W2 (0) = (1
)
T (B).
iii) Then a(hT ) = B and hence
V2 (hT ) = (1
)
T (B)
23
5.1.3
Player 2 does not have incentives to deviate
We analyze one-shot deviations. We have to analyze four possible such deviations: a) player 2 picks A at hT 2 HnH+ with T (B)
, b) picks
T (A)
0
A at hT 2 HnH+ with T (B)
, c) picks A (B) at hT 2 H with
T (B)
(A) > (B) ( (A)
(B)), d) picks B at hT 2 H+ nH .
a) Player 2 payo¤s generated by such deviation are:
(1
)
T (A)
+ W2 (hT ; A)
By de…nition of equilibrium strategies (hT ; A) = (0; 0 ) and thus
W2 (hT ; A) = V2 ( 0 ).
Thus such a deviation is not pro…table if
V2 (hT )
=
=
As
T (B)
T (A)
(1
(1
(1
)
)
T (B)
T (A)
+ W2 (B)
+
2
[(1
(1
)
T (A)
+
2
V2 ( 0 )
0
) + W2 (B)]
; the last inequality is satis…ed if
) + W2 (B)
+ 2 [(1
) 0 + W2 (B)]
This is the case as by equation 35 and equation 29, the left and right hand side
of the inequality above are identical.
b) Player 2 payo¤s generated by such deviation are:
(1
)
T (A)
+ W2 (hT ; A).
As (hT ; A) 2 H+ nH , (hT ; A) = (0; 0 ): Thus, by de…nition of equilibrium
strategies
W2 (hT ; A) = V2 ( 0 ).
If player 2 follows equilibrium strategies at hT ; payo¤s are V2 (hT ) as in equation
0
37. Thus, by taking into account that T (B)
; such a deviation is
T (B)
not pro…table if
)
0
(1
which is obviously veri…ed as both W2 (B) > 0 and
0
> 0.
(1
)
0
[ V2 ( 0 )
W2 (B)] = [(1
2
)W2 (B)
c) Since hT 2 H , any story following hT generated by player 1 playing
equilibrium strategies are in H . Thus, if T (A)
T (B), by equation 40, the
payo¤s generated by such deviation are:
(1
) (A)
24
which is less or equal than (1 ) T (B), the payo¤ generated by the equilibrium
strategy. Identical argument applies if T (B)
T (A):
0
d) Since hT 2 H+ nH , it is be such that T (B)
. By deviating
T (A) <
and selecting B, player 2 generates a history (hT ; B) 2 H . Thus payo¤s are
(1
)
T (B)
Had player 2 followed the equilibrium strategy payo¤s would have been V2 (hT )
as in equation 39. Thus such a deviation is not pro…table if:
(1
which, since
T (B)
)
T (A)
T (A)
<
0
+ W2 (A)
(1
)
T (B)
, it is certainly veri…ed if, by equation 38
(1
) 0
1+
The latter, by equation 29, can be rewritten, after straightforward computations
as
2
(1
) 0+
2
which is certainly veri…ed for
5.1.4
4
2
(1 +
)
1
(41)
close enough to 1.
1 does not have incentives to deviate
We have to check four possible deviations: a) player 1 o¤ers ^ with ^(B) ^(A) <
, at hT 2 HnH+ ; b) player 1 o¤ers 6= (0; 0) at hT 2 H ; c) player 1 o¤ers ~,
with ~(B) ~(A) < 0 at hT 2 H+ nH .
a) Since hT 2 HnH+ , it is (hT ; ^) 2 H+ nH . Therefore, a(hT ; ^) = A
and then (hT ; ^; A) = 0 implying that W1 (hT ; ^; A) = W1 ( ). Thus payo¤s
generated by such deviation are:
(1
)^(A) + V1 (hT ; ^; A)
2
W1 (hT ; ^; A)
(1
)^(A) +
)^(A) +
2
W1 ( )
(1
)^(A) +
2
[(1
=
(1
=
=
)(1
0
) + V1 (B)]
while equilibrium payo¤s are
W1 (hT ) = (1
)(1
) + V1 (B)
As T (A)
0 and, by equations 29 and 31,
deviations is unpro…table.
<
0
and V1 (B) > 0, such
b) If hT 2 H all the followers of hT generated by player 1 following the
equilibrium strategy are in H . Furthermore, by equation 33, equilibrium payo¤s for player 1 are the highest possible payo¤s. Therefore, there cannot be a
pro…table deviation at hT 2 H .
25
c) Since hT 2 H+ nH , if player 1 o¤ers ~, is (hT ; ~) 2 H+ nH and hence
a(hT ; ~) = A, while (hT ; ~; A) = (0; 0 ). Thus payo¤s generated by such deviation are:
(1
)~(A) + V1 (hT ; ; A)
=
)~(A) +
)~(A) +
(1
(1
2
W1 (hT ; ; A) =
2
W1 ( ) ,
while equilibrium payo¤s at hT 2 H+ nH are W1 ( ).
Thus, as ~(A) 0, the deviation is not pro…table if W1 ( ) > 0, that is, by
equation 32, if
1 + (1
)
0
>0
(1
)(1
)+ 2
1+
The latter de…nes an upper bound on , ( ), which by the de…nition of
0
(equation 29) can be written after trivial computations as:
2
( )=
which is strictly positive for
(1 +
1+ 5
3
)
4
> 0.
Thus we have shown:
Fact: Let satisfy inequality 41. Then, the set of SPNE contains a continuum of equilibria indexed by 2 [0; ( )]. Equilibrium payo¤ s are one to one
in .
6
6.1
Extensions
N Players
Result generalizes - To be Written
6.2
Transfers with No Commitment
To be Written
6.3
In…nite Transfer Histories
To be Written
6.4
Mechanism Design and Folk Theorems in Repeated
Games
To be Written
26
7
Conclusion
To be Written
8
References
To be Written
27