Stage game: $G = (N, \prod_{k=1}^{n} A_k, \{u_k\}_{k=1}^{n})$
$N$: set of players; $A_k$: action set of player $k$ in $G$
$a = (a_1, \dots, a_n)$ is a typical pure action profile
$u_k(a)$ is the payoff of player $k$ from the pure action profile $a$
$u(a) = (u_1(a), \dots, u_n(a))$
Assumption:
$\max_{a \in \prod A_k} u_k(a) < \overline{B}$ and $\min_{a \in \prod A_k} u_k(a) > \underline{B}$ for all $k$. That is, the payoffs of the stage game are bounded.
Minmax
The minmax payoff of player $k$ is defined as
$$\underline{v}_k = \min_{a_{-k} \in A_{-k}} \max_{a_k \in A_k} u_k(a_k, a_{-k})$$
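This is a lower bound on the payoff that $k$ can obtain in any pure-strategy Nash equilibrium of $G$, because $k$ can always best-respond to the actions of the other players. An action profile at which $k$ is minmaxed is denoted by $m^k$, that is, $u_k(m^k) = \underline{v}_k$. For all $j \neq k$, denote $u_j(m^k)$ by $w_j^k$.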
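As a concrete illustration of the minmax definition, the following Python sketch computes the pure-action minmax payoff by brute force. The function name `pure_minmax` and the prisoner's dilemma payoffs are hypothetical choices made for this example and do not come from the notes.

```python
from itertools import product


def pure_minmax(payoff, action_sets, k):
    """Pure-action minmax payoff of player k and a profile attaining it.

    payoff(a) returns the tuple of stage payoffs for the pure profile a.
    """
    others = [A for j, A in enumerate(action_sets) if j != k]
    best_value, best_profile = None, None
    for a_minus_k in product(*others):
        def full(a_k):
            # Rebuild the full profile by putting k's action back in slot k.
            a = list(a_minus_k)
            a.insert(k, a_k)
            return tuple(a)

        # Inner max: player k best-responds to the opponents' fixed actions.
        reply = max(action_sets[k], key=lambda a_k: payoff(full(a_k))[k])
        value = payoff(full(reply))[k]
        # Outer min: opponents choose the actions that make this value lowest.
        if best_value is None or value < best_value:
            best_value, best_profile = value, full(reply)
    return best_value, best_profile


# Hypothetical prisoner's dilemma: C = cooperate, D = defect.
PD = {("C", "C"): (2, 2), ("C", "D"): (-1, 3),
      ("D", "C"): (3, -1), ("D", "D"): (0, 0)}
actions = [("C", "D"), ("C", "D")]
minmax_value, minmax_profile = pure_minmax(PD.get, actions, 0)
```

In this hypothetical game the routine returns the value 0 at the profile (D, D): defecting is a best reply to either opponent action, and the opponent holds player 0 lowest by defecting as well. The payoff of the other player at the returned profile plays the role of the payoffs to non-punished players defined above.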
Payoff set
The set of convex combinations of the pure action payoffs of $G$:
$$V = \text{convex hull}\,\{v \mid v = u(a) \text{ for some pure action profile } a\}$$
Note that $V$ consists of payoffs from pure action profiles as well as from (independent and correlated) mixed action profiles.
Infinitely repeated game $G^\infty$
$G$ is played repeatedly at dates $t = 1, 2, \dots$
$a^t = (a_1^t, \dots, a_n^t)$ denotes the pure action profile played at date $t$.
A history at date $t$ is the sequence of action profiles played up to the previous date: $h^t = (a^1, \dots, a^{t-1})$ and $h^1 = \emptyset$. The set of histories at date $t$ is $H^t$.
A strategy of player $k$ is denoted by $\sigma_k$; it assigns an action to every history at every date, $\sigma_k : \bigcup_{t \geq 1} H^t \to A_k$.
The payoff at date $t$ is the present discounted value of the future stream of payoffs:
$$U_k^t = (1 - \delta) \sum_{\tau = t}^{\infty} \delta^{\tau - t} u_k(a^\tau)$$
$\delta$ is the discount factor, $0 \leq \delta < 1$.
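The normalized discounted sum can be evaluated in closed form for the kind of stream used later in the notes: a finite list of stage payoffs followed by a constant payoff forever. The sketch below assumes exactly that shape; the name `discounted_value` and its arguments are hypothetical.

```python
def discounted_value(stream, tail, delta):
    """Normalized discounted value (1 - delta) * sum_s delta**s * u_s.

    The stage payoffs are the finite list `stream`, followed by the constant
    payoff `tail` in every later period.
    """
    if not 0 <= delta < 1:
        raise ValueError("delta must satisfy 0 <= delta < 1")
    head = sum((1 - delta) * delta**s * u for s, u in enumerate(stream))
    # (1 - delta) * sum_{s >= n} delta**s * tail collapses to delta**n * tail.
    return head + delta**len(stream) * tail
```

The factor (1 − δ) normalizes the sum so that a constant stream c has value c, which makes repeated-game payoffs directly comparable with stage-game payoffs.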
One deviation
A strategy $\sigma_k'$ is a one deviation from $\sigma_k$ at history $h^t$ if $\sigma_k'(h^t) \neq \sigma_k(h^t)$ and $\sigma_k'(h^\tau) = \sigma_k(h^\tau)$ for all other histories $h^\tau$ in the subgame starting at $h^t$.
A profile of strategies $(\sigma_1, \dots, \sigma_n)$ satisfies the ‘One Deviation Property’ (ODP) if no player has a profitable one deviation at any history.
Theorem
$(\sigma_1, \dots, \sigma_n)$ is an SPE of $G^\infty$ $\iff$ $(\sigma_1, \dots, \sigma_n)$ satisfies ODP.
Proof: Necessity is trivial. Let us prove sufficiency, that is, if $(\sigma_1, \dots, \sigma_n)$ satisfies ODP then $(\sigma_1, \dots, \sigma_n)$ is an SPE. Suppose not (hoping to reach a contradiction). Then there is a history $h^t$ and a player $k$ such that, in the subgame following $h^t$, $k$ has a profitable deviation $\sigma_k'$. There are two possibilities: (i) $\sigma_k'$ and $\sigma_k$ differ on a finite number of histories; (ii) $\sigma_k'$ and $\sigma_k$ differ on an infinite number of histories.
(i) Finite: (We have already discussed a similar result while studying Kuhn’s Theorem.) Suppose that $\sigma_k'$ and $\sigma_k$ differ at a finite number of dates $t_1 < t_2 < \dots < t_m$. By ODP, $U_k^{t_m}(\sigma_k', \sigma_{-k}) \leq U_k^{t_m}(\sigma_k, \sigma_{-k})$, because from period $t_m$ onward $\sigma_k'$ is a one deviation from $\sigma_k$. Since the deviation at any history in period $t_m$ is not profitable, the profit from following $\sigma_k'$ must come from the dates $t_1, t_2, \dots, t_{m-1}$. The same argument now applies to $t_{m-1}$ and so on, all the way back to $t_1$. Thus $U_k^{t_1}(\sigma_k', \sigma_{-k}) \leq U_k^{t_1}(\sigma_k, \sigma_{-k})$, which contradicts our assumption.
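(ii) Infinite: Suppose $t$ is the first date at which $\sigma_k'$ and $\sigma_k$ differ, and suppose that $U_k^t(\sigma_k', \sigma_{-k}) - U_k^t(\sigma_k, \sigma_{-k}) > \epsilon$ for some $\epsilon > 0$. Choose $T$ sufficiently large so that $\delta^T (\overline{B} - \underline{B}) < \epsilon/2$. Construct a new strategy $\sigma_k''$ which follows $\sigma_k'$ up to date $t + T$ and $\sigma_k$ afterwards. By construction, $\sigma_k''$ differs from $\sigma_k$ at finitely many dates and
$$U_k^t(\sigma_k'', \sigma_{-k}) - U_k^t(\sigma_k, \sigma_{-k}) > \epsilon - \delta^T (\overline{B} - \underline{B}) > \frac{\epsilon}{2}$$
Hence $\sigma_k''$ is a profitable deviation, which is ruled out by (i).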
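The truncation step in case (ii) only needs some horizon T at which the discounted payoff range falls below half of the assumed gain. A minimal sketch of that choice follows; the function name `truncation_horizon` and the sample numbers are hypothetical.

```python
def truncation_horizon(delta, b_high, b_low, eps):
    """Smallest integer T >= 0 with delta**T * (b_high - b_low) < eps / 2."""
    if not (0 <= delta < 1 and b_high > b_low and eps > 0):
        raise ValueError("need 0 <= delta < 1, b_high > b_low and eps > 0")
    T = 0
    # delta**T decreases to 0, so the loop is guaranteed to stop.
    while delta**T * (b_high - b_low) >= eps / 2:
        T += 1
    return T


# Hypothetical numbers: payoff range 10, assumed gain 0.1, delta = 0.9.
T = truncation_horizon(0.9, 5.0, -5.0, 0.1)
```

Such a T exists for every discount factor below 1, which is why payoffs far in the future cannot account for more than half of the gain from the infinite deviation.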
Folk Theorem
Take any $v \in V$ such that $v_k > \underline{v}_k$ for all $k$. If the players are sufficiently patient (that is, $\delta$ is sufficiently close to 1), then there exists an SPE of $G^\infty$ with payoff $v$.
Proof:
Construction:
Choose $v^*$ in the interior of $V$ and $\epsilon > 0$ such that for all $i$, $\underline{v}_i < v_i^* < v_i$ and the following vectors,
$$v^*(i) = (v_1^* + \epsilon, \dots, v_{i-1}^* + \epsilon, v_i^*, v_{i+1}^* + \epsilon, \dots, v_n^* + \epsilon)$$
are in $V$. This requires some assumptions on $V$ which we are going to ignore here. Moreover, to simplify the proof, we shall assume that there exist pure action profiles $a^*(1), \dots, a^*(n)$ and $a$ such that $v^*(i) = u(a^*(i))$ for all $i = 1, \dots, n$ and $v = u(a)$.
Choose an integer $L$ sufficiently large such that, for all $i$,
$$(\overline{B} - \underline{B}) < L (v_i^* - \underline{v}_i)$$
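The condition on the punishment length pins down a smallest admissible integer once the payoff bounds and the gap between each reward payoff and the corresponding minmax payoff are known. The following sketch computes it; `punishment_length` and the sample vectors are hypothetical.

```python
import math


def punishment_length(b_high, b_low, v_star, v_minmax):
    """Smallest integer L with (b_high - b_low) < L * (v_star[i] - v_minmax[i])
    for every player i."""
    gaps = [s - m for s, m in zip(v_star, v_minmax)]
    if min(gaps) <= 0:
        raise ValueError("each reward payoff must exceed the minmax payoff")
    # The binding player is the one with the smallest gap; L must strictly
    # exceed the ratio, hence floor(...) + 1.
    return math.floor((b_high - b_low) / min(gaps)) + 1


# Hypothetical bounds 3 and -2, reward payoffs (1, 1), minmax payoffs (0, 0).
L = punishment_length(3, -2, (1, 1), (0, 0))
```

With these hypothetical numbers the payoff range is 5 and every gap is 1, so the result is 6. Any larger integer would also satisfy the condition; the notes only require that L be large enough.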
SPE strategy:
Phase I (starting phase): Phase I includes the empty history. Play $a$ as long as either the realized action profile is $a$ (no deviation) or the realized action profile differs from $a$ in two or more components. If a single player, say $i$, deviates, move to Phase II$_i$.
Phase II$_i$ (punishment phase): Play $m^i$ for $L$ periods, as long as either the realized action profile is $m^i$ (no deviation) or it differs from $m^i$ in two or more components. If a single player, say $j$ (possibly $j = i$), deviates during these $L$ periods, move to Phase II$_j$. Otherwise, at the end of the $L$ periods, move to Phase III$_i$.
Phase III$_i$ (reward phase): Play $a^*(i)$ forever, as long as either the realized action profile is $a^*(i)$ (no deviation) or it differs from $a^*(i)$ in two or more components. If a single player, say $j$ (possibly $j = i$), deviates during Phase III$_i$, move to Phase II$_j$.
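The three phases can be read as a finite-state automaton whose state records the current phase, the player being punished or rewarded, and the number of punishment periods left. The sketch below is one possible encoding under that reading; the state tuples, the names `next_state`, `target_profile`, `minmax_profiles` and `reward_profiles`, and the sample profiles are all assumptions made for illustration.

```python
def single_deviator(prescribed, played):
    """Index of the unique player whose action differs, or None otherwise."""
    diff = [i for i, (p, q) in enumerate(zip(prescribed, played)) if p != q]
    return diff[0] if len(diff) == 1 else None


def next_state(state, played, target_profile, minmax_profiles,
               reward_profiles, L):
    """One transition of the phase automaton.

    state is ("I",), ("II", i, periods_left) or ("III", i).
    """
    phase = state[0]
    if phase == "I":
        prescribed = target_profile
    elif phase == "II":
        prescribed = minmax_profiles[state[1]]
    else:
        prescribed = reward_profiles[state[1]]

    j = single_deviator(prescribed, played)
    if j is not None:
        # A lone deviator j always triggers a fresh L-period punishment of j.
        return ("II", j, L)
    if phase == "II":
        i, left = state[1], state[2] - 1
        return ("II", i, left) if left > 0 else ("III", i)
    # Phases I and III continue unchanged when nobody (or two or more
    # players) deviated.
    return state


# Hypothetical two-player labels for the prescribed profiles.
target = ("a", "a")
punish = [("m0", "m0"), ("m1", "m1")]
reward = [("r0", "r0"), ("r1", "r1")]
s1 = next_state(("I",), ("x", "a"), target, punish, reward, 2)
s2 = next_state(s1, ("m0", "m0"), target, punish, reward, 2)
s3 = next_state(s2, ("m0", "m0"), target, punish, reward, 2)
```

Here player 0 deviates in Phase I, is punished for two periods, and play then moves to the reward phase for player 0. Profiles that differ from the prescription in two or more components are ignored, as in the phase descriptions.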
Checking for profitable deviations:
By the ‘one deviation property’ of SPE, it is enough to check against one deviations.
Phase I: A deviation can (at best) generate the following stream of payoffs for $i$:
$$\underbrace{\overline{B}}_{\text{deviation period}},\ \underbrace{\underline{v}_i, \dots, \underline{v}_i}_{L \text{ punishment periods}},\ \underbrace{v_i^*, \dots}_{\text{reward period}}$$
By following the prescribed strategy, $i$ obtains $v_i$ throughout.
Thus the profit from deviation is at most
$$(1 - \delta)(\overline{B} - v_i) + \delta(1 - \delta^L)(\underline{v}_i - v_i) + \delta^{L+1}(v_i^* - v_i)$$
Since $v_i^* < v_i$, the above expression converges to $v_i^* - v_i < 0$ as $\delta \uparrow 1$. That is, if $\delta$ is sufficiently large then the deviation is not profitable.
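To see how the bound on the Phase I deviation profit behaves, it can be evaluated numerically for an impatient and a patient player. The sketch below uses hypothetical numbers (upper payoff bound 3, minmax payoff 0, reward payoff 1, target payoff 2, punishment length 6); the function name `phase_one_gain` is also an assumption.

```python
def phase_one_gain(delta, b_high, v_minmax_i, v_star_i, v_i, L):
    """Upper bound on player i's normalized gain from deviating in Phase I."""
    return ((1 - delta) * (b_high - v_i)                    # deviation period
            + delta * (1 - delta**L) * (v_minmax_i - v_i)   # L punishment periods
            + delta**(L + 1) * (v_star_i - v_i))            # reward phase forever


impatient = phase_one_gain(0.1, 3, 0, 1, 2, 6)
patient = phase_one_gain(0.99, 3, 0, 1, 2, 6)
```

With these hypothetical numbers the bound is positive at a discount factor of 0.1 and negative at 0.99, in line with the limit argument: the one-period gain is weighted by (1 − δ), while the permanent loss in the reward phase is weighted by a factor close to 1.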
Phase II$_i$:
Case 1, $i$ deviates: By deviating during Phase II$_i$, $i$ can get at most $\underline{v}_i$ in the deviation period (he is already minmaxed), and the deviation restarts the punishment phase II$_i$. Hence there is no profit from deviation.
Case 2, $j \neq i$ deviates:
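Suppose $L'$ periods of Phase II$_i$ still remained when the deviation took place. By following the prescribed strategy, $j$ obtains
$$\underbrace{w_j^i, \dots, w_j^i}_{\text{remaining } L' \text{ periods of Phase II}_i},\ \underbrace{v_j^* + \epsilon, \dots}_{\text{Phase III}_i}$$
A deviation can (at best) generate the following stream of payoffs for $j$:
$$\underbrace{\overline{B}}_{\text{deviation period}},\ \underbrace{\underline{v}_j, \dots, \underline{v}_j}_{L \text{ punishment periods of II}_j},\ \underbrace{v_j^*, \dots}_{\text{reward period III}_j}$$
Since $\underline{v}_j < v_j^*$ and $L' \leq L$, this stream of payoffs is no better than
$$\underbrace{\overline{B}}_{\text{deviation period}},\ \underbrace{\underline{v}_j, \dots, \underline{v}_j}_{(L' - 1) \text{ periods}},\ \underbrace{v_j^*, \dots}_{\text{forever}}$$
Thus the profit from deviation is no more than
$$(1 - \delta)(\overline{B} - w_j^i) + \delta(1 - \delta^{L'-1})(\underline{v}_j - w_j^i) + \delta^{L'}\left[v_j^* - (v_j^* + \epsilon)\right]$$
Since $\epsilon > 0$, the above expression converges to $-\epsilon < 0$ as $\delta \uparrow 1$. That is, if $\delta$ is sufficiently large then the deviation is not profitable.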
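Because the number of remaining punishment periods can be anything from 1 to L, the bound has to be negative for every such value at once. The sketch below evaluates the worst case over the remaining periods; the function names and the numbers (upper bound 3, punisher's payoff −1 while punishing, minmax payoff 0, bonus 0.5, L = 6) are hypothetical.

```python
def phase_two_gain(delta, b_high, w_ji, v_minmax_j, eps, periods_left):
    """Upper bound on punisher j's gain from deviating in Phase II_i
    when `periods_left` punishment periods remain (including the current one)."""
    return ((1 - delta) * (b_high - w_ji)
            + delta * (1 - delta**(periods_left - 1)) * (v_minmax_j - w_ji)
            - delta**periods_left * eps)   # forgone bonus eps in the reward phase


def worst_case_gain(delta, b_high, w_ji, v_minmax_j, eps, L):
    """Largest bound over all possible numbers of remaining periods 1..L."""
    return max(phase_two_gain(delta, b_high, w_ji, v_minmax_j, eps, left)
               for left in range(1, L + 1))


impatient = worst_case_gain(0.5, 3, -1, 0, 0.5, 6)
patient = worst_case_gain(0.99, 3, -1, 0, 0.5, 6)
```

Since there are only finitely many values of the remaining length, a single discount factor threshold works for all of them. In this hypothetical example the worst case is positive at 0.5 and negative at 0.99; the bonus is what makes punishers willing to carry out a punishment that is costly to themselves.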
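Phase III$_i$:
Case 1, $i$ deviates:
A deviation can (at best) generate the following stream of payoffs for $i$:
$$\underbrace{\overline{B}}_{\text{deviation period}},\ \underbrace{\underline{v}_i, \dots, \underline{v}_i}_{L \text{ punishment periods}},\ \underbrace{v_i^*, \dots}_{\text{reward period}}$$
By following the prescribed strategy, $i$ obtains $v_i^*$ throughout.
Thus the profit from deviation is at most
$$(1 - \delta)\left[(\overline{B} - v_i^*) + (\delta + \delta^2 + \dots + \delta^L)(\underline{v}_i - v_i^*)\right]$$
Since $(\overline{B} - v_i^*) < (\overline{B} - \underline{B}) < L(v_i^* - \underline{v}_i)$, the term in square brackets converges to $(\overline{B} - v_i^*) - L(v_i^* - \underline{v}_i) < 0$ as $\delta \uparrow 1$. That is, if $\delta$ is sufficiently large then the deviation is not profitable.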
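This is the step where the choice of the punishment length matters: the reward payoff returns after the punishment, so only the L punishment periods can offset the one-period gain. The sketch below compares a punishment that is too short with one that satisfies the condition on L; the function name and the numbers (upper bound 3, reward payoff 1, minmax payoff 0) are hypothetical.

```python
def phase_three_gain(delta, b_high, v_star_i, v_minmax_i, L):
    """Upper bound on player i's gain from deviating in Phase III_i."""
    punishment_weight = sum(delta**s for s in range(1, L + 1))
    return (1 - delta) * ((b_high - v_star_i)
                          + punishment_weight * (v_minmax_i - v_star_i))


too_short = phase_three_gain(0.99, 3, 1, 0, 1)   # L = 1
long_enough = phase_three_gain(0.99, 3, 1, 0, 6)  # L = 6
```

With L = 1 the bracketed term tends to 2 − 1 > 0, so the bound stays positive however patient the player is. With L = 6 it tends to 2 − 6 < 0, and the bound is already negative at a discount factor of 0.99.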
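Case 2, $j \neq i$ deviates:
The check is similar to the one for Phase I, with $v_j^* + \epsilon$ in place of $v_j$ as the payoff from following the prescribed strategy.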