The Iterated Prisoner's Dilemma
Darwin: The small strength and speed of man, his want of natural weapons, etc., are more than counterbalanced ... by his social qualities, which led him to give and receive aid from his fellow men.
Mutual aid (payoff for player I)

                 II cooperates   II defects
I cooperates     b - c           -c
I defects        b               0

The one-shot PD
Prisoner's Dilemma Game
players I and II, payoff for player I:

              II plays C   II plays D
I plays C     R            S
I plays D     T            P

R Reward, T Temptation, P Punishment, S Sucker's payoff
T > R > P > S
Social Dilemma (where the 'invisible hand' fails)
Adam Smith (1723-1790)
• …by pursuing his own interest, man frequently promotes that of the society more effectually than when he really intends to promote it…
Adam Smith: Man intends only his own gain, and he is in this, as in many other cases, led by an invisible hand to promote an end which was no part of his intention.
Joseph Stiglitz: The reason that the invisible hand often seems invisible is that it is often not there.
Payoff for repeated games
$w$: probability for a further round
average number of rounds: $1/(1-w)$
total payoff: $A(w) := A(0) + w A(1) + w^2 A(2) + \dots$
payoff per round: $(1-w) A(w)$
limiting case $w \to 1$: payoff per round $\lim_{n \to \infty} \frac{A(0) + \dots + A(n)}{n+1}$
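A quick numerical check of these definitions, as a Python sketch (function names are illustrative; the infinite series is truncated, so results are approximate). For the alternating stage payoffs $3, 1, 3, 1, \dots$ the per-round payoff is $(3 + w)/(1 + w)$, which tends to the plain average $2$ as $w \to 1$:

```python
def total_payoff(stage_payoffs, w):
    """Truncated total payoff A(w) = A(0) + w A(1) + w^2 A(2) + ..."""
    return sum(w**n * a for n, a in enumerate(stage_payoffs))


def per_round_payoff(stage_payoffs, w):
    """Discounted payoff per round, (1 - w) A(w)."""
    return (1 - w) * total_payoff(stage_payoffs, w)


def average_payoff(stage_payoffs):
    """(A(0) + ... + A(n)) / (n + 1), the limiting case w -> 1."""
    return sum(stage_payoffs) / len(stage_payoffs)


if __name__ == "__main__":
    # Long enough that w**n is negligible at the end of the list.
    alternating = [3.0 if n % 2 == 0 else 1.0 for n in range(100_000)]
    for w in (0.9, 0.99, 0.999):
        print(f"w={w}: expected rounds {1 / (1 - w):.0f}, "
              f"per round {per_round_payoff(alternating, w):.4f}")
    print("plain average:", average_payoff(alternating))
```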
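The Good, the Bad and the Discriminator
• ALLC (always cooperate)
• ALLD (always defect)
• TFT (Tit For Tat: cooperate in the first round, then repeat the co-player's previous move)
• frequencies $x, y, z$ ($x + y + z = 1$)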
Payoff matrix (up to factor $1/(1-w)$, i.e. per round; rows: focal player, columns: co-player)

         ALLC     ALLD         TFT
ALLC     b - c    -c           b - c
ALLD     b        0            b(1 - w)
TFT      b - c    -c(1 - w)    b - c

$P_x, P_y, P_z$: expected payoff for ALLC, ALLD, TFT
$\bar{P} = x P_x + y P_y + z P_z$: average payoff in population
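The expected payoffs are linear in the frequencies: $P_x$ is the ALLC row of the matrix applied to $(x, y, z)$, and likewise for $P_y$ and $P_z$. A minimal Python sketch (function names are illustrative):

```python
def payoff_matrix(b, c, w):
    """Per-round payoffs; rows = focal strategy, columns = co-player (ALLC, ALLD, TFT)."""
    return [
        [b - c, -c, b - c],            # ALLC
        [b, 0.0, b * (1 - w)],         # ALLD
        [b - c, -c * (1 - w), b - c],  # TFT
    ]


def expected_payoffs(freqs, b, c, w):
    """Return ([P_x, P_y, P_z], P_bar) for frequencies (x, y, z) of ALLC, ALLD, TFT."""
    m = payoff_matrix(b, c, w)
    pay = [sum(m[i][j] * freqs[j] for j in range(3)) for i in range(3)]
    p_bar = sum(f * p for f, p in zip(freqs, pay))
    return pay, p_bar


if __name__ == "__main__":
    print(expected_payoffs((1 / 3, 1 / 3, 1 / 3), b=3.0, c=1.0, w=0.9))
```

One consequence visible in the matrix: $P_z - P_x = cwy \ge 0$, so TFT never does worse than ALLC.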
Replicator Dynamics
replicator equation on the unit simplex:
$\dot{x} = x (P_x - \bar{P})$
$\dot{y} = y (P_y - \bar{P})$
$\dot{z} = z (P_z - \bar{P})$
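The replicator equation can be integrated numerically. This sketch uses a plain forward-Euler step (a simple choice, not from the source; step size, horizon and parameter values are arbitrary) with the per-round payoff matrix for ALLC, ALLD and TFT:

```python
def payoff_matrix(b, c, w):
    """Per-round payoffs; rows = focal strategy, columns = co-player (ALLC, ALLD, TFT)."""
    return [
        [b - c, -c, b - c],            # ALLC
        [b, 0.0, b * (1 - w)],         # ALLD
        [b - c, -c * (1 - w), b - c],  # TFT
    ]


def replicator(freqs, b, c, w, dt=0.01, steps=20_000):
    """Forward-Euler integration of x_i' = x_i (P_i - P_bar) on the unit simplex."""
    m = payoff_matrix(b, c, w)
    x = list(freqs)
    for _ in range(steps):
        pay = [sum(m[i][j] * x[j] for j in range(3)) for i in range(3)]
        p_bar = sum(xi * pi for xi, pi in zip(x, pay))
        # the increments sum to zero, so x stays on the simplex (up to rounding)
        x = [xi + dt * xi * (pi - p_bar) for xi, pi in zip(x, pay)]
    return x
```

With $b = 3$, $c = 1$, $w = 0.9$, starting from $(0.1, 0.1, 0.8)$ ALLD dies out, whereas starting from $(0.03, 0.95, 0.02)$ ALLD takes over.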
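middle zone: $\frac{c}{wb} > z > \frac{(1-w)c}{w(b-c)}$
(the line $P_y = P_z$, where ALLD and TFT fare equally well, meets the TFT-ALLC edge at $z = \frac{c}{wb}$ and the TFT-ALLD edge at $z = \frac{(1-w)c}{w(b-c)}$)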
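On each edge of the simplex, $P_y - P_z = c - cwy - bwz$ (using $x + y + z = 1$) vanishes exactly at the bounds of the middle zone. A Python check with the arbitrary values $b = 3$, $c = 1$, $w = 0.9$ (function names are illustrative):

```python
def middle_zone(b, c, w):
    """(lower, upper): z-values where P_y = P_z on the TFT-ALLD and TFT-ALLC edges."""
    lower = (1 - w) * c / (w * (b - c))   # edge x = 0: only ALLD and TFT
    upper = c / (w * b)                   # edge y = 0: only ALLC and TFT
    return lower, upper


def alld_minus_tft(x, y, z, b, c, w):
    """P_y - P_z, computed directly from the per-round payoff matrix."""
    p_y = b * x + 0.0 * y + b * (1 - w) * z
    p_z = (b - c) * x - c * (1 - w) * y + (b - c) * z
    return p_y - p_z


if __name__ == "__main__":
    print(middle_zone(3.0, 1.0, 0.9))
```

The lower bound is below the upper one exactly when $wb > c$.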
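IPD with errors
probability $\varepsilon$ to mis-implement a move
stochastic strategies $(f, p, q)$:
$f$: prob. to cooperate in the initial round
$p$: prob. to cooperate after the co-player cooperated
$q$: prob. to cooperate after the co-player defected
ALLC $(1,1,1) \to (1-\varepsilon, 1-\varepsilon, 1-\varepsilon)$
ALLD $(0,0,0) \to (\varepsilon, \varepsilon, \varepsilon)$
TFT $(1,1,0) \to (1-\varepsilon, 1-\varepsilon, \varepsilon)$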
$(f, p, q)$ against $(f', p', q')$: total payoff
$$\frac{-c\,(e + w r e') + b\,(e' + w r' e)}{(1-w)(1 - u w^2)}$$
where $r := p - q$, $r' := p' - q'$, $u := r r'$,
$e := (1-w) f + w q$, $e' := (1-w) f' + w q'$
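The closed formula can be checked against a direct computation: in each round a player cooperates with probability $q + r\,x'$, where $x'$ is the co-player's probability to have cooperated in the previous round, and the stage payoff is $b x' - c x$ (donation game). The sketch truncates the infinite sum, so it is only an approximation; strategies and parameter values are arbitrary:

```python
def closed_form_payoff(s1, s2, b, c, w):
    """Total payoff of (f, p, q) against (f', p', q') from the closed formula."""
    f, p, q = s1
    f2, p2, q2 = s2
    r, r2 = p - q, p2 - q2
    u = r * r2
    e = (1 - w) * f + w * q
    e2 = (1 - w) * f2 + w * q2
    return (-c * (e + w * r * e2) + b * (e2 + w * r2 * e)) / ((1 - w) * (1 - u * w**2))


def iterated_payoff(s1, s2, b, c, w, rounds=5_000):
    """Sum_n w^n (b x'_n - c x_n), with x_n, x'_n the cooperation probabilities in round n."""
    f, p, q = s1
    f2, p2, q2 = s2
    x, x2 = f, f2
    total = 0.0
    for n in range(rounds):
        total += w**n * (b * x2 - c * x)
        # each player reacts to the co-player's move of round n
        x, x2 = q + (p - q) * x2, q2 + (p2 - q2) * x
    return total
```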
The limits $\varepsilon \to 0$ (no errors) and $w \to 1$ (infinitely many rounds) do not commute.
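For TFT against TFT both players use $(1-\varepsilon, 1-\varepsilon, \varepsilon)$, and the payoff formula, multiplied by $1 - w$, reduces to $(b-c)\,e/(1-wr)$ per round, with $r = 1 - 2\varepsilon$ and $e = (1-w)(1-\varepsilon) + w\varepsilon$. Taking the limits in the two orders gives different answers, as this Python sketch (with the arbitrary choice $b = 2$, $c = 1$) illustrates:

```python
def tft_per_round(eps, w, b=2.0, c=1.0):
    """Per-round payoff of TFT vs TFT when each move is mis-implemented with prob. eps."""
    f, p, q = 1 - eps, 1 - eps, eps          # TFT (1, 1, 0) with errors
    r = p - q
    e = (1 - w) * f + w * q
    # symmetric case of the payoff formula, multiplied by (1 - w):
    # (b - c) e (1 + w r) / (1 - r^2 w^2) = (b - c) e / (1 - w r)
    return (b - c) * e / (1 - w * r)


if __name__ == "__main__":
    # eps -> 0 first: b - c for every w < 1, hence b - c in the limit w -> 1
    print([tft_per_round(0.0, w) for w in (0.9, 0.99, 0.999999)])
    # w -> 1 first: tends to (b - c) / 2
    print([tft_per_round(eps, 1 - 1e-12) for eps in (0.1, 0.01, 0.001)])
```

In the second order of limits the long-run cooperation probability $q/(1-r) = \varepsilon/(2\varepsilon)$ is $1/2$, which halves the payoff.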
Evolving Generosity
Reacting on co-player
( f , p, q) strategies , with f 1
$(p, q)$ strategies:
$p$: prob. to play C after co-player's C
$q$: prob. to play C after co-player's D
(0,0) is ALLD
(1,0) is Tit For Tat
assume errors!
Adaptive Dynamics
let $x \in \mathbb{R}$ be some trait (sex ratio, prob. to escalate, ...)
resident population homogeneous, all $x$
mutant minority $y = x + h$, payoff $A(y, x)$, with $h$ small
can it invade? iff $W(h, x) := A(y, x) - A(x, x) > 0$
trait substitution sequence (mutation-limited)
$$\dot{x} = \frac{\partial W}{\partial h}(0, x) = \lim_{h \to 0} \frac{A(x+h, x) - A(x, x)}{h}$$
points towards favorable direction
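As an illustration only, take the hypothetical payoff $A(y, x) = y(1 - x) - y^2/2$ (not from the source). Then $\frac{\partial W}{\partial h}(0, x) = 1 - 2x$, so the trait should move towards $x = 1/2$. The sketch estimates the derivative by a central difference (a numerical choice) and integrates $\dot{x}$ with Euler steps:

```python
def A(y, x):
    """Hypothetical payoff of a mutant with trait y in a resident population of trait x."""
    return y * (1.0 - x) - 0.5 * y * y


def invasion_fitness(h, x):
    """W(h, x) = A(x + h, x) - A(x, x): positive iff the mutant x + h can invade."""
    return A(x + h, x) - A(x, x)


def selection_gradient(x, h=1e-6):
    """Central-difference estimate of dW/dh at h = 0."""
    return (invasion_fitness(h, x) - invasion_fitness(-h, x)) / (2 * h)


def adaptive_dynamics(x0, dt=0.01, steps=5_000):
    """Euler integration of x' = dW/dh(0, x)."""
    x = x0
    for _ in range(steps):
        x += dt * selection_gradient(x)
    return x


if __name__ == "__main__":
    print(adaptive_dynamics(0.1))
```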
Adaptive Dynamics for the IPD
consider strategies $n = (p, q)$
$p$: prob. to play C after co-player used C
$q$: prob. to play C after co-player used D
$A(n', n)$: payoff for player using $n' = (p', q')$ in population where everyone else plays $n$
Invader's payoff difference
$$A(n', n) - A(n, n) \approx \frac{\partial A}{\partial p'}(n', n)\,(p' - p) + \frac{\partial A}{\partial q'}(n', n)\,(q' - q)$$
(partial derivatives evaluated for $n' = n$)
the vector field $(\dot{p}, \dot{q})$ with
$$\dot{p} = \frac{\partial A}{\partial p'}(n', n), \qquad \dot{q} = \frac{\partial A}{\partial q'}(n', n)$$
points into the direction most advantageous for invader $n'$;
$n'$ can invade if $n' - n$ lies in the half-plane defined by $(\dot{p}, \dot{q})$, i.e. $(n' - n) \cdot (\dot{p}, \dot{q}) > 0$
For the IPD (donation game, in the limit $w \to 1$):
$$\dot{p} = q \, \frac{b(p-q) - c}{(1-(p-q))^2 \, (1+(p-q))}$$
$$\dot{q} = (1-p) \, \frac{b(p-q) - c}{(1-(p-q))^2 \, (1+(p-q))}$$
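These formulas can be checked numerically. Assuming, as in the limit $w \to 1$ with $|p - q| < 1$, that the long-run cooperation probabilities satisfy $x_{inv} = q' + r' x_{res}$ and $x_{res} = q + r x_{inv}$, the invader's payoff per round is $A(n', n) = b\,x_{res} - c\,x_{inv}$. A central-difference gradient of this payoff should match the closed form (the evaluation point is arbitrary):

```python
def payoff(inv, res, b, c):
    """Long-run payoff per round of a reactive invader inv = (p', q') against residents res = (p, q)."""
    p1, q1 = inv
    p, q = res
    r1, r = p1 - q1, p - q
    d = 1 - r * r1
    x_inv = (q1 + r1 * q) / d   # invader's cooperation probability
    x_res = (q + r * q1) / d    # resident's cooperation probability (against the invader)
    return b * x_res - c * x_inv


def gradient_numeric(p, q, b, c, h=1e-5):
    """(p_dot, q_dot) as central differences of A(n', n) at n' = n."""
    dp = (payoff((p + h, q), (p, q), b, c) - payoff((p - h, q), (p, q), b, c)) / (2 * h)
    dq = (payoff((p, q + h), (p, q), b, c) - payoff((p, q - h), (p, q), b, c)) / (2 * h)
    return dp, dq


def gradient_closed_form(p, q, b, c):
    r = p - q
    common = (b * r - c) / ((1 - r) ** 2 * (1 + r))
    return q * common, (1 - p) * common


if __name__ == "__main__":
    print(gradient_numeric(0.7, 0.2, 3.0, 1.0), gradient_closed_form(0.7, 0.2, 3.0, 1.0))
```

Both components share the factor $b(p - q) - c$, so away from the boundary the flow increases $p$ and $q$ exactly when $b(p - q) > c$.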
Reacting on last round
memory-one strategies $(p_R, p_S, p_T, p_P)$,
where $p_i$ is the prob. to play C after outcome $i$
(1,1,1,1) is ALLC
(0,0,0,0) is ALLD
(1,0,1,0) is TFT
(1,0,1,1) is Firm but Fair
(0,0,0,1) is Bully
16 non-probabilistic (deterministic) strategies
The fearsome four
Heteroclinic network
A = Tit For Tat
B = Firm But Fair
C = Bully
D = ALLD
…and the winner is…
If all possible probabilistic strategies $(p_R, p_S, p_T, p_P)$ are included (not only the deterministic strategies such as $(1,0,1,0)$ etc.), then evolution leads to the (deterministic) strategy $(1,0,0,1)$:
Win-Stay, Lose-Shift (WSLS)
$(p_R, p_S, p_T, p_P) = (1, 0, 0, 1)$
WSLS
cooperates iff both players used the same move in the previous round
win-stay, lose-shift:

self        C  C  D  D
co          C  D  C  D
payoff      R  S  T  P
next move   C  D  D  C
WSLS is error-correcting
WSLS against WSLS:
C C C ... C D D C C
C C C ... C C D C C
TFT against TFT:
C C C ... C D C D C D ...
C C C ... C C D C D C ...
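Both sequences can be reproduced with a few lines of Python: two deterministic memory-one strategies play each other, and player 1 makes a single mistake. The function `play` and its parameters are illustrative assumptions:

```python
# Memory-one strategies as the probability (here 0 or 1) to cooperate after R, S, T, P.
TFT = {"R": 1, "S": 0, "T": 1, "P": 0}
WSLS = {"R": 1, "S": 0, "T": 0, "P": 1}

OUTCOME = {("C", "C"): "R", ("C", "D"): "S", ("D", "C"): "T", ("D", "D"): "P"}


def play(s1, s2, rounds=8, error_round=3):
    """Both start with C; player 1 defects by mistake once, in round `error_round`."""
    m1, m2 = "C", "C"
    h1, h2 = [m1], [m2]
    for n in range(1, rounds):
        # each player looks up its own outcome of the previous round
        n1 = "C" if s1[OUTCOME[(m1, m2)]] else "D"
        n2 = "C" if s2[OUTCOME[(m2, m1)]] else "D"
        if n == error_round:
            n1 = "D"  # the single implementation error
        m1, m2 = n1, n2
        h1.append(m1)
        h2.append(m2)
    return "".join(h1), "".join(h2)


if __name__ == "__main__":
    print(play(WSLS, WSLS))
    print(play(TFT, TFT))
```

WSLS returns to mutual cooperation two rounds after the error, while TFT players keep alternating.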
WSLS is a 'simpleton' against ALLD:
WSLS   C D C D C D C ...
ALLD   D D D D D D D ...
payoff per round: WSLS gets $\frac{P+S}{2}$, ALLD gets $\frac{P+T}{2}$
WSLS cannot invade ALLD (since $\frac{P+S}{2} < P$)
ALLD cannot invade WSLS if $R > \frac{P+T}{2}$ (i.e. $2c < b$)
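In the donation game ($R = b - c$, $S = -c$, $T = b$, $P = 0$) the condition $R > (P + T)/2$ reads $b - c > b/2$, i.e. $b > 2c$. A short Python check (names and values are illustrative):

```python
def donation_payoffs(b, c):
    return {"R": b - c, "S": -c, "T": b, "P": 0.0}


def wsls_vs_alld(b, c, rounds=1000):
    """Average payoffs per round (WSLS, ALLD) when WSLS starts with C."""
    pay = donation_payoffs(b, c)
    move, total_wsls, total_alld = "C", 0.0, 0.0
    for _ in range(rounds):
        if move == "C":   # WSLS gets S, ALLD gets T; WSLS lost, so it shifts to D
            total_wsls += pay["S"]
            total_alld += pay["T"]
            move = "D"
        else:             # both get P; WSLS lost, so it shifts to C
            total_wsls += pay["P"]
            total_alld += pay["P"]
            move = "C"
    return total_wsls / rounds, total_alld / rounds


def alld_cannot_invade_wsls(b, c):
    pay = donation_payoffs(b, c)
    return pay["R"] > (pay["P"] + pay["T"]) / 2


if __name__ == "__main__":
    print(wsls_vs_alld(3.0, 1.0), alld_cannot_invade_wsls(3.0, 1.0))
```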
Win-Stay, Lose-Shift WSLS
• Simple learning rule
• stable, error-correcting
• but needs a retaliator to prepare the ground
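Memory-one strategies
If $(p_R, p_S, p_T, p_P)$ plays against $(q_R, q_S, q_T, q_P)$, the transition matrix between the states $R, S, T, P$ is
$$Q = \begin{pmatrix}
p_R q_R & p_R(1-q_R) & (1-p_R)q_R & (1-p_R)(1-q_R) \\
p_S q_T & p_S(1-q_T) & (1-p_S)q_T & (1-p_S)(1-q_T) \\
p_T q_S & p_T(1-q_S) & (1-p_T)q_S & (1-p_T)(1-q_S) \\
p_P q_P & p_P(1-q_P) & (1-p_P)q_P & (1-p_P)(1-q_P)
\end{pmatrix}$$
which allows computation of the average payoff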
stationary distribution $s = (s_R, s_S, s_T, s_P) \in \Delta_4$ with $sQ = s$;
then the payoff per round for player I is
$s \cdot g_I$ with $g_I = (R, S, T, P)$,
i.e., $s_R R + s_S S + s_T T + s_P P$
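A minimal Python sketch of this computation: build $Q$, solve $sQ = s$ with $\sum_i s_i = 1$ by Gaussian elimination (assuming the stationary distribution is unique, e.g. when all probabilities lie strictly between 0 and 1), and take dot products with $g_I$ and $g_{II} = (R, T, S, P)$. The default payoff values $R = 3, S = 0, T = 5, P = 1$ are an assumption, not from the slide:

```python
def transition_matrix(p, q):
    """States ordered (R, S, T, P) from player I's view; in I's state S, player II saw T."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    rows = [(pR, qR), (pS, qT), (pT, qS), (pP, qP)]
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)] for a, b in rows]


def stationary(Q):
    """Solve s Q = s with sum(s) = 1 (assumes a unique stationary distribution)."""
    n = len(Q)
    # balance equations for columns 0..n-2, plus the normalisation row
    A = [[Q[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n - 1)]
    A.append([1.0] * n)
    rhs = [0.0] * (n - 1) + [1.0]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
            rhs[k] -= f * rhs[col]
    s = [0.0] * n
    for k in reversed(range(n)):
        s[k] = (rhs[k] - sum(A[k][j] * s[j] for j in range(k + 1, n))) / A[k][k]
    return s


def payoffs(p, q, R=3.0, S=0.0, T=5.0, P=1.0):
    """(P_I, P_II) = (s . g_I, s . g_II) with g_I = (R, S, T, P), g_II = (R, T, S, P)."""
    s = stationary(transition_matrix(p, q))
    g_I, g_II = (R, S, T, P), (R, T, S, P)
    return sum(x * g for x, g in zip(s, g_I)), sum(x * g for x, g in zip(s, g_II))


if __name__ == "__main__":
    print(payoffs((0.9, 0.1, 0.1, 0.9), (0.5, 0.5, 0.5, 0.5)))
```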
A new breath:
Press and Dyson, PNAS 2012
AMS homepage ('Maths in the Media'):
'The world of game theory is currently on fire...'
'this is a monumental surprise...'
'the emerging revolution of game theory...'
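Dyson's formula
Define
$$D(p, q, x) := \det \begin{pmatrix}
-1 + p_R q_R & -1 + p_R & -1 + q_R & x_1 \\
p_S q_T & -1 + p_S & q_T & x_2 \\
p_T q_S & p_T & -1 + q_S & x_3 \\
p_P q_P & p_P & q_P & x_4
\end{pmatrix};$$
then
$$P_I = \frac{D(p, q, g_I)}{D(p, q, \mathbf{1})}$$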
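Dyson's formula can be evaluated directly with a 4x4 determinant. In this Python sketch the determinant uses Gaussian elimination, and the payoff values $R = 3, S = 0, T = 5, P = 1$ are an assumption. For two strategies that ignore the past, $(a, a, a, a)$ against $(b, b, b, b)$, the moves are independent, so $P_I$ must equal $abR + a(1-b)S + (1-a)bT + (1-a)(1-b)P$:

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, result = len(A), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        if A[piv][col] == 0.0:
            return 0.0
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            result = -result
        result *= A[col][col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
    return result


def D(p, q, x):
    """Dyson's determinant with last column x = (x1, x2, x3, x4)."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    return det([[-1 + pR * qR, -1 + pR, -1 + qR, x[0]],
                [pS * qT, -1 + pS, qT, x[1]],
                [pT * qS, pT, -1 + qS, x[2]],
                [pP * qP, pP, qP, x[3]]])


def payoff_I(p, q, R=3.0, S=0.0, T=5.0, P=1.0):
    return D(p, q, (R, S, T, P)) / D(p, q, (1.0, 1.0, 1.0, 1.0))


if __name__ == "__main__":
    print(payoff_I((0.8,) * 4, (0.3,) * 4))
```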
Zero-determinant (ZD) strategies
$(p_R, p_S, p_T, p_P)$ such that (for given reals $\alpha, \beta, \gamma$)
$p_R - 1 = \alpha R + \beta R + \gamma$
$p_S - 1 = \alpha S + \beta T + \gamma$
$p_T = \alpha T + \beta S + \gamma$
$p_P = \alpha P + \beta P + \gamma$
Press and Dyson: if player I uses this strategy, then
$\alpha P_I + \beta P_{II} + \gamma = 0$
no matter what player II is playing.
(Plotkin, Stewart, Hilbe, Nowak, Adami, Hintze, Akin...)
Equivalently,
$p = (p_R, p_S, p_T, p_P) = \alpha g_I + \beta g_{II} + \gamma \mathbf{1} + g_0$
with $g_I = (R, S, T, P)$, $g_{II} = (R, T, S, P)$,
$\mathbf{1} = (1, 1, 1, 1)$,
$g_0 = (1, 1, 0, 0)$.
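The vector form makes it easy to generate ZD strategies from $(\alpha, \beta, \gamma)$ and to check that they are valid probabilities. The sketch assumes the conventional payoff values $R = 3, S = 0, T = 5, P = 1$ (not specified on the slide). With $\alpha = \phi$, $\beta = -3\phi$, $\gamma = 2\phi$ and $\phi = 1/26$ it yields $(11/13, 1/2, 7/26, 0)$:

```python
def zd_strategy(alpha, beta, gamma, R=3.0, S=0.0, T=5.0, P=1.0):
    """p = alpha g_I + beta g_II + gamma 1 + g_0; raises if p is not a valid strategy."""
    g_I, g_II = (R, S, T, P), (R, T, S, P)
    one, g_0 = (1, 1, 1, 1), (1, 1, 0, 0)
    p = tuple(alpha * a + beta * b + gamma * o + z for a, b, o, z in zip(g_I, g_II, one, g_0))
    tol = 1e-12
    if not all(-tol <= x <= 1.0 + tol for x in p):
        raise ValueError(f"not a valid strategy: {p}")
    return tuple(min(1.0, max(0.0, x)) for x in p)  # remove rounding noise


if __name__ == "__main__":
    phi = 1 / 26
    print(zd_strategy(phi, -3 * phi, 2 * phi))   # chi = -beta / alpha = 3
    print(zd_strategy(0.0, -0.1, 0.2))           # alpha = 0
```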
Examples of ZD-strategies
$\alpha P_I + \beta P_{II} + \gamma = 0$
Equalizers: if $\alpha = 0 \neq \beta$, then $P_{II} = -\gamma/\beta$ (player I can fix this at any value between $P$ and $R$)
Extortioners: if $\chi := -\beta/\alpha \geq 1$ and $\gamma = -(\alpha + \beta)P$, then
$P_I - P = \chi (P_{II} - P)$
(own 'surplus' $P_I - P$ over the maximin payoff $P$ is always the $\chi$-fold of the co-player's surplus)
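Both kinds of examples can be checked numerically with Dyson's formula, applied once with $g_I$ and once with $g_{II} = (R, T, S, P)$ in the last column. The sketch assumes $R = 3, S = 0, T = 5, P = 1$ and uses two ZD strategies for these values: the equalizer $(0.9, 0.7, 0.2, 0.1)$ ($\alpha = 0$, $\beta = -0.1$, $\gamma = 0.2$, so $P_{II} = 2$) and the extortioner $(11/13, 1/2, 7/26, 0)$ ($\chi = 3$). The random opponents are an arbitrary choice:

```python
import random


def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n, result = len(A), 1.0
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(A[k][col]))
        if A[piv][col] == 0.0:
            return 0.0
        if piv != col:
            A[col], A[piv] = A[piv], A[col]
            result = -result
        result *= A[col][col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            for j in range(col, n):
                A[k][j] -= f * A[col][j]
    return result


def D(p, q, x):
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    return det([[-1 + pR * qR, -1 + pR, -1 + qR, x[0]],
                [pS * qT, -1 + pS, qT, x[1]],
                [pT * qS, pT, -1 + qS, x[2]],
                [pP * qP, pP, qP, x[3]]])


def payoffs(p, q, R=3.0, S=0.0, T=5.0, P=1.0):
    """(P_I, P_II) via Dyson's formula."""
    d1 = D(p, q, (1.0, 1.0, 1.0, 1.0))
    return D(p, q, (R, S, T, P)) / d1, D(p, q, (R, T, S, P)) / d1


if __name__ == "__main__":
    rng = random.Random(0)
    equalizer, extortioner = (0.9, 0.7, 0.2, 0.1), (11 / 13, 1 / 2, 7 / 26, 0.0)
    for _ in range(3):
        q = tuple(rng.uniform(0.05, 0.95) for _ in range(4))
        p_I, p_II = payoffs(extortioner, q)
        print(payoffs(equalizer, q)[1], (p_I - 1) / (p_II - 1))
```

For a ZD strategy the second column of the determinant is $p - g_0 = \alpha g_I + \beta g_{II} + \gamma \mathbf{1}$, so $D(p, q, \alpha g_I + \beta g_{II} + \gamma \mathbf{1})$ has two equal columns and vanishes, which is where the name comes from.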
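$s_i(n)$: prob. that I gets payoff $i \in \{R, S, T, P\}$ in round $n$
$s_R(n+1) + s_S(n+1)$ is the prob. that I plays C in round $n+1$, i.e.
$s_R(n+1) + s_S(n+1) = s_R(n) p_R + s_S(n) p_S + s_T(n) p_T + s_P(n) p_P = s(n) \cdot p = s(n) \cdot (\alpha g_I + \beta g_{II} + \gamma \mathbf{1} + g_0)$
Since $s(n) \cdot g_0 = s_R(n) + s_S(n)$, $s(n) \cdot g_I = P_I(n)$, $s(n) \cdot g_{II} = P_{II}(n)$ and $s(n) \cdot \mathbf{1} = 1$, we get with
$w(n) := s_R(n+1) + s_S(n+1) - (s_R(n) + s_S(n))$
that
$w(n) = \alpha P_I(n) + \beta P_{II}(n) + \gamma$,
and hence
$$\frac{1}{N} \sum_{n=0}^{N-1} w(n) = \frac{s_R(N) + s_S(N) - s_R(0) - s_S(0)}{N} \to 0 \quad (N \to \infty).$$
The left-hand side tends to $\alpha P_I + \beta P_{II} + \gamma$, where $P_I, P_{II}$ are the average payoffs per round, so $\alpha P_I + \beta P_{II} + \gamma = 0$.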
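The identity $w(n) = \alpha P_I(n) + \beta P_{II}(n) + \gamma$ holds in every single round, not only on average. This Python sketch verifies it by iterating $s(n+1) = s(n)Q$; the starting state, the payoff values $R = 3, S = 0, T = 5, P = 1$ and the opponents are arbitrary choices for illustration:

```python
def transition_matrix(p, q):
    """States ordered (R, S, T, P) from player I's view."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    rows = [(pR, qR), (pS, qT), (pT, qS), (pP, qP)]
    return [[a * b, a * (1 - b), (1 - a) * b, (1 - a) * (1 - b)] for a, b in rows]


def check_identity(alpha, beta, gamma, q, rounds=50, R=3.0, S=0.0, T=5.0, P=1.0):
    """Largest deviation |w(n) - (alpha P_I(n) + beta P_II(n) + gamma)| over the first rounds."""
    g_I, g_II, g_0 = (R, S, T, P), (R, T, S, P), (1, 1, 0, 0)
    p = tuple(alpha * gi + beta * gii + gamma + g0 for gi, gii, g0 in zip(g_I, g_II, g_0))
    Q = transition_matrix(p, q)
    s = [1.0, 0.0, 0.0, 0.0]  # round 0: mutual cooperation (an arbitrary start)
    worst = 0.0
    for _ in range(rounds):
        s_next = [sum(s[i] * Q[i][j] for i in range(4)) for j in range(4)]
        w_n = (s_next[0] + s_next[1]) - (s[0] + s[1])
        rhs = (alpha * sum(x * g for x, g in zip(s, g_I))
               + beta * sum(x * g for x, g in zip(s, g_II)) + gamma)
        worst = max(worst, abs(w_n - rhs))
        s = s_next
    return worst


if __name__ == "__main__":
    print(check_identity(1 / 26, -3 / 26, 2 / 26, (0.3, 0.6, 0.2, 0.7)))
```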
Characterizations
ZD-strategies (donation game: $R = b - c$, $S = -c$, $T = b$, $P = 0$):
$p_R + p_P = p_S + p_T$
Equalizers:
$(b - c)(1 - p_S + p_T) = (b + c)(1 - p_R + p_P)$
Extortioners:
pP 0 and pT (c b) (1 pS )(b c)
All reactive strategies are ZD
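For the donation game these characterizations can be spot-checked: strategies built from the ZD definition satisfy $p_R + p_P = p_S + p_T$, equalizers ($\alpha = 0$) also satisfy the second relation, and a reactive strategy $(p, q)$, viewed as the memory-one strategy $(p, q, p, q)$, satisfies the ZD relation trivially. The parameter values in this Python sketch are arbitrary:

```python
def zd_donation(alpha, beta, gamma, b, c):
    """ZD strategy from the definition, with R = b - c, S = -c, T = b, P = 0."""
    R, S, T, P = b - c, -c, b, 0.0
    return (1 + alpha * R + beta * R + gamma,
            1 + alpha * S + beta * T + gamma,
            alpha * T + beta * S + gamma,
            alpha * P + beta * P + gamma)


def is_zd(p, tol=1e-12):
    pR, pS, pT, pP = p
    return abs((pR + pP) - (pS + pT)) < tol


def equalizer_condition(p, b, c, tol=1e-12):
    pR, pS, pT, pP = p
    return abs((b - c) * (1 - pS + pT) - (b + c) * (1 - pR + pP)) < tol


if __name__ == "__main__":
    print(zd_donation(0.0, -0.1, 0.1, 3.0, 1.0))   # an equalizer: (0.9, 0.8, 0.2, 0.1)
```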
Extortion does not pay
Pairwise comparison of extortion with other strategies:
• neutral against AllD
• stable coexistence with AllC
• weakly dominated by TFT
• dominated by WSLS
If all five strategies are considered: no Nash equilibrium involves extortion
Complier strategies
if $\exists\, \chi > 1$ such that $R - P_I = \chi (R - P_{II})$
for any strategy adopted by player II
Akin (AMS Monthly): a memory-one strategy is 'good' if
$p_R = 1$ and whenever $P_I < R$, then $P_{II} < R$;
this holds exactly when $(T - R)\,p_T < (R - S)(1 - p_S)$
and
$(T - R)\,p_P < (R - P)(1 - p_S)$.
Complier strategies are precisely those ZD-strategies which are good.
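Akin's two inequalities are easy to test in code. Assuming the conventional values $R = 3, S = 0, T = 5, P = 1$, Tit For Tat is good, while WSLS fails the second inequality only marginally ($2 < 2$ is false); with $T = 4$ it passes. A minimal Python sketch:

```python
def is_good(p, R=3.0, S=0.0, T=5.0, P=1.0):
    """Akin's conditions for a memory-one strategy (p_R, p_S, p_T, p_P) to be 'good'."""
    pR, pS, pT, pP = p
    return (pR == 1
            and (T - R) * pT < (R - S) * (1 - pS)
            and (T - R) * pP < (R - P) * (1 - pS))


if __name__ == "__main__":
    for name, p in [("TFT", (1, 0, 1, 0)), ("WSLS", (1, 0, 0, 1)), ("ALLC", (1, 1, 1, 1))]:
        print(name, is_good(p))
```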