A Powerful Long Memory Strategy for the
Prisoner’s Dilemma
Marc Harper, Dashiell Fryer, Chris Lee
UCLA, Pomona
Jan. 13th, 2015
In this talk I will explain how we use machine learning and
statistical inference to create a highly effective strategy for
population games, called IP0.
The Prisoner’s Dilemma
The prisoner’s dilemma is a game played between two players, each
choosing to cooperate or defect, with the following payouts:
          C       D
  C     R, R    S, T
  D     T, S    P, P
with T > R > P > S (row player's payoff listed first)
The Prisoner’s Dilemma
A large class of strategies: memory-one strategies, specified by
conditional probabilities on the last round of play:
(Pr(C | CC), Pr(C | CD), Pr(C | DC), Pr(C | DD))
ALLC: (1,1,1,1)
ALLD: (0,0,0,0)
TFT: (1,0,1,0)
WSLS: (1,0,0,1)
GTFT: (1 − , , 1 − , )
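As a rough illustration of how such a vector drives play, the sketch below simulates a match between two memory-one players. The payoff values (R, S, T, P) = (3, 0, 5, 1), the opening state CC, and all function names are assumptions made for this example; the slides do not specify them.

```python
import random

# Assumed payoff values satisfying T > R > P > S (not given on the slides).
R, S, T, P = 3, 0, 5, 1
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}
# Position of each last-round outcome (own move, other's move) in a strategy vector.
STATE_INDEX = {("C", "C"): 0, ("C", "D"): 1, ("D", "C"): 2, ("D", "D"): 3}

ALLC, ALLD = (1, 1, 1, 1), (0, 0, 0, 0)
TFT, WSLS = (1, 0, 1, 0), (1, 0, 0, 1)


def play(p, q, rounds, seed=0, start=("C", "C")):
    """Play `rounds` rounds between memory-one vectors p and q; return total scores."""
    rng = random.Random(seed)
    x_last, y_last = start  # assumed opening state: both "remember" cooperating
    x_score = y_score = 0
    for _ in range(rounds):
        # Each player conditions on the last round as seen from its own side.
        x_move = "C" if rng.random() < p[STATE_INDEX[(x_last, y_last)]] else "D"
        y_move = "C" if rng.random() < q[STATE_INDEX[(y_last, x_last)]] else "D"
        x_pay, y_pay = PAYOFF[(x_move, y_move)]
        x_score += x_pay
        y_score += y_pay
        x_last, y_last = x_move, y_move
    return x_score, y_score


print(play(TFT, ALLD, rounds=5))  # (4, 9): TFT is exploited once, then both defect
```

With deterministic vectors the seed has no effect; it only matters once probabilities strictly between 0 and 1 (or noise) are introduced.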
Markov Process Viewpoint
We can view the iterated Prisoner’s Dilemma as a Markov chain on
the states {CC, CD, DC, DD}.
In 2012, Press and Dyson (PNAS) computed a formula for the dot
product of the stationary distribution of the chain with any vector:
\[
D(p, q, f) = \det \begin{bmatrix}
-1 + p_1 q_1 & -1 + p_1 & -1 + q_1 & f_1 \\
p_2 q_3 & -1 + p_2 & q_3 & f_2 \\
p_3 q_2 & p_3 & -1 + q_2 & f_3 \\
p_4 q_4 & p_4 & q_4 & f_4
\end{bmatrix} \tag{1}
\]
In particular, this allows one to compute the expected long term
payouts of the IPD and gave rise to Zero Determinant strategies.
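To make the use of the determinant concrete, here is a minimal sketch that evaluates D(p, q, f) and forms the long-run payoffs as ratios D(p, q, S) / D(p, q, 1), following Press and Dyson. The payoff values (3, 0, 5, 1) and the function names are assumptions for illustration, and the ratio is only defined when the chain has a unique stationary distribution (for example, when moves are noisy).

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def press_dyson_D(p, q, f):
    """Press-Dyson determinant D(p, q, f) for memory-one vectors p and q."""
    p1, p2, p3, p4 = p
    q1, q2, q3, q4 = q
    return det([
        [-1 + p1 * q1, -1 + p1, -1 + q1, f[0]],
        [p2 * q3, -1 + p2, q3, f[1]],
        [p3 * q2, p3, -1 + q2, f[2]],
        [p4 * q4, p4, q4, f[3]],
    ])


def long_run_payoffs(p, q, R=3, S=0, T=5, P=1):
    """Expected long-run payoffs (s_X, s_Y); payoff defaults are assumed values."""
    norm = press_dyson_D(p, q, [1, 1, 1, 1])
    s_x = press_dyson_D(p, q, [R, S, T, P]) / norm
    s_y = press_dyson_D(p, q, [R, T, S, P]) / norm
    return s_x, s_y


# Noisy TFT against noisy ALLD, with each intended move flipped 5% of the time.
noisy_tft = (0.95, 0.05, 0.95, 0.05)
noisy_alld = (0.05, 0.05, 0.05, 0.05)
print(long_run_payoffs(noisy_tft, noisy_alld))
```

Because D is linear in f, the same two determinant evaluations give the long-run value of any per-state quantity, not only the payoffs.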
Zero Determinant Strategies
- Zero determinant strategies are highly effective at the two
  player IPD
- In fact, one can often show that there are essentially no better
  strategies, even if the history of play is taken into account
- A ZD player can set the difference in stationary payouts in the
  long run
- Breaks down into an ultimatum game for two “theory of
  mind” players
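One concrete instance from Press and Dyson is the extortionate family, which enforces s_X - P = chi (s_Y - P) against any memory-one opponent. The sketch below builds such a vector and checks the relation numerically by iterating the Markov chain. The payoffs (3, 0, 5, 1), the example opponents, and the helper names are assumptions for illustration.

```python
# Assumed conventional payoffs (T > R > P > S); not specified on the slides.
R, S, T, P = 3, 0, 5, 1


def extortionate(chi, phi):
    """Press-Dyson extortionate vector: p~ = phi * [(S_X - P) - chi * (S_Y - P)]."""
    return (
        1 - phi * (chi - 1) * (R - P),
        1 - phi * ((P - S) + chi * (T - P)),
        phi * ((T - P) + chi * (P - S)),
        0.0,
    )


def stationary_payoffs(p, q, steps=5000):
    """Long-run payoffs from power iteration on the states (CC, CD, DC, DD)."""
    swap = [0, 2, 1, 3]  # the opponent sees CD and DC exchanged
    M = []
    for s in range(4):
        px, qy = p[s], q[swap[s]]
        M.append([px * qy, px * (1 - qy), (1 - px) * qy, (1 - px) * (1 - qy)])
    v = [0.25] * 4
    for _ in range(steps):
        v = [sum(v[i] * M[i][j] for i in range(4)) for j in range(4)]
    s_x = v[0] * R + v[1] * S + v[2] * T + v[3] * P
    s_y = v[0] * R + v[1] * T + v[2] * S + v[3] * P
    return s_x, s_y


p = extortionate(chi=3, phi=1 / 26)  # equals (11/13, 1/2, 7/26, 0)
for q in [(0.7, 0.2, 0.6, 0.3), (0.9, 0.5, 0.4, 0.1)]:
    s_x, s_y = stationary_payoffs(p, q)
    # The two printed numbers should agree for every opponent q.
    print(round(s_x - P, 6), round(3 * (s_y - P), 6))
```

The value of phi only has to keep all four entries within [0, 1]; it does not change the enforced ratio.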
Population Games
- Since Press and Dyson much work on ZD strategies has
  followed, particularly for population games
- In a typical population of N players, each player confronts
  each other player every round
- The population is updated via a birth-death or imitation
  process based on each player’s total payout (e.g. the Moran
  process)
- We assume some background noise in move selection (allows
  for more interesting dynamics)
Winning Population Games
Central questions for strategies in population games:
- Can a mutant invade a population of players of another
  strategy?
- Is a population of a particular strategy resistant to invasion?
- Can coalitions be formed in a natural way (e.g. without prior
  agreement), allowing for cooperation to arise?
Our Strategy: IP0
We defined a strategy called IP0 based on principles from Sun
Tzu’s The Art of War:
- The general who wins the battle makes many calculations in
  his temple before the battle is fought. The general who loses
  makes but few calculations beforehand.
- Know your enemy and know yourself, find naught in fear for
  100 battles.
- ...what is of supreme importance in war is to attack the
  enemy’s strategy.
- One defends when his strength is inadequate, he attacks when
  it is abundant.
Know Your Enemy
- Rather than seeking to maximize its score, IP0 initially seeks
  to maximize its information about another player’s strategy
  vector.
- For the first 10 rounds vs. a specific player, IP0 selects its
  plays, either cooperate (C) or defect (D), solely to maximize
  its information yield about the other player’s strategy vector
  probabilities.
- We refer to this as the information gain phase. The four
  probabilities are estimated from these rounds of play and are
  continually refined in subsequent rounds.
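The estimation step can be sketched as simple counting. The slides do not say which estimator IP0 uses, so the version below is an assumption: it keeps, for each of the four last-round outcomes, a count of how often the opponent then cooperated, and reports the posterior mean under a uniform Beta(1, 1) prior. The function name and the history format are hypothetical, and the choice of which moves to play during the information gain phase is not modeled.

```python
# Outcomes from the opponent's point of view: (opponent's last move, my last move).
STATES = ["CC", "CD", "DC", "DD"]


def estimate_strategy(history, prior_c=1.0, prior_d=1.0):
    """Posterior-mean estimate of the opponent's memory-one vector.

    history: list of (my_move, their_move) pairs in order of play.
    prior_c, prior_d: assumed Beta prior pseudo-counts for C and D.
    """
    coop = dict.fromkeys(STATES, 0)
    total = dict.fromkeys(STATES, 0)
    # Pair each round with the next one to see how the opponent responded.
    for (my_prev, their_prev), (_, their_now) in zip(history, history[1:]):
        state = their_prev + my_prev
        total[state] += 1
        coop[state] += their_now == "C"
    return [(coop[s] + prior_c) / (total[s] + prior_c + prior_d) for s in STATES]


history = [("C", "C"), ("C", "C"), ("D", "C"), ("C", "D"), ("C", "C")]
print(estimate_strategy(history))  # looks TFT-like: high after I cooperate, low after I defect
```

Because the estimate is just a ratio of counts, it can be refined after every later round by appending to the history, in the spirit of the continual refinement described above.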
Figure: Accuracy of the information gain phase. Left: ROC curves for
ε = 0.05 and 10 infogain rounds (vertical axis: true positives, TP;
horizontal axis: false positives, FP), for ZDR, ZDX, TFT, WSLS, ALLC,
and ALLD. ZDR is the hardest strategy to recognize among those tested.
Right: AUC for IP0 against ZDR for ε = 0, 0.01, 0.05, 0.1, as a
function of the number of infogain rounds.
Know Yourself
- Each IP0 individual attempts to identify whether each other
  player is also IP0, based purely on whether it appears to
  “play like me” (choose the same moves an IP0 would have
  chosen).
- In particular, the information gain phase produces a unique
  pattern of play that can be quickly recognized (within 3 to 10
  moves), even in the presence of random noise (randomly
  flipped moves).
- Note however that each IP0 player acts completely
  independently; different IP0 players in a population share no
  information and do not communicate.
Attack the Enemy’s Strategy
Since the goal is to dominate the population or resist invasion,
each IP0 seeks to maximize its long term payout relative to the
other player type(s). IP0:
- always seeks to cooperate with other IP0 individuals (for IPD)
- chooses the optimal strategy vector based on its estimate of
  the opposing type’s strategy vector using the Press and Dyson
  determinant
- updates its estimates and counter-strategies as play proceeds
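A simplified version of the counter-strategy step can be sketched as a search over candidate vectors scored with the Press and Dyson determinant. Everything specific here is an assumption made for illustration: the objective (long-run payoff difference s_X - s_Y), the candidate set (the 16 deterministic memory-one vectors with noise eps), the payoffs (3, 0, 5, 1), and the function names. The actual IP0 optimization may differ, for instance by accounting for population proportions.

```python
from itertools import product

R, S, T, P = 3, 0, 5, 1  # assumed payoff values


def det(m):
    """Determinant by Laplace expansion along the first row."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * a * det([r[:j] + r[j + 1:] for r in m[1:]])
               for j, a in enumerate(m[0]))


def press_dyson_D(p, q, f):
    """Press-Dyson determinant D(p, q, f)."""
    return det([
        [-1 + p[0] * q[0], -1 + p[0], -1 + q[0], f[0]],
        [p[1] * q[2], -1 + p[1], q[2], f[1]],
        [p[2] * q[1], p[2], -1 + q[1], f[2]],
        [p[3] * q[3], p[3], q[3], f[3]],
    ])


def payoffs(p, q):
    """Long-run payoffs (s_X, s_Y) as ratios of determinants."""
    norm = press_dyson_D(p, q, [1, 1, 1, 1])
    return (press_dyson_D(p, q, [R, S, T, P]) / norm,
            press_dyson_D(p, q, [R, T, S, P]) / norm)


def best_counter(q_est, eps=0.05):
    """Deterministic vector (played with noise eps) maximizing s_X - s_Y vs q_est."""
    best_gap, best_base = None, None
    for base in product((0, 1), repeat=4):
        p = tuple(1 - eps if b else eps for b in base)
        s_x, s_y = payoffs(p, q_est)
        if best_gap is None or s_x - s_y > best_gap:
            best_gap, best_base = s_x - s_y, base
    return best_base


# Against an estimated noisy ALLC, the search selects always-defect.
print(best_counter((0.95, 0.95, 0.95, 0.95)))  # (0, 0, 0, 0)
```

Re-running the search whenever the estimate of the opponent's vector changes gives a crude analogue of updating counter-strategies as play proceeds.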
One defends... one attacks...
- IP0 naturally switches effective strategy depending on the
  proportion of IP0 in the population, and the opponent strategy
- Commonly, IP0 initially cooperates with the opposing type,
  when IP0 is in the minority, and later defects against the
  opposing type, when IP0 is in the majority
Figures: fitness difference versus population fraction (%), with curves
for IP0, ALLC, ALLD, ConDef, ZDR, and ZDt. Panels: TFT, ALLC, ALLD,
WSLS, ZDR, and ZDχ, each shown for ε = 0 and ε = 0.05.
Fixation Probabilities
         IP0    ALLC    ALLD    TFT     WSLS    ZDR     ZDχ
IP0      -      58.10   5.50    43.60   1.96    16.30   51.01
ALLC     0      -       0       49.48   0       21.14   54.78
ALLD     0      59.38   -       0       0.05    0       0
TFT      0      0       3.68    -       0       0       9.74
WSLS     0      34.72   0       7.11    -       0.32    21.16
ZDR      0      0       0.86    24.07   0       -       27.55
ZDχ      0      0       1.61    0       0       0       -
Table: Fixation odds ratios ρ/ρ_neutral of a single row player invading a
population of N − 1 = 99 column players, with an ambient error rate of
ε = 0.05. At least 10,000 simulations were performed for each pair of
types. For IP0, the p-value for the null hypothesis of neutral fixation is
p = 5 × 10^−10 for ALLD and p < 10^−26 otherwise.
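To help read the odds ratios, the sketch below computes the fixation probability of a single mutant exactly for a standard Moran birth-death process, using the classical sum-of-products formula, and divides by the neutral value 1/N. This is an illustrative assumption: the table itself was obtained by simulation with payoff-dependent fitness, and the `fitness` callback and function names here are hypothetical.

```python
def fixation_probability(N, fitness):
    """Fixation probability of one mutant in a Moran birth-death process.

    fitness(i) returns (mutant_fitness, resident_fitness) when i mutants
    are present among the N individuals.
    """
    total, prod = 1.0, 1.0
    for i in range(1, N):
        f_mut, f_res = fitness(i)
        prod *= f_res / f_mut  # ratio of the down-step to the up-step probability
        total += prod
    return 1.0 / total


def odds_ratio(N, fitness):
    """Fixation probability relative to the neutral value 1/N."""
    return fixation_probability(N, fitness) * N


def neutral(i):
    """Mutant and resident are equally fit at every mutant count."""
    return (1.0, 1.0)


def advantaged(i):
    """Mutant has a constant twofold fitness advantage (toy example)."""
    return (2.0, 1.0)


print(fixation_probability(100, neutral))  # neutral baseline, 1/N
print(odds_ratio(10, advantaged))
```

Under this reading, with N = 100 the neutral fixation probability is 0.01, so an odds ratio of 58.10 corresponds to a fixation probability of about 0.581.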
Thanks!
The Art of War: Beyond Memory-one Strategies in Population
Games, arXiv:1405.4327, Lee, Harper, Fryer