1 AUTOCRATIC STRATEGIES

1.1 ORIGINAL DISCOVERY
Recall that the transition matrix M for two interacting players X and Y with memory-one strategies p and q,
respectively, is given by


\[
M = \begin{pmatrix}
p_R q_R & p_R(1-q_R) & (1-p_R)q_R & (1-p_R)(1-q_R) \\
p_S q_T & p_S(1-q_T) & (1-p_S)q_T & (1-p_S)(1-q_T) \\
p_T q_S & p_T(1-q_S) & (1-p_T)q_S & (1-p_T)(1-q_S) \\
p_P q_P & p_P(1-q_P) & (1-p_P)q_P & (1-p_P)(1-q_P)
\end{pmatrix}.
\tag{1.1}
\]
M is a stochastic matrix (all rows sum up to 1) and hence has an eigenvalue of 1. The stationary vector v of
M satisfies
\[
v M = v
\]
or, equivalently,
\[
v M' = 0
\tag{1.2}
\]
with M' = M − I where I denotes the identity matrix. Using v we can immediately derive the average
payoffs per round, πX and πY , in an infinitely iterated game for players X and Y . All we need is to rewrite
the payoff matrix for each player as a vector: PX = (R, S, T, P ) and PY = (R, T, S, P ), respectively, to
obtain
\[
\pi_X = \frac{v \cdot P_X}{v \cdot \mathbf{1}}
\tag{1.3}
\]
\[
\pi_Y = \frac{v \cdot P_Y}{v \cdot \mathbf{1}}.
\tag{1.4}
\]
Note that the division by v · 1, with 1 = (1, 1, 1, 1), is only necessary if v is not normalized, i.e. if its
elements vi do not sum to 1.
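To make Eqs. (1.1)–(1.4) concrete, here is a minimal numerical sketch (in Python with numpy, a language choice of ours, not of the text). The payoff values are the conventional prisoner's dilemma numbers, the strategies p and q are arbitrary, and the helper names transition_matrix and stationary are hypothetical:
\begin{verbatim}
# Minimal numerical sketch of Eqs. (1.1)-(1.4); payoffs and strategies are
# arbitrary illustrative choices (conventional prisoner's dilemma values).
import numpy as np

R, S, T, P = 3.0, 0.0, 5.0, 1.0
PX = np.array([R, S, T, P])   # payoff vector of player X
PY = np.array([R, T, S, P])   # payoff vector of player Y

def transition_matrix(p, q):
    """Transition matrix M of Eq. (1.1) over the outcomes (R, S, T, P).

    p = (pR, pS, pT, pP) and q = (qR, qS, qT, qP) are the memory-one
    strategies of X and Y: cooperation probabilities after each outcome of
    the previous round, from the respective player's own perspective."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    return np.array([
        [pR*qR, pR*(1-qR), (1-pR)*qR, (1-pR)*(1-qR)],
        [pS*qT, pS*(1-qT), (1-pS)*qT, (1-pS)*(1-qT)],
        [pT*qS, pT*(1-qS), (1-pT)*qS, (1-pT)*(1-qS)],
        [pP*qP, pP*(1-qP), (1-pP)*qP, (1-pP)*(1-qP)],
    ])

def stationary(M):
    """Left eigenvector of M for eigenvalue 1, normalized so that v.1 = 1."""
    w, V = np.linalg.eig(M.T)
    v = np.real(V[:, np.argmin(np.abs(w - 1.0))])
    return v / v.sum()

p = np.array([0.8, 0.3, 0.6, 0.2])   # arbitrary memory-one strategy of X
q = np.array([0.9, 0.5, 0.4, 0.1])   # arbitrary memory-one strategy of Y

v = stationary(transition_matrix(p, q))
pi_X = v @ PX / (v @ np.ones(4))     # Eq. (1.3)
pi_Y = v @ PY / (v @ np.ones(4))     # Eq. (1.4)
print(pi_X, pi_Y)
\end{verbatim}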
We now set out to establish an interesting relationship between determinants derived from
M and its stationary distribution v, with rather unexpected consequences. For the adjugate of the matrix M',
we have
\[
\operatorname{adj}(M')\, M' = \det(M')\, I = 0
\tag{1.5}
\]
where the first equality reflects a property of the adjugate matrix, whereas the second equality follows because
M' is singular. Note that each element $a_{ij}$ of adj(A) is given by $a_{ij} = (-1)^{i+j} m_{ji}$, where $m_{ij}$ is the $(i, j)$-minor of A, i.e. the determinant of A after removing row i and column j.
Because of Eq. (1.2), every row of adj(M') must be proportional to v and hence we can find v based on
determinants of M'. Since det(A) is invariant under adding one column of A to another, we can rewrite M'
to obtain
\[
M'' = \begin{pmatrix}
p_R q_R - 1 & p_R - 1 & q_R - 1 & f_1 \\
p_S q_T & p_S - 1 & q_T & f_2 \\
p_T q_S & p_T & q_S - 1 & f_3 \\
p_P q_P & p_P & q_P & f_4
\end{pmatrix}
\tag{1.6}
\]
by adding the first column to the second and third one, as well as replacing the last column by an arbitrary
vector f , without affecting the derivation of v based on the last row of adj(M'). More specifically, the entries
a4j in the last row of adj(M') are (up to sign) the determinants of the 3 × 3 matrices obtained after deleting the last
column and row j from M'. Although every row of adj(M') is proportional to v, the last row does not depend on the last
column of M' and hence is the same for adj(M'') regardless of the choice of f .
With this, we have established a link between the stationary distribution v and the determinant of M'' via
the adjugate adj(M'):
\[
\det(M'') = a_{41} f_1 + a_{42} f_2 + a_{43} f_3 + a_{44} f_4 = v \cdot f.
\tag{1.7}
\]
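The chain of identities in Eqs. (1.2), (1.5) and (1.7) can be checked numerically. The sketch below, reusing the transition_matrix helper from the sketch above, computes the last row of adj(M') from its 3 × 3 minors, verifies that it annihilates M', and confirms that det(M'') equals its dot product with an arbitrary vector f; all concrete numbers are again illustrative assumptions:
\begin{verbatim}
# Numerical check of Eqs. (1.2), (1.5) and (1.7); reuses transition_matrix()
# from the sketch above; all numerical values are arbitrary.
import numpy as np

def adj_last_row(Mp):
    """Last row of adj(M'): a_{4j} = (-1)^{4+j} det(M' without row j, column 4)."""
    row = np.empty(4)
    for j in range(4):                       # j is 0-based; the 1-based index is j+1
        minor = np.delete(np.delete(Mp, j, axis=0), 3, axis=1)
        row[j] = (-1) ** (4 + (j + 1)) * np.linalg.det(minor)
    return row

def M_double_prime(p, q, f):
    """The matrix M'' of Eq. (1.6) with arbitrary final column f."""
    pR, pS, pT, pP = p
    qR, qS, qT, qP = q
    return np.array([
        [pR*qR - 1, pR - 1, qR - 1, f[0]],
        [pS*qT,     pS - 1, qT,     f[1]],
        [pT*qS,     pT,     qS - 1, f[2]],
        [pP*qP,     pP,     qP,     f[3]],
    ])

p = np.array([0.8, 0.3, 0.6, 0.2])
q = np.array([0.9, 0.5, 0.4, 0.1])
f = np.array([1.0, -2.0, 0.5, 3.0])

Mp = transition_matrix(p, q) - np.eye(4)     # M' = M - I
v_raw = adj_last_row(Mp)                     # proportional to the stationary vector v
print(np.allclose(v_raw @ Mp, 0))            # Eq. (1.2): v M' = 0
print(np.isclose(np.linalg.det(M_double_prime(p, q, f)), v_raw @ f))   # Eq. (1.7)
\end{verbatim}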
The reason why this exercise is worthwhile, and will turn out to be rather rewarding, lies in the particular
form of M''. More specifically, note that (i) the second column of M'' is under the sole control of player X;
(ii) the third column is similarly under the sole control of player Y ; and, finally, (iii) the last column is an
arbitrary vector f that ends up in the dot product with the stationary vector v.
For convenience let us define a new function D(p, q, f ) := det(M''), which also highlights the fact that
M'' is a function of the strategies, p and q, of the two players X and Y , as well as the arbitrary vector f .
Since f is arbitrary, we can actually set it to the payoff vectors PX and PY to obtain the average payoffs for
each player:
\[
\pi_X = \frac{v \cdot P_X}{v \cdot \mathbf{1}} = \frac{D(p, q, P_X)}{D(p, q, \mathbf{1})}
\tag{1.8}
\]
\[
\pi_Y = \frac{v \cdot P_Y}{v \cdot \mathbf{1}} = \frac{D(p, q, P_Y)}{D(p, q, \mathbf{1})}.
\tag{1.9}
\]
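A quick numerical check of Eqs. (1.8) and (1.9), reusing transition_matrix(), stationary() and M_double_prime() from the earlier sketches (with arbitrary illustrative strategies), confirms that the determinant ratios reproduce the payoffs obtained from the stationary vector:
\begin{verbatim}
# Check of Eqs. (1.8)-(1.9); reuses helpers from the earlier sketches.
import numpy as np

R, S, T, P = 3.0, 0.0, 5.0, 1.0
PX, PY, ones = np.array([R, S, T, P]), np.array([R, T, S, P]), np.ones(4)

def D(p, q, f):
    """D(p, q, f) := det(M'') as defined in the text."""
    return np.linalg.det(M_double_prime(p, q, f))

p = np.array([0.8, 0.3, 0.6, 0.2])
q = np.array([0.9, 0.5, 0.4, 0.1])
v = stationary(transition_matrix(p, q))      # normalized, so v.1 = 1

print(np.isclose(v @ PX, D(p, q, PX) / D(p, q, ones)))   # pi_X, Eq. (1.8)
print(np.isclose(v @ PY, D(p, q, PY) / D(p, q, ones)))   # pi_Y, Eq. (1.9)
\end{verbatim}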
Because πX and πY are linear in PX and PY , respectively, we can also form a linear combination of the
payoffs:
\[
\alpha \pi_X + \beta \pi_Y + \gamma = \frac{D(p, q, \alpha P_X + \beta P_Y + \gamma \mathbf{1})}{D(p, q, \mathbf{1})}
\tag{1.10}
\]
for some α, β, γ ∈ R. Interestingly, and notably after over 50 years of research in game theory and the
prisoner's dilemma in particular, the rather technical Eq. (1.10) caused quite a stir in the scientific community
because it enables players to unilaterally exert an unexpected level of control over iterated interactions. This
happens because either player can unilaterally set the right-hand side of Eq. (1.10) to zero: if player X chooses
her strategy such that the second column of M'', $(p_R - 1, p_S - 1, p_T, p_P)$, equals the final column
$\alpha P_X + \beta P_Y + \gamma \mathbf{1}$, the determinant in the numerator has two identical columns and
hence vanishes. Written out componentwise, player X achieves this by choosing a strategy
\[
\begin{aligned}
p_R &= \alpha R + \beta R + \gamma + 1 && (1.11)\\
p_S &= \alpha S + \beta T + \gamma + 1 && (1.12)\\
p_T &= \alpha T + \beta S + \gamma && (1.13)\\
p_P &= \alpha P + \beta P + \gamma && (1.14)
\end{aligned}
\]
for suitable α, β, γ ∈ R such that $p_i \in [0, 1]$, and player Y can accomplish the same feat by choosing his
strategy q analogously. Either case yields
\[
\alpha \pi_X + \beta \pi_Y + \gamma = 0
\tag{1.15}
\]
and hence either player can unilaterally enforce a linear relationship between his or her payoff and that of
the opponent. Because these strategies set a determinant in Eq. (1.10) to zero, this class of strategies was
termed zero-determinant strategies, or ZD strategies for short. The level of control that zero-determinant
strategies offer was previously thought to be impossible.
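The following sketch illustrates the construction of Eqs. (1.11)–(1.14) in code, reusing the helpers from the first sketch. The choice α, β, γ = 0.1, −0.2, 0.3 is an illustrative assumption that, for the conventional payoffs R, S, T, P = 3, 0, 5, 1, yields a valid strategy and enforces πX − 2πY + 3 = 0 against every memory-one opponent:
\begin{verbatim}
# Sketch of Eqs. (1.11)-(1.15): derive an autocratic (zero-determinant)
# strategy from alpha, beta, gamma and verify the enforced payoff relation.
import numpy as np

R, S, T, P = 3.0, 0.0, 5.0, 1.0
PX, PY = np.array([R, S, T, P]), np.array([R, T, S, P])

def autocratic_strategy(alpha, beta, gamma):
    """p of Eqs. (1.11)-(1.14); only valid if every entry lies in [0, 1]."""
    p = np.array([
        alpha*R + beta*R + gamma + 1,
        alpha*S + beta*T + gamma + 1,
        alpha*T + beta*S + gamma,
        alpha*P + beta*P + gamma,
    ])
    assert np.all((0 <= p) & (p <= 1)), "not a valid probabilistic strategy"
    return p

alpha, beta, gamma = 0.1, -0.2, 0.3            # illustrative; enforces pi_X - 2 pi_Y + 3 = 0
p = autocratic_strategy(alpha, beta, gamma)    # -> (1.0, 0.3, 0.8, 0.2)

rng = np.random.default_rng(1)
for _ in range(5):
    q = rng.random(4)                          # arbitrary memory-one opponent
    v = stationary(transition_matrix(p, q))    # helpers from the first sketch
    print(np.isclose(alpha * (v @ PX) + beta * (v @ PY) + gamma, 0.0))   # Eq. (1.15)
\end{verbatim}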
The essential feature of zero-determinant strategies is not so much the technical aspect that they render a
particular determinant in Eq. (1.10) zero, but rather that they unilaterally enforce the linear payoff relation in
Eq. (1.15). To emphasize the latter, we call these strategies autocratic strategies in the following,
because players adopting them gain an unprecedented degree of control over interactions.
1.1.1 EXAMPLES
Let us now explore a few particularly interesting scenarios where player X adopts an autocratic strategy.
SET OPPONENT'S PAYOFF
The first interesting case arises when setting α = 0, β ≠ 0 and hence πY = −γ/β. The corresponding
strategy enables player X to fix the payoff of player Y completely independently of the strategy of player Y .
As we will see below, this even remains true if player Y has an arbitrarily long memory and employs strategies
far more sophisticated than memory-one strategies, which only condition on the previous interaction. However,
player X is not completely free in setting her opponent's payoff because p needs to be a probabilistic strategy,
i.e. $p_i \in [0, 1]$. As a result, P ≤ πY ≤ R must hold. These strategies are traditionally termed equalizers
because they result in the same payoff for player Y regardless of player Y 's actions.
Exercise: Show that P ≤ πY ≤ R. In order to see this, note that pR ≤ 1 implies βR + γ ≤ 0, and similarly
pP ≥ 0 requires βP + γ ≥ 0. Since R > P , these two inequalities establish β < 0 and hence, dividing by β,
we find R ≥ −γ/β and P ≤ −γ/β, that is, P ≤ πY ≤ R, as required.
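As a concrete sketch of an equalizer, take the conventional payoffs R, S, T, P = 3, 0, 5, 1 and the illustrative choice α = 0, β = −0.25, γ = 0.5, which targets πY = −γ/β = 2. Reusing the helpers from the first sketch, player Y's payoff indeed comes out as 2 no matter which memory-one strategy Y uses:
\begin{verbatim}
# Equalizer sketch: alpha = 0, beta != 0 fixes the opponent's payoff at -gamma/beta.
import numpy as np

R, S, T, P = 3.0, 0.0, 5.0, 1.0
PY = np.array([R, T, S, P])

beta, gamma = -0.25, 0.5                      # targets pi_Y = -gamma/beta = 2
p = np.array([beta*R + gamma + 1,             # Eqs. (1.11)-(1.14) with alpha = 0
              beta*T + gamma + 1,
              beta*S + gamma,
              beta*P + gamma])                # -> (0.75, 0.25, 0.5, 0.25)

rng = np.random.default_rng(3)
for _ in range(5):
    q = rng.random(4)                         # arbitrary memory-one opponent
    v = stationary(transition_matrix(p, q))   # helpers from the first sketch
    print(v @ PY)                             # prints ~2.0 every time
\end{verbatim}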
SET OWN PAYOFF

Choosing α ≠ 0, β = 0 seemingly allows player X to set her own score to πX = −γ/α. Or does it?
Exercise: (1) Find the range of possible πX . (2) Does it depend on the game, i.e. on the ranking of R, S, T, P ?
(3) Discuss features of the resulting strategies.
EXTORTIONERS

1.2 MORE TRADITIONAL DERIVATION
The discovery of autocratic strategies by ? is an impressive example of how the unbiased view of outsiders
can lead to original discoveries and spark novel lines of research. It also explains their ingenious but rather
unusual approach to a game theoretical problem. For this reason the more traditional and direct approach to
autocratic strategies followed only afterwards, by ?. This section presents this alternative and more direct way
of showing that autocratic strategies enforce a linear relation between the payoffs of players X and Y .
Definition 1.1. An autocratic strategy p for player X in infinitely iterated 2 × 2 games is given by
\[
\begin{aligned}
p_R &= \alpha R + \beta R + \gamma + 1 && (1.16)\\
p_S &= \alpha S + \beta T + \gamma + 1 && (1.17)\\
p_T &= \alpha T + \beta S + \gamma && (1.18)\\
p_P &= \alpha P + \beta P + \gamma && (1.19)
\end{aligned}
\]
for some α, β, γ ∈ R such that $p_i \in [0, 1]$.
Theorem 1.1. If player X uses an autocratic strategy then
\[
\alpha \pi_X + \beta \pi_Y + \gamma = 0
\tag{1.20}
\]
where πX and πY denote the average payoff per round to players X and Y , respectively, in the limit of
infinitely many rounds.
Proof. For the proof we first need to introduce some notation. Let πX (n) and πY (n) denote the payoffs to
players X and Y in round n; si (n) the probability that player X experiences outcome i ∈ {R, S, T, P } in round n; and
qi (n) the conditional probability that player Y plays C in round n + 1 given that Y experienced outcome i in round n.
With this we can write the probability that player X cooperates in round n + 1 as
\[
p_C(n + 1) = s_R(n + 1) + s_S(n + 1) = s(n) \cdot p,
\tag{1.21}
\]
with $s(n) = (s_R(n), s_S(n), s_T(n), s_P(n))$ and p the strategy of player X. The second equality in Eq. (1.21)
follows from
\[
\begin{aligned}
s_R(n + 1) &= s_R(n) p_R q_R(n) + s_S(n) p_S q_T(n) + s_T(n) p_T q_S(n) + s_P(n) p_P q_P(n) && (1.22)\\
s_S(n + 1) &= s_R(n) p_R (1 - q_R(n)) + s_S(n) p_S (1 - q_T(n)) + s_T(n) p_T (1 - q_S(n)) + s_P(n) p_P (1 - q_P(n)). && (1.23)
\end{aligned}
\]
Note that when summing sR (n + 1) + sS (n + 1) all terms involving qi (n) cancel and a simple dot product
remains. This is the reason why no assumptions regarding the strategy of player Y are necessary. If player
X uses an autocratic strategy, see Eqs. (1.16)–(1.19), we obtain
\[
\begin{aligned}
p_C(n + 1) &= s(n) \cdot (\alpha R + \beta R + \gamma + 1,\ \alpha S + \beta T + \gamma + 1,\ \alpha T + \beta S + \gamma,\ \alpha P + \beta P + \gamma) && (1.24)\\
&= s(n) \cdot (\alpha P_X + \beta P_Y + \gamma \mathbf{1} + g), && (1.25)
\end{aligned}
\]
with 1 = (1, 1, 1, 1) and g = (1, 1, 0, 0). Note that g represents a stubborn strategy that ignores the opponent's
moves, simply repeats its own previous move, and hence continues with whatever it started with.
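The cancellation behind Eq. (1.21) is easily confirmed numerically: whatever (possibly round-dependent) cooperation probabilities qi (n) player Y uses, summing Eqs. (1.22) and (1.23) leaves the dot product s(n) · p. The snippet below uses arbitrary random values for s(n), p and qi (n):
\begin{verbatim}
# Check that the q_i(n) terms cancel in s_R(n+1) + s_S(n+1), cf. Eqs. (1.21)-(1.23).
import numpy as np

rng = np.random.default_rng(0)
s = rng.random(4); s /= s.sum()   # s(n): distribution over the outcomes (R, S, T, P)
p = rng.random(4)                 # (pR, pS, pT, pP): memory-one strategy of X
qR, qS, qT, qP = rng.random(4)    # q_i(n): Y's cooperation probabilities this round

sR_next = s[0]*p[0]*qR + s[1]*p[1]*qT + s[2]*p[2]*qS + s[3]*p[3]*qP          # Eq. (1.22)
sS_next = (s[0]*p[0]*(1-qR) + s[1]*p[1]*(1-qT)
           + s[2]*p[2]*(1-qS) + s[3]*p[3]*(1-qP))                            # Eq. (1.23)

print(np.isclose(sR_next + sS_next, s @ p))   # Eq. (1.21): independent of the q_i(n)
\end{verbatim}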
Let us now consider
\[
\begin{aligned}
w(n) := p_C(n + 1) - p_C(n) &= s(n) \cdot p - s_R(n) - s_S(n) && (1.26)\\
&= s(n) \cdot (\alpha P_X + \beta P_Y + \gamma \mathbf{1}) && (1.27)
\end{aligned}
\]
and determine the average of w(n) in the limit of infinitely many rounds; the second equality above uses
Eq. (1.25) together with $p_C(n) = s_R(n) + s_S(n) = s(n) \cdot g$. The left-hand side, Eq. (1.26), is
straightforward and we obtain
LHS:
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N} w(n) = \lim_{N \to \infty} \frac{1}{N} \bigl( p_C(N + 1) - p_C(0) \bigr) = 0
\tag{1.28}
\]
because pC (n) is bounded. For the right-hand side, Eq. (1.27), we get
RHS:
\[
\lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N} s(n) \cdot (\alpha P_X + \beta P_Y + \gamma \mathbf{1}) = (\alpha P_X + \beta P_Y + \gamma \mathbf{1}) \cdot \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N} s(n)
\tag{1.29}
\]
but the limit is just the (normalized) stationary probability distribution v over the four outcomes R, S, T, P , with
v · 1 = 1. Thus, we find that
\[
(\alpha P_X + \beta P_Y + \gamma \mathbf{1}) \cdot v = \alpha \pi_X + \beta \pi_Y + \gamma,
\tag{1.30}
\]
which must equal zero to match the left-hand side, Eq. (1.28), completing the proof.
Note that if player Y also adopts a memory-one strategy then v is simply the stationary state of the Markov
chain given by the transition matrix, Eq. (1.1).
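Since the proof makes no assumption about player Y's strategy, a direct simulation provides a useful sanity check. In the sketch below, player X plays the autocratic strategy obtained from α, β, γ = 0.1, −0.2, 0.3 (for R, S, T, P = 3, 0, 5, 1, as in the earlier sketches), while player Y follows a hypothetical memory-two rule of our own invention; the empirical average payoffs satisfy Eq. (1.20) up to Monte Carlo noise:
\begin{verbatim}
# Monte Carlo sanity check of Theorem 1.1 against a non-memory-one opponent.
import numpy as np

R, S, T, P = 3.0, 0.0, 5.0, 1.0
alpha, beta, gamma = 0.1, -0.2, 0.3
p = np.array([1.0, 0.3, 0.8, 0.2])      # Eqs. (1.16)-(1.19) for these alpha, beta, gamma

rng = np.random.default_rng(2)
rounds = 1_000_000
x_prev, y_prev, y_prev2 = True, True, True   # previous moves (True = cooperate)
pay_X = pay_Y = 0.0

for _ in range(rounds):
    # X: memory-one reaction to the previous outcome (R, S, T or P from X's view).
    idx = {(True, True): 0, (True, False): 1,
           (False, True): 2, (False, False): 3}[(x_prev, y_prev)]
    x = bool(rng.random() < p[idx])
    # Y: hypothetical memory-two rule, more cooperative after two own cooperations.
    y = bool(rng.random() < (0.9 if (y_prev and y_prev2) else 0.4))

    if x and y:         pay_X += R; pay_Y += R
    elif x and not y:   pay_X += S; pay_Y += T
    elif (not x) and y: pay_X += T; pay_Y += S
    else:               pay_X += P; pay_Y += P

    x_prev, y_prev2, y_prev = x, y_prev, y

pi_X, pi_Y = pay_X / rounds, pay_Y / rounds
print(alpha * pi_X + beta * pi_Y + gamma)    # close to 0, cf. Eq. (1.20)
\end{verbatim}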
1.3 GENERALIZATIONS

1.3.1 DISCOUNTED INTERACTIONS

1.3.2 ARBITRARY NUMBER OF PLAYERS

1.3.3 ARBITRARY STRATEGIES