
Minimax regret strategies in simple games
VERY preliminary
Michal Lewandowski∗
September 19, 2014
Abstract
Goeree and Holt (2001) presented several simple games in which the Nash
equilibrium and the experimentally observed strategies agree. They then
showed that it is possible to modify these games in such a way that the
Nash equilibrium of the resulting games disagrees with the experimental
prediction. One explanation of this fact is that players cannot be sure that
their opponents are perfectly rational, whereas the NE strategy is optimal
only under that assumption. Otherwise, one has to take into account that
some strategies are riskier than others. The riskiness of a given strategy
can be measured by its maximum regret.
Keywords: minimax regret, Nash equilibrium
JEL Classification Numbers: C7
∗ Warsaw School of Economics, [email protected]
1 Minimax regret strategies
Consider a game Γ in strategic form:
1. The set of players N
2. For each player i ∈ N a set of actions Ai
3. For each player i ∈ N a payoff function ui : A → R
Notation:
• The set of profiles of actions of all players is denoted by A ≡ ×_{i∈N} A_i, with a typical element denoted by a.
• The set of profiles of actions of all players but i is denoted by A_{−i} ≡ ×_{j∈N\{i}} A_j, with a typical element denoted by a_{−i}.
Definition 1. The regret from choosing a′_i ∈ A_i given a profile of other players'
actions a′_{−i} ∈ A_{−i} is defined as:

R_i(a′_i, a′_{−i}) = max_{a_i∈A_i} [u_i(a_i, a′_{−i})] − u_i(a′_i, a′_{−i})    (1)

The maximum regret from choosing a′_i ∈ A_i is then given by:

max_{a_{−i}∈A_{−i}} R_i(a′_i, a_{−i})    (2)

A minimax regret strategy for player i ∈ N is an action a*_i ∈ A_i such that:

a*_i ∈ arg min_{a_i∈A_i} max_{a_{−i}∈A_{−i}} R_i(a_i, a_{−i})    (3)
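As an illustration, equations (1)–(3) can be computed mechanically for any finite two-player game. The following Python sketch is mine, not the paper's; it assumes numpy and payoff matrices U1, U2 indexed by (row action, column action):

```python
import numpy as np

def regret_tables(U1, U2):
    """Regret tables (equation (1)) for a finite two-player game.

    U1[i, j] and U2[i, j] are the payoffs of the row and column player
    when the row player picks action i and the column player picks j.
    """
    R1 = U1.max(axis=0, keepdims=True) - U1  # best payoff in each column minus actual
    R2 = U2.max(axis=1, keepdims=True) - U2  # best payoff in each row minus actual
    return R1, R2

def minimax_regret(U1, U2):
    """Pure minimax regret actions (equations (2)-(3)) of both players."""
    R1, R2 = regret_tables(U1, U2)
    max_r1 = R1.max(axis=1)  # worst-case regret of each row action
    max_r2 = R2.max(axis=0)  # worst-case regret of each column action
    return (np.flatnonzero(max_r1 == max_r1.min()),
            np.flatnonzero(max_r2 == max_r2.min()))
```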
Definition 2. A Nash equilibrium in pure strategies is a profile of actions a* ∈ A
such that:

a*_i ∈ arg max_{a_i∈A_i} u_i(a_i, a*_{−i}),  ∀i ∈ N    (4)
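Definition 2 likewise admits a direct brute-force check for two players; this companion sketch (again mine, same conventions as above) enumerates all pure-strategy equilibria:

```python
import numpy as np

def pure_nash_equilibria(U1, U2):
    """All pure-strategy Nash equilibria (Definition 2) of a bimatrix game."""
    best1 = U1 == U1.max(axis=0, keepdims=True)  # row player's best replies per column
    best2 = U2 == U2.max(axis=1, keepdims=True)  # column player's best replies per row
    return list(zip(*np.nonzero(best1 & best2)))  # (row action, column action) pairs
```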
Example 1. Consider two games of a "choose an effort" variety:

Payoff table:

          L       R
 T      2, 2   −3, 1
 B     1, −3    1, 1

Regret table:

          L       R    max
 T      0, 0    4, 1     4
 B      1, 4    0, 0     1
 max      4       1
Payoff table:

          L       R
 T      5, 5    0, 1
 B      1, 0    1, 1

Regret table:

          L       R    max
 T      0, 0    1, 4     1
 B      4, 1    0, 0     4
 max      1       4
In the first game above, there are two Nash equilibria in pure strategies, (T, L)
and (B, R), with payoffs (2, 2) and (1, 1) respectively, and one mixed strategy
equilibrium, (0.8T + 0.2B, 0.8L + 0.2R). Contrary to the mixed strategy
equilibrium prediction, many experiments found that B and R are played much
more often than T and L, respectively. This is captured by minimax regret
strategies, which reveal that T and L are much riskier than B and R: the
maximum regret of each of the former is 4, and of each of the latter 1.

In the second game above, there are again two Nash equilibria in pure strategies,
(T, L) and (B, R), with payoffs (5, 5) and (1, 1) respectively, and one mixed
strategy equilibrium, (0.2T + 0.8B, 0.2L + 0.8R). Contrary to the mixed strategy
equilibrium prediction, many experiments found that T and L are played much
more often than B and R, respectively. This is captured by minimax regret
strategies, which reveal that B and R are much riskier than T and L: the
maximum regret of each of the former is 4, and of each of the latter 1. The
sketch below reproduces these numbers for the first game.
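For the first game, the computation behind the regret table can be replicated in a few lines (a sketch under the same assumptions as in Section 1; the second game is handled identically after swapping in its payoffs):

```python
import numpy as np

# First "choose an effort" game: rows T, B; columns L, R.
U1 = np.array([[2, -3],
               [1,  1]])    # row player's payoffs
U2 = np.array([[2,  1],
               [-3, 1]])    # column player's payoffs

R1 = U1.max(axis=0, keepdims=True) - U1  # row player's regret table
R2 = U2.max(axis=1, keepdims=True) - U2  # column player's regret table

print(R1.max(axis=1))  # max regrets of T, B: [4 1] -> B is the minimax regret action
print(R2.max(axis=0))  # max regrets of L, R: [4 1] -> R is the minimax regret action
```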
Example 2. Consider three games of a "matching pennies" variety. The first
one is symmetric whereas the other two are asymmetric:

Game A

Payoff table:

          L        R
 T     80, 40   40, 80
 B     40, 80   80, 40

Regret table:

          L        R    max
 T      0, 40   40, 0    40
 B      40, 0   0, 40    40
 max      40      40
Game B

Payoff table:

          L        R
 T    320, 40   40, 80
 B     40, 80   80, 40

Regret table:

          L        R    max
 T      0, 40   40, 0    40
 B     280, 0   0, 40   280
 max      40      40
Game C

Payoff table:

          L        R
 T     44, 40   40, 80
 B     40, 80   80, 40

Regret table:

          L        R    max
 T      0, 40   40, 0    40
 B       4, 0   0, 40     4
 max      40      40
Nash equilibria of these games are compared with the experimentally observed
strategies in the following table:

            NE                                 Exp
 Game A   (0.50T + 0.50B, 0.50L + 0.50R)   (0.48T + 0.52B, 0.48L + 0.52R)
 Game B   (0.50T + 0.50B, 0.13L + 0.87R)   (0.96T + 0.04B, 0.16L + 0.84R)
 Game C   (0.50T + 0.50B, 0.91L + 0.09R)   (0.08T + 0.92B, 0.80L + 0.20R)
Interestingly, the NE strategy of the Row player is the same in all three
games, which sharply contrasts with the experimental evidence. Let us look at
the riskiness of the strategies. For the Column player, both strategies have the
same maximum regret in all three games, so it is safe for the Column player
to play what the Nash equilibrium suggests; this also roughly reflects what
Column players actually do in these games. The riskiness of the Row player's
strategies, however, differs substantially across the three games. In Game A the
maximum regrets of T and B are equal; in Game B the maximum regret of T is
much lower than that of B, and in Game C it is the opposite. This is reflected
in the experimental evidence, in which T is played much more often than B in
Game B, with exactly the opposite happening in Game C. The sketch below
reproduces these regret values.
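These maximum regrets can be reproduced for all three games at once; the sketch below (mine, assuming numpy) only varies the row player's (T, L) payoff, which is the one entry that differs across the games:

```python
import numpy as np

U2 = np.array([[40, 80],
               [80, 40]])   # column player's payoffs, identical in Games A, B, C

for name, top_left in [("A", 80), ("B", 320), ("C", 44)]:
    U1 = np.array([[top_left, 40],
                   [40,       80]])              # row player's payoffs
    R1 = U1.max(axis=0, keepdims=True) - U1      # row player's regret table
    R2 = U2.max(axis=1, keepdims=True) - U2      # column player's regret table
    print(name, R1.max(axis=1), R2.max(axis=0))
# A [40 40] [40 40];  B [ 40 280] [40 40];  C [40  4] [40 40]
```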
Example 3. Let us consider the travellers' dilemma with two players and action
space A_i = {180, 181, ..., 300} for each i ∈ N = {1, 2}. Payoffs are the following:

u_i(a′_i, a′_j) = min(a′_i, a′_j) + P(a′_i, a′_j),   i ≠ j, i, j ∈ N,

where

P(a′_i, a′_j) = {  P   if a′_i < a′_j
                {  0   if a′_i = a′_j        with P ∈ Z+
                { −P   if a′_i > a′_j
When P = 0 or P = 1, any profile of strategies in which a_i = a_j, i ≠ j,
is a pure strategy Nash equilibrium. When P > 1, the unique Nash equilibrium,
which is also the only outcome surviving iterated elimination of weakly
dominated strategies, is to bid 180.
Experimental evidence shows that if P is equal to 180, most players do in
fact bid low (i.e. close to 180), but if P is equal to 5, most players bid high
(i.e. close to 300).
Let us see how risky the strategies in this game are.
The regret of player i for a given strategy profile (a′_i, a′_j) ∈ A equals:

R_i(a′_i, a′_j) = max_{a_i∈A_i} [min(a_i, a′_j) + P(a_i, a′_j)] − [min(a′_i, a′_j) + P(a′_i, a′_j)]

To calculate it we need to consider two cases (assume P ≥ 1, so that the best
reply to any a′_j > 180 is to bid a′_j − 1, which earns a′_j − 1 + P; when P = 0,
matching the opponent's bid is also a best reply):

a′_j = 180  ⇒  R_i(a′_i, 180) = { 180 − 180 = 0           if a′_i = 180
                                { 180 − (180 − P) = P     if a′_i > 180

a′_j > 180  ⇒  R_i(a′_i, a′_j) = a′_j − 1 + P − min(a′_i, a′_j) − P(a′_i, a′_j)

We can summarize this in the form of a table:

                 a′_j = 180    a′_j > 180
 a′_i = 180          0         a′_j − 181
 a′_i > 180          P         a′_j − 1 + P − min(a′_i, a′_j) − P(a′_i, a′_j)
Maximizing over a′_j, the maximum regret of player i for a given strategy
a′_i ∈ A_i equals (for P ≥ 1):

max regret_i(a′_i) = { 119                       if a′_i = 180
                     { max(P, 118)               if a′_i = 181
                     { max(2P − 1, 299 − a′_i)   if a′_i ≥ 182

(For a′_i ≥ 182, the regret 2P − 1 arises against bids a′_j just below a′_i,
and 299 − a′_i against a′_j = 300; for P = 0, bidding 300 yields zero regret.)
Now we can solve for the minimax regret strategies in this game, as summarized
in the following table:

 Values of P             Set of minimax regret strategies    Minimax value
 P = 0                   {300}                                0
 P ∈ {1, ..., 59}        {300 − 2P, ..., 300}                 2P − 1
 P ∈ {60, ..., 118}      {181}                                118
 P = 119                 {180, 181}                           119
 P ∈ {120, 121, ...}     {180}                                119
For example, for P = 0, the only minimax regret strategy is to bid 300. For
P = 5, any bid in the set {290, 291, ..., 300} is a minimax regret strategy. For
P = 180 the only minimax regret strategy is to bid 180.
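The whole table can be verified by brute force, recomputing every bid's worst-case regret directly from the payoff definition. A plain-Python sketch (mine, using no closed-form shortcuts):

```python
ACTIONS = range(180, 301)

def payoff(x, y, P):
    """u_i from Example 3: the minimum claim plus the reward/penalty term."""
    bonus = P if x < y else (-P if x > y else 0)
    return min(x, y) + bonus

for P in (0, 5, 60, 119, 180):
    # Best achievable payoff against each opponent bid, then each bid's worst-case regret.
    best = {aj: max(payoff(b, aj, P) for b in ACTIONS) for aj in ACTIONS}
    regret = {ai: max(best[aj] - payoff(ai, aj, P) for aj in ACTIONS) for ai in ACTIONS}
    value = min(regret.values())
    argmin = [a for a, r in regret.items() if r == value]
    print(P, value, (min(argmin), max(argmin)))
# P=0 -> 0 on {300}; P=5 -> 9 on {290,...,300}; P=60 -> 118 on {181};
# P=119 -> 119 on {180, 181}; P=180 -> 119 on {180}
```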
Example 4. Consider the game of choosing an effort level, where N = {1, 2},
Ai = {110, 111, ..., 170}, i ∈ N . The payoffs for a given profile of actions
(a0i , a0j ) ∈ A are defined as follows:
ui (a0i , a0j ) = min(a0i , a0j ) − ca0i
where c ∈ (0, 1).
The set of Nash equilibria consists of all the pairs (a, a), where a ∈ Ai .
The experimental evidence, on the other hand, suggests that if the cost of effort
is low (c = 0.1), people usually choose high effort, whereas if the cost of effort
is high (c = 0.9), they choose low effort. Again, this is not captured by the
NE prediction. Let us calculate the riskiness of the strategies involved.
The regret of a given profile of actions (a′_i, a′_j) ∈ A is given by:

R_i(a′_i, a′_j) = max_{a_i∈A_i} [min(a_i, a′_j) − c a_i] − [min(a′_i, a′_j) − c a′_i]
              = a′_j(1 − c) − min(a′_i, a′_j) + c a′_i,

since the best reply to a′_j is to match it exactly. The maximum regret for a
strategy a′_i is attained either at a_j = 170 or at a_j = 110:

max regret_i(a′_i) = max_{a_j∈A_j} [a_j(1 − c) − min(a′_i, a_j) + c a′_i]
                   = max[170(1 − c) − a′_i, −110c] + c a′_i

The minimax regret is therefore given by:

min_{a_i∈A_i} { max[170(1 − c) − a_i, −110c] + c a_i }

Let us define a*_i ∈ A_i as the value of player i's strategy at which the two
arguments of the above max function are equal:

170(1 − c) − a*_i = −110c  ⇒  a*_i = 170 − c(170 − 110) = 170 − 60c
In order to find the minimax regret, it suffices to consider three cases (the
objective decreases in a′_i up to a*_i and increases beyond it):

a′_i = 110   ⇒   max regret_i(110) = (1 − c)(170 − 110)
a′_i = a*_i  ⇒   max regret_i(a*_i) = c(1 − c)(170 − 110)
a′_i = 170   ⇒   max regret_i(170) = c(170 − 110)

Since c ∈ (0, 1), the minimax regret strategy must be a′_i = a*_i, and the
minimax regret value is max regret_i(a*_i) = c(1 − c)(170 − 110).
For example, if c = 0.1 the minimax regret strategy equals 164, and if c = 0.9
it equals 116, which is in accordance with the experimental evidence; the
brute-force sketch below confirms both values.
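A short plain-Python sketch (mine) that enumerates the discrete action space:

```python
ACTIONS = range(110, 171)

def minimax_regret_effort(c):
    """Brute-force minimax regret action in the game u_i = min(a_i, a_j) - c * a_i."""
    def worst_regret(ai):
        return max(
            max(min(b, aj) - c * b for b in ACTIONS)   # best payoff against aj
            - (min(ai, aj) - c * ai)                   # payoff of playing ai against aj
            for aj in ACTIONS
        )
    return min(ACTIONS, key=worst_regret)

print(minimax_regret_effort(0.1))  # 164, i.e. 170 - 60 * 0.1
print(minimax_regret_effort(0.9))  # 116, i.e. 170 - 60 * 0.9
```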
Example 5. The following is the so-called extended coordination game:

Payoff table:

          L          M        R
 T      90, 90     0, 0    x, 40
 B       0, 0   180, 180   0, 40
Since strategy R is never a best response, it cannot be part of any Nash
equilibrium. It is easy to see that the set of Nash equilibria of this game is the
following, irrespective of the value of x:

{(T, L), (B, M), (0.67T + 0.33B, 0.67L + 0.33M)}
Experiments reveal that if x = 0, strategy B is played almost exclusively
(by 96% of all subjects), whereas when x = 400, strategy B is played somewhat
less often (by 64% of all subjects). In both cases the Column player plays
strategy M more often than the other strategies (84% in the first case and 76%
in the second).
Let us calculate the riskiness of the strategies involved, assuming x ≥ 0:

Regret table:

          L         M         R          max
 T       0, 0    180, 90    0, 50        180
 B     90, 180     0, 0    x, 140    max(90, x)
 max     180        90       140
So:

• if x < 180, then the minimax regret profile is (B, M);
• if x > 180, then the minimax regret profile is (T, M);

which the sketch below confirms for the two experimental treatments.
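This numpy sketch (mine, not the paper's) prints each player's maximum regrets for x = 0 and x = 400:

```python
import numpy as np

for x in (0, 400):  # the two treatments reported above
    U1 = np.array([[90,   0,  x],
                   [ 0, 180,  0]])   # row player's payoffs: T, B vs L, M, R
    U2 = np.array([[90,   0, 40],
                   [ 0, 180, 40]])   # column player's payoffs
    R1 = U1.max(axis=0, keepdims=True) - U1
    R2 = U2.max(axis=1, keepdims=True) - U2
    print(x, R1.max(axis=1), R2.max(axis=0))
# x=0:   row max regrets [180  90] -> B;  column max regrets [180  90 140] -> M
# x=400: row max regrets [180 400] -> T;  column max regrets unchanged    -> M
```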
Example 6. The following is the so-called Kreps game:

Payoff table:

          L           M         NN        R
 T     200, 50      0, 45     10, 30   20, −250
 B     0, −250    10, −100    30, 30    50, 40
The set of Nash equilibria is

{(T, L), (B, R), (0.97T + 0.03B, 0.05L + 0.95M)}
Experimental evidence (observed frequencies in brackets) is summarized in the
following table:

              L [26%]     M [8%]     NN [68%]     R [0%]
 T [68%]     200, 50      0, 45      10, 30     20, −250
 B [32%]     0, −250    10, −100     30, 30      50, 40
How about the minimax regret strategies?
Regret table:

           L         M        NN        R      max
 T       0, 0     10, 5    20, 20   30, 300     30
 B    200, 290    0, 140    0, 10     0, 0     200
 max     290       140       20       300
The only Nash equilibria in pure strategies of this game are (T, L) and (B, R).
On the other hand, the only minimax regret pure strategy profile is (T, NN),
which is exactly the modal behavior observed in the experiment.
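The regret table above follows from the same mechanical computation as in the earlier examples; a numpy sketch (mine):

```python
import numpy as np

# Kreps game: rows T, B; columns L, M, NN, R.
U1 = np.array([[200,   0, 10,   20],
               [  0,  10, 30,   50]])
U2 = np.array([[  50,   45, 30, -250],
               [-250, -100, 30,   40]])

R1 = U1.max(axis=0, keepdims=True) - U1
R2 = U2.max(axis=1, keepdims=True) - U2
print(R1.max(axis=1))  # [ 30 200] -> row player's minimax regret action: T
print(R2.max(axis=0))  # [290 140  20 300] -> column player's minimax regret action: NN
```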
Definition 3. Given any strategic form game Γ, a randomized strategy for any
player i is a probability distribution over A_i. Let ∆(A_i) denote the set of all
possible randomized strategies for player i. The set of all randomized strategy
profiles will be denoted by ∆(A) = ×_{i∈N} ∆(A_i). It must be that:

Σ_{a_i∈A_i} σ_i(a_i) = 1,  ∀i ∈ N

We will write σ ≡ (σ_i)_{i∈N}, where σ_i ≡ (σ_i(a_i))_{a_i∈A_i}, for each i.
For any randomized strategy profile σ, let u_i(σ) denote the expected payoff that
player i gets when the players independently choose their pure strategies
according to σ:

u_i(σ) = Σ_{a∈A} ( Π_{j∈N} σ_j(a_j) ) u_i(a),  ∀i ∈ N
For any σ′_i ∈ ∆(A_i), we denote by (σ′_i, σ_{−i}) the randomized strategy profile in
which the i-th component is σ′_i and all other components are as in σ. Thus:

u_i(σ′_i, σ_{−i}) = Σ_{a∈A} ( Π_{j∈N\{i}} σ_j(a_j) ) σ′_i(a_i) u_i(a)
Definition 4. A randomized strategy profile σ* ∈ ∆(A) is a Nash equilibrium
of Γ if the following holds:

σ*_i ∈ arg max_{σ_i∈∆(A_i)} u_i(σ_i, σ*_{−i}),  ∀i ∈ N    (5)
Definition 5. A randomized strategy σ*_i ∈ ∆(A_i) is a randomized minimax
regret strategy of player i ∈ N in the game Γ if the following holds:

σ*_i ∈ arg min_{σ_i∈∆(A_i)} R_i(σ_i, σ*_{−i})    (6)

where R_i(σ_i, σ_{−i}) = Σ_{a∈A} ( Π_{j∈N\{i}} σ_j(a_j) ) σ_i(a_i) R_i(a_i, a_{−i})
is the expected regret of σ_i against σ_{−i}.
Proposition 1.1. Minimizing expected regret against a fixed opponent profile is equivalent to maximizing expected payoff against it; hence the minimax regret mixed strategy and the Nash equilibrium mixed strategy coincide on the domain where both are defined.
Proof.

min_{σ_i∈∆(A_i)} R_i(σ_i, σ*_{−i})
  = min_{σ_i∈∆(A_i)} Σ_{a∈A} ( Π_{j∈N\{i}} σ*_j(a_j) ) σ_i(a_i) R_i(a_i, a_{−i})
  = min_{σ_i∈∆(A_i)} Σ_{a∈A} ( Π_{j∈N\{i}} σ*_j(a_j) ) σ_i(a_i) [ max_{a′_i∈A_i} u_i(a′_i, a_{−i}) − u_i(a_i, a_{−i}) ]
  = Σ_{a_{−i}∈A_{−i}} ( Π_{j∈N\{i}} σ*_j(a_j) ) max_{a′_i∈A_i} u_i(a′_i, a_{−i})
      − max_{σ_i∈∆(A_i)} Σ_{a∈A} ( Π_{j∈N\{i}} σ*_j(a_j) ) σ_i(a_i) u_i(a_i, a_{−i})
  = CONST − max_{σ_i∈∆(A_i)} u_i(σ_i, σ*_{−i})

The first term is a constant: it does not depend on σ_i, because Σ_{a_i∈A_i} σ_i(a_i) = 1.
Hence arg min_{σ_i} R_i(σ_i, σ*_{−i}) = arg max_{σ_i} u_i(σ_i, σ*_{−i}): minimizing expected
regret against σ*_{−i} selects exactly the best responses to σ*_{−i}, which proves the claim. ∎
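The identity behind the proof, that expected regret equals a constant minus expected payoff, is easy to illustrate numerically. In this sketch (mine; the payoff matrix and the opponent's mixed strategy are arbitrary choices for illustration), the minimizer of expected regret and the maximizer of expected payoff coincide:

```python
import numpy as np

rng = np.random.default_rng(0)
U1 = rng.integers(0, 10, size=(3, 3)).astype(float)  # arbitrary row-player payoffs
R1 = U1.max(axis=0, keepdims=True) - U1              # regret table, equation (1)
sigma = np.array([0.2, 0.5, 0.3])                    # fixed opponent mixed strategy

expected_regret = R1 @ sigma    # expected regret of each pure action against sigma
expected_payoff = U1 @ sigma    # expected payoff of each pure action against sigma

print(expected_regret + expected_payoff)                     # constant vector (CONST)
print(expected_regret.argmin() == expected_payoff.argmax())  # True
```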
References
Goeree, J. K. and C. A. Holt (2001). Ten little treasures of game theory and ten
intuitive contradictions. The American Economic Review 91 (5), 1402–1422.