Chapter 7
Game theory
Game theory is the study of decision making under uncertainty involving two (or more) intelligent
opponents in which each opponent aims to optimise their decision at the expense of the other
opponents. Typical examples include launching advertising campaigns for competing products or
planning war tactics for opposing armies.
7.1
Introduction
Each opponent is called a player, and each player has a number of choices that are called strategies. There may be a finite or an infinite number of strategies for each player. The outcome of a
particular game is determined by the strategy that each player adopts. In a game with two players,
A and B say, the outcome of a game is usually expressed in terms of the payoff to player A. A
game with two players in which the gain of one player is equal to the loss of the other is called a
two player zero sum game. We will restrict attention to this case.
Two person zero-sum games are strictly competitive games, where everything that a player wins
must be lost by the other player in the game. They appear in contexts such as some parlour games
or defense modelling, but they are very rare in business and economics.
To simplify our exposition we will not say that A gives B p pounds, and so on: since we are assuming a
zero-sum game, every time that A wins, B loses the same amount, and vice versa. In a payoff matrix
we therefore only need to list the payoff to player A, since the payoff to player B is the negative of the payoff
to player A. This represents the game as a matrix in the normal form.
The Normal Form
We consider a matrix A, which has as many rows as player 1 has strategies and as many columns
as player 2 has strategies. Each element of the matrix, aij, represents the expected payoff when
player 1 chooses strategy i and player 2 chooses strategy j. Thus aij is the amount which player 1 receives
from player 2: player 1 wants to maximise aij, while player 2 wants to minimise aij.
Example: Two players, A and B, play a card game. Player A has {1♥, 1♣}, and player B has
{1♦, 1♠}. (Thus each player has a red and a black card.) They each select one of their cards, and
display them simultaneously. If the displayed cards are of the same colour then A wins £1, whereas
if the displayed cards are of different colours then B wins £1.
We can summarise this game as a payoff matrix, where the element in the i-th row and j-th
column is the payoff to player A that arises when player A plays strategy i and player B plays
strategy j. In this case, the payoff matrix is
                  Player B
                  1♦    1♠
Player A    1♥     1    −1
            1♣    −1     1
The problem we wish to solve is this: How should the players select their strategies? As we shall
see, it may be optimal for a player to always play a single strategy (a pure policy) or a mixture of
strategies (a mixed policy).
Example
In the game Two-Finger Morra each player displays either one or two fingers and simultaneously
guesses how many the other player will show. If both guess correctly or both guess incorrectly
the game is a draw. If only one player guesses correctly he/she wins an amount equal to the total
number of fingers shown by both players. The payoff matrix is illustrated below,
                              Player B
                   (1,1)   (1,2)   (2,1)   (2,2)
           (1,1)     0       2      −3       0
Player A   (1,2)    −2       0       0       3
           (2,1)     3       0       0      −4
           (2,2)     0      −3       4       0
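The Morra payoff matrix can be rebuilt directly from the rules. The sketch below assumes that a strategy (s, g) means "show s fingers and guess that the opponent shows g"; the source does not spell out this ordering, but it is the reading that reproduces the table above. The function name `morra_payoff` is illustrative only.

```python
from itertools import product

def morra_payoff(a, b):
    """Payoff to A when A plays a = (shown, guess) and B plays b = (shown, guess)."""
    a_shown, a_guess = a
    b_shown, b_guess = b
    a_correct = a_guess == b_shown
    b_correct = b_guess == a_shown
    if a_correct == b_correct:        # both right or both wrong: a draw
        return 0
    total = a_shown + b_shown         # the single correct guesser wins the total shown
    return total if a_correct else -total

strategies = list(product((1, 2), repeat=2))   # (1,1), (1,2), (2,1), (2,2)
matrix = [[morra_payoff(a, b) for b in strategies] for a in strategies]

for a, row in zip(strategies, matrix):
    print(a, row)
```

Each printed row corresponds to one of player A's strategies, in the same order as the table.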
Note: A player may play only one strategy in any particular game. To understand what a mixed
strategy means, consider a sequence of separate plays of the game where, in each game, the players
choose which strategy to play according to some random rule. For example, in the card game above
player A may choose to play 1♥ with probability p and 1♣ with probability 1−p, for some 0 ≤ p ≤ 1.
One course of action that is open to a player is to randomly choose which strategy to employ at
each stage of the game. This would clearly have some advantages in some of the games we have
described. In the simple card game we described, if A always plays the same strategy, B would soon
figure out what A’s strategy is and win the game all the time. A mixed strategy is one in which a
player assigns a probability distribution to his/her set of strategies and at each stage of the game
chooses a strategy randomly according to this distribution. This is sometimes the best course of
action for a player, as we will later show.
If a player chooses one strategy S, say, all the time then this is called a pure strategy. Of course
this is just a special case of a mixed strategy in which the probability distribution assigned to the
set of strategies gives S probability 1.
We will first consider the case of optimal pure strategies, as it is conceptually simpler.
7.2
Solving a two player zero sum game
To allow for the fact that both players are working against the other’s interests, the optimal course
of action for each to adopt is to select the policy (mixed or pure) that produces the best of the
worst possible outcomes for them, i.e. the policy that maximises the smallest amount they could
win, or minimises the greatest amount they could lose. A policy (mixed or pure) is optimal if
neither player finds it beneficial to deviate from that policy. A game is said to be stable, or in a
state of equilibrium, when both players play according to their optimal policies.
A zero sum two player game is usually represented by the payoff matrix to player A, whose strategies
are represented by the rows of the matrix. Player A selects the strategy that maximises their
minimum gain, where the minimum is taken over each of player B’s strategies. Similarly, player B
selects the strategy that minimises the maximum amount they could lose, where the maximum is
taken over each of player A’s strategies. These policies are usually referred to as the maximin and
minimax strategies respectively.
7.3
Pure strategies
Consider the game with matrix (payoff to player A) given by
                            Player B
                       1     2     3     4    Row minimum   maximin
              1        8     2     9     5         2
Player A      2        6     5     7    18         5           5
              3        7     3    −4    10        −4
Column maximum         8     5     9    18
minimax                      5
It is clear that the maximin and minimax values are equal for this game.
By playing strategy 1, player A may gain at least 2 units regardless of what player B plays. Similarly,
player A gains at least 5 units by playing strategy 2, and gains at least −4 units (i.e. loses at most 4
units) by playing strategy 3. Thus by playing strategy 2, player A maximises the minimum amount
he or she can win. This is the best of the worst possible outcomes, and is called the maximin
strategy. The gain to player A of adopting this strategy is called the maximin (or lower) value of
the game.
Player B wants to minimise the amount he or she loses. By playing strategy 1, B can lose no more
than 8 units. Similarly, B can lose no more than 5, 9, and 18 units by playing strategies 2, 3, and 4
respectively. To minimise the maximum loss, player B therefore chooses strategy 2. This selection
is called the minimax strategy. The corresponding loss is called the minimax (or upper) value of
the game.
It is always true that
maximin value ≤ minimax value.
When these two quantities are equal, then the corresponding pure strategies are called optimal and
the game is said to have a saddlepoint. Optimality here means that neither player is tempted to
adopt another strategy, as to do so would allow their opponent to realise some additional advantage
over them. In general, the value of the game, v, satisfies
maximin (lower) value ≤ v ≤ minimax (upper) value.
In the example above, the maximin and minimax values are both equal to 5. Thus the game has a
saddlepoint, and the value of the game is given by v = 5. Notice that neither player can improve
their fortune by selecting another strategy.
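The row-minimum and column-maximum bookkeeping can be checked mechanically. The following is a minimal sketch for the 3 x 4 game above; the function name `maximin_minimax` is an illustrative choice, not part of the original notes.

```python
def maximin_minimax(A):
    """Return row minima, column maxima, and the lower and upper values of the game."""
    row_minima = [min(row) for row in A]
    col_maxima = [max(col) for col in zip(*A)]   # zip(*A) iterates over columns
    lower = max(row_minima)                      # maximin value (player A)
    upper = min(col_maxima)                      # minimax value (player B)
    return row_minima, col_maxima, lower, upper

# Payoff matrix (to player A) from Section 7.3.
A = [[8, 2, 9, 5],
     [6, 5, 7, 18],
     [7, 3, -4, 10]]

row_minima, col_maxima, lower, upper = maximin_minimax(A)
has_saddlepoint = lower == upper
print(row_minima, col_maxima, lower, upper, has_saddlepoint)
# [2, 5, -4] [8, 5, 9, 18] 5 5 True
```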
7.4
Mixed strategies
The existence of a saddlepoint yields the optimal pure strategies for a game. Some games do not
have saddlepoints though. For example:
                            Player B
                       1     2     3     4    Row minimum   maximin
              1        5   −10     9     0       −10
Player A      2        6     7     8     1         1
              3        8     7    15     2         2           2
              4        3     4    −1     4        −1
Column maximum         8     7    15     4
minimax                                  4
In this example the minimax value (= 4) is greater than the maximin value (= 2). Hence the game
does not have a saddlepoint, and the pure maximin and minimax strategies are not optimal. To
see this, note that it is beneficial for player A to play strategy 4 if B plays the minimax strategy.
However, then it would be best for B to play strategy 3, and so on. The game is described as being
unstable. The failure of the pure maximin and minimax strategies to yield an optimal solution
leads to the use of mixed strategies.
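The instability can be made concrete by letting each player repeatedly switch to the best pure reply against the other's current choice. This is only an illustrative sketch (the helper names `best_row` and `best_col` are hypothetical); it starts from B's minimax strategy and records the strategy pairs visited.

```python
# Payoff matrix (to player A) of the game without a saddlepoint.
A = [[5, -10, 9, 0],
     [6, 7, 8, 1],
     [8, 7, 15, 2],
     [3, 4, -1, 4]]

def best_row(col):
    """Player A's best reply: the row with the largest payoff in this column."""
    return max(range(len(A)), key=lambda i: A[i][col])

def best_col(row):
    """Player B's best reply: the column with the smallest payoff in this row."""
    return min(range(len(A[0])), key=lambda j: A[row][j])

col = 3                                  # B starts with minimax strategy 4 (index 3)
path = []
for _ in range(4):
    row = best_row(col)
    path.append((row + 1, col + 1))      # 1-based (A's strategy, B's strategy)
    col = best_col(row)

print(path)   # [(4, 4), (3, 3), (4, 4), (3, 3)]
```

The pairs keep alternating, so no pair of pure strategies is a resting point for both players.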
Player A may thus benefit by introducing a random element to his/her play. Suppose A has m
pure strategies. A wishes to choose a probability distribution {p1 , ..., pm } over them so that he/she maximizes
his/her expected payoff.
The idea of expected payoff is not mysterious. For example, if you win £1 every time a fair die
comes up 6 and lose 20p if it doesn’t, your expected payoff is 1/6 × £1 − 5/6 × £0.20 = £0. “On
average” you will break even when you play this game.
Let {p1 , ..., pm } and {q1 , ..., qn } be the probabilities with which A and B play their strategies, e.g.
the probability that A plays pure strategy i is pi.
Clearly $\sum_{i=1}^{m} p_i = \sum_{j=1}^{n} q_j = 1$ and $p_i \ge 0$, $q_j \ge 0$ for all i and j. Thus the set X of possible mixed
strategies for A may be described by
\[ X = \Big\{ (p_1, \dots, p_m) : \sum_{i=1}^{m} p_i = 1,\ p_i \ge 0 \Big\} \]
and the set Y of possible mixed strategies for B is the set
\[ Y = \Big\{ (q_1, \dots, q_n) : \sum_{j=1}^{n} q_j = 1,\ q_j \ge 0 \Big\} \]
A point x in X is an m-tuple or m-vector x = (p1 , ..., pm ). If we let aij be the payoff to Player
A when A plays the pure strategy i and B plays the pure strategy j, then the
expected payoff to A if A plays the mixed strategy x = (p1 , ..., pm ) and B plays the mixed strategy
y = (q1 , ..., qn ) is
\[ K(x, y) = \sum_{i=1}^{m} \sum_{j=1}^{n} p_i q_j a_{ij} \]
Once we know the payoffs {aij } for the pure strategies we may calculate the expected payoffs
corresponding to a choice of any mixed strategies. The numbers (aij ) are usually expressed in a
matrix.
Example
Suppose that Player A and Player B play the one or two finger game, i.e. they both stick out 1 or
2 fingers. If both display the same number A wins £1; if they display different numbers then B
wins £1. What is the expected payoff to A if A chooses the mixed strategy (1/3, 2/3) and B chooses the
mixed strategy (1/4, 3/4)? Set up the payoff table: a11 = 1, a12 = −1, a21 = −1, a22 = 1. The expected
payoff to A is then (1/3)(1/4)(1) − (1/3)(3/4) − (2/3)(1/4) + (2/3)(3/4) = 1/6.
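The double sum K(x, y) is easy to evaluate exactly. The sketch below uses Python's `fractions` module to reproduce the figure in this example; the function name `expected_payoff` is an illustrative choice.

```python
from fractions import Fraction as F

def expected_payoff(A, x, y):
    """K(x, y) = sum over i, j of p_i * q_j * a_ij."""
    return sum(x[i] * y[j] * A[i][j]
               for i in range(len(x))
               for j in range(len(y)))

A = [[1, -1],
     [-1, 1]]                 # payoff to A in the one or two finger game
x = [F(1, 3), F(2, 3)]        # A's mixed strategy
y = [F(1, 4), F(3, 4)]        # B's mixed strategy

print(expected_payoff(A, x, y))   # 1/6
```

A pure strategy is the special case where one probability equals 1, so the same function also returns individual matrix entries.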
Example
Suppose that you suspect a coin is biased and actually comes up heads 60% of the times that
it is tossed. You play the usual game of heads and tails and guess whether the coin lands heads
or tails. If you are correct you win £1, if incorrect you lose £1. Assume that you are right and
the coin is biased as you suspect. What is your optimal betting strategy?
If you bet on heads with probability p, your expected payoff is p(0.6 − 0.4) + (1 − p)(0.4 − 0.6) = 0.4p − 0.2.
If you bet on heads 60% of the time it is 0.04 (check). If you bet on heads all the time it is 0.2,
the largest possible value, so the optimal strategy is always to bet on heads.
7.5
Dominance
Before proceeding further, we examine ways of simplifying games. We do this by eliminating any
strategies that (intelligent) players would never use.
For example, consider the game with matrix (payoff to player A) given by
                            Player B
                       1     2     3    Row minimum   maximin
              1        2    −2    −4        −4
Player A      2       −2     2     2        −2          −2
              3        4    −2     0        −2          −2
Column maximum         4     2     2
minimax                      2     2
Player A would never play strategy 1 because strategy 3 always yields a gain which is at least as
good. For this reason, strategy 3 is said to row-dominate strategy 1. Therefore, we may simplify
the game by deleting strategy 1 for player A. This gives the following reduced game:
                            Player B
                       1     2     3    Row minimum   maximin
Player A      2       −2     2     2        −2          −2
              3        4    −2     0        −2          −2
Column maximum         4     2     2
minimax                      2     2
From this reduced game, we see that player B would never play strategy 3 because strategy 2 is at
least as good, and we say that strategy 2 column-dominates strategy 3. Deleting strategy 3, we
obtain
                            Player B
                       1     2    Row minimum   maximin
Player A      2       −2     2        −2          −2
              3        4    −2        −2          −2
Column maximum         4     2
minimax                      2
Note that deleting the dominated strategies does not change the fact that the value of the game lies
between the maximin and minimax values, i.e. −2 ≤ v ≤ 2. In summary, we have
Row dominance: If each element in row j is greater than or equal to the corresponding element
of row i then row j dominates row i and row i can be deleted.
Column dominance: If each element in column j is less than or equal to the corresponding
element in column i then column j dominates column i and column i can be deleted.
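The two dominance rules can be applied repeatedly until nothing more can be deleted. The following is a minimal sketch under the "at least as good" (weak) dominance used above; `eliminate_dominated` is a hypothetical helper name, and indices in the code are 0-based whereas the text numbers strategies from 1.

```python
def eliminate_dominated(A):
    """Repeatedly delete dominated rows and columns; return the surviving indices."""
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))
    while True:
        # Row i is dominated if another row j is >= it in every remaining column.
        bad_row = next((i for i in rows
                        if any(j != i and all(A[j][c] >= A[i][c] for c in cols)
                               for j in rows)), None)
        if bad_row is not None:
            rows.remove(bad_row)
            continue
        # Column i is dominated if another column j is <= it in every remaining row.
        bad_col = next((i for i in cols
                        if any(j != i and all(A[r][j] <= A[r][i] for r in rows)
                               for j in cols)), None)
        if bad_col is not None:
            cols.remove(bad_col)
            continue
        return rows, cols

A = [[2, -2, -4],
     [-2, 2, 2],
     [4, -2, 0]]

rows, cols = eliminate_dominated(A)
reduced = [[A[r][c] for c in cols] for r in rows]
print(rows, cols, reduced)   # [1, 2] [0, 1] [[-2, 2], [4, -2]]
```

In the text's numbering this leaves strategies 2 and 3 for player A and strategies 1 and 2 for player B, matching the final reduced game.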
7.6
Determining optimal mixed strategies
Determining the optimal mixed policy requires calculating the probability distribution by which
each player should play the available strategies. Let {p1 , . . . , pm } and {q1 , . . . , qn } be the probabilities by which A and B play their strategies respectively. (So player A chooses from m strategies,
player B chooses from n.) Then
\[ \sum_{i=1}^{m} p_i = \sum_{j=1}^{n} q_j = 1 \]
and pi ≥ 0, qj ≥ 0 for all i and j. Conceptually, each player may be thought of as playing all of
their strategies simultaneously (less any that have been deleted through dominance) according to
these probabilities. Thus if aij represents the entry in row i and column j of the payoff matrix
(payoff to player A) we have the following representation of the game:
                        Player B
                   q1    · · ·    qn
             p1    a11   · · ·    a1n
Player A      :     :              :
             pm    am1   · · ·    amn
Choosing the best of the worst outcomes for player A corresponds to selecting {p1 , . . . , pm } to
maximise the smallest expected payoff from each of player B’s strategies. Similarly, player B will
aim to select {q1 , . . . , qn } to minimise the largest expected payoff from each of player A’s strategies.
Thus player A’s problem is to choose $p_i \ge 0$, $i = 1, \dots, m$, with $\sum_{i=1}^{m} p_i = 1$ in order to
\[ \text{maximise} \ \min\left\{ \sum_{i=1}^{m} a_{i1} p_i, \ \dots, \ \sum_{i=1}^{m} a_{in} p_i \right\}. \]
Player B’s problem is to choose $q_j \ge 0$, $j = 1, \dots, n$, with $\sum_{j=1}^{n} q_j = 1$ in order to
\[ \text{minimise} \ \max\left\{ \sum_{j=1}^{n} a_{1j} q_j, \ \dots, \ \sum_{j=1}^{n} a_{mj} q_j \right\}. \]
These objective functions are called the maximin and minimax expected payoffs respectively.
For any mixed policies with probabilities {pi } and {qj }, it is true that
\[ \text{maximin expected payoff} \le v \le \text{minimax expected payoff} \qquad (7.6.1) \]
where v is the value of the game. For the optimal values of {pi } and {qj }, equality holds in
equation (7.6.1), and the value of the game is then equal to the optimal maximin or minimax
expected payoff. Letting $\{p_i^*\}$ and $\{q_j^*\}$ denote the optimal probabilities for players A and B
respectively, it may be shown that $v = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} p_i^* q_j^*$. Thus the value of the game may be
interpreted as the expected payoff that player A receives when both players are playing according
to their optimal policies. (The value of the game has the same interpretation in the case of pure
policies.)
7.6.1
2x2 Games
Let’s first consider 2x2 games, which are given by
\[ A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}. \]
We denote the optimal strategies by $p = (p_1, p_2)$ and $q = (q_1, q_2)$, and the value of the game by v. We
define $J = (1, 1)$, and assume A to be non-singular. Since p1 and p2 are probabilities associated
with mutually exclusive strategies, p1 + p2 = 1; analogously q1 + q2 = 1. It can be shown that,
when the game has no saddlepoint, the solution is:
\[ v = \frac{1}{J A^{-1} J^T}, \qquad p = \frac{J A^{-1}}{J A^{-1} J^T}, \qquad q = \frac{A^{-1} J^T}{J A^{-1} J^T}. \]
We notice that since $A^{-1} = \frac{1}{\det A} A^{+}$, where $A^{+}$ is the adjoint of A (the transpose of the matrix
whose elements are the cofactors of the corresponding elements of A), v, p and q can also be
written as functions of $A^{+}$. This leads to a solution that does not require A to be non-singular,
namely:
\[ v = \frac{\det A}{J A^{+} J^T}, \qquad p = \frac{J A^{+}}{J A^{+} J^T}, \qquad q = \frac{A^{+} J^T}{J A^{+} J^T}. \]
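The adjoint form of the solution can be sketched in a few lines. The code below assumes a 2x2 game without a saddlepoint (so that the denominator J A+ J^T is non-zero) and integer or fractional payoffs; `solve_2x2` is a hypothetical name. It is tried on the card game of Section 7.1, whose matrix is singular, to illustrate that the adjoint form still applies.

```python
from fractions import Fraction

def solve_2x2(A):
    """Value and optimal mixed strategies of a 2x2 game via the adjoint of A."""
    (a, b), (c, d) = A
    adj = [[d, -b],
           [-c, a]]                       # adjoint of A
    det = a * d - b * c
    total = sum(map(sum, adj))            # J A+ J^T: sum of all entries of A+
    v = Fraction(det, total)
    # J A+ gives the column sums of A+; A+ J^T gives the row sums.
    p = [Fraction(adj[0][k] + adj[1][k], total) for k in range(2)]
    q = [Fraction(adj[k][0] + adj[k][1], total) for k in range(2)]
    return v, p, q

v, p, q = solve_2x2([[1, -1],
                     [-1, 1]])
print(v, p, q)
```

For this game the sketch gives v = 0 with both players mixing their two cards equally, even though det A = 0.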
Example
Solve the following game
\[ A = \begin{pmatrix} 1 & 0 \\ -1 & 2 \end{pmatrix} \]
7.6.2
2xn and mx2 Games
These are two person zero-sum games in which one of the players has only two strategies. In a 2xn
game, player A has two strategies and therefore wants to maximise
\[ v(x) = \min_{j} \{ a_{1j} p_1 + a_{2j} p_2 \}. \]
Since $p_1 = 1 - p_2$, this becomes
\[ v(x) = \min_{j} \{ (a_{2j} - a_{1j}) p_2 + a_{1j} \}. \]
That is, v(x) is the minimum of n linear functions of p2 . These functions can be plotted and
their minimum maximised graphically.
Example
Find the value of the game that is given by the following matrix:
\[ \begin{pmatrix} 6 & 9 & 3 & 15 \\ 12 & 3 & 18 & 0 \end{pmatrix} \]
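The graphical method can be imitated numerically: the lower envelope of the n lines is concave and piecewise linear, so its maximum over 0 ≤ p2 ≤ 1 lies at an endpoint or where two lines cross. The sketch below checks exactly those candidate points with exact fractions; `solve_2xn` is a hypothetical helper, not a routine from the notes.

```python
from fractions import Fraction
from itertools import combinations

def solve_2xn(A):
    """Maximise over p2 in [0, 1] the envelope min_j {(a2j - a1j) p2 + a1j}."""
    top, bottom = A
    lines = [(b - t, t) for t, b in zip(top, bottom)]   # (slope, intercept) per column

    def envelope(p2):
        return min(s * p2 + c for s, c in lines)

    candidates = {Fraction(0), Fraction(1)}
    for (s1, c1), (s2, c2) in combinations(lines, 2):
        if s1 != s2:                                    # parallel lines never cross
            p2 = Fraction(c2 - c1, s1 - s2)
            if 0 <= p2 <= 1:
                candidates.add(p2)

    best = max(candidates, key=envelope)
    return envelope(best), (1 - best, best)

value, (p1, p2) = solve_2xn([[6, 9, 3, 15],
                             [12, 3, 18, 0]])
print(value, p1, p2)   # 51/7 5/7 2/7
```

For the matrix in this example the maximum of the envelope is at p2 = 2/7, where the lines for columns 2 and 3 cross.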
7.6.3
Symmetric Games
Definition 1 A square matrix A is skew-symmetric if aij = −aji . A game is symmetric if its
matrix is skew-symmetric.
The value of a symmetric game is zero. If p is an optimal strategy for player 1, then it is also
optimal for player 2.
Example: Game of matching pennies
Player A chooses ‘heads’ (H) or ‘tails’ (T). Player B, not knowing the other’s choice, also chooses
H or T. If their choices agree, then player B wins a penny from player A; otherwise player A wins
a penny from player B. Show that for the game of matching pennies the optimal solution is
p = (1/2, 1/2).
7.7
Solving a game using linear programming
In this section we show that the problem of finding the optimal mixed strategies for players A and
B may be formulated as a primal-dual pair of linear programming problems. This is useful due to
the availability of computer software for linear programming. The optimal strategies may therefore
be determined using the Simplex algorithm encountered previously. First, we focus on player A’s
problem.
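Player A’s Problem:
\[ \text{maximise} \ \min\left\{ \sum_{i=1}^{m} a_{i1} p_i, \ \dots, \ \sum_{i=1}^{m} a_{in} p_i \right\} \]
subject to p1 + · · · + pm = 1 and pi ≥ 0 for i = 1, . . . , m.
We reformulate this as follows: letting $v = \min\left( \sum_{i=1}^{m} a_{i1} p_i, \dots, \sum_{i=1}^{m} a_{in} p_i \right)$, player A’s problem
may be written
\[
\begin{array}{ll}
\text{maximise} & z = v \\
\text{subject to} & \sum_{i=1}^{m} a_{ij} p_i \ge v, \quad \text{for } j = 1, \dots, n, \\
& \sum_{i=1}^{m} p_i = 1 \\
\text{and} & p_i \ge 0 \quad \text{for } i = 1, \dots, m.
\end{array}
\]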
Let us assume that v > 0. [If v ≤ 0 then construct a modified payoff matrix by adding a positive
constant K to each element of the payoff matrix. The value of the original game is obtained by
subtracting K from the value of the modified game.] We can simplify the above formulation by
dividing each constraint by v. This gives
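\[
\begin{array}{rcl}
a_{11} \dfrac{p_1}{v} + a_{21} \dfrac{p_2}{v} + \cdots + a_{m1} \dfrac{p_m}{v} & \ge & 1, \\
\vdots & & \vdots \\
a_{1n} \dfrac{p_1}{v} + a_{2n} \dfrac{p_2}{v} + \cdots + a_{mn} \dfrac{p_m}{v} & \ge & 1, \\
\text{and} \quad \dfrac{p_1}{v} + \dfrac{p_2}{v} + \cdots + \dfrac{p_m}{v} & = & \dfrac{1}{v}.
\end{array}
\]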
Now let xi = pi /v for i = 1, . . . , m. Since maximising v is equivalent to minimising 1/v, and because
x1 + · · · + xm = (p1 + · · · + pm )/v = 1/v, we can express player A’s problem as
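\[
\begin{array}{ll}
\text{minimise} & z = x_1 + \cdots + x_m \\
\text{subject to} & a_{11} x_1 + \cdots + a_{m1} x_m \ge 1, \\
& \qquad \vdots \\
& a_{1n} x_1 + \cdots + a_{mn} x_m \ge 1, \\
\text{and} & x_i \ge 0, \quad \text{for } i = 1, \dots, m.
\end{array}
\]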
Similarly, player B’s problem can be expressed as
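\[
\begin{array}{ll}
\text{maximise} & w = y_1 + \cdots + y_n \\
\text{subject to} & a_{11} y_1 + \cdots + a_{1n} y_n \le 1, \\
& \qquad \vdots \\
& a_{m1} y_1 + \cdots + a_{mn} y_n \le 1, \\
\text{and} & y_j \ge 0, \quad \text{for } j = 1, \dots, n,
\end{array}
\]
where w = 1/v, and yj = qj /v for j = 1, . . . , n.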
Expressed in this manner, we see that B’s problem is the dual of A’s problem (as the term dual is
used in linear programming (LP) theory). By the duality theorems of LP, both problems have the
same optimal objective values. The value v of the game is thus the reciprocal of the optimal value
of x1 + ... + xm (or equivalently, the reciprocal of the optimal value of w). Once we have solved
for (x1 , ..., xm ) by, say, the Simplex algorithm we may recover (p1 , ..., pm ) from xi = pi /v, i.e. pi = v xi . Remember
that if a constant K was added to the payoff matrix, the value of the original (unmodified) game is then v − K, but the same optimal strategies
(p1 , ..., pm ) hold for both the modified and unmodified game. Analogous remarks apply to optimal
strategies for B: B’s optimal strategies for the modified game are the same as B’s optimal
strategies for the unmodified game. Alternatively, standard duality results in linear programming
theory allow the optimal solution or strategy of either problem to be constructed from the optimal
strategy of the other.
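The recovery steps (shift by K, check the two linear programmes, convert x and y back to probabilities) can be sketched without an LP solver. The code below does not run the Simplex algorithm; it takes candidate vectors x and y as given (here assumed values for the card game of Section 7.1) and checks that they are feasible for A's and B's problems with equal objective values, which by LP duality certifies that both are optimal. All function names are illustrative.

```python
from fractions import Fraction as F

def shift_positive(A):
    """Add a constant K so that every entry is positive, which guarantees v > 0."""
    lowest = min(min(row) for row in A)
    K = 0 if lowest > 0 else 1 - lowest
    return [[a + K for a in row] for row in A], K

def feasible_for_A(M, x):
    """Player A's LP: x >= 0 and one '>= 1' constraint per column of M."""
    m, n = len(M), len(M[0])
    return all(xi >= 0 for xi in x) and all(
        sum(M[i][j] * x[i] for i in range(m)) >= 1 for j in range(n))

def feasible_for_B(M, y):
    """Player B's LP: y >= 0 and one '<= 1' constraint per row of M."""
    m, n = len(M), len(M[0])
    return all(yj >= 0 for yj in y) and all(
        sum(M[i][j] * y[j] for j in range(n)) <= 1 for i in range(m))

A = [[1, -1], [-1, 1]]            # card game; its value is not positive
M, K = shift_positive(A)          # M = [[3, 1], [1, 3]], K = 2
x = [F(1, 4), F(1, 4)]            # assumed candidate, e.g. from a Simplex run
y = [F(1, 4), F(1, 4)]            # assumed candidate for the dual

both_feasible = feasible_for_A(M, x) and feasible_for_B(M, y)
same_objective = sum(x) == sum(y) # equal objectives certify optimality
v_modified = 1 / sum(x)           # value of the modified game
p = [v_modified * xi for xi in x] # p_i = v * x_i
q = [v_modified * yj for yj in y] # q_j = v * y_j
v_original = v_modified - K
print(both_feasible, same_objective, p, q, v_original)
```

In practice x and y would come from an LP solver; the feasibility checks above are then a cheap sanity test on its output.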
7.8
Two-person nonconstant sum games
Most situations in business are not constant-sum games since there is almost always a degree of
cooperation and it is unlikely for businesses to be in total conflict.
We now consider the famous Prisoner’s Dilemma.
Example
A pair of transients, Al Fresco and Des Jardins, have been arrested for vagrancy. They are suspected
of complicity in a robbery, but there is not enough evidence to convict them of the robbery. They are interrogated
in separate cells and are offered the following deal: “If you confess and your friend does not, you
will be released and your friend will have the book thrown at him; and the other way round if he
confesses and you do not. If both confess, both will be convicted of minor vagrancy. The promised
jail sentences, in months, are shown in the table below.” What does rationality dictate that the
prisoners should do?
                                   Des Jardins
                              Confess    Don’t Confess
Al Fresco   Confess           (−8,−8)      (0,−15)
            Don’t Confess     (−15,0)      (−1,−1)
This game can be solved by dominance, leading to the equilibrium pair (−8, −8) where both
prisoners confess. However, when we compare the confess/confess equilibrium (8 months in jail)
with the alternative of neither confessing (1 month in jail), this result looks paradoxical. Both
would benefit from choosing the strategies (don’t confess, don’t confess) and getting the payoff (−1, −1).
However this pair of strategies is not stable, as each player would gain by double-crossing their partner.
There is a contradiction between what is individually rational and what is collectively rational: in
pursuing personal gain, both end up worse off than they needed to be. They would both be better
off if they could co-operate.
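The stability argument can be checked by brute force: an outcome is an equilibrium if neither prisoner can do better by changing only his own choice. The following is a small sketch using the sentences from the table; the dictionary layout and the name `is_equilibrium` are illustrative assumptions.

```python
# Payoffs (Al Fresco, Des Jardins) in months, indexed by (Al's action, Des's action).
payoffs = {
    ("confess", "confess"): (-8, -8),
    ("confess", "dont"): (0, -15),
    ("dont", "confess"): (-15, 0),
    ("dont", "dont"): (-1, -1),
}
actions = ("confess", "dont")

def is_equilibrium(al, des):
    """True if neither prisoner gains by unilaterally switching his action."""
    al_ok = all(payoffs[(al, des)][0] >= payoffs[(alt, des)][0] for alt in actions)
    des_ok = all(payoffs[(al, des)][1] >= payoffs[(al, alt)][1] for alt in actions)
    return al_ok and des_ok

equilibria = [(al, des) for al in actions for des in actions
              if is_equilibrium(al, des)]
print(equilibria)   # [('confess', 'confess')]
```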
This example is a paradigm for many business and economic interactions, for example:
• Two firms are competing to sell the same product (e.g. energy and communication utilities
at present); the logic of profit maximisation forces each to set a low price when both would
earn more profits if each set a higher price.
• Fishermen overfish their common fishing ground and destroy their industry.
However, the awareness of this paradigm has led to many situations of co-operation, as for instance
when oil producers organised themselves or when former competitors in a particular industry join
forces.
Definition
An outcome is efficient if there is no alternative outcome that would leave some players better off
and none worse off.
The problem with the equilibrium in the Prisoner’s Dilemma is that the equilibrium solution is
inefficient. It would be in the interest of the players to eliminate inefficient equilibria.
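The definition of efficiency translates directly into a check over the four outcomes of the Prisoner's Dilemma. This is a minimal sketch; the function name `is_efficient` is a hypothetical choice.

```python
def is_efficient(outcome, outcomes):
    """No alternative leaves some player better off and none worse off."""
    return not any(
        all(o >= c for o, c in zip(other, outcome))       # nobody worse off
        and any(o > c for o, c in zip(other, outcome))    # somebody better off
        for other in outcomes)

# (Al Fresco, Des Jardins) payoffs from the Prisoner's Dilemma table.
outcomes = [(-8, -8), (0, -15), (-15, 0), (-1, -1)]
efficient = [o for o in outcomes if is_efficient(o, outcomes)]
print(efficient)   # [(0, -15), (-15, 0), (-1, -1)]
```

The equilibrium outcome (−8, −8) is the only one missing from the list, since (−1, −1) is better for both players.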