
COLLEGIO EUROPEO DI PARMA
Borgo Lalatta 14, 43100 Parma
Tel. + 39 (0)521.207525, Fax + 39 (0)521.384653
www.collegio.europeo.parma.it
Advanced Diploma in European Studies (ADES)
THE ECONOMIC ROLE OF THE STATE
Uncertainty and Game Theory:
Private Choices and Public Policies
by
Pietro A. Vagliasindi*
April 2004
_______________________________
* Professor of Public Finance and Industrial Organisation, University of Parma
Director of the Department of Economics, Finance & International Law,
Via Università 12, 43100 Parma
Tel: +39-0521-034561 Fax: +39-0521-034562
Email: [email protected]
Table of Contents

Introduction
1. Choice under uncertainty
   A. The expected return: Heads or tails?
   B. The expected utility: risk aversion and risk preference
   C. Rational behaviour in an uncertain world
   D. Von Neumann utility function and social welfare
2. Strategic thinking and game theory
   A. Interdependencies, strategic behaviour and rational individuals
   B. Strategic Behaviour: the Prisoner's Dilemma
   C. Repeated Prisoner's Dilemma
3. Game Theory
   A. Static games, dynamic games and Nash equilibria
   B. Extensive form, dynamic games and perfect Nash equilibria
4. Economic Applications: Oligopoly
   A. Oligopoly applications of the Nash Equilibrium
   B. Cournot Competition: Quantity Strategy
   C. Bertrand Competition: Price Strategy
   D. The Stackelberg dynamic game
   E. Tacit collusion in repeated games: Cournot supergames
5. Appendix: Solutions to selected problems
Introduction
Economic theory shows how markets work, i.e. how prices and quantities are
determined. In a world of certainty, decisions involve predictable streams of costs and
benefits. Converting these into present value one can compare them and choose the
optimal decision (that is, the one which maximises the benefits, net of costs).
However, we live, and public policies (like those described in this course) take place, in an uncertain world (and in a strategic context). In this setting the analysis of individual choices is somewhat more complex, as you can see from part 1. However, under some simplifying assumptions economic problems can be dealt with as in a world of certainty. Specifically, we assume that each individual does not know what will happen, but knows the likelihood with which each outcome will be realised.
In an uncertain world, economic agents will then maximise their welfare by choosing the alternative that offers them the highest expected return (or utility), i.e. the sum of the returns (utilities) associated with the different possible outcomes, each weighted by its probability. The reader not familiar with expected return and expected utility can become acquainted with the implications of uncertainty by going first to part 1. There we consider the utility flow from a year's expenditure, measuring expected return in "euros" and expected utility in "utiles".
To understand the strategic economic context in which firms operate, we also need to be able to use game theory as a tool of analysis. Specifically, dealing with non-cooperative game theory, we focus on individual players who seek to attain their best interest subject to given rules and possibilities. In this way, we will gain a better knowledge of firms' interactions, of their behaviour when facing competitors, and of the way they may react to regulations and competition policies.
In order to give a feeling for strategic behaviour, in part 2 we introduce game theory, starting with an informal description of the prisoner's dilemma, one of the most famous games. Later, in part 3, we formalise the analysis a bit more, discussing ways in which one might "solve" a static game while trying to avoid mathematical definitions. Finally, in part 4 we discuss applications to oligopoly theory that are closest to the subjects dealt with in the topics related to competition policy.

All parts are complemented with problems to be solved. Solutions (or helpful hints for the problems marked with *, which will be solved in class) can be found in part 5.
1. Choice under uncertainty
Economic theory shows how markets work, through the determination of prices and
quantities. In a world of certainty, decisions involve predictable streams of costs and
benefits. Converting these into present value one can compare them and choose the
optimal decision (that is, the one which maximises the benefits, net of costs).
An optimal decision must be characterised by a positive present value (PV):
PV = Σi (Bi - Ci)/(1+r)^i = Σi ρi Pi > 0

where Pi = Bi - Ci denotes the net benefits (profits, given by the difference between benefits and costs), r is the rate of discount, ρi = 1/(1+r)^i the discount factor and i = 1, … refers to time.
An optimal decision must also be characterised by the highest value among all possible alternatives. Otherwise, undertaking it would imply giving up an alternative with a higher (positive) present value.
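As a quick numerical illustration, here is a minimal Python sketch of the PV formula above; the three-year stream of net benefits and the 5% discount rate are illustrative assumptions, not taken from the text.

```python
# Present value of a stream of net benefits P_i = B_i - C_i at discount rate r:
# PV = sum_i P_i / (1 + r)^i, with i = 1, 2, ... indexing time.
def present_value(net_benefits, r):
    return sum(p / (1 + r) ** i for i, p in enumerate(net_benefits, start=1))

# Net benefits of 100 in each of three years, discounted at r = 5%.
print(round(present_value([100, 100, 100], 0.05), 2))  # 272.32 > 0: worth undertaking
```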
In an uncertain world the analysis of individual choices is more complex. However, under some simplifying assumptions we can convert the problem into the one we have already solved. Specifically, we can assume that each individual has a probability
distribution over possible outcomes. He does not know what will happen, but he
knows the likelihood with which each outcome will be realised. His problem is how to
maximise his welfare. He can choose the alternative that gives the highest expected
return (or utility), i.e. the sum of returns (utility) associated with the different possible
outcomes, each weighted by its probability. In what follows, we analyse the
implications of uncertainty, considering the utility flow from a year’s expenditure,
while temporarily ignoring other complications. To make things simple, we talk in
“euros” and “utiles” instead of “euros per year” and “utiles per year”, i.e. an income
of x euros/year for one year equals x euros.
A. The expected return: Heads or tails?
Consider the case in which you are betting on whether a coin will come up heads or
tails. You have 1 € and you can choose between a certain outcome (i.e. to decline the
bet, ending up with 1 €) and an uncertain outcome (i.e. to accept the bet ending up
with either more or less than 1 €). Using a fair coin, half the time it will come up
heads. A rational gambler will take bets that offer a payoff of more than 1 € and
refuse any bet that offers less. For instance, if he is paid 2 € when the coin comes up
heads and he pays 1 € if it comes up tails, then, by accepting the bet, on average he
gains 0.50 €. If he is offered 0.50 € for the risk of 1 €, then on average he loses 0.25 €
by accepting the bet and should hence refuse the bet.
A gambler who can take the same gamble many times should choose, among the gambles available, the one with the highest expected return; he should take any bet that is better than a fair gamble, i.e. one with a positive expected return. The case of a gambler betting many times on the toss of a coin can be generalised to describe any game of chance, following the rule "maximise expected return". The expected return (ER) is the sum, over all of the possible outcomes, of the return from each outcome times the probability of that outcome:
ER = Σi πi • Ri,   with   Σi πi = 1

where πi is the probability of outcome i occurring and Ri is the return from outcome i.
Any gamble ends up with one of the alternative outcomes happening; for instance, when you toss a coin, it must come up either heads or tails. In this gamble, using a fair coin, π1 = π2 = 0.5 are the probabilities associated with the outcomes heads and tails, with which the gambler respectively gains R1 = 2 € and loses R2 = 1 €. The expected return is 0.50 €:

ER = (π1 • R1) + (π2 • R2) = [0.5 • (+2 €)] + [0.5 • (-1 €)] = +0.50 €
If you play the game many times, you will on average make € 0.50 each time you
play. The expected return from taking the gamble is positive, so you should take it,
provided that you can repeat it many times. The same applies to any other gamble with
a positive expected return. A gamble with a zero expected return is a fair gamble.
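The rule is easy to mechanise; a minimal Python sketch of the ER formula above, applied to the coin bet from the text:

```python
# Expected return: returns weighted by their probabilities (which must sum to one).
def expected_return(probs, returns):
    assert abs(sum(probs) - 1.0) < 1e-9
    return sum(p * r for p, r in zip(probs, returns))

# The coin bet above: win 2 EUR on heads, lose 1 EUR on tails, fair coin.
print(expected_return([0.5, 0.5], [2.0, -1.0]))  # 0.5 -> better than fair, take it
```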
Problem 1: Check the result for π1 = π2 = 0.5, R1 = 0.5 €, R2 = -1 € by redoing the
calculations. What happens if π1 = 0.9 and π2 = 0.1?
Suppose now that you are playing the game once and that the bet is € 50,000, i.e. all your income. If you lose, you starve; if you win, you gain only a modest welfare increase. You may feel that a decline in your wealth from € 50,000 to zero hurts you
more than an increase from € 50,000 to € 150,000 would help you. The euros that
raise your income from zero to € 50,000 are worth more (per unit) than the additional
100,000 euros, starting from an income equal to 50,000 €. The rule “to maximise
expected return” is no longer rational. What is the rational behaviour in such a case?
B. The expected utility: risk aversion and risk preference.
John Von Neumann, the inventor of game theory, provided the answer to the question at the end of the last section by combining the idea of expected return used in the mathematical theory of gambling (probability theory) with the idea of utility used in economics. In this way, he showed that it is possible to describe the behaviour of individuals dealing with uncertain situations.
The basic underlying idea is that instead of maximizing expected return in euros,
individuals maximise expected return in utiles, i.e. expected utility. Each outcome i
has an associated utility Ui. He defined expected utility as:
E U(R) = Σi πi U(Ri)
The utility you get from outcome i depends only on how much more (or less) money that outcome gives you. If utility increases linearly with income, U(R) = a + (b • R), as shown along the line OE in Figure 1, whatever decision maximises ER also maximises EU:

EU(R) = Σi πi (a + b • Ri) = a Σi πi + b Σi πi Ri = a + b • ER

Hence, with a linear utility function the individual maximizing his expected utility behaves like the gambler maximizing his expected return.
We can represent graphically the utility level of any outcome on a two-dimensional graph, along a curve such as ODE in Figure 1. Along this curve we find the utility of the income Ri associated with outcome i. Considering ODE in Fig. 1, if you start with R* = 50,000 € and bet all of it at even odds on the toss of a coin (heads you win, tails you lose), then the utility to you of the outcome "heads" is the utility of € 100,000 (point E). The utility to you of the outcome "tails" is the utility of zero euros (point O).
[Fig. 1: a concave utility-of-income curve ODE, with incomes RA, R*, RC, RB on the horizontal axis and utility levels U(RA) = 600, EU(R) = 900 (point C), U(R*) = 1,000 and U(RB) = 1,200 on the vertical axis; the straight line OE depicts a linear, risk-neutral utility function. Fig. 2: a convex utility-of-income curve ODB for a risk lover, with U(RA) = 200, U(R*) = 800, EU(R) = 1,000 and U(RB) = 1,800.]
ODE shows a relation where income has declining marginal utility. That is, total utility
increases with income, but it increases more and more slowly as income gets higher
and higher. In deciding whether to bet € 25,000, you are choosing between two
different gambles. If you do not take the bet, you have the certainty (π* =1) of ending
up with R* = € 50,000. If you do take the bet, you have a 0.5 chance of ending up with
RA = € 25,000 and a 0.5 chance of ending up with RB = € 75,000. So in the first case,
assuming U(50,000 €) = 1,000 utiles, we have:
E U(R*) = Σi πi Ui = π* • U* = 1,000 utiles
In the second case, with U(25,000 €) = 600 and U(75,000 €) = 1,200 we have:
E U(R) = Σi πi Ui = (0.5 • 600 utiles) + (0.5 • 1,200 utiles) = 900 utiles
The individual, taking the alternative with the higher expected utility, declines the bet. In money terms, the two alternatives are equally attractive: they yield the same expected return R* = 50,000 €. In that sense, it is a fair bet. In utility terms, the sure option U(R*) = 1,000 is superior to the risky one, whose expected utility EU(R) = 900 equals the utility U(RC) of a smaller certain income RC. As long as the utility function has the shape shown in Figure 1, a certainty of X € will always be preferred to a gamble with the same expected return X €.
An individual who behaves in that way is said to be risk averse. Such an individual would decline a fair gamble but might accept one that is better than fair, e.g. betting € 1,000 against € 1,500 on the flip of a coin.
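The same comparison can be sketched numerically. The square-root utility below is an assumption (any concave function will do); it is scaled so that U(50,000) ≈ 1,000 utiles, though it does not reproduce the exact 600 and 1,200 utile values of Figure 1.

```python
import math

# An assumed concave (risk-averse) utility of income, scaled so U(50,000) ~ 1,000.
def u(income):
    return 4.47 * math.sqrt(income)

def expected_utility(probs, incomes):
    return sum(p * u(x) for p, x in zip(probs, incomes))

print(round(u(50_000)))                                       # ~1000: the sure option
print(round(expected_utility([0.5, 0.5], [25_000, 75_000])))  # ~965 < 1000: decline the fair bet
```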
ODB in Figure 2 shows instead the utility function of a risk lover: it exhibits increasing marginal utility. A risk lover is willing to take a gamble slightly worse than fair, although he would decline one with a very low expected return. An individual who is neither a risk lover nor a risk averter is called risk neutral. The corresponding utility function (the straight line OE) is also shown in Figure 1.
Problem 2: Will the agent accept the bet for U* = 1,000, UA = 200 and UB = 1,800 with πA = πB = 0.5 in figure 2? What are the values of U*, UA and UB for a risk-neutral agent in Figure 1? Will he prefer the certain income R* for πA = 0.6? And for πA = 0.4?
The degree to which someone exhibits risk preference or risk aversion depends on the
shape of the utility function, the initial income, and the size of the bet. For small bets,
we can expect everyone to be roughly risk neutral; the marginal utility of a euro does
not change very much between an income of 49,999 € and an income of 50,001 €,
which is the relevant consideration for someone with 50,000 € who is considering a 1
€ bet.
C. Rational behaviour in an uncertain world.
As section B showed, it is easier to predict the behaviour of someone maximizing expected return than that of someone maximizing expected utility. Each
individual can still maximise his utility by maximizing his expected return, as long as
he can repeat the same gamble many times (so that he can expect results to average
out). His income in the long run is (almost) certain. He maximises his expected utility
making that income as large as possible, i.e. by choosing the gamble with the highest
expected return. Maximizing expected utility is also equivalent to maximizing expected return (like the gambler we started with) when: (i) the individual is risk-neutral; (ii) the size of the prospective gains and losses is small compared to one's income (we can treat the marginal utility of income as constant and changes in utility as proportional to changes in income, so he should act as if he were risk neutral).
Let us now consider a firm rather than an individual. The management, wishing to raise the present price of the firm's stock, maximises the expected value of its future price by maximizing the expected value of future profits. The threat of takeover bids forces management to maximise the value of the firm's stock. When management pursues its own goals, the conclusion no longer holds. If the firm goes bankrupt, the income of the chief executive may fall a lot; accordingly, he may not be willing to take a decision with a 50 percent chance of leading to bankruptcy even if it also has a 50 percent chance of tripling the firm's value. Hence the assumption of risk neutrality might not always be appropriate for firms either.
The existence of risk averse agents explains the need for insurance. Suppose Paula's
income is 30,000 € and there is a small probability (0.01) that an accident reduces it to
€ 10,000. The insurance company offers to insure her against that accident for a fixed
price of € 200. Whether or not the accident happens, she gives them € 200. If the
accident happens, they give her back € 20,000. She has a choice between two
gambles: to buy or not to buy the insurance. Buying the insurance, whether or not the
accident occurs she has € 30,000 minus the € 200 paid for the insurance. For the first
gamble: π1 = 1; R1 = 29,800 € and
EU = π1 • U(R1) = 998 utiles.
When she does not buy the insurance: π1 = 0.99; R1 = € 30,000; U(R1) = 1,000 utiles
and π2 = 0.01; R2 = € 10,000; U(R2) = 600 utiles. That implies:
E U(R) = [π1 • U(R1)] + [π2 • U(R2)] = 990 utiles + 6 utiles = 996 utiles.
Paula is thus better off with the insurance than without it and will buy it. Notice that for R1 = € 30,000 the marginal utility of 100 € is about 1 utile.
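A minimal check of Paula's choice, using only the utile values given in the text:

```python
# Utile values for the relevant income levels, as given in the text.
U = {29_800: 998, 30_000: 1_000, 10_000: 600}

eu_insured   = 1.00 * U[29_800]                     # certain income net of the 200 EUR premium
eu_uninsured = 0.99 * U[30_000] + 0.01 * U[10_000]  # 990 + 6 = 996 utiles
print(eu_insured, eu_uninsured)  # 998.0 996.0 -> insurance raises expected utility
```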
Problem 3: How much would the agent in figure 1 pay to have a certain income R* = € 50,000, instead of RA = € 25,000 and RB = € 75,000 with π1 = π2 = 0.5? How much should instead the agent in figure 2 be paid to accept the certain income R*?
Buying the insurance was a fair gamble: € 200 is paid in exchange for a 1% chance of receiving € 20,000. An insurance company making 100,000 such bets will end up receiving, on average, almost exactly the expected return, if the probabilities of these bets are independently distributed. When insurance is fair, the insurance company and the client break even in monetary terms, but the client gains in utility.
In the real world, insurance companies incur additional expenses other than paying out claims, and hence offer gambles somewhat less than fair to clients. Sufficiently risk averse consumers still accept the gamble and buy an insurance contract that lowers their expected return but increases their expected utility. In our case, with a marginal utility of 100 € ≈ 1 utile, it would still be worth buying the insurance even if the company charged € 300 for it. It would no longer be worth buying at € 500.
Problem 4: Check those results.
Buying a lottery ticket is the opposite of buying insurance. When you buy a lottery
ticket, you accept an unfair gamble but this time you do it in order to increase your
uncertainty. In fact, on average, a lottery pays out in prizes less than it takes in. If you
are risk averse, it may make sense for you to buy insurance, but you should never buy
lottery tickets. If you are a risk lover it may make sense for you to buy a lottery ticket,
but you should never buy insurance.
This brings us to the lottery-insurance paradox. In the real world, the same people
sometimes buy both insurance and lottery tickets. They both gamble and buy
insurance knowing the odds are against them. Is this consistent with rational
behaviour? We propose two possible explanations. First, the individual may be risk averse for one range of incomes and risk preferring for another, higher, range. Second, what individuals get is not just one chance in a billion of a 100,000 € prize but also, for a while, the daydream (at a very low price) of getting it, a daydream made possible by the slim chance of actually winning the prize.
D. Von Neumann utility function and social welfare.
Von Neumann proved that if individual choice under uncertainty meets a few consistency conditions, it is always possible to assign utilities to outcomes in such a way that the decisions actually made follow from maximizing expected utility.
He considers an individual's behaviour "rational" or "consistent" under uncertainty (i.e. in choosing among "lotteries": collections of outcomes, each with a probability) if: (i) given any two lotteries A and B, the individual either prefers A to B, prefers B to A, or is indifferent between them; (ii) preferences are transitive: if you prefer A to B and B to C, you must prefer A to C; (iii) in considering lotteries whose payoffs are themselves lotteries, people combine probabilities in a mathematically correct fashion; (iv) preferences are continuous; (v) when outcome A is preferred to outcome B and outcome B to outcome C, there is a probability mix of A and C (a lottery containing only those outcomes) equivalent to B; i.e. since U(A) > U(B) > U(C), as utility moves from U(A) to U(C) it must, at some point, equal U(B).
Accepting these axioms, and hence the Von Neumann utility, the statement "I prefer outcome X to outcome Y twice as much as I prefer Y to Z" is equivalent to "I am indifferent between a certainty of Y and a lottery that gives me a two-thirds chance of Z and a one-third chance of X".
Problem 5*: Check that these statements are equivalent by doing the calculations.
We can make quantitative comparisons of utility differences and quantitative comparisons of marginal utilities. The principle of declining marginal utility is equivalent to risk aversion. We can agree about the order of preferences and about their relative intensity, but we may still disagree about the zero of the utility function and about the size of the unit in which we measure it. Utility functions are arbitrary with respect to increasing linear transformations: changes in the utility function that consist of adding the same amount to all utilities (changing the zero), or multiplying all utilities by the same positive number (changing the scale), or both, do not really change the utility function, i.e. the behaviour is exactly the same. [Please check this statement.]
Utilitarians used the concept of utility to determine social welfare, i.e. the total utility of individuals, which society should maximise. This was criticized because there is no way of making interpersonal comparisons of utility, nor of deciding whether a change that benefits me and hurts you increases total utility. With Von Neumann utility, the utilitarian rule "maximise total utility" is equivalent to "choose the alternative you would prefer if you could be any one of the people affected". Given the probability π = 1/N of being anyone, if there are N individuals, we can write the utility of person i as Ui = U(Ri) and consider the expected utility of the lottery with probability π of being each person:

EU(R) = Σi πi Ui = Σi π Ui = π Σi U(Ri)

It is easy to see that Σi Ui is simply social welfare, i.e. society's total utility.
Problem 6: Would this result hold when individual utility functions are different?
2. Strategic thinking and game theory
A. Interdependencies, strategic behaviour and rational individuals
An economy is fundamentally an interdependent system. Usually, interdependencies
are pushed into the background to simplify and solve economic problems. Markets are
represented in terms of an individual, consumer or producer, maximizing against an
opportunity set (e.g. a given budget constraint). Thus, important interaction features of
many markets are eliminated: bargaining, threats, bluffs, i.e. the whole range of strategic behaviour. In this way economic theory explains a large part of real markets while avoiding situations involving strategic behaviour.
In practice, it presents two basic standard models: competitive markets and monopolies. (1) On the one hand, each individual, i.e. the consumer or the producer, is a small part of the market. Therefore, he takes the behaviour of other individuals as given. He does not have to worry about how what he does will affect everyone else's behaviour: the rest of the world consists, for him, of a set of prices at which he can sell what he produces and buy what he wants. (2) On the other hand, the monopolist is so big that his behaviour affects the entire market. He deals with the mass of consumers, each of whom knows that he cannot individually affect the monopolist's behaviour. For the monopolist, customers are not persons but simply a demand curve: each consumer buys the quantity that maximises his welfare at the price the monopolist fixes.
In the analysis of oligopoly or bilateral monopoly, basic economics collapses from a coherent theory to a set of guesses. Analysing strategic behaviour is genuinely difficult: John Von Neumann, one of the brightest mathematicians and economists of the last century, created game theory as a whole new branch of mathematics in the process of trying to do it. The work of successive economists has brought economics closer to being able to say what economic agents will, or should, do in such a strategic framework.
We are interested in game theory as a tool of economic analysis for understanding strategic economic contexts. By studying the interactions of rational individuals we gain better knowledge, and better predictions, of real ones. Game theory gives us a clear and precise language to express and formalise intuitive insights and notions in models (defining players and, for each one, strategies and payoffs, i.e. a numerical representation of preferences) that can be analysed deductively, testing their logical consistency and finding which hypotheses support particular conclusions. Accordingly, we assume that agents are able to calculate how to play the game and to consider every possibility before making their first move. Obviously, for most games that assumption is not realistic, but it makes it relatively straightforward to describe the perfect play of a game.
Whatever the game, the perfect strategy is the one that produces the best result.
It is more difficult to construct a theory of the imperfect decisions of realistic players with limited abilities. The assumption of rationality can be defended on the grounds that there is one right answer to a problem and many wrong ones: if individuals tend to choose the right one, we can analyse their behaviour as if they chose the best option.
Let us start with an informal description of the most famous game recurring in economic contexts, to give a feeling for strategic behaviour. Parts 3 and 4 contain a more formal analysis (which avoids mathematical definitions), discussing ways in which one might "solve" a static game, and applications to oligopoly theory. Dealing with non-cooperative theory, our units of analysis are individual players, subject to clearly defined rules and possibilities, who pursue their best interest.
B. Strategic Behaviour: the Prisoner’s Dilemma
Paula and Albert are arrested. If convicted, each will receive a jail sentence of five years. The public prosecutor does not have enough evidence to get a conviction, so he puts the two criminals in separate cells. He goes first to Paula. If she confesses and Albert does not, the major charge will be dropped and she will get only three months for a minor charge. If Albert also confesses, the charge cannot be dropped but the judge will be indulgent: Albert and Paula will get two years each. If Paula refuses to confess, the judge will not be indulgent; if Albert confesses, Paula will be convicted, possibly with the maximum sentence. If neither confesses, they will get a six-month sentence for minor charges. Then the public prosecutor goes to Albert's cell and gives a similar speech. Figure 1A shows the matrix of outcomes, in utiles, facing Paula and Albert. Once Paula and Albert choose their strategies (a row and a column, respectively), the corresponding cell gives their payoffs (UP, UA), the first for Paula, the second for Albert.
Figure 1A (payoffs UP ; UA, with Paula choosing the row and Albert the column):

              Albert
               C          NC
Paula   C    4 ; 4      2 ; 5
        NC   5 ; 2      3 ; 3

Figure 1B:

              Albert
               C          NC
Paula   C    4 ; 4      4 ; 3
        NC   3 ; 4      3 ; 3
Paula reasons as follows. (1) If Albert confesses (he chooses NC, not cooperating with me) and I don't (C, NC), I get five years, UP(C, NC) = 2; if I confess too (NC, NC), I get two years, UP(NC, NC) = 3. If Albert is going to confess, I had better confess too: 3 utiles > 2 utiles. (2) If neither of us confesses (C, C), I go to jail for six months, UP(C, C) = 4. That is an improvement, but I can do better: if I confess (NC, C), I get three months, UP(NC, C) = 5. So if Albert is going to stay silent, I am better off confessing: 5 utiles > 4 utiles. (3) Whatever Albert does, I am better off confessing (choosing strategy NC).
Problem 7: Show that Albert makes the same calculation and reaches the same conclusion, so that both confess.
Obviously, Paula picks her strategy with the objective of getting the highest payoff, since the payoff is just a numerical representation of her preferences. Albert does the same. The same applies if Paula and Albert are two producers selling a similar product and the payoffs represent net profits (ΠP, ΠA). Each can advertise (or make price concessions) to attract clients, so each one is led to advertise (choosing NC) to increase profits at the expense of the rival. But if both increase advertising (NC, NC), net profits will decrease. They would do better by cooperating and restricting advertising or price concessions (C, C).
This game introduces a solution concept. The players do not cooperate because not cooperating is better than cooperating, whatever the other does. In Figure 1A the row NC has a higher payoff for Paula than the row C whichever column Albert chooses. Similarly, the column NC has a higher payoff for Albert than the column C whichever row Paula chooses.
If one strategy leads to a better outcome than another whatever the other player does, the first strategy is said to dominate the second. If one strategy dominates all others, then the player is always better off using it; if both players have such dominant strategies, we have a solution to the game. In this case, we are also assuming that each player believes that his strategy choice does not affect the others' strategy choices. When applying the dominance criterion iteratively, we assume that players assume that others will not play dominated strategies. As long as these premises are correct, dominance gives a neat and straightforward mechanism for making predictions. When the solution cannot be found with the dominance criterion, we must resort to a different solution concept, due to Nash, which we will introduce in part 3.
In our example, both players act rationally and both are, as a result, worse off. Individual rationality, i.e. making the choice that best achieves the individual's ends, results in both individuals being worse off. The result of the prisoner's dilemma seems counter-intuitive, but there are a number of situations where rational behaviour by the individuals in a group makes all of them worse off.
The explanation is that individual rationality and group rationality are different things. Paula is only choosing her own strategy, not Albert's. If Paula could choose between the upper left-hand cell of the matrix (C, C) and the lower right-hand cell (NC, NC), she would choose the former; so would Albert. But those are not the choices they are offered. Paula is choosing a row, and the bottom row dominates the top one; it is better whichever column Albert chooses. Albert is choosing a column, and the right-hand column dominates the left-hand one whichever row Paula chooses.
They do not cooperate unless the structure of rewards and punishments facing them is changed, as it is in Fig. 1B. Criminals and producers make considerable efforts to raise the cost of non-cooperative strategies and lower the cost of cooperative ones. That does not refute the logic of the prisoner's dilemma; it merely means that real agents are sometimes playing other games, such as the one in Fig. 1B. When payoffs have the structure shown in Figure 1A, the logic of the game is compelling and there is no cooperation. P and A cannot make a binding agreement, since they must move simultaneously and independently, so there is no way to bind the other or inflict any punishment.
Problem 8: Show that in Fig. 1B Paula and Albert make the same calculation and reach the conclusion to cooperate, so that both cooperate.
Assume now that nature makes the first move and chooses game 1A with probability 0.2 and game 1B with probability 0.8, without telling Paula and Albert which is the case. How should they behave? They should use expected payoffs, according to the theory of choice under uncertainty. The expected payoff in case (C, NC) will thus be given by 0.2 • (2, 5) + 0.8 • (4, 3) = (3.6, 3.4). Show that both will cooperate in this case.
Game theory provides a taxonomy of economic situations, based on the strategic form. Other games, like the so-called battle of the sexes, can also be applied to economic contexts. If Paula and Albert produce complementary goods, they may wish to adopt compatible standards even if they prefer different sorts of standards. In this game, shown in figure 1C, the players seek to coordinate their actions although they have conflicting preferences.
Figure 1C (payoffs UP ; UA):

              Albert
               a          b
Paula   a    3 ; 5      0 ; 0
        b    0 ; 0      5 ; 3

Figure 1D:

              Albert
               a          b
Paula   a    3 ; 5      0 ; 0
        b    0 ; 0      2 ; 1
We can have two solutions in which a standard is adopted: a (favoured by Albert) or b (favoured by Paula). In figure 1D, too, no strategy is dominated, but here standard a emerges as optimal from consistent preferences. We will provide the tools to solve these games in part 3, introducing Nash equilibria.
C. Repeated Prisoner’s Dilemma
It could be argued that the prisoner's dilemma result holds simply because it is a one-shot game. Real-world situations involve repeated plays: if one player does not cooperate, he can expect similar treatment next time, so both cooperate. This "reputation" argument seems credible, but is it right? Consider Paula and Albert playing the game in Figure 1A a thousand times. A player who betrays his partner gains 1 utile in the short run. The victim will respond by betraying on the next turn, and perhaps several more. Both players would be better off cooperating every turn: 4 > 3. Is that insignificant gain (1 utile) worth such a huge price (up to 1,000 utiles over the whole game)?
However, this reasoning has a problem. Consider the last turn of the game. Each player knows that whatever he does, the other will have no further opportunity to punish him. The last turn is therefore a one-shot game: not cooperating dominates cooperating. Each player knows that the other will betray him on the last move. He need not fear punishment for anything he does on the previous move; in any case the other is going to betray him on the next move. So both betray on that move as well, and there is now no punishment for betraying on the move before. Thus, if they are rational, they betray each other on the first move and every move thereafter. If they had been irrational and cooperated, they would each have ended up with a better deal. The result seems paradoxical, but the backward induction argument is correct. The cooperative solution to the repeated prisoner's dilemma is unstable because it pays to betray on the last play, hence on the next-to-last play, and so on back to the beginning. With sufficiently bounded rationality the cooperative solution is no longer unstable. Consider the game of repeated prisoner's dilemma (with 1,000 plays) played by robots programmed to handle only 900 possible states, due to their limited amount of memory (each state implies a move, cooperate or betray in the case of the prisoner's dilemma). They won't consider the last round if they cannot count up to 1,000.
On the other hand, cooperation may be stable if the play is repeated an infinite or indefinite number of times. The promise to cooperate becomes credible due to the threat to punish non-cooperative behaviour, since once the cooperative equilibrium is broken there is no advantage for either side in trying to restore it. However, there is no unique equilibrium: the non-cooperative solution is an equilibrium, and so are behaviours in which periods of cooperation and non-cooperation alternate. Suppose that P cooperates for two periods and then does not cooperate for one period, while A always cooperates. P gets 13 every three periods and A gets 10, but both are better off than the 9 each they would get if cooperation broke down.
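A minimal sketch of the alternating pattern just described, using the stage payoffs of Figure 1A:

```python
# Stage payoffs from Figure 1A: (Paula's move, Albert's move) -> (U_P, U_A).
payoff = {('C', 'C'): (4, 4), ('C', 'NC'): (2, 5),
          ('NC', 'C'): (5, 2), ('NC', 'NC'): (3, 3)}

# Paula cooperates twice, defects once; Albert always cooperates.
cycle = [('C', 'C'), ('C', 'C'), ('NC', 'C')]
print(sum(payoff[m][0] for m in cycle),   # Paula: 13 per three periods
      sum(payoff[m][1] for m in cycle))   # Albert: 10; both beat 3 x 3 = 9 under breakdown
```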
Problem 9*: Consider Paula and Albert playing the game in Figure 1A when the probability of repeating it next time is ρ = 0.9. Assume both cooperate at first. Does it pay not to cooperate if, once an agent is betrayed, he will always betray? What if the game is repeated infinitely many times and agents discount future utility at r = 10%?
3. Game Theory
Game theory was conceived by Von Neumann and presented in The Theory of Games and Economic Behaviour, co-authored with Oskar Morgenstern. You have probably realised how broad the applications of "game theory" are. Von Neumann's objective was to understand all behaviour that can be structured like a game. That includes most of the usual subject matter of economics, sociology, interpersonal relations, political science, international relations, biology and perhaps more. Along this line of reasoning, a solution means figuring out how an agent should play any game perfectly. If one can set up any game as an explicit mathematical problem, the details of the solution of each particular game become simply applications.
In this perspective, complicated games turn out to be conceptually trivial. The total number of moves, and thus the total number of possible ways to play, is limited: very large but finite. All a player needs to do is list all possible plays of the game, note his payoff for each, and then work backward from the last move, assuming at each step that each player chooses the move that leads to his higher payoff.
This is the right approach to games if you seek a common way of describing all of them, in order to figure out in what sense games have solutions and how, in principle, one could find them. Obviously, it may not be a practical way of solving many games. The number of possible plays can be as large as the number of stars in the universe, so finding enough paper (or computer memory) to list them may still be difficult. But, as game theorists, we may not be interested in these difficulties, being willing to grant an unlimited amount of computer memory and time to solve a game.
A. Static games, dynamic games and Nash equilibria.
For simplicity's sake, let us consider two-person games. We need to show how any game can be represented in a reduced (or normal, or strategic) form, as in Figure 1, and in what sense the reduced form of a game can be solved.
We can think of a dynamic game as a series of separate decisions: I make a first move, you respond, I respond to that, and so forth. We will see later that it can be represented by a tree in extensive form. However, we can describe the same game in terms of a single move by each side. The move consists of the choice of a strategy describing what the player will do in any situation. The strategy would be a complete description of how I would respond to any sequence of moves I might observe my opponent making, and to any sequence of random events, such as the toss of a die. Thus one possible strategy might be to start with a given move; then, if the opponent's move is x, to respond with y; if the opponent's move is instead z, to respond with w; and so on.
Since a strategy determines everything a player will do in every situation, playing any
game simply consists of each side picking one strategy. Strategy decisions are simultaneous: although each player may observe his opponent's moves as they are made, he cannot see his opponent's mind. Once the two strategies are chosen,
everything is determined. We can imagine the two players writing down their
strategies and then sitting back and watching as computers execute them.
Considered in these terms, any two-person game can be represented by a payoff
matrix, although it requires a huge number of rows and columns. Each row represents
a strategy that P can choose; each column represents a strategy that A can choose. The
cell at the intersection shows the outcome of that particular pair of strategies. If the
game contains random elements, the cell contains the expected outcome, the average
payoff over many plays of the game. In game theory, this way of describing a game is
called its strategic or reduced form.
Let us now discuss the Nash equilibrium solution concept, a generalization of an idea developed by a French economist and mathematician early in the nineteenth century. Consider a game played over and over. Each player observes what the other players are doing and alters his play accordingly. He acts on the assumption that what he does will not affect what they do; in this way he does not have to take such effects into account. He keeps changing his play until no further change can make him better off. All the other players do the same. Equilibrium is reached when each player has chosen a strategy that is optimal for him, given the strategies that the other players follow. This solution is called a Nash equilibrium and is a generalization, due to John Nash, of what Antoine Cournot had realised more than a hundred years earlier.
In equilibrium all players know what they and the others should do, i.e. it is self-evident how to play. Behaviour is self-evident and each player believes it to be equally clear to the others, so he chooses the best response to what the others are obviously doing. In practice, players may have talked beforehand and come to a credibly self-enforcing agreement, may have learned from experience with each other, or may follow social rules of conduct. The set of Nash equilibria is the collection of all credibly self-enforcing agreements and stable conventions that can be arranged.
Consider the case of driving. Choosing a strategy consists of deciding which side of the road to drive on. The Hungarian population has reached a Nash equilibrium in which everyone drives on the right (R, R). The situation is stable, and would be stable even without traffic police to enforce it. Since everyone else drives on the right, a driver choosing the left would impose very large costs on himself (as well as on others); so it is in his interest to drive on the right. In England, everyone drives on the left (L, L). That is a Nash equilibrium too, for the same reason. It may be sub-optimal: since in other countries people drive on the right, cars must be manufactured with steering wheels on the right side specifically for the English market, and foreign tourists driving in England may automatically drift into the right-hand lane and discover their error only when they encounter an English driver face to face.
If all English drivers switched to driving on the right, they might all be better off. But any English driver who tried to make the switch on his own initiative would be very much worse off. A Nash equilibrium is stable against individual action even when it leads to a sub-optimal outcome. It may not be stable against joint action, e.g. a country switching to driving on the right, i.e. everyone changing his strategy at once.
Problem 10: Represent such a game using a payoff matrix. Show that (a, a) and (b, b) are Nash equilibria in the game in figure 1C. Is that true for the game in figure 1D?
The Nash equilibrium is not, in general, unique: both everyone driving on the left and everyone driving on the right are equilibria. Part of its definition is that my strategy is optimal for me given the strategies of the other players; I act as if what I do has no effect on what they do. But my actions do affect the other players, who respond by following their own optimal response strategies. Moreover, the choice of a different strategy variable gives rise to different Nash equilibria for otherwise identical games, so the rules of the game must also be agreed upon.
Nash equilibria do not cover everything a good player would do. We explicitly ignore approaches like stealing candy from babies, i.e. strategies that work badly against good opponents but exploit the mistakes of bad ones. It is hard to include them, since it is quite impossible to define a "best" strategy against many different opponents and the many different mistakes they might make. It seems reasonable to define a solution as the correct way to play against an opponent playing correctly.
Figure 2A (payoffs UP ; UA):

              A
               a          b
P       a   -5 ; 5      5 ; -5
        b    5 ; -5    -5 ; 5

[Figure 2B: the same game in extensive form. P moves first, choosing a or b; A then chooses a or b without observing P's move, as indicated by an information set (an ellipse) containing the nodes Aa and Ab; the terminal payoffs are those of Figure 2A. Figure 2C: an equivalent tree in which A moves first and P chooses inside an information set containing the nodes Pa and Pb.]
Whether a solution exists for a game depends on what its reduced form looks like. Figure 1 shows the reduced forms of games, each of which has a solution. The game in Figure 2A has no solution in terms of pure strategies, but it still has a mixed-strategy solution. A mixed strategy is a probability mix of pure strategies, e.g. a 50% chance of a and a 50% chance of b for Paula, and likewise for Albert. A player who follows that mixed strategy will lose, on average, zero, whatever his opponent does. A player whose opponent follows that strategy will win, on average, zero, whatever he does. So the Von Neumann solution is for each player to adopt that strategy. It is not only a solution but the only solution: if player P follows one pure strategy (say a) more frequently than the other, his opponent A can win more often than he loses by always picking the pure strategy that wins against it.
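A quick simulation of the 50/50 mixed strategy in the Figure 2A game (a sketch; the seed and the number of plays are arbitrary choices):

```python
import random

# P's payoff in the Figure 2A game (zero-sum, so A's payoff is the negative).
payoff_P = {('a', 'a'): -5, ('a', 'b'): 5, ('b', 'a'): 5, ('b', 'b'): -5}

random.seed(1)
plays = [(random.choice('ab'), random.choice('ab')) for _ in range(100_000)]
avg = sum(payoff_P[p] for p in plays) / len(plays)
print(round(avg, 2))  # close to 0: the 50/50 mix guarantees zero on average
```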
Problem 11: Change the sign of every payoff in Fig. 2A and let Paula and Albert play the new game. Show that it has only a mixed-strategy solution and find it. What kind of game is it?
B. Extensive form, dynamic games and perfect Nash equilibria.
Figure 2B represents the same game as Figure 2A in extensive form. Here attention is given to the timing of actions and to the information available to the players when they choose each action. In an extensive game we have a series of decision nodes, labelled by the player whose turn comes when that position is reached. At the start of the game, at the initial node P, the two lines labelled a and b indicate that player P must decide between a and b. In general, these lines may point to another node (Aa and Ab) or to a vector of numbers (payoffs) when the move ends the game. In Figure 2B, after the initial move by P it is the turn of player A, who makes his choice between a and b without knowing P's decision, as indicated by the ellipse, called an information set, containing nodes Aa and Ab. This means that A does not know which of the two nodes he is at when selecting his response. His choice ends the game. A game in extensive form is like a tree: starting at the initial node, it branches out until it reaches the payoffs. In fact, each subsequent node has exactly one line (a move) pointing to it and at least one line out of it (an action available to the player). Accordingly, from any node there is a single path back to the initial node and it is impossible to cycle back to the same node. Nodes in the same information set have the same player and the same range of choices.
For every extensive form game there is a corresponding strategic form game, but a
strategic form game may correspond to several extensive form games. Trivially,
interchanging the chronological order we can let A choose first as in figure 2C.
In figure 3A, by removing the ellipse we introduce a dynamic game, in which A learns
P’s move. This game does not correspond to 2A and has different solutions.
Problem 12: Represent the game in Fig. 3A in normal form. Show that it has two pure-strategy solutions and find them. Is there a first-mover advantage?
Let us now change the payoffs and consider the Stackelberg game in figure 3B, where P and A are the only producers in a market facing a decreasing demand function. P can, and does, commit to her level of output (a = high, b = low) before A has the opportunity to act. A observes P's output and then decides what quantity to produce (a or b).
[Figure 3A: the tree of Figure 2B without the information set, so that A observes P's move before choosing at node Aa or Ab; the terminal payoffs are those of Figure 2A. Figure 3B: the same tree with the Stackelberg payoffs (UP ; UA): (a, a) = -3 ; -3, (a, b) = 5 ; 2, (b, a) = 2 ; 5, (b, b) = 4 ; 4.]

Figure 3C (the game of Figure 3B in normal form; columns 1-4 are A's strategies, i.e. his possible reaction rules):

               1          2          3          4
P      a     5 ; 2    -3 ; -3     5 ; 2    -3 ; -3
       b     4 ; 4     2 ; 5      2 ; 5     4 ; 4
Suppose that before P moves, A warns her: "I will choose a whatever you choose". If P believes this threat, she should choose b in order to get 2 instead of -3. A's optimal response to the choice of b by P is indeed a. So if P finds the threat credible and responds optimally to it, she will play b. But the threat is not credible: if P chooses a, A will face a loss by carrying it out, and will choose b in the absence of other considerations. A threat to act differently from the optimal response is not credible once P's move is already carried out.
We can also have incredible promises. Suppose that A says: "I will choose a if you choose a and b if you choose b". If P believed that and acted accordingly, A would have an incentive to break his promise and choose a once P had chosen b. Accordingly, P will not believe him without some credible guarantee and will opt for a. Note, however, that (b, a) is a Nash equilibrium.
Problem 13: By representing the game in normal form, as in figure 3C, show what the strategies 1, 2, 3 and 4 available to A are.
When P plays a, A should respond optimally, playing b; i.e. if node Aa is reached, the "rest of the game" (the subgame with initial node Aa) will be played in the standard way, with each player acting in his best interest given the circumstances. Hence, (a, b) is the only subgame perfect equilibrium, i.e. the only pair that induces a Nash equilibrium in each subgame (including those that are not reached in equilibrium). All subgame perfect equilibria are also Nash equilibria; vice versa, not all Nash equilibria are necessarily subgame perfect. In fact, (b, a) is not subgame perfect because it is not an equilibrium in the subgame with initial node Aa.
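Backward induction on this tree is short enough to write out; a minimal Python sketch with the Figure 3B payoffs:

```python
# Figure 3B payoffs: (P's move, A's move) -> (payoff_P, payoff_A).
payoffs = {('a', 'a'): (-3, -3), ('a', 'b'): (5, 2),
           ('b', 'a'): (2, 5),  ('b', 'b'): (4, 4)}

def best_reply(p_move):
    # A, moving second, picks the reply maximising his own payoff.
    return max('ab', key=lambda a_move: payoffs[(p_move, a_move)][1])

# P anticipates A's best reply and maximises her resulting payoff.
p_star = max('ab', key=lambda m: payoffs[(m, best_reply(m))][0])
print(p_star, best_reply(p_star))  # a b -> (a, b), the subgame perfect equilibrium
```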
4. Economic Applications: Oligopoly.
A. Oligopoly applications of the Nash Equilibrium
In economics there are many applications of game theory, but we shall limit ourselves to the case of oligopoly, a market structure somewhere between monopoly and perfect competition. Oligopoly exists when a small number of firms sell in a single market. A reason for this situation is that the optimal size of firm (the size at which average cost is minimized) is so large that there is room for only a few such firms; this corresponds to the cost curves shown in Figure 4. The situation differs from perfect competition because each firm is large enough to have a significant effect on the market price. It differs from monopoly because there is more than one firm. The firms are few enough, and their products similar enough, that each must take account of the behaviour of all the others.
As far as their customers are concerned, oligopolists have no more need to worry about strategic behaviour than monopolies do. The problem arises with competitors. All the firms together will be better off if they keep their output down and their prices up, but each firm individually is better off increasing its output in order to take advantage of the high price.
One can imagine at least three different outcomes. The firms might behave
independently, each trying to maximise its own profit while somehow taking account
of the effects of what it does on what the other firms do. A leader may emerge while
other firms behave as followers. In repeated games, firms might cooperate,
coordinating their behaviour as if they were a single monopoly.
In a static game, firms in an oligopolistic industry may talk about a cooperative agreement, although, as in the prisoner's dilemma, each will violate the agreement if doing so is in its interest. In a one-shot game, agreements are not worth making because they cannot be enforced, even if they can be reached. In such a situation, each firm tries to maximise its profit independently and the result is a Nash equilibrium: each player takes what the other players are doing as given when deciding what he should do to maximise gains. But the firms jointly face a downward-sloping demand curve.
We should carefully define a strategy, since different definitions (quantity or price)
lead to different conclusions. Each firm may decide how much to sell and let the
market determine the price; or it may choose its price and let the market determine
quantity. Considering a duopoly, let us first find the Nash equilibrium on the
assumption that a firm’s strategy is defined by the quantity it produces, i.e. the case
originally analysed by Cournot.
B. Cournot Competition: Quantity Strategy.
Given the quantities produced by the other firms, each firm calculates how much it should produce to maximise profit. Figure 4 shows this situation from the point of view of firm 1. D is the demand curve for the whole industry. Q2 is the output of the other firms in the industry (of the single rival in a duopoly). The figure also shows the marginal cost (MC, defined as the slope of total cost, i.e. the cost of increasing quantity by one unit) and the average cost (AC, defined as total cost over quantity) of firm 1. Whatever price the firm decides to charge, it faces the residual demand curve (total demand minus Q2): D1 = D - Q2. To maximise profits the firm derives its marginal revenue from the residual demand curve D1 and produces the quantity Q*1 at which marginal revenue intersects marginal cost, as in monopoly, provided that at that quantity it is not losing money. It makes profits if the average cost AC is smaller than the price P. If the firms are identical, they will find the same profit-maximising output. In a Nash equilibrium with two firms, each firm produces Q*1, with a total output Q = 2Q*1.
With free entry, if the price is above average cost there are positive profits and new firms will enter the market. Hence, in equilibrium, average cost equals the price and profit is approximately equal to zero, as in Figure 4.
[Figure 4: firm 1's cost curves MC1 and AC1, the industry demand D, the rival output Q2 and the residual demand D1 = D - Q2 with its marginal revenue MR1; firm 1's profit-maximising output Q*1 is where MR1 = MC1. Figure 5: the reaction curves R1 and R2 in the (Q1, Q2) plane, intersecting at the Cournot-Nash equilibrium E = (Q*1, Q*2).]
The Nash equilibrium can be found using reaction curves, which show what strategy one player chooses given the strategy of the other. In Figure 4, D1 is the residual demand curve faced by Firm 1, given that Firm 2 is producing a quantity Q2. By repeating the calculation of Q1 for different values of Q2, we build R1 in Figure 5, the reaction curve for Firm 1. It shows, for any quantity that Firm 2 chooses to produce, how much Firm 1 will produce. Point E is the point calculated using Figure 4. The same analysis can be used to generate R2, the reaction function showing how much Firm 2 will produce for any quantity Q1 that Firm 1 produces. Since the two firms are assumed to have the same cost curves, their reaction curves are symmetrical.
The Nash equilibrium is reached at point E, where each firm produces its optimal quantity given the quantity produced by the other firm. It occurs only at point E, where the reaction curves intersect, since only there are the two strategies consistent, each optimal against the other. This "reaction curve approach" can be applied to a wider range of problems.
C. Bertrand Competition: Price Strategy.
Let us now redo the analysis using a price strategy. Each firm observes the prices the other firms are charging and selects the price that maximises its profit. Since the firms produce identical goods, only the lowest price matters. Figure 6 shows the situation from firm 1's perspective. Pl is the lowest of the prices charged by the other firms.
The firm in this situation has three alternatives, as shown by D1. It can charge more than Pl and sell nothing. It can charge Pl and sell a determinate amount, Q(Pl)/N, if there are N firms each charging Pl. Or it can charge less than Pl, say one cent less, and sell as much as it wants, up to Q(Pl). It is easy to see that, if Pl is greater than AC, the last choice maximises its profit: firm 1 maximises profit by producing Q1(Pl) and selling it for just under Pl.
In a Bertrand-Nash equilibrium, every firm is maximizing its profit. Each other firm also has the option of cutting its price (say by a cent) and selling all it wants; whatever price the other firms are charging, it is in the interest of each firm to charge a cent less. The process stops when the price reaches a level consistent with each firm selling where price equals marginal cost. If additional identical firms are free to enter the industry, the process stops when the price gets down to minimum average cost and each firm is indifferent between selling as much as it likes or nothing.
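A stylised sketch of the undercutting process (the starting prices and the one-cent step are arbitrary assumptions):

```python
# Bertrand undercutting with constant marginal cost c: each firm shaves one cent
# off the lowest standing price while that remains profitable.
c, step = 10.00, 0.01
p1 = p2 = 50.00
while min(p1, p2) - step > c:
    p1 = min(p1, p2) - step  # firm 1 undercuts
    p2 = min(p1, p2) - step  # firm 2 undercuts back
print(round(p1, 2), round(p2, 2))  # both end up at (roughly) marginal cost
```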
[Figure 6A: firm 1's situation under price competition: cost curves MC1 and AC1 and the residual demand D1, which is zero above Pl, Q(Pl)/N at Pl and up to Q(Pl) just below Pl; point B marks the competitive (Bertrand) outcome. Figure 6B: the corresponding quantities in the (Q1, Q2) plane, with the Cournot point E shown for comparison.]
Oligopolistic firms engaged in Bertrand competition thus behave in a competitive way, ending up at point B instead of E. This seems a bit peculiar: in an oligopoly, where firms are large and affect the price, we would expect them to produce less than the competitive output.
D. The Stackelberg dynamic game.
Let us finally consider the dynamic Stackelberg game using a quantity strategy. Firm 1 can commit to a given level of output before firm 2 has the opportunity to act. Firm 2 observes her output and then decides what quantity to produce. Figure 7A shows the Stackelberg game from firm 1's perspective. Firm 1 can choose the Cournot equilibrium production level Q*1, but also different levels. She knows that firm 2 will respond optimally to her chosen level. Starting from Q*1, she can improve her situation by increasing her output to Q'1: in this way firm 2 will find it optimal to cut his output to Q'2, and the price will fall only by a limited amount. By repeating this calculation of Π1 for different values of Q1, she finds her maximum profit Π'1. Her new profit Π'1 will be greater than Π*1, the Cournot equilibrium level.
[Figure 7A: firm 2's residual demand before and after the leader's expansion (D2 and D'2) with marginal revenue MR2 and marginal cost MC2, and the corresponding prices P, P' and outputs Q'1, Q'2. Figure 7B: the reaction curves R1 and R2 in the (Q1, Q2) plane, with the Cournot point E, the Stackelberg point S1 on R2 to the right of E, and firm 1's monopoly output M1.]
In Figure 7B, looking at firm 2’s reaction curve, the Stackelberg equilibrium is
reached at point S1, to the right of E, where firm 1 produces a greater quantity and firm
2 a smaller one. In the quantity space, knowing that firm 1’s profits increase the nearer
we are to its monopoly output M1, we can draw the isoprofit curves for firm 1 (and
likewise for firm 2). They reach their maximum along the reaction curve R1 (as it
represents the best reply for a given Q2), where they are tangent to horizontal lines.
The point on R2 which maximises the profits of firm 1 is S1, i.e. the point on the lowest
isoprofit curve (the one closest to the monopoly output M1) that still touches R2. In
practice, the first mover correctly anticipates its rival’s reaction: it incorporates the
follower’s maximisation problem when setting Q’1. The follower behaves as in Cournot,
since no further reactions are expected. In the equilibrium S1 the first mover, firm 1,
chooses the output on the R2 reaction curve which maximises its own profits.
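The leader’s problem can also be sketched numerically (again only an illustration, under the same assumed linear demand P = a - Q1 - Q2 with arbitrary values a = 10, c = 2): the leader searches over its own output, letting the follower best-respond, and picks the profit-maximising commitment.

    # Stackelberg by backward induction: a minimal sketch with the same
    # assumed linear demand and cost parameters as before.
    a, c = 10.0, 2.0

    def follower(q1):
        # Firm 2's Cournot reaction to the committed q1.
        return max(0.0, (a - c - q1) / 2)

    def leader_profit(q1):
        q2 = follower(q1)
        return (a - q1 - q2 - c) * q1

    # Grid search over the leader's commitment level.
    best_q1 = max((i / 1000 * a for i in range(1001)), key=leader_profit)
    print(best_q1, follower(best_q1), leader_profit(best_q1))
    # -> leader (a - c)/2 = 4, follower (a - c)/4 = 2, leader profit (a - c)^2/8 = 8,
    #    which exceeds the Cournot profit (a - c)^2/9 = 7.11...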
E. Tacit collusion in repeated games: Cournot supergames
Collusive outcomes are sustainable as non-cooperative equilibria in repeated games.
Let our firms play the Cournot quantity-setting game infinitely many times. Before
starting the game, firms select a Pareto optimal equilibrium and commit to a strategy;
then, in each stage game, they choose output simultaneously. Firm i maximises the
present value of its profits [with ρ = 1/(1+r) = discount factor]:

    Πi = Σt ρ^t Πi(Q1t, Q2t)   (sum over t = 0, 1, 2, …)

Outputs (Q1t, Q2t) are observed at the start of period t+1. Firms condition current
actions on previous behaviour using a trigger quantity strategy (a single deviation
triggers the end of cooperation): they cooperate (producing the collusive output) as long
as the others do so; after a defection they revert to non-cooperation. Punishments
correspond to the Cournot equilibrium in an infinitely long punishment phase (Cournot
reversion).
Stage game payoffs are represented in figure 8.
Figure 8A (stage-game payoffs, firm 1 ; firm 2; C = cooperate on the collusive output,
NC = non-cooperation; Πb = profit of the firm that is cheated on):

                  C                NC
        C     Π* ; Π*          Πb ; Πd
        NC    Πd ; Πb          Πc ; Πc

[Figure 8B: reaction curves R1 and R2 in (Q1, Q2) space, with the Cournot point E, the
Bertrand point B, the collusive point C and the deviation points D1, D2.]
Each firm earns Π*/(1-ρ) by cooperating (where Π* = profit in the collusive
equilibrium) and Πd + ρΠc/(1-ρ) by deviating (where Πd = profit when deviating from
the collusive equilibrium and Πp = Πc = Cournot profit in the punishment phase).
We have a collusive equilibrium when

    Π*/(1-ρ) > Πd + ρΠp/(1-ρ),   i.e.   Π* > Πd(1-ρ) + ρΠp,

or equivalently

    ρ > (Πd - Π*) / (Πd - Πp),

i.e. the discount factor must exceed the ratio of the short-run gains from defection
(Πd - Π*) to the permanent punishment losses (Πd - Πp).
Since Πd - Πc = (Πd - Π*) + (Π* - Πc) > Πd - Π*, a mild punishment is enough when ρ
is close to 1. When the response to “cheating” (detection and punishment) is quick, any
payoff vector that is better for all players than a Nash equilibrium payoff vector can be
sustained as the outcome of a perfect equilibrium of the infinitely repeated game,
provided players are patient (ρ close to 1). Cournot reversion is not the most severe
punishment; credible, more competitive behaviour (like Bertrand reversion) lowers Πp,
promoting collusion.
Problem 14* Assume that there are only two firms with constant marginal costs (C1 =
cQ1 and C2 = cQ2) facing the market demand function Q(P) = Q1+Q2 = a-P. Calculate
the levels of output and profit for both firms under Cournot, Bertrand, Stackelberg,
collusion and “cheating”. Calculate also the minimum level of ρ supporting collusion.
5. Appendix: Solution to selected problems
Problem 1: Check the result for π1 = π2 = 0.5, R1 = 0.5 €, R2 = -1 € by redoing the
calculations. What happens if π1 = 0.9 and π2 = 0.1?
E R = (π1 • R1) + (π2 • R2) = [0.5 • (0.5 €)] + [0.5 • ( - 1 €)] = - 0.25 €.
The expected return from taking the gamble is negative.
It becomes positive if π1 = 0.9 and π2 = 0.1:
E R = (π1 • R1) + (π2 • R2) = [0.9 • (0.5 €)] + [0.1 • ( - 1 €)] = 0.35 €.
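As a quick check, the two computations can be reproduced in a few lines of code (a trivial sketch, using only the numbers stated in the problem):

    # Expected return of a gamble: E[R] = sum of probability-weighted returns.
    def expected_return(probs, returns):
        return sum(p * r for p, r in zip(probs, returns))

    print(expected_return([0.5, 0.5], [0.5, -1.0]))   # -0.25 €
    print(expected_return([0.9, 0.1], [0.5, -1.0]))   #  0.35 €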
Problem 2: Will the agent accept the bet for U* = 1,000, UA = 200 and UB = 1,800
with πA = πB = 0.5 in figure 2? What are the values of U*, UA and UB for a risk neutral
agent in Figure 1? Will he prefer the certain income R* for πA = 0.6? And for πA = 0.4?
Yes: being indifferent (0.5 • 200 + 0.5 • 1,800 = 1,000 = U*), he accepts;
U* = 1,000, UA = 500 and UB = 1,500;
Yes (0.6 • 500 + 0.4 • 1,500 = 900 < 1,000);
No (0.4 • 500 + 0.6 • 1,500 = 1,100 > 1,000).
Problem 3: How much would the agent in figure 1 pay to have a certain income R* =
€ 50,000, instead of RA = € 25,000 and RB = € 75,000 with π1 = π2 = 0.5? How much
should instead the agent in figure 2 be paid to have the certain income R*?
In both cases it is the segment B*BC.
Problem 4: Check those results.
U(29,700)= 997 and U(29,600)= 995
Problem 5*: Check that these statements are equivalent by doing the calculations.
Lottery 1 consists of Y; Lottery 2 of a 2/3 chance of Z and a 1/3 chance of X.
Rearranging statement 1, we find that it implies statement 2.
Problem 6: Would this result hold when individual utility functions are different?
Yes; in fact E U(R) = Σi πi Ui(Ri), which depends on the individual utility functions Ui
only through their probability-weighted sum (with equal probabilities πi = π it reduces
to π Σi Ui(Ri)).
Problem 7 Show that Albert makes the same calculation and reaches the same
conclusion, so that both dictate their confession.
Albert should reason as follows: (1) If Paula confesses (she chooses NC, not to
cooperate with me) and I don’t (NC, C), I get five years, UA(NC, C) = 2; if I confess
too (NC, NC), I get two years, UA(NC, NC) = 3. If Paula is going to confess, I had
better confess too: 3 utiles > 2 utiles. (2) If no one confesses (C, C), I go to jail for six
months, UA(C, C) = 4. But if I confess (C, NC), I get three months, UA(C, NC) = 5. So I
am better off confessing: 5 utiles > 4 utiles. (3) Whatever Paula does, I am better off
confessing (choosing strategy NC).
Figure 1A (payoffs: Paula ; Albert)

                  A: C        A: NC
        P: C      4 ; 4       2 ; 5
        P: NC     5 ; 2       3 ; 3

Figure 1B (payoffs: Paula ; Albert)

                  A: C        A: NC
        P: C      4 ; 4       4 ; 3
        P: NC     3 ; 4       3 ; 3
Problem 8 Show that in Fig. 1B Paula and Albert make the same calculation and
reach the conclusion to cooperate, so that both cooperate.
In this new setting Paula would reason as follows: (1) If Albert confesses (he chooses
NC, not to cooperate with me) and I don’t (C, NC), I get UP(C, NC) = 4; if I confess
too (NC, NC), I get UP(NC, NC) = 3. If Albert is going to confess, I had better not
confess: 4 utiles > 3 utiles. (2) If Albert is going to stay silent and I do not confess
(C, C), I get UP(C, C) = 4, while if I confess (NC, C), I get UP(NC, C) = 3. So also in
this case I am better off not confessing: 4 utiles > 3 utiles. (3) Whatever Albert does, I
am better off not confessing (choosing the cooperative strategy C).
Problem 9* Consider Paula and Albert playing the game in Figure 1A when the
probability of repeating it next time is π = 0.9. Assume both cooperate at first. Does it
pay not to cooperate, if once an agent is betrayed he will always betray? What if the
game is repeated infinitely many times and agents discount future utility at r = 10%?
Hint: Since the probability of playing the game next time is π = 0.9, Paula and Albert
maximise the present value of their utility ui [with π as a discount factor].
The expected utility from betraying is ui(NC) = Σt π^t Ui(NCt, NCt) = Σt 0.9^t • 3,
while from cooperating it is ui(C) = Σt π^t Ui(Ct, Ct) = Σt 0.9^t • 4 (sums over
t = 0, 1, 2, …).
It does not pay to defect if the difference ui(C) - ui(NC), i.e. the discounted future
losses Σt≥1 0.9^t • (4 - 3) = 0.9/(1 - 0.9) = 9, is greater than the immediate gain from
betraying, 5 - 4 = 1. Since 9 > 1, cooperation pays.
If the game is repeated infinitely many times and agents discount future utility at r =
10%, we get a discount factor ρ = 1/(1+r) ≈ 0.9. Notice how the structure of the
problem is the same with π = 0.9, so there is no difference in terms of the present value
of utility.
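A quick numeric check of the hint (a sketch using the payoffs of Figure 1A and the stated π = 0.9):

    # Repeated Prisoner's Dilemma with grim-trigger punishment:
    # compare cooperating forever with defecting once and being punished forever.
    pi = 0.9                      # continuation probability / discount factor

    # Present values over a long horizon (approximates the infinite sums).
    coop   = sum(pi**t * 4 for t in range(500))          # 4 utiles each period
    defect = 5 + sum(pi**t * 3 for t in range(1, 500))   # 5 today, then 3 forever

    print(coop, defect)           # ~40.0 vs ~32.0: cooperation pays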
Problem 10 Represent such a game using a payoff matrix. Show that (a, a) and (b, b)
are Nash equilibria in the game in figure 1C. Is that true for the game in figure 1D?
A possible representation of the English drivers game is provided in figure 1E with
Paula and Albert. If all English drivers switched from driving (L, L) to driving (R, R),
they would be better off, gaining 5 instead of 3. A single English driver (Paula or
Albert) making the switch on his own would be worse off, gaining 0 instead of 3.
Figure 1E (English drivers; payoffs: Paula ; Albert)

                  A: R        A: L
        P: R      5 ; 5       0 ; 0
        P: L      0 ; 0       3 ; 3

Figure 1C (payoffs: Paula ; Albert)

                  A: a        A: b
        P: a      3 ; 5       0 ; 0
        P: b      0 ; 0       5 ; 3

Figure 1D (payoffs: Paula ; Albert)

                  A: a        A: b
        P: a      3 ; 5       0 ; 0
        P: b      0 ; 0       2 ; 1
In the game in figure 1C, (a, a) is a Nash equilibrium: when the standard a (favoured
by Albert) is adopted, Paula would suffer a cost of 3 (and impose a cost of 5 on Albert)
by choosing b, so it is in her interest to adopt a. The same is true when (b, b) is
adopted: Albert would suffer a cost of 3 (and impose a cost of 5 on Paula) by choosing
a, so it is in his interest to adopt b.
The same is true for the game in figure 1D, where (a, a) and (b, b) are both Nash
equilibria. In this case (b, b) is stable against individual action, even though it leads to
a sub-optimal outcome. Paula and Albert would both gain by making the switch to
(a, a) together, gaining 3 or 5 instead of 2 or 1, but each one would be worse off
making the switch on his own, gaining 0 instead of 2 or 1.
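These equilibrium checks can be automated for any 2×2 game; below is a minimal sketch (illustrative, with the figure 1D payoffs hard-coded) that enumerates pure-strategy Nash equilibria by testing unilateral deviations.

    # Enumerate pure-strategy Nash equilibria of a 2x2 game.
    # Payoffs are (Paula, Albert), indexed by (Paula's move, Albert's move);
    # the figure 1D payoffs are used here as an example.
    payoffs = {('a', 'a'): (3, 5), ('a', 'b'): (0, 0),
               ('b', 'a'): (0, 0), ('b', 'b'): (2, 1)}
    moves = ('a', 'b')

    for p in moves:
        for al in moves:
            # (p, al) is a Nash equilibrium if neither player gains by deviating alone.
            p_ok = all(payoffs[(p, al)][0] >= payoffs[(q, al)][0] for q in moves)
            a_ok = all(payoffs[(p, al)][1] >= payoffs[(p, q)][1] for q in moves)
            if p_ok and a_ok:
                print('Nash equilibrium:', (p, al))   # prints (a, a) and (b, b)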
Problem 11* Change all the payoff signs in Fig. 2A and let Paula and Albert play the
new game. Show that it has only a mixed strategy solution and find it. What kind of
game is it?
(payoffs: Paula ; Albert)

                  A: a         A: b
        P: a      5 ; -5      -5 ; 5
        P: b     -5 ; 5        5 ; -5
Hint: Starting from any cell, one player will find it in his interest to change strategy:
e.g. in (a, a) Albert will improve by playing b. So no pure strategy equilibrium exists.
The mixed strategy solution is the following probability mix of pure strategies: a 50%
chance of a and a 50% chance of b, for both Paula and Albert. In fact, if Paula (or
Albert) were to choose one strategy more frequently, Albert (Paula) would no longer
be indifferent between the two alternatives but would strictly prefer the best reply to it.
This is a game of the type “you lose and I win”: in practice, a “zero-sum game”.
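The indifference argument behind the 50/50 mix can be verified directly (a sketch using the matrix above):

    # Verify the mixed-strategy equilibrium of the zero-sum game above:
    # against a 50/50 opponent, each pure strategy yields the same expected payoff.
    payoff_albert = {('a', 'a'): -5, ('a', 'b'): 5,    # indexed (Paula, Albert)
                     ('b', 'a'): 5,  ('b', 'b'): -5}

    p = 0.5                       # Paula's probability of playing a
    ev_a = p * payoff_albert[('a', 'a')] + (1 - p) * payoff_albert[('b', 'a')]
    ev_b = p * payoff_albert[('a', 'b')] + (1 - p) * payoff_albert[('b', 'b')]
    print(ev_a, ev_b)   # both 0.0: Albert is indifferent, so mixing is a best reply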
Problem 12 Represent the game in Fig. 3A in normal form. Show that it has two pure
strategy solutions and find them. Is there a first move advantage?
We can represent the game as follows once we define Albert’s possible strategies.
Albert’s strategies:
1 = “Choose always a”
2 = “Choose always b”
3 = “Choose a if P chooses a and b if P chooses b”
4 = “Choose b if P chooses a and a if P chooses b”

(payoffs: Paula ; Albert)

                  A: 1         A: 2         A: 3         A: 4
        P: a     -5 ; 5       5 ; -5      -5 ; 5       5 ; -5
        P: b      5 ; -5     -5 ; 5      -5 ; 5       5 ; -5
Strategy 3, i.e. “choose a if P chooses a and b if P chooses b”, is the dominant strategy
for Albert: it guarantees him 5 whatever Paula does. Paula is indifferent between
choosing a or b; hence (a, 3) and (b, 3) are the two pure strategy solutions. Any
probability mix of pure strategies (e.g. a 50% chance of a and a 50% chance of b) for
Paula, combined with Albert’s strategy 3, would also be a solution. There is thus no
first-mover advantage here: it is the second mover, who observes and matches, who
wins.
Problem 13 The reader is invited to represent the game in a normal form, as in figure
3C, and to state the strategies 1, 2, 3, 4 available to A.
We can represent the game in fig. 3B in normal form, as in figure 3C, once we define
Albert’s possible strategies as follows.
[Figure 3B: game tree — Paula moves first (a or b); Albert observes her move and
replies (a or b). Payoffs (Paula ; Albert): (a, a) 5 ; 2, (a, b) -3 ; -3, (b, a) 2 ; 5,
(b, b) 4 ; 4.]

Figure 3C (payoffs: Paula ; Albert)

                  A: 1         A: 2         A: 3         A: 4
        P: a     -3 ; -3      5 ; 2       -3 ; -3      5 ; 2
        P: b      4 ; 4       2 ; 5        2 ; 5       4 ; 4
1 = “Choose always b”,
2 = “Choose always a”,
3 = “Choose b if P chooses a and a if P chooses b”,
4 = “Choose a if P chooses a and b if P chooses b”
Problem 14* Assume that there are only two firms with constant marginal costs (C1 =
cQ1 and C2 = cQ2) facing the market demand function Q(P) = Q1+Q2 = a-P. Calculate
the levels of output and profit for both firms under Cournot, Bertrand, Stackelberg,
collusion and “cheating”. Calculate also the minimum level of ρ supporting collusion.
Hint: Setting each firm’s marginal revenue equal to marginal cost gives the equations
of the reaction functions: from MR1 ⇒ a-Q2-2Q1 = c and from MR2 ⇒ a-Q1-2Q2 = c.
Solving the system of the two equations gives the Cournot solution, i.e. point E in
figure 8B. The other equilibria can be calculated analogously from points B and C in
figure 8B (and from firm 2’s reaction curve R2 for the Stackelberg case).
Given the equilibrium quantities, we can calculate the profits in equilibrium (Π*, Πd,
Πp) and hence

    ρ = (Πd - Π*) / (Πd - Πp) = short-run gains from defection / permanent punishment losses
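A numerical sketch of the whole problem (the values a = 10 and c = 2 are assumptions for illustration, not given in the text; the closed-form expressions in the comments follow from the reaction functions above):

    # Problem 14 sketch: linear demand P = a - Q, constant marginal cost c.
    a, c = 10.0, 2.0            # illustrative parameters, not from the text
    m = a - c                   # convenient shorthand

    def profit(q_own, q_other):
        return (a - q_own - q_other - c) * q_own

    # Cournot: solving a - Q2 - 2Q1 = c and a - Q1 - 2Q2 = c gives Qi = m/3.
    q_cournot = m / 3
    pi_c = profit(q_cournot, q_cournot)            # m^2/9 = 7.11... each

    # Bertrand: undercutting drives P to c, so profits are zero.
    pi_bertrand = 0.0

    # Stackelberg: leader commits to Q1 = m/2, follower replies Q2 = m/4.
    pi_leader   = profit(m / 2, m / 4)             # m^2/8  = 8
    pi_follower = profit(m / 4, m / 2)             # m^2/16 = 4

    # Collusion: firms share the monopoly output Q = m/2, i.e. m/4 each.
    pi_star = profit(m / 4, m / 4)                 # m^2/8 = 8 each

    # Cheating: best response to the rival's collusive output m/4.
    q_cheat = (a - c - m / 4) / 2                  # = 3m/8
    pi_d = profit(q_cheat, m / 4)                  # 9m^2/64 = 9

    # Minimum discount factor supporting collusion (Cournot reversion).
    rho_min = (pi_d - pi_star) / (pi_d - pi_c)
    print(rho_min)                                 # = 9/17 = 0.529..., independent of a and c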