3.3.3 Illustration: Infinitely repeated Cournot duopoly.

will begin next period less effective in deterring a deviation this period. Nonetheless,
players can do better than just repeat the Nash equilibrium of the constituent game.
The stage game is set up as follows. There are two firms, denoted Firm 1 and Firm 2. Each firm $i$ chooses its output $q_i$, which is produced at marginal cost $c \equiv 0$. Total inverse demand is $P(Q) = 1 - Q$, where $Q = q_1 + q_2$.
• The Cournot outcome of the stage game: firm $i$'s best-response function is $q_i(q_j) = \frac{1 - q_j}{2}$, and thus the Nash equilibrium is $(q_1^{NE}, q_2^{NE}) = (\frac{1}{3}, \frac{1}{3})$, yielding an equilibrium price $p^{NE} = \frac{1}{3}$ and payoffs $\pi_i^{NE} = \frac{1}{9}$.
• The collusive outcome, obtained when firms form a cartel that maximizes joint profit, is such that $Q^c = \arg\max_Q P(Q)Q$ and $q_i^c = \frac{1}{2}Q^c$. Here, $Q^c = \frac{1}{2}$, $q_1^c = q_2^c = \frac{1}{4}$, the cartel price is $p^c = \frac{1}{2}$, and firms' profits amount to $\pi_i^c = \frac{1}{8}$.
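• If firm $i$ were to deviate unilaterally from the collusive agreement, it would choose the output $q_i^d$ that maximizes its profit given that firm $j$ produces $q_j^c = \frac{1}{4}$, that is,
$$q_i^d = \arg\max_{q_i} \pi_i^d(q_i, q_j^c), \quad \text{where } \pi_i^d(q_i, q_j^c) = (1 - q_i - q_j^c)\,q_i = \left(\tfrac{3}{4} - q_i\right) q_i .$$
The first-order condition gives
$$\left.\frac{\partial \pi_i^d(q_i, q_j^c)}{\partial q_i}\right|_{q_i^d} = 0 \iff \frac{3}{4} - 2q_i^d = 0 \iff q_i^d = \frac{3}{8} > q_i^c .$$
The instantaneous profit from deviating is $\pi_i^d = \left(\frac{3}{8}\right)^2 = \frac{9}{64} > \frac{1}{8}$.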
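The three stage-game benchmarks above can be checked with exact rational arithmetic. The following is only a minimal sketch: the function names `profit` and `best_response` are illustrative and not part of the notes, and the demand and cost assumptions are the ones stated in the setup.

```python
from fractions import Fraction as F

def profit(qi, qj):
    """Stage-game profit of a firm producing qi against qj, with P(Q) = 1 - Q and zero cost."""
    return (1 - qi - qj) * qi

def best_response(qj):
    """Output maximizing (1 - qi - qj) * qi, i.e. qi = (1 - qj) / 2."""
    return (1 - qj) / 2

q_ne = F(1, 3)             # Cournot output: fixed point of the best-response function
q_c = F(1, 4)              # collusive output: half of Q^c = 1/2
q_d = best_response(q_c)   # best unilateral deviation from the collusive output

pi_ne = profit(q_ne, q_ne)
pi_c = profit(q_c, q_c)
pi_d = profit(q_d, q_c)
print(q_d, pi_ne, pi_c, pi_d)  # 3/8 1/9 1/8 9/64
```

Using `Fraction` rather than floats keeps the comparison $\pi_i^d > \pi_i^c > \pi_i^{NE}$ exact.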
Now we can analyze the following questions:
1. For which values of the discount factor δ is perfect collusion sustainable, given
that firms follow trigger strategies?
2. For a given value of δ smaller than 9/17, what output could be sustained by
a trigger strategy?
3. “Carrot-and-stick” strategies (Gibbons, p. 104-106).
1- For which values of the discount factor δ is perfect collusion sustainable,
given that firms follow trigger strategies?
Each firm $i$ follows the trigger strategy
Produce the collusive output $q_i^c$ in period 1.
In period $t > 1$, produce $q_i^c$ if the outcome $(q_1^c, q_2^c)$ was observed in all previous periods. If, in at least one previous period, the outcome was different from $(q_1^c, q_2^c)$, then produce the non-cooperative output $q_i^{NE}$.
We must show that such a strategy profile constitutes an SPNE.
Consider any given period t̃, and suppose that firm j is following the trigger
strategy. We must show that firm i does not want to deviate from the trigger
strategy.
In period t̃, there are two possible categories of subgames:
1. The subgames that start after the outcome $(q_1^c, q_2^c)$ was observed in each of the $\tilde{t} - 1$ previous periods;
2. The subgames that start after at least one outcome was different from $(q_1^c, q_2^c)$.
In the first category of subgames, firm $i$ knows that firm $j$ will produce $q_j^c$. Firm $i$ can either produce $q_i^c$ or deviate, in which case it chooses its best possible deviation $q_i^d$.
By deviating to $q_i^d$, firm $i$ obtains $\pi_i^d$ instead of $\pi_i^c$ in the current period, but this deviation triggers firm $j$ to revert to the non-cooperative output $q_j^{NE}$ in all following periods, to which firm $i$'s best response is $q_i^{NE}$. Therefore, the present
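value (at time $\tilde{t}$) of the flow of profits when firm $i$ deviates is
$$V_{\tilde{t}}^d \equiv \pi_i^d + \delta \sum_{s=\tilde{t}+1}^{\infty} \delta^{\,s-(\tilde{t}+1)} \pi_i^{NE} = \left(\frac{3}{8}\right)^2 + \frac{\delta}{1-\delta}\,\frac{1}{9} .$$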
Conversely, if firm $i$ does not deviate and produces $q_i^c$ in period $\tilde{t}$, it obtains a profit $\pi_i^c = \frac{1}{8}$ in the current period. Next period, firm $i$ will face the same choice between deviating and cooperating. Denote by $V_{\tilde{t}}^c$ the discounted present value of the infinite sequence of payoffs firm $i$ receives when it is optimal for firm $i$ to collude:
$$V_{\tilde{t}}^c \equiv \sum_{s=\tilde{t}}^{\infty} \delta^{\,s-\tilde{t}} \pi_i^c = \frac{1}{1-\delta}\,\frac{1}{8} .$$
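At period $\tilde{t}$, firm $i$ chooses to collude if
$$V_{\tilde{t}}^c \geq V_{\tilde{t}}^d \iff \frac{1}{1-\delta}\,\frac{1}{8} \geq \left(\frac{3}{8}\right)^2 + \frac{\delta}{1-\delta}\,\frac{1}{9} \iff \delta \geq \frac{9}{17} \approx 0.53 .$$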
Hence, the trigger strategy constitutes a Nash equilibrium of the first-category subgames if $\delta \geq \frac{9}{17}$.
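The threshold can be recovered directly from the three stage-game profits: multiplying $V^c \geq V^d$ by $(1 - \delta)$ and rearranging gives $\delta \geq (\pi_i^d - \pi_i^c)/(\pi_i^d - \pi_i^{NE})$. The sketch below checks this with exact fractions; the helper names `value_collude` and `value_deviate` are illustrative, not taken from the notes.

```python
from fractions import Fraction as F

pi_c, pi_d, pi_ne = F(1, 8), F(9, 64), F(1, 9)

def value_collude(delta):
    """Present value of colluding forever: pi_c / (1 - delta)."""
    return pi_c / (1 - delta)

def value_deviate(delta):
    """Present value of deviating once, then earning the Nash profit forever."""
    return pi_d + delta / (1 - delta) * pi_ne

# V^c >= V^d rearranges to delta >= (pi_d - pi_c) / (pi_d - pi_ne)
delta_min = (pi_d - pi_c) / (pi_d - pi_ne)
print(delta_min)  # 9/17
```

At `delta_min` the two present values coincide; for a smaller discount factor such as $\delta = \frac{1}{2}$, deviating is strictly more profitable.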
In the second category of subgames, firm $i$ knows that firm $j$ will produce $q_j^{NE}$ in the current period and in every following one. Firm $i$'s best response is to produce $q_i^{NE}$. Therefore, the trigger strategy forms a Nash equilibrium of the second-category subgames.
It remains to show that firm $i$ chooses to produce $q_i^c$ in period 1. By choosing to collude in the first period, firm $i$ obtains a discounted flow of profits $V_1^c = \frac{1}{1-\delta}\,\frac{1}{8}$. By deviating, it obtains $V_1^d = \left(\frac{3}{8}\right)^2 + \frac{\delta}{1-\delta}\,\frac{1}{9}$. Again, firm $i$ agrees to collude in the first period as long as $\delta \geq \frac{9}{17}$.
2- For a given value of the discount factor $\delta < \frac{9}{17}$, what level of output would be sustainable, given that firms follow trigger strategies?
Denote by $q^*$ the output on which firms agree to cooperate. Because $\delta < \frac{9}{17}$, the collusive output $q_i^c$ is not sustainable, so we know that $q^* > q_i^c$. We look for $q^* \in (q_i^c, q_i^{NE})$ that is sustainable when each firm $i$ follows the trigger strategy
Produce the output $q^*$ in period 1.
In period $t > 1$, produce $q^*$ if the outcome $(q^*, q^*)$ was observed in all previous periods. If, in at least one previous period, the outcome was different from $(q^*, q^*)$, then produce the non-cooperative output $q_i^{NE}$.
When both firms cooperate on $q^*$, each one obtains an instantaneous payoff of $\pi^* = (1 - 2q^*)q^*$. If one of them wants to deviate, it will choose the best possible deviation, $q^d(q^*)$, that is, the one that maximizes its profit given that its competitor is currently producing $q^*$: $q^d(q^*) = \arg\max_q (1 - q^* - q)q$. It follows that $q^d(q^*) = \frac{1}{2}(1 - q^*)$. The current profit of the deviating firm is $\pi_i^d(q^d; q^*) = (1 - q^d - q^*)q^d = \frac{1}{4}(1 - q^*)^2$.
For the trigger strategy to be an equilibrium, given $\delta < \frac{9}{17}$, the present value of cooperating on $q^*$ should be at least as high as the present value of deviating and then reverting to the Nash outcome forever after:
$$V^c = \frac{1}{1-\delta}(1 - 2q^*)q^* \;\geq\; V^d = \frac{1}{4}(1 - q^*)^2 + \frac{\delta}{1-\delta}\,\frac{1}{9} . \tag{3.2}$$
The condition expressed in (3.2) can be written as
$$(q^*)^2\left(\frac{1-\delta}{4} + 2\right) - q^*\left(\frac{1-\delta}{2} + 1\right) + \frac{1-\delta}{4} + \frac{\delta}{9} \leq 0,$$
and is satisfied for any $q^* \in \left[\frac{9-5\delta}{3(9-\delta)}, \frac{1}{3}\right]$. The highest cooperative profit is obtained when firms produce the corresponding lowest cooperative output $q^* = \frac{9-5\delta}{3(9-\delta)}$. For instance, if $\delta = 0.4$, $q^* = \frac{9-2}{3(9-0.4)} \approx 0.27$. This level of output is of course larger than the perfectly collusive output $q_i^c = 0.25$, but smaller than the Nash equilibrium output $q_i^{NE} \approx 0.33$.
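As a quick check of the bound, one can evaluate condition (3.2), multiplied through by $(1 - \delta)$, at the lowest sustainable output. This is a sketch under the notes' assumptions; the names `q_star` and `slack` are illustrative only.

```python
from fractions import Fraction as F

def q_star(delta):
    """Lowest output sustainable by Nash reversion: the lower root of condition (3.2)."""
    return (9 - 5 * delta) / (3 * (9 - delta))

def slack(q, delta):
    """(1 - delta) * (V^c - V^d) from (3.2); nonnegative exactly when q is sustainable."""
    return (1 - 2 * q) * q - (1 - delta) * (1 - q) ** 2 / 4 - delta / 9

delta = F(2, 5)
q = q_star(delta)
print(q)                 # 35/129, roughly 0.27
print(slack(q, delta))   # 0: the constraint binds at the lowest sustainable output
```

With the same $\delta$, the slack is negative at the collusive output $\frac{1}{4}$ and positive at an intermediate output such as $\frac{3}{10}$, in line with the interval above. At $\delta = \frac{9}{17}$ the formula returns exactly $\frac{1}{4}$.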
3- “Carrot-and-Stick” strategy.
Now we consider a strategy other than a trigger strategy, in which a deviation is followed by a punishment phase that lasts a finite number of periods. Basically, firms cooperate on an output $q^*$, allowing them to obtain profits higher than the Cournot profits. The punishment that follows a deviation consists of a level of output $q^p > q^*$, which yields an individual profit $\pi_i^p(q^p, q^p)$, and a number of periods $T$ during which the punishment is carried out.
Cooperation can be sustained with “carrot-and-stick” strategies if these strategies are feasible and credible. A “carrot-and-stick” strategy is feasible if firms actually want to cooperate on the level of output $q^*$, and it is credible if it is indeed optimal for each of them to implement the punishment phase after one of them has deviated.
The cooperative profit is $\pi^* \equiv \pi^*(q^*, q^*) = (1 - 2q^*)q^*$. The best deviation from $q^*$ is $q^d(q^*) = \frac{1}{2}(1 - q^*)$, yielding a profit $\pi^d \equiv \pi^d(q^d(q^*); q^*) = \frac{1}{4}(1 - q^*)^2$.
During the punishment phase, both firms are supposed to produce $q^p$ and should obtain a profit $\pi^p \equiv \pi^p(q^p, q^p) = (1 - 2q^p)q^p$. But each firm always has the option to deviate from the punishment output. If it does so, it chooses the best possible deviation, i.e., $q^{dp} = \arg\max_q (1 - q^p - q)q$. The best deviation from the punishment output is thus $q^{dp}(q^p) = \frac{1}{2}(1 - q^p)$, yielding an instantaneous profit $\pi^{dp} \equiv \pi^{dp}(q^{dp}; q^p) = \frac{1}{4}(1 - q^p)^2$.
Now, consider the following strategy for firm i:
Produce $q^*$ in period 1. In period $t > 1$, produce $q^*$ if $(q^*, q^*)$ or $(q^p, q^p)$ was the outcome observed in period $t - 1$. Otherwise, produce $q^p$.
This two-phase strategy involves a one-period punishment phase in which each firm produces $q^p$. This punishment output is produced in two possible situations: after either firm has deviated from the cooperative output $q^*$, and after either firm has deviated from the punishment output during the punishment period.
If both firms adopt this “carrot-and-stick” strategy, the subgames in the infinitely
repeated game can be grouped in two categories:
1. The cooperative subgames, in which the outcome of the previous period was either $(q^*, q^*)$ or $(q^p, q^p)$;
2. The punishment subgames, in which the outcome of the previous period was neither $(q^*, q^*)$ nor $(q^p, q^p)$.
To be an SPNE, the strategy profile (consisting of both firms adopting the “carrot-and-stick” strategy) must be a Nash equilibrium in each category of subgames.
(1.) In cooperative subgames, the strategy profile is a Nash equilibrium if neither firm wants to deviate unilaterally, that is, if
$$\frac{1}{1-\delta}\pi^* \geq \pi^d + \delta\left(\pi^p + \frac{\delta}{1-\delta}\pi^*\right) \iff \delta(\pi^* - \pi^p) \geq \pi^d - \pi^* . \tag{3.3}$$
Inequality (3.3) can be read as follows: the gain from deviating in the current period (the difference on the RHS) must not exceed the discounted value of the loss due to the next period's punishment phase (the difference on the LHS).
(2.) In punishment subgames, each firm must prefer to administer the punishment rather than deviate (that is, punishment must be credible). This condition can be written as
$$\pi^p + \frac{\delta}{1-\delta}\pi^* \geq \pi^{dp} + \delta\left(\pi^p + \frac{\delta}{1-\delta}\pi^*\right) \iff \delta(\pi^* - \pi^p) \geq \pi^{dp} - \pi^p . \tag{3.4}$$
Inequality (3.4) means that the gain from deviating from the punishment this period
(the RHS) must not exceed the discounted loss of another period of punishment (the
LHS).
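Conditions (3.3) and (3.4) are easy to test for any candidate triple of discount factor, cooperative output and punishment output. The sketch below does so with exact fractions; the function names and the example values ($\delta = \frac{1}{2}$, cooperative output $\frac{1}{4}$, punishment outputs $\frac{2}{5}$ and $\frac{1}{3}$) are illustrative assumptions chosen only to exercise the two inequalities.

```python
from fractions import Fraction as F

def pi_sym(q):
    """Per-firm profit when both firms produce q."""
    return (1 - 2 * q) * q

def pi_best_dev(q):
    """Profit from the best one-shot deviation when the rival produces q."""
    return (1 - q) ** 2 / 4

def is_spne(delta, q_coop, q_pun):
    """Check (3.3) and (3.4) for the one-period carrot-and-stick profile."""
    loss = delta * (pi_sym(q_coop) - pi_sym(q_pun))                 # common LHS
    no_dev_coop = loss >= pi_best_dev(q_coop) - pi_sym(q_coop)      # (3.3)
    no_dev_punish = loss >= pi_best_dev(q_pun) - pi_sym(q_pun)      # (3.4)
    return no_dev_coop and no_dev_punish

print(is_spne(F(1, 2), F(1, 4), F(2, 5)))  # True
print(is_spne(F(1, 2), F(1, 4), F(1, 3)))  # False
```

The second call shows that, for these values, a single period of Cournot output is too mild a punishment: (3.3) fails because the discounted loss is smaller than the one-shot gain from deviating.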
We need both inequalities (3.3) and (3.4) to hold simultaneously, which implies that we must solve the following system
$$\begin{cases} \delta\left[(1 - 2q^*)q^* - (1 - 2q^p)q^p\right] \geq \frac{1}{4}(1 - q^*)^2 - (1 - 2q^*)q^* \\ \delta\left[(1 - 2q^*)q^* - (1 - 2q^p)q^p\right] \geq \frac{1}{4}(1 - q^p)^2 - (1 - 2q^p)q^p \end{cases} \tag{3.5}$$
Note that there are for the moment three unknowns in this system: $\delta$, $q^*$ and $q^p$. To keep things simple, suppose $\delta = \frac{1}{2}$ and that firms wish to cooperate on the perfectly collusive output, $q^* = \frac{1}{4}$. We can then solve for the punishment output $q^p$. With these values, the system (3.5) becomes
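$$\begin{cases} \frac{1}{2}\left[\frac{1}{8} - (1 - 2q^p)q^p\right] \geq \left(\frac{3}{8}\right)^2 - \frac{1}{8} \\ \frac{1}{2}\left[\frac{1}{8} - (1 - 2q^p)q^p\right] \geq \frac{1}{4}(1 - q^p)^2 - (1 - 2q^p)q^p \end{cases}$$
$$\iff \begin{cases} 2(q^p)^2 - q^p + \frac{3}{32} \geq 0 \\ \frac{5}{2}(q^p)^2 - 2q^p + \frac{3}{8} \leq 0 \end{cases}$$
$$\iff \begin{cases} q^p \in \left[0, \frac{1}{8}\right] \cup \left[\frac{3}{8}, 1\right] \\ q^p \in \left[\frac{3}{10}, \frac{1}{2}\right]. \end{cases}$$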
Finally, the system (3.5) is satisfied for $q^p \in \left[\frac{3}{8}, \frac{1}{2}\right]$. The two-phase (or “carrot-and-stick”) strategy, allowing firms to cooperate on the collusive output $q^* = \frac{1}{4}$ and including a one-period punishment in case one of them deviates, constitutes a subgame perfect Nash equilibrium as long as the punishment output $q^p$ is in the interval $\left[\frac{3}{8}, \frac{1}{2}\right]$. Note that the smallest admissible punishment output, $q^p = \frac{3}{8}$, is harsher than a reversion to the Nash equilibrium of the stage game, $q^{NE} = \frac{1}{3}$, as $\pi^p = \frac{3}{32} < \pi^{NE} = \frac{1}{9}$.
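The endpoints of the admissible interval are the roots of the two quadratics in $q^p$ obtained from (3.5) with $\delta = \frac{1}{2}$ and $q^* = \frac{1}{4}$. A minimal numerical sketch follows; the helper `roots` is illustrative and assumes a positive leading coefficient and a nonnegative discriminant.

```python
import math

def roots(a, b, c):
    """Real roots of a*x**2 + b*x + c, smallest first (assumes a > 0, discriminant >= 0)."""
    s = math.sqrt(b * b - 4 * a * c)
    return (-b - s) / (2 * a), (-b + s) / (2 * a)

# (3.3): 2 q^2 - q + 3/32 >= 0 holds outside the roots
lo1, hi1 = roots(2, -1, 3 / 32)
# (3.4): (5/2) q^2 - 2 q + 3/8 <= 0 holds between the roots
lo2, hi2 = roots(5 / 2, -2, 3 / 8)

# The branch [0, lo1] lies below lo2 and drops out, leaving [max(hi1, lo2), hi2]
interval = (max(hi1, lo2), hi2)
pi_p = (1 - 2 * interval[0]) * interval[0]   # punishment profit at the lower endpoint
print(interval, pi_p, pi_p < 1 / 9)          # (0.375, 0.5) 0.09375 True
```

The lower endpoint comes from (3.3), the condition that deters deviations from cooperation, and the upper endpoint from (3.4), the condition that makes the punishment itself credible.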