Learning and changing:
institutions and the dynamics of cooperation
Roberto Galbiati∗
Emeric Henry†
Nicolas Jacquemet‡
January 30, 2017
Abstract
The level of cooperation in groups is affected both by formal institutions that govern individual interactions and by the prevalent values attached to cooperation in the group. Using a lab experiment guided by a theoretical model, we show that the two channels dynamically interact. First, we show that being exposed to enforcement in the recent past has a strong positive effect on future cooperation. This is mostly due to an indirect behavioral spillover effect: facing penalties in the past increases past cooperation, which in turn positively affects current behavior. However, for interactions that occur early on, we find a negative effect of past enforcement. We show it is due to a learning effect: whether an interaction is played with enforcement affects what can be learned from the partner's actions.
JEL Classification:
Keywords:
1 Introduction
What makes societies or organizations (hereafter groups) cooperative environments? Social sciences have provided two main answers to this question: formal institutions, such as laws or regulations, and societal values, i.e. group members' taste for cooperation. While these two fundamental drivers of behavior are often considered separately, this paper studies their dynamic interaction. We focus on two mechanisms. First, past institutions might affect individual values in the future via some form of spillover. Second, the institutional setting may affect how individuals learn about prevalent values in the society. We rely on the combination of a lab experiment and a theoretical model that highlights both the channels behind spillover effects of past institutions and the dynamics of learning about the group's values.
∗ Sciences Po and CEPR.
† Sciences Po and CEPR.
‡ Paris School of Economics and University Paris 1 Panthéon-Sorbonne. MSE, 106 Bd de l'hopital, 75013 Paris, France. [email protected]
In order to clarify how spillovers and learning might work, consider the following two situations. The first is a setting of a well-established society or organization where societal values are commonly known. In this situation past institutions can affect values only through spillovers. They can directly shift individual preferences (direct spillovers) or affect values indirectly via people's responses to others' behavior (indirect spillovers), i.e. past institutions affect cooperation in the past, which in turn impacts current values. The second setting is a situation in which a new organization is formed (for instance a new firm, a new class in a school, or a new city). In this situation group members do not necessarily know each other and have to decide how much effort to exert in cooperative activities. We show how past institutions determine the speed at which an individual learns about other group members' values. If for instance a very strict sanction is imposed on non-cooperation, everyone cooperates and thus individuals cannot learn about others' values. As a consequence, by affecting the learning process, strong institutions in the past can negatively affect cooperation today once these sanctions are removed.
Our research design relies on a laboratory experiment that allows us to disentangle the spillovers of past institutions from the direct effect of current ones, and to study how the institutional setting affects learning dynamics. Participants play a series of indefinitely repeated prisoner's dilemmas that, in their baseline version, build on Dal Bó and Fréchette (2011). At the beginning of each game, it is randomly determined whether a formal institution, in the form of a penalty imposed in all rounds of the game when a participant chooses to deviate rather than cooperate, will be in place. At the end of the game, each participant is re-matched with a new partner and a new institution is drawn. This setup allows for a clean identification of the mechanisms underlying the effect of past institutions on cooperation. Each participant has a different history of institutional exposure and of past partner behavior, one that does not depend on self-selection into particular institutional environments and is independent of the current environment faced by each individual.
Our results show that these rational learning dynamics are present and consistent with the model's predictions. After having documented the learning dynamics, our experimental design allows us to uncover how the institutional environment affects individual cooperation in the long run. The random allocation of penalties in each match is such that we can compare how people respond to different sequences of institutional environments in later games, where the learning of others' types has converged. The idea is to understand whether formal institutions in the past display some form of persistence in affecting the future cooperative attitudes of those who experienced them. Our results show that institutional spillovers are present and strong. Past institutions affect one's propensity to cooperate mainly through their effect on the cooperative behavior of one's past partners in the game. Moreover, the effect of past institutions is stronger when they display some form of stability in the past. Throughout the paper, we read these results through the lens of a model that allows us to make predictions about how current institutions affect behavior and learning dynamics.
To summarize our findings, we discover that past legal enforcement can have a negative effect in the short run, because of learning effects, but a positive effect in the long run when formal institutions have been stable in the past.
Our paper is linked to different strands of literature.1 First, our paper is related to field studies showing that formal institutions in the distant past may affect values and preferences in the present. Guiso, Sapienza, and Zingales (2016) and Tabellini (2008) document that in cities and regions that were more exposed to institutions favoring cooperation, stronger cooperative values and beliefs are observed today. Lowes, Nunn, Robinson, and Weigel (2016) use historical variation in Africa and find, on the contrary, a negative relation between strong institutions in the past and intrinsic motivation to follow rules nowadays. ?, using a shorter time span, show that the level of corruption in the country of origin has an impact on parking violations by UN diplomats in New York, where they were not subject to fines. In all these studies, however, the mechanism through which past institutions affect current behavior is not explored, and exploring it is the main purpose of the current paper.
Some recent experimental literature examines in the lab questions similar to those of the field studies mentioned above: what is the effect of early exposure to strong formal institutions on individuals' subsequent propensity to play cooperatively? Peysakhovich and Rand (2016) build an experiment where each treatment is organized in two phases: in the first, participants play a series of infinitely repeated prisoner's dilemmas, while in the second they play one-shot games such as the dictator game. They show that in treatments where cooperation is supported in equilibrium in the games of the first phase, participants are more cooperative in the one-shot games that follow. In a related paper, Cassar, d'Adda, and Grosjean (2014) study an experiment where subjects play a market game under different institutional treatments, which generate different incentives to behave honestly, preceded and followed by a non-contractible and non-enforceable trust game, and show a significant increase in individual trust and trustworthiness following exposure to better institutions. While these studies show that the history of exposure to strong or weak institutions can affect future behavior, their identification is based on comparisons between treatments; they therefore cannot examine the issue of persistence, explore mechanisms underlying these effects such as learning dynamics, or compare the strength of effects between current and past enforcement.2
Direct spillovers of current institutions have been documented in other papers looking at how the institutional setting anchors individuals' propensity to cooperate beyond its deterrent effect (Galbiati and Vertova, 2008, 2014). Indirect spillovers can be seen as a persistent form of conditional cooperation (Fischbacher and Gächter, 2010): observing others cooperate in the past influences individuals' cooperative behavior.
1 It is worth noting that our results on current institutions are consistent with papers studying how formal rules and sanctions affect individual behavior (Becker, 1968; Drago, Galbiati, and Vertova, 2009), and complement this literature by highlighting the existence of indirect effects on individual values and beliefs in repeated interactions.
2 Tabellini (2008) also studies the interaction between institutions and values, but focuses on a different perspective, where legal enforcement affects the intergenerational transmission of cooperative values.
Table 1: Stage-game payoff matrices

(a) Baseline game

        C         D
C    40 ; 40   12 ; 60
D    60 ; 12   35 ; 35

(b) With penalty

        C           D
C    40 ; 40     12 ; 60-F
D    60-F ; 12   35-F ; 35-F
Finally, the idea that formal rules can also convey information about the distribution of preferences or values is present in Sliwka (2007); van der Weele (2009); Drago, Galbiati, and Vertova (2009). These papers have some built-in social complementarity, through some individuals having preferences to match the prevalent actions in a society. A difference between our approach and this literature lies in how we model the individual learning process about prevalent values. The existing literature focuses on formal institutions as specific carriers of information about social values; in our study we abstract from this aspect and focus on how enforcing institutions, operating as a veil, can hide social values.
2 A theoretical model of cooperation dynamics
We design an experiment to study how past and current institutions affect cooperation. Subjects in the experiment play infinitely repeated games implemented through a random continuation rule. At the end of each interaction (hereafter "round"), the computer randomly determines whether or not another round is to be played in the current repeated game ("match"). The probability of continuation is fixed at δ = 0.75 and is independent of any choices players make during the game. Participants therefore play a series of games of random length, with expected length of 4 rounds (= 1/(1 − δ)). Players in a given repeated game are matched using a quasi-stranger design: at the beginning of each match, players are randomly and anonymously assigned to a partner.3
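As an illustration, the random continuation rule can be simulated directly. The sketch below (plain Python, arbitrary seed) checks that match lengths follow a geometric distribution with mean 1/(1 − δ) = 4 when δ = 0.75.

```python
import random

def match_length(delta, rng):
    """Length of one match: the first round is always played, and after each
    round another round follows with probability delta."""
    length = 1
    while rng.random() < delta:
        length += 1
    return length

rng = random.Random(42)
lengths = [match_length(0.75, rng) for _ in range(200_000)]
# Expected length is 1/(1 - 0.75) = 4 rounds.
print(sum(lengths) / len(lengths))
```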
The stage-game in all interactions is the Prisoner’s dilemma displayed in Table 1. Institutions
are randomly varied in each match: at the beginning of each match, the computer randomly
determines whether the match is played with a penalty or without–the two events occur with
equal probability. The result from this draw applies to both players of the current match, and to
all its rounds. If the game is played with a penalty, a player who chooses to deviate in the current
round pays a fine equal to F = 10. The resulting stage-game payoff matrix is, however, isomorphic to the Dal Bó and Fréchette (2011) {δ = 3/4; R = 40} treatment, in which cooperation is a subgame-perfect and risk-dominant action.
3 The experiment terminates once the match being played at the 15th minute ends.
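A quick check, using the payoffs of Table 1 and abstracting from the values β introduced below, illustrates why the penalty matters: against a Grim Trigger partner, a one-shot deviation is profitable in the baseline game but not under the penalty. The sketch is illustrative only; the payoff names are ours.

```python
DELTA = 0.75                                  # continuation probability
PI_CC, PI_DC, PI_CD, PI_DD = 40, 60, 12, 35   # stage payoffs from Table 1

def grim_payoffs(F=0):
    """Expected payoffs against a Grim Trigger partner: cooperating forever
    vs. defecting immediately, when deviators pay a fine F in every round
    in which they defect."""
    cooperate = PI_CC / (1 - DELTA)                            # 40 per round forever
    deviate = (PI_DC - F) + DELTA * (PI_DD - F) / (1 - DELTA)  # one-shot gain, then mutual defection
    return cooperate, deviate

print(grim_payoffs(F=0))   # (160.0, 165.0): deviating pays in the baseline
print(grim_payoffs(F=10))  # (160.0, 125.0): cooperation sustainable with the penalty
```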
2.1 Setup of the model
We build a model to understand how past and present institutions affect cooperative behavior in the experiment. More generally, the theory applies to any group whose members interact in pairs and where the institutions governing the group may vary from one interaction to the next. In each interaction, the players simultaneously choose between actions C and D. In the case where an interaction is a repeated prisoner's dilemma, as in the experiment, this requires the first-period action to fully summarize strategies, as we explain below. The payoff of player i from
playing a_it ∈ {C, D} in period t is given by:

U_it^C(F_it, p_it) = V_it^C(F_it, p_it) + β_it
U_it^D(F_it, p_it) = V_it^D(F_it, p_it)

where V_it^a(F_it, p_it) is the material payoff player i expects if she chooses action a in period t. This
expected payoff depends in particular on the beliefs player i holds on the probability that the
partner cooperates, pit , and of course on whether the current interaction is played with a fine,
Fit . Note that pit is in fact a function of Fit , since the presence of a fine affects the probability
that the partner cooperates.4
The parameter β_it stands for individual personal values at time t: it measures the individual propensity to cooperate in each period. Each individual has a baseline propensity to cooperate, which we denote β_i. In addition, past experience, through both past fines and the past behavior of partners, can affect values. We consider the general formulation:
β_it = β_i + φ_F 1{F_it−1 = 1} + φ_C 1{a_jt−1 = C}    (1)
The parameter φ_F represents the direct spillover that increases the value attached to cooperation in the current interaction if the previous one was played with a fine. φ_C represents the indirect spillover that increases the value attached to cooperation if, in the previous interaction, the partner cooperated. This model can easily be extended to allow for longer histories to impact values.5
In addition we suppose there is uncertainty about the society's values, i.e. the set of baseline individual values β_i. With probability q the state is high and β_i is drawn from the normal distribution Φ(µH, σ²), while with probability 1 − q it is drawn from Φ(µL, σ²), with µL < µH. In the high state, society values cooperation more.
To summarize, we consider a general setting where past institutions influence the current
4 We drop from now on in the notation the fact that p_it depends on F_it.
5 The effect of past institutions on values could naturally be extended to:

β_it = β_i + Σ_{τ=1..T} φ_Fτ 1{F_it−τ = 1} + Σ_{τ=1..T} φ_Cτ 1{a_jt−τ = C}

with φ_Fτ and φ_Cτ decreasing in τ, in other words the more recent history having more impact.
decision to cooperate in two ways. First, past institutions change values directly through the parameter φ_F and indirectly through φ_C. Second, formal rules also impact how fast individuals can learn about the society's values, as we explain below. To clarify the arguments we add the different channels gradually.
2.2 Benchmark model
We first consider a benchmark model with no uncertainty on values (q = 1) and no spillovers
(φF = φC = 0).
We now use the specific payoffs corresponding to the prisoner's dilemma in order to explicitly describe the impact of fines on payoffs. In order for the decisions in the repeated game to be summarized by the first-period actions, i.e. to be equivalent to a simultaneous game, we constrain the players to choose among the strategies Always Defect (AD), Tit for Tat (TT) and Grim Trigger (GT) (see section xxx for more details). We denote by C the strategies that imply cooperation in the first round (TT and GT), and by D the strategy that implies defection in the first round (AD). Payoffs in the prisoner's dilemma are denoted π_ai,aj, with ai, aj ∈ {C, D}.
Individual i, with beliefs p_it that his partner will cooperate, will choose action C if and only if the following condition is satisfied:

p_it π_C,C / (1 − δ) + (1 − p_it) [π_C,D + (π_D,D − F 1{F_it=1}) δ/(1 − δ)] + β_i
   ≥ p_it [π_D,C − F 1{F_it=1} + (π_D,D − F 1{F_it=1}) δ/(1 − δ)] + (1 − p_it) (π_D,D − F 1{F_it=1}) / (1 − δ)

where the left-hand side collects the expected payoffs of playing C against C and against D (plus the value β_i), and the right-hand side the expected payoffs of playing D against C and against D.
This condition can be re-expressed as

β_i ≥ β*(F_it) ≡ Π1 − F 1{F_it=1} + p_it [Π2 − δ/(1 − δ) (F 1{F_it=1} + Π3)]    (2)

with the parameters defined as Π1 ≡ π_D,D − π_C,D > 0, Π2 ≡ (π_D,C − π_D,D) − (π_C,C − π_C,D) and Π3 ≡ π_C,C − π_D,D > 0.
Condition (2) implies that the decision to cooperate follows a cutoff rule: individual i cooperates if and only if she attaches a sufficiently strong value to cooperation, β_i ≥ β*(F_it), where the cutoff β* depends on whether the current interaction is played with a fine. Since there is no uncertainty, and thus no learning, all players share the same belief over the probability that the partner cooperates, given by p_it(F_it) = P[β_j ≥ β*(F_it)] = 1 − Φ_H[β*(F_it)].
The cutoff value β*(F_it) is thus defined by the indifference condition:

β*(F_it) = Π1 − F 1{F_it=1} + (1 − Φ_H[β*(F_it)]) [Π2 − δ/(1 − δ) (F 1{F_it=1} + Π3)]    (3)
We show in Proposition 1 below that there always exists at least one equilibrium, and this
equilibrium is of the cutoff form. There could in fact exist multiple equilibria, but all stable
equilibria share the intuitive property that individuals are more likely to cooperate under a fine.
Proposition 1 In an environment with no uncertainty on values (q = 1) and no spillovers (φ_F = φ_C = 0), there exists at least one equilibrium. Furthermore all equilibria are of the cutoff form, i.e. individuals cooperate if and only if β_i ≥ β*(F_it), and in all stable equilibria, β* decreases with F and with µH.
Proof. See appendix 1.
We illustrate the possibility of multiple equilibria in Figure 1. The benefit of cooperation is increasing in the probability that the partner cooperates. There exist equilibria where cooperation is prevalent, which indeed makes cooperation individually attractive. On the contrary, there are equilibria with low levels of cooperation, which makes cooperation unattractive. These equilibria can be thought of as different norms of cooperativeness in the group. Proposition 1 also shows that in all stable equilibria, β* decreases with the average values of the group, measured by µH. A random individual, even with the same β_i, will cooperate more if he is in a cooperative group.
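To illustrate the fixed point defined by condition (3), the sketch below iterates on (3) using the payoff parameters implied by Table 1 (Π1 = 23, Π2 = −3, Π3 = 5, and δ/(1 − δ) = 3) and an illustrative value distribution Φ_H: µH = 0 and σ = 20 are our assumptions, not the paper's estimates. The iteration converges to a stable cutoff, which is lower when the interaction is played with the fine.

```python
from statistics import NormalDist

# Payoff parameters implied by Table 1; delta = 0.75 so delta/(1-delta) = 3.
PI1, PI2, PI3, DELTA_RATIO, F = 23.0, -3.0, 5.0, 3.0, 10.0
# Illustrative (assumed) distribution of baseline values beta_i in the high state.
PHI_H = NormalDist(mu=0.0, sigma=20.0)

def rhs(beta, fine_on):
    """Right-hand side of the indifference condition (3)."""
    fine = F if fine_on else 0.0
    p = 1.0 - PHI_H.cdf(beta)  # equilibrium probability the partner cooperates
    return PI1 - fine + p * (PI2 - DELTA_RATIO * (fine + PI3))

def solve_cutoff(fine_on, beta=0.0):
    """Fixed-point iteration on (3); a contraction here, so it converges
    to a stable cutoff equilibrium."""
    for _ in range(1000):
        beta = rhs(beta, fine_on)
    return beta

b_nofine, b_fine = solve_cutoff(False), solve_cutoff(True)
print(b_nofine, b_fine)  # the cutoff is lower with a fine: more cooperation
```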
2.3 Introducing spillovers
We now add to the benchmark model the possibility of spillovers, i.e. we assume φ_F > 0 and φ_C > 0. The indifference condition (2) remains unchanged, but now β_it is no longer constant and equal to β_i, since past shocks can affect values. We have that individual i cooperates at t if and only if:

β_it ≥ Π1 − F 1{F_it=1} + p_it [Π2 − δ/(1 − δ) (F 1{F_it=1} + Π3)]
The cutoff value is defined in the same way as before:

β*_t(F_it) = Π1 − F 1{F_it=1} + p*_t(F_it) [Π2 − δ/(1 − δ) (F 1{F_it=1} + Π3)]    (4)
The main difference with the benchmark model is in the value of p*_t(F_it). There is indeed a linkage, through p*_t(F_it), between the cutoff at interaction t, β*_t, and the cutoffs β*_t′ in all the preceding interactions t′ < t. Indeed, when an individual evaluates the probability that her current partner in t will cooperate, she needs to determine how likely it is that he received a direct and an indirect spillover from the previous period. The probability of having a direct spillover is given by P[F_it−1 = 1] = 1/2 and is independent of any equilibrium decision. By contrast, the probability of having an indirect spillover is linked to whether the partner of j in his previous interaction (an individual we denote k) cooperated or not: P[a_kt−1 = C]. This probability in turn depends on the cutoff in t − 1, i.e. β*_{t−1}, which itself depends on whether individual k received indirect spillovers, i.e. on the cutoff in t − 2. We see that, overall, the cutoffs in t depend on the entire sequence of past cutoffs.
We therefore focus on stationary equilibria, such that β* is independent of t. We show in Proposition 2 that such equilibria do exist.
Proposition 2 In an environment with spillovers (φ_F > 0 and φ_C > 0) and no uncertainty on values, there exists a stationary equilibrium. Furthermore all equilibria are of the cutoff form, i.e. individuals cooperate if and only if β_it ≥ β*(F_it).
Proof. See appendix 1.
Proposition 2 proves the existence of an equilibrium and characterizes the shape of the cutoffs. It allows us to express the probability that a random individual cooperates as:

1 − Φ_H[Λ1 − φ_F 1{F_it−1=1} − φ_C 1{a_jt−1=C} − Λ2 1{F_it=1}]    (5)

where:

Λ1 ≡ β*(0) = Π1 + p*(0) [Π2 − δ/(1 − δ) Π3]
Λ2 ≡ β*(0) − β*(1) = F + (p*(0) − p*(1)) [Π2 − δ/(1 − δ) Π3] + δ/(1 − δ) p*(1) F
This is the first relation we will test in the data to examine the existence and size of spillovers, direct and indirect (φ_F and φ_C), as well as the effect of current enforcement, Λ2. Another implication of the model is that the marginal effect on the probability to cooperate of having experienced a spillover (direct or indirect) is smaller when there is a fine in the current match than when there is not.
The model also has implications for the persistence of spillovers. The results derived below are not to be tested directly in the data but allow us to better understand the dynamics of cooperation in this model.
Corollary 1 In an environment with spillovers (φF > 0 and φC > 0) and no uncertainty on
values, if group members play a stationary equilibrium, then:
1. If in two otherwise identical groups, a larger share of interactions at period t = 1 are played
with fines in group 1 compared to group 2, then the probability of a randomly picked individual
cooperating in any interaction t ≥ 1 is higher in group 1 than group 2.
2. The equilibrium belief that the partner will cooperate in the absence of fines, p∗ (0), is increasing in F .
We considered an environment where only the previous period's institutions and actions affect current values, and furthermore only in a temporary way (i.e. the baseline value β_i does not shift), as presented in equation (1). Nevertheless, there is a transmission of shocks through time because of indirect spillovers. Consider two identical groups playing according to the stable equilibrium above. Suppose more individuals in group 1 randomly happen to experience stronger institutions in their first interaction. Since the groups are otherwise identical, individuals in group 1 will be more likely to cooperate in the first interaction. In the second interaction the institutions are randomly drawn in both groups. However, on average, because group 1 members cooperated more in interaction 1, these members are more likely to experience an indirect spillover φ_C and cooperate in their second interaction. This in turn implies a higher probability of cooperation in interaction 3 and in any subsequent interaction. There is thus a transmission of shocks.
Corollary 1 also shows that, in equilibrium, in an interaction played without a fine, players are more cooperative if the fine is higher in the interactions that do use one. The intuition is that higher fines increase the probability that individuals cooperate when fines are used, and thus, even in interactions without them, make it more likely that the partner experienced a spillover from his previous interaction.
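The transmission mechanism behind Corollary 1 can be mimicked in a toy simulation: two otherwise identical groups differ only in the share of first-interaction fines, and the group with more early fines keeps cooperating more in later interactions through the spillovers. All parameter values below (cutoffs, spillover sizes, distribution of β_i) are illustrative assumptions, not the paper's estimates.

```python
import random

def simulate(first_period_fine_prob, periods=4, n=20000, seed=1):
    """Toy dynamics: agents are randomly paired each period; agent i cooperates
    iff beta_i + phi_f*1{fine at t-1} + phi_c*1{partner cooperated at t-1}
    >= beta_star(current fine). All parameter values are illustrative."""
    rng = random.Random(seed)
    beta_star = {0: 8.0, 1: -2.0}          # cooperating is easier under a fine
    phi_f, phi_c = 2.0, 4.0                # direct and indirect spillovers
    beta = [rng.gauss(0, 10) for _ in range(n)]
    prev_fine = [0] * n                    # each agent's last-period history
    prev_partner_c = [0] * n
    rates = []
    for t in range(periods):
        order = list(range(n))
        rng.shuffle(order)
        fine_prob = first_period_fine_prob if t == 0 else 0.5
        coop = [0] * n
        fines = [0] * n
        for a, b in zip(order[::2], order[1::2]):
            f = 1 if rng.random() < fine_prob else 0
            fines[a] = fines[b] = f
            for i in (a, b):
                value = beta[i] + phi_f * prev_fine[i] + phi_c * prev_partner_c[i]
                coop[i] = 1 if value >= beta_star[f] else 0
        for a, b in zip(order[::2], order[1::2]):
            prev_fine[a] = prev_fine[b] = fines[a]
            prev_partner_c[a], prev_partner_c[b] = coop[b], coop[a]
        rates.append(sum(coop) / n)
    return rates

# Same seed: both groups share the value draws and all fine draws after t = 0.
group1 = simulate(first_period_fine_prob=0.9)  # many first-interaction fines
group2 = simulate(first_period_fine_prob=0.1)  # few first-interaction fines
print(group1, group2)  # group 1 starts higher and the gap persists via spillovers
```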
2.4 Introducing learning
We now consider the more general formulation with uncertainty on the group’s values. We denote
qit the belief held by player i at interaction t that the state is H. All group members initially
share the same beliefs qi0 = q. They gradually learn about the group’s values and we show how
penalties impact learning.
Consider the first interaction t = 1. At the beginning of this interaction, all members share the
same belief q. Furthermore, no member has yet obtained spillovers from the past. The equilibrium
is defined by a single cutoff value β*(F_i1), as in the benchmark model:

β*(F_i1) = Π1 − F 1{F_i1=1} + p*_1(F_i1) [Π2 − δ/(1 − δ) (F 1{F_i1=1} + Π3)]
The only difference with the benchmark model is that the probability that the partner cooperates
takes into account the uncertainty on the society’s values:
p*_1(F_i1) = q (1 − Φ_H[β*(F_i1)]) + (1 − q) (1 − Φ_L[β*(F_i1)])
We now consider how beliefs about the state of the world are updated following the initial interaction. The update depends on the action of the partner and whether the interaction was played
with or without a fine. The general notation we use is qit (Fit−1 , ajt−1 , qit−1 ). For the update
following the first interaction, we can drop the dependence on qit−1 , since all individuals initially
share the same belief.
Fines affect learning about the state of the world. Naturally, the fact that the partner chose D decreases the belief that the state is H, while the fact that he chose C increases it. However, the update depends on whether the previous interaction was played with a fine or not. If the partner
cooperated in the presence of a fine, it is a less convincing signal that society is cooperative than if he cooperated in the absence of a fine (q_i2(0, C) > q_i2(1, C)). Similarly, deviation in the presence of a fine decreases particularly strongly the belief that the state is high (q_i2(1, D) < q_i2(0, D)). We thus have the relations:

q_i2(0, C) > q_i2(1, C) > q
q_i2(1, D) < q_i2(0, D) < q
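The updating logic can be made concrete with a minimal Bayes computation. The cutoffs and value distributions below are illustrative assumptions; the point is only that the resulting posteriors reproduce the ordering above.

```python
from statistics import NormalDist

# Illustrative (assumed) first-interaction parameters: cutoffs beta*(F) and the
# two candidate value distributions, with mu_H > mu_L.
BETA_STAR = {0: 8.0, 1: -2.0}                            # cooperating is easier under a fine
PHI = {"H": NormalDist(5, 10), "L": NormalDist(-5, 10)}  # high vs. low state

def coop_prob(state, fine):
    """P(partner plays C | state, fine) = 1 - Phi_state(beta*(fine))."""
    return 1.0 - PHI[state].cdf(BETA_STAR[fine])

def update(q, fine, action):
    """Bayes update of the belief that the state is H after observing the
    partner's first-round action in an interaction played with/without a fine."""
    ph = coop_prob("H", fine) if action == "C" else 1.0 - coop_prob("H", fine)
    pl = coop_prob("L", fine) if action == "C" else 1.0 - coop_prob("L", fine)
    return q * ph / (q * ph + (1.0 - q) * pl)

q = 0.5
updates = {(f, a): update(q, f, a) for f in (0, 1) for a in ("C", "D")}
print(updates)
# Cooperation without a fine is the strongest good signal; defection
# under a fine is the strongest bad signal, reproducing the relations above.
```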
We show in Proposition 3 that this updating property is true in general for later interactions.
In these later interactions, spillovers start playing a role. The decision to cooperate depends, as
in the previous section, both on the current institution Fit and on the history (Fit−1 , ajt−1 ), but
also on beliefs qit−1 . The cutoffs are defined by:
β*_t(F_it, q_it) = Π1 − F 1{F_it=1} + p*_t(F_it, q_it) [Π2 − δ/(1 − δ) (F 1{F_it=1} + Π3)]
The belief about how likely it is that the partner cooperates in interaction t, p*_t(F_it, q_it), depends on the probability that the partner experienced spillovers (as in the previous section). Furthermore, the probability that the partner j had an indirect spillover depends on whether his partner k in the previous interaction cooperated, and thus on the beliefs q_kt−1 of that partner in the previous interaction. The general problem requires keeping track of higher-order beliefs.
Proposition 3 In an environment with spillovers and learning, if an equilibrium exists, all equilibria are of the cutoff form, i.e. individuals cooperate if and only if β_it ≥ β*(F_it, q_it). Furthermore, beliefs are updated in the following way, depending on the history observed in the previous interaction:

q_it(0, C, q_it−1) > q_it(1, C, q_it−1) > q_it−1
q_it(1, D, q_it−1) < q_it(0, D, q_it−1) < q_it−1
Proof.
Proposition 3 derives a general property of equilibria. Furthermore, in the appendix we show existence of equilibria under a natural restriction on higher-order beliefs, i.e. if we assume that a player who had belief q_it−1 in interaction t − 1 believes that her partner in that interaction had the same belief, q_jt−1 = q_it−1.
The result of Proposition 3 allows us, for a given belief q_it−1, to express the probability of cooperation as:

1 − Φ_H[Λ1 − φ_F 1{F_it−1=1} − φ_C 1{a_jt−1=C} − Λ2 1{F_it=1} − Σ_{j,k∈{0,1}, l∈{C,D}} Λ^j_{k,l} 1{F_it=j, F_it−1=k, a_jt−1=l}]    (6)
where:

Λ1 = β*(0, 0, D) = Π1 + p*(0, 0, D, q) [Π2 − δ/(1 − δ) Π3]
Λ2 = F − (p*(1, 0, D, q) − p*(0, 0, D, q)) [Π2 − δ/(1 − δ) Π3] + δ/(1 − δ) p*(1, 0, D, q) F > 0
Λ^0_{k,l} = −[p*(0, k, l, q) − p*(0, 0, D, q)] [Π2 − δ/(1 − δ) Π3]
Λ^1_{k,l} = −[p*(1, k, l, q) − p*(1, 0, D, q)] [Π2 − δ/(1 − δ) (F + Π3)]
We note that the parameters Λ1, Λ2 and Λ^j_{k,l} in equation (6) depend on q_it−1. Compared to the case without learning, there are 6 additional parameters, reflecting the updating of beliefs. According to the result in Proposition 3, we have, both in the case where the current interaction is played with a fine and in the case where it is not:

Λ^1_{0,C} > Λ^1_{1,C} > 0 > Λ^1_{1,D}
Λ^0_{0,C} > Λ^0_{1,C} > 0 > Λ^0_{1,D}    (7)
Overall, having fines in the previous interaction can potentially decrease average cooperation in the current interaction. There are two countervailing effects. On the one hand, a fine in the previous interaction increases the direct and indirect spillovers and thus increases cooperation. On the other hand, if the state is low, a fine can accelerate learning if, on average, sufficiently many people deviate in the presence of a fine. This then decreases cooperation in the current interaction.
3 Empirical strategy
Three sessions of the experiment were conducted at the Ecole Polytechnique experimental laboratory. Participants are both students (85% of the experiment pool) and employees of the university (15%). Individual earnings are computed as the sum of all tokens earned during the experiment, with an exchange rate equal to 100 tokens for 1 Euro. Participants earned on average 12.1 Euros from an average of 20 matches, each featuring 3.8 rounds.6 This data delivers 934 game observations, 48% of which are played with no penalty.
6 Repeated games thus are played according to a quasi-stranger design. The size of each session is: 16, 18 and 12 participants.
Figure 1: Sample characteristics: (a) distribution of game lengths; (b) individual strategies. Note. Left-hand side: empirical distribution of game lengths in the experiment, split according to the draw of the penalty. Right-hand side: distribution of repeated-game strategies observed in the experiment. One-shot games are excluded. AD: Always Defect; AC: Always Cooperate; TFT: Tit-For-Tat; GT: Grim Trigger.
Figure 1a displays the empirical distribution
function of game lengths in the sample, split according to the draw of a penalty. With the exception of two-stage repeated games, the distributions are very similar between the two environments. The difference in the share of two-stage games mainly induces a slightly higher share of games longer than 10 rounds played with a penalty. In both environments, one third of the games we observe are one-shot, and half the actually repeated games last between 2 and 5 rounds. A very small fraction of games (less than 5% with a penalty, less than 2% with no penalty) feature lengths of 10 repetitions or more.
3.1 Empirical measures: strategies
Our empirical strategy relies on between-game variation induced by the exogenous change in the institutional environment. While the first-stage decision in a given game is a measure of the effect of the past history of play on individual behavior, the decisions made within the course of a repeated game mix this component with the strategic interaction with the current partner. For games that last more than one period (2/3 of the sample), we thus reduce the observed outcomes to the first-stage decision in each repeated game.
The first-stage decision is a sufficient statistic for the future sequence of play if subjects choose among the following repeated-game strategies: Always Defect (AD), Tit-For-Tat (TFT) or Grim Trigger (GT). While AD induces defection at the first stage, both TFT and GT imply cooperation at the first stage, and are observationally equivalent if the partner chooses within the set restricted to
these three strategies and give rise to the same expected payoff. Figure 1b displays the distribution of strategies we observe in the actually repeated games of the experiment (excluding one-shot games). In many instances, TFT and GT cannot be distinguished: this happens for instance for subjects who always cooperate against a partner who does the same (in which case TFT and GT also include Always Cooperate, AC), or if defection is played forever by both players once it occurs. We thus plot both their joint probability of occurrence and the share that can be classified as being either of the two. Last, we also add the share of Always Cooperate that can be distinguished from other repeated-game strategies, i.e. when AC is played against partners who defect at least once.
All sequences of decisions that do not fall into any of these strategies cannot be classified; this
accounts for 14% of the games played without a penalty, and 24% of those played with a penalty.
The three strategies on which we focus are thus enough to summarize the vast majority of repeated-game
decisions. AD accounts for 70% of the repeated-game observations with no penalty and
41% with a penalty, while TFT and GT account for 14% and 34% of them, respectively. In the remainder,
we restrict the sample to player-game observations for which the first stage decision summarizes
the future history. Our working sample is thus made of 40 subjects and 785 games, of which 50.3%
are played with a fine, with an average duration of 3.3 rounds. Our outcome variable of interest
is the first-stage decision made by each player in each of these repeated games. Lagged variables
are all computed according to actual past experience: one's own cooperation at the previous match,
the partner's decision, and whether the previous match was played with a penalty are all defined
according to the match played just before the current one, whether or not this previous match
belongs to the working sample.
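As an illustration, the strategy classification described above can be sketched in a few lines. The encoding (strings over {'C', 'D'}) and the function below are our own reconstruction for exposition, not the classification code actually used for Figure 1b.

```python
def consistent_strategies(mine, partner):
    """Return the repeated-game strategies consistent with a player's moves.

    `mine` and `partner` are strings over {'C', 'D'} giving the player's own
    moves and the partner's moves, stage by stage.  Candidates: Always Defect
    (AD), Always Cooperate (AC), Tit-For-Tat (TFT) and Grim Trigger (GT).
    """
    n = len(mine)
    ok = set()
    if all(m == 'D' for m in mine):
        ok.add('AD')
    if all(m == 'C' for m in mine):
        ok.add('AC')
    # TFT: cooperate at the first stage, then copy the partner's last move.
    if mine[0] == 'C' and all(mine[t] == partner[t - 1] for t in range(1, n)):
        ok.add('TFT')
    # GT: cooperate until the partner defects once, then defect forever.
    first_d = partner.index('D') if 'D' in partner else n
    grim = 'C' * min(first_d + 1, n) + 'D' * max(n - first_d - 1, 0)
    if mine == grim:
        ok.add('GT')
    return ok  # an empty set means the sequence is unclassified
```

Against a partner who always cooperates, a player who always cooperates is consistent with AC, TFT and GT at once, which is exactly why these strategies can only be told apart when the partner defects at least once.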
3.2 Descriptive statistics
Figure 2 provides an overview of the cooperation rate observed in each of the two environments.
The overall average cooperation rate is 32%, with a strong gap depending on whether a penalty
enforces cooperation: the average cooperation rate jumps from 19% in the baseline to 46% with
a penalty. This is clear evidence of a strong disciplining effect of current enforcement. Figure 2a
documents the time trend of cooperation over matches. The vertical line identifies the point in
time beyond which we no longer observe a balanced panel: the number of matches played within
the 15-minute duration of the experiment is individual specific, since it depends on game lengths. Time
trends beyond this point are strongly driven by the size of the sample. Focusing on the balanced
panel, our experiment replicates in both environments the standard decrease in cooperation rates,
from 15% at the initial first stage in the baseline and 69% with a penalty, to 11% and 41% at the
13th game. The time trends are parallel between the two conditions; note that, by design, the
history of past institutions is both individual specific and random.
Figure 2b organizes the same data at the individual level, based on the cumulative distribution
of cooperation in a given environment. We observe variations in both the intensive and the
Figure 2: The disciplining effect of current enforcement
(a) Game level behavior: average cooperation at first stage over matches, with and without a fine.
(b) Individual level cooperation: cumulative distribution of subjects' overall first-stage cooperation rates, with and without a fine.
Note. Cooperation observed at first stage in the working sample as a function of the current penalty. Left-hand side: observed
evolution over matches (number of repeated games played before). Right-hand side: cumulative distribution of individual
cooperation rates at the first stage of all games played with and without a penalty.
extensive margin of cooperation in the adjustment to the penalty, so that the distribution of
cooperation with a penalty first-order stochastically dominates the one without. First, regarding
the extensive margin, we observe a switch in the probability mass of subjects who always choose
the same first stage response: 45% never cooperate without a penalty, while only 26% do so with
a penalty, and the share of subjects who always cooperate rises from 4% to 17% with the penalty.
More than half the difference in mass at 0 thus moves to 1. Turning to the intensive margin, the
distribution of cooperative decisions with no penalty is more concentrated toward the left: 70%
of individuals who switch between cooperation and defection cooperate less than 30% of the time
with no fine, compared to only 40% of such switchers in the penalty environment.
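The decomposition into extensive and intensive margins used above can be made concrete with a small helper; the data below are made-up numbers, not the experimental rates.

```python
def margins(rates):
    """Split a list of per-subject cooperation rates into margins.

    Returns the share of subjects who never cooperate, the share who always
    cooperate (extensive margin), and the mean rate among 'switchers', i.e.
    subjects strictly between 0 and 1 (intensive margin).
    """
    n = len(rates)
    never = sum(r == 0 for r in rates) / n
    always = sum(r == 1 for r in rates) / n
    switchers = [r for r in rates if 0 < r < 1]
    intensive = sum(switchers) / len(switchers) if switchers else None
    return never, always, intensive

# Hypothetical sample of four subjects:
print(margins([0, 0, 0.5, 1.0]))  # -> (0.5, 0.25, 0.5)
```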
4 Results
Figure 3 graphically presents the main results, which we then examine in detail in the rest of
the section. Matches are separated based on the time at which they occur:
observed matches are classified as "early" up to the 7th, and as "late" after the 13th.7 According to
the theoretical model, both learning and spillovers explain observed cooperation levels in early
games. In late games, by contrast, beliefs about the state of the world should have converged, so
that changes in behavior in line with the environment should be mainly driven by spillovers.
7
These thresholds are chosen in such a way that one third of the observed decisions are classified as “early”, and
one third as “late”. We use matches, rather than rounds, as a measure of time since we focus on games for which
the first stage decision summarizes all future actions within the current repeated game.
Figure 3: The dynamic effects of enforcement
(a) The effect of past enforcement: average cooperation at first stage by current enforcement (No Fine/Fine), split by whether the previous match was played with a fine, in early and late games.
(b) The effect of the history of play: average cooperation at first stage by the partner's decision at the previous match (Defect/Cooperate), split by previous enforcement, in early and late games.
Note. Cooperation at first stage in the working sample according to the draw of a penalty at the previous match. In each figure,
the data are split according to whether the match occurs early (before the 7th match) or late. Left-hand side: each subpanel
refers to current enforcement; right-hand side: each subpanel refers to the partner's decision experienced at the previous match.
Figure 3a presents graphical evidence about the effect of the current and past institutional
environment in early and late games. First, we observe that the current institutional environment
affects cooperation: in the presence of a fine, cooperation levels are higher both in early and late
games. We also find evidence of legal enforcement spillovers in late matches: past enforcement
induces a slight increase in cooperation in the current match. This result is reversed in early
matches, where past enforcement seems to undermine cooperation in
current matches.
Table 2 presents a set of regressions providing more systematic estimates of the effects of
current and past institutions.8 Column 1 reports the effect of the current enforcement. The
presence of a fine punishing those who deviate from cooperation has a strong and
statistically significant effect on the individual propensity to cooperate. This result is in line with
the hypothesis that people respond to the incentives set by formal sanctions, and with empirical
evidence from the field (Becker, 1968; Drago, Galbiati, and Vertova, 2009). Since in our experimental
design the institutional environment is randomly assigned, we can identify the effect of both current
and past institutions. In column 2 we add the effect of the past institutional environment: we
focus on the effect of penalties in match t − 1 on the individual choice to cooperate in the following
match. The effect of the past institutional environment is positive and statistically significant,
suggesting the presence of some institutional spillover. Columns 3 and 4 present the effect of
8
In all regression models, the outcome variable is the binary decision to cooperate at the first stage of the current
match, Cit. The parameters are estimated using Probit models, controlling for the full set of individual-level
controls. Standard errors are clustered at the session level to control for the within-session correlation induced by
the quasi-stranger design.
Table 2

                Enforcement          Spillovers           Early                Late
                (1)       (2)        (3)       (4)        (5)       (6)        (7)       (8)
Fit = 1         1.147***  0.297***   1.156***  0.299***   0.960***  0.221***   1.620***  0.364***
                (0.211)   (0.031)    (0.216)   (0.031)    (0.267)   (0.036)    (0.394)   (0.041)
Fit−1 = 1                            0.088*    0.023**    -0.237    -0.055     0.241***  0.054***
                                     (0.048)   (0.011)    (0.233)   (0.058)    (0.027)   (0.011)
N               752       752        752       752        228       228        325       325
sigma_u         1.107     1.107      1.107     1.107      1.293     1.293      1.410     1.410
rho             0.551     0.551      0.551     0.551      0.626     0.626      0.665     0.665
ll              -303.910  -303.910   -303.682  -303.682   -101.236  -101.236   -126.385  -126.385

Note. Probit models with individual random effects on the decision to cooperate at first stage, estimated on the working
sample. Standard errors (in parentheses) are clustered at the session level. All specifications include control variables for
gender, age, student, penalty rd1, choice rd1, lenght1 game. Significance levels: * 10%, ** 5%, *** 1%.
current enforcement and institutional spillovers in early and late matches, as in Figure
3. The coefficient estimates of the effects of current and past institutions are consistent with the
predictions of the theoretical model. In early matches, where both learning and spillovers are
present, current fines have a positive and significant effect on cooperation, but past enforcement
does not have a statistically significant effect on the individual propensity to cooperate
in following matches. In late games, when beliefs about the state of the world should have
converged, we observe a positive and significant institutional spillover effect. In late games
past institutions do not affect learning, and influence the propensity to cooperate only through
institutional spillovers.
The next step of our analysis aims at unpacking these effects into their different components.
Figure 3b provides some graphical evidence that can be intuitively interpreted in terms of learning. Here we plot cooperation levels separately according to the previous history of play. In
late matches, the cooperation rate among players who faced a cooperative partner at the previous
match is twice its value after facing defection. This suggests a strong behavioral spillover of cooperation.9 Importantly, this effect does not interact with past enforcement in late matches: the
rate of cooperation is the same whether or not cooperation resulted from strong legal enforcement. This is the main difference with early matches. The effect of past cooperation on the
current willingness to cooperate is stronger in early matches if such history occurred in a weak
enforcement environment, while the detrimental effect of defection is stronger if experienced under
strong enforcement. This is consistent with a learning-based dynamic: the amount of information
delivered by observing cooperation (defection) from others is higher when it occurs in the absence
(presence) of external incentives to do so. The rest of this section confirms the intuitions obtained
9
This effect is consistent with the large literature on conditional cooperation; see for instance Fischbacher and Gachter (2010).
from the graphical analysis and shows how this combination of learning and behavioral spillovers
explains the dynamics of cooperation observed in the experiment.
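The informativeness argument lends itself to a stylized Bayesian illustration. All numbers below (the normal value distributions, the cutoff, and the effect of a fine on it) are our own assumptions for exposition, not the paper's calibration.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def posterior_high(prior, fine, action):
    """Posterior belief that the group has high values after one observation.

    Stylized setting: the partner cooperates when his value draw exceeds a
    cutoff; a fine lowers the cutoff from 1 to 0.  High-value groups draw
    values from N(1, 1), low-value groups from N(0, 1).
    """
    cutoff = 0.0 if fine else 1.0
    p_coop = {'H': 1 - Phi(cutoff - 1.0), 'L': 1 - Phi(cutoff)}
    like = {s: p_coop[s] if action == 'C' else 1 - p_coop[s] for s in 'HL'}
    return prior * like['H'] / (prior * like['H'] + (1 - prior) * like['L'])
```

Starting from a 50/50 prior, observed cooperation moves the belief up by more without a fine than with one, and observed defection moves it down by more with a fine than without: exactly the ranking of learning effects suggested by the graphical evidence.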
4.1 The combined effect of learning and institutions: statistical test of the model
In the empirical analysis, we denote Cit = 1{ait = C} ∈ {0, 1} the observed decision to cooperate of
participant i at the first stage of match t in the experiment. The main behavioral insights from
the model are summarized by equation (6),

P(Cit = 1) = 1 − ΦH[ Λ1 − φF 1{Fit−1 = 1} − φC 1{ajt−1 = C} − Λ2 1{Fit = 1}
            − Σ_{j,k ∈ {0,1}, l ∈ {C,D}} Λjk,l 1{Fit = j, Fit−1 = k, ajt−1 = l} ],

which is the estimable form of a Probit model on the individual decision to cooperate. This probability results from equilibria of the cutoff form involving the primitives of the model. Denoting
εit observation-specific unobserved heterogeneity, θ a vector of unknown parameters, Xit a set
of observables describing participant i's experience up to t, and C*it the latent index generating
player i's willingness to cooperate at match t, observed decisions inform about the model parameters according to Cit = 1[C*it = Xit θ + εit > 0]. Our empirical test of the model is thus based on
estimated coefficients, ∂C*/∂X = θ, rather than on marginal effects (∂C/∂X = φ(Xθ) θ).
In the set of covariates, both current (1{Fit = 1}) and past enforcement (1{Fit−1 = 1}) are exogenous
by design. The partner's past decision to cooperate, Cjt−1, is exogenous to Cit as long as players
i and j have no other player in common in their history. Due to the rematching of players from
one match to the other, between-subject correlation might arise if player j met another player
with whom i has already played. We address this concern in two ways: first, we include
the decision to cooperate at the first stage of the first match in the set of control variables, as a
measure of individual unobserved ex ante willingness to cooperate. Second, we take into account
the correlation structure in the error term of the model: we specify a panel data model with random
effects at the individual level, control for the effect of time through the inclusion of the match
number, and cluster the errors at the session level to account in a flexible way for within-session
correlation.
Table 3 reports the estimation results from several specifications, in which each piece of the
model is introduced one after the other. In line with the model, parameters are estimated on
data from early and late stages. The parameters of interest are the two spillover components of
individual values, φF and φC, and the learning parameters Λk,l, k ∈ {0, 1}, l ∈ {C, D}.10 As shown in
section 2.3, these parameters can be identified as the effect of the history in the previous game
10
Note that we do not separately estimate these parameters according to the current enforcement environment,
but rather estimate the weighted averages Λk,l = 1{Fit = 0} Λ0k,l + 1{Fit = 1} Λ1k,l.
Table 3

                 (1)        (2)        (3)        (4)        (5)
Fit = 1          1.448***   1.454***   1.480***   1.473***   1.472***
                 (0.294)    (0.297)    (0.301)    (0.325)    (0.325)
Early            0.285      0.292      0.348      0.460      0.453
                 (0.330)    (0.343)    (0.516)    (0.498)    (0.537)
Early × 1[Fit]   -0.698**   -0.698**   -0.646*    -0.644     -0.643
                 (0.289)    (0.286)    (0.363)    (0.408)    (0.410)
Fit−1 = 1                   0.049      0.306***   0.094      0.085
                            (0.090)    (0.052)    (0.088)    (0.180)
ajt−1 = C                                         0.693***   0.674**
                                                  (0.196)    (0.315)
Early × C0                             1.066***   0.430**    0.448***
                                       (0.290)    (0.195)    (0.065)
Early × C1                             0.233*     -0.228***  -0.230***
                                       (0.130)    (0.088)    (0.073)
Early × D1                             -0.876***  -0.631**   -0.621***
                                       (0.339)    (0.275)    (0.238)
C1                                                           0.029
                                                             (0.354)
N                553        553        553        553        553
σu               1.063      1.064      1.063      1.060      1.060
ρ                0.531      0.531      0.530      0.529      0.529
LL               -234.677   -234.624   -224.416   -220.033   -220.031

Note. Probit models with individual random effects on the decision to cooperate at first stage, estimated on the working
sample. Standard errors (in parentheses) are clustered at the session level. All specifications include control variables for
gender, age, student, penalty rd1, choice rd1, lenght1 game. Significance levels: * 10%, ** 5%, *** 1%.
conditional on the current enforcement (Fit ). Its marginal effect, Λ1 , as well as the intercept λ0 ,
both embed the value of equilibrium beliefs: we thus allow these two parameters to take different
values in early games as compared to late ones.
Columns (1) and (2) show the estimated effect of past and current enforcement when these
parameters are introduced in the specification. While we do not find any significant change
in the intercept when moving from early to late games, the effect of current enforcement on
the current willingness to cooperate is much stronger in early games. Column (3) introduces
learning parameters. As stressed in Section 2.4, the learning parameters show up before beliefs
have converged. They are thus estimated in interaction with an early dummy variable. Once
learning is taken into account, enforcement spillovers turn out significant. More importantly,
the model predicts that learning is stronger when observed decisions are more informative about
the society's values, which in turn depends on the enforcement regime under which behavior has
been observed: cooperation (defection) is more informative under weak (strong) enforcement.
This results in a clear ranking between learning parameters (see equation (7)). We use defection
under weak enforcement as the reference for the estimated learning parameters. The results show
that cooperation under these same circumstances (Early × C0) leads to the strongest increase in
the current willingness to cooperate. Observing this same decision but under strong enforcement
institutions rather than weak ones (Early × C1) has almost the same impact as the reference,
defection under weak institutions: in both cases, behavior is aligned with the incentives
implemented by the rules and barely provides any additional insight about the distribution of
values in the group. Last, defection under strong institutions (Early × D1) is informative about
a low willingness to cooperate in the group, and results in a strongly significant drop in current
cooperation.
Column (4) adds indirect spillovers, induced by the cooperation of the partner in the previous
game. The results confirm that indirect spillovers are stronger than institutional spillovers. Once
both learning and indirect spillovers are taken into account, we even fail to find a significant effect
of past enforcement: most of the apparent effect of past institutions goes through the resulting
change in the behavior of people exposed to these rules. The identification of learning parameters
in this specification is quite demanding, since both past enforcement and past cooperation are
included as dummy variables. We nonetheless observe a statistically significant
effect of learning in early games, with the expected ordering according to how informative the
signal is. Last, column (5) further adds the interaction between the behavior observed from the partner
in the previous game and the enforcement regime, in an attempt to check the reliability of the
assumption that learning has converged in late games. As expected, we no longer observe any
effect of this interaction: in late games, it is cooperation per se, rather than the enforcement
regime giving rise to this decision, that matters for current cooperation.
4.2 The cumulative effect of past institutions and behaviors
In the model that determined the specification estimated in the previous section, it was assumed
that the personal value attached to cooperation, βit, was affected only by the environment and
behaviors in the previous game, as specified in equation (1). It is however conceivable that the
longer history could have an impact on βit. To isolate the spillover effects and abstract
from learning dynamics, we focus our analysis in this section on late matches.
We use the same empirical specification as in the previous section, where the variables in Xit
now include observations from games before game t − 1. There is no theoretical guidance on how
the longer history should affect the current value attached to cooperation. We can focus either
on the institutional history (the sequence of fines in past games) or on the history of cooperation
(the sequence of decisions of partners in previous games). Furthermore, history can be summarized
in many ways: we could for instance imagine that what matters is the total number of penalties
experienced in the past. Alternatively, the number of times the partner cooperated in the first
round of a game could be what is determinant. We explore this in Table 4.
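The alternative summaries of history discussed here are easy to construct as regressors; the function and the feature names below are our own sketch, not the code behind the estimates that follow.

```python
def run_length(history):
    """Length of the streak of truthy values at the end of `history`."""
    streak = 0
    for h in reversed(history):
        if not h:
            break
        streak += 1
    return streak

def history_features(fines, partner_coops):
    """Summaries of a player's past games, oldest first.

    `fines` flags games played with a fine; `partner_coops` flags games in
    which the partner cooperated at the first round.
    """
    return {
        'total_fines': sum(fines),                  # overall number of fines
        'total_partner_coop': sum(partner_coops),
        'fines_in_a_row': run_length(fines),        # stability of institutions
        'coop_in_a_row': run_length(partner_coops), # consistency of partners
    }
```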
In columns (1) and (2) we examine whether the history matters without taking into account
Table 4

[Columns (1)-(8) report Probit estimates (N = 325 in all columns) of first-stage cooperation on current enforcement (Fit = 1) and summaries of the longer history: the overall number of fines, the number of first-round cooperations by partners since t = 0, and indicators for consecutive past games played with fines (CPG = 1, 2, 3) and with partner cooperation (CPC = 1, 2, 3). The coefficient on current enforcement is positive and significant, between 1.5 and 1.7, in every column.]

Note. Probit models with individual random effects on the decision to cooperate at first stage, estimated on the working
sample. Standard errors (in parentheses) are clustered at the session level. All specifications include control variables for
gender, age, student, penalty rd1, choice rd1, lenght1 game. Significance levels: * 10%, ** 5%, *** 1%.
the exact sequencing. In column (1) we show that the total number of games played with a fine
in the past does not affect cooperation today. In column (2) we add the total number of games
where the partner cooperated in the first round, which does not affect cooperation in the
current game either. The overall experience in the past thus does not appear to be a determinant
factor.
We then explore in columns (3) and (4) whether having stable experiences has an
impact. In column (3) we introduce variables measuring whether the last k games were played
with a fine. It appears that having a fine in the previous game but not in the one before (variable
CPG = 1) has no impact on cooperation. The impact comes instead from the stability of institutions:
having experienced two or more consecutive games with fines in the past. Column (4) adds the
cumulative experience of whether the partner cooperated in the first round of a match. Consistent
with the previous sections, the effect comes mostly through indirect spillovers: it is the fact that
the partners cooperated in the past that has an impact on current cooperation. Moreover, it is
the consistency of their actions that plays an important role.
5 Conclusion
We find strong empirical support for the main behavioral insights of the model: spillovers, and the
countervailing effect of learning. We also explore the cumulative effect of past enforcement in the
data, and find strong reinforcement of past history: the gradient of cooperation according to past
institutions is increasing in their stability. Such cumulative effects of past institutions can only
be accounted for by specifying a full structural model of learning that takes into account the current
consequences of the whole history of each player. This is next on our agenda.
References
Becker, G. S. (1968): “Crime and Punishment: An Economic Approach,” Journal of Political Economy,
76(2), 169–217.
Cassar, A., G. d’Adda, and P. Grosjean (2014): “Institutional Quality, Culture, and Norms of
Cooperation: Evidence from Behavioral Field Experiments,” Journal of Law and Economics, 57(3),
821–863.
Dal Bó, P., and G. R. Fréchette (2011): “The Evolution of Cooperation in Infinitely Repeated
Games: Experimental Evidence,” American Economic Review, 101(1), 411–29.
Drago, F., R. Galbiati, and P. Vertova (2009): “The Deterrent Effects of Prison: Evidence from a
Natural Experiment,” Journal of Political Economy, 117(2), 257–280.
Fischbacher, U., and S. Gachter (2010): “Social Preferences, Beliefs, and the Dynamics of Free
Riding in Public Goods Experiments,” American Economic Review, 100(1), 541–56.
Galbiati, R., and P. Vertova (2008): “Obligations and cooperative behaviour in public good games,”
Games and Economic Behavior, 64(1), 146–170.
(2014): “How laws affect behavior: Obligations, incentives and cooperative behavior,” International Review of Law and Economics, 38, 48–57.
Guiso, L., P. Sapienza, and L. Zingales (2016): “Long-Term Persistence,” Journal of the European
Economic Association.
Lowes, S., N. Nunn, J. A. Robinson, and J. Weigel (2016): “The Evolution of Culture and Institutions: Evidence from the Kuba Kingdom,” Harvard WP, Revise and Resubmit, Econometrica.
Peysakhovich, A., and D. G. Rand (2016): “Habits of Virtue: Creating Norms of Cooperation and
Defection in the Laboratory,” Management Science, 62(3), 631–647.
Sliwka, D. (2007): “Trust as a Signal of a Social Norm and the Hidden Costs of Incentive Schemes,”
American Economic Review, 97(3), 999–1012.
Tabellini, G. (2008): “The Scope of Cooperation: Values and Incentives,” Quarterly Journal of Economics, 123(3), 905–950.
van der Weele, J. (2009): “The Signaling Power of Sanctions in Social Dilemmas,” Journal of Law,
Economics, and Organization.
Appendix
A Proofs
Proposition 1

As derived in the main text, if an equilibrium exists, it is necessarily such that players use the cutoff
strategies described. Re-expressing characteristic equation (3), we can show that the cutoffs are determined
by the equation g(β*(Fit)) = 0, where g is given by

g(β*) = −β*(Fit) + Π1 − F 1{Fit=1} + (1 − ΦH[β*(Fit)]) (Π2 − δ/(1−δ) (F 1{Fit=1} + Π3)),

where g(β*) > 0 when β* converges to −∞ and g(β*) < 0 when β* converges to +∞. Thus, since g is
continuous, there is at least one solution to the equation g(β*(Fit)) = 0: at least one equilibrium exists.
Using the implicit function theorem we have:

∂β*/∂F = − (∂g/∂F) / (∂g/∂β*)
       = − [ −1 − (1 − ΦH[β*]) δ/(1−δ) ] / [ −1 − φH[β*] (Π2 − δ/(1−δ) (F 1{Fit=1} + Π3)) ],

where φH is the density corresponding to the distribution ΦH.
For stable equilibria, the denominator is negative, so that overall

∂β*/∂F < 0.

Similarly,

∂β*/∂µH = − [ −(∂ΦH[β*]/∂µH) (Π2 − δ/(1−δ) (F 1{Fit=1} + Π3)) ] / [ −1 − φH[β*] (Π2 − δ/(1−δ) (F 1{Fit=1} + Π3)) ].

As before, in stable equilibria, the denominator is negative. Furthermore, we have ∂ΦH[β*]/∂µH < 0, since an
increase in the mean of the normal distribution decreases ΦH[x] for any x. Overall we get

∂β*/∂µH < 0.
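The characteristic equation of Proposition 1 can be solved numerically. The payoff values below are illustrative placeholders (not the paper's), chosen so that Π2 − δ/(1 − δ)(F + Π3) < 0 and g is strictly decreasing, making the stable equilibrium unique and bisection applicable.

```python
from math import erf, sqrt

def Phi(x, mu=0.0):
    """Normal cdf with mean mu and unit variance (Phi_H)."""
    return 0.5 * (1 + erf((x - mu) / sqrt(2)))

def cutoff(F, pi1=0.5, pi2=0.5, pi3=1.0, delta=0.5, mu_H=0.0):
    """Solve g(beta*) = 0 by bisection, where
    g(b) = -b + pi1 - F + (1 - Phi_H(b)) * (pi2 - delta/(1-delta)*(F + pi3)).
    """
    def g(b):
        return -b + pi1 - F + (1 - Phi(b, mu_H)) * (
            pi2 - delta / (1 - delta) * (F + pi3))
    lo, hi = -20.0, 20.0   # g(lo) > 0 > g(hi), and g is decreasing here
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Consistently with the comparative statics of the proposition, the computed cutoff falls when a fine is introduced (∂β*/∂F < 0) and when the mean of the value distribution rises (∂β*/∂µH < 0).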
Proposition 2

We establish the existence of a stationary equilibrium. As derived in the main text, if an equilibrium
exists, it is necessarily such that players use cutoff strategies, defined by equation (4):

β*(Fit) = Π1 − F 1{Fit=1} + p*(Fit) (Π2 − δ/(1−δ) (F 1{Fit=1} + Π3)).

We now determine the probabilities of cooperation p*(1) and p*(0) of the partner in the current
interaction, a partner that we denote j′ (and who interacted with individual k in the previous interaction).
We have:

p*(Fit) = P[aj′t = C | Fit]
        = Σ_{Fj′t−1 ∈ {0,1}, akt−1 ∈ {C,D}} (1 − ΦH[β*(Fit) − φF 1{Fj′t−1=1} − φC 1{akt−1=C}]) P[Fj′t−1, akt−1].

Furthermore, since we focus on stable equilibria, we have that P[Fj′t−1, akt−1] can be derived as

P[1, C] = P[Fj′t−1 = 1] P[akt−1 = C | Fkt−1 = 1] = (1/2) p*(1)
P[0, C] = P[Fj′t−1 = 0] P[akt−1 = C | Fkt−1 = 0] = (1/2) p*(0)
P[1, D] = P[Fj′t−1 = 1] P[akt−1 = D | Fkt−1 = 1] = (1/2) (1 − p*(1))
P[0, D] = P[Fj′t−1 = 0] P[akt−1 = D | Fkt−1 = 0] = (1/2) (1 − p*(0)).

So we have

p*(Fit) = (1/2) (1 − ΦH[β*(Fit) − φF − φC]) p*(1) + (1/2) (1 − ΦH[β*(Fit) − φF]) (1 − p*(1))
        + (1/2) (1 − ΦH[β*(Fit) − φC]) p*(0) + (1/2) (1 − ΦH[β*(Fit)]) (1 − p*(0)).     (8)

The equilibrium, if it exists, is thus defined by the solution to the following system of equations, given
by equation (4) with and without fines, together with equation (8) above:

A:  β*(1) = Π1 − F + p*(1) (Π2 − δ/(1−δ) (F + Π3))
    β*(0) = Π1 + p*(0) (Π2 − δ/(1−δ) Π3)
    p*(Fit) = Σ_{Fj′t−1 ∈ {0,1}, akt−1 ∈ {C,D}} (1 − ΦH[β*(Fit) − φF 1{Fj′t−1=1} − φC 1{akt−1=C}]) P[Fj′t−1, akt−1].

We see that overall, the system of equations A derived above can be re-expressed as a system of two
nonlinear equations (where p*(1) and p*(0) are functions of β*(1) and β*(0)).
Given that p*(1) is bounded between 0 and 1, when β*(0) goes from −∞ to +∞, β*(1), the solution to the
first equation, remains bounded. Similarly, given that p*(0) is bounded between 0 and 1, when β*(1) goes
from −∞ to +∞, β*(0), the solution to the second equation, remains bounded. This implies that in the space
(β*(0), β*(1)), the two curves intersect at least once, and there therefore exists a stationary equilibrium.
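The stationary equilibrium can also be computed by iterating on this system (equation (4) for F = 0, 1 together with equation (8)). All parameter values below are illustrative placeholders; with them the iteration converges to a stable fixed point.

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf (Phi_H)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def stationary_eq(F=1.0, pi1=0.5, pi2=0.5, pi3=1.0, delta=0.5,
                  phi_F=0.2, phi_C=0.4, n_iter=200):
    """Fixed-point iteration on (beta*(0), beta*(1), p*(0), p*(1))."""
    def K(fine):
        # Coefficient on the cooperation probability in equation (4).
        return pi2 - delta / (1 - delta) * (F * fine + pi3)
    p0 = p1 = 0.5
    for _ in range(n_iter):
        beta0 = pi1 + p0 * K(0)
        beta1 = pi1 - F + p1 * K(1)
        def coop(b):
            # Equation (8): average over the partner's possible histories.
            return 0.5 * ((1 - Phi(b - phi_F - phi_C)) * p1
                          + (1 - Phi(b - phi_F)) * (1 - p1)
                          + (1 - Phi(b - phi_C)) * p0
                          + (1 - Phi(b)) * (1 - p0))
        p0, p1 = coop(beta0), coop(beta1)
    return beta0, beta1, p0, p1
```

At the fixed point the cutoff is lower, and cooperation more likely, in the fine environment, as the disciplining effect requires.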
Corollary 1

Denote pgt(Fit) the probability that a random individual of group g cooperates in interaction t, and
denote Ptg[F, a] the probability that the partner experienced F in the previous period (Fj′t−1 = F) and faced a
player who played a (akt−1 = a).
We establish the recursive property: p1t(Fit) > p2t(Fit) for t ≥ 2.
We first establish the property for t = 2.
In the first period, the proportion of individuals who face a fine is different in the two groups: we
denote that proportion qg. In later periods, this proportion equals 1/2.
We have:

pg2(Fi2) = Σ_{Fj′1 ∈ {0,1}, ak1 ∈ {C,D}} (1 − ΦH[β*(Fi2) − φF 1{Fj′1=1} − φC 1{ak1=C}]) P1g[Fj′1, ak1]
= (1 − ΦH[β*(Fi2) − φF − φC]) qg pg1(1) + (1 − ΦH[β*(Fi2) − φF]) qg (1 − pg1(1))
+ (1 − ΦH[β*(Fi2) − φC]) (1 − qg) pg1(0) + (1 − ΦH[β*(Fi2)]) (1 − qg) (1 − pg1(0))
= qg (1 − ΦH[β*(Fi2) − φF]) + (1 − qg) (1 − ΦH[β*(Fi2)])
+ qg (ΦH[β*(Fi2) − φF] − ΦH[β*(Fi2) − φF − φC]) pg1(1)
+ (1 − qg) (ΦH[β*(Fi2)] − ΦH[β*(Fi2) − φC]) pg1(0).

By assumption, the difference between groups 1 and 2 is that more interactions were played with a fine
in the first interaction in group 1 than in group 2. This implies that there is a higher probability of having
played with a fine in the previous round, but also a higher probability that the partner in interaction 2
saw his own partner in interaction 1 cooperate. So we have that

p11(1) > p21(1),   p11(0) > p21(0),   q1 > q2.

Overall this implies, using the expression for pg2(Fi2) derived above, the property for t = 2:

p12(Fi2) > p22(Fi2).

We now assume the property is true for interaction t − 1 and establish it for interaction t.
The probability that a random individual of group g cooperates is given by:

pgt(Fit) = Σ_{Fj′t−1 ∈ {0,1}, akt−1 ∈ {C,D}} (1 − ΦH[β*(Fit) − φF 1{Fj′t−1=1} − φC 1{akt−1=C}]) Pg,t−1[Fj′t−1, akt−1].

Doing the same decomposition as above we obtain

pgt(Fit) = (1/2) (1 − ΦH[β*(Fit) − φF]) + (1/2) (1 − ΦH[β*(Fit)])
+ (1/2) (ΦH[β*(Fit) − φF] − ΦH[β*(Fit) − φF − φC]) pg,t−1(1)
+ (1/2) (ΦH[β*(Fit)] − ΦH[β*(Fit) − φC]) pg,t−1(0),

so that indeed, using the recursive property for interaction t − 1, the recursive property is established for
interaction t:

p1t(Fit) > p2t(Fit).

We now establish the second result of the corollary. Rearranging equation (8), which characterizes p*(1)
and p*(0), we have:

p*(1) (1 − (1/2) (ΦH[β*(1) − φF] − ΦH[β*(1) − φF − φC]))
= 1 − (1/2) (ΦH[β*(1) − φF] + ΦH[β*(1)]) + (1/2) p*(0) (ΦH[β*(1)] − ΦH[β*(1) − φC]).

In the extended notation β*(Fit, Fit−1, at−1) = β*(Fit) − φF 1{Fit−1=1} − φC 1{at−1=C}, this same equation reads:

p*(1) (1 − (1/2) (ΦH[β*(1, 1, D)] − ΦH[β*(1, 1, C)]))
= 1 − (1/2) (ΦH[β*(1, 1, D)] + ΦH[β*(1, 0, D)]) + (1/2) p*(0) (ΦH[β*(1, 0, D)] − ΦH[β*(1, 0, C)]).

Since ΦH[β*(·, 1, D)] > ΦH[β*(·, 1, C)], we have that p*(0) is increasing in p*(1). Furthermore, since,
following the same logic as Proposition 1, p*(1) increases with F, we have that p*(0) is also increasing in
F. This establishes the second result of the corollary.
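The recursion behind Corollary 1 can be traced numerically to illustrate the persistence result: a group more exposed to fines in the first interaction cooperates more in every later interaction. The cutoffs and spillover parameters below are illustrative constants held fixed over time (a full treatment would recompute the cutoffs each period).

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cdf (Phi_H)."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def coop_path(q, T=8, beta0=0.3, beta1=-0.5, phi_F=0.2, phi_C=0.4):
    """Trace (p_t(0), p_t(1)) for a group in which a share q faced a fine in
    the first interaction; from t >= 2 on, fines are drawn with probability 1/2.
    """
    def step(b, w1, p1, w0, p0):
        # Decomposition used in the proof: base terms plus the terms that
        # pass on the partner's previous cooperation probabilities.
        return (w1 * (1 - Phi(b - phi_F)) + w0 * (1 - Phi(b))
                + w1 * (Phi(b - phi_F) - Phi(b - phi_F - phi_C)) * p1
                + w0 * (Phi(b) - Phi(b - phi_C)) * p0)
    p1_, p0_ = 1 - Phi(beta1), 1 - Phi(beta0)   # period-1 cooperation rates
    w1, w0 = q, 1 - q                           # period-1 fine shares
    path = []
    for _ in range(2, T + 1):
        p0_, p1_ = step(beta0, w1, p1_, w0, p0_), step(beta1, w1, p1_, w0, p0_)
        path.append((p0_, p1_))
        w1 = w0 = 0.5                           # even fine shares afterwards
    return path
```

With q = 0.9 against q = 0.1, the first group's cooperation probabilities dominate period by period, mirroring p1t(Fit) > p2t(Fit).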
Proposition 3

Existence: We first show that an equilibrium exists if we assume that a player who has belief qit in
interaction t believes that her partner j′ in that interaction has the same belief, qj′t = qit.
If an equilibrium exists, it is necessarily such that players use cutoff strategies where the cutoff is
defined by:

βt*(Fit, qit) = Π1 − F 1{Fit=1} + pt*(Fit, qit) (Π2 − δ/(1−δ) (F 1{Fit=1} + Π3)).

We have:

pt*(Fit, qit) = P[aj′t = C | Fit, qit]
= qit Σ_{(Fj′t−1, akt−1, qj′t−1)} (1 − ΦH[β*(Fj′t, qj′t−1) − φF 1{Fj′t−1=1} − φC 1{akt−1=C}]) P[Fj′t−1, akt−1, qj′t−1 | s = H]
+ (1 − qit) Σ_{(Fj′t−1, akt−1, qj′t−1)} (1 − ΦL[β*(Fj′t, qj′t−1) − φF 1{Fj′t−1=1} − φC 1{akt−1=C}]) P[Fj′t−1, akt−1, qj′t−1 | s = L].

Furthermore, we have

P[Fj′t−1, akt−1, qj′t−1 | s = H] = P[Fj′t−1] P[akt−1 | Fj′t−1, s = H] ft(qj′t−1 | s = H)
                                = (1/2) P[akt−1 | Fj′t−1, s = H] ft(qj′t−1 | s = H).

We assumed that a player who had belief qit−1 in interaction t − 1 believes that all other players in that
interaction share the same belief qit−1. Under this restriction, we have ft(qj′t−1 | s = ·) = 1{qj′t−1 = qit−1},
so we can simplify the expression for pt* derived above to obtain:

pt*(Fit, qit) = qit Σ_{(Fj′t−1, akt−1)} (1 − ΦH[β*(Fj′t, qit−1) − φF 1{Fj′t−1=1} − φC 1{akt−1=C}]) (1/2) P[akt−1 | Fj′t−1, s = H]
+ (1 − qit) Σ_{(Fj′t−1, akt−1)} (1 − ΦL[β*(Fj′t, qit−1) − φF 1{Fj′t−1=1} − φC 1{akt−1=C}]) (1/2) P[akt−1 | Fj′t−1, s = L].     (9)

We can then show recursively that an equilibrium exists at interaction t if it exists at t − 1: P[akt−1 | Fj′t−1, s = i],
i ∈ {L, H}, is entirely determined by the equilibrium structure in interactions t′ < t. Furthermore, the
right-hand side of equation (9) is bounded, so that there is at least one solution β*(Fit, qit) to this equation.
This proves that an equilibrium exists in period t if it exists in periods t′ < t, and that players use cutoff
strategies.

We now show the updating property conditional on existence.
Specifically we assume that for t′ < t:
• An equilibrium exists at t′
•
•

Interaction t

We now derive the properties on updating. We have

qit(Fit−1, ajt−1, qit−1) = qit−1 P[ajt−1 | Fit−1, s = H] / (qit−1 P[ajt−1 | Fit−1, s = H] + (1 − qit−1) P[ajt−1 | Fit−1, s = L]),

which, writing β*(F, Fj′t−2, akt−2, q) = β*(F, q) − φF 1{Fj′t−2=1} − φC 1{akt−2=C}, yields:

qit(1, D, qit−1) = qit−1 Σ_{Fj′t−2, akt−2} ΦH[β*(1, Fj′t−2, akt−2, qit−1)] P[Fj′t−2, akt−2]
/ (qit−1 Σ_{Fj′t−2, akt−2} ΦH[β*(1, Fj′t−2, akt−2, qit−1)] P[Fj′t−2, akt−2] + (1 − qit−1) Σ_{Fj′t−2, akt−2} ΦL[β*(1, Fj′t−2, akt−2, qit−1)] P[Fj′t−2, akt−2]),

while

qit(0, D, qit−1) = qit−1 Σ_{Fjt−2, akt−2} ΦH[β*(0, Fjt−2, akt−2, qit−1)] P[Fjt−2, akt−2]
/ (qit−1 Σ_{Fjt−2, akt−2} ΦH[β*(0, Fjt−2, akt−2, qit−1)] P[Fjt−2, akt−2] + (1 − qit−1) Σ_{Fjt−2, akt−2} ΦL[β*(0, Fjt−2, akt−2, qit−1)] P[Fjt−2, akt−2]).

The distribution of (Fjt−2, akt−2) being independent of Fit−1, and the fact that the recursive property establishes
that β*(1, ·, ·, ·) < β*(0, ·, ·, ·), imply that qit(1, D, qit−1) < qit(0, D, qit−1).
Similarly, we can establish that qit(0, C, qit−1) > qit(1, C, qit−1).