Do humans really punish altruistically? A closer look

Altruistic punishment: A closer look 1
1
In press: Proceedings of the Royal Society B: Biological Sciences, 280(1758).
2
3
4
Do humans really punish altruistically? A closer look
5
Eric J. Pedersena, Robert Kurzbanb,c, and Michael E. McCullougha
6
7
a
8
9
b
10
c
Department of Psychology, University of Miami, FL 33124-0751 USA
Department of Psychology, University of Pennsylvania, PA 19104-6241 USA
Department of Economics, University of Alaska Anchorage, AK 99508 USA
11
12
Key Words: cooperation, altruistic punishment, third-party punishment, affective forecasting,
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
evolutionary psychology
Corresponding Author:
Michael E. McCullough
Department of Psychology
University of Miami
P.O. Box 248185
Coral Gables, FL 33124-0751
Phone: 305-284-8057
Fax: 305-284-3402
Email: [email protected]
Altruistic punishment: A closer look 2
39
Abstract
40
Some researchers have proposed that natural selection has given rise in humans to one or more
41
adaptations for altruistically punishing on behalf of other individuals who have been treated
42
unfairly, even when the punisher has no chance of benefiting via reciprocity or benefits to kin.
43
However, empirical support for the altruistic punishment hypothesis depends on results from
44
experiments that are vulnerable to potentially important experimental artefacts. Here we searched
45
for evidence of altruistic punishment in an experiment that precluded these artefacts. In so doing,
46
we found that victims of unfairness punished transgressors whereas witnesses of unfairness did
47
not. Furthermore, witnesses’ emotional reactions to unfairness were characterized by envy of the
48
unfair individual’s selfish gains rather than by moralistic anger toward the unfair behaviour. In a
49
second experiment run independently in two separate samples, we found that previous evidence
50
for altruistic punishment plausibly resulted from affective forecasting error—that is, limitations
51
on humans’ abilities to accurately simulate how they would feel in hypothetical situations.
52
Together, these findings suggest that the case for altruistic punishment in humans—a view that
53
has gained increasing attention in the biological and social sciences—has been overstated.
54
Altruistic punishment: A closer look 3
55
56
Do humans really punish altruistically? A closer look
In many animal species, including humans, individuals punish conspecifics that have
57
harmed them [1–3]. Some researchers have recently argued that humans, unlike other animals,
58
also altruistically punish individuals who have harmed others, even when the punisher has no
59
chance of benefiting via reciprocity or benefits to kin [4–6]. Results from several economics
60
experiments appear to support this claim [4,6,7], but some scholars have questioned both the
61
adaptationist logic behind such theoretical claims [8–10] and the interpretation of the empirical
62
results [8,10–14]. Here we elide these theoretical debates and instead investigate a more basic
63
empirical question: Do people actually spontaneously punish individuals who have only harmed
64
other individuals in anonymous settings in the laboratory? Put differently, do the empirical
65
research findings often marshalled in support of the altruistic punishment hypothesis [e.g., 4,6]
66
provide a reliable guide to the presence or absence of a propensity for altruistic punishment in
67
humans?
68
In previous work, researchers claimed empirical support for the existence of altruistic
69
punishment on the basis of results from public goods game experiments in which the individual
70
being punished had harmed—or failed to help—the putative punisher as well as other victims
71
[4], leaving open the possibility that the punishment was vengeful, rather than altruistic [10].
72
Results from similar experiments that exclude revenge as a possible motive suggest that
73
investments in punishment in such contexts are conspicuously low [15]. Additional data
74
frequently adduced in support of the altruistic punishment hypothesis come from third-party
75
punishment games [6,7], in which a Dictator chooses to give some portion of a sum of money
76
(or none) to a passive Recipient. A third player can punish the Dictator (at a cost) in response to
Altruistic punishment: A closer look 4
77
the Dictator’s transfer to the Recipient. Many third parties in games with this structure pay a cost
78
to punish stingy Dictators, despite receiving no financial benefit from doing so [6,7].
79
However, five methodological limitations of the standard third-party punishment game
80
might conspire to yield inflated estimates of humans’ propensity to punish strangers for having
81
behaved unfairly toward other strangers. First, in the standard game, subjects are assigned to a
82
third-party role that implies their task is to determine how much to punish the Dictator: Indeed,
83
the only choices third parties can make are whether to punish the Dictator [16]—and, if so, how
84
much. Thus, any error will lead to an increase in the estimated quantity of punishment. Second,
85
punishment in the third-party punishment game is typically administered with the presence (or
86
inferred presence) of an audience: punishment of the Dictator by the third party is witnessed by
87
the initial victim because all players see the results of the game. The presence of an audience
88
introduces reputational considerations that could motivate punishment as a means of pursuing
89
indirect fitness benefits (e.g., by signaling one’s quality as a cooperative partner [17, 18]; or
90
one’s formidability to prevent future exploitation of oneself [19, 20] or one’s friends and kin
91
[21]). Indeed, it has been shown—though with a different paradigm—that observers of unfair
92
treatment punish third parties significantly less when they are assured no one will see their
93
decision [22; however, see also ref. 23].
94
Third, the third-party punishment game is typically conducted with the “strategy method”
95
[24], which requires third parties to repeatedly respond to a series of hypothetical Dictator
96
choices—in advance of learning of the Dictator’s actual choice—that are progressively more (or
97
progressively less) unfair [6]. Such methods can cause subjects to infer that the experimenters
98
expect them to vary their responses according to some feature that varies across the set of
99
repeated scenarios [25]. Consequently, due to a well-known experimental artefact called demand,
Altruistic punishment: A closer look 5
100
subjects might feel compelled to punish at least some of the time, calibrating those decisions to
101
the only feature of the Dictators’ repeated choices that varies: how unfair they are. This is
102
especially problematic in the standard third-party punishment game because rewarding is not
103
allowed; the only way subjects can vary their responses is to vary their amount of punishment. In
104
a notable exception, Alemberg, et al. [26] did add a rewarding option to the typical third-party
105
punishment game (conducted with the strategy method), and a small amount of third-party
106
punishment was observed, on average, when Dictators transferred $0 (of $10) to the Recipient.
107
We note, however, that subjects in this experiment were informed, before making their decisions,
108
that it was possible they would not be paired with another subject—in such a case, their
109
decisions would not be enacted and they would retain all of their money (i.e., participants’
110
decisions were somewhat hypothetical; see below).
111
Fourth, the strategy method also involves affective forecasting [27] inasmuch as it
112
requires subjects to respond ex ante to Dictator actions that have not yet occurred. Such
113
behavioural commitments can differ from the actual behaviours people enact after experiencing
114
social situations directly because people frequently weight the features of social situations
115
differently during conscious deliberation than they do after experiencing those social situations
116
in real time [28]. For example, as forecasters, people severely overestimate how upset they
117
would feel by (and subsequently, how much they would attempt to avoid interacting with)
118
someone who had made a racist comment; in contrast, subjects who have actually observed
119
another individual express strongly racist attitudes (versus those in control conditions) respond
120
with relative indifference to the racist individual [29].
121
122
Fifth, previous claims that anger is the predominant emotional response of third-party
punishers have relied on self-reports of anger in response to hypothetical scenarios [4,6]. Self-
Altruistic punishment: A closer look 6
123
reports of anger are typically highly correlated with self-reports of other, similar emotions—
124
including envy [30]. To the extent that the covariation between self-reported anger and self-
125
reported envy is not statistically controlled, estimates of third parties’ anger toward unfair
126
strangers might actually reflect envy, which can also motivate costly punishment in pursuit of
127
goals that are quite distinct from putatively altruistic goals such as enforcing norms or delivering
128
deterrence benefits to strangers [31]: Specifically, if third parties’ punishment of individuals who
129
have treated another individual unfairly is motivated by envy, but not by anger, then the
130
mechanisms that motivate third-party punishment might process cues that another individual has
131
obtained better outcomes than the self, rather than cues that an individual has violated a norm or
132
harmed an anonymous third party in whom the punisher has no fitness interest [e.g., 6,7].
133
Here we present two experiments designed to test whether subjects punish altruistically
134
on behalf of strangers in a third-party punishment game that was designed to rectify the
135
methodological problems noted above. We also examined whether previous findings could
136
plausibly be explained as a product of affective forecasting errors. We note that our goal was not
137
to estimate the unique influence of each of these five potential methodological problems; rather,
138
our goal was to test whether the altruistic punishment hypothesis could survive falsification in an
139
experiment that eliminated these problems. Experiment 1 was a modified third-party punishment
140
game in which subjects could either punish or reward—thereby reducing experimental demand
141
for punishment [25], the confounding of error and punishment, and potential audience effects.
142
Also, subjects made decisions about giving or deducting money from Dictators after witnessing
143
the Dictator’s decision, which enabled us to measure third-party punishment without the
144
possibility of affective forecasting errors [27]. Additionally, our measures of emotion were fine-
145
grained enough that it was possible to evaluate the unique motivational roles of anger and envy.
Altruistic punishment: A closer look 7
146
In Experiments 2a and 2b, the same third-party punishment game was presented as a
147
hypothetical vignette to subjects from two different research pools.
148
149
Methods
150
151
152
Subjects
153
SD = 2.99; 57% female). They received partial course credit and monetary compensation (see
154
below).
155
Experiment 1: Subjects were 315 University of Miami undergraduates (mean age = 19.12,
Experiment 2a: 538 subjects (mean age = 34.37, SD = 12.14; 60% female) were recruited
156
via Amazon Mechanical Turk (http://www.mturk.com/mturk/welcome) and were paid $0.25 for
157
their participation. Participation was restricted to users in the United States. Because participants
158
merely had to read a vignette and then report their forecasts of how they would think, feel, and
159
behave if the hypothetical situation had actually happened to them, participation generally took
160
about 4 minutes.
161
Experiment 2b: We replicated Experiment 2a with University of Miami undergraduates;
162
394 subjects (mean age = 18.74, SD = 1.27; 53% female) participated for partial course credit.
163
Procedure
164
Experiment 1: Subjects were run in individual sessions at a computer station in an
165
isolated room (see ESM S2.1). The entire experiment, including instructions, was conducted on a
166
computer via E-Run with a script created in E-Prime version 2.0. After subjects provided
167
informed consent, they were told they would be interacting with two other players in the building
168
over the computer network and that it was important that those other people remain anonymous;
169
in fact, they interacted with a pre-programmed computer script. Without this deception, the
Altruistic punishment: A closer look 8
170
research would have been unfeasible (see ESM S2.1). Subjects were informed that they would be
171
participating in an economic decision-making game that would last for multiple rounds and they
172
would be paid based on the money they earned during the game. Because deception was
173
involved, everyone was paid a flat rate of $9 at the end of the experiment following a debriefing.
174
We used a “funnel debriefing” method designed to detect suspicion and explore subjects’
175
reactions to having been deceived [32]. Subjects flagged for suspicion were excluded from all
176
analyses presented; re-including them in analyses did not qualitatively affect the results in any
177
way (see ESM S1.2).
178
The decision-making game comprised two rounds (Fig. 1) in which each player was
179
given $5 to use in each round and assigned to one of three roles: Decision-Maker, Receiver, or
180
Observer. (We refer to these roles here as Dictator, Recipient, and Third Party, respectively, to
181
be consistent with labels used in previous work on third-party punishment.) Subjects were not
182
told the exact number of rounds to avoid end of game effects [33], and were told that money
183
earned during each round would be “banked” and thus unaffected by subjects’ behaviour during
184
subsequent rounds. The Dictator ostensibly had the option to give any portion of his or her $5 to
185
the Recipient, or take any portion of the Recipient’s $5; the Third Party would merely see the
186
results of the round and would not be affected by the Dictator’s choice. Subjects were informed
187
that in some rounds all players would be involved, and in other rounds some players might be
188
excluded. Subjects were randomly assigned to be either the Third Party or the Recipient in the
189
first round and the (computer-programmed) Dictator either took $4 or $0 from the Recipient. The
190
computer displayed a summary screen for the round showing the amount of money each player
191
earned for the round. Following the round, subjects completed a lexical decision task (see ESM
192
S1.3) and a series of self-report questions (see below).
Altruistic punishment: A closer look 9
193
Prior to role assignment for the second round, subjects were informed that there would be
194
no Third Party in Round 2 [to avoid potential audience effects; 22,34]; one player would be
195
assigned to a different task and be unable to see the results of the interaction. We note that the
196
presence of the experimenter can also induce audience effects e.g., [22]; we took great care to
197
minimize this potential influence by (a) clearly informing participants during the consent process
198
that their data would be stored completely anonymously and could not be connected to them in
199
any way, and (b) minimizing contact with the experimenter by presenting all instructions
200
electronically. Though we cannot rule out experimenter audience effects completely, our results
201
are as insulated from them as we believe it was possible to do in the context of this experiment.
202
All players were given another $5; because previous earnings had been “banked,” all
203
players started Round 2 with $5. The subject was assigned the role of Dictator while the Dictator
204
from Round 1 (who had treated either the subject or the other player fairly or unfairly) was
205
assigned the role of the Recipient (ostensibly by chance). Players were identified consistently
206
throughout, so subjects were aware that the Recipient in Round 2 was the same player that had
207
been the Dictator in Round 1. Subjects were instructed that they could give any amount of their
208
$5 to the Recipient, do nothing, or remove any amount of the Recipient’s $5 (the word
209
“punishment” was never used). Removing money cost one-fourth of the amount removed and,
210
unlike in the first round, was not gained by the subject as income—it simply disappeared. Note
211
that the cost of punishment used here, 1:4, was less expensive than the 1:3 cost typically used in
212
the third-party punishment game; previous research has shown that punishment becomes more
213
likely as the cost of punishment declines (see ref. [10] for review). Following the completion of
214
the round, the experiment ended and the experimenter debriefed the subject through an
Altruistic punishment: A closer look 10
215
extensive, staged process to assess the believability of the experiment and to explain why
216
deception was necessary [32].
217
Experiments 2a and 2b: After providing consent, subjects were instructed to imagine
218
themselves “…in a particular situation in our laboratory. Please try to picture yourself in the
219
situation we are describing. We will ask you to complete a series of questions regarding how you
220
think you would think, feel, and act in this situation.” The layout and instructions of the game
221
were presented as they were in Experiment 1, and the rounds of the game and the self-report
222
measures were the same. However, subjects did not complete a lexical decision task following
223
the first round and were not debriefed following the completion of the experiment (because no
224
deception was involved).
225
226
227
Psychometric information regarding the self-report measures
228
responses toward both of the other players after the first round. They described their feelings
229
toward both players to avoid demand effects that might have occurred by probing only about the
230
Dictator. (Emotional reactions to the other player were not of theoretical interest here and so, in
231
the interest of brevity, we do not report them).
232
Self-rated emotions toward the other players: Subjects were asked to describe their emotional
-
Anger: 3-item composite of ratings on a scale from 0 (not at all) to 5 (extremely) the
233
extent to which the subject was “angry,” “mad,” and “outraged” at the Dictator
234
(Cronbach’s  = .94).
235
236
-
Envy: 2-item composite of ratings on a scale from 0 (not at all) to 5 (extremely) the extent
to which the subject was “envious” and “jealous” of the Dictator (Cronbach’s  = .84).
Altruistic punishment: A closer look 11
237
Fairness/moral wrongness of the Round 1 dictator’s behaviour: Subjects were asked to rate
238
both how “fair” and how “morally wrong” the Dictator’s behaviour was toward the Recipient in
239
Round 1 on a scale from 1 (not at all) to 9 (totally).
240
241
242
Results
Experiment 1
243
Third parties did not punish on behalf of strangers: A one-sample Wilcoxon test (used
244
because distributions were non-normal) revealed that the sample median of the distribution of
245
dollars punished or rewarded in Round 2 (in terms of the effect on the Recipient, not the cost to
246
the subject) by third-party witnesses of unfairness did not differ significantly from a
247
hypothesized median of zero, Z = -1.48, p = .140, N = 65 (all p-values throughout manuscript are
248
two-tailed). In contrast, victims of unfairness punished a nonzero amount, Z = -3.52, p < .001, N
249
= 61—significantly more than mere witnesses of unfairness, p = .026, N = 126 (two-sample
250
median test; Fig. 2).
251
If the function of punishment is to deter harmdoers from imposing costs on oneself (i.e.,
252
to bargain for better treatment for oneself [10,13,20]) or others in the future—or even if its
253
function is to enforce adherence to social norms [6]—then the punishment must be strong
254
enough to erase unfairly gained benefits [1,35]. Otherwise, the harmdoer retains a net profit from
255
the transgression, and thus will retain an incentive to continue to behave unfairly toward others
256
in the future. Because unfair Dictators took $4 from Recipients in Round 1 of Experiment 1, $4
257
was also the minimum amount of punishment that would be expected to deter unfair Dictators
258
from behaving unfairly in the future. Third-party punishment of this magnitude was extremely
259
rare: only 2 of 65 (3%) witnesses imposed at least $4 worth of punishment on unfair Dictators, a
Altruistic punishment: A closer look 12
260
proportion no different from the proportion for witnesses of fairness (0 of 80; p = .199, Fisher’s
261
Exact Test). In contrast, 13 of 61 (21%) victims of unfairness punished at least $4, a proportion
262
significantly greater than that for both recipients of fairness (0 of 64; p < .001), and witnesses of
263
unfairness (p = .002). Indeed, most victims of unfairness who punished (13 of 21) imposed at
264
least $4 worth of punishment.
265
According to the self-report measures of emotion, third parties did not become angry at
266
unfairness: when controlling for envy (which was highly correlated with anger, r = .637, p <
267
.001; see ESM S1.5), a 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANCOVA revealed
268
a significant target*treatment interaction for anger, F (1, 265) = 17.11, p < .001). Witnesses of
269
unfairness (M = .533, SE = .095, N = 65) did not report more anger than did witnesses of fairness
270
(M = .414, SE = .084, N = 80), p = .363, partial 2 = .00, but victims of unfairness (M = 1.28, SE
271
= .098, N = 61) did report more anger than their fairly-treated counterparts (M = .418, SE = .093,
272
N = 64), p < .001, partial 2 = .13 (Fig. 3; see ESM S1.3 and Fig. S1 for a replication with an
273
implicit measure based on reaction time data). Thus, people became angry when treated unfairly
274
but not when they only witnessed the unfair treatment of a stranger [cf. 36]. Importantly, this
275
difference in the anger of witnesses versus victims of unfairness was not due to different
276
perceptions of the transgression’s fairness or moral wrongness (see ESM S1.4 and Fig. S2).
277
Eleven of sixty-five witnesses of unfairness paid some cost to impose costs on the unfair
278
Dictator; with a much larger sample, one might argue, we therefore might have found statistical
279
evidence for mild third-party punishment. However, because the witnesses of unfairness were
280
not angry at the Dictator (see above), we suspected that the predominant emotional response
281
among witnesses of unfairness was envy given that they had observed the unfair Dictator obtain
282
a higher payoff ($9) than they themselves had received [$5; see ref. 37]. We found a significant
Altruistic punishment: A closer look 13
283
target*treatment interaction for envy (with anger partialed out), F (1, 265) = 4.53, p = .034:
284
witnesses of unfairness were more envious of the Dictator than were the witnesses of fairness, p
285
= < .001, partial 2 = .07. In contrast, victims of unfairness were no more envious than were their
286
fairly-treated counterparts, p = .306, partial 2 = .00. Thus, had we observed a significant amount
287
of third-party punishment among witnesses of unfairness, it plausibly could have been motivated
288
by envy toward the unfair Dictator rather than by moralistic anger. This difference in the
289
emotions of the witnesses and victims of the Dictator’s unfairness explains why third-party
290
punishment was quite rare and mild: witnesses of unfairness were envious of the Dictator’s ill-
291
gotten gains—but not angry—and so they likely were reluctant to spend their own money to
292
punish the Dictator.
293
During Experiment 1, we also ran a small fifth condition (N = 45; see ESM S1.1) in
294
which witnesses of unfairness started Round 1 with $9, enabling us to test whether witnesses’
295
economic disadvantage relative to unfair Dictators explained their surplus envy. Witnesses who
296
started Round 1 with $9 were significantly less envious of unfair Dictators (controlling for anger)
297
than were witnesses who started with $5, F (1, 107) = 8.35, p = .005, partial 2 = .07. Self-
298
reported anger (controlling for envy) did not differ between groups, F (1,107) = .581, p = .448,
299
partial 2 = .01. Witnesses of unfairness with $9 did not punish an amount significantly different
300
from zero, Z = -.879, p = .379, N = 45, nor differently than did the witnesses of unfairness with
301
$5, p = .475, N = 110, even though doing so would have cost a smaller proportion of their stake.
302
Thus, the emotional reactions of witnesses of unfairness were characterized by envy rather than
303
moralistic anger.
304
Experiment 2a
Altruistic punishment: A closer look 14
305
In Experiments 2a (online sample) and 2b (undergraduate sample), we investigated how
306
subjects’ affective and behavioural forecasts in a hypothetical scenario would compare to the
307
results from Experiment 1 (see Table S1 and Table S2 for descriptive statistics for both
308
experiments). This experiment was conducted not because we thought that participants’
309
hypothetical responses would provide a reliable assay of how they would behave in a real-life
310
situation, such as the one we explored in Experiment 1, but because we wished to compare
311
participants’ forecasts of how they might behave and feel with participants’ actual behaviour and
312
emotional reactions in Experiment 1. The rounds of the game were identical to Experiment 1,
313
except that subjects were instructed to report how they believed they would act and feel in
314
response to a hypothetical vignette.
315
In Experiment 2a—and in contrast to Experiment 1—witnesses of hypothetical unfairness
316
forecast that they would administer a greater-than-zero amount of punishment, Z = -2.38, p =
317
.017, N = 137, as did victims of hypothetical unfairness, Z = -2.66, p = .008, N = 148; witnesses
318
and victims of hypothetical unfairness did not differ significantly, p = .391. Four additional
319
results suggest that a different psychological process was at work in Experiment 2a than in
320
Experiment 1. First, witnesses of hypothetical unfairness forecast a much higher likelihood of
321
punishing at least $4 (the critical threshold for efficacious punishment) than did witnesses of
322
hypothetical fairness (p = .002, Fisher’s Exact Test). This proportion (22 of 137; 16%) was
323
significantly larger than what we saw in the actual behaviour of Experiment 1 subjects (2 of 65;
324
3%), p = .009, Fisher’s Exact Test, (Fig. 4). Second, the proportion of witnesses (16%) and
325
victims (31 of 148; 21%) of hypothetical unfairness that forecast punishing at least $4 did not
326
differ, p = .361, contrary to Experiment 1. Third, witnesses of hypothetical unfairness forecast a
327
significant amount of anger toward unfair Dictators in Experiment 2a, again in contrast to
Altruistic punishment: A closer look 15
328
Experiment 1: there was a significant target*treatment interaction (controlling for envy), F (1,
329
285) = 5.87, p = .016. Witnesses of hypothetical unfairness (M = 1.59, SE = .109, N = 133)
330
forecast more anger than did witnesses of hypothetical fairness (M = .419, SE = .116, N = 147), p
331
< .001, partial 2 = .16. Likewise, victims of hypothetical unfairness (M = 2.04, SE = .110, N =
332
139) forecast more anger than did recipients of hypothetical fairness (M = .345, SE = .108, N =
333
141), p < .001, partial 2 = .28. Fourth, both witnesses (F (1, 131) = 25.48, p < .001, partial 2 =
334
.16) and victims (F (138) = 10.69, p = .001, partial 2 = .07) of hypothetical unfairness forecast
335
significantly more anger (controlling for envy) toward the unfair Dictator than their counterparts
336
reported in Study 1 (Fig. 3). These results are therefore consistent with proposals that norm
337
violations elicit “negative emotions” [4,6], which in turn motivate altruistic punishment, but here
338
they resulted from affective forecasting rather than from responding to real-time events. (Recall,
339
in contrast, that in Experiment 1, which involved real-time behaviour and emotional responses
340
rather than forecasting, no such moral outrage was found).
341
Experiment 2b
342
The pattern of punishment results for Experiment 2b was virtually identical to those of
343
Experiment 2a: the proportion of witnesses of hypothetical unfairness that forecast they would
344
punish at least $4 (5 of 85; 6%) did not differ from that of victims of hypothetical unfairness (9
345
of 101; 9%), p = .580, Fisher’s Exact Test—a pattern that was similar to Experiment 2a, but
346
contrary to Experiment 1. Interestingly, neither witnesses Z = .183, p = .855, N = 85, nor victims
347
of hypothetical unfairness, Z = -.162, p = .872, N = 101, reported they would administer a
348
greater-than-zero amount of punishment. Nevertheless, both witnesses of hypothetical unfairness
349
(p = .031) and victims of hypothetical unfairness (p = .012) forecast they would punish
350
significantly more than did their (hypothetically) fairly-treated counterparts: this is because both
Altruistic punishment: A closer look 16
351
witnesses, Z = 2.33, p = .020, N = 97 and recipients of hypothetical fairness, Z = 2.98, p = .003,
352
N = 85, rewarded a greater-than-zero amount. Furthermore—and importantly—witnesses and
353
victims of hypothetical unfairness did not forecast different amounts of punishment, p = .743.
354
Thus, notwithstanding the fact that hypothetical punishment of unfairness appeared largely to
355
have taken the form of withdrawing reward (rather than imposing costs) for subjects in
356
Experiment 2b, the punishment results largely replicated those obtained in Experiment 2a (see
357
Fig. 4).
358
Moreover, the pattern of emotion results of Experiment 2b was identical to that of
359
Experiment 2a: witnesses of hypothetical unfairness (M = 1.43, SE = .100, N = 94) forecast more
360
anger (controlling for envy) than did witnesses of hypothetical fairness (M = .392, SE = .104, N =
361
92), p < .001, partial 2 = .13. Likewise, victims of hypothetical unfairness (M = 1.44, SE = .096,
362
N = 109) forecast more anger than did recipients of hypothetical fairness (M = .259, SE = .098, N
363
= 99), p < .001, partial 2 = .16. Witnesses (F (1, 144) = 18.53, p < .001, partial 2 = .11) but not
364
victims (F (157) = .005, p = .945, partial 2 = .00) of hypothetical unfairness also forecast
365
significantly more anger (controlling for envy) toward the unfair Dictator than the subjects in
366
Experiment 1 actually experienced (Fig. 3). The overall pattern of forecast behaviour and
367
emotion in Experiment 2b suggests that the students who were the subjects in Experiment 2b had
368
a slight tendency to believe that they would reward fair distributions, which the non-student
369
subjects in Experiment 2a did not share, but in every other way the results are identical to those
370
of Experiment 2a: subjects forecast that both experiencing and witnessing unfairness would
371
cause them to become angry and to punish dictators to a greater extent than did subjects who
372
forecast their responses to either receiving or witnessing fair treatment. Furthermore, both
373
experiencers and witnesses of unfairness forecast equivalent likelihoods of punishing at least $4.
Altruistic punishment: A closer look 17
374
Discussion
375
Experiment 1 indicates that, under the conditions we investigated, humans do not impose
376
meaningful amounts of third-party punishment on behalf of absolute strangers. The nominal and
377
statistically non-significant amount of punishment we did observe was apparently motivated by
378
envy because of a comparatively unfavourable personal outcome rather than by moralistic anger
379
on behalf of a mistreated stranger. Our finding that the emotional reaction to witnessing
380
unfairness is characterized by envy rather than moralistic anger is particularly inconvenient for
381
the altruistic punishment hypothesis: to categorize a behaviour as an adaptation for altruistic
382
benefit delivery, one needs to provide evidence that the psychological mechanisms that produce
383
the behaviour in question have been designed for that specific function. That is, one needs to
384
demonstrate that the behaviour is not caused by mechanisms designed for a different function.
385
The presence of envy, rather than moralistic anger, in response to witnessing unfairness, suggests
386
that the psychological mechanisms involved in third-party punishment are, at least in part,
387
designed to process cues that another individual has obtained better outcomes than oneself [38].
388
In contrast, we found no evidence that they are designed to process cues that an anonymous
389
stranger has been harmed. We do not mean to imply that humans do not impose any third-party
390
punishment: under some circumstances, they do [22,35,39]. However, our results cast doubt on
391
the proposal that the mechanisms that motivate third-party punishment are altruistic benefit-
392
delivery systems that are motivated proximately by moralistic anger.
393
Experiments 2a and 2b show furthermore that people inaccurately forecast their affective
394
and behavioural responses to unfairness in experimental games: in particular, subjects who
395
imagined themselves witnessing (rather than experiencing) unfair treatment forecast both more
396
anger and punishment (and, in the case of Experiment 2b, withdrawal of rewarding) than is
Altruistic punishment: A closer look 18
397
observed among people who witness unfair treatment in the laboratory. This dissociation
398
between hypothetical and actual third-party punishment raises the possibility that punishment
399
imposed by mere witnesses of unfairness found in prior work resulted from demand
400
characteristics, affective forecasting errors, and the other methodological shortcomings we have
401
cited here [8,25,27].
402
Limitations
403
As mentioned at the outset, the goal of the experiments presented herein was not to
404
systematically identify which specific methodological conventions were responsible for previous
405
findings of third-party punishment on behalf of strangers. Rather, the goal was to test whether a
406
suite of methodological conventions that are commonly applied within the third-party
407
punishment game collude to create more third-party punishment in that experimental realization
408
than would actually obtain in experiments that remediated those methodological shortcomings.
409
As such, our results cannot speak with certainty to the effects of particular aspects of previous
410
designs, such as the strategy method. A recent survey of studies comparing the strategy method
411
to the direct-response method across a variety of paradigms found that evidence surrounding the
412
effect of using the strategy method is mixed [40], but importantly, no study has been conducted
413
to directly compare the amounts of third-party punishment elicited in experiments using the
414
strategy method versus the direct response method [40]. Therefore, further work is needed to
415
determine the unique contribution of the use of the strategy method (and the other potential
416
methodological artefacts we have identified herein) to the apparently exaggerated evidence for
417
altruistic third-party punishment that previous work has revealed: we emphasize again that doing
418
so was not our goal here. Despite this limitation, our results do strongly suggest that subjects’
Altruistic punishment: A closer look 19
419
forecasts of their likely anger and punishment in response to witnessing unfairness in the
420
standard third-party punishment game [e.g., 6] are exaggerated.
421
Conclusion
422
These findings are of broad significance in the study of human cooperation because many
423
researchers have proceeded under the assumption that altruistic punishment is a robust
424
phenomenon that requires an adaptationist explanation. Indeed, two scientific “problems” for
425
which cooperation researchers over the past decades have been seeking adaptationist solutions
426
might not be problems at all. Consider the puzzle framed by proponents of “strong reciprocity,”
427
such as Gintis [41], who claimed that humans are “strong reciprocators” who are “predisposed to
428
cooperate with others and punish non co-operators, even when this behaviour cannot be justified
429
in terms of self-interest, extended kinship, or reciprocal altruism (p. 169).”
430
With respect to the former problem—“unjustified” predispositions to cooperate without
431
apparent individual benefit—results from recent models suggest that a bias to cooperate, even
432
when one faces cues of an interaction being one-shot, should be expected to coevolve with
433
reciprocity. This is because mistaking a one-shot interaction for a repeated interaction is a less
434
costly error than the reverse [42]. In terms of the latter problem (i.e., the claim that people punish
435
non co-operators even when the punisher does not stand to benefit individually), the results from
436
Experiment 1 call into question the claim that people engage in altruistic third-party punishment
437
at all [See also, 10,13]. We think another way forward in the study of third-party punishment in
438
humans, as in the study of the evolved mechanisms that motivate human cooperation in general,
439
is to intensify the search for direct or indirect benefits for punishers that outweigh the costs of
440
punishment, consistent with all known cases of third-party intervention in non-human animals
441
[43,44].
Altruistic punishment: A closer look 20
442
Acknowledgements
443
We thank Max Burton-Chellow, Steven Pinker, Stuart A. West, and Richard W. Wrangham for
444
their feedback on a previous draft, and David G. Rand for kind assistance with his data. Research
445
supported by grants from the Air Force Office of Scientific Research (Award #FA9550-12-1-
446
0179) to M.E.M and R.K., the Arsht Research on Ethics and Community Program at the
447
University of Miami to M.E.M, and an NSF Graduate Research Fellowship to E.J.P.
Altruistic punishment: A closer look 21
448
References
449
450
1
Clutton-Brock, T. H. & Parker, G. A. 1995 Punishment in animal societies. Nature 373,
209-216. (DOI:10.1038/373209a0)
451
452
453
2
West, S. A., Griffin, A. S. & Gardner, A. 2007 Social semantics: altruism, cooperation,
mutualism, strong reciprocity and group selection. J Evol Biol 20, 415-432.
(DOI:10.1111/j.1420-9101.2006.01258.x)
454
455
3
Bshary, A. & Bshary, R. 2010 Self-serving punishment of a common enemy creates a
public good in reef fishes. Curr Biol 20, 2032-2035. (DOI:10.1016/j.cub.2010.10.027)
456
457
4
Fehr, E. & Gächter, S. 2002 Altruistic punishment in humans. Nature 415, 137-140.
(DOI:10.1038/415137a)
458
459
5
Boyd, R., Gintis, H., Bowles, S. & Richerson, P. J. 2003 The evolution of altruistic
punishment. Proc Natl Acad Sci USA 100, 3531-3535. (DOI:10.1073/pnas.0630443100)
460
461
6
Fehr, E. & Fischbacher, U. 2004 Third-party punishment and social norms. Evol Hum
Behav 25, 63-87. (DOI:10.1016/S1090-5138(04)00005-4)
462
463
464
7
Henrich, J., McElreath, R., Barr, A., Ensminger, J., Barrett, C., Bolyanatz, A., Cardenas, J.
C., Gurven, M., Gwako, E., Henrich, N. et al. 2006 Costly punishment across human
societies. Science 312, 1767-70. (DOI:10.1126/science.1127333)
465
466
467
8
West, S. A., El Mouden, C. & Gardner, A. 2011 Sixteen common misconceptions about
the evolution of cooperation in humans. Evol Hum Behav 32, 231-262.
(DOI:10.1016/j.evolhumbehav.2010.08.001)
468
469
9
Burnham, T. C. & Johnson, D. D. P. 2005 The biological and evolutionary Logic of
human cooperation. Analyse and Kritik 27, 113-135.
470
471
10
McCullough, M. E., Kurzban, R. & Tabak, B. A. In press. Cognitive systems for revenge
and forgiveness. Behav Brain Sci
472
473
474
11
Hagen, E. H. & Hammerstein, P. R. 2006 Game theory and human evolution: A critique
of some recent interpretations of experimental games. Theor Popul Biol 69, 339-348.
(DOI:10.1016/j.tpb.2005.09.005)
475
476
477
478
12
Kümmerli, R., Burton-Chellew, M. N., Ross-Gillespie, A. & West, S. A. 2010 Resistance
to extreme strategies, rather than prosocial preferences, can explain human cooperation in
public goods games. Proc Natl Acad Sci USA 107, 10125-30.
(DOI:10.1073/pnas.1000829107)
479
480
13
Krasnow M. M., Cosmides L., Pedersen E. J., & Tooby J. 2012 What Are Punishment and
Reputation for? PLoS ONE 7: e45662. doi:10.1371/journal.pone.0045662
Altruistic punishment: A closer look 22
481
482
483
14
McCullough M. E., Pedersen, E. J., Schroder, J. M., Tabak, B. A., & Carver, C. S. 2012
Harsh childhood environmental characteristics predict exploitation and retaliation in
humans. Proc R Soc B 20122104. doi: 10.1098/rspb.2012.2104
484
485
15
Carpenter, J. P. & Matthews, P. H. In press. Norm Enforcement: Anger, indignation, or
reciprocity? J Eur Econ Assoc
486
487
16
Orne, M. 1962 On the social psychology of the psychological experiment: With particular
reference to demand characteristics and their implications. Am Psychol 17, 776-783.
488
489
490
491
492
17
Dana, J., Weber, R. A. & Kuang, J. 2007 Exploiting Moral Wiggle Room: Experiments
Demonstrating an Illusory Preference for Fairness. Econ Theor 33, 67-80.
18
Nelissen, R. M. A. 2008 The price you pay: cost-dependent reputation effects of altruistic
punishment. Evol Hum Behav 29, 242-248.
493
494
19
Johnstone, R. A. & Bshary, R. 2004 The evolution of spiteful behavior. Proc R Soc B 271,
1917-1922.
495
496
20
Sell, A, Tooby, J. & Cosmides, L. 2009 Formidability and the logic of human anger. Proc
Natl Acad Sci USA 106, 15073-15078.
497
498
21
Lieberman, D., & Linke, L. 2007 The effect of social category on third party punishment.
Evol Psychol 5, 289-305.
499
500
22
Kurzban, R., DeScioli, P. & O’Brien, E. 2007 Audience effects on moralistic punishment.
Evol Hum Behav 28, 75-84. (DOI:10.1016/j.evolhumbehav.2006.06.001)
501
502
23
Bolton, G. E., & Zwick, R. (1995). Anonymity versus punishment in ultimatum
bargaining. Games and Economic Behavior 10, 95-121.
503
504
505
24
Selten, R. 1967 Die Strategiemethode zur Erforschung des eingeschränkt rationalen
Verhaltens im Rahmen eines Oligopolexperiments. In Beiträge zur experimentellen
Wirtschaftsforschung, (ed H. Sauermann), pp. 136-168.
506
507
508
25
Weber, S. J. & Cook, T. D. 1972 Subject effects in laboratory research: An examination of
subject roles, demand characteristics, and valid inference. Psychol Bul 77, 273-295.
(DOI:10.1037/h0032351)
509
510
511
26
Almenberg J., Dreber A., Apicella C. L., & Rand D.G. 2011 Third Party Reward and
Punishment: Group Size, Efficiency and Public Goods, in Psychology of Punishment
Nova Science Publishers. Eds. NM Palmetti et al.
512
513
27
Wilson, T. D. & Gilbert, D. T. 2005 Affective forecasting: Knowing what to want. Curr
Dir Psychol Sci 14, 131-134.
Altruistic punishment: A closer look 23
514
515
28
Cook, K. S. & Yamagishi, T. 2008 A defense of deception on scientific grounds. Soc
Psychol Q 71, 215-221.
516
517
29
Kawakami, K., Dunn, E., Karmali, F. & Dovidio, J. 2009 Mispredicting affective and
behavioral responses to racism. Science 323, 276-278. (DOI:10.1126/science.1164951)
518
519
30
Hareli, S. & Weiner, B. 2002 Dislike and envy as antecedents of pleasure at another’s
misfortune. Motiv Emot 26, 257-277.
520
521
31
Reuben, E. & van Winden, F. 2008 Social ties and coordination on negative reciprocity:
The role of affect. J Public Econ 92, 34-53. (DOI:10.1016/j.jpubeco.2007.04.012)
522
523
32
Aronson, E., Ellsworth, P., Carlsmith, J. & Gonzales, M. 1990 Methods of research in
social psychology. New York: McGraw-Hill.
524
33
Camerer, C. 2003 Behavioral game theory. New York: Princeton University Press.
525
526
527
34
Ernest-Jones, M., Nettle, D. & Bateson, M. 2011 Effects of eye images on everyday
cooperative behavior: A field experiment. Evol Hum Behav 32, 172-178.
(DOI:10.1016/j.evolhumbehav.2010.10.006)
528
529
530
531
35
Petersen, M. B., Sell, A., Tooby, J. & Cosmides, L. 2010 Evolutionary psychology and
criminal justice: A recalibrational theory of punishment and reconciliation. In Human
Morality and Sociality: Evolutionary and comparative perspectives (ed H. Høgh-Oleson),
pp. 72-131. New York: Palgrave MacMillan.
532
533
534
36
Batson, C. D., Kennedy, C. L., Nord, L., Stocks, E. L., Fleming, D. A., Marzette, C. M.,
Lishner, D. A., Hayes, R. E., Kolchinsky, L. M. & Zerger, T. 2007 Anger at unfairness: Is
it moral outrage? Eur J Soc Psychol 1285, 1272-1285. (DOI:10.1002/ejsp)
535
536
37
Zizzo, D. & Oswald, A. 2001 Are people willing to pay to reduce others’ incomes?
Annales d’Economie et de Statistique 63-64, 39-65.
537
538
38
Price, M. E., Cosmides, L. & Tooby, J. 2002. Punitive sentiment as an anti-free
rider psychological device. Evol Hum Behav 23, 203-231.
539
540
39
Phillips, S., Cooney, M., Carr, T. & Frady, B. 2005 Aiding peace, abetting violence: Third
parties and the management of conflict. Am Sociol Rev 70, 334-354.
541
542
40
Brandts, J. & Charness, G. 2011 The strategy versus the direct-response method: a first
survey of experimental comparisons Exp Econ 14, 375-398.
543
544
41
Gintis, H. 2000 Strong reciprocity and human sociality. J Theor Biol 206, 169-179.
(DOI:10.1006/jtbi.2000.2111)
Altruistic punishment: A closer look 24
545
546
547
42
Delton, A. W., Krasnow, M. M., Cosmides, L. & Tooby, J. 2011 Evolution of direct
reciprocity under uncertainty can explain human generosity in one-shot encounters. Proc
Natl Acad Sci USA (DOI:10.1073/pnas.1102131108)
548
549
43
Raihani, N. J., Grutter, A. S. & Bshary, R. 2010 Punishers benefit from third-party
punishment in fish. Science 327, 171. (DOI:10.1126/science.1183068)
550
551
552
44
Smith, J. E., Van Horn, R. C., Powning, K. S., Cole, A. R., Graham, K. E., Memenis, S.
K. & Holekamp, K. E. 2010 Evolutionary forces favoring intragroup coalitions among
spotted hyenas and other animals. Behav Ecol 21, 284-303. (DOI:10.1093/beheco/arp181)
553
Altruistic punishment: A closer look 25
554
Figure Captions
555
556
557
Figure 1. Game structure
558
(a.) Round 1. All players started with $5. Subject was either Recipient or 3rd Party. The Dictator
559
(a fictitious player whose “decisions” were determined by computer script) either took $0 (fair
560
conditions; unbolded) or $4 (unfair conditions; bolded) from Recipient. (b.) Round 2. Players
561
started with $5. Subject was Dictator; previous “Dictator” was Recipient; 3rd party was excluded.
562
Subject was allowed to give any portion of $5, do nothing, or pay a 1:4 cost to deduct money
563
from Recipient. Money deducted from Recipient in Round 2 was “burned”—it was not gained by
564
Dictators as income.
565
Altruistic punishment: A closer look 26
566
567
Figure 2. Punishment/reward distributions (Experiment 1)
568
Amount of money (in $) the subject punished (negative values) or rewarded (positive values) the
569
Round 1 Dictator (N = 270). Values are in terms of the effect on the Round 1 Dictator, not the
570
cost to the subject (cost-to-punish ratio = 1:4; cost-to-reward ratio = 1:1).
571
Altruistic punishment: A closer look 27
572
573
Figure 3. Self-reported anger (Experiments 1, 2a and 2b)
574
Self-reported anger (scale: 0-5) toward Dictator following Round 1, controlling for envy (N =
575
943). Error bars = +/- 1 SE.
576
Altruistic punishment: A closer look 28
577
578
Figure 4. Punishment ≥ $4 (Experiments 1, 2a and 2b)
579
Proportion of subjects in unfair conditions that punished (Experiment 1) or reported they would
580
punish (Experiments 2a and 2b) ≥ $4 (N = 597). Error bars = +/- 1 SE.
581
Altruistic punishment: A closer look 29
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
Electronic Supplementary Material
S1. Supporting Methods and Results
S1.1 Random assignment to experimental condition and cell sizes
Experiment 1: The condition in which witnesses of unfairness started with $9 was significantly
smaller than the other conditions because we stopped enrolling subjects in this condition to
increase power for statistical analyses involving the four cells of the 2 (Target: Self, Other) x 2
(Treatment: Fair, Unfair) design. This decision was made prior to any analysis of the data from
this condition and was based solely on an attempt to maximize the number of subjects in each of
the other four cells, given the rate at which we managed to recruit subjects into the study. All
other cell size differences are due to the random nature of the assignments.
Experiments 2a and 2b: Cell size differences are due to random assignment. The discrepancy in
sample sizes between punishment/rewarding analyses and emotion analyses is a result of our
adding the emotion questions halfway through data collection.
S1.2 Data excluded from Experiment 1 analyses based on debriefing responses
Data from 26 subjects (12 female; Age: M = 18.85, SD = 1.62) were excluded from all analyses,
figures, and tables (including ESM) because they expressed scepticism during debriefing (see
ESM Appendix for debriefing script) that they had been interacting with other people. Decisions
to exclude individual subjects were made without knowledge of their experimental data. The
number of subjects excluded did not vary by condition, χ2 (4, N = 341) = .459, p = .977,
suggesting that none of the conditions induced greater scepticism than any other.
Additionally, we re-ran all of the analyses with flagged subjects included and found that the
results did not change qualitatively in any way—all of the significant relationships presented in
the main text remained significant when flagged subjects were added to the analyses.
S1.3 Lexical decision task measure of implicit anger following round 1 (Experiment 1):
Method and results
In addition to the self-report anger data reported in the main paper for Experiment 1, we
collected data from a lexical decision task (LDT) following Round 1: Subjects decided, as
quickly as possible, whether a string of letters was a word or a non-word; reaction times to 15
hostility-related words (e.g., “angry,” “kill”) in the LDT serve as an implicit measure of anger
(with faster reaction times indicating more anger; see refs 34,35).
A 2 (Target: Self, Other) x 2 (Treatment: Fair, Unfair) ANOVA revealed a significant
target*treatment interaction for LDT anger following Round 1, F (1, 261) = 4.66, p = .032,
partial 2 = .02 (see Fig. S1). Main effects were not significant. The reaction times to hostilityrelated words for subjects who witnessed another person being treated unfairly (natural logtransformed ms, M = 6.70, SD = 0.28, N = 62) did not differ from those who witnessed another
being treated fairly (M = 6.66, SD = 0.27, N = 80), p = .416, partial 2 = .00. However, subjects
Altruistic punishment: A closer look 30
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
who were treated unfairly (M = 6.58, SD = 0.29, N = 59) had faster reaction times than did
subjects who were treated fairly (M = 6.69, SD = 0.32, N = 64), p = .029, partial 2 = .02,
suggesting greater anger among those who had been treated unfairly.
Reaction times to neutral words are typically controlled for in analysing LDT data [34,35].
However, we did not control for reaction times to neutral words in our analysis because (a)
reaction times to neutral words did not differ across conditions, F (3, 261) = 1.54, p = .205,
partial 2 = .02, and (b) the inclusion of reaction times to neutral words as a covariate eliminated
the significant interaction. Because statistical control of a non-significant covariate decreases
power by subtracting degrees of freedom from the error term while not removing adequate sums
of squares for error [36], we dropped it from our analysis.
S1.4 Fairness/moral wrongness of the round 1 dictator’s behavior (Experiment 1):
Witnesses and receivers view the transgression identically.
A possible explanation for the differences between the emotional and behavioral responses of
recipients and witnesses of unfairness is that recipients paid more attention to the Dictator’s
unfair behavior. However, the target*treatment interaction was not significant for perceived
fairness (F (1, 266) = 1.73, p = .190) or moral wrongness (F (1, 266) = 1.33, p = .249) of the
Dictator’s decision (see Fig. S2). There were, however, significant effects of treatment: Unfair
Dictators’ behavior was perceived as more unfair, F (1, 266) = 354.13, p < .001, and morally
wrong, F (1, 266) = 182.03, p < .001, than fair Dictators’ behavior. Thus, subjects in both unfair
Dictator conditions clearly judged that a transgression had occurred, but they were only angered
and motivated to punish when they personally were targets of unfairness—consistent with
previous findings [27].
S1.5 Effects of round 1 dictator behavior on self-reported anger toward the round 1
dictator, envy uncontrolled
When envy was not controlled, witnesses of unfairness did appear to get angry at unfair
Dictators, relative to witnesses’ anger in response to fair dictators: A 2 (Target: Self, Other) x 2
(Treatment: Fair, Unfair) ANOVA revealed significant main effects of both target, F (1, 266) =
10.81, p = .001), and treatment, F (1, 266) = 85.09, p < .001) for anger, and a significant
target*treatment interaction for anger, F (1, 266) = 12.44, p < .001). Witnesses of unfairness (M
= .813, SE = .110, N = 65) reported more anger than did witnesses of fairness (M = .198, SE =
.098, N = 80), p < .001, partial 2 = .06, and recipients of unfairness (M = 1.55, SE = .115, N =
61) reported more anger than recipients of fairness (M = .172, SE = .108, N = 64), p < .001,
partial 2 = .22. However, as discussed in the main text, witnesses of unfairness reported no
more anger than witnesses of fairness when envy was statistically controlled. Thus, witnesses’
anger at unfair Dictators can be attributed to envy rather than to moralistic anger.
Altruistic punishment: A closer look 31
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
S2. Supplementary Notes
687
688
Below we address some possible concerns with the use of deception in our method, and why
these concerns do not affect the validity of our results.
689
690
691
692
693
694
695
696
697
698
Authenticity of debriefing responses. It could be argued that our debriefing process was
inadequate to identify all cases of scepticism of our deception among our participants because
participants were being compensated both with money and partial course credit; that is, perhaps
they were coerced into responding that they believed the deception because they felt they would
not receive compensation if they responded that they did not. This is highly unlikely. Participants
read and sign consent forms prior to participation in any experiment at the University of
Miami—as is the case with any IRB-approved study involving human subjects in the United
States—that explicitly state that participants cannot be denied compensation based on their
responses; actual amount of money earned may vary based on decisions in experimental tasks,
but course credit is required to be granted once a consent form is signed.
699
700
701
702
703
704
705
706
707
708
Possible contamination of the subject pool. If participants are regularly subjected to experiments
in which deception is used, it could be argued that perhaps they come to expect to be deceived in
future experiments in which they participate [37], and consequently provide responses that do
not accurately reflect natural behavior. This is unlikely to be a significant issue in the psychology
department’s subject pool at the University of Miami for several reasons. First, our subject pool
consists only of undergraduates currently enrolled in a specific introductory psychology course;
once they have completed the course they are no longer eligible to participate in experiments.
Second, members of the subject pool only participate in five total hours of experiments, which
translates to participation in approximately two to four experiments. Third, many of the
experiments that members of our subject pool participate in do not involve deception. Thus, it is
S2.1 The use of deception
Though experimental economists typically resist the use of deception in experiments, its use here
is justified: there was no practical way we could have obtained a sufficiently large sample size
without deception. We sought to gather data from at least 50 subjects per cell of our main 2x2
design in order to have adequate statistical power. Without the use of deception, we would have
had to rely on a minimum of 100 subjects, in the role of the Round 1 Dictator, to take exactly
$4.00 from the Recipient (unfair conditions) and at least 100 more subjects to take exactly $0.00
from the Recipient (fair conditions). Assume that Round 1 Dictators would be equally likely to
make one of the 11 possible choices on the give-take continuum, from giving $5 to taking $5, in
whole-dollar increments.* We would need N = 1,100 (100 subjects per choice*11 choices) to
achieve the same statistical power without the use of deception as with 200 subjects in our actual
paradigm. Considering that our interest lies entirely with how subjects responded to Dictator
actions, and not the actions themselves, such a design would be wasteful of subjects’ time
(thereby altering the ratio of benefits to risks of the experiment, and thus its ethicality) and
resources.
*
It is probably unrealistic to assume that subjects would be equally likely to choose any one of the 11 options, but
this oversimplification is more than adequate to demonstrate the present point. Also, in our actual design with
predetermined Round 1 Dictator actions, subjects in Round 2 were allowed to reward or punish any amount in this
range, down to the nearest cent: they made their decision by typing in a precise amount.
Altruistic punishment: A closer look 32
709
710
unlikely that participants in our subject pool have become accustomed to participating in
experiments involving deception to the point where their behavior is altered by its expectation.
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
Running subjects in individual sessions. The decision to run subjects either individually or in
groups presented a trade-off between experimental control and a more realistic setting. We
decided to run subjects individually as the benefits of doing so greatly outweighed the potential
costs of the less realistic setting. First, though subjects may have been more sceptical that they
were actually interacting with others since there were no other people in the room, the scenario
we presented was very plausible: subjects were told that two other participants we located
elsewhere in the psychology building (a five-story building with lots of foot traffic). Second,
running a third-party punishment study with two other people in the room introduces variables
relative to their physical appearances including: sex, clothing style, coalitional markings (e.g.,
fraternity symbols, sports team logos), perceived formidability, friendliness, ethnicity, etc. Even
though subjects’ identities are anonymous in the game and subjects are separated with partitions
during play, subjects will inevitably interact sometime during the experimental session. Thus,
running subjects individually under the perfectly plausible guise of their interacting with other
people in another room avoids the potential experimental noise that can be introduced by these
other factors.
Concern has previously been raised that running subjects in isolated rooms may potentially
increase scepticism that interactions are legitimate, thereby influencing behavioural results.
Frohlich, et al. [38], empirically tested whether running a dictator game with isolated subjects led
to different results as compared with running the game with subjects in the same room. In their
discussion, Frolich, et al. state, “Contrary to the hypothesis, most measures of subjects’
uncertainties were not significantly different as a result of the change in the number of rooms.
Indeed, only doubt that the money left in the envelope would be given to the paired other was
significantly reduced in the One Room experiments.” We note that this effect was only
statistically significant by one-tailed test (p = .038), whereas the effects on other, similar,
dependent variables were not statistically significant (even by one-tailed test: Did not view
experiment as a game; Not sure that description was accurate; and, most importantly, Not sure
that there were real people paired.)
S2.2 Inconsistency with previous experimental results
It could be argued that the reason we found no altruistic punishment in Experiment 1 is because
we changed multiple aspects of the standard design of the third-party punishment game at once.
Indeed, we made several changes in an attempt to minimize or eliminate (1) audience effects, (2)
experimental demand, (3) affective forecasting, and (4) potential extraneous variables introduced
by brief interactions with other participants. However, there are several reasons this argument is
not compelling. First, we observed a significant amount of second-party punishment so, clearly,
there was not a general suppression of punishment in our design. Second, the cost of punishment
in our design (1:4) was less expensive than the 1:3 cost-to-punishment ratio typically used in the
third-party punishment game [e.g., 6]. Indeed, our design should have encouraged punishment
relative to previous designs. Third, our purpose here was to test for the presence of altruistic
third-party punishment in a well-controlled experimental design, not to iteratively make changes
to a design that contained several features that likely produced artefactual results—to achieve
this well-controlled design, all of our changes needed to be made simultaneously.
Altruistic punishment: A closer look 33
755
756
References
757
758
34
Ayduk, O., Mischel, W. & Downey, G. 2002 Attentional mechanisms linking rejection to
hostile reactivity: The role of “hot” versus “cool” focus. Psychol Sci 13, 443-448.
759
760
761
35
Gollwitzer, M. & Denzler, M. 2009 What makes revenge sweet: Seeing the offender
suffer or delivering a message? J Exp Soc Psychol 45, 840-844.
(DOI:10.1016/j.jesp.2009.03.001)
762
763
36
Tabachnick, B. G. & Fidell, L. S. 1989 Using Multivariate Statistics. New York: Harper
& Row.
764
765
37
Hertwig, R. & Ortmann, A. 2001 Experimental practices in economics: A methodological
challenge for psychologists? Behav Brain Sci 24, 383-451.
766
767
768
38
Frohlich, N., Oppenheimer, J., & Bernard Moore, J. 2001. Some doubts about measuring
self-interest using dictator experiments: the costs of anonymity. J Econ Behav Organ 46,
271-290.
769
770
Altruistic punishment: A closer look 34
771
S3. Supplementary Figures and Captions
772
773
774
775
Figure S1.
Reaction times to hostility-related words in the lexical decision task.
Altruistic punishment: A closer look 35
776
777
778
779
780
Figure S2.
Ratings of the fairness and moral wrongness of the Round 1 Dictator’s decision. Scale from 1
(Not at all fair/morally wrong) to 9 (Totally fair/morally wrong).
Altruistic punishment: A closer look 36
781
782
783
784
785
786
S4. Supplementary Tables
Fair –
Recipient
Mean SD
Unfair Recipient
Mean SD
Fair Witness
Mean SD
Unfair Witness
($5)
Mean SD
Unfair Witness
($9)
Mean SD
Variable
Overall
Mean SD
$ Punished/Rewarded*
-0.18
1.46
0.17
0.68
-1.12
2.08
0.34
1.05
-0.24
1.37
-0.22
1.42
LDT RT**
6.65
0.29
6.69
0.32
6.58
0.29
6.66
0.28
6.70
0.28
6.61
0.31
Moral wrongness
3.34
2.67
1.31
1.19
4.48
2.49
1.61
1.79
5.18
2.30
5.13
2.31
Fairness
6.07
2.95
8.61
1.08
3.64
2.40
8.48
1.24
4.26
2.43
4.07
2.00
Anger
0.64
1.01
0.14
0.44
1.54
1.29
0.17
0.46
0.84
1.07
0.67
0.88
Envy
0.85
1.17
0.27
0.61
1.47
1.40
0.36
0.76
1.48
1.27
0.80
1.02
* Negative values indicate punishment
** Response time to hostility-related words in lexical decision task, ln-transformed ms
Table S1. Variable summary statistics – Experiment 1.
Altruistic punishment: A closer look 37
787
788
789
790
Fair –
Recipient
Mean
SD
Unfair Recipient
Mean SD
Fair Witness
Mean SD
Unfair Witness
Mean SD
Variable
Overall
Mean
SD
$ Punished/Rewarded*
-0.23
2.08
0.05
1.69
-0.59
2.48
0.17
1.51
-0.45
2.28
Moral wrongness
3.08
3.23
1.04
2.13
5.32
3.04
0.79
1.81
4.72
2.68
Fairness
5.28
3.38
7.85
2.21
2.64
2.57
7.74
2.01
3.37
2.58
Anger
1.12
1.32
0.16
0.52
2.25
1.20
0.28
0.64
1.63
1.13
Envy
0.98
1.25
0.46
0.92
1.85
1.39
0.34
0.69
1.12
1.16
* Negative values indicate punishment
Table S2. Variable summary statistics – Experiment 2a.
Altruistic punishment: A closer look 38
Variable
791
792
793
794
Overall
Mean
SD
Fair –
Recipient
Mean
SD
Unfair Recipient
Mean SD
Fair Witness
Mean SD
Unfair Witness
Mean SD
$ Punished/Rewarded*
0.25
1.83
0.42
1.71
-0.04
1.99
0.58
1.74
0.06
1.79
Moral wrongness
2.87
2.88
0.75
1.79
4.56
2.41
1.00
1.92
4.97
2.32
Fairness
5.40
3.25
7.92
2.17
3.24
2.24
7.70
2.34
3.02
2.30
Anger
0.90
1.15
0.15
0.51
1.60
1.21
0.24
0.62
1.51
1.11
Envy
0.92
1.26
0.39
0.91
1.68
1.28
0.21
0.53
1.27
1.42
* Negative values indicate punishment
Table S3. Variable summary statistics – Experiment 2b.
Altruistic punishment: A closer look 39
795
796
797
798
799
800
801
Appendix
Debriefing script followed by experimenter:
(1) Tell participant that the study is over. Ask if he/she has any questions. If the questions are
about hypotheses or the deceptive elements of the experiment, explain that you will address
those specific questions in just a few moments.
802
803
804
(2) Ask whether entire the experiment was clear in its overall purpose, and whether all aspects of
the procedure made sense. Was there anything that the participant found confusing unclear?
“Were you, at any point, unsure about what we were asking you to do?”
805
806
(3) We would find it very helpful to hear about any of your personal feelings and reactions to the
experiment. Probe about what made person feel the way they felt.
807
808
809
810
(4) Today’s experiment was designed to help us test some very specific hypotheses about human
behavior. Do you have any idea what those hypotheses were? If you had to guess, what would
you say were the hypotheses we were testing today? We would like to know as many of your
guesses about our hypotheses as you can come up with.
811
(5) Ask whether participant found any aspect of the procedure odd, upsetting or disturbing.
812
813
814
815
816
(6) Did you wonder at any point whether there was more than meets the eye to any of the
procedures that we had you complete today? That is, do you think that there might have been any
information that I held back from explaining to you about the experiment until now? Ask
participant to say more about their suspicions, and to elaborate on their questions about the
procedure.
817
818
(7) Ask how participant thinks (the suspicions he/she mentioned) affected his or her behavior
during the study.
819
820
821
822
823
824
The experimenter then fully explained the nature of the deception to the participant and why it
was a necessary part of the experiment. The experimenter also discussed the aims of the research
with the participant, and answered any questions the participant had about their experience.
Lastly, the experimenter and participant discussed possible ways for the participant to talk about
the experiment with his or her peers in a manner that, while honest, would not spoil the
deception for others in the subject pool.
825
826
827
828
829
830
Our experimenters reported anecdotally that participants generally found the experiment fun and
interesting, and were typically fascinated with the study design after the deception was revealed
in debriefing. Furthermore, the experimenters’ discussion of how to talk to other about the study
helped to, in a sense, bring participants in as collaborators on the research such that (a) they
would not feel that they had been taken advantage of, and (b) they would not spoil the study for
others in the subject pool.