
Experimenter Demand Effects in Economic Experiments
Daniel John Zizzo*
School of Economics
University of East Anglia
Norwich NR4 7TJ
United Kingdom
[email protected]
July 2008
Social Science Research Network Discussion Paper
www.ssrn.com
Abstract
Experimenter demand effects refer to changes in behavior by experimental
subjects due to cues about what constitutes appropriate behavior. We argue
that they can be either social or purely cognitive, and that, where they may
exist, it crucially matters how they relate to the true experimental objectives.
They are usually a potential problem only when they are positively correlated
with the true experimental objectives’ predictions, and we identify techniques
such as non-deceptive obfuscation to minimize this correlation. We discuss the
persuasiveness or otherwise of defenses that can be used against demand
effects criticisms when such correlation remains an issue.
Keywords: experimenter demand effects, experimental design, experimental
instructions, social desirability, social pressure, framing, methodology.
JEL Classification Codes: B41, C91, C92.
* Phone: +44-1603-593668; fax: +44-1603-456259. I wish to thank participants in a presentation at Bologna Forlì and Nick Bardsley, Dan Benjamin, Pablo Branas Garza, Gary Charness, John Duffy,
Dirk Engelmann, Paul Ferraro, Alexander Koch, Charlie Plott and Jen Shang for useful feedback
and references on the topic. The usual disclaimer applies. Partial financial support from the Nuffield
Foundation for what in part has acted as a scoping study towards experimental work on social
desirability is gratefully acknowledged.
1. Introduction
Experimenter demand effects (EDE for short) refer to changes in behavior by experimental
subjects due to cues about what constitutes appropriate behavior (behavior ‘demanded’ from them).
EDE have been connected, among other things, to Milgram's (1974) experiments on fictional electric shocks being delivered by experimental subjects under the direct pressure of an experimenter, to the Hawthorne factory experiments where greater productivity seemed to occur when workers were
the object of a sociological study (Gillespie, 1991), and to placebo effects in medicine where a
patient feels better simply by knowing that he or she has received a medicine that is meant to make
him or her feel better (Beecher, 1955). In the psychological questionnaire literature they are also
recognized as a source of distortion in responses provided (e.g., Paulhus, 1991; Lonnqvist et al.,
2007). Their potential relevance for experimental economics has been cogently highlighted by
Bardsley (2005, 2008), and is arguably more generally felt in journal refereeing activity where
experimental designs are criticized for falling afoul of EDE.1
It is clear that, insofar as (a) subjects work in a microeconomic system the rules of which are
defined by the experimenter (Smith, 1982) and (b) the experimenter has a position of authority over
the subjects in the laboratory, EDE are in principle a potential concern for economists designing
their own experiments. They can also be more generally seen as a threat to the interpretability, and
hence the internal validity, of economic experiments.
This methodological paper provides a general discussion of what to do about EDE. We ask when experimental economists should be worried (or more worried) about EDE as a possible confound, and, when they are a likely confound, what can be done about them. Our approach is pragmatic: while EDE often cannot be entirely ruled out, the question is
what can be done to minimize the relevance or plausibility of an EDE based criticism in the context
of a progressive experimental research paradigm.2 Crucial to this strategy, it will be argued, is the
recognition that EDE are related to the beliefs that subjects hold about the experimental objectives,
and that such beliefs may be linked in different ways with the true experimental objectives.
Section 2 briefly provides the conceptual framework to our investigation on EDE and reviews
a number of examples for consideration. Section 3 considers EDE in more detail and reviews a set
1. On occasion papers do explicitly try to defend themselves from this type of criticism (e.g., Ball et al., 2001, footnote 8; Benjamin et al., 2008, p. 25). A brief discussion of and warning against EDE is contained in Davis and Holt's (1993, pp. 26-27) textbook.
2. As the one defended in philosophy of science by Mayo (1996).
of examples related (or allegedly related) to EDE.3 Section 4 summarizes the empirical evidence
from section 3 and draws lessons on when we should in principle be worried about EDE. Section 5
considers what can be done in this case to minimize the risk of EDE, and section 6 discusses
defense strategies against EDE criticisms when EDE are still a possible issue. Section 7 concludes.
2. Conceptual Framework
EDE are about the relationship between the subject and the experimenter. The experimenter
provides the microeconomic system, in terms of environment and institutions, in which the subject makes decisions (Smith, 1982, 1994). In standard laboratory experiments the subject goes to the lab, sits
at a (computer or plain) desk and receives a set of instructions and possibly practice for the decision
task, with the expectation that payment will be a function of the decisions that he or she makes.4
Monetary payments that are a function of the experimental task, and other usual experimental tools
such as the lack of deception (Hertwig and Ortmann, 2001), are used to ensure a direct and salient
connection between decisions taken and desired monetary outcomes, and thereby to ensure the interpretability, and hence the internal validity, of the experiment. This is true even in
experiments trying to identify other regarding preferences (e.g., Camerer, 2003, and Sobel, 2005,
for reviews), insofar as the effect of these preferences can only be isolated where set against the
benchmark of predictions under pure self interest.
Because the experimenter provides the microeconomic system the subject works with in the
laboratory, such as the experimental instructions, it is unavoidable that the experimenter is in a
position of authority relative to subjects. The experimenter has both legitimacy and expertise, which
have been recognized as sources of authority (French and Raven, 1959).5 That research is conducted
by Faculty members whereas ordinarily subjects are students, and that research is conducted in the
laboratory which is under the physical control of the experimenter, compound the vertical nature of
the relationship between experimenter and subjects. So do more avoidable factors present in some
experiments but not others, such as the physical presence of the experimenter (where noticeable) or
the use of a sample of one’s own students to run experiments. Figure 1 provides a stylized
3. A number of examples are discussed in this paper. As it is not meant to be a survey, and the amount of experimental research that could in principle be affected by EDE is immense, they are, of course, provided without any claim of exhaustiveness.
4. Payment may, of course, also be a function of luck and of the decisions of other subjects, depending on the type of experiment.
5. Blass (1999) found that subjects weigh legitimacy and expertise as the two most important sources of the obedience to authority found in Milgram's obedience experiments (which are discussed below under Example 1).
representation of how the subject relates to the experimenter and (in interactive experiments) to
peers, i.e. to other subjects in the experiment. (Insert Figure 1 about here.)
Experiments generate data, and, in order for the data to be externally generated - as opposed to being equivalent to a computer simulation, the data of which are created by the researcher -, the decision making problem faced by subjects is by definition incomplete. By this I mean that there is a
decision or set of decisions that is in the hands of subjects and which is not fully defined by the
experimenter or by peers. Subjects then try to make sense of the unfamiliar and incompletely
defined experimental environment based on the instructions, cues and feedback they receive. For
example, valuations may be influenced by the menu of available options (e.g., Stewart et al., 2003)
and anchored to prior information (e.g., Ariely et al., 2003), and framing cues may be used to
trigger behavioral schemata that, by similarity, may be applied to the decision problem at hand (e.g.,
Gale et al., 1995; Markman and Moreau, 2001; Zizzo and Tan, 2007). In interactive experiments
subjects form beliefs and choose actions with respect to peers, and may receive information about others' actions as well as social pressure to play in a socially acceptable way. Pressure may arise simply from learning what the other players have chosen (e.g.,
Breitmoser et al., 2007), or in some experiments explicitly through the receipt of advice on what to
do (e.g., Schotter and Sopher, 2006; Iyengar and Schotter, 2008). If these horizontal effects due to
subject-peer interaction matter, we may expect them to matter potentially more in the context of the
vertical nature of the experimenter – subject relationship. As discussed earlier, the experimenter has
both expertise and legitimacy, and therefore authority.
As the experimenter knows best about the experiment, we may expect the subject to be
especially sensitive to his or her instructions and cues to make the decision problem more complete
(for example, to know which ‘real world’ behavioral schema is most suited for the task at hand).
This sensitivity to the cues provided may work through implicit (i.e., unconscious) cognitive
mechanisms: there is no reason for subjects to be explicitly aware of it, a point we shall return to in
section 6. These purely cognitive EDE simply follow from the position of expertise of the
experimenter, and do not need any exercise of authority as such by the experimenter.
Stronger forms of EDE, in addition to this cognitive dimension, benefit from the
perceived social pressure that the experimenter, as an authority, explicitly or implicitly puts on a
subject through instructions and cues. The subject forms beliefs about the experiment objectives and
his or her actions can be played out in the direction that he or she believes to be congruent to such
objectives. These EDE have been previously conceptualized by psychologists (e.g., Orne, 1962,
1973).6 We label these stronger potential confounds as social EDE, to imply that, in addition to the
cognitive dimension, there is a social pressure dimension to them, and the two may of course
interact. Social EDE also need not be conscious, though they may well be so.
Before discussing examples of social and purely cognitive EDE, a qualification is in order.
While it is easy to identify paradigmatic examples of social EDE and of purely cognitive EDE - and
we shall do so shortly -, the distinction does have a gray area of cases in the middle where the
classification is arguably debatable. That being said, nothing important in this paper turns on a
specific example being classified as a social EDE or as a purely cognitive EDE.
3. Considering EDE in More Detail
3.1 Social EDE
Social EDE can have a number of sources, such as social conformism of the kind also found
in relation to horizontal social pressure (e.g., Jones, 1984; Asch, 1956); desire for respect (e.g.,
Ellingsen and Johannesson, 2007, 2008), which may be stronger in relation to an authority figure
than in relation to horizontal relationships with unknown strangers, though not necessarily so if the
other subjects are known or engaged in repeated social interaction; or straightforward obedience-to-authority attitudes (e.g., Blass, 1991). Another possible source is concern for the
experimenter’s welfare, but the evidence on this is mixed.7
Whatever their source, social EDE imply that, if subjects are told or given a hint of what to
do, we may expect them to be more likely to do it. Depending on the true experimental objectives,
this need not necessarily be a problem (section 4), but of course it could be.
Example 1. Perhaps the strongest social EDE manipulation present in the existing
experimental literature is the one in Milgram’s (1963, 1974) obedience experiments. A subject was
told he or she was the 'teacher' of another subject (in reality, an actor), who had to answer a set of
questions and was strapped to an electric chair. Each time an incorrect answer was given, the
6. Orne (1973, p. 163) explicitly defines EDE as being related to the subject seeing "as his task to ascertain the true purpose of the experiment and respond in a manner which will support the hypotheses being tested."
7. Frank (1998) has a clever experiment where payoffs not realized by subjects in an ultimatum game are literally burnt in front of them, instead of being kept by the experimenter. He finds this makes no difference to subjects' behavior. Harrison and Johnson (2006) note that the identity of the recipient of the money left unspent in a dictator game experiment variant - whether it is not specified, an un-named charity, a third player in the room, or no one specific - mattered for dictator game allocation. However, their results can arguably be explained simply by other-regarding preference coefficients for contributing towards the third player or a charity.
subject was asked to press a button that was meant to produce increasing electric shocks (from 15 to
450 volts) and to which the actor reacted accordingly in pain (as if the shock were real). If the
subject hesitated, the experimenter insistently demanded that he or she continue. Over two thirds
of the subjects obeyed all the way to giving shocks of 450 volts. Milgram’s findings have been
replicated under a range of variants and of subject pools (e.g., Shanab and Yahya, 1977, 1978;
Blass, 1999). That being said, they represent an extreme case of social EDE at work in an
experiment where the effect of such social EDE was itself the objective of the experiment.
Example 2. The results of a study of the effect of different lighting conditions on worker
productivity at the Hawthorne Plant of the Western Electric Company in the 1920s and 1930s have
been interpreted as implying that workers put in more effort simply because they were being studied
(and independently of the lighting conditions). This interpretation, encoded by Mayo (1933), has
been popular in the sociological and management literature (e.g., Adair, 1984; Gillespie, 1991). It
has, however, come into question (e.g., Draper, 2006; Macefield, 2007). Alternative interpretations
based on learning and/or feedback effects are plausible (Parsons, 1974); a regression analysis of the
Hawthorne original data shows no support for a ‘Hawthorne effect’ interpretation (Jones, 1992);
and attempts that have been made to replicate the original results have been unsuccessful (Rice,
1982). The comparative weakness and ambiguity of the evidence imply that, notwithstanding its enduring textbook appeal, it can hardly be used to argue for social EDE being an all-pervasive
confound.8 Similarly, while a putative Hawthorne effect might be related to teacher effects in which
teachers’ expectations affect later performance of the students (Rosenthal and Jacobson, 1992), the
underlying causal mechanisms are unclear (for example, if a teacher believes a student is good, she
may behave differently towards that student, thus helping her to perform better).
To the extent however that such effects might exist, they operate in a well defined direction:
the subject has an obvious interpretation of what the authority’s (the employer, the teacher)
objective is – a better performance -, and there is an obvious way in which action can be taken to
facilitate such an objective, namely by putting in more effort.
Example 3. Bardsley (2008) draws an analogy between EDE and placebo effects in medicine.
Placebo effects occur when a patient’s medical condition improves for the very fact of knowing that
a medicine has been taken. Double blind trials controlling for placebo effects are standard
8. Richard Nisbett once called the Hawthorne effect "a glorified anecdote": "once you have got an anecdote, you can throw away the data" (cited in Kolata, 1998).
procedure when validating the effectiveness of new medical treatments. Draper (2006) comments
that, while there is clear evidence of placebo effects for some perceived variables such as pain,
more generally the evidence is not as strong as traditionally believed (Kienle and Kiene, 1997;
Hrobjartsson and Gotzsche, 2001). Insofar as there are any, placebo effects are characterized by
both a clear perception of what the authority wants (it wants the subject to get better) and by a clear
understanding of what action is required as a result (to feel better).
Example 4. Psychologists routinely measure the extent to which subjects’ responses to (not
incentivized) questionnaires are distorted by their tendency to respond in a 'socially desirable' way
(e.g., Stober, 1991; Crowne and Marlowe, 1964). One dimension of social desirability is about
impression management (Paulhus, 1991), as subjects try to put themselves in a positive light towards the experimenter by amplifying the good (e.g., kindness) and minimizing the bad (e.g., self-interest). To mention a specific example, Fleming et al. (2007) found that a social desirability bias
extended to stated judgments of risk of socially controversial technologies: groups who scored
highly on social desirability measures judged socially contentious technologies (such as genetically
modified insulin) riskier than groups who had low social desirability scores; conversely no
difference was found between groups for non-contentious technologies (such as replacement heart
valves). Obviously this kind of distortion may be of concern in economic experiments, although economic experiments may be better insulated than the psychological counterparts in which social desirability has been measured and found relevant, since the latter lack incentives and behavioral responses. Also, while social desirability measures sometimes appear correlated with the other psychological
instruments - thus suggesting a distortion -, other times they are not, or appear to have only a small
influence (see the survey in King and Bruner, 2000). Crucially, the possibility of impression
management lies in the belief that the subject has about behavior that is socially desirable to the
experimenter, and in his or her views on how to change responses in the socially desired direction.
Example 5. In experimental economics a classic example of social EDE is Binmore et al.’s
(1985) bargaining experiment, where results more in alignment with self-interest were obtained
than in Guth et al. (1982), but instructions were specifically given asking subjects to be self-interested: "How do we want you to play? YOU WILL BE DOING US A FAVOUR IF YOU
SIMPLY SET OUT TO MAXIMIZE YOUR WINNINGS.” Thaler (1988) contains a discussion of
the corresponding loss of experimental control given that Binmore et al.’s (1985) objective was to
show that subjects behaved according to the self-interest prediction.
Example 6. Croson and Marks (2001) considered the effect of exogenous recommended
contributions in a threshold public good game. They found statistically significant behavioral
changes as a result of the recommendations, although the likelihood of efficient provision was
increased as the result of the recommendations only when valuations of the public good were
heterogeneous. Shang and Croson (2008) describe a field experiment run as a set of phone calls as
part of a fundraising campaign by a public broadcasting radio station, where in one treatment the caller mentioned the contribution of someone else ($75, $180 or $300) and then asked for a contribution
amount. A similar experiment was run with a mailing campaign (Shang and Croson, forthcoming),
and in both experiments the implicit recommendation provided had an impact on contributions.
Building on Cason and Sharma (2007), Duffy and Feltovich (2008) had subjects repeatedly play Chicken games after having received a 'recommendation by the computer program' on an action to take. Recommendations had an impact on behavior, and initially three out of four subjects chose the recommended action (out of the two available).9 When the advice was systematically bad or corresponded to non-equilibrium behavior, it had less of an impact than when it corresponded to a Nash equilibrium or to a payoff-improving correlated equilibrium.
In all these cases, the provision of an exogenous recommended contribution or action may
operate as a form of demand by the experimenter onto the subjects, in the context of a strategic
environment where such a demand effect produced by the experimenter is aligned with the potential
interest of players to coordinate and cooperate - though not always so in Duffy and Feltovich, and
it is interesting that in this case subjects learn to disregard the advice. There is an external validity
justification to the manipulation, since we would expect that in the real world recommendations
may indeed come from authorities (in the context of public good games) or from mediating third
parties in a position of authority (in the context of Chicken games). Directly comparing
recommendations by authorities with those by peers, for exactly the same treatment manipulation,
may be a way of providing an indicative estimate of the size of this effect.10
Example 7. Chou et al. (2008) ran a set of two-player guessing game experiments, with the standard rule that the winner would be the player whose guess is closest to ¾ of the average of the two players' guesses.
9. Cason and Sharma had an initial compliance rate of slightly higher than 80% in their baseline treatment with recommendation. Unlike Duffy and Feltovich, their instructions noted and stressed that following the recommendation, if followed by both players, was payoff-enhancing.
10. The estimate would be indicative because, even where the recommendation comes from a peer, it may become part of the cognitive understanding of how the subject should play the game in the experiment, so there may be a purely cognitive EDE at work anyway.
One of their manipulations was to introduce a strong hint on how to play the game, by writing down
(in bold characters, as a separate paragraph and with a figure on the side to stress the point) “Notice
how simple this is: the lower number will always win”. Chou et al. found that subjects largely
followed this advice. In another manipulation they had what they label a “battle protocol” in which
“your job is to choose how high to locate your troop on the hill, from 0 feet high to 100 feet high”
and “you win the battle if your chosen location is higher than your opponent’s”. Subjects largely
followed ‘the job’ they were given.
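As an aside on the hint itself, a short derivation (added here for illustration; it is not part of Chou et al.'s materials) confirms that, with two players and a target of ¾ of the average guess, the lower guess always wins. Writing the guesses as $a < b$ (both non-negative), the target and the distances from it are

\[ t = \frac{3}{4}\cdot\frac{a+b}{2} = \frac{3(a+b)}{8}, \qquad |a-t| = \frac{|5a-3b|}{8}, \qquad |b-t| = \frac{5b-3a}{8} > 0, \]

and $|b-t| - |a-t|$ equals $(b-a)$ if $5a \geq 3b$ and $(a+b)/4$ otherwise, both strictly positive, so the lower guess is always strictly closer to the target.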
Chou et al. interpret the findings as showing that the instructions enabled subjects to have a better game-theoretic understanding of the guessing game. There is another possible interpretation: there was a social EDE at work and subjects simply did what they were told to do. As such, Chou et al. may not actually be measuring the subjects' levels of reasoning, which, given the abstract nature of the game, is arguably the main reason why guessing games are of interest (e.g.,
Camerer, 2003).
Example 8. Branas-Garza (2007) investigated the effect, in dictator games, of having (in bold
characters and in an emphatic centered position) the following cue: “REMEMBER that he is in your
hands” (i.e., he relies on you). This led to an increase in dictator giving. Branas-Garza argues that
the frame increases the moral costs from not giving. However, he notes that the increase in giving
was greater in a classroom setting run by the professor (for extra credit points) than in a regular
double-blind lab version (for money), and interprets this as evidence that, the greater the authority
delivering the cue, the greater its effect.
While it may be difficult to draw clear lessons from comparing the classroom setting to the
double blind version,11 it is clear that even in the lab version a social EDE is likely to be present and
may be amplified by the subjects’ perception of the dictator game setting (a point we shall return to
in the next section). It is not clear whether the cue is effective because of social EDE, because of the
salience of moral norms, or because of other factors such as guilt aversion and trust responsiveness,
which would suggest that making the reliance of the recipient salient should increase the giving
(e.g., Battigalli and Dufwenberg, 2007; Bacharach et al., 2007).
Example 9. Cadsby et al. (2006) had subjects earn money by answering a questionnaire and
then, in one treatment, they 'required' subjects to pay 30% of their resulting income, which they
11. The use of classroom volunteers vs. proper lab volunteers, the different type of incentives and the possibility of peer effects all contribute to making the comparison difficult. We return to classroom experiments in section 5.
were ‘expected to indicate correctly’. In the baseline treatment no such experimenter demand was
placed and an 'invitation to gamble' frame was used instead. They found that, in the former frame,
truthful income reporting remained high even when the chance of tax fraud being discovered was as
low as 1%. This is a clear case of social EDE but one with perfect external validity mapping, given
that, in the real world, authorities also unequivocally demand tax payment.
3.2 Purely Cognitive EDE
While social EDE are normally either an explicit experimental manipulation or directly
triggered by such a manipulation, the lack of explicit or implicit instructions to the subject on how
to behave does not prevent the existence of weaker forms of EDE. This is true for at least two
reasons. First, as discussed in section 2, subjects try to make sense of the unfamiliar and
incompletely defined experimental environment based on the instructions, cues and feedback they
receive, and the experimenter is the most qualified expert from whom they can obtain cues about what the experiment is about and what they should do as a result.
Second, there is the Heisenberg principle type of argument that, by the very fact of drawing the
attention of subjects to the experimental variable of interest X, one is changing behavior in relation
to X. This second point is framed as a philosophical one but is actually an empirical one: that
human beings behave according to a Heisenberg type principle is an elegant conjecture but not
much more than that; it needs to be tested in specific cases and may turn out to have differential
validity in different settings. The first point needs to be taken into account but may be turned into an
advantage insofar as the experimental economist is interested in drawing inferences in relation to
specific contexts and frames; and task construal may be a matter of the real world as much as of the
experimental laboratory.
Example 10. The debate on the degree to which experimental instructions should be context
rich or context free is well known. Context can help subjects’ understanding; it may help external
validity in relation to the specific real world context, although this relies on the subjects’ specific
expectations about the context being realistic; if the experiment is complex, context may to some
extent be unavoidable. At the same time, however, context may distract subjects and allow them to
carry over unrealistic scripts and expectations to the task; it may reduce the generality of the
experiment; it may induce EDE.
Rather than adjudicating between the two views, here we focus more on this last point, by
making the obvious point that EDE might be triggered by the use of loaded language as a framing
device. To illustrate, in Baldry (1986), presenting a decision task in terms of tax payment rather
than betting made subjects pay more taxes; Alm et al. (1992), however, did not find an effect of using "tax" instructions, but this may have been because contributions were already high, possibly influenced by the 'group benefit' framing in their instructions.12 Abbink et al. (2006)
compared loaded versus neutral instructions in a bribery experiment. While the ‘loaded’ treatment
had a lower bribery rate and bribery acceptance rate, the difference was too small to be statistically
significant. These illustrative cases where the frame can be clearly connected to an EDE show that
an effect may be present but may not be large and may be subject to ceiling effects.
Exceptions do exist, however. Some relate to the dictator game (see Example 13 below). To
mention another study, Liberman et al. (2004) had subjects play a Prisoner's Dilemma which was either labeled as a 'Community Game' or as a 'Wall Street Game'. Impressively, cooperation was roughly twice as high with the 'Community Game' frame. This can, however, be explained by a powerful interaction between the cue provided by the experimenter and the face-to-face manner in which the experiment was conducted. Vertical and horizontal social pressures were not disentangled,
thus confounding the interpretation of the results.
Example 11. A stylized fact from research on experimental asset markets is the occurrence of
speculative bubbles for dividend paying assets (e.g., Smith et al., 1988; Sunder, 1995). Lei et al.
(2001) noted that experiments have traditionally had just a single activity available – trading in the
market for the asset –, and argued that the source of the bubbles could be that subjects feel induced
by the nature of the experiment to over-trade, thus pushing the prices up and creating the bubbles.
This, we may argue, would be a purely cognitive EDE.13 To address this, Lei et al. introduced an
alternative interesting activity - in the form of a second market where subjects could also trade in -.
They also added a statement to the instructions (in bold black letters) stating that subjects were not
required to participate in either of the markets if they chose not to, and that it was their decision
whether to participate in one, both or neither.14 Lei et al. found that the trading volume decreased as
the result of these changes to the design, suggesting that a cognitive EDE was indeed present.
However, prices were not statistically significantly lower in the experimental treatments than in the
12. See Tan and Zizzo (2008) for a discussion of group identity based framing, and Elliott et al. (1998), Cookson (2000) and Hargreaves Heap and Zizzo (forthcoming) for three examples of its effectiveness (see also Example 16 below).
13. Another interpretation would be boredom. However, asset market experiments usually find bubbles early on in the experiment rather than, or much more than, late on. This is inconsistent with a boredom interpretation, as we may expect subjects to be more bored late on rather than early on.
14. The emphasized extra statement may have induced a reverse form of EDE in which subjects may have felt compelled to trade less, but, given the qualifications in the statement, this is not especially plausible.
controls, denting the claim that the occurrence of speculative bubbles is an artifact of such an EDE.
Experimental tests of market contestability theory are another area where the absence of an
alternative interesting activity might induce a cognitive EDE (see Holt, 1995, for a review).
Example 12. Attempts to disentangle ‘confusion’ from social preferences in public good
contribution experiments have estimated that ‘confusion’ explains up to around ½ of contribution
levels (Andreoni, 1995; Kurzban and Houser, 2002; Ferraro and Vossler, 2008). While it has been
modeled on occasion as an error parameter in quantal response equilibrium models (e.g., Goeree et al., 2002), an open question remains as to what the source of 'confusion' actually is. Andreoni (1995)
estimated confusion by comparing a standard public good contribution experiment with a rank
tournament version with different incentives, while Kurzban and Houser (2002) and Ferraro and
Vossler (2008) replaced human public good co-players with computer co-players, in the full
knowledge of the subjects; Ferraro and Vossler’s protocol emphasized that what the subject would
do would have no impact on what the computer would later do, i.e. that the computer play was
predetermined. One possibility is that the nature of the standard public good problem, in which
deviation from Nash is possible only in the direction of contributing, may trigger a purely cognitive EDE requiring subjects to engage in some contribution. An alternative form of purely cognitive EDE may consist in subjects herding on the cue provided by the computer players' contributions.
Ferraro and Vossler informally report results from post-experimental focus groups which do
not seem supportive of the first cognitive EDE. They seem to have found that subjects either
understood the incentive structure, or they did not and herded on the computer player contributions,
which is consistent with the second cognitive EDE; or they perceived the public good problem as an
assurance game and acted accordingly. This last answer is consistent with subjects applying a
behavioral schema from the real world which they felt appropriate even though it did not fit the
strategic details of the experimental setup,15 and may be consistent with a social preference story
and with a genuine reason why cooperation is observed in real world public good problems.16 The
informal partial support for the second EDE suggests that the usual anchoring on the contribution of
others found in public good experiments (e.g., Carpenter, 2004; Perugini et al., 2005) may in part be
explained not by social preferences or by internalized social norms, but rather by EDE. The strength
15. For other such examples, in the context of ultimatum games, see Carter and McAloon (1996) and Hoffman et al. (2000).
16. Sissons Joshi et al. (2004) used survey techniques to show that car drivers in Oxford (U.K.) tend to perceive their city traffic congestion problem as an assurance game rather than as a standard social dilemma.
of these results, of course, depends on the extent to which we are ready to believe in the focus group
results (see section 6 below).
Example 13. Dictator games are highly artificial settings in which subjects are asked to
consider giving, and often give, significant amounts of money to strangers while they rarely do so in
the real world (Schram, 2005; Bardsley, 2008). Dictator giving appears very sensitive to apparently
small changes in the design, such as changes in deservingness (e.g., Ruffle, 1998), the availability
of a picture of the recipient (Burnham, 2003), other information provided on the recipients (Branas
Garza, 2006) and awareness of observation (Haley and Fessler, 2005). Depending on the
experimental details, the fraction of subjects giving money varies over a wide range: for example,
only around 10% of the subjects gave money in treatments by Hoffman et al. (1994, 1996) and
Koch and Normann (2007), but over 95% did so in a treatment by Aguiar et al. (2008) and Branas
Garza (2006). By their unusual nature, dictator games are typically done only once, although
within-treatment manipulations are sometimes made (as in Andreoni and Miller, 2002).
The structure of the dictator game makes it an obvious candidate for an EDE. Subjects are
given money by the experimenter and their choice is simply to give it or not, with the recipient not
having a say in the matter; they do the experiment only once or, even if they do it more than once,
they do not receive any feedback after each play. A purely cognitive EDE would be sufficient for
subjects to realize that the experiment’s objective is about giving and that, therefore, they should be
giving some money. The question would then be how much money they should give, and the clues
given in the experimental setup can help in that respect, thus contributing to the sensitivity of
dictator giving to small changes in the design. Since this EDE may operate in a purely cognitive
way, a double blind design of the kind employed by Hoffman et al. (1994, 1996) and others would
not in itself be sufficient to remove it, though it might help reduce any surplus social EDE that may
also be present in such a setting.17
The clearest evidence of a purely cognitive EDE at work in dictator games comes from clever
experiments by Bardsley (2008) and List (2007).18 In their common dictator game variant, subjects
could not only give but also take money from the recipient. A shift to lower giving (and more
taking) was observed, which can in part be explained by a subject sample with heterogeneous
17. In a personal communication, Gary Charness has suggested that a reverse EDE may be at work: by stressing double anonymity in the design, the experimenter may induce subjects to give less.
18. 'Clearest' does not mean unequivocal. Bardsley (2008) discusses alternative interpretations of his findings, though his preference is for an EDE interpretation. List (2007) puts his results in the context of the 'moral cost' framework developed in Levitt and List (2007).
preferences,19 but can in part be attributed to a range dependence of the giving activity, which is
consistent with a purely cognitive EDE interpretation.20
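To see how range dependence alone can generate this pattern, consider the following minimal sketch (purely hypothetical; it is not a model taken from Bardsley, 2008, or List, 2007): a chooser who simply treats the midpoint of the permitted transfer range as the 'appropriate' action appears generous in the standard give-only game and stops giving once taking is allowed.

# Hypothetical illustration of range dependence in a give-or-take dictator game.
# The 'midpoint of the permitted range' rule stands in for a subject who reads
# the action set offered by the experimenter as a cue about appropriate behavior.
def range_dependent_transfer(lowest_allowed, highest_allowed):
    """Pick the midpoint of whatever transfer range the experimenter permits."""
    return (lowest_allowed + highest_allowed) / 2

endowment = 10.0
give_only = range_dependent_transfer(0.0, endowment)            # standard dictator game
give_or_take = range_dependent_transfer(-endowment, endowment)  # taking also allowed
print(give_only, give_or_take)  # 5.0 versus 0.0: giving collapses once taking is possible

The same decision rule produces markedly less giving once taking is possible, which is the kind of range dependence consistent with a purely cognitive EDE reading of the give-or-take results.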
These findings suggest caution in drawing inferences on dictator games from the laboratory to
the field. They may also make the interpretation of laboratory findings sometimes difficult,
particularly as social dimensions may easily be present on top of the cognitive dimension of the
EDE. We discussed one such example above (Example 8). Other cases may be subtler. For instance,
dictator game studies of social distance may confound horizontal with vertical social pressure
effects (see Dufwenberg and Muren, 2006), while experimental instructions noting that subjects are
‘entitled’ to keep part of their endowment (thus implying that they are not entitled to keep the rest),
asking subjects to justify their choices in writing to the experimenter, and manipulating the key
experimental variables on a within-session basis (see Branas-Garza, 2006 and Aguiar et al., 2008)
may be expected to be especially affected by EDE. We turn to the point of within-session
manipulations next.
Example 14. It is a basic principle of experimental design that counterbalancing or
randomization of the order of different tasks can be important for experimental control. Even where
counterbalancing or randomization occur, however, there is the potential danger of a purely
cognitive EDE if subjects believe they can glean information about the experimenter's
objectives from the sequence of tasks at hand. Given a sequence of decision tasks, and based on
their perceptions of the experimenter’s objectives, subjects may aim to make their decisions
consistent with each other in a way in which they normally would not be, were the tasks presented independently.21 This is not necessarily a problem. For example, in the context of individual choice experiments, an attempt at consistency should be aligned with the financial incentives in improving the goodness of fit of expected utility theory or other well defined utility functions. Therefore, if the experimental hypotheses are about anomalies that require going beyond conventional
approaches to decision making under risk (as defined by Starmer, 2000), evidence for such
anomalies is made stronger by the fact that the cognitive EDE would operate in the opposite
direction. A change in behavior for consistency’s sake may also be practically more difficult if tasks
19. For example, a self-interested subject in the dictator game gives 0, but takes as much as possible if taking is allowed.
20. Bardsley (2008) also considered a pure taking game where subjects can take rather than give money, and used a double blind protocol throughout.
21. There are a number of psychological reasons why this may be the case, having to do with cognitive dissonance (Festinger, 1957) and self-esteem management (Kirkpatrick and Ellis, 2004).
change in multiple dimensions, thus making direct comparability harder and the experimenter's
objectives less transparent.
Equally, the possibility of purely cognitive EDE through within-session manipulations should
not be ruled out in principle. For example, in Branas-Garza (2006) and Aguiar et al.’s dictator
games (2008), the recipient’s state was the only variable varied across three decision tasks, and it
was varied in a way that made it transparent how subjects ought to tailor their giving.22 Slonim and
Garbarino (2008) had on a within-session basis a trust game and a modified dictator game which is
identical to the trust game except that the recipient cannot return money; while their key message
that partner selection is correlated with more giving is unaffected by EDE, the ease with which consistency between trust game and dictator game behavior can be obtained makes claims on the robustness of the
results between the two settings weaker. Andreoni and Miller’s (2002) dictator games varied multidimensionally across tasks, which should reduce the bias, but, since their paper is all about
consistency of choices and they still did use dictator games with their limitations (Example 13),
their within-session design is potentially worrisome in the light of the potential cognitive EDE.
Example 15. We now turn to examples where the cognitive EDE potentially arises from
Heisenberg principle type of problems, in which experimental manipulations concerning the
measurement of X might change behavior in relation to X. A first such case concerns experiments
that combine questionnaires with behavior in games, but as long as the questionnaire is after the
behavioral part (e.g., Ben-Ner et al., 2004a, 2004b; Büchner et al., 2007), or run in a separate
session say a couple of weeks earlier (Tan, 2006), the problem is addressed. The first solution
retains however the danger of spurious congruence between questionnaire and behavioral responses
of the kind discussed under Example 14. The danger is not as strong as sometimes claimed since
verbal and behavioral responses can be surprisingly divergent (Zizzo, 2003a), reflecting the
distinction in cognitive psychology between explicit and implicit cognitive mechanisms.23 Whether
the danger is realistic depends on how transparent the connection between the two parts is, in terms
of subjects being able to clearly identify that the experimenter’s objective is to seek a connection
between the two parts, and in what way (see section 5). Often, batteries of different psychological
instruments are run, making such an identification harder insofar as they are clearly about different
things (e.g., Ben-Ner et al., 2004a, 2004b, use both personality and cognitive ability type of
22. Namely: no information on the recipient; a 'poor' recipient for whom the money can be 'very useful'; and 'poor' recipients for whom the money can be 'very useful' and in relation to whom the donations are converted into medicines 'of great help'.
23. We return to this point in section 6.
questionnaires). Of course, questionnaires are sometimes deliberately used to prime salience, e.g. of
social norms (as in Benjamin et al., 2008), and in this case a potential EDE criticism arises.
Example 16. Separating out experimental participants into two artificial groups or teams in
order to identify the effects of ingroup-outgroup relationships is one such manipulation (e.g.,
Charness et al., 2007; Tan and Zizzo, 2008; Hargreaves Heap and Zizzo, forthcoming). In many
cases the group manipulation is mapped into differences in the material incentive structure of
different teams, in which case subjects may make sense of the artificial groups based on their real
world experience of competing teams (as in treatments present, for example, in Tan and Bolle,
2007, Bornstein et al., 2002, and Hargreaves Heap and Zizzo, forthcoming). At the other extreme, in social psychological research in the minimal group paradigm tradition (e.g., Brown, 2000, for a review), a simple dictator-like allocation task typically finds ingroup members being favored over
outgroup members, which in itself – particularly given the artificiality of the dictator game task and
the one shot nature of the task – could well be related to a purely cognitive EDE.
Somewhere in the middle, economic experiments with treatments where artificial groups have
been induced but without changing the incentive structure (e.g., Hargreaves Heap and Varoufakis,
2002; Charness et al., 2007; Tan and Bolle, 2007) enable us to look into the ‘pure’ effects of groups
but still need to respond to the potential cognitive EDE criticism. There are a number of ways in
which these middle ground papers can defend themselves against this criticism: relative to the
minimal group paradigm, they are based on the use of less artificial tasks and the typically repeated
nature of the interaction allows subjects to acquire familiarity with the decision task; in at least two
cases (Zizzo, 2003b, and Charness et al., 2007), the mere introduction of groups was insufficient to
produce behavioral results but introducing a stronger manipulation (without obvious extra cognitive
EDE mapping) made a difference; more importantly, sometimes core findings can be identified that
are not obviously explainable by a cognitive EDE story but which fit with genuine psychological
mechanisms related to the existence of groups, such as the unequivocal inducement of negative
inter-group discrimination in Hargreaves Heap and Zizzo (forthcoming) and to a lesser degree in
Zizzo (2003b), or the evolution of conventions in Hargreaves Heap and Varoufakis (2002).
Example 17. Trust responsiveness implies that, in trust games, trustees are more likely to
fulfill trust if they believe that trusters believe that they will fulfill trust (see Bacharach et al., 2007;
Guerra and Zizzo, 2004); in that sense, trust responsiveness requires a causal link from a second
order belief (a trustee’s belief about a truster’s belief) into behavior, and a direct experimental test
of trust responsiveness (and separation from other psychological mechanisms) requires the
measurement of such a second order belief (as also, for example, in Dufwenberg and Gneezy,
2000). The potential purely cognitive EDE is that, by the very fact of measuring beliefs, attention is
being drawn on trust responsiveness or similar psychological mechanisms (such as guilt aversion:
Battigalli and Dufwenberg, 2007), therefore altering the behavior of subjects. Guerra and Zizzo
(2004) directly tested this hypothesis, by comparing treatments with and without belief elicitation.
Of course, if beliefs are not elicited we cannot verify whether a link between beliefs
and behavior exists, since beliefs become unobservable. What can be verified, however, is whether
any distortion in trusting or fulfilling rates occurs as the result of eliciting beliefs. Guerra and Zizzo
found no evidence of this, thus casting doubt on the existence of an EDE in this setting.
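To make the logic of such a check concrete, here is a minimal sketch (illustrative only: the counts are hypothetical and are not Guerra and Zizzo's data) of how one might test whether fulfillment rates differ between treatments with and without belief elicitation.

# Hypothetical sketch of a 'does belief elicitation distort behavior?' check.
# Counts are invented for illustration; they are not from Guerra and Zizzo (2004).
from scipy.stats import fisher_exact

fulfilled_with, n_with = 14, 24        # trustees fulfilling trust, beliefs elicited
fulfilled_without, n_without = 15, 24  # trustees fulfilling trust, no elicitation

table = [[fulfilled_with, n_with - fulfilled_with],
         [fulfilled_without, n_without - fulfilled_without]]
odds_ratio, p_value = fisher_exact(table)
print(p_value)  # a non-significant difference is consistent with no elicitation-induced
                # EDE, though it cannot prove its absence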
Example 18. We have already considered cases where horizontal pressure might be combined
and confounded with vertical EDE (Liberman et al., 2004, under Example 10; Dufwenberg and
Muren, 2006, under Example 13). Another such case might be the availability of purely social
punishment on the part of public good contribution game co-players (as in Masclet et al., 2003).
Provision of horizontal advice (as in Schotter and Sopher, 2006, 2007, and Iyengar and
Schotter, 2008) may also be confounded with a purely cognitive EDE, as the experimental design
may clue subjects in on the fact that they should use the advice provided or take seriously the
opportunity to give advice. The problem here arises insofar as subjects can transparently read the
objective of the experiment as requiring them to take the advice and its provision seriously, and act
accordingly. One response to this is an external validity argument: in real world organizations, the
authority does require advice and its provision to be taken seriously, and so the laboratory simply mirrors real world organizational setups. Another response would be, if feasible, to identify predictions that follow from horizontal advice but not from the EDE.
4. Empirical Relevance of Experimenter Demand Effects and Experimental Objectives
4.1 Setting the Scene
Our analysis of a set of (real or alleged) examples of EDE has shown both the potential
explanatory power of demand effects in economic experiments and, at the same time, their limits in
what they can plausibly explain if one relies on the existing evidence. It is clear from examples
from outside economics such as the Milgram obedience experiments (Example 1) and the social
desirability research (Example 4),24 but also from cases of research in experimental economics
where dramatic changes in behavior have occurred (Examples 5-9), that social EDE do have at least
the potential to play havoc with experimental control. While potentially of relevance in a much
wider set of experimental designs, the empirical case for purely cognitive EDE is only partial. It is
clearest in the frequently used but highly artificial and arguably unrepresentative setup of dictator
games (Example 13). The evidence from asset markets is only partial (the purely cognitive EDE
affected trading volumes but not prices: Example 11), and that from public good contribution
experiments is bounded by what is probably a fraction of the variance in contributions that is
attributed to ‘confusion’ (Example 12, in terms of reversion to the mean contribution). Similarly,
the existing evidence (insofar as there is some) points to only a limited effect in the context of the
potential connection between framing and purely cognitive EDE (Example 10), and the one study
which directly tested a Heisenberg principle type of purely cognitive EDE (Example 17) found no
support. That being said, we admitted that a purely cognitive EDE may lurk, at least in principle, in
other examples as well (Examples 14-16, 18).
When should we be worried about EDE? Figure 2 develops the part of the conceptual
framework of Figure 1 that deals with the relationship between the subject’s beliefs and actions and
the experimenter’s objectives. (Insert Figure 2 about here.)
Based on the instructions, cues and pressure received, subjects form expectations about the
experiment’s objectives which may inform the actions they take. The key, often overlooked, point is
how the subjects’ expected experiment objectives and corresponding actions relate to the actual
experiment objectives and predictions made by the experimenter.
4.2 Uncorrelated Expected and True Objectives
Assume that subjects are unable to form a view of the experiment objectives, or that, to the extent that they can, the subjects' expected experiment objectives are orthogonal to the true objectives and predictions. That is, they do not plausibly imply actions that go either in the direction of or in the opposite direction of the experiment predictions. In this case we can say that any EDE
are uncorrelated with the true experimental objectives. In Carter and McAloon (1996) subjects
played either a standard ultimatum game or a similar but strategically different tournament game.
The point of the experiment was to verify that, contrary to social preference models that predict
different behavior in the two games, similar behavior would be obtained, thus creating a potential
24. For reasons discussed in section 3.1, they provide stronger evidence than Examples 2 and 3.
puzzle for social preference models. The technical nature of the predictions and the between-sessions nature of the design make orthogonality a plausible assumption. Similarly, in Fehr and
Tyran’s (2001) paper on money illusion, either subjects saw a payoff matrix the complexity of
which could induce money illusion or they saw one that did not. Subjects could figure out that the
experiment was in part about the shock to the economy which they knew would occur mid-way in
the session, but this expected experiment objective cannot explain a differential response of subjects
under a complex payoff matrix or a simple one, i.e. it is orthogonal to the true objective. Hargreaves
Heap and Zizzo’s (forthcoming) focus was on determining whether intergroup discrimination was
positive or negative (which was verified against a control treatment run on a between sessions
basis) and in evaluating the psychological valuation of groups based on neutrally framed markets
for group membership. Although subjects could perceive the experiment was about cooperation and
groups, the way in which this might distort behavior (e.g., more cooperation in the aggregate across treatments) was orthogonal to the true experimental objectives.
In Sarin and Weber (1993), both ambiguous assets and unambiguous assets were traded in
markets on a within session (or even within trading period) basis, and their objective was to look at
whether ambiguity aversion carried over to markets. There is no obvious sense however in which the
difference in ambiguity between assets should translate into a clear expected objective that one asset
should be favored over the other. That being the case, we can plausibly predict any EDE to be uncorrelated with the true experiment objective and therefore irrelevant to the focus of the paper.
Although we discuss only four examples here, more could be provided. The key point is that, even though at least a purely cognitive EDE might arise in principle, it would plausibly have no bearing on the key experimental predictions and tests of the paper, thus making EDE a non-issue. Crucially, the true experimental objectives are obscure to the subjects, and, as the expected experimental objectives are either also unclear or uncorrelated with the true experimental objectives, the subjects are unable to engage in actions that can act as a confound.
4.3 Negatively Correlated Expected and True Objectives
Some experimental designs are such that an EDE may be induced, but may be negatively
correlated with, and so work against, the predictions implied by the true experimental objectives. In this
case we can say, for short, that EDE and true experimental objectives are negatively correlated. As
noted under Example 14, within-subject designs may facilitate behavioral consistency, and so, in
individual choice experiments trying to find behavioral anomalies relative to expected utility theory,
and employing within-subject designs, any purely cognitive EDE will operate against the
true experimental objective. Abbink et al. (2006), discussed under Example 10, aims to show
the robustness of behavior to the use of ‘loaded’ instructions, and so any EDE induced by the
‘loaded’ instructions should operate in the direction opposite to the true objective of the experiment.
Menzies and Zizzo’s (2005) true objective is to try to find evidence of sluggish belief adjustment,
but a key reason why there may be such sluggish adjustment is inattention by economic agents
(e.g., Carroll, 2003); since in their experiments subjects simply receive a signal each period and all
they have to do is to make sense of it to revise their guess of the true state of the world, the resulting
purely cognitive EDE may focus all of the subjects’ attention on the signal and therefore may lead
to less sluggish belief adjustment than would otherwise be found. Cason et al. (2002) use a public
good experiment to find evidence of spite; since the public good experiment setup should if
anything be biased towards cooperative behavior (Example 12 above), the purely cognitive EDE
implied by the game setup seems again to operate against the true experimental objective.
In this scenario, unlike the previous one, EDE can act as a confound, but only in the sense of making it more difficult to show statistically significant evidence in support of the true experimental objectives. On the one hand, if no evidence is found in support of the true experimental objectives, this could be due not to a genuine failure of the corresponding hypotheses but rather to EDE. On the other hand, if statistically significant evidence is found in support of the true experimental objectives, its persuasiveness is reinforced instead of weakened by the knowledge that there may be potential EDE working in the opposite direction.
4.4 Positively Correlated Expected and True Objectives
The potentially most problematic scenario is one where, if they exist, the EDE are positively correlated with the true experimental objectives: that is, the EDE are positively correlated with the predictions implied by the true experimental objectives. In this case, and assuming that investigating EDE is not in itself the true experimental objective, if we observe behavior that appears to support the hypotheses related to the true experimental objectives, we cannot in principle be sure about the extent to which such behavior is genuinely due to such hypotheses or due to EDE. This positive correlation, not the mere existence of EDE, is what creates a potential confound problem. Examples of experiments with positive correlation have been considered throughout section 3, among others Binmore et al.’s (1985) bargaining experiment (Example 5), the provision of advice on contributions and coordination (Example 6), instructions on how to play the best guessing game (Example 7), behavior in public good (Example 12) and dictator (Example 13) games, within-session manipulations potentially inducing EDE aligned with the true experimental objectives as in Andreoni and Miller (2002; Example 14), behavior and questionnaires (Example 15), and belief elicitation (Example 16). In the next two sections our primary focus will be on this potentially problematic positive correlation scenario.
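To fix ideas, the three scenarios of sections 4.2-4.4 can be summarized with a simple omitted variable sketch; the notation is ours and purely illustrative, under the assumption that behavior responds additively to the prediction of the true objectives and to the demand cue. Write behavior as

    b_i = \beta T_i + \delta D_i + \varepsilon_i ,

where T_i is the behavior predicted by the true experimental objectives for subject i, D_i is the behavior cued by the expected experimental objectives, and \delta captures any EDE. If D_i is unobserved and behavior is regressed on T_i alone,

    \operatorname{plim} \hat{\beta} = \beta + \delta \, \frac{\operatorname{Cov}(T_i, D_i)}{\operatorname{Var}(T_i)} ,

so that a zero correlation leaves the test of \beta unaffected (section 4.2), a negative correlation biases the test against the true experimental objectives (section 4.3), and a positive correlation can generate apparent support for them even when \beta = 0 (section 4.4).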
5. Dealing with Experimenter Demand Effects
Before discussing some of the things researchers may do to minimize EDE, an important
qualification is in order. In designing and running an experiment, researchers need to take into
account both their true objectives and the theoretical and practical constraints at hand, for example in terms of cognitive simplicity of the experimental environment, duration of the experiment,
inability (in economic experiments) to engage in deception, number of experimental sessions and
treatments that can be run given the budget and the subject pool, and so on. Put simply,
sometimes researchers may need to consciously accept a trade-off between different experimental
objectives and constraints, and it may be optimal for them to accept some risk of an EDE as a result,
rather than going for a corner solution where such risk is brought to zero. This does not necessarily
mean that the experimental results are confounded, and section 6 will discuss strategies to deal with
EDE criticisms where a prima facie case for an EDE positively correlated with the true
experimental objectives exists and has not been fully dealt with at the experimental design stage.
Social vs. purely cognitive EDE. The distinction between social and purely cognitive EDE is relevant in thinking about how to deal with EDE for two reasons. First, most traditional design adjustments are really conceived with social EDE in mind. Second, unless there is a clear connection
between what the experiment is trying to do and the kind of social cues that are required for social
EDE to operate, social EDE are more straightforward to handle.
Measures such as not running the experiment with classroom ‘pseudo-volunteers’25 or
avoiding the presence of a senior experimentalist in the experimental room are obvious ways to
minimize the effectiveness that social cues may have, and so social EDE. So is more generally
minimizing the social interactions between experimenters and subjects, the objective being to
25 Eckel and Grossman (2000) present a dictator game experiment where (a) pseudo-volunteers give more than standard volunteers and (b) they are more sensitive to religious or altruistic preferences as measured by questionnaire instruments. They note how EDE may be one explanation for their findings. Of course, the fact that they use dictator
games and combine them with a questionnaire instrument may make their results stronger than they otherwise would be
(as discussed under Examples 13 and 14).
maximize the social distance between subject and experimenter; a procedure such as the one used
by Ball et al. (2001) in which status was provided – in part – through a public award ceremony
weakens the results of that paper because of the social cues reinforcing the status effect
dimension.26 Double anonymity has been discussed as a tool under Example 13, but in practice may
be unfeasible in all but the simplest of experiments, and may impoverish the data that can be
collected.27 Not telling subjects what to do, or – more subtly – avoiding loaded frames is however
normally feasible. It may be more difficult when the potential social EDE is intimately connected
with the true objective of the experiment (without being the same), such as in the context of the
provision of advice to analyze its impact on the play of correlated equilibria in Chicken games (as
in Duffy and Feltovitch, 2008: see Example 6).
Changing the Decision Task. Due to their subtler nature, purely cognitive EDE potentially
apply to a wider range of designs than social EDE, and as a result they may be harder to control
completely. In some cases, such as that of dictator games (Example 13) and less evidently that of
public good contribution experiments (Example 14), the very nature of the decision problem implies
the existence of potential purely cognitive EDE. Clearly, and following the spirit of Lei et al. (2001:
Example 11), one solution is to alter the decision task in such a way as to minimize the purely
cognitive EDE that might otherwise result (e.g., in Lei et al. with the introduction of an interesting
alternative activity).
Non-Deceptive Obfuscation. Assuming that this is not possible or desirable for other reasons,
as noted in section 4 the problem arises when, if they exist, EDE (and so the expected experimental objectives: see Figure 2) correlate with the true experimental objectives, especially when positively
so. Using our conceptual framework, a possible solution then is to try to minimize such correlation
between expected and true experimental objectives. Since deception is not allowed in economic
experiments, and for good reasons (e.g., Hertwig and Ortmann, 2001; Ortmann and Hertwig, 2002),
26 Ball et al. (2001) claim not. They note how the status procedure is a deliberate treatment variable in their experiment, any resulting EDE is unlikely to affect the auction results, and a debriefing questionnaire asking subjects to describe their thought process and strategy did not find evidence that the status symbol they used was relevant for each
subject. The first point is discussed and criticized in section 6: that a procedure was implemented deliberately to induce
status does not negate the fact that status recognition is not the same as vertical authority given by a subject, i.e.
deference requested by the experimenter, which is the potential EDE here. The third point concerns the significance of
‘postexperimental inquiries’ (Orne, 1973), which shall be considered in section 6, but, taken at face value, it could be
used against both a status and an EDE interpretation of the results, and thus cannot be used to prefer one to the other.
Equally, the second point (that an EDE should be neutralized by the auction mechanism) appears no less but also no more
persuasive than saying that a status effect should be neutralized by the auction mechanism.
27 For example, it may prevent being able to map demographic data into the choices that subjects make. As an earlier footnote noted, it might also, in its own right, induce EDE.
this attempt to minimize the correlation has to be played out under the constraint that deception not
be used. I label the set of tools that can be used to achieve this as non-deceptive obfuscation. That
is, if the danger of a positive correlation exists, the experimenter can try (without using deception)
to obfuscate the true experimental objectives or to modify the expected experimental objectives in
such a way that the correlation is reduced.
What are some techniques that can be used to achieve non-deceptive obfuscation? One is the
use of context-free (not simply non-socially-loaded) language that avoids tipping agents in one direction or the other, though of course (as discussed under Example 10) this has its disadvantages.
Another is the use of contexts that reduce the connection with the true experimental objectives, such
as the use of a products market frame in Fehr et al.’s (1993) study of sequential labor markets.
Third, while sometimes requiring too many resources to be feasible, between-session designs can
often be used effectively to obfuscate the actual experimental objectives (the examples of Carter
and McAloon, 1996, and Fehr and Tyran, 2001, are discussed in section 4.2). Fourth, when a
within-session design is important and unavoidable, for example because of an interest in verifying
the connection between questionnaire and behavioral responses (Example 15), requiring subjects to
come to two different sessions separated in time by between one and three weeks (as in Tan, 2006),
can be used as an obfuscation tool. Fifth, filler questions in questionnaires, or filler behavioral tasks,
can be employed: the latter may be time-consuming and dilute the financial incentives in the tasks
that matter (for a given budget), but the former are an obvious tool to employ when questionnaires matter (an illustrative sketch is given below). In an experiment on emotional response elicitation (e.g., Bosman and van Winden, 2002),
they may take the form, for example, of asking about a set of emotions of different kinds, thereby
obfuscating any inference on which ones the experimenter is actually interested in and how they are
connected with the rest of the experiment. Sixth, cues can be introduced in the experiment which,
while not deceiving subjects as such, may help in the obfuscation exercise. For example, if a cue
must be introduced pointing in one behavioral direction, another cue should be introduced pointing
in the opposite direction, with the disclaimer that the subject should do as he or she pleases.28 This
list is not meant to be exhaustive, but identifies specific ways in which potential EDE confounds
can be reduced or removed thanks to non-deceptive obfuscation.
28 The cue, of course, should not be deceptive. I use this technique in Sitzia and Zizzo (2008), where in an ‘informed
seller’ treatment subjects are neutrally informed of a possible strategic factor they may wish to take into account, but are
also informed of another (genuine) factor potentially working in exactly the opposite direction, and they are then
explicitly told that it is up to them if they wish to take one factor into account, the other one, both or neither.
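As a purely illustrative sketch of the fifth tool above (filler questionnaire items), and not a description of any procedure used in the studies cited, one might shuffle target and filler items for each subject and keep a private key for the analysis; the item names and function below are hypothetical.

    import random

    # Illustrative sketch only: hypothetical item pools for an emotion questionnaire.
    # Only the 'target' items matter for the analysis, but each subject sees them
    # shuffled together with fillers, obfuscating which items are of real interest.
    TARGET_ITEMS = ["irritation", "anger", "contempt"]                 # assumed targets
    FILLER_ITEMS = ["joy", "surprise", "boredom", "relief", "pride"]   # assumed fillers

    def build_questionnaire(subject_id: int, session: str = "s1"):
        """Return a per-subject item order and a private key flagging target items."""
        rng = random.Random(f"{session}-{subject_id}")  # reproducible per subject
        items = TARGET_ITEMS + FILLER_ITEMS
        rng.shuffle(items)
        key = {item: (item in TARGET_ITEMS) for item in items}
        return items, key

    if __name__ == "__main__":
        order, key = build_questionnaire(subject_id=7)
        print(order)                            # order shown to the subject
        print([i for i in order if key[i]])     # target items, for the analyst only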
6. Defenses against Experimenter Demand Effect Critiques
We now consider the scenario in which there is a positive correlation between EDE and true
experimental objectives and the likelihood of an EDE is significant enough that it may act
as a potential confound. Are there defenses that can be put forward against criticisms that the
experimental results are invalidated by EDE? We discuss six here.
1. The EDE as the objective of the experiment. If identifying an EDE is the objective of the
experiment (as for example in Milgram, 1963, Example 1, or Bardsley, 2008, Example 13), then
obviously it is not a confound variable. Casting this point in terms of experimental objectives is
crucial to avoid the potential confusion arising from Davis and Holt’s (1993, p. 26) phrasing of this
point in terms of the EDE being a treatment variable. This may induce the confusion (e.g. Ball et
al., 2001, footnote 8) that deliberately engaging in an EDE manipulation as a treatment variable
makes it right. This is only true if what the experiment and the resulting paper aim to achieve is
identifying an EDE; if it is, say, to identify status effects in markets (as in Ball et al., 2001) or to
argue for a moral cost model of reasoning (as, partially, in List, 2007), then obviously the potential
confound criticism will hold, and possibly more seriously so because of the difficulty of separating
these interpretations from an EDE when the behavioral correlation between the two is perfectly
positive in the experimental data.
2. The external validity defense. An EDE that parallels or helps reproduce an important
feature of the real world setup the experiment is trying to model is an EDE that may strengthen the
experiment by enhancing its external validity. As already noted in the discussion of Example 6
(e.g., Croson and Marks, 2001; Shang and Croson, 2008), advice provided in a position of authority, mirroring real world settings where advice may be provided in such a vertical relationship, is an obvious setup where an external validity defense holds. So is Cadsby et al.’s (2006) treatment
requesting subjects to pay their taxes, as discussed in Example 9, sometimes the use of a context-rich frame as considered in Example 10, or possibly the provision of horizontal advice in Example
18. Care has to be taken however that the mapping with the real world is structural and not
superficial. For example, using a stock market frame with real world traders may seem a good idea
in terms of external validity, but they will then find it easier simply to apply their own behavioral
schemata from real world trading which may have little to do with the actual trading environment
they are facing in the laboratory; if the true experimental objective was to find behavioral
anomalies, this might then be problematic.
3. The magnifying glass argument. I shall label a special version of the external validity
defense the ‘magnifying glass’ argument. It goes like this: an EDE may be legitimate if it magnifies
the relevance of a dimension which, in the real world, is (a) present to a stronger degree than in the laboratory and/or (b) cognitively familiar from experience and therefore easier to make sense of than in the context of an unfamiliar experimental environment. The EDE would be a tool employed
by an experimenter in the same way in which a scientist may use a magnifying glass or a
microscope: to better, if artificially, identify effects which otherwise may not be observable.
As an example, consider the stress of experimenters on monetary incentives in the
instructions. The EDE thus induced may magnify the salience of the monetary incentives, and, as
monetary incentives at the margin are higher in many real world economic settings, may help with
the external validity of the experiment by making the results more widely applicable.29 That being
said, however, if the experiment is on social preferences, then the potential distorting effects on
motivation might be significant (e.g., Frey et al., 1996; Frey and Oberholzer-Gee, 1997) and the
existence of a positive correlation between EDE and true experimental objectives would
considerably weaken the magnifying glass argument.30
As another example, consider Lei et al.’s (2001) finding that the absence of an interesting
alternative activity induces a greater volume of trade in asset markets (though not a distortion in
prices: see Example 11). Since most asset markets of any relevance in the real world are much
‘thicker’ markets – in terms of volume of trade – than experimental markets with just a few (or even
just a few tens of) traders, the magnifying glass argument can be used to suggest that, as long as the
experimental objective is not about the volume of trade, the EDE has a beneficial effect in making
the experimental market closer to real world asset markets. There may still be settings, however, where
the implied excess market activity from not having an interesting alternative may be positively
correlated with the true experimental objectives in ways which are detrimental: not only if the
experiment is about volumes of trade, but also, for example, if it is about contestable markets with
the excess activity implying excess entry in the market, which would be positively correlated with
finding evidence for the significance of contestability (see discussion in Holt, 1995).
29 That is, of course, if one believes that the results may be sensitive to the size of the monetary incentives.
30 Example 5 above is an extreme example of this.
As a third example, consider Benjamin et al.’s (2008) use of a priming questionnaire to elicit
ethnic, gender and race related social norms relative to a control. It can be argued that the resulting
purely cognitive EDE works as a magnifying glass to identify the differential potential impact of the
social norm on behavior. The potential concern, which cannot be addressed by their experiment,
arises if the magnifying glass were to work too well, thus inducing behavioral changes to an
extent that would not be observed in the real world; the qualitative result, however, might still be
interesting in a first paper on the topic.
4. The postexperimental inquiry defense. The last three defenses revolve around empirical
evidence that can be used against an EDE critique. The traditional response to EDE is to encourage
the use of ‘postexperimental inquiries’ (a terminology used by Orne, 1973) which take the shape of
questionnaires, verbal debriefing or focus groups, possibly involving roleplay (e.g., Orne, 1962,
1973; Bardsley, 2005, 2008). Under Example 12 we considered Ferraro and Vossler (2008) as a
potentially insightful instance of use of focus groups, Benjamin et al. (2008) use a direct “were you
thinking about what we wanted you to do” written question approach, and Ball et al. (2001)
contains a third example.
Other things being equal, any evidence is obviously better than no evidence. There are,
however, reasons to be cautious about this traditional response. One source of caution is noted by
psychologists (Orne, 1962, 1973; Golding and Lichtenstein, 1970), and is the fact that subjects may
be aware “that they ought not to catch on some aspects of the experimental procedure” and, if they
reveal they do, “their data cannot be used” (Orne, 1973, p. 11), and the resulting EDE is clearly
aligned with the experimenter’s incentive not to dig too deep for the same reason.31 But, regardless
of the plausibility or otherwise of this specific EDE, it points to a more general problem.
Postexperimental inquiries are not subject to the same experimental control that economists require
of their experiments: they are not incentivized and, as they come at the end of the experiment,
subjects will typically be demotivated, possibly tired and simply wishing to get paid and leave the
room. Furthermore, even at a minimum (with written questionnaires) subjects provide feedback directly to the experimenter in their responses; the social dimension of the data generation process is even
stronger in the case of verbal debriefing – as the subject verbally and visually interacts with the
experimenter – and perhaps strongest in the case of focus groups – where horizontal social cues
have plenty of opportunity to interact with the vertical authority and cues by the experimenter as
31 Based on these points, Orne (1962, 1973) does not favour a direct question approach.
focus group leader. Lack of motivation and the vertical social dimension can make postexperimental
inquiries paradoxically less interpretable and more subject to EDE than the economic experiments
whose EDE they are supposed to control for.32
The problem is made worse by the fact that, while written questionnaires are less informative, precisely the protocols used in verbal debriefing and focus group sessions are typically less controlled, and thus more subject to subtle or not so subtle unchecked cues, than the
written instructions of the experiment proper. It is also made worse by the well known dissociation
between explicit cognitive mechanisms and implicit cognitive mechanisms (e.g., Shanks and St.
John, 1994): subjects may not realize how their behavior may have been affected by the
experimental environment or, indeed, by their own biases, even though in practice it has.33
5. The direct experimental evidence defense. A less controversial defense is simply to note
that in a given setting there is direct experimental evidence against a hypothesized EDE. One could,
for example, rely on Lei et al. (2001) to argue that the absence of an interesting alternative activity
does not distort asset prices and so, if the positive correlation between the hypothesized EDE and the true experimental objectives were to come from co-moving prices, this is not really a problem.
Similarly, one could rely on Guerra and Zizzo (2004) to defend the use of belief elicitation together
with a behavioral task (Example 16).34 More generally, the earlier discussion summarizing the
empirical evidence on EDE (section 4.1) is relevant here, and shows how, with exceptions, it may
be easier to use the direct empirical evidence defense in the context of purely cognitive EDE than in
the context of social EDE. More empirical research on EDE is clearly needed (Bardsley, 2005).
6. The indirect experimental evidence defense. We discussed how, when the true experimental
objectives are connected to hypotheses that predict behavior which is either uncorrelated or
negatively correlated with the EDE, support for those hypotheses cannot be criticized on the basis of EDE, and indeed the persuasiveness of the evidence is even stronger in the case of negative correlation (sections 4.2 and 4.3). The same argument may be applied more generally, whenever there are notable behavioral patterns, ideally formulated in ex ante experimental hypotheses, that cannot
32 Put differently, either we believe that EDE are potentially important empirically, in which case postexperimental
inquiries are not the solution because of their sensitivity to EDE, or we believe they are not, in which case
postexperimental inquiries are not a good use of limited experimental time and resources.
33 The literature on this dissociation is considerable: for just three examples, see Stocco and Fum (2008), Tapia et al.
(2008) and Zizzo (2003a).
34 See Orne (1973) for a discussion of ‘nonexperiments’ and ‘simulation techniques’ as creative ways of gathering
direct experimental evidence by running additional control treatments, though questions may be raised about their
applicability to economic experiments (e.g., in terms of incentive and learning issues, or the dissociation between cognitive mechanisms discussed under defense 4).
be explained by postulating the hypothesized EDE, or (more strongly) for which the EDE makes an
opposite prediction. The reason for the emphasis on notable behavioral patterns ideally hypothesized ex ante is clear: EDE may be important but not explain 100% of the variance, and, if so, there will quite likely anyway be behavioral patterns that cannot be explained by EDE but which can be found with suitable ex post data mining. The defense is also clearly stronger when the non-EDE-predicted
behavior is negatively correlated with EDE predictions. As an example of how this defense can be
employed, Bacharach et al. (2007) used belief elicitation combined with different trust game
variants, and hypothesized and found that the degree of trust responsiveness, as defined by the
correlation between belief that one is being trusted and fulfilling trust, was sensitive to the type of
trust game used. While significant, this prediction was not in itself a key experimental objective, but
nevertheless provided indirect evidence against EDE being behind the trust responsiveness result.
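Purely as an illustration of the logic of this defense, and with made-up numbers and hypothetical variable names rather than Bacharach et al.’s data, the sketch below computes trust responsiveness separately by game variant; a difference across variants in the direction hypothesized ex ante is the kind of pattern that a blanket EDE pushing all subjects towards ‘trustworthy’ behavior would struggle to explain.

    from statistics import correlation  # Python 3.10+

    # Hypothetical data: for each trust game variant, elicited beliefs that one is
    # being trusted and indicators of whether trust was fulfilled (1) or not (0).
    data = {
        "variant_A": {"belief": [0.2, 0.5, 0.8, 0.9, 0.4],
                      "fulfil": [0,   1,   1,   1,   0]},
        "variant_B": {"belief": [0.3, 0.6, 0.7, 0.9, 0.5],
                      "fulfil": [1,   0,   1,   0,   1]},
    }

    # Trust responsiveness = correlation between belief and fulfilment, per variant.
    for variant, d in data.items():
        r = correlation(d["belief"], d["fulfil"])
        print(variant, round(r, 2))
    # If r varies across variants as hypothesized ex ante, a uniform demand effect
    # is a poor explanation of the pattern.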
7. Conclusions
This paper considered the question of when experimental economists should be concerned
about experimenter demand effects (EDE) as a possible confound, and, when they are a potential
likely confound, what can be done about them. We distinguished between social and purely
cognitive EDE and considered some examples. EDE – especially social EDE – exist but are easier to avoid; purely cognitive EDE are subtler, but the empirical evidence for them is nowhere near as cogent, except in specific settings such as dictator games. When EDE may exist, it crucially matters
how they relate to the true experimental objectives. EDE are a potentially serious problem only when
they are positively correlated with the true experimental objectives.35 A number of strategies to
minimize this correlation were identified, including non-deceptive obfuscation of the true
experimental objectives. That being said, and given the trade-offs implicit in designing and running
an experiment, researchers may decide to accept the risk of an EDE even when this is not the
objective of the experiment; indeed, EDE can even be used as an experimental tool to ensure or
increase the external validity of the experiment. There are pitfalls in the traditional response to EDE, namely that postexperimental debriefing evidence should be used, but direct or indirect experimental
evidence may be especially relevant to defend a paper against an EDE criticism. Obviously more
experimental research would be useful.
35 They can also be a potentially serious problem if they are negatively correlated with the true experimental objectives
and no support for the corresponding hypotheses is found.
References
Abbink, K., & Hennig-Schmidt, H. (2006). Neutral versus loaded instructions in a bribery experiment.
Experimental Economics, 9, 103-121.
Adair, G. (1984). The Hawthorne effect: a reconsideration of the methodological artifact. Journal of
Applied Psychology, 69, 334-345.
Aguiar, F., Branas-Garza, P., & Millar, L. M. (2008). Moral distance in dictator games. Judgment and
Decision Making, 3, 344-354.
Alm, J., McClelland, G. H., & Schulze, W. D. (1992). Why do people pay taxes? Journal of Public
Economics, 48, 21-38.
Andreoni, J. (1995). Cooperation in public goods experiments: Kindness or confusion? American
Economic Review, 85, 891-904.
Andreoni, J., & Miller, J. (2002). Giving according to GARP: An experimental test of the consistency of
preferences for altruism. Econometrica, 70, 737-753.
Ariely, D., Loewenstein, G., & Prelec, D. (2003). Coherent arbitrariness: Stable demand curves without
stable preferences. Quarterly Journal of Economics, 118, 73-105.
Asch, S. E. (1956). Studies of independence and conformity: I. A Minority of one against a unanimous
majority. Psychological Monographs, 70, 1-70.
Bacharach, M., Guerra, G., & Zizzo, D. J. (2007). The self-fulfilling property of trust: An experimental
study. Theory and Decision, 63, 349-388.
Baldry, J. C. (1986). Tax evasion is not a gamble. Economics Letters, 22, 333-335.
Ball, S., Eckel, C., Grossman, P. J. and Zane, W. (2001). Status in markets. Quarterly Journal of
Economics, 116, 161-188.
Bardsley, N. (2005). Experimental economics and the artificiality of alteration. Journal of Economic
Methodology, 12, 239-251.
Bardsley, N. (2008). Dictator game giving: Altruism or artefact? Experimental Economics, 11, 122-133.
Battigalli, P., & Dufwenberg, M. (2007). Guilt in games. American Economic Review Papers and Proceedings, 97, 170-176.
Beecher, H. K. (1955). The powerful placebo. Journal of the American Medical Association, 159,
1602-1606.
Benjamin, D. J., Choi, J. J., & Strickland, A. J. (2008). Social identity and preferences. Cornell University
and Institute for Social Research mimeo, February.
Ben-Ner, A., Putterman, L., & Kong, F. (2004a). Share and share alike? Gender-pairing, personality,
and cognitive ability as determinants of giving. Journal of Economic Psychology, 25, 581-589.
Ben-Ner, A., Putterman, L., Kong, F., & Magan, D. (2004b). Reciprocity in a two-part dictator game. Journal of Economic Behavior and Organization, 53, 333-352.
Binmore, K., Shaked, A., & Sutton, J. (1985). Testing noncooperative bargaining theory: A preliminary
study. American Economic Review, 75, 1178-1180.
Blass, T. (1991). Understanding behavior in the Milgram obedience experiment: The role of personality,
situations, and their interactions. Journal of Personality and Social Psychology, 60, 398-413.
Blass, T. (1999). The Milgram paradigm after 35 years: Some things we now know about obedience to
authority. Journal of Applied Social Psychology, 29, 955-978.
Bornstein, G., Gneezy, U., & Nagel, R. (2002). The effect of intergroup competition on group
coordination. Games and Economic Behavior, 41, 1-25.
Bosman, R., & van Winden, F. (2002). Emotional hazard in a power-to-take experiment. Economic
Journal, 112, 147-169.
Branas-Garza, P. (2006). Poverty in dictator games: Awakening solidarity. Journal of Economic
Behavior and Organization, 60, 306-320.
Branas-Garza, P. (2007). Promoting helping behavior with framing in dictator games. Journal of
Economic Psychology, 28, 477-486.
Breitmoser, Y., Tan, J. H. W., & Zizzo, D. J. (2007). The enthusiastic few, peer effects and entrapping
bandwagons. Social Science Research Network Discussion Paper, March.
Brown, R. (2000). Group Processes, 2nd ed. Oxford: Blackwell.
Büchner, S., Coricelli, G., & Greiner, B. (2007). Self-centered and other-regarding behavior in the
solidarity game. Journal of Economic Behavior and Organization, 62, 293-303.
Burnham, T. C. (2003). Engineering altruism: A theoretical and experimental investigation of
anonymity and gift giving. Journal of Economic Behavior and Organization, 50, 133-144.
Cadsby, C. B., Maynes, E., & Trivedi, V. U. (2006). Tax compliance and obedience to authority at
home and in the lab: A new experimental approach. Experimental Economics, 9, 343-359.
Camerer, C. F. (2003). Behavioral Game Theory: Experiments in Strategic Interaction. Princeton:
Princeton University Press.
Carpenter, J. P. (2004). When in Rome: Conformity and the provision of public goods. Journal of
Socio-Economics, 33, 395-408.
Carroll, C. D. (2003). Macroeconomic expectations of households and professional forecasters.
Quarterly Journal of Economics, 118, 269-298.
Carter, J. R., & McAloon, S. A. (1996). A test for comparative income effects in an ultimatum
bargaining experiment. Journal of Economic Behavior and Organization, 31, 369-380.
Cason, T. N., & Sharma, T. (2007). Recommended play and correlated equilibria. Economic Theory, 33,
11-27.
Cason, T. N., Saijo, T., & Yamato, T. (2002). Voluntary participation and spite in public good
experiments: An international comparison. Experimental Economics, 5, 133-153.
Charness, G., Rigotti, L., & Rustichini, A. (2007). Individual behavior and group membership.
American Economic Review, 97, 1340-1352.
Chou, E., McConnell, M., Nagel, R., & Plott, C. R. (2008). The control of game form recognition in
experiments: Understanding dominant strategy failures in a simple two person “guessing” game.
California Institute of Technology Social Science Working Paper 1274.
Cookson, R. (2000). Framing effects in public goods experiments. Experimental Economics, 3, 55-79.
Croson, R., & Marks, M. (2001). The effect of recommended contributions in the voluntary provision of
public goods. Economic Inquiry, 39, 238-249.
Crowne, D. P., & Marlowe, D. (1964). The Approval Motive. New York: Wiley.
Davis, D. D., & Holt, C. A. (1993). Experimental Economics. Princeton: Princeton University Press.
Draper, S. W. (2006). The Hawthorne, Pygmalion, placebo and other effects of expectations: some
notes. University of Glasgow working paper, http://www.psy.gla.ac.uk/~steve/hawth.html
Duffy, J., & Feltovitch, N. (2008). Correlated equilibria, good or bad: An experimental study.
University of Pittsburgh and University of Aberdeen working paper.
Dufwenberg, M., & Gneezy, U. (2000). Measuring beliefs in an experimental lost wallet game. Games
and Economic Behavior, 30, 163-182.
Dufwenberg, M., & Muren, A. (2006). Generosity, anonymity, gender. Journal of Economic Behavior
and Organization, 61, 42-49.
Eckel, C. C., & Grossman, P. J. (2000). Volunteers and pseudo-volunteers: The effect of recruitment
method in dictator experiments. Experimental Economics, 3, 101-120.
Ellingsen, T., & Johannesson, M. (2007). Paying respect. Journal of Economic Perspectives, 21, 135-149.
Ellingsen, T., & Johannesson, M. (2008). Pride and prejudice: The human side of incentive theory.
American Economic Review, 98, 990-1008.
Elliott, C. S., Hayward, D. M., & Canon, S. (1998). Institutional framing: Some experimental evidence.
Journal of Economic Behavior and Organization, 35, 455-464.
Fehr, E., & Tyran, J. R. (2001). Does money illusion matter? American Economic Review, 91, 1239-1262.
Ferraro, P. J., & Vossler, C. A. (2008). Stylized facts and identification in public good experiments: The
confusion confound. Georgia State University and University of Tennessee working paper.
Festinger, L. (1957). A Theory of Cognitive Dissonance. Evanston: Peterson Row.
Fleming, P., Townsend, E., Lowe, K. C., & Ferguson, E. (2007). Social desirability effects on
biotechnology across the dimensions of risk, ethicality and naturalness. Journal of Risk Research,
10, 989-1003.
Frank, B. L. (1998). Good news for the experimenters: Subjects do not care about your welfare.
Economics Letters, 61, 171-174.
French, J. R. P., Jr., & Raven, B. (1959). The bases of social power. In D. Cartwright (Ed.), Studies in
social power (pp. 150-167). (Ann Arbor: Research Center for Group Dynamics, University of
Michigan).
Frey, B. S., & Oberholzer-Gee, F. (1997). The cost of price incentives: An empirical analysis of
motivation crowding-out. American Economic Review, 87, 746-755.
Frey, B. S., Oberholzer-Gee, F., & Eichenberger, R. (1996). The old lady visits your backyard: A tale of
morals and markets. Journal of Political Economy, 104, 1297-1313.
Gale, J., Binmore, K. G., & Samuelson, L. (1995). Learning to be imperfect: The ultimatum game.
Games and Economic Behavior, 8, 856-890.
Gillespie, R. (1991). Manufacturing Knowledge: A History of the Hawthorne Experiments. Cambridge:
Cambridge University Press.
Goeree, J. K., Holt, C. A., & Laury, S. K. (2002). Private costs and public benefits: Unraveling the
effects of altruism and noisy behavior. Journal of Public Economics, 83, 255-276.
Golding, S. L., & Lichtenstein, E. (1970). Confession of awareness and prior knowledge of deception as
a function of interview set and approval motivation. Journal of Personality and Social Psychology,
14, 213-223.
Guerra, G., & Zizzo, D. J. (2004). Trust responsiveness and beliefs. Journal of Economic Behavior and
Organization, 55, 25-30.
Guth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum
bargaining. Journal of Economic Behavior and Organization, 3, 367-388.
Haley, K. J., & Fessler, D. M. T. (2005). Nobody’s watching? Subtle cues affecting generosity in an
anonymous economic game. Evolution and Human Behavior, 26, 245-256.
Hargreaves Heap, S., & Varoufakis, Y. (2002). Some experimental evidence on the evolution of
discrimination, co-operation and perceptions of fairness. Economic Journal, 112, 679-703.
Hargreaves Heap, S., & Zizzo, D. J. (forthcoming). The value of groups. American Economic Review.
Harrison, G. W., & Johnson, L. T. (2006). Identifying altruism in the laboratory. In D. Davis & R. M. Isaac (Eds.), Experiments Investigating Fundraising and Charitable Contributors (pp. 177-223). (Amsterdam and San Diego: Elsevier, Research in Experimental Economics, volume 11).
Hertwig, R., & Ortmann, A. (2001). Experimental practices in Economics: A challenge for
psychologists? Behavioral and Brain Sciences, 24, 383-451.
Hoffman, E., McCabe, K. A., & Smith, V. L. (1996). On expectations and the monetary stakes in
ultimatum games. International Journal of Game Theory, 25, 289-302.
Hoffman, E., McCabe, K., & Smith, V. (2000). The impact of exchange context on the activation of
equity in ultimatum games. Experimental Economics, 3, 5-9.
Hoffman, E., McCabe, K., Shachat, K., & Smith, V. (1994). Preferences, property rights, and anonymity
in bargaining games. Games and Economic Behavior, 7, 346-380.
Holt, C. A. (1995). Industrial organization: A survey of laboratory research. In J. H. Kagel, & A. E.
Roth (Eds.), The handbook of experimental economics (pp. 349-443). (Princeton: Princeton
University Press).
Houser, D., & Kurzban, R. (2002). Revisiting kindness and confusion in public good experiments.
American Economic Review, 92, 1062-1069.
Hrobjartsson, A., & Gotzsche, P. C. (2001). Is the placebo powerless? An analysis of clinical trials
comparing placebo with no treatment. New England Journal of Medicine, 344, 1594-1602.
Iyengar, R., & Schotter, A. (2008). Learning under supervision: An experimental study. Experimental
Economics, 11, 154-173.
Jones, S. R. G. (1984). The Economics of Conformism. Oxford: Blackwell.
Jones, S. R. G. (1992). Was there a Hawthorne effect? American Journal of Sociology, 98, 451-468.
Kienle, G. S., & Kiene, H. (1997). The powerful placebo effect: fact or fiction? Journal of Clinical
Epidemiology, 50, 1311-1318.
King, M. F., & Bruner, G. C. (2000). Social desirability bias: a neglected aspect of validity testing.
Psychology and Marketing, 17, 79-103.
Kirkpatrick, L. A. & Ellis, B. J. (2004). An evolutionary-psychological approach to self-esteem:
Multiple domains and multiple functions. In M. B. Brewer & M. Hewstone (Eds.), Self and Social
Identity (pp. 52-77). (Malden, MA: Blackwell).
Koch, A. K., & Normann, H.-T. (2007). Giving in dictator games: Regard for others or regard by
others? Royal Holloway working paper.
Kolata, G. (1998). Scientific myths that are too good to die. New York Times, June 15, 18.
Lei, V., Noussair, C. N., & Plott, C. R. (2001). Nonspeculative bubbles in experimental asset markets:
Lack of common knowledge of rationality vs. actual irrationality. Econometrica, 69, 831-859.
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal
about the real world? Journal of Economic Perspectives, 21, 153-174.
Liberman, V., Samuels, S. M., & Ross, L. (2004). The name of the game: Predictive power of
reputations versus situational labels in determining Prisoner’s Dilemma game moves. Personality
and Social Psychology Bulletin, 30, 1175-1185.
List, J. A. (2007). On the interpretation of giving in dictator games. Journal of Political Economy, 115,
482-493.
Lonnqvist, J.-E., Verkasalu, M., & Bezmenova, I. (2007). Agentic and communal bias in socially
desirable responding. European Journal of Personality, 21, 853-868.
Macefield, R. (2007). Usability studies and the Hawthorne effect. Journal of Usability Studies, 2, 145-154.
Markman, A. B., & Moreau, C. P. (2001). Analogy and analogical comparison in choice. In D. Gentner,
K. J. Holyoak & B. Kokinov (Eds.), The analogical mind: Perspectives from cognitive science (pp.
363-399). (Cambridge, MA: MIT Press).
Masclet, D., Noussair, C., Tucker, S., & Villeval, M.-C. (2003). Monetary and nonmonetary punishment
in the voluntary contributions mechanism. American Economic Review, 93, 366-380.
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. Chicago: Chicago University
Press.
Mayo, E. (1933). The Human Problems of an Industrial Civilization. New York: MacMillan.
Menzies, G. D., & Zizzo, D. J. (2005). Inferential expectations. Australian National University Centre
for Applied Macroeconomic Analysis Discussion Paper n. 12.
Milgram, S. (1963). Behavioral studies of obedience. Journal of Abnormal and Social Psychology, 67,
371-378.
Milgram, S. (1974). Obedience to Authority: An Experimental View. New York: Harper and Row.
Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular
reference to demand characteristics and their implications. American Psychologist, 17, 776-783.
Orne, M. T. (1973). Communication by the total experimental situation: Why it is important, how it is
evaluated, and its significance for the ecological validity of findings. In P. Pliner, L. Krames, & T.
M. Alloway, Communication and Affect (pp. 157-191). (New York: Academic Press).
Ortmann, A., & Hertwig, R. (2002). The costs of deception: Evidence from psychology. Experimental
Economics, 5, 111-131.
Parsons, H. M. (1974). What happened at Hawthorne? Science, 183, 922-932.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. Shaver, & L. S.
Wrightsman (Eds.), Measures of Personality and Social Psychological Attitudes (pp. 17-59). (San
Diego: Academic Press).
Perugini, M., Tan, J. H. W., & Zizzo, D. J. (2005). Which is the more predictable gender? Public good
contribution and personality. Social Science Research Network Discussion Paper, March.
Rice, B. (1982). The Hawthorne defect: persistence of a flawed theory. Psychology Today, 16, 70-74.
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the Classroom: Teacher Expectation and Pupils’
Intellectual Development. New York: Irvington.
Ruffle, B. (1998). More is better but fair is fair: Tipping in dictator and ultimatum games. Games and
Economic Behavior, 23, 247-265.
Sarin, R. K., & Weber, M. (1993). Effects of ambiguity in market experiments. Management Science,
39, 602-615.
Schotter, A. & Sopher, B. (2007). Advice and behaviour in intergenerational games: An experimental
approach. Games and Economic Behavior, 58, 365-393.
Schotter, A., & Sopher, B. (2006). Trust and trustworthiness in games: An experimental study of
intergenerational advice. Experimental Economics, 9, 123-145.
Schram, A. (2005). Artificiality: The tension between internal and external validity in economic
experiments. Journal of Economic Methodology, 12, 225-237.
Shanab, M. E., & Yahya, K. A. (1977). A behavioral study of obedience in children. Journal of
Personality and Social Psychology, 35, 530-536.
Shanab, M. E., & Yahya, K. A. (1978). A cross-cultural study of obedience. Bulletin of the
Psychonomic Society, 11, 267-269.
Shang, J., & Croson, R. (2008). Field experiments in charitable contribution: The impact of social
information on the voluntary provision of public goods. Indiana University working paper.
Shang, J., & Croson, R. (forthcoming). The impact of downward social information on contribution
decisions. Experimental Economics.
Sissons Joshi, M., Joshi, V., & Lamb, R. (2004). The Prisoner’s Dilemma and city-centre traffic. Oxford
Economic Papers, 57, 70-89.
Sitzia, S., & Zizzo, D. J. (2008). In search of product complexity effects in experimental retail markets.
Paper presented at the Centre for Competition Policy, University of East Anglia, June.
Slonim, R., & Garbarino, E. (2008). Increases in trust and altruism from partner selection. Experimental
Economics, 11, 143-153.
Smith, V. L. (1982). Microeconomic systems as an experimental science. American Economic Review,
72, 923-955.
Smith, V. L. (1994). Economics in the laboratory. Journal of Economic Perspectives, 8, 113-131.
Smith, V., Suchanek, G., & Williams, A. (1988). Bubbles, crashes and endogenous expectations in
experimental spot asset markets. Econometrica, 56, 1119-1151.
Sobel, J. (2005). Interdependent preferences and reciprocity. Journal of Economic Literature, 43, 392-436.
Starmer, C. (2000). Developments in non-expected utility theory: The hunt for a descriptive theory of
choice under risk. Journal of Economic Literature, 38, 332-382.
Stewart, N., Chater, N., Stott, H. P., & Reimers, S. (2003). Prospect relativity: How choice options
influence decision under risk. Journal of Experimental Psychology: General, 132, 23-46.
Stöber, J. (2001). The social desirability scale-17 (SDS-17). Convergent validity, discriminant validity,
and relationship with age. European Journal of Psychological Assessment, 17, 222-232.
Stocco, A., & Fum, D. (2008). Implicit emotional biases in decision making: The case of the Iowa
Gambling Task. Brain and Cognition, 66, 253-259.
Sunder, S. (1995). Experimental asset markets: A survey. In J. H. Kagel, & A. E. Roth (Eds.), The
Handbook of Experimental Economics (pp. 445-500). (Princeton: Princeton University Press).
Tan, J. H. W. (2006). Religion and social preferences: An experimental study. Economics Letters, 90,
60-67.
Tan, J. H. W., & Bolle, F. (2007). Team competition and the public goods game. Economics Letters, 96,
133-139.
Tan, J. H. W., & Zizzo, D. J. (2008). Groups, cooperation and conflict in games. Journal of Socio-Economics, 37, 1-17.
Tapia, M., Carretié, L., Sierra, B., & Mercado, F. (2008). Incidental encoding of emotional pictures:
Affective bias studied through event related brain potentials. International Journal of
Psychophysiology, 68, 193-200.
Thaler, R. H. (1988). Anomalies: The ultimatum game. Journal of Economic Perspectives, 2, 195-206.
Zizzo, D. J. (2003a). Verbal and behavioral learning in a probability compounding task. Theory and
Decision, 54, 287-314.
Zizzo, D. J. (2003b). You are not in my boat: Common fate and similarity attractors in bargaining
settings. Oxford University Department of Economics Discussion Paper 167.
Zizzo, D. J., & Tan, J. H. W. (2007). Perceived harmony, similarity and cooperation in 2 x 2 games: An
experimental study. Journal of Economic Psychology, 28, 365-386.
Figure 1 – The Microeconomic System for the Experimental Subject
Figure 2 – The Relationship Between Potential EDE and Experiment Objectives