Experimenter Demand Effects in Economic Experiments

Daniel John Zizzo*
School of Economics, University of East Anglia, Norwich NR4 7TJ, United Kingdom
[email protected]

July 2008

Social Science Research Network Discussion Paper, www.ssrn.com

Abstract

Experimenter demand effects refer to changes in behavior by experimental subjects due to cues about what constitutes appropriate behavior. We argue that they can either be social or purely cognitive, and that, when they may exist, it crucially matters how they relate to the true experimental objectives. They are usually a potential problem only when they are positively correlated with the true experimental objectives' predictions, and we identify techniques such as non-deceptive obfuscation to minimize this correlation. We discuss the persuasiveness or otherwise of defenses that can be used against demand effects criticisms when such correlation remains an issue.

Keywords: experimenter demand effects, experimental design, experimental instructions, social desirability, social pressure, framing, methodology.
JEL Classification Codes: B41, C91, C92.

* Phone: +44-1603-593668; fax: +44-1603-456259. I wish to thank participants in a presentation at Bologna Forli' and Nick Bardsley, Dan Benjamin, Pablo Branas-Garza, Gary Charness, John Duffy, Dirk Engelmann, Paul Ferraro, Alexander Koch, Charlie Plott and Jen Shang for useful feedback and references on the topic. The usual disclaimer applies. Partial financial support from the Nuffield Foundation for what in part has acted as a scoping study towards experimental work on social desirability is gratefully acknowledged.

1. Introduction

Experimenter demand effects (EDE for short) refer to changes in behavior by experimental subjects due to cues about what constitutes appropriate behavior (behavior 'demanded' from them). EDE have been connected, among others, to Milgram's (1974) experiments on fictional electric shocks being delivered by experimental subjects under the direct pressure of an experimenter, to the Hawthorne factory experiments where greater productivity seemed to occur when workers were the object of a sociological study (Gillespie, 1991), and to placebo effects in medicine where a patient feels better simply by knowing that he or she has received a medicine that is meant to make him or her feel better (Beecher, 1955). In the psychological questionnaire literature they are also recognized as a source of distortion in the responses provided (e.g., Paulhus, 1991; Lonnqvist et al., 2007). Their potential relevance for experimental economics has been cogently highlighted by Bardsley (2005, 2008), and is arguably more generally felt in journal refereeing activity, where experimental designs are criticized for falling afoul of EDE.1 It is clear that, insofar as (a) subjects work in a microeconomic system the rules of which are defined by the experimenter (Smith, 1982) and (b) the experimenter has a position of authority over the subjects in the laboratory, EDE are in principle a potential concern for economists designing their own experiments. They can also be more generally seen as a threat to the interpretability, and hence the internal validity, of economic experiments. This methodological paper provides a general discussion of what to do about EDE.

1 On occasion papers do explicitly try to defend themselves from this type of criticism (e.g., Ball et al., 2001, footnote 8; Benjamin et al., 2008, p. 25). A brief discussion of and warning against EDE is contained in Davis and Holt's (1993, pp. 26-27) textbook.
We try to ask the question of when experimental economists should be worried (or more worried) about EDE as a possible confound and, when they are a likely potential confound, what can be done about them. Our approach is pragmatic: while EDE can often not be entirely ruled out, the question is what can be done to minimize the relevance or plausibility of an EDE-based criticism in the context of a progressive experimental research paradigm.2 Crucial to this strategy, it will be argued, is the recognition that EDE are related to the beliefs that subjects hold about the experimental objectives, and that such beliefs may be linked in different ways with the true experimental objectives.

2 As the one defended in philosophy of science by Mayo (1996).

Section 2 briefly provides the conceptual framework for our investigation of EDE and reviews a number of examples for consideration. Section 3 considers EDE in more detail and reviews a set of examples related (or allegedly related) to EDE.3 Section 4 summarizes the empirical evidence from section 3 and draws lessons on when we should in principle be worried about EDE. Section 5 considers what can be done in this case to minimize the risk of EDE, and section 6 discusses defense strategies against EDE criticisms when EDE are still a possible issue. Section 7 concludes.

2. Conceptual Framework

EDE are about the relationship between the subject and the experimenter. The experimenter provides the microeconomic system, in terms of environment and institutions, in which the subject makes decisions (Smith, 1982, 1994). In standard laboratory experiments the subject goes to the lab, sits at a (computer or plain) desk and receives a set of instructions and possibly practice for the decision task, with the expectation that payment will be a function of the decisions that he or she makes.4 Monetary payments that are a function of the experimental task, and other usual experimental tools such as the lack of deception (Hertwig and Ortmann, 2001), are used to ensure a direct and salient connection between decisions taken and desired monetary outcomes, and thereby to ensure the interpretability, and hence the internal validity, of the experiment. This is true even in experiments trying to identify other regarding preferences (e.g., Camerer, 2003, and Sobel, 2005, for reviews), insofar as the effect of these preferences can only be isolated when set against the benchmark of predictions under pure self-interest.

Because the experimenter provides the microeconomic system the subject works with in the laboratory, such as the experimental instructions, it is unavoidable that the experimenter is in a position of authority relative to subjects. The experimenter has both legitimacy and expertise, which have been recognized as sources of authority (French and Raven, 1959).5 That research is conducted by Faculty members whereas ordinarily subjects are students, and that research is conducted in the laboratory, which is under the physical control of the experimenter, compound the vertical nature of the relationship between experimenter and subjects.
So do more avoidable factors present in some experiments but not others, such as the physical presence of the experimenter (where noticeable) or the use of a sample of one's own students to run experiments. Figure 1 provides a stylized representation of how the subject relates to the experimenter and (in interactive experiments) to peers, i.e. to other subjects in the experiment.

(Insert Figure 1 about here.)

3 A number of examples are discussed in this paper. As it is not meant to be a survey, and the amount of experimental research that could in principle be affected by EDE is immense, they are, of course, provided without any claim of exhaustiveness.
4 Payment may, of course, also be a function of luck and of the decisions of other subjects, depending on the type of experiment.
5 Blass (1999) found that subjects weigh legitimacy and expertise as the two most important sources of the obedience to authority found in Milgram's obedience experiments (which are discussed below under Example 1).

Experiments generate data, and, in order for the data to be externally generated - as opposed to being equivalent to a computer simulation whose data are created by the researcher -, the decision making problem faced by subjects is by definition incomplete. By this I mean that there is a decision or set of decisions that is in the hands of subjects and which is not fully defined by the experimenter or by peers. Subjects then try to make sense of the unfamiliar and incompletely defined experimental environment based on the instructions, cues and feedback they receive. For example, valuations may be influenced by the menu of available options (e.g., Stewart et al., 2003) and anchored to prior information (e.g., Ariely et al., 2003), and framing cues may be used to trigger behavioral schemata that, by similarity, may be applied to the decision problem at hand (e.g., Gale et al., 1995; Markman and Moreau, 2001; Zizzo and Tan, 2007).

In interactive experiments subjects form beliefs and choose actions with respect to peers, and possibly receive information about the actions of others and the social pressure being placed on them to play in a socially acceptable way. Pressure may be received simply by learning what the other players have chosen (e.g., Breitmoser et al., 2007), or in some experiments explicitly through the receipt of advice on what to do (e.g., Schotter and Sopher, 2006; Iyengar and Schotter, 2008). If these horizontal effects due to subject - peer interaction matter, we may expect them to matter potentially more in the context of the vertical nature of the experimenter - subject relationship. As discussed earlier, the experimenter has both expertise and legitimacy, and therefore authority. As the experimenter knows best about the experiment, we may expect the subject to be especially sensitive to his or her instructions and cues in order to make the decision problem more complete (for example, to know which 'real world' behavioral schema is most suited for the task at hand). This sensitivity to the cues provided may work through implicit (i.e., unconscious) cognitive mechanisms: there is no reason for subjects to be explicitly aware of it, a point we shall return to in section 6. These purely cognitive EDE simply follow from the position of expertise of the experimenter, and do not need any exercise of authority as such by the experimenter.
Stronger forms of EDE, in addition to this cognitive dimension, benefit from the perceived social pressure that the experimenter, as an authority, explicitly or implicitly puts on a subject through instructions and cues. The subject forms beliefs about the experiment objectives, and his or her actions can be played out in the direction that he or she believes to be congruent with such objectives. These EDE have been previously conceptualized by psychologists (e.g., Orne, 1962, 1973).6 We label these stronger potential confounds as social EDE, to imply that, in addition to the cognitive dimension, there is a social pressure dimension to them, and the two may of course interact. Social EDE also need not be conscious, though they may well be so.

6 Orne (1973, p. 163) explicitly defines EDE as being related to the subject seeing "as his task to ascertain the true purpose of the experiment and respond in a manner which will support the hypotheses being tested."

Before discussing examples of social and purely cognitive EDE, a qualification is in order. While it is easy to identify paradigmatic examples of social EDE and of purely cognitive EDE - and we shall do so shortly -, the distinction does have a gray area of cases in the middle where the classification is arguably debatable. That being said, nothing important in this paper turns on a specific example being classified as a social EDE or as a purely cognitive EDE.

3. Considering EDE in More Detail

3.1 Social EDE

Social EDE can have a number of sources, such as social conformism of the kind also found in relation to horizontal social pressure (e.g., Jones, 1984; Asch, 1956); desire for respect (e.g., Ellingsen and Johannesson, 2007, 2008), which may be stronger in relation to an authority figure than in relation to horizontal relationships with unknown strangers, though not necessarily so if the other subjects are known or engaged in repeated social interaction; or straightforward obedience to authority attitudinal characteristics (e.g., Blass, 1991). Another possible source is concern for the experimenter's welfare, but the evidence on this is mixed.7 Whatever their source, social EDE imply that, if subjects are told or given a hint of what to do, we may expect them to be more likely to do it. Depending on the true experimental objectives, this will turn out not necessarily to be a problem (section 4), but of course it could be.

7 Frank (1998) has a clever experiment where payoffs not realized by subjects in an ultimatum game are literally burnt in front of them, instead of being kept by the experimenter. He finds this makes no difference to subjects' behavior. Harrison and Johnson (2006) note that the identity of the recipient of the money left unspent in a dictator game experiment variant - whether it is not specified, an un-named charity, a third player in the room, or no one specific - mattered for dictator game allocation. However, their results can arguably be explained simply by other regarding preference coefficients towards contributing towards the third player or a charity.

Example 1. Perhaps the strongest social EDE manipulation present in the existing experimental literature is the one in Milgram's (1963, 1974) obedience experiments. A subject was told he or she was the 'teacher' of another subject (in truth, just an actor), who had to answer a set of questions and was strapped to an electric chair. Each time an incorrect answer was given, the
subject was asked to press a button that was meant to produce increasing electric shocks (from 15 to 450 volts) and to which the actor reacted accordingly in pain (as if the shock were real). If the subject hesitated, the experimenter insistently demanded that he or she continue. Over two thirds of the subjects obeyed all the way to giving shocks of 450 volts. Milgram's findings have been replicated under a range of variants and subject pools (e.g., Shanab and Yahya, 1977, 1978; Blass, 1999). That being said, they represent an extreme case of social EDE at work in an experiment where the effect of such social EDE was itself the objective of the experiment.

Example 2. The results of a study of the effect of different lighting conditions on worker productivity at the Hawthorne Plant of the Western Electric Company in the 1920s and 1930s have been interpreted as implying that workers put in more effort simply because they were being studied (and independently of the lighting conditions). This interpretation, encoded by Mayo (1933), has been popular in the sociological and management literature (e.g., Adair, 1984; Gillespie, 1991). It has, however, come into question (e.g., Draper, 2006; Macefield, 2007). Alternative interpretations based on learning and/or feedback effects are plausible (Parsons, 1974); a regression analysis of the original Hawthorne data shows no support for a 'Hawthorne effect' interpretation (Jones, 1992); and attempts that have been made to replicate the original results have been unsuccessful (Rice, 1982). The comparative weakness and ambiguity of the evidence implies that, notwithstanding its enduring textbook appeal, it can hardly be used to argue for social EDE being an all pervasive confound.8 Similarly, while a putative Hawthorne effect might be related to teacher effects in which teachers' expectations affect the later performance of their students (Rosenthal and Jacobson, 1992), the underlying causal mechanisms are unclear (for example, if a teacher believes a student is good, she may behave differently towards that student, thus helping her to perform better). To the extent however that such effects might exist, they operate in a well defined direction: the subject has an obvious interpretation of what the authority's (the employer's, the teacher's) objective is - a better performance - and there is an obvious way in which action can be taken to facilitate such an objective, namely by putting in more effort.

8 Richard Nisbett once called the Hawthorne effect "a glorified anecdote": "once you have got an anecdote, you can throw away the data" (cited in Kolata, 1998).

Example 3. Bardsley (2008) draws an analogy between EDE and placebo effects in medicine. Placebo effects occur when a patient's medical condition improves simply because the patient knows that a medicine has been taken. Double blind trials controlling for placebo effects are standard procedure when validating the effectiveness of new medical treatments. Draper (2006) comments that, while there is clear evidence of placebo effects for some perceived variables such as pain, more generally the evidence is not as strong as traditionally believed (Kienle and Kiene, 1997; Hrobjartsson and Gotzsche, 2001). Insofar as there are any, placebo effects are characterized both by a clear perception of what the authority wants (it wants the subject to get better) and by a clear understanding of what action is required as a result (to feel better).

Example 4.
Psychologists routinely measure the extent to which subjects' responses to (non-incentivized) questionnaires are distorted by their tendency to respond in a 'socially desirable' way (e.g., Stober, 1991; Crowne and Marlowe, 1964). One dimension of social desirability is about impression management (Paulhus, 1991), as subjects try to put themselves in a positive light towards the experimenter by amplifying the good (e.g., kindness) and minimizing the bad (e.g., self-interest). To mention a specific example, Fleming et al. (2007) found that a social desirability bias extended to stated judgments of the risk of socially controversial technologies: groups who scored highly on social desirability measures judged socially contentious technologies (such as genetically modified insulin) riskier than groups who had low social desirability scores; conversely, no difference was found between groups for non-contentious technologies (such as replacement heart valves). Obviously this kind of distortion may be of concern in economic experiments, although the lack of incentives and of behavioral responses in the psychological studies in which social desirability has been measured and found relevant may mean that economic experiments are more insulated than their psychological counterparts. Also, while social desirability measures sometimes appear correlated with other psychological instruments - thus suggesting a distortion -, at other times they are not, or appear to have only a small influence (see the survey in King and Bruner, 2000). Crucially, the possibility of impression management lies in the belief that the subject has about the behavior that is socially desirable to the experimenter, and in his or her views on how to change responses in the socially desired direction.

Example 5. In experimental economics a classic example of social EDE is Binmore et al.'s (1985) bargaining experiment, where results more in alignment with self-interest were obtained than in Guth et al. (1982), but where instructions were specifically given asking subjects to be self-interested: "How do we want you to play? YOU WILL BE DOING US A FAVOUR IF YOU SIMPLY SET OUT TO MAXIMIZE YOUR WINNINGS." Thaler (1988) contains a discussion of the corresponding loss of experimental control, given that Binmore et al.'s (1985) objective was to show that subjects behaved according to the self-interest prediction.

Example 6. Croson and Marks (2001) considered the effect of exogenous recommended contributions in a threshold public good game. They found statistically significant behavioral changes as a result of the recommendations, although the likelihood of efficient provision was increased as a result of the recommendations only when valuations of the public good were heterogeneous. Shang and Croson (2008) describe a field experiment run as a set of phone calls as part of a fundraising campaign by a public broadcasting radio station, where in one treatment the caller mentioned the contribution of someone else ($75, $180 or $300) and then asked for a contribution amount. A similar experiment was run with a mailing campaign (Shang and Croson, forthcoming), and in both experiments the implicit recommendation provided had an impact on contributions. Building on Cason and Sharma (2007), Duffy and Feltovitch (2008) had subjects repeatedly play Chicken games after having received a 'recommendation by the computer program' on an action to take.
Recommendations had an impact on behavior, and initially three subjects out of four chose the recommended action out of the two available.9 When the advice was systematically bad or corresponded to non-equilibrium behavior, it had less of an impact than when it corresponded to a Nash equilibrium or to a payoff improving correlated equilibrium. In all these cases, the provision of an exogenous recommended contribution or action may operate as a form of demand by the experimenter on the subjects, in the context of a strategic environment where such a demand effect produced by the experimenter is aligned with the potential interest of players to coordinate and cooperate - though not always so in Duffy and Feltovitch, and it is interesting that in this case subjects learn to disregard the advice. There is an external validity justification for the manipulation, since we would expect that in the real world recommendations may indeed come from authorities (in the context of public good games) or from mediating third parties in a position of authority (in the context of Chicken games). Directly comparing recommendations by authorities with those by peers, for exactly the same treatment manipulation, may be a way of providing an indicative estimate of the size of this effect.10

9 Cason and Sharma had an initial compliance rate of slightly higher than 80% in their baseline treatment with recommendation. Unlike Duffy and Feltovitch, their instructions noted and stressed that the recommendation, if followed by both players, was payoff-enhancing.
10 The estimate would be indicative because, even where the recommendation comes from a peer, it may become part of the cognitive understanding of how the subject should play the game in the experiment, so there may be a purely cognitive EDE at work anyway.

Example 7. Chou et al. (2008) ran a set of best guessing game experiments, with a standard rule that the winner would be the person guessing closest to ¾ of the average guess of two players. One of their manipulations was to introduce a strong hint on how to play the game, by writing down (in bold characters, as a separate paragraph and with a figure on the side to stress the point) "Notice how simple this is: the lower number will always win". Chou et al. found that subjects largely followed this advice. In another manipulation they had what they label a "battle protocol" in which "your job is to choose how high to locate your troop on the hill, from 0 feet high to 100 feet high" and "you win the battle if your chosen location is higher than your opponent's". Subjects largely followed 'the job' they were given. Chou et al. interpret the findings as showing that the instructions enabled subjects to have a better game theoretic understanding of the guessing game. There is another possible interpretation: there was a social EDE at work and subjects simply did what they were told to do. As such, Chou et al. may not actually be measuring the levels of reasoning of the subjects, which, given the abstract nature of the game, is arguably the main reason why guessing games are of interest as such (e.g., Camerer, 2003).
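As an aside, the hint used by Chou et al. is a literally true statement about the two-player game, which is why it could be given without deception; the following short check is provided purely for illustration and is not part of their design. With guesses a and b, a ≤ b, the target is

\[
t = \tfrac{3}{4}\cdot\tfrac{a+b}{2} = \tfrac{3}{8}(a+b), \qquad
|b-t| - |a-t| = \left(\tfrac{5}{8}b - \tfrac{3}{8}a\right) - \left|\tfrac{5}{8}a - \tfrac{3}{8}b\right| = \min\!\left\{\,b-a,\ \tfrac{1}{4}(a+b)\right\} \ge 0,
\]

so the lower guess is always at least as close to the target, and strictly closer whenever the two guesses differ (and are not both zero); iterating this logic drives the game theoretic benchmark down to a guess of zero. Precisely because the hint coincides with that benchmark, compliance with it cannot by itself separate a social EDE from an improved understanding of the game.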
Example 8. Branas-Garza (2007) investigated the effect, in dictator games, of having (in bold characters and in an emphatic centered position) the following cue: "REMEMBER that he is in your hands" (i.e., he relies on you). This led to an increase in dictator giving. Branas-Garza argues that the frame increases the moral costs of not giving. However, he notes that the increase in giving was greater in a classroom setting run by the professor (for extra credit points) than in a regular double-blind lab version (for money), and interprets this as evidence that, the greater the authority delivering the cue, the greater its effect. While it may be difficult to draw clear lessons from comparing the classroom setting to the double blind version,11 it is clear that even in the lab version a social EDE is likely to be present and may be amplified by the subjects' perception of the dictator game setting (a point we shall return to in the next section). It is not clear whether the cue is effective because of social EDE, because of the salience of moral norms, or because of other factors such as guilt aversion and trust responsiveness, which would suggest that making the reliance of the recipient salient should increase giving (e.g., Battigalli and Dufwenberg, 2007; Bacharach et al., 2007).

11 The use of classroom volunteers vs. proper lab volunteers, the different type of incentives and the possibility of peer effects all contribute to making the comparison difficult. We return to classroom experiments in section 5.

Example 9. Cadsby et al. (2006) had subjects earn money by answering a questionnaire and then, in one treatment, 'required' subjects to pay 30% of their resulting income, which they were 'expected to indicate correctly'. In the baseline treatment no such experimenter demand was placed and an 'invitation to gamble' frame was used instead. They found that, in the first frame, truthful income reporting remained high even when the chance of tax fraud being discovered was as low as 1%. This is a clear case of social EDE, but one with a perfect external validity mapping, given that, in the real world, authorities also unequivocally demand tax payment.

3.2 Purely Cognitive EDE

While social EDE are normally either an explicit experimental manipulation or directly triggered by such a manipulation, the lack of explicit or implicit instructions to the subject on how to behave does not prevent the existence of weaker forms of EDE. This is true for at least two reasons. First, as discussed in section 2, subjects try to make sense of the unfamiliar and incompletely defined experimental environment based on the instructions, cues and feedback they receive, and the experimenter is the most qualified expert about the experiment from whom they can pick up cues about what the experiment is about and what they should do as a result. Second, there is the Heisenberg principle type of argument that, by the very fact of drawing the attention of subjects to the experimental variable of interest X, one is changing behavior in relation to X. This second point is framed as a philosophical one but is actually an empirical one: that human beings behave according to a Heisenberg type principle is an elegant conjecture but not much more than that; it requires answering in specific cases and may turn out to have differential validity in different settings. The first point needs to be taken into account but may be turned into an advantage insofar as the experimental economist is interested in drawing inferences in relation to specific contexts and frames; and task construal may be a matter of the real world as much as of the experimental laboratory.

Example 10. The debate on the degree to which experimental instructions should be context rich or context free is well known.
Context can help subjects' understanding; it may help external validity in relation to the specific real world context, although this relies on the subjects' specific expectations about the context being realistic; and, if the experiment is complex, context may to some extent be unavoidable. At the same time, however, context may distract subjects and allow them to carry over unrealistic scripts and expectations to the task; it may reduce the generality of the experiment; and it may induce EDE. Rather than adjudicating between the two views, here we focus more on this last point, by making the obvious point that EDE might be triggered by the use of loaded language as a framing device. To illustrate, in Baldry (1986), presenting a decision task in terms of tax payment rather than betting made subjects pay more taxes; Alm et al. (1992), however, did not find an effect of the usage of "tax" instructions, but this may have been because contributions were too high anyway, possibly influenced by the 'group' benefit framing in their instructions.12 Abbink et al. (2006) compared loaded versus neutral instructions in a bribery experiment. While the 'loaded' treatment had a lower bribery rate and bribery acceptance rate, the difference was too small to be statistically significant. These illustrative cases, where the frame can be clearly connected to an EDE, show that an effect may be present but may not be large and may be subject to ceiling effects. Exceptions do exist, however. Some relate to the dictator game (see Example 13 below). To mention another study, Liberman et al. (2004) had subjects play a Prisoner's Dilemma which was labeled either as a 'Community Game' or as a 'Wall Street Game'. Impressively, cooperation was roughly twice as high with the 'Community Game' frame. This can, however, be explained by a powerful interaction between the cue provided by the experimenter and the face-to-face manner in which the experiment was conducted. Vertical and horizontal social pressure were not disentangled, thus confounding the interpretation of the results.

12 See Tan and Zizzo (2008) for a discussion of group identity based framing, and Elliott et al. (1998), Cookson (2000) and Hargreaves Heap and Zizzo (forthcoming) for three examples of its effectiveness (see also Example 16 below).

Example 11. A stylized fact from research on experimental asset markets is the occurrence of speculative bubbles for dividend paying assets (e.g., Smith et al., 1988; Sunder, 1995). Lei et al. (2001) noted that these experiments have traditionally had just a single activity available - trading in the market for the asset - and argued that the source of the bubbles could be that subjects feel induced by the nature of the experiment to over-trade, thus pushing prices up and creating the bubbles. This, we may argue, would be a purely cognitive EDE.13 To address this, Lei et al. introduced an alternative interesting activity, in the form of a second market in which subjects could also trade. They also added a statement to the instructions (in bold black letters) stating that subjects were not required to participate in either of the markets if they chose not to, and that it was their decision whether to participate in one, both or neither.14 Lei et al. found that the trading volume decreased as a result of these changes to the design, suggesting that a cognitive EDE was indeed present.
However, prices were not statistically significantly lower in the experimental treatments than in the controls, denting the claim that the occurrence of speculative bubbles is an artifact of such an EDE. Experimental tests of market contestability theory are another area where the absence of an alternative interesting activity might induce a cognitive EDE (see Holt, 1995, for a review).

13 Another interpretation would be boredom. However, asset market experiments usually find bubbles early on in the experiment rather than, or much more than, late on. This is inconsistent with a boredom interpretation, as we may expect subjects to be more bored late on rather than early on.
14 The emphasized extra statement may have induced a reverse form of EDE in which subjects may have felt compelled to trade less, but, given the qualifications in the statement, this is not especially plausible.

Example 12. Attempts to disentangle 'confusion' from social preferences in public good contribution experiments have estimated that 'confusion' explains up to around ½ of contribution levels (Andreoni, 1995; Kurzban and Houser, 2002; Ferraro and Vossler, 2008). While it has been modeled on occasion as an error parameter in quantal response equilibrium models (e.g., Goeree et al., 2002), an open question remains as to what the source of 'confusion' actually is. Andreoni (1995) estimated confusion by comparing a standard public good contribution experiment with a rank tournament version with different incentives, while Kurzban and Houser (2002) and Ferraro and Vossler (2008) replaced human public good co-players with computer co-players, in the full knowledge of the subjects; Ferraro and Vossler's protocol emphasized that what the subject would do would have no impact on what the computer would later do, i.e. that the computer play was predetermined. One possibility is that the nature of the standard public good problem, in which deviation from Nash is only possible in the direction of contributing, may trigger a purely cognitive EDE requiring subjects to engage in some contribution. An alternative form of purely cognitive EDE may lie in subjects herding on the cue provided by the computer based contribution. Ferraro and Vossler informally report results from post-experimental focus groups which do not seem supportive of the first cognitive EDE. They seem to have found that subjects either understood the incentive structure, or they did not and herded on the computer player contributions, which is consistent with the second cognitive EDE; or they perceived the public good problem as an assurance game and acted accordingly. This last answer is consistent with subjects applying a behavioral schema from the real world which they felt appropriate even though it did not fit the strategic details of the experimental setup,15 and may be consistent with a social preference story and with a genuine reason why cooperation is observed in real world public good problems.16 The informal partial support for the second EDE suggests that the usual anchoring on the contribution of others found in public good experiments (e.g., Carpenter, 2004; Perugini et al., 2005) may in part be explained not by social preferences or by internalized social norms, but rather by EDE. The strength of these results, of course, depends on the extent to which we are ready to believe in the focus group results (see section 6 below).

15 For other such examples, in the context of ultimatum games, see Carter and McAloon (1996) and Hoffman et al. (2000).
16 Sissons Joshi et al. (2004) used survey techniques to show that car drivers in Oxford (U.K.) tend to perceive their city traffic congestion problem as an assurance game rather than as a standard social dilemma.
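To fix ideas on why deviations from the Nash benchmark can only take the form of contributing, consider, purely for illustration, a standard linear public good specification (the exact parameterizations of the studies cited above differ):

\[
\pi_i = e - g_i + m \sum_{j=1}^{n} g_j, \qquad 0 < m < 1 < nm,
\]

where e is the endowment, g_i (between 0 and e) the contribution of subject i, and m the marginal per capita return. Since each token contributed lowers the contributor's own payoff by 1 - m > 0, contributing nothing is a dominant strategy, while full contribution is efficient because nm > 1. Any departure from Nash behavior - whether driven by social preferences, confusion or an EDE - can therefore only show up as positive contributions, which is what makes these sources so hard to disentangle.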
Example 13. Dictator games are highly artificial settings in which subjects are asked to consider giving, and often give, significant amounts of money to strangers, while they rarely do so in the real world (Schram, 2005; Bardsley, 2008). Dictator giving appears very sensitive to apparently small changes in the design, such as changes in deservingness (e.g., Ruffle, 1998), the availability of a picture of the recipient (Burnham, 2003), other information provided on the recipients (Branas-Garza, 2006) and awareness of observation (Haley and Fessler, 2005). Depending on the experimental details, the fraction of subjects giving money varies over a wide range: for example, only around 10% of the subjects gave money in treatments by Hoffman et al. (1994, 1996) and Koch and Normann (2007), but over 95% did so in treatments by Aguiar et al. (2008) and Branas-Garza (2006). By their unusual nature, dictator games are typically done only once, although within-treatment manipulations are sometimes made (as in Andreoni and Miller, 2002). The structure of the dictator game makes it an obvious candidate for an EDE. Subjects are given money by the experimenter and their choice is simply to give it or not, with the recipient not having a say in the matter; they do the experiment only once or, even if they do it more than once, they do not receive any feedback after each play. A purely cognitive EDE would be sufficient for subjects to realize that the experiment's objective is about giving and that, therefore, they should be giving some money. The question would then be how much money they should give, and the clues given in the experimental setup can help in that respect, thus contributing to the sensitivity of dictator giving to small changes in the design. Since this EDE may operate in a purely cognitive way, a double blind design of the kind employed by Hoffman et al. (1994, 1996) and others would not in itself be sufficient to remove it, though it might help reduce any surplus social EDE that may also be present in such a setting.17 The clearest evidence of a purely cognitive EDE at work in dictator games comes from clever experiments by Bardsley (2008) and List (2007).18 In their common dictator game variant, subjects could not only give but also take money from the recipient. A shift to lower giving (and more taking) was observed, which can in part be explained by a subject sample with heterogeneous preferences,19 but can in part be attributed to a range dependence of the giving activity, which is consistent with a purely cognitive EDE interpretation.20 These findings suggest caution in drawing inferences on dictator games from the laboratory to the field. They may also make the interpretation of laboratory findings sometimes difficult, particularly as social dimensions may easily be present on top of the cognitive dimension of the EDE. We discussed one such example above (Example 8).

17 In a personal communication, Gary Charness has suggested that a reverse EDE may be at work: by stressing double anonymity in the design, the experimenter may induce subjects to give less.
18 'Clearest' does not mean unequivocal. Bardsley (2008) discusses alternative interpretations of his findings, though his preference is for an EDE interpretation. List (2007) puts his results in the context of the 'moral cost' framework developed in Levitt and List (2007).
19 For example, a self-interested subject in the dictator game gives 0, but takes as much as possible if taking is allowed. Bardsley (2008) also considered a pure taking game where subjects can take rather than give money, and used a double blind protocol throughout.
Other cases may be subtler. For instance, dictator game studies of social distance may confound horizontal with vertical social pressure effects (see Dufwenberg and Muren, 2006), while experimental instructions noting that subjects are 'entitled' to keep part of their endowment (thus implying that they are not entitled to keep the rest), asking subjects to justify their choices in writing to the experimenter, and manipulating the key experimental variables on a within-session basis (see Branas-Garza, 2006, and Aguiar et al., 2008) may be expected to be especially affected by EDE. We turn to the point of within-session manipulations next.

Example 14. It is a basic principle of experimental design that counterbalancing or randomization of the order of different tasks can be important for experimental control. Even where counterbalancing or randomization occurs, however, there is the potential danger of a purely cognitive EDE if subjects believe themselves able to glean information about the experimenter's objectives from the sequence of tasks at hand. Given a sequence of decision tasks, and based on their perceptions of the experimenter's objectives, subjects may aim to make their decisions consistent with each other in a way in which they normally would not, were they presented the tasks independently.21 This is not necessarily a problem. For example, in the context of individual choice experiments, an attempt at consistency should be aligned with the financial incentives in improving the goodness of fit of expected utility theory or other well defined utility functions. Therefore, if the experimental hypotheses are about anomalies requiring us to go beyond conventional approaches to decision making under risk (as defined by Starmer, 2000), evidence for such anomalies is made stronger by the fact that the cognitive EDE would operate in the opposite direction. A change in behavior for consistency's sake may also be practically more difficult if tasks change in multiple dimensions, thus making direct comparability harder and the experimenter's objectives less transparent. Equally, the possibility of purely cognitive EDE through within-session manipulations should not be ruled out in principle. For example, in Branas-Garza's (2006) and Aguiar et al.'s (2008) dictator games, the recipient's state was the only variable varied across three decision tasks, and it was varied in a way that made transparent how subjects ought to tailor their giving.22 Slonim and Garbarino (2008) had, on a within-session basis, a trust game and a modified dictator game which is identical to the trust game except that the recipient cannot return money; while their key message that partner selection is correlated with more giving is unaffected by EDE, the easy to obtain consistency between trust game and dictator game behavior makes claims on the robustness of the results across the two settings weaker.
Andreoni and Miller's (2002) dictator games varied multidimensionally across tasks, which should reduce the bias, but, since their paper is all about consistency of choices and they still did use dictator games with their limitations (Example 13), their within-session design is potentially worrisome in the light of the potential cognitive EDE.

21 There are a number of psychological reasons why this may be the case, having to do with cognitive dissonance (Festinger, 1957) and self-esteem management (Kirkpatrick and Ellis, 2004).
22 Namely, no information on the recipient; a 'poor' recipient for whom the money can be 'very useful'; and 'poor' recipients for whom the money can be 'very useful' and in relation to whom the donations are converted into medicines 'of great help'.

Example 15. We now turn to examples where the cognitive EDE potentially arises from Heisenberg principle type problems, in which experimental manipulations concerning the measurement of X might change behavior in relation to X. A first such case concerns experiments that combine questionnaires with behavior in games, but as long as the questionnaire comes after the behavioral part (e.g., Ben-Ner et al., 2004a, 2004b; Büchner et al., 2007), or is run in a separate session, say a couple of weeks earlier (Tan, 2006), the problem is addressed. The first solution retains, however, the danger of spurious congruence between questionnaire and behavioral responses of the kind discussed under Example 14. The danger is not as strong as sometimes claimed, since verbal and behavioral responses can be surprisingly divergent (Zizzo, 2003a), reflecting the distinction in cognitive psychology between explicit and implicit cognitive mechanisms.23 Whether the danger is realistic depends on how transparent the connection between the two parts is, in terms of subjects being able to clearly identify that the experimenter's objective is to seek a connection between the two parts, and in what way (see section 5). Often, batteries of different psychological instruments are run, making such an identification harder insofar as they are clearly about different things (e.g., Ben-Ner et al., 2004a, 2004b, use both personality and cognitive ability types of questionnaires). Of course, questionnaires are sometimes deliberately used to prime salience, e.g. of social norms (as in Benjamin et al., 2008), and in this case a potential EDE criticism arises.

23 We return to this point in section 6.

Example 16. Separating out experimental participants into two artificial groups or teams in order to identify the effects of ingroup-outgroup relationships is one such manipulation (e.g., Charness et al., 2007; Tan and Zizzo, 2008; Hargreaves Heap and Zizzo, forthcoming). In many cases the group manipulation is mapped into differences in the material incentive structure of the different teams, in which case subjects may make sense of the artificial groups based on their real world experience of competing teams (as in treatments present, for example, in Tan and Bolle, 2007, Bornstein et al., 2002, and Hargreaves Heap and Zizzo, forthcoming). At the other extreme, in social psychological research in the minimal group paradigm tradition (e.g., Brown, 2000, for a review), a simple dictator-like allocation task typically finds ingroup members being favored over outgroup members, which in itself - particularly given the artificiality of the dictator game task and the one shot nature of the task - could well be related to a purely cognitive EDE.
Somewhere in the middle, economic experiments with treatments where artificial groups have been induced but without changing the incentive structure (e.g., Hargreaves Heap and Varoufakis, 2002; Charness et al., 2007; Tan and Bolle, 2007) enable us to look into the 'pure' effects of groups but still need to respond to the potential cognitive EDE criticism. There are a number of ways in which these middle ground papers can defend themselves against this criticism: relative to the minimal group paradigm, they are based on the use of less artificial tasks, and the typically repeated nature of the interaction allows subjects to acquire familiarity with the decision task; in at least two cases (Zizzo, 2003b, and Charness et al., 2007), the mere introduction of groups was insufficient to produce behavioral results but introducing a stronger manipulation (without obvious extra cognitive EDE mapping) made a difference; more importantly, sometimes core findings can be identified that are not obviously explainable by a cognitive EDE story but which fit with genuine psychological mechanisms related to the existence of groups, such as the unequivocal inducement of negative inter-group discrimination in Hargreaves Heap and Zizzo (forthcoming) and to a lesser degree in Zizzo (2003b), or the evolution of conventions in Hargreaves Heap and Varoufakis (2002).

Example 17. Trust responsiveness implies that, in trust games, trustees are more likely to fulfill trust if they believe that trusters believe that they will fulfill trust (see Bacharach et al., 2007; Guerra and Zizzo, 2004); in that sense, trust responsiveness requires a causal link from a second order belief (a trustee's belief about a truster's belief) to behavior, and a direct experimental test of trust responsiveness (and its separation from other psychological mechanisms) requires the measurement of such a second order belief (as also, for example, in Dufwenberg and Gneezy, 2000). The potential purely cognitive EDE is that, by the very fact of measuring beliefs, attention is being drawn to trust responsiveness or similar psychological mechanisms (such as guilt aversion: Battigalli and Dufwenberg, 2007), therefore altering the behavior of subjects. Guerra and Zizzo (2004) directly tested this hypothesis, by comparing treatments with and without belief elicitation. Of course, if beliefs are not elicited we cannot verify whether a link between beliefs and behavior exists, since beliefs become unobservable. What can be verified, however, is whether any distortion in trusting or fulfilling rates occurs as the result of eliciting beliefs. Guerra and Zizzo found no evidence of this, thus casting doubt on the existence of an EDE in this setting.

Example 18. We have already considered cases where horizontal pressure might be combined and confounded with vertical EDE (Liberman et al., 2004, under Example 10; Dufwenberg and Muren, 2006, under Example 16). Another such case might be the availability of purely social punishment on the part of public good contribution game co-players (as in Masclet et al., 2003). Provision of horizontal advice (as in Schotter and Sopher, 2006, 2007, and Iyengar and Schotter, 2008) may also be confounded with a purely cognitive EDE, as the experimental design may clue subjects in on the fact that they should use the advice provided or take seriously the opportunity to give advice.
The problem here arises insofar as subjects can transparently read the objective of the experiment as requiring them to take the advice and its provision seriously, and act accordingly. One response to this is an external validity argument: in real world organizations, the authority does require advice and its provision to be taken seriously, and so the laboratory simply mirrors real world organizational setups. Another response would be, if feasible, to identify predictions that follow from horizontal advice but not from the EDE.

4. Empirical Relevance of Experimenter Demand Effects and Experimental Objectives

4.1 Setting the Scene

Our analysis of a set of (real or alleged) examples of EDE has shown both the potential explanatory power of demand effects in economic experiments and, at the same time, the limits of what they can plausibly explain if one relies on the existing evidence. It is clear from examples from outside economics, such as the Milgram obedience experiments (Example 1) and the social desirability research (Example 4),24 but also from cases of research in experimental economics where dramatic changes in behavior have occurred (Examples 5-9), that social EDE do have at least the potential to play havoc with experimental control.

24 For reasons discussed in section 3.1, they provide stronger evidence than Examples 2 and 3.

While potentially of relevance in a much wider set of experimental designs, the empirical case for purely cognitive EDE is only partial. It is clearest in the frequently used but highly artificial and arguably unrepresentative setup of dictator games (Example 13). The evidence from asset markets is only partial (the purely cognitive EDE affected trading volumes but not prices: Example 11), and that from public good contribution experiments is bounded by what is probably a fraction of the variance in contributions that is attributed to 'confusion' (Example 12, in terms of reversion to the mean contribution). Similarly, the existing evidence (insofar as there is some) points to only a limited effect in the context of the potential connection between framing and purely cognitive EDE (Example 10), and the one study which directly tested a Heisenberg principle type of purely cognitive EDE (Example 17) found no support. That being said, we admitted that a purely cognitive EDE may lurk, at least in principle, in other examples as well (Examples 14-16, 18).

When should we be worried about EDE? Figure 2 develops the part of the conceptual framework of Figure 1 that deals with the relationship between the subject's beliefs and actions and the experimenter's objectives.

(Insert Figure 2 about here.)

Based on the instructions, cues and pressure received, subjects form expectations about the experiment's objectives which may inform the actions they take. The key, often overlooked, point is how the subjects' expected experiment objectives and corresponding actions relate to the actual experiment objectives and predictions made by the experimenter.

4.2 Uncorrelated Expected and True Objectives

Assume that subjects are unable to form a view of the experiment objectives, or that, to the extent to which they are able to, the subjects' expected experiment objectives are orthogonal to the true objectives and predictions. That is, they do not plausibly imply actions that go either in the direction of or in the opposite direction of the experiment predictions. In this case we can say that the eventual EDE are uncorrelated with the true experimental objectives.
In Carter and McAloon (1996) subjects played either a standard ultimatum game or a similar but strategically different tournament game. The point of the experiment was to verify that, contrary to social preference models that predict different behavior in the two games, similar behavior would be obtained, thus creating a potential puzzle for social preference models. The technical nature of the predictions and the between-sessions nature of the design make orthogonality a plausible assumption. Similarly, in Fehr and Tyran's (2001) paper on money illusion, subjects either saw a payoff matrix the complexity of which could induce money illusion or saw one that did not. Subjects could figure out that the experiment was in part about the shock to the economy which they knew would occur mid-way through the session, but this expected experiment objective cannot explain a differential response of subjects under a complex payoff matrix or a simple one, i.e. it is orthogonal to the true objective. Hargreaves Heap and Zizzo's (forthcoming) focus was on determining whether intergroup discrimination was positive or negative (which was verified against a control treatment run on a between-sessions basis) and on evaluating the psychological valuation of groups based on neutrally framed markets for group membership. Although subjects could perceive that the experiment was about cooperation and groups, the way in which this might distort behavior (e.g., more cooperation in the aggregate across treatments) was orthogonal to the true experimental objectives. In Sarin and Weber (1993), both ambiguous assets and unambiguous assets were traded in markets on a within-session (or even within trading period) basis, and their objective was to look at whether ambiguity aversion carried over into markets. There is no obvious sense, however, in which the difference in ambiguity between assets should translate into a clear expected objective that one asset should be favored over the other. That being the case, we can plausibly predict any eventual EDE to be uncorrelated with the true experiment objective and therefore irrelevant to the focus of the paper. Although we discuss only four examples here, more could be given. The key point is that, even though an at least purely cognitive EDE might arise in principle, it would plausibly have no bearing on the key experimental predictions and tests of the paper, thus making EDE a non issue. Crucially, the true experimental objectives are obscure to the subjects, and as the expected experimental objectives are either also unclear or uncorrelated with the true experimental objectives, the subjects are unable to engage in actions that can act as a confound.

4.3 Negatively Correlated Expected and True Objectives

Some experimental designs are such that an EDE may be induced, but may be negatively correlated with, and so work against, the predictions implied by the true experimental objectives. In this case we can say, for short, that EDE and true experimental objectives are negatively correlated. As noted under Example 14, within-subject designs may facilitate behavioral consistency, and so, in individual choice experiments trying to find behavioral anomalies relative to expected utility theory and employing within-subject designs, any eventual purely cognitive EDE will operate against the true experimental objective.
Abbink et al. (2006), discussed under Example 10, aimed to show the robustness of behavior to the use of 'loaded' instructions, and so any EDE induced by the 'loaded' instructions should operate in the direction opposite to the true objective of the experiment. Menzies and Zizzo's (2005) true objective is to try to find evidence of sluggish belief adjustment, but a key reason why there may be such sluggish adjustment is inattention by economic agents (e.g., Carroll, 2003); since in their experiments subjects simply receive a signal each period and all they have to do is to make sense of it to revise their guess of the true state of the world, the resulting purely cognitive EDE may focus all of the subjects' attention on the signal and therefore may lead to less sluggish belief adjustment than would otherwise be found. Cason et al. (2002) use a public good experiment to find evidence of spite; since the public good experiment setup should if anything be biased towards cooperative behavior (Example 12 above), the purely cognitive EDE implied by the game setup again seems to operate against the true experimental objective.

In this scenario, unlike the previous one, EDE can act as a confound, but only in the sense of making it more difficult to show statistically significant evidence in support of the true experimental objectives. On the one hand, if no evidence is found in support of the true experimental objectives, this could be due not to a genuine failure of the corresponding hypotheses but rather to EDE. On the other hand, if statistically significant evidence is found in support of the true experimental objectives, its persuasiveness is reinforced instead of weakened by the knowledge that there may be potential EDE working in the opposite direction.

4.4 Positively Correlated Expected and True Objectives

The potentially most problematic scenario is one where, if they exist, the EDE are positively correlated with the true experimental objectives: that is, the EDE are positively correlated with the predictions implied by the true experimental objectives. In this case, and assuming that investigating EDE is not in itself the true experimental objective, if we observe behavior that appears to support the hypotheses related to the true experimental objectives, we cannot in principle be sure about the extent to which such behavior is genuinely due to such hypotheses or due to EDE. This positive correlation - not the mere existence of EDE - is what creates a potential confound problem. Examples of experiments with positive correlation have been considered throughout section 3, among others Binmore et al.'s (1985) bargaining experiment (Example 5), the provision of advice on contributions and coordination (Example 6), instructions on how to play the best guessing game (Example 7), behavior in public good (Example 12) and dictator (Example 13) games, within-session manipulations potentially inducing EDE aligned with the true experimental objectives as in Andreoni and Miller (2002; Example 14), behavior and questionnaires (Example 15), and belief elicitation (Example 17).
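One stylized way of summarizing the three scenarios of sections 4.2-4.4 is with a minimal regression sketch, offered purely for illustration (how a specific design maps into its terms is an assumption, not a general result). Write subject behavior as

\[
y = \beta x + \gamma d + \varepsilon ,
\]

where x is the design variable tied to the true experimental objective, d is the unobserved demand effect component driven by the subjects' expected objectives, \beta is the effect of interest and \gamma \ge 0 measures the strength of the EDE. If the treatment comparison is treated as a regression of y on x alone, standard omitted variable logic gives

\[
\operatorname{plim} \hat{\beta} = \beta + \gamma \, \frac{\operatorname{Cov}(x,d)}{\operatorname{Var}(x)} .
\]

If Cov(x,d) = 0, the estimated effect is unbiased and EDE are a non issue (section 4.2); if Cov(x,d) < 0, the estimate is biased against the true objective, so statistically significant findings are, if anything, conservative (section 4.3); only if Cov(x,d) > 0 is the estimate inflated, and EDE become a genuine confound (section 4.4).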
5. Dealing with Experimenter Demand Effects

Before discussing some of the things researchers may do to minimize EDE, an important qualification is in order. In designing and running an experiment, researchers need to take into account both their true objectives and the theoretical and practical constraints at hand, in terms for example of the cognitive simplicity of the experimental environment, the duration of the experiment, the inability (in economic experiments) to engage in deception, the number of experimental sessions and treatments that can be run given the budget and the subject pool, and so on. Put simply, sometimes researchers may need to consciously accept a trade-off between different experimental objectives and constraints, and it may be optimal for them to accept some risk of an EDE as a result, rather than going for a corner solution where such risk is brought to zero. This does not necessarily mean that the experimental results are confounded, and section 6 will discuss strategies to deal with EDE criticisms where a prima facie case for an EDE positively correlated with the true experimental objectives exists and has not been fully dealt with at the experimental design stage.

Social vs. purely cognitive EDE. The distinction between social and purely cognitive EDE is relevant in thinking about how to deal with EDE for two reasons. First, most traditional design adjustments are really conceived with social EDE in mind. Second, unless there is a clear connection between what the experiment is trying to do and the kind of social cues that are required for social EDE to operate, social EDE are more straightforward to handle. Measures such as not running the experiment with classroom 'pseudo-volunteers'25 or avoiding the presence of a senior experimentalist in the experimental room are obvious ways to minimize the effectiveness that social cues may have, and so social EDE. So, more generally, is minimizing the social interaction between experimenters and subjects, the objective being to maximize the social distance between subject and experimenter; a procedure such as the one used by Ball et al. (2001), in which status was conferred – in part – through a public award ceremony, weakens the results of that paper because of the social cues reinforcing the status effect dimension.26 Double anonymity has been discussed as a tool under Example 13, but in practice it may be infeasible in all but the simplest of experiments, and it may impoverish the data that can be collected.27 Not telling subjects what to do, or – more subtly – avoiding loaded frames, is however normally feasible. It may be more difficult when the potential social EDE is intimately connected with the true objective of the experiment (without being the same), such as in the context of the provision of advice to analyze its impact on the play of correlated equilibria in Chicken games (as in Duffy and Feltovitch, 2008: see Example 6).

25 Eckel and Grossman (2000) present a dictator game experiment where (a) pseudo-volunteers give more than standard volunteers and (b) they are more sensitive to religious or altruistic preferences as measured in questionnaire instruments. They note how EDE may be one explanation for their findings. Of course, the fact that they use dictator games and combine them with a questionnaire instrument may make their results stronger than they otherwise would be (as discussed under Examples 13 and 14).

Changing the Decision Task. Due to their subtler nature, purely cognitive EDE potentially apply to a wider range of designs than social EDE, and as a result they may be harder to control completely.
In some cases, such as that of dictator games (Example 13) and, less evidently, that of public good contribution experiments (Example 12), the very nature of the decision problem implies the existence of potential purely cognitive EDE. Clearly, and following the spirit of Lei et al. (2001: Example 11), one solution is to alter the decision task in such a way as to minimize the purely cognitive EDE that might otherwise result (e.g., in Lei et al. with the introduction of an interesting alternative activity).

Non-Deceptive Obfuscation. Assuming that this is not possible or desirable for other reasons, as noted in section 4 the problem arises when, if they exist, EDE – and so the expected experimental objectives (Figure 2) – correlate with the true experimental objectives, especially when positively so. Using our conceptual framework, a possible solution then is to try to minimize such correlation between expected and true experimental objectives. Since deception is not allowed in economic experiments, and for good reasons (e.g., Hertwig and Ortmann, 2001; Ortmann and Hertwig, 2002), this attempt to minimize the correlation has to be played out under the constraint that deception not be used. I label the set of tools that can be used to achieve this non-deceptive obfuscation. That is, if the danger of a positive correlation exists, the experimenter can try (without using deception) to obfuscate the true experimental objectives or to modify the expected experimental objectives in such a way that the correlation is reduced.

26 Ball et al. (2001) claim not. They note that the status procedure is a deliberate treatment variable in their experiment, that any resulting EDE is unlikely to affect the auction results, and that a debriefing questionnaire asking subjects to describe their thought process and strategy did not find evidence that the status symbol assigned to each subject was relevant. The first point is discussed and criticized in section 6: that a procedure was implemented deliberately to induce status does not negate the fact that status recognition is not the same as deference to the vertical authority of the experimenter, which is the potential EDE here. The third point concerns the significance of 'postexperimental inquiries' (Orne, 1973), which shall be considered in section 6, but, taken at face value, it could be used against both a status and an EDE interpretation of the results, and thus cannot be used to prefer one to the other. Equally, the second point (an EDE should be neutralized by the auction mechanism) appears no less but also no more persuasive than saying that a status effect should be neutralized by the auction mechanism.

27 For example, it may prevent being able to map demographic data into the choices that subjects make. As an earlier footnote noted, it might also, in its own right, induce EDE.

What are some techniques that can be used to achieve non-deceptive obfuscation? One is the use of context-free (not simply non socially loaded) language that avoids tipping subjects in one direction or another, though of course (as discussed under Example 10) this has its disadvantages. Another is the use of contexts that reduce the connection with the true experimental objectives, such as the use of a products market frame in Fehr et al.'s (1993) study of sequential labor markets.
Third, while sometimes requiring too many resources to be feasible, between-sessions designs can often be used effectively to obfuscate the actual experimental objectives (the examples of Carter and McAloon, 1996, and Fehr and Tyran, 2001, are discussed in section 4.2). Fourth, when a within-session design is important and unavoidable, for example because of an interest in verifying the connection between questionnaire and behavioral responses (Example 15), requiring subjects to come to two different sessions separated in time by between one and three weeks (as in Tan, 2006) can be used as an obfuscation tool. Fifth, filler questions in questionnaires, or filler behavioral tasks, can be employed: the latter may be time-consuming and may dilute the financial incentives in the tasks that matter (for a given budget), but the former are an obvious tool to employ when questionnaires matter. In an experiment on emotional response elicitation (e.g., Bosman and van Winden, 2002), they may take the form, for example, of asking about a set of emotions of different kinds, thereby obfuscating any inference on which ones the experimenter is actually interested in and how they are connected with the rest of the experiment. Sixth, cues can be introduced in the experiment which, while not deceiving subjects as such, may help in the obfuscation exercise. For example, if a cue must be introduced pointing in one behavioral direction, another cue should be introduced pointing in the opposite direction, with the disclaimer that the subject should do as he or she pleases.28 This list is not meant to be exhaustive, but identifies specific ways in which potential EDE confounds can be reduced or removed thanks to non-deceptive obfuscation; a schematic sketch of the fifth and sixth techniques is given below.

28 The cue, of course, should not be deceptive. I use this technique in Sitzia and Zizzo (2008), where in an 'informed seller' treatment subjects are neutrally informed of a possible strategic factor they may wish to take into account, but are also informed of another (genuine) factor potentially working in exactly the opposite direction, and they are then explicitly told that it is up to them whether they wish to take one factor into account, the other one, both or neither.
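To make the fifth and sixth techniques concrete, the following is a minimal sketch, assuming a Python script is used to generate per-subject materials; the emotion items, cue wordings and the function name materials_for_subject are hypothetical illustrations and are not drawn from any of the studies cited above.

import random

# Hypothetical sketch of the fifth and sixth obfuscation techniques described above.
# The emotion items, cue wordings and factor names are illustrative placeholders only.

TARGET_EMOTIONS = ["anger", "irritation"]            # emotions of genuine interest
FILLER_EMOTIONS = ["joy", "surprise", "boredom", "pride", "sadness"]

# Technique six: opposing cues plus a disclaimer. Both factors must be genuine
# features of the experimental environment, so that no statement is deceptive.
CUE_UP = "You may wish to consider factor X, which could make a higher choice attractive."
CUE_DOWN = "You may wish to consider factor Y, which could make a lower choice attractive."
DISCLAIMER = "It is entirely up to you whether you take either factor, both, or neither into account."

def materials_for_subject(subject_id, rng):
    # Technique five: embed the target items among fillers and shuffle the order,
    # so subjects cannot infer which emotions the experimenter cares about.
    questionnaire = TARGET_EMOTIONS + FILLER_EMOTIONS
    rng.shuffle(questionnaire)
    # Counterbalance the order of the two opposing cues across subjects.
    cues = [CUE_UP, CUE_DOWN] if subject_id % 2 == 0 else [CUE_DOWN, CUE_UP]
    return {"questionnaire": questionnaire, "cues": cues + [DISCLAIMER]}

rng = random.Random(42)   # fixed seed so the generated materials can be audited
for subject_id in range(2):
    print(materials_for_subject(subject_id, rng))

The point of such a script is simply that the obfuscation (filler items, counterbalanced opposing cues, explicit disclaimer) is built into the materials themselves, rather than left to ad hoc adjustments at the session stage.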
6. Defenses against Experimenter Demand Effect Critiques

We now consider the scenario in which there is a positive correlation between EDE and true experimental objectives and the likelihood of a potential EDE is significant enough that it may act as a potential confound. Are there defenses that can be put forward against criticisms that the experimental results are invalidated by EDE? We discuss six here.

1. The EDE as the objective of the experiment. If identifying an EDE is the objective of the experiment (as for example in Milgram, 1963, Example 1, or Bardsley, 2008, Example 13), then obviously it is not a confound variable. Casting this point in terms of experimental objectives is crucial to avoid the potential confusion arising from Davis and Holt's (1993, p. 26) phrasing of this point in terms of the EDE being a treatment variable. This may induce the confusion (e.g., Ball et al., 2001, footnote 8) that deliberately engaging in an EDE manipulation as a treatment variable makes it right. This is only true if what the experiment and the resulting paper aim to achieve is identifying an EDE; if the aim is, say, to identify status effects in markets (as in Ball et al., 2001) or to argue for a moral cost model of reasoning (as, partially, in List, 2007), then the potential confound criticism will obviously hold, and possibly more seriously so because of the difficulty of separating these interpretations from an EDE when the behavioral correlation between the two is perfectly positive in the experimental data.

2. The external validity defense. An EDE that parallels or helps reproduce an important feature of the real world setup the experiment is trying to model is an EDE that may strengthen the experiment by enhancing its external validity. As already noted in the discussion of Example 6 (e.g., Croson and Marks, 2001; Shang and Croson, 2008), advice provided from a position of authority – mirroring real world settings where advice may be provided within such a vertical relationship – is an obvious setup where an external validity defense holds. So is Cadsby et al.'s (2006) treatment requesting subjects to pay their taxes, as discussed in Example 9, as is sometimes the use of a context-rich frame as considered in Example 10, or possibly the provision of horizontal advice in Example 18. Care has to be taken, however, that the mapping with the real world is structural and not superficial. For example, using a stock market frame with real world traders may seem a good idea in terms of external validity, but traders will then find it easier simply to apply their own behavioral schemata from real world trading, which may have little to do with the actual trading environment they are facing in the laboratory; if the true experimental objective was to find behavioral anomalies, this might then be problematic.

3. The magnifying glass argument. I shall label a special version of the external validity defense the 'magnifying glass' argument. It goes like this: an EDE may be legitimate if it magnifies the relevance of a dimension which, in the real world, is (a) present to a stronger degree than in the laboratory and/or (b) cognitively familiar from experience and therefore easier to make sense of than in the context of an unfamiliar experimental environment. The EDE would be a tool employed by an experimenter in the same way in which a scientist may use a magnifying glass or a microscope: to better, if artificially, identify effects which otherwise may not be observable. As an example, consider the stress experimenters place on monetary incentives in the instructions. The EDE thus induced may magnify the salience of the monetary incentives, and, as monetary incentives at the margin are higher in many real world economic settings, may help with the external validity of the experiment by making the results more widely applicable.29 That being said, however, if the experiment is on social preferences, then the potential distorting effects on motivation might be significant (e.g., Frey et al., 1996; Frey and Oberholzer-Gee, 1997) and the existence of a positive correlation between EDE and true experimental objectives would considerably weaken the magnifying glass argument.30 As another example, consider Lei et al.'s (2001) finding that the absence of an interesting alternative activity induces a greater volume of trade in asset markets (though not a distortion in prices: see Example 11).
Since most asset markets of any relevance in the real world are much 'thicker' markets – in terms of volume of trade – than experimental markets with just a few (or even just a few tens of) traders, the magnifying glass argument can be used to suggest that, as long as the experimental objective is not about the volume of trade, the EDE has a beneficial effect in making the experimental market closer to real world asset markets. There may still be settings, however, where the implied excess market activity from not having an interesting alternative may be positively correlated with the true experimental objectives in ways which are detrimental: not only if the experiment is about volumes of trade, but also, for example, if it is about contestable markets, with the excess activity implying excess entry in the market, which would be positively correlated with finding evidence for the significance of contestability (see discussion in Holt, 1995).

29 That is, of course, if one believes that the results may be sensitive to the size of the monetary incentives.
30 Example 5 above is an extreme example of this.

As a third example, consider Benjamin et al.'s (2008) use of a priming questionnaire to elicit ethnic, gender and race related social norms relative to a control. It can be argued that the resulting purely cognitive EDE works as a magnifying glass to identify the differential potential impact of the social norm on behavior. The potential concern, which cannot be addressed by their experiment, might be if the magnifying glass were to work too well, thus inducing behavioral changes to an extent that would not be observed in the real world; the qualitative result, however, might still be interesting in a first paper on the topic.

4. The postexperimental inquiry defense. The last three defenses revolve around empirical evidence that can be used against an EDE critique. The traditional response to EDE is to encourage the use of 'postexperimental inquiries' (a terminology used by Orne, 1973), which take the shape of questionnaires, verbal debriefing or focus groups, possibly involving roleplay (e.g., Orne, 1962, 1973; Bardsley, 2005, 2008). Under Example 12 we considered Ferraro and Vossler (2008) as a potentially insightful instance of the use of focus groups, Benjamin et al. (2008) use a direct "were you thinking about what we wanted you to do" written question approach, and Ball et al. (2001) contains a third example. Other things being equal, any evidence is obviously better than no evidence. There are, however, reasons to be cautious about this traditional response. One source of caution, noted by psychologists (Orne, 1962, 1973; Golding and Lichtenstein, 1970), is the fact that subjects may be aware "that they ought not to catch on some aspects of the experimental procedure" and that, if they reveal they do, "their data cannot be used" (Orne, 1973, p. 11); the resulting EDE is clearly aligned with the experimenter's incentive not to dig too deep for the same reason.31 But, regardless of the plausibility or otherwise of this specific EDE, it points to a more general problem. Postexperimental inquiries are not subject to the same experimental control that economists require of their experiments: they are not incentivized and, as they come at the end of the experiment, subjects will typically be demotivated, possibly tired and simply wishing to get paid and leave the room.
Furthermore, even at a minimum (written questionnaires) subjects provide feedback directly to the experimenter in giving their responses; the social dimension of the data generation process is even stronger in the case of verbal debriefing – as the subject verbally and visually interacts with the experimenter – and perhaps strongest in the case of focus groups – where horizontal social cues have plenty of opportunity to interact with the vertical authority and cues of the experimenter as focus group leader. Lack of motivation and the vertical social dimension can make postexperimental inquiries paradoxically less interpretable and more subject to EDE than the economic experiments whose EDE they are supposed to control for.32 The problem is made worse by the fact that, while written questionnaires are less informative, the protocols used precisely in verbal debriefing and focus group sessions are typically less controlled, and thus more subject to subtle or not so subtle unchecked cues, than the written instructions of the experiment proper. It is also made worse by the well known dissociation between explicit and implicit cognitive mechanisms (e.g., Shanks and St. John, 1994): subjects may not realize how their behavior may have been affected by the experimental environment or, indeed, by their own biases, even though in practice it has.33

5. The direct experimental evidence defense. A less controversial defense is simply to note that in a given setting there is direct experimental evidence against a hypothesized EDE. One could, for example, rely on Lei et al. (2001) to argue that the absence of an interesting alternative activity does not distort asset prices and so, if the positive correlation between the hypothesized EDE and the true experimental objectives were to come from co-moving prices, this is not really a problem. Similarly, one could rely on Guerra and Zizzo (2004) to defend the use of belief elicitation together with a behavioral task (Example 16).34 More generally, the earlier discussion summarizing the empirical evidence on EDE (section 4.1) is relevant here, and shows how, with exceptions, it may be easier to use the direct empirical evidence defense in the context of purely cognitive EDE than in the context of social EDE. More empirical research on EDE is clearly needed (Bardsley, 2005).

6. The indirect experimental evidence defense. We discussed how, when the true experimental objectives are connected to hypotheses that predict behavior which is either uncorrelated or negatively correlated with the EDE, support for those hypotheses cannot be criticized on the basis of EDE, and indeed the persuasiveness of the evidence is even stronger in the case of negative correlation (sections 4.2 and 4.3). The same argument may be applied more generally, whenever there are notable behavioral patterns – ideally formulated as ex ante experimental hypotheses – that cannot be explained by postulating the hypothesized EDE, or (more strongly) for which the EDE makes an opposite prediction.

31 Based on these points, Orne (1962, 1973) does not favour a direct question approach.
32 Put differently, either we believe that EDE are potentially important empirically, in which case postexperimental inquiries are not the solution because of their sensitivity to EDE, or we believe they are not, in which case postexperimental inquiries are not a good use of limited experimental time and resources.
33 The literature on this dissociation is considerable: for just three examples, see Stocco and Fum (2008), Tapia et al. (2008) and Zizzo (2003a).
34 See Orne (1973) for a discussion of 'nonexperiments' and 'simulation techniques' as creative ways of gathering direct experimental evidence by running additional control treatments, though questions may be raised about their applicability to economic experiments (e.g., in terms of incentive and learning issues, or the dissociation between cognitive mechanisms discussed under defense 4).

The reason for the emphasis on notable behavioral patterns ideally hypothesized ex ante is clear: EDE may be important and yet not explain 100% of the variance, and, if so, there will quite likely be behavioral patterns that cannot be explained by EDE but which can be found with suitable ex post data mining. The defense is also clearly stronger when the non-EDE predicted behavior is negatively correlated with the EDE predictions. As an example of how this defense can be employed, Bacharach et al. (2007) used belief elicitation combined with different trust game variants, and hypothesized and found that the degree of trust responsiveness, as defined by the correlation between the belief that one is being trusted and the fulfillment of trust, was sensitive to the type of trust game used. While significant, this prediction was not in itself a key experimental objective, but it nevertheless provided indirect evidence against EDE being behind the trust responsiveness result.

7. Conclusions

This paper considered the question of when experimental economists should be concerned about experimenter demand effects (EDE) as a possible confound, and, when they are a likely potential confound, what can be done about them. We distinguished between social and purely cognitive EDE and considered some examples. EDE – especially social EDE – exist but are easier to avoid; purely cognitive EDE are subtler, but the empirical evidence for them is nowhere near as cogent, except in specific settings such as dictator games. When EDE may exist, it crucially matters how they relate to the true experimental objectives. EDE are a potentially serious problem only when they are positively correlated with the true experimental objectives.35 A number of strategies to minimize this correlation were identified, including non-deceptive obfuscation of the true experimental objectives. That being said, and given the trade-offs implicit in designing and running an experiment, researchers may decide to accept the risk of an EDE even when this is not the objective of the experiment; indeed, EDE can even be used as an experimental tool to ensure or increase the external validity of the experiment. There are pitfalls in the traditional response to EDE that postexperimental debriefing evidence should be used, but direct or indirect experimental evidence may be especially relevant in defending a paper against an EDE criticism. Obviously more experimental research would be useful.

35 They can also be a potentially serious problem if they are negatively correlated with the true experimental objectives and no support for the corresponding hypotheses is found.

References

Abbink, K., & Hennig-Schmidt, H. (2006). Neutral versus loaded instructions in a bribery experiment. Experimental Economics, 9, 103-121.
Adair, G. (1984). The Hawthorne effect: a reconsideration of the methodological artifact. Journal of Applied Psychology, 69, 334-345.
Aguiar, F., Branas-Garza, P., & Millar, L. M. (2008). Moral distance in dictator games. Judgment and Decision Making, 3, 344-354.
Alm, J., McClelland, G. H., & Schulze, W. D. (1992). Why do people pay taxes? Journal of Public Economics, 48, 21-38.
Andreoni, J. (1995). Cooperation in public goods experiments: Kindness or confusion? American Economic Review, 85, 891-904.
Andreoni, J., & Miller, J. (2002). Giving according to GARP: An experimental test of the consistency of preferences for altruism. Econometrica, 70, 737-753.
Ariely, D., Loewenstein, G., & Prelec, D. (2003). Coherent arbitrariness: Stable demand curves without stable preferences. Quarterly Journal of Economics, 118, 73-105.
Asch, S. E. (1956). Studies of independence and conformity: I. A minority of one against a unanimous majority. Psychological Monographs, 70, 1-70.
Bacharach, M., Guerra, G., & Zizzo, D. J. (2007). The self-fulfilling property of trust: An experimental study. Theory and Decision, 63, 349-388.
Baldry, J. C. (1986). Tax evasion is not a gamble. Economics Letters, 22, 333-335.
Ball, S., Eckel, C., Grossman, P. J., & Zane, W. (2001). Status in markets. Quarterly Journal of Economics, 116, 161-188.
Bardsley, N. (2005). Experimental economics and the artificiality of alteration. Journal of Economic Methodology, 12, 239-251.
Bardsley, N. (2008). Dictator game giving: Altruism or artefact? Experimental Economics, 11, 122-133.
Battigalli, P., & Dufwenberg, M. (2007). Guilt in games. American Economic Review Papers and Proceedings, 97, 170-176.
Beecher, H. K. (1955). The powerful placebo. Journal of the American Medical Association, 159, 1602-1606.
Benjamin, D. J., Choi, J. J., & Strickland, A. J. (2008). Social identity and preferences. Cornell University and Institute for Social Research mimeo, February.
Ben-Ner, A., Putterman, L., & Kong, F. (2004a). Share and share alike? Gender-pairing, personality, and cognitive ability as determinants of giving. Journal of Economic Psychology, 25, 581-589.
Ben-Ner, A., Putterman, L., Kong, F., & Magan, D. (2004b). Reciprocity in a two-part dictator game. Journal of Economic Behavior and Organization, 53, 333-352.
Binmore, K., Shaked, A., & Sutton, J. (1985). Testing noncooperative bargaining theory: A preliminary study. American Economic Review, 75, 1178-1180.
Blass, T. (1991). Understanding behavior in the Milgram obedience experiment: The role of personality, situations, and their interactions. Journal of Personality and Social Psychology, 60, 398-413.
Blass, T. (1999). The Milgram paradigm after 35 years: Some things we now know about obedience to authority. Journal of Applied Social Psychology, 29, 955-978.
Bornstein, G., Gneezy, U., & Nagel, R. (2002). The effect of intergroup competition on group coordination. Games and Economic Behavior, 41, 1-25.
Bosman, R., & van Winden, F. (2002). Emotional hazard in a power-to-take experiment. Economic Journal, 112, 147-169.
Branas-Garza, P. (2006). Poverty in dictator games: Awakening solidarity. Journal of Economic Behavior and Organization, 60, 306-320.
Branas-Garza, P. (2007). Promoting helping behavior with framing in dictator games. Journal of Economic Psychology, 28, 477-486.
Breitmoser, Y., Tan, J. H. W., & Zizzo, D. J. (2007). The enthusiastic few, peer effects and entrapping bandwagons. Social Science Research Network Discussion Paper, March.
Brown, R. (2000). Group Processes, 2nd ed. Oxford: Blackwell.
Büchner, S., Coricelli, G., & Greiner, B. (2007). Self centered and other regarding behavior in the solidarity game. Journal of Economic Behavior and Organization, 62, 293-303.
Burnham, T. C. (2003). Engineering altruism: A theoretical and experimental investigation of anonymity and gift giving. Journal of Economic Behavior and Organization, 50, 133-144.
Cadsby, C. B., Maynes, E., & Trivedi, V. U. (2006). Tax compliance and obedience to authority at home and in the lab: A new experimental approach. Experimental Economics, 9, 343-359.
Camerer, C. F. (2003). Behavioral Game Theory: Experiments in Strategic Interaction. Princeton: Princeton University Press.
Carpenter, J. P. (2004). When in Rome: Conformity and the provision of public goods. Journal of Socio-Economics, 33, 395-408.
Carroll, C. D. (2003). Macroeconomic expectations of households and professional forecasters. Quarterly Journal of Economics, 118, 269-298.
Carter, J. R., & McAloon, S. A. (1996). A test for comparative income effects in an ultimatum bargaining experiment. Journal of Economic Behavior and Organization, 31, 369-380.
Cason, T. N., & Sharma, T. (2007). Recommended play and correlated equilibria. Economic Theory, 33, 11-27.
Cason, T. N., Saijo, T., & Yamato, T. (2002). Voluntary participation and spite in public good experiments: An international comparison. Experimental Economics, 5, 133-153.
Charness, G., Rigotti, L., & Rustichini, A. (2007). Individual behavior and group membership. American Economic Review, 97, 1340-1352.
Chou, E., McConnell, M., Nagel, R., & Plott, C. R. (2008). The control of game form recognition in experiments: Understanding dominant strategy failures in a simple two person "guessing" game. California Institute of Technology Social Science Working Paper 1274.
Cookson, R. (2000). Framing effects in public goods experiments. Experimental Economics, 3, 55-79.
Croson, R., & Marks, M. (2001). The effect of recommended contributions in the voluntary provision of public goods. Economic Inquiry, 39, 238-249.
Crowne, D. P., & Marlowe, D. (1964). The Approval Motive. New York: Wiley.
Davis, D. D., & Holt, C. A. (1993). Experimental Economics. Princeton: Princeton University Press.
Draper, S. W. (2006). The Hawthorne, Pygmalion, placebo and other effects of expectations: some notes. University of Glasgow working paper, http://www.psy.gla.ac.uk/~steve/hawth.html
Duffy, J., & Feltovitch, N. (2008). Correlated equilibria, good or bad: An experimental study. University of Pittsburgh and University of Aberdeen working paper.
Dufwenberg, M., & Gneezy, U. (2000). Measuring beliefs in an experimental lost wallet game. Games and Economic Behavior, 30, 163-182.
Dufwenberg, M., & Muren, A. (2006). Generosity, anonymity, gender. Journal of Economic Behavior and Organization, 61, 42-49.
Eckel, C. C., & Grossman, P. J. (2000). Volunteers and pseudo-volunteers: The effect of recruitment method in dictator experiments. Experimental Economics, 3, 101-120.
Ellingsen, T., & Johannesson, M. (2007). Paying respect. Journal of Economic Perspectives, 21, 135-149.
Ellingsen, T., & Johannesson, M. (2008). Pride and prejudice: The human side of incentive theory. American Economic Review, 98, 990-1008.
Elliott, C. S., Hayward, D. M., & Canon, S. (1998). Institutional framing: Some experimental evidence. Journal of Economic Behavior and Organization, 35, 455-464.
Fehr, E., & Tyran, J. R. (2001). Does money illusion matter? American Economic Review, 91, 1239-1262.
Ferraro, P. J., & Vossler, C. A. (2008). Stylized facts and identification in public good experiments: The confusion confound. Georgia State University and University of Tennessee working paper.
Festinger, L. (1957). A Theory of Cognitive Dissonance. Evanston: Row, Peterson.
Fleming, P., Townsend, E., Lowe, K. C., & Ferguson, E. (2007). Social desirability effects on biotechnology across the dimensions of risk, ethicality and naturalness. Journal of Risk Research, 10, 989-1003.
Frank, B. L. (1998). Good news for the experimenters: Subjects do not care about your welfare. Economics Letters, 61, 171-174.
French, J. R. P., Jr., & Raven, B. (1959). The bases of social power. In D. Cartwright (Ed.), Studies in Social Power (pp. 150-167). (Ann Arbor: Research Center for Group Dynamics, University of Michigan).
Frey, B. S., & Oberholzer-Gee, F. (1997). The cost of price incentives: An empirical analysis of motivation crowding-out. American Economic Review, 87, 746-755.
Frey, B. S., Oberholzer-Gee, F., & Eichenberger, R. (1996). The old lady visits your backyard: A tale of morals and markets. Journal of Political Economy, 104, 1297-1313.
Gale, J., Binmore, K. G., & Samuelson, L. (1995). Learning to be imperfect: The ultimatum game. Games and Economic Behavior, 8, 856-890.
Gillespie, R. (1991). Manufacturing Knowledge: A History of the Hawthorne Experiments. Cambridge: Cambridge University Press.
Goeree, J. K., Holt, C. A., & Laury, S. K. (2002). Private costs and public benefits: Unraveling the effects of altruism and noisy behavior. Journal of Public Economics, 83, 255-276.
Golding, S. L., & Lichtenstein, E. (1970). Confession of awareness and prior knowledge of deception as a function of interview set and approval motivation. Journal of Personality and Social Psychology, 14, 213-223.
Guerra, G., & Zizzo, D. J. (2004). Trust responsiveness and beliefs. Journal of Economic Behavior and Organization, 55, 25-30.
Guth, W., Schmittberger, R., & Schwarze, B. (1982). An experimental analysis of ultimatum bargaining. Journal of Economic Behavior and Organization, 3, 367-388.
Haley, K. J., & Fessler, D. M. T. (2005). Nobody's watching? Subtle cues affecting generosity in an anonymous economic game. Evolution and Human Behavior, 26, 245-256.
Hargreaves Heap, S., & Varoufakis, Y. (2002). Some experimental evidence on the evolution of discrimination, co-operation and perceptions of fairness. Economic Journal, 112, 679-703.
Hargreaves Heap, S., & Zizzo, D. J. (forthcoming). The value of groups. American Economic Review.
Harrison, G. W., & Johnson, L. T. (2006). Identifying altruism in the laboratory. In D. Davis & R. M. Isaac (Eds.), Experiments Investigating Fundraising and Charitable Contributors (pp. 177-223). (Amsterdam and San Diego: Elsevier, Research in Experimental Economics, volume 11).
Hertwig, R., & Ortmann, A. (2001). Experimental practices in Economics: A challenge for psychologists? Behavioral and Brain Sciences, 24, 383-451.
Hoffman, E., McCabe, K. A., & Smith, V. L. (1996). On expectations and the monetary stakes in ultimatum games. International Journal of Game Theory, 25, 289-302.
Hoffman, E., McCabe, K., & Smith, V. (2000). The impact of exchange context on the activation of equity in ultimatum games. Experimental Economics, 3, 5-9.
Hoffman, E., McCabe, K., Shachat, K., & Smith, V. (1994). Preferences, property rights, and anonymity in bargaining games. Games and Economic Behavior, 7, 346-380.
Holt, C. A. (1995). Industrial organization: A survey of laboratory research. In J. H. Kagel, & A. E. Roth (Eds.), The Handbook of Experimental Economics (pp. 349-443). (Princeton: Princeton University Press).
Houser, D., & Kurzban, R. (2002). Revisiting kindness and confusion in public good experiments. American Economic Review, 92, 1062-1069.
Hrobjartsson, A., & Gotzsche, P. C. (2001). Is the placebo powerless? An analysis of clinical trials comparing placebo with no treatment. New England Journal of Medicine, 344, 1594-1602.
Iyengar, R., & Schotter, A. (2008). Learning under supervision: An experimental study. Experimental Economics, 11, 154-173.
Jones, S. R. G. (1984). The Economics of Conformism. Oxford: Blackwell.
Jones, S. R. G. (1992). Was there a Hawthorne effect? American Journal of Sociology, 98, 451-468.
Kienle, G. S., & Kiene, H. (1997). The powerful placebo effect: fact or fiction? Journal of Clinical Epidemiology, 50, 1311-1318.
King, M. F., & Bruner, G. C. (2000). Social desirability bias: a neglected aspect of validity testing. Psychology and Marketing, 17, 79-103.
Kirkpatrick, L. A., & Ellis, B. J. (2004). An evolutionary-psychological approach to self-esteem: Multiple domains and multiple functions. In M. B. Brewer & M. Hewstone (Eds.), Self and Social Identity (pp. 52-77). (Malden, MA: Blackwell).
Koch, A. K., & Normann, H.-T. (2007). Giving in dictator games: Regard for others or regard by others? Royal Holloway working paper.
Kolata, G. (1998). Scientific myths that are too good to die. New York Times, June 15, 18.
Lei, V., Noussair, C. N., & Plott, C. R. (2001). Nonspeculative bubbles in experimental asset markets: Lack of common knowledge of rationality vs. actual irrationality. Econometrica, 69, 831-859.
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? Journal of Economic Perspectives, 21, 153-174.
Liberman, V., Samuels, S. M., & Ross, L. (2004). The name of the game: Predictive power of reputations versus situational labels in determining Prisoner's Dilemma game moves. Personality and Social Psychology Bulletin, 30, 1175-1185.
List, J. A. (2007). On the interpretation of giving in dictator games. Journal of Political Economy, 115, 482-493.
Lonnqvist, J.-E., Verkasalu, M., & Bezmenova, I. (2007). Agentic and communal bias in socially desirable responding. European Journal of Personality, 21, 853-868.
Macefield, R. (2007). Usability studies and the Hawthorne effect. Journal of Usability Studies, 2, 145-154.
Markman, A. B., & Moreau, C. P. (2001). Analogy and analogical comparison in choice. In D. Gentner, K. J. Holyoak & B. Kokinov (Eds.), The Analogical Mind: Perspectives from Cognitive Science (pp. 363-399). (Cambridge, MA: MIT Press).
Masclet, D., Noussair, C., Tucker, S., & Villeval, M.-C. (2003). Monetary and nonmonetary punishment in the voluntary contributions mechanism. American Economic Review, 93, 366-380.
Mayo, D. G. (1996). Error and the Growth of Experimental Knowledge. Chicago: Chicago University Press.
Mayo, E. (1933). The Human Problems of an Industrial Civilization. New York: MacMillan.
Menzies, G. D., & Zizzo, D. J. (2005). Inferential expectations. Australian National University Centre for Applied Macroeconomic Analysis Discussion Paper n. 12.
Milgram, S. (1963). Behavioral studies of obedience. Journal of Abnormal and Social Psychology, 67, 371-378.
Milgram, S. (1974). Obedience to Authority: An Experimental View. New York: Harper and Row.
Orne, M. T. (1962). On the social psychology of the psychological experiment: With particular reference to demand characteristics and their implications. American Psychologist, 17, 776-783.
Orne, M. T. (1973). Communication by the total experimental situation: Why it is important, how it is evaluated, and its significance for the ecological validity of findings. In P. Pliner, L. Krames, & T. M. Alloway (Eds.), Communication and Affect (pp. 157-191). (New York: Academic Press).
Ortmann, A., & Hertwig, R. (2002). The costs of deception: Evidence from psychology. Experimental Economics, 5, 111-131.
Parsons, H. M. (1974). What happened at Hawthorne? Science, 183, 922-932.
Paulhus, D. L. (1991). Measurement and control of response bias. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of Personality and Social Psychological Attitudes (pp. 17-59). (San Diego: Academic Press).
Perugini, M., Tan, J. H. W., & Zizzo, D. J. (2005). Which is the more predictable gender? Public good contribution and personality. Social Science Research Network Discussion Paper, March.
Rice, B. (1982). The Hawthorne defect: persistence of a flawed theory. Psychology Today, 16, 70-74.
Rosenthal, R., & Jacobson, L. (1968). Pygmalion in the Classroom: Teacher Expectation and Pupils' Intellectual Development. New York: Irvington.
Ruffle, B. (1998). More is better but fair is fair: Tipping in dictator and ultimatum games. Games and Economic Behavior, 23, 247-265.
Sarin, R. K., & Weber, M. (1993). Effects of ambiguity in market experiments. Management Science, 39, 602-615.
Schotter, A., & Sopher, B. (2006). Trust and trustworthiness in games: An experimental study of intergenerational advice. Experimental Economics, 9, 123-145.
Schotter, A., & Sopher, B. (2007). Advice and behaviour in intergenerational games: An experimental approach. Games and Economic Behavior, 58, 365-393.
Schram, A. (2005). Artificiality: The tension between internal and external validity in economic experiments. Journal of Economic Methodology, 12, 225-237.
Shanab, M. E., & Yahya, K. A. (1977). A behavioral study of obedience in children. Journal of Personality and Social Psychology, 35, 530-536.
Shanab, M. E., & Yahya, K. A. (1978). A cross-cultural study of obedience. Bulletin of the Psychonomic Society, 11, 267-269.
Shang, J., & Croson, R. (2008). Field experiments in charitable contribution: The impact of social information on the voluntary provision of public goods. Indiana University working paper.
Shang, J., & Croson, R. (forthcoming). The impact of downward social information on contribution decisions. Experimental Economics.
Sissons Joshi, M., Joshi, V., & Lamb, R. (2004). The Prisoner's Dilemma and city-centre traffic. Oxford Economic Papers, 57, 70-89.
Sitzia, S., & Zizzo, D. J. (2008). In search of product complexity effects in experimental retail markets. Paper presented at the Centre for Competition Policy, University of East Anglia, June.
Slonim, R., & Garbarino, E. (2008). Increases in trust and altruism from partner selection. Experimental Economics, 11, 143-153.
Smith, V. L. (1982). Microeconomic systems as an experimental science. American Economic Review, 72, 923-955.
Smith, V. L. (1994). Economics in the laboratory. Journal of Economic Perspectives, 8, 113-131.
Smith, V., Suchanek, G., & Williams, A. (1988). Bubbles, crashes and endogenous expectations in experimental spot asset markets. Econometrica, 56, 1119-1151.
Sobel, J. (2005). Interdependent preferences and reciprocity. Journal of Economic Literature, 43, 392-436.
Starmer, C. (2000). Developments in non-expected utility theory: The hunt for a descriptive theory of choice under risk. Journal of Economic Literature, 38, 332-382.
Stewart, N., Chater, N., Stott, H. P., & Reimers, S. (2003). Prospect relativity: How choice options influence decision under risk. Journal of Experimental Psychology: General, 132, 23-46.
Stöber, J. (2001). The social desirability scale-17 (SDS-17): Convergent validity, discriminant validity, and relationship with age. European Journal of Psychological Assessment, 17, 222-232.
Stocco, A., & Fum, D. (2008). Implicit emotional biases in decision making: The case of the Iowa Gambling Task. Brain and Cognition, 66, 253-259.
Sunder, S. (1995). Experimental asset markets: A survey. In J. H. Kagel, & A. E. Roth (Eds.), The Handbook of Experimental Economics (pp. 445-500). (Princeton: Princeton University Press).
Tan, J. H. W. (2006). Religion and social preferences: An experimental study. Economics Letters, 90, 60-67.
Tan, J. H. W., & Bolle, F. (2007). Team competition and the public goods game. Economics Letters, 96, 133-139.
Tan, J. H. W., & Zizzo, D. J. (2008). Groups, cooperation and conflict in games. Journal of Socio-Economics, 37, 1-17.
Tapia, M., Carretié, L., Sierra, B., & Mercado, F. (2008). Incidental encoding of emotional pictures: Affective bias studied through event related brain potentials. International Journal of Psychophysiology, 68, 193-200.
Thaler, R. H. (1988). Anomalies: The ultimatum game. Journal of Economic Perspectives, 2, 195-206.
Zizzo, D. J. (2003a). Verbal and behavioral learning in a probability compounding task. Theory and Decision, 54, 287-314.
Zizzo, D. J. (2003b). You are not in my boat: Common fate and similarity attractors in bargaining settings. Oxford University Department of Economics Discussion Paper 167.
Zizzo, D. J., & Tan, J. H. W. (2007). Perceived harmony, similarity and cooperation in 2 x 2 games: An experimental study. Journal of Economic Psychology, 28, 365-386.

Figure 1 – The Microeconomic System for the Experimental Subject
Figure 2 – The Relationship Between Potential EDE and Experiment Objectives