Do Women Shy Away From Competition? Do Men Compete Too Much?: A (Failed) Replication

Curtis R. Price*
College of Business
University of Southern Indiana
Evansville, IN 47712

Revision: January 15, 2010

Recent research pertaining to gender differences in the labor market has focused on women’s preference to abstain from competition as a potential explanation of why so few women are employed in high-paying competitive jobs. Niederle and Vesterlund (2007) find that, when given the choice of payment scheme in a simple addition task, 73% of men and only 35% of women choose the tournament over the piece rate. This note describes additional data collected at a different university that fail to replicate these results. I discuss potential causes such as differences in performance, relatively higher female overconfidence, and differences in the subject pools between Purdue University and the University of Pittsburgh.

JEL Classifications: C91, C99, J16, J22
Keywords: Gender, Competition, Experiments, Wages, Replication

* This work was carried out while the author was a graduate student at Purdue University in the Krannert Graduate School of Management. I would like to thank Muriel Niederle & Lise Vesterlund for supplying the raw data, instructions, and software from their study along with helpful comments. I would also like to thank Jack Barron and Tim Cason for their support, comments, and discussions. This project was funded by NSF grant #102953. Supporting data and instructions can be found in the ExLab Digital Archive at http://exlab.bus.ucf.edu. All errors are my own. Send correspondence to: [email protected]

1. Introduction

Recent experimental research has focused on selection biases in relation to competitive payment schemes to explain the underrepresentation of women in high-paying managerial jobs. The purpose of this paper is to replicate the findings of a recent study by Niederle and Vesterlund (2007) (hereafter NV).
Replication in experimental economics is an important, and sometimes overlooked, aspect of research. As a replication, this study follows the design and methodology of the original as closely as possible. However, the results of the original study are not replicated.

2. Related Studies

Our focus here is on replicating the results of NV, in which subjects’ choices between noncompetitive (piece rate) and competitive (tournament) wage schemes are explored. Most closely related to our purpose is the segment of the literature that explores gender differences in choices for competition in the laboratory. There is also a literature that focuses on the performance of male and female subjects when competitive wage schemes are used. For a review of this literature see Croson and Gneezy (2009). Studies in this vein have found a remarkably robust result: females tend to select wage schemes based on absolute rather than relative compensation.

Table 1 summarizes the current literature on the impact of competition on the selection of a payment scheme.[1] Notice that the results of NV have previously been replicated with similar designs using several tasks and subject populations. The preponderance of evidence supports the hypothesis that men choose competitive payment schemes more often than women. Nonetheless, the variability of the proportions of female and male subjects that choose to compete across these experiments also suggests that even slight differences in the design of the experiment may influence both male and female decisions to freely enter into a competitive payment scheme. Additionally, as highlighted by Gneezy, Leonard, and List (2009), the data collected do not unanimously support this hypothesis.

[1] Because of the nature of this replication, the studies in Table 1 focus on the experimental economics literature. The author is unaware of any sufficiently similar studies from the experimental psychology literature to compare. This is likely due to the fact that it is generally economists who desire to study wages. For a discussion of the psychological similarities between males and females in the psychology literature see Hyde (2005), where the author discusses how, even though there may be evidence of gender differences, the variability of the magnitude of the differences suggests that these differences may be overstated. Additionally, for an overview of experimental psychology and competition as it relates to non-cooperation see Van Vugt et al. (2007); for an overview of evolutionary psychology and competition see Campbell (2002) and Knight (2002); for an overview of gender and social psychology see Marini (1990); and for an excellent overview of some of the first studies in the psychology and competition literature see Vaughn and Diserens (1938).

Gneezy, Leonard, and List (2009) compare results across a matrilineal and a patriarchal society by studying the Khasi and the Maasai. The authors find that men from the patriarchal Maasai people choose the tournament compensation more often than do Maasai women. On the other hand, they find that Khasi women, who live in a matrilineal society, choose the tournament compensation more often than do Khasi men. This suggests that environment and social norms play a crucial role in determining gender roles regarding competition.

Vandegrift, Yavas, and Brown (2004) have subjects perform a forecasting task. The forecasting task is completed 20 times, with a subject being able to choose a compensation scheme at the beginning of each round. The proportions found in Table 1 are for choices that the subjects made across all twenty periods.
Interestingly, the proportion of subjects who choose the tournament declines as the experiment proceeds, but when considering the choices in the first round of the experiment, the authors find that female subjects choose the tournament slightly more often than male subjects in the winner-take-all tournament and only slightly less often in the multi-winner (graduated) tournament. Also, in line with the results from Cason, Masters, and Sheremeta (2010), the authors find that, when controlling for performance differences, male and female subjects do not enter into the winner-take-all tournament at statistically significantly different rates. Nonetheless, they do find significant differences in total and within the graduated tournament.

This replication study was conducted at the Vernon Smith Experimental Economics Laboratory (VSEEL) at Purdue University. In addition to this study, two other studies very similar to NV have been conducted at Purdue University: Price (2009) and Cason, Masters, and Sheremeta (2010). Both studies use the addition task as outlined in this replication and collect substantially more data than NV and this replication (Price (2009) has 310 subjects and Cason et al. (2010) has 93 subjects). Interestingly, both studies replicate NV’s result, but both show a substantially smaller difference in gender choices relative to NV. This difference is dramatically smaller in Cason et al. (2010) under the winner-take-all tournament, where there is no statistically significant difference in the choices of male and female subjects, although male subjects do choose the tournament more often than female subjects. The authors find statistically significant differences in choices only when the subjects choose a proportional tournament and in total. Even so, Price (2009) does find a significant difference in winner-take-all tournament choice similar to NV.
This evidence suggests that the overall size of the gender difference may be smaller among the VSEEL subject population. If this is the case, the smaller sample size in the replication may be insufficient to identify (statistically) the overall effect.

Table 1: Related Literature

Authors (1)                               Payment Choices (2)                          Task (3)                          Male % (4)  Female % (5)  Subject Population / Laboratory (6)
Vandegrift, Yavas, and Brown (2004)       piece rate/WTA tournament                    Forecasting                       35%         22%           Pennsylvania State University/LEMA
                                          piece rate/multi-winner tournament           Forecasting                       45%         24%           Pennsylvania State University/LEMA
Datta Gupta, Poulsen, & Villeval (2005)   piece rate/tournament                        Solving Mazes                     60%         34%           GATE (Groupe d'Analyse et de Théorie Economique)
Dohmen and Falk (2006)                    fixed payment/tournament                     Multiplication                    58%         40%           University of Bonn
Gneezy & Rustichini (2006)                piece rate/WTA tournament                    Anagrams                          ~42%        ~25%          University of Chicago Students/Executive MBA students & Teachers
                                          piece rate/WTA tournament                    Basketball                        ~62%        ~12%          University of Chicago Students/Executive MBA students & Teachers
Niederle & Vesterlund (2007)              piece rate/WTA tournament                    Addition                          73%         35%           Pittsburgh Students/PEEL
Gneezy, Leonard, & List (2009)            piece rate/WTA tournament                    Bucket Toss Game                  50%         26%           Maasai
                                          piece rate/WTA tournament                    Bucket Toss Game                  39%         54%           Khasi
Cason, Masters, & Sheremeta (2010)        piece rate/WTA tournament                    Addition                          43%         37%           Purdue Students/VSEEL
                                          piece rate/proportional payment tournament   Addition                          68%         52%           Purdue Students/VSEEL
Niederle, Segal, & Vesterlund (2009)      piece rate/multi-winner tournament           Addition                          74%         31%           Harvard Business School/CLER
Price (2009)                              piece rate/WTA tournament                    Addition                          66%         49%           Purdue Students/VSEEL
Dargnies (2009)                           piece rate/WTA tournament                    Addition                          85%         51%           Parisian Experimental Economics Laboratory (LEEP)
Eriksson et al. (2009)                    piece rate/tournament                        Chosen Effort (non-real effort)                             GATE (Groupe d'Analyse et de Théorie Economique)
Teyssier (2008)                           revenue sharing/tournament                   Chosen Effort (non-real effort)                             GATE (Groupe d'Analyse et de Théorie Economique)

Risk aversion has been documented to affect male and female subjects differently in the laboratory. Results in the literature concerning the interaction of the decision to enter into the competitive compensation scheme and risk aversion have been mixed. In Cason et al. (2010) the authors find a significant effect of gender on the decision to enter the tournament even after controlling for risk aversion. In contrast, Eriksson et al. (2009) and Teyssier (2008) find no significant difference in choices when controlling for risk aversion. Unfortunately, these studies do not document the raw proportion of male and female subjects choosing the tournament for comparison to other studies. Additionally, Datta Gupta, Poulsen, and Villeval (2005) document that risk aversion plays a role in the decision of female, but not male, subjects to enter into the tournament.

3. Experiment Design

Copies of NV’s instructions were utilized as well as their software.[2] The experiment consists of four tasks. In each task, except the last, the subjects are asked to find the correct sum of five randomly generated two-digit numbers. The subjects have five minutes to solve a series of these problems. At the beginning of each session, the experimenter hands out instructions and reads them aloud while the subjects follow along.
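The addition task itself can be illustrated with a short sketch. This is not NV’s z-Tree program: the function names are hypothetical, and drawing each number uniformly from 10 to 99 is an assumption made here only for illustration.

```python
import random


def make_problem(rng):
    """Return five random two-digit integers (10-99) to be summed.

    Assumption: numbers are drawn uniformly; the original software may differ.
    """
    return [rng.randint(10, 99) for _ in range(5)]


def count_correct(problems, answers):
    """Count the submitted answers that equal the true sum of their problem."""
    return sum(1 for nums, ans in zip(problems, answers) if sum(nums) == ans)


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
problems = [make_problem(rng) for _ in range(3)]

# Hypothetical subject: answers the first and third problems correctly
# and is off by one on the second.
answers = [sum(problems[0]), sum(problems[1]) + 1, sum(problems[2])]
print(count_correct(problems, answers))
```

The count of correct answers is the performance measure that the payment schemes in each task are applied to.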
During the instructions, the experimenter informs the subjects that there are four tasks and that the method of payment varies across the four tasks. The experimenter does not tell the subjects the details of the payment scheme for each task until immediately prior to the beginning of that task. Subjects receive absolutely no feedback on the task outcomes (other than their own performance) until the end of the session.

Additionally, after finishing the four tasks, but prior to payment, subjects are asked to gauge their ability relative to their group. In particular, subjects are asked to guess their ranking (1st to 4th) in both the task 1 piece rate and the task 2 tournament. Lastly, subjects are asked a series of questions regarding their college major, gender, and race.

At the conclusion of the experiment, a number from 1 to 4 is drawn to select the task for payment. The subject’s total compensation is the sum of a $5 show-up fee, a $7 participation fee, and the performance pay from the selected task. A total of 60 subjects (30 men and 30 women) participated in the experiment at Purdue University.[3] The average payoff was $20.40 and the experiment lasted less than one hour.

[2] The program was written in z-Tree (Fischbacher (2007)).
[3] Niederle & Vesterlund (2007) sample a total of 80 subjects, including 40 men and 40 women, and the average subject payment was $19.80.

Table 2: The Four Tasks

Task 1 - Piece Rate          For each correct answer the subject is paid $0.50.
Task 2 - Tournament          Subjects are matched in groups of four. Groups consist of two men and two women. The subject with the largest number of correct problems earns $2.00 per correct problem. Otherwise the subject earns zero for this task. In the case of ties, the winner is determined randomly.
Task 3 - Choice              Subjects have the option of choosing the tournament or the piece rate. Subjects choosing the tournament have their performance gauged against the performance in the task 2 tournament of the other three subjects in their group.
Task 4 - Submit Piece Rate   Subjects have the choice to submit their task 1 performance to be paid either by the piece rate or the tournament.

4. Task 3 & Task 4 Choice

After solving addition problems under the piece rate and the tournament in tasks 1 and 2, and prior to solving problems in the third task, the subjects are given the choice of how they want to be paid for their performance in the third task. They choose either the piece rate or the tournament.

Females and males within the replication experiment do not differ in their choices between the piece rate and the tournament, with both choosing the tournament 57% of the time. On the other hand, NV’s experiment documents a large disparity between men and women: men choose the tournament 73% of the time compared to only 35% of the time for female subjects. Using a Fisher’s Exact test we cannot reject the null hypothesis that the male subjects’ choices across experiments come from the same distribution (p-value = 0.21). A Fisher’s Exact test across the experiments on female subjects’ choices is significant at the 10% level (p-value = 0.091).

In the fourth task subjects do not have to solve problems. The subjects are told that they may be paid one more time for the number of problems they correctly solved in task 1 but now have to choose how they want their payment to be determined. Again, they choose either the piece rate or the tournament.

Within the NV experiment, male subjects choose the tournament in task 4 55% of the time compared to only 25% of the time for females (p-value = 0.012). In this replication, male and female subjects choose the tournament equally often, 40% of the time. Nonetheless, no statistically significant difference exists in either men’s or women’s choices across the two experiments (p-value = 0.24 for men, p-value = 0.20 for women).

5. Analysis

Before considering the data, it should be noted that we cannot rule out potential bias in this replication that is (inadvertently) introduced by the experimenter himself (see Hoffman, McCabe, & Smith (1996)). Experimental psychologists have found evidence of experimenter bias in a broad range of studies from achievement scores to rats running a maze.[4] Although the fact that Price (2009) replicates the results of NV makes experimenter bias an unlikely source of the differences, it is important to note this potential problem for experimental economists.

[4] See Rosenthal (2002) for an overview of the social psychology literature on experimenter bias. Additionally, in light of other studies that replicate NV’s finding, the gender of the experimenter is an unlikely source of bias, but it should be noted that studies such as Innocenti & Pazienza (2004) demonstrate that this is still a potential source of bias. Any unconscious cue from the experimenter which may prime the subjects with regard to gender is also unlikely to bias the results, since Niederle, Segal, and Vesterlund (2009) replicate the results of NV and they explicitly inform the subjects of the gendered nature of their research hypothesis.

Table 3: Average Performance by Gender

                                Replication   NV          (p-value)
Task 1            Male          10.1          10.7        (0.522)
                  Female        10.6          10.2        (0.526)
                  (p-value)     (0.585)       (0.459)
Task 2            Male          11.9          12.1        (0.738)
                  Female        12.7          11.8        (0.300)
                  (p-value)     (0.408)       (0.643)
Task 2 - Task 1   Male          1.7           1.5         (0.645)
                  Female        2.1           1.7         (0.527)
                  (p-value)     (0.641)       (0.673)

In the table above, the p-values under the averages are from a t-test of means within the replication and within NV but across gender.[5] The p-values in the rightmost column are from a t-test of the difference in means across the replication and NV data but within gender. Inspection of the table reveals no significant differences across genders within either study, or across studies within gender. In light of the fact that there are no discernible differences in performance, and that individuals have no knowledge of relative performance until after all choices have been made, it is highly unlikely that performance differences, either within experiment or across experiments, are driving the disparate results of this study and NV.[6]

[5] Comparisons of means are, unless otherwise stated, two-sided t-tests. This is done to facilitate a comparison to the results in Niederle and Vesterlund (2007).

After all four tasks have been completed, subjects are asked to subjectively gauge their ability in the task 2 tournament. In particular, the subjects are asked to guess their within-group ranking from 1st (best) to 4th (worst). If a subject’s guess is correct, she earns $1. Table 4 summarizes the guesses made by the subjects along with the number of times the guesses were incorrect. The top panel shows the replication data and the bottom panel shows the same data for NV’s experiment.

Table 4: Comparison of Task 2 - Tournament Rankings

Replication Data
                 Men                       Women
                 #Guessed   #Incorrect     #Guessed   #Incorrect
1st (best)       17         10             15         8
2nd              8          8              10         4
3rd              4          2              3          2
4th (worst)      1          0              2          2
Total:           30         20             30         16

NV Data
                 Men                       Women
                 #Guessed   #Incorrect     #Guessed   #Incorrect
1st (best)       30         22             17         9
2nd              5          3              15         10
3rd              4          2              6          5
4th (worst)      1          1              2          1
Total:           40         28             40         25

[6] In the replication data, male subjects who choose the tournament in task 3 solve more problems than male subjects who choose the piece rate in task 3 in all measures of performance (task 1, task 2, and task 2 minus task 1). Female performance in these three measures does not differ when we compare across task 3 choice. This implies that male subjects may be more accurate in their evaluation of their own ability, but it is difficult to imagine that this would lead to more female subjects choosing the tournament. NV found no such difference for either male or female subjects. In the next section we analyze probit models of the decision to enter into the tournament including controls for performance across the two experiments.

Although NV find strong evidence of a difference in male and female rankings (Fisher’s Exact test yields a p-value of 0.016), this difference does not exist in the replication data (p-value = 0.853). Nonetheless, comparing across the two experiments, we cannot reject the null hypothesis that the guesses of men come from the same distribution (p-value = 0.342). Nor can we reject the null hypothesis that the women’s guesses come from the same distribution (p-value = 0.920).

NV, Niederle, Segal, and Vesterlund (2009), and Dargnies (2009) all document that male subjects are more overconfident than females. This is consistent with the psychology literature that demonstrates that men are more overconfident than women.[7] In NV’s study the authors estimate that about 25% of the total gender difference is explained by the relatively higher overconfidence of male subjects; Niederle, Segal, and Vesterlund (2009) estimate overconfidence’s contribution to be about 30%-50%; and Dargnies (2009) estimates that overconfidence accounts for about 13% of the difference. Similar to the results in this replication, Price (2009) finds that male subjects are only slightly, and not statistically significantly, more overconfident than female subjects and documents a much smaller gender difference. Even so, the lack of statistically different levels of confidence across NV and this replication may make it difficult to measure how this impacts the overall differences. Nonetheless, it is clear that this is a defining difference between the two subject populations and at least partially contributes to the disparate results.

6. Statistical Models of the Decision to Enter the Tournament

In Table 5 I utilize an interaction term in the probit model to determine the differences across subject pools.
I define the variable “NV” as being 1 when the subject’s record is from NV’s subject pool and 0 when it is from the subject pool of the replication. By interacting this indicator with the three factors that may influence the decision to enter into the tournament we can estimate how these affect the decision to enter the tournament across the two subject pools.[8]

[7] See Beyer (1998) for an overview of the psychology literature concerning biases in the ability to gauge one’s own ability in a task. Interestingly, this study also finds that the amount of bias is a function of the task. Females have a larger bias in self-evaluation in answering questions that are determined to be masculine in nature. In answering questions that are neutral or feminine in nature, the author finds no difference in self-evaluation between male and female subjects. Additionally, Niederle and Yestrumskas (2008) have subjects choose between a hard and an easy task. They find that there is no difference in overconfidence across gender but that male subjects choose the hard task significantly more often than females.
[8] This is commonly referred to as a Chow test.

Table 5: Estimation of the Difference in the Marginal Effects across Subject Pools

Marginal Effect (p-value)
                                    (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
Female                              -0.031     0.022     0.025    -0.033     0.003    -0.001    -0.033    -0.009    -0.009
(1 if female)                       (0.815)   (0.844)   (0.856)   (0.767)   (0.977)   (0.990)   (0.668)   (0.925)   (0.872)
Tournament                           0.031     0.008     0.001     0.039     0.021     0.009     0.029     0.021     0.011
(task 2 correct)                    (0.043)   (0.602)   (0.955)   (0.053)   (0.317)   (0.375)   (0.044)   (0.269)   (0.317)
Tournament minus Piece Rate          0.017    -0.002     0.018     0.012    -0.003     0.004     0.007    -0.003     0.004
(task 2 minus task 1 correct)       (0.369)   (0.893)   (0.447)   (0.645)   (0.914)   (0.778)   (0.720)   (0.886)   (0.774)
Tournament Rank                               -0.214    -0.228              -0.151    -0.062              -0.128    -0.060
(guessed task 2 rank)                         (0.000)   (0.001)             (0.066)   (0.118)             (0.088)   (0.165)
Submit Piece Rate                                        0.215                         0.066                         0.081
(1 if submit task 1 to tournament)                      (0.016)                       (0.360)                       (0.310)

NV * Female                         -0.349    -0.341    -0.303    -0.208    -0.177    -0.058    -0.108    -0.140    -0.055
                                    (0.039)   (0.056)   (0.098)   (0.047)   (0.088)   (0.184)   (0.192)   (0.182)   (0.350)
NV * Tournament                                                   -0.023    -0.024    -0.017    -0.028    -0.031    -0.020
                                                                  (0.390)   (0.389)   (0.221)   (0.209)   (0.285)   (0.223)
NV * Tournament minus PR                                           0.004     0.001     0.006     0.043     0.048     0.029
                                                                  (0.915)   (0.978)   (0.731)   (0.155)   (0.218)   (0.211)
NV * Tournament Rank                                                        -0.081    -0.044              -0.042    -0.029
                                                                            (0.459)   (0.409)             (0.691)   (0.640)
NV * Submit Piece Rate                                                                 0.126                         0.152
                                                                                      (0.272)                       (0.260)
NV                                   0.151     0.097     0.109     0.461     0.583     0.571     0.496     0.547     0.461
                                    (0.191)   (0.333)   (0.386)   (0.198)   (0.234)   (0.209)   (0.167)   (0.301)   (0.335)
Observations                        140       140       140       140       140       140       118       118       118

NV is an indicator variable with 1 = NV’s data. NV*___ is the indicator multiplied by the associated independent variable. The marginal effects are calculated at a male subject who solves 12 problems in the piece rate and 13 problems in the tournament, who chooses to submit his task 1 performance to a tournament in task 4, and who guesses a tournament rank of 1st.

The bottom portion of Table 5 shows the difference in the marginal effects between the replication data and the original data of NV. The NV*Female variable in the bottom section of Table 5 captures the difference in the marginal effect of the female indicator variable across the two experiments. Columns 1-3 indicate that, while not controlling for any differences across experiments, female subjects in NV’s experiment are roughly 30-35% less likely to enter into the tournament in task 3 than female subjects in the replication.

To control for differences across experiments, we consider the fully interacted model in columns 4-6. By considering the NV*Female variable in column 4 we see that, when controlling for a subject’s performance, there is a large and significant difference in female choices across experiments.
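The specification behind Table 5 can be written compactly. The following is only a sketch in notation introduced here for illustration (the symbols $T_i$, $F_i$, $x_i$, and the Greek coefficients are assumptions, not notation from NV or from the estimation output): $T_i = 1$ if subject $i$ chooses the tournament in task 3, $F_i$ is the female indicator, $\mathrm{NV}_i$ is the subject-pool indicator, $x_i$ collects the performance, guessed-rank, and task 4 controls included in a given column, and $\Phi$ is the standard normal distribution function.

```latex
\Pr(T_i = 1) = \Phi\bigl( \alpha + \beta_F F_i + x_i'\beta
      + \mathrm{NV}_i \,( \delta_0 + \delta_F F_i + x_i'\delta ) \bigr)
```

In this sketch, columns 1-3 correspond to setting $\delta = 0$, so that only the female indicator is interacted with $\mathrm{NV}_i$, while columns 4-9 leave $\delta$ unrestricted. Table 5 reports marginal effects evaluated at the reference subject described in the table note, not the raw coefficients.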
In particular, when controlling for performance, the point estimate indicates that women in the replication experiment are 20.8% more likely to enter into the tournament than women in the NV data. In column 5, we include the guessed tournament rank of the subjects. Controlling for this reduces the difference in propensity to enter into the tournament only to 17.7%. Thus, as was anticipated, the differences in confidence and performance across the two experiments account for only a small portion of the differences in women’s choices. In column 6, we introduce the subject’s decision to submit their task 1 performance to a tournament. Including this information increases the p-value of the NV*Female variable to 0.184, making it no longer significant. Thus, when accounting for task 4 choices, there is no (statistically significant) difference in women’s propensity to enter the tournament across the two experiments. Unfortunately, a subject’s decision to submit their task 1 performance to a tournament ultimately measures the subject’s willingness to enter into a tournament and therefore yields no further information about the underlying cause of the difference in tournament entry across the two experiments. Nonetheless, we can surmise from this analysis that differences in performance and confidence do not fully explain the large difference in female participation in the tournament.

In addition to gender, data are also collected on age, race, college major, student status, and highest degree obtained. Table 6 gives a summary of the number of subjects across these demographic categories. There is little difference in the distribution of major and race across the two experiments.[9] On the other hand, NV’s data include a wide range of subjects across student status and age, while the replication data are composed entirely of subjects who are younger than age 30. This is important because the results of Gneezy, Leonard, and List (2009) suggest that gender differences in competition are due to a nurture component. If this is the case, these differences may grow larger with age and experience.

[9] The categories for the choices of college major are taken directly from NV’s study to facilitate comparison. In the replication data, there are quite a few subjects who indicate “Other” for major. This is likely caused by the fact that the available choices of college majors do not track well with common majors at Purdue University. In particular, engineering is a common major of students at Purdue, but has no clear associated option. Probit models were analyzed with indicators for major, race, highest degree obtained, student status, and age. No significant differences were found within either NV’s data or the replication data.

Table 6: Demographic Data by Gender & Subject Pool

                            Replication Data            NV Data
                            Total   Male   Female       Total   Male   Female
Major
  Arts                      3       1      2            4       1      3
  Business                  19      6      13           17      9      8
  Humanities                0       0      0            11      5      6
  Natural Sciences          3       1      2            2       0      2
  Social Sciences           5       2      3            16      7      9
  Physical Sciences         6       5      1            12      8      4
  Other                     24      15     9            18      10     8
Race
  Asian                     11      8      3            13      6      7
  Black                     2       1      1            2       0      2
  Caucasian                 41      19     22           60      30     30
  Hispanic                  3       1      2            2       2      0
  Other                     3       1      2            3       2      1
Highest Degree Obtained
  Highschool/GED            54      25     29           55      26     29
  Bachelors                 4       4      0            15      7      8
  Masters                   1       0      1            8       6      2
  Ph.D.                     0       0      0            1       1      0
  None                      1       1      0            1       0      1
Student Status
  Undergrad                 59      30     29           59      27     32
  Grad                      1       0      1            14      8      6
  Neither                   0       0      0            7       5      2
Age
  18-21                     46      25     21           47      22     25
  22-25                     13      5      8            19      9      10
  26-29                     1       0      1            3       1      2
  30-33                     0       0      0            3       3      0
  34-37                     0       0      0            3       2      1
  38+                       0       0      0            5       3      2

In NV’s experiment 59 subjects were undergraduate students. Of these 59 students, 70% of the male subjects and 40% of the female subjects choose the tournament (Fisher’s Exact test yields p-value = 0.035). Among non-undergraduate subjects, 77% of the male subjects choose the tournament compared to only one out of the eight female non-undergraduate subjects (i.e., 12.5% of females choose the tournament). From this information we can conclude that the results of NV are not driven by the inclusion of these non-standard subjects but that the inclusion of these subjects strengthens NV’s finding.

If we limit our scope to only undergraduate students across NV and this replication, there is no significant difference in either the male or female choices (Fisher’s Exact test yields p-values of 0.41 and 0.31 for men and women respectively). This suggests that the differences across the two experiments are exacerbated by the differences in subject pool. By considering columns 7-9 in Table 5, where we pool NV’s and this replication’s data while restricting the data to only those subjects who are undergraduate students, we can see that there are no measurable differences across the two experiments. In particular, by considering the NV*Female variable in column 7 we see that, while controlling for differences in performance across the two experiments and restricting our sample to only undergraduate students, there are no measurable differences in the choices of male and female subjects across experiments. This implies, as we anticipated from the previous analysis, that some of the difference across the two experiments is due to the inclusion of non-undergraduate subjects in NV’s data. Nonetheless, it is also important to note that NV’s finding does not hinge on these subjects.

7. Conclusion

This note describes the results of a replication experiment that was conducted at Purdue University. Although Niederle & Vesterlund (2007) find that 73% of men and only 35% of women choose the tournament over the piece rate, there is no difference in male and female choices in the replication. An analysis of the data shows that two main factors contribute to this non-replication. First, male and female subjects in the replication do not seem to differ in their level of overconfidence in ability whereas NV found a sizable difference in overconfidence.
Secondly, the inclusion of non-undergraduate students in NV strengthens their results whereas this study includes almost solely undergraduate students. Even so, the results of NV are not driven by the inclusion of non-undergraduate subjects. Previous studies which replicate NV’s finding at the same laboratory as this replication seem to suggest that the gender difference is less pronounced at Purdue University, and the analysis points to differences in overconfidence being one of the main factors for this smaller gender difference. Nonetheless, we are unable to ascertain exactly why this replication and NV yield different qualitative results.

The purpose of this non-replication is not to challenge the qualitative results of NV or other studies in this vein. Indeed, along with a multitude of other studies, a similar study by the author (Price (2009)) does replicate the result that male and female subjects choose different compensation schemes. In my opinion, this study serves as a reminder that, although it may be easier to publish studies that highlight large gender differences, we need to be mindful and consider the evidence as a whole. This study shows that it is difficult to extrapolate from one laboratory to the next, let alone to apply laboratory results to the field. It could very well be the case that, when the totality of the data is considered, males and females are not very different even though the popular belief is that males and females are drastically different along many dimensions (see Hyde (2005) for an overview of this problem in psychology). This raises the question of the external validity of experimental results and demonstrates that, in conjunction with laboratory studies, economists must also endeavor to study data from the field.

References

Beyer, Sylvia (1998), “Gender Differences in Self-Perception and Negative Recall Biases”, Sex Roles, v.38 nos.1/2, p.103-133

Cason, Timothy N., Masters, William A., and Sheremeta, Roman M. (2010), “Entry into Winner-Take-All and Proportional Prize Contests: An Experimental Study”, working paper, January 2010

Croson, Rachel, and Gneezy, Uri (2009), “Gender Differences in Preferences”, Journal of Economic Literature, 47:2, p.1-27

Datta Gupta, N., Poulsen, A., and Villeval, M.C. (2005), “Male and Female Competitive Behavior - Experimental Evidence”, Working Papers 0512, Groupe d'Analyse et de Théorie Economique (GATE), Centre national de la recherche scientifique (CNRS), Université Lyon 2, Ecole Normale Supérieure

Dohmen, Thomas, and Falk, Armin (2006), “Performance Pay and Multi-dimensional Sorting: Productivity, Preferences and Gender”, IZA discussion paper #2001 (May 2006)

Eriksson, Tor, Teyssier, Sabrina, and Villeval, Marie-Claire (2009), “Does Self-Selection Improve the Efficiency of Tournaments?”, Economic Inquiry, v.47, p.530-548

Fischbacher, U. (2007), “z-Tree: Zurich Toolbox for Ready-made Economic Experiments”, Experimental Economics, v.10 n.2 (June 2007), p.171

Gneezy, Uri, and Rustichini, Aldo (2006), “Executives versus Teachers: Gender, Competition, and Self Selection”, working paper, University of California, San Diego

Gneezy, Uri, Leonard, Kenneth L., and List, John A. (2009), “Gender Differences in Competition: Evidence From a Matrilineal and a Patriarchal Society”, Econometrica, v.77 n.5, p.1637-1644

Hoffman, Elizabeth, McCabe, Kevin, and Smith, Vernon (1996), “Social Distance and Other-Regarding Behavior in Dictator Games”, American Economic Review, v.86, p.653-660

Hyde, Janet Shibley (2005), “The Gender Similarities Hypothesis”, American Psychologist, v.60 n.6, p.581-592

Innocenti, Alessandro, and Pazienza, Maria Grazia (2004), “Experimenter Bias across Gender Differences”, University of Siena, Department of Political Economy, working paper #438

Marini, Margaret Mooney (1990), “Sex and Gender: What Do We Know?”, Sociological Forum, v.5 n.1, p.95-120

Niederle, Muriel, and Vesterlund, Lise (2007), “Do Women Shy Away from Competition? Do Men Compete Too Much?”, Quarterly Journal of Economics, v.122 n.3 (August 2007), p.1067-1101

Niederle, Muriel, Segal, Carmit, and Vesterlund, Lise (2009), “How Costly is Diversity? Affirmative Action in Light of Gender Differences in Competitiveness”, working paper (May 2009)

Niederle, Muriel, and Yestrumskas, Alexandra H. (2008), “Gender Differences in Seeking Challenges: The Role of Institutions”, NBER working paper #13922 (April 2008)

Price, Curtis R. (2009), “Gender, Competition, and Managerial Decisions”, working paper (available at SSRN)

Rosenthal, Robert (2002), “Covert Communication in Classrooms, Clinics, Courtrooms, and Cubicles”, American Psychologist, v.57 n.11, p.839-849

Teyssier, Sabrina (2008), “Experimental Evidence on Inequity Aversion and Self-Selection between Incentive Contracts”, working paper (May 2008)

Vandegrift, Donald, Yavas, Abdullah, and Brown, Paul M. (2004), “Men, Women, and Competition: An Experimental Test of Labor Market Behavior”, mimeo

Van Vugt, M., De Cremer, D., and Janssen, D.P. (2007), “Gender Differences in Cooperation and Competition: The Male-Warrior Hypothesis”, Psychological Science, v.18 n.1, p.19-23

Vaughn, James, and Diserens, Charles M. (1938), “The Experimental Psychology of Competition”, The Journal of Experimental Education, v.7 n.1, p.76-97
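A note on the Fisher’s Exact tests reported in Section 4: the cross-experiment comparisons of task 3 choices can be sketched in a few lines of Python. This is a minimal sketch, not the procedure or software used to obtain the reported p-values. The function name is hypothetical, and the cell counts are back-calculated from the reported percentages and sample sizes (57% of 30, 73% of 40, and 35% of 40), which is an assumption.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    smallest = max(0, row1 - (total - col1))
    largest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of every table no more likely than the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        p for p in map(prob, range(smallest, largest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Rows: replication, NV. Columns: chose tournament, chose piece rate.
# Assumed counts: 17 of 30 men and 17 of 30 women in the replication;
# 29 of 40 men and 14 of 40 women in NV.
men = fisher_exact_two_sided(17, 13, 29, 11)
women = fisher_exact_two_sided(17, 13, 14, 26)
print(f"men across experiments:   p = {men:.3f}")
print(f"women across experiments: p = {women:.3f}")
```

The two-sided p-value here sums the probabilities of all tables that are no more likely than the observed one, which is one common convention. Other conventions for two-sided exact tests can give slightly different values.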