Do Women Shy Away From Competition? Do Men
Compete Too Much? : A (Failed) Replication
Curtis R. Price*
College of Business
University of Southern Indiana
Evansville, IN 47712
Revision: January 15, 2010
Recent research pertaining to gender differences in the labor market has focused on
women’s preference to abstain from competition as a potential explanation of why so few
women are employed in high paying competitive jobs. Niederle and Vesterlund (2007) find that
when given the choice of payment scheme in a simple addition task 73% of men and only 35%
of women choose the tournament over the piece rate. This note describes additional data
collected at a different university that fails to replicate these results. I discuss potential causes
such as differences in performance, relatively higher female overconfidence, and the differences in
the subject pools between Purdue University and the University of Pittsburgh.
JEL Classifications: C91, C99, J16, J22
Keywords: Gender, Competition, Experiments, Wages, Replication
*
This work was carried out while the author was a graduate student at Purdue University in the Krannert Graduate
School of Management. I would like to thank Muriel Niederle & Lise Vesterlund for supplying the raw data,
instructions, and software from their study along with helpful comments. I would also like to thank Jack Barron and
Tim Cason for their support, comments, and discussions. This project was funded by NSF grant #102953.
Supporting data and instructions can be found in the ExLab Digital Archive at http://exlab.bus.ucf.edu. All errors are
my own. Send correspondence to: [email protected]
1. Introduction
Recent experimental research has focused on selection biases in relation to competitive
payment schemes to explain the underrepresentation of women in high paying managerial jobs.
The purpose of this paper is to replicate the findings of a recent study by Niederle and
Vesterlund (2007) (hereafter NV). Replication in experimental economics is an important, and
sometimes overlooked, aspect of the research. As a replication, the design and methodology are
followed as closely as possible. However, the results of the original study are not replicated.
2. Related Studies
Our focus here is on replicating the results of NV, in which subjects’ choices between noncompetitive (piece rate) and competitive (tournament) wage schemes are explored. Most closely
related to our purpose here is the segment of the literature that explores gender differences in
choices for competition in the laboratory. There is also a literature that focuses on the
performance of male and female subjects when competition in wage schemes is used. For a
review of this literature see Croson and Gneezy (2009). Studies in this vein have found a
remarkably robust result that females tend to select wage schemes that involve absolute rather
than relative compensation schemes.
Table 1 summarizes the current set of literature on the impact of competition on the selection
of a payment scheme.1 Notice that the results of NV have previously been replicated with
similar designs using several tasks and subject populations. The preponderance of evidence
supports the hypothesis that males choose competitive payment schemes more often than
women. Nonetheless, the variability of the proportions of female and male subjects that choose
1
Because of the nature of this replication, the studies in Table 1 focus on the experimental economics literature.
The author is unaware of any sufficiently similar studies from the experimental psychology literature to compare.
This is likely due to the fact that it is generally economists who desire to study wages. For a discussion on the
psychological similarities between males and females in the psychology literature see Hyde (2005), where the author
discusses how, even though there may be evidence of gender differences, the variability of the magnitude of the
differences suggests that these differences may be overstated. Additionally, for an overview of experimental
psychology and competition as it relates to non-cooperation see Van Vugt et al. (2007); for an overview of
evolutionary psychology and competition see Campbell (2002) and Knight (2002); for an overview of gender and
social psychology see Marini (1990); and for an excellent overview of some of the first studies in the psychology
literature on competition see Vaughn and Diserens (1938).
to compete across these experiments also suggests that even slight differences in the design of
the experiment may influence both male and female decisions to freely enter into a competitive
payment scheme. Additionally, as highlighted by Gneezy, Leonard, and List (2008), the data
collected does not unanimously support this hypothesis.
Gneezy, Leonard, and List (2008) compare results across a matrilineal and patriarchal society
studying the Khasi and Maasai. The authors find that men from the patriarchal Maasai people
choose the tournament compensation more often than do Maasai women. On the other hand,
they find that Khasi women, who exist in a matrilineal society, choose the tournament
compensation more often than do Khasi males. This suggests that environment/social norms play
a crucial role in determining gender roles regarding competition.
Vandegrift, Yavas, and Brown (2004) have subjects perform a forecasting task. The
task is completed 20 times, and subjects choose a compensation scheme at the beginning of
each round. The proportions reported in Table 1 are for choices made across all twenty periods.
Interestingly, the proportion of subjects who choose the tournament declines as the experiment
proceeds. In the first round, however, the authors find that female subjects choose the
tournament slightly more often than male subjects in the winner-take-all tournament and only
slightly less often in the multi-winner tournament (graduated tournament). Also, in line with the
results from Cason, Masters, and Sheremeta (2008), the authors find that, controlling for
performance differences, male and female subjects do not enter the winner-take-all tournament
at significantly different rates. Nonetheless, they do find significant differences in total and within
the graduated tournament.
This replication study was conducted at the Vernon Smith Experimental Economics
Laboratory (VSEEL) at Purdue University. In addition to this study, there have been two other
studies very similar to NV conducted at Purdue University: Price (2009), and Cason, Masters,
and Sheremeta (2008). Both studies use the addition task as outlined in this replication and
collect substantially more data than NV and this replication (Price (2009) has 310 subjects and
Cason et al. (2009) has 93 subjects). Interestingly, both studies replicate NV’s result, but both
show a substantially smaller difference in gender choices relative to NV. This difference is
dramatically smaller in Cason et al. (2009) under the winner-take-all tournament where there is
no statistically significant difference in the choices of male and female subjects although male
subjects do choose the tournament more often than female subjects. The authors only find
statistically significant differences in choices when the subjects choose a proportional
tournament and in total. Even so, Price (2009) does find a significant difference in a winner-take-all
tournament choice similar to NV. This evidence suggests that the overall size of the gender
difference may be smaller among the VSEEL subject population. If this is the case, the smaller
sample size in the replication may be insufficient to identify (statistically) the overall effect.
Table 1: Related Literature

| Authors (1) | Payment Choices (2) | Task (3) | Male % (4) | Female % (5) | Subject Population / Laboratory (6) |
|---|---|---|---|---|---|
| Vandegrift, Yavas, and Brown (2004) | piece rate/WTA tournament | Forecasting | 35% | 22% | Pennsylvania State University/LEMA |
| | piece rate/multi-winner tournament | Forecasting | 45% | 24% | Pennsylvania State University/LEMA |
| Datta Gupta, Poulsen, & Villeval (2005) | piece rate/tournament | Solving Mazes | 60% | 34% | GATE (Groupe d'Analyse et de Théorie Economique) |
| Dohmen and Falk (2006) | fixed payment/tournament | Multiplication | 58% | 40% | University of Bonn |
| Gneezy & Rustichini (2006) | piece rate/WTA tournament | Anagrams | ~42% | ~25% | University of Chicago Students/Executive MBA students & Teachers |
| | piece rate/WTA tournament | Basketball | ~62% | ~12% | University of Chicago Students/Executive MBA students & Teachers |
| Niederle & Vesterlund (2007) | piece rate/WTA tournament | Addition | 73% | 35% | Pittsburgh Students/PEEL |
| Gneezy, Leonard, List (2008) | piece rate/WTA tournament | Bucket Toss Game | 50% | 26% | Maasai |
| | piece rate/WTA tournament | Bucket Toss Game | 39% | 54% | Khasi |
| Cason, Masters, & Sheremeta (2009) | piece rate/WTA tournament | Addition | 43% | 37% | Purdue Students/VSEEL |
| | piece rate/proportional payment tournament | Addition | 68% | 52% | Purdue Students/VSEEL |
| Niederle, Segal, & Vesterlund (2009) | piece rate/multi-winner tournament | Addition | 74% | 31% | Harvard Business School/CLER |
| Price (2009) | piece rate/WTA tournament | Addition | 66% | 49% | Purdue Students/VSEEL |
| Dargnies (2009) | piece rate/WTA tournament | Addition | 85% | 51% | Parisian Experimental Economics Laboratory (LEEP) |
| Eriksson et al. (2009) | piece rate/tournament | Chosen Effort (non-real effort) | not reported | not reported | GATE (Groupe d'Analyse et de Théorie Economique) |
| Teyssier (2008) | revenue sharing/tournament | Chosen Effort (non-real effort) | not reported | not reported | GATE (Groupe d'Analyse et de Théorie Economique) |

Risk aversion has been documented to affect male and female subjects differently in the
laboratory. Results in the literature concerning the interaction of the decision to enter into the
competitive compensation scheme and risk aversion have been mixed. In Cason et al. (2009) the
authors find a significant effect of gender on the decision to enter the tournament even after
controlling for risk aversion. In contrast, Eriksson et al. (2009) and Teyssier (2008) find no
significant difference in choices when controlling for risk aversion. Unfortunately, these studies
do not document the raw proportion of male and female subjects choosing the tournament for
comparison to other studies. Additionally, Datta Gupta, Poulsen, and Villeval (2005) document
that risk aversion plays a role in the decision of female, but not male, subjects to enter into the
tournament.
3. Experiment Design
The experiment uses copies of NV’s instructions and their software.2
The experiment consists of four tasks. In each task, except the last, the subjects are asked
to find the correct sum of five randomly generated two-digit numbers. The subjects have five
minutes to solve a series of these problems.
At the beginning of each session, the experimenter hands out instructions and reads them
aloud while the subjects follow along. At that time, the experimenter informs the subjects that
there are four tasks and that how they get paid varies across the four tasks. The experimenter
does not tell the subjects details of the payment schemes for each task until immediately prior to
the beginning of each task.
Subjects receive absolutely no feedback on the task outcomes (other than their own
performance) until the end of the session. Additionally, after finishing the four tasks, but prior to
payment, subjects are asked to gauge their ability relative to their group. In particular, subjects
are asked to guess their ranking (1st to 4th) in both the task 1-piece rate and the task 2 –
tournament. Lastly, subjects are asked a series of questions regarding their college major,
gender, and race.
At the conclusion of the experiment, a number from 1 to 4 is drawn to select the task for
payment. The subject’s total compensation is the sum of a $5 show-up fee, a $7 participation
fee, and the performance pay from the selected task.
A total of 60 subjects (30 men and 30 women) participated in the experiment at Purdue
University.3 The average payoff was $20.40 and the experiment lasted less than one hour.
2
The program was written in z-Tree (Fischbacher, 2007).
Table 2: The Four Tasks

| Task | Description |
|---|---|
| Task 1 – Piece Rate | For each correct answer the subject is paid $0.50. |
| Task 2 – Tournament | Subjects are matched in groups of four. Groups consist of two men and two women. The subject with the largest number of correct problems earns $2.00 per correct problem. Otherwise the subject earns zero for this task. In the case of ties, the winner is determined randomly. |
| Task 3 – Choice | Subjects have the option of choosing the tournament or the piece rate. Subjects choosing the tournament have their performance gauged against the performance in the task 2-tournament of the other three subjects in their group. |
| Task 4 – Submit Piece Rate | Subjects have the choice to submit their task 1 performance to be paid either by the piece rate or the tournament. |

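
The payment rules in Table 2, together with the show-up and participation fees described in this section, can be sketched in a few lines of Python. This is only an illustration and not the z-Tree program used in the experiment; the function names are hypothetical, and the tie-breaking rule is assumed to be a uniform draw among the subjects tied for first place.

```python
import random

PIECE_RATE = 0.50         # dollars per correct answer (task 1)
TOURNAMENT_RATE = 2.00    # dollars per correct answer for the group winner (task 2)
SHOW_UP_FEE = 5.00        # from the experiment design section
PARTICIPATION_FEE = 7.00  # from the experiment design section


def piece_rate_pay(correct):
    """Pay under the non-competitive piece rate."""
    return PIECE_RATE * correct


def tournament_pay(own_correct, others_correct, rng=random):
    """Winner-take-all pay for one subject in a group of four.

    others_correct holds the scores of the other three group members.
    In task 3 these would be their task 2 scores.
    """
    best_other = max(others_correct)
    if own_correct > best_other:
        return TOURNAMENT_RATE * own_correct
    if own_correct < best_other:
        return 0.0
    # Tie for first place: assume a uniform random draw among the tied subjects.
    n_tied = 1 + sum(1 for c in others_correct if c == best_other)
    wins = rng.randrange(n_tied) == 0
    return TOURNAMENT_RATE * own_correct if wins else 0.0


def total_compensation(task_pay):
    """Total pay: both fees plus the pay from the one task drawn for payment."""
    return SHOW_UP_FEE + PARTICIPATION_FEE + task_pay
```

For example, a subject who solves 12 problems and beats group members with 10, 11, and 9 earns $24 in the tournament but only $6 under the piece rate, which is why beliefs about relative rank matter for the task 3 choice.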
4. Task 3 & Task 4 Choice
After solving addition problems under the piece rate and the tournament in tasks 1 & 2,
and prior to solving problems in the third task, the subjects are given the choice of how they
want to be paid for their performance in the third task. They choose from either the piece rate or
the tournament.
Females and males within the replication experiment do not differ in their choices
between the piece rate and the tournament with both choosing the tournament 57% of the time.
On the other hand, NV’s experiment documents a large disparity between men and women
choosing the tournament where men choose the tournament 73% of the time compared to only
35% of the time for female subjects. Using Fisher’s exact test, we cannot reject the null
hypothesis that the male subjects’ choices across experiments come from the same distribution
(p-value = 0.21). Fisher’s exact test on female subjects’ choices across the experiments is
significant at the 10% level (p-value = 0.091).
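
As an illustration of the cross-experiment comparison, the following is a minimal standard-library sketch of a two-sided Fisher's exact test on a 2x2 table. The cell counts are assumptions: they are back-calculated from the rounded percentages reported here (57% of 30 subjects taken as 17, 73% of 40 as 29, and 35% of 40 as 14), not taken from the raw data, so the resulting p-values may differ slightly from those reported in the text.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables that are no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Rows: replication, NV. Columns: chose tournament, chose piece rate.
# ASSUMED counts, inferred from rounded percentages (see lead-in).
p_men = fisher_exact_two_sided(17, 13, 29, 11)
p_women = fisher_exact_two_sided(17, 13, 14, 26)
print(f"men: p = {p_men:.3f}, women: p = {p_women:.3f}")
```

The test conditions on the row and column totals, which suits the small cell counts in this comparison.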
In the fourth task subjects do not have to solve problems. The subjects are told that they
may be paid one more time for the number of problems they correctly solved in task 1 but now
have to choose how they want their payment to be determined. Again, they choose from either
the piece rate or the tournament.
3
Niederle & Vesterlund (2007) sample a total of 80 subjects, including 40 men and 40 women and the average
subject payment was $19.80.
Within the NV experiment, male subjects choose the tournament in task 4 55% of the time
compared to only 25% of the time for females (p-value = 0.012). In this replication, male and
female subjects choose the tournament equally often, 40% of the time. Nonetheless, no
statistically significant difference exists in either men’s or women’s choices across the two
experiments (p-value = 0.24 for men, p-value = 0.20 for women).
5. Analysis
Before considering the data, it should be noted that we cannot rule out potential bias in
this replication that is (inadvertently) introduced by the experimenter himself (see Hoffman,
McCabe, & Smith (1996)). Experimental psychologists have found evidence of experimenter
bias in a broad range of studies from achievement scores to rats running a maze.4 Although the
fact that Price (2009) replicates the results of NV makes experimenter bias an unlikely source of
differences, it is important to note this potential problem for experimental economists.
Table 3: Average Performance by Gender

| | Replication | NV | (p-value) |
|---|---|---|---|
| Task 1 | | | |
| Male | 10.1 | 10.7 | (0.522) |
| Female | 10.6 | 10.2 | (0.526) |
| (p-value) | (0.585) | (0.459) | |
| Task 2 | | | |
| Male | 11.9 | 12.1 | (0.738) |
| Female | 12.7 | 11.8 | (0.300) |
| (p-value) | (0.408) | (0.643) | |
| Task 2 - Task 1 | | | |
| Male | 1.7 | 1.5 | (0.645) |
| Female | 2.1 | 1.7 | (0.527) |
| (p-value) | (0.641) | (0.673) | |

In the table above, the p-values under the averages represent a t-test of means within the
replication and NV but across gender.5 The p-values in the rightmost column are from a t-test of
4
See Rosenthal (2002) for an overview of this social psychology literature on experimenter bias. Additionally, in
light of other studies that replicate NV’s finding, the gender of the experimenter is an unlikely source of bias but it
should be noted that studies such as Innocenti & Pazienza (2004) demonstrate that this is still a potential source of
bias. Any unconscious cue from the experimenter which may prime the subjects regarding gender is also unlikely
to bias the results since Niederle, Segal, and Vesterlund (2009) replicate the results of NV and they explicitly inform
the subjects of the gendered nature of their research hypothesis.
5
Comparisons of means are, unless otherwise stated, for two-sided t-tests. This is done to facilitate a comparison to
the results in Niederle and Vesterlund (2007).
the difference in means across the replication and NV data but within gender. Inspection of the
above table reveals no significant differences across genders and within study or across studies
but within gender. In light of the fact that there are no discernible differences in performance,
and that individuals have no knowledge of relative performance until after all choices have been
made, it is highly unlikely that performance differences, either within experiment or across
experiments, are driving the disparate results of this study and NV.6
After all four tasks have been completed, subjects are asked to subjectively gauge their
ability in the task 2 tournament. If a subject’s guess is correct, she earns $1. In particular, the
subjects are asked to guess their within-group ranking from 1st (best) to 4th (worst). Table 4
summarizes the guesses made by the subjects in both NV and the replication experiment along
with the number of times the guesses were incorrect. The bottom panel of Table 4 shows the
same data for NV’s experiment.
Table 4: Comparison of Task 2 - Tournament Rankings

| | 1st (best) | 2nd | 3rd | 4th (worst) | Total |
|---|---|---|---|---|---|
| Replication Data | | | | | |
| Men: #Guessed | 17 | 8 | 4 | 1 | 30 |
| Men: #Incorrect | 10 | 8 | 2 | 0 | 20 |
| Women: #Guessed | 15 | 10 | 3 | 2 | 30 |
| Women: #Incorrect | 8 | 4 | 2 | 2 | 16 |
| NV Data | | | | | |
| Men: #Guessed | 30 | 5 | 4 | 1 | 40 |
| Men: #Incorrect | 22 | 3 | 2 | 1 | 28 |
| Women: #Guessed | 17 | 15 | 6 | 2 | 40 |
| Women: #Incorrect | 9 | 10 | 5 | 1 | 25 |

6
In the replication data, male subjects that choose the tournament in task 3 solve more problems than male subjects
who choose the piece rate in task 3 in all measures of performance (task 1, task 2, and task 2 minus task 1). Female
performance in these three measures does not differ when we compare across task 3 choice. This implies that male
subjects may be more accurate in their evaluation of their own ability but it is difficult to imagine that this would
lead to more female subjects choosing the tournament. NV found no such difference for either male or female
subjects. In the next section we analyze probit models of the decision to enter into the tournament including
controls for performance across the two experiments.
Although NV find strong evidence of a difference in male and female rankings (Fisher’s
Exact test yields a p-value of 0.016) this difference does not exist in the replication data (p-value
= 0.853). Nonetheless, comparing across the two experiments, we cannot reject the null hypothesis
that the guessed ranks of males come from the same distribution (p-value = 0.342). Nor can we
reject the null hypothesis that the women’s guessed ranks come from the same distribution
(p-value = 0.920).
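
To make the comparison of guessed ranks concrete, the short sketch below summarizes the "#Guessed" rows of Table 4 as the share of subjects who guess 1st place and the mean guessed rank. The counts are taken from Table 4; the two summary measures are illustrative choices and are not the statistics on which the Fisher's exact tests above are based.

```python
# Number of subjects guessing rank 1st, 2nd, 3rd, 4th in the task 2 tournament (Table 4).
guesses = {
    ("Replication", "Men"): [17, 8, 4, 1],
    ("Replication", "Women"): [15, 10, 3, 2],
    ("NV", "Men"): [30, 5, 4, 1],
    ("NV", "Women"): [17, 15, 6, 2],
}


def share_first(counts):
    """Share of subjects who guess that they ranked 1st in their group."""
    return counts[0] / sum(counts)


def mean_rank(counts):
    """Average guessed rank, where 1 is best and 4 is worst."""
    return sum(rank * n for rank, n in enumerate(counts, start=1)) / sum(counts)


for (study, gender), counts in guesses.items():
    print(f"{study:12s} {gender:6s} share 1st = {share_first(counts):.3f}, "
          f"mean rank = {mean_rank(counts):.2f}")
```

With these counts, the gender gap in the share guessing 1st place is 32.5 percentage points in NV (75% of men against 42.5% of women) and roughly 6.7 percentage points in the replication (about 56.7% against 50%).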
NV, Niederle, Segal, and Vesterlund (2009) and Dargnies (2009) all document that male
subjects are more overconfident than females. This is consistent with the psychology literature
that demonstrates that males are more overconfident than women.7 In NV’s study the authors
estimate that about 25% of the total gender difference is explained by relatively higher
overconfidence of male subjects; Niederle, Segal, and Vesterlund (2009) estimate
overconfidence’s contribution to be about 30%-50%; and Dargnies (2009) estimates that
overconfidence accounts for about 13% of the difference. Similar to the results in this
replication, Price (2009) finds that male subjects are only slightly more overconfident, although
not statistically more overconfident, than female subjects and documents a much smaller gender
difference. Even so, the lack of statistically different levels of confidence across NV and this
replication may make it difficult to measure how this impacts the overall differences.
Nonetheless, it is clear that this is a defining difference between the two subject populations and
at least partially contributes to the disparate results.
6. Statistical Models of the Decision to Enter the Tournament
In Table 5, I utilize an interaction term in the probit model to determine the
differences across subject pools. I define the variable “NV” as being 1 when the subject’s record
is from NV’s subject pool and 0 when it is from the subject pool of the replication. By
interacting this indicator with the three factors that may influence the decision to enter into the
7
See Beyer (1998) for an overview of the psychology literature concerning biases in the ability to gauge one’s own
ability in a task. Interestingly, this study also finds that the amount of bias is also a function of the task. Females
have a larger bias in self-evaluation in answering questions that are determined to be masculine in nature. In
answering questions that are neutral and feminine in nature, the author finds no difference in self-evaluation between
male and female subjects. Additionally, Niederle and Yestrumskas (2008) have subjects choose from a hard and an
easy task. They find that there is no difference in overconfidence across gender but that male subjects choose the
hard task significantly more often than females.
tournament we can estimate how these factors affect the decision to enter the tournament across the two
subject pools.8
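
The construction of the interacted regressors can be sketched as follows. The example only builds the right-hand-side variables for a specification with performance controls; fitting the probit itself requires a statistics package and is not shown. The record layout, the field names, and the two example subjects are hypothetical and are not drawn from either data set.

```python
def design_row(record):
    """Build the regressors of a fully interacted specification for one subject.

    record["nv"] is 1 for a subject from NV's pool and 0 for the replication.
    """
    nv = record["nv"]
    base = {
        "female": record["female"],
        "tournament": record["task2_correct"],
        "tournament_minus_pr": record["task2_correct"] - record["task1_correct"],
    }
    row = dict(base)
    row["nv"] = nv
    # Each NV interaction is zero for replication subjects, so its coefficient
    # captures how the effect of the base variable differs in NV's pool.
    for name, value in base.items():
        row["nv_x_" + name] = nv * value
    return row


# Hypothetical records, for illustration only.
records = [
    {"nv": 1, "female": 1, "task1_correct": 10, "task2_correct": 12},
    {"nv": 0, "female": 1, "task1_correct": 11, "task2_correct": 13},
]
rows = [design_row(r) for r in records]
for row in rows:
    print(row)
```

Adding the guessed rank or the task 4 choice to the base variables would extend the same pattern to the richer specifications.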
Table 5: Estimation of the Difference in the Marginal Effects across Subject Pools

| Marginal Effect (p-value) | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) |
|---|---|---|---|---|---|---|---|---|---|
| Female (1 if female) | -0.031 | 0.022 | 0.025 | -0.033 | 0.003 | -0.001 | -0.033 | -0.009 | -0.009 |
| | (0.815) | (0.844) | (0.856) | (0.767) | (0.977) | (0.990) | (0.668) | (0.925) | (0.872) |
| Tournament (task 2 correct) | 0.031 | 0.008 | 0.001 | 0.039 | 0.021 | 0.009 | 0.029 | 0.021 | 0.011 |
| | (0.043) | (0.602) | (0.955) | (0.053) | (0.317) | (0.375) | (0.044) | (0.269) | (0.317) |
| Tournament minus Piece Rate (task 2 minus task 1 correct) | 0.017 | -0.002 | 0.018 | 0.012 | -0.003 | 0.004 | 0.007 | -0.003 | 0.004 |
| | (0.369) | (0.893) | (0.447) | (0.645) | (0.914) | (0.778) | (0.720) | (0.886) | (0.774) |
| Tournament Rank (guessed task 2 rank) | | -0.214 | -0.228 | | -0.151 | -0.062 | | -0.128 | -0.060 |
| | | (0.000) | (0.001) | | (0.066) | (0.118) | | (0.088) | (0.165) |
| Submit Piece Rate (1 if submit task 1 to tournament) | | | 0.215 | | | 0.066 | | | 0.081 |
| | | | (0.016) | | | (0.360) | | | (0.310) |
| NV * Female | -0.349 | -0.341 | -0.303 | -0.208 | -0.177 | -0.058 | -0.108 | -0.140 | -0.055 |
| | (0.039) | (0.056) | (0.098) | (0.047) | (0.088) | (0.184) | (0.192) | (0.182) | (0.350) |
| NV * Tournament | | | | -0.023 | -0.024 | -0.017 | -0.028 | -0.031 | -0.020 |
| | | | | (0.390) | (0.389) | (0.221) | (0.209) | (0.285) | (0.223) |
| NV * Tournament minus PR | | | | 0.004 | 0.001 | 0.006 | 0.043 | 0.048 | 0.029 |
| | | | | (0.915) | (0.978) | (0.731) | (0.155) | (0.218) | (0.211) |
| NV * Tournament Rank | | | | | -0.081 | -0.044 | | -0.042 | -0.029 |
| | | | | | (0.459) | (0.409) | | (0.691) | (0.640) |
| NV * Submit Piece Rate | | | | | | 0.126 | | | 0.152 |
| | | | | | | (0.272) | | | (0.260) |
| NV | 0.151 | 0.097 | 0.109 | 0.461 | 0.583 | 0.571 | 0.496 | 0.547 | 0.461 |
| | (0.191) | (0.333) | (0.386) | (0.198) | (0.234) | (0.209) | (0.167) | (0.301) | (0.335) |
| Observations | 140 | 140 | 140 | 140 | 140 | 140 | 118 | 118 | 118 |

NV is an indicator variable with 1=NV’s data. NV*___ is the indicator multiplied by the associated independent
variable. The marginal effects are calculated at a male subject who solves 12 problems in the piece rate and 13
problems in the tournament, who chooses to submit his task 1 performance to a tournament in task 4 and guesses a
tournament rank of 1st.
The bottom portion of Table 5 shows the difference in the marginal effects between the
replication data and the original data of NV. The NV*Female variable in the bottom section of
Table 5 captures the difference in the marginal effect of the female indicator variable across the
8
This is commonly referred to as a Chow test.
two experiments. Columns 1-3 indicate that, without controlling for any other differences across
experiments, female subjects in NV’s experiment are 30-35% less likely to enter into the
tournament in task 3 than female subjects in the replication. To control for differences across
experiments, we consider the fully interacted model in columns 4-6. By considering the
NV*Female variable in column 4 we see that, when controlling for a subject’s performance, there
is a large and significant difference in female choices across experiments. In particular, when
controlling for performance, the point estimate indicates that women in the replication
experiment are 20.8% more likely to enter into the tournament than women in the NV data. In
column 5, we include the guessed tournament rank of the subjects. Controlling for this
reduces the difference in propensity to enter into the tournament only to 17.7%. Thus, as was
anticipated, the differences in confidence and performance across the two experiments
account for only a small portion of the differences in women’s choices. In column 6, we introduce
the subject’s decision to submit their task 1 performance to a tournament. Including this
information increases the p-value of the NV*Female variable to 0.184, making it no longer
significant. Thus when accounting for task 4 choices there is no (statistically significant)
difference in women’s propensity to enter the tournament across the two experiments.
Unfortunately, a subject’s decision to submit their task 1 performance to a tournament ultimately
measures the subject’s willingness to enter into a tournament and therefore yields no further
information about the underlying cause of the difference in tournament entry across the two
experiments. Nonetheless, we can surmise from this analysis that differences in performance
and confidence do not fully explain the large difference in female participation in the
tournament.
In addition to gender, data is also collected on age, race, college major, student status,
and highest degree obtained. Table 6 gives a summary of the number of subjects across these
demographic categories. There is little difference in the distribution of major and race across the
two experiments.9 On the other hand, NV’s data include a wide range of subjects across student
9
The categories for the choices of college major are taken directly from NV’s study to facilitate comparison. In the
replication data, there are quite a few subjects who indicate “Other” for major. This is likely caused by the fact that
the available choices of college majors do not track well with common majors at Purdue University. In particular,
engineering is a common major of students at Purdue, but has no clear associated option. Probit models were
status and age while the replication data is entirely comprised of subjects who are younger than
age 30. This is important because the results of Gneezy, Leonard, and List (2008) suggest that
gender differences in competition are due to a nurture component. If this is the case, these
differences may grow larger with age and experience.
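Table 6: Demographic Data by Gender & Subject Pool

| | Replication: Total | Replication: Male | Replication: Female | NV: Total | NV: Male | NV: Female |
|---|---|---|---|---|---|---|
| Major | | | | | | |
| Arts | 3 | 1 | 2 | 4 | 1 | 3 |
| Business | 19 | 6 | 13 | 17 | 9 | 8 |
| Humanities | 0 | 0 | 0 | 11 | 5 | 6 |
| Natural Sciences | 3 | 1 | 2 | 2 | 0 | 2 |
| Social Sciences | 5 | 2 | 3 | 16 | 7 | 9 |
| Physical Sciences | 6 | 5 | 1 | 12 | 8 | 4 |
| Other | 24 | 15 | 9 | 18 | 10 | 8 |
| Race | | | | | | |
| Asian | 11 | 8 | 3 | 13 | 6 | 7 |
| Black | 2 | 1 | 1 | 2 | 0 | 2 |
| Caucasian | 41 | 19 | 22 | 60 | 30 | 30 |
| Hispanic | 3 | 1 | 2 | 2 | 2 | 0 |
| Other | 3 | 1 | 2 | 3 | 2 | 1 |
| Highest Degree Obtained | | | | | | |
| Highschool/GED | 54 | 25 | 29 | 55 | 26 | 29 |
| Bachelors | 4 | 4 | 0 | 15 | 7 | 8 |
| Masters | 1 | 0 | 1 | 8 | 6 | 2 |
| Ph.D. | 0 | 0 | 0 | 1 | 1 | 0 |
| None | 1 | 1 | 0 | 1 | 0 | 1 |
| Student Status | | | | | | |
| Undergrad | 59 | 30 | 29 | 59 | 27 | 32 |
| Grad | 1 | 0 | 1 | 14 | 8 | 6 |
| Neither | 0 | 0 | 0 | 7 | 5 | 2 |
| Age | | | | | | |
| 18-21 | 46 | 25 | 21 | 47 | 22 | 25 |
| 22-25 | 13 | 5 | 8 | 19 | 9 | 10 |
| 26-29 | 1 | 0 | 1 | 3 | 1 | 2 |
| 30-33 | 0 | 0 | 0 | 3 | 3 | 0 |
| 34-37 | 0 | 0 | 0 | 3 | 2 | 1 |
| 38+ | 0 | 0 | 0 | 5 | 3 | 2 |
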
In NV’s experiment 59 subjects were undergraduate students. Of these 59 students, 70%
of the male subjects choose the tournament and 40% of the females choose the tournament
(Fisher’s Exact test yields p-value = 0.035). For non-undergraduate students 77% of the male
subjects choose the tournament compared to only one out of the eight female non-undergraduate
students (i.e. 12.5% of females choose the tournament). From this information we can conclude
analyzed with indicators for major, race, highest degree obtained, student status, and age. No significant differences
were found within either NV’s data or the replication data.
that the results of NV are not driven by the inclusion of these non-standard subjects but that the
inclusion of these subjects strengthens NV’s finding.
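
A back-of-the-envelope check of this decomposition is sketched below. The group sizes (27 and 13 men, 32 and 8 women) come from the student status rows of Table 6 and the one-in-eight figure from the text, but the remaining entry counts (19, 10, and 13) are assumptions inferred from the rounded percentages reported above rather than read from NV's raw data.

```python
# (number choosing the tournament in task 3, group size) in NV's data.
# Entry counts other than 1 of 8 are ASSUMED from rounded percentages.
nv_choices = {
    ("male", "undergrad"): (19, 27),
    ("male", "other"): (10, 13),
    ("female", "undergrad"): (13, 32),
    ("female", "other"): (1, 8),
}


def entry_rate(gender, statuses=("undergrad", "other")):
    """Share of the given gender choosing the tournament within the listed groups."""
    entered = sum(nv_choices[(gender, s)][0] for s in statuses)
    total = sum(nv_choices[(gender, s)][1] for s in statuses)
    return entered / total


for gender in ("male", "female"):
    print(gender,
          "undergrad:", round(entry_rate(gender, ("undergrad",)), 3),
          "pooled:", round(entry_rate(gender), 3))
```

Under these assumed counts the pooled rates are 72.5% for men and 35% for women, in line with the 73% and 35% reported by NV, while the gender gap among undergraduates alone is about 30 percentage points compared with 37.5 in the pooled data.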
If we limit our scope to only undergraduate students across NV and this replication, there
is no significant difference in either the male or female choices (Fisher’s Exact test yields p-values of 0.41 and 0.31 for men and women respectively). This suggests that the differences
across the two experiments are exacerbated by the differences in subject pool.
By considering columns 7 – 9 in Table 5, where we pool NV’s and this replication’s data
while restricting the data to only those subjects who are undergraduate students we can see that
there are no measurable differences across the two experiments. In particular, by considering the
NV*Female variable in column 7 we see that, while controlling for differences in performance
across the two experiments and restricting our sample to only undergraduate students, there are
no measurable differences in the choices of male and female subjects across experiments. This
implies, as we anticipated from the previous analysis, that some of the difference across the two
experiments is due to the inclusion of non-undergraduate subjects in NV’s data. Nonetheless, it
is also important to note that NV’s finding does not hinge on these subjects.
7. Conclusion
This note describes the results of a replication experiment that was conducted at Purdue
University. Although Niederle & Vesterlund (2007) find that 73% of men and only 35% of
women choose the tournament over the piece rate, there is no difference in male and female
choices in the replication. An analysis of the data shows that two main factors contribute to this
non-replication. First, male and female subjects in the replication do not seem to differ in their
level of overconfidence in ability whereas NV found a sizable difference in overconfidence.
Secondly, the inclusion of non-undergraduate students in NV strengthens their results whereas
this study includes almost solely undergraduate students. Even so, the results of NV are not
driven by the inclusion of non-undergraduate subjects.
Previous studies which replicate NV’s finding at the same laboratory as this replication
seem to suggest that the gender difference is less pronounced at Purdue University and the
analysis points to differences in overconfidence being one of the main factors for this smaller
gender difference. Nonetheless, we are unable to ascertain exactly why this replication and NV
yield different qualitative results.
The important aspect of this non-replication is not to challenge the qualitative results of
NV or other studies in this vein. Indeed, along with a multitude of other studies, a similar study
by the author (Price (2009)) does replicate the result that male and female subjects choose
different compensation schemes. In my opinion, this study serves as a reminder that, although it
may be easier to publish studies that highlight large gender differences, we need to be mindful
and consider the evidence as a whole. This study shows that it is difficult to extrapolate from
one laboratory to the next, let alone to apply laboratory results to the field. It could very well be
the case that, when the totality of the data is considered, males and females are not very
different even though the popular belief is that males and females are drastically different along
many dimensions (see Hyde (2005) for an overview of this problem in psychology). This raises
the question of the external validity of experimental results and demonstrates that, in conjunction
with laboratory studies, economists must also endeavor to study data from the field.
References
Beyer, Sylvia (1998), “Gender Differences in Self-Perception and Negative Recall Biases”, Sex
Roles, v.38 nos.1/2, p.103-133
Cason, Timothy N., Masters, William A., and Sheremeta, Roman M. (2010), “Entry into Winner-Take-All and Proportional Prize Contests: An Experimental Study”, working paper, January
2010
Croson, Rachel, and Gneezy, Uri (2009), “Gender Differences in Preferences”, Journal of
Economic Literature, 47:2, p.1-27
Datta Gupta, N. Poulsen, A., and Villeval, M.C. (2005), "Male and Female Competitive
Behavior - Experimental Evidence," Working Papers 0512, Groupe d'Analyse et de Théorie
Economique (GATE), Centre national de la recherche scientifique (CNRS), Université Lyon 2,
Ecole Normale Supérieure.
Dohmen, Thomas & Falk, Armin (2006), “Performance Pay and Multi-dimensional Sorting:
Productivity, Preferences and Gender”, IZA discussion paper# 2001 (May 2006)
Eriksson Tor, Teyssier, Sabrina, and Villeval, Marie-Claire (2009), “Does Self-Selection
Improve the Efficiency of Tournaments?”, Economic Inquiry, (47), 530-48
Fischbacher, U. (2007), “z-Tree: Zurich toolbox for ready-made economic experiments”,
Experimental Economics, Jun 2007. Vol. 10, Iss. 2; p. 171
Hoffman, Elizabeth, McCabe, Kevin, Smith, Vernon (1996), “Social Distance and Other
Regarding Behavior in Dictator Games”, American Economic Review, 86, p.653-660
Hyde, Janet Shibley (2005), “The Gender Similarities Hypothesis”, American Psychologist, v. 60
n. 6, p.581-592
Innocenti, Alessandro and Pazienza, Maria Grazia (2004), “Experimenter bias across gender
differences”, University of Siena, Department of Political Economy, working paper #438
Gneezy, Uri & Rustichini, Aldo (2006), “Executives versus Teachers: Gender, Competition, and
Self Selection”, working paper, University of California, San Diego.
Gneezy, Uri, Leonard, Kenneth L., and List, John A. (2009), “Gender Differences in
Competition: Evidence From a Matrilineal and Patriarchal Society”, Econometrica, vol. 77 n.5,
p.1637-1644
Marini, Margaret Mooney (1990), “Sex and Gender: What do we know?”, Sociological Forum,
v.5, n.1, p.95-120
Niederle, Muriel, and Vesterlund, Lise (2007), “Do Women Shy away from Competition? Do
Men Compete too Much?”, Quarterly Journal of Economics, August 2007, v. 122, iss. 3, pp.
1067-1101.
Niederle, Muriel, Segal, Carmit, & Vesterlund, Lise., (2009) "How Costly is Diversity?
Affirmative Action in Light of Gender Differences in Competitiveness", working paper (May
2009)
Niederle, Muriel and Yestrumskas, Alexandra H. (2008), “Gender Differences in Seeking
Challenges: The Role of Institutions”, NBER working paper #13922 (April 2008)
Price, Curtis R. (2009), “Gender, Competition, and Managerial Decisions”, working paper
(available at SSRN)
Rosenthal, Robert (2002), “Covert Communication in Classrooms, Clinics, Courtrooms, and
Cubicles”, American Psychologist, v.57(11), p.839-849
Teyssier, Sabrina (2008), “Experimental Evidence on Inequity Aversion and Self-Selection
between Incentive Contracts”, working paper (May 2008)
Vandegrift, Donald, Yavas, Abdullah, and Brown, Paul M. (2004), “Men, Women, and
Competition: An Experimental Test of Labor Market Behavior”, mimeo.
Van Vugt, M., De Cremer, D., Janssen, D.P. (2007), “Gender Differences in Cooperation and
Competition: The Male-Warrior Hypothesis”, Psychological Science, v.18, n.1, p.19-23
Vaughn, James and Diserens, Charles M. (1938), “The Experimental Psychology of Competition”,
The Journal of Experimental Education, Vol. 7, No. 1, pp. 76-97