The Reproducibility of Psychological Science
The Open Science Collaboration
Replication of Prescribed Optimism: Is it Right to Be Wrong About the Future? by David
A. Armor, Cade Massey & Aaron M. Sackett (2008, Psychological Science)
Anna van ‘t Veer1, Bethany Lassetter2 & Mark J Brandt3
1 [email protected], Tilburg University
2 [email protected], University of Oregon
3 [email protected], Tilburg University
Introduction
People tend to make optimistically biased predictions about their personal futures; for
example, we anticipate living longer than average, and we overestimate our chances of
success in the job market (Weinstein, 1980). This observation conflicts with the
assumption that our primary goal is to be accurate in our predictions. The original study
explored, among other things, what kind of predictions (accurate, optimistic, or
pessimistic) one ought to make.
The researchers found that participants (n = 127) clearly recommended optimistic
predictions, t(124) = 10.36, prep > 0.99, p < .001, Cohen’s d = 0.93 (Armor, Massey, &
Sackett, 2008).1 Overall, the modal prescription was moderately optimistic, which was
recommended almost twice as often as an accurate prescription (32.3% vs. 17.7%).
These findings support the view that people believe optimistically-biased predictions are
ideal.
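As a quick consistency check on the reported statistics, a one-sample Cohen's d can be recovered from the t statistic as d = t / sqrt(n), with n = df + 1. The sketch below assumes this conversion applies to the original test. The function name is ours, not the original authors'.

```python
import math


def cohens_d_from_t(t_value: float, df: int) -> float:
    """One-sample Cohen's d implied by a t statistic: d = t / sqrt(n), n = df + 1."""
    n = df + 1  # a one-sample t-test has n - 1 degrees of freedom
    return t_value / math.sqrt(n)


# Values reported for the original study: t(124) = 10.36.
d = cohens_d_from_t(10.36, 124)
print(round(d, 2))  # 0.93
```

The reported d = 0.93 is therefore consistent with 125 complete cases rather than the full n = 127.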
Methods
Power Analysis
The original effect size for the one-sample t-test of the primary prediction
was Cohen’s d = .93, 95% CI [.72, 1.14]. A power analysis using G*Power indicates that
samples of 12, 15, and 18 total participants are necessary to achieve 80%, 90%, and 95%
power, respectively, to detect this effect size (see Figures 1 – 3). As these analyses make
clear, the original study was well powered to detect a Cohen’s d = .93 and had a post-hoc
power of essentially 1.00 (see Figure 4).
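The power calculations above can be approximated outside G*Power. The following is a minimal sketch, assuming a two-tailed test at alpha = .05 (the text does not state the tail or alpha) and using SciPy's noncentral t distribution. The function names are illustrative, and results may differ slightly from G*Power's output.

```python
import math

from scipy import stats


def power_one_sample_t(d: float, n: int, alpha: float = 0.05) -> float:
    """Two-sided power of a one-sample t-test for effect size d and sample size n."""
    df = n - 1
    ncp = d * math.sqrt(n)  # noncentrality parameter under the alternative
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that |T| exceeds the critical value when the true effect is d.
    upper = stats.nct.sf(t_crit, df, ncp)
    lower = stats.nct.cdf(-t_crit, df, ncp)
    return float(upper + lower)


def min_n(d: float, target_power: float, alpha: float = 0.05, n_max: int = 10_000) -> int:
    """Smallest n (searched upward from 2) whose power reaches target_power."""
    for n in range(2, n_max + 1):
        if power_one_sample_t(d, n, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")


if __name__ == "__main__":
    for target in (0.80, 0.90, 0.95):
        print(f"d = 0.93, power = {target:.2f}: n = {min_n(0.93, target)}")
    print(f"post-hoc power at n = 125: {power_one_sample_t(0.93, 125):.4f}")
```

The same function can be called with other effect sizes and power levels, for example `min_n(0.72, 0.99)` for a calculation based on the lower confidence limit.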
1 The total N and the df for the one-sample t-test do not match. We assume that some
participants were missing data and that this explains the discrepancy.
Planned Sample
Because the original effect size was large, only a small number of participants is
needed to achieve very high power. To give the original effect the best possible
chance to succeed, we based the power calculation on 99% power and on the lower
limit of the confidence interval of the effect size (0.72; see Figure 5).2 This
analysis indicated that we need N = 38 to replicate the study, assuming a high level of
power and a smaller effect size than the original study reported. We also commit to
collecting data from at least an additional 10 people to provide a "cushion" in case there
are missing data. This replication attempt is a joint project between a lab at the University
of Oregon (UO) in the United States and Tilburg University (TU) in the Netherlands.
Each lab will collect enough data to have 99% power to detect Cohen’s d = .72 with at
least 10 participants of “cushion” (UO N = 48, TU N = 48). Importantly, in both labs it is
likely that it will be possible to collect data from additional participants and so N = 48 is
the lower limit of the planned sample size. In the TU lab, studies are run for one week at
a time and data collection will stop at the end of one week (typically between 100 and
150 participants). In the UO lab, studies are typically run throughout a 10-week term.
Data collection will stop at the end of the term, or when 100 participants have gone
through the study, whichever comes first.
In both samples, participants will be students and will be
compensated with either course credit or a monetary reward.3 Participants in the UO
sample are typically between 18 and 22 years old and primarily European American and
female. Participants in the TU sample are typically between 18 and 22 years old and
primarily native Dutch and female.
Materials
All materials were obtained directly from the original authors. Because the
primary result is based only on participants in one of the between-subject experimental
conditions (the “prescriptions” condition), we will replicate only this condition. Participants
will be asked:
“to imagine one of four different settings in which predictions (a) would be
relevant and (b) might range from overly pessimistic to overly optimistic.
These settings, chosen for breadth, included decisions about a financial
investment, an academic-award application, a surgical procedure, and a
dinner party. For each setting, we created eight vignettes by
independently manipulating three variables known to be related to
optimism: commitment (whether the decision to engage in a particular
action has or has not been made; Armor & Taylor, 2003), agency
(whether the decision to commit was, or will be, made by the protagonist
or by another person; Henry, 1994), and control (the degree to which the
protagonist can influence the predicted outcome; Klein & Helweg-Larsen,
2002). Each participant was randomly assigned to one setting and
received all eight vignettes, in counterbalanced order, within that setting”
(Armor, Massey, & Sackett, 2008, p. 329).
2 The larger sample sizes also ensure that there will be an adequate number of participants for
each of the four topical settings (a between-subject condition). If we used the sample size for
95% power only 4 or 5 participants would have been in each condition for the topical settings.
3 At the TU lab some lab sessions are run for course credit and some are run for monetary
payment depending on the week and the current course offerings at TU. We anticipate that this
study will be conducted when participants are offered course credit. The first 48 participants at
the UO will be compensated $5 for their participation. Course credit will be granted thereafter to
increase the sample size or to reach our goal n depending on the speed and ease of recruitment.
In the between-subjects condition that we replicate, the “prescriptions” condition,
participants will be “asked to provide prescriptions (i.e., to indicate whether it would be
best to be overly pessimistic, accurate, or overly optimistic) for each of the eight
vignettes” (Armor, Massey, & Sackett, 2008, p. 329, italics in the original). “Response
options ranged from -4 (extremely pessimistic) through 0 (accurate) to +4 (extremely
optimistic)” (Armor, Massey, & Sackett, 2008, p. 329, italics in the original) with
additional labels at -2 (moderately pessimistic) and +2 (moderately optimistic).
Consistent with the original study, participants will also complete a number of other
questions about the desirability of the scenario panning out, the probability of that
happening, and three other questions associated with the three variables manipulated in
the scenario.
The materials for the TU sample were translated into Dutch by Anna van ‘t Veer and
back-translated into English by a research assistant fluent in Dutch and English. The
back-translated version was compared to the original version by Mark Brandt and Anna van ‘t
Veer. Any discrepancies were resolved through discussion.
Procedure
Participants will arrive at the laboratory and will complete the study in a paper-and-pencil format, consistent with the original study. After completing an informed
consent form, participants will receive instructions, complete the experimental materials,
complete the Life Orientation Test, and finally complete the demographic variables
(gender, age, ethnicity, and year in college). This will closely replicate the prescriptions
condition from the original study in its entirety. In the TU lab, participants will complete
the materials in individual cubicles. Studies in this lab are often conducted within a one-hour lab session with several studies in each lab session. The order in which participants
complete the studies in a session is negotiated on a week-by-week basis. Although we
cannot guarantee that the replication study will be the first study in the session, we
anticipate that we will be able to negotiate this order. In the UO lab, participants will
complete experimental materials alone in a study room or in a shared space separated
from others by privacy dividers. Participants will be instructed to complete their materials
independently and to refrain from checking cellular devices throughout the study. As in
the TU lab, the replication study will likely be included with several other studies in one
experimental session.
Analysis Plan
Participants who did not complete all of the measures we analyze will be
excluded (i.e., a listwise deletion strategy). Each participant’s prescribed optimism responses
will be averaged together. The original test of the primary hypothesis is a one-sample
t-test that compares the average responses on the prescribed optimism measure to zero
(the mid-point of the scale). Because we are collecting data from both UO and TU, and
the participants from UO may receive money or course credit for their participation, we
will conduct an equivalent analysis in two stages. First, prescribed optimism will be
regressed on sample location/compensation (contrast codes: UO course credit = -1, UO
money = 0, TU = 1 & UO course credit = -1, UO money = 2, TU = -1). If sample
location/compensation does not have a significant effect on prescribed optimism (p >
.05), then we will conduct a one-sample t-test (test value = 0) across the samples. If
sample location/compensation does have a significant effect (p < .05), then we will
conduct a one-sample t-test (test value = 0) in each sample separately. Although we will
report the results from all of the samples in our final report, the result from the UO
money sample will be used for the analyses of the Reproducibility Project because the
UO money sample is the most similar to the original sample.
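The two-stage strategy described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis script. Stage 1 is implemented as a one-way ANOVA across the three location/compensation groups, which gives the same omnibus test as regressing on the full set of two contrast codes (whether the plan intends an omnibus test or a test of each contrast is not stated). The group labels and function name are hypothetical, and a p value of exactly .05 is treated as significant here.

```python
from scipy import stats


def two_stage_test(groups: dict, alpha: float = 0.05):
    """Run the two-stage strategy.

    groups maps a sample label (e.g. "UO_credit") to a list of per-participant
    mean prescribed-optimism scores, after listwise deletion.
    Returns the stage-1 p value and a dict of one-sample t-test results.
    """
    # Stage 1: does sample location/compensation affect prescribed optimism?
    _, p_group = stats.f_oneway(*groups.values())

    if p_group > alpha:
        # No detectable group effect: pool all participants and test against 0.
        pooled = [score for scores in groups.values() for score in scores]
        tests = {"all": stats.ttest_1samp(pooled, 0.0)}
    else:
        # Group effect present: test each sample against 0 separately.
        tests = {
            label: stats.ttest_1samp(scores, 0.0)
            for label, scores in groups.items()
        }
    return float(p_group), tests
```

Each entry in the returned dictionary exposes `.statistic` and `.pvalue`, so the UO money result can be read out directly when the samples are analyzed separately.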
We also plan to conduct some exploratory analyses; these will not be treated as an
indication of the reproducibility of the final result. First, we will identify the modal prescribed
optimism response and compare its frequency with the frequency of accurate
responses (see Armor, Massey, & Sackett, 2008, p. 329). Second, and consistent with
the original study, we will apply the same two-stage one-sample t-test strategy
described above to each of the eight vignette conditions (see Armor, Massey, &
Sackett, 2008, p. 329). Third, we will apply the same two-stage one-sample t-test
strategy described above across each of the agency, commitment, and control
manipulations (see Armor, Massey, & Sackett, 2008, p. 330). Fourth, a 2 (Sample: UO
vs. TU) X 2 (Agency: External vs. Internal) X 2 (Control: Low vs. High) X 2 (Commitment:
Precommitment vs. Postcommitment) mixed-design ANOVA, in which the first factor is a
between-subjects factor and all other factors are within-subject factors, will be conducted
to test for main effects and interactions of the conditions on prescribed optimism (see
Armor, Massey, & Sackett, 2008, p. 330). Finally, we will test whether people whose
average score on the LOT-R is below the midpoint also prescribe optimism more than
zero (see Armor, Massey, & Sackett, 2008, p. 330) using the two-stage one-sample t-test strategy for just those individuals below the midpoint.
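The first exploratory analysis (modal prescription versus accurate prescription) amounts to a frequency tally on the -4 to +4 response scale. A minimal sketch follows, assuming responses are supplied as a flat list of integers. Whether responses are pooled across vignettes or summarized per participant is left open here, and the example values are made up purely for illustration.

```python
from collections import Counter


def modal_vs_accurate(responses):
    """Return (modal response, its proportion, proportion of accurate responses).

    "Accurate" is the scale value 0. If several responses tie for the mode,
    the one encountered first in the list is returned.
    """
    counts = Counter(responses)
    mode, mode_count = counts.most_common(1)[0]
    n = len(responses)
    return mode, mode_count / n, counts[0] / n


# Hypothetical responses, not real data.
example = [2, 2, 0, 1, 2, -1, 0, 3, 2, 1]
print(modal_vs_accurate(example))  # (2, 0.4, 0.2)
```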
If sample location does have a significant effect on prescribed optimism, then we
will also see if the intercept of the regression model is significantly different from zero.
The intercept in this model, because the sample location is contrast coded, is the mean
of the prescribed optimism measure while controlling for sample location and the
significance of the intercept is equivalent to a one-sample t-test on prescribed optimism
(test value = 0) when controlling for sample location. That is, this effect will tell us
whether the predicted effect is significant on average across both samples, even if it
differs in size between the two samples.
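The claim about the intercept can be illustrated numerically. Because each contrast code sums to zero across the three groups and the two codes are orthogonal, the intercept of the regression equals the unweighted mean of the group means. The sketch below uses NumPy least squares. The group labels, function name, and scores are hypothetical, and the standard error needed for the significance test is not computed here.

```python
import numpy as np

# Contrast codes from the analysis plan: (contrast 1, contrast 2) per group.
CODES = {"UO_credit": (-1, -1), "UO_money": (0, 2), "TU": (1, -1)}


def fit_contrast_model(groups: dict) -> np.ndarray:
    """Regress scores on the two contrast codes; returns [intercept, b1, b2]."""
    rows, y = [], []
    for label, scores in groups.items():
        c1, c2 = CODES[label]
        for score in scores:
            rows.append([1.0, c1, c2])  # intercept column plus both contrasts
            y.append(score)
    coefs, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
    return coefs


# Made-up scores with unequal group sizes, for illustration only.
groups = {
    "UO_credit": [1.0, 2.0],
    "UO_money": [0.0, 1.0, 2.0, 1.0],
    "TU": [3.0, 4.0, 5.0],
}
intercept = fit_contrast_model(groups)[0]
mean_of_means = np.mean([np.mean(scores) for scores in groups.values()])
print(np.isclose(intercept, mean_of_means))  # True
```

With unequal group sizes this value differs from the grand mean of all participants, which is why the intercept reflects the average effect across samples rather than the effect in the largest sample.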
Differences from Original Study
There are several differences between the original study and our replication.
• In the original study (but not noted in the original article), participants
were recruited from campus locations at Yale University or at the
University of Chicago’s Graduate School of Business’s Decision
Research Lab (two elite private American universities). The replication
study will recruit participants from UO and TU, respected but public
universities in the United States and the Netherlands. We do not anticipate that
these differences will affect the comparison between the original study
and the UO sample of the replication study. We are, however, uncertain
about the effects for the TU sample. The observations of one of the
authors of this replication attempt (MB) suggest that the Dutch may prefer
more direct (and thus potentially less optimistic) advice. Although it may
be out of date, a cross-country comparison found results consistent with
this observation (Michalos, 1988; data collected between 1978 and 1987).
In response to questions about whether the next year will be better,
the study found that while an average of 50% of people in the United
States thought the next year would be better, only 21% of people in
the Netherlands thought the same. There are obviously a number of
alternative explanations for this difference; however, it is a difference that
we feel has the potential to influence the results.
• Participants in the original study were compensated with a Snapple drink
(Yale sample) or with $3 (University of Chicago sample). We received a
small grant to help us mimic the original study’s participant compensation;
thus, the first 48 participants in the UO sample will be compensated $5 for
their participation. Participants in the TU sample and any participants
beyond the first 48 in the UO sample will likely complete the study for
course credit (but see footnote 3 for a slight caveat). Importantly, the
original authors noted in an e-mail to our team that the location of their
sample did not influence their results, and we do not anticipate that
receiving course credit instead of a tasty drink or money will influence the
results.
• We will not collect data on the between-subject experimental conditions
that do not test the primary result. Because these are between-subject
conditions, it is extremely unlikely that they will influence the
outcome of the replication.
In summary, although there are differences between the original and the replication
studies, we believe these differences either are trivial or will be directly tested.
References
Armor, D. A., Massey, C., & Sackett, A. M. (2008). Prescribed optimism: Is it right to be
wrong about the future? Psychological Science, 19, 329-331.
Michalos, A. C. (1988). Optimism in thirty countries over a decade. Social Indicators
Research, 20, 177-180.
Weinstein, N. D. (1980). Unrealistic optimism about future life events. Journal of
Personality and Social Psychology, 39, 806-820.
Figure 1
Power analysis for 80% Power.
Figure 2
Power analysis for 90% Power.
Figure 3
Power analysis for 95% Power.
Figure 4
Post-hoc power analysis for the primary result of Armor, Massey, & Sackett (2008).
Figure 5
Power analysis for 99% Power for the effect size of the lower limit of the 95% confidence
interval.