
An Investigation of the Prevalence
of Replication Research in Human Factors
Keith S. Jones, Paul L. Derby, and Elizabeth A. Schmidlin,
Texas Tech University, Lubbock, Texas
Objective: The present studies investigated the
nature of replication research within the human factors
literature.
Background: Many claim that researchers in certain
fields do not replicate prior research. This is troubling
because replications allow science to self-correct. A successful replication corroborates the original finding,
whereas an unsuccessful replication falsifies it. To date,
no one has assessed whether this issue affects the field
of human factors.
Method: In the first study, eight articles (parent
articles) were selected from the 1991 issues of the
journal Human Factors. Each article that had referenced
one of the eight parent articles between 1991 and September 2006 (child articles) was also retrieved. Two
investigators coded and compared each child article
against its 1991 parent article to determine whether
the child article replicated its parent article. The second
study replicated these procedures.
Results: Half or more of the parent articles in Study 1
and Study 2 (75% and 50%, respectively) were replicated
at least once. Furthermore, human factors researchers
conducted replications of their own work as well as the
work of others. However, many researchers did not
state that they replicated previous research.
Conclusion: Replications seem to be common in
the human factors literature. However, readers may
not realize that a study replicated prior research. Thus,
they may incorrectly assess the evidence concerning a
given finding.
Application: Human factors professionals should
be taught how to identify replications and to be cautious
of research that has not been replicated.
Keywords: replication research, evidence, science of
human factors, philosophy of science, falsification
Address correspondence to Keith S. Jones, Department
of Psychology, Texas Tech University, Lubbock, TX 79409-2051; [email protected].
HUMAN FACTORS
Vol. 52, No. 5, October 2010, pp. 586–595.
DOI: 10.1177/0018720810384394.
Copyright © 2010, Human Factors and Ergonomics Society.
INTRODUCTION
Science in its ideal form is self-correcting
(Peirce, 1935). Over time, ideas are corroborated
or falsified. To corroborate or falsify an idea,
researchers must replicate prior studies. Therefore, it is important to understand how often
replications are conducted. The current studies
sought to investigate the prevalence of replication
research within the human factors literature.
Replication Types
When a study is methodologically or conceptually similar to an earlier one, the later study
is considered a replication (Hendrick, 1990; Kelly,
2006; Lindsay & Ehrenberg, 1993; Rosenthal,
1993). According to Hendrick (1990) and Kelly
(2006), there are three ways in which a study can
be replicated: exactly, partially, or conceptually.
Exact replications investigate the same construct
and involve identical methods and measures.
Partial replications investigate the same construct
but with a combination of identical and modified
methods and measures. For example, a researcher
could partially replicate an original study by conducting an extension of the previous research
while retaining most of the original measures.
Conceptual replications investigate the same construct but do so with different methods and measures. For example, a researcher could conduct a
conceptual replication by measuring the same
effects with completely different measures. In each
case, the study published later is considered a replication of the study published earlier (Hendrick,
1990; Kelly, 2006; Lindsay & Ehrenberg, 1993;
Neuliep & Crandall, 1993a; Rosenthal, 1993;
Tsang & Kwan, 1999).
Replication as Self-Correction Mechanism
Scientists do not always agree on how replications allow science to self-correct. This disagreement stems from differing philosophies of science.
It would be impossible to adequately summarize
that debate within the scope of this article. Consequently, replications are henceforth discussed
primarily in terms of a single philosophy of science, that is, that of Karl Popper (Gattei, 2009; Miller, 1985; Popper, 1959), with occasional references to Imre Lakatos's (1978) philosophy of science. Popper's particular philosophy of science was adopted because it is widely known in the scientific community (McGrew, Alspector-Kelly, & Allhoff, 2009).
Popper (1959) argued that science should proceed via the process of elimination (Gattei, 2009;
Miller, 1985). Ideas should be put forth and tested.
Those that fail the test (i.e., are falsified) should
be rejected. Those that pass the test (i.e., are verified) should be provisionally retained pending
further testing. In this light, replications are a vital
part of science’s ability to self-correct. Specifically,
each replication affords another opportunity to
falsify the idea that is under scrutiny. If the replication is unsuccessful, then the idea is rejected.
Popper (1959) also argued that one can assess
how well existing empirical evidence corroborates
an idea that was provisionally retained. Corroboration depends on many factors (Gattei, 2009;
Miller, 1985). Two important factors for the present discussion are how many times the idea has
passed empirical test and the “riskiness” of those
tests. Corroboration increases each time the idea
passes an empirical test; however, the degree
of that corroboration depends on the riskiness
of the test. Tests that rely on novel or unexpected
predictions are considered more risky. In contrast,
tests that involve predictions that have been tested
before are considered less risky. According to
Popper, the former provide greater corroboration
than the latter.
In this light, each replication type offers a different degree of corroboration. First, an exact replication would involve the least risk, because
it would test the idea via the same methods and
measures that were employed by previous research.
Thus, if the exact replication did not falsify the
idea, then it would provide some corroboration
but not as much as other replication types. Second, a partial replication would involve a moderate level of risk, because it would test the idea
via a combination of new and previously used
methods and measures. As such, if the partial
replication did not falsify the idea, then it would
provide more corroboration than an exact replication. Third, a conceptual replication would
involve the highest level of risk, because it would
test the idea via methods and measures that have
not been used before. Consequently, if the conceptual replication did not falsify the idea, then
it would provide the most corroboration.
Factors That Discourage Replication
Despite the importance of replication, several
factors discourage researchers from doing so
(Lindsay & Ehrenberg, 1993; Smith, 1970). First,
many researchers believe that they lack the time,
participants, or funds necessary to reproduce
original findings (Smith, 1970). Second, replication is discouraged by certain institutional elements. For example, current methodological
training in many fields focuses on the design of
single rather than multiple experiments (Lindsay
& Ehrenberg, 1993). Furthermore, many journal
editors (Madden, Easley, & Dunn, 1995; Neuliep
& Crandall, 1990) and reviewers (Neuliep &
Crandall, 1993b) prefer to publish studies that
introduce original findings (Hubbard & Armstrong,
1994; Madden et al., 1995; Neuliep & Crandall,
1990, 1993b; Smith, 1970). This emphasis on the
production of original findings may discourage
researchers from pursuing replications (Lindsay
& Ehrenberg, 1993). Collectively, these factors
conspire to discourage researchers from replicating prior research.
Frequency of Replication
The preceding makes it clear that scientists
should replicate prior research, but they may not.
If they do not, then the scientific literature cannot self-correct, and the field will be negatively
affected. The only way to gauge the extent of that
effect is to assess how frequently replications are
being conducted in a given field.
Empirical evidence suggests that researchers
in certain fields are not replicating prior research.
For example, empirical studies of research in the
field of business suggest that replications are
uncommon (Evanschitzky, Baumgarth, Hubbard,
& Armstrong, 2007; Hubbard & Armstrong, 1994;
Hubbard & Vetter, 1996, 1997; Hubbard, Vetter,
& Little, 1998). Those studies found very few
replications (2% to 6%) despite the fact that they
examined the literature across long time frames
(19 to 30 years). Furthermore, empirical and exploratory studies of research in the social sciences (e.g.,
psychology) suggest that replications are also
uncommon in that domain (Bozarth & Roberts,
1972; Mahoney, 1985; Schmidt, 2009; Sterling,
1959; cf. Neuliep & Crandall, 1993a). For example, Sterling (1959) found no replications in four
volumes of psychology journals that spanned
a 2-year period. Similarly, Bozarth and Roberts
(1972) found only eight replications among the
1,046 empirical clinical and counseling psychology articles published between 1967 and 1970.
These data suggest that replications may be
uncommon in certain fields. Consequently, one
should not assume that replications are common
in one’s field without empirical support. Unfortunately, no empirical evidence exists for the field
of human factors. As a result, human factors scientists cannot assess whether our field benefits
from frequent replication or suffers from the lack
of replication. Without that information, it is difficult to assess whether ideas in our knowledge
base have high or low degrees of corroboration.
STUDY 1
The present study investigated the nature of
replication research within the human factors literature. To do so, we identified eight articles that
reported original findings (“parent” articles).
Then, we identified each article that referenced
those eight parent articles (“child” articles). Two
coders compared the characteristics of each parent
article with the characteristics of each of its child
articles. If the parent and child articles investigated
the same construct, then the child article was considered a replication of the parent article.
Method
This study’s methodology was atypical. For
example, this study did not include human participants. Rather, selected articles served as “participants.” Then, the first coder (the second author)
trained a second investigator (the third author) to
code various aspects of those articles. Last, both
investigators separately coded the articles and
compared their results.
Article selection. Two types of articles were
retrieved: articles that reported original findings
(i.e., parent articles) and articles that cited the
parent articles (i.e., child articles).
(1) We drew parent articles from a single journal, Human Factors, because it is considered one
of the most respected journals within the human
factors community (Dul & Karwowski, 2003).
Furthermore, it publishes a diverse set of research
topics.
To ensure that adequate time had passed for replications to accumulate, we
chose the 1991 issues of Human Factors. Between
1991 and 2006 (September), parent articles in
Human Factors were cited many times (M = 25.33,
SD = 24.50, range = 1-109). Accordingly, we
expected that this 15-year time frame would capture replications, if they existed.
We then used the ISI Web of Knowledge to
identify every article that was published in the
1991 issues of Human Factors. In addition, we
used the ISI Web of Knowledge to determine how
many later articles referenced each article published in the 1991 issues of Human Factors.
We chose the ISI Web of Knowledge because
it included more than 15,000 journals (Thomson
Reuters, 2008).
(2) Next, the population of parent articles
(N = 43) was divided into quartiles based on the
number of citations received between January of
1991 and September of 2006. The first quartile
contained parent articles that were cited 1 to 6 times
(n = 10), and the second quartile contained parent
articles that were cited 7 to 19 times (n = 11). The
third quartile contained parent articles that were
cited 20 to 36 times (n = 11), and the fourth quartile contained parent articles that were cited 37 or
more times (n = 11).
(3) Two articles were randomly selected from
each quartile. If an article that contained nonreplicable material (e.g., theoretical discussion, field
study) was selected, then another article was randomly selected from its respective quartile until
an article that contained replicable material was
drawn. This process was used to identify eight
parent articles.
(4) To locate child articles, we used the ISI
Web of Knowledge to conduct a “cited reference”
search for each parent article. These searches
identified every article (i.e., child article) that
made reference to the eight parent articles. For
example, a cited reference search would uncover
a 1998 child article that cited one of our 1991
parent articles. Nonempirical child articles were
omitted from the results because they could not
contain replications.
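To make steps (2) and (3) concrete, the following is a minimal Python sketch of the quartile split and random draw described above. It is an illustration rather than the tooling used in the study, and the citation counts are hypothetical placeholders (the actual counts ranged from 1 to 109).

```python
import random

import numpy as np

# Hypothetical citation counts for the 43 parent articles (placeholders).
rng = random.Random(0)
citations = {f"article_{i:02d}": rng.randint(1, 109) for i in range(43)}

# Step (2): divide the population into quartiles by citation count.
cutoffs = np.percentile(sorted(citations.values()), [25, 50, 75])

def quartile(count: int) -> int:
    """Return the quartile (1-4) into which a citation count falls."""
    return 1 + int(sum(count > c for c in cutoffs))

by_quartile = {q: [a for a, n in citations.items() if quartile(n) == q]
               for q in (1, 2, 3, 4)}

# Step (3): randomly draw two articles per quartile. (In the actual study,
# articles with nonreplicable material were replaced by a new random draw.)
parents = {q: rng.sample(articles, 2) for q, articles in by_quartile.items()}
print(parents)
```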
Coder training. (5.1) It was imperative that
the two coders be well trained. Consequently,
they were provided with background material
related to replication. The first coder trained the
second coder about how to recognize a replication and to categorize the replication as being
exact, partial, or conceptual. This sort of practice
is commonplace within studies that evaluate literature (Stock, 1994).
(5.2) When the background training was complete, the second coder practiced identifying the
different replication types. To do this, the first
coder selected four parent articles and four child
articles. Three of those child articles represented
either an exact, partial, or conceptual replication,
and the remaining child article represented a
nonreplication.
(5.3) To help conceptualize the difference
between a replication and a nonreplication, the
first coder highlighted necessary variables within
the text of the eight practice articles (four parent
and four child articles). The highlighted items
included the independent variables, dependent
variables, and related statistics. Also, the first
coder reported these variables on a coding sheet.
(5.5) Then, the second coder read the first child
article and used a separate coding sheet in the following ways. First, the second coder reported the
article number, the article authors, and the year of
publication. Next, the second coder reported the
sample size, independent variables, and dependent
variables. Afterward, the second coder reported
the results of the child article as being “significant
and in the same direction,” “significant but not
in the same direction,” “not significant and in the
same direction,” “not significant and not in the
same direction,” or “not significant and direction
not specified.” The second coder then reported
whether any of the previous information was the
same as that in the parent article.
(5.6) If the information on both coding sheets
was similar, then the second coder went to the
next section of their coding sheet. Here, the second coder reported the type of replication (i.e.,
conceptual, partial, exact) used in the child article. A justification was required for this report,
which encouraged the coder to make an informed
decision.
(5.7) Next, the second coder reported any
available results of the child article, for example,
the p value or effect size. On a 4-point Likert-type
scale, the second coder reported the overall similarity of the first child article to the parent article
(i.e., very dissimilar to very similar). Using an
even-numbered rating scale forced the investigator to judge the articles as being similar or not.
The second coder then listed similarities and dissimilarities between the parent and child articles.
Last, the second coder repeated the preceding
process for the remaining child articles.
Coding. (6) The general coding process was
identical to that used during training. Child articles associated with a particular parent article
were coded together and separately from the child
articles that were associated with other parent
articles. Coders did not continue to a different set
of articles until every article in the current group
had been coded. This avoided comparing a child
article with a noncorresponding parent article.
Results
Interrater reliability. It was important to establish that there was a high degree of agreement
between our two coders. Therefore, the interrater
reliability was assessed by calculating Cohen’s
kappa for categorical data (Cohen, 1988; Stock,
1994). A value of .70 was the criterion in this
study (Fleiss, 1981; Gardner, 1995; Landis &
Koch, 1977).
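As an illustration of the statistic involved, the following Python sketch computes Cohen's kappa from two coders' categorical judgments. The codes shown are hypothetical, and the function is a generic textbook implementation rather than the procedure used in the study.

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa: chance-corrected agreement for two raters' codes."""
    n = len(codes_a)
    p_observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    p_chance = sum(freq_a[c] / n * freq_b[c] / n
                   for c in set(codes_a) | set(codes_b))
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes for five child articles (replication type or "none").
coder1 = ["partial", "conceptual", "none", "conceptual", "none"]
coder2 = ["partial", "conceptual", "none", "partial", "none"]
print(cohens_kappa(coder1, coder2))  # compare the result against the .70 criterion
```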
The obtained interrater reliability (κ = .88)
exceeded our criterion. Because of the high level
of rater agreement, subsequent analyses were
performed on the data generated by the first
coder.
Were human factors studies replicated? Of
the eight parent articles, six (75%) were replicated at least once. This finding suggests that
researchers have attempted to falsify or corroborate human factors findings, which should
allow for self-correction within the field. However, one should not lose sight of the fact that
researchers had not attempted to falsify or corroborate all of the findings. As such, human
factors researchers should view any finding
with skepticism until they can demonstrate that
someone has attempted to falsify it.
Figure 1. Number of replications per parent articles A through H (N = 30) in Study 1. [Bar chart omitted; y-axis: Number of Replications (0-18), x-axis: Parent Articles A-H, legend: Conceptual vs. Partial.]
How frequently were studies replicated? As seen
in Figure 1, parent articles were replicated 0 to
10 times (M = 3.75, SE = 1.41). More important,
four of the eight parent articles were replicated
at least four times. It is encouraging that many
studies have been replicated several times. This
finding suggests two interpretations. First, the
human factors community fosters “progressive
research programmes” (Lakatos, 1978). That is,
human factors researchers actively continue to
generate empirical findings, which contribute to
the cumulative nature of theory and paradigm
development (Lakatos, 1978). Second, many
human factors findings have been thoroughly scrutinized. If those replications were consistently successful, then they would be highly corroborated.
Were the replications successful? Traditionally,
one of two things must happen for a replication to
be successful. Specifically, a successful replication
will have statistically significant results in the
same direction as in the original study (Kelly,
2006; Rosenthal, 1993). Alternatively, a replication may be considered successful when neither study produced statistically significant findings (Rosenthal, 1993).
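The decision rule just described can be summarized in a short sketch (our labels, not the coding instrument used in the studies):

```python
def replication_successful(parent_significant, child_significant, same_direction):
    """Classify a replication attempt under the rule described above."""
    if parent_significant and child_significant:
        return same_direction          # significant and in the same direction
    if not parent_significant and not child_significant:
        return True                    # neither study was significant
    return False                       # one significant, the other not

print(replication_successful(True, True, True))   # successful
print(replication_successful(True, False, True))  # unsuccessful
```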
Of the 30 replications, 28 (93%) were successful, leaving 2 replications (7%) that were not. Of
the 2 unsuccessful replications, 1 contained results
that failed to reject the null hypothesis. In the
other, the null hypothesis was rejected in the opposite direction of the original finding.
There are at least three potential explanations
for finding so few unsuccessful replications. First,
the results in the original studies could be correct.
Second, our sample may have missed unsuccessful
replications. That is, despite its broad coverage,
the ISI Web of Knowledge does not cover the
entire literature. So it is possible that unsuccessful
replications exist in journals that are not indexed
by the ISI Web of Knowledge. Third, other unsuccessful replications may exist but have not been
published. Rosenthal (1979) refers to this phenomenon as the “file drawer problem,” which
stems from a bias toward publishing successful
research (i.e., significant results). Because of this
bias, unsuccessful results are not always reported,
which presents the possibility that other unsuccessful replications could be stored away in someone’s file drawer. It would be beneficial to the
human factors community if there were a means
of reporting null results. However, because there
is no institutionalized means of reporting null
results, it is difficult to determine how many results
go unpublished.
It is important to note that these unsuccessful
attempts at corroborating the parent article would
not necessarily lead to the falsification of the
overarching theories or paradigms. Other philosophies of science (e.g., Lakatos, 1978) would
suggest that some theories and paradigms would
remain active, or progressive, despite empirical
counter instances or anomalies. Oftentimes, these
counter instances can be explained by additional
hypotheses or treated as outliers. Nevertheless,
care should be taken when interpreting unsuccessful replications and other empirical counterevidence.
What types of replications were conducted?
None of the replications found in this study was
an exact replication (i.e., used identical methods
and measures). That is, when a parent article was
replicated, the replication was either partial (i.e.,
used identical and modified methods and measures) or conceptual (i.e., used different methods
and measures). Specifically, we found that there
were 4 (13%) partial replications and 26 (87%)
conceptual replications. A binomial test indicated
that partial (n = 4) and conceptual (n = 26) replications were not equally prevalent in our sample,
p < .001.
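For reference, this test can be reproduced from the reported counts with any modern statistics package; the following SciPy sketch is a present-day reproduction, not the original analysis code.

```python
from scipy.stats import binomtest  # SciPy >= 1.7

# Two-sided binomial test: are 4 partial vs. 26 conceptual replications
# consistent with an equal split (p = .5) among the 30 replications?
result = binomtest(k=4, n=30, p=0.5, alternative="two-sided")
print(result.pvalue)  # well below .001, matching the reported p < .001
```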
These findings are consistent with similar studies conducted in other disciplines (Hubbard &
Vetter, 1996, 1997; Hubbard, Vetter, & Little,
1998; Kelly, 2006; Neuliep & Crandall, 1993a).
Specifically, Hubbard and Vetter (1996, 1997), Hubbard et al. (1998), Kelly (2006), and Neuliep and Crandall
(1993a) did not find exact replications within the
business, behavioral ecological, or social psychological literatures, respectively. These results
could be explained in one of the following ways.
First, researchers may choose to conduct partial or conceptual replications because they are
riskier than exact replications. Consequently, if
those replications are successful, then partial or
conceptual replications corroborate findings to
a greater degree than do exact replications. This
benefit may encourage researchers to conduct
partial or conceptual replications (Neuliep &
Crandall, 1993a). Further research would be
needed to confirm this possibility.
Second, the interdisciplinary culture of the
human factors community may influence researchers to generalize their results to other domains.
In other words, a human factors researcher in the
medical domain may want to replicate the principles or theories found within aviation research.
This would, most likely, result in a conceptual
replication, which was the most prevalent replication found.
Third, researchers may avoid conducting exact
replications altogether for various reasons. For
example, there is a positive bias toward publishing
new findings, especially within the social sciences (Lindsay & Ehrenberg, 1993). Also, negative biases toward publishing replications have been reported among journal editors and reviewers (Neuliep & Crandall, 1990, 1993b).
Given these biases, researchers may be conducting conceptual or partial replications without
knowing that they are conducting a replication
(Schmidt, 2009). Rather, they know that they are
not conducting an exact replication (i.e., avoiding
any potential negative bias) while still producing
new findings, such as those associated with conducting conceptual or partial replications.
To examine this possibility, we identified the
number of replications that included any variation
of the word replication (e.g., replicate, replicated).
Our investigation indicated that 6 of the 30 replications (20%) included the words replication or
replicated within the text of the article. The
remaining 24 replications (80%) did not include
any language about replication. This finding suggests that authors may not have known that their
research was a replication or may not have considered their research to be a replication.
Is the prevalence of replications correlated
with the prevalence of citations? Results indicated
that the number of replications for a given parent
article was significantly correlated with the number of child articles that were associated with that
parent article (r = .76, p < .03). In general, as the
number of child articles increased, so did the number of replications.
Who conducted the replications? Our results
indicate that the original author(s) or coauthor(s)
of the parent articles conducted 11 of the 30 total
replications (37%). Other authors conducted
19 of the 30 total replications (63%). A Pearson’s
chi-square test determined that replications were
fairly equally distributed between original authors
and other authors, χ² = 2.13, p = .144.
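For reference, this one-way chi-square test can be reproduced from the reported counts; the SciPy sketch below is a present-day reproduction, not the original analysis code.

```python
from scipy.stats import chisquare

# One-way chi-square: 11 replications by original authors vs. 19 by other
# authors, against a uniform expectation of 15 each.
stat, p = chisquare(f_obs=[11, 19], f_exp=[15, 15])
print(stat, p)  # ~2.13 and ~.14, matching the reported values
```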
These results suggest two implications. First,
authors seem to be replicating the work of others.
According to Rosenthal (1993), replications conducted by other authors are better than replications conducted by the original authors, because they offer a less biased approach to the replication
process. Second, researchers also seem to be taking
the responsibility of replicating their own research.
Even though this is a positive finding, this route
is not as valuable as the first. Rosenthal suggested
that replications conducted by the original authors
or coauthors may bias the replications. Factors
such as similar training and related research interests among researchers may influence the results
of a replication (Rosenthal, 1993). Nonetheless,
replications conducted by original author(s) and
coauthor(s) continue to contribute to the cumulative process of human factors as a science.
Did different types of authors publish different
types of replications? As seen in Table 1, 3 partial
replications (10%) and 8 conceptual replications
(27%) out of the 30 total replications were conducted by the author(s) of the parent article or
someone who had coauthored a paper with the
author(s) of the parent article, and 1 partial replication (3%) and 18 conceptual replications
(60%) out of the 30 total replications were conducted by someone other than the author(s) of
the parent article or someone who had coauthored a paper with the author(s) of the parent
article. According to a two-tailed Fisher exact
probability test, the type of replication was independent of who authored it, p = .13. Therefore,
author(s) or coauthor(s) and other authors conducted a fairly equal proportion of partial and conceptual replications.

TABLE 1: Number of Partial and Conceptual Replications Authored by Same and Different Author(s) in Study 1

                          Partial                                  Conceptual
Parent Article    Same Author(s)  Different Author(s)    Same Author(s)  Different Author(s)     n
A                       1                 0                    5                 4              10
B                       1                 1                    2                 5               9
C                       1                 0                    1                 3               5
D                       0                 0                    0                 4               4
E                       0                 0                    0                 1               1
F                       0                 0                    0                 1               1
G                       0                 0                    0                 0               0
H                       0                 0                    0                 0               0
n                       3                 1                    8                18              30
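For reference, the two-tailed Fisher exact test reported above can be reproduced from the Table 1 totals; the following SciPy sketch is a present-day reproduction, not the original analysis code.

```python
from scipy.stats import fisher_exact

# 2 x 2 table from Table 1: rows = replication type (partial, conceptual),
# columns = author type (same, different).
table = [[3, 1],
         [8, 18]]
odds_ratio, p = fisher_exact(table, alternative="two-sided")
print(p)  # ~.13, matching the reported value
```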
STUDY 2
Given that Study 1 was the first of its kind,
we replicated it exactly in Study 2. That is, Study 2
employed the same methods as in Study 1. The
main goal of Study 2 was to determine whether
the results of Study 1 could be reproduced.
Method
Article selection. The same methods for parent
and child article selection were employed in
Study 2. If a parent article that was chosen for
Study 1 was randomly selected for Study 2, then
it was omitted and another parent article was
randomly selected.
Coding. The methods for coding were similar
to those employed in Study 1. There were two
exceptions. First, coders did not undergo training because Study 1 and 2 employed the same
coders. Second, coders inputted information
into separate Microsoft Access databases rather
than recording information on individual coding
sheets.
Results
Interrater reliability. As in Study 1, the interrater reliability was assessed by calculating
Cohen’s kappa for categorical data (Cohen, 1988;
Stock, 1994) with a value of .70 as the criterion
for reliability (Fleiss, 1981; Gardner, 1995; Landis
& Koch, 1977). The obtained interrater reliability
(κ = .77) in Study 2 exceeded the criterion.
Therefore, subsequent analyses were performed
on the data generated by the first coder.
Were human factors studies replicated? Of the
eight parent articles, four (50%) were replicated
at least once. Consistent with Study 1, this finding
suggests that researchers have attempted to falsify many human factors findings. However, this
sample revealed fewer replications (50%) than
did Study 1 (75%). Consequently, the proportion
of replicated studies in the human factors literature
may be smaller than we first thought.
How frequently were studies replicated? As
seen in Figure 2, parent articles were replicated
0 to 16 times (M = 4.0, SE = 2.04). Furthermore,
three of the eight parent articles were replicated
more than four times. These findings are consistent with the findings of Study 1.
Were the replications successful? In Study 2,
30 of the 32 replications (94%) were successful,
whereas 2 of the replications (6%) were not successful. These results are consistent with those
of Study 1.
What types of replications were conducted? As
found in Study 1, none of the replications in
Study 2 was an exact replication. Study 2 uncovered 2 (6%) partial replications and 30 (94%)
conceptual replications. A binomial test indicated
that partial and conceptual replications were not
equally prevalent in Study 2, p < .001. That is,
the proportion of conceptual replications was
greater than the proportion of partial replications.
This finding was similar to the results of Study 1.
Of those replications, 2 (6%) contained any
variation of the word replication, whereas the
other 30 replications (94%) did not. This finding was also consistent with Study 1.

Figure 2. Number of replications per parent articles I through P (N = 32) in Study 2. [Bar chart omitted; y-axis: Number of Replications (0-18), x-axis: Parent Articles I-P, legend: Conceptual vs. Partial.]
Is the prevalence of replications correlated
with the prevalence of citations? Results indicated
that the number of replications for a given parent
article was significantly correlated with the number of child articles associated with that parent
article (r = .95, p < .001). In both studies, the
number of replications increased as the number
of child articles increased.
Who conducted the replications? Original
authors or coauthors conducted 6 (19%) of the
32 replications. Other authors conducted 26 (81%)
of the 32 replications. A Pearson’s chi-square test
determined that replications were not equally distributed among author types, χ² = 12.5, p < .001.
This result is not consistent with Study 1. However, both studies found that a substantial majority
of the replications were conducted by authors not
originally associated with the parent article. If we
combine the data across Study 1 and 2, original
authors conducted 17 (27%) of the replications,
whereas other authors conducted 45 (73%) of the
replications, χ² = 12.6, p < .001.
Did different types of authors publish different
types of replications? As seen in Table 2, 0 partial
(0%) and 6 conceptual (19%) replications were
published by the original author(s) and coauthor(s),
and 2 partial (6%) and 24 conceptual (75%) replications were published by someone other than
the original author(s) or coauthor(s) of the parent
article. A two-tailed Fisher exact probability test
indicated that the type of replication was independent of who authored it, p = 1.0. This finding
is consistent with the results of Study 1.
CONCLUSION
Science in its ideal form is self-correcting
(Peirce, 1935). Over time, ideas are corroborated or falsified. To corroborate or falsify an
idea, researchers must replicate prior studies.
Researchers do not, however, always do so
(Bozarth & Roberts, 1972; Evanschitzky et al.,
2007; Hendrick, 1990; Hubbard et al., 1998;
Hubbard & Armstrong, 1994; Hubbard & Vetter,
1996, 1997; Kelly, 2006; Lindsay & Ehrenberg,
1993; Smith, 1970; Sterling, 1959). Consequently,
one should not assume that replications are common in one’s field without empirical support.
The present studies were the first to investigate the nature of replication research within the
human factors literature. Collectively, the results
of Study 1 and 2 were as follows. Half or more
of the sampled human factors studies were replicated. If a study was replicated, then it was
generally replicated several times. Those replications were typically successful, despite the fact
that they were not exact replications. Rather, they
were partial or, more commonly, conceptual replications. It should be noted, though, that authors
rarely referred to their research as a replication.
Replications were conducted by the authors or
coauthors of the original study as well as by
researchers who were not associated with the
original publication.
Consequently, our results suggest that half or
more of the findings in our sample have survived
at least one falsification or corroboration attempt,
and many studies have survived repeated attempts.
Furthermore, our results suggest that those falsification attempts have typically been of the riskiest variety, that is, conceptual replications. As
such, it appears that many human factors findings
have survived thorough scrutiny. In other words,
our results suggest that many, but not all, of the
findings in the human factors literature have a
high degree of corroboration.
Two issues, however, must temper this conclusion. First, the limitations of the present studies
must be considered. Specifically, it is unclear
whether our results would transfer across journals
or periods. Study 2 successfully replicated Study 1’s
findings. However, Study 2 sampled parent articles from the same year and journal. Future
research should further examine the nature of
replication research across other periods and
within other human factors journals.

TABLE 2: Number of Partial and Conceptual Replications Authored by Same and Different Author(s) in Study 2

                          Partial                                  Conceptual
Parent Article    Same Author(s)  Different Author(s)    Same Author(s)  Different Author(s)     n
I                       0                 2                    1                13              16
J                       0                 0                    3                 5               8
K                       0                 0                    1                 5               6
L                       0                 0                    1                 1               2
M                       0                 0                    0                 0               0
N                       0                 0                    0                 0               0
O                       0                 0                    0                 0               0
P                       0                 0                    0                 0               0
n                       0                 2                    6                24              32

Second, the
possibility of publication bias must be taken into
account. As noted earlier, the “file drawer problem” stems from a bias toward publishing successful research (Rosenthal, 1979). Because of this
bias, unsuccessful results are not always reported.
Consequently, it is possible that Studies 1 and 2
did not uncover many unsuccessful replications
because they have been stored away in someone’s
file drawer. If true, then the results of Studies 1
and 2 overestimate the degree of corroboration
associated with the findings that were examined
as part of the present research.
One of the more interesting outcomes was that
authors of replications rarely identified their
research as replications. As noted earlier, authors
may not have known that their research was a replication. That is, it may simply be that the authors
did not consider their research to be a replication.
However, not identifying one’s work as a replication undermines the cumulative nature of the science of human factors. In other words, if
researchers do not identify their research as replications, then it may be difficult for readers to
recognize that those articles contain replications,
especially in cases in which conceptual replications are not methodologically similar to their
respective parent article. In that case, it would be
difficult for readers to accurately assess the degree
of corroboration associated with the finding(s) of
the research. Therefore, not making replications
recognizable may negate the purposes of replicating. To remedy this issue, we recommend
that human factors professionals be taught how to
identify replications and that they should identify
replications as such when publishing their research.
KEY POINTS
• Replications exist within the human factors literature.
• Exact replications are rare, whereas conceptual and partial replications are more prevalent.
• Replications are not always labeled as replications, so it can be difficult to identify them.
REFERENCES
Bozarth, J. D., & Roberts, R. R. (1972). Signifying significant
significance. American Psychologist, 27, 774–775.
Cohen, J. (1988). Statistical power analysis for the behavioral
sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Dul, J., & Karwowski, W. (2003). An assessment system for rating scientific journals in the field of ergonomics and human
factors. Rotterdam, Netherlands: Erasmus Research Institute of
Management.
Evanschitzky, H., Baumgarth, C., Hubbard, R., & Armstrong, J. S.
(2007). Replication research’s disturbing trend. Journal of
Business Research, 60, 411–415.
Fleiss, J. L. (1981). Statistical methods for rates and proportions
(2nd ed.). New York, NY: Wiley.
Gardner, W. (1995). On the reliability of sequential data: Measurement, meaning, and correction. In J. M. Gottman (Ed.), The
analysis of change (pp. 339–359). Mahwah, NJ: Erlbaum.
Gattei, S. (2009). Karl Popper’s philosophy of science. New York,
NY: Routledge.
Hendrick, C. (1990). Replications, strict replications, and conceptual replications: Are they important? In J. W. Neuliep (Ed.),
Handbook of replication research in the behavioral and social
sciences (pp. 41–49). Newbury Park, CA: Sage.
Hubbard, R., & Armstrong, J. S. (1994). Replications and extension
in marketing: Rarely published but quite contrary. International
Journal of Research in Marketing, 11, 223–248.
Hubbard, R., & Vetter, D. E. (1996). An empirical comparison of
published replication research in accounting, economics,
finance, management, and marketing. Journal of Business
Research, 35, 153–164.
Hubbard, R., & Vetter, D. E. (1997). Journal prestige and the publication frequency of replication research in the finance literature.
Quarterly Journal of Business and Economics, 36, 3–14.
Hubbard, R., Vetter, D. E., & Little, E. L. (1998). Replication in strategic management: Scientific testing for validity, generalizability,
and usefulness. Strategic Management Journal, 19, 243–254.
Kelly, C. D. (2006). Replicating empirical research in behavioral
ecology: How and why it should be done but rarely ever is.
Quarterly Review of Biology, 81, 221–236.
Lakatos, I. (1978). The methodology of scientific research programmes: Philosophical papers (Vol. 1). Cambridge, England:
Cambridge University Press.
Landis, J., & Koch, G. G. (1977). The measurement of observer
agreement for categorical data. Biometrics, 33, 159–174.
Lindsay, R. M., & Ehrenberg, A. S. (1993). The design of replicated
studies. American Statistician, 47, 217–228.
Madden, C. S., Easley, R. W., & Dunn, M. G. (1995). How journal editors view replication research. Journal of Advertising, 24, 77–87.
Mahoney, M. J. (1985). Open exchange and epistemic process.
American Psychologist, 40, 29–39.
McGrew, T., Alspector-Kelly, M., & Allhoff, F. (Eds.). (2009).
Philosophy of science: A historical anthology. West Sussex,
England: Wiley-Blackwell.
Miller, D. (1985). Popper selections. Princeton, NJ: Princeton
University Press.
Neuliep, J. W., & Crandall, R. (1990). Editorial bias against replication
research. Journal of Social Behavior and Personality, 5, 85–90.
Neuliep, J. W., & Crandall, R. (1993a). Everyone was wrong: There
are lots of replications out there. Journal of Social Behavior and
Personality, 8, 1–8.
Neuliep, J. W., & Crandall, R. (1993b). Reviewer bias against replication research. Journal of Social Behavior and Personality, 8,
21–29.
Peirce, C. S. (1935). Collected papers of Charles Sanders Peirce
(C. Hartshorne & P. Weiss, Eds., Vol. 6). Cambridge, MA: Harvard
University Press.
Popper, K. R. (1959). The logic of scientific discovery. London, UK:
Hutchinson.
Rosenthal, R. (1979). The file drawer problem and tolerance for null
results. Psychological Bulletin, 86, 638–641.
Rosenthal, R. (1993). Cumulating evidence. In G. Keren & C. Lewis
(Eds.), A handbook for data analysis in the behavioral sciences:
Methodological issues (pp. 519–559). Hillsdale, NJ: Erlbaum.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of
General Psychology, 13, 90–100.
Smith, N. C. (1970). Replication studies: A neglected aspect of
psychological research. American Psychologist, 25, 970–975.
Sterling, T. D. (1959). Publication decisions and their possible effects
on inferences drawn from tests of significance—or vice versa.
Journal of the American Statistical Association, 54, 30–34.
Stock, W. (1994). Systematic coding for research synthesis. In
H. Cooper & L. V. Hedges (Eds.), The handbook of research
synthesis (pp. 125–138). New York, NY: Russell Sage Foundation.
Thomson Reuters. (2008). Thomson Reuters master journal list.
Retrieved from http://science.thomsonreuters.com/mjl/
Tsang, E. W., & Kwan, K.-M. (1999). Replication and theory development in organizational science: A critical realist perspective.
Academy of Management Review, 24, 759–780.
Keith S. Jones is an associate professor of psychology
at Texas Tech University. He received his PhD in
experimental psychology from the Human Factors
Program at the University of Cincinnati in 2000.
Paul L. Derby is a doctoral candidate in Texas Tech
University’s Psychology Department. He received his
MA in experimental psychology from Texas Tech
University in 2009.
Elizabeth A. Schmidlin is a doctoral student in Texas
Tech University’s Psychology Department. She
received her MA in experimental psychology from
Texas Tech University in 2009.
Date received: November 24, 2008
Date accepted: August 3, 2010