An Investigation of the Prevalence of Replication Research in Human Factors

Keith S. Jones, Paul L. Derby, and Elizabeth A. Schmidlin, Texas Tech University, Lubbock, Texas

Objective: The present studies investigated the nature of replication research within the human factors literature. Background: Many claim that researchers in certain fields do not replicate prior research. This is troubling because replications allow science to self-correct: a successful replication corroborates the original finding, whereas an unsuccessful replication falsifies it. To date, no one has assessed whether this issue affects the field of human factors. Method: In the first study, eight articles (parent articles) were selected from the 1991 issues of the journal Human Factors. All articles that had referenced one of the eight parent articles between 1991 and September 2006 (child articles) were also retrieved. Two investigators coded and compared each child article against its 1991 parent article to determine whether the child article replicated its parent article. The second study replicated these procedures. Results: Half or more of the parent articles in Study 1 and Study 2 (75% and 50%, respectively) were replicated at least once. Furthermore, human factors researchers conducted replications of their own work as well as the work of others. However, many researchers did not state that they had replicated previous research. Conclusion: Replications seem to be common in the human factors literature. However, readers may not realize that a study replicated prior research and thus may incorrectly assess the evidence concerning a given finding. Application: Human factors professionals should be taught how to identify replications and to be cautious of research that has not been replicated.

Keywords: replication research, evidence, science of human factors, philosophy of science, falsification

Address correspondence to Keith S. Jones, Department of Psychology, Texas Tech University, Lubbock, TX 79409-2051; [email protected]. HUMAN FACTORS, Vol. 52, No. 5, October 2010, pp. 586–595. DOI: 10.1177/0018720810384394. Copyright © 2010, Human Factors and Ergonomics Society.

INTRODUCTION

Science in its ideal form is self-correcting (Peirce, 1935). Over time, ideas are corroborated or falsified. To corroborate or falsify an idea, researchers must replicate prior studies. Therefore, it is important to understand how often replications are conducted. The current studies investigated the prevalence of replication research within the human factors literature.

Replication Types

When a study is methodologically or conceptually similar to an earlier one, the later study is considered a replication (Hendrick, 1990; Kelly, 2006; Lindsay & Ehrenberg, 1993; Rosenthal, 1993). According to Hendrick (1990) and Kelly (2006), there are three ways in which a study can be replicated: exactly, partially, or conceptually. Exact replications investigate the same construct and involve identical methods and measures. Partial replications investigate the same construct but with a combination of identical and modified methods and measures. For example, a researcher could partially replicate an original study by conducting an extension of the previous research while retaining most of the original measures. Conceptual replications investigate the same construct but do so with different methods and measures.
For example, a researcher could conduct a conceptual replication by measuring the same effects with completely different measures. In each case, the study published later is considered a replication of the study published earlier (Hendrick, 1990; Kelly, 2006; Lindsay & Ehrenberg, 1993; Neuliep & Crandall, 1993a; Rosenthal, 1993; Tsang & Kwan, 1999).

Replication as Self-Correction Mechanism

Scientists do not always agree on how replications allow science to self-correct. This disagreement stems from differing philosophies of science. It would be impossible to adequately summarize that debate within the scope of this article. Consequently, replications are henceforth discussed primarily in terms of a single philosophy of science, that of Karl Popper (Gattei, 2009; Miller, 1985; Popper, 1959), with occasional references to Imre Lakatos's (1978) philosophy of science. Popper's philosophy of science was adopted because it is widely known in the scientific community (McGrew, Alspector-Kelly, & Allhoff, 2009).

Popper (1959) argued that science should proceed via the process of elimination (Gattei, 2009; Miller, 1985). Ideas should be put forth and tested. Those that fail the test (i.e., are falsified) should be rejected. Those that pass the test (i.e., are verified) should be provisionally retained pending further testing. In this light, replications are a vital part of science's ability to self-correct. Specifically, each replication affords another opportunity to falsify the idea under scrutiny. If the replication is unsuccessful, then the idea is rejected.

Popper (1959) also argued that one can assess how well existing empirical evidence corroborates an idea that was provisionally retained. Corroboration depends on many factors (Gattei, 2009; Miller, 1985). Two factors are important for the present discussion: how many times the idea has passed empirical test and the "riskiness" of those tests. Corroboration increases each time the idea passes an empirical test; however, the degree of that corroboration depends on the riskiness of the test. Tests that rely on novel or unexpected predictions are considered more risky, whereas tests that involve predictions that have been tested before are considered less risky. According to Popper, the former provide greater corroboration than the latter.

In this light, each replication type offers a different degree of corroboration. First, an exact replication involves the least risk because it tests the idea via the same methods and measures that were employed by previous research. Thus, if an exact replication did not falsify the idea, it would provide some corroboration but not as much as the other replication types. Second, a partial replication involves a moderate level of risk because it tests the idea via a combination of new and previously used methods and measures. As such, if a partial replication did not falsify the idea, it would provide more corroboration than an exact replication. Third, a conceptual replication involves the highest level of risk because it tests the idea via methods and measures that have not been used before. Consequently, if a conceptual replication did not falsify the idea, it would provide the most corroboration.
Factors That Discourage Replication

Despite the importance of replication, several factors discourage researchers from replicating prior work (Lindsay & Ehrenberg, 1993; Smith, 1970). First, many researchers believe that they lack the time, participants, or funds necessary to reproduce original findings (Smith, 1970). Second, replication is discouraged by certain institutional elements. For example, current methodological training in many fields focuses on the design of single rather than multiple experiments (Lindsay & Ehrenberg, 1993). Furthermore, many journal editors (Madden, Easley, & Dunn, 1995; Neuliep & Crandall, 1990) and reviewers (Neuliep & Crandall, 1993b) prefer to publish studies that introduce original findings (Hubbard & Armstrong, 1994; Madden et al., 1995; Neuliep & Crandall, 1990, 1993b; Smith, 1970). This emphasis on the production of original findings may discourage researchers from pursuing replications (Lindsay & Ehrenberg, 1993). Collectively, these factors conspire to discourage researchers from replicating prior research.

Frequency of Replication

The preceding makes it clear that scientists should replicate prior research, but they may not. If they do not, then the scientific literature cannot self-correct, and the field will be negatively affected. The only way to gauge the extent of that effect is to assess how frequently replications are being conducted in a given field.

Empirical evidence suggests that researchers in certain fields are not replicating prior research. For example, empirical studies of research in the field of business suggest that replications are uncommon (Evanschitzky, Baumgarth, Hubbard, & Armstrong, 2007; Hubbard & Armstrong, 1994; Hubbard & Vetter, 1996, 1997; Hubbard, Vetter, & Little, 1998). Those studies found very few replications (2% to 6%) despite the fact that they examined the literature across long time frames (19 to 30 years). Furthermore, empirical and exploratory studies of research in the social sciences (e.g., psychology) suggest that replications are also uncommon in that domain (Bozarth & Roberts, 1972; Mahoney, 1985; Schmidt, 2009; Sterling, 1959; cf. Neuliep & Crandall, 1993a). For example, Sterling (1959) found no replications in four volumes of psychology journals that spanned a 2-year period. Similarly, Bozarth and Roberts (1972) found only eight replications among the 1,046 empirical clinical and counseling psychology articles published between 1967 and 1970.

These data suggest that replications may be uncommon in certain fields. Consequently, one should not assume that replications are common in one's field without empirical support. Unfortunately, no such empirical evidence exists for the field of human factors. As a result, human factors scientists cannot assess whether our field benefits from frequent replication or suffers from a lack of replication. Without that information, it is difficult to assess whether ideas in our knowledge base have high or low degrees of corroboration.

STUDY 1

The present study investigated the nature of replication research within the human factors literature. To do so, we identified eight articles that reported original findings ("parent" articles). Then, we identified each article that referenced those eight parent articles ("child" articles). Two coders compared the characteristics of each parent article with the characteristics of each of its child articles.
If a parent article and a child article investigated the same construct, then the child article was considered a replication of the parent article.

Method

This study's methodology was atypical. For example, this study did not include human participants; rather, the selected articles served as "participants." The first coder (the second author) trained a second investigator (the third author) to code various aspects of those articles. Last, both investigators separately coded the articles and compared their results.

Article selection. Two types of articles were retrieved: articles that reported original findings (i.e., parent articles) and articles that cited the parent articles (i.e., child articles).

We drew parent articles from a single journal, Human Factors, because it is considered one of the most respected journals within the human factors community (Dul & Karwowski, 2003). Furthermore, it publishes a diverse set of research topics. To ensure the adequate passage of time, we chose the 1991 issues of Human Factors. Between 1991 and September 2006, parent articles in Human Factors were cited many times (M = 25.33, SD = 24.50, range = 1-109). Accordingly, we expected that this 15-year time frame would capture replications, if they existed. We then used the ISI Web of Knowledge to identify every article that was published in the 1991 issues of Human Factors and to determine how many later articles referenced each of those articles. We chose the ISI Web of Knowledge because it indexed more than 15,000 journals (Thomson Reuters, 2008).

Next, the population of parent articles (N = 43) was divided into quartiles based on the number of citations received between January 1991 and September 2006. The first quartile contained parent articles that were cited 1 to 6 times (n = 10), the second quartile contained parent articles that were cited 7 to 19 times (n = 11), the third quartile contained parent articles that were cited 20 to 36 times (n = 11), and the fourth quartile contained parent articles that were cited 37 or more times (n = 11).

Two articles were randomly selected from each quartile. If a selected article contained nonreplicable material (e.g., theoretical discussion, field study), then another article was randomly selected from its quartile until an article that contained replicable material was drawn. This process identified eight parent articles.

To locate child articles, we used the ISI Web of Knowledge to conduct a "cited reference" search for each parent article. These searches identified every article (i.e., child article) that made reference to the eight parent articles. For example, a cited reference search would uncover a 1998 child article that cited one of our 1991 parent articles. Nonempirical child articles were omitted from the results because they could not contain replications.
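To make the sampling procedure concrete, the following is a minimal sketch of the citation-quartile selection described above. The citation counts and article labels are invented for illustration; the actual counts came from the ISI Web of Knowledge, and the redraw step for nonreplicable articles is omitted.

```python
import random

random.seed(42)  # for a reproducible illustration

# Hypothetical citation counts for the 43 articles published in the 1991
# issues of Human Factors; the real counts came from the ISI Web of
# Knowledge (range = 1-109). The values below merely respect the reported
# quartile sizes (10, 11, 11, 11).
counts = [1, 2, 2, 3, 4, 5, 5, 6, 6, 6,                   # cited 1-6 times
          7, 8, 9, 10, 12, 13, 15, 16, 17, 18, 19,        # cited 7-19 times
          20, 21, 22, 24, 25, 27, 29, 30, 33, 35, 36,     # cited 20-36 times
          37, 40, 45, 50, 55, 60, 70, 80, 90, 100, 109]   # cited 37+ times
citations = {f"parent_{i:02d}": c for i, c in enumerate(counts, start=1)}

def quartile(count):
    """Assign an article to a citation quartile using the cut points
    reported above (1-6, 7-19, 20-36, 37 or more)."""
    if count <= 6:
        return 0
    if count <= 19:
        return 1
    if count <= 36:
        return 2
    return 3

groups = {q: [] for q in range(4)}
for article, count in citations.items():
    groups[quartile(count)].append(article)

# Randomly draw two parent articles per quartile, yielding eight in total.
sample = [a for q in range(4) for a in random.sample(groups[q], 2)]
print(sample)
```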
Coder training. It was imperative that the two coders be well trained. Consequently, they were provided with background material related to replication. The first coder trained the second coder to recognize a replication and to categorize it as exact, partial, or conceptual. This sort of practice is commonplace within studies that evaluate literature (Stock, 1994).

When the background training was complete, the second coder practiced identifying the different replication types. To do this, the first coder selected four parent articles and four child articles. Three of those child articles each represented an exact, partial, or conceptual replication, and the remaining child article represented a nonreplication.

To help conceptualize the difference between a replication and a nonreplication, the first coder highlighted the necessary variables within the text of the eight practice articles (four parent and four child articles). The highlighted items included the independent variables, dependent variables, and related statistics. The first coder also reported these variables on a coding sheet.

Then, the second coder read the first child article and used a separate coding sheet in the following ways. First, the second coder reported the article number, the article authors, and the year of publication. Next, the second coder reported the sample size, independent variables, and dependent variables. Afterward, the second coder reported the results of the child article as being "significant and in the same direction," "significant but not in the same direction," "not significant and in the same direction," "not significant and not in the same direction," or "not significant and direction not specified." The second coder then reported whether any of the previous information was the same as that in the parent article.

If the information on both coding sheets was similar, then the second coder went to the next section of the coding sheet and reported the type of replication (i.e., conceptual, partial, exact) used in the child article. A justification was required for this report, which encouraged the coder to make an informed decision.

Next, the second coder reported any available results of the child article, for example, the p value or effect size. On a 4-point Likert-type scale, the second coder rated the overall similarity of the first child article to the parent article (i.e., very dissimilar to very similar). Using an even-numbered rating scale forced the investigator to judge the articles as being similar or not. The second coder then listed similarities and dissimilarities between the parent and child articles. Last, the second coder repeated the preceding process for the remaining child articles.

Coding. The general coding process was identical to that used during training. Child articles associated with a particular parent article were coded together and separately from the child articles associated with other parent articles. Coders did not continue to a different set of articles until every article in the current group had been coded. This avoided comparing a child article with a noncorresponding parent article.

Results

Interrater reliability. It was important to establish a high degree of agreement between our two coders. Therefore, interrater reliability was assessed by calculating Cohen's kappa for categorical data (Cohen, 1988; Stock, 1994). A value of .70 was the criterion in this study (Fleiss, 1981; Gardner, 1995; Landis & Koch, 1977). The obtained interrater reliability (κ = .88) exceeded our criterion. Because of the high level of rater agreement, subsequent analyses were performed on the data generated by the first coder.
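For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen's kappa can be computed from two coders' category labels. The labels shown are invented for illustration; they are not the study's data.

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa: chance-corrected agreement between two raters
    assigning categorical labels to the same items."""
    assert len(coder1) == len(coder2)
    n = len(coder1)
    # Proportion of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    # Expected chance agreement, from each coder's marginal frequencies.
    m1, m2 = Counter(coder1), Counter(coder2)
    expected = sum(m1[c] * m2[c] for c in m1.keys() | m2.keys()) / n**2
    return (observed - expected) / (1 - expected)

# Invented labels: each entry is one child article's coded replication type.
coder1 = ["conceptual", "partial", "none", "conceptual", "conceptual", "none"]
coder2 = ["conceptual", "partial", "none", "partial", "conceptual", "none"]
print(round(cohens_kappa(coder1, coder2), 2))  # 0.75 for these toy labels
```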
Were human factors studies replicated? Of the eight parent articles, six (75%) were replicated at least once. This finding suggests that researchers have attempted to falsify or corroborate human factors findings, which should allow for self-correction within the field. However, one should not lose sight of the fact that researchers had not attempted to falsify or corroborate all of the findings. As such, human factors researchers should view any finding with skepticism until they can demonstrate that someone has attempted to falsify it.

How frequently were studies replicated? As seen in Figure 1, parent articles were replicated 0 to 10 times (M = 3.75, SE = 1.41). More important, four of the eight parent articles were replicated four or more times.

[Figure 1. Number of replications per parent articles A through H (N = 30) in Study 1; each bar is split into partial and conceptual replications.]

It is encouraging that many studies have been replicated several times. This finding suggests two interpretations. First, the human factors community fosters "progressive research programmes" (Lakatos, 1978). That is, human factors researchers actively continue to generate empirical findings, which contribute to the cumulative nature of theory and paradigm development (Lakatos, 1978). Second, many human factors findings have been thoroughly scrutinized. If those replications were consistently successful, then those findings would be highly corroborated.

Were the replications successful? Traditionally, one of two things must happen for a replication to be considered successful: either the replication produces statistically significant results in the same direction as the original study (Kelly, 2006; Rosenthal, 1993), or neither study produces statistically significant findings (Rosenthal, 1993). Of the 30 replications, 28 (93%) were successful, leaving 2 replications (7%) that were not. Of the 2 unsuccessful replications, 1 contained results that failed to reject the null hypothesis. In the other, the null hypothesis was rejected in the opposite direction of the original finding.
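The success rule just described reduces to a simple decision function. The sketch below encodes it; the function name and inputs are our own illustration, not part of the original coding scheme.

```python
def replication_successful(parent_significant, child_significant, same_direction):
    """Success rule described above: a replication succeeds if it is
    statistically significant in the same direction as the original
    study, or if neither study produced significant results."""
    if parent_significant and child_significant:
        return same_direction
    if not parent_significant and not child_significant:
        return True
    return False

# The two unsuccessful Study 1 replications, expressed in these terms:
print(replication_successful(True, False, False))  # failed to reject the null
print(replication_successful(True, True, False))   # significant, opposite direction
```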
There are at least three potential explanations for finding so few unsuccessful replications. First, the results in the original studies could be correct. Second, our sample may have missed unsuccessful replications. That is, despite its broad coverage, the ISI Web of Knowledge does not cover the entire literature, so unsuccessful replications may exist in journals that it does not index. Third, other unsuccessful replications may exist but may never have been published. Rosenthal (1979) refers to this phenomenon as the "file drawer problem," which stems from a bias toward publishing successful research (i.e., significant results). Because of this bias, unsuccessful results are not always reported, which raises the possibility that other unsuccessful replications are stored away in someone's file drawer. It would benefit the human factors community to have a means of reporting null results. However, because there is no institutionalized means of reporting null results, it is difficult to determine how many results go unpublished.

It is important to note that these unsuccessful attempts at corroborating the parent article would not necessarily lead to the falsification of the overarching theories or paradigms. Other philosophies of science (e.g., Lakatos, 1978) would suggest that some theories and paradigms remain active, or progressive, despite empirical counterinstances or anomalies. Oftentimes, these counterinstances can be explained by additional hypotheses or treated as outliers. Nevertheless, unsuccessful replications and other empirical counterevidence should be interpreted with care.

What types of replications were conducted? None of the replications found in this study was an exact replication (i.e., used identical methods and measures). That is, when a parent article was replicated, the replication was either partial (i.e., used identical and modified methods and measures) or conceptual (i.e., used different methods and measures). Specifically, we found 4 (13%) partial replications and 26 (87%) conceptual replications. A binomial test indicated that partial and conceptual replications were not equally prevalent in our sample, p < .001.

These findings are consistent with similar studies conducted in other disciplines. Specifically, Hubbard and colleagues (Hubbard & Vetter, 1996, 1997; Hubbard, Vetter, & Little, 1998), Kelly (2006), and Neuliep and Crandall (1993a) did not find exact replications within the business, behavioral ecology, and social psychology literatures, respectively.

These results could be explained in one of the following ways. First, researchers may choose to conduct partial or conceptual replications because they are riskier than exact replications. Consequently, if those replications are successful, they corroborate findings to a greater degree than exact replications would. This benefit may encourage researchers to conduct partial or conceptual replications (Neuliep & Crandall, 1993a). Further research would be needed to confirm this possibility. Second, the interdisciplinary culture of the human factors community may encourage researchers to generalize their results to other domains. In other words, a human factors researcher in the medical domain may want to replicate the principles or theories found within aviation research. This would most likely result in a conceptual replication, which was the most prevalent type found. Third, researchers may avoid conducting exact replications altogether for various reasons. For example, there is a positive bias toward publishing new findings, especially within the social sciences (Lindsay & Ehrenberg, 1993), and journal editors and reviewers have reported negative biases toward publishing replications (Neuliep & Crandall, 1990, 1993b). Given these biases, researchers may be conducting conceptual or partial replications without knowing that they are conducting a replication (Schmidt, 2009). Rather, they know only that they are not conducting an exact replication (i.e., avoiding any potential negative bias) while still producing new findings, such as those associated with conceptual or partial replications.

To examine this possibility, we identified the number of replications that included any variation of the word replication (e.g., replicate, replicated). Our investigation indicated that 6 of the 30 replications (20%) included such language within the text of the article. The remaining 24 replications (80%) did not. This finding suggests that authors may not have known that their research was a replication or may not have considered it to be one.
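A scan of this sort can be approximated with a simple pattern match over each article's text, as in the following sketch. The article names and text snippets are invented placeholders, not the study's materials.

```python
import re

# Matches "replication", "replicate", "replicated", "replicating", and
# other variants built on the stem "replicat-".
REPLICATION_PATTERN = re.compile(r"\breplicat\w*", re.IGNORECASE)

def mentions_replication(article_text):
    """Return True if the text contains any variation of 'replication'."""
    return bool(REPLICATION_PATTERN.search(article_text))

# Invented snippets standing in for the full text of child articles.
child_articles = {
    "child_01": "Experiment 1 replicated the pattern reported in the ...",
    "child_02": "We extend earlier findings on target acquisition by ...",
}
flagged = [name for name, text in child_articles.items()
           if mentions_replication(text)]
print(flagged)  # ['child_01']
```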
Is the prevalence of replications correlated with the prevalence of citations? Results indicated that the number of replications for a given parent article was significantly correlated with the number of child articles associated with that parent article (r = .76, p < .03). In general, as the number of child articles increased, so did the number of replications.

Who conducted the replications? The original author(s) or coauthor(s) of the parent articles conducted 11 of the 30 replications (37%); other authors conducted the remaining 19 (63%). A Pearson's chi-square test determined that replications were fairly equally distributed between original authors and other authors, χ2 = 2.13, p = .144. These results suggest two implications. First, authors seem to be replicating the work of others. According to Rosenthal (1993), replications conducted by other authors are better than replications conducted by the original authors because they offer a less biased approach to the replication process. Second, researchers also seem to be taking responsibility for replicating their own research. Although this is a positive finding, this route is not as valuable as the first: Rosenthal suggested that replications conducted by the original authors or coauthors may be biased, because factors such as similar training and related research interests may influence the results of a replication (Rosenthal, 1993). Nonetheless, replications conducted by original author(s) and coauthor(s) still contribute to the cumulative process of human factors as a science.

Did different types of authors publish different types of replications? As seen in Table 1, 3 partial replications (10%) and 8 conceptual replications (27%) of the 30 total were conducted by the author(s) of the parent article or by someone who had coauthored a paper with them, whereas 1 partial replication (3%) and 18 conceptual replications (60%) were conducted by other authors. According to a two-tailed Fisher exact probability test, the type of replication was independent of who authored it, p = .13. Therefore, original author(s) or coauthor(s) and other authors conducted fairly equal proportions of partial and conceptual replications.

TABLE 1: Number of Partial and Conceptual Replications Authored by Same and Different Author(s) in Study 1

Parent     Partial:   Partial:     Conceptual:  Conceptual:
Article    Same       Different    Same         Different       n
A          1          0            5             4             10
B          1          1            2             5              9
C          1          0            1             3              5
D          0          0            0             4              4
E          0          0            0             1              1
F          0          0            0             1              1
G          0          0            0             0              0
H          0          0            0             0              0
n          3          1            8            18             30
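The inferential tests reported for Study 1 are all standard and can be reproduced from the counts above, for example with SciPy, as sketched below. The per-article replication counts are taken from Table 1; the child-article counts passed to the correlation are invented stand-ins, not the study's raw data.

```python
from scipy import stats

# Counts reported above for the 30 Study 1 replications.
partial, conceptual = 4, 26
same_author, other_author = 11, 19
table1 = [[3, 8],   # original author(s)/coauthor(s): partial, conceptual
          [1, 18]]  # other authors: partial, conceptual

# Binomial test: are partial and conceptual replications equally prevalent?
print(stats.binomtest(partial, partial + conceptual, 0.5).pvalue)  # p < .001

# Chi-square test: are replications evenly split between author types?
print(stats.chisquare([same_author, other_author]))  # chi2 = 2.13, p = .144

# Fisher exact test: is replication type independent of author type?
print(stats.fisher_exact(table1))  # p = .13

# Correlation between child articles and replications per parent article.
# Replication counts are the row totals of Table 1; the child-article
# counts are invented for illustration (the paper reports r = .76).
child_articles = [40, 35, 20, 15, 9, 7, 3, 1]
replications = [10, 9, 5, 4, 1, 1, 0, 0]
print(stats.pearsonr(child_articles, replications))
```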
STUDY 2

Given that Study 1 was the first of its kind, we replicated it exactly in Study 2. That is, Study 2 employed the same methods as Study 1. The main goal of Study 2 was to determine whether the results of Study 1 could be reproduced.

Method

Article selection. The same methods for parent and child article selection were employed in Study 2. If a parent article that had been chosen for Study 1 was randomly selected for Study 2, then it was omitted and another parent article was randomly selected.

Coding. The methods for coding were similar to those employed in Study 1, with two exceptions. First, coders did not undergo training, because Studies 1 and 2 employed the same coders. Second, coders entered information into separate Microsoft Access databases rather than recording it on individual coding sheets.

Results

Interrater reliability. As in Study 1, interrater reliability was assessed by calculating Cohen's kappa for categorical data (Cohen, 1988; Stock, 1994), with a value of .70 as the criterion for reliability (Fleiss, 1981; Gardner, 1995; Landis & Koch, 1977). The obtained interrater reliability (κ = .77) exceeded the criterion. Therefore, subsequent analyses were performed on the data generated by the first coder.

Were human factors studies replicated? Of the eight parent articles, four (50%) were replicated at least once. Consistent with Study 1, this finding suggests that researchers have attempted to falsify many human factors findings. However, this sample revealed a smaller proportion of replicated parent articles (50%) than did Study 1 (75%). Consequently, the proportion of replicated studies in the human factors literature may be smaller than we first thought.

How frequently were studies replicated? As seen in Figure 2, parent articles were replicated 0 to 16 times (M = 4.0, SE = 2.04). Furthermore, three of the eight parent articles were replicated more than four times. These findings are consistent with the findings of Study 1.

[Figure 2. Number of replications per parent articles I through P (N = 32) in Study 2; each bar is split into partial and conceptual replications.]

Were the replications successful? In Study 2, 30 of the 32 replications (94%) were successful, whereas 2 (6%) were not. These results are consistent with those of Study 1.

What types of replications were conducted? As in Study 1, none of the replications in Study 2 was an exact replication. Study 2 uncovered 2 (6%) partial replications and 30 (94%) conceptual replications. A binomial test indicated that partial and conceptual replications were not equally prevalent in Study 2, p < .001. That is, the proportion of conceptual replications was greater than the proportion of partial replications. This finding was similar to the results of Study 1. Of those replications, 2 (6%) contained any variation of the word replication, whereas the other 30 (94%) did not. This finding was also consistent with Study 1.

Is the prevalence of replications correlated with the prevalence of citations? Results indicated that the number of replications for a given parent article was significantly correlated with the number of child articles associated with that parent article (r = .95, p < .001). In both studies, the number of replications increased as the number of child articles increased.

Who conducted the replications? Original authors or coauthors conducted 6 (19%) of the 32 replications; other authors conducted the remaining 26 (81%). A Pearson's chi-square test determined that replications were not equally distributed among author types, χ2 = 12.5, p < .001. This result is not consistent with Study 1.
However, both studies found that a substantial majority of the replications were conducted by authors not originally associated with the parent article. Combining the data across Studies 1 and 2, original authors conducted 17 (27%) of the replications, whereas other authors conducted 45 (73%), χ2 = 12.6, p < .001.

Did different types of authors publish different types of replications? As seen in Table 2, 0 partial (0%) and 6 conceptual (19%) replications were published by the original author(s) or coauthor(s), whereas 2 partial (6%) and 24 conceptual (75%) replications were published by someone other than the original author(s) or coauthor(s) of the parent article. A two-tailed Fisher exact probability test indicated that the type of replication was independent of who authored it, p = 1.0. This finding is consistent with the results of Study 1.

TABLE 2: Number of Partial and Conceptual Replications Authored by Same and Different Author(s) in Study 2

Parent     Partial:   Partial:     Conceptual:  Conceptual:
Article    Same       Different    Same         Different       n
I          0          2            1            13             16
J          0          0            3             5              8
K          0          0            1             5              6
L          0          0            1             1              2
M          0          0            0             0              0
N          0          0            0             0              0
O          0          0            0             0              0
P          0          0            0             0              0
n          0          2            6            24             32

CONCLUSION

Science in its ideal form is self-correcting (Peirce, 1935). Over time, ideas are corroborated or falsified. To corroborate or falsify an idea, researchers must replicate prior studies. Researchers do not, however, always do so (Bozarth & Roberts, 1972; Evanschitzky et al., 2007; Hendrick, 1990; Hubbard et al., 1998; Hubbard & Armstrong, 1994; Hubbard & Vetter, 1996, 1997; Kelly, 2006; Lindsay & Ehrenberg, 1993; Smith, 1970; Sterling, 1959). Consequently, one should not assume that replications are common in one's field without empirical support.

The present studies were the first to investigate the nature of replication research within the human factors literature. Collectively, the results of Studies 1 and 2 were as follows. Half or more of the sampled human factors studies were replicated. If a study was replicated, then it was generally replicated several times. Those replications were typically successful, despite the fact that they were not exact replications; rather, they were partial or, more commonly, conceptual replications. It should be noted, though, that authors rarely referred to their research as a replication. Replications were conducted by the authors or coauthors of the original study as well as by researchers who were not associated with the original publication.

Consequently, our results suggest that half or more of the findings in our sample have survived at least one falsification or corroboration attempt, and many studies have survived repeated attempts. Furthermore, our results suggest that those falsification attempts have typically been of the riskiest variety, that is, conceptual replications. As such, it appears that many human factors findings have survived thorough scrutiny. In other words, our results suggest that many, but not all, of the findings in the human factors literature have a high degree of corroboration.

Two issues, however, must temper this conclusion. First, the limitations of the present studies must be considered. Specifically, it is unclear whether our results would generalize across journals or periods. Study 2 successfully replicated Study 1's findings; however, Study 2 sampled parent articles from the same year and journal.
Future research should further examine the nature of replication research across other periods and within other human factors journals. Second, the possibility of publication bias must be taken into account. As noted earlier, the "file drawer problem" stems from a bias toward publishing successful research (Rosenthal, 1979). Because of this bias, unsuccessful results are not always reported. Consequently, it is possible that Studies 1 and 2 did not uncover many unsuccessful replications because those replications have been stored away in someone's file drawer. If true, then the results of Studies 1 and 2 overestimate the degree of corroboration associated with the findings that were examined as part of the present research.

One of the more interesting outcomes was that authors of replications rarely identified their research as replications. As noted earlier, authors may not have known that their research was a replication, or they may simply not have considered it to be one. However, not identifying one's work as a replication undermines the cumulative nature of the science of human factors. If researchers do not identify their research as replications, then it may be difficult for readers to recognize that those articles contain replications, especially when conceptual replications are not methodologically similar to their respective parent articles. In that case, it would be difficult for readers to accurately assess the degree of corroboration associated with the findings of the research. Therefore, not making replications recognizable may negate the purposes of replicating. To remedy this issue, we recommend that human factors professionals be taught how to identify replications and that they identify replications as such when publishing their research.

KEY POINTS

• Replications exist within the human factors literature.
• Exact replications are rare, whereas conceptual and partial replications are more prevalent.
• Replications are not always labeled as replications, so it can be difficult to identify them.

REFERENCES

Bozarth, J. D., & Roberts, R. R. (1972). Signifying significant significance. American Psychologist, 27, 774–775.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Dul, J., & Karwowski, W. (2003). An assessment system for rating scientific journals in the field of ergonomics and human factors. Rotterdam, Netherlands: Erasmus Research Institute of Management.
Evanschitzky, H., Baumgarth, C., Hubbard, R., & Armstrong, J. S. (2007). Replication research's disturbing trend. Journal of Business Research, 60, 411–415.
Fleiss, J. L. (1981). Statistical methods for rates and proportions (2nd ed.). New York, NY: Wiley.
Gardner, W. (1995). On the reliability of sequential data: Measurement, meaning, and correction. In J. M. Gottman (Ed.), The analysis of change (pp. 339–359). Mahwah, NJ: Erlbaum.
Gattei, S. (2009). Karl Popper's philosophy of science. New York, NY: Routledge.
Hendrick, C. (1990). Replications, strict replications, and conceptual replications: Are they important? In J. W. Neuliep (Ed.), Handbook of replication research in the behavioral and social sciences (pp. 41–49). Newbury Park, CA: Sage.
Hubbard, R., & Armstrong, J. S. (1994). Replications and extensions in marketing: Rarely published but quite contrary. International Journal of Research in Marketing, 11, 233–248.
Hubbard, R., & Vetter, D. E. (1996). An empirical comparison of published replication research in accounting, economics, finance, management, and marketing. Journal of Business Research, 35, 153–164.
Hubbard, R., & Vetter, D. E. (1997). Journal prestige and the publication frequency of replication research in the finance literature. Quarterly Journal of Business and Economics, 36, 3–14.
Hubbard, R., Vetter, D. E., & Little, E. L. (1998). Replication in strategic management: Scientific testing for validity, generalizability, and usefulness. Strategic Management Journal, 19, 243–254.
Kelly, C. D. (2006). Replicating empirical research in behavioral ecology: How and why it should be done but rarely ever is. Quarterly Review of Biology, 81, 221–236.
Lakatos, I. (1978). The methodology of scientific research programmes: Philosophical papers (Vol. 1). Cambridge, England: Cambridge University Press.
Landis, J., & Koch, G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33, 159–174.
Lindsay, R. M., & Ehrenberg, A. S. (1993). The design of replicated studies. American Statistician, 47, 217–228.
Madden, C. S., Easley, R. W., & Dunn, M. G. (1995). How journal editors view replication research. Journal of Advertising, 24, 77–87.
Mahoney, M. J. (1985). Open exchange and epistemic process. American Psychologist, 40, 29–39.
McGrew, T., Alspector-Kelly, M., & Allhoff, F. (Eds.). (2009). Philosophy of science: A historical anthology. West Sussex, England: Wiley-Blackwell.
Miller, D. (1985). Popper selections. Princeton, NJ: Princeton University Press.
Neuliep, J. W., & Crandall, R. (1990). Editorial bias against replication research. Journal of Social Behavior and Personality, 5, 85–90.
Neuliep, J. W., & Crandall, R. (1993a). Everyone was wrong: There are lots of replications out there. Journal of Social Behavior and Personality, 8, 1–8.
Neuliep, J. W., & Crandall, R. (1993b). Reviewer bias against replication research. Journal of Social Behavior and Personality, 8, 21–29.
Peirce, C. S. (1935). Collected papers of Charles Sanders Peirce (C. Hartshorne & P. Weiss, Eds., Vol. 6). Cambridge, MA: Harvard University Press.
Popper, K. (1959). The logic of scientific discovery. London, UK: Hutchinson.
Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641.
Rosenthal, R. (1993). Cumulating evidence. In G. Keren & C. Lewis (Eds.), A handbook for data analysis in the behavioral sciences: Methodological issues (pp. 519–559). Hillsdale, NJ: Erlbaum.
Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13, 90–100.
Smith, N. C. (1970). Replication studies: A neglected aspect of psychological research. American Psychologist, 25, 970–975.
Sterling, T. D. (1959). Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. Journal of the American Statistical Association, 54, 30–34.
Stock, W. (1994). Systematic coding for research synthesis. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 125–138). New York, NY: Russell Sage Foundation.
Thomson Reuters. (2008). Thomson Reuters master journal list. Retrieved from http://science.thomsonreuters.com/mjl/
Tsang, E. W., & Kwan, K.-M. (1999). Replication and theory development in organizational science: A critical realist perspective. Academy of Management Review, 24, 759–780.

Keith S. Jones is an associate professor of psychology at Texas Tech University. He received his PhD in experimental psychology from the Human Factors Program at the University of Cincinnati in 2000.

Paul L. Derby is a doctoral candidate in Texas Tech University's Psychology Department. He received his MA in experimental psychology from Texas Tech University in 2009.

Elizabeth A. Schmidlin is a doctoral student in Texas Tech University's Psychology Department. She received her MA in experimental psychology from Texas Tech University in 2009.

Date received: November 24, 2008
Date accepted: August 3, 2010