Setting Expectations for Replicability in Science
Prasad Patil
Harvard/DFCI Biostatistics
03/02/17

OSC (Aug 2015)
100 pairs of original studies and replication efforts.
Many metrics presented, both quantitative and qualitative:
• P-value comparison
• 95% CI membership
• Average effect size
• Questionnaires posed to replicators

Terms We’ve Seen So Far
• Reproducibility
• Replicability
• Reliability
• Repeatability
• Strength
• Validity
OSC (2015) Article Summary

Reproducibility: Given code/data/materials, can I get the same¹ numbers that you did?
Replicability: Given scientific protocol, can I get the same² result that you did in my own study?
¹ identical
² in agreement
See also: Cacioppo (2015); Goodman (2016)
Patil, P., Peng, R. D., & Leek, J. (2016). bioRxiv, 066803.

Reproducibility
Tools
• Data repositories
  • http://www.nature.com/sdata/policies/repositories
  • http://oad.simmons.edu/oadwiki/Data_repositories
• R Markdown/Jupyter Notebooks
• GitHub
• AWS
• Reproducible analysis
• Journal policies
Desire?
• “methodological terrorism”
• “research parasites”

Replicability
Tools? Desire
• AMGEN (Begley & Ellis, 2012)
• Preclinical Reproducibility & Robustness channel, F1000 (2016)
• OSC
• Science 2015
• Manylabs
• NIH Rigor & Reproducibility
• Pre-Registration/Registered Reports

“More generally, there is no such thing as exact replication…When results differ, it offers an opportunity for hypothesis generation and then testing to determine why.”
Anderson et al. (2016) Response to Comment

Robustness/Sensitivity: $\theta_{orig} \cong \theta_{rep}$ variation
Replicability: $\theta_{orig} \cong \theta_{rep} \mid \text{variation}$

[Figure: Original study effect size versus replication effect size (correlation coefficients). Open Science Collaboration, Science 2015;349:aac4716]
77% in or above PI (prediction interval)
Patil, P., Peng, R. D., & Leek, J. T. (2016). Perspectives on Psychological Science, 11(4), 539-544.

• Invariance to order of experiments?
• RP-testing (Goodman 1992 -> Shao & Chow 2002 -> De Martini 2008)
• Assumptions about strictness of replication
• How to condition/adjust for nonsampling variability?
• Statistical vs. meaningful replication

References
• Anderson, Christopher J., et al. "Response to comment on ‘Estimating the reproducibility of psychological science’." Science 351.6277 (2016): 1037-1037.
• Cacioppo, John T., et al. "Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science." (2015).
• De Martini, Daniele. "Reproducibility probability estimation for testing statistical hypotheses." Statistics & Probability Letters 78.9 (2008): 1056-1061.
• Fiske, Susan T. "Mob Rule or Wisdom of the Crowds?" APS Observer, in press.
• Goodman, Steven N. "A comment on replication, P‐values and evidence." Statistics in Medicine 11.7 (1992): 875-879.
• Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. "What does research reproducibility mean?" Science Translational Medicine 8.341 (2016): 341ps12-341ps12.
• Longo, Dan L., and Jeffrey M. Drazen. "Data sharing." NEJM 374 (2016): 276-277.
• Open Science Collaboration. "Estimating the reproducibility of psychological science." Science 349.6251 (2015): aac4716.
• Patil, Prasad, Roger D. Peng, and Jeffrey T. Leek. "What should researchers expect when they replicate studies? A statistical view of replicability in psychological science." Perspectives on Psychological Science 11.4 (2016): 539-544.
• Patil, Prasad, Roger D. Peng, and Jeffrey T. Leek. "A statistical definition for reproducibility and replicability." bioRxiv (2016): 066803.
• Shao, Jun, and Shein‐Chung Chow. "Reproducibility probability in clinical trials." Statistics in Medicine 21.12 (2002): 1727-1742.

Gilbert et al. (Oct 2015)
• Improper benchmark for how many replications fail by chance.
• Sampling variation.
• n = 2 not enough for “estimating replicability”.
• ManyLabs
• Infidelities in replication protocols
• Variation beyond sampling left unquantified.

Shnabel & Nadler (2008)
“In one original study, researchers asked Israelis to imagine the consequences of taking a leave from mandatory military service (Shnabel & Nadler, 2008). The replication study asked Americans to imagine the consequences of taking a leave to get married and go on a honeymoon.”
Gilbert et al. (2016) Response to Rebuttal

“…the study was about how victims and perpetrators respond when someone else takes credit for their work…the reason for being away from the office (reserve duty, maternity leave, or honeymoon) was an incidental feature of the scenario to allow someone to take credit for another person’s work.”
http://retractionwatch.com/2016/03/07/lets-not-mischaracterize-replicationstudies-authors/

95% Prediction Interval
For $(X_i, Y_i)$ bivariate normal, $i = 1, \dots, N$:
\[ \operatorname{arctanh}(r) \sim N\!\left( \frac{1}{2} \ln\frac{1+\rho}{1-\rho},\ \frac{1}{N-3} \right) \]
Hence, for $r_{orig}$ and $r_{rep}$:
\[ \operatorname{arctanh}(r_{orig}) - \operatorname{arctanh}(r_{rep}) \sim N\!\left( 0,\ \frac{1}{N_{orig}-3} + \frac{1}{N_{rep}-3} \right) \]
Patil, P., Peng, R. D., & Leek, J. T. (2016). Perspectives on Psychological Science, 11(4), 539-544.
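
The prediction-interval formula above can be turned into a short calculation. What follows is a minimal sketch, not the authors' code: it assumes the interval is formed on the Fisher z (arctanh) scale around the original estimate and then mapped back with tanh, and the function names `replication_prediction_interval` and `classify_replication` are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def replication_prediction_interval(r_orig, n_orig, n_rep, level=0.95):
    """Prediction interval for a replication correlation (illustrative sketch).

    Uses the normal approximation for arctanh(r_orig) - arctanh(r_rep),
    whose variance is 1/(n_orig - 3) + 1/(n_rep - 3).
    """
    if not -1.0 < r_orig < 1.0:
        raise ValueError("r_orig must lie strictly between -1 and 1")
    if n_orig <= 3 or n_rep <= 3:
        raise ValueError("both sample sizes must exceed 3")

    z_orig = math.atanh(r_orig)  # Fisher z-transform of the original estimate
    se = math.sqrt(1.0 / (n_orig - 3) + 1.0 / (n_rep - 3))
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95

    # Build the interval on the z scale, then map back to the correlation scale.
    return math.tanh(z_orig - crit * se), math.tanh(z_orig + crit * se)


def classify_replication(r_orig, n_orig, r_rep, n_rep, level=0.95):
    """Label a replication estimate as 'below', 'inside', or 'above' the interval."""
    lower, upper = replication_prediction_interval(r_orig, n_orig, n_rep, level)
    if r_rep < lower:
        return "below"
    if r_rep > upper:
        return "above"
    return "inside"


if __name__ == "__main__":
    # Made-up inputs, purely to show the call pattern.
    print(replication_prediction_interval(0.5, 50, 100))
    print(classify_replication(0.5, 50, 0.4, 100))
```

With the made-up inputs shown (r_orig = 0.5, N_orig = 50, N_rep = 100), the interval runs from about 0.20 to about 0.72, so a replication estimate of 0.4 would be labelled "inside". The "inside" and "above" labels correspond to the "in or above PI" tally reported on the earlier slide.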