Setting Expectations for Replicability in Science

Prasad Patil
Harvard/DFCI Biostatistics
03/02/17
OSC (Aug 2015)
• 100 pairs of original studies and replication efforts.
• Many metrics presented, both quantitative and qualitative.
  • P-value comparison
  • 95% CI membership
  • Average effect size
  • Questionnaires posed to replicators
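The three quantitative comparisons above can be sketched for a single original/replication pair. This is a minimal illustration, not the OSC (2015) analysis: it assumes both effects are correlation coefficients, uses the Fisher z normal approximation for p-values and confidence intervals, and reads "average effect size" as a fixed-effect, inverse-variance average on the Fisher z scale. All function names are hypothetical.

```python
import math
from statistics import NormalDist

# Sketch only: every metric uses the Fisher z approximation, which is an
# assumption; OSC (2015) did not necessarily compute them this way.
_STD_NORMAL = NormalDist()


def fisher_z(r):
    """Fisher transformation of a correlation coefficient."""
    return math.atanh(r)


def z_se(n):
    """Approximate standard error of arctanh(r) for sample size n."""
    return 1.0 / math.sqrt(n - 3)


def p_value(r, n):
    """Two-sided p-value for H0: rho = 0, via the normal approximation."""
    z = fisher_z(r) / z_se(n)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))


def ci95(r, n):
    """95% confidence interval for rho, back-transformed to the r scale."""
    half = _STD_NORMAL.inv_cdf(0.975) * z_se(n)
    z = fisher_z(r)
    return math.tanh(z - half), math.tanh(z + half)


def osc_style_metrics(r_orig, n_orig, r_rep, n_rep, alpha=0.05):
    """Compare one original/replication pair (hypothetical helper)."""
    # P-value comparison: replication significant and in the same direction.
    same_sign = (r_orig > 0) == (r_rep > 0)
    rep_significant = p_value(r_rep, n_rep) < alpha and same_sign

    # 95% CI membership: does the original estimate fall in the replication CI?
    lo, hi = ci95(r_rep, n_rep)
    orig_in_rep_ci = lo <= r_orig <= hi

    # Average effect: inverse-variance weights are 1 / se^2 = n - 3.
    w_orig, w_rep = n_orig - 3, n_rep - 3
    z_bar = (w_orig * fisher_z(r_orig) + w_rep * fisher_z(r_rep)) / (w_orig + w_rep)

    return {
        "rep_significant": rep_significant,
        "orig_in_rep_ci": orig_in_rep_ci,
        "combined_r": math.tanh(z_bar),
    }
```

For example, `osc_style_metrics(0.5, 50, 0.45, 200)` flags the replication as significant, finds the original estimate inside the replication's 95% CI, and gives a combined correlation between the two inputs, pulled toward the larger replication.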
Terms We’ve Seen So Far
• Reproducibility
• Replicability
• Reliability
• Repeatability
• Strength
• Validity
OSC (2015) Article Summary
Reproducibility
Given code/data/materials, can I get the same¹ numbers that you did?
Replicability
Given scientific protocol, can I get the same² result that you did in my own study?
¹ identical
² in agreement
See also: Cacioppo (2015); Goodman (2016)
Patil, P., Peng, R. D., & Leek, J. T. (2016). bioRxiv, 066803.
Reproducibility
Tools ✓   Desire ?
• Data repositories
  • http://www.nature.com/sdata/policies/repositories
  • http://oad.simmons.edu/oadwiki/Data_repositories
• R Markdown/Jupyter Notebooks
• GitHub
• AWS
• Reproducible analysis
• Journal policies
• “methodological terrorism” (Fiske, in press)
• “research parasites” (Longo & Drazen, 2016)
Replicability
Tools ?   Desire ✓
• AMGEN (Begley & Ellis, 2012)
• Preclinical Reproducibility & Robustness channel, F1000 (2016)
• OSC
• Science 2015
• Many Labs
• NIH Rigor & Reproducibility
• Pre-Registration/Registered Reports
“More generally, there is no such thing as exact replication… When results differ, it offers an opportunity for hypothesis generation and then testing to determine why.”
Anderson et al. (2016), Response to Comment
Robustness/Sensitivity: $\theta_{orig} \cong \theta_{rep}$ across variation
Replicability: $\theta_{orig} \cong \theta_{rep} \mid \text{variation}$
[Figure: Original study effect size versus replication effect size (correlation coefficients). Source: Open Science Collaboration, Science 2015;349:aac4716. Annotation: 77% in or above the prediction interval (PI).]
Patil, P., Peng, R. D., & Leek, J. T. (2016). Perspectives on Psychological Science, 11(4), 539-544.
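The "in or above PI" annotation comes from classifying each replication correlation against a prediction interval built from its original study. A minimal sketch of that classification, assuming correlation-scale effects and the Fisher z interval arctanh(r_orig) ± 1.96·sqrt(1/(N_orig − 3) + 1/(N_rep − 3)), back-transformed with tanh. Function names are hypothetical, and the code does not reproduce the OSC data or the 77% figure.

```python
import math
from statistics import NormalDist


def prediction_interval(r_orig, n_orig, n_rep, level=0.95):
    """Prediction interval for a replication correlation r_rep.

    Assumes arctanh(r_orig) - arctanh(r_rep) ~ N(0, 1/(n_orig-3) + 1/(n_rep-3)),
    i.e. independent studies estimating the same rho.
    """
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    half = crit * math.sqrt(1 / (n_orig - 3) + 1 / (n_rep - 3))
    z = math.atanh(r_orig)
    # tanh is monotone increasing, so the interval endpoints map directly.
    return math.tanh(z - half), math.tanh(z + half)


def classify(r_orig, n_orig, r_rep, n_rep):
    """Return 'below', 'in', or 'above' relative to the 95% PI."""
    lo, hi = prediction_interval(r_orig, n_orig, n_rep)
    if r_rep < lo:
        return "below"
    if r_rep > hi:
        return "above"
    return "in"


def fraction_in_or_above(pairs):
    """Share of (r_orig, n_orig, r_rep, n_rep) tuples not falling below the PI."""
    labels = [classify(*p) for p in pairs]
    return sum(label != "below" for label in labels) / len(labels)
```

Counting "above" together with "in" treats a replication that is larger than predicted as consistent with the original; only replications that fall short count against it.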
• Invariance to order of experiments?
• RP-testing, i.e. testing via reproducibility probability (Goodman 1992 → Shao & Chow 2002 → De Martini 2008)
• Assumptions about strictness of replication
• How to condition/adjust for non-sampling variability?
• Statistical vs. meaningful replication
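Reproducibility probability (RP) is the chance that an identical, same-size replication would again reject H0. A minimal sketch of the plug-in ("estimated power") version for a two-sided z-test, which treats the observed statistic as the true effect. This is one estimator from that line of work, not necessarily the one the talk has in mind, and the function names are hypothetical.

```python
from statistics import NormalDist

_N = NormalDist()


def reproducibility_probability(z_obs, alpha=0.05):
    """Plug-in RP for a two-sided z-test.

    Treats the observed statistic z_obs as the true noncentrality, so the
    result is the estimated power of a same-size replication at level alpha.
    """
    crit = _N.inv_cdf(1 - alpha / 2)
    return _N.cdf(abs(z_obs) - crit) + _N.cdf(-abs(z_obs) - crit)


def z_from_p(p):
    """Convert a two-sided p-value to |z|."""
    return _N.inv_cdf(1 - p / 2)
```

Under this plug-in view, a result with p exactly 0.05 has RP of about 0.5, and p = 0.005 gives roughly 0.80, which is why a single significant result promises less about replication than it may appear to.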
References
• Anderson, Christopher J., et al. "Response to comment on ‘Estimating the reproducibility of psychological science’." Science 351.6277 (2016): 1037.
• Cacioppo, John T., et al. "Social, Behavioral, and Economic Sciences Perspectives on Robust and Reliable Science." (2015).
• De Martini, Daniele. "Reproducibility probability estimation for testing statistical hypotheses." Statistics & Probability Letters 78.9 (2008): 1056-1061.
• Fiske, Susan T. "Mob Rule or Wisdom of the Crowds?" APS Observer, in press.
• Goodman, Steven N. "A comment on replication, P‐values and evidence." Statistics in Medicine 11.7 (1992): 875-879.
• Goodman, Steven N., Daniele Fanelli, and John P. A. Ioannidis. "What does research reproducibility mean?" Science Translational Medicine 8.341 (2016): 341ps12.
• Longo, Dan L., and Jeffrey M. Drazen. "Data sharing." NEJM 374 (2016): 276-277.
• Open Science Collaboration. "Estimating the reproducibility of psychological science." Science 349.6251 (2015): aac4716.
• Patil, Prasad, Roger D. Peng, and Jeffrey T. Leek. "What should researchers expect when they replicate studies? A statistical view of replicability in psychological science." Perspectives on Psychological Science 11.4 (2016): 539-544.
• Patil, Prasad, Roger D. Peng, and Jeffrey T. Leek. "A statistical definition for reproducibility and replicability." bioRxiv (2016): 066803.
• Shao, Jun, and Shein‐Chung Chow. "Reproducibility probability in clinical trials." Statistics in Medicine 21.12 (2002): 1727-1742.
Gilbert et al. (Oct 2015)
• Improper benchmark for how many replications fail by chance.
• Sampling variation.
• n = 2 not enough for “estimating replicability”.
• Many Labs
• Infidelities in replication protocols
• Variation beyond sampling left unquantified.
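The benchmark point can be made concrete with a small simulation: even when the original and the replication estimate exactly the same ρ, some replications "fail" by chance, and how many depends on how failure is defined. The sketch assumes bivariate normal data, a common ρ, and no non-sampling variation (the component the slide notes is left unquantified); ρ, the sample sizes, the seed, and the function names are hypothetical choices for illustration.

```python
import math
import random
from statistics import NormalDist


def sample_r(rho, n, rng):
    """Pearson r from n draws of a standard bivariate normal with correlation rho."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1 - rho**2) * rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_failures(rho=0.3, n_orig=40, n_rep=80, sims=2000, seed=1):
    """Rates of two 'failure' definitions when both studies share the same rho."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(0.975)
    half = crit * math.sqrt(1 / (n_orig - 3) + 1 / (n_rep - 3))
    outside_pi = not_significant = 0
    for _ in range(sims):
        r_orig = sample_r(rho, n_orig, rng)
        r_rep = sample_r(rho, n_rep, rng)
        # Failure 1: replication outside the 95% prediction interval.
        if abs(math.atanh(r_orig) - math.atanh(r_rep)) > half:
            outside_pi += 1
        # Failure 2: replication not significant (Fisher z test of rho = 0).
        if abs(math.atanh(r_rep)) * math.sqrt(n_rep - 3) < crit:
            not_significant += 1
    return outside_pi / sims, not_significant / sims
```

The prediction-interval failure rate should sit near 5% by construction, while the non-significance rate tracks the replication's power (roughly 78% for these settings under the Fisher approximation), so the two definitions imply very different chance benchmarks.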
Shnabel & Nadler (2008)
“In one original study, researchers asked Israelis to imagine the consequences of taking a leave from mandatory military service (Shnabel & Nadler, 2008). The replication study asked Americans to imagine the consequences of taking a leave to get married and go on a honeymoon.”
Gilbert et al. (2016) Response to Rebuttal
Shnabel & Nadler (2008)
“…the study was about how victims and perpetrators respond when someone else takes credit for their work…the reason for being away from the office (reserve duty, maternity leave, or honeymoon) was an incidental feature of the scenario to allow someone to take credit for another person’s work.”
http://retractionwatch.com/2016/03/07/lets-not-mischaracterize-replicationstudies-authors/
95% Prediction Interval
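For $(X_i, Y_i)$ bivariate normal with correlation $\rho$, $i = 1, \ldots, N$, the sample correlation $r$ satisfies, approximately,
$$\operatorname{arctanh}(r) \sim N\left(\frac{1}{2}\ln\frac{1+\rho}{1-\rho},\ \frac{1}{N-3}\right)$$
Hence, for $r_{orig}$ and $r_{rep}$ from independent studies of the same $\rho$,
$$\operatorname{arctanh}(r_{orig}) - \operatorname{arctanh}(r_{rep}) \sim N\left(0,\ \frac{1}{N_{orig}-3} + \frac{1}{N_{rep}-3}\right)$$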
Patil, P., Peng, R. D., & Leek, J. T. (2016). Perspectives on Psychological Science, 11(4), 539-544.
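Because the difference of the two Fisher-transformed correlations is centered at zero, a 95% prediction interval for the replication correlation follows by stepping 1.96 standard deviations either side of $\operatorname{arctanh} r_{orig}$ and applying tanh, which is monotone, to return to the correlation scale. A worked statement, assuming independent studies, a common $\rho$, and no non-sampling variation:

```latex
s = \sqrt{\frac{1}{N_{orig}-3} + \frac{1}{N_{rep}-3}}, \qquad
\mathrm{PI}_{95\%}(r_{rep}) = \left[\tanh\!\left(\operatorname{arctanh} r_{orig} - 1.96\,s\right),\ \tanh\!\left(\operatorname{arctanh} r_{orig} + 1.96\,s\right)\right]
```

For example, $r_{orig} = 0.3$ with $N_{orig} = N_{rep} = 103$ gives $s = \sqrt{0.02} \approx 0.141$ and $\operatorname{arctanh}(0.3) \approx 0.310$, so the interval is about $[\tanh(0.032), \tanh(0.587)] \approx [0.03, 0.53]$: a replication correlation anywhere in that range is consistent with the original under sampling variation alone.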