The Reproducible Research Advantage
Why + how to make your research more reproducible
Presentation for Research Week
February 22, 2016
April Clyburne-Sherin

Objectives
• What is reproducibility?
• Why practice reproducibility?
• What materials are necessary for reproducibility?
• How can you make your research reproducible?

What is reproducibility?
Scientific method
• Replication of findings is highest standard of evaluating evidence
• Focuses on validating the scientific claim
• Requires transparency of methods, data, and code
• Minimum standard for any scientific study
[Diagram: the scientific method cycle, with stages labeled Observation, Question, Hypothesis, Prediction, Testing, Analysis, Replication]

Why practice reproducibility?
Fanelli D (2010) “Positive” Results Increase Down the Hierarchy of the Sciences. PLoS ONE 5(4): e10068.

• Study report is not enough to:
  – Assess the sensitivity of findings to assumptions
  – Replicate the findings
  – Distinguish confirmatory from exploratory analyses
  – Identify protocol deviations
• Cannot evaluate the study analyses and findings using a study report alone.
[Diagram: research pipeline. Collecting → Raw data → Processing → Analytic data → Analysing → Raw results → Reporting → Reported results (figures, tables, numerical summaries) → Study report]

To fully assess the analyses and findings of a study, we need more information.

What materials are necessary for reproducibility?
1. Data + metadata
2. Code
3. Documentation of methods

Why practice reproducibility?
The idealist / The pragmatist
• Shoulders of giants!
• Minimum scientific standard
• Allows others to build on your findings
• Improved transparency
• Increased transfer of knowledge
• Increased utility of your data + methods
• Data sharing citation advantage (Piwowar 2013)
• “It takes some effort to organize your research to be reproducible… the principal beneficiary is generally the author herself.” - Schwab & Claerbout
• Improves capacity for complex and large datasets or analyses
• Increased productivity

How can you make your research reproducible?
1. Plan for reproducibility before you start
   • Power
   • Data management plan
   • Informative naming + location
   • Study plan + pre-analysis plan
2. Keep track of things
   • Version control
   • Documentation
3. Contain bias
   • Reporting
   • Confirmatory vs. exploratory analyses
4. Archive + share your materials

1. Plan for reproducibility before you start
Power
• Calculate your power
• Low power means:
  – Low probability of finding true effects
  – Low probability that a positive is a true positive (positive predictive value)
  – Exaggerated estimate of the magnitude of effect when true effect discovered
  – Greater vibration of effects

Low probability of finding true effects
• Low powered studies produce more false negatives than high powered
• If there are 100 true positive effects in a field, 20% power means only 20 of them will be discovered

Exaggerated estimate of the magnitude of effect
• The Winner’s Curse
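The power arithmetic on these slides can be sketched in a few lines of Python. This is a minimal illustration, not part of the original presentation. The function names, the 5% significance threshold, the figure of 900 true-null hypotheses, and the normal approximation used for the power calculation are all assumptions chosen for the example; only the 100 true effects and the 20% power come from the slide.

```python
from statistics import NormalDist


def expected_discoveries(n_true_effects, power):
    """Expected number of true effects that reach statistical significance."""
    return n_true_effects * power


def positive_predictive_value(n_true_effects, n_null_effects, power, alpha=0.05):
    """Fraction of significant results that correspond to true effects."""
    true_positives = n_true_effects * power
    false_positives = n_null_effects * alpha  # assumed false-positive rate
    return true_positives / (true_positives + false_positives)


def approx_power_two_groups(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means.

    Assumption: normal approximation with equal group sizes; effect_size
    is the standardized difference between group means.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the test statistic lands in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


if __name__ == "__main__":
    # Slide example: 100 true effects studied at 20% power.
    print(expected_discoveries(100, 0.20))
    # Hypothetical field with 900 additional hypotheses that are truly null.
    for power in (0.20, 0.80):
        ppv = positive_predictive_value(100, 900, power)
        print(f"power={power:.2f} PPV={ppv:.2f}")
    # Hypothetical design: medium effect, 64 participants per group.
    print(f"approximate power: {approx_power_two_groups(0.5, 64):.2f}")
```

Under these assumed numbers, raising power from 20% to 80% moves the positive predictive value from about 0.31 to 0.64, which is the sense in which low power lowers the probability that a positive is a true positive.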
Greater vibration of effects
• A low-powered study is more likely to obtain different estimates of the magnitude of the effect depending on the analytical options it implements
• A manipulation affecting only three observations could change the odds ratio from 1.00 to 1.50 in a small study but might only change it from 1.00 to 1.01 in a large study

How?
• Estimate the size of effect you are studying
• Design your study with sufficient power to detect that effect
• If you need more power, consider collaborating
• If your study is underpowered, report this and acknowledge this limitation in the interpretation of your results

Data management plan
• Prepare to share
• Data that is well-managed from the start is easier to prepare for sharing
• Smooths transitions between researchers
• Protects you if questions are raised about data validity
• Metadata provides context
How?
• Document metadata while collecting to save time
• Use open data formats rather than proprietary: .csv, .txt, .png
• Data:
  – Collected
  – Stored
  – Documented
  – Managed
• Metadata:
  – Collected
  – Documented / Version control
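The advice to use open formats and to document metadata while collecting can be sketched as follows. This is an illustrative example only: the function name, the file name, the column names, and the metadata fields are hypothetical and not taken from the presentation.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_with_metadata(rows, fieldnames, data_path, metadata):
    """Write rows to a CSV file and the metadata to a JSON file beside it."""
    data_path = Path(data_path)
    with data_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Hypothetical convention: <data file stem>_metadata.json in the same folder.
    metadata_path = data_path.with_name(data_path.stem + "_metadata.json")
    with metadata_path.open("w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2, sort_keys=True)
    return data_path, metadata_path


if __name__ == "__main__":
    rows = [
        {"sample_id": "S001", "concentration_ng_ml": 12.5},
        {"sample_id": "S002", "concentration_ng_ml": 9.8},
    ]
    metadata = {
        "description": "Example assay readings (made-up values)",
        "columns": {
            "sample_id": "Laboratory sample identifier",
            "concentration_ng_ml": "Measured concentration in ng/mL",
        },
        "collected_by": "hypothetical researcher",
    }
    with tempfile.TemporaryDirectory() as folder:
        data_file, metadata_file = save_with_metadata(
            rows, ["sample_id", "concentration_ng_ml"],
            Path(folder) / "readings.csv", metadata,
        )
        print(data_file.name, metadata_file.name)
```

Keeping the metadata in a plain-text file next to the data means both can be read without proprietary software and both can be placed under version control.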
Informative name + location
• Plan your file naming + location system a priori
• Names and locations should be distinctive, consistent, and informative:
  – What it is
  – Why it exists
  – How it relates to other files
• The rules don’t matter. That you have rules matters.
• Make it machine readable:
  – Default ordering
  – Use of meaningful delimiters and tags
  – Example: use “_” and “-” to store metadata in name (eg, YYYY-MM-DD_assay_sample-set_well)
• Make it human readable:
  – Choose self-explanatory names and locations

Study plan
• Pre-register your study plan before you look at your data
• Public registration of all studies counters publication bias
• Counters selective reporting and outcome reporting bias
• Distinguishes a priori design decisions from post hoc
• Corroborates the rigor of your findings
How?
• Hypothesis
• Study design
  – Type of design
  – Sampling
  – Power and sample size
  – Randomization?
• Variables measured
  – Meaningful effect size
• Variables constructed
  – Data processing
Registries: Open Science Framework, ClinicalTrials.gov

Pre-analysis plan
• Pre-register your analysis plan before you look at your data
• Defines your confirmatory analyses
• Corroborates the rigor of your findings
How?
• Define data analysis set
• Statistical analyses
  – Primary
  – Secondary
  – Exploratory
• Missing data
• Outliers
• Multiplicity
• Subgroups + covariates
(Adams-Huet and Ahn, 2009)
[Diagram: the Processing and Analysing steps of the research pipeline, Raw data → Analytic data → Raw results]

Pre-registration
• Pre-register your study + analysis plan with Registered Reports
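The machine-readable naming advice earlier in this section (using “_” between fields and “-” within a field) can be sketched in code. This is a rough illustration under stated assumptions: the date field is taken to be an ISO 8601 date (YYYY-MM-DD), and the file names, the `ParsedName` type, and the `parse_name` function are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import NamedTuple


class ParsedName(NamedTuple):
    collected: date
    assay: str
    sample_set: str
    well: str


def parse_name(filename):
    """Split a name of the form date_assay_sample-set_well into its fields."""
    stem = Path(filename).stem  # drop the extension
    date_text, assay, sample_set, well = stem.split("_")
    return ParsedName(date.fromisoformat(date_text), assay, sample_set, well)


if __name__ == "__main__":
    # Hypothetical file names that follow the convention.
    names = [
        "2016-02-22_elisa_plate-02_B07.csv",
        "2015-11-03_elisa_plate-01_A01.csv",
        "2016-02-22_elisa_plate-01_A01.csv",
    ]
    # Default (alphabetical) ordering is also chronological with ISO dates.
    for name in sorted(names):
        print(parse_name(name))
```

Because every field sits in a fixed position, the metadata stored in the name can be recovered without opening the file, and a plain directory listing already sorts files by date.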
2. Keep track of things
Version control
• Track your changes
• Everything created manually should use version control
• Tracks changes to files, code, metadata
• Allows you to revert to old versions
• Make incremental changes: commit early, commit often
• Git / GitHub / Bitbucket
Version control for data
• Metadata should be version controlled

Documentation
• Document everything done by hand
• Document your software environment (eg, dependencies, libraries, sessionInfo() in R)
• Everything done by hand or not automated from data and code should be precisely documented:
  – README files
• Make raw data read only
  – You won’t edit it by accident
  – Forces you to document or code data processing
• Document in code comments

3. Contain bias
Reporting
• Report transparently + completely
• Transparently means:
  – Readers can use the findings
  – Replication is possible
  – Users are not misled
  – Findings can be pooled in meta-analyses
• Completely means:
  – All results are reported, no matter their direction or statistical significance
How?
• Use reporting guidelines
  – CONSORT: Consolidated Standards of Reporting Trials
  – SAMPL: Statistical Analyses and Methods in the Published Literature

Confirmatory vs. exploratory
• Distinguish confirmatory from exploratory analyses
How?
• Provide access to your preregistered analysis plan
• Avoid HARKing: Hypothesizing After the Results are Known
• Report all deviations from your study plan
• Report which decisions were made after looking at the data

4. Archive + share your materials
Share your materials
• Where doesn’t matter. That you share matters.
• Get credit for your code, your data, your methods
• Increase the impact of your research
Open Science Framework

How can you make your research reproducible?
1. Plan for reproducibility before you start
   • Power
     – Calculate your power
   • Data management plan
     – Prepare to share
   • Informative naming + location
     – The rules don’t matter.
       That you have rules matters.
   • Study plan + pre-analysis plan
     – Pre-register your plans
2. Keep track of things
   • Version control
     – Track your changes
   • Documentation
     – Document everything done by hand
3. Contain bias
   • Reporting
     – Report transparently + completely
   • Confirmatory vs. exploratory analyses
     – Distinguish confirmatory from exploratory
4. Archive + share your materials
   • Where doesn’t matter. That you share matters.

How to learn more
• Organizing a project for reproducibility
  – Reproducible Science Curriculum by Jenny Bryan
  – https://github.com/reproducible-science-curriculum/
• Data management
  – Data Management from Software Carpentry by Orion Buske
  – http://softwarecarpentry.org/v4/data/mgmt.html
• Literate programming
  – Literate Statistical Programming by Roger Peng
  – https://www.youtube.com/watch?v=YcJb1HBc-1Q
• Version control
  – Version Control by Software Carpentry
  – http://softwarecarpentry.org/v4/vc/
• Sharing materials
  – Open Science Framework by Center for Open Science
  – https://osf.io/
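Returning to the documentation advice under “Keep track of things” (document your software environment, make raw data read only), the two steps might be sketched in Python as below. The slides mention sessionInfo() in R; this is only a loose Python analogue, and the function names and the file used in the demonstration are hypothetical.

```python
import platform
import stat
import sys
import tempfile
from importlib import metadata
from pathlib import Path


def describe_environment():
    """Collect interpreter, operating system, and installed package versions."""
    packages = sorted(
        f"{dist.metadata.get('Name')}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata.get("Name")
    )
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": packages,
    }


def make_read_only(path):
    """Clear the write permission bits so raw data is not edited by accident."""
    path = Path(path)
    current = path.stat().st_mode
    path.chmod(current & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))


if __name__ == "__main__":
    environment = describe_environment()
    print(environment["python"], environment["platform"])
    print(len(environment["packages"]), "packages recorded")

    with tempfile.TemporaryDirectory() as folder:
        raw_file = Path(folder) / "raw_data.csv"  # hypothetical raw data file
        raw_file.write_text("sample_id,value\nS001,1.0\n", encoding="utf-8")
        make_read_only(raw_file)
        print(oct(raw_file.stat().st_mode & 0o777))
```

Saving the returned dictionary alongside the analysis outputs, for example in a README or a JSON file, gives later readers a record of the environment that produced them.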