PowerPoint presentation - Open Science Framework

“Making Science Great Again”
From replication crisis to open science, how we can
improve research
Roy Salomon
Gonda Center 12.6.17
This presentation is inspired by presentations from Daniel Lakens, Jim Grange, and Brian Nosek. Most slides are from PD Dr. Felix
Schönbrodt, Ludwig-Maximilians-Universität München, and used under a CC-BY 4.0 license.
“Only when certain events recur in accordance with rules or
regularities, as in the case of repeatable experiments, can our
observations be tested—in principle—by anyone.... Only by such
repetition can we convince ourselves that we are not dealing with a
mere isolated ‘coincidence’.” – Karl Popper (1959, p. 45)
We have a problem!
What are the
causes?
We have a problem!
Timeline 2011–2015: Bem
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology
The combination of some typical questionable research practices (QRPs) increases the Type-I error rate from 5% to > 50%.
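To make the inflation concrete, here is a minimal simulation sketch. It is not the procedure from Simmons et al.: it assumes a simple z-test with a known SD of 1, two independent outcome measures, and one round of optional stopping, and every function name is hypothetical. Because it combines only a few practices, it will not reach the > 50% on the slide, but it should land clearly above the nominal 5%.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05


def z_test_p(a, b):
    """Two-sided p-value for a mean difference, assuming a known SD of 1."""
    se = (1 / len(a) + 1 / len(b)) ** 0.5
    z = (mean(a) - mean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def draw_group(rng, n):
    """n participants, two independent outcome measures, no true effect."""
    return [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]


def smallest_p(g1, g2, flexible):
    """p-value for DV1 only, or the best of DV1, DV2 and their composite."""
    dv1 = z_test_p([x[0] for x in g1], [x[0] for x in g2])
    if not flexible:
        return dv1
    dv2 = z_test_p([x[1] for x in g1], [x[1] for x in g2])
    # (DV1 + DV2) / sqrt(2) keeps the composite on an SD of 1.
    comp = z_test_p([(x[0] + x[1]) / 2 ** 0.5 for x in g1],
                    [(x[0] + x[1]) / 2 ** 0.5 for x in g2])
    return min(dv1, dv2, comp)


def false_positive_rate(flexible, n_sims=4000, n_start=20, n_extra=10, seed=1):
    """Share of simulated null studies that end with p < ALPHA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        g1, g2 = draw_group(rng, n_start), draw_group(rng, n_start)
        significant = smallest_p(g1, g2, flexible) < ALPHA
        if flexible and not significant:
            # Optional stopping: add participants and test once more.
            g1 += draw_group(rng, n_extra)
            g2 += draw_group(rng, n_extra)
            significant = smallest_p(g1, g2, flexible) < ALPHA
        hits += significant
    return hits / n_sims


if __name__ == "__main__":
    print("planned analysis: ", false_positive_rate(flexible=False))
    print("flexible analysis:", false_positive_rate(flexible=True))
```

Adding further degrees of freedom (covariates, dropped conditions, more interim looks) would push the flexible rate higher still.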
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology; John et al.: Prevalence of QRPs
“Self-admission rate” for many QRPs > 50%; estimated prevalence in some cases > 70%.
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology; John et al.: Prevalence of QRPs; Doyen et al. (2012) ➙ “The Bargh rant”; Kahneman: Open Letter
Cited by 4195
I believe that you should collectively do
something about this mess.
I see a train wreck looming.
http://www.nature.com/polopoly_fs/7.6716.1349271308!/suppinfoFile/Kahneman%20Letter.pdf
n = 20 in each condition
d = 0.73, 95% CI [0.05; 1.41]
577 citations
http://www.terryburnham.com/2015/04/a-trick-for-higher-sat-scores.html?m=1
N > 3500 in each condition
p = .76
d = -0.01, 95% CI [-0.05; 0.04]
Compared with: n = 20 in each condition, d = 0.73, 95% CI [0.05; 1.41], cited 577x
http://www.terryburnham.com/2015/04/a-trick-for-higher-sat-scores.html?m=1
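The gap between the two confidence intervals follows directly from the sample sizes. The sketch below uses the common large-sample standard error for Cohen's d; this is an assumption, since the slides do not say how their intervals were computed, so the results will be close to but not identical with the values shown. The function name is hypothetical.

```python
from statistics import NormalDist


def d_confidence_interval(d, n1, n2, level=0.95):
    """Approximate CI for Cohen's d from its large-sample standard error."""
    se = ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return d - z * se, d + z * se


# Values taken from the slides: small study vs. large replication.
small = d_confidence_interval(0.73, 20, 20)
large = d_confidence_interval(-0.01, 3500, 3500)

if __name__ == "__main__":
    print("n = 20 per condition:   [%.2f; %.2f]" % small)
    print("n = 3500 per condition: [%.2f; %.2f]" % large)
```

With 20 participants per condition the interval spans more than a full standard deviation, so the estimate is compatible with anything from a negligible to a very large effect.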
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology; John et al.: Prevalence of QRPs; Doyen et al. (2012) ➙ “The Bargh rant”; Kahneman: Open Letter; Founding of the Center for Open Science (+ Open Science Framework)
Complete scientific project management
Data management, pre-registrations, version control,
private/public, private read-only links for reviewers, wikis, email
lists, Dropbox/Figshare/GitHub integration, download statistics
…
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology; John et al.: Prevalence of QRPs; Doyen et al. (2012) ➙ “The Bargh rant”; Kahneman: Open Letter; Founding of the Center for Open Science (+ Open Science Framework); Simonsohn et al.: p-curve
Simonsohn et al.: p-curve
p-curve: Null effect
• Under H₀, p-values are uniformly distributed
• Doing a study = drawing a random p-value from this distribution
[Figure: density of p values under H₀, with the 5% region marked; x-axis: p value, y-axis: Density]
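The uniform distribution under H₀ can be checked with a short simulation. This is only a sketch: it assumes a two-group z-test with a known SD of 1, and the function name and default sample sizes are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, mean


def null_p_values(n_studies=5000, n_per_group=20, seed=1):
    """Simulate two-group studies with no true effect; return their p-values."""
    rng = random.Random(seed)
    norm = NormalDist()
    se = (2 / n_per_group) ** 0.5  # SE of the mean difference when SD = 1
    p_values = []
    for _ in range(n_studies):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(a) - mean(b)) / se
        p_values.append(2 * (1 - norm.cdf(abs(z))))
    return p_values


p_values = null_p_values()

# Share of p-values in five equal-width bins from 0 to 1.
bins = [0] * 5
for p in p_values:
    bins[min(int(p * 5), 4)] += 1
shares = [count / len(p_values) for count in bins]
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)

if __name__ == "__main__":
    print("share per bin:", [round(s, 3) for s in shares])
    print("share with p < .05:", round(share_significant, 3))
```

Each bin should hold roughly a fifth of the p-values, and roughly 5% of them should fall below .05.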
p-curve: Effect size > 0
• With increasing power, the p-curve gets more positively skewed
[Figures: density of p values at 10% power, at 35% power (average in psychology), and at 80% power; x-axis: p value, y-axis: Density]
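How strongly the curve skews can be worked out analytically for a simple case. The sketch below assumes a two-sided z-test, which is an illustrative simplification and not necessarily the model behind the plotted curves; all function names are hypothetical. For each power level it reports the share of significant results that fall below p = .025.

```python
from statistics import NormalDist

NORM = NormalDist()


def prob_p_below(threshold, delta):
    """P(two-sided p < threshold) for a z-test with noncentrality delta."""
    crit = NORM.inv_cdf(1 - threshold / 2)
    return (1 - NORM.cdf(crit - delta)) + NORM.cdf(-crit - delta)


def delta_for_power(power, alpha=0.05):
    """Noncentrality that yields the requested power (> alpha), by bisection."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if prob_p_below(alpha, mid) < power:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def share_below_025(power):
    """Among significant results (p < .05), the share with p < .025."""
    delta = delta_for_power(power)
    return prob_p_below(0.025, delta) / prob_p_below(0.05, delta)


if __name__ == "__main__":
    for power in (0.10, 0.35, 0.80):
        print("power %.0f%%: share of significant p < .025 = %.2f"
              % (power * 100, share_below_025(power)))
```

With no effect the share is exactly one half; it grows with power, which is the skew the slides describe.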
Elderly priming p-values
[Figure: distribution of the elderly priming p-values against the curve expected at 60% power; x-axis: p value (0.00 to 0.20), y-axis: Density; bin labels k=5 and k=13]
• At 60% power, 49% of all p-values are expected to be < .025
• At 60% power, 11% of all p-values are expected to be between .025 and .05
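A rough way to quantify the mismatch is a binomial calculation. This sketch rests on an assumption that the extracted slide does not state outright: that k=5 counts the p-values below .025 and k=13 those between .025 and .05. It is a simplified check, not the full p-curve analysis available at p-curve.com.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


n_total = 18      # significant p-values (5 + 13)
n_below_025 = 5   # ASSUMPTION: k=5 is the bin below .025

# Flat p-curve (no true effect): half of the significant p-values are < .025.
p_if_null = binom_cdf(n_below_025, n_total, 0.5)

# 60% power: 49% of 60% of significant p-values are < .025 (slide values).
p_if_60_power = binom_cdf(n_below_025, n_total, 0.49 / 0.60)

if __name__ == "__main__":
    print("P(at most 5 of 18 below .025 | no effect):  %.4f" % p_if_null)
    print("P(at most 5 of 18 below .025 | 60%% power): %.2e" % p_if_60_power)
```

Under that assumption, so few small p-values would be unusual even for a null effect, and far more so for a set of adequately powered studies.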
http://p-curve.com/
Elderly priming p-values (k = 18): p = .043, .034, .046, .033, .017, .044, .043, .048, .039, …
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology; John et al.: Prevalence of QRPs; Doyen et al. (2012) ➙ “The Bargh rant”; Kahneman: Open Letter; Founding of the Center for Open Science (+ Open Science Framework); Simonsohn et al.: p-curve; ManyLabs 1 & Special Issue “Replication”
ManyLabs 1 & Special Issue “Replication”
Social Psychology: Replication Special Issue (Nosek & Lakens, 2014)
Bayesian reanalysis (Marsman, Schönbrodt, Morey, Wagenmakers, in prep.)
7/59 = 12% replicable
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology; John et al.: Prevalence of QRPs; Doyen et al. (2012) ➙ “The Bargh rant”; Kahneman: Open Letter; Founding of the Center for Open Science (+ Open Science Framework); Simonsohn et al.: p-curve; ManyLabs 1 & Special Issue “Replication”; Schnall-Debate; ManyLabs 3
ManyLabs 3
10 effects, 20 labs, n > 3400
ES: d = .09, p = .02
n for 95% power = 6708
Power in original study (n = 152): 8%
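The power figures can be approximated with a normal-approximation power calculation. The sketch assumes a two-group comparison with equal group sizes (so n = 152 is treated as 76 per group) and uses the rounded d = .09 from the slide; both are assumptions, so the output should land near, not exactly on, the 8% and 6708 shown. Function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

NORM = NormalDist()


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test for effect size d."""
    delta = d * (n_per_group / 2) ** 0.5
    crit = NORM.inv_cdf(1 - alpha / 2)
    return (1 - NORM.cdf(crit - delta)) + NORM.cdf(-crit - delta)


def n_per_group_for_power(d, power, alpha=0.05):
    """Approximate per-group n needed to detect d with the given power."""
    z_alpha = NORM.inv_cdf(1 - alpha / 2)
    z_power = NORM.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


if __name__ == "__main__":
    # ASSUMPTION: the original n = 152 is split into two groups of 76.
    print("power at n = 152: %.3f" % power_two_groups(0.09, 76))
    print("total n for 95%% power: %d" % (2 * n_per_group_for_power(0.09, 0.95)))
```

Because required n scales with 1/d², small rounding differences in d shift the required sample by several hundred participants.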
Timeline 2011–2015: Bem; Simmons et al.: False-positive psychology; John et al.: Prevalence of QRPs; Doyen et al. (2012) ➙ “The Bargh rant”; Kahneman: Open Letter; Founding of the Center for Open Science (+ Open Science Framework); Simonsohn et al.: p-curve; ManyLabs 1 & Special Issue “Replication”; Schnall-Debate; ManyLabs 3; Reproducibility Project: Psychology (RP:P)
Reproducibility Project: Psychology (RP:P)
https://osf.io/ezcuj/wiki/home/
97 replications
• 36% of all replications were significant
• PS - cog: 53%
• JEP:LMC: 48%
• PS - soc: 29%
• JPSP - soc: 23%
• 83% of all effect sizes are smaller than the original
Not my problem?
A look at other disciplines.
• 53 ‘landmark studies’, not randomly selected: fresh approaches targeted for future drug development
• “scientific findings were confirmed in only 6 (11%) cases. Even knowing the limitations of preclinical research, this was a shocking result.”
• Bayer Healthcare: 67 target-validation projects in oncology, women’s health, and cardiovascular medicine. Only 14 (21%) could be reproduced.
Begley, C. G., & Ellis, L. M. (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483, 531–533. doi:10.1038/483531a
Prinz, F., Schlange, T., & Asadullah, K. (2011). Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10, 712. doi:10.1038/nrd3439-c1
“Our results indicate that the average statistical power of studies in the field of neuroscience is probably no more than between ~8% and ~31%, on the basis of evidence from diverse subfields within neuroscience.”
What are the causes?
What are the solutions?
Why is reproducibility so low?
• Why? We are human. How? We err. We p-hack. We HARK. We use QRPs.
• Why? We are part of a system. How? Publication bias. Problematic incentive scheme.
Unintentional mistakes
The garden of forking paths
Questionable Research Practices (QRPs)
Fraud
Publication bias
• Reproducible analysis code and open data required at submission - “in-house checking” in the review process
• 54% of all submissions had results in the paper that did not match the computed results from the code
• Wrong signs, wrong labeling of regression coefficients, errors in sample sizes, wrong descriptive stats
http://thepoliticalmethodologist.com/2014/12/09/a-decade-of-replications-lessons-from-the-quarterly-journal-of-political-science/
Unintentional mistakes
Solution: Open Data, Open Scripts
The garden of forking paths
Questionable Research Practices (QRPs)
Fraud
Publication bias
The garden of forking paths
Data
Andrew Gelman & Eric Loken, 2013
Inspired by Neuroskeptic’s blog: http://blogs.discovermagazine.com/neuroskeptic/2015/05/18/p-hacking-a-talk-and-further-thoughts/#.VV2TiOePKsN
The garden of forking p-hacks
[Figure: analysis paths branching from “Data”, labeled P=0.82, P=0.04, P=0.34, P=0.17, P=0.66, P=0.82, P=0.34, P=0.07, P=0.24]
Andrew Gelman & Eric Loken, 2013
Inspired by Neuroskeptic’s blog: http://blogs.discovermagazine.com/neuroskeptic/2015/05/18/p-hacking-a-talk-and-further-thoughts/#.VV2TiOePKsN
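A back-of-the-envelope calculation shows why many analysis paths make a significant result likely. The sketch assumes the paths are statistically independent, which is a simplification: in practice they share the same data, so the true rate is lower but still well above 5%. The function name is hypothetical.

```python
def chance_of_false_positive(n_paths, alpha=0.05):
    """P(at least one p < alpha) across n independent tests of a null effect."""
    return 1 - (1 - alpha) ** n_paths


if __name__ == "__main__":
    # 9 matches the number of paths shown in the figure.
    for n_paths in (1, 3, 9, 20):
        print("%2d paths: %.2f" % (n_paths, chance_of_false_positive(n_paths)))
```

With nine independent paths, as in the figure, the chance of at least one p < .05 under a null effect is already more than one in three.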
Let’s do this together
http://shinyapps.org/apps/p-hacker/
Inspired by Neuroskeptic’s blog: http://blogs.discovermagazine.com/neuroskeptic/2015/05/18/p-hacking-a-talk-and-further-thoughts/#.VV2TiOePKsN
Solution: Preregistration
“The first principle is that you must not fool yourself and you are the easiest person to fool.” - Richard P. Feynman
What is a preregistration?
It’s the introduction and methods section of your future paper.
What should be included in a preregistration?
• Hypotheses
• Predictions
• Models
• Dependent variables
• ROIs
• Confounds
• Exclusion criteria
• Feature definition (“functional connectivity defined as…”)
• Analysis plan
• Statistical techniques (algorithms)
• Multiple comparison correction
• Parameters
http://dx.doi.org/10.1371/journal.pone.0132382
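One way to use this checklist is to treat it as a template that flags what is still missing before data collection. The class below is a hypothetical sketch, not an official OSF or journal format; its field names simply mirror the items on the slide, and the example entry is invented for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Preregistration:
    """Hypothetical checklist; one field per item listed on the slide."""
    hypotheses: list = field(default_factory=list)
    predictions: list = field(default_factory=list)
    models: list = field(default_factory=list)
    dependent_variables: list = field(default_factory=list)
    rois: list = field(default_factory=list)
    confounds: list = field(default_factory=list)
    exclusion_criteria: list = field(default_factory=list)
    feature_definitions: list = field(default_factory=list)
    analysis_plan: str = ""
    statistical_techniques: list = field(default_factory=list)
    multiple_comparison_correction: str = ""
    parameters: dict = field(default_factory=dict)

    def missing_sections(self):
        """Names of all sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Illustrative draft with a single section filled in.
draft = Preregistration(hypotheses=["Condition A yields higher scores than B"])

if __name__ == "__main__":
    print("Still to specify:", ", ".join(draft.missing_sections()))
```

A draft would be ready to register only once missing_sections() returns an empty list.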
Unintentional mistakes
Solution: Open Data, Open Scripts
The garden of forking paths
Solution: Open Data, Preregistration
Questionable Research Practices (QRPs)
Fraud
Publication bias
QRP
Unintentional mistakes
Solution: Open Data, Reproducible Scripts
The garden of forking paths
Solution: Open Data, Preregistration
Questionable Research Practices (QRPs)
Solution: Preregistration
Fraud
Publication bias
Psychology/Psychiatry
92%!
34%?
21%?
Fanelli, D. (2011). Negative results are disappearing from most disciplines and countries. Scientometrics, 90(3), 891–904. doi:10.1007/s11192-011-0494-7
Reviewed Pre-Registration
https://www.elsevier.com/editors-update/story/peer-review/cortexs-registered-reports
Reviewed Pre-Registration
Advances in Methodologies and Practices in Psychological Science
AIMS Neuroscience
Animal Behavior and Cognition
Attention, Perception, and Psychophysics
Behavioral Neuroscience
Cognition and Emotion
Cognitive Research: Principles and Implications
Comprehensive Results in Social Psychology
Cortex
Drug and Alcohol Dependence
European Journal of Neuroscience
Experimental Psychology
Health Psychology Bulletin
Human Movement Science
Infancy
International Journal of Psychophysiology
Journal of Business and Psychology
Journal of Cognitive Enhancement
Journal of European Psychology Students
Journal of Experimental Political Science
Journal of Media Psychology
Journal of Personnel Psychology
Journal of Research in Personality
Judgment and Decision Making
Management and Organization Review
Memory
Nature Human Behaviour
NFS Journal
Nicotine & Tobacco Research
Perspectives on Psychological Science
Royal Society Open Science
Stress and Health
The Leadership Quarterly
Work, Aging and Retirement
Unintentional mistakes
Solution: Open Data, Reproducible Scripts
The garden of forking paths
Solution: Open Data, Preregistration
Questionable Research Practices (QRPs)
Solution: Preregistration
Fraud
Publication bias
Solution: Preregistration, Registered Reports
How can we improve research?
Summary
Our current system’s incentives foster questionable research practices, which decrease the truth value of our shared knowledge. To make science great again we need to adopt new approaches:
Personal level
• Open data
• Open scripts
• Open materials
• Preregistration
• Transparency
• Better statistics
• Peer review openness (make others participate)
System level
• How we appraise and hire people.
• Expect fewer papers but better ones.
• What journals do we support?
• Open Access!