
THE IMPORTANCE OF DATA ACCESS AND RESEARCH TRANSPARENCY
Vera E. Troeger
Department of Economics and CAGE, University of Warwick
Editor in Chief, Political Science Research and Methods (PSRM), the Journal of the EPSA
DART in Switzerland
November 7, 2014
TRUST IS GOOD, CONTROL IS BETTER?
TRUST IS GOOD, CONTROL IS BETTER!
TRUST IS GOOD, SELF-CONTROL IS BETTER!
Academic fraud has developed into an endemic disease. Disciplines working with
experimental data (especially medicine and psychology) face the question of which
results can be trusted.
Why Fraud?
Like doping in sports – it allows researchers to reach the goal (publications, citations,
tenure, promotion…) faster
Low probability of detection → high incentives for cheating
Costs are borne by honest academics both personally (competition) and as a profession
(reputation)
An Anecdote:
The FAZ published an article on Wednesday (5 November) titled "Der Doktortitel
gehört abgeschafft" – "the doctoral title should be abolished"
The argument was that this would put an end to people trying to cheat their way to a PhD
and the doctoral title.
Especially in Germany, this debate is a big part of the new trend towards "DART" –
transparency, academic honesty and access. Starting with "VroniPlag", many PhD
dissertations (especially those of prominent politicians) have been proven to be
plagiarised.
Academic fraud and misconduct are thus real issues, and not only in plagiarised
dissertations. But proposing to eliminate the very proof of academic qualification (the
PhD dissertation) seems equivalent to eliminating speed limits because drivers violate
them all the time…
THUS: Control is Better!
Academic Fraud - Some Facts: The Tip of the Iceberg
[Figure: Number of withdrawn articles by publication year]
Source: Web of Science.
Note: The decrease after 2007 is not due to a reduction in cases but to the time it takes
from publication to detection of fraud.
Withdrawn articles by discipline, 1985-2013
Discipline                            Number   Cases per 1,000 articles

Most affected disciplines:
Molecular Biology                     293      0.039
Cell Biology                          191      0.062
Chemistry                             187      0.047
Oncology                              145      0.056
Physics                               143      0.040

Selected disciplines for comparison:
Psychology                            81       0.030
Ecology                               66       0.051
Economics                             34       0.028
Sociology                             10       0.011
Political Science                     3        0.005
Source: Web of Science
Withdrawn Articles by Research Location, 1985-2013
Country            Number   Cases per 1,000 articles
India              183      0.177
China              488      0.170
Singapore          24       0.115
Japan              181      0.086
The Netherlands    90       0.084
Germany            211      0.073
USA                799      0.057
Italy              85       0.049
UK                 133      0.047
Switzerland        32       0.046
Canada             75       0.037
France             66       0.029
Source: Web of Science
A Spectacular Case:
Jan Hendrik Schoen, a physicist who received his PhD from the University of Konstanz
(which he lost after the fraud became public), published 45 articles in 2001, 17 of them
in Science and Nature alone, the flagship journals of the natural and life sciences.
Meanwhile, 16 of his articles have been withdrawn due to proven fraud.
The media have picked up on spectacular cases of academic fraud – but these are
pathological rather than representative.
Spectacular cases reveal the weaknesses of academia, but they obscure the far more
common low-level academic misconduct. Typically, researchers do not invent results
or data but tweak results in order to confirm theories and hypotheses:
- Selective choice of cases
- Filling in missing values at will
- Strategic choice of estimation procedures and model specifications
These more subtle forms of academic fraud are much more prevalent.
They are also much harder to detect – not least by the typical peer review process.
Why is control important?
Research produces results whose value hinges on our credibility and reputation.
We need to maintain this credibility and reputation by implementing self-control
mechanisms that prevent academic fraud and misconduct.
DART is such an initiative
We cannot leave it to the (criminal) justice system, since the fraud of a few produces
negative externalities for the whole profession.
Detection of Fraud?
It seems almost impossible to detect this kind of subtle fraud through the typical peer
review process, which is supposedly the main instrument of quality assurance in the
academic profession.
In most cases, authors don't have to provide their data to the reviewers (for good reason?
– original data, sensitive data, personalised data).
The peer review process only evaluates the plausibility of results; it assumes honesty.
SOLUTIONS...
- Plagiarism
- Registration (Ulysses: researchers have to bind themselves "to the mast")
- Publication
- Replication
- Robustness
Solutions have to increase the perceived probability of detection for the individual researcher.
PLAGIARISM
Publishers can easily integrate plagiarism-detection software into their online submission
systems to screen articles and books for copying of existing work without proper citation.
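As a rough illustration of how such screening works (a toy sketch, not any vendor's actual method – the n-gram length and threshold below are arbitrary assumptions):

```python
# Toy n-gram-overlap screen. Real plagiarism detectors use far more
# sophisticated matching; this only illustrates the basic idea.

def ngrams(text: str, n: int = 5) -> set:
    """Set of word n-grams occurring in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: str, b: str, n: int = 5) -> float:
    """Share of n-grams two texts have in common (Jaccard index)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga and gb else 0.0

def flag(manuscript: str, corpus: list, threshold: float = 0.15) -> list:
    """Indices of corpus documents suspiciously similar to the manuscript."""
    return [i for i, doc in enumerate(corpus)
            if jaccard(manuscript, doc) > threshold]
```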
REGISTRATION
(Disciplines which are less affected by spectacular fraud seem to be leading…)
Political Science: egap – 80 experiments
Economics: RCT Registry (American Economic Association) – 240 experiments
Exponentially increasing registration
Platforms for pre-registration of experimental designs – registered experiments cannot be
changed ex post to adapt the design to the results
Problem: registration doesn't work if researchers regard the experimentally generated
data as private property that doesn't have to be published or made available to reviewers
– researchers can remove cases that do not fit…
CPS – special issue with pre-registered analyses; manuscripts are reviewed without
empirical results → review process detached from results, avoiding publication bias
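Why can't registered designs be changed ex post? Because a registry only needs to record an immutable fingerprint of the pre-analysis plan. A minimal sketch of the idea (hypothetical – not how egap or the AEA registry is actually implemented):

```python
import hashlib
from datetime import datetime, timezone

def register_plan(plan_text: str) -> dict:
    """Store a cryptographic fingerprint of a pre-analysis plan.

    Any later edit to the plan changes the hash, so ex-post design
    changes are detectable against the registry entry.
    """
    return {
        "sha256": hashlib.sha256(plan_text.encode("utf-8")).hexdigest(),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }

def verify_plan(plan_text: str, entry: dict) -> bool:
    """Check a submitted plan against its registered fingerprint."""
    return hashlib.sha256(plan_text.encode("utf-8")).hexdigest() == entry["sha256"]
```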
PUBLICATION
Make all data publicly available
Problem: original data, confidential data, personalized data
Improve data citation – data are intellectual products for which citation should be
required (Mooney 2011)

Increase incentives for scholars to publish data (citations count!)
Publication of replication files (datasets and do-files) → necessary but not sufficient
REPLICATION
Strengthening of review process with ACTUAL replication of empirical results
Journals are key!
Journals need to publish null findings and replication studies
PSRM (sorry, blatant self-advertisement…): successful replication by a data analyst is
necessary for publication of a manuscript.
- 90% of replication files do not reproduce the output in the submitted manuscript
- 10% show serious problems
- 5% have to be withdrawn because the empirical results cannot be replicated
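What such a replication check automates, in the simplest case, is re-estimating the model and comparing the coefficients to the published table. A minimal sketch (hypothetical workflow, file and variable names are illustrative, not PSRM's actual pipeline):

```python
# Hypothetical replication check: re-run an OLS model from the
# replication files and verify the published coefficients.
# File name, variable names, and tolerance are all illustrative.
import pandas as pd
import statsmodels.api as sm

def replicates(data_path: str, outcome: str, regressors: list,
               published: dict, tol: float = 1e-3) -> bool:
    df = pd.read_csv(data_path)
    X = sm.add_constant(df[regressors])
    fit = sm.OLS(df[outcome], X).fit()
    # Every published coefficient must be reproduced within tolerance.
    return all(abs(fit.params[name] - value) < tol
               for name, value in published.items())
```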
IS REPLICATION ENOUGH?
Probably not
The Excel-spreadsheet mistakes of Reinhart and Rogoff, as well as the problem of how to
treat missing values in the Piketty case (raised by the FT), show that simple replication
of results will remain insufficient to prevent fraud…
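The spreadsheet case shows why: re-running the authors' own flawed file reproduces the same flawed number every time. A stylized illustration with invented figures (not the actual Reinhart-Rogoff data):

```python
# Invented numbers, not the actual Reinhart-Rogoff data. A formula
# that accidentally omits the last rows reproduces identically on
# every re-run, so simple replication never flags the mistake.
growth = [3.1, 2.4, -0.3, 1.8, 2.9, 0.7, 1.2]

truncated_mean = sum(growth[:5]) / 5       # range stops two rows early
correct_mean = sum(growth) / len(growth)   # full sample

print(f"truncated: {truncated_mean:.2f}, correct: {correct_mean:.2f}")
```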
ROBUSTNESS
Robustness checks can close part of the gap – they have become increasingly standard in
the social sciences over the last decade
Problem: feasibility – who should check robustness of results and at what stage of the
process?
Robustness checks do not just replicate empirical results; they take into account that
researchers have to make many decisions about estimation and specification.
Many published studies, however, read as if the presented empirical specification were
the only plausible one.
Robustness checks assume that alternative specifications are no less plausible and test
whether results and conclusions hold under alternative assumptions.
At the moment, authors decide which robustness tests to include…
The future: publishers and editors should adopt joint policies and agree on which
robustness checks are required!
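What could a mandated robustness check look like? A minimal sketch, assuming a dataset with an outcome, a variable of interest, and a pool of plausible controls (all names illustrative): estimate every combination of control variables and report how the coefficient of interest varies.

```python
# Illustrative specification sweep: re-estimate the coefficient on a
# variable of interest under every combination of plausible controls.
# Variable names and the OLS setup are hypothetical.
from itertools import combinations
import pandas as pd
import statsmodels.api as sm

def specification_sweep(df: pd.DataFrame, outcome: str, treatment: str,
                        controls: list) -> list:
    results = []
    for k in range(len(controls) + 1):
        for subset in combinations(controls, k):
            X = sm.add_constant(df[[treatment, *subset]])
            fit = sm.OLS(df[outcome], X).fit()
            results.append((subset, fit.params[treatment],
                            fit.pvalues[treatment]))
    return results
```

If the sign or significance of the coefficient of interest flips across specifications, the headline conclusion is fragile.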
CONTROL
Scientific progress is important
Systematic control to avoid potential fraud is necessary
The scientific community, publishers and journals need to provide the resources for an
infrastructure that increases the probability of detecting academic fraud – much more so
than is the case at present.
DART is an important step in the right direction
(Some of this information was kindly provided by Prof. Thomas Pluemper, Essex – please also consult his FAZ article "Vertrauen ist gut,
Kontrolle ist besser" ["trust is good, control is better"] from 20 August 2014.)