Data archiving in evolutionary biology
Michael Whitlock
Why publicly archive data?
Error checking Meta-analysis New uses
Increase citations
Error checking
More than half of published papers contain statistical errors.
5-10% of papers contain errors that change the conclusions.
Gore et al. 1977, Kanter and Taylor 1994, McGuigan 1995, Hurlbert and White 1993
New uses
Bumpus' (1898) data has been used numerous times, in ways he never imagined.
Increase citations
"Publicly available data was significantly (p = 0.006)
associated with a 69% increase in citations."
-Piwowar et al. (2007) PLOS One
Teaching and learning
Data security and backups
Data sharing/archiving in ecology and evolution: Previous policies
• Some types of archiving required
– (e.g. DNA sequences in GenBank, phylogenies in TreeBASE)
• Data sharing already required by many journals and by most major funding agencies
Most data is lost to science very quickly.
... through loss of files, loss of researchers, loss of context, etc.
Most evolutionary biologists want data archiving
• 95% of scientists in evolution and ecology think that data should be publicly archived (S. Carrier, J. Greenberg, H. Lapp, R. Scherle, A. Thompson, T. Vision, and H. White, unpublished manuscript)
• 78% of editorial board members voted for mandatory archiving (11% against)
• Supported by executive councils of each society
Evolutionary biology journals adopting data archiving policies
• The American Naturalist
• Evolution
• Journal of Evolutionary Biology
• Molecular Ecology
• Evolutionary Applications
• Genetics
• Heredity
• Molecular Biology and Evolution
• Systematic Biology
• Paleobiology
• BMC Evolutionary Biology
Joint data archiving policy
"This journal requires, as a condition for publication, that
data supporting the results in the paper should be
archived in an appropriate public archive, such as
GenBank, TreeBASE, Dryad, or the Knowledge Network
for Biocomplexity. Data are important products of the
scientific enterprise, and they should be preserved and
usable for decades in the future. Authors may elect to
have the data publicly available at time of publication, or,
if the technology of the archive allows, may opt to
embargo access to the data for a period up to a year after
publication. Exceptions may be granted at the discretion
of the editor, especially for sensitive information such as
human subject data or the location of endangered
species."
Buffering authors' IP concerns
• Embargoes allow time for further use.
• Archiving only required for data used in paper.
• Archived data should be cited fairly: encourage citation of original paper, not accession numbers.
• Required by funding bodies in any case
What to archive?
• Raw data at individual level
– Sufficient to re-create results in paper
– Not necessarily the whole dataset from the project
• .readme file that explains any missing details: header names, units, etc.
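One way to make the readme step routine is to generate a column listing from the data file itself, so that no header goes unexplained. The sketch below is only an illustration under stated assumptions: the function name `build_readme`, the output layout, and the example sparrow data are all hypothetical and are not prescribed by the joint policy or by any archive.

```python
import csv
import io


def build_readme(csv_text, column_info, title="Data file"):
    """Return readme text describing each column of a CSV data file.

    column_info maps header name -> (description, units), supplied by
    the author. Headers missing from column_info are flagged so that
    no column is archived undocumented. (Hypothetical helper.)
    """
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)                      # first row holds header names
    n_rows = sum(1 for row in reader if row)    # count non-empty data rows
    lines = [title, f"Rows (individuals): {n_rows}", "Columns:"]
    for name in headers:
        if name in column_info:
            description, units = column_info[name]
            lines.append(f"  {name}: {description} [units: {units}]")
        else:
            lines.append(f"  {name}: UNDOCUMENTED")
    return "\n".join(lines)


# Hypothetical individual-level data, invented for illustration only.
example_csv = "id,sex,wing_mm\n1,F,61.5\n2,M,63.0\n"
example_info = {
    "id": ("individual identifier", "none"),
    "wing_mm": ("wing length", "mm"),
}
readme_text = build_readme(example_csv, example_info, title="sparrows.csv")
print(readme_text)
```

Here the `sex` column is reported as UNDOCUMENTED, which is the prompt to add a description before the files are deposited.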
Data archiving:
Preserving our legacy