Sharing your research data:
good scientific practice,
but is it for me?
Hot Topic Lecture series
University of Ghent
15 December 2014
Science advances through data sharing
Society benefits from data sharing
Sharing research data
• Collective vs. individual … but … it’s my data!
Motivations for sharing research data
Motivations for researchers to share their data:
• science directly drives the need for data sharing
• data sharing increases the visibility of the research(er)
• cultural norms in a research group, community, discipline
• framework of policies, infrastructure and data services as
external drivers
Need:
• policies and agreements – create level playing field
• training – sharing to become standard research practice
(Van den Eynden & Bishop, 2014. http://t.co/K6P006cROH)
Researchers’ views (cont.)
Different modes of data sharing:
• private management sharing
• collaborative sharing
• peer exchange
• sharing for transparent governance
• community sharing
• public sharing
(5 case studies of active data sharing in 5 European countries:
arts and humanities, social sciences, biomedicine,
chemistry and biology)
(Van den Eynden & Bishop, 2014. http://t.co/K6P006cROH)
Case studies
Push factors: research funders
European open access policies: Horizon 2020, European Research
Council (ERC)
• communication & recommendation on access to / preservation
of scientific information (July 2012) (publications & research
data)
• pilot on open access to research data, primarily data underlying
(open access) scientific publications for Horizon 2020
• data management guidelines for Horizon 2020 (~ policies)
generally based on OECD Principles and Guidelines for Access to
Research Data from Public Funding
Push factors: journals & publishers
• data underpinning publication accessible
• upon request from author
• as supplement with publication
• in public repository
Examples:
• BioMed Central open data statement
• BMJ: “From January 2013, trials of drugs and medical devices will be
considered for publication only if the authors commit to making the
relevant anonymised patient level data available on reasonable request.”
• PLOS ONE: “Publication is conditional upon the agreement of the
authors to make freely available any materials and information described
in their publication that may be reasonably requested by others.” …
“will not consider a study if the conclusions depend solely on the analysis
of proprietary data” … “the paper must include an analysis of public data
that validates the conclusions so others can reproduce the analysis and
build on the findings.”
Push factors: political
G8 science ministers statement:
open scientific research data that are easily discoverable,
accessible, assessable, intelligible, useable, and wherever possible
interoperable to specific quality standards (G8 UK, 2013).
High Level Expert Group on Scientific Data:
need for a European scientific e-infrastructure that supports
seamless access, use, reuse, and trust in data in order for open
infrastructure, open culture and open content to go hand-in-hand
(European Commission, 2010)
Push factors: science community
Science as an Open Enterprise:
publishing data in a reusable form to support research
findings should be mandatory; stronger: not sharing
research data should be considered bad science (The
Royal Society, 2012)
Push factors: research integrity
(De Standaard, 23 April 2013, p. 1)
Happenings in “dataland”
• How can you share research data?
• What infrastructure, services, tools, etc. are available?
More and more data repositories (examples)
Domain repositories
• Marien Data Archief (VLIZ)
• GenBank
• Publishing Network for Geoscientific and Environmental
Data (PANGAEA)
• The Arabidopsis Information Resource (TAIR)
• Dataverse
Generic repositories
• Dryad
• figshare
• Zenodo
Find a repository: www.re3data.org
Example: UK data centres
• Archaeology Data Service
• Biomedical Informatics Research Network Data Repository
• British Atmospheric Data Centre
• British Library National Sound Archive
• British Oceanographic Data Centre
• Cambridge Crystallographic Data Centre
• ChemSpider
• ChemSpider Synthetic Pages
• eCrystals
• Endangered Language Archive
• Environmental Information Data Centre
• Ethno-ornithology World Archive
• National Biodiversity Network
• National Geoscience Data Centre
• NERC Earth Observation Data Centre
• NERC Environmental Bioinformatics Centre
• Polar Data Centre
• The Oxford Text Archive
• UK Data Archive
• UK Solar System Data Centre
• Visual Arts Data Service
Institutional data repositories - platforms
www.eprints.org/
• Widely used as an IR solution already
• Active community of users and developers
ckan.org/
• Great for open, active data
• Includes visualisation features
www.dspace.org/
• Another widely supported IR, now doing data
projecthydra.org/
• Very customisable top layer
• Fedora repository underneath
Data journals
• Data journals are a fairly new but growing phenomenon
• They publish a detailed, journal-style article describing the data and how it
was collected
• They recommend or provide a place of deposit
• e.g. Nature Scientific Data
Keep an overview of dispersed data
resources
• Research Data Australia: register of projects, publications, data,
people for all research in Australia
• UK: central registry harvesting metadata from all national data
centres and institutional repositories in development
Citing data
• Citation a fundamental part of research and academia in general
• Just as articles are cited, data which has contributed to research
should be cited
• Data citation:
• Acknowledges the author’s sources fairly
• Promotes reproduction of research findings
• Makes it easier for others who are interested to find the data
• Allows the impact of data to be tracked
• Provides a structure that recognises and rewards data creators
FORCE11 Joint Declaration of Data Citation Principles
Data citation
• Data DOIs (persistent identifiers)
• Thomson Reuters Data Citation Index
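As a minimal sketch of what a data citation built around a DOI can look like, the Python snippet below assembles a citation string from a few metadata fields. The field names, the `format_data_citation` helper and the example DOI (`10.1234/example`, a placeholder that does not point to a real dataset) are assumptions made for illustration. The layout is one common pattern, not a prescribed standard; a repository or journal may ask for a different one.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Minimal metadata for a data citation (hypothetical field names)."""
    creators: list[str]
    year: int
    title: str
    repository: str
    doi: str


def doi_url(doi: str) -> str:
    """Turn a bare DOI into a link that goes through the doi.org resolver."""
    return "https://doi.org/" + doi.strip()


def format_data_citation(ds: Dataset) -> str:
    """Build 'Creators (Year): Title. Repository. DOI link'."""
    creators = "; ".join(ds.creators)
    return f"{creators} ({ds.year}): {ds.title}. {ds.repository}. {doi_url(ds.doi)}"


# Placeholder values only: this DOI does not identify a real dataset.
example = Dataset(
    creators=["Researcher, A.", "Colleague, B."],
    year=2014,
    title="Example survey data",
    repository="Example Repository",
    doi="10.1234/example",
)
print(format_data_citation(example))
```

Because the DOI is persistent, the link in the citation should keep working even if the repository later reorganises its web pages.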
Link together your research
Source: ORCID: Connecting Research and Researchers,
Biblioteca del Campus Terrassa on Jul 11, 2013
Dynamic data citation and microcitation
• Citing parts (fragments) of data collections
• single files
• subsets of quantitative data
• extracts of textual data
• E.g. UKDS QualiBank enables extract-level citation
• Citation has rich, highly structured XML metadata
• GUIDs identify subsets in the citation database
• Human-readable reference points to the ‘mother’ DOI
UK Quali Bank
Linking publications and data
Options for sharing research data with
personal / confidential data
• Obtain informed consent, also for data sharing and
preservation / curation
• Protect identities e.g. anonymisation, not collecting personal
data
• Regulate access where needed (all or part of data) e.g. by
group, use, time period
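One way to protect identities before sharing can be sketched in Python as follows: direct identifiers are dropped and the participant ID is replaced by a keyed hash. The column names (`name`, `email`, `participant_id`) and both helper functions are hypothetical. Note that this is pseudonymisation only. Real anonymisation also has to deal with indirect identifiers such as rare occupations or small geographic areas, so treat it as a starting point under those assumptions.

```python
import hashlib
import hmac

# Hypothetical names of columns that identify a person directly.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonym(value: str, secret_key: bytes) -> str:
    """Replace an identifier by a keyed hash (HMAC-SHA256, shortened)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def protect_record(record: dict, secret_key: bytes,
                   id_field: str = "participant_id") -> dict:
    """Drop direct identifiers and pseudonymise the participant ID."""
    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    shared[id_field] = pseudonym(str(record[id_field]), secret_key)
    return shared


# Made-up record for illustration.
record = {
    "participant_id": "P001",
    "name": "Jane Example",
    "email": "jane@example.org",
    "age_group": "30-39",
}
print(protect_record(record, secret_key=b"keep-this-key-private"))
```

The key must stay with the research team: anyone who holds it can recompute the pseudonyms from known IDs.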
Managing data access
• UK Data Service: web access to data and metadata
• Data freely available for use; commercial use charges
• Metadata / documentation always open
• Data available under 3 access levels:
OPEN
SAFEGUARDED – End User Licence
(e.g. agree not to identify any potentially identifiable individuals)
• Special agreements: depositor permission; approved
researcher
• Embargo for fixed time period
CONTROLLED – only for accredited users
• Access via on-site or virtual secure environment (secure lab)
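The three access levels above can be read as a simple decision rule. The Python sketch below is a deliberately simplified illustration: the `AccessLevel` enum, the `may_access` function and its parameters are hypothetical, and special agreements and embargo periods are left out. It is not how the UK Data Service implements access control.

```python
from enum import Enum


class AccessLevel(Enum):
    OPEN = "open"
    SAFEGUARDED = "safeguarded"
    CONTROLLED = "controlled"


def may_access(level: AccessLevel, *, accepted_licence: bool = False,
               accredited: bool = False,
               in_secure_environment: bool = False) -> bool:
    """Simplified check of whether a user may access a dataset."""
    if level is AccessLevel.OPEN:
        # Open data: no conditions.
        return True
    if level is AccessLevel.SAFEGUARDED:
        # Safeguarded data: the user has accepted the End User Licence.
        return accepted_licence
    # Controlled data: accredited users only, inside a secure environment.
    return accredited and in_secure_environment


print(may_access(AccessLevel.OPEN))
print(may_access(AccessLevel.SAFEGUARDED, accepted_licence=True))
print(may_access(AccessLevel.CONTROLLED, accredited=True))
```

In this sketch the last call is refused because an accredited user outside the secure environment does not meet both conditions for controlled data.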
Example: managed access ReShare
Example: managed access ICRAF dataverse
References
• European Commission (2010) Riding the wave: How Europe can gain from the
rising tide of scientific data. Final report of the High Level Expert Group on
Scientific Data. European Commission.
• G8 UK (2013) G8 Science Ministers Statement, 12 June 2013.
• The Royal Society (2012) Science as an Open Enterprise. The Royal Society
Science Policy Centre Report 02/12.
• Van den Eynden, V. & Bishop, L. (2014) Incentives and motivations for sharing
research data: researchers’ perspectives.
Questions
Contact details
[email protected]