What is a Data Management Plan?

- and why write one? -
Myriam Mertens | Ghent University Library
Data management plan (DMP)
• document outlining how data will be handled during and after a project
• increasingly required by research funders/institutions
• good practice even if not required, because…
2
First step towards good research data
management
“[RDM] ensure[s] that data are of a
high quality, are well organized,
documented, preserved,
sustainable, accessible and
reusable.” (Corti et al. 2014)
3
‘Research data’ can mean a lot…
Any information collected/created for the purposes of analysis to generate scientific claims.
For example:
• content: numerical, textual, audiovisual,
multimedia... data
• data format/object:
spreadsheets/tabular data, field notes,
databases, images, audio recordings,
marked up texts, surveys, instrument
readings…
• mode of data collection: experimental,
observational, simulation,
derived/compiled… data
• digital or non-digital data
• primary or secondary data
• raw, processed or analyzed data
4
Problem 1: navigating & using data is a
challenge
5
Problem 2: digital data are inherently fragile
6
Problem 3: research data are undervalued &
neglected
Vines et al. 2013
7
Problem 3: research data are undervalued &
neglected
Poor verifiability of science
The Economist, 19 Oct 2013
Missed opportunities for data reuse
Yozwiak et al. 2015
8
The solution: RDM
“[…] the compilation of many small
practices that make your data easier to
understand, less likely to be lost, and
more likely to be useable during a
project or ten years later.” (Briney 2015)
9
RDM is part of good research practice
Expectations include:
• secure preservation for a
reasonable period
• access: as open as possible, as
closed as necessary
• FAIR principles (Findable,
Accessible, Interoperable &
Reusable data)
• data = legitimate & citable
products of research
10
Traditional view of research process
Project planning → Data collection → Data processing & analysis → Publication of findings
Adapted from Briney 2015
11
Research data lifecycle
[Cycle diagram: stages in order, each with example activities]
• Project & RDM planning: check data policies, write a DMP, plan consent for sharing…
• Data collection: capture, organize, document, store & back up data…
• Data processing & analysis: anonymize, document, store & back up data…
• Publication of findings & data sharing: release/publish, register, license data, regulate access…
• Data preservation: select data to keep, prepare data for preservation, archive data…
• Data discovery & reuse: search for data, use data for further research/teaching, cite data…
Adapted from Briney 2015 & Corti et al. 2014
12
DMPs – a mere administrative burden?
- takes time and effort upfront,
but…
+ saves time and problems later on
+ helps consider whole range of
RDM activities/issues
+ makes expectations, tasks,
procedures explicit
+ leads to more informed
decisions about data
+ helps identify resources required
(& obtain funding)
13
Common DMP themes
1. description of data to be collected/created (e.g. content, type, format, volume…)
2. methodologies, standards for collecting/creating data & data documentation
3. ethics & intellectual property issues (e.g. informed consent, anonymisation, restrictions on sharing such as confidentiality, copyright or embargoes, usage licenses…)
4. plans for data sharing & access (e.g. how, when, with whom…)
5. strategy for long-term preservation (e.g. what to keep, for how long, where…)
“[…] plans typically state what data will be created and how, and outline the plans for sharing and preservation, noting what is appropriate given the nature of the data and any restrictions that may need to be applied.” (DCC website)
Also see: http://www.dcc.ac.uk/resources/how-guides/develop-data-plan
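The five themes above can be treated as a checklist while drafting. The following is a minimal sketch, not a template prescribed by any funder: the class name, the field names and the sample text are all hypothetical and only mirror the themes listed on this slide.

```python
from dataclasses import dataclass, fields


@dataclass
class DataManagementPlan:
    """One free-text field per common DMP theme (field names are illustrative)."""
    data_description: str = ""
    methods_and_documentation: str = ""
    ethics_and_ip: str = ""
    sharing_and_access: str = ""
    long_term_preservation: str = ""


def missing_sections(plan):
    """Return the names of the themes that are still empty in the draft."""
    return [f.name for f in fields(plan) if not getattr(plan, f.name).strip()]


# A partially written draft with placeholder text.
draft = DataManagementPlan(
    data_description="Interview audio (.flac), images (.tiff), field notes.",
    sharing_and_access="Open access via a repository upon publication.",
)
print(missing_sections(draft))
```

Running the check on the draft lists the three themes that still need text, which fits the idea of the DMP as a document that is filled in and revised over time.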
14
Example – data description
“This project will produce qualitative
observational data from interviews and fieldwork
conducted at various locations across Finland
between January and June 2017. Raw data will
comprise digital audio recordings of interviews
(stored in .flac format), digital images (.tiff format)
and hand-written field notes. Audio files will be
transcribed into digital text documents (.xml files)
and notes will be digitized (via manual
transcription to .txt files) to prepare them for
analysis.”
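A data description like the one above states formats and, ideally, volume. If some pilot data already exist, a short script can produce those figures. This is a sketch under assumptions: the function name `summarize_formats` is hypothetical, and it simply groups files by extension, which is only a rough proxy for the actual format.

```python
from collections import Counter
from pathlib import Path


def summarize_formats(data_dir):
    """Return {extension: (file_count, total_bytes)} for all files under data_dir."""
    counts, volumes = Counter(), Counter()
    for path in Path(data_dir).rglob("*"):
        if path.is_file():
            # Normalize case so ".FLAC" and ".flac" are counted together.
            ext = path.suffix.lower() or "(no extension)"
            counts[ext] += 1
            volumes[ext] += path.stat().st_size
    return {ext: (counts[ext], volumes[ext]) for ext in sorted(counts)}
```

Calling `summarize_formats("path/to/data")` on a project folder gives a per-format file count and size in bytes that can be copied into the data description section.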
15
Example – data documentation, metadata
“Descriptive metadata of data items will
be captured in XML files in accordance
with the Darwin Core schema, which is an
international metadata standard for
biodiversity data. In addition, datasets will
be accompanied by a separate readme.txt
file providing study-level documentation
including the field methods used for data
collection.”
More on metadata standards: http://rd-alliance.github.io/metadata-directory
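To make the idea of item-level metadata in XML more concrete, here is a minimal sketch that writes a few Darwin Core terms for one data item. Assumptions: the `record` wrapper element and the function name `build_record` are hypothetical, the values are placeholders, and the output is not validated against any official Darwin Core XML schema.

```python
import xml.etree.ElementTree as ET

DWC_NS = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace
ET.register_namespace("dwc", DWC_NS)


def build_record(terms):
    """Build an XML element with one namespaced child per Darwin Core term."""
    record = ET.Element("record")  # hypothetical wrapper, not part of the standard
    for name, value in terms.items():
        child = ET.SubElement(record, f"{{{DWC_NS}}}{name}")
        child.text = value
    return record


# Placeholder values for illustration only.
record = build_record({
    "scientificName": "Parus major",
    "eventDate": "2017-03-14",
    "country": "Finland",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Using a shared vocabulary for the element names, rather than ad hoc labels, is what makes such records interoperable with other datasets in the same field.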
16
Example – data sharing
“My dataset will be made available upon
publication of the associated journal article via
the Zenodo repository. The data in Zenodo
will be made open access and licensed under
CC-BY. As required by my institution’s research
data policy, and to further aid data discovery,
the dataset will also be registered in my
institutional repository (with basic descriptive
metadata, including a link to the data record in
Zenodo, made public).”
Registry of Research Data Repositories: http://www.re3data.org
17
Example – restrictions on sharing
“Because my research data are part of a
potentially patentable invention, sharing will
be delayed to investigate patent protection
first. Before any disclosure can take place,
my research results will be reported to my
university’s TechTransfer office, which will
determine whether releasing data will need
to be embargoed until after a patent
application has been filed.”
Also see EUDAT legal guide: http://wiz.eudat.eu
18
Example – data preservation
“Raw data and documentation files will be
offered for deposit to 4TU.ResearchData,
which is a DSA-certified data repository
accepting research data in the field of
engineering and preserves them for a
minimum of 15 years. Files will be offered in
the repository’s preferred formats (.txt, .xml
and JCAMP), and as the volume of data does
not exceed 10GB the repository will not charge
for the deposit.”
More on how to select a data repository: https://www.openaire.eu/opendatapilot-repository
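The two conditions in the example above (preferred formats and a total volume of at most 10 GB) can be checked before offering files for deposit. This is a sketch only: the function name `check_deposit` is hypothetical, the `.jdx`/`.dx` extensions for JCAMP files and the decimal definition of a gigabyte are assumptions, and the actual requirements should be taken from the repository's own documentation.

```python
from pathlib import PurePath

# Extensions for the formats named in the example; .jdx/.dx for JCAMP is an assumption.
PREFERRED_EXTENSIONS = {".txt", ".xml", ".jdx", ".dx"}
FREE_DEPOSIT_LIMIT_BYTES = 10 * 10**9  # 10 GB, assuming decimal gigabytes


def check_deposit(files):
    """files: iterable of (filename, size_in_bytes).

    Returns (files_not_in_preferred_format, total_bytes, within_free_limit).
    """
    files = list(files)
    unsupported = [
        name for name, _ in files
        if PurePath(name).suffix.lower() not in PREFERRED_EXTENSIONS
    ]
    total = sum(size for _, size in files)
    return unsupported, total, total <= FREE_DEPOSIT_LIMIT_BYTES


# Hypothetical file list.
unsupported, total, free = check_deposit([
    ("spectrum.jdx", 2_000_000),
    ("readme.txt", 4_096),
    ("raw.xlsx", 1_000_000),
])
print(unsupported, total, free)
```

Files reported as unsupported would need converting to a preferred format, or a justification in the DMP for keeping them as they are.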
19
Tips for writing a DMP
• consider it a ‘living’ document
• use a template
  • e.g. Horizon 2020 FAIR DMP
  • list of public DMP templates: https://wiki.surfnet.nl/display/RD/Datamanagementplannen
• use an online planning tool
• have a look at example DMPs
20
Example plans
• examples on the Digital Curation Centre (DCC) website
http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
• examples in the Zenodo repository
https://zenodo.org/search?page=1&size=20&q=data management plans
• public DMPs on the DMPTool website
https://dmptool.org/public_dmps
• DMPs published in RIO (Research Ideas and Outcomes OA journal)
http://riojournal.com/browse_user_collection_documents?collection_id=3
21
Tips for writing a DMP
• check applicable data policies
• keep it simple, but be as specific
as possible
• justify your decisions
• familiarize yourself with RDM
terminology & best practices
(for your field)
22
Online RDM training resources
• FOSTER training portal
• OpenAIRE webinars
• EUDAT training materials
• Digital Curation Centre How-to Guides & Checklists
• UK Data Archive ‘Create & Manage Data’ webpages
• MANTRA – Research Data Management Training
• ‘Research Data Management and Sharing’ MOOC on Coursera
• Data Management Training Clearinghouse
23
Thank you for listening!
24
Credits
• slides adapted from S. Jones (2016), ‘What is a Data Management Plan?’, Licensed under CC-BY 4.0
• images
  • [slide 2]: ‘Writing’ by Aiconica, licensed under CC0 1.0
  • [slide 3]: ‘Database’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
  • [slide 5]: From ‘Analyzing DMPs to inform Research Data Services’ by A. L. Whitmire, licensed under CC-BY 4.0
  • [slide 6]: ‘Day 10: Lost’ by Dave Hill, licensed under CC-BY-NC-SA 2.0; ‘A Domesday system at the Vintage Computer Festival 2010, Bletchley UK’ by Regregex, licensed under CC-BY 3.0
  • [slide 7]: ‘Publications and Data’ by Auke Herrema, licensed under CC-BY 4.0; T. H. Vines et al., ‘The availability of Research Data Declines Rapidly with Article Age’, Current Biology 24 (2014) 1: 94-97. http://doi.org/10.1016/j.cub.2013.11.014
  • [slide 8]: ‘How Science goes wrong’, The Economist, 19 Oct 2013. http://www.economist.com/printedition/2013-10-19; N.L. Yozwiak, ‘Data sharing: Make outbreak research open access’, Nature 518 (2015) 7540: 477-479. https://doi.org/10.1038/518477a
  • [slide 9]: From ‘RDM: An Overview’ by Research Support Team, IT Services (University of Oxford), licensed under CC-BY-NC-SA 4.0
  • [slide 10]: The European Code of Conduct for Research Integrity. Revised Edition (ALLEA – ALL European Academies, 2017).
  • [slide 13]: ‘Planning’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
  • [slide 16]: ‘Metadata’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
  • [slide 20]: ‘Preservation plan’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
  • [slide 22]: V. Van den Eynden et al., Managing and Sharing Data. Best practice for Researchers (UK Data Archive, 2009), licensed under CC-BY-NC-SA 3.0
  • [slide 24]: ‘Knowledge’ by Jørgen Stamp, attribution: digitalbevaring.dk, licensed under CC-BY 2.5 DK
25