Photo: Natalie Tapson (Flickr CC BY-NC-SA 2.0)
Data Management Plan
April 2016
This project is supported through funding from the Australian Government’s National Environmental Science Programme.
VERSION CONTROL REVISION HISTORY
Version | Date revised | Reviewed by (Name, Position) | Comment (review/amendment type)
1.0 | 18/12/2015 | TSR Hub Leadership Group | First draft based on NESP Data and Accessibility Guidelines
2.0 | 03/03/2016 | | Second draft based on feedback from TSR Hub Leadership Group
2.1 | 19/03/2016 | | Comments
3.0 | 30/05/2016 | CdL | Amendments undertaken
TSR Hub Data and Information Management Plan V3.0 – June 2016
Acknowledgements
The Threatened Species Recovery Hub wishes to thank the many people that contributed their
expertise towards developing this data and information management plan. Particular thanks go to
staff from TERN’s ÆKOS facility, the Australian National Data Service, and Atlas of Living Australia,
and to the LTERN (Long Term Ecological Network) facility for their advice regarding data submission
guidelines. Many thanks also to staff from the University of Melbourne’s Digital Scholarship and
Research Platforms groups, and to Monash University’s Research Data Management Unit.
Contents
ACKNOWLEDGEMENTS ... 3
EXECUTIVE SUMMARY ... 5
INTRODUCTION ... 7
  THREATENED SPECIES RECOVERY HUB ... 7
  PURPOSE OF THE TSR HUB’S DATA AND INFORMATION MANAGEMENT PLAN ... 8
COPYRIGHT AND LICENSING ... 10
  WHO CAN APPLY A LICENCE TO RESEARCH OUTPUTS? ... 10
  OBTAINING COMPILED DATA: DATA SHARING AGREEMENTS ... 11
  LICENSING COMPILED DATA ... 12
TIMELINES FOR MAKING RESEARCH OUTPUTS PUBLICLY AVAILABLE ... 13
  OPEN-ACCESS PUBLICATIONS ... 13
  OPEN ACCESS DATA AND OTHER NON-PUBLICATION RESEARCH OUTPUTS ... 15
  EXCEPTIONS TO THE OPEN ACCESS POLICY ... 15
SENSITIVE INFORMATION PROVISIONS ... 15
  SENSITIVE ECOLOGICAL DATA ... 16
  SENSITIVE CULTURAL OR HUMAN DATA ... 16
UNIQUELY IDENTIFYING RESEARCH OUTPUTS ... 17
  MINTING DOIS FOR STATIC AND DYNAMIC DATASETS ... 17
SELECTING A SUITABLE REPOSITORY ... 17
  INSTITUTIONAL REPOSITORIES ... 18
  NATIONAL REPOSITORIES ... 19
  INTERNATIONAL REPOSITORIES ... 19
METADATA ... 20
  METADATA STANDARDS ... 20
  DESCRIPTIVE METADATA ... 21
RESEARCH OUTPUT SUBMISSION GUIDELINES ... 23
  DATA PREPARATION GUIDELINES ... 23
  CITING DATA ... 26
  LINKING DATA WITH RELATED RESEARCH OUTPUTS ... 27
  ACKNOWLEDGING NESP SUPPORT AND THE TSR HUB ... 27
  ANDS PROJECT IDENTIFIERS ... 27
  PERMISSION TO PUBLISH RESEARCH OUTPUTS ... 28
  NOTIFYING DOTE OF TSR HUB RESEARCH OUTPUTS ... 28
  LINKING RESEARCH OUTPUTS WITH THE TSR HUB’S WEBSITE ... 28
DATA MANAGEMENT AT THE PROJECT-LEVEL ... 28
  PROJECT DETAILS ... 28
  COLLECT ... 29
  ASSURE ... 31
  DESCRIBE ... 32
  PRESERVE AND DISCOVER ... 32
  DATA MANAGEMENT ADMINISTRATION ... 33
APPENDICES ... 34
  APPENDIX A. TSR HUB DATA, DATA REPOSITORIES, AND METADATA SCHEMAS ... 34
  APPENDIX B. RESEARCH OUTPUT SUBMISSION CHECKLIST ... 39
  APPENDIX C. BLANK PROJECT DATA MANAGEMENT PLAN ... 40
  APPENDIX D. WEB RESOURCES TO MANAGE RESEARCH DATA ... 43
Executive summary
All TSR Hub outputs, including publications and data, should be made publicly available under the
latest Creative Commons (CC) framework unless there are valid reasons for not doing so (e.g.,
sensitive threatened species, social, or cultural information). Where possible, data should be
licensed under the most open Creative Commons licence (CC BY 4.0 at the time of writing).
Only copyright owners can apply a licence to a research output. You could be the copyright owner if
you generated a research output, but this varies with the intellectual property policies of the
organisation you work for. Researchers should check their institution’s IP policies before attempting
to license a research output.
The use of pre-existing (third party) data, including both entire datasets and subsets of data, should
be governed by a data sharing agreement that clearly states the terms and conditions of use for the
data. There are many types of data sharing agreements, and third parties will often have their own
agreement. The legal services unit of the lead researcher’s organisation should review all data
sharing agreements before signing.
Third parties should be encouraged to make pre-existing data used by the TSR Hub, or at least the
metadata, publicly available. Options to make pre-existing data publicly available should be
discussed before data is shared, and the results of this discussion incorporated into the data sharing
agreement.
Publications should be made freely and openly available on the internet within 12 months of
publication. Publication reprints are the preferred format, but post-prints are also acceptable. There
are five publishing scenarios available to TSR Hub researchers that meet both NESP’s requirements
for public access and publisher licensing agreements.
Data and other non-publication research outputs should be made publicly available on the internet
within 12 months of collection.
Exceptions to the open access policy are acceptable in some cases, for example if information is
culturally, environmentally, or socially sensitive. Delays or embargoes on making data publicly
available may also be permitted in some cases. The option to restrict access to research outputs
should be discussed with Project Leader(s) and the Hub’s Chief Operations Officer in the first
instance, before being put to the Department of the Environment (DotE) for approval.
Potentially sensitive ecological data should be curated in accordance with the guidelines established
by the Atlas of Living Australia’s Sensitive Data Service. Researchers should also consider publishing
spatial information for potentially sensitive species separately from information on the species
themselves (e.g. abundance, demographic information).
Approval from an institutional, and possibly state agency, Animal Ethics Committee will be required
for all projects that are collecting new data on animals. Permits may also be required for work with
animals, national parks etc. Researchers must also follow the federal codes of conduct for working
with animals.
Projects collecting cultural or human data are required to have approval from a Human Ethics
Committee, and must also conform to the federal guidelines for conducting human research. Human
research data may be suitable for public release if a number of steps are followed when acquiring
ethics approval, collecting the data, and preparing it for publication.
All research outputs should be assigned a persistent identifier (PI) such as a DOI (digital object
identifier) or PURL (Persistent Uniform Resource Locator). PIs are typically minted during the process of
submitting research outputs to a repository or publishing a manuscript. There are different protocols
for minting DOIs for static and dynamic datasets, and care should be taken to prevent the
duplication of datasets. Publishers and data repositories are typically responsible for ensuring that
PIs remain resolvable; however, some repositories pass this responsibility on to researchers.
Data should be deposited into an established repository, and one that is most appropriate for the
data. Institutional, national, and international repositories may all be suitable; repositories need to
be approved by DotE. Among other considerations, the decision of where to deposit data should be
based on the subject of the data, how it is likely to be used in the future, a repository’s licensing
framework and handling of sensitive data, and costs of depositing or maintaining the data.
High quality metadata about a resource or object should be recorded from the beginning of a
research project, and recording should continue for the life of the project. There is a wide range of
metadata standards, ranging from subject-specific to general standards. Data repositories typically use just one
metadata standard that is tailored to the subject or type of data that the repository holds. General,
descriptive metadata should be recorded for all projects, regardless of the metadata standard used
when making data publicly accessible.
Detailed Research Output Submission Guidelines are provided to help prepare data and other
research outputs for publication.
Related research outputs should be linked via formal citation within the output and/or in their
metadata. Datasets should be cited in publications and included in the reference list, while the
metadata of datasets should include an up-to-date list of, and a link to, related research outputs.
All research outputs should include the TSR Hub boilerplate to acknowledge NESP support, project
partners, contributing organisations and individuals.
The Australian National Data Service is developing project-specific codes for each TSR Hub project.
These codes should be associated with all research outputs, either in the acknowledgments or
keywords, or in the output’s metadata.
Project Leaders should approve the final version of all research outputs (and their metadata) before
they are submitted to a journal, repository, or other publishing platform.
A copy of the completed NESP Research Product Submission Form should be sent to DotE,
Project Leader(s), and the TSR Hub’s Chief Operations Officer at least five working days before any
research outputs are publicly released. A copy of all publications should also be sent to Project
Leader(s) and the TSR Hub’s Chief Operations Officer at the time of publication.
Each project is required to develop a data management plan within 12 months of starting. Project
Data Management Plans will detail how data will be collected and managed throughout the life of a
project. Completed plans should be sent to the TSR Hub’s Chief Operations Officer. Plans should be
revised annually.
Introduction
Threatened Species Recovery Hub
The National Environmental Science Programme (NESP) is a long-term commitment to environment
and climate research. Funded by the Australian Government via the Department of the Environment
(DotE), the key objective of the NESP is to improve our understanding of Australia’s environment
through collaborative research that delivers accessible results and informs decision-making. The
focus of the NESP is therefore on practical and applied research that informs on-ground action and
that will yield measurable improvements to the environment.
The NESP is delivered through six multi-disciplinary research Hubs or consortia, hosted by Australian
research institutions. One of these hubs, the Threatened Species Recovery (TSR) Hub, aims to
support the development of regulations and investments to manage threats and improve the
recovery of threatened species and biodiversity in terrestrial and freshwater ecosystems.
The TSR Hub will provide a robust evidence base and strategic advice that will contribute
significantly to the overarching objectives of:
• an improved conservation outlook for a substantial proportion of Australia’s threatened
  species and ecological communities
• cost-effective strategies to provide early warning about extinction risk and triggers for
  immediate action
• ongoing decline in the number of Australian species eligible for listing as threatened through
  enhanced management of threats
• increased community knowledge of, engagement with, and investment in threatened species
  conservation
• efficient, effective, coordinated, and evidence-based public policy and strategies for
  conserving threatened species.
Further details on the TSR Hub’s research priorities and research projects can be found on the TSR
Hub's website (www.nespthreatenedspecies.edu.au).
Data and information: freely and openly accessible
The Australian Government's position is that information funded by the Government is a national
resource that should be managed for public purposes. This openness is driven by Government
agencies and the research community’s identification of a need to promote open access to public
sector and publicly funded information. Open access to Government-funded information is DotE’s
default position, with exception only for privacy, security, or confidentiality reasons. In accordance
with this position, and as outlined in the NESP Programme Guidelines and the NESP Data and
Accessibility Guidelines, research outputs generated from NESP research are expected to be freely
accessible and available on the internet.
The TSR Hub is expected to generate a broad range of research outputs, including:
• raw datasets, e.g., raw, non-aggregated data (or other specified observational units)
• publications, including scientific papers, reviews, books, and book chapters
• grey literature, including fact sheets, project profiles, and technical reports
• images, maps, photos, videos, and animations
• models and other tools created by the research process (e.g., Decision Support Tools)
• websites created specifically as a research output
• mobile or tablet apps
• unspecified emerging technology.
To satisfy the public accessibility requirements of the NESP Data and Accessibility Guidelines, it is
expected that researchers will take steps to meet the following criteria:
1. licence research outputs generated from NESP research
2. make research outputs generated from NESP research freely and openly available on the
internet as soon as they are approved by the Leadership Group and DotE
3. make publications freely and openly available on the internet within 12 months of
publication. As a minimum requirement, this will be in the post-print format.
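As a rough illustration of criterion 3, the sketch below computes the latest date by which a publication would need to be openly available. It assumes that "within 12 months" means the same calendar day twelve months later (clamped for shorter months); this reading, and the function name open_access_deadline, are assumptions made for illustration and are not taken from the NESP guidelines.

```python
import calendar
from datetime import date


def open_access_deadline(published: date, months: int = 12) -> date:
    """Return the date `months` calendar months after `published`.

    Hypothetical helper: treats "within 12 months of publication" as the
    same day of the month twelve months later.
    """
    # Count months from a zero-based index so year rollover is simple.
    total = published.month - 1 + months
    year = published.year + total // 12
    month = total % 12 + 1
    # Clamp the day for shorter months (e.g., 29 February in a leap year).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(published.day, last_day))


if __name__ == "__main__":
    print(open_access_deadline(date(2016, 6, 15)))
```

Project teams could record this date alongside each publication in their project data management plan, then check it against the date the post-print is deposited.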
Providing access to the data and other research outputs generated by NESP research will provide up-to-date, high quality data and information to decision-makers, environmental managers, other
scientists, and the general public. This will increase the opportunity to take a more collaborative,
informed approach to managing Australia's environment.
Purpose of the TSR Hub’s Data and Information Management Plan
The TSR Hub’s Data and Information Management Plan outlines how TSR Hub researchers are
expected to manage their data and other research outputs before, during, and after their research
project. This Plan therefore provides guidance on the key steps, resources, and tools needed to
ensure that TSR Hub research outputs are adequately and securely backed up, and that they can be
easily searched for, and accessed, by the public. This Plan should be read in conjunction with the
NESP Programme Guidelines and the NESP Data and Accessibility Guidelines.
A number of conditions must be met, and steps taken, before research outputs funded by NESP can
be made publicly accessible (Figure 1). Where possible, this includes:
• applying the appropriate licence to research outputs
• uniquely identifying research outputs
• depositing research outputs in an appropriate repository, and in a timely manner
• associating detailed metadata with research outputs
• choosing file formats that facilitate long-term access
• linking associated research outputs
• including the boilerplate to acknowledge NESP funding and collaborators
• ensuring provisions are made for any sensitive data or information.
The following sections detail these conditions and steps, linking to external websites for specialist
information when required. Where appropriate, guidance is provided for the management of pre-existing or third party data, as the TSR Hub will make extensive use of such data. TSR Hub
researchers are also required to develop their own data management plan for each project.
Where possible, examples have been provided for ways to manage research products that the TSR
Hub is expected to produce, including detailed submission guidelines for tabular research data, and
general submission guidelines for all types of research products. However, given the variety of these
research outputs, there is no one set of standards or tools for researchers to adopt to meet the
NESP funding agreement, and it is beyond the scope of this document to detail specific data
management guidelines for every possible type of research output: for research outputs without
specific guidelines, the conditions of the funding agreement must still be met.
The strategies and tools you use to manage your data and research outputs should therefore meet
the three conditions of the NESP funding agreement, but should also be guided by the type of
research outputs you will produce, any sensitivities about public access to data, and
any legacy arrangements (e.g., between researchers and data repositories) that you might have. If
you are unsure about how best to meet the expectations of the NESP Programme Guidelines or to
manage your data, please contact your Project Leader, the TSR Hub’s Chief Operations Officer
(Melanie King: [email protected]), or your institution’s data management advisors.
Figure 1. Steps in the publication and data management process include obtaining permits and approvals,
collecting and analysing data, publicly releasing all research outputs, and linking research outputs and the TSR
Hub website. Approval from Project Leader(s), the TSR Hub’s Chief Operations Officer, and/or the Department
of the Environment (DotE) must be sought for several steps.
Copyright and Licensing
NESP hub funding agreements require all research outputs to be made publicly available under the
latest Creative Commons framework (Creative Commons Version 4.0 International at the time of
writing), unless there are valid reasons for not doing so (see Exceptions to the open access policy on
page 15 for more information). This licensing framework provides clear permissions, terms, and
conditions upon which data can be re-used, and has been endorsed for the Australian Government
Open Access and Licensing Framework (AusGOAL). AusGOAL establishes a common approach to
data licensing across research and government, and facilitates use and reuse of data for further
innovation and research. This is consistent with DotE’s Information Licensing Policy.
The AusGOAL framework includes six individual Creative Commons licences, one restrictive licence,
and one software licence. See the AusGOAL suite of licences for guidance on selecting a licence.
DotE’s default requirement is for all outputs resulting from the NESP to be licensed under the most
open Creative Commons licence, CC BY 4.0 (at the time of writing). The CC BY 4.0 licence is the most
accommodating of licences offered, and is recommended for maximum dissemination and use of
licensed materials. This licence allows others to copy, remix, transform, and build upon work, even
commercially, as long as they credit the licensor for the original creation.
In addition to licences, Creative Commons also offers a Creative Commons Zero (CC0) tool that
allows licensors to waive all rights and place a work in the public domain. Use of the CC0 waiver is
not recommended, however, because there are some doubts about its force under Australian law,
particularly with respect to moral rights. Furthermore, the disclaimer that accompanies CC0 may be
ineffective in protecting the user from liability for claims of negligence. It is therefore recommended
that CC BY licences are applied to research outputs rather than CC0 waivers, although it should be
noted that some data repositories (e.g., Data Dryad) only offer CC0 waivers.
Who can apply a licence to research outputs?
You have a right to apply a licence to a research output if you are its copyright owner. You may be
the copyright owner if you generated the research output yourself, but this will also vary with the
intellectual property (IP) policies of the institution or organisation for which you work.
All Australian research organisations have IP policies that clearly describe the ownership of IP
developed at the institution. These policies also stipulate copyright and licensing rights.
The following points outline general rules regarding IP at research organisations:
• Where research outputs are produced by an employee acting in the ordinary course of their
  employment duties, IP typically belongs to a person’s employer. Some organisations
  recognise the rights of researchers to their research outputs, however, and transfer to
  researchers the IP for their scholarly works (e.g., any article, book, manuscript, manual,
  musical composition, diagram, photograph, creative writing, film or like publication). The IP
  rights of visiting researchers, or researchers working under joint appointments, may differ to
  those of academic staff.
• Students typically retain ownership of IP created by them; however, IP policies may vary
  where the IP was jointly developed by staff, or where the IP is the subject of specified
  agreements (e.g., third-party agreements with external organisations such as granting
  bodies, or public or private sector organisations funding contract research and development
  at an organisation). Where such agreements occur, they typically do not apply to a student’s
  thesis or scholarly works, except where explicitly agreed.
It must be stressed that IP policies vary between organisations, and also with specific contractual
agreements, so researchers will need to check which rules apply to them.
As research outputs generated by the Hub must conform to the NESP funding agreement, new
contracts with staff and students should clearly define who owns the copyright over research
outputs that they produce, and the requirement to license research outputs and make them
available to the public within NESP’s specified time frames (page 13). Where no such definition
exists, staff and students are still encouraged to license their research outputs and make them
available in a manner consistent with the NESP Programme Guidelines, the NESP Data and
Accessibility Guidelines, and the guidelines outlined in this document. Staff and students should
refer to their institution’s IP Policy for guidance and/or contact their legal department if they have
any questions.
Obtaining compiled data: data sharing agreements
The TSR Hub will make extensive use of pre-existing (third party) data, where such data is defined for
the purposes of this Data and Information Management Plan as any data collected by an
organisation that is not one of the TSR Hub nodes, or any individual(s) not employed by one of the
TSR Hub nodes. Third party data includes entire datasets and subsets of datasets.
It is important that any third parties give their consent for how their data is to be used and
potentially shared, and that they do so before the data is shared. It is therefore
recommended that the use of third party data be governed by a data sharing agreement that clearly
states the terms and conditions of use for the data.
There are many types of data sharing agreements, including letters, memorandums of
understanding, licences, contracts etc. Which of these data sharing agreements is most appropriate
will depend on the circumstances. A contract might be required if money is exchanged, for example,
while a letter of exchange might be appropriate for a one-off sharing agreement.
Data sharing agreements should outline the relationships between the parties, which data are to be
shared, and how the data may be used. The National Statistical Service’s (NSS) Good practice guide
to sharing your data with others suggests considering the following elements when preparing a data
sharing agreement:
• Aims and purpose: state why the data is being shared, what the data will be used for, and
  how it will be used.
• Data definition: define the data that is being shared, including its formats and metadata, and
  the terms of the data sharing agreement.
• Legal restrictions: include the legislative and consent-based restrictions of the data. This
  should include consideration of who owns the copyright and any IP, whether licences are
  required to access the data, and whether there are privacy concerns around the data.
• Governance: describe the governance arrangements for all aspects of the shared data,
  including the process and operational management of the data, and maintaining and
  terminating the agreement.
• Access issues: detail how the data will be accessed and shared, both during the project and
  after any analysis is complete, including licensing requirements to access the data, and also
  for the compiled data or research products that arise from the data. It is NESP’s preference
  that data is licensed under the broadest possible terms (a CC BY 4.0 licence).
• Data quality: include information on the data’s “fitness for purpose”, including the
  methodology used to create or generate the data and any processes used to monitor and
  assess data quality.
• Data management: describe the processes and systems that ensure data security and
  integrity.
• Costs: detail any costs, including a payment schedule.
Greater detail about each of these elements can be found on the NSS website.
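The eight elements above can also be treated as a simple checklist when drafting an agreement. The sketch below is one possible way to do this, assuming a draft agreement is held as a dictionary of section text; the key names and the function missing_elements are hypothetical and are not defined by the NSS guide or by NESP.

```python
# Keys mirror the eight elements listed above (names are illustrative only).
REQUIRED_ELEMENTS = (
    "aims_and_purpose",
    "data_definition",
    "legal_restrictions",
    "governance",
    "access_issues",
    "data_quality",
    "data_management",
    "costs",
)


def missing_elements(agreement: dict) -> list:
    """Return the elements that are absent or empty in a draft agreement."""
    return [
        element
        for element in REQUIRED_ELEMENTS
        if not str(agreement.get(element, "")).strip()
    ]


if __name__ == "__main__":
    draft = {
        "aims_and_purpose": "Model population trends for a threatened species.",
        "data_definition": "Annual survey counts, CSV, with descriptive metadata.",
        "access_issues": "Compiled data to be released under CC BY 4.0.",
        "costs": "   ",  # whitespace only, so treated as missing
    }
    print(missing_elements(draft))
```

A check of this kind only confirms that each element has been addressed; the content of the agreement should still be reviewed by the relevant legal department.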
Third parties will often have their own data sharing agreements; researchers should ensure that the
terms of such agreements are aligned with the requirements of the NESP funding agreement. Most
institutions also have their own templates that can be used, or adapted where necessary. Third party
data sharing agreements and non-standard institutional agreements should be referred to the legal
department of the researcher’s institution for approval before signing.
Licensing compiled data
If your data is a compilation or derivative of a third party’s work, then you may not have the rights to
license the data or other research outputs. Ideally, use of such third party data will be governed by
the licence and terms of the data sharing agreement that you entered into when obtaining the data.
This agreement should specify how data can be used by you and others, and whether, to whom, and
how data can be made available at the end of the project. Use of third party data should therefore
be negotiated under the broadest possible terms (i.e., a CC BY 4.0 licence). Similarly, if your research
is based on pre-existing (or background) IP then the research agreement you entered into when
obtaining the data should include clauses describing the treatment of background IP.
Note: if you are using third party data that is already publicly available, it is not possible to license it
under anything other than the terms under which it was originally licensed.
Options for licensing compiled data
It is DotE’s preference that all data, including pre-existing data, which is generated or used by the
TSR Hub, be made freely and publicly available. Thus, if it is not already publicly available, the option
to deposit pre-existing data in a public repository should be discussed with the copyright holder
when negotiating the licence or data sharing agreement. There are three options for licensing
compiled data:
1) Under DotE’s preferred scenario, pre-existing data that is not already publicly available
would be deposited in an online repository, and a DOI would be minted for the dataset. All
research outputs that use that dataset should cite the data following standard procedures,
or as outlined by the repository. See Uniquely identifying research outputs (page 17) and the
research output submission guidelines (page 23) for more information.
2) If the copyright holder of pre-existing data is unwilling or unable to make it publicly
available, then an alternative option is to negotiate a licence under which a dataset’s
metadata will be made publicly available, but not the dataset itself. In this case, a DOI will be
minted for the metadata, and the metadata should also include the name and contact
details of the data owner, and preferably also the data creator. Manuscripts based on the
data would refer to the data DOI, but users would need to contact the owner to negotiate
access to the data itself.
3) If the copyright holder or data creator is unwilling to include any provisions for public
accessibility to data or metadata in the data licence or data sharing agreement, then
publications must refer to the dataset using:
• the name of the dataset
• the name of the organisation that owns the data (i.e., the copyright holder)
• the role of the person to contact about accessing the data, and, preferably,
• the website of the copyright holder.
This is DotE’s least preferred option.
If a compiled dataset is published in association with a scientific paper (e.g., as electronic
supplementary material), and the paper is publicly accessible according to NESP’s open access
requirements (page 13), then the dataset fulfils the requirements for public access. Under this
scenario the dataset does not need to be separately deposited in an online repository. See Citing
data (page 26) for more information on how data can be made publicly accessible.
Timelines for making research outputs publicly available
Open-access publications
Under the NESP funding agreement, all publications must be made freely and openly available on
the internet within 12 months of publication. Publications include peer reviewed scientific papers,
books, reports (grey literature), and other published material. At a minimum, publications must be
the final text and figures as accepted for publication. Pre-prints (text and figures before peer review), abstracts, and/or citations alone do not meet NESP’s requirements.
The NESP Data and Accessibility Guidelines outline a number of publishing scenarios that meet their
requirements, depending on a publisher’s licensing framework (Figure 2, grey boxes). These
Guidelines preclude researchers from publishing in journals (or with publishers) that do not allow
free access to publications within 12 months.
It is the TSR Hub’s preference that publications arising from Hub activities are published in the most
appropriate place for the work, and that researchers are free to make this decision themselves, in
consultation with colleagues and supervisors (where appropriate). In anticipation of instances where
a researcher’s preferred publisher will not provide access to publications under the scenarios
outlined by NESP, the TSR Hub will implement its own process to make publications available from
the TSR Hub’s website, in a way that satisfies both NESP and publisher licensing requirements
(Figure 2, yellow box).
There are therefore five publishing scenarios available to Hub researchers that meet both NESP’s
requirements for public access and publisher licensing agreements (Figure 2). All publications and
research outputs are required to be made available from the TSR Hub’s website and the researcher
is to ensure a copy is forwarded to the Chief Operations Officer in a timely manner to ensure this
takes place. These scenarios are outlined below, in order of decreasing preference.
1. The publication is freely available online immediately upon publication. If there are no
restrictions to making the publication freely available, all publications will be accessible on
the publisher’s website and via the TSR Hub website.
2. The publication re-print is freely available online within 12 months of publication. Re-prints are the final version of the manuscript, complete with journal/publisher formatting.
Under this scenario, re-prints would be available from the publisher’s website, the TSR Hub’s
website, and/or another website or online repository.
3. The publication post-print is freely available online within 12 months of publication. Post-prints are the peer-reviewed text and figures as accepted for publication, but without
journal or publisher formatting. Publication post-prints would typically be made available
from the TSR Hub’s website, and/or another website or online repository. Post-print PDFs
should include a text header advising where the article has been published and link to the
article’s Digital Object Identifier (DOI). Individual journals often provide set wording that
should accompany post-prints of their articles.
4. In cases where a publisher will not allow a copy of the re-print or post-print to be made
publicly available within 12 months, there may be other alternatives available to ensure
open access, such as negotiating to publish under an alternative licence, or the payment of
an open access fee. Payment of open access fees is the responsibility of the Hub; DotE’s
preference is for research funding to be used for research, in preference to publication fees.
If you wish to publish in a journal or other publication that requires payment of an open
access fee, then this should be discussed with, and approved by, your Project Leader prior to
submission.
5. For publications that do not fit within any of the previous scenarios, the TSR Hub will
develop a process to make them available from the TSR Hub website in a way that meets
both funding and licensing requirements. Researchers may therefore submit their work for
consideration in their preferred journal.
Note: while the NESP funding agreement states that publications must be made available within 12
months, if a publisher will allow re-prints or post-prints to be uploaded to a website after a 12-month embargo, then this is acceptable to DotE. Publications should therefore be uploaded as soon
as the 12-month embargo period has passed.
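As an illustration of the 12-month rule above, the following minimal sketch computes the first date on which an embargoed re-print or post-print could be uploaded. The function names are hypothetical, and the sketch assumes the embargo is counted in calendar months from the publication date; the publisher's own terms determine the actual end of an embargo.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month has fewer days than the start day (e.g., 29 February
    plus 12 months), the day is clamped to the last day of the target month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def earliest_upload_date(published: date, embargo_months: int = 12) -> date:
    """Hypothetical helper: date on which an embargoed copy can be uploaded."""
    return add_months(published, embargo_months)


# Example: a paper published on 15 June 2016 under a 12-month embargo.
print(earliest_upload_date(date(2016, 6, 15)))  # 2017-06-15
```

Recording this date when a paper is accepted makes it easier to upload the publication as soon as the embargo period has passed.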
There are several web-based tools that enable you to easily check the copyright regulations of most
journals. For example, the SHERPA/RoMEO website allows you to search publisher copyright policies
and self-archiving options for over 22,000 journals. Please note that this information should not be
relied upon for legal advice, and it is the responsibility of the researcher to ensure that the
information obtained is correct and current.
Figure 2. DotE has outlined a number of steps to determine whether a journal meets DotE’s preferred
publishing arrangement and the NESP funding agreement. Where this is not the case, the TSR Hub will make
articles available from the TSR Hub’s website via a process that satisfies both licensing and NESP funding
requirements. Figure adapted from the NESP Data and Accessibility Guidelines.
Open access data and other non-publication research outputs
NESP expects that researchers will take all reasonable steps to deposit research data in an
appropriate subject and/or institutional repository within 12 months of data collection. The data and
its metadata should be stored in an open format together, in a way that clearly shows how they are
linked. Under this requirement, data may need to be placed in a repository before associated
publications have been published.
Much of the data generated by TSR Hub researchers is likely to be dynamic (e.g., monitoring data,
where new data is added on an annual or seasonal basis), to be collected over an extended period (e.g.,
long-running experiments), to require long data validation processes, or to rely on revisable datasets
(e.g., computational models that need to be updated with new material and/or as material is
deleted or corrected). Where researchers who have collected such data feel unable to deposit their
data within 12 months of collection, they should refer to the Exceptions to the open access policy
below.
Exceptions to the open access policy
DotE recognises that open access to data or other information may not be suitable in cases where
that information is culturally, environmentally, or socially sensitive; refer to the Sensitive
information provisions (page 15) for more details. Furthermore, there may be valid reasons to delay
depositing data in a publicly accessible repository, or to embargo public access to a dataset after it
has been uploaded (typically for a maximum of 12 months).
Decisions to restrict public access to research products should be made by those closest to the
source (i.e., the data owner) and discussed with the Project Leader and the Chief Operations Officer.
If an exception (or delay) to open access is required, the Chief Operations Officer will apply to DotE
for approval. In cases where restricted access to sensitive information is approved, the data
custodian should keep an enduring copy of the unaltered data, and make the metadata freely
available. The metadata should describe the data and why it has not been released, and include the
contact details for the person(s) to whom other researchers should apply for more information, or
potential access, to the data. It is the data owner’s responsibility to determine what ethical and legal
steps must be observed to make the metadata of sensitive information publicly available.
Where students (or other researchers) are not contractually obliged to conform to the public
accessibility requirements of the NESP funding agreement, they should still be encouraged to follow
the guidelines outlined here.
Sensitive information provisions
DotE recognises that restricting access to information may be suitable in cases where that
information is environmentally, culturally, or socially sensitive. Sensitive data may include, but is not
limited to: location information for rare species, highly desirable or collectable species, culturally
significant site data, social data restricted by privacy considerations, and other heritage or
Indigenous matters. Refer to the Exceptions to the open access policy above for more information.
It is the Hub leader’s responsibility to collate instances of exceptions to the NESP open access
guidelines. Collated lists can be at project level and include basic information on the data collected
and the justification for non-release of the data. All proposed exceptions to the NESP open access
policy must be reported to the Hub steering committee as part of project progress reports, and
included in Hub annual reports.
Additional information for handling different types of sensitive data is given below.
Sensitive ecological data
Curating data
In Australia, the conservation status of flora and fauna is determined at both a state and federal
level. Each state and territory treats the management of sensitive information about threatened
species in their jurisdiction differently, including how it assesses which species are sensitive, and
restrictions on sensitive information.
The Atlas of Living Australia (ALA) has worked with Commonwealth, state, and territory agencies,
and other data providers, to collate an authoritative list of species considered sensitive for
conservation or biosecurity reasons, and advice on how to handle data on these species. The ALA
has developed a Sensitive Data Service (SDS) to implement this advice.
The SDS enables data in sensitive records to be annotated, withheld, or generalised according to a
publicly available set of rules. Conservation and biosecurity data sensitivity can be checked online at
the SDS, while the species lists and rules provided by the external agencies can be accessed via the
Conservation Sensitivity related data sets.
Researchers who are working with species that are considered, or potentially considered, sensitive,
should consult the SDS for guidance on how to manage their data.
DotE is also developing policy guidelines for the access and management of sensitive ecological data.
This document will be circulated to Project Leaders once it is released.
Publishing data
As it is not possible to foresee which species may be considered sensitive in the future, the TSR Hub
recommends publishing the spatial coordinates of field sites (or other plot infrastructure) or species
sightings separately to information on the species themselves (presence/absence, abundance,
demographic information etc.). Guidelines on how to prepare separate data tables are given in the
sensitive information section of the data preparation guidelines (page 25).
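The separation recommended above can be sketched as follows. This is an illustrative example only, not the Hub's data preparation guideline: the column names (`site_id`, `latitude`, `longitude`, `species`, `count`) and the records themselves are made-up assumptions. The idea is that the two tables share only a site identifier, so coordinates can be withheld or generalised later without touching the species table.

```python
import csv
import io

# Hypothetical field records; all values and column names are illustrative.
records = [
    {"site_id": "S01", "latitude": "-37.80", "longitude": "144.96", "species": "Species A", "count": "3"},
    {"site_id": "S01", "latitude": "-37.80", "longitude": "144.96", "species": "Species B", "count": "1"},
    {"site_id": "S02", "latitude": "-35.28", "longitude": "149.13", "species": "Species A", "count": "0"},
]


def split_records(rows):
    """Split combined records into a site table and an observation table."""
    sites = {}
    observations = []
    for row in rows:
        # One row per site: the spatial coordinates live only here.
        sites[row["site_id"]] = {
            "site_id": row["site_id"],
            "latitude": row["latitude"],
            "longitude": row["longitude"],
        }
        # Species information carries the site identifier but no coordinates.
        observations.append({
            "site_id": row["site_id"],
            "species": row["species"],
            "count": row["count"],
        })
    return list(sites.values()), observations


def to_csv(rows):
    """Serialise a list of dictionaries to CSV text (an open format)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sites, observations = split_records(records)
print(to_csv(sites))
print(to_csv(observations))
```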
Sensitive cultural or human data
It is possible for research data that relates to people to be shared ethically and legally, if researchers
pay attention to four processes when planning and conducting their research:
• including provisions for data sharing when gaining informed consent
• anonymising data where needed to protect people’s identities
• controlling access to data in some instances
• applying the appropriate licence.
Plans to make human data or metadata publicly available must be approved by the Human Ethics
Committee of a researcher’s institution, and must also conform to the National Statement on Ethical
Conduct in Human Research. More information on the ethics, consent, and sharing of human or
culturally sensitive data can be found on the ANDS Ethics, consent and data sharing website and the
ANDS guide Publishing and Sharing Sensitive Data. Researchers should also refer to the TSR Hub’s
Indigenous Engagement and Participation Strategy when planning and conducting their research.
Researchers working with Indigenous Australians can also contact the Aboriginal and Torres Strait
Islander Data Archive (ATSIDA) for specialist guidance in managing and depositing Indigenous
research data. ATSIDA has a series of protocols to assist with best practice management of
Indigenous data, including issues of preservation, access, reuse, and the return of research data
relating to Indigenous communities. ATSIDA is a thematic archive of the Australian Data Archive
(ADA), based at the Australian National University. For further information on depositing Indigenous
data with ATSIDA, or how to collect suitable metadata, contact the ADA: [email protected]
Uniquely identifying research outputs
TSR Hub research outputs should be assigned a persistent identifier (PI) such as a digital object
identifier (DOI) or a persistent uniform resource locator (PURL). PIs uniquely identify research
outputs, and enable outputs such as data and grey literature to be cited in the same way that journal
articles and books are. Citation also enables data to be recognised as a scholarly effort in and of
itself, with the potential for reward based on data outputs. TSR Hub researchers are therefore
encouraged to assign DOIs to all their research outputs, including, where practicable, data and grey
literature. See Citing data (page 26) for more information.
PIs are typically minted during the publication process, or while submitting research outputs to a
repository. In addition, there are a number of online tools that can mint DOIs for products not
associated with a repository, for example the ANDS Cite My Data Service. DOIs are associated with a
product’s metadata, and can typically be minted only if minimum metadata requirements are met.
For example, the minimum, mandatory set of metadata required to mint a DOI via the ANDS Cite My
Data service is the URL, title, creators, publisher, and publication year.
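A simple pre-submission check for the five mandatory elements listed above might look like the sketch below. The dictionary keys are assumed names chosen for illustration; they are not the field names of the ANDS Cite My Data service, and the sketch does not call that service.

```python
# The five mandatory elements named in the text; key names are assumptions.
REQUIRED_FIELDS = ("url", "title", "creators", "publisher", "publication_year")


def missing_fields(metadata: dict) -> list:
    """Return the mandatory elements that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# Hypothetical record with placeholder values.
record = {
    "url": "https://example.org/dataset",
    "title": "Example monitoring dataset",
    "creators": ["Smith, Jane"],
    "publisher": "",
    "publication_year": 2016,
}

print(missing_fields(record))  # ['publisher']
```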
If the location or URL of a product changes, then the registered location must also be updated. The
responsibility for keeping PIs resolvable generally lies with the data repository; however, some
repositories pass this responsibility on to researchers (e.g., institutions that have arrangements with
figshare).
Minting DOIs for static and dynamic datasets
DOIs are typically minted once for static datasets, i.e., complete datasets or those based on a single
data collection. In contrast, multiple DOIs can be assigned to dynamic datasets (i.e., those that are
added to on an ongoing basis, for example seasonally or annually), but only one DOI is associated
with a dataset at a specific point in time. For example, a dataset that consists of data collected
during one year will be assigned a DOI when it is submitted to a repository. After a second year of
data collection the dataset can again be submitted to a repository: this dataset contains data
collected during the first and second years, and it is assigned its own DOI. The metadata of updated
datasets typically retain links to the DOIs of earlier versions, or a list of previous DOIs. Publications
cite the specific dataset used in their analyses. Researchers that are collecting dynamic datasets
should carefully consider how they wish to use (and cite) their data before submitting it to a
repository, and also how future users are likely to want to access and use the data. Care should be
taken to prevent the duplication of datasets.
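The versioning pattern described above can be sketched as a small data structure. The class and function names are hypothetical, and the DOIs are made-up placeholders. Each submission gets its own DOI, contains all data collected to date, and keeps a list of the DOIs of earlier versions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetVersion:
    """One submitted snapshot of a dynamic dataset (illustrative only)."""
    doi: str
    years_covered: List[int]
    previous_dois: List[str] = field(default_factory=list)


def new_version(previous: DatasetVersion, doi: str, new_year: int) -> DatasetVersion:
    """Create the next snapshot: cumulative data, new DOI, links to earlier DOIs."""
    return DatasetVersion(
        doi=doi,
        years_covered=previous.years_covered + [new_year],
        previous_dois=previous.previous_dois + [previous.doi],
    )


# Placeholder DOIs; a real DOI is minted by the repository at submission.
v1 = DatasetVersion(doi="10.0000/example.v1", years_covered=[2016])
v2 = new_version(v1, doi="10.0000/example.v2", new_year=2017)

print(v2.years_covered)   # [2016, 2017]
print(v2.previous_dois)   # ['10.0000/example.v1']
```

A publication that analysed only the first year of data would cite the first DOI, while a later analysis of both years would cite the second.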
See Timelines for making research outputs publicly available (page 13) for guidelines on when research outputs,
including static and dynamic datasets, should be submitted to a public repository, and Citing data
(page 26) for information on how to cite datasets.
Selecting a suitable repository
Data should be deposited into an established repository that is most fit for purpose. When deciding
where to deposit data, researchers should consider:
• the subject of the data, including what it describes and how it was collected
• where and how future users of the data are likely to search for it
• the importance of having data associated with similar datasets (e.g., in a subject-specific
repository vs. a cross-disciplinary repository)
• the metadata standard used by a repository, and whether it captures the required (or
desired) detail. See Metadata standards (page 20) for more details.
• the format accepted
• the repository’s licensing framework
• the repository’s handling of sensitive data
• the likely persistence of a repository, or the persistence of records within the repository
• any costs to depositing or maintaining the data
• how searchable the repository is
• whether data can be easily retrieved and/or downloaded
• quality standards during data submission and curation (e.g., auditing processes)
• whether it is important that data be held under national laws.
There are a wide variety of data repositories that researchers can choose from. Each of these has its
own benefits and drawbacks. A broad overview of the different types of repositories is provided
below, divided here into institutional, national, and international repositories.
In addition, Appendix A offers examples of suitable repositories for a range of research outputs that
the TSR Hub is likely to produce. This list is not exhaustive, however, and researchers are
encouraged to explore their options, including institutional repositories (not included in Appendix
A). To this end, researchers may find it useful to consult the REgistry of REsearch data REpositories
(re3data.org) data portal, which provides an overview of existing research data repositories across
all academic disciplines, or the more life sciences-specific BioSharing directory.
Researchers are encouraged to start thinking about which repository they would like to use before
data collection begins, as this may save considerable time later on. Some repositories have strict
metadata requirements, for example, around which you can design your data collection (e.g.,
spreadsheets, scripts for automating computational tasks) to aid metadata creation.
Repositories must be approved by DotE before data are submitted to them. Contact the Chief
Operations Officer to confirm that your preferred repository is approved by DotE, or to gain
approval.
Institutional repositories
All the TSR Hub university nodes have good data management facilities, including data repositories.
Data can therefore be deposited in institutional repositories, with metadata minted by the
university, and appropriate links made to the TSR Hub’s website. The benefits of using institutional
repositories include:
• a wide range of material (and formats) is accepted
• ready access to information and advice on how best to deposit data
• the metadata of institutional repositories are already harvested by ANDS
• data and metadata are held under Australian law.
However, there are a number of limitations to using institutional repositories:
• Some institutions are unwilling (or unable) to commit to holding data in perpetuity because
of limits on the availability of space, and the high costs to host and maintain data. Data held
in institutional repositories with these limitations will typically be retained for the minimum
time legally required (typically 5-7 years), and then re-assessed by the institution. If data has
not been accessed, or is perceived to be of limited public value, then it may be discarded.
Data hosted under these conditions would not meet NESP’s requirements.
• Metadata standards vary across institutions, and may not have the level of detail (or
flexibility) to record the information that researchers would like.
• Several universities are rolling out new data services to their researchers in collaboration
with external repositories. The University of Melbourne and Monash University both have
newly established relationships with figshare, for example. Researchers can mint DOIs
through these services and make their data (or any research output) publicly accessible.
Such accounts are linked directly to an individual researcher, however, and are therefore
closed when a researcher leaves the university; it is the researcher’s responsibility to ensure
that data is migrated to another service provider prior to the end of their contract.
National repositories
There are a number of federally funded data repositories and networks that may be of interest to
TSR Hub researchers, including (but not limited to):
• the Terrestrial Ecosystem Research Network (TERN), which consists of twelve facilities, including
  - Eco-informatics, which has established the Australian Ecological Knowledge and
  Observation System (ÆKOS), a data repository and data portal
  - the Long Term Ecological Research Network (LTERN)
• the Atlas of Living Australia (ALA)
• the Australian Data Archive (ADA), which includes
  - ADA Indigenous, a specialised research data management facility for Australian
  Indigenous research data
  - ADA Social Science, which provides a national service to preserve and provide access
  to data on social, political, and economic issues
• the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS)
• the Australian Antarctic Data Centre (AADC).
National repositories typically offer users:
• specialised advice and services
• the ability to accept a wide variety of data types and also data formats, although this varies
between repositories
• detailed metadata schemas tailored to the subject specialisation, often with a high level of
quality control at the data submission stage
• close links with government agencies, universities, museums, and other research
organisations
• established protocols to handle sensitive data
• in some instances, close links between repositories, to cross-walk relevant data (e.g.,
biodiversity data contained in a dataset deposited with ÆKOS can be cross-walked to ALA)
• the ability to add associated research outputs in the future (e.g., publications)
• data is held under Australian law.
Some potential limitations of these services can include:
• lengthy processes to deposit data because of the level of metadata detail required
• some repositories are only open to data from established contributors (e.g., LTERN).
International repositories
Some data will be best suited to international, subject-specific repositories (e.g., GenBank).
Established protocols for depositing such data should be followed: metadata records can be sent to
ANDS and associated (e.g., ecological) data can be deposited in another subject-specific or general
repository (with appropriate links in the metadata of each dataset). In addition, there are a large
number of general, cross-disciplinary repositories available (e.g., Dryad, figshare). These services:
• are easy to access
• often have good links to journals
• take a wide variety of data and research products.
Again, there are a number of limitations to using these repositories:
• Licensing conditions vary between repositories, and there may not be the intellectual and/or
legal protections that researchers and/or NESP would like. Dryad applies the CC0 waiver to
all data deposited with it, for example, while figshare applies a CC BY 4.0 licence to all
objects publicly stored in its international repository, but a CC0 waiver to metadata only,
data, and databases. The University of Melbourne and Monash University figshare services
offer researchers the full range of Creative Commons licences from which they can choose.
• Data are typically not held under Australian law; in response to concerns about this,
institutional agreements with figshare have stipulated that data must be held in Australia.
• Repositories differ in whether material must be associated with a publication (e.g., Dryad),
or can merely be considered academic (e.g., figshare also accepts grey literature).
• Some repositories charge researchers to deposit data with them; costs may be reimbursed
by some institutions, but not all.
• Cross-disciplinary repositories typically use general metadata standards (e.g., Dublin Core),
and may not have the ability to capture all the information that researchers would like.
• ANDS can harvest from some international repositories, but not all. The appropriate
metadata would therefore need to be sent to ANDS.
Metadata
Metadata, or “data about data”, is the descriptive, contextual information that is associated with, or
refers to, another resource or object. This information should be recorded from the beginning of a
research project, and continue throughout the project.
Recording high quality metadata will ensure that other users can:
• Discover your data: contextual information will enable other users to find data by searching
the internet or a repository. This requires metadata that can be indexed and searched, for
example using geographic, temporal, administrative, or biological keywords or phrases.
• Understand your data: to be useful, other users need to understand what the data
describes, how it was generated, how it is structured, what has been done to it etc.
Metadata must therefore record all the processes that lead to a dataset (or other research
output), and how it should be read.
• Access data: metadata should include clear instructions about how users can access data,
and if and how it can be reused (e.g., licensing). Data that are not in digital format should be
stored and referenced in a manner that facilitates sharing in the event that access is
requested. Where access to the data is constrained, for example due to ethical, legal,
commercial, or security issues, metadata should outline how potential users can apply for
access, and summarise any conditions which they may need to satisfy for access to be
granted. See Sensitive information provisions (page 15) for more details.
There are a large number of resources available on the internet that provide background
information on the value and characteristics of metadata - this information will not be replicated
here. Instead, the following sections provide general guidelines about the types of metadata that
TSR Hub researchers should record, and a range of tools available to do so. Researchers interested in
going into greater depth about metadata are encouraged to explore some of the resources available,
such as the ANDS Metadata webpage and the DCC Using Metadata Standards webpage.
Metadata standards
Metadata normally takes the form of a structured set of elements which describe the data or other
research outputs – its content, formats and internal relationships - and which enable its
identification, location, and retrieval by others. A metadata standard formalises the information that
is recorded. Standards typically support a number of defined functions, including capturing
descriptive, technical, administrative, use, and preservation information about a dataset or object.
The specific metadata standard used will be linked to the choice of data repository: subject-specific
repositories typically use a metadata standard tailored to the subject or type of data that the
repository holds, while cross-disciplinary repositories use very general metadata standards. A large
number of metadata standards potentially meet the needs of the TSR Hub, including multiple
subject-specific standards. When choosing which repository (and thus metadata standard) to use,
consider:
• The context in which the metadata will be created and used: consider who the data may be
shared with, and who will want to access the data. Best practice is to use a standard that has
been developed with a particular community in mind.
• The format of the objects being described: there may be metadata standards that are
particularly suited to your data type.
• The budget: metadata creation can be expensive (although this is not necessarily the case),
but it is worth creating the best possible metadata that the budget will allow.
• The metadata capture method: a lot of technical metadata can be auto-generated, while
descriptive metadata (page 21) typically requires an element of manual input.
• Metadata storage and delivery: choices of metadata software will affect which metadata
standards to implement, and how to reference elements within the standard (e.g., to enable
a search for a species, species names must be indexed).
Importantly, most repositories can harvest or export metadata from a standard different to the one
they use. There is therefore no need to generate metadata in multiple formats, and nor is there one,
ultimate standard that researchers should use.
Appendix A (page 34) lists a range of data types that the TSR Hub is likely to produce, and common
metadata standards and data repositories that are suited to these data types. This list is not
exhaustive: if you have any questions about where your data should be deposited, or which
metadata standard you should use, please contact your Project Leader, the TSR Hub’s Chief
Operations Officer (Melanie King: [email protected]), or your institution’s data management
advisors.
Descriptive metadata
Below are some general, descriptive metadata that should be recorded for all projects, regardless of
the discipline or metadata standard used when depositing data in a long-term repository. At a
minimum, store this metadata in a readme.txt file (or equivalent) with the data itself.
General Overview
Title
Name of the dataset or research project. Refer to Naming data packages (page 25)
for guidelines on how to name datasets before they are made publicly available in
an online repository.
Creator
Names and addresses of the people or organisations who created the data; the
preferred format for personal names is surname first (e.g., Smith, Jane).
Owner
Name(s) and addresses of the people or organisations that own the IP to the data,
formatted as for the Creator.
Personnel
Name(s) and addresses of the people or organisations that helped generate or
analyse the data.
Identifier
Unique number used to identify the data, even if it is just an internal project
reference number (e.g., the TSR Hub project number).
Date
Key dates associated with the data, including: project start and end date; release
date; time period covered by the data; and other dates associated with the data
lifespan, e.g., the maintenance cycle and update schedule. The preferred format is
yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range.
Method
How the data were generated, listing equipment and software used (including
model and version numbers), formulae, algorithms, experimental protocols, and
other things one might include in a lab notebook.
Processing
Data should be deposited in raw, non-aggregated format where possible. Where
this is not possible, describe how the data have been altered or processed (e.g.,
normalised).
Source
Citations to data derived from other sources or third-party datasets, including
details of where the source data is held and how it was accessed. See Copyright and
Licensing (page 10) and Citing data (page 26) for more information.
Funder
Organisations or agencies who funded the research. This will include the NESP, and
possibly other organisations.
Content Description
Subject
Detailed, tabular metadata describing the methods used in the study, including the
survey structure, sampling regime, assumptions, and any constraints to re-use. Also
include keywords or phrases describing the subject or content of the data. This
should include the TSR Hub Theme, Project, and Sub-project numbers (where
applicable), and could also include the ANDS project identifiers (page 27).
Place
All applicable physical locations, including specific locations of observational units
and descriptions of the study/research area.
Language
All languages used in the dataset.
Variable list
Describe all variables in the list, as well as labels for those variables that are
encoded.
Code list
Explanation of codes or abbreviations used in either the file names or the variables
in the data files.
Technical Description
File inventory
All files associated with the project, including extensions.
File Formats
Formats of the data, e.g., FITS, SPSS, HTML, JPEG, etc.
File structure
Organisation of the data file(s) and layout of the variables, where applicable. See
page 30 for more information on file naming conventions.
Version
Unique date/time stamp and identifier for each version. See page 31 for more
information on file versioning.
Necessary software
Names of any special-purpose software packages required to create, view, analyse,
or otherwise use the data.
Access
Rights
The intellectual property rights, statutory rights, licenses, embargoes, or restrictions
on use of the data. See Copyright and Licensing (page 10) for more information.
Access information
Where and how your data can be accessed by other researchers. See Selecting a
suitable repository (page 17) for more information.
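As a minimal sketch of the readme.txt approach described above, the following Python snippet writes a few of the descriptive metadata fields to a plain-text file stored alongside the data. The helper names (format_readme, write_readme), the "Field: value" layout, and all example values (including the date and licence) are illustrative assumptions, not a format prescribed by the TSR Hub.

```python
from pathlib import Path

# Illustrative values only; replace them with your own project's details.
EXAMPLE_METADATA = {
    "Title": "Feral cat abundance, Kakadu National Park, Northern Territory, Australia, 2009",
    "Creator": "Smith, Jane",
    "Identifier": "TSR Hub project 1.1",
    "Date": "2009-01-01",          # hypothetical date in yyyy-mm-dd format
    "Funder": "National Environmental Science Programme (NESP)",
    "Rights": "CC BY 4.0",         # hypothetical licence
}


def format_readme(metadata):
    """Render metadata as one 'Field: value' line per entry."""
    lines = [f"{field}: {value}" for field, value in metadata.items()]
    return "\n".join(lines) + "\n"


def write_readme(metadata, folder):
    """Write readme.txt into the folder that holds the data files."""
    path = Path(folder) / "readme.txt"
    path.write_text(format_readme(metadata), encoding="utf-8")
    return path
```

Calling write_readme(EXAMPLE_METADATA, "my_data_folder") would place the file next to the data, provided the folder already exists.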
Research output submission guidelines
The following guidelines provide detailed instructions to help prepare tabular data for submission to
an online repository, and general guidelines that should be followed when making any research
outputs publicly available. A research output submission checklist can be found in Appendix B.
Data preparation guidelines
The following guidelines are designed for use when preparing data for submission to an online
repository. Where possible, similar protocols should be followed for preparing other data types. For
specific guidelines on preparing other data types, please contact your institution’s data management
section, your Project Leader, or the TSR Hub’s Chief Operations Officer ([email protected]).
File formats
The file format of your data plays a large role in whether it can be accessed and used in future.
Contemporary software and hardware should be expected to become obsolete; however, some
formats are more likely to remain readable than others. Researchers are therefore
encouraged to submit their data in an open format where possible.
Formats likely to be accessible in the future are:
Non-proprietary
Open, with documented standards
In common usage by the research community
Using standard character encodings (e.g., ASCII, UTF-8)
Uncompressed (space permitting)
Unencrypted
Examples of discouraged formats, and preferred alternative formats, are shown below:
Discouraged format: Excel (.xls, .xlsx). Encouraged format: comma separated values (.csv) and other text formats (e.g., .txt). For databases, prefer XML or CSV to native binary formats.
Discouraged format: Word (.doc, .docx). Encouraged format: plain text (.txt) or, if formatting is needed, PDF/A (.pdf) or rich text format (.rtf).
Discouraged format: PowerPoint (.ppt, .pptx). Encouraged format: PDF/A (.pdf).
Discouraged format: Photoshop (.psd). Encouraged format: TIFF (.tif, .tiff), JPEG, JPEG 2000, PNG.
Discouraged format: Quicktime (.mov). Encouraged format: MPEG-4 (.mp4).
Some repositories restrict which file formats they will accept, while others encourage submission of
data in multiple formats (e.g., Excel files (.xls or .xlsx) and .csv). Widely used proprietary formats
(e.g., ESRI shape files) may be acceptable to some repositories, but not all. Always check the
requirements of the repository you will use before preparing your data for submission.
For more information on recommended formats for data archiving, see the AusGOAL open formats
website, the ANDS File Formats website, and the CDL Digital File Format Recommendations.
Data Tables
Provide data in a raw format and at the level at which it was collected.
Do not include summaries unless providing aggregated data in conjunction with the raw
data is conducive to improving data reuse.
Provide as much data as possible in the form of a stand-alone data table (i.e., a single
worksheet).
If you have multiple data tables, each table should include a column that allows the tables to
be linked unambiguously (i.e., a unique identifier).
Remove abbreviations.
Ensure species variables contain only species names, and not descriptors. For example, a
species list should not contain “Eucalyptus pauciflora (live)” or “Eucalyptus pauciflora (x2)”.
Instead, record only the species name (Eucalyptus pauciflora) in the species variable column,
and any descriptive data (e.g., “live”, “2”) as separate variables in additional columns.
Avoid using common names, or include them (in a separate column) in addition to scientific
names.
If you have more than 20 species consider producing a species file that lists scientific names
and common names.
If your study had many sites (or transects/plots within sites), consider producing a file that
lists the coordinates (decimal degrees). See also guidelines for submitting spatial data for
sensitive information (page 25).
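The species-name guideline above can be illustrated with a short Python sketch that separates a trailing descriptor from the species name, so that each can be stored in its own column. The function name split_species_entry is hypothetical, and the sketch assumes that descriptors always appear in parentheses at the end of the entry; real data will likely need additional cleaning rules.

```python
import re

# Matches "<name> (<descriptor>)"; the parenthesised part is optional.
_ENTRY = re.compile(
    r"^\s*(?P<name>[^()]+?)\s*(?:\((?P<descriptor>[^()]*)\))?\s*$"
)


def split_species_entry(entry):
    """Return (species_name, descriptor); descriptor is None if absent."""
    match = _ENTRY.match(entry)
    if match is None:
        raise ValueError(f"Unrecognised species entry: {entry!r}")
    return match.group("name"), match.group("descriptor")
```

The descriptor is returned exactly as written (e.g., "x2"); converting it to a count or a status value is left to the researcher, who knows what the annotation means.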
Columns
Columns should contain a single variable rather than observations from a mix of variables
(e.g., do not incorporate species names and their growth form in the one column).
Create long data tables rather than wide data tables. For example, use separate columns to
record “year” and “count” rather than “count_in_year1”, “count_in_year2” etc.
Where nominal data is repeated in multiple data tables (e.g., Location ID), ensure that
column headers are consistent in each data table.
Use descriptive variable names.
Use a consistent format within a column (i.e., all numeric or all character string).
Check for and repair data inconsistencies (e.g., species name and common name used
interchangeably, truncated or abbreviated names).
Clearly identify missing values using a consistent value (e.g., “NA” or “missing”).
Be careful and consistent with quotations.
Record the reasons for missing values (e.g., “not observed”, “removed”, “censored”) in the
metadata records, e.g., by stating that missing values denote “no survey” or “no observations”.
Ensure consistent recording of date/time formats using unambiguous, standard formats
(e.g., ISO 8601) within single columns and across columns within a data table.
Where possible, do not use integers for nominal or ordinal level variables.
Spatial coordinates (decimal degrees) should be provided in two separate columns (i.e.,
latitude and longitude).
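As an illustration of the long-versus-wide guideline, the sketch below reshapes a wide table with one count column per year into a long table with separate "year" and "count" columns, and writes a single consistent marker for missing counts. The column naming pattern (count_ followed by the year), the location_id column, and the function name wide_to_long are assumptions made for this example.

```python
def wide_to_long(rows, id_column="location_id", prefix="count_", missing="NA"):
    """Convert rows such as {"location_id": "A1", "count_2009": "4", ...}
    into one row per (location, year) observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if not column.startswith(prefix):
                continue  # skip the identifier and any unrelated columns
            long_rows.append({
                id_column: row[id_column],
                "year": column[len(prefix):],
                # Use one consistent marker for missing values.
                "count": value if value not in ("", None) else missing,
            })
    return long_rows
```

Each input row is a dictionary keyed by column header, which is the shape produced by csv.DictReader in the Python standard library.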
Rows
1. Rows should contain observations.
2. Check for and correct typographical errors, missing values, duplicate entries, and unusual
characters.
3. Present data in SI units where possible.
Sensitive Information
Researchers are responsible for ensuring that confidential, private, or sensitive data is removed or
appropriately modified prior to publicly releasing a dataset. Refer to the Sensitive information
provisions (page 15) for more information.
Researchers should refer to the Sensitive Data Service (SDS) when collating their data for advice on
how to handle data for species that are currently considered threatened. However, as new species
are periodically added to schedules to legislation listing threatened species, and it is not possible to
foresee which species will be added in future, researchers are encouraged to publish the spatial
coordinates of field sites or other plot infrastructure separate to information on the species
surveyed or recorded:
Spatial data tables: provide spatial data for observational and experimental units in a
stand-alone data table. Each set of spatial coordinates should be provided with a unique Location
ID in order to spatially reference all sites, plots, transects and so on. Spatial data tables
should contain:
- Spatial coordinates provided at each spatial scale relevant to the study, e.g., from
the transect or sub-plot level (finest level), to the plot or site level (coarser level)
- Details of what the coordinates refer to (e.g., location of SW corner of 1ha plot)
- A Location ID to link the spatial data to the survey data
Survey data tables (containing survey or monitoring data) should not contain spatial data
but should contain the Location ID, which can be linked to the spatial data table.
Compile separate data table descriptions for spatial and survey data tables.
Should a species be listed as threatened in future, data within the spatial data table can be
annotated, withheld, or generalised according to the rules outlined in the SDS.
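The separation of spatial and survey data described above could be sketched in Python as follows. The column names (latitude, longitude, location_id), the "LOC001"-style identifiers, and the function name split_spatial_and_survey are assumptions for illustration; the sketch also omits the description of what each coordinate refers to, which the spatial data table should contain.

```python
def split_spatial_and_survey(records):
    """Split combined records into a spatial table and a survey table
    that are linked by a generated Location ID."""
    location_ids = {}  # (latitude, longitude) -> Location ID
    spatial_table, survey_table = [], []
    for record in records:
        key = (record["latitude"], record["longitude"])
        if key not in location_ids:
            # First time this coordinate pair is seen: assign a new ID.
            location_ids[key] = f"LOC{len(location_ids) + 1:03d}"
            spatial_table.append({
                "location_id": location_ids[key],
                "latitude": record["latitude"],
                "longitude": record["longitude"],
            })
        # The survey row keeps everything except the coordinates.
        survey_row = {k: v for k, v in record.items()
                      if k not in ("latitude", "longitude")}
        survey_row["location_id"] = location_ids[key]
        survey_table.append(survey_row)
    return spatial_table, survey_table
```

Because the survey table carries only the Location ID, the spatial table can later be withheld or generalised without editing the survey data.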
Naming data packages
The titles given to TSR Hub datasets or data packages that are submitted to online repositories
should provide a description of the data that is long enough to differentiate it from other similar
datasets or packages. Using a consistent naming format will also facilitate the TSR Hub’s tracking of its
research outputs. The following format should be used:
[Theme name (Project name)] [:] [Data type (such as experimental unit, observational unit,
and/or measurement methods)] [,] [Geographic location] [,] [State] [,] [Country] [,] [Annual
or seasonal ranges]
For example, a dataset reporting feral cat abundance in Kakadu National Park in 2009 that stems
from Theme 1 (Taking the threat out of threatened species) and Project 1.1 (Developing
evidence-based management tools and protocols to reduce impacts of introduced predators on threatened
mammals) would receive the following title:
Taking the threat out of threatened species (Developing evidence-based management tools
and protocols to reduce impacts of introduced predators on threatened mammals): Feral cat
abundance, Kakadu National Park, Northern Territory, Australia, 2009.
The full citation of this dataset, if authored by Brown and colleagues in 2018, would then be:
Brown et al. (2018) Taking the threat out of threatened species (Developing evidence-based
management tools and protocols to reduce impacts of introduced predators on threatened
mammals): Feral cat abundance, Kakadu National Park, Northern Territory, Australia, 2009.
Name of preferred repository, http://doi.org/10.1718/492allmadeup356d.
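To show how the title format above fits together, here is a minimal Python sketch that assembles a title from its components. The function name build_package_title and its parameter names are hypothetical; only the ordering and punctuation are taken from the format given above.

```python
def build_package_title(theme, project, data_type, location, state,
                        country, period):
    """Assemble a data package title in the order:
    Theme (Project): Data type, Location, State, Country, Period."""
    return (f"{theme} ({project}): {data_type}, {location}, "
            f"{state}, {country}, {period}")
```

Building titles from named components, rather than typing them freehand, helps keep the order and punctuation identical across a project's data packages.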
Saving data tables
Data tables that are to be submitted to a repository should be saved in plain text format (e.g., .csv or
.txt) using a name that clearly describes its contents. The TSR Hub recommends that you use the
following naming convention:
[theme]_[project]_[data type]_[location]_[YYYY or YYYY-YYYY]_[document type]
Use a logical abbreviation for the theme and project names; all data files produced by
this theme and project should use the same abbreviation
Use lowercase with underscore.
Generalised location is optional
Document type is optional (e.g., data_table, table_description)
For example, for data that is linked to the citation:
Brown et al. (2018) Taking the threat out of threatened species (Developing evidence-based
management tools and protocols to reduce impacts of introduced predators on threatened
mammals): Feral cat abundance, Kakadu National Park, Northern Territory, Australia, 2009.
Name of preferred repository, http://doi.org/10.1718/492allmadeup356d.
An appropriate file name could be:
ttts_ebmt_cat_abundance_data_2009.csv
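The naming convention above can be sketched as a small Python helper. The function names (build_file_name, _slug) and the default .csv extension are assumptions for illustration; the abbreviations for the theme and project are whatever the project has agreed on.

```python
import re


def _slug(text):
    """Lowercase text and replace runs of other characters with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def build_file_name(theme, project, data_type, years, location=None,
                    document_type=None, extension="csv"):
    """Build [theme]_[project]_[data type]_[location]_[years]_[document type]."""
    parts = [_slug(theme), _slug(project), _slug(data_type)]
    if location:          # generalised location is optional
        parts.append(_slug(location))
    parts.append(years)   # "YYYY" or "YYYY-YYYY"; the hyphen is kept for ranges
    if document_type:     # document type is optional
        parts.append(_slug(document_type))
    return "_".join(parts) + "." + extension
```

For instance, build_file_name("ttts", "ebmt", "cat abundance data", "2009") reproduces the example file name given above.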
Data table descriptions
Complete a separate table description for each data table (including those supplied as part
of a data package).
Include descriptions of all columns in each data table.
Ensure information that is necessary for re-use (e.g., projections, datum) is included.
Include the units relevant to each column, and ensure that the column descriptions relate to
the measurement protocols described in the metadata for the data package.
Citing data
Datasets should be formally cited in an article, and included in the article’s reference list. Citing data
is important to give the data producer credit, allow easier access to data for re-purposing or re-use,
and to enable others to verify your results. See Naming data packages for guidelines on how TSR
Hub datasets and data packages should be titled.
Many data repositories (and some journals) provide explicit instructions for citing their contents. If
no citation information is provided, you can still construct a citation following generally agreed-upon
guidelines from sources such as the CODATA Report on data citation, the ESIP Federation Data
Citation Guidelines, and the DataCite Metadata Schema.
Five core elements are usually included in a dataset citation, with additional elements added as
appropriate.
Creator(s) – may be individuals or organisations
Publication year when the dataset was released (this may be different from the Access
date; see below)
Title – follow the guidelines set out in Naming data packages.
Publisher – the data centre, archive, or repository
Identifier – a unique public identifier (e.g., a DOI)
Although the core elements are sufficient to refer to static datasets, additional elements may be
required if you wish to cite a dynamic dataset or a subset of a larger dataset.
Version of the dataset analysed in the citing paper
Access date when the data was accessed for analysis in the citing paper
Subset of the dataset analysed (e.g., a range of dates, geographic locations, or record
numbers, a list of variables)
Verifier that the dataset or subset accessed by a reader is identical to the one analysed by
the author (e.g., a Checksum)
Location of the dataset on the internet, needed if the identifier is not "actionable"
(convertible to a web address)
See the DMPTool website for more details and example citations.
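As an example of the "Verifier" element listed above, the following Python sketch computes a checksum for a data file using the standard library. The choice of SHA-256 and the function name sha256_of_file are assumptions; repositories may specify a different algorithm.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A reader who downloads the dataset can run the same calculation and compare the result with the checksum quoted in the citation.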
In addition to citing data within a journal article, data that is deposited in a repository can itself be
supplemented with a data paper. Data papers are peer-reviewed publications that describe a
dataset, but which do not draw any conclusions about it. Examples of journals that accept data
papers that may be applicable to TSR Hub data are Nature Scientific Data, Ecological Archives, and
Geoscience Data Journal. See this University of Edinburgh website for more options.
Data can also be published as supplementary material associated with a scientific paper that meets
NESP’s public accessibility requirements.
Linking data with related research outputs
The metadata of archived datasets should be updated when related research outputs (e.g., journal
articles, books, grey literature) are published. Update the metadata to include the DOI and location
(weblink) of the published item. Some repositories will also allow copies of items related to a dataset
to be uploaded as associated files.
Acknowledging NESP support and the TSR Hub
Support from DotE needs to be acknowledged in all research outputs, including data, publications,
presentation material (including presentations at conferences), promotional and advertising
material, signs, plaques, etc.
As per the TSR Hub Media Protocol, research outputs should include the boilerplate:
Reference to all partners and contributing organisations
The following paragraphs:
The Threatened Species Recovery Hub brings together Australia’s leading conservation scientists to
help develop better management and policy for conserving Australia’s threatened species.
It is supported by the Federal Department of the Environment’s National Environmental Science
Programme (NESP), a long-term commitment to support environmental and climate research.
The boilerplate can be included in the acknowledgements section of publications, or the metadata of
other research outputs. Where possible, a link to the TSR Hub website should also be included.
ANDS project identifiers
The Australian National Data Service (ANDS) is developing project-specific codes for each of the NESP Hubs.
ANDS will use these keywords to find and link NESP research outputs with the ANDS data portal,
Research Data Australia (RDA), which will act as a central metadata repository for all the Hubs. ANDS
regularly harvests metadata from a large number of Australian research institutes and data
repositories. Metadata that is deposited in these partner organisations will be automatically ported
into ANDS, while the TSR Hub will notify ANDS of research outputs deposited elsewhere.
ANDS’ project codes will be available from TSR Hub Project Leaders. Researchers will need to include
their project’s ANDS code in their research output’s metadata. This code should be included as a
keyword if possible, but can go in other fields (e.g., free-text fields) if keyword options are restricted.
Permission to publish research outputs
Your Project Leader should approve the final version of all research outputs (and their metadata)
before they are submitted to a journal, online repository, or other publishing platform. Approval is
required once only, unless there are major changes made to the output during the revision process.
Notifying DotE of TSR Hub research outputs
A copy of the NESP Research Product Submission Form should be sent to DotE at least five (5)
working days before an output is publicly released. Forms are available from the TSR Hub website.
The Form should be accompanied by a copy of the research output in a format that is approved for
public release.
Linking research outputs with the TSR Hub’s website
The TSR Hub’s website will act as a centralised, up-to-date source of information on the Hub’s
activities, including a link to (or copy of) all research outputs arising from the TSR Hub.
To facilitate this, please send your Project Leader and the Chief Operations Officer a copy of the
NESP Research Product Submission Form and the research output at the time you submit them to
DotE (i.e., at least five working days before the output is made publicly accessible, and within 12
months of publication). For publications, please note under which publication scenario the
document can be made publicly accessible.
Data management at the project-level
TSR Hub researchers should consider how they will manage their data before projects begin, and should
revisit and refine their data management strategy at regular intervals throughout the project’s life.
Data management plans are formal documents that outline what you will do with your data before,
during, and after a research project. Each project is required to develop a data management plan
within 12 months of starting, as a guide to how the data it collects will be managed. A blank
copy of the project-level data management plan can be found in Appendix C. Completed copies of
the project-level plans should be sent to the Chief Operations Officer ([email protected]).
Project Details
Project number and name: give the TSR Hub project number and name.
Project Leader(s): list all the Project Leaders.
Lead organisation(s): list the lead organisations.
Partner Investigator(s): list all the partner investigators.
Partner organisations: list the partner organisations.
Project leader contact details: provide the name and contact details (telephone number and email)
of the person(s) primarily responsible for the project.
Project summary: provide a brief summary of the project.
Collect
What types of research outputs will be collected and/or produced during the project? Research
outputs (including data) can be grouped into four main categories:
observational e.g., telemetry, surveys, images, sensor readings
experimental e.g., habitat manipulations, gene sequences
simulation e.g., population models, climate models
derived/compiled e.g., meta-analyses, compiled database, reports, manuscripts.
These categories differ in their reproducibility, and their cost and time to generate. Which categories
your research outputs come from will affect how you manage them.
Describe the methods, processes, and software that will be used to generate each type of research
output. Give a brief overview of how you will conduct your research, and the research outputs that
you will produce.
Is ethics approval (human and/or animal) required for this project? If so, give details. Projects
involving research on animals and humans require approval by an ethics committee: if your project
requires ethics approval, list the approval number(s). It is the researcher’s responsibility to be aware
of the relevant legislation and institutional policies. For more information, see:
NHMRC/ARC/UA Australian Code for the Responsible Conduct of Research (2007)
NHMRC Values and Ethics: Guidelines for Ethical Conduct in Aboriginal and Torres Strait
Islander Health Research (2003)
Australian Institute of Aboriginal and Torres Strait Islander Studies Guidelines for Ethical
Research in Australian Indigenous Studies (2012)
Australian Code for the care and use of animals for scientific purposes (2013).
Describe any third-party data to be used in the project and the licence agreement to use the data.
Third-party data should be accessed under a data sharing licence agreement: see Copyright and
Licensing (page 10) for more details. Provide the name and source of any third-party data that your
project will use, and the data sharing licence agreement that you have entered into to use that data
(e.g., which Creative Commons licence). Include details on any constraints to the re-use of data.
How and where will data and related information be backed up and stored during the project?
Regularly backing up data and related information, including metadata, is an integral part of data
management.
General advice for backing up your data:
Have three copies in at least two locations (e.g., original + external/local backup +
external/remote backup).
Geographically distribute your copies to reduce the risk of losing all backups (e.g., because
of fire or flood).
Use software to back up your hard drive (e.g., Windows File History, OS X Time Machine,
Linux/UNIX rsync).
Test your backup system to confirm that you can retrieve and read your files. Do this when
you initially set up the system, and on a regular schedule thereafter.
Ensure that your files can be accessed by at least one other, trusted person, in case of
accident or injury.
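The advice to test your backup system can be partly automated. The following Python sketch compares every file under an original folder with its counterpart in a backup folder and reports files that are missing or whose contents differ. The function name find_backup_problems is hypothetical, and the sketch assumes the backup mirrors the original folder structure; it does not replace opening a sample of files to confirm that they are readable.

```python
import filecmp
from pathlib import Path


def find_backup_problems(original_dir, backup_dir):
    """Return relative paths of files missing from, or differing in, the backup."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        copy = backup_dir / relative
        # shallow=False compares file contents, not just size and timestamps.
        if not copy.is_file() or not filecmp.cmp(source, copy, shallow=False):
            problems.append(str(relative))
    return problems
```

An empty list means every original file has an identical copy in the backup.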
You can back up data to your personal computer, external hard drives, or institutional server. All
of the TSR Hub node institutions have good infrastructure to support data management, including
backups. The ANDS website provides links to the data management plans and toolkits for most
Australian universities.
In which file format(s) will data be stored during the project? List the types of files your project will
produce, e.g., .csv files, word files, R project, ESRI shape files.
What is the expected volume of data that will be collected or generated? The answer to this
question will determine the amount and type of data storage you will need to access. Some
questions to keep in mind are:
1. Are you manually collecting and recording data?
2. Are you using observational instruments and computers to collect data?
3. Is your data collection highly iterative?
4. How much data will you accumulate every month or every 90 days?
5. How much data do you anticipate collecting/generating by the end of your project?
How stable is the data, and how will you control for different file versions? Data can be fixed or
changing over the course of a project (and possibly after the project’s end):
Fixed or static datasets: do not change after being collected or generated (e.g., data
collected during a single, short-term experiment), other than to correct errors or other
minor issues.
Growing or dynamic datasets: new data may be added, but the old data is not changed or
deleted (e.g., monitoring data that is collected on an annual basis).
Revisable datasets: data may be added, deleted or changed (e.g., predictions based on
computational models that need to be updated with new material and/or inaccurate
material deleted or corrected).
Which category your data falls into will affect how you organise your data, the level of versioning
you will need to undertake, and how, when and where you deposit your data (see Timelines for
making research outputs publicly available (page 13) for more details).
Using naming/versioning conventions:
makes it easier to locate specific files
prevents accidental overwrites or deletion
preserves differences in the content or function of different file versions
prevents confusion if multiple people are working on shared files
General recommendations for file naming and version control:
File and folder names
define a naming convention and use it consistently, especially if multiple people are
sharing files. For example, the top-level directory/folder could include i) the project title, ii)
a unique identifier, and iii) the date (e.g., yyyy-mm-dd or yyyymmdd).
the sub-directory structure should also have clear, documented naming conventions. For
example, separate files or directories could apply to each run of an experiment, version of a
dataset, or person in the group.
use underscores (_) and not spaces to separate terms
keep names short (15-25 characters)
use folder names that describe the general category of files that the folder contains
use file names that describe the contents of the file
do not include the folder name in the file name unless you are sharing files and there might
be confusion about which folder a file should be added to
reserve the 3-letter file extension for the file format (e.g., .txt, .pdf, .csv)
record all changes to a file, no matter how small, and discard obsolete versions after making
backups.
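Some of the file-naming recommendations above can be checked automatically. The sketch below is a minimal, hypothetical checker (the function name check_file_name and the wording of its messages are assumptions) that flags spaces, a missing extension, and names longer than the suggested upper limit of 25 characters.

```python
def check_file_name(name, max_length=25):
    """Return a list of guideline violations for a file name (empty if none)."""
    issues = []
    if " " in name:
        issues.append("contains spaces; use underscores instead")
    stem, dot, _extension = name.rpartition(".")
    if not dot:
        # No "." found, so the whole name is the stem.
        stem = name
        issues.append("no file extension")
    if len(stem) > max_length:
        issues.append(f"name longer than {max_length} characters")
    return issues
```

The length limit is a parameter because a project may reasonably agree on longer names when the convention requires several descriptive components.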
File versions
include the date at the beginning of the file name (e.g., yyyymmdd). Update this number
each time the file is saved
for the final version, substitute the word FINAL for the version number
turn on version tracking in collaborative works or storage spaces (e.g., Wikis, GoogleDocs)
use versioning software that automatically tracks versions.
Whichever file naming and version control protocols you define and use, ensure that everyone in the
project is aware of the protocols, and periodically check that they are being followed.
How will you keep data secure during the project? Data security is the protection of data from
unauthorised access, use, change, disclosure, and destruction. Ensure your data is safe with regards
to:
Network security: keep confidential data off the internet and, in extreme cases, put sensitive
material on computers not connected to the internet.
Physical security: restrict access to buildings and rooms where computers or media are kept.
Computer systems and files: keep virus protection up to date, set passwords on files and
computers, do not send confidential material via email (or encrypt files if you must).
Some types of data (e.g., human data that contains identifiable or re-identifiable information) may
have specific storage or encryption requirements. Researchers must ensure that they are aware of
the relevant state and federal legislation and guidelines governing privacy regarding their research.
For more details, contact your institution’s ethics committee and refer to the NHMRC National
Statement on Ethical Conduct in Human Research.
Assure
Describe conditions during data collection that may affect the quality of data. Examples of
conditions that may affect data quality include equipment settings or failure, environmental
conditions, inadequate training, a lack of calibration, missing values or incorrect entry of data.
Aspects of data quality include:
Accuracy: how well do data accurately represent the “real world” values they are expected
to model?
Completeness: is all the required information available?
Consistency: are values consistent across data sets? Do distinct occurrences of the same
data agree with each other? If multiple files exist then a linking variable is required so that
they can be joined.
Validity: how correct is the data?
Conformity: if there are expectations that data conforms to a specified format, does this
occur?
Integrity: is there data missing, or are important relationship linkages missing?
Describe any quality check processes. Quality checks should include confirmation that the Data
preparation guidelines (page 23) have been followed, and protocols to have draft publications
reviewed by the authors and Project Leader(s) prior to publication. Quality checks could also include
processes to address the aspects of data quality outlined above.
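As one possible starting point for such quality checks, the Python sketch below tests three of the aspects listed above on tabular data: consistency (every survey row links to a known Location ID), integrity (no duplicate rows), and completeness (a count of missing values). The function name check_tables, the column name location_id, and the "NA" marker are assumptions for this example.

```python
def check_tables(survey_rows, spatial_rows, id_column="location_id",
                 missing="NA"):
    """Report unlinked rows, duplicate rows, and missing values."""
    known_ids = {row[id_column] for row in spatial_rows}
    report = {"unlinked_rows": [], "duplicate_rows": [], "missing_values": 0}
    seen = set()
    for index, row in enumerate(survey_rows):
        # Consistency: the linking variable must exist in the spatial table.
        if row.get(id_column) not in known_ids:
            report["unlinked_rows"].append(index)
        # Integrity: flag rows that exactly repeat an earlier row.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            report["duplicate_rows"].append(index)
        seen.add(fingerprint)
        # Completeness: count empty cells and explicit missing markers.
        report["missing_values"] += sum(
            1 for value in row.values() if value in ("", None, missing)
        )
    return report
```

Row positions in the report are zero-based indexes into survey_rows. Exact duplicates are only flagged, not removed, because repeated observations can be legitimate.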
Describe
Which metadata standards will be used to describe the data? The metadata standards you use
should be tailored to the type(s) of data and research outputs you produce. See Metadata standards
(page 20) and Appendix A (page 34) for more information.
Which tools will be used to create and manage metadata? There are a number of tools available on
the internet to assist in metadata management; these should always be supplemented by your own
manual entries. See Metadata standards (page 20) for more information on methods to record
metadata, and Descriptive metadata (page 21) for general metadata that you should record about
your project.
Which standard keyword lists and/or controlled vocabularies will be used to describe data? The
metadata of all TSR Hub projects should include the TSR Hub boilerplate (page 27), and the
project-specific codes generated by ANDS (page 27). In addition, you may wish to include other keywords
relevant to your research subject or location (e.g., taxonomic, ecological, geographic, or sociological
keywords).
Preserve and Discover
How will data be shared between project partners and external researchers? Describe the tools
that you will use to share between collaborators (e.g., Dropbox, Google Drive, Bitbucket, email).
How will data and other research outputs be stored and made freely available to the public in the
long term? Provide details of the freely accessible data repository/repositories in which you will
deposit your data for long-term public access. See Selecting a suitable repository (page 17) and
Appendix A (page 34) for more information.
In what format will data be stored and published in the long term? List the types of files in
which your data will be stored/deposited. See File formats (page 23) for more information on
suitable file formats for long-term storage.
How will data be retained and/or disposed of? While it is expected that data and research outputs
will be made permanently accessible on the internet, researchers must also manage the original
and/or hard copies of their data. Universities are guided by the requirements of the Australian Code
for the Responsible Conduct of Research, which sets out retention periods for research data,
research records and primary materials. In addition, there is a need to consider:
specific provisions set out in Commonwealth and State legislation concerning archives and
record keeping, especially those materials of heritage value,
disciplinary requirements, and
possibly other regulatory requirements.
Here, outline how and when you will dispose of the different research outputs that your project will
produce.
Which licensing policy will be used to share data and other research outputs? List which Creative
Commons licensing policy/policies you will apply to your research outputs. See Copyright and
Licensing (page 10) for more information.
Will there be an embargo period for any data? If yes, give details. While it is expected that TSR Hub
research outputs will be made freely accessible to the public within 12 months, embargoes may be
applied in exceptional circumstances. Detail whether you will require an embargo for any of your
research outputs, including the reasons why, and the length of the embargo period. See Timelines
for making research outputs publicly available (page 13) for more information.
If applicable, describe procedures to access restricted data. It may not be suitable to make some
types of data (e.g., some threatened species data, or social or cultural information) fully accessible
to the public. If your project will produce sensitive information, detail how you will store the data for
long-term preservation, and how you will manage access to the data. See Sensitive information
provisions (page 15) for more information.
Data Management Administration
What is the staff allocation for data management? State the full-time equivalent (FTE) that has
been allocated to manage data for your project.
Who is responsible for managing and backing up data prior to its submission to a repository? Give
the name and contact details (email, telephone number) of the staff member(s) responsible for
managing and backing up data and metadata.
How will project participants be trained to manage data? Detail how you will ensure that all
researchers (staff and students) are managing data appropriately.
What is the technology cost to store data? Detail the costs you will incur to store your data (e.g.,
hard drives, cloud storage access costs).
What is the technology cost to make data accessible? Detail any costs you will incur to deposit data
in online repositories.
Date this plan was prepared. Give the date that this data and information management plan was
completed.
Date of next review. Give the date that you will next review your data and information management
plan: plans should be reviewed regularly, and at least annually.
Appendices
Appendix A. TSR Hub data, data repositories, and metadata schemas
The following tables provide examples of data that are likely to be produced by the TSR Hub,
common metadata standards associated with those data types, and a selection of subject-specific
repositories that use those metadata standards. Information on common cross-disciplinary
metadata standards and repositories is provided in the final table. Repositories preceded by a star (*)
are Australian-based, although they may also take data from outside Australia; all others are
international.
The metadata standards and repositories listed here are just a small selection of the repositories
that exist. For more information on standards and repositories for each subject or data type (and
also any that are not listed here), see:
re3data.org: a REgistry of REsearch data REpositories
BioSharing: a directory of life sciences databases and metadata standards
DCC disciplinary metadata standards webpages
In addition to the repositories listed here, each of the TSR Hub nodes has an institutional repository.
Contact the relevant facility in your institution for more information; the ANDS website also has links
to the data management plans and toolkits for Australian Universities.
Spatial data
Metadata
ANZLIC Metadata Profile: a profile of ISO 19115, and NESP’s preferred metadata
standard for spatial data. This standard can be used to describe a wide variety of
non-spatial material: see the ANZLIC Metadata Profile Guidelines Version 1.2 for
examples. Also maps to Dublin Core.
Metadata creation tool: ANZMet Lite metadata collection tool
Repositories
* AODN – Australian Ocean Data Network Portal: the primary access point for the
search, discovery, access and download of data collected by the Australian marine
community. Uses the Marine Community Profile extension of ISO 19115. Re3data
summary.
AusCover: a TERN facility that provides remote sensing data time-series and
satellite based biophysical map products for Australia. AusCover also provides
associated field calibration and validation data at continental scales. Uses ANZLIC
metadata.
Comments
See the NESP Data and Accessibility Guidelines for more information, and the
ANZLIC Metadata Profile Guidelines Version 1.2 for examples.
Ecological data
Data that is collected using field techniques such as surveys, trapping, radio- or GPS-tracking, and
camera traps. May be associated with demographic, biological, and/or genetic data, or experimental
manipulations (e.g., of habitat). Ecological data often has an important spatial element, and is
typically structured at some level (e.g., plot-based).
Metadata
EML - Ecological Metadata Language: a standard developed by the ecological
discipline to describe ecological data
Metadata creation tool: Morpho
Summaries: DCC and BioSharing
Darwin Core: developed to describe biological diversity. Primarily based on taxa and
their occurrence as recorded by observations, specimens, samples and related
information. Often supplemented with EML to provide an ecological context to the
data.
Metadata creation tool: Darwin Core Archive Assistant
Summaries: DCC and BioSharing
Repositories
* ÆKOS - Australian Ecological Knowledge and Observation System: A TERN facility
that takes structured (e.g., plot-based) ecological data; this can include
photographs, sound recordings, maps, molecular data, and links to additional
literature. Data is submitted using the SHaRED data submission tool. ÆKOS uses a
mix of metadata standards, including EML and ICAT. re3data summary.
KNB - The Knowledge Network for Biocomplexity: an international repository that
takes ecological and environmental data. BioSharing and re3data summaries.
* LTERN – Long Term Ecological Research Network: A TERN facility that is composed
of 12 ecological plot networks across Australia that have been monitored for
several years, and up to decades.
* ALA - Atlas of Living Australia: repository for species records (point-based data),
based on field observations, surveys, and specimens from biological collections. Can
also take photographs, sound recordings, maps, molecular data, and links to
additional literature. The ALA is the Australian node of the GBIF. Re3data summary.
GBIF – Global Biodiversity Information Facility: an international open data
infrastructure, funded by governments, that coordinates the biodiversity
information facilities of participating countries and organisations. The GBIF
Integrated Publishing Toolkit (IPT) uses EML as its metadata standard. BioSharing
and re3data summaries.
Comments
Specialised data associated with ecological data should be submitted to the most
appropriate repository (e.g., genetic data sent to GenBank), with appropriate links
to the associated ecological data.
Researchers should consider separating occurrence data for sensitive species (or
potentially sensitive data) from the spatial locations of occurrences. See Sensitive
Information (page 25) for more information.
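One way to separate sensitive occurrence data from precise locations is sketched below in Python. This is an illustration under stated assumptions, not a prescribed TSR Hub method: the field names follow Darwin Core terms (occurrenceID, scientificName, decimalLatitude, decimalLongitude), the species names and coordinates are invented, and generalising public coordinates by rounding to one decimal place is an assumed choice that should be replaced by whatever rule your project and repository agree on.

```python
def split_sensitive(records, sensitive_species, decimals=1):
    """Split occurrence records into a public table and a restricted table.

    Public table: every record, with coordinates generalised (rounded) for
    sensitive species. Restricted table: precise coordinates for sensitive
    species only, linked back to the public table by occurrenceID.
    """
    public, restricted = [], []
    for rec in records:
        public_rec = dict(rec)  # copy so the original records are not modified
        if rec["scientificName"] in sensitive_species:
            restricted.append({
                "occurrenceID": rec["occurrenceID"],
                "decimalLatitude": rec["decimalLatitude"],
                "decimalLongitude": rec["decimalLongitude"],
            })
            public_rec["decimalLatitude"] = round(rec["decimalLatitude"], decimals)
            public_rec["decimalLongitude"] = round(rec["decimalLongitude"], decimals)
        public.append(public_rec)
    return public, restricted


# Invented example records; the species names are deliberately fictitious.
records = [
    {"occurrenceID": "occ-1", "scientificName": "Exampleus sensitivus",
     "decimalLatitude": -35.2809, "decimalLongitude": 149.1312},
    {"occurrenceID": "occ-2", "scientificName": "Exampleus communis",
     "decimalLatitude": -37.8136, "decimalLongitude": 144.9631},
]

public, restricted = split_sensitive(records, {"Exampleus sensitivus"})
print(public)
print(restricted)
```

The restricted table can then be stored under controlled access, while the public table is deposited in an open repository; the shared occurrenceID lets authorised users rejoin the two.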
Sociological data
Social science data collected using quantitative or qualitative methods.
Metadata
DDI – Data Documentation Initiative: an international effort to create a standard to
describe statistical and social science data
Metadata creation tool:
Summaries: DCC and BioSharing
da|ra: an extension of the DataCite metadata standard, developed for social
sciences
Metadata creation tool: via the GESIS data archive
Repositories
* ADA - Australian Data Archive: a national service for the collection and
preservation of digital data. Consists of seven sub-archives, including Social Science,
Historical, and Indigenous. Information relating to Indigenous communities can be
deposited with ATSIDA (Aboriginal & Torres Strait Islander Data Archive): see the
ATSIDA website for protocols and more information. Uses the DDI metadata
standard. Re3data summary.
GESIS – Leibniz-Institute for the Social Sciences: an open repository for social
science and economic data. Uses the da|ra metadata standard. Re3data summary.
ICPSR – Inter-university Consortium for Political and Social Research: international
repository for political, social, and behavioural data. Offers training in quantitative
methods to facilitate effective data use. Based at the University of Michigan in the
United States, it holds one of the largest collections of computer-readable data in
the social sciences. Uses DDI metadata standards. See also openICPSR. Re3data summary.
QDR – Qualitative Data Repository: accepts digital data generated through
qualitative and multi-method research in the social sciences. Re3data summary.
Comments
The Atlas of Living Australia is also developing a number of tools to record
Indigenous Ecological Knowledge, and they are interested in working with new
projects. Contact the ALA if you are interested in being involved in the pilot project.
Grey literature
May include reports, fact sheets, presentations, recommendations,
Comments
Attachment E of the NESP Data and Accessibility Guidelines provides detailed
suggestions for best practice in the publication and management of grey literature
and its associated metadata.
Depending on the format and subject, grey literature may be suitable for
deposition in a subject-specific repository, either as a stand-alone document or as
associated information for other material (e.g., a data set). Alternatively, grey
literature can be submitted to institutional repositories or to several
cross-disciplinary repositories (e.g., figshare, or to Dryad if the document is related to a
scientific publication).
A range of metadata standards have the flexibility to record information relevant to
grey literature, for example Dublin Core, ANZLIC, EML.
Models and simulations
Metadata
CIM – Common Information Model: a standard that describes climate data, the
models and software from which they derive, the geographic tools used to calculate
and project them, and the simulations and other processes that produced them.
DCC summary
SED-ML: an XML-based format for encoding simulation setups, to ensure
exchangeability and reproducibility of simulation experiments. It follows the
requirements defined in the MIASE guidelines.
BioSharing summary
Repositories
BioModels: a repository of computational models of biological processes. BioSharing
summary.
Comments
Guidelines for depositing model code and simulations in online repositories are
currently not well developed; however, several options exist.
Models can be published in stand-alone papers (published under CC BY 4.0) and
then referred to in later papers, although any changes to the models will need to be
detailed. Alternatively, the version of the code and simulations used in a paper can
be included in the paper’s supplementary material. Again, the paper must be
published under CC BY.
The material can be deposited in a repository: BioModels is one example of a
repository of models for biological processes, although it focuses on cell biology.
Other general repositories (e.g., Dryad, figshare) might also be suitable, as might
services such as GitHub.
Finally, models and simulations can be placed in a repository as material associated
with a dataset.
See also the Minimal Information Required In the Annotation of Models (MIRIAM)
website and the related Minimum Information About a Simulation Experiment.
Image or video data
Comments
Image or video data will typically be associated with ecological data (e.g., derived
from camera traps or standardised photo points) and is therefore best deposited with
the ecological data in an ecological data repository. However, the metadata for the
data package should include detailed metadata for the images/videos. See the
Guidelines for Handling Image Metadata Version 2.0 for information on metadata
standards for a wide range of digital media, and also the NISO Metadata for Images.
Meta-analyses and compiled data
Comments
Depending on the subject, data compiled from a range of sources (e.g., scientific
publications, legal documents, management plans, species profiles etc.) may be
suited to a subject-specific repository, or a general inter-disciplinary repository. See
the appropriate subject within this appendix for more information.
General research data, cross-disciplinary standards and repositories
The following metadata standards and repositories are inter-disciplinary and can therefore
describe/take the full range of research outputs.
Metadata
Dublin Core: an inter-disciplinary standard; one of the best known and most widely
used metadata standards
Metadata creation tool: a range of tools is available here
Summaries: DCC and BioSharing
DataCite Metadata Schema: an inter-disciplinary standard
Metadata creation tool: DataCite API for datacentres
Summaries: DCC and BioSharing
RIF-CS: developed as a data interchange format to support the electronic exchange
of collection and service descriptions.
Metadata creation tool: can only be created via the RDA website
Summary: BioSharing
Repositories
Data Dryad: an international repository of data underlying peer-reviewed scientific
and medical literature. Uses a version of Dublin Core as its metadata standard, in
addition to Darwin Core, Bibliographic Ontology and its own repository-specific
elements. Uses a CC0 waiver. BioSharing and re3data summaries.
figshare: an international, inter-disciplinary repository that can take
academic-related material in almost any format. Material does not need to be associated with
a publication. figshare has entered into institutional relationships with the University of
Melbourne and Monash University in Australia, and several others in the UK. Data is
licensed under CC BY 4.0. BioSharing and re3data summaries.
* RDA – Research Data Australia: a metadata repository administered by ANDS.
Harvests metadata from partner organisations (and can therefore consume a
variety of metadata standards), but does not hold data itself. Data is licensed under CC
BY 4.0. re3data summary.
Appendix B. Research output submission checklist
Pre-submission checklist
Select a suitable repository based on the subject and format, the repository’s specialisation
and/or audience, its licensing terms, and legacy arrangements.
Confirm with the TSR Hub’s Chief Operations Officer that the repository you wish to use is
approved by DotE. If the repository is not approved, send its details to the Chief Operations
Officer so that approval can be obtained.
Obtain the consent of the data owner (if not yourself) to publish a research output (data,
publication etc.).
Ensure that the metadata for your research output is detailed and complete. Refer to the
Metadata section for an overview of the elements likely to be required.
Confirm that the research output will be published under the most open licence possible, and
preferably a CC BY 4.0 licence.
Obtain permission to publish the research output from your Project Leader.
Ensure that your data meets all ethical and legal requirements for handling sensitive
information.
Check that your research output (or its metadata) includes:
The TSR Hub boilerplate to acknowledge all partners and contributing organisations,
DotE, and NESP
A link to the TSR Hub website
The Theme and Project under which your project was funded
All relevant ANDS project identifiers
Updated links between related research outputs: cite datasets in publications, and
include citations and links to associated publications in the metadata of datasets.
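The final item of the pre-submission checklist (the elements that a research output or its metadata should include) lends itself to a simple automated check. The sketch below is written in Python and assumes that the metadata is held as a dictionary; the field names (boilerplate, hub_website, theme, project, ands_identifiers, related_outputs) are hypothetical labels chosen for this example and do not correspond to any particular metadata standard.

```python
# Hypothetical field names mapped to the checklist item each one represents.
REQUIRED_FIELDS = {
    "boilerplate": "TSR Hub boilerplate acknowledging partners, DotE, and NESP",
    "hub_website": "link to the TSR Hub website",
    "theme": "Theme under which the project was funded",
    "project": "Project under which the work was funded",
    "ands_identifiers": "relevant ANDS project identifiers",
    "related_outputs": "links to related datasets and publications",
}


def missing_items(metadata):
    """Return descriptions of checklist items that are absent or empty."""
    return [
        description
        for field, description in REQUIRED_FIELDS.items()
        if not metadata.get(field)
    ]


# Placeholder draft metadata; values are illustrative only.
draft = {
    "boilerplate": "Placeholder for the TSR Hub boilerplate text.",
    "hub_website": "http://www.nespthreatenedspecies.edu.au/",
    "theme": "Placeholder theme",
    "project": "Placeholder project",
    "ands_identifiers": [],  # empty, so it is reported as missing
}

missing = missing_items(draft)
for item in missing:
    print("Missing:", item)
```

A check like this only confirms that each element is present; it cannot confirm that the content is correct, so manual review by the Project Leader is still needed.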
Post-submission checklist
At least five working days before the output becomes publicly available, submit a Research
Product Submission Form to 1) the Department of the Environment, 2) your Project
Leader(s), and 3) the TSR Hub’s Chief Operations Officer ([email protected]).
Send a copy of your research output to the Chief Operations Officer ([email protected])
for inclusion on the TSR Hub website.
Appendix C. Blank Project Data Management Plan
A blank copy of the TSR Hub project Data Management Plan can be found on the following pages.
Copies can also be obtained by emailing the TSR Hub’s Chief Operations Officer, Melanie King
([email protected]).
Project Data Management Plans should be completed within 12 months of starting a project and
sent to Melanie King ([email protected]). Plans should be revised regularly, and at least annually.
Threatened Species Recovery Hub
Project Data Management Plan
Project Data Management Plans should be completed within 12 months of starting a project. Completed plans
should be sent to the TSR Hub’s Chief Operations Officer, Melanie King ([email protected]).
Project details
Project number and name
Project leader(s)
Lead organisation(s)
Partner Investigator(s)
Partner organisation(s)
Project leader contact details
Project summary
Collect
What types of research products will be collected and/or produced during the project?
Describe the methods, processes, and software that will be used to generate each type of research product.
Is ethics approval (human and/or animal) required for this project? If so, give details.
Describe any third party data to be used in the project and the licence agreement to use the data.
How and where will data and related information be backed up and stored during the project?
In which file format(s) will data be stored during the project?
What is the expected volume of data that will be collected or generated?
How stable is the data, and how will you control for different file versions?
How will you keep data secure during the project?
Assure
Describe conditions during data collection that may affect the quality of data.
Describe any quality check processes.
Describe
Which metadata standards will be used to describe the data?
Which tools will be used to create and manage metadata?
Which standard keyword lists and/or controlled vocabularies will be used to describe data?
Preserve and discover
How will data be shared between project partners and external researchers?
How will data and other research products be stored and made freely accessible to the public in the long term?
In what format will data be stored and published in the long term?
How will data be retained and/or disposed of?
Which licensing policy will be used to share data and other research products?
Will there be an embargo period for any data? If yes, give details.
If applicable, describe procedures to access restricted data.
Data Management Administration
What is the staff allocation for data management?
Who is responsible for managing and backing up data?
How will project participants be trained to manage data?
What is the technology cost to store data?
What is the technology cost to make data accessible?
Date this plan was prepared
Date of next review
Appendix D. Web resources to manage research data
There are a large number of resources available on the internet that go into great detail about all the
aspects of managing research data and making it publicly accessible. The following list is a very brief
selection of these resources, arranged in alphabetical order. Additional resources can be found by searching online.
Australian National Data Service: http://ands.org.au/
An Australian NCRIS facility that supports the management of research data in Australian
institutions. The ANDS website has an enormous amount of information and guidelines for every
stage of the data management process.
DataONE: https://www.dataone.org/
Provides infrastructure, tools, and training resources to manage data.
Digital Curation Centre: http://www.dcc.ac.uk/about-us
A UK centre of expertise in digital information curation. Has a UK focus, but provides a large number
of how-to guides, case studies, and training programs.
DMPTool: https://dmptool.org/
A data management planning tool; has excellent general information within the Help menu.
Inter-university Consortium for Political and Social Research:
http://www.icpsr.umich.edu/icpsrweb/landing.jsp
Information on managing social science data
MANTRA: http://datalib.edina.ac.uk/mantra/
A free online course for those who manage digital research data
Open data handbook: http://opendatahandbook.org/
Discusses the legal, social, and technical aspects of open data.
UK Data Archive: http://www.data-archive.ac.uk/
An online repository that also has a lot of information about data management.
Each of the TSR Hub node institutions provides advice on data management. See the ANDS website
for a comprehensive list of codes of practice, policies, procedures, and training for Australian
Universities: https://projects.ands.org.au/policy.php
Funding partners:
Collaborating organisations:
ACT Government, Environment and Planning
Department of Parks and Wildlife (WA)
BirdLife Australia
Department of Land Resource Management (NT)
Booderee National Park
Department of Environment and Heritage Protection (QLD)
Bush Heritage Australia
Greening Australia
CSIRO Publishing
NSW Office of Environment and Heritage
Department of Environment, Land, Water and Planning (VIC)
Parks Victoria
Department of Primary Industries, Parks, Water and Environment (TAS)
The PEW Charitable Trusts
Department of Environment, Water & Natural Resources (SA)
Royal Botanic Gardens Melbourne
Taronga Conservation Society Australia
Woodlands and Wetlands Trust
Tasmanian Land Conservancy
WWF Australia
Threatened Plants Tasmania (Wildcare Inc)
Royal Zoological Society of SA Inc
The Nature Conservancy
Further information:
http://www.nespthreatenedspecies.edu.au/
Contact:
Melanie King
Chief Operations Officer
Centre for Biodiversity and Conservation Science
Room 532, Goddard Building
The University of Queensland
St Lucia, QLD 4072, Australia
T (+61 7) 3365 6907
M (+61) 412 952 220
E [email protected]