Photo: Natalie Tapson (Flickr CC BY-NA-SA 2.0)

Data Management Plan
April 2016

This project is supported through funding from the Australian Government’s National Environmental Science Programme.

VERSION CONTROL

REVISION HISTORY

Version | Date revised | Reviewed by (Name, Position) | Comment (review/amendment type)
1.0 | 18/12/2015 | TSR Hub Leadership Group | First draft based on NESP Data and Accessibility Guidelines
2.0 | 03/03/2016 | | Second draft based on feedback from TSR Hub Leadership Group
2.1 | 19/03/2016 | | Comments
3.0 | 30/05/2016 | CdL | Amendments undertaken

TSR Hub Data and Information Management Plan V3.0 – June 2016 Page 2

Acknowledgements

The Threatened Species Recovery Hub wishes to thank the many people who contributed their expertise towards developing this data and information management plan. Particular thanks go to staff from TERN’s ÆKOS facility, the Australian National Data Service, and Atlas of Living Australia, and to the LTERN (Long Term Ecological Network) facility for their advice regarding data submission guidelines. Many thanks also to staff from the University of Melbourne’s Digital Scholarship and Research Platforms groups, and to Monash University’s Research Data Management Unit.

Contents

Acknowledgements ... 3
Executive summary ... 5
Introduction ... 7
  Threatened Species Recovery Hub ... 7
  Purpose of the TSR Hub’s Data and Information Management Plan ... 8
Copyright and Licensing ... 10
  Who can apply a licence to research outputs? ... 10
  Obtaining compiled data: data sharing agreements ... 11
  Licensing compiled data ... 12
Timelines for making research outputs publicly available ... 13
  Open-access publications ... 13
  Open access data and other non-publication research outputs ... 15
  Exceptions to the open access policy ... 15
Sensitive information provisions ... 15
  Sensitive ecological data ... 16
  Sensitive cultural or human data ... 16
Uniquely identifying research outputs ... 17
  Minting DOIs for static and dynamic datasets ... 17
Selecting a suitable repository ... 17
  Institutional repositories ... 18
  National repositories ... 19
  International repositories ... 19
Metadata ... 20
  Metadata standards ... 20
  Descriptive metadata ... 21
Research output submission guidelines ... 23
  Data preparation guidelines ... 23
  Citing data ... 26
  Linking data with related research outputs ... 27
  Acknowledging NESP support and the TSR Hub ... 27
  ANDS project identifiers ... 27
  Permission to publish research outputs ... 28
  Notifying DotE of TSR Hub research outputs ... 28
  Linking research outputs with the TSR Hub’s website ... 28
Data management at the project-level ... 28
  Project details ... 28
  Collect ... 29
  Assure ... 31
  Describe ... 32
  Preserve and discover ... 32
  Data management administration ... 33
Appendices ... 34
  Appendix A. TSR Hub data, data repositories, and metadata schemas ... 34
  Appendix B. Research output submission checklist ... 39
  Appendix C. Blank project data management plan ... 40
  Appendix D. Web resources to manage research data ... 43

Executive summary

All TSR Hub outputs, including publications and data, should be made publicly available under the latest Creative Commons (CC) framework unless there are valid reasons for not doing so (e.g., sensitive threatened species, social, or cultural information). Where possible, data should be licensed under the most open Creative Commons licence (CC BY 4.0 at the time of writing).

Only copyright owners can apply a licence to a research output. You could be the copyright owner if you generated a research output, but this varies with the intellectual property policies of the organisation you work for. Researchers should check their institution’s IP policies before attempting to license a research output.

The use of pre-existing (third party) data, including both entire datasets and subsets of data, should be governed by a data sharing agreement that clearly states the terms and conditions of use for the data. There are many types of data sharing agreements, and third parties will often have their own agreement. The legal services unit of the lead researcher’s organisation should review all data sharing agreements before signing.

Third parties should be encouraged to make pre-existing data used by the TSR Hub, or at least the metadata, publicly available.
Options to make pre-existing data publicly available should be discussed before data is shared, and the results of this discussion incorporated into the data sharing agreement.

Publications should be made freely and openly available on the internet within 12 months of publication. Publication reprints are the preferred format, but post-prints are also acceptable. There are five publishing scenarios available to TSR Hub researchers that meet both NESP’s requirements for public access and publisher licensing agreements.

Data and other non-publication research outputs should be made publicly available on the internet within 12 months of collection.

Exceptions to the open access policy are acceptable in some cases, for example if information is culturally, environmentally, or socially sensitive. Delays or embargoes on making data publicly available may also be permitted in some cases. The option to restrict access to research outputs should be discussed with Project Leader(s) and the Hub’s Chief Operations Officer in the first instance, before being put to the Department of the Environment (DotE) for approval.

Potentially sensitive ecological data should be curated in accordance with the guidelines established by the Atlas of Living Australia’s Sensitive Data Service. Researchers should also consider publishing spatial information for potentially sensitive species separately from information on the species themselves (e.g., abundance, demographic information).

Approval from an institutional, and possibly state agency, Animal Ethics Committee will be required for all projects that are collecting new data on animals. Permits may also be required for work with animals, in national parks, etc. Researchers must also follow the federal codes of conduct for working with animals.

Projects collecting cultural or human data are required to have approval from a Human Ethics Committee, and must also conform to the federal guidelines for conducting human research.
Human research data may be suitable for public release if a number of steps are followed when acquiring ethics approval, collecting the data, and preparing it for publication.

All research outputs should be assigned a persistent identifier (PI) such as a DOI (digital object identifier) or PURL (Permanent Uniform Resource Locator). PIs are typically minted during the process of submitting research outputs to a repository or publishing a manuscript. There are different protocols for minting DOIs for static and dynamic datasets, and care should be taken to prevent the duplication of datasets. Publishers and data repositories are typically responsible for ensuring that PIs remain resolvable; however, some repositories pass this responsibility on to researchers.

Data should be deposited into an established repository, and one that is most appropriate for the data. Institutional, national, and international repositories may all be suitable; repositories need to be approved by DotE. Among other considerations, the decision of where to deposit data should be based on the subject of the data, how it is likely to be used in the future, a repository’s licensing framework and handling of sensitive data, and the costs of depositing or maintaining the data.

High quality metadata about a resource or object should be recorded from the beginning of a research project, and continue for the life of the project. There is a wide range of metadata standards, from subject-specific to general standards. Data repositories typically use just one metadata standard that is tailored to the subject or type of data that the repository holds. General, descriptive metadata should be recorded for all projects, regardless of the metadata standard used when making data publicly accessible.

Detailed Research Output Submission Guidelines are provided to help prepare data and other research outputs for publication.
Related research outputs should be linked via formal citation within the output and/or in their metadata. Datasets should be cited in publications and included in the reference list, while the metadata of datasets should include an up-to-date list of, and a link to, related research outputs.

All research outputs should include the TSR Hub boilerplate to acknowledge NESP support, project partners, contributing organisations and individuals. The Australian National Data Service is developing project-specific codes for each TSR Hub project. These codes should be associated with all research outputs, either in the acknowledgments or keywords, or in the output’s metadata.

Project Leaders should approve the final version of all research outputs (and their metadata) before they are submitted to a journal, repository, or other publishing platform. A copy of the completed NESP Research Product Submission Form should be sent to DotE, Project Leader(s), and the TSR Hub’s Chief Operations Officer at least five working days before any research outputs are publicly released. A copy of all publications should also be sent to Project Leader(s) and the TSR Hub’s Chief Operations Officer at the time of publication.

Each project is required to develop a data management plan within 12 months of starting. Project Data Management Plans will detail how data will be collected and managed throughout the life of a project. Completed plans should be sent to the TSR Hub’s Chief Operations Officer. Plans should be revised annually.

Introduction

Threatened Species Recovery Hub

The National Environmental Science Programme (NESP) is a long-term commitment to environment and climate research.
Funded by the Australian Government via the Department of the Environment (DotE), the key objective of the NESP is to improve our understanding of Australia’s environment through collaborative research that delivers accessible results and informs decision-making. The focus of the NESP is therefore on practical and applied research that informs on-ground action and that will yield measurable improvements to the environment.

The NESP is delivered through six multi-disciplinary research Hubs or consortia, hosted by Australian research institutions. One of these hubs, the Threatened Species Recovery (TSR) Hub, aims to support the development of regulations and investments to manage threats and improve the recovery of threatened species and biodiversity in terrestrial and freshwater ecosystems. The TSR Hub will provide a robust evidence base and strategic advice that will contribute significantly to the overarching objectives of:

- an improved conservation outlook for a substantial proportion of Australia’s threatened species and ecological communities
- cost-effective strategies to provide early warning about extinction risk and triggers for immediate action
- ongoing decline in the number of Australian species eligible for listing as threatened through enhanced management of threats
- increased community knowledge of, engagement with, and investment in threatened species conservation
- efficient, effective, coordinated, and evidence-based public policy and strategies for conserving threatened species.

Further details on the TSR Hub’s research priorities and research projects can be found on the TSR Hub’s website (www.nespthreatenedspecies.edu.au).

Data and information: freely and openly accessible

The Australian Government’s position is that information funded by the Government is a national resource that should be managed for public purposes.
This openness is driven by Government agencies and the research community’s identification of a need to promote open access to public sector and publicly funded information. Open access to Government-funded information is DotE’s default position, with exceptions only for privacy, security, or confidentiality reasons. In accordance with this position, and as outlined in the NESP Programme Guidelines and the NESP Data and Accessibility Guidelines, research outputs generated from NESP research are expected to be freely accessible and available on the internet.

The TSR Hub is expected to generate a broad range of research outputs, including:

- raw data sets, e.g., raw, non-aggregated data (or other specified observational units)
- publications, including scientific papers, reviews, books, and book chapters
- grey literature, including fact sheets, project profiles, and technical reports
- images, maps, photos, videos, and animations
- models and other tools created by the research process (e.g., Decision Support Tools)
- websites created specifically as a research output
- mobile or tablet apps
- unspecified emerging technology.

To satisfy the public accessibility requirements of the NESP Data and Accessibility Guidelines, it is expected that researchers will take steps to meet the following criteria:

1. licence research outputs generated from NESP research
2. make research outputs generated from NESP research freely and openly available on the internet as soon as they are approved by the Leadership Group and DotE
3. make publications freely and openly available on the internet within 12 months of publication. As a minimum requirement, this will be in the post-print format.

Providing access to the data and other research outputs generated by NESP research will provide up-to-date, high quality data and information to decision-makers, environmental managers, other scientists, and the general public.
This will increase the opportunity to take a more collaborative, informed approach to managing Australia’s environment.

Purpose of the TSR Hub’s Data and Information Management Plan

The TSR Hub’s Data and Information Management Plan outlines how TSR Hub researchers are expected to manage their data and other research outputs before, during, and after their research project. This Plan therefore provides guidance on the key steps, resources, and tools needed to ensure that TSR Hub research outputs are adequately and securely backed up, and that they can be easily searched for, and accessed, by the public. This Plan should be read in conjunction with the NESP Programme Guidelines and the NESP Data and Accessibility Guidelines.

A number of conditions must be met, and steps taken, before research outputs funded by NESP can be made publicly accessible (Figure 1). Where possible, this includes:

- applying the appropriate licence to research outputs
- uniquely identifying research outputs
- depositing research outputs in an appropriate repository, and in a timely manner
- associating detailed metadata with research outputs
- choosing file formats that facilitate long-term access
- linking associated research outputs
- including the boilerplate to acknowledge NESP funding and collaborators
- ensuring provisions are made for any sensitive data or information.

The following sections detail these conditions and steps, linking to external websites for specialist information when required. Where appropriate, guidance is provided for the management of pre-existing or third party data, as the TSR Hub will make extensive use of such data. TSR Hub researchers are also required to develop their own data management plan for each project.

Where possible, examples have been provided for ways to manage research products that the TSR Hub is expected to produce, including detailed submission guidelines for tabular research data, and general submission guidelines for all types of research products.
However, given the variety of these research outputs, there is no one set of standards or tools for researchers to adopt to meet the NESP funding agreement, and it is beyond the scope of this document to detail specific data management guidelines for every possible type of research output. For research outputs without specific guidelines, the conditions of the funding agreement must still be met. The strategies and tools you use to manage your data and research outputs should therefore meet the three conditions of the NESP funding agreement, but should also be guided by the type of research outputs you will produce, any sensitivities about public accessibility to data, and any legacy arrangements (e.g., between researchers and data repositories) that you might have.

If you are unsure about how best to meet the expectations of the NESP Programme Guidelines or to manage your data, please contact your Project Leader, the TSR Hub’s Chief Operations Officer (Melanie King: [email protected]), or your institution’s data management advisors.

Figure 1. Steps in the publication and data management process include obtaining permits and approvals, collecting and analysing data, publicly releasing all research outputs, and linking research outputs and the TSR Hub website. Approval from Project Leader(s), the TSR Hub’s Chief Operations Officer, and/or the Department of the Environment (DotE) must be sought for several steps.

Copyright and Licensing

NESP hub funding agreements require all research outputs to be made publicly available under the latest Creative Commons framework (Creative Commons Version 4.0 International at the time of writing), unless there are valid reasons for not doing so (see Exceptions to the open access policy on page 15 for more information).
This licensing framework provides clear permissions, terms, and conditions upon which data can be re-used, and has been endorsed by the Australian Government Open Access and Licensing Framework (AusGOAL). AusGOAL establishes a common approach to data licensing across research and government, and facilitates use and reuse of data for further innovation and research. This is consistent with DotE’s Information Licensing Policy. The AusGOAL framework includes six individual Creative Commons licences, one restrictive licence, and one software licence. See the AusGOAL suite of licences for guidance on selecting a licence.

DotE’s default requirement is for all outputs resulting from the NESP to be licensed under the most open Creative Commons licence, CC BY 4.0 (at the time of writing). The CC BY 4.0 licence is the most accommodating of the licences offered, and is recommended for maximum dissemination and use of licensed materials. This licence allows others to copy, remix, transform, and build upon work, even commercially, as long as they credit the licensor for the original creation.

In addition to licences, Creative Commons also offers a Creative Commons Zero (CC0) tool that allows licensors to waive all rights and place a work in the public domain. Use of the CC0 waiver is not recommended, however, because there are some doubts about its force under Australian law, particularly with respect to moral rights. Furthermore, the disclaimer that accompanies CC0 may be ineffective in protecting the user from liability for claims of negligence. It is therefore recommended that CC BY licences are applied to research outputs rather than CC0 waivers, although it should be noted that some data repositories (e.g., Data Dryad) only offer CC0 waivers.

Who can apply a licence to research outputs?

You have a right to apply a licence to a research output if you are its copyright owner.
You may be the copyright owner if you generated the research output yourself, but this will also vary with the intellectual property (IP) policies of the institution or organisation for which you work. All Australian research organisations have IP policies that clearly describe the ownership of IP developed at the institution. These policies also stipulate copyright and licensing rights. The following points outline general rules regarding IP at research organisations:

- Where research outputs are produced by an employee acting in the ordinary course of their employment duties, IP typically belongs to the person’s employer.
- Some organisations recognise the rights of researchers to their research outputs, however, and transfer to researchers the IP for their scholarly works (e.g., any article, book, manuscript, manual, musical composition, diagram, photograph, creative writing, film or like publication).
- The IP rights of visiting researchers, or researchers working under joint appointments, may differ from those of academic staff.
- Students typically retain ownership of IP created by them; however, IP policies may vary where the IP was jointly developed by staff, or where the IP is the subject of specified agreements (e.g., third-party agreements with external organisations such as granting bodies, or public or private sector organisations funding contract research and development at an organisation). Where such agreements occur, they typically do not apply to a student’s thesis or scholarly works, except where explicitly agreed.

It must be stressed that IP policies vary between organisations, and also with specific contractual agreements, so researchers will need to check which rules apply to them.
As research outputs generated by the Hub must conform to the NESP funding agreement, new contracts with staff and students should clearly define who owns the copyright over research outputs that they produce, and the requirement to license research outputs and make them available to the public within NESP’s specified time frames (page 13). Where no such definition exists, staff and students are still encouraged to license their research outputs and make them available in a manner consistent with the NESP Programme Guidelines, the NESP Data and Accessibility Guidelines, and the guidelines outlined in this document. Staff and students should refer to their institution’s IP Policy for guidance and/or contact their legal department if they have any questions.

Obtaining compiled data: data sharing agreements

The TSR Hub will make extensive use of pre-existing (third party) data. For the purposes of this Data and Information Management Plan, third party data is defined as any data collected by an organisation that is not one of the TSR Hub nodes, or by any individual(s) not employed by one of the TSR Hub nodes. Third party data includes entire datasets and subsets of datasets.

It is important that third parties give their consent for how their data is to be used and potentially shared, and that they do so before the data is shared. It is therefore recommended that the use of third party data be governed by a data sharing agreement that clearly states the terms and conditions of use for the data.

There are many types of data sharing agreements, including letters, memorandums of understanding, licences, and contracts. Which of these data sharing agreements is most appropriate will depend on the circumstances. A contract might be required if money is exchanged, for example, while a letter of exchange might be appropriate for a one-off sharing agreement.
Data sharing agreements should outline the relationships between the parties, which data are to be shared, and how the data may be used. The National Statistical Service’s (NSS) Good practice guide to sharing your data with others suggests considering the following elements when preparing a data sharing agreement:

- Aims and purpose: state why the data is being shared, what the data will be used for, and how it will be used.
- Data definition: define the data that is being shared, including its formats and metadata, and the terms of the data sharing agreement.
- Legal restrictions: include the legislative and consent-based restrictions of the data. This should include consideration of who owns the copyright and any IP, whether licences are required to access the data, and whether there are privacy concerns around the data.
- Governance: describe the governance arrangements for all aspects of the shared data, including the process and operational management of the data, and maintaining and terminating the agreement.
- Access issues: detail how the data will be accessed and shared, both during the project and after any analysis is complete, including licensing requirements to access the data, and also for the compiled data or research products that arise from the data. It is NESP’s preference that data is licensed under the broadest possible terms (a CC BY 4.0 licence).
- Data quality: include information on the data’s “fitness for purpose”, including the methodology used to create or generate the data and any processes used to monitor and assess data quality.
- Data management: describe the processes and systems that ensure data security and integrity.
- Costs: detail any costs, including a payment schedule.

Greater detail about each of these elements can be found on the NSS website.
Third parties will often have their own data sharing agreements; researchers should ensure that the terms of such agreements are aligned with the requirements of the NESP funding agreement. Most institutions also have their own templates that can be used, or adapted where necessary. Third party data sharing agreements and non-standard institutional agreements should be referred to the legal department of the researcher’s institution for approval before signing.

Licensing compiled data

If your data is a compilation or derivative of a third party’s work, then you may not have the rights to license the data or other research outputs. Ideally, use of such third party data will be governed by the licence and terms of the data sharing agreement that you entered into when obtaining the data. This agreement should specify how data can be used by you and others, and whether, to whom, and how data can be made available at the end of the project. Use of third party data should therefore be negotiated under the broadest possible terms (i.e., a CC BY 4.0 licence). Similarly, if your research is based on pre-existing (or background) IP, then the research agreement you entered into when obtaining the data should include clauses describing the treatment of background IP.

Note: if you are using third party data that is already publicly available, it is not possible to license it under anything other than the terms under which it was originally licensed.

Options for licensing compiled data

It is DotE’s preference that all data generated or used by the TSR Hub, including pre-existing data, be made freely and publicly available. Thus, if it is not already publicly available, the option to deposit pre-existing data in a public repository should be discussed with the copyright holder when negotiating the licence or data sharing agreement.
There are three options for licensing compiled data:

1) Under DotE’s preferred scenario, pre-existing data that is not already publicly available would be deposited in an online repository, and a DOI would be minted for the dataset. All research outputs that use that dataset should cite the data following standard procedures, or as outlined by the repository. See Uniquely identifying research outputs (page 17) and the research output submission guidelines (page 23) for more information.

2) If the copyright holder of pre-existing data is unwilling or unable to make it publicly available, then an alternative option is to negotiate a licence under which a dataset’s metadata will be made publicly available, but not the dataset itself. In this case, a DOI will be minted for the metadata, and the metadata should also include the name and contact details of the data owner, and preferably also the data creator. Manuscripts based on the data would refer to the data DOI, but users would need to contact the owner to negotiate access to the data itself.

3) If the copyright holder or data creator is unwilling to include any provisions for public accessibility to data or metadata in the data licence or data sharing agreement, then publications must refer to the dataset using:
- the name of the dataset
- the name of the organisation that owns the data (i.e., the copyright holder)
- the role of the person to contact about accessing the data, and,
- preferably, the website of the copyright holder.
This is DotE’s least preferred option.

If a compiled dataset is published in association with a scientific paper (e.g., as electronic supplementary material), and the paper is publicly accessible according to NESP’s open access requirements (page 13), then the dataset fulfils the requirements for public access. Under this scenario the dataset does not need to be separately deposited in an online repository.
See Citing data (page 26) for more information on how data can be made publicly accessible.

Timelines for making research outputs publicly available

Open-access publications

Under the NESP funding agreement, all publications must be made freely and openly available on the internet within 12 months of publication. Publications include peer reviewed scientific papers, books, reports (grey literature), and other published material. At a minimum, publications must be the final text and figures as accepted for publication. Pre-prints (text and figures before peer review), abstracts, and/or citations alone do not meet NESP’s requirements.

The NESP Data and Accessibility Guidelines outline a number of publishing scenarios that meet their requirements, depending on a publisher’s licensing framework (Figure 2, grey boxes). These Guidelines preclude researchers from publishing in journals (or with publishers) that do not allow free access to publications within 12 months.

It is the TSR Hub’s preference that publications arising from Hub activities are published in the most appropriate place for the work, and that researchers are free to make this decision themselves, in consultation with colleagues and supervisors (where appropriate). In anticipation of instances where a researcher’s preferred publisher will not provide access to publications under the scenarios outlined by NESP, the TSR Hub will implement its own process to make publications available from the TSR Hub’s website, in a way that satisfies both NESP and publisher licensing requirements (Figure 2, yellow box).

There are therefore five publishing scenarios available to Hub researchers that meet both NESP’s requirements for public access and publisher licensing agreements (Figure 2).
All publications and research outputs are required to be made available from the TSR Hub’s website; researchers must forward a copy to the Chief Operations Officer in a timely manner so that this can take place. The five publishing scenarios are outlined below, in order of decreasing preference.

1. The publication is freely available online immediately upon publication. If there are no restrictions to making the publication freely available, all publications will be accessible on the publisher’s website and via the TSR Hub website.

2. The publication re-print is freely available online within 12 months of publication. Re-prints are the final version of the manuscript, complete with journal/publisher formatting. Under this scenario, re-prints would be available from the publisher’s website, the TSR Hub’s website, and/or another website or online repository.

3. The publication post-print is freely available online within 12 months of publication. Post-prints are the peer-reviewed text and figures as accepted for publication, but without journal or publisher formatting. Publication post-prints would typically be made available from the TSR Hub’s website, and/or another website or online repository. Post-print PDFs should include a text header advising where the article has been published and link to the article’s Digital Object Identifier (DOI). Individual journals often provide set wording that should accompany post-prints of their articles.

4. In cases where a publisher will not allow a copy of the re-print or post-print to be made publicly available within 12 months, there may be other alternatives available to ensure open access, such as negotiating to publish under an alternative licence, or the payment of an open access fee. Payment of open access fees is the responsibility of the Hub; DotE’s preference is for research funding to be used for research, in preference to publication fees. If you wish to publish in a journal or other publication that requires payment of an open access fee, then this should be discussed with, and approved by, your Project Leader prior to submission.

5. For publications that do not fit within any of the previous scenarios, the TSR Hub will develop a process to make them available from the TSR Hub website in a way that meets both funding and licensing requirements. Researchers may therefore submit their work for consideration in their preferred journal.

Note: while the NESP funding agreement states that publications must be made available within 12 months, if a publisher will allow re-prints or post-prints to be uploaded to a website after a 12-month embargo, then this is acceptable to DotE. Publications should therefore be uploaded as soon as the 12-month embargo period has passed.

There are several web-based tools that enable you to easily check the copyright regulations of most journals. For example, the SHERPA/RoMEO website allows you to search publisher copyright policies and self-archiving options for over 22,000 journals. Please note that this information should not be relied upon for legal advice, and it is the responsibility of the researcher to ensure that the information obtained is correct and current.

Figure 2. DotE has outlined a number of steps to determine whether a journal meets DotE’s preferred publishing arrangement and the NESP funding agreement. Where this is not the case, the TSR Hub will make articles available from the TSR Hub’s website via a process that satisfies both licensing and NESP funding requirements. Figure adapted from the NESP Data and Accessibility Guidelines.
Open access data and other non-publication research outputs

NESP expects that researchers will take all reasonable steps to deposit research data in an appropriate subject and/or institutional repository within 12 months of data collection. The data and its metadata should be stored together in an open format, in a way that clearly shows how they are linked. Under this requirement, data may need to be placed in a repository before associated publications have been published.

Much of the data generated by TSR Hub researchers is likely to be dynamic (e.g., monitoring data, where new data is added on an annual or seasonal basis), to be collected over an extended period (e.g., long-running experiments), to require long data validation processes, or to use revisable datasets (e.g., computational models that need to be updated with new material and/or as material is deleted or corrected). Where researchers who have collected such data feel unable to deposit their data within 12 months of collection, they should refer to the Exceptions to the open access policy below.

Exceptions to the open access policy

DotE recognises that open access to data or other information may not be suitable in cases where that information is culturally, environmentally, or socially sensitive; refer to the Sensitive information provisions (page 15) for more details. Furthermore, there may be valid reasons to delay depositing data in a publicly accessible repository, or to embargo public access to a dataset after it has been uploaded (typically for a maximum of 12 months).

Decisions to restrict public access to research products should be made by those closest to the source (i.e., the data owner) and discussed with the Project Leader and the Chief Operations Officer. If an exception (or delay) to open access is required, the Chief Operations Officer will apply to DotE for approval.
In cases where restricted access to sensitive information is approved, the data custodian should keep an enduring copy of the unaltered data, and make the metadata freely available. The metadata should describe the data and why it has not been released, and include the contact details for the person(s) to whom other researchers should apply for more information, or potential access, to the data. It is the data owner’s responsibility to determine what ethical and legal steps must be observed to make the metadata of sensitive information publicly available.

Where students (or other researchers) are not contractually obliged to conform to the public accessibility requirements of the NESP funding agreement, they should still be encouraged to follow the guidelines outlined here.

Sensitive information provisions

DotE recognises that restricting access to information may be suitable in cases where that information is environmentally, culturally, or socially sensitive. Sensitive data may include, but is not limited to: location information for rare species, highly desirable or collectable species, culturally significant site data, social data restricted by privacy considerations, and other heritage or Indigenous matters. Refer to the Exceptions to the open access policy above for more information.

It is the Hub leader’s responsibility to collate instances of exceptions to the NESP open access guidelines. Collated lists can be at project level and include basic information on the data collected and the justification for non-release of the data. All proposed exceptions to the NESP open access policy must be reported to the Hub steering committee as part of project progress reports, and included in Hub annual reports.

Additional information for handling different types of sensitive data is given below.
Sensitive ecological data

Curating data

In Australia, the conservation status of flora and fauna is determined at both a state and federal level. Each state and territory treats the management of sensitive information about threatened species in their jurisdiction differently, including how it assesses which species are sensitive, and restrictions on sensitive information.

The Atlas of Living Australia (ALA) has worked with Commonwealth, state, and territory agencies, and other data providers, to collate an authoritative list of species considered sensitive for conservation or biosecurity reasons, and advice on how to handle data on these species. The ALA has developed a Sensitive Data Service (SDS) to implement this advice. The SDS enables data in sensitive records to be annotated, withheld, or generalised according to a publicly available set of rules. Conservation and biosecurity data sensitivity can be checked online at the SDS, while the species lists and rules provided by the external agencies can be accessed via the Conservation Sensitivity related data sets. Researchers who are working with species that are considered, or potentially considered, sensitive, should consult the SDS for guidance on how to manage their data.

DotE is also developing policy guidelines for the access and management of sensitive ecological data. This document will be circulated to Project Leaders once it is released.

Publishing data

As it is not possible to foresee which species may be considered sensitive in the future, the TSR Hub recommends publishing the spatial coordinates of field sites (or other plot infrastructure) or species sightings separately to information on the species themselves (presence/absence, abundance, demographic information etc.). Guidelines on how to prepare separate data tables are given in the sensitive information section of the data preparation guidelines (page 25).
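As a minimal sketch of the recommendation above, the following Python example separates site coordinates from species observations so that the two tables can be published (or withheld) independently. The column names, site identifiers, and records are hypothetical and are not taken from the TSR Hub data preparation guidelines; linking the two tables through a shared site_id column is an assumption made for illustration.

```python
import csv
import io

# Hypothetical combined field records; all names and values are illustrative only.
RAW = """site_id,latitude,longitude,species,count
S01,-37.80,144.96,Species A,3
S01,-37.80,144.96,Species B,0
S02,-35.28,149.13,Species A,7
"""


def split_records(rows):
    """Separate site coordinates from species observations, linked by site_id."""
    sites = {}          # site_id -> (latitude, longitude), one entry per site
    observations = []   # species data with no coordinates attached
    for row in rows:
        sites[row["site_id"]] = (row["latitude"], row["longitude"])
        observations.append(
            {
                "site_id": row["site_id"],
                "species": row["species"],
                "count": row["count"],
            }
        )
    site_table = [
        {"site_id": sid, "latitude": lat, "longitude": lon}
        for sid, (lat, lon) in sorted(sites.items())
    ]
    return site_table, observations


rows = list(csv.DictReader(io.StringIO(RAW)))
site_table, observations = split_records(rows)
```

Because the observations table carries only site_id, the coordinates table could later be restricted or generalised without altering the species data.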
Sensitive cultural or human data

It is possible for research data that relates to people to be shared ethically and legally, if researchers pay attention to four processes when planning and conducting their research:

- including provisions for data sharing when gaining informed consent
- anonymising data where needed to protect people’s identities
- controlling access to data in some instances
- applying the appropriate licence.

Plans to make human data or metadata publicly available must be approved by the Human Ethics Committee of a researcher’s institution, and must also conform to the National Statement of Ethical Conduct in Human Research. More information on the ethics, consent, and sharing of human or culturally sensitive data can be found on the ANDS Ethics, consent and data sharing website and the ANDS guide Publishing and Sharing Sensitive Data.

Researchers should also refer to the TSR Hub’s Indigenous Engagement and Participation Strategy when planning and conducting their research. Researchers working with Indigenous Australians can also contact the Aboriginal and Torres Strait Islander Data Archive (ATSIDA) for specialist guidance in managing and depositing Indigenous research data. ATSIDA has a series of protocols to assist with best practice management of Indigenous data, including issues of preservation, access, reuse, and the return of research data relating to Indigenous communities. ATSIDA is a thematic archive of the Australian Data Archive (ADA), based at the Australian National University. For further information on depositing Indigenous data with ATSIDA, or how to collect suitable metadata, contact the ADA: [email protected]

Uniquely identifying research outputs

TSR Hub research outputs should be assigned a persistent identifier (PI) such as a digital object identifier (DOI) or a persistent uniform resource locator (PURL).
PIs uniquely identify research outputs, and enable outputs such as data and grey literature to be cited in the same way that journal articles and books are. Citation also enables data to be recognised as a scholarly effort in and of itself, with the potential for reward based on data outputs. TSR Hub researchers are therefore encouraged to assign DOIs to all their research outputs, including, where practicable, data and grey literature. See Citing data (page 26) for more information.

PIs are typically minted during the publication process, or while submitting research outputs to a repository. In addition, there are a number of online tools that can mint DOIs for products not associated with a repository, for example the ANDS Cite My Data service.

DOIs are associated with a product’s metadata, and can typically be minted only if minimum metadata requirements are met. For example, the minimum, mandatory set of metadata required to mint a DOI via the ANDS Cite My Data service is the URL, title, creators, publisher, and publication year. If the location or URL of a product changes, then the registered location must also be updated. The responsibility for keeping PIs resolvable generally lies with the data repository; however, some repositories pass this responsibility on to researchers (e.g., institutions that have arrangements with figshare).

Minting DOIs for static and dynamic datasets

DOIs are typically minted once for static datasets, i.e., complete datasets or those based on a single data collection. In contrast, multiple DOIs can be assigned to dynamic datasets (i.e., those that are added to on an ongoing basis, for example seasonally or annually), but only one DOI is associated with a dataset at a specific point in time. For example, a dataset that consists of data collected during one year will be assigned a DOI when it is submitted to a repository.
After a second year of data collection the dataset can again be submitted to a repository: this dataset contains data collected during the first and second years, and it is assigned its own DOI. The metadata of updated datasets typically retain links to the DOIs of earlier versions, or a list of previous DOIs. Publications cite the specific dataset used in their analyses.

Researchers who are collecting dynamic datasets should carefully consider how they wish to use (and cite) their data before submitting it to a repository, and also how future users are likely to want to access and use the data. Care should be taken to prevent the duplication of datasets.

See Timelines for making data publicly available (page 13) for guidelines on when research outputs, including static and dynamic datasets, should be submitted to a public repository, and Citing data (page 26) for information on how to cite datasets.

Selecting a suitable repository

Data should be deposited into an established repository that is most fit for purpose. When deciding where to deposit data, researchers should consider:

- the subject of the data, including what it describes and how it was collected
- where and how future users of the data are likely to search for it
- the importance of having data associated with similar datasets (e.g., in a subject-specific repository vs. a cross-disciplinary repository)
- the metadata standard used by a repository, and whether it captures the required (or desired) detail. See Metadata standards (page 20) for more details.
- the format accepted
- the repository’s licensing framework
- the repository’s handling of sensitive data
- the likely persistence of a repository, or the persistence of records within the repository
- any costs to depositing or maintaining the data
- how searchable the repository is
- whether data can be easily retrieved and/or downloaded
- quality standards during data submission and curation (e.g., auditing processes)
- whether it is important that data be held under national laws.

There is a wide variety of data repositories that researchers can choose from. Each of these has its own benefits and drawbacks. A broad overview of the different types of repositories is provided below, divided here into institutional, national, and international repositories. In addition, Appendix A offers examples of suitable repositories for a range of research outputs that the TSR Hub is likely to produce. This list is not exhaustive, however, and researchers are encouraged to explore their options, including institutional repositories (not included in Appendix A). To this end, researchers may find it useful to consult the REgistry of REsearch data REpositories (re3data.org) data portal, which provides an overview of existing research data repositories across all academic disciplines, or the more life sciences-specific BioSharing directory.

Researchers are encouraged to start thinking about which repository they would like to use before data collection begins, as this may save considerable time later on. Some repositories have strict metadata requirements, for example, around which you can design your data collection (e.g., spreadsheets, scripts for automating computational tasks) to aid metadata creation.

Repositories must be approved by DotE before data are submitted to them. Contact the Chief Operations Officer to confirm that your preferred repository is approved by DotE, or to gain approval.
Institutional repositories

All the TSR Hub university nodes have good data management facilities, including data repositories. Data can therefore be deposited in institutional repositories, with metadata minted by the university, and appropriate links made to the TSR Hub’s website. The benefits of using institutional repositories include:

- a wide range of material (and formats) is accepted
- ready access to information and advice on how best to deposit data
- the metadata of institutional repositories are already harvested by ANDS
- data and metadata are held under Australian law.

However, there are a number of limitations to using institutional repositories:

- Some institutions are unwilling (or unable) to commit to holding data in perpetuity because of limits on the availability of space, and the high costs to host and maintain data. Data held in institutional repositories with these limitations will typically be retained for the minimum time legally required (typically 5-7 years), and then re-assessed by the institution. If data has not been accessed, or is perceived to be of limited public value, then it may be discarded. Data hosted under these conditions would not meet NESP’s requirements.
- Metadata standards vary across institutions, and may not have the level of detail (or flexibility) to record the information that researchers would like.

Several universities are rolling out new data services to their researchers in collaboration with external repositories. The University of Melbourne and Monash University both have newly established relationships with figshare, for example. Researchers can mint DOIs through these services and make their data (or any research output) publicly accessible.
Such accounts are linked directly to an individual researcher, however, and are therefore closed when a researcher leaves the university; it is the researcher’s responsibility to ensure that data is migrated to another service provider prior to the end of their contract.

National repositories

There are a number of federally funded data repositories and networks that may be of interest to TSR Hub researchers, including (but not limited to):

- the Terrestrial Ecosystem Research Network (TERN), which consists of twelve facilities, including
  - Eco-informatics, which has established the Australian Ecological Knowledge and Observation System (ÆKOS), a data repository and data portal
  - the Long Term Ecological Research Network (LTERN)
- the Atlas of Living Australia (ALA)
- the Australian Data Archive (ADA), which includes
  - ADA Indigenous, a specialised research data management facility for Australian Indigenous research data
  - ADA Social Science, which provides a national service to preserve and provide access to data on social, political, and economic issues
- the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS)
- the Australian Antarctic Data Centre (AADC).
National repositories typically offer users:

- specialised advice and services
- the ability to accept a wide variety of data types and also data formats, although this varies between repositories
- detailed metadata schemas tailored to the subject specialisation, often with a high level of quality control at the data submission stage
- close links with government agencies, universities, museums, and other research organisations
- established protocols to handle sensitive data
- in some instances, close links between repositories, to cross-walk relevant data (e.g., biodiversity data contained in a dataset deposited with ÆKOS can be cross-walked to ALA)
- the ability to add associated research outputs in the future (e.g., publications)
- data that is held under Australian law.

Some potential limitations of these services can include:

- lengthy processes to deposit data because of the level of metadata detail required
- some repositories are only open to data from established contributors (e.g., LTERN).

International repositories

Some data will be best suited to international, subject-specific repositories (e.g., GenBank). Established protocols for depositing such data should be followed: metadata records can be sent to ANDS and associated (e.g., ecological) data can be deposited in another subject-specific or general repository (with appropriate links in the metadata of each dataset).

In addition, there are a large number of general, cross-disciplinary repositories available (e.g., Dryad, figshare). These services:

- are easy to access
- often have good links to journals
- take a wide variety of data and research products.

Again, there are a number of limitations to using these repositories:

- Licensing conditions vary between repositories, and there may not be the intellectual and/or legal protections that researchers and/or NESP would like.
  Dryad applies the CC0 waiver to all data deposited with it, for example, while figshare applies a CC BY 4.0 licence to all objects publicly stored in its international repository, but a CC0 waiver to metadata only, data, and databases. The University of Melbourne and Monash University figshare services offer researchers the full range of Creative Commons licences from which they can choose.
- Data are typically not held under Australian law; in response to concerns about this, institutional agreements with figshare have stipulated that data must be held in Australia.
- Repositories differ in whether material must be associated with a publication (e.g., Dryad), or can merely be considered academic (e.g., figshare also accepts grey literature).
- Some repositories charge researchers to deposit data with them; costs may be reimbursed by some institutions, but not all.
- Cross-disciplinary repositories typically use general metadata standards (e.g., Dublin Core), and may not have the ability to capture all the information that researchers would like.
- ANDS can harvest from some international repositories, but not all. The appropriate metadata would therefore need to be sent to ANDS.

Metadata

Metadata, or “data about data”, is the descriptive, contextual information that is associated with, or refers to, another resource or object. This information should be recorded from the beginning of a research project, and continue throughout the project. Recording high quality metadata will ensure that other users can:

- Discover your data: contextual information will enable other users to find data by searching the internet or a repository. This requires metadata that can be indexed and searched, for example using geographic, temporal, administrative, or biological keywords or phrases.
- Understand your data: to be useful, other users need to understand what the data describes, how it was generated, how it is structured, what has been done to it etc.
  Metadata must therefore record all the processes that lead to a dataset (or other research output), and how it should be read.
- Access data: metadata should include clear instructions about how users can access data, and if and how it can be reused (e.g., licensing). Data that are not in digital format should be stored and referenced in a manner that facilitates sharing in the event that access is requested. Where access to the data is constrained, for example due to ethical, legal, commercial, or security issues, metadata should outline how potential users can apply for access, and summarise any conditions which they may need to satisfy for access to be granted. See Sensitive information provisions (page 15) for more details.

There are a large number of resources available on the internet that provide background information on the value and characteristics of metadata; this information will not be replicated here. Instead, the following sections provide general guidelines about the types of metadata that TSR Hub researchers should record, and a range of tools available to do so. Researchers interested in going into greater depth about metadata are encouraged to explore some of the resources available, such as the ANDS Metadata webpage and the DCC Using Metadata Standards webpage.

Metadata standards

Metadata normally takes the form of a structured set of elements which describe the data or other research outputs (their content, formats, and internal relationships) and which enable their identification, location, and retrieval by others. A metadata standard formalises the information that is recorded. Standards typically support a number of defined functions, including capturing descriptive, technical, administrative, use, and preservation information about a dataset or object.
The specific metadata standard used will be linked to the choice of data repository: subject-specific repositories typically use a metadata standard tailored to the subject or type of data that the repository holds, while cross-disciplinary repositories use very general metadata standards. A large number of metadata standards potentially meet the needs of the TSR Hub, including multiple subject-specific standards. When choosing which repository (and thus metadata standard) to use, consider:

- The context in which the metadata will be created and used: consider who the data may be shared with, and who will want to access the data. Best practice is to use a standard that has been developed with a particular community in mind.
- The format of the objects being described: there may be metadata standards that are particularly suited to your data type.
- The budget: metadata creation can be expensive (although this is not necessarily the case), but it is worth creating the best possible metadata that the budget will allow.
- The metadata capture method: a lot of technical metadata can be auto-generated, while descriptive metadata (page 21) typically requires an element of manual input.
- Metadata storage and delivery: choices of metadata software will affect which metadata standards to implement, and how to reference elements within the standard (e.g., to enable a search for a species, species names must be indexed).

Importantly, most repositories can harvest or export metadata from a standard different to the one they use. There is therefore no need to generate metadata in multiple formats, and nor is there one, ultimate standard that researchers should use.

Appendix A (page 34) lists a range of data types that the TSR Hub is likely to produce, and common metadata standards and data repositories that are suited to these data types.
This list is not exhaustive: if you have any questions about where your data should be deposited, or which metadata standard you should use, please contact your Project Leader, the TSR Hub’s Chief Operations Officer (Melanie King: [email protected]), or your institution’s data management advisors. Descriptive metadata Below are some general, descriptive metadata that should be recorded for all projects, regardless of the discipline or metadata standard used when depositing data in a long-term repository. At a minimum, store this metadata in a readme.txt file (or equivalent) with the data itself. General Overview Title Name of the dataset or research project. Refer to Naming data packages (page 25) for guidelines on how to name datasets before they are made publicly available in an online repository. Creator Names and addresses of the people or organisations who created the data; the preferred format for personal names is surname first (e.g., Smith, Jane). Owner Name(s) and addresses of the people or organisations that own the IP to the data, formatted as for the Creator. Personnel Name(s) and addresses of the people or organisations that helped generate or analyse the data. TSR Hub Data and Information Management Plan V3.0 – June 2016 Page 21 Identifier Unique number used to identify the data, even if it is just an internal project reference number (e.g., the TSR Hub project number). Date Key dates associated with the data, including: project start and end date; release date; time period covered by the data; and other dates associated with the data lifespan, e.g., the maintenance cycle and update schedule. The preferred format is yyyy-mm-dd, or yyyy.mm.dd-yyyy.mm.dd for a range. Method How the data were generated, listing equipment and software used (including model and version numbers), formulae, algorithms, experimental protocols, and other things one might include in a lab notebook. 
Processing: Data should be deposited in raw, non-aggregated format where possible. Where this is not possible, describe how the data have been altered or processed (e.g., normalised).
Source: Citations to data derived from other sources or third-party datasets, including details of where the source data is held and how it was accessed. See Copyright and Licensing (page 10) and Citing data (page 26) for more information.
Funder: Organisations or agencies who funded the research. This will include the NESP, and possibly other organisations.

Content Description

Subject: Detailed, tabular metadata describing the methods used in the study, including the survey structure, sampling regime, assumptions, and any constraints to re-use. Also include keywords or phrases describing the subject or content of the data. This should include the TSR Hub Theme, Project, and Sub-project numbers (where applicable), and could also include the ANDS project identifiers (page 27).
Place: All applicable physical locations, including specific locations of observational units and descriptions of the study/research area.
Language: All languages used in the dataset.
Variable list: Describe all variables in the list, as well as labels for those variables that are encoded.
Code list: Explanation of codes or abbreviations used in either the file names or the variables in the data files.

Technical Description

File inventory: All files associated with the project, including extensions.
File formats: Formats of the data, e.g., FITS, SPSS, HTML, JPEG.
File structure: Organisation of the data file(s) and layout of the variables, where applicable. See page 30 for more information on file naming conventions.
Version: Unique date/time stamp and identifier for each version. See page 31 for more information on file versioning.
Necessary software Names of any special-purpose software packages required to create, view, analyse, or otherwise use the data TSR Hub Data and Information Management Plan V3.0 – June 2016 Page 22 Access Rights The intellectual property rights, statutory rights, licenses, embargoes, or restrictions on use of the data. See Copyright and Licensing (page 10) for more information. Access information Where and how your data can be accessed by other researchers. See Selecting a suitable repository (page 17) for more information. Research output submission guidelines The following guidelines provide detailed instructions to help prepare tabular data for submission to an online repository, and general guidelines that should be followed when making any research outputs publicly available. A research output submission checklist can be found in Appendix B. Data preparation guidelines The following guidelines are designed for use when preparing data for submission to an online repository. Where possible, similar protocols should be followed for preparing other data types. For specific guidelines on preparing other data types, please contact your institution’s data management section, your Project Leader, or the TSR Hub’s Chief Operations Officer ([email protected]). File formats The file format of your data plays a large role in its ability to be accessed and used in future. Contemporary software and hardware should be expected to become obsolete, however some formats are more likely to be readable in the future than others. Researchers are therefore encouraged to submit their data in an open format where possible. 
Formats likely to be accessible in the future are:
- Non-proprietary
- Open, with documented standards
- In common usage by the research community
- Using standard character encodings (e.g., ASCII, UTF-8)
- Uncompressed (space permitting)
- Unencrypted

Examples of discouraged formats, and preferred alternative formats, are shown below:

Discouraged format: Encouraged format
Excel (.xls, .xlsx): Comma separated values (.csv) and other text formats (e.g., .txt). For databases, prefer XML or CSV to native binary formats.
Word (.doc, .docx): Plain text (.txt) or, if formatting is needed, PDF/A (.pdf) or rich text format (.rtf)
PowerPoint (.ppt, .pptx): PDF/A (.pdf)
Photoshop (.psd): TIFF (.tif, .tiff), JPEG, JPEG 2000, PNG
Quicktime (.mov): MPEG-4 (.mp4)

Some repositories restrict which file formats they will accept, while others encourage submission of data in multiple formats (e.g., Excel files (.xls or .xlsx) and .csv). Widely used proprietary formats (e.g., ESRI shape files) may be acceptable to some repositories, but not all. Always check the repository you will use before preparing your data for submission.

For more information on recommended formats for data archiving, see the AusGOAL open formats website, the ANDS File Formats website, and the CDL Digital File Format Recommendations.

Data Tables

- Provide data in a raw format and at the level at which it was collected. Do not include summaries unless providing aggregated data in conjunction with the raw data is conducive to improving data reuse.
- Provide as much data as possible in the form of a stand-alone data table (i.e., a single worksheet).
- If you have multiple data tables, each table should include a column that allows the tables to be linked unambiguously (i.e., a unique identifier).
- Remove abbreviations.
- Ensure species variables contain only species names, and not descriptors.
For example, a species list should not contain “Eucalyptus pauciflora (live)” or “Eucalyptus pauciflora (x2)”. Instead, record only the species name (Eucalyptus pauciflora) in the species variable column, and any descriptive data (e.g., “live”, “2”) as separate variables in additional columns. Avoid using common names, or include them (in a separate column) in addition to scientific names. If you have more than 20 species consider producing a species file that lists scientific names and common names. If your study had many sites (or transects/plots within sites), consider producing a file that lists the coordinates (decimal degrees). See also guidelines for submitting spatial data for sensitive information (page 25). Columns Columns should contain a single variable rather than observations from a mix of variables (e.g., do not incorporate species names and their growth form in the one column). Create long data tables rather than wide data tables. For example, use separate columns to record “year” and “count” rather than “count_in_year1”, “count_in_year2” etc. Where nominal data is repeated in multiple data tables (e.g., Location ID), ensure that column headers are consistent in each data table. Use descriptive variable names. Use a consistent format within a column (i.e., all numeric or all character string). Check for and repair data inconsistencies (e.g., species name and common name used interchangeably, truncated or abbreviated names). Clearly identify missing values using a consistent value (e.g., “NA” or “missing”). Be careful and consistent with quotations. Record reasons for missing values (e.g., “not observed”, “removed”, “censored”) in metadata records e.g., missing values denote “no survey” or “no observations”. Ensure consistent recording of date/time formats using unambiguous, standard formats (e.g., ISO 8601) within single columns and across columns within a data table. Where possible, do not use integers for nominal or ordinal level variables. 
Spatial coordinates (decimal degrees) should be provided in two separate columns (i.e., latitude and longitude) Rows 1. Rows should contain observations. 2. Check for and correct typographical errors, missing values, duplicate entries, and unusual characters. 3. Present data in SI units where possible. TSR Hub Data and Information Management Plan V3.0 – June 2016 Page 24 Sensitive Information Researchers are responsible for ensuring that confidential, private, or sensitive data is removed or appropriately modified prior to publicly releasing a dataset. Refer to the Sensitive information provisions (page 15) for more information. Researchers should refer to the Sensitive Data Service (SDS) when collating their data for advice on how to handle data for species that are currently considered threatened. However, as new species are periodically added to schedules to legislation listing threatened species, and it is not possible to foresee which species will be added in future, researchers are encouraged to publish the spatial coordinates of field sites or other plot infrastructure separate to information on the species surveyed or recorded: Spatial data tables: provide spatial data for observational and experimental units in a standalone data table. Each set of spatial coordinates should be provided with a unique Location ID in order to spatially reference all sites, plots, transects and so on. Spatial data tables should contain: - Spatial coordinates provided at each spatial scale relevant to the study, e.g., from the transect or sub-plot level (finest level), to the plot or site level (coarser level) - Details of what the coordinates refer to (e.g., location of SW corner of 1ha plot) - A Location ID to link the spatial data to the survey data Survey data tables (containing survey or monitoring data) should not contain spatial data but should contain the Location ID, which can be linked to the spatial data table. 
Compile separate data table descriptions for spatial and survey data tables.

Should a species be listed as threatened in future, data within the spatial data table can be annotated, withheld, or generalised according to the rules outlined in the SDS.

Naming data packages

The titles given to TSR Hub datasets or data packages that are submitted to online repositories should describe the data in enough detail to differentiate it from other similar datasets or packages. Using a consistent naming format will also facilitate the TSR Hub’s tracking of its research outputs. The following format should be used:

[Theme name (Project name)] [:] [Data type (such as experimental unit, observational unit, and/or measurement methods)] [,] [Geographic location] [,] [State] [,] [Country] [,] [Annual or seasonal ranges]

For example, a dataset reporting feral cat abundance in Kakadu National Park in 2009 that stems from Theme 1 (Taking the threat out of threatened species) and Project 1.1 (Developing evidence-based management tools and protocols to reduce impacts of introduced predators on threatened mammals) would receive the following title:

Taking the threat out of threatened species (Developing evidence-based management tools and protocols to reduce impacts of introduced predators on threatened mammals): Feral cat abundance, Kakadu National Park, Northern Territory, Australia, 2009.

The full citation of this dataset, if authored by Brown and colleagues in 2018, would then be:

Brown et al. (2018) Taking the threat out of threatened species (Developing evidence-based management tools and protocols to reduce impacts of introduced predators on threatened mammals): Feral cat abundance, Kakadu National Park, Northern Territory, Australia, 2009. Name of preferred repository, http://doi.org/10.1718/492allmadeup356d.
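The title and citation patterns above can be assembled mechanically, which helps keep titles consistent across a project. The following is a minimal sketch only: the function names `dataset_title` and `dataset_citation` are hypothetical and not part of any TSR Hub tooling, and the DOI is the made-up example from the text.

```python
def dataset_title(theme, project, data_type, location, state, country, years):
    """Assemble a data package title in the order given by the naming format."""
    return f"{theme} ({project}): {data_type}, {location}, {state}, {country}, {years}"


def dataset_citation(creators, year, title, publisher, identifier):
    """Combine creator, year, title, publisher, and identifier into one citation."""
    return f"{creators} ({year}) {title}. {publisher}, {identifier}."


title = dataset_title(
    theme="Taking the threat out of threatened species",
    project=(
        "Developing evidence-based management tools and protocols to reduce "
        "impacts of introduced predators on threatened mammals"
    ),
    data_type="Feral cat abundance",
    location="Kakadu National Park",
    state="Northern Territory",
    country="Australia",
    years="2009",
)
citation = dataset_citation(
    "Brown et al.",
    2018,
    title,
    "Name of preferred repository",  # placeholder, as in the example above
    "http://doi.org/10.1718/492allmadeup356d",  # made-up DOI from the example above
)
print(title)
print(citation)
```

Keeping the title and citation in separate functions means the same title string can be reused in repository metadata and in the reference list.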
Saving data tables

Data tables that are to be submitted to a repository should be saved in plain text format (e.g., .csv or .txt) using a name that clearly describes their contents. The TSR Hub recommends that you use the following naming convention:

[theme]_[project]_[data type]_[location]_[YYYY or YYYY-YYYY]_[document type]

- Use a logical abbreviation for the theme and project names; all data files produced by this theme and project should use the same abbreviation.
- Use lowercase, with underscores between terms.
- Generalised location is optional.
- Document type is optional (e.g., data_table, table_description).

For example, for data that is linked to the citation:

Brown et al. (2018) Taking the threat out of threatened species (Developing evidence-based management tools and protocols to reduce impacts of introduced predators on threatened mammals): Feral cat abundance, Kakadu National Park, Northern Territory, Australia, 2009. Name of preferred repository, http://doi.org/10.1718/492allmadeup356d.

An appropriate file name could be:

ttts_ebmt_cat_abundance_data_2009.csv

Data table descriptions

- Complete a separate table description for each data table (including those supplied as part of a data package).
- Include descriptions of all columns in each data table.
- Ensure information that is necessary for re-use (e.g., projections, datum) is included.
- Include the units relevant to each column, and ensure that the column descriptions relate to the measurement protocols described in the metadata for the data package.

Citing data

Datasets should be formally cited in an article, and included in the article’s reference list. Citing data is important to give the data producer credit, allow easier access to data for re-purposing or re-use, and to enable others to verify your results. See Naming data packages for guidelines on how TSR Hub datasets and data packages should be titled.
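Returning briefly to the file-naming convention under Saving data tables: a small helper can apply it consistently across a project. This is a sketch under stated assumptions, not an official tool. The names `slug` and `data_file_name` are hypothetical, and the theme and project abbreviations are assumed to have been agreed on beforehand.

```python
import re


def slug(text):
    """Lowercase a term and join its words with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def data_file_name(theme, project, data_type, years,
                   location=None, document_type=None, extension="csv"):
    """Build [theme]_[project]_[data type]_[location]_[years]_[document type].

    Location and document type are optional, as in the convention.
    """
    if not re.fullmatch(r"\d{4}(-\d{4})?", years):
        raise ValueError("years must be YYYY or YYYY-YYYY")
    parts = [slug(theme), slug(project), slug(data_type)]
    if location:
        parts.append(slug(location))
    parts.append(years)
    if document_type:
        parts.append(slug(document_type))
    return "_".join(parts) + "." + extension


# Reproduces the example file name given under Saving data tables.
print(data_file_name("ttts", "ebmt", "cat abundance data", "2009"))
```

Validating the year field up front catches the most common slip (two-digit years or free-text seasons) before files are shared.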
Many data repositories (and some journals) provide explicit instructions for citing their contents. If no citation information is provided, you can still construct a citation following generally agreed-upon guidelines from sources such as the CODATA Report on data citation, the ESIP Federation Data Citation Guidelines, and the DataCite Metadata Schema. Five core elements are usually included in a dataset citation, with additional elements added as appropriate. Creator(s) – may be individuals or organisations Publication year when the dataset was released (this may be different from the Access date; see below) Title – follow the guidelines set out in Naming data packages. Publisher – the data centre, archive, or repository Identifier – a unique public identifier (e.g., a DOI) Although the core elements are sufficient to refer to static datasets, additional elements may be required if you wish to cite a dynamic dataset or a subset of a larger dataset. TSR Hub Data and Information Management Plan V3.0 – June 2016 Page 26 Version of the dataset analysed in the citing paper Access date when the data was accessed for analysis in the citing paper Subset of the dataset analysed (e.g., a range of dates, geographic locations, or record numbers, a list of variables) Verifier that the dataset or subset accessed by a reader is identical to the one analysed by the author (e.g., a Checksum) Location of the dataset on the internet, needed if the identifier is not "actionable" (convertible to a web address) See the DMPTool website for more details and example citations. In addition to citing data within a journal article, data that is deposited in a repository can itself be supplemented with a data paper. Data papers are peer-reviewed publications that describe a dataset, but which do not draw any conclusions about it. Examples of journals that accept data papers that may be applicable to TSR Hub data are Nature Scientific Data, Ecological Archives, and Geoscience Data Journal. 
See this University of Edinburgh website for more options. Data can also be published as supplementary material associated with a scientific paper that meets NESP’s public accessibility requirements.

Linking data with related research outputs

The metadata of archived datasets should be updated when related research outputs (e.g., journal articles, books, grey literature) are published. Update the metadata to include the DOI and location (weblink) of the published item. Some repositories will also allow copies of items related to a dataset to be uploaded as associated files.

Acknowledging NESP support and the TSR Hub

Support from DotE needs to be acknowledged in all research outputs, including data, publications, presentation material (including presentations at conferences), promotional and advertising material, signs, plaques, etc. As per the TSR Hub Media Protocol, research outputs should include the boilerplate:
- Reference to all partners and contributing organisations
- The following paragraphs:

The Threatened Species Recovery Hub brings together Australia’s leading conservation scientists to help develop better management and policy for conserving Australia’s threatened species. It is supported by the Federal Department of the Environment’s National Environmental Science Programme (NESP), a long-term commitment to support environmental and climate research.

The boilerplate can be included in the acknowledgements section of publications, or in the metadata of other research outputs. Where possible, a link to the TSR Hub website should also be included.

ANDS project identifiers

The Australian National Data Service is developing project-specific codes for each of the NESP Hubs. ANDS will use these keywords to find and link NESP research outputs with the ANDS data portal, Research Data Australia (RDA), which will act as a central metadata repository for all the Hubs.
ANDS regularly harvests metadata from a large number of Australian research institutes and data repositories. Metadata that is deposited in these partner organisations will be automatically ported into ANDS, while the TSR Hub will notify ANDS of research outputs deposited elsewhere. TSR Hub Data and Information Management Plan V3.0 – June 2016 Page 27 ANDS’ project codes will be available from TSR Hub Project Leaders. Researchers will need to include their project’s ANDS code in their research output’s metadata. This code should be included as a keyword if possible, but can go in other fields (e.g., free-text fields) if keyword options are restricted. Permission to publish research outputs Your Project Leader should approve the final version of all research outputs (and their metadata) before they are submitted to a journal, online repository, or other publishing platform. Approval is required once only, unless there are major changes made to the output during the revision process. Notifying DotE of TSR Hub research outputs A copy of the NESP Research Product Submission Form should be sent to DotE at least five (5) working days before an output is publicly released. Forms are available from the TSR Hub website. The Form should be accompanied by a copy of the research output in a format that is approved for public release. Linking research outputs with the TSR Hub’s website The TSR Hub’s website will act as a centralised, up-to-date source of information on the Hub’s activities, including a link to (or copy of) all research outputs arising from the TSR Hub. To facilitate this, please send your Project Leader and the Chief Operations Officer a copy of the NESP Research Product Submission Form and the research output at the time you submit them to DotE (i.e., at least five working days before the output is made publicly accessible, and within 12 months of publication). 
For publications, please note under which publication scenario the document can be made publicly accessible.

Data management at the project-level

TSR Hub researchers should consider how they will manage their data before projects begin, and revisit and refine their data management strategy at regular intervals throughout the project’s life. Data management plans are formal documents that outline what you will do with your data before, during, and after a research project. Each project is required to develop a data management plan within 12 months of starting, as a guide to how the data it collects will be managed. A blank copy of the project-level data management plan can be found in Appendix C. Completed copies of the project-level plans should be sent to the Chief Operations Officer ([email protected]).

Project Details

- Project number and name: give the TSR Hub project number and name.
- Project Leader(s): list all the Project Leaders.
- Lead organisation(s): list the lead organisations.
- Partner Investigator(s): list all the partner investigators.
- Partner organisations: list the partner organisations.
- Project leader contact details: provide the name and contact details (telephone number and email) of the person(s) primarily responsible for the project.
- Project summary: provide a brief summary of the project.

Collect

What types of research outputs will be collected and/or produced during the project?

Research outputs (including data) can be grouped into four main categories:
- observational, e.g., telemetry, surveys, images, sensor readings
- experimental, e.g., habitat manipulations, gene sequences
- simulation, e.g., population models, climate models
- derived/compiled, e.g., meta-analyses, compiled database, reports, manuscripts.

These categories differ in their reproducibility, and their cost and time to generate. Which categories your research outputs come from will affect how you manage them.
Describe the methods, processes, and software that will be used to generate each type of research output. Give a brief overview of how you will conduct your research, and the research outputs that you will produce. Is ethics approval (human and/or animal) required for this project? If so, give details. Projects involving research on animals and humans require approval by an ethics committee: if your project requires ethics approval, list the approval number(s). It is the researcher’s responsibility to be aware of the relevant legislation and institutional policies. For more information, see: NHMRC/ARC/UA Australian Code for the Responsible Conduct of Research (2007) NHMRC Values and Ethics: Guidelines for Ethical Conduct in Aboriginal and Torres Strait Islander Health Research (2003) Australian Institute of Aboriginal and Torres Strait Islander Studies Guidelines for Ethical Research in Australian Indigenous Studies (2012) Australian Code for the care and use of animals for scientific purposes (2013). Describe any third-party data to be used in the project and the licence agreement to use the data. Third-party data should be accessed under a data sharing licence agreement: see Copyright and Licensing (page 10) for more details. Provide the name and source of any third-party data that your project will use, and the data sharing licence agreement that you have entered into to use that data (e.g., which Creative Commons licence). Include details on any constraints to the re-use of data. How and where will data and related information be backed up and stored during the project? Regularly backing up data and related information, including metadata, is an integral part of data management. Some general advice for backing up your data are to: Have three copies in at least two locations (e.g., original + external/local backup + external/remote backup). Geographically distribute your copies to reduce the risk of losing all backups (e.g., because of fire or flood). 
- Use software to back up your hard drive (e.g., Windows File History, OS X Time Machine, Linux/UNIX rsync).
- Test your backup system to confirm that you can retrieve and read your files. Do this when you initially set up the system, and on a regular schedule thereafter.
- Ensure that your files can be accessed by at least one other, trusted person, in case of accident or injury.

You can back up data to your personal computer, external hard drives, or institutional server. All of the TSR Hub node institutions have good infrastructure to support data management, including backups. The ANDS website provides links to the data management plans and toolkits for most Australian Universities.

In which file format(s) will data be stored during the project?

List the types of files your project will produce, e.g., .csv files, Word files, R project, ESRI shape files.

What is the expected volume of data that will be collected or generated?

The answer to this question will determine the amount and type of data storage you will need to access. Some questions to keep in mind are:
1. Are you manually collecting and recording data?
2. Are you using observational instruments and computers to collect data?
3. Is your data collection highly iterative?
4. How much data will you accumulate every month or every 90 days?
5. How much data do you anticipate collecting/generating by the end of your project?

How stable is the data, and how will you control for different file versions?

Data can be fixed or changing over the course of a project (and possibly after the project’s end):
- Fixed or static datasets: do not change after being collected or generated (e.g., data collected during a single, short-term experiment), other than to correct errors or other minor issues.
- Growing or dynamic datasets: new data may be added, but the old data is not changed or deleted (e.g., monitoring data that is collected on an annual basis).
- Revisable datasets: data may be added, deleted or changed (e.g., predictions based on computational models that need to be updated with new material and/or inaccurate material deleted or corrected).

Which category your data falls into will affect how you organise your data, the level of versioning you will need to undertake, and how, when and where you deposit your data (see Timelines for making research outputs publicly available (page 13) for more details).

Using naming/versioning conventions:
- makes it easier to locate specific files
- prevents accidental overwrites or deletion
- preserves differences in the content or function of different file versions
- prevents confusion if multiple people are working on shared files

General recommendations for file naming and version control:

File and folder names
- define a naming convention and be consistent in using it, especially if multiple people are sharing files. For example, the top-level directory/folder could include i) the project title, ii) a unique identifier, and iii) the date (e.g., yyyy-mm-dd or yyyymmdd). The sub-directory structure should also have clear, documented naming conventions. For example, separate files or directories could apply to each run of an experiment, version of a dataset, or person in the group.
- use underscores (_) and not spaces to separate terms
- keep names short (15-25 characters)
- use folder names that describe the general category of files that the folder contains
- use file names that describe the contents of the file
- do not include the folder name in the file name unless you are sharing files and there might be confusion about which folder a file should be added to
- reserve the 3-letter file extension for the file format (e.g., .txt, .pdf, .csv)
- record all changes to a file, no matter how small, and discard obsolete versions after making backups.

File versions
- include the date at the beginning of the file name (e.g., yyyymmdd). Update this number each time the file is saved
- for the final version, substitute the word FINAL for the version number
- turn on version tracking in collaborative works or storage spaces (e.g., Wikis, GoogleDocs)
- use versioning software that automatically tracks versions.

Whichever file naming and version control protocols you define and use, ensure that everyone in the project is aware of the protocols, and periodically check that they are being followed.

How will you keep data secure during the project?

Data security is the protection of data from unauthorised access, use, change, disclosure, and destruction. Ensure your data is safe with regards to:
- Network security: keep confidential data off the internet and, in extreme cases, put sensitive material on computers not connected to the internet.
- Physical security: restrict access to buildings and rooms where computers or media are kept.
- Computer systems and files: keep virus protection up to date, set passwords on files and computers, do not send confidential material via email (or encrypt files if you must).

Some types of data (e.g., human data that contains identifiable or re-identifiable information) may have specific storage or encryption requirements.
Researchers must ensure that they are aware of the relevant state and federal legislation and guidelines governing privacy regarding their research. For more details, contact your institution’s ethics committee and refer to the NHMRC National Statement on Ethical Conduct in Human Research. Assure Describe conditions during data collection that may affect the quality of data. Examples of conditions that may affect data quality include equipment settings or failure, environmental conditions, inadequate training, a lack of calibration, missing values or incorrect entry of data. Aspects of data quality include: Accuracy: how well do data accurately represent the “real world” values they are expected to model? Completeness: is all the required information available? Consistency: are values consistent across data sets? Do distinct occurrences of the same data agree with each other? If multiple files exist then a linking variable is required so that they can be joined. Validity: how correct is the data? Conformity: if there are expectations that data conforms to a specified format, does this occur? Integrity: is there data missing, or are important relationship linkages missing? Describe any quality check processes. Quality checks should include confirmation that the Data preparation guidelines (page 23) have been followed, and protocols to have draft publications reviewed by the authors and Project Leader(s) prior to publication. Quality checks could also include processes to address the aspects of data quality outlined above. TSR Hub Data and Information Management Plan V3.0 – June 2016 Page 31 Describe Which metadata standards will be used to describe the data? The metadata standards you use should be tailored to the type(s) of data and research outputs you produce. See Metadata standards (page 20) and Appendix A (page 34) for more information. Which tools will be used to create and manage metadata? 
There are a number of tools available on the internet to assist in metadata management; these should always be supplemented by your own manual entries. See Metadata standards (page 20) for more information on methods to record metadata, and Descriptive metadata (page 21) for general metadata that you should record about your project.

Which standard keyword lists and/or controlled vocabularies will be used to describe data?

The metadata of all TSR Hub projects should include the TSR Hub boilerplate (page 27), and the project-specific codes generated by ANDS (page 27). In addition, you may wish to include other keywords relevant to your research subject or location (e.g., taxonomic, ecological, geographic, or sociological keywords).

Preserve and Discover

How will data be shared between project partners and external researchers?

Describe the tools that you will use to share data between collaborators (e.g., Dropbox, Google Drive, Bitbucket, email).

How will data and other research outputs be stored and made freely available to the public in the long term?

Provide details of the freely accessible data repository/repositories in which you will deposit your data for long-term public access. See Selecting a suitable repository (page 17) and Appendix A (page 34) for more information.

In what format will data be stored and published in the long term?

List the types of files in which your data will be stored/deposited. See File formats (page 23) for more information on suitable file formats for long-term storage.

How will data be retained and/or disposed of?

While it is expected that data and research outputs will be made permanently accessible on the internet, researchers must also manage the original and/or hard copies of their data. Universities are guided by the requirements of the Australian Code for the Responsible Conduct of Research, which sets out retention periods for research data, research records and primary materials.
In addition, there is a need to consider:
- specific provisions set out in Commonwealth and State legislation concerning archives and record keeping, especially those materials of heritage value,
- disciplinary requirements, and
- possibly other regulatory requirements.

Here, outline how and when you will dispose of the different research outputs that your project will produce.

Which licensing policy will be used to share data and other research outputs?

List which Creative Commons licensing policy/policies you will apply to your research outputs. See Copyright and Licensing (page 10) for more information.

Will there be an embargo period for any data? If yes, give details.

While it is expected that TSR Hub research outputs will be made freely accessible to the public within 12 months, embargoes may be applied in exceptional circumstances. Detail whether you will require an embargo for any of your research outputs, including the reasons why, and the length of the embargo period. See Timelines for making research outputs publicly available (page 13) for more information.

If applicable, describe procedures to access restricted data.

It may not be suitable to make some types of data (e.g., some threatened species data, or social or cultural information) fully accessible to the public. If your project will produce sensitive information, detail how you will store the data for long-term preservation, and how you will manage access to the data. See Sensitive information provisions (page 15) for more information.

Data Management Administration

What is the staff allocation for data management?

State the full-time equivalent (FTE) that has been allocated to manage data for your project.

Who is responsible for managing and backing up data prior to its submission to a repository?

Give the name and contact details (email, telephone number) of the staff member(s) responsible for managing and backing up data and metadata.
How will project participants be trained to manage data?
Detail how you will ensure that all researchers (staff and students) are managing data appropriately.

What is the technology cost to store data?
Detail the costs you will incur to store your data (e.g., hard drives, cloud storage access costs).

What is the technology cost to make data accessible?
Detail any costs you will incur to deposit data in online repositories.

Date this plan was prepared.
Give the date that this data and information management plan was completed.

Date of next review.
Give the date that you will next review your data and information management plan: plans should be reviewed regularly, and at least annually.

Appendices

Appendix A. TSR Hub data, data repositories, and metadata schemas

The following tables provide examples of data that are likely to be produced by the TSR Hub, common metadata standards associated with those data types, and a selection of subject-specific repositories that use those metadata standards. Information on common cross-disciplinary metadata standards and repositories is provided in the final table. Repositories preceded by a star (*) are Australian-based, although they may also take data from outside Australia; all others are international.

The metadata standards and repositories listed here are just a small selection of those that exist. For more information on standards and repositories for each subject or data type (and also any that are not listed here), see:
re3data.org: a REgistry of REsearch data REpositories
BioSharing: a directory of life sciences databases and metadata standards
DCC disciplinary metadata standards webpages

In addition to the repositories listed here, each of the TSR Hub nodes has an institutional repository.
Contact the relevant facility in your institution for more information; the ANDS website also has links to the data management plans and toolkits for Australian universities.

Spatial data

Metadata
ANZLIC Metadata Profile: a profile of ISO 19115, and NESP's preferred metadata standard for spatial data. This standard can also be used to describe a wide variety of non-spatial material: see the ANZLIC Metadata Profile Guidelines Version 1.2 for examples. Also maps to Dublin Core.
Metadata creation tool: ANZMet Lite metadata collection tool

Repositories
* AODN – Australian Ocean Data Network Portal: the primary access point for the search, discovery, access and download of data collected by the Australian marine community. Uses the Marine Community Profile extension of ISO 19115. Re3data summary.
AusCover: a TERN facility that provides remote sensing data time-series and satellite-based biophysical map products for Australia. AusCover also provides associated field calibration and validation data at continental scales. Uses ANZLIC metadata.

Comments
See the NESP Data and Accessibility Guidelines for more information, and the ANZLIC Metadata Profile Guidelines Version 1.2 for examples.

Ecological data

Data that is collected using field techniques such as surveys, trapping, radio- or GPS-tracking, and camera traps. May be associated with demographic, biological, and/or genetic data, or experimental manipulations (e.g., of habitat). Ecological data often has an important spatial element, and is typically structured at some level (e.g., plot-based).

Metadata
EML - Ecological Metadata Language: a standard developed by the ecological discipline to describe ecological data.
Metadata creation tool: Morpho
Summaries: DCC and BioSharing
Darwin Core: developed to describe biological diversity. Primarily based on taxa and their occurrence as recorded by observations, specimens, samples and related information.
Often supplemented with EML to provide an ecological context to the data.
Metadata creation tool: Darwin Core Archive Assistant
Summaries: DCC and BioSharing

Repositories
* ÆKOS - Australian Ecological Knowledge and Observation System: a TERN facility that takes structured (e.g., plot-based) ecological data; this can include photographs, sound recordings, maps, molecular data, and links to additional literature. Data is submitted using the SHaRED data submission tool. ÆKOS uses a mix of metadata standards, including EML and ICAT. Re3data summary.
KNB - The Knowledge Network for Biocomplexity: an international repository that takes ecological and environmental data. BioSharing and re3data summaries.
* LTERN – Long Term Ecological Research Network: a TERN facility composed of 12 ecological plot networks across Australia that have been monitored for several years, and in some cases for decades.
* ALA - Atlas of Living Australia: a repository for species records (point-based data), based on field observations, surveys, and specimens from biological collections. Can also take photographs, sound recordings, maps, molecular data, and links to additional literature. The ALA is the Australian node of the GBIF. Re3data summary.
GBIF – Global Biodiversity Information Facility: an international open data infrastructure, funded by governments, that coordinates the biodiversity information facilities of participating countries and organisations. The GBIF Integrated Publishing Toolkit (IPT) uses EML as its metadata standard. BioSharing and re3data summaries.

Comments
Specialised data associated with ecological data should be submitted to the most appropriate repository (e.g., genetic data sent to GenBank), with appropriate links to the associated ecological data. Researchers should consider separating occurrence data for sensitive species (or potentially sensitive data) from the spatial locations of occurrences. See Sensitive Information (page 25) for more information.
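As an illustration of separating sensitive occurrence data from its spatial locations, the following is a minimal sketch only, not a TSR Hub or repository procedure. It assumes occurrence records are held as simple dictionaries keyed by Darwin Core term names (occurrenceID, scientificName, eventDate, decimalLatitude, decimalLongitude). The function name, the sensitive-species set, and the example records (including their coordinates) are hypothetical and invented for illustration.

```python
from typing import Dict, List, Set, Tuple

# Darwin Core terms that hold the point location of an occurrence.
LOCATION_FIELDS = ("decimalLatitude", "decimalLongitude")

Record = Dict[str, object]


def split_sensitive_locations(
    records: List[Record], sensitive_species: Set[str]
) -> Tuple[List[Record], List[Record]]:
    """Return (public, restricted) tables.

    Public records for sensitive species have their coordinates removed;
    the coordinates go into a restricted table linked by occurrenceID.
    Records for other species are copied to the public table unchanged.
    """
    public: List[Record] = []
    restricted: List[Record] = []
    for record in records:
        if record["scientificName"] in sensitive_species:
            # Keep everything except the location fields in the public copy.
            public.append(
                {k: v for k, v in record.items() if k not in LOCATION_FIELDS}
            )
            # Store the location separately, keyed by the occurrence ID.
            location: Record = {"occurrenceID": record["occurrenceID"]}
            for field in LOCATION_FIELDS:
                location[field] = record[field]
            restricted.append(location)
        else:
            public.append(dict(record))
    return public, restricted


# Hypothetical example records; the coordinates are made up.
records = [
    {
        "occurrenceID": "occ-001",
        "scientificName": "Pezoporus occidentalis",
        "eventDate": "2016-03-01",
        "decimalLatitude": -23.5,
        "decimalLongitude": 141.2,
    },
    {
        "occurrenceID": "occ-002",
        "scientificName": "Macropus giganteus",
        "eventDate": "2016-03-02",
        "decimalLatitude": -33.0,
        "decimalLongitude": 149.0,
    },
]

public, restricted = split_sensitive_locations(records, {"Pezoporus occidentalis"})
```

In this sketch the public table could be deposited openly, while the restricted table would be stored under whatever access controls the project's plan sets out for sensitive data; the two can be rejoined on occurrenceID by authorised users.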
Sociological data

Social science data collected using quantitative or qualitative methods.

Metadata
DDI – Data Documentation Initiative: an international effort to create a standard to describe statistical and social science data.
Metadata creation tool:
Summaries: DCC and BioSharing
da|ra: an extension of the DataCite metadata standard, developed for social sciences.
Metadata creation tool: via the GESIS data archive

Repositories
* ADA - Australian Data Archive: a national service for the collection and preservation of digital data. Consists of seven sub-archives, including Social Science, Historical, and Indigenous. Information relating to Indigenous communities can be deposited with ATSIDA (Aboriginal & Torres Strait Islander Data Archive): see the ATSIDA website for protocols and more information. Uses the DDI metadata standard. Re3data summary.
GESIS – Leibniz-Institute for the Social Sciences: an open repository for social science and economic data. Uses the da|ra metadata standard. Re3data summary.
ICPSR – Inter-university Consortium for Political and Social Research: an international repository for political, social, and behavioural data. Offers training in quantitative methods to facilitate effective data use. Holds a large collection of accessible computer-readable data in the social sciences and humanities (ICPSR is hosted by the University of Michigan in the United States). Uses DDI metadata standards. See also openICPSR. Re3data summary.
QDR – Qualitative Data Repository: accepts digital data generated through qualitative and multi-method research in the social sciences. Re3data summary.

Comments
The Atlas of Living Australia is also developing a number of tools to record Indigenous Ecological Knowledge, and is interested in working with new projects. Contact the ALA if you are interested in being involved in the pilot project.
Grey literature

May include reports, fact sheets, presentations, and recommendations.

Comments
Attachment E of the NESP Data and Accessibility Guidelines provides detailed suggestions for best practice in the publication and management of grey literature and its associated metadata. Depending on the format and subject, grey literature may be suitable for deposition in a subject-specific repository, either as a stand-alone document or as associated information for other material (e.g., a data set). Alternatively, grey literature can be submitted to institutional repositories or to several cross-disciplinary repositories (e.g., figshare, or Dryad if the document is related to a scientific publication). A range of metadata standards have the flexibility to record information relevant to grey literature, for example Dublin Core, ANZLIC, and EML.

Models and simulations

Metadata
CIM – Common Information Model: a standard that describes climate data, the models and software from which they derive, the geographic tools used to calculate and project them, and the simulations and other processes that produced them. DCC summary.
SED-ML: an XML-based format for encoding simulation setups, to ensure exchangeability and reproducibility of simulation experiments. It follows the requirements defined in the MIASE guidelines. BioSharing summary.

Repositories
BioModels: a repository of computational models of biological processes. BioSharing summary.

Comments
Guidelines for depositing model code and simulations in online repositories are currently not well developed; however, several options exist. Models can be published in stand-alone papers (published under CC BY 4.0) and then referred to in later papers, although any changes to the models will need to be detailed. Alternatively, the version of the code and simulations used in a paper can be included in the paper's supplementary material.
Again, the paper must be published under CC BY. The material can also be deposited in a repository: BioModels is one example of a repository of models for biological processes, although it focuses on cell biology. Other general repositories (e.g., Dryad, figshare) might also be suitable, as might services such as GitHub. Finally, models and simulations can be placed in a repository as material associated with a dataset. See also the Minimal Information Required In the Annotation of Models (MIRIAM) website and the related Minimum Information About a Simulation Experiment.

Image or video data

Comments
Image or video data will typically be associated with ecological data (e.g., derived from camera traps or standardised photo points) and is therefore best deposited with the ecological data in an ecological data repository. However, the metadata for the data package should include detailed metadata for the images/videos. See the Guidelines for Handling Image Metadata Version 2.0 for information on metadata standards for a wide range of digital media, and also the NISO Metadata for Images.

Meta-analyses and compiled data

Comments
Depending on the subject, data compiled from a range of sources (e.g., scientific publications, legal documents, management plans, species profiles etc.) may be suited to a subject-specific repository, or a general inter-disciplinary repository. See the appropriate subject within this appendix for more information.

General research data, cross-disciplinary standards and repositories

The following metadata standards and repositories are inter-disciplinary and can therefore describe/take the full range of research outputs.
Metadata
Dublin Core: an inter-disciplinary standard; one of the best known and most widely used metadata standards.
Metadata creation tool: a range of tools is available here
Summaries: DCC and BioSharing
DataCite Metadata Schema: an inter-disciplinary standard.
Metadata creation tool: DataCite API for datacentres
Summaries: DCC and BioSharing
RIF-CS: developed as a data interchange format to support the electronic exchange of collection and service descriptions.
Metadata creation tool: can only be created via the RDA website
Summary: BioSharing

Repositories
Data Dryad: an international repository of data underlying peer-reviewed scientific and medical literature. Uses a version of Dublin Core as its metadata standard, in addition to Darwin Core, Bibliographic Ontology and its own repository-specific elements. Uses a CC0 waiver. BioSharing and re3data summaries.
figshare: an international, inter-disciplinary repository that can take academic-related material in almost any format. Material does not need to be associated with a publication. figshare has entered into institutional relationships with the University of Melbourne and Monash University in Australia, and several others in the UK. Data is licensed under CC BY 4.0. BioSharing and re3data summaries.
* RDA – Research Data Australia: a metadata repository administered by ANDS. Harvests metadata from partner organisations (and can therefore consume a variety of metadata standards), but does not hold data itself. Data is licensed under CC BY 4.0. Re3data summary.

Appendix B. Research output submission checklist

Pre-submission checklist
Select a suitable repository based on the subject and format, the repository's specialisation and/or audience, its licensing terms, and legacy arrangements.
Confirm with the TSR Hub's Chief Operations Officer that the repository you wish to use is approved by DotE.
If the repository is not approved, send its details to the Chief Operations Officer so that approval can be obtained.
Obtain the consent of the data owner (if not yourself) to publish a research output (data, publication etc.).
Ensure that the metadata for your research output is detailed and complete. Refer to the Metadata section for an overview of the elements likely to be required.
Confirm that the research output will be published under the most open licence possible, and preferably a CC BY 4.0 licence.
Obtain permission to publish the research output from your Project Leader.
Ensure that your data meets all ethical and legal requirements for handling sensitive information.
Check that your research output (or its metadata) includes:
    The TSR Hub boilerplate to acknowledge all partners and contributing organisations, DotE, and NESP
    A link to the TSR Hub website
    The Theme and Project under which your project was funded
    All relevant ANDS project identifiers
    Updated links between related research outputs: cite datasets in publications, and include citations and links to associated publications in the metadata of datasets.

Post-submission checklist
At least five working days before the output becomes publicly available, submit a Research Product Submission Form to 1) the Department of the Environment, 2) your Project Leader(s), and 3) the TSR Hub's Chief Operations Officer ([email protected]).
Send a copy of your research output to the Chief Operations Officer ([email protected]) for inclusion on the TSR Hub website.

Appendix C. Blank Project Data Management Plan

A blank copy of the TSR Hub project Data Management Plan can be found on the following pages. Copies can also be obtained by emailing the TSR Hub's Chief Operations Officer, Melanie King ([email protected]). Project Data Management Plans should be completed within 12 months of starting a project and sent to Melanie King ([email protected]).
Plans should be revised regularly, and at least annually.

Threatened Species Recovery Hub Project Data Management Plan

Project Data Management Plans should be completed within 12 months of starting a project. Completed plans should be sent to the TSR Hub's Chief Operations Officer, Melanie King ([email protected]).

Project details
Project number and name
Project leader(s)
Lead organisation(s)
Partner Investigator(s)
Partner organisation(s)
Project leader contact details
Project summary

Collect
What types of research products will be collected and/or produced during the project?
Describe the methods, processes, and software that will be used to generate each type of research product.
Is ethics approval (human and/or animal) required for this project? If so, give details.
Describe any third party data to be used in the project and the licence agreement to use the data.
How and where will data and related information be backed up and stored during the project?
In which file format(s) will data be stored during the project?
What is the expected volume of data that will be collected or generated?
How stable is the data, and how will you control for different file versions?
How will you keep data secure during the project?

Assure
Describe conditions during data collection that may affect the quality of data.
Describe any quality check processes.

Describe
Which metadata standards will be used to describe the data?
Which tools will be used to create and manage metadata?
Which standard keyword lists and/or controlled vocabularies will be used to describe data?

Preserve and discover
How will data be shared between project partners and external researchers?
How will data and other research products be stored and made freely accessible to the public in the long term?
In what format will data be stored and published in the long term?
How will data be retained and/or disposed of?
Which licensing policy will be used to share data and other research products?
Will there be an embargo period for any data? If yes, give details.
If applicable, describe procedures to access restricted data.

Data Management Administration
What is the staff allocation for data management?
Who is responsible for managing and backing up data?
How will project participants be trained to manage data?
What is the technology cost to store data?
What is the technology cost to make data accessible?
Date this plan was prepared
Date of next review

Appendix D. Web resources to manage research data

There are a large number of resources available on the internet that go into great detail about all aspects of managing research data and making it publicly accessible. The following list is a very brief selection of these resources, arranged in alphabetical order. For additional information, search online.

Australian National Data Service: http://ands.org.au/
An Australian NCRIS facility that supports the management of research data in Australian institutions. The ANDS website has an enormous amount of information and guidelines for every stage of the data management process.

DataONE: https://www.dataone.org/
Provides infrastructure, tools, and training resources to manage data.

Digital Curation Centre: http://www.dcc.ac.uk/about-us
A UK centre of expertise in digital information curation. Has a UK focus, but provides a large number of how-to guides, case studies, and training programs.

DMPTool: https://dmptool.org/
A data management planning tool; has excellent general information within the Help menu.

Inter-university Consortium for Political and Social Research: http://www.icpsr.umich.edu/icpsrweb/landing.jsp
Information on managing social science data.

MANTRA: http://datalib.edina.ac.uk/mantra/
A free online course for those who manage digital research data.

Open data handbook: http://opendatahandbook.org/
Discusses the legal, social, and technical aspects of open data.
UK Data Archive: http://www.data-archive.ac.uk/
An online repository that also has a lot of information about data management.

Each of the TSR Hub node institutions provides advice on data management. See the ANDS website for a comprehensive list of codes of practice, policies, procedures, and training for Australian universities: https://projects.ands.org.au/policy.php

Funding partners:

Collaborating organisations:
ACT Government, Environment and Planning
BirdLife Australia
Booderee National Park
Bush Heritage Australia
CSIRO Publishing
Department of Environment and Heritage Protection (QLD)
Department of Environment, Land, Water and Planning (VIC)
Department of Environment, Water & Natural Resources (SA)
Department of Land Resource Management (NT)
Department of Parks and Wildlife (WA)
Department of Primary Industries, Parks, Water and Environment (TAS)
Greening Australia
NSW Office of Environment and Heritage
Parks Victoria
Royal Botanic Gardens Melbourne
Royal Zoological Society of SA Inc
Taronga Conservation Society Australia
Tasmanian Land Conservancy
The Nature Conservancy
The PEW Charitable Trusts
Threatened Plants Tasmania (Wildcare Inc)
Woodlands and Wetlands Trust
WWF Australia

Further information: http://www.nespthreatenedspecies.edu.au/

Contact:
Melanie King
Chief Operations Officer
Centre for Biodiversity and Conservation Science
Room 532, Goddard Building
The University of Queensland
St Lucia, QLD 4072, Australia
T (+61 7) 3365 6907
M (+61) 412 952 220
E [email protected]