Staff and Departmental Development Unit
Leeds Research Data Management Pilot (RoaDMaP)

Keeping your research alive: preserving research data
25th February 2013

Dublin Core field   Value
Title               Keeping your research alive: preserving research data
Creator             Banks, Tim
Creator             Baxter, Jim
Creator             Blyth, Graham
Creator             Philips, Brenda
Creator             Proudfoot, Rachel
Subject             Research data management; social science data
Description         A course booklet to accompany a two hour training course for social science researchers on research data management.
Publisher           University of Leeds
Contributor         Miller, Kerry
Contributor         Ball, Alex
Contributor         Newton, Angela
Contributor         Pullinger, Dan
Date                2013
Type                Text; image
Format              Paper
Identifier          N/A
Source              RoaDMAP training booklet Preserving your research data for future use
Language            En
Relation            IsBasedOn SDDU/ RoaDMAP training booklet Preserving your research data for future use
Coverage            N/A
Rights              CC-BY - unless indicated otherwise

1 Acknowledgements
This training course was developed as part of the University of Leeds Research Data Management Pilot (RoaDMaP) project. The RoaDMaP project is funded through JISC’s Digital infrastructure: Research management programme, within the strand Research Data Management Infrastructure Projects.
The course has been developed in collaboration with Kerry Miller and Alex Ball of the Digital Curation Centre. Indeed, much of the course material has been informed by the work of the Digital Curation Centre, www.dcc.ac.uk.
The starting point for the course was two existing courses:
• Jez Cope and Cathy Pink, Research360 project, University of Bath, Managing Your Research Data, June 2011, http://blogs.bath.ac.uk/research360/2012/07/research-data-management-training-take-2/ (accessed 27/11/12).
• Ben Taylorson, University of Durham, Managing your research data, April 2011, www.dur.ac.uk/library/research/training/archive/20112012/ (accessed 27/11/12).
Staff at Leeds who have contributed to the course development include Jim Baxter, Tim Banks, Graham Blyth, Brenda Phillips, Rachel Proudfoot, Angela Newton and Dan Pullinger.

2 Tutors
Dr Jim Baxter, Senior Staff Development Officer (Research and Knowledge Transfer), SDDU
Tim Banks, Faculty IT Manager, ESSL and PVAC
Rachel Proudfoot, RoaDMaP Project Manager, Library

3 Course Structure
The programme for the course is as follows.
Introduction                                          5 mins
What is research data?                                15 mins
Why manage research data?                             15 mins
How to manage research data                           15 mins
Access, sharing and re-use                            25 mins
Data lifecycle and data management planning           25 mins
Approaches to preserving and managing research data   10 mins
Summary and Q and A                                   10 mins
Feedback (over lunch)                                 60 mins

4 Aim
The aim of the course is to raise awareness of the challenges of preserving and managing research data and approaches to addressing these challenges.
By the end of the session participants will be better able to:
• Describe the forms research data takes and the role of contextual documentation and metadata in enabling data re-use
• Describe how managing research data effectively will improve your research, save you time, decrease the risks of data loss and increase your professional impact, and identify tools to help
• Describe University of Leeds and research funder data management expectations
• Identify sources of information and guidance on managing research data effectively, including additional training courses

5 What are Research Data?
5.1 What are data?
• The lowest level of abstraction from which information and knowledge are derived
• Both analogue and digital materials are data
• Digital data can be:
  o created in a digital form ("born digital")
  o converted to a digital form (digitised)
MANTRA provide an explanation of research data in their online training material, Research Data Explained, http://datalib.edina.ac.uk/mantra/researchdataexplained.html (accessed 26/11/12).
Some definitions of research data

Queensland University of Technology, Management of Research Data Policy
Research data means data in the form of facts, observations, images, computer program results, recordings, measurements or experiences on which an argument, theory, test or hypothesis, or another research output is based. Data may be numerical, descriptive, visual or tactile. It may be raw, cleaned or processed, and may be held in any format or media. http://bit.ly/PA21ex

Monash University, Research Data Policy
Research data: The data, records, files or other evidence, irrespective of their content or form (e.g. in print, digital, physical or other forms), that comprise research observations, findings or outcomes, including primary materials and analysed data. http://bit.ly/OFaz9B

Cairo Project - Managing Creative Arts Research Data: Training Unit 1
The data might be part of an actual work created through research activity (for example a three-dimensional model displayed via public exhibition) or data may instead be documentary evidence (such as video documentation of a real-world performance event or digital photograph of an installation) of research efforts. Research data includes preparatory, unfinished and supportive work in digital form in addition to data relating to completed works. http://bit.ly/UHamPG

EPSRC
Recorded, factual material commonly retained by and accepted in the [research] community as necessary to validate research findings; although the majority of such data is created in digital format, all research data is included irrespective of the format in which it is created.
5.2 Data formats
Categories of data
• Text documents (including those with images), spreadsheets
• Field notebooks, diaries
• Questionnaires, transcripts, codebooks
• Audiotapes, videotapes
• Photographs, films
• Test responses
• Slides, artefacts, specimens, samples
• Collection of digital objects acquired and generated during the process of research
• Data files
• Database contents (video, audio, text, images)
Data can be analogue, born digital or digitised.

6 Why Manage Research Data?
Research data management is a standard part of good research practice, offering a range of benefits to society, funders of research and the research community. It should also help with your own research practice, potentially increasing efficiency, saving time and resources and boosting the impact and visibility of your work. Here are some of the drivers promoting good research data management and research data sharing.

Society
“Publicly funded research data are a public good, produced in the public interest, which should be made openly available.” (RCUK)
The OECD has produced a set of Principles and Guidelines for Access to Research Data from Public Funding, available online at http://www.oecd.org/science/sci-tech/38500813.pdf

Government
The government, in its ‘Innovation and Research Strategy for Growth’, has committed to the principle that publicly funded academic research is a public good produced in the public interest and that, while intellectual property must be protected and commercial interests considered, it should be made openly available with as few restrictions as possible. In this way, we will more effectively realise the social and economic benefits of spreading knowledge, raising the prestige of UK research and encouraging technology transfer.
Research Funders
Many research funders have research data management requirements – the Digital Curation Centre has collated a useful summary of Research Council policies at: http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
The RCUK Common Principles on Data Policy provide an overarching framework for the individual councils. The Principles are online at http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx and some of the key points are:
• Data with acknowledged long term value should be preserved and remain accessible and usable for future research
• To enable research data to be discoverable and effectively re-used by others, sufficient metadata should be recorded and made openly available
• RCUK recognise that there are legal, ethical and commercial constraints on the release of research data.
• [Researchers] may be entitled to a limited period of privileged use of the data […] to enable them to publish the results of their research.
• […] all users of research data should acknowledge the sources of their data and abide by the T&Cs under which they are accessed.

University
The University Policy on Research Data Management (text included here as Annex D) stems from the University’s commitment to research excellence and recognition of the value of the data generated by researchers. The University is working towards a more robust infrastructure for research data management so that we can fulfil our funder requirements and provide better support for researchers.

Individual
There is evidence to suggest that making data available increases the citation rate of associated papers (Piwowar et al 2007 shows 69% citation advantage, Henneken & Accomazzi 2011 shows 20% citation advantage). Furthermore, citation of the data in its own right is an emerging scholarly trend.
Managing your data effectively should:
• Increase research efficiency
• Save time and resources
• Enhance data security
• Preserve data for your own use in the future
Sharing your data with others could increase your own impact and profile as a researcher and help you build networks with other researchers in your field.

7 How to manage research data
7.1 Metadata
What does this mean? 110612
• Contextual information for data is called metadata: literally, data about data
• Data repositories and archives require some generic metadata, e.g. author, title, publication date
• For data to be useful, it will also need subject-specific metadata, e.g. sampling and processing, experimental conditions, population demographic
• Record contextual information in a text file (such as a ‘read me’ file) in the same directory as the data, e.g. codes for categorical survey responses: ‘999 indicates a dummy value in the data’
The UK Data Archive provides a more detailed explanation of metadata in Documenting your data, www.data-archive.ac.uk/create-manage/document (accessed 26/11/12).
A standard for metadata is Dublin Core, http://dublincore.org/documents/dces/ (accessed 08/11/12). An example of the Dublin Core fields, illustrating the type of metadata which might be generated for the document you are reading, is available on the inside cover of the booklet.
The MIT Libraries also provide guidance on metadata on their Data Management and Publishing, Documentation and Metadata web page, http://libraries.mit.edu/guides/subjects/data-management/metadata.html (accessed 26/11/12).

Metadata can be stored at different levels:
Individual file level
• e.g. image format, resolution, colour depth etc.
Data object level
• e.g. multiple files make up a single panoramic data object
Object group level
• e.g. multiple photos taken at same location during a single session
Project level
• e.g. relevant research records / consent forms.
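The advice to record contextual information in a ‘read me’ file alongside the data can be sketched in code. The Python example below is a minimal illustration under stated assumptions, not a prescribed method: the function name write_readme, the file name README.txt, the layout of the file and all example values are hypothetical. Only the note that 999 indicates a dummy value comes from the text above.

```python
from pathlib import Path


def write_readme(data_dir, generic, value_codes):
    """Write a plain-text README.txt describing the data held in data_dir.

    generic     -- generic metadata, e.g. author, title, publication date
    value_codes -- meaning of each coded value used in the data files
    """
    lines = ["GENERIC METADATA"]
    for field, value in generic.items():
        lines.append(f"{field}: {value}")
    lines.append("")
    lines.append("CODES USED IN THE DATA")
    for code, meaning in value_codes.items():
        lines.append(f"{code} = {meaning}")
    readme = Path(data_dir) / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme


# Hypothetical example values; replace them with details of your own project.
EXAMPLE_GENERIC = {
    "Title": "Example survey of commuting habits",
    "Author": "A. Researcher",
    "Publication date": "2013",
}
EXAMPLE_CODES = {
    "1": "Yes",
    "2": "No",
    "999": "dummy value in the data",
}
```

Calling write_readme on an existing data folder with the two example dictionaries places README.txt next to the data files, so the explanation of the codes travels with the data whenever the folder is copied or archived.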
More info: http://www.data-archive.ac.uk/create-manage/document

7.2 Documenting data
Guidance on the format of data is available from the following.
UK Data Archive, Create & Manage Data, Documenting Your Data, Data Level, www.data-archive.ac.uk/create-manage/document/data-level (accessed 27/11/12).
Broader advice on documenting data, linking the views of metadata and data, is available from UK Data Archive, Create & Manage Data, Documenting Your Data, www.data-archive.ac.uk/create-manage/document (accessed 27/11/12).

7.3 File formats
When storing data in files you should consider the file format. Formats that can only be accessed by specialist software are only of use as long as the software is available. On the other hand, ASCII text files (.txt) and comma-separated values (.csv) formats can be read by many pieces of software.
A project might recommend a set of file types to be used. For example, Timescapes (http://www.timescapes.leeds.ac.uk/archive/) has as standard formats XLS, CSV, JPG, PDF, PPT and RTF (all doc and docx files are saved as rich text format, ‘RTF’, files as this better supports longevity).
Guidance on preservation friendly file formats is available from the following:
UK Data Archive, Create & Manage Data, Formatting Your Data, File Formats & Software, http://www.data-archive.ac.uk/create-manage/format/formats-table (accessed 22/02/13).
MIT Libraries, Data Management and Publishing, File Formats for Long-Term Access, http://libraries.mit.edu/guides/subjects/data-management/formats.html (accessed 26/11/12).
University of Minnesota, Digital Conservancy Preservation policy, http://conservancy.umn.edu/pol-preservation.jsp (accessed 22/02/13).

7.4 Access, Sharing and Re-use
Constraints on data sharing and possible solutions
The table from the UK Data Archive on page 10 gives some suggested barriers to research data sharing with potential solutions to make data sharing possible.
There are benefits in publishing research data.
There is also legislation, such as the Data Protection Act, and commercial confidentiality, both of which require that care is taken when publishing research data. Some research may be commercially valuable and covered by confidentiality agreements. Research data that compromises such areas of confidentiality should not be published.
“When research involves obtaining data from people, researchers are expected to maintain high ethical standards such as those recommended by professional bodies, institutions and funding organisations, both during research and when sharing data.” (UKDA, Managing and Sharing Data).
UK Data Archive suggests that it is often possible to make data available if three key issues are addressed:
• Informed consent, addressing both current and future use.
• Confidentiality – data may need to be anonymised so that individuals, organisations or businesses cannot be identified.
• Access control – it may be necessary to control who has access to data – this may enable data sharing for research and educational purposes.
Further exploration of these issues is available on the UK Data Archive web site at http://data-archive.ac.uk/create-manage/consent-ethics

IPR and Copyright
Ownership of the data should be carefully considered – particularly if the research utilises data from third party sources who own copyright. Ideally, ownership should be considered at the beginning of the research process. If the rights in the data are unclear, this may limit what can be done with the data.
For data where you own the rights – or where you have appropriate permissions from third party copyright holders – you should consider what licence or end user agreement will be applied if your data is made available to others for re-use. For example, will it be appropriate to apply a global licence such as a Creative Commons or Open Data Commons licence?
More information on how to license data is available from the Digital Curation Centre at: http://www.dcc.ac.uk/resources/how-guides/license-research-data

Data sharing and repositories
Data are often shared informally between researchers or may, for example, be made available from a project web site. Locating an appropriate data repository can be an excellent way to share data and to make sure that it will be available in the longer term. Repositories are likely to apply some quality control standards to the data and may, for example, assign a unique identifier to the data, making it more readily findable and citable. Always check the terms and conditions of any repository service. A list of data repositories is available from DataCite at: http://datacite.org/repolist
University of Leeds is working towards a repository service – either local or in collaboration with other institutions – which will hold research data generated by University of Leeds researchers.

Further support and advice
Further information on legal and ethical aspects of research is available from the Secretariat and Research and Innovation Services web sites at the University.
http://www.leeds.ac.uk/secretariat/policies_procedures_codesofpractice.html
http://researchsupport.leeds.ac.uk/index.php/academic_staff/good_practice/
There is also a wealth of information and advice available from external organisations such as the UK Data Archive http://data-archive.ac.uk/, Digital Curation Centre http://www.dcc.ac.uk/ and from research funders such as the ESRC http://www.esrc.ac.uk/researchethics/ .

Ethics, sharing and re-use checklist
• Are there funder requirements you need to fulfil?
• Is there a subject repository for your data? What other dissemination routes are available to you?
• Have you obtained appropriate consent for current and future uses of the data?
• Are you clear on who owns the data – particularly if you are using data sourced from a third party copyright owner?
• Are you being consistent across your data management plan, ethical review, and consent forms?
• If appropriate, have you anonymised your data?
• Can your data be made openly available? If not, are you clear on how access to your data will be controlled? Will some data be embargoed?
• Have you applied a re-use licence to clarify what may or may not be done with your data?

UKDA Potential barriers to data sharing – with suggested solutions
Each numbered entry gives a REASON NOT TO SHARE DATA followed by REPLIES OR ARGUMENTS IN FAVOUR OF SHARING.

1 Reason: My data is not of interest or use to anyone else.
Reply: It is! Researchers want to access data from all kinds of studies, methodologies and disciplines. It is very difficult to predict which data may be important for future research. Who would have thought that amateur gardeners’ diaries would one day provide essential data for climate change research? Your data may also be essential for teaching purposes. Sharing is not just about archiving your data but about sharing them amongst colleagues.

2 Reason: I want to publish my work before anyone else sees my data.
Reply: Data sharing will not stand in the way of you first using your data for your publications. Most research funders allow you some period of sole use, but also want timely sharing. Also remember that you have already been working with your data for some time, so you undoubtedly know the data better than anyone coming to use them afresh. If you are still concerned you can embargo your data for a specific period of time.

3 Reason: I have not got the time or money to prepare data for sharing.
Reply: It is important to plan data management early in the research data lifecycle. Data management ideally becomes an integral part of your research practice, reducing time and financial costs and greatly enhancing the quality of the data for your use too.

4 Reason: If I ask my respondents for consent to share their data then they will not agree to participate in the study.
Reply: Don’t assume that participants will not participate because data sharing is discussed.
Talk to them – they may be less reluctant than you might think, or less concerned over data sharing! Make it clear that it is entirely their decision: they can decide whether their data can be shared, independent of them participating in the research. Explain clearly what data sharing means, and why it may be important. But they are still free to consent or not. You can always explain what data archiving means in practice for their data. If you have not asked permission to share data during the research, then you can always return to gain retrospective permission from participants.

5 Reason: I am doing highly sensitive research. I cannot possibly make my data available for others to see.
Reply: The first thing is to ask respondents and see if you can get consent for sharing in the first instance. Anonymisation procedures can help to protect identifying information. If these first two strategies are not appropriate then consider controlling access to the data or embargoing for a period of time. Also, data that is held in the UK Data Archive is not publicly available. Only registered researchers can gain access to the data.

6 Reason: I am doing quantitative research and the combination of my variables discloses my participants’ identity.
Reply: Quantitative data can be anonymised through processes of aggregation, top coding, removal of variables, or controlled access to certain variables (e.g. postcodes).

SHARING YOUR DATA – WHY AND HOW

7 Reason: I have collected audiovisual data and I cannot anonymise them, therefore I cannot share these data.
Reply: Visual data can be anonymised through blurring faces or distorting voices, but this can be time consuming and costly to carry out. It can mean losing much of the value of the data. It is better to ask for consent to share data from participants in an unanonymised form, and/or control access to the data.

8 Reason: I have made promises to destroy my data once the project finishes.
Reply: Why were such promises made? Always avoid making unnecessary promises to destroy data. There is usually no legal or ethical need to do so, except in the case of personal data. But that certainly would not apply to research data in general. Also consider where you received this advice from. You may need to negotiate with your research ethics committee or ethics board about this agreement.

9 Reason: My data have been gathered under complete assurances of confidentiality.
Reply: Again, why was such an assurance made? It is best to avoid unnecessary promises. Anonymisation procedures can be implemented to protect identities, but confidentiality can never be completely guaranteed. You can also consider controlling access to the data.

10 Reason: My data collection and resulting transcripts are in a foreign language.
Reply: This should not be a problem. The UK Data Archive can accept foreign language transcripts, although translations into English are preferred.

11 Reason: It is impossible to anonymise my transcripts as too much useful information is lost.
Reply: Get in touch with us at the UK Data Archive. We may be able to help and it might not be as difficult as it looks. Also, access controls on the data may be a better solution than anonymisation if too much useful information would be lost.

12 Reason: My data collection contains data which I have purchased and it cannot be made public.
Reply: It is important to know who holds the copyright to the data you are using and to obtain the relevant permissions. You need to be aware of the licence conditions of the data you are using and what you can and cannot do with the data.

13 Reason: Other researchers would not understand my data at all - or may use them for the wrong purpose.
Reply: Producing good documentation and providing contextual information for your research project should enable other researchers to correctly use and understand your data.

14 Reason: There is IPR in the data.
Reply: This should not be a problem if you seek copyright permission from the owner of the intellectual property rights.
This is best done early on in the research project, but could be sought retrospectively.

7.5 Timescapes re-use example
Timescapes is an ESRC Qualitative Longitudinal Initiative comprising a network of research projects across five universities. It acts as a test bed to advance Qualitative Longitudinal (QL) research, with the aim of creating an archive of QL data for sharing and of promoting and showcasing the re-use of the Timescapes resource. Spanning the life course, Timescapes aims to produce new insights into the dynamics of personal relationships and family life by ‘walking alongside’ individuals and family groups as their lives unfold. The datasets created from the seven Timescapes projects have been deposited in the Timescapes Data Archive - an accessible, re-useable and growing resource available to researchers. http://www.timescapes.leeds.ac.uk/

The Timescapes Secondary Analysis Project
Re-using and sharing archived data was central to the research undertaken by the Timescapes Secondary Analysis Project, which researched opportunities for the development of new theories and research questions from existing data, the creation of new evidence, and the exploration of new methodologies in data use and re-use.
For example, secondary analysis of the National Child Development Survey 1969 Essay dataset alongside the data arising from the Timescapes Young Lives and Times Research Project 2006 highlighted contemporary and historical social research perspectives on inequality, social change and continuity, issues occurring not only across families but also across generations.
For further details see Timescapes secondary-analysis project http://www.timescapes.leeds.ac.uk/research/secondary-analysis

8 A Lifecycle for Research Data
A number of lifecycles for research data have been developed. These include the following:
Digital Curation Centre, Curation Lifecycle Model, www.dcc.ac.uk/resources/curation-lifecycle-model (accessed 26/11/12).
MANTRA, www.docs.is.ed.ac.uk/docs/data-library/MANTRA_poster.pdf (accessed 26/11/12).
UK Data Archive, Research Data Lifecycle, http://data-archive.ac.uk/create-manage/lifecycle (accessed 26/11/12).
UKOLN, Infrastructure for Integration in Structural Sciences (I2S2) project, I2S2 Idealised Scientific Research Activity Lifecycle Model, www.ukoln.ac.uk/projects/I2S2/documents/I2S2-ResearchActivityLifecycleModel110407.pdf (accessed 26/11/12).

Figure 1. UK Data Archive Research Data Lifecycle.
The figure shows six stages, each with its associated activities:
CREATING DATA
• design research
• plan data management (formats, storage etc.)
• plan consent for sharing
• locate existing data
• collect data (experiment, observe, measure, simulate)
• capture and create metadata
PROCESSING DATA
• enter data, digitise, transcribe, translate
• check, validate, clean data
• anonymise data where necessary
• describe data
• manage and store data
ANALYSING DATA
• interpret data
• derive data
• produce research outputs
• author publications
• prepare data for preservation
PRESERVING DATA
• migrate data to best format
• migrate data to suitable medium
• back-up and store data
• create metadata and documentation
• archive data
GIVING ACCESS TO DATA
• distribute data
• share data
• control access
• establish copyright
• promote data
RE-USING DATA
• follow-up research
• new research
• undertake research reviews
• scrutinise findings
• teach and learn

9 Data Management Planning
9.1 What makes a good research data management plan?
This section has been provided by the Digital Curation Service.
Exercise: assessing a Data Management Plan
Researchers often assume that those reviewing grant proposals will know what we are implying with vague statements and that the acronyms and terms we use daily in our specific research areas are understood by all. There is no better way to learn what not to do when submitting a new bid than to take part in the evaluation of others' proposals.
Similarly, a good way to get a grasp of what you might want to include in your future data management plans is to look over one and assess its merit as a reviewer might.
Breaking into groups of 4-5, look over the sample data management plans at:
- Annex A – Data Management Plan A (page 23)
- Annex B – Data Management Plan B (page 25)
Each group should consider whether the sample data management plans provide sufficient detail to enable the plans to be effectively assessed as part of a bid. Would you need more information in some areas to be able to determine how well the PI will be managing their data?
Groups will have 10 minutes to review the sample data management plans and related documents. Each group will be asked to report back on areas where additional information would be needed by reviewers to make an informed assessment.

“A data management plan that explicitly addresses the capture, management, integrity, confidentiality, preservation, sharing and publication of research data must be created for each proposed research project or funding application. Sufficient metadata shall also be created and stored to aid discovery and re-use. Data management plans should take account of and ensure compliance with relevant legislative frameworks which may limit public access to the data (for example, in the areas of data protection, intellectual property and human rights).” (University of Leeds Research Data Management Policy)

Be clear about who is responsible for all aspects of data management.
ESRC/ESDS have produced a guide for peer reviewers of research bids on what to look for in a data management plan. The guidance is included as Annex C and is online at http://www.esrc.ac.uk/_images/Data-Management-Plan-Guidance-for-peer-reviewers_tcm815569.pdf.
Many research funders require data management plans – though their exact requirements vary. A useful guide is available from the Digital Curation Centre.
http://www.dcc.ac.uk/resources/how-guides/

9.2 Planning your data management
The following space should be used to consider your broader research data management needs. The questions are intended to help you identify the current state of how you manage your research data and to get you thinking about how you will improve the way you manage your research data. These questions are based on elements in the Digital Curation Centre Checklist for a Data Management Plan. The questions are not a complete set.
1. What funding body requirements should you fulfil?
2. How do you intend to go about developing your (individual and research group) data management plan?
3. What ethical and privacy issues do you have and how will they be addressed?
4. Who is interested in your research or has a stake in it and why?
5. Who would be interested in research data?
6. How much data requires short term storage?
7. What security is required?
8. Who is responsible for creating the research data?
9. Who is responsible for storing the research data?
10. Who is responsible for how you or your research group manage your research data (responsible for the research data management plan)?
11. How will you decide which data is preserved?
12. Where will the data you identify for preservation be stored?
13. What is the budget for data storage and preservation?
14. What is your next step to improve your research data management?

10 Approaches to preserving and managing research data
[Pyramid diagram, from top to bottom: Use metadata tools; Structure files; Document the research]
As a minimum (at the bottom of the pyramid) research needs documenting. As you go up the pyramid research data management becomes more refined. In the middle there is structuring files. This includes using a folder structure, having readme files to explain the files in a folder and using appropriate file names. At the top of the pyramid is the use of tools specifically for managing research data.
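The ‘structure files’ tier of the pyramid can be sketched in code. The Python example below is an illustration under assumptions, not a recommended standard: the function names, the default 20-character limit and the case/wave folder layout are hypothetical choices, loosely modelled on the Timescapes-style naming approach discussed in section 10.1.

```python
import re
from pathlib import PurePosixPath


def build_file_name(parts, extension, max_length=20):
    """Join name parts into one file name with no spaces, underscores or symbols.

    Each part is stripped of non-alphanumeric characters and starts with a
    capital letter, so the boundaries between parts stay readable.
    """
    cleaned = []
    for part in parts:
        text = re.sub(r"[^A-Za-z0-9]", "", str(part))
        if text:
            cleaned.append(text[0].upper() + text[1:])
    stem = "".join(cleaned)
    if len(stem) > max_length:
        raise ValueError(f"{stem!r} is longer than {max_length} characters")
    return f"{stem}.{extension}"


def suggest_folder(case, wave):
    """Hypothetical layout: one folder per case, then one folder per wave."""
    return PurePosixPath(case) / f"Wave{wave}"


# Project, case, wave, theme, activity and initials, as abbreviations.
name = build_file_name(["Ts", "Jsmith", 2, "Wrk", "Intv", "Bp"], "docx")
folder = suggest_folder("Jsmith", 2)
print(folder / name)
```

Generating names from a function, rather than typing them by hand, is one way to keep a scheme consistent across a research group; the length check simply rejects names that have drifted away from the agreed abbreviations.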
10.1 Organising files
Structuring and naming files can help you find your data now and in the future. The following is an example approach used by Timescapes for file naming and structuring:
1. Keep file names simple. Use a combination of abbreviations as pointers to what the file is, do not use any spaces, underscores or symbols, and try not to make file names any longer than 20 characters.
2. An example file name structure: TsJsmith2WrkIntvBp.docx (each new part of the structure starts with a capital letter). The file name reflects elements of the research, i.e.
   Ts (Timescapes, the name of the project)
   Jsmith (the case of John Smith, or an anonymised name)
   2 (2nd wave/phase of activity, but could be the date/month)
   Wrk (Work, the main theme/keyword that this particular John Smith file refers to)
   Intv (the research activity, e.g. Interview, or Pic for photo, TL for timeline, RMP for relationship map, or any variation on this approach that suits your data)
   Bp (a person's initials, denoting what stage the file/data is at)
3. Compile a table of the abbreviations used in forming file names.
4. This type of file name also suggests a folder structure. In this example the John Smith folder will hold all documentation relating to him, organised first into time/wave/phase periods; within each period, the files covering the range of activities relating to Jsmith will be housed. Similarly, even when a file is located outside of a bundled structure, the file name will tell you the bulk of what you need to know.
The above approach is an example. Researchers should design their own file naming and structuring scheme in such a way that it allows easy organisation and finding of data.
Guidance on file naming is available from the following.
Digital Curation Centre, Standard Naming Conventions for Electronic Records, www.dcc.ac.uk/resources/external/standard-naming-conventions-electronic-records-0 (accessed 08/11/12).
University of Edinburgh, Records Management Section, Standard Naming Conventions for Electronic Records: The Rules, www.recordsmanagement.ed.ac.uk/InfoStaff/RMstaff/RMprojects/PP/FileNameRules/Rules.htm (accessed 08/11/12).
UK Data Archive, Create & Manage Data, Formatting Your Data, Organising Data, www.data-archive.ac.uk/create-manage/format/organising-data (accessed 26/11/12).
MIT Libraries, Data Management and Publishing, Organizing Your Files, http://libraries.mit.edu/guides/subjects/data-management/organizing.html (accessed 26/11/12).

10.2 Data storage

You should consider what storage device is most appropriate to your needs. You should identify how much data you have to store. Broadly, electronic storage is achieved using magnetic or optical devices. Both can degrade; for example, CDs can get scratched. The UK Data Archive recommends that data are copied to new media between 2 and 5 years after they were created. You should also check the integrity of the data.

Memory sticks should be used with caution as they are easily corrupted and easily lost. Indeed, anecdotally, the Library and ISS have boxes of memory sticks which have been left in computers and elsewhere. For data protection reasons they are not allowed to look at the content of memory sticks. Could you identify your memory stick in a box of 100 others?

Guidance on storing data is available from the following.

UK Data Archive, Create & Manage Data, Storing Your Data, Storing Data, www.data-archive.ac.uk/create-manage/storage/store-data (accessed 26/11/12).

10.3 Backup

You should also consider what the risk of losing your data is. Risk is a function of the likelihood and the consequence of something happening. What is the risk of losing your data?
The JISC-funded Sound Data Management Training (SoDaMaT) project has identified research, http://code.soundsoftware.ac.uk/attachments/605/JISCTraining.pdf (accessed 09/11/12), which suggests that:
• within education and research 1 in 10 laptops are lost within 3 years, http://tinyurl.com/8c9m4bn, a typical length for a research project;
• 1 in 5 laptops have significant problems during their lifetime, http://tinyurl.com/876qza5;
• as part of their repairs procedures Google replace 1 in 5 of their hard disk drives within 4 years, http://tinyurl.com/octz6b.

What is the consequence of losing your data? It might compromise a whole research project and reduce the likelihood of publishing papers. So, unless you mitigate against the above, the risk of losing research data is high!

To mitigate against this risk you should back up your data. You should think about how frequently you back up and where you keep the backup. If you keep your backup and computer in the same place and there is a fire then you are likely to lose your data.

Guidance on backing up is available from the following.

University of Leeds, Policy on Safeguarding Data, http://iss.leeds.ac.uk/info/362/policies/782/policy_on_safeguarding_data (accessed 08/11/12), provides guidance on backing up data.
UK Data Archive, Create & Manage Data, Storing Your Data, Backing-up, www.data-archive.ac.uk/create-manage/storage/back-up (accessed 26/11/12).

10.4 University of Leeds Policy

Policy on safeguarding data, http://iss.leeds.ac.uk/info/362/policies/782/policy_on_safeguarding_data (accessed 08/11/12).
Policy on Research Data Management, http://researchdata.leeds.ac.uk/management-policy (accessed 08/11/12). Also included in Annex D.
Data Protection Code of Practice, www.leeds.ac.uk/secretariat/data_protection_code_of_practice.html (accessed 08/11/12).

What is your experience of finding files?
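One way to make files easier to find is to check names against a scheme automatically. The sketch below builds and checks file names in the style of the Timescapes example in section 10.1 (TsJsmith2WrkIntvBp.docx). It is a minimal illustration, not a Timescapes tool: the function names build_name and check_name are hypothetical, and it assumes the 20 character guideline applies to the name without its extension.

```python
import re

# Guideline from section 10.1: try to keep names to 20 characters or fewer.
# Assumption: the limit applies to the stem, i.e. the name without extension.
MAX_STEM_LENGTH = 20


def build_name(parts: list[str], extension: str) -> str:
    """Join abbreviations so that each new part starts with a capital letter."""
    stem = "".join(p[:1].upper() + p[1:].lower() for p in parts)
    return f"{stem}.{extension}"


def check_name(filename: str) -> list[str]:
    """Return a list of problems; an empty list means the name fits the scheme."""
    stem, dot, extension = filename.rpartition(".")
    if not dot or not stem or not extension:
        return ["file name needs a stem and an extension, e.g. Name.docx"]
    problems = []
    if not re.fullmatch(r"[A-Za-z0-9]+", stem):
        problems.append("use letters and digits only (no spaces, underscores or symbols)")
    if len(stem) > MAX_STEM_LENGTH:
        problems.append(f"stem is longer than {MAX_STEM_LENGTH} characters")
    return problems


if __name__ == "__main__":
    name = build_name(["ts", "jsmith", "2", "wrk", "intv", "bp"], "docx")
    print(name)                                        # TsJsmith2WrkIntvBp.docx
    print(check_name(name))                            # []
    print(check_name("Ts_John Smith interview.docx"))  # two problems reported
```

A check like this could be run over a project folder before data are deposited; the table of abbreviations recommended in section 10.1 still needs to be kept by hand.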
11 Summary

“Publicly funded research data are a public good, produced in the public interest, which should be made openly available with as few restrictions as possible in a timely and responsible manner that does not harm intellectual property.” This is the first principle in Research Councils Common Principles on Data Policy.

• Managing research data is part of good research practice.
• The data allows you to justify your research findings.
• Enables you to more easily find and re-use your research data
• Managed data can be shared with others
• A research data management plan helps in achieving this.
• Be clear about who is responsible for research data and its management.

References

Digital Curation Centre, www.dcc.ac.uk (accessed 27/11/12).
Digital Curation Centre, Overview of funders' data policies, www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies (accessed 26/11/12).
Digital Curation Centre, Resources for digital curators, www.dcc.ac.uk/resources (accessed 27/11/12).
Henneken, E.A. & Accomazzi, A. Linking to Data - Effect on Citation Rates in Astronomy. Astronomical Data Analysis Software and Systems XXI: p.763-766, http://arxiv.org/abs/1111.3618, 2011.
Jones, S. ‘How to Develop a Data Management and Sharing Plan’. DCC How-to Guides. Edinburgh: Digital Curation Centre, 2011. Available online: www.dcc.ac.uk/resources/how-guides (accessed 26/11/12).
Massachusetts Institute of Technology (MIT) Libraries, Data Management and Publishing, http://libraries.mit.edu/guides/subjects/data-management/index.html (accessed 26/11/12).
OECD, OECD Principles and Guidelines for Access to Research Data from Public Funding, 2007. http://www.oecd.org/science/sci-tech/38500813.pdf (accessed 21/02/13).
Piwowar H.A., Day R.S., Fridsma D.B. Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308. doi:10.1371/journal.pone.0000308, 2007.
Research Councils UK, RCUK Common Principles on Data Policy, www.rcuk.ac.uk/research/Pages/DataPolicy.aspx (accessed 10/12/12).
Research Councils UK, RCUK Policy and Code of Conduct on the Governance of Good Research Conduct, Integrity, Clarity and Good Management, October 2011, www.rcuk.ac.uk/documents/reviews/grc/goodresearchconductcode.pdf (accessed 26/11/12).
Timescapes, an ESRC Qualitative Longitudinal Initiative, http://www.timescapes.leeds.ac.uk/ (accessed 26/11/12).
UK Data Archive, Create and Manage Data, www.data-archive.ac.uk/create-manage (accessed 26/11/12).
University of Leeds, Policy on safeguarding data, http://iss.leeds.ac.uk/info/362/policies/782/policy_on_safeguarding_data (accessed 08/11/12).
University of Leeds, Policy on Research Data Management, http://library.leeds.ac.uk/researchdata-policies#activate-tab1_university_research_data_policy (accessed 08/11/12).
University of Leeds, Data Protection Code of Practice, www.leeds.ac.uk/secretariat/data_protection_code_of_practice.html (accessed 08/11/12).

Researchers should also make sure they know of the data management requirements of their research funders.

[Annex A - Data Management Plan A]

Socio-technical Systems and Call Centres: a Case Study Investigation

This project is not yet funded.
Funding body: ESRC
Lead organisation: University of X
Other organisations: Financial call centre A; Financial call centre B
Project dates: 02 Jan 2012 to 30 Apr 2012
Budget: £25,000.00

1 Existing data sources
1.1 An explanation of the existing data sources that will be used by the research project (with references).
2.2.2 What existing datasets could you use or build upon?
N/A

2 Gaps between the currently available and required data
2.1 An analysis of the gaps identified between the currently available and required data for the research.
2.3.1 Why do you need to capture/create new data?
There are currently no data available that facilitate my research objectives.
2.4.1 What is the relationship between the new dataset(s) and existing data?
N/A

3 Information on the data that will be produced by the research project
3.1 Data volume and data type, e.g. qualitative or quantitative data
2.1 Give a short description of the data being generated or re-used in this research
35 semi-structured interviews will be carried out with financial call centre managers, employees and customer service representatives (CSR's). The interviews will be audio-taped.
3.2 Data quality, formats, standards documentation and metadata
2.3.3 Which file formats will you use, and why?
I will make use of standard formats for my dataset and audio files.
2.3.4 What criteria will you use for Quality Assurance/Management?
Interview participants will have the chance to sign-off on the audio transcripts to ensure that the information is accurate.
2.5.1 Are the datasets which you will be capturing/creating self-explanatory, or understandable in isolation?
Yes
2.5.2 If you answered No to DCC 2.5.1, what contextual details are needed to make the data you capture or collect meaningful?
N/A
3.3 Methodologies for data collection
2.3.2 Describe the process by which you will capture/create new data
The interviews will be carried out face to face. Transcripts will be created following the interview.

4 Quality assurance and back-up procedures
4.1 Planned quality assurance and back-up procedures (security/storage)
5.2.1 How will you back-up the data during the project's lifetime?
The data will be stored on the PI's laptop during the life of the project. Once the project has been completed, the data will be deposited with ESDS.
5.3.1 How will you manage access restrictions and data security during the project's lifetime?
Only the PI will have access to the data during the project.
5.3.3 Give details of any other security issues.
N/A

5 Management and archiving of collected data
5.1 Plans for management and archiving of collected data
6.1 What is the long-term strategy for maintaining, curating and archiving the data?
The data will be deposited with the ESDS upon completion of the project.

6 Difficulties in data sharing
6.1 Expected difficulties in data sharing, along with causes and possible measures to overcome these difficulties. (You may wish to include explicit mention of consent, confidentiality, anonymisation and other ethical considerations.)
3.1.1 Are there ethical and privacy issues that may prohibit sharing some or all of the dataset(s)?
Yes
3.1.2 If you answered Yes to DCC 3.1.1 How will these be resolved?
Transcripts will only be seen by the participants involved in the interview and the PI. The transcripts for deposit to ESDS will be anonymised.

7 Copyright and intellectual property
7.1 Copyright and intellectual property ownership of the data
3.2.1 Will the dataset(s) be covered by copyright or the Database Right? If so give details in DCC 3.2.2, below.
Yes
3.2.2 If you answered Yes to DCC 3.2.1 Who owns the copyright and other Intellectual Property?
The PI owns the copyright for this data.

8 Responsibilities for data management and curation
8.1 Responsibilities for data management and curation within research teams at all participating institutions
7.1 Outline the staff/organisational roles and responsibilities for implementing this data management plan.
The PI will be responsible for all aspects of data management during the life of the project. ESDS will be responsible for long-term management and curation of this data.
3.1.2 If you answered Yes to DCC 3.1.1 How will these be resolved?
Transcripts will only be seen by the participants involved in the interview and the PI. The transcripts for deposit to ESDS will be anonymised.
Signature _______________________________ Date ______________________________
Print name ______________________________ Role/institution ______________________

Signature _______________________________ Date ______________________________
Print name ______________________________ Role/institution ______________________

Signature _______________________________ Date ______________________________
Print name ______________________________ Role/institution ______________________

[Annex B - Data Management Plan B]

Public relations, society and the public sphere

Project Stage: Application
RCUK Research Councils: Economic and Social Research Council
Lead Organisation: University of Leeds
Project dates: 1 January 2013 to 31 December 2015
Budget: £414,238.00

1 Existing data sources
1.1 An explanation of the existing data sources that will be used by the research project (with references).
DCC 2.2.2: What existing datasets could you use or build upon?
A number of data sets are named in the proposal, as follows: National Census Data 2011; PRCA member survey 2011; CIPR member survey 2011.

2 Gaps between the currently available and required data
2.1 An analysis of the gaps identified between the currently available and required data for the research.
DCC 2.3.1: Why do you need to capture/create new data?
Existing quantitative data will provide enough material to establish the structures of the PR industry. However, there is no existing qualitative data about the ways in which the cultures and practices of PR across different contexts contribute to the shape of the field. For this reason, new qualitative data is required, targeting this information specifically.
DCC 2.4.1: What is the relationship between the new dataset(s) and existing data?
Existing data will facilitate a quantitative analysis of the field’s objective structures (company sizes, specialist sectors, turnover, geographical spread, practitioner demographics).
This will inform the analysis of the qualitative data and put into context the cultural and practical norms that characterise the field. The analysis of both aspects of the field will help clarify why certain meanings and values are attributed to different forms of practice across the field, and how these different attributions ultimately shape the industry as a whole.

3 Information on the data that will be produced by the research project
3.1 Data volume and data type, e.g. qualitative or quantitative data
DCC 2.1: Give a short overview description of the data being generated or re-used in this research
Existing quantitative data will be drawn from industry associations, the ESDS databases and government sources, in order to establish the objective structures of the field. These sources are:
• National occupational and demographic data including the 2011 census data
• Existing CIPR and PRCA membership data covering industry structures including sectors, specialisms, salary levels, educational data, patterns of tenure, practitioner demographics and distribution across the field, consultancy size, turnover and locations (e.g. PRCA, 2011, CIPR 2010 and the annual PRCA benchmarking survey)
• CIPR and PRCA membership survey data
• Quantitative analysis of PR award recipients (campaigns and practitioners) for the last decade (this database will be constructed from existing records)
• ESRC databases (Miller, 2005)
• The Great British Class Survey (BBC-led project, data will be publicly available in 2013)
Qualitative data sources are new, raw observational data gathered from interviews, focus groups and existing documentary sources, and will comprise:
• Professional association and consultancy web sites and promotional material
• Industry awards submissions and judgements
• Interviews and focus groups with practitioners
• Interviews with recruitment specialists in PR
• Case analyses of specific issues as examples of ‘public sphere’ debates
3.2 Data quality, formats, standards documentation and metadata
DCC 2.3.3: Which file formats will you use, and why?
Microsoft Word 2007 for text based documents. MP3 or WAV for audio files. Quicktime Movie or Windows Media Video for video files. Quantitative data analysis will be stored in SAV file format (used by SPSS) from which data can be extracted using the open-source spssread Perl script. These file formats have been chosen because they are accepted standards and in widespread use. Files will be converted to open file formats where possible for long term storage.
DCC 2.3.4: What criteria and/or procedures will you use for Quality Assurance/Management?
Existing quantitative data sources will be subject to the quality assurance processes of the organisation that originally gathered the data. However, we will review the data and ensure it is ‘cleaned’ for use in our own study. Where possible and permitted by participants, data will also be stored in the ESDS archive, as per the terms of ESRC funding.
DCC 2.5.1: Are the datasets which you will be capturing/creating self-explanatory, or understandable in isolation?
Yes
DCC 2.5.2: If you answered No to DCC 2.5.1, what contextual details are needed to make the data you capture or collect meaningful?
However, in order to be absolutely clear about the meaning of the data, each data set will have an overview document attached to it, covering the details required by ESDS that will facilitate their use by other researchers (e.g. sample structure, focus of the data collection, dates of collection, etc).
3.3 Methodologies for data collection
DCC 2.3.2: Describe the process by which you will capture/create new data
The following methods will be used to collect new data:
1. Collating existing web site and promotional material from professional associations and consultancies;
2. Collation of submissions to and judgements of industry awards;
3. Interviews with practitioners (60 in total), PR specialist recruiters (5) and end-users of PR (15);
4. Focus groups with practitioners in different UK regions (6);
5. Case analyses (3 issues) comprising PR-driven documents created by stakeholder organisations in relation to the issue; and media coverage of the different perspectives on the issue (including PR-driven and non-PR driven media coverage).

4 Quality assurance and back-up procedures
4.1 Planned quality assurance and back-up procedures (security/storage)
DCC 5.2.1: How will you back-up the data during the project's lifetime?
Electronic data will be stored on the University of Leeds SAN (Storage Area Network), which comprises enterprise level file servers in physically secure data centres with appropriate fire suppression equipment. Snapshots are taken every day at 10pm (and accessible for 1 month). A second level of snapshots is taken every month and is kept for 11 months. Snapshots are user recoverable from the desktop. An incremental copy to backup tape is taken every night (and kept for 28 days) and a full copy is taken every month.
Every quarter, the full dump tapes are moved to a long term storage facility where they are kept for 12 months. Tapes are initially stored in on-campus fireproof safes and then moved to off-campus secure locations.
DCC 5.3.1: How will you manage access restrictions and data security during the project's lifetime?
Access to electronic data is controlled by Active Directory (AD) Group membership. The Faculty IT Manager will set up a dedicated folder for this research project and create read-only and read-write AD groups. The PI will decide which users require read-only and read-write access. Off-campus access is via the Citrix portal. External users who need access to the data will apply for a University username and then be assigned to the appropriate AD group.
DCC 5.3.3: Give details of any other security issues.
Any sensitive data (as defined by the Data Protection Act) that is stored on portable electronic devices will be protected by encryption software to FIPS 140-2 standard. Any sensitive data that needs to be transmitted electronically will first be encrypted to FIPS 140-2. If any highly sensitive data needs to be stored, then the research data folder on the SAN will be encrypted, so it can only be accessed by authorised members of the project with the appropriate encryption software installed on their desktop PCs. Highly sensitive data is not available from off-campus.

5 Management and archiving of collected data
5.1 Plans for management and archiving of collected data
DCC 6.1: What is the long-term strategy for maintaining, curating and archiving the data?
Data which is able to be made publicly available will be offered to the UK Data Archive at the end of the project.

6 Difficulties in data sharing
6.1 Expected difficulties in data sharing, along with causes and possible measures to overcome these difficulties. (You may wish to include explicit mention of consent, confidentiality, anonymisation and other ethical considerations.)
DCC 3.1.1: Are there ethical and privacy issues that may prohibit sharing some or all of the dataset(s)?
Yes
DCC 3.1.2: If you answered Yes to DCC 3.1.1, How will these be resolved?
Some consultancies or individuals may be reluctant to have their data shared. Such concerns will be resolved for the interviews and focus groups by ensuring all transcripts are anonymous and identifying details (e.g. campaign names) are removed or given pseudonyms in the transcript. The case analyses, industry awards data and analysis of promotional material will proceed on the basis of publicly available documents, so the issue of sharing should not arise. However, copyright may be an issue for some of these documents. To resolve this, and before data is stored by ESDS, copyright permission will be sought for web and other promotional material written by participating organisations.

7 Copyright and intellectual property
7.1 Copyright and intellectual property ownership of the data
DCC 3.2.1: Will the dataset(s) be covered by copyright or the Database Right? If so give details in DCC 3.2.2, below.
Yes
DCC 3.2.2: If you answered Yes to DCC 3.2.1, Who owns the copyright and other Intellectual Property?
See response to question 3.1.1

8 Responsibilities for data management and curation
8.1 Responsibilities for data management and curation within research teams at all participating institutions
DCC 7.1: Outline the staff/organisational roles and responsibilities for data management
The PI will have overall responsibility for implementing the data management plan. The Faculty IT Manager will be responsible for ensuring that electronic file permissions have been correctly assigned and for advising on other aspects of data storage and security. Staff involved in the project at participating institutions will be responsible for following data management procedures.
[Annex C - ESDS Data Management Plan Guidance for Peer Reviewers]

Data Management Plan: Guidance for peer reviewers

What is a data management plan?
What do I need to check?
Assessment of existing data
Information on new data
Quality assurance of data
Backup and security of data
Expected difficulties in data sharing
Copyright/intellectual property right
Responsibilities
Preparation of data for sharing and archiving

What is a data management plan?

A data management plan should incorporate data management into the research cycle to ensure that generated data can be made available and re-used to the maximum extent possible at the end of a grant, in line with the ESRC Research Data Policy (http://www.esrc.ac.uk/about-esrc/information/data-policy.aspx).
A data management plan is mandatory in all applications planning to generate data, except for applicants applying for studentships.

Most data generated as a result of economic and social research can be successfully archived and shared. However, some research data are more sensitive than others. It is the grant holder’s responsibility to consider all issues related to confidentiality, ethics, security and copyright before initiating the research. Any challenges to data sharing (eg copyright or data confidentiality) should have been critically considered in a plan, with possible solutions discussed to optimise data sharing.

What do I need to check?

Please assess the quality of the proposed data management plan and comment on whether appropriate and realistic consideration has been given to data management requirements to maximise data sharing, and whether the requirements are justified according to the proposed research. The data management plan is an integral part of the application and should be considered in the context of information presented in the Case for Support and Justification for Resources.

We have given suggestions below for each of the points included in a data management plan that may help you to evaluate whether the plan is fit for purpose. For more guidance please refer to the ESDS data management guides (http://www.esds.ac.uk/support/datamanguides.asp) and the UK Data Archive’s Managing and Sharing Data guide (http://www.data-archive.ac.uk/create-manage). Any suggestions for improvements to data management plans are welcome and will be fed back to applicants.

If you do not feel competent to comment on data management, please select ‘Unable to assess’.

Assessment of existing data

You may want to consider the following questions:
• Is there evidence that secondary sources of data have been considered and evaluated?
• Is there evidence presented that the project is not creating new data when there are existing resources that could be re-used?
• If existing data are used, have issues such as copyright or IPR of such data been considered and possible copyright clearance obtained to be able to share data or data derived thereof?

Information on new data

You may want to consider the following questions:
• Is the information on data to be produced adequate and realistic and according to the research and methodology proposed in the application?
• Is there evidence that the plan covers all data that is planned to be generated from the research?
• Is sufficient information given on how data will be collected and in which formats (eg Open Document Format, tab-delimited, Excel etc) data will be analysed and stored, as well as an indication of how they will be documented?

Quality assurance of data

You may want to consider the following questions:
• Is information given on procedures for quality assurance that will be carried out on the data collected? (Please refer to the Case for Support for full information on quality control of the proposed research.) This could include methods for data validation or standards applied during data collection and data entry, codes of research practice adhered to, transcription templates used, etc.
• Are no quality assurance procedures mentioned when there is a clear need from the proposed research that there should be?
Please note that quality issues are to be addressed at the time of data collection, data entry, digitisation or data checking.

Backup and security of data

You may want to consider the following questions:
• Is the data back-up procedure described fit for purpose? eg considering back-up procedures for all institutions
• Are methods of version control described? (ie making sure that if the information in one file is altered, the related information in other files is also adopted, as well as keeping a track on a number of versions and their locations)

Expected difficulties in data sharing

You may want to consider the following questions:
• Have all obstacles to sharing data been considered?
• Have strategies been considered for dealing with these issues? For example by:
o discussing data sharing and re-use with interviewees and gaining specific consent from participants to share research data
o anonymising data to remove personal and disclosive information
o regulating access to data
If there are ethical issues which may cause difficulties in data sharing, strategies for dealing with these issues should be discussed in the relevant section in the Je-S form. In assessing this part of the application you may want to refer to the requirements of the ESRC Framework for Research Ethics (www.esrc.ac.uk/researchethics).

If newly generated data cannot be shared, adequate justification should be given. It may be a case that parts of the data that are sensitive cannot be shared, but this should be considered critically and the plan should provide evidence that it has been assessed from all angles. We regard a waiver of deposit as an exception, and reserve the right to refuse waivers where there is insufficient evidence that the applicant has fully explored all strategies to enable data sharing and archiving.

Copyright/intellectual property right

You may want to consider the following questions:
• Is copyright of research data (both existing sources of data used or created) agreed or clarified, especially for collaborative research or if various sources of data are combined?
• Are plans in place for copyright clearance for data sharing (if possible)?

Responsibilities

You may want to consider the following questions:
• Have data management responsibilities been allocated to named individuals?
• Is there evidence that data management will be followed throughout the course of the project?
• Has consideration been given to the variety of data management tasks that may be required for the research?
• For collaborative research, are data management responsibilities allocated at each partner organisation (if needed for the research) or has the coordination of data management responsibilities across partners been considered?
For further information please refer to the information provided within the Staff Duties section in the Je-S form and where appropriate in the Justification of Resources.

Preparation of data for sharing and archiving

The following questions may be considered when assessing this section of the plan:
• Are the plans for preparing and documenting data for sharing and archiving with the Economic and Social Data Service appropriate?
• Is there evidence that data will be well documented during research to provide high-quality contextual information and/or structured metadata for secondary users? eg documenting the method of data collection, origin, circumstances, processing and analysis of data.

[Annex D - University of Leeds Research Data Management Policy (July 2012)]

University of Leeds Research Data Management Policy (July 2012)
http://researchdata.leeds.ac.uk/management-policy

The management of Research Data reflects our:
• commitment to research excellence
• recognition of our duty to our funders
• appreciation of the value of our data - to us and to others

1. Research data will be managed to agreed standards throughout the research data lifecycle and according to funder requirements.
2. Responsibility for research data management during any research project or programme lies with responsible owners such as Principal Investigators (PIs).
3. The University is responsible for the provision of training, support and advice on research data management.
4. A data management plan that explicitly addresses the capture, management, integrity, confidentiality, preservation, sharing and publication of research data must be created for each proposed research project or funding application. Sufficient metadata shall also be created and stored to aid discovery and re-use. Data management plans should take account of and ensure compliance with relevant legislative frameworks which may limit public access to the data (for example, in the areas of data protection, intellectual property and human rights).
5. All research data should be offered and assessed for deposit and preservation in an appropriate University, national or international data service or domain repository, unless specified otherwise in the data management plan.
6. Data should not be deposited with any organisation that does not commit to its access and availability for re-use, unless this is a condition of the project funding or arising from other requirements.
7. At the completion of each research project, the PI should ensure that all relevant research data are made available, subject to meeting appropriate requirements, in the location specified in the data management plan.
8. Research and Innovation Board will be responsible for reviewing and updating the policy.

The University recognises the following benefits of implementing this policy:
a. support for the re-use of data
b. benefit future generations
c. improved data integrity, security and access management
d. opportunities for further research collaboration
e. improved research reproducibility and validation
f. further development of research skills
g. the ability to cite data as a publication
h. improved institutional research reputation
i. improved relationship with research funders