Risk Assessment and Cost Appraisal
Delos - November 15-16
Accademia Nazionale dei Lincei - Rome
Rory McLeod - Digital Preservation Manager
The British Library, London

Overview
Risk assessment
  Background
  Components
  Key findings and activities 2007/08
Cost appraisal
  Lifecycle management
  LIFE1
  LIFE2

Content analysis from 2006 - DPC Mind the Gap
Over 200 terabytes of data, growing by over 50 terabytes a year
Majority is from the sound archive, however:
  Manuscripts: 1.5 terabytes
  Asia, Pacific and Africa: 9.5 terabytes
  European and American: 0.5 terabytes
  Commercial services: 5 terabytes
  Voluntary deposit of electronic publications: 1.5 terabytes
  Newspapers: potentially 70 terabytes at project conclusion
  Microsoft digitisation: 30 terabytes
  Sound recordings: 12 terabytes
A wide variety of formats are represented in the data; most common formats are found, but there are also smaller amounts of rare and proprietary data
Update: based upon the risk assessment, we estimate the total size to be closer to 300 TB.

Background - Risk assessment 2007
The objective of the Digital Preservation Team is to address the risk of deterioration of digital material, from short-term access through to long-term preservation.
Building on a 2003 study
When we examined the 2003 risk assessment we concluded:
  Having the object isn’t enough (CDR)
  Knowing the format of the object isn’t enough (EPS)
  You need software to use it, and a computer to run it on (PostScript parser)
  The functionality and access of the object can intimately depend on the details of the environment, most of which we don’t have (operating system, hardware requirements)
Taking those basic concepts a little further...

Background - Risk assessment 2007
Physical media deterioration:
  The lifetimes of physical media can be measured in years (or even months, e.g.
recordable CD/DVD)
  Unlike books, which can be kept for centuries in the right conditions
Technical obsolescence:
  3.5” floppy disk drives used to be ubiquitous; now only a few computers have them
File format obsolescence:
  Keeping the bitstream isn’t enough; we need to understand it
  Many file formats are undocumented; we can’t understand the files, and need the software
  Software gets abandoned (who uses WordStar any more?) and new versions can be incompatible with old files
Environmental obsolescence:
  Keeping the software isn’t enough; we need to run it
  But old hardware doesn’t work, or isn’t available

Background - Risk assessment 2007
So our starting point:
1. Identify the digital assets currently held
2. Identify the environmental requirements of those assets
3. Assess the risks jeopardizing the use of those assets
4. React to those risks (“save” those assets to which access no longer exists or will soon be lost)
5. Proactively respond to those risks (prevent assets from becoming inaccessible in the future)

Background - Risk assessment 2007
This produced an accurate and detailed digital holdings list:
  Be as near to exhaustive as possible
  Be detailed
    Physical formats
    Disk/file-system formats
    File formats
    Operating system requirements
    Application requirements
  [Bracketed on the slide: not covered in 2003]

Background - Risk assessment
From this holdings list, we have performed a risk assessment to:
  Enumerate the risks faced
  Evaluate the likelihood and impact of each risk
  Rank holdings according to risk
  Perform risk-based triage on holdings
And longer term, from the assessment and ranking we will:
  Prioritise ingest into DOM
  Write preservation plans and take preservation actions to target the highest-risk material
  Possibly:
    Migrate it to less risky file formats
    Preserve software environments (emulators etc.)
  Guide future ingest
    Determine a set of preferred long-term preservation formats

Components - Risk assessment 2007
AS/NZS 4360:2004 risk standard
Follows on from the 2003 study
Methodology is split into context, identify, analyse, evaluate and treat
A new value scale for BL holdings was used, plus DRAMBORA, an established risk toolkit, for impact (cataclysmic to none)
Representative content was analysed from all areas of the collections, the first time this has been done
23 different risks were identified; these were gathered into 6 direct and 2 indirect risks
  Direct: Media degradation, Media obsolescence, File format obsolescence, Hardware obsolescence, Operating system + file system, Software obsolescence
  Indirect: Poor policy (cataloguing, metadata), Poor policy other (handling, training)

Components - Risk assessment 2007
The AS/NZS 4360:2004 Risk Management standard defines a seven-step approach to risk management:
Communicate and consult
  Communicate and consult with internal and external stakeholders as appropriate at each stage of the risk management process and concerning the process as a whole.
Establish the context
  Stakeholders are identified, and the objectives of the stakeholders and the organization as a whole are established.
Identify the risks
  In this stage, the risks (that is, what can go wrong) are enumerated and described. We used a combination of industry analysis and real-life scenarios.

Components - Risk assessment 2007
Analyse the risks
  This step covers the evaluation of the impact of the risks, and the likelihood of those risks. The evaluation may be qualitative or quantitative.
Evaluate the analysis
  At this stage, negligible risks might be discarded (to simplify analysis), and evaluations (especially qualitative evaluations) adjusted.
Treat the risks
  The options to address the risks are identified, and the best option is chosen and implemented. This may include “taking no action” if no risk is sufficiently serious. This step was felt to be beyond the remit of this assessment project.
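
The analyse and evaluate steps above can be illustrated with a minimal sketch. It assumes a simple likelihood-times-impact score, which is a common convention but is not stated on the slides. The numeric scales, the example scores, and the names `Risk` and `rank_risks` are all hypothetical; only the risk names come from the assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 6 (near certain)
    impact: int      # hypothetical scale: 0 (none) to 6 (cataclysmic)

    @property
    def score(self) -> int:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def rank_risks(risks, negligible_below=1):
    """Discard negligible risks, then rank the rest from highest score down."""
    kept = [r for r in risks if r.score >= negligible_below]
    return sorted(kept, key=lambda r: r.score, reverse=True)


# Illustrative values only; these are not figures from the assessment.
risks = [
    Risk("Software obsolescence", likelihood=2, impact=3),
    Risk("Media degradation", likelihood=5, impact=6),
    Risk("File format obsolescence", likelihood=3, impact=4),
    Risk("Hypothetical negligible risk", likelihood=1, impact=0),
]

ranked = rank_risks(risks)
for r in ranked:
    print(f"{r.score:>3}  {r.name}")
```

A qualitative evaluation would replace the integer scores with ordered categories; the discard-then-rank structure stays the same.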

Components - Risk assessment 2007
Monitor and review
  It is necessary to monitor the effectiveness of all steps of the risk management process. This is important for continuous improvement. Risks and the effectiveness of treatment measures need to be monitored to ensure changing circumstances do not alter priorities.
The assessment also uses the impact scale devised in the DRAMBORA [i] methodology.
[i] Digital Repository Audit Method Based on Risk Assessment, http://www.repositoryaudit.eu/

Summary
The first part of the analysis was to create an inventory of the digital assets. Each collection area was visited, its staff interviewed, and a partial audit of its digital material conducted. This provided an indicative sample of the current state of play within the Library. It is likely that continued annual updating of this list will form part of the long-term maintenance of the analysis.

Components - Risk assessment 2007
Broad results were returned, which can be split into technical and policy findings. The headlines are:
  12 of the 13 case studies returned results consistent with the highest category of risk identified (Media degradation)
  Secondary risks associated with software and hardware are less severe, but without addressing Media degradation all data is at the same high level of risk.
  Failure rates for disks within the BL collections have reached a high level (up to 3%)
  No central store or service for this digital content
  The proposed timeframes for ingest of this material mean that an interim solution must now be considered to safeguard this material (and prepare it for ingest)
  There is a lack of awareness of the fragility of these collection items across the BL
  There is a need for training in both handling and data stewardship skills across the collection areas

Components - Risk assessment 2007
Risk ranking  Risk
8             Media degradation
7             Media obsolescence
6             File format obsolescence
5             Hardware obsolescence
4             Operating system + file system obsolescence
3             Software obsolescence
2             Poor policy (improper cataloguing, metadata)
1             Poor policy (other)
Access types jeopardized, as listed on the slide: Bit-stream; File/Semantic; Semantic; Semantic/File/Bit-stream

Risk assessment - Final Prioritisation
Priority for action / Collection:
SDM 1 Endangered Archives Newspapers Modern British/Legal Deposit Maps e-manuscripts STM 2 European/APAC Photography Sound ASR project IDP Music Web Archiving 3 VDEP Sound Archive (in DOM)

Components - Risk assessment 2007
BL preservation value system
  Our obligation to preserve the material
  Estimates of the cost/effort to mitigate the risks
  Estimates of the resource available to the Digital Preservation Team
  Estimates of the cultural significance and value of the collection
  The commercial significance and value of the collection
  The need for further analysis of the collection to inform future preservation activities
  Reader and researcher needs
  Interest and demand

Key findings - Risk Assessment 2007
DPT needs to create and implement a policy that deals with all digital content consistently
  This reduces the variations seen in how digital material is cared for
BL needs to move from at-risk physical media to online hard disk-based managed storage.
  This addresses media deterioration, physical damage, environmental damage, and media obsolescence, and is believed to be the best long-term storage option available
  This also enhances manageability of the digital collection
Where migration to hard disk is not immediately possible, move to climate-controlled (etc.) storage to ensure that the physical media last as long as possible (and back up)
  This reduces the problems due to media deterioration, physical damage, and environmental damage
Failure rates for disks within the BL collections have reached unacceptably high levels (up to 3%) for hand-held media.

Activities to mitigate 07/08 - Risk Assessment 2007
Monitor and review
  DPT will use a continuous improvement approach, constantly reducing the level of risk
  Annual update to the risk assessment to continuously improve the condition of the collection-based digital objects
  Annual identification of resulting actions to mitigate risks
  Management of the digital preservation prioritisation table
  Key performance indicators to be drawn from the risk factors within the prioritisation table, to be monitored by the digital preservation steering group. (Ideally all risk factors should be in a continuous process of reduction)
Resource Plan
  DPT will take responsibility for this effort by writing a resource plan to establish next-stage activity.
  This will involve people, equipment, storage and policy issues.
  Establishment of the British Library centre for digital preservation based upon this risk work. This work is already underway.

The cost of digitisation and preservation: The LIFE Project

Cost appraisal overview
► What is the LIFE Project?
► LIFE1 and LIFE2
► LIFE Models
► Burney Case Study
► Benefits
► Further Information

Lifecycle Information for E-literature
Project phases:
► LIFE1 (12 months)
► LIFE2 (18 months)

LIFE starts to answer the question:
What is the long term cost of preserving digital material?

Why use lifecycle costing?
Enables evaluation of all the financial commitments for an item in a collection
Important for digital collections, where many costs are largely unknown

Lifecycle management - aims
Better understanding of the digital lifecycle
Plan and prepare for digital preservation activities
Evaluate and improve efforts
Compare analogue and digital

LIFE1 project
1. Literature Review
2. Economic Lifecycle Model
3. Generic Preservation Model
4. Case Studies
5. International Conference

LIFE1 Case Studies
e-Journals
Web Archiving
Voluntary Deposit

LIFE1 LIFE2

Aim of LIFE2
To evaluate, refine and further develop the techniques developed in phase one of LIFE

LIFE2 deliverables
► Economic Evaluation of LIFE1
► Revision of the LIFE Model
    Version 1.1 (October 2007)
    Version 2 (Summer 2008)
► Updated Preservation Model (Summer 2008)
► Final report
► End of project conference

The LIFE Model v1.1
Lifecycle stages and their lifecycle elements:
  Creation or Purchase: ....
  Acquisition: Selection; Submission Agreement; IPR & Licensing; Ordering & Invoicing; Obtaining; Check-in
  Ingest: Quality Assurance; Deposit; Holdings Update; Reference Linking
  Metadata Creation: Re-use Existing Metadata; Metadata Creation; Metadata Extraction
  Bit-stream Preservation: Repository Admin; Storage Provision; Refreshment; Backup; Inspection
  Content Preservation: Preservation Watch; Preservation Planning; Preservation Action; Re-ingest
  Access: Access Provision; Access Control; User Support

LIFE Model v1.1: Non-lifecycle Elements
Non-lifecycle stages and their non-lifecycle elements:
  Management and Administration: Management; Administration
  Systems / Infrastructure: Repository Software
  Economic Adjustments: Inflation; Discounting

Generic LIFE Preservation Model
Preservation Actions:
1. Preservation Tool Cost
2. Preservation Metadata
3. Performing preservation action
4. Quality Assurance
The GPM predicted large cost and much activity - the challenge is reducing both.

Generic LIFE Preservation Model
Preservation cost of n objects of a particular format for the period 0 to t,
e.g. 200000 objects of the GIF format for a period of 10 years.

Preservation = Tech Watch + (Frequency of action * Preservation action)

  Tech Watch: monitoring formats and software for obsolescence; preservation planning
  Frequency of action: the number of actions within the time period calculated
  Preservation action components: cost of preservation tool; updating metadata (object and event preservation metadata); perform preservation action; Q/A

Complexity of file formats
Preservation = Tech Watch + (Frequency of action * Preservation action)
Preservation action components: cost of preservation tool; update metadata; perform preservation action; Q/A
Format complexity factors: size, complexity, proprietary, open, standardised

Category     Complexity  Examples
Simple       0.1         ASCII, Unicode
Bitmap       0.2         JPEG, GIF
Mark-up      0.3         XML, HTML
Vector       0.4         EMF, Draw
Multimedia   0.6         MPEG3, WAV
Document     0.8         Word, PDF
Complex      1           Oracle database dump

LIFE2 Case Studies
Institutional Repositories
Primary Data
Digitised Newspapers

The Burney Collection
Purchased by the British Library in 1818 for £13,500
1,100 volumes of the earliest known newspapers
1,000,000 pages from the 17th, 18th and 19th centuries
Re-scanning or re-microfilming is not possible.
Microfilmed in the 1970s
Digitisation started in 1995-96 and ran until 2004.

Questions that arise from Burney
Comparing digital and analogue lifecycles
What is the lifecycle cost to an institution of producing digitised surrogates?
What are the key preservation issues common across digitisation projects of differing scales?

Benefits of LIFE
► Assess the financial commitment for acquiring or creating new digital materials
► More effective planning for preservation activities
► Comparison of digital lifecycles across collections
► Evaluation and optimisation of existing digital lifecycles
► Prediction of the future cost of digital preservation

LIFE Website & Blog
Website: www.life.ac.uk
LIFE Blog: www.life.ac.uk/blog

Thanks and Acknowledgements
Thanks for your attention.
Risk Assessment 2007 (Peter Bright and Paul Wheatley)
LIFE Team (Paul Ayris, Helen Shenton, Paul Wheatley and Richard Davies)
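
Worked illustration of the Generic LIFE Preservation Model formula from the cost appraisal slides (Preservation = Tech Watch + Frequency of action * Preservation action). This is a minimal sketch, not the LIFE project's own implementation. It assumes the four preservation action components are simply summed, which the slides list but do not state as a sum. All monetary figures and the names `PreservationAction` and `preservation_cost` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreservationAction:
    """Cost components of one preservation action for the whole set of objects."""
    tool_cost: float
    metadata_update_cost: float
    perform_action_cost: float
    qa_cost: float

    def total(self) -> float:
        # Assumption: the four components are additive.
        return (self.tool_cost + self.metadata_update_cost
                + self.perform_action_cost + self.qa_cost)


def preservation_cost(tech_watch_cost: float,
                      action_frequency: int,
                      action: PreservationAction) -> float:
    """Preservation = Tech Watch + Frequency of action * Preservation action."""
    if action_frequency < 0:
        raise ValueError("action_frequency must be non-negative")
    return tech_watch_cost + action_frequency * action.total()


# Hypothetical figures for a single format over one planning period.
example_action = PreservationAction(
    tool_cost=5000.0,
    metadata_update_cost=1500.0,
    perform_action_cost=3000.0,
    qa_cost=500.0,
)
cost = preservation_cost(tech_watch_cost=2000.0,
                         action_frequency=2,
                         action=example_action)
print(cost)  # 22000.0
```

The format complexity values from the slides are deliberately left out of the sketch, since the slides do not say which term they scale.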