Risk assessment and cost appraisal

Delos - November 15-16
Accademia Nazionale dei Lincei - Rome
Rory McLeod – Digital Preservation Manager
The British Library, London
Overview

• Risk assessment
  – Background
  – Components
  – Key findings and activities 2007/08
• Cost appraisal
  – Lifecycle management
  – LIFE1
  – LIFE2
Content analysis from 2006 (DPC, Mind the Gap)

• Over 200 terabytes of data, growing by over 50 terabytes a year
• The majority is from the sound archive; however:
  – Manuscripts: 1.5 terabytes
  – Asia, Pacific and Africa: 9.5 terabytes
  – European and American: 0.5 terabytes
  – Commercial services: 5 terabytes
  – Voluntary deposit of electronic publications: 1.5 terabytes
  – Newspapers: potentially 70 terabytes at project conclusion
  – Microsoft digitisation: 30 terabytes
  – Sound recordings: 12 terabytes
• A wide variety of formats is represented in the data; most common formats are found, but there are also smaller amounts of rare and proprietary data

Update: based on the risk assessment, we estimate the total size to be closer to 300 TB.
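As a rough illustration of the growth figures above, the sketch below projects collection size linearly from the 2006 numbers (about 200 TB, plus about 50 TB a year). Constant growth is an assumption made only for illustration, and `project_size_tb` is a hypothetical helper, not part of any BL tool.

```python
# Hypothetical sketch: straight-line projection of digital holdings size.
# Assumption: growth is constant from year to year; the source gives only
# the 2006 baseline (~200 TB) and an approximate rate (~50 TB a year).

def project_size_tb(base_tb: float, growth_tb_per_year: float, years: int) -> float:
    """Return the projected size in TB after `years` years of linear growth."""
    return base_tb + growth_tb_per_year * years


if __name__ == "__main__":
    for years in range(4):
        print(f"after {years} year(s): {project_size_tb(200, 50, years):.0f} TB")
```

Real growth is unlikely to be this smooth (large digitisation projects such as newspapers arrive in bursts), so a model like this is only a first-order planning aid.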
Background - Risk assessment 2007
The objective of the Digital Preservation Team is to address the risk of deterioration of digital material, across the span from short-term access to long-term preservation.
• Building on a 2003 study
When we examined the 2003 risk assessment, we concluded:
• Having the object isn’t enough (CDR)
• Knowing the format of the object isn’t enough (EPS)
• You need software to use it, and a computer to run that software on (PostScript parser)
• The functionality of, and access to, the object can depend intimately on details of the environment, most of which we don’t have (operating system, hardware requirements)
Taking those basic concepts a little further:
Background - Risk assessment 2007
Physical media deterioration:
• The lifetimes of physical media can be measured in years (or even months, e.g. recordable CD/DVD)
• Unlike books, which can be kept for centuries in the right conditions
Technical obsolescence:
• 3.5” floppy disk drives used to be ubiquitous; now few computers have them
File format obsolescence:
• Keeping the bitstream isn’t enough; we need to understand it
• Many file formats are undocumented, so we can’t interpret the files without the software
• Software gets abandoned (who uses WordStar any more?), and new versions can be incompatible with old files
Environmental obsolescence:
• Keeping the software isn’t enough; we need to run it
• But old hardware doesn’t work, or isn’t available
Background - Risk assessment 2007
So our starting point:
1. Identify the digital assets currently held
2. Identify the environmental requirements of those assets
3. Assess the risks jeopardizing the use of those assets
4. React to those risks (“save” those assets to which access no longer exists or will soon be lost)
5. Proactively respond to those risks (prevent assets from becoming inaccessible in the future)
Background – Risk assessment 2007
This produced an accurate and detailed digital holdings list, which aimed to:
• Be as near to exhaustive as possible
• Be detailed, recording:
  – Physical formats
  – Disk/file-system formats
  – File formats
  – Operating system requirements
  – Application requirements
  (this level of detail was not covered in 2003)
Background – Risk assessment
From this holdings list, we have performed a risk assessment to:
• Enumerate the risks faced
• Evaluate the likelihood and impact of each risk
• Rank holdings according to risk
• Perform risk-based triage on holdings
And in the longer term, from the assessment and ranking we will:
• Prioritise ingest into DOM
• Write preservation plans and take preservation actions to target the highest-risk material
  – possibly migrating it to less risky file formats
• Preserve software environments (emulators etc.)
• Guide future ingest
• Determine a set of preferred long-term preservation formats
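The ranking and triage steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the 2007 assessment: each risk is scored as likelihood × impact on hypothetical 1-5 scales, a holding's overall score is its worst single risk, and the band thresholds and holding names are placeholders.

```python
# Hypothetical sketch of risk-based ranking and triage of holdings.
# Assumptions: likelihood and impact on 1-5 scales, score = likelihood x impact,
# a holding's score is its worst single risk, and the band thresholds are
# illustrative only.
from dataclasses import dataclass, field


@dataclass
class Holding:
    name: str
    # Maps risk name -> (likelihood 1-5, impact 1-5)
    risks: dict[str, tuple[int, int]] = field(default_factory=dict)

    def score(self) -> int:
        """Worst-case risk score: the highest likelihood x impact product."""
        return max((likelihood * impact for likelihood, impact in self.risks.values()), default=0)


def triage(holdings: list[Holding], high: int = 15, medium: int = 8) -> list[tuple[str, int, int]]:
    """Rank holdings by score (highest first) and assign a priority band (1 = act first)."""
    ranked = sorted(holdings, key=lambda h: h.score(), reverse=True)
    result = []
    for h in ranked:
        s = h.score()
        band = 1 if s >= high else 2 if s >= medium else 3
        result.append((h.name, s, band))
    return result


if __name__ == "__main__":
    example = [  # placeholder holdings and ratings, not real BL data
        Holding("holding_A", {"risk_1": (5, 4), "risk_2": (2, 3)}),
        Holding("holding_B", {"risk_3": (3, 3)}),
        Holding("holding_C", {"risk_4": (2, 2)}),
    ]
    for name, score, band in triage(example):
        print(name, score, band)
```

Taking the worst single risk (rather than a sum) reflects the finding that one dominant risk, such as media degradation, can put all data in a holding at the same high level of risk; a summed score would be an equally reasonable alternative.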
Components - Risk assessment 2007
• AS/NZS 4360:2004 risk standard
• Follows on from the 2003 study
• The methodology is split into stages: establish the context, identify, analyse, evaluate and treat
• A new value scale for BL holdings, plus DRAMBORA (an established risk toolkit), was used to assess impact (from cataclysmic to none)
• Representative content was analysed from all areas of the collections, the first time this has been done
• 23 different risks were identified; these were grouped into 6 direct and 2 indirect risks:
  – Direct: media degradation, media obsolescence, file format obsolescence, hardware obsolescence, operating system and file system obsolescence, software obsolescence
  – Indirect: poor policy (cataloguing, metadata), poor policy (other: handling, training)
Components - Risk assessment 2007
The AS/NZS 4360:2004 Risk Management standard defines a seven-step approach to risk management:
Communicate and consult
Communicate and consult with internal and external stakeholders as appropriate at each stage of the risk management process, and concerning the process as a whole.
Establish the context
Stakeholders are identified, and the objectives of the stakeholders and of the organization as a whole are established.
Identify the risks
In this stage, the risks (that is, what can go wrong) are enumerated and described. We used a combination of industry analysis and real-life scenarios.
Components - Risk assessment 2007
Analyse the risks
This step covers the evaluation of the impact of the risks and of their likelihood. The evaluation may be qualitative or quantitative.
Evaluate the analysis
At this stage, negligible risks might be discarded (to simplify the analysis), and evaluations (especially qualitative evaluations) adjusted.
Treat the risks
The options to address the risks are identified, and the best option is chosen and implemented. This may include “taking no action” if no risk is significant enough to warrant it.
This step was felt to be beyond the remit of this assessment project.
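The "analyse" and "evaluate" steps above can be illustrated with a small sketch. The assumptions are loud ones: the qualitative likelihood labels, the intermediate impact labels (only the "cataclysmic" and "none" end points come from the assessment), their numeric values, the likelihood × impact product and the negligibility threshold are all hypothetical, as are the risk names in the example.

```python
# Hypothetical sketch of the AS/NZS 4360 "analyse" and "evaluate" steps.
# Assumptions: qualitative labels are mapped onto small integers,
# risk = likelihood x impact, and risks below an illustrative threshold
# are treated as negligible and discarded.
LIKELIHOOD = {"very low": 1, "low": 2, "medium": 3, "high": 4, "very high": 5}
IMPACT = {"none": 0, "low": 1, "medium": 2, "high": 3, "severe": 4, "cataclysmic": 5}


def analyse(risks: dict[str, tuple[str, str]]) -> dict[str, int]:
    """Convert each risk's (likelihood, impact) labels into a numeric score."""
    return {name: LIKELIHOOD[lk] * IMPACT[im] for name, (lk, im) in risks.items()}


def evaluate(scores: dict[str, int], negligible_below: int = 3) -> dict[str, int]:
    """Discard negligible risks to simplify the analysis; keep the rest."""
    return {name: s for name, s in scores.items() if s >= negligible_below}


if __name__ == "__main__":
    assessed = {  # placeholder risks and ratings
        "risk_A": ("very high", "cataclysmic"),
        "risk_B": ("medium", "medium"),
        "risk_C": ("very low", "low"),
    }
    print(evaluate(analyse(assessed)))
```

Keeping the label-to-number mapping in explicit tables makes it easy to adjust qualitative evaluations during the "evaluate" step without touching the scoring logic.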
Components - Risk assessment 2007
Monitor and review
It is necessary to monitor the effectiveness of all steps of the risk management process; this is important for continuous improvement. Risks and the effectiveness of treatment measures need to be monitored to ensure that changing circumstances do not alter priorities.
The assessment also uses the impact scale devised in the DRAMBORA[i] methodology.
[i] Digital Repository Audit Method Based on Risk Assessment, http://www.repositoryaudit.eu/
Summary
The first part of the analysis was to create an inventory of the digital assets. Each collection area was visited, its staff were interviewed, and a partial audit of its digital material was conducted. This provided an indicative sample of the current state of play within the Library. Continued annual updating of this list is likely to form part of the long-term maintenance of the analysis.
Components - Risk assessment 2007
The broad results can be split into technical and policy findings. The headlines are:
• 12 of the 13 case studies returned results consistent with the highest category of risk identified (media degradation)
• Secondary risks associated with software and hardware are lower, but until media degradation is addressed, all data remains at the same high level of risk
• Failure rates for disks within the BL collections have reached a high level (up to 3%)
• There is no central store or service for this digital content
• The proposed timeframes for ingest of this material mean that an interim solution must now be considered to safeguard this material (and prepare it for ingest)
• There is a lack of awareness of the fragility of these collection items across the BL
• There is a need for training in both handling and data stewardship skills across the collection areas
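To show why a failure rate of "up to 3%" matters at collection scale, the sketch below treats 3% as an independent probability that any one disk is unreadable. That interpretation, the independence assumption and the collection sizes in the example are all hypothetical; the assessment does not say how the rate was measured.

```python
# Hypothetical sketch: effect of a per-disk failure rate across a collection.
# Assumption: each disk fails independently with probability p_disk
# (here the quoted "up to 3%" is treated as that probability).

def p_any_failure(p_disk: float, n_disks: int) -> float:
    """Probability that at least one of n_disks independent disks has failed."""
    return 1 - (1 - p_disk) ** n_disks


def expected_failures(p_disk: float, n_disks: int) -> float:
    """Expected number of failed disks among n_disks."""
    return p_disk * n_disks


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):  # illustrative collection sizes
        print(n, round(p_any_failure(0.03, n), 3), round(expected_failures(0.03, n), 1))
```

Under these assumptions, a collection of a hundred disks is almost certain to contain at least one failure, which is consistent with the point that media degradation puts all data on hand-held media at high risk.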
Components - Risk assessment 2007
Risk ranking

Rank  Risk                                           Access type jeopardized
8     Media degradation                              Bit-stream
7     Media obsolescence                             Bit-stream
6     File format obsolescence                       File/Semantic
5     Hardware obsolescence                          File/Semantic
4     Operating system + file system obsolescence    File/Semantic
3     Software obsolescence                          File/Semantic
2     Poor policy (improper cataloguing, metadata)   Semantic
1     Poor policy (other)                            Semantic/File/Bit-stream
Risk assessment - Final prioritisation

Priority for action   Collection
1                     SDM, Endangered Archives, Newspapers, Modern British/Legal Deposit, Maps, e-manuscripts, STM
2                     European/APAC, Photography, Sound ASR project, IDP, Music, Web Archiving
3                     VDEP, Sound Archive (in DOM)
Components - Risk assessment 2007
BL preservation value system
• Our obligation to preserve the material
• Estimates of the cost/effort to mitigate the risks
• Estimates of the resource available to the Digital Preservation Team
• Estimates of the cultural significance and value of the collection
• The commercial significance and value of the collection
• The need for further analysis of the collection to inform future preservation activities
• Reader and researcher needs
• Interest and demand
Key findings - Risk Assessment 2007
• DPT needs to create and implement a policy that deals with all digital content consistently
  – This reduces the variation seen in how digital material is cared for
• The BL needs to move from at-risk physical media to online, hard-disk-based managed storage
  – This addresses media deterioration, physical damage, environmental damage and media obsolescence, and is believed to be the best long-term storage option available
  – This also makes the digital collection easier to manage
• Where migration to hard disk is not immediately possible, move the media to climate-controlled (etc.) storage to ensure that they last as long as possible (and back them up)
  – This reduces the problems due to media deterioration, physical damage and environmental damage
• Failure rates for hand-held media within the BL collections have reached unacceptably high levels (up to 3%)
Activities to mitigate risks 2007/08 - Risk Assessment 2007
Monitor and review
• DPT will use a continuous improvement approach, steadily reducing the level of risk
• An annual update to the risk assessment, to continuously improve the condition of the collection-based digital objects
• Annual identification of resulting actions to mitigate risks
• Management of the digital preservation prioritisation table
• Key performance indicators to be drawn from the risk factors within the prioritisation table and monitored by the digital preservation steering group (ideally, all risk factors should be continuously reducing)
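The "continuous reduction" goal for the KPIs above can be checked mechanically once each risk factor has one score per annual update. This is a hypothetical sketch: the per-year scores, the factor names and the reading of "continuous reduction" as "never increasing from one year to the next" are all assumptions.

```python
# Hypothetical sketch of a KPI check for the steering group.
# Assumptions: each risk factor has one numeric score per annual update,
# and "continuous reduction" means a score never rises year on year.

def is_reducing(history: list[float]) -> bool:
    """True if the score never increases from one annual update to the next."""
    return all(later <= earlier for earlier, later in zip(history, history[1:]))


def factors_needing_attention(kpis: dict[str, list[float]]) -> list[str]:
    """Names of risk factors whose score rose in at least one year, sorted."""
    return sorted(name for name, history in kpis.items() if not is_reducing(history))


if __name__ == "__main__":
    kpis = {"factor_A": [20, 16, 12], "factor_B": [9, 10, 8]}  # placeholder scores
    print(factors_needing_attention(kpis))
```

A stricter reading (every year must be strictly lower) would simply replace `<=` with `<`.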
Resource plan
• DPT will take responsibility for this effort by writing a resource plan to establish the next stage of activity, covering people, equipment, storage and policy issues
• Establishment of the British Library centre for digital preservation, based on this risk work; this work is already underway
The cost of digitisation and preservation:
The LIFE Project
Cost appraisal overview
► What is the LIFE Project?
► LIFE1 and LIFE2
► LIFE Models
► Burney Case Study
► Benefits
► Further Information
Lifecycle Information for E-literature
Project phases:
► LIFE1 (12 months)
► LIFE2 (18 months)
LIFE starts to answer the question:
What is the long-term cost of preserving digital material?
Why use lifecycle costing?
• Enables evaluation of all the financial commitments for an item in a collection
• Important for digital collections, where many costs are largely unknown
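As a minimal illustration of lifecycle costing for a single item, the sketch below adds up one-off and recurring costs for each lifecycle stage over a chosen number of years. The stage names follow the LIFE Model v1.1 stages; splitting each stage into a one-off and an annual cost, and every figure in the example, are assumptions for illustration only.

```python
# Hypothetical sketch of lifecycle costing for one collection item.
# Assumption: each lifecycle stage has a one-off cost plus an annual
# recurring cost; all figures in the example are placeholders.
from dataclasses import dataclass


@dataclass
class StageCost:
    stage: str
    one_off: float = 0.0  # incurred once, e.g. at acquisition or ingest
    annual: float = 0.0   # repeated every year, e.g. storage or access


def lifecycle_cost(stages: list[StageCost], years: int) -> float:
    """Total cost over `years` years: all one-off costs plus recurring costs."""
    return sum(s.one_off + s.annual * years for s in stages)


if __name__ == "__main__":
    item = [
        StageCost("Acquisition", one_off=10.0),
        StageCost("Ingest", one_off=5.0),
        StageCost("Metadata Creation", one_off=8.0),
        StageCost("Bit-stream Preservation", annual=2.0),
        StageCost("Content Preservation", annual=1.5),
        StageCost("Access", annual=1.0),
    ]
    print(lifecycle_cost(item, years=10))
```

Even in this toy example the recurring costs overtake the one-off costs within a decade, which is the kind of long-term commitment lifecycle costing is meant to expose.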
Lifecycle management - aims
• Better understanding of the digital lifecycle
• Plan and prepare for digital preservation activities
• Evaluate and improve efforts
• Compare analogue and digital
LIFE1 project
1. Literature Review
2. Economic Lifecycle Model
3. Generic Preservation Model
4. Case Studies
5. International Conference
LIFE1 Case Studies
e-Journals
Web Archiving
Voluntary Deposit
LIFE2
Aim of LIFE2
To evaluate, refine and further develop the techniques developed in phase one of LIFE
LIFE2 deliverables
► Economic Evaluation of LIFE1
► Revision of the LIFE Model
  – Version 1.1 (October 2007)
  – Version 2 (Summer 2008)
► Updated Preservation Model (Summer 2008)
► Final report
► End of project conference
The LIFE Model v1.1

Lifecycle stage            Lifecycle elements
Creation or Purchase       ....
Acquisition                Selection; Submission Agreement; IPR & Licensing; Ordering & Invoicing; Obtaining; Check-in
Ingest                     Quality Assurance; Deposit; Holdings Update; Reference Linking
Metadata Creation          Re-use Existing Metadata; Metadata Creation; Metadata Extraction
Bit-stream Preservation    Repository Admin; Storage Provision; Refreshment; Backup; Inspection
Content Preservation       Preservation Watch; Preservation Planning; Preservation Action; Re-ingest
Access                     Access Provision; Access Control; User Support
LIFE Model v1.1: Non-lifecycle Elements

Non-lifecycle stage              Non-lifecycle elements
Management and Administration    Management; Administration
Systems / Infrastructure         Repository Software
Economic Adjustments             Inflation; Discounting
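The "Economic Adjustments" elements (inflation and discounting) can be illustrated with a standard present-value calculation. This is a sketch under assumptions: costs are estimated in today's prices, inflated at a constant annual rate, then discounted at a constant rate. The rates and the flat cost series in the example are placeholders, and the LIFE Model may apply these adjustments differently.

```python
# Hypothetical sketch of inflation and discounting applied to lifecycle costs.
# Assumptions: annual_costs[t] is in year-0 prices, inflation and discount
# rates are constant, and year 0 is neither inflated nor discounted.

def present_value(annual_costs: list[float], inflation: float, discount: float) -> float:
    """Present value of a series of yearly costs given in year-0 prices.

    Each cost is inflated by (1 + inflation) ** t to nominal terms,
    then discounted back by (1 + discount) ** t.
    """
    return sum(
        cost * (1 + inflation) ** t / (1 + discount) ** t
        for t, cost in enumerate(annual_costs)
    )


if __name__ == "__main__":
    storage = [100.0] * 10  # placeholder: a flat 100 per year for 10 years
    print(round(present_value(storage, inflation=0.025, discount=0.035), 2))
```

When the discount rate exceeds inflation, later years weigh less in today's money; if the two rates are equal, the adjustments cancel and the present value equals the simple sum.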
Generic LIFE Preservation Model
Preservation actions:
1. Preservation tool cost
2. Preservation metadata
3. Performing the preservation action
4. Quality assurance
The GPM predicted large cost and much activity; the challenge is reducing both.
Generic LIFE Preservation Model
Preservation cost of n objects of a particular format for the period 0 to t, e.g. 200,000 objects in the GIF format for a period of 10 years:

$$\text{Preservation} = \text{Tech Watch} + \left(\text{Frequency of action} \times \text{Preservation action}\right)$$

$$\text{Preservation action} = \text{Cost of preservation tool} + \text{Updating preservation metadata} + \text{Performing preservation action} + \text{Q/A}$$

where:
• Tech Watch: monitoring formats and software for obsolescence, and preservation planning
• Frequency of action: the number of preservation actions within the time period calculated
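The GPM formula above translates directly into code. In this hypothetical sketch, the tool cost is charged once per action while metadata, performing the action and QA are charged per object; that split, and all unit costs in the example, are assumptions rather than LIFE figures. The example reuses the 200,000 GIF objects over 10 years scenario.

```python
# Hypothetical sketch of the Generic LIFE Preservation Model (GPM):
#   Preservation = Tech Watch + Frequency of action x Preservation action
# Assumption: tool cost is per action; metadata, performing the action and
# QA are per object. All unit costs below are placeholders.
from dataclasses import dataclass


@dataclass
class PreservationAction:
    tool_cost: float            # cost of the preservation tool, per action
    metadata_per_object: float  # updating preservation metadata
    perform_per_object: float   # performing the preservation action
    qa_per_object: float        # quality assurance

    def cost(self, n_objects: int) -> float:
        """Cost of one preservation action applied to n_objects objects."""
        per_object = self.metadata_per_object + self.perform_per_object + self.qa_per_object
        return self.tool_cost + per_object * n_objects


def gpm_cost(tech_watch: float, frequency: int, action: PreservationAction, n_objects: int) -> float:
    """Preservation cost for n_objects of one format over the period 0 to t.

    tech_watch: cost of monitoring formats/software and preservation planning.
    frequency:  number of preservation actions expected within the period.
    """
    return tech_watch + frequency * action.cost(n_objects)


if __name__ == "__main__":
    # Scenario from the slide: 200,000 GIF objects over 10 years (costs are invented).
    gif_action = PreservationAction(tool_cost=5000.0, metadata_per_object=0.01,
                                    perform_per_object=0.02, qa_per_object=0.01)
    print(gpm_cost(tech_watch=8000.0, frequency=1, action=gif_action, n_objects=200_000))
```

Because the frequency multiplies the whole action cost, reducing how often migrations are needed (or the per-object work in each one) is where the model suggests the largest savings lie.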
Complexity of file formats
In the same model, the cost of a preservation action depends on the complexity of the file format, which reflects:
• Size
• Complexity
• Whether the format is proprietary, open or standardised

Category      Complexity   Examples
Simple        0.1          ASCII, Unicode
Bitmap        0.2          JPEG, GIF
Mark-up       0.3          XML, HTML
Vector        0.4          EMF, Draw
Multimedia    0.6          MPEG3, WAV
Document      0.8          Word, PDF
Complex       1            Oracle database dump
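The complexity table can be encoded as a lookup. The factors and example formats below are copied from the table; using the factor as a plain multiplier on a base preservation action cost is an assumption made for illustration, and `scaled_action_cost` is a hypothetical helper.

```python
# Hypothetical sketch: format complexity factors as a lookup table.
# The factors come from the table above; applying them as a simple
# multiplier on a base action cost is an illustrative assumption.

COMPLEXITY = {
    "Simple": (0.1, ["ASCII", "Unicode"]),
    "Bitmap": (0.2, ["JPEG", "GIF"]),
    "Mark-up": (0.3, ["XML", "HTML"]),
    "Vector": (0.4, ["EMF", "Draw"]),
    "Multimedia": (0.6, ["MPEG3", "WAV"]),
    "Document": (0.8, ["Word", "PDF"]),
    "Complex": (1.0, ["Oracle database dump"]),
}

# Reverse index: example format name -> complexity factor.
FACTOR_BY_FORMAT = {fmt: factor for factor, fmts in COMPLEXITY.values() for fmt in fmts}


def scaled_action_cost(base_cost: float, fmt: str) -> float:
    """Scale a base preservation action cost by the format's complexity factor."""
    try:
        return base_cost * FACTOR_BY_FORMAT[fmt]
    except KeyError:
        raise ValueError(f"no complexity factor listed for format {fmt!r}") from None


if __name__ == "__main__":
    for fmt in ("GIF", "PDF", "Oracle database dump"):
        print(fmt, scaled_action_cost(1000.0, fmt))
```

Raising an error for unlisted formats, rather than defaulting to some factor, keeps rare or proprietary formats visible as needing their own assessment.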
LIFE2 Case Studies
Institutional Repositories
Primary Data
Digitised Newspapers
The Burney Collection
• Purchased by the British Museum in 1818 for £13,500 (now held by the British Library)
• 1,100 volumes of the earliest known newspapers
• 1,000,000 pages from the 17th, 18th and 19th centuries
• Re-scanning or re-microfilming is not possible
• Microfilmed in the 1970s
• Digitisation started in 1995-96 and ran until 2004
Questions that arise from Burney
• Comparing digital and analogue lifecycles
• What is the lifecycle cost to an institution of producing digitised surrogates?
• What are the key preservation issues common across digitisation projects of differing scales?
Benefits of LIFE
► Assess the financial commitment for acquiring or creating new digital materials
► More effective planning for preservation activities
► Comparison of digital lifecycles across collections
► Evaluation and optimisation of existing digital lifecycles
► Prediction of the future cost of digital preservation
LIFE Website & Blog
Website
www.life.ac.uk
LIFE Blog
www.life.ac.uk/blog
Thanks and Acknowledgements:
• Thanks for your attention.
• Risk Assessment 2007 (Peter Bright and Paul Wheatley)
• LIFE Team (Paul Ayris, Helen Shenton, Paul Wheatley and Richard Davies)