Exercising the Disaster-Recovery Plan

Exercising Disaster Recovery
The plan is no better than the exercise program. Miami University Information Technology Services has committed to exercising and testing its disaster-recovery plan at least twice a year. Techniques for developing and evaluating tabletop and drill exercises will be presented.
Ohio Higher Education Computing Council 2006
Who is JdK?
I am an IT Guy!
Decades of experience in systems programming.
Information Technology Infrastructure Library (ITIL) Certificate of Competency
Certified Business Continuity Professional – Disaster Recovery Institute International.
Systems Integration
Now pursuing PMP…
With good relations with the business side of the university.
Exercising Disaster Recovery
OHECC
4/20/2006
Overview
What we want from Exercising
What we did
What we got
What does Miami want from a Disaster Recovery Exercise?
Preparation
Training
Relationship building
Publicity
Evaluation
Improvement
(maybe have a little bit of fun)
What does Miami want from a Disaster Recovery Exercise?
For a Financial Services exercise:
  Pay Payroll
  Pay Vendors
  Manage Cash
  Maintain information
What We Did
Types of Exercises
Project methodology
An Example
Types of Disaster Recovery Exercises
Walk Through
Tabletop
Drill (Operational)
Walk Through
Format: 1 hour meeting with a few staff; walk through a specific DR Procedure
Participants: facilitator and trainees
Purpose: train staff to use DR Procedures; evaluate procedures
Preparation:
  Distribute procedure before meeting
  Facilitator should have & understand DR procedure
Tabletop
Format: 4 hour meeting: exercise & debriefing; talk through a specific disaster scenario
Participants: Players, Evaluators, Observers, Controllers, from multiple departments
Purpose: Preparation, Training, … Evaluation, Improvement
Preparation:
  Objectives
  Scenario
  Evaluation Criteria
  People
Drill
Format: All day: exercise & debriefing; work through a specific disaster scenario
Participants: Players, Evaluators, Observers, Controllers
Purpose: Preparation, Training, … Evaluation, Improvement
Preparation:
  Objectives
  Scenario
  Evaluation Criteria
  People
Project methodology
Seven Step Process
1. Concept
2. Initiation
3. Requirements
4. Development
5. Validation
6. Deployment
7. Close
An Example
Drill
  MU Disaster Tolerance Architecture
  Exercise Philosophy
  Scenario
  Anticipated Schedule of Events
  Exercise Documentation
MU Disaster Tolerance Architecture
Exercise Philosophy

No harm to production environments
Partnership between IT & Client
  Client chaired Evaluation Team
80 / 20 Rule
  80% of the results from 20% of everything that could be tested.
Start with 1 pound weights
Sample Scenario
Today is Friday, December 14th, 2005. The skies are overcast and it is snowing lightly. The current temperature is 12°F. At 8:30 a.m., the lights in Hoyt Hall flickered and went out. Within a few seconds, they came back on and went out again. All lights, workstations (with the exception of a few laptops), and other electrical devices are without power. The Machine Room’s emergency lighting and indicator lights are lit indicating servers are still powered up. The Physical Facilities Department Operations Center was notified by telephone of an apparent failure of the Hoyt emergency generator. Within minutes the fire alarm is activated.

Bright strobe lights and the high-pitched shrill of the fire alarm filled the building. Occupants grabbed jackets, purses and laptops and began evacuating the building. Before leaving, someone called 911 to report the fire alarm. Police Dispatch received the call from Hoyt Hall at 8:37 a.m.

By 8:42 a.m. occupants of Hoyt Hall have left the building. Police and PFD staffs arrive on the scene by 8:45. A metal rod is found sticking out of the generator at 8:55. Domestic terrorism is highly suspected and the Miami University Emergency Operations Center is activated.

This is a critical day for Payroll Services. Student payroll is scheduled to be paid. In addition, Accounts Payable needs to process their regular check runs to pay vendors and refund students. Treasury Services needs to process the daily cash and investment transactions.

PFD informs the Information Technology Services’ Computing and Network Operations Center (CNOC) staff that the Hoyt machine room UPS has approximately 90 minutes of capacity. After 90 minutes the machine room will be without electrical power.
Anticipated Schedule of Events
9:30
  Controllers review Player’s Handbook with Players & other participants.
  Assistant Drill Controller reads scenario to IT players.
  Failover based services continue to be available.
Approx 9:40
  Deputy CIO appoints a Disaster Recovery Coordinator (DRC).
Thereafter:
  DRC pursues recovery of services on “failover” equipment.
Approx 10:30
  Primary site is completely powered down. No electricity & no one allowed in.
  Lead Drill Controller informs Finance players that they no longer have IT services.
Approx 12:00
  Recovered services made available to Finance for test transactions.
Thereafter:
  Finance staff pursues sample transactions: Accounts Payable, Payroll, Treasury Services
3:30
  Debriefing
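The anticipated times above can be turned into a rough timing check. The sketch below is illustrative only: the short labels are paraphrases, the helper names are hypothetical, and it assumes that "3:30" means 3:30 p.m. (15:30) and that the "Approx" times are exact.

```python
from datetime import datetime

# Anticipated times from the schedule; labels are shortened paraphrases.
# Assumption: the 3:30 debriefing is in the afternoon (15:30).
SCHEDULE = [
    ("09:30", "Handbook review and scenario read to IT players"),
    ("09:40", "Disaster Recovery Coordinator appointed"),
    ("10:30", "Primary site powered down"),
    ("12:00", "Recovered services available to Finance"),
    ("15:30", "Debriefing"),
]


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


def elapsed_by_phase(schedule):
    """Pair each event with the minutes planned until the next event."""
    return [
        (schedule[i][1], minutes_between(schedule[i][0], schedule[i + 1][0]))
        for i in range(len(schedule) - 1)
    ]


if __name__ == "__main__":
    for label, minutes in elapsed_by_phase(SCHEDULE):
        print(f"{label}: {minutes} min until next event")
```

Under these assumptions the plan allows about 90 minutes between the primary site powering down and recovered services reaching Finance, and about 210 minutes for Finance's sample transactions before the debriefing.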
Exercise Documentation
Exercise Plan
Evaluation Plan
Participants’ Handbook
Memo to Participants
Exercise Plan
Simulated Drill Infrastructure
Evaluation Plan

Exercise Objectives
1. Effective communication
2. Identify appropriate measures to restore financial services
3. Resolve technical engineering questions
4. Demonstrate the level of knowledge
5. Demonstrate the adequacy of current procedures, practices and knowledge
Participants’ Handbook
Assumptions
Finance & Business Services and Information Technology Services have established emergency plans and procedures. Those documents include mitigation, response and recovery elements. They may be brought to and used at the exercise.
Players will respond in accordance with the existing plans, procedures and policies. In the absence of applicable plans, procedures or policies, players will be expected to apply individual and/or team initiative to satisfy response requirements.
+ others…
Artificialities
The university’s banks are not participating in the exercise; procedures will prepare files for transmission but they will not be transferred.
The disaster recovery environment is a copy of the production environment, reflecting the state of the production environment approximately two days before the exercise.
Outage notices will not be emailed nor posted to web sites.
Voice communications are tagged with “This is a disaster recovery exercise communication.”
The secondary computing center currently hosts neither the quick recovery database server nor the Citrix server. These machines will be moved to the secondary site when it can support them.
The Controller and Assistant Controller may add other artificialities during the drill; these should be documented for the After Action Report.
+ others…
Exercise Rules
Players may talk to other players during the exercise. Players should work with other players to understand procedures and strategize solutions.
In the event a player needs to talk with a non-player, the player must first consult with the controller. The controller will log the request and will approve it, disallow it, or provide the requested information.
Evaluators, observers and other non-players should not offer advice or comments to the players unless directed to do so by the controller, who is responsible for logging the communication.
Players should talk to the controller when they need to talk to someone for whom there is no player. For instance, if bank personnel need to be called, the player would talk to their controller, since bank personnel are not participating in the exercise.
Follow department/university procedures when they are available. One of the goals of the exercises is to evaluate existing procedures. Another is to determine if additional procedures are needed.
Exercise voice communications and exercise emails sent out during the event should be prefixed with
THIS IS A DISASTER RECOVERY EXERCISE COMMUNICATION
Exercise voice communications and exercise emails sent out during the event should be suffixed with
THIS WAS A DISASTER RECOVERY EXERCISE COMMUNICATION, THIS IS ONLY AN EXERCISE
Production services should not be affected by exercise activities. Since banks are not participating, care must be taken to make sure files are not transmitted to banks.
The exercise may be rescheduled in the event of a critical incident which requires the attention of exercise participants.
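The prefix and suffix rule for exercise emails lends itself to a small helper. This is a minimal sketch, not something the exercise used: the function names are hypothetical, and only the two marker strings come from the rules above.

```python
# Marker strings taken verbatim from the exercise rules.
PREFIX = "THIS IS A DISASTER RECOVERY EXERCISE COMMUNICATION"
SUFFIX = (
    "THIS WAS A DISASTER RECOVERY EXERCISE COMMUNICATION, "
    "THIS IS ONLY AN EXERCISE"
)


def tag_exercise_message(body: str) -> str:
    """Wrap a message body with the exercise prefix and suffix lines."""
    return f"{PREFIX}\n{body.strip()}\n{SUFFIX}"


def is_tagged(message: str) -> bool:
    """Check that a message starts with the prefix and ends with the suffix."""
    text = message.strip()
    return text.startswith(PREFIX) and text.endswith(SUFFIX)


if __name__ == "__main__":
    email = tag_exercise_message("Payroll services are unavailable.")
    print(email)
    print(is_tagged(email))
```

A check like `is_tagged` could be run over drafted exercise emails before sending, so that an untagged message is caught before anyone mistakes it for a real outage notice.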
What we got – After Action Report
What we got
Preparation
Training
Relationship building
Publicity
Evaluation
Improvement
(maybe had a little bit of fun)
What we got
  Pay Payroll
  Pay Vendors
  Manage Cash
  Maintain information
What we got
Project to improve 2nd Site
Project to improve Remote Site
Improved procedures
Crisis Leadership Training
Positive Auditor Review
Lessons Learned
Start with one pound weights
Expect creativity!
Expect surprises as well.
Project Manager wrote all documentation

Comments / Questions