Protecting Records in the Face of Chaos, Calamity, and Cataclysm

David O. Stephens, CRM, FAI, CMC

At the Core

This article:
➢ Discusses the impact of 9/11 on disaster planning
➢ Discusses centralized vs. decentralized computing
➢ Provides best practices for long-term data retention
The impact of the events of September 11, 2001 (9/11), will be felt for decades in many areas of national and international affairs. One business impact has been on efforts to prevent and mitigate the loss of records and information. In terms of records and information, 9/11 was, by far, the largest disaster in American history. What conclusions can be drawn from the tragedy, and what lessons can be learned by those who manage records and information for organizations in government or business?
First and foremost, no amount of planning
could have anticipated the incredible level of
destruction; no contingency planner could
have imagined the devastating events. The
loss of life was astounding. In fact, the loss of
people and their technical talents set the
attacks apart from any disaster that preceded them. Some companies lost most,
if not all, of their disaster recovery team,
as well as large numbers of their information technology (IT) staffs. This
made the recovery efforts enormously
more difficult than they otherwise
would have been.
The World Trade Center (WTC)
Towers housed about 1,200 businesses
that employed some 40,000 people. The
largest tenants included some of the
world’s prominent financial institutions. Among them:
• Morgan Stanley Dean Witter
• Bank of America
• Deutsche Bank
• Oppenheimer Funds
• Credit Suisse First Boston
Many other large financial institutions maintained offices in the WTC’s immediate vicinity and were extensively damaged if not destroyed. According to “Terrorist Attacks Have Far-Reaching Effects on Businesses,” a recent Disaster Recovery Journal article, more than 15 million square feet of office space were destroyed or damaged – an amount comparable to the entire downtowns of Atlanta or Miami. The Emergency Operations Center for the city government, located at 7 World Trade Center, caught fire and eventually collapsed, leaving New York City without a centralized facility to manage its emergency operations.

Disaster Recovery Planning

The terrorist attacks changed the face of disaster preparedness. In their wake, traditional disaster recovery practices have been shown to be inadequate. Businesses must now prepare for the kinds of threats and a scale of potential damage never before imagined.

According to a survey conducted at the Disaster Recovery Conference, prior to September 11, only a small percentage of disaster recovery plans addressed terrorist acts, and still fewer addressed acts of war. In light of the attacks, 97 percent of respondents indicated that their plans required at least some revision.
A Disaster Recovery Journal article, “What Disaster Recovery Experts Were Thinking Just After the Attacks,” states that the respondents were clearly underprepared for threats of terrorism, war, biological hazards, and explosions.

Even organizations that do not think of themselves as prime terrorist targets do not have the luxury of considering themselves exempt from the need for disaster planning. Terrorism and threats of war must be taken seriously in all disaster planning scenarios.

The 1993 terrorist bombing of the WTC taught tenants to be ready. Charles Phillips of Morgan Stanley Dean Witter told InformationWeek that the WTC “was probably one of the best prepared office facilities from a systems and data-recovery perspective.” In an interview with Storage Management Solutions, Merrill Lynch’s Director of Global Contingency Planning, Paul Honey, stated that the Y2K experience greatly improved his company’s ability to respond to the 9/11 disaster. The huge expenditures of Y2K yielded small dividends at the dawn of the new millennium, but were priceless on September 11. Without such prior
efforts, the information losses of 9/11
would have been much worse.
Smaller Organizations
Most Vulnerable
The ability to respond to a disaster
successfully is not an optional management initiative but an essential component of the cost of doing business. For
the most part, the critical systems of
major financial institutions and other
large businesses seem to have recovered
quickly and according to plan.
“In the most high-risk, high-exposure environments, we had great success,” Joseph Walton of EMC Corp. told
InformationWeek. In general, however,
small and mid-size businesses were
more vulnerable because few had the
budget for real-time data backup or offsite recovery facilities. Walton also said
that some smaller companies “were just
put out of business.” Their ability to get
back to business depended on whether
they had recently saved essential data to
disk, tape, or other media for storage
offsite, and many had not done so.
New Government Initiatives
The 9/11 disaster revealed a major
problem in the ability to fight terrorism
– the inability to share intelligence data
among various agencies of government.
The detection and prevention of acts of
terrorism pose truly unprecedented
information challenges. Success in the
war on terrorism will depend on extensive sharing of computer data and
records among intelligence and law
enforcement agencies at all levels of the
U.S. government. The FBI, the CIA, the
National Security Agency, the
Immigration and Naturalization
Service, and many other government
agencies are now trying to share intelligence and investigative data on an
unparalleled scale. Part of the challenge
is that dozens or even hundreds of databases are involved, running on a wide
variety of computer and software platforms. Only a few of the systems are
currently interconnected. The resolution of this problem is a major challenge
for the U.S. government.
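To see the schema problem in miniature, consider the hedged sketch below. The agency feeds, field names, and values here are hypothetical inventions for illustration only; the point is that records from differently structured systems must be normalized into a common format before they can be searched together.

```python
from datetime import date

# Hypothetical feeds: two agencies hold the same kind of record
# under different field names and date conventions.
fbi_record = {"subj_name": "DOE, JOHN", "dob": "1970-04-01", "case_no": "F-123"}
ins_record = {"fullName": "John Doe", "birthDate": "04/01/1970", "fileId": "I-998"}

def normalize_fbi(rec: dict) -> dict:
    """Map a hypothetical FBI-style record onto a shared schema."""
    last, first = [s.strip().title() for s in rec["subj_name"].split(",")]
    return {
        "name": f"{first} {last}",
        "date_of_birth": date.fromisoformat(rec["dob"]),
        "source_id": rec["case_no"],
        "source": "FBI",
    }

def normalize_ins(rec: dict) -> dict:
    """Map a hypothetical INS-style record onto the same shared schema."""
    month, day, year = (int(p) for p in rec["birthDate"].split("/"))
    return {
        "name": rec["fullName"].title(),
        "date_of_birth": date(year, month, day),
        "source_id": rec["fileId"],
        "source": "INS",
    }

# Once normalized, records from both systems can be matched directly.
shared = [normalize_fbi(fbi_record), normalize_ins(ins_record)]
print({r["name"] for r in shared})  # {'John Doe'} - one subject, two sources
```

Multiply that mapping effort across hundreds of databases and platforms, and the scale of the interconnection challenge becomes apparent.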
The notion that the value of information is directly proportionate to its
accessibility has been an article of faith
in information management circles for
decades. The 9/11 nightmare underscored this cardinal principle. The
seamless transmission and sharing of
information, both inside and outside
the firewall, is needed now more than
ever. Organizations do not have the luxury of designing information infrastructures solely for their own use. The
value of information can be optimized
only through greater accessibility.
President Bush recently signed the
Homeland Security Act into law,
authorizing the formation of the
Department of Homeland Security,
which will have the authority to develop
a plan to inventory and protect the
nation’s critical infrastructures –
telecommunications, financial and
banking, energy, and transportation.
Safeguarding the IT infrastructure in
both the public and private sectors is
expected to be a key part of the new
department’s work. Moreover, the U.S.
Department of Justice has proposed
new legislation that would give it the
power to prosecute computer crimes as
acts of terrorism.
A Success Story: Recovery at Merrill Lynch

When Wall Street went back to work on the Monday following the attacks, most of the world saw what Wall Street and Washington wanted it to see: that financial markets had reopened, trading systems were up and running, and investors were buying and selling millions of shares of stock, seemingly without a glitch. “America’s economic system is intact,” Richard Grasso, chairman of the New York Stock Exchange (NYSE), was quoted as saying in a Fortune article, “Telco on the Frontline.” However, things weren’t that simple.

No story illustrates an impressive response to disaster as well as the recovery at Merrill Lynch, one of the world’s preeminent securities and investment companies. Merrill Lynch’s Honey has said in numerous media interviews that
the company activated its disaster recovery plan within minutes of the attacks and immediately began transferring business-critical functions to its command center in New Jersey. Because the 60,000-square-foot facility had been pre-designed as a corporate-wide disaster response facility, all personnel knew immediately where to dial in to transfer information. This allowed transactions throughout the company’s global offices to continue basically without interruption.

As was widely reported, trading operations were transferred to London, Tokyo, and Hong Kong. Moreover, Merrill Lynch devised a plan to use a telemarketing service and the company’s public Internet site to communicate with displaced workers. In 1999, the
company had installed a global trading
platform that meant all its business was
on the same system – and that the system would operate even if part of it
went down. Merrill Lynch deployed
5,000 technical employees, many working 36-hour shifts, to re-establish communications to the stock exchange.
This foresight did not mean that
everything went perfectly, however.
When the stock market opened on
Monday morning, Merrill Lynch’s
traders lacked the electronic data feeds normally used to send orders to the exchange floor. They had to make do with the telephone. Instead of transmitting buy and sell orders electronically, clerks rushed to large whiteboards where they scribbled ticker symbols with felt-tipped pens. Research analysts, who had escaped the WTC with none of their old research reports, worked from home. They began assembling and transmitting research reports directly in the e-mail messaging environment. Still, Merrill Lynch was never unable to execute buy and sell orders for securities – a very impressive achievement.
How “Remote” Must Backup Be?

As mentioned, Merrill Lynch transferred certain business-critical functions to its command center in New Jersey. Although its remote processing capability was successful, 9/11 has caused serious consideration of just how remote an organization’s backup facilities should be. Sixteen months later, according to “U.S. May Require Backup Wall Street,” a Wall Street Journal article, federal regulators are considering a plan that would require the nation’s largest banks and securities firms, whose operations are critical to the integrity of the nation’s financial systems, to establish backup facilities hundreds of miles outside of Wall Street and New York City.
Largely for reasons of convenience, many firms had established their backup facilities within 20 to 30 miles of the city. In some cases, these facilities used the same power and telecommunications grids as were used for the primary sites, which meant that full operations could not be restored for days or even weeks.

To cite one example, the contingency floor of the NYSE is located in Brooklyn, where the offices of the Securities Industry Association, the technical backbone of the NYSE, are also located. NYSE Chairman Grasso has stated that these facilities are too close to Wall Street to act as an effective backup site, and alternate sites are currently being evaluated.

The new regulations, which have been prepared by the U.S. Department of the Treasury, the Federal Reserve Board, and the Securities and Exchange Commission, would require regulated companies to ensure that critical payment and clearance functions could be restored in full on the day of a catastrophic event.

An important lesson is that backup facilities supporting mission-critical business functions should not be located where they may be subject to the same disaster that affects the primary site.
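The distance question itself is easy to check mechanically. The following minimal sketch uses the standard great-circle (haversine) formula; the coordinates and the 300-mile threshold are illustrative assumptions, not figures from the proposed regulations.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8

def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle (haversine) distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))

# Illustrative coordinates: lower Manhattan vs. a hypothetical inland site.
primary = (40.71, -74.01)    # Wall Street area
candidate = (40.44, -79.99)  # Pittsburgh, PA (example only)

REQUIRED_MILES = 300  # assumed planning threshold, not an actual regulation
distance = great_circle_miles(*primary, *candidate)
print(f"{distance:.0f} miles apart; acceptable: {distance >= REQUIRED_MILES}")
```

Straight-line distance is, of course, only a proxy; a candidate site on the same power or telecommunications grid as the primary site fails the real test no matter how far away it sits.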
The Technology Infrastructure

One of the biggest lessons learned from 9/11 was that, although it sustained severe damage, the IT and telecommunications infrastructure supporting the WTC did not collapse. The area around the WTC in lower Manhattan is one of the most telecommunications-intensive sites in the world.
The attacks knocked out telecommunications in a huge swath of lower
Manhattan, including the data lines that
served the NYSE and a custom software
system that helped the “Big Board” communicate with its trading partners.
Under great pressure and horrific circumstances, Verizon, the major
provider of local phone service in the
area of the WTC, was able to pull off a
feat few thought possible: It rebuilt its
network in lower Manhattan during the
course of a single weekend.
“On Thursday there was literally no
hope,” said Lawrence Babbio, head of
Verizon’s disaster recovery team, in an
interview with Computerworld. In the
same article, Paul Lacouture, president
of Verizon’s network services division,
said, “I’ve gone into our buildings after
fires. I’ve restored our networks after
floods and earthquakes. This was a
combination of all those things, times a
factor of three or four.”
To cite just one aspect of the restoration effort: The major cellular carriers
(AT&T Wireless, Verizon Wireless, and
Cingular) all deployed temporary
Cellular on Wheels Systems (COWS) to
replace downed towers and restore
wireless phone communications at both
the WTC and Pentagon sites.
Despite the success in restoring data
and phone communications, 9/11
underscored the vulnerability of the
existing telecom network in the United
States, a vast jumble of copper and fiber
lines, wireless transmitters, and computers that is much more fragmented
than it was a decade ago.
Despite the WTC experience, disaster
recovery plans must assume that phone
service and Internet connections may
not be restored for up to several weeks
following a cataclysmic disaster.
Centralized vs. Decentralized
Computing
After 9/11, only a few companies
required help with mainframe recovery;
the biggest challenges were restoring
mid-range systems and servers, restoring connectivity to computer networks
and desktops, and obtaining access to stored backup information. Whether organizations should implement centralized or decentralized computing strategies is an issue that has been debated for decades. With the rise of client/server computing during the 1990s, following the end of IBM’s dominance in highly centralized mainframe computing, decentralized computing became very popular. During and after 9/11, many organizations were crippled because they could not access the servers where their data was stored. The terrorist attacks have thus underscored the need to deploy digital assets in a manner that is designed to reduce the risk of attacks on any one location where these assets are deployed, according to the InfoWorld article “Enterprise Storage.”

In the wake of the attacks, organizations must consider strategies for dispersing computer data over multiple servers and multiple locations. One such approach to decentralized computing that is receiving increased attention is the use of storage area networks (SANs). SANs are dedicated storage networks in which servers and storage devices are connected by hubs and switches. The network’s software permits the centralized management of data regardless of platforms or media. The data itself can be easily and quickly dispersed to secure, offsite locations. In a SAN data storage environment, if one site is taken out of commission, the whole enterprise is not affected by the event. The SAN architecture simplifies the process of creating mirror data images at remote locations, thus ensuring that remote, up-to-date copies of databases are available in case of disaster at the main location.

The major lesson is that organizations must now take steps to make their businesses less dependent on a single office or data infrastructure.
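Full SAN mirroring is specialized infrastructure, but the underlying dispersal discipline can be sketched at any scale. In the hypothetical example below (the mount points, site names, and two-copy minimum are all assumptions), a backup is copied to several independent locations, and the job refuses to report success unless enough copies land, so no single site ever holds the only copy.

```python
import shutil
from pathlib import Path

# Hypothetical, independently hosted backup destinations. In practice each
# would sit on a separate power and telecommunications grid.
OFFSITE_TARGETS = [
    Path("/mnt/backup_newjersey"),
    Path("/mnt/backup_chicago"),
    Path("/mnt/backup_denver"),
]
MIN_COPIES = 2  # assumed policy: tolerate one unreachable site, no fewer copies

def disperse(source: Path, targets=OFFSITE_TARGETS, min_copies=MIN_COPIES) -> int:
    """Copy one backup file to every reachable target; fail loudly if too few."""
    copies = 0
    for target in targets:
        try:
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target / source.name)
            copies += 1
        except OSError as exc:  # one unreachable or full site is survivable
            print(f"warning: could not copy to {target}: {exc}")
    if copies < min_copies:
        raise RuntimeError(f"only {copies} of {min_copies} required copies made")
    return copies

# Example: disperse(Path("/var/backups/ledger-2003-01-31.db"))
```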
A Safety Net
Despite massive breakdowns on the
telecommunications and computing
front, the Internet never skipped a beat.
As originally planned by the U.S.
Department of Defense, the Internet
was designed to remain invulnerable to
acts of war. If one part were to be disabled, the entire system would not be
put out of commission. This is exactly what occurred on 9/11. The packet-based,
asynchronous Internet enables messages to travel by a variety of routes to
reach recipients, thereby reducing the
risk of overall system failure. When
phone service failed, people turned to
the Internet. If Internet connectivity
was lost and organizations couldn’t get
e-mail in and out, they established temporary access via other Internet service
providers. Finally, it was widely reported that many organizations relied heavily on instant messaging to communicate with employees because it was the
only method by which to do so.
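That kind of improvised fallback can be designed in ahead of time. The sketch below is a minimal illustration of the idea, with placeholder channel functions standing in for real e-mail, ISP, or instant-messaging paths: try each available channel in order and stop at the first one that succeeds.

```python
from typing import Callable, Iterable

def send_via_primary_isp(message: str) -> None:
    raise ConnectionError("primary ISP link down")  # simulate an outage

def send_via_backup_isp(message: str) -> None:
    print(f"sent via backup ISP: {message}")

def send_via_instant_messaging(message: str) -> None:
    print(f"sent via IM: {message}")

def deliver(message: str, channels: Iterable[Callable[[str], None]]) -> str:
    """Try each communication channel in order; return the name of the first that works."""
    for channel in channels:
        try:
            channel(message)
            return channel.__name__
        except ConnectionError as exc:
            print(f"{channel.__name__} failed: {exc}")
    raise RuntimeError("all communication channels failed")

used = deliver("All staff report to the New Jersey site.",
               [send_via_primary_isp, send_via_backup_isp, send_via_instant_messaging])
```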
The major conclusion and lesson to
be learned: The 9/11 tragedy is very likely to spur use of the Internet and
Internet protocol (IP)-based networking as the next generation of disaster
recovery plans evolve. Far from discouraging organizations from relying on the Web for backup, the tragedy is much more likely to do the opposite.
The Internet is, however, extremely
vulnerable to the risks associated with
cyberterrorism. In the InformationWeek
article “Terror Attack Brings Renewed
Emphasis On Security,” Dennis Treece
of Internet Security Systems Inc. states:
“The way I look at vulnerability is to see
how dependent companies are on
Internet connectivity.” Companies
whose entire business is based on the
Internet are completely vulnerable.
The Vulnerability of Paper
The 9/11 tragedy has had a huge
impact on the way in which organizations view paper records. This disaster
was unique in that almost no paper
records survived the terrorist attacks.
Except in cases where people took some
paper documents with them in their
briefcases when they escaped the collapsing buildings, virtually all paper
records were destroyed.
What lessons can be learned about
paper records from what happened on
9/11? The following points are most
significant:
• Organizations that rely heavily on
paper documents are most vulnerable to significant losses. Law firms
and insurance companies are examples of businesses that remain very
paper-intensive and, therefore, most
vulnerable. As noted earlier, smaller
organizations are generally more at
risk than larger ones.
• Because paper records have been
declining in importance relative to
computer-based records for many
years, organizations that have aggressively applied new technologies to
automate their business processes are
much less vulnerable than those that
have not.
• For most large organizations today, to
lose all paper records would be
extremely inconvenient, but it would
not actually put the organization out
of business. To cite just one example,
the headquarters offices of the Port
Authority of New York and New
Jersey were located in the WTC.
Nearly all paper records were
destroyed but the Port Authority
remained in business and says it is
now recovering.
• On the other hand, for an organization to lose all its computer records
would be truly cataclysmic and
would very likely result in the demise
of the organization.
• To cite another example, Marsh &
McLennan, a WTC tenant, had been
engaged in a five-year program to
convert paper documents to scanned
images. The effort paid off. More
than 25 million business documents
that had been converted from paper
to scanned images were backed up
offsite and, therefore, were saved
from destruction. According to
InfoWorld, only one business day’s
worth of records was lost.
• Offsite protection has always been a better and more reliable means of protecting vital data, but 9/11 highlighted the utter futility of onsite protection strategies. For many years, records management specialists have relied on fire-resistive filing cabinets and fireproof vaults as the primary methods for protecting paper records onsite in cases where it is not feasible to protect them by sending either the originals or duplicate copies to a secure offsite location. However, if an organization is vulnerable to a terrorist attack of the severity of those that occurred on 9/11, the only feasible strategy to protect the records is to get them offsite.

Where paper records are concerned, the major lesson to be learned from 9/11 is that organizations should adopt the long-term goal of converting to digital format every paper-based recordkeeping system of mission-critical importance as soon as resources and priorities allow.

Most experts recommend that organizations give themselves five years to get out of paper, at least for their most important, business-critical applications. Generally, paper should be reserved for small quantities of records of low value – personal working papers kept for convenience or reference at or near the workstations of employees. Most, if not all, records of official character and high strategic importance should be converted to digital format as soon as possible. Every disaster planning initiative should incorporate a document digitization strategy to make this happen.
Digital Preservation Best Practices
September 11 resulted in a flood of
retrievals for data residing on backup
media. These attempts to retrieve old data uncovered significant problems and the need for better long-term data retention practices. In many cases, the data on the backup tapes could not be accessed. In some cases, this was due to data errors or defective media. In other cases, when information several years old needed to be reconstructed, conflicting formats sometimes stymied the efforts.
Best practices for long-term data retention include

• selection of storage media specifically intended to support extended-term data retention
• standardization of file formats
• proper management of metadata
• proper maintenance of systems documentation
• proper housing of stored media
• regular inspection and maintenance of stored media
It is important to inspect archived
data regularly to ensure that it can be
processed and that the medium hasn’t
become outdated, given current backup
systems. Organizations should implement formal practices to ensure the
integrity and retrievability of backup
data, no matter how long it may reside
on storage media.
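One such formal practice is routine fixity checking. The sketch below is illustrative rather than a prescribed procedure; the manifest file name and archive layout are assumptions. It records a SHA-256 checksum for each file when it is archived and, on each inspection cycle, recomputes the checksums to catch unreadable or silently corrupted media before the data is needed in a disaster.

```python
import hashlib
import json
from pathlib import Path

MANIFEST = Path("archive_manifest.json")  # hypothetical manifest location

def checksum(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large archives fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def record(archive_dir: Path) -> None:
    """Write a manifest of checksums when files are first archived."""
    manifest = {str(p): checksum(p)
                for p in sorted(archive_dir.rglob("*")) if p.is_file()}
    MANIFEST.write_text(json.dumps(manifest, indent=2))

def verify() -> list[str]:
    """On each inspection cycle, return files that are missing or damaged."""
    manifest = json.loads(MANIFEST.read_text())
    return [name for name, expected in manifest.items()
            if not Path(name).is_file() or checksum(Path(name)) != expected]

# Example cycle: record(Path("/archive")); then, months later, damaged = verify()
```

Checksums catch media decay but not format obsolescence; the periodic inspection should also confirm that current systems can still read and process each stored format.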
Lessons Learned
• Disaster recovery plans should be
summarized in a few pages and
always carried in a briefcase and/or stored on a PDA.
• These plans should clearly delineate
the responsibilities of all employees
and include critical phone numbers, communications inventories,
hardware and software inventories,
master call lists, master vendor lists,
and inventories of offsite storage
facilities.
• Vital records and data should never be stored solely on local PC hard drives. Organizational policies should require that all mission-critical documents and data be saved to network servers so they can be routinely backed up. (A sketch of an automated check for this policy appears after this list.)
• Keep copies of all software and passwords offsite, including client and
server installation disks.
• According to the Disaster Recovery Journal article “Real Life After September 11,” data about new computer systems under development is an element in disaster planning often overlooked by busy IT departments in the rush to bring new applications on stream. Many systems in a stage of development need to be backed up, particularly if they are deemed business-critical.
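As referenced in the bullet on network servers above, here is a minimal sketch of an automated check for that policy. Both paths are hypothetical; the script flags any local document that has no copy, or only an older copy, on the routinely backed-up network share.

```python
from pathlib import Path

LOCAL_DOCS = Path("C:/Users/jdoe/Documents")        # hypothetical local folder
NETWORK_COPY = Path("//fileserver/jdoe/Documents")  # hypothetical backed-up share

def unprotected_files(local: Path = LOCAL_DOCS,
                      remote: Path = NETWORK_COPY) -> list[Path]:
    """List local documents with no equally recent copy on the network server."""
    at_risk = []
    for doc in local.rglob("*"):
        if not doc.is_file():
            continue
        mirror = remote / doc.relative_to(local)
        # A file is unprotected if the server copy is absent or older.
        if not mirror.is_file() or mirror.stat().st_mtime < doc.stat().st_mtime:
            at_risk.append(doc)
    return at_risk

for doc in unprotected_files():
    print(f"not backed up: {doc}")
```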
If American governmental and business organizations apply these lessons,
they will be in a high state of readiness
to meet whatever emergency may arise
in the future. Where the integrity of
their recordkeeping systems is concerned, they will be better and
stronger than ever.
David O. Stephens, CRM, FAI, CMC, is Vice President, Records Management Consulting, at Zasio Enterprises. He may be reached at [email protected].

References

Andolsen, Alan A. “On the Horizon.” The Information Management Journal. March/April 2002.

Ballman, Janette. “Merrill Lynch Resumes Critical Business Functions Within Minutes of Attack.” Disaster Recovery Journal. Fall 2001.

———. “Terrorist Attacks Have Far-Reaching Effects on Businesses.” Disaster Recovery Journal. Fall 2001.

Behar, Richard. “Fear Along the Firewall.” Fortune. 15 October 2002.

Brewin, Bob. “Nation’s Networks See Sharp Volume Spikes After Attacks.” Computerworld. 17 September 2001.

Chandler, Robert C. and J.D. Wallace. “What Disaster Recovery Experts Were Thinking Just After the Attacks.” Disaster Recovery Journal. Winter 2002.

D’Auria, Thomas. “Facilitation, Cooperation Guide New York City to a Quick Recovery.” Disaster Recovery Journal. Winter 2002.

DeJesus, Audrey. “Real Life After September 11.” Disaster Recovery Journal. Spring 2002.

“Disaster Takes Toll on Public Network.” InformationWeek. 17 September 2001.

“Enterprise Storage.” InfoWorld. 26 February 2001.

Foley, John. “Ready for Anything?” InformationWeek. 24 September 2001.

Grygo, Eugene, Ed Scannell, Mathew Woollacott, and Dan Neel. “Industry Toils to Restore Trading.” InfoWorld. 24 September 2001.

“IT Recovery Efforts Forge Ahead.” InfoWorld. 17 September 2001.

Kempster, Linda. “2001: The Year of Vulnerability.” Storage Management Solutions. Vol. 7, No. 2, 2002.

Mehta, Stephanie N. “Telco on the Frontline.” Fortune. 15 October 2001.

Rendleman, John, et al. “Assessing the Impact.” InformationWeek. 17 September 2001.

——— and George V. Hulme. “Security Synergy.” InformationWeek. 22 July 2002.

Rynecki, David. “The Bull Fights Back.” Fortune. 15 October 2001.

Schroeder, Michael and Kate Kelly. “U.S. May Require Backup Wall Street.” Wall Street Journal. 22 October 2002.

“Terror Attack Brings Renewed Emphasis On Security.” InformationWeek. 17 September 2001.

Whiting, Rick and Eric Chabrow. “Safety in Sharing.” InformationWeek. 8 October 2002.