E-Discovery on a Budget
© 2015
Craig Ball
Contents:
About this Collection
A Lawyers’ Introduction to Digital Computers, Servers and Storage
Eight Tips to Quash the Cost of E-Discovery
E-Discovery for Everybody: The EDna Challenge
Ten Things That Trouble Judges About E-Discovery
Preserving Google Content for Dummies
Easing the Pain of E-Discovery with ESI Special
Gold Standard
Ten Bonehead Mistakes in E-Discovery
About the Author
About this Collection
What should e-discovery cost?
That’s easy. It should cost less. Much less. E-discovery should cost proportionately less than the
amounts at issue in business and damages cases; and e-discovery should not cost so much as to
chill the willingness of litigants to bring meritorious cases and defend those without merit. Moreover,
e-discovery shouldn’t cost much more than the ways we used to do discovery before all the “e-stuff”
got out of hand.
Lawyers and judges generally agree with these cost propositions. Who wouldn’t?
Happily, e-discovery can handily meet these goals. It’s the wasteful “extras” that kill you, like:
1. Over-preserving information because digital information is poorly managed;
2. Preserving data by copying data, generating multiple divergent sets over time;
3. Using crude search mechanisms to segregate potentially responsive information;
4. Finding and culling privileged information commingled with other data;
5. Converting functional and complete forms to degraded forms for review and production;
6. Relying upon labor-intensive item-by-item human review to assess relevance; and,
7. Though not an extra, insufficient skill to make defensible choices about volume and cost.
One reason e-discovery is more costly than paper discovery is that we didn’t include the cost
of information management in our tabulation of paper discovery costs. Clients bore the expense of
keeping records organized as a cost of doing business, and information tended to be produced in
forms used in the ordinary course of business, i.e., on paper. This meant less information found its
way into the hands of the biggest contributors to discovery expense: the lawyers.
Another reason discovery on paper cost less is that everyone using paper grasped the
fundamentals of managing information on paper. You could reliably gauge volume by looking at the
size of a pile of papers or by counting boxes or drawers. Information was encoded in accessible
ways; e.g., for Americans, in decimal numbers and English words. Information was also logically
unitized by staples, paper clips, binders and folders. Retention was routinely managed because
excessive retention was expensive and inconvenient.
It was a splendid system. We adored it. We miss it. Now, we must get over it because it’s gone.
Kaput! It’s not a system that can be scaled to an information society. We cannot practically or cost-effectively adapt a paper-centric system to the amount and variety of digital information generated
today. Lawyers still refuse to accept this and squander huge sums in their obstinacy.
Don’t get me wrong. I love lawyers. Most of my friends are lawyers. I’d be proud if my kids become
lawyers. Some of the best and brightest people I’ve ever known are lawyers.
I share my affection for my colleagues so you’ll know where I’m coming from when I confess I’m
chagrined by lawyers’ steadfast refusal to acquire even the most basic competency in electronic
discovery and digital evidence. Electronic evidence is fast becoming the most ubiquitous, probative
and powerful proof extant; yet, the justice system abides a crisis of competence when it comes to
discovery of electronically stored information. It’s a crisis that carries an awesome cost—millions
upon millions of dollars wasted on overbroad preservation and collection, purposeless processing
and—worst of all—profligate document review efforts destined to turn courtrooms into country clubs
and suck the souls from young lawyers.
The fault doesn’t lie with e-discovery or the rules of procedure or even (greedy) plaintiffs or (greedy)
defendants. I have seen the enemy, and it is us.
America's halcyon days of hammer and harness are behind us. We are all knowledge workers now;
yet, even those who drive trucks or empty bedpans are tasked by pixels and tracked by bytes. The
evidence of what we do and say, of when and where and how we go, of what we own and earn and
spend is digital. More than 99% of it will never exist as anything but electronically stored information,
and most takes forms that require special tools or expertise to see and interpret. This irritates and
intimidates old school lawyers. At great cost to unwitting clients, the old school cling to what they
know and disregard the rest. They print documents or convert them to paper-like formats like TIFF.
They unleash armies of reviewers against hordes of irrelevant documents. They thunder that e-discovery is "out-of-control," extolling the merits of raw meat rather than learning to make fire.
A lawyer without the skills needed to properly preserve, collect, analyze and present electronic
evidence is all-but-incompetent to manage litigation today, and visiting the cost to compensate for
those shortcomings upon the client is an ethical minefield.
That's why you must make it your mission to master electronic discovery, and help staunch the
hemorrhaging of money stemming from lawyer incompetence in e-discovery.
Now, let me tell you why you'll be glad you did.
Occasionally you’ll win a case on charm, a good or bad judge, an appealing client, a hateful opponent
or just dumb luck. But, without any of these things, you'll win most of the time if you have the
evidence proving your case. Much of that evidence is digital. It's there. It's waiting for you--eager
to tell its compelling story, ready to show your client was right and the other side should pay big or
go hence without day. The lawyer who can get to the digital evidence--find it, understand it and use
it in a cost-effective fashion--enjoys an enormous competitive advantage.
The selected articles and columns that follow were chosen to help you identify ways to tame the cost
of e-discovery. They are a small sampling of the articles I've written about electronic discovery and
computer forensics, available at www.craigball.com and ballinyourcourt.com. I hope you find them
to be a helpful, accessible introduction to the cost-saving side of electronic discovery.
Craig Ball, January 2015
A Lawyers’ Introduction to Digital Computers, Servers and Storage
By Craig Ball © 2014
In 1774, a Swiss watchmaker named Pierre
Jaquet-Droz built an ingenious mechanical
doll resembling a barefoot boy. Constructed
of 6,000 handcrafted parts and dubbed
“L'Ecrivain” (“The Writer”), Jaquet-Droz’
automaton uses quill and ink to handwrite
messages in cursive, up to 40 letters long, with
the content controlled by interchangeable
cams. The Writer is a charming example of an
early programmable computer.
The monarchs that marveled at Jaquet-Droz’
little penman didn’t need to understand how it
worked to enjoy it. Lawyers, too, once had little need to understand the operation of their clients’
information systems in order to conduct discovery. But as the volume of electronically stored
information (ESI) has exploded and the forms and sources of ESI continue to morph and multiply,
lawyers conducting electronic discovery cannot ignore the watch works. New standards of
competence demand that lawyers master some fundamentals of information technology and
electronic evidence.
Digital Data
Despite its daunting complexity, all digital content (photos, music, documents, spreadsheets,
databases, social media and communications) exists in one common and mind-boggling form: as
an unbroken string of ones and zeroes, most memorialized as impossibly tiny reversals of magnetic
polarity. These minute fluctuations must be read by a detector riding above the surface of a spinning
disk on a cushion of air one-thousandth the width of a human hair in an operation akin to a jet fighter
flying around the world at more than 800 times the speed of sound, less than a millimeter above the
ground…and precisely counting every blade of grass it passes.
That’s astonishing, but what should astound you more is that there are no pages, paragraphs,
spaces or markers of any kind to define the data stream. That is, the history, knowledge and
creativity of humankind have been reduced to two different states (on/off…one/zero) in an unbroken,
featureless expanse. Moreover, it’s a data stream that carries not only the information we store but
all of the instructions needed to make sense of that data, as well. The data stream holds all of the
information about the data required to play it, display it, transmit it or otherwise put it to work. It’s a
reductive feat that’ll make your head spin…or at least make you want to buy a computer scientist a
beer.
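A short Python sketch may make the point concrete: the same handful of bytes can be read as text, as one large number or as raw bits, and nothing in the bytes themselves says which reading is intended. The four byte values and the function names below are illustrative choices, not anything drawn from a particular system.

```python
# Four bytes with no built-in labels: what they "mean" depends on how they are read.
raw = bytes([0b01000101, 0b00101101, 0b01000100, 0b01101001])


def as_bits(data: bytes) -> str:
    """Show the data as one unbroken run of ones and zeroes."""
    return "".join(f"{byte:08b}" for byte in data)


def as_text(data: bytes) -> str:
    """Read each byte as a single ASCII character."""
    return data.decode("ascii")


def as_integer(data: bytes) -> int:
    """Read the same bytes as one unsigned number, most significant byte first."""
    return int.from_bytes(data, byteorder="big")


print(as_bits(raw))     # the featureless stream
print(as_text(raw))     # the stream read as characters
print(as_integer(raw))  # the stream read as a single number
```

The interpretation lives in the software doing the reading, which is why the tool used to open a file matters as much as the file.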
Data, Not Documents
Lawyers—particularly those who didn’t grow up with computers—tend to equate data with
documents when, in a digital world, documents are just one variant of the many forms in which
electronic information exists. Documents like the letters, memos and reports of yore account for a
dwindling share of electronically stored information relevant in discovery, and documents generated
from electronic sources tend to convey just part of the information stored in the source. The decisive
information in a case may exist as nothing more than a single bit of data that, in context, signals
whether the fact you seek to establish is true or not. A Facebook page doesn’t exist until a request
sent to a database triggers the page’s assembly and display. Word documents, PowerPoint
presentations and Excel spreadsheets lose content and functionality when printed to screen images
or paper.
With so much discoverable information bearing so little resemblance to documents, and with
electronic documents carrying much more probative and useful information than a printout or screen
image conveys, competence in electronic discovery demands an appreciation of data more than
documents.
Binary
When we were children starting to count, we had to learn the decimal system. We had to think about
what numbers meant. When our first grade selves tackled a big number like 9,465, we were acutely
aware that each digit represented a decimal multiple. The nine was in the thousands place, the four
in the hundreds, the six in the tens place and so on. We might even have parsed 9,465 as: (9 x
1000) + (4 x 100) + (6 x 10) + (5 x 1).
But soon, it became second nature to us. We’d unconsciously process 9,465 as nine thousand four
hundred sixty-five. As we matured, we learned about powers of ten and now saw 9,465 as: (9 x 10^3)
+ (4 x 10^2) + (6 x 10^1) + (5 x 10^0). This was exponential or “base ten” notation.
Mankind probably uses base ten to count because we evolved with ten fingers. But, had we slithered
from the ooze with eight or twelve digits, we’d have gotten on splendidly using a base eight or base
twelve number system. It really wouldn’t matter because any number--and consequently any data--can be expressed in any number system. So, it happens that computers use the base two or
“binary” notation, and computer programmers are partial to base sixteen or “hexadecimal” notation.
It’s all just counting.
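To see that the choice of base really is just a matter of notation, here is a minimal Python sketch that breaks a number into digits and powers for any base. The helper name expand is made up for this illustration; bin, hex and int are standard Python built-ins.

```python
def expand(number: int, base: int) -> list[tuple[int, int]]:
    """Return (digit, power) pairs, most significant first, such that
    number == sum(digit * base**power for each pair)."""
    if number == 0:
        return [(0, 0)]
    pairs = []
    power = 0
    while number > 0:
        number, digit = divmod(number, base)  # peel off the lowest digit
        pairs.append((digit, power))
        power += 1
    return list(reversed(pairs))


# Base ten: digits 9, 4, 6, 5 against powers 3, 2, 1, 0.
print(expand(9465, 10))

# The same quantity written in base two and base sixteen.
print(bin(9465), hex(9465))

# Reading the hexadecimal form back gives the original number.
print(int("24f9", 16))
```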
Bits
Computers use binary digits in place of decimal digits. The word bit is even a shortening of the
words "Binary digIT." Unlike the decimal system, where any number is represented by some
combination of ten possible digits (0-9), the bit has only two possible values: zero or one. This is not
as limiting as one might expect when you consider that a digital circuit—essentially an unfathomably
complex array of switches—hasn’t got any fingers to
count on, but is very good and very fast at being “on” or
“off.”
In the binary system, each binary digit—“bit”—holds the
value of a power of two. Therefore, a binary number is
composed of only zeroes and ones, like this: 10101. How
do you figure out what the value of the binary number
10101 is? You do it in the same way we did it above for
9,465, but you use a base of 2 instead of a base of 10.
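Hence: (1 x 2^4) + (0 x 2^3) + (1 x 2^2) + (0 x 2^1) + (1 x 2^0) =
16 + 0 + 4 + 0 + 1 = 21.
Moving from right to left, each bit you encounter
represents the value of increasing powers of 2, standing
in for one, two, four, eight, sixteen, thirty-two, sixty-four
and so on. That makes counting in binary pretty easy.
From zero to 21, decimal and binary equivalents look like
this:
Decimal  Binary    Decimal  Binary
0        00000     11       01011
1        00001     12       01100
2        00010     13       01101
3        00011     14       01110
4        00100     15       01111
5        00101     16       10000
6        00110     17       10001
7        00111     18       10010
8        01000     19       10011
9        01001     20       10100
10       01010     21       10101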
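The place-value arithmetic worked through above for 10101 can be sketched in a few lines of Python. The function names are invented for this example; the five-digit width is simply enough to hold the values zero through 21.

```python
def from_binary(bits: str) -> int:
    """Add up the power of two for every position holding a 1, reading right to left."""
    return sum(int(bit) * 2 ** power for power, bit in enumerate(reversed(bits)))


def to_binary(number: int, width: int = 5) -> str:
    """Write a non-negative number in binary, padded with leading zeroes."""
    return format(number, f"0{width}b")


print(from_binary("10101"))  # 16 + 0 + 4 + 0 + 1

# Counting from zero to 21 in both notations.
for n in range(22):
    print(f"{n:>2}  {to_binary(n)}")
```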
Bytes
A byte is a sequence or “string” of eight bits. The biggest
number that can be stored as one byte of information is
11111111, equal to 255 in the decimal system. The
smallest number is zero or 00000000. Thus, there are 256
different numbers that can be stored as one byte of
information. So, what do you do if you need to store a number larger than 255? Simple! You use
a second byte. This affords you all the combinations that can be achieved with 16 bits, being the
product of all the variations of the first byte and all of the second byte (256 x 256 or 65,536). So,
using bytes to express values, any number greater than 255 needs at least two bytes to be
expressed (called a “word” in geek speak), and any number above 65,535 requires at least three
bytes. A value greater than 16,777,215 (one less than 256^3 or 2^24) needs four bytes (called a “long
word”) and so on.
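As a rough check on where the byte boundaries fall, the sketch below computes the fewest whole bytes able to hold a given non-negative number. The function name bytes_needed is hypothetical; int.bit_length is a standard Python method.

```python
def bytes_needed(value: int) -> int:
    """Smallest number of whole bytes that can hold a non-negative integer."""
    if value < 0:
        raise ValueError("this sketch only covers non-negative numbers")
    # bit_length() is 0 for zero, so insist on at least one byte.
    return max(1, (value.bit_length() + 7) // 8)


# One byte covers 0 through 255; each added byte multiplies the range by 256.
for value in (0, 255, 256, 65_535, 65_536, 16_777_215, 16_777_216):
    print(f"{value:>10,} fits in {bytes_needed(value)} byte(s)")
```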
Let’s try it: Suppose we want to represent the number 51,975. It’s 1100101100000111, viz:

First byte (high-order):
Power   2^15    2^14    2^13   2^12   2^11   2^10   2^9   2^8
Value   32768   16384   8192   4096   2048   1024   512   256
Bit     1       1       0      0      1      0      1     1
(32768+16384+2048+512+256) or 51,968

Second byte (low-order):
Power   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
Value   128   64    32    16    8     4     2     1
Bit     0     0     0     0     0     1     1     1
(4+2+1) or 7

51,968 + 7 = 51,975
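The two-byte breakdown of 51,975 can be reproduced with a minimal Python sketch. The variable names are illustrative only.

```python
value = 51975

# Sixteen binary digits, padded with leading zeroes if needed.
bits = format(value, "016b")

# Dividing by 256 separates the high-order byte from the low-order byte.
high_byte, low_byte = divmod(value, 256)

print(bits)                              # all sixteen bits
print(bits[:8], "->", high_byte * 256)   # contribution of the first byte
print(bits[8:], "->", low_byte)          # contribution of the second byte
print(high_byte * 256 + low_byte)        # the two contributions recombined
```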
Why is an eight-bit sequence the fundamental building block of computing? It just sort of happened
that way. In this time of cheap memory, expansive storage and lightning-fast processors, it’s easy
to forget how scarce and costly these resources were at the dawn of the
computing era. Seven bits (with a leading bit reserved) was basically the
smallest block of data that would suffice to represent the minimum
complement of alphabetic characters, decimal digits, punctuation and control
instructions needed by the pioneers in computer engineering. It was, in
another sense, about all the data early processors could chew on at a time,
perhaps explaining the name “byte” (coined by IBM scientist, Dr. Werner
Buchholz, in 1956).
[Photo: Werner Buchholz]
The Magic Decoder Ring called ASCII
Back in 1935, American kids who listened to the Little Orphan
Annie radio show (and who drank lots of Ovaltine) could join the
Radio Orphan Annie Secret Society and obtain a device with
rotating disks that allowed them to write secret messages in
numeric code.
Similarly, computers encode words as numbers. Binary data
stand in for the upper and lower case English alphabet, as well
as punctuation marks, special characters and machine
instructions (like carriage return and line feed). The most widely
deployed U.S. encoding mechanism is known as the ASCII code
(for American Standard Code for Information Interchange,
pronounced “ask-key”). By limiting the ASCII character set to just 128 characters, any character can
be expressed in just seven bits (2^7 or 128) and so occupies less than one byte in the computer's
storage and memory. In the ASCII table that follows, the columns reflect a binary (byte) value, its
decimal equivalent and the corresponding ASCII text value (including some for machine codes and
punctuation):
ASCII Table
Binary    Decimal  Character    Binary    Decimal  Character    Binary    Decimal  Character
00000000  000      NUL          00101011  043      +            01010110  086      V
00000001  001      SOH          00101100  044      ,            01010111  087      W
00000010  002      STX          00101101  045      -            01011000  088      X
00000011  003      ETX          00101110  046      .            01011001  089      Y
00000100  004      EOT          00101111  047      /            01011010  090      Z
00000101  005      ENQ          00110000  048      0            01011011  091      [
00000110  006      ACK          00110001  049      1            01011100  092      \
00000111  007      BEL          00110010  050      2            01011101  093      ]
00001000  008      BS           00110011  051      3            01011110  094      ^
00001001  009      HT           00110100  052      4            01011111  095      _
00001010  010      LF           00110101  053      5            01100000  096      `
00001011  011      VT           00110110  054      6            01100001  097      a
00001100  012      FF           00110111  055      7            01100010  098      b
00001101  013      CR           00111000  056      8            01100011  099      c
00001110  014      SO           00111001  057      9            01100100  100      d
00001111  015      SI           00111010  058      :            01100101  101      e
00010000  016      DLE          00111011  059      ;            01100110  102      f
00010001  017      DC1          00111100  060      <            01100111  103      g
00010010  018      DC2          00111101  061      =            01101000  104      h
00010011  019      DC3          00111110  062      >            01101001  105      i
00010100  020      DC4          00111111  063      ?            01101010  106      j
00010101  021      NAK          01000000  064      @            01101011  107      k
00010110  022      SYN          01000001  065      A            01101100  108      l
00010111  023      ETB          01000010  066      B            01101101  109      m
00011000  024      CAN          01000011  067      C            01101110  110      n
00011001  025      EM           01000100  068      D            01101111  111      o
00011010  026      SUB          01000101  069      E            01110000  112      p
00011011  027      ESC          01000110  070      F            01110001  113      q
00011100  028      FS           01000111  071      G            01110010  114      r
00011101  029      GS           01001000  072      H            01110011  115      s
00011110  030      RS           01001001  073      I            01110100  116      t
00011111  031      US           01001010  074      J            01110101  117      u
00100000  032      SP           01001011  075      K            01110110  118      v
00100001  033      !            01001100  076      L            01110111  119      w
00100010  034      "            01001101  077      M            01111000  120      x
00100011  035      #            01001110  078      N            01111001  121      y
00100100  036      $            01001111  079      O            01111010  122      z
00100101  037      %            01010000  080      P            01111011  123      {
00100110  038      &            01010001  081      Q            01111100  124      |
00100111  039      '            01010010  082      R            01111101  125      }
00101000  040      (            01010011  083      S            01111110  126      ~
00101001  041      )            01010100  084      T            01111111  127      DEL
00101010  042      *            01010101  085      U
So, “E-Discovery” would be written in a binary ASCII sequence as:
0100010100101101010001000110100101110011011000110110111101110110011001010111001001111001
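The encoding shown above can be reproduced, and reversed, with a brief Python sketch. The two function names are made up for this illustration; str.encode and bytes.decode are standard Python.

```python
def to_ascii_bits(text: str) -> str:
    """Write each character as the eight-bit binary form of its ASCII code."""
    return "".join(format(code, "08b") for code in text.encode("ascii"))


def from_ascii_bits(bits: str) -> str:
    """Cut the stream into eight-bit bytes and look each one up as an ASCII character."""
    chunks = [bits[i:i + 8] for i in range(0, len(bits), 8)]
    return bytes(int(chunk, 2) for chunk in chunks).decode("ascii")


encoded = to_ascii_bits("E-Discovery")
print(encoded)                   # eleven characters become 88 bits
print(from_ascii_bits(encoded))  # and the 88 bits become the text again
```

Note that decoding only works because the reader knows in advance that every eight bits form one character; the stream itself carries no such marker.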
Now that you have some sense of how information can be written as digital data, let’s take a look
at some of the devices that store and utilize digital data.
Introduction to Data Storage Media
Mankind has been storing data for thousands of years, on stone, bone, clay, wood, metal, glass,
skin, papyrus, paper, plastic and film. In fact, people were storing data in binary formats long
before the emergence of modern digital computers. Records from 9th century Persia describe an
organ playing interchangeable cylinders. Eighteenth century textile manufacturers employed
perforated rolls of paper to control looms, and Swiss and German music box makers used metal
drums or platters to store tunes. At the dawn of the Jazz Age, no self-respecting American family
of means lacked a player piano capable (more-or-less) of reproducing the works of the world’s
greatest pianists.
Whether you store data as a perforation or a pin, you’re storing binary data. That is, there are two
data states: hole or no hole, pin or no pin. Zeroes or ones.
Punched Cards
In the 1930’s, demand for electronic data storage led to the development of fast, practical and
cost-effective binary storage media. The first of these were punched cards, initially made in a variety
of sizes and formats, but ultimately standardized by IBM as the 80 column, 12 row (7.375” by 3.25”)
format that dominated computing well into the 1970’s. [From 1975-79, the author spent many a
midnight in the basement of a computer center at Rice University typing program instructions on
these unforgiving punch cards].
[Figure: IBM 5081 80 column card]
The 1950’s saw the emergence of magnetic
storage as the dominant medium for
electronic data storage, and it remains so today. Although optical and solid state storage are
expected to ultimately eclipse magnetic media for local storage, magnetic storage will continue to
dominate network and cloud storage well into the 2020s, if not beyond.
Tape
The earliest popular form of magnetic data storage was
magnetic tape. Spinning reels of tape were a clichéd visual
metaphor for computing in films and television shows from
the 1950’s through the 1970’s. Though the miles of tape on
those reels now reside in cartridges and cassettes, tapes
remain an enduring medium for backup and archival of
electronically stored information.
The LTO-5 format
introduced in 2010 natively holds 1.5 terabytes of
uncompressed data and delivers a transfer rate of 140
megabytes per second. Since most data stored on backup
tape is compressed, the actual volume of ESI on tape may
be 2-3 times greater than the native capacity of the tape.
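For a sense of scale, the figures above can be turned into a back-of-the-envelope estimate of how much ESI one LTO-5 tape might hold and how long it would take merely to stream it. This is a sketch under stated assumptions: decimal units (one terabyte taken as 1,000,000 megabytes), the quoted transfer rate sustained without pause, and no allowance for locating, restoring or processing the data. The function names are hypothetical.

```python
NATIVE_CAPACITY_TB = 1.5    # native LTO-5 capacity quoted in the text
TRANSFER_RATE_MB_S = 140    # native transfer rate quoted in the text


def effective_capacity_tb(native_tb: float, compression_ratio: float) -> float:
    """Volume of ESI on a full tape for a given compression ratio."""
    return native_tb * compression_ratio


def hours_to_stream(native_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to read a full tape end to end at a sustained rate
    (assumes 1 TB = 1,000,000 MB)."""
    seconds = native_tb * 1_000_000 / rate_mb_per_s
    return seconds / 3600


for ratio in (2, 3):
    print(f"{ratio}:1 compression -> about {effective_capacity_tb(NATIVE_CAPACITY_TB, ratio)} TB of ESI")

print(f"about {hours_to_stream(NATIVE_CAPACITY_TB, TRANSFER_RATE_MB_S):.1f} hours to read one full tape")
```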
Magnetic tape was the earliest data storage medium for
personal computers including the pioneering Radio Shack
TRS-80 and the very first IBM personal computer, the Model 5150.
While tape isn’t as fast or capacious as hard drives, it’s proven to be more durable and less costly
for long term storage; that is, so long as the data is being stored, not restored.
[Figures: LTO-5 Ultrium tape; Sony AIT-3 tape; SDLT-II tape]
Chronology of Magnetic Tape Formats for Data Storage (Wikipedia)
1951 - UNISERVO
1952 - IBM 7 track
1958 - TX-2 Tape System
1962 - LINCtape
1963 - DECtape
1964 - 9 Track
1964 - MagCard Selectric typewriter
1966 - 8-Track Tape
1972 - QIC
1975 - KC Standard, Compact Cassette
1976 - DC100
1977 - Commodore Datasette
1979 - DECtapeII
1979 - Exatron Stringy Floppy
1983 - ZX Microdrive
1984 - Rotronics Wafadrive
1984 - IBM 3480
1984 - DLT
1986 - SLR
1987 - Data8
1989 - DDS/DAT
1992 - Ampex DST
1994 - Mammoth
1995 - IBM 3590
1995 - Redwood SD-3
1995 - Travan
1996 - AIT
1997 - IBM 3570 MP
1998 - T9840
1999 - VXA
2000 - T9940
2000 - LTO Ultrium
2003 - SAIT
2006 - T10000
2007 - IBM 3592
2008 - IBM TS1130
2011 - IBM TS1140
For further information, see Ball, Technology Primer: Backups in Civil Discovery at
http://www.craigball.com/Ball_Technology%20Primer-Backups%20in%20E-Discovery.pdf
Floppy Disks
It’s rare to encounter a floppy disk today, but floppy disks played a central role in software
distribution and data storage for personal computing for almost thirty years. Today, the only place
a computer user is likely to see a floppy disk is as the menu icon for storage on the menu bar of
Microsoft Office applications. All floppy disks have a spinning, flexible plastic disk coated with a
magnetic oxide (e.g., rust). The disk is essentially the same composition as magnetic tape in disk
form. Disks are formatted (either by the user or pre-formatted by the manufacturer) so as to divide
the disk into various concentric rings of data called tracks, with tracks further subdivided into tiny
arcs called sectors. Formatting enables systems to locate data on physical storage media much as
roads and lots enable us to locate homes in a neighborhood.
[Figures: 8", 5.25" and 3.5" floppy disks; 8" floppy disk in use]
Though many competing floppy disk sizes and formats have been introduced since 1971, only five
formats are likely to be encountered in e-discovery. These are the 8”, 5.25”, 3.5” standard, 3.5” high
density and Zip formats and, of these, the 3.5” high density (HD) format, with its 1.44 megabyte
capacity, is by far the most prevalent legacy floppy disk format.
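The way tracks and sectors add up to a disk's capacity can be sketched in a few lines of Python. The geometry used here (80 tracks per side, two sides, 18 sectors per track, 512 bytes per sector) is the commonly cited layout of a PC-formatted 3.5” high density disk; it is supplied for illustration and does not come from the text above. The function name is hypothetical.

```python
def floppy_capacity_bytes(tracks_per_side: int, sides: int,
                          sectors_per_track: int, bytes_per_sector: int) -> int:
    """Formatted capacity is simply the number of sectors times the size of each."""
    return tracks_per_side * sides * sectors_per_track * bytes_per_sector


# Assumed geometry of a PC-formatted 3.5" high density disk.
capacity = floppy_capacity_bytes(tracks_per_side=80, sides=2,
                                 sectors_per_track=18, bytes_per_sector=512)

print(f"{capacity:,} bytes")
print(f"{capacity / 1024:,.0f} kilobytes of 1,024 bytes each")
```

The familiar "1.44 megabyte" label comes from dividing that kilobyte count by 1,000, a reminder that storage capacities are often quoted in mixed units.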
The Zip Disk was one of several proprietary “super floppy”
products that enjoyed brief success before the high capacity
and low cost of recordable optical media (CD-R and DVD-R)
and flash drives rendered them obsolete.
[Figure: Zip Disk]
Optical Media
The most common forms of optical media for data storage are
the CD, DVD and Blu-ray disks in read only, recordable or
rewritable formats. Each typically exists as a 4.75” plastic disk with a metalized reflective coating
and/or dye layer that can be distorted by a focused laser beam to induce pits and lands in the media.
These pits and lands, in turn, interrupt a laser reflected off the surface of the disk to generate the
ones and zeroes of digital data storage. The practical differences among the three prevailing forms
of optical media are their native data storage capacities and the availability of drives to read them.
A CD (for Compact Disk) or CD-ROM (for CD Read Only Memory) is read
only and not recordable by the end user. It’s typically fabricated in a
factory to carry music or software. A CD-R is recordable by the end user, but once a recording
session is closed, it cannot be altered in normal use. A CD-RW is a re-recordable format that can
be erased and written to multiple times. The native data storage capacity of a standard-size CD is
about 700 megabytes.
A DVD (for Digital Versatile Disk) also comes in read only, recordable (DVD±R) and rewritable
(DVD±RW) iterations and the most common form of the disk has a native data storage capacity of
approximately 4.7 gigabytes. So, one DVD holds the same amount of data as six and one-half
CDs.
By employing the shorter wavelength of a blue laser to read and write disks, a dual layer Blu-ray
disk can hold up to about 50 gigabytes of data, equalling the capacity of about ten and one-half
DVDs. Like their predecessors, Blu-ray disks are available in recordable (BD-R) and rewritable
(BD-RE) formats.
Though ESI resides on a dizzying array of media and devices, by far the largest share resides
in three closely related species of computing hardware: computers, hard drives
and servers. A server is essentially a computer dedicated to a specialized task or tasks, and both
servers and computers routinely employ hard drives for program and data storage.
Conventional Electromagnetic Hard Drives
A hard drive is an immensely complex data storage device that’s been engineered to appear
deceptively simple. Connect a hard drive to your machine, and the operating system
detects the drive, assigns it a drive letter and—presto!—you’ve got trillions of bytes of new storage!
Microprocessor chips garner the glory, but the humdrum hard drive is every bit a paragon of ingenuity
and technical prowess.
A conventional personal computer hard drive is a sealed aluminum box measuring (for a desktop
system) roughly 4” x 6” x 1” in height. A hard drive can be located almost anywhere within the case
and is customarily secured by several screws attached to any of ten pre-threaded mounting holes
along the edges and base of the case. One face of the case will be labeled to reflect the drive
specifications, while a printed circuit
board containing logic and controller
circuits will cover the opposite face.
A conventional hard disk contains
round, flat discs called platters,
coated on both sides with a special
material able to store data as
magnetic patterns. Much like phonograph
records, the platters have a hole in the
center allowing multiple platters to be
stacked on a spindle for greater
storage capacity.
The platters rotate at high speed—
typically 5,400, 7,200 or 10,000
rotations per minute—driven by an
electric motor. Data is written to and
read from the platters by tiny devices called read/write heads mounted on the end of a pivoting
extension called an actuator arm that functions similarly to the tone arm that carried the phonograph
cartridge and needle across the face of a record. Each platter has two read/write heads, one on the
top of the platter and one on the bottom. So, a conventional hard disk with three platters typically
sports six surfaces and six read/write heads.
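Spindle speed has a direct, calculable effect on how long a drive waits for data to come around under the head. The Python sketch below computes the head count for a stack of platters and the average rotational latency, taken here as the time for half a revolution. That half-turn convention is a standard rule of thumb rather than something stated in the text, and the function names are invented for this example.

```python
def read_write_heads(platters: int) -> int:
    """Two recording surfaces per platter, one head per surface."""
    return platters * 2


def average_rotational_latency_ms(rpm: int) -> float:
    """Average wait for a sector to rotate under the head, assumed to be
    the time for half a revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


print(read_write_heads(3), "heads for three platters")

for rpm in (5_400, 7_200, 10_000):
    print(f"{rpm:>6,} RPM -> about {average_rotational_latency_ms(rpm):.2f} ms average latency")
```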
Unlike a record player, the read/write head never touches the spinning platter. Instead, when the
platters spin up to operating speed, their rapid rotation causes air to flow under the read/write heads
and lift them off the surface of the disk—the same principle of lift that operates on aircraft wings and
enables them to fly. The head then reads the magnetic patterns on the disc while flying just .5
millionths of an inch above the surface. At this speed, if the head bounces against the surface, there
is a good chance that the head will burrow into the surface of the platter, obliterating data, destroying
both read/write heads and rendering the hard drive inoperable—a so-called “head crash.”
The hard disk drive has been around for more than 50
years, but it was not until the 1980’s that the physical
size and cost of hard drives fell sufficiently for their use
to be commonplace.
Introduced in 1956, the IBM 350 Disk Storage Unit
pictured was the first commercial hard drive. It was 60
inches long, 68 inches high and 29 inches deep (so it
could fit through a door). It held 50 magnetic disks of
50,000 sectors, each storing 100 alphanumeric
characters. Thus, it held 4.4 megabytes, or enough for
about two cellphone snapshots today. It weighed a ton
(literally), and users paid $130.00 per month to rent
each megabyte of storage.
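The IBM 350 figures can be checked with simple arithmetic. In the sketch below, the sector and character counts come from the text; the seven bits per character is an assumption adopted only because it reproduces a figure close to the stated 4.4 megabytes, and a megabyte is taken as 1,000,000 bytes of eight bits. The function names are hypothetical.

```python
def total_characters(sectors: int, characters_per_sector: int) -> int:
    """Characters held by the whole unit."""
    return sectors * characters_per_sector


def approximate_megabytes(characters: int, bits_per_character: int) -> float:
    """Convert a character count to megabytes (1 MB = 1,000,000 eight-bit bytes)."""
    return characters * bits_per_character / 8 / 1_000_000


characters = total_characters(sectors=50_000, characters_per_sector=100)
print(f"{characters:,} characters")

# ASSUMPTION: seven bits per character; the article does not state the character size.
print(f"about {approximate_megabytes(characters, 7):.1f} megabytes")
```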
Today, that same $130.00 buys a 3-4 terabyte hard drive that stores nearly a million times more
information, weighs less than three pounds and hides behind a paperback book.
Over time, hard drives took various shapes and sizes (or “form factors” as the standard dimensions
of key system components are called in geek speak). Three form factors are still in use: 3.5”
(desktop drive), 2.5” (laptop drive) and 1.8” (iPod and microsystem drive, now supplanted by solid
state storage).
Hard drives connect to computers by various mechanisms called “interfaces” that describe both
how devices “talk” to one another and the physical plugs and cabling required. The five most common
hard drive interfaces in use today are:
PATA for Parallel Advanced Technology Attachment (sometimes called EIDE for Enhanced
Integrated Drive Electronics)
SATA for Serial Advanced Technology Attachment
SCSI for Small Computer System Interface
SAS for Serial Attached SCSI
FC for Fibre Channel
Though once dominant in personal computers, PATA drives are rarely found in machines
manufactured after 2006. Today, virtually all laptop and desktop computers employ SATA drives for
local storage. SCSI, SAS and FC drives tend to be seen exclusively in servers and other applications
demanding high performance and reliability.
From the user’s perspective, PATA, SATA, SCSI, SAS and FC drives are indistinguishable; however,
from the point of view of the technician tasked to connect to and image the contents of the drive, the
difference implicates different tools and connectors.
The five drive interfaces divide into two employing
parallel data paths (PATA and SCSI) and three
employing serial data paths (SATA, SAS and FC).
Parallel ATA interfaces route data over multiple
simultaneous channels necessitating 40 wires where
serial ATA interfaces route data through a single, high-speed data channel requiring only 7 wires. Accordingly,
SATA cabling and connectors are smaller than their
PATA counterparts.
Fibre Channel employs optical fiber (the spelling
difference is intentional) and light waves to carry data
at impressive speeds. The premium hardware required
by FC dictates that it will be found in enterprise
computing environments, typically in conjunction with a
high capacity/high demand storage device called a
SAN (for Storage Area Network) or a NAS (for
Network Attached Storage).
It’s easy to become confused between hard drive interfaces and external data transfer interfaces like
USB or FireWire seen on external hard drives. The drive within the external hard drive housing will
employ one of the interfaces described above (except FC); however, to facilitate external connection
to a computer, a device called a bridge will convert data written to and from the hard drive to a form
that can traverse a USB or FireWire connection. In some compact, low-cost external drives,
manufacturers dispense with the external bridge board altogether and build the USB interface right
on the hard drive’s circuit board.
Flash Drives, Memory Cards and Solid State Drives
Computer memory storage devices have no moving
parts and the data resides entirely within the solid
materials which compose the memory chips, hence the
term, “solid state.” Historically, rewritable memory was
volatile (in the sense that contents disappeared when
power was withdrawn) and expensive. But, beginning
around 1995, a type of non-volatile memory called NAND
flash became sufficiently affordable to be used for
removable storage in emerging applications like digital
photography. Further leaps in the capacity and dips in
the cost of NAND flash led to the near-eradication of film
for photography and the extinction of the floppy disk,
replaced by simple, inexpensive and reusable USB
storage devices called, variously, flash drives, thumb drives,
pen drives and memory sticks or keys.
As the storage capacity of NAND flash has gone up and its cost has come down, the conventional
electromagnetic hard drive is rapidly being replaced by solid state drives in standard hard drive
form factors. Solid state drives are significantly faster, lighter and more energy efficient than
conventional drives, but they currently cost anywhere from 10-20 times more per gigabyte than their
mechanical counterparts. All signs point to the ultimate displacement of mechanical drives by solid
state drives, and some products (notably tablets like the iPad and ultra-lightweight laptops like the
MacBook Air) have eliminated hard drives altogether in favor of solid state storage.
Currently, solid state drives assume the size and shape of mechanical drives to facilitate compatibility
with existing devices. However, the size and shape of mechanical hard drives were driven by the
size and operation of the platter they contain. Because solid state storage devices have no moving
parts, they can assume virtually any shape. It’s likely,
then, that slavish adherence to 2.5” and 3.5” rectangular
form factors will diminish in favor of shapes and sizes
uniquely suited to the devices that employ them.
With respect to e-discovery, the shift from electromagnetic
to solid state drives is inconsequential. However, the
move to solid state drives will significantly impact matters
necessitating computer forensic analysis. Because the
NAND memory cells that comprise solid state drives wear out rapidly with use, solid state drive
controllers must constantly reposition data to ensure usage is distributed across all cells. Such “wear
leveling” hampers techniques that forensic examiners have long employed to recover deleted data
from conventional hard drives.
RAID Arrays
Whether local to a user or in the Cloud, hard drives account for nearly all the electronically stored
information attendant to e-discovery. In network server and Cloud applications, hard drives rarely
work alone. That is, hard drives are ganged together to achieve greater capacity, speed and
reliability in so-called Redundant Arrays of Independent Disks or RAIDs. In the SAN pictured at left,
the 16 hard drives housed in trays may be accessed as Just a Bunch of Disks or JBOD, but it’s far
more likely they are working together as a RAID.
RAIDs serve two ends: redundancy and performance.
The redundancy aspect is obvious—two drives holding
identical data safeguard against data loss due to
mechanical failure of either drive—but how do multiple
drives improve performance? The answer lies in
splitting the data across more than one drive using a
technique called striping.
A RAID improves performance by dividing data across
more than one physical drive. The swath of data deposited on one drive in an array before moving
to the next drive is called the "stripe." If you imagine the drives lined up alongside one another, you
can see why moving back and forth across the drives to store data might seem like painting a stripe across
the drives. By striping data, each drive can deliver its share of the data simultaneously, increasing
the amount of information handed off to the computer’s microprocessor.
But, when you stripe data across drives, information is lost if any drive in the stripe fails. You gain
performance, but surrender security.
This type of RAID configuration is called a RAID 0. It wrings maximum performance from a storage
system; but it's risky.
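The striping described above can be sketched in a few lines of Python. This is only an illustration: the helper names `stripe` and `unstripe` and the four-byte stripe size are assumptions made for the example, and real RAID controllers stripe much larger blocks in hardware or in the operating system.

```python
def stripe(data: bytes, drive_count: int, stripe_size: int = 4) -> list[list[bytes]]:
    """Deal fixed-size chunks of data across the drives in round-robin order."""
    drives: list[list[bytes]] = [[] for _ in range(drive_count)]
    for offset in range(0, len(data), stripe_size):
        chunk = data[offset:offset + stripe_size]
        drives[(offset // stripe_size) % drive_count].append(chunk)
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one chunk from each drive in turn."""
    pieces = []
    longest = max(len(chunks) for chunks in drives)
    for row in range(longest):
        for chunks in drives:
            if row < len(chunks):
                pieces.append(chunks[row])
    return b"".join(pieces)


# Sixteen bytes striped across two drives: each drive holds half the chunks.
drives = stripe(b"ABCDEFGHIJKLMNOP", drive_count=2)
restored = unstripe(drives)
```

Because each drive holds only every other chunk, losing either list in `drives` leaves gaps that cannot be filled, which is the RAID 0 risk the text describes.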
If RAID 0 is for gamblers, RAID 1 is for the risk averse. A RAID 1 configuration duplicates everything
from one drive to an identical twin, so that a failure of one drive won't lead to data loss. RAID 1
doesn't improve performance, and it requires twice the hardware to store the same information.
Other RAID configurations strive to integrate the performance of RAID 0 and the protection of RAID
1.
Thus, a "RAID 0+1" mirrors two striped drives, but demands four hard drives delivering only half their
total storage capacity. Safe and fast, but not cost-efficient. The solution lies in a concept called
parity, key to a range of other sequentially numbered RAID configurations. Of those other
configurations, the ones most often seen are called RAID 5 and RAID 6.
To understand parity, consider the simple equation 5 + 2 = 7. If you didn't know one of the three
values in this equation, you could easily solve for the missing value, i.e., presented with "5 + __ =
7," you can reliably calculate the missing value is 2. In this example, "7" is the parity value or
checksum for "5" and "2."
The same process is used in RAID configurations to gain increased performance by striping data
across multiple drives while using parity values to permit the calculation of any missing values lost
to drive failure. In a three drive array, any one of the drives can fail, and we can use the remaining
two to recreate the third (just as we solved for 2 in the equation above).
In this illustration, data is striped across three hard drives,
HDA, HDB and HDC. HDC holds the parity values for
data stripe 1 on HDA and stripe 2 on HDB. It's shown as
"Parity (1, 2)." The parity values for the other stripes are
distributed on the other drives. Again, any one of the
three drives can fail and all of the data is recoverable.
This configuration is RAID 5 and, though it requires a minimum of three drives, it can be expanded
to dozens or hundreds of disks.
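The parity idea can be sketched in Python as follows. Note one substitution: where the text uses addition as an analogy, this sketch uses the bitwise XOR operation, which is what parity RAID commonly relies on because applying it twice returns the original value. The function name `xor_blocks` and the sample stripe contents are assumptions made for illustration.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


stripe_1 = b"HELLO"                       # stored on HDA
stripe_2 = b"WORLD"                       # stored on HDB
parity = xor_blocks(stripe_1, stripe_2)   # stored on HDC as "Parity (1, 2)"

# Suppose HDB fails: rebuild its stripe from the two surviving drives.
rebuilt_stripe_2 = xor_blocks(stripe_1, parity)

# The same calculation recovers HDA's stripe if HDA is the drive that fails.
rebuilt_stripe_1 = xor_blocks(stripe_2, parity)
```

As with solving "5 + __ = 7," any one missing value can be recomputed from the other two.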
Computers
Historically, all sorts of devices—and even people—were “computers.” During World War II, human
computers—women for the most part—were instrumental in calculating artillery trajectories and
assisting with the challenging number-crunching needed by the Manhattan Project. Today, laptop
and desktop personal computers spring to mind when we hear the term “computer;” yet smart
phones, tablet devices, global positioning systems, video gaming platforms, televisions and a host
of other intelligent tools and toys are also computers. More precisely, the central processing unit
(CPU) or microprocessor of the system is the “computer,” and the various input and output devices
that permit humans to interact with the processor are termed peripherals. The key distinction
between a mere calculator and a computer is the latter’s ability to be programmed and its use of
memory and storage. The physical electronic and mechanical components of a computer are its
hardware, and the instruction sets used to program a computer are its software. Unlike the
interchangeable cams of Pierre Jaquet-Droz’ mechanical doll, modern electronic computers receive
their instructions in the form of digital data typically retrieved from the same electronic storage
medium as the digital information upon which the computer performs its computational wizardry.
When you push the power button on your computer, you trigger an extraordinary, expedited
education that takes the machine from insensible illiterate to worldly savant in a matter of seconds.
The process starts with a snippet of data on a chip called the ROM BIOS storing just enough
information in its Read Only Memory to grope around for the Basic Input and Output System
peripherals (like the keyboard, screen and, most importantly, the hard drive). The ROM BIOS also
holds the instructions needed to permit the processor to access more and more data from the hard
drive in a widening gyre, “teaching” itself to be a modern, capable computer.
This rapid, self-sustaining self-education is as magical as if you lifted yourself into the air by pulling
on the straps of your boots, which is truly why it’s called “bootstrapping” or just “booting” a computer.
Computer hardware circa 2014 shares certain common characteristics. Within the CPU, a
microprocessor chip is the computational “brains” of the system and resides in a socket on the
motherboard, a rigid surface etched with metallic patterns serving as the wiring between the
components on the board. The microprocessor generates considerable heat necessitating the
attachment of a heat dissipation device called a heat sink, often abetted by a small fan. The
motherboard also serves as the attachment
point for memory boards (grouped as modules
or “sticks”) called RAM for Random Access
Memory. RAM serves as the working memory
of the processor while it performs calculations;
accordingly, the more memory present, the
more information can be processed at once,
enhancing overall system performance.
Other chips comprise a Graphics Processor
Unit (GPU) residing on the motherboard or on
a separate expansion board called a video
card or graphics adapter. The GPU supports
the display of information from the processor
onto a monitor or projector and has its own
complement of memory dedicated to superior
graphics performance. Likewise, specialized
chips on the motherboard or an expansion
board called a sound card support the
reproduction of audio to speakers or a
headphone. Video and sound processing
capabilities may even be fully integrated into the
microprocessor chip.
The processor communicates with networks through an interface device called a network adapter
which connects to the network physically, through a LAN Port, or wirelessly using a Wi-Fi
connection.
Users convey information and instructions to computers using tactile devices like a keyboard, mouse
or track pad, but may also employ voice or gestural recognition mechanisms.
Persistent storage of data is a task delegated to other peripherals: optical drives (CD-ROM and
DVD-ROM devices), floppy disk drives, portable solid-state media (e.g., thumb drives) and, most
commonly, hard drives.
All of the components just described require electricity, supplied by batteries in portable devices or
by a power supply converting AC current to the lower DC voltages required by electronics.
From the standpoint of electronic discovery, it’s less important to define these devices than it is to
fathom the information they hold, the places it resides and the forms it takes. Parties and lawyers
have been sanctioned for what was essentially their failure to inquire into and understand the roles
computers, hard drives and servers play as repositories of electronic evidence. Moreover, much
money spent on electronic discovery today is wasted as a consequence of parties’ efforts to convert
ESI to paper-like forms instead of learning to work with ESI in the forms in which it customarily
resides on computers, hard drives and servers.
Servers
Servers were earlier defined as computers dedicated to a specialized task or tasks. But that
definition doesn’t begin to encompass the profound impact upon society of the so-called client-server computing model. The ability to connect local “client” applications to servers via a network,
particularly to database servers, is central to the operation of most businesses and to all
telecommunications and social networking. Google and Facebook are just enormous groupings of
servers, and the Internet is merely a vast, global array of shared servers.
Local, Cloud and Peer-to-Peer Servers
For e-discovery, let’s divide the world of servers into three realms: Local, Cloud and Peer-to-Peer
server environments.
“Local” servers employ hardware that’s physically available to the party that owns or leases the
servers. Local servers reside in a computer room on a business’ premises or in leased equipment
“lockers” accessed at a co-located data center where a lessor furnishes, e.g., premises security,
power and cooling. Local servers are easiest to deal with in e-discovery because physical access
to the hardware supports more and faster options when it comes to preservation and collection of
potentially responsive ESI.
“Cloud” servers typically reside in facilities not physically accessible to persons using the servers,
and discrete computing hardware is typically not dedicated to a particular user. Instead, the Cloud
computing consumer is buying services via the Internet that emulate the operation of a single
machine or a room full of machines, all according to the changing needs of the Cloud consumer.
Web mail is the most familiar form of Cloud computing, in a variant called SaaS (for Software as a
Service). Webmail providers like Google, Yahoo and Microsoft make e-mail accounts available on
their servers in massive data centers, and the data on those servers is available solely via the
Internet, no user having the right to gain physical access to the machines storing their messaging.
“Peer-to-Peer” (P2P) networks exploit the fact that any computer connected to a network has the
potential to serve data across the network. Accordingly, P2P networks are decentralized; that is,
each computer or “node” on a P2P network acts as client and server, sharing storage space,
communication bandwidth and/or processor time with other nodes. P2P networking may be
employed to share a printer in the home, where the computer physically connected to the printer
acts as a print server for other machines on the network. On a global scale, P2P networking is the
technology behind file sharing applications like BitTorrent and Gnutella that have garnered headlines
for their facilitation of illegal sharing of copyrighted content. When users install P2P applications to
gain access to shared files, they simultaneously (and often unwittingly) dedicate their machine to
serving up such content to a multitude of other nodes.
Virtual Servers
Though we’ve so far spoken of server hardware, i.e., physical devices, servers may also be
implemented virtually, through software that emulates the functions of a physical device. Such
“hardware virtualization” allows for more efficient deployment of computing resources by enabling a
single physical server to host multiple virtual servers.
Virtualization is the key enabling technology behind many Cloud services. If a company needs
powerful servers to launch a new social networking site, it can raise capital and invest in the
hardware, software, physical plant and personnel needed to support a data center, with the attendant
risk that it will be over-provisioned or under-provisioned as demand fluctuates. Alternatively, the
startup can secure the computing resources it needs by using virtual servers hosted by a Cloud
service provider like Amazon, Microsoft or Rackspace. Virtualization permits computing resources
to be added or retired commensurate with demand, and being pay-as-you-go, it requires little capital
investment.
It’s helpful for attorneys to understand the role of virtual machines (VMs) because the ease and
speed with which VMs are deployed and retired, as well as their isolation within the operating system,
can pose unique risks and challenges in e-discovery, especially with respect to implementing a
proper legal hold and when identifying and collecting potentially responsive ESI.
Server Applications
Computers dedicated to server roles typically run operating systems optimized for server tasks and
applications specially designed to run in a server environment. In turn, servers are often dedicated
to supporting specific functions such as serving web pages (Web Server), retaining and delivering
files from shared storage allocations (File Server), organizing voluminous data (Database Server),
facilitating the use of shared printers (Print Server), running programs (Application Server) or
handling messages (Mail Server). These various server applications may run physically, virtually or
as a mix of the two.
Practice Tips for Computers, Hard Drives and Servers
Your first hurdle when dealing with computers, hard drives and servers in e-discovery is to identify
potentially responsive sources of ESI and take appropriate steps to inventory their relevant contents
and preserve them against spoliation. As the volume of ESI to be collected and processed bears on
the expense and time required, it’s useful to get a handle on data volumes and distribution as early
in the litigation process as possible.
Start your ESI inventory by taking stock of physical computing and storage devices. For each
machine or device holding potentially responsive ESI, collect the following information (as
applicable):
• Manufacturer and model
• Serial number and/or service or asset tag
• Operating system
• Custodian
• Location
• Type of storage (don’t miss removable media, like SD cards)
• Aggregate storage capacity (in MB, GB or TB)
• Encryption status
• Credentials (user IDs and passwords), if encrypted
• Prospects for upgrade or disposal
• If you’ll preserve ESI by drive imaging, it’s helpful to identify device interfaces.
For servers, further information might include:
• Purpose(s) of the server (e.g., web server, file server, print server, etc.)
• Names and contact information of server administrator(s)
• Time in service
• Whether hardware virtualization is used
• RAID implementation(s)
• Users and privileges
• Logging and log retention practices
• Backup procedures and backup media rotation and retention
• Whether the server is “mission critical” and cannot be taken offline or can be downed.
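One way to keep such an inventory consistent is to record each device in a structured form. The following Python sketch is only a suggestion: the class name `DeviceRecord`, its field names and the two sample entries are hypothetical placeholders invented for illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceRecord:
    """One row of an ESI inventory (field names are illustrative)."""
    manufacturer: str
    model: str
    serial_or_asset_tag: str
    operating_system: str
    custodian: str
    location: str
    storage_type: str
    capacity_gb: float
    encrypted: bool
    interface: Optional[str] = None   # e.g., SATA, if drives will be imaged


def total_capacity_gb(records: list[DeviceRecord]) -> float:
    """Aggregate storage capacity, an early proxy for data volume."""
    return sum(r.capacity_gb for r in records)


def needing_credentials(records: list[DeviceRecord]) -> list[str]:
    """Custodians whose devices are encrypted and so require credentials."""
    return [r.custodian for r in records if r.encrypted]


# Hypothetical sample entries.
inventory = [
    DeviceRecord("ExampleCo", "Laptop X", "SN-0001", "Windows", "J. Doe",
                 "Main office", "SSD", 512, True, "SATA"),
    DeviceRecord("ExampleCo", "Desktop Y", "SN-0002", "Windows", "R. Roe",
                 "Main office", "HDD", 2000, False, "SATA"),
]
```

Totaling capacity across the inventory gives an early, if rough, upper bound on the volume of data that may need to be preserved.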
When preserving the contents of a desktop or laptop computer, it’s typically unnecessary to
sequester any component of the machine other than its hard drive(s) since the ROM BIOS holds
little information beyond the rare forensic artifact. Before returning a chassis to service with a new
hard drive, be sure to document the custodian, manufacturer, model and serial number/service tag
of the redeployed chassis, retaining this information with the sequestered hard drive.
The ability to fully explore the contents of servers for potentially responsive information hinges upon
the privileges extended to the user. Be sure that the person tasked to identify data for preservation
or collection holds administrator-level privileges.
Above all, remember that computers, hard drives and servers are constantly changing while in
service. Simply rebooting a machine alters system metadata values for large numbers of files.
Accordingly, you should consider the need for evidentiary integrity before exploring the contents of
a device, at least until appropriate steps are taken to guard against unwitting alteration. Note also
that connecting an evidence drive to a new machine effects changes to the evidence unless suitable
write blocking tools or techniques are employed.
Eight Tips to Quash the Cost of E-Discovery
This really happened:
Opposing counsel supplied an affidavit stating it would take thirteen years to review 33 months of e-mail traffic for thirteen people. Counsel averred there would be about 950,000 messages and
attachments after keyword filtering. Working all day, every day reviewing 40 documents per hour,
they expected first level review to wrap up in 23,750 hours. A more deliberate second level review
of 10-15% of the items would require an additional two years. Finally, counsel projected another
year to prepare a privilege log. Cost: millions of dollars.
The arithmetic was unassailable, and a partner in a prestigious law firm swore to its truth under oath.
This could have happened:
On Monday afternoon, an associate attached a hard drive holding 33 months of e-mail for thirteen
custodians to the USB port of her computer and headed home. Overnight, e-discovery review
software churned through the messages and attachments indexing their contents for search and deduplicating redundant data. The next morning, the associate identified responsive documents using
keywords and concept clustering. She learned the lingo, mastered the acronyms and identified
common misspellings. She found large swaths of irrelevant data that could be safely eliminated from
the collection and began segregating responsive and non-responsive items. By lunchtime on
Wednesday, the software started asking whether particular items were responsive. Before she
called it a day, the associate ceded much of the heavy lifting to the program’s technology-assisted
review capabilities and shifted her attention to searching for lawyers’ names and e-mail domains to
flag privileged communications. She spent Thursday afternoon sampling items the computer
identified as non-responsive to be assured of the quality of review. Before she called it a day, the
associate tasked the software to generate a production set and a privilege log for partner review on
Friday and wondered if it might be a good weekend to head to the beach. Cost: 40 associate
hours.
These two scenarios contrast the gross disparity in review costs and time between lawyers who
approach e-discovery in ignorance and those who do so with skill. The Luddite lawyer who knows
nothing of modern methods misleads the court and cheats the client. The adept associate proves
that e-discovery is fast and affordable when the right tools and talents are brought to bear.
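The figures in the two scenarios can be checked with a few lines of Python. The numbers come from the text above; the variable names are illustrative, and the sketch deliberately makes no assumption about how many hours a reviewer works per year.

```python
documents = 950_000          # messages and attachments after keyword filtering
rate_per_hour = 40           # documents reviewed per hour

# First-level review time claimed in the affidavit.
first_level_hours = documents / rate_per_hour
print(first_level_hours)     # 23750.0

# Second-level review of 10-15% of the items (integer arithmetic avoids rounding).
second_level_items = (documents * 10 // 100, documents * 15 // 100)

# Ratio of the affidavit's first-level hours to the associate's 40 hours.
associate_hours = 40
ratio = first_level_hours / associate_hours
```

On these figures, first-level review alone consumes nearly 600 times the hours the associate spent on the entire project.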
Electronically stored information (ESI) serves us in all our day-to-day endeavors. ESI can and should
serve us just as well in our search for probative evidence and in the resolution of disputes.
You Must Make It Happen
Finding efficiencies and avoiding dumb decisions in electronic discovery isn’t someone else’s
responsibility. It’s yours. If someone else must perennially whisper in your ear, articulating the
issues and answering the questions you should be competent to address, you aren’t serving your
client.
ESI isn’t going away, nor will it wane in quantity, variety or importance as evidence. Each day you
fail to hone your e-discovery skills is a day closer to losing a case or losing a client. Each day you
learn something new about ESI and better appreciate how to request, find, cull, review and produce
it at lowest cost is a day that cements your worth to your clients and makes you a more effective
counselor and advocate.
Eight Tips to Quash the Cost of E-Discovery
The following tips are offered to help you slash the outsize cost of e-discovery:
1. Eliminate Waste
2. Reduce Redundancy and Fragmentation
3. Don’t Convert ESI
4. Review Rationally
5. Test your Methods and Know your ESI
6. Use Good Tools
7. Communicate and Cooperate
8. Price is What the Seller Accepts
1. Eliminate Waste
The author once polled thought leaders in electronic discovery about costs. They uniformly agreed
that about half of every e-discovery dollar is expended unnecessarily as a consequence of counsel
lacking competence with respect to ESI. Half was kind.
Every time you over-preserve or over-collect ESI, every time you convert native data to alternate
forms or fail to deduplicate ESI before review and every time you otherwise review information that
didn’t warrant “eyes on,” you add cost without benefitting your client. It’s money wasted. Poor e-discovery choices tend to be driven by irrational fears, and irrational fears flow from lack of familiarity
with systems, tools and techniques that achieve better outcomes at lower cost. The consequences
of poor e-discovery decisions prompt motions to compel or for sanctions, further ratcheting up the
cost of incompetence.
2. Reduce Redundancy and Fragmentation
Many complain that electronic discovery has made litigation more costly because there is so much
more information available today. Certainly, there are more channels of information available today,
allowing an enlightened advocate more probative evidence. Much of what evaporated as a phone
conversation now endures as a writing. There is more temporal, photographic and geolocation data
to draw on, and more “persons with knowledge of relevant facts” who are privy to revealing
information.
Despite there being more, the increase doesn’t reflect the dire exponential leap in data volume some
suggest. Much of the growth is attributable to replication and fragmentation. Put simply, human
beings don’t create that much more unique information; they mostly make more copies of the same
information and break it into smaller pieces. Yesterday’s memo sent to three people is today’s 30-message thread sent to the whole department and retrieved on multiple devices. These iterations
add a lot to the quantity of ESI, but little in the way of truly unique evidence. Thus, the burden and
cost of e-discovery is inversely proportional to a litigant’s ability to reduce redundancy and
fragmentation. There are many ways to minimize redundancy and fragmentation. Some entail
sensible choices during identification and collection; others involve the application of tools and
techniques geared to eliminating replication and organizing fragmented information for efficient
review.
Anyone who has done a document review can attest to the tedium of seeing the same documents
over and over again. Messages repeat within threads or across recipients, and attachments to
messages mirror documents from file servers. Some of this can be readily eliminated by simple
hash-based de-duplication that costs very little and reliably eliminates documents that are duplicates
in all respects. Hash-based deduplication calculates a “digital fingerprint” value (variously called an
MD5 or SHA1 value) for each document, allowing redundant documents to be excluded from review.
Nothing offers a more cost-effective means to reduce the cost of document review than
deduplication; consequently, no one should undertake a document review without minimally running
a simple hash-based deduplication to eliminate replication.
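A minimal Python sketch of hash-based deduplication, using the standard library's `hashlib` module, might look like this. The function names `fingerprint` and `deduplicate` and the sample documents are assumptions made for illustration; commercial tools add handling for metadata, document families and reporting.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the MD5 digest of a document's bytes as a hex string."""
    return hashlib.md5(content).hexdigest()


def deduplicate(documents: dict[str, bytes]) -> dict[str, str]:
    """Map each unique fingerprint to the first document name that produced it."""
    unique: dict[str, str] = {}
    for name, content in documents.items():
        unique.setdefault(fingerprint(content), name)
    return unique


# Hypothetical collection: two identical copies and one slightly edited version.
docs = {
    "memo.docx": b"Quarterly results attached.",
    "copy of memo.docx": b"Quarterly results attached.",
    "memo v2.docx": b"Quarterly results attached!",
}
unique = deduplicate(docs)
```

Here the exact copy is suppressed, while the version differing by a single character yields a different fingerprint and survives as a separate document.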
Unfortunately, simple hash-based deduplication doesn’t work for e-mail messages (which
necessarily reflect different routing information for different recipients) or for documents with minor
variations that don’t signify material differences in content. For these items, more advanced near-deduplication techniques are needed to eliminate redundancy without increasing the risk that unique
documents will be overlooked.
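Near-deduplication can take many forms, and the text does not specify one. As a rough illustration only, the Python sketch below compares documents by the overlap of their three-word sequences ("shingles"), a common textbook approach. The function names, the shingle size and the sample texts are all assumptions, and any threshold for treating two items as near-duplicates would need to be validated against real data.

```python
def shingles(text: str, size: int = 3) -> set[tuple[str, ...]]:
    """Break text into overlapping runs of `size` lowercase words."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(a: str, b: str) -> float:
    """Jaccard similarity: shared shingles divided by all distinct shingles."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb)


original = "Please review the attached draft agreement before Friday"
variant = "Please review the attached draft agreement before Monday"
unrelated = "Lunch is in the break room"

score_variant = similarity(original, variant)      # high: one word differs
score_unrelated = similarity(original, unrelated)  # no shared three-word runs
```

Unlike a hash comparison, which is all-or-nothing, a similarity score lets a reviewer group minor variants together while keeping genuinely different documents apart.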
Deduplication is a mechanical process requiring little, if any, human intervention or costly
programming. Accordingly, its cost should always be a nominal component of an e-discovery effort.
If a service provider attempts to charge princely sums for deduplication, consider it a sign that it’s
time to find a new vendor. When the volume of information to be deduplicated is modest (e.g., less
than 10-15 GB), low cost tools are available to deduplicate without the need to engage a service
provider.[1]
[1] One of the finest tools for deduplicating collections less than 15GB is called Prooffinder (www.prooffinder.com). It costs $100.00 for an annual license, and all proceeds from its sale go to support child literacy.
3. Don’t Convert ESI
It’s criminal how much money is wasted converting electronic information into paper-like forms just
so lawyers don’t have to update workflows or adopt contemporary review tools. Our clients work
with native forms of ESI because native forms are the most utile, complete and efficient forms in
which to store and access data. Our clients don’t print their e-mail before reading it. Our clients
don’t see the need to emboss the document’s name on every page. Our clients communicate and
collaborate using tracked changes and embedded comments, yet many lawyers intentionally or
unwittingly purge these changes and comments in e-discovery and fail to disclose such redaction.
They do it by converting native forms to images, like TIFF.
Converting a client’s ESI from its natural state as kept “in its ordinary course of business” to TIFF
images injects needless expense in at least half a dozen ways. First, you must pay someone to
convert native forms to TIFF images and emboss Bates numbers. Second, you must pay someone
to generate load files containing extracted text and application metadata from the native ESI. Third,
you must produce multiple copies of certain documents (like spreadsheets) that are virtually
incapable of being produced as TIFF images. Fourth, because TIFF images paired with load files
are much “fatter” files than their native counterparts, you pay much more for vendors to ingest and
host them by the gigabyte. Fifth, it’s very difficult to reliably deduplicate documents once they have
been converted to TIFF images. Sixth, you may have to reproduce everything when your opponent
wises up to the fact that you’ve substituted cumbersome TIFF images and load files for the genuine,
efficient evidence.
4. Review Rationally
Recently, an opponent advised the Court that their projected cost of review encompassed the
obligation to look at every e-mail attachment when the body of the e-mail message contained a
keyword hit, even when none of the attachments contained a hit. They made this representation
knowing that the majority of the hits would prove to be noise hits, that is, keywords in a context that
doesn’t denote responsiveness. Why would a party incur the expense to review the attachments to
a message they’d determined was non-responsive when the attachments contained no keywords?
It turned out they had separated attachments from e-mail transmittals, surrendering the ability to
know which attachments could be eliminated from review because the transmitting message was
eliminated from review. That’s not a rational approach to review.
A common irrational approach to review is to treat information in any form from any source as
requiring privilege review when even a dollop of thought would make clear that not all forms or
sources of ESI are created equal when it comes to their potential to hold privileged content. Review
accounts for anywhere from 60-90% of the cost of e-discovery; so, anything that defensibly narrows
the scope of review prompts maximum savings. Almost anytime you can use technology to isolate
privileged content and prudently employ a clawback agreement or Federal Rule of Evidence 502 to
guard against inadvertent disclosure, you can slash the cost of privilege review.
5. Test your Methods and Know your ESI
Staggering sums are spent in e-discovery to collect and review data that would never have been
collected if only someone had run a small scale test before deploying an enterprise search. It’s easy
and inexpensive to test proposed searches against representative samples of data (e.g., one key
custodian’s mailbox) so as to identify outcomes that will unduly drive up the cost of ingestion, hosting
and review. This entails more than simply eliminating queries with large numbers of hits; it requires
modifying them to balance the incidence of noise hits against hits on responsive data.
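A small-scale test of this kind can be as simple as the following Python sketch. It assumes a sample of messages (for instance, from one key custodian's mailbox) that a reviewer has already marked responsive or not; the function name `evaluate_term` and the sample messages are hypothetical.

```python
def evaluate_term(term: str, sample: list[tuple[str, bool]]) -> dict[str, float]:
    """Count a term's hits in a reviewed sample and the share that were responsive."""
    hits = [responsive for text, responsive in sample if term.lower() in text.lower()]
    responsive_hits = sum(hits)
    precision = responsive_hits / len(hits) if hits else 0.0
    return {"hits": len(hits), "responsive_hits": responsive_hits, "precision": precision}


# Hypothetical reviewed sample: (message text, was it responsive?)
sample = [
    ("Attached is the merger term sheet", True),
    ("Merger of the two fantasy football leagues", False),
    ("Lunch on Friday?", False),
    ("Revised term sheet for the acquisition", True),
]

broad = evaluate_term("merger", sample)       # half of its hits are noise
narrow = evaluate_term("term sheet", sample)  # every hit is responsive
```

Comparing the two results shows how a modest change to a query can cut noise hits without sacrificing responsive ones, before any enterprise-wide search is run.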
A lot of money gets wasted in e-discovery over disputes that could be quickly resolved if someone
simply knew more about the ESI, i.e., if someone simply looked. The software and file types
used, the nature and configuration of the e-mail system, the retention scheme for backup
media and whether a key custodian used a home system for business are all examples of
information that can facilitate decisions that narrow the scope of collection and review,
with consequent cost savings.
6. Use Good Tools
If you needed to dig a big hole, you wouldn’t use a teaspoon, nor would you hire a hundred people
with teaspoons. You’d use the right power tool and a skilled operator.
You can’t efficiently collect or review ESI without using good tools. Anyone engaging in e-discovery
should be able to answer the question, “What’s your review platform?” They should be able to
articulate why they use one review platform over another, and “because we already owned a copy”
is not the best reason.
A review platform is the software tool used to index, sort, search, view, organize and tag ESI.
Choosing the right review platform for your practice requires understanding your workflow,
personnel, search needs and forms in which ESI will be ingested and produced. Review platforms
can be cost-prohibitive for some practitioners, but it’s untenable to manage ESI in discovery without
a capable review platform.
There are many review platforms on the market, including familiar names like Relativity,
Concordance and Summation. There are also Internet-accessible “hosted” review environments
and many proprietary review tools touting more bells and whistles than a Mardi Gras parade. Among
the most important considerations in selecting a review platform is its ability to accept data in forms
that do not require costly conversion to TIFF images. Additionally, you may want the platform you
select to support the most advanced forms of technology-assisted search and review that your
budget allows, including predictive coding capabilities.
7. Communicate and Cooperate
Poor communication and lack of cooperation between parties on e-discovery issues contribute
markedly to increased cost. The incentives driving transparency and cooperation in e-discovery are
often misunderstood. You don’t communicate or cooperate with an opponent to help them win their
case on the merits; you do it to permit the case to be resolved on its merits and not be derailed or
made more expensive by e-discovery disputes.
Much of the waste in e-discovery grows out of apprehension and uncertainty. Litigants often over-collect and over-review, preferring to spend more than necessary instead of giving the transparency
needed to secure a crucial concession on scope or methodology.
Communication and cooperation in e-discovery are not signs of weakness but of strength.
Cooperation is a means to demonstrate that your client understands its e-discovery obligations and
is meeting them. More, it’s a means to build trust in the scope and methods of discovery so as to
forestall challenges that may prove disruptive to the case and the client’s operations. It’s even
possible that your opponent understands e-discovery or your client’s systems better than you do and
can propose more efficient ways to scope and complete the effort. What an opponent will accept in
a cooperative give-and-take is often less onerous than what you were planning to produce.
Put simply: the more you seek to hide the ball, the more likely a savvy opponent will dig deeper and
find something your side missed. Because there are no perfect e-discovery efforts, there are none
that can withstand the heightened scrutiny invited by shortsighted stonewalling.
Hubris doesn’t help. Most flaws in e-discovery processes can be rectified quickly and cheaply when
they surface early. An overlooked variant on a keyword or a missed file type is easy to fix at the
outset, but can prove costly or irreparable when discovered months or years later. Moreover,
disclosure tends to shift the burden to act. Courts tend not to entertain belated objections from
parties who’d been supplied sufficient information to act promptly.
8. Price is What the Seller Accepts
I’ve haggled in bazaars and markets from Cairo to Kowloon; but, I’ve never seen more pliant pricing
than among those hawking e-discovery tools and services in the United States.
A famous/infamous e-discovery vendor once quoted $43.5 million for a six-week engagement
processing a very large volume of data on an expedited basis. The prospect was desperate, but not
insane. Rebuffed, the vendor re-quoted the job the next day for several million dollars less. They
“sharpened their pencil” again the next day…and the next. Before the week was out, the vendor
was proposing to do the job for $3.5 million. They didn’t get the work.
Service providers have to pay staff and keep the lights on. So, almost any work beats no work at all.
Many will accept work that isn’t profitable, if it keeps a competitor from getting the business. Shop
around. Make an offer. Only a sucker pays rack rate.
Make yourself sheep and the wolves will eat you. Benjamin Franklin
Copyright 2005-11
E-Discovery for Everybody: The EDna Challenge
Craig Ball
© 2009
E-discovery is just for big budget cases involving big companies, handled by big firms.
Right, and suffrage is just for white, male landowners.
Some Neanderthal notions take longer than others to get shown the door, and it's time to dispel the
mistaken belief that e-discovery is just for the country club set.
Today, evidence means electronic evidence; so, like the courts themselves, access to evidence can't
be just for the privileged. Everyone gets to play.
If you think big firms succeed at e-discovery because they know more than you do, think again.
Marketing hype aside, big firm litigators don't know much more about e-discovery than solo
practitioners. Corporate clients hire pricey vendors with loads of computing power to index, search,
de-duplicate, convert and manage terabytes of data. Big law firms deploy sophisticated in-house or
hosted review platforms that let armies of associates and contract lawyers plow through vast plains
of data--viewing, tagging, searching, sorting and redacting with a few keystrokes. The big boys
simply have better toys.
A hurdle for everyone else is the unavailability and high cost of specialized software to process and
review electronic evidence.
A Mercedes and a Mazda both get you where you need to go, but the e-discovery industry has no
Mazdas on the lot. This article explores affordable, off-the-shelf ways to get where you need to go
in e-discovery.
One Size Doesn't Fit All
First, let's set sensible expectations: Vast, varied productions of ESI cannot be efficiently or
affordably managed and reviewed with software from Best Buy. If you're grappling with millions of
files and messages, you'll need to turn to some pretty pricy power tools.
The key consideration is workflow. Tools designed for ESI review can save considerable time over
cobbled-together methods employing off-the-shelf applications; and, when every action is
extrapolated across millions of messages and documents, seconds saved add up to big productivity
gains.
But few cases involve millions of files. Most entail review of material collected from a handful of
custodians in familiar productivity formats like Outlook e-mail, Word documents, Excel spreadsheets
and PowerPoint presentations. Yes, volume is a challenge in these cases, too; but, a mix of low-cost
tools and careful attention to process makes it possible to do defensible e-discovery on the
cheap.
Paper Jam
More from comfort than sense, ESI in smaller cases tends to be printed out. Paper filled the void for
a time, but lately the cracks are starting to show. Lawyers are coming to appreciate that printing
evidence isn't just more expensive and slower, it puts clients at an informational disadvantage.
When you print an electronic document, you lose three things: Money, time and metadata. Money
and time are obvious, but the impact of lost metadata is often missed. When you move ESI to paper
or paper-like formats like TIFF images, you cede most of your ability to search and authenticate
information, along with the ability to quickly and reliably exclude irrelevant data. Losing metadata
isn't about missing the chance to mine embedded information for smoking guns. That's secondary.
Losing metadata is like losing all the colors, folders, staples, dates and page numbers that help
paper records make sense.
The EDna Challenge
I polled a group of leading e-discovery lawyers and forensic technologists to see what tools and
techniques they thought suited to the following hypothetical:
Your old school chum, Edna, runs a small firm and wants your advice. A client is about to
send her two DVDs containing ESI collected in a construction dispute. It will be Outlook PST
files for six people and a mixed bag of Word documents, Excel spreadsheets, PowerPoint
presentations, Adobe PDFs and scanned paper records sans OCR. There could be a little
video, some photographs and a smattering of voicemail in WAV formats. "Nothing too hinky,"
she promises. Edna's confident it will comprise less than 50,000 documents and e-mails, but
it could grow to 100,000 items before the case concludes in a year or two.
Edna's determined to conduct an in-house, paperless privilege and responsiveness review,
sharing the task with a tech-savvy associate and legal assistant. All have late-model, big
screen Windows desktop PCs with MS Office Professional 2007 and Adobe Acrobat 9.0
installed. The network file server has ample available storage space. Edna doesn't own
Summation or Concordance, but she's willing to spend up to $1,000.00 for new software and
hardware, but not a penny more. She's open to an online Software as a Service (SaaS)
option, but the review has to be completed using just the hardware and software she currently
owns, supplemented only by the $1,000.00 in new purchases. Her team will supply as much
brute force as necessary. She's too proud to accept a loan of systems or software, and you
can't change her mind or budget.
How should Edna proceed?
Goals of the Challenge
Ideally, the review method employed should:
1. Preserve relevant metadata;
2. Incorporate de-duplication, as feasible;
3. Support robust search of Outlook mail and productivity formats;
4. Allow for efficient workflow;
5. Enable rudimentary redaction;
6. Run well on most late-model personal computers; and
7. Require no more than $1,000.00 in new software or hardware, though it's fine to use fully functional "free trial" software so long as you can access the data for the 2-3 year life of the case.
I had some ideas (shared later in this article), but expected my colleagues might point me to better
mousetraps. Instead, I was struck by the familiarity and consistency of their excellent suggestions
as compared to options that have been around for years. Sadly, there's not that much new for those
on shoestring budgets; that is, developers remain steadfastly uninterested in 85% of the potential
market for desktop discovery tools.
One possible bright spot was the emergence of hosted options. No one was sure the job could be
begun--let alone completed--using SaaS on so tight a budget; but, there was enough mention of
SaaS to make it seem like a possibility, now or someday soon.
Advice to Edna
While the range of proposals was thin, the thought behind them was first-rate. All responding
recognized the peril of using the various Microsoft applications to review the ESI. Outlook's search
capabilities are limited, especially with respect to attachments. If Edna expected to reliably search
inside of every message, attachment and container file, she would need more than Outlook alone.
Notable by their absence were any suggestions to use Google's free desktop indexing and search
tool. Though a painful interface for e-discovery, Google Desktop installed on a dedicated, "clean"
machine would be capable of reading and searching Outlook e-mail, Word documents, Excel
spreadsheets, PowerPoint presentations, PDF files, Zip archives and even text within music, video
and image files. It wouldn't be pretty--and Edna would have to scrupulously guard against cross-contamination of the evidence with other data--but Google Desktop might get much of the job done
without spending a penny.
Quin Gregor of Strategic Data Retention LLC in Georgia was first to respond with an endorsement
of my two favorite affordable workhorses, the ubiquitous dtSearch indexing and search tool ($199.00
at www.dtsearch.com) and Aid4Mail ($69.95 at www.fookes.com), a robust utility for opening,
filtering and converting common e-mail container files and message formats. Quin described a
bankruptcy case where a microscopic budget necessitated finding a low-end option. He reports that
dtSearch and Aid4Mail saved the day.
Ron Chichester, an attorney and forensic examiner in Texas pointed to the many open source Linux
tools available without cost. These command line interface tools are capable of indexing, Bayesian
analysis and much of the heavy lifting of the tools used by e-discovery vendors; but, Ron
acknowledged that Edna and her staff would need a lot of Linux expertise to integrate the open
source offerings. Bottom line: The price is right, but the complexity unacceptable.
Florida e-discovery author and blogger, Ralph Losey, a partner at Akerman Senterfitt, suggested
using an online review tool like Catalyst and tried to dance around the budget barrier by pointing out
that the cost could be passed on to the client. Ralph argued that hosting would save enough lawyer
time to pay for itself. No doubt he's right; but, passing on the costs isn't permitted in the Edna
Challenge and, even in a real world situation, unless the savings were considerable, Edna's likely to
keep the work--and the revenue--in house.
Another Floridian, veteran forensic examiner, Dave Kleiman, suggested that Edna blow her budget
on alcohol and amphetamines because she has a lot of toil ahead of her. Party on, Dave!
Our northern neighbor, Dominic Jaar of Ledjit Consulting Inc. in Quebec, took a similar doleful tack.
Dominic thought that SaaS might be a possibility but added that Edna should use her grand to take
an e-discovery course because she needs to learn enough to "stay far from the case." Else, he
offered, she could go forward and apply the funds to coffee and increased malpractice coverage.
Ouch!
John Simek of Sensei Enterprises in Virginia prudently suggested that Edna use part of her budget
to buy an hour of a consultant's time to help her get started. John predicted that a SaaS approach
would be priced out-of-reach, but was another who thought salvation lay with dtSearch. John
recognized that Adobe Acrobat could handle both the redaction and light-duty OCR required. As for
the images, video and sounds, Edna's in the same boat, rich or poor. She's just going to have to
view or listen to them, one-by-one.
Jerry Hatchett with Evidence Technology in Houston suggested LitScope, a SaaS offering from
LitSoft. Jerry projected a cost of around $40/GB/month, which would burn through Edna's budget in
about 3 months...if she didn't buy any Starbucks. Following up, I discovered that LitScope can't
ingest the native file formats Edna needed to review unless accompanied by load files containing
the text and metadata of the documents and messages. The cost to pre-process the data to load it
would eat up Edna's budget before she looked at a single page. That, and a standard $200 minimum
on monthly billings coupled with a 6 month minimum commitment, made this SaaS option a nonstarter. Attractive pricing, to be sure, but not low enough for Edna's shallow pockets.
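The arithmetic behind that conclusion is worth a quick check. The sketch below uses the figures quoted above ($40/GB/month, a $200 monthly minimum, a 6 month minimum commitment and a $1,000 budget). The data volume is an assumption, since the hypothetical says only "two DVDs," taken here as two full single-layer discs of 4.7 GB each.

```python
BUDGET = 1_000.00          # Edna's total budget, from the hypothetical
RATE_PER_GB_MONTH = 40.00  # projected hosting rate quoted above
MONTHLY_MINIMUM = 200.00   # standard minimum on monthly billings
MINIMUM_MONTHS = 6         # minimum commitment

# ASSUMPTION: two full single-layer DVDs at 4.7 GB each; the article gives no volume.
ASSUMED_GB = 2 * 4.7


def monthly_cost(gb: float) -> float:
    """Monthly hosting bill: the per-GB charge, but never less than the minimum."""
    return max(gb * RATE_PER_GB_MONTH, MONTHLY_MINIMUM)


def months_affordable(gb: float, budget: float = BUDGET) -> float:
    """How many months of hosting the budget covers at a given volume."""
    return budget / monthly_cost(gb)


minimum_commitment = MONTHLY_MINIMUM * MINIMUM_MONTHS

print(f"Monthly bill at {ASSUMED_GB:.1f} GB: ${monthly_cost(ASSUMED_GB):,.2f}")
print(f"Months covered by budget: {months_affordable(ASSUMED_GB):.1f}")
print(f"Minimum commitment: ${minimum_commitment:,.2f}")
print(f"Commitment exceeds budget: {minimum_commitment > BUDGET}")
```

Under that assumed volume the budget lasts a bit under three months, in line with Jerry's estimate. The six-month minimum at $200 per month comes to $1,200, which exceeds the whole budget before any pre-processing cost is counted.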
The meager budget forced George Rudoy, Director of Global Practice Technology & Information
Services at Shearman & Sterling, LLP in New York, to suggest using Outlook 2007 as the e-mail
review tool, adding the caveat that metadata may change. Unlike earlier versions, Outlook 2007
claims to extend its text search capabilities to attachments. Unfortunately, it doesn't work very well
in practice, meaning Edna and her staff will need to examine each attachment instead of ruling any
out by search. George also urged Edna to buy licenses for Quick View Plus--a universal file viewer
utility--and hire an Access guru to design a simple database to track the files and hyperlink to each
one for review.
From Down Under, Michelle Mahoney of Mallesons Stephen Jaques in Melbourne shared several
promising approaches. She suggested Karen's Power Tools (a $30 suite of applications at
www.karenware.com) as a means to inventory and hash the files and Microsoft Access as a means
to de-duplicate by hash values. Michelle also favored hyperlinking from Access for review, working
through the collection progressively, ordering them by file type and then filename. She envisions
adding fields to the database for Relevant and Privileged designations and a checkbox for
exceptional files that can't be opened and require further work.
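A tracking database along the lines Michelle describes can be quite small. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Microsoft Access. The table layout, column names, function names and sample rows are hypothetical, and the hash strings are placeholders, not real digests.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # pass a file path instead to keep the database

# relevant and privileged stay NULL until a reviewer makes a call (then 0 or 1);
# needs_work flags exceptional files that could not be opened.
conn.execute(
    """
    CREATE TABLE files (
        id INTEGER PRIMARY KEY,
        path TEXT NOT NULL,
        file_type TEXT NOT NULL,
        file_name TEXT NOT NULL,
        file_hash TEXT NOT NULL,
        relevant INTEGER,
        privileged INTEGER,
        needs_work INTEGER DEFAULT 0
    )
    """
)

# Hypothetical inventory rows; the hashes are placeholders.
rows = [
    ("smith/bid.xls", "xls", "bid.xls", "aaa111"),
    ("jones/copy of bid.xls", "xls", "copy of bid.xls", "aaa111"),
    ("smith/contract.doc", "doc", "contract.doc", "bbb222"),
    ("jones/site.pdf", "pdf", "site.pdf", "ccc333"),
]
conn.executemany(
    "INSERT INTO files (path, file_type, file_name, file_hash) VALUES (?, ?, ?, ?)",
    rows,
)


def review_queue(db: sqlite3.Connection) -> list[str]:
    """One path per distinct hash, ordered by file type and then filename."""
    cur = db.execute(
        """
        SELECT path FROM files
        WHERE id IN (SELECT MIN(id) FROM files GROUP BY file_hash)
        ORDER BY file_type, file_name
        """
    )
    return [r[0] for r in cur]


def tag_by_hash(db: sqlite3.Connection, file_hash: str,
                relevant: bool, privileged: bool) -> int:
    """Apply a reviewer's call to every copy sharing the hash; return rows updated."""
    cur = db.execute(
        "UPDATE files SET relevant = ?, privileged = ? WHERE file_hash = ?",
        (int(relevant), int(privileged), file_hash),
    )
    return cur.rowcount


print(review_queue(conn))
print(tag_by_hash(conn, "aaa111", relevant=True, privileged=False))
```

The design point is that duplicates are suppressed from the review queue but never deleted from the table, so a designation made once is recorded against every copy and each source file remains accounted for.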
For the e-mail files, Michelle also turns to Outlook as a review tool, proposing that folders be created
for dragging-and-dropping items into Relevant Non Privileged; Relevant Privileged and Non
Relevant groups. She echoed warnings about metadata modification and gives her thumbs up to
Aid4Mail.
Finally, Michelle offers more kudos for dtSearch as the low cost tool-of-choice for keyword searching.
dtSearch will allow Edna to run keywords across files, including emails and attachments, and it is a
simple file copy option to copy them, with or without original path, into a folder. Messages emerge
in the generic MSG mail format, and Edna can either produce them in that format (with embedded
attachments) or use Aid4Mail to copy them into an Outlook PST file format. For further discussion of
using dtSearch as a low-cost e-discovery tool, see, Craig Ball, Do-It-Yourself Digital Discovery, (Law
Technology News, May 2006).
Tom O'Connor, Director of the Legal Electronic Document Institute in New Orleans, observed that
he often gets requests like Edna's from his clients in Louisiana and Mississippi and weighed in with
a mention of Adobe Acrobat, noting that it might be feasible to print everything to Acrobat and use
Acrobat's annotation and redaction features. As mentioned, Acrobat also offers rudimentary OCR
capabilities to help deal with the scanned paper documents in the collection and even has the ability
to convert modest volumes of e-mail to PDFs directly from Outlook. For further discussion of using
Adobe Acrobat to process Outlook e-mail, see, Craig Ball, Adobe Brings an Acrobat to Perform EDD
(Law Technology News, June 2008). Tom concludes that, although working with the tools you
already own and know can be cumbersome, it's sometimes a better approach than trying to master
new tools under pressure.
Ohio-based e-discovery consultant, Brett Burney, had some very concrete ideas for Edna. He
thought she could try to find some SaaS solution to host the data, suggesting Lexbe, NextPoint or
Trial Solutions as candidates. Brett was most familiar with Lexbe and knew of small law firms that
had successfully and inexpensively used their services.
Brett guessed Edna's budget might allow her to upload everything to Lexbe, review it quickly and
then take everything down before the hosting costs ate up her budget. He reported that Lexbe will
accept about any file format, by uploading it yourself or sending it to Lexbe to load. Brett put the
cost at $99 per month for 2 users and 1GB of storage. Noting that Edna needs to host more than
1GB of data, he predicted her outlay should be close to $200/month. Brett added, "Edna and her
crew can upload everything with the tools they have, get it reviewed pronto (i.e. less than a month),
and then take everything down--paying only for what they use."
For the Outlook e-mail, Brett thought Edna should turn to Adobe Acrobat and convert the PST
container files to PDF Portfolios along the lines of my June 2008 column. Alternatively, Brett
suggested Edna use the free Trident Lite tool from Wave Software (www.discoverthewave.com) to
get a "snapshot" of the PSTs and then convert relevant messages to PDF or upload them to a hosting
provider.
Lisa Habbeshaw of FTI in California pointed to Intella by Vound Software (http://www.voundsoftware.com) as an all-in-one answer to Edna's needs. Intella offers an efficient indexing engine,
user-friendly interface and innovative visual analysis capability sure to make quick work of Edna's
review effort. Lisa was unsure if the program could be had for under $1,000, but noted that Vound
Software offers a free, fully-functional demo that might fill the bill for Edna's immediate needs. Like
Lisa, I'm unsure whether Intella will bust Edna's budget, but it's certainly a splendid new entry to the
do-it-yourself market.
Other Great Tools
If the dollar holds its own against the Euro, Edna could accomplish just about everything she needs
to do using a terrific tool created in Germany called X-Ways Forensics from X-Ways Software
Technology AG. X-Ways Forensics could make quick work of the listing, hashing, opening, viewing,
indexing, searching, categorizing and reporting on all that client data; however, it's a complex,
powerful forensics tool that would require more time and training to master than Edna can spare.
Plus, it would eat up all of her $1,000 budget.
If her budget were bigger, Edna would be very happy attacking the review with the easy-to-use, fast
and versatile Nuix Desktop (www.nuix.com). Nuix would allow Edna to begin her review in minutes,
and it supports a host of search options. The embedded viewer, hash and classification features
foster an efficient workflow and division of review among multiple reviewers. Like Intella, Nuix is an
Australian import. Whatever they're doing way down there in Kangaroo land, they're certainly doing
something right!
A Few More Ideas for Edna
It's hard to add much to so many fine ideas. Collectively, dtSearch, Adobe Acrobat and Aid4Mail
deliver the essential capabilities to unbundle, index, search, OCR and redact the conventional file
formats and modest data volumes Edna faces. Her challenge will be cobbling together tools not
designed for e-discovery so as to achieve an acceptable workflow and defensible tracking
methodology. It won't be easy.
For example, while dtSearch is Best of Class in its price range, it doesn't afford Edna any reasonable
way to tag or annotate documents as she reviews them. Accordingly, Edna will be obliged to move
each document to a folder as she makes her assessments respecting privilege and responsiveness.
That effort will get very old, very fast.
On the plus side, dtSearch offers a fully functional thirty-day demo of its desktop version, so Edna
can buy a copy for her long-term use, but rely on 30-day evaluation copies for her staff during the
intense review effort--a $400 savings.
While Adobe Acrobat supports conversion of e-mail into PDFs, the process is painfully slow and
cumbersome. Moreover, the conversion capabilities break down above 10,000 messages. That
sounds like a lot, but it's likely less than Edna will see emerge in the collections of six custodians.
Further, Edna may encounter an opponent who is smart enough to demand the more versatile
electronic formats for e-mail (i.e., PST, MSG or EML). What's Edna going to do if she finds herself
locked into a review wedded to image formats?
Whatever tools she employs, Edna will need to be meticulous in her shepherding of the individual
messages and documents through the process.
To that end, I'd offer this advice:
1. Your first step should be to make a working copy of the data to be processed and secure the
source dataset against any usage or alteration. Processing of ESI poses risks of data loss or
alteration. If errors occur, you must be able to return to uncorrupted data from prior steps.
For each major processing threshold, set aside a copy of the data for safekeeping and
carefully document the time the data was set aside and what work had been done to that point
(e.g., the status of deduplication, filtering and redaction).
2. From the working copy, hash the files and generate an inventory of all files and their metadata.
The processes you employ must account for the disposition of every file in the source
collection or extracted from those files (i.e., message attachments and contents of
compressed archives). Your accounting must extend from inception of processing to
production. By hashing the constituents of the collection as it grows, you gain a means to
uniquely identify files as well as a way to identify identical files across custodians and sources.
A useful tool for hashing files is Karen's Hasher available at http://www.karenware.com. But
the best "free" tool for the task is AccessData's FTK Imager, available from
www.accessdata.com/downloads. FTK Imager not only hashes files, it also exports Excel-compatible comma delimited listings of filenames, file paths, file sizes and modified, accessed
and created dates. Moreover, it supports loading the collected files into a container called a
Custom Content Image that protects the data from metadata corruption.
3. Devise a logical division scheme for the components of the collection; e.g., by machine,
custodian, business unit or otherwise. Be careful not to aggregate files in a manner that files
from one source may overwrite identically named files from other sources.
4. Expand files that hold messages and other files. Here, you should identify e-mail container
files (like Outlook .PST files) and archives (e.g., .Zip files) that must be opened or
decompressed to make their constituents amenable to search. For e-mail, this can be done
using an inexpensive utility like Aid4Mail from Fookes Software or Trident Lite from Wave
Software. Additionally, e-mail client applications, including Outlook, usually permit export of
individual messages and attachments. Though dtSearch includes a command line utility to
convert Outlook PST container files to individual messages (.MSG) files for indexing, it doesn't
work well or easily compared to Aid4Mail. Finally, most indexing tools are capable of directly
accessing text within compressed formats. For example, dtSearch can extract text from Zip
files and other archives.
5. A feature common to premium e-discovery tools but hard to match with off-the-shelf software
is deduplication. You can use hash values to identify identical files, but the challenge is to
keep track of all de-duplicated content and reliably apply tagging for privilege and
responsiveness to all deduplicated iterations. Most off-the-shelf utilities simply eliminate
duplicates and so aren't suited to e-discovery.
This is where it's a good investment to secure help from an expert in Microsoft Excel or Access
because those applications can be programmed to support deduplication tracking and
tagging.
When employing deduplication, keep in mind that files with matching hash values can have
different filenames and dates. The hash identicality of two files speaks to the contents of the
files, not the names assigned to the files by the operating system or to information, like
modified, accessed and created dates, stored outside the files.
6. Above all, don't process and review ESI in a vacuum. Be certain that you understand the
other side's expectations in terms of the scope of the effort, approach to search and--critically--the forms of production they seek. You may not agree on much, but you may be pleasantly
surprised to learn that some of the perils of a low budget e-discovery effort (e.g., altered
metadata, limited search capabilities, native production formats) don't concern the other side.
Further, you may reach accord on limiting the scope of review in terms of time intervals,
custodians and types of data under scrutiny. Why look at all the e-mail if the other side is
content with your searching just communications between Don and Betty during the third
week of January 2009?
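Steps 2 and 5 above lend themselves to a short illustration. The following Python sketch is not a replacement for FTK Imager or Karen's Hasher. It simply shows the idea of hashing every file in a working copy, writing a comma delimited inventory, and grouping identical files so that a review decision can be carried to every copy. The function names, the choice of SHA-256 and the CSV column names are assumptions made for the example.

```python
import csv
import hashlib
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def hash_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_inventory(root: Path) -> list[dict]:
    """Hash every file under root and record its path, size and modified time."""
    inventory = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        info = path.stat()
        modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
        inventory.append({
            "path": str(path.relative_to(root)),
            "size": info.st_size,
            "modified_utc": modified.isoformat(),
            "sha256": hash_file(path),
        })
    return inventory


def write_inventory(inventory: list[dict], csv_path: Path) -> None:
    """Write the inventory as a CSV file that Excel can open."""
    fields = ["path", "size", "modified_utc", "sha256"]
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        writer.writeheader()
        writer.writerows(inventory)


def duplicate_groups(inventory: list[dict]) -> dict[str, list[str]]:
    """Map each hash to all paths with that content, whatever their names or dates."""
    groups = defaultdict(list)
    for row in inventory:
        groups[row["sha256"]].append(row["path"])
    return dict(groups)


def propagate_tag(groups: dict[str, list[str]], reviewed_path: str, tag: str) -> dict[str, str]:
    """Apply one review tag to every file identical to the one reviewed."""
    for paths in groups.values():
        if reviewed_path in paths:
            return {p: tag for p in paths}
    raise KeyError(reviewed_path)
```

A call such as write_inventory(build_inventory(Path("working_copy")), Path("inventory.csv")) would produce the listing; both paths are placeholders. Run it against the working copy, not the source set, and write the CSV outside the folder being inventoried so that it does not end up in its own listing.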
Finally, Edna may seek an answer to two common questions from those taking the do-it-yourself
route in e-discovery:
What if I change metadata?
Certain system metadata values--e.g., last access times and creation dates--are prone to
alteration when processed using tools not designed for e-discovery. Such changes are rarely
a problem if you adhere to three rules:
1. Preserve an unaltered copy of whatever you're about to process;
2. Understand what metadata were altered; and,
3. Disclose the changes to the requesting party.
By keeping a copy of the data at each step, you can recover true metadata values if particular
values prove significant. Then, disclosing what metadata values were changed eliminates
any suggestion that you pulled a fast one. Many requesting parties have little regard for
system metadata values; but, they don't want to be surprised by relying on inaccurate
information.
Can I Use My Own E-Mail Account for Review?
You wouldn't commingle client funds with your own money, so why commingle e-mail that's
evidence in a case with your own mail? That said, when ESI is evidence and the budget
leaves no alternative, you may be forced to use your own e-mail tools for small-scale review
efforts. If so, remember that you can create alternate user accounts within Windows to avoid
commingling client data with your own. Better still, undertake the review using a machine
with a clean install of the operating system. Very tech-savvy counsel can employ virtual
environments (e.g., VMWare products) to the same end.
If using an e-mail client for review, it may be sufficient to categorize messages and
attachments by simply dragging them to folders representing review categories; for example:
1. Attorney-client privilege: entire item;
2. Work product privilege: entire item;
3. A-C Privilege: needs redaction;
4. W-P privilege: needs redaction;
5. Other privilege;
6. Responsive;
7. Non-responsive.
Once categorized, the contents of the various folders can be exported for further processing
or for production, if in a suitable format.
Throwing Down The Gauntlet
The vast majority of cases filed, developed and tried in the United States are not multimillion dollar
dust ups between big companies. The evidence in modest cases is digital, too. Solo and small firm
counsel like Edna need affordable, user-friendly tools designed for desktop e-discovery--tools that
preserve metadata, offer efficient workflow and ably handle the common file formats that account for
nearly all of the ESI seen in day-to-day litigation. Using the tools and techniques described by my
thoughtful colleagues, Edna will get the job done on time and under budget. The pieces are there,
though the integration falls short.
So, how about it e-discovery industry? Can you divert your gaze from the golden calf long enough
to see the future and recall the past? Sam Walton became the richest man of his era by selling to
more for less. There's a fast growing need...and a huge emerging market. The real Edna Challenge
is waiting for the visionaries who will meet the need and serve the market.
March 2013 Epilog:
Since I penned this article in 2009, several software vendors have risen to the EDna Challenge and
market capable tools at prices within EDna’s reach. I’m still not ready to declare anyone a “winner”
of the Challenge, but the emergence of lower-priced e-discovery tools makes us all winners. Two
offerings meriting special recognition are Nuix’ Prooffinder (www.prooffinder.com) and GGO’s Digital
WarRoom Pro (www.digitalwarroom.com).
Prooffinder would cost Edna $100.00 for an annual license, with all proceeds of sale going to support
children’s literacy. Prooffinder is a scaled-down version of Nuix, arguably the most capable e-discovery processing tool on the market today. To keep its price low, Prooffinder will not process
more than 15GB of data for a single case (ample for Edna’s needs); but, even at so piddling a price,
Prooffinder delivers speedy, sophisticated search capabilities, excellent metadata extraction,
effective de-duplication and a host of other functional and analytical features. From the standpoint
of cost of capability, no other product can touch it, and probably the only additional cost Edna will
need to incur is to purchase some redaction software (or a copy of Adobe Acrobat with redaction
capabilities).
Digital WarRoom is a full-featured e-discovery suite of tools that was a promising EDna Challenge
contender on all fronts except for its pricing. Though an annual license for DWR Pro is only $895.00,
renewal would push the product out of Edna’s budget. Moreover, to gain full functionality of the
product, users must purchase a separate $49.00 license for a file viewer application.
Ten Things That Trouble Judges About E-Discovery
Craig Ball
© 2010
As counselor, consultant or court-appointed special master, my law practice revolves around
electronically stored information (ESI)--seeking to salvage the wrecks others have made of e-discovery and helping parties to navigate unfamiliar shoals.
The goal is to forestall or resolve conflicts with judges incensed by parties’ failure to fulfill e-discovery
duties. Judges frequently doubt that electronic discovery is as difficult or expensive as the lawyers
before them claim. For the most part, the judges are right. E-discovery is not that hard and need not
be so costly.
That is, it’s not that hard or expensive if counsel knows what he or she is doing, and that’s a huge
“if.” Judges feel lawyers should know how to protect, marshal, search and produce the evidence in
their cases or enlist co-counsel and experts with that know how. The judges are right about that,
too. Lawyers must master modern evidence in the same way that doctors must stay abreast of the
latest developments in medicine.
The challenge to listing ten things that trouble judges about e-discovery is limiting it to only ten things.
E-discovery exposes much that is not pretty about the state of the law practice, e.g., wasteful,
obsolete practices; poor management skills; conflicting interests between lawyers and clients; and
unequal access to justice between the rich and the rest. E-discovery didn't create these problems,
but like a hard rain on an old roof, it exposes failings too long ignored.
First and most intractable among these problems is:
1. Lawyer incompetence
The landscape of litigation has forever changed, and there is no going back to a paper-centric
world. Too many lawyers are like farriers after the advent of the automobile, grossly--even
stubbornly--unprepared to deal with electronic evidence.
As lawyers’ duties to supervise and direct clients’ preservation and collection of ESI have
broadened, their grasp of information systems, forms of ESI and effective search hasn’t kept
pace. This knowledge gap troubles judges who rely upon lawyers to police the discovery
process and stand behind the integrity of that process. Lawyers cannot defend what they
don’t understand.
No lawyer wants to be thought incompetent; yet the skills developed to collect, assess and
produce paper records do not translate well to a world steeped in ESI. Digital is different, and
neither clients nor the justice system can long afford the costly, cumbersome efforts lawyers
employ to regress data to paper or images.
Other things that trouble judges about e-discovery are:
2. Misstatements of fact coupled with a lack of reliable metrics
Perhaps because no lawyer wants to be thought incompetent, some resort to “winging it”
when it comes to reporting the state of client ESI and status of discovery. The case law
proves the folly of blind reliance on clients when gauging the true state of retention and
collection. Lawyers must not parrot client claims without undertaking even minimal steps to
establish their accuracy.
Often, the misstatements take the form of fanciful claims of burden or cost, advanced sans
reliable metrics gained through measurement or testing. Judges expect more than histrionics
and hand wringing. They demand competent, quantitative evidence of burden and cost
supported by the testimony of knowledgeable people who’ve done their homework. It troubles
judges to be asked to decide important issues on much less.
3. Cost and waste
Judges are of one troubled mind about litigation today. They all feel it costs too much and
worry that spiraling costs may crowd out legitimate cases or compel unjustified settlements.
Recently, a distinguished panel of e-discovery experts surprised this writer by agreeing that
about 70% of the money spent on e-discovery is wasted through poor planning and decision-making. Worse, they attributed about 70% of that waste to lawyer incompetence. If true, that
suggests that about half of every dollar spent on e-discovery is wasted because lawyers don’t
know what they’re doing with ESI. Half!
4. Delay in addressing ESI Issues
Over time, data tends to morph, migrate and disappear. Employees join and leave, and
machines are re-tasked or retired. Memories fade. Active data migrates to tape. Tape moves
to warehouses. Old tape formats give way to newer formats, and old tape drives are
discarded. With these changes, discoverable information grows more difficult and costly to
access over time. It troubles judges when parties ignore ESI issues until little problems grow
into big ones.
Judges expect parties and counsel to think and act in timely ways, identifying and preserving
potentially responsive evidence when they anticipate a claim or lawsuit instead of waiting until
a preservation demand surfaces or a lawsuit is filed.
Judges are also troubled when parties or counsel delay getting needed help from experts and
vendors. When a lawyer waits until discovery is overdue to begin seeking such help, it's hard
for a judge to impute good faith.
5. Lack of communication and cooperation
One reason judges don’t like discovery disputes is that they're often so unnecessary; that is,
they concern issues the parties could have resolved if they’d simply listened and cooperated.
It greatly troubles judges when parties and counsel exert little effort to resolve e-discovery
disputes before filing motions and demanding hearings. It further troubles judges when
lawyers mistakenly equate candor and cooperation with weakness, seeking to profit from
pointless disputes and motion practice.
Judges don’t abide trial by ambush or gamesmanship in e-discovery. The bench expects
parties to be forthcoming about the volume and nature of discoverable ESI and to be
reasonably transparent in, e.g., detailing preservation efforts or disclosing automated search
methods. Because judges never forget that all lawyers owe duties to uphold the integrity of
the justice system they serve, judges are troubled when advocates let the desire to win eclipse
those duties.
6. Failing to get the geeks together
Communication presupposes comprehension, but judges daily confront how working through
intermediaries clouds the court's understanding of technical issues. Like lawyers, information
technologists employ a language all their own. They speak geek.
Because lawyers rarely know what IT personnel are talking about, lawyers are often fearful of
allowing technical personnel from opposing sides to talk to each other. Instead, counsel for
the requesting party conveys questions from their technical expert to opposing counsel, who
passes them on to in house counsel, who has the paralegal on the case talk to the IT person.
The IT person responds to the paralegal who speaks to in house counsel who tells outside
counsel who passes on his or her best understanding to opposing counsel or the court. No
wonder so much gets misunderstood.
Judges expect clear, accurate communication about technical matters, and it troubles them
when knowledgeable people aren't brought together to foster transparency and trust.
7. Failing to implement a prompt and effective legal hold
Preservation is a backstop against error. Slipshod preservation pervades and poisons much
of what follows, and the cost to resolve inadequate preservation is breathtakingly more than
the cost of a reasonable and timely legal hold effort.
One need only peruse the opus opinions in The Pension Committee of the University of
Montreal Pension Plan, et al. v. Banc of America Securities, et al.,2 or Rimkus v. Cammarata3
to appreciate the signal importance judges place on a prompt and effective legal hold of
potentially relevant ESI and documents. Lawyers appear to have only two settings when it
comes to implementing legal holds: "off" and "crazy." Either they ignore the need for a hold
until challenged about missing data, or they issue so vague, paralyzing and impractical a
retention directive that responses run the gamut from doing nothing to pulling the plug and
sitting in the dark.
2 2010 WL 184312 (S.D.N.Y. Jan. 15, 2010)
3 07-cv-00405 (S.D. Tex. Feb. 19, 2010)
It troubles judges when lawyers and clients fail to preserve information that bears on the
issues. Judges rightly expect lawyers to promptly home in on potentially responsive
information when a claim or suit looms. Judges expect lawyers to identify fragile forms of
information and take reasonable steps to protect the evidence against loss or corruption due
to negligence or guile.
8. Overbroad requests and boilerplate objections
In the bygone era of paper discovery, asking for "any and all documents touching or
concerning" a topic was accepted. Information was generally stored on paper, paper was
predictably managed and a company's documents were typically organized topically in a few
easily-ascertainable locations.
But when information exploded into countless shards of messages and attachments strewn
across a sea of accounts, servers, machines, media and devices, "any and all" became too
many.
It deeply troubles--even antagonizes--judges when requests for information are unfocused
and over-inclusive and when reasonable requests are met with a litany of generic objections.
Both demonstrate a lack of care and judgment.
Judges want to see evidence that the discovery sought is proportional to the matters at issue.
They expect objections to be asserted in good faith and narrowly drawn. Some judges are
even exploring sanctions under Fed. R. Civ. P. 26(g) to address fishing expeditions and
boilerplate objections. See, e.g., Mancia v. Mayflower Textile Servs. Co.4
4 253 F.R.D. 354 (D. Md. 2008)
9. Mishandling claims of privilege
Ask a judge what percentage of documents claimed “privileged” actually prove to be
privileged, and you'll probably hear, "ten percent, perhaps less." Yet more than one
e-discovery expert has opined that finding, fighting about and redacting privileged documents
accounts for a sizeable share of the money spent on e-discovery. Whatever the percentages,
it's clear litigants spend far too much money and time ginning the seeds of privilege from
electronic evidence, even while overlooking privileged content through a paucity of quality
assurance and control. See, e.g., Mt. Hawley Ins. Co. v. Felman Prod., Inc.5 and Victor
Stanley, Inc. v. Creative Pipe, Inc.6
Lawyers gravitate to error-prone tools, like seat-of-the-pants keyword search, to cull
potentially privileged content, mischaracterizing much that's not privileged and much that is.
Further, many lawyers forget (or ignore) their client's duty to generate a proper privilege log
when material withheld from discovery as privileged happens to be ESI.
Finally, lawyers inexplicably fail to avail themselves of Fed. R. Evid. 502, which provides
significant protections against waiver of privilege, including the near-impregnable shield of a
R. 502(d) court order.
Last, but not least, any list of things that trouble judges about e-discovery is sure to include:
10. Failing to follow the Rules
Judges value the rules of procedure, and they expect those who come to their courts to do
so. So it troubles judges when the rules set forth a clear requirement that's ignored, especially
when the failure to follow a rule triggers a superfluous motion and hearing.
A telling example is the Federal Rule of Civil Procedure requiring a producing party to object
to a requested form of production and specify the form to be produced.7 It's a rule observed
more in the breach than in compliance; yet adherence to the rule would make many costly
battles demanding alternate forms of production unnecessary. The rule sets out what to do--with the goal that conflicts be resolved before production in objectionable forms--but litigants
just don't do it.
Heads in the Sand
Ironically, what most troubles judges about e-discovery also makes their lives easier: judges are
astounded they don't see more efforts to discover ESI! The bench well understands that the dearth
of e-discovery isn't indicia of cooperation, but of evasion. Though virtually all evidence today is
digital, many lawyers still try to pretend otherwise and look where they've always looked for evidence.
Increasingly, judges know this shouldn't be the case and that it can't last. They enjoy the calm, but
are troubled that so few are prepared for the gathering storm.
5 2010 WL 1990555 (S.D. W. Va. May 18, 2010)
6 250 F.R.D. 251 (D. Md. 2008)
7 Fed. R. Civ. P. Rule 34(b)(2)(D): Responding to a Request for Production of Electronically Stored Information.
The response may state an objection to a requested form for producing electronically stored information. If the
responding party objects to a requested form--or if no form was specified in the request--the party must state the form
or forms it intends to use.
Preserving Google Content for Dummies
© 2014
Craig Ball
A key responsibility of in-house and litigation counsel is to ensure that potentially responsive
information is preserved in the face of litigation. Counsel must advise and supervise a client’s efforts to
preserve both information deemed favorable and information helpful to the other side. It’s a duty
owed to the Court under common law.
Attorneys have seen harsh criticism from courts and borne the brunt of monetary sanctions for failing
to act promptly and prudently to preserve electronically stored information (ESI). The duty to
preserve ESI attaches to every case, including those where parties lack the wherewithal to hire
technical experts. Moreover, absent agreement or court order, parties are not free to degrade the
forms of the ESI preserved and produced, such as by printing ESI out and destroying its electronic
searchability.
Meeting these obligations is challenging; more so when the data resides with third-parties like cloud
and webmail services. Millions of clients depend on Google tools to manage e-mail, contacts,
documents, calendars, photos and more. That’s a lot of potentially relevant evidence, and
it’s often sensible or necessary to preserve cloud content by collecting it.
Heretofore, Google made it easy to find content, but hard to get that content out in forms that
preserved utility and integrity. Some coped by printing individual messages and attachments to the
Adobe PDF format. But, printing to PDF is tedious and doesn’t always produce usable or complete
forms. Others relied on a mail transmission protocol called IMAP to download the contents of a
Gmail account to Microsoft Outlook PST container files. But, downloading Gmail using IMAP and
Outlook is tricky and slow.
Happily, the geniuses at Google have introduced a truly simple, no-cost way to collect Google cloud
content like Gmail, Google Drive, Calendar and others for preservation and portability. It sets a top
flight example for other cloud service providers and presages how we may use the speed, power
and flexibility of Google search as a culling mechanism before exporting for e-discovery.
Even if you’re a lawyer who couldn’t care less about IMAP, this is a development worth
cheering because until now, you had two choices when it came to putting Gmail on legal hold: Either
you’d instruct your client not to delete anything (and cross your fingers they’d comply) or you had to
hire someone to download the data. Now, Google does the Gmail collection gratis and puts it in a
standard MBOX container format that can be downloaded and sequestered. Google even
incorporates custom metadata values that reflect labeling and threading. You won’t see these
unique metadata tags if you pull the messages into an e-mail client; but, e-discovery software will
pick them up. I tested this using Nuix and the $100 marvel, Prooffinder. Both parsed the Gmail
metadata handily, enabling the messages to be threaded and paired with their Gmail labels.
MBOX might not have been everyone’s choice for a Gmail container file; but, it’s an inspired
choice. MBOX stores the messages in their original Internet message format called RFC 2822 (now
RFC 5322), a superior form for e-discovery preservation and production.
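As a rough illustration of why MBOX is a convenient container, the sketch below uses Python's standard `mailbox` module to walk an exported archive and tally the Gmail labels and thread identifiers carried in each message's headers. The header names `X-Gmail-Labels` and `X-GM-THRID` are assumptions based on commonly observed Google exports, and the function name `summarize_mbox` is hypothetical; inspect a sample message from your own export before relying on either header.

```python
import mailbox
from collections import defaultdict

# Assumed header names; confirm them against a message in your own export.
LABEL_HEADER = "X-Gmail-Labels"
THREAD_HEADER = "X-GM-THRID"


def summarize_mbox(path):
    """Return (label_counts, threads) for the MBOX file at `path`.

    label_counts maps each label to the number of messages bearing it.
    threads maps each thread identifier to the subjects of its messages.
    """
    label_counts = defaultdict(int)
    threads = defaultdict(list)
    # create=False raises an error instead of silently making a new file.
    box = mailbox.mbox(path, create=False)
    try:
        for message in box:
            subject = str(message.get("Subject", "(no subject)"))
            raw_labels = str(message.get(LABEL_HEADER, ""))
            # Simplification: labels are treated as a plain comma-separated
            # list, so a label that itself contains a comma would be split.
            for label in (part.strip() for part in raw_labels.split(",")):
                if label:
                    label_counts[label] += 1
            thread_id = str(message.get(THREAD_HEADER, "(no thread id)"))
            threads[thread_id].append(subject)
    finally:
        box.close()
    return dict(label_counts), dict(threads)
```

Run something like `summarize_mbox("client-archive.mbox")` against a working copy rather than the sequestered original. The label counts give a quick sanity check that the export captured what was expected, but they are no substitute for processing the archive in a proper e-discovery tool.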
So, meet Google Data Tools (https://www.google.com/settings/datatools).
Armed with login credentials and client permission, the
only hard part of preserving a client’s Google content is
navigating to the right page. After logging into the user
account, you get to Google Data Tools from the Google
Account Setting page by selecting “Data Tools” and
looking for the “Download your Data” option on the lower
right. When you click on “Create New Archive,” you’ll see
a menu where you select the Google content to archive
and even choose whether to download all mail or just items
bearing the labels you select.
The ability to label content within Gmail and archive only
labelled messages means that Gmail’s powerful search
capabilities can be used to identify and label potentially responsive messages, obviating the need to
archive everything. It’s not a workflow suited to every case; yet, it’s a promising capability for keeping
costs down in the majority of cases involving just a handful of custodians with Gmail.
A lot of discoverable data is moving to Google–to Gmail, Drive, Calendar, YouTube–you name
it. Kudos to Google for turning a task that’s been hard into something so simple anyone can do it
well. That it costs nothing at all--thank you, Google!
Easing the Pain of E-Discovery with ESI Special
©2010
Craig Ball
I get quizzical looks when people ask what I do and I answer, “I’m an ESI Special Master. I help
courts and litigants resolve electronic evidence issues.” They know what litigants, judges and
lawyers do; and they’re often familiar with mediators, arbitrators and expert witnesses; but, few have
a clue about the many roles played by ESI Special Masters in litigation. In simplest terms, a Special
Master is someone--most often a lawyer--appointed to act for a court in specific ways. An ESI
Special Master is tasked to assist with matters relating to electronic discovery, computer forensics
and digital evidence.
The role of the ESI Special Master may be adjudicative, investigative or ministerial. One day, I’m
presiding over hearings on e-discovery disputes and issuing directives geared to effective and
proportionate e-discovery; another, I’m the Court’s neutral forensic examiner poring over vast data
volumes to uncover the facts while protecting each side’s privileged and proprietary information.
It’s the broad range of responsibilities delegated to Special Masters that makes the work so
rewarding and piques the interest of lawyers with strong technical skills seeking a new and
challenging career in e-discovery and computer forensics. An ESI Special Master does the sorts of
things the judge would do, if the judge had the time and technical expertise, and what neutral IT
experts would do, if those experts were experienced trial lawyers. Technical expertise equips a
master to know what to do and how to do it, but legal training equips the master to know what's
important and when enough is enough.
Pros and Cons of Special Masters
When I'm approached to consult on e-discovery, I often ask, "Are you sure you want a partisan
consultant? Wouldn't a neutral special master be more effective and less costly?" The first response
is usually, "I never thought about it." The next is, "I don't know if the other side will go for it."
Both sides may benefit from a neutral. An ESI Special Master can achieve significant savings serving
as a neutral investigator. In matters where the evidence on digital media is commingled with
privileged, proprietary or confidential information, the use of a qualified neutral examiner obviates
the need for separate-but-redundant examinations by opposing experts. Instead, the partisan
experts work with the neutral to frame a suitable examination protocol and then flesh out particular
areas of concern after the neutral examiner completes the work. The result is that both parties enjoy
substantially reduced costs and trusted outcomes.
A master enjoys greater access to the producing parties' systems and data, helping to ensure that
responsive, non-privileged material will see the light of day. Producing parties benefit because a
neutral has no incentive to pursue overbroad or unduly expensive discovery, and by doing what the
neutral directs, they're insulated from criticism for doing too much or too little. While most producing
parties recognize that they will have to devote resources to e-discovery, what they despise most is
expending those resources only to find they're vulnerable to sanctions or obliged to start over again
because something was mishandled.
A skilled special master is better able to "right size" e-discovery, striking the optimum balance
between avoiding unnecessary expense and the right to receive information. A careful neutral has
no incentive to spend more or find less. Further, a neutral's right to see information withheld on
claims of privilege or confidentiality without triggering a waiver is a powerful hedge against abuse.
An effective neutral finds consensus; but when consensus fails, the special master must possess
the technical skill to fashion a sensible protocol and the legal ability to memorialize and enforce it.
It’s crucial that the Master serve as a catalyst to speedier and less-costly resolutions, not another
venue for endless argument or a means of delay. The overarching goal of a Master should be to do
away with any enduring need for a Master in the case.
The principal objection to use of a master is cost. Going before a judge on e-discovery disputes
feels "free" to lawyers because the judge doesn't charge by the hour and is paid from public coffers.
In fact, bringing discovery disputes to the judge is very costly and time consuming. Issues must be
briefed in formal submissions, witnesses must attend court and the delay pending a ruling introduces
still more costs, such as idling a large review team. But the biggest expense flows from the potential
that the court, hampered by a lack of technical insight, will decide the issues in ways that seem
equitable at first blush but prove unjust, ineffective or unduly expensive in practice.
Breaking Bad Habits and Fostering Cooperation
Resolving e-discovery disputes demands a mix of technical initiatives, information exchange and
behavioral modification. Often, problems stem from a breakdown in communication, so parties must
be steered to more effective communication strategies concerning ESI. It's like marriage counseling,
but without happier times to hearken back to. As in ugly divorces, conflict can become an end in
itself. Reasonable requests are refused just to be obstreperous. Unreasonable demands for
marginally relevant information are served simply because responding engenders hardship or
expense. Each side is determined to give no quarter and perceives cooperation as complicity and
weakness. A successful master helps the parties separate advocacy from discovery and works to
end peripheral battles over ESI, refocusing the parties on the merits.
The first thing I seek to instill in the parties is a clear understanding of what must stop. Data
destruction, dissembling, sniping at opponents and gross speculation are verboten. Where feasible,
each side must designate a technical liaison equipped to answer questions about systems,
applications and capabilities. Introducing players without a history of animus and shifting the focus
to technical issues helps establish a culture of cooperation.
Fostering cooperation may seem misguided in an adversarial system, especially to those who see
cooperation as affording aid and comfort to the enemy. But, the savvy lawyer understands that the
biggest beneficiary of cooperation is his or her own client. E-discovery efforts characterized by
cooperation cost the parties less and serve as a bulwark against waste and sanctions.
Working with the ESI Special Master
Over the course of dozens of appointments as ESI Special Master, I’ve done almost any task an ESI
Master might be called upon to do as facilitator, adjudicator or investigator. Along the way, I’ve
identified ways litigants can aid the process and further their standing with the Master:

Focus on the Facts
Because few attorneys are well-versed in information technology, it’s not surprising that
assumptions made with respect to the cost, burden and risks of e-discovery are frequently
off-the-mark. Requesting parties tend to think it too easy, where responding parties make it
sound improbably hard. An important role for the ESI Master is getting parties to examine the
bases for their assumptions and secure reliable metrics. The right questions posed to the
right persons often reveal that matters thought arduous are trivial and vice-versa. When
working with an ESI Special Master, bring forward the persons with knowledge and be
prepared to respond with solid metrics respecting file types, data volumes and other essential
facts.

Designate a Technical Liaison
It’s understandable that lawyers often seek to interpose themselves between technicians and
the court; but, much is lost in translation. An ESI Special Master “speaks geek” and may
prefer to deal directly with technically-astute liaisons. I customarily direct each party to
designate one or more technical liaisons who are obliged to be fluent in the particulars of the
implicated systems and ESI. Few steps are more effective at resolving e-discovery disputes
than facilitating productive communications between counterparts who grasp the technical
challenges and range of solutions. As well, countless hours can be saved by eliminating
much of the “let-us-get-back-to-you-on-that” typical of ESI disputes.

Come Armed with a Plan
Robert Moses, the controversial master builder who reshaped 20th-century New York, won
many battles by the simple expedient of showing up at meetings with fully-realized drawings
for civic improvements. Where others came with dreams, Robert Moses came with blueprints.
Lawyers often approach e-discovery disputes with nothing more than a naked demand or an
intransigent refusal.
Don’t force the Master to construct a solution from scratch and run the risk that it will be less
favorable to your client’s interests; instead, come armed with a sound plan, and don’t be
surprised if, in making the plan, you discover there’s less in dispute than you thought.

Be Candid
If you have problems in your case, such as spoliation issues or processing defects, promptly
communicate them to the ESI Special Master. A skilled Master may be able to resolve defects
before they become grounds for sanctions, and courts are hesitant to sanction when advised
that the parties are working with the Special Master to fix problems.
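The "solid metrics respecting file types, data volumes and other essential facts" urged under Focus on the Facts need not require specialized tools at the outset. The following is a minimal sketch, assuming the data of interest sits in an ordinary folder tree that Python can read; the function names `tally_by_extension` and `format_report` are hypothetical. It counts files and bytes by file extension. On-disk sizes are only a starting point, since container files such as PSTs and archives typically expand when processed.

```python
from collections import defaultdict
from pathlib import Path


def tally_by_extension(root):
    """Walk `root` and return {extension: (file_count, total_bytes)}."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group case-insensitively; files without a suffix share one bucket.
            ext = path.suffix.lower() or "(none)"
            totals[ext][0] += 1
            totals[ext][1] += path.stat().st_size
    return {ext: (count, size) for ext, (count, size) in totals.items()}


def format_report(totals):
    """Render the tally as text, largest total volume first."""
    rows = sorted(totals.items(), key=lambda item: item[1][1], reverse=True)
    return "\n".join(
        f"{ext:<10} {count:>8} files {size / 1_000_000:>12.1f} MB"
        for ext, (count, size) in rows
    )
```

Printing `format_report(tally_by_extension("/path/to/collection"))` yields a one-page summary that a technical liaison can refine. It measures volume only and says nothing about responsiveness, duplication or privilege.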
Mechanics of Appointment
In federal practice, the appointment of a special master is governed by Fed. R. Civ. P. 53, which
provides that a court may appoint a master with the parties' consent, where the appointment is
warranted by "some exceptional condition" or to address pretrial matters that cannot be effectively
and timely addressed by an available judge. Each state has its own regime for appointment of a
Special Master. For example, Rule 2-541 of the Maryland Rules of Civil Procedure is an amalgam
of the Federal rule and Maryland practice. Maryland Rule 2-541 states that, “[on] motion of any party
or on its own initiative, the court, by order, may refer to a master any…matter or issue not triable of
right before a jury.” The Maryland rule affords special masters broad powers. The appointment order
must “prescribe the compensation, fees, and costs of the special master and assess them among
the parties,” and may specify or limit the powers of a special master and contain special directions.
Tips for Appointment Order
The federal rule governing appointment of a Master sets out the requirements to serve and the
requisites of the appointment order, which should clearly define the role and powers of the master
with a particular eye toward establishing when the master's work is concluded. Masters cost money,
so it's important to ensure the meter stops running once the job's done. Appointment orders should
specify the duties, powers and limits placed upon the Master, as well as whether and how the Master
may engage in ex parte contact with the parties. Orders should set out the Master’s obligations to
make a record and periodically report to the Court. Finally, the order should address the master’s
compensation, including the parties’ payment responsibilities and whether the Master’s charges may
be taxed as costs.
The appointment order is also a means by which the Court can address common concerns such as
whether the Master may be deposed or subject to trial subpoena and what standard of review
applies to particular actions taken by the Master.
An example of a federal appointment order follows as Appendix A. Though the example affords
broad discretion, parties would be wise to consider the master's experience before seeking such
leeway in all cases.
A Bridge to Competence
E-discovery and digital evidence pose technical challenges that few litigants are equipped to handle
and fewer lawyers have been trained to address. Courts, too, often lack the resources and
experience to delve deeply into the digital realm to achieve optimum outcomes. The consequences
have been costly and, until attorney competence in information technology becomes commonplace,
the need for ESI Special Masters will grow. ESI Special Masters can ease the pain of e-discovery
by ensuring that it proceeds fairly, efficiently, effectively and in proportion to each side’s needs and
rights. An ESI Master promotes transparency of process, consensus and cooperation where
possible, and provides prompt, practical direction and resolution, when not. As neutral investigator,
an ESI Special Master affords all parties protections difficult to secure by other means, all allowing
the parties to focus on the merits, and the lawyers to be more confident and competent in the e-discovery process.
APPENDIX A: Exemplar ESI Special Master Appointment Order
IN THE UNITED STATES DISTRICT COURT
FOR THE ____ DISTRICT OF _____
______ DIVISION
[STYLE]
ORDER APPOINTING SPECIAL MASTER FOR ESI
1. Craig Ball of Austin, Texas, is hereby appointed as Special Master for Electronically Stored
Information pursuant to Rule 53 of the Federal Rules of Civil Procedure. Mr. Ball has filed the
certification required by Rule 53(b)(3).
2. The Special Master shall proceed with all reasonable diligence to assist and, when necessary,
direct the parties in completing required identification, preservation, recovery and discovery of
electronically stored information with reasonable dispatch and efficiency.
3. The Special Master shall review with the parties ongoing discovery requests to determine where
potentially responsive information is stored and how it can most effectively be identified, accessed,
preserved, sampled, searched, reviewed, redacted and produced. To the extent the parties have
disputes as to these matters, the Special Master may initiate or participate in the parties’ efforts to
resolve same. He is authorized to resolve issues as to the scope and necessity of electronic
discovery, as well as search methods, terms and protocols, means, methods and forms of
preservation, restoration, production and redaction, formatting and other technical matters.
4. The Special Master is granted the full rights, powers and duties afforded by F.R.C.P. Rule 53(c)
and may adopt such procedures as are not inconsistent with that Rule or with this or other Orders of
the Court. The Special Master may by order impose upon a party any sanction other than contempt
and may recommend a contempt sanction against a party and contempt or any other sanction
against a non-party.
5. The Special Master shall be empowered to communicate on an ex parte basis with a party for
purposes of seeking to maintain the confidentiality of privileged, trade secret or proprietary
information or for routine scheduling and other matters which do not concern the merits of the parties’
claims. The Special Master may communicate with the Court ex parte on all matters as to which the
Special Master has been empowered to act. The Special Master shall enjoy the same protections
from being compelled to give testimony and from liability for damages as those enjoyed by other
federal judicial adjuncts performing similar functions.
6. The Special Master shall regularly file a written report, in such format he deems most helpful,
identifying his activities and the status of matters within his purview. The report should identify
outstanding issues, with particular reference to matters requiring Court action. The Special Master
shall maintain a record of materials and communications that form the basis for such reporting by a
suitable means determined at the Special Master's discretion.
7. Each side is ordered to designate a lead attorney and a lead technical individual as contacts for
the Special Master. These designees shall have sufficient authority and knowledge to make
commitments and carry them out to allow the Special Master to accomplish his duties. The parties
are directed to give the Special Master their full cooperation and to promptly provide the Special
Master access to any and all facilities, files, documents, media, systems, databases and personnel
(including technical staff and vendors) which the Special Master deems necessary to complete his
duties.
8. Disclosure of privileged or protected information connected with the litigation to the Special Master
shall not be a waiver of privilege or a right of protection in this cause and is also not a waiver in any
other Federal or State proceeding; accordingly, a claim of privilege or protection may not be raised
as a basis to resist such disclosure.
9. The Court will decide de novo all objections to findings of fact or conclusions of law made by the
Special Master. Any order, report, or recommendation of the Special Master, unless it involves a
finding of fact or conclusion of law, will be deemed a ruling on a procedural matter. The Court will
set aside a ruling on a procedural matter only where it is clearly erroneous or contrary to law.
10. The Special Master’s compensation, as well as reasonable and necessary expenses, will be paid
by the [Plaintiff] [Defendant] [parties in equal shares]. Mr. Ball shall be compensated at his usual
and customary rate of $500 per hour, including time spent in transit or otherwise in connection with
this appointment, provided however that travel time will be paid at one-half (50%) of the usual and
customary rate unless substantive work, research or discussions in support of the engagement are
performed while traveling, in which case such activities will be paid at the usual and customary rate.
The Special Master shall submit to both parties invoices for services performed according to his
normal billing cycle and [Plaintiff] [Defendant] [the Plaintiff and Defendant in equal shares] shall pay
such invoices within thirty (30) days of receipt.
11. In making this appointment, the Court has determined that the matters within the purview of the
Special Master necessitate highly specialized technical knowledge and cannot be effectively and
timely addressed by an available district judge or magistrate judge of the district.
SO ORDERED AND ADJUDGED this the _______ day of _______________ 20____.
_________________________________
UNITED STATES DISTRICT JUDGE
Gold Standard
by Craig Ball
[Originally published in Law Technology News, April 2012]
Lawyers are in denial to the point of delusion with respect to the reliability of keyword search and
human review. Judge John Facciola put it best when he quipped that lawyers think they’re experts
at keyword search because they once found a Chinese restaurant on Google.
We trust keyword search because we understand it. We trust manual review of documents because
we grossly overestimate reviewers’ abilities to make sound, consistent decisions about relevance.
“To err is human,” the Bar seems to say, “but forgive us if we’d rather not divine just how error-prone
reviewers really are.”
Better approaches to search are arriving as so-called “predictive coding” or “technology assisted
review” (TAR) products. Still, it will be years before the rank and file embraces TAR, if only because
those hawking TAR tools remain resolutely uninterested in positioning the technology for use by
anyone but big corporations and white shoe law firms. Worse, the fervor among vendors to sell
something, anything they can label predictive coding ensures that tools little different from ordinary
keyword search will be given a dab of lipstick and pushed out to market as TAR tools. It’s messy
down in the TAR pit.
Even those adopting predictive coding tools will need to compile “seed sets” of relevant documents
to train their tools. So, clunky-but-comfy keyword search and manual review are likely to remain the
means to cull seed sets from samples. Despite serious shortcomings, keyword search and manual
review will be with us for a while.
Keyword search is the art of finding documents containing words and phrases that signal relevance
followed by page-by-page (linear) review of those documents. It’s often called the “gold standard”
of electronic discovery.
That’s ironic, because extracting and refining gold relies less on finding precious aurum than it does
on dispersing all that isn’t golden. Prospectors use water and chemicals to flush away all but the
gold left behind. So, a true “gold standard” for keyword search would incorporate both precise
inclusion (smart queries) and defensible exclusion (smart filters).
To illustrate, in one e-discovery dispute over search, the plaintiff submitted keywords to be run
against the defendant’s e-mail archive for a three-month interval. Unfortunately, the archive held all
e-mail for all custodians, and the defendant adamantly refused to segregate by key custodian or
deduplicate before running searches. The interval was narrow, but the collection was vast and
redundant.
The defendant tested the agreed-upon keywords but shared only aggregate hit rates for each.
Thinking the numbers too high, but unwilling to look at the hits in context, the defendant rejected the
search terms. The plaintiff agreed the hit counts were daunting but asked to see examples of hits
on irrelevant documents before furnishing exclusionary (AND NOT) modifications to flush away more
of what wasn’t golden.
The defendant refused, insisting it wasn’t necessary to see the noise hits in context to generate more
precise queries. The parties were at an impasse, with one side grousing “too many hits” and
demanding different search terms and the other side uncertain how to exclude irrelevant documents
without knowing what caused the noisy results.
A lawyer who dismisses a search because it yields “too many hits” is as astute as the Emperor
Joseph dismissing Mozart’s Il Seraglio as an opera with “too many notes.” Mozart replied, “There
are just as many notes as there should be." Indeed, if data is properly processed to be susceptible
to text search and the search tool performs appropriately, a keyword search generates just as many
hits as there should be. Of course, few lawyers craft queries with the precision Mozart brought to
music; so when the terms used seem well chosen for relevance, it’s crucial to scrutinize the results
to learn what tailings are cropping up with the gilt-edged, relevant documents.
Keyword search is just a crude screen: “Show me items that contain these words, and don’t show
me items that contain those.” High hit counts don’t always signal a bad screen. If search terms
merely divide the collection into one pile holding relevant documents and one without, you’re closer
to striking gold. Then, you look at what you can reliably exclude with the next screen, and the next;
drawing ever closer to that elusive quarry, documentum relevantus.
But you must see hits in context to refine queries by exclusion. That seems so manifestly obvious,
it’s astounding how often it’s not done.
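The two-sided screen described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the three-document collection and the search terms are all hypothetical, and real review tools apply far richer indexing and syntax.

```python
import re

def tokenize(text):
    """Lowercase the text and return the set of word tokens it contains."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def screen(documents, include_terms, exclude_terms=()):
    """Return IDs of documents holding any include term and no exclude term."""
    include = {t.lower() for t in include_terms}
    exclude = {t.lower() for t in exclude_terms}
    hits = []
    for doc_id, text in documents.items():
        words = tokenize(text)
        if words & include and not words & exclude:
            hits.append(doc_id)
    return hits

# Hypothetical collection: one relevant item, one noise hit, one non-hit.
documents = {
    "A": "The pump seal failed during the pressure test.",
    "B": "Newsletter: pump up your savings with our seal of approval!",
    "C": "Lunch on Friday?",
}

# The first screen includes only; the second adds an exclusion (AND NOT) term.
first_pass = screen(documents, ["pump", "seal"])
second_pass = screen(documents, ["pump", "seal"], exclude_terms=["newsletter"])
```

The exclusion term "newsletter" could only have been chosen by reading document B in context; an aggregate hit count of two would never have suggested it.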
When lawyers delegate keyword search, they often get back only aggregate hit counts and
mistakenly conclude that’s enough information to judge searches noisy or not. If, instead, counsel
get their hands dirty with the data, as by personally exploring representative samples using desktop
or hosted tools, the parties could work quickly, effectively and cooperatively to zero in on relevant
material. Good queries are best refined by knowledgeable people testing them against pertinent,
small collections. Lousy outcomes spring from lawyers thinking up magic words and running them
against everything.
It’s not just a theory. Recently, as part of an early case assessment effort, I sought to rapidly isolate
relevant documents from a half million e-mail items culled from four key custodians. That’s a volume
where you’d expect to see bids from service providers and mustering of review teams. It’s a project
most firms would see as much more than a weekend’s work for one lawyer.
We tried something different. To start, the client exported the four key custodians’ e-mail messages
for the time period of interest from its e-mail archives. Those 50 gigabytes of messaging went into
a desktop processing and review tool.
Extracting and indexing the data overnight, I flagged exception items (e.g., images without
extractable text and encrypted files) for further processing, then exported spreadsheets reflecting
the most used e-mail addresses. I asked the custodians to flag addresses with no connection to the
dispute. Meanwhile, I compiled the customary list of search terms and phrases expected to occur
in relevant documents and tested these. Documents with false hits were examined for
characteristics permitting mechanical exclusion. Testing, re-testing and re-examination soon
produced reliable inclusion and exclusion term lists. Weeks of evaluation took just days because
the iterations and results were instantaneous.
The discards were tested, too. For example, material excluded by addresses but containing
inclusion terms was carefully checked to ensure the hits weren’t relevant. Defensible exclusion
proved as powerful as inclusion, and potentially relevant material that couldn’t be excluded as tailings
stayed in the collection as ore. A true “gold standard.”
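A rough Python sketch of the address tally, the address-based exclusion and the check of the discards might look like the following. It is not the desktop tool used in the project; the message layout, the function names and the sample addresses are assumptions made purely for illustration.

```python
from collections import Counter

def tally_addresses(messages):
    """Count how often each address appears as a sender or a recipient."""
    counts = Counter()
    for msg in messages:
        for addr in [msg["from"], *msg["to"]]:
            counts[addr.lower()] += 1
    return counts

def split_by_sender(messages, flagged_senders):
    """Separate messages into (kept, discarded) by whether the sender was flagged."""
    flagged = {a.lower() for a in flagged_senders}
    kept, discarded = [], []
    for msg in messages:
        (discarded if msg["from"].lower() in flagged else kept).append(msg)
    return kept, discarded

def recheck_discards(discarded, include_terms):
    """Return discarded messages that still contain an inclusion term."""
    terms = [t.lower() for t in include_terms]
    return [m for m in discarded if any(t in m["body"].lower() for t in terms)]

# Hypothetical messages; the example.* addresses are placeholders.
messages = [
    {"from": "deals@shop.example", "to": ["custodian@corp.example"],
     "body": "Sale ends today"},
    {"from": "custodian@corp.example", "to": ["vendor@supplier.example"],
     "body": "Contract draft attached"},
    {"from": "deals@shop.example", "to": ["custodian@corp.example"],
     "body": "New contract phone plans"},
]

counts = tally_addresses(messages)
kept, discarded = split_by_sender(messages, {"deals@shop.example"})
needs_review = recheck_discards(discarded, ["contract"])
```

Here the third message is discarded by address yet contains the inclusion term "contract," so it lands in needs_review for a human to confirm it is a tailing and not ore.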
Did it produce a perfectly parsed set of material? Certainly not. Keyword search and human review
still fall short of expectations. But it was fast, relatively cheap and afforded cautious confidence that
the set produced was more relevant and less riddled with junk than what would have emerged from
the usual game of blind man’s buff. It was fast and cheap because the person creating and testing
the inclusive and exclusive filters was elbows deep in the data and hands on with the search tool.
Feedback was immediate. Quality checks could be done at once.
Ideally, e-discovery tools don’t put distance between the lawyer and the evidence but, instead,
extend our reach and help us get our arms around big data. A lawyer who is hands-on with the
evidence and who tests and refines his or her choices is a lawyer who can explain and defend those
choices. That’s the real golden future of e-discovery. Welcome back, counselor.
Ten Bonehead Mistakes in E-Discovery
by Craig Ball
[Originally published in Law Technology News, June 2012]
Spoiled by Google and legal research, lawyers are woefully unprepared for the difficulty of search in
e-discovery.
Search fails us in two, non-exclusive ways: our query will not retrieve the information we seek, and
our query will retrieve information we didn’t seek. Obviously, we want what we’re looking for (high
recall) and only what we are looking for (high precision).
Recall and Precision aren’t friends. Every time Recall has a tea party, Precision crashes with his
biker buddies and breaks the dishes.
It’s easy to achieve a high recall of responsive ESI. You simply grab it all: 100% of the data = 100%
recall. The challenge is achieving precision. If one out of every hundred items returned is what you
seek, 99 items are duds—1% precision stinks.
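The arithmetic is easy to check. The short Python sketch below uses the standard definitions of the two measures; the 100-item collection with a single relevant item is a made-up example matching the numbers above.

```python
def precision(retrieved, relevant):
    """Share of retrieved items that are relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    """Share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

# Hypothetical collection of 100 items, only one of which is what we seek.
collection = set(range(1, 101))
relevant = {42}

# "Grab it all": every item in the collection is retrieved.
grab_all_recall = recall(collection, relevant)        # 1.0, i.e. 100% recall
grab_all_precision = precision(collection, relevant)  # 0.01, i.e. 1% precision
```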
Keyword search followed by human review is called “linear search,” and for now, it’s standard
operating procedure in e-discovery, in part because linear search is mistakenly considered the safest
course lest a party fail to produce something responsive or turn over something that should have
been withheld.
Linear search is time-consuming, so it’s expensive. Worse, it doesn’t work well. People make search
and assessment errors, and making lots of searches and assessments, they make lots of errors!
Mistakes can be subtle and hyper technical, but most are not. If we eliminate bonehead errors,
we improve the quality of e-discovery and markedly trim its cost. Search will ever be a battle between
Recall and Precision, but avoiding bonehead mistakes limits casualties.
Recently, I ran a blog post sharing five bonehead mistakes I’d observed and asking readers to
contribute five more.
Mistake 1: Searching for someone’s name or e-mail address in their own e-mail
If you run a list of search terms including a custodian’s name or e-mail address against their own e-mail, you should expect to get hits on all messages. I know some of you are saying, “Craig, no one’s
that boneheaded!” Actually, plaintiffs do it, defendants do it, and vendors run these searches without
flagging the error. Ask yourself: how often are the proposed search term lists exchanged between
counsel carefully broken out by particular custodians or forms of ESI to be searched?
Bill Onwusah, Litigation Support Manager at Hogan Lovells in London, commented that he’d seen
this mistake take the form of “searching for a term that shows up in the footer of every single
document produced by the organisation,” such as the firm’s name.
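One inexpensive safeguard is to compute, for each proposed term, the share of documents it hits and to flag any term that hits nearly everything. The Python sketch below is illustrative only: the function names, the 90% threshold, the firm name and the sample messages are all assumptions.

```python
def hit_rates(documents, terms):
    """Return, for each term, the fraction of documents containing it."""
    rates = {}
    for term in terms:
        needle = term.lower()
        hits = sum(1 for text in documents if needle in text.lower())
        rates[term] = hits / len(documents) if documents else 0.0
    return rates

def suspicious_terms(documents, terms, threshold=0.9):
    """List terms whose hit rate meets or exceeds the (hypothetical) threshold."""
    return [t for t, rate in hit_rates(documents, terms).items() if rate >= threshold]

# Hypothetical messages that all carry the same footer.
footer = "\n--\nAcme LLP | Confidential"
documents = [
    "Please review the indemnity clause." + footer,
    "Lunch at noon?" + footer,
    "Revised schedule attached." + footer,
]

rates = hit_rates(documents, ["Acme", "indemnity"])
flagged = suspicious_terms(documents, ["Acme", "indemnity"])
```

A term that hits every message in a custodian's mailbox, such as the custodian's own address or a firm-wide footer, says nothing about relevance and is a candidate for removal or narrowing.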
Mistake 2: Assuming the Tool Can Run the Search
Every ESI search tool has features and limitations. You must understand what data has been
indexed and what search methods and syntax are supported.
Most e-discovery tools index words, which means you won’t retrieve any information that isn’t text
(including some PDF, TIF and other pictures of words that haven’t been OCR’d to searchable text)
or that isn’t accessible text (like encrypted documents). Plus, most search tools don’t index parts of
speech called “noise” or “stop” words deemed so common they’ll gum up the works. I call this the
“To Be or Not to Be” problem, because all of the words in Hamlet’s famous phrase tend not to be
indexed in e-discovery.
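The “To Be or Not to Be” problem is simple to reproduce. The Python sketch below mimics an indexer that drops stop words; the stop list is a short hypothetical one, and every real tool ships its own list, so check the documentation for the tool you use.

```python
import re

# Hypothetical stop list; actual lists vary from tool to tool.
STOP_WORDS = {"a", "an", "and", "be", "not", "of", "or", "the", "to"}

def index_terms(text, stop_words=STOP_WORDS):
    """Return the tokens an indexer would keep after discarding stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in stop_words]

hamlet = index_terms("To be, or not to be")       # nothing survives indexing
clause = index_terms("not to be held liable")     # only the content words survive
```

Because every word of Hamlet's phrase is on the stop list, the indexed form is empty, and a search for the phrase has nothing to match.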
Syntax mistakes occur when you assume the tool can run the search the way you constructed it.
Not every search tool supports every common search method, e.g., wildcard characters, Boolean
constructs, stemming, proximity searches or regular expressions, and even when two tools support
the same search method, tool A may require you to use different search syntax than tool B.
Mistake 3: Not Testing Searches
Much of what distinguishes a mistake as boneheaded is the ease with which it could have been
avoided. When a party to a lawsuit once proposed the letter “S” as a search term, I didn’t need to
test it to know it was a bonehead choice. But what about all those noisy terms that pop up in file
paths or are invariably encountered within ESI yet have nothing to do with the case? Even search
terms that appear bulletproof can surprise you. Test your searches to be sure they perform as
expected.
Mistake 4: Not Looking at the Data!
Don’t just natter on about the quantity of hits to evaluate your search; check the quality of the
hits. Look at the data! Minutes spent looking at the data can eliminate weeks or months of reviewing
crappy results and a zillion dollars spent in motion practice.
Mistake 5: Ignoring the Exceptions List
It’s the rare e-discovery effort where everything processes without exception. Typically, the
exception list will reflect hundreds or thousands of items that are encrypted, corrupt, unrecognized
or unreadable. You may take a calculated risk to ignore certain exceptional items; but too often,
exceptions are misclassified as benign or dismissed altogether. That’s boneheaded.
Ed Fiducia, Regional Vice President for EDD vendor Inventus, offered a sixth and seventh for the
bonehead mistakes list:
Mistake 6: Assuming That Deduplication Solves My Problem
Ed pointed to the limits of using hashing to identify truly duplicative files. “The rub is the definition of
a truly duplicative file.”
For example, e-mail messages sent to multiple addressees won’t deduplicate across custodians
because each message reflects its unique message ID and delivery path. Word and PDF versions
of the same document won’t hash deduplicate because they’re different file formats.
Hashing leaves “thousands upon thousands of near duplicates that must be identified and reviewed.
This leads to not only a dramatic increase in review costs, but a dramatic increase in the probability
that documents will be coded inconsistently. Spend more money, get worse results. Not a good
combination.”
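The limit Ed describes can be seen in a minimal Python sketch using the standard hashlib module. Two copies of one message differ only in a delivery field, so hashing the whole item yields different values, while hashing a chosen subset of fields yields a match. The field names, the choice of fields to compare and the sample addresses are assumptions for illustration; they are not any vendor's deduplication method.

```python
import hashlib

def serialize(msg):
    """Flatten every field of a message into bytes, as a whole-item hash sees it."""
    return "\n".join(f"{key}: {msg[key]}" for key in sorted(msg)).encode("utf-8")

def file_hash(msg):
    """Hash the entire item, delivery details included."""
    return hashlib.sha256(serialize(msg)).hexdigest()

def content_hash(msg, fields=("from", "subject", "body")):
    """Hash only selected fields (a hypothetical choice), ignoring delivery data."""
    canonical = "\n".join(msg[f].strip().lower() for f in fields)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# One message as it sits in two custodians' mailboxes (placeholder addresses).
shared = {"from": "ceo@corp.example", "subject": "Q3 forecast",
          "body": "Numbers attached."}
copy_for_ann = dict(shared, delivered_to="ann@corp.example")
copy_for_bob = dict(shared, delivered_to="bob@corp.example")

same_file = file_hash(copy_for_ann) == file_hash(copy_for_bob)           # False
same_content = content_hash(copy_for_ann) == content_hash(copy_for_bob)  # True
```

Which fields belong in the comparison is a judgment call to be documented and, ideally, agreed upon, because a looser definition of "duplicate" suppresses more items.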
Mistake 7: Reviewing Fifty Custodians When Five Will Do
Ed Fiducia: “Preserve everything? You bet! Review everything? Not in my book.”
“The knee jerk reaction is to blame plaintiffs’ attorneys who ask for everything. Equal responsibility
goes to defense attorneys who don’t negotiate the process from the start in meet and confer. As a
service provider, you’d think I’d push to process and review everything; but over the past 18 years,
I’ve seen case after case prove that if the scope of e-discovery is limited from the start--with caveats
to allow for additional discovery when warranted--everybody wins.”
Dave Swider, Senior Discovery Consultant for Evolve Discovery, contributed:
Mistake 8: Failing to Search for Common Name Variations
“Here’s one we see pretty often: Searching for names without anticipating variations. We’ll see a
search for ‘Robert Smith’ with no variations specified; no Rob, Bob, Bobby, Robby, not even an e-mail
address.”
“Similarly, we’ll be asked to search for a complete law firm name: all five names as an exact string,
with no domain or proximity search.”
Too, “we see use of wildcards and terms that are far too expansive…. I worked on a case that
involved laying one material on top of another in a process called ‘deposition.’ Guess what term
appeared on the potential privileged terms list? A common offender in groundwater cases is ‘well.’”
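Where a tool supports regular expressions, name variations can be folded into a single pattern. The Python sketch below uses the standard re module; the helper name and the e-mail address are hypothetical, and the syntax accepted by a given e-discovery platform may differ.

```python
import re

def build_name_search(first_names, last_name, addresses=()):
    """Compile one case-insensitive pattern covering name variants and addresses."""
    firsts = "|".join(re.escape(n) for n in first_names)
    parts = [rf"\b(?:{firsts})\s+{re.escape(last_name)}\b"]
    parts += [re.escape(a) for a in addresses]
    return re.compile("|".join(parts), re.IGNORECASE)

pattern = build_name_search(
    ["Robert", "Rob", "Bob", "Bobby", "Robby"],
    "Smith",
    addresses=["rsmith@corp.example"],  # placeholder address
)

found_nickname = bool(pattern.search("Please copy Bob Smith on this."))
found_address = bool(pattern.search("From: RSmith@corp.example"))
found_other = bool(pattern.search("Roberta Smith will attend."))
```

The word boundaries keep the pattern from matching "Roberta Smith," the same whole-word discipline that keeps a term like "well" from matching "wellness."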
Marc Hirschfeld, President of Precision Legal Services, added:
Mistake 9: Neglecting to Run Searches Against File and Folder Names
“Here is one that I never see attorneys talk about…. I often find a treasure trove of information when
the name of a folder holding relevant information includes a search word but the documents inside
do not. It’s as if the user pre-identified these documents as relevant but, because the file and folder
names weren’t indexed or searched, the treasure is missed.”
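A short Python sketch shows how much turns on whether the path is searched along with the text. The item layout, the function name and the sample paths are assumptions for illustration only.

```python
def search(items, term, include_paths=True):
    """Return paths of items whose text (and, optionally, path) contains the term."""
    needle = term.lower()
    hits = []
    for item in items:
        in_text = needle in item["text"].lower()
        in_path = include_paths and needle in item["path"].lower()
        if in_text or in_path:
            hits.append(item["path"])
    return hits

# Hypothetical items: the folder name carries the keyword, the document does not.
items = [
    {"path": "Projects/Merger/notes1.docx",
     "text": "Call with bankers moved to Tuesday."},
    {"path": "Personal/recipes.docx", "text": "Add more garlic."},
]

text_only = search(items, "merger", include_paths=False)
with_paths = search(items, "merger")
```

The text-only search returns nothing, while the search that includes file and folder names finds the document its author had already filed under "Merger."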
Ann Marie Gibbs, National Director of Consulting at Daegis, offered:
Mistake 10: Failing to Rapidly React to the Problems You Encounter
“Another review oversight we see is a failure to ‘update’ the review set when a ‘false hit’ is running
up the review bill. This relates to the mistake where a client declines to accept excellent advice on
search selection criteria. If you can’t get them to understand the problem on the front end, you have
a second bite at the apple on the back-end.”
Dave Swider sums it up: “The number one boneheaded move by legal staff is simply not bothering
to understand how data works and how they can best apply tools that will make their outcomes
better. Our best clients are those that treat data not like documents, but like data.”
About the Author
CRAIG BALL
ESI Special Master and Attorney
Computer Forensic Examiner
Author and Educator
3723 Lost Creek Blvd.
Austin, Texas 78735
E-mail: [email protected]
Web: craigball.com
Blog: ballinyourcourt.com
Lab: 512-514-0182
Mobile: 713-320-6066
Craig Ball is a trial lawyer, certified computer forensic examiner, law professor and electronic evidence expert. He's
dedicated his career to teaching the bench and bar about forensic technology and trial tactics. After decades trying lawsuits,
Craig limits his practice to service as a court-appointed special master and consultant in computer forensics and e-discovery.
A prolific contributor to educational programs worldwide--having delivered over 1,650 presentations and papers--Craig’s
articles on forensic technology and electronic discovery frequently appear in the national media. For nine years, he wrote
the award winning column on computer forensics and e-discovery for American Lawyer Media called "Ball in your Court."
Craig Ball has served as the Special Master or testifying expert on computer forensics and electronic discovery in some of
the most challenging, front page cases in the U.S.
EDUCATION
Rice University (B.A., 1979, triple major); University of Texas (J.D., with honors, 1982); Oregon State University (Computer
Forensics certification, 2003); EnCase Intermediate Reporting and Analysis Course (Guidance Software 2004); WinHex
Forensics Certification Course (X-Ways Software Technology 2005); Certified Data Recovery Specialist (Forensic Strategy
Services 2009); Nuix Certified E-Discovery Specialist (2014); numerous other classes on computer forensics and electronic
discovery.
SELECTED PROFESSIONAL ACTIVITIES
Law Offices of Craig D. Ball, P.C.; Licensed in Texas since 1982.
Board Certified in Personal Injury Trial Law by the Texas Board of Legal Specialization 1988-2015
Certified Computer Forensic Examiner, Oregon State University and NTI
Certified Computer Examiner (CCE), International Society of Forensic Computer Examiners
Certified Data Recovery Specialist
Certified E-Discovery Specialist (Nuix)
Faculty, University of Texas School of Law, Adjunct Professor teaching Electronic Discovery & Digital Evidence
Faculty, Georgetown University Law Center, E-Discovery Training Academy
Admitted to practice U.S. Court of Appeals, Fifth Circuit; U.S.D.C., Southern, Northern and Western Districts of Texas.
Board Member, Georgetown University Law Center Advanced E-Discovery Institute and E-Discovery Academy
Board Member, International Society of Forensic Computer Examiners (agency certifying computer forensic examiners)
Member, Sedona Conference WG1 on Electronic Document Retention and Production
Member, Educational Advisory Board for LegalTech (largest annual legal technology event)
Member, Maryland Committee on Federal E-Discovery Guidelines, 2014
Special Master, Electronic Discovery, numerous federal and state tribunals
Instructor in Computer Forensics and Electronic Discovery, United States Department of Justice
Lecturer/Author on Electronic Discovery for Federal Judicial Center and Texas Office of the Attorney General
Instructor, HTCIA Annual 2010, 2011 Cybercrime Summit, 2006, 2007; SANS Instructor 2009, PFIC 2010, CEIC 2011, 2012
Special Prosecutor, Texas Commission for Lawyer Discipline, 1995-96
Council Member, Computer and Technology Section of the State Bar of Texas, 2003-date
Chairman: Technology Advisory Committee, State Bar of Texas, 2000-02
President, Houston Trial Lawyers Association (2000-01); President, Houston Trial Lawyers Foundation (2001-02)
Director, Texas Trial Lawyers Association (1995-2003); Chairman, Technology Task Force (1995-97)
Member, High Technology Crime Investigation Association and International Information Systems Forensics Assn.
Member, Texas State Bar College
Member, Continuing Legal Education Comm., 2000-04, Civil Pattern Jury Charge Comm., 1983-94, State Bar of Texas
Life Fellow, Texas and Houston Bar Foundations
Adjunct Professor, South Texas College of Law, 1983-88
Selected Publications available at www.craigball.com
Matters in Which Craig Ball has Served as a Court Appointed Special Master or Neutral
or Testified as an Expert or in Connection with Computer Forensics/Electronic Evidence
1. Meyer v. Brown; Harris County, TX, Judge Baker; (Court’s Neutral)
2. In Re: Enron and Arthur Andersen Secs. Litigation; USDC SDTX (Lead Plaintiff’s Counsel’s ESI expert)
3. In Re: Tyco Securities Litigation; USDC NH (Lead Plaintiff’s Counsel’s ESI Expert)
4. American Express v. Americap; USDC SDTX (Court’s Special Master)
5. TXU v. Whittaker et al.; 151st Harris County, TX (Court’s Special Master)
6. Miller et al. v. Highland Medical Center; 295th JDC, Harris County, TX (Plaintiff’s Counsel’s Expert)
7. Barnes v. Kissner; 190th JDC; Harris County, TX (Court’s Neutral)
8. BP Texas City Explosion Litigation, Galveston, TX (Joint Prosecution Group’s Expert)
9. Chart Industries v. Runyan and Applied Hydrocarbon Systems; USDC SDTX (Plaintiff’s Expert)
10. Key Energy v. Crisp; USDC Midland, TX (Plaintiff’s Counsel’s Expert)
11. Broussard v. Dunlap; 190th Harris County, TX (Court’s Neutral)
12. State Bar of Texas v. [Attorneys Under Investigation]; TX Office of the Disciplinary Counsel
13. In Re: Flowserve Securities Litigation; USDC NDTX (Lead Plaintiff’s Counsel’s Expert)
14. Grooms v. Montelaro; 295th, Harris County, TX (Court’s Special Master)
15. Luk v. Eisner; 11th, Harris County, TX (Defense Counsel’s Expert)
16. MJCM, LLC. v. Floyd and Associates; Harris County, TX (Court’s Neutral)
17. PowerTrain v. American Honda; USDC NDMS (Hybrid Appointment)
18. Shue v. USAA et al.; Kendall County, TX (Court’s Special Master)
19. In Re: Sirna Therapeutics Litigation; USDC NDCA (Defense Counsel’s Expert)
20. Yeh v. McDougal; 333rd Harris County, TX (Court’s Neutral)
21. Plus Technologia, SA de CV v. ACI Worldwide; Pinellas Cty., FL (Plaintiff’s Counsel’s Expert)
22. Anadarko Petroleum v. Geosouthern Energy; USDC SDTX (Hybrid/Court’s Neutral)
23. ASC v. SCI; Ft. Bend County, TX (Court’s Neutral by Stipulation)
24. Katrina Canal Breaches Consolidated Litigation; USDC EDLA (Court’s Neutral)
25. Sellar v. Boecker; Harris County, TX (Court’s Neutral)
26. In Re: Seroquel Products Liability Litigation; USDC MDFL (Court’s Special Master-ESI)
27. Daimler Trucks N.A. LLC v. Younessi; USDC OR (Court's Special Master)
28. MDI v. NaphCare; USDC SDMS (Court's Neutral)
29. Baker Hughes v. Pathfinder; USDC SDTX (Defense Counsel's Expert)
30. Bd. of Comms. of the Port of N.O. v. Lexington Ins. Co. et al.; USDC EDLA (Special Master-ESI)
31. Stewart & Stevenson v. McGuirt; Harris County, TX (Neutral Expert by Stipulation)
32. Fisher et al. v. Halliburton et al.; USDC SDTX (Plaintiff Counsel's Expert)
33. Aquamar S.A. v. E.I. Du Pont de Nemours & Co.; Broward County, FL (Plaintiff’s Counsel’s Expert)
34. AmWINS Brokerage of Texas, Inc. v. Hildebrand; Collin County, TX (Neutral by Agreement)
35. Arthur v. Stern; Harris County, TX (Court's Special Master in computer forensics)
36. Duke Energy Int'l, LLC et al. v. Napoli; Harris County, TX (Court's Special Master)
37. Austin Capital Mgmt. v. Balthrop; USDC WDTX (Court's Special Master in computer forensics)
38. Grace et al. v. DRS Sensors & Targeting Systems, Inc.; USDC MDFL (Defense Counsel's Expert)
39. Peironnet et al. v. Matador Resources Co. et al.; Caddo Parish, LA (Court's Neutral)
40. Camp Mystic, Inc. et al. v. Eastland et al.; Kerr County, TX (Defense Counsel's Expert)
41. Maggette, Jacobs et al. v. BL Development et al.; USDC NDMS (Court's Special Master)
42. Ridha et al. v. Texas A&M University et al.; USDC SDTX (Defense Counsels' Expert)
43. In re: CityCenter Construction Litigation, Clark County, NV (Court’s Special Master for ESI)
44. In re: Bernard L. Madoff Investment Services Litigation; Bankruptcy Court SDNY (Trustee’s Expert for ESI)
45. Lexington v. Estate of John O’Quinn, Deceased; Probate Ct 2, Harris County, TX (Court’s Neutral Examiner)
46. Allison et al. v. Exxon Mobil Corp.; Circuit Court Baltimore County, MD (Court’s Special Master-ESI)
47. PIC Group v. LandCoast, Inc.; USDC SDMS (Court’s Special Master)
48. SSC, et al. v. Halberdier, et al.; Harris County, TX (Neutral by Agreement)
49. Houlahan v. WWASPS; USDC DDC (Court’s Neutral)
50. M-I L.L.C. v. Stelly et al.; USDC SDTX (Court’s Neutral)
51. Coyote Springs Inv. v. Pardee Homes; Clark County, NV (Court’s Special Master)
52. Segner v. Sinclair Oil & Gas; USDC NDTX (Court’s Special Master)
53. Adams Golf v. Reed and Callaway Golf; 296th, Collin County, TX (Court’s Special Master)
54. Elliott v. Tetlow and MCO-I; USDC SDTX (Court’s Special Master)
55. 12001 Beamer, Ltd. v. Valtasaros; 295th, Harris County, TX (Court’s Special Master)
56. William A. Sawyer v. Frank Gabrysch et al.; 269th, Harris County, TX (Court’s Special Master)
57. Bridges et al. v. GES et al.; 164th, Harris County, TX (Court’s Special Master)
58. Ramirez v. State Farm Lloyds; 206th, Hidalgo County, TX (Plaintiffs’ Counsel’s Expert)
59. In re: Forest Research Institute Cases, USDC DNJ (Plaintiffs’ Counsel’s Expert)
60. Radcliffe v. Tidal Petroleum; 218th, LaSalle County, TX (Court’s Special Master)
61. Estate of Henry G. McMahon, Jr.; Travis County, TX (Court’s Special Master)
62. Samame d/b/a Alamo Packing v. Arco Iris et al.; USDC WDTX (Court’s Special Master)
63. Huerta/Kodish v. BASF; Circuit Court, Cook County, IL (Court’s E-Discovery Mediator)
64. EPAC Technologies v. Thomas Nelson, Inc.; USDC MDTN (Court’s Special Master)