Master - rsctc 2008

Email Archiving
Arvind Srinivasan
Gaurav Baone
RSCTC 2008
© 2008 ZL Technologies, Inc.
Imagine this is what happens
to your business records
at the end of every month ….
If this looks absurd …
That’s exactly what we do to email!
Practically every major transaction, project, and
contract is recorded in email
Regulators now treat email like hard copy records
SEC 17a-4
FDA 21 CFR 11
NASD 3010, 3110
DoD 5015.2
HIPAA
Sarbanes-Oxley
And the courts agree (FRCP, Dec 2006)
Non-compliance fines and legal liabilities are rising . . .
ZipLip, Inc.
Just How Much Scalability Does Archiving Require?
Assume:
25,000 Employees averaging 70 mails/day
7 Years Retention
4.47 Billion Emails For Archive System To Index & Search
versus
4.28 Billion Web-Pages Indexed by Google
source: Google Press Release, Feb 17, 2004
Functionality needs to scale to these volumes
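The 4.47 billion figure follows from the three assumptions on this slide. A minimal sketch of the arithmetic, assuming mail arrives on all 365 days of the year (the function name and the `days_per_year` parameter are illustrative, not from the slide):

```python
def archived_emails(employees: int, mails_per_day: int, retention_years: int,
                    days_per_year: int = 365) -> int:
    """Total messages the archive must index over the retention window."""
    return employees * mails_per_day * days_per_year * retention_years


total = archived_emails(employees=25_000, mails_per_day=70, retention_years=7)
print(f"{total:,} emails (~{total / 1e9:.2f} billion)")
```

Counting only working days would lower the total, so the 365-day assumption is what reproduces the slide's number.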
ZLTI Unified Archival
ZL Technologies, Inc.
Outline
Email Capture Methods
Business Drivers
Archive Functionality
Retention & Deletion
Surveillance & Compliance
E-Discovery
Conclusion
CONFIDENTIAL
Email Capture Methods
• Active Capture Methods – PRO-ACTIVE Archiving
– Journaling
– Mailbox crawling
– SMTP Gateway Capture
• Historical Capture Methods – REACTIVE Archiving
– Restore from backup tapes
– Crawl for PST / NSF files from desktops
– Forensic captures
Journaling – 100% Capture
Mailbox Crawling – Policy Based
Reactive Archiving
Not Just Email
Primary Business Drivers - Regulations and Laws
SEC 17a-4
Gramm-Leach-Bliley Act
HIPAA
CA SB1386
NASD 3011
Hedge Funds Rule 203(b)
Mutual Funds Rule 38a-1
UK Freedom of Information Act
Canada PIPEDA
NASD 3010
Basel II
Sarbanes-Oxley Act
Investment Advisers Act
US Freedom of Information Act
Florida Sunshine Law
FRCP
Japan Personal Information Protection Act
DoD 5015.2
Functional Requirements
• Retention
• Surveillance and Compliance
• E-Discovery
• Common Theme - Classification
Retention & Deletion
Conflicting Requirements:
• Laws & Regulation => Retain for “x” years.
• Vs
• Company Liability/Risk and Cost
Retention Periods and Policies
Regulation | Type of Record | Retention Period
Age Discrimination in Employment Act | Hiring Documents | One year from date of decision
Fair Labor Standards | Payroll, sales and Personal Records | Three Years
Rehabilitation Act | Handicap discrimination Records | Three Years
Civil Rights Act | Records | One Year
Occupational Safety and Health Act | Health Records | 30 Years
Real-time Categorization of Mail
• Sender/Recipients
• Content (Subject, body, attachment)
• User Input (Which folder it was found, Manual Tagging)
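The three categorization signals listed above (user input, sender/recipients, content) can be combined into a retention decision. The sketch below is only an illustration: the `Mail` class, the category names, the keyword rules, and the `default` period are hypothetical. Only the 1, 3, and 30 year periods come from the retention table.

```python
from dataclasses import dataclass, field


@dataclass
class Mail:
    sender: str
    recipients: list[str]
    subject: str
    body: str
    folder: str = "Inbox"
    tags: set[str] = field(default_factory=set)


# Hypothetical policy table: category -> retention period in years.
RETENTION_YEARS = {"hiring": 1, "payroll": 3, "health": 30, "default": 1}


def categorize(mail: Mail) -> str:
    # 1. User input: a manual tag or the folder the mail was filed in.
    for category in RETENTION_YEARS:
        if category in mail.tags or mail.folder.lower() == category:
            return category
    # 2. Sender/recipients: an assumed role-mailbox convention.
    addresses = [mail.sender, *mail.recipients]
    if any(a.lower().startswith("payroll@") for a in addresses):
        return "payroll"
    # 3. Content: naive keyword checks on subject and body.
    text = f"{mail.subject} {mail.body}".lower()
    if "job offer" in text or "interview" in text:
        return "hiring"
    if "medical" in text or "exposure" in text:
        return "health"
    return "default"


mail = Mail("payroll@example.com", ["ann@example.com"], "March pay stubs", "Attached.")
category = categorize(mail)
print(category, RETENTION_YEARS[category])
```

Letting user input take precedence is a design choice made for this sketch; a real policy engine would also have to resolve conflicts between categories, for example by keeping the longest applicable period.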
Retention & Deletion (cont’d)
• "a priori" and "a posteriori" based Retention.
• Event Driven – Deletion of mail from user folder, Reclassification of mail by end user
• Legal Hold – Court orders to retain evidence relating to certain subject matters.
• Single Instance Storage
– Same Email in Multiple Mailboxes
– Same Attachment in Multiple Emails
– Significant storage savings.
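Single instance storage is commonly built on content addressing: identical bytes are stored once and referenced many times. A minimal sketch, assuming SHA-256 as the content key (the slides do not say which mechanism the product uses, and the class name is hypothetical):

```python
import hashlib


class SingleInstanceStore:
    """Keeps one copy of each distinct blob, keyed by its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}  # digest -> content bytes
        self._refs = {}   # digest -> number of mails referencing the blob

    def put(self, content: bytes) -> str:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = content  # first copy is stored
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def ref_count(self, digest: str) -> int:
        return self._refs.get(digest, 0)

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
attachment = b"quarterly-report " * 1000
keys = [store.put(attachment) for _ in range(3)]  # same attachment in 3 emails
print(store.stored_bytes(), store.ref_count(keys[0]))
```

The reference count matters for retention: a shared blob can only be deleted once every mail that points to it has expired and none is under legal hold.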
Surveillance
Conflicting Requirements:
• Regulations require review of documents
• Vs
• Effort spent reviewing the documents.
Examples of Compliance Categories
Category | Content | Action
Adult | Offensive language | Post-Review
Confidential | SSN Numbers, Bank Account numbers | Pre-Review to prevent confidential information from going out
Legal Issues | Words like attorney, charge*. Phrases like breach* and agreement within 6 words | Post/Pre Review
Compliance Hype | Stocks and sell within 3 words of each other | Pre-Review in Financial Industries
Real-time Flagging of Mail
• Lexical Based – Key words, word associations, wild-cards
• Policy Based – E.g. mail from WallStreetJournal.com is a newsletter.
• Custom Code – Detect Vacation Responses, Read Receipts, DSNs
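The lexical rules in the category table (keywords, trailing wildcards, word proximity) are deterministic and easy to express in code. The sketch below is an illustration under stated assumptions: "within N words" is read as a token-position distance of at most N, the `stock*` pattern stands in for "Stocks", and all function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def matches(token: str, pattern: str) -> bool:
    """'charge*' matches any token with that prefix; otherwise exact match."""
    if pattern.endswith("*"):
        return token.startswith(pattern[:-1])
    return token == pattern


def within(tokens: list[str], a: str, b: str, max_gap: int) -> bool:
    """True if some token matching a is at most max_gap positions from one matching b."""
    pos_a = [i for i, t in enumerate(tokens) if matches(t, a)]
    pos_b = [i for i, t in enumerate(tokens) if matches(t, b)]
    return any(i != j and abs(i - j) <= max_gap for i in pos_a for j in pos_b)


def flag(text: str) -> set[str]:
    tokens = tokenize(text)
    flags = set()
    if within(tokens, "stock*", "sell", 3):
        flags.add("Compliance Hype")
    keyword_hit = any(matches(t, p) for t in tokens for p in ("attorney", "charge*"))
    if keyword_hit or within(tokens, "breach*", "agreement", 6):
        flags.add("Legal Issues")
    return flags


print(flag("You should sell this stock today"))
print(flag("This is a breach of our agreement"))
print(flag("Lunch at noon?"))
```

Because every rule is explicit, a reviewer can see exactly why a message was flagged, which is the transparency argument made on the next slide.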
Surveillance (cont’d)
• Real-time Flagging is a categorization problem
• Current systems suffer from a lot of false positives.
• Transparent and deterministic rules are preferred over black boxes.
• Disclaimers (internal and external) tend to get flagged because they contain the very terms that we try to flag.
• Use Reviewer feedback to adapt the rules.
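One simple way to reduce disclaimer-driven false positives is to strip known disclaimer text before the flagging rules run. The sketch below assumes disclaimers are known in advance and match exactly; the disclaimer wording, keyword list, and function names are all hypothetical.

```python
# Hypothetical list of disclaimers an organisation appends to outgoing mail.
KNOWN_DISCLAIMERS = [
    "This e-mail may contain privileged attorney-client communication.",
]


def strip_disclaimers(body: str, disclaimers=KNOWN_DISCLAIMERS) -> str:
    for text in disclaimers:
        body = body.replace(text, "")
    return body.strip()


def is_flagged(body: str, keywords=("attorney",)) -> bool:
    lowered = body.lower()
    return any(keyword in lowered for keyword in keywords)


mail = ("See you at lunch.\n\n"
        "This e-mail may contain privileged attorney-client communication.")
print(is_flagged(mail))                     # True: the disclaimer trips the rule
print(is_flagged(strip_disclaimers(mail)))  # False: only the real content is checked
```

Exact matching breaks as soon as a disclaimer is reworded or reflowed, which is why detecting disclaimers in general is a segmentation problem rather than a lookup.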
E-Discovery
Conflicting Requirements:
• Produce electronic documents to satisfy court orders
• Vs.
• Providing insufficient, irrelevant, or privileged information
Discovery Request
• Certain number of custodians
• Date Range
• Pertaining to certain subject matter; usually described by a set of search terms.
Search Type | Court-dictated Required Search
Full text | "acidosis"
Boolean | "cardiac" OR "respiratory"
Phrase | "in-custody death"
Proximity | "pre-existing" within 10 words of "condition"
Wildcard | "epilep*"
Wildcard proximity | "mental*" within 5 words of "condition"
Dual wildcard proximity | "continu*" within 10 words of "discharg*"
Wildcard sentence-level | "caus*" within same sentence as "death"
Source: Williams v. Taser Int’l, Inc., 2007 WL 1630875 (N.D. Ga. June 4, 2007)
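A discovery request combines three filters: custodians, a date range, and search terms. The sketch below shows that combination with one of the harder search types from the table, the sentence-level wildcard search. The `ArchivedMail` class, the sample data, and the punctuation-based sentence splitting are assumptions; a production index would use a proper tokenizer and sentence detector.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchivedMail:
    custodian: str
    sent: date
    text: str


def has_term(tokens: list[str], pattern: str) -> bool:
    """A trailing '*' is a prefix wildcard, e.g. 'caus*' matches 'caused'."""
    if pattern.endswith("*"):
        return any(t.startswith(pattern[:-1]) for t in tokens)
    return pattern in tokens


def same_sentence(text: str, a: str, b: str) -> bool:
    # Naive sentence split on ., ! and ? (abbreviations will split wrongly).
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"[a-z0-9'-]+", sentence)
        if has_term(tokens, a) and has_term(tokens, b):
            return True
    return False


def discover(mails, custodians, start, end, predicate):
    return [m for m in mails
            if m.custodian in custodians
            and start <= m.sent <= end
            and predicate(m.text)]


mails = [
    ArchivedMail("alice", date(2005, 3, 1), "The report lists what caused the death. No other findings."),
    ArchivedMail("bob", date(2005, 3, 2), "Cause unknown. The death was reported later."),
    ArchivedMail("carol", date(2005, 3, 3), "The device caused his death."),
]
hits = discover(mails, {"alice", "bob"}, date(2005, 1, 1), date(2005, 12, 31),
                lambda text: same_sentence(text, "caus*", "death"))
print([m.custodian for m in hits])
```

Only the first message is returned: the second has the two terms in different sentences, and the third belongs to a custodian outside the request.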
E-Discovery (cont’d)
• Landmark case Zubulake vs. UBS Warburg (2003)
• Primarily driven by the Federal Rules of Civil Procedure (FRCP) amendments that took effect in December 2006.
• Litigants are entitled to obtain electronic information from the adverse party.
• Voluntary Initial Disclosures need to be made pertaining to each litigant
• Today, almost all cases have some sort of electronic documents as evidence.
E-Discovery (cont’d)
• Parties face Sanctions if they do not provide all the relevant documents.
(Numerous precedents, e.g. Metrokane vs Built NY, 2008.) Validation occurs when the receiving party can prove the existence of another document through a hard-copy printout or other means.
• Lawyers from both parties routinely negotiate keywords to define Search Concepts
• Manual Review of Documents for Relevance and Privilege.
Numerous products cluster similar documents (near-deduplication) and present them together to reviewers to improve efficiency.
• Chain of Custody – To prove that the document has not been tampered with or altered.
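A common building block for chain of custody is a cryptographic fingerprint recorded at capture time and re-checked at production time. A minimal sketch, assuming SHA-256 and a hypothetical record layout (the slides do not describe the product's actual mechanism):

```python
import hashlib
from datetime import datetime, timezone


def fingerprint(raw_message: bytes) -> str:
    return hashlib.sha256(raw_message).hexdigest()


def capture(raw_message: bytes) -> dict:
    """Record made when the message enters the archive."""
    return {
        "sha256": fingerprint(raw_message),
        "captured_at": datetime.now(timezone.utc).isoformat(),
    }


def is_unaltered(raw_message: bytes, record: dict) -> bool:
    """Re-hash the produced message and compare with the capture record."""
    return fingerprint(raw_message) == record["sha256"]


original = b"From: a@example.com\r\nSubject: Q3 contract\r\n\r\nSigned copy attached."
record = capture(original)
print(is_unaltered(original, record))                        # True
print(is_unaltered(original.replace(b"Q3", b"Q4"), record))  # False
```

The hash only proves integrity if the capture record itself is protected, so in practice it would live in write-once storage alongside an audit log of who accessed the message.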
Palin’s e-mail at $15m per request
NBC's price quote for e-mails sent to Todd Palin: $15 million.
AP's price quote for e-mails between state employees and the campaign headquarters of Sen. John McCain: $15 million.
AP's price quote for e-mails between state employees and the National Park Service: $15 million.
Cost to retrieve e-mail for 1 mailbox
6 Hours to assemble email for 1 employee mailbox
2 Hours for “security” checks
5 Hours to filter by requested keyword or topic
13 Total hours per mailbox
$73.87 Hourly rate
$960.31 Cost to retrieve e-mail for 1 mailbox
Cost to retrieve e-mail for all employees
$960.31 Cost to retrieve e-mail for 1 mailbox
16,000 Full-time employees
$15.3 million Cost to retrieve e-mail for all employees
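The per-mailbox and all-employee costs follow directly from the hours and hourly rate on this slide. A short sketch of the arithmetic, using `Decimal` to keep the cents exact (the variable names are illustrative):

```python
from decimal import Decimal

hours = {"assemble mailbox": 6, "security checks": 2, "keyword filtering": 5}
hourly_rate = Decimal("73.87")
employees = 16_000

total_hours = sum(hours.values())        # hours of manual work per mailbox
per_mailbox = total_hours * hourly_rate  # cost for one mailbox
all_mailboxes = per_mailbox * employees  # cost for every employee mailbox

print(f"{total_hours} hours, ${per_mailbox:,.2f} per mailbox, ${all_mailboxes:,.2f} in total")
```

The product comes to about $15.36 million, in line with the slide's $15.3 million and the $15 million quotes, and it is driven entirely by 13 hours of manual work per mailbox.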
Conclusion
• Most challenges in archiving can be reduced to a classification problem.
• Segmentation Problems: Detect internal and external disclaimers
• Detect changes in email behavior through email profile analysis
• Understanding mails: Need to develop analysis techniques to understand the contents
• Visualization and Grouping Similar mails – Control the order in which mails and documents are viewed.
• Consistent way of defining Subject Matters – Beyond just a set of keywords.
• Extract more metadata about attachments such as images, audio and video files.
• And all the above are required in multiple languages – English, Japanese, Spanish, Chinese, and others.