Email Archiving
Arvind Srinivasan, Gaurav Baone
RSCTC 2008
© 2008 ZL Technologies, Inc.

Imagine this is what happens to your business records at the end of every month …

If this looks absurd … that's exactly what we do to email!
– Practically every major transaction, project, and contract is recorded in email
– Regulators now treat email like hard-copy records: SEC 17a-4, FDA 21 CFR 11, NASD 3010 and 3110, DoD 5015.2, HIPAA, Sarbanes-Oxley
– And the courts agree (FRCP, Dec 2006)
– Non-compliance fines and legal liabilities are rising …

Just How Much Scalability Does Archiving Require?
– Assume: 25,000 employees averaging 70 mails/day, 7 years retention
– 4.47 billion emails for the archive system to index and search
– versus 4.28 billion web pages indexed by Google (source: Google press release, Feb 17, 2004)
– Functionality needs to scale to these volumes

ZLTI Unified Archival
ZL Technologies, Inc.

Outline
– Email Capture Methods
– Business Drivers
– Archive Functionality
– Retention & Deletion
– Surveillance & Compliance
– E-Discovery
– Conclusion

CONFIDENTIAL

Email Capture Methods
Active Capture Methods – PRO-ACTIVE Archiving
– Journaling
– Mailbox crawling
– SMTP Gateway Capture
Historical Capture Methods – REACTIVE Archiving
– Restore from backup tapes
– Crawl for PST / NSF files from desktops
– Forensic captures

Journaling – 100% Capture

Mailbox Crawling – Policy Based

Reactive Archiving

Not Just Email
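The scalability estimate given earlier (25,000 employees, 70 mails/day, 7 years of retention) can be checked with a few lines of arithmetic. This is a minimal sketch; the 365-day year is an assumption, since the slide does not state how many mail days per year it used, and the function name is purely illustrative.

```python
# Back-of-the-envelope archive volume estimate.
EMPLOYEES = 25_000
MAILS_PER_EMPLOYEE_PER_DAY = 70
RETENTION_YEARS = 7
DAYS_PER_YEAR = 365  # assumption: mail arrives every calendar day


def archived_emails(employees: int, mails_per_day: int, years: int,
                    days_per_year: int = DAYS_PER_YEAR) -> int:
    """Total messages the archive must index over the retention period."""
    return employees * mails_per_day * days_per_year * years


total = archived_emails(EMPLOYEES, MAILS_PER_EMPLOYEE_PER_DAY, RETENTION_YEARS)
print(f"{total:,} emails (~{total / 1e9:.2f} billion)")
```

Under that assumption the product is 4,471,250,000 messages, which agrees with the 4.47 billion figure quoted on the slide.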
Primary Business Drivers - Regulations and Laws
– SEC 17a-4
– Gramm-Leach-Bliley Act
– HIPAA
– CA SB1386
– NASD 3011
– Hedge Funds Rule 203(b)
– Mutual Funds Rule 38a-1
– UK Freedom of Information Act
– Canada PIPEDA
– NASD 3010
– Basel II
– Sarbanes-Oxley Act
– Investment Advisors Act
– US Freedom of Information Act
– Florida Sunshine Law
– FRCP
– Japan Personal Information Protection Act
– DoD 5015.2

Functional Requirements
– Retention
– Surveillance and Compliance
– E-Discovery
Common Theme - Classification

Retention & Deletion
Conflicting Requirements:
– Laws & Regulation => Retain for "x" years
– vs. Company Liability/Risk and Cost

Retention Periods and Policies
Regulation | Type of Record | Retention Period
Age Discrimination in Employment Act | Hiring Documents | One year from date of decision
Fair Labor Standards | Payroll, sales and Personal Records | Three Years
Rehabilitation Act | Handicap discrimination Records | Three Years
Civil Rights Act | Records | One Year
Occupational Safety and Health Act | Health Records | 30 Years

Real-time Categorization of Mail
– Sender/Recipients
– Content (subject, body, attachment)
– User Input (which folder it was found in, manual tagging)

Retention & Deletion (cont'd)
– "A priori" and "a posteriori" based retention
– Event Driven – Deletion of mail from user folder, reclassification of mail by end user
– Legal Hold – Court orders to retain evidence relating to certain subject matters
Single Instance Storage
– Same email in multiple mailboxes
– Same attachment in multiple emails
– Significant storage savings
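Single instance storage, as described above, can be illustrated with a content-addressed store: each distinct email body or attachment is kept once and identified by a hash of its bytes. The sketch below is a simplified in-memory illustration only; the class and method names are hypothetical, and the use of SHA-256 is an assumption, not something the slides specify.

```python
import hashlib


class SingleInstanceStore:
    """Keeps one physical copy per distinct content, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}  # digest -> stored content
        self._refs: dict[str, int] = {}     # digest -> number of logical copies

    def put(self, content: bytes) -> str:
        """Store content if unseen; always count the reference. Returns the digest."""
        digest = hashlib.sha256(content).hexdigest()
        if digest not in self._blobs:
            self._blobs[digest] = content
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def stored_bytes(self) -> int:
        """Bytes physically kept (one copy per distinct content)."""
        return sum(len(blob) for blob in self._blobs.values())

    def logical_bytes(self) -> int:
        """Bytes that would be kept without deduplication."""
        return sum(len(self._blobs[d]) * n for d, n in self._refs.items())


# The same attachment arriving in three different emails is stored once.
store = SingleInstanceStore()
attachment = b"quarterly report" * 1000
keys = [store.put(attachment) for _ in range(3)]
print(store.stored_bytes(), "bytes stored for", store.logical_bytes(), "logical bytes")
```

A real archive would also have to decide what counts as "the same email" (for example, whether per-recipient headers are part of the hashed content); that choice is outside what the slides describe.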
Surveillance
Conflicting Requirements:
– Regulations require review of documents
– vs. Effort spent reviewing the documents

Examples of Compliance Categories
Category | Content | Action
Adult | Offensive language | Post-Review
Confidential | SSN numbers, bank account numbers | Pre-Review to prevent confidential information from going out
Legal Issues | Words like attorney, charge*. Phrases like breach* and agreement within 6 words | Post/Pre Review
Compliance | Hype; "Stocks" and "sell" within 3 words of each other | Pre-Review in Financial Industries

Real-time Flagging of Mail
– Lexical Based – Key words, word associations, wild-cards
– Policy Based – e.g. mail from WallStreetJournal.com is a newsletter
– Custom Code – Detect vacation responses, read receipts, DSNs

Surveillance (cont'd)
– Real-time flagging is a categorization problem
– Current systems suffer from a lot of false positives
– Transparent and deterministic rules are preferred over black boxes
– Disclaimers (internal and external) tend to get flagged because they contain the very terms that we try to flag
– Use reviewer feedback to adapt the rules

E-Discovery
Conflicting Requirements:
– Produce electronic documents to satisfy court orders
– vs. Providing insufficient, not relevant, or privileged information
Discovery Request
– Certain number of custodians
– Date range
– Pertaining to certain subject matter; usually described by a set of search terms

Search Type | Court-dictated Required Search
Full text | "acidosis"
Boolean | "cardiac" OR "respiratory"
Phrase | "in-custody death"
Proximity | "pre-existing" within 10 words of "condition"
Wildcard | "epilep*"
Wildcard proximity | "mental*" within 5 words of "condition"
Dual wildcard proximity | "continu*" within 10 words of "discharg*"
Wildcard sentence-level | "caus*" within same sentence as "death"
Source: Williams v. Taser Int'l, Inc., 2007 WL 1630875 (N.D. Ga. June 4, 2007)

E-Discovery (cont'd)
– Landmark case: Zubulake vs. UBS Warburg (2003)
– Primarily driven by the Federal Rules of Civil Procedure (FRCP) amendments of December 2006
– Litigants are entitled to obtain electronic information from the adverse party
– Voluntary initial disclosures need to be made pertaining to each litigant
– Today, almost all cases have some sort of electronic documents as evidence

E-Discovery (cont'd)
– Parties face sanctions if they do not provide all the relevant documents (numerous precedents, e.g. Metrokane vs Built NY 2008)
– Validation occurs when the receiving party can prove the existence of another document through a hard-copy printout or other means
– Lawyers from both parties routinely negotiate keywords to define search concepts
– Manual review of documents for relevance and privilege
– Numerous products cluster similar documents (near deduplication) so that similar documents are presented to reviewers together, to improve efficiency
– Chain of Custody – To prove that the document has not been tampered with or altered

Palin's e-mail at $15m per request
Cost to retrieve e-mail for 1 mailbox
– 6 hours to assemble email for 1 employee mailbox
– 2 hours for "security" checks
– 5 hours to filter by requested keyword or topic
– 13 total hours per mailbox
– $73.87 hourly rate
– $960.31 cost to retrieve e-mail for 1 mailbox
Cost to retrieve e-mail for all employees
– $960.31 cost to retrieve e-mail for 1 mailbox
– 16,000 full-time employees
– $15.3 million cost to retrieve e-mail for all employees
Price quotes
– NBC's price quote for e-mails sent to Todd Palin: $15 million
– AP's price quote for e-mails between state employees and the campaign headquarters of Sen. John McCain: $15 million
– AP's price quote for e-mails between state employees and the National Park Service: $15 million

Conclusion
– Most challenges in archiving can be reduced to a classification problem
– Segmentation problems: detect internal and external disclaimers
– Detect changes in email behavior through email profile analysis
– Understanding mails: need to develop analysis techniques to understand the contents
– Visualization and grouping of similar mails – Control the order in which mails and documents are viewed
– Consistent way of defining subject matters – Beyond just a set of keywords
– Extract more metadata about attachments such as images, audio and video files
– And all the above are required in multiple languages – English, Japanese, Spanish, Chinese, and others
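The wildcard and proximity search types from the E-Discovery section (for example "mental*" within 5 words of "condition"), which also underlie the lexical flagging rules in the Surveillance section, can be sketched as follows. This is a minimal illustration, not how any archiving product implements search: the function names, the regex tokenizer, and the reading of "within N words" as a token-position difference of at most N are all assumptions.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; hyphens and apostrophes inside a word are kept."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def term_matches(token: str, pattern: str) -> bool:
    """Match one token against a term that may contain '*' wildcards."""
    return fnmatchcase(token, pattern.lower())


def within(text: str, term_a: str, term_b: str, max_words: int) -> bool:
    """True if some match of term_a is at most max_words tokens from a match of term_b."""
    tokens = tokenize(text)
    pos_a = [i for i, tok in enumerate(tokens) if term_matches(tok, term_a)]
    pos_b = [i for i, tok in enumerate(tokens) if term_matches(tok, term_b)]
    return any(i != j and abs(i - j) <= max_words for i in pos_a for j in pos_b)


# Usage with terms taken from the search-type table.
print(within("The patient had a pre-existing heart condition",
             "pre-existing", "condition", 10))
print(within("He continued treatment until discharge", "continu*", "discharg*", 10))
```

Rules of this kind are transparent and deterministic, which the Surveillance section names as a preference over black-box classifiers; they are also exactly the "set of keywords" style of definition that the conclusion says needs to be moved beyond.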