Information Lifecycle Governance
IBM Information Economics for Big Data
© 2012 IBM Corporation

The problem with managing information assets today
• Siloed tools and systems for creating, managing, and sharing documents
• Reactive, one-off approaches to eDiscovery and FOIA requests
• Massive duplication within and across repositories
• Disconnect between the retention schedule and how the Department works
• Everyone keeps everything forever

Massive growth in structured and unstructured content
Worldwide Corporate Data Growth: 80% of data growth is unstructured
Source: IDC, The Digital Universe, 2010

Why do companies keep information?
Enterprise information (percentages based on the CGOC Summit 2012 Survey):
• Subject to Legal Hold: 1% (Hold & Collect Evidence)
• Regulatory Record Keeping: 5% (Retain Records & Dispose)
• Has Business Utility: 25% (Archive for Value & Dispose)
• Everything Else: 69% (Dispose of Unnecessary Data)
Cost & Risk Reduction: enables disposal
Cost Reduction: normalizes the growth curve

StoredIQ Customer Value
• DATA INTELLIGENCE (Identify, Analyze, Act): Dimensional Data Maps; Risk Reduction; Storage Optimization; File Share Clean-up
• INTELLIGENT EDISCOVERY (Litigation Readiness): Legal Hold Notifications; True Early Case Assessment; Intelligent Collections; Review Platform Integration
• INFORMATION GOVERNANCE (Policy Management): Identify Anywhere; Records Retention; Defensible Deletion; Compliance Enforcement
• BUSINESS ANALYTICS (Business Intelligence): Holistic Data Views; Recognize Business Value; Collect Unstructured Data; Vertical BA Tool Integration
ACTIVE INFORMATION PLATFORM
BIG DATA: Archive Platform; ECM; Forensic Images/Tapes; File Servers; Email Servers; Desktops/Mobile; SharePoint & Enterprise Collaboration; Cloud; Social Networks*; Media*

StoredIQ’s rapid solution validation deployment and information visualization enable discovery and in-place governance and disposal

Records Management
Key findings:
• We sampled against only 4 of 625+ record categories and found almost 20,000 matching files, making up 6% of the sample set.
• 20% of all objects found were emails on the file system.
Potential actions:
• Expand to include all 625 record categories.
• All records found would be cross-referenced against age and retention policies and then moved to an archive or retention/collaboration platform as needed, with all relevant tags and retention times.

Data’s Value and Risk
Key findings:
• We were able to quickly create complex information sets, such as finding all documents related to Apples, customer accounts and renewals (3,700+ files, vs. 460 files for Bananas in the same query).
• We used the SIQ social security algorithm and found over 1,450 documents containing SSNs.
• Checking the age of both data sets, we determined that most of this data was well past its prime, with over 50% being 5 years or older.
Potential actions:
• Maintain a full index, allowing for right-sized, rapid and accurate exports and results for any discovery case or need involving unstructured data.
• Expand filters to include credit card data, PII and ePHI as determined by the Customer.
• Data that contained PII and ePHI could be moved to a secure location for proper handling by the security teams or appropriate parties.

Data to Delete or Archive
Key findings:
• Using simple filter sets, we determined that over 30% of the data should be considered for archiving and/or disposal.
• Aged data over 5 years old was spread across various file types and owners.
• Most of this data set did have owners associated with it.
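The deck does not spell out the "simple filter sets" behind the aged-data finding. As a minimal sketch, assuming each indexed object carries a last-accessed date and an owner, an age filter over file metadata could look like the following. The `FileMeta` record, the `years_before` and `aged_candidates` helpers, the sample paths and the reference date are all hypothetical; none of this is StoredIQ's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata record for one indexed object."""
    path: str
    owner: str
    last_accessed: date


def years_before(day, years):
    """Return the calendar date `years` years before `day`."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return day.replace(year=day.year - years, day=28)


def aged_candidates(files, as_of, min_age_years=5):
    """Select files last accessed at least `min_age_years` before `as_of`."""
    cutoff = years_before(as_of, min_age_years)
    return [f for f in files if f.last_accessed <= cutoff]


# Invented sample data, for illustration only.
files = [
    FileMeta("/share/finance/q1_2006.xls", "alice", date(2006, 4, 1)),
    FileMeta("/share/hr/handbook.doc", "bob", date(2011, 9, 15)),
    FileMeta("/share/legal/old_memo.doc", "carol", date(2007, 1, 10)),
]

candidates = aged_candidates(files, as_of=date(2012, 6, 1))
for f in candidates:
    print(f.path, f.owner, f.last_accessed.isoformat())
```

Filtering on metadata alone keeps this step cheap. In the validation described in this deck, the metadata collection took hours while the full-text collection took days.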
Potential actions (Data to Delete or Archive):
• Records, work in progress and relevant data will be excluded from the aged data sets.
• Data beyond its life and usefulness could be moved to a staging area for timely disposal.
• Data still in use but older could be moved to archive platforms or cheaper storage.
• Prohibited data could be deleted outright.

Data Age by Data Source

Solution validation overview – Customer-provided sample data set

Data Sources
• CIFS Collection: Objects – 1,779,557; Size – 461.27 GB; Email Objects – 359,556 (20%); Collection Times: Meta – 4 hours, Full – 48 hours
• SharePoint Collection: Sites – 5; Objects – 1,535; Size – 0.35 GB; Collection Times: Full – 7 minutes

Records: We tested against 4 Customer Records Classifications
• ACC-40040: Objects – 17,938; Size – 25.43 GB
• HUM-70080: Objects – 1,163; Size – 1.75 GB
• LEG-60200: Objects – 475; Size – 0.38 GB
• MAR-10071: Objects – 266; Size – 0.62 GB

Data Cleanup: We looked at data that could be deleted or archived
• Archival Data (Last Accessed 2010): Objects – 826,525; Size – 49.52 GB
• Disposal Data (Last Accessed 2008): Objects – 452,159; Size – 39.14 GB
• Prohibited Files: Objects – 153,049; Size – 48.93 GB
• Totals: Objects – 1,431,733; Size – 137.59 GB (30% of all storage)

Data’s Value: We built layered information sets that let us find
• Account w/3 of Information: Objects – 27,928; Size – 17.84 GB
• Narrowing by “Renewal”: Objects – 7,860; Size – 12.96 GB

Data’s Risk: We found PII all over the data set
• SSN Algorithm: Objects – 1,488; Size – 1.73 GB

Data Clean-up RESULT: Secure PII Storage; Archive Platform; Retention Platform; Collaboration Platform

50+% was over 5 years old!
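The totals and percentages quoted for the sample data set can be re-derived from the per-category figures. The short check below uses only numbers that appear in this deck; the variable and function names are illustrative. It also suggests that the "6% of the sample set" quoted for records is a share of storage size rather than of object count, which is an inference from the arithmetic, not something the deck states.

```python
# (objects, size in GB) per category, copied from the validation overview.
cleanup = {
    "Archival (last accessed 2010)": (826_525, 49.52),
    "Disposal (last accessed 2008)": (452_159, 39.14),
    "Prohibited files": (153_049, 48.93),
}
records = {
    "ACC-40040": (17_938, 25.43),
    "HUM-70080": (1_163, 1.75),
    "LEG-60200": (475, 0.38),
    "MAR-10071": (266, 0.62),
}

CIFS_OBJECTS = 1_779_557
CIFS_SIZE_GB = 461.27
EMAIL_OBJECTS = 359_556


def totals(groups):
    """Sum object counts and sizes (GB, rounded to 2 decimals) across categories."""
    objects = sum(n for n, _ in groups.values())
    size_gb = round(sum(s for _, s in groups.values()), 2)
    return objects, size_gb


def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


cleanup_objects, cleanup_gb = totals(cleanup)
records_objects, records_gb = totals(records)

print(f"Cleanup: {cleanup_objects:,} objects, {cleanup_gb} GB, "
      f"{percent(cleanup_gb, CIFS_SIZE_GB):.1f}% of CIFS storage")
print(f"Records: {records_objects:,} objects, {records_gb} GB, "
      f"{percent(records_gb, CIFS_SIZE_GB):.1f}% by size, "
      f"{percent(records_objects, CIFS_OBJECTS):.1f}% by object count")
print(f"Email share of CIFS objects: {percent(EMAIL_OBJECTS, CIFS_OBJECTS):.1f}%")
```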
• Full end-to-end audit trail for all disposed records, files and data
• Corporate Governance Policy directives proactively achieved in real time
• Data can be easily moved to the proper retention or collaboration platform based on records policies
• Data Automation Means:
  • Data can be put where it needs to live, if it needs to live
  • Risk can be mitigated
  • Data can be available to those who need it when they need it
Data Debris To Be Deleted

Putting Data in the right place

Data Sources
• CIFS Collection: Objects – 1,779,557; Size – 461.27 GB; Email Objects – 359,556 (20%); Collection Times: Meta – 4 hours, Full – 48 hours
• SharePoint Collection: Sites – 5; Objects – 1,535; Size – 0.35 GB; Collection Times: Full – 7 minutes

• Records: identified through full-text search and moved to Retention
• Data of Value: identified through full-text search and moved to either Collaboration or Retention
• PII: identified through full-text search and moved to Secure Storage
• Data Cleanup: identified through metadata and full-text search and either moved to Archive or deleted
Destinations: Retention Platform; Collaboration Platform; Secure PII Storage; Archive Platform; Data Debris To Be Deleted

FileShares/SharePoint Archiving and Disposal Flow
Flow diagram (steps 0 to 8), labels as extracted:
• Start Here: Content Classifier. The Content Classifier interface is used to build the Watson-based models for Records Classification and to refine the rules throughout the process.
• FileShares and SharePoint files
• Data analysis to determine ROT (redundant, obsolete, trivial data): Age; Frequency of access; Record Codes; Non-conforming file types; …
• Candidate files analyzed by StoredIQ: files to be retained (left in place); files to be archived; files to be disposed
• Review by designated Business Unit Experts
• Files re-classified after review from Business Unit Owners
• Files retained (left in place)
• Files archived using ICC for fileshares & SharePoint
• Files moved on P8 location (appear deleted to users)
• Files moved from P8 after waiting period
• Space reclaimed / reallocated after ATT Disk/SharePoint reclaim process
• Disk space freed up by Archiving and Disposal on the primary file / SharePoint location (stubs left in place to access data from the Archive (P8))
Components: StoredIQ; Atlas Policies; Content Collector for SharePoint & Files; P8

A picture tells a thousand words
DATA CLASSIFICATION TOUR

While things are indexing… Build Classification Based on Existing Policies and Retention Schedule
Use the existing Email policy and Retention Schedule to define Classification rules.

Ready to use Enterprise Schedule Classification Rules on all data
Each rule classifies using examples from the existing Schedule and training examples from the Line of Business. Each rule is flexible enough to handle all the varied content types that match each classification.
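To make the idea of a Boolean classification rule concrete, here is a minimal sketch. The record codes are taken from the sample data set in this deck, but the keyword lists, the `Rule` type and the `classify` helper are invented for illustration. They are not the customer's retention schedule and not StoredIQ's rule syntax.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical Boolean rule: all_of AND (any_of) AND NOT (none_of)."""
    code: str
    all_of: tuple = ()
    any_of: tuple = ()
    none_of: tuple = ()

    def matches(self, text):
        # Compare against the set of lowercase word tokens in the text.
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        has_all = all(term in words for term in self.all_of)
        has_any = not self.any_of or any(term in words for term in self.any_of)
        has_none = not any(term in words for term in self.none_of)
        return has_all and has_any and has_none


# Keyword choices are assumptions made up for this example.
RULES = [
    Rule("ACC-40040", all_of=("invoice",), any_of=("payable", "receivable")),
    Rule("LEG-60200", any_of=("contract", "agreement"), none_of=("draft",)),
]


def classify(text, rules=RULES):
    """Return the record codes of every rule that matches the text."""
    return [rule.code for rule in rules if rule.matches(text)]


print(classify("Invoice 1234: accounts payable for March"))
print(classify("Signed contract attached"))
print(classify("DRAFT service agreement"))
```

Rules of this kind are easy to audit, since each match can be traced to specific terms. The trade-off is that they miss documents that express the same idea in different words.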
In addition to Boolean rules, in production we will leverage IBM Content Classification and training sets to build Watson-based filter models.

Start to view the file results at a very high level, by Type, Location, Size or Age
This example has found that 20% of the files in the sample set are emails, totaling 359,596 objects.

Quickly understand where critical data resides in the sample set; view of the information with the LEG-60200 classification applied
Individual data sets can be examined and acted upon. Here we can see that the bulk of LEG-60200 records are word-processing documents, but 24% (115) are emails.

Once classified, the data can be staged to a CIFS Retention location before ingest into the appropriate systems with metadata
The Action Log shows the actions that have been run against the selected infoset and the results.

Auto-classification tags the objects with metadata respective to the responsive information sets, with retention codes ready for ingest

Data Clean-Up – Defensible Disposal

Dynamic Data Topology Maps
A top-three insurance and financial services company identified 1,000s of unstructured data repositories with relevant claims data across an enterprise-wide SharePoint implementation.
PROBLEM: Identifying the breadth and scope of data in SharePoint sites that relates to individual claims.
SOLUTION: DataIQ found over 50,000 sites (many previously hidden) and correlated content in versions and social wikis/blogs.
ROI: Reduced manual search by 100% and created defensible audit trails for any claim. 5:1 savings in the first year on claims management.

eDiscovery: Comprehensive Response
StoredIQ has successfully helped companies meet their eDiscovery obligations in thousands of cases, with complete accuracy, reliability and defensibility.
“Companies can more than justify the purchase of in-house eDiscovery software and expect a return on investment in 3-6 months, or after the first matter.” - Gartner
PROBLEM: Identifying, collecting and preserving historical information potentially relevant to the Deepwater Horizon matter in a timely manner.
SOLUTION: StoredIQ enabled rapid indexing of 100s of terabytes of data, including identification, preservation and collection across multiple data sources and throughout multinational organizations.
IMPACT: Respond rapidly to legal discovery requests, lower eDiscovery costs, ensure a complete audit trail and defensibility.

Litigation Readiness
Supported a “bet the company” litigation effort. Identified, collected and analyzed 132 TB of data to produce 200 GB of relevant data.
PROBLEM: For the Deepwater Horizon matter, look across 132 TB, 3 continents and 8 locations. Collect 1 TB to a preservation location in Houston. Apply full-text indexing and additional terms to reduce to the smallest defensible data set, which was sent out for production review by outside counsel. The final data set was approximately 200 GB.
SOLUTION: Enable a 100:1 reduction in the collection process in less than 2 weeks.
ROI: Saved $5m+; responded to every DOJ request; lowered outsourced review costs and built a defensible audit trail. Increased case preparation time.

Thank You