SERIES EXPERT ADVISORY
Advanced Discovery Quick Guide 2017

Essential Analytic Tools & Techniques: Leveraging Analytic Features and Workflows to Increase Efficiency, Ensure Quality, and Decrease Costs

Analytics is a widely used term in the eDiscovery industry, but what are we referring to when we use it in this context? Do we mean predictive coding? Another form of assisted review? Features based on conceptual indexing? What about email threading and near-duplicate identification? Sampling tools?

When we refer to analytic tools, techniques, or workflows in the eDiscovery context, we are referring to all of these specific examples but also, more broadly, to any tool or technique that helps us discover and understand meaningful patterns in our data collection. Anything that helps us filter at the right level of granularity, identify associations between related materials, or recognize gaps in a set of materials is analytic.

Certainly, there is no shortage of clever technology and branded features, but in practical terms, how can analytic tools, techniques, and workflows actually be leveraged in our eDiscovery matters? What benefits can they confer during each phase of a project? This white paper reviews some of the core applications of the most common analytic tools and their benefits: during ECA for investigation, culling, and project planning; during review for organization and prioritization; and during production for quality control and the protection of sensitive materials.

EARLY CASE ASSESSMENT

In general, early case assessment (ECA) serves dual purposes: to gain an understanding of the data in your discovery set for project planning and to acquire some substantive insight into the content of those documents for case planning.
Performing meaningful ECA can enable you to develop a risk-benefit analysis of the case, establish a budget and timeline, prepare for detailed process negotiations at the Rule 26(f) Meet and Confer, and begin developing your case strategy.

A variety of analytic tools and techniques can be brought to bear on accomplishing these goals during ECA, either by using a custom extension of Relativity for that purpose, like Advanced Discovery’s, or by using kCura’s new Relativity ECA. Regardless of the context, the principal approaches are the same, including:

Sampling

Perhaps the most valuable and most overlooked approach for conducting an initial assessment of a new data collection is the use of random sampling. Pursuing the traditional approaches of targeted keyword searching, etc., will help you confirm (or deny) what you expect to find, but sampling can reveal those things for which you would not have thought to look. Put another way, random sampling is one of the only effective ways to discover unknown unknowns: the things you don’t know you don’t know.

If the sampling employed is simple random sampling, performed in a statistically valid manner, it can facilitate not just discovery but also estimation. With a large enough sample, reliable projections can be made of how many documents will need to be produced, how many will need redactions, how many will need privilege logging, and more. With that kind of reliably estimated information available at the beginning, project planning can become more science than art. An eDiscovery expert can assist you in leveraging sampling correctly for this purpose.

Keyword

The use of analytics does not preclude the use of keywords during ECA. In fact, keyword expansion can be an excellent place to start if you have a good idea of what you are looking for.
The search dictionary built into Relativity lists all variations (including misspellings) of the root word input by the user, to help identify potential keywords. This can be used to include or exclude versions of words from the search. For instance, if you are looking for documents about trading instruments, you might choose tradable and its common variant tradeable along with the phrase “trading instrument.”

Clustering

As noted above, keyword searching is a common step in investigating a new data collection, looking for hot documents and testing theories. But we can also leverage another valuable analytic tool in conjunction with our keyword searching: clustering. Conceptual clustering is a form of unsupervised machine learning that builds a latent semantic index of all the concepts in your data collection and then looks for dense clusters of materials on the same or similar topics. These clusters, each defined by their core terms, can be leveraged in a variety of ways during ECA, including:

• Browsing clusters to learn more about the collection and look for potentially relevant clusters
• Comparing keyword search results to clusters and then using cluster terms to expand the search
• Identifying clusters of clearly irrelevant materials for immediate culling (e.g., fantasy sports)

Concept Search and Categorization

Once you’ve used clustering and keyword expansion to find documents considered to be of interest, the question becomes: “Are there more documents like this document?” This is where the Conceptually Similar analytics tool comes in. As with clustering, it leverages the power of latent semantic indexing to identify documents conceptually similar to the document currently being viewed. Among other uses, attorneys employ it to better understand the strength of their position in the case.
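
As a rough illustration of the "more like this" question, the sketch below ranks documents by cosine similarity to a seed document. It is a deliberate simplification and not how Relativity works internally: it compares literal term counts, whereas latent semantic indexing would first project documents into a reduced concept space so that they can match without sharing exact words. The document IDs, texts, and function names are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Turn text into a bag-of-words vector of lowercase term counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(seed_text, documents, top_n=3):
    """Return (score, doc_id) pairs for the documents closest to the seed text."""
    seed = term_vector(seed_text)
    scored = [
        (cosine_similarity(seed, term_vector(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_n]


# Hypothetical mini-collection; the seed stands in for the document being viewed.
documents = {
    "A": "swap trading instrument pricing",
    "B": "fantasy football league draft",
    "C": "trading instrument swap desk",
}
ranked = most_similar("pricing of a trading instrument swap", documents)
for score, doc_id in ranked:
    print(f"{doc_id}: {score:.2f}")
```

In this toy collection the fantasy sports document shares no terms with the seed and scores zero, which mirrors the culling use case mentioned under Clustering.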
In addition to Concept Search, once a set of key or hot documents is identified, these documents can be used to categorize and find additional documents conceptually similar to the documents coded as key or hot.

VISUALIZATION TOOLS

kCura’s Relativity provides a variety of useful visualization tools, many of which can be leveraged for ECA through a Relativity ECA extension such as Advanced Discovery’s or kCura’s. Relativity Dashboards now combine custom Widgets and Pivots into commanding overviews of datasets. Some examples of the custom views that are available in Advanced Discovery’s ECA extension of Relativity include:

Email Author/Recipients

This pivot provides a clear view of who was communicating with whom. It can be used to identify the key players. At times, this pivot will provide an indication of additional custodians from whom data must be collected.

Email Author + Recipients/Dates

The email dates pivot is useful in cases with time-sensitive issues. When used in conjunction with the graphical view, it provides a timeline of who knew what when. This information can be used to help determine the strength of a position.

Custodian Duplicate Document Overlaps

This pivot indicates which custodians shared the most documents with each other. While this commonly indicates working groups, unexpected document sharing could indicate the need for additional investigation.

Document Properties

Hidden content is one of the common document properties analyzed with the pivots. Documents with hidden content, such as hidden notes in Word or hidden slides in PowerPoint, might be grouped together to alert the reviewer to look for content that is not immediately visible.
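
Conceptually, the Email Author/Recipients pivot described above is a cross-tabulation of message counts by sender and recipient. The sketch below shows that idea on a handful of made-up messages; the field names and addresses are hypothetical and do not reflect Relativity's schema or its pivot implementation.

```python
from collections import Counter

# Hypothetical email metadata (illustrative field names, not a Relativity schema).
emails = [
    {"author": "alice@example.com", "recipients": ["bob@example.com", "carol@example.com"]},
    {"author": "alice@example.com", "recipients": ["bob@example.com"]},
    {"author": "bob@example.com", "recipients": ["alice@example.com"]},
    {"author": "dave@example.com", "recipients": ["alice@example.com", "bob@example.com"]},
]


def author_recipient_pivot(messages):
    """Count how many messages each author sent to each recipient."""
    counts = Counter()
    for message in messages:
        for recipient in message["recipients"]:
            counts[(message["author"], recipient)] += 1
    return counts


pivot = author_recipient_pivot(emails)

# List the busiest author-to-recipient pairs first.
for (author, recipient), message_count in pivot.most_common():
    print(f"{author} -> {recipient}: {message_count}")
```

A sender such as the hypothetical dave@example.com, who appears in the pivot but is not a collected custodian, is the kind of signal that might prompt a request for additional collection.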

Email Properties

Various email properties provide valuable information during ECA: messages marked “high importance” might indicate key conversations, while unread messages might be less important than read emails. Our PMs have found that emails sent “on behalf of” someone else are often marketing emails or general company-wide announcements. They can be segregated for review together and will likely be tagged as non-responsive.

With so many analytic tools and techniques available during ECA, it is possible to carefully investigate and cull a data set, meaningfully plan for a project, and arrive at a Rule 26(f) Meet and Confer prepared to engage in detailed negotiations about the discovery process and limits.

REVIEW ORGANIZATION

Large-scale document review is both a crucial legal process and an undeniably assembly-line one. Real assessments must be made, but everything that can be done to make each assessment easy and the transition to the next one fast should be done. This means considering not just what documents are selected for review, but how best to organize, prioritize, and present them to the review team for assessment. Analytic tools and techniques can be leveraged to achieve optimal organization, including:

Near-Duplicate Identification

Near-duplicate identification assists the review in two ways: first, it groups superficially identical versions of the same document together so that they are all coded the same way; and second, it increases review speed by allowing near-duplicates to be reviewed in succession or coded as a group.

Email Threading

An email thread is what we call all of the messages stemming from the same root email in an email conversation. Email threading groups these messages together, most often in the order in which the messages were sent.
As with near-duplicate identification, organizing connected messages into a thread ensures consistent coding and increases review speed by allowing for sequential review or group coding based on review of the most complete message in the thread.

Categorization

Categorization is similar to concept search, but with categorization, the user is able to identify example documents for several different concepts for the system to group against. An important feature of this analytics tool is the ability to control whether a document can be included in more than one category. By limiting documents to a single category, you eliminate duplicate review of the same document. However, review instructions will need to be broad enough that reviewers can identify documents that are relevant to the matter even if the relevance is not based on the concept that the reviewer is working on.

Relativity Assisted Review

Categorization is also the fundamental process underlying Relativity Assisted Review, kCura’s predictive coding or technology assisted review (TAR) solution. TAR uses active machine learning within an iterative workflow to extrapolate coding decisions made by expert reviewers to a broader document population. Much has been and could be written about how TAR works, but here we will confine our discussion to its use as a review organization tool.

TAR can be used to establish review priorities, assigning sets of documents to reviewers by document relevancy ranges, or to group documents by content based on different TAR trainings. This second scenario is similar to using categorization; however, categorization is a once-and-done computer training system, while with TAR, the user can continue to train the system until a desired level of coding agreement between the human and the computer is reached.
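
Assigning documents to reviewers by relevancy ranges, as described above, can be sketched as a simple binning step. The example assumes each document already carries a relevance score on a 0 to 100 scale; the scores, tier names, and cut-offs are hypothetical choices for illustration, not values prescribed by Relativity Assisted Review.

```python
def assign_priority_tiers(scores, tiers):
    """Group document IDs into named tiers by relevance score.

    `tiers` is a list of (name, lower_bound) pairs ordered from the highest
    lower bound to the lowest. Each document goes into the first tier whose
    lower bound it meets; documents below every bound are left unassigned.
    Within a tier, documents are listed from highest score to lowest.
    """
    assignments = {name: [] for name, _lower_bound in tiers}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    for doc_id, score in ranked:
        for name, lower_bound in tiers:
            if score >= lower_bound:
                assignments[name].append(doc_id)
                break
    return assignments


# Hypothetical relevance scores produced by a trained model.
scores = {"DOC-001": 92, "DOC-002": 15, "DOC-003": 67, "DOC-004": 48, "DOC-005": 88}

# Hypothetical review tiers: likely-relevant documents are batched out first.
tiers = [("first pass", 80), ("second pass", 50), ("sample only", 0)]

batches = assign_priority_tiers(scores, tiers)
for tier_name, doc_ids in batches.items():
    print(f"{tier_name}: {doc_ids}")
```

Where the cut-offs fall is a judgment call for the review team; re-running the binning after each additional training round keeps the batches aligned with the latest scores.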

QUALITY CONTROL

In an era of discovery sanctions and growing privacy concerns, ensuring that the right materials and only the right materials are produced is more important than ever. In addition to all of the in-progress quality control (QC) steps that occur during processing and throughout review, analytic tools and techniques can be leveraged to provide some additional safety nets before production, including:

Sampling

As discussed above, statistically valid simple random sampling is a powerful tool for assessing a given data set, and the same is true of any review set, production set, or remainder believed to be non-responsive. Sampling tools and techniques like those described above can be leveraged to check populations at a variety of scales and with a variety of degrees of certainty, as the situation demands.

Categorization

Categorization can also be used to identify miscoded documents. To do this, the sample set is categorized using tagged documents as the basis for the categories. Then, we can pivot on tags to identify any review conflicts and re-review them to understand why they are not coded as expected.

Near-Duplicate Identification and Email Threading

If not already done during the organization and preparation of the review batches, near-duplicate identification and email threading can be leveraged during QC to ensure that all similar or related documents have been coded consistently.

So, as we have seen, there are numerous ways to leverage analytic tools and techniques throughout the lifecycle of your discovery projects to learn more, do better, and do it more efficiently. And these essential examples are only the beginning. With the guidance of Relativity Masters and Experts like those on our Solutions team, much more can be done, and with custom extensions of Relativity’s core functions, like Advanced Discovery’s ECA solution, even more enhancement becomes possible.
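
To make the sampling-based estimation discussed in this guide more concrete, the sketch below draws a simple random sample and projects a tagged rate onto the full population. It rests on several assumptions not stated in the guide: a normal-approximation confidence interval, z = 1.96 for roughly 95% confidence, no finite population correction, and made-up document counts. A statistician or eDiscovery expert may well choose a different interval method for a real matter.

```python
import math
import random


def draw_sample(doc_ids, sample_size, seed=None):
    """Draw a simple random sample of document IDs without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), sample_size)


def project_count(population_size, sample_size, positives, z=1.96):
    """Project a sample rate onto the population with a normal-approximation interval.

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    rate = positives / sample_size
    margin = z * math.sqrt(rate * (1 - rate) / sample_size)
    low_rate = max(0.0, rate - margin)
    high_rate = min(1.0, rate + margin)
    return {
        "rate": rate,
        "margin": margin,
        "estimate": round(rate * population_size),
        "low": round(low_rate * population_size),
        "high": round(high_rate * population_size),
    }


# Hypothetical collection of 100,000 documents.
population = [f"DOC-{i:06d}" for i in range(100_000)]
sample = draw_sample(population, 1_500, seed=7)

# Suppose reviewers tag 300 of the 1,500 sampled documents as responsive (made-up figure).
projection = project_count(len(population), len(sample), positives=300)
print(projection)
```

The same function can be reused for QC, for example by sampling the remainder believed to be non-responsive and projecting how many responsive documents it may still contain.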

ABOUT THE AUTHOR

Matthew Verga, JD, serves as the VP, Marketing Content for Advanced Discovery. Matthew is an electronic discovery expert proficient at leveraging his legal experience as an attorney, his technical knowledge as a practitioner, and his skills as a communicator to make complex eDiscovery topics accessible. A nine-year industry veteran, Matthew has worked across every phase of the EDRM and at every level, from the project trenches to enterprise program design. As VP, Marketing Content for Advanced Discovery, he leverages this background to produce engaging educational content to empower practitioners at all levels with knowledge they can use to improve their projects, their careers, and their organizations.

ABOUT ADVANCED DISCOVERY

Advanced Discovery is an award-winning, end-to-end eDiscovery services and software provider, supporting law firms and corporations since 2002. Advanced Discovery and its global family of companies, Millnet, LPI and Ditto, offer project planning and budgeting, data preservation and forensic collection, early case assessment, hosted review, managed document review, and more, from numerous state-of-the-art facilities around the world. The company employs leading professionals in the industry, applies defensible workflows, and provides industry-proven technology across all phases of the eDiscovery lifecycle. This devotion to excellence has earned Advanced Discovery inclusion on the Inc. 5000 list of fastest growing companies in the US for five consecutive years and recognition as a top provider by Legal Times, Texas’ Best and other publications.

CONNECT WITH ADVANCED DISCOVERY

00 1 (866) 342-3282