EXPERT ADVISORY - Advanced Discovery

EXPERT ADVISORY SERIES
Advanced Discovery Quick Guide
2017
Essential Analytic Tools & Techniques: Leveraging Analytic Features and Workflows to
Increase Efficiency, Ensure Quality, and Decrease Costs
Essential Analytic Tools & Techniques
Analytics is a widely used term in the eDiscovery industry, but what are we referring to when we use it
in this context? Do we mean predictive coding? Another form of assisted review? Features based on
conceptual indexing? What about email threading and near-duplicate identification? Sampling tools?
When we refer to analytic tools, techniques or workflows in the eDiscovery context, we are referring
to all of these specific examples but also, more broadly, to any tool or technique that helps us
discover and understand meaningful patterns in our data collection. Anything that helps us filter at
the right level of granularity, identify associations between related materials, or recognize gaps in a
set of materials is analytic.
Certainly, there is no shortage of clever technology and branded features, but in practical terms, how
can analytic tools, techniques, and workflows actually be leveraged in our eDiscovery matters? What
benefits can they confer during each phase of a project?
This white paper reviews some of the core applications of the most common analytic tools and their
benefits: during ECA for investigation, culling, and project planning; during review for organization
and prioritization; and, during production for quality control and the protection of sensitive materials.
EARLY CASE ASSESSMENT
In general, early case assessment (ECA) serves dual purposes: to gain an understanding of the data in
your discovery set for project planning and to acquire some substantive insight into the content of
those documents for case planning. Performing meaningful ECA can enable you to develop a
risk-benefit analysis of the case, establish a budget and timeline, prepare for detailed process negotiations
at the 26(f) Meet and Confer, and begin developing your case strategy.
A variety of analytic tools and techniques can be brought to bear on accomplishing these goals during
ECA, either by using a custom extension of Relativity for that purpose, like Advanced Discovery’s, or
by using kCura’s new Relativity ECA. Regardless of the context, the principal approaches are the same,
including:
Sampling
Perhaps the most valuable and most overlooked approach for conducting initial assessment of a new
data collection is the use of random sampling. Pursuing the traditional approaches of targeted
keyword searching, etc., will help you confirm (or deny) what you expect to find, but sampling can
reveal those things for which you would not have thought to look. Put another way, random sampling
is one of the only effective ways to discover unknown unknowns – the things you don’t know you
don’t know.
If the sampling employed is simple random sampling, performed in a statistically valid manner, it can
facilitate not just discovery but also estimation. With a large enough sample, reliable projections can
be made of how many documents will need to be produced, how many will need redactions, how
many will need privilege logging, and more. With that kind of reliably estimated information available
at the beginning, project planning can become more science than art. An eDiscovery expert can assist
you in leveraging sampling correctly for this purpose.
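As a rough illustration of the estimation step described above, the sketch below computes how many documents to sample for a target margin of error and then projects a sample result onto the full collection. The function names, the 95% confidence level, and the example counts are assumptions chosen for illustration, and the math is the standard normal approximation for a proportion rather than any particular platform's sampling tool.

```python
import math
from statistics import NormalDist


def sample_size(margin, confidence=0.95, p=0.5):
    """Documents to sample for a given margin of error (normal approximation).

    p=0.5 is the most conservative assumption about the true rate.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def project(hits, sampled, population, confidence=0.95):
    """Project a sample rate onto the population as (low, estimate, high) counts."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    rate = hits / sampled
    half_width = z * math.sqrt(rate * (1 - rate) / sampled)
    low = max(0.0, rate - half_width)
    high = min(1.0, rate + half_width)
    return round(low * population), round(rate * population), round(high * population)


# Hypothetical matter: 100,000 documents, 400 sampled, 100 found responsive.
print(sample_size(0.05))          # 385 documents for +/-5% at 95% confidence
print(project(100, 400, 100_000))  # (low, estimate, high) responsive documents
```

The same projection can be repeated for redaction or privilege rates by counting those tags in the sample instead of responsiveness.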
Keyword
The use of analytics does not preclude the use of keywords in the ECA. In fact, keyword expansion can
be an excellent place to start if you have a good idea of what you are looking for. The search
dictionary built into Relativity lists all variations (including misspellings) of the root word input by the
user, to help identify potential keywords. This can be used to include or exclude versions of words
from the search. For instance, if you are looking for documents about trading instruments, you might
choose tradable and its misspelling tradeable along with the phrase “trading instrument.”
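The idea of keyword expansion can be sketched with ordinary fuzzy string matching. This is an illustrative stand-in only: it does not reproduce how Relativity's search dictionary works, and the vocabulary list, the `expand_keyword` name, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib


def expand_keyword(root, vocabulary, cutoff=0.8, limit=10):
    """Return indexed terms that closely resemble the root word, best match first."""
    return difflib.get_close_matches(root.lower(), vocabulary, n=limit, cutoff=cutoff)


# Hypothetical list of terms that appear in the collection's index.
vocabulary = ["tradable", "tradeable", "tradabel", "trading", "trader", "treasury", "table"]

variants = expand_keyword("tradable", vocabulary)
print(variants)  # ['tradable', 'tradeable', 'tradabel']
```

A reviewer would still decide which of the returned variants to include in or exclude from the final search.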
Clustering
As noted above, keyword searching is a common step in investigating a new data collection, looking
for hot documents and testing theories. But, we can also leverage another valuable analytic tool in
conjunction with our keyword searching: clustering. Conceptual clustering is a form of unsupervised
machine learning that builds a latent semantic index of all the concepts in your data collection and
then looks for dense clusters of materials on the same or similar topics. These clusters, each defined
by their core terms, can be leveraged in a variety of ways during ECA, including:
• Browsing clusters to learn more about the collection and look for potentially relevant clusters
• Comparing keyword search results to clusters and then using cluster terms to expand the search
• Identifying clusters of clearly irrelevant materials for immediate culling (e.g., fantasy sports)
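The second use, comparing keyword hits to clusters, can be sketched as follows. The sketch assumes the clustering has already been run and exported as a document-to-cluster mapping with core terms per cluster; the data structures, function names, and the 0.5 threshold are hypothetical.

```python
from collections import Counter


def cluster_hit_rates(doc_clusters, keyword_hits):
    """Share of each cluster's documents that the keyword search returned."""
    totals = Counter(doc_clusters.values())
    hits = Counter(doc_clusters[d] for d in keyword_hits if d in doc_clusters)
    return {cluster: hits[cluster] / totals[cluster] for cluster in totals}


def suggest_terms(rates, cluster_terms, current_terms, min_rate=0.5):
    """Core terms from high-hit clusters that are not already in the search."""
    known = {t.lower() for t in current_terms}
    suggestions = []
    for cluster, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
        if rate >= min_rate:
            suggestions += [t for t in cluster_terms[cluster]
                            if t.lower() not in known and t not in suggestions]
    return suggestions


# Hypothetical export: five documents in two clusters, two keyword hits.
doc_clusters = {"D1": "swaps", "D2": "swaps", "D3": "swaps",
                "D4": "fantasy sports", "D5": "fantasy sports"}
cluster_terms = {"swaps": ["swap", "derivative", "trading instrument"],
                 "fantasy sports": ["draft", "league"]}
rates = cluster_hit_rates(doc_clusters, {"D1", "D2"})
print(suggest_terms(rates, cluster_terms, ["trading instrument"]))  # ['swap', 'derivative']
```

Clusters with a hit rate of zero, such as the fantasy sports cluster here, are natural candidates to inspect for culling.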
Concept Search and Categorization
Once you’ve used clustering and keyword expansion to find documents considered to be of interest,
the question becomes: “Are there more documents like this document?” This is where the
Conceptually Similar analytics tool comes in. As with clustering, it leverages the power of latent
semantic indexing to identify documents conceptually similar to the documents currently being
viewed. Among other uses, attorneys employ it to better understand the strength of their position in
the case. In addition to Concept Search, once a set of key or hot documents is identified, these
documents can be used to categorize and find additional documents conceptually similar to the
documents coded as key or hot.
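The "more like this" question can be illustrated with a deliberately simplified sketch. It ranks documents by cosine similarity over raw word counts, which is a swapped-in technique and not latent semantic indexing: it only matches shared words, whereas a conceptual index can also relate documents that use different vocabulary. All names and sample texts are assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Bag-of-words counts for a document (lowercased alphabetic tokens)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def most_similar(seed_text, documents, top_n=3):
    """Document IDs ranked by similarity to the seed, dropping zero-similarity ones."""
    seed = term_vector(seed_text)
    scored = [(cosine(seed, term_vector(text)), doc_id)
              for doc_id, text in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True)[:top_n] if score > 0]


documents = {
    "A": "swap pricing for the derivative desk",
    "B": "fantasy league draft picks",
    "C": "derivative swap confirmation",
}
print(most_similar("swap derivative pricing", documents))  # ['A', 'C']
```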
VISUALIZATION TOOLS
kCura’s Relativity provides a variety of useful visualization tools, many of which can be leveraged
for ECA through a Relativity ECA extension such as Advanced Discovery’s or kCura’s.
Relativity Dashboards now combine custom Widgets and Pivots together into commanding overviews
of datasets. Some examples of the custom views that are possible and that are available for use in
Advanced Discovery’s ECA extension of Relativity include:
Email Author/Recipients
This pivot provides a clear view of who was communicating with whom. It can be used to identify the
key players. At times, this pivot will provide an indication of additional custodians from whom data
must be collected.
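Outside of the platform, the logic of this pivot can be approximated in a few lines. This is a minimal sketch that assumes email metadata has been exported as dictionaries with "author" and "recipients" keys; those field names, the function names, and the two-message threshold are hypothetical.

```python
from collections import Counter


def author_recipient_pivot(emails):
    """Count messages for each (author, recipient) pair."""
    pivot = Counter()
    for email in emails:
        for recipient in email["recipients"]:
            pivot[(email["author"], recipient)] += 1
    return pivot


def uncollected_participants(pivot, custodians, min_messages=2):
    """Frequent participants who are not yet custodians."""
    volume = Counter()
    for (author, recipient), count in pivot.items():
        volume[author] += count
        volume[recipient] += count
    return sorted(p for p, n in volume.items()
                  if n >= min_messages and p not in custodians)


emails = [
    {"author": "alice", "recipients": ["bob", "carol"]},
    {"author": "bob", "recipients": ["alice"]},
    {"author": "alice", "recipients": ["carol"]},
]
pivot = author_recipient_pivot(emails)
print(pivot.most_common(1))                               # [(('alice', 'carol'), 2)]
print(uncollected_participants(pivot, {"alice", "bob"}))  # ['carol']
```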
Email Author + Recipients/Dates
The email dates pivot is useful in discovery for cases with time-sensitive issues. When used in
conjunction with the graphical view, it provides a timeline of who knew what when. This information
can be used to help determine the strength of a position.
Custodian Duplicate Document Overlaps
This pivot indicates which custodians shared the most documents with each other. While this
commonly indicates working groups, unexpected document sharing could indicate the need for
additional investigation.
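The overlap count behind this pivot can be sketched as below, assuming each record pairs a custodian with a document hash produced during processing. The record layout and function name are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def custodian_overlaps(records):
    """Count duplicate documents shared by each pair of custodians.

    records: iterable of (custodian, document_hash) pairs.
    """
    holders = defaultdict(set)
    for custodian, doc_hash in records:
        holders[doc_hash].add(custodian)

    overlaps = defaultdict(int)
    for custodians in holders.values():
        # Every pair of custodians holding the same hash shares one document.
        for pair in combinations(sorted(custodians), 2):
            overlaps[pair] += 1
    return dict(overlaps)


records = [("alice", "h1"), ("bob", "h1"),
           ("alice", "h2"), ("bob", "h2"), ("carol", "h2"),
           ("carol", "h3")]
print(custodian_overlaps(records))
# {('alice', 'bob'): 2, ('alice', 'carol'): 1, ('bob', 'carol'): 1}
```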
Document Properties
Hidden content is one of the common document properties analyzed with the pivots. Hidden content,
such as hidden notes in Word or hidden slides in PowerPoint, might be grouped together to alert the
reviewer to look for the content that is not immediately visible.
Email Properties
There are various email properties that provide valuable information in the ECA: messages marked
“high importance” might indicate key conversations while unread messages might be less important
than read emails. Our PMs have found that emails sent “on behalf of” someone else are often
marketing emails or general company-wide announcements. They can be segregated for review
together and will likely be tagged as non-responsive.
With so many analytic tools and techniques available during ECA, it is possible to carefully investigate
and cull a data set, meaningfully plan for a project, and arrive at a 26(f) Meet and Confer prepared to
engage in detailed negotiations about the discovery process and limits.
REVIEW ORGANIZATION
Large-scale document review is both a crucial legal process and an undeniably assembly-line one.
Real assessments must be made, but everything that can be done to make each assessment easy and
the transition to the next one fast should be done. This means considering not just what
documents are selected for review, but how best to organize, prioritize, and present them to the
review team for assessment. Analytic tools and techniques can be leveraged to achieve optimal
organization, including:
Near-Duplicate Identification
Near-duplicate identification assists the review in two ways: first, it groups nearly identical
versions of the same document together so that they are all coded the same way; and, second, it
increases review speed by allowing near-duplicates to be reviewed in succession or coded as a group.
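One common way to approximate near-duplicate grouping is to compare overlapping word sequences (shingles) with Jaccard similarity. The sketch below takes that approach as an assumption; it is not a description of any specific product's algorithm, and the three-word shingle size and 0.6 threshold are arbitrary illustrative values.

```python
import re


def shingles(text, size=3):
    """Set of overlapping word sequences of the given length."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Shared shingles divided by all shingles in either document."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_groups(documents, threshold=0.6):
    """Greedy grouping: a document joins the first group whose first member it resembles."""
    groups = []  # each entry: (shingles of the group's first document, member IDs)
    for doc_id, text in documents.items():
        doc_shingles = shingles(text)
        for principal, members in groups:
            if jaccard(doc_shingles, principal) >= threshold:
                members.append(doc_id)
                break
        else:
            groups.append((doc_shingles, [doc_id]))
    return [members for _, members in groups]


documents = {
    "A": "Please review the attached draft agreement before Friday",
    "B": "Please review the attached draft agreement before Monday",
    "C": "Fantasy league draft is this weekend",
}
print(near_duplicate_groups(documents))  # [['A', 'B'], ['C']]
```

Each resulting group can then be batched together so that the members are reviewed in succession or coded as a set.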
Email Threading
An email thread is what we call all of the messages stemming from the same root email in an email
conversation. Email threading groups these messages together, most often in the order in which the
messages were sent. As with near-duplicate identification, organizing connected messages into a
thread ensures consistent coding and increases review speed by allowing for sequential review or
group coding based on review of the most complete message in the thread.
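The grouping described above can be sketched by following each message back to its root. This assumes every message record carries an "id", a "parent" reference (None for the root email), and a sortable "sent" timestamp; those field names are hypothetical, and real threading tools typically also analyze message content to handle missing or altered headers.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages by root email and order each thread by sent time."""
    parents = {m["id"]: m["parent"] for m in messages}

    def root_of(msg_id):
        seen = set()  # guards against malformed reply loops
        while parents.get(msg_id) is not None and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parents[msg_id]
        return msg_id

    threads = defaultdict(list)
    for message in messages:
        threads[root_of(message["id"])].append(message)
    return {root: [m["id"] for m in sorted(msgs, key=lambda m: m["sent"])]
            for root, msgs in threads.items()}


messages = [
    {"id": "m1", "parent": None, "sent": "2017-01-03T09:00"},
    {"id": "m3", "parent": "m2", "sent": "2017-01-03T11:30"},
    {"id": "m2", "parent": "m1", "sent": "2017-01-03T10:15"},
    {"id": "x1", "parent": None, "sent": "2017-01-04T08:00"},
]
print(build_threads(messages))  # {'m1': ['m1', 'm2', 'm3'], 'x1': ['x1']}
```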
Categorization
Categorization is similar to concept search, but with categorization, the user is able to identify
exemplary documents with several different concepts for the system to group against. An important
feature of this analytics tool is the ability to determine whether a document can be included in more
than one category. By limiting documents to a single category, you eliminate duplicate review of the
same document. However, review instructions will need to be broad enough so that reviewers can
identify documents that are relevant to the matter even if the relevance is not based on the concept
that the reviewer is working on.
Relativity Assisted Review
Categorization is also the fundamental process underlying Relativity Assisted Review, kCura’s
predictive coding or technology assisted review (TAR) solution. TAR uses active machine learning
within an iterative workflow to extrapolate coding decisions made by expert reviewers to a broader
document population. Much has been and could be written about how TAR works, but here we will
confine our discussion to its use as a review organization tool.
TAR can be used to establish review priorities, assigning sets of documents to reviewers by document
relevancy ranges, or to group documents by content based on different TAR trainings. This second
scenario is similar to using categorization; however, categorization is a once-and-done computer
training system, while with TAR, the user can continue to train the system until a desired level of
coding agreement between the human and the computer is reached.
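Both ideas in this section, assigning documents by relevancy range and tracking human-computer agreement, can be sketched in simplified form. The sketch assumes relevance scores between 0 and 1 have already been exported from the TAR tool; the score bands, labels, and function names are hypothetical and do not reflect Relativity Assisted Review's own settings.

```python
def batch_by_relevance(scores, bands=((0.8, "high"), (0.5, "medium"), (0.0, "low"))):
    """Assign documents to review batches by relevance score range.

    bands must be listed from the highest floor to the lowest.
    """
    batches = {label: [] for _, label in bands}
    for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        for floor, label in bands:
            if score >= floor:
                batches[label].append(doc_id)
                break
    return batches


def agreement_rate(human, machine):
    """Share of documents coded by both where the two decisions match."""
    shared = human.keys() & machine.keys()
    if not shared:
        return 0.0
    return sum(human[d] == machine[d] for d in shared) / len(shared)


scores = {"D1": 0.95, "D2": 0.62, "D3": 0.10, "D4": 0.81}
print(batch_by_relevance(scores))
# {'high': ['D1', 'D4'], 'medium': ['D2'], 'low': ['D3']}
```

In an iterative workflow, a rate like the one returned by `agreement_rate` would be checked after each training round and compared against whatever target the team has agreed on.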
QUALITY CONTROL
In an era of discovery sanctions and growing privacy concerns, ensuring that the right materials and
only the right materials are produced is more important than ever. In addition to all of the in-progress
quality control (QC) steps that occur during processing and throughout review, analytic tools
and techniques can be leveraged to provide some additional safety nets before production, including:
Sampling
As discussed above, statistically valid simple random sampling is a powerful tool for assessing a given
data set, and the same thing is true of any review set, production set, or remainder believed
nonresponsive. Sampling tools and techniques like those described above can be leveraged to check
populations at a variety of scales and with a variety of degrees of certainty, as the situation demands.
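For QC samples, the rate being measured is often very low, for example the share of responsive documents left in a remainder believed nonresponsive. The sketch below uses a Wilson score interval, which tends to behave better than the simple normal approximation near zero. The choice of interval, the function name, and the example counts are assumptions for illustration.

```python
import math
from statistics import NormalDist


def wilson_interval(hits, sampled, confidence=0.95):
    """Wilson score interval (low, high) for a sampled proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = hits / sampled
    denom = 1 + z * z / sampled
    centre = (p + z * z / (2 * sampled)) / denom
    half = z * math.sqrt(p * (1 - p) / sampled + z * z / (4 * sampled * sampled)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical check: 500 documents sampled from the remainder, 5 found responsive.
low, high = wilson_interval(5, 500)
print(f"Estimated rate 1.0%, 95% interval {low:.2%} to {high:.2%}")
```

Multiplying the upper bound by the size of the remainder gives a cautious ceiling on how many responsive documents may have been missed.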
Categorization
Categorization can also be used to identify miscoded documents. To do this, the sample set is
categorized using tagged documents as the basis for the categories. Then, we can pivot on tags to
identify any review conflicts and re-review them to understand why they are not coded as expected.
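The pivot step described above amounts to cross-tabulating reviewer tags against system categories and pulling the mismatches. A minimal sketch, assuming both have been exported as mappings from document ID to label (the names and labels are hypothetical):

```python
from collections import Counter


def coding_conflicts(reviewer_tags, categories):
    """Cross-tabulate reviewer tags against categories and list the mismatches."""
    pivot = Counter()
    conflicts = []
    for doc_id, tag in reviewer_tags.items():
        category = categories.get(doc_id)
        if category is None:  # document was not categorized
            continue
        pivot[(tag, category)] += 1
        if tag != category:
            conflicts.append(doc_id)
    return pivot, sorted(conflicts)


reviewer_tags = {"D1": "responsive", "D2": "non-responsive", "D3": "responsive"}
categories = {"D1": "responsive", "D2": "responsive", "D3": "responsive"}
pivot, conflicts = coding_conflicts(reviewer_tags, categories)
print(conflicts)  # ['D2']
```

A conflict is only a flag for re-review; either the reviewer or the categorization may be the one that is wrong.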
Near-Duplicate Identification and Email Threading
If not already done during the organization and preparation of the review batches, near-duplicate
identification and email threading can be leveraged during QC to ensure that all similar or related
documents have been coded consistently.
So, as we have seen, there are numerous ways to leverage analytic tools and techniques throughout
the lifecycle of your discovery projects to learn more, do better, and do it more efficiently. And, these
essential examples are only the beginning. With the guidance of Relativity Masters and Experts like
those on our Solutions team, much more can be done, and with custom extensions of Relativity’s core
functions, like Advanced Discovery’s ECA solution, even more enhancement becomes possible.
ABOUT THE AUTHOR
Matthew Verga, JD, serves as the VP, Marketing Content for Advanced Discovery. Matthew is an
electronic discovery expert proficient at leveraging his legal experience as an attorney, his technical
knowledge as a practitioner, and his skills as a communicator to make complex eDiscovery topics
accessible. A nine-year industry veteran, Matthew has worked across every phase of the EDRM and at
every level from the project trenches to enterprise program design. As VP, Marketing Content, for
Advanced Discovery, he leverages this background to produce engaging educational content to
empower practitioners at all levels with knowledge they can use to improve their projects, their
careers, and their organizations.
ABOUT ADVANCED DISCOVERY
Advanced Discovery is an award-winning, end-to-end eDiscovery services and software provider,
supporting law firms and corporations since 2002. Advanced Discovery and its global family of
companies, Millnet, LPI and Ditto, offer project planning and budgeting, data preservation and
forensic collection, early case assessment, hosted review, managed document review, and more, from
numerous state-of-the-art facilities around the world. The company employs leading professionals in
the industry, applies defensible workflows, and provides industry-proven technology across all phases
of the eDiscovery lifecycle. This devotion to excellence has earned Advanced Discovery inclusion on
the Inc. 5000 list of fastest-growing companies in the US for five consecutive years and recognition as a
top provider by Legal Times, Texas’ Best and other publications.
CONNECT WITH ADVANCED DISCOVERY
00 1 (866) 342-3282