E-Discovery Tip Sheet
LegalTech 2015 – Some Panels and Briefings
Last month I took you on a select tour of the vendor exhibits and products from
LegalTech 2015. This month I want to offer a short brief that might provide a little
more incentive to brave the cold and crowds next time around.
Below I have digested one plenary session and two vendor briefings from kCura
on developments in their industry-leading review platform, Relativity 9.
A. LegalTech Panel Session: “Taking TAR to the Next Level: Recent Research and the Promise of Continuous Active Learning”
The panel was composed of Professor Gordon Cormack of the University of
Waterloo and Maura R. Grossman of Wachtell Lipton, co-authors of a cornerstone study
of technology-assisted review; Magistrate Judge Andrew Peck, a leading voice from the
Federal bench on e-discovery issues; Susan Nielsen Hammond, General Counsel of
Regions Financial Corporation; and moderator John Tredennick of big data review
vendor Catalyst Systems.
To boil down a deep and interesting discussion: the evolution and efficacy of
several classes of computer-assisted review were compared to the “false gold standard”
(per Judge Peck) of linear manual review, and to each other. Ms. Grossman, followed
by Professor Cormack, used slides to illustrate the differences in process and efficacy
of three types of computer learning (a simplified code sketch of the three selection
strategies follows the outlines below):
> Simple Passive Learning (SPL):
1. Critical initial factors are (a) seed set selection – random vs. judgmental; and (b) the number of documents in the seed set.
2. Review and code the seed set (by the “Expert”, i.e., the senior attorney on the case).
3. Feed the expertly coded seed set to the algorithm; evaluate machine vote effectiveness and the training result.
4. Repeat as required to “stabilize” results (“till the popcorn stops popping”, or until stability does not materially change).
5. When done, run the results against the entire document set.
6. Review documents auto-coded “Responsive” or above the confidence-ranking percentile cut-off.
7. The team chooses the next set to review.
> Simple Active Learning (SAL):
1. Create a Control Set – think of it as a Responsive key for benchmarking.
2. Critical factors in seed set selection are random vs. judgmental, and the number of documents in the set, as above.
3. Review and code the seed set (by the “Expert”).
4. Use the machine learning algorithm to select the documents from which it will learn the most (ambiguous content).
5. Still an iterative process until “stable” (all the popcorn is popped).
> Continuous Active Learning (CAL):
1. Seed set (initial training set) selection is judgmental; results also depend on the number of documents in the set. Inferentially, some initial document counts have been calculated that seem to create a stable set under multiple circumstances (between about 5,000 and 14,000). The example given was to include one or more parties’ Requests for Production as part of the set.
2. The machine learning algorithm learns based upon review:
(a) Review and code newly suggested documents and add them to the training set.
(b) Repeat until substantially all documents have been reviewed.
3. Iterative, constant review and feedback.
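For readers who think in code, here is a minimal, hypothetical sketch of how the three protocols differ in the round-by-round selection of documents for attorney review. It is not kCura's, Catalyst's, or the panelists' implementation; the train() and review() callables are stand-ins (review returns 1 for Responsive, 0 otherwise), and only the selection logic tracks the outlines above.

# A minimal, hypothetical sketch of how SPL, SAL, and CAL differ in the way
# each round selects documents for attorney review.
import random

def next_batch_spl(pool, scores, k):
    """Simple Passive Learning: training documents are drawn at random
    (or judgmentally); the machine does not steer selection."""
    return random.sample(pool, min(k, len(pool)))

def next_batch_sal(pool, scores, k):
    """Simple Active Learning: pick the documents the model is least certain
    about (scores nearest 0.5, the ambiguous content it learns most from)."""
    return sorted(pool, key=lambda d: abs(scores[d] - 0.5))[:k]

def next_batch_cal(pool, scores, k):
    """Continuous Active Learning: always pick the documents the model now
    ranks most likely Responsive, then retrain and repeat."""
    return sorted(pool, key=lambda d: scores[d], reverse=True)[:k]

def run_protocol(next_batch, corpus, seed_set, review, train, k=100):
    """Shared iterative loop: code a batch, retrain, and stop once recent
    batches turn up few new Responsive documents ("the popcorn stops popping")."""
    labels = {doc: review(doc) for doc in seed_set}      # expert-coded seed set
    while len(labels) < len(corpus):
        scores = train(labels)                           # retrain on coding so far
        pool = [d for d in corpus if d not in labels]
        batch = next_batch(pool, scores, k)
        labels.update({doc: review(doc) for doc in batch})
        if sum(labels[d] for d in batch) <= 0.02 * len(batch):
            break                                        # rough stability heuristic
    return labels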
Professor Cormack noted, in reviewing the higher recall of CAL, that search
term-based seed sets contain a built-in stop, limited by the keyword hits, even within
TAR. Analyses of recall versus effort in a first-level document review were offered: for
example, 56,000 documents had to be reviewed under SPL to reach the same level of
recall as 5,000 documents under SAL.
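As a rough way to see that trade-off (toy arithmetic, not the panel's data): recall at a given review depth is simply the share of all Responsive documents found by that point, so a protocol that reaches the same recall at 5,000 documents as another does at 56,000 needs roughly one-eleventh the review effort.

# Toy illustration of recall versus effort in a ranked first-level review.
def recall_at_depth(ranked_labels, depth):
    """ranked_labels: 1/0 Responsive flags in the order documents were reviewed;
    depth: number of documents reviewed so far."""
    total_responsive = sum(ranked_labels)
    found = sum(ranked_labels[:depth])
    return found / total_responsive if total_responsive else 0.0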
Ms. Hammond added a practical perspective to the theoretical and judicial
discussion: in regulatory practice, precision is vital. Having used most of the types of
tools under discussion, she noted that testing is needed to determine good seed sets,
and continues to be required as new terms arise during review. She recommended a
blended approach that continues to engage human intelligence.
A toolkit and resources for the Cormack & Grossman SIGIR ’14 report, including four
Text Retrieval Conference (TREC) ’09 Enron databases that were among those used
for the cited controlled comparison of SPL, SAL and CAL, are available for free under
the GPL at trec.nist.gov, among other sources.
B. kCura Relativity Briefings
1. The Mobile Attorney: Working with Key Documents Using Relativity Binders.
Relativity v8 and later can export and synchronize Binder data with an iPad in this
mobile and web application, which helps consolidate critical case documents. Binders
are locked behind the Apple encryption keychain for security. Relativity field settings
control metadata, docket or coding output, with contents based upon a Saved Search.
Binder users must already be licensed Relativity users. Among the limited palette of
features available to mobile Binder users are:
- Annotations (highlight, note, draw, control colors and thickness – users see only their own);
- Organization (create Sections, drag and drop);
- Search of metadata or text (builds an index on the iPad, with highlights on hits; Boolean AND, OR, NOT must be entered in UPPERCASE);
- Offline Access (sync with Relativity as backup, visible only to the individual Binders user, via HTTPS/SSL);
- AirPlay (iPad Binder info can be wirelessly projected to Apple TV); and
- Binders on Web (Binder viewer, track changes, sync across multiple devices).
One can do incremental Binder builds, with updates and additions; a build won’t
remove anything, though. Apple iOS will warn when space runs low, and an auto-expire
setting can clear old content.
With Relativity 9, users will be able to publish to Binders, and even push a single
document to a pre-made Binder. There will also be mobile device management and security
configuration, as well as added Notifications, Favorites, and Preview before download (but
no filesize parameter); the beta is due in March/April 2015. Users must have the Native
Imaging Server (the processing add-on module, which requires additional servers) to use
Binders. This is NOT a collaborative tool at this point.
2. Relativity Analytics Overview.
The presenter discussed analytics in case workflow as ideal where there is a short
timeline, such as in a Federal second request on a prospective merger, and a lot of data
to get through. She cited that the average case here was about 1M documents, and the top
1,000 cases ran to about 3.8M documents. Relativity Analytics is thus intended to (a) investigate
an unknown data set for document types and languages, and find related documents; (b) evaluate
large sets of data and prioritize; or (c) structure documents by batching out clusters.
The presenter broke it out as follows:
> Email Threading (based on Content Analyst) – identify a group of messages within a conversation; display groupings; show the master inclusive email (indicated by a solid dot).
> Near Duplication – organization of highly similar text into relational groups with a percentage of similarity; used for review batching or conflict checks, or to find subtle differences in language between documents (see the generic sketch after this list).
> Language Identification – determine the primary and up to two secondary languages per document; report the percentage of text in each language found; handles 172 languages and dialects. Used to assign documents to language review teams, create grand-total charts and reports, and drive further classification. One cannot exclude text, at least in Relativity 8.
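As a generic illustration of what a near-duplication engine does under the hood (shingling and Jaccard similarity here, not Relativity's or Content Analyst's actual algorithm), documents whose text overlap clears a similarity threshold are pulled into the same relational group with a percentage score:

# Generic near-duplicate grouping sketch: shingle each document's text,
# compare shingle sets with Jaccard similarity, and group documents whose
# similarity percentage clears a threshold.  Illustrative only.
def shingles(text, size=5):
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(max(1, len(words) - size + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def near_duplicate_groups(docs, threshold=0.80):
    """docs: {doc_id: text}.  Returns groups of (doc_id, percent similarity to
    the group's first, "pivot" document), usable for batching or comparison."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    groups, assigned = [], set()
    for pivot in docs:
        if pivot in assigned:
            continue
        group = [(pivot, 100)]
        assigned.add(pivot)
        for other in docs:
            if other in assigned:
                continue
            sim = jaccard(sets[pivot], sets[other])
            if sim >= threshold:
                group.append((other, round(sim * 100)))
                assigned.add(other)
        groups.append(group)
    return groups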
Email threading, near duplication, and language identification fall into the category of
Document Organization and Structure. Next are the Conceptual Analytics:
> Latent Semantic Analysis – a mathematical assessment of language learned from the documents in the current case, based upon concepts, not words –
- “aboutness” (about a plan, an RFP on a subject, a precis of blog post content), versus the more common
- “is-ness” (metadata, keyword, proximity, document type, author).
> Search using an example sentence, paragraph, or entire document to return documents related in concept, based on ideas and thus conceptual relevancy, getting around false keyword hits, misspellings and code words (see the sketch after this list).
> Keyword expansion – submit a term to list conceptually related items –
- develop a search term list (synonyms);
- learn the language of a case (jargon / new terms / idiom);
- reveal code words and variations.
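A minimal sketch of the latent-semantic idea behind example-based search and keyword expansion, using generic open-source pieces (scikit-learn) rather than Relativity's Content Analyst engine; the sample documents and the submitted term are made up for illustration:

# Minimal latent-semantic sketch of example-based concept search and keyword
# expansion, using generic scikit-learn pieces (not Relativity's Content
# Analyst engine).  Documents and terms are projected into a reduced concept
# space and ranked by cosine similarity instead of keyword matching.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "merger agreement draft with purchase price terms",
    "board minutes discussing the proposed acquisition",
    "office party planning and catering arrangements",
    "due diligence checklist for the target company acquisition",
]

tfidf = TfidfVectorizer(stop_words="english")
X = tfidf.fit_transform(docs)                  # term-document weights
svd = TruncatedSVD(n_components=2, random_state=0)
doc_vecs = svd.fit_transform(X)                # documents in concept space

def concept_search(example_text, top_n=3):
    """Return documents conceptually closest to an example sentence or paragraph."""
    q = svd.transform(tfidf.transform([example_text]))
    sims = cosine_similarity(q, doc_vecs)[0]
    return sorted(zip(docs, sims), key=lambda p: p[1], reverse=True)[:top_n]

def expand_keyword(term, top_n=5):
    """List terms conceptually related to a submitted term (keyword expansion)."""
    idx = tfidf.vocabulary_.get(term)
    if idx is None:
        return []
    term_vecs = svd.components_.T               # terms in the same concept space
    vocab = tfidf.get_feature_names_out()
    sims = cosine_similarity(term_vecs[idx:idx + 1], term_vecs)[0]
    return [vocab[i] for i in sims.argsort()[::-1] if vocab[i] != term][:top_n]

print(concept_search("documents about buying another company"))
print(expand_keyword("acquisition"))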
Last are the Review and QC analytics:
> Clustering – group documents by concept and visual hierarchy (each cluster is given a title of four words found together). One can then batch out by cluster (number of documents, score, e.g., 0.65). The process runs an index of all documents in the workspace, or by custodian, or by the set submitted for Analytics clustering. This facilitates Mass actions, e.g., Mass Tagging a certain cluster Not Relevant. One can batch out either using or overriding the Family Field Group identifier. (A generic clustering sketch follows this list.)
> Categorization – based upon expert, user-defined examples or categories, using example documents from Relativity Assisted Review. Use it for prioritization, sorting large volumes quickly, or creating a pivot table to visualize clusters against categories. Under the Indexing & Analytics tab, one can set the example source (e.g., a Tag), maximum categories per document, minimum coherence score (default = 70%) and issue designation.
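As a generic sketch of the clustering workflow described above (cluster in concept space, title each cluster with its top terms, then batch or mass-tag by cluster), and not Relativity's actual implementation:

# Generic sketch of concept clustering with four-word cluster titles and
# per-cluster batching, in the spirit of the workflow described above.
from collections import defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_and_batch(docs, n_clusters=3, title_words=4):
    """docs: {doc_id: text}.  Returns {cluster_title: [doc_ids]} so that each
    cluster can be batched out for review, or mass-tagged in one action."""
    ids = list(docs)
    tfidf = TfidfVectorizer(stop_words="english")
    X = tfidf.fit_transform(docs[i] for i in ids)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)

    vocab = tfidf.get_feature_names_out()
    batches = defaultdict(list)
    for doc_id, label in zip(ids, km.labels_):
        # Title each cluster with its four top-weighted terms.
        top = km.cluster_centers_[label].argsort()[::-1][:title_words]
        batches[", ".join(vocab[t] for t in top)].append(doc_id)
    return dict(batches)

Mass tagging a cluster Not Relevant then amounts to applying a single coding decision to everything in that cluster's batch.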
◊◊◊
The above notes represent a tiny fraction of what was on offer at LegalTech. The
show truly is one place and time where legal technology people, knowledge and
commerce converge. Hope to see you there next year!
-- Andy Kass
[email protected]
917-512-7503
The views expressed in this E-Discovery Tip Sheet are solely the views of the author, and do not necessarily
represent the opinion of U.S. Legal Support, Inc.
U.S. LEGAL SUPPORT, INC.
ESI & Litigation Services
PROVIDING EXPERT SOLUTIONS FROM DISCOVERY TO VERDICT
• e-Discovery
• Document Collection & Review
• Litigation Management
• Litigation Software Training
• Meet & Confer Advice
• Court Reporting Services
• At Trial Electronic Evidence Presentation
• Trial Consulting
• Demonstrative Graphics
• Courtroom & War Room Equipment
• Deposition & Case Management Services
• Record Retrieval
www.uslegalsupport.com
Copyright © 2015 U.S. Legal Support, Inc., 425 Park Avenue, New York NY 10022 (800) 824-9055. All rights reserved.
To update your e-mail address or unsubscribe from these mailings, please reply to this email with CANCEL in the subject
line.