TAR 2.0 CASE STUDY
877-797-4771 • DSicovery.com

Is your TAR temperature 98.6? We’re getting hot results using Catalyst’s Insight Predict platform to significantly streamline document review.

Challenge: Find the responsive documents in an already filtered population of 2.1 million in a secure, timely and cost-effective manner.

Solution: Using DSi and Catalyst’s Insight Predict, a robust TAR 2.0 platform, the review team found approximately 98% of the relevant documents after viewing only about 6% of the document population.

Overview

A large financial institution had allegedly been defrauded by a borrower. The details aren’t important to this discussion, but assume the borrower employed a variety of creative accounting techniques to make its financial position look better than it really was. And, as is often the case, the problems were missed by the accounting and other financial professionals conducting due diligence. Indeed, there were strong factual suggestions that one or more of the professionals were in on the scam.

As the fraud came to light, litigation followed. Perhaps in retaliation, or simply to mount a counteroffensive, the defendants hit the bank with lengthy document requests. After collection and best-efforts culling, DSi was left with over 2.1 million potentially responsive documents. Neither the deadlines nor the budget allowed for review of that volume. Keyword search offered some help, but the problem remained: how do you work with 2.1 million potentially responsive documents?

Process

DSi loaded the documents into Insight Predict, Catalyst’s proprietary system for technology assisted review. Predict uses an advanced form of Continuous Active Learning (CAL), developed by Catalyst over the past few years. The process takes advantage of Insight’s ability to rank documents in the review population on a continuous basis. As reviewers tag documents, the system takes into
account the new judgments and re-ranks the remaining, unseen documents. Review then continues against the new ranking. This review/train/rank cycle repeats until the review is complete.

Catalyst’s Continuous Active Learning process often starts with documents found through keyword search or other methods of locating relevant documents, such as witness interviews or key-custodian review. These are fed into the system as seeds for an initial ranking to get the process started. Seeds can also be added later to aid in training the algorithm: no matter how documents are found or where they are coded, those additional judgments can be fed back into Predict as judgmental seeds/training documents.

CONTINUOUS ACTIVE LEARNING

There are two aspects to continuous active learning. The first is that the process is “continuous”: training doesn’t stop until the review finishes. The second is that the training is “active”: the computer feeds documents to the review team with the goal of making the review as efficient as possible, minimizing the total cost of review. Because training and review are part of the same process, there is no requirement that a separate subject matter expert review 3,000 or so documents as “training” in advance of the review. The system also includes contextual diversity samples to combat bias, and the QC algorithm continues to learn as the review progresses.

CAL also allows rolling document collections, which occurred in this case (and are common in most cases). Since the system’s training is not based on a separate control set, but instead on measuring the fluctuation in ranking across all the files, newly collected documents can be added on the fly. They are immediately ranked, and any new subject matter introduced by the new collections is identified for review by Catalyst’s contextual diversity algorithm.
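To make the rank/review/re-rank cycle concrete, here is a minimal, self-contained sketch in Python. It is a toy illustration only, not Catalyst’s algorithm: the “model” simply scores documents by how many terms they share with documents already tagged relevant, and every name in it is invented for this example.

```python
def score(doc, relevant_terms):
    """Toy relevance model: count of known-relevant terms in the document."""
    return sum(1 for term in doc.split() if term in relevant_terms)

def cal_review(docs, is_relevant, seed_ids, batch_size=2, max_rounds=10):
    """Continuous Active Learning loop: rank the unseen documents, review the
    top batch, learn from the new judgments, re-rank, and repeat until the
    batches stop producing relevant documents."""
    reviewed, found, relevant_terms = set(), set(), set()
    for doc_id in seed_ids:                      # judgmental seeds start training
        reviewed.add(doc_id)
        if is_relevant(docs[doc_id]):
            found.add(doc_id)
            relevant_terms.update(docs[doc_id].split())
    for _ in range(max_rounds):
        unseen = [d for d in range(len(docs)) if d not in reviewed]
        if not unseen:
            break
        # Continuous ranking: every unreviewed document is re-scored each round.
        unseen.sort(key=lambda d: score(docs[d], relevant_terms), reverse=True)
        hits = 0
        for doc_id in unseen[:batch_size]:       # reviewers tag the top batch
            reviewed.add(doc_id)
            if is_relevant(docs[doc_id]):
                found.add(doc_id)
                relevant_terms.update(docs[doc_id].split())
                hits += 1
        if hits == 0:                            # relevance has petered out
            break
    return reviewed, found
```

In a real TAR 2.0 system the scoring model is a trained classifier and batches also include contextual diversity picks; the point here is only the shape of the loop, with training and review as one continuous process.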
Results: 98% Recall; Only 6% Reviewed

Our CAL protocol allowed the review team to find and review approximately 136,000 documents out of the total population of approximately 2.1 million. Of the documents reviewed, the team marked 23,950 as relevant. A systematic sample of just under 6,000 documents confirmed that the team had found approximately 98% of the relevant documents in the collection.

We can illustrate these results through a yield curve, drawn from the systematic sample taken at the end of the review.

[Figure: Yield Curve from Systematic Sample Taken at the End of the Review]

Yield curves are relatively easy to understand. The X-axis shows the number of documents actually reviewed by the team as a percentage of the total documents. The Y-axis shows the percentage of relevant/responsive documents found as the review progressed. The red line shows the expected outcome of a linear review, in which documents would be presented randomly. The blue line shows the progress of the review team in finding relevant documents. In total, the team reviewed approximately 6% of the document population and found 98% of the relevant files.

Seed Sets

TAR 1.0 products require that a senior attorney, often called a Subject Matter Expert (SME), do initial training before review can begin. Training is iterative, in that the SME goes through a series of training rounds before the process is complete. It is also one-time: when training concludes, that is it. The review team jumps in to look at documents, but there is no easy mechanism to return their judgments to the algorithm to make it smarter. One-time training means a one-time ranking.

Catalyst’s Insight Predict is built on a TAR 2.0 engine, which allows but does not require that an SME do initial training.
It encourages the use of review teams for training, and the use of senior attorneys to find relevant documents using keyword search, witness interviews and any other means at their disposal. In this case, the senior attorneys used Insight’s search tools to find initial seeds for training. They were also able to use relevant documents from an earlier production as positive seeds. Predict allows these judgment seeds to be added at any stage of the process.

Prioritized Review

After the initial ranking based on keyword and tagged seed documents, the review team began reviewing batches containing a mixture of highly ranked documents, plus a smaller number of exploratory documents chosen by the system through a “contextual diversity” algorithm. The purpose of the contextual diversity process is to find documents that are markedly different from the ones already reviewed. The platform’s proprietary algorithm identifies the most diverse sets of documents, pulls a representative document from each, and presents it to the reviewer as part of the review batch. If the reviewer tags it as relevant, Predict uses this new information to promote other similar documents for review.

Through sampling, one in 100 documents was estimated to be relevant, indicating a richness of 1%. As the CAL review progressed and the training took hold, reviewers received higher volumes of relevant documents, reflecting CAL’s objective of moving relevant documents to the top of the order. We saw relevance rates between 10% and 25% (occasionally 35%), a large increase over what could be expected from a linear review. Ultimately, the review team continued until the percentage of relevant documents in their batches petered out. Then a systematic sample was conducted to measure the team’s success at finding relevant documents.
Validation

We built a yield curve based on a systematic sample of approximately 6,000 documents. We then focused on the 5,354 sample documents that had not been reviewed and that came from the approximately 1.8 million documents left in the discard pile (i.e., below the cutoff). The purpose was to confirm that we were not leaving too many relevant documents in the discard pile, and to calculate recall.

Out of the 5,354 not-reviewed samples, the attorneys found only one document that they tagged as relevant. Using a binomial calculator, we can determine a point estimate for richness in the discard pile, along with a confidence interval around that point estimate. With a point estimate of 0.02%, we estimate there could be 371 relevant documents in the discard pile, out of 1,852,589. Using the upper confidence interval figure (0.0010) to calculate a worst-case scenario, we estimate that there could be as many as 1,853 relevant documents in the discard pile. Note that we are using a confidence level of 95%, which is an industry standard.

As noted earlier, 23,950 documents were found relevant. Using the point estimate, we can estimate that the team found 98% of the relevant documents (23,950 out of 24,321). Using the upper boundary of the confidence interval, we can estimate that the team found at least 93% of the relevant documents (23,950 out of 25,803). Both are markedly higher than the recall values approved by the courts, which are closer to 75%.

TAR at 98.6? Pretty Hot

The protocol allowed the review team to find and review approximately 136,000 documents out of the total population of approximately 2.1 million. Of the documents reviewed, the team marked 23,950 as relevant. The team found approximately 98% of the relevant documents in the collection after viewing only about 6% of the document population, skipping the review of over 1.8 million documents. That’s a pretty hot result.
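The validation arithmetic above can be reproduced with an exact (Clopper-Pearson) binomial upper bound, computed here by bisection on the binomial CDF. This is a sketch assuming that method; the binomial calculator actually used in the matter is not identified and may round differently (it reports a 0.02% point estimate and a 0.0010 upper bound), so the derived counts come out close to, but not exactly, the 371 and 1,853 quoted in the text.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def clopper_pearson_upper(x, n, alpha=0.05):
    """Exact two-sided 95% upper bound on p: the p at which observing x or
    fewer hits becomes a 2.5%-tail event, found by bisection."""
    lo, hi = x / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha / 2:
            lo = mid
        else:
            hi = mid
    return hi

# Figures from the case study.
sample_n, sample_hits = 5354, 1       # discard-pile sample: one relevant doc
discard_size = 1_852_589              # documents below the cutoff
found_relevant = 23_950               # documents tagged relevant in review

p_point = sample_hits / sample_n                        # ~0.0002, i.e. ~0.02%
p_upper = clopper_pearson_upper(sample_hits, sample_n)  # ~0.0010

missed_point = round(p_point * discard_size)
missed_upper = round(p_upper * discard_size)
recall_point = found_relevant / (found_relevant + missed_point)  # ~98%
recall_worst = found_relevant / (found_relevant + missed_upper)  # ~92-93%
```

Running this yields a point-estimate recall of roughly 98.6% and a worst-case recall just under 93%, matching the shape of the conclusions above.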
About DSi

Serving law firms and corporate legal departments worldwide since 1999, DSi (formerly Document Solutions, Inc.) is a litigation support services company that provides advanced eDiscovery and digital forensics services. Through five core business processes (DSicollect, DSintake, DSinsight, DSireview and DSisupport), DSi’s highly trained staff will help you harness today’s most advanced technology to gain a competitive advantage. DSi is headquartered in Nashville, Tenn., with offices in Knoxville, Tenn.; Cincinnati, Ohio; Charlotte, N.C.; Minneapolis, Minn.; Atlanta, Ga.; and Washington, D.C. For more information, please visit DSi at www.dsicovery.com or follow us on Twitter at @DSicovery.

About Catalyst

Catalyst designs, hosts and services the world’s fastest and most powerful document repositories for large-scale discovery and regulatory compliance. For more than 15 years, corporations and their counsel have relied on Catalyst to help reduce litigation costs and take control of complex legal matters. Catalyst provides secure, scalable, multi-language document repositories specifically built to manage Big Discovery. Through Catalyst Insight, its next-generation ediscovery platform, and Insight Predict, its advanced technology-assisted review tool, Catalyst enables corporations to reduce the cost and risk of discovery, achieve greater control and predictability in their workflows, and gain greater visibility and accountability across all their matters. To learn more about Catalyst, visit catalystsecure.com or follow the company on Twitter at @CatalystSecure.