
8 Ways to Reduce Data for Large Document Reviews

May 6, 2022

We recently hosted a webinar, “Recognize, Reduce, Review: Techniques to defensibly reduce your document review.” During the 30-minute presentation, Percipient’s Head of Forensics Vaish Palavalli and Document Review Project Manager Adam Szulczewski discussed eight techniques they use to reduce data collections to manageable sizes for efficient document reviews.

 

Check out the video of the presentation and summaries of the culling techniques below. 

 


 

 

 

Potential Data Culling Ideas:

1. File Types

“…we can potentially reduce by over a third of the documents for review.”

 

The first topic covered was culling by file types. Vaish began by pointing out that it is generally a good idea to “recognize file types that are not relevant to your project and pull them out of your review set.” This type of culling is not always done, because not every review platform supports it. If you have access to technology that filters and removes documents by file type, you can greatly reduce the initial data set and start saving on storage space from the outset.
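As a rough illustration of the idea, file-type culling can be sketched in a few lines of Python. The exclusion list and file names below are hypothetical; the right list depends on the matter, and a review platform would normally identify file types from file signatures rather than extensions alone.

```python
from pathlib import PurePath

# Hypothetical exclusion list: system and program files that rarely
# contain reviewable content. Adjust for the matter at hand.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".sys", ".tmp"}


def cull_by_file_type(file_names, excluded=EXCLUDED_EXTENSIONS):
    """Split file names into (kept, culled) by extension, ignoring case."""
    kept, culled = [], []
    for name in file_names:
        extension = PurePath(name).suffix.lower()
        if extension in excluded:
            culled.append(name)
        else:
            kept.append(name)
    return kept, culled


# Made-up collection for illustration only.
collection = [
    "contract.docx", "setup.EXE", "budget.xlsx",
    "driver.dll", "notes.txt", "cache.tmp",
]
kept, culled = cull_by_file_type(collection)
print(f"Kept {len(kept)} of {len(collection)} files; culled {len(culled)}")
```

Keeping the culled list, rather than discarding it, makes it easier to document what was removed and why.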

 

2. Date Ranges

“…recognizing documents that fall outside of your project’s timeline and eliminating them from your review.”

 

Not all data reduction techniques need to be sexy. Adam discussed an “oldie but goodie”: keeping only the data that falls between two predetermined dates and culling the rest. “This cuts down on costs significantly, because you’re only hosting data that falls within this date range period and not documents that fall outside that range that you’ll probably likely never look at.”
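A date-range cull is simple enough to sketch directly. The document IDs, dates, and range below are invented for illustration, and the sketch assumes each document carries one reliable date, which is not always true of real collections.

```python
from datetime import date


def within_range(doc_date, start, end):
    """Return True when doc_date falls inside the range, endpoints included."""
    return start <= doc_date <= end


# Hypothetical documents and project timeline.
docs = [
    {"id": "DOC-001", "date": date(2019, 12, 31)},
    {"id": "DOC-002", "date": date(2020, 1, 1)},
    {"id": "DOC-003", "date": date(2020, 6, 15)},
    {"id": "DOC-004", "date": date(2021, 1, 1)},
]
START, END = date(2020, 1, 1), date(2020, 12, 31)

in_range = [d["id"] for d in docs if within_range(d["date"], START, END)]
out_of_range = [d["id"] for d in docs if not within_range(d["date"], START, END)]
print("Host and review:", in_range)
print("Outside the timeline:", out_of_range)
```

Whether the endpoints are included is worth agreeing on up front, since documents dated on the boundary days are easy to lose otherwise.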

 

3. Search Terms

“…one of the first things we always do with our clients is create a list of search terms”

 

A pivotal part of effective data culling is the development of search terms. Search terms can be used both to identify responsive data that will need to be reviewed and to find and remove non-responsive data, reducing the noise during your review.
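The two uses of search terms can be sketched as follows. The term lists are hypothetical, and the rule that a responsive hit always wins over a noise hit is an assumption made for this sketch so that a document is never culled when it also matches a responsive term.

```python
import re


def compile_terms(terms):
    """Build one case-insensitive, whole-word pattern from a list of terms."""
    alternatives = "|".join(re.escape(term) for term in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


# Hypothetical term lists; real lists are developed with the client.
RESPONSIVE = compile_terms(["merger", "purchase agreement"])
NOISE = compile_terms(["unsubscribe", "newsletter"])


def classify(text):
    """Return 'review', 'cull', or 'no_hit' for a document's text."""
    if RESPONSIVE.search(text):
        return "review"  # responsive hits are checked first (assumption)
    if NOISE.search(text):
        return "cull"
    return "no_hit"


for sample in [
    "Draft Purchase Agreement attached",
    "Click to unsubscribe from this newsletter",
    "Lunch on Friday?",
]:
    print(f"{classify(sample):7} {sample}")
```

Whole-word matching means "merger" does not hit "mergers"; real term lists usually address this with wildcards or stemming in the review platform's own search syntax.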

 

4. Privilege

“If we can withhold privileged documents for later and review things that are less likely to need privilege culls first, then we can get our production out faster.”

 

Culling by privilege is a great technique to expedite your review. Segregating potentially privileged files at the outset of a review may limit your overall review size. It also helps the team decide what is a priority to review now and what secondary data can be reviewed later.
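One way to picture this segregation is a screen that routes potentially privileged documents to a held queue. The privilege terms, the counsel domain, and the documents below are all hypothetical, and a screen like this only flags candidates; it does not make privilege calls.

```python
# Hypothetical screening inputs; real lists come from the case team.
PRIVILEGE_TERMS = ("attorney-client", "privileged", "legal advice")
COUNSEL_DOMAINS = {"examplelaw.com"}


def is_potentially_privileged(doc):
    """Flag a document sent from a counsel domain or containing a privilege term."""
    text = doc["text"].lower()
    sender_domain = doc["sender"].rsplit("@", 1)[-1].lower()
    return sender_domain in COUNSEL_DOMAINS or any(
        term in text for term in PRIVILEGE_TERMS
    )


def build_queues(docs):
    """Return (first_pass_ids, held_ids), preserving the original order."""
    first_pass = [d["id"] for d in docs if not is_potentially_privileged(d)]
    held = [d["id"] for d in docs if is_potentially_privileged(d)]
    return first_pass, held


docs = [
    {"id": "DOC-001", "sender": "pat@client.example", "text": "Q3 sales figures"},
    {"id": "DOC-002", "sender": "lee@ExampleLaw.com", "text": "See my comments"},
    {"id": "DOC-003", "sender": "sam@client.example", "text": "Privileged and confidential"},
]
first_pass, held = build_queues(docs)
print("Review first:", first_pass)
print("Held for privilege review:", held)
```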

 

5. Near-Duplicate Analysis

“…how amazingly helpful deduping data can be when it comes to reviewing fewer documents.”

 

Near-duplicate analysis is another helpful way to reduce the data for review. In “near dupe” analysis, one document is chosen as the base, or pivot, document, and the ediscovery software compares other documents to it. The software looks for documents whose text is similar to the pivot at or above a pre-selected percentage. It is an efficient way to review a large amount of similar data quickly and, when augmented with human review of sample sets, it is highly accurate.
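The pivot comparison can be sketched with Python's standard `difflib` module. This is a stand-in, not how any particular ediscovery product measures similarity; the 90% threshold and the sample texts are assumptions for illustration.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.90  # hypothetical pre-selected percentage


def similarity(text_a, text_b):
    """Word-level similarity between two texts, from 0.0 to 1.0."""
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def near_duplicates(pivot_text, candidates, threshold=SIMILARITY_THRESHOLD):
    """Return the IDs of candidates at or above the threshold against the pivot."""
    return [
        doc_id
        for doc_id, text in candidates
        if similarity(pivot_text, text) >= threshold
    ]


pivot = "The parties agree to the terms of the supply contract dated March 1"
candidates = [
    ("DOC-B", "The parties agree to the terms of the supply contract dated March 3"),
    ("DOC-C", "Lunch is at noon on Friday"),
]
near_dupes = near_duplicates(pivot, candidates)
print("Near duplicates of the pivot:", near_dupes)
```

Documents grouped this way can be reviewed together, with a sample checked by a human reviewer to confirm the grouping is sound.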

 

6. Domains

“…recognize irrelevant data using a list of domains that are extracted from our data.”

 

Domain culling is a very simple technique that can cut review time and costs tremendously. Running a domain report early in a project makes it easy to identify irrelevant email and related data so that it never reaches the review set.
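A domain report is essentially a count of messages per sender domain. The sketch below uses made-up sender addresses; a real report would be generated by the processing platform from extracted email metadata.

```python
from collections import Counter


def domain_report(senders):
    """Count messages per sender domain, most frequent first."""
    domains = (
        address.rsplit("@", 1)[-1].lower()
        for address in senders
        if "@" in address
    )
    return Counter(domains).most_common()


# Hypothetical sender addresses for illustration.
senders = [
    "alice@client.example",
    "deals@shopping.example",
    "bob@client.example",
    "news@shopping.example",
    "DEALS@Shopping.example",
    "carol@vendor.example",
]
report = domain_report(senders)
for domain, count in report:
    print(f"{count:3}  {domain}")
```

Sorting by volume puts high-traffic domains such as retail mailers at the top, where they are easy to spot and exclude.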

 

7. Email Threading

“…group related emails together, so the reviewer sees one coherent conversation.”

 

Email threading, which identifies the most inclusive email in a chain, is a great way to reduce your data storage costs by at least 25%. With this technique, reviewers can read the last or most inclusive email in an exchange rather than reading every message one by one (and maybe not in the right order).
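The notion of a "most inclusive" email can be illustrated with a simplified containment check: an email is dropped from review when a longer email in the same thread contains its full text. The thread IDs and bodies are invented, and real threading tools also normalize quoting and handle branching conversations and attachments, which this sketch does not.

```python
from collections import defaultdict


def inclusive_emails(emails):
    """Return IDs of emails not fully contained in a longer email in the same thread."""
    threads = defaultdict(list)
    for email in emails:
        threads[email["thread_id"]].append(email)

    inclusive = []
    for members in threads.values():
        for email in members:
            contained = any(
                len(other["body"]) > len(email["body"])
                and email["body"] in other["body"]
                for other in members
            )
            if not contained:
                inclusive.append(email["id"])
    return inclusive


# Each reply quotes the full text of the message before it.
first = "Can you send the draft?"
second = "Attached.\n" + first
third = "Thanks!\n" + second

emails = [
    {"id": "E1", "thread_id": "T1", "body": first},
    {"id": "E2", "thread_id": "T1", "body": second},
    {"id": "E3", "thread_id": "T1", "body": third},
    {"id": "E4", "thread_id": "T2", "body": "Lunch Friday?"},
]
to_review = inclusive_emails(emails)
print("Inclusive emails to review:", to_review)
```

In this example four emails reduce to two for review, because E3 already carries the content of E1 and E2.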

 

8. Artificial Intelligence & TAR

“…there’s multiple ways to leverage analytics to help you cull down your review set…”

 

The final topic discussed was the use of AI. Vaish covered two types of AI for data culling. The first was clustering, a method that uses AI to collect and organize related data into buckets. The other AI tool discussed was Technology-Assisted Review (TAR). With the use of TAR, a software algorithm is trained to identify files that are similar or related to others that have been previously reviewed and tagged as responsive, privileged, etc.
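To make the TAR idea concrete, here is a deliberately simple stand-in: unreviewed documents are ranked by how much more their wording resembles documents tagged responsive than documents tagged not responsive. Commercial TAR tools use their own, more sophisticated models; the word-count similarity, the tags, and the sample texts here are assumptions for illustration only.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(vec_a, vec_b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(count * vec_b[word] for word, count in vec_a.items() if word in vec_b)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def profile(texts):
    """Sum the word counts of a group of documents into one vector."""
    total = Counter()
    for text in texts:
        total.update(vectorize(text))
    return total


def rank_unreviewed(reviewed, unreviewed):
    """Rank unreviewed IDs, most likely responsive first."""
    responsive = profile(t for t, tag in reviewed if tag == "responsive")
    other = profile(t for t, tag in reviewed if tag == "not responsive")
    scored = []
    for doc_id, text in unreviewed:
        vec = vectorize(text)
        scored.append((cosine(vec, responsive) - cosine(vec, other), doc_id))
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


# Hypothetical reviewer decisions used as training examples.
reviewed = [
    ("pricing terms for the supply contract", "responsive"),
    ("revised contract pricing attached", "responsive"),
    ("office party on friday", "not responsive"),
    ("friday lunch menu", "not responsive"),
]
unreviewed = [
    ("U1", "lunch menu for the party"),
    ("U2", "updated pricing for the contract"),
]
ranking = rank_unreviewed(reviewed, unreviewed)
print("Suggested review order:", ranking)
```

The ranking only suggests an order for review. As reviewers tag more documents, the examples grow and the ranking can be recomputed.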

 

 


 

 

Thomas Pearce
Tom Pearce, Percipient's Editorial and Content Manager, wore many hats before joining the Percipient team, including Marine, Sales Associate, and Digital Marketing Coordinator. He holds multiple HubSpot and Google certifications and brings knowledge of and passion for marketing.

(c) Percipient, LLC – not a law firm and
not licensed to practice law in any jurisdiction.