Despite advances in artificial intelligence and machine learning in e-discovery software, effective searching is still an important part of an efficient document review. However, searching can be tricky, and a poorly phrased search may return less-than-ideal results.
Why is that, and what can be done to improve search results? Here are five important search concepts and tricks to help return better-targeted search results.
When building a search index, e-discovery software uses “noise words” or “stop words” to improve search performance. Noise words are words that are so common that they are deemed unimportant for searching (for example, words like and, if, and it). Most e-discovery software skips noise words or otherwise removes stop words when it indexes documents.
Because e-discovery software ignores noise words, searching for phrases that contain them can be a problem. For example, if you search for the phrase “IT department”, the word “IT” may be ignored, and your results may include any document in which the word department is preceded by any other word.
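To make the effect concrete, here is a minimal Python sketch of how an index that drops noise words might match a phrase. The noise word list, the `tokenize` and `phrase_matches` helpers, and the choice to treat a skipped noise word as "any word in that position" are illustrative assumptions, not the documented behavior of any particular review platform.

```python
import re

# Hypothetical noise word list; real platforms ship their own defaults.
NOISE_WORDS = {"and", "if", "it", "the", "of"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_matches(phrase, document):
    """Return True if the phrase occurs in the document, treating any
    noise word in the phrase as a wildcard that matches any single word."""
    query = [None if w in NOISE_WORDS else w for w in tokenize(phrase)]
    words = tokenize(document)
    for start in range(len(words) - len(query) + 1):
        window = words[start:start + len(query)]
        if all(q is None or q == w for q, w in zip(query, window)):
            return True
    return False

docs = [
    "Please contact the IT department about your laptop.",
    "The accounting department approved the invoice.",
    "Department heads met on Monday.",
]

# Under these assumptions the first two documents are returned,
# even though only the first one mentions the IT department.
hits = [d for d in docs if phrase_matches("IT department", d)]
```

In this simplified model, "accounting department" is a hit because the noise word "IT" places no constraint on the word before "department".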
There are a few things you can do to avoid this issue. You can use proximity searches (a Boolean search technique) and other key terms to filter out junk search results. For example, if you notice that the IT department is always followed by the word computers you can use proximity searching to narrow down your search results (Department w/2 computers). You can also use other keywords that are closely associated with your term to identify documents.
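As a rough illustration of what a proximity search such as department w/2 computers checks, here is a minimal Python sketch. The `within` helper is hypothetical, and it assumes that distance is counted in whole words and that word order does not matter; actual platforms may count distance or handle ordering differently.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(document, first, second, distance):
    """Return True if `first` and `second` appear within `distance`
    words of each other, in either order."""
    words = tokenize(document)
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance for i in first_pos for j in second_pos)

near = "The IT department computers were replaced."
far = "The accounting department approved new office chairs and, later, computers."

near_hit = within(near, "department", "computers", 2)  # words are adjacent
far_hit = within(far, "department", "computers", 2)    # words are seven apart
```

Only the first document satisfies the proximity condition, which is how the extra term filters out documents that merely contain the word department.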
If proximity searching and additional keywords do not help, another way to address noise word-related search issues is to delete the relevant words from your software’s default noise word list and rebuild the index. This is a good option if you anticipate using the noise word frequently in your searches or if proximity searching does not return accurate search results. (Note that you must re-index your documents after adjusting the noise word list, and your searches might take longer to run once words have been removed from it.)
Here is a noise word list of default noise and stop words used by a few popular document review platforms:
Most e-discovery software offerings have different types of search indexes that can be used during a search. Common indexes are keyword, dtSearch, Lucene and Elasticsearch. It is important to understand which index you are using in order to optimize your search results. A common mistake is using search syntax that the selected search index does not recognize. Take a look at the following helpful chart to help you build your search:
If you typed in dog AND cat OR bird, in what order do you think your search will be performed? Many assume that because dog AND cat appear first, the software searches for those words first. That is not the case.
It is important to understand the order of operations to obtain accurate search results. Criteria within logic groups or parentheses are evaluated first, before the other search conditions.
In a long string of search terms without parentheses or logic groups, OR conditions are performed first. So, in the case of the search dog AND cat OR bird, the search runs cat OR bird first and then applies AND dog (i.e., dog AND (cat OR bird)). Because operator precedence can differ between search indexes, use parentheses and logic groups in complex searches to specify the order in which you want your search to run.
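The difference between the two readings can be shown with a small Python sketch that models each term as the set of documents containing it. The four sample documents and the `containing` helper are made up for illustration.

```python
# Hypothetical document collection: document ID -> text.
docs = {
    1: "dog cat",
    2: "dog bird",
    3: "bird",
    4: "dog",
}

def containing(term):
    """Return the IDs of documents that contain the term as a whole word."""
    return {doc_id for doc_id, text in docs.items() if term in text.split()}

dog, cat, bird = containing("dog"), containing("cat"), containing("bird")

# OR evaluated first: dog AND (cat OR bird)
or_first = dog & (cat | bird)

# AND evaluated first: (dog AND cat) OR bird
and_first = (dog & cat) | bird
```

Here `or_first` contains documents 1 and 2, while `and_first` also picks up document 3, which mentions bird but not dog. The same search string returns different results depending on which operator runs first, which is why explicit parentheses are worth the extra typing.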
RegEx (regular expression search) is a search type that uses patterns instead of terms or phrases to search documents. Many e-discovery software platforms (such as Reveal and Relativity) support regular expression searches. This is especially helpful when searching for things like Social Security numbers, phone numbers, Bates numbers, zip codes, URLs, email addresses, and dates. Here is a common use case chart and the appropriate RegEx search syntax.
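For a sense of what such patterns look like, here is a short Python sketch using the standard `re` module. These patterns are deliberately simple illustrations, not the syntax from any platform's chart: they assume US-style formats, will miss some valid values, and may match some invalid ones. Review platforms can also require their own prefixes or escaping, so check your platform's documentation before reusing them.

```python
import re

# Simplified, illustrative patterns (assumed US formats).
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "us_phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "zip_code": re.compile(r"\b\d{5}(?:-\d{4})?\b"),  # also matches any 5-digit number
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}

def find_all(text):
    """Return every match for each pattern, keyed by pattern name."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

sample = (
    "Reach Jane at jane.doe@example.com or 555-123-4567. "
    "SSN on file: 123-45-6789. Mail to Springfield, IL 62704."
)
matches = find_all(sample)
```

On the sample text, each pattern finds exactly one value. A pattern search like this finds every string with the right shape, so it is usually worth spot-checking hits before relying on them.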
When creating custom fields within your document review platform, it is easy to overlook that the type of field you add affects how searches are performed. This is an important consideration during case setup because, in most review platforms, a field type cannot be changed after the field is created. You are stuck with that field type unless you take several time-consuming steps to transfer the data into a newly created field. Understanding how you will want to search within each field will help you determine which type of field to use. Use this chart to help you decide what type of field you should use: