Despite advances in artificial intelligence and machine learning in e-discovery software, search remains an important tool for efficient document review, provided it is used properly. However, searching can be tricky, and a poorly phrased search may return less-than-ideal results. Why is that, and what can be done to improve search results? Here are five important search concepts and tricks to help you return better-targeted results.
When building a search index, e-discovery software uses “noise words” to improve search performance. Noise words are words so common that they are deemed unimportant for searching (for example, “and”, “if”, and “it”). Most e-discovery software skips noise words when it indexes documents.
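To make the idea concrete, here is a minimal Python sketch of an indexer that skips noise words. The noise word list, the sample documents, and the tokenize and build_index helpers are hypothetical; real platforms ship their own lists and index formats.

```python
import re
from collections import defaultdict

# Hypothetical noise word list; real platforms ship their own, longer lists.
NOISE_WORDS = {"and", "if", "it", "the", "a", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs, noise_words=NOISE_WORDS):
    """Map each indexed word to {doc_id: [positions]}, skipping noise words.

    Positions count every token, so a skipped noise word leaves a gap.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word in noise_words:
                continue  # noise words never enter the index
            index[word].setdefault(doc_id, []).append(pos)
    return index

docs = {
    1: "The IT department replaced the computers.",
    2: "Ask the sales department if it is ready.",
}
index = build_index(docs)
print(sorted(index))
# ['ask', 'computers', 'department', 'is', 'ready', 'replaced', 'sales']
```

Because “it” never reaches the index, a search for “it” on its own cannot match anything.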
Because e-discovery software ignores noise words, searching for a phrase that contains one can be an issue. For example, if you search for the phrase “IT department”, the software may treat “IT” as the noise word “it”, so your results may include any document in which the word “department” is preceded by some other word.
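The sketch below imitates this behavior under one assumption: that a noise word inside a phrase acts as a placeholder matching any single word, which is how the result described above arises. The noise list, the documents, and the phrase_search helper are hypothetical.

```python
import re

# Hypothetical noise word list for illustration only.
NOISE_WORDS = {"and", "if", "it", "the", "a", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_search(docs, phrase, noise_words=NOISE_WORDS):
    """Return IDs of docs containing the phrase, treating any noise word
    in the phrase as a placeholder that matches any single word."""
    terms = tokenize(phrase)
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        # Slide a window the length of the phrase across the document.
        for start in range(len(tokens) - len(terms) + 1):
            window = tokens[start:start + len(terms)]
            if all(t in noise_words or t == w for t, w in zip(terms, window)):
                hits.append(doc_id)
                break
    return hits

docs = {
    1: "The IT department replaced the computers.",
    2: "Ask the sales department if it is ready.",
    3: "Our legal department reviewed the contract.",
    4: "Nothing relevant here.",
}
print(phrase_search(docs, "IT department"))  # [1, 2, 3]
```

Only document 1 actually mentions the IT department, yet the sales and legal documents match too, because “IT” collapses to a wildcard.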
There are a few things you can do to avoid this issue. You can use proximity searches and additional key terms to filter out junk results. For example, if you notice that “IT department” is always followed by the word “computers”, you can use a proximity search (department w/2 computers) to narrow down your results. You can also search for other key words that are closely associated with your term to identify relevant documents.
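A w/N proximity search can be sketched as follows, assuming w/N means the two terms fall within N word positions of each other in either order; platforms define the exact counting rules differently. The within helper and the sample documents are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(docs, term_a, term_b, n):
    """Return IDs of docs where term_a and term_b occur within n word
    positions of each other, in either order (assumed w/N semantics)."""
    term_a, term_b = term_a.lower(), term_b.lower()
    hits = []
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        pos_a = [i for i, t in enumerate(tokens) if t == term_a]
        pos_b = [i for i, t in enumerate(tokens) if t == term_b]
        if any(abs(i - j) <= n for i in pos_a for j in pos_b):
            hits.append(doc_id)
    return hits

docs = {
    1: "The IT department computers were replaced.",
    2: "The sales department sent a memo.",
    3: "The department budget did not cover new computers.",
}
print(within(docs, "department", "computers", 2))   # [1]
print(within(docs, "department", "computers", 10))  # [1, 3]
```

A plain AND search would return documents 1 and 3; tightening the distance to 2 keeps only the document where the two words sit together.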
If proximity searching and expanded keyword searches do not help, another way to address noise word issues is to delete the noise word from your software’s default noise word list and re-index your documents. This is a good option if you expect to use the noise word frequently in your searches or if proximity searching does not return accurate results. (Note that you must re-index your documents after adjusting the noise word list, and searches may take longer to run once words are removed from it.)
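The effect of removing a noise word can be sketched by building the same small positional index twice: once with a hypothetical default list and once with “it” removed. The index layout and the build_index and phrase_hits helpers are illustrative assumptions, not any platform’s actual implementation.

```python
import re
from collections import defaultdict

# Hypothetical default noise word list for illustration only.
DEFAULT_NOISE = {"and", "if", "it", "the", "a", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs, noise_words):
    """Positional index: word -> {doc_id: [positions]}, noise words skipped."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word not in noise_words:
                index[word].setdefault(doc_id, []).append(pos)
    return index

def phrase_hits(index, phrase, noise_words):
    """Docs whose indexed words line up with the phrase; noise words in the
    phrase are not in the index, so they act as one-word placeholders."""
    terms = tokenize(phrase)
    anchors = [(off, t) for off, t in enumerate(terms) if t not in noise_words]
    if not anchors:
        return []
    first_off, first_term = anchors[0]
    hits = set()
    for doc_id, positions in index.get(first_term, {}).items():
        for p in positions:
            start = p - first_off
            if start < 0:
                continue  # the phrase would begin before the document does
            if all(start + off in index.get(t, {}).get(doc_id, [])
                   for off, t in anchors[1:]):
                hits.add(doc_id)
    return sorted(hits)

docs = {
    1: "The IT department replaced the computers.",
    2: "Ask the sales department if it is ready.",
}

# Before: "it" is a noise word, so "IT department" matches both documents.
before = phrase_hits(build_index(docs, DEFAULT_NOISE), "IT department", DEFAULT_NOISE)

# After: remove "it" from the list and rebuild (re-index) before searching.
custom_noise = DEFAULT_NOISE - {"it"}
after = phrase_hits(build_index(docs, custom_noise), "IT department", custom_noise)

print(before, after)  # [1, 2] [1]
```

The rebuilt index is slightly larger because every occurrence of “it” is now stored, which is one reason searches can slow down after noise words are removed.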
Here is a list of default noise words used by a few popular document review platforms:
Most e-discovery software offers several types of search indexes that can be used during a search. Common indexes are keyword, dtSearch, and Lucene. To optimize your search results, it is important to understand which index you are using. A common mistake is using search syntax that the selected index does not recognize. Take a look at the following chart to help you build your search:
If you typed dog AND cat OR bird, in what order do you think the search would be performed? Many assume that because dog AND cat appears first, the software searches for those words first. That is not the case.
It is important to understand the order of operations to obtain accurate search results. Criteria within logic groups or parentheses are evaluated first, before the other search conditions.
In a long string of search terms with no indication of order, OR conditions are always evaluated first. So for the search dog AND cat OR bird, the software runs cat OR bird first and then applies AND dog (i.e., dog AND (cat OR bird)). Using parentheses and logic groups in complex searches lets you specify the order in which your search runs.
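The precedence rule described here can be sketched as a small recursive-descent parser in Python in which OR binds more tightly than AND and parentheses override both. The grammar, the requirement that operators be uppercase, and the parse and matches helpers are assumptions for illustration; each platform’s query parser has its own rules.

```python
import re

def tokenize_query(query):
    """Split a query into parentheses, operators, and terms."""
    return re.findall(r"\(|\)|[^\s()]+", query)

def parse(query):
    """Parse a query where OR binds tighter than AND and parentheses
    override both. Returns a nested tuple such as ("AND", left, right)."""
    tokens = tokenize_query(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def and_expr():  # lowest precedence: AND
        node = or_expr()
        while peek() == "AND":
            take()
            node = ("AND", node, or_expr())
        return node

    def or_expr():  # higher precedence: OR
        node = primary()
        while peek() == "OR":
            take()
            node = ("OR", node, primary())
        return node

    def primary():  # a single term or a parenthesized group
        tok = take()
        if tok == "(":
            node = and_expr()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return node
        return tok.lower()

    tree = and_expr()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]}")
    return tree

def matches(node, words):
    """Evaluate a parsed query against the set of words in one document."""
    if isinstance(node, str):
        return node in words
    op, left, right = node
    if op == "AND":
        return matches(left, words) and matches(right, words)
    return matches(left, words) or matches(right, words)

print(parse("dog AND cat OR bird"))    # ('AND', 'dog', ('OR', 'cat', 'bird'))
print(parse("(dog AND cat) OR bird"))  # ('OR', ('AND', 'dog', 'cat'), 'bird')
```

For a document containing only “bird”, matches(parse("dog AND cat OR bird"), {"bird"}) is False, while the parenthesized version (dog AND cat) OR bird returns True.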
RegEx (regular expressions) is a search type that uses patterns instead of terms or phrases to search documents. This is especially helpful when searching for things like Social Security numbers, phone numbers, Bates numbers, ZIP codes, URLs, email addresses, and dates. Here is a chart of common use cases and the appropriate RegEx syntax to use for each:
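As an illustrative sketch, the patterns below express several of these use cases in Python’s re syntax. They are assumptions rather than a definitive set: review platforms use their own regex flavors, the Bates format shown (a letter prefix followed by six to eight digits) is hypothetical, and real data usually needs tuning.

```python
import re

# Illustrative patterns only (Python "re" syntax). Review platforms use their
# own regex flavors, and real data often needs looser or stricter patterns.
PATTERNS = {
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",
    "us_phone": r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]\d{4}\b",
    "zip_code": r"\b\d{5}(?:-\d{4})?\b",
    "email": r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b",
    "date_mdy": r"\b\d{1,2}/\d{1,2}/\d{2,4}\b",
    "bates": r"\b[A-Z]{2,6}[-_]?\d{6,8}\b",  # hypothetical prefix + number format
}

def find_all(text):
    """Return {pattern_name: [matches]} for every pattern that hits."""
    results = {}
    for name, pattern in PATTERNS.items():
        found = re.findall(pattern, text)
        if found:
            results[name] = found
    return results

sample = ("Call (555) 123-4567 about SSN 123-45-6789. Email jane.doe@example.com, "
          "ZIP 30301-1234, dated 03/15/2021, see ABC0012345.")
for name, found in find_all(sample).items():
    print(name, found)
```

The word boundaries (\b) keep the patterns from matching digits buried inside longer strings, such as the number portion of a Bates label.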
When creating custom fields within your document review platform, something many overlook is that the field type affects how searches on that field are performed. This is an important consideration during case setup because most review platforms do not allow a field’s type to be changed after the field is created, so you are stuck with it unless you take several time-consuming steps to transfer the data into a newly created field. Understanding how you will want to search within each field will help you decide which field type to use. Use this chart to help you decide:
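As a hypothetical illustration of why field type matters, the sketch below stores the same “Sent Date” values once as text and once as dates and runs the same range search against each. The field name, document IDs, and values are invented; real platforms implement field types in their own ways.

```python
from datetime import date, datetime

# Hypothetical "Sent Date" values, as a text-typed field would hold them.
sent_as_text = {
    "DOC-001": "12/01/2019",
    "DOC-002": "02/15/2020",
    "DOC-003": "09/30/2019",
}
# The same values as a date-typed field would hold them.
sent_as_date = {
    doc: datetime.strptime(value, "%m/%d/%Y").date()
    for doc, value in sent_as_text.items()
}

# Search: documents sent on or after October 1, 2019.
# Text field: compared character by character, so "02/15/2020" sorts before
# "10/01/2019" and the 2020 document is missed.
text_hits = sorted(d for d, v in sent_as_text.items() if v >= "10/01/2019")

# Date field: compared as actual calendar dates.
date_hits = sorted(d for d, v in sent_as_date.items() if v >= date(2019, 10, 1))

print(text_hits)  # ['DOC-001']
print(date_hits)  # ['DOC-001', 'DOC-002']
```

The text-typed search silently drops a responsive document, which is exactly the kind of problem that is hard to fix once the field type is locked in.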