Did you just receive a document production with a load file error?
You are not the first and will not be the last.
What are load files? They are files that help load and organize electronic documents and electronically stored information (ESI) when they are imported into e-discovery software so that the material can be viewed, searched, and filtered.
To make your life easier, we discuss three common load file snafus and how to remedy them.
To start, let’s discuss what is included in a database production. (For purposes of this article, a “database production” means a collection of ESI loaded into e-discovery software in a form other than its native (original) format. As a result, a load file may be necessary to organize and maintain original information about the documents, known as metadata.)
Usually a database production includes load files, along with ESI in an image format (such as TIFF or PDF) and corresponding text files. Load files contain data about each document and help link the image and text files when they are imported into an e-discovery database.
Load files also generally contain metadata relating to the documents, such as the date a document was created, the original file name, and, for email, information about senders and recipients. This data is saved in load files in tabular format, and each piece of information is separated by a delimiter: a character (often a comma) used to separate one value from another in a record.
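To make the tabular layout concrete, here is a minimal Python sketch that reads a comma-delimited load file into one record per document. The field names (DocID, ImagePath, TextPath and so on) and the sample rows are hypothetical; a real production will use whatever fields and delimiters the producing party chose.

```python
import csv
import io

def read_load_file(text, delimiter=","):
    """Parse delimited load-file text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    return list(reader)

# Hypothetical sample: one header row, then one line per document.
sample = (
    "DocID,DateCreated,FileName,EmailFrom,ImagePath,TextPath\n"
    "DOC0001,01/01/2018,memo.docx,,IMAGES/DOC0001.tif,TEXT/DOC0001.txt\n"
    'DOC0002,01/02/2018,"re, budget.msg",a@example.com,IMAGES/DOC0002.tif,TEXT/DOC0002.txt\n'
)

records = read_load_file(sample)
for record in records:
    # Each record carries the metadata plus the paths that link image and text files.
    print(record["DocID"], record["FileName"], record["ImagePath"], record["TextPath"])
```

Note that the second file name contains a comma, so it is wrapped in quotes. Without a quoting convention like this, a delimiter inside a value would be read as the start of a new field.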
Have you ever opened a load file only to find random characters that made no sense and did not look like the text you were expecting to see? Chances are that the file is incorrectly encoded.
Encoding determines how the characters in a file are stored as data, and software must know which encoding was used in order to read the file correctly. A load file may be created with an encoding that is incompatible with the e-discovery software into which it will be loaded, in which case the load file will be unreadable. It is therefore very important to resolve encoding errors before importing the load file.
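One quick clue to a file's encoding is its byte order mark (BOM), a short signature some encodings place at the very start of a file. The Python sketch below checks for the common Unicode BOMs. It is a rough first check only: many files, including plain ASCII and much UTF-8, carry no BOM at all, so a result of None does not tell you the encoding. The function name sniff_bom is our own.

```python
import codecs

# UTF-32 must be checked before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(raw):
    """Return a Python codec name if the bytes start with a known BOM, else None."""
    for bom, codec_name in BOMS:
        if raw.startswith(bom):
            return codec_name
    return None

print(sniff_bom("DocID,Author".encode("utf-16")))     # encoded with a UTF-16 BOM
print(sniff_bom("DocID,Author".encode("utf-8-sig")))  # encoded with a UTF-8 BOM
print(sniff_bom(b"DocID,Author"))                     # no BOM present
```

To check a real file, pass it the first few bytes, for example the result of reading four bytes from the file opened in binary mode.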
E-Discovery software often only recognizes certain encoding formats. Here is a list of a few popular e-discovery database platforms and compatible encoding formats:
Logikcull: UTF-8, ASCII
Relativity: Western European, Unicode, UTF-7, UTF-8, ASCII
Concordance: Unicode Standard (UTF-1, UTF-7, UTF-8, UTF-EBCDIC, UTF-16, UTF-32), ASCII
Summation: UTF-8-BOM, ASCI, ASCII
To convert a load file into the proper encoding for the software you are using, you can use a text editor such as Notepad++. Open the load file in the editor and re-save it in an encoding compatible with the software into which you will import the ESI or document production.
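If you have many load files to convert, the same open-and-re-save step can be scripted. The following Python sketch assumes you already know (or have been told by the producing party) the encoding the file was saved in; the file names and the UTF-16 source encoding in the demo are purely illustrative.

```python
import tempfile
from pathlib import Path

def reencode(src, dst, src_encoding, dst_encoding="utf-8"):
    """Decode src using src_encoding and write the same text to dst in dst_encoding."""
    raw = Path(src).read_bytes()
    # May raise UnicodeDecodeError if src_encoding is not the file's real encoding.
    text = raw.decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))

# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "production.dat"
    dst = Path(tmp) / "production_utf8.dat"
    src.write_bytes("DocID,Author\nDOC0001,José\n".encode("utf-16"))

    reencode(src, dst, "utf-16")
    print(dst.read_bytes().decode("utf-8"))
```

The script writes to a new file rather than overwriting the original, so the load file as produced stays untouched if the conversion turns out to be wrong.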
Another frequent error encountered when loading document production load files into e-discovery software is a line mismatch error. This simply means that the number of documents being uploaded does not match up with the number of lines in the load file.
Each line in a load file corresponds to a single document. Thus, the number of lines in a load file must match the number of documents being imported. If they do not match, a common cause is an extra line break in the load file.
This is a tricky error to fix because you must dig deep into the load file to find the extra line break. One way to identify the error is, again, to open the load file in a text editing tool, note the number of lines in the file, and compare it to the number of documents to be uploaded into the e-discovery software.
The number of extra lines in your load file compared to the number of documents is the number of line break errors you must fix. To correct them, review the load file and delete any unnecessary line breaks. The delimiters (text characters such as a comma or paragraph sign) show where each field begins and ends, so a line that stops partway through a record usually points to a stray break. Be aware of the type of delimiters used in your load file and maintain the proper format as you adjust any line breaks. (Check out this article for more on delimiters.) Once you have deleted the extra line breaks, the number of lines in your load file should match the number of documents you are uploading.
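Hunting for stray line breaks by eye is slow in a large file, so a short script can narrow the search. This Python sketch assumes a simple layout that your file may not share: one header row, a single-character delimiter, and no quoted values that contain the delimiter. It reports how many extra lines there are and which lines have a different number of fields than the header.

```python
def find_broken_lines(text, delimiter=",", expected_docs=None):
    """Return (extra_line_count, suspects) for load-file text with a header row.

    suspects is a list of (line_number, line) pairs whose field count
    differs from the header's. Line numbers start at 1 for the header.
    """
    lines = text.replace("\r\n", "\n").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after a final newline
    header, body = lines[0], lines[1:]
    n_fields = len(header.split(delimiter))
    suspects = [
        (line_no, line)
        for line_no, line in enumerate(body, start=2)
        if len(line.split(delimiter)) != n_fields
    ]
    extra = None if expected_docs is None else len(body) - expected_docs
    return extra, suspects

# Hypothetical sample: three documents, but a stray break splits the second record.
sample = (
    "DocID,Custodian,Subject\n"
    "DOC0001,Smith,Quarterly report\n"
    "DOC0002,Jones,Budget\n"
    "draft\n"
    "DOC0003,Lee,Agenda\n"
)

extra, suspects = find_broken_lines(sample, expected_docs=3)
print("extra lines:", extra)
for line_no, line in suspects:
    print("check line", line_no, repr(line))
```

A flagged line is often the tail end of the record on the line above it, so look at both lines before deleting the break.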
When importing data into an e-discovery database, the fields in a load file must be matched to the appropriate fields in the software. This is called mapping. Additionally, if information in the load file fields is not formatted in a way that the e-discovery software recognizes, errors will occur upon import.
An example of a common formatting error relates to dates. If the software requires dates that list the full year (01/01/2018) and your load file lists only the last two digits of the year (01/01/18), the metadata will not populate the field. To fix this, the load file must be corrected so that all dates are in the proper format of 01/01/2018. Once the formatting has been corrected in your load file, you will be able to properly import your date fields into your database.
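Correcting dates one at a time is impractical in a large load file, so here is a minimal Python sketch of the conversion. It assumes month/day/year order, and it relies on Python's rule for two-digit years, which reads 69 through 99 as 1969 through 1999 and 00 through 68 as 2000 through 2068. Confirm that rule suits your documents before relying on it. The function name expand_year is our own.

```python
from datetime import datetime

def expand_year(value):
    """Return MM/DD/YYYY for an MM/DD/YY value; return anything else unchanged."""
    try:
        parsed = datetime.strptime(value.strip(), "%m/%d/%y")
    except ValueError:
        # Not a two-digit-year date (already four digits, blank, or malformed).
        return value
    return parsed.strftime("%m/%d/%Y")

for raw in ["01/01/18", "12/31/99", "01/01/2018", ""]:
    print(repr(raw), "->", repr(expand_year(raw)))
```

Values that do not match the two-digit pattern are left alone rather than guessed at, so dates that are already correct pass through untouched.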
Another common load file formatting issue occurs when text in a metadata field is longer than the database allows. There are typically two text formats for fields in your database: a short text format that has a character limit and a long text format. If a field is formatted for short text but your load file has documents with more characters in that field than it accommodates, an error will occur during import. This commonly happens with fields such as Email CC, Email BCC, Email To, and text link fields.
To fix this problem, try to adjust database fields to handle longer text entries. (Note: In Relativity, you will not be able to change the format of a field after creation. To fix a formatting issue you need to create a new field with the proper formatting type.)
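Before importing, it can help to know which fields would overflow. The Python sketch below scans a comma-delimited load file and reports the longest value in every field that exceeds a limit. The 255-character limit and the field names are assumptions for illustration; check your own platform's documentation for its actual short-text limit.

```python
import csv
import io

def overlong_values(text, limit=255, delimiter=","):
    """Return {field name: longest length found} for fields with values over limit."""
    reader = csv.DictReader(io.StringIO(text, newline=""), delimiter=delimiter)
    longest = {}
    for row in reader:
        for field, value in row.items():
            if isinstance(value, str) and len(value) > limit:
                longest[field] = max(longest.get(field, 0), len(value))
    return longest

# Hypothetical sample: the second email has a long recipient list.
recipients = "; ".join(f"user{i}@example.com" for i in range(20))
sample = "DocID,EmailTo\nDOC0001,a@example.com\nDOC0002," + recipients + "\n"

print(overlong_values(sample))
```

Any field that appears in the result is a candidate for a long text field in the database.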
These are just a few errors that you may encounter when importing load files into an e-discovery database. If you have questions about your load files, send us a message. We are always happy to help.