Every file and item of electronically stored information (ESI) has a “hash value.” A hash value is a fixed-length string of numbers and letters that a hashing algorithm (a hash function) computes from the contents of electronic data, and it is sometimes referred to colloquially as a file’s “digital fingerprint.” In electronic discovery, the data being hashed are the individual files in the ESI collection (or portions of individual files). Two of the most common hashing algorithms are MD5 and SHA-1.
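To make the idea concrete, here is a minimal sketch of how a hash value could be computed for a file using Python's standard hashlib module. The function name hash_file, its parameters, and the chunk size are our own illustrative choices, not part of any e-discovery product.

```python
import hashlib


def hash_file(path, algorithm="sha1", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    `hash_file` is a hypothetical helper for illustration. The file is
    read in chunks so that large files do not need to fit in memory.
    """
    digest = hashlib.new(algorithm)  # e.g. "md5" or "sha1"
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling hash_file(path, "md5") returns a 32-character hexadecimal string, and hash_file(path, "sha1") returns a 40-character one, regardless of how large the file is.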
E-discovery software compares the hash values of the documents in its database and flags or segregates duplicate files so that the same document is not reviewed multiple times. A common example is an email message. Unless it has been deleted, both the sender and the recipient have a copy of the email, so rather than presenting the same message for review twice, the software identifies the file as having a duplicate and presents only one copy.
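The de-duplication step described above can be sketched roughly as follows. This is a simplified illustration, not how any particular review platform works: it assumes the duplicate copies are byte-for-byte identical, and the function names and sample file names are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_by_hash(documents):
    """Map each SHA-1 digest to the names of documents with that content.

    `documents` is assumed to be a dict of {name: bytes content}.
    """
    groups = defaultdict(list)
    for name, content in documents.items():
        groups[hashlib.sha1(content).hexdigest()].append(name)
    return dict(groups)


def select_for_review(documents):
    """Keep the first document in each hash group; flag the rest as duplicates."""
    unique, duplicates = [], []
    for names in group_by_hash(documents).values():
        unique.append(names[0])
        duplicates.extend(names[1:])
    return unique, duplicates


# Hypothetical collection: the same message held by sender and recipient.
collection = {
    "sender_copy.eml": b"Subject: Q3 report\n\nPlease see attached.",
    "recipient_copy.eml": b"Subject: Q3 report\n\nPlease see attached.",
    "memo.txt": b"Unrelated memo",
}
to_review, flagged = select_for_review(collection)
```

Here to_review holds one copy of the email plus the memo, while flagged lists the second copy of the email so that a reviewer does not see it twice.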
Hash values are also used in digital and computer forensics to ensure electronic evidence has not been altered. Even a slight modification to a file changes its hash value, so an altered file would have a different hash value than the original. To ascertain whether a file has been altered, access to the original or “native” file is generally necessary so that its hash value may be compared against that of the file in question.
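As a small illustration of how sensitive hash values are, the sketch below hashes two byte strings that differ by a single character. The sample text is our own and is not drawn from any real matter.

```python
import hashlib

# Two inputs that differ by one character ("dog" versus "cog").
original = b"The quick brown fox jumps over the lazy dog"
altered = b"The quick brown fox jumps over the lazy cog"

original_md5 = hashlib.md5(original).hexdigest()
altered_md5 = hashlib.md5(altered).hexdigest()

print("original:", original_md5)
print("altered: ", altered_md5)
print("match:   ", original_md5 == altered_md5)
```

Running this prints two entirely different 32-character values and reports that they do not match, even though only one letter was changed.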
For example, let’s say we have two files and want to determine whether either has been altered. To do this, we would use digital forensic software (one of our favorites is Magnet Axiom) to compare each file’s hash values against those of the original file.
First, we would calculate the hash value for the original file.
The information in red below shows the hash values of the original file.
Then we would compare the hash values of the other files to the original. As we can see below, Document 1 has been altered and does not match the hash of the original file:
Sample Document 1:
Sample Document 2 matches the original:
Two files match, one does not:
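For readers who want to try the same comparison without forensic software, the walkthrough above can be approximated in a few lines of Python. This is only a sketch under our own assumptions: the function names are hypothetical, it is not how Magnet Axiom works internally, and it treats a file as unaltered only when both its MD5 and SHA-1 values match the original's.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return (md5, sha1) hex digests of a file, computed in a single pass."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def compare_to_original(original_path, candidate_paths):
    """Report, for each candidate, whether both digests match the original's."""
    reference = file_digests(original_path)
    return {path: file_digests(path) == reference for path in candidate_paths}
```

Given the original file and the two sample documents, compare_to_original would return True for a document whose contents are identical to the original and False for one that has been modified in any way.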