“Deduplication,” or “deduping,” is the process of comparing computer files in a data set and removing or segregating duplicates. Two significant benefits of e-discovery software are its deduplication capabilities and its identification of “near duplicates.” (Near-duplicate documents are those that are closely related but not identical, such as contract drafts with textual differences or the same document saved in different formats.) Deduping a document collection reduces the number of documents to review. E-discovery software generally dedupes a collection by computing a hash value for each file and treating files with matching hash values as duplicates.
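As a rough illustration of hash-based deduping, the sketch below hashes each document's bytes and segregates any document whose hash has already been seen. It is a minimal example, not how any particular e-discovery product works: the function name `dedupe_by_hash`, the choice of SHA-256, and the in-memory dictionary of documents are all assumptions made for the example.

```python
import hashlib


def dedupe_by_hash(documents):
    """Split documents into unique files and exact duplicates.

    `documents` is a hypothetical mapping of file name -> file content (bytes).
    Returns (unique_names, duplicates), where each duplicate is a tuple of
    (duplicate_name, name_of_first_file_with_the_same_hash).
    """
    first_seen = {}   # hash value -> name of the first file with that hash
    unique = []
    duplicates = []
    for name, content in documents.items():
        digest = hashlib.sha256(content).hexdigest()
        if digest in first_seen:
            # Same hash as an earlier file: segregate it as a duplicate.
            duplicates.append((name, first_seen[digest]))
        else:
            first_seen[digest] = name
            unique.append(name)
    return unique, duplicates


sample = {
    "contract_v1.txt": b"This agreement is made on 1 May.",
    "contract_v1_copy.txt": b"This agreement is made on 1 May.",
    "contract_v2.txt": b"This agreement is made on 2 May.",
}
unique, duplicates = dedupe_by_hash(sample)
```

In this sample the copy is segregated as a duplicate of `contract_v1.txt`, while `contract_v2.txt` is kept. The two drafts differ by one character, so their hashes differ; this is why near duplicates need a separate kind of analysis from hash matching.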
Vertical deduplication occurs when duplicates are removed only within the documents collected from each individual data custodian, so a file held by two custodians remains in each custodian's set. This is also sometimes called custodian deduplication.
Horizontal, or global, deduplication occurs when the whole data set is analyzed across all custodians and duplicates are removed, so that only one copy of each file remains in the collection.
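The difference between the two scopes can be sketched in a few lines. This is an illustrative example only: the `(custodian, name, content)` tuples, the function names, and the use of SHA-256 are assumptions for the example, not a description of any specific tool. The only difference between the two functions is whether the custodian is part of the key used to detect a duplicate.

```python
import hashlib


def _digest(content):
    """Return a hex hash value for a file's bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def dedupe_vertical(collection):
    """Custodian-level dedupe: drop duplicates only within each custodian."""
    seen = set()
    kept = []
    for custodian, name, content in collection:
        key = (custodian, _digest(content))  # hash is scoped to the custodian
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept


def dedupe_horizontal(collection):
    """Global dedupe: drop duplicates across the whole data set."""
    seen = set()
    kept = []
    for custodian, name, content in collection:
        key = _digest(content)  # hash alone, regardless of custodian
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept


# Hypothetical collection: the same memo is held twice by one custodian
# and once by another.
collection = [
    ("alice", "memo.docx", b"Quarterly memo"),
    ("alice", "memo_copy.docx", b"Quarterly memo"),
    ("bob", "memo.docx", b"Quarterly memo"),
    ("bob", "notes.txt", b"Meeting notes"),
]
vertical = dedupe_vertical(collection)
horizontal = dedupe_horizontal(collection)
```

With this sample, vertical deduping keeps three documents (one copy of the memo for each custodian, plus the notes), while horizontal deduping keeps two (a single copy of the memo, plus the notes).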