Thursday, March 24, 2005 - 9:25 AM
88

Methods for Evaluating Deduplication

Brandy Altstadter, Immunization Registries, Scientific Technologies Corporation, 67 E. Weldon Avenue, Phoenix, AZ, USA


BACKGROUND:
Deduplication plays one of the most important roles in ensuring data quality in a registry. To ensure that registry deduplication is as accurate and complete as possible, we examined several methods for evaluating the deduplication algorithm, including:
1) Matching based on a 3rd record
2) Comparing two or more algorithms with a large dataset
3) User feedback
4) CDC Deduplication Toolkit
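
As an illustration of method 2, the following is a minimal sketch of how two deduplication algorithms might be compared on the same dataset. The records, field names, and both matching rules (`strict_match`, `loose_match`) are hypothetical and are not the algorithms evaluated in this work; the point is only that the pairs on which two algorithms disagree are the ones most worth manual review.

```python
from itertools import combinations

# Hypothetical patient records; field names are illustrative only.
records = [
    {"id": 1, "first": "Maria", "last": "Lopez", "dob": "2001-04-12"},
    {"id": 2, "first": "Maria", "last": "Lopes", "dob": "2001-04-12"},
    {"id": 3, "first": "Mario", "last": "Lopez", "dob": "2001-04-12"},
    {"id": 4, "first": "John", "last": "Smith", "dob": "1999-11-30"},
]

def strict_match(a, b):
    """Assumed algorithm A: exact match on first name, last name, and date of birth."""
    return (a["first"], a["last"], a["dob"]) == (b["first"], b["last"], b["dob"])

def loose_match(a, b):
    """Assumed algorithm B: same date of birth, same first name, same last-name initial."""
    return (
        a["dob"] == b["dob"]
        and a["first"] == b["first"]
        and a["last"][0] == b["last"][0]
    )

def flagged_pairs(matcher, recs):
    """Return the set of (id, id) pairs that a matcher flags as duplicates."""
    return {(a["id"], b["id"]) for a, b in combinations(recs, 2) if matcher(a, b)}

def compare_algorithms(recs, matcher_a, matcher_b):
    """Split flagged pairs into agreements and disagreements between two matchers."""
    pairs_a = flagged_pairs(matcher_a, recs)
    pairs_b = flagged_pairs(matcher_b, recs)
    return {
        "both": pairs_a & pairs_b,
        "only_a": pairs_a - pairs_b,
        "only_b": pairs_b - pairs_a,
    }

report = compare_algorithms(records, strict_match, loose_match)
for bucket, pairs in report.items():
    print(bucket, sorted(pairs))
```

This sketch compares every pair of records, which grows quadratically with dataset size; a comparison over a large registry extract would likely need to restrict candidate pairs first (for example, by date of birth).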

OBJECTIVE:
The purpose of the analysis was to determine the advantages and disadvantages of each method and, ultimately, to identify improvements to the deduplication algorithm.

METHOD:
Each method was evaluated using a variety of datasets (except for the CDC Deduplication Toolkit, which uses a standard dataset). The results were examined to determine the advantages and disadvantages of each method and to identify suggestions for improvement. The presentation will provide additional detail on how each method was executed, including a brief technical overview.

RESULT:
Each method has several advantages and disadvantages, which will be outlined in the presentation. Among these is the fact that the methods vary in how well they allow the quality of the deduplication algorithm to be quantified. However, each method provides valuable feedback. In some cases the findings overlap, but each method also provides some unique information that is not easily obtained from the other methods.
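
One way quality could be quantified, when a dataset with known duplicate pairs is available, is with pair-level precision and recall. The sketch below is an assumption about how such a measurement might look, not the scoring used in this evaluation or by the CDC Deduplication Toolkit; the function name `pair_metrics` and the sample pairs are hypothetical.

```python
def pair_metrics(predicted, truth):
    """Compute pair-level precision and recall for a deduplication run.

    predicted: pairs of record ids the algorithm flagged as duplicates.
    truth:     pairs of record ids known to be true duplicates.
    Pairs are order-insensitive, so (2, 1) is treated the same as (1, 2).
    """
    predicted = {tuple(sorted(p)) for p in predicted}
    truth = {tuple(sorted(p)) for p in truth}
    true_positives = len(predicted & truth)
    # Convention assumed here: an empty set scores 1.0 rather than dividing by zero.
    precision = true_positives / len(predicted) if predicted else 1.0
    recall = true_positives / len(truth) if truth else 1.0
    return precision, recall

# Hypothetical example: three known duplicate pairs, four flagged pairs.
truth = {(1, 2), (3, 4), (5, 6)}
predicted = {(2, 1), (3, 4), (7, 8), (9, 10)}

precision, recall = pair_metrics(predicted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision points to false merges (distinct patients combined), while low recall points to missed duplicates; the two errors have different consequences for a registry, so it can help to report them separately.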

CONCLUSION:
Registry deduplication is a process of continuous improvement. In addition, it is important to test and review the deduplication algorithm from multiple angles to obtain the highest level of data quality.

LEARNING OBJECTIVES:
Workshop attendees will learn several methods for evaluating their deduplication algorithms so that they can identify areas in their algorithms that need improvement.

See more of Immunization Registries Track: Data Standards: Meeting the Demand for High Quality Data
See more of The 39th National Immunization Conference (NIC)