KEYWORDS:
Data Quality, Immunization Registry, De-Duplication
BACKGROUND:
In 1998, the CDC awarded a grant to Emory University to study de-duplication of immunization registry records. The analysis of typographical errors in immunization registry data was performed as part of that study, and the research has continued since the study was completed.
OBJECTIVE(S):
The objective was to use software to recognize typographical errors in registry data, and thereby improve the ability to identify multiple records for the same person in an immunization registry database.
METHOD(S):
An iterative process was used. First, a particular pattern of typographical error was recognized by human review. That pattern was then encoded as a software algorithm that searched for and automatically identified additional records for the same person containing the same kind of error. Once the software was in place, further patterns of typographical errors often became visible in other fields of the newly matched record pairs, and the process was repeated.
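The iterative process described above can be sketched as a list of pattern detectors that grows each time a reviewer spots a new kind of error. This is only a minimal illustration, not the study's software: the three patterns shown (adjacent transposition, single-character substitution, single-character omission), the function names, and the record field names are all assumptions chosen for the example, since the abstract does not list the patterns that were actually encoded.

```python
from typing import Callable, Dict, List, Optional, Tuple


def is_transposition(a: str, b: str) -> bool:
    """True if b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def is_single_substitution(a: str, b: str) -> bool:
    """True if a and b have equal length and differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def is_single_omission(a: str, b: str) -> bool:
    """True if deleting one character from the longer value yields the shorter."""
    short, long_ = sorted((a, b), key=len)
    if len(long_) - len(short) != 1:
        return False
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


# Hypothetical pattern list. Each newly recognized pattern is appended here,
# which mirrors one iteration of the process described in the text.
PATTERNS: List[Tuple[str, Callable[[str, str], bool]]] = [
    ("transposition", is_transposition),
    ("substitution", is_single_substitution),
    ("omission", is_single_omission),
]


def classify_typo(a: str, b: str) -> Optional[str]:
    """Return the name of the first pattern that explains the difference, if any."""
    a, b = a.strip().upper(), b.strip().upper()
    for name, detect in PATTERNS:
        if detect(a, b):
            return name
    return None


def typo_fields(rec1: Dict[str, str], rec2: Dict[str, str]) -> Dict[str, str]:
    """Map each shared field whose two values differ by a known pattern to that pattern."""
    found: Dict[str, str] = {}
    for field in rec1.keys() & rec2.keys():
        label = classify_typo(str(rec1[field]), str(rec2[field]))
        if label is not None:
            found[field] = label
    return found
```

For a candidate pair such as `{"last": "Smith", "dob": "1998-03-12"}` and `{"last": "Simth", "dob": "1998-03-21"}`, `typo_fields` flags both fields as transpositions. A reviewer inspecting pairs matched this way might then notice a different kind of error in another field and add a detector for it to `PATTERNS`.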
RESULT(S):
The majority of the typographical errors observed fall into a limited number of patterns, each of which can be programmed into de-duplication algorithms.
CONCLUSION(S):
Software that recognizes certain common forms of typographical errors can enhance immunization registry de-duplication.
LEARNING OBJECTIVES:
Understand the relationship between Data Quality, typographical errors, and record de-duplication.
Web Page: www.dedup.com
Back to The 2002 Immunization Registry Conference of CDC