Chris K. Kim, Sarah H. Kang, and Mohammed A. Mohammed. Immunization Branch, California Department of Public Health, 850 Marina Bay Parkway, Building P, 2nd Floor, Richmond, CA, USA
Learning Objectives for this Presentation: By the end of the presentation, participants will be able to understand one approach to de-duplication of immunization records.
Background: De-duplication of records is essential to attain accurate and integrated data in immunization information systems (IIS).
Objectives: To create a systematic approach and algorithm for de-duplicating immunization registry records.
Methods: We developed a de-duplication algorithm that uses deterministic characteristics of data elements to identify possible duplicates, and probabilistic weights to determine the level of match between records at both the person and vaccine history levels. Outcomes from the different approaches were compared and analyzed to assess accuracy and performance.
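The two-stage idea described in the Methods can be illustrated with a minimal sketch. It is not the authors' algorithm: the field names, the blocking key, the weights, and the thresholds below are all hypothetical and chosen only to show how a deterministic candidate-selection step can be combined with weighted scoring at the person and vaccine history levels.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical agreement weights; a real system would estimate these from data.
PERSON_WEIGHTS = {"first_name": 3.0, "last_name": 4.0, "dob": 5.0}
VACCINE_WEIGHT = 4.0      # hypothetical weight for vaccine history overlap
HIGH_THRESHOLD = 12.0     # hypothetical cutoff for "high-confidence"
POSSIBLE_THRESHOLD = 7.0  # hypothetical cutoff for "possible"


def norm(value):
    """Normalize a field value for comparison."""
    return str(value).strip().lower()


def block_key(rec):
    """Deterministic step: same date of birth and same last-name initial."""
    return (rec["dob"], norm(rec["last_name"])[:1])


def person_score(a, b):
    """Sum the weights of person-level fields that are present and agree."""
    return sum(
        w for field, w in PERSON_WEIGHTS.items()
        if a.get(field) and b.get(field) and norm(a[field]) == norm(b[field])
    )


def vaccine_score(a, b):
    """Jaccard overlap of (vaccine, date) pairs, scaled by VACCINE_WEIGHT."""
    shots_a, shots_b = set(a["shots"]), set(b["shots"])
    if not shots_a or not shots_b:
        return 0.0
    return VACCINE_WEIGHT * len(shots_a & shots_b) / len(shots_a | shots_b)


def find_duplicates(records):
    """Return (id_a, id_b, label, score) for every candidate pair in a block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)

    results = []
    for group in blocks.values():
        for a, b in combinations(group, 2):
            score = person_score(a, b) + vaccine_score(a, b)
            if score >= HIGH_THRESHOLD:
                label = "high-confidence"
            elif score >= POSSIBLE_THRESHOLD:
                label = "possible"
            else:
                label = "non-match"
            results.append((a["id"], b["id"], label, score))
    return results


# Made-up example records, for illustration only.
records = [
    {"id": 1, "first_name": "Ana", "last_name": "Lopez", "dob": "2010-03-01",
     "shots": [("DTaP", "2010-05-01"), ("MMR", "2011-03-05")]},
    {"id": 2, "first_name": "ana", "last_name": "LOPEZ", "dob": "2010-03-01",
     "shots": [("DTaP", "2010-05-01"), ("MMR", "2011-03-05")]},
    {"id": 3, "first_name": "Luis", "last_name": "Lee", "dob": "2010-03-01",
     "shots": [("HepB", "2010-03-02")]},
    {"id": 4, "first_name": "Ana", "last_name": "Lopez", "dob": "2012-07-09",
     "shots": [("DTaP", "2012-09-10")]},
]

for row in find_duplicates(records):
    print(row)
```

In this sketch, records 1 to 3 share a block and are compared pairwise, while record 4 is never compared with them because its date of birth differs. Restricting comparisons to deterministic blocks keeps the number of scored pairs small, at the cost of missing duplicates whose blocking fields disagree.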
Results: The combination of deterministic and weights-based approaches was superior to the weights-based probabilistic approach alone. The combined algorithm recognized 84% of possible duplicate records as “high-confidence” matches and correctly identified 100% of the non-matches.
Conclusions: An algorithm that uses deterministic characteristics of data elements together with probabilistic weights to determine the level of match appears to be useful for the critical process of de-duplication in IIS.