BACKGROUND:
Rhode Island’s KIDSNET integrates data from multiple public health programs, including Immunization, Lead, WIC, Newborn Screening, Hearing Screening, Early Intervention, Home Visiting and Risk Response, and Vital Records. Developed in the mid-1990s, KIDSNET used a simple deterministic algorithm to match incoming data to existing KIDSNET demographic records. By 2004, KIDSNET had accumulated a queue of over 47,000 unmatched records. Working with a limited budget, Rhode Island embarked on a project to improve the matching process and ultimately reduce the number of unmatched records.
OBJECTIVE:
Demonstrate how probabilistic matching and deduplication can be implemented with the help of open source software.
METHOD:
KIDSNET’s unmatched record queue was analyzed, and surveys of matching methods and software options were conducted. Requirements were documented, and an architecture for probabilistic matching, adding, and deduplication in KIDSNET was designed. Febrl (Freely Extensible Biomedical Record Linkage), an open source package, was modified for use within the new framework. Probabilistic parameters were developed, and an extensive six-month testing process followed. The process was placed into production in May 2004.
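As a rough illustration of what probabilistic matching involves, the sketch below scores a pair of records with Fellegi-Sunter style agreement weights and sorts the pair into match, human review, or non-match. It is a minimal sketch under stated assumptions, not the KIDSNET implementation and not Febrl's API: the field names, the m and u probabilities, the two thresholds, and the functions `match_weight` and `classify` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical per-field parameters (illustrative values, not KIDSNET's):
#   m = P(field agrees | records belong to the same child)
#   u = P(field agrees | records belong to different children)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "birth_date": (0.98, 0.003),
    "zip": (0.85, 0.05),
}

# Hypothetical thresholds: at or above UPPER is an automatic match,
# at or below LOWER is a non-match, anything between goes to human review.
UPPER = 12.0
LOWER = 0.0


def match_weight(a, b):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # a missing value adds no evidence either way
        if str(va).strip().lower() == str(vb).strip().lower():
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(a, b):
    """Map the total weight for a record pair onto one of three outcomes."""
    w = match_weight(a, b)
    if w >= UPPER:
        return "match"
    if w <= LOWER:
        return "non-match"
    return "review"


incoming = {"last_name": "Silva", "first_name": "Ana",
            "birth_date": "2003-02-11", "zip": "02903"}
existing = {"last_name": "SILVA", "first_name": "Anna",
            "birth_date": "2003-02-11", "zip": "02903"}
print(classify(incoming, existing))
```

Unlike a deterministic rule that requires every key field to agree exactly, a weighted score lets strong agreement on some fields outweigh a discrepancy in another, while the middle band between the two thresholds routes ambiguous pairs to a person rather than forcing an automatic decision.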
RESULT:
The new process, combined with “human review” activity, reduced the number of unmatched records by over 93% in its first three weeks. Probabilistic deduplication combined with an interactive merging interface ensures that the number of duplicates in KIDSNET will remain low even as unmatched children are added to KIDSNET.
CONCLUSION:
There is a middle ground between less expensive, “home grown” deterministic matching and more expensive commercial products. KIDSNET has achieved success in this area with the help of open source software.
LEARNING OBJECTIVES:
To understand the software options for probabilistic matching, and to identify the potential benefits and limitations of implementing probabilistic matching with the help of open source software.
See more of Deduplication: Challenges and Solutions
See more of The 2004 Immunization Registry Conference