Development of a De-duplication model for a Public/Private Immunization Registry

Tuesday, October 28, 2003 - 4:00 PM
3787

This presentation is part of D4: Winning the Match Game: Improving Deduplication Processes

Nan Zhou, IS Specialist, Detroit, MI, USA, Daniel Lafferty, SEMHA, 3011 W. Grand Blvd. Suite 200, Detroit, MI, USA, Julie Gleason-Comstock, Southeastern Michigan Childhood Immunization Registry, Southeastern Michigan Health Association, 3011 West Grand Boulevard, Detroit, MI, USA, and Yan Jin, Computer Science Department, Wayne State University, 5143 Cass Avenue, 431 State Hall, Detroit, MI, USA.

KEYWORDS:
Data quality, De-duplication, Data Matching, Michigan Childhood Immunization Registry (MCIR), Southeastern Michigan Childhood Immunization Registry (SEMCIR)

BACKGROUND:
MCIR is a statewide electronic immunization information system accessible to both public and private providers. Unique identifiers are often absent when the database is searched for a child's record before a new record is created, so common typographical errors in other string and date fields can cause duplicate child records to be generated. A time-consuming effort was previously undertaken to manually de-duplicate the data at the regional level.

OBJECTIVE:
To describe how a de-duplication model was created and is utilized to solve the duplication problem. To discuss the challenges faced in design, implementation and maintenance.

METHOD:
Five child elements and four family elements are selected for comparison between records. Each element is assigned a specific weight according to how strongly a match on that element indicates a duplicate record. A fuzzy match process calculates the weight points of each element for a candidate match. The de-duplication model presents only those records whose total weight points are above a certain level.
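The weighted comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the registry's implementation: the abstract does not name the nine elements, their weights, the fuzzy measure, or the cut-off, so the field names, the values in `WEIGHTS`, the `THRESHOLD`, and the use of `difflib.SequenceMatcher` as the fuzzy measure are all hypothetical. Dates are compared as plain strings for simplicity.

```python
from difflib import SequenceMatcher

# Hypothetical elements and weights; the abstract does not list the real ones.
WEIGHTS = {
    "first_name": 20,
    "last_name": 20,
    "birth_date": 25,
    "gender": 5,
    "mother_maiden_name": 15,
}
THRESHOLD = 60  # assumed cut-off for reporting a candidate duplicate


def similarity(a, b):
    """Return a 0..1 similarity ratio between two strings, ignoring case."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(rec_a, rec_b):
    """Sum each element's weight scaled by how closely the two values agree."""
    total = 0.0
    for field, weight in WEIGHTS.items():
        value_a, value_b = rec_a.get(field, ""), rec_b.get(field, "")
        if value_a and value_b:  # a missing value contributes no points
            total += weight * similarity(value_a, value_b)
    return total


def find_candidates(new_rec, existing):
    """Return (score, record) pairs at or above THRESHOLD, best match first."""
    scored = [(match_score(new_rec, rec), rec) for rec in existing]
    hits = [pair for pair in scored if pair[0] >= THRESHOLD]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

With these assumed weights, a record that differs from an existing one only by a one-letter typo in the first name still scores well above the cut-off, while a record for an unrelated child does not.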

RESULT:
This de-duplication model is currently used at the regional level to automatically and efficiently locate duplicate records for the same child in the existing database. It has increased data quality, greatly reduced the previously labor-intensive manual process, and produced cost savings.

CONCLUSION:
Lessons learned from the design of the de-duplication model include the trade-offs between Soundex match and fuzzy match in a search algorithm.
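To make the Soundex versus fuzzy distinction concrete, the sketch below implements standard American Soundex and contrasts it with a string-similarity ratio. It is an assumption-laden illustration: the abstract does not say which Soundex variant or fuzzy measure the registry used, and `fuzzy_ratio` (built on `difflib.SequenceMatcher`) is a hypothetical stand-in.

```python
from difflib import SequenceMatcher

# Standard American Soundex digit groups.
CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = digit
    return (code + "000")[:4]


def fuzzy_ratio(a, b):
    """Hypothetical fuzzy measure: 0..1 similarity ratio, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Soundex keeps the first letter, so a first-letter variant changes the code,
# while the similarity ratio for the same pair stays high.
print(soundex("Katherine"), soundex("Catherine"))
print(round(fuzzy_ratio("Katherine", "Catherine"), 2))
```

Soundex yields an exact key that is cheap to index and search on, but it is sensitive to the first letter and was designed for names rather than dates. A similarity ratio tolerates errors anywhere in the string, at the cost of comparing candidate pairs one by one.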

LEARNING OBJECTIVES:
To understand a de-duplication model.
To demonstrate how the model saves time and money and increases data quality.
To understand common but critical problems of duplicate record generation and provide solutions for them.

The 2003 Immunization Registry Conference (October 27-29, 2003)
