Susan M. Salkowitz, Salkowitz Associates, LLC, Two Independence Place, Unit 1601, 233 S. 6th Street, Philadelphia, PA, USA,
Stephen Clyde, Computer Science Department, Utah State University, UMC 4205, Logan, UT, USA, and Ellen Wild, AKC-Connections, Task Force for Child Development and Survival, 750 Commerce Drive, Suite 400, Decatur, GA, USA.
BACKGROUND:
All Kids Count Connections is a peer-to-peer learning network of 11 state and local health departments engaged in developing and implementing integrated information systems. Many of the 11 projects involve the integration of immunization registries with other early child health information systems. Each of the projects is creating an enterprise-wide, person-centric integrated system for improving program efficiency and service delivery. Although the projects address similar goals, they differ in how they standardize data from various sources, identify records for the same child, and coalesce those records. These activities, collectively termed deduplication, are essential to data quality and to the overall success of any integration project. The challenge is to select the most cost-effective deduplication strategy and tools for a given environment.
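The three deduplication activities named above (standardizing, identifying records for the same child, and coalescing them) can be illustrated with a minimal Python sketch. It is not the method of any Connections project or of the CDC tool kit: the field names, the sample records, and the exact-match rule on standardized name and date of birth are assumptions made only for illustration. Production systems typically rely on more tolerant matching.

```python
from collections import defaultdict

def standardize(record):
    """Normalize whitespace and casing so equivalent values compare equal."""
    return {
        "first": record["first"].strip().upper(),
        "last": record["last"].strip().upper(),
        "dob": record["dob"].strip(),      # assumed to already be YYYY-MM-DD
        "source": record["source"],
    }

def match_key(record):
    """Hypothetical matching rule: same last name, first name, and birth date."""
    return (record["last"], record["first"], record["dob"])

def deduplicate(records):
    """Group standardized records by match key and coalesce each group."""
    groups = defaultdict(list)
    for rec in map(standardize, records):
        groups[match_key(rec)].append(rec)
    merged = []
    for (last, first, dob), recs in groups.items():
        merged.append({
            "first": first,
            "last": last,
            "dob": dob,
            # Keep track of which source systems contributed to this child.
            "sources": sorted(r["source"] for r in recs),
        })
    return merged

# Invented sample data from two hypothetical source systems.
records = [
    {"first": " maria ", "last": "Lopez", "dob": "2001-03-04", "source": "immunization"},
    {"first": "MARIA", "last": "lopez ", "dob": "2001-03-04", "source": "newborn_screening"},
    {"first": "Sam", "last": "Lee", "dob": "2002-07-09", "source": "immunization"},
]
children = deduplicate(records)
```

In this sketch the two Maria Lopez records collapse into one child record that lists both source systems, while the Sam Lee record stays separate.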
OBJECTIVE:
This study researched deduplication strategies and software used by Connections projects, analyzed available technologies, conducted limited testing of off-the-shelf solutions using CDC’s Deduplication Tool Kit, and documented the findings in matrices that compared approaches, effectiveness, costs, and other factors.
METHOD:
The research methods included surveying current practices and issues among the participating projects, exploring off-the-shelf and custom software, reviewing current technical literature, developing a method for evaluating deduplication strategies and software, and applying that method to some examples.
RESULT:
The study provides a framework and method for integration projects to assess deduplication alternatives by comparing their requirements and results to the products and practices of similar projects.
CONCLUSION:
There is no single best product. The study identified technical and non-technical factors that affect the efficacy of deduplication efforts and the ability to test and evaluate them, such as vendor cooperation, knowledge and practices of users, integration mandates, project organization, size and complexity of files, programmatic integration, interaction with external stakeholders, service delivery goals, support, and resources.
LEARNING OBJECTIVES:
To understand both the technical and non-technical issues of deduplication and to communicate a framework and method for examining alternatives and selecting strategies and software tools that match project requirements.