C5d From Chart to CART: Improving Automated Case-Finding for Pelvic Inflammatory Disease Using CART Analysis

Wednesday, March 10, 2010: 11:15 AM
Cottonwood (M1) (Omni Hotel)
Delia Scholes, PhD1, Onchee Yu, MS1, Catherine Satterwhite, MSPH, MPH2, Hillard Weinstock, MD, MPH3, Jane Grafton, BA1, Linda Wehnes, BA1 and Stuart Berman, MD, ScM2, 1Group Health Research Institute, Group Health Cooperative, Seattle, WA, 2Division of STD Prevention, Epidemiology and Surveillance Branch, Centers for Disease Control and Prevention, Atlanta, GA, 3Division of STD Prevention, CDC, Atlanta, GA

Background: Research and surveillance work addressing pelvic inflammatory disease (PID) often relies on ICD diagnostic codes from automated data sources. However, cases identified in this way may not be true PID, because these codes are also used to rule out PID, to follow up on it, and to note a history of it.

Objectives: To develop an algorithm that improves the accuracy of PID case-finding through the use of additional automated data on treatment and other aspects of care.

Methods: Using Group Health Cooperative automated data files, we identified potential PID episodes among women aged 15-44 years during 2002-2006 using ICD-9 codes. Chart reviews to verify PID status were conducted on 200 potential cases in the algorithm development dataset and another 200 in the validation dataset. Using additional information on demographics, other diagnosis and procedure codes, treatment, and care setting, we conducted a classification and regression tree (CART) analysis to develop and validate a PID case-finding algorithm.
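As a rough illustration of the core step in a CART analysis, the sketch below chooses the binary predictor whose split gives the lowest weighted Gini impurity. It is not the study's algorithm or software: the function names, the feature names (`antibiotic`, `age_15_24`), and the seven toy records are all invented for illustration, and a full CART procedure would also split recursively and prune the tree.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_binary_split(rows, labels, features):
    """Return (feature, weighted_gini) for the binary feature whose split
    yields the lowest weighted Gini impurity, or None if no feature splits."""
    n = len(labels)
    best = None
    for feature in features:
        left = [y for row, y in zip(rows, labels) if row[feature]]
        right = [y for row, y in zip(rows, labels) if not row[feature]]
        if not left or not right:
            continue  # this feature does not separate the records at all
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (feature, score)
    return best

# Toy records, invented for illustration only (NOT study data).
# Label 1 = chart-confirmed PID, 0 = not PID.
toy_rows = [
    {"antibiotic": True,  "age_15_24": True},
    {"antibiotic": True,  "age_15_24": False},
    {"antibiotic": True,  "age_15_24": True},
    {"antibiotic": True,  "age_15_24": False},
    {"antibiotic": False, "age_15_24": True},
    {"antibiotic": False, "age_15_24": False},
    {"antibiotic": False, "age_15_24": False},
]
toy_labels = [1, 1, 1, 0, 0, 0, 0]

first_split = best_binary_split(toy_rows, toy_labels, ["antibiotic", "age_15_24"])
print(first_split)
```

In this toy example the chart-review result plays the role of the class label, and the automated data elements play the role of candidate predictors.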

Results: The CART-based case-finding algorithm identified two main predictors of PID: treatment with an antibiotic and age 15-24 years. Sensitivity in the development and validation sets was 95-98%, but specificity was only 43-50%. The misclassification rate was 19-24% when PID cases were defined solely through ICD-9 codes; it fell to 14-16% using the algorithm.
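For readers less familiar with these measures, the sketch below shows how sensitivity, specificity, and the misclassification rate follow from a 2x2 comparison of algorithm output against chart review. The function name and the counts are hypothetical and are not the study's data; the sketch assumes at least one true case and one true non-case.

```python
def case_finding_metrics(tp, fp, fn, tn):
    """Accuracy measures from a 2x2 table of algorithm result vs. chart review.

    tp: flagged by the algorithm, confirmed PID on chart review
    fp: flagged by the algorithm, not PID on chart review
    fn: not flagged, confirmed PID on chart review
    tn: not flagged, not PID on chart review
    """
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),        # share of true cases flagged
        "specificity": tn / (tn + fp),        # share of non-cases not flagged
        "misclassification_rate": (fp + fn) / total,
    }

# Hypothetical counts chosen for easy arithmetic (NOT study data).
example = case_finding_metrics(tp=8, fp=3, fn=2, tn=7)
print(example)
```

High sensitivity with modest specificity, as reported here, corresponds to few missed cases (small `fn`) but a sizable number of retained false positives (larger `fp`).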

Conclusions: The PID case-finding algorithm was highly sensitive. Specificity was lower, indicating that some false-positive cases are likely to be retained. However, PID misclassification rates were substantially lower than with the ICD-9-only case identification method.

Implications for Programs, Policy, and/or Research:  Additional automated data available in many health plans nationwide can improve the accuracy of PID identification. With further validation, application of this algorithm offers the opportunity to enhance PID surveillance and research.