20837 Identification of Blood Culture Contamination Using Natural Language Processing

Sunday, August 30, 2009
Grand Hall/Exhibit Hall
Michael E. Matheny, MD, MS, MPH, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
Enlai Wang, MS, Biomedical Informatics, Mount Sinai Medical Center, New York, NY
Jacob Hathaway, MD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
Fern FitzHenry, PhD, RN, MM, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
Harvey J. Murff, MD, MPH, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
George Giles, BS, Division of Cardiology, Vanderbilt University, Nashville, TN
Steven H. Brown, MD, MS, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
Elliot M. FielStein, PhD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
Robert S. Dittus, MD, MPH, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
Brett E. Trusko, PhD, Medicine, Occupational Medicine, Mt. Sinai School of Medicine, New York, NY
Theodore Speroff, PhD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veteran's Administration, Nashville, TN
Peter L. Elkin, MD, Biomedical Informatics, Mount Sinai Medical Center, New York, NY

Abstract

Blood culture contamination during specimen collection is a significant problem, affecting 2-3% of all blood cultures and 30-40% of positive blood cultures. Guidelines exist for determining contamination, but detection cannot be automated because culture and sensitivity reports are frequently in free-text form. Extracting and evaluating these data has been an ongoing challenge in public health informatics. Concept-based natural language processing (NLP) analyzes sentence structure and syntax to extract meaning from words and phrases, relates terms to other terms in a sentence or paragraph using grammar rules, and then matches the terms to concepts in a medical ontology. Such knowledge mapping is needed, for example, to determine the family and Gram stain characteristics of a particular bacterium. However, applying NLP to semi-structured microbiology text is difficult because the text lacks the normal sentence syntax and grammar that would otherwise help identify codes and their relationships.
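Because semi-structured report lines lack sentence grammar, one common fallback is line-level pattern matching. The following is a minimal sketch of that idea, not the tool described in this abstract: the report layout (an `ORGANISM:` header followed by indented antibiotic lines ending in S, I, or R), the pattern names, and the `parse_report` function are all hypothetical assumptions made for illustration.

```python
import re

# Hypothetical layout: "ORGANISM: <name>" followed by lines of
# "<antibiotic>   <S|I|R>". Real laboratory reports vary widely.
ORGANISM_RE = re.compile(r"^\s*ORGANISM:\s*(?P<name>.+?)\s*$", re.IGNORECASE)
SUSCEPT_RE = re.compile(r"^\s*(?P<drug>[A-Za-z/\- ]+?)\s+(?P<result>[SIR])\s*$")


def parse_report(text):
    """Return (organism, antibiotic, result) tuples found in the text."""
    findings = []
    organism = None  # most recent organism header seen
    for line in text.splitlines():
        header = ORGANISM_RE.match(line)
        if header:
            organism = header.group("name").upper()
            continue
        row = SUSCEPT_RE.match(line)
        # Only keep susceptibility rows that follow an organism header.
        if row and organism is not None:
            drug = row.group("drug").strip().upper()
            findings.append((organism, drug, row.group("result")))
    return findings


sample = """ORGANISM: STAPHYLOCOCCUS EPIDERMIDIS
  OXACILLIN        R
  VANCOMYCIN       S
"""
pairs = parse_report(sample)
```

In a hybrid design, tuples like these could then be passed to a concept mapper that links organism and antibiotic names to ontology codes; that step is omitted here.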

We developed and validated a hybrid regular expression and natural language processing solution for blood culture microbiology reports, along with an algorithm to detect bacterial contamination as defined administratively by the College of American Pathologists. A total of 600 sets of reports from six Veterans Administration hospitals were used to create hospital-stratified random training (100) and testing (500) data sets. The tool was developed iteratively on the training data set to extract culture and sensitivity results and to determine contamination, and was then evaluated on the test data set for both antibiotic susceptibility data extraction and contamination detection. On the test data, our algorithm had a sensitivity of 84.8% and a positive predictive value of 96.0% for mapping antibiotics and bacteria to the appropriate sensitivity findings. The contamination detection algorithm had a sensitivity of 83.3% and a positive predictive value of 81.8%.
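For readers less familiar with the two reported metrics, the sketch below shows how sensitivity and positive predictive value are computed from counts of true positives, false positives, and false negatives. The counts used are hypothetical and chosen only for illustration; the abstract does not report the underlying counts.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that the algorithm detected."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of the algorithm's positive calls that were correct."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, NOT taken from the study.
tp, fp, fn = 8, 2, 2
sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
print(f"sensitivity = {sens:.1%}, PPV = {ppv:.1%}")
```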
