Sunday, August 30, 2009
Grand Hall/Exhibit Hall
Enlai Wang, MS, Biomedical Informatics, Mount Sinai Medical Center, New York, NY
Jacob Hathaway, MD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Fern FitzHenry, PhD, RN, MM, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Harvey J. Murff, MD, MPH, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Steven H. Brown, MD, MS, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Elliot M. Fielstein, PhD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Robert S. Dittus, MD, MPH, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Theodore Speroff, PhD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Abstract
Blood culture contamination during specimen collection is a significant
problem, affecting 2-3% of all blood cultures and 30-40% of positive blood
cultures. Guidelines exist for determining contamination, but detection is
difficult to automate because culture and sensitivity reports are frequently
written as free text. Extracting and evaluating these data has been an ongoing
challenge in public health informatics. Concept-based natural language
processing (NLP) analyzes sentence structure and syntax to extract meaning from
words and phrases, relates terms to other terms in a sentence or paragraph
using grammar rules, and then matches the terms to concepts in a medical
ontology. Such knowledge mapping is important, for example, for determining the
family and Gram stain characteristics of a particular bacterium. However,
applying NLP to semi-structured microbiology text is difficult because such
text lacks the normal sentence syntax and grammar that help identify codes and
their relationships.
We developed and validated a hybrid regular expression and NLP solution for
blood culture microbiology reports, together with an algorithm to detect
bacterial contamination as defined administratively by the College of American
Pathologists. A total of 600 sets of reports from six Veterans Administration
hospitals were used to create hospital-stratified random training (100) and
testing (500) data sets.
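
The regular-expression half of such a hybrid approach can be sketched roughly as follows, in Python. The report layout, the ORGANISM: label, the S/I/R result codes, and the tiny ONTOLOGY lookup are all assumptions made for illustration; the abstract does not show the actual report formats, the patterns used, or the ontology.

```python
import re

# Hypothetical semi-structured report; real report layouts are not given in the abstract.
REPORT = """\
ORGANISM: STAPHYLOCOCCUS EPIDERMIDIS
OXACILLIN      R
VANCOMYCIN     S
GENTAMICIN     I
"""

# Toy stand-in for a medical ontology (assumed structure, single entry).
ONTOLOGY = {
    "staphylococcus epidermidis": {
        "family": "Staphylococcaceae",
        "gram_stain": "positive",
    },
}

# One organism line, then one "DRUG  RESULT" row per antibiotic.
ORGANISM_RE = re.compile(r"^ORGANISM:[ \t]*(?P<name>.+?)[ \t]*$", re.MULTILINE)
SUSCEPT_RE = re.compile(
    r"^(?P<drug>[A-Z][A-Z/ -]*?)[ \t]+(?P<result>[SIR])[ \t]*$", re.MULTILINE
)


def parse_report(text):
    """Return (organism name or None, {antibiotic: S/I/R}) for one report."""
    match = ORGANISM_RE.search(text)
    organism = match.group("name").capitalize() if match else None
    findings = {
        m.group("drug").lower(): m.group("result")
        for m in SUSCEPT_RE.finditer(text)
    }
    return organism, findings


def lookup_concept(organism):
    """Map an extracted organism name to ontology attributes, if known."""
    return ONTOLOGY.get(organism.lower()) if organism else None


organism, findings = parse_report(REPORT)
concept = lookup_concept(organism)
```

Pattern matching handles the table-like rows that have no grammar to parse, while the lookup step stands in for the concept mapping that supplies family and Gram stain.
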
The tool was iteratively developed on the training data set to extract culture
and sensitivity results and to determine contamination, and was then evaluated
on the test data set to measure performance on antibiotic susceptibility data
extraction and contamination detection. Our algorithm had a sensitivity of
84.8% and a positive predictive value of 96.0% for mapping antibiotics and
bacteria to the appropriate susceptibility findings in the test data. The
contamination detection algorithm had a sensitivity of 83.3% and a positive
predictive value of 81.8%.
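
For reference, sensitivity and positive predictive value are computed from true positives, false positives, and false negatives against a reference standard. The sketch below shows the arithmetic in Python on invented labels; the function names and the sample data are illustrative only and are not the study's data.

```python
def confusion_counts(gold, predicted):
    """Count true positives, false positives, and false negatives."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fp = sum(1 for g, p in pairs if not g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    return tp, fp, fn


def sensitivity(tp, fn):
    """Fraction of truly positive cases that were detected."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Fraction of flagged cases that were truly positive."""
    return tp / (tp + fp)


# Invented example: True means "contaminated" per the reference standard
# (gold) or per the algorithm (predicted).
gold = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]

tp, fp, fn = confusion_counts(gold, predicted)
sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
```
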