20990 Improving Biosurveillance Using Full-Text Clinical Note Processing

Tuesday, September 1, 2009: 1:50 PM
The Learning Center
S. Trent Rosenbloom, MD, MPH, Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN
Elliot M. Fielstein, PhD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Theodore Speroff, PhD, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Peter L. Elkin, MD, Internal Medicine, Mount Sinai School of Medicine, New York, NY
Brett E. Trusko, Biomedical Informatics, Mount Sinai Medical Center, New York, NY
Steven H. Brown, MD, MS, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Robert S. Dittus, MD, MPH, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Michael E. Matheny, MD, MS, MPH, General Internal Medicine / Biomedical Informatics, Vanderbilt University / Veterans Administration, Nashville, TN
Background: Improving surveillance for chemical or biological agents and disease outbreaks is a complex challenge for public health and biomedical informatics. Symptoms documented in text-based clinical notes are an important source of surveillance data. Natural language processing (NLP) tools applied to clinical notes have untapped potential for detecting public health outbreaks. This project aimed to develop and evaluate a set of NLP-based rules to detect cases of infectious diseases relevant to biosurveillance.
Methods: Emergency department, urgent care, and primary care clinical notes from 33,000 veterans tracked by the VA National Surgical Quality Improvement Program from September 30, 1999, to September 30, 2007, were randomly assigned to a symptom training set (n = 60), a symptom testing set (n = 444), and a disease detection set (n = 216). Case definitions were constructed for 18 symptoms associated with tuberculosis, influenza, and acute hepatitis, based on symptom profiles determined by physician consensus and literature review. Clinicians performed gold-standard manual reviews of the notes to evaluate whether candidate symptoms were present and whether they were asserted positively or negatively. Each note was then indexed by the Multi-threaded Clinical Vocabulary Server (MCVS), an NLP software system that mapped concepts and keywords using SNOMED CT.
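The abstract does not describe MCVS's internal rules, so the following is only a minimal Python sketch of the general idea: dictionary-based symptom detection combined with an assertion (negation) check. The lexicon `SYMPTOM_LEXICON`, the cue list `NEGATION_CUES`, the five-token window, and all function names are hypothetical. The negation check is a simplified NegEx-style heuristic swapped in for illustration, and phrases map to plain symptom labels rather than SNOMED CT concepts.

```python
import re
from dataclasses import dataclass

# Hypothetical lexicon: surface phrase -> symptom label.
# A system such as MCVS maps text to SNOMED CT concepts instead.
SYMPTOM_LEXICON = {
    "fever": "fever",
    "febrile": "fever",
    "night sweats": "night sweats",
    "cough": "cough",
    "hemoptysis": "hemoptysis",
    "jaundice": "jaundice",
    "myalgia": "myalgia",
}

# Assumed NegEx-style negation cues (not the study's actual rule set).
NEGATION_CUES = ("no", "not", "denies", "without", "negative for")
NEGATION_WINDOW = 5  # tokens before a symptom phrase that are scanned for a cue


@dataclass(frozen=True)
class Finding:
    symptom: str
    negated: bool
    sentence: str


def split_sentences(text: str) -> list[str]:
    """Naively split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_negated(preceding_tokens: list[str]) -> bool:
    """True if any negation cue occurs within the last NEGATION_WINDOW tokens."""
    window = " ".join(preceding_tokens[-NEGATION_WINDOW:])
    return any(re.search(rf"\b{re.escape(cue)}\b", window) for cue in NEGATION_CUES)


def detect_symptoms(note: str) -> list[Finding]:
    """Return every lexicon match in the note with its assertion status."""
    findings = []
    for sentence in split_sentences(note):
        lowered = sentence.lower()
        for phrase, symptom in SYMPTOM_LEXICON.items():
            for match in re.finditer(rf"\b{re.escape(phrase)}\b", lowered):
                # Tokens in the same sentence that precede the matched phrase.
                preceding = re.findall(r"[a-z']+", lowered[: match.start()])
                findings.append(Finding(symptom, is_negated(preceding), sentence))
    return findings


note = "Patient reports cough and night sweats. Denies fever or hemoptysis."
for f in detect_symptoms(note):
    print(f.symptom, "negated" if f.negated else "asserted")
```

The window-based check is deliberately crude: it treats "No fever, but cough." as negating cough too, which is why published negation algorithms add scope-terminating terms such as "but".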
Results: A total of 12,224 sentences containing 2,608 unique phrases were parsed; 90,673 terms were mapped to 3,066 symptom concepts, and 36,410 terms were retained as keywords. Overall, the automated symptom detection algorithm had a sensitivity of 90.3% and a positive predictive value of 90.6%. The algorithm correctly determined negation for 77.3% (1,542/1,995) of symptoms.
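The reported metrics follow the standard definitions, with clinician review as the reference standard: sensitivity is TP / (TP + FN) and positive predictive value is TP / (TP + FP), where TP, FP, and FN are true positives, false positives, and false negatives. The sketch below (function names are illustrative) encodes those formulas and reproduces the negation figure from its reported counts. The abstract does not give the underlying TP, FP, and FN counts, so none are filled in.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of reference-standard symptoms the algorithm detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of algorithm detections confirmed by reviewers: TP / (TP + FP)."""
    return tp / (tp + fp)


def as_percent(ratio: float, digits: int = 1) -> float:
    """Convert a ratio to a percentage rounded to `digits` decimal places."""
    return round(100 * ratio, digits)


# Negation accuracy from the abstract's reported counts: 1,542 of 1,995 symptoms.
negation_accuracy = as_percent(1542 / 1995)
print(negation_accuracy)  # 77.3
```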
Conclusion: The natural language processing tools used in this project performed very well in detecting symptoms of three infectious diseases. We found that distinguishing between positive and negative assertions improves prediction. Concept indexing is a sensitive and predictive method for surveillance of disease outbreaks.