6th Annual Public Health Information Network Conference: Towards Improved Sensitivity, Specificity, and Timeliness of Syndromic Surveillance Systems

Wednesday, August 27, 2008: 10:00 AM
Atlanta EFG
Anna L. Buczak, PhD, National Security Technology Department, Johns Hopkins University Applied Physics Laboratory, Laurel, MD
Linda Moniz, PhD, NSTD-STJ, Johns Hopkins University Applied Physics Laboratory, Laurel, MD
Joseph Lombardo, MS, NSTD-STJ, Johns Hopkins University Applied Physics Laboratory, Laurel, MD
Existing automated syndromic surveillance systems generate too many alerts that are of no interest to public health professionals.  Most alerting methods use purely statistical approaches (e.g., regression, time series analysis) to determine whether the counts in a given syndrome/subsyndrome time series are unusually high.  These methods rely on univariate statistics, i.e., they examine each combination of syndrome, subsyndrome, age, gender, etc. separately.  As such, they are prone to a proliferation of false alarms: given a false alarm probability p for each univariate method, the combined false alarm probability across n independent data streams is $1 - (1 - p)^n$.
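To make the growth of the combined false alarm probability concrete, the following is a minimal Python sketch of the formula, assuming the n data streams raise false alarms independently of one another.  The function name and the example values of p and n are illustrative and do not come from the study.

```python
def combined_false_alarm_probability(p: float, n: int) -> float:
    """Probability that at least one of n independent streams raises a false alarm.

    Assumes every stream has the same per-stream false alarm probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # 1 minus the probability that none of the n streams alarms
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # Illustrative values only: a 1% per-stream false alarm rate
    for n in (1, 10, 100, 1000):
        print(n, round(combined_false_alarm_probability(0.01, n), 4))
```

With p = 0.01, the combined probability is already about 0.63 at n = 100, which illustrates why monitoring many univariate streams separately produces frequent false alarms.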

We describe a novel methodology for outbreak detection that requires only examples of normal behavior to train the model.  It uses machine learning techniques to learn a model of normal activity and then detects anomalies based on their dissimilarity from that activity.  Machine learning techniques, such as Support Vector Machines, are used to develop rich multivariate models that make it possible to detect abnormal relationships between different time series.  By taking into account the interplay among different parameters, the approach yields a much richer (and closer to reality) model of normalcy than purely statistical methods applied to univariate time series.
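The idea of learning normal behavior only and flagging abnormal relationships between streams can be sketched in a few lines of Python.  This is not the authors' model: it substitutes a simple Mahalanobis-distance detector for the Support Vector Machine mentioned above so that it runs with the standard library alone.  The two synthetic count streams, the helper names, and the rule of using the largest training distance as the threshold are all assumptions made for illustration.

```python
import math
import random
import statistics


def fit_normal_model(rows):
    """Estimate the mean and covariance of two count streams from normal days only."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis(point, model):
    """Distance of a (stream_a, stream_b) pair from the learned normal model."""
    (mx, my), (sxx, sxy, syy) = model
    dx, dy = point[0] - mx, point[1] - my
    det = sxx * syy - sxy ** 2  # determinant of the 2x2 covariance matrix
    return math.sqrt((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)


# Synthetic "normal days": two hypothetical streams that rise and fall together.
rng = random.Random(0)
normal_days = []
for _ in range(500):
    shared = rng.gauss(50.0, 10.0)
    normal_days.append((shared + rng.gauss(0.0, 2.0), shared + rng.gauss(0.0, 2.0)))

model = fit_normal_model(normal_days)
# Assumed threshold rule: anything farther than every training day is anomalous.
threshold = max(mahalanobis(day, model) for day in normal_days)


def is_anomalous(point):
    return mahalanobis(point, model) > threshold


if __name__ == "__main__":
    print(is_anomalous((60.0, 60.0)))  # both streams elevated together
    print(is_anomalous((60.0, 40.0)))  # streams move in opposite directions
```

In this sketch the pair (60, 40) lies about one standard deviation from the mean in each stream taken separately, so a univariate test would not flag it, yet it breaks the learned relationship between the two streams and is flagged by the multivariate model.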

ESSENCE data for normal days (no disease outbreaks) is used for training.  Testing is performed on simulated outbreaks of several illnesses added to the real background data.  The approach employs the Chief Complaint data categorized by ESSENCE into 11 syndromes and 159 subsyndromes, taking into account age, gender, and geographic location of patients.
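Adding a simulated outbreak to background data can be sketched as follows.  This is a minimal Python illustration, not the procedure used in the study: the function name, the daily counts, and the outbreak curve are hypothetical placeholders rather than ESSENCE data.

```python
def inject_outbreak(background, start_day, extra_cases):
    """Return a copy of the background daily counts with outbreak cases added.

    background  -- daily counts for one syndrome (hypothetical values)
    start_day   -- zero-based index of the first outbreak day
    extra_cases -- additional cases per day of the simulated outbreak
    """
    if start_day < 0:
        raise ValueError("start_day must be non-negative")
    counts = list(background)  # leave the original series untouched
    for offset, extra in enumerate(extra_cases):
        day = start_day + offset
        if day >= len(counts):
            break  # outbreak days past the end of the series are dropped
        counts[day] += extra
    return counts


if __name__ == "__main__":
    background = [10, 12, 11, 9, 10, 13]  # illustrative counts only
    print(inject_outbreak(background, 2, [1, 3, 2]))
```

Keeping the background series unchanged makes it possible to run a detector on both the original and the injected series and compare the alerts.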

The multivariate nature of our approach, combined with its use of a powerful machine learning technique, ultimately leads to improved sensitivity, specificity, and timeliness of outbreak detection.  An additional benefit is that training requires only examples of normal behavior, which are easier and less expensive to obtain than data about true outbreaks.

See more of: Evaluation of Surveillance Systems