21153 Computationally Efficient Event Detection and Investigation in Very Large Multivariate Temporal Data

Monday, August 31, 2009: 4:05 PM
The Learning Center
Artur Dubrawski, PhD, Auton Lab, Carnegie Mellon University, Pittsburgh, PA
Maheshkumar Sabhnani, MS, Machine Learning, Carnegie Mellon University, Pittsburgh, PA
We present a tool for rapid processing of large sets of multivariate time series, designed to support public health event detection and investigation. Many public health datasets take the form of multivariate time series collected over multiple years. One use of such data is to monitor it for signs of emerging disease outbreaks and to raise alerts early, so that the impact of such events can be effectively mitigated. This can be done by screening the data for indicative patterns known in advance and/or by detecting statistically significant anomalies. Either approach often turns out to be computationally expensive, as it requires finding the correct explanation of the detected events among millions of potential explanations involving patient demographics, symptoms, signs, or diagnoses.

Our tool uses a data structure called T-Cube, which supports retrieval and analysis of millions of time series in seconds. This advantage becomes very useful when searching through a massive number of possible explanations of events found in multivariate data. The underlying data structure typically requires no more than a few hundred megabytes of memory and can be built in a few minutes. Quick response speeds up outbreak investigations and enables exhaustive post-event analyses. We demonstrate the performance of the tool using real-world public health data from Sri Lanka.

As data collection and storage become easier and the demand for intelligent analysis of the resulting data grows, scalable analytics will become increasingly important. We believe that tools similar to the proposed one should find many uses in public health informatics applications and will enable efficient and effective processing of the increasing volumes of relevant data.
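The abstract does not detail T-Cube's internals, but the core idea of caching aggregated count time series for conjunctive queries over categorical attributes can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function names and the brute-force enumeration of attribute-value subsets are assumptions for clarity.

```python
from collections import defaultdict
from itertools import combinations

def build_cache(records, num_days):
    """Precompute a daily-count time series for every conjunction of
    attribute-value pairs appearing in the data (T-Cube-style cache).
    records: iterable of (day_index, {attribute: value}) tuples."""
    cache = defaultdict(lambda: [0] * num_days)
    for day, attrs in records:
        items = sorted(attrs.items())
        # Credit this record to every subset of its attribute-value
        # pairs, including the empty set (the overall daily total).
        for r in range(len(items) + 1):
            for combo in combinations(items, r):
                cache[frozenset(combo)][day] += 1
    return cache

def query_count(cache, conditions, start, end):
    """Total count over days [start, end] of records matching all
    the given attribute-value conditions; answered from the cache
    without rescanning the raw data."""
    series = cache.get(frozenset(conditions.items()))
    return sum(series[start:end + 1]) if series else 0

records = [
    (0, {"region": "A", "symptom": "fever"}),
    (1, {"region": "A", "symptom": "cough"}),
    (1, {"region": "B", "symptom": "fever"}),
]
cache = build_cache(records, num_days=3)
print(query_count(cache, {"region": "A"}, 0, 2))       # 2
print(query_count(cache, {"symptom": "fever"}, 0, 2))  # 2
print(query_count(cache, {}, 0, 2))                    # 3
```

Because every conjunction is precomputed, investigating an anomaly reduces to cheap dictionary lookups rather than repeated scans of the raw records. A naive cache like this grows exponentially with the number of attributes per record; a production structure would use a sparse tree over the attribute hierarchy to stay within the memory footprint the abstract reports.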