The findings and conclusions in these presentations have not been formally disseminated by the Centers for Disease Control and Prevention and should not be construed to represent any agency determination or policy.

Tuesday, March 11, 2008
P151

STD Infection Link Data Mining and Visualization

Joy Alamgir, Research and Development, Consilience Software, 3636 Executive Center Dr, Suite 120, Austin, TX, USA


Background:
STD infection investigation and containment require data mining and visualization to find high-risk contacts and sources.

Objective:
Describe computational methods for mining infection data and for determining links and patterns of infection using various “house-holding” and speciation detection algorithms.

Method:
De-identified test data (2,000 infections) were generated pseudo-stochastically. House-holding and speciation algorithms were applied to visualize and predict infection networks. Geo-coding was also applied to correlate exposure sites and so augment pattern detection.
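As an illustration only, the house-holding step can be sketched as grouping records that share any populated trait value. The abstract does not describe the actual algorithms, so everything here is an assumption: the field names (`id`, `address`, `exposure_site`, `speciation`), the simple text normalization, and the use of a union-find structure to merge linked records into networks.

```python
from collections import defaultdict


def normalize(value):
    """Lowercase and collapse whitespace; return None for blank values."""
    if value is None:
        return None
    cleaned = " ".join(str(value).lower().split())
    return cleaned or None


def link_infections(records, traits=("address", "exposure_site", "speciation")):
    """Group record ids into networks; records sharing any trait value are linked.

    Hypothetical sketch: field names and matching rules are assumptions,
    not the method used in the study.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (trait, value) -> id of the first record with that value
    for r in records:
        for trait in traits:
            value = normalize(r.get(trait))
            if value is None:
                continue  # an unpopulated trait cannot create a link
            key = (trait, value)
            if key in first_seen:
                union(first_seen[key], r["id"])
            else:
                first_seen[key] = r["id"]

    groups = defaultdict(set)
    for r in records:
        groups[find(r["id"])].add(r["id"])
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up records for illustration; not study data.
records = [
    {"id": 1, "address": "12 Oak St", "exposure_site": "Club A", "speciation": "X"},
    {"id": 2, "address": "12  oak st", "exposure_site": None, "speciation": None},
    {"id": 3, "address": None, "exposure_site": "club a", "speciation": None},
    {"id": 4, "address": "", "exposure_site": None, "speciation": None},
]
networks = link_infections(records)
```

In this sketch, records 1, 2, and 3 fall into one network through a shared address and a shared exposure site, while record 4, which has no populated traits, remains unlinked.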

Result:
The algorithms correctly created networks of infections that shared similar traits (addresses, exposure sites, and speciation). The algorithms failed to create networks when these traits were not properly populated.
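Exposure sites can only be compared as a trait once they are geocoded. One possible way to treat two geocoded sites as similar is a distance threshold, sketched below with the haversine great-circle formula. The site names, coordinates, and the 0.5 km cutoff are hypothetical, and the abstract does not state how geocoded sites were actually correlated.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearby_site_pairs(sites, max_km=0.5):
    """Return pairs of site names whose geocoded points lie within max_km.

    The threshold is an assumed, illustrative value.
    """
    names = sorted(sites)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if haversine_km(*sites[a], *sites[b]) <= max_km:
                pairs.append((a, b))
    return pairs


# Hypothetical geocoded exposure sites as (latitude, longitude).
sites = {
    "site_a": (30.2672, -97.7431),
    "site_b": (30.2680, -97.7440),
    "site_c": (30.4000, -97.7000),
}
pairs = nearby_site_pairs(sites)
```

Here only `site_a` and `site_b` fall within the assumed cutoff; a site with no coordinates could not be compared at all, which mirrors the dependence on properly populated traits.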

Conclusion:
Computation-assisted data mining and visualization can rapidly increase the efficiency of STD epidemiologists by making such patterns visible immediately.

Implications:
Such productivity-improving tools for data mining and visualization can make epidemiologists more efficient and effective across program areas, including STD. Further research is warranted to expand the “similar traits” mentioned above to include additional demographic and clinical information (race, clinical findings, and symptoms).