Sunday, August 24, 2008
South/West Halls
Public health is facing a crisis brought on by an explosion of data. Structured and unstructured data types are growing more heterogeneous, and the sources of data are becoming more and more distributed. Traditional informatics is failing to meet these challenges, and there is a need for more rapid data exploitation in dynamic and emergency situations. Newer approaches include data grids. Rather than relying on single, central ownership in data warehouses, data grids achieve collaboration through network-based data sharing, with an information or knowledge layer serving as a cross-index. However, mere visibility of data exacerbates the problem of data overload. Data mining methods are reductionistic: they discard most information and fail on problems involving individual differences, outliers, rare events, and subliminal warnings. Associative memory techniques are emerging to address this. The idea dates back to 1945, when Vannevar Bush imagined a memory assistant that would think the way we think: “The human mind … operates by association. Selection by association, rather than indexing, may yet be mechanized.” Scaling challenges have limited this approach, but recent advances provide massive armies of memory-based personal assistants; each one reads, remembers, and recalls all the associations for each person, place, and thing in massive data so that the analyst does not have to remember everything. A memory-based information layer combines and shares all the cross-document, cross-repository knowledge with empirically grounded semantics. In epidemiology, such memories can provide immediate discovery of relevant associations within populations, for example by recalling the familial, behavioral, or geographical contexts of specific associations.
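As a rough illustration of the "reads, remembers, and recalls" idea, the following minimal sketch keeps one table of co-occurrence counts per entity. It is an assumption about how such a memory could be modeled, not the system the abstract describes; the class name `AssociativeMemory`, its methods, and the sample records are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


class AssociativeMemory:
    """Toy co-occurrence memory (hypothetical): one counter table per entity."""

    def __init__(self):
        # memory[entity][other] = number of records in which both appeared
        self.memory = defaultdict(lambda: defaultdict(int))

    def observe(self, entities):
        """Read one record and remember every pairwise association in it."""
        for a, b in combinations(sorted(set(entities)), 2):
            self.memory[a][b] += 1
            self.memory[b][a] += 1

    def recall(self, entity, top=3):
        """Return the strongest associations for an entity, strongest first."""
        linked = self.memory.get(entity, {})
        # Sort by descending count, then by name so ties are deterministic.
        return sorted(linked.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Made-up records standing in for documents drawn from different repositories.
memory = AssociativeMemory()
memory.observe(["patient_17", "clinic_A", "fever"])
memory.observe(["patient_17", "clinic_A", "rash"])
memory.observe(["patient_22", "clinic_B", "fever"])

result = memory.recall("patient_17")
print(result)
```

Unlike a summary statistic, this structure retains every observed pairing, so a rare association stays recallable for the individual entity it belongs to.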
Associative memory techniques also support collaboration and social networking by remembering analytic actions. Query results include the recommendation of one analyst to another through an implicit expertise directory. This provides another knowledge layer over the data grid, one that shares experiences as well as data.
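One way to picture an implicit expertise directory is a log of which analysts have queried which entities, used to suggest colleagues with overlapping history. The sketch below rests on that assumption; `ExpertiseDirectory`, its methods, and the analyst names are hypothetical and are not taken from the system described here.

```python
from collections import Counter, defaultdict


class ExpertiseDirectory:
    """Toy expertise directory (hypothetical): remembers who queried what."""

    def __init__(self):
        # activity[entity] = Counter mapping analyst -> number of queries
        self.activity = defaultdict(Counter)

    def record_query(self, analyst, entities):
        """Remember an analytic action: this analyst asked about these entities."""
        for entity in set(entities):
            self.activity[entity][analyst] += 1

    def recommend(self, asking_analyst, entities, top=2):
        """Suggest other analysts with the most prior queries on these entities."""
        scores = Counter()
        for entity in set(entities):
            for analyst, count in self.activity.get(entity, {}).items():
                if analyst != asking_analyst:
                    scores[analyst] += count
        # Sort by descending score, then by name so ties are deterministic.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [name for name, _ in ranked[:top]]


directory = ExpertiseDirectory()
directory.record_query("ana", ["clinic_A", "fever"])
directory.record_query("ana", ["clinic_A", "rash"])
directory.record_query("ben", ["fever"])
directory.record_query("cho", ["clinic_B"])

recommended = directory.recommend("cho", ["clinic_A", "fever"])
print(recommended)
```

The directory is implicit in the sense that nobody declares their expertise; it emerges from the recorded queries.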