Validation of Clinical Data Submitted to Biosense with Whole Clinical Record Surveillance Using Natural Language Processing
Gail A Welsh MD, Peter L Elkin MD, Brett E Trusko PhD, David A Froehling MD, Dietlind Wahner-Roedler MD
Aim
Our study is a collaborative effort to validate the accuracy of data submitted from patient clinical records at the Johns Hopkins University (JHU) Biosense site to the CDC Biosense database. We will assess whether data are coded and transmitted correctly and whether they accurately reflect information in the source clinical records. We will compare the accuracy of ICD9 codes with
Background
Volunteer institutions submit clinical data, including chief complaint, from patient records to the CDC Biosense database for biosurveillance of disease outbreaks. It is not clear how accurately submitted data represents the source patient record. The process may overlook important data or report false positives.
Method
Two reviewers will review 1000 randomly selected patient records from the JHU Biosense site. The Mayo Clinic Vocabulary Server (MCVS), a natural language processor, will assign ICD9 codes independently, and reviewers will compare the codes with Biosense data for accuracy. The text of the whole patient record from JHU Biosense will then be parsed and coded in
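
The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's actual pipeline: the function names `sample_record_ids` and `compare_codes`, the record identifiers, and the toy ICD9 codes are all hypothetical, and the "reference" codes stand in for whatever codes MCVS or the reviewers assign from the source record.

```python
import random


def sample_record_ids(record_ids, n, seed=0):
    """Draw n record IDs at random without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(list(record_ids), n)


def compare_codes(submitted, reference):
    """Compare submitted ICD9 code sets against reference code sets, per record.

    Both arguments map a record ID to a set of ICD9 code strings.
    A code counts as a false positive if it was submitted but is absent from
    the reference, and as a false negative if it is in the reference but was
    not submitted.
    """
    tp = fp = fn = 0
    for record_id, ref_codes in reference.items():
        sub_codes = submitted.get(record_id, set())
        tp += len(sub_codes & ref_codes)
        fp += len(sub_codes - ref_codes)
        fn += len(ref_codes - sub_codes)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy data for illustration only; these are not study results.
submitted = {"r1": {"486", "780.6"}, "r2": {"786.2"}, "r3": set()}
reference = {"r1": {"486"}, "r2": {"786.2", "780.6"}, "r3": {"787.91"}}

result = compare_codes(submitted, reference)
print(result)  # tp=2, fp=1, fn=2 for the toy data above

# Sampling 1000 IDs from a hypothetical pool of 5000 records.
sample = sample_record_ids(range(1, 5001), 1000)
```

Counting false positives and false negatives separately keeps the two failure modes named in the Background (spurious reports and overlooked data) distinct rather than folding them into a single accuracy figure.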
Results
Our project is in progress and has been approved by the IRBs at both institutions. We plan to have interim data before PHIN.
Conclusion
We are actively collaborating on the evaluation of Biosense data.