JMIR Medical Informatics 2017; 5(2); e17; doi: 10.2196/medinform.7123

Validation of an Improved Computer-Assisted Technique for Mining Free-Text Electronic Medical Records.

Abstract: The use of electronic medical records (EMRs) offers an opportunity for clinical epidemiological research. With large EMR databases, automated analysis processes are necessary but require thorough validation before they can be routinely used.

Objective: The aim of this study was to validate a computer-assisted technique using commercially available content analysis software (SimStat-WordStat v.6 (SS/WS), Provalis Research) for mining free-text EMRs.

Methods: The dataset used for the validation process included life-long EMRs from 335 patients (17,563 rows of data), selected at random from a larger dataset (141,543 patients, ~2.6 million rows of data) and obtained from 10 equine veterinary practices in the United Kingdom. The ability of the computer-assisted technique to detect rows of data (cases) of colic, renal failure, right dorsal colitis, and non-steroidal anti-inflammatory drug (NSAID) use in the population was compared with manual classification. The first step of the computer-assisted analysis process was the definition of inclusion dictionaries to identify cases, including terms identifying a condition of interest. Words in inclusion dictionaries were selected from the list of all words in the dataset obtained in SS/WS. The second step consisted of defining an exclusion dictionary, including combinations of words to remove cases erroneously classified by the inclusion dictionary alone. The third step was the definition of a reinclusion dictionary to reinclude cases that had been erroneously classified by the exclusion dictionary. Finally, cases obtained by the exclusion dictionary were removed from cases obtained by the inclusion dictionary, and cases from the reinclusion dictionary were subsequently reincluded using R v3.0.2 (R Foundation for Statistical Computing, Vienna, Austria). Manual analysis was performed as a separate process by a single experienced clinician reading through the dataset once and classifying each row of data based on the interpretation of the free-text notes. Validation was performed by comparison of the computer-assisted method with manual analysis, which was used as the gold standard. Sensitivity, specificity, negative predictive values (NPVs), positive predictive values (PPVs), and F values of the computer-assisted process were calculated by comparing them with the manual classification.

Results: Lowest sensitivity, specificity, PPVs, NPVs, and F values were 99.82% (1128/1130), 99.88% (16410/16429), 94.6% (223/239), 100.00% (16410/16412), and 99.0% (100×2×0.983×0.998/[0.983+0.998]), respectively. The computer-assisted process required a few seconds to run, although an estimated 30 h were required for dictionary creation. Manual classification required approximately 80 man-hours.

Conclusions: The critical step in this work is the creation of accurate and inclusive dictionaries to ensure that no potential cases are missed. It is significantly easier to remove false positive terms from a SS/WS selected subset of a large database than to search that original database for potential false negatives. The benefits of using this method are proportional to the size of the dataset to be analyzed.
Publication Date: 2017-06-29
PubMed ID: 28663163
PubMed Central: PMC5509949
DOI: 10.2196/medinform.7123
The Equine Research Bank provides access to a large database of publicly available scientific literature. Inclusion in the Research Bank does not imply endorsement of study methods or findings by Mad Barn.
  • Journal Article

Summary

This research summary has been generated with artificial intelligence and may contain errors and omissions. Refer to the original study to confirm details provided.

The research focuses on validating a computer-assisted method for analyzing free-text electronic medical records using commercially available software, with the validation process involving a dataset of lifelong EMRs from 335 patients.

Objective and Methodology

  • The researchers aimed to validate a computer-assisted technique for mining free-text electronic medical records (EMRs). The approach uses commercially available content analysis software, SimStat-WordStat v.6 (Provalis Research).
  • The method uses inclusion, exclusion, and reinclusion dictionaries to identify and classify rows of data (cases). Inclusion dictionaries contain terms that identify a condition of interest, with words selected from the list of all words in the dataset. The exclusion dictionary contains combinations of words that remove cases erroneously classified by the inclusion dictionary alone. The reinclusion dictionary restores cases that the exclusion dictionary removed in error.
  • Finally, cases from the exclusion dictionary are removed from the cases identified by the inclusion dictionary, and cases from the reinclusion dictionary are then reincluded using R v3.0.2.
  • The process was validated against manual analysis done by an experienced clinician to evaluate its efficacy and accuracy.
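
The dictionary workflow described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the study built its dictionaries in SimStat-WordStat and combined the results in R, whereas the function names, regular-expression patterns, and example rows below are hypothetical. The sketch also assumes that the reinclusion dictionary is applied only to rows that the exclusion dictionary removed.

```python
import re


def matches_any(text, patterns):
    """Return True if any regex in `patterns` is found in `text` (case-insensitive)."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in patterns)


def classify_rows(rows, inclusion, exclusion, reinclusion):
    """Return the set of row indices classified as cases.

    cases = (inclusion hits - exclusion hits) | reinclusion hits
    """
    included = {i for i, text in enumerate(rows) if matches_any(text, inclusion)}
    excluded = {i for i in included if matches_any(rows[i], exclusion)}
    # Assumption: reinclusion only rescues rows that were excluded.
    reincluded = {i for i in excluded if matches_any(rows[i], reinclusion)}
    return (included - excluded) | reincluded


# Hypothetical free-text rows and dictionaries for a "colic" classifier.
rows = [
    "Colic signs overnight, given flunixin",
    "No signs of colic on examination",
    "No signs of colic yesterday but colic again today",
    "Routine vaccination",
]
inclusion = [r"\bcolic"]
exclusion = [r"\bno signs of colic\b"]
reinclusion = [r"\bcolic again\b"]

cases = classify_rows(rows, inclusion, exclusion, reinclusion)
print(sorted(cases))
```

In this toy example the second row is dropped by the exclusion dictionary, while the third row is excluded and then restored by the reinclusion dictionary.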

Dataset for Validation

  • The validation dataset comprised lifelong EMRs from 335 patients (17,563 rows of data), randomly selected from a larger dataset of 141,543 patients (about 2.6 million rows of data) obtained from 10 equine veterinary practices in the United Kingdom.
  • The researchers tested detection of cases of colic, renal failure, right dorsal colitis, and non-steroidal anti-inflammatory drug (NSAID) use.

Results

  • With manual classification as the gold standard, the computer-assisted technique showed high sensitivity, specificity, and predictive values. The lowest values reported were 99.82% sensitivity, 99.88% specificity, 94.6% PPV, 100.00% NPV, and a 99.0% F value.
  • The creation of accurate and comprehensive dictionaries was identified as a crucial step in the process to avoid missing potential cases.
  • Manual classification required approximately 80 man-hours. The computer-assisted method ran within seconds, although creating the dictionaries took an estimated 30 hours.
  • The benefits of this approach increase with the size of the dataset to be analyzed, demonstrating its scalability and potential for use with larger databases.
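
For readers who want to see how the validation metrics relate to one another, the sketch below computes them from a confusion matrix. The function names and the small example counts are hypothetical. The only figures taken from the abstract are the fractions 1128/1130 and 16410/16429 and the two values 0.983 and 0.998 in the F value formula. The F value is treated here as the harmonic mean of two proportions, which matches the formula quoted in the abstract.

```python
def f_value(precision, recall):
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    return 2 * precision * recall / (precision + recall)


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute validation metrics from counts against a gold standard.

    tp/fp/tn/fn: true positives, false positives, true negatives, false negatives.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_value": f_value(ppv, sensitivity),
    }


# Hypothetical counts, for illustration only (not from the study).
example = diagnostic_metrics(tp=8, fp=2, tn=88, fn=2)

# Fractions and F value formula as quoted in the abstract.
lowest_sensitivity_pct = round(100 * 1128 / 1130, 2)
lowest_specificity_pct = round(100 * 16410 / 16429, 2)
lowest_f_pct = round(100 * f_value(0.983, 0.998), 1)

print(example)
print(lowest_sensitivity_pct, lowest_specificity_pct, lowest_f_pct)
```

Because the F value is symmetric in its two arguments, the result does not depend on which of 0.983 and 0.998 is the precision and which is the recall.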

Conclusions

  • Overall, the study validates the efficacy and reliability of a computer-assisted technique for mining free-text electronic medical records.
  • This approach could significantly enhance the speed and efficiency of data analysis within large electronic medical record databases, advancing research in clinical epidemiology.
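
The claim that the benefit grows with dataset size can be illustrated with rough arithmetic on the timings reported in the abstract. This is a back-of-the-envelope sketch under two assumptions that the study does not state: manual classification time scales linearly with the number of rows, and dictionary creation is a fixed cost of 30 hours regardless of dataset size. The function and variable names are hypothetical.

```python
# Figures reported in the abstract.
VALIDATION_ROWS = 17_563        # rows classified manually
MANUAL_HOURS = 80.0             # approximate man-hours for manual classification
DICTIONARY_HOURS = 30.0         # estimated hours for dictionary creation
FULL_DATASET_ROWS = 2_600_000   # approximate rows in the full dataset


def manual_hours_for(rows):
    """Estimated manual effort, assuming time scales linearly with row count."""
    return rows * MANUAL_HOURS / VALIDATION_ROWS


# Rows at which manual effort equals the assumed fixed dictionary cost.
break_even_rows = DICTIONARY_HOURS * VALIDATION_ROWS / MANUAL_HOURS

# Extrapolated manual effort for the full dataset under the same assumption.
full_manual_hours = manual_hours_for(FULL_DATASET_ROWS)

print(f"Break-even at about {break_even_rows:.0f} rows")
print(f"Manual effort for full dataset: about {full_manual_hours:.0f} hours")
```

Under these assumptions the fixed dictionary cost is recovered after a few thousand rows, and the gap widens as the dataset grows. In practice a larger dataset may need additional dictionary terms, so the fixed-cost assumption is optimistic.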

Cite This Article

APA
Duz M, Marshall JF, Parkin T. (2017). Validation of an Improved Computer-Assisted Technique for Mining Free-Text Electronic Medical Records. JMIR Med Inform, 5(2), e17. https://doi.org/10.2196/medinform.7123

Publication

ISSN: 2291-9694
NlmUniqueID: 101645109
Country: Canada
Language: English
Volume: 5
Issue: 2
Pages: e17
PII: e17

Researcher Affiliations

Duz, Marco
  • School of Veterinary Medicine and Science, University of Nottingham, Loughborough, United Kingdom.
Marshall, John F
  • School of Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.
Parkin, Tim
  • School of Veterinary Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow, United Kingdom.

Conflict of Interest Statement

Conflicts of Interest: None declared.

Citations

This article has been cited 6 times.
  1. Van Olmen J, Van Nooten J, Philips H, Sollie A, Daelemans W. Predicting COVID-19 Symptoms From Free Text in Medical Records Using Artificial Intelligence: Feasibility Study. JMIR Med Inform 2022 Apr 27;10(4):e37771.
    doi: 10.2196/37771; pubmed: 35442903
  2. McKenzie J, Rajapakshe R, Shen H, Rajapakshe S, Lin A. A Semiautomated Chart Review for Assessing the Development of Radiation Pneumonitis Using Natural Language Processing: Diagnostic Accuracy and Feasibility Study. JMIR Med Inform 2021 Nov 12;9(11):e29241.
    doi: 10.2196/29241; pubmed: 34766919
  3. Li R, Niu Y, Scott SR, Zhou C, Lan L, Liang Z, Li J. Using Electronic Medical Record Data for Research in a Healthcare Information and Management Systems Society (HIMSS) Analytics Electronic Medical Record Adoption Model (EMRAM) Stage 7 Hospital in Beijing: Cross-sectional Study. JMIR Med Inform 2021 Aug 3;9(8):e24405.
    doi: 10.2196/24405; pubmed: 34342589
  4. Kim JD, Wang Y, Fujiwara T, Okuda S, Callahan TJ, Cohen KB. Open Agile text mining for bioinformatics: the PubAnnotation ecosystem. Bioinformatics 2019 Nov 1;35(21):4372-4380.
    doi: 10.1093/bioinformatics/btz227; pubmed: 30937439
  5. Hardjojo A, Gunachandran A, Pang L, Abdullah MRB, Wah W, Chong JWC, Goh EH, Teo SH, Lim G, Lee ML, Hsu W, Lee V, Chen MI, Wong F, Phang JSK. Validation of a Natural Language Processing Algorithm for Detecting Infectious Disease Symptoms in Primary Care Electronic Medical Records in Singapore. JMIR Med Inform 2018 Jun 11;6(2):e36.
    doi: 10.2196/medinform.8204; pubmed: 29907560
  6. Kim S, Park D, Choi Y, Lee K, Kim B, Jeon M, Kim J, Tan AC, Kang J. A Pilot Study of Biomedical Text Comprehension using an Attention-Based Deep Neural Reader: Design and Experimental Analysis. JMIR Med Inform 2018 Jan 5;6(1):e2.
    doi: 10.2196/medinform.8751; pubmed: 29305341