Implementing natural language processing within the clinical enterprise data warehouse for encephalopathy patient identification

Jianlin Shi, MD, MS¹; John Arego, BS²; Reed Barney, BS³; John F. Hurdle, MD, PhD¹
1. Biomedical Informatics Department, University of Utah
2. University of Utah Health
3. University of Utah Health Data Warehouse

Introduction

In recent years, natural language processing (NLP) has been studied as a way to identify specific patient cohorts and/or to assign standard terminology codes to clinical concepts [1–3]. These studies benefit both clinical research (cohort identification) and clinical practice (quality measurement, billing justification, etc.). Compared with manual coding, NLP can achieve comparable or better precision (i.e., fewer false positives). However, in previous studies [1–3], recall is compromised (i.e., false negatives increase) as precision increases. At the University of Utah hospital, a simple term-searching system (Warthog) is used to identify several disease cohorts with high recall but low precision, which requires substantial human review in post-processing. This study aims to develop a high-recall NLP system that improves Warthog’s precision while preserving its recall.

Methods

To implement this service, we first interviewed a set of current Warthog users to learn their needs and requirements. We discovered that there was an operational priority to better identify inpatients exhibiting encephalopathy. We then organized periodic stakeholder group discussions to settle on the workflow and the data exchange protocols between our NLP components and the hospital’s Enterprise Data Warehouse (EDW).
Figure 1 shows the system workflow: 1) a task scheduler periodically initiates the execution of NLP tasks; 2) it queries the related documents from the EDW to feed the NLP component; 3) the NLP component consumes free text and writes structured results back into the database (Figure 2); and 4) the results are presented through Warthog’s Web interface when requested by users.

[Figure 1. Overall Warthog-NLP Workflow: Task scheduler initiated → Query documents for the tasks → NLP component → Save output to database → Web interface to display results]

Figure 2 shows the processing pipeline of the NLP component itself. It first detects sentence boundaries [4], then finds task-specific target concepts. Next, it detects the context information of each target concept within the corresponding sentence boundary. The context information includes negation (whether a concept mention is negated), certainty (whether a mention is speculated), experiencer (whether a mention refers to the patient), and temporality (whether a mention is historical). Temporality inference then decides whether a time mention is present or historical by checking whether the mentioned date falls at least 10 days before the admission date. For instance, if the admission date is 05/20/2016, “On 3/20” is inferred to be historical.

[Figure 2. The Processing Pipeline for our NLP Component: Sentence detector → Named Entity Recognizer → ConText detector (e.g., negation detector) → Temporality inferencer]

We evaluated this NLP solution against the previous term-searching Warthog system in terms of precision and recall, using a sample of 665 visits comprising 8,068 documents.

Results

Among all the visits with positive encephalopathy mentions, two reviewers reviewed a sample of 60 documents from 21 visits, reaching 94% document-level agreement and 95% visit-level agreement. A single reviewer reviewed the remaining results. Among the 50 visits that the previous Warthog system identified as having positive encephalopathy mentions, 13 were confirmed by the reviewer.
Among the 208 visits that our NLP system identified, 178 were confirmed by the reviewer. Precision rose from 26% (13/50) to 86% (178/208), a relative improvement of 231%, and recall improved by 1269% in relative terms.

Conclusion

The NLP solution substantially improves the precision and recall of identifying visits that have documents referencing encephalopathy. This solution can be easily extended to other cohorts.

References

1. Friedman C, Shagina L, Lussier Y, et al. Automated Encoding of Clinical Documents Based on Natural Language Processing. J Am Med Inform Assoc 2004;11:392–402. doi:10.1197/jamia.M1552
2. Peissig PL, Rasmussen LV, Berg RL, et al. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records. J Am Med Inform Assoc 2012;19:225–34. doi:10.1136/amiajnl-2011-000456
3. DeLisle S, South B, Anthony JA, et al. Combining Free Text and Structured Electronic Medical Record Entries to Detect Acute Respiratory Infections. PLOS ONE 2010;5:e13377. doi:10.1371/journal.pone.0013377
4. Shi J, Mowery D, Doing-Harris KM, et al. RuSH: a Rule-based Segmentation Tool Using Hashing for Extremely Accurate Sentence Segmentation of Clinical Text. In: AMIA Annu Symp Proc. Chicago, Ill: 2016. 1587.
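
Supplementary sketch: temporality inference. The rule described in Methods (a time mention is historical when it falls 10 days before the admission date) can be illustrated with a short Python example. This is only a sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the boundary is read as "at least 10 days before admission", and a month/day mention without a year is assumed to refer to the most recent such date on or before admission.

```python
from datetime import date, timedelta

# Threshold stated in Methods; the inclusive boundary is an assumption.
HISTORICAL_WINDOW = timedelta(days=10)


def resolve_partial_date(month: int, day: int, admission: date) -> date:
    """Resolve a month/day mention (hypothetical helper).

    Assumption: the mention refers to the most recent such date on or
    before the admission date. Leap-day handling is omitted.
    """
    candidate = date(admission.year, month, day)
    if candidate > admission:
        candidate = date(admission.year - 1, month, day)
    return candidate


def infer_temporality(mention: date, admission: date) -> str:
    """Label a time mention as 'historical' or 'present'."""
    if admission - mention >= HISTORICAL_WINDOW:
        return "historical"
    return "present"


# Example from Methods: admission 05/20/2016, mention "On 3/20".
admission = date(2016, 5, 20)
mention = resolve_partial_date(3, 20, admission)
print(infer_temporality(mention, admission))  # prints "historical"
```

In this example the mention resolves to 03/20/2016, which is 61 days before admission and is therefore labeled historical.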
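
Supplementary sketch: recomputing the reported improvements. The relative improvements in Results can be checked from the reviewed visit counts. The Python sketch below uses illustrative names and makes two assumptions that the text does not state explicitly: precision is rounded to a whole percentage before the comparison, and recall for both systems is measured against the same set of true-positive visits, so the unknown denominator cancels and the relative recall gain depends only on the confirmed visit counts.

```python
def relative_improvement(old: float, new: float) -> float:
    """Return the relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Counts taken from the Results section.
warthog_flagged, warthog_confirmed = 50, 13
nlp_flagged, nlp_confirmed = 208, 178

# Precision = confirmed visits / flagged visits, as a whole percentage.
warthog_precision = round(warthog_confirmed / warthog_flagged * 100)
nlp_precision = round(nlp_confirmed / nlp_flagged * 100)
precision_gain = relative_improvement(warthog_precision, nlp_precision)

# Assumption: both systems share the same recall denominator, so the
# relative recall gain reduces to the change in confirmed visits.
recall_gain = relative_improvement(warthog_confirmed, nlp_confirmed)

print(f"precision: {warthog_precision}% -> {nlp_precision}% "
      f"(+{precision_gain:.0f}% relative)")
print(f"recall: +{recall_gain:.0f}% relative")
```

Under these assumptions the script reproduces the reported figures: 26% to 86% precision (a 231% relative gain) and a 1269% relative gain in recall.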