Implementing natural language processing within the clinical enterprise data
warehouse for encephalopathy patient identification
Jianlin Shi MD, MS1, John Arego BS2, Reed Barney BS3, John F. Hurdle MD, PhD1
1. Biomedical informatics department, University of Utah 2. University of Utah Health 3. University of
Utah Health Data Warehouse
Introduction
Natural language processing (NLP) has been studied to identify specific patient cohorts and/or to assign
standard terminology codes to clinical concepts in recent years[1–3]. These studies benefit both clinical
research in terms of cohort identification, and clinical practice regarding quality measurement, billing
justification, etc. Compared with manual coding, NLP can achieve comparable or better precision (i.e.,
fewer false positives). However, in previous studies [1–3], recall is compromised (i.e., false negatives
increase) as precision increases. At the University of Utah hospital, a simple term-searching system
(Warthog) is used to identify several disease cohorts with high recall but low precision, which requires
substantial human review as post-processing. This study aims to develop a high-recall NLP system that
improves Warthog’s precision while preserving its recall.
Methods
To implement this service, we first interviewed a set of current Warthog users to learn their needs and
requirements. We discovered that there was an operational priority to better identify patients
exhibiting encephalopathy. We then held periodic stakeholder group discussions to settle on
the workflow and the data exchange protocols between our NLP components and the hospital’s
Enterprise Data Warehouse (EDW). Figure 1 shows the system workflow: 1) a task scheduler
periodically initiates the execution of NLP tasks; 2) each task queries the related documents from the
EDW to feed the NLP component; 3) the NLP component consumes free text and writes structured results
back into the database (Figure 2); and 4) the results are presented through Warthog’s Web interface when
requested by users.
[Figure 1 flowchart: Task scheduler initiated → Query documents for the tasks → NLP component → Save output to database → Web interface to display results]
Figure 1. Overall Warthog-NLP Workflow
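The four workflow steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the EDW, the table and column names (documents, nlp_results) are hypothetical, and toy_nlp is a placeholder keyword matcher, not the actual NLP component.

```python
import sqlite3


def toy_nlp(text):
    """Placeholder for the NLP component (assumption): a bare keyword match."""
    if "encephalopathy" in text.lower():
        yield ("encephalopathy", "mentioned")


def run_nlp_task(conn, nlp, task_name):
    """One scheduled run: query documents, apply NLP, save structured output."""
    # Step 2: query the documents for the task (hypothetical schema).
    rows = conn.execute("SELECT doc_id, visit_id, text FROM documents").fetchall()
    for doc_id, visit_id, text in rows:
        # Step 3: the NLP component turns free text into structured rows.
        for concept, status in nlp(text):
            conn.execute(
                "INSERT INTO nlp_results VALUES (?, ?, ?, ?, ?)",
                (task_name, doc_id, visit_id, concept, status),
            )
    conn.commit()
    return len(rows)  # number of documents processed


def build_demo_db():
    """Create a tiny in-memory stand-in for the EDW with two fake documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (doc_id INTEGER, visit_id INTEGER, text TEXT)")
    conn.execute(
        "CREATE TABLE nlp_results "
        "(task TEXT, doc_id INTEGER, visit_id INTEGER, concept TEXT, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO documents VALUES (?, ?, ?)",
        [
            (1, 100, "Patient admitted with hepatic encephalopathy."),
            (2, 101, "Routine follow-up, no acute issues."),
        ],
    )
    conn.commit()
    return conn
```

In a deployment, step 1 would be handled by whatever scheduler calls run_nlp_task periodically, and step 4 by the Web interface reading the nlp_results table on request.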
Figure 2 shows the processing pipeline of the NLP component itself. First it detects sentence
boundaries[4], then finds task-specific target concepts. Next, it detects the context information of the
target concepts within the corresponding sentence boundary. The context information includes
negation (whether a concept mention is negated), certainty (whether a mention is speculated),
experiencer (whether a mention refers to the patient), and temporality (whether a mention is historical).
Temporality inference is applied to infer whether a time mention is present or historical, by checking if
the time mention is at least 10 days prior to the admission date. For instance, if the admission date is
05/20/2016, “On 3/20” will be inferred as historical.
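The 10-day rule can be illustrated with a short Python sketch. Several details here are assumptions rather than part of the described system: the function name infer_temporality, the month/day regular expression, treating exactly 10 days as historical, and resolving a mention without a year to the most recent such date on or before admission.

```python
import re
from datetime import date

HISTORICAL_THRESHOLD_DAYS = 10  # from the text; the inclusive boundary is an assumption

# Matches a bare month/day mention such as "3/20" (assumed format).
MONTH_DAY = re.compile(r"\b(\d{1,2})/(\d{1,2})\b")


def infer_temporality(text, admission_date):
    """Return 'historical', 'present', or None if no month/day mention is found."""
    match = MONTH_DAY.search(text)
    if match is None:
        return None
    month, day = int(match.group(1)), int(match.group(2))
    # Assumption: a mention without a year refers to the most recent
    # occurrence of that date on or before the admission date.
    mentioned = date(admission_date.year, month, day)
    if mentioned > admission_date:
        mentioned = date(admission_date.year - 1, month, day)
    gap_days = (admission_date - mentioned).days
    return "historical" if gap_days >= HISTORICAL_THRESHOLD_DAYS else "present"
```

With an admission date of date(2016, 5, 20), the mention “On 3/20” falls 61 days earlier and is classified as historical, while “On 5/18” falls 2 days earlier and is classified as present.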
[Figure 2 flowchart: Sentence detector → Named Entity Recognizer → ConText detector (e.g., Negation detector) → Temporality inferencer]
Figure 2. The Processing Pipeline for our NLP Component.
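The first three pipeline stages can be mimicked with a toy Python sketch. It is not the system's implementation: the regular-expression sentence splitter is a naive stand-in for the rule-based segmenter [4], the cue list is a small hypothetical sample, and only negation is handled among the context attributes.

```python
import re

TARGET_TERMS = ("encephalopathy",)  # task-specific target concepts
# Hypothetical, deliberately small list of negation cues.
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for)\b")


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def annotate(text):
    """Find target concepts and flag those preceded by a negation cue."""
    results = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        for term in TARGET_TERMS:
            position = lowered.find(term)
            if position == -1:
                continue
            # Context is searched only inside the same sentence,
            # in the text that precedes the concept mention.
            negated = NEGATION_CUES.search(lowered[:position]) is not None
            results.append({"concept": term, "negated": negated, "sentence": sentence})
    return results
```

For the input "Patient has hepatic encephalopathy. No evidence of encephalopathy today.", the first mention is reported as not negated and the second as negated, because the cue "no" appears before the concept within the same sentence.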
We evaluated this NLP solution against the previous term-searching Warthog system in terms of precision
and recall, using a sample of 665 visits comprising 8,068 documents.
Results
Among all the visits with encephalopathy-positive mentions, two reviewers reviewed a sample of 60
documents from 21 visits, with 94% document-level agreement and 95% visit-level agreement. One reviewer
reviewed the remaining results. Among the 50 visits that the previous Warthog system identified as having
encephalopathy-positive mentions, 13 were confirmed by the reviewer (precision 26%). Among the 208 visits
that our NLP system identified, 178 were confirmed by the reviewer (precision 86%). In relative terms,
precision improved by 231% and recall improved by 1269% (178 versus 13 confirmed visits).
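The reported improvements can be reproduced from the visit counts with a little arithmetic, sketched below in Python. Two assumptions are made: both systems are scored against the same set of truly positive visits, so the recall ratio reduces to the ratio of confirmed visits, and the precision figure is computed from precisions rounded to two decimals (unrounded values give roughly 229%).

```python
def precision(confirmed, flagged):
    """Fraction of flagged visits that the reviewer confirmed."""
    return confirmed / flagged


def relative_change_pct(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# Counts taken from the Results paragraph.
baseline_confirmed, baseline_flagged = 13, 50
nlp_confirmed, nlp_flagged = 178, 208

baseline_precision = round(precision(baseline_confirmed, baseline_flagged), 2)
nlp_precision = round(precision(nlp_confirmed, nlp_flagged), 2)

# Assumption: precisions are rounded to two decimals before taking the ratio.
precision_gain = round(relative_change_pct(baseline_precision, nlp_precision))

# Assumption: same denominator (true positive visits) for both systems,
# so the relative recall change equals the relative change in confirmed visits.
recall_gain = round(relative_change_pct(baseline_confirmed, nlp_confirmed))
```

Under these assumptions precision_gain evaluates to 231 and recall_gain to 1269, matching the figures in the text.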
Conclusion
The NLP solution substantially improves both the precision and the recall of identifying visits that have
documents referencing encephalopathy. The same approach can be extended to other cohorts.
References
1 Friedman C, Shagina L, Lussier Y, et al. Automated Encoding of Clinical Documents Based on Natural
Language Processing. J Am Med Inform Assoc 2004;11:392–402. doi:10.1197/jamia.M1552
2 Peissig PL, Rasmussen LV, Berg RL, et al. Importance of multi-modal approaches to effectively identify
cataract cases from electronic health records. J Am Med Inform Assoc 2012;19:225–34.
doi:10.1136/amiajnl-2011-000456
3 DeLisle S, South B, Anthony JA, et al. Combining Free Text and Structured Electronic Medical Record
Entries to Detect Acute Respiratory Infections. PLOS ONE 2010;5:e13377.
doi:10.1371/journal.pone.0013377
4 Shi J, Mowery D, Doing-Harris KM, et al. RuSH: a Rule-based Segmentation Tool
Using Hashing for Extremely Accurate Sentence Segmentation of Clinical Text. In: AMIA Annu Symp
Proc. Chicago, Ill: 2016. 1587.