SIIM 2017 Scientific Session
Reporting & Communication Part 2
Friday, June 2 | 1:15 pm – 2:45 pm

Intelligently Searching Inside the Patient Record Using Meaningful Synonym Expansion
Gabe Mankovich, Philips Research; Roger Ballard; Amir Tahmasebi, PhD; Lucas Oliveira (Presenter)

Hypothesis
It is possible to build an intelligent search engine for locating various mentions of diseases and anatomies within a patient’s narrative health record documents.

Introduction
In a typical clinical workflow, clinicians frequently have to refer to the patient’s record to answer or resolve questions that affect clinical decisions. Locating the desired information can be challenging, particularly when it is buried in narrative records such as pathology or radiology reports. Going through every document one by one can be very time-consuming, and a crucial piece of information may be missed when the patient has many such records. Missing a critical piece of information before making a final diagnosis could have a dramatic effect on the patient’s health.

An intelligent search tool that lets clinicians quickly search through a patient’s history and accurately and reliably detect relevant information is therefore desirable. Such a tool should support synonyms for terms, handle typos and misspellings due to human error, and suggest relevant mentions that are syntactically different from, but semantically similar to, what the clinician is trying to find.

Methods
In this paper, we present a search engine built on top of an Elasticsearch [1] instance, leveraging the UMLS synonym set [2]. A series of analyzers was built within the Elastic framework that allows the application to index a patient’s documents so as to enable both synonym-based matching and intelligent suggestions.
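To make the idea of a synonym set concrete, the following is a minimal sketch of grouping terms by concept identifier and writing each group as one comma-separated line, the equivalence form of the Solr synonym format [3]. It is an illustration under stated assumptions, not the authors' generation process: the function name `build_synonym_lines`, the input shape (pairs of concept identifier and term), and the sample identifiers are hypothetical, and terms containing a comma or `=>` are simply skipped rather than escaped.

```python
from collections import defaultdict


def build_synonym_lines(concept_terms):
    """Group terms by concept identifier; return one comma-separated line per group."""
    groups = defaultdict(list)
    for concept_id, term in concept_terms:
        # Lowercase and collapse repeated whitespace so duplicates can be detected.
        normalized = " ".join(term.lower().split())
        # Simplification (assumption): skip terms containing characters that
        # act as separators in the synonym file instead of escaping them.
        if not normalized or "," in normalized or "=>" in normalized:
            continue
        if normalized not in groups[concept_id]:
            groups[concept_id].append(normalized)
    # A concept with a single term has nothing to expand to, so it is dropped.
    return [", ".join(terms) for _, terms in sorted(groups.items()) if len(terms) > 1]


# Placeholder identifiers for illustration only; these are not real UMLS identifiers.
SAMPLE = [
    ("CONCEPT_A", "Myocardial Infarction"),
    ("CONCEPT_A", "heart attack"),
    ("CONCEPT_A", "Heart  Attack"),
    ("CONCEPT_B", "congestive heart failure"),
    ("CONCEPT_B", "CHF"),
    ("CONCEPT_C", "lingula"),
]

synonym_lines = build_synonym_lines(SAMPLE)
for line in synonym_lines:
    print(line)
```

Each printed line lists terms that an analyzer would treat as interchangeable; writing the lines to a text file would give a synonym file of the kind an analyzer can reference.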
UMLS Synonym Sets
The synonym sets are generated by reading directly from the UMLS ontology using the unique concept identifiers. The process is outlined in Figure 1. The end result is a file in Solr synonym format [3], which can be referenced in an Elastic analyzer. The next section describes how the analyzer is used.

Figure 1. Outline of the synonym set generation based on the UMLS ontology.

Synonym-based Elastic Analyzer
The synonym sets generated from UMLS are used during both indexing and querying. First, the text is tokenized and stop words are removed. Then, the tokens are expanded to word n-grams (shingles in Elastic terminology). Finally, these n-grams are expanded using the UMLS synonyms.

Intelligent Suggestions
Suggestions are a vital part of search and retrieval solutions. Suggesting query terms can greatly improve recall, particularly when the text in the search space is expressed in a way that is unfamiliar to the user. Our suggestion process combines the synonym indexing described above with phrase auto-completion based on the content of the current patient’s record.

The indexing process for suggestions is the same as for the synonym search, with two additions: an inverse synonym filter and an edge n-gram filter. The details of the suggestion index pipeline, along with an example, can be found in the Results section.

The suggestion index is only part of the solution. Querying must also support phrase completion derived from the patient record being searched. To achieve this, our solution has a pipeline tailored to expanding the query for suggestion matching.

Results
To implement the pipeline described above, a solution was created using an Elastic instance. This section describes the details of the analyzer pipelines that were built to achieve the synonym-based search and intelligent suggestions.
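The analyzer steps described in the Methods (tokenize, remove stop words, build shingles, expand synonyms) and the edge n-gram step of the suggestion index can be imitated in plain Python to show what ends up in the index. This is a sketch under stated assumptions, not the Elastic configuration used in the prototype: the stop-word list, the maximum shingle size, the two synonym groups, and all function names are hypothetical, and the inverse synonym filter is omitted because the text does not specify its behavior.

```python
import re

# Assumed stop-word list; the abstract does not say which list was used.
STOP_WORDS = {"a", "an", "and", "in", "of", "the"}

# Hypothetical synonym groups standing in for the UMLS-derived synonym file.
SYNONYM_GROUPS = [
    ["heart attack", "myocardial infarction"],
    ["chf", "congestive heart failure"],
]

# Map every term to the full group it belongs to.
SYNONYMS = {term: set(group) for group in SYNONYM_GROUPS for term in group}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def shingles(tokens, max_size=3):
    """Return all word n-grams of 1..max_size tokens."""
    return [
        " ".join(tokens[i:i + size])
        for size in range(1, max_size + 1)
        for i in range(len(tokens) - size + 1)
    ]


def analyze(text):
    """Tokenize, shingle, then add the synonyms of every shingle."""
    terms = set()
    for shingle in shingles(tokenize(text)):
        terms.add(shingle)
        terms |= SYNONYMS.get(shingle, set())
    return terms


def edge_ngrams(term, min_len=2):
    """Return every prefix of the term that has at least min_len characters."""
    return [term[:n] for n in range(min_len, len(term) + 1)]


def build_suggestion_index(text):
    """Map each prefix to the analyzed terms that start with it."""
    index = {}
    for term in analyze(text):
        for prefix in edge_ngrams(term):
            index.setdefault(prefix, set()).add(term)
    return index


document = "Heart attack, risk of chf."
doc_terms = analyze(document)
query_terms = analyze("myocardial infarction")
matches = doc_terms & query_terms  # terms shared by document and query

suggestion_index = build_suggestion_index(document)
completions = sorted(suggestion_index.get("myo", set()))
```

Because the same `analyze` function runs on both sides, a query for “myocardial infarction” shares terms with a document that only says “heart attack”, and the prefix “myo” completes to a phrase that never appears literally in the document. The sketch lets shingles bridge removed stop words, which may differ from how an Elastic analyzer handles the resulting position gaps.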
Figure 2 shows the detailed process of the synonym-based Elastic analyzer, including the order in which the analyzers are applied.

Figure 2. An example of the synonym-based analyzer running on both the document text and the query text. Both the document and the query are expanded to synonym-shingle sets.

Figure 3 demonstrates the suggestion index pipeline through an example.

Figure 3. Continuing example showing the creation of the suggestion index for the document “Heart attack, risk of chf.” The inverse synonym filter and edge n-gram filter together create a very large index, which has been abridged here for readability. In reality, each inverse synonym is expanded to prefixes; this diagram shows only the result for coronary and myocardial infarction.

Figure 4 shows partial query text matching for the example “lung nodules and prostate can.”

Figure 4. An example of the partial query text “lung nodules and prostate can” being run through the query suggestion process. The two vital steps are prefix mapping, which uses exact prefix matches to expand to a set of full phrases, and scoring/sorting, which ranks the set generated by prefix mapping according to how often the constituent terms appear for the patient being searched.

Discussion
The index and query pipelines described have been built and integrated into a research prototype. The proposed framework has been qualitatively tested using a sample set of radiology and pathology reports. A thorough clinical validation of the prototype is in progress.

The synonym search is currently tied strictly to UMLS preferred terms, which makes it conservative and limited to existing UMLS concepts. If the area of search is not well covered by UMLS, or if preferred terms and related concepts are not well organized in a given domain, an alternative or supplementary synonym set would need to be included.
The pipeline supports this, although alternative sets have so far seen limited exploration.

Another consideration is the conservative nature of true synonym sets. In clinical search applications, users are often looking for more than exact synonym matches. For example, a clinician searching for the term “lung nodule” would in many cases expect it to match “lingular nodule.” However, these kinds of adjacent or hierarchical relationships are not captured in the UMLS synonym set. To support this kind of matching, hypernyms would need to be introduced to cover the more complex similarity relationships. This is a topic worth exploring for this application.

Conclusion
This work aimed to build an intelligent patient medical record search engine for finding answers to clinical questions in narrative reports. We have demonstrated the feasibility of using an Elastic instance with UMLS-based analyzers to enable both synonym matches and intelligent suggestions.

References
1. Elasticsearch (2016, 09 01). Retrieved from https://www.elastic.co/
2. UMLS (2016, 09 01). Retrieved from https://www.nlm.nih.gov/research/umls/
3. Solr (2016, 09 01). Retrieved from http://lucene.apache.org/solr/

Keywords
synonyms, elastic search, semantic search, UMLS, search suggestion engine