Audio Transcript Health Care Data Analytics Machine Learning and Natural Language Processing Lecture b Health IT Workforce Curriculum Version 4.0/Spring 2016 This material (Comp 24 Unit 6) was developed by Oregon Health & Science University, funded by the Department of Health and Human Services, Office of the National Coordinator for Health Information Technology under Award Number 1U24OC000015. This material was updated in 2016 by [Institution Name] under Award Number 90WT00006. This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/ Slide 1 Welcome to Health Care Data Analytics: Machine Learning and Natural Language Processing. This is lecture b. The component, Health Care Data Analytics, covers the topic of health care data analytics, which applies the use of data, statistical and quantitative analysis, and explanatory and predictive models to drive decisions and actions in health care Slide 2 The learning objectives for this unit, Machine Learning and Natural Language Processing, are to: Describe the major tasks for which machine learning is used Compare and contrast the major approaches for machine learning Describe the major tasks for which natural language processing is used And discuss the major approaches and challenges for processing clinical narratives Slide 3 In this lecture, we begin our discussion of natural language processing, or NLP, of clinical text. First, we will look at basic definitions and approaches to NLP. This will be followed by challenges in processing the clinical narrative. In the next lecture, we will discuss various clinical NLP approaches and projects. And finally, we will describe alternatives and future directions. Slide 4 Let's begin with basic definitions and approaches. Successful NLP of the clinical narrative could help better enable the use of data in electronic health records, or EHRs. We know, for example, that current coded data, such as ICD-10, does not cover the complexity of what is described in the clinical narrative. We also know that a good deal of clinical information is “locked” in that text, meaning we cannot easily extract and process the information to use for various purposes. Slide 5 Some have noted that the term NLP could actually be better described as natural language understanding, because the goal of NLP is the understanding of natural language in computerized text. For those desiring more detail on the various approaches to NLP and its uses, the references given in the last few slides of this presentation can be consulted. Slide 6 What are some of the use cases for clinical NLP? The three major ones are listed on this slide. Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 2 The first use case is classification, where we are trying to classify what we find in the text into some sort of category. For example, we may want to classify a patient finding into a category such as when determining if they might be eligible for a clinical study. Probably the major use case for clinical NLP is extraction, where we want to extract information from a clinical narrative. For example, we might want to extract the findings that occur in a radiology report and even the measurements that are reported within that text. A third use case is summarization, where we may want to summarize or abstract the information that is in the narrative. We may do this for medical literature to summarize scientific information or the clinical narrative where were trying to summarize the major findings that the patient has. Slide 7 We can delve further into use cases for NLP by considering cancer care. This set of use cases comes from some promotional literature from a company that sells clinical NLP products, but actually gives a good set of use cases for which NLP might help us. For example, we might identify potential clinical trials matches, something akin to what we mentioned on the last slide. We might be able to do advanced information extraction from complex patient documents. We may be able to carry out more precise information retrieval for clinical case histories and outcome studies. We also may be able to streamline the process of entering patients into cancer registries. In addition, we may be able to use the data that we extract using NLP to apply predictive models and care coordination rules to clinical narratives in the patient record. Slide 8 We may also be able to perform semantic enrichment of patient documentation to improve the ability to search their notes. We can analyze patient narratives for insights into treatment outcomes and also to assess the effect of genetic aberrations on disease. Finally, we may be able to support tumor boards, where the care of patients who developed cancer is discussed by those providing care for them. Slide 9 Let's take a more detailed look at human language so we can understand the applications and limitations of NLP tools in clinical documents. Linguists talk about the levels of human language. We begin with phonology. The sound units that make up a language’s discrete sounds are called phonemes. The next level up is morphology, which is the analysis of parts of words, which are called morphemes. Sometimes a whole word is the morpheme, but other times they may be bound morphemes that are a part of the word. For example, many anatomic locations such as the appendix are bound to another word such as –itis, indicating inflammation. Thus, appendicitis indicates there is inflammation of the appendix. Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 3 There are other morphemes, such as pharyng- and –itis. There are also bound morphemes that indicate procedures such as appendectomy. Syntax refers to the rules that govern the construction of language, sometimes called the grammar. Slide 10 Semantics describes the meaning of the words, phrases, and th sentences that make up language. Linguists also talk about pragmatics, which is how the context of language affects its meaning. And then there is larger world knowledge that is not explicitly part of language but is the general knowledge that's necessary to understand it. Slide 11 The classic approach to NLP goes through three phases. The first phase is syntax, where we attempt to recognize the grammatical constituents of language: sentences, phrases within them, and down to nouns, verbs, adjectives etc. The next phase is semantics, where we attempt to recognize the meaning of those words, phrases, and sentences. Finally is the context in which the sentence occurs. Slide 12 Each of these levels is successively harder and requires more knowledge engineering, but would add more value if we could solve those problems. One of the ways we address the inability to completely perform classic NLP is through the use of rules and matching, where we don't aim for complete understanding of everything in the document but instead try to recognize the terms that occur and perhaps normalize them. This may allow us to understand what was said. Or, instead of using detailed grammar rules, we may use machine learning techniques where we learn the rules of parsing rather than developing human enumerations of all the possible grammar rules that might exist. Slide 13 Let's explore syntax and semantics in more detail. Processing of syntax is usually done via a technique called parsing. This requires a grammar, which is the rules that govern the syntax of language. The most common way that we express a grammar is as a set of rewrite rules, where a more complex grammatical construct is rewritten from constituent parts. Slide 14 Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 4 For example, a relatively simple grammatical rule is that a sentence consists of a noun phrase, a verb phrase, and a noun phrase. An example is, the patient has severe hypertension. The first noun phrase is the patient. The verb phrase is the verb has. The second noun phrase is severe hypertension. Of course, noun phrases themselves can be rewritten into more basic constituents. There are determiners, such as a grammatical article an example of which is the word the. There are also adjectives, such as severe. And a noun phrase can also just consist of a single noun. The symbols that cannot be further decomposed, such as an adjective and a noun, are called terminal symbols. Likewise, those that can be further decomposed, such as sentence and noun phrase, are called non-terminal symbols. Slide 15 As you can imagine, the grammar supporting the English language can get highly complex, with many, many rewrite rules. This is why the machine learning approach has superseded the approach of trying to enumerate every last grammar rule. In semantics, we aim to map the parts of speech, these nouns, adjectives, verbs, etc. into standardized terminology. For medicine, probably the most descriptive terminology is SNOMED CT. Slide 16 Processing language has been one of the most challenging computer tasks and is difficult not only in the clinical narrative but almost in all forms of natural language. Clinical narratives, such as progress notes and discharge summaries, can be even more difficult to process than other types of text for many reasons. One is that clinical narratives are written in a telegraphic, elliptical style. Oftentimes, the narratives are not complete sentences. We will see examples of that in a moment. Clinical text also may have spelling errors or grammatical errors. We also know that physicians and others may take license with language and oftentimes there may be important information that is buried within normal language that is implicit but not actually in the words and phrases. Slide 17 We will look at some of the challenges at the syntactic, semantic, and contextual levels. Slide 18 Here is a look at some of the syntactic challenges that were first enumerated by Sager in the 1980s; others have since validated these challenges. As mentioned in the previous slide, a great deal of clinical narrative text is syntactically incomplete, that is, at least according to Sager's analysis, half of all sentences in the clinical narrative were Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 5 found to be grammatically incomplete. If we think of the minimal English sentence as subject verb object we see different types of incomplete sentences. Slide 19 For example, the medical record may delete the verb and object. When the text says, “stiff neck and fever,” there has been a deletion of the verb and object from the sentence. In “Brain scan negative,” there is deletion of the verb “is.” For the statement, “Positive for heart disease,” there is deletion of the subject and verb, such as “The patient has.” And finally, “Was seen by local doctor” has deletion of the subject. As humans we can read these and still for the most part understand what is happening, but computer algorithms, especially those that are solely based on rules, have difficulty with these sorts of violations of rules of basic English grammar. Slide 20 There are also semantic challenges, which again as humans, especially those who have some clinical knowledge, we readily understand. But to a computer that is just functioning based on rules, there is a lot more difficulty. We know that words have different senses and meanings. For example, when we read in a medical chart murmur is appreciated, we know that likely there is a clinician who is listening, probably with a stethoscope, to the heart and there is a murmur. It's not so much that the murmur is appreciated in the sense of it being liked. By the same token, when we read about eye drops, we are thinking about drops of liquid containing medication put into the eye and not the eye physically dropping. Likewise, when we read mass at 3 o'clock, we know that were likely reading about something that is felt on the left-hand side of the abdomen and not that there's a religious service in the afternoon. Slide 21 Another semantic challenge is synonymy, where different words and phrases have the same meaning but they are expressed differently. For example, consider the phrase epigastric pain after eating versus another phrase postprandial stomach discomfort. These two phrases have no words in common but essentially mean the same thing. There is also polysemy, where the same words and phrases have different meetings. For example, someone might say, the PCP of the patient with PCP advised him to stop using PCP. PCP is an acronym that stands for several things, such as primary care physician, pneumocystis carinii pneumonia, or an abbreviated name for the drug phencyclidine. Slide 22 Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 6 There are a number of additional semantic challenges. One is negation. The clinical narrative is often full of negation. Clinicians may say the patient does not have this finding or that finding. Or that this disease is not present. Or saying we are choosing not to use this treatment and instead are using another one. Negation is common in medical text, for example, Patient does not have any chest pain. There is also uncertainty in natural language text. Clinicians may say things like, Patient treated for possible pneumonia. There is also temporality. Just because something is mentioned doesn't mean that it is present now. For example, Patient has history of pneumonia. Or there might be something that has been resolved, such as, Chest pain resolved after administration of nitroglycerin. Slide 23 There are also contextual challenges in the clinical narrative. The term that describes a broad category of these is coreference, which is the relation between linguistic expressions that refer to the same real-world entity. Consider the sentence, Chest x-ray shows nodule in left upper lobe, followed by another sentence, The tumor has increased in size to 2 cm. The phrase the tumor from the second sentence is actually referring to that same nodule from the first sentence. There is a particular type of coreference that can be challenging, which is anaphora, or the use of pronouns. Consider these two sentences: He complains of chest pain. It awakens him at night. “It” in the second sentence refers to “chest pain” in the first sentence. Slide 24 There is another type of contextual challenge where there is the deletion of subjects. This is quite common in clinical narratives, so we may see strings of sentences such as, Complains of chest pain. Increasing frequency. Worse in the morning. Again, as human readers we usually understand that quite easily, but when we have a natural language processing system, the computer may not make the connections across the sentences. Slide 25 Are there any silver linings that may enable us to have hope that we can carry out clinical NLP. It turns out that there are. First is the notion of subgrammars, after work by Sager in medicine and other disciplines. She determined that there were subgrammars that were grammars that were specific to disciplines, and that there was a subgrammar of clinical narratives that actually fairly regular and predictable. Another finding is that medical charts tend to have a predictable discourse, especially documents like the history and physical, where the document begins with the history of the patient, goes into the past medical history, and then into the physical exam. Physicians for the most part follow a well prescribed pathway through the exam. Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 7 More recently, another silver lining has been that we should abandon the notion of processing the entire clinical narrative and instead focus on specific elements that we can identify to indicate whether or not a specific disease or specific clinical finding is present. Thus, giving up on the approach of processing everything and instead focusing on specific elements present. Slide 26 Before we look at usage of clinical NLP and systems for it in the next lecture, let's talk briefly about how we evaluate how well NLP systems work. There are a variety of ways that systems can be measured but basically we want to determine how well they identify correct concepts and how well they don't identify incorrect concepts. The measures that we typically use are recall and precision. Recall is the proportion of correct concepts found. For example, if there are100 concepts that should be found by an NLP system and 75 actually are found, then the recall is 75%. Precision is the proportion of found concepts that are correct, so if we identify 150 concepts and 75 of them are correct, then our precision is 50%. Slide 27 Many evaluations of NLP are carried out in so-called challenge evaluations, where there is a common data set that different researchers use. These different research groups will compare the results on the same task. For the clinical NLP community, the largest and most participatory challenge evaluation has been the i2b2 NLP shared task. There has also been a systematic review of all studies through 2010 that was published and will be described more in the next lecture. Slide 28 This concludes lecture b of machine learning and natural language processing. In summarizing this lecture, we learned: The major use cases for NLP are classification, extraction, and summarization. The major phases of NLP are syntax, semantics, and context, each of which has challenges and is successively harder to do with computers. Slide 29 And, there may be some silver linings to help with NLP, such as sub grammars, predictable discourse, and focus on processing less than the entire meaning of everything in the document. Slide 30 No Audio. Slide 31 Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 8 No Audio. Slide 32 No Audio. Slide 33 No Audio. Slide 34 No Audio. Health IT Workforce Curriculum Version 4.0 Health Care Data Analytics Machine Learning and Natural Language Processing 9
© Copyright 2024 Paperzz