Lecture 5: Morphology
Kai-Wei Chang
CS @ University of Virginia
[email protected]
Course webpage: http://kwchang.net/teaching/NLP16
6501 Natural Language Processing

This lecture
- What is the structure of words?
- Can we build an analyzer to model the structure of words?
- Finite-state automata and regular expressions

Words
- Finite-state methods are particularly useful in dealing with a lexicon
  - Compact representations of words
- Agenda
  - Some facts about words
  - Computational methods

A Turkish word
- How about English?
(Example from Julia Hockenmaier, Intro to NLP)

Longest word in English
- Longest word in Shakespeare's works: Honorificabilitudinitatibus (27 letters)
- Longest non-technical word: Antidisestablishmentarianism (28 letters)
- Longest word in a major dictionary: Pneumonoultramicroscopicsilicovolcanoconiosis (45 letters)
- Longest word in literature: Lopadotemachoselachogaleokranioleipsano...pterygon (182 letters), an Ancient Greek transliteration
- Methionylthreonylthreonylglutaminylarginyl...isoleucine (189,819 letters), the chemical name of a protein

What is Morphology?
- The ways that words are built up from smaller meaningful units (morphemes)
- Two classes of morphemes
  - Stems: the core meaning-bearing units
  - Affixes: adhere to stems to change their meanings and grammatical functions
  - e.g., dis-grace-ful-ly

Inflectional Morphology
Create different forms of the same word:
- Examples:
  - Verbs: walk, walked, walks
  - Nouns: book, books, book's
  - Personal pronouns: he, she, her, them, us
- Each form serves a grammatical/semantic purpose that differs from the original but is transparently related to it

Derivational Morphology
Create different words from the same lemma:
- Nominalization:
  - V + -ation: e.g., computerization
  - V + -er: killer
- Negation:
  - un-: undo, unseen, ...
  - mis-: mistake, misunderstand, ...
- Adjectivization:
  - V + -able: doable
  - N + -al: national

What else?
- Compounding combines words into a new word:
  - cream, ice cream, ice cream cone, ice cream cone bakery
- Word formation is productive
  - Google, Googler, to google, to misgoogle, to googlefy, googlification
  - Google Map, Google Book, ...

Morphological parsing and generation
- Morphological parsing: assign a structure to a given word
- Morphological generation
  - What words can be generated from grace? grace, graceful, gracefully, disgrace, ungrace, undisgraceful, undisgracefully

Finite State Automata
- FSAs and regular expressions have the same expressive power
[Figure: FSA for /baa+!/]
- The above FSA accepts the strings matched by the regular expression /baa+!/

Finite State Automata
- Terminology:
  - It has 5 states
  - Alphabet: {b, a, !}
  - Start state: q0
  - Accept state: q4
  - 5 transitions
- "Alphabet" just means a finite set of symbols in the input
- An FSA can have many accept states
- Are there other machines that correspond to the same language /baa+!/?
  - Yes

Formal definition
You can specify an FSA by enumerating the following things:
- The set of states: Q
- A finite alphabet: Σ
- A start state
- A set of accept/final states
- A transition function that maps Q × Σ to Q

Example -- dollars and cents
[Figure]

Yet another view: table representation
If you're in state 1 and you're looking at an a, go to state 2.

State   b   a     !   ε
0       1   -     -   -
1       -   2     -   -
2       -   2,3   -   -
3       -   -     4   -
4       -   -     -   -

Non-Deterministic FSA
- ε-transitions
- More than one possible next state
- Equivalent to a deterministic FSA

Regular expression
- Equivalent to FSA
- Matching strings with regular expressions (e.g., perl, python, grep) works by
  - translating the regular expression into a machine (a table) and
  - passing the table and the string to an interpreter

Model morphology with FSA
- Regular singular nouns are OK as is
- Regular plural nouns have an -s on the end
- Irregulars are OK as is

Now plug in the words
[Figure]

Derivational Rules
[Figure]

From recognition to parsing
- Now we can use these machines to recognize strings
- Can we use the machines to assign a structure to a string? (parsing)
- Example:
  - From "cats" to "cat +N +p"

Transitions
c:c  a:a  t:t  ε:+N  s:+p
- c:c reads a c and writes a c
- ε:+N reads nothing and writes +N

Challenge: Ambiguity
- books: book +N +p or book +V +z (3rd person)
- Non-deterministic FSA: allows multiple paths through a machine that lead to the same accept state
- Bias the search (or learn) so that only a few likely paths are explored

Challenge: Spelling rules
- The underlying morphemes (e.g., plural -s) can have different surface realizations (-s, -es)
  - cat + s = cats
  - fox + s = foxes
  - make + ing = making
- How can we model it?
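The transducer arcs described above (c:c, a:a, t:t, ε:+N, s:+p) and the two readings of "books" can be made concrete with a small non-deterministic transducer stored as a transition table. The following is a minimal Python sketch, not the lecture's implementation: the toy lexicon, the names `build_fst` and `parse`, and the choice to emit only the category tag for a bare stem are assumptions made for illustration. It deliberately contains no spelling rules, so it rejects "foxes", which is exactly the problem the spelling-rules slide raises.

```python
from collections import defaultdict

EPS = ""  # epsilon: an arc that consumes no input symbol


def build_fst(nouns, verbs):
    """Build a toy lexicon transducer (hypothetical helper).

    Returns (transitions, start_state, accept_states), where transitions
    maps a state to a list of (input_symbol, output_string, next_state).
    """
    trans = defaultdict(list)
    start = ("start",)
    accept = set()
    for cat, words, suffix_tag in (("+N", nouns, "+p"), ("+V", verbs, "+z")):
        for word in words:
            state = start
            # Letter arcs such as c:c, a:a, t:t copy input to output.
            for i, ch in enumerate(word):
                nxt = (cat, word, i + 1)
                trans[state].append((ch, ch, nxt))
                state = nxt
            # Epsilon arc such as ε:+N reads nothing and writes the tag.
            tagged = (cat, word, "tagged")
            trans[state].append((EPS, " " + cat, tagged))
            accept.add(tagged)
            # Suffix arc such as s:+p reads the -s and writes the feature.
            final = (cat, word, "suffixed")
            trans[tagged].append(("s", " " + suffix_tag, final))
            accept.add(final)
    return trans, start, accept


def parse(word, fst):
    """Return every analysis of `word`, following all paths (non-deterministic)."""
    trans, start, accept = fst
    results = []
    stack = [(start, 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in accept:
            results.append(out)
        for sym, output, nxt in trans.get(state, []):
            if sym == EPS:
                stack.append((nxt, pos, out + output))
            elif pos < len(word) and word[pos] == sym:
                stack.append((nxt, pos + 1, out + output))
    return sorted(results)


if __name__ == "__main__":
    fst = build_fst(nouns=["cat", "book", "fox"], verbs=["book", "walk"])
    print(parse("cats", fst))   # ['cat +N +p']
    print(parse("books", fst))  # ['book +N +p', 'book +V +z']
    print(parse("foxes", fst))  # [] because no spelling rule is modeled
```

Exploring every path with a stack keeps the sketch short; a practical system would, as the ambiguity slide suggests, bias the search so that only likely paths are explored.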
Intermediate representation
[Figure]

Overall Scheme
- One FST that has explicit information about the lexicon
  - Lexical level to intermediate forms
- Large set of machines that capture spelling rules
  - Intermediate forms to surface

Lexical to intermediate level
[Figure]

Intermediate level to surface
- The "add an e" rule for -s
- Example: fox^s# ↔ foxes#

Other applications of FSTs
- ELIZA: https://en.wikipedia.org/wiki/ELIZA
- Implemented using pattern matching, which can be expressed as FSTs

ELIZA as an FST cascade
Human: You don't argue with me.
Computer: WHY DO YOU THINK I DON'T ARGUE WITH YOU
A simple rule:
1. Replace you with I and me with you: I don't argue with you.
2. Replace <...> with Why do you think <...>: Why do you think I don't argue with you.

What about compounds?
- Compounds have hierarchical structure:
  - (((ice cream) cone) bakery) not (ice ((cream cone) bakery))
  - ((computer science) (graduate student)) not (computer ((science graduate) student))
- We need context-free grammars to capture this underlying structure
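Both the "add an e" spelling rule (fox^s# ↔ foxes#) and the two-step ELIZA rule from the earlier slides are string rewrites, so they can be approximated with regular-expression substitution. The sketch below uses Python's `re.sub` as a stand-in for compiled transducers; it is not how the lecture or the original ELIZA implements them. The function names are hypothetical, and the set of triggering letters (x, s, z) is an assumption, since the slides give only the fox example.

```python
import re


def intermediate_to_surface(form):
    """Map an intermediate form such as 'fox^s#' to a surface form.

    '^' marks a morpheme boundary and '#' a word boundary, as on the slide.
    Assumption: e-insertion fires after x, s, or z before the suffix -s.
    """
    # Replace the boundary with 'e' only when followed by 's#'.
    form = re.sub(r"([xsz])\^(?=s#)", r"\1e", form)
    # Any remaining morpheme boundary is simply deleted.
    return form.replace("^", "")


def eliza_reply(utterance):
    """Apply the two-step cascade from the ELIZA slide."""
    text = utterance.strip().rstrip(".!?")
    swaps = {"you": "I", "me": "you"}
    # Step 1: swap pronouns in a single pass, so that the "you" produced
    # from "me" is not rewritten again.
    text = re.sub(
        r"\b(you|me)\b",
        lambda m: swaps[m.group(1).lower()],
        text,
        flags=re.IGNORECASE,
    )
    # Step 2: wrap the result in the "Why do you think <...>" template.
    return ("Why do you think " + text).upper()


if __name__ == "__main__":
    print(intermediate_to_surface("fox^s#"))        # foxes#
    print(intermediate_to_surface("cat^s#"))        # cats#
    print(eliza_reply("You don't argue with me."))  # WHY DO YOU THINK I DON'T ARGUE WITH YOU
```

Doing the pronoun swap in one substitution pass, rather than two sequential replacements, avoids turning "me" into "you" and then into "I".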