Two-Level Morphology

Stefan Langer, CIS, Universität München
Summer semester 2016
Version 1.0


Overview

- Inflectional morphology: approaches to handling it in language processing
- Two-level morphology
  ◦ Lexicon and morphosyntax
  ◦ Two-level rules


Morphological approaches: example

bakeries -> bakery

Full-form lexicon (1 component)
    bakery.[bakery]
    bakeries.[bakery]

Grammar + stem lexicon with variants (2 components)
    bakery.[Stem1,bakery]
    bakerie.[Stem2,bakery]
    Word -> Stem2 + s

Grammar + stem lexicon + phonological rules
    bakery.[Stem]
    Word -> Stem + s
    y becomes ie before 's(PLU)'


Approaches: consequences

hyperbakeries -> hyperbakery

Full-form approach: does not work

Stem-variant lexicon + rules
    hyper.[Prefix,hyper]
    bakery.[Stem1,bakery]
    bakerie.[Stem2,bakery]
    Word -> Prefix + Stem2 + 's'

Two-level morphology
    bakery.[Stem]
    hyper.[Prefix]
    Word -> Prefix + Stem + 's'
    y becomes ie before 's'


Motivation I

Example: Finnish, number of word forms. The situation is similar in Turkish, Hungarian and other languages.

lapsi|N
    [lasten,lapseni,lapsemmekin,lapsellasi,lapsestasi,lapsenkin,lapsista,lastani,
    lapsetkaan,lapsellamme,lapsiinne,lapsiltani,lapsenaan,lapsistamme,lapsellemme,
    lapsianikaan,lapsiakin,lastanne,lapsillesi,lapsillahan,lapsinaan,lapsennekin,
    lapsillenikin,lapsella,lapselle,lastenne,lapsetko,lapseenkin,lapsillehan,
    lapsillenne,lapsillaan,lastesi,lapsistaan,lapsineen,lapsenne,lapsilla,lapselta,
    lapsille,lapsellensa,lapsellekaan,lapsihan,lapsiani,lapsilleen,lapsilta,lapsen,
    lastaan,lapsenakaan,lapsillakaan,lapset,lapsellani,lastakin,lapsiltaan,
    lapsestani,lapsien,lapsillakin,lapsiini,lapsethan,lapsillekaan,lapsiamme,
    lapsineni,lapsi,lapsillekin,lapsellanikin,lapsensakin,lapsiemme,lapsissaan,
    lapsilleni,lapsestamme,lapsiaankin,lapsiakaan,lapsiesi,lapsikin,lapsiltakaan,
    lapsina,lapsillesikin,lapsiltakin,lapsiimme,lapsellesi,lapsellanne,lapsissakin,
    lapseensa,lastakaan,lasteni,lapsiansa,lapsilleenkin,lapsiaan,lastensakin,
    lapsessani,lapsistahan,lapsillasi,lapsistasi,lapsetkin,lapsistanne,lapsellenne,
    lastamme,lapsellaan,lapsiensa,lastenhan,lapsestaan,lapsillamme,lastenkaan,
    lapsesi,lapsenakin,lastemme,lastasi,lasta,lapsiinsa,lapsillemme,lapselleen,
    lapsemme,lapsiltasi,lapsillenikaan,lapseenkaan,lapseltaan,lapseksi,lapsellakin,
    lapsiaankaan,lapseeni,lapsinensa,lastansa,lapsia,lapsekseen,lapsienkin,
    lapsiltamme,lapsenikin,lapsessa,lapseen,lapsissamme,lapsistakin,lapsiksi,
    lapsellekin,lapsieni,lapsistaankin,lastenko,lastensa,lapsikaan,lastenkin,
    lapsillensa,lapsessaan,lapsiinkin,lapsensa,lapselleni,lapsissa,lapsiin,
    lapsiahan,lapsianne,lapsillani,lapsistani,lapsesta,lapsiasi,lapsellako,lapsena,
    lapsestahan,lapsienne]


Motivation II

Finnish: proper names. Proper names are also inflected in many other languages (Polish, Russian ...).

porsche|N
    [porschelta,porschella,porschessa,porschen,porscheksi,porschelle,porscheen,
    porschesta,porschea,porsche]


Motivation III

Arabic, Hebrew

- Arabic and Hebrew attach conjunctions, pronouns and articles to words. This leads to a very high number of different tokens that contain the same content word with any combination of affixes.

    والقمر
    wa-al-qamar
    and-the-moon

- Such tokens cannot be caught by simple dictionary lookup (dictionaries only have limited coverage).
- Verbal clitics in Romance languages (e.g. Spanish) are a similar phenomenon.


Two-level morphology: history

- Two-level morphology is a model from the 1980s
  ◦ First presented by Kimmo Koskenniemi in 1983
- Freely available implementation from the early 1990s (PC-KIMMO)
- Used in some commercial systems (Lingsoft)


Two-level morphology: components

1. Dictionary with affixes and stems
2. Regular grammar integrated into the lexicon
3. Two-level component

    hyperbakeries
          ↕
    hyper+`bakery+s


Morphotactics: classes (english.lex)

    ALTERNATION Particle     AUX AUX-V PP CJ PP-CJ DT PR DT-PR IJ
    ALTERNATION Prefix       PREFIX
    ALTERNATION Root         N AJ V AV N-V N-AJ AJ-V AJ-AV CD OD
    ALTERNATION Suffix       SUFFIX
    ALTERNATION Infl         INFL            ;inflection
    ALTERNATION PN_Suffix    PN_SUFF         ;proper nouns
    ALTERNATION Y_Suffix     Y_SUFF          ;-y suffix
    ALTERNATION IC_Suffix    IC_SUFF         ;-ic suffix
    ALTERNATION PT_Suffix    PT_SUFF         ;participles
    ALTERNATION Clitic       GEN CNTR End
    ALTERNATION Contraction  CNTR End        ;contractions
    ALTERNATION CD           CD OD ORDR      ;cardinals and ordinals
    ALTERNATION Compound     INITIAL         ;compounds
    ALTERNATION End          End


Extract from lexicon (affix.lex)

    ;LEXICON INITIAL
    \lf 0
    \lx INITIAL
    \alt Particle
    \gl1
    \gl2

    \lf 0
    \lx INITIAL
    \alt Prefix
    \gl1
    \gl2


Extract from lexicon (affix.lex)

    ;LEXICON PREFIX
    \lf 0
    \lx PREFIX
    \alt Root
    \gl1
    \gl2

    \lf hyper+
    \lx PREFIX
    \alt Prefix
    \gl1 DEG3+
    \gl2 DEG3+


Extract from lexicon (noun.lex)

    \lf `bakery
    \lx N
    \alt Suffix
    \fea deverb
    \gl1 `bake
    \gl2

    \lf `balcony
    \lx N
    \alt Suffix
    \gl1
    \gl2

Slide annotations on the fields:
    \lf   stem with stress
    \lx   stem category
    \alt  continuation
    \fea  additional morphological information


Extract from lexicon (affix.lex)

    \lf 0
    \lx SUFFIX
    \alt Infl
    \gl1
    \gl2

    ;noun plural
    \lf +s
    \lx INFL
    \alt Clitic
    \fea n/n pl reg
    \gl1 +PL
    \gl2 +PL


Extract from lexicon (affix.lex)

    ;LEXICON End
    ;to disable compound parsing, comment out the next entry
    \lf
    \lx End
    \alt Compound
    \fea compound
    \gl1
    \gl2

    \lf 0
    \lx End
    \alt #
    \gl1
    \gl2


Morphotactic rule component & lexicon

Summary:
- The dictionary contains affixes, stems and possibly some additional information (boundaries, stress).
- A simple regular grammar is integrated into the dictionary.
- It operates on sequences containing morphotactic information.
- It verifies that sequences such as the following are well formed: hyper+`bakery+s

Next step: how do we get from surface text to the sequences analysed/generated by the word grammar and the lexicon?

    hyperbakeries
          ↕ ?
    hyper+`bakery+s


Two-level rules: alphabet

Alphabet and character classes

    ALPHABET
    ;lexical (upper) and surface (lower) characters:
     b c d f g h j k l m n p q r s t v w x y z
     a e i o u
     ' - .
     sh ch                                        ;digraphs
     B C D F G H J K L M N P Q R S T V W X Y Z
     A E I O U
    ;lexical (upper) only characters:
     ` +
    NULL 0
    ANY @
    BOUNDARY #
    SUBSET CN     b c d f g h j k l m n p q r s t v w x y z sh ch
    SUBSET CNsib  s x z sh ch                     ;sibilant consonants
    SUBSET CNpal  c g                             ;palatal consonants
    SUBSET CNgem  b d f g l m n p r s t           ;geminated consonants
    SUBSET VO     a e i o u
    SUBSET VObk   a o u                           ;back vowels


Two-level rules: default mappings

    RULE "Defaults 1" 1 33
        b c d f g h j k l m n p q r s t v w x y z sh ch a e i o u ' - ` + @
        b c d f g h j k l m n p q r s t v w x y z sh ch a e i o u ' - 0 0 @
    1:  1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1  1  1 1 1 1 1 1 1 1 1 1


Two-level rules: types of rules

    =>    the correspondence only occurs in the environment
    <=    the correspondence always occurs in the environment
    <=>   the correspondence always and only occurs in the environment
    /<=   the correspondence never occurs in the environment


Two-level rules: examples

Two-level rule (always and only)

    ;==========
    ;Epenthesis
    ;==========
    ; LR: fox+s kiss+s church+s spy+s
    ; SR: foxes kisses churches spies
    RULE "+:e <=> [CNsib | y:i | o] ___ s [+:@ | #]" 7 8
         +   CNsib   +   s   #   y   o   @
         e   CNsib   @   s   #   i   o   @
    1:   0     2     1   2   1   2   7   1
    2:   3     2     5   2   1   2   7   1
    3.   0     0     0   4   0   0   0   0
    4.   0     0     1   0   1   0   0   0
    5:   0     1     1   6   1   1   1   1
    6:   0     1     0   1   0   1   1   1
    7:   3     2     1   2   1   2   7   1


Two-level rules: examples

Two-level rule (always)

    RULE "y:i <= @:CN ___ +:@ ~[i | ']" 4 7
         @   y   y   +   i   '   @
         CN  i   @   @   i   '   @
    1:   2   1   1   1   1   1   1
    2:   2   1   3   2   1   1   1
    3:   2   1   1   4   1   1   1
    4:   0   0   0   0   1   1   0


Two-level rules: examples

Two-level rule (only)

    RULE "y:i => @:CN ___ +:@ ~[i | ']" 4 6
         @   y   +   i   '   @
         CN  i   @   i   '   @
    1:   2   0   1   1   1   1
    2:   2   3   2   1   1   1
    3.   0   0   4   0   0   0
    4.   2   1   1   0   0   1


Summary: two-level rules

- Rules describe systematic relationships between surface forms and the forms generated by the lexicon and the grammar.
- Rules are compiled into transducers.
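
To make the last point more concrete, here is a minimal Python sketch of how a compiled rule table might be run as a finite-state machine. It steps through the Epenthesis table from the examples, one lexical:surface pair at a time. The names `accepts`, `column_for`, `COLUMNS` and `TABLE` are made up for this illustration and are not part of PC-KIMMO. The sketch also simplifies: it handles single characters only (the digraphs sh and ch are left out), it expects the two strings to be aligned already (with 0 for a null surface character), and it approximates "the most specific column wins" by a simple width score.

```python
# Sketch only: runs the slide's Epenthesis state table over aligned
# lexical:surface character pairs. Not PC-KIMMO's actual implementation.

CNSIB = set("sxz")   # SUBSET CNsib, digraphs sh/ch omitted (assumption)
ANY = "@"

# Column headers of the rule table as (lexical, surface) symbols.
COLUMNS = [("+", "e"), ("CNsib", "CNsib"), ("+", ANY), ("s", "s"),
           ("#", "#"), ("y", "i"), ("o", "o"), (ANY, ANY)]

# Transition rows copied from the table; 0 means the pair is rejected.
TABLE = {
    1: [0, 2, 1, 2, 1, 2, 7, 1],
    2: [3, 2, 5, 2, 1, 2, 7, 1],
    3: [0, 0, 0, 4, 0, 0, 0, 0],
    4: [0, 0, 1, 0, 1, 0, 0, 0],
    5: [0, 1, 1, 6, 1, 1, 1, 1],
    6: [0, 1, 0, 1, 0, 1, 1, 1],
    7: [3, 2, 1, 2, 1, 2, 7, 1],
}
FINAL = {1, 2, 5, 6, 7}   # rows written "n:"; rows written "n." are non-final


def _matches(symbol, char):
    """True if a column symbol (character, subset name or @) covers char."""
    if symbol == ANY:
        return True
    if symbol == "CNsib":
        return char in CNSIB
    return symbol == char


def _width(symbol):
    """Rough specificity score: smaller means more specific."""
    if symbol == ANY:
        return 100
    if symbol == "CNsib":
        return len(CNSIB)
    return 1


def column_for(lex, surf):
    """Pick the most specific column whose header matches the pair."""
    candidates = [i for i, (l, s) in enumerate(COLUMNS)
                  if _matches(l, lex) and _matches(s, surf)]
    return min(candidates,
               key=lambda i: _width(COLUMNS[i][0]) + _width(COLUMNS[i][1]))


def accepts(lexical, surface):
    """Check one aligned lexical/surface pairing against the rule table."""
    if len(lexical) != len(surface):
        raise ValueError("strings must be aligned; use 0 for a null character")
    state = 1
    for lex, surf in zip(lexical, surface):
        state = TABLE[state][column_for(lex, surf)]
        if state == 0:
            return False
    return state in FINAL
```

Under these assumptions the table licenses the inserted e exactly where the rule demands it:

```python
accepts("fox+s#", "foxes#")   # True:  +:e after a sibilant
accepts("fox+s#", "fox0s#")   # False: +:0 is not allowed here
accepts("cat+s#", "cat0s#")   # True:  no epenthesis after t
accepts("cat+s#", "cates#")   # False: +:e outside the environment
```

A full two-level system would run all rule tables in parallel over the same pair sequence and accept only if every table accepts.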