Morphology in MT
April 26th, 2016

Road Map
• Definition: What is morphology?
• Problems: How does morphology affect MT?
• Solutions: How can we adapt MT systems to work with morphologically rich languages?

What is morphology?
• Word formation from smaller parts
• Inflectional: eat (V) + -s → eats (V)
• Derivational: happy (A) + -ness → happiness (N)
• Compounding: dish (N) + washer (N) → dishwasher (N)

What is morphology?
• establish (V)
• disestablish (V)
• disestablishment (N)
• antidisestablishment (N)
• antidisestablishmentary (A)
• antidisestablishmentarian (N)
• antidisestablishmentarianism (N)

What is morphology?
• Complex word forms across languages: Unabhängigkeitserklärung (German), 我们 (Chinese), reunification (English), библиотеку (Russian), 입혔습니까 (Korean), शब्दावली (Hindi), kitapçığa (Turkish), 聞かせられたら (Japanese), להבדיל (Hebrew), étudiiez (French), inquiriendo (Spanish), tusaatsiarunnanngittualuujunga (Inuktitut)

Problems
• Alignment
• Phrase scoring
• Input OOVs
• Novel form generation
• Language modeling
• Evaluation

Alignment
  green colorless ideas sleep    bezbarvé zelené myšlenky spí
  I like green pears             mám rád zelené hrušky
  I sat under a green tree       seděl jsem pod zeleným stromem
• A single English form, "green", aligns to several inflected Czech forms (zelené, zeleným), so the statistics for what is really one word get split during alignment.

Phrase Scoring

  en    cs        p(cs|en)      en    cs          p(cs|en)
  cat   kočka     0.5629        cat   kocour      0.112
  cat   kočku     0.1769        cat   kocoura     0.074
  cat   kočce     0.0002        cat   kocourovi   0.017
  cat   kočky     0.00004       cat   kocoura     0.0051

Phrase Scoring
• Scoring lemmas and cases separately concentrates the probability mass:

  en    cs (lemma)   p(cs|en)      en    case   p(case|en)
  cat   kočka        0.7805        cat   NOM    0.7117
  cat   kocour       0.2194        cat   ACC    0.2646
                                   cat   DAT    0.018
                                   cat   GEN    0.0054

Input OOVs
• La mejor aplicación sería la que erradicase el hambre del mundo.

  f             e             p(e|f)
  la            the           0.9173
  mejor         best          0.6330
  aplicación    application   0.8211
  ser           to be         0.1182
  sería         would be      0.3442
  la que        to            0.0596
  erradica      eradicates    0.9754
  erradicó      eradicated    0.9303
  erradican     eradicate     0.9481
  erradico      erradicate    0.8731
  erradicando   eradicating   0.9713
  el hambre     hunger        0.5385
  del mundo     world         0.2006

• "erradicase" is an OOV: this subjunctive form never occurred in the training data, so it passes through untranslated:
  The best application would be to erradicase world hunger. 😒
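The fragmentation in the phrase-scoring tables above is easy to reproduce. Below is a minimal sketch, with hypothetical counts and a hand-written lemma map (neither taken from the slides), of how relative-frequency estimation splits p(cs|en) across inflected variants and how pooling by lemma concentrates the mass:

```python
from collections import Counter

# Hypothetical aligned (en, cs) pairs, as if extracted from word alignments.
pairs = (
    [("cat", "kočka")] * 563 + [("cat", "kočku")] * 177 +
    [("cat", "kocour")] * 112 + [("cat", "kocoura")] * 74 +
    [("cat", "kocourovi")] * 17 + [("cat", "kočce")] * 2
)

# Hand-written lemma map standing in for a real Czech lemmatizer.
LEMMA = {"kočka": "kočka", "kočku": "kočka", "kočky": "kočka",
         "kočce": "kočka", "kocour": "kocour", "kocoura": "kocour",
         "kocourovi": "kocour"}

def relative_freq(pairs):
    """p(cs | en) by relative frequency, as in standard phrase scoring."""
    joint = Counter(pairs)
    marginal = Counter(en for en, _ in pairs)
    return {(en, cs): n / marginal[en] for (en, cs), n in joint.items()}

surface = relative_freq(pairs)                                 # mass split over case forms
lemma = relative_freq([(en, LEMMA[cs]) for en, cs in pairs])   # mass pooled per lemma

for table in (surface, lemma):
    for (en, cs), p in sorted(table.items(), key=lambda kv: -kv[1]):
        print(f"{en}\t{cs}\t{p:.4f}")
    print()
```

The same counts produce a flat, fragmented distribution over surface forms and a sharp one over lemmas, which is exactly the contrast between the two Phrase Scoring tables above.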
Novel Form Generation
  She had attempted to cross the road on her bike.
  Она пыталась пересечь пути на её велосипед.
• The Russian output picks a wrong noun for "road" and the wrong case for "bike": the system must generate correctly inflected target forms, including forms it has never seen.

Language Modeling
• Je porte un parapluie dans mon sac . ("I carry an umbrella in my bag.")
• À Seattle , on doit porter un parapluie tous les jours . ("In Seattle, you have to carry an umbrella every day.")
• The same verb surfaces as "porte" and "porter": a surface-form language model treats every inflection as a distinct word and splits its statistics.

Evaluation

  Sentence                                                                  BLEU+1   METEOR
  Input:          The earnings on its 10 - year bonds are 28.45 % .
  Reference:      Výnos na jejích 10 - letých dluhopisech je na 28,45 % .   100.00   100.00
  System 1:       Příjmy na své desetileté dluhopisy jsou 28,45 % .          22.61    18.6
  System 2:       Příjmy na jeho 10 - letých poutech jsou 28,45 % .          32.04    26.0
  Another human:  Zisk z jejích 10 - letých dluhopisů je 28,45 % .           32.04    36.7

• System 2 mistranslates "bonds" as "poutech" (shackles) yet outscores System 1 on BLEU, and a second, correct human translation scores no better than System 2 on BLEU because its inflected forms differ from the reference. METEOR, which backs off to stems and synonyms, ranks the human translation highest.
Banerjee and Lavie (2008), Denkowski and Lavie (2014)
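The evaluation problem above comes down to exact n-gram matching. Here is a minimal sketch of a smoothed sentence-level BLEU in the spirit of BLEU+1 (add-one smoothing on the higher-order n-gram precisions); the slide's exact numbers depend on implementation details this sketch does not reproduce. The correct inflection "dluhopisů" earns no n-gram credit against the reference's "dluhopisech" because the surface strings differ:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hyp, ref, max_n=4):
    """BLEU+1-style sentence BLEU: add-one smoothing on the n-gram
    precisions for n > 1, times the usual brevity penalty."""
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = Counter(ngrams(hyp, n)), Counter(ngrams(ref, n))
        match = sum(min(c, r[g]) for g, c in h.items())   # clipped n-gram matches
        total = max(len(hyp) - n + 1, 1)
        smooth = 0 if n == 1 else 1
        if match + smooth == 0:
            return 0.0  # no unigram overlap at all
        log_prec += math.log((match + smooth) / (total + smooth)) / max_n
    bp = min(1.0, math.exp(1 - len(ref) / len(hyp)))      # brevity penalty
    return bp * math.exp(log_prec)

ref = "Výnos na jejích 10 - letých dluhopisech je na 28,45 % .".split()
human = "Zisk z jejích 10 - letých dluhopisů je 28,45 % .".split()
print(round(100 * sentence_bleu(human, ref), 2))  # low score despite being correct
```

Matching at the lemma level, or using a metric like METEOR that backs off to stems and synonyms, avoids punishing legitimate inflectional variation in this way.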
Overview
• Morphology on the source side
  • Stemming
  • Lattices
• Morphology on the target side
  • Source enrichment
  • Factored models
  • Synthetic phrases
  • Other formalisms?
• Morphology in Neural MT

Stemming
• As shown above, the unsegmented input fails because "erradicase" is OOV.
• Stemmed input: La mejor aplic ser la que erradic el hambr del mundo.

  f           e             p(e|f)
  la          the           0.9173
  mejor       best          0.6330
  aplic       application   0.8211
  ser         to be         0.0807
  ser         would be      0.0338
  la que      to            0.0596
  erradic     eradicates    0.0633
  erradic     eradicated    0.2173
  erradic     eradicate     0.3880
  erradic     eradicating   0.1503
  el hambr    hunger        0.5385
  del mundo   world         0.2006

• No longer OOV, but the tense and mood carried by the suffixes are gone:
  The best application to be to eradicate world hunger. 😐

Stemming
• Segmenting instead of discarding suffixes: La mejor aplic ación ser ía la que erradic ase el hambr e del mundo.
• The segmented forms (erradic a, erradic ó, …) keep their original scores (0.9754, 0.9303, …), and the bare stem and suffixes get entries of their own:

  f         e            p(e|f)
  erradic   erradicate   0.2571
  erradic   eradicating  0.1253
  ase       have been    0.1334

• Closer, but the pieces still combine badly:
  The best application would be to have been eradicating world hunger. 😐

Input Lattices
[Lattice figure: each word position offers both the surface form (sería, aplicación, erradicase, hambre) and its segmentation (ser + ía, aplic + ación, erradic + ase, hambr + e); the decoder picks whichever path it can translate best.]
• The best application would be to eradicate world hunger. 😀
Dyer et al. (2008)

Input Lattices
[Lattice figure: "a competition-induced price fall"; the same idea handles compound splitting, offering both a compound and its split parts.]
Dyer et al. (2008)

Take Aways
• If you have source-side morphology:
  • Stem, lowercase, and compound-split your data when doing alignment
  • Extract phrases normally
  • Use input lattices during tuning and decoding

Target Side Morphology
  I want to eat a sandwich .
  Chci jíst sendvič .

Source Enrichment
  I want-1s eat-inf sandwich-acc .
  Chci jíst sendvič .
Avramidis and Koehn (2008)

Factored Models
• Represent each word as a bundle of factors:

  Surface   Lemma    POS   Morph.
  neue      neu      JJ    +pl +fem
  häuser    häus     NN    +pl
  werden    werden   VB    +3 +pl +pres
  gebaut    bauen    VBN   +past +part

• Translate the lemma:

  DE     EN         p(EN|DE)
  häus   house      0.76
  häus   home       0.15
  häus   building   0.06
  häus   shell      0.02

• Translate the POS and morphology:

  DE              EN              p(EN|DE)
  VB+3p+pl+pres   VB+3p+pl+pres   0.81
  VB+3p+pl+pres   VB+3p+sg+pres   0.10
  VB+3p+pl+pres   PRN+3p+pl       0.04
  VB+3p+pl+pres   NN+pl           0.03

• Generate the target surface forms from the translated factors:

  Surface   Lemma   POS   Morph.
  new       new     JJ
  houses    house   NN    +pl
  are       be      VB    +3 +pl +pres
  built     build   VBN   +past +part

Koehn and Hoang (2007)
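As a concrete picture of the three mappings, here is a minimal sketch with toy tables and illustrative probabilities; a real factored decoder searches over factor combinations jointly instead of taking the argmax at each step:

```python
# Toy factored translation of German "häuser" (lemma häus, NN, +pl).
# Tables loosely follow the slides; all numbers are illustrative.

# Step T1: translate the lemma.
lemma_table = {"häus": {"house": 0.76, "home": 0.15, "building": 0.06, "shell": 0.02}}

# Step T2: translate the POS+morphology bundle (hypothetical entries).
morph_table = {"NN+pl": {"NN+pl": 0.90, "NN+sg": 0.10}}

# Step G: generate a surface form from translated lemma + morphology.
generation_table = {("house", "NN+pl"): "houses", ("house", "NN+sg"): "house"}

def translate_factored(lemma, morph):
    """Greedy version of the T1 -> T2 -> G pipeline."""
    best_lemma = max(lemma_table[lemma], key=lemma_table[lemma].get)
    best_morph = max(morph_table[morph], key=morph_table[morph].get)
    return generation_table[(best_lemma, best_morph)]

print(translate_factored("häus", "NN+pl"))  # -> houses
```

Because generation works from (lemma, morphology) pairs, the model can produce surface forms never seen in the parallel data, which is both the "can generate novel forms" advantage and the "generated independently" risk noted in the pros and cons that follow.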
Factored Models

  Pros                        Cons
  Much more human-like        Huge search space
  Can generate novel forms    Word forms generated independently
  Factored language models    Changes the whole MT pipeline

Koehn and Hoang (2007)

POS Language Models
• Convert the corpus to POS tags instead of surface forms
• Build large (7- or 8-gram) n-gram models

  The president announced his new plan yesterday .
  The council approved the sanctions on Iran .
  Polls in the UK show the LDP up 2 % over last year .

  DT NN VBD PRP$ JJ NN ADV PUNC
  DT NN VBD DT NN IN NNP PUNC
  NNS IN DT NNP VB DT NNP NUM PUNC IN JJ NN PUNC

Brown Cluster LMs
• Automatically induce word classes
• Can capture more nuances of the language than POS tags
• Example: the same sentences with every word replaced by its Brown cluster bit string:

  10010 110110010 0100111110010 10011111 1010011100 110111111010 01011011111 000000
  10010 1101101011110 0100111100111 10010 110100111111 001110110 111100011 000000
  110001110 0011100 10010 11111101011 01000001111 10010 1111110100 010110000 111101011 1111101110 00111011111110 10111010 1111100101 000000
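A Brown-cluster LM needs only a word-to-cluster map plus ordinary n-gram counting over the cluster IDs. A minimal sketch, assuming a "paths" file in the bitstring<TAB>word<TAB>count format that common Brown clustering tools emit (the file name and corpus variable below are placeholders):

```python
from collections import Counter

def load_clusters(path):
    """Load a Brown clustering 'paths' file: bitstring<TAB>word<TAB>count
    per line."""
    clusters = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            bits, word, _count = line.rstrip("\n").split("\t")
            clusters[word] = bits
    return clusters

def cluster_ngram_counts(sentences, clusters, order=7):
    """Map each word to its cluster ID and count n-grams up to `order`;
    the counts would then feed a standard n-gram LM toolkit."""
    counts = Counter()
    for sent in sentences:
        ids = [clusters.get(w, "UNK") for w in sent.split()]
        for n in range(1, order + 1):
            for i in range(len(ids) - n + 1):
                counts[tuple(ids[i:i + n])] += 1
    return counts

# Hypothetical usage, matching the take-aways (600 classes, order 7):
# clusters = load_clusters("paths")
# counts = cluster_ngram_counts(corpus_sentences, clusters, order=7)
```

Because there are only hundreds of clusters rather than hundreds of thousands of word forms, much higher n-gram orders stay tractable, which is what makes the order-7 models above feasible.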
Synthetic Phrases
• Dynamically add phrases to the translation table
• Can condition on the source sentence, the phrase table, and more!
• Originally used to insert determiners in RU→EN translation
Tsvetkov et al. (2013)

Synthetic Phrases
[Figure: choosing the inflection of a target lemma (σ: пытаться_V → μ: mis2sfm2e, i.e. "пыталась") from rich source-side context for "she had attempted to cross the road on her bike": the source words, their Brown clusters (C50 C473 C28 …), POS tags (PRP VBD VBN TO VB …), and dependency labels (nsubj, aux, root, xcomp, …).]
Chahuneau et al. (2013)

Synthetic Phrases
• Generate compound words in the target language
• Character-level MT system

  EN           DE                    Score
  tomato       t o m a t e           -2.58
  processing   v e r a r b e i t     -0.75
  processing   b e h a n d l u n g   -2.74
  processing   v e r e d e l u n g   -4.94

  EN      DE     Score
  <suf>   n      -3.71
  <suf>   s      -2.53
  <end>   ung    -5.73
  <end>   ende   -9.86

[Lattice figure: composing "tomato processing" piece by piece, tomato → tomate, <suf> → n, processing → verarbeit, <end> → ung, yields the German compound "tomatenverarbeitung".]
Matthews et al. (2015)

More Ideas
• Use information to synthetically add/modify feature values
• Add synthetic phrases for discourse-level information
• Add synthetic grammar rules in addition to phrase pairs
• Many, many more!

Take Aways
• Use Brown cluster LMs (c=600, o=7), whether you have target morphology or not!
• Synthetic phrases can solve a wide range of target-side generation problems
• Check out morphogen

Morphology in Neural MT
• Morpheme-level models
• Character-level models
• Hybrid models

Standard Attentional Models
[Figure: the input words x1 … x4 are encoded into an input sentence matrix; at each step the decoder's output state yields an attention vector over the input positions, the attention-weighted columns form a context vector, and state plus context produce the next output word y1.]
• But why do we have to just use independent word vectors?

Morpheme-Level Models
• Typically we just look up a word vector for each word from a big table
• But there's no reason we can't do something smarter
• çin'in tutumu belli değil (Turkish: "China's attitude is not clear")
• We don't want the word vectors of related forms to all be independent:
  çin çin'i çin'e çin'in çin'deki çin'de çin'den
• Analyzed input: çin+SG+GEN tutum+SG+NOM belli değil+3+SG+PRES
• On the output side, the decoder emits a lemma and its morpheme tags (gel +PAST +3SG), which are composed into the surface form "geldi"
• Combine vectors for morphemes to get word vectors
• Can be used on the input or output sides
Matthews et al. (2016)
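A minimal sketch of the composition step: build word vectors by summing randomly initialized morpheme vectors (additive composition and the dimensionality here are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 64

# One vector per morpheme (stem or tag), not per surface form.
morpheme_vecs = {m: rng.normal(size=DIM)
                 for m in ["çin", "tutum", "değil",
                           "+SG", "+GEN", "+NOM", "+3", "+PRES"]}

def word_vector(analysis):
    """Compose a word vector from a morphological analysis by summing
    morpheme vectors (an RNN over the morphemes is another option)."""
    return sum(morpheme_vecs[m] for m in analysis)

# çin'in (çin+SG+GEN) and a nominative form now share the stem vector,
# so related inflections are no longer independent parameters.
v_gen = word_vector(["çin", "+SG", "+GEN"])
v_nom = word_vector(["çin", "+SG", "+NOM"])
cos = v_gen @ v_nom / (np.linalg.norm(v_gen) * np.linalg.norm(v_nom))
print(round(float(cos), 3))  # well above 0: the shared morphemes dominate
```

In a trained system the morpheme vectors are learned end-to-end, so the similarity between related forms reflects the data rather than the random initialization used in this sketch.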
Morpheme-Level Models
• But what do we do if we don't have a morphological analyzer?

Character-Level Models
• ç i n ' i n
• We can use the same trick to make character-level models
• Pros: can elegantly handle passthroughs, OOVs, and morphology
• Cons: harder/slower to train
Ling et al. (2015)

Hybrid Models
• Combine word-, morpheme-, and character-level models: çin'in = çin+SG+GEN = ç i n ' i n
• Trains quickly because of the word- and morpheme-level models
• General because of the character-level model
Matthews et al. (2016)

Take Aways
• Word vectors don't have to be naïve table lookups
• You can (and should!) leverage any language-specific tools you have access to
• Character-level models add generality
• Especially useful in combination with higher-level models to make learning tractable
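To close, a sketch of the character-level idea: run a small recurrent network over a word's characters and use the final state as its vector, so every spelling, including unseen inflections and passthroughs, gets an embedding. The plain RNN and random parameters below are simplifications for illustration; Ling et al. use trained bidirectional LSTMs:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 32
CHARS = {c: i for i, c in enumerate("abcçdefgğhıijklmnoöprsştuüvyz'")}

# Randomly initialized parameters; in a real system these are trained
# end-to-end together with the translation model.
E = rng.normal(scale=0.1, size=(len(CHARS), DIM))   # character embeddings
W = rng.normal(scale=0.1, size=(DIM, DIM))          # recurrent weights
U = rng.normal(scale=0.1, size=(DIM, DIM))          # input weights

def char_word_vector(word):
    """Run a simple RNN over the characters of `word` and return the
    final hidden state as the word's vector."""
    h = np.zeros(DIM)
    for c in word:
        h = np.tanh(W @ h + U @ E[CHARS[c]])
    return h

# Every inflected form of "çin" now gets a vector, including forms
# never seen in the training data.
for form in ["çin", "çin'in", "çin'den"]:
    print(form, char_word_vector(form)[:3])
```

This is the generality the take-aways point to: the character model guarantees coverage, while word- and morpheme-level representations keep learning fast where richer resources exist.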