Language and Computers: Machine Translation
Based on Dickinson, Brew & Meurers (2013). Indiana University, Fall 2013.

Outline
- Introduction
- Examples for translations
- Background: dictionaries
- Linguistic knowledge based systems
  - Direct transfer systems
  - Interlingua-based systems
- Machine learning based systems
  - Alignment
  - Statistical modeling
  - Phrase-based translation
- What makes MT hard?
- Evaluating MT systems
- References

What is Machine Translation?
Translation is the process of:
- moving texts from one (human) language (the source language) to another (the target language),
- in a way that preserves meaning.
Machine translation (MT) automates (part of) this process:
- fully automatic translation
- computer-aided (human) translation

What is MT good for?
- When you need the gist of something and there are no human translators around:
  - translating e-mails and webpages
  - obtaining information from sources in multiple languages (e.g., search engines)
- When you have a limited vocabulary and a small range of sentence types:
  - translating weather reports
  - translating technical manuals
  - translating terms in scientific meetings
- When you want your human translators to focus on interesting/difficult sentences, while avoiding lookup of unknown words and translation of mundane sentences.

Is MT needed?
- Translation is of immediate importance for:
  - multilingual countries (Canada, India, Switzerland, ...),
  - international institutions (United Nations, International Monetary Fund, World Trade Organization, ...),
  - multinational or exporting companies.
- The European Union has 23 official languages. All federal laws and other documents have to be translated into all of them.

Example translations: the simple case
It will help to look at a few examples of real translation before talking about how a machine does it. Take the simple Spanish sentence and its English translation below:

(1) (Yo) hablo           español.
    I     speak(1st,sg)  Spanish
    'I speak Spanish.'

- Words in this example pretty much translate one-for-one.
- But we have to make sure hablo matches with Yo, i.e., that the subject agrees with the form of the verb.

Example translations: a slightly more complex case
The order and number of words can differ:
(2) a. ¿Tú hablas español?
       you speak(2nd,sg) Spanish
       'Do you speak Spanish?'
    b. ¿Hablas español?
       speak(2nd,sg) Spanish
       'Do you speak Spanish?'

What goes into a translation
Some things to note about these examples, and thus what we might need to know in order to translate:
- Words have to be translated → dictionaries
- Words are grouped into meaningful units → syntax
- Word order can differ from language to language
- The forms of words within a sentence are systematic, e.g., verbs have to be conjugated, etc.

Different approaches to MT
We'll look at some basic approaches to MT:
- Systems based on linguistic knowledge (rule-based MT, RBMT)
  - direct transfer systems
- Machine learning approaches, i.e., statistical machine translation (SMT)
  - SMT is the most popular form of MT right now
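The point about examples (1) and (2) can be made concrete with a naive word-for-word translator. This is only a sketch: the mini-dictionary and function name are invented for illustration, and the output shows exactly what dictionary lookup alone gets wrong.

```python
# Hypothetical toy dictionary for naive Spanish -> English lookup.
# All entries are invented for this sketch.
SPANISH_ENGLISH = {
    "yo": "I",
    "hablo": "speak",    # the 1st-person-singular marking is lost
    "hablas": "speak",   # ...as is the 2nd-person marking
    "español": "Spanish",
}

def word_for_word(sentence):
    """Translate by looking up each word independently."""
    return " ".join(SPANISH_ENGLISH.get(w, w) for w in sentence.lower().split())

print(word_for_word("Yo hablo español"))  # -> "I speak Spanish"
print(word_for_word("Hablas español"))    # -> "speak Spanish": the implied
                                          #    subject and the question are lost
```

The first sentence happens to come out right because the words translate one-for-one; the second shows that agreement and sentence structure need more than a dictionary.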
Dictionaries
An MT dictionary differs from a "paper" dictionary:
- It must be computer-usable (electronic form, indexed).
- It needs to be able to handle various word inflections.
- It can contain (syntactic and semantic) restrictions that a word places on other words:
  - e.g., subcategorization information: give needs a giver, a person given to, and an object that is given
  - e.g., selectional restrictions: if X eats, X must be animate
- It contains frequency information; for SMT, this may be the only piece of additional information.

Direct transfer systems
A direct transfer system consists of:
- a source language grammar,
- a target language grammar,
- rules relating source language underlying representations (URs) to target language URs.
The transfer component, which relates a source language representation to a target language representation, can also be called a comparative grammar. We'll walk through the following French-to-English example:

(3) Londres plaît        à  Sam.
    London  is.pleasing  to Sam
    'Sam likes London.'

Steps in a transfer system
1. The source language grammar analyzes the input and puts it into an underlying representation (UR):
   Londres plaît à Sam → Londres plaire Sam (source UR)
2. The transfer component relates this source language UR (French UR) to a target language UR (English UR):
     French UR       English UR
     X plaire Y  ↔  Eng(Y) like Eng(X)
   (where Eng(X) means the English translation of X)
   Londres plaire Sam (source UR) → Sam like London (target UR)
3. The target language grammar translates the target language UR into an actual target language sentence:
   Sam like London → Sam likes London

Notes on transfer systems
- The transfer mechanism is in theory reversible; e.g., the plaire rule works in both directions.
  - It is not clear that this is desirable: e.g., Dutch aanvangen should be translated into English as begin, but begin should be translated as beginnen.
- Because we have a separate target language grammar, we are able to ensure that the rules of English apply: like → likes.
- RBMT systems are still in use today, especially for more exotic language pairs.
Levels of abstraction
- There are different levels of abstraction at which transfer can take place.
- So far we have looked at URs that represent only word information.
- We can do a full syntactic analysis, which helps us to know how the words in a sentence relate.
- Or we can do only a partial syntactic analysis, such as representing the dependencies between words.

Direct transfer & syntactic similarity
This method works best for structurally similar languages. Compare the trees for an English sentence and its German translation (shown here in bracketed form):

  [S [NP John] [VP [V goes]  [PP [P on]  [NP [Det the] [N roller coaster]]]]]
  [S [NP John] [VP [V fährt] [PP [P auf] [NP [Det der] [N Achterbahn]]]]]

The structures match node for node, so transfer only has to replace the words.

Interlinguas
- Ideally, we could use an interlingua = a language-independent representation of meaning.
- Benefit: to add a new language to your MT system, you merely have to provide mapping rules between your language and the interlingua, and then you can translate into any other language in your system.
The translation triangle
[Figure: the translation triangle. The source language sits at the bottom left and the target language at the bottom right; abstraction increases toward the apex. From bottom to top, the transfer levels are: words and concepts equivalent, grammar and word order the same; similar grammar, concepts, and word order (word-level similarities); similar concepts and grammar; similar concepts, different grammar (meaning similarities); similar abstract meanings.]

Interlingual problems
- What exactly should be represented in the interlingua?
  - e.g., English corner = Spanish rincón ('inside corner') or esquina ('outside corner')
- A fine-grained interlingua can require extra (unnecessary) work:
  - e.g., Japanese distinguishes older brother from younger brother, so we have to disambiguate English brother to put it into the interlingua. Then, if we translate into French, we have to ignore the disambiguation and simply translate it as frère, which simply means 'brother'.

Machine learning
Instead of trying to tell the MT system how we're going to translate, we might try a machine learning approach:
- We can look at how often a source language word is translated as a target language word, i.e., the frequency of a given translation, and choose the most frequent translation.
- But how can we tell what a word is being translated as?
There are two different cases:
- We are told what each word is translated as: text alignment.
- We are not told what each word is translated as: use a bag of words.

Sentence alignment
- Sentence alignment = determining which source language sentences align with which target language ones (this is what we assume in the bag of words method below).
- Intuitively easy, but it can be difficult in practice, since different languages have different punctuation conventions.
- We can also attempt to learn alignments as a part of the process, as we will see.

Word alignment
- Word alignment = determining which source language words align with which target language ones.
- Much harder than sentence alignment to do automatically.
- But if it has already been done for us, it gives us good information about a word's translation equivalent.

Different word alignments
- One word can map to one word or to multiple words.
- Likewise, sometimes it is best for multiple words to align with multiple words.
- English-Russian examples:
  - one-to-one: khorosho = well
  - one-to-many: kniga = the book
  - many-to-one: to take a walk = gulyat'
  - many-to-many: at least = khotya by ('although if/would')

Calculating probabilities
- With word alignments, it is relatively easy to calculate probabilities.
- E.g., what is the probability that run translates as correr in Spanish?
  1. Count up how many times run appears in the English part of your bi-text: e.g., 500 times.
  2. Out of all those times, count up how many times it was translated as (i.e., aligns with) correr: e.g., 275 (out of 500) times.
  3. Divide to get a probability: 275/500 = 0.55, or 55%.
- Word alignment gives us some frequency numbers, which we can use to align new cases, using other information too (e.g., contextual information).

The bag of words method
- What if we're not given word alignments?
- How can we tell which English words are translated as which German words if we are only given an English text and a corresponding German text?
- We can treat each sentence as a bag of words = an unordered collection of words.
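The three-step calculation for the aligned case above can be sketched in Python. The list of aligned word pairs simply reproduces the fabricated run/correr counts; the second translation word is made up to fill out the 500 occurrences.

```python
# Estimating P(target | source) from a word-aligned bi-text.
def translation_prob(aligned_pairs, source, target):
    """P(target | source) = count(source aligned to target) / count(source)."""
    source_count = sum(1 for s, _ in aligned_pairs if s == source)
    pair_count = sum(1 for s, t in aligned_pairs if s == source and t == target)
    return pair_count / source_count if source_count else 0.0

# 275 of the 500 occurrences of "run" align with "correr"; the remaining
# 225 are lumped under a second (invented) translation for this sketch.
pairs = [("run", "correr")] * 275 + [("run", "funcionar")] * 225
print(translation_prob(pairs, "run", "correr"))  # 0.55
```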
- If word A appears in a sentence, then we record all of the words in the corresponding sentence in the other language as appearing with it.
- The idea is that, over thousands, or even millions, of sentences, He will tend to appear more often with On, speaks will appear with govorit, and so on.

(Several of the following slides are based on Philipp Koehn, Statistical Machine Translation, Lecture 3: Word Alignment and Phrase Models, School of Informatics, University of Edinburgh.)

Example for the bag of words method
- English: He speaks Russian well.
- Russian: On khorosho govorit po-russki.

  Eng     Rus
  He      On
  He      khorosho
  He      govorit
  He      po-russki
  speaks  On
  speaks  khorosho
  ...     ...
  well    po-russki

Calculating probabilities: sentence 1
So, for He in He speaks Russian well / On khorosho govorit po-russki, we do the following:
1. Count up the number of Russian words: 4.
2. Assign each word an equal probability of translation: 1/4 = 0.25, or 25%.
Calculating probabilities: sentence 2
If we also have He is nice. / On simpatich'nyi., then for He we do the following:
1. Count up the number of possible translation words: 4 from the first sentence, 2 from the second = 6 total.
2. Count up the number of times On is a possible translation: 2.
- Note that we are NOT counting the number of English words: we count the number of possible translations.
- On is a possible translation 2 times out of 6 = 1/3 = 0.33, or 33%. All other words have the probability 1/6 = 0.17, or 17%, so On is now the best translation for He.

Statistical modeling
- We want to learn P(f|e) from a parallel corpus.
- There is not sufficient data to estimate P(f|e) directly, so we break the process into smaller steps; the probabilities for the smaller steps can be learned.
- Probabilistic models are generally more sophisticated, treating the problem as the source language generating the target and taking into account probabilities such as:
  - n(#|word) = probability of the number of words in the target language that the source word generates
  - p-null = probability of a null word appearing
  - t(tword|sword) = probability of a target word, given the source word (i.e., what we've seen so far)
  - d(tposition|sposition) = probability of a target word appearing in position tposition, given the source position sposition
- But we need alignments to estimate these parameters.
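Returning to the bag of words method, the two-sentence He/On calculation above can be sketched as a short counting function:

```python
from collections import defaultdict

# Bag-of-words counting: every word in the paired target sentence is
# recorded as a possible translation of the given source word.
def cooccurrence_counts(bitext, word):
    counts = defaultdict(int)
    for src, tgt in bitext:
        if word in src.split():
            for t in tgt.split():
                counts[t] += 1
    return counts

bitext = [
    ("He speaks Russian well", "On khorosho govorit po-russki"),
    ("He is nice", "On simpatich'nyi"),
]
counts = cooccurrence_counts(bitext, "He")
total = sum(counts.values())        # 6 candidate translations in all
print(counts["On"] / total)         # 2/6 = 0.33...: On is the best candidate
print(counts["govorit"] / total)    # 1/6 = 0.16...: like every other word
```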
A generative story (IBM models)
The model tells a generative story of how an English string gets to be a foreign string, using the running example Mary did not slap the green witch → Maria no daba una bofetada a la bruja verde:

  Mary did not slap the green witch
    n(3|slap): slap generates three target words
  Mary not slap slap slap the green witch
    p-null: a NULL word is inserted
  Mary not slap slap slap NULL the green witch
    t(la|the): each word is translated
  Maria no daba una bofetada a la verde bruja
    d(4|4): words are moved into their target positions
  Maria no daba una bofetada a la bruja verde

- The choices in the story are decided by reference to parameters, e.g., p(bruja|witch).
- The formula for P(f|e) in terms of these parameters is usually long and hairy, but mechanical to extract from the story.
- Training must obtain parameter estimates from incomplete data: a chicken-and-egg problem.

The chicken-and-egg problem
Parallel corpora give us incomplete data, e.g.:

  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...

- We have English and foreign words, but no connections (word alignments) between them.
- If we had the word alignments, we could estimate the parameters of our generative story.
- If we had the parameters, we could estimate the alignments.
- The off-the-shelf solution: the Expectation Maximization (EM) algorithm.

(Source: Introduction to Statistical Machine Translation, Chris Callison-Burch and Philipp Koehn, http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/esslli-slides-day3.pdf)
Expectation Maximization
The Expectation Maximization (EM) algorithm fills in the gaps in incomplete data: if we had complete data, we could estimate the model; if we had the model, we could fill in the gaps in the data.

EM in a nutshell
1. Initialize the model parameters (e.g., uniformly).
2. (Re-)assign probabilities to the missing data.
3. (Re-)estimate the model parameters from the completed data (weighted counts).
4. Iterate, i.e., repeat steps 2 and 3, until you hit some stopping point.

Initial step
- All connections are equally likely.
- From here, the model learns that, e.g., la is often connected with the.
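The four steps above can be sketched as a toy word-alignment EM loop in the spirit of IBM Model 1 (no NULL word, fertility, or distortion). The corpus is the slides' la maison example; the converged numbers here are illustrative only and will not match the values quoted later, which come from a different setup.

```python
from collections import defaultdict

corpus = [("la maison".split(), "the house".split()),
          ("la maison bleu".split(), "the blue house".split()),
          ("la fleur".split(), "the flower".split())]

t = defaultdict(lambda: 1.0)   # step 1: uniform init, all connections equal

for _ in range(20):            # step 4: iterate steps 2 and 3
    count = defaultdict(float)
    total = defaultdict(float)
    for f_sent, e_sent in corpus:
        for f in f_sent:       # step 2: spread each foreign word's mass over
            norm = sum(t[(f, e)] for e in e_sent)  # its possible connections
            for e in e_sent:
                c = t[(f, e)] / norm
                count[(f, e)] += c
                total[e] += c
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]   # step 3: re-estimate t(f|e) from counts

print(round(t[("la", "the")], 2))       # high: la has latched onto the
print(round(t[("maison", "house")], 2)) # likewise maison and house
```

Even on three sentence pairs, the correct connections (la-the, maison-house, bleu-blue, fleur-flower) emerge without any alignment being given.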
After the 1st iteration
- Connections, e.g., between la and the have become more likely.

After another iteration
- It becomes apparent that connections, e.g., between fleur and flower are more likely (pigeonhole principle: fleur only ever occurs in sentences whose translation contains flower).

Convergence
After enough iterations, the parameter estimates settle down, e.g.:

  p(la|the) = 0.453
  p(le|the) = 0.334
  p(maison|house) = 0.876
  p(bleu|blue) = 0.563

- The parameters are estimated from the corpus; the inherent hidden structure (the word alignments) is revealed by EM.

IBM Model 1
For a foreign sentence f = f1 ... fl and an English sentence e = e1 ... em, IBM Model 1 defines

  p(e, a | f) = ε / (l + 1)^m · Π_{j=1}^{m} t(e_j | f_a(j))

- Each English word e_j is generated by a foreign word f_a(j), as defined by the alignment function a, with probability t.
- The normalization factor ε is required to turn the formula into a proper probability function.

Phrase-based translation
But this word-based translation doesn't account for many-to-many mappings between languages. In phrase-based translation:
- The foreign input is segmented into "phrases": any word sequences, not necessarily linguistically motivated.
- Each phrase is translated into English.
- The translated phrases are reordered.
- Current models allow for many-to-one mappings → we can use those to induce many-to-many mappings.
- See Koehn et al. (NAACL 2003) as an introduction.
One example
German input: Morgen fliege ich nach Kanada zur Konferenz
- Segment into phrases: [Morgen] [fliege] [ich] [nach Kanada] [zur Konferenz]
- Translate each phrase: [Tomorrow] [will fly] [I] [in Canada] [to the conference]
- Reorder: Tomorrow I will fly to the conference in Canada

How to learn the phrase translation table?
- Start from a word alignment and read off the phrase pairs that are consistent with it.
- To get one good alignment from the two directional word alignments (e2f and f2e), a standard growing heuristic is GROW-DIAG-FINAL:

  GROW-DIAG-FINAL(e2f, f2e):
    neighboring = ((-1,0), (0,-1), (1,0), (0,1), (-1,-1), (-1,1), (1,-1), (1,1))
    alignment = intersect(e2f, f2e)
    GROW-DIAG()
    FINAL(e2f)
    FINAL(f2e)

  GROW-DIAG():
    iterate until no new points added:
      for english word e = 0 ... en:
        for foreign word f = 0 ... fn:
          if e aligned with f:
            for each neighboring point (e-new, f-new):
              if (e-new not aligned and f-new not aligned)
                 and (e-new, f-new) in union(e2f, f2e):
                add alignment point (e-new, f-new)

  FINAL(a):
    for english word e-new = 0 ... en:
      for foreign word f-new = 0 ... fn:
        if (e-new not aligned or f-new not aligned)
           and (e-new, f-new) in alignment a:
          add alignment point (e-new, f-new)
Growing Alignments
I Start from the intersection of the GIZA++ bidirectional alignments.
I Grow additional alignment points: heuristically add alignments along
  the diagonal (Och & Ney, Computational Linguistics, 2003).

Advantages of Phrase-Based Translation
I Many-to-many translation can handle non-compositional phrases.
I Use of local context.
I The more data, the longer the phrases that can be learned.
Word Alignment Induced Phrases
I From the symmetrized alignment of Maria no daba una bofetada a la
  bruja verde / Mary did not slap the green witch we can read off the
  phrase pairs (Maria, Mary), (no, did not), (slap, daba una bofetada),
  (a la, the), (bruja, witch), (verde, green).
I Longer phrases follow: (Maria no, Mary did not), (no daba una bofetada,
  did not slap), (daba una bofetada a la, slap the), (bruja verde, green
  witch), (Maria no daba una bofetada, Mary did not slap), (no daba una
  bofetada a la, did not slap the), (a la bruja verde, the green witch).
I We can now use these phrase pairs as the units of our probability
  model.
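The phrase translation probabilities for such a model are commonly estimated by relative frequency over all pairs extracted from the corpus; a minimal sketch, where the pairs and their frequencies are invented for illustration:

```python
from collections import Counter

# Hypothetical phrase pairs collected from an aligned corpus
# (the pairs and counts below are made up for illustration).
extracted = [("did not", "no"), ("did not", "no"), ("did not", "nunca"),
             ("the", "la"), ("the", "la"), ("the", "el")]

pair_counts = Counter(extracted)
e_counts = Counter(e for e, _ in extracted)

def phi(f, e):
    """Phrase translation probability by relative frequency:
    phi(f|e) = count(e, f) / count(e)."""
    return pair_counts[(e, f)] / e_counts[e]
```

With these toy counts, phi("no", "did not") = 2/3 and phi("nunca", "did not") = 1/3, and the probabilities for each English phrase sum to one by construction.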
Improved Word Alignments
I Word-align the sentence pair Maria no daba una bofetada a la bruja
  verde / Mary did not slap the green witch in both directions (spanish
  to english and english to spanish).
I [Figure: the two directional alignment matrices and their intersection;
  additional alignment points are then grown around the intersection.]
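One simple (deliberately brute-force) way to enumerate the phrase pairs that are consistent with such an alignment — at least one alignment point inside the pair, and no point linking a word inside to a word outside — can be sketched as follows; the indices and the max_len cutoff are illustrative:

```python
def extract_phrases(alignment, e_len, f_len, max_len=4):
    """Enumerate phrase pairs (e1, e2, f1, f2) with inclusive index spans
    that are consistent with the word alignment. Brute force, for
    illustration only."""
    pairs = set()
    for e1 in range(e_len):
        for e2 in range(e1, min(e1 + max_len, e_len)):
            for f1 in range(f_len):
                for f2 in range(f1, min(f1 + max_len, f_len)):
                    inside = [(e, f) for e, f in alignment
                              if e1 <= e <= e2 and f1 <= f <= f2]
                    # A point "crosses" if exactly one end is inside.
                    crossing = any((e1 <= e <= e2) != (f1 <= f <= f2)
                                   for e, f in alignment)
                    if inside and not crossing:
                        pairs.add((e1, e2, f1, f2))
    return pairs

# Maria no daba una bofetada a la bruja verde  (foreign, indices 0-8)
# Mary did not slap the green witch            (English, indices 0-6)
# "a" (f=5) is unaligned, so it can attach to a neighboring phrase.
alignment = {(0, 0), (1, 1), (2, 1), (3, 2), (3, 3), (3, 4),
             (4, 6), (5, 8), (6, 7)}
pairs = extract_phrases(alignment, e_len=7, f_len=9)
# e.g. (1, 2, 1, 1) is (did not, no); (3, 3, 2, 4) is (slap, daba una bofetada)
```

Note how the unaligned "a" makes both (the, la) and (the, a la) extractable, matching the induced phrase lists above.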
I Growing the phrases further: (Maria no daba una bofetada a la, Mary
  did not slap the), (daba una bofetada a la bruja verde, slap the green
  witch), and so on, up to the complete sentence pair.

What makes MT hard?
We've seen how MT systems can work, but MT is a very difficult task
because languages are vastly different. Languages differ:
I Lexically: in the words they use
I Syntactically: in the constructions they allow
I Semantically: in the way meanings work
I Pragmatically: in what readers take from a sentence
In addition, there is a good deal of real-world knowledge that goes into
a translation.

Lexical ambiguity
I Words can be lexically ambiguous = have multiple meanings.
I bank can be a financial institution or a place along a river.
I can can be a cylindrical object, the act of putting something into that
  cylinder (e.g., John cans tuna), or a modal word like must, might,
  should.

Semantic relationships
I But words don't always line up exactly between languages.
I Often we find (rough) synonyms between two languages:
  I English book = Russian kniga
  I English music = Spanish música
I English hypernyms = words that are more general in English than their
  counterparts in other languages:
  I English know is rendered by French savoir ('to know a fact') and
    connaître ('to know a thing').
  I English library is German Bücherei if it is open to the public, but
    Bibliothek if it is intended for scholarly work.
I English hyponyms = words that are more specific in English than their
  foreign-language counterparts:
  I The German word Berg can mean either hill or mountain in English.
  I The Russian word ruka can mean either hand or arm.

Semantic overlap
I The situation can be fuzzy, as in the following English and French
  correspondences (Jurafsky & Martin 2000, Figure 21.2):
  I leg = etape (journey), jambe (human), pied (chair), patte (animal)
  I foot = pied (human), patte (bird)
  I paw = patte (animal)
I [Figure: Venn diagram of the semantic overlap between leg, foot, paw
  and etape, jambe, pied, patte]

Semantic non-compositionality
I Some verbs carry little meaning, so-called light verbs:
  I French faire une promenade is literally 'make a walk,' but it has the
    meaning of the English take a walk.
  I Dutch een poging doen 'do an attempt' means the same as the English
    make an attempt.
I And we often face idioms = expressions whose meaning is not made up of
  the meanings of the individual words:
  I e.g., English kick the bucket
  I approximately equivalent to the French casser sa pipe ('break his/her
    pipe')
  I but we might want to translate it as mourir ('die')
  I and we want to treat it differently than kick the table

Idiosyncratic differences
I Some words do not exist in a language and have to be translated with a
  more complex phrase: a lexical gap or lexical hole.
  I French gratiner means something like 'to cook with a cheese coating'
  I Hebrew stam means something like 'I'm just kidding' or 'Nothing
    special.'
I There are also idiosyncratic collocations among languages, e.g.:
  I English heavy smoker
  I French grand fumeur ('large smoker')
  I German starker Raucher ('strong smoker')

Evaluating quality
I Two main components in evaluating quality:
  I Intelligibility = how understandable the output is
  I Accuracy = how faithful the output is to the input
I A common (though problematic) evaluation metric is the BLEU metric,
  based on n-gram comparisons.
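The idea behind BLEU can be sketched as follows; this is a toy single-reference, single-sentence version (real BLEU is computed at corpus level, usually against multiple references), using the same clipped n-gram counts:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Toy single-sentence BLEU: brevity penalty times the geometric
    mean of modified (clipped) n-gram precisions for n = 1..max_n."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        precisions.append(matched / max(sum(cand_ngrams.values()), 1))
    if min(precisions) == 0.0:
        return 0.0          # any zero n-gram precision zeroes the score
    # Penalize candidates shorter than the reference.
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A candidate identical to the reference scores 1.0; a degenerate output like "the the the the" scores 0.0, since clipping caps its unigram matches and it shares no longer n-grams with the reference.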
Further reading
Some of the examples are adapted from the following books:
I Arnold, Doug J., Lorna Balkan, Siety Meijer, R. Lee Humphreys and
  Louisa Sadler (1994). Machine Translation: An Introductory Guide.
  Blackwells-NCC, London. Available from
  http://www.essex.ac.uk/linguistics/clmt/MTbook/
I Jurafsky, Daniel, and James H. Martin (2000). Speech and Language
  Processing: An Introduction to Natural Language Processing,
  Computational Linguistics, and Speech Recognition. Prentice Hall.
  More info at http://www.cs.colorado.edu/~martin/slp.html