INF5820 Language technological applications, H2010
Jan Tore Lønning
[email protected]

Machine translation
INF 5820 – H2008, Lecture 2

Machine Translation
1. Some examples
2. Why is machine translation a problem?
3. Traditional approaches:
   1. Direct
   2. Interlingua
   3. Transfer
4. Empirical approaches:
   1. SMT
   2. Example-based MT (EBMT)
5. The LOGON approach

1. The "realskole" (secondary school) algorithm
Norwegian:  Jenta fra byen har gitt ham noen røde epler
            (English: The girl from the town has given him some red apples)
Labels:     SNBE V Pr V PP H D 3p E OAU F
Lexemes:    Mädchen von Stadt haben geben er einige rot Apfel
Inflected:  Das Mädchen von der Stadt hat gegeben ihm einige rote Äpfel
Reordered:  Das Mädchen von der Stadt hat ihm einige rote Äpfel gegeben

1. Identify verb, syntactic function, case
2. And morphosyntactic features:
   • definiteness, number, person, form, tense, …
3. Translate the lexemes
4. Properties of the target lexemes: gender, arguments, agreement
5. Inflection: case, number, person, gender, definiteness, tense, agreement, …
6. Word order

1. Direct translation
Main idea: translate words!
• Bilingual dictionary
• Some morphological analysis
Two steps:
1. Determine the words
2. Determine the word order
(Similar to statistical MT)
J&M: decision list algorithm

2. Interlingua
• A universal meaning representation language (lingua franca)
• Analyse the source-language sentence, resulting in an interlingua representation
• From this, generate a sentence in the target language

2. Interlingua: strength
• Translation between many languages
• One analysis module and one generation module per language
• Example, 17 languages:
    Direct: 17*16 modules (= 272)
    Interlingua: 2*17 modules (= 34)
• Adding language 18:
    Direct: +(2*17) modules
    Interlingua: +2 modules
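The module counts for direct versus interlingua translation follow from two simple formulas: one module per ordered language pair for the direct approach, and one analysis plus one generation module per language for interlingua. A minimal sketch that reproduces the slide's arithmetic; the function names are illustrative and not taken from the lecture.

```python
def direct_modules(n: int) -> int:
    """Direct approach: one translation module per ordered language pair."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """Interlingua: one analysis and one generation module per language."""
    return 2 * n


n = 17
print(direct_modules(n))        # 272
print(interlingua_modules(n))   # 34

# Extra modules needed when language 18 is added.
print(direct_modules(n + 1) - direct_modules(n))            # 34, i.e. 2*17
print(interlingua_modules(n + 1) - interlingua_modules(n))  # 2
```

The direct count grows quadratically with the number of languages, while the interlingua count grows linearly, which is the strength the slide points to.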
3. Transfer
Problem for interlingua:
• A language-independent meaning representation
Transfer approach:
• Language-specific representations
• Contrasts between a pair of languages expressed as transfer rules
Syntactic transfer:
• Extends the direct approach with a syntactic analysis
Semantic transfer:
• Semantic representations, but language specific

Alternative strategies
[Figure: the Vauquois triangle. Levels from bottom to top: word-for-word, syntactic transfer, semantic transfer, interlingua. Norwegian sentence on the source side, English sentence on the target side.]

Example-based MT
No: Jenta har lest lekser i en time.
Eng: ?
Examples:
• Jenta har spist et eple hver dag
  The girl has eaten an apple a day
• Per hadde lest lekser
  Per had studied
• Kari sang i en time.
  Kari sang for an hour.
Not necessarily constituents

SMT
Figure 25.8

The LOGON project
Machine translation, Norwegian to English
• Many areas of language technology are needed: they work together in a demonstrator
• Similarities and differences between Norwegian and other languages
• Tourist texts / hiking descriptions
• High quality (limited coverage)
• 2003-2007

Alternative strategies
[Figure: triangle with the levels word-for-word, syntax, semantics, interlingua between Norwegian sentence and English sentence.]

MT strategies (symbolic)
[Figure: Norwegian sentence to English sentence via the semantic (syntactic) level.]

Basis: transfer-based translation
Norwegian sentence
  1. Analysis (LFG-based)
Underspecified semantic representation of Norwegian
  2. Transfer
Underspecified semantic representation of English
  3. Generation (HPSG-based)
English sentence

2.2 Ambiguity
1. Analysis
2. Transfer
3. Generation
How do we choose the right or the best alternative at each step?
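Ambiguity at the three steps multiplies: each analysis can give several transfer results, and each of those several generated sentences. The sketch below shows one simple way to choose among the combinations, assuming every alternative carries an integer cost (lower is better) and that costs add up across steps. The alternatives, the costs, and the function name are hypothetical, and this is not LOGON's actual ranking method.

```python
from itertools import product

# Hypothetical alternatives per step as (label, cost); lower cost is better.
analyses = [("analysis-a", 2), ("analysis-b", 9)]
transfers = [("top", 1), ("summit", 3), ("peak", 4)]
generations = [("is airy", 2), ("is breezy", 5)]


def rank_candidates(*stages):
    """Combine one alternative from every stage and sort by summed cost."""
    candidates = []
    for combo in product(*stages):
        labels = tuple(label for label, _ in combo)
        total = sum(cost for _, cost in combo)
        candidates.append((total, labels))
    return sorted(candidates)


ranked = rank_candidates(analyses, transfers, generations)
print(len(ranked))  # 2 * 3 * 2 = 12 combinations
print(ranked[0])    # (5, ('analysis-a', 'top', 'is airy'))
```

The full cross product is a simplification: in a real pipeline the alternatives available at one step depend on the output chosen at the previous step, and the search is usually pruned to the best few candidates per step.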
Example output (Norwegian input |<, English candidates |>):

|< |Toppen er luftig, og har en utrolig utsikt!| (83) --- 2 x 24 x 12 = 12
|> |the top is airy and has an incredible view| [85.9] <0.70> (1:0:0).
|> |the summit is airy and has an incredible view| [87.4] <1.00> (1:4:0).
|> |the top is breezy and has an incredible view| [87.7] <0.46> (1:6:0).
|> |the top is airy and has an unbelievable view| [88.9] <0.70> (1:1:0).
|> |the peak is airy and has an incredible view| [89.1] <0.96> (1:2:0).
|> |the summit is breezy and has an incredible view| [89.1] <0.66> (1:10:0).
|> |the summit is airy and has an unbelievable view| [90.3] <1.00> (1:5:0).
|> |the top is breezy and has an unbelievable view| [90.7] <0.46> (1:7:0).
|> |the peak is breezy and has an incredible view| [90.8] <0.66> (1:8:0).
|> |the peak is airy and has an unbelievable view| [92.0] <0.96> (1:3:0).
|> |the summit is breezy and has an unbelievable view| [92.1] <0.66> (1:11:0).
|> |the peak is breezy and has an unbelievable view| [93.8] <0.66> (1:9:0).
|= 64:19 of 83 {77.1+22.9}; 58:9 of 64:19 {90.6 47.4}; 55:9 of 58:9 {94.8 100.0} @ 64 of 83 {77.1} <0.51 0.67>.

|< |De slipper å bære.| (70) --- 3 x 4 x 9 = 6 [9]
|> |they do not have to carry something| [40.6] <0.84> (0:0:1).
|> |you do not have to carry something| [41.8] <0.53> (1:0:1).
|> |those do not have to carry something| [51.6] <0.53> (2:1:1).
|> |they don't have to carry something| [55.2] <0.80> (0:0:0).
|> |you don't have to carry something| [65.8] <0.43> (1:0:0).
|> |those don't have to carry something| [66.3] <0.43> (2:1:0).
|= 57:13 of 70 {81.4+18.6}; 51:6 of 57:13 {89.5 46.2}; 48:6 of 51:6 {94.1 100.0} @ 54 of 70 {77.1} <0.53 0.69>.

Machine translation
1. What is machine translation?
2. Why is it difficult?
3. Traditional approaches:
   1. Direct
   2. Interlingua
   3. Transfer
4. Empirical approaches:
   1. Example-based MT (EBMT)
   2. Statistical MT (SMT)
5. The LOGON project
6. Evaluation
7. Machine translation in practice
8. A bit of history

The history
• 1950s: great optimism (FAHQT)
• 1960s: too difficult
   • Bar-Hillel
   • The ALPAC report
• 1980s: renewed interest
   • Japan
   • EU, Eurotra
• Our time (1992 onward)
   • Applications
      • Off-the-shelf software for PCs
      • WWW
      • Interactive translation tools
      • New markets: China
   • Theory
      • Speech translation, e.g. VerbMobil, a German project
      • Afghanistan, cf. SMT, EMT

The age of SMT
• From 1990
• Included as an alternative at the end of VerbMobil
• On the market from about 2003
• Google: SMT from about 2005
• Convincing quality
• Many language pairs
• But predictable errors