What is machine translation?

INF5820
Language technological applications
H2010
Jan Tore Lønning
Machine translation
INF 5820 – H2008
Lecture 2
Machine Translation
1. Some examples
2. Why is machine translation a problem?
3. Traditional approaches:
   1. Direct
   2. Interlingua
   3. Transfer
4. Empirical approaches:
   1. SMT
   2. Example-based MT (EBMT)
5. The LOGON approach
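1. The school algorithm ("realskolealgoritmen")
Example, Norwegian → German:
Source: Jenta fra byen har gitt ham noen røde epler ("The girl from the town has given him some red apples")
Lexemes translated: Mädchen von Stadt haben geben er einige rot Apfel
Inflected: Das Mädchen von der Stadt hat gegeben ihm einige rote Äpfel
Word order (gegeben moved to the end): Das Mädchen von der Stadt hat ihm einige rote Äpfel gegeben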
1. Identify verb, syntactic function, case
2. Identify the morphosyntactic features:
• definiteness, number, person, form, tense, …
3. Translate the lexemes
4. Properties of the target lexemes: gender, arguments, agreement
5. Inflection: Case, number, person, gender, def., tense, agr. …
6. Word order
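The six steps can be traced on the example sentence with a small Python sketch. Everything in it is an illustrative assumption. The analysis from steps 1 and 2 is hand-coded as feature tuples, so no parser is involved. The hypothetical tables `LEXEMES` and `FORMS` cover only this one sentence.

```python
from typing import List, Tuple

# Steps 1-2 (hand-coded, hypothetical): each source word with the
# morphosyntactic features an analyser would have to find for it.
ANALYSED: List[Tuple[str, Tuple[str, ...]]] = [
    ("jenta", ("def",)), ("fra", ()), ("byen", ("def", "dat")),
    ("har", ("3sg",)), ("gitt", ("ptcp",)), ("ham", ("dat",)),
    ("noen", ()), ("røde", ("pl",)), ("epler", ("pl",)),
]

# Step 3 (hypothetical bilingual dictionary): source word -> German lemma.
LEXEMES = {
    "jenta": "Mädchen", "fra": "von", "byen": "Stadt", "har": "haben",
    "gitt": "geben", "ham": "er", "noen": "einige", "røde": "rot", "epler": "Apfel",
}

# Steps 4-5 (hypothetical): inflected German form for (lemma, features).
# Gender and case agreement are baked into the table, not computed.
FORMS = {
    ("Mädchen", ("def",)): "das Mädchen",
    ("Stadt", ("def", "dat")): "der Stadt",
    ("haben", ("3sg",)): "hat",
    ("geben", ("ptcp",)): "gegeben",
    ("er", ("dat",)): "ihm",
    ("rot", ("pl",)): "rote",
    ("Apfel", ("pl",)): "Äpfel",
}


def capitalise(sentence: str) -> str:
    return sentence[:1].upper() + sentence[1:]


def translate(analysed: List[Tuple[str, Tuple[str, ...]]]) -> Tuple[str, str, str]:
    """Return the output after step 3, after steps 4-5 and after step 6."""
    lemmas = [(LEXEMES[word], feats) for word, feats in analysed]
    # Words without an entry in FORMS keep their lemma form.
    words = [FORMS.get((lemma, feats), lemma) for lemma, feats in lemmas]
    # Step 6: in a German main clause the past participle goes last.
    is_ptcp = ["ptcp" in feats for _, feats in lemmas]
    reordered = ([w for w, p in zip(words, is_ptcp) if not p]
                 + [w for w, p in zip(words, is_ptcp) if p])
    return (" ".join(lemma for lemma, _ in lemmas),
            capitalise(" ".join(words)),
            capitalise(" ".join(reordered)))


for line in translate(ANALYSED):
    print(line)
```

The inflection table hides most of the difficulty. A realistic system must derive "der Stadt" from the gender of Stadt and the dative governed by von, which is the information step 4 asks for.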
1. Direct translation
• Main idea: translate words!
• Bilingual dictionary
• Some morphological analysis
• Two steps:
  • Determine the words
  • Determine the word order
  (Similar to statistical MT)
• J&M: decision list algorithm
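A decision list can be viewed as an ordered list of (test, choice) rules. The first rule whose test succeeds decides the translation. The sketch below applies this idea to lexical choice for the Norwegian verb lese ("read"), using the pair "Per hadde lest lekser" / "Per had studied" from the example-based MT slide. The rule set, the one-word context test and the names `choose` and `LESE_RULES` are illustrative assumptions.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs a test on (sentence, position) with the English lemma it selects.
Rule = Tuple[Callable[[List[str], int], bool], str]


def next_word(words: List[str], i: int) -> Optional[str]:
    """Return the word after position i, or None at the end of the sentence."""
    return words[i + 1] if i + 1 < len(words) else None


# Hypothetical decision list for forms of "lese", ordered from most specific
# to most general; the last rule always fires and acts as the default.
LESE_RULES: List[Rule] = [
    (lambda ws, i: next_word(ws, i) == "lekser", "study"),  # "lese lekser" = do homework
    (lambda ws, i: True, "read"),
]


def choose(rules: List[Rule], words: List[str], i: int) -> Optional[str]:
    """Apply the first rule whose test succeeds (decision-list semantics)."""
    for test, target in rules:
        if test(words, i):
            return target
    return None


print(choose(LESE_RULES, "per hadde lest lekser".split(), 2))  # study
print(choose(LESE_RULES, "jenta leser boka".split(), 1))       # read
```

The chosen item is an uninflected English lemma. Producing "studied" rather than "study" is left to a later inflection step.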
2. Interlingua
• A universal meaning representation language (lingua franca)
• Analyse the source-language sentence into an interlingua representation
• From this, generate a sentence in the target language
2. Interlingua: strengths
• Translation between many languages
• One analysis module and one generation module per language
• Example, 17 languages:
  • Direct: 17*16 modules (=272)
  • Interlingua: 2*17 (=34)
• Adding language 18:
  • Direct: +(2*17)
  • Interlingua: +2
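Two formulas produce the module counts. A direct system is needed for every ordered language pair, which gives n(n-1) modules. An interlingua system needs one analysis module and one generation module per language, which gives 2n. The short Python check below reproduces the numbers; the function names are chosen for illustration only.

```python
def direct_modules(n: int) -> int:
    """One direct system per ordered language pair: n * (n - 1)."""
    return n * (n - 1)


def interlingua_modules(n: int) -> int:
    """One analysis and one generation module per language: 2 * n."""
    return 2 * n


for n in (17, 18):
    print(n, direct_modules(n), interlingua_modules(n))
# 17 272 34
# 18 306 36
```

Adding language 18 therefore costs 306 - 272 = 34 = 2*17 new direct modules, but only 2 new interlingua modules.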
3. Transfer
• Problem for interlingua:
  • It requires a language-independent meaning representation, which is hard to define
• Transfer approach:
  • Language-specific representations
  • The contrasts between a pair of languages are expressed as transfer rules
• Syntactic transfer:
  • Extends the direct approach with a syntactic analysis
• Semantic transfer:
  • Semantic representations, but language specific rather than language independent
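A syntactic transfer rule expresses one contrast between two languages as a mapping from source trees to target trees. The sketch below encodes a single such contrast between Norwegian and English, for illustration only. In Norwegian the possessive follows the definite noun ("huset mitt", literally "the house my"), while in English it comes first ("my house"). The tree encoding, the rule and the four-word `LEXICON` are hypothetical.

```python
from typing import List, Tuple, Union

# A tree is either a word (leaf) or a tuple (label, child, child, ...).
Tree = Union[str, Tuple]

# Hypothetical word-level lexicon; the definite suffix of "huset"/"boka"
# is dropped because the English possessive already acts as determiner.
LEXICON = {"huset": "house", "mitt": "my", "boka": "book", "mi": "my"}


def is_node(tree: Tree, label: str) -> bool:
    return isinstance(tree, tuple) and tree[0] == label


def transfer(tree: Tree) -> Tree:
    """Translate leaves and apply the NP reordering rule recursively."""
    if isinstance(tree, str):
        return LEXICON.get(tree, tree)
    label, *children = tree
    # Transfer rule: Norwegian NP -> N Poss  ==>  English NP -> Poss N
    if (label == "NP" and len(children) == 2
            and is_node(children[0], "N") and is_node(children[1], "Poss")):
        noun, poss = children
        return ("NP", transfer(poss), transfer(noun))
    return (label, *(transfer(child) for child in children))


def leaves(tree: Tree) -> List[str]:
    """Read the words off a tree from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


norwegian = ("NP", ("N", "huset"), ("Poss", "mitt"))
print(" ".join(leaves(transfer(norwegian))))  # my house
```

A syntactic-transfer system contains many such rules, one for each structural contrast, and each rule is specific to one language pair.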
Alternative strategies (figure: the Vauquois triangle from a Norwegian sentence to an English sentence; from bottom to top, word-for-word translation, syntactic transfer, semantic transfer and interlingua)
Example-based MT
• Norwegian: Jenta har lest lekser i en time.
• English: ?
• Examples:
  • Jenta har spist et eple hver dag / The girl has eaten an apple a day
  • Per hadde lest lekser / Per had studied
  • Kari sang i en time. / Kari sang for an hour.
• The matching fragments are not necessarily constituents
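Here is a minimal sketch of the example-based idea. It assumes the fragment alignments have already been extracted by hand from the three example pairs above. The input is covered greedily with the longest fragments found in the examples, and their translations are concatenated. Real EBMT systems must also find and align such fragments automatically and repair the joins between them. The `FRAGMENTS` table and the greedy strategy are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fragment alignments read off the three example pairs.
FRAGMENTS: Dict[Tuple[str, ...], str] = {
    ("jenta", "har"): "the girl has",    # Jenta har spist et eple hver dag
    ("lest", "lekser"): "studied",       # Per hadde lest lekser
    ("i", "en", "time"): "for an hour",  # Kari sang i en time.
}


def translate(sentence: str) -> Optional[str]:
    """Cover the input greedily with the longest matching fragments."""
    words = sentence.lower().rstrip(".").split()
    longest = max(len(key) for key in FRAGMENTS)
    output: List[str] = []
    i = 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in FRAGMENTS:
                output.append(FRAGMENTS[key])
                i += n
                break
        else:
            return None  # no example covers words[i]
    return " ".join(output)


print(translate("Jenta har lest lekser i en time."))  # the girl has studied for an hour
```

Neither "jenta har" nor "lest lekser" is a constituent, yet both work as reusable fragments. This is the point made on the slide.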
SMT Figure 25.8
The LOGON project
• Machine translation Norwegian → English
• Many areas of language technology are needed:
  • Working together in a demonstrator
  • Similarities and differences between Norwegian and other languages
• Tourist texts / hiking route descriptions
• High quality (limited coverage)
• 2003-2007
Alternative strategies (figure: the Vauquois triangle between a Norwegian sentence and an English sentence, with the levels word-for-word, syntax, semantics and interlingua)
MT strategies (symbolic) (figure: a Norwegian sentence and an English sentence connected via the semantic level, with the syntactic level in parentheses)
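Basis: transfer-based translation
1. Analysis (LFG-based): Norwegian sentence → underspecified semantic representation of Norwegian
2. Transfer: underspecified semantic representation of Norwegian → underspecified semantic representation of English
3. Generation (HPSG-based): underspecified semantic representation of English → English sentence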
2.2 Ambiguity
1. Analysis
2. Transfer
3. Generation
How do we choose the right, or the best, alternative at each step?

|< |Toppen er luftig, og har en utrolig utsikt!| (83) --- 2 x 24 x 12 = 12
|> |the top is airy and has an incredible view| [85.9] <0.70> (1:0:0).
|> |the summit is airy and has an incredible view| [87.4] <1.00> (1:4:0).
|> |the top is breezy and has an incredible view| [87.7] <0.46> (1:6:0).
|> |the top is airy and has an unbelievable view| [88.9] <0.70> (1:1:0).
|> |the peak is airy and has an incredible view| [89.1] <0.96> (1:2:0).
|> |the summit is breezy and has an incredible view| [89.1] <0.66> (1:10:0).
|> |the summit is airy and has an unbelievable view| [90.3] <1.00> (1:5:0).
|> |the top is breezy and has an unbelievable view| [90.7] <0.46> (1:7:0).
|> |the peak is breezy and has an incredible view| [90.8] <0.66> (1:8:0).
|> |the peak is airy and has an unbelievable view| [92.0] <0.96> (1:3:0).
|> |the summit is breezy and has an unbelievable view| [92.1] <0.66> (1:11:0).
|> |the peak is breezy and has an unbelievable view| [93.8] <0.66> (1:9:0).
|= 64:19 of 83 {77.1+22.9}; 58:9 of 64:19 {90.6 47.4}; 55:9 of 58:9 {94.8 100.0} @ 64 of 83 {77.1} <0.51 0.67>.

|< |De slipper å bære.| (70) --- 3 x 4 x 9 = 6 [9]
|> |they do not have to carry something| [40.6] <0.84> (0:0:1).
|> |you do not have to carry something| [41.8] <0.53> (1:0:1).
|> |those do not have to carry something| [51.6] <0.53> (2:1:1).
|> |they don't have to carry something| [55.2] <0.80> (0:0:0).
|> |you don't have to carry something| [65.8] <0.43> (1:0:0).
|> |those don't have to carry something| [66.3] <0.43> (2:1:0).
|= 57:13 of 70 {81.4+18.6}; 51:6 of 57:13 {89.5 46.2}; 48:6 of 51:6 {94.1 100.0} @ 54 of 70 {77.1} <0.53 0.69>.
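Part of the ambiguity in the first example comes from independent lexical choices. There are three options for toppen, two for luftig and two for utrolig, which gives 3 × 2 × 2 = 12 candidates: the twelve English strings listed for "Toppen er luftig, og har en utrolig utsikt!" above. The sketch below only reproduces that count. The string template and the `ALTERNATIVES` table are simplifications assumed for illustration, because LOGON itself works on underspecified semantic representations rather than strings.

```python
from itertools import product
from typing import Dict, List

# Hypothetical lexical alternatives, taken from the LOGON candidates above.
ALTERNATIVES: Dict[str, List[str]] = {
    "toppen": ["top", "summit", "peak"],
    "luftig": ["airy", "breezy"],
    "utrolig": ["incredible", "unbelievable"],
}


def candidates(template: str, alternatives: Dict[str, List[str]]) -> List[str]:
    """Fill every slot in the template with every combination of choices."""
    slots = list(alternatives)
    return [template.format(**dict(zip(slots, choice)))
            for choice in product(*(alternatives[s] for s in slots))]


outputs = candidates("the {toppen} is {luftig} and has an {utrolig} view", ALTERNATIVES)
print(len(outputs))  # 12
```

The same reasoning covers "De slipper å bære.". Three readings of De (they, you, those) times two forms of the negation (do not, don't) give the six candidates listed. Choosing among the candidates is the ranking problem raised on the ambiguity slide.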
Machine translation
1. What is machine translation?
2. Why is it difficult?
3. Traditional approaches:
   1. Direct
   2. Interlingua
   3. Transfer
4. Empirical approaches:
   1. Example-based MT (EBMT)
   2. Statistical MT (SMT)
5. The LOGON project
6. Evaluation
7. Machine translation in practice
8. A little history
History
• 1950s: great optimism (FAHQT, fully automatic high-quality translation)
• 1960s: too difficult
  • Bar-Hillel
  • The ALPAC report
• 1980s: renewed interest:
  • Japan
  • EU, Eurotra
• Our time (1992 onwards)
  • Applications
    • Off-the-shelf software for PCs
    • WWW
    • Interactive translation tools
    • New markets: China
  • Theory
    • SMT, EMT
  • Speech translation
    • e.g. VerbMobil, a German project
    • Afghanistan, cf.
The age of SMT
• From 1990
• Included as an alternative towards the end of VerbMobil
• On the market from ca. 2003
• Google:
  • SMT from ca. 2005
  • Convincing quality
  • Many language pairs
  • But predictable errors