Language and Computers
Machine Translation
Based on Dickinson, Brew, & Meurers (2013)
Indiana University, Fall 2013

Outline: Introduction · Examples for Translations · Background: Dictionaries ·
Linguistic knowledge based systems (Direct transfer systems, Interlingua-based
systems) · Machine learning based systems (Alignment, Statistical Modeling,
Phrase-Based Translation) · What makes MT hard? · Evaluating MT systems ·
References

What is Machine Translation?

Translation is the process of:
- moving texts from one (human) language (the source language) to another (the
  target language),
- in a way that preserves meaning.

Machine translation (MT) automates (part of) the process:
- Fully automatic translation
- Computer-aided (human) translation
What is MT good for?

- When you need the gist of something and there are no human translators
  around:
  - translating e-mails & webpages
  - obtaining information from sources in multiple languages (e.g., search
    engines)
- If you have a limited vocabulary and a small range of sentence types:
  - translating weather reports
  - translating technical manuals
  - translating terms in scientific meetings
- If you want your human translators to focus on interesting/difficult
  sentences while avoiding lookup of unknown words and translation of mundane
  sentences.
Is MT needed?

- Translation is of immediate importance for multilingual countries (Canada,
  India, Switzerland, ...), international institutions (United Nations,
  International Monetary Fund, World Trade Organization, ...), and
  multinational or exporting companies.
- The European Union has 23 official languages. All federal laws and other
  documents have to be translated into all languages.
Example translations: the simple case

- It will help to look at a few examples of real translation before talking
  about how a machine does it.
- Take the simple Spanish sentence and its English translation below:

  (1) (Yo) hablo     español.
      I    speak-1SG Spanish
      'I speak Spanish.'

- Words in this example pretty much translate one-for-one.
- But we have to make sure hablo matches with Yo, i.e., that the subject
  agrees with the form of the verb.

Example translations: a slightly more complex case

The order and number of words can differ:

  (2) a. Tu  hablas    español?
         You speak-2SG Spanish
         'Do you speak Spanish?'
      b. Hablas    español?
         Speak-2SG Spanish
         'Do you speak Spanish?'
What goes into a translation

Some things to note about these examples, and thus what we might need to know
in order to translate:
- Words have to be translated → dictionaries
- Words are grouped into meaningful units → syntax
- Word order can differ from language to language
- The forms of words within a sentence are systematic, e.g., verbs have to be
  conjugated, etc.

Different approaches to MT

We'll look at some basic approaches to MT:
- Systems based on linguistic knowledge (Rule-Based MT (RBMT))
  - Direct transfer systems
  - Interlingua-based systems
- Machine learning approaches, i.e., statistical machine translation (SMT)
  - SMT is the most popular form of MT right now
Dictionaries

An MT dictionary differs from a "paper" dictionary:
- must be computer-usable (electronic form, indexed)
- needs to be able to handle various word inflections
- can contain (syntactic and semantic) restrictions that a word places on
  other words
  - e.g., subcategorization information: give needs a giver, a person given
    to, and an object that is given
  - e.g., selectional restrictions: if X eats, X must be animate
- contains frequency information
  - for SMT, this may be the only piece of additional information

Direct transfer systems

A direct transfer system consists of:
- A source language grammar
- A target language grammar
- Rules relating the source language underlying representation (UR) to the
  target language UR

A direct transfer system has a transfer component which relates a source
language representation with a target language representation. This can also
be called a comparative grammar.

We'll walk through the following French to English example:

  (3) Londres plaît       à  Sam.
      London  is.pleasing to Sam
      'Sam likes London.'
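The kinds of information listed for MT dictionaries can be pictured as a small data structure. This is an illustrative sketch only: the class and field names (`DictEntry`, `subcat`, `selectional`) and the frequency figures are invented, not taken from any real system.

```python
from dataclasses import dataclass, field

# Hypothetical MT dictionary entry illustrating the slide's points:
# a citation form (so inflected forms map to one entry), translation
# frequencies, subcategorization roles, and selectional restrictions.
@dataclass
class DictEntry:
    lemma: str                 # citation form
    translations: dict         # target lemma -> relative frequency (made up)
    subcat: list = field(default_factory=list)       # required roles
    selectional: dict = field(default_factory=dict)  # restrictions on roles

give = DictEntry(
    lemma="give",
    translations={"dar": 0.8, "regalar": 0.2},
    subcat=["giver", "recipient", "thing-given"],
    selectional={"giver": "animate"},
)

# A naive SMT-style lookup just picks the most frequent translation:
best = max(give.translations, key=give.translations.get)
```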
Steps in a transfer system

1. The source language grammar analyzes the input and puts it into an
   underlying representation (UR).
   Londres plaît à Sam → Londres plaire Sam (source UR)
2. The transfer component relates this source language UR (French UR) to a
   target language UR (English UR).

   French UR      English UR
   X plaire Y  ↔  Eng(Y) like Eng(X)
   (where Eng(X) means the English translation of X)

   Londres plaire Sam (source UR) → Sam like London (target UR)
3. The target language grammar translates the target language UR into an
   actual target language sentence.
   Sam like London → Sam likes London

Notes on transfer systems

- The transfer mechanism is in theory reversible; e.g., the plaire rule works
  in both directions.
- It is not clear that this is desirable: e.g., Dutch aanvangen should be
  translated into English as begin, but begin should be translated as
  beginnen.
- Because we have a separate target language grammar, we are able to ensure
  that the rules of English apply: like → likes.
- RBMT systems are still in use today, especially for more exotic language
  pairs.
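The three transfer steps can be sketched in miniature. Assume the analysis step has already produced the source UR "Londres plaire Sam"; the tiny word dictionary and the whitespace-separated UR format below are invented for illustration.

```python
# Minimal sketch of a direct transfer system for example (3).
# Word-level dictionary, Eng(X): invented, just enough for the example.
eng = {"Londres": "London", "Sam": "Sam"}

def transfer(source_ur):
    """Apply the rule: X plaire Y <-> Eng(Y) like Eng(X)."""
    x, verb, y = source_ur.split()
    if verb == "plaire":
        return f"{eng[y]} like {eng[x]}"   # swap the arguments
    return " ".join(eng.get(w, w) for w in source_ur.split())

def generate(target_ur):
    """Target grammar step: enforce English 3rd-singular agreement."""
    subj, verb, obj = target_ur.split()
    return f"{subj} {verb}s {obj}"

target_ur = transfer("Londres plaire Sam")   # step 2: "Sam like London"
sentence = generate(target_ur)               # step 3: "Sam likes London"
```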
Levels of abstraction

- There are differing levels of abstraction at which transfer can take place.
  So far we have looked at URs that represent only word information.
- We can do a full syntactic analysis, which helps us to know how the words in
  a sentence relate.
- Or we can do only a partial syntactic analysis, such as representing the
  dependencies between words.

Direct transfer & syntactic similarity

This method works best for structurally similar languages:

  [S [NP John] [VP [V goes] [PP [P on] [NP [Det the] [N roller coaster]]]]]
Direct transfer & syntactic similarity (2)

  [S [NP John] [VP [V fährt] [PP [P auf] [NP [Det der] [N Achterbahn]]]]]

Interlinguas

- Ideally, we could use an interlingua = a language-independent representation
  of meaning.
- Benefit: To add new languages to your MT system, you merely have to provide
  mapping rules between your language and the interlingua, and then you can
  translate into any other language in your system.
The translation triangle

[Figure: the translation triangle. The source language sits at the bottom
left, the target language at the bottom right, and abstract meanings at the
top; moving up means more abstraction, moving down more concreteness. The
levels of transfer, from bottom to top: word-level similarities (words and
concepts equivalent, grammar and word order the same); similar grammar,
concepts, and word order; similar concepts and grammar; meaning similarities
(similar concepts, different grammar); abstract meanings.]

Interlingual problems

- What exactly should be represented in the interlingua?
  - e.g., English corner = Spanish rincón ('inside corner') or esquina
    ('outside corner')
- A fine-grained interlingua can require extra (unnecessary) work:
  - e.g., Japanese distinguishes older brother from younger brother, so we
    have to disambiguate English brother to put it into the interlingua.
  - Then, if we translate into French, we have to ignore the disambiguation
    and simply translate it as frère, which simply means 'brother'.
Machine learning

Instead of trying to tell the MT system how we're going to translate, we might
try a machine learning approach:
- We can look at how often a source language word is translated as a target
  language word, i.e., the frequency of a given translation, and choose the
  most frequent translation.
- But how can we tell what a word is being translated as? There are two
  different cases:
  - We are told what each word is translated as: text alignment
  - We are not told what each word is translated as: use a bag of words

Sentence alignment

- sentence alignment = determine which source language sentences align with
  which target language ones (what we assume in the bag of words method).
- Intuitively easy, but it can be difficult in practice, since different
  languages have different punctuation conventions.
- We can also attempt to learn alignments as a part of the process, as we
  will see.
Word alignment

- word alignment = determine which source language words align with which
  target language ones
- Much harder than sentence alignment to do automatically.
- But if it has already been done for us, it gives us good information about a
  word's translation equivalent.

Different word alignments

- One word can map to one word or to multiple words. Likewise, sometimes it is
  best for multiple words to align with multiple words.
- English-Russian examples:
  - one-to-one: khorosho = well
  - one-to-many: kniga = the book
  - many-to-one: to take a walk = gulyat'
  - many-to-many: at least = khotya by ('although if/would')
Calculating probabilities

- With word alignments, it is relatively easy to calculate probabilities.
- e.g., What is the probability that run translates as correr in Spanish?
  1. Count up how many times run appears in the English part of your bi-text,
     e.g., 500 times.
  2. Out of all those times, count up how many times it was translated as
     (i.e., aligns with) correr, e.g., 275 (out of 500) times.
  3. Divide to get a probability: 275/500 = 0.55, or 55%.
- Word alignment gives us some frequency numbers, which we can use to align
  new cases, using other information, too (e.g., contextual information).
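The three counting steps can be sketched over a toy word-aligned bi-text. The alignment links below are invented for illustration (the slide's actual figures are 275 out of 500):

```python
from collections import Counter

# Toy word-alignment links (english_word, spanish_word); invented data.
links = [("run", "correr"), ("run", "correr"), ("run", "carrera"),
         ("I", "yo"), ("run", "correr")]

# 1. how many times does run appear on the English side?
eng_counts = Counter(e for e, _ in links)
# 2. how many of those times does it align with correr?
pair_counts = Counter(links)
# 3. divide to get a probability
p = pair_counts[("run", "correr")] / eng_counts["run"]   # 3/4 = 0.75
```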
The bag of words method

- What if we're not given word alignments?
- How can we tell which English words are translated as which German words if
  we are only given an English text and a corresponding German text?
- We can treat each sentence as a bag of words = an unordered collection of
  words.
- If word A appears in a sentence, then we will record all of the words in the
  corresponding sentence in the other language as appearing with it.
(The following slides draw on Philipp Koehn, Statistical Machine Translation,
Lecture 3: Word Alignment and Phrase Models, School of Informatics, University
of Edinburgh.)

Example for bag of words method

- English: He speaks Russian well.
- Russian: On khorosho govorit po-russki.

So, for He in He speaks Russian well / On khorosho govorit po-russki, we
record the following co-occurrences:

  Eng | Rus            Eng    | Rus
  He  | On             speaks | On
  He  | khorosho       speaks | khorosho
  He  | govorit        ...    | ...
  He  | po-russki

The idea is that, over thousands, or even millions, of sentences, He will tend
to appear more often with On, speaks will appear with govorit, and so on.

Calculating probabilities: sentence 1

1. Count up the number of Russian words: 4.
2. Assign each word an equal probability of translation: 1/4 = 0.25, or 25%.
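The first-sentence calculation can be sketched directly: record every Russian word in the corresponding sentence as a candidate translation for He, then split the probability evenly.

```python
# Bag-of-words recording for the He speaks Russian well example:
# every Russian word co-occurs with every English word, and He starts
# with a uniform 1/4 over its four candidates.
english = "He speaks Russian well".split()
russian = "On khorosho govorit po-russki".split()

candidates = {e: list(russian) for e in english}   # co-occurrence record
p_he = {r: 1 / len(candidates["He"]) for r in candidates["He"]}
# p_he gives On, khorosho, govorit, po-russki each probability 0.25
```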
Calculating probabilities: sentence 2

If we also have He is nice. / On simpatich'nyi., then for He, we do the
following:

1. Count up the number of possible translation words: 4 from the first
   sentence, 2 from the second = 6 total. Note that we are NOT counting the
   number of English words: we count the number of possible translations.
2. Count up the number of times On is the translation = 2 times out of 6 =
   1/3 = 0.33, or 33%.

All other words have the probability 1/6 = 0.17, or 17%, so On is the best
translation for He.

Statistical modeling

- Learn P(f|e) from a parallel corpus
- There is not sufficient data to estimate P(f|e) directly
- Break the process into smaller steps
- Probabilities for the smaller steps can be learned

  Mary did not slap the green witch
  Maria no daba una bofetada a la bruja verde

Probabilities used in IBM models

Probabilistic models are generally more sophisticated, treating the problem as
the source language generating the target and taking into account
probabilities such as:
- n(#|word) = probability of the number of words in the target language that
  the source word generates
- p-null = probability of a null word appearing
- t(tword|sword) = probability of a target word, given the source word (i.e.,
  what we've seen so far)
- d(tposition|sposition) = probability of a target word appearing in position
  tposition, given the source position sposition

But we need alignments to estimate these parameters.
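Adding the second sentence pair changes the estimate for He. A minimal sketch of the recount, using the two sentence pairs from the example:

```python
from collections import Counter

# Recompute the bag-of-words estimate for He after the second pair:
# count every Russian word that co-occurs with He, then normalize.
pairs = [("He speaks Russian well", "On khorosho govorit po-russki"),
         ("He is nice", "On simpatich'nyi")]

counts = Counter()
for eng, rus in pairs:
    if "He" in eng.split():
        counts.update(rus.split())

total = sum(counts.values())             # 6 possible translation tokens
p_on = counts["On"] / total              # 2/6 = 0.33
p_khorosho = counts["khorosho"] / total  # 1/6 = 0.17
```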
A generative story (IBM models)

Generative story: how an English string e gets to be a foreign string f:

  Mary did not slap the green witch
    → n(3|slap):  Mary not slap slap slap the green witch
    → p-null:     Mary not slap slap slap NULL the green witch
    → t(la|the):  Maria no daba una bofetada a la verde bruja
    → d(4|4):     Maria no daba una bofetada a la bruja verde

- Choices in the story are decided by reference to parameters, e.g.,
  p(bruja|witch).
- The formula for P(f|e) in terms of the parameters is usually long and hairy,
  but mechanical to extract from the story.
- Training to obtain parameter estimates from incomplete data: off-the-shelf
  EM.

Chicken-and-egg problem:
- If we had the parameters, we could estimate the connections (alignments).
- If we had the connections, we could estimate the parameters of our
  generative story.

Beyond bags of words: parallel corpora as incomplete data

  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...

- English and foreign words, but no connections between them.

Source: Introduction to Statistical Machine Translation, Chris Callison-Burch
and Philipp Koehn,
http://www.iccs.inf.ed.ac.uk/~pkoehn/publications/esslli-slides-day3.pdf
The EM algorithm

The Expectation Maximization (EM) algorithm works forwards and backwards to
fill in the gaps in the data:
- If we had complete data, we could estimate the model.
- If we had the model, we could fill in the gaps in the data.

EM in a nutshell:
1. Initialize model parameters (e.g., uniform).
2. (Re-)assign probabilities to the missing data.
3. (Re-)estimate model parameters from the completed data (weighted counts).
4. Iterate, i.e., repeat steps 2 & 3 until you hit some stopping point.

Initial step: all connections equally likely.

  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...

The model learns that, e.g., la is often connected with the.
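The four-step EM recipe can be run on the toy la maison / the house corpus. This is a minimal sketch of EM for IBM Model 1 lexical translation probabilities t(f|e) only (no fertility, null words, or distortion):

```python
from collections import defaultdict

# Toy parallel corpus from the slides: (English words, foreign words).
corpus = [("the house".split(), "la maison".split()),
          ("the blue house".split(), "la maison bleu".split()),
          ("the flower".split(), "la fleur".split())]

# 1. initialize model parameters uniformly: t[(f, e)] = t(f|e)
f_vocab = {f for _, fs in corpus for f in fs}
t = defaultdict(lambda: 1.0 / len(f_vocab))

for _ in range(10):                          # 4. iterate
    count = defaultdict(float)
    total = defaultdict(float)
    for es, fs in corpus:
        for f in fs:
            # 2. assign probabilities to the missing connections
            z = sum(t[(f, e)] for e in es)
            for e in es:
                c = t[(f, e)] / z            # expected (weighted) count
                count[(f, e)] += c
                total[e] += c
    # 3. re-estimate parameters from the completed data
    for (f, e), c in count.items():
        t[(f, e)] = c / total[e]

# After a few iterations, la is strongly connected with the,
# and fleur with flower (pigeonhole effect).
```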
After the 1st iteration

Connections, e.g., between la and the are more likely.

After another iteration

It becomes apparent that connections, e.g., between fleur and flower are more
likely (pigeonhole principle).
Convergence

  ... la maison ... la maison bleu ... la fleur ...
  ... the house ... the blue house ... the flower ...

  p(la|the) = 0.453        p(maison|house) = 0.876
  p(le|the) = 0.334        p(bleu|blue)    = 0.563

- Parameter estimation from the connected corpus
- Inherent hidden structure revealed by EM

IBM Model 1

  p(e, a | f) = ε / (l + 1)^m · ∏_{j=1..m} t(e_j | f_{a(j)})

- foreign sentence f = f_1 ... f_m, English sentence e = e_1 ... e_l
- each English word e_j is generated by a foreign word f_{a(j)}, as defined by
  the alignment function a, with probability t
- the normalization factor ε / (l + 1)^m is required to turn the formula into
  a proper probability function

Phrase-based translation

But this word-based translation doesn't account for many-to-many mappings
between languages:
- The foreign input is segmented into phrases: any sequences of words, not
  necessarily linguistically motivated
- Each phrase is translated into English
- Phrases are reordered
- Current models allow for many-to-one mappings → we can use those to induce
  many-to-many mappings
- See [Koehn et al., NAACL 2003] as an introduction

One example

  Tomorrow I will fly to the conference in Canada
  Morgen fliege ich nach Kanada zur Konferenz

Growing heuristic (GROW-DIAG-FINAL)

  GROW-DIAG-FINAL(e2f, f2e):
    alignment = intersect(e2f, f2e);
    GROW-DIAG(); FINAL(e2f); FINAL(f2e);

  GROW-DIAG():
    iterate until no new points added
      for english word e = 0 ... en
        for foreign word f = 0 ... fn
          if ( e aligned with f )
            for each neighboring point ( e-new, f-new ):
              if ( ( e-new not aligned and f-new not aligned ) and
                   ( e-new, f-new ) in union( e2f, f2e ) )
                add alignment point ( e-new, f-new )

  FINAL(a):
    for english word e-new = 0 ... en
      for foreign word f-new = 0 ... fn
        if ( ( e-new not aligned or f-new not aligned ) and
             ( e-new, f-new ) in alignment a )
          add alignment point ( e-new, f-new )

  neighboring = ((-1,0),(0,-1),(1,0),(0,1),(-1,-1),(-1,1),(1,-1),(1,1))
Statistical Machine Translation — Lecture 3: Word Alignment and Phrase Models p
Statistical Machine Translation — Lecture 3: Word Alignment and Phrase Models p
IBM Model 1 and EM p
HowKoehn,
to Learn
the
Phrase Translation Table? p11
Philipp
University
of Edinburgh
– p.11
Ph
witch
University
of Edinburgh
27
Philipp Koehn, University of Edinburgh
– p.27
versity of Edinburgh
– p.28
27
Philipp Koehn, University of Edinburgh
ne Translation — Lecture 3: Word Alignment and Phrase Models p
Translation — Lecture 3: Word Alignment and Phrase Models p
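The GROW-DIAG-FINAL pseudocode can be turned into a compact, runnable sketch. The two directional alignments in the demo are hypothetical, chosen only to exercise the heuristic.

```python
def grow_diag_final(e_len, f_len, e2f, f2e):
    """Symmetrize two directional word alignments (sets of (e, f)
    index pairs) with the grow-diag-final heuristic."""
    neighboring = [(-1, 0), (0, -1), (1, 0), (0, 1),
                   (-1, -1), (-1, 1), (1, -1), (1, 1)]
    union = e2f | f2e
    alignment = e2f & f2e  # start from the intersection

    def aligned_e(e):  # does English word e have any alignment point?
        return any(ae == e for ae, _ in alignment)

    def aligned_f(f):  # does foreign word f have any alignment point?
        return any(af == f for _, af in alignment)

    # GROW-DIAG: repeatedly add union points neighboring existing points,
    # as long as both words involved are still unaligned.
    added = True
    while added:
        added = False
        for e in range(e_len):
            for f in range(f_len):
                if (e, f) in alignment:
                    for de, df in neighboring:
                        e_new, f_new = e + de, f + df
                        if ((e_new, f_new) in union
                                and not aligned_e(e_new)
                                and not aligned_f(f_new)):
                            alignment.add((e_new, f_new))
                            added = True

    # FINAL: add remaining directional points touching an unaligned word.
    for a in (e2f, f2e):
        for e in range(e_len):
            for f in range(f_len):
                if (e, f) in a and (not aligned_e(e) or not aligned_f(f)):
                    alignment.add((e, f))
    return alignment

e2f = {(0, 0), (1, 1), (2, 2)}  # english-to-foreign (hypothetical)
f2e = {(0, 0), (1, 1), (1, 2)}  # foreign-to-english (hypothetical)
symmetrized = grow_diag_final(3, 3, e2f, f2e)
```

The result always sits between the intersection and the union of the two input alignments, which is the point of the heuristic.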
Intersecting Word Alignments

[Figure: word alignment matrices for "Maria no daba una bofetada a la
bruja verde" / "Mary did not slap the green witch": the
english-to-spanish and spanish-to-english alignments, and their
intersection.]

- Intersection of GIZA++ bidirectional alignments
- Grow additional alignment points: heuristically add alignments
  along the diagonal [Och and Ney, CompLing2003]

37 / 49
Word Alignment Induced Phrases

[Figure: the symmetrized alignment matrix for "Maria no daba una
bofetada a la bruja verde" / "Mary did not slap the green witch",
with boxes marking phrase pairs consistent with the alignment.]

Collecting the phrase pairs licensed by the word alignment:

(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch),
(verde, green), (Maria no, Mary did not), (no daba una bofetada, did not slap),
(daba una bofetada a la, slap the), (bruja verde, green witch),
(Maria no daba una bofetada, Mary did not slap),
(no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch)

Advantages of Phrase-Based Translation

- Many-to-many translation can handle non-compositional phrases.
- Use of local context.
- The more data, the longer the phrases that can be learned.

We can now use these phrase pairs as the units of our probability model.

38 / 49
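The extraction step can be sketched as follows. The alignment points are reconstructed from the slides' example (an assumption, since the slides show them only graphically), and the extractor is simplified in that it does not extend phrases over unaligned boundary words. A relative-frequency estimate of the phrase translation probability is appended, since the phrase pairs become the units of the probability model.

```python
from collections import Counter

def extract_phrases(f_words, e_words, alignment, max_len=7):
    """Collect all phrase pairs consistent with a word alignment
    (a set of (f, e) index pairs): no alignment point may link a
    word inside the phrase pair to a word outside it."""
    pairs = []
    for f1 in range(len(f_words)):
        for f2 in range(f1, min(len(f_words), f1 + max_len)):
            # English positions aligned to the foreign span [f1, f2].
            es = [e for (f, e) in alignment if f1 <= f <= f2]
            if not es:
                continue
            e1, e2 = min(es), max(es)
            if e2 - e1 + 1 > max_len:
                continue
            # Consistency: nothing in [e1, e2] aligns outside [f1, f2].
            if all(f1 <= f <= f2 for (f, e) in alignment if e1 <= e <= e2):
                pairs.append((" ".join(f_words[f1:f2 + 1]),
                              " ".join(e_words[e1:e2 + 1])))
    return pairs

f_sent = "Maria no daba una bofetada a la bruja verde".split()
e_sent = "Mary did not slap the green witch".split()
# Alignment points (f, e), reconstructed from the slides' example.
alignment = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 3),
             (5, 4), (6, 4), (7, 6), (8, 5)}
phrase_pairs = extract_phrases(f_sent, e_sent, alignment, max_len=9)

# Relative-frequency estimate of phi(f-phrase | e-phrase).
pair_counts = Counter(phrase_pairs)
e_counts = Counter(ep for _, ep in phrase_pairs)
phi = {(fp, ep): c / e_counts[ep] for (fp, ep), c in pair_counts.items()}
```

On this sentence pair the extractor yields exactly the 17 phrase pairs listed on the slides; with only one sentence, each φ estimate is trivially 1.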
Word Alignment Induced Phrases (continued)

The complete set of phrase pairs induced by the alignment:

(Maria, Mary), (no, did not), (daba una bofetada, slap), (a la, the), (bruja, witch),
(verde, green), (Maria no, Mary did not), (no daba una bofetada, did not slap),
(daba una bofetada a la, slap the), (bruja verde, green witch),
(Maria no daba una bofetada, Mary did not slap),
(no daba una bofetada a la, did not slap the), (a la bruja verde, the green witch),
(Maria no daba una bofetada a la, Mary did not slap the),
(daba una bofetada a la bruja verde, slap the green witch),
(no daba una bofetada a la bruja verde, did not slap the green witch),
(Maria no daba una bofetada a la bruja verde, Mary did not slap the green witch)

39 / 49

[Koehn lecture slides: Probability Distribution of Phrase Pairs; Reordering]

40 / 49

What makes MT hard?

We've seen how MT systems can work, but MT is a very difficult task
because languages are vastly different.

Languages differ:
- Lexically: in the words they use
- Syntactically: in the constructions they allow
- Semantically: in the way meanings work
- Pragmatically: in what readers take from a sentence

In addition, there is a good deal of real-world knowledge that goes
into a translation.

41 / 49

Lexical ambiguity

Words can be lexically ambiguous = have multiple meanings:

- bank can be a financial institution or a place along a river.
- can can be a cylindrical object, as well as the act of putting
  something into that cylinder (e.g., John cans tuna.), as well as
  being a word like must, might, or should.

42 / 49
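As a toy illustration of how phrase pairs exploit local context to resolve such ambiguity, consider a greedy longest-match lookup in a small phrase table. The table entries and the monotone, probability-free matching are invented for this sketch; real phrase-based decoding searches over segmentations and reorderings with the probability model above.

```python
# A toy phrase table (entries invented for illustration).
phrase_table = {
    "she can": "elle peut",
    "a can of": "une boîte de",
    "tuna": "thon",
    "swim": "nager",
    "she": "elle",
    "can": "peut",
    "a": "un",
}

def greedy_phrase_translate(sentence, table, max_len=3):
    """Greedy longest-match segmentation plus phrase lookup
    (monotone: no reordering; unknown words are passed through)."""
    words = sentence.split()
    out, i = [], 0
    while i < len(words):
        for k in range(min(max_len, len(words) - i), 0, -1):
            chunk = " ".join(words[i:i + k])
            if chunk in table:
                out.append(table[chunk])
                i += k
                break
        else:
            out.append(words[i])  # pass unknown words through
            i += 1
    return " ".join(out)
```

Because "she can" and "a can of" are matched as phrases, the ambiguous word can is translated differently in the two contexts.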
Semantic relationships

But words don't always line up exactly between languages.

Often we find (rough) synonyms between two languages:
- English book = Russian kniga
- English music = Spanish música

English hypernyms = words that are more general in English than their
counterparts in other languages:
- English know is rendered by the French savoir ('to know a fact') and
  connaitre ('to know a thing')
- English library is German Bücherei if it is open to the public, but
  Bibliothek if it is intended for scholarly work.

English hyponyms = words that are more specific in English than their
foreign-language counterparts:
- The German word Berg can mean either hill or mountain in English.
- The Russian word ruka can mean either hand or arm.

43 / 49

Semantic overlap

The situation can be fuzzy, as in the following English and French
correspondences (Jurafsky & Martin 2000, Figure 21.2):
- leg = etape (journey), jambe (human), pied (chair), patte (animal)
- foot = pied (human), patte (bird)
- paw = patte (animal)
Venn diagram of semantic overlap

[Figure: Venn diagram relating English leg, foot, and paw to French
etape (journey), jambe (human), pied (human, chair), and patte
(animal, bird).]

44 / 49

Semantic non-compositionality

Some verbs carry little meaning, so-called light verbs:
- French faire une promenade is literally 'make a walk,' but it has
  the meaning of the English take a walk
- Dutch een poging doen 'do an attempt' means the same as the English
  make an attempt

And we often face idioms = expressions whose meaning is not made up of
the meanings of the individual words, e.g., English kick the bucket:
- approximately equivalent to the French casser sa pipe ('break
  his/her pipe')
- but we might want to translate it as mourir ('die')
- and we want to treat it differently than kick the table

45 / 49
Idiosyncratic differences

Some words do not exist in a language and have to be translated with a
more complex phrase: lexical gap or lexical hole.
- French gratiner means something like 'to cook with a cheese coating'
- Hebrew stam means something like 'I'm just kidding' or 'Nothing
  special.'

There are also idiosyncratic collocations among languages, e.g.:
- English heavy smoker
- French grand fumeur ('large smoker')
- German starker Raucher ('strong smoker')

46 / 49
Evaluating quality

Two main components in evaluating quality:
- Intelligibility = how understandable the output is
- Accuracy = how faithful the output is to the input

47 / 49

A common (though problematic) evaluation metric is the BLEU metric,
based on n-gram comparisons.

48 / 49
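A minimal sketch of the idea behind BLEU, assuming a single reference and sentence-level scoring. Real BLEU is computed over a whole corpus, allows multiple references, and is usually smoothed.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU: geometric mean of clipped
    n-gram precisions, times a brevity penalty for short output."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    precisions = []
    for n in range(1, max_n + 1):
        c_grams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        r_grams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, r_grams[g]) for g, c in c_grams.items())
        total = sum(c_grams.values())
        precisions.append(clipped / total if total else 0.0)
    if min(precisions) == 0:
        return 0.0  # no smoothing: an empty n-gram overlap zeroes the score
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

Clipping is what blocks the degenerate strategy of repeating one correct word, and the brevity penalty is what blocks the strategy of outputting only the safest few words.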
Further reading

Some of the examples are adapted from the following books:

- Doug J. Arnold, Lorna Balkan, Siety Meijer, R. Lee Humphreys and
  Louisa Sadler (1994). Machine Translation: An Introductory Guide.
  Blackwells-NCC, London. Available from
  http://www.essex.ac.uk/linguistics/clmt/MTbook/
- Jurafsky, Daniel, and James H. Martin (2000). Speech and Language
  Processing: An Introduction to Natural Language Processing,
  Computational Linguistics, and Speech Recognition. Prentice-Hall.
  More info at http://www.cs.colorado.edu/~martin/slp.html

49 / 49