
Morphology in MT
April 26th 2016
Road Map
• Definition: What is morphology?
• Problems: How does morphology affect MT?
• Solutions: How can we adapt MT systems to work with morphologically rich languages?
What is morphology?
• Word formation from smaller parts
• Inflectional
  • eat (V) + -s → eats (V)
• Derivational
  • happy (A) + -ness → happiness (N)
• Compounding
  • dish (N) + washer (N) → dishwasher (N)
What is morphology?
• establish (V)
• disestablish (V)
• disestablishment (N)
• antidisestablishment (N)
• antidisestablishmentary (A)
• antidisestablishmentarian (N)
• antidisestablishmentarianism (N)
What is morphology?
Unabhängigkeitserklärung (German: "declaration of independence")
我们 (Chinese: "we")
reunification
библиотеку (Russian: "library", accusative)
입혔습니까 (Korean: "did [you] dress [someone]?")
शब्दावली (Hindi: "vocabulary")
kitapçığa (Turkish: "to the booklet")
聞かせられたら (Japanese: "if [someone] were made to listen")
להבדיל (Hebrew: "to distinguish")
étudiiez (French: "[that you] study", imperfect/subjunctive)
inquiriendo (Spanish: "inquiring")
tusaatsiarunnanngittualuujunga (Inuktitut: "I cannot hear very well")
Problems
• Alignment
• Phrase scoring
• Input OOVs
• Novel form generation
• Language modeling
• Evaluation
Alignment
green colorless ideas sleep
bezbarvé zelené myšlenky spí
I like green pears
mám rád zelené hrušky
I sat under a green tree
seděl jsem pod zeleným stromem
A single English form ("green") aligns to several inflected Czech forms (zelené, zeleným), fragmenting the alignment counts.
Phrase Scoring

en    cs          p(cs | en)
cat   kočka       0.5629
cat   kočku       0.1769
cat   kočce       0.0002
cat   kočky       0.00004
cat   kocour      0.112
cat   kocoura     0.074
cat   kocourovi   0.017
cat   kocoura     0.0051
Phrase Scoring

After lemmatizing and factoring out case, the probability mass concentrates:

en    lemma    p(lemma | en)
cat   kočka    0.7805
cat   kocour   0.2194

en    case   p(case | en)
cat   NOM    0.7117
cat   ACC    0.2646
cat   DAT    0.018
cat   GEN    0.0054
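A minimal sketch of why lemmatization concentrates the phrase scores: relative-frequency estimation splits counts over every surface form unless the forms are first mapped to lemmas. The counts and the lemma table below are invented stand-ins for illustration only.

```python
# Hypothetical sketch (not from the slides): relative-frequency phrase
# scoring before and after lemmatizing the Czech side.
from collections import Counter

# Aligned (en, cs) word pairs extracted from a toy parallel corpus.
pairs = [("cat", "kočka")] * 56 + [("cat", "kočku")] * 18 + \
        [("cat", "kočce")] * 2 + [("cat", "kocour")] * 11 + \
        [("cat", "kocoura")] * 8 + [("cat", "kocourovi")] * 5

# A toy lemma table standing in for a real Czech morphological analyzer.
lemma = {"kočka": "kočka", "kočku": "kočka", "kočce": "kočka",
         "kocour": "kocour", "kocoura": "kocour", "kocourovi": "kocour"}

def p_cs_given_en(pairs):
    """Estimate p(cs | en) by relative frequency."""
    joint = Counter(pairs)
    marginal = Counter(en for en, _ in pairs)
    return {(en, cs): c / marginal[en] for (en, cs), c in joint.items()}

print("surface:", p_cs_given_en(pairs))
print("lemmatized:", p_cs_given_en([(en, lemma[cs]) for en, cs in pairs]))
# Surface forms split the mass over six entries; lemmatization
# concentrates it on the two lemmas, as in the second table above.
```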
Input OOVs
• La mejor aplicación sería la que erradicase el hambre del mundo.
  (Spanish: "The best application would be the one that eradicated world hunger.")

f             e             p(e|f)
la            the           0.9173
mejor         best          0.6330
aplicación    application   0.8211
ser           to be         0.1182
sería         would be      0.3442
la que        to            0.0596
erradica      eradicates    0.9754
erradicó      eradicated    0.9303
erradican     eradicate     0.9481
erradico      erradicate    0.8731
erradicando   eradicating   0.9713
el hambre     hunger        0.5385
del mundo     world         0.2006

"erradicase" (an imperfect-subjunctive form) appears nowhere in the table: OOV!

The best application would be to erradicase world hunger. 😒
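A minimal sketch, under the assumption of simple word-by-word lookup, of why the OOV form ends up verbatim in the output: a token with no phrase-table entry can only be copied through.

```python
# Hypothetical sketch (not the slides' system): detecting input OOVs
# against a phrase table and passing them through untranslated.
phrase_table = {"la": "the", "mejor": "best", "aplicación": "application",
                "sería": "would be", "el hambre": "hunger",
                "del mundo": "world"}

def translate_word_by_word(tokens):
    out = []
    for tok in tokens:
        # An OOV token has no entry, so the decoder can only copy it.
        out.append(phrase_table.get(tok, tok))
    return " ".join(out)

print(translate_word_by_word(
    ["la", "mejor", "aplicación", "sería", "erradicase", "el hambre"]))
# -> "the best application would be erradicase hunger"
```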
Novel Form Generation
She had attempted to cross the road on her bike.
Она пыталась пересечь пути на её велосипед.
(The output roughly reads "She tried to cross the tracks on her bike", with a wrong noun and "велосипед" in the wrong case: the required prepositional form "велосипеде" is a novel form the system never learned to generate.)
Language Modeling
• Je porte un parapluie dans mon sac . ("I carry an umbrella in my bag.")
• À Seattle , on doit porter un parapluie tous les jours . ("In Seattle, one must carry an umbrella every day.")
The language model must assign good scores to both "porte" and "porter", two inflections of the same verb.
Evaluation

Input: The earnings on its 10 - year bonds are 28.45 % .

                                                                 BLEU+1   METEOR
Reference      Výnos na jejích 10 - letých dluhopisech je na 28,45 % .    100.00   100.00
System 1       Příjmy na své desetileté dluhopisy jsou 28,45 % .           22.61    18.6
System 2       Příjmy na jeho 10 - letých poutech jsou 28,45 % .           32.04    26.0
Another Human  Zisk z jejích 10 - letých dluhopisů je 28,45 % .            32.04    36.7

System 2 mistranslates "bonds" as "poutech" (shackles) yet ties the correct human paraphrase under BLEU; METEOR's stem and synonym matching separates them.

Banerjee and Lavie (2005), Denkowski and Lavie (2014)
Overview
• Morphology on the source side
  • Stemming
  • Lattices
• Morphology on the target side
  • Source enrichment
  • Factored models
  • Synthetic phrases
  • Other formalisms?
• Morphology in Neural MT
Stemming
• La mejor aplicación sería la que erradicase el hambre del mundo.
• With unprocessed surface forms, "erradicase" is OOV (the same phrase table as above), and the output repeats the failure:

The best application would be to erradicase world hunger. 😒
Stemming
• La mejor aplic ser la que erradic el hambr del mundo.

f           e             p(e|f)
la          the           0.9173
mejor       best          0.6330
aplic       application   0.8211
ser         to be         0.0807
ser         would be      0.0338
la que      to            0.0596
erradic     eradicates    0.0633
erradic     eradicated    0.2173
erradic     eradicate     0.3880
erradic     eradicating   0.1503
el hambr    hunger        0.5385
del mundo   world         0.2006

"erradic" is not OOV, but stemming also erased tense and mood:
The best application to be to eradicate world hunger. 😐
Stemming
• La mejor aplic ación ser ía la que erradic ase el hambr e del mundo.

f              e             p(e|f)
la             the           0.9173
mejor          best          0.6330
aplic ación    application   0.8211
ser            to be         0.1182
ser ía         would be      0.3442
la que         to            0.0596
erradic a      eradicates    0.9754
erradic ó      eradicated    0.9303
erradic an     eradicate     0.9481
erradic o      erradicate    0.8731
erradic ando   eradicating   0.9713
el hambr e     hunger        0.5385
del mundo      world         0.2006

f         e             p(e|f)
erradic   erradicate    0.2571
erradic   erradicating  0.1253
ase       have been     0.1334

The best application would be to have been eradicating world hunger. 😐
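A naive sketch of the two preprocessing options above, stripping suffixes ("erradicase" → "erradic") versus splitting them into separate tokens ("erradic ase"). The suffix list is a toy assumption, not a real stemmer such as Snowball, so its output differs slightly from the slides' segmentation.

```python
# Toy rule-based Spanish suffix handling, for illustration only.
SUFFIXES = ["ase", "ando", "ción", "ía", "an", "ó", "a", "o", "e"]

def stem(token):
    """Drop a known suffix entirely (the stemming option)."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suf) and len(token) > len(suf) + 2:
            return token[: -len(suf)]
    return token

def segment(token):
    """Split a known suffix off as its own token (the segmentation option)."""
    for suf in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suf) and len(token) > len(suf) + 2:
            return [token[: -len(suf)], suf]
    return [token]

sent = "la mejor aplicación sería la que erradicase el hambre del mundo".split()
print(" ".join(stem(t) for t in sent))
print(" ".join(" ".join(segment(t)) for t in sent))
```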
Input Lattices

[Lattice figure: each input position offers both the surface form and its segmentation, e.g. "sería" vs. "ser/2 ía", "aplicación" vs. "aplic/1 ación", "erradicase" vs. "erradic/2 ase", "hambre" vs. "hambr/2 e"; the decoder picks whichever path the phrase table covers.]

• The best application would be to eradicate world hunger. 😀

Dyer et al. (2008)
Input Lattices
"a competition-induced price fall"
Dyer et al. (2008)
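A minimal sketch of building such a lattice, assuming a lattice is just a set of (from_node, to_node, token) edges; the actual PLF format consumed by decoders like Moses or cdec is not reproduced here.

```python
# Build a word lattice offering both surface and segmented readings.
def word_lattice(tokens, segmenter):
    edges, node = [], 0
    for tok in tokens:
        pieces = segmenter(tok)
        end = node + len(pieces)
        if len(pieces) > 1:
            edges.append((node, end, tok))   # surface form spans all pieces
        cur = node
        for piece in pieces:                 # one edge per morpheme
            edges.append((cur, cur + 1, piece))
            cur += 1
        node = end
    return edges

# Toy segmenter: split only the "-ase" suffix.
seg = lambda t: [t[:-3], t[-3:]] if t.endswith("ase") else [t]
for edge in word_lattice(["que", "erradicase", "el"], seg):
    print(edge)
# "erradicase" yields both a single edge and the path "erradic" -> "ase",
# so the decoder can pick whichever reading the phrase table covers.
```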
Takeaways
• If you have source-side morphology:
  • Stem, lowercase, and compound-split your data when doing alignment
  • Extract phrases normally
  • Use input lattices during tuning and decoding
Target Side Morphology
I want to eat a sandwich .
Chci jíst sendvič .
Source Enrichment
I want-1s eat-inf sandwich-acc .
Chci jíst sendvič .
Avramidis and Koehn (2008)
Factored Models

German input, analyzed into factors:

Surface   Lemma    POS   Morph.
neue      neu      JJ    +pl +fem
häuser    häus     NN    +pl
werden    werden   VB    +3 +pl +pres
gebaut    bauen    VBN   +past +part

Koehn and Hoang (2007)
Factored Models

Each German factor vector (Surface, Lemma, POS, Morph.) is mapped to an English factor vector with the same structure.

Koehn and Hoang (2007)
Factored Models

Lemma translation table:

DE     EN         p(EN | DE)
häus   house      0.76
häus   home       0.15
häus   building   0.06
häus   shell      0.02

Koehn and Hoang (2007)
Factored Models

Translating the lemma factors: neu → new, häus → house, werden → be, bauen → build.

Koehn and Hoang (2007)
Factored Models

Morphology translation table:

DE               EN               p(EN | DE)
VB+3p+pl+pres    VB+3p+pl+pres    0.81
VB+3p+pl+pres    VB+3p+sg+pres    0.10
VB+3p+pl+pres    PRN+3p+pl        0.04
VB+3p+pl+pres    NN+pl            0.03

Koehn and Hoang (2007)
Factored Models

English factors after lemma and morphology translation:

Lemma   POS   Morph.
new     JJ
house   NN    +pl
be      VB    +3 +pl +pres
build   VBN   +past +part

Koehn and Hoang (2007)
Factored Models

Surface forms generated from (Lemma, POS, Morph.):

Surface   Lemma   POS   Morph.
new       new     JJ
houses    house   NN    +pl
are       be      VB    +3 +pl +pres
built     build   VBN   +past +part

→ "new houses are built"

Koehn and Hoang (2007)
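A compressed sketch of the factored translate-then-generate idea from Koehn and Hoang (2007). The tables are tiny invented stand-ins for the real lemma-translation, morphology-translation, and generation models.

```python
# Factored translation: translate lemma and morphology separately,
# then generate the English surface form from the translated factors.
lemma_table = {"neu": "new", "häus": "house",
               "werden": "be", "bauen": "build"}
morph_table = {"+pl+fem": "", "+pl": "+pl",
               "+3+pl+pres": "+3+pl+pres", "+past+part": "+past+part"}
# Generation step: map (English lemma, morphology) back to a surface form.
generate = {("new", ""): "new", ("house", "+pl"): "houses",
            ("be", "+3+pl+pres"): "are", ("build", "+past+part"): "built"}

source = [("neu", "+pl+fem"), ("häus", "+pl"),
          ("werden", "+3+pl+pres"), ("bauen", "+past+part")]

target = []
for lemma, morph in source:
    en_lemma = lemma_table[lemma]           # translate the lemma factor
    en_morph = morph_table[morph]           # translate the morphology factor
    target.append(generate[(en_lemma, en_morph)])  # generate surface form
print(" ".join(target))  # -> "new houses are built"
```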
Factored Models

Pros                          Cons
Much more human-like          Huge search space
Can generate novel forms      Word forms generated independently
Factored language models      Changes whole MT pipeline

Koehn and Hoang (2007)
POS Language Models
• Convert corpus to POS tags instead of surface forms
• Build large (7-8 order) n-gram models

The president announced his new plan yesterday .
→ DT NN VBD PRP$ JJ NN ADV PUNC
The council approved the sanctions on Iran .
→ DT NN VBD DT NN IN NNP PUNC
Polls in the UK show the LDP up 2 % over last year .
→ NNS IN DT NNP VB DT NNP NUM PUNC IN JJ NN PUNC
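A hedged sketch of the pipeline: replace each word with its POS tag, then count high-order n-grams. The `tagger` dictionary is a toy assumption; in practice a real POS tagger is run over the corpus first.

```python
# Count 7-grams over POS tags; tag vocabularies are tiny, so high
# orders stay tractable where word 7-grams would be hopelessly sparse.
from collections import Counter

tagger = {"the": "DT", "president": "NN", "announced": "VBD",
          "his": "PRP$", "new": "JJ", "plan": "NN",
          "yesterday": "ADV", ".": "PUNC"}

corpus = ["the president announced his new plan yesterday ."]

def ngrams(seq, n):
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

order = 7
counts = Counter()
for line in corpus:
    tags = [tagger[w] for w in line.lower().split()]
    counts.update(ngrams(tags, order))
print(counts.most_common(2))
```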
Brown Cluster LMs
• Automatically induce word classes
• Can capture more nuances of the language
• Example:

The president announced his new plan yesterday .
→ 10010 110110010 0100111110010 10011111 1010011100 110111111010 01011011111 000000
The council approved the sanctions on Iran .
→ 10010 1101101011110 0100111100111 10010 110100111111 001110110 111100011 000000
Polls in the UK show the LDP up 2 % over last year .
→ 110001110 0011100 10010 11111101011 01000001111 10010 1111110100 010110000 111101011 1111101110 00111011111110 10111010 1111100101 000000
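A sketch under one assumption: a word-to-cluster bit-string map already exists (e.g. produced by a Brown clustering tool); the mapping below is copied from the first example sentence above.

```python
# Map words onto Brown-cluster IDs; an n-gram LM is then trained on
# the cluster sequences instead of the words themselves.
clusters = {"the": "10010", "president": "110110010",
            "announced": "0100111110010", "his": "10011111",
            "new": "1010011100", "plan": "110111111010",
            "yesterday": "01011011111", ".": "000000"}

def to_cluster_ids(sentence, unk="UNK"):
    return [clusters.get(w, unk) for w in sentence.lower().split()]

print(to_cluster_ids("The president announced his new plan yesterday ."))
# Unseen words that share a cluster with seen words still get useful
# LM statistics, which is the point of class-based LMs.
```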
Synthetic Phrases
• Dynamically add phrases to the translation table
• Can condition on source sentence, phrase table, and more!
• Originally used to insert determiners in RU→EN translation

Tsvetkov et al. (2013)
Synthetic Phrases

она пыталась пересечь пути на ее велосипед
she had attempted to cross the road on her bike

[Figure: the Russian verb is analyzed into stem and inflection (σ:пытаться_V, μ:mis2sfm2e); source-side features such as Brown clusters (C50, C473, ...), POS tags (PRP VBD VBN TO VB ...), and dependency arcs (nsubj, aux, xcomp) are used to predict the inflected form.]

Chahuneau et al. (2013)
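A rough sketch of the synthetic-phrase idea from Chahuneau et al. (2013): predict an inflection tag for a target stem from source-side features, generate the inflected form, and add it to the phrase table on the fly. `predict_tag` and `inflect` are hypothetical stand-ins for the paper's discriminative inflection model and morphological generator.

```python
# Add a synthetic inflected phrase pair to the phrase table.
def predict_tag(src_word, src_features):
    # Stand-in: a real model scores tags from source syntax/clusters.
    return "mis2sfm2e"

def inflect(stem, tag):
    # Stand-in for a finite-state morphological generator.
    return {"пытаться": {"mis2sfm2e": "пыталась"}}[stem][tag]

def add_synthetic_phrase(phrase_table, src_word, stem, src_features):
    tag = predict_tag(src_word, src_features)
    form = inflect(stem, tag)
    # Mark the pair so the decoder can learn a separate weight for it.
    phrase_table.setdefault(src_word, []).append((form, {"synthetic": 1.0}))

table = {}
add_synthetic_phrase(table, "attempted", "пытаться", src_features={})
print(table)  # {'attempted': [('пыталась', {'synthetic': 1.0})]}
```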
Synthetic Phrases
• Generate compound words in the target language
• Character-level MT system

EN           DE                    Score
tomato       t o m a t e           -2.58
processing   v e r a r b e i t     -0.75
processing   b e h a n d l u n g   -2.74
processing   v e r e d e l u n g   -4.94

EN      DE     Score
<suf>   n      -3.71
<suf>   s      -2.53
<end>   ung    -5.73
<end>   ende   -9.86

[Figure: a finite-state machine composes "tomato" + <suf> + "processing" + <end>, yielding the compound "tomatenverarbeitung".]

Matthews et al. (2015)
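An illustrative sketch only, in the spirit of Matthews et al. (2015): score candidate German compounds as the sum of log-probability segment scores from the tables above. The character-level translation model itself is not reproduced.

```python
# Score a compound as the sum of its segment scores.
segment_scores = {("tomato", "tomate"): -2.58,
                  ("processing", "verarbeit"): -0.75,
                  ("processing", "behandlung"): -2.74,
                  ("<suf>", "n"): -3.71, ("<suf>", "s"): -2.53,
                  ("<end>", "ung"): -5.73}

def score_compound(alignment):
    """alignment: list of (EN piece, DE piece) pairs forming one compound."""
    total = sum(segment_scores[pair] for pair in alignment)
    surface = "".join(de for _, de in alignment)
    return surface, total

print(score_compound([("tomato", "tomate"), ("<suf>", "n"),
                      ("processing", "verarbeit"), ("<end>", "ung")]))
# -> ('tomatenverarbeitung', -12.77)
```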
More Ideas
• Use information to synthetically add/modify feature values
• Add synthetic phrases for discourse-level information
• Add synthetic grammar rules in addition to phrase pairs
• Many, many more!
Takeaways
• Use Brown cluster LMs (c=600, o=7)
• (whether you have target morphology or not!)
• Synthetic phrases can solve a wide range of target-side generation problems
• Check out morphogen
Morphology in Neural MT
• Morpheme-level models
• Character-level models
• Hybrid models
Standard Attentional Models

[Figure: the input sentence is encoded as a matrix of word vectors (x1 x2 x3 x4); at each step the decoder's output state yields an attention vector over the input columns, which is used to compute a context vector, from which the next output word (y1) is predicted.]

But why do we have to just use independent word vectors?
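A minimal numpy sketch of the attention computation in the figure: dot-product scores, a softmax attention vector, and a weighted context vector. The dimensions and the scoring function are simplified assumptions.

```python
import numpy as np

d, n = 4, 4                        # hidden size, source length
X = np.random.randn(n, d)          # input sentence matrix (x1..x4 as rows)
s = np.random.randn(d)             # current decoder output state

scores = X @ s                                   # one score per source word
alpha = np.exp(scores) / np.exp(scores).sum()    # attention vector (softmax)
context = alpha @ X                              # weighted sum of inputs

print("attention:", np.round(alpha, 3))
print("context shape:", context.shape)           # fed into predicting y1
```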
Morpheme-Level Models
• Typically we just look up a word vector for each word from a big table
• But there's no reason we can't do something smarter
Morpheme-Level Models

çin'in tutumu belli değil
(Turkish: "China's stance is not clear")

But we don't want the word vectors of these to all be independent:
çin çin'i çin'e çin'in çin'deki çin'de çin'den

With a morphological analysis:
çin+SG+GEN tutum+SG+NOM belli değil+3+SG+PRES

Matthews et al. (2016)
Morpheme-Level Models

[Figure: the attentional decoder predicts the lemma "gel" together with the tags +PAST +3SG, and these are combined to generate the surface form "geldi".]
Morpheme-Level Models
• Combine vectors for morphemes to get word vectors
• Can be used on the input or output sides
• But what do we do if we don't have a morphological analyzer?
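A sketch, assuming an external analyzer has already split the word: the word vector is the sum of its morpheme vectors, so "çin'in" (çin+SG+GEN) shares its "çin" component with "çin'de", "çin'den", and so on.

```python
import numpy as np

rng = np.random.default_rng(0)
morpheme_vecs = {m: rng.standard_normal(8)
                 for m in ["çin", "tutum", "+SG", "+GEN", "+NOM"]}

def word_vector(analysis):
    """analysis: list of morphemes, e.g. ['çin', '+SG', '+GEN']."""
    return sum(morpheme_vecs[m] for m in analysis)

v1 = word_vector(["çin", "+SG", "+GEN"])   # çin'in
v2 = word_vector(["çin", "+SG", "+NOM"])   # çin
cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(f"cosine(çin'in, çin) = {cos:.2f}")  # related forms end up close
```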
Character-Level Models

ç i n ' i n

• We can use the same trick to make character-level models
• Pros: Can elegantly handle passthroughs, OOVs, morphology
• Cons: Harder/slower to train

Ling et al. (2015)
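A bare-bones numpy RNN over characters, as an assumed stand-in for the character LSTMs of Ling et al. (2015): the final hidden state becomes the word's vector, so no word is ever out of vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
chars = sorted(set("çin'"))
emb = {c: rng.standard_normal(d) for c in chars}     # char embeddings
W = rng.standard_normal((d, d)) * 0.1                # recurrent weights
U = rng.standard_normal((d, d)) * 0.1                # input weights

def char_word_vector(word):
    h = np.zeros(d)
    for c in word:                # read the word one character at a time
        h = np.tanh(W @ h + U @ emb[c])
    return h                      # final state = word representation

print(np.round(char_word_vector("çin'in"), 2))
```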
Hybrid Models

çin'in
çin+SG+GEN
ç i n ' i n

• Combine word-, morpheme-, and character-level models
• Trains quickly because of the word- and morpheme-level models
• General because of the character-level model

Matthews et al. (2016)
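A sketch of the hybrid fallback idea (an assumed simplification of Matthews et al. 2016): use the word vector when the word is known, the morpheme composition when an analysis exists, and the character model otherwise. All the component models below are toy stand-ins so the sketch runs end to end.

```python
def hybrid_vector(word, word_vecs, analyze, morph_compose, char_compose):
    """Prefer word lookup, then morphemes, then characters."""
    if word in word_vecs:              # fast path: direct table lookup
        return word_vecs[word]
    analysis = analyze(word)
    if analysis:                       # morpheme-level composition
        return morph_compose(analysis)
    return char_compose(word)          # fully general character fallback

# Toy stand-ins for the three component models:
word_vecs = {"çin'in": [1.0, 0.0]}
analyze = lambda w: ["çin", "+GEN"] if w == "çin'e" else None
morph_compose = lambda ms: [0.5 * len(ms), 0.0]
char_compose = lambda w: [0.0, 0.1 * len(w)]

for w in ["çin'in", "çin'e", "zxqv"]:
    print(w, hybrid_vector(w, word_vecs, analyze, morph_compose, char_compose))
```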
Takeaways
• Word vectors don't have to be naïve table lookups
• You can (and should!) leverage any language-specific tools you have access to
• Character-level models add generality
• Especially useful in combination with higher-level models to make learning tractable