Effects of Empty Categories
on Machine Translation
Tagyoung Chung
Daniel Gildea
Department of Computer Science
University of Rochester
1
One-sentence summary
• Incorporating some empty categories in the source language
may improve machine translation
2
Empty categories
3
What are they?
• An element in a parse tree that does not have a corresponding
surface word
anaphoric   pronominal   empty category   overt noun type
-           -            Wh-trace         R-expression
-           +            *pro*            pronoun
+           -            NP-trace         anaphor
+           +            *PRO*            none
[Wikipedia]
4
An example
• Wh-trace (A’-movement)
(SBARQ (WHNP-1 what)
       (SQ are
           (NP-SBJ you)
           (VP thinking
               (PP-CLR about
                       (NP *T*-1)))))
5
*pro* and *PRO*
• Both *pro* and *PRO* are very common in CTB
– *pro*: marks dropped pronouns
– *PRO*: marks control structures
(IP (NP (-NONE- *pro*))
    (VP (VV 要求)
        (NP (NR 米洛舍维奇) (NN 总统))
        (IP (NP (-NONE- *PRO*))
            (VP (VV 下台))))
    (PU 。))
English: They asked president Milošević to step down
6
Phrase-based machine translation
7
Word alignment
• Words in sentence pairs are aligned (usually using GIZA++)
[Figure: word alignment between the sentence pair]
要求 米洛舍维奇 总统 下台
They asked president Milošević to step down
8
Phrase extraction
• Phrase pairs are extracted and scored
[Figure: phrase pairs extracted from the aligned sentence pair]
要求 米洛舍维奇 总统 下台
They asked president Milošević to step down
9
Translate
• Given a new Chinese sentence, translate using the learned phrase
pairs and a language model
• Problems:
– Phrase extraction has difficulties with unaligned words:
"to" can be attached to the preceding or the following phrase
– One Chinese phrase is aligned to multiple English phrases:
下台 can be aligned to "step down", "stepped down", "is stepping down",
"to step down", etc.
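The unaligned-word problem can be illustrated with a minimal sketch, assuming the standard consistency criterion for phrase extraction. The function `extract_phrases`, the `max_len` limit, and the alignment links are illustrative assumptions; they are not taken from the slides or from any toolkit.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Return the set of phrase pairs consistent with the word alignment."""
    aligned_tgt = {j for _, j in alignment}
    pairs = set()
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            # Target positions linked to the source span [s1, s2].
            linked = [j for i, j in alignment if s1 <= i <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            # Consistency: no target word in [t1, t2] links outside the span.
            if any(t1 <= j <= t2 and not s1 <= i <= s2 for i, j in alignment):
                continue
            # Unaligned target words at either edge may join the phrase or not.
            lo = t1
            while True:
                hi = t2
                while True:
                    if hi - lo < max_len:
                        pairs.add((" ".join(src[s1:s2 + 1]),
                                   " ".join(tgt[lo:hi + 1])))
                    hi += 1
                    if hi >= len(tgt) or hi in aligned_tgt:
                        break
                lo -= 1
                if lo < 0 or lo in aligned_tgt:
                    break
    return pairs


src = ["要求", "米洛舍维奇", "总统", "下台"]
tgt = ["They", "asked", "president", "Milošević", "to", "step", "down"]
# Assumed links as (source index, target index); "They" and "to" are unaligned.
alignment = [(0, 1), (1, 3), (2, 2), (3, 5), (3, 6)]
pairs = extract_phrases(src, tgt, alignment)
```

With these assumed links, the unaligned "to" ends up both in ("米洛舍维奇 总统", "president Milošević to") and in ("下台", "to step down"), which is the ambiguity named in the first problem.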
10
Initial experiments
11
Experiment design
1. Predict where empty categories should be
   before:  要求 米洛舍维奇 总统 下台
   after:   *pro* 要求 米洛舍维奇 总统 *PRO* 下台
2. Align with parallel sentence
   *pro* 要求 米洛舍维奇 总统 *PRO* 下台
   They asked president Milošević to step down
3. Extract phrases and score them
   *pro*              They
   要求               asked
   米洛舍维奇 总统    president Milošević
   *PRO* 下台         to step down
4. Given a new sentence, translate using learned phrase pairs
12
Experiment
• Trained phrase-based MT systems by adding different empty
categories to the training data using the gold-standard treebank
• Adding *PRO* and *pro* brought improvement
• Why does this work?
– Adding *pro* facilitated generating English pronouns that are
dropped in Chinese
– Adding *PRO* helped get the verb form right (gerund,
to-infinitive, and nominalization)
– Works when translation is more literal
13
Analyses
• Learned word alignment
word    P(e | *pro*)      word    P(e | *PRO*)
the     0.18              to      0.45
i       0.13              NULL    0.10
it      0.08              the     0.02
to      0.08              of      0.02
they    0.05              as      0.02
14
Analyses
• Learned phrase pairs
Chinese source             *PRO* 贯彻
English reference          implementing
System trained w/ nulls    implementation
System trained w/o nulls   implemented

Chinese source             *PRO* 逐步 形成
English reference          have gradually formed
System trained w/ nulls    to gradually form
System trained w/o nulls   gradually formed

Chinese source             *PRO* 吸引 外资 作为
English reference          attracting foreign investment
System trained w/ nulls    attracting foreign investment
System trained w/o nulls   attract foreign capital
15
Recovering empty nodes
16
Previous work
• Johnson (2002) successfully uses pattern matching for
recovering empty nodes and antecedents
• Gabbard et al. (2006) use various machine learning algorithms
to recover different empty categories in several steps
• Differences in this work
– language-specific issues
– recovers only *pro* and *PRO*, not their antecedents
17
Pattern matching
• Johnson (2002) used minimally connected tree fragments
containing an empty node as patterns
(IP (NP (-NONE- *pro*))
    (VP (VV 要求)
        (NP (NR 米洛舍维奇) (NN 总统))
        (IP (NP (-NONE- *PRO*))
            (VP (VV 下台))))
    (PU 。))
18
Pattern matching
• Parse with a normal grammar
• If (IP VP PU) is encountered in a parse tree, (NP (-NONE- *pro*))
can be added back in
• Problem: Minimally connected patterns for *pro* and *PRO*
largely overlap in Chinese
• Better: More context can avoid the overlap
• In the example, instead of using (IP (NP (-NONE- *PRO*)) VP),
one can use (VP (VV NP (IP (NP (-NONE- *PRO*)) VP)))
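The two patterns on this slide can be sketched as follows. This is a minimal illustration, not Johnson's (2002) algorithm: the nested-list tree encoding and the `restore` function are assumptions, and only the two patterns quoted above are implemented.

```python
def empty_np(kind):
    """Build an empty subject node such as (NP (-NONE- *pro*))."""
    return ["NP", ["-NONE-", kind]]


def restore(node, parent_labels=()):
    """Re-insert empty subjects into a tree parsed without empty nodes.

    Trees are nested lists: [label, child, ...]; preterminals are [tag, word].
    """
    if isinstance(node[1], str):  # preterminal, nothing to do
        return node
    own = [child[0] for child in node[1:]]
    children = [restore(child, tuple(own)) for child in node[1:]]
    if node[0] == "IP" and own == ["VP", "PU"]:
        # Pattern (IP VP PU): add a dropped pronoun back in.
        children.insert(0, empty_np("*pro*"))
    elif node[0] == "IP" and own == ["VP"] and parent_labels == ("VV", "NP", "IP"):
        # Pattern with parent context (VP VV NP (IP VP)): add *PRO*.
        children.insert(0, empty_np("*PRO*"))
    return [node[0]] + children


# The example sentence as an ordinary parser would output it (no empty nodes).
parsed = ["IP",
          ["VP",
           ["VV", "要求"],
           ["NP", ["NR", "米洛舍维奇"], ["NN", "总统"]],
           ["IP", ["VP", ["VV", "下台"]]]],
          ["PU", "。"]]
restored = restore(parsed)
```

Checking the parent's children before choosing *PRO* is what the extra context buys: a bare subject-less IP alone would not say which empty category to insert.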
19
Conditional random field
• Examine every word boundary and decide whether to insert
*pro* or *PRO*
• Features include local window of words, POS tags, and parents
of POS tags
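A minimal sketch of the per-boundary features described above. The feature names, the window size, and the `boundary_features` helper are assumptions for illustration; the slides do not give the exact feature templates, and no particular CRF toolkit is implied.

```python
def boundary_features(words, tags, parents, i, window=2):
    """Features for the boundary just before token i.

    i == len(words) is the boundary at the end of the sentence.
    parents[k] is the label of the node directly above the POS tag of token k.
    """
    feats = {}
    for offset in range(-window, window):
        k = i + offset
        inside = 0 <= k < len(words)
        feats[f"w[{offset}]"] = words[k] if inside else "<pad>"
        feats[f"t[{offset}]"] = tags[k] if inside else "<pad>"
        feats[f"p[{offset}]"] = parents[k] if inside else "<pad>"
    return feats


words = ["要求", "米洛舍维奇", "总统", "下台"]
tags = ["VV", "NR", "NN", "VV"]
parents = ["VP", "NP", "NP", "VP"]
# One label per boundary: what, if anything, to insert before token i.
labels = ["*pro*", "NONE", "NONE", "*PRO*", "NONE"]
X = [boundary_features(words, tags, parents, i) for i in range(len(words) + 1)]
```

Each dictionary in `X` pairs with one entry of `labels`, giving the sequence-labeling view in which a CRF decides per boundary between *pro*, *PRO*, and no insertion.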
20
Parsing
• Modify trees in treebank so that empty categories are
recoverable from nonterminals
Original tree:
(IP (NP-SBJ (-NONE- *pro*))
    (VP (VV 谢谢)
        (NP-OBJ (PN 各位)))
    (PU 。))
Modified tree:
(SPRO0IP (VP (VV 谢谢)
             (NP-OBJ (PN 各位)))
         (PU 。))
21
Parsing
• Extract grammar from new trees
– SPRO0IP → VP PU, . . .
– The resulting grammar is too coarse
• Use a state-splitting grammar trainer (Petrov et al., 2006) to
automatically learn a refined grammar (latent annotation)
1. Split a symbol into two subcategories using EM
2. Merge it back if loss in likelihood for merging is small
3. Additive smoothing
4. Repeat
• Parse with the refined grammar and recover empty categories
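The tree modification and the recovery step can be sketched as below. The slides show only the single label SPRO0IP and do not spell out the naming scheme, so this sketch assumes a type prefix followed by the child index and the original label; the `BPRO` prefix for *PRO* is hypothetical, as are the function names.

```python
import re

# Assumed label prefixes; only "SPRO" appears on the slide.
PREFIX = {"*pro*": "SPRO", "*PRO*": "BPRO"}
TOKEN = {prefix: token for token, prefix in PREFIX.items()}
MARK = re.compile(r"(SPRO|BPRO)(\d+)")


def is_empty(node):
    """True for a node such as ["NP-SBJ", ["-NONE-", "*pro*"]]."""
    return len(node) == 2 and isinstance(node[1], list) and node[1][0] == "-NONE-"


def encode(tree):
    """Drop empty-category children and record them in the parent's label."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):  # preterminal: [tag, word]
        return tree
    marks, kept = "", []
    for index, child in enumerate(children):
        if is_empty(child):
            marks += f"{PREFIX[child[1][1]]}{index}"
        else:
            kept.append(encode(child))
    return [marks + label] + kept


def terminals(tree):
    """Read off the words, re-inserting empty categories encoded in labels."""
    label, children = tree[0], tree[1:]
    if isinstance(children[0], str):
        return [children[0]]
    parts = [terminals(child) for child in children]
    for kind, index in MARK.findall(label):
        parts.insert(int(index), [TOKEN[kind]])
    return [word for part in parts for word in part]


tree = ["IP",
        ["NP-SBJ", ["-NONE-", "*pro*"]],
        ["VP", ["VV", "谢谢"], ["NP-OBJ", ["PN", "各位"]]],
        ["PU", "。"]]
encoded = encode(tree)
recovered = terminals(encoded)
```

A grammar read off `encoded` trees contains rules like SPRO0IP → VP PU, and `terminals` shows how a parse using such labels yields the sentence with *pro* restored.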
22
Results
• 90% of CTB 6.0 is used for training and the rest for testing
• The CRF has higher precision than the other methods
• The parsing method requires the least intervention but performed
almost as well as the other methods
• Results are mediocre compared to work on English
           *PRO*                    *pro*
           Prec.   Rec.   F1        Prec.   Rec.   F1
Pattern    0.65    0.61   0.63      0.41    0.23   0.29
CRF        0.68    0.46   0.55      0.58    0.35   0.44
Parsing    0.60    0.53   0.56      0.46    0.39   0.42
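For reference, F1 is the harmonic mean of precision and recall. The sketch below assumes predictions are scored as exact matches on (sentence, position, type) triples, which the slides do not state; the toy `predicted` and `gold` sets are made up for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(predicted, gold):
    """Precision, recall, and F1 over sets of predicted and gold items."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# Made-up items: (sentence id, token position, empty category type).
predicted = {(0, 0, "*pro*"), (0, 3, "*PRO*"), (1, 2, "*PRO*")}
gold = {(0, 0, "*pro*"), (0, 3, "*PRO*"), (1, 1, "*pro*"), (1, 4, "*PRO*")}
p, r, f = prf(predicted, gold)

# Recomputing F1 from the rounded precision and recall in the table.
crf_big_pro = round(f1(0.68, 0.46), 2)
crf_small_pro = round(f1(0.58, 0.35), 2)
```

Recomputing from the rounded table entries gives 0.55 and 0.44 for the CRF row, matching the reported F1 values.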
23
Training MT systems
24
Experiments
• A subset of the FBIS corpus is used for training the MT systems (2M
words, 60K sentences)
• Dev and test sets are from the NIST 2002 MT evaluation
• Moses (Koehn et al., 2007) and MERT (Och and Ney, 2003) are
used for all systems
25
Results
• Empty node prediction has mediocre accuracy but brought a
statistically significant improvement
           BLEU    BP      *PRO* (F1)   *pro* (F1)
Baseline   23.73   1.000
Pattern    23.99   0.998   0.62         0.31
CRF        24.69   1.000   0.55         0.44
Parsing    23.99   1.000   0.56         0.42
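The BP column is BLEU's brevity penalty. A minimal sketch of the standard definition follows, assuming a single effective reference length; how that length is chosen when there are several references depends on the scoring script and is not stated on the slides. The token counts are made up.

```python
import math


def brevity_penalty(candidate_len, reference_len):
    """BLEU brevity penalty: 1 if the output is longer than the reference,
    otherwise exp(1 - r / c)."""
    if candidate_len == 0:
        return 0.0
    if candidate_len > reference_len:
        return 1.0
    return math.exp(1 - reference_len / candidate_len)


# Made-up corpus lengths: output about 0.2% shorter than the reference.
bp_short = round(brevity_penalty(998, 1000), 3)
bp_long = brevity_penalty(1050, 1000)
```

Under this definition a BP of 0.998 corresponds to output that is only marginally shorter than the reference, so output length does not explain the BLEU differences in the table.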
26
Sample translations
source      中国 计划 *PRO* 投资 在 基础 设施 上
reference   china plans to invest in the infrastructure
w/ nulls    china plans to invest in infrastructure
w/o nulls   china ’s investment in infrastructure

source      有利 *PRO* 巩固 香港 的 贸易 和 航运 中心
reference   good for consolidating the trade and shipping center of hong kong
w/ nulls    favorable to the consolidation of the trade and shipping center in hong kong
w/o nulls   hong kong will consolidate the trade and shipping center

source      *pro* 现在 还 不 清楚
reference   it is not clear yet
w/ nulls    it is still not clear
w/o nulls   is still not clear
27
Conclusion
• Adding some empty categories, even when automatic prediction
is far from perfect, may help build better machine translation
systems
• Finding a good strategy for accurate prediction of empty nodes
remains a challenge
28
Possible future work and difficulties
• Better empty category prediction
– Tweaking existing models: did not yield better results
– Combining different models: conflicts between models
– Further research on recent publications needed
• Anaphora resolution
– A reliable Chinese pronoun resolver is not available
– Ways to translate blocks of sentences are needed
29
Questions?
30