Effects of Empty Categories on Machine Translation
Tagyoung Chung, Daniel Gildea
Department of Computer Science, University of Rochester

One-sentence summary
• Incorporating some empty categories in the source language may improve machine translation

Empty categories

What are they?
• An element in a parse tree that does not have a corresponding surface word

anaphoric  pronominal  empty category  overt noun type
-          -           Wh-trace        R-expression
-          +           *pro*           pronoun
+          -           NP-trace        anaphor
+          +           *PRO*           none
[Wikipedia]

An example
• Wh-trace (A’-movement)

(SBARQ (WHNP-1 what)
       (SQ are
           (NP-SBJ you)
           (VP thinking
               (PP-CLR about
                       (NP *T*-1)))))

*pro* and *PRO*
• Both *pro* and *PRO* are very common in CTB
– *pro*: marks dropped pronouns
– *PRO*: marks control structures

(IP (NP (-NONE- *pro*))
    (VP (VV 要求)
        (NP (NR 米洛舍维奇) (NN 总统))
        (IP (NP (-NONE- *PRO*))
            (VP (VV 下台))))
    (PU 。))
English: They asked president Milošević to step down

Phrase-based machine translation

Word alignment
• Words in sentence pairs are aligned (usually using GIZA++)

Chinese: 要求 米洛舍维奇 总统 下台
English: They asked president Milošević to step down

Phrase extraction
• Phrase pairs are extracted and scored

Chinese: 要求 米洛舍维奇 总统 下台
English: They asked president Milošević to step down

Translate
• Given a new Chinese sentence, translate using learned phrase pairs and a language model
• Problems:
– Phrase extraction has difficulty with unaligned words: "to" can be attached to the preceding or the following phrase
– One Chinese phrase is aligned to multiple English phrases: 下台 can be aligned to "step down", "stepped down", "is stepping down", "to step down", et cetera

Initial experiments

Experiment design
1. Predict where empty categories should be
   要求 米洛舍维奇 总统 下台
   *pro* 要求 米洛舍维奇 总统 *PRO* 下台
2. Align with the parallel sentence
   *pro* 要求 米洛舍维奇 总统 *PRO* 下台
   They asked president Milošević to step down
3. Extract phrases and score them
   *pro* 要求 米洛舍维奇 总统 *PRO* 下台
   They asked president Milošević to step down
4. Given a new sentence, translate using learned phrase pairs

Experiment
• Trained phrase-based MT systems by adding different empty categories to the training data using the gold-standard treebank
• Adding *PRO* and *pro* brought improvement
• Why does this work?
– Adding *pro* facilitated generating English pronouns that are dropped in Chinese
– Adding *PRO* helped get the verb form right (gerund, to-infinitive, and nominalization)
– Works when the translation is more literal

Analyses
• Learned word alignment

word   P(e | *pro*)    word   P(e | *PRO*)
the    0.18            to     0.45
i      0.13            NULL   0.10
it     0.08            the    0.02
to     0.08            of     0.02
they   0.05            as     0.02

Analyses
• Learned phrase pairs

Chinese source             *PRO* 贯彻
English reference          implementing
System trained w/ nulls    implementation
System trained w/o nulls   implemented

Chinese source             *PRO* 逐步 形成
English reference          have gradually formed
System trained w/ nulls    to gradually form
System trained w/o nulls   gradually formed

Chinese source             *PRO* 吸引 外资 作为
English reference          attracting foreign investment
System trained w/ nulls    attracting foreign investment
System trained w/o nulls   attract foreign capital

Recovering empty nodes

Previous work
• Johnson (2002) successfully uses pattern matching for recovering empty nodes and antecedents
• Gabbard et al. (2006) use various machine learning algorithms to recover different empty categories in several steps
• Differences
– language-specific issues
– recovering only *pro* and *PRO* and not antecedents

Pattern matching
• Johnson (2002) used minimally connected tree fragments containing an empty node as patterns

(IP (NP (-NONE- *pro*))
    (VP (VV 要求)
        (NP (NR 米洛舍维奇) (NN 总统))
        (IP (NP (-NONE- *PRO*))
            (VP (VV 下台))))
    (PU 。))

Pattern matching
• Parse with a normal grammar
• If (IP VP PU) is encountered in a parse tree, (NP (-NONE- *pro*)) can be added back in
• Problem: Minimally connected patterns for *pro* and *PRO* largely overlap in Chinese
• Better: More context can avoid the overlap
• In the example, instead of using (IP (NP (-NONE- *PRO*)) VP), one can use (VP (VV NP (IP (NP (-NONE- *PRO*)) VP)))

Conditional random field
• Examine every word boundary and decide whether to insert *pro* or *PRO*
• Features include a local window of words, POS tags, and parents of POS tags

Parsing
• Modify trees in the treebank so that empty categories are recoverable from nonterminals

Original tree:
(IP (NP-SBJ (-NONE- *pro*))
    (VP (VV 谢谢)
        (NP-OBJ (PN 各位)))
    (PU 。))

Modified tree:
(SPRO0IP (VP (VV 谢谢)
             (NP-OBJ (PN 各位)))
         (PU 。))

Parsing
• Extract a grammar from the new trees
– SPRO0IP → VP PU, . . .
– The resulting grammar is too coarse
• Use a state-splitting grammar trainer (Petrov et al., 2006) to automatically learn a refined grammar (latent annotation)
1. Split a symbol into two subcategories using EM
2. Merge it back if the loss in likelihood from merging is small
3. Additive smoothing
4. Repeat
• Parse with the refined grammar and recover empty categories

Results
• 90% of CTB 6.0 for training and the rest for testing
• CRF has higher precision than the other methods
• The parsing method requires the least intervention but performed almost as well as the other methods
• Results are mediocre compared to work on English

          *PRO*                  *pro*
          Prec.  Rec.   F1       Prec.  Rec.   F1
Pattern   0.65   0.61   0.63     0.41   0.23   0.29
CRF       0.68   0.46   0.55     0.58   0.35   0.44
Parsing   0.60   0.53   0.56     0.46   0.39   0.42

Training MT systems

Experiments
• A subset of the FBIS corpus is used for training the MT system (2M words, 60K sentences)
• Dev/test sets are from the NIST 2002 MT evaluation
• Moses (Koehn et al., 2007) and MERT (Och, 2003) are used for all systems

Results
• Empty node prediction has mediocre accuracy but brought statistically significant improvement

          BLEU    BP      *PRO* (F1)  *pro* (F1)
Baseline  23.73   1.000
Pattern   23.99   0.998   0.62        0.31
CRF       24.69   1.000   0.55        0.44
Parsing   23.99   1.000   0.56        0.42

Sample translations

source      中国 计划 *PRO* 投资 在 基础 设施 上
reference   china plans to invest in the infrastructure
w/ nulls    china plans to invest in infrastructure
w/o nulls   china ’s investment in infrastructure

source      有利 *PRO* 巩固 香港 的 贸易 和 航运 中心
reference   good for consolidating the trade and shipping center of hong kong
w/ nulls    favorable to the consolidation of the trade and shipping center in hong kong
w/o nulls   hong kong will consolidate the trade and shipping center

source      *pro* 现在 还 不 清楚
reference   it is not clear yet
w/ nulls    it is still not clear
w/o nulls   is still not clear

Conclusion
• Adding some empty categories, even when automatic prediction is far from perfect, may help build better machine translation systems
• Finding a good strategy for accurate prediction of empty nodes remains a challenge

Possible future work and difficulties
• Better empty category prediction
– Tweaking existing models: did not yield better results
– Combining different models: conflicts between models
– Further research on recent publications needed
• Anaphora resolution
– A reliable Chinese pronoun resolver is not available
– Ways to translate sentence blocks are needed

Questions?
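
Sketch: word-boundary labeling
The word-boundary formulation from the "Conditional random field" slide can be illustrated with a short Python sketch. This is not the authors' implementation. The function names, the "O" label for "no empty category", the feature names, and the window size are all assumptions made for illustration. The sketch also assumes at most one empty category per boundary and none at the end of a sentence, and it leaves out the parent-of-POS-tag features because they require a parse tree.

```python
EMPTY = {"*pro*", "*PRO*"}
NONE = "O"  # hypothetical label meaning "no empty category at this boundary"


def to_boundary_labels(tokens):
    """Split an annotated token list into surface words and one label per word.

    labels[i] names the empty category sitting immediately before words[i],
    or NONE. Assumes at most one empty category per boundary and none at the
    end of the sentence (a simplification for this sketch).
    """
    words, labels = [], []
    pending = NONE
    for tok in tokens:
        if tok in EMPTY:
            pending = tok
        else:
            words.append(tok)
            labels.append(pending)
            pending = NONE
    return words, labels


def insert_empty(words, labels):
    """Inverse of to_boundary_labels: put predicted empty categories back."""
    out = []
    for word, label in zip(words, labels):
        if label != NONE:
            out.append(label)
        out.append(word)
    return out


def boundary_features(words, tags, i, window=2):
    """Words and POS tags in a local window around the boundary before words[i]."""
    feats = {}
    for off in range(-window, window):
        j = i + off
        inside = 0 <= j < len(words)
        feats[f"w[{off}]"] = words[j] if inside else "<pad>"
        feats[f"t[{off}]"] = tags[j] if inside else "<pad>"
    return feats


if __name__ == "__main__":
    annotated = "*pro* 要求 米洛舍维奇 总统 *PRO* 下台".split()
    words, labels = to_boundary_labels(annotated)
    print(words)
    print(labels)
    print(insert_empty(words, labels) == annotated)
    # POS tags taken from the example tree in the slides.
    print(boundary_features(words, ["VV", "NR", "NN", "VV"], 3))
```

Each boundary's feature dictionary and label could then be handed to a sequence labeler such as a CRF toolkit. The slides do not say which toolkit was used, so none is assumed here.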