Machine translation system for patent documents combining rule-based translation and statistical post-editing applied to the NTCIR-10 PatentMT Task
Terumasa EHARA
Yamanashi Eiwa College

Table of contents
• Motivation
• System architecture
• Evaluation and translation selection
• Translation model training
• Experiments
• Related works
• Conclusion

Motivation
• A hybrid system combining rule-based MT (RBMT) and statistical post-editing (SPE) may make MT more accurate.
• The NTCIR-9 results show that a simple RBMT system outperforms the RBMT+SPE system.
• Automatically evaluating the two outputs and selecting the better one may make MT more accurate.

System architecture (1)
[Figure: block diagram of the system. The source sentence is translated by a rule-based MT system from S to T (RBMT, with grammar and dictionaries), giving a machine-translated target sentence; statistical post-editing (SPE, with a translation model TM and a language model LM) turns this into a post-edited target sentence; evaluation and translation selection chooses the output target sentence. The TM is trained on sentence-aligned training data whose source part is translated by the same RBMT system; the LM is trained on the target part.]

System architecture (2)
• Tools used in our experiments:
  RBMT part: commercial MT systems
  SPE processor: phrase-based Moses (Rev. 4343, distortion limit = 0)
  LM training tool: SRILM (ver. 1.5.5)
  TM training tool: GIZA++ (v1.0.3)

Evaluation and translation selection (1)
[Figure: two flow diagrams. General case: a source sentence is translated by MT systems MT1 ... MTn; each translated sentence is translated back into the source language by a backward MT system; evaluation and selection over the inverse-translated sentences determines the output. Our case: the two candidates are produced by RBMT and RBMT+SPE and handled in the same way.]

Evaluation and translation selection (2)
• Evaluation criterion: IMPACT
• A bonus score is added to the IMPACT score of SPE outputs:

  Subtask   Bonus
  JE        0.1
  EJ        ∞
  CE        0.2

Evaluation and translation selection (3)
• Preliminary test results using NTCIR-9 JE subtask data:

  Adequacy
    Won system   RBMT1   Tie   EIWA
    Counts       97      134   69
    Won system   RBMT1   Tie   Our method
    Counts       34      228   38

  Acceptability
    Won system   RBMT1   Tie   EIWA
    Counts       86      147   67
    Won system   RBMT1   Tie   Our method
    Counts       26      235   39

Evaluation and translation selection (4)
• Preliminary test results using NTCIR-9 JE subtask data:
[Figure: stacked bar chart of adequacy score (1–5) distributions for RBMT1, EIWA, Our method, and Oracle.]

Evaluation and translation selection (5)
• Preliminary test results using NTCIR-9 JE subtask data:
[Figure: stacked bar chart of acceptability grade (F, C, B, A, AA) distributions for RBMT1, EIWA, Our method, and Oracle.]

Translation model training (1)
• We extract data matched to the test sentences from the total training data.
• The matching algorithm is:
  Extraction phase 1: keyword matching of the total training data against each test sentence.
  Extraction phase 2: ranking the matched sentences by similarity to the test sentence (top 10 kept), giving the TM training data, where

    sim = 2 × #(T ∩ S) / (#(T) + #(S))
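To make the two-phase selection concrete, the following is a minimal sketch, not the authors' implementation. It assumes whitespace-tokenized sentences, a caller-supplied keyword extractor, and the Dice-style similarity from the slide; the function names and the top-10 cutoff per test sentence are illustrative, and the actual cutoff in the system may differ (the selected-data counts on the next slide are much larger than 10 per test sentence).

```python
# Minimal sketch of the two-phase TM training-data selection described above.
# Assumptions not stated on the slides: whitespace tokenization and a
# caller-supplied keyword extractor; function names are illustrative only.

def dice_similarity(t_tokens, s_tokens):
    # sim = 2 * #(T ∩ S) / (#(T) + #(S)), computed over token sets
    t, s = set(t_tokens), set(s_tokens)
    if not t or not s:
        return 0.0
    return 2.0 * len(t & s) / (len(t) + len(s))

def select_tm_training_data(test_sentences, training_pairs, keywords_of, top_n=10):
    """Return the union over test sentences of the top-N most similar training pairs.

    test_sentences : list of token lists (source side of the test set)
    training_pairs : list of (source_tokens, target_tokens) from the total training data
    keywords_of    : function mapping a token list to its set of keywords
    """
    selected = set()
    for test in test_sentences:
        test_keywords = keywords_of(test)
        # Phase 1: keyword matching against the total training data.
        candidates = [i for i, (src, _) in enumerate(training_pairs)
                      if test_keywords & keywords_of(src)]
        # Phase 2: rank candidates by sentence similarity and keep the top N.
        candidates.sort(key=lambda i: dice_similarity(test, training_pairs[i][0]),
                        reverse=True)
        selected.update(candidates[:top_n])
    return [training_pairs[i] for i in sorted(selected)]
```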
Translation model training (2)

  Subtask   Phase and eval.   Test/dev sentences   Selected training sentences
  JE        Development       2,000                253,333
  JE        Test (IE, PEE)    2,543                357,443
  EJ        Development       2,000                181,000
  EJ        Test (IE)         2,300                205,460
  EJ        Test (ChE)        2,000                183,663
  CE        Development       2,000                115,528
  CE        Test (IE)         2,300                99,732
  CE        Test (ME, PEE)    2,282                126,321

Experimental results (1)
• Preliminary experiment using NTCIR-9 JE subtask data:

                                          RBMT1   EIWA    Our method   Oracle
  Adequacy (average)                      3.530   3.430   3.563        3.853
  Acceptability (rate of C or higher)     0.57    0.49    0.59         0.69

Experimental results (2)
• Human judgment (intrinsic evaluation):

  JE subtask                              RBMT1   EIWA
    Adequacy (average)                    3.57    3.53
    Acceptability (rate of C or higher)   ---     0.44

  EJ subtask                              RBMT4   EIWA
    Adequacy (average)                    ---     3.42
    Acceptability (rate of C or higher)   ---     0.59

  CE subtask                              EIWA
    Adequacy (average)                    2.80
    Acceptability (rate of C or higher)   ---

Experimental results (3)
• Automatic evaluation (intrinsic evaluation):

  JE subtask   RIBES    BLEU     NIST
    RBMT1      0.7106   0.2035   6.7520
    EIWA       0.7402   0.3250   8.2700

  EJ subtask   RIBES    BLEU     NIST
    RBMT4      0.7111   0.2244   6.2950
    EIWA       0.7692   0.3693   8.5010

  CE subtask   RIBES    BLEU     NIST
    EIWA       0.7403   0.2690   7.5480

Related works (1)
[Figure: MT decomposed into structural transfer (ST) and lexical transfer (LT).]

Related works (2)
[Figure: comparison of two pipelines. Our method: Input → RBMT (ST and LT) → SPE (LT) → Output. Pre-ordering method: Input → rule-based pre-ordering (ST) → SMT (LT) → Output.]

Conclusion
• Combining rule-based machine translation and statistical post-editing, we can improve the automatic evaluation scores.
• The human judgment score of the simple rule-based system is higher than that of our method, but the difference is not statistically significant.

Future work
• Improve the parsing accuracy in the RBMT part.
• Syntactically broken outputs from the RBMT part cannot be recovered by the SPE part.
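Supplementary illustration: a minimal sketch of the output-selection step shown on the "Evaluation and translation selection" slides. It is not the authors' code; backward_mt and impact_score are placeholder callables standing in for the backward MT system and the IMPACT metric, and only the bonus values come from the slide.

```python
# Minimal sketch (assumed, not the authors' implementation) of choosing between
# the RBMT output and the RBMT+SPE output.  backward_mt and impact_score are
# placeholders for the backward MT system and the IMPACT metric used in the paper.
import math

SPE_BONUS = {"JE": 0.1, "EJ": math.inf, "CE": 0.2}  # bonus added to SPE outputs

def select_output(source, rbmt_output, spe_output, backward_mt, impact_score, subtask):
    """Back-translate both candidates, score the back-translations against the
    original source sentence, add the subtask-dependent bonus to the SPE
    candidate, and return the higher-scoring translation."""
    rbmt_score = impact_score(source, backward_mt(rbmt_output))
    spe_score = impact_score(source, backward_mt(spe_output)) + SPE_BONUS[subtask]
    # The infinite bonus for EJ means the SPE output is always selected there.
    return spe_output if spe_score >= rbmt_score else rbmt_output
```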