Poster

Machine translation system for patent documents combining rule-based translation and statistical post-editing applied to the NTCIR-10 PatentMT Task
Terumasa EHARA
Yamanashi Eiwa College
Table of contents
• Motivation
• System architecture
• Evaluation and translation selection
• Translation model training
• Experiments
• Related works
• Conclusion
Motivation
• A hybrid system combining rule-based MT (RBMT) and statistical post-editing (SPE) may make MT more accurate.
• NTCIR-9 results show that a simple RBMT system outperforms the RBMT+SPE system.
• Automatically evaluating the two outputs and selecting the better one may make MT more accurate.
System architecture (1)
[Figure: System architecture. The source sentence is translated by rule-based MT from S to T (RBMT), using grammar and dictionaries, into an outputted target sentence; statistical post-editing (SPE), driven by a translation model (TM) and a language model (LM), turns this into a post-edited target sentence; evaluation and translation selection then produces the final machine-translated target sentence. For training, the source part of the sentence-aligned training data is translated by the same RBMT system; TM training uses these translations paired with the target part, and LM training uses the target part.]
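As a minimal sketch of the flow in this figure, the following Python stubs stand in for the three stages; none of them is the released implementation, and the stage internals are elided here.

```python
# Sketch of the architecture above. run_rbmt, run_spe and select_best are
# hypothetical stand-ins for the commercial RBMT engine, the Moses-based
# statistical post-editor, and the evaluation-and-selection step.

def run_rbmt(src: str) -> str:
    """Rule-based MT from S to T (grammar + dictionaries); stub."""
    return src  # placeholder

def run_spe(rbmt_out: str) -> str:
    """Statistical post-editing with a phrase-based TM and LM; stub."""
    return rbmt_out  # placeholder

def select_best(src: str, candidates: list[str]) -> str:
    """Evaluation and translation selection (see the selection slides); stub."""
    return candidates[-1]  # placeholder

def translate(src: str) -> str:
    rbmt_out = run_rbmt(src)                      # outputted target sentence
    spe_out = run_spe(rbmt_out)                   # post-edited target sentence
    return select_best(src, [rbmt_out, spe_out])  # machine-translated target sentence
```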
System architecture (2)
• Tools used in our experiments:
  RBMT part: commercial MT systems
  SPE processor: phrase-based Moses (Rev. 4343, distortion limit = 0)
  LM training tool: SRILM (ver. 1.5.5)
  TM training tool: GIZA++ (v1.0.3)
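A hedged sketch of how these tools could be wired together from Python. The file and directory names (target.tok, corpus/spe, giza-bin, rbmt_output.txt, ...) are placeholders, and flag spellings can vary between Moses/SRILM revisions; only the tool choices and the distortion limit of 0 come from the slide above.

```python
import subprocess

# Hypothetical paths; only the tools and settings named above are from the poster.

# 1. Language model on the target-language side with SRILM's ngram-count.
subprocess.run(["ngram-count", "-order", "5",
                "-text", "target.tok", "-lm", "target.lm",
                "-kndiscount", "-interpolate"], check=True)

# 2. Translation model with Moses' training script (GIZA++ word alignment),
#    trained on RBMT-output / target sentence pairs for post-editing.
subprocess.run(["train-model.perl", "-root-dir", "train",
                "-corpus", "corpus/spe", "-f", "rbmt", "-e", "tgt",
                "-lm", "0:5:target.lm:0",
                "-external-bin-dir", "giza-bin"], check=True)

# 3. Post-edit the RBMT output with the phrase-based decoder,
#    using a distortion limit of 0 (monotone reordering).
with open("rbmt_output.txt") as src, open("post_edited.txt", "w") as out:
    subprocess.run(["moses", "-f", "train/model/moses.ini",
                    "-distortion-limit", "0"],
                   stdin=src, stdout=out, check=True)
```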
Evaluation and translation selection (1)
[Figure: Evaluation and translation selection. Left: general scheme. A source sentence is translated by MT1 ... MTn; each translated sentence is translated back to the source language by backward MT; the inverse-translated sentences are evaluated and the best translation is selected as output. Right: our setting with two candidates, translated sentence 1 from RBMT and translated sentence 2 from RBMT+SPE, each back-translated and evaluated in the same way.]
Evaluation and translation selection (2)
• Evaluation criterion: IMPACT
• Bonus score is added to the IMPACT score of SPE outputs:

  Subtask   Bonus
  JE        0.1
  EJ        ∞
  CE        0.2
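A minimal sketch of the selection rule, under assumptions: the IMPACT metric itself is not reproduced here (a simple word-overlap score stands in for it), back_translate() is a placeholder for the backward MT step of the previous slide, and the back-translation is assumed to be scored against the original source sentence. An infinite bonus, as for EJ, simply means the SPE output is always chosen.

```python
import math

# Per-subtask bonus added to the IMPACT score of the SPE output (table above).
SPE_BONUS = {"JE": 0.1, "EJ": math.inf, "CE": 0.2}

def impact(hypothesis: str, reference: str) -> float:
    # Stand-in for the IMPACT metric: plain word-overlap Dice, NOT the real metric.
    h, r = set(hypothesis.split()), set(reference.split())
    return 2 * len(h & r) / (len(h) + len(r)) if h or r else 0.0

def back_translate(sentence: str) -> str:
    # Stand-in for the backward MT step (target -> source); identity placeholder.
    return sentence

def select(source: str, rbmt_out: str, spe_out: str, subtask: str) -> str:
    # Each candidate is translated back to the source language and compared
    # with the original source sentence; the SPE candidate gets the bonus.
    rbmt_score = impact(back_translate(rbmt_out), source)
    spe_score = impact(back_translate(spe_out), source) + SPE_BONUS[subtask]
    return spe_out if spe_score >= rbmt_score else rbmt_out
```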
Evaluation and translation selection (3)
• Preliminary test results using NTCIR-9 JE subtask data (counts of won system in pairwise comparison):

  Adequacy
    RBMT1 vs EIWA:        won by RBMT1 97, tie 134, won by EIWA 69
    RBMT1 vs Our method:  won by RBMT1 34, tie 228, won by Our method 38

  Acceptability
    RBMT1 vs EIWA:        won by RBMT1 86, tie 147, won by EIWA 67
    RBMT1 vs Our method:  won by RBMT1 26, tie 235, won by Our method 39
Evaluation and translation selection (4)
• Preliminary test results using NTCIR-9 JE subtask data:
[Figure: Distribution of adequacy scores (1 to 5) for RBMT1, EIWA, Our method, and Oracle.]
Evaluation and translation selection (5)
• Preliminary test results using NTCIR-9 JE subtask data:
[Figure: Distribution of acceptability grades (F, C, B, A, AA) for RBMT1, EIWA, Our method, and Oracle.]
Translation model training (1)
• We extract training data matched to the test sentences from the total training data.
• The matching algorithm is:
  total training data
    → extraction phase 1: keyword matching against the test sentence
    → extraction phase 2: sentence similarity, keeping the top 10 sentences per test sentence
    → TM training data

  sim = 2 × #(T ∩ S) / (#(T) + #(S))
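A minimal sketch of the two-phase extraction, assuming whitespace-tokenized sentences and a precomputed keyword set for the test sentence; how keywords are chosen and how sentences are tokenized in the actual system is not specified here.

```python
from typing import List, Tuple

def sim(t: set, s: set) -> float:
    # sim = 2 x #(T ∩ S) / (#(T) + #(S)), the formula above.
    return 2 * len(t & s) / (len(t) + len(s)) if t or s else 0.0

def select_tm_data(test_sentence: str,
                   bitext: List[Tuple[str, str]],
                   keywords: set,
                   top_n: int = 10) -> List[Tuple[str, str]]:
    test_words = set(test_sentence.split())
    # Phase 1: keep training pairs whose source side shares a keyword
    # with the test sentence.
    phase1 = [(s, t) for s, t in bitext if keywords & set(s.split())]
    # Phase 2: rank the survivors by sentence similarity and keep the top 10.
    phase1.sort(key=lambda pair: sim(set(pair[0].split()), test_words),
                reverse=True)
    return phase1[:top_n]

# The union of the selected pairs over all test sentences forms the TM training data.
```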
Translation model training (2)
  Subtask  Phase (Eval.)    Test/dev sentences  Selected training sentences
  JE       Development      2,000               253,333
  JE       Test (IE PEE)    2,543               357,443
  EJ       Development      2,000               181,000
  EJ       Test (IE)        2,300               205,460
  EJ       Test (ChE)       2,000               183,663
  CE       Development      2,000               115,528
  CE       Test (IE)        2,300               99,732
  CE       Test (ME PEE)    2,282               126,321
Experimental results (1)
• Preliminary experiment using NTCIR-9’s
JE subtask data
                                        RBMT1   EIWA    Our method   Oracle
  Adequacy (average)                    3.530   3.430   3.563        3.853
  Acceptability (rate of C or higher)   0.57    0.49    0.59         0.69
Experimental results (2)
• Human judgment (intrinsic evaluation)

  JE subtask                             RBMT1   EIWA
    Adequacy (average)                   3.57    3.53
    Acceptability (rate of C or higher)  ---     0.44

  EJ subtask                             RBMT4   EIWA
    Adequacy (average)                   ---     3.42
    Acceptability (rate of C or higher)  ---     0.59

  CE subtask                             EIWA
    Adequacy (average)                   2.80
    Acceptability (rate of C or higher)  ---
Experimental results (3)
• Automatic evaluation (intrinsic evaluation)

  JE subtask   RBMT1    EIWA
    RIBES      0.7106   0.7402
    BLEU       0.2035   0.3250
    NIST       6.7520   8.2700

  EJ subtask   RBMT4    EIWA
    RIBES      0.7111   0.7692
    BLEU       0.2244   0.3693
    NIST       6.2950   8.5010

  CE subtask   EIWA
    RIBES      0.7403
    BLEU       0.2690
    NIST       7.5480
Related works (1)
[Figure: MT decomposed into structural transfer (ST) and lexical transfer (LT).]
Related works (2)
Our method:           Input → RBMT (ST / LT) → SPE (LT) → Output
Pre-ordering method:  Input → rule-based pre-ordering (ST) → SMT (LT) → Output
Conclusion
• By combining rule-based machine translation and statistical post-editing, we can improve automatic evaluation scores.
• The human judgment score of the simple rule-based system is higher than that of our method, but the difference is not statistically significant.
Future work
• Improve the parsing accuracy in the RBMT part.
• Syntactically collapsed outputs from the RBMT part cannot be recovered by the SPE part.