Bayesian Reordering Model
with Feature Selection
Abdullah Alrajeh and Mahesan Niranjan
The Ninth Workshop on Statistical Machine Translation
Baltimore, Maryland USA
June 26-27, 2014
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
1 / 10
Translation System Overview
Given a foreign sentence f, the best translation e is (Brown et al., 1993):
ebest = arg max p(e|f)
e
p(f|e)p(e)
e
p(f)
= arg max Translation Model × Language Model
= arg max
e
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
2 / 10
Translation System Overview
Given a foreign sentence f, the best translation e is (Brown et al., 1993):
ebest = arg max p(e|f)
e
p(f|e)p(e)
e
p(f)
= arg max Translation Model × Language Model
= arg max
e
ebest = arg max {pt (f|e)λt plm (e)λlm plex (f|e)λlex preo (f, e)λreo w |e|λw }
e
X
= arg max
λi log pi (f, e)
e
i
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
2 / 10
Translation System Overview
Given a foreign sentence f, the best translation e is (Brown et al., 1993):
ebest = arg max p(e|f)
e
p(f|e)p(e)
e
p(f)
= arg max Translation Model × Language Model
= arg max
e
ebest = arg max {pt (f|e)λt plm (e)λlm plex (f|e)λlex preo (f, e)λreo w |e|λw }
e
X
= arg max
λi log pi (f, e)
e
i
In general, reordering model is defined as:
Y
Y h(f̄n , ēn , on )
preo (f, e) =
p(on |f̄n , ēn ) =
P
k h(f̄n , ēn , ok )
n
n
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
2 / 10
Reordering Models
Foreign sentence f :
English sentence e :
f̄1
ē1
f̄2
ē3
f̄3 .
ē2
.
preo (f, e) = p(o1 = mono|f̄1 , ē1 )×p(o2 = swap|f̄2 , ē2 ) × p(o3 = other|f̄3 , ē3 )
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
3 / 10
Reordering Models
Foreign sentence f :
English sentence e :
f̄1
ē1
f̄2
ē3
f̄3 .
ē2
.
preo (f, e) = p(o1 = mono|f̄1 , ē1 )×p(o2 = swap|f̄2 , ē2 ) × p(o3 = other|f̄3 , ē3 )
Lexicalized Reordering Model (Tillmann, 2004; Kumar and Byrne, 2005;
Koehn et al., 2005; Galley and Manning, 2008)
p(ok |f̄n , ēn ) = P
Abdullah Alrajeh and Mahesan Niranjan
count(f̄n , ēn , ok )
k 0 count(f̄n , ēn , ok 0 )
Bayesian Reordering Model with Feature Selection
3 / 10
Reordering Models
Foreign sentence f :
English sentence e :
f̄1
ē1
f̄2
ē3
f̄3 .
ē2
.
preo (f, e) = p(o1 = mono|f̄1 , ē1 )×p(o2 = swap|f̄2 , ē2 ) × p(o3 = other|f̄3 , ē3 )
Lexicalized Reordering Model (Tillmann, 2004; Kumar and Byrne, 2005;
Koehn et al., 2005; Galley and Manning, 2008)
p(ok |f̄n , ēn ) = P
count(f̄n , ēn , ok )
k 0 count(f̄n , ēn , ok 0 )
Discriminative Reordering Model (Zens and Ney, 2006; Xiong et al.,
2006; Nguyen et al., 2009; Xiang et al., 2011; Ni et al., 2011)
exp(wTk φ(f̄n , ēn ))
exp(wT φ(f̄n , ēn , ok ))
p(ok |f̄n , ēn ) = P
≡
P
T
T
k 0 exp(wk 0 φ(f̄ , ē))
k 0 exp(w φ(f̄ , ē, ok 0 ))
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
3 / 10
Feature Extraction
Foreign sentence f :
f1
English sentence e :
e1
Abdullah Alrajeh and Mahesan Niranjan
f2
1
f3
1
e2
e3
f4
3
f5
2
e4
f6
e5
3
2
Bayesian Reordering Model with Feature Selection
.
.
4 / 10
Feature Extraction
Foreign sentence f :
f1
English sentence e :
e1
f2
1
f3
1
e2
e3
f4
3
f5
2
e4
f6
e5
3
2
.
.
Extracted phrase pairs :
f̄n
||| ēn ||| on ||| word alignment ||| context feature
f1 f2
||| e1 ||| mono ||| 0-0 1-0
||| +f3
f3 f4 f5 ||| e4 e5 ||| swap ||| 0-1 2-0
||| −f2 + f6
f6
||| e2 e3 ||| other ||| 0-0 0-1
||| −f5
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
4 / 10
Feature Extraction
Foreign sentence f :
f1
English sentence e :
e1
f2
1
f3
1
e2
e3
f4
3
f5
2
e4
f6
e5
3
2
.
.
Extracted phrase pairs :
f̄n
||| ēn ||| on ||| word alignment ||| context feature
f1 f2
||| e1 ||| mono ||| 0-0 1-0
||| +f3
f3 f4 f5 ||| e4 e5 ||| swap ||| 0-1 2-0
||| −f2 + f6
f6
||| e2 e3 ||| other ||| 0-0 0-1
||| −f5
All linguistic features:
(f1 &e1 )1 (f2 &e1 )2 (+f3 )3 (f3 &e5 )4 (f5 &e4 )5 (−f2 )6 (+f6 )7 (f6 &e2 )8 (f6 &e3 )9 (−f5 )10
Bag-of-words representation (0=not exist):
φ(f̄n , ēn )
1 2 3 4 5 6 7 8 9 10
φ(f̄1 , ē1 ) = 1 1 1 0 0 0 0 0 0 0
φ(f̄2 , ē2 ) = 0 0 0 1 1 1 1 0 0 0
φ(f̄3 , ē3 ) = 0 0 0 0 0 0 0 1 1 1
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
4 / 10
The Proposed Reordeing Model
Naive Bayes
p(f̄n , ēn |ok )p(ok )
.
p(ok |f̄n , ēn ) = P
k 0 p(f̄n , ēn |o)p(ok 0 )
Multinomial distribution:
p(f̄n , ēn |qk ) = C
M
Y
qkm φm (f̄n ,ēn )
m
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
5 / 10
The Proposed Reordeing Model
Naive Bayes
p(f̄n , ēn |ok )p(ok )
.
p(ok |f̄n , ēn ) = P
k 0 p(f̄n , ēn |o)p(ok 0 )
Multinomial distribution:
p(f̄n , ēn |qk ) = C
M
Y
qkm φm (f̄n ,ēn )
m
Maximum-likelihood estimation:
∗
qkm
= arg max
Nk
Y
qk
Abdullah Alrajeh and Mahesan Niranjan
n
PNk
n φm (f̄n , ēn )
p(f̄n , ēn |qk ) = PM P
.
Nk
n φm0 (f̄n , ēn )
m0
Bayesian Reordering Model with Feature Selection
5 / 10
The Proposed Reordeing Model
Naive Bayes
p(f̄n , ēn |ok )p(ok )
.
p(ok |f̄n , ēn ) = P
k 0 p(f̄n , ēn |o)p(ok 0 )
Multinomial distribution:
p(f̄n , ēn |qk ) = C
M
Y
qkm φm (f̄n ,ēn )
m
Maximum-likelihood estimation:
∗
qkm
= arg max
Nk
Y
qk
n
PNk
n φm (f̄n , ēn )
p(f̄n , ēn |qk ) = PM P
.
Nk
n φm0 (f̄n , ēn )
m0
Maximum a posteriori (MAP) estimation:
∗
qkm
= arg max
qk
Nk
Y
n
P k
α−1+ N
n φm (f̄n , ēn )
p(f̄n , ēn |qk )p(qk |α) =
PM PNk
M(α − 1) + m0 n φm0 (f̄n , ēn )
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
5 / 10
The Proposed Reordeing Model
Bayesian Naive Bayes (Barber, 2012)
p(f̄n , ēn |ok )p(ok )
p(ok |f̄n , ēn ) = P
.
k 0 p(f̄n , ēn |o)p(ok 0 )
Prior and Posterior
Full Bayesian Inference
Multinomial-Dirichlet
Z
p(f̄n , ēn |ok ) = p(f̄n , ēn |qk )p(qk |αk ) dqk
P
Q
Γ ( m αkm ) m Γ(αkm + φm (f̄n , ēn ))
=C Q
P
m Γ(αkm ) Γ
m αkm + φm (f̄n , ēn )
Q k
p(qk |α) N
n p(f̄n , ēn |qk )
p(qk |αk ) = R
QNk
p(qk |α) n p(f̄n , ēn |qk )dqk
Abdullah Alrajeh and Mahesan Niranjan
αk = α +
PNk
n
φ(f̄n , ēn )
Bayesian Reordering Model with Feature Selection
6 / 10
Classification Results
3-class problem: mono , swap , other
Table: Arabic-English MultiUN corpus (Eisele and Chen, 2010)
Statistics
Sentence Pairs
Running Words
Word/Line
Vocabulary Size
Abdullah Alrajeh and Mahesan Niranjan
Arabic
English
9.7 M
255.5 M 285.7 M
22
25
677 K
410 K
Bayesian Reordering Model with Feature Selection
7 / 10
Classification Results
3-class problem: mono , swap , other
Table: Arabic-English MultiUN corpus (Eisele and Chen, 2010)
Statistics
Sentence Pairs
Running Words
Word/Line
Vocabulary Size
Arabic
English
9.7 M
255.5 M 285.7 M
22
25
677 K
410 K
Table: Error rate based on 3-fold cross-validation
Classifier
Lexicalized model
Bayes-MAP estimate
Bayes-Bayesian inference
Abdullah Alrajeh and Mahesan Niranjan
Error Rate
25.2%
19.53%
20.13%
Bayesian Reordering Model with Feature Selection
7 / 10
Feature Selection
Normalized mutual information (Estevez et al., 2009):
Inorm (X ; Y ) =
I(X ; Y )
.
min(H(X ), H(Y ))
22.5
22
Error Rate
21.5
21
20.5
20
19.5
19
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Percentage of Feature Reductuion
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
8 / 10
1
Translation Results
Table: NIST test sets (4 references for each Arabic sentence)
Evaluation Set
NIST MT06 sentences
words
NIST MT08 sentences
words
Abdullah Alrajeh and Mahesan Niranjan
Arabic
1797
49 K
813
25 K
English
7188
223 K
3252
117 K
Bayesian Reordering Model with Feature Selection
9 / 10
Translation Results
Table: NIST test sets (4 references for each Arabic sentence)
Evaluation Set
NIST MT06 sentences
words
NIST MT08 sentences
words
Arabic
1797
49 K
813
25 K
English
7188
223 K
3252
117 K
Table: BLEU Score (Papineni et al., 2002)
Translation System
Baseline
BL + Lexicalized ReoM
BL + Bayes-MAP ReoM
BL + Bayes-Baysien ReoM
Abdullah Alrajeh and Mahesan Niranjan
ReoM Size
604 MB
18 MB
18 MB
Speed
2.2 sec/s
2.6 sec/s
2.6 sec/s
MT06
28.92
30.86
31.21
31.20
Bayesian Reordering Model with Feature Selection
MT08
32.13
34.22
34.72
34.69
9 / 10
Thank you for your attention.
Abdullah Alrajeh and Mahesan Niranjan
Bayesian Reordering Model with Feature Selection
10 / 10
© Copyright 2025 Paperzz