Inducing Sentence Structure from Parallel Corpora for Reordering
John DeNero and Jakob Uszkoreit
Overview

• Translation is an end-task application for syntactic parsing
  - Lexical disambiguation
  - Structural (hierarchical) reordering
• Syntactic pre-ordering uses a supervised parser to predict structure
• This talk: an unsupervised approach to predicting sentence structure
  - A pipeline of reordering-focused prediction problems
  - The learning signal comes from aligned parallel corpora
Syntax Matters for Hierarchical Reordering

English-to-Japanese Web Translation (Single-Reference BLEU)

System                                                                  Tune   Test
Phrase-Based + Lexicalized Reordering                                   19.5   18.9
Phrase-Based + Syntactic Pre-Ordering (Genzel, Coling 2010)             22.6   23.3
Forest-to-String with Flexible Binarization (Zhang et al., ACL 2011)    23.1   22.9
STIR (this paper)                                                       22.5   22.9

STIR improves test BLEU by +4 over the lexicalized-reordering baseline.
Syntactic Pre-Ordering in Phrase-Based Systems

[Figure: two pipelines side by side.
Pre-Ordering Pipeline: an SVO sentence is parsed by a monolingual parsing model, then a reordering model permutes the parsed SVO sentence into SOV order.
Translation Pipeline: source training and target training text are preprocessed, passed to the word aligner, then to phrase extraction and model estimation to build the translation model; source input is preprocessed (pre-ordered) the same way before being translated into target output.]
Syntactic Pre-Ordering Example

[Figure: the parse of "pair added to the lexicon" (S → NN VP; VP → VBD PP; PP → TO NP; NP → DT NN) has its children reordered to yield "pair the lexicon to added", which matches the Japanese word order 一対 が (pair) 目録 (list) に (to) 追加されました (was added).]
Bracketing and Word Classes are Often Sufficient

Decisions (example: "pair added to the lexicon", tagged N V A D N, with a bracketing and a reordering) and methods for each:

• Word classes — universal parts-of-speech (Petrov et al., 2011)
  - Supervised tagging model
  - Project models via alignments
  - Unsupervised POS induction
• Bracketing — alignments ≈ bracketing
  - Discriminative bracketing model
• Reordering — alignments give the reordering
  - Reordering classifier
Pre-Ordering from a Parallel Corpus

Analyze an aligned parallel corpus (e.g. 一対 が 目録 に 追加 されました aligned to "pair added to the lexicon") to train both stages of the pre-ordering pipeline: the monolingual parsing model (SVO) and the reordering model (SVO → SOV).
Binary Bracketing Trees with Phrasal Terminals

• A tree is defined by the set of spans it contains
• Not every word will correspond to a span
• A tree can be composed of only a single span

Example: "pair added to the lexicon"
Parallel Parsing Model Constraint

An alignment licenses a tree if every tree span is aligned contiguously. (ITG? ITG — this is the inversion transduction grammar constraint.)

Remaining ambiguity:
• Ordering alternatives — competing bracketings of "pair added to the lexicon"
• Phrase granularity

[Figure: alignment grid for 一対 が 目録 に 追加 されました / "pair added to the lexicon", with alternative licensed trees.]
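The licensing check can be made concrete under one common reading of "aligned contiguously": the target positions linked to a source span form a solid block that no word outside the span also aligns into. A minimal sketch (names and the alignment example are ours, not the paper's):

```python
def aligned_contiguously(span, alignment):
    """span = (i, j), half-open over source positions;
    alignment = set of (src, tgt) links."""
    i, j = span
    inside = {t for (s, t) in alignment if i <= s < j}
    if not inside:
        return False
    lo, hi = min(inside), max(inside)
    # No source word outside the span may align into the target block [lo, hi].
    return all(not (lo <= t <= hi) for (s, t) in alignment if not (i <= s < j))

def licenses(tree_spans, alignment):
    """An alignment licenses a tree if every tree span is aligned contiguously."""
    return all(aligned_contiguously(span, alignment) for span in tree_spans)

# Rough links for "pair added to the lexicon" / 一対 が 目録 に 追加 されました:
# pair-一対, added-追加/されました, to-に, lexicon-目録 ("the" unaligned).
links = {(0, 0), (1, 4), (1, 5), (2, 3), (4, 2)}
print(licenses({(0, 5), (1, 5), (2, 5)}, links))  # prints True
```

A span like "pair added" fails the check here, because its target block would also contain the translation of "lexicon".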
Parallel Parsing Model

Score competing analyses by the product of their terminal phrase scores, e.g.

  φ(pair) · φ(added) · φ(to) · φ(the lexicon)   vs.   φ(pair) · φ(added to the lexicon)

where

  φ(e) = κ                          if |e| = 1
  φ(e) = contiguous(e) / total(e)   otherwise

For English-Japanese, κ = 0.3.
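A minimal sketch of this score, assuming `contiguous(e)` and `total(e)` are precomputed corpus counts (how often phrase e occurs, and how often it is aligned contiguously); the function and variable names are illustrative, not from the paper:

```python
KAPPA = 0.3  # single-word score for English-Japanese, per the talk

def phi(phrase, contiguous_count, total_count):
    """Terminal phrase score: kappa for single words (which are trivially
    contiguous), else the fraction of contiguously aligned occurrences."""
    if len(phrase) == 1:
        return KAPPA
    return contiguous_count / total_count

def tree_score(terminal_phrases, stats):
    """Product of phi over a tree's terminal phrases.
    stats maps phrase -> (contiguous_count, total_count)."""
    score = 1.0
    for p in terminal_phrases:
        c, t = stats.get(p, (0, 1))
        score *= phi(p, c, t)
    return score

stats = {("the", "lexicon"): (8, 10)}  # hypothetical counts
tree_score([("pair",), ("the", "lexicon")], stats)  # 0.3 * 0.8
```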
Monolingual Parsing Model

• Predict the same tree as the parallel parser, without the alignments
• Lexical, word class, corpus statistics, & length features
• Conditional likelihood objective
• Maximum terminal phrase length (2 for English-Japanese)

Model:      P_w(t | e) ∝ exp(w · θ(t, e))
Objective:  arg max_w  Π_{(t*, e)}  P_w(t* | e)

Features θ(t, e) include the terminal phrase scores Σ_e φ(e), constituents, contexts, span length, common words, ...

Training pairs (t*, e) come from the parallel parser, e.g. e = "pair added to the lexicon" with its licensed tree t*.
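The log-linear form above can be sketched over a toy candidate set. Everything here is illustrative (the feature names, weights, and two candidate bracketings are made up); it only shows the mechanics of P_w(t|e) ∝ exp(w · θ(t, e)):

```python
import math

def unnorm(w, features):
    """exp(w . theta) for one candidate tree's sparse feature dict."""
    return math.exp(sum(w.get(f, 0.0) * v for f, v in features.items()))

def predict(w, candidates):
    """candidates: {tree: feature_dict}. Returns (best tree, P_w(best|e))."""
    scores = {t: unnorm(w, th) for t, th in candidates.items()}
    z = sum(scores.values())
    best = max(scores, key=scores.get)
    return best, scores[best] / z

w = {"sum_phi": 2.0, "long_terminal": -0.5}   # hypothetical learned weights
candidates = {
    "[pair][added][to][the lexicon]": {"sum_phi": 1.8, "long_terminal": 0},
    "[pair][added to the lexicon]":   {"sum_phi": 1.2, "long_terminal": 1},
}
tree, p = predict(w, candidates)
```

Training maximizes the product of P_w(t*|e) over the parallel parser's (t*, e) pairs, i.e. standard conditional likelihood for this softmax over candidate trees.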
Reordering Models for Terminals & Non-Terminals

• Predict which spans to permute
• Features similar to the monolingual parser
• Separate models for terminals vs. non-terminals
• Maximum entropy objective

The non-terminal model is trained on tree spans of t*; the terminal model is trained on all contiguously aligned spans (e.g. "pair", "added", "to", "the lexicon" in the example 一対 が 目録 に 追加 されました ↔ "pair added to the lexicon").
Related Work (Sub-Sampled from Paper)

• Syntactic pre-ordering
• Pre-ordering as permutation prediction
  - Learn the ordering for each pair of words (Tromble & Eisner, EMNLP 2009)
  - Learn post-ordering collocations (Visweswariah, EMNLP 2011)
• Synchronous grammar induction
  - ITG models focus on lexical productions (Blunsom et al., ACL 2009)
• Monolingual grammar induction
  - Likelihood objective over a monolingual corpus
  - Alignments (Kuhn, ACL 2004) and bitexts (Snyder et al., ACL 2009) are useful
An Evaluation Method for Pre-Ordering Pipelines

1. Annotate a parallel corpus with word alignments
   - Translators are responsible for annotation
   - Solicit translations that align well
   - Details in Talbot et al. (WMT 2011)

2. Derive the "gold ordering" σ* from the word alignments
   - Sort source words by the index of their aligned targets
   - Sensible handling of "null" alignments

3. Compute the "fuzzy reordering" metric for a proposed ordering σ̂:

   (|e| − chunks(σ̂, σ*)) / (|e| − 1)

   where chunks(σ̂, σ*) is the size of the minimum partition of σ̂ into pieces that are all contiguous in σ*.
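The metric in step 3 can be sketched directly. This is a simplified reading (null-alignment handling and ties are omitted, and the helper names are ours): a chunk boundary falls wherever two adjacent words of σ̂ are not adjacent in σ*, so the minimum partition is found in one left-to-right pass.

```python
def chunks(proposed, gold):
    """Minimum number of pieces of `proposed` (a list of word ids) that are
    each contiguous in `gold`: count boundaries where adjacent words in the
    proposed order are not adjacent in the gold order."""
    pos = {w: i for i, w in enumerate(gold)}
    count = 1
    for prev, cur in zip(proposed, proposed[1:]):
        if pos[cur] != pos[prev] + 1:
            count += 1
    return count

def fuzzy_reordering(proposed, gold):
    """(|e| - chunks) / (|e| - 1): 1.0 for a perfect ordering."""
    n = len(proposed)
    return (n - chunks(proposed, gold)) / (n - 1)

gold = ["pair", "the", "lexicon", "to", "added"]       # sigma*
fuzzy_reordering(gold, gold)                            # perfect: 1.0
fuzzy_reordering(["pair", "added", "to", "the", "lexicon"], gold)  # monotone baseline
```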
Results: Fuzzy Reordering

System                               Fuzzy Reordering Score
Monotone                             34.9
Reverse                              30.8
Syntactic (Genzel, Coling 2010)      66.0
STIR (HMM-based Alignments)          49.5
STIR (Annotated Alignments)          80.5
Results: Supervised & Induced Parts-of-Speech

Feature set          Fuzzy Reordering Score
All Features         80.5
No POS               79.4
No Clusters          77.8
No POS/Clusters      49.7
End-to-End Translation Experiments
• Translation model trained on ~700 million tokens of parallel text
• Primarily extracted from the web (Uszkoreit et al., Coling 2010)
• Alignments: 2 iterations IBM Model 1; 2 iterations HMM-based model
• Tune and test: 3100 and 1000 sentences sampled from the web
Results: End-to-End Translation

English-to-Japanese Web Translation (BLEU)

System                                                                  Tune   Test
Phrase-Based Translation System                                         18.7   19.0
Phrase-Based + Lexicalized Reordering                                   19.5   18.9
Phrase-Based + Syntactic Pre-Ordering (Genzel, Coling 2010)             22.6   23.3
Forest-to-String with Flexible Binarization (Zhang et al., ACL 2011)    23.1   22.9
STIR (Annotated Alignments)                                             22.5   22.9
STIR (HMM-Based Alignments)                                             20.3   20.7
Conclusion

• Task-specific grammar induction matches supervised parsers
• Reordering as an extrinsic evaluation for grammar induction
• Lots of future work:
  - Forest-based extensions
  - Induction from multiple languages
  - Incorporate monolingual grammar induction models
  - Improve performance using unsupervised alignments
  - Combine with supervised parsers
• Open question: what grammatical phenomena does this learn?

Thanks!