Inducing Sentence Structure from Parallel Corpora for Reordering
John DeNero and Jakob Uszkoreit

Overview
• Translation is an end-task application for syntactic parsing
  Lexical disambiguation
  Structural (hierarchical) reordering
• Syntactic pre-ordering uses a supervised parser to predict structure
• This talk: an unsupervised approach to predicting sentence structure
  A pipeline of reordering-focused prediction problems
  The learning signal comes from aligned parallel corpora

Syntax Matters for Hierarchical Reordering
English-to-Japanese web translation, single-reference BLEU (tune / test):
  Phrase-Based + Lexicalized Reordering: 19.5 / 18.9
  Phrase-Based + Syntactic Pre-Ordering (Genzel, Coling 2010): 22.6 / 23.3
  Forest-to-String with Flexible Binarization (Zhang et al., ACL 2011): 23.1 / 22.9
  STIR (this paper): 22.5 / 22.9
STIR gains about +4 BLEU over the lexicalized-reordering baseline on the test set.

Syntactic Pre-Ordering in Phrase-Based Systems
• Pre-ordering pipeline: SVO input → monolingual parsing model → parsed SVO → reordering model → SOV output
• Translation pipeline: source and target training text → word aligner → phrase extraction → model estimation → translation model; source input → translation model → target output
• The pre-ordering pipeline serves as the preprocessing step for the source side, both on the training data and on the input at translation time

Syntactic Pre-Ordering Example
• English parse of "pair added to the lexicon":
  (S (NP (NN pair)) (VP (VBD added) (PP (TO to) (NP (DT the) (NN lexicon)))))
• Reordering the tree into Japanese (SOV) constituent order yields "pair the lexicon to added", matching 一対 が 目録 に 追加されました (gloss: pair, list, to, was added)
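To make the pre-ordering step concrete, here is a minimal Python sketch of reordering a source parse tree into target-like word order. The nested-tuple tree encoding, the `preorder` function name, and the hand-written rule that reverses VP and PP children are illustrative assumptions; in the actual system the reordering decisions come from the learned models described later in the talk, not from fixed rules.

```python
# Illustrative sketch only: recursively reorder a source parse tree into
# target-like (here, Japanese SOV-style) word order. The hard-coded rule
# that reverses the children of VP and PP nodes stands in for the learned
# reordering model; the tree encoding is a simple (label, children) tuple.

def preorder(node):
    """Return the source words rearranged into target-like order."""
    if isinstance(node, str):                  # terminal phrase
        return node.split()
    label, children = node
    reordered = [preorder(child) for child in children]
    if label in ("VP", "PP") and len(reordered) == 2:
        reordered.reverse()                    # (VBD, PP) -> (PP, VBD); (TO, NP) -> (NP, TO)
    return [word for child in reordered for word in child]

tree = ("S", [("NP", ["pair"]),
              ("VP", [("VBD", ["added"]),
                      ("PP", [("TO", ["to"]),
                              ("NP", ["the lexicon"])])])])
print(preorder(tree))   # ['pair', 'the', 'lexicon', 'to', 'added']
```

The output matches the constituent order of the Japanese sentence on the slide: subject first, then the postpositional phrase with the noun before the adposition, then the verb.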
Bracketing and Word Classes are Often Sufficient
Decisions and the methods that supply them, illustrated on "pair added to the lexicon" (tagged N V A D N):
  Universal parts-of-speech (Petrov et al., 2011): a supervised tagging model, models projected via alignments, or unsupervised POS induction
  Alignments ≈ bracketing: a discriminative bracketing model
  Alignments give reordering: a reordering classifier

Pre-Ordering from a Parallel Corpus
• Both models in the pre-ordering pipeline (the monolingual parsing model and the reordering model) are learned by analyzing an aligned parallel corpus, e.g., 一対 が 目録 に 追加 されました aligned to "pair added to the lexicon"

Binary Bracketing Trees with Phrasal Terminals
• A tree is defined by the set of spans it contains (sketched below)
• Not every word will correspond to a span
• A tree can be composed of only a single span
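The span-set view of a tree is easy to make concrete. Below is a small Python sketch using half-open (start, end) token spans over "pair added to the lexicon"; the particular bracketing shown is just one analysis an alignment might license, and the nested-tuple representation is an illustrative choice rather than the paper's data structure.

```python
# Sketch: a binary bracketing tree with phrasal terminals over
# "pair added to the lexicon", represented as the set of half-open spans
# it contains. Indices are token positions.

words = ["pair", "added", "to", "the", "lexicon"]

# A node is (start, end, children); terminals have no children.
tree = (0, 5, [(0, 1, []),                       # "pair"
               (1, 5, [(1, 2, []),               # "added"
                       (2, 5, [(2, 3, []),       # "to"
                               (3, 5, [])])])])  # phrasal terminal "the lexicon"

def spans(node):
    start, end, children = node
    yield (start, end)
    for child in children:
        yield from spans(child)

print(sorted(spans(tree)))
# [(0, 1), (0, 5), (1, 2), (1, 5), (2, 3), (2, 5), (3, 5)]
# Neither "the" (3, 4) nor "lexicon" (4, 5) is a span of its own:
# with phrasal terminals, not every word corresponds to a span.
```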
Parallel Parsing Model Constraint
• An alignment licenses a tree if every tree span is aligned contiguously
• ITG? ITG. (The licensed trees are inversion transduction grammar derivations.)
• Remaining ambiguity:
  Ordering alternatives: different bracketings of "pair added to the lexicon" licensed by the same alignment
  Phrase granularity: e.g., "added", "to", "the lexicon" as separate terminals vs. "added to the lexicon" as one terminal phrase

Parallel Parsing Model
• Competing analyses are scored by a product over their terminal phrases, e.g.
  φ(pair) · φ(added) · φ(to) · φ(the lexicon)   vs.   φ(pair) · φ(added to the lexicon)
• Terminal-phrase score (see the code sketch below):
  φ(e) = κ if |e| = 1, and contiguous(e) / total(e) otherwise
• English-Japanese: κ = 0.3
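A minimal sketch of the two ingredients above: the contiguity test that decides whether an alignment licenses a span, and the terminal-phrase score φ(e). The alignment representation (a set of source-target index pairs), the exact consistency condition in `is_contiguous`, and the toy corpus counts passed to `tree_score` are simplifying assumptions; only the form of φ and the κ = 0.3 value come from the slide.

```python
# Sketch of (1) a contiguity test for "every tree span is aligned
# contiguously" and (2) the terminal-phrase score phi(e). The alignment is
# a set of (source_index, target_index) links; counts are toy numbers.

KAPPA = 0.3  # single-word terminal score reported for English-Japanese

def is_contiguous(span, alignment):
    """One common formalization: the targets aligned to source positions
    [start, end) form a block that no outside source word aligns into."""
    start, end = span
    inside = {t for s, t in alignment if start <= s < end}
    if not inside:
        return False
    lo, hi = min(inside), max(inside)
    return all(not (lo <= t <= hi)
               for s, t in alignment if not (start <= s < end))

def phi(phrase, contiguous_count, total_count):
    """kappa for single words; otherwise the fraction of corpus
    occurrences of the phrase that were aligned contiguously."""
    if len(phrase.split()) == 1:
        return KAPPA
    return contiguous_count / total_count

def tree_score(terminal_phrases, counts):
    """Score of a candidate tree: the product of phi over its terminals."""
    score = 1.0
    for e in terminal_phrases:
        c, n = counts.get(e, (0, 1))   # counts only matter for multiword phrases
        score *= phi(e, c, n)
    return score

# The two competing segmentations from the slide, with made-up counts:
counts = {"the lexicon": (40, 50), "added to the lexicon": (5, 50)}
print(tree_score(["pair", "added", "to", "the lexicon"], counts))  # 0.3**3 * 0.8
print(tree_score(["pair", "added to the lexicon"], counts))        # 0.3 * 0.1
```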
Monolingual Parsing Model
• Predict the same tree as the parallel parser, without the alignments
• Lexical, word class, corpus-statistic, and length features
• Conditional likelihood objective
• Maximum terminal phrase length (2 for English-Japanese)
• Model (a toy sketch follows): P_w(t | e) ∝ exp(w · θ(t, e))
• Training: argmax_w Π over (t*, e) of P_w(t* | e), where t* is the tree chosen by the parallel parser
• Features θ(t, e): Π_e φ(e), constituents, contexts, span length, common words, ...
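A minimal sketch of the conditional model P_w(t | e) ∝ exp(w · θ(t, e)) and its conditional likelihood. The toy feature map, the explicit enumeration of candidate trees, and all names below are illustrative assumptions; the actual model uses the lexical, word-class, corpus-statistic, and length features listed above and dynamic-programming inference rather than enumeration.

```python
# Sketch of a globally normalized log-linear model over candidate trees,
# P_w(t | e) proportional to exp(w . theta(t, e)), with trees as span sets.
# The feature map and the candidate enumeration are toy stand-ins.

import math
from collections import Counter

def features(tree_spans):
    """Toy theta(t, e): indicator counts of span lengths in the tree."""
    theta = Counter()
    for start, end in tree_spans:
        theta[f"span_len={end - start}"] += 1.0
    return theta

def dot(w, theta):
    return sum(w.get(name, 0.0) * value for name, value in theta.items())

def log_likelihood(w, gold_tree, candidates):
    """log P_w(gold_tree | e): gold score minus the log normalizer
    summed over all candidate trees for the sentence."""
    scores = [dot(w, features(t)) for t in candidates]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return dot(w, features(gold_tree)) - log_z

# Two candidate bracketings of "pair added to the lexicon":
t_a = [(0, 5), (0, 1), (1, 5), (1, 2), (2, 5), (2, 3), (3, 5)]
t_b = [(0, 5), (0, 2), (0, 1), (1, 2), (2, 5), (2, 3), (3, 5)]
w = {"span_len=4": 1.5}   # hypothetical learned preference
print(log_likelihood(w, t_a, [t_a, t_b]))
```

Training then maximizes the sum of these conditional log-likelihoods over the (t*, e) pairs supplied by the parallel parser.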
Reordering Models for Terminals and Non-Terminals
• Predict which spans to permute (illustrated in the sketch below)
• Features similar to the monolingual parser
• Separate models for terminals and non-terminals
• Maximum entropy objective
• The non-terminal model is trained on tree spans
• The terminal model is trained on all contiguously aligned spans
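To show how a maximum-entropy "permute or keep" decision turns a bracketing into a pre-ordered sentence, here is a minimal sketch for the non-terminal case. The feature templates, the single hand-set weight, and the `first_class` helper are hypothetical; the separate terminal-phrase model and the actual training on tree spans versus contiguously aligned spans are omitted.

```python
# Sketch: walk a binary bracketing and, at each non-terminal, ask a
# logistic (maximum-entropy) classifier whether to invert its children.
# Feature templates and the hand-set weight are hypothetical; the
# terminal-phrase reordering model is omitted.

import math

def p_invert(weights, feats):
    """Probability of inverting a node under a logistic model."""
    z = sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-z))

def first_class(node, classes):
    """Word class of the node's first word (hypothetical feature helper)."""
    word = node.split()[0] if isinstance(node, str) else None
    return classes.get(word, "X") if word else first_class(node[0], classes)

def reorder(node, weights, classes):
    """node is a terminal phrase (str) or a (left, right) pair."""
    if isinstance(node, str):
        return node.split()
    left, right = node
    feats = [f"left={first_class(left, classes)}",
             f"right={first_class(right, classes)}"]
    l = reorder(left, weights, classes)
    r = reorder(right, weights, classes)
    return r + l if p_invert(weights, feats) > 0.5 else l + r

classes = {"pair": "N", "added": "V", "to": "A", "the": "D"}
weights = {"left=V": 2.0}    # hypothetical: prefer inverting verb-initial nodes
tree = ("pair", ("added", ("to", "the lexicon")))
print(reorder(tree, weights, classes))   # ['pair', 'to', 'the', 'lexicon', 'added']
```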
Related Work (Sub-Sampled from the Paper)
• Syntactic pre-ordering
• Pre-ordering as permutation prediction
  Learn the ordering for each pair of words (Tromble & Eisner, EMNLP 2009)
  Learn post-ordering collocations (Visweswariah, EMNLP 2011)
• Synchronous grammar induction
  ITG models focus on lexical productions (Blunsom et al., ACL 2009)
• Monolingual grammar induction
  Likelihood objective over a monolingual corpus
  Alignments (Kuhn, ACL 2004) and bitexts (Snyder et al., ACL 2009) are useful

An Evaluation Method for Pre-Ordering Pipelines
1. Annotate a parallel corpus with word alignments
   Translators are responsible for annotation
   Solicit translations that align well
   Details in Talbot et al. (WMT 2011)
2. Derive the "gold ordering" σ* from the word alignments
   Sort source words by the index of their aligned targets
   Sensible handling of "null" alignments
3. Compute the "fuzzy reordering" metric for a proposed ordering σ̂ (a sketch follows):
   (|e| − chunks(σ̂, σ*)) / (|e| − 1)
   where σ̂ is partitioned into the minimum number of pieces such that every piece is contiguous in σ*
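A minimal sketch of the fuzzy reordering metric in step 3. The way `gold_ordering` derives σ* (a stable sort of source positions by their first aligned target index, with unaligned words kept next to their left neighbor) is our reading of "sensible handling of null alignments", not a specification from the paper; the function names are illustrative.

```python
# Sketch of the fuzzy reordering metric (|e| - chunks) / (|e| - 1).
# gold_ordering's treatment of unaligned ("null") source words is an
# assumption: they stay immediately after their left neighbor.

def gold_ordering(n_source, alignment):
    """Permutation sigma* of source positions induced by the alignment."""
    key, last = {}, -1.0
    for s in range(n_source):
        targets = [t for src, t in alignment if src == s]
        last = key[s] = float(min(targets)) if targets else last + 1e-6
    return sorted(range(n_source), key=lambda s: (key[s], s))

def chunks(proposed, gold):
    """Minimum number of pieces of `proposed` that are contiguous in `gold`."""
    next_in_gold = {gold[i]: gold[i + 1] for i in range(len(gold) - 1)}
    return 1 + sum(1 for a, b in zip(proposed, proposed[1:])
                   if next_in_gold.get(a) != b)

def fuzzy_reordering(proposed, gold):
    n = len(proposed)
    return (n - chunks(proposed, gold)) / (n - 1)

# "pair added to the lexicon" aligned to 一対(0) が(1) 目録(2) に(3) 追加されました(4):
alignment = {(0, 0), (1, 4), (2, 3), (3, 2), (4, 2)}
gold = gold_ordering(5, alignment)
print(gold)                                      # [0, 3, 4, 2, 1]
print(fuzzy_reordering([0, 3, 4, 2, 1], gold))   # 1.0  (perfect ordering)
print(fuzzy_reordering([0, 1, 2, 3, 4], gold))   # 0.25 (monotone baseline)
```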
Results: Fuzzy Reordering
  Monotone: 34.9
  Reverse: 30.8
  Syntactic (Genzel, Coling 2010): 66.0
  STIR (HMM-based alignments): 49.5
  STIR (annotated alignments): 80.5

Results: Supervised and Induced Parts-of-Speech (fuzzy reordering)
  All features: 80.5
  No POS: 79.4
  No clusters: 77.8
  No POS/clusters: 49.7

End-to-End Translation Experiments
• Translation model trained on ~700 million tokens of parallel text
• Primarily extracted from the web (Uszkoreit et al., Coling 2010)
• Alignments: 2 iterations of IBM Model 1, then 2 iterations of the HMM-based model
• Tune and test sets: 3100 and 1000 sentences sampled from the web

Results: End-to-End Translation (English-to-Japanese web translation, BLEU, tune / test)
  Phrase-based translation system: 18.7 / 19.0
  Phrase-based + lexicalized reordering: 19.5 / 18.9
  Phrase-based + syntactic pre-ordering (Genzel, Coling 2010): 22.6 / 23.3
  Forest-to-string with flexible binarization (Zhang et al., ACL 2011): 23.1 / 22.9
  STIR (annotated alignments): 22.5 / 22.9
  STIR (HMM-based alignments): 20.3 / 20.7
Conclusion
• Task-specific grammar induction matches supervised parsers
• Reordering serves as an extrinsic evaluation for grammar induction
• Lots of future work:
  Forest-based extensions
  Induction from multiple languages
  Incorporating monolingual grammar induction models
  Improving performance with unsupervised alignments
  Combining with supervised parsers
• Open question: what grammatical phenomena does this approach learn?

Thanks!