Rich bitext projection features for parse reranking

Morphosyntactic correspondence: a
progress report on bitext parsing
Alexander Fraser, Renjing Wang, Hinrich Schütze
Institute for NLP
University of Stuttgart
INFuture2009: Digital Resources and Knowledge Sharing
Nov 4th 2009, Zagreb
Outline
 The Institute for Natural Language Processing
at the University of Stuttgart
 Bitext parsing
 Using morphosyntactic correspondence
IfNLP Stuttgart
 The Institute for Natural Language Processing (IfNLP/IMS) at
the University of Stuttgart
 Large department
 Dogil (Phonetics and Speech)
 Kuhn/Rohrer (LFG syntax and semantics)
 Cahill (LFG generation)
 Heid (Terminology extraction, morphology)
 Padó (Semantics, lexical semantics)
 Schütze (Statistical NLP and Information Retrieval)
 More on next slide
IfNLP – Statistical NLP Group
 Hinrich Schütze (director since 2004)
 Bernd Möbius – Speech recognition and synthesis
 Helmut Schmid – Parsing, morphology (known for TreeTagger, BitPar)
 Sabine Schulte im Walde – NLP and cognitive modeling of lexical semantics
 Michael Walsh – Speech, exemplar theoretic syntax
 Alex Fraser – Statistical machine translation, parsing, cross-lingual information retrieval
 General department areas of research
 New statistical NLP models and methods
 Semi-supervised and active learning
 Cognitive/linguistic representation models
 Applied to: NLP, retrieval, MT, speech, e-learning, …
IfNLP - Partnerships
 Partnerships
 Stuttgart: large projects with linguistics, computer science, EE signal
processing, high performance computing
 Germany: Darmstadt, Tübingen, DSPIN/CLARIN consortium (UIMA-based German processing)
 International: large French-led European project (6 universities, 4
industrial partners), collaborations on South African languages,
Edinburgh, CLARIN
 Industrial: various projects with publishers (many focusing on
terminology)
What is bitext parsing?
 Bitext: a text and its translation
 Sentences and their translations are aligned
 Sometimes called a parallel corpus
 Syntactic parsing: automatically find the syntactic structure of a sentence
(syntactic parse)
 Bitext parsing: automatically find the syntactic structure of the parallel
sentences in a bitext
 We will use the complementarity of the syntax of the two languages to obtain
improved parses
Motivation for bitext parsing
 Many advances in syntactic parsing come from better modeling
 But the overall bottleneck is the size of the treebank
 Our research asks a different question:
 Where can we (cheaply) obtain additional information, which helps to
supplement the treebank?
 A new information source for resolving ambiguity is a translation
 The human translator understands the sentence and disambiguates for us!
 Our research goal was to build large databases of improved parses to help
establish preferences for difficult phenomena like PP-attachment
Clause attachment ambiguity
Parse 1: high attachment (wrong)
Parse 2: low attachment (correct)
Not ambiguous in German
 Number agreement disambiguates
 FRAU (woman) and HATTE (had) agree
 Unambiguous low attachment
Parse reranking of bitext
 Goal: improve English parsing accuracy
 Parse English sentence, obtain list of 100 best parse candidates
 Parse German sentence, obtain single best parse
 Determine the correspondence of German to English words using a word
alignment
 Calculate syntactic divergence of each English parse candidate and the
projection of the German parse
 Choose probable English parse candidate with low syntactic divergence
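The steps above can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not the system from the talk: constituents are reduced to unlabeled word spans, the word alignment is a set of (German index, English index) links, divergence is simply the number of projected German constituents missing from the English candidate, and ties are broken by the baseline parser log probability. All names (`project_span`, `divergence`, `rerank`) are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]             # inclusive (start, end) word indices
Alignment = Set[Tuple[int, int]]   # (german_index, english_index) links
Candidate = Tuple[float, Set[Span]]  # (baseline log probability, English spans)


def project_span(g_span: Span, alignment: Alignment) -> Optional[Span]:
    """Project a German span onto the smallest English span covering its aligned words."""
    targets = [e for g, e in alignment if g_span[0] <= g <= g_span[1]]
    if not targets:
        return None  # no aligned words, nothing to project
    return (min(targets), max(targets))


def divergence(english_spans: Set[Span], german_spans: List[Span],
               alignment: Alignment) -> int:
    """Count projected German constituents that are not constituents of the candidate."""
    count = 0
    for g_span in german_spans:
        projected = project_span(g_span, alignment)
        if projected is not None and projected not in english_spans:
            count += 1
    return count


def rerank(candidates: List[Candidate], german_spans: List[Span],
           alignment: Alignment) -> Candidate:
    """Pick the candidate with the lowest divergence, preferring higher log probability on ties."""
    return min(candidates,
               key=lambda c: (divergence(c[1], german_spans, alignment), -c[0]))


# Toy example: five words, one-to-one alignment, German parse attaches low.
alignment = {(i, i) for i in range(5)}
german_spans = [(0, 4), (2, 4)]
high_attachment = (-10.0, {(0, 4), (0, 2)})
low_attachment = (-11.0, {(0, 4), (2, 4)})
best = rerank([high_attachment, low_attachment], german_spans, alignment)
```

In this toy case the low-attachment candidate is selected even though the baseline parser scores it lower, because it agrees with the projected German structure. The talk combines many such signals in a log-linear model rather than using a strict lexicographic preference.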
Measuring syntactic divergence
 Define features to capture different (overlapping) aspects of
syntactic divergence. Functions of:
 Candidate English parse e
 German parse g
 Word alignment a
 Combine in log-linear model
$$P(e \mid g) = \frac{\exp \sum_m \lambda_m h_m(g, e, a)}{\sum_{e'} \exp \sum_m \lambda_m h_m(g, e', a)}$$
 Discriminatively train λ parameters to maximize parsing
accuracy on a training set (minimum error rate training)
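As a rough illustration of the log-linear model, the sketch below normalizes weighted feature sums over an n-best list. The feature values and weights are invented for the example, and the function name `rerank_probabilities` is hypothetical; the sum in the denominator runs over the candidate list only, which is an assumption about how the normalization is restricted in practice.

```python
import math
from typing import List, Sequence


def rerank_probabilities(feature_vectors: Sequence[Sequence[float]],
                         weights: Sequence[float]) -> List[float]:
    """Return P(e | g) for each candidate e, given its feature values h_m(g, e, a)."""
    # Weighted feature sum for each candidate: sum_m lambda_m * h_m(g, e, a)
    scores = [sum(w * h for w, h in zip(weights, feats)) for feats in feature_vectors]
    # Subtract the maximum score before exponentiating, for numerical stability
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


# Two candidates, two features: baseline log probability and negated divergence.
features = [[-10.0, -3.0],   # candidate 0: likelier parse, high divergence
            [-11.0, 0.0]]    # candidate 1: less likely parse, no divergence
weights = [1.0, 1.0]
probs = rerank_probabilities(features, weights)
best_index = max(range(len(probs)), key=probs.__getitem__)
```

With these made-up values the second candidate wins, since its lower divergence outweighs its lower baseline probability. For picking the single best parse the normalization is not needed, because the argmax of the unnormalized scores is the same.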
Rich bitext projection features
 Defined 36 features by looking at common English parsing errors
 No monolingual features, except baseline parser probability
 General features
 Is there a probable label correspondence between German and the
hypothesized English parse?
 How expected is the size of each constituent in the hypothesized
English parse given the German parse?
 Specific features
 Are coordinations realized identically?
 Is the NP structure the same?
 Mix of probabilistic and heuristic features
Training
 Use BitPar syntactic forest parser
 English BitPar trained on Penn Treebank
 German BitPar trained on Tiger Treebank
 Probabilistic feature functions built using large parallel text
(Europarl)
 Weights on feature functions (lambda vector) trained on
portion of the Penn Treebank together with its translation into
German
 Minimum error rate training using F score
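Minimum error rate training needs an error measure to optimize; here that is the F score over brackets. The sketch below computes labeled bracketing F1 in the usual way (harmonic mean of precision and recall over matching brackets). The bracket representation as (label, start, end) tuples and the name `bracket_f1` are assumptions made for the example, and details of standard evaluation tools (such as punctuation handling) are ignored.

```python
from collections import Counter
from typing import Iterable, Tuple

Bracket = Tuple[str, int, int]  # (label, start, end)


def bracket_f1(gold: Iterable[Bracket], predicted: Iterable[Bracket]) -> float:
    """Labeled bracketing F1 between a gold parse and a predicted parse."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection counts each matching bracket at most as often as it occurs in both
    matched = sum((gold_counts & pred_counts).values())
    if matched == 0:
        return 0.0
    precision = matched / sum(pred_counts.values())
    recall = matched / sum(gold_counts.values())
    return 2 * precision * recall / (precision + recall)


gold = [("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4)]
predicted = [("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("PP", 3, 4)]
score = bracket_f1(gold, predicted)  # precision 2/4, recall 2/3
```

During training, a score of this kind would be computed for the candidate selected under each setting of the lambda vector, and the setting with the highest score on the training set would be kept.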
Reranking English parses
 Difficult task
 German is difficult to parse
 Our knowledge source, the German parser, is out-of-domain (poor performance)
 Baseline English parser we are trying to improve is in-domain (good performance)
 Test set has long sentences
 Result: 0.70% F1 improvement on test data (statistically significant)
New results
 Reranking German parses
 We needed German gold standard parses (and English translations)
 Sebastian Padó has made a small parallel treebank for Europarl available
 No engineering on German yet
 We are using the same syntactic divergence features which were designed to
improve English parsing
 There are German-specific ambiguities which could be modeled, such as subject-object ambiguity (e.g., Die Maus jagt die Katze, “the mouse chases the cat” or “the cat chases the mouse”)
 But easier task because the parser we are trying to improve is weaker (German is
hard to parse, Europarl is out of domain)
 2.3% F1 improvement currently; we think this can be further improved
Summary: bitext parsing
 I showed you an approach for bitext parsing
 Reranking the parses of English to minimize syntactic divergence with
an automatically generated German parse
 I then showed our first results for reranking German parses
using a single English parse
 The approach we used for this kind of morphosyntactic
correspondence is more general than just parse reranking
 Machine translation involves morphosyntactic correspondence
 And this is where we are interested in looking at Croatian
Morphosyntactic processing
 I am co-PI of a new IfNLP project funded by the DFG (German Science
Foundation)
 Project: morphosyntactic modeling for statistical machine translation
(SMT)
 SMT research, up until recently, has been dominated by translation into
English
 English expresses a lot of information through word order, very little through
inflection
 Approaches to translating morphologically rich languages to English
are preprocessing based
Present: linguistic preprocessing
 Linguistic preprocessing for SMT (stat. machine translation)
 From: freer syntax, morphologically rich language
 To: rigid syntax, morphologically poor language
 Existing examples: German to English, Czech to English
Present: linguistic preprocessing
 How this works
 Produce morphosyntactic analysis of German (or Czech)
 Reorder words in the German/Czech sentence to be in English order
 Reduce morphological inflection (for instance, remove case marking,
remove all agreement on adjectives, etc)
 For Czech: insert pseudo-words (e.g. indicate PRO-drop pronouns)
 Use statistics on this “simplified” German or Czech to map directly to
English using SMT
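A toy version of two of these preprocessing steps might look as follows. It is only a sketch: it assumes the morphosyntactic analysis is already available as (form, lemma, POS) triples with STTS-style tags, handles a single reordering pattern (a clause-final past participle moved next to the finite auxiliary), and reduces inflection only on articles and attributive adjectives. The names `Token`, `reorder_participle`, `reduce_inflection` and `preprocess` are hypothetical and do not describe the Stuttgart system.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    form: str   # inflected surface form
    lemma: str  # base form from the morphological analysis
    pos: str    # part-of-speech tag (STTS-style, assumed given)


# Tags whose inflection (case, agreement) is collapsed to the lemma in this sketch
REDUCE_TAGS = {"ART", "ADJA"}


def reorder_participle(tokens: List[Token]) -> List[Token]:
    """Move a clause-final past participle (VVPP) directly after the finite auxiliary (VAFIN)."""
    if not tokens or tokens[-1].pos != "VVPP":
        return list(tokens)
    for i, tok in enumerate(tokens[:-1]):
        if tok.pos == "VAFIN":
            return list(tokens[:i + 1]) + [tokens[-1]] + list(tokens[i + 1:-1])
    return list(tokens)


def reduce_inflection(tokens: List[Token]) -> List[str]:
    """Replace articles and attributive adjectives by their lemma; keep other forms."""
    return [t.lemma if t.pos in REDUCE_TAGS else t.form for t in tokens]


def preprocess(tokens: List[Token]) -> List[str]:
    """Reorder toward English word order, then reduce inflection."""
    return reduce_inflection(reorder_participle(tokens))


sentence = [
    Token("ich", "ich", "PPER"),
    Token("habe", "haben", "VAFIN"),
    Token("den", "der", "ART"),
    Token("Hund", "Hund", "NN"),
    Token("gesehen", "sehen", "VVPP"),
]
simplified = preprocess(sentence)  # ['ich', 'habe', 'gesehen', 'der', 'Hund']
```

The simplified sequence is closer to "I have seen the dog" in word order and no longer marks accusative case on the article, which is the kind of input the SMT system is then trained on.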
Present: linguistic preprocessing
 How well does this work?
 German to English SMT with linguistic preprocessing
(Stuttgart system)
 Results from 2008 ACL workshop on machine translation (extensive
human evaluation)
 Only system limited to the organizers’ data that was competitive with:
 The best system of 5 rule-based MT systems
 Saarbrücken hybrid rule-based/SMT system
 Google Translate, which does not use linguistic preprocessing but does
use vastly more data
Future: modeling
 What about translating from English to German or to Slavic
languages?
 Problem: morphological generation is more difficult
 It is easy to reduce multiple inflections to one (for instance, stemming)
 Harder to learn to generate the right inflection
Future: modeling
 Current work on morphological generation
 Work at Charles University in Prague on Czech
 Tectogrammatical representation is not (yet) competitive with simple
statistics (little explicit knowledge of morphology or syntax)
 Best English to German SMT systems also use little or no
morphological knowledge
 And they are much worse than rule-based English to German systems
 Challenge: to use morphosyntactic knowledge with statistical
approaches requires more than just linguistic preprocessing
→ morphosyntactic modeling
Morphosyntactic correspondence
 In fact, all multilingual problems involve morphosyntactic
correspondence:
 If we have a source parse tree, and source text, and we would like a
target text, this is machine translation
 If we have a source parse tree, source text and target text, and we
would like a target parse, this is bitext parsing
 If we would like to know which word in the target text is a translation
of a particular word in the source text and we use morphosyntactic
analysis, this is syntactic word alignment
 The same thinking can be used for cross-lingual information retrieval
 Very relevant when one of the languages is morphologically rich
Conclusion
 I introduced the IfNLP Stuttgart
 I presented a new approach to improving parsing using morphosyntactic correspondence: bitext parsing
 I discussed the general challenge of using morphosyntactic correspondence, focusing on statistical machine translation
 Biggest challenge is translating into freer word order, morphologically rich languages (e.g., German and particularly Slavic languages)
 We are interested in the challenge of building systems to translate to Croatian
 To do this: we need partners who are working on Croatian analysis!
 We also request that you think about multilingual applications when producing Croatian NLP resources
 The type of approach I showed for bitext parsing is useful for other multilingual applications
Thank you!
Statistical Approach
 Using statistical models
 Create many alternatives, called hypotheses
 Give a score to each hypothesis
 Find the hypothesis with the best score through search
 Disadvantages
 Difficulties handling structurally rich models (math and computation)
 Need data to train the model parameters
 Difficult to understand decision process made by system
 Advantages
 Avoid hard decisions
 Speed can be traded for quality, no all-or-nothing
 Works better in the presence of unexpected input
 Learns automatically as more data becomes available
Modified from Vogel
Morphosyntactic knowledge
 We use: morphological analyzers & treebanks, which are combined in
parsing models learned from treebanks
 English models have little morphological analysis (suffix analysis to determine
POS for unknown words)
 German syntactic parser BitPar (Schmid) uses SMOR (Stuttgart Morphological
Analyzer)
 Given inflected form, SMOR returns possible fine-grained POS tags
 E.g., for nouns/adjectives: POS, case, gender, number, definiteness
 BitPar puts possible analyses in the chart, and disambiguates
 Slavic languages require even more morphological knowledge than German
Transferring syntactic knowledge
 Need knowledge source!
 English syntactic parser
 About 90% bracketing accuracy
 Mapping
 Requires bitext
 Work discussed here uses German/English Europarl
(European Parliament Proceedings)
 Resource for Croatian: Acquis Communautaire
 Automatically generated word alignment
Additional details in the paper
 Formalization of bitext parsing as a parse reranking task
 Definitions of bitext feature functions
 Analysis of feature functions through feature selection
 Comparison of MERT (minimum error rate training) with SVMRank