Multiword Expression Identification with Tree Substitution Grammars
Spence Green, Marie-Catherine de Marneffe, John Bauer, and Christopher D. Manning
Stanford University
EMNLP 2011

Main Idea
Use syntactic context to find multiword expressions
- Syntactic context → constituency parses
- Multiword expressions → idiomatic constructions

Which languages?
Results and analysis for French
- Lexicographic tradition of compiling MWE lists
- Annotated data!
English examples in the talk

Motivating Example: Humans get this
1. He kicked the pail.
2. He kicked the bucket. → "He died." (Katz and Postal 1963)

Stanford parser can't tell the difference
Both sentences receive the same flat analysis:
(S (NP He) (VP kicked (NP the pail)))
(S (NP He) (VP kicked (NP the bucket)))

What does the lexicon contain?
Single-word entries?
- kick : <agent, theme>
- die : <theme>
Multiword entries?
- kick the bucket : <theme>

Lexicon-Grammar: He kicked the bucket
(S (NP He) (VP died))
(S (NP He) (VP (MWV kicked the bucket)))
(Gross 1986)

MWEs in Lexicon-Grammar
- Classified by global POS
- Described by internal POS sequence
- Flat structures: (MWV (VBD kicked) (DT the) (NN bucket))
Of theoretical interest but...

Why do we care (in NLP)?
MWE knowledge improves:
- Dependency parsing (Nivre and Nilsson 2004)
- Constituency parsing (Arun and Keller 2005)
- Sentence generation (Hogan et al. 2007)
- Machine translation (Carpuat and Diab 2010)
- Shallow parsing (Korkontzelos and Manandhar 2010)
Most experiments assume high-accuracy identification!

French and the French Treebank
MWEs are common in French
- ~5,000 multiword adverbs
Paris 7 French Treebank
- ~16,000 trees
- 13% of tokens are part of an MWE
Example: (MWC (P sous) (N prétexte) (C que)) 'on the grounds that'

French Treebank: MWE types
Lots of nominal compounds, e.g. N – N numéro deux
(Bar chart of MWE types by global POS: N, P, ADV, D, V, C, ET, CL, PRO; nominal compounds dominate.)

MWE Identification Evaluation
Identification is a by-product of parsing
- Corpus: Paris 7 French Treebank (FTB)
- Split: same as Crabbé and Candito (2008)
- Metrics: precision and recall
- Lengths ≤ 40 words

MWE Identification: Baselines
- Parent-annotated PCFG (PA-PCFG): F1 = 32.6
- n-gram methods (mwetoolkit): F1 = 34.7, the standard approach in the 2008 MWE Shared Task, the MWE Workshops, etc.
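The precision/recall evaluation over predicted MWE spans can be sketched as follows. This is a minimal illustration with invented spans, not the paper's evaluation script:

```python
# Hypothetical illustration: scoring predicted MWE spans against a
# gold reference with precision, recall, and F1.
# The example spans below are invented for demonstration.

def prf1(gold, predicted):
    """Compute precision, recall, and F1 over sets of labeled spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Spans as (sentence_id, start, end) tuples; invented data.
gold = [(0, 2, 4), (1, 0, 3), (2, 5, 7)]
pred = [(0, 2, 4), (1, 0, 2), (2, 5, 7)]
p, r, f = prf1(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))
```

Exact span matching is a deliberately strict criterion: a prediction that gets only part of an MWE right, like (1, 0, 2) above, counts as both a false positive and a false negative.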
n-gram methods: mwetoolkit
Based on surface statistics:
Step 1: Lemmatize and POS-tag the corpus
Step 2: Compute n-gram statistics:
- Maximum likelihood estimator
- Dice's coefficient
- Pointwise mutual information
- Student's t-score
(Ramisch, Villavicencio, and Boitet 2010)
Step 3: Create n-gram feature vectors
Step 4: Train a binary classifier
Exploits the statistical idiomaticity of MWEs

Is statistical idiomaticity sufficient?
French multiword verbs: the tree maintains the relationship between the MWV parts
(VN (MWV va (MWADV d' ailleurs) bon train)) 'is also well underway'

Recap: French MWE Identification Baselines
PA-PCFG: F1 = 32.6; mwetoolkit: F1 = 34.7
Let's build a better grammar

Better PCFGs: Manual grammar splits
Symbol refinement à la Klein and Manning (2003), e.g. "has a verbal nucleus (VN)":
COORD → COORD-hasVN
(COORD-hasVN (C Ou) (ADV bien) (VN doit -il ...))
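Step 2's association measures can be sketched as follows. This is a minimal illustration of the listed statistics using standard textbook formulas and invented counts, not mwetoolkit's actual code:

```python
# A sketch of bigram association measures from corpus counts.
# Formulas follow standard definitions; the counts are invented.
import math

def association(c_xy, c_x, c_y, n):
    """PMI, Dice's coefficient, and t-score for a bigram with joint
    count c_xy and unigram counts c_x, c_y, over n tokens."""
    p_xy, p_x, p_y = c_xy / n, c_x / n, c_y / n
    pmi = math.log2(p_xy / (p_x * p_y))
    dice = 2 * c_xy / (c_x + c_y)
    # t-score: (observed - expected) / sqrt(observed), in counts
    t = (c_xy - c_x * c_y / n) / math.sqrt(c_xy)
    return pmi, dice, t

# Invented counts: 'coup de' seen 30 times; 'coup' 50 times,
# 'de' 2000 times, in a 100,000-token corpus.
pmi, dice, t = association(30, 50, 2000, 100_000)
print(f"PMI={pmi:.2f} Dice={dice:.3f} t={t:.2f}")
```

These scores capture statistical idiomaticity only: they see co-occurrence, not the syntactic relation between the words, which is the gap the parsing-based approach addresses.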
'Otherwise he must ...'

French MWE Identification: Manual Splits
Splits: F1 = 63.1 (vs. PA-PCFG 32.6, mwetoolkit 34.7)
MWE features: high-frequency POS sequences

Capture more syntactic context?
PCFGs work well!
Larger "rules": Tree Substitution Grammars (TSG)
Relationship with Data-Oriented Parsing (DOP):
- Same grammar formalism (TSG)
- We include unlexicalized fragments
- Different parameter estimation

Which tree fragments do we select?
The full tree (S (NP (N He)) (VP (MWV (V kicked) (D the) (N bucket)))) is segmented into fragments, e.g. (NP (N He)), (V kicked), and (S NP (VP MWV)).

TSG Grammar Extraction as Tree Selection
The fragment (MWV V (D the) (N bucket)):
- Describes the MWE context
- Allows for inflection: kick, kicked, kicking

Dirichlet process TSG (DP-TSG)
Tree selection as non-parametric clustering (Cohn, Goldwater, and Blunsom 2009; Post and Gildea 2009; O'Donnell, Tenenbaum, and Goodman 2009)
Labeled Chinese Restaurant process
- Dirichlet process (DP) prior for each non-terminal type c
Supervised case: segment the treebank
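The Chinese Restaurant process underlying the DP prior can be illustrated with a toy simulation: each new elementary tree joins an existing "table" (reuses a cached fragment) with probability proportional to that table's count, or opens a new table with probability proportional to the concentration parameter. This is a basic, unlabeled CRP for illustration, not the paper's labeled, type-based variant:

```python
# Toy Chinese Restaurant process: rich-get-richer clustering.
# Purely illustrative; not the paper's sampler.
import random

def crp_seating(n_customers, alpha, rng):
    """Seat n_customers; returns the list of per-table counts."""
    counts = []  # customers per table
    for i in range(n_customers):
        # P(existing table k) ∝ counts[k]; P(new table) ∝ alpha
        total = i + alpha
        r = rng.random() * total
        for k, c in enumerate(counts):
            r -= c
            if r < 0:
                counts[k] += 1
                break
        else:
            counts.append(1)
    return counts

rng = random.Random(0)
tables = crp_seating(100, alpha=1.0, rng=rng)
print(len(tables), sum(tables))  # few tables; always 100 customers total
```

The rich-get-richer dynamic concentrates probability mass on a small set of frequently reused fragments, which is why the learned grammar stays compact.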
DP-TSG: Learning and Inference
DP base distribution from the manually-split CFG
Type-based Gibbs sampler (Liang, Jordan, and Klein 2010)
- Fast convergence: 400 iterations
Derivations of a TSG are a CFG forest
- SCFG decoder: cdec (Dyer et al. 2010)

French MWE Identification: DP-TSG
DP-TSG: F1 = 71.1 (vs. Splits 63.1, mwetoolkit 34.7, PA-PCFG 32.6)
The DP-TSG result is a lower bound

Human-interpretable DP-TSG rules
MWN → coup de N
- coup de pied 'kick'
- coup de coeur 'favorite'
- coup de foudre 'love at first sight'
- coup de main 'help'
- coup de grâce 'death blow'
n-gram methods: separate feature vectors

DP-TSG errors: Overgeneration
'The national market'
DP-TSG: (NP (D Le) (MWN (N marché) (A national)))
Reference: (NP (D Le) (N marché) (AP (A national)))
MWEs are subtle; the reference is sometimes inconsistent

Standard Parsing Evaluation
Same setup as MWE identification!
- Corpus: Paris 7 French Treebank (FTB)
- Split: same as Crabbé and Candito (2008)
- Metrics: Evalb and Leaf Ancestor
- Lengths ≤ 40 words

French Parsing Evaluation: All bracketings
PA-PCFG: 67.6; Splits: 75.2; DP-TSG: 75.8 (Evalb F1)
Paper: more results (Stanford, Berkeley, etc.)

Future Directions
Syntactic context for n-gram methods
- Parse the corpus!
- Adapt lexical context measures to syntactic context
DP-TSG
- Better base distribution

Conclusion
- Parsers work well for MWE identification
- Other languages: combine treebanks with MWE lists
- Non-"gold mode" parsing results for French
- Code → Google: "Stanford parser"

un grand merci. thanks a lot. Questions?

MWE Identification Results (full)
DP-TSG: 71.1; Stanford and Berkeley parser baselines: 69.6 and 70.1; Splits: 63.1; mwetoolkit: 34.7; PA-PCFG: 32.6

Dirichlet process TSG
DP prior for each non-terminal type c ∈ V:
  θ_c | c, α_c, P_0(·|c) ∼ DP(α_c, P_0)
  e | θ_c ∼ θ_c
Binary variable b_s for each non-terminal node in the corpus
- Supervised case: segment the treebank
(Cohn, Goldwater, and Blunsom 2009; Post and Gildea 2009; O'Donnell, Tenenbaum, and Goodman 2009)
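The role of the binary variables b_s can be shown with a small sketch: a node with b_s = 1 is a substitution site that cuts the tree, and each region between cuts becomes one elementary tree. The tree encoding and the example tree are invented for illustration:

```python
# Segmenting a single parse tree into TSG elementary trees given
# binary substitution-site variables. Node encoding (invented for
# this sketch): (label, cut, children), with terminal strings as leaves.

def extract_fragments(node, fragments=None):
    """Return bracketed strings, one per elementary tree."""
    if fragments is None:
        fragments = []

    def render(n):
        label, _, children = n
        parts = []
        for c in children:
            if isinstance(c, str):              # terminal word
                parts.append(c)
            elif c[1]:                          # cut (b_s = 1): stop here,
                parts.append(c[0])              # keep only the frontier label
                extract_fragments(c, fragments)  # child starts a new fragment
            else:                               # no cut: keep growing downward
                parts.append(render(c))
        return "(%s %s)" % (label, " ".join(parts))

    fragments.append(render(node))
    return fragments

# He kicked the bucket, with cuts at NP and V so the MWV fragment
# keeps 'the bucket' but abstracts over the inflected verb:
tree = ("S", False,
        [("NP", True, [("N", False, ["He"])]),
         ("VP", False,
          [("MWV", False,
            [("V", True, ["kicked"]),
             ("D", False, ["the"]),
             ("N", False, ["bucket"])])])])
for frag in extract_fragments(tree):
    print(frag)
```

Cutting at V is what lets one fragment cover kick, kicked, and kicking, as in the talk's (MWV V (D the) (N bucket)) example.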
DP-TSG: Base distribution P_0
Phrasal rules:
  P_0(A+ → B− C+) = p_MLE(A → B C) · s_B · (1 − s_C)
p_MLE is the manually-split grammar; s_B is the stop probability
Lexical insertion rules:
  P_0(C+ → t) = p_MLE(C → t) · p(t)
p(t) is the unigram probability of word t

Tree substitution grammars
A probabilistic TSG is a 5-tuple ⟨V, Σ, R, ♦, θ⟩:
- c ∈ V are non-terminals
- ♦ ∈ V is a unique start symbol
- t ∈ Σ are terminals
- e ∈ R are elementary trees (tree fragment == elementary tree)
- θ_{c,e} ∈ θ are parameters for each tree fragment
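The base distribution P_0 over fragments can be sketched as follows, under the assumption that each frontier non-terminal contributes a stop factor s and each internally expanded child contributes a continue factor (1 − s) times the probability of its own subtree. All rule and stop probabilities below are invented:

```python
# Sketch of a stop-probability base distribution over tree fragments.
# Invented probabilities; not the paper's estimated grammar.

p_mle = {("VP", ("MWV",)): 0.1,          # CFG rule probabilities
         ("MWV", ("V", "D", "N")): 0.05}
stop = {"VP": 0.3, "MWV": 0.2, "V": 0.9, "D": 0.9, "N": 0.9}

def p0(node):
    """node: (label, children); a child is (label, grandchildren) if
    expanded inside the fragment, or (label, None) if it is a frontier
    non-terminal (substitution site)."""
    label, children = node
    rhs = tuple(c[0] for c in children)
    prob = p_mle[(label, rhs)]
    for child_label, grand in children:
        if grand is None:                       # frontier: stop expanding
            prob *= stop[child_label]
        else:                                   # internal: continue expanding
            prob *= (1 - stop[child_label]) * p0((child_label, grand))
    return prob

# Fragment (VP (MWV V D N)): MWV is expanded inside the fragment,
# while V, D, N are frontier substitution sites.
frag = ("VP", [("MWV", [("V", None), ("D", None), ("N", None)])])
print(p0(frag))
```

Because each internal expansion multiplies in a factor below one, P_0 favors small fragments, so large fragments survive only when the data strongly supports them.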