Reranking Bilingually Extracted Paraphrases Using Monolingual Distributional Similarity
Charley Chan, Chris Callison-Burch, Benjamin Van Durme
Center for Language and Speech Processing, HLTCOE
Johns Hopkins University
GEMS 2011

Paraphrasing
• Goal: use a huge amount of text to identify paraphrases that are semantically similar
• Example for the source phrase "huge amount of":

    Paraphrase candidate   Paraphrase score
    huge amount of         1.0
    large quantity of      0.98
    large number of        0.98
    great number of        0.97
    vast number of         0.94
    in large numbers       0.10
    large numbers of       0.08

Our Approach
• Bilingual Paraphrase Extraction (bilingual pivoting) + Monolingual Reranking (distributional context vector)

Relevant Work
• Clusters phrases through statistical characteristics
  – Dependency path similarities (e.g. Lin and Pantel, '01)
  – Distributional co-occurrence information (e.g. Ravichandran et al., '05; Paşca and Dienes, '05; Van Durme and Lall, '10)
• Extract potential paraphrases by grouping English phrases that share the same foreign translations (e.g. Bannard and Callison-Burch, '05)
[Figure: English-German sentence pairs in which "thrown into jail" and "imprisoned" both appear opposite "festgenommen"]

Pros and Cons
• Bilingual Paraphrase Extraction (bilingual pivoting)
  − Incorrect word alignment
  − High paraphrase probability ≠ good paraphrases
  + Produces manageable paraphrase candidate set → scalability
  + Less prone to antonym issues
  + Multiple languages as pivots
• Monolingual Reranking (distributional context vector)
  − Antonyms, cousins (e.g. rise, fall; boy, girl)
  − Hard to generate candidate set from corpus
  + Prefers grammatically similar phrases
  + Training data availability

Scoring Methods
• MonoDS – Monolingual Distributional Similarity
  – Distributional context vector built from monolingual n-gram corpus
  – Implemented via locality sensitive hashing (LSH)
• BiP – Bilingual Pivoting
  – Phrase table and translation model from French-English parallel corpus
• SyntBiP – Syntactically-constrained Bilingual Pivoting
  – Identical to BiP except paraphrase candidates share the same syntactic type

MonoDS
• Based on distributional similarity
• Baseline approach: distributional context vector
  – Neighboring unigram frequency
  – For each phrase x, e.g. x = "dog"
    • Every n-gram feature of the form: ("the dog", count = 1707225) or ("dog barks", count = 8654)
    • Gives rise to a (feature, value) pair: (w_{i-1} = "the", 1707225) or (w_{i+1} = "barks", 8654)
    • Vector size: 2|V|, |V| = size of vocabulary
  – Similarity: standard cosine similarity → MonoDS score

    \mathrm{sim}(x, y) = \frac{v_x \cdot v_y}{\lVert v_x \rVert \, \lVert v_y \rVert}
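
The context-vector construction and cosine score above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the ("L"/"R", word) feature encoding are assumptions, and only the two "dog" counts come from the slide; the "cat" counts are made up for the example.

```python
import math
from collections import defaultdict


def context_vector(phrase, ngram_counts):
    """Build a sparse vector of left/right unigram-neighbor counts for `phrase`.

    `ngram_counts` maps a space-separated n-gram to its corpus count.
    Features are ("L", word) for w_{i-1} and ("R", word) for w_{i+1}.
    """
    target = phrase.split()
    vec = defaultdict(int)
    for ngram, count in ngram_counts.items():
        tokens = ngram.split()
        if len(tokens) != len(target) + 1:
            continue  # only n-grams with exactly one extra neighbor word
        if tokens[1:] == target:   # neighbor sits to the left of the phrase
            vec[("L", tokens[0])] += count
        if tokens[:-1] == target:  # neighbor sits to the right of the phrase
            vec[("R", tokens[-1])] += count
    return dict(vec)


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(val * v.get(feat, 0) for feat, val in u.items())
    norm_u = math.sqrt(sum(val * val for val in u.values()))
    norm_v = math.sqrt(sum(val * val for val in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a phrase with no observed context has no usable score
    return dot / (norm_u * norm_v)


# Toy counts: the two "dog" entries are from the slide, the rest are hypothetical.
TOY_COUNTS = {
    "the dog": 1707225,
    "dog barks": 8654,
    "the cat": 900000,
    "cat barks": 10,
    "cat purrs": 5000,
}

dog_vec = context_vector("dog", TOY_COUNTS)
cat_vec = context_vector("cat", TOY_COUNTS)
monods_score = cosine(dog_vec, cat_vec)
```

Storing the vectors sparsely matters because the full vector has 2|V| dimensions, almost all of them zero for any given phrase.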

MonoDS
• Limitations
  – Computational burden in candidate extraction → need to compute context vectors for all phrases in n-gram corpus
  – Susceptibility to antonyms
    Example:
    - Source phrase = reluctant
    - Candidates generated by
      - bilingual paraphrase method
      - hand-selected phrases
    sim(reluctant, willing) > sim(reluctant, disinclined)

BiP - Bilingual Paraphrase Extraction
• Idea
  – Identify paraphrases by pivoting through foreign phrases in bilingual parallel corpora
  – Objective (e_1: phrase, e_2: paraphrase):

    \hat{e}_2 = \arg\max_{e_2 \neq e_1} p(e_2 \mid e_1)

  – Paraphrase probability p(e_2 \mid e_1), computed from the translation probabilities p(f \mid e_1) and p(e_2 \mid f):

    p(e_2 \mid e_1) = \sum_f p(e_2, f \mid e_1)
                    = \sum_f p(e_2 \mid f, e_1) \, p(f \mid e_1)
                    \approx \sum_f p(e_2 \mid f) \, p(f \mid e_1)

[Figure: sentence pairs sharing the pivot phrase "festgenommen"]
  ... last week five farmers were thrown into jail in Ireland because they resisted ...
  ... letzte Woche wurden in Irland fünf Landwirte festgenommen, weil sie verhindern wollten ...
  Zahlreiche Journalisten sind verschwunden oder wurden festgenommen, gefoltert und getötet.
  Quite a few journalists have disappeared or have been imprisoned, tortured and killed.
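
The pivoting sum above, p(e2 | e1) ≈ Σ_f p(e2 | f) p(f | e1), can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names and the nested-dict layout of the translation tables are hypothetical, the probabilities are made up, and the second pivot phrase "inhaftiert" does not appear in the slides.

```python
from collections import defaultdict


def pivot_paraphrases(e1, p_f_given_e, p_e_given_f):
    """Return {e2: p(e2 | e1)} using the sum over foreign pivot phrases f.

    p_f_given_e[e][f] holds p(f | e); p_e_given_f[f][e] holds p(e | f).
    """
    scores = defaultdict(float)
    for f, p_f in p_f_given_e.get(e1, {}).items():
        for e2, p_e2 in p_e_given_f.get(f, {}).items():
            if e2 != e1:  # the objective excludes the phrase itself
                scores[e2] += p_e2 * p_f
    return dict(scores)


def best_paraphrase(e1, p_f_given_e, p_e_given_f):
    """Return the arg max over e2 != e1 of p(e2 | e1), or None if no candidate exists."""
    scores = pivot_paraphrases(e1, p_f_given_e, p_e_given_f)
    return max(scores, key=scores.get) if scores else None


# Hypothetical translation probabilities, for illustration only.
P_F_GIVEN_E = {
    "thrown into jail": {"festgenommen": 0.6, "inhaftiert": 0.4},
}
P_E_GIVEN_F = {
    "festgenommen": {"thrown into jail": 0.3, "imprisoned": 0.5, "arrested": 0.2},
    "inhaftiert": {"thrown into jail": 0.2, "imprisoned": 0.7, "detained": 0.1},
}

scores = pivot_paraphrases("thrown into jail", P_F_GIVEN_E, P_E_GIVEN_F)
top = best_paraphrase("thrown into jail", P_F_GIVEN_E, P_E_GIVEN_F)
```

Because every English phrase reachable through any pivot receives some probability mass, a single bad alignment is enough to introduce a spurious candidate, which is one motivation for reranking.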

SyntBiP – Syntactic Bilingual Paraphrasing
• In addition,
  – Identify paraphrases of the same syntactic type by pivoting through foreign phrases with the specific syntactic type in bitext
  – Objective (e_1: phrase, e_2: paraphrase, s: syntactic type, e.g. VP):

    \hat{e}_2 = \arg\max_{e_2 \neq e_1} p(e_2 \mid e_1, s)

  – Paraphrase probability p(e_2 \mid e_1, s), computed from the translation probabilities p(f \mid e_1, s) and p(e_2 \mid f, s):

    p(e_2 \mid e_1, s) = \sum_f p(e_2, f \mid e_1, s)
                       = \sum_f p(e_2 \mid f, e_1, s) \, p(f \mid e_1, s)
                       \approx \sum_f p(e_2 \mid f, s) \, p(f \mid e_1, s)

[Figure: sentence pairs sharing the pivot phrase "festgenommen", with the aligned phrases labeled VP]
  ... last week five farmers were thrown into jail in Ireland because they resisted ...
  ... letzte Woche wurden in Irland fünf Landwirte festgenommen, weil sie verhindern wollten ...
  Zahlreiche Journalisten sind verschwunden oder wurden festgenommen, gefoltert und getötet.
  Quite a few journalists have disappeared or have been imprisoned, tortured and killed.

Bilingual Paraphrase Extraction
• Less susceptible to antonyms
  – Should alleviate the antonym problem → antonyms don't usually share foreign translations
  – Previous example: [table]

Paraphrasing Pipeline: bilingual paraphrase acquisition
[Diagram: Bilingual parallel corpus → Grammar Extraction → Phrase table → Bilingual Pivoting → Paraphrase table → phrases & paraphrases, BiP scores (probabilities)]
• Paraphrase model
  - Europarl French-English corpus (Koehn, 2005)
  - 1.3M sentences, 34M words on English side
  - Word alignment: Berkeley aligner
  - Syntactic information: Stanford parser
  - Phrase table and probabilities: Thrax grammar extractor

Paraphrasing Pipeline: monolingual features construction
[Diagram: Monolingual n-gram corpus and phrases & paraphrases → Context Vector Construction → LSH Signatures → Approximate Cosine Similarity → MonoDS scores]
• Context vectors
  - Latest version of Google n-gram (Lin et al., 2010)
  - Max 5-gram in corpus → source phrases up to 4-gram
  - 3.8 billion n-grams: scalability problem → Locality Sensitive Hashing (Indyk & Motwani, '98; Charikar, '02)
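
To make the LSH step concrete, here is a minimal sketch of random-hyperplane signatures in the style of Charikar ('02), where the Hamming distance between two bit signatures estimates the angle between the original vectors. The slides do not give the actual configuration, so the function names, the number of bits, and the seed are all assumptions for illustration.

```python
import math
import random


def make_hyperplanes(features, num_bits, seed=0):
    """Draw one random Gaussian hyperplane per signature bit."""
    rng = random.Random(seed)
    ordered = sorted(features)  # fixed order keeps the draw reproducible
    return [{f: rng.gauss(0.0, 1.0) for f in ordered} for _ in range(num_bits)]


def signature(vec, hyperplanes):
    """Bit i records which side of hyperplane i the sparse vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(val * plane[feat] for feat, val in vec.items())
        bits.append(1 if dot >= 0 else 0)
    return bits


def approx_cosine(sig_a, sig_b):
    """Estimate cosine similarity from the fraction of differing bits.

    Two vectors at angle theta disagree on a random hyperplane with
    probability theta / pi, so theta is estimated as pi * hamming / num_bits.
    """
    hamming = sum(a != b for a, b in zip(sig_a, sig_b))
    return math.cos(math.pi * hamming / len(sig_a))


# Hypothetical two-feature example; the exact cosine of u and v is about 0.707.
planes = make_hyperplanes(["a", "b"], num_bits=2048, seed=1)
u = {"a": 1.0, "b": 0.0}
v = {"a": 1.0, "b": 1.0}
estimate = approx_cosine(signature(u, planes), signature(v, planes))
```

The appeal for a corpus of this size is that each phrase is reduced to a fixed-length bit string, so comparing two phrases costs a Hamming distance rather than a dot product over a 2|V|-dimensional vector. More bits give a tighter estimate at the cost of storage.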

Paraphrasing Pipeline: overview
[Diagram: bilingual paraphrase acquisition (Bilingual parallel corpus → Grammar Extraction → Phrase table → Bilingual Pivoting → Paraphrase table → BiP scores) passes phrases & paraphrases to monolingual features construction (Monolingual n-gram corpus → Context Vector Construction → LSH Signatures → Approximate Cosine Similarity → MonoDS scores)]

Example: huge amount of

    BiP                       SyntBiP                   BiP + MonoDS
    large number of, 0.33     large number of, 0.38     huge amount of, 1.0
    in large numbers, 0.11    great number of, 0.09     large quantity of, 0.98
    great number of, 0.08     huge amount of, 0.06      large number of, 0.98
    large numbers of, 0.06    vast number of, 0.06      great number of, 0.97
    vast number of, 0.06                                vast number of, 0.94
    huge amount of, 0.06                                in large numbers, 0.10
    large quantity of, 0.03                             large numbers of, 0.08

• BiP-MonoDS grammaticality error
  – Less weight on grammaticality error
• Coverage of SyntBiP
  – Limited coverage due to syntactic type constraint

Human Evaluation
• Substitution test
• Amazon Mechanical Turk
• Source phrase substituted by each of the paraphrase candidates
• Judge separately the preservation of meaning and grammaticality on a 5-point scale
• 3 repetitions per human intelligence task (HIT), results averaged
• 5-point scale

    Score  Meaning                  Grammar
    5      perfect                  perfect
    4      minor differences        okay but awkward
    3      moderate differences     one error
    2      substantially different  many errors
    1      completely different     ungrammatical

Human Evaluation
• Substitution test example
  Original: the sea level will rise by about 14 inches instead of 39 .

    #  Paraphrase                                                 Meaning  Grammar
    1  that will rise by about 14 inches instead of 39 .          2        4
    2  the rise will rise by about 14 inches instead of 39 .      2        4
    3  sea level will rise by about 14 inches instead of 39 .     5        4
    4  the will rise by about 14 inches instead of 39 .           2        2
    5  sea levels will rise by about 14 inches instead of 39 .    5        5

Human Evaluation
• Dataset
  – 100 source phrases
    • 25 phrases per n-gram, n = 1, 2, 3, 4
    • Randomly sampled from paraphrase tables generated by BiP and SyntBiP
    • 5 sentences per phrase, 500 total
• Each scoring method (BiP, SyntBiP, MonoDS) evaluated with:
  – Averages of meaning and grammaticality scores
  – Rank correlation with human judgment (Kendall Tau-b)
    τ_B = 1: perfectly correlated
    τ_B > 0: positively correlated
    τ_B < 0: negatively correlated
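
For reference, Kendall Tau-b counts concordant and discordant pairs and corrects the denominator for ties, which matters here because averaged 5-point ratings tie often. The sketch below is a plain O(n²) version written for illustration; the function name and the sample scores are made up, and in practice a library routine such as `scipy.stats.kendalltau` would normally be used instead.

```python
import math
from collections import Counter


def kendall_tau_b(xs, ys):
    """Kendall Tau-b rank correlation between two equal-length score lists."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
            # pairs tied in either list count as neither
    n0 = n * (n - 1) // 2                                      # all pairs
    n1 = sum(t * (t - 1) // 2 for t in Counter(xs).values())   # pairs tied in xs
    n2 = sum(t * (t - 1) // 2 for t in Counter(ys).values())   # pairs tied in ys
    denom = math.sqrt((n0 - n1) * (n0 - n2))
    if denom == 0:
        return float("nan")  # undefined when one list is constant
    return (concordant - discordant) / denom


# Hypothetical model scores against hypothetical averaged human ratings.
model_scores = [0.33, 0.11, 0.08, 0.06]
human_ratings = [4.7, 2.0, 4.3, 2.3]
tau = kendall_tau_b(model_scores, human_ratings)
```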

Correlation with Human Judgments
[Figure: bar chart of Kendall Tau-b (y-axis, 0 to 0.3) for Meaning and Grammar; series: BiP, BiP-MonoDS, SyntBiP, SyntBiP-MonoDS; * : p ≥ 0.01]

Thresholding by Score
[Figure: two panels, BiP and MonoDS, plotting average Meaning and Grammar scores against bin cutoff; reference lines mark Grammar: okay but awkward, Grammar: one error, Meaning: minor differences, Meaning: moderate differences, Meaning: substantially different]
• Bilingual ⇔ Meaning
• MonoDS ⇔ Grammar

Combining MonoDS and BiP Thresholds
• Joint Thresholding
  – Tighter threshold → higher paraphrase quality
  – Additional gain in score over either threshold alone
  – High-precision paraphrases retrieved at the expense of coverage
[Figure: Meaning, Grammar, and Number of Items under joint thresholds BiP ≥ 6.7e-3, ≥ 0.01, ≥ 0.05 and MonoDS ≥ 0.7, ≥ 0.8, ≥ 0.9]
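
Joint thresholding amounts to keeping a candidate only if it clears both cutoffs. The sketch below illustrates the precision/coverage trade-off using the "huge amount of" candidates from the Example slide. Two assumptions: the `Candidate` type and `joint_threshold` function are hypothetical names, and the values in the slide's "BiP + MonoDS" column are read here as MonoDS scores.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    phrase: str
    bip: float     # bilingual pivoting paraphrase probability
    monods: float  # monolingual distributional similarity score


def joint_threshold(candidates, min_bip, min_monods):
    """Keep candidates whose BiP and MonoDS scores both reach their cutoffs."""
    return [c for c in candidates if c.bip >= min_bip and c.monods >= min_monods]


# Scores for paraphrases of "huge amount of", taken from the Example slide
# (MonoDS values assumed to be the "BiP + MonoDS" column).
CANDIDATES = [
    Candidate("large number of", 0.33, 0.98),
    Candidate("in large numbers", 0.11, 0.10),
    Candidate("great number of", 0.08, 0.97),
    Candidate("large numbers of", 0.06, 0.08),
    Candidate("vast number of", 0.06, 0.94),
    Candidate("large quantity of", 0.03, 0.98),
]

loose = joint_threshold(CANDIDATES, min_bip=0.01, min_monods=0.7)
tight = joint_threshold(CANDIDATES, min_bip=0.05, min_monods=0.9)
```

On this toy list the tighter setting drops "large quantity of", whose BiP score falls below 0.05 even though its MonoDS score is high. That is the coverage cost of demanding agreement from both scores.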
[Figure annotation: Grammar = 4 corresponds to "okay but awkward"]

Effects of Bilingual Training Corpus
• British vs American English
  – {legalise, legalize}: ↑ Meaning, Grammar; ↑ BiP + MonoDS; ↓ BiP
  – BiP: trained on corpus with British English
• Incorrect word alignment
  – {considerable changes, caused quite}: ↓ Meaning, Grammar; ↓ BiP + MonoDS; ↑ BiP
  – Incorrect translation from misalignment propagates through bilingual paraphrase model
      considerable changes → modifie considérablement
      modifie considérablement → caused quite (misaligned)
→ MonoDS alleviates problems caused by bilingual model training

Context Influence
• {hauled, delivered}: ↑ Meaning, Grammar; ↑ BiP + MonoDS; ↓ BiP
    countries which do not comply with community legislation should be hauled before the court of justice …
→ MonoDS takes advantage of context information

Context Influence
• {fiscal burden, taxes}: ↑ BiP; ↓ BiP + MonoDS
    (1) … the member states can reduce the fiscal burden consisting of taxes and social contributions . (↓ Meaning, Grammar)
    (2) ... and , in addition , the fiscal burden in europe has reached an all-time high of 46 % (↑ Meaning, Grammar)
→ MonoDS is more useful in context that defines the context vector

MonoDS Limitations
• Feature sparsity
  – {to deal properly with, address}: ↑ Meaning, Grammar; ↑ BiP; ↓ BiP + MonoDS
    • Non-zero features for "to deal properly with":
        to deal properly with the   709
    • Non-zero features for "address": 43120 n-grams with unigram neighbor, e.g.
        address lawsuits    36
        operators address   247
        today address       685
        address judicial    341
  – {you have just stated, you have just suggested}: ↑ Meaning, Grammar; ↑ BiP; ↓ BiP + MonoDS
    • MonoDS score = 0.05
    • MonoDS score of {stated, suggested} = 0.91
→ MonoDS suffers from feature sparsity for higher n-grams

Summary
• Presented a novel paraphrase acquisition method
  – Generate paraphrase candidates using bilingual pivoting
  – Rerank with monolingual distributional similarity and paraphrase probabilities
• Findings through analysis on human judgment tasks:
  – Combining monolingual and bilingual information → substantially improves paraphrase quality
  – Monolingual distributional similarity constrains structural match → MonoDS score correlates strongly with grammaticality
  – Joint threshold on MonoDS and BiP scores → high-precision paraphrases at expense of coverage

Summary
• Future directions
  – Explore better models for MonoDS to enhance meaning preservation
    • Experiment with richer features
      – Increased number of neighboring words
      – Syntactic dependency parsing (Lin, 1997)
      – Mutual information between phrasal co-occurrences (Church and Hanks, 1991)
  – Beyond joint thresholding
    • Systematically combine different scores and additional features
      – SVM training
  – Large-scale paraphrase dataset… stay tuned
  – Investigate domain-specific paraphrasing

Thank you
Questions?