Unsupervised Acquisition of Axioms to Paraphrase Noun Compounds and Genitives
CICLING 2012, New Delhi
Anselmo Peñas, NLP & IR Group, UNED, Spain
Ekaterina Ovchinnikova, USC – Information Sciences Institute, USA
nlp.uned.es

Texts omit information
Humans optimize language generation effort
We omit information that we know the receiver is able to predict and recover
Our research goal is to make explicit the information omitted from texts

Implicit predicates
In particular, some noun compounds and genitives are used in this way
In these cases, we want to recover the implicit predicates
For example:
• Morning coffee -> coffee drunk in the morning
• Malaria mosquito -> mosquito that carries malaria

How to find the candidates?
Nakov & Hearst 2006: search the web
• N1 N2 -> N2 THAT * N1
• Malaria mosquito -> mosquito THAT * malaria
Here we use Proposition Stores
  Harvest a text collection that will serve as context
  Parse documents
  Count N-V-N, N-V-P-N, N-P-N, … structures
  Build Proposition Stores (Peñas & Hovy, 2010)

Proposition Stores
Example: propositions that relate "bomb" and "attack"
• npn:[bomb:n, in:in, attack:n]:13.
• nvpn:[bomb:n, explode:v, in:in, attack:n]:11.
• nvnpn:[bomb:n, kill:v, people:n, in:in, attack:n]:8.
• npn:[attack:n, with:in, bomb:n]:8.
• …
All of them could be paraphrases for the noun compound “bomb attack”

NE Semantic Classes
Now, what happens if we have a Named Entity?
Shakespeare’s tragedy -> write
Why?
Consider
• John’s tragedy
• Airbus’ tragedy

NE Semantic Classes
We are considering the “semantic classes” of the NE
Shakespeare -> writer
writer, tragedy -> write

Class-Instance relations
Fortunately, relevant semantic classes are pointed out in texts through well-known structures
• appositions, copulative verbs, “such as”, …
Here we take advantage of dependency parsing to get class-instance relations
[Figure: dependency patterns linking a proper noun (NNP) to a class noun (NN) through nn, appos, and the copula "be"]

Class-Instance relations
World News
has_instance(leader,'Yasir':'Arafat'):1491.
has_instance(spokesman,'Marlin':'Fitzwater'):1001.
has_instance(leader,'Mikhail':'S.':'Gorbachev'):980.
has_instance(chairman,'Yasir':'Arafat'):756.
has_instance(agency,'Tass'):637.
has_instance(leader,'Radovan':'Karadzic'):611.
has_instance(adviser,'Condoleezza':'Rice'):590.
…

So far
Propositions: <p,a> | P(p,a)
  p: predicate
  a: list of arguments <a1 … an>
  P(p,a): joint probability
Class-instance relations: <c,i> | P(c,i)
  c: class
  i: instance
  P(c,i): joint probability

Probability of a predicate
Let’s consider the following example
Favre pass
Assume the text has pointed out he is a quarterback
What is Favre doing with the pass?
The same as other quarterbacks
• The quarterbacks we observed before in the background collection – Proposition Store

Probability of a predicate
Favre pass -> p | P(p|i)
Favre -> quarterback | P(c|i)
quarterback, pass -> throw | P(p|c)
$$P(p \mid i) = \sum_{c \ni i} P(c \mid i)\, P(p \mid c)$$
We already have:
$$P(c \mid i) = \prod_{k=1}^{n} P(c_k \mid i_k)$$
We need to estimate: P(p|c) (what other quarterbacks do with passes)

Probability of a predicate
quarterback pass -> p | P(p|c)
• Steve:Young pass -> throw | P(p|i)
• Culpepper pass -> complete | P(p|i)
• …
$$P(p \mid c) = \sum_{i \in c} P(i \mid c)\, P(p \mid i)$$
We already have
$$P(i \mid c) = \prod_{k=1}^{n} P(i_k \mid c_k)$$
and P(p|i) comes from previous observation: Proposition Store

Evaluation
We want to address the following questions
  Do we find the paraphrases required to enable Textual Entailment?
  Do all the noun-noun dependencies need to be paraphrased?
  How frequently do NEs appear in them?

Experimental setting
Proposition Store from 216,303 World News
7,800,000 sentences parsed
RTE-2 (Recognizing Textual Entailment)
  83 entailment decisions depend on noun-noun paraphrases
  77 different noun-noun paraphrases

Results
How frequently do NEs appear in these pairs?
  82% of paraphrases contain at least one NE
  62% are paraphrasing NE-N (e.g. Vikings quarterback)

Results
Do all the noun-noun dependencies need to be paraphrased?
No, only 54% in our test set
Some compounds encode semantic relations such as:
  12% are locative relations (e.g. New York club)
  Temporal relations (e.g. April 23rd strike, Friday semi-final)
  Class-instance relations (e.g. quarterback Favre)
  Measure, …
Some are trivial: 27% are paraphrased with “of”

Results
Do we find the paraphrases required to enable Textual Entailment?
Yes in 63% of non-trivial cases

Proposition type | Paraphrase
NPN              | Jackson trial ↔ trial against Jackson
                 | engine problem ↔ problem with engine
NVN              | U.S. Ambassador ↔ Ambassador represents the U.S.
                 | ETA bombing ↔ ETA carried_out bombing
NVNPN            | wife of Joseph Wilson ↔ wife is married to Joseph Wilson
NVPN             | Vietnam veteran ↔ veteran comes from Vietnam
                 | Shapiro’s office ↔ Shapiro works in office
                 | Germany's people ↔ people live in Germany
                 | Abu Musab al-Zarqawi's group ↔ group led by Abu Musab al-Zarqawi

Results
RTE-2 pair 485: Paraphrase not found
United Nations vehicle ↔ United Nations produces vehicles
United Nations doesn’t share any class with the instances that “produce vehicles”
Toyota vehicle -> develop, build, sell, produce, make, export, recall, assemble, …

Conclusions
A significant proportion of noun-noun dependencies include Named Entities
Some noun-noun dependencies don’t require the retrieval of implicit predicates
The proposed method is sensitive to different NEs
  Different NEs retrieve different predicates
Current work: to select the most relevant paraphrase according to the text
  We are exploring weighted abduction

Thanks!
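
Worked sketch: class-based predicate probability
The estimate from the "Probability of a predicate" slides can be illustrated with a small Python sketch. It covers only the single-argument case (n = 1). All counts, the dictionary names CLASS_INSTANCE and PROPOSITIONS, and the function names are hypothetical and invented for illustration; they are not taken from the authors' Proposition Store. The new instance (Favre) has no observations of its own, so its predicates are borrowed from other instances of the class that the text assigns to it.

```python
# Hypothetical toy counts, invented for illustration only.
# CLASS_INSTANCE[(c, i)]: how often instance i was observed with class c.
CLASS_INSTANCE = {
    ("quarterback", "Young"): 6,
    ("quarterback", "Culpepper"): 4,
}

# PROPOSITIONS[(i, p)]: how often instance i was observed with predicate p
# and the noun "pass" (e.g. "Young throws a pass").
PROPOSITIONS = {
    ("Young", "throw"): 3,
    ("Young", "complete"): 1,
    ("Culpepper", "throw"): 2,
    ("Culpepper", "complete"): 2,
}


def p_instance_given_class(instance, cls):
    """P(i|c): relative frequency of the instance among all instances of cls."""
    total = sum(n for (c, _), n in CLASS_INSTANCE.items() if c == cls)
    return CLASS_INSTANCE.get((cls, instance), 0) / total if total else 0.0


def p_pred_given_instance(pred, instance):
    """P(p|i): relative frequency of pred among predicates seen with instance."""
    total = sum(n for (i, _), n in PROPOSITIONS.items() if i == instance)
    return PROPOSITIONS.get((instance, pred), 0) / total if total else 0.0


def p_pred_given_class(pred, cls):
    """P(p|c) = sum over instances i of cls of P(i|c) * P(p|i)."""
    instances = {i for (c, i) in CLASS_INSTANCE if c == cls}
    return sum(
        p_instance_given_class(i, cls) * p_pred_given_instance(pred, i)
        for i in instances
    )


def p_pred_given_new_instance(pred, class_probs):
    """P(p|i) = sum over classes c of P(c|i) * P(p|c).

    class_probs maps each class of the new instance to P(c|i).
    """
    return sum(pc * p_pred_given_class(pred, c) for c, pc in class_probs.items())


# Assumption: the text has pointed out that Favre is a quarterback.
FAVRE_CLASSES = {"quarterback": 1.0}
ranking = sorted(
    ["throw", "complete"],
    key=lambda p: p_pred_given_new_instance(p, FAVRE_CLASSES),
    reverse=True,
)
print(ranking)
```

With these toy counts, "throw" is ranked above "complete" for "Favre pass", because the observed quarterbacks throw passes more often than they complete them.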