Path-based vs. Distributional Information in Recognizing Lexical Semantic Relations
Vered Shwartz and Ido Dagan, Bar-Ilan University
CogALex 2016, December 12, 2016

Lexical Semantic Relations

Some interesting lexical semantic relations:
Synonymy: (elevator, lift)
Hypernymy (Hyponymy): (green, color), (Obama, president)
Meronymy (Holonymy): (London, England), (hand, body)
Antonymy: (cold, hot)
etc.

Relations are used to infer one word from another in certain contexts:
I ate an apple → I ate a fruit
I hate fruit → I hate apples
I visited Tokyo → I visited Japan
I left Tokyo ↛ I left Japan

Recognizing Lexical Semantic Relations

Given two terms x and y, decide which semantic relation (if any) holds between them, in some senses of x and y.
e.g. both fruit and company are hypernyms of apple

Example Motivation - Recognizing Textual Entailment

Text: A boy is hitting a baseball
Hypotheses:
1 A child is hitting a baseball ⇒ ENTAILMENT: hypernym:(boy, child)
2 A boy is missing a baseball ⇒ CONTRADICTION: antonym:(hitting, missing)
3 A girl is hitting a baseball ⇒ NEUTRAL: co-hyponym:(boy, girl)

What's in the talk?

Overview of prior path-based and distributional methods for semantic relation classification
...an LSTM-based architecture for semantic relation classification (sorry!)
Analysis of the contribution of each information source to the classification

Prior Methods

Semantic Relation Classification

Corpus-based methods divide into two families: distributional and path-based.

Distributional Methods

Recognize the relation between x and y based on their separate occurrences in the corpus
Namely: based on their neighbor distributions [Harris, 1954]
In earlier methods, words are represented as count-based vectors, e.g.:

             ...   up     ...   stairs   ...
elevator     0     0.56   0     0.89     0
lift         0     0.55   0     0.91     0

Recent years: words are represented using low-dimensional word embeddings [Mikolov et al., 2013, Pennington et al., 2014]

Supervised Distributional Methods

(x, y) term-pairs are represented as a feature vector, based on the terms' embeddings:
Concatenation x ⊕ y [Baroni et al., 2012]
Difference y − x [Roller et al., 2014, Weeds et al., 2014]
A classifier is trained to predict the semantic relation between x and y
Achieved very good results on common datasets
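As a concrete illustration of these supervised baselines, the sketch below builds a pair feature vector from pre-trained embeddings by concatenation or vector difference and trains a multi-class classifier. The `embeddings` lookup and `train_pairs` list are assumed inputs, and logistic regression stands in for whichever classifier each cited paper actually used; this is a minimal sketch, not a reproduction of any specific system.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pair_features(x_vec, y_vec, mode="concat"):
    """Represent a term pair by its word embeddings, as in the supervised
    distributional baselines: concatenation (Baroni et al., 2012) or
    vector difference (Roller et al., 2014; Weeds et al., 2014)."""
    if mode == "concat":
        return np.concatenate([x_vec, y_vec])
    if mode == "diff":
        return y_vec - x_vec
    raise ValueError(f"unknown mode: {mode}")

def train_pair_classifier(embeddings, train_pairs, mode="concat"):
    """embeddings: dict word -> vector (assumed given, e.g. pre-trained GloVe).
    train_pairs: list of ((x, y), relation_label) tuples (assumed given)."""
    X = np.stack([pair_features(embeddings[x], embeddings[y], mode)
                  for (x, y), _ in train_pairs])
    y = [label for _, label in train_pairs]
    return LogisticRegression(max_iter=1000).fit(X, y)
```

Note that the features only look at x and y in isolation, which is exactly what the criticism on the next slides is about.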
Supervised Distributional Methods

[Levy et al., 2015]: supervised distributional methods do not learn the relation between x and y
Instead, they memorize single words that occur in the same relation, e.g. that fruit or animal are prototypical hypernyms
Recent work suggests these methods actually do more than memorize [Roller and Erk, 2016]
But they still provide only the prior of x or y to fit the relation [Shwartz et al., 2016]

Path-based Methods

Recognize the relation between x and y based on their joint occurrences in the corpus
Patterns connecting x and y may indicate the semantic relation that holds between them
[Hearst, 1992] defined a set of patterns that indicate hypernymy, e.g.:
X or other Y
X is a Y
Y, including X

[Snow et al., 2004] represented patterns as the dependency paths connecting the words, e.g. in "apple is a fruit" the path runs apple/NOUN --nsubj--> is/VERB <--attr-- fruit/NOUN
They learned a hypernymy classifier using paths as features
Methods inspired by [Snow et al., 2004] were broadly adopted for semantic relation classification [Snow et al., 2006, Turney, 2006, Riedel et al., 2013]
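To make the path representation concrete, here is a small sketch that extracts a Snow-style dependency path between two terms with spaCy, encoding each node as lemma/POS/dependency-label/direction. The model name, the edge-string format, and the direction convention are illustrative choices, not the exact preprocessing of any of the cited papers.

```python
import spacy

def path_to_root(tok):
    """Chain of tokens from tok up to the sentence root (inclusive)."""
    chain = [tok]
    while chain[-1].head.i != chain[-1].i:  # in spaCy, the root is its own head
        chain.append(chain[-1].head)
    return chain

def dependency_path(x_tok, y_tok):
    """Edges on the dependency path between two tokens, each encoded as
    lemma/POS/dep-label/direction, in the spirit of Snow et al. (2004)."""
    x_chain, y_chain = path_to_root(x_tok), path_to_root(y_tok)
    y_index = {t.i: k for k, t in enumerate(y_chain)}
    # lowest common ancestor: first token on x's chain that also lies on y's chain
    j = next(k for k, t in enumerate(x_chain) if t.i in y_index)
    lca = x_chain[j]
    up = [f"{t.lemma_}/{t.pos_}/{t.dep_}/>" for t in x_chain[:j]]
    top = [f"{lca.lemma_}/{lca.pos_}/{lca.dep_}/-"]
    down = [f"{t.lemma_}/{t.pos_}/{t.dep_}/<"
            for t in reversed(y_chain[:y_index[lca.i]])]
    return up + top + down

nlp = spacy.load("en_core_web_sm")   # assumes this model is installed
doc = nlp("An apple is a fruit.")
print(dependency_path(doc[1], doc[4]))  # apple ... fruit
# roughly: ['apple/NOUN/nsubj/>', 'be/AUX/ROOT/-', 'fruit/NOUN/attr/<']
# (exact POS tags and labels depend on the parser model)
```

Counting such path strings per (x, y) pair yields the sparse path features used by these classifiers.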
Path-based Methods

However, this method suffers from relatively low recall:
1 x and y must occur together in the corpus (enough times!)
2 paths connecting x and y must occur enough times in the corpus

[Necşulescu et al., 2015] tried to overcome (1) by concatenating paths, e.g. through a connecting word z
Performance improved, but the method was still outperformed by a distributional baseline
PATTY [Nakashole et al., 2012] tried to overcome (2) by generalizing paths, e.g.:
X is defined/described as Y ⇒ X is VERB as Y
Syntactic generalizations make semantic mistakes: X is rejected as Y
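The snippet below sketches this kind of purely syntactic generalization over the edge strings from the earlier example; replacing content-word lemmas by their POS is only one of PATTY's mechanisms (the system itself is richer, e.g. it also types the arguments), and the example shows exactly the conflation the slide warns about.

```python
def generalize(path):
    """POS-level generalization of a dependency path: drop the lemma of
    content words, keeping only POS/label/direction.  A rough sketch of the
    idea behind PATTY-style generalization, not its full algorithm."""
    out = []
    for edge in path:
        lemma, pos, dep, direction = edge.split("/")
        if pos in {"VERB", "ADJ", "ADV"}:   # generalize content words
            lemma = pos                      # e.g. define -> VERB
        out.append("/".join([lemma, pos, dep, direction]))
    return out

# Both paths collapse to the same generalized path ("X is VERB as Y"),
# which is exactly the semantic mistake described above:
p1 = ["X/NOUN/nsubjpass/>", "define/VERB/ROOT/-", "as/ADP/prep/<", "Y/NOUN/pobj/<"]
p2 = ["X/NOUN/nsubjpass/>", "reject/VERB/ROOT/-", "as/ADP/prep/<", "Y/NOUN/pobj/<"]
assert generalize(p1) == generalize(p2)
```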
Integrated Methods

Path-based and distributional information sources are considered complementary in recognizing semantic relations
Classifiers with path-based and distributional features: [Mirkin et al., 2006, Pavlick et al., 2015]
HypeNET [Shwartz et al., 2016]: an integrated path-based and distributional method for hypernymy detection
  Improved neural path representation
  Integrated with distributional features, and trained jointly
...works well also for multiple semantic relations (LexNET)

LexNET Architecture

Dependency Path Representation [Shwartz et al., 2016]

1 An edge is a concatenation of 4 component vectors: dependent lemma / dependent POS / dependency label / direction, e.g. be/VERB/ROOT/-
2 Edges are fed sequentially to an LSTM to obtain a path embedding, e.g. for X/NOUN/nsubj > be/VERB/ROOT < Y/NOUN/attr or X/NOUN/dobj > define/VERB/ROOT < as/ADP/prep < Y/NOUN/pobj
3 The embeddings of all paths connecting (x, y) are average-pooled into a single vector v_paths(x,y)

Classification Models

1 Path-based
2 Distributional
3 Integrated

Path-based Model (PB)
A classifier (softmax) is trained on the path embedding v_paths(x,y)

Distributional Model (DS)
A classifier is trained on the concatenation of x and y's word embeddings [v_wx, v_wy]

Integrated Model (LexNET)
A classifier is trained on the concatenation of the path embedding and the word embeddings [v_wx, v_paths(x,y), v_wy] (HypeNET applied to multiple relations)

Non-linear Distributional Model (DS_h)
[Levy et al., 2015]: linear classifiers are incapable of learning interactions between x and y's features
How about adding non-linear expressive power? (i.e. a hidden layer between the input and the softmax)

Non-linear Integrated Model (LexNET_h)
Similarly, we add a hidden layer to the integrated network
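The following is a minimal PyTorch sketch in the spirit of this integrated architecture, assuming edges have already been converted to (lemma, POS, dependency-label, direction) index tuples. The embedding sizes, the plain unweighted average pooling, the reuse of the lemma table for the x and y term vectors, and the tanh hidden layer (the LexNET_h variant) are illustrative choices, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class LexNETSketch(nn.Module):
    """Sketch of an LSTM path encoder with average pooling over paths,
    concatenated with the two term embeddings before a softmax classifier."""

    def __init__(self, n_lemmas, n_pos, n_deps, n_dirs, n_relations,
                 lemma_dim=50, pos_dim=4, dep_dim=5, dir_dim=1,
                 path_dim=60, hidden_dim=60):
        super().__init__()
        self.lemma_emb = nn.Embedding(n_lemmas, lemma_dim)
        self.pos_emb = nn.Embedding(n_pos, pos_dim)
        self.dep_emb = nn.Embedding(n_deps, dep_dim)
        self.dir_emb = nn.Embedding(n_dirs, dir_dim)
        edge_dim = lemma_dim + pos_dim + dep_dim + dir_dim
        self.path_lstm = nn.LSTM(edge_dim, path_dim, batch_first=True)
        # [v_wx ; v_paths(x,y) ; v_wy] -> hidden layer -> logits over relations
        self.hidden = nn.Linear(2 * lemma_dim + path_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_relations)

    def encode_path(self, path):
        # path: LongTensor of shape (path_length, 4) holding
        # (lemma id, POS id, dep-label id, direction id) for each edge
        edges = torch.cat([self.lemma_emb(path[:, 0]),
                           self.pos_emb(path[:, 1]),
                           self.dep_emb(path[:, 2]),
                           self.dir_emb(path[:, 3])], dim=-1)
        _, (h_n, _) = self.path_lstm(edges.unsqueeze(0))  # final hidden state
        return h_n.squeeze(0).squeeze(0)                  # shape: (path_dim,)

    def forward(self, x_id, y_id, paths):
        # paths: non-empty list of LongTensors, one per dependency path that
        # connects x and y in the corpus (the paper also handles pairs with
        # no paths via a special empty-path symbol, omitted here)
        path_vecs = torch.stack([self.encode_path(p) for p in paths])
        v_paths = path_vecs.mean(dim=0)                   # average pooling
        v_x = self.lemma_emb(torch.tensor([x_id])).squeeze(0)
        v_y = self.lemma_emb(torch.tensor([y_id])).squeeze(0)
        v_xy = torch.cat([v_x, v_paths, v_y])
        return self.out(torch.tanh(self.hidden(v_xy)))    # relation logits
```

Dropping `self.hidden` and feeding `v_xy` directly to a softmax layer gives the plain integrated model, and dropping `v_paths` or the term vectors recovers the DS and PB baselines, respectively.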
(x, y ) classification (softmax) ~vpaths(x,y ) ~h v~wy Vered Shwartz (Bar-Ilan University) LexNET CogALex 2016 25 / 33 Analysis Vered Shwartz (Bar-Ilan University) LexNET CogALex 2016 26 / 33 Evaluation (1/3) We tested our models on common semantic relations datasets: K&H+N [Necşulescu et al., 2015] BLESS [Baroni and Lenci, 2011] ROOT09 [Santus et al., 2016] EVALution [Santus et al., 2015] Vered Shwartz (Bar-Ilan University) LexNET CogALex 2016 27 / 33 Evaluation (1/3) We tested our models on common semantic relations datasets: K&H+N [Necşulescu et al., 2015] BLESS [Baroni and Lenci, 2011] ROOT09 [Santus et al., 2016] EVALution [Santus et al., 2015] Each dataset contains several semantic relations, among: hypernymy meroynymy co-hyponymy event attribute synonymy antonymy random Vered Shwartz (Bar-Ilan University) LexNET CogALex 2016 27 / 33 Evaluation (2/3) LexNET outperforms individual path-based and distributional methods method Path-based (PB) Distributional (DSh ) Integrated (LexNET) P 0.713 0.983 0.985 Vered Shwartz (Bar-Ilan University) K&H+N R F1 0.604 0.55 0.984 0.983 0.986 0.985 P 0.759 0.891 0.894 BLESS R F1 0.756 0.755 0.889 0.889 0.893 0.893 LexNET P 0.788 0.712 0.813 ROOT09 R F1 0.789 0.788 0.721 0.716 0.814 0.813 EVALution P R F1 0.53 0.537 0.503 0.57 0.573 0.571 0.601 0.607 0.6 CogALex 2016 28 / 33 Evaluation (2/3) LexNET outperforms individual path-based and distributional methods method Path-based (PB) Distributional (DSh ) Integrated (LexNET) P 0.713 0.983 0.985 Vered Shwartz (Bar-Ilan University) K&H+N R F1 0.604 0.55 0.984 0.983 0.986 0.985 P 0.759 0.891 0.894 BLESS R F1 0.756 0.755 0.889 0.889 0.893 0.893 LexNET P 0.788 0.712 0.813 ROOT09 R F1 0.789 0.788 0.721 0.716 0.814 0.813 EVALution P R F1 0.53 0.537 0.503 0.57 0.573 0.571 0.601 0.607 0.6 CogALex 2016 28 / 33 Evaluation (2/3) LexNET outperforms individual path-based and distributional methods method Path-based (PB) Distributional (DSh ) Integrated (LexNET) P 0.713 0.983 0.985 Vered Shwartz (Bar-Ilan University) K&H+N R F1 0.604 0.55 0.984 0.983 0.986 0.985 P 0.759 0.891 0.894 BLESS R F1 0.756 0.755 0.889 0.889 0.893 0.893 LexNET P 0.788 0.712 0.813 ROOT09 R F1 0.789 0.788 0.721 0.716 0.814 0.813 EVALution P R F1 0.53 0.537 0.503 0.57 0.573 0.571 0.601 0.607 0.6 CogALex 2016 28 / 33 Evaluation (2/3) LexNET outperforms individual path-based and distributional methods method Path-based (PB) Distributional (DSh ) Integrated (LexNET) P 0.713 0.983 0.985 Vered Shwartz (Bar-Ilan University) K&H+N R F1 0.604 0.55 0.984 0.983 0.986 0.985 P 0.759 0.891 0.894 BLESS R F1 0.756 0.755 0.889 0.889 0.893 0.893 LexNET P 0.788 0.712 0.813 ROOT09 R F1 0.789 0.788 0.721 0.716 0.814 0.813 EVALution P R F1 0.53 0.537 0.503 0.57 0.573 0.571 0.601 0.607 0.6 CogALex 2016 28 / 33 Evaluation (3/3) Path-based contribution over distributional info is small in some datasets: method Path-based (PB) Distributional (DSh ) Integrated (LexNET) P 0.713 0.983 0.985 Vered Shwartz (Bar-Ilan University) K&H+N R F1 0.604 0.55 0.984 0.983 0.986 0.985 P 0.759 0.891 0.894 BLESS R F1 0.756 0.755 0.889 0.889 0.893 0.893 LexNET P 0.788 0.712 0.813 ROOT09 R F1 0.789 0.788 0.721 0.716 0.814 0.813 EVALution P R F1 0.53 0.537 0.503 0.57 0.573 0.571 0.601 0.607 0.6 CogALex 2016 29 / 33 Evaluation (3/3) Contribution is prominent in unbiased datasets (i.e. 
Contribution of the Path-based Information Source

The path-based contribution over distributional information is prominent in the following scenarios:
x or y are polysemous, e.g. mero:(piano, key)
the relation is not prototypical, e.g. event:(cherry, pick)
x or y are rare, e.g. hyper:(mastodon, proboscidean)
Thanks to the path representation, such relations are captured even with a single meaningful co-occurrence of x and y

Analysis of Semantic Relations

All models are bad at recognizing synonyms and antonyms.
Path-based:
Synonyms do not tend to occur together
Antonyms occur in similar paths to co-hyponyms: hot and cold, cats and dogs
Distributional:
Synonyms and antonyms occur in similar contexts: "go down in the elevator/lift", "it is hot/cold today"
The problem is not yet solved! Can we integrate additional information sources? :)
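Per-relation weaknesses like the synonym and antonym cases above are easiest to spot from a per-label breakdown of the predictions. A small sketch, assuming parallel lists of gold and predicted relation labels; the weighted average shown is one common way to aggregate scores, not necessarily the exact averaging used for the table above.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

def evaluate(gold, predicted):
    """Print aggregate P/R/F1 plus a per-relation breakdown, where weak
    relations (e.g. synonymy, antonymy) become visible."""
    p, r, f1, _ = precision_recall_fscore_support(gold, predicted, average="weighted")
    print(f"P={p:.3f}  R={r:.3f}  F1={f1:.3f}")
    print(classification_report(gold, predicted, digits=3))
```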
Recap

We presented LexNET, an adaptation of HypeNET [Shwartz et al., 2016] for multiple semantic relations
LexNET outperforms the individual path-based / distributional models
The distributional source is dominant across most datasets
However, it learns the characteristics of a single word
Path-based information always contributes to the classification
It captures the relation between the words

Thank you!

References

Baroni, M., Bernardi, R., Do, N.-Q., and Shan, C.-c. (2012). Entailment above the word level in distributional semantics. In EACL, pages 23–32.
Baroni, M. and Lenci, A. (2011). How we BLESSed distributional semantic evaluation. In Proceedings of the GEMS 2011 Workshop on GEometrical Models of Natural Language Semantics, pages 1–10.
Harris, Z. S. (1954). Distributional structure. Word, 10(2-3):146–162.
Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In ACL, pages 539–545.
Levy, O., Remus, S., Biemann, C., and Dagan, I. (2015). Do supervised distributional methods really learn lexical inference relations? In NAACL.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111–3119.
Mirkin, S., Dagan, I., and Geffet, M. (2006). Integrating pattern-based and distributional similarity methods for lexical entailment acquisition. In COLING-ACL, pages 579–586.
Nakashole, N., Weikum, G., and Suchanek, F. (2012). PATTY: A taxonomy of relational patterns with semantic types. In EMNLP-CoNLL, pages 1135–1145.
Necşulescu, S., Bel, N., Mendes, S., Jurgens, D., and Navigli, R. (2015). Reading between the lines: Overcoming data sparsity for accurate classification of lexical relationships. In Lexical and Computational Semantics (*SEM 2015), page 182.
Pavlick, E., Bos, J., Nissim, M., Beller, C., Van Durme, B., and Callison-Burch, C. (2015). Adding semantics to data-driven paraphrasing. In ACL.
Pennington, J., Socher, R., and Manning, C. D. (2014). GloVe: Global vectors for word representation. In EMNLP, pages 1532–1543.
Riedel, S., Yao, L., McCallum, A., and Marlin, B. M. (2013). Relation extraction with matrix factorization and universal schemas. In NAACL.
Roller, S. and Erk, K. (2016). Relations such as hypernymy: Identifying and exploiting Hearst patterns in distributional vectors for lexical entailment. In EMNLP.
Roller, S., Erk, K., and Boleda, G. (2014). Inclusive yet selective: Supervised distributional hypernymy detection. In COLING, pages 1025–1036.
Santus, E., Lenci, A., Chiu, T.-S., Lu, Q., and Huang, C.-R. (2016). Nine features in a random forest to learn taxonomical semantic relations. In LREC.
Santus, E., Yung, F., Lenci, A., and Huang, C.-R. (2015). EVALution 1.0: An evolving semantic dataset for training and evaluation of distributional semantic models. In ACL-IJCNLP 2015, page 64.
Shwartz, V., Goldberg, Y., and Dagan, I. (2016). Improving hypernymy detection with an integrated path-based and distributional method. In ACL (Volume 1: Long Papers), pages 2389–2398.
Snow, R., Jurafsky, D., and Ng, A. Y. (2004). Learning syntactic patterns for automatic hypernym discovery. In NIPS.
Snow, R., Jurafsky, D., and Ng, A. Y. (2006). Semantic taxonomy induction from heterogenous evidence. In ACL, pages 801–808.
Turney, P. D. (2006). Similarity of semantic relations. Computational Linguistics, 32(3):379–416.
Weeds, J., Clarke, D., Reffin, J., Weir, D., and Keller, B. (2014). Learning to distinguish hypernyms and co-hyponyms. In COLING, pages 2249–2259.