
Path-based vs. Distributional Information
in Recognizing Lexical Semantic Relations
Vered Shwartz and Ido Dagan
Bar-Ilan University
December 12, 2016
Vered Shwartz (Bar-Ilan University)
LexNET
CogALex 2016
Lexical Semantic Relations
Some interesting lexical semantic relations:
Synonymy: (elevator, lift)
Hypernymy (Hyponymy): (green, color), (Obama, president)
Meronymy (Holonymy): (London, England), (hand, body)
Antonymy: (cold, hot)
etc.
Relations are used to infer one word from another in certain contexts:
I ate an apple → I ate a fruit
I hate fruit → I hate apples
I visited Tokyo → I visited Japan
I left Tokyo ↛ I left Japan
Recognizing Lexical Semantic Relations
Given two terms, x and y, decide which semantic relation (if any) holds between them
in some senses of x and y
e.g. both fruit and company are hypernyms of apple
Example Motivation - Recognizing Textual Entailment
Text
A boy is hitting a baseball
Hypotheses
1. A child is hitting a baseball ⇒ ENTAILMENT: hypernym:(boy, child)
2. A boy is missing a baseball ⇒ CONTRADICTION: antonym:(hitting, missing)
3. A girl is hitting a baseball ⇒ NEUTRAL: co-hyponym:(boy, girl)
What’s in the talk?
Overview of prior path-based and distributional methods for
semantic relation classification
...LSTM-based architecture for semantic relation classification
(sorry!)
Analysis of the contribution of each information source to the
classification
Prior Methods
Semantic Relation Classification
Corpus-based Methods:
    Distributional
    Path-based
Distributional Methods
Recognize the relation between x and y based on their separate occurrences in the corpus
Namely: based on their neighbor distributions [Harris, 1954]
In earlier methods, words are represented as count-based vectors:

    elevator  0.56  0  ...  0.89  0  ...  0
    lift      0.55  0  ...  0.91  0  ...  0
               ↑              ↑
               up           stairs

Recent years: words are represented using low-dimensional word embeddings [Mikolov et al., 2013, Pennington et al., 2014]
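The distributional hypothesis above can be illustrated with a toy cosine-similarity computation over the count-based vectors from the slide; the vector entries (weights for the contexts "up" and "stairs") are the illustrative numbers shown above, not real corpus counts.

```python
import math

def cosine(u, v):
    # cosine similarity between two context-count vectors
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# non-zero context dimensions only: [up, stairs]
elevator = [0.56, 0.89]
lift     = [0.55, 0.91]

print(cosine(elevator, lift))  # close to 1.0: near-synonyms share neighbors
```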
Supervised Distributional Methods
(x, y) term pairs are represented as a feature vector based on the terms' embeddings:
Concatenation ~x ⊕ ~y [Baroni et al., 2012]
Difference ~y − ~x [Roller et al., 2014, Weeds et al., 2014]
A classifier is trained to predict the semantic relation between x and y
Achieved very good results on common datasets
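The two pair-representation schemes above can be sketched in a few lines; the embedding values here are made-up toy numbers, and a real setup would load pre-trained word2vec/GloVe vectors and train an off-the-shelf classifier on these features.

```python
import numpy as np

# toy 3-dimensional word embeddings (illustrative values only)
emb = {
    "apple": np.array([0.1, 0.3, -0.2]),
    "fruit": np.array([0.2, 0.4, -0.1]),
}

def concat_features(x, y):
    # ~x ⊕ ~y  (Baroni et al., 2012)
    return np.concatenate([emb[x], emb[y]])

def diff_features(x, y):
    # ~y − ~x  (Roller et al., 2014; Weeds et al., 2014)
    return emb[y] - emb[x]

print(concat_features("apple", "fruit").shape)  # (6,)
print(diff_features("apple", "fruit"))
```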
Supervised Distributional Methods
[Levy et al., 2015]: supervised distributional methods do not learn the relation between x and y
Instead, they memorize single words that occur in the same relation
e.g. that fruit or animal are prototypical hypernyms
Recent work suggests these methods actually do more than memorize [Roller and Erk, 2016]
But they still provide only the prior of x or y to fit the relation [Shwartz et al., 2016]
Path-based Methods
Recognize the relation between x and y based on their joint
occurrences in the corpus
Patterns connecting x and y may indicate the semantic relation that
holds between them
[Hearst, 1992] defined a set of patterns that indicate hypernymy, e.g.:
X or other Y
X is a Y
Y, including X
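A minimal surface-pattern matcher in the spirit of the Hearst patterns listed above can be written with plain regular expressions; the pattern set here is a small illustrative subset, and the simple `\w+` capture groups are a sketch (a real extractor would operate over parsed noun phrases).

```python
import re

# (regex, pattern name); groups capture candidate terms
PATTERNS = [
    (re.compile(r"(\w+)s? or other (\w+)s?"), "X or other Y"),
    (re.compile(r"(\w+) is an? (\w+)"), "X is a Y"),
    (re.compile(r"(\w+)s?, including (\w+)s?"), "Y, including X"),
]

def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) candidate pairs found in the sentence."""
    pairs = []
    for regex, name in PATTERNS:
        for m in regex.finditer(sentence):
            if name == "Y, including X":      # hypernym comes first in this pattern
                pairs.append((m.group(2), m.group(1)))
            else:
                pairs.append((m.group(1), m.group(2)))
    return pairs

print(extract_hypernym_pairs("An apple is a fruit."))
```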
Path-based Methods
[Snow et al., 2004] represented patterns as the dependency paths connecting the words:
[Dependency parse of "apple is a fruit": NSUBJ(is, apple), DET(fruit, a), ATTR(is, fruit); POS: apple/NOUN is/VERB a/DET fruit/NOUN]
They learned a hypernymy classifier using paths as features
Methods inspired by [Snow et al., 2004] were broadly adopted for semantic relation classification [Snow et al., 2006, Turney, 2006, Riedel et al., 2013]
Path-based Methods
However, this method suffers from relatively low recall:
1. x and y must occur together in the corpus (enough times!)
2. paths connecting x and y must occur enough times in the corpus
[Necşulescu et al., 2015] tried to overcome (1) by concatenating paths, e.g. through a connecting word z.
Performance improved, but the method was still outperformed by a distributional baseline
PATTY [Nakashole et al., 2012] tried to overcome (2) by generalizing paths, e.g.: X is defined/described as Y ⇒ X is VERB as Y
Syntactic generalizations make semantic mistakes: X is rejected as Y
Integrated Methods
Path-based and distributional information sources are considered complementary in recognizing semantic relations
Classifiers with path-based and distributional features: [Mirkin et al., 2006, Pavlick et al., 2015]
HypeNET [Shwartz et al., 2016]: integrated path-based and distributional method for hypernymy detection
Improved neural path representation
Integrated with distributional features, and trained jointly
...works well also for multiple semantic relations (LexNET)
LexNET Architecture
Dependency Path Representation [Shwartz et al., 2016]
1. An edge is a concatenation of 4 component vectors:
   dependent lemma / dependent POS / dependency label / direction, e.g. be/VERB/ROOT/-
2. Edges are fed sequentially to an LSTM to get the path embedding ~op;
   the embeddings of all paths connecting x and y are average-pooled into ~vpaths(x,y)
Example paths:
   X/NOUN/nsubj> be/VERB/ROOT <Y/NOUN/attr
   X/NOUN/dobj> define/VERB/ROOT <as/ADP/prep <Y/NOUN/pobj
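The path representation above can be sketched with toy embedding tables: each edge is the concatenation of four component vectors, and the per-path vectors are average-pooled. To keep the sketch dependency-free, the LSTM is replaced here by a stand-in mean over edge vectors; all embedding values are random toy numbers, not the trained HypeNET parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def table(keys, dim):
    # toy embedding table: one random vector per symbol
    return {k: rng.standard_normal(dim) for k in keys}

lemma = table(["X", "be", "Y", "define", "as"], 4)
pos   = table(["NOUN", "VERB", "ADP"], 2)
dep   = table(["nsubj", "ROOT", "attr", "dobj", "prep", "pobj"], 2)
direc = table([">", "<", "-"], 1)

def edge_vec(l, p, d, r):
    # one edge = [lemma; POS; dependency label; direction]  (dim 4+2+2+1 = 9)
    return np.concatenate([lemma[l], pos[p], dep[d], direc[r]])

def path_vec(edges):
    # stand-in for the LSTM's final state: mean over the edge vectors
    return np.mean([edge_vec(*e) for e in edges], axis=0)

paths = [
    [("X", "NOUN", "nsubj", ">"), ("be", "VERB", "ROOT", "-"), ("Y", "NOUN", "attr", "<")],
    [("X", "NOUN", "dobj", ">"), ("define", "VERB", "ROOT", "-"),
     ("as", "ADP", "prep", "<"), ("Y", "NOUN", "pobj", "<")],
]

v_paths = np.mean([path_vec(p) for p in paths], axis=0)  # average pooling
print(v_paths.shape)  # (9,)
```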
Classification Models
1. Path-based
2. Distributional
3. Integrated
Path-based Model (PB)
A classifier is trained on the path embedding ~vpaths(x,y):
[Network diagram: ~vpaths(x,y) → (x, y) classification (softmax)]
Distributional Model (DS)
A classifier is trained on the concatenation of x and y's word embeddings [~vwx, ~vwy]:
[Network diagram: [~vwx, ~vwy] → (x, y) classification]
Integrated Model (LexNET)
A classifier is trained on the concatenation of the path embedding ~vpaths(x,y) and x and y's word embeddings [~vwx, ~vwy]
(HypeNET for multiple relations):
[Network diagram: [~vwx, ~vpaths(x,y), ~vwy] → (x, y) classification (softmax)]
Non-linear Distributional Model (DSh)
[Levy et al., 2015]: linear classifiers are incapable of learning interactions between x and y's features
How about adding non-linear expressive power? (i.e. a hidden layer...)
[Network diagram: [~vwx, ~vwy] → hidden layer ~h → (x, y) classification (softmax)]
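Adding non-linear expressive power as described above amounts to one hidden layer between the concatenated pair embedding and the softmax output. The sketch below uses toy dimensions and random weights (an untrained forward pass, not the actual DSh model).

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, n_relations = 6, 4, 3            # input dim = |vwx| + |vwy|

W1, b1 = rng.standard_normal((h, d)), np.zeros(h)
W2, b2 = rng.standard_normal((n_relations, h)), np.zeros(n_relations)

def softmax(z):
    z = z - z.max()                    # numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(vwx, vwy):
    x = np.concatenate([vwx, vwy])     # [vwx; vwy]
    hidden = np.tanh(W1 @ x + b1)      # the added non-linear hidden layer
    return softmax(W2 @ hidden + b2)   # distribution over relations

probs = classify(rng.standard_normal(3), rng.standard_normal(3))
print(probs)                           # sums to ~1.0
```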
Non-linear Integrated Model (LexNETh)
Similarly, we add a hidden layer to the integrated network:
[Network diagram: [~vwx, ~vpaths(x,y), ~vwy] → hidden layer ~h → (x, y) classification (softmax)]
Analysis
Evaluation (1/3)
We tested our models on common semantic relations datasets:
K&H+N [Necşulescu et al., 2015]
BLESS [Baroni and Lenci, 2011]
ROOT09 [Santus et al., 2016]
EVALution [Santus et al., 2015]
Each dataset contains several of the following semantic relations:
hypernymy, meronymy, co-hyponymy, event, attribute, synonymy, antonymy, random
Evaluation (2/3)
LexNET outperforms individual path-based and distributional methods

method                 |      K&H+N        |      BLESS        |      ROOT09       |    EVALution
                       |  P     R     F1   |  P     R     F1   |  P     R     F1   |  P     R     F1
Path-based (PB)        | 0.713 0.604 0.55  | 0.759 0.756 0.755 | 0.788 0.789 0.788 | 0.53  0.537 0.503
Distributional (DSh)   | 0.983 0.984 0.983 | 0.891 0.889 0.889 | 0.712 0.721 0.716 | 0.57  0.573 0.571
Integrated (LexNET)    | 0.985 0.986 0.985 | 0.894 0.893 0.893 | 0.813 0.814 0.813 | 0.601 0.607 0.6
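A plausible reason the F1 figures above are not the harmonic mean of the listed P and R is that scores are averaged per relation class (macro-style averaging); this is an assumption about the evaluation setup, illustrated here with made-up per-class numbers to show that the order of averaging matters.

```python
def f1(p, r):
    # harmonic mean of a single (precision, recall) pair
    return 2 * p * r / (p + r)

# made-up (precision, recall) for two relation classes
per_class = [(0.9, 0.1), (0.2, 0.9)]

avg_p = sum(p for p, _ in per_class) / len(per_class)
avg_r = sum(r for _, r in per_class) / len(per_class)
avg_f1 = sum(f1(p, r) for p, r in per_class) / len(per_class)

# F1 of the averages vs. average of the per-class F1s: they differ
print(round(f1(avg_p, avg_r), 3), round(avg_f1, 3))  # 0.524 0.254
```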
Evaluation (3/3)
Path-based contribution over distributional info is small in some datasets
The contribution is prominent in unbiased datasets (i.e. when lexical memorization is disabled)
Path-based models don't memorize words! e.g. they classify (cat, fruit) and (apple, animal) as random
Contribution of the Path-based Information Source
Path-based contribution over distributional info is prominent in the
following scenarios:
x or y are polysemous, e.g. mero:(piano, key).
the relation is not prototypical, e.g. event:(cherry, pick).
x or y are rare, e.g. hyper:(mastodon, proboscidean).
Thanks to the path representation, such relations are captured even
with a single meaningful co-occurrence of x and y
Analysis of Semantic Relations
All models perform poorly at recognizing synonyms and antonyms.
Path-based:
Synonyms do not tend to occur together
Antonyms occur in paths similar to those of co-hyponyms:
hot and cold, cats and dogs
Distributional:
Synonyms and antonyms occur in similar contexts:
"go down in the elevator/lift", "it is hot/cold today"
Problem not yet solved! Can we integrate additional information sources? :)
Recap
We presented LexNET, an adaptation of HypeNET
[Shwartz et al., 2016] for multiple semantic relations
LexNET outperforms the individual path-based / distributional models
The distributional source is dominant across most datasets
However, it learns the characteristics of a single word
Path-based information always contributes to the classification
It captures the relation between the words
Thank you!
References
Baroni, M., Bernardi, R., Do, N.-Q., and Shan, C.-c. (2012).
Entailment above the word level in distributional semantics.
In EACL, pages 23–32.
Pennington, J., Socher, R., and Manning, C. D. (2014).
GloVe: Global vectors for word representation.
In EMNLP, pages 1532–1543.
Baroni, M. and Lenci, A. (2011).
How we blessed distributional semantic evaluation.
In Proceedings of the GEMS 2011 Workshop on GEometrical Models
of Natural Language Semantics, pages 1–10.
Riedel, S., Yao, L., McCallum, A., and Marlin, B. M. (2013).
Relation extraction with matrix factorization and universal schemas.
In NAACL.
Harris, Z. S. (1954).
Distributional structure.
Word, 10(2-3):146–162.
Hearst, M. A. (1992).
Automatic acquisition of hyponyms from large text corpora.
In ACL, pages 539–545.
Levy, O., Remus, S., Biemann, C., and Dagan, I. (2015).
Do supervised distributional methods really learn lexical inference
relations?
In NAACL.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., and Dean, J.
(2013).
Distributed representations of words and phrases and their
compositionality.
In NIPS, pages 3111–3119.
Mirkin, S., Dagan, I., and Geffet, M. (2006).
Integrating pattern-based and distributional similarity methods for
lexical entailment acquisition.
In COLING and ACL, pages 579–586.
Nakashole, N., Weikum, G., and Suchanek, F. (2012).
PATTY: a taxonomy of relational patterns with semantic types.
In EMNLP and CoNLL, pages 1135–1145.
Necşulescu, S., Bel, N., Mendes, S., Jurgens, D., and Navigli, R.
(2015).
Reading between the lines: Overcoming data sparsity for accurate
classification of lexical relationships.
Lexical and Computational Semantics (*SEM 2015), page 182.
Pavlick, E., Bos, J., Nissim, M., Beller, C., Durme, B. V., and
Callison-Burch, C. (2015).
Adding semantics to data-driven paraphrasing.
In ACL.
Roller, S. and Erk, K. (2016).
Relations such as hypernymy: Identifying and exploiting hearst
patterns in distributional vectors for lexical entailment.
In Proceedings of EMNLP 2016.
Roller, S., Erk, K., and Boleda, G. (2014).
Inclusive yet selective: Supervised distributional hypernymy
detection.
In COLING, pages 1025–1036.
Santus, E., Lenci, A., Chiu, T.-S., Lu, Q., and Huang, C.-R. (2016).
Nine features in a random forest to learn taxonomical semantic
relations.
In LREC.
Santus, E., Yung, F., Lenci, A., and Huang, C.-R. (2015).
EVALution 1.0: an evolving semantic dataset for training and
evaluation of distributional semantic models.
ACL-IJCNLP 2015, page 64.
Shwartz, V., Goldberg, Y., and Dagan, I. (2016).
Improving hypernymy detection with an integrated path-based and
distributional method.
In Proceedings of ACL 2016 (Volume 1: Long Papers), pages
2389–2398.
Snow, R., Jurafsky, D., and Ng, A. Y. (2004).
Learning syntactic patterns for automatic hypernym discovery.
In NIPS.
Snow, R., Jurafsky, D., and Ng, A. Y. (2006).
Semantic taxonomy induction from heterogeneous evidence.
In ACL, pages 801–808.
Turney, P. D. (2006).
Similarity of semantic relations.
CL, 32(3):379–416.
Weeds, J., Clarke, D., Reffin, J., Weir, D., and Keller, B. (2014).
Learning to distinguish hypernyms and co-hyponyms.
In COLING, pages 2249–2259.