Classification of Entailment Relations in PPDB

1 Overview
This document outlines our protocol for labeling noun pairs according to the entailment relations proposed by Bill MacCartney in his 2009 thesis on Natural Language Inference. Our purpose in doing this is to build a labelled data set with which to train a classifier for differentiating between these relations. The classifier can be used to assign probabilities of each relation to the paraphrase rules in PPDB, making PPDB a more informative resource for downstream tasks such as recognizing textual entailment (RTE).
[Figure 2 (graphic): MacCartney’s relations mapped onto the Wordnet hierarchy, pairing each relation with its Wordnet analogue and an example pair: equivalence/synonym (couch, sofa), forward entailment/hyponymy (crow, bird), reverse entailment/hypernymy (Asian, Thai), negation/antonym (able, unable), alternation/shared hypernym (cat, dog, via feline/canine under carnivore), and independence/no path.]
2 Entailment Relations
MacCartney’s thesis proposes a method for partitioning all of the possible relationships that can exist between two phrases into 16 distinct relationships. Of these, he defines the 7 non-degenerate relations as the “basic entailment relationships.” MacCartney’s table explaining these 7 relations is reproduced in figure 1.

symbol   name                 example            set-theoretic definition   in R
x ≡ y    equivalence          couch ≡ sofa       x = y                      R1001
x ⊏ y    forward entailment   crow ⊏ bird        x ⊂ y                      R1101
x ⊐ y    reverse entailment   Asian ⊐ Thai       x ⊃ y                      R1011
x ^ y    negation             able ^ unable      x ∩ y = ∅ ∧ x ∪ y = U      R0110
x | y    alternation          cat | dog          x ∩ y = ∅ ∧ x ∪ y ≠ U      R1110
x ⌣ y    cover                animal ⌣ non-ape   x ∩ y ≠ ∅ ∧ x ∪ y = U      R0111
x # y    independence         hungry # hippo     (all other cases)          R1111

Figure 1: MacCartney’s 7 basic entailment relations. In column 4, U refers to the space of all phrase pairs (x,y).

Figure 2: Mappings of MacCartney’s relations onto the Wordnet hierarchy.

In summary, the relations are:

• equivalence (=): if X is true then Y is true, and if Y is true then X is true.
• forward entailment (<): if X is true then Y is true, but if Y is true then X may or may not be true.
• reverse entailment (>): if Y is true then X is true, but if X is true then Y may or may not be true.
• negation (∧): if X is true then Y is not true, if Y is not true then X is true, and either X or Y must be true.
• alternation (|): if X is true then Y is not true, but if Y is not true then X may or may not be true.
• cover (^): if X is true then Y may or may not be true, if Y is true then X may or may not be true, and either X or Y must be true.
• independent (#): if X is true then Y may or may not be true, and if Y is true then X may or may not be true.

As MacCartney acknowledges, the utility of the cover relation for RTE is “not immediately obvious,” and we do not use this relation in our work.
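Column 4 of figure 1 can be checked mechanically. The sketch below is ours, not MacCartney’s code; degenerate cases (an empty set, or a set equal to U) are not treated specially, matching the restriction to non-degenerate relations:

```python
# Classify two sets x, y drawn from a finite universe U into one of
# MacCartney's 7 basic entailment relations (column 4 of figure 1).
def basic_relation(x, y, U):
    x, y, U = set(x), set(y), set(U)
    if x == y:
        return "equivalence"          # x = y
    if x < y:
        return "forward entailment"   # x strictly contained in y
    if x > y:
        return "reverse entailment"   # x strictly contains y
    if not (x & y) and (x | y) == U:
        return "negation"             # exhaustive exclusion
    if not (x & y):
        return "alternation"          # non-exhaustive exclusion
    if (x | y) == U:
        return "cover"                # non-exclusive exhaustion
    return "independence"             # all other cases
```

Each branch is reached only after the earlier ones fail, so the second exclusion test needs no explicit x ∪ y ≠ U check.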
3 Defining relations within Wordnet
Below, we describe our rules for mapping a pair of
nodes in the Wordnet noun hierarchy onto one of
the six basic entailment relations. Our mappings are
shown pictorially in figure 2.
We use Python NLTK’s Wordnet interface with
Wordnet version 3.1 to implement the described algorithms.
In the following definitions, lowercase letters represent strings and capital letters represent Wordnet
synsets.
Synonym x and y are considered synonyms if any sense of x shares a synset with any sense of y. That is, if

synsets(x) ∩ synsets(y) ≠ ∅.
Hypernym x is considered a hypernym of y if some synset of x is the root of a subtree containing some synset of y. I.e.

synsets(y) ∩ ∪_{X∈synsets(x)} children(X) ≠ ∅

where children(X) is the transitive closure of the children of X (i.e. all the nodes in the subtree rooted at X).
Hyponym x is considered a hyponym of y if y is
a hypernym of x, using the definition of hypernym
given above. Figure 3 gives a visual representation
of how this process establishes that “drop” is a hyponym of “amount.”
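The three tests above amount to set intersections over a transitive closure. A minimal sketch over a toy hierarchy standing in for the WordNet noun graph (TOY_SYNSETS, TOY_CHILDREN, and all node names are illustrative, not NLTK’s API):

```python
# Toy stand-ins for the WordNet noun graph; names are illustrative.
TOY_SYNSETS = {                       # word -> synsets of its senses
    "amount": {"amount.n.01"},
    "drop": {"drop.n.01", "drop.n.02"},
    "dog": {"dog.n.01"},
}
TOY_CHILDREN = {                      # synset -> immediate hyponyms
    "amount.n.01": {"indefinite_quantity.n.01"},
    "indefinite_quantity.n.01": {"small_indefinite_quantity.n.01"},
    "small_indefinite_quantity.n.01": {"drop.n.01"},
}

def synsets(word):
    return TOY_SYNSETS.get(word, set())

def children(synset):
    """Transitive closure of the child pointers: every node in the
    subtree rooted at `synset`."""
    seen, stack = set(), [synset]
    while stack:
        for c in TOY_CHILDREN.get(stack.pop(), set()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def is_synonym(x, y):
    # some sense of x shares a synset with some sense of y
    return bool(synsets(x) & synsets(y))

def is_hypernym(x, y):
    # some synset of x roots a subtree containing some synset of y
    subtree = set().union(*(children(X) for X in synsets(x))) if synsets(x) else set()
    return bool(synsets(y) & subtree)

def is_hyponym(x, y):
    return is_hypernym(y, x)
```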
Antonym Wordnet defines the antonym relationship over lemmas, rather than synsets. We therefore define the set of antonyms of x as

antonyms(x) = ∪_{X∈synsets(x)} ∪_{wx∈X} WNantonyms(wx)
[Figure 3 (graphic): a fragment of the Wordnet hierarchy under “measure/quantity/amount,” with “drop” in the subtree.]
Figure 3: We establish the hypernym relation by taking
the transitive closure of the children of the synsets of x
(“amount”) and intersecting with the union of the synsets
of y (“drop”).
where wx is a lemma and WNantonyms(wx ) is the
set of lemmas Wordnet defines as antonyms for wx .
We then call x and y antonyms if x ∈ antonyms(y)
or y ∈ antonyms(x).
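The lemma-level antonym test can be sketched the same way; TOY_LEMMAS and TOY_ANTONYMS below are illustrative stand-ins for WordNet’s per-lemma antonym pointers:

```python
# Toy stand-ins: word -> lemmas of all its senses, lemma -> antonym lemmas.
TOY_LEMMAS = {"able": {"able", "capable"}, "unable": {"unable"}}
TOY_ANTONYMS = {"able": {"unable"}, "unable": {"able"}}

def antonyms(x):
    # union, over every sense and lemma of x, of WordNet's antonym lemmas
    return set().union(*(TOY_ANTONYMS.get(w, set())
                         for w in TOY_LEMMAS.get(x, set())))

def is_antonym_pair(x, y):
    # x and y are antonyms if either appears in the other's antonym set
    return x in antonyms(y) or y in antonyms(x)
```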
Alternation We say that x and y are in alternation
if x and y are in disjoint subtrees rooted at a shared
hypernym. I.e., x and y are in alternation if both
1. None of the conditions for synonym, hypernym, hyponym, or antonym are met, and
2. hypernyms(x) ∩ hypernyms(y) ≠ ∅.
Wordnet is structured so that the entire noun hierarchy is rooted at “entity.” The hierarchy is fairly
fat, with most nodes having a depth between 5 and
10 (figure 5). To avoid having too many false positives for the alternation class, we do not count two
words as alternation if their deepest shared hypernym is in the first three levels of the hierarchy (i.e.
has a depth less than three). This amounts to removing 26 synsets as potential shared hypernyms
for alternation, which are shown in table 5. These
nodes largely represent abstract categories such as
“physical entity” and “causal agent.” Removing
these nodes prevents true independent pairs from being falsely labeled as alternation. We elaborate further on potential false positives for the alternation class in section 7.
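Putting the shared-hypernym condition and the depth filter together, a sketch over a toy hierarchy (PARENTS and DEPTH are illustrative, the depth values are not WordNet’s, and the synonym/hypernym/hyponym/antonym tests are assumed to have already failed):

```python
# Toy hypernym graph and (illustrative) node depths.
PARENTS = {
    "cat.n.01": {"feline.n.01"}, "dog.n.01": {"canine.n.01"},
    "feline.n.01": {"carnivore.n.01"}, "canine.n.01": {"carnivore.n.01"},
    "carnivore.n.01": {"entity.n.01"}, "rock.n.01": {"entity.n.01"},
    "entity.n.01": set(),
}
DEPTH = {"entity.n.01": 0, "carnivore.n.01": 8, "feline.n.01": 9,
         "canine.n.01": 9, "cat.n.01": 10, "dog.n.01": 10, "rock.n.01": 1}

def hypernym_closure(s):
    # all hypernyms of s, transitively
    seen, stack = set(), [s]
    while stack:
        for p in PARENTS.get(stack.pop(), set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def in_alternation(x, y, min_depth=3):
    # Assumes the synonym/hypernym/hyponym/antonym tests already failed.
    # Shared hypernyms in the top three levels of the hierarchy are
    # discarded, so pairs joined only near the root stay independent.
    shared = hypernym_closure(x) & hypernym_closure(y)
    return any(DEPTH[s] >= min_depth for s in shared)
```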
Other Relatedness In addition to MacCartney’s relations, we define a sixth relation category, which
serves as a catch-all for terms which are flagged as
related by Wordnet but whose relation is not built
into the hierarchical structure. WN refers to these
relations as either semantic, meaning they exist as
pointers between synsets, or as lexical, meaning they
exist as pointers between lemmas.
While these noun pairs do not meet the criteria
of the basic entailment relations defined, we feel
they carry more information about entailment than
do truly independent terms. Currently, we combine
all of these forms of relatedness into an “other” category, and we leave the exact entailment implications
of this category to be learned in downstream applications.
Semantic relations We use two of WN’s semantic noun-noun relations, holonymy and meronymy. WN breaks each of these into three subclasses: part, member, and substance. For example, arm/body is classified as a part holonym, musician/musical group is classified as a member holonym, and water/ice is classified as a substance holonym. We combine these three fine-grained classes and use only the coarser holonym and meronym labels. We define x and y as holonyms analogously to our definition of hypernymy:

synsets(y) ∩ ∪_{X∈synsets(x)} WNholonyms(X) ≠ ∅

where WNholonyms(X) is the transitive closure of the holonyms of X according to WN. Meronymy is defined identically.

Lexical relations We use two of WN’s lexical relations: derivational relatedness and pertainymy. Pertainym pointers exist between adjectives and nouns, not between nouns and other nouns. Because PPDB contains unreliable part-of-speech labels, we assign the pertainym label to describe the relation between a pair of nouns x/y if x is lexically identical to an adjective reachable from y through a pertainym pointer, or vice versa. We define pertain(x) analogously to antonyms:

pertain(x) = ∪_{X∈synsets(x)} ∪_{wx∈X} WNpertain(wx).

We define the lemmas of y as:

lemmas(y) = ∪_{Y∈synsets(y)} ∪_{wy∈Y} wy.

We then call y a pertainym of x if lemmas(y) ∩ pertain(x) ≠ ∅.

Derivational relatedness is defined identically, with a slight modification to enforce transitivity. I.e. x is derivationally related to y if lemmas(y) ∩ derive(x) ≠ ∅ or derive(x) ∩ derive(y) ≠ ∅. This is to deal with a small number of cases in which two nodes share a derivational stem but lack an explicit pointer between them, e.g. WN relates vaccine/vaccinate and vaccination/vaccinate, but not vaccine/vaccination.

Semantic relations treated as lexical relations We also use WN’s attribute pointer, which is given as a semantic relation (between synsets) but exists as a pointer from an adjective synset to a noun synset. We want to treat this relation as we did pertainymy; that is, we want to assign the attribute label to noun pairs if x is lexically identical to an adjective reachable from y through an attribute pointer, or vice versa. We therefore treat attribute as a lexical relation by taking the union over the lemmas of synsets, rather than over synsets:

attr(x) = ∪_{X∈synsets(x)} ∪_{A∈WNattr(X)} ∪_{wx∈A} wx

where WNattr(X) is the set of synsets for which WN considers X to be an attribute. As before, we then call y an attribute of x if lemmas(y) ∩ attr(x) ≠ ∅.

Independent Independence is simply defined as failing to meet any of the criteria for the above 5 basic entailment relations or for the “other” category.
4 Preprocessing steps
We remove noun pairs in which x and y are inflected
forms of the same lemma. In querying for Wordnet synsets, however, we leave each noun in its inflected form. This is to address the fact that NLTK
lemmatizes automatically when no synset exists for
the inflected form (e.g. it will map “dogs” to the
synset for “dog”) but requires inflections when there
alternation
actress/suspect
contamination/desiccation
cup/dons
disguise/monsignor
dory/race
enterprise/nasa
governance/presentation
magazine/tricycle
phenomenon/tooth
region/science
antonym
antonym/simile
assembly/disassembly
benignity/malignancy
broadening/narrowing
end/outset
evil/goodness
foes/friend
hardness/softness
hypernatremia/hyponatremia
imprecision/precision
hypernym
amount/droplet
box/casket
care/medication
content/formalism
country/kabul
country/uganda
district/omaha
genus/populus
grounds/newfoundland
object/wedge
hyponym
aryan/being
battleground/region
blattaria/order
botswana/countries
carol/vocals
connecticut/district
family/parentage
fashion/practice
fisher/workers
male/someone
independent
appeal/v
armenian/consonant
cops/machination
farewell/string
freshmen/obligation
general/inspection
hitchcock/vocals
ko/state
southwestern/special
wave/weiss
other related
anatolia/bosporus
blur/smear
championship/mount
dissent/stand
fujiyama/japan
haliotidae/haliotis
mountain/pack
rating/site
shipment/transportation
stoppage/stopping
synonym
belongings/property
centers/plaza
dispatch/expedition
environment/surroundings
lasagna/lasagne
level/tier
pa/pop
quivering/trembling
sentence/synonym
yankees/yanks
Table 1: Some example noun pairs for each label defined according to Wordnet.
independent        11,512,210
alternation         8,826,439
hypernym              115,256
hyponym               112,339
synonym                13,653
antonym                   656
other relations        43,620
  deriv. rel.          25,094
  holonym               8,973
  meronym               8,595
  attribute               537
  pertainym               421
Table 2: Distribution of labels over 20,624,173 noun pairs
found in Wordnet.
is a differentiation in the synsets (e.g. querying for
“marine” will return the synset for “a member of the
United States Marine Corps” but only querying for
the pluralized “marines” will return the synset for “a
division of the United States Navy.”)
We check the criteria for each label in sequence,
to enforce mutual exclusivity of labels. The order
in which the criteria are checked is 1) antonym, 2)
synonym, 3) hypernym, 4) hyponym, 5) other relatedness, 6) alternation, 7) independence. This means
that if a noun pair fits the criteria for multiple relations (due to multiple senses, for example), the labels will be prioritized in this order. For example,
the noun pair king/queen fits the criteria for antonym
as well as for synonym (under the synset “a competitor who holds a preeminent position”), but it is
assigned only to the antonym relation.
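The priority ordering can be expressed as a first-match cascade; TOY_FACTS below is an illustrative stand-in for the WordNet predicates of section 3:

```python
# First-match cascade that enforces mutual exclusivity of labels.
# TOY_FACTS records which criteria a pair meets (illustrative only).
TOY_FACTS = {
    ("king", "queen"): {"antonym", "synonym"},  # meets both; antonym wins
    ("crow", "bird"): {"hyponym"},
}

ORDER = ["antonym", "synonym", "hypernym", "hyponym",
         "other relatedness", "alternation"]

def assign_label(x, y):
    met = TOY_FACTS.get((x, y), set())
    for label in ORDER:
        if label in met:
            return label          # earlier criteria take priority
    return "independent"          # nothing matched
```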
5 Results
We ran the described labeling method on 20,624,173
noun pairs. The nouns in each pair occurred together in a sentence in the Wackypedia corpus and
were joined by a path of no more than 5 nodes in the
parse tree. Table 1 shows some sample noun pairs
and table 2 gives the number of noun pairs matching
each relation. Table 4 shows a sample of the labels assigned to noun pairs occurring both in Wackypedia and in PPDB.
6 Weaknesses and limitations
Deviations from MacCartney’s definitions The structure of the Wordnet hierarchy and our definitions result in some differences between our classes and those proposed by MacCartney. Whereas MacCartney’s alternation is a strictly contradictory relationship (if X is true, then Y cannot be true), our alternation class does not have this property.
Wordnet’s inconsistent use of the parent/child
pointers prevents us from consistently identifying
when a shared parent node indicates true alternation versus when it indicates a hypernymy relationship.
Figure 4 shows an example of how the relationship
between a node and its immediate children may exemplify either relationship. Because we cannot systematically minimize these errors (we elaborate in
section 7), we do nothing to prevent this form of
false positive. We recognize that, compared to the
strong exclusivity implication of MacCartney’s alternation (table 3), our alternation will be weaker in
RTE tasks.
Additionally, MacCartney’s definition of antonym
requires that antonym pairs have the “exhaustiveness” property; that is, it requires that if x and y are
antonyms, then any noun must either be x or be y.
This is not the definition of antonym used by Wordnet. For example, Wordnet provides the antonym
pair man/woman, even though it is possible for some
noun to be neither a man nor a woman. Because of
this, many of our pairs labelled antonym more accurately would fall under MacCartney’s alternation.

               MacCartney           Wordnet
Entailment     equivalence          synonym
Containment    forward entailment   hyponym
               reverse entailment   hypernym
Contradiction  negation             antonym
               alternation          alternation
Compatibility  independent          independent

Table 3: Our definition of alternation provides substantially weaker signal for entailment recognition than does MacCartney’s alternation.
[Figure 4 (graphic): on the left, “emotion” with children “love” | “fear”; on the right, “fear” with children “hysteria” > “panic”.]

Figure 4: Inconsistent levels of granularity in the Wordnet hierarchy can cause containment relations (hypernymy/hyponymy) to be falsely labeled as alternation. On the left is true alternation. On the right, the child nodes would be most appropriately labeled as hypernymy.
Other limitations We try to work only with nouns. We therefore draw all of our labels from WN’s noun hierarchy and apply our labeling only to phrases in PPDB which are tagged as nouns. The tagging in PPDB is not perfect, and we receive some supposed noun/noun pairs which are more likely a different part of speech (e.g. basic/fundamental). Classifying these pairs under the assumption that they are nouns causes noisy labeling (e.g. basic/fundamental is labeled as independent).
Wordnet has peculiarities which sometimes cause
incorrect labels. One issue is that of word sense.
Despite Wordnet’s effort to cover a large number
of senses for each noun, it is not always complete. For example, Wordnet causes the pairs constraint/limitation and downturn/slowdown to be labeled as alternation, even though they are most often
used synonymously. Another issue is that of overformalization, which causes a disconnect between
the information in Wordnet and the way terms are
used colloquially. For example, Wordnet defines objectivity as an “attribute” and impartiality as a “psychological feature,” thus causing the pair objectivity/impartiality to be labeled as independent, despite
the fact that these two terms are used interchangeably. Similarly, Wordnet defines “dairy” as a hyponym of “farm” and “dairy product” as a hyponym
of “food,” causing the pair dairy/milk to be labeled
as independent.
7 Handling alternations
As discussed above, the alternation class presents
some particular difficulties when defined using only
the information available in Wordnet. In this section,
we explore in detail the shortcomings of the Wordnet
approach and motivate our decision to collect manual annotations for our data.
7.1 Definitions
We refer to the depth of a node as the length of the
path from that node to the root. Some nodes have
multiple inheritance 1 or are reachable from the root
node via more than one path. The NLTK Wordnet
interface provides access to both the minimum depth
(the length of the shortest path from root to node)
and the maximum depth (the length of the longest
path from root to node). In the following experiments, we mean the minimum depth when we refer to depth. All of the experiments were replicated
using the maximum depth, and showed no notable
change in results.
When we refer to the distance between two nodes
we are referring to the length of the shortest path
between the two nodes.
When we refer to the shared parent of two nouns,
we refer to the deepest common hypernym of both
nouns. That is, in cases where the nouns share more
than one hypernym, we choose the hypernym with
the greatest depth. For example, the nouns “ape” and
“monkey” share the hypernym “primate,” which has
a depth of 11 and also “animal” which has a depth
of 6. We choose “primate” as the deepest shared
hypernym.
1
For example, the synset person.n.01 has a depth of 3
via the path entity→physical entity→causal agent→person,
as well as a depth of 6, via the path entity→physical
entity→object→whole→living thing→organism→person.
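The “deepest shared hypernym” computation can be sketched over a toy hypernym graph (names and edges are illustrative; multiple inheritance is allowed, and depth means minimum depth as defined above):

```python
# Toy hypernym graph; a node may have several parents.
PARENTS = {
    "ape.n.01": {"primate.n.01"}, "monkey.n.01": {"primate.n.01"},
    "primate.n.01": {"animal.n.01"}, "animal.n.01": {"entity.n.01"},
    "entity.n.01": set(),
}

def min_depth(s):
    # length of the shortest path from s up to the root
    return 0 if not PARENTS[s] else 1 + min(min_depth(p) for p in PARENTS[s])

def hypernym_closure(s):
    # all hypernyms of s, transitively
    out = set()
    for p in PARENTS[s]:
        out |= {p} | hypernym_closure(p)
    return out

def deepest_shared_hypernym(x, y):
    # among all common hypernyms, pick the one with the greatest depth
    shared = hypernym_closure(x) & hypernym_closure(y)
    return max(shared, key=min_depth) if shared else None
```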
alternation
advisor/councillor
beach/range
cap/plug
dealer/distributor
flavor/perfume
homecoming/prom
hop/leap
kitten/pussycat
phenol/phenolic
text/version
hypernym
accuracy/precision
aircraft/plane
animal/livestock
combustion/incineration
enterprise/firm
eruption/rash
grandchild/granddaughter
homicide/manslaughter
machine/machinery
oxide/rust
hyponym
bobcat/lynx
boldness/courage
chaos/confusion
coast/shore
compassion/sympathy
diet/food
footnote/note
horseman/rider
jet/plane
spadefoot/toad
independent
business/commercial
cave/cellar
col/colonel
disability/invalidity
flipper/pinball
freezing/gel
klein/small
man/rights
stability/stabilization
tour/tower
other related
bicycle/cycling
green/greening
karelia/karelian
path/road
patriarch/patriarchate
programming/programs
prosecution/trial
retail/retailer
ski/skiing
work/working
synonym
auberge/inn
avant-garde/vanguard
bazaar/bazar
boundary/limit
cereal/grain
faith/trust
normandie/normandy
optimisation/optimization
usage/use
vapor/vapour
Table 4: Labeling applied to noun pairs in PPDB-large. Only one pair appearing in WN (proliferation/nonproliferation)
was assigned the label “antonym.”
7.2 Parameters
We want to optimize the accuracy of classifying a noun pair x/y as “alternation.” Using only the information within Wordnet, we have three basic parameters to play with: the depth of x and y’s shared parent in the hierarchy, the distance of the shared parent from either x or y, and the number of children the parent has.
Depth of parent Wordnet is organized hierarchically, so intuitively, nodes which are nearer to the
root refer to more general topics, and nodes deeper
in the hierarchy become more specific. Figure 5
shows how the top levels of the hierarchy contain
very few nodes, and table 5 shows the kinds of general concepts that exist in these top levels.
It therefore seems reasonable to assume that,
for a noun pair labeled as alternation, the depth of
the shared parent may be indicative of the accuracy
of the alternation: noun pairs for which the lowest
shared parent is very high in the hierarchy (close to
the root) may actually be independent (e.g. if both
nouns are hyponyms of “thing” but otherwise are
unconnected) and noun pairs for which the lowest
shared parent is very deep in the hierarchy (close to
the leaf nodes) may actually be in a hypernymy relationship (e.g. if both nodes are children of something specific like “happiness”). We test the hypothesis that the depth of the shared parent can be tuned
to reduce the number of false positive labels for alternation.
Distance from parent Additionally, we consider
the nouns’ distance from their shared parent. While
in some alternation pairs, both nouns are immediate
descendants of their shared hypernym (e.g. “love” and
“fear” are both children of “emotion”), other pairs
consist of nouns which are very far from the shared
[Figure 5 (bar chart). Number of nodes at each depth: 0: 1, 1: 3, 2: 22, 3: 228, 4: 2020, 5: 6249, 6: 12267, 7: 18936, 8: 14155, 9: 11042, 10: 7207, 11: 4267, 12: 2505, 13: 1383, 14: 846, 15: 449, 16: 341, 17: 164, 18: 30.]

Figure 5: Number of nodes at each depth in WN.
depth 0: entity.n.01
depth 1: abstraction.n.06, physical entity.n.01, thing.n.08
depth 2: attribute.n.02, causal agent.n.01, change.n.06, communication.n.02, freshener.n.01, group.n.01, horror.n.02, jimdandy.n.02, matter.n.03, measure.n.02, object.n.01, otherworld.n.01, pacifier.n.02, process.n.06, psychological feature.n.01, relation.n.01, security blanket.n.01, set.n.02, stinker.n.02, substance.n.04, thing.n.12, whacker.n.01
Table 5: 26 synsets appearing with depth <3. Most
of these synsets refer to general concepts (e.g. process
and object). Others are specific, but have few or no
child nodes. The following have no children: change,
freshener, horror, jimdandy, otherworld, pacifier, security blanket, stinker, substance, and whacker.
parent (e.g. “step father” and “oil painter” can be
classified as alternation through the shared parent
“person,” but the nouns are 7 and 4 steps away, respectively). We explore the idea of limiting the maximum distance from the shared parent in order to increase the precision of our alternation classification.
It is worth noting that Wordnet’s “sister terms” are
equivalent to our alternation if we restrict the maximum distance from the shared parent to be no more
than 1.
Number of children Finally, we consider the
number of immediate children of the shared parent.
While it is less intuitive than the previous parameters how this may affect the likelihood of alternation, it may be that parents with many children are
nodes which enumerate subtypes (e.g. the synset
for “flower” has over 100 children, which enumerate different types of flowers) and thus are high-precision indicators of alternation. We therefore also
investigate the relationship between the number of
children of a node and the likelihood of the children
being in alternation.
7.3 Labelled data for testing
In order to compare the effect of altering these parameters, we manually labeled 200 noun pairs as
alternation/non-alternation. We chose these pairs
by randomly sorting the noun pairs that appeared
in PPDB (xl) and were labeled as alternation using
our original configuration (minimum depth of 3 and
no other restrictions). We labelled the first 100 unambiguous examples of alternation and the first 100
unambiguous examples of non-alternation. The full
list used is included at the end of this paper (table 8).
This data is used to compute precision/recall measures in the following experiments.
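The precision/recall computation over these gold labels is standard; a minimal sketch, treating alternation as the positive class:

```python
# gold[i] / pred[i] are True when pair i is (labeled / predicted) alternation.
def precision_recall(gold, pred):
    tp = sum(1 for g, p in zip(gold, pred) if g and p)        # true positives
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)    # false positives
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With this 50/50 gold split, labeling every pair as alternation gives precision 0.5 and recall 1.0, the baseline visible in figure 6.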
7.4 Experiments
We first performed a qualitative analysis. Manually
inspecting the data to evaluate the minimum depth
parameter, we see that there are examples of both
good and bad alternation pairs alternating through
parents at every depth. Table 6 illustrates this.
Manual inspection of the maximum distance parameter is similarly inconclusive. We ran our tagging algorithm on a random sample of 30 manually
labeled pairs (15 alternation, 15 non-alternation),
setting max distance d to each value from 1 to 10,
and enforcing the constraint that a noun pair is in alternation if the nouns in the pair share a hypernym
and neither noun is more than d hops from the shared
hypernym. Increasing d increases equally the number of true positives and the number of false positives. At d = 7, all 30 nouns were labeled as alternation. Table 7 shows these results.
We also look quantitatively at the effects of these parameters. Figure 6 shows the precision and recall curves generated by (a) fixing the max distance parameter at infinity and varying the min depth between 1 and 10, and (b) fixing the min depth at 0 and varying the max distance parameter between 1 and 10. In the former case (left of Figure 6), we are able to achieve acceptable precision by setting a high minimum depth, but only at the expense of unacceptably low recall. In the latter case (right of Figure 6), precision stays near 50% (the equivalent of labeling everything as alternation) no matter what value we choose for max distance.

depth | parent                       | x/y
------+------------------------------+--------------------
3     | whole.n.02                   | ballplayer/baseball
      | state.n.02                   | bliss/happiness
4     | happening.n.01               | avalanche/flood
      | disorder.n.03                | instability/turmoil
5     | linear unit.n.01             | km/mile
      | district.n.01                | town/village
6     | meal.n.01                    | dinner/lunch
      | atmospheric phenomenon.n.01  | snow/snowstorm
7     | modulation.n.02              | am/pm
      | payment.n.01                 | compensation/wage
8     | garment.n.01                 | sweater/t-shirt
      | man.n.01                     | boy/guy
9     | movable feast.n.01           | easter/passover
      | clergyman.n.01               | preacher/priest
10    | bird of prey.n.01            | eagle/hawk
      | placental.n.01               | cattle/livestock

Table 6: Pairs of words labeled as alternation, and the words' shared parent. Parents at every depth can produce both good (top) and bad (bottom) examples of alternation.
[Figure 6: two panels of precision (P) and recall (R) curves, y-axis 0 to 1; left panel x-axis "Min depth of shared parent" (1 to 10), right panel x-axis "Max distance to shared parent" (0 to 10).]

Figure 6: Precision and recall curves when varying min depth (left) and max distance (right). For min depth curves, max distance was fixed at 20. For max distance curves, min depth was fixed at 0. A precision of 0.5 and a recall of 1.0 are achieved by labeling all pairs as alternation.
We also look at the effect of using these two parameters (maximum distance from parent and minimum depth of parent) together. Figure 7 shows the F1 scores achieved for different combinations of parameter settings. No combination achieves higher than 0.67, the F1 score achievable by labeling everything as alternation (which results in 50% precision and 100% recall).

d = 1: alternation: 20/21, butterflies/moths, clam/mussel, cuttlefish/squid, duck/goose, ecg/eeg, tornado/typhoon, xxvii/xxviii; non-alternation: adjustment/tuning, briefing/presentation, home/shelter, interval/span
d = 2 adds: alternation: km/mile; non-alternation: bomber/kamikaze, contaminant/pollutant, diagram/graphic, judgement/trial
d = 3 adds: alternation: coat/gown, dress/skirt, elf/goblin; non-alternation: artist/performer, drama/theatre, indignation/resentment
d = 4 adds: alternation: girls/guys; non-alternation: cutoff/threshold, internet/website
d = 5 adds: non-alternation: monitoring/surveillance
d = 6 adds: alternation: ballplayer/baseball, volleyball/weightlifting
d = 7 adds: non-alternation: looting/theft

Table 7: Noun pairs labeled as alternation (out of a test set of 30 manually labeled pairs) with varying values of max distance d; each row lists the pairs newly tagged at that value of d, grouped by their true label. The numbers of true positives and of false positives increase equally as the max distance increases, until at d = 7 all 30 pairs are labeled as alternation.
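The 0.67 ceiling noted above is just the harmonic mean of the trivial baseline's precision and recall; as a quick check:

```python
def f1(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Labeling every pair as alternation on the 100/100 test set gives
# 0.5 precision and 1.0 recall:
print(round(f1(0.5, 1.0), 3))  # 0.667
```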
depth\dist     0      1      2      3      4      5      6      7      8      9     10
0          0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
1          0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
2          0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
3          0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
4          0.000  0.532  0.567  0.636  0.651  0.664  0.667  0.664  0.664  0.662  0.662
5          0.000  0.521  0.570  0.649  0.653  0.667  0.672  0.672  0.674  0.674  0.674
6          0.000  0.491  0.541  0.629  0.627  0.624  0.628  0.628  0.628  0.628  0.628
7          0.000  0.403  0.468  0.561  0.570  0.570  0.575  0.575  0.575  0.575  0.575
8          0.000  0.331  0.400  0.468  0.468  0.468  0.465  0.465  0.465  0.465  0.465
9          0.000  0.293  0.318  0.357  0.357  0.357  0.354  0.354  0.354  0.354  0.354
10         0.000  0.212  0.226  0.226  0.226  0.226  0.224  0.224  0.224  0.224  0.224

Figure 7: F1 scores for varying minimum depths (rows) and maximum distances (columns). Precision and recall were calculated on a set of 200 manually labeled noun pairs (100 positive examples of alternation, 100 negative). No combination achieves acceptable results: 0.67 is the F1 score that would be achieved by labeling every pair as alternation (0.5 precision and 1.0 recall).

Finally, we look at the association between the number of children a parent has and the likelihood that those children are in alternation. Figure 8 shows a histogram of the number of immediate children per parent for each pair in our manually labeled data set. The distribution of the number of children is nearly identical for alternation and non-alternation pairs. We also ran a precision-recall analysis varying both the maximum and the minimum number of children; that is, we tried labeling pairs as alternation if 1) both nouns shared a parent and 2) that parent had at least/at most k children. Although the graphs are not shown, the results were the same as in the graph for maximum distance: for all values of k, and for both the "at least" and the "at most" setting, precision remained at 50%, equivalent to random guessing.

[Figure 8: two histograms of the number of immediate children per parent, binned 0-10, 10-20, ..., 80-90, >90.]

Figure 8: Number of immediate children of the deepest shared parent for non-alternation pairs (left) and for alternation pairs (right). The distributions are almost identical, suggesting that the number of children of the shared parent is not a useful way of reducing the number of false positive alternation labels.
8 Next steps

Due to the described limitations, and the goal application to RTE, we will gather manual annotations for a subset of our data in PPDB. By using more reliably labeled data than what we can collect from WordNet, we should be able to train a classifier to automatically label the remainder of the pairs in PPDB. The resulting labels (or distributions over labels) will hopefully offer a valuable contribution to RTE systems, with a scale and level of annotation that is not currently available in existing resources.

ALTERNATION: 10/11, 18/19, 19/20, 20/21, acre/hectare, afternoon/evening, am/pm, animal/zoo, australasia/oceania, avalanche/flood, badger/skunk, ballplayer/baseball, band/orchestra, bands/orchestras, batter/catcher, battleship/boat, beef/ox, billionaire/millionaire, binoculars/telescope, boat/ship, bonnet/cap, box/glove, boxer/wrestler, brass/bronze, bucks/bullets, butterflies/moths, cabin/cockpit, canoe/motorboat, cape/robe, catcher/hitter, char/trout, chariot/wagon, cheetah/leopard, clam/mussel, cm/inch, coat/gown, coffee/diner, crab/prawn, cricket/grasshopper, crustacean/shellfish, cucumber/lettuce, cuttlefish/squid, d/j, daughter-in-law/son-in-law, dictionary/thesaurus, dinner/lunch, disc/drive, discussion/review, dizziness/numbness, dollar/yuan, dress/skirt, dress/suit, drink/wood, drive/walk, driveway/garage, duck/goose, dungeon/tower, eagle/hawk, easter/passover, ecg/eeg, elbow/shoulder, elf/goblin, epinephrine/norepinephrine, firefighter/firehouse, gallon/litre, gdp/gnp, girls/guys, grape/raisin, hotel/restaurant, i/j, ii/iii, km/mile, latte/milk, lavender/violet, leopard/lion, librarian/library, mouth/throat, n/s, ovum/zygote, p/s, pad/tampon, pan/pot, psychiatrist/psychologist, rider/runner, roman/romanian, sailboat/yacht, saturday/sunday, sauce/spice, skating/skiing, spite/sympathy, sweater/t-shirt, sweater/vest, tornado/typhoon, typewriter/typist, vcr/video, vii/viii, volleyball/weightlifting, windshield/wiper, xvii/xviii, xxvii/xxviii

NON-ALTERNATION: adjustment/tuning, ailment/disease, ambiguity/vagueness, analysis/assessment, anarchy/chaos, angle/perspective, archives/file, army/military, artist/performer, atmosphere/climate, award/compensation, balloting/election, bigotry/fanaticism, bliss/happiness, block/chunk, blouse/shirt, bomber/kamikaze, boy/guy, brawl/rumble, briefing/presentation, bullying/harassment, cannabis/hashish, capital/investment, case/file, cassette/tapes, cattle/livestock, charisma/charm, chevalier/knight, chin/jaw, chip/token, coil/spool, companion/partner, compensation/wage, conceptualization/design, configuration/setup, construction/design, contaminant/pollutant, correctness/validity, corruption/embezzlement, cutoff/threshold, demonstration/rally, determination/will, diagram/graphic, dish/platter, drama/theater, drama/theatre, drum/keg, dust/powder, education/studies, election/poll, fellowship/scholarship, figure/graphic, figure/silhouette, first/top, focus/spotlight, freeway/interstate, guard/warden, high-rise/skyscraper, home/shelter, humidity/moisture, idea/insight, inability/weakness, indignation/resentment, instability/turmoil, internet/website, interval/span, itinerary/trip, judgement/trial, lady/queen, leave/vacation, looting/theft, love/romance, mansion/villa, map/route, means/resources, merit/value, mil/thousand, monitoring/surveillance, motion/petition, municipality/township, opponent/rival, outfit/suit, papa/pope, pattern/trend, persecution/repression, personality/self, polo/shirt, preacher/priest, psychologist/therapist, reality/truth, room/suite, sailing/yachting, seminars/symposiums, senior/superior, shawl/wrap, shuttle/spaceship, snow/snowstorm, sore/wound, subway/train, town/village

Table 8: Manually annotated noun pairs used in precision/recall calculations.
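As an illustration of the planned pipeline (not the actual system: the feature choices and values below are invented), a classifier trained on a small annotated sample could label new pairs. A minimal 1-nearest-neighbour sketch:

```python
def nearest_label(features, train):
    """1-NN: return the label of the training example closest to `features`."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda ex: sq_dist(ex[0], features))[1]

# Invented (depth of shared parent, distance to shared parent) features:
train = [
    ([3, 1], "alternation"),
    ([8, 1], "alternation"),
    ([3, 6], "non-alternation"),
    ([2, 7], "non-alternation"),
]
print(nearest_label([7, 2], train))  # closest to (8, 1)
```

In practice any standard classifier over such features would do; the point is only that the manual annotations supply the training labels.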
9 References

MacCartney, Bill. Natural Language Inference. PhD thesis, Stanford University, 2009.