Classification of Entailment Relations in PPDB

1 Overview

This document outlines our protocol for labeling noun pairs according to the entailment relations proposed by Bill MacCartney in his 2009 thesis on Natural Language Inference. Our purpose in doing this is to build a labelled data set with which to train a classifier for differentiating between these relations. The classifier can then be used to assign probabilities for each relation to the paraphrase rules in PPDB, making PPDB a more informative resource for downstream tasks such as recognizing textual entailment (RTE).
2 Entailment Relations

MacCartney's thesis proposes a method for partitioning all of the possible relationships that can exist between two phrases into 16 distinct relations. Of these, he defines the 7 non-degenerate relations as the "basic entailment relations." MacCartney's table explaining these 7 relations is reproduced in figure 1.
Figure 2: Mappings of MacCartney's relations onto the Wordnet hierarchy.

    symbol   name                 example            set-theoretic definition   in R
    x ≡ y    equivalence          couch ≡ sofa       x = y                      R1001
    x ⊏ y    forward entailment   crow ⊏ bird        x ⊂ y                      R1101
    x ⊐ y    reverse entailment   Asian ⊐ Thai       x ⊃ y                      R1011
    x ^ y    negation             able ^ unable      x ∩ y = ∅ ∧ x ∪ y = U      R0110
    x | y    alternation          cat | dog          x ∩ y = ∅ ∧ x ∪ y ≠ U      R1110
    x ⌣ y    cover                animal ⌣ non-ape   x ∩ y ≠ ∅ ∧ x ∪ y = U      R0111
    x # y    independence         hungry # hippo     (all other cases)          R1111

Figure 1: MacCartney's 7 basic entailment relations. In column 4, U refers to the space of all phrase pairs (x, y).

In summary, the relations are:

• forward entailment (<): if X is true then Y is true, but if Y is true then X may or may not be true.
• reverse entailment (>): if Y is true then X is true, but if X is true then Y may or may not be true.
• negation (∧): if X is true then Y is not true, and if Y is not true then X is true; either X or Y must be true.
• alternation (|): if X is true then Y is not true, but if Y is not true then X may or may not be true.
• cover (^): if X is true then Y may or may not be true, and if Y is true then X may or may not be true; either X or Y must be true.
• equivalence (=): if X is true then Y is true, and if Y is true then X is true.
• independent (#): if X is true then Y may or may not be true, and if Y is true then X may or may not be true.
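To make the set-theoretic definitions in figure 1 concrete, here is a minimal sketch (our illustration, not MacCartney's code) that classifies an ordered pair of Python sets into one of the 7 basic relations; it assumes the non-degenerate case where x and y are non-empty proper subsets of the universe U:

    def basic_relation(x, y, U):
        """Classify sets x, y into one of MacCartney's 7 basic entailment
        relations, following the set-theoretic definitions in figure 1.
        Assumes x and y are non-empty proper subsets of U."""
        if x == y:
            return "equivalence"
        if x < y:                          # strict containment: x entails y
            return "forward entailment"
        if y < x:
            return "reverse entailment"
        if not (x & y):                    # disjoint (exclusion)
            return "negation" if x | y == U else "alternation"
        if x | y == U:                     # overlapping and exhaustive
            return "cover"
        return "independence"

    # Example: within a toy universe,
    # basic_relation({"cat"}, {"dog"}, {"cat", "dog", "crow"}) -> "alternation"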
As MacCartney acknowledges, the utility of the cover relation for RTE is "not immediately obvious," and we do not use this relation in our work.

3 Defining relations within Wordnet

Below, we describe our rules for mapping a pair of nodes in the Wordnet noun hierarchy onto one of the six basic entailment relations. Our mappings are shown pictorially in figure 2. We use Python NLTK's Wordnet interface with Wordnet version 3.1 to implement the described algorithms. In the following definitions, lowercase letters represent strings and capital letters represent Wordnet synsets.

Synonym
x and y are considered synonyms if any sense of x shares a synset with any sense of y. That is, if synsets(x) ∩ synsets(y) ≠ ∅.

Hypernym
x is considered a hypernym of y if some synset of x is the root of a subtree containing some synset of y. I.e.

    synsets(y) ∩ ⋃_{X ∈ synsets(x)} children(X) ≠ ∅

where children(X) is the transitive closure of the children of X (i.e. all the nodes in the subtree rooted at X).

Hyponym
x is considered a hyponym of y if y is a hypernym of x, using the definition of hypernym given above. Figure 3 gives a visual representation of how this process establishes that "drop" is a hyponym of "amount."

Figure 3: We establish the hypernym relation by taking the transitive closure of the children of the synsets of x ("amount") and intersecting with the union of the synsets of y ("drop").

Antonym
Wordnet defines the antonym relationship over lemmas, rather than synsets. We therefore define the set of antonyms of x as

    antonyms(x) = ⋃_{X ∈ synsets(x)} ⋃_{wx ∈ X} WNantonyms(wx)

where wx is a lemma and WNantonyms(wx) is the set of lemmas Wordnet defines as antonyms for wx. We then call x and y antonyms if x ∈ antonyms(y) or y ∈ antonyms(x).

Alternation
We say that x and y are in alternation if x and y are in disjoint subtrees rooted at a shared hypernym. I.e., x and y are in alternation if both

1. none of the conditions for synonym, hypernym, hyponym, or antonym are met, and
2. hypernyms(x) ∩ hypernyms(y) ≠ ∅.

Wordnet is structured so that the entire noun hierarchy is rooted at "entity." The hierarchy is fairly fat, with most nodes having a depth between 5 and 10 (figure 5). To avoid having too many false positives for the alternation class, we do not count two words as alternation if their deepest shared hypernym is in the first three levels of the hierarchy (i.e. has a depth less than three). This amounts to removing 26 synsets as potential shared hypernyms for alternation, which are shown in table 5. These nodes largely represent abstract categories such as "physical entity" and "causal agent." Removing these nodes prevents truly independent pairs from being falsely labeled as alternation. We elaborate further on potential false positives for the alternation class in section 7.
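As an illustration of the definitions above, here is a minimal sketch using NLTK's Wordnet interface (the helper names are ours, and this is a simplification of our actual pipeline):

    from nltk.corpus import wordnet as wn

    def noun_synsets(word):
        return set(wn.synsets(word, pos=wn.NOUN))

    def is_synonym(x, y):
        # synsets(x) ∩ synsets(y) ≠ ∅
        return bool(noun_synsets(x) & noun_synsets(y))

    def is_hypernym(x, y):
        # Some synset of y lies in a subtree rooted at a synset of x;
        # closure(...) gives the transitive closure of the hyponym
        # pointers, i.e. children(X) in the definition above.
        for X in noun_synsets(x):
            if noun_synsets(y) & set(X.closure(lambda s: s.hyponyms())):
                return True
        return False

    def is_hyponym(x, y):
        return is_hypernym(y, x)

    def antonyms(x):
        # Wordnet defines antonymy between lemmas, so we collect antonym
        # lemma names over all lemmas of all synsets of x.
        return {a.name() for X in noun_synsets(x)
                         for w in X.lemmas()
                         for a in w.antonyms()}

    def is_antonym(x, y):
        return y in antonyms(x) or x in antonyms(y)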
Other Relatedness
In addition to MacCartney's relations, we define a sixth relation category, which serves as a catch-all for terms which are flagged as related by Wordnet but whose relation is not built into the hierarchical structure. WN refers to these relations as either semantic, meaning they exist as pointers between synsets, or as lexical, meaning they exist as pointers between lemmas. While these noun pairs do not meet the criteria of the basic entailment relations defined above, we feel they carry more information about entailment than do truly independent terms. Currently, we combine all of these forms of relatedness into an "other" category, and we leave the exact entailment implications of this category to be learned in downstream applications.

Semantic relations
We use two of WN's semantic noun-noun relations, holonymy and meronymy. WN breaks each of these into three subclasses: part, member, and substance. For example, arm/body is classified as a part holonym, musician/musical group as a member holonym, and water/ice as a substance holonym. We combine these three fine-grained classes and use only the coarser holonym and meronym labels. We define x and y as holonyms analogously to our definition of hypernymy:

    synsets(y) ∩ ⋃_{X ∈ synsets(x)} WNholonyms(X) ≠ ∅

where WNholonyms(X) is the transitive closure of the holonyms of X according to WN. Meronymy is defined identically.

Lexical relations
We use two of WN's lexical relations: derivational relatedness and pertainymy. Pertainym pointers exist between adjectives and nouns, not between nouns and other nouns. Because PPDB contains unreliable part-of-speech labels, we assign the pertainym label to describe the relation between a pair of nouns x/y if x is lexically identical to an adjective reachable from y through a pertainym pointer, or vice versa. We define the pertainyms of x analogously to antonyms:

    pertain(x) = ⋃_{X ∈ synsets(x)} ⋃_{wx ∈ X} WNpertain(wx).

We define the lemmas of y as:

    lemmas(y) = ⋃_{Y ∈ synsets(y)} ⋃_{wy ∈ Y} wy.

We then call y a pertainym of x if lemmas(y) ∩ pertain(x) ≠ ∅. Derivational relatedness is defined identically, with a slight modification to enforce transitivity. I.e. x is derivationally related to y if lemmas(y) ∩ derive(x) ≠ ∅ or derive(x) ∩ derive(y) ≠ ∅. This is to deal with a small number of cases in which two nodes share a derivational stem but lack an explicit pointer between them, e.g. WN relates vaccine/vaccinate and vaccination/vaccinate, but not vaccine/vaccination.

Semantic relations treated as lexical relations
We also use WN's attribute pointer, which is given as a semantic relation (between synsets) but exists as a pointer from an adjective synset to a noun synset. We want to treat this relation as we did pertainymy: we assign the attribute label to a noun pair if x is lexically identical to an adjective reachable from y through an attribute pointer, or vice versa. We therefore treat attribute as a lexical relation by taking the union over the lemmas of synsets, rather than over synsets:

    attr(x) = ⋃_{X ∈ synsets(x)} ⋃_{A ∈ WNattr(X)} ⋃_{wx ∈ A} wx

where WNattr(X) is the set of synsets for which WN considers X to be an attribute. As before, we then call y an attribute of x if lemmas(y) ∩ attr(x) ≠ ∅.

Independent
Independence is simply defined as failing to meet any of the criteria for the above 5 basic entailment relations or for the "other" category.
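A sketch of two of these relatedness tests with NLTK, again with our own helper names; the meronym, attribute, and pertainym tests follow the same pattern of collecting pointers over synsets or lemmas:

    from nltk.corpus import wordnet as wn

    def noun_synsets(word):
        return set(wn.synsets(word, pos=wn.NOUN))

    def lemma_names(word):
        # lemmas(y): all lemma names over all synsets of y
        return {w.name() for S in noun_synsets(word) for w in S.lemmas()}

    def is_holonym(x, y):
        # Transitive closure over the three holonym subclasses, combined.
        rel = lambda s: (s.part_holonyms() + s.member_holonyms()
                         + s.substance_holonyms())
        holos = set()
        for X in noun_synsets(x):
            holos |= set(X.closure(rel))
        return bool(noun_synsets(y) & holos)

    def derive(word):
        # Lemma names reachable through derivationally-related pointers.
        return {d.name() for S in noun_synsets(word)
                         for w in S.lemmas()
                         for d in w.derivationally_related_forms()}

    def is_deriv_related(x, y):
        # Direct pointer, or a shared derivational form (the transitivity
        # fix for cases like vaccine/vaccination).
        return bool(lemma_names(y) & derive(x)) or bool(derive(x) & derive(y))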
4 Preprocessing steps

We remove noun pairs in which x and y are inflected forms of the same lemma. In querying for Wordnet synsets, however, we leave each noun in its inflected form. This is to address the fact that NLTK lemmatizes automatically when no synset exists for the inflected form (e.g. it will map "dogs" to the synset for "dog") but requires inflections when there is a differentiation in the synsets (e.g. querying for "marine" will return the synset for "a member of the United States Marine Corps," but only querying for the pluralized "marines" will return the synset for "a division of the United States Navy").

    alternation: actress/suspect, contamination/desiccation, cup/dons, disguise/monsignor, dory/race, enterprise/nasa, governance/presentation, magazine/tricycle, phenomenon/tooth, region/science
    antonym: antonym/simile, assembly/disassembly, benignity/malignancy, broadening/narrowing, end/outset, evil/goodness, foes/friend, hardness/softness, hypernatremia/hyponatremia, imprecision/precision
    hypernym: amount/droplet, box/casket, care/medication, content/formalism, country/kabul, country/uganda, district/omaha, genus/populus, grounds/newfoundland, object/wedge
    hyponym: aryan/being, battleground/region, blattaria/order, botswana/countries, carol/vocals, connecticut/district, family/parentage, fashion/practice, fisher/workers, male/someone
    independent: appeal/v, armenian/consonant, cops/machination, farewell/string, freshmen/obligation, general/inspection, hitchcock/vocals, ko/state, southwestern/special, wave/weiss
    other related: anatolia/bosporus, blur/smear, championship/mount, dissent/stand, fujiyama/japan, haliotidae/haliotis, mountain/pack, rating/site, shipment/transportation, stoppage/stopping
    synonym: belongings/property, centers/plaza, dispatch/expedition, environment/surroundings, lasagna/lasagne, level/tier, pa/pop, quivering/trembling, sentence/synonym, yankees/yanks

Table 1: Some example noun pairs for each label defined according to Wordnet.

    independent       11,512,210
    alternation        8,826,439
    hypernym             115,256
    hyponym              112,339
    synonym               13,653
    antonym                  656
    other relations       43,620
      deriv. rel.         25,094
      holonym              8,973
      meronym              8,595
      attribute              537
      pertainym              421

Table 2: Distribution of labels over 20,624,173 noun pairs found in Wordnet.

We check the criteria for each label in sequence, to enforce mutual exclusivity of labels. The order in which the criteria are checked is 1) antonym, 2) synonym, 3) hypernym, 4) hyponym, 5) other relatedness, 6) alternation, 7) independence. This means that if a noun pair fits the criteria for multiple relations (due to multiple senses, for example), the labels will be prioritized in this order. For example, the noun pair king/queen fits the criteria for antonym as well as for synonym (under the synset "a competitor who holds a preeminent position"), but it is assigned only to the antonym relation.
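A sketch of this sequential check, reusing the predicates sketched in section 3 (is_other_related and is_alternation stand for the combined "other" and alternation tests; all names are ours):

    PRIORITY = [
        ("antonym", is_antonym),
        ("synonym", is_synonym),
        ("hypernym", is_hypernym),
        ("hyponym", is_hyponym),
        ("other", is_other_related),
        ("alternation", is_alternation),
    ]

    def label(x, y):
        # The first matching test wins, which enforces mutual exclusivity.
        for name, test in PRIORITY:
            if test(x, y):
                return name
        return "independent"

    # label("king", "queen") -> "antonym", even though the pair also meets
    # the synonym criteria under one shared sense.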
5 Results

We ran the described labeling method on 20,624,173 noun pairs. The nouns in each pair occurred together in a sentence in the Wackypedia corpus and were joined by a path of no more than 5 nodes in the parse tree. Table 1 shows some sample noun pairs and table 2 gives the number of noun pairs matching each relation. Table 4 shows a sample of the labels assigned to noun pairs occurring both in Wackypedia and in PPDB.

6 Weaknesses and limitations

Deviations from MacCartney's definitions
The structure of the Wordnet hierarchy and our definitions result in some differences between our classes and those proposed by MacCartney.

Whereas MacCartney's alternation is a strictly contradictory relationship (if X is true, then Y cannot be true), our alternation class does not have this property. Wordnet's inconsistent use of the parent/child pointers prevents us from consistently identifying when a shared parent node indicates true alternation versus when it indicates a hypernymy relationship. Figure 4 shows an example of how the relationship between a node and its immediate children may exemplify either relationship. Because we cannot systematically minimize these errors (we elaborate in section 7), we do nothing to prevent this form of false positive. We recognize that, compared to the strong exclusivity implication of MacCartney's alternation (table 3), our alternation will be weaker in RTE tasks.

                   MacCartney           Wordnet
    Entailment     equivalence          synonym
    Containment    forward entailment   hyponym
                   reverse entailment   hypernym
    Contradiction  negation             antonym
                   alternation          alternation
    Compatibility  independent          independent

Table 3: Our definition of alternation provides substantially weaker signal for entailment recognition than does MacCartney's alternation.

Figure 4: Inconsistent levels of granularity in the Wordnet hierarchy can cause containment relations (hypernymy/hyponymy) to be falsely labeled as alternation. On the left ("emotion" with children "love" and "fear," where love | fear) is true alternation. On the right ("fear" with children "hysteria" and "panic," where hysteria > panic), the child nodes would be most appropriately labeled as hypernymy.

Additionally, MacCartney's definition of antonym requires that antonym pairs have the "exhaustiveness" property; that is, it requires that if x and y are antonyms, then any noun must either be x or be y. This is not the definition of antonym used by Wordnet. For example, Wordnet provides the antonym pair man/woman, even though it is possible for some noun to be neither a man nor a woman. Because of this, many of our pairs labelled antonym would more accurately fall under MacCartney's alternation.

Other limitations
We try to work only with nouns. We therefore draw all of our labels from WN's noun hierarchy and apply our labeling only to phrases in PPDB which are tagged as nouns. The tagging in PPDB is not perfect, and we receive some pairs which are more likely a different part of speech (e.g. basic/fundamental). Classifying these pairs under the assumption that they are nouns causes noisy labeling (e.g. basic/fundamental is labeled as independent).

Wordnet has peculiarities which sometimes cause incorrect labels. One issue is that of word sense. Despite Wordnet's effort to cover a large number of senses for each noun, it is not always complete. For example, Wordnet causes the pairs constraint/limitation and downturn/slowdown to be labeled as alternation, even though they are most often used synonymously. Another issue is that of overformalization, which causes a disconnect between the information in Wordnet and the way terms are used colloquially. For example, Wordnet defines objectivity as an "attribute" and impartiality as a "psychological feature," thus causing the pair objectivity/impartiality to be labeled as independent, despite the fact that these two terms are used interchangeably. Similarly, Wordnet defines "dairy" as a hyponym of "farm" and "dairy product" as a hyponym of "food," causing the pair dairy/milk to be labeled as independent.
7 Handling alternations

As discussed above, the alternation class presents some particular difficulties when defined using only the information available in Wordnet. In this section, we explore in detail the shortcomings of the Wordnet approach and motivate our decision to collect manual annotations for our data.

7.1 Definitions

We refer to the depth of a node as the length of the path from that node to the root. Some nodes have multiple inheritance¹ or are reachable from the root node via more than one path. The NLTK Wordnet interface provides access to both the minimum depth (the length of the shortest path from root to node) and the maximum depth (the length of the longest path from root to node). In the following experiments, we mean the minimum depth when we refer to depth. All of the experiments were replicated using the maximum depth, and showed no notable change in results.

When we refer to the distance between two nodes, we are referring to the length of the shortest path between the two nodes. When we refer to the shared parent of two nouns, we refer to the deepest common hypernym of both nouns. That is, in cases where the nouns share more than one hypernym, we choose the hypernym with the greatest depth. For example, the nouns "ape" and "monkey" share the hypernym "primate," which has a depth of 11, and also "animal," which has a depth of 6. We choose "primate" as the deepest shared hypernym.

¹ For example, the synset person.n.01 has a depth of 3 via the path entity→physical entity→causal agent→person, as well as a depth of 6, via the path entity→physical entity→object→whole→living thing→organism→person.

    alternation: advisor/councillor, beach/range, cap/plug, dealer/distributor, flavor/perfume, homecoming/prom, hop/leap, kitten/pussycat, phenol/phenolic, text/version
    hypernym: accuracy/precision, aircraft/plane, animal/livestock, combustion/incineration, enterprise/firm, eruption/rash, grandchild/granddaughter, homicide/manslaughter, machine/machinery, oxide/rust
    hyponym: bobcat/lynx, boldness/courage, chaos/confusion, coast/shore, compassion/sympathy, diet/food, footnote/note, horseman/rider, jet/plane, spadefoot/toad
    independent: business/commercial, cave/cellar, col/colonel, disability/invalidity, flipper/pinball, freezing/gel, klein/small, man/rights, stability/stabilization, tour/tower
    other related: bicycle/cycling, green/greening, karelia/karelian, path/road, patriarch/patriarchate, programming/programs, prosecution/trial, retail/retailer, ski/skiing, work/working
    synonym: auberge/inn, avant-garde/vanguard, bazaar/bazar, boundary/limit, cereal/grain, faith/trust, normandie/normandy, optimisation/optimization, usage/use, vapor/vapour

Table 4: Labeling applied to noun pairs in PPDB-large. Only one pair appearing in WN (proliferation/nonproliferation) was assigned the label "antonym."

7.2 Parameters

We want to optimize the accuracy of classifying a noun pair x/y as "alternation." Using only the information within Wordnet, we have three basic parameters to play with: the depth of x and y's shared parent in the hierarchy, the distance of the shared parent from either x or y, and the number of children the parent has.
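These quantities map directly onto NLTK calls; a small sketch (names other than the NLTK methods are ours):

    from nltk.corpus import wordnet as wn

    def deepest_shared_parent(X, Y):
        # Deepest common hypernym of two synsets, measured by minimum depth.
        common = X.lowest_common_hypernyms(Y)
        return max(common, key=lambda s: s.min_depth()) if common else None

    ape, monkey = wn.synset('ape.n.01'), wn.synset('monkey.n.01')
    parent = deepest_shared_parent(ape, monkey)  # expect the "primate" synset
    depth = parent.min_depth()                   # depth of the shared parent
    dist = ape.shortest_path_distance(parent)    # hops from "ape" up to it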
Depth of parent
Wordnet is organized hierarchically, so intuitively, nodes which are nearer to the root refer to more general topics, and nodes deeper in the hierarchy become more specific. Figure 5 shows how the top levels of the hierarchy contain very few nodes, and table 5 shows the kinds of general concepts that exist in these top levels. It therefore seems reasonable to assume that, for a noun pair labeled as alternation, the depth of the shared parent may be indicative of the accuracy of the alternation: noun pairs for which the lowest shared parent is very high in the hierarchy (close to the root) may actually be independent (e.g. if both nouns are hyponyms of "thing" but otherwise are unconnected), and noun pairs for which the lowest shared parent is very deep in the hierarchy (close to the leaf nodes) may actually be in a hypernymy relationship (e.g. if both nodes are children of something specific like "happiness"). We test the hypothesis that the depth of the shared parent can be tuned to reduce the number of false positive labels for alternation.

    depth   nodes
    0           1
    1           3
    2          22
    3         228
    4       2,020
    5       6,249
    6      12,267
    7      18,936
    8      14,155
    9      11,042
    10      7,207
    11      4,267
    12      2,505
    13      1,383
    14        846
    15        449
    16        341
    17        164
    18         30

Figure 5: Number of nodes at each depth in WN.

    depth 0: entity.n.01
    depth 1: abstraction.n.06, physical entity.n.01, thing.n.08
    depth 2: attribute.n.02, causal agent.n.01, change.n.06, communication.n.02, freshener.n.01, group.n.01, horror.n.02, jimdandy.n.02, matter.n.03, measure.n.02, object.n.01, otherworld.n.01, pacifier.n.02, process.n.06, psychological feature.n.01, relation.n.01, security blanket.n.01, set.n.02, stinker.n.02, substance.n.04, thing.n.12, whacker.n.01

Table 5: 26 synsets appearing with depth < 3. Most of these synsets refer to general concepts (e.g. process and object). Others are specific, but have few or no child nodes. The following have no children: change, freshener, horror, jimdandy, otherworld, pacifier, security blanket, stinker, substance, and whacker.

Distance from parent
Additionally, we consider the nouns' distance from their shared parent. While in some alternation pairs both nouns are immediate descendants of their shared hypernym (e.g. "love" and "fear" are both children of "emotion"), other pairs consist of nouns which are very far from the shared parent (e.g. "step father" and "oil painter" can be classified as alternation through the shared parent "person," but the nouns are 7 and 4 steps away, respectively). We explore the idea of limiting the maximum distance from the shared parent in order to increase the precision of our alternation classification. It is worth noting that Wordnet's "sister terms" are equivalent to our alternation if we restrict the maximum distance from the shared parent to be no more than 1.

Number of children
Finally, we consider the number of immediate children of the shared parent. While it is less intuitive than the previous parameters how this may affect the likelihood of alternation, it may be that parents with many children are nodes which enumerate subtypes (e.g. the synset for "flower" has over 100 children, which enumerate different types of flowers) and thus are high-precision indicators of alternation. We therefore also investigate the relationship between the number of children of a node and the likelihood of the children being in alternation.
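Putting the three parameters together, an alternation test over a pair of noun synsets might look like the following sketch (a word-level test would take the best result over synset pairs). It builds on deepest_shared_parent from the previous sketch; parameter names and defaults are ours:

    def in_alternation(X, Y, min_depth=3, max_dist=None, max_children=None):
        """Alternation test over two noun synsets, with the three tunable
        parameters discussed above. None disables a constraint."""
        parent = deepest_shared_parent(X, Y)
        if parent is None or parent.min_depth() < min_depth:
            return False                  # shared parent too close to root
        if max_dist is not None and (
                X.shortest_path_distance(parent) > max_dist or
                Y.shortest_path_distance(parent) > max_dist):
            return False                  # a noun is too far from the parent
        if max_children is not None and len(parent.hyponyms()) > max_children:
            return False                  # parent has too many children
        return True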
7.3 Labelled data for testing

In order to compare the effect of altering these parameters, we manually labeled 200 noun pairs as alternation/non-alternation. We chose these pairs by randomly sorting the noun pairs that appeared in PPDB (xl) and were labeled as alternation using our original configuration (minimum depth of 3 and no other restrictions). We labelled the first 100 unambiguous examples of alternation and the first 100 unambiguous examples of non-alternation. The full list used is included at the end of this paper (table 8). This data is used to compute precision/recall measures in the following experiments.

7.4 Experiments

We first performed a qualitative analysis. Manually inspecting the data to evaluate the minimum depth parameter, we see that there are examples of both good and bad alternation pairs alternating through parents at every depth. Table 6 illustrates this.

    depth  shared parent                 x/y
    3      whole.n.02                    ballplayer/baseball
           state.n.02                    bliss/happiness
    4      happening.n.01                avalanche/flood
           disorder.n.03                 instability/turmoil
    5      linear unit.n.01              km/mile
           district.n.01                 town/village
    6      meal.n.01                     dinner/lunch
           atmospheric phenomenon.n.01   snow/snowstorm
    7      modulation.n.02               am/pm
           payment.n.01                  compensation/wage
    8      garment.n.01                  sweater/t-shirt
           man.n.01                      boy/guy
    9      movable feast.n.01            easter/passover
           clergyman.n.01                preacher/priest
    10     bird of prey.n.01             eagle/hawk
           placental.n.01                cattle/livestock

Table 6: Pairs of words labeled as alternation, and the words' shared parent. Parents at every depth can produce both good (top) and bad (bottom) examples of alternation.

Manual inspection of the maximum distance parameter is similarly inconclusive. We ran our tagging algorithm on a random sample of 30 manually labeled pairs (15 alternation, 15 non-alternation), setting max distance d to each value from 1 to 10, and enforcing the constraint that a noun pair is in alternation if the nouns in the pair share a hypernym and neither noun is more than d hops from the shared hypernym. Increasing d increases equally the number of true positives and the number of false positives. At d = 7, all 30 nouns were labeled as alternation. Table 7 shows these results.

We also look quantitatively at the effects of these parameters. Figure 6 shows the precision and recall curves generated by a) fixing the max distance parameter at infinity and varying the min depth between 1 and 10, and b) fixing the min depth at 0 and varying the max distance parameter between 1 and 10. In the former case (left of figure 6), we see that we are able to achieve acceptable precision when we set a high minimum depth, but only at the expense of unacceptably low recall. In the latter case (right of figure 6), precision stays near 50% (the equivalent of labeling everything as alternation) no matter what value we choose for max distance.

Figure 6: Precision and recall curves when varying min depth (left) and max distance (right). For max distance curves, min depth was fixed at 0. For min depth curves, max distance was fixed at 20. A precision of 0.5 and a recall of 1.0 can be achieved by labeling all pairs as alternation.

We also look at the effect of using these two parameters (maximum distance from parent and minimum depth of parent) together. Figure 7 shows the F1 scores achieved for different combinations of parameter settings.
No combination achieves higher than 0.67, the F1 score achievable by labeling everything as alternation (which results in 50% precision and 100% recall).

    d = 1   alternation: 20/21, butterflies/moths, clam/mussel, cuttlefish/squid, duck/goose, ecg/eeg, tornado/typhoon, xxvii/xxviii
            non-alternation: adjustment/tuning, briefing/presentation, home/shelter, interval/span
    d = 2   adds alternation: km/mile
            adds non-alternation: bomber/kamikaze, contaminant/pollutant, diagram/graphic, judgement/trial
    d = 3   adds alternation: coat/gown, dress/skirt, elf/goblin
            adds non-alternation: artist/performer, drama/theatre, indignation/resentment
    d = 4   adds alternation: girls/guys
            adds non-alternation: cutoff/threshold, internet/website
    d = 5   adds non-alternation: monitoring/surveillance
    d = 6   adds alternation: ballplayer/baseball, volleyball/weightlifting
    d = 7   adds non-alternation: looting/theft

Table 7: Noun pairs labeled as alternation (out of a test set of 30 manually labeled pairs) with varying values of max distance d; each setting includes all pairs labeled at smaller distances. The numbers of true positives and of false positives increase equally as the max distance increases.
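The precision/recall sweeps behind figures 6 and 7 can be reproduced along these lines, assuming gold-labeled (X, Y, is_alt) triples from the 200 manually labeled pairs (a sketch; names are ours):

    def prf(pairs, min_depth, max_dist):
        # pairs: iterable of (X_synset, Y_synset, gold_bool) triples.
        tp = fp = fn = 0
        for X, Y, gold in pairs:
            pred = in_alternation(X, Y, min_depth=min_depth, max_dist=max_dist)
            if pred and gold:
                tp += 1
            elif pred:
                fp += 1
            elif gold:
                fn += 1
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # The grid of figure 7:
    # f1_grid = {(d, k): prf(pairs, min_depth=d, max_dist=k)[2]
    #            for d in range(11) for k in range(11)}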
    depth\dist    0      1      2      3      4      5      6      7      8      9     10
    0         0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
    1         0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
    2         0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
    3         0.000  0.532  0.557  0.624  0.640  0.650  0.667  0.669  0.669  0.667  0.667
    4         0.000  0.532  0.567  0.636  0.651  0.664  0.667  0.664  0.664  0.662  0.662
    5         0.000  0.521  0.570  0.649  0.653  0.667  0.672  0.672  0.674  0.674  0.674
    6         0.000  0.491  0.541  0.629  0.627  0.624  0.628  0.628  0.628  0.628  0.628
    7         0.000  0.403  0.468  0.561  0.570  0.570  0.575  0.575  0.575  0.575  0.575
    8         0.000  0.331  0.400  0.468  0.468  0.468  0.465  0.465  0.465  0.465  0.465
    9         0.000  0.293  0.318  0.357  0.357  0.357  0.354  0.354  0.354  0.354  0.354
    10        0.000  0.212  0.226  0.226  0.226  0.226  0.224  0.224  0.224  0.224  0.224

Figure 7: F1 scores for varying minimum depths (rows) and maximum distances (columns). Precision and recall were calculated on a set of 200 manually labeled noun pairs (100 positive examples of alternation, 100 negative). No combination achieves acceptable results: 0.67 is the F1 score that would be achieved by labeling every pair as alternation (0.5 precision and 1.0 recall).

Finally, we look at the association between the number of children a parent has and the likelihood that those children are in alternation. Figure 8 shows a histogram of the number of immediate children per parent for each pair in our manually labeled data set. The distributions of number of children are nearly identical for alternation and non-alternation pairs. We also ran a precision-recall analysis varying both the maximum and the minimum number of children. That is, we tried labeling pairs as alternation if 1) both nouns shared a parent and 2) that parent had at least/at most k children. Although the graphs are not shown, the results were the same as in the graph for maximum distance: for all values of k and for both the "at least" and the "at most" setting, precision remained at 50%, equivalent to random guessing.

Figure 8: Number of immediate children of the deepest shared parent for non-alternation pairs (left) and for alternation pairs (right). The distributions are almost identical, suggesting that the number of children of the shared parent is not a useful way of reducing the number of false positive alternation labels.

8 Next steps

Due to the described limitations, and the goal application to RTE, we will gather manual annotations for a subset of our data in PPDB. By using more reliably labeled data than what we can collect from Wordnet, we should be able to train a classifier to automatically label the remainder of the pairs in PPDB. The resulting labels (or distributions over labels) will hopefully offer a valuable contribution to RTE systems, with a scale and level of annotation that is not currently available in existing resources.
    alternation: 10/11, 18/19, 19/20, 20/21, acre/hectare, afternoon/evening, am/pm, animal/zoo, australasia/oceania, avalanche/flood, badger/skunk, ballplayer/baseball, band/orchestra, bands/orchestras, batter/catcher, battleship/boat, beef/ox, billionaire/millionaire, binoculars/telescope, boat/ship, bonnet/cap, box/glove, boxer/wrestler, brass/bronze, bucks/bullets, butterflies/moths, cabin/cockpit, canoe/motorboat, cape/robe, catcher/hitter, char/trout, chariot/wagon, cheetah/leopard, clam/mussel, cm/inch, coat/gown, coffee/diner, crab/prawn, cricket/grasshopper, crustacean/shellfish, cucumber/lettuce, cuttlefish/squid, d/j, daughter-in-law/son-in-law, dictionary/thesaurus, dinner/lunch, disc/drive, discussion/review, dizziness/numbness, dollar/yuan, dress/skirt, dress/suit, drink/wood, drive/walk, driveway/garage, duck/goose, dungeon/tower, eagle/hawk, easter/passover, ecg/eeg, elbow/shoulder, elf/goblin, epinephrine/norepinephrine, firefighter/firehouse, gallon/litre, gdp/gnp, girls/guys, grape/raisin, hotel/restaurant, i/j, ii/iii, km/mile, latte/milk, lavender/violet, leopard/lion, librarian/library, mouth/throat, n/s, ovum/zygote, p/s, pad/tampon, pan/pot, psychiatrist/psychologist, rider/runner, roman/romanian, sailboat/yacht, saturday/sunday, sauce/spice, skating/skiing, spite/sympathy, sweater/t-shirt, sweater/vest, tornado/typhoon, typewriter/typist, vcr/video, vii/viii, volleyball/weightlifting, windshield/wiper, xvii/xviii, xxvii/xxviii

    non-alternation: adjustment/tuning, ailment/disease, ambiguity/vagueness, analysis/assessment, anarchy/chaos, angle/perspective, archives/file, army/military, artist/performer, atmosphere/climate, award/compensation, balloting/election, bigotry/fanaticism, bliss/happiness, block/chunk, blouse/shirt, bomber/kamikaze, boy/guy, brawl/rumble, briefing/presentation, bullying/harassment, cannabis/hashish, capital/investment, case/file, cassette/tapes, cattle/livestock, charisma/charm, chevalier/knight, chin/jaw, chip/token, coil/spool, companion/partner, compensation/wage, conceptualization/design, configuration/setup, construction/design, contaminant/pollutant, correctness/validity, corruption/embezzlement, cutoff/threshold, demonstration/rally, determination/will, diagram/graphic, dish/platter, drama/theater, drama/theatre, drum/keg, dust/powder, education/studies, election/poll, fellowship/scholarship, figure/graphic, figure/silhouette, first/top, focus/spotlight, freeway/interstate, guard/warden, high-rise/skyscraper, home/shelter, humidity/moisture, idea/insight, inability/weakness, indignation/resentment, instability/turmoil, internet/website, interval/span, itinerary/trip, judgement/trial, lady/queen, leave/vacation, looting/theft, love/romance, mansion/villa, map/route, means/resources, merit/value, mil/thousand, monitoring/surveillance, motion/petition, municipality/township, opponent/rival, outfit/suit, papa/pope, pattern/trend, persecution/repression, personality/self, polo/shirt, preacher/priest, psychologist/therapist, reality/truth, room/suite, sailing/yachting, seminars/symposiums, senior/superior, shawl/wrap, shuttle/spaceship, snow/snowstorm, sore/wound, subway/train, town/village

Table 8: Manually annotated noun pairs used in precision/recall calculations.

9 References

MacCartney, Bill. Natural Language Inference. Diss. Stanford University, 2009.