Are WordNet sense distinctions appropriate for computational lexicons?
Martha Palmer
Institute for Research in Cognitive Science and
Department of Computer Science
University of Pennsylvania
Philadelphia PA 19104-6389
[email protected]
1 Introduction
The difficulty of achieving adequate hand-crafted semantic representations has limited the field of
natural language processing to applications that can be contained within well-defined subdomains.
The only escape from this limitation will be through the use of automated or semi-automated
methods of lexical acquisition. However, the field has yet to develop a clear consensus on guidelines
for a computational lexicon that could provide a springboard for such methods, in spite of all of
the effort on different lexicon development approaches [8], [17], [12], [2], [7], [4]. One of the most
controversial areas has to do with polysemy. What constitutes a clear separation into senses for
any one verb, and how can these senses be computationally characterized and distinguished? The
answer to this question is the key to breaking the bottleneck of semantic representation that is
currently the single greatest limitation on the general application of natural language processing
techniques.
In this paper we specifically address the question of polysemy with respect to verbs, and whether
or not the sense distinctions that are made in on-line dictionary resources are appropriate for
computational lexicons. We examine the use of sets of related syntactic frames and verb classes
as a means of simplifying the task of defining different senses, and we focus on the mismatches
between distinctions that can readily be made with these tools and some of the distinctions that
are made in WordNet.
2 Challenges in building large-scale lexicons
Computational lexicons are an integral part of any natural language processing system, and perform
many essential tasks. The difficulty in achieving adequate performance of one of the most
visible ones, word sense disambiguation, is currently creating a major roadblock on the information
highway (see the SIGIR-97 Proceedings for a comprehensive view of this field). Inadequate word
sense disambiguation results in information retrieval mismatches such
as the retrieval of an article on plea bargaining when the query is speed of light. A sentence in the
plea bargaining article has speed and light within a few words of each other, since plea bargaining
can lead to speedier trials and lighter sentences. These are of course not the same senses of light
(or even the same parts of speech), but a system would have to distinguish between WordNet
light1, involving visible light, and WordNet light2, having to do with quantity or degree, in
order to rule out this retrieval. In addition, failing to properly disambiguate can lead to erroneous
translations. For instance, in Korean, there are two different translations for the English verb lose,
depending on whether it is a competition that has been lost or an object that has been misplaced:
lose1, lose the battle - ci-ess-ta, and lose2, lose the report - ilepeli-ess-ta.
These particular distinctions can be made by placing semantic class constraints on the objects
of lose, +solid object and +competition, as illustrated by the trees in Figure 1. The first
constraint corresponds directly to a WordNet hypernym, but the second one does not. The closest
correlate would be +abstract activity, which is the common hypernym for both hostile military
engagement and game, and which may discriminate sufficiently.
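[Figure 1 (Selectional Restrictions for Lexical Choice): two transitive trees of the form
S -> NP0 VP, VP -> V NP1, anchored by lose1 and lose2, whose object NP1 positions carry the
constraints [type: solid object+] and [type: competition+];
lose1 -> ci-ess-ta, lose2 -> ilepeli-ess-ta.]
Figure 1: Distinguishing two senses of lose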
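The selection of a translation from a semantic class constraint on the object can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hypernym table is a toy stand-in for a resource such as WordNet, and the node names and the functions is_a and choose_translation are hypothetical, not part of any existing system.

```python
# Toy hypernym hierarchy (child -> parent). The node names are illustrative
# assumptions, not actual WordNet synsets.
HYPERNYM = {
    "report": "solid object",
    "purse": "solid object",
    "battle": "competition",
    "game": "competition",
    "solid object": "entity",
    "competition": "abstract activity",
}

# Each usage of "lose" pairs a constraint on its object with the Korean
# translation given in the text.
LOSE_USAGES = [
    {"gloss": "misplace an object", "object_type": "solid object",
     "korean": "ilepeli-ess-ta"},
    {"gloss": "fail to win", "object_type": "competition",
     "korean": "ci-ess-ta"},
]


def is_a(word, semantic_class):
    """Return True if semantic_class is the word itself or one of its hypernyms."""
    node = word
    while node is not None:
        if node == semantic_class:
            return True
        node = HYPERNYM.get(node)
    return False


def choose_translation(object_noun, usages=LOSE_USAGES):
    """Return the translation of the first usage whose constraint is satisfied."""
    for usage in usages:
        if is_a(object_noun, usage["object_type"]):
            return usage["korean"]
    return None  # no constraint satisfied: an exact-match lexicon fails here


print(choose_translation("report"))
print(choose_translation("battle"))
```

An object noun that satisfies neither constraint returns None, which is exactly the all-or-nothing behavior that motivates the partial matching discussed later in this section.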
If this is the case for many polysemous verbs, then one of the questions to be addressed is
simply: why is this such a difficult task? (Since nouns have on average many fewer senses than
verbs in most lexical resources, for the time being we will put aside the problem of also having
to disambiguate the noun phrases that constitute the verb arguments.) The Wall Street Journal
corpus has around three to four thousand verbs. The entire Xtag English lexicon has around eight
thousand. Why is this so daunting?
The crux of the matter is that, in addition to the highly polysemous nature of many of these
verbs, some of which can have over eighty different senses, there is another, more critical challenge.
New senses are continuously being added, due to the extreme fluidity and flexibility of language. It
is not enough to be able to simply list every possible sense for every possible verb on September 4,
1998. It is also necessary to be able to account for the adaptable manner in which old senses can be
transformed into new senses by substituting in a new object type, or adding an argument, or using
a different preposition, etc. Thus, even though many of us are most familiar with smash when
it is being used to describe a breaking event, we have no trouble understanding its use in smash
at his opponent's face, where it occurs in a hit sense, even if we have never seen that use before.
Computational lexicons need to be able to adapt to new usages in an equally flexible fashion.
The current approach to building computational lexicons involves handcrafted, independent
entries for each sense, with the expectation that a particular sentence will either exactly match one
of these entries or fail completely. When a failure occurs, there is no way of gracefully adapting
to the best partial match. The importance of allowing for best partial matches, in particular
for translation purposes, has been demonstrated by an experimental English/Chinese
machine translation system implemented at the National University of Singapore [20], [15]. This
system achieved a rate of successful lexical choice that was significantly higher than the TRANSTAR
[15] accuracy rate for the break examples from the Brown corpus, as well as additional sentences
featuring other break verbs and hit, touch and cut verbs. (Note that accurate lexical choice in
machine translation depends critically on first appropriately disambiguating the sense of the source
verb.) The success rate was achieved by using hand-crafted conceptual ontologies for both English
and Chinese lexical items, and then merging them into an interlingua ontology. These ontologies
were represented as conceptual lattices which could be used to compute conceptual similarity based
on proximity in the lattice. The system found the set of best partial matches between the source
item and target lexical items, and then language-specific selectional restrictions based on the verb
arguments were used to finalize the choice of the target verb, even when the source lexical items
occurred in novel usages.
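To make "similarity based on proximity in the lattice" concrete, the following Python sketch scores two concepts by the depth of their lowest common ancestor relative to their own depths, a depth-based measure of the general kind associated with [20]. It is a simplification under stated assumptions: the hierarchy is a small tree rather than a merged English/Chinese lattice, and the concept names and function names are hypothetical.

```python
# Toy concept hierarchy (child -> parent). A real system would use a lattice
# merged from hand-crafted ontologies; these nodes are illustrative assumptions.
PARENT = {
    "change-of-state": "event",
    "contact": "event",
    "break": "change-of-state",
    "shatter": "break",
    "snap": "break",
    "hit": "contact",
    "touch": "contact",
}


def path_to_root(node):
    """Return the list of nodes from node up to the root, inclusive."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def depth(node):
    """Number of nodes on the path to the root (the root has depth 1)."""
    return len(path_to_root(node))


def lowest_common_ancestor(a, b):
    """Return the deepest node that dominates both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts do not share a root")


def similarity(a, b):
    """Depth-based similarity in (0, 1]; identical concepts score 1.0."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))


def best_partial_matches(source, candidates):
    """Rank candidate target concepts by similarity to the source concept."""
    return sorted(candidates, key=lambda c: similarity(source, c), reverse=True)


print(best_partial_matches("shatter", ["hit", "snap", "break"]))
```

In a fuller system, the ranked candidates would then be filtered by language-specific selectional restrictions on the verb arguments, as described above.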
This type of successful partial matching relies on entries in a computational lexicon that include:
- predicate argument structures with selectional restrictions
- all possible syntactic contexts for each unique predicate-argument structure
- conceptual links to related senses
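The three ingredients listed above suggest a simple record structure for a lexicon entry. The Python sketch below is one possible encoding, not a description of any existing lexicon: the class names, field names, frame notation and the "animate" restriction on the AGENT are all assumptions introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Argument:
    """One argument of a predicate, with an optional selectional restriction."""
    role: str
    restriction: Optional[str] = None


@dataclass
class SenseEntry:
    """A single verb sense: arguments, syntactic contexts, and conceptual links."""
    lemma: str
    sense_id: str
    arguments: List[Argument]
    frames: List[str] = field(default_factory=list)       # syntactic contexts
    links: Dict[str, str] = field(default_factory=dict)   # link type -> sense_id


# Two related senses of "break", linked to each other (hypothetical identifiers).
break_intransitive = SenseEntry(
    lemma="break",
    sense_id="break-intransitive",
    arguments=[Argument("PATIENT", "solid object")],
    frames=["NP1 V"],
    links={"causative": "break-transitive"},
)
break_transitive = SenseEntry(
    lemma="break",
    sense_id="break-transitive",
    arguments=[Argument("AGENT", "animate"), Argument("PATIENT", "solid object")],
    frames=["NP0 V NP1", "NP1 be V-ed by NP0"],
    links={"intransitive": "break-intransitive"},
)

print([arg.role for arg in break_transitive.arguments])
```

Keeping the links as typed pointers between entries is what allows a matcher to fall back on a related sense when no entry matches exactly.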
The current challenge is to extend this approach to much broader lexicons and to other
languages. The immediate question is whether or not on-line available resources such as WordNet
can readily lend themselves to producing such conceptually rich computational lexicons. Another
question is whether or not word sense disambiguation evaluations can be set up that can also allow
points to be awarded for partial matches [6].
3 What does WordNet provide?
WordNet is an on-line lexical database of English that currently contains about 120,000 sets of
noun, verb, adjective, and adverb synonyms, each representing a lexicalized concept [9], [10], [11].
A synset (synonym set) contains, besides all the word forms that can refer to a given concept, a
definitional gloss and - in most cases - an example sentence. Words and synsets are interrelated
by means of lexical and semantic-conceptual links, respectively. Antonymy or semantic opposition
links individual words, while the super-/subordinate relation links entire synsets. WordNet was
designed principally as a semantic network, although it does contain some syntactic information.
Although WordNet provides the conceptual links between senses that we are in search of, the
sense distinctions it makes are not necessarily those that are appropriate for a computational
lexicon. We will first examine WordNet sense distinctions that are basically appropriate, but
could perhaps be elaborated or made more regular, and then look at distinctions that, although
meaningful to humans, are based on criteria outside the realm of a computational lexicon.
3.1 Regularizing sense distinctions
The following two senses are among the 63 sense distinctions listed for break.
break - 2. break, separate, split up, fall apart, come apart -- (become separated into pieces or
fragments; "The figurine broke"; "The freshly baked loaf fell apart")
break - 5. (destroy the integrity of; usually by force; cause to separate into pieces or fragments;
"He broke the glass plate"; "She broke the match")
They are shown as being related senses in the latest version of WordNet, which provides one of
the conceptual links we are interested in. In addition, our computational lexicon should also specify
the type of link, +causative, and the inclusion of an AGENT as an argument to the second sense.
In a Lexicalized Tree-Adjoining Grammar, this relationship can be captured by including both
syntactic frames in a single tree family (the same tree family that is associated with all ergative
verbs, and which contains all of the syntactic realizations associated with a particular verb sense,
including subject extraction and object extraction, etc.) and indicating that the transitive tree has
a +causative feature that the intransitive verb does not have [5], [19, 18], [16]. This is illustrated
in Figure 2, along with additional trees. The tree family captures the relationship in a general way,
and indicates the close semantic relationship between the two usages - they describe the same type
of event, the only difference being the explicit causation and the extra argument in the transitive
[14]. Since WordNet associates two very different synonym sets with each usage, that conceptual
closeness is obscured.
[Figure 2 (The BREAK Tree Family, partial): trees anchored by break, including a transitive
tree NP0 V NP1 whose VP carries [event: +causative], a passive tree (tense = passive) with a
by-phrase, and a wh-extraction tree (wh = +) containing an empty (ε) NP position.]
Figure 2: Break tree family
In addition to regular extensions in meaning that derive from systematic changes in subcategorization frames, there are also regular extensions occasioned by the adjunction of optional prepositions, adverbials and prepositional phrases. For example, the basic meaning of push, He pushed
the next boy, can be extended to explicitly indicate accompanied motion by the adjunction of a
path prepositional phrase, as in He pushed the boxes across the room [3], which corresponds to
1 below. The possibility of motion of the object can be explicitly denied through the use of the
conative, as in He pushed at the box, which is captured by sense 5. Finally, the basic sense can also
be extended to indicate a change of state of the object by the adjunction of apart, as in He pushed
the boxes apart. There is no WordNet sense that corresponds to this, nor should there be. What
is important is for the lexicon to provide the capability of recognizing and generating these usages
where appropriate. If they are general enough to apply to entire classes of verbs, then they can be
captured through regular adjunctions rather than being listed explicitly, as illustrated by Figure 3
and Figure 4 [1].
1. push, force -- (move with force, "He pushed the table into a corner"; "She pushed her chin
out")
5. push -- (press against forcefully without being able to move)
[Figure 3 (Distinct PUSH senses: WN 1, WN 5): two trees anchored by push, a transitive tree
NP0 V NP1 labeled [event: force+, contact+] and a conative tree NP0 V at NP1 labeled
[event: motion-, force+, contact+].]
Figure 3: Push tree family
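The idea that these usages need not be listed as separate senses, but can be derived from a basic sense plus a regular extension, can be sketched as feature composition. In the Python sketch below, the feature names loosely follow the labels in Figures 3 and 4, but the assignment of features to particular extensions and the function name extend_sense are assumptions made for illustration.

```python
# Basic sense of "push", as in "He pushed the next boy".
BASE_PUSH = {"force": True, "contact": True}

# Features contributed by each regular extension (assumed assignments).
EXTENSIONS = {
    "across": {"motion": True, "path": "ACROSS"},        # path PP
    "apart": {"change_of_state": True, "path": "NONE"},  # adverbial
    "at": {"motion": False},                             # conative
}


def extend_sense(base, *extensions):
    """Return a new feature set: the base sense plus each extension's features."""
    features = dict(base)  # copy so the base sense is left untouched
    for name in extensions:
        features.update(EXTENSIONS[name])
    return features


print(extend_sense(BASE_PUSH, "across"))  # He pushed the boxes across the room
print(extend_sense(BASE_PUSH, "at"))      # He pushed at the box
print(extend_sense(BASE_PUSH, "apart"))   # He pushed the boxes apart
```

Because the extensions are stored once and not per verb, the same table could in principle serve every verb in a class that permits these adjunctions.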
3.2 Inappropriate distinctions
However, there are other senses of push in WordNet where the distinctions cannot be so clearly
related to changes in syntactic frames. The following senses all have to do with a sense of push
in which abstract pressure is brought to bear. Distinctions can perhaps be made between 2 and 3
based on the arguments: one is animate and the other is presumably a product. Sense 4 does not
seem to have an argument that is the object of push, but rather a purpose clause, as does 7. Are
they distinguished because one is a to-clause and one is a for-clause? And how are 7 and 8 different?
Surely it could be said that The liberal party crusaded for reforms, or that She pressed for women's
rights? What criteria could a computational lexicon use to separate senses 7 and 8?
[Figure 4 (Available PUSH senses through adjunction): the base push tree NP0 V NP1 labeled
[event: force+, contact+], together with VP auxiliary trees (VP*) that adjoin a PP headed by
across or the adverb apart, with feature labels including [path: ACROSS, event: motion+],
[event: change of state+, path: NONE] and [event: force+, motion+].]
Figure 4: Regular sense extensions of push through adjunction
2. push, bear on -- (press, drive, or impel to action; "He pushed her to finish her doctorate")
3. advertize, promote, push -- (make publicity for; try to sell)
4. tug, labor, labour, push, drive -- (exert oneself; "She tugged for years to make a decent
living")
7. crusade, fight, campaign, push, agitate -- (engage in a crusade for a certain cause or person;
"She is crusading for women's rights")
8. push, press -- (exert oneself continuously, vigorously, or obtrusively to gain an end; "The
liberal party pushed for reforms")
It could be argued that WordNet should not have listed separate senses for 7 and 8 in the first
place, but that easy out is not always available. The inappropriateness of the demands being placed
on a computational lexicon can also be illustrated by examining the ten senses for the verb lose. We
find one that corresponds to our lose1 from above, the lost the battle sense, 2, but two that
correspond to our lose2, misplace an item, 1 and 5.
lose1 - 2. lose -- (fail to win; "We lost the battle but we won the war")
lose2 - 1. (fail to keep or to maintain; cease to have, either physically or in an abstract sense;
fail to keep in mind or in sight; "She lost her purse when she left it unattended on her seat";
"She lost her husband a year ago")
lose2 - 5. (miss from one's possessions; lose sight of; "I've lost my glasses again!")
When we try to establish concrete criteria for distinguishing between 1 (lost her purse) and
5 (lost my glasses), we realize that these two WordNet senses are not distinguished because of
anything to do with the verb arguments (an animate agent and a solid object possessed by the
agent in both cases), but rather are distinguished by possible future events - namely the likelihood
of the object being found. Are these two usages really different senses of the verb, or are they
simply two different instances of the same verb, and the same basic event, but in very different
world contexts which will therefore lead to different outcomes? In which case, is it reasonable to
expect a computational lexicon, in addition to meticulously listing changes in syntactic context,
in argument structure, and in selectional restrictions on those arguments, to also characterize all
possible worlds in which an event can take place, and distinguish between all possible outcomes?
We are not denying that a computational lexicon should include any changes in the state of the
world that are entailed by a particular action; quite the contrary, as evidenced by [13]. However,
the characterizations of these changes should be generally applicable, and cannot be so dependent
on a single world context that they change with every new situation.
This does not invalidate the sense distinctions being made by WordNet (or other dictionaries).
It is perfectly legitimate for people to use their knowledge of world context to illustrate subtle shifts
in entailments. However, we need to separate that from the task we are setting the computer, and
the demands being placed on it. It may be appropriate someday for a complete natural language
understanding system to produce representations of the sentences in the definitions presented here
that capture exactly those differences in entailments that are occasioned by the different situations.
But that burden should not be borne by the lexicon alone, but rather by the lexicon interacting
with world knowledge and discourse context.
4 Conclusion
It has been argued that WordNet sense distinctions are too fine-grained, and coarser senses are
needed to drive the word sense disambiguation task. For instance, in defining cut, WordNet
distinguishes between 1. separating into pieces, 29. cutting grain, 30. cutting trees, and 33. cutting hair.
For many purposes, the three more specialized senses, 29, 30 and 33, could all be collapsed into the
more coarse-grained 1, even though when searching for articles on recent changes in hair styles,
the more fine-grained 33 would be necessary. These types of fine-grained distinctions are based on
selectional restrictions on arguments, and should in fact be feasible for a computational lexicon to
distinguish among. The point being made here is that although computational lexicons are capable
of making fine distinctions, those distinctions must be related to explicit concrete criteria.
Computational lexicons can distinguish senses readily based on changes in argument structure, changes
in sets of syntactic frames and/or changes in selectional restrictions, and these are the kinds of
criteria that are appropriate for computational sense distinctions. Distinctions that are based on
world knowledge, no matter how diverse, are much more problematic. By bearing this in mind, we
can design a word sense disambiguation task that will provide a forum for encouraging rational,
incremental development of computational lexicons.
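One way such a task could reward both fine-grained and coarse-grained answers is sketched below in Python, using the cut senses mentioned above. This is an assumption-laden illustration and not a proposed standard: the partial credit value of 0.5, the mapping table and the function names are all hypothetical.

```python
# Fine-grained WordNet senses of "cut" mapped to the coarser sense that
# subsumes them (29 cutting grain, 30 cutting trees, 33 cutting hair ->
# 1 separating into pieces), following the example in the text.
COARSER_SENSE = {29: 1, 30: 1, 33: 1}


def coarse(sense):
    """Map a sense number to its coarse-grained sense (itself if unmapped)."""
    return COARSER_SENSE.get(sense, sense)


def score(predicted, gold, partial_credit=0.5):
    """Full credit for an exact match, partial credit for a coarse match.

    The partial_credit value is an arbitrary assumption for illustration.
    """
    if predicted == gold:
        return 1.0
    if coarse(predicted) == coarse(gold):
        return partial_credit
    return 0.0


print(score(33, 33))  # exact fine-grained match
print(score(29, 33))  # wrong fine-grained sense, same coarse sense
print(score(2, 33))   # unrelated sense
```

A scheme of this kind lets a lexicon that only makes the coarse distinction earn some credit, while still rewarding systems whose selectional restrictions are rich enough to pick out the finer sense.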
References
[1] Tonia Bleam, Martha Palmer, and Vijay Shanker. Motion verbs and semantic features in TAG.
In TAG+-98, Philadelphia, PA, 1998. Submitted.
[2] Ann Copestake and Antonio Sanfilippo. Multilingual lexical representation. In Proceedings of
the AAAI Spring Symposium: Building Lexicons for Machine Translation, Stanford, California,
1993.
[3] Hoa Trang Dang, Karin Kipper, Martha Palmer, and Joseph Rosenzweig. Investigating regular
sense extensions based on intersective Levin classes. In Proceedings of ACL98, Montreal, Canada,
August 1998.
[4] Bonnie J. Dorr. Large-scale dictionary construction for foreign language tutoring and
interlingual machine translation. Machine Translation, 12:1-55, 1997.
[5] Aravind K. Joshi, L. Levy, and M. Takahashi. Tree Adjunct Grammars. Journal of Computer
and System Sciences, 1975.
[6] Adam Kilgarriff. Evaluating word sense disambiguation programs: Progress report. In
Proceedings of the SALT Workshop on Evaluation in Speech and Language Technology, Sheffield,
U.K., July 1997.
[7] J.B. Lowe, C.F. Baker, and C.J. Fillmore. A frame-semantic approach to semantic annotation.
In Proceedings 1997 Siglex Workshop/ANLP97, Washington, D.C., 1997.
[8] I. A. Mel'čuk. Semantic description of lexical units in an explanatory combinatorial dictionary:
Basic principles and heuristic criteria. International Journal of Lexicography, pages 165-188,
1988.
[9] G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. Miller. Five papers on WordNet.
Technical Report 43, Cognitive Science Laboratory, Princeton University, July 1990.
[10] George A. Miller. WordNet: An on-line lexical database. International Journal of Lexicography,
3:235-312, 1990.
[11] George A. Miller and Christiane Fellbaum. Semantic networks of English. In Beth Levin and
Steven Pinker, editors, Lexical and Conceptual Semantics, Cognition Special Issue, pages
197-229, 1991.
[12] S. Nirenburg, J. Carbonell, M. Tomita, and K. Goodman. Machine translation: a
knowledge-based approach. Morgan Kaufmann, San Mateo, California, USA, 1992.
[13] M. Palmer. Customizing verb definitions for specific semantic domains. Machine Translation,
5, 1990.
[14] M. Palmer and A. Polguere. Lexical Computational Semantics, chapter A Lexical and
Conceptual Analysis of Break. Cambridge University Press, 1995.
[15] M. Palmer and Z. Wu. Verb semantics for English-Chinese translation. Machine Translation,
10:59-92, 1995.
[16] Martha Palmer and Joseph Rosenzweig. Capturing motion verb generalizations with
synchronous TAGs. In AMTA-96, Montreal, Quebec, October 1996.
[17] James Pustejovsky. The generative lexicon. Computational Linguistics, 17(4), 1991.
[18] Yves Schabes. Mathematical and Computational Aspects of Lexicalized Grammars. PhD thesis,
Computer Science Department, University of Pennsylvania, 1990.
[19] Yves Schabes, Anne Abeille, and Aravind K. Joshi. Parsing strategies with `lexicalized'
grammars: Application to Tree Adjoining Grammars. In Proceedings of the 12th International
Conference on Computational Linguistics (COLING'88), Budapest, Hungary, August 1988.
[20] Zhibiao Wu and Martha Palmer. Verb semantics and lexical selection. In Proceedings of the
32nd Annual Meeting of the Association for Computational Linguistics, New Mexico State
University, July 1994.