Word Sense Disambiguation
Computational Lexical Semantics

Gemma Boleda (Universitat Politècnica de Catalunya)
Stefan Evert (University of Osnabrück)

ESSLLI, Bordeaux, France, July 2009

Thanks
These slides are based on Jurafsky & Martin (2004: chapter 20)
and material by Ann Copestake (course at UPF, 2008)

Outline

1. Overview
2. Supervised WSD
3. Evaluation
4. Dictionary and Thesaurus Methods
5. Discussion

Overview

Word Sense Disambiguation: the task of selecting the correct sense for a word in context.

potentially helpful in many applications: machine translation, question answering, information retrieval, . . .
we focus on WSD as a stand-alone task (artificial!)

WSD algorithm

basic form:
  input: a word in context and a fixed inventory of word senses
  output: the correct word sense for that use
context?
  words surrounding the target word: annotated? just the words in no particular order? context size?
inventory?
  task-dependent
  machine translation from English to Spanish: the set of Spanish translations
  speech synthesis: homographs with differing pronunciations (e.g., bass)
  stand-alone task: a lexical resource (usually WordNet)

An example

WordNet Sense   Target Word in Context
bass4           . . . fish as Pacific salmon and striped bass and . . .
bass4           . . . produce filets of smoked bass or sturgeon . . .
bass7           . . . exciting jazz bass player since Ray Brown . . .
bass7           . . . play bass because he doesn't have to solo . . .

Figure: Possible inventory of sense tags for the word bass

Variants of the task

lexical sample task
  WSD for a small set of target words
  a number of corpus instances are selected and labeled
  similar to the task in our case study
  → supervised approaches; word-specific classifiers
all-words task
  WSD for all content words in a text
  similar to POS-tagging, but with a very large "tagset"! → data sparseness
  not enough training data for every word

Supervised WSD

Feature extraction

supervised approach → need to identify features that are predictive of word senses
fundamental (and early) insight: look at the context words in a window around the target, e.g. a 1-word window:
  . . . smoked bass or . . .
  . . . jazz bass player . . .

Method

process the dataset (POS-tagging, lemmatization, parsing)
build a feature representation encoding the relevant linguistic information
two main feature types:
  1. collocational features
  2. bag-of-words features

Collocational features

features that take order or syntactic relations into account
restricted to the immediate word context (usually a fixed window). For example:
  lemma and part of speech of a two-word window
  syntactic function of the target word

Collocational features: Example

Example (20.1): An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.

2-word window representation, using parts of speech:
[w-2, POS-2, w-1, POS-1, w+1, POS+1, w+2, POS+2]
= [guitar, NN, and, CC, player, NN, stand, VB]

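A minimal Python sketch of this kind of feature extraction (not from the slides; the helper name and the padding symbol are made up, and the sentence is assumed to be tokenized and POS-tagged already by any off-the-shelf tagger):

# Hypothetical sketch: extract a 2-word collocational window around a target word.
def collocational_features(tagged_sentence, target_index, window=2):
    """Return [w-2, POS-2, w-1, POS-1, w+1, POS+1, w+2, POS+2] style features."""
    features = []
    for offset in list(range(-window, 0)) + list(range(1, window + 1)):
        i = target_index + offset
        if 0 <= i < len(tagged_sentence):
            word, tag = tagged_sentence[i]
        else:
            word, tag = "<PAD>", "<PAD>"   # padding at sentence boundaries
        features.extend([word.lower(), tag])
    return features

tagged = [("An", "DT"), ("electric", "JJ"), ("guitar", "NN"), ("and", "CC"),
          ("bass", "NN"), ("player", "NN"), ("stand", "VB"), ("off", "RP")]
print(collocational_features(tagged, target_index=4))
# ['guitar', 'NN', 'and', 'CC', 'player', 'NN', 'stand', 'VB']
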
Bag-of-words features

lexical features
pre-selected words that are potentially relevant for sense distinctions. For example:
  for the all-words task: frequent content words in the corpus
  for the lexical sample task: content words in the sentences of the target word
test for the presence/absence of each such word in the selected context

Bag-of-words features: Example

Example (20.1): An electric guitar and bass player stand off to one side, not really part of the scene, just as a sort of nod to gringo expectations perhaps.

pre-selected words: [fishing, big, sound, player, fly]
feature vector: [0, 0, 0, 1, 0]

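A minimal sketch of binary bag-of-words features over a pre-selected vocabulary (hypothetical helper, illustrative only):

# Hypothetical sketch: 1 if the vocabulary word occurs in the context, 0 otherwise.
def bag_of_words_features(context_words, vocabulary):
    context = {w.lower() for w in context_words}
    return [1 if v in context else 0 for v in vocabulary]

vocabulary = ["fishing", "big", "sound", "player", "fly"]
sentence = ("An electric guitar and bass player stand off to one side , "
            "not really part of the scene").split()
print(bag_of_words_features(sentence, vocabulary))   # [0, 0, 0, 1, 0]
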
More on features

collocational cues account for:
  "collocational" effects: bass + player = bass7
  syntax-related sense differences: serve breakfast to customers vs. serve Philadelphia
bag-of-words features account for topic- and domain-related effects
  resemblance to semantic fields, frames, . . .
complementary information → both feature types are usually combined

Combined representation: Example

simplified representation for 2 sentences:
  . . . jazz bass player . . .
  . . . smoked bass or . . .
collocational features corresponding to a 1-word window
bag-of-words features: only fishing and player

Combined representation

Weka (ARFF) format:

@relation bass
@attribute wordL1 {jazz,smoke}
@attribute posL1 {NN,VBD}
@attribute wordR1 {player,or}
@attribute posR1 {CC,NN}
@attribute fishing {0,1}
@attribute player {0,1}
@attribute sense {s4,s7}
@data
jazz,NN,player,NN,0,1,s7
smoke,VBD,or,CC,0,0,s4

(first data row: . . . jazz bass player . . . ; second row: . . . smoked bass or . . . )

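For illustration, a small sketch that writes such a combined representation to an ARFF file Weka can load (the attribute names mirror the example above; collecting the nominal value sets from the data is an assumption, a real experiment would fix them in advance):

# Hypothetical sketch: write the combined feature representation to a Weka ARFF file.
instances = [
    # wordL1, posL1, wordR1, posR1, fishing, player, sense
    ("jazz",  "NN",  "player", "NN", "0", "1", "s7"),
    ("smoke", "VBD", "or",     "CC", "0", "0", "s4"),
]

def write_arff(path, instances):
    names = ["wordL1", "posL1", "wordR1", "posR1", "fishing", "player", "sense"]
    with open(path, "w", encoding="utf-8") as f:
        f.write("@relation bass\n")
        for i, name in enumerate(names):
            values = sorted({inst[i] for inst in instances})   # nominal value set
            f.write("@attribute %s {%s}\n" % (name, ",".join(values)))
        f.write("@data\n")
        for inst in instances:
            f.write(",".join(inst) + "\n")

write_arff("bass.arff", instances)
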
Method

any supervised algorithm:
  Decision Trees (for example, J48)
  Decision Lists (similar to Decision Trees)
  Naive Bayes (probabilistic)
  ...
and any tool:
  Weka
  R
  SVMTool
  your own implementation
  ...

Interim Summary

supervised approaches use sense-annotated datasets
need many annotated examples for every word
relevant information in the context:
  lexico-syntactic information (collocational features)
  lexical information (bag-of-words features)
information is encoded in the form of features . . .
and a classifier is trained to distinguish different senses of a given word

Evaluation

Extrinsic evaluation

long-term goal: improve performance in an end-to-end application
→ extrinsic evaluation (also called task-based, end-to-end, or in vivo evaluation)
example: Word Sense Disambiguation for (Cross-Lingual) Information Retrieval
http://ixa2.si.ehu.es/clirwsd

Intrinsic evaluation

however, extrinsic evaluation is difficult and time consuming
→ intrinsic evaluation (or in vitro evaluation)
treat the WSD component as if it were a stand-alone system
measure: sense accuracy (percentage of words correctly tagged)

  Accuracy = matches / total

method: held-out data from the same sense-tagged corpora used for training (train-test methodology)
to standardize datasets and methods: SensEval and SemEval competitions
example: our case study

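A tiny sketch of sense accuracy on held-out data (gold and predicted sense labels are assumed to be aligned lists; not from the slides):

# Hypothetical sketch: sense accuracy = matches / total on held-out data.
def sense_accuracy(gold, predicted):
    assert len(gold) == len(predicted)
    matches = sum(1 for g, p in zip(gold, predicted) if g == p)
    return matches / len(gold)

gold      = ["s7", "s4", "s7", "s4", "s7"]
predicted = ["s7", "s4", "s4", "s4", "s7"]
print(sense_accuracy(gold, predicted))   # 0.8
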
Baseline

baseline: the performance we would get without much knowledge / with a simple approach
necessary for any Machine Learning experiment (how good is 70%?)
simplest baseline: most frequent sense
  WordNet: first-sense heuristic (senses are ordered)
  very powerful baseline! → skewed distribution of senses in corpora
  BUT we need access to annotated data for every word in the dataset to estimate sense frequencies
  this is a "knowledge-laden" baseline

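A sketch of the most-frequent-sense baseline, estimated from annotated training data (illustrative only; helper name is made up):

# Hypothetical sketch: predict the most frequent training sense for every test instance.
from collections import Counter

def mfs_baseline(train_senses, test_senses):
    mfs = Counter(train_senses).most_common(1)[0][0]
    correct = sum(1 for s in test_senses if s == mfs)
    return mfs, correct / len(test_senses)

train = ["s4", "s4", "s4", "s7", "s7", "s4"]
test  = ["s4", "s7", "s4", "s4"]
print(mfs_baseline(train, test))   # ('s4', 0.75)
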
Ceiling

ceiling or upper bound for performance: inter-coder agreement
  all-words corpora using WordNet: Ao ≈ 0.75-0.8
  more coarse-grained sense distinctions: Ao ≈ 0.9
another possibility: avoid annotation using pseudowords (banana-door)
  however: unrealistic → real polysemy is not like banana-doors!
  need to find better ways to create pseudowords

Dictionary and Thesaurus Methods

Overview

sense-labeled corpora give accurate information, but they are scarce!
need other sources: dictionaries, thesauri, selectional restrictions, . . .
idea: use dictionaries as corpora (identifying related words in definitions and examples)

An example

Example (20.10): The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.

bank1
  Gloss:    a financial institution that accepts deposits and channels the money into lending activities
  Examples: "he cashed a check at the bank"; "that bank holds the mortgage on my home"

bank2
  Gloss:    sloping land (especially beside a body of water)
  Examples: "they pulled the canoe up on the bank"; "he sat on the bank of the river"

Figure: WordNet information for two senses of bank

Signatures

signature: set of words that characterizes a given sense of a target word
extracted from dictionaries, thesauri, tagged corpora, . . .
for example (20.10):
  bank1: financial, institution, accept, deposit, channel, money, lending, activity, cash, check, hold, mortgage, home
  bank2: sloping, land, body, water, pull, canoe, bank, sit, river

Lesk Algorithm

function SIMPLIFIED-LESK(word, sentence) returns best sense of word
  best-sense ← most frequent sense for word
  max-overlap ← 0
  context ← set of words in sentence
  for each sense in senses of word do
    signature ← set of words in the gloss and examples of sense
    overlap ← COMPUTE-OVERLAP(signature, context)
    if overlap > max-overlap then
      max-overlap ← overlap
      best-sense ← sense
  end
  return(best-sense)

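A runnable sketch of the simplified Lesk algorithm above, using NLTK's WordNet interface (assumes nltk is installed and the wordnet data has been downloaded; tokenization is a crude regex, and stop words are not removed, unlike most real implementations):

# Sketch of simplified Lesk with NLTK WordNet (pip install nltk; nltk.download('wordnet')).
import re
from nltk.corpus import wordnet as wn

def tokenize(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def simplified_lesk(word, sentence):
    senses = wn.synsets(word)
    if not senses:
        return None
    best_sense, max_overlap = senses[0], 0        # WordNet's first sense ≈ most frequent
    context = tokenize(sentence)
    for sense in senses:
        signature = tokenize(sense.definition())  # gloss words ...
        for example in sense.examples():          # ... plus example words
            signature |= tokenize(example)
        overlap = len(signature & context)
        if overlap > max_overlap:
            max_overlap, best_sense = overlap, sense
    return best_sense

print(simplified_lesk("bank", "she strolled by the river bank"))
# typically Synset('bank.n.01'), the river-bank sense (exact result depends on the WordNet version)
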
Lesk Algorithm

Example: she strolled by the river bank.

best-sense ← bank1; max-overlap ← 0
context ← {she, stroll, river}
sense bank1:
  signature ← {financial, institution, accept, deposit, channel, money, lending, activity, cash, check, hold, mortgage, home}
  overlap ← 0; 0 > 0 fails
sense bank2:
  signature ← {sloping, land, body, water, pull, canoe, bank, sit, river}
  overlap ← 1; 1 > 0 succeeds
  best-sense ← bank2; max-overlap ← 1
return bank2

Discussion

right intuition: words that appear in dictionary definitions and examples are relevant to a given sense
problem: data sparseness: dictionary entries are short and do not always include examples
→ the Lesk algorithm is currently used as a baseline
BUT many extensions are possible and have been tried (generalizations over lemmata, corpus data, weighting, . . . )
AND dictionary-derived features can be (and are) used in standard supervised approaches

Interim Summary

information encoded in dictionaries (definitions, examples) is useful for WSD
it can be used on its own or in addition to other information (collocations, bag of words) in supervised approaches
the Lesk algorithm disambiguates solely on the basis of dictionary information: overlap between the dictionary entry and the context of the word occurrence
the most frequent sense and the Lesk algorithm are used as baselines for evaluation

Other approaches to WSD

Overview

we have a huge number of classes (senses)
need large hand-built resources:
  supervised approaches need large annotated corpora (unrealistic)
  dictionary methods need large dictionaries, which, even if available, often do not provide enough information
alternatives:
  Minimally supervised WSD
  Unsupervised WSD
both make use of unannotated data
these approaches are not as successful as supervised approaches

Minimally supervised WSD: Bootstrapping

for a given word, for example plant:
  start with a small number of annotated examples (seeds) for each sense
  collect additional examples for each sense based on their similarity to the annotated examples
  iterate

Bootstrapping: example

plant (Yarowsky 1995)
sense A: living entity; sense B: building
first examples: those that appear with life (sense A) and manufacturing (sense B)

Figure: Bootstrapping word senses (Figure 20.4 in Jurafsky & Martin).

Yarowsky 1995

Influential insights (used as heuristics in Yarowsky's algorithm):
→ one sense per collocation
  life + plant = plantA
  manufacturing + plant = plantB
→ one sense per discourse
  if a word appears multiple times in a text, probably all occurrences will bear the same sense
  also useful to enlarge datasets

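A toy sketch of the bootstrapping idea with one-sense-per-collocation seeds (a simplification of Yarowsky's algorithm: no log-likelihood ranking and no one-sense-per-discourse step; the instance sets and the cue-harvesting rule are made up for illustration):

# Hypothetical sketch: spread sense labels from seed collocations, then harvest new cues.
from collections import Counter

def bootstrap(instances, seeds, iterations=3):
    """seeds: {'A': {'life'}, 'B': {'manufacturing'}} -- one sense per collocation."""
    labels = {}
    cues = {sense: set(words) for sense, words in seeds.items()}
    for _ in range(iterations):
        # label every instance whose context contains cues for exactly one sense
        for i, context in enumerate(instances):
            hits = [s for s, words in cues.items() if context & words]
            if len(hits) == 1:
                labels[i] = hits[0]
        # harvest new cues: words that so far occur with only one sense
        counts = {s: Counter() for s in cues}
        for i, sense in labels.items():
            counts[sense].update(instances[i])
        for sense in cues:
            others = set().union(*(counts[s] for s in cues if s != sense))
            cues[sense] |= {w for w in counts[sense] if w not in others}
    return labels

instances = [{"plant", "life", "animal"}, {"plant", "manufacturing", "car"},
             {"plant", "animal", "species"}, {"plant", "car", "assembly"}]
print(bootstrap(instances, {"A": {"life"}, "B": {"manufacturing"}}))
# {0: 'A', 1: 'B', 2: 'A', 3: 'B'} -- the labels spread outwards from the seeds
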
Unsupervised WSD

no previous knowledge
no human-defined word senses
simply group examples according to their similarity (clustering) and infer senses from that
problem: hard to interpret and evaluate

Discussion

Interim summary

WSD can be framed as a standard classification task: training data, feature definition, classifier, evaluation → supervised approaches
most useful information:
  syntactic and lexical context (collocational features)
  words related to the different senses of a given word (bag-of-words features)
  words in dictionary (thesaurus, etc.) entries
other approaches try to make use of unannotated data: bootstrapping, unsupervised learning
  would be great, but not as successful as supervised approaches (and harder to interpret and work with)

Useful empirical facts

skewed distribution of senses
  → most frequent sense baseline
  → heuristic when no other information is available
  BUT the distribution varies with the text/corpus! (cone in a geometry textbook)
one sense per collocation
  bass + player = bass7
  → simple cues for sense classification (heuristic)
one sense per discourse
  different occurrences of a word in a given text tend to be used in the same sense
  → heuristic for classification and for data gathering

Conceptual problems

the task as currently defined does not allow for generalization over different words → learning is word-specific
number of classes = number of senses; equal to or greater than the number of words!
need training data for every sense of every word
most words have low frequency (Zipf's law)
no chance with unknown words
this wouldn't be a problem if word sense alternation were like bank1-bank2 (homonymy) . . .
. . . but many alternations are systematic! (regular polysemy, metonymy, metaphor)

Regular polysemy

conversion
  bank (N): financial institution
  bank (V): put money in a bank
  same for sugar, hammer, tango, etc. (also derivation: -ize)
adjectives (Boleda 2007)
  qualitative vs. relational: cara familiar ('familiar face') vs. reunió familiar ('family meeting')
  event-related vs. qualitative: fet sabut ('known fact') vs. home sabut ('wise man')

Regular polysemy: mass/count

animal/meat
  chicken1: animal; chicken2: meat
  lamb1: animal; lamb2: meat
  ...
portions/kinds: two beers
  two servings of beer
  two types of beer
generally: thing/derived substance (grinding)
  After several lorries had run over the body, there was rabbit splattered all over the road.

Regular polysemy

verb alternations
  causative/inchoative (Levin 1993)
    John broke the window
    The window broke
  Spanish psychological verbs
    Le preocupa la situación ('The situation worries him/her'; Dative + Subject)
    Bruna no quiere preocuparla ('Bruna doesn't want to worry her'; Subject + Accusative)

Contextual coercion / Logical metonymy
(Also see course by Louise McNally.)
object to eventuality (Pustejovsky 1995)
Mary enjoyed the book.
After three martinis, Kim felt much happier.
adjectives (Pustejovsky 1995): event selection
fast runner vs. fast typist vs. fast car

Metonymy

container/content
  He drank a bottle of whisky.
  Morphology again: He drank a bottleful of whisky. (-ful suffixation)
fruit/plant
  olive, grapefruit, . . .
  Spanish: often the tree is masculine (olivo, naranjo), the fruit feminine (oliva, naranja)
figure/ground
  Kim painted the door
  Kim walked through the door

Metonymy

country names
  Location: I live in China.
  Government: The US and Libya have agreed to work together to solve . . .
  Team (sports): England won last year's World Cup.
more generally: institutions
  Barcelona applied for the Olympic Games.
  The banks won't give credit now.
  The newspapers criticized this policy.
object/person
  The cello is playing badly.
Not so regular: contextual metaphor: The ham sandwich wants his check. (Lakoff & Johnson 1980)

Metaphor

physical → mental
  depart1, arrive1, go1: physical transfer
  depart2, arrive2, go2: mental transfer
concrete → abstract
  aigua clara ('clear water') vs. estil clar ('clear style')
  cabells negres ('black hair') vs. humor negre ('black humour')

To sum up

pervasive systematicity in sense alternations: regular polysemy, metonymy, metaphor
productive:
  We found a little, hairy wampimuk sleeping behind the tree (McDonald & Ramscar 2001)
  Wampimuk soup is delicious!
inherent property of language
  analogical reasoning (psychology again)
WSD as currently handled cannot capture these regularities
theoretical and practical problem!

WSD and regularities: what one can do

generalize on FEATURES
  e.g., jazz → MUSIC-STYLE → jazz, rock, blues, . . .
  provided some lexical resource is available that encodes this information
  He is a jazz bass player.
  → I love bass solos in rock music.
problem: when (how) to generalize? when to stop?

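A sketch of such feature generalization using WordNet hypernyms via NLTK (assumes the wordnet data is installed; it naively takes the first sense and one hypernym level, which is exactly the "when/how to generalize" problem noted above):

# Hypothetical sketch: replace a context word by the name of a WordNet hypernym synset.
from nltk.corpus import wordnet as wn

def generalized_feature(word, levels=1):
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return word                          # no generalization available
    synset = synsets[0]                      # naive: take the first (most frequent) sense
    for _ in range(levels):
        hypernyms = synset.hypernyms()
        if not hypernyms:
            break
        synset = hypernyms[0]
    return synset.name()

for w in ["jazz", "rock", "blues"]:
    print(w, "->", generalized_feature(w))   # note: 'rock' may generalize to the stone sense
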
WSD and regularities: what would be desirable

train on chicken and use the data for lamb, wampimuk, . . .
Resources such as WordNet encode the meat/animal distinction:

WordNet info for chicken:
  chicken1: the flesh of a chicken used for food
  chicken2: a domesticated gallinaceous bird (hyponym)
  chicken3: a person who lacks confidence
  chicken4: a foolhardy competition

WordNet info for lamb:
  lamb1: young sheep
  lamb2: a person easily deceived or cheated
  lamb3: a sweet innocent mild-mannered person
  lamb4: the flesh of a young domestic sheep eaten as food

WHAT IS MISSING: a link between chicken2 and lamb1, and between chicken1 and lamb4 (note the other senses)

Other example classifiers

Classifier example 1: Naive Bayes

probabilistic classifier (related to HMMs)
choosing the best sense amounts to choosing the most probable sense given the feature vector: the conditional probability P(sense | features)
BUT it is impossible to estimate this directly (too many feature combinations)
2 strategies:
  decomposing the probability (Bayes' rule): argmax_s P(s | f) = argmax_s P(f | s) P(s) → easier to estimate
  making an unrealistic assumption: the features are independent given the sense (→ Naive Bayes), so P(f | s) ≈ ∏_i P(f_i | s)
training the classifier = estimating these probabilities from the sense-tagged corpus

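A compact sketch of such a Naive Bayes sense classifier over bag-of-words contexts (add-one smoothing and the class/method names are assumptions, not from the slides):

# Hypothetical sketch: Naive Bayes WSD with maximum-likelihood estimates and add-one smoothing.
import math
from collections import Counter, defaultdict

class NaiveBayesWSD:
    def train(self, examples):                       # examples: (context_words, sense)
        self.sense_counts = Counter(s for _, s in examples)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for context, sense in examples:
            self.word_counts[sense].update(context)
            self.vocab.update(context)

    def predict(self, context):
        total = sum(self.sense_counts.values())
        best, best_logp = None, float("-inf")
        for sense, count in self.sense_counts.items():
            logp = math.log(count / total)           # prior P(sense)
            denom = sum(self.word_counts[sense].values()) + len(self.vocab)
            for w in context:                        # likelihood P(word | sense), smoothed
                logp += math.log((self.word_counts[sense][w] + 1) / denom)
            if logp > best_logp:
                best, best_logp = sense, logp
        return best

clf = NaiveBayesWSD()
clf.train([({"fish", "smoked"}, "s4"), ({"striped", "fish"}, "s4"),
           ({"jazz", "player"}, "s7"), ({"play", "solo"}, "s7")])
print(clf.predict({"jazz", "solo"}))                 # s7
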
Classifier example 2: Decision Lists

similar to decision trees (difference: only one condition per rule)

Rule                      →  Sense
fish within window        →  bass4
striped bass              →  bass4
guitar within window      →  bass7
play/V bass               →  bass7

Figure: Decision List for the word bass

to learn a decision list classifier:
generate and order tests according to the training data

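A toy sketch of learning and applying a decision list (single-word tests ranked by a smoothed log-likelihood ratio, in the spirit of Yarowsky's decision lists; the smoothing constant and rule format are made up for illustration):

# Hypothetical sketch: decision-list WSD with single-word tests, strongest rule first.
import math
from collections import Counter, defaultdict

def learn_decision_list(examples):                   # examples: (context_words, sense)
    senses = {s for _, s in examples}
    counts = defaultdict(Counter)                    # counts[word][sense]
    for context, sense in examples:
        for w in context:
            counts[w][sense] += 1
    rules = []
    for w, by_sense in counts.items():
        for s in senses:
            others = sum(c for s2, c in by_sense.items() if s2 != s)
            score = math.log((by_sense[s] + 0.1) / (others + 0.1))   # smoothed log-likelihood ratio
            rules.append((score, w, s))
    return sorted(rules, reverse=True)               # order tests by strength

def classify(rules, context, default):
    for score, w, sense in rules:
        if w in context:
            return sense                             # first matching test wins
    return default

rules = learn_decision_list([({"fish", "striped"}, "bass4"),
                             ({"smoked", "fish"}, "bass4"),
                             ({"guitar", "jazz"}, "bass7"),
                             ({"play", "jazz"}, "bass7")])
print(classify(rules, {"striped", "fish"}, default="bass4"))   # bass4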