Word Meaning & Word Sense Disambiguation
CMSC 723 / LING 723 / INST 725
MARINE CARPUAT
[email protected]

Today
• Representing word meaning
• Word sense disambiguation as supervised classification
• Word sense disambiguation without annotated examples

Drunk gets nine months in violin case.
http://www.ling.upenn.edu/~beatrice/humor/headlines.html

How do we know that a word (lemma) has distinct senses?
• Linguists often design tests for this purpose
• e.g., zeugma combines distinct senses in an uncomfortable way:
  Which flight serves breakfast?
  Which flights serve Tucson?
  *Which flights serve breakfast and Tucson?

Where can we look up the meaning of words?
• Dictionary?

Word Senses
• “Word sense” = distinct meaning of a word
• Same word, different senses
  – Homonyms (homonymy): unrelated senses; identical orthographic form is coincidental
  – Polysemes (polysemy): related, but distinct senses
  – Metonyms (metonymy): “stand in”; technically a subcase of polysemy
• Different word, same sense
  – Synonyms (synonymy)
• Homophones: same pronunciation, different orthography, different meaning
  – Examples: would/wood, to/too/two
• Homographs: distinct senses, same orthographic form, different pronunciation
  – Examples: bass (fish) vs. bass (instrument)

Relationship Between Senses
• IS-A relationships
  – From specific to general (up): hypernym (hypernymy)
  – From general to specific (down): hyponym (hyponymy)
• Part-Whole relationships
  – wheel is a meronym of car (meronymy)
  – car is a holonym of wheel (holonymy)

WordNet: a lexical database for English
https://wordnet.princeton.edu/
• Includes most English nouns, verbs, adjectives, adverbs
• Electronic format makes it amenable to automatic manipulation: used in many NLP applications
• “WordNets” generically refers to similar resources in other languages

WordNet: History
• Research in artificial intelligence:
  – How do humans store and access knowledge about concepts?
  – Hypothesis: concepts are interconnected via meaningful relations
  – Useful for reasoning
• The WordNet project started in 1986
  – Can most (all?) of the words in a language be represented as a semantic network where words are interlinked by meaning?
  – If so, the result would be a large semantic network…

Synonymy in WordNet
• WordNet is organized in terms of “synsets”
  – Unordered set of (roughly) synonymous “words” (or multi-word phrases)
• Each synset expresses a distinct meaning/concept

WordNet: Example
Noun
  {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
  {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
  {pipe, tube} (a hollow cylindrical shape)
  {pipe} (a tubular wind instrument)
  {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
  {shriek, shrill, pipe up, pipe} (utter a shrill cry)
  {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
  {pipe} (play on a pipe) “pipe a tune”
  {pipe} (trim with piping) “pipe the skirt”
What do you think of the sense granularity?
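To make the synset and relation vocabulary above concrete, here is a minimal sketch using NLTK's WordNet interface (assuming nltk is installed and the wordnet corpus has been downloaded via nltk.download('wordnet')). It prints the pipe synsets from the example slide and follows the IS-A and part-whole links for car.

  # Minimal sketch: browsing WordNet with NLTK (assumes the wordnet corpus is downloaded).
  from nltk.corpus import wordnet as wn

  # Each synset is one sense: a set of (near-)synonymous lemmas plus a gloss,
  # as in the {pipe, tobacco pipe} example above.
  for synset in wn.synsets('pipe'):
      print(synset.name(), synset.pos())
      print('  lemmas:', [lemma.name() for lemma in synset.lemmas()])
      print('  gloss :', synset.definition())

  # The "Net" part: IS-A and part-whole relations between synsets.
  car = wn.synset('car.n.01')
  print('hypernyms    :', car.hypernyms())         # more general concepts (up)
  print('hyponyms     :', car.hyponyms()[:3])      # more specific concepts (down)
  print('part meronyms:', car.part_meronyms()[:3]) # parts of a car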
The “Net” Part of WordNet
[Figure: a fragment of the WordNet noun network centered on {car; auto; automobile; machine; motorcar}.
 Hypernym chain (upward): {car; auto; automobile; machine; motorcar} → {motor vehicle; automotive vehicle} → {vehicle} → {conveyance; transport}.
 Hyponyms (downward): {cruiser; squad car; patrol car; police car; prowl car}, {cab; taxi; hack; taxicab}.
 Meronyms of car: {bumper}, {car door}, {car window}, {car mirror};
 meronyms of {car door}: {door lock}, {hinge; flexible joint}, {armrest}.]

WordNet 3.0: Size
Part of speech   Word forms   Synsets
Noun             117,798      82,115
Verb             11,529       13,767
Adjective        21,479       18,156
Adverb           4,481        3,621
Total            155,287      117,659
http://wordnet.princeton.edu/

Word Sense From WordNet
Noun
  {pipe, tobacco pipe} (a tube with a small bowl at one end; used for smoking tobacco)
  {pipe, pipage, piping} (a long tube made of metal or plastic that is used to carry water or oil or gas etc.)
  {pipe, tube} (a hollow cylindrical shape)
  {pipe} (a tubular wind instrument)
  {organ pipe, pipe, pipework} (the flues and stops on a pipe organ)
Verb
  {shriek, shrill, pipe up, pipe} (utter a shrill cry)
  {pipe} (transport by pipeline) “pipe oil, water, and gas into the desert”
  {pipe} (play on a pipe) “pipe a tune”
  {pipe} (trim with piping) “pipe the skirt”

WORD SENSE DISAMBIGUATION

Word Sense Disambiguation
• Task: automatically select the correct sense of a word
  – Input: a word in context
  – Output: sense of the word
  – Can be framed as lexical sample (focus on one word type at a time) or all-words (disambiguate all content words in a document)
• Motivated by many applications:
  – Information retrieval
  – Machine translation
  – …

How big is the problem?
• Most words in English have only one sense
  – 62% in Longman’s Dictionary of Contemporary English
  – 79% in WordNet
• But the others tend to have several senses
  – Average of 3.83 in LDOCE
  – Average of 2.96 in WordNet
• Ambiguous words are more frequently used
  – In the British National Corpus, 84% of instances have more than one sense
• Some senses are more frequent than others

Ground Truth
• Which sense inventory do we use?

Existing Corpora
• Lexical sample
  – line-hard-serve corpus (4k sense-tagged examples)
  – interest corpus (2,369 sense-tagged examples)
  – …
• All-words
  – SemCor (234k words, subset of the Brown Corpus)
  – Senseval/SemEval (2,081 tagged content words from 5k total words)
  – …

Evaluation
• Intrinsic
  – Measure accuracy of sense selection with respect to ground truth
• Extrinsic
  – Integrate WSD as part of a bigger end-to-end system, e.g., machine translation or information retrieval
  – Compare end-to-end performance with and without the WSD component

Baseline Performance
• Baseline: most frequent sense
  – Equivalent to “take first sense” in WordNet
  – Does surprisingly well! 62% accuracy in this case!

Upper Bound Performance
• Upper bound
  – Fine-grained WordNet senses: 75-80% human agreement
  – Coarser-grained inventories: 90% human agreement possible

WSD as Supervised Classification
[Diagram: standard supervised classification pipeline. Training: labeled training data → feature functions → supervised machine learning algorithm → classifier. Testing: unlabeled documents → feature functions → classifier → predicted labels.]
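As a rough illustration of the most-frequent-sense baseline and of WSD framed as supervised classification, here is a minimal Python sketch. The training contexts and sense labels are hypothetical stand-ins for a real lexical-sample corpus such as line-hard-serve, and scikit-learn is assumed to be available; this is a sketch of the general setup, not a reference implementation.

  # Most-frequent-sense baseline + a toy supervised lexical-sample classifier.
  # Assumes nltk (with the wordnet corpus) and scikit-learn are installed.
  from nltk.corpus import wordnet as wn
  from sklearn.feature_extraction.text import CountVectorizer
  from sklearn.linear_model import LogisticRegression
  from sklearn.pipeline import make_pipeline

  def most_frequent_sense(word, pos=None):
      """'Take first sense' baseline: WordNet lists senses by frequency."""
      synsets = wn.synsets(word, pos=pos)
      return synsets[0] if synsets else None

  print(most_frequent_sense('pipe', pos=wn.NOUN))

  # Supervised WSD as text classification: words in the context window are the
  # features, the annotated sense is the label. The examples and sense labels
  # below are made up for illustration.
  train_contexts = ["plant the seeds in spring soil",
                    "the plant shut down production last week"]
  train_senses   = ["plant_flora", "plant_factory"]   # hypothetical sense labels

  clf = make_pipeline(CountVectorizer(), LogisticRegression())
  clf.fit(train_contexts, train_senses)
  print(clf.predict(["workers at the plant went on strike"]))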
WSD Approaches
• Depending on use of manually created knowledge sources
  – Knowledge-lean
  – Knowledge-rich
• Depending on use of labeled data
  – Supervised
  – Semi- or minimally supervised
  – Unsupervised

Simplest WSD algorithm: Lesk’s Algorithm
• Intuition: note word overlap between context and dictionary entries
  – Unsupervised, but knowledge-rich
  The bank can guarantee deposits will eventually cover future tuition costs because it invests in adjustable-rate mortgage securities.
  (compare this context with the WordNet glosses for “bank”)

Lesk’s Algorithm
• Simplest implementation:
  – Count overlapping content words between glosses and context (see the sketch at the end of this section)
• Lots of variants:
  – Include the examples in dictionary definitions
  – Include hypernyms and hyponyms
  – Give more weight to larger overlaps (e.g., bigrams)
  – Give extra weight to infrequent words
  – …

WSD Accuracy
• Generally
  – Supervised approaches yield ~70-80% accuracy
• However
  – Depends on the actual word, sense inventory, amount of training data, number of features, etc.

WORD SENSE DISAMBIGUATION: MINIMIZING SUPERVISION

Minimally Supervised WSD
• Problem: annotations are expensive!
• Solution 1: “Bootstrapping” or co-training (Yarowsky 1995)
  – Start with a (small) seed, learn a classifier
  – Use the classifier to label the rest of the corpus
  – Retain “confident” labels, add them to the training set
  – Learn a new classifier
  – Repeat…
• Heuristics (derived from observation):
  – One sense per discourse
  – One sense per collocation

One Sense per Discourse
A word tends to preserve its meaning across all its occurrences in a given discourse.
• Gale et al. 1992
  – 8 words with two-way ambiguity, e.g., plant, crane, etc.
  – 98% of the two-word occurrences in the same discourse carry the same meaning
• Krovetz 1998
  – Heuristic holds mostly for coarse-grained senses and for homonymy rather than polysemy
  – Performance of “one sense per discourse” measured on SemCor is approximately 70%

One Sense per Collocation
A word tends to preserve its meaning when used in the same collocation.
  – Strong for adjacent collocations
  – Weaker as the distance between words increases
• Evaluation:
  – 97% precision on words with two-way ambiguity
  – Again, accuracy depends on granularity:
    • 70% precision on SemCor words

Yarowsky’s Method: Stopping
• Stop when:
  – Error on training data is less than a threshold
  – No more training data is covered

Yarowsky’s Method: Discussion
• Advantages
  – Accuracy is about as good as a supervised algorithm
  – Bootstrapping: far less manual effort
• Disadvantages
  – Seeds may be tricky to construct
  – Works only for coarse-grained sense distinctions
  – Snowballing error with co-training

2nd solution: WSD with language as supervision
• Problems: annotations are expensive!
• What’s the “proper” sense inventory?
  – How fine- or coarse-grained?
  – Application specific?
• Observation: multiple senses translate to different words in other languages
  – Use the foreign language as the sense inventory
  – Added bonus: annotations for free! (using machine translation data)

Today
• Representing word senses & word relations
  – WordNet
• Word Sense Disambiguation
  – Lesk Algorithm
  – Supervised classification
  – Minimizing supervision
    • Co-training: combine 2 views of data
    • Use parallel text
• Next: general techniques for learning without annotated training examples
  – If needed, review probability, expectations
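The following is a minimal sketch of the simplified Lesk algorithm referenced above: pick the sense whose WordNet gloss (plus example sentences) shares the most words with the context. The toy stopword list and the helper name simplified_lesk are assumptions for illustration, not part of the original formulation.

  # Simplified Lesk sketch: choose the sense with the largest gloss/context overlap.
  # Assumes nltk and the wordnet corpus are installed.
  from nltk.corpus import wordnet as wn

  STOPWORDS = {'the', 'a', 'an', 'in', 'of', 'it', 'to', 'and', 'will', 'be', 'because'}  # toy list

  def simplified_lesk(word, context):
      context_words = {w.lower().strip('.,') for w in context.split()} - STOPWORDS
      best_sense, best_overlap = None, -1
      for synset in wn.synsets(word):
          # Signature = words in the gloss plus words in the example sentences
          # (one of the "include the examples" variants listed above).
          signature = set(synset.definition().lower().split())
          for example in synset.examples():
              signature |= set(example.lower().split())
          overlap = len(signature & context_words)
          if overlap > best_overlap:
              best_sense, best_overlap = synset, overlap
      return best_sense

  sentence = ("The bank can guarantee deposits will eventually cover future "
              "tuition costs because it invests in adjustable-rate mortgage securities.")
  print(simplified_lesk('bank', sentence))  # expected: the financial-institution sense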