Semi-Supervised Natural Language Learning Reading Group
• I set up a site at: http://www.cs.cmu.edu/~acarlson/semisupervised/
• Cover other applications of semi-supervised learning?
• Volunteers?
• Every week or bi-weekly?
• Time change? 1pm? Noon?

Unsupervised Word Sense Disambiguation Rivaling Supervised Methods
Author: David Yarowsky (1995)
Presented by: Andy Carlson

Word Sense Disambiguation
• Determining which sense of a word is meant in a given sentence
• “Toyota is considering opening a plant in Detroit.”
• “The banana plant is grown all over the tropics for its fruit.”
• Different from sense induction – we assume we already know the distinct senses

Using unlabeled data
• Two properties of language let us use unlabeled data:
• One sense per collocation – nearby words provide strong and consistent clues
• One sense per discourse – within a document, the sense of a word is highly consistent
• We can base an iterative bootstrapping algorithm on these two properties

One sense per discourse
• How accurate?
• How frequently does it apply?
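The one-sense-per-discourse property suggests a simple post-processing step: relabel every occurrence of the target word in a document with the document's majority sense. A minimal sketch of that idea — the function name, tie handling, and treatment of unlabeled tokens are my own simplifications, not the paper's exact procedure:

```python
from collections import Counter

def apply_one_sense_per_discourse(doc_labels):
    """Relabel all occurrences of the target word in one document with the
    document's majority sense. `doc_labels` is one sense label (or None for
    unlabeled) per occurrence. Ties and all-unlabeled documents are left
    unchanged -- my own choices, not specified by the paper."""
    votes = Counter(s for s in doc_labels if s is not None)
    if not votes:
        return doc_labels
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return doc_labels  # tie: leave the document unchanged
    return [ranked[0][0]] * len(doc_labels)
```

Relabeling even the `None` (unlabeled) occurrences is what lets this property extend labels to new examples during bootstrapping.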
Decision Lists
• A list of rules of the form “collocation => sense”
• Example: life (within 2–10 words) => biological sense of plant
• Rules are ordered by log-likelihood ratio

The algorithm – step 1
• Find all occurrences of the given polysemous word
• We follow examples for the word plant

Step 2 – Initial labeling
• For each sense of the word, identify a small number of training examples
• Strategies: words from dictionary definitions, human labeling of the most frequent collocates, or human-chosen collocates
• Example: the words life and manufacturing are used as seed collocations

Sample initial state
[Figure: examples labeled as ‘living’ plant, examples labeled as ‘factory’ plant, and unlabeled examples]

Step 3a
• Train the decision list based on the current labeling of the examples

Step 3b
• Apply the learned classifier to all examples

Step 3c
• Optionally, apply the one-sense-per-discourse constraint
[Figures: state after steps 3b and 3c]

Step 3d
• Repeat step 3 iteratively
• Details – grow the window size for collocations, and randomly perturb the class-inclusion threshold

Step 4
• Stop. The algorithm converges on a stable residual set.

Sample final state

Final decision list

Results
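The steps above can be sketched as a small bootstrapping loop. This is a simplified illustration, not the paper's exact procedure: the two senses are hardcoded as 'living'/'factory', collocates are single context words, and the smoothing constant, confidence threshold, and round count are my own choices. The growing collocation window, threshold perturbation, and the one-sense-per-discourse step are omitted.

```python
import math
from collections import Counter

SENSES = ("living", "factory")  # hardcoded for the running 'plant' example

def train_decision_list(labeled, alpha=0.1):
    """Build (score, collocate, sense) rules ordered by log-likelihood ratio.
    `labeled` is a list of (sense, set-of-context-words) pairs; `alpha` is a
    smoothing constant (an assumption, not from the paper)."""
    per_sense = {s: Counter() for s in SENSES}
    for sense, context in labeled:
        per_sense[sense].update(context)
    rules = []
    for w in set(per_sense["living"]) | set(per_sense["factory"]):
        score = math.log((per_sense["living"][w] + alpha) /
                         (per_sense["factory"][w] + alpha))
        sense = "living" if score > 0 else "factory"
        rules.append((abs(score), w, sense))
    rules.sort(reverse=True)  # strongest evidence first
    return rules

def classify(rules, context, threshold=1.0):
    """Apply the highest-ranked matching rule; abstain below the threshold."""
    for score, w, sense in rules:
        if w in context and score >= threshold:
            return sense
    return None

def bootstrap(seeds, unlabeled, rounds=5):
    """Steps 2-4: start from seed-labeled examples, then repeatedly retrain
    the decision list and relabel whatever it now classifies confidently."""
    labeled = list(seeds)
    for _ in range(rounds):
        rules = train_decision_list(labeled)
        labeled = list(seeds)  # seeds keep their labels every round
        for context in unlabeled:
            sense = classify(rules, context)
            if sense is not None:
                labeled.append((sense, context))
    return train_decision_list(labeled)
```

Starting from only the seed collocates life and manufacturing, the loop picks up new collocates (e.g. words that co-occur with life) each round, which in turn label further examples; whatever the final list still cannot classify is the residual set.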