
CSA2050: Introduction to Computational Linguistics
Part of Speech (POS) Tagging II
Transformation-Based Tagging
Brill (1995)

3 Approaches to Tagging
1. Rule-Based Tagger: ENGTWOL Tagger (Voutilainen 1995)
2. Stochastic Tagger: HMM-based Tagger
3. Transformation-Based Tagger: Brill Tagger (Brill 1995)
Transformation-Based Tagging
• A combination of rule-based and stochastic tagging methodologies:
  • like rule-based tagging, because rules are used to specify tags in a certain environment;
  • like stochastic tagging, because machine learning is used;
  • uses Transformation-Based Learning (TBL).
• Input:
  • tagged corpus
  • dictionary (with most frequent tags)
Transformation-Based Tagging
Basic Process:
• Set the most probable tag for each word as a start value, e.g. tag all occurrences of “race” with NN:
  P(NN|race) = .98
  P(VB|race) = .02
• The set of possible transformations is limited
  • by using a fixed number of rule templates, containing slots, and
  • by allowing a fixed number of fillers to fill the slots.
Transformation-Based Error-Driven Learning
[Diagram, after Brill (1996): unannotated text is fed to the initial-state annotator, producing annotated text; the learner compares this annotated text with the TRUTH (a manually tagged reference), derives transformation rules, and the text is retagged with those rules before the next iteration.]
TBL Requirements
• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy
Initial State Annotation
• Input
  • Corpus
  • Dictionary
  • Frequency counts for each entry
• Output
  • Corpus tagged with most frequent tags
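A minimal sketch of this step (illustrative only: the function names and toy data below are mine, not Brill's): tag every token with the tag it receives most often in the dictionary's frequency counts, falling back to a default tag for unknown words.

from collections import Counter, defaultdict

def build_dictionary(tagged_corpus):
    """Count how often each word receives each tag in the tagged training corpus."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word.lower()][tag] += 1
    return counts

def initial_state_annotate(tokens, counts, default_tag="NN"):
    """Tag every token with its most frequent tag; unknown words get the default."""
    tagged = []
    for token in tokens:
        tag_counts = counts.get(token.lower())
        tag = tag_counts.most_common(1)[0][0] if tag_counts else default_tag
        tagged.append((token, tag))
    return tagged

# Toy data in which "race" is usually NN, so every "race" starts out tagged NN.
training = [("the", "DT"), ("race", "NN"), ("race", "NN"), ("to", "TO"), ("race", "VB")]
counts = build_dictionary(training)
print(initial_state_annotate(["expected", "to", "race"], counts))
# [('expected', 'NN'), ('to', 'TO'), ('race', 'NN')]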
Transformations
Each transformation comprises
• a source tag
• a target tag
• a triggering environment
Example:
• source tag: NN
• target tag: VB
• triggering environment: previous tag is TO
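One way such a rule could be represented and applied, as a sketch (the class name and representation are mine, not Brill's implementation):

from dataclasses import dataclass

@dataclass
class Transformation:
    source: str     # tag to change
    target: str     # tag to change it to
    prev_tag: str   # triggering environment: tag of the preceding word

    def apply(self, tagged):
        """Return a new tagging with the rule applied at every triggering position."""
        out = list(tagged)
        for i in range(1, len(tagged)):
            if tagged[i][1] == self.source and tagged[i - 1][1] == self.prev_tag:
                out[i] = (tagged[i][0], self.target)
        return out

rule = Transformation(source="NN", target="VB", prev_tag="TO")
print(rule.apply([("expected", "VBN"), ("to", "TO"), ("race", "NN")]))
# [('expected', 'VBN'), ('to', 'TO'), ('race', 'VB')]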
More Examples
Source tag   Target tag   Triggering environment
NN           VB           previous tag is TO
VBP          VB           one of the three previous tags is MD
JJR          RBR          next tag is JJ
VBP          VB           one of the two previous words is n’t
TBL Requirements
• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy
Rule Templates
- triggering environments
[Schema table: nine rule-template schemas, numbered 1-9; each row marks with * which of the surrounding tag positions ti-3, ti-2, ti-1, ti+1, ti+2, ti+3 (relative to the current tag ti) form the triggering environment.]
Set of Possible Transformations
The set of possible transformations is enumerated by allowing
• every possible tag or word
• in every possible slot
• in every possible schema.
This set can get quite large.
Rule Types and Instances
Brill’s Templates
• Each rule begins with “change tag a to tag b”
• The variables a,b,z,w range over POS tags
• All possible variable substitutions are considered
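As a sketch of what "all possible variable substitutions" means for a single template (a hypothetical illustration with a tiny tagset; real tagsets make the space much larger):

from itertools import product

# A small tagset for illustration only; Brill used the full Penn Treebank tagset.
TAGS = ["NN", "VB", "VBP", "TO", "MD", "JJ", "JJR", "RBR"]

def instances_of_prev_tag_template(tags):
    """Enumerate every instance of the template
    'change tag a to tag b when the previous tag is z'."""
    for a, b, z in product(tags, tags, tags):
        if a != b:
            yield (a, b, z)

rules = list(instances_of_prev_tag_template(TAGS))
print(len(rules))   # 8 * 7 * 8 = 448 instances from one template and eight tags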
TBL Requirements
• Initial State Annotator
• List of allowable transformations
• Scoring function
• Search strategy
Scoring Function
For a given tagging state of the corpus,
for a given transformation,
for every word position in the corpus:
• if the rule applies and yields a correct tag, increment the score by 1;
• if the rule applies and yields an incorrect tag, decrement the score by 1.
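A sketch of this scoring step against a gold-standard tagging (the function name and rule representation are mine; only the "previous tag" trigger is handled):

def score_rule(current, truth, source, target, prev_tag):
    """Score 'change source to target when the previous tag is prev_tag'
    against a gold-standard (TRUTH) tagging of the same tokens."""
    score = 0
    for i in range(1, len(current)):
        if current[i][1] == source and current[i - 1][1] == prev_tag:  # rule applies here
            if target == truth[i][1]:
                score += 1   # yields the correct tag
            else:
                score -= 1   # yields an incorrect tag
    return score

current = [("expected", "VBN"), ("to", "TO"), ("race", "NN")]
truth   = [("expected", "VBN"), ("to", "TO"), ("race", "VB")]
print(score_rule(current, truth, "NN", "VB", "TO"))   # 1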
The Basic Algorithm
• Label every word with its most likely tag.
• Repeat the following until a stopping condition is reached:
  • examine every possible transformation, selecting the one that results in the most improved tagging;
  • retag the data according to this rule;
  • append this rule to the output list.
• Return the output list.
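A compact sketch of the whole loop under simplifying assumptions (only the "previous tag is z" template, the scoring rule from the previous slide, and a rule-count cap as the stopping condition; all names are illustrative):

from itertools import product

def apply_rule(tagged, source, target, prev_tag):
    out = list(tagged)
    for i in range(1, len(tagged)):
        if tagged[i][1] == source and tagged[i - 1][1] == prev_tag:
            out[i] = (tagged[i][0], target)
    return out

def score_rule(tagged, truth, source, target, prev_tag):
    score = 0
    for i in range(1, len(tagged)):
        if tagged[i][1] == source and tagged[i - 1][1] == prev_tag:
            score += 1 if target == truth[i][1] else -1
    return score

def tbl_learn(initial, truth, tags, max_rules=10):
    """Greedy TBL: repeatedly pick the highest-scoring rule, retag, record the rule."""
    current, rules = list(initial), []
    for _ in range(max_rules):
        best, best_score = None, 0
        for a, b, z in product(tags, tags, tags):
            if a == b:
                continue
            s = score_rule(current, truth, a, b, z)
            if s > best_score:
                best, best_score = (a, b, z), s
        if best is None:        # stopping condition: no rule improves the tagging
            break
        current = apply_rule(current, *best)
        rules.append(best)
    return rules

initial = [("expected", "VBN"), ("to", "TO"), ("race", "NN")]
truth   = [("expected", "VBN"), ("to", "TO"), ("race", "VB")]
print(tbl_learn(initial, truth, ["NN", "VB", "TO", "VBN"]))
# [('NN', 'VB', 'TO')]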
Examples of learned rules
TBL: Remarks
• Execution speed: the TBL tagger is slower than the HMM approach.
• Learning speed is slow: Brill’s implementation took over a day on 600k tokens.
BUT …
• it learns a small number of simple, non-stochastic rules;
• it can be made to work faster with finite-state transducers.
Tagging Unknown Words
• New words are added to (newspaper) language at a rate of 20+ per month, plus many proper names …
• Unknown words increase error rates by 1-2%.
Methods:
• Assume the unknowns are nouns.
• Assume the unknowns have a probability distribution similar to words occurring once in the training set.
• Use morphological information, e.g. words ending with –ed tend to be tagged VBN.
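A sketch of the morphological heuristic (the suffix list below is a small illustrative guess, not Brill's actual rules):

# Guess a tag for an unknown word from simple surface cues.
SUFFIX_GUESSES = [
    ("ed", "VBN"),    # "defenestrated" -> past participle
    ("ing", "VBG"),   # "running"       -> gerund / present participle
    ("ly", "RB"),     # "quickly"       -> adverb
    ("s", "NNS"),     # "tables"        -> plural noun
]

def guess_unknown_tag(word, default="NN"):
    if word[0].isupper():
        return "NNP"                      # capitalised unknowns are often proper nouns
    for suffix, tag in SUFFIX_GUESSES:
        if word.lower().endswith(suffix):
            return tag
    return default                        # fall back to the "assume nouns" strategy

print(guess_unknown_tag("defenestrated"))   # VBN
print(guess_unknown_tag("Gozo"))            # NNP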
Evaluation
• The result is compared with a manually coded “Gold Standard”.
• Typically accuracy reaches 95-97%.
• This may be compared with the result for a baseline tagger (one that uses no context).
• Important: 100% accuracy is impossible even for human annotators.
A word of caution
• 95% accuracy: every 20th token wrong
• 96% accuracy: every 25th token wrong
  • an improvement of 25% from 95% to 96% ???
• 97% accuracy: every 33rd token wrong
• 98% accuracy: every 50th token wrong
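The arithmetic behind these figures (my own working, not on the original slide): the mean gap between errors is the reciprocal of the error rate, so moving from 95% to 96% accuracy stretches that gap from 20 to 25 tokens, a "25% improvement" by that measure, even though accuracy rose by only one point.

gap between errors ≈ 1 / (1 − accuracy):  1/0.05 = 20,  1/0.04 = 25,  1/0.03 ≈ 33,  1/0.02 = 50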
How much training data is needed?
When working with the STTS (50 tags) we observed
• a strong increase in accuracy when testing on 10,000, 20,000, …, 50,000 tokens,
• a slight increase in accuracy when testing on up to 100,000 tokens,
• hardly any increase thereafter.
Summary
• Tagging decisions are conditioned on a wider range of events than in the HMM models mentioned earlier; for example, left and right context can be used simultaneously.
• Learning and tagging are simple, intuitive and understandable.
• Transformation-based learning has also been applied to sentence parsing.
The Three Approaches Compared
• Rule-Based
  • hand-crafted rules
  • it takes too long to come up with good rules
  • portability problems
• Stochastic
  • find the sequence with the highest probability – the Viterbi algorithm (see the sketch after this list)
  • result of training not accessible to humans
  • large volume of intermediate results
• Transformation-Based
  • rules are learned
  • small number of rules
  • rules can be inspected and modified by humans
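As a reminder of what the stochastic approach computes, a minimal Viterbi sketch for a bigram HMM tagger (toy, unnormalised probabilities and illustrative names; this is not part of the Brill material):

import math

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Most probable tag sequence for `words` under a bigram HMM (log-space)."""
    V = [{t: (math.log(start_p[t]) + math.log(emit_p[t].get(words[0], 1e-12)), [t])
          for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            best_prev = max(tags, key=lambda p: V[-1][p][0] + math.log(trans_p[p][t]))
            score = (V[-1][best_prev][0] + math.log(trans_p[best_prev][t])
                     + math.log(emit_p[t].get(w, 1e-12)))
            row[t] = (score, V[-1][best_prev][1] + [t])
        V.append(row)
    return max(V[-1].values(), key=lambda x: x[0])[1]

tags = ["TO", "NN", "VB"]
start_p = {"TO": 0.4, "NN": 0.4, "VB": 0.2}
trans_p = {"TO": {"TO": 0.01, "NN": 0.0005, "VB": 0.83},
           "NN": {"TO": 0.3, "NN": 0.4, "VB": 0.3},
           "VB": {"TO": 0.3, "NN": 0.5, "VB": 0.2}}
emit_p  = {"TO": {"to": 1.0}, "NN": {"race": 0.98}, "VB": {"race": 0.02}}
print(viterbi(["to", "race"], tags, start_p, trans_p, emit_p))   # ['TO', 'VB']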