A memory-based learning-plus-inference approach
to morphological analysis
Antal van den Bosch
With Walter Daelemans, Ton Weijters, Erwin Marsi,
Abdelhadi Soudi, and Sander Canisius
ILK / Language and Information Sciences Dept.
Tilburg University, The Netherlands
FLaVoR Workshop, 17 November 2006, Leuven
Learning plus inference
• Paradigmatic solution to natural
language processing tasks
• Decomposition:
– The disambiguation of local, elemental
ambiguities in context
– A holistic, global coordination of local
decisions over the entire sequence
Learning plus inference
• Example: grapheme-phoneme
conversion
• Local decisions
– The mapping of a vowel letter in context to
a vowel phoneme with primary stress
• Global coordination
– Making sure that there is only one primary
stress
Learning plus inference
• Example: dependency parsing
• Local decisions
– The relation between a noun and a verb is
of the “subject” type
• Global coordination
– The verb has only one subject relation
Learning plus inference
• Example: named entity recognition
• Local decisions
– A name that can be either a location or a person is a location in this context
• Global coordination
– Everywhere in the text, this name refers to the location
Learning plus inference
• Local decision making by learning
– All NLP decisions can be recast as classification tasks
• (Daelemans, 1996: segmentation or identification)
• Global coordination by inference
– Given local proposals that may conflict, find the best
overall solution
• (e.g. minimizing conflict, or adhering to language
model)
• Collins and colleagues; Klein and Manning and
colleagues; Dan Roth and colleagues; Màrquez
and Carreras; etc.
L+I and morphology
• Segmentation boundaries, spelling
changes, and PoS tagging recast as
classification
• Global inference checks for
– Noun stem followed by noun inflection
– An infix in a noun-noun compound is
surrounded by two nouns
– Etc.
Talk overview
• English morphological segmentation
– Easy learning
– Inference not really needed
• Dutch morphological analysis
– Learning operations rather than simple decisions
– Reasonably complex inference
• Arabic morphological analysis
– Learning as an attempt at lowering the massive
ambiguity
– Inference as an attempt to separate the grain from
the chaff
English segmentation
• (Van den Bosch, Daelemans, Weijters, NeMLaP 1996)
• Morphological segmentation as classification
• Versus traditional approach:
– E.g. MITalk’s DECOMP, analysing scarcity:
• First analysis: scar|city - both stems found in morpheme
lexicon, and validated as a possible analysis
• Second analysis: scarc|ity - stem scarce found due to
application of e-deletion rule; suffix -ity found; validated as a
possible analysis
• Cost-based heuristic prefers stem|derivation over stem|stem
• Ingredients: morpheme lexicons, a finite-state analysis
validator, spelling-change rules, cost heuristics
– The validator, rules, and cost heuristics are costly knowledge-based resources
English segmentation
• Segmentations as local decisions
– To segment or not to segment
– If segmenting, identify the start (or end) of
• Stem
• Affixes
• Inflectional morpheme
English segmentation
• Three tasks: given a letter in context, is
it the start of
– (M1) a segment, or not
– (M2) a derivational morpheme (stem or affix),
an inflection, or neither
– (M3) a stem, a stress-affecting affix, a stress-neutral affix, an inflection, or none of these
English segmentation
Local classification
• Memory-based learning
– k-nearest neighbor classification
– (Daelemans & Van den Bosch, 2005)
• E.g. instance #9
– m a l i t i e → ?
• Nearest neighbors: a lot of evidence for “2”:
Instance         class  distance  # clones
m a l i t i e      2        0        2x
t a l i t i e      2        1        3x
u a l i t i e      2        1        2x
i a l i t i e      2        1        11x
g a l i t i e      2        1        2x
n a l i t i e      2        1        7x
r a l i t i e      2        1        5x
c a l i t i e      2        1        7x
p a l i t i e      2        1        2x
h a l i t i c      s        2        1x
…
Memory-based learning
Similarity (distance) function:

  \Delta(X, Y) = \sum_{i=1}^{n} w_i \, \delta(x_i, y_i)

• X and Y are instances
• n is the number of features
• x_i is the value of the i-th feature of X
• w_i is the weight of the i-th feature
• \delta(x_i, y_i) is the per-feature distance (e.g. overlap: 0 if the values match, 1 otherwise)
Similarity function components
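To make these components concrete, here is a minimal sketch of weighted-overlap k-NN classification, assuming simple overlap (0/1 mismatch) as the per-feature distance and taking the feature weights as given (in practice they would be computed from training data, e.g. by gain ratio; the helper names and weight values are hypothetical):

from collections import Counter

def distance(x, y, weights):
    # weighted overlap: sum the weights of the features on which
    # the two instances disagree
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def knn_classify(query, memory, weights, k=3):
    # memory keeps all training examples verbatim ("lazy" learning);
    # rank by distance and let the k nearest neighbors vote
    ranked = sorted(memory, key=lambda ex: distance(query, ex[0], weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# toy memory with 3-1-3 windows from the slide above;
# the weights are hypothetical stand-ins
memory = [(tuple("malitie"), "2"), (tuple("talitie"), "2"),
          (tuple("halitic"), "s")]
weights = [1, 1, 2, 4, 2, 1, 1]
print(knn_classify(tuple("malitie"), memory, weights))  # -> 2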
Generalizing lexicon
• A memory-based morphological analyzer is
– A lexicon: 100% accurate reconstruction of all
examples in training material
– At the same time, capable of processing unseen
words
• In essence, unseen words are the only
problem remaining
– CELEX Dutch has 300k+ words; average coverage of
text is 90%-95%
– Evaluation should focus solely on unseen words
– So, a held-out test set from CELEX is fairly
representative of unseen words
Experiments
• CELEX English
– 65,558 segmented words
– 573,544 instances
• 10-fold cross-validation
• Measuring accuracy:
– M1: 88.0% correct test words
– M2: 85.6% correct test words
– M3: 82.4% correct test words
Add inference
• (Van den Bosch and Canisius, SIGPHON
2006)
• Original approach: only learning
• Now: inference
– Constraint satisfaction inference
– Based on the trigram-prediction method of
Van den Bosch and Daelemans (CoNLL 2005)
Constraint satisfaction inference
• Predict trigrams of labels, and use them as
completely as possible
• Formulate the inference procedure as a
constraint satisfaction problem
• Constraint satisfaction
– Assigning values to a number of variables while
satisfying certain predefined constraints
• Constraint satisfaction for inference
– Each token maps to a variable, the domain of which
corresponds to the three candidate labels
– Constraints are derived from the predicted trigrams
Constraint satisfaction inference
• Input: h a n d — desired output: h { n t
• Predicted trigrams, one per letter:
– (1) h → _,h,{
– (2) a → h,{,n
– (3) n → {,n,t
– (4) d → n,d,_
• Derived trigram constraints:
– h,a,n → h,{,n
– a,n,d → {,n,t
• Derived bigram constraints:
– h,a → h,{
– a,n → {,n
– n,d → n,t and n,d → n,d (conflicting constraints)
• Derived unigram constraints:
– h → h
– a → {
– n → n
– d → t and d → d (conflicting constraints)
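The constraint derivation above can be sketched in a few lines of Python. This is a simplified reading of the method: every contiguous piece of each predicted trigram becomes a constraint on the corresponding positions, and "_" parts at word boundaries are dropped (the helper name is hypothetical):

def constraints_from_trigrams(trigram_preds):
    cons = []  # (positions, labels) pairs
    for i, (left, mid, right) in enumerate(trigram_preds):
        # a trigram predicted at position i constrains i-1, i, i+1
        span = [(i - 1, left), (i, mid), (i + 1, right)]
        span = [(p, lab) for p, lab in span if lab != "_"]
        # every contiguous sub-span of length 1..3 becomes a constraint
        for a in range(len(span)):
            for b in range(a, len(span)):
                sub = span[a:b + 1]
                cons.append((tuple(p for p, _ in sub),
                             tuple(lab for _, lab in sub)))
    return cons

# predicted trigrams for "hand"; position 3 wrongly predicts d -> d
preds = [("_", "h", "{"), ("h", "{", "n"), ("{", "n", "t"), ("n", "d", "_")]
for positions, labels in constraints_from_trigrams(preds):
    print(positions, "->", labels)
# among the output: (2, 3) -> ('n', 't') and (2, 3) -> ('n', 'd'),
# plus (3,) -> ('t',) and (3,) -> ('d',): the conflicting constraints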
Weighted constraint satisfaction
• Extension of constraint satisfaction to
deal with overconstrainedness
– Each constraint has a weight associated with it
– The optimal solution assigns those values to the
variables that maximise the summed weight of
the constraints that are satisfied
• For constraint satisfaction inference, a
constraint's weight should reflect the
classifier's confidence in its correctness
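A brute-force sketch of this weighted optimisation, workable at word length; the constraint weights below are hypothetical stand-ins for classifier confidence:

from itertools import product

def wcsp_solve(domains, weighted_constraints):
    # domains: one list of candidate labels per position
    # weighted_constraints: ((positions), (labels), weight) triples
    best, best_score = None, float("-inf")
    for assignment in product(*domains):
        # score = summed weight of the constraints this assignment satisfies
        score = sum(w for pos, labs, w in weighted_constraints
                    if tuple(assignment[p] for p in pos) == labs)
        if score > best_score:
            best, best_score = assignment, score
    return best

domains = [["h"], ["{"], ["n"], ["t", "d"]]   # from the "hand" example
constraints = [((2, 3), ("n", "t"), 2.0),      # hypothetical weights
               ((2, 3), ("n", "d"), 1.0),
               ((3,), ("t",), 2.0),
               ((3,), ("d",), 1.0)]
print(wcsp_solve(domains, constraints))        # -> ('h', '{', 'n', 't')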
Example instances
(word: abnormalities, in a 5-1-5 letter window; uni = unigram class, tri = trigram class)

Left        focus  right        uni  tri
_ _ _ _ _     a    b n o r m     2   -20
_ _ _ _ a     b    n o r m a     0   20s
_ _ _ a b     n    o r m a l     s   0s0
_ _ a b n     o    r m a l i     0   s00
_ a b n o     r    m a l i t     0   000
a b n o r     m    a l i t i     0   000
b n o r m     a    l i t i e     0   000
n o r m a     l    i t i e s     0   001
o r m a l     i    t i e s _     1   010
r m a l i     t    i e s _ _     0   100
m a l i t     i    e s _ _ _     0   000
a l i t i     e    s _ _ _ _     0   00i
l i t i e     s    _ _ _ _ _     0   0i-
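The instances above follow from a simple sliding window; a minimal sketch (the helper name is hypothetical, and the label sequence is copied from the uni column):

def make_instances(word, labels, left=5, right=5):
    # pad with "_" so every letter gets a full left-focus-right window
    padded = "_" * left + word + "_" * right
    instances = []
    for i, label in enumerate(labels):
        window = padded[i:i + left + 1 + right]
        instances.append((tuple(window), label))
    return instances

word = "abnormalities"
labels = ["2", "0", "s", "0", "0", "0", "0", "0", "1", "0", "0", "0", "i"]
for features, label in make_instances(word, labels):
    print(" ".join(features), "->", label)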
Results
• Only learning:
– M3: 82.4% correct unseen words
• Learning + CSI:
– M3: 85.4% correct unseen words
• Mild effect.
Dutch morphological analysis
• (Van den Bosch & Daelemans, 1999;
Van den Bosch & Canisius, 2006)
• Task expanded to
– Spelling changes
– Part-of-speech tagging
– Analysis generation
• Dutch is mildly productive
– Compounding
– A bit more inflection than in English
– Infixes, diminutives, …
Dutch morphological analysis
(word: abnormaliteiten, in a 5-1-5 letter window; trigram classes written with commas for readability)

Left        focus  right        uni     tri
_ _ _ _ _     a    b n o r m     A       -,A,0
_ _ _ _ a     b    n o r m a     0       A,0,0
_ _ _ a b     n    o r m a l     0       0,0,0
_ _ a b n     o    r m a l i     0       0,0,0
_ a b n o     r    m a l i t     0       0,0,0
a b n o r     m    a l i t e     0       0,0,0
b n o r m     a    l i t e i     0       0,0,+Da
n o r m a     l    i t e i t     +Da     0,+Da,A_->N
o r m a l     i    t e i t e     A_->N   +Da,A_->N,0
r m a l i     t    e i t e n     0       A_->N,0,0
m a l i t     e    i t e n _     0       0,0,0
a l i t e     i    t e n _ _     0       0,0,0
l i t e i     t    e n _ _ _     0       0,0,plural
i t e i t     e    n _ _ _ _     plural  0,plural,0
t e i t e     n    _ _ _ _ _     0       plural,0,-
Spelling changes
• Deletion, insertion, replacement
Left        focus  right        class
b n o r m     a    l i t e i     0
n o r m a     l    i t e i t     +Da
o r m a l     i    t e i t e     A_->N
• abnormaliteiten is analyzed as
[[abnormaal]A iteit]N [en]plural
• The root form has a double a; the word form
drops one a
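A rough sketch of how such per-letter classes could be decoded into an analysis. The label semantics are guessed from the slide ("+Da" read as "re-insert a deleted a"; tag-bearing labels read as morpheme starts), so this mirrors the idea rather than the paper's exact encoding:

def decode(word, labels):
    morphemes = []  # [letters, tag] pairs, built left to right
    for letter, label in zip(word, labels):
        if label.startswith("+D"):
            # spelling change: the root form had an extra letter here
            morphemes[-1][0] += label[2:] + letter
        elif label != "0":
            # a tag-bearing label starts a new morpheme
            morphemes.append([letter, label])
        else:
            morphemes[-1][0] += letter
    return [tuple(m) for m in morphemes]

print(decode("abnormaliteiten",
             ["A", "0", "0", "0", "0", "0", "0", "+Da", "A_->N",
              "0", "0", "0", "0", "plural", "0"]))
# -> [('abnormaal', 'A'), ('iteit', 'A_->N'), ('en', 'plural')]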
Part-of-speech
• Selection processes in derivation
Left        focus  right        class
n o r m a     l    i t e i t     +Da
o r m a l     i    t e i t e     A_->N
r m a l i     t    e i t e n     0
• Stem abnormaal is an adjective;
• Affix -iteit seeks an adjective to its left
to turn it into a noun
Experiments
• CELEX Dutch:
– 336,698 words
– 3,209,090 instances
• 10-fold cross validation
• Learning only: 41.3% correct unseen
words
• With CSI: 51.9% correct unseen words
• Useful improvement
Arabic analysis
(Marsi, Van den Bosch, and Soudi, 2005)
• Problem of undergeneration and
overgeneration of analyses
• Undergeneration: at k=1,
– 7 out of 10 analyses of unknown words are correct,
but
– 4 out of 5 of the real analyses are not generated
• Overgeneration: at k=10,
– Only 3 out of 5 real analyses are missed, but
– Half of the generated analyses are incorrect
• Harmony at k=3 (F-score 0.42)
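Reading "7 out of 10 generated analyses correct" as precision and "1 out of 5 real analyses generated" as recall, the trade-off is easy to check (the slide's fractions are clearly rounded, so these only roughly frame the reported F-score of 0.42 at k=3):

def f_score(precision, recall):
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f_score(0.7, 0.2), 2))   # k=1  -> 0.31
print(round(f_score(0.5, 0.4), 2))   # k=10 -> 0.44
# the talk reports F = 0.42 at k=3 as the preferred balance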
Discussion (1)
• Memory-based morphological analysis
– Lexicon and analyzer in one
– Extremely simple algorithm
• Unseen words are the remaining problem
• Learning: local classifications
– From simple boundary decisions
– To complex operations
– And trigrams
• Inference:
– More complex morphologies need more inference
effort
Discussion (2)
• Ceiling not reached yet; good solutions
still wanted
– Particularly for unknown words with
unknown stems
– Also, recent work by De Pauw!
• External evaluation needed
– Integration with part-of-speech tagging
(software packages forthcoming)
– Effect on IR, IE, QA
– Effect in ASR
Thank you.
http://ilk.uvt.nl
[email protected]