slides day 2 (Compound nouns)

English compound nouns: the problems
Statistical techniques for interpretation of compound relations
Compound nouns
Ann Copestake
Natural Language and Information Processing Group
Computer Laboratory
University of Cambridge
May 2008
Semi-productivity
Course topics
• Friday: Compound nouns: the problem, data-driven
approaches to interpretation, semi-productivity and
compounds.
• Monday: Generative lexicon: logical metonymy, interaction
between logical metonymy and pragmatics, data-driven
approaches to logical metonymy.
• Tuesday: Idioms in HPSG implementations. Speech acts
and conventionalisation.
• Wednesday: Generation: realisation ranking. Lexical
selection: grammar vs collocation.
Outline.
English compound nouns: the problems
Bracketing
Compound relations
Statistical techniques for interpretation of compound relations
Distributional methods in general
Compound nouns
Semi-productivity
Semi-productivity in general
Semi-productivity in English compounds
Problem overview.
Interpretation:
• identification (especially for hapaxes)
• bracketing: cat food container label
• semantic relations between elements (esp. when there are no
domain limitations)
Compounding is (somewhat) productive, so we can’t list all the
compounds.
Generation:
• when is compound possible/appropriate?
• special cases: compounds with ’s
• stress (speech synthesis)
Bracketing compound nouns (Lauer 1995)
(Illustrated on three-noun compounds; extends to longer ones.)
N1 N2 N3: cat food container
Adjacency model: plausibility of N2 with N1 vs N2 with N3
(cat-food vs food-container).
Dependency model: plausibility of N1 as a dependent of N2 vs
N1 as a dependent of N3 (cat-food vs cat-container).
Why the dependency model:
((cat food) container): cat depends on food, and food depends on container
(cat (food container)): cat depends on container, and food depends on container
Food depends on container in both bracketings, so only the
attachment of N1 (cat) distinguishes them.
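The two models can be compared in a minimal sketch. It assumes that plausibility is approximated by smoothed counts of adjacent noun pairs; the counts below, the smoothing constant and the function names are invented for illustration and are not taken from Lauer (1995).

```python
from collections import Counter

# Hypothetical counts of (modifier, head) noun pairs seen adjacent in a corpus.
PAIR_COUNTS = Counter({
    ("cat", "food"): 120,
    ("food", "container"): 40,
    ("cat", "container"): 2,
    ("plastic", "food"): 5,
    ("plastic", "container"): 60,
})


def association(modifier, head, counts=PAIR_COUNTS):
    """Plausibility of `modifier head` as a compound: a smoothed pair count."""
    return counts[(modifier, head)] + 0.5


def bracket(n1, n2, n3, model="dependency", counts=PAIR_COUNTS):
    """Choose a bracketing for the three-noun compound N1 N2 N3."""
    # Both models take N1-N2 as the evidence for left branching.
    left_evidence = association(n1, n2, counts)
    if model == "adjacency":
        # Adjacency: compare N1-N2 with N2-N3.
        right_evidence = association(n2, n3, counts)
    elif model == "dependency":
        # Dependency: compare N1 as dependent of N2 with N1 as dependent of N3.
        right_evidence = association(n1, n3, counts)
    else:
        raise ValueError(f"unknown model: {model}")
    if left_evidence >= right_evidence:
        return f"(({n1} {n2}) {n3})"
    return f"({n1} ({n2} {n3}))"


print(bracket("cat", "food", "container", model="adjacency"))
print(bracket("cat", "food", "container", model="dependency"))
print(bracket("plastic", "food", "container", model="dependency"))
```

The only difference between the models is which pair competes with N1-N2, so the choice is a single branch in the code.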
Bracketing compound nouns
• Train models by collecting adjacent nouns in a large
corpus.
• Dependency (80%) substantially outperforms adjacency
(75%)
• Best results are very close to human agreement: solved
problem!
• BUT: requires a separate model from the general parse
ranking model, even for Penn TreeBank parsers.
• Either: underspecification in the grammar, post-processing
to disambiguate bracketing.
Or: use compound model to provide features for a
maximum entropy model.
Internal meaning of compounds
Compounds vs adjectives:
sharp knife: knife′(x) ∧ sharp′(x) (roughly)
cheese knife: NOT knife′(x) ∧ cheese′(x)
(BUT tuna fish could be: tuna′(x) ∧ fish′(x))
• cheese knife: knife for cutting cheese
• steel knife: knife made of steel
• kitchen knife: knife characteristically used in the kitchen
• cotton bag: bag made of cotton
• cotton bag: (possibly) bag for cotton
Contextual interpretation
Novel types of usage
Please sit in the apple juice seat (Downing, 1977)
Weird contextual interpretations:
Context: instructions for a trading game:
Buy and sell orders are executed by placing order slips in paper
bags which correspond to different commodities. For example,
Mary wished to buy cotton, so she put a slip in the cotton bag.
Radical pragmatics and compounds
1. Compounds cannot all be conventionally/compositionally
associated with a limited number of specific meanings (not
generally possible to e.g., recover a verb)
2. Novel types of usage
3. Reasoning about discourse situation is needed to resolve
referent for novel cases
4. Radical pragmatics view: since pragmatics is needed for
interpretation of some compounds, it is unnecessary to
assume any contribution of the lexicon in general
5. Computational analogue: use domain to interpret
compounds.
Complete underspecification of compound meaning
Rule for right-headed noun-noun compounds:
N0 → N1 N2
N0: λx[P(x) ∧ Q(y) ∧ R(x, y)]
N1: λy[Q(y)]
N2: λx[P(x)]
juice seat: λx[seat(x) ∧ juice(y) ∧ R(x, y)]
R is underspecified, to be filled in by pragmatics, based on the
discourse context.
(Here and below, ignore the issue of the quantification of
the non-head element.)
• Not much help for modern computational linguistics!
• Theoretical problems when compound generation is
considered.
You shall know a word by the company it keeps!
(Firth, 1957)
Words represented as vectors of features:
        feature_1  feature_2  ...  feature_n
word_1  f_1,1      f_2,1      ...  f_n,1
word_2  f_1,2      f_2,2      ...  f_n,2
...
word_m  f_1,m      f_2,m      ...  f_n,m
Features: co-occur with word_n in some window, co-occur with
word_n as a syntactic dependent, occur in paragraph_n, occur in
document_n ...
First computational application: Spärck Jones (1964)
Words co-occurring with words
             arts  boil  data  function  large  sugar  summarized  water
apricot       0     1     0       0        1      1        0         1
pineapple     0     1     0       0        1      1        0         1
digital       1     0     1       1        0      0        1         0
information   1     0     1       1        0      0        1         0
(from Jurafsky and Martin, 2008)
apricot: { boil, large, sugar, water }
pineapple: { boil, large, sugar, water }
digital: { arts, data, function, summarized }
information: { arts, data, function, summarized }
Clustering: group together words with ‘similar’ vectors.
First computational implementation by Harper (1965), first use
of statistically sound similarity metric by Spärck Jones (1967).
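A minimal sketch of comparing and grouping the four vectors above. The slide does not name a similarity metric; cosine similarity, the greedy grouping procedure and the 0.5 threshold are assumptions made here for illustration.

```python
import math

FEATURES = ["arts", "boil", "data", "function",
            "large", "sugar", "summarized", "water"]

# Binary co-occurrence vectors from the table above.
VECTORS = {
    "apricot":     [0, 1, 0, 0, 1, 1, 0, 1],
    "pineapple":   [0, 1, 0, 0, 1, 1, 0, 1],
    "digital":     [1, 0, 1, 1, 0, 0, 1, 0],
    "information": [1, 0, 1, 1, 0, 0, 1, 0],
}


def cosine(u, v):
    """Cosine of the angle between two feature vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster(vectors, threshold=0.5):
    """Greedy grouping: join the first cluster whose seed word is similar enough."""
    clusters = []
    for word, vec in vectors.items():
        for group in clusters:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(word)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([word])
    return clusters


print(cluster(VECTORS))
```

With these toy vectors the fruit words share all their features and the other two words share none with them, so any reasonable metric separates the two groups; real data needs weighting and a more careful clustering method.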
Compound noun relations
• cheese knife: knife for cutting cheese
• steel knife: knife made of steel
• kitchen knife: knife characteristically used in the kitchen
Automatic disambiguation:
• Syntactic parsers can’t distinguish: N1(x), N2(y),
compound(x,y)
• One approach: human annotation of compounds, machine
learning of unseen examples.
Compound noun relation schemes
• Lauer: prepositions; Lapata: verbal compounds; Girju et al.;
Turney.
• Ó Séaghdha, 2007: BE, HAVE, INST, ACTOR, IN, ABOUT:
(with subclasses)
LEX: lexicalised, REL: weird, MISTAG: not a noun
compound.
• Based on Levi (1978)
• Considerable experimentation to define a usable scheme:
some classes very rare (therefore not annotated reliably)
• Annotation in context (helicopter radio), though mostly not
important.
Compound noun relation learning
(Ó Séaghdha, 2007)
Squirrels and pasties
How to make squirrel pasties: Observer, May 11, 2008
Kevin Viner’s recipe for two pasties
140g squirrel meat cut into 1cm cubes;
100g sliced potato; 100g sliced swede; 50g diced onion; 30g smoked bacon;
15g chopped hazelnuts; 75g butter;
5g chopped parsley; a good pinch of salt and pepper
Method
Egg wash edges of pastry circles.
Place the potato, swede, hazelnuts, parsley and seasoning on to each circle
followed by the bacon, squirrel meat and, finally, the onion.
Place butter in each pasty, then fold over the pastry and crimp the edges.
Put the pasties on to a greaseproof baking tray, egg wash both pasties well,
place in a pre-heated oven at 180C or gas mark 5.
Bake for 45-50 minutes. The juices should start to boil and the pasties should
be able to move on the tray with ease.
Compound noun relation learning
(Ó Séaghdha)
• Treat compounds as single words: doesn’t work!
• Constituent similarity: compounds x1 x2 and y1 y2,
compare x1 vs y1 and x2 vs y2.
squirrel vs pork, pasty vs pie
• Relational similarity: sentences with x1 and x2 vs
sentences with y1 and y2.
squirrel is very tasty, especially in a pasty vs
pies are filled with tasty pork
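Constituent similarity can be illustrated with a nearest-neighbour sketch: an unseen compound takes the relation of the labelled compound whose modifier and head are most similar to its own. The toy vectors, the relation labels and the nearest-neighbour classifier are all assumptions made for illustration; this is a stand-in, not the learner used in the cited work.

```python
import math

# Hypothetical distributional vectors over five made-up context features.
VECTORS = {
    "squirrel": [1, 2, 0, 0, 0],
    "pork":     [2, 1, 0, 0, 0],
    "pasty":    [2, 0, 2, 0, 0],
    "pie":      [2, 0, 1, 0, 0],
    "steel":    [0, 0, 0, 2, 1],
    "knife":    [0, 0, 0, 1, 2],
}

# Hypothetical annotated compounds; the labels are illustrative only.
TRAINING = {
    ("pork", "pie"): "filled-with",
    ("steel", "knife"): "made-of",
}


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def compound_similarity(x, y, vectors=VECTORS):
    """Compare x1 with y1 and x2 with y2, then add the two similarities."""
    (x1, x2), (y1, y2) = x, y
    return cosine(vectors[x1], vectors[y1]) + cosine(vectors[x2], vectors[y2])


def predict_relation(compound, training=TRAINING, vectors=VECTORS):
    """Return the relation of the most similar annotated compound."""
    nearest = max(training,
                  key=lambda seen: compound_similarity(compound, seen, vectors))
    return training[nearest]


print(predict_relation(("squirrel", "pasty")))
```

Here squirrel pasty ends up closest to pork pie because squirrel resembles pork and pasty resembles pie, even though the compound itself has never been seen.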
Human annotation
• Preliminary to supervised machine learning, evaluation of
unsupervised techniques.
• Methodology: define categories, develop guidelines,
multiple annotators, measure annotator agreement, refine
categories and guidelines . . .
• Kappa values of around .7 quite usual in semantic
annotation (raw agreement somewhat higher).
• What’s going on?
Sometimes, local effects: sponsorship cash. Cash gained
through sponsorship (INST) or sponsorship in form of cash
(BE)?
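The gap between raw agreement and kappa can be shown with a small sketch. It assumes two annotators and Cohen's kappa (the slide does not say which kappa variant is meant); the ten annotations are invented for illustration.

```python
from collections import Counter


def raw_agreement(labels_a, labels_b):
    """Proportion of items on which the two annotators chose the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(labels_a)
    observed = raw_agreement(labels_a, labels_b)
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: both annotators pick label c independently.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of ten compounds by two annotators.
ANNOTATOR_A = ["BE", "BE", "INST", "INST", "HAVE", "HAVE", "IN", "IN", "BE", "INST"]
ANNOTATOR_B = ["BE", "BE", "INST", "BE", "HAVE", "HAVE", "IN", "IN", "BE", "INST"]

print(raw_agreement(ANNOTATOR_A, ANNOTATOR_B))
print(cohen_kappa(ANNOTATOR_A, ANNOTATOR_B))
```

The annotators disagree on one item out of ten, and kappa comes out lower than the raw agreement because some of the matches would be expected by chance alone.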
Productivity
• derivational processes in general have very different
productivities, considered as the proportion of possible
inputs they actually apply to e.g. er-suffixation (teacher) vs.
eer-suffixation (profiteer)
• Compound patterns:
  non-productive, e.g., verb-object: pickpocket, cutpurse, but
    no recent examples (bag-snatcher, not *snatchbag)
  moderately productive, e.g., has-part: 4-door car but
    *sunroof car
  highly productive, e.g., made-of: ebony tripod (though
    obviously not, e.g., juice chair)
Sense extension
• e.g., lamb (mass use) meaning lamb meat, sugar as a verb
• generally syntactic as well as semantic effects
• similarity with derivational morphology
• ambiguity rather than vagueness
• formalised as lexical rules
• blocking effects e.g. beef (generally) blocks cow meaning
cow meat
Productivity of sense extension and morphology
Could capture generalizations with redundancy rules, but:
• new words: e.g., fax, Xerox, Sadolin as verbs
• novel examples in corpora, spontaneous speech: e.g.,
crocodile, mole as mass terms
Badger hams are a delicacy in China while mole
is eaten in many parts of Africa.
• child language experiments
• recursive morphological rules like re-, anti- or great-
prefixation (e.g. rereprogram, anti-anti-missile,
great-great-grandfather)
Exceptions to productivity
• preemption by synonymy (blocking)
• variation in acceptability
The pilot helicoptered
The pilot helicoptered over the forest
Mrs Clinton was helicoptered to the base
The pilot helicoptered the forest
The pilot helicoptered his passengers sick
Kim dirigibled Sandy to Detroit
Such cases are problematic for the assumption that there
are narrow semantic classes controlling alternations
Item familiarity
Observation: the frequency with which a given word form is
associated with a particular lexical item (form, syntax, meaning
triples) is often highly skewed.
Assumptions:
• speakers have some idea of frequency of lexical items,
even for forms that could be productively derived
• speakers generally use higher frequency forms to convey a
given meaning
• hearers generally assume higher frequency meanings for a
given form
cf Goldberg (1993), Bauer (1983:71f) in context of morphology
Productivity
• speakers will not generally use unattested forms unless
licensed by a high productivity lexical rule
• probability of unseen but derivable lexical items can be
differentially estimated based on productivity of the
corresponding rule
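One possible reading of the second point, as a rough sketch: measure each rule's productivity as the proportion of possible inputs it is attested with, then share a reserved probability mass for unseen items among the rules in proportion to that productivity. The rule statistics, the reserved mass of 0.1 and the proportional scheme are all assumptions made for illustration, not a method given in the slides.

```python
# Hypothetical statistics per rule: (attested outputs, possible inputs).
RULE_STATS = {
    "made-of": (800, 1000),
    "has-part": (150, 1000),
    "verb-object": (5, 1000),
}


def productivity(attested_outputs, possible_inputs):
    """Proportion of possible inputs that the rule is attested as applying to."""
    return attested_outputs / possible_inputs


def unseen_probability(rule, stats=RULE_STATS, unseen_mass=0.1):
    """Probability assigned to an unseen item derivable by `rule`.

    `unseen_mass` is a hypothetical amount of probability reserved for all
    unseen items; it is shared in proportion to rule productivity.
    """
    scores = {name: productivity(*counts) for name, counts in stats.items()}
    return unseen_mass * scores[rule] / sum(scores.values())


for name in RULE_STATS:
    print(name, unseen_probability(name))
```

Under these assumptions an unseen made-of compound receives far more probability than an unseen verb-object compound, which matches the claim that speakers produce unattested forms mainly through highly productive rules.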
Blocking
• Assume highest frequency existing form that meets
constraints
• Unattested forms postulated if no existing form found
• Blocking is a special case of this principle
• Unblocking allowed for (though we haven’t formally
accounted for rhetorical effect)
In the case of at least one county primary school . . . they were
offered (with perfect timing) saute potatoes, carrots, runner
beans and roast cow.
(Guardian, May 16th 1990, in a story about mad cow disease.)
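The first three points can be sketched as a form-selection procedure: use the most frequent attested form for a meaning, and fall back on a productive rule only when no attested form exists. The lexicon, its frequencies, the meaning labels and the `grind` rule are hypothetical; unblocking for rhetorical effect (roast cow) is not modelled.

```python
# Hypothetical lexicon: meaning -> {attested form: frequency}.
LEXICON = {
    "meat-of-cow": {"beef": 5000, "cow": 3},
    "meat-of-sheep": {"lamb": 2000, "mutton": 400},
    "meat-of-mole": {},
}


def grind(meaning):
    """Hypothetical productive rule: use the animal noun as a mass term."""
    return meaning.split("-")[-1]


def choose_form(meaning, lexicon=LEXICON, derive=grind):
    """Pick the highest-frequency attested form, else postulate a derived one."""
    attested = lexicon.get(meaning, {})
    if attested:
        # Blocking falls out here: "beef" outranks the rare derived "cow".
        return max(attested, key=attested.get)
    return derive(meaning)


print(choose_form("meat-of-cow"))
print(choose_form("meat-of-mole"))
```

Blocking needs no separate mechanism in this sketch: the derived form loses only because a much more frequent form with the same meaning already exists.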
Complete underspecification of compound meaning
Rule for right-headed noun-noun compounds:
N0 → N1 N2
N0: λx[P(x) ∧ Q(y) ∧ R(x, y)]
N1: λy[Q(y)]
N2: λx[P(x)]
Problems
1. Overgeneration — some classes of ‘real-world-possible’
compounds do not occur
2. Grammar of compounds (‘possessive’ compounds, stress
assignment)
3. Conventional uses (not just a matter of listing lexicalized
compounds — used bookstore, ?jet blackbird)
4. Failure to account for usual interpretations — most
compounds actually do fit into small number of classes
(e.g. Levi 1978)
5. Pragmatics does not have enough to work on — rule would
predict that unseen compounds could only be
used/interpreted in contexts which could disambiguate
them
Overgeneration: German compounds with
non-compound translations
Arzttermin           *doctor appointment
Terminvorschlag      *date proposal
Terminvereinbarung   *date agreement
Januarhälfte         *January half
Frühlingsanfang      *spring beginning
1. *doctor appointment/doctor’s appointment — possessive
compounds
2. *date agreement/agreement on a date — head derived
from a PP-taking verb (water seeker vs *water looker)
3. *spring beginning/beginning of spring — relational nouns
as heads?
Possessive compounds
• Many compounds with human-denoting first elements
require a possessive in English.
• blacksmith’s hammer/*blacksmith hammer ‘hammer of a
type conventionally associated with a blacksmith’ (also
driver’s cab, widow’s allowance etc).
• Not the usual possessive:
compare (((his blacksmith)’s) hammer)
with (his (blacksmith’s hammer))
• Adjective placement: three English blacksmith’s hammers/
*three blacksmith’s English hammers.
• May be an agentivity requirement: child seat
• Plural forms: children’s lecture
Stress assignment
• Compounds generally have leftmost stress (simplifying —
see e.g., Bauer (1983), Liberman and Sproat (1992) for
more accurate statement)
• But compounds which have the interpretation ‘Y made of
X’ (e.g., nylon rope, oak table) generally have main stress
on the righthand noun.
• Stress sometimes disambiguates:
Sandy put the shirts into the cotton bag
with righthand stress — bag made of cotton
with leftmost stress — bag for cotton (other reading may
still be available)
Conventional uses
1. Established compounds may be transparent but not
predictable, are often dialect specific: center divide,
dustbin man (vs. rubbish man)
2. Established compounds must be recorded somehow in the
lexicon, at least for generation
3. Therefore, classes should also be lexical, to allow
generalizations to be expressed
4. BUT: in context, even established compounds have
alternative interpretations (Bauer’s garbage man by
analogy with snowman)
Principles of compound representation
• Compound meaning cannot be completely underspecified
— the grammar/lexicon delimits the range of compounds,
lists established compounds and lists conventional
patterns.
• Patterns of interpretation have differing productivities.
• Some compounds are only possible in context (e.g.,
Downing (1977) apple-juice seat).
• Non-conventional contextual interpretations are available
for all compounds in context (e.g. Bauer (1983) garbage
man like snowman), but are much less likely than
conventional readings.
Back to grammar engineering
• Currently the ERG does not bracket compounds well, but
could use the Lauer technique (as post-processing, or as
features for the parse-ranking model).
• Currently ERG underspecifies compound relation, doesn’t
do possessive compounds properly, allows any noun-noun
combination as a compound (though disprefers via n-gram
model).
• How does the statistical interpretation technique fit?
• Straightforward post-processing: partial disambiguation.
Further refinement is possible if a context module is available.
• Doesn’t directly tell us about productivity of classes, but
could be used to help provide frequencies for classes.
Integrating the lexicon and pragmatics: Assumptions
1. Lexical representation must integrate with syntax and
compositional semantics
2. Lexicon should not allow arbitrary inference, lexical
semantic information is not open-ended
3. Pragmatics should not access language-specific
(conventional) information
The role of pragmatics and discourse context in
compound interpretation
• accept most probable LF that is coherent in context
• instantiate underspecified relationships if necessary
Mary sorted her laundry into various large bags.
She put her skirt in the cotton bag.
Lexically most likely interpretation is bag made of cotton:
nothing makes this incoherent, so accepted.
Mary sorted her laundry into various bags made from plastic.
She put her skirt into the cotton bag.
Lexically most likely interpretation is incoherent because the
definite description cannot be accommodated into the
discourse context.
Second-most likely interpretation — bag used for cotton — is
OK
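The two cotton bag examples can be sketched as a ranked search: try the readings in order of lexical probability and accept the first one that is coherent in context. The probabilities and the coherence test (a check against the bag materials mentioned in the discourse) are crude stand-ins invented for illustration, not a proposed discourse model.

```python
# Hypothetical lexical probabilities for the readings of "cotton bag".
COTTON_BAG = [
    ("bag made of cotton", 0.8),
    ("bag used for cotton", 0.2),
]


def make_coherence_check(known_materials=None):
    """Build a toy coherence test from what the discourse says about the bags.

    `known_materials` is the set of materials the bags are said to be made
    of, or None if the discourse says nothing about materials.
    """
    def is_coherent(reading):
        if reading.startswith("bag made of "):
            material = reading.split()[-1]
            return known_materials is None or material in known_materials
        return True
    return is_coherent


def interpret(readings, is_coherent):
    """Accept the most probable reading that is coherent in context."""
    for reading, _prob in sorted(readings, key=lambda r: r[1], reverse=True):
        if is_coherent(reading):
            return reading
    return None  # no listed reading fits; R would be left to pragmatics


# "Mary sorted her laundry into various large bags."
print(interpret(COTTON_BAG, make_coherence_check()))
# "Mary sorted her laundry into various bags made from plastic."
print(interpret(COTTON_BAG, make_coherence_check({"plastic"})))
```

In the first context nothing rules out the made-of reading, so it is accepted; in the second it fails the coherence test and the used-for reading is chosen instead.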
Summary
• Compound bracketing
• Compound relations.
• Compound interpretation via statistical methods (e.g.,
distributional techniques).
• General phenomenon of semi-productivity.
• Compound semi-productivity.
• Compounds in ERG, interaction with discourse context.