slides day 5 (Generation and lexical selection)

Generation
Lexical selection and collocation
Generation and lexical selection
Ann Copestake
Natural Language and Information Processing Group
Computer Laboratory
University of Cambridge
June 2008
Outline.
Generation
Overview: components of a generation system
Generation and parsing in constraint-based formalisms
Lexicalist generation
Realisation ranking
Lexical choice
Lexical selection and collocation
Generation
Lexical selection and collocation
Terminology
  Content in KB
        |
        |  STRATEGIC GENERATION
        v
       LF
        |
        |  TACTICAL GENERATION
        v
  string (plus markup)

STRATEGIC GENERATION: organizing the knowledge to be conveyed and constructing an LF that corresponds to a sentence.
TACTICAL GENERATION/REALIZATION: LF to string. In principle, independent of domain knowledge.
Tasks in generation
Content determination/selection: deciding what information to convey (a small amount of recent work on statistical approaches)
Document structuring
Aggregation: deciding how information may be split into sentence-sized chunks
Referring expression generation: deciding when to use pronouns, etc. (mostly limited domain)
Lexical choice: deciding which lexical items to use to convey a given concept (mostly limited domain)
Surface realization: mapping from a meaning representation for an individual sentence to a string (or speech output)
Properties of a grammar
Grammar: a grammar consists of a set of grammar rules G, a set of lexical entries L, and a start structure Q.
Lexical sign: a lexical sign is a pair ⟨L, S⟩ of a TFS L and a string list S.
Valid phrase: a valid phrase P is a pair ⟨F, S⟩ of a TFS F and a string list S such that:
1. P is a lexical sign, or
2. F is subsumed by some rule R and there are valid phrases ⟨F1, S1⟩ . . . ⟨Fn, Sn⟩ s.t. R's daughters subsume F1 . . . Fn and S is the ordered concatenation of S1 . . . Sn.
Sentences: a string list S is a well-formed sentence if there is a valid phrase ⟨F, S⟩ such that the start structure Q subsumes F.
Properties of parsing and generation
Parsing a string S consists of finding all valid phrases ⟨F1, S⟩ . . . ⟨Fn, S⟩ such that the start structure Q subsumes each structure F1 to Fn.
Generating from a start structure Q′, which is equal to or subsumed by the general start structure Q, consists of finding all valid strings S1 . . . Sn which correspond to valid signs ⟨F1, S1⟩ . . . ⟨Fn, Sn⟩ such that the start structure Q′ subsumes each structure F1 to Fn.
Equivalent logical forms
Have to instantiate Q′ with an LF in a particular syntactic form.
Logical form equivalence problem:
• Multiple LFs are logically equivalent.
• We can't tell which LF a grammar will accept.
• The LF equivalence problem is undecidable, even for FOPC.
Two-part solution:
1. Not all semantic equivalences should be treated as equivalent inputs for generation:
∀x[student′(x) =⇒ happy′(x)]
¬∃x[student′(x) ∧ ¬happy′(x)] ∧ (happy′(k) ∨ ¬happy′(k))
2. Allow for some variation via a flat semantic representation:
2.1 [this(c) ∧ dog(c) ∧ chase(e, c, c′) ∧ the(c′) ∧ cat(c′)]
2.2 [cat(c′) ∧ chase(e, c, c′) ∧ dog(c) ∧ the(c′) ∧ this(c)]
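The flat representation makes the limited equivalence in 2 trivial to check: two LFs count as the same generator input iff they contain the same bag of predications. A minimal sketch (the tuple encoding of predications is an illustrative assumption, not the LKB's actual MRS format):

```python
from collections import Counter

# Encode each predication as a tuple: (predicate, arg1, arg2, ...).
lf_a = [("this", "c"), ("dog", "c"), ("chase", "e", "c", "c2"),
        ("the", "c2"), ("cat", "c2")]
lf_b = [("cat", "c2"), ("chase", "e", "c", "c2"), ("dog", "c"),
        ("the", "c2"), ("this", "c")]

def same_input(lf1, lf2):
    """Order-insensitive comparison: two flat LFs are equivalent
    generator inputs iff they are the same bag of predications."""
    return Counter(lf1) == Counter(lf2)

print(same_input(lf_a, lf_b))  # True: 2.1 and 2.2 above are the same input
```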
Naive lexicalist generation
1. From the LF, construct a bag of instantiated lexical signs.
2. List the signs in all possible orders.
3. Parse each order.
• Highly independent of syntax
• Requires lexical entries to be recoverable
• Not exactly efficient . . .
• Shake and Bake generation is part of an approach to MT in
which transfer operates across instantiated lexical signs
• Shake and Bake isn’t as bad as the totally naive approach,
but still worst-case exponential
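The totally naive procedure can be sketched directly; `toy_parser` is an invented stand-in for parsing with a real grammar:

```python
from itertools import permutations

def naive_generate(signs, parses):
    """Naive lexicalist generation: enumerate every ordering of the
    bag of instantiated lexical signs (n! of them) and keep the
    orderings the grammar accepts."""
    return [" ".join(order) for order in permutations(signs)
            if parses(list(order))]

# Stand-in for a real parser: accepts exactly one word order.
def toy_parser(words):
    return words == ["every", "manager", "interviewed", "a", "consultant"]

naive_generate(["a", "consultant", "interviewed", "every", "manager"],
               toy_parser)
# 120 orderings are tried for just 5 signs: the exponential cost
# is visible even at this size.
```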
Lexical lookup for lexicalist generation
a′(y), consultant′(y), german′(y),
every′(x), manager′(x), interview′(epast, x, y)
The instantiated lexical entry for interview contains:
interview′(e1, x1, y1)   (e1, x1 and y1 are constants)
Complications:
• Lexical rules: past form of interview
• Multiple lexical entries (cf lexical ambiguity).
• Multiple relations in a lexical entry, with possible overlaps.
E.g., who — which_rel, person_rel.
• Lexical entries without relations (e.g., infinitival to).
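A rough sketch of the lookup step, allowing for multi-relation and relation-less entries (the mini-lexicon's names and contents are invented for illustration):

```python
# Hypothetical mini-lexicon: entry name -> set of relation names the
# entry contributes.
LEXICON = {
    "interview_v1": {"interview"},
    "who_pron": {"which_rel", "person_rel"},  # multi-relation entry
    "manager_n1": {"manager"},
    "to_inf": set(),  # no relations: must be added by a trigger rule
}

def lookup(lf_relations):
    """Return entries all of whose relations appear in the input LF.
    Relation-less entries cannot be found this way; they need a
    separate trigger-rule mechanism."""
    lf = set(lf_relations)
    return sorted(name for name, rels in LEXICON.items()
                  if rels and rels <= lf)

lookup({"interview", "manager", "every"})
```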
Chart generation
Lexical signs are used to instantiate the chart, then generation
as parsing.
Lexical edges:

  id  MRS                                                   string
   1  a′(y1)                                                a/an
   2  consultant′(y1)                                       consultant
   3  german′(y1)                                           German
   4  every′(x1)                                            every
   5  manager′(x1)                                          manager
   6  interview′(e1past, x1, y1)                            interviewed

Some of the edges constructed:

  id  MRS                                                   string                     dtrs
  12  a′(y1), consultant′(y1)                               a consultant               (1,2)
  18  every′(x1), manager′(x1)                              every manager              (4,5)
  22  interview′(e1past, x1, y1), a′(y1), consultant′(y1)   interviewed a consultant   (6,12)
  24  german′(y1), consultant′(y1)                          german consultant          (3,2)
Chart generation, more details
1. indexing can be done by semantic indices
2. indices are constants, therefore don’t get incorrect
coindexation
3. daughters may not overlap (check overlap on LF: MRS is
good for this)
4. still worst case exponential (intersective modifiers)
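Point 3, the overlap check, is cheap on a flat representation; a sketch with edges represented as sets of covered predications (an illustrative encoding, not the LKB's internal one):

```python
# Each edge covers a set of the input LF's predications.
edge_12 = frozenset({"a'(y1)", "consultant'(y1)"})
edge_18 = frozenset({"every'(x1)", "manager'(x1)"})
edge_22 = frozenset({"interview'(e1,x1,y1)", "a'(y1)", "consultant'(y1)"})

def can_combine(a, b):
    """Daughters may not overlap: two edges may form a new edge only
    if they cover disjoint parts of the input semantics."""
    return a.isdisjoint(b)

can_combine(edge_22, edge_18)  # disjoint coverage: combinable
can_combine(edge_22, edge_12)  # both cover a'(y1): blocked
```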
Chart generation in the LKB
1. algorithm above enhanced to allow semantics to be
contributed by rules
2. MRS input makes construction of the bag of signs fairly
easy
3. some tweaks for added efficiency
4. with added tweaks, there’s actually no advantage in
indexing by semantic indices
5. overgeneration is an issue with the LinGO ERG, mainly with respect to modifier order, e.g., big red box, ?red big box
6. stochastic ordering constraints: less well investigated than
for parsing
Problems with purely symbolic generation
• Controlling realizations: cf ambiguity in parsing
• detailed grammars, very specific input
• ‘grammaticality’ vs fluency
• collocation (in the linguistic sense)
• adjective ordering: big red triangle vs ?red big triangle
• heaviness: ?Kim gave the very important consultant it vs
Kim gave it to the very important consultant
• Information structure.
• Topicalization: e.g., Bananas, I like
Statistical generation approaches
1. n-grams on a word lattice
Langkilde and Knight:
• shallow hand-written rewrite grammar generates a word
lattice
• concepts based on WordNet, lexical choice among
elements in synsets
• bigram model for choosing between realisations
2. train a bi-directional grammar on a realisation bank
3. model specific problems that the symbolic grammar
doesn’t deal with
Techniques are not mutually exclusive.
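The bigram step in the Langkilde and Knight approach can be sketched as scoring alternative realisations of the same input (the probabilities below are invented for illustration):

```python
# Made-up bigram log-probabilities; "<s>" marks the sentence start.
BIGRAM_LOGP = {
    ("<s>", "every"): -1.0, ("every", "manager"): -1.2,
    ("manager", "interviewed"): -1.4, ("interviewed", "a"): -1.1,
    ("a", "consultant"): -1.3, ("<s>", "a"): -1.0,
    ("consultant", "was"): -1.6, ("was", "interviewed"): -1.1,
    ("interviewed", "by"): -1.5, ("by", "every"): -2.0,
}
UNSEEN = -10.0  # crude back-off penalty for unseen bigrams

def bigram_score(words):
    """Sum of bigram log-probabilities over a candidate string."""
    return sum(BIGRAM_LOGP.get(pair, UNSEEN)
               for pair in zip(["<s>"] + words, words))

candidates = [
    "every manager interviewed a consultant".split(),
    "a consultant was interviewed by every manager".split(),
]
best = max(candidates, key=bigram_score)
```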
Statistical generation model with an HPSG
• Erik Velldal: Velldal and Oepen (2006)
• n-gram models trained on BNC
• symmetric treebanks:
• standard Redwoods Treebank: select analysis (and thus
semantics) for items in corpus.
• symmetric treebank: record other possible realisations for
the given semantics
• Maximum entropy model trained on the symmetric treebank
• Selective unpacking
• n-grams in addition
Adjective ordering
• Logical representation does not determine order:
wet(x) ∧ weather(x) ∧ cold(x)
• Constraints / preferences:
big red car / *red big car
cold wet weather / wet cold weather (OK, but dispreferred)
• bigrams perform poorly (sparse data)
• positional probability — i.e., probability adjective is
first/second in any pairing
• Malouf (2000): memory-based learning plus positional
probability
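Positional probability pools evidence across all of an adjective's pairings, which is what helps with the sparse-data problem; a sketch with invented counts:

```python
from collections import Counter

# Invented (first, second) adjective-pair counts, standing in for
# counts harvested from a parsed corpus.
PAIRS = Counter({("big", "red"): 9, ("red", "big"): 1,
                 ("cold", "wet"): 6, ("wet", "cold"): 4,
                 ("big", "wet"): 3})

def p_first(adj):
    """Probability that `adj` comes first across all its pairings."""
    first = sum(c for (a, _), c in PAIRS.items() if a == adj)
    second = sum(c for (_, b), c in PAIRS.items() if b == adj)
    return first / (first + second) if first + second else 0.5

def order(a, b):
    """Put the adjective with the higher positional probability first,
    even if the pair (a, b) itself was never observed."""
    return (a, b) if p_first(a) >= p_first(b) else (b, a)

order("red", "big")
```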
Lexical choice in lexicalist generation
• Basic assumption: an elementary predication (EP) corresponds to a word. Null semantics items are dealt with via 'trigger rules'.
• Grammar controls some lexical selection (see later), so
input to generator shouldn’t have to specify it.
• Other cases are partially conventional but not dealt with in
the grammar.
Determiner choice: I cut my face vs *I cut the face
• Currently, generator input to ERG has to be very precisely
specified: unsuitable for many potential applications.
Determiner choice
We went climbing in Andes
president of United States
I tore pyjamas
I tore duvet
George doesn’t like vegetables
We bought new car yesterday
cf Minnen et al: a/the/no determiner selection
Outline.
Generation
Overview: components of a generation system
Generation and parsing in constraint-based formalisms
Lexicalist generation
Realisation ranking
Lexical choice
Lexical selection and collocation
Types of grammatical selection
• syntactic: e.g., preposition among selects for an NP (like
other prepositions)
• lexical: e.g., spend selects for PP headed by on
Kim spent the money on a car
• semantic, but conventionalised: e.g., temporal at selects
for times of day (and meals)
at 3am
at three thirty five and ten seconds precisely
Lexical selection
spend_v2 := v_np-pp_le &
 [ STEM < "spend" >,
   SYNSEM [ LKEYS [ --OCOMPKEY _on_p_rel,
                    KEYREL.PRED "_spend_v_1_rel" ] ] ].
• ERG relies on convention that different lexemes have
different relations
• ‘lexical’ selection is actually semantic. cf Wechsler
• no true synonyms assumption, or assume that grammar
makes distinctions that are more fine-grained than
real-world denotation justifies.
• near-synonymy would have to be recorded elsewhere
Semantic selection
In ERG, specify a higher node in the hierarchy of relations:
at_temp := p_np_i-tmp_le &
 [ STEM < "at" >,
   SYNSEM [ LKEYS [ --COMPKEY hour_or_time_rel,
                    KEYREL.PRED _at_p_temp_rel ] ] ].
• Semantic selection allows for indefinitely large set of
alternative phrases.
• productive with respect to new words, but exceptions
allowable: not falsified if e.g., *at tiffin
• ERG lexical selection is a special case of ERG semantic
selection!
• also idiom mechanism in ERG
Denotation and grammar engineering
• Denotation is truth-conditional, logically formalisable (in
principle), refers to ‘real world’ (extension)
• Must interface with non-linguistic components
• Minimising lexical complexity in broad-coverage grammars
is practically necessary
• Plausible input to generator: reasonable to expect real
world constraints to be obeyed (except in context)
Denotation and grammar engineering
• Assume linkage to domain, richer knowledge
representation language available
• TFS language for morphology, syntax, compositional
semantics: not intended for general inference.
• Talmy example: the baguette lay across the road
across - Figure’s length > Ground’s width
• identifying F and G and location for comparison in
grammar?
• coding average length of all nouns?
• allowing for massive baguettes and tiny roads?
But . . .
• KR currently assumes description logics rather than richer
languages, so inference will be limited.
• Need to think about the denotation to justify grammaticization (or otherwise):
if temporal in/on/at share a denotation, a selectional account of their distribution is needed;
unreasonable to expect in/on/at in generator input
• Linguistic criteria: denotation versus grammaticization? Is the effect found cross-linguistically? Predictable on the basis of world knowledge? Closed class vs open class?
• Practical considerations about interfacing
• allow generalisation over e.g., in/on/at in generator input,
while keeping possibility of distinction
Collocations
• Intuition: two or more lexical items occurring together in
some syntactic relationship more frequently than would be
expected, even given world knowledge. e.g., shake and fist
but NOT buy and house
• anti-collocation: concentrated tea vs strong tea
• collocation or semantics? is this even a testable concept?
heavy smoker, heavy use, heavy consumption
heavy weather, heavy sea, heavy breathing
(compare with strong)
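The 'more frequently than expected' intuition is commonly made precise with pointwise mutual information; a sketch with invented counts:

```python
import math

N = 1_000_000  # total word-pair observations (invented corpus size)
PAIR = {("shake", "fist"): 30, ("buy", "house"): 80}
WORD = {"shake": 500, "fist": 200, "buy": 5000, "house": 8000}

def pmi(w1, w2):
    """log2(observed / expected): large positive values suggest the
    pair co-occurs more often than its word frequencies predict."""
    p_pair = PAIR[(w1, w2)] / N
    expected = (WORD[w1] / N) * (WORD[w2] / N)
    return math.log2(p_pair / expected)

pmi("shake", "fist")   # high: a collocation candidate
pmi("buy", "house")    # low: frequent, but no more than expected
```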
Collocation versus denotation
• Whether an unusually frequent word pair is a collocation or
not depends on assumptions about denotation: fix
denotation to investigate collocation
• Empirically: investigations using WordNet synsets (Pearce,
2001)
• Anti-collocation: words that might be expected to go
together and tend not to
e.g., flawless behaviour (Cruse, 1986), big rain (unless
explained by denotation)
Bake versus roast
roast: beef, pork, chicken, head of lamb, camel, goat, !cow, lizard, crab, *ham, *gammon; potato, chestnuts, turnip, crab apples, ?apple; apple, potato; coffee beans; duke; metal ore

bake: bread, cake, pies; ham, fish, snake, barracuda steak; Gratin Dauphinois, pork chops with apples; clay, concrete, earth; pottery, resistance films
rancid in the BNC
most frequent uses (77 cases in 100 million words):
fat 6, butter 5, oil 5, meat 4, odour 3, pork 2, smell 2
Possibly speaker-dependent status:
Collocation rancid normally occurs with oily things (or, for
some people, dairy products, or . . . ) but just
means ‘off’
Denotation (technical use) rancid refers to a certain sort of
‘offness’ (associated with oxidized fat)
Final summary: Data-driven and generative
approaches.
• Two paradigms:
1. formal linguistics/generative grammar
2. data-driven techniques
• Hypothesis: data-driven techniques work because they
model some aspects of language that the classical
approaches don’t.
• Course has discussed some areas of semantics and
pragmatics in computational linguistics where combined
models seem promising.
• Many more questions than conclusions!