Approximating Textual Entailment with LFG and FrameNet Frames
Aljoscha Burchardt, Anette Frank
Computational Linguistics Department
Saarland University, Saarbrücken
Second Pascal Challenge Workshop
Venice, April 2006
SALSA-WS 09/05
Outline of this Talk
• Frame Semantics
• A baseline system for approximating Textual
Entailment
– LFG syntactical analyses with
– Frame semantics
– Statistical decision: entailed?
• Walk-through example from RTE 2006
• RTE 2006 results / brief conclusions
Frame Semantics
(Fillmore 1976, Fillmore et al. 2003)
• Lexical semantic classification of predicates and their
argument structure
• A frame represents a prototypical situation (e.g.
Commercial_transaction, Theft, Awareness)
• A set of roles identifies the participants or propositions
involved
• Frames are organized in a hierarchy
• Berkeley FrameNet Project database: 600 frames, 9,000
lexical units, 135,000 annotated sentences
Linguistic Normalizations
(Frame: Commerce_buy; roles: Buyer, Seller, Goods, Money)
• Voice: active / passive
– BMW bought Rover from British Aerospace.
– Rover was bought by BMW, which financed [...] the new Range Rover.
• Lexicalization
– BMW, which acquired Rover in 1994, is now dismantling the company.
• POS: verb / noun
– BMW's purchase of Rover for $1.2 billion was a good move.
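The normalization idea can be sketched in a few lines of Python: the surface variants above reduce to the same frame with shared role fillers. The `FrameInstance` class, the `analyse` helper and the hand-written analyses are illustrative assumptions, not output of any FrameNet tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameInstance:
    """Hypothetical container for one frame-semantic analysis."""
    frame: str
    roles: frozenset  # set of (role, filler) pairs


def analyse(frame, **roles):
    """Build a FrameInstance from keyword arguments (illustration only)."""
    return FrameInstance(frame, frozenset(roles.items()))


# Hand-written analyses of the example sentences (assumed, not tool output).
active = analyse("Commerce_buy", Buyer="BMW", Goods="Rover",
                 Seller="British Aerospace")
passive = analyse("Commerce_buy", Buyer="BMW", Goods="Rover")
lexical = analyse("Commerce_buy", Buyer="BMW", Goods="Rover")
nominal = analyse("Commerce_buy", Buyer="BMW", Goods="Rover",
                  Money="$1.2 billion")


def shared_roles(a, b):
    """Role fillers two analyses agree on; empty if the frames differ."""
    if a.frame != b.frame:
        return frozenset()
    return a.roles & b.roles
```

Under these assumptions the passive and the "acquired" variant compare as equal, and the active and nominal variants still share Buyer and Goods although voice and part of speech differ.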
Frame Semantics for RTE
Focusing on lexical semantic classes and role-based argument structure
– Built-in normalizations help to determine semantic
similarity at a high level of abstraction
– Disregarding aspects of “deep” semantics: negation,
modality, quantification, ...
– Open for deeper modeling on demand (e.g. our
treatment of modality)
A Baseline System for Approximating
Textual Entailment
• Fine-grained LFG-based syntactic analysis
– English LFG grammar (Riezler et al. 2002)
– Wide-coverage with high-quality probabilistic disambiguation
• Frame Semantics
– Shallow lexical-semantic classification of predicate-argument
structure
– Extensions: WordNet senses, SUMO concepts
• Computing structural and semantic overlap of t and h
– Hypothesis: large overlap ≈ entailment
A Baseline System for Approximating
Textual Entailment
[Figure: system architecture]
• Linguistic Analyses: text and hypothesis are each analysed as an LFG f-structure graph w/ frames & concepts
• Computing Semantic Overlap: text-hypothesis match graph with different types of matches (aspects of similarity)
• Feature extraction: lexical, syntactic, semantic structure & overlap measures
• Statistical Decision: model training & classification; output: Entailment?
Linguistic Components
• XLE parsing: LFG f-structure
• Fred / Detour / Rosy: frames & roles
• WordNet-based WSD: WordNet & SUMO
• F-structure w/ semantics projection
• Rule-based: extend & refine sem. proj., using XLE term rewriting system (Crouch 2005)
– NEs, Locations
– Co-reference
– Modality, etc.
Example from RTE 2006
Pair 716
Text
In 1983, Aki Kaurismäki directed his first full-time
feature.
Hypothesis
Aki Kaurismäki directed a film.
LFG F-Structures
[Figure: LFG f-structures]
Automatic Frame Annotation
for Text (SALTO Viewer)
[Figure: frame annotation on a Collins parse]
• Fred & Rosy: frames & roles (statistical)
• Detour System: frames (via WordNet)
Automatic Frame Annotation
for Hypothesis
716_h: Aki Kaurismäki directed a film.
LFG + Frames for Hypothesis
(FEFViewer)
Rule-based
(LFG-NER)
Aki Kaurismäki directed a film.
Hypothesis-Text-Match Graphs
Computing Structural and Semantic overlap
Match graph bundles overlapping partial graphs marked
by match types
• Aspects of similarity
– Syntax-based (i.e. lexical and structural): Identical predicates
(attributes) trigger node (edge) matches.
– Semantics-based: Identical frames/concepts (roles) trigger
node (edge) matches.
• Degrees of similarity
– Strict matching
– Weak matching conditions for non-identical predicates:
• “Structurally related” e.g. via coreference (relative clauses,
appositives, pronominals)
• “Semantically related” via WordNet, Frame-Relations
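A minimal sketch of strict and weak matching, in Python. The toy graphs, the `RELATED` table (a stand-in for a WordNet lookup) and both function names are assumptions made for illustration; the real system works on full f-structure graphs with frames and concepts.

```python
# Hypothetical toy graphs: node id -> predicate, plus labelled edges.
text_nodes = {"t0": "direct", "t1": "Kaurismäki", "t2": "feature"}
text_edges = {("t0", "dsubj", "t1"), ("t0", "dobj", "t2")}
hyp_nodes = {"h0": "direct", "h1": "Kaurismäki", "h2": "film"}
hyp_edges = {("h0", "dsubj", "h1"), ("h0", "dobj", "h2")}

# Stand-in for a lexical resource (assumption): predicate pairs that
# count as "semantically related".
RELATED = {frozenset({"feature", "film"})}


def node_matches(t_nodes, h_nodes):
    """Map each hypothesis node to (text node, match type)."""
    matches = {}
    for h, h_pred in h_nodes.items():
        for t, t_pred in t_nodes.items():
            if h_pred == t_pred:
                matches[h] = (t, "strict")   # identical predicates
            elif frozenset({h_pred, t_pred}) in RELATED and h not in matches:
                matches[h] = (t, "weak")     # related, non-identical
    return matches


def edge_matches(t_edges, h_edges, nodes):
    """Hypothesis edges whose endpoints match and whose label recurs in the text."""
    matched = set()
    for h_src, label, h_tgt in h_edges:
        if h_src in nodes and h_tgt in nodes:
            if (nodes[h_src][0], label, nodes[h_tgt][0]) in t_edges:
                matched.add((h_src, label, h_tgt))
    return matched


nodes = node_matches(text_nodes, hyp_nodes)
edges = edge_matches(text_edges, hyp_edges, nodes)
```

A strict match always replaces a weak one for the same hypothesis node, so the match type records the strongest evidence found.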
[Figure: match graph for pair 716]
t: In 1983, Aki Kaurismäki directed his first full-time feature.
h: Aki Kaurismäki directed a film.
Match types shown: grammatically related, WordNet related
Statistical Modeling
• Feature extraction on the basis of
– Syntactic, Semantic matches (of different types)
– Matching clusters’ sizes
– Ratio (matched vs. hypothesis)
– (Non-)matching modality
– RTE-task, fragmentary (parse), …
• Training/classification with WEKA tool
– Feature selection
1. Predicate Matches
2. Frame overlap
3. Matching cluster size
– Model 1: Conjunctive rule (Feat. 1,2)
– Model 2: LogitBoost (Feat. 1,2,3)
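The feature set and a Model-1-style decision can be sketched as follows, in Python. This does not call WEKA; the `PairFeatures` class, the function names and both thresholds are made-up placeholders, not the rule that was actually learned.

```python
from dataclasses import dataclass


@dataclass
class PairFeatures:
    """Hypothetical record of the three selected features for one t-h pair."""
    predicate_matches: int  # feature 1
    frame_overlap: int      # feature 2
    cluster_size: int       # feature 3: size of the largest matching cluster


def match_ratio(matched_nodes, hypothesis_nodes):
    """Share of hypothesis nodes that found a match in the text."""
    if hypothesis_nodes == 0:
        return 0.0
    return matched_nodes / hypothesis_nodes


def conjunctive_rule(f, min_predicates=2, min_frames=1):
    """Entailed only if both conditions hold (thresholds are placeholders)."""
    return f.predicate_matches >= min_predicates and f.frame_overlap >= min_frames


high_overlap = PairFeatures(predicate_matches=3, frame_overlap=2, cluster_size=4)
no_frames = PairFeatures(predicate_matches=3, frame_overlap=0, cluster_size=4)
decisions = [conjunctive_rule(high_overlap), conjunctive_rule(no_frames)]
```

A rule of this shape only rewards overlap; it has no way to reject a pair because of a specific mismatch, such as non-matching modality, unless that is added as a further feature.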
RTE 2006 Results (accuracy, %)
          all tasks   IE     IR     QA     SUM
Model 1   59.0        49.5   59.5   54.5   72.5
Model 2   57.8        48.5   58.5   57.0   67.0
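As a quick consistency check on the figures above, the all-tasks score can be recomputed from the per-task scores. This Python sketch assumes the four task subsets are equally large, so that the overall figure is the unweighted mean; the slide does not state this.

```python
# Per-task scores copied from the results above.
per_task = {
    "Model 1": {"IE": 49.5, "IR": 59.5, "QA": 54.5, "SUM": 72.5},
    "Model 2": {"IE": 48.5, "IR": 58.5, "QA": 57.0, "SUM": 67.0},
}


def unweighted_mean(scores):
    """Mean over tasks, assuming equally sized task subsets."""
    return sum(scores.values()) / len(scores)


averages = {model: unweighted_mean(s) for model, s in per_task.items()}
# Model 1 gives 59.0; Model 2 gives 57.75, reported as 57.8.
```

Both values agree with the "all tasks" column, which supports the equal-size assumption.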
• SUM (and IR) are natural tasks for Frame Semantics;
IE and QA need deeper modeling (aboutness
vs. factivity)
• Error analysis
– True positives: high semantic overlap
– True negatives: 27% involve modality mismatches
– False examples: poor modeling of dissimilarity
• Many high-frequency features measuring similarity
• Few low-frequency features measuring dissimilarity
Brief Conclusions
• Good approximation of semantic similarity
– Deep LFG syntactical analyses integrated with
– Shallow lexical Frame Semantics (plus other lex. resources)
– Match graph measuring overlap
• Need better model for semantic dissimilarity
– Too few rejections (false positives >> false negatives)
• Towards deeper modeling
– Treatment of modal contexts
– Integration of lexical inferences
– Open for collaborations
LFG + Frames for
Hypothesis (FEF)
stmt_type(f(0),declarative).
tense(f(0),past).
pred(f(0),direct).
mood(f(0),indicative).
dsubj(f(0),f(7)).
dobj(f(0),f(2)).
pred(f(2),film).
num(f(2),sg).
det_type(f(2),indef).
proper(f(7),name).
pred(f(7),'Kaurismaki').
num(f(7),sg).
mod(f(7),f(10)).
proper(f(10),name).
pred(f(10),'Aki').
num(f(10),sg).
sslink(f(0),s(41)).
sslink(f(2),s(42)).
sslink(f(7),s(45)).
sslink(f(10),s(59)).
frame(s(41),'Behind_the_scenes').
artist(s(41),s(45)).
production(s(41),s(42)).
frame(s(42),'Behind_the_scenes').
frame(s(45),'People').
person(s(45),s(59)).
person(s(45),s(45)).
ont(s(41),s(48)).
ont(s(42),s(49)).
ont(s(45),s(56)).
wn_syn(s(48),'direct#v#11').
sumo_sub(s(48),'Steering').
milo_sub(s(48),'Steering').
wn_syn(s(49),'film#n#1').
sumo_sub(s(49),'MotionPicture').
milo_sub(s(49),'MotionPicture').
sumo_syn(s(56),'Human').
sumo_syn(s(58),'Human').
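Facts in this form are easy to load into a lookup table for inspection. The Python sketch below is an assumption about how one might read them, not part of the system described; `parse_fact` and `index_facts` are hypothetical helpers, and the regular expressions only cover the shapes of fact that appear above.

```python
import re
from collections import defaultdict

# One fact per line: name(arg, arg).
FACT = re.compile(r"^(\w+)\((.*)\)\.$")
# Arguments are f(N) / s(N) node ids, quoted atoms, or bare atoms.
ARG = re.compile(r"[fs]\(\d+\)|'[^']*'|[\w#]+")


def parse_fact(line):
    """Split one fact into its name and a list of argument strings."""
    m = FACT.match(line.strip())
    if not m:
        raise ValueError(f"not a fact: {line!r}")
    name, body = m.groups()
    return name, [a.strip("'") for a in ARG.findall(body)]


def index_facts(lines):
    """Group argument tuples by fact name."""
    table = defaultdict(list)
    for line in lines:
        name, args = parse_fact(line)
        table[name].append(tuple(args))
    return table


facts = index_facts([
    "pred(f(0),direct).",
    "dobj(f(0),f(2)).",
    "pred(f(2),film).",
    "sslink(f(0),s(41)).",
    "sslink(f(2),s(42)).",
    "frame(s(41),'Behind_the_scenes').",
    "production(s(41),s(42)).",
])

# Follow the f-structure root to its semantic node, then to its frame.
semantic_node = dict(facts["sslink"])["f(0)"]
root_frame = dict(facts["frame"])[semantic_node]
```

Here the `sslink` facts connect f-structure nodes `f(N)` to semantic nodes `s(N)`, so the frame of the main predicate is reached in two dictionary lookups.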