A Confidence Model for Syntactically-Motivated Entailment Proofs
Asher Stern & Ido Dagan
ISCOL
June 2011, Israel
Recognizing Textual Entailment (RTE)
• Given a text, T, and a hypothesis, H
• Does T entail H?

Example
T: An explosion caused by gas took place at a Taba hotel.
H: A blast occurred at a hotel in Taba.
Proof Over Parse Trees
T = T0 → T1 → T2 → ... → Tn = H
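A minimal sketch of this representation (the names are illustrative, not taken from the actual system's code):

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class ProofStep:
    operation: str  # e.g., the name of an entailment rule or a tree edit
    tree: Any       # the parse tree T_i produced by applying the operation

# A proof is the chain of steps transforming T = T_0 into H = T_n.
Proof = List[ProofStep]
```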
Bar Ilan Proof System - Entailment Rules
[Slide diagram: taxonomy of entailment rules — generic syntactic rules, lexical-syntactic rules, and lexical rules; e.g., the lexical rule explosion → blast]
Bar Ilan Proof System
H: A blast occurred at a hotel in Taba.

T = An explosion caused by gas took place at a Taba hotel
→ A blast caused by gas took place at a Taba hotel
→ A blast took place at a Taba hotel
→ A blast occurred at a Taba hotel
→ A blast occurred at a hotel in Taba. = H

Each step applies an entailment rule: lexical, lexical-syntactic, or syntactic.
Tree-Edit-Distance
Insurgents attacked soldiers → Soldiers were attacked by insurgents
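As a rough illustration of plain tree edit distance (not the system described here), the open-source zss package implements the Zhang-Shasha algorithm; the toy trees below are simplified stand-ins for real parse trees:

```python
# pip install zss  (third-party Zhang-Shasha tree-edit-distance package)
from zss import Node, simple_distance

# Toy "parse trees" with words as node labels.
t = Node("attacked").addkid(Node("Insurgents")).addkid(Node("soldiers"))
h = (Node("attacked")
     .addkid(Node("Soldiers"))
     .addkid(Node("were"))
     .addkid(Node("by").addkid(Node("insurgents"))))

# Minimal number of node insertions, deletions, and relabelings
# needed to turn t into h (unit costs).
print(simple_distance(t, h))
```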
Proof over parse trees
Which steps?
• Tree edits
  – Regular or custom
• Entailment rules

How to classify?
• Decide "yes" if and only if a proof was found
  – Almost always "no"
  – Cannot handle knowledge inaccuracies
• Estimate the confidence that the proof is correct
Proof systems
TED-based:
• Estimate the cost of a proof
• Complete proofs
• Arbitrary operations
• Limited knowledge

Entailment-rule-based:
• Linguistically motivated
• Rich knowledge
• No estimation of proof correctness
• Incomplete proofs
  – Mixed systems with ad-hoc approximate-match criteria

Our system:
• The benefits of both worlds, and more:
  – Linguistically motivated complete proofs
  – Confidence model
Our Method
1. Complete proofs
   – On-the-fly operations
2. Cost model
3. Learning model parameters
On-the-Fly Operations
• "On-the-fly" operations:
  – Insert a node on the fly
  – Move a node / move a sub-tree on the fly
  – Flip part of speech
  – Etc.
• More syntactically motivated than tree edits
• Not linguistically justified, but their impact on proof correctness can be estimated by the cost model.
Cost Model
The idea:
1. Represent the proof as a feature vector (see the sketch below)
2. Use the vector in a learning algorithm
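A minimal sketch of step 1, assuming each feature simply counts how often an operation type was used in the proof (the operation names are an illustrative subset):

```python
from collections import Counter

# One feature per operation type (illustrative subset).
FEATURES = ["wordnet", "lin_similarity", "dirt",
            "insert_named_entity", "insert_content_word", "flip_pos"]

def proof_to_vector(proof_ops):
    """Map a proof, given as the list of operations it applied, to F(P)."""
    counts = Counter(proof_ops)
    return [counts[f] for f in FEATURES]  # Counter returns 0 for missing keys

print(proof_to_vector(["wordnet", "dirt", "wordnet"]))  # -> [2, 0, 1, 0, 0, 0]
```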
Cost Model
• Represent a proof as a feature vector: F(P) = (F1(P), F2(P), …, FD(P))
• Define a weight vector: w = (w1, w2, …, wD)
• Define the proof cost: C(P) = wᵀ F(P) = Σ_{i=1..D} wi Fi(P)
• Classify: decide "entails" iff C(P) ≤ b
  – b is a threshold
• Learn the parameters (w, b)
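In code, the cost and the decision rule amount to a dot product and a threshold (a sketch under the definitions above, with made-up numbers):

```python
import numpy as np

def proof_cost(w, f_p):
    """C(P) = w . F(P): the weighted sum of the proof's feature values."""
    return float(np.dot(w, f_p))

def entails(w, b, f_p):
    """Predict entailment iff the best proof found is cheap enough."""
    return proof_cost(w, f_p) <= b

w = np.array([0.2, 0.9, 0.5, 2.5, 2.0, 1.0])  # made-up weights
print(entails(w, b=1.5, f_p=np.array([2, 0, 1, 0, 0, 0])))  # cost 0.9 -> True
```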
Search Algorithm
• Need to find the "best" proof
• "Best proof" = the proof with the lowest cost
  – Assuming a weight vector is given
• The search space is exponential, so pruning is required (a search sketch follows below)
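One plausible realization is uniform-cost search with beam pruning; `expand(tree)` is a hypothetical generator yielding (step-feature-vector, new-tree) pairs for every applicable operation:

```python
import heapq

def find_best_proof(source_tree, goal_tree, expand, w, beam=50):
    """Search for the lowest-cost proof, keeping only `beam` candidates."""
    frontier = [(0.0, 0, source_tree, [])]  # (cost, tiebreaker, tree, steps)
    next_id = 1
    while frontier:
        frontier = heapq.nsmallest(beam, frontier)  # prune to the beam
        cost, _, tree, steps = heapq.heappop(frontier)
        if tree == goal_tree:
            return cost, steps  # lowest-cost proof of H from T
        for step_features, new_tree in expand(tree):
            step_cost = sum(wi * fi for wi, fi in zip(w, step_features))
            heapq.heappush(frontier, (cost + step_cost, next_id,
                                      new_tree, steps + [step_features]))
            next_id += 1
    return None  # no proof found within the beam
```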
Parameter Estimation
• Goal: find a good weight vector and threshold (w, b)
• Use a standard machine-learning algorithm (logistic regression or linear SVM)
• But: training samples are not given as feature vectors
  – The learning algorithm requires training samples (feature vectors)
  – Constructing the training samples requires a weight vector (to find the best proofs)
  – The weight vector is produced by the learning algorithm
• Solution: iterative learning
Parameter Estimation
[Slide diagram: a cycle — the learning algorithm produces a weight vector; the weight vector is used to construct training samples; the training samples feed back into the learning algorithm]
Parameter Estimation
1. Start with w0, a reasonable guess for the weight vector
2. i = 0
3. Repeat until convergence (a code sketch follows below):
   a. Find the best proofs and construct feature vectors, using wi
   b. Use a linear ML algorithm to find a new weight vector, wi+1
   c. i = i + 1
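A sketch of this loop, assuming a hypothetical `best_proof_vector(pair, w)` that runs the proof search with weights `w` and returns the cheapest proof's feature vector; scikit-learn's LogisticRegression stands in for the linear ML algorithm:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_parameters(pairs, labels, best_proof_vector, dim, max_iter=10):
    w = np.ones(dim)                 # step 1: a reasonable initial guess
    b = 0.0
    for _ in range(max_iter):        # step 3: repeat until convergence
        # 3.a: find the best proofs with the current weights, build F(P)
        X = np.array([best_proof_vector(p, w) for p in pairs])
        # 3.b: fit a linear model on the resulting samples
        clf = LogisticRegression().fit(X, labels)
        # With cost C(P) = w . F(P) and the rule "entails iff C(P) <= b",
        # the cost weights are the negated classifier coefficients.
        w_next, b = -clf.coef_[0], float(clf.intercept_[0])
        if np.allclose(w_next, w):   # converged
            break
        w = w_next
    return w, b
```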
Results
Accuracy (%):

System                                                            RTE-1  RTE-2  RTE-3  RTE-5
Logical Resolution Refutation (Raina et al., 2005)                57.0   –      –      –
Probabilistic Calculus of Tree Transformations (Harmeling, 2009)  –      56.39  57.88  –
Probabilistic Tree Edit Model (Wang and Manning, 2010)            –      63.0   61.10  –
Deterministic Entailment Proofs (Bar-Haim et al., 2007)           –      61.12  63.80  –
Our System                                                        57.13  61.63  67.13  63.50

Average number of operations per proof:

Operation                                      Avg. in positives  Avg. in negatives  Ratio (neg/pos)
Insert Named Entity                            0.006              0.016              2.67
Insert Content Word                            0.038              0.094              2.44
DIRT                                           0.013              0.023              1.73
Change "subject" to "object" and vice versa    0.025              0.040              1.60
Flip Part-of-speech                            0.098              0.101              1.03
Lin similarity                                 0.084              0.072              0.86
WordNet                                        0.064              0.052              0.81
Conclusions
1. Linguistically motivated proofs
   – Complete proofs
2. Cost model
   – Estimation of proof correctness
3. Search for the best proof
4. Learning of the parameters
5. Results
   – Reasonable behavior of the learning scheme
Thank you
Q&A