slides

Learning Adaptable Patterns for Passage Reranking
Aliaksei Severyn (1), Massimo Nicosia (1), Alessandro Moschitti (1,2)
(1) DISI, University of Trento, Italy
(2) QCRI, Qatar Foundation, Doha, Qatar
CoNLL, Sofia, 2013
Factoid QA
Q: What is Mark Twain's real name?

Factoid QA: Answer Retrieval (fast, recall-oriented: IR)
The search engine / knowledge base (KB) returns candidate answer passages:
§  Roll over, Mark Twain, because Mark McGwire is on the scene.
§  Mark Twain couldn't have put it any better.
§  Samuel Langhorne Clemens, better known as Mark Twain.

Factoid QA: Answer Passage Reranking (slower, precision-oriented: NLP/ML)
(Diagram: the retrieved passages are reordered so that the relevant one, "Samuel Langhorne Clemens, better known as Mark Twain.", moves to the top.)

Factoid QA: Answer Extraction (slow, precision-oriented: NLP/ML)
(Diagram: the exact answer is then extracted from the top-ranked passages.)

Encoding question/answer pairs
A q/a pair: <"What is Mark Twain's real name?", "Samuel Langhorne Clemens, better known as Mark Twain.">
§  Encode q/a pairs via similarity features, e.g. (0.5, 0.4, 0.3, 0.0, 0.2, …, 1.0)
   §  lexical: n-grams, Jaccard sim., etc.
   §  syntactic: dependency path, TED
   §  semantic: WN path, ESA, etc.
§  Problems: brittle representation, tedious feature engineering
(A toy sketch of the lexical features appears after the "Our approach" slide below.)

Our goal
§  Build an Answer Passage Reranking model that:
   §  requires no manual feature engineering
   §  learns robust and adaptable syntactic/shallow semantic features

Our approach
§  Model q/a pairs explicitly as linguistic structures
§  Rely on Kernel Learning to automatically extract and learn powerful syntactic patterns
(Diagram: the q/a pair is represented by its two parse trees plus a similarity feature vector, e.g. (0.5, 0.2, …, 1.0).)
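As referenced on the encoding slide above, here is a toy sketch of the lexical similarity features (n-gram Jaccard overlap between question and answer). This is my own illustration, not the authors' code, and it omits the syntactic (dependency path, TED) and semantic (WN path, ESA) similarities.

```python
# Toy lexical similarity features for a q/a pair: one Jaccard overlap score
# per n-gram order. Illustrative only; the real feature vector also contains
# syntactic and semantic similarities not reproduced here.

def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def lexical_features(question, answer, max_n=3):
    q, a = question.lower().split(), answer.lower().split()
    # one feature per n-gram order: Jaccard overlap between question and answer
    return [jaccard(ngrams(q, n), ngrams(a, n)) for n in range(1, max_n + 1)]

print(lexical_features("What is Mark Twain 's real name ?",
                       "Samuel Langhorne Clemens , better known as Mark Twain ."))
```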
Roadmap
§  Learning to rank with kernels
   §  Preference reranking with kernels
   §  Tree Kernels
§  Structural models of q/a pairs
   §  Shallow syntactic and dependency trees
§  Relational linking
   §  Naïve string matching
   §  Semantic linking
§  Experiments
Preference reranking with kernels
Pairwise reranking approach [Crammer & Singer, 2002]
§  Given a set of q/a pairs {a, b, c, d, e}, where a and c are relevant
§  encode a set of pairwise preferences, a > b, c > e, a > d, etc., via the preference kernel:

$P_K(\langle a, b \rangle, \langle c, e \rangle) = \langle a - b,\, c - e \rangle = K(a, c) - K(a, e) - K(b, c) + K(b, e)$

where the kernel between two q/a pairs sums a tree kernel over the questions, a tree kernel over the answers, and a kernel over their similarity feature vectors:

$K(a, c) = K(\langle Q_a, A_a \rangle, \langle Q_c, A_c \rangle) = K_{TK}(Q_a, Q_c) + K_{TK}(A_a, A_c) + K_{fvec}(a, c)$
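A toy numeric sketch of this preference kernel follows; the base kernel below is a plain dot product standing in for $K_{TK} + K_{TK} + K_{fvec}$, and the candidate encodings are random vectors (my illustration, not the authors' code).

```python
# Toy sketch of the preference kernel: given a base kernel K over q/a pairs,
# P_K(<a,b>, <c,e>) = K(a,c) - K(a,e) - K(b,c) + K(b,e).
import numpy as np

def base_kernel(x, y):
    # placeholder for K(a, c) = K_TK(Q_a, Q_c) + K_TK(A_a, A_c) + K_fvec(a, c)
    return float(np.dot(x, y))

def preference_kernel(pref1, pref2, K=base_kernel):
    a, b = pref1              # a is preferred over b
    c, e = pref2              # c is preferred over e
    return K(a, c) - K(a, e) - K(b, c) + K(b, e)

def preferences(relevant, irrelevant):
    # pair each relevant candidate with each irrelevant one: a > b, a > d, ...
    return [(r, i) for r in relevant for i in irrelevant]

rng = np.random.default_rng(0)
cand = rng.normal(size=(5, 4))      # toy encodings of candidates a, b, c, d, e
prefs = preferences([cand[0], cand[2]], [cand[1], cand[3], cand[4]])
print(preference_kernel(prefs[0], prefs[1]))
```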
Computing the kernel between q/a pairs
(Diagram: K_TK is computed between the two question trees and between the two answer trees, and K_fvec between the two similarity feature vectors, e.g. (0.5, 0.2, …, 1.0).)

Tree Kernels
§  Syntactic and Partial Tree Kernel (PTK) (Moschitti, 2006)
§  PTK generalizes STK (Collins and Duffy, 2002) to generate more general tree fragments
§  PTK is suitable for constituency and dependency structures
(A minimal STK-style sketch appears after the next slide.)

Structural representations of q/a pairs
§  NLP structures are rich sources of features to model STS
§  Shallow syntactic and dependency trees
§  Linking related fragments between question and answer is important:
   §  Simple lemma matching (Severyn and Moschitti, 2012)
   §  Semantic linking (this work)

Relational shallow tree (Severyn and Moschitti, 2012)
(Diagram: shallow syntactic trees of the question "What is Mark Twain's real name?" and the answer "Samuel Langhorne Clemens, better known as Mark Twain.", with fragments related by lemma matching linked.)
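As flagged on the Tree Kernels slide, here is a minimal sketch of the Syntactic Tree Kernel (STK) of Collins and Duffy (2002), the fragment-counting recursion that PTK generalizes. The Node class, decay factor and example trees are my own illustrative assumptions, not the authors' implementation.

```python
# Minimal STK sketch: K(T1, T2) sums, over all node pairs, a recursive count
# of shared tree fragments rooted at the two nodes (with decay lam).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_preterminal(self) -> bool:
        # a preterminal dominates only leaf nodes (words)
        return bool(self.children) and all(not c.children for c in self.children)

def production(n: Node):
    return (n.label, tuple(c.label for c in n.children))

def nodes(t: Node):
    yield t
    for c in t.children:
        yield from nodes(c)

def delta(n1: Node, n2: Node, lam: float = 0.4) -> float:
    """Weighted count of common fragments rooted at n1 and n2."""
    if not n1.children or not n2.children:        # leaves root no fragments
        return 0.0
    if production(n1) != production(n2):
        return 0.0
    if n1.is_preterminal():
        return lam
    score = lam
    for c1, c2 in zip(n1.children, n2.children):  # same production => same arity
        score *= 1.0 + delta(c1, c2, lam)
    return score

def stk(t1: Node, t2: Node, lam: float = 0.4) -> float:
    return sum(delta(a, b, lam) for a in nodes(t1) for b in nodes(t2))

# tiny example: two NPs sharing the fragment (NP (DT the) (NN name))
np1 = Node("NP", [Node("DT", [Node("the")]), Node("NN", [Node("name")])])
np2 = Node("NP", [Node("DT", [Node("the")]), Node("NN", [Node("name")])])
print(stk(np1, np2))   # > 0: the trees share fragments
```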
Semantic linking
(Diagram: shallow trees of the question "What is Mark Twain's real name?" and the answer "Samuel Langhorne Clemens, better known as Mark Twain.", with the question focus word and the Person named entities marked.)
§  Find the question category (QC): HUM
§  Find the focus word (FC): name
§  Find entities matching the question category in the answer passage (NER): Person
§  Link the focus word and the named entity tree fragments
(See the sketch below.)
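To make these steps concrete, here is a rough, illustrative sketch (not the authors' implementation). It assumes the question-category-to-entity-type mapping listed on the backup "Semantic Linking" slide, stubs out the classifiers and NER with toy values, and also computes the proportion of compatible entities, which reappears as the "NER relatedness" feature on the feature-vector slide.

```python
# Rough sketch of semantic linking: link the question focus word to answer
# entities whose NER type is compatible with the question category.
COMPATIBLE_NER = {
    "HUM": {"Person"},
    "LOC": {"Location"},
    "NUM": {"Date", "Time", "Money", "Percentage"},
    "ENTY": {"Organization", "Person"},
}

def semantic_links(question_category, focus_word, answer_entities):
    """Return (focus word, entity) links compatible with the category."""
    allowed = COMPATIBLE_NER.get(question_category, set())
    return [(focus_word, text) for text, ner_type in answer_entities
            if ner_type in allowed]

def ner_relatedness(question_category, answer_entities):
    """Proportion of answer entities whose type matches the question category."""
    if not answer_entities:
        return 0.0
    allowed = COMPATIBLE_NER.get(question_category, set())
    return sum(t in allowed for _, t in answer_entities) / len(answer_entities)

# toy outputs of the question classifier, focus classifier and NER tagger
qc, focus = "HUM", "name"
entities = [("Samuel Langhorne Clemens", "Person"), ("Mark Twain", "Person")]
print(semantic_links(qc, focus, entities))
print(ner_relatedness(qc, entities))
```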
Question and Focus classifiers
§  Trained with the same Tree Kernel learning technology
§  No feature engineering
§  State-of-the-art performance

Feature Vector Representation
§  Lexical
   §  Term-overlap: n-grams of lemmas, POS tags, dependency triplets
§  Syntactic
   §  Tree kernel score over shallow syntactic and dependency trees
§  QA compatibility
   §  Question category
   §  NER relatedness: proportion of NER types related to the question category

Experiments and models
Data
§  TREC QA 2002 & 2003 (824 questions)
§  Answerbag: 10k for training and 1k for testing
Baselines
§  BM25 from IR
§  CH: shallow tree [Severyn & Moschitti, 2012]
§  V: similarity feature vector model
Our approach
§  +FC+QC: semantic linking
§  +TFC+QC: typed semantic linking
Structural representations on TREC QA

Model           MAP     MRR     P@1
BM25            0.22    28.02   18.17
V               0.22    28.40   18.54
CH              0.28    35.63   24.88
CH+V            0.30    37.45   27.91
CH+V+QC+FC      0.32    39.48   29.63
DEP             0.30    37.87   28.05
DEP+V           0.30    37.64   28.05
DEP+V+QC+FC     0.31    37.49   28.93
Adaptability of relational patterns
§  Train on non-factoid (Answerbag) and test on factoid (TREC) QA
§  Answer passages in Answerbag are real answers provided by humans, while TREC answer passages just contain an answer key
§  Answerbag is much larger than TREC
§  How good are the patterns learnt on Answerbag when tested on TREC?
Cross-domain experiment: Answerbag -> TREC QA

Model           MAP     MRR     P@1
BM25            0.22    27.91   18.08
V               0.23    28.86   18.90
CH              0.24    30.25   20.42
CH+V            0.25    31.31   21.28
CH+V+QC+FC      0.27    33.53   22.81
DEP+V           0.26    33.26   22.21
DEP+V+QC+FC     0.29    34.25   23.45
Conclusions
§  Treat q/a pairs directly, encoding them into linguistic structures augmented with semantic information
§  Structural kernel technology to automatically extract and learn syntactic/semantic features
§  Semantic linking using question and focus classifiers (trained with the same tree kernel technology) and NERs
§  Learn adaptable syntactic patterns, e.g. using the Answerbag model on TREC

Thanks for your attention and welcome to our poster

Kernel Answer Passage reranker: UIMA pipeline
(Diagram components: Query, Search Engine, Candidate answers, NLP annotators, syntactic/semantic graph, Focus and Question classifiers, q/a similarity features, train/test data, Kernel-based reranker, Reranked answers, Evaluation. A toy end-to-end sketch follows.)
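Below is a compressed, purely illustrative end-to-end sketch of this pipeline; every stage is a hypothetical toy stub standing in for the corresponding UIMA component, not the authors' API.

```python
# Toy end-to-end reranking flow: retrieve candidates, extract q/a features,
# score with a (stubbed) reranker, and return the candidates best-first.
def toy_search(query):                      # Search Engine -> candidate answers
    return ["Roll over, Mark Twain, because Mark McGwire is on the scene.",
            "Samuel Langhorne Clemens, better known as Mark Twain."]

def toy_features(query, answer):            # NLP annotators + q/a similarity features
    q, a = set(query.lower().split()), set(answer.lower().split())
    return [len(q & a) / len(q | a)]        # stand-in for the full feature vector

def toy_reranker(feature_vectors):          # kernel-based reranker (stubbed)
    return [fv[0] for fv in feature_vectors]

def rerank(query):
    candidates = toy_search(query)
    feats = [toy_features(query, c) for c in candidates]
    scores = toy_reranker(feats)
    return [c for _, c in sorted(zip(scores, candidates), reverse=True)]

print(rerank("What is Mark Twain's real name?"))
```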
Semantic Linking
§  Use the Question Category (QC) and Focus Classifier (FC) to find the question category and focus word
§  Run NER on the answer passage text
§  Connect the focus word with related NERs (according to the question category) in the answer

Question Category    Named Entity types
HUM                  Person
LOC                  Location
NUM                  Date, Time, Money, Percentage
ENTY                 Organization, Person
Question Classifier
§  Tree kernel SVM multi-classifier (one-vs-all)
§  6 coarse classes from Li & Roth, 2002: ABBR, DESC, ENTY, HUM, LOC, NUM
§  Data: 5500 questions from UIUC [Li & Roth, 2002]
(A one-vs-all sketch over a precomputed kernel follows the table below.)

Dataset   UIUC    TREC test
STK       86.1    79.3
PTK       82.2    78.1
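As referenced above, here is a toy sketch of a one-vs-all SVM over a precomputed (e.g. tree) kernel, in the spirit of the question classifier; the Gram matrices and labels are synthetic stand-ins, not the UIUC/TREC data.

```python
# One-vs-all classification with precomputed kernel matrices:
# K_train[i, j] = kernel(train_i, train_j), K_test[i, j] = kernel(test_i, train_j).
import numpy as np
from sklearn.svm import SVC

classes = ["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 5))            # pretend question representations
X_test = rng.normal(size=(4, 5))
K_train = X_train @ X_train.T
K_test = X_test @ X_train.T
y_train = np.arange(30) % len(classes)        # toy labels, every class present

scores = []
for c in range(len(classes)):                 # one binary SVM per class
    svm = SVC(kernel="precomputed")
    svm.fit(K_train, (y_train == c).astype(int))
    scores.append(svm.decision_function(K_test))
pred = np.argmax(np.stack(scores, axis=1), axis=1)
print([classes[i] for i in pred])
```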
Focus classifier
§  Tree Kernel SVM classifier
§  Train:
   §  Positive examples: label the parent and grandparent nodes of the focus word with the FC tag
   §  Negative examples: label all other constituent nodes with the FC tag
§  Test:
   §  Generate a set of candidate trees, labeling the parent and grandparent nodes of each word in the tree with FC
   §  Select the tree, and thus the focus word, associated with the highest SVM score
(A candidate-generation sketch follows the accuracy table below.)

Focus classifier: generating candidates
§  Tree Kernel SVM classifier
(Diagram: candidate trees with the FC tag placed over different words; the tree marking the true focus word is a positive example (+1), the others are negative (-1).)

Accuracy of focus classifier
Question Focus datasets:
§  600 questions from SeCo-600 [Quarteroni et al., 2012]
§  250 questions from GeoQuery [Damjanovic et al., 2010]
§  2000 questions from [Bunescu & Huang, 2010]

Dataset   Mooney   SeCo-600   Bunescu
ST        73.0     90.0       89.7
STK       81.9     94.5       98.3
PTK       80.5     90.0       96.9
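As referenced above, here is a sketch of the candidate-generation step for the focus classifier: for each word, copy the question tree and mark the parent and grandparent nodes of that word with an FC tag; the tree-kernel SVM (not shown) then scores each candidate. The parse tree and the "-FC" suffix convention below are my own illustrative assumptions.

```python
# Generate one candidate tree per word by tagging its parent and grandparent
# nodes with FC; the SVM would pick the highest-scoring candidate.
import copy
from nltk.tree import Tree

question = Tree.fromstring(
    "(SBARQ (WHNP (WP What)) (SQ (VBZ is) (NP (NP (NNP Mark) (NNP Twain) (POS 's)) "
    "(JJ real) (NN name))))"
)

def candidates(tree):
    cands = []
    for leaf_pos in tree.treepositions("leaves"):
        cand = copy.deepcopy(tree)
        for anc in (leaf_pos[:-1], leaf_pos[:-2]):   # parent and grandparent
            if len(anc) > 0:                         # skip the root position
                node = cand[anc]
                node.set_label(node.label() + "-FC")
        cands.append(cand)
    return cands

for cand in candidates(question):
    print(cand)    # one candidate tree per word of the question
```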
Structural representations on TREC QA

Models                MAP     MRR      P@1
BM25                  0.22    28.02    18.17
V                     0.22    28.40    18.54
Structural representations
CH (S&M, 2012)        0.28    35.63    24.88
CH+V                  0.30†   37.45†   27.91†
DEP                   0.30†   37.87†   28.05†
DEP+V                 0.30†   37.64†   28.05†
Refined relational tag
CH+V+QC+FC            0.32‡   39.48‡   29.63‡
CH+V+QC+TFC           0.32‡   39.49‡   30.00‡
DEP+V+QC+FC           0.31‡   37.49    28.56
DEP+V+QC+TFC          0.31‡   38.05‡   28.93‡
Cross-domain experiment: Answerbag -> TREC QA

Models                MAP     MRR      P@1
BM25                  0.22    27.91    18.08
V                     0.23    28.86    18.90
Basic structural representations
CH (S&M, 2012)        0.24    30.25    20.42
CH+V                  0.25†   31.31†   21.28†
DEP+V                 0.26†   33.26†   22.21†
Refined relational tag
CH+V+QC+TFC           0.27‡   33.53‡   22.81‡
DEP+V+QC+TFC          0.29‡   34.25‡   23.45‡