
Building Structures from Classifiers for Passage Reranking

Aliaksei Severyn(1), Massimo Nicosia(1), Alessandro Moschitti(1,2)
Kindly presented by: Daniil Mirylenka(1)
(1) DISI, University of Trento, Italy
(2) QCRI, Qatar Foundation, Doha, Qatar

CIKM, 2013
Factoid QA

Q: What is Mark Twain's real name?

Factoid QA: Answer Retrieval (IR: fast, recall-oriented)

The search engine retrieves candidate answer passages from the knowledge base:
- Roll over, Mark Twain, because Mark McGwire is on the scene.
- Samuel Langhorne Clemens, better known as Mark Twain.
- Mark Twain couldn't have put it any better.

Factoid QA: Answer Passage Reranking (NLP/ML: slower, precision-oriented)

The retrieved passages are reordered so that those answering the question rank first:
- Samuel Langhorne Clemens, better known as Mark Twain.
- Mark Twain couldn't have put it any better.
- Roll over, Mark Twain, because Mark McGwire is on the scene.

Factoid QA: Answer Extraction (NLP/ML: slow, precision-oriented)

The exact answer is extracted from the top-ranked passage:
- Samuel Langhorne Clemens, better known as Mark Twain.

Encoding question/answer pairs

<What is Mark Twain's real name?, Samuel Langhorne Clemens, better known as Mark Twain.>

Encode q/a pairs via similarity features, e.g. (0.5, 0.4, 0.3, 0.0, 0.2, ..., 1.0):
- lexical: n-grams, Jaccard sim., etc.
- syntactic: dependency path, TED
- semantic: WN path, ESA, etc.

Drawbacks: brittle representation, tedious feature engineering.

Our goal

Build an Answer Passage Reranking model that:
- encodes powerful syntactic patterns relating q/a pairs
- requires no manual feature engineering

Previous work

Previous state-of-the-art systems on TREC QA build complicated feature-based models derived from:
- quasi-synchronous grammars [Wang et al., 2007]
- Tree Edit Distance (TED) [Heilman & Smith, 2010]
- a probabilistic model to learn TED transformations on dependency trees [Wang & Manning, 2010]
- CRF + TED features [Yao et al., 2013]

Our approach

- Model q/a pairs explicitly as linguistic structures
- Rely on kernel learning to automatically extract and learn powerful syntactic patterns

<<What is Mark Twain's real name?, Samuel Langhorne Clemens, better known as Mark Twain.>, (0.5, 0.2, ..., 1.0)>

Roadmap

- Learning to rank with kernels
  - Preference reranking with kernels
  - Tree kernels
- Structural models of q/a pairs
  - Structural tree representations
  - Semantic linking to relate question and answer
- Experiments
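The lexical similarity features mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the unigram-to-trigram feature layout are assumptions.

```python
# Hypothetical sketch: encode a q/a pair as a small vector of lexical
# similarity scores (Jaccard similarity over word n-grams, n = 1..3).
# Real systems add syntactic (TED, dependency path) and semantic scores.

def ngrams(tokens, n):
    """Set of word n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def similarity_features(question, answer, max_n=3):
    q, a = question.lower().split(), answer.lower().split()
    return [jaccard(ngrams(q, n), ngrams(a, n)) for n in range(1, max_n + 1)]

q = "What is Mark Twain's real name?"
a = "Samuel Langhorne Clemens, better known as Mark Twain."
print(similarity_features(q, a))
```

Because tokenization here is naive whitespace splitting, punctuation attached to words ("Twain's" vs. "Twain.") suppresses some matches; a real feature extractor would lemmatize first.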
Preference reranking with kernels

Pairwise reranking approach:
- Given a set of q/a pairs {a, b, c, d, e}, where a, c are relevant,
- encode a set of pairwise preferences, a > b, c > e, a > d, c > b, etc., via the preference kernel:

  PK(<a, b>, <c, e>) = <a - b, c - e> = K(a, c) - K(a, e) - K(b, c) + K(b, e)

where

  K(a, c) = K(<Qa, Aa>, <Qc, Ac>) = KTK(Qa, Qc) + KTK(Aa, Ac) + Kfvec(a, c)
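The preference kernel above is just an algebraic combination of four base-kernel evaluations, so it can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the base kernel here is a plain dot product over pre-computed feature vectors, whereas in the paper K also sums tree-kernel scores over question and answer trees.

```python
# Sketch of PK(<a,b>, <c,e>) = K(a,c) - K(a,e) - K(b,c) + K(b,e),
# parameterized by any base kernel K over candidate representations.

def dot(x, y):
    """Toy base kernel: dot product of two feature vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))

def preference_kernel(K, pair1, pair2):
    """pair1 = (a, b): a preferred over b; pair2 = (c, e): c preferred over e."""
    (a, b), (c, e) = pair1, pair2
    return K(a, c) - K(a, e) - K(b, c) + K(b, e)

# Toy candidates represented by similarity-feature vectors.
a, b = [0.9, 0.8], [0.1, 0.2]   # a relevant, b not
c, e = [0.8, 0.7], [0.2, 0.1]   # c relevant, e not
print(preference_kernel(dot, (a, b), (c, e)))  # positive: the preferences agree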
Computing the kernel between q/a pairs

Each candidate is a question tree, an answer tree, and a similarity-feature vector, e.g.

  a = <Qa, Aa, (0.5, 0.2, ..., 1.0)>,  c = <Qc, Ac, (0.1, 0.9, ..., 0.4)>

The kernel sums a tree kernel over the questions, a tree kernel over the answers, and a feature-vector kernel:

  K(a, c) = K(<Qa, Aa>, <Qc, Ac>) = KTK(Qa, Qc) + KTK(Aa, Ac) + Kfvec(a, c)
Tree Kernels

- Syntactic and Partial Tree Kernel (PTK) (Moschitti, 2006)
- PTK generalizes STK (Collins and Duffy, 2002) to generate more general tree fragments
- PTK is suitable for constituency and dependency structures

Structural representations of q/a pairs

- NLP structures are rich sources of features
- Shallow syntactic and dependency trees
- Linking related fragments between question and answer is important:
  - simple string matching (Severyn and Moschitti, 2012)
  - semantic linking (this work)

Relational shallow tree [Severyn & Moschitti, 2012]

<What is Mark Twain's real name?, Samuel Langhorne Clemens, better known as Mark Twain.>
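Since PTK generalizes STK, a minimal sketch of the classic Collins & Duffy subset-tree kernel gives the flavor of how tree kernels count shared fragments. The tuple encoding of trees and the helper names are illustrative assumptions; PTK's matching of partial productions is omitted.

```python
# Sketch of the subset-tree kernel (STK) of Collins & Duffy (2002).
# Trees are nested tuples (label, child, child, ...) with strings as leaves.
# stk(t1, t2) counts common tree fragments, with a decay factor lam.

def nodes(t):
    """All internal (non-leaf) nodes of a tuple-encoded tree."""
    if isinstance(t, str):
        return []
    result = [t]
    for child in t[1:]:
        result += nodes(child)
    return result

def production(t):
    """Node label plus the labels of its direct children."""
    return (t[0],) + tuple(c if isinstance(c, str) else c[0] for c in t[1:])

def delta(n1, n2, lam=1.0):
    """Number of common fragments rooted at nodes n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    if all(isinstance(c, str) for c in n1[1:]):   # pre-terminal node
        return lam
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):
            score *= 1.0 + delta(c1, c2, lam)
    return score

def stk(t1, t2, lam=1.0):
    return sum(delta(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

t1 = ("NP", ("DT", "the"), ("NN", "name"))
t2 = ("NP", ("DT", "the"), ("NN", "book"))
# 3 common fragments: [DT the], [NP DT NN], [NP [DT the] NN]
print(stk(t1, t2))
```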
Semantic linking

<What is Mark Twain's real name?, Samuel Langhorne Clemens, better known as Mark Twain.>

- Find the question category (QC): HUM
- Find the focus (FC): name
- Find entities matching the question category in the answer passage (NER: Person)
- Link the focus word and named-entity tree fragments
Question and Focus classifiers

- Trained with the same tree kernel learning technology (SVM)
- No feature engineering
- State-of-the-art performance

Feature Vector Representation

- Lexical: term overlap – n-grams of lemmas, POS tags, dependency triplets
- Syntactic: tree kernel score over shallow syntactic and dependency trees
- QA compatibility: question category; NER relatedness – proportion of NER types related to the question category

Experiments and models

Data:
- TREC QA 2002 & 2003 (824 questions)
- Public benchmark on TREC 13 [Wang et al., 2007]

Baselines:
- BM25 model from IR
- CH – shallow tree [Severyn & Moschitti, 2012]
- DEP – dependency tree
- V – similarity feature vector model

Our approach:
- +F – semantic linking

Structural representations on TREC QA
           MAP    MRR    P@1
  BM25     0.22   28.02  18.17
  V        0.22   28.40  18.54
  CH       0.28   35.63  24.88
    +V     0.30   37.45  27.91
    +V+F   0.32   39.48  29.63
  DEP      0.30   37.87  28.05
    +V     0.30   37.64  28.05
    +V+F   0.31   37.49  28.93
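The three metrics reported in the table can be computed from ranked lists of relevance labels as in the sketch below. These are hypothetical helpers, not the evaluation script used in the paper; here all three metrics are on a 0-1 scale, while the table reports MRR and P@1 as percentages.

```python
# MAP, MRR and P@1 over a list of "runs", one ranked label list per
# question (1 = the passage answers the question, 0 = it does not).

def average_precision(labels):
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(runs):
    return sum(average_precision(r) for r in runs) / len(runs)

def mean_reciprocal_rank(runs):
    def rr(labels):
        for rank, rel in enumerate(labels, start=1):
            if rel:
                return 1.0 / rank
        return 0.0
    return sum(rr(r) for r in runs) / len(runs)

def precision_at_1(runs):
    return sum(1.0 for r in runs if r and r[0]) / len(runs)

runs = [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
print(mean_average_precision(runs), mean_reciprocal_rank(runs), precision_at_1(runs))
```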
Comparing to the state of the art on TREC 13

- Manually curated test collection from TREC 13 [Wang et al., 2007]
- Used as a public benchmark to compare state-of-the-art systems on TREC QA
- Use 824 questions from TREC 2002-2003 to train and TREC 13 to test
- Use a strong Vadv feature baseline (word overlap, ESA, translation model, etc.)

                          MAP    MRR
  Wang et al., 2007       60.29  68.52
  Heilman & Smith, 2010   60.91  69.17
  Wang & Manning, 2010    59.51  69.51
  Yao et al., 2013        63.07  74.77
  Vadv                    56.27  62.94
  CH+Vadv                 66.11  74.19
    +F                    68.29  75.20
Conclusions

- Treat q/a pairs directly, encoding them into linguistic structures augmented with semantic information
- Structural kernel technology to automatically extract and learn syntactic/semantic features
- Semantic linking using question and focus classifiers (trained with the same tree kernel technology) and NERs
- State-of-the-art results on TREC 13

Thanks for your attention!

BACKUP SLIDES

Kernel Answer Passage reranker: UIMA pipeline
[Figure: pipeline. A Query goes to the Search Engine, which returns Candidate answers; NLP annotators build a syntactic/semantic graph; Focus and Question classifiers and q/a similarity features yield train/test data for the Kernel-based reranker, which produces Reranked answers for Evaluation.]
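The pipeline in the figure can be sketched as plain function composition; every component below is a toy stand-in (the names are illustrative assumptions, not the UIMA annotators or the SVM reranker).

```python
# Hypothetical end-to-end sketch: retrieve, annotate, featurize, rerank.

def rerank(query, search, annotate, featurize, score):
    candidates = search(query)                        # Search Engine
    annotated = [annotate(c) for c in candidates]     # NLP annotators
    feats = [featurize(query, c) for c in annotated]  # q/a similarity features
    ranked = sorted(zip(candidates, map(score, feats)),
                    key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in ranked]                     # Reranked answers

# Toy components: fixed retrieval results, word-overlap "model".
search = lambda q: ["Mark McGwire is on the scene",
                    "Samuel Clemens better known as Mark Twain"]
annotate = lambda c: c
featurize = lambda q, c: len(set(q.lower().split()) & set(c.lower().split()))
score = lambda f: f

print(rerank("mark twain real name", search, annotate, featurize, score))
```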
Semantic Linking

- Use the Question Category (QC) and Focus Classifier (FC) to find the question category and focus word
- Run NER on the answer passage text
- Connect the focus word with related NEs (according to the question category) in the answer

  Question Category   Named Entity types
  HUM                 Person
  LOC                 Location
  NUM                 Date, Time, Money, Percentage
  ENTY                Organization, Person
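The table above is essentially a lookup from question category to related named-entity types, which also yields the "NER relatedness" feature (proportion of answer NE types related to the question category). The sketch below uses illustrative names, not the authors' code.

```python
# Question-category -> related NE types, as in the table above.
RELATED_NE_TYPES = {
    "HUM": {"Person"},
    "LOC": {"Location"},
    "NUM": {"Date", "Time", "Money", "Percentage"},
    "ENTY": {"Organization", "Person"},
}

def ner_relatedness(question_category, answer_ne_types):
    """Fraction of NE types in the answer related to the question category."""
    if not answer_ne_types:
        return 0.0
    related = RELATED_NE_TYPES.get(question_category, set())
    return sum(1 for t in answer_ne_types if t in related) / len(answer_ne_types)

# "What is Mark Twain's real name?" -> HUM; the answer mentions two Person NEs.
print(ner_relatedness("HUM", ["Person", "Person"]))
```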
Question Classifier

- Tree kernel SVM multi-classifier (one-vs-all)
- 6 coarse classes from Li & Roth, 2002: ABBR, DESC, ENTY, HUM, LOC, NUM
- Data: 5500 questions from UIUC [Li & Roth, 2002]

  Dataset   UIUC   TREC test
  STK       86.1   79.3
  PTK       82.2   78.1
Focus classifier

- Tree kernel SVM classifier
- Train:
  - positive examples: label the parent and grandparent nodes of the focus word with the FC tag
  - negative examples: label all other constituent nodes with the FC tag
- Test:
  - generate a set of candidate trees, labeling the parent and grandparent nodes of each word in the tree with FC
  - select the tree, and thus the focus word, associated with the highest SVM score

Focus classifier: generating candidates

- The tree kernel SVM classifier scores candidate trees (+1 / -1)

Accuracy of the focus classifier

Question focus datasets:
- 600 questions from SeCo-600 [Quarteroni et al., 2012]
- 250 questions from GeoQuery [Damjanovic et al., 2010]
- 2000 questions from [Bunescu & Huang, 2010]

  Dataset   Mooney   SeCo-600   Bunescu
  ST        73.0     90.0       89.7
  STK       81.9     94.5       98.3
  PTK       80.5     90.0       96.9
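The candidate-generation step described above can be sketched as follows. For brevity this toy marks only the pre-terminal (parent) node above each word, not the grandparent as well; trees are nested tuples and all names are illustrative assumptions.

```python
# Sketch of focus-classifier candidate generation: one candidate tree per
# word, with the pre-terminal above that word relabeled with the FC tag.
# The candidate scored highest by the SVM identifies the focus word.

def mark_preterminal(tree, word):
    """Copy of the tree with the pre-terminal above `word` relabeled label-FC."""
    if isinstance(tree, str):
        return tree
    label, children = tree[0], tree[1:]
    if len(children) == 1 and children[0] == word:   # pre-terminal of the word
        return (label + "-FC", word)
    return (label,) + tuple(mark_preterminal(c, word) for c in children)

def candidates(tree, words):
    return [mark_preterminal(tree, w) for w in words]

q_tree = ("SQ", ("WP", "what"), ("VBZ", "is"),
          ("NP", ("JJ", "real"), ("NN", "name")))
for cand in candidates(q_tree, ["real", "name"]):
    print(cand)
```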