Poster

MSRA at NTCIR-10 1CLICK-2
Kazuya Narita (Tohoku University, Japan), Tetsuya Sakai, Zhicheng Dou (MSRA, China), Young-In Song (NHN Corporation, Korea)
Takeaway
▶ Automatic query attribute extraction is effective, especially for celebrity queries.
System Architecture
I. Query Type Classifier
Classify the query by heuristic rules, for example:
▷ Query length
▷ Question particles
▷ GEO clue suffixes in the middle: “市” (city), “駅” (station), “区” (ward), “町” (town) …
▷ FACILITY clue suffixes: “園” (park), “学校” (school), “病院” (hospital) …
▷ NE tagger
▷ Clue words in wiki pages: “小説家” (novelist), “選手” (player), “女優” (actress) …
▷ Celebrity dictionary (built from Wikipedia Infobox)
▷ Else
Query Type: ARTIST, ACTOR, POLITICIAN, ATHLETE, FACILITY, GEO, DEFINITION, QA
II. Attribute Extractor
Input: Query
Approach: find query attributes in Web search results (the sentences also come from Web search results)
Examples:
▷ “Ichiro” has attributes Born, Height
▷ “Future Science Museum” has attributes Address
III. Sentence Ranker
Rank sentences by heuristic rules:
▷ Content similarity: cosine similarity between sentences (s1, s2, s3, s4), scored with the LexRank algorithm (similar to other sentences = high score)
▷ Query attributes: weight for each attribute as importance, like TF-IDF (based on classifier results)
▷ Manual clue words and URLs: “birthday” is more relevant for celebrity queries; “amazon.co.jp” is less relevant
▷ Search rank: linear score with search rank (plot: score on the y-axis from 1 at search rank 1 down to 0, search rank on the x-axis)
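Two of the ranking signals are specified well enough to sketch: LexRank over cosine similarity, and the linear search-rank score. The following is a minimal illustration, not the authors' implementation. The function names, the whitespace tokenization (Japanese text would need a morphological analyzer), the damping factor, the assumption that the linear score reaches 0 at the last search rank, and the equal weights used to combine the two scores are all assumptions. The attribute weights and clue words are omitted because the poster does not give their values.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank(sentences, damping=0.85, iterations=50):
    """Continuous LexRank via power iteration on the row-normalised
    cosine-similarity graph: sentences similar to many others score high."""
    bags = [Counter(s.lower().split()) for s in sentences]  # assumed tokenizer
    n = len(bags)
    if n == 0:
        return []
    sim = [[cosine(a, b) for b in bags] for a in bags]
    row_sums = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                scores[j] * sim[j][i] / row_sums[j] for j in range(n) if row_sums[j]
            )
            for i in range(n)
        ]
    return scores


def rank_score(rank, num_results):
    """Linear score: 1 at search rank 1, assumed to fall to 0 at the last rank."""
    if num_results <= 1:
        return 1.0
    return 1.0 - (rank - 1) / (num_results - 1)


def rank_sentences(sentences, search_ranks, num_results, w_sim=0.5, w_rank=0.5):
    """Sort sentences by a weighted sum of both scores (weights are assumptions)."""
    lex = lexrank(sentences)
    top = max(lex) if lex else 1.0
    scored = [
        (w_sim * l / top + w_rank * rank_score(r, num_results), s)
        for l, r, s in zip(lex, search_ranks, sentences)
    ]
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]


sentences = [
    "ichiro plays baseball",
    "ichiro plays baseball in seattle",
    "seattle weather today",
]
ranked = rank_sentences(sentences, search_ranks=[3, 1, 2], num_results=3)
print(ranked[0])
```

In this toy input the second sentence overlaps with both of the others and comes from the top search result, so it is ranked first.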
Ranked Sentences → Post-Processor → Output: X-string

Attribute extraction example (from Web search results):
▷ Further attributes in the examples: Position (“Ichiro”); Opening Hours, Tel (“Future Science Museum”)
▷ Extract tables or text containing “:”

Born      October 22, 1973 (age 38)
Height    5 ft 11 in (1.80m)
Position  Outfielder

Extract attributes:
Born: October 22, 1973 (age 38)
Height: 5 ft 11 in (1.80m)
Position: Outfielder
Extracted attribute names: Height, Position, Born
Extractor output: Sentences, Attributes
IV. Post-Processor
Diversify the ranked sentences (sentence1, sentence2, sentence3, …, sentence n): compare each candidate with the already selected sentences (MMR algorithm).
Similar sentences are eliminated.
X-string: sentence1, sentence3, …
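The diversification step can be sketched with a standard greedy MMR (maximal marginal relevance) loop. This is a minimal illustration under stated assumptions: the poster names MMR but gives no parameters, so the trade-off value `lam`, the cut-off `k`, the Jaccard word-overlap similarity, and the function names are all hypothetical choices.

```python
def jaccard(a, b):
    """Word-overlap similarity (assumed; any sentence similarity would do)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def mmr_select(ranked, k=3, lam=0.5, sim=jaccard):
    """Greedy MMR over (sentence, relevance) pairs.

    Each step picks the candidate maximising
    lam * relevance - (1 - lam) * max similarity to already selected sentences.
    """
    selected = []
    candidates = list(ranked)

    def mmr_score(item):
        sentence, relevance = item
        redundancy = max((sim(sentence, s) for s, _ in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return [sentence for sentence, _ in selected]


ranked = [
    ("Ichiro was born on October 22, 1973", 1.00),
    ("Ichiro was born October 22, 1973", 0.95),
    ("Ichiro is an outfielder", 0.60),
]
print(mmr_select(ranked, k=2))
```

With these toy scores the near-duplicate second sentence is skipped in favour of the third, mirroring the "sentence1, sentence3, …" output in the diagram.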
Evaluation Results and Discussion
▶ Query classification accuracy: 0.83
▷ FACILITY: insufficient suffixes and other rules
▶ Query attributes are effective, especially for celebrity queries
▷ E.g. ATHLETE query “イチロー” (Ichiro; professional baseball player): effective attributes “所属” (affiliation) and “打率” (batting average)
▶ DEFINITION and QA: attributes do not work well (by the nature of these types)

[Figure: bar chart comparing three settings (with attributes, without attributes, without query type) for Overall, ARTIST, ACTOR, POLITICIAN, ATHLETE, FACILITY, GEO, DEFINITION and QA; axis ticks from 0 to 0.3]

Query type confusion matrix (rows: gold, columns: system):
gold \ sys    ART  ACT  POL  ATH  FAC  GEO  DEF   QA
ARTIST          9    1    0    0    0    0    0    0
ACTOR           1    9    0    0    0    0    0    0
POLITICIAN      1    0    8    0    0    0    1    0
ATHLETE         0    0    0   10    0    0    0    0
FACILITY        1    0    0    0    7    1    5    1
GEO             1    0    0    0    0   14    0    0
DEFINITION      0    0    0    1    1    1   12    0
QA              0    0    0    0    0    0    1   14
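The reported classification accuracy can be cross-checked from the confusion matrix (rows are gold types, columns are system types). The sketch below copies the counts from the poster; the variable and function names are illustrative, and per-type recall is an added diagnostic, not a figure reported by the authors.

```python
TYPES = ["ARTIST", "ACTOR", "POLITICIAN", "ATHLETE",
         "FACILITY", "GEO", "DEFINITION", "QA"]

# Rows: gold type, columns: system type, in the order of TYPES.
CONFUSION = [
    [9, 1, 0, 0, 0, 0, 0, 0],
    [1, 9, 0, 0, 0, 0, 0, 0],
    [1, 0, 8, 0, 0, 0, 1, 0],
    [0, 0, 0, 10, 0, 0, 0, 0],
    [1, 0, 0, 0, 7, 1, 5, 1],
    [1, 0, 0, 0, 0, 14, 0, 0],
    [0, 0, 0, 1, 1, 1, 12, 0],
    [0, 0, 0, 0, 0, 0, 1, 14],
]


def accuracy(matrix):
    """Correctly classified queries (diagonal) over all queries."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_type(matrix, types):
    """Fraction of each gold type that the system labelled correctly."""
    return {t: matrix[i][i] / sum(matrix[i]) for i, t in enumerate(types)}


print(f"accuracy = {accuracy(CONFUSION):.2f}")
recalls = recall_per_type(CONFUSION, TYPES)
print(min(recalls, key=recalls.get))
```

The diagonal sums to 83 of 100 queries, which matches the reported 0.83, and FACILITY has the lowest recall (7 of 15), consistent with the note about insufficient FACILITY rules.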
Bar chart y-axis: mean S#-measure
Conclusion and Future work
▶ Automatic query attribute extraction is effective, especially for celebrity queries.
▶ Future work: automatic extraction of clue words for query classification; a new framework instead of attributes for DEFINITION and QA