
Semantic Analysis in Language Technology http://stp.lingfil.uu.se/~santinim/sais/2016/sais_2016.htm
Question Answering
Marina Santini
[email protected]
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2016

Previous Lecture: IE – Named Entity Recognition (NER)

Named Entity Recognition (NER)
•  A very important sub-task: find and classify names in text, for example:
•  The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply.
•  Entity classes: Person, Date, Location, Organization, etc.

NER pipeline
[Figure: NER pipeline. Representative documents are human-annotated; the annotated documents become training data; feature extraction feeds sequence classifiers, which yield the NER system.]

Encoding classes for sequence labeling
IO encoding vs. IOB encoding:

Token      IO     IOB
Fred       PER    B-PER
showed     O      O
Sue        PER    B-PER
Mengqiu    PER    B-PER
Huang      PER    I-PER
's         O      O
new        O      O
painting   O      O
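A minimal sketch (not from the slides) of how entity spans map to IOB tags; the helper name iob_tags is my own:

    def iob_tags(tokens, entities):
        """Map (start, end, type) entity spans to one IOB tag per token.

        Unlike plain IO tagging, IOB keeps adjacent entities distinct:
        'Sue' and 'Mengqiu Huang' both begin with B-PER.
        """
        tags = ["O"] * len(tokens)
        for start, end, etype in entities:      # end is exclusive
            tags[start] = "B-" + etype
            for i in range(start + 1, end):
                tags[i] = "I-" + etype
        return tags

    tokens = ["Fred", "showed", "Sue", "Mengqiu", "Huang", "'s", "new", "painting"]
    spans = [(0, 1, "PER"), (2, 3, "PER"), (3, 5, "PER")]
    print(list(zip(tokens, iob_tags(tokens, spans))))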
Features for sequence labeling
•  Words
   •  Current word (essentially like a learned dictionary)
   •  Previous/next word (context)
•  Other kinds of inferred linguistic classification
   •  Part-of-speech tags
•  Other features
   •  Word shapes
   •  etc.

Features: Word shapes
•  Word shapes
•  Map words to a simplified representation that encodes attributes such as length, capitalization, numerals, Greek letters, internal punctuation, etc.
•  Varicella zoster is a virus
•  Messenger RNA (mRNA) is a large family of RNA molecules
•  CPA1 (Carboxypeptidase A1 (Pancreatic)) is a protein-coding gene.

Word              Shape
Varicella-zoster  Xx-xxx
mRNA              xXXX
CPA1              XXXd
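A minimal word-shape mapper as a sketch (mine, not the slides'); the exact truncation of long character-class runs varies between implementations, so collapsed shapes may differ slightly from the slide's abbreviated forms:

    import re

    def word_shape(word, max_run=3):
        """Map characters to classes: X (upper), x (lower), d (digit);
        punctuation is kept as-is. Runs longer than max_run are truncated."""
        shape = []
        for ch in word:
            if ch.isupper():
                shape.append("X")
            elif ch.islower():
                shape.append("x")
            elif ch.isdigit():
                shape.append("d")
            else:
                shape.append(ch)
        # collapse long runs of the same class (a common variant)
        return re.sub(r"(.)\1{%d,}" % max_run,
                      lambda m: m.group(1) * max_run, "".join(shape))

    print(word_shape("mRNA"))              # xXXX
    print(word_shape("CPA1"))              # XXXd
    print(word_shape("Varicella-zoster"))  # Xxxx-xxx (the slide abbreviates to Xx-xxx)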
Inspiration figure
Task: Develop a set of regular expressions to recognize the character shape features.
•  Possible set of REs matching the inspiration figure (syntax depends on the programming language).
•  No need to remember things by heart: once you know what you have to do, find the correct syntax on the web!

The gold standard corpus
There are always many solutions to a research question! You had to make your choice…
Basic steps:
1.  Analyse the data (you must know your data well!!!)
2.  Get an idea of the patterns
3.  Choose the way to go…
4.  Report your results

Proposed solutions
•  (Xx*)* regardless of the NE type
•  Complex patterns that could identify approx. 900 lines out of 1316 entities (regardless of NE type)
•  etc.

Some alternatives: create patterns per NE type… (divide-and-conquer approach)
Ex: person names (283): most person names have the shape (Xx*){2}, so presumably you would get high accuracy (see the sketch after this sample):

Miles Sindercombe            p:person
Armand de Pontmartin         p:person
Alicia Gorey                 p:person
Kim Crosby (singer)          p:person
Edmond Roudnitska            p:person
Shobha Gurtu                 p:person
Bert Greene                  p:person
Danica McKellar              p:person
Sheila O'Brien               p:person
Martin Day                   p:person
Clive Matthew-Wilson         p:person
Venugopal Dhoot              p:person
Clifford Berry               p:person
Munir Malik                  p:person
Mary Sears                   p:person
Charles Wayne "Chuck" Day    p:person
Michael Formanek             p:person
Felix Carlebach              p:person
Alexander Keith, Jr.         p:person
Omer Vanaudenhove            p:person
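As an illustration (my sketch, not the proposed solution), the (Xx*){2} shape as a Python regular expression; note that it misses names with particles, internal punctuation, or more than two tokens, like several entries above:

    import re

    # Two capitalized words: the (Xx*){2} shape from the slide.
    TWO_WORD_NAME = re.compile(r"^[A-Z][a-z]*\s[A-Z][a-z]*$")

    for name in ["Miles Sindercombe", "Armand de Pontmartin", "Sheila O'Brien"]:
        print(name, "->", bool(TWO_WORD_NAME.match(name)))
    # Only "Miles Sindercombe" matches; the other two need richer patterns.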
What's the mathematical formalism underlying REs?

DFA

Converting the regular expression (a|b)* to a DFA

Converting the regular expression (a*|b*)* to a DFA

Converting the regular expression ab(a|b)* to a DFA

Chomsky hierarchy
•  Regular expressions help solve problems that are tractable by "regular grammars".
•  For example, it is not possible to write an FSM (and consequently regular expressions) that generates the language a^n b^n, i.e. the set of all strings which consist of a (possibly empty) block of a's followed by a block of b's of exactly the same length.
•  Areas where finite state methods have been shown to be particularly useful in NLP are phonological and morphological processing.
•  In our case, we must explore and experiment with the NE corpus and see if there are sequences that cannot be captured by a regular language.
•  For some problems, the expressive power of REs is exactly what is needed.
•  For some other problems, the expressive power of REs is too weak…
•  Additionally, since REs are basically hand-written rules, it is easy to get entangled with rules… at some point you no longer know how the rules interact with each other, so results might be unpredictable.

End of previous lecture
Question Answering
What is Question Answering?

Acknowledgements
Most slides borrowed or adapted from: Dan Jurafsky and Christopher Manning, Coursera; Dan Jurafsky and James H. Martin (2015, draft): https://web.stanford.edu/~jurafsky/slp3/

Question Answering
•  One of the oldest NLP tasks (punched card systems in 1961)

Question:
What do worms eat?
Dependencies: worms ← eat → what

Potential answers:
•  Worms eat grass: worms ← eat → grass
•  Birds eat worms: birds ← eat → worms
•  Horses with worms eat grass: horses ← eat → grass (horses ← with → worms)
•  Grass is eaten by worms: worms ← eat → grass

An answer is accepted when its dependency structure matches the question's (worms ← eat → ?), so "Worms eat grass" and "Grass is eaten by worms" answer the question, while "Birds eat worms" and "Horses with worms eat grass" do not (worms is not the subject of eat).

Simmons, Klein, McConlogue. 1964. Indexing and Dependency Logic for Answering English Questions. American Documentation 15:30, 196-204.
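A toy sketch of the idea (mine, not Simmons'): index sentences by dependency triples and answer by matching the question's triple with a wildcard object:

    # Toy dependency triples: (subject, verb, object).
    CORPUS = {
        "Worms eat grass":             ("worms", "eat", "grass"),
        "Birds eat worms":             ("birds", "eat", "worms"),
        "Horses with worms eat grass": ("horses", "eat", "grass"),
        "Grass is eaten by worms":     ("worms", "eat", "grass"),  # passive normalized
    }

    def answer(question_triple):
        """Match ('worms', 'eat', None) against the corpus; None is the wildcard."""
        subj, verb, obj = question_triple
        for sentence, (s, v, o) in CORPUS.items():
            if s == subj and v == verb and (obj is None or o == obj):
                yield sentence, o

    for sentence, obj in answer(("worms", "eat", None)):
        print(sentence, "->", obj)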
Question Answering: IBM's Watson
•  Won Jeopardy! on February 16, 2011!
•  IBM's Watson is a Question Answering system.
•  What is Jeopardy?

Jeopardy!
•  Jeopardy! is an American television quiz competition in which contestants are presented with general-knowledge clues in the form of answers, and must phrase their responses in the form of questions.
•  The original daytime version debuted on NBC on March 30, 1964.

Watson's performance
•  With the answer: "You just need a nap. You don't have this sleep disorder that can make sufferers nod off while standing up," Watson replied, "What is narcolepsy?"

Question Answering: IBM's Watson
•  The winning reply!
•  Clue: WILLIAM WILKINSON'S "AN ACCOUNT OF THE PRINCIPALITIES OF WALLACHIA AND MOLDAVIA" INSPIRED THIS AUTHOR'S MOST FAMOUS NOVEL
•  Watson: Bram Stoker

Apple's Siri
[Screenshot]

Wolfram Alpha
[Screenshot]

Types of Questions in Modern Systems
•  Factoid questions:
   •  Who wrote "The Universal Declaration of Human Rights"?
   •  How many calories are there in two slices of apple pie?
   •  What is the average age of the onset of autism?
   •  Where is Apple Computer based?
•  Complex (narrative) questions:
   •  In children with an acute febrile illness, what is the efficacy of acetaminophen in reducing fever?
   •  What do scholars think about Jefferson's position on dealing with pirates?

Commercial systems: mainly factoid questions

Where is the Louvre Museum located?                     In Paris, France
What's the abbreviation for limited partnership?        L.P.
What are the names of Odin's ravens?                    Huginn and Muninn
What currency is used in China?                         The yuan
What kind of nuts are used in marzipan?                 almonds
What instrument does Max Roach play?                    drums
What is the telephone number for Stanford University?   650-723-2300

Paradigms for QA
•  IR-based approaches
   •  TREC; IBM Watson; Google
•  Knowledge-based
   •  Apple Siri; Wolfram Alpha
•  Hybrid approaches
   •  IBM Watson; True Knowledge Evi

Many questions can already be answered by web search
[Screenshot]

IR-based Question Answering
[Screenshot]

Things change all the time….
•  Google was a pure IR-based QA system, but in 2012 the Knowledge Graph was added to Google's search engine.
•  The Knowledge Graph is a knowledge base used by Google to enhance its search engine's search results with semantic-search information gathered from a wide variety of sources.
•  Wikipedia: "The goal of the Knowledge Graph is that users would be able to use this information to resolve their query without having to navigate to other sites and assemble the information themselves. [...] According to some news websites, the implementation of Google's Knowledge Graph has played a role in the page-view decline of various language versions of Wikipedia."

IR-based Factoid QA
[Figure: IR-based factoid QA pipeline. Question Processing (Query Formulation, Answer Type Detection) turns the question into a query; Document Retrieval runs the query against the indexed document collection; Passage Retrieval segments the relevant documents and ranks passages; Answer Processing extracts and ranks the answer.]
IR-based Factoid QA
•  QUESTION PROCESSING
   •  Detect question type, answer type, focus, relations
   •  Formulate queries to send to a search engine
•  PASSAGE RETRIEVAL
   •  Retrieve ranked documents
   •  Break into suitable passages and rerank
•  ANSWER PROCESSING
   •  Extract candidate answers
   •  Rank candidates using evidence from the text and external sources
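To make the three stages concrete, here is a toy end-to-end sketch (my illustration, not from the slides; the two-sentence corpus, stop list, and capitalization-based candidate extraction are stand-ins for real indexing and NER):

    import re

    CORPUS = [
        "The Louvre Museum is located in Paris.",
        "Max Roach played the drums for decades.",
    ]
    STOP = {"the", "is", "in", "of", "a", "where", "who", "what", "does"}

    def detect_answer_type(question):
        q = question.lower()
        if q.startswith("who"):
            return "PERSON"
        if q.startswith("where"):
            return "LOCATION"
        return "OTHER"

    def formulate_query(question):
        return [w for w in re.findall(r"\w+", question.lower()) if w not in STOP]

    def retrieve_passages(query):
        # rank sentences (our "passages") by query-word overlap
        scored = [(sum(w in p.lower() for w in query), p) for p in CORPUS]
        return [p for s, p in sorted(scored, reverse=True) if s > 0]

    def extract_candidate(passage, query):
        # stand-in for NER: first capitalized word that is not a query/stop word
        for w in re.findall(r"[A-Z][a-z]+", passage):
            if w.lower() not in query and w.lower() not in STOP:
                return w

    def answer_question(question):
        atype = detect_answer_type(question)   # would select the right NER tagger
        query = formulate_query(question)
        for passage in retrieve_passages(query):
            candidate = extract_candidate(passage, query)
            if candidate:
                return candidate, atype

    print(answer_question("Where is the Louvre Museum located?"))  # ('Paris', 'LOCATION')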
Knowledge-based approaches (Siri)
•  Build a semantic representation of the query
   •  Times, dates, locations, entities, numeric quantities
•  Map from this semantics to query structured data or resources
   •  Geospatial databases
   •  Ontologies (Wikipedia infoboxes, dbPedia, WordNet, Yago)
   •  Restaurant review sources and reservation services
   •  Scientific databases

SIRI's main tasks, at a high level, involve:
1.  Using ASR (automatic speech recognition) to transcribe human speech (in this case, short utterances of commands, questions, or dictations) into text.
2.  Using natural language processing (part-of-speech tagging, noun-phrase chunking, dependency and constituent parsing) to translate transcribed text into "parsed text".
3.  Using question and intent analysis to analyze parsed text, detecting user commands and actions. ("Schedule a meeting", "Set my alarm", ...)
4.  Using data technologies to interface with 3rd-party web services such as OpenTable and WolframAlpha to perform actions, search operations, and question answering.
5.  Forwarding utterances that SIRI has identified as questions it cannot directly answer to more general question-answering services such as WolframAlpha.
6.  Transforming the output of 3rd-party web services back into natural language text (e.g., today's weather report -> "The weather will be sunny").
7.  Using TTS (text-to-speech) technologies to transform the natural language text from the previous step into synthesized speech.

Hybrid approaches (IBM Watson)
•  Build a shallow semantic representation of the query
•  Generate answer candidates using IR methods
   •  Augmented with ontologies and semi-structured data
•  Score each candidate using richer knowledge sources
   •  Geospatial databases
   •  Temporal reasoning
   •  Taxonomical classification
Question Answering
Answer Types and Query Formulation

Factoid Q/A
[Figure: the IR-based factoid QA pipeline again (Question Processing, Document Retrieval, Passage Retrieval, Answer Processing).]
Question Processing
Things to extract from the question:
•  Answer Type Detection
   •  Decide the named entity type (person, place) of the answer
•  Query Formulation
   •  Choose query keywords for the IR system
•  Question Type classification
   •  Is this a definition question, a math question, a list question?
•  Focus Detection
   •  Find the question words that are replaced by the answer
•  Relation Extraction
   •  Find relations between entities in the question

Question Processing
They're the two states you could be reentering if you're crossing Florida's northern border
•  Answer Type: US state
•  Query: two states, border, Florida, north
•  Focus: the two states
•  Relations: borders(Florida, ?x, north)
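The same analysis written down as a data structure (an illustrative sketch; the field names are my own):

    # Question analysis for the Jeopardy clue above.
    analysis = {
        "question": "They're the two states you could be reentering "
                    "if you're crossing Florida's northern border",
        "answer_type": "US_STATE",
        "query": ["two", "states", "border", "Florida", "north"],
        "focus": "the two states",
        "relations": [("borders", "Florida", "?x", "north")],
    }
    print(analysis["relations"])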
Answer Type Detection: Named Entities
•  Who founded Virgin Airlines? PERSON
•  What Canadian city has the largest population? CITY

Answer Type Taxonomy
Xin Li, Dan Roth. 2002. Learning Question Classifiers. COLING'02
•  6 coarse classes
   •  ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
•  50 finer classes
   •  LOCATION: city, country, mountain…
   •  HUMAN: group, individual, title, description
   •  ENTITY: animal, body, color, currency…

Part of Li & Roth's Answer Type Taxonomy
•  ABBREVIATION: abbreviation, expression
•  DESCRIPTION: definition, reason
•  ENTITY: animal, food, currency
•  HUMAN: individual, title, group
•  LOCATION: city, country, state
•  NUMERIC: date, money, percent, distance, size
Answer Types
More Answer Types
[Screenshots of example answer types]

Answer types in Jeopardy
Ferrucci et al. 2010. Building Watson: An Overview of the DeepQA Project. AI Magazine. Fall 2010. 59-79.
•  2500 answer types in a 20,000-question Jeopardy sample
•  The most frequent 200 answer types cover < 50% of the data
•  The 40 most frequent Jeopardy answer types: he, country, city, man, film, state, she, author, group, here, company, president, capital, star, novel, character, woman, river, island, king, song, part, series, sport, singer, actor, play, team, show, actress, animal, presidential, composer, musical, nation, book, title, leader, game

Answer Type Detection
•  Hand-written rules
•  Machine Learning
•  Hybrids

Answer Type Detection
•  Regular expression-based rules can get some cases (see the sketch below):
   •  Who {is|was|are|were} PERSON
   •  PERSON (YEAR – YEAR)
•  Other rules use the question headword (the headword of the first noun phrase after the wh-word):
   •  Which city in China has the largest number of foreign financial companies?
   •  What is the state flower of California?

Answer Type Detection
•  Most often, we treat the problem as machine learning classification
   •  Define a taxonomy of question types
   •  Annotate training data for each question type
   •  Train classifiers for each question class using a rich set of features
   •  Features include those hand-written rules!
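A minimal sketch of such hand-written rules (illustrative; the patterns and type names are simplified, not from the slides):

    import re

    # (pattern, answer type) rules, tried in order.
    RULES = [
        (re.compile(r"^who\s+(is|was|are|were)\b", re.I), "PERSON"),
        (re.compile(r"^where\b", re.I),                   "LOCATION"),
        (re.compile(r"^(what|which)\s+city\b", re.I),     "CITY"),
        (re.compile(r"^how\s+(tall|high|far)\b", re.I),   "LENGTH"),
    ]

    def detect_answer_type(question):
        for pattern, answer_type in RULES:
            if pattern.search(question):
                return answer_type
        return "UNKNOWN"   # fall through to an ML classifier in a real system

    print(detect_answer_type("Who was Queen Victoria's second son?"))  # PERSON
    print(detect_answer_type("How tall is Mt. Everest?"))              # LENGTH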
Features for Answer Type Detection
•  Question words and phrases
•  Part-of-speech tags
•  Parse features (headwords)
•  Named Entities
•  Semantically related words

Factoid Q/A
[Figure: the IR-based factoid QA pipeline again (Question Processing, Document Retrieval, Passage Retrieval, Answer Processing).]
Keyword Selection Algorithm
Dan Moldovan, Sanda Harabagiu, Marius Pasca, Rada Mihalcea, Richard Goodrum, Roxana Girju and Vasile Rus. 1999. Proceedings of TREC-8.
1.  Select all non-stop words in quotations
2.  Select all NNP words in recognized named entities
3.  Select all complex nominals with their adjectival modifiers
4.  Select all other complex nominals
5.  Select all nouns with their adjectival modifiers
6.  Select all other nouns
7.  Select all verbs
8.  Select all adverbs
9.  Select the QFW word (skipped in all previous steps)
10. Select all other words

Choosing keywords from the query
(Slide from Mihai Surdeanu)
Who coined the term "cyberspace" in his novel "Neuromancer"?
Selected keywords, by rule: cyberspace/1, Neuromancer/1, term/4, novel/4, coined/7
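A rough sketch of the tiered selection (mine; it fakes the linguistic analysis with quoted-string and capitalization heuristics, and hard-wires the noun and verb tiers that would come from a real POS tagger):

    import re

    def keyword_tiers(question):
        """Assign Moldovan-style priority tiers to query words (toy version)."""
        tiers = {}
        for phrase in re.findall(r'"([^"]+)"', question):
            for w in phrase.split():
                tiers.setdefault(w, 1)          # rule 1: non-stop words in quotes
        for w in re.findall(r"\b[A-Z][a-z]+\b", question):
            tiers.setdefault(w, 2)              # rule 2: crude named-entity proxy
        for w in ["term", "novel"]:             # rules 4-6 need a POS tagger;
            tiers.setdefault(w, 4)              # hard-wired here for the example
        for w in ["coined"]:
            tiers.setdefault(w, 7)              # rule 7: verbs
        return tiers

    q = 'Who coined the term "cyberspace" in his novel "Neuromancer"?'
    print(keyword_tiers(q))
    # {'cyberspace': 1, 'Neuromancer': 1, 'Who': 2, 'term': 4, 'novel': 4, 'coined': 7}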
Question Answering
Passage Retrieval and Answer Extraction

Factoid Q/A
[Figure: the IR-based factoid QA pipeline again (Question Processing, Document Retrieval, Passage Retrieval, Answer Processing).]
Passage Retrieval
•  Step 1: IR engine retrieves documents using query terms
•  Step 2: Segment the documents into shorter units (something like paragraphs)
•  Step 3: Passage ranking
   •  Use answer type to help rerank passages

Features for Passage Ranking
Either in rule-based classifiers or with supervised machine learning (a toy scorer follows the list):
•  Number of Named Entities of the right type in the passage
•  Number of query words in the passage
•  Number of question N-grams also in the passage
•  Proximity of query keywords to each other in the passage
•  Longest sequence of question words
•  Rank of the document containing the passage
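A toy version of such a feature-based scorer (the weights and the feature subset are illustrative, not from the slides):

    def passage_features(passage, query_words):
        tokens = passage.lower().split()
        overlap = sum(w in tokens for w in query_words)
        # longest run of consecutive query words in the passage
        longest = run = 0
        for t in tokens:
            run = run + 1 if t in query_words else 0
            longest = max(longest, run)
        return {"overlap": overlap, "longest_seq": longest}

    def score(passage, query_words,
              weights={"overlap": 1.0, "longest_seq": 2.0}):
        feats = passage_features(passage, query_words)
        return sum(weights[name] * value for name, value in feats.items())

    q = ["height", "mount", "everest"]
    print(score("the official height of mount everest is 29035 feet", q))  # 7.0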
Factoid Q/A
[Figure: the IR-based factoid QA pipeline again (Question Processing, Document Retrieval, Passage Retrieval, Answer Processing).]
Answer Extraction
•  Run an answer-type named-entity tagger on the passages
   •  Each answer type requires a named-entity tagger that detects it
   •  If the answer type is CITY, the tagger has to tag CITY
   •  Can be full NER, simple regular expressions (see the sketch below), or a hybrid
•  Return the string with the right type:
   •  Who is the prime minister of India? (PERSON)
      "Manmohan Singh, Prime Minister of India, had told left leaders that the deal would not be renegotiated."
   •  How tall is Mt. Everest? (LENGTH)
      "The official height of Mount Everest is 29035 feet."
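For instance, a LENGTH tagger can be as simple as a regular expression (an illustrative sketch, not a full tagger):

    import re

    # number followed by a length unit; deliberately minimal
    LENGTH = re.compile(
        r"\b(\d[\d,]*(?:\.\d+)?)\s*(feet|foot|ft|meters?|metres?|miles?|km)\b",
        re.I)

    passage = "The official height of Mount Everest is 29035 feet."
    match = LENGTH.search(passage)
    if match:
        print(match.group(0))   # 29035 feet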
Ranking Candidate Answers
•  But what if there are multiple candidate answers?
   Q: Who was Queen Victoria's second son?
•  Answer Type: Person
•  Passage: "The Marie biscuit is named after Marie Alexandrovna, the daughter of Czar Alexander II of Russia and wife of Alfred, the second son of Queen Victoria and Prince Albert"
•  (Apposition is a grammatical construction in which two elements, normally noun phrases, are placed side by side, with one element serving to identify the other in a different way.)

Use machine learning: Features for ranking candidate answers
•  Answer type match: Candidate contains a phrase with the correct answer type.
•  Pattern match: A regular expression pattern matches the candidate.
•  Question keywords: Number of question keywords in the candidate.
•  Keyword distance: Distance in words between the candidate and query keywords.
•  Novelty factor: A word in the candidate is not in the query.
•  Apposition features: The candidate is an appositive to question terms.
•  Punctuation location: The candidate is immediately followed by a comma, period, quotation marks, semicolon, or exclamation mark.
•  Sequences of question terms: The length of the longest sequence of question terms that occurs in the candidate answer.

Candidate Answer scoring in IBM Watson
•  Each candidate answer gets scores from >50 components
   •  (from unstructured text, semi-structured text, triple stores)
•  logical form (parse) match between question and candidate
•  passage source reliability
•  geospatial location
   •  California is "southwest of Montana"
•  temporal relationships
•  taxonomic classification

Common Evaluation Metrics
1.  Accuracy (does the answer match the gold-labeled answer?)
2.  Mean Reciprocal Rank
   •  For each query return a ranked list of M candidate answers.
   •  Its score is 1/rank of the first right answer.
   •  Take the mean over all N queries:
MRR = (1/N) Σ_{i=1..N} 1/rank_i
Common Evaluation Metrics
1.  Accuracy (does the answer match the gold-labeled answer?)
2.  Mean Reciprocal Rank:
   •  The reciprocal rank of a query response is the inverse of the rank of the first correct answer.
   •  The mean reciprocal rank is the average of the reciprocal ranks of results for a sample of queries Q.
MRR = (1/N) Σ_{i=1..N} 1/rank_i

Common Evaluation Metrics: MRR
•  The mean reciprocal rank is the average of the reciprocal ranks of results for a sample of queries Q.
•  (Example adapted from Wikipedia.) Suppose the system returns 3 ranked answers per query, the first being the one it thinks most likely correct, and the first correct answers of three sample queries appear at ranks 3, 2, and 1.
•  Given those 3 samples, the mean reciprocal rank is (1/3 + 1/2 + 1)/3 = 11/18, or about 0.61.

Common Evaluation Metrics
1.  Mean Reciprocal Rank
   •  For each query return a ranked list of M candidate answers.
   •  Query score is 1/rank of the first correct answer:
      •  If the first answer is correct: 1
      •  Else if the second answer is correct: 1/2
      •  Else if the third answer is correct: 1/3, etc.
      •  The score is 0 if none of the M answers is correct
   •  Take the mean over all N queries:

MRR = (1/N) Σ_{i=1..N} 1/rank_i
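The metric in a few lines of code (a sketch; the input is the rank of the first correct answer per query, with None when nothing in the list is correct):

    def mrr(first_correct_ranks):
        """Mean reciprocal rank; None means no correct answer for that query."""
        n = len(first_correct_ranks)
        return sum(1.0 / r for r in first_correct_ranks if r is not None) / n

    # Wikipedia-style example: first correct answers at ranks 3, 2, and 1.
    print(mrr([3, 2, 1]))   # 0.611... = 11/18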
Use of this metric
•  Mean reciprocal rank is a statistic for evaluating any process that produces a list of possible responses to a sample of queries, ordered by probability of correctness.
   •  Machine translation
   •  Question answering
   •  Etc.
Question Answering
Advanced: Answering Complex Questions

Answering harder questions
Q: What is water spinach?
A: Water spinach (Ipomoea aquatica) is a semi-aquatic leafy green plant with long hollow stems and spear- or heart-shaped leaves, widely grown throughout Asia as a leaf vegetable. The leaves and stems are often eaten stir-fried flavored with salt or in soups. Other common names include morning glory vegetable, kangkong (Malay), rau muong (Viet.), ong choi (Cant.), and kong xin cai (Mand.). It is not related to spinach, but is closely related to sweet potato and convolvulus.

Answering harder questions
Q: In children with an acute febrile illness, what is the efficacy of single-medication therapy with acetaminophen or ibuprofen in reducing fever?
A: Ibuprofen provided greater temperature decrement and longer duration of antipyresis than acetaminophen when the two drugs were administered in approximately equal doses. (PubMedID: 1621668, Evidence Strength: A)

Answering harder questions via query-focused summarization
•  The (bottom-up) snippet method
   •  Find a set of relevant documents
   •  Extract informative sentences from the documents (using tf-idf, MMR)
   •  Order and modify the sentences into an answer
•  The (top-down) information extraction method
   •  Build specific answerers for different question types:
      •  definition questions,
      •  biography questions,
      •  certain medical questions

The Information Extraction method
•  A good biography of a person contains:
   •  the person's birth/death, fame factor, education, nationality and so on
•  A good definition contains:
   •  genus or hypernym
   •  The Hajj is a type of ritual
•  A medical answer about a drug's use contains:
   •  the problem (the medical condition),
   •  the intervention (the drug or procedure), and
   •  the outcome (the result of the study).

Information that should be in the answer for 3 kinds of questions

Architecture for complex question answering: definition questions
S. Blair-Goldensohn, K. McKeown and A. Schlaikjer. 2004. Answering Definition Questions: A Hybrid Approach.
[Figure: definition-question pipeline for the query "What is the Hajj?" (Ndocs=20, Len=8). Document Retrieval returns 11 web documents (1127 total sentences), e.g.:

"The Hajj, or pilgrimage to Makkah [Mecca], is the central duty of Islam. More than two million Muslims are expected to take the Hajj this year. Muslims must perform the hajj at least once in their lifetime if physically and financially able. The Hajj is a milestone event in a Muslim's life. The annual hajj begins in the twelfth month of the Islamic year (which is lunar, not solar, so that hajj and Ramadan fall sometimes in summer, sometimes in winter). The Hajj is a week-long pilgrimage that begins in the 12th month of the Islamic lunar calendar. Another ceremony, which was not connected with the rites of the Ka'ba before the rise of Islam, is the Hajj, the annual pilgrimage to 'Arafat, about two miles east of Mecca, toward Mina…"

Predicate Identification splits these into 9 genus-species sentences, e.g.:
•  The Hajj, or pilgrimage to Makkah (Mecca), is the central duty of Islam.
•  The Hajj is a milestone event in a Muslim's life.
•  The hajj is one of five pillars that make up the foundation of Islam.
•  ...
and 383 non-specific definitional sentences, which go to data-driven analysis. Definition Creation then builds the answer from sentence clusters with importance ordering.]
The end