Open-domain QA systems

AnswerBus
LCC ([7]), QuASM, IONAUT ([1]),
START ([11]) and Webclopedia ([10]).
AnswerBus: sentence-level answers, multilingual support
function word deletion (prepositions, determiners/pronouns, conjunctions, interjections, and
discourse particles)
use of word frequency table (delete frequently used words)
special words deletion
word form modification.
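The four query-formation steps above can be sketched as a small filter over the question's tokens. This is a minimal illustration, not the AnswerBus implementation: the word lists, the frequency table, the `max_frequency` threshold, and the rule that a question containing "did" takes past-tense verb forms are all hypothetical stand-ins.

```python
import re

# Hypothetical resources for illustration; the lists AnswerBus uses are not given in these notes.
FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "at", "to", "and", "or", "he", "she", "it", "oh"}
SPECIAL_WORDS = {"who", "what", "when", "where", "which", "how", "did", "does", "do"}
WORD_FREQUENCY = {"is": 0.010, "was": 0.009, "end": 0.0004, "period": 0.0001}
PAST_FORMS = {"end": "ended", "start": "started"}


def build_query(question, max_frequency=0.005):
    """Turn a question into a list of search terms."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    use_past = "did" in tokens  # assumed trigger for word form modification
    terms = []
    for tok in tokens:
        # Steps 1 and 3: drop function words and special words.
        if tok in FUNCTION_WORDS or tok in SPECIAL_WORDS:
            continue
        # Step 2: drop words that the frequency table marks as very common.
        if WORD_FREQUENCY.get(tok, 0.0) > max_frequency:
            continue
        # Step 4: word form modification (here only "did ... end" to "ended").
        if use_past:
            tok = PAST_FORMS.get(tok, tok)
        terms.append(tok)
    return terms


print(build_query("When did the Jurassic Period end?"))
```

A real system would replace the toy tables with a proper stop list, a corpus-derived frequency table, and a morphological generator.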
Candidate answer extraction
words, then an answer candidate sentence should have at least two of them. When a sentence
meets the condition as indicated by the above formula, it will receive a primary score based
on the number of matching words it contains. Otherwise, it will receive a score of “0.”
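The primary scoring rule can be sketched as follows. The threshold formula itself is not reproduced in these notes, so the sketch takes the minimum number of matching words as a caller-supplied parameter (`min_matches`, a hypothetical name) rather than guessing at it.

```python
def primary_score(sentence_words, query_words, min_matches):
    """Return the number of matching query words, or 0 if below the threshold.

    min_matches stands in for the condition given by the formula in the
    original paper, which is not reproduced in these notes.
    """
    matches = len(set(query_words) & set(sentence_words))
    return matches if matches >= min_matches else 0


query = ["jurassic", "period", "ended"]
print(primary_score(["the", "jurassic", "period", "ended", "long", "ago"], query, min_matches=2))
print(primary_score(["that", "period", "was", "short"], query, min_matches=2))
```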
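Candidate answer ranking
Question type -> answer type (who -> name)
Question type -> keyword expansion (how far -> kilometers)
Named entity extraction
Coreference resolution (他 "he" -> 何靖)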
(AnswerBus only resolves coreferences between
adjacent sentences. When this type of coreference is
detected, the later sentence receives part of the score of
the preceding sentence.)
Order in which the search engine returns results
Answer sentence scoring
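The coreference note above (a later sentence inheriting part of the score of the sentence before it) might look like the sketch below. The detection rule (a sentence that starts with a pronoun) and the `share` fraction are assumptions made for illustration; the notes do not say how AnswerBus detects the coreference or how much score is transferred.

```python
# Hypothetical pronoun list used only to detect a likely coreference.
PRONOUNS = {"he", "she", "it", "they"}


def propagate_scores(sentences, scores, share=0.5):
    """Give a pronoun-initial sentence part of the previous sentence's score."""
    adjusted = list(scores)
    for i in range(1, len(sentences)):
        words = sentences[i].split()
        first_word = words[0].lower() if words else ""
        if first_word in PRONOUNS:
            # Only the directly preceding sentence is considered.
            adjusted[i] += share * scores[i - 1]
    return adjusted


sentences = ["Neil Armstrong walked on the moon.", "He did so in 1969."]
print(propagate_scores(sentences, [2, 1]))
```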
Webclopedia
Previous work in automated question answering has often categorized questions by question word
alone or by a mixture of question word and the semantic class of the answer (Srihari and Li,
2000; Moldovan et al., 2000). To ensure full coverage of all forms of simple question and
answer, we have been developing a QA Typology as a taxonomy of QA types, becoming
increasingly specific as one moves from root downward.
To create the QA Typology, we analyzed 17,384 questions and their answers (downloaded from
answers.com); see (Gerber, 2001). The Typology contains 94 nodes, of which 47 are leaf nodes;
a section of it appears in Figure 2.
By CONTEXT
Naturally, this forces the patterns to
contain not only surface forms (words and punctuation), but
also type markers (Date, NumericalAmount, MoneyAmount...).
A Question/Answer Typology with Surface Text Patterns
Question classification tree
Automatic pattern extraction (suffix tree, precision)
(NAME_OF_PERSON BIRTHYEAR),
Pattern extraction: query
Evaluating the precision of each pattern: query
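Evaluating a pattern's precision can be sketched as below for a BIRTHYEAR-style pattern. The sketch takes precision to be the fraction of pattern matches whose answer slot holds the known correct answer. The `<NAME>` and `<ANSWER>` slot syntax, the helper names, and the sample sentences are hypothetical, and the suffix-tree extraction step is not shown.

```python
import re


def pattern_to_regex(pattern, name):
    """Compile a surface pattern with <NAME> and <ANSWER> slots into a regex."""
    escaped = re.escape(pattern)
    # Escape the slot markers the same way so the replacement works on any Python version.
    escaped = escaped.replace(re.escape("<NAME>"), re.escape(name))
    escaped = escaped.replace(re.escape("<ANSWER>"), r"(\w+)")
    return re.compile(escaped)


def pattern_precision(pattern, sentences, name, answer):
    """Fraction of matches whose answer slot equals the known answer."""
    regex = pattern_to_regex(pattern, name)
    total = correct = 0
    for sentence in sentences:
        for match in regex.finditer(sentence):
            total += 1
            if match.group(1) == answer:
                correct += 1
    return correct / total if total else 0.0


# Toy sentences standing in for documents returned by a query on the name alone.
docs = [
    "Mozart was born in 1756.",
    "Mozart was born in Salzburg.",
    "Mozart (1756-1791) was a composer.",
]
for p in ["<NAME> was born in <ANSWER>", "<NAME> (<ANSWER>-"]:
    print(p, pattern_precision(p, docs, "Mozart", "1756"))
```

In this toy data the parenthetical pattern is more precise than "was born in", because the latter also matches a birthplace.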
银平
Patterns of Potential Answer Expressions as Clues to the Right Answers
TextRoller
searches for candidate answers using keywords (from the question text) and chooses the most
probable answer using patterns.
In the literature we find approaches attempting to distinguish between the main (primary) and
additional (secondary) query words. In (Sneiders, 1998) this distinction is discussed as
applied to searching for answers to FAQs, where the answers are represented as sentences.
Primary keywords are the words that convey the essence of the sentence. They cannot be
ignored. Secondary keywords are the less-relevant words for a particular sentence. They help
to convey the meaning of the sentence but can be omitted without changing the essence of the
meaning.
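One way to read the primary/secondary distinction is as a two-stage match: every primary keyword of a sentence must occur in the query, while secondary keywords only refine the score. The sketch below follows that reading. The function name, the scoring by secondary overlap, and the sample keywords are assumptions, not the method of (Sneiders, 1998).

```python
def match_sentence(query_words, primary, secondary):
    """Return (matched, score) for one stored sentence.

    matched is True only if every primary keyword occurs in the query;
    score counts how many secondary keywords also occur.
    """
    query = {w.lower() for w in query_words}
    if not set(primary) <= query:
        return (False, 0)
    return (True, len(set(secondary) & query))


primary = {"reset", "password"}
secondary = {"account", "forgotten"}
print(match_sentence("how do I reset my account password".split(), primary, secondary))
print(match_sentence("how do I change my password".split(), primary, secondary))
```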
Answer Extraction
Ranking
1. In most cases, the matching is boolean.
2. A couple of special cases where finer distinctions are made. For the question "How many
lives were lost in the Lockerbie air crash", entities such as "270 lives" or "almost 300
lives" would be ranked above entities such as "200 pumpkins" or "150".
3. The frequency and position of occurrences of a given entity within the retrieved passages.
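The ranking criteria above can be combined into a single sort key, as in the sketch below. It assumes candidates arrive as `(text, unit)` pairs in passage order, that a unit match outranks everything else, and that ties are broken by frequency and then by first position. The input format and the tie-breaking order are assumptions; the notes do not say how the criteria are weighted.

```python
from collections import Counter


def rank_entities(candidates, expected_unit):
    """Rank distinct candidate entities for a 'how many <unit>' question.

    candidates: list of (text, unit) pairs in passage order; unit is None
    for a bare number.
    """
    freq = Counter(text for text, _ in candidates)
    units = {text: unit for text, unit in candidates}
    first_pos = {}
    for i, (text, _) in enumerate(candidates):
        first_pos.setdefault(text, i)

    def sort_key(text):
        unit_matches = 1 if units[text] == expected_unit else 0
        # Unit match first, then higher frequency, then earlier position.
        return (-unit_matches, -freq[text], first_pos[text])

    return sorted(freq, key=sort_key)


found = [
    ("150", None),
    ("270 lives", "lives"),
    ("200 pumpkins", "pumpkins"),
    ("270 lives", "lives"),
    ("almost 300 lives", "lives"),
]
print(rank_entities(found, expected_unit="lives"))
```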