NTT’s Japanese-English Cross
Language Question Answering System
Hideki Isozaki, Katsuhito Sudoh,
Hajime Tsukada
NTT Communication Science Labs
Nippon Telegraph and Telephone Corp.
Background
• We are working on Japanese Question Answering
systems based on Information Retrieval.
Question Analysis → Document Retrieval → Answer Extraction
• They are less expensive and more robust than
conventional DB-based QA systems.
• However, the number of Japanese documents available is
much smaller than that of English documents.
• If the systems can extract answers from English
documents, we will be able to get better answers for
our Japanese questions.
• Therefore, we have been working on cross-language QA (CLQA) for a few years.
NTCIR CLQA
• Japan’s National Institute of Informatics holds an
Information Retrieval evaluation workshop (NTCIR)
every 1.5 years.
• In NTCIR 2005, CLQA among English, Japanese, and
Chinese was one of the evaluation tasks.
• In this task, systems were evaluated on factoid
questions (i.e., answers are names or numerical
expressions).
• We participated only in the J-to-E subtask, and our
system outperformed the other systems.
Features of our system
• Proximity-based document retrieval
• Web-based back-transliteration
• Recognition of synonyms in document retrieval
and answer evaluation
CLQA JE results (Top1 accuracy)
[Bar chart: strict and lenient Top1 accuracy (y-axis 0 to 0.35) for our system and systems 2-8]
Rough sketch of the system
[Diagram: the Japanese QA pipeline combined with the English QA pipeline; the new cross-language components are highlighted]
We will introduce our document retrieval module
before the query translation module.
Document Retrieval for QA
Conventional TF-IDF does not work well for QA.
• Number of different query terms around an answer
candidate is a good hint of its correctness.
• However, repetitions of a single query term (or
synonyms) are not. (i.e., TF is detrimental.)
• In order to get the density of query terms, we
developed a paragraph retrieval engine.
• But it did not work well because paragraphs were
often too short or too long to cover useful query
terms around a correct answer.
Proximity-based Document Retrieval
(Isozaki, ACM TALIP 2005)
The score of a document D is defined as the best score over
all passages (= any word sequences) in the document.
The score of a passage is the sum of the IDFs of the query
terms in the passage, decayed according to the passage
length (by a decay factor).
Query: q1 or q2 or q3 ⇒ Q = {q1, q2, q3}
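A minimal sketch of this scoring scheme, assuming an exponential length decay with factor beta (a later slide tunes beta = 0.001) and a precomputed IDF table; the names and the brute-force passage scan are illustrative, not the original implementation:

import math

def passage_score(passage, query_terms, idf, beta=0.001):
    # Sum of IDFs of the *distinct* query terms in the passage,
    # decayed by the passage length (repetitions add nothing).
    hits = {t for t in passage if t in query_terms}
    return sum(idf[t] for t in hits) * math.exp(-beta * len(passage))

def document_score(doc_tokens, query_terms, idf, beta=0.001):
    # Best passage score over all word sequences in the document.
    # (Brute force for clarity; the real engine computes this efficiently.)
    best = 0.0
    for i in range(len(doc_tokens)):
        for j in range(i + 1, len(doc_tokens) + 1):
            best = max(best, passage_score(doc_tokens[i:j], query_terms, idf, beta))
    return best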
Introduction of a synonym operator
• Number of different query terms around an answer
candidate is a good hint of its correctness.
• However, repetitions of a single query term are not.
• Occurrences of synonyms should also be
regarded as repetitions.
• We introduce ‘or2’ to specify synonyms.
watercraft or2 ship or2 boat or2 ...
(Pirkola ’98 also used a synonym operator for CLIR.)
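A minimal sketch of how an or2 group might be handled in scoring: each group contributes at most once, with a single representative IDF (the exact weighting used in the system is not stated on the slide):

def synonym_group_score(passage, synonym_groups, idf):
    # Each or2 group counts at most once; repetitions and synonyms add nothing.
    score = 0.0
    for group in synonym_groups:                    # e.g., {"watercraft", "ship", "boat"}
        if any(term in passage for term in group):
            score += max(idf[t] for t in group)     # representative IDF (an assumption)
    return score

# The query "watercraft or2 ship or2 boat" becomes a single group:
groups = [{"watercraft", "ship", "boat"}]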
Performance of the IR engine
• This IR system was tuned by using TREC
QA English questions. (β=0.001)
• DIDF was better than Okapi BM25 tuned for
TREC QA (k1=0.3).
• Similar results were obtained for Japanese
questions.
Query Translation Module
847,000 Japanese words can be translated.
• EDICT (Monash University): 110 k entries
• ENAMDICT (Monash University): 484 k entries
• NTT in-house dictionary: 661 k entries
They were also added to the morphological analysis dictionary.
Query Translation Table
北京 has two translations: Peking ⇔ Beijing (joined with or2)
金閣寺 has three translations:
Kinkaku Temple ⇔ Kinkakuji ⇔ Temple of the Golden Pavilion (joined with or2)
Output of the IR module
CLQA1-JA-S0011-00:
1600年、臼杵に漂着したオランダの船は何という?
(What is the name of the Dutch ship that drifted ashore
at Usuki in 1600?)
(query term 船 = ship)
Automatic enhancements of the dictionary
Sometimes, Romanization of Japanese words
causes trouble in IR.
• 北谷町 (a town in Okinawa prefecture)
– Chatancho, Chatanchou, Chatan-cho,
Chatan cho, Chatan-chou, Chatan town, etc.
• 浩一 (first name for a boy)
– Kouichi, Kooichi, Koichi, Kohichi, etc.
We generated typical variations for these words
and joined them with ‘or2’.
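A rough sketch of this variant generation; the suffix rules below are illustrative only, not the system's actual rules:

def romanization_variants(stem, suffixes):
    # Generate typical spelling variants and join them with 'or2'.
    forms = set()
    for suffix in suffixes:
        for sep in ("", "-", " "):
            forms.add(stem + sep + suffix)
    return " or2 ".join(sorted(forms))

# 北谷町: stem "Chatan" with the suffix 町 romanized as cho / chou / town
print(romanization_variants("Chatan", ["cho", "chou", "town"]))
# Chatan cho or2 Chatan chou or2 Chatan town or2 Chatan-cho or2 ...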
Sometimes, we encounter unknown words.
Most unknown words are written in KATAKANA.
What is the original English spelling for アンジェイ?
Most unknown KATAKANA words are loan words or
unfamiliar names (e.g., new companies, chemical compounds).
We have to find the original English spelling for
the KATAKANA word (= back-transliteration).
Conventional method for back-transliteration
Prepare possible spelling candidates (phonetic
equivalents) for each KATAKANA character,
and combine them.
ア → a, u, er, or, ur, ir, etc.
イ → i, e, ee, ie, y, yi, yie, etc.
ウ → u, oo, w, woo, etc.
…
We can generate possible spellings, but this leads to a
combinatorial explosion.
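The explosion is easy to see: the number of candidate spellings multiplies with every character. A toy sketch using the candidate table above:

from itertools import product

# Phonetic equivalents per KATAKANA character (from the table above, abridged)
CANDIDATES = {
    "ア": ["a", "u", "er", "or", "ur", "ir"],
    "イ": ["i", "e", "ee", "ie", "y", "yi", "yie"],
    "ウ": ["u", "oo", "w", "woo"],
}

def all_spellings(katakana):
    # Enumerate every combination of per-character spellings.
    return ["".join(p) for p in product(*(CANDIDATES[c] for c in katakana))]

print(len(all_spellings("アイウ")))   # 6 * 7 * 4 = 168 candidates for just 3 characters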
Our Approach:
What is the original English spelling for アンジェイ?
IR-based QA will answer the question!!
Web-based back-transliteration
[Diagram: the unknown KATAKANA word is matched against candidate alphabet sequences found on the Web, ranked by phonetic distance]
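A sketch of this idea under simplifying assumptions: candidate alphabet strings are collected from Web text retrieved with the KATAKANA word, and the one closest to its Romanized reading is chosen; plain edit distance stands in here for the learned phonetic distance described on the next slides:

def edit_distance(a, b):
    # Levenshtein distance as a crude stand-in for the phonetic distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def back_transliterate(romanized_reading, web_candidates):
    # Rank alphabet sequences found in the retrieved Web text.
    return sorted(web_candidates, key=lambda c: edit_distance(romanized_reading, c.lower()))

# e.g., candidate strings found near アンジェイ (reading "anjei") in Web snippets
print(back_transliterate("anjei", ["Andrzej", "Anderson", "Antwerp"]))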
Training the transliteration module
Training data: 20 k transliteration pairs, e.g.
アイボリー ⇔ ivory
アイスクリーム ⇔ ice cream
Characters are aligned by φ² values (φ² = normalized χ²).
Best character alignment for アイスクリーム ⇔ ice cream (ε = NULL):
              c1  c2  c3  c4  c5  c6  c7  c8  c9
katakana      ア   イ   ス   ε   ク   リ   ー   ε   ム
roman letter  ε   i   c   e   c   r   e   a   m
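φ² here is the standard association measure over a 2x2 contingency table; a small sketch of computing it for one (katakana character, roman letter) pair over the 20 k training pairs (the counts and their interpretation are assumptions for illustration):

def phi_squared(a, b, c, d):
    # 2x2 contingency table over the training pairs:
    #   a = pairs containing both the katakana character and the roman letter,
    #   b = pairs with the katakana character only,
    #   c = pairs with the roman letter only,
    #   d = pairs with neither.
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return ((a * d - b * c) ** 2) / denom if denom else 0.0

# e.g., association between ス and the letter "c" (made-up counts summing to 20,000)
print(phi_squared(a=480, b=320, c=2100, d=17100))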
Applying the transliteration model
[Diagram: a KATAKANA word and a candidate Roman-letter word are compared over all possible alignments; the best alignment is scored with a second-order Markov model]
Weighted Finite State Transducers (WFST) were
used for efficient calculation.
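A sketch of scoring a single alignment with a second-order (trigram) Markov model over aligned (katakana, roman) units; the probabilities are placeholders, and the real system compiles both the model and the search over all alignments into WFSTs:

import math
from collections import defaultdict

# Log-probabilities over aligned units, e.g. ("イ", "i") or ("ア", "");
# real values would be estimated from the 20 k transliteration pairs.
TRIGRAM_LOGP = defaultdict(lambda: math.log(1e-6))   # crude smoothing

def alignment_logprob(units):
    # Second-order Markov score of one alignment; the system keeps the
    # best-scoring alignment among all possible ones.
    logp = 0.0
    h1 = h2 = ("<s>", "<s>")                          # start symbols
    for u in units:
        logp += TRIGRAM_LOGP[(h1, h2, u)]
        h1, h2 = h2, u
    return logp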
Performance of the back-transliteration
[Chart: back-transliteration accuracy on the development, in-house, and official-run questions; annotated failure cases include queries for which the search engine returned no documents and words that are not transliterations]
Answer Extraction
• Rule-based NE tagger (669 rules for 143 answer types)
• They were developed by using TREC QA questions,
CLQA development set, and 100 in-house questions.
wife of Bill Clinton →
wife of <MALE>Bill Clinton
Texas Governor George Bush →
<STATE>Texas <PTITLE>Governor <PERSON>George Bush
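A toy illustration of this rule style; the patterns, tag names, and rule order below are hypothetical (the real tagger has 669 hand-written rules for 143 answer types):

import re

# Each rule rewrites a surface pattern, inserting an answer-type tag.
RULES = [
    (re.compile(r"wife of (?=[A-Z])"), r"wife of <MALE>"),              # "wife of X" => X is male
    (re.compile(r"\b(Governor|Senator|President)\b"), r"<PTITLE>\1"),
    (re.compile(r"\b(Texas|Ohio|California)\b"), r"<STATE>\1"),
    (re.compile(r"(?<=<PTITLE>)(\w+) ([A-Z]\w+ [A-Z]\w+)"), r"\1 <PERSON>\2"),
]

def tag(text):
    for pattern, repl in RULES:
        text = pattern.sub(repl, text)
    return text

print(tag("wife of Bill Clinton"))        # wife of <MALE>Bill Clinton
print(tag("Texas Governor George Bush"))  # <STATE>Texas <PTITLE>Governor <PERSON>George Bush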
Use of Word Sense Tagger
(without disambiguation)
[Example tagger output with columns: sentence no., word no., part-of-speech, word senses]
Answer Evaluation
• Use of a simple Hanning window function
• The evaluation module also regards synonyms as simple
repetitions.
[Plot: the window spans positions -W to +W around the answer candidate]
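A sketch of such a windowed evaluation, assuming the textbook Hanning window 0.5*(1 + cos(pi*d/W)) and IDF weighting as in the IR module; W and the exact combination are assumptions, not values from the slide:

import math

def hanning_weight(distance, W):
    # 1 at the candidate position, decaying to 0 at distance W, 0 outside.
    return 0.5 * (1.0 + math.cos(math.pi * distance / W)) if abs(distance) <= W else 0.0

def candidate_score(term_positions, candidate_pos, idf, W=20):
    # Sum window-weighted IDFs of the query terms around the candidate,
    # counting each term (or or2 synonym group) only once.
    return sum(max(hanning_weight(p - candidate_pos, W) for p in positions) * idf[term]
               for term, positions in term_positions.items())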
NTCIR CLQA task
• Development set: 300 questions
• Official run: 200 questions
• Document data:
– Daily Yomiuri 2000-2001 (17,741 news articles)
Performance of the entire QA systems
Official run (automatic lenient evaluation)
CLQA gave much better scores than MT+EQA2.
[Bar chart: Top1, MRR, Top5, and answer-type accuracy for EQA (old question analyzer), EQA2 (improved question analyzer), Yahoo MT + EQA2, Excite MT + EQA2, and CLQA]
Performance of MT+EQA systems
Question analysis often fails because MT systems
are not good at translating questions.
• Japanese
「水の画家」と呼ばれる印象派の画家といえば?
• Human translation
Who is the Impressionist painter known as a “water-landscape
painter”?
• Excite MT
Do if you say the painter of the impressionist school
that is called “Painter of water”?
• Yahoo! MT
Speaking of a painter of the Impressionists called “a
painter of water?”
Performance of IR module
% of answerable questions at rank R
(= at least one of the top R documents contains a correct answer)
The official run questions are very easy
if they are given in English.
[Chart: % of answerable questions at ranks 1, ~3, ~10, ~20, ~50 for CLQA with or2, CLQA without or2, and EQA2]
Performance of IR module
Official run (automatic lenient evaluation)
Precision
[Chart: precision at ranks 1, ~3, ~10, ~20, ~50 for CLQA with or2, CLQA without or2, and EQA2]
The synonym operator 'or2' improves the IR performance.
Effectiveness of the synonym operator
for the entire QA system
Development set (automatic lenient evaluation)
[Bar chart: Top1, MRR, Top5 for CLQA with or2 vs. CLQA without or2]
Effectiveness of
Back-transliteration (+ Romanization)
Official run (automatic lenient evaluation)
[Bar chart: Top1, MRR, Top5 for CLQA without and with Romanization (rm) and back-transliteration (bt), using the large translation dictionary (EDICT + ENAMDICT + in-house) and the small dictionary (EDICT only)]
Conclusions
• Our system showed the best performance in CLQA
JE task.
• The official run questions were very easy if they
were given in English.
• Web-based back-transliteration and Romanization are
useful when the translation dictionary is small.
• The synonym operator improved the performance.
• MT+EQA does not work well because MT systems
are not good at translating questions.
Performance of the entire QA systems
Development set (automatic lenient evaluation)
[Bar chart: Top1, MRR, Top5 for CLQA, EQA, and EQA2 on the development set]